* Eliminating SPI / SQL from some RI triggers - take 3
From: Amit Langote @ 2024-12-20 04:23 UTC
To: pgsql-hackers
Hi,
We discussed $subject at [1] and [2] and I'd like to continue that
work in the hope of committing some part of it for v18.
In short, performing the RI checks for inserts and updates of a
referencing table as direct scans of the PK index results in up to 40%
improvement in their performance, especially when they are done in a
bulk manner as shown in the following example:
create unlogged table pk (a int primary key);
create unlogged table fk (a int references pk);
insert into pk select generate_series(1, 10000000);
insert into fk select generate_series(1, 10000000);
On my machine, the last query took 20 seconds on master versus 12
seconds with the patches. On master, a significant portion of the
time is spent in ExecutorStart() and ExecutorEnd() on the plan for
the RI query, which adds up because it's done for each row in a
bulk load. The patch avoids that overhead because it calls the index
AM directly.
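To make the source of the overhead concrete, here is a toy model in C, not code from the patches. A sorted array stands in for the PK index, and counters stand in for the per-row ExecutorStart()/ExecutorEnd() work. All names (ToyIndex, ToyStats, toy_index_probe, toy_check_via_plan, toy_bulk_check) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PK index: a sorted array of keys. */
typedef struct ToyIndex
{
    const int  *keys;
    size_t      nkeys;
} ToyIndex;

/* Counters so that the two strategies can be compared. */
typedef struct ToyStats
{
    long        executor_starts;
    long        executor_ends;
    long        index_probes;
} ToyStats;

/* Direct probe: a binary search, loosely analogous to calling the index AM. */
static bool
toy_index_probe(const ToyIndex *idx, int key, ToyStats *stats)
{
    size_t      lo = 0;
    size_t      hi;

    assert(idx != NULL && stats != NULL);
    hi = idx->nkeys;
    stats->index_probes++;
    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;

        if (idx->keys[mid] == key)
            return true;
        if (idx->keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return false;
}

/* "SQL plan" path: per-row startup and shutdown wrapped around the probe. */
static bool
toy_check_via_plan(const ToyIndex *idx, int key, ToyStats *stats)
{
    bool        found;

    stats->executor_starts++;   /* models ExecutorStart() for this row */
    found = toy_index_probe(idx, key, stats);
    stats->executor_ends++;     /* models ExecutorEnd() for this row */
    return found;
}

/* Count FK values with no PK match, using either strategy. */
static long
toy_bulk_check(const ToyIndex *idx, const int *fk_vals, size_t n,
               bool use_plan, ToyStats *stats)
{
    long        violations = 0;

    for (size_t i = 0; i < n; i++)
    {
        bool        found = use_plan
            ? toy_check_via_plan(idx, fk_vals[i], stats)
            : toy_index_probe(idx, fk_vals[i], stats);

        if (!found)
            violations++;
    }
    return violations;
}
```

Both strategies report the same violations and do the same number of probes; only the plan path pays the start/end bookkeeping once per row, which is the cost that adds up in a bulk load.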
The basic design of the patches hasn't changed since the last update
at [2], though there are a few changes:
1. I noticed a few additions to the RI trigger functions the patch
touches, such as those to support temporal foreign keys. I decided to
leave the SQL for temporal queries in place as the plan for those
doesn't look, at a glance, as simple as a plain index scan.
2. As I mentioned in [3], the new way of doing the PK lookup didn't
have a way to recheck the PK tuple after detecting concurrent updates
of the PK, so it would cause an error under READ COMMITTED isolation
level. The old way of executing an SQL plan would deal with that
using the EvalPlanQual() mechanism in the executor. In the updated
patch, I've added an equivalent rechecking function that's called in
the same situations as EvalPlanQual() would get called in the old
method.
3. I reordered the patches as Robert suggested at [5], mainly because
the patch set includes changes to address a bug where PK lookups could
return incorrect results under the REPEATABLE READ isolation level.
This issue arises because RI lookups on partitioned PK tables
manipulate ActiveSnapshot to pass the snapshot that's used by
find_inheritance_children() to determine the visibility of
detach-pending partitions to these RI lookups. To address this, the
patch set introduces refactoring of the PartitionDesc interface,
included in patch 0001. This refactoring eliminates the need to
manipulate ActiveSnapshot by explicitly passing the correct snapshot
for detach-pending visibility handling. The main patch (0002+0003),
which focuses on improving performance by avoiding SQL queries for RI
checks, builds upon these refactoring changes to pass the snapshot
directly instead of manipulating the ActiveSnapshot. Reordering the
patches this way ensures a logical progression of changes, as Robert
suggested, while avoiding any impression that the bug was introduced
by the ri_triggers.c changes.
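The rechecking described in item 2 can be pictured with a simplified model. This is not the patch's recheck function. It is a sketch that assumes a tuple is a node in an update chain, that locking lands on the newest version, and that the key condition is re-evaluated only when the chain had to be followed (similar in spirit to the 'traversed' flag that ExecLockTableTuple() reports). ToyTuple, toy_lock_latest and toy_lookup_with_recheck are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tuple version; 'next' points at the version replacing it. */
typedef struct ToyTuple
{
    int         key;
    bool        deleted;
    struct ToyTuple *next;
} ToyTuple;

/*
 * Follow the update chain to the newest version and report whether any hop
 * was needed, i.e. whether the row was concurrently updated.
 */
static ToyTuple *
toy_lock_latest(ToyTuple *tup, bool *concurrently_updated)
{
    *concurrently_updated = false;
    while (tup->next != NULL)
    {
        tup = tup->next;
        *concurrently_updated = true;
    }
    return tup;
}

/* Returns true if a live row with 'key' still exists after the recheck. */
static bool
toy_lookup_with_recheck(ToyTuple *found_by_scan, int key)
{
    bool        updated;
    ToyTuple   *latest;

    assert(found_by_scan != NULL);
    latest = toy_lock_latest(found_by_scan, &updated);

    /* A deleted row can never satisfy the check. */
    if (latest->deleted)
        return false;

    /* Re-evaluate the key condition only if we had to chase the chain. */
    if (updated && latest->key != key)
        return false;

    return true;
}
```

In this model a concurrent update that changes the key makes the lookup report "not found" rather than raising an error, while an update that leaves the key alone still counts as a match.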
However, I need to spend some time addressing Robert's feedback on the
basic design, as outlined at [5]. Specifically, the new PK lookup
function could benefit significantly from caching information rather
than recomputing it for each row. This implies that the PlanCreate
function should create a struct to store reusable information across
PlanExecute calls for different rows being checked.
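As a rough illustration of that split, and not the design that will end up in the patch, the sketch below computes the reusable information once in a "create" step and only reads it in the "execute" step. The struct and function names (ToyPlanState, toy_plan_create, toy_plan_execute, toy_plan_free) are hypothetical, and attribute numbers are 0-based here purely to keep the toy short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-plan state that survives across executions. */
typedef struct ToyPlanState
{
    int         nkeys;
    int        *key_attnums;    /* computed once at plan creation */
    long        setup_count;    /* how many times setup work was done */
    long        exec_count;     /* how many rows were checked */
} ToyPlanState;

/* Create step: do the expensive, row-independent work exactly once. */
static ToyPlanState *
toy_plan_create(const int *attnums, int nkeys)
{
    ToyPlanState *st;

    assert(attnums != NULL && nkeys > 0);
    st = malloc(sizeof(*st));
    if (st == NULL)
        return NULL;
    st->key_attnums = malloc(sizeof(int) * (size_t) nkeys);
    if (st->key_attnums == NULL)
    {
        free(st);
        return NULL;
    }
    for (int i = 0; i < nkeys; i++)
        st->key_attnums[i] = attnums[i];
    st->nkeys = nkeys;
    st->setup_count = 1;
    st->exec_count = 0;
    return st;
}

/* Execute step: compare key columns of two rows using the cached attnums. */
static bool
toy_plan_execute(ToyPlanState *st, const int *fk_row, const int *pk_row)
{
    st->exec_count++;
    for (int i = 0; i < st->nkeys; i++)
    {
        int         att = st->key_attnums[i];

        if (fk_row[att] != pk_row[att])
            return false;
    }
    return true;
}

/* Release everything allocated by toy_plan_create(). */
static void
toy_plan_free(ToyPlanState *st)
{
    free(st->key_attnums);
    free(st);
}
```

However many rows are checked, setup_count stays at 1, which is the property the per-row recomputation currently lacks.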
Beyond implementing these changes, I also need to confirm that the new
plan execution preserves all operations performed by the SQL plan for
the same checks, particularly those affecting user-visible behavior.
I've already verified that permission checks are preserved: revoking
access to the PK table during the checks causes them to fail, as
expected. This behavior is maintained because permission checks are
performed during each execution. The planned changes to separate the
"plan" and "execute" steps should continue to uphold this and other
behaviors that might need to be preserved.
--
Thanks, Amit Langote
[1] Simplifying foreign key/RI checks:
https://www.postgresql.org/message-id/flat/CA%2BHiwqG5e8pk8s7%2B7zhr1Nc_PGyhEdM5f%3DpHkMOdK1RYWXfJsg...
[2] Eliminating SPI from RI triggers - take 2
https://www.postgresql.org/message-id/flat/CA%2BHiwqG5e8pk8s7%2B7zhr1Nc_PGyhEdM5f%3DpHkMOdK1RYWXfJsg...
[3] https://www.postgresql.org/message-id/[email protected]...
[4] https://www.postgresql.org/message-id/CA%2BTgmoa1DCQ0MdojD9o6Ppbfj%3DabXxe4FUkwA4O_6qBHwOMVjw%40mail...
[5] https://www.postgresql.org/message-id/[email protected]...
Attachments:
[application/x-patch] v1-0003-Avoid-using-an-SQL-query-for-some-RI-checks.patch (49.4K, 2-v1-0003-Avoid-using-an-SQL-query-for-some-RI-checks.patch)
From 6a2d606abf3fa17c94fa9facbf82f9fdab8135e6 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 19 Dec 2024 21:52:32 +0900
Subject: [PATCH v1 3/3] Avoid using an SQL query for some RI checks
For RI triggers that check whether a referenced value exists in the
referenced relation, it is sufficient to scan the foreign key
constraint's unique index directly, instead of issuing an SQL query.
This optimization improves the performance of these checks by nearly
40%, especially for bulk inserts.
This commit builds on the RIPlan infrastructure introduced in the
previous commit. It replaces ri_SqlStringPlanCreate() in
RI_FKey_check() and ri_Check_Pk_Match() with
ri_LookupKeyInPkRelPlanCreate(), which installs
ri_LookupKeyInPkRel() as the plan to implement these checks.
ri_LookupKeyInPkRel() contains logic to directly scan the unique key
index associated with the foreign key constraint.
Additionally, ri_LookupKeyInPkRel() explicitly passes LatestSnapshot
as omit_detached_snapshot to CreatePartitionDirectory(), avoiding
the reliance on setting ActiveSnapshot. The previous approach caused
primary key lookups on partitioned tables to return incorrect results
under REPEATABLE READ isolation level, as demonstrated by a test case
added in commit 00cb86e75d. This issue has now been fixed, and the
buggy output in src/test/isolation/expected/fk-snapshot.out has been
updated to reflect the correct behavior.
Lastly, this commit introduces an isolation test suite to verify
foreign key insertions under concurrent updates to referenced values.
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA+HiwqGkfJfYdeq5vHPh6eqPKjSbfpDDY+j-kXYFePQedtSLeg@mail.gmail.com
Discussion: https://postgr.es/m/CA+HiwqG5e8pk8s7+7zhr1Nc_PGyhEdM5f=pHkMOdK1RYWXfJsg@mail.gmail.com
---
src/backend/executor/execPartition.c | 167 ++++-
src/backend/executor/nodeLockRows.c | 163 +++--
src/backend/utils/adt/ri_triggers.c | 668 ++++++++++++++----
src/include/executor/execPartition.h | 7 +
src/include/executor/executor.h | 9 +
.../expected/fk-concurrent-pk-upd.out | 58 ++
src/test/isolation/expected/fk-snapshot.out | 4 +-
src/test/isolation/isolation_schedule | 1 +
.../isolation/specs/fk-concurrent-pk-upd.spec | 42 ++
src/test/isolation/specs/fk-snapshot.spec | 5 +-
10 files changed, 918 insertions(+), 206 deletions(-)
create mode 100644 src/test/isolation/expected/fk-concurrent-pk-upd.out
create mode 100644 src/test/isolation/specs/fk-concurrent-pk-upd.spec
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d26cf20003..3494089b8e 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -174,8 +174,9 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
+static int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -316,7 +317,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1394,12 +1397,12 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* found or -1 if none found.
*/
static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset = -1;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/*
@@ -1606,6 +1609,158 @@ get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
return part_index;
}
+/*
+ * ExecGetLeafPartitionForKey
+ * Finds the leaf partition of a partitioned table 'root_rel' that might
+ * contain the specified primary key tuple containing a subset of the
+ * table's columns (including all of the partition key columns)
+ *
+ * 'key_natts' specifies the number of columns contained in the key,
+ * 'key_attnums' their attribute numbers as defined in 'root_rel', and
+ * 'key_vals' and 'key_nulls' specify the key tuple.
+ *
+ * Partition descriptors for tuple routing are obtained by referring to the
+ * caller-specified partition directory.
+ *
+ * Any intermediate parent tables encountered on the way to finding the leaf
+ * partition are locked using 'lockmode' when opening.
+ *
+ * Returns NULL if no leaf partition is found for the key.
+ *
+ * This also finds the index in the leaf partition thus found that is recorded
+ * as descending from 'root_idxoid' and returns it in '*leaf_idxoid'.
+ *
+ * Caller must close the returned relation, if any.
+ *
+ * This works because the unique key defined on the root relation is required
+ * to contain the partition key columns of all of the ancestors that lead up to
+ * a given leaf partition.
+ */
+Relation
+ExecGetLeafPartitionForKey(PartitionDirectory partdir,
+ Relation root_rel, int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, bool *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid)
+{
+ Relation rel = root_rel;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values, starting with the root
+ * parent.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc;
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+ bool is_leaf;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of key_attnums, we must map
+ * any non-root table's partition key attribute numbers to the root
+ * table's.
+ */
+ if (rel != root_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel),
+ false);
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < key_natts; k++)
+ {
+ if (root_partattrs[i] == key_attnums[k])
+ {
+ partkey_vals[j] = key_vals[k];
+ partkey_isnull[j] = key_nulls[k];
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ /* Get the PartitionDesc using the partition directory machinery. */
+ partdesc = PartitionDirectoryLookup(partdir, rel);
+
+ /* Find the partition for the key. */
+ partidx = get_partition_for_tuple(partkey, partdesc, partkey_vals,
+ partkey_isnull);
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+
+ /* Close any intermediate parents we opened, but keep the lock. */
+ if (rel != root_rel)
+ table_close(rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ partoid = partdesc->oids[partidx];
+ rel = table_open(partoid, lockmode);
+ constr_idxoid = index_get_partition(rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ is_leaf = partdesc->is_leaf[partidx];
+ if (is_leaf)
+ {
+ *leaf_idxoid = constr_idxoid;
+ return rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
+
/*
* ExecBuildSlotPartitionKeyDescription
*
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 41754ddfea..fb3ebcd309 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -79,10 +79,7 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
/* clear any leftover test tuple for this rel */
@@ -178,74 +175,11 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
-
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ /* skip tuple if it couldn't be locked */
+ if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed))
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -280,6 +214,93 @@ lnext:
return slot;
}
+/*
+ * ExecLockTableTuple
+ * Locks tuple with the specified TID in lockmode following given wait
+ * policy
+ *
+ * Returns true if the tuple was successfully locked. Locked tuple is loaded
+ * into provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *tuple_concurrently_updated)
+{
+ TM_FailureData tmfd;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ TM_Result test;
+
+ if (tuple_concurrently_updated)
+ *tuple_concurrently_updated = false;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully, the locked tuple saved in
+ * slot for EvalPlanQual, if asked by the caller.
+ */
+ if (tmfd.traversed && tuple_concurrently_updated)
+ *tuple_concurrently_updated = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 804a2a69e4..81546655fd 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,21 +23,31 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/index.h"
+#include "catalog/partition.h"
+#include "catalog/pg_class.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
+#include "catalog/pg_namespace.h"
+#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "tcop/pquery.h"
#include "tcop/utility.h"
@@ -50,6 +60,7 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rangetypes.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -157,6 +168,12 @@ typedef void (*RI_PlanFreeFunc_type) (struct RI_Plan *plan);
*/
typedef struct RI_Plan
{
+ /* Constraint for this plan. */
+ const RI_ConstraintInfo *riinfo;
+
+ /* RI query type code. */
+ int constr_queryno;
+
/*
* Context under which this struct and its subsidiary data gets allocated.
* It is made a child of CacheMemoryContext.
@@ -270,7 +287,8 @@ static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
Relation trig_rel, bool rel_is_pk);
static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
-static RI_Plan *ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+static RI_Plan *ri_PlanCheck(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
@@ -294,6 +312,15 @@ static int ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_r
Snapshot crosscheck_snapshot,
int limit, CmdType *last_stmt_cmdtype);
static void ri_SqlStringPlanFree(RI_Plan *plan);
+static void ri_LookupKeyInPkRelPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes);
+static int ri_LookupKeyInPkRel(struct RI_Plan *plan,
+ Relation fk_rel, Relation pk_rel,
+ Datum *pk_vals, char *pk_nulls,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+static bool ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan);
+static void ri_LookupKeyInPkRelPlanFree(RI_Plan *plan);
/*
@@ -389,9 +416,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_LookupKeyInPkRel() to only include non-null
+ * columns)
*/
break;
#endif
@@ -411,24 +438,15 @@ RI_FKey_check(TriggerData *trigdata)
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
{
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
/* ----------
- * The query string built is
+ * For simple FKs, use ri_LookupKeyInPkRelPlanCreate() to create
+ * the plan to check the row, which is equivalent to doing
* SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
* FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
*
- * But for temporal FKs we need to make sure
- * the FK's range is completely covered.
- * So we use this query instead:
+ * But for temporal FKs we use ri_SqlStringPlanCreate() because we need
+ * to make sure the FK's range is completely covered, which is done
+ * with this query instead:
* SELECT 1
* FROM (
* SELECT pkperiodatt AS r
@@ -442,45 +460,45 @@ RI_FKey_check(TriggerData *trigdata)
* we can make this a bit simpler.
* ----------
*/
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
if (riinfo->hasperiod)
{
+ StringInfoData querybuf;
+ char pkrelname[MAX_QUOTED_REL_NAME_LEN];
+ char attname[MAX_QUOTED_NAME_LEN];
+ char paramname[16];
+ const char *querysep;
+ Oid queryoids[RI_MAX_NUMKEYS];
+ const char *pk_only;
+ Oid fk_type;
+
+ initStringInfo(&querybuf);
+ pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
+ "" : "ONLY ";
+ quoteRelationName(pkrelname, pk_rel);
quoteOneName(attname,
RIAttName(pk_rel, riinfo->pk_attnums[riinfo->nkeys - 1]));
-
appendStringInfo(&querybuf,
"SELECT 1 FROM (SELECT %s AS r FROM %s%s x",
attname, pk_only, pkrelname);
- }
- else
- {
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- }
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- if (riinfo->hasperiod)
- {
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[riinfo->nkeys - 1]);
+ querysep = "WHERE";
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
+
+ fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ quoteOneName(attname,
+ RIAttName(pk_rel, riinfo->pk_attnums[i]));
+ sprintf(paramname, "$%d", i + 1);
+ ri_GenerateQual(&querybuf, querysep,
+ attname, pk_type,
+ riinfo->pf_eq_oprs[i],
+ paramname, fk_type);
+ querysep = "AND";
+ queryoids[i] = fk_type;
+ }
+ appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
+ fk_type = RIAttType(fk_rel, riinfo->fk_attnums[riinfo->nkeys - 1]);
appendStringInfo(&querybuf, ") x1 HAVING ");
sprintf(paramname, "$%d", riinfo->nkeys);
ri_GenerateQual(&querybuf, "",
@@ -488,26 +506,24 @@ RI_FKey_check(TriggerData *trigdata)
riinfo->agged_period_contained_by_oper,
"pg_catalog.range_agg", ANYMULTIRANGEOID);
appendStringInfo(&querybuf, "(x1.r)");
- }
- /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
- querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
+ &qkey, fk_rel, pk_rel);
+ }
+ else
+ qplan = ri_PlanCheck(riinfo, ri_LookupKeyInPkRelPlanCreate,
+ NULL, 0 /* nargs */, NULL /* argtypes */,
+ &qkey, fk_rel, pk_rel);
}
- /*
- * Now check that foreign key exists in PK table
- *
- * XXX detectNewRows must be true when a partitioned table is on the
- * referenced side. The reason is that our snapshot must be fresh in
- * order for the hack in find_inheritance_children() to work.
- */
+ /* Now check that foreign key exists in PK table */
ri_PerformCheck(riinfo, &qkey, qplan,
fk_rel, pk_rel,
NULL, newslot,
false,
- pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
+ false,
CMD_SELECT);
table_close(pk_rel, RowShareLock);
@@ -578,24 +594,15 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
{
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
/* ----------
- * The query string built is
+ * For simple FKs, use ri_LookupKeyInPkRelPlanCreate() to create
+ * the plan to check the row, which is equivalent to doing
* SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
* FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
*
- * But for temporal FKs we need to make sure
- * the old PK's range is completely covered.
- * So we use this query instead:
+ * But for temporal FKs we use ri_SqlStringPlanCreate() because we need
+ * to make sure the FK's range is completely covered, which is done
+ * with this query instead:
* SELECT 1
* FROM (
* SELECT pkperiodatt AS r
@@ -609,43 +616,44 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
* we can make this a bit simpler.
* ----------
*/
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
if (riinfo->hasperiod)
{
+ StringInfoData querybuf;
+ char pkrelname[MAX_QUOTED_REL_NAME_LEN];
+ char attname[MAX_QUOTED_NAME_LEN];
+ char paramname[16];
+ const char *querysep;
+ const char *pk_only;
+ Oid queryoids[RI_MAX_NUMKEYS];
+ Oid fk_type;
+
+ initStringInfo(&querybuf);
+ pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
+ "" : "ONLY ";
+ quoteRelationName(pkrelname, pk_rel);
quoteOneName(attname, RIAttName(pk_rel, riinfo->pk_attnums[riinfo->nkeys - 1]));
appendStringInfo(&querybuf,
"SELECT 1 FROM (SELECT %s AS r FROM %s%s x",
attname, pk_only, pkrelname);
- }
- else
- {
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- }
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- if (riinfo->hasperiod)
- {
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[riinfo->nkeys - 1]);
+ querysep = "WHERE";
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
+
+ quoteOneName(attname,
+ RIAttName(pk_rel, riinfo->pk_attnums[i]));
+ sprintf(paramname, "$%d", i + 1);
+ ri_GenerateQual(&querybuf, querysep,
+ attname, pk_type,
+ riinfo->pp_eq_oprs[i],
+ paramname, pk_type);
+ querysep = "AND";
+ queryoids[i] = pk_type;
+ }
+ appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
+ fk_type = RIAttType(fk_rel, riinfo->fk_attnums[riinfo->nkeys - 1]);
appendStringInfo(&querybuf, ") x1 HAVING ");
sprintf(paramname, "$%d", riinfo->nkeys);
ri_GenerateQual(&querybuf, "",
@@ -653,12 +661,15 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
riinfo->agged_period_contained_by_oper,
"pg_catalog.range_agg", ANYMULTIRANGEOID);
appendStringInfo(&querybuf, "(x1.r)");
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
+ &qkey, fk_rel, pk_rel);
}
-
- /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
- querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
+ else
+ qplan = ri_PlanCheck(riinfo, ri_LookupKeyInPkRelPlanCreate,
+ NULL, 0 /* nargs */, NULL /* argtypes */,
+ &qkey, fk_rel, pk_rel);
}
/*
@@ -840,7 +851,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -937,7 +948,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
}
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1051,7 +1062,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys * 2, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1275,7 +1286,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
}
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -2090,6 +2101,11 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* saving lots of work and memory when there are many partitions with
* similar FK constraints.
*
+ * We must not share the plan for RI_PLAN_CHECK_LOOKUPPK queries either,
+ * because its execution function (ri_LookupKeyInPkRel()) expects to see
+ * the RI_ConstraintInfo of the individual leaf partitions that the
+ * query fired on.
+ *
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
* resulting in different pk_attnums[] or fk_attnums[] array contents.)
@@ -2097,7 +2113,8 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* We assume struct RI_QueryKey contains no padding bytes, else we'd need
* to use memset to clear them.
*/
- if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
+ if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
+ constr_queryno != RI_PLAN_CHECK_LOOKUPPK)
key->constr_id = riinfo->constraint_root_id;
else
key->constr_id = riinfo->constraint_id;
@@ -2375,10 +2392,17 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
}
}
+typedef enum RI_Plantype
+{
+ RI_PLAN_SQL = 0,
+ RI_PLAN_CHECK_FUNCTION
+} RI_Plantype;
+
/* Query string or an equivalent name to show in the error CONTEXT. */
typedef struct RIErrorCallbackArg
{
const char *query;
+ RI_Plantype plantype;
} RIErrorCallbackArg;
/*
@@ -2408,7 +2432,17 @@ _RI_error_callback(void *arg)
internalerrquery(query);
}
else
- errcontext("SQL statement \"%s\"", query);
+ {
+ switch (carg->plantype)
+ {
+ case RI_PLAN_SQL:
+ errcontext("SQL statement \"%s\"", query);
+ break;
+ case RI_PLAN_CHECK_FUNCTION:
+ errcontext("RI check function \"%s\"", query);
+ break;
+ }
+ }
}
/*
@@ -2644,14 +2678,387 @@ ri_SqlStringPlanFree(RI_Plan *plan)
}
}
+/*
+ * Creates an RI_Plan to look a key up in the PK table.
+ *
+ * Not much to do beside initializing the expected callback members, because
+ * there is no query string to parse and plan.
+ */
+static void
+ri_LookupKeyInPkRelPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ Assert(querystr == NULL);
+ plan->plan_exec_func = ri_LookupKeyInPkRel;
+ plan->plan_exec_arg = NULL;
+ plan->plan_is_valid_func = ri_LookupKeyInPkRelPlanIsValid;
+ plan->plan_free_func = ri_LookupKeyInPkRelPlanFree;
+}
+
+/*
+ * get_fkey_unique_index
+ * Returns the unique index used by the given foreign key constraint
+ */
+static Oid
+get_fkey_unique_index(Oid conoid)
+{
+ Oid result = InvalidOid;
+ HeapTuple tp;
+
+ tp = SearchSysCache1(CONSTROID, ObjectIdGetDatum(conoid));
+ if (HeapTupleIsValid(tp))
+ {
+ Form_pg_constraint contup = (Form_pg_constraint) GETSTRUCT(tp);
+
+ if (contup->contype == CONSTRAINT_FOREIGN)
+ result = contup->conindid;
+ ReleaseSysCache(tp);
+ }
+
+ if (!OidIsValid(result))
+ elog(ERROR, "unique index not found for foreign key constraint %u",
+ conoid);
+
+ return result;
+}
+
+/*
+ * ri_CheckPermissions
+ * Check that the new user has permissions to look into the schema of
+ * and SELECT from 'query_rel'
+ *
+ * Provided for non-SQL implementors of an RI_Plan.
+ */
+static void
+ri_CheckPermissions(Relation query_rel)
+{
+ AclResult aclresult;
+
+ /* USAGE on schema. */
+ aclresult = object_aclcheck(NamespaceRelationId,
+ RelationGetNamespace(query_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(query_rel)));
+
+ /* SELECT on relation. */
+ aclresult = pg_class_aclcheck(RelationGetRelid(query_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(query_rel));
+}
+
+/*
+ * This checks that the index key of the tuple specified in 'new_slot' matches
+ * the key that has already been found in the PK index relation 'idxrel'.
+ *
+ * Returns true if the index key of the tuple matches the existing index
+ * key, false otherwise.
+ */
+static bool
+recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+ TupleTableSlot *new_slot)
+{
+ IndexInfo *indexInfo = BuildIndexInfo(idxrel);
+ Datum values[INDEX_MAX_KEYS];
+ bool isnull[INDEX_MAX_KEYS];
+ bool matched = true;
+
+ /* PK indexes never have these. */
+ Assert(indexInfo->ii_Expressions == NIL &&
+ indexInfo->ii_ExclusionOps == NULL);
+
+ /* Form the index values and isnull flags given the table tuple. */
+ FormIndexDatum(indexInfo, new_slot, NULL, values, isnull);
+ for (int i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
+ {
+ ScanKeyData *skey = &skeys[i];
+
+ /* A PK column can never be set to NULL. */
+ Assert(!isnull[i]);
+ if (!DatumGetBool(FunctionCall2Coll(&skey->sk_func,
+ skey->sk_collation,
+ skey->sk_argument,
+ values[i])))
+ {
+ matched = false;
+ break;
+ }
+ }
+
+ return matched;
+}
+
+/*
+ * Checks whether a tuple containing the unique key given by pk_vals,
+ * pk_nulls exists in 'pk_rel'. The key is looked up using the constraint's
+ * index given in plan->riinfo.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation (fk_rel) or the one being deleted from the referenced relation
+ * (pk_rel).
+ */
+static int
+ri_LookupKeyInPkRel(struct RI_Plan *plan,
+ Relation fk_rel, Relation pk_rel,
+ Datum *pk_vals, char *pk_nulls,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ const RI_ConstraintInfo *riinfo = plan->riinfo;
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ int tuples_processed = 0;
+ const Oid *eq_oprs;
+ Datum pk_values[INDEX_MAX_KEYS];
+ bool pk_isnulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ /* We're effectively doing a CMD_SELECT below. */
+ *last_stmt_cmdtype = CMD_SELECT;
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = pstrdup("ri_LookupKeyInPkRel");
+ ricallbackarg.plantype = RI_PLAN_CHECK_FUNCTION;
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /* XXX Maybe afterTriggerInvokeEvents() / AfterTriggerExecute() should? */
+ CHECK_FOR_INTERRUPTS();
+
+ ri_CheckPermissions(pk_rel);
+
+ /*
+ * Choose the equality operators to use when scanning the PK index below.
+ *
+ * May need to cast the foreign key value (of the FK column's type) to
+ * the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ if (plan->constr_queryno == RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
+ {
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ pk_isnulls[i] = false;
+ pk_values[i] = pk_vals[i];
+ }
+ else
+ {
+ Assert(false);
+ }
+ }
+ }
+ else
+ {
+ Assert(plan->constr_queryno == RI_PLAN_CHECK_LOOKUPPK);
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ pk_isnulls[i] = false;
+ pk_values[i] = pk_vals[i];
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ {
+ pk_values[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ else
+ {
+ Assert(false);
+ }
+ }
+ }
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * If the target table is partitioned, we must look up the leaf partition
+ * and its corresponding unique index to search the keys in.
+ */
+ idxoid = get_fkey_unique_index(constr_id);
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+ PartitionDirectory partdir;
+
+ /*
+ * Pass the latest snapshot for omit_detached_snapshot so that any
+ * detach-pending partitions are correctly omitted from or included in
+ * this lookup. The PartitionDesc machinery that runs as part of this
+ * uses the snapshot to decide whether to omit or include a
+ * detach-pending partition, based on whether the pg_inherits row
+ * that marks it as detach-pending is visible to the snapshot or not,
+ * respectively.
+ */
+ partdir = CreatePartitionDirectory(CurrentMemoryContext,
+ GetLatestSnapshot());
+ leaf_pk_rel = ExecGetLeafPartitionForKey(partdir,
+ pk_rel, riinfo->nkeys,
+ riinfo->pk_attnums,
+ pk_values, pk_isnulls,
+ idxoid, RowShareLock,
+ &leaf_idxoid);
+
+ /*
+ * XXX - Would be nice if this could be saved across calls. Problem
+ * with just putting it in RI_Plan.plan_exec_arg is that the RI_Plan
+ * is cached for the session duration, whereas the PartitionDirectory
+ * can't last past the transaction.
+ */
+ DestroyPartitionDirectory(partdir);
+
+ /*
+ * If no suitable leaf partition exists, neither can the key we're
+ * looking for.
+ */
+ if (leaf_pk_rel == NULL)
+ goto done;
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+ idxrel = index_open(idxoid, RowShareLock);
+
+ /*
+ * Set up ScanKeys for the index scan. This is essentially how
+ * ExecIndexBuildScanKeys() sets them up.
+ */
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid lefttype,
+ righttype;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idxrel->rd_opfamily[i];
+ int strat;
+ RegProcedure regop = get_opcode(operator);
+
+ Assert(!pk_isnulls[i]);
+ get_op_opfamily_properties(operator, opfamily, false, &strat,
+ &lefttype, &righttype);
+ ScanKeyEntryInitialize(&skey[i], 0, pkattno, strat, righttype,
+ idxrel->rd_indcollation[i], regop,
+ pk_values[i]);
+ }
+
+ Assert(ActiveSnapshotSet());
+ scan = index_beginscan(pk_rel, idxrel, GetActiveSnapshot(), num_pk, 0);
+
+ /* Install the ScanKeys. */
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Look for the tuple, and if found, try to lock it in key share mode. */
+ outslot = table_slot_create(pk_rel, NULL);
+ while (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ {
+ bool tuple_concurrently_updated;
+
+ /*
+ * If we fail to lock the tuple for whatever reason, assume it doesn't
+ * exist. If the locked tuple was found to have been updated
+ * concurrently, recheck below that its key still matches.
+ */
+ if (ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ GetActiveSnapshot(),
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock,
+ &tuple_concurrently_updated))
+ {
+ bool matched = true;
+
+ /*
+ * If the matched table tuple has been updated, check if the key is
+ * still the same.
+ *
+ * This emulates EvalPlanQual() in the executor.
+ */
+ if (tuple_concurrently_updated &&
+ !recheck_matched_pk_tuple(idxrel, skey, outslot))
+ matched = false;
+
+ if (matched)
+ tuples_processed = 1;
+ }
+
+ break;
+ }
+
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+done:
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+
+ return tuples_processed;
+}
+
+static bool
+ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan)
+{
+ /* Never store anything that can be invalidated. */
+ return true;
+}
+
+static void
+ri_LookupKeyInPkRelPlanFree(RI_Plan *plan)
+{
+ /* Nothing to free. */
+}
+
/*
* Create an RI_Plan for a given RI check query and initialize the
* plan callbacks and execution argument using the caller specified
* function.
*/
static RI_Plan *
-ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
- const char *querystr, int nargs, Oid *paramtypes)
+ri_PlanCreate(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *paramtypes,
+ int constr_queryno)
{
RI_Plan *plan;
MemoryContext plancxt,
@@ -2666,6 +3073,8 @@ ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
ALLOCSET_SMALL_SIZES);
oldcxt = MemoryContextSwitchTo(plancxt);
plan = (RI_Plan *) palloc0(sizeof(*plan));
+ plan->riinfo = riinfo;
+ plan->constr_queryno = constr_queryno;
plan->plancxt = plancxt;
plan->nargs = nargs;
if (plan->nargs > 0)
@@ -2730,7 +3139,8 @@ ri_FreePlan(RI_Plan *plan)
* Prepare execution plan for a query to enforce an RI restriction
*/
static RI_Plan *
-ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ri_PlanCheck(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
@@ -2754,7 +3164,8 @@ ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
/* Create the plan */
- qplan = ri_PlanCreate(plan_create_func, querystr, nargs, argtypes);
+ qplan = ri_PlanCreate(riinfo, plan_create_func, querystr, nargs,
+ argtypes, qkey->constr_queryno);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
@@ -3399,7 +3810,10 @@ ri_CompareWithCast(Oid eq_opr, Oid typeid, Oid collid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -3455,8 +3869,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the values that will be passed to the
+ * operator will be of expected operand type(s). The operator can be
+ * cross-type (such as when called by ri_LookupKeyInPkRel()), in which
+ * case, we only need the cast if the right operand value doesn't match
+ * the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index c09bc83b2a..e285427b48 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -31,6 +31,13 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
EState *estate);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
+extern Relation ExecGetLeafPartitionForKey(PartitionDirectory partdir,
+ Relation root_rel,
+ int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, bool *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid);
/*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index c8e6befca8..fbf6b6c2c5 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -262,6 +262,15 @@ extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in execLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *tuple_concurrently_updated);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
diff --git a/src/test/isolation/expected/fk-concurrent-pk-upd.out b/src/test/isolation/expected/fk-concurrent-pk-upd.out
new file mode 100644
index 0000000000..9bbec638ac
--- /dev/null
+++ b/src/test/isolation/expected/fk-concurrent-pk-upd.out
@@ -0,0 +1,58 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s2ukey s1i s2c s1c s2s s1s
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2c: COMMIT;
+step s1i: <... completed>
+ERROR: insert or update on table "child" violates foreign key constraint "child_parent_key_fkey"
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 2|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+(0 rows)
+
+
+starting permutation: s2uaux s1i s2c s1c s2s s1s
+step s2uaux: UPDATE parent SET aux = 'bar' WHERE parent_key = 1;
+step s1i: INSERT INTO child VALUES (1, 1);
+step s2c: COMMIT;
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|bar
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
+
+starting permutation: s2ukey s1i s2ukey2 s2c s1c s2s s1s
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2ukey2: UPDATE parent SET parent_key = 1 WHERE parent_key = 2;
+step s2c: COMMIT;
+step s1i: <... completed>
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
index bdd26bac6c..c4a35b69bb 100644
--- a/src/test/isolation/expected/fk-snapshot.out
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -47,12 +47,12 @@ a
step s2ifn2: INSERT INTO fk_noparted VALUES (2);
step s2c: COMMIT;
+ERROR: insert or update on table "fk_noparted" violates foreign key constraint "fk_noparted_a_fkey"
step s2sfn: SELECT * FROM fk_noparted;
a
-
1
-2
-(2 rows)
+(1 row)
starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 143109aa4d..106deb3034 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -34,6 +34,7 @@ test: fk-deadlock2
test: fk-partitioned-1
test: fk-partitioned-2
test: fk-snapshot
+test: fk-concurrent-pk-upd
test: subxid-overflow
test: eval-plan-qual
test: eval-plan-qual-trigger
diff --git a/src/test/isolation/specs/fk-concurrent-pk-upd.spec b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
new file mode 100644
index 0000000000..4bdd92cd2d
--- /dev/null
+++ b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
@@ -0,0 +1,42 @@
+# Tests that an INSERT on referencing table correctly fails when
+# the referenced value disappears due to a concurrent update
+setup
+{
+ CREATE TABLE parent (
+ parent_key int PRIMARY KEY,
+ aux text NOT NULL
+ );
+
+ CREATE TABLE child (
+ child_key int PRIMARY KEY,
+ parent_key int NOT NULL REFERENCES parent
+ );
+
+ INSERT INTO parent VALUES (1, 'foo');
+}
+
+teardown
+{
+ DROP TABLE parent, child;
+}
+
+session s1
+setup { BEGIN; }
+step s1i { INSERT INTO child VALUES (1, 1); }
+step s1c { COMMIT; }
+step s1s { SELECT * FROM child; }
+
+session s2
+setup { BEGIN; }
+step s2ukey { UPDATE parent SET parent_key = 2 WHERE parent_key = 1; }
+step s2uaux { UPDATE parent SET aux = 'bar' WHERE parent_key = 1; }
+step s2ukey2 { UPDATE parent SET parent_key = 1 WHERE parent_key = 2; }
+step s2c { COMMIT; }
+step s2s { SELECT * FROM parent; }
+
+# fail
+permutation s2ukey s1i s2c s1c s2s s1s
+# ok
+permutation s2uaux s1i s2c s1c s2s s1s
+# ok
+permutation s2ukey s1i s2ukey2 s2c s1c s2s s1s
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
index 9fad57e768..ec5fe0c50c 100644
--- a/src/test/isolation/specs/fk-snapshot.spec
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -53,10 +53,7 @@ step s2sfn { SELECT * FROM fk_noparted; }
# inserting into referencing tables in transaction-snapshot mode
# PK table is non-partitioned
permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
-# PK table is partitioned: buggy, because s2's serialization transaction can
-# see the uncommitted row thanks to the latest snapshot taken for
-# partition lookup to work correctly also ends up getting used by the PK index
-# scan
+# PK table is partitioned
permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
# inserting into referencing tables in up-to-date snapshot mode
--
2.43.0
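
The cast decision that the patch above changes in ri_HashCompareOp() can be
restated as a small standalone predicate. The following is only an
illustrative sketch, not code from the patch: needs_cast(), its value_type
parameter, and the bare Oid typedef are hypothetical stand-ins chosen for this
example, and the type OIDs mentioned afterwards are just sample values.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int Oid;	/* stand-in for PostgreSQL's Oid (assumption) */

/*
 * Hypothetical helper mirroring the condition added to ri_HashCompareOp():
 * returns true when a value of type 'value_type' has to be cast before it is
 * passed as the right operand of an operator taking (lefttype, righttype).
 */
bool
needs_cast(Oid lefttype, Oid righttype, Oid value_type)
{
	bool		no_cast;

	/* Same shape as the condition in the patch. */
	no_cast = (lefttype == righttype && value_type == lefttype) ||
		(lefttype != righttype && value_type == righttype);

	/* Both arms amount to comparing against the right operand type. */
	assert(no_cast == (value_type == righttype));

	return !no_cast;
}
```

With a same-type operator such as (23, 23), a value of type 21 would need a
cast, whereas with a cross-type operator such as (20, 23) a value of type 23
on the right-hand side would not. The internal assert only documents that the
two-armed condition is equivalent to a single comparison with righttype.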
[application/x-patch] v1-0001-Explicitly-pass-snapshot-necessary-for-omit_detac.patch (17.3K, 3-v1-0001-Explicitly-pass-snapshot-necessary-for-omit_detac.patch)
From f6fb0f4b9bf68e58d8d857721bb22ded5130149a Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 19 Dec 2024 21:20:11 +0900
Subject: [PATCH v1 1/3] Explicitly pass snapshot necessary for omit_detached
logic
This commit changes find_inheritance_children_extended() and
RelationBuildPartitionDesc() to accept the snapshot necessary
to implement the omit_detached logic correctly.
Previously, these functions used ActiveSnapshot to check if a
detach-pending partition's pg_inherits row was visible. This
logic aimed to make RI queries over partitioned PK tables under
REPEATABLE READ isolation handle detach-pending partitions
correctly. However, forcing a snapshot onto ActiveSnapshot led
to isolation violations by making scans in the query see changes
not consistent with the parent transaction's snapshot. A test
added in commit 00cb86e75d demonstrates this issue.
The new interface of RelationBuildPartitionDesc() and relevant
callers allows passing the necessary snapshot explicitly, thus
avoiding modifications to ActiveSnapshot. Default behavior remains
unchanged when no snapshot is provided, maintaining compatibility
with non-RI queries and other uses of
find_inheritance_children_extended().
A future commit will update RI PK lookups to use this interface.
Robert Haas contributed the changes to PartitionDesc interface.
Co-author: Robert Haas
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA+HiwqGkfJfYdeq5vHPh6eqPKjSbfpDDY+j-kXYFePQedtSLeg@mail.gmail.com
Discussion: https://postgr.es/m/CA+HiwqG5e8pk8s7+7zhr1Nc_PGyhEdM5f=pHkMOdK1RYWXfJsg@mail.gmail.com
---
src/backend/catalog/pg_inherits.c | 33 +++++-----
src/backend/executor/execPartition.c | 23 +++++--
src/backend/optimizer/util/plancat.c | 6 +-
src/backend/partitioning/partdesc.c | 94 +++++++++++++++++-----------
src/include/catalog/pg_inherits.h | 7 ++-
src/include/partitioning/partdesc.h | 6 +-
6 files changed, 109 insertions(+), 60 deletions(-)
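
As an illustration of the rule described in the commit message, the sketch
below models when a detach-pending partition would be omitted for a given
omit_detached_snapshot. It is a simplified toy, not PostgreSQL code:
ToySnapshot, toy_xid_in_progress() and toy_omit_detach_pending() are
hypothetical names, and the model deliberately ignores XID wraparound and
subtransactions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int ToyXid;	/* toy transaction id (assumption) */

/* Toy MVCC snapshot: which transactions still look in progress. */
typedef struct ToySnapshot
{
	ToyXid		xmin;		/* every xid below this has finished */
	ToyXid		xmax;		/* every xid at or above this is in progress */
	const ToyXid *xip;		/* xids in [xmin, xmax) still in progress */
	size_t		xcnt;		/* number of entries in xip */
} ToySnapshot;

/* Toy analogue of XidInMVCCSnapshot(): true if xid looks in progress. */
bool
toy_xid_in_progress(ToyXid xid, const ToySnapshot *snap)
{
	assert(snap->xmin <= snap->xmax);

	if (xid < snap->xmin)
		return false;
	if (xid >= snap->xmax)
		return true;
	for (size_t i = 0; i < snap->xcnt; i++)
	{
		if (snap->xip[i] == xid)
			return true;
	}
	return false;
}

/*
 * A detach-pending partition is omitted only when a snapshot was passed
 * and the xmin of its pg_inherits row does not look in progress to it.
 * A NULL snapshot means the caller wants such partitions included.
 */
bool
toy_omit_detach_pending(ToyXid inherits_xmin,
						const ToySnapshot *omit_detached_snapshot)
{
	if (omit_detached_snapshot == NULL)
		return false;
	return !toy_xid_in_progress(inherits_xmin, omit_detached_snapshot);
}
```

In this model, passing NULL always keeps the detach-pending partition in the
result, which corresponds to the default behavior the commit message says is
retained for callers that provide no snapshot.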
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 836b4bfd89..8917a65690 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -51,14 +51,18 @@ typedef struct SeenRelsEntry
* then no locks are acquired, but caller must beware of race conditions
* against possible DROPs of child relations.
*
- * Partitions marked as being detached are omitted; see
+ * A partition marked as being detached is omitted from the result if the
+ * pg_inherits row showing the partition as being detached is visible to
+ * ActiveSnapshot, provided one has been pushed; see
* find_inheritance_children_extended for details.
*/
List *
find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
{
- return find_inheritance_children_extended(parentrelId, true, lockmode,
- NULL, NULL);
+ return find_inheritance_children_extended(parentrelId,
+ ActiveSnapshotSet() ?
+ GetActiveSnapshot() : NULL,
+ lockmode, NULL, NULL);
}
/*
@@ -70,16 +74,17 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
* If a partition's pg_inherits row is marked "detach pending",
* *detached_exist (if not null) is set true.
*
- * If omit_detached is true and there is an active snapshot (not the same as
- * the catalog snapshot used to scan pg_inherits!) and a pg_inherits tuple
- * marked "detach pending" is visible to that snapshot, then that partition is
- * omitted from the output list. This makes partitions invisible depending on
- * whether the transaction that marked those partitions as detached appears
- * committed to the active snapshot. In addition, *detached_xmin (if not null)
- * is set to the xmin of the row of the detached partition.
+ * If the caller passed 'omit_detached_snapshot', the partition whose
+ * pg_inherits tuple marks it as "detach pending" is omitted from the output
+ * list if the tuple is visible to that snapshot. That is, such a partition
+ * is omitted from the output list depending on whether the transaction that
+ * marked that partition as detached appears committed to
+ * omit_detached_snapshot. If omitted, *detached_xmin (if not null) is set
+ * to the xmin of that pg_inherits tuple.
*/
List *
-find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
+find_inheritance_children_extended(Oid parentrelId,
+ Snapshot omit_detached_snapshot,
LOCKMODE lockmode, bool *detached_exist,
TransactionId *detached_xmin)
{
@@ -140,15 +145,13 @@ find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
if (detached_exist)
*detached_exist = true;
- if (omit_detached && ActiveSnapshotSet())
+ if (omit_detached_snapshot)
{
TransactionId xmin;
- Snapshot snap;
xmin = HeapTupleHeaderGetXmin(inheritsTuple->t_data);
- snap = GetActiveSnapshot();
- if (!XidInMVCCSnapshot(xmin, snap))
+ if (!XidInMVCCSnapshot(xmin, omit_detached_snapshot))
{
if (detached_xmin)
{
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7651886229..d26cf20003 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -31,6 +31,7 @@
#include "utils/partcache.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
+#include "utils/snapmgr.h"
/*-----------------------
@@ -1101,17 +1102,24 @@ ExecInitPartitionDispatchInfo(EState *estate,
MemoryContext oldcxt;
/*
- * For data modification, it is better that executor does not include
- * partitions being detached, except when running in snapshot-isolation
- * mode. This means that a read-committed transaction immediately gets a
+ * For data modification, it is better that executor omits the partitions
+ * being detached, except when running in snapshot-isolation mode. This
+ * means that a read-committed transaction immediately gets a
* "no partition for tuple" error when a tuple is inserted into a
* partition that's being detached concurrently, but a transaction in
* repeatable-read mode can still use such a partition.
*/
if (estate->es_partition_directory == NULL)
+ {
+ Snapshot omit_detached_snapshot = NULL;
+
+ Assert(ActiveSnapshotSet());
+ if (!IsolationUsesXactSnapshot())
+ omit_detached_snapshot = GetActiveSnapshot();
estate->es_partition_directory =
CreatePartitionDirectory(estate->es_query_cxt,
- !IsolationUsesXactSnapshot());
+ omit_detached_snapshot);
+ }
oldcxt = MemoryContextSwitchTo(proute->memcxt);
@@ -1871,10 +1879,13 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
int i;
ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always includes detached partitions */
+ /*
+ * For data reading, executor always includes detached partitions,
+ * so pass NULL for omit_detached_snapshot.
+ */
if (estate->es_partition_directory == NULL)
estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ CreatePartitionDirectory(estate->es_query_cxt, NULL);
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 153390f2dc..ee146db082 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -2378,11 +2378,15 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
/*
* Create the PartitionDirectory infrastructure if we didn't already.
+ * Note that the planner always omits the partitions being detached
+ * concurrently.
*/
if (root->glob->partition_directory == NULL)
{
+ Assert(ActiveSnapshotSet());
root->glob->partition_directory =
- CreatePartitionDirectory(CurrentMemoryContext, true);
+ CreatePartitionDirectory(CurrentMemoryContext,
+ GetActiveSnapshot());
}
partdesc = PartitionDirectoryLookup(root->glob->partition_directory,
diff --git a/src/backend/partitioning/partdesc.c b/src/backend/partitioning/partdesc.c
index b4e0ed0e71..a80bbe7378 100644
--- a/src/backend/partitioning/partdesc.c
+++ b/src/backend/partitioning/partdesc.c
@@ -36,7 +36,7 @@ typedef struct PartitionDirectoryData
{
MemoryContext pdir_mcxt;
HTAB *pdir_hash;
- bool omit_detached;
+ Snapshot omit_detached_snapshot;
} PartitionDirectoryData;
typedef struct PartitionDirectoryEntry
@@ -47,17 +47,23 @@ typedef struct PartitionDirectoryEntry
} PartitionDirectoryEntry;
static PartitionDesc RelationBuildPartitionDesc(Relation rel,
- bool omit_detached);
+ Snapshot omit_detached_snapshot);
/*
- * RelationGetPartitionDesc -- get partition descriptor, if relation is partitioned
+ * RelationGetPartitionDescExt
+ * Get partition descriptor of a partitioned table, building one and
+ * caching it for later use if not already or if the cached one would
+ * not be suitable for a given request
*
* We keep two partdescs in relcache: rd_partdesc includes all partitions
- * (even those being concurrently marked detached), while rd_partdesc_nodetached
- * omits (some of) those. We store the pg_inherits.xmin value for the latter,
- * to determine whether it can be validly reused in each case, since that
- * depends on the active snapshot.
+ * (even the one being concurrently marked detached), while
+ * rd_partdesc_nodetached omits the detach-pending partition. If the latter one
+ * is present, rd_partdesc_nodetached_xmin would have been set to the xmin of
+ * the detach-pending partition's pg_inherits row, which is used to determine
+ * whether rd_partdesc_nodetached can be validly reused for a given request by
+ * checking if the xmin appears visible to 'omit_detached_snapshot' passed by
+ * the caller.
*
* Note: we arrange for partition descriptors to not get freed until the
* relcache entry's refcount goes to zero (see hacks in RelationClose,
@@ -68,7 +74,7 @@ static PartitionDesc RelationBuildPartitionDesc(Relation rel,
* that the data doesn't become stale.
*/
PartitionDesc
-RelationGetPartitionDesc(Relation rel, bool omit_detached)
+RelationGetPartitionDescExt(Relation rel, Snapshot omit_detached_snapshot)
{
Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
@@ -77,36 +83,51 @@ RelationGetPartitionDesc(Relation rel, bool omit_detached)
* do so when we are asked to include all partitions including detached;
* and also when we know that there are no detached partitions.
*
- * If there is no active snapshot, detached partitions aren't omitted
- * either, so we can use the cached descriptor too in that case.
+ * omit_detached_snapshot being NULL means that the caller doesn't care
+ * that the returned partition descriptor may contain detached partitions,
+ * so we can use the cached descriptor in that case too.
*/
if (likely(rel->rd_partdesc &&
- (!rel->rd_partdesc->detached_exist || !omit_detached ||
- !ActiveSnapshotSet())))
+ (!rel->rd_partdesc->detached_exist ||
+ omit_detached_snapshot == NULL)))
return rel->rd_partdesc;
/*
- * If we're asked to omit detached partitions, we may be able to use a
- * cached descriptor too. We determine that based on the pg_inherits.xmin
- * that was saved alongside that descriptor: if the xmin that was not in
- * progress for that active snapshot is also not in progress for the
- * current active snapshot, then we can use it. Otherwise build one from
- * scratch.
+ * If we're asked to omit the detached partition, we may be able to use
+ * the other cached descriptor, which has been made to omit the detached
+ * partition. Whether that descriptor can be reused in this case is
+ * determined based on cross-checking the visibility of
+ * rd_partdesc_nodetached_xmin, that is, the pg_inherits.xmin of the
+ * pg_inherits row of the detached partition: if the xmin, which was not
+ * in progress for the snapshot passed when rd_partdesc_nodetached was
+ * built, is also not in progress for the given omit_detached_snapshot,
+ * then we can reuse it. Otherwise we must build one from scratch.
*/
- if (omit_detached &&
- rel->rd_partdesc_nodetached &&
- ActiveSnapshotSet())
+ if (rel->rd_partdesc_nodetached && omit_detached_snapshot)
{
- Snapshot activesnap;
-
Assert(TransactionIdIsValid(rel->rd_partdesc_nodetached_xmin));
- activesnap = GetActiveSnapshot();
- if (!XidInMVCCSnapshot(rel->rd_partdesc_nodetached_xmin, activesnap))
+ if (!XidInMVCCSnapshot(rel->rd_partdesc_nodetached_xmin,
+ omit_detached_snapshot))
return rel->rd_partdesc_nodetached;
}
- return RelationBuildPartitionDesc(rel, omit_detached);
+ return RelationBuildPartitionDesc(rel, omit_detached_snapshot);
+}
+
+/*
+ * RelationGetPartitionDesc
+ * Like RelationGetPartitionDescExt() but for callers that are fine with
+ * ActiveSnapshot being used as omit_detached_snapshot
+ */
+PartitionDesc
+RelationGetPartitionDesc(Relation rel, bool omit_detached)
+{
+ Snapshot snapshot = NULL;
+
+ if (omit_detached && ActiveSnapshotSet())
+ snapshot = GetActiveSnapshot();
+ return RelationGetPartitionDescExt(rel, snapshot);
}
/*
@@ -131,7 +152,8 @@ RelationGetPartitionDesc(Relation rel, bool omit_detached)
* for them.
*/
static PartitionDesc
-RelationBuildPartitionDesc(Relation rel, bool omit_detached)
+RelationBuildPartitionDesc(Relation rel,
+ Snapshot omit_detached_snapshot)
{
PartitionDesc partdesc;
PartitionBoundInfo boundinfo = NULL;
@@ -162,7 +184,8 @@ retry:
detached_exist = false;
detached_xmin = InvalidTransactionId;
inhoids = find_inheritance_children_extended(RelationGetRelid(rel),
- omit_detached, NoLock,
+ omit_detached_snapshot,
+ NoLock,
&detached_exist,
&detached_xmin);
@@ -362,11 +385,11 @@ retry:
*
* Note that if a partition was found by the catalog's scan to have been
* detached, but the pg_inherit tuple saying so was not visible to the
- * active snapshot (find_inheritance_children_extended will not have set
- * detached_xmin in that case), we consider there to be no "omittable"
- * detached partitions.
+ * omit_detached_snapshot (find_inheritance_children_extended() will not
+ * have set detached_xmin in that case), we consider there to be no
+ * "omittable" detached partitions.
*/
- is_omit = omit_detached && detached_exist && ActiveSnapshotSet() &&
+ is_omit = detached_exist && omit_detached_snapshot &&
TransactionIdIsValid(detached_xmin);
/*
@@ -420,7 +443,7 @@ retry:
* Create a new partition directory object.
*/
PartitionDirectory
-CreatePartitionDirectory(MemoryContext mcxt, bool omit_detached)
+CreatePartitionDirectory(MemoryContext mcxt, Snapshot omit_detached_snapshot)
{
MemoryContext oldcontext = MemoryContextSwitchTo(mcxt);
PartitionDirectory pdir;
@@ -435,7 +458,7 @@ CreatePartitionDirectory(MemoryContext mcxt, bool omit_detached)
pdir->pdir_hash = hash_create("partition directory", 256, &ctl,
HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
- pdir->omit_detached = omit_detached;
+ pdir->omit_detached_snapshot = omit_detached_snapshot;
MemoryContextSwitchTo(oldcontext);
return pdir;
@@ -468,7 +491,8 @@ PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel)
*/
RelationIncrementReferenceCount(rel);
pde->rel = rel;
- pde->pd = RelationGetPartitionDesc(rel, pdir->omit_detached);
+ pde->pd = RelationGetPartitionDescExt(rel,
+ pdir->omit_detached_snapshot);
Assert(pde->pd != NULL);
}
return pde->pd;
diff --git a/src/include/catalog/pg_inherits.h b/src/include/catalog/pg_inherits.h
index b3da78c24b..465999795d 100644
--- a/src/include/catalog/pg_inherits.h
+++ b/src/include/catalog/pg_inherits.h
@@ -23,6 +23,7 @@
#include "nodes/pg_list.h"
#include "storage/lock.h"
+#include "utils/snapshot.h"
/* ----------------
* pg_inherits definition. cpp turns this into
@@ -49,8 +50,10 @@ DECLARE_INDEX(pg_inherits_parent_index, 2187, InheritsParentIndexId, pg_inherits
extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode);
-extern List *find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
- LOCKMODE lockmode, bool *detached_exist, TransactionId *detached_xmin);
+extern List *find_inheritance_children_extended(Oid parentrelId,
+ Snapshot omit_detached_snapshot,
+ LOCKMODE lockmode, bool *detached_exist,
+ TransactionId *detached_xmin);
extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
List **numparents);
diff --git a/src/include/partitioning/partdesc.h b/src/include/partitioning/partdesc.h
index 87abfd76d7..d4a8ab3fb7 100644
--- a/src/include/partitioning/partdesc.h
+++ b/src/include/partitioning/partdesc.h
@@ -14,6 +14,7 @@
#include "partitioning/partdefs.h"
#include "utils/relcache.h"
+#include "utils/snapshot.h"
/*
* Information about partitions of a partitioned table.
@@ -65,8 +66,11 @@ typedef struct PartitionDescData
extern PartitionDesc RelationGetPartitionDesc(Relation rel, bool omit_detached);
+extern PartitionDesc RelationGetPartitionDescExt(Relation rel,
+ Snapshot omit_detached_snapshot);
-extern PartitionDirectory CreatePartitionDirectory(MemoryContext mcxt, bool omit_detached);
+extern PartitionDirectory CreatePartitionDirectory(MemoryContext mcxt,
+ Snapshot omit_detached_snapshot);
extern PartitionDesc PartitionDirectoryLookup(PartitionDirectory, Relation);
extern void DestroyPartitionDirectory(PartitionDirectory pdir);
--
2.43.0
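
The descriptor-reuse rules that RelationGetPartitionDescExt() and its RelationGetPartitionDesc() wrapper follow in the patch above can be pictured with a small standalone sketch. Everything here is a hypothetical stand-in, not PostgreSQL code: DemoRel, DemoSnapshot and the demo_* functions are invented for illustration, the snapshot test is reduced to a single threshold (the real XidInMVCCSnapshot() also consults the snapshot's in-progress XID list), and a NULL return stands for "build a new descriptor from scratch".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int DemoXid;

/* Hypothetical snapshot: every xid >= first_unfinished is "in progress". */
typedef struct DemoSnapshot
{
    DemoXid     first_unfinished;
} DemoSnapshot;

/* Hypothetical stand-in for the relcache fields the patch consults. */
typedef struct DemoRel
{
    const char *partdesc;            /* cached; may include detached partitions */
    bool        detached_exist;      /* does partdesc contain a detached one? */
    const char *partdesc_nodetached; /* cached; detached partition omitted */
    DemoXid     nodetached_xmin;     /* xmin saved alongside partdesc_nodetached */
} DemoRel;

static bool
demo_xid_in_snapshot(DemoXid xid, const DemoSnapshot *snap)
{
    return xid >= snap->first_unfinished;
}

/* Returns a cached descriptor, or NULL when one would have to be rebuilt. */
static const char *
demo_get_partdesc_ext(const DemoRel *rel, const DemoSnapshot *omit_detached_snapshot)
{
    /* A NULL snapshot means the caller does not mind detached partitions. */
    if (rel->partdesc != NULL &&
        (!rel->detached_exist || omit_detached_snapshot == NULL))
        return rel->partdesc;

    /* Reuse the "no detached" descriptor only if its xmin is not in progress. */
    if (rel->partdesc_nodetached != NULL && omit_detached_snapshot != NULL &&
        !demo_xid_in_snapshot(rel->nodetached_xmin, omit_detached_snapshot))
        return rel->partdesc_nodetached;

    return NULL;
}

/* Boolean-flag wrapper: picks the "active" snapshot only when asked to omit. */
static const char *
demo_get_partdesc(const DemoRel *rel, bool omit_detached,
                  const DemoSnapshot *active_snapshot)
{
    const DemoSnapshot *snapshot = NULL;

    if (omit_detached && active_snapshot != NULL)
        snapshot = active_snapshot;
    return demo_get_partdesc_ext(rel, snapshot);
}
```

In this sketch the wrapper keeps the old boolean interface for existing callers, while a caller that needs a specific snapshot (as the RI code in the later patches appears to) can call the extended function directly.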
[application/x-patch] v1-0002-Avoid-using-SPI-in-RI-trigger-functions.patch (32.8K, 4-v1-0002-Avoid-using-SPI-in-RI-trigger-functions.patch)
From dc49e938cb6a31034c974f33e268c0a16c778e9d Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 19 Dec 2024 21:35:04 +0900
Subject: [PATCH v1 2/3] Avoid using SPI in RI trigger functions
Currently, ri_PlanCheck() uses SPI_prepare() to create an SPI plan
containing a CachedPlanSource for the SQL query used in RI checks.
Similarly, ri_PerformCheck() calls SPI_execute_snapshot() to
execute the query with a specific snapshot.
This commit introduces ri_PlanCreate() and ri_PlanExecute() to
replace SPI_prepare() and SPI_execute_snapshot(), respectively.
* ri_PlanCreate()
Creates an "RI plan" for a query using a caller-specified callback
function, such as ri_SqlStringPlanCreate(), which produces a
CachedPlanSource for the input SQL string. This mirrors SPI_prepare()
functionality.
* ri_PlanExecute()
Executes an "RI plan" using a callback saved within the "RI plan"
(struct RIPlan). For example, ri_SqlStringPlanExecute() fetches a
CachedPlan from the CachedPlanSource and executes its PlannedStmt
using the executor. Snapshot handling is now fully managed by
ri_PerformCheck(), eliminating dependence on SPI's snapshot logic.
These changes make ri_PlanCreate() and ri_PlanExecute() pluggable,
laying the groundwork for future commits to replace SQL-based RI
checks with optimized C functions for direct table/index scans.
Note:
This only covers the RI_* functions called from RI triggers and
not those called from tablecmds.c, such as RI_Initial_Check() and
RI_PartitionRemove_Check(), which still use SPI_prepare() and
SPI_execute_snapshot().
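
The pluggable arrangement described above can be sketched outside PostgreSQL as a plan struct that carries its own execute, validate and free callbacks, filled in by a caller-chosen create function. This is only an illustration under stated assumptions: DemoPlan and every demo_* name are hypothetical, the "SQL string" implementation merely stores a copy of the query text, and its execute callback returns a made-up row count (the string length, capped by limit) instead of running anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct DemoPlan;
typedef void (*DemoPlanCreateFunc) (struct DemoPlan *plan, const char *querystr);
typedef int (*DemoPlanExecFunc) (struct DemoPlan *plan, int limit);
typedef bool (*DemoPlanIsValidFunc) (struct DemoPlan *plan);
typedef void (*DemoPlanFreeFunc) (struct DemoPlan *plan);

/* Hypothetical, much reduced counterpart of the patch's RI_Plan. */
typedef struct DemoPlan
{
    DemoPlanExecFunc exec_func;         /* execute the plan */
    void       *exec_arg;               /* implementation-private state */
    DemoPlanIsValidFunc is_valid_func;  /* may the cache keep this plan? */
    DemoPlanFreeFunc free_func;         /* release implementation state */
} DemoPlan;

/* Fake execution: "returns" strlen(query) rows, capped by limit if > 0. */
static int
demo_sql_exec(DemoPlan *plan, int limit)
{
    size_t      rows;

    if (plan->exec_arg == NULL)
        return 0;
    rows = strlen((const char *) plan->exec_arg);
    if (limit > 0 && rows > (size_t) limit)
        rows = (size_t) limit;
    return (int) rows;
}

static bool
demo_sql_is_valid(DemoPlan *plan)
{
    return plan->exec_arg != NULL;
}

static void
demo_sql_free(DemoPlan *plan)
{
    free(plan->exec_arg);
    plan->exec_arg = NULL;
}

/* Create callback: stores a copy of the query and installs the callbacks. */
static void
demo_sql_create(DemoPlan *plan, const char *querystr)
{
    size_t      len = strlen(querystr) + 1;
    char       *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, querystr, len);
    plan->exec_arg = copy;
    plan->exec_func = demo_sql_exec;
    plan->is_valid_func = demo_sql_is_valid;
    plan->free_func = demo_sql_free;
}

/* Generic entry points: they never look inside exec_arg themselves. */
static DemoPlan *
demo_plan_create(DemoPlanCreateFunc create_func, const char *querystr)
{
    DemoPlan   *plan = calloc(1, sizeof(*plan));

    if (plan != NULL)
        create_func(plan, querystr);
    return plan;
}

static int
demo_plan_execute(DemoPlan *plan, int limit)
{
    return plan->exec_func(plan, limit);
}

static void
demo_plan_free(DemoPlan *plan)
{
    plan->free_func(plan);      /* implementation-specific release first */
    free(plan);
}
```

With this shape, a second implementation (for example one that scans an index directly) would only need its own create callback; the generic create, execute and free entry points stay unchanged.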
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA+HiwqGkfJfYdeq5vHPh6eqPKjSbfpDDY+j-kXYFePQedtSLeg@mail.gmail.com
Discussion: https://postgr.es/m/CA+HiwqG5e8pk8s7+7zhr1Nc_PGyhEdM5f=pHkMOdK1RYWXfJsg@mail.gmail.com
---
src/backend/executor/spi.c | 2 +-
src/backend/utils/adt/ri_triggers.c | 595 +++++++++++++++++++++++-----
2 files changed, 488 insertions(+), 109 deletions(-)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index c1d8fd08c6..1d5e6532bf 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -764,7 +764,7 @@ SPI_execute_plan_with_paramlist(SPIPlanPtr plan, ParamListInfo params,
* end of the command.
*
* This is currently not documented in spi.sgml because it is only intended
- * for use by RI triggers.
+ * for use by some functions in ri_triggers.c.
*
* Passing snapshot == InvalidSnapshot will select the normal behavior of
* fetching a new snapshot for each query.
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 3185f48afa..804a2a69e4 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -9,7 +9,7 @@
* across query and transaction boundaries, in fact they live as long as
* the backend does. This works because the hashtable structures
* themselves are allocated by dynahash.c in its permanent DynaHashCxt,
- * and the SPI plans they point to are saved using SPI_keepplan().
+ * and the CachedPlanSources they point to are saved in CacheMemoryContext.
* There is not currently any provision for throwing away a no-longer-needed
* plan --- consider improving this someday.
*
@@ -38,6 +38,9 @@
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "storage/bufmgr.h"
+#include "tcop/pquery.h"
+#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/datum.h"
@@ -132,10 +135,55 @@ typedef struct RI_ConstraintInfo
dlist_node valid_link; /* Link in list of valid entries */
} RI_ConstraintInfo;
+/* RI plan callback functions */
+struct RI_Plan;
+typedef void (*RI_PlanCreateFunc_type) (struct RI_Plan *plan, const char *querystr, int nargs, Oid *paramtypes);
+typedef int (*RI_PlanExecFunc_type) (struct RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+typedef bool (*RI_PlanIsValidFunc_type) (struct RI_Plan *plan);
+typedef void (*RI_PlanFreeFunc_type) (struct RI_Plan *plan);
+
+/*
+ * RI_Plan
+ *
+ * Information related to the implementation of a plan for a given RI query.
+ * ri_PlanCheck() makes and stores these in ri_query_cache. The callers of
+ * ri_PlanCheck() specify a RI_PlanCreateFunc_type function to fill in the
+ * caller-specific implementation details such as the callback functions
+ * to create, validate, free a plan, and also the arguments necessary for
+ * the execution of the plan.
+ */
+typedef struct RI_Plan
+{
+ /*
+ * Context under which this struct and its subsidiary data get allocated.
+ * It is made a child of CacheMemoryContext.
+ */
+ MemoryContext plancxt;
+
+ /* Query parameter types. */
+ int nargs;
+ Oid *paramtypes;
+
+ /*
+ * Set of functions specified by a RI trigger function to implement
+ * the plan for the trigger's RI query.
+ */
+ RI_PlanExecFunc_type plan_exec_func; /* execute the plan */
+ void *plan_exec_arg; /* execution argument, such as
+ * a List of CachedPlanSource */
+ RI_PlanIsValidFunc_type plan_is_valid_func; /* check if the plan is still
+ * valid for ri_query_cache
+ * to continue caching it */
+ RI_PlanFreeFunc_type plan_free_func; /* release plan resources */
+} RI_Plan;
+
/*
* RI_QueryKey
*
- * The key identifying a prepared SPI plan in our query hashtable
+ * The key identifying a plan in our query hashtable
*/
typedef struct RI_QueryKey
{
@@ -149,7 +197,7 @@ typedef struct RI_QueryKey
typedef struct RI_QueryHashEntry
{
RI_QueryKey key;
- SPIPlanPtr plan;
+ RI_Plan *plan;
} RI_QueryHashEntry;
/*
@@ -212,8 +260,8 @@ static bool ri_CompareWithCast(Oid eq_opr, Oid typeid, Oid collid,
static void ri_InitHashTables(void);
static void InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue);
-static SPIPlanPtr ri_FetchPreparedPlan(RI_QueryKey *key);
-static void ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan);
+static RI_Plan *ri_FetchPreparedPlan(RI_QueryKey *key);
+static void ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan);
static RI_CompareHashEntry *ri_HashCompareOp(Oid eq_opr, Oid typeid);
static void ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname,
@@ -222,14 +270,15 @@ static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
Relation trig_rel, bool rel_is_pk);
static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
-static SPIPlanPtr ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
- RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
+static RI_Plan *ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
+ RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool is_restrict,
- bool detectNewRows, int expect_OK);
+ bool detectNewRows, int expected_cmdtype);
static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
const RI_ConstraintInfo *riinfo, bool rel_is_pk,
Datum *vals, char *nulls);
@@ -237,6 +286,14 @@ static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool is_restrict, bool partgone) pg_attribute_noreturn();
+static void ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes);
+static bool ri_SqlStringPlanIsValid(RI_Plan *plan);
+static int ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *vals, char *nulls,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+static void ri_SqlStringPlanFree(RI_Plan *plan);
/*
@@ -252,7 +309,7 @@ RI_FKey_check(TriggerData *trigdata)
Relation pk_rel;
TupleTableSlot *newslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -349,8 +406,6 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- SPI_connect();
-
/* Fetch or prepare a saved plan for the real check */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
@@ -435,8 +490,9 @@ RI_FKey_check(TriggerData *trigdata)
appendStringInfo(&querybuf, "(x1.r)");
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -452,10 +508,7 @@ RI_FKey_check(TriggerData *trigdata)
NULL, newslot,
false,
pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(pk_rel, RowShareLock);
@@ -510,15 +563,13 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
RI_QueryKey qkey;
bool result;
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- SPI_connect();
-
/*
* Fetch or prepare a saved plan for checking PK table with values coming
* from a PK row
@@ -604,8 +655,9 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
appendStringInfo(&querybuf, "(x1.r)");
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -617,10 +669,7 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
oldslot, NULL,
false,
true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
return result;
}
@@ -714,7 +763,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, true);
@@ -742,8 +791,6 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
return PointerGetDatum(NULL);
}
- SPI_connect();
-
/*
* Fetch or prepare a saved plan for the restrict lookup (it's the same
* query for delete and update cases)
@@ -792,8 +839,9 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -805,10 +853,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
oldslot, NULL,
!is_no_action,
true, /* must detect new rows */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(fk_rel, RowShareLock);
@@ -830,7 +875,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_del", RI_TRIGTYPE_DELETE);
@@ -848,8 +893,6 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- SPI_connect();
-
/* Fetch or prepare a saved plan for the cascaded delete */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONDELETE);
@@ -893,8 +936,9 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -907,10 +951,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
oldslot, NULL,
false,
true, /* must detect new rows */
- SPI_OK_DELETE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_DELETE);
table_close(fk_rel, RowExclusiveLock);
@@ -933,7 +974,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
TupleTableSlot *newslot;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_upd", RI_TRIGTYPE_UPDATE);
@@ -953,8 +994,6 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
newslot = trigdata->tg_newslot;
oldslot = trigdata->tg_trigslot;
- SPI_connect();
-
/* Fetch or prepare a saved plan for the cascaded update */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONUPDATE);
@@ -1011,8 +1050,9 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
}
appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys * 2, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys * 2, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1024,10 +1064,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
oldslot, newslot,
false,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1109,7 +1146,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
int32 queryno;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
@@ -1125,8 +1162,6 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- SPI_connect();
-
/*
* Fetch or prepare a saved plan for the trigger.
*/
@@ -1239,8 +1274,9 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1252,10 +1288,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
oldslot, NULL,
false,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1445,7 +1478,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
int save_nestlevel;
char workmembuf[32];
int spi_result;
- SPIPlanPtr qplan;
+ SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigger, fk_rel, false);
@@ -2034,7 +2067,7 @@ ri_GenerateQualCollation(StringInfo buf, Oid collation)
/* ----------
* ri_BuildQueryKey -
*
- * Construct a hashtable key for a prepared SPI plan of an FK constraint.
+ * Construct a hashtable key for a plan of an FK constraint.
*
* key: output argument, *key is filled in based on the other arguments
* riinfo: info derived from pg_constraint entry
@@ -2053,9 +2086,9 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* the FK constraint (i.e., not the table on which the trigger has been
* fired), and so it will be the same for all members of the inheritance
* tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * than the constraint's own OID. This avoids creating duplicate plans,
+ * saving lots of work and memory when there are many partitions with
+ * similar FK constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
@@ -2342,15 +2375,366 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
}
}
+/* Query string or an equivalent name to show in the error CONTEXT. */
+typedef struct RIErrorCallbackArg
+{
+ const char *query;
+} RIErrorCallbackArg;
+
+/*
+ * _RI_error_callback
+ *
+ * Add context information when a query being processed with ri_PlanCreate()
+ * or ri_PlanExecute() fails.
+ */
+static void
+_RI_error_callback(void *arg)
+{
+ RIErrorCallbackArg *carg = (RIErrorCallbackArg *) arg;
+ const char *query = carg->query;
+ int syntaxerrposition;
+
+ Assert(query != NULL);
+
+ /*
+ * If there is a syntax error position, convert to internal syntax error;
+ * otherwise treat the query as an item of context stack
+ */
+ syntaxerrposition = geterrposition();
+ if (syntaxerrposition > 0)
+ {
+ errposition(0);
+ internalerrposition(syntaxerrposition);
+ internalerrquery(query);
+ }
+ else
+ errcontext("SQL statement \"%s\"", query);
+}
+
+/*
+ * This creates a plan for a query written in SQL.
+ *
+ * The main product is a list of CachedPlanSources, one for each raw
+ * statement parsed from the provided query string (after parse analysis and
+ * rewrite), which is saved in plan->plan_exec_arg.
+ */
+static void
+ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ List *raw_parsetree_list;
+ List *plancache_list = NIL;
+ ListCell *list_item;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ Assert(querystr != NULL);
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = querystr;
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Parse the request string into a list of raw parse trees.
+ */
+ raw_parsetree_list = raw_parser(querystr, RAW_PARSE_DEFAULT);
+
+ /*
+ * Do parse analysis and rule rewrite for each raw parsetree, storing the
+ * results into unsaved plancache entries.
+ */
+ plancache_list = NIL;
+
+ foreach(list_item, raw_parsetree_list)
+ {
+ RawStmt *parsetree = lfirst_node(RawStmt, list_item);
+ List *stmt_list;
+ CachedPlanSource *plansource;
+
+ /*
+ * Create the CachedPlanSource before we do parse analysis, since it
+ * needs to see the unmodified raw parse tree.
+ */
+ plansource = CreateCachedPlan(parsetree, querystr,
+ CreateCommandTag(parsetree->stmt));
+
+ stmt_list = pg_analyze_and_rewrite_fixedparams(parsetree, querystr,
+ paramtypes, nargs,
+ NULL);
+
+ /* Finish filling in the CachedPlanSource */
+ CompleteCachedPlan(plansource,
+ stmt_list,
+ NULL,
+ paramtypes, nargs,
+ NULL, NULL, 0,
+ false); /* not fixed result */
+
+ SaveCachedPlan(plansource);
+ plancache_list = lappend(plancache_list, plansource);
+ }
+
+ plan->plan_exec_func = ri_SqlStringPlanExecute;
+ plan->plan_exec_arg = (void *) plancache_list;
+ plan->plan_is_valid_func = ri_SqlStringPlanIsValid;
+ plan->plan_free_func = ri_SqlStringPlanFree;
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+}
+
+/*
+ * This executes the plan after creating a CachedPlan for each
+ * CachedPlanSource found stored in plan->plan_exec_arg using given
+ * parameter values.
+ *
+ * Return value is the number of tuples returned by the "last" CachedPlan.
+ */
+static int
+ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+ CachedPlan *cplan;
+ ResourceOwner plan_owner;
+ int tuples_processed = 0; /* appease compiler */
+ ParamListInfo paramLI;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ Assert(list_length(plancache_list) > 0);
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = NULL; /* will be filled below */
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Convert the parameters into a format that the planner and the executor
+ * expect them to be in.
+ */
+ if (plan->nargs > 0)
+ {
+ paramLI = makeParamList(plan->nargs);
+
+ for (int i = 0; i < plan->nargs; i++)
+ {
+ ParamExternData *prm = &paramLI->params[i];
+
+ prm->value = param_vals[i];
+ prm->isnull = (param_isnulls && param_isnulls[i] == 'n');
+ prm->pflags = PARAM_FLAG_CONST;
+ prm->ptype = plan->paramtypes[i];
+ }
+ }
+ else
+ paramLI = NULL;
+
+ plan_owner = CurrentResourceOwner; /* XXX - why? */
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+ List *stmt_list;
+ ListCell *lc2;
+
+ ricallbackarg.query = plansource->query_string;
+
+ /*
+ * Replan if needed, and increment plan refcount. If it's a saved
+ * plan, the refcount must be backed by the plan_owner.
+ */
+ cplan = GetCachedPlan(plansource, paramLI, plan_owner, NULL);
+
+ stmt_list = cplan->stmt_list;
+
+ foreach(lc2, stmt_list)
+ {
+ PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ DestReceiver *dest;
+ QueryDesc *qdesc;
+ int eflags;
+
+ *last_stmt_cmdtype = stmt->commandType;
+
+ /*
+ * Advance the command counter before each command and update the
+ * snapshot.
+ */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
+ dest = CreateDestReceiver(DestNone);
+ qdesc = CreateQueryDesc(stmt, plansource->query_string,
+ GetActiveSnapshot(), crosscheck_snapshot,
+ dest, paramLI, NULL, 0);
+
+ /* Select execution options */
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+ ExecutorStart(qdesc, eflags);
+ ExecutorRun(qdesc, ForwardScanDirection, limit);
+
+ /* We return the last executed statement's value. */
+ tuples_processed = qdesc->estate->es_processed;
+
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ }
+
+ /* Done with this plan, so release refcount */
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ cplan = NULL;
+ }
+
+ Assert(cplan == NULL);
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+
+ return tuples_processed;
+}
+
+/*
+ * Have any of the CachedPlanSources been invalidated since being created?
+ */
+static bool
+ri_SqlStringPlanIsValid(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ if (!CachedPlanIsValid(plansource))
+ return false;
+ }
+ return true;
+}
+
+/* Release CachedPlanSources and associated CachedPlans, if any. */
+static void
+ri_SqlStringPlanFree(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ DropCachedPlan(plansource);
+ }
+}
+
+/*
+ * Create an RI_Plan for a given RI check query and initialize the
+ * plan callbacks and execution argument using the caller specified
+ * function.
+ */
+static RI_Plan *
+ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ RI_Plan *plan;
+ MemoryContext plancxt,
+ oldcxt;
+
+ /*
+ * Create a memory context for the plan underneath CurrentMemoryContext,
+ * which is reparented later to be underneath CacheMemoryContext.
+ */
+ plancxt = AllocSetContextCreate(CurrentMemoryContext,
+ "RI Plan",
+ ALLOCSET_SMALL_SIZES);
+ oldcxt = MemoryContextSwitchTo(plancxt);
+ plan = (RI_Plan *) palloc0(sizeof(*plan));
+ plan->plancxt = plancxt;
+ plan->nargs = nargs;
+ if (plan->nargs > 0)
+ {
+ plan->paramtypes = (Oid *) palloc(plan->nargs * sizeof(Oid));
+ memcpy(plan->paramtypes, paramtypes, plan->nargs * sizeof(Oid));
+ }
+
+ plan_create_func(plan, querystr, nargs, paramtypes);
+
+ MemoryContextSetParent(plan->plancxt, CacheMemoryContext);
+ MemoryContextSwitchTo(oldcxt);
+
+ return plan;
+}
+
+/*
+ * Execute the plan by calling plan_exec_func().
+ *
+ * Returns the number of tuples obtained by executing the plan; the caller
+ * typically wants to check if at least 1 row was returned.
+ *
+ * *last_stmt_cmdtype is set to the CmdType of the last operation performed
+ * by executing the plan, which may consist of more than one executable
+ * statements if, for example, any rules belonging to the tables mentioned in
+ * the original query added additional operations.
+ */
+static int
+ri_PlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ Assert(ActiveSnapshotSet());
+ return plan->plan_exec_func(plan, fk_rel, pk_rel,
+ param_vals, param_isnulls,
+ crosscheck_snapshot,
+ limit, last_stmt_cmdtype);
+}
+
+/*
+ * Is the plan still valid to continue caching?
+ */
+static bool
+ri_PlanIsValid(RI_Plan *plan)
+{
+ return plan->plan_is_valid_func(plan);
+}
+
+/* Release plan resources. */
+static void
+ri_FreePlan(RI_Plan *plan)
+{
+ /* First call the implementation specific release function. */
+ plan->plan_free_func(plan);
+
+ /* Now get rid of the RI_Plan and subsidiary data in its plancxt */
+ MemoryContextDelete(plan->plancxt);
+}
/*
* Prepare execution plan for a query to enforce an RI restriction
*/
-static SPIPlanPtr
-ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
+static RI_Plan *
+ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
Relation query_rel;
Oid save_userid;
int save_sec_context;
@@ -2369,18 +2753,12 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
-
/* Create the plan */
- qplan = SPI_prepare(querystr, nargs, argtypes);
-
- if (qplan == NULL)
- elog(ERROR, "SPI_prepare returned %s for %s", SPI_result_code_string(SPI_result), querystr);
+ qplan = ri_PlanCreate(plan_create_func, querystr, nargs, argtypes);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Save the plan */
- SPI_keepplan(qplan);
ri_HashPreparedPlan(qkey, qplan);
return qplan;
@@ -2391,23 +2769,23 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
*/
static bool
ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool is_restrict,
- bool detectNewRows, int expect_OK)
+ bool detectNewRows, int expected_cmdtype)
{
Relation query_rel,
source_rel;
bool source_is_pk;
- Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
- int spi_result;
+ int tuples_processed;
Oid save_userid;
int save_sec_context;
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
+ CmdType last_stmt_cmdtype;
/*
* Use the query type code to determine whether the query is run against
@@ -2458,30 +2836,34 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
* the caller passes detectNewRows == false then it's okay to do the query
* with the transaction snapshot; otherwise we use a current snapshot, and
* tell the executor to error out if it finds any rows under the current
- * snapshot that wouldn't be visible per the transaction snapshot. Note
- * that SPI_execute_snapshot will register the snapshots, so we don't need
- * to bother here.
+ * snapshot that wouldn't be visible per the transaction snapshot.
+ *
+ * Also push the chosen snapshot so that anyplace that wants to use it
+ * can get it by calling GetActiveSnapshot().
*/
if (IsolationUsesXactSnapshot() && detectNewRows)
{
- CommandCounterIncrement(); /* be sure all my own work is visible */
- test_snapshot = GetLatestSnapshot();
crosscheck_snapshot = GetTransactionSnapshot();
+ /* Make sure we have a private copy of the snapshot to modify. */
+ PushCopiedSnapshot(GetLatestSnapshot());
}
else
{
- /* the default SPI behavior is okay */
- test_snapshot = InvalidSnapshot;
crosscheck_snapshot = InvalidSnapshot;
+ PushActiveSnapshot(GetTransactionSnapshot());
}
+ /* Also advance the command counter and update the snapshot. */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
/*
* If this is a select query (e.g., for a 'no action' or 'restrict'
* trigger), we only need to see if there is a single row in the table,
* matching the key. Otherwise, limit = 0 - because we want the query to
* affect ALL the matching rows.
*/
- limit = (expect_OK == SPI_OK_SELECT) ? 1 : 0;
+ limit = (expected_cmdtype == CMD_SELECT) ? 1 : 0;
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
@@ -2490,19 +2872,16 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
SECURITY_NOFORCE_RLS);
/* Finally we can run the query. */
- spi_result = SPI_execute_snapshot(qplan,
- vals, nulls,
- test_snapshot, crosscheck_snapshot,
- false, false, limit);
+ tuples_processed = ri_PlanExecute(qplan, fk_rel, pk_rel, vals, nulls,
+ crosscheck_snapshot,
+ limit, &last_stmt_cmdtype);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Check result */
- if (spi_result < 0)
- elog(ERROR, "SPI_execute_snapshot returned %s", SPI_result_code_string(spi_result));
+ PopActiveSnapshot();
- if (expect_OK >= 0 && spi_result != expect_OK)
+ if (last_stmt_cmdtype != expected_cmdtype)
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("referential integrity query on \"%s\" from constraint \"%s\" on \"%s\" gave unexpected result",
@@ -2513,15 +2892,15 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/* XXX wouldn't it be clearer to do this part at the caller? */
if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ expected_cmdtype == CMD_SELECT &&
+ (tuples_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
qkey->constr_queryno, is_restrict, false);
- return SPI_processed != 0;
+ return tuples_processed != 0;
}
/*
@@ -2798,14 +3177,14 @@ ri_InitHashTables(void)
/*
* ri_FetchPreparedPlan -
*
- * Lookup for a query key in our private hash table of prepared
- * and saved SPI execution plans. Return the plan if found or NULL.
+ * Lookup for a query key in our private hash table of saved RI plans.
+ * Return the plan if found or NULL.
*/
-static SPIPlanPtr
+static RI_Plan *
ri_FetchPreparedPlan(RI_QueryKey *key)
{
RI_QueryHashEntry *entry;
- SPIPlanPtr plan;
+ RI_Plan *plan;
/*
* On the first call initialize the hashtable
@@ -2833,7 +3212,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* locked both FK and PK rels.
*/
plan = entry->plan;
- if (plan && SPI_plan_is_valid(plan))
+ if (plan && ri_PlanIsValid(plan))
return plan;
/*
@@ -2842,7 +3221,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
*/
entry->plan = NULL;
if (plan)
- SPI_freeplan(plan);
+ ri_FreePlan(plan);
return NULL;
}
@@ -2854,7 +3233,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* Add another plan to our private SPI query plan hashtable.
*/
static void
-ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan)
+ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan)
{
RI_QueryHashEntry *entry;
bool found;
--
2.43.0
* Re: Eliminating SPI / SQL from some RI triggers - take 3
2024-12-20 04:23 Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
@ 2025-10-21 04:07 ` Amit Langote <[email protected]>
2025-10-21 05:10 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Pavel Stehule <[email protected]>
0 siblings, 1 reply; 61+ messages in thread
From: Amit Langote @ 2025-10-21 04:07 UTC (permalink / raw)
To: pgsql-hackers; +Cc: Junwang Zhao <[email protected]>
On Thu, Apr 3, 2025 at 7:19 PM Amit Langote <[email protected]> wrote:
> On Fri, Dec 20, 2024 at 1:23 PM Amit Langote <[email protected]> wrote:
> > We discussed $subject at [1] and [2] and I'd like to continue that
> > work with the hope to commit some part of it for v18.
>
> I did not get a chance to do any further work on this in this cycle,
> but plan to start working on it after beta release, so moving this to
> the next CF. I will post a rebased patch after the freeze to keep the
> bots green for now.
Sorry for the inactivity. I've moved the patch entry in the CF app to
PG19-Drafts, since I don't plan to work on it myself in the immediate
future. However, Junwang Zhao has expressed interest in taking this
work forward, and I look forward to working with him on it.
--
Thanks, Amit Langote
* Re: Eliminating SPI / SQL from some RI triggers - take 3
2024-12-20 04:23 Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 04:07 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
@ 2025-10-21 05:10 ` Pavel Stehule <[email protected]>
2025-10-22 13:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
0 siblings, 1 reply; 61+ messages in thread
From: Pavel Stehule @ 2025-10-21 05:10 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: pgsql-hackers; Junwang Zhao <[email protected]>
Hi
On Tue, Oct 21, 2025 at 6:07 AM Amit Langote <[email protected]> wrote:
> On Thu, Apr 3, 2025 at 7:19 PM Amit Langote <[email protected]>
> wrote:
> > On Fri, Dec 20, 2024 at 1:23 PM Amit Langote <[email protected]>
> wrote:
> > > We discussed $subject at [1] and [2] and I'd like to continue that
> > > work with the hope to commit some part of it for v18.
> >
> > I did not get a chance to do any further work on this in this cycle,
> > but plan to start working on it after beta release, so moving this to
> > the next CF. I will post a rebased patch after the freeze to keep the
> > bots green for now.
>
> Sorry for the inactivity. I've moved the patch entry in the CF app to
> PG19-Drafts, since I don't plan to work on it myself in the immediate
> future. However, Junwang Zhao has expressed interest in taking this
> work forward, and I look forward to working with him on it.
>
This is a very interesting and important feature. I can help with testing
and review if necessary.
Regards
Pavel
* Re: Eliminating SPI / SQL from some RI triggers - take 3
2024-12-20 04:23 Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 04:07 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 05:10 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Pavel Stehule <[email protected]>
@ 2025-10-22 13:55 ` Amit Langote <[email protected]>
2025-12-01 06:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
0 siblings, 1 reply; 61+ messages in thread
From: Amit Langote @ 2025-10-22 13:55 UTC (permalink / raw)
To: Pavel Stehule <[email protected]>; +Cc: pgsql-hackers; Junwang Zhao <[email protected]>
On Tue, Oct 21, 2025 at 2:10 PM Pavel Stehule <[email protected]> wrote:
> On Tue, Oct 21, 2025 at 6:07 AM Amit Langote <[email protected]> wrote:
>>
>> On Thu, Apr 3, 2025 at 7:19 PM Amit Langote <[email protected]> wrote:
>> > On Fri, Dec 20, 2024 at 1:23 PM Amit Langote <[email protected]> wrote:
>> > > We discussed $subject at [1] and [2] and I'd like to continue that
>> > > work with the hope to commit some part of it for v18.
>> >
>> > I did not get a chance to do any further work on this in this cycle,
>> > but plan to start working on it after beta release, so moving this to
>> > the next CF. I will post a rebased patch after the freeze to keep the
>> > bots green for now.
>>
>> Sorry for the inactivity. I've moved the patch entry in the CF app to
>> PG19-Drafts, since I don't plan to work on it myself in the immediate
>> future. However, Junwang Zhao has expressed interest in taking this
>> work forward, and I look forward to working with him on it.
>
>
> This is very interesting and important feature - I can help with testing and review if it will be necessary
Thanks for the interest.
Just to add a quick note on the current direction I’ve been discussing
off-list with Junwang:
The next iteration of this work will likely follow a hybrid "fast-path
+ fallback" design rather than the original pure fast-path approach.
The idea is to keep the optimization for straightforward cases where
the foreign key and referenced key can be verified by a direct index
probe, while falling back to the existing SPI path only when the
runtime behavior of the executor is non-trivial to replicate -- such
as visibility rechecks under concurrent updates -- or when the
constraint itself involves richer semantics, like temporal foreign
keys that require range and aggregation logic. That keeps the
optimization safe without changing the meaning of constraint
enforcement.
This direction comes partly in response to the feedback from Robert
and Tom in the earlier Eliminating SPI threads, who raised concerns
that a fast path might silently diverge from what the executor does at
runtime in subtle cases. The fallback design aims to address that
directly: it keeps the optimization where it’s clearly safe, but
defers to the existing SPI-based implementation whenever correctness
might depend on executor behavior that would otherwise be difficult or
risky to reproduce locally.
In practice, this means adding a guarded fast path that performs the
index probe and tuple lock directly under the same snapshot and
security context that SPI would use, while caching stable metadata
such as index descriptors, scan keys, and operator information per
constraint or per statement. The fallback to SPI remains for the few
cases that either depend on executor behavior or need features beyond
a simple index probe:
* Concurrent updates or deletes: If table_tuple_lock() reports that
the target tuple was updated or deleted, we delegate to the SPI path
so that EvalPlanQual and visibility rules are applied as today.
* Partitioned parents: Skipped in v1 for simplicity, since they
require routing the probe through the correct partition using
PartitionDirectory. This can be added later as a separate patch once
the core mechanism is stable.
* Temporal foreign keys: These use range overlap and containment
semantics (&&, <@, range_agg()) that inherently involve aggregation
and multiple-row reasoning, so they stay on the SPI path.
Everything else -- multi-column keys, cross-type equality supported by
the index opfamily, collation matching, and RLS/ACL enforcement --
will be handled directly in the fast path. The security behavior will
mirror the existing SPI path by temporarily switching to the parent
table's owner with SECURITY_LOCAL_USERID_CHANGE | SECURITY_NOFORCE_RLS
around the probe, like ri_PerformCheck() does.
For concurrency, the fast path locks the located parent tuple with
LockTupleKeyShare under GetActiveSnapshot(). If that succeeds (TM_Ok),
the check passes immediately. While non-TM_Ok cases fall back for now,
a later refinement could follow the update chain with
table_tuple_fetch_row_version() under the current snapshot and re-lock
the visible version, making the fast path fully self-contained.
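To make the lock-time decision concrete, here is a minimal sketch
assuming a three-valued lock result. The SketchLockResult values only
stand in for TM_Ok, TM_Updated, and TM_Deleted; all names are
hypothetical, and other table_tuple_lock() outcomes are deliberately
left out.

```c
#include <assert.h>

/* Stand-ins for the lock outcomes discussed above (hypothetical names). */
typedef enum SketchLockResult
{
	SK_LOCK_OK,					/* models TM_Ok */
	SK_LOCK_UPDATED,			/* models TM_Updated */
	SK_LOCK_DELETED				/* models TM_Deleted */
} SketchLockResult;

typedef enum SketchOutcome
{
	SK_CHECK_PASSED,			/* fast path is done, constraint satisfied */
	SK_FALLBACK_TO_SPI			/* let the SPI path redo the check */
} SketchOutcome;

/* Decide what the fast path does once the tuple lock attempt returns. */
static SketchOutcome
sketch_after_lock(SketchLockResult result)
{
	assert(result == SK_LOCK_OK ||
		   result == SK_LOCK_UPDATED ||
		   result == SK_LOCK_DELETED);

	if (result == SK_LOCK_OK)
		return SK_CHECK_PASSED;

	/* Concurrent update or delete: defer to the executor via SPI. */
	return SK_FALLBACK_TO_SPI;
}
```

The later refinement mentioned above would replace the fallback branch
with chain-following logic; this sketch does not attempt that.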
That’s the direction Junwang and I plan to explore next.
--
Thanks, Amit Langote
* Re: Eliminating SPI / SQL from some RI triggers - take 3
2024-12-20 04:23 Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 04:07 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 05:10 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Pavel Stehule <[email protected]>
2025-10-22 13:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
@ 2025-12-01 06:09 ` Junwang Zhao <[email protected]>
2026-03-02 07:49 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-02 15:30 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
0 siblings, 2 replies; 61+ messages in thread
From: Junwang Zhao @ 2025-12-01 06:09 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Pavel Stehule <[email protected]>; pgsql-hackers
Hi,
As Amit has already stated, we are pursuing a hybrid "fast-path + fallback"
design.
0001 adds a fast-path optimization for foreign key constraint checks
that bypasses the SPI executor. The fast path applies when the referenced
table is not partitioned and the constraint does not involve temporal
semantics.
With the following test:
create table pk (a numeric primary key);
create table fk (a bigint references pk);
insert into pk select generate_series(1, 2000000);
head:
[local] zhjwpku@postgres:5432-90419=# insert into fk select
generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 13516.177 ms (00:13.516)
[local] zhjwpku@postgres:5432-90419=# update fk set a = a + 1;
UPDATE 1000000
Time: 15057.638 ms (00:15.058)
patched:
[local] zhjwpku@postgres:5432-98673=# insert into fk select
generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 8248.777 ms (00:08.249)
[local] zhjwpku@postgres:5432-98673=# update fk set a = a + 1;
UPDATE 1000000
Time: 10117.002 ms (00:10.117)
0002 caches the fast-path metadata used by the index probe. Currently
that is limited to the comparison operator hash entries, operator function
OIDs, strategy numbers, and subtypes for index scans. On its own, this
cache doesn't yet buy any performance improvement.
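The populate-on-first-use pattern behind that cache can be sketched in
isolation as follows. This is a standalone model, not code from 0002:
the fpmeta_valid flag mirrors the field of the same name in the patch,
but SketchConstraintCache, sketch_lookup_strategy(), and the counter are
hypothetical stand-ins for the real catalog lookups.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_KEYS 32

/* Hypothetical per-constraint cache; not the patch's RI_ConstraintInfo. */
typedef struct SketchConstraintCache
{
	int			nkeys;			/* number of key columns */
	bool		fpmeta_valid;	/* has the metadata been populated? */
	int			strats[SKETCH_MAX_KEYS];	/* cached per-column values */
} SketchConstraintCache;

/* Counts how often the "expensive" lookup runs, for demonstration only. */
static int	sketch_catalog_lookups = 0;

/* Stand-in for a per-column catalog lookup; returns a placeholder value. */
static int
sketch_lookup_strategy(int keyno)
{
	(void) keyno;
	sketch_catalog_lookups++;
	return 3;
}

/* Return the cached values, populating them on the first call only. */
static const int *
sketch_get_strategies(SketchConstraintCache *cache)
{
	assert(cache->nkeys <= SKETCH_MAX_KEYS);

	if (!cache->fpmeta_valid)
	{
		for (int i = 0; i < cache->nkeys; i++)
			cache->strats[i] = sketch_lookup_strategy(i);
		cache->fpmeta_valid = true;
	}

	return cache->strats;
}
```

With a two-column key, any number of calls to sketch_get_strategies()
performs exactly two lookups in total, which is the saving the cache is
after.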
Caching additional metadata should improve performance for foreign key checks.
Amit suggested introducing a mechanism for ri_triggers.c to register a
cleanup callback in the EState, which AfterTriggerEndQuery() could then
invoke to release per-statement cached metadata (such as the IndexScanDesc).
However, I haven't been able to implement this mechanism yet.
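Purely as an illustration of what such a registration mechanism could
look like, here is a standalone sketch. It is not PostgreSQL code and
does not use EState or AfterTriggerEndQuery(); SketchQueryState and the
sketch_* functions are hypothetical names, and the fixed-size array is
an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_CALLBACKS 8

typedef void (*SketchCleanupFn) (void *arg);

/* Hypothetical per-statement state that owns the registered callbacks. */
typedef struct SketchQueryState
{
	int			ncallbacks;
	SketchCleanupFn fns[SKETCH_MAX_CALLBACKS];
	void	   *args[SKETCH_MAX_CALLBACKS];
} SketchQueryState;

/* Register a callback to run when the statement ends. */
static void
sketch_register_cleanup(SketchQueryState *qs, SketchCleanupFn fn, void *arg)
{
	assert(qs->ncallbacks < SKETCH_MAX_CALLBACKS);
	qs->fns[qs->ncallbacks] = fn;
	qs->args[qs->ncallbacks] = arg;
	qs->ncallbacks++;
}

/* Run the callbacks in reverse registration order, then forget them. */
static void
sketch_end_query(SketchQueryState *qs)
{
	while (qs->ncallbacks > 0)
	{
		qs->ncallbacks--;
		qs->fns[qs->ncallbacks] (qs->args[qs->ncallbacks]);
	}
}

/* Example callback: models releasing one cached scan descriptor. */
static void
sketch_release_scan(void *arg)
{
	int		   *open_scans = (int *) arg;

	(*open_scans)--;
}
```

In this model the trigger code would call sketch_register_cleanup() the
first time it caches a per-statement resource, and the end-of-query hook
would call sketch_end_query() once.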
Amit and I agree that we can post the patches here for review now. We are
continuing to work on improving the metadata cache implementation.
--
Regards
Junwang Zhao
Attachments:
[application/octet-stream] v2-0002-Cache-fast-path-metadata-for-foreign-key-checks.patch (5.6K, 2-v2-0002-Cache-fast-path-metadata-for-foreign-key-checks.patch)
From 020729139c824d65a008c6644431be8e8efd7800 Mon Sep 17 00:00:00 2001
From: Junwang Zhao <[email protected]>
Date: Mon, 1 Dec 2025 12:58:59 +0800
Subject: [PATCH v2 2/2] Cache fast-path metadata for foreign key checks
The metadata is populated lazily on first use via
ri_populate_fastpath_metadata() and reused in subsequent checks via
build_scankeys_from_cache(). This eliminates repeated calls to
ri_HashCompareOp() and get_op_opfamily_properties() during FK checks.
---
src/backend/utils/adt/ri_triggers.c | 90 +++++++++++++++++++++--------
1 file changed, 65 insertions(+), 25 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index cfb85b9d753..f2e7e4f4ae9 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -94,6 +94,7 @@
#define RI_TRIGTYPE_UPDATE 2
#define RI_TRIGTYPE_DELETE 3
+struct RI_CompareHashEntry;
/*
* RI_ConstraintInfo
@@ -133,6 +134,16 @@ typedef struct RI_ConstraintInfo
Oid agged_period_contained_by_oper; /* fkattr <@ range_agg(pkattr) */
Oid period_intersect_oper; /* anyrange * anyrange */
dlist_node valid_link; /* Link in list of valid entries */
+
+ /* Fast-path metadata for foreign key checks */
+ bool fpmeta_valid; /* is fast-path metadata valid? */
+ // Relation idxrel;
+ // IndexScanDesc idxscan;
+ // TupleTableSlot *outslot;
+ struct RI_CompareHashEntry *compare_entries[RI_MAX_NUMKEYS];
+ RegProcedure regops[RI_MAX_NUMKEYS];
+ Oid subtypes[RI_MAX_NUMKEYS];
+ int strats[RI_MAX_NUMKEYS];
} RI_ConstraintInfo;
/*
@@ -295,6 +306,42 @@ get_fkey_unique_index(Oid conoid)
return result;
}
+static void
+ri_populate_fastpath_metadata(Oid constraintOid,
+ Relation pk_rel, Relation fk_rel, Relation idx_rel)
+{
+ RI_ConstraintInfo *riinfo;
+
+ /* Find the constraint info */
+ riinfo = (RI_ConstraintInfo *)
+ hash_search(ri_constraint_cache,
+ &constraintOid,
+ HASH_FIND,
+ NULL);
+ Assert(riinfo != NULL && riinfo->valid);
+
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ /* Use PK = FK equality operator. */
+ Oid eq_opr = riinfo->pf_eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ Oid lefttype;
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ riinfo->compare_entries[i] = entry;
+ riinfo->regops[i] = get_opcode(eq_opr);
+
+ get_op_opfamily_properties(eq_opr,
+ idx_rel->rd_opfamily[i],
+ false,
+ &riinfo->strats[i],
+ &lefttype,
+ &riinfo->subtypes[i]);
+ }
+
+ riinfo->fpmeta_valid = true;
+}
+
/*
* ri_CheckPermissions
* Check that the new user has permissions to look into the schema of
@@ -365,20 +412,14 @@ recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
}
/*
- * Doesn't include any cache for now.
+ * Build ScanKeys from cached metadata for fast-path foreign key checks
*/
static void
build_scankeys_from_cache(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
- Relation idx_rel, int num_pk,
- Datum *pk_vals, char *pk_nulls,
- ScanKey skeys)
+ Relation idx_rel, Datum *pk_vals,
+ char *pk_nulls, ScanKey skeys)
{
- /* Use PK = FK equality operator. */
- const Oid *eq_oprs = riinfo->pf_eq_oprs;
-
- Assert(num_pk == riinfo->nkeys);
-
/*
* May need to cast each of the individual values of the foreign key
* to the corresponding PK column's type if the equality operator
@@ -388,9 +429,7 @@ build_scankeys_from_cache(const RI_ConstraintInfo *riinfo,
{
if (pk_nulls[i] != 'n')
{
- Oid eq_opr = eq_oprs[i];
- Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
- RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+ RI_CompareHashEntry *entry = riinfo->compare_entries[i];
if (OidIsValid(entry->cast_func_finfo.fn_oid))
pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
@@ -406,20 +445,12 @@ build_scankeys_from_cache(const RI_ConstraintInfo *riinfo,
* Set up ScanKeys for the index scan. This is essentially how
* ExecIndexBuildScanKeys() sets them up.
*/
- for (int i = 0; i < num_pk; i++)
+ for (int i = 0; i < riinfo->nkeys; i++)
{
int pkattrno = i + 1;
- Oid lefttype,
- righttype;
- Oid operator = eq_oprs[i];
- Oid opfamily = idx_rel->rd_opfamily[i];
- int strat;
- RegProcedure regop = get_opcode(operator);
-
- get_op_opfamily_properties(operator, opfamily, false, &strat,
- &lefttype, &righttype);
- ScanKeyEntryInitialize(&skeys[i], 0, pkattrno, strat, righttype,
- idx_rel->rd_indcollation[i], regop,
+
+ ScanKeyEntryInitialize(&skeys[i], 0, pkattrno, riinfo->strats[i], riinfo->subtypes[i],
+ idx_rel->rd_indcollation[i], riinfo->regops[i],
pk_vals[i]);
}
}
@@ -583,7 +614,15 @@ RI_FKey_check(TriggerData *trigdata)
idxrel = index_open(idxoid, RowShareLock);
num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
- build_scankeys_from_cache(riinfo, pk_rel, fk_rel, idxrel, num_pk,
+ Assert(num_pk == riinfo->nkeys);
+
+ /* If fast-path metadata hasn't been populated, do it now */
+ if (!riinfo->fpmeta_valid)
+ ri_populate_fastpath_metadata(riinfo->constraint_id,
+ pk_rel, fk_rel, idxrel);
+ Assert(riinfo->fpmeta_valid);
+
+ build_scankeys_from_cache(riinfo, pk_rel, fk_rel, idxrel,
pk_vals, pk_nulls, skey);
scan = index_beginscan(pk_rel, idxrel, GetActiveSnapshot(), NULL, riinfo->nkeys, 0);
@@ -2663,6 +2702,7 @@ ri_LoadConstraintInfo(Oid constraintOid)
dclist_push_tail(&ri_constraint_cache_valid_list, &riinfo->valid_link);
riinfo->valid = true;
+ riinfo->fpmeta_valid = false;
return riinfo;
}
--
2.41.0
[application/octet-stream] v2-0001-Add-fast-path-for-foreign-key-constraint-checks.patch (26.4K, 3-v2-0001-Add-fast-path-for-foreign-key-constraint-checks.patch)
From c93ee8b6dfd5f345603c327e82b50f1dd8f31cf0 Mon Sep 17 00:00:00 2001
From: Junwang Zhao <[email protected]>
Date: Mon, 1 Dec 2025 12:16:46 +0800
Subject: [PATCH v2 1/2] Add fast path for foreign key constraint checks
Add a fast path optimization for foreign key constraint checks that
bypasses the SPI executor for simple foreign keys by directly probing
the unique index on the referenced table.
The fast path applies when the referenced table is not partitioned,
and the constraint does not involve temporal semantics. It extracts
the FK value, scans the unique index directly, and locks the tuple
with KEY SHARE lock, matching SPI behavior.
This avoids SPI overhead and improves performance for bulk operations
with many FK checks.
Refactoring: Extract tuple locking logic into ExecLockTableTuple() for
reuse.
Author: Amit Langote, Junwang Zhao
Discussion:
---
src/backend/executor/nodeLockRows.c | 164 +++++----
src/backend/utils/adt/ri_triggers.c | 323 +++++++++++++++++-
src/include/executor/executor.h | 9 +
.../expected/fk-concurrent-pk-upd.out | 58 ++++
src/test/isolation/isolation_schedule | 1 +
.../isolation/specs/fk-concurrent-pk-upd.spec | 42 +++
src/test/regress/expected/foreign_key.out | 47 +++
src/test/regress/sql/foreign_key.sql | 64 ++++
8 files changed, 635 insertions(+), 73 deletions(-)
create mode 100644 src/test/isolation/expected/fk-concurrent-pk-upd.out
create mode 100644 src/test/isolation/specs/fk-concurrent-pk-upd.spec
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index a8afbf93b48..06c4784c0f5 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -79,10 +79,7 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
/* clear any leftover test tuple for this rel */
@@ -178,74 +175,11 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
-
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ /* skip tuple if it couldn't be locked */
+ if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed))
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -280,6 +214,94 @@ lnext:
return slot;
}
+
+/*
+ * ExecLockTableTuple
+ * Locks tuple with the specified TID in lockmode following given wait
+ * policy
+ *
+ * Returns true if the tuple was successfully locked. Locked tuple is loaded
+ * into provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *tuple_concurrently_updated)
+{
+ TM_FailureData tmfd;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ TM_Result test;
+
+ if (tuple_concurrently_updated)
+ *tuple_concurrently_updated = false;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully, the locked tuple saved in
+ * slot for EvalPlanQual, if asked by the caller.
+ */
+ if (tmfd.traversed && tuple_concurrently_updated)
+ *tuple_concurrently_updated = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 059fc5ebf60..cfb85b9d753 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -24,12 +24,15 @@
#include "postgres.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/index.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
+#include "catalog/pg_namespace.h"
#include "commands/trigger.h"
#include "executor/executor.h"
#include "executor/spi.h"
@@ -238,6 +241,188 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool is_restrict, bool partgone);
+static bool
+ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo, Relation pk_rel)
+{
+ /*
+ * Partitioned referenced tables are skipped for simplicity, since
+ * they require routing the probe through the correct partition using
+ * PartitionDirectory.
+ * This can be added later as a separate patch once the core mechanism
+ * is stable.
+ */
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return false;
+
+ /*
+ * Temporal foreign keys use range overlap and containment semantics
+ * (&&, <@, range_agg()) that inherently involve aggregation and
+ * multiple-row reasoning, so they stay on the SPI path.
+ */
+ if (riinfo->hasperiod)
+ return false;
+
+ return true;
+}
+
+/*
+ * get_fkey_unique_index
+ * Returns the unique index used by a constraint that is supposed to be a foreign key
+ *
+ * XXX This is very similar to get_constraint_index; probably they should be
+ * unified.
+ */
+static Oid
+get_fkey_unique_index(Oid conoid)
+{
+ Oid result = InvalidOid;
+ HeapTuple tp;
+
+ tp = SearchSysCache1(CONSTROID, ObjectIdGetDatum(conoid));
+ if (HeapTupleIsValid(tp))
+ {
+ Form_pg_constraint contup = (Form_pg_constraint) GETSTRUCT(tp);
+
+ if (contup->contype == CONSTRAINT_FOREIGN)
+ result = contup->conindid;
+ ReleaseSysCache(tp);
+ }
+
+ if (!OidIsValid(result))
+ elog(ERROR, "unique index not found for foreign key constraint %u",
+ conoid);
+
+ return result;
+}
+
+/*
+ * ri_CheckPermissions
+ * Check that the new user has permissions to look into the schema of
+ * and SELECT from 'query_rel'
+ *
+ * Provided for non-SQL implementors of an RI_Plan.
+ */
+static void
+ri_CheckPermissions(Relation query_rel)
+{
+ AclResult aclresult;
+
+ /* USAGE on schema. */
+ aclresult = object_aclcheck(NamespaceRelationId,
+ RelationGetNamespace(query_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(query_rel)));
+
+ /* SELECT on relation. */
+ aclresult = pg_class_aclcheck(RelationGetRelid(query_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(query_rel));
+}
+
+/*
+ * This checks that the index key of the tuple specified in 'new_slot' matches
+ * the key that has already been found in the PK index relation 'idxrel'.
+ *
+ * Returns true if the index key of the tuple matches the existing index
+ * key, false otherwise.
+ */
+static bool
+recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+ TupleTableSlot *new_slot)
+{
+ IndexInfo *indexInfo = BuildIndexInfo(idxrel);
+ Datum values[INDEX_MAX_KEYS];
+ bool isnull[INDEX_MAX_KEYS];
+ bool matched = true;
+
+ /* PK indexes never have these. */
+ Assert(indexInfo->ii_Expressions == NIL &&
+ indexInfo->ii_ExclusionOps == NULL);
+
+ /* Form the index values and isnull flags given the table tuple. */
+ FormIndexDatum(indexInfo, new_slot, NULL, values, isnull);
+ for (int i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
+ {
+ ScanKeyData *skey = &skeys[i];
+
+ /* A PK column can never be set to NULL. */
+ Assert(!isnull[i]);
+ if (!DatumGetBool(FunctionCall2Coll(&skey->sk_func,
+ skey->sk_collation,
+ skey->sk_argument,
+ values[i])))
+ {
+ matched = false;
+ break;
+ }
+ }
+
+ return matched;
+}
+
+/*
+ * Doesn't include any cache for now.
+ */
+static void
+build_scankeys_from_cache(const RI_ConstraintInfo *riinfo,
+ Relation pk_rel, Relation fk_rel,
+ Relation idx_rel, int num_pk,
+ Datum *pk_vals, char *pk_nulls,
+ ScanKey skeys)
+{
+ /* Use PK = FK equality operator. */
+ const Oid *eq_oprs = riinfo->pf_eq_oprs;
+
+ Assert(num_pk == riinfo->nkeys);
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ } else {
+ Assert(false);
+ }
+ }
+
+ /*
+ * Set up ScanKeys for the index scan. This is essentially how
+ * ExecIndexBuildScanKeys() sets them up.
+ */
+ for (int i = 0; i < num_pk; i++)
+ {
+ int pkattrno = i + 1;
+ Oid lefttype,
+ righttype;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idx_rel->rd_opfamily[i];
+ int strat;
+ RegProcedure regop = get_opcode(operator);
+
+ get_op_opfamily_properties(operator, opfamily, false, &strat,
+ &lefttype, &righttype);
+ ScanKeyEntryInitialize(&skeys[i], 0, pkattrno, strat, righttype,
+ idx_rel->rd_indcollation[i], regop,
+ pk_vals[i]);
+ }
+}
/*
* RI_FKey_check -
@@ -349,6 +534,132 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
+ /* Fast path, for simple cases, probe the unique index directly */
+ if (ri_fastpath_is_applicable(riinfo, pk_rel))
+ {
+ Oid idxoid;
+ Relation idxrel;
+ int num_pk;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ Oid saved_userid;
+ int saved_sec_context;
+ bool tuple_concurrently_updated;
+ int tuples_processed = 0;
+
+ elog(DEBUG1,
+ "RI fastpath: constraint \"%s\" using fast path",
+ NameStr(riinfo->conname));
+
+ /*
+ * Extract the unique key from the provided slot and choose the
+ * equality operators to use when scanning the index below.
+ */
+ ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
+
+ /*
+ * Switch to the referenced table's owner to perform the operations below.
+ * This matches what ri_PerformCheck() does.
+ */
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context | SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ ri_CheckPermissions(pk_rel);
+
+ PushActiveSnapshot(GetTransactionSnapshot());
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * Partitioned 'pk_rel' is not handled yet; ri_fastpath_is_applicable() rejects it.
+ */
+ idxoid = get_fkey_unique_index(riinfo->constraint_id);
+ idxrel = index_open(idxoid, RowShareLock);
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+
+ build_scankeys_from_cache(riinfo, pk_rel, fk_rel, idxrel, num_pk,
+ pk_vals, pk_nulls, skey);
+
+ scan = index_beginscan(pk_rel, idxrel, GetActiveSnapshot(), NULL, riinfo->nkeys, 0);
+
+ /* Install the ScanKeys. */
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* XXX this slot should be cached to avoid creating one for each row */
+ outslot = table_slot_create(pk_rel, NULL);
+
+ /* Look for the tuple, and if found, try to lock it in key share mode. */
+ if (!index_getnext_slot(scan, ForwardScanDirection, outslot))
+ ri_ReportViolation(riinfo,
+ pk_rel, fk_rel,
+ newslot,
+ NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+
+ /*
+ * If we fail to lock the tuple for whatever reason, assume it doesn't
+ * exist. If the locked tuple turns out to have been updated
+ * concurrently, recheck below that its key still matches.
+ */
+ if (ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ GetActiveSnapshot(),
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock,
+ &tuple_concurrently_updated))
+ {
+ bool matched = true;
+
+ /*
+ * If the matched table tuple has been updated, check if the key is
+ * still the same.
+ *
+ * This emulates EvalPlanQual() in the executor.
+ */
+ if (tuple_concurrently_updated &&
+ !recheck_matched_pk_tuple(idxrel, skey, outslot))
+ matched = false;
+
+ if (matched)
+ tuples_processed = 1;
+ }
+
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ PopActiveSnapshot();
+
+ /* Restore UID and security context */
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+ if (tuples_processed == 1)
+ {
+ table_close(pk_rel, RowShareLock);
+ return PointerGetDatum(NULL);
+ }
+ else
+ {
+ ri_ReportViolation(riinfo,
+ pk_rel, fk_rel,
+ newslot,
+ NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
+ }
+
+ /* Fall back to SPI */
+ elog(DEBUG1, "RI fastpath: constraint \"%s\" falling back to SPI",
+ NameStr(riinfo->conname));
+
SPI_connect();
/* Fetch or prepare a saved plan for the real check */
@@ -3165,8 +3476,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the values that will be passed to the
+ * operator will be of expected operand type(s). The operator can be
+ * cross-type (such as when called by ri_LookupKeyInPkRel()), in which
+ * case, we only need the cast if the right operand value doesn't match
+ * the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index fa2b657fb2f..8155aa7ae79 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -303,6 +303,15 @@ extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in nodeLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *tuple_concurrently_updated);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
diff --git a/src/test/isolation/expected/fk-concurrent-pk-upd.out b/src/test/isolation/expected/fk-concurrent-pk-upd.out
new file mode 100644
index 00000000000..9bbec638ac9
--- /dev/null
+++ b/src/test/isolation/expected/fk-concurrent-pk-upd.out
@@ -0,0 +1,58 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s2ukey s1i s2c s1c s2s s1s
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2c: COMMIT;
+step s1i: <... completed>
+ERROR: insert or update on table "child" violates foreign key constraint "child_parent_key_fkey"
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 2|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+(0 rows)
+
+
+starting permutation: s2uaux s1i s2c s1c s2s s1s
+step s2uaux: UPDATE parent SET aux = 'bar' WHERE parent_key = 1;
+step s1i: INSERT INTO child VALUES (1, 1);
+step s2c: COMMIT;
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|bar
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
+
+starting permutation: s2ukey s1i s2ukey2 s2c s1c s2s s1s
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2ukey2: UPDATE parent SET parent_key = 1 WHERE parent_key = 2;
+step s2c: COMMIT;
+step s1i: <... completed>
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 112f05a3677..124d4cc289f 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -37,6 +37,7 @@ test: fk-partitioned-2
test: fk-snapshot
test: fk-snapshot-2
test: fk-snapshot-3
+test: fk-concurrent-pk-upd
test: subxid-overflow
test: eval-plan-qual
test: eval-plan-qual-trigger
diff --git a/src/test/isolation/specs/fk-concurrent-pk-upd.spec b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
new file mode 100644
index 00000000000..cba05a85f78
--- /dev/null
+++ b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
@@ -0,0 +1,42 @@
+# Tests that an INSERT on referencing table correctly fails when
+# the referenced value disappears due to a concurrent update
+setup
+{
+ CREATE TABLE parent (
+ parent_key int PRIMARY KEY,
+ aux text NOT NULL
+ );
+
+ CREATE TABLE child (
+ child_key int PRIMARY KEY,
+ parent_key int NOT NULL REFERENCES parent
+ );
+
+ INSERT INTO parent VALUES (1, 'foo');
+}
+
+teardown
+{
+ DROP TABLE parent, child;
+}
+
+session s1
+setup { BEGIN; }
+step s1i { INSERT INTO child VALUES (1, 1); }
+step s1c { COMMIT; }
+step s1s { SELECT * FROM child; }
+
+session s2
+setup { BEGIN; }
+step s2ukey { UPDATE parent SET parent_key = 2 WHERE parent_key = 1; }
+step s2uaux { UPDATE parent SET aux = 'bar' WHERE parent_key = 1; }
+step s2ukey2 { UPDATE parent SET parent_key = 1 WHERE parent_key = 2; }
+step s2c { COMMIT; }
+step s2s { SELECT * FROM parent; }
+
+# fail
+permutation s2ukey s1i s2c s1c s2s s1s
+# ok
+permutation s2uaux s1i s2c s1c s2s s1s
+# ok
+permutation s2ukey s1i s2ukey2 s2c s1c s2s s1s
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 7f9e0ebb82d..eb7d393ea25 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -370,6 +370,53 @@ SELECT * FROM PKTABLE;
DROP TABLE FKTABLE;
DROP TABLE PKTABLE;
--
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+SET ROLE regress_foreign_key_user;
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+RESET ROLE;
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+ERROR: permission denied for table pktable
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
-- Check initial check upon ALTER TABLE
--
CREATE TABLE PKTABLE ( ptest1 int, ptest2 int, PRIMARY KEY(ptest1, ptest2) );
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index 4a6172b8e56..4b2198348d2 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -242,6 +242,70 @@ SELECT * FROM PKTABLE;
DROP TABLE FKTABLE;
DROP TABLE PKTABLE;
+--
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+SET ROLE regress_foreign_key_user;
+
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+
+RESET ROLE;
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+
+DROP USER regress_foreign_key_user;
+
--
-- Check initial check upon ALTER TABLE
--
--
2.41.0
* Re: Eliminating SPI / SQL from some RI triggers - take 3
2024-12-20 04:23 Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 04:07 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 05:10 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Pavel Stehule <[email protected]>
2025-10-22 13:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-12-01 06:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
@ 2026-03-02 07:49 ` Amit Langote <[email protected]>
2026-04-09 11:07 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Sandro Santilli <[email protected]>
1 sibling, 1 reply; 61+ messages in thread
From: Amit Langote @ 2026-03-02 07:49 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Junwang Zhao <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers
Hi Tomas,
Thanks for the thorough benchmarking.
On Sun, Mar 1, 2026 at 9:22 PM Tomas Vondra <[email protected]> wrote:
> On 2/28/26 08:08, Amit Langote wrote:
> > Tomas Vondra also tested with an I/O-intensive workload (dataset
> > larger than shared_buffers, combined with his and Peter Geoghegan's
> > I/O prefetching patches) and confirmed that the batching + SAOP
> > approach helps there too, not just in the CPU-bound / memory-resident
> > case. In fact he showed that the patches here don't make a big dent
> > when the main bottleneck is I/O as shown in numbers that he shared in
> > an off-list email:
> >
> > master: 161617 ms
> > ri-check (0001..0004): 149446 ms (1.08x)
> > ri-check + i/o prefetching: 50885 ms (3.2x)
> >
> > So the RI patches alone only give ~8% here since most time is waiting
> > on reads. But the batching gives the prefetch machinery a window of
> > upcoming probes to issue readahead against, so the two together yield
> > 3.2x.
> >
>
> I tested this (with the index prefetching v11 patch), because I wanted
> to check if the revised API works fine for other use cases, not just the
> regular index scans. Turns out the answer is "yes", the necessary tweaks
> to the FK batching patch were pretty minimal, and at the same time it
> did help quite a bit for cases bottle-necked on I/O.
Do you think those changes to the FK batching are only necessary for
making it work with your patch, or are they worth including in the set
here because they're generally applicable?
> FWIW I wonder how difficult would it be to do something like this for
> inserts into indexes. It's an orthogonal issue to FK checks (especially
> for the CPU-bound cases this thread focuses on), but it's a bit similar
> to the I/O-bound case. In fact, I now realize I actually did a PoC for
> that in 2023-11 [1], but it went stale ...
Interesting. I hadn't seen your earlier PoC. Does the current I/O
prefetching infrastructure simplify that approach, or are they
independent paths? The old patch calls PrefetchBuffer() directly on
the leaf, which seems orthogonal to the scan-side prefetching. Either
way, would be nice to see more paths benefit from batching.
> benchmarks
> ----------
>
> Anyway, thinking about the CPU-bound case, I decided to do a bit of
> testing on my own. I was wondering about three things:
>
> (a) how does the improvement depend on data distribution
> (b) could it cause regressions for small inserts
> (c) how sensitive is the batch size
>
> So I devised two simple benchmarks:
>
> 1) run-pattern.sh - Inserts batches of values into a table, both the
> batch and table can be either random or sequential. It's either 100k or
> 1M rows, logged or unlogged, etc.
>
> 2) run-pgbench.sh - Runs short pgbench inserting data into a table,
> similar to (1), but with very few rows - so the timing approach is not
> suitable to measure this.
>
> Both scripts run against master, and then patched branch with three
> batch sizes (default 64, 16 and 256).
>
>
> results
> -------
>
> The results are very positive - see the attached PDF files comparing the
> patched builds to master.
>
> I have not found a single case where the batching causes regressions.
> This surprised me a bit, I've expected small regressions for single-row
> inserts in the pgbench test, but even that shows a small (~5%) gain.
> Even just 2-row inserts show +25% improvement in pgbench throughput.
This is reassuring. I too was half-expecting the batching
infrastructure to add measurable overhead for single-row inserts, but
it looks like the SPI bypass alone more than covers it.
> There are a couple cases where it matches master, I assume that's for
> I/O bound cases where the CPU optimizations do not really matter. That's
> expected, of course.
>
> I don't see much sensitivity on the batch size. The 256 batches seem to
> be a bit slower, but there's little difference between 16 and 64. So I'd
> say 64 seems reasonable.
Agreed. Interesting that 16 is consistently a little better than 64 in
the patterns benchmark. I'd guess that's the cost of the linear scan
over the batch done for each PK index match showing up, since it's
O(batch_size) per PK match. 256 being noticeably worse fits that
picture. 64 seems like a good middle ground since the pgbench numbers
show virtually no difference between 16 and 64.
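To make the O(batch_size) guess concrete, here is a minimal sketch of the
cost model, not the patch's actual code. It assumes every PK match returned
by the index probe is resolved against the pending FK rows with a plain
linear scan; the function name and its arguments are hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical model, not taken from the patch: for each PK value matched
 * by the index probe, linearly scan the batch of pending FK values and
 * mark the equal ones as satisfied.  Returns the number of comparisons
 * performed, which is always n_matches * batch_size.
 */
size_t
mark_matches_linear(const int *pk_matches, size_t n_matches,
                    const int *fk_batch, size_t batch_size,
                    unsigned char *satisfied)
{
    size_t comparisons = 0;

    for (size_t m = 0; m < n_matches; m++)
    {
        for (size_t i = 0; i < batch_size; i++)
        {
            comparisons++;
            if (fk_batch[i] == pk_matches[m])
                satisfied[i] = 1;
        }
    }

    return comparisons;
}
```

Under this model, if every row in the batch finds its own match, the
comparison count grows with the square of the batch size: 256 comparisons
for a batch of 16 versus 65536 for a batch of 256. That would be in line
with 256 looking noticeably worse, though the real code may well behave
differently.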
The best-case numbers are striking -- when both the PK table and the
FK values being inserted are in sequential order, the unlogged
patterns case hits 4-5x, wow. I guess that makes sense because
sequential FK values turn into a sorted SAOP array that walks
consecutive leaf pages, so it's essentially a single sequential scan
of the relevant index portion.
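A rough way to picture the leaf-page argument, as a sketch under stated
assumptions rather than anything from the patch: model the leaf page of a
key as key / KEYS_PER_LEAF (an invented constant) and count how often
consecutive probes land on a different leaf, before and after sorting the
batch. All names here are hypothetical.

```c
#include <stdlib.h>

/* Assumed number of keys per index leaf page; purely illustrative. */
#define KEYS_PER_LEAF 100

/* qsort comparator for ints that avoids overflow from subtraction. */
static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Count how many times two consecutive probes land on different leaf
 * pages, modelling the leaf page of key k as k / KEYS_PER_LEAF.
 * Keys are assumed to be non-negative.
 */
size_t
count_leaf_switches(const int *keys, size_t nkeys)
{
    size_t switches = 0;

    for (size_t i = 1; i < nkeys; i++)
    {
        if (keys[i] / KEYS_PER_LEAF != keys[i - 1] / KEYS_PER_LEAF)
            switches++;
    }

    return switches;
}

/* Sort the batch in place, the order a sorted SAOP array would present. */
void
sort_batch(int *keys, size_t nkeys)
{
    qsort(keys, nkeys, sizeof(int), cmp_int);
}
```

For the batch {250, 10, 120, 20, 260, 130}, probing in arrival order
switches leaf pages 5 times in this model, while the sorted order switches
only twice, once per page boundary.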
> Overall, I think these results look quite good. I haven't looked at the
> code very closely, not beyond adjusting it to work with index prefetch.
If you get a chance, I'd welcome a closer look. Your memory context
catch was a real bug that I'd missed entirely. The area that would
benefit most from a second pair of eyes is the snapshot and permission
caching semantics in 0002. The argument for why reusing the snapshot
and checking permissions once per batch, rather than per row, is safe
is sound, I think, but the effects are global and hard to validate by
testing alone.
--
Thanks, Amit Langote
* Re: Eliminating SPI / SQL from some RI triggers - take 3
2024-12-20 04:23 Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 04:07 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 05:10 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Pavel Stehule <[email protected]>
2025-10-22 13:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-12-01 06:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-02 07:49 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
@ 2026-04-09 11:07 ` Sandro Santilli <[email protected]>
2026-04-09 11:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
0 siblings, 1 reply; 61+ messages in thread
From: Sandro Santilli @ 2026-04-09 11:07 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Amit Langote <[email protected]>; Junwang Zhao <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers
On Mon, Mar 02, 2026 at 01:34:41PM +0100, Tomas Vondra wrote:
>
> TBH I haven't noticed the memory context issue myself, I only noticed
> because the builds with index prefetch started crashing.
We're getting a crash in PostGIS too, since that commit was merged into
the master branch, see https://trac.osgeo.org/postgis/ticket/6066
The crash is triggered by a C function using SPI.
--strk;
Attachments:
[application/pgp-signature] signature.asc (659B, 2-signature.asc)
* Re: Eliminating SPI / SQL from some RI triggers - take 3
2024-12-20 04:23 Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 04:07 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 05:10 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Pavel Stehule <[email protected]>
2025-10-22 13:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-12-01 06:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-02 07:49 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-09 11:07 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Sandro Santilli <[email protected]>
@ 2026-04-09 11:55 ` Amit Langote <[email protected]>
2026-04-09 16:01 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Sandro Santilli <[email protected]>
0 siblings, 1 reply; 61+ messages in thread
From: Amit Langote @ 2026-04-09 11:55 UTC (permalink / raw)
To: Sandro Santilli <[email protected]>; Tomas Vondra <[email protected]>; Amit Langote <[email protected]>; Junwang Zhao <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers
Hi Sandro,
On Thu, Apr 9, 2026 at 8:07 PM Sandro Santilli <[email protected]> wrote:
> On Mon, Mar 02, 2026 at 01:34:41PM +0100, Tomas Vondra wrote:
> >
> > TBH I haven't noticed the memory context issue myself, I only noticed
> > because the builds with index prefetch started crashing.
>
> We're getting a crash in PostGIS too, since that commit was merged into
> the master branch, see https://trac.osgeo.org/postgis/ticket/6066
>
> The crash is triggered by a C function using SPI.
Evan Montgomery-Recht posted a report of the same issue on this thread
a couple of days ago.
I have posted a patch to fix the issue, which I will commit tomorrow
after a bit more testing.
--
Thanks, Amit Langote
* Re: Eliminating SPI / SQL from some RI triggers - take 3
2024-12-20 04:23 Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 04:07 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 05:10 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Pavel Stehule <[email protected]>
2025-10-22 13:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-12-01 06:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-02 07:49 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-09 11:07 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Sandro Santilli <[email protected]>
2026-04-09 11:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
@ 2026-04-09 16:01 ` Sandro Santilli <[email protected]>
2026-04-10 04:14 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
0 siblings, 1 reply; 61+ messages in thread
From: Sandro Santilli @ 2026-04-09 16:01 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Junwang Zhao <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers
On Thu, Apr 09, 2026 at 08:55:01PM +0900, Amit Langote wrote:
> Hi Sandro,
>
> On Thu, Apr 9, 2026 at 8:07 PM Sandro Santilli <[email protected]> wrote:
> > On Mon, Mar 02, 2026 at 01:34:41PM +0100, Tomas Vondra wrote:
> > >
> > > TBH I haven't noticed the memory context issue myself, I only noticed
> > > because the builds with index prefetch started crashing.
> >
> > We're getting a crash in PostGIS too, since that commit was merged into
> > the master branch, see https://trac.osgeo.org/postgis/ticket/6066
> >
> > > The crash is triggered by a C function using SPI.
>
> Evan Montgomery-Recht posted a report of the same issue on this thread
> a couple of days ago.
I confirm the patch attached in Evan's email [1] fixes the crash for us.
[1] https://www.postgresql.org/message-id/CAEg7pwcKf01FmDqFAf-Hzu_pYnMYScY_Otid-pe9uw3BJ6gq9g%40mail.gma...
> I have posted a patch to fix the issue, which I will commit tomorrow
> after a bit more testing.
I also confirm your patch v4-0001-Fix-RI-fast-path-crash-under-nested-C-level-SPI.patch
fixes the crash for us. Thank you !
Let me know when it is time to test again against master.
--strk;
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Amit Langote @ 2026-04-10 04:14 UTC (permalink / raw)
To: Sandro Santilli <[email protected]>; Amit Langote <[email protected]>; Tomas Vondra <[email protected]>; Junwang Zhao <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers
Hi Sandro,
On Fri, Apr 10, 2026 at 1:02 AM Sandro Santilli <[email protected]> wrote:
> On Thu, Apr 09, 2026 at 08:55:01PM +0900, Amit Langote wrote:
> > Hi Sandro,
> >
> > On Thu, Apr 9, 2026 at 8:07 PM Sandro Santilli <[email protected]> wrote:
> > > On Mon, Mar 02, 2026 at 01:34:41PM +0100, Tomas Vondra wrote:
> > > >
> > > > TBH I haven't noticed the memory context issue myself, I only noticed
> > > > because the builds with index prefetch started crashing.
> > >
> > > We're getting a crash in PostGIS too, since that commit was merged into
> > > the master branch, see https://trac.osgeo.org/postgis/ticket/6066
> > >
> > > > The crash is triggered by a C function using SPI.
> >
> > Evan Montgomery-Recht posted a report of the same issue on this thread
> > a couple of days ago.
>
> I confirm the patch attached in Evan's email [1] fixes the crash for us.
>
> [1] https://www.postgresql.org/message-id/CAEg7pwcKf01FmDqFAf-Hzu_pYnMYScY_Otid-pe9uw3BJ6gq9g%40mail.gma...
>
> > I have posted a patch to fix the issue, which I will commit tomorrow
> > after a bit more testing.
>
> I also confirm your patch v4-0001-Fix-RI-fast-path-crash-under-nested-C-level-SPI.patch
> fixes the crash for us. Thank you !
Thanks for confirming that.
> Let me know when it is time to test again against master.
I have just pushed 0001 which you'll find in master as 34a3078629.
--
Thanks, Amit Langote
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Chao Li @ 2026-04-10 04:20 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Sandro Santilli <[email protected]>; Tomas Vondra <[email protected]>; Junwang Zhao <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; jie wang <[email protected]>
> On Apr 10, 2026, at 12:14, Amit Langote <[email protected]> wrote:
>
> Hi Sandro,
>
> On Fri, Apr 10, 2026 at 1:02 AM Sandro Santilli <[email protected]> wrote:
>> On Thu, Apr 09, 2026 at 08:55:01PM +0900, Amit Langote wrote:
>>> Hi Sandro,
>>>
>>> On Thu, Apr 9, 2026 at 8:07 PM Sandro Santilli <[email protected]> wrote:
>>>> On Mon, Mar 02, 2026 at 01:34:41PM +0100, Tomas Vondra wrote:
>>>>>
>>>>> TBH I haven't noticed the memory context issue myself, I only noticed
>>>>> because the builds with index prefetch started crashing.
>>>>
>>>> We're getting a crash in PostGIS too, since that commit was merged into
>>>> the master branch, see https://trac.osgeo.org/postgis/ticket/6066
>>>>
>>>> The crash is triggered by a C function using SPI.
>>>
>>> Evan Montgomery-Recht posted a report of the same issue on this thread
>>> a couple of days ago.
>>
>> I confirm the patch attached in Evan's email [1] fixes the crash for us.
>>
>> [1] https://www.postgresql.org/message-id/CAEg7pwcKf01FmDqFAf-Hzu_pYnMYScY_Otid-pe9uw3BJ6gq9g%40mail.gma...
>>
>>> I have posted a patch to fix the issue, which I will commit tomorrow
>>> after a bit more testing.
>>
>> I also confirm your patch v4-0001-Fix-RI-fast-path-crash-under-nested-C-level-SPI.patch
>> fixes the crash for us. Thank you !
>
> Thanks for confirming that.
>
>> Let me know when it is time to test again against master.
>
> I have just pushed 0001 which you'll find in master as 34a3078629.
>
> --
> Thanks, Amit Langote
Hi Amit, it looks like you missed fixing the typo that Jie pointed out. In 34a307862930056e1976471d6d81a5e2efc148df,
```
+ bool firing_batch_callbacks; /* true when in
+ * FireAfterTriggersBatchCallbacks() */
```
The typo is still there.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Amit Langote @ 2026-04-10 04:21 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Sandro Santilli <[email protected]>; Tomas Vondra <[email protected]>; Junwang Zhao <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; jie wang <[email protected]>
On Fri, Apr 10, 2026 at 1:21 PM Chao Li <[email protected]> wrote:
> > On Apr 10, 2026, at 12:14, Amit Langote <[email protected]> wrote:
> >
> > Hi Sandro,
> >
> > On Fri, Apr 10, 2026 at 1:02 AM Sandro Santilli <[email protected]> wrote:
> >> On Thu, Apr 09, 2026 at 08:55:01PM +0900, Amit Langote wrote:
> >>> Hi Sandro,
> >>>
> >>> On Thu, Apr 9, 2026 at 8:07 PM Sandro Santilli <[email protected]> wrote:
> >>>> On Mon, Mar 02, 2026 at 01:34:41PM +0100, Tomas Vondra wrote:
> >>>>>
> >>>>> TBH I haven't noticed the memory context issue myself, I only noticed
> >>>>> because the builds with index prefetch started crashing.
> >>>>
> >>>> We're getting a crash in PostGIS too, since that commit was merged into
> >>>> the master branch, see https://trac.osgeo.org/postgis/ticket/6066
> >>>>
> >>>> The crash is triggered by a C function using SPI.
> >>>
> >>> Evan Montgomery-Recht posted a report of the same issue on this thread
> >>> a couple of days ago.
> >>
> >> I confirm the patch attached in Evan's email [1] fixes the crash for us.
> >>
> >> [1] https://www.postgresql.org/message-id/CAEg7pwcKf01FmDqFAf-Hzu_pYnMYScY_Otid-pe9uw3BJ6gq9g%40mail.gma...
> >>
> >>> I have posted a patch to fix the issue, which I will commit tomorrow
> >>> after a bit more testing.
> >>
> >> I also confirm your patch v4-0001-Fix-RI-fast-path-crash-under-nested-C-level-SPI.patch
> >> fixes the crash for us. Thank you !
> >
> > Thanks for confirming that.
> >
> >> Let me know when it is time to test again against master.
> >
> > I have just pushed 0001 which you'll find in master as 34a3078629.
> >
> > --
> > Thanks, Amit Langote
>
> Hi Amit, it looks like you missed fixing the typo that Jie pointed out. In 34a307862930056e1976471d6d81a5e2efc148df,
> ```
> + bool firing_batch_callbacks; /* true when in
> + * FireAfterTriggersBatchCallbacks() */
> ```
> The typo is still there.
Yep, my bad. Will push a fix shortly.
--
Thanks, Amit Langote
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Sandro Santilli @ 2026-04-10 18:35 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Junwang Zhao <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers
On Fri, Apr 10, 2026 at 01:14:11PM +0900, Amit Langote wrote:
>
> I have just pushed 0001 which you'll find in master as 34a3078629.
No crash with commit 2a3d2f9f68da0c430c497bf29f60373f5214307d
(which includes 34a3078629).
Thank you !
--strk;
Libre GIS consultant/developer 🎺
https://strk.kbt.io/services.html
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Junwang Zhao @ 2026-03-02 15:30 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Pavel Stehule <[email protected]>; pgsql-hackers
On Sat, Feb 28, 2026 at 3:08 PM Amit Langote <[email protected]> wrote:
>
> Hi Junwang,
>
> On Mon, Feb 23, 2026 at 10:45 PM Junwang Zhao <[email protected]> wrote:
> > On Thu, Feb 19, 2026 at 5:21 PM Amit Langote <[email protected]> wrote:
> > > I re-ran the benchmarks (same test as yours, different machine):
> > >
> > > create table pk (a numeric primary key);
> > > create table fk (a bigint references pk);
> > > insert into pk select generate_series(1, 2000000);
> > > insert into fk select generate_series(1, 2000000, 2);
> > >
> > > master: 2444 ms (median of 3 runs)
> > > 0001: 1382 ms (43% faster)
> > > 0001+0002: 1202 ms (51% faster, 13% over 0001 alone)
> >
> > I can get similar improvement on my old mac intel chip:
> >
> > master: 12963.993 ms
> > 0001: 6641.692 ms, 48.8% faster
> > 0001+0002: 5771.703 ms, 55.5% faster
> > >
> > > Also, with int PK / int FK (1M rows):
> > >
> > > create table pk (a int primary key);
> > > create table fk (a int references pk);
> > > insert into pk select generate_series(1, 1000000);
> > > insert into fk select generate_series(1, 1000000);
> > >
> > > master: 1000 ms
> > > 0001: 520 ms (48% faster)
> > > 0001+0002: 432 ms (57% faster, 17% over 0001 alone)
> >
> > master: 11134.583 ms
> > 0001: 5240.298 ms, 52.9% faster
> > 0001+0002: 4554.215 ms, 59.1% faster
>
> Thanks for testing, good to see similar numbers. I had forgotten to
> note that these results are when these PK index probes don't do any
> I/O, though you might be aware of that. Below, I report some numbers
> that Tomas Vondra shared with me off-list where the probes do have to
> perform I/O and there the benefits from only this patch set are only
> marginal.
>
> > I don't have any additional comments on the patch except one minor nit,
> > maybe merge the following two if conditions into one, not a strong opinion
> > though.
> >
> > if (use_cache)
> > {
> >     /*
> >      * The snapshot was registered once when the cache entry was created.
> >      * We just patch curcid to reflect the new command counter.
> >      * SnapshotSetCommandId() only patches process-global statics, not
> >      * registered copies, so we do it directly.
> >      *
> >      * The xmin/xmax/xip fields don't need refreshing: within a single
> >      * statement batch, only curcid changes between rows.
> >      */
> >     Assert(fpentry && fpentry->snapshot != NULL);
> >     snapshot = fpentry->snapshot;
> >     snapshot->curcid = GetCurrentCommandId(false);
> > }
> > else
> >     snapshot = RegisterSnapshot(GetLatestSnapshot());
> >
> > if (use_cache)
> > {
> >     pk_rel = fpentry->pk_rel;
> >     idx_rel = fpentry->idx_rel;
> >     scandesc = fpentry->scandesc;
> >     slot = fpentry->slot;
> > }
> > else
> > {
> >     pk_rel = table_open(riinfo->pk_relid, RowShareLock);
> >     idx_rel = index_open(riinfo->conindid, AccessShareLock);
> >     scandesc = index_beginscan(pk_rel, idx_rel,
> >                                snapshot, NULL,
> >                                riinfo->nkeys, 0);
> >     slot = table_slot_create(pk_rel, NULL);
> > }
>
> Good idea, done.
>
> While polishing 0002, I revisited the snapshot caching semantics. The
> previous commit message hand-waved about only curcid changing between
> rows, but GetLatestSnapshot() also reflects other backends' commits,
> so reusing the snapshot is a deliberate semantic change from the SPI
> path. I think it's safe because curcid is all we need for
> intra-statement visibility, concurrent commits either already happened
> before our snapshot (and are visible) or are racing with our statement
> and wouldn't be seen reliably even with per-row snapshots since the
> order in which FK rows are checked is nondeterministic, and
> LockTupleKeyShare prevents the PK row from disappearing regardless. In
> essence, we're treating all the FK checks within a trigger-firing
> cycle as a single plan execution that happens to scan N rows, rather
> than N independent SPI queries each taking a fresh snapshot. That's
> the natural model -- a normal SELECT ... FOR KEY SHARE plan doesn't
> re-take GetLatestSnapshot() between rows either.
>
> Similarly, the permission check (schema USAGE + table SELECT) is now
> done once at cache entry creation in ri_FastPathGetEntry() rather than
> on every flush.
nice improvement.
> The RI check runs as the PK table owner, so we're
> verifying that the owner can access their own table -- a condition
> that won't change unless someone explicitly revokes from the owner,
> which would also break the SPI path.
>
> > > David Rowley mentioned off-list that it might be worth batching
> > > multiple FK values into a single index probe, leveraging the
> > > ScalarArrayOp btree improvements from PostgreSQL 17. The idea would be
> > > to buffer FK values across trigger invocations in the per-constraint
> > > cache (0002 already has the right structure for this), build a
> > > SK_SEARCHARRAY scan key, and let the btree AM walk the matching leaf
> > > pages in one sorted traversal instead of one tree descent per row. The
> > > locking and recheck would still be per-tuple, but the index traversal
> > > cost drops significantly. Single-column FKs are the obvious starting
> > > point. That seems worth exploring but can be done as a separate patch
> > > on top of this.
> >
> > I will take a look at this in the following weeks.
>
> I ended up going ahead with the batching and SAOP idea that David
> mentioned -- I had a proof-of-concept working shortly after posting v3
> and kept iterating on it. So attached set is now:
>
> 0001 - Core fast path (your 0001+0002 reworked, as before)
>
> 0002 - Per-batch resource caching (PK relation, index, scandesc, snapshot)
>
> 0003 - FK row buffering: materialize FK tuples into a per-constraint
> batch buffer (64 rows), flush when full or at batch end
>
> 0004 - SK_SEARCHARRAY for single-column FKs: build an array from the
> buffered FK values and do one index scan instead of 64 separate tree
> descents. Multi-column FKs fall back to a per-row loop.
>
> 0003 is pure infrastructure -- it doesn't improve performance on its
> own because the per-row index descent still dominates. The payoff
> comes in 0004.
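To make the buffer-then-probe-once idea from 0003 and 0004 concrete, here is a toy model in Python. It is only a sketch under loud assumptions, not code from the patches: a sorted Python list stands in for the PK btree, `FKBatchChecker` and its methods are hypothetical names, and locking, rechecks, and snapshots are left out entirely. It shows two things: values are buffered up to 64 and flushed when the buffer fills or at batch end, and each flush walks the sorted keys once in ascending order instead of starting a fresh search from the root for every row.

```python
from bisect import bisect_left

BATCH_SIZE = 64  # buffer size mentioned for 0003


class FKBatchChecker:
    """Toy model: buffer FK values, then check each batch in one ordered pass."""

    def __init__(self, pk_keys, batch_size=BATCH_SIZE):
        self.pk_keys = sorted(pk_keys)  # stands in for the PK index (assumption)
        self.batch_size = batch_size
        self.buffer = []
        self.flushes = 0
        self.missing = []  # FK values that found no matching PK key

    def add(self, fk_value):
        """Buffer one FK value; flush automatically when the buffer is full."""
        self.buffer.append(fk_value)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Check all buffered values in one left-to-right pass over the keys."""
        if not self.buffer:
            return
        self.flushes += 1
        pos = 0
        for value in sorted(set(self.buffer)):
            # Resume the search from the previous position rather than from
            # the start; this mimics one sorted traversal for the whole batch.
            pos = bisect_left(self.pk_keys, value, pos)
            if pos == len(self.pk_keys) or self.pk_keys[pos] != value:
                self.missing.append(value)
        self.buffer.clear()


if __name__ == "__main__":
    checker = FKBatchChecker(range(1, 101))
    for v in range(1, 131, 2):
        checker.add(v)
    checker.flush()  # the "batch end" flush for the leftover row
    print(checker.flushes, checker.missing)
```

Feeding 65 values through the 64-row buffer gives two flushes: one when the buffer fills and one explicit flush at batch end for the remaining row.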
>
> Numbers (same machine as before, median of 3 runs):
>
> numeric PK / bigint FK, 1M rows:
> master: 2487 ms
> 0001..0004: 1168 ms (2.1x)
>
> int PK / int FK, 500K rows:
> master: 1043 ms
> 0001..0004: 335 ms (3.1x)
>
> The int/int case benefits most because the per-row cost is lower, so
> the SAOP traversal savings are a larger fraction of the total. The
> numeric/bigint case still sees a solid improvement despite the
> cross-type cast overhead.
>
> Tomas Vondra also tested with an I/O-intensive workload (dataset
> larger than shared_buffers, combined with his and Peter Geoghegan's
> I/O prefetching patches) and confirmed that the batching + SAOP
> approach helps there too, not just in the CPU-bound / memory-resident
> case. In fact he showed that the patches here don't make a big dent
> when the main bottleneck is I/O as shown in numbers that he shared in
> an off-list email:
>
> master: 161617 ms
> ri-check (0001..0004): 149446 ms (1.08x)
> ri-check + i/o prefetching: 50885 ms (3.2x)
>
> So the RI patches alone only give ~8% here since most time is waiting
> on reads. But the batching gives the prefetch machinery a window of
> upcoming probes to issue readahead against, so the two together yield
> 3.2x.
impressive!
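As a quick sanity check on the ratios quoted above, the arithmetic can be reproduced with two small helpers. This is just a sketch: `speedup` and `percent_faster` are hypothetical helper names, and the inputs are the timings reported in this thread. Note that "1.08x" is a throughput ratio, while the elapsed time drops by about 7.5%.

```python
def speedup(baseline_ms, patched_ms):
    """Ratio of baseline time to patched time, e.g. 2.0 means twice as fast."""
    return baseline_ms / patched_ms


def percent_faster(baseline_ms, patched_ms):
    """Reduction in elapsed time, as a percentage of the baseline."""
    return 100.0 * (1.0 - patched_ms / baseline_ms)


if __name__ == "__main__":
    # I/O-bound numbers shared by Tomas Vondra (ms)
    print(round(speedup(161617, 149446), 2))  # ri-check alone
    print(round(speedup(161617, 50885), 1))   # ri-check + I/O prefetching
    # CPU-bound numeric PK / bigint FK run: master vs 0001
    print(round(percent_faster(2444, 1382)))
```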
>
> Tomas also caught a memory context bug in the batch flush path: the
> cached scandesc lives in TopTransactionContext, but the btree AM
> defers _bt_preprocess_keys allocation to the first getnext call, which
> pallocs into CurrentMemoryContext. If that's a short-lived
> per-trigger-row context, the scandesc has dangling pointers on the
> next rescan. Fixed by switching to TopTransactionContext before the
> probe loop.
>
> Finally, I've fixed a number of other small and not-so-small bugs
> found while polishing the old patches and made other stylistic
> improvements. One notable change is that I introduced a FastPathMeta
Yeah, this is much better than the fpmeta_valid field.
> struct to store the fast path metadata instead of dumping those arrays
> in the RI_ConstraintInfo. It's allocated lazily on first use and holds
> the per-key compare entries, operator procedures, and index strategy
> info needed by the scan key construction, so RI_ConstraintInfo doesn't
> pay for them when the fast path isn't used.
>
>
> On Mon, Feb 23, 2026 at 10:45 PM Junwang Zhao <[email protected]> wrote:
> >
> > Hi Amit,
> >
> > On Thu, Feb 19, 2026 at 5:21 PM Amit Langote <[email protected]> wrote:
> > >
> > > Hi Junwang,
> > >
> > > On Mon, Dec 1, 2025 at 3:09 PM Junwang Zhao <[email protected]> wrote:
> > > > As Amit has already stated, we are approaching a hybrid "fast-path + fallback"
> > > > design.
> > > >
> > > > 0001 adds a fast path optimization for foreign key constraint checks
> > > > that bypasses the SPI executor, the fast path applies when the referenced
> > > > table is not partitioned, and the constraint does not involve temporal
> > > > semantics.
> > > >
> > > > With the following test:
> > > >
> > > > create table pk (a numeric primary key);
> > > > create table fk (a bigint references pk);
> > > > insert into pk select generate_series(1, 2000000);
> > > >
> > > > head:
> > > >
> > > > [local] zhjwpku@postgres:5432-90419=# insert into fk select
> > > > generate_series(1, 2000000, 2);
> > > > INSERT 0 1000000
> > > > Time: 13516.177 ms (00:13.516)
> > > >
> > > > [local] zhjwpku@postgres:5432-90419=# update fk set a = a + 1;
> > > > UPDATE 1000000
> > > > Time: 15057.638 ms (00:15.058)
> > > >
> > > > patched:
> > > >
> > > > [local] zhjwpku@postgres:5432-98673=# insert into fk select
> > > > generate_series(1, 2000000, 2);
> > > > INSERT 0 1000000
> > > > Time: 8248.777 ms (00:08.249)
> > > >
> > > > [local] zhjwpku@postgres:5432-98673=# update fk set a = a + 1;
> > > > UPDATE 1000000
> > > > Time: 10117.002 ms (00:10.117)
> > > >
> > > > 0002 caches fast-path metadata used by the index probe. At present
> > > > that covers only the comparison operator hash entries, operator function
> > > > OIDs, and strategy numbers and subtypes for index scans. This cache on
> > > > its own doesn't buy any performance improvement.
> > > >
> > > > Caching additional metadata should improve performance for foreign key checks.
> > > >
> > > > Amit suggested introducing a mechanism for ri_triggers.c to register a
> > > > cleanup callback in the EState, which AfterTriggerEndQuery() could then
> > > > invoke to release per-statement cached metadata (such as the IndexScanDesc).
> > > > However, I haven't been able to implement this mechanism yet.
> > >
> > > Thanks for working on this. I've taken your patches as a starting
> > > point and reworked the series into two patches (attached): 1st is your
> > > 0001+0002 as the core patch that adds a gated fast-path alternative to
> > > SPI and 2nd where I added per-statement resource caching. Doing the
> > > latter turned out to be not so hard thanks to the structure you chose
> > > to build the core fast path. Good call on adding the RLS and ACL test
> > > cases, btw.
> > >
> > > So, 0001 is a functionally complete fast path: concurrency handling,
> > > REPEATABLE READ crosscheck, cross-type operators, security context,
> > > and metadata caching. 0002 implements the per-statement resource
> > > caching we discussed, though instead of sharing the EState between
> > > trigger.c and ri_triggers.c it uses a new AfterTriggerBatchCallback
> > > mechanism that fires at the end of each trigger-firing cycle
> > > (per-statement for immediate constraints, or until COMMIT for deferred
> > > ones). It layers resource caching on top so that the PK relation,
> > > index, scan descriptor, and snapshot stay open across all FK trigger
> > > invocations within a single trigger-firing cycle rather than being
> > > opened and closed per row.
> > >
> > > Note that the previous 0002 (metadata caching) is folded into 0001,
> > > and most of the new fast-path logic added in 0001 now lives in
> > > ri_FastPathCheck() rather than inline in RI_FKey_check(), so the
> > > RI_FKey_check diff is just the gating call and SPI fallback.
> > >
> > > I re-ran the benchmarks (same test as yours, different machine):
> > >
> > > create table pk (a numeric primary key);
> > > create table fk (a bigint references pk);
> > > insert into pk select generate_series(1, 2000000);
> > > insert into fk select generate_series(1, 2000000, 2);
> > >
> > > master: 2444 ms (median of 3 runs)
> > > 0001: 1382 ms (43% faster)
> > > 0001+0002: 1202 ms (51% faster, 13% over 0001 alone)
> >
> > I can get similar improvement on my old mac intel chip:
> >
> > master: 12963.993 ms
> > 0001: 6641.692 ms, 48.8% faster
> > 0001+0002: 5771.703 ms, 55.5% faster
> >
> > >
> > > Also, with int PK / int FK (1M rows):
> > >
> > > create table pk (a int primary key);
> > > create table fk (a int references pk);
> > > insert into pk select generate_series(1, 1000000);
> > > insert into fk select generate_series(1, 1000000);
> > >
> > > master: 1000 ms
> > > 0001: 520 ms (48% faster)
> > > 0001+0002: 432 ms (57% faster, 17% over 0001 alone)
> >
> > master: 11134.583 ms
> > 0001: 5240.298 ms, 52.9% faster
> > 0001+0002: 4554.215 ms, 59.1% faster
> >
> > >
> > > The incremental gain from 0002 comes from eliminating per-row relation
> > > open/close, scan begin/end, slot alloc/free, and replacing per-row
> > > GetSnapshotData() with only curcid adjustment on the registered
> > > snapshot copy in the cache.
> > >
> > > The two current limitations are partitioned referenced tables and
> > > temporal foreign keys. Partitioned PKs are relatively uncommon in
> > > practice, so the non-partitioned case should cover most FK workloads,
> > > so I'm not sure it's worth the added complexity to support them.
> > > Temporal FKs are inherently multi-row, so they're a poor fit for a
> > > single-probe fast path.
> > >
> > > David Rowley mentioned off-list that it might be worth batching
> > > multiple FK values into a single index probe, leveraging the
> > > ScalarArrayOp btree improvements from PostgreSQL 17. The idea would be
> > > to buffer FK values across trigger invocations in the per-constraint
> > > cache (0002 already has the right structure for this), build a
> > > SK_SEARCHARRAY scan key, and let the btree AM walk the matching leaf
> > > pages in one sorted traversal instead of one tree descent per row. The
> > > locking and recheck would still be per-tuple, but the index traversal
> > > cost drops significantly. Single-column FKs are the obvious starting
> > > point. That seems worth exploring but can be done as a separate patch
> > > on top of this.
> >
> > I will take a look at this in the following weeks.
> >
> > >
> > > I think the series is in reasonable shape but would appreciate extra
> > > eyeballs, especially on the concurrency handling in ri_LockPKTuple()
> > > in 0001 and the snapshot lifecycle in 0002. Or anything else that
> > > catches one's eye.
> > >
> > > --
> > > Thanks, Amit Langote
> >
> > I don't have any additional comments on the patch except one minor nit,
> > maybe merge the following two if conditions into one, not a strong opinion
> > though.
> >
> > > if (use_cache)
> > > {
> > >     /*
> > >      * The snapshot was registered once when the cache entry was created.
> > >      * We just patch curcid to reflect the new command counter.
> > >      * SnapshotSetCommandId() only patches process-global statics, not
> > >      * registered copies, so we do it directly.
> > >      *
> > >      * The xmin/xmax/xip fields don't need refreshing: within a single
> > >      * statement batch, only curcid changes between rows.
> > >      */
> > >     Assert(fpentry && fpentry->snapshot != NULL);
> > >     snapshot = fpentry->snapshot;
> > >     snapshot->curcid = GetCurrentCommandId(false);
> > > }
> > > else
> > >     snapshot = RegisterSnapshot(GetLatestSnapshot());
> > >
> > > if (use_cache)
> > > {
> > >     pk_rel = fpentry->pk_rel;
> > >     idx_rel = fpentry->idx_rel;
> > >     scandesc = fpentry->scandesc;
> > >     slot = fpentry->slot;
> > > }
> > > else
> > > {
> > >     pk_rel = table_open(riinfo->pk_relid, RowShareLock);
> > >     idx_rel = index_open(riinfo->conindid, AccessShareLock);
> > >     scandesc = index_beginscan(pk_rel, idx_rel,
> > >                                snapshot, NULL,
> > >                                riinfo->nkeys, 0);
> > >     slot = table_slot_create(pk_rel, NULL);
> > > }
> >
> > --
> > Regards
> > Junwang Zhao
>
>
>
> --
> Thanks, Amit Langote
--
Regards
Junwang Zhao
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Junwang Zhao @ 2026-03-10 12:28 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Pavel Stehule <[email protected]>; pgsql-hackers
Hi,
On Mon, Mar 2, 2026 at 11:30 PM Junwang Zhao <[email protected]> wrote:
>
> On Sat, Feb 28, 2026 at 3:08 PM Amit Langote <[email protected]> wrote:
> >
> > Hi Junwang,
> >
> > On Mon, Feb 23, 2026 at 10:45 PM Junwang Zhao <[email protected]> wrote:
> > > On Thu, Feb 19, 2026 at 5:21 PM Amit Langote <[email protected]> wrote:
> > > > I re-ran the benchmarks (same test as yours, different machine):
> > > >
> > > > create table pk (a numeric primary key);
> > > > create table fk (a bigint references pk);
> > > > insert into pk select generate_series(1, 2000000);
> > > > insert into fk select generate_series(1, 2000000, 2);
> > > >
> > > > master: 2444 ms (median of 3 runs)
> > > > 0001: 1382 ms (43% faster)
> > > > 0001+0002: 1202 ms (51% faster, 13% over 0001 alone)
> > >
> > > I can get similar improvement on my old mac intel chip:
> > >
> > > master: 12963.993 ms
> > > 0001: 6641.692 ms, 48.8% faster
> > > 0001+0002: 5771.703 ms, 55.5% faster
> > > >
> > > > Also, with int PK / int FK (1M rows):
> > > >
> > > > create table pk (a int primary key);
> > > > create table fk (a int references pk);
> > > > insert into pk select generate_series(1, 1000000);
> > > > insert into fk select generate_series(1, 1000000);
> > > >
> > > > master: 1000 ms
> > > > 0001: 520 ms (48% faster)
> > > > 0001+0002: 432 ms (57% faster, 17% over 0001 alone)
> > >
> > > master: 11134.583 ms
> > > 0001: 5240.298 ms, 52.9% faster
> > > 0001+0002: 4554.215 ms, 59.1% faster
> >
> > Thanks for testing, good to see similar numbers. I had forgotten to
> > note that these results are when these PK index probes don't do any
> > I/O, though you might be aware of that. Below, I report some numbers
> > that Tomas Vondra shared with me off-list where the probes do have to
> > perform I/O and there the benefits from only this patch set are only
> > marginal.
> >
> > > I don't have any additional comments on the patch except one minor nit,
> > > maybe merge the following two if conditions into one, not a strong opinion
> > > though.
> > >
> > > if (use_cache)
> > > {
> > > /*
> > > * The snapshot was registered once when the cache entry was created.
> > > * We just patch curcid to reflect the new command counter.
> > > * SnapshotSetCommandId() only patches process-global statics, not
> > > * registered copies, so we do it directly.
> > > *
> > > * The xmin/xmax/xip fields don't need refreshing: within a single
> > > * statement batch, only curcid changes between rows.
> > > */
> > > Assert(fpentry && fpentry->snapshot != NULL);
> > > snapshot = fpentry->snapshot;
> > > snapshot->curcid = GetCurrentCommandId(false);
> > > }
> > > else
> > > snapshot = RegisterSnapshot(GetLatestSnapshot());
> > >
> > > if (use_cache)
> > > {
> > > pk_rel = fpentry->pk_rel;
> > > idx_rel = fpentry->idx_rel;
> > > scandesc = fpentry->scandesc;
> > > slot = fpentry->slot;
> > > }
> > > else
> > > {
> > > pk_rel = table_open(riinfo->pk_relid, RowShareLock);
> > > idx_rel = index_open(riinfo->conindid, AccessShareLock);
> > > scandesc = index_beginscan(pk_rel, idx_rel,
> > > snapshot, NULL,
> > > riinfo->nkeys, 0);
> > > slot = table_slot_create(pk_rel, NULL);
> > > }
> >
> > Good idea, done.
> >
> > While polishing 0002, I revisited the snapshot caching semantics. The
> > previous commit message hand-waved about only curcid changing between
> > rows, but GetLatestSnapshot() also reflects other backends' commits,
> > so reusing the snapshot is a deliberate semantic change from the SPI
> > path. I think it's safe because curcid is all we need for
> > intra-statement visibility, concurrent commits either already happened
> > before our snapshot (and are visible) or are racing with our statement
> > and wouldn't be seen reliably even with per-row snapshots since the
> > order in which FK rows are checked is nondeterministic, and
> > LockTupleKeyShare prevents the PK row from disappearing regardless. In
> > essence, we're treating all the FK checks within a trigger-firing
> > cycle as a single plan execution that happens to scan N rows, rather
> > than N independent SPI queries each taking a fresh snapshot. That's
> > the natural model -- a normal SELECT ... FOR KEY SHARE plan doesn't
> > re-take GetLatestSnapshot() between rows either.
> >
> > Similarly, the permission check (schema USAGE + table SELECT) is now
> > done once at cache entry creation in ri_FastPathGetEntry() rather than
> > on every flush.
>
> nice improvement.
>
> > The RI check runs as the PK table owner, so we're
> > verifying that the owner can access their own table -- a condition
> > that won't change unless someone explicitly revokes from the owner,
> > which would also break the SPI path.
> >
> > > > David Rowley mentioned off-list that it might be worth batching
> > > > multiple FK values into a single index probe, leveraging the
> > > > ScalarArrayOp btree improvements from PostgreSQL 17. The idea would be
> > > > to buffer FK values across trigger invocations in the per-constraint
> > > > cache (0002 already has the right structure for this), build a
> > > > SK_SEARCHARRAY scan key, and let the btree AM walk the matching leaf
> > > > pages in one sorted traversal instead of one tree descent per row. The
> > > > locking and recheck would still be per-tuple, but the index traversal
> > > > cost drops significantly. Single-column FKs are the obvious starting
> > > > point. That seems worth exploring but can be done as a separate patch
> > > > on top of this.
> > >
> > > I will take a look at this in the following weeks.
> >
> > I ended up going ahead with the batching and SAOP idea that David
> > mentioned -- I had a proof-of-concept working shortly after posting v3
> > and kept iterating on it. So attached set is now:
> >
> > 0001 - Core fast path (your 0001+0002 reworked, as before)
> >
> > 0002 - Per-batch resource caching (PK relation, index, scandesc, snapshot)
> >
> > 0003 - FK row buffering: materialize FK tuples into a per-constraint
> > batch buffer (64 rows), flush when full or at batch end
> >
> > 0004 - SK_SEARCHARRAY for single-column FKs: build an array from the
> > buffered FK values and do one index scan instead of 64 separate tree
> > descents. Multi-column FKs fall back to a per-row loop.
> >
> > 0003 is pure infrastructure -- it doesn't improve performance on its
> > own because the per-row index descent still dominates. The payoff
> > comes in 0004.
> >
> > Numbers (same machine as before, median of 3 runs):
> >
> > numeric PK / bigint FK, 1M rows:
> > master: 2487 ms
> > 0001..0004: 1168 ms (2.1x)
> >
> > int PK / int FK, 500K rows:
> > master: 1043 ms
> > 0001..0004: 335 ms (3.1x)
> >
> > The int/int case benefits most because the per-row cost is lower, so
> > the SAOP traversal savings are a larger fraction of the total. The
> > numeric/bigint case still sees a solid improvement despite the
> > cross-type cast overhead.
> >
> > Tomas Vondra also tested with an I/O-intensive workload (dataset
> > larger than shared_buffers, combined with his and Peter Geoghegan's
> > I/O prefetching patches) and confirmed that the batching + SAOP
> > approach helps there too, not just in the CPU-bound / memory-resident
> > case. In fact he showed that the patches here don't make a big dent
> > when the main bottleneck is I/O as shown in numbers that he shared in
> > an off-list email:
> >
> > master: 161617 ms
> > ri-check (0001..0004): 149446 ms (1.08x)
> > ri-check + i/o prefetching: 50885 ms (3.2x)
> >
> > So the RI patches alone only give ~8% here since most time is waiting
> > on reads. But the batching gives the prefetch machinery a window of
> > upcoming probes to issue readahead against, so the two together yield
> > 3.2x.
>
> impressive!
>
> >
> > Tomas also caught a memory context bug in the batch flush path: the
> > cached scandesc lives in TopTransactionContext, but the btree AM
> > defers _bt_preprocess_keys allocation to the first getnext call, which
> > pallocs into CurrentMemoryContext. If that's a short-lived
> > per-trigger-row context, the scandesc has dangling pointers on the
> > next rescan. Fixed by switching to TopTransactionContext before the
> > probe loop.
> >
> > Finally, I've fixed a number of other small and not-so-small bugs
> > found while polishing the old patches and made other stylistic
> > improvements. One notable change is that I introduced a FastPathMeta
>
> Yeah, this is much better than the fpmeta_valid field.
>
> > struct to store the fast path metadata instead of dumping those arrays
> > in the RI_ConstraintInfo. It's allocated lazily on first use and holds
> > the per-key compare entries, operator procedures, and index strategy
> > info needed by the scan key construction, so RI_ConstraintInfo doesn't
> > pay for them when the fast path isn't used.
> >
> >
> > On Mon, Feb 23, 2026 at 10:45 PM Junwang Zhao <[email protected]> wrote:
> > >
> > > Hi Amit,
> > >
> > > On Thu, Feb 19, 2026 at 5:21 PM Amit Langote <[email protected]> wrote:
> > > >
> > > > Hi Junwang,
> > > >
> > > > On Mon, Dec 1, 2025 at 3:09 PM Junwang Zhao <[email protected]> wrote:
> > > > > As Amit has already stated, we are approaching a hybrid "fast-path + fallback"
> > > > > design.
> > > > >
> > > > > 0001 adds a fast path optimization for foreign key constraint checks
> > > > > that bypasses the SPI executor, the fast path applies when the referenced
> > > > > table is not partitioned, and the constraint does not involve temporal
> > > > > semantics.
> > > > >
> > > > > With the following test:
> > > > >
> > > > > create table pk (a numeric primary key);
> > > > > create table fk (a bigint references pk);
> > > > > insert into pk select generate_series(1, 2000000);
> > > > >
> > > > > head:
> > > > >
> > > > > [local] zhjwpku@postgres:5432-90419=# insert into fk select
> > > > > generate_series(1, 2000000, 2);
> > > > > INSERT 0 1000000
> > > > > Time: 13516.177 ms (00:13.516)
> > > > >
> > > > > [local] zhjwpku@postgres:5432-90419=# update fk set a = a + 1;
> > > > > UPDATE 1000000
> > > > > Time: 15057.638 ms (00:15.058)
> > > > >
> > > > > patched:
> > > > >
> > > > > [local] zhjwpku@postgres:5432-98673=# insert into fk select
> > > > > generate_series(1, 2000000, 2);
> > > > > INSERT 0 1000000
> > > > > Time: 8248.777 ms (00:08.249)
> > > > >
> > > > > [local] zhjwpku@postgres:5432-98673=# update fk set a = a + 1;
> > > > > UPDATE 1000000
> > > > > Time: 10117.002 ms (00:10.117)
> > > > >
> > > > > 0002 cache fast-path metadata used by the index probe, at the current
> > > > > time only comparison operator hash entries, operator function OIDs
> > > > > and strategy numbers and subtypes for index scans. But this cache
> > > > > doesn't buy any performance improvement.
> > > > >
> > > > > Caching additional metadata should improve performance for foreign key checks.
> > > > >
> > > > > Amit suggested introducing a mechanism for ri_triggers.c to register a
> > > > > cleanup callback in the EState, which AfterTriggerEndQuery() could then
> > > > > invoke to release per-statement cached metadata (such as the IndexScanDesc).
> > > > > However, I haven't been able to implement this mechanism yet.
> > > >
> > > > Thanks for working on this. I've taken your patches as a starting
> > > > point and reworked the series into two patches (attached): 1st is your
> > > > 0001+0002 as the core patch that adds a gated fast-path alternative to
> > > > SPI and 2nd where I added per-statement resource caching. Doing the
> > > > latter turned out to be not so hard thanks to the structure you chose
> > > > to build the core fast path. Good call on adding the RLS and ACL test
> > > > cases, btw.
> > > >
> > > > So, 0001 is a functionally complete fast path: concurrency handling,
> > > > REPEATABLE READ crosscheck, cross-type operators, security context,
> > > > and metadata caching. 0002 implements the per-statement resource
> > > > caching we discussed, though instead of sharing the EState between
> > > > trigger.c and ri_triggers.c it uses a new AfterTriggerBatchCallback
> > > > mechanism that fires at the end of each trigger-firing cycle
> > > > (per-statement for immediate constraints, or until COMMIT for deferred
> > > > ones). It layers resource caching on top so that the PK relation,
> > > > index, scan descriptor, and snapshot stay open across all FK trigger
> > > > invocations within a single trigger-firing cycle rather than being
> > > > opened and closed per row.
> > > >
> > > > Note that the previous 0002 (metadata caching) is folded into 0001,
> > > > and most of the new fast-path logic added in 0001 now lives in
> > > > ri_FastPathCheck() rather than inline in RI_FKey_check(), so the
> > > > RI_FKey_check diff is just the gating call and SPI fallback.
> > > >
> > > > I re-ran the benchmarks (same test as yours, different machine):
> > > >
> > > > create table pk (a numeric primary key);
> > > > create table fk (a bigint references pk);
> > > > insert into pk select generate_series(1, 2000000);
> > > > insert into fk select generate_series(1, 2000000, 2);
> > > >
> > > > master: 2444 ms (median of 3 runs)
> > > > 0001: 1382 ms (43% faster)
> > > > 0001+0002: 1202 ms (51% faster, 13% over 0001 alone)
> > >
> > > I can get similar improvement on my old mac intel chip:
> > >
> > > master: 12963.993 ms
> > > 0001: 6641.692 ms, 48.8% faster
> > > 0001+0002: 5771.703 ms, 55.5% faster
> > >
> > > >
> > > > Also, with int PK / int FK (1M rows):
> > > >
> > > > create table pk (a int primary key);
> > > > create table fk (a int references pk);
> > > > insert into pk select generate_series(1, 1000000);
> > > > insert into fk select generate_series(1, 1000000);
> > > >
> > > > master: 1000 ms
> > > > 0001: 520 ms (48% faster)
> > > > 0001+0002: 432 ms (57% faster, 17% over 0001 alone)
> > >
> > > master: 11134.583 ms
> > > 0001: 5240.298 ms, 52.9% faster
> > > 0001+0002: 4554.215 ms, 59.1% faster
> > >
> > > >
> > > > The incremental gain from 0002 comes from eliminating per-row relation
> > > > open/close, scan begin/end, slot alloc/free, and replacing per-row
> > > > GetSnapshotData() with only curcid adjustment on the registered
> > > > snapshot copy in the cache.
> > > >
> > > > The two current limitations are partitioned referenced tables and
> > > > temporal foreign keys. Partitioned PKs are relatively uncommon in
> > > > practice, so the non-partitioned case should cover most FK workloads,
> > > > so I'm not sure it's worth the added complexity to support them.
> > > > Temporal FKs are inherently multi-row, so they're a poor fit for a
> > > > single-probe fast path.
> > > >
> > > > David Rowley mentioned off-list that it might be worth batching
> > > > multiple FK values into a single index probe, leveraging the
> > > > ScalarArrayOp btree improvements from PostgreSQL 17. The idea would be
> > > > to buffer FK values across trigger invocations in the per-constraint
> > > > cache (0002 already has the right structure for this), build a
> > > > SK_SEARCHARRAY scan key, and let the btree AM walk the matching leaf
> > > > pages in one sorted traversal instead of one tree descent per row. The
> > > > locking and recheck would still be per-tuple, but the index traversal
> > > > cost drops significantly. Single-column FKs are the obvious starting
> > > > point. That seems worth exploring but can be done as a separate patch
> > > > on top of this.
> > >
> > > I will take a look at this in the following weeks.
> > >
> > > >
> > > > I think the series is in reasonable shape but would appreciate extra
> > > > eyeballs, especially on the concurrency handling in ri_LockPKTuple()
> > > > in 0001 and the snapshot lifecycle in 0002. Or anything else that
> > > > catches one's eye.
> > > >
> > > > --
> > > > Thanks, Amit Langote
> > >
> > > I don't have any additional comments on the patch except one minor nit,
> > > maybe merge the following two if conditions into one, not a strong opinion
> > > though.
> > >
> > > if (use_cache)
> > > {
> > > /*
> > > * The snapshot was registered once when the cache entry was created.
> > > * We just patch curcid to reflect the new command counter.
> > > * SnapshotSetCommandId() only patches process-global statics, not
> > > * registered copies, so we do it directly.
> > > *
> > > * The xmin/xmax/xip fields don't need refreshing: within a single
> > > * statement batch, only curcid changes between rows.
> > > */
> > > Assert(fpentry && fpentry->snapshot != NULL);
> > > snapshot = fpentry->snapshot;
> > > snapshot->curcid = GetCurrentCommandId(false);
> > > }
> > > else
> > > snapshot = RegisterSnapshot(GetLatestSnapshot());
> > >
> > > if (use_cache)
> > > {
> > > pk_rel = fpentry->pk_rel;
> > > idx_rel = fpentry->idx_rel;
> > > scandesc = fpentry->scandesc;
> > > slot = fpentry->slot;
> > > }
> > > else
> > > {
> > > pk_rel = table_open(riinfo->pk_relid, RowShareLock);
> > > idx_rel = index_open(riinfo->conindid, AccessShareLock);
> > > scandesc = index_beginscan(pk_rel, idx_rel,
> > > snapshot, NULL,
> > > riinfo->nkeys, 0);
> > > slot = table_slot_create(pk_rel, NULL);
> > > }
> > >
> > > --
> > > Regards
> > > Junwang Zhao
> >
> >
> >
> > --
> > Thanks, Amit Langote
>
>
>
> --
> Regards
> Junwang Zhao
I had an offline discussion with Amit today. A few small things could be
improved, so I'm posting a new version of the patch set with these changes:
1.
+ if (ri_fastpath_is_applicable(riinfo))
+ {
+ bool found = ri_FastPathCheck(riinfo, fk_rel, newslot);
+
+ if (found)
+ return PointerGetDatum(NULL);
+
+ /*
+ * ri_FastPathCheck opens pk_rel internally; we need it for
+ * ri_ReportViolation. Re-open briefly.
+ */
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ newslot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
I moved ri_ReportViolation() into ri_FastPathCheck(), so the extra
table_open() is no longer needed and ri_FastPathCheck() now returns void.
Since Amit agreed this is the right approach, I included it directly in
v5-0001.
2.
With the batch fast path in place, the original ri_FastPathCheck() is only
used by the ALTER TABLE validation path. That path cannot use the cache
because the registered AfterTriggerBatch callback never runs there.
Therefore, the use_cache branch can be removed.
I made this change in v5-0004 and also updated some related comments.
Once we agree the changes are correct, v5-0004 can be merged into v5-0003.
3.
+ fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
+ &TTSOpsHeapTuple);
ri_FastPathBatchFlush() creates a new fk_slot on each flush and does not
cache it in RI_FastPathEntry. I tried caching it in v5-0006 and ran some
benchmarks, but it didn't show much improvement. This might be because the
slot is created once per batch rather than once per row, so the overall
impact is minimal. I'm posting it here for Amit to take a look and decide
whether we should adopt it or drop it, since I mentioned the idea to
him earlier.
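As a rough back-of-the-envelope check of that explanation, the sketch below
counts how many flushes (and therefore fk_slot creations) a bulk load
triggers. It is standalone C, not backend code: BATCH_SIZE, flushes_needed()
and slot_creations_saved() are hypothetical names used only for illustration,
and the 64-row batch is taken from the description of 0003 upthread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant mirroring the 64-row batch described upthread. */
#define BATCH_SIZE 64

/*
 * Number of batch flushes for nrows buffered FK rows, i.e. how many times
 * a per-flush fk_slot would be created without caching.
 */
size_t
flushes_needed(size_t nrows, size_t batch_size)
{
    assert(batch_size > 0);
    return (nrows + batch_size - 1) / batch_size;   /* ceiling division */
}

/*
 * Slot creations avoided by caching the slot: every flush after the
 * first one reuses the cached slot.
 */
size_t
slot_creations_saved(size_t nrows, size_t batch_size)
{
    size_t      n = flushes_needed(nrows, batch_size);

    return n > 0 ? n - 1 : 0;
}
```

For a 1M-row load, flushes_needed(1000000, BATCH_SIZE) is 15625, so caching
the slot can avoid at most 15624 slot creations across the whole statement,
which fits with the benchmarks not moving much.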
4.
ri_FastPathFlushArray() currently uses SK_SEARCHARRAY only for
single-column checks. I asked whether this could be extended to support
multi-column cases, and Amit encouraged me to look into it.
After a brief investigation, it seems that ScanKeyEntryInitialize() only
accepts a single subtype/collation/procedure, which makes it difficult to
handle multiple column types. Based on this, my current understanding is
that SK_SEARCHARRAY may not work for multi-column checks.
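For readers following along without the patch applied, the single-column
flow in ri_FastPathFlushArray() can be modeled outside the backend roughly
as below. This is a toy sketch under stated assumptions, not the patch's
code: a sorted int array stands in for the PK index, flush_array_model() and
cmp_int() are hypothetical names, and tuple locking, recheck and the
REPEATABLE READ crosscheck are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define BATCH_SIZE 64           /* mirrors the 64-row batch from the thread */

static int
cmp_int(const void *a, const void *b)
{
    int         x = *(const int *) a;
    int         y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Toy model: pk[] is a sorted, unique array standing in for the PK index;
 * fk[] is the buffered batch.  Returns the position of the first batch
 * item with no PK match, or -1 if every item is satisfied.
 */
int
flush_array_model(const int *pk, int npk, const int *fk, int nfk)
{
    int         keys[BATCH_SIZE];
    bool        matched[BATCH_SIZE];
    int         nkeys = 0;
    int         p = 0;

    assert(nfk >= 0 && nfk <= BATCH_SIZE);
    if (nfk == 0)
        return -1;
    memset(matched, 0, sizeof(matched));

    /* Sort and deduplicate the search values, as the index AM would. */
    memcpy(keys, fk, nfk * sizeof(int));
    qsort(keys, nfk, sizeof(int), cmp_int);
    for (int i = 0; i < nfk; i++)
        if (i == 0 || keys[i] != keys[nkeys - 1])
            keys[nkeys++] = keys[i];

    /* One ordered pass over the "index" instead of one lookup per row. */
    for (int k = 0; k < nkeys; k++)
    {
        while (p < npk && pk[p] < keys[k])
            p++;
        if (p == npk)
            break;              /* no larger PK values remain */
        if (pk[p] != keys[k])
            continue;           /* this search value has no PK match */

        /* Mark every batch item equal to this PK value, duplicates too. */
        for (int i = 0; i < nfk; i++)
            if (!matched[i] && fk[i] == pk[p])
                matched[i] = true;
    }

    /* Report the first unmatched row, in original batch order. */
    for (int i = 0; i < nfk; i++)
        if (!matched[i])
            return i;
    return -1;
}
```

The inner marking loop is the same O(batch_size) linear scan per match that
the patch comment describes, and it is what lets duplicate FK values in one
batch all be recognized as satisfied.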
--
Regards
Junwang Zhao
Attachments:
[application/octet-stream] v5-0005-Use-SK_SEARCHARRAY-for-batched-fast-path-FK-probe.patch (15.0K, 2-v5-0005-Use-SK_SEARCHARRAY-for-batched-fast-path-FK-probe.patch)
From 40bc60306be174ae54ab1320f0d77c8fb6afb2ff Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Feb 2026 21:25:25 +0900
Subject: [PATCH v5 5/6] Use SK_SEARCHARRAY for batched fast-path FK probes
For single-column foreign keys, replace the per-row index probe loop
in ri_FastPathBatchFlush() with a single SK_SEARCHARRAY scan key.
The btree AM sorts and deduplicates the array internally, then walks
the matching leaf pages in one ordered traversal instead of descending
from the root once per row.
ri_FastPathBatchFlush() now dispatches to ri_FastPathFlushArray() for
single-column FKs and ri_FastPathFlushLoop() for multi-column FKs,
which retains the per-row probe loop.
ri_FastPathFlushArray() builds an ArrayType from the buffered FK
values (casting to the PK-side type if needed), constructs a scan key
with the SK_SEARCHARRAY flag, and iterates the matches. Each matched
PK tuple is locked and rechecked as before. A matched[] bitmap tracks
which batch items were satisfied; unmatched items are reported as
violations.
With a batch size of 64 and int/int FK, this gives a 3.3x speedup
over unpatched master (vs 2.2x with per-row probing alone).
---
src/backend/utils/adt/ri_triggers.c | 288 +++++++++++++++++-----
src/test/regress/expected/foreign_key.out | 17 ++
src/test/regress/sql/foreign_key.sql | 17 ++
3 files changed, 262 insertions(+), 60 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 200c4094861..eb112aabc98 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -224,6 +224,10 @@ typedef struct RI_FastPathEntry
/* For ri_FastPathEndBatch() */
const RI_ConstraintInfo *riinfo;
+
+ /* For ri_FastPathFlushArray() */
+ Datum search_vals[RI_FASTPATH_BATCH_SIZE];
+ bool matched[RI_FASTPATH_BATCH_SIZE];
} RI_FastPathEntry;
/*
@@ -315,6 +319,10 @@ static void ri_FastPathEndBatch(void *arg);
static void ri_FastPathTeardown(void);
static void ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel);
+static void ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel);
static void ri_FastPathBatchFlush(RI_FastPathEntry *fpentry,
Relation fk_rel);
@@ -4003,102 +4011,262 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo)
return entry;
}
+/*
+ * ri_FastPathFlushLoop
+ * Multi-column fallback: probe the index once per buffered row.
+ *
+ * Used for composite foreign keys where SK_SEARCHARRAY does not
+ * apply.
+ */
static void
-ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
+ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel)
{
- const RI_ConstraintInfo *riinfo = fpentry->riinfo;
Relation pk_rel = fpentry->pk_rel;
Relation idx_rel = fpentry->idx_rel;
IndexScanDesc scandesc = fpentry->scandesc;
TupleTableSlot *slot = fpentry->slot;
Snapshot snapshot = fpentry->snapshot;
- TupleTableSlot *fk_slot;
Datum pk_vals[INDEX_MAX_KEYS];
char pk_nulls[INDEX_MAX_KEYS];
ScanKeyData skey[INDEX_MAX_KEYS];
- Oid saved_userid;
- int saved_sec_context;
- MemoryContext oldcxt;
- if (fpentry->batch_count == 0)
- return;
+ for (int i = 0; i < fpentry->batch_count; i++)
+ {
+ bool found = false;
- if (riinfo->fpmeta == NULL)
- ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
- fk_rel, idx_rel);
- Assert(riinfo->fpmeta);
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
- CommandCounterIncrement();
- snapshot->curcid = GetCurrentCommandId(false);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc,
+ fpentry->xact_scan, slot,
+ snapshot, fpentry->xact_snap,
+ riinfo, skey, riinfo->nkeys,
+ true);
- GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
- SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
- saved_sec_context |
- SECURITY_LOCAL_USERID_CHANGE |
- SECURITY_NOFORCE_RLS);
+ if (!found)
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ fk_slot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
+}
- fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
- &TTSOpsHeapTuple);
+/*
+ * ri_FastPathFlushArray
+ * Single-column fast path using SK_SEARCHARRAY.
+ *
+ * Builds an array of FK values and does one index scan with
+ * SK_SEARCHARRAY. The index AM sorts and deduplicates the array
+ * internally, then walks matching leaf pages in order. Each
+ * matched PK tuple is locked and rechecked as before; a matched[]
+ * bitmap tracks which batch items were satisfied.
+ */
+static void
+ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+ FastPathMeta *fpmeta = riinfo->fpmeta;
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ IndexScanDesc scandesc = fpentry->scandesc;
+ TupleTableSlot *slot = fpentry->slot;
+ Snapshot snapshot = fpentry->snapshot;
+ Datum *search_vals = fpentry->search_vals;
+ bool *matched = fpentry->matched;
+ int nvals = fpentry->batch_count;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[1];
+ RI_CompareHashEntry *entry;
+ Oid elem_type;
+ int16 elem_len;
+ bool elem_byval;
+ char elem_align;
+ ArrayType *arr;
+ MemoryContext oldcxt;
- oldcxt = MemoryContextSwitchTo(TopTransactionContext);
- for (int i = 0; i < fpentry->batch_count; i++)
- {
- HeapTuple fktuple = fpentry->batch[i];
- bool found = false;
+ Assert(fpmeta);
- ExecStoreHeapTuple(fktuple, fk_slot, false);
+ memset(matched, 0, nvals * sizeof(bool));
+ /*
+ * Extract and cast FK values. We need the PK-side type for
+ * the array element type since the scan key compares against
+ * the index which stores PK-typed values.
+ */
+ entry = fpmeta->compare_entries[0];
+ for (int i = 0; i < nvals; i++)
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
- build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
- index_rescan(scandesc, skey, riinfo->nkeys, NULL, 0);
+ /* Cast if needed (e.g. int8 FK -> numeric PK) */
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ search_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[0],
+ Int32GetDatum(-1),
+ BoolGetDatum(false));
+ else
+ search_vals[i] = pk_vals[0];
+ }
- if (index_getnext_slot(scandesc, ForwardScanDirection, slot))
- {
- bool concurrently_updated;
+ /*
+ * Array element type must match the operator's right-hand input
+ * type, which is what the index comparison expects on the search
+ * side. ri_populate_fastpath_metadata() stores exactly this via
+ * get_op_opfamily_properties(), which returns the operator's
+ * right-hand type as the subtype for cross-type operators (e.g.
+ * int8 for int48eq) and the common type for same-type operators.
+ */
+ elem_type = fpmeta->subtypes[0];
+ Assert(OidIsValid(elem_type));
+ get_typlenbyvalalign(elem_type, &elem_len, &elem_byval, &elem_align);
- if (ri_LockPKTuple(pk_rel, slot, snapshot,
- &concurrently_updated))
- {
- if (concurrently_updated)
- found = recheck_matched_pk_tuple(idx_rel, skey, slot);
- else
- found = true;
- }
- }
+ arr = construct_array(search_vals, nvals,
+ elem_type, elem_len, elem_byval, elem_align);
- if (found && IsolationUsesXactSnapshot())
- {
- IndexScanDesc xact_scan;
- TupleTableSlot *xact_slot;
- Snapshot xact_snap = GetTransactionSnapshot();
+ /*
+ * Build scan key with SK_SEARCHARRAY. The btree code will
+ * internally sort and deduplicate, then walk leaf pages in order.
+ */
+ ScanKeyEntryInitialize(&skey[0],
+ SK_SEARCHARRAY,
+ 1, /* attno */
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ PointerGetDatum(arr));
- xact_slot = table_slot_create(pk_rel, NULL);
- xact_scan = index_beginscan(pk_rel, idx_rel,
- xact_snap, NULL,
- riinfo->nkeys, 0);
- index_rescan(xact_scan, skey, riinfo->nkeys, NULL, 0);
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
- if (!index_getnext_slot(xact_scan, ForwardScanDirection,
- xact_slot))
- found = false;
+ index_rescan(scandesc, skey, 1, NULL, 0);
- index_endscan(xact_scan);
- ExecDropSingleTupleTableSlot(xact_slot);
+ /*
+ * Walk all matches. The btree returns them in index order.
+ * For each match, find which batch item(s) it satisfies.
+ */
+ while (index_getnext_slot(scandesc, ForwardScanDirection, slot))
+ {
+ Datum found_val;
+ bool found_null;
+ bool concurrently_updated;
+ ScanKeyData recheck_skey[1];
+ bool recheck_skey_valid = false;
+
+ if (!ri_LockPKTuple(pk_rel, slot, snapshot, &concurrently_updated))
+ continue;
+
+ /* Extract the PK value from the matched and locked tuple */
+ found_val = slot_getattr(slot, riinfo->pk_attnums[0], &found_null);
+ Assert(!found_null);
+
+ if (concurrently_updated)
+ {
+ /*
+ * Build a single-key scankey for recheck. We need the
+ * actual PK value that was found, not the FK search value.
+ */
+ ScanKeyEntryInitialize(&recheck_skey[0], 0, 1,
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ found_val);
+ recheck_skey_valid = true;
+ if (!recheck_matched_pk_tuple(idx_rel, recheck_skey, slot))
+ continue;
+ }
+
+ /* RR/SERIALIZABLE crosscheck */
+ if (IsolationUsesXactSnapshot())
+ {
+ IndexScanDesc xact_scan = fpentry->xact_scan;
+
+ if (!recheck_skey_valid)
+ ScanKeyEntryInitialize(&recheck_skey[0], 0, 1,
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ found_val);
+
+ index_rescan(xact_scan, recheck_skey, 1, NULL, 0);
+ if (!index_getnext_slot(xact_scan, ForwardScanDirection, slot))
+ continue;
}
/*
- * Report immediately. ri_ReportViolation calls ereport(ERROR)
- * which doesn't return, so remaining batch items and cleanup
- * are handled by the error path (ResourceOwner + XactCallback).
+ * Linear scan to mark all batch items matching this PK value.
+ * O(batch_size) per match, O(batch_size^2) worst case -- fine
+ * for the current batch size of 64.
*/
- if (!found)
+ for (int i = 0; i < nvals; i++)
+ {
+ if (!matched[i] &&
+ DatumGetBool(FunctionCall2Coll(&entry->eq_opr_finfo,
+ idx_rel->rd_indcollation[0],
+ found_val,
+ search_vals[i])))
+ matched[i] = true;
+ }
+ }
+
+ /* Report first unmatched row */
+ for (int i = 0; i < nvals; i++)
+ {
+ if (!matched[i])
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
ri_ReportViolation(riinfo, pk_rel, fk_rel,
fk_slot, NULL,
RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
}
MemoryContextSwitchTo(oldcxt);
+
+ pfree(arr);
+}
+
+
+static void
+ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
+{
+ const RI_ConstraintInfo *riinfo = fpentry->riinfo;
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ Snapshot snapshot = fpentry->snapshot;
+ TupleTableSlot *fk_slot;
+ Oid saved_userid;
+ int saved_sec_context;
+
+ if (fpentry->batch_count == 0)
+ return;
+
+ if (riinfo->fpmeta == NULL)
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ Assert(riinfo->fpmeta);
+
+ CommandCounterIncrement();
+ snapshot->curcid = GetCurrentCommandId(false);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
+ &TTSOpsHeapTuple);
+
+ if (riinfo->nkeys == 1)
+ ri_FastPathFlushArray(fpentry, fk_slot, riinfo, fk_rel);
+ else
+ ri_FastPathFlushLoop(fpentry, fk_slot, riinfo, fk_rel);
+
SetUserIdAndSecContext(saved_userid, saved_sec_context);
/* Free materialized tuples and reset */
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 16bb6370a97..0a24acdb138 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3593,3 +3593,20 @@ COMMIT;
ERROR: insert or update on table "fp_fk_commit" violates foreign key constraint "fp_fk_commit_a_fkey"
DETAIL: Key (a)=(999) is not present in table "fp_pk_commit".
DROP TABLE fp_fk_commit, fp_pk_commit;
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+ERROR: insert or update on table "fp_fk_cross" violates foreign key constraint "fp_fk_cross_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_cross".
+DROP TABLE fp_fk_cross, fp_pk_cross;
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index bc24272df20..ce85a21fc79 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2577,3 +2577,20 @@ INSERT INTO fp_fk_commit VALUES (1);
INSERT INTO fp_fk_commit VALUES (999);
COMMIT;
DROP TABLE fp_fk_commit, fp_pk_commit;
+
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+DROP TABLE fp_fk_cross, fp_pk_cross;
+
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
--
2.41.0
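The duplicate-value regression test above guards against a batched lookup that credits only one FK row per PK match. A minimal standalone sketch of that invariant follows. It is not the patch's code: first_violation() and its arrays are hypothetical stand-ins for the batch of FK keys and the keys returned by a batched index probe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BATCH 64			/* arbitrary bound chosen for this sketch */

/*
 * Hypothetical helper: given the FK keys of a batch and the distinct PK
 * keys a batched probe returned, mark every batch row whose key was found.
 * Returns the index of the first unsatisfied row, or -1 if all rows match.
 */
static int
first_violation(const long *fk_keys, size_t nrows,
				const long *found_pk, size_t nfound)
{
	bool		satisfied[MAX_BATCH] = {false};

	assert(nrows <= MAX_BATCH);

	for (size_t f = 0; f < nfound; f++)
	{
		for (size_t r = 0; r < nrows; r++)
		{
			/* No early exit: duplicate FK values must all be marked. */
			if (fk_keys[r] == found_pk[f])
				satisfied[r] = true;
		}
	}

	for (size_t r = 0; r < nrows; r++)
	{
		if (!satisfied[r])
			return (int) r;
	}
	return -1;
}
```

With a batch of identical keys and a single PK match, every row is reported as satisfied; stopping the inner loop at the first hit would instead report the second row as a violation.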
[application/octet-stream] v5-0006-Reuse-FK-tuple-slot-across-fast-path-batches.patch (3.7K, 3-v5-0006-Reuse-FK-tuple-slot-across-fast-path-batches.patch)
From 7619357adf748e4396f418652adf40e9d77941fc Mon Sep 17 00:00:00 2001
From: Junwang Zhao <[email protected]>
Date: Tue, 10 Mar 2026 18:06:21 +0800
Subject: [PATCH v5 6/6] Reuse FK tuple slot across fast-path batches
---
src/backend/utils/adt/ri_triggers.c | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index eb112aabc98..ae1765a37f4 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -213,8 +213,9 @@ typedef struct RI_FastPathEntry
Relation idx_rel;
IndexScanDesc scandesc;
TupleTableSlot *slot;
+ TupleTableSlot *fk_slot;
Snapshot snapshot; /* registered snapshot for the scan */
-
+
/* For when IsolationUsesXactSnapshot() is true */
Snapshot xact_snap;
IndexScanDesc xact_scan;
@@ -314,7 +315,7 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool is_restrict, bool partgone);
-static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo);
+static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel);
static void ri_FastPathEndBatch(void *arg);
static void ri_FastPathTeardown(void);
static void ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
@@ -3851,6 +3852,8 @@ ri_FastPathTeardown(void)
table_close(entry->pk_rel, NoLock);
if (entry->slot)
ExecDropSingleTupleTableSlot(entry->slot);
+ if (entry->fk_slot)
+ ExecDropSingleTupleTableSlot(entry->fk_slot);
if (entry->snapshot)
UnregisterSnapshot(entry->snapshot);
if (entry->xact_snap)
@@ -3909,7 +3912,7 @@ ri_FastPathSubXactCallback(SubXactEvent event, SubTransactionId mySubid,
* index_rescan() with new keys.
*/
static RI_FastPathEntry *
-ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo)
+ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
{
RI_FastPathEntry *entry;
bool found;
@@ -3974,6 +3977,8 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo)
entry->snapshot = RegisterSnapshot(GetLatestSnapshot());
entry->slot = table_slot_create(entry->pk_rel, NULL);
+ entry->fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
+ &TTSOpsHeapTuple);
entry->scandesc = index_beginscan(entry->pk_rel, entry->idx_rel,
entry->snapshot, NULL,
@@ -4238,7 +4243,7 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
Relation pk_rel = fpentry->pk_rel;
Relation idx_rel = fpentry->idx_rel;
Snapshot snapshot = fpentry->snapshot;
- TupleTableSlot *fk_slot;
+ TupleTableSlot *fk_slot = fpentry->fk_slot;
Oid saved_userid;
int saved_sec_context;
@@ -4259,9 +4264,6 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
- fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
- &TTSOpsHeapTuple);
-
if (riinfo->nkeys == 1)
ri_FastPathFlushArray(fpentry, fk_slot, riinfo, fk_rel);
else
@@ -4272,10 +4274,7 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
/* Free materialized tuples and reset */
for (int i = 0; i < fpentry->batch_count; i++)
heap_freetuple(fpentry->batch[i]);
-
fpentry->batch_count = 0;
-
- ExecDropSingleTupleTableSlot(fk_slot);
}
/*
@@ -4297,7 +4296,7 @@ ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
RI_FastPathEntry *fpentry;
MemoryContext oldcxt;
- fpentry = ri_FastPathGetEntry(riinfo);
+ fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
oldcxt = MemoryContextSwitchTo(TopTransactionContext);
fpentry->batch[fpentry->batch_count] =
--
2.41.0
[application/octet-stream] v5-0002-Cache-per-batch-resources-for-fast-path-foreign-k.patch (30.5K, 4-v5-0002-Cache-per-batch-resources-for-fast-path-foreign-k.patch)
From 5e07d2eb37b5be472c9114b9bfd4d164bf3d1133 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Fri, 27 Feb 2026 16:27:21 +0900
Subject: [PATCH v5 2/6] Cache per-batch resources for fast-path foreign key
checks
The fast-path FK check introduced in the previous commits opens and
closes the PK relation, index, scan descriptor, and tuple slot on
every trigger invocation. For bulk operations that fire thousands of
FK triggers in a single statement, this repeated setup/teardown
dominates the cost.
Introduce RI_FastPathEntry, the entry type of a per-constraint hash
table (ri_fastpath_cache) that caches the open Relations (pk_rel,
idx_rel), IndexScanDesc, TupleTableSlot, and a registered Snapshot
across all trigger invocations within a single trigger-firing batch.
Entries are created lazily on first use via ri_FastPathGetEntry() and
persist until the batch ends.
The snapshot is registered once at entry creation time, and its curcid
is patched in place on each subsequent row rather than calling
GetLatestSnapshot() again. This avoids the per-row
GetSnapshotData() cost, which takes ProcArrayLock and iterates all
backend slots.
This is a deliberate simplification compared to the SPI path, which
obtains a fresh snapshot per row via GetLatestSnapshot() in
ri_PerformCheck(). The reused snapshot will not reflect PK rows
committed by other backends between trigger invocations within the
same batch. This is acceptable because: (1) the FK check only needs
to see PK rows that existed before the current statement began, plus
the effects of earlier triggers in the same statement (which is what
curcid tracks), (2) a PK row committed by another backend either
committed before our snapshot was taken (and is already visible) or
committed concurrently with the batch (and would not be reliably
visible even with per-row snapshots, since trigger firing order is not
deterministic), and (3) the tuple locking via LockTupleKeyShare ensures
the PK row cannot be deleted or key-updated while we hold the lock,
regardless of snapshot freshness.
SnapshotSetCommandId() only patches the process-global statics, not
registered copies, so we patch entry->snapshot->curcid directly.
Permission checks (schema USAGE + table SELECT) are performed once at
cache entry creation rather than per flush. The RI check runs as the
PK table owner (via SetUserIdAndSecContext), so in practice these
checks verify that the owner has access to their own table -- a
condition that holds unless privileges have been explicitly revoked
from the owner, which would equally break the SPI path. Checking
once per batch avoids repeated syscache lookups from
pg_class_aclcheck() with no user-visible behavior change.
Lifecycle management:
- AfterTriggerBatchCallback: A new general-purpose callback
mechanism in trigger.c. Callbacks registered via
RegisterAfterTriggerBatchCallback() fire at the end of each
trigger-firing batch (AfterTriggerEndQuery for immediate
constraints, AfterTriggerFireDeferred at COMMIT, and
AfterTriggerSetState for SET CONSTRAINTS IMMEDIATE). The RI code
registers ri_FastPathCleanup as a batch callback, which does
orderly teardown: index_endscan, index_close, table_close,
ExecDropSingleTupleTableSlot, UnregisterSnapshot.
- XactCallback: ri_FastPathXactCallback NULLs the static cache
pointer at transaction end. On the normal path, cleanup already
ran via the batch callback; this handles the abort path where
TopTransactionContext destruction frees the memory but
ResourceOwner handles the actual resource cleanup.
- SubXactCallback: ri_FastPathSubXactCallback NULLs the static
cache pointer on subtransaction abort. ResourceOwner already
cleaned up the resources; this prevents the batch callback from
trying to double-close them.
- AfterTriggerBatchIsActive(): Exported accessor that returns true
when afterTriggers.query_depth >= 0. During ALTER TABLE ... ADD
FOREIGN KEY validation, RI triggers are called directly outside
the after-trigger framework, so batch callbacks would never fire.
The fast-path code uses this to fall back to a non-cached
per-invocation path (open/scan/close each call) in that context.
---
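To make the batch-callback lifecycle described above easier to follow, here is a minimal standalone sketch of a register / fire-and-clear callback list. It only mirrors the idea; it uses malloc and a hand-rolled linked list instead of PostgreSQL's List and memory contexts, and every name in it (register_batch_callback, fire_batch_callbacks, count_cleanup) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names exist in PostgreSQL. */
typedef void (*batch_callback) (void *arg);

typedef struct callback_item
{
	batch_callback callback;
	void	   *arg;
	struct callback_item *next;
} callback_item;

static callback_item *pending_head = NULL;
static callback_item **pending_tail = &pending_head;

/* Append a callback; it fires once, when the current batch ends. */
static void
register_batch_callback(batch_callback callback, void *arg)
{
	callback_item *item = malloc(sizeof(callback_item));

	assert(item != NULL);
	item->callback = callback;
	item->arg = arg;
	item->next = NULL;
	*pending_tail = item;
	pending_tail = &item->next;
}

/*
 * Invoke all pending callbacks in registration order and clear the list.
 * The list is detached first, so a callback that re-registers itself lands
 * in the next batch (a choice made for this sketch).  Returns the number
 * of callbacks invoked.
 */
static int
fire_batch_callbacks(void)
{
	callback_item *item = pending_head;
	int			fired = 0;

	pending_head = NULL;
	pending_tail = &pending_head;

	while (item != NULL)
	{
		callback_item *next = item->next;

		item->callback(item->arg);
		free(item);
		item = next;
		fired++;
	}
	return fired;
}

/* Example callback: counts how many times cleanup ran. */
static void
count_cleanup(void *arg)
{
	(*(int *) arg)++;
}
```

Because the list is cleared after each firing, a caller that wants cleanup in a later batch has to register again, which is the same contract the commit message describes for RegisterAfterTriggerBatchCallback().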
src/backend/commands/trigger.c | 84 ++++++
src/backend/utils/adt/ri_triggers.c | 341 +++++++++++++++++++---
src/include/commands/trigger.h | 18 ++
src/test/regress/expected/foreign_key.out | 66 +++++
src/test/regress/sql/foreign_key.sql | 58 ++++
src/tools/pgindent/typedefs.list | 3 +
6 files changed, 532 insertions(+), 38 deletions(-)
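The lazy "look up or create" behaviour of the per-constraint cache can be illustrated in isolation. The sketch below is an assumption-laden toy, not the patch's implementation: it uses a fixed array instead of dynahash, and get_entry(), teardown_cache() and the expensive_setups counter are hypothetical names that stand in for opening relations, starting the index scan and checking permissions once per entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 16			/* arbitrary capacity for this sketch */

typedef struct fastpath_entry
{
	unsigned int conoid;		/* key; 0 marks an unused slot */
	int			probes;			/* per-entry state reused across calls */
} fastpath_entry;

static fastpath_entry cache[CACHE_SLOTS];
static int	expensive_setups = 0;	/* counts one-time initializations */

/* Return the entry for conoid, creating and initializing it on first use. */
static fastpath_entry *
get_entry(unsigned int conoid)
{
	fastpath_entry *free_slot = NULL;

	assert(conoid != 0);

	for (size_t i = 0; i < CACHE_SLOTS; i++)
	{
		if (cache[i].conoid == conoid)
			return &cache[i];	/* hit: no setup cost */
		if (cache[i].conoid == 0 && free_slot == NULL)
			free_slot = &cache[i];
	}

	assert(free_slot != NULL);	/* the sketch does not grow the cache */

	/* Zero the slot first so a partially set-up entry is safe to tear down. */
	memset(free_slot, 0, sizeof(*free_slot));
	free_slot->conoid = conoid;
	expensive_setups++;			/* open relations, begin scan, check perms */
	return free_slot;
}

/* End of batch: forget all entries so the next batch starts fresh. */
static void
teardown_cache(void)
{
	memset(cache, 0, sizeof(cache));
}
```

Repeated calls for the same constraint within a batch pay the setup cost once; after teardown_cache() the next call pays it again, which corresponds to entries persisting only until the batch ends.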
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 98d402c0a3b..a0790a5c8c5 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3891,6 +3891,8 @@ typedef struct AfterTriggersData
/* per-subtransaction-level data: */
AfterTriggersTransData *trans_stack; /* array of structs shown below */
int maxtransdepth; /* allocated len of above array */
+
+ List *batch_callbacks; /* List of AfterTriggerCallbackItem */
} AfterTriggersData;
struct AfterTriggersQueryData
@@ -3927,6 +3929,13 @@ struct AfterTriggersTableData
TupleTableSlot *storeslot; /* for converting to tuplestore's format */
};
+/* Entry in afterTriggers.batch_callbacks */
+typedef struct AfterTriggerCallbackItem
+{
+ AfterTriggerBatchCallback callback;
+ void *arg;
+} AfterTriggerCallbackItem;
+
static AfterTriggersData afterTriggers;
static void AfterTriggerExecute(EState *estate,
@@ -3962,6 +3971,7 @@ static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
Oid tgoid, bool tgisdeferred);
static void cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent);
+static void FireAfterTriggerBatchCallbacks(void);
/*
* Get the FDW tuplestore for the current trigger query level, creating it
@@ -5087,6 +5097,7 @@ AfterTriggerBeginXact(void)
*/
afterTriggers.firing_counter = (CommandId) 1; /* mustn't be 0 */
afterTriggers.query_depth = -1;
+ afterTriggers.batch_callbacks = NIL;
/*
* Verify that there is no leftover state remaining. If these assertions
@@ -5208,6 +5219,8 @@ AfterTriggerEndQuery(EState *estate)
break;
}
+ FireAfterTriggerBatchCallbacks();
+
/* Release query-level-local storage, including tuplestores if any */
AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
@@ -5315,6 +5328,8 @@ AfterTriggerFireDeferred(void)
break; /* all fired */
}
+ FireAfterTriggerBatchCallbacks();
+
/*
* We don't bother freeing the event list, since it will go away anyway
* (and more efficiently than via pfree) in AfterTriggerEndXact.
@@ -6057,6 +6072,8 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
break; /* all fired */
}
+ FireAfterTriggerBatchCallbacks();
+
if (snapshot_set)
PopActiveSnapshot();
}
@@ -6753,3 +6770,70 @@ check_modified_virtual_generated(TupleDesc tupdesc, HeapTuple tuple)
return tuple;
}
+
+/*
+ * RegisterAfterTriggerBatchCallback
+ * Register a function to be called when the current trigger-firing
+ * batch completes.
+ *
+ * Must be called from within a trigger function's execution context
+ * (i.e., while afterTriggers state is active).
+ *
+ * The callback list is cleared after invocation, so the caller must
+ * re-register for each new batch if needed.
+ */
+void
+RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg)
+{
+ AfterTriggerCallbackItem *item;
+ MemoryContext oldcxt;
+
+ /*
+ * Allocate in TopTransactionContext so the item survives for the duration
+ * of the batch, which may span multiple trigger invocations.
+ */
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ item = palloc(sizeof(AfterTriggerCallbackItem));
+ item->callback = callback;
+ item->arg = arg;
+ afterTriggers.batch_callbacks =
+ lappend(afterTriggers.batch_callbacks, item);
+ MemoryContextSwitchTo(oldcxt);
+}
+
+/*
+ * FireAfterTriggerBatchCallbacks
+ * Invoke and clear all registered batch callbacks.
+ *
+ * Called at the end of each trigger-firing batch.
+ */
+static void
+FireAfterTriggerBatchCallbacks(void)
+{
+ ListCell *lc;
+
+ foreach(lc, afterTriggers.batch_callbacks)
+ {
+ AfterTriggerCallbackItem *item = lfirst(lc);
+
+ item->callback(item->arg);
+ }
+
+ list_free_deep(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = NIL;
+}
+
+/*
+ * AfterTriggerBatchIsActive
+ * Returns true if we're inside a query-level trigger batch where
+ * registered batch callbacks will actually be invoked.
+ *
+ * This is false during validateForeignKeyConstraint(), which calls
+ * RI trigger functions directly outside the after-trigger framework.
+ */
+bool
+AfterTriggerBatchIsActive(void)
+{
+ return afterTriggers.query_depth >= 0;
+}
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 18373a586d6..9611b23e1ce 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -196,6 +196,27 @@ typedef struct RI_CompareHashEntry
FmgrInfo cast_func_finfo; /* in case we must coerce input */
} RI_CompareHashEntry;
+/*
+ * RI_FastPathEntry
+ * Per-constraint cache of resources needed by ri_FastPathCheck().
+ *
+ * One entry per constraint, keyed by pg_constraint OID. Created lazily
+ * by ri_FastPathGetEntry() on first use within a trigger-firing batch
+ * and torn down by ri_FastPathCleanup() at batch end.
+ */
+typedef struct RI_FastPathEntry
+{
+ Oid conoid; /* hash key: pg_constraint OID */
+ Relation pk_rel;
+ Relation idx_rel;
+ IndexScanDesc scandesc;
+ TupleTableSlot *slot;
+ Snapshot snapshot; /* registered snapshot for the scan */
+
+ /* For when IsolationUsesXactSnapshot() is true */
+ Snapshot xact_snap;
+ IndexScanDesc xact_scan;
+} RI_FastPathEntry;
/*
* Local data
@@ -205,6 +226,8 @@ static HTAB *ri_query_cache = NULL;
static HTAB *ri_compare_cache = NULL;
static dclist_head ri_constraint_cache_valid_list;
+static HTAB *ri_fastpath_cache = NULL;
+static bool ri_fastpath_callback_registered = false;
/*
* Local function prototypes
@@ -256,9 +279,11 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot);
static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
- IndexScanDesc scandesc, TupleTableSlot *slot,
- Snapshot snapshot, const RI_ConstraintInfo *riinfo,
- ScanKeyData *skey, int nkeys);
+ IndexScanDesc scandesc, IndexScanDesc xact_scan,
+ TupleTableSlot *slot,
+ Snapshot snapshot, Snapshot xact_snap,
+ const RI_ConstraintInfo *riinfo,
+ ScanKeyData *skey, int nkeys, bool use_cache);
static bool ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
bool *concurrently_updated);
static bool ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo);
@@ -277,6 +302,8 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool is_restrict, bool partgone);
+static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo);
+static void ri_FastPathCleanup(void *arg);
/*
@@ -382,9 +409,10 @@ RI_FKey_check(TriggerData *trigdata)
/*
* Fast path: probe the PK unique index directly, bypassing SPI.
*
- * Note: pk_rel is NOT opened above. ri_fastpath_is_applicable() uses
- * cached metadata (pk_is_partitioned) rather than an open Relation, and
- * ri_FastPathCheck() opens it internally.
+ * pk_rel is not opened here. ri_fastpath_is_applicable() uses cached
+ * metadata (pk_is_partitioned), and pk_rel is opened later by either
+ * ri_FastPathGetEntry() (batched path) or ri_FastPathCheck() (ALTER
+ * TABLE validation path).
*/
if (ri_fastpath_is_applicable(riinfo))
{
@@ -2683,6 +2711,7 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
Relation pk_rel;
Relation idx_rel;
IndexScanDesc scandesc;
+ IndexScanDesc xact_scan = NULL;
TupleTableSlot *slot;
Datum pk_vals[INDEX_MAX_KEYS];
char pk_nulls[INDEX_MAX_KEYS];
@@ -2691,6 +2720,20 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
Oid saved_userid;
int saved_sec_context;
Snapshot snapshot;
+ Snapshot xact_snap = NULL;
+ bool use_cache;
+ RI_FastPathEntry *fpentry = NULL;
+
+ /*
+ * Use the per-batch cache only if we're inside the after-trigger
+ * framework, where our cleanup callback will fire. During ALTER TABLE
+ * ... ADD FOREIGN KEY validation, triggers are called directly and the
+ * callback would never run, leaking resources.
+ */
+ use_cache = AfterTriggerBatchIsActive();
+
+ if (use_cache)
+ fpentry = ri_FastPathGetEntry(riinfo);
/*
* Advance the command counter so the snapshot sees the effects of prior
@@ -2698,15 +2741,36 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
* ri_PerformCheck().
*/
CommandCounterIncrement();
- snapshot = RegisterSnapshot(GetLatestSnapshot());
-
- pk_rel = table_open(riinfo->pk_relid, RowShareLock);
- idx_rel = index_open(riinfo->conindid, AccessShareLock);
-
- slot = table_slot_create(pk_rel, NULL);
- scandesc = index_beginscan(pk_rel, idx_rel,
- snapshot, NULL,
- riinfo->nkeys, 0);
+ if (use_cache)
+ {
+ /*
+ * The snapshot was registered once when the cache entry was created.
+ * Patch curcid so it reflects the effects of prior triggers in this
+ * statement. We deliberately do not call GetLatestSnapshot() again:
+ * the xmin/xmax/xip fields do not need refreshing because any PK row
+ * we need to see was either already visible when the batch started or
+ * will be found via the tuple-lock wait (LockTupleKeyShare).
+ */
+ Assert(fpentry && fpentry->snapshot != NULL);
+ snapshot = fpentry->snapshot;
+ snapshot->curcid = GetCurrentCommandId(false);
+ xact_scan = fpentry->xact_scan;
+ xact_snap = fpentry->xact_snap;
+ pk_rel = fpentry->pk_rel;
+ idx_rel = fpentry->idx_rel;
+ scandesc = fpentry->scandesc;
+ slot = fpentry->slot;
+ }
+ else
+ {
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ idx_rel = index_open(riinfo->conindid, AccessShareLock);
+ scandesc = index_beginscan(pk_rel, idx_rel,
+ snapshot, NULL,
+ riinfo->nkeys, 0);
+ slot = table_slot_create(pk_rel, NULL);
+ }
if (riinfo->fpmeta == NULL)
ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
@@ -2718,24 +2782,29 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
saved_sec_context |
SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
- ri_CheckPermissions(pk_rel);
+ if (!use_cache)
+ ri_CheckPermissions(pk_rel);
ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
- found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, slot,
- snapshot, riinfo, skey, riinfo->nkeys);
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, xact_scan,
+ slot, snapshot, xact_snap, riinfo,
+ skey, riinfo->nkeys, use_cache);
if (!found)
ri_ReportViolation(riinfo, pk_rel, fk_rel,
newslot, NULL,
RI_PLAN_CHECK_LOOKUPPK, false, false);
SetUserIdAndSecContext(saved_userid, saved_sec_context);
- index_endscan(scandesc);
- index_close(idx_rel, NoLock);
- table_close(pk_rel, NoLock);
- ExecDropSingleTupleTableSlot(slot);
-
- UnregisterSnapshot(snapshot);
+ /* Non-cached path: clean up per-invocation resources */
+ if (!use_cache)
+ {
+ index_endscan(scandesc);
+ index_close(idx_rel, NoLock);
+ table_close(pk_rel, NoLock);
+ ExecDropSingleTupleTableSlot(slot);
+ UnregisterSnapshot(snapshot);
+ }
}
/*
@@ -2752,14 +2821,20 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
*/
static bool
ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
- IndexScanDesc scandesc, TupleTableSlot *slot,
- Snapshot snapshot, const RI_ConstraintInfo *riinfo,
- ScanKeyData *skey, int nkeys)
+ IndexScanDesc scandesc, IndexScanDesc xact_scan,
+ TupleTableSlot *slot,
+ Snapshot snapshot, Snapshot xact_snap,
+ const RI_ConstraintInfo *riinfo,
+ ScanKeyData *skey, int nkeys, bool use_cache)
{
bool found = false;
+ MemoryContext oldcxt = NULL;
index_rescan(scandesc, skey, nkeys, NULL, 0);
+ if (use_cache)
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+
if (index_getnext_slot(scandesc, ForwardScanDirection, slot))
{
bool concurrently_updated;
@@ -2794,22 +2869,26 @@ ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
*/
if (found && IsolationUsesXactSnapshot())
{
- IndexScanDesc xact_scan;
- TupleTableSlot *xact_slot;
- Snapshot xact_snap = GetTransactionSnapshot();
+ bool close_scan = false;
- xact_slot = table_slot_create(pk_rel, NULL);
- xact_scan = index_beginscan(pk_rel, idx_rel,
- xact_snap, NULL, nkeys, 0);
+ if (xact_snap == NULL)
+ xact_snap = GetTransactionSnapshot();
+ if (xact_scan == NULL)
+ {
+ xact_scan = index_beginscan(pk_rel, idx_rel, xact_snap, NULL,
+ riinfo->nkeys, 0);
+ close_scan = true;
+ }
index_rescan(xact_scan, skey, nkeys, NULL, 0);
-
- if (!index_getnext_slot(xact_scan, ForwardScanDirection, xact_slot))
+ if (!index_getnext_slot(xact_scan, ForwardScanDirection, slot))
found = false;
-
- index_endscan(xact_scan);
- ExecDropSingleTupleTableSlot(xact_slot);
+ if (close_scan)
+ index_endscan(xact_scan);
}
+ if (oldcxt)
+ MemoryContextSwitchTo(oldcxt);
+
return found;
}
@@ -3701,3 +3780,189 @@ RI_FKey_trigger_type(Oid tgfoid)
return RI_TRIGGER_NONE;
}
+
+/*
+ * ri_FastPathCleanup
+ * Tear down all cached fast-path state.
+ *
+ * Called as an AfterTriggerBatchCallback at end of batch.
+ */
+static void
+ri_FastPathCleanup(void *arg)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ /* Close both scans before closing idx_rel. */
+ if (entry->scandesc)
+ index_endscan(entry->scandesc);
+ if (entry->xact_scan)
+ index_endscan(entry->xact_scan);
+ if (entry->idx_rel)
+ index_close(entry->idx_rel, NoLock);
+ if (entry->pk_rel)
+ table_close(entry->pk_rel, NoLock);
+ if (entry->slot)
+ ExecDropSingleTupleTableSlot(entry->slot);
+ if (entry->snapshot)
+ UnregisterSnapshot(entry->snapshot);
+ if (entry->xact_snap)
+ UnregisterSnapshot(entry->xact_snap);
+ }
+
+ hash_destroy(ri_fastpath_cache);
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static bool ri_fastpath_xact_callback_registered = false;
+
+static void
+ri_FastPathXactCallback(XactEvent event, void *arg)
+{
+ /*
+ * TopTransactionContext is destroyed at end of transaction, taking the
+ * hash table and all cached resources with it. Just reset our static
+ * pointers so we don't dereference freed memory.
+ *
+ * In the normal (non-error) path, ri_FastPathCleanup already ran via the
+ * batch callback and did orderly teardown. Here we're just handling the
+ * abort path where that callback never fired.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static void
+ri_FastPathSubXactCallback(SubXactEvent event, SubTransactionId mySubid,
+ SubTransactionId parentSubid, void *arg)
+{
+ if (event == SUBXACT_EVENT_ABORT_SUB)
+ {
+ /*
+ * ResourceOwner already cleaned up relations, scans, and snapshots.
+ * Just NULL our pointers so the still-registered batch callback
+ * becomes a no-op. The hash table memory in TopTransactionContext
+ * will be freed at transaction end.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+ }
+}
+
+/*
+ * ri_FastPathGetEntry
+ * Look up or create a per-batch cache entry for the given constraint.
+ *
+ * On first call for a constraint within a batch: opens pk_rel and the
+ * index, begins an index scan, allocates a result slot, and registers
+ * the cleanup callback.
+ *
+ * On subsequent calls: returns the existing entry. Caller uses
+ * index_rescan() with new keys.
+ */
+static RI_FastPathEntry *
+ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo)
+{
+ RI_FastPathEntry *entry;
+ bool found;
+
+ /* Create hash table on first use in this batch */
+ if (ri_fastpath_cache == NULL)
+ {
+ HASHCTL ctl;
+
+ if (!ri_fastpath_xact_callback_registered)
+ {
+ RegisterXactCallback(ri_FastPathXactCallback, NULL);
+ RegisterSubXactCallback(ri_FastPathSubXactCallback, NULL);
+ ri_fastpath_xact_callback_registered = true;
+ }
+
+ ctl.keysize = sizeof(Oid);
+ ctl.entrysize = sizeof(RI_FastPathEntry);
+ ctl.hcxt = TopTransactionContext;
+ ri_fastpath_cache = hash_create("RI fast-path cache",
+ 16,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ }
+
+ entry = hash_search(ri_fastpath_cache, &riinfo->constraint_id,
+ HASH_ENTER, &found);
+
+ if (!found)
+ {
+ MemoryContext oldcxt;
+ Oid saved_userid;
+ int saved_sec_context;
+
+ /*
+ * Zero out non-key fields so ri_FastPathCleanup is safe if we error
+ * out during partial initialization below.
+ */
+ memset(((char *) entry) + offsetof(RI_FastPathEntry, pk_rel), 0,
+ sizeof(RI_FastPathEntry) - offsetof(RI_FastPathEntry, pk_rel));
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+
+ /*
+ * Open PK table and its unique index.
+ *
+ * RowShareLock on pk_rel matches what the SPI path's SELECT ... FOR
+ * KEY SHARE would acquire as a relation-level lock. AccessShareLock
+ * on the index is standard for index scans.
+ *
+ * We don't release these locks until end of transaction, matching SPI
+ * behavior.
+ */
+ entry->pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ entry->idx_rel = index_open(riinfo->conindid, AccessShareLock);
+
+ /*
+ * Register an initial snapshot. Its curcid will be patched in place
+ * on each subsequent row (see ri_FastPathCheck()), avoiding per-row
+ * GetSnapshotData() overhead.
+ */
+ entry->snapshot = RegisterSnapshot(GetLatestSnapshot());
+
+ entry->slot = table_slot_create(entry->pk_rel, NULL);
+
+ entry->scandesc = index_beginscan(entry->pk_rel, entry->idx_rel,
+ entry->snapshot, NULL,
+ riinfo->nkeys, 0);
+
+ if (IsolationUsesXactSnapshot())
+ {
+ entry->xact_snap = RegisterSnapshot(GetTransactionSnapshot());
+ entry->xact_scan = index_beginscan(entry->pk_rel, entry->idx_rel,
+ entry->xact_snap, NULL,
+ riinfo->nkeys, 0);
+ }
+
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Ensure cleanup at end of this trigger-firing batch */
+ if (!ri_fastpath_callback_registered)
+ {
+ RegisterAfterTriggerBatchCallback(ri_FastPathCleanup, NULL);
+ ri_fastpath_callback_registered = true;
+ }
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(entry->pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ ri_CheckPermissions(entry->pk_rel);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ }
+
+ return entry;
+}
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 556c86bf5e1..4304abffc8d 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -289,4 +289,22 @@ extern void RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel,
extern int RI_FKey_trigger_type(Oid tgfoid);
+/*
+ * Callback type for end-of-trigger-batch notifications.
+ *
+ * Registered via RegisterAfterTriggerBatchCallback(). Invoked when
+ * a batch of after-trigger processing completes:
+ * - AfterTriggerEndQuery() (immediate constraints)
+ * - AfterTriggerFireDeferred() (deferred constraints at COMMIT)
+ * - AfterTriggerSetState() (SET CONSTRAINTS IMMEDIATE)
+ *
+ * The callback list is cleared after each batch. Callers must
+ * re-register if they need to be called again in a subsequent batch.
+ */
+typedef void (*AfterTriggerBatchCallback) (void *arg);
+
+extern void RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg);
+extern bool AfterTriggerBatchIsActive(void);
+
#endif /* TRIGGER_H */
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 0826f518004..808f2e632e7 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3504,3 +3504,69 @@ DETAIL: drop cascades to table fkpart13_t1
drop cascades to table fkpart13_t2
drop cascades to table fkpart13_t3
RESET search_path;
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+ERROR: insert or update on table "fp_fk_alter" violates foreign key constraint "fp_fk_alter_a_fkey"
+DETAIL: Key (a)=(101) is not present in table "fp_pk_alter".
+DROP TABLE fp_fk_alter, fp_pk_alter;
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+ERROR: insert or update on table "fp_fk_alter2" violates foreign key constraint "fp_fk_alter2_a_fkey"
+DETAIL: Key (a)=(200) is not present in table "fp_pk_alter2".
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+ERROR: insert or update on table "fp_multi_fk" violates foreign key constraint "fp_multi_fk_b_fkey"
+DETAIL: Key (b)=(2) is not present in table "fp_pk2".
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+ERROR: insert or update on table "fp_fk_defer" violates foreign key constraint "fp_fk_defer_a_fkey"
+DETAIL: Key (a)=(3) is not present in table "fp_pk_defer".
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+ a
+---
+ 1
+ 1
+(2 rows)
+
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index e9ee29331cb..ef6a3381e08 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2498,3 +2498,61 @@ WITH cte AS (
DROP SCHEMA fkpart13 CASCADE;
RESET search_path;
+
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+DROP TABLE fp_fk_alter, fp_pk_alter;
+
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3250564d4ff..ef122944014 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -30,6 +30,8 @@ AddForeignUpdateTargets_function
AddrInfo
AffixNode
AffixNodeData
+AfterTriggerBatchCallback
+AfterTriggerCallbackItem
AfterTriggerEvent
AfterTriggerEventChunk
AfterTriggerEventData
@@ -2448,6 +2450,7 @@ RIX
RI_CompareHashEntry
RI_CompareKey
RI_ConstraintInfo
+RI_FastPathEntry
RI_QueryHashEntry
RI_QueryKey
RTEKind
--
2.41.0
[application/octet-stream] v5-0004-Refine-fast-path-FK-validation-path.patch (5.5K, 5-v5-0004-Refine-fast-path-FK-validation-path.patch)
From 1923dbce2127f91f0ded31ed1f2465a2961cedca Mon Sep 17 00:00:00 2001
From: Junwang Zhao <[email protected]>
Date: Tue, 10 Mar 2026 16:06:56 +0800
Subject: [PATCH v5 4/6] Refine fast-path FK validation path
The `use_cache` branch in `ri_FastPathCheck` is no longer needed.
Remove it and update the comments that are no longer applicable.
---
src/backend/utils/adt/ri_triggers.c | 73 +++++++++--------------------
1 file changed, 23 insertions(+), 50 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 7929d6f8c85..200c4094861 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -200,7 +200,7 @@ typedef struct RI_CompareHashEntry
/*
* RI_FastPathEntry
- * Per-constraint cache of resources needed by ri_FastPathCheck().
+ * Per-constraint cache of resources needed by ri_FastPathBatchFlush().
*
* One entry per constraint, keyed by pg_constraint OID. Created lazily
* by ri_FastPathGetEntry() on first use within a trigger-firing batch
@@ -2722,10 +2722,14 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/*
* ri_FastPathCheck
- * Perform FK existence check via direct index probe, bypassing SPI.
+ * Perform per-row FK existence check via direct index probe,
+ * bypassing SPI.
*
* If no matching PK row exists, report the violation via ri_ReportViolation(),
* otherwise, the function returns normally.
+ *
+ * Note: This is only used by the ALTER TABLE validation path. Other paths use
+ * ri_FastPathBatchAdd().
*/
static void
ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
@@ -2744,8 +2748,6 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
int saved_sec_context;
Snapshot snapshot;
Snapshot xact_snap = NULL;
- bool use_cache;
- RI_FastPathEntry *fpentry = NULL;
/*
* Use the per-batch cache only if we're inside the after-trigger
@@ -2753,10 +2755,7 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
* ... ADD FOREIGN KEY validation, triggers are called directly and the
* callback would never run, leaking resources.
*/
- use_cache = AfterTriggerBatchIsActive();
-
- if (use_cache)
- fpentry = ri_FastPathGetEntry(riinfo);
+ Assert(!AfterTriggerBatchIsActive());
/*
* Advance the command counter so the snapshot sees the effects of prior
@@ -2764,36 +2763,14 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
* ri_PerformCheck().
*/
CommandCounterIncrement();
- if (use_cache)
- {
- /*
- * The snapshot was registered once when the cache entry was created.
- * Patch curcid so it reflects the effects of prior triggers in this
- * statement. We deliberately do not call GetLatestSnapshot() again:
- * the xmin/xmax/xip fields do not need refreshing because any PK row
- * we need to see was either already visible when the batch started or
- * will be found via the tuple-lock wait (LockTupleKeyShare).
- */
- Assert(fpentry && fpentry->snapshot != NULL);
- snapshot = fpentry->snapshot;
- snapshot->curcid = GetCurrentCommandId(false);
- xact_scan = fpentry->xact_scan;
- xact_snap = fpentry->xact_snap;
- pk_rel = fpentry->pk_rel;
- idx_rel = fpentry->idx_rel;
- scandesc = fpentry->scandesc;
- slot = fpentry->slot;
- }
- else
- {
- snapshot = RegisterSnapshot(GetLatestSnapshot());
- pk_rel = table_open(riinfo->pk_relid, RowShareLock);
- idx_rel = index_open(riinfo->conindid, AccessShareLock);
- scandesc = index_beginscan(pk_rel, idx_rel,
- snapshot, NULL,
- riinfo->nkeys, 0);
- slot = table_slot_create(pk_rel, NULL);
- }
+
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ idx_rel = index_open(riinfo->conindid, AccessShareLock);
+ scandesc = index_beginscan(pk_rel, idx_rel,
+ snapshot, NULL,
+ riinfo->nkeys, 0);
+ slot = table_slot_create(pk_rel, NULL);
if (riinfo->fpmeta == NULL)
ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
@@ -2805,14 +2782,13 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
saved_sec_context |
SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
- if (!use_cache)
- ri_CheckPermissions(pk_rel);
+ ri_CheckPermissions(pk_rel);
ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, xact_scan,
slot, snapshot, xact_snap, riinfo,
- skey, riinfo->nkeys, use_cache);
+ skey, riinfo->nkeys, false);
if (!found)
ri_ReportViolation(riinfo, pk_rel, fk_rel,
newslot, NULL,
@@ -2820,14 +2796,11 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
SetUserIdAndSecContext(saved_userid, saved_sec_context);
/* Non-cached path: clean up per-invocation resources */
- if (!use_cache)
- {
- index_endscan(scandesc);
- index_close(idx_rel, NoLock);
- table_close(pk_rel, NoLock);
- ExecDropSingleTupleTableSlot(slot);
- UnregisterSnapshot(snapshot);
- }
+ index_endscan(scandesc);
+ index_close(idx_rel, NoLock);
+ table_close(pk_rel, NoLock);
+ ExecDropSingleTupleTableSlot(slot);
+ UnregisterSnapshot(snapshot);
}
/*
@@ -3987,7 +3960,7 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo)
/*
* Register an initial snapshot. Its curcid will be patched in place
- * on each subsequent row (see ri_FastPathCheck()), avoiding per-row
+ * on each subsequent row (see ri_FastPathBatchFlush()), avoiding per-row
* GetSnapshotData() overhead.
*/
entry->snapshot = RegisterSnapshot(GetLatestSnapshot());
--
2.41.0
[application/octet-stream] v5-0003-Buffer-FK-rows-for-batched-fast-path-probing.patch (12.4K, 6-v5-0003-Buffer-FK-rows-for-batched-fast-path-probing.patch)
From f74e5dd795470979ccc557009402f5634e498964 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Feb 2026 21:25:14 +0900
Subject: [PATCH v5 3/6] Buffer FK rows for batched fast-path probing
Instead of probing the PK index immediately on each trigger
invocation, buffer FK rows in the per-constraint cache entry
(RI_FastPathEntry) and flush them in a batch. When the buffer
fills (64 rows) or the trigger-firing cycle ends, ri_FastPathBatchFlush()
probes the index for all buffered rows in a tight loop, sharing a
single CommandCounterIncrement, security context switch, and
permissions check across the batch.
FK tuples are materialized via ExecCopySlotHeapTuple() into
TopTransactionContext so they survive across trigger invocations.
Violations are reported immediately during the flush via
ri_ReportViolation(), which does not return.
ri_FastPathEndBatch() flushes any partial batch before tearing down
cached resources via ri_FastPathTeardown(). Since the FK relation may
already be closed by flush time (e.g. for deferred constraints at
COMMIT), the entry stashes riinfo and reopens the relation using
riinfo->fk_relid.
The non-cached path (ALTER TABLE validation) bypasses batching and
continues to call ri_FastPathCheck() directly per row.
On its own, this patch does not improve performance over 0002 because
the per-row index descent still dominates. It provides the buffering
infrastructure for the next patch, which replaces the tight loop with
a single SK_SEARCHARRAY index probe.
---
src/backend/utils/adt/ri_triggers.c | 217 +++++++++++++++++++++-
src/test/regress/expected/foreign_key.out | 23 +++
src/test/regress/sql/foreign_key.sql | 21 +++
3 files changed, 253 insertions(+), 8 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 9611b23e1ce..7929d6f8c85 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -196,6 +196,8 @@ typedef struct RI_CompareHashEntry
FmgrInfo cast_func_finfo; /* in case we must coerce input */
} RI_CompareHashEntry;
+#define RI_FASTPATH_BATCH_SIZE 64
+
/*
* RI_FastPathEntry
* Per-constraint cache of resources needed by ri_FastPathCheck().
@@ -216,6 +218,12 @@ typedef struct RI_FastPathEntry
/* For when IsolationUsesXactSnapshot() is true */
Snapshot xact_snap;
IndexScanDesc xact_scan;
+
+ HeapTuple batch[RI_FASTPATH_BATCH_SIZE];
+ int batch_count;
+
+ /* For ri_FastPathEndBatch() */
+ const RI_ConstraintInfo *riinfo;
} RI_FastPathEntry;
/*
@@ -303,7 +311,12 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool is_restrict, bool partgone);
static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo);
-static void ri_FastPathCleanup(void *arg);
+static void ri_FastPathEndBatch(void *arg);
+static void ri_FastPathTeardown(void);
+static void ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathBatchFlush(RI_FastPathEntry *fpentry,
+ Relation fk_rel);
/*
@@ -416,8 +429,18 @@ RI_FKey_check(TriggerData *trigdata)
*/
if (ri_fastpath_is_applicable(riinfo))
{
- ri_FastPathCheck(riinfo, fk_rel, newslot);
- return PointerGetDatum(NULL);
+ if (AfterTriggerBatchIsActive())
+ {
+ /* Batched path: buffer and probe in groups */
+ ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
+ return PointerGetDatum(NULL);
+ }
+ else
+ {
+ /* ALTER TABLE validation: per-row, no cache */
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
+ return PointerGetDatum(NULL);
+ }
}
SPI_connect();
@@ -3782,13 +3805,50 @@ RI_FKey_trigger_type(Oid tgfoid)
}
/*
- * ri_FastPathCleanup
- * Tear down all cached fast-path state.
+ * ri_FastPathEndBatch
+ * Flush remaining rows and tear down cached state.
*
- * Called as an AfterTriggerBatchCallback at end of batch.
+ * Registered as an AfterTriggerBatchCallback. Note: the flush can
+ * do real work (CCI, security context switch, index probes) and can
+ * throw ERROR on a constraint violation. If that happens,
+ * ri_FastPathTeardown never runs; ResourceOwner + XactCallback
+ * handle resource cleanup on the abort path.
*/
static void
-ri_FastPathCleanup(void *arg)
+ri_FastPathEndBatch(void *arg)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ /* Flush any partial batches -- can throw ERROR */
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->batch_count > 0)
+ {
+ Relation fk_rel = table_open(entry->riinfo->fk_relid,
+ AccessShareLock);
+
+ ri_FastPathBatchFlush(entry, fk_rel);
+ table_close(fk_rel, NoLock);
+ }
+ }
+
+ /* Orderly teardown */
+ ri_FastPathTeardown();
+}
+
+/*
+ * ri_FastPathTeardown
+ * Release all cached resources (scans, relations, snapshots).
+ *
+ * Pure resource cleanup -- no user-visible side effects, no errors.
+ */
+static void
+ri_FastPathTeardown(void)
{
HASH_SEQ_STATUS status;
RI_FastPathEntry *entry;
@@ -3951,7 +4011,7 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo)
/* Ensure cleanup at end of this trigger-firing batch */
if (!ri_fastpath_callback_registered)
{
- RegisterAfterTriggerBatchCallback(ri_FastPathCleanup, NULL);
+ RegisterAfterTriggerBatchCallback(ri_FastPathEndBatch, NULL);
ri_fastpath_callback_registered = true;
}
@@ -3962,7 +4022,148 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo)
SECURITY_NOFORCE_RLS);
ri_CheckPermissions(entry->pk_rel);
SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+ /* For ri_FastPathEndBatch() */
+ entry->riinfo = riinfo;
}
return entry;
}
+
+static void
+ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
+{
+ const RI_ConstraintInfo *riinfo = fpentry->riinfo;
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ IndexScanDesc scandesc = fpentry->scandesc;
+ TupleTableSlot *slot = fpentry->slot;
+ Snapshot snapshot = fpentry->snapshot;
+ TupleTableSlot *fk_slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ Oid saved_userid;
+ int saved_sec_context;
+ MemoryContext oldcxt;
+
+ if (fpentry->batch_count == 0)
+ return;
+
+ if (riinfo->fpmeta == NULL)
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ Assert(riinfo->fpmeta);
+
+ CommandCounterIncrement();
+ snapshot->curcid = GetCurrentCommandId(false);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
+ &TTSOpsHeapTuple);
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ for (int i = 0; i < fpentry->batch_count; i++)
+ {
+ HeapTuple fktuple = fpentry->batch[i];
+ bool found = false;
+
+ ExecStoreHeapTuple(fktuple, fk_slot, false);
+
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+
+ index_rescan(scandesc, skey, riinfo->nkeys, NULL, 0);
+
+ if (index_getnext_slot(scandesc, ForwardScanDirection, slot))
+ {
+ bool concurrently_updated;
+
+ if (ri_LockPKTuple(pk_rel, slot, snapshot,
+ &concurrently_updated))
+ {
+ if (concurrently_updated)
+ found = recheck_matched_pk_tuple(idx_rel, skey, slot);
+ else
+ found = true;
+ }
+ }
+
+ if (found && IsolationUsesXactSnapshot())
+ {
+ IndexScanDesc xact_scan;
+ TupleTableSlot *xact_slot;
+ Snapshot xact_snap = GetTransactionSnapshot();
+
+ xact_slot = table_slot_create(pk_rel, NULL);
+ xact_scan = index_beginscan(pk_rel, idx_rel,
+ xact_snap, NULL,
+ riinfo->nkeys, 0);
+ index_rescan(xact_scan, skey, riinfo->nkeys, NULL, 0);
+
+ if (!index_getnext_slot(xact_scan, ForwardScanDirection,
+ xact_slot))
+ found = false;
+
+ index_endscan(xact_scan);
+ ExecDropSingleTupleTableSlot(xact_slot);
+ }
+
+ /*
+ * Report immediately. ri_ReportViolation calls ereport(ERROR)
+ * which doesn't return, so remaining batch items and cleanup
+ * are handled by the error path (ResourceOwner + XactCallback).
+ */
+ if (!found)
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ fk_slot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
+
+ MemoryContextSwitchTo(oldcxt);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+ /* Free materialized tuples and reset */
+ for (int i = 0; i < fpentry->batch_count; i++)
+ heap_freetuple(fpentry->batch[i]);
+
+ fpentry->batch_count = 0;
+
+ ExecDropSingleTupleTableSlot(fk_slot);
+}
+
+/*
+ * ri_FastPathBatchAdd
+ * Buffer a FK row for batched probing.
+ *
+ * Adds the row to the batch buffer. When the buffer is full, flushes
+ * all buffered rows by probing the PK index. Any violation is reported
+ * immediately during the flush via ri_ReportViolation (which does not
+ * return).
+ *
+ * The batch is also flushed at end of trigger-firing cycle via
+ * ri_FastPathEndBatch().
+ */
+static void
+ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ RI_FastPathEntry *fpentry;
+ MemoryContext oldcxt;
+
+ fpentry = ri_FastPathGetEntry(riinfo);
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ fpentry->batch[fpentry->batch_count] =
+ ExecCopySlotHeapTuple(newslot);
+ fpentry->batch_count++;
+ MemoryContextSwitchTo(oldcxt);
+
+ if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
+ ri_FastPathBatchFlush(fpentry, fk_rel);
+}
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 808f2e632e7..16bb6370a97 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3570,3 +3570,26 @@ SELECT * FROM fp_fk_subxact;
(2 rows)
DROP TABLE fp_fk_subxact, fp_pk_subxact;
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+ERROR: insert or update on table "fp_fk_multi" violates foreign key constraint "fp_fk_multi_a_b_fkey"
+DETAIL: Key (a, b)=(999, 999) is not present in table "fp_pk_multi".
+DROP TABLE fp_fk_multi, fp_pk_multi;
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+ERROR: insert or update on table "fp_fk_commit" violates foreign key constraint "fp_fk_commit_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_commit".
+DROP TABLE fp_fk_commit, fp_pk_commit;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index ef6a3381e08..bc24272df20 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2556,3 +2556,24 @@ INSERT INTO fp_fk_subxact VALUES (1);
COMMIT;
SELECT * FROM fp_fk_subxact;
DROP TABLE fp_fk_subxact, fp_pk_subxact;
+
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+DROP TABLE fp_fk_multi, fp_pk_multi;
+
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+DROP TABLE fp_fk_commit, fp_pk_commit;
--
2.41.0
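For readers skimming the 0003 patch above, the buffer-then-flush control flow can be sketched outside PostgreSQL as follows. This is only an illustrative model, not code from the patch: BatchEntry, pk_exists(), batch_add(), batch_flush() and batch_end() are hypothetical stand-ins for RI_FastPathEntry, the PK index probe, ri_FastPathBatchAdd(), ri_FastPathBatchFlush() and ri_FastPathEndBatch(). The sketch also counts violations instead of raising an error, whereas the patch reports the first violation via ri_ReportViolation(), which does not return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_SIZE 64			/* same value as RI_FASTPATH_BATCH_SIZE */

/* Hypothetical stand-in for RI_FastPathEntry: buffered keys plus counters. */
typedef struct BatchEntry
{
	int			batch[BATCH_SIZE];	/* buffered FK key values */
	int			batch_count;		/* number of keys currently buffered */
	int			flushes;			/* how many non-empty flushes ran */
	int			probes;				/* how many per-row probes ran */
	int			violations;			/* keys with no matching PK row */
} BatchEntry;

/* Hypothetical stand-in for the PK index probe: keys 1..pk_max exist. */
static bool
pk_exists(int key, int pk_max)
{
	return key >= 1 && key <= pk_max;
}

/* Probe all buffered keys in one pass, then reset the buffer. */
static void
batch_flush(BatchEntry *e, int pk_max)
{
	if (e->batch_count == 0)
		return;

	/* Per-batch setup (CCI, security context switch) would happen once here. */
	e->flushes++;

	for (int i = 0; i < e->batch_count; i++)
	{
		e->probes++;
		if (!pk_exists(e->batch[i], pk_max))
			e->violations++;	/* the patch raises ERROR here instead */
	}

	e->batch_count = 0;
}

/* Buffer one key; flush as soon as the buffer is full. */
static void
batch_add(BatchEntry *e, int key, int pk_max)
{
	assert(e->batch_count < BATCH_SIZE);
	e->batch[e->batch_count++] = key;

	if (e->batch_count >= BATCH_SIZE)
		batch_flush(e, pk_max);
}

/* End of the trigger-firing cycle: flush whatever is left. */
static void
batch_end(BatchEntry *e, int pk_max)
{
	batch_flush(e, pk_max);
}
```

Feeding 130 keys through batch_add() with BATCH_SIZE 64 triggers two full flushes and leaves two keys buffered until batch_end() runs, which is why the patch needs the end-of-batch callback in addition to the buffer-full check.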
[application/octet-stream] v5-0001-Add-fast-path-for-foreign-key-constraint-checks.patch (30.2K, 7-v5-0001-Add-fast-path-for-foreign-key-constraint-checks.patch)
From 0a397570411859c60c581c0bc3ac5032b31de9e8 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Fri, 27 Feb 2026 22:27:53 +0900
Subject: [PATCH v5 1/6] Add fast path for foreign key constraint checks
Add a fast-path optimization for foreign key checks that bypasses SPI
by directly probing the unique index on the referenced table.
The fast path applies when the referenced table is not partitioned and
the constraint does not involve temporal semantics. ri_FastPathCheck()
extracts the FK values, builds scan keys, performs an index scan, and
locks the matching tuple with LockTupleKeyShare via ri_LockPKTuple(),
which handles the RI-specific subset of table_tuple_lock() results.
If the locked tuple was reached by chasing an update chain
(tmfd.traversed), recheck_matched_pk_tuple() verifies that the key
is still the same, emulating EvalPlanQual.
For REPEATABLE READ / SERIALIZABLE, a second index probe with
GetTransactionSnapshot() replicates the SPI path's crosscheck_snapshot
behavior: a PK row visible to the latest snapshot but not to the
transaction snapshot is rejected.
The ri_CheckPermissions() function performs schema USAGE and table
SELECT checks, matching what the SPI path does implicitly.
ri_HashCompareOp() is adjusted to handle cross-type equality operators
(e.g. int48eq for int4 PK / int8 FK) which can appear in conpfeqop.
The original code asserted same-type operators only.
Per-key metadata (compare entries, operator procedures, strategy
numbers) is cached in RI_ConstraintInfo via
ri_populate_fastpath_metadata() on first use, eliminating repeated
calls to ri_HashCompareOp() and get_op_opfamily_properties().
conindid and pk_is_partitioned are also cached at constraint load
time, avoiding per-invocation syscache lookups and the need to open
pk_rel before deciding whether the fast path applies.
Author: Junwang Zhao <[email protected]>
Author: Amit Langote <[email protected]>
Discussion: https://postgr.es/m/
---
src/backend/utils/adt/ri_triggers.c | 493 +++++++++++++++++-
.../expected/fk-concurrent-pk-upd.out | 105 ++++
src/test/isolation/isolation_schedule | 1 +
.../isolation/specs/fk-concurrent-pk-upd.spec | 53 ++
src/test/regress/expected/foreign_key.out | 47 ++
src/test/regress/sql/foreign_key.sql | 64 +++
6 files changed, 751 insertions(+), 12 deletions(-)
create mode 100644 src/test/isolation/expected/fk-concurrent-pk-upd.out
create mode 100644 src/test/isolation/specs/fk-concurrent-pk-upd.spec
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index d22b8ef7f3c..18373a586d6 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -24,12 +24,15 @@
#include "postgres.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/index.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
+#include "catalog/pg_namespace.h"
#include "commands/trigger.h"
#include "executor/executor.h"
#include "executor/spi.h"
@@ -91,6 +94,7 @@
#define RI_TRIGTYPE_UPDATE 2
#define RI_TRIGTYPE_DELETE 3
+typedef struct FastPathMeta FastPathMeta;
/*
* RI_ConstraintInfo
@@ -132,8 +136,24 @@ typedef struct RI_ConstraintInfo
Oid period_intersect_oper; /* anyrange * anyrange (or
* multiranges) */
dlist_node valid_link; /* Link in list of valid entries */
+
+ Oid conindid;
+ bool pk_is_partitioned;
+
+ FastPathMeta *fpmeta;
} RI_ConstraintInfo;
+typedef struct RI_CompareHashEntry RI_CompareHashEntry;
+
+/* Fast-path metadata for RI checks on foreign key referencing tables */
+typedef struct FastPathMeta
+{
+ RI_CompareHashEntry *compare_entries[RI_MAX_NUMKEYS];
+ RegProcedure regops[RI_MAX_NUMKEYS];
+ Oid subtypes[RI_MAX_NUMKEYS];
+ int strats[RI_MAX_NUMKEYS];
+} FastPathMeta;
+
/*
* RI_QueryKey
*
@@ -233,6 +253,23 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool is_restrict,
bool detectNewRows, int expect_OK);
+static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
+ IndexScanDesc scandesc, TupleTableSlot *slot,
+ Snapshot snapshot, const RI_ConstraintInfo *riinfo,
+ ScanKeyData *skey, int nkeys);
+static bool ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
+ bool *concurrently_updated);
+static bool ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo);
+static void ri_CheckPermissions(Relation query_rel);
+static bool recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+ TupleTableSlot *new_slot);
+static void build_index_scankeys(const RI_ConstraintInfo *riinfo,
+ Relation idx_rel, Datum *pk_vals,
+ char *pk_nulls, ScanKey skeys);
+static void ri_populate_fastpath_metadata(RI_ConstraintInfo *riinfo,
+ Relation fk_rel, Relation idx_rel);
static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
const RI_ConstraintInfo *riinfo, bool rel_is_pk,
Datum *vals, char *nulls);
@@ -276,14 +313,7 @@ RI_FKey_check(TriggerData *trigdata)
if (!table_tuple_satisfies_snapshot(trigdata->tg_relation, newslot, SnapshotSelf))
return PointerGetDatum(NULL);
- /*
- * Get the relation descriptors of the FK and PK tables.
- *
- * pk_rel is opened in RowShareLock mode since that's what our eventual
- * SELECT FOR KEY SHARE will get on it.
- */
fk_rel = trigdata->tg_relation;
- pk_rel = table_open(riinfo->pk_relid, RowShareLock);
switch (ri_NullCheck(RelationGetDescr(fk_rel), newslot, riinfo, false))
{
@@ -293,7 +323,6 @@ RI_FKey_check(TriggerData *trigdata)
* No further check needed - an all-NULL key passes every type of
* foreign key constraint.
*/
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
case RI_KEYS_SOME_NULL:
@@ -318,7 +347,6 @@ RI_FKey_check(TriggerData *trigdata)
errdetail("MATCH FULL does not allow mixing of null and nonnull key values."),
errtableconstraint(fk_rel,
NameStr(riinfo->conname))));
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
case FKCONSTR_MATCH_SIMPLE:
@@ -327,7 +355,6 @@ RI_FKey_check(TriggerData *trigdata)
* MATCH SIMPLE - if ANY column is null, the key passes
* the constraint.
*/
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
#ifdef NOT_USED
@@ -352,8 +379,27 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
+ /*
+ * Fast path: probe the PK unique index directly, bypassing SPI.
+ *
+ * Note: pk_rel is NOT opened above. ri_fastpath_is_applicable() uses
+ * cached metadata (pk_is_partitioned) rather than an open Relation, and
+ * ri_FastPathCheck() opens it internally.
+ */
+ if (ri_fastpath_is_applicable(riinfo))
+ {
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
+ return PointerGetDatum(NULL);
+ }
+
SPI_connect();
+ /*
+ * pk_rel is opened in RowShareLock mode since that's what our eventual
+ * SELECT FOR KEY SHARE will get on it.
+ */
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+
/* Fetch or prepare a saved plan for the real check */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
@@ -2356,6 +2402,12 @@ ri_LoadConstraintInfo(Oid constraintOid)
riinfo->valid = true;
+ riinfo->conindid = conForm->conindid;
+ riinfo->pk_is_partitioned =
+ (get_rel_relkind(riinfo->pk_relid) == RELKIND_PARTITIONED_TABLE);
+
+ riinfo->fpmeta = NULL;
+
return riinfo;
}
@@ -2617,6 +2669,415 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
return SPI_processed != 0;
}
+/*
+ * ri_FastPathCheck
+ * Perform FK existence check via direct index probe, bypassing SPI.
+ *
+ * If no matching PK row exists, report the violation via ri_ReportViolation(),
+ * otherwise, the function returns normally.
+ */
+static void
+ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ Relation pk_rel;
+ Relation idx_rel;
+ IndexScanDesc scandesc;
+ TupleTableSlot *slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found = false;
+ Oid saved_userid;
+ int saved_sec_context;
+ Snapshot snapshot;
+
+ /*
+ * Advance the command counter so the snapshot sees the effects of prior
+ * triggers in this statement. Mirrors what the SPI path does in
+ * ri_PerformCheck().
+ */
+ CommandCounterIncrement();
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
+
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ idx_rel = index_open(riinfo->conindid, AccessShareLock);
+
+ slot = table_slot_create(pk_rel, NULL);
+ scandesc = index_beginscan(pk_rel, idx_rel,
+ snapshot, NULL,
+ riinfo->nkeys, 0);
+
+ if (riinfo->fpmeta == NULL)
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ Assert(riinfo->fpmeta);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ ri_CheckPermissions(pk_rel);
+
+ ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, slot,
+ snapshot, riinfo, skey, riinfo->nkeys);
+ if (!found)
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ newslot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+ index_endscan(scandesc);
+ index_close(idx_rel, NoLock);
+ table_close(pk_rel, NoLock);
+ ExecDropSingleTupleTableSlot(slot);
+
+ UnregisterSnapshot(snapshot);
+}
+
+/*
+ * ri_FastPathProbeOne
+ * Probe the PK index for one set of scan keys, lock the matching
+ * tuple, and perform the RR/SERIALIZABLE crosscheck if needed.
+ *
+ * Returns true if a matching PK row was found, locked, and (if
+ * applicable) visible to the transaction snapshot.
+ *
+ * The caller must ensure CurrentMemoryContext is long-lived enough
+ * for the scan descriptor's internal allocations (typically
+ * TopTransactionContext when using a cached scandesc).
+ */
+static bool
+ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
+ IndexScanDesc scandesc, TupleTableSlot *slot,
+ Snapshot snapshot, const RI_ConstraintInfo *riinfo,
+ ScanKeyData *skey, int nkeys)
+{
+ bool found = false;
+
+ index_rescan(scandesc, skey, nkeys, NULL, 0);
+
+ if (index_getnext_slot(scandesc, ForwardScanDirection, slot))
+ {
+ bool concurrently_updated;
+
+ if (ri_LockPKTuple(pk_rel, slot, snapshot,
+ &concurrently_updated))
+ {
+ if (concurrently_updated)
+ found = recheck_matched_pk_tuple(idx_rel, skey, slot);
+ else
+ found = true;
+ }
+ }
+
+ /*--------
+ * Crosscheck for REPEATABLE READ / SERIALIZABLE:
+ *
+ * The latest snapshot can see PK rows committed after our transaction
+ * started. But the FK check must only succeed if the key also exists
+ * in a version visible to our transaction snapshot. We can't just do
+ * table_tuple_satisfies_snapshot on the locked tuple, because a
+ * non-key update creates a new version invisible to our snapshot even
+ * though the key hasn't changed.
+ *
+ * Instead, do a second index probe with the transaction snapshot.
+ * This correctly handles both cases:
+ * - Newly inserted PK row: not found -> reject
+ * - Non-key update of existing row: old version found -> accept
+ *
+ * This matches the crosscheck_snapshot behavior in the SPI path's
+ * ri_PerformCheck().
+ */
+ if (found && IsolationUsesXactSnapshot())
+ {
+ IndexScanDesc xact_scan;
+ TupleTableSlot *xact_slot;
+ Snapshot xact_snap = GetTransactionSnapshot();
+
+ xact_slot = table_slot_create(pk_rel, NULL);
+ xact_scan = index_beginscan(pk_rel, idx_rel,
+ xact_snap, NULL, nkeys, 0);
+ index_rescan(xact_scan, skey, nkeys, NULL, 0);
+
+ if (!index_getnext_slot(xact_scan, ForwardScanDirection, xact_slot))
+ found = false;
+
+ index_endscan(xact_scan);
+ ExecDropSingleTupleTableSlot(xact_slot);
+ }
+
+ return found;
+}
+
+/*
+ * ri_LockPKTuple
+ * Lock a PK tuple found by the fast-path index scan.
+ *
+ * Calls table_tuple_lock() directly with handling specific to RI checks.
+ * Returns true if the tuple was successfully locked.
+ *
+ * Sets *concurrently_updated to true if the locked tuple was reached
+ * by following an update chain (tmfd.traversed), indicating the caller
+ * should recheck the key.
+ */
+static bool
+ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
+ bool *concurrently_updated)
+{
+ TM_FailureData tmfd;
+ TM_Result result;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+
+ *concurrently_updated = false;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ result = table_tuple_lock(pk_rel, &slot->tts_tid, snap,
+ slot, GetCurrentCommandId(false),
+ LockTupleKeyShare, LockWaitBlock,
+ lockflags, &tmfd);
+
+ switch (result)
+ {
+ case TM_Ok:
+ if (tmfd.traversed)
+ *concurrently_updated = true;
+ return true;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ return false;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+
+ /*
+ * In READ COMMITTED, FIND_LAST_VERSION should have chased the
+ * chain and returned TM_Ok. Getting here means something
+ * unexpected -- fall through to error.
+ */
+ elog(ERROR, "unexpected table_tuple_lock status: %u", result);
+ break;
+
+ case TM_SelfModified:
+
+ /*
+ * The current command or a later command in this transaction
+ * modified the PK row. This shouldn't normally happen during an
+ * FK check (we're not modifying pk_rel), but handle it safely by
+ * treating the tuple as not found.
+ */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ break;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", result);
+ break;
+ }
+
+ return false; /* keep compiler quiet */
+}
+
+static bool
+ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo)
+{
+ /*
+ * Partitioned referenced tables are skipped for simplicity, since they
+ * require routing the probe through the correct partition using
+ * PartitionDirectory.
+ */
+ if (riinfo->pk_is_partitioned)
+ return false;
+
+ /*
+ * Temporal foreign keys use range overlap and containment semantics (&&,
+ * <@, range_agg()) that inherently involve aggregation and multiple-row
+ * reasoning, so they stay on the SPI path.
+ */
+ if (riinfo->hasperiod)
+ return false;
+
+ return true;
+}
+
+/*
+ * ri_CheckPermissions
+ * Check that the current user has permissions to look into the schema of
+ * and SELECT from 'query_rel'
+ */
+static void
+ri_CheckPermissions(Relation query_rel)
+{
+ AclResult aclresult;
+
+ /* USAGE on schema. */
+ aclresult = object_aclcheck(NamespaceRelationId,
+ RelationGetNamespace(query_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(query_rel)));
+
+ /* SELECT on relation. */
+ aclresult = pg_class_aclcheck(RelationGetRelid(query_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(query_rel));
+}
+
+/*
+ * This checks that the index key of the tuple specified in 'new_slot' matches
+ * the key that has already been found in the PK index relation 'idxrel'.
+ *
+ * Returns true if the index key of the tuple matches the existing index
+ * key, false otherwise.
+ */
+static bool
+recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+ TupleTableSlot *new_slot)
+{
+ /*
+ * TODO: BuildIndexInfo does a syscache lookup + palloc on every call.
+ * This only fires on the concurrent-update path (tmfd.traversed), which
+ * should be rare, so the cost is acceptable for now. If profiling shows
+ * otherwise, cache the IndexInfo in FastPathMeta.
+ */
+ IndexInfo *indexInfo = BuildIndexInfo(idxrel);
+ Datum values[INDEX_MAX_KEYS];
+ bool isnull[INDEX_MAX_KEYS];
+ bool matched = true;
+
+ /* PK indexes never have these. */
+ Assert(indexInfo->ii_Expressions == NIL &&
+ indexInfo->ii_ExclusionOps == NULL);
+
+ /* Form the index values and isnull flags given the table tuple. */
+ FormIndexDatum(indexInfo, new_slot, NULL, values, isnull);
+ for (int i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
+ {
+ ScanKeyData *skey = &skeys[i];
+
+ /* A PK column can never be set to NULL. */
+ Assert(!isnull[i]);
+ if (!DatumGetBool(FunctionCall2Coll(&skey->sk_func,
+ skey->sk_collation,
+ values[i],
+ skey->sk_argument)))
+ {
+ matched = false;
+ break;
+ }
+ }
+
+ return matched;
+}
+
+/*
+ * build_index_scankeys
+ * Build ScanKeys for a direct index probe of the PK's unique index.
+ *
+ * Uses cached compare entries, operator procedures, and strategy numbers
+ * from ri_populate_fastpath_metadata() rather than looking them up on
+ * each invocation. Casts FK values to the operator's expected input
+ * type if needed.
+ */
+static void
+build_index_scankeys(const RI_ConstraintInfo *riinfo,
+ Relation idx_rel, Datum *pk_vals,
+ char *pk_nulls, ScanKey skeys)
+{
+ FastPathMeta *fpmeta = riinfo->fpmeta;
+
+ Assert(fpmeta);
+
+ /*
+ * May need to cast each of the individual values of the foreign key to
+ * the corresponding PK column's type if the equality operator demands it.
+ */
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ RI_CompareHashEntry *entry = fpmeta->compare_entries[i];
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+
+ /*
+ * Set up ScanKeys for the index scan. This is essentially how
+ * ExecIndexBuildScanKeys() sets them up.
+ */
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ int pkattrno = i + 1;
+
+ ScanKeyEntryInitialize(&skeys[i], 0, pkattrno,
+ fpmeta->strats[i], fpmeta->subtypes[i],
+ idx_rel->rd_indcollation[i], fpmeta->regops[i],
+ pk_vals[i]);
+ }
+}
+
+/*
+ * ri_populate_fastpath_metadata
+ * Cache per-key metadata needed by build_index_scankeys().
+ *
+ * Looks up the compare hash entry, operator procedure OID, and index
+ * strategy/subtype for each key column. Called lazily on first use
+ * and persists for the lifetime of the RI_ConstraintInfo entry.
+ */
+static void
+ri_populate_fastpath_metadata(RI_ConstraintInfo *riinfo,
+ Relation fk_rel, Relation idx_rel)
+{
+ FastPathMeta *fpmeta;
+ MemoryContext oldcxt = MemoryContextSwitchTo(TopMemoryContext);
+
+ Assert(riinfo != NULL && riinfo->valid);
+
+ fpmeta = palloc_object(FastPathMeta);
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ Oid eq_opr = riinfo->pf_eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ Oid lefttype;
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ fpmeta->compare_entries[i] = entry;
+ fpmeta->regops[i] = get_opcode(eq_opr);
+
+ get_op_opfamily_properties(eq_opr,
+ idx_rel->rd_opfamily[i],
+ false,
+ &fpmeta->strats[i],
+ &lefttype,
+ &fpmeta->subtypes[i]);
+ }
+
+ riinfo->fpmeta = fpmeta;
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* Extract fields from a tuple into Datum/nulls arrays
*/
@@ -3169,8 +3630,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the FK column type already matches what the
+ * operator expects. For same-type operators, that's the common type.
+ * For cross-type operators (e.g. int48eq for int4 PK / int8 FK), the
+ * FK value is the right operand, so skip the cast if typeid matches
+ * righttype.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/test/isolation/expected/fk-concurrent-pk-upd.out b/src/test/isolation/expected/fk-concurrent-pk-upd.out
new file mode 100644
index 00000000000..4dd9535d3c0
--- /dev/null
+++ b/src/test/isolation/expected/fk-concurrent-pk-upd.out
@@ -0,0 +1,105 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2b s2ukey s1b s1i s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2c: COMMIT;
+step s1i: <... completed>
+ERROR: insert or update on table "child" violates foreign key constraint "child_parent_key_fkey"
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 2|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+(0 rows)
+
+
+starting permutation: s2b s2uaux s1b s1i s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2uaux: UPDATE parent SET aux = 'bar' WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1);
+step s2c: COMMIT;
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|bar
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
+
+starting permutation: s2b s2ukey s1b s1i s2ukey2 s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2ukey2: UPDATE parent SET parent_key = 1 WHERE parent_key = 2;
+step s2c: COMMIT;
+step s1i: <... completed>
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
+
+starting permutation: s2b s2ukey s3b s3i s2c s3c s2s s3s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s3b: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s3i: INSERT INTO child VALUES (2, 1); <waiting ...>
+step s2c: COMMIT;
+step s3i: <... completed>
+ERROR: could not serialize access due to concurrent update
+step s3c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 2|foo
+(1 row)
+
+step s3s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+(0 rows)
+
+
+starting permutation: s2b s2uaux s3b s3i s2c s3c s2s s3s
+step s2b: BEGIN;
+step s2uaux: UPDATE parent SET aux = 'bar' WHERE parent_key = 1;
+step s3b: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s3i: INSERT INTO child VALUES (2, 1);
+step s2c: COMMIT;
+step s3c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|bar
+(1 row)
+
+step s3s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 2| 1
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 4e466580cd4..c1a999bf1d2 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -37,6 +37,7 @@ test: fk-partitioned-2
test: fk-snapshot
test: fk-snapshot-2
test: fk-snapshot-3
+test: fk-concurrent-pk-upd
test: subxid-overflow
test: eval-plan-qual
test: eval-plan-qual-trigger
diff --git a/src/test/isolation/specs/fk-concurrent-pk-upd.spec b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
new file mode 100644
index 00000000000..03dc7f260cd
--- /dev/null
+++ b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
@@ -0,0 +1,53 @@
+# Tests that an INSERT on referencing table correctly fails when
+# the referenced value disappears due to a concurrent update
+setup
+{
+ CREATE TABLE parent (
+ parent_key int PRIMARY KEY,
+ aux text NOT NULL
+ );
+
+ CREATE TABLE child (
+ child_key int PRIMARY KEY,
+ parent_key int8 NOT NULL REFERENCES parent
+ );
+
+ INSERT INTO parent VALUES (1, 'foo');
+}
+
+teardown
+{
+ DROP TABLE parent, child;
+}
+
+session s1
+step s1b { BEGIN; }
+step s1i { INSERT INTO child VALUES (1, 1); }
+step s1c { COMMIT; }
+step s1s { SELECT * FROM child; }
+
+session s2
+step s2b { BEGIN; }
+step s2ukey { UPDATE parent SET parent_key = 2 WHERE parent_key = 1; }
+step s2uaux { UPDATE parent SET aux = 'bar' WHERE parent_key = 1; }
+step s2ukey2 { UPDATE parent SET parent_key = 1 WHERE parent_key = 2; }
+step s2c { COMMIT; }
+step s2s { SELECT * FROM parent; }
+
+session s3
+step s3b { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s3i { INSERT INTO child VALUES (2, 1); }
+step s3c { COMMIT; }
+step s3s { SELECT * FROM child; }
+
+# fail
+permutation s2b s2ukey s1b s1i s2c s1c s2s s1s
+# ok
+permutation s2b s2uaux s1b s1i s2c s1c s2s s1s
+# ok
+permutation s2b s2ukey s1b s1i s2ukey2 s2c s1c s2s s1s
+
+# RR: key update -> serialization failure
+permutation s2b s2ukey s3b s3i s2c s3c s2s s3s
+# RR: non-key update -> old version visible via transaction snapshot
+permutation s2b s2uaux s3b s3i s2c s3c s2s s3s
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 9ae4dbf1b0a..0826f518004 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -370,6 +370,53 @@ SELECT * FROM PKTABLE;
DROP TABLE FKTABLE;
DROP TABLE PKTABLE;
--
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+SET ROLE regress_foreign_key_user;
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+RESET ROLE;
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+ERROR: permission denied for table pktable
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
-- Check initial check upon ALTER TABLE
--
CREATE TABLE PKTABLE ( ptest1 int, ptest2 int, PRIMARY KEY(ptest1, ptest2) );
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index 3b8c95bf893..e9ee29331cb 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -242,6 +242,70 @@ SELECT * FROM PKTABLE;
DROP TABLE FKTABLE;
DROP TABLE PKTABLE;
+--
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+SET ROLE regress_foreign_key_user;
+
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+
+RESET ROLE;
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+
+DROP USER regress_foreign_key_user;
+
--
-- Check initial check upon ALTER TABLE
--
--
2.41.0
* Re: Eliminating SPI / SQL from some RI triggers - take 3
2024-12-20 04:23 Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 04:07 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 05:10 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Pavel Stehule <[email protected]>
2025-10-22 13:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-12-01 06:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-02 15:30 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-10 12:28 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
@ 2026-03-16 14:03 ` Amit Langote <[email protected]>
2026-03-20 08:20 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
1 sibling, 1 reply; 61+ messages in thread
From: Amit Langote @ 2026-03-16 14:03 UTC (permalink / raw)
To: Junwang Zhao <[email protected]>; +Cc: Pavel Stehule <[email protected]>; pgsql-hackers
Hi Junwang,
Thanks for sending the new version.
On Tue, Mar 10, 2026 at ... Junwang Zhao <[email protected]> wrote:
> 1.
> Move ri_ReportViolation into ri_FastPathCheck, so table_open is no
> longer needed, and ri_FastPathCheck now returns void.
Good, kept.
> 2.
> After adding the batch fast path, the original ri_FastPathCheck is only
> used by the ALTER TABLE validation path. This path cannot use the
> cache because the registered AfterTriggerBatch callback will never run.
> Therefore, the use_cache branch can be removed.
Agreed. I went a step further and restructured 0002 to avoid the
use_cache branching entirely. Instead of adding if/else blocks to
ri_FastPathCheck, 0002 now adds a separate ri_FastPathCheckCached()
function with its own resource lifecycle. 0003 then replaces it with
ri_FastPathBatchAdd() -- a clean swap rather than completely undoing
what 0002 added. This also removes the use_cache parameter from
ri_FastPathProbeOne; the memory context switch to
TopTransactionContext is now the caller's responsibility.
> 3.
> ri_FastPathBatchFlush creates a new fk_slot but does not cache it in
> RI_FastPathEntry. I tried caching it in v5-0006 and ran some benchmarks,
> it didn't show much improvement.
I put the fk_slot in the cache entry since it's a small change.
> 4.
> ri_FastPathFlushArray currently uses SK_SEARCHARRAY only for
> single-column checks. [...] my current understanding is that
> SK_SEARCHARRAY may not work for multi-column checks.
Right, I haven't investigated this deeply either. The FlushLoop
fallback is the right approach for now. If we want to explore a
SEARCHARRAY approach for multi-column keys in a follow-up, it would be
worth checking with Peter Geoghegan or someone else familiar with the
btree SAOP internals on how multiple array keys across columns
are iterated and whether that's usable at all for this use case.
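
As a rough illustration of the single-column idea discussed above, here is a
toy model in C. It is not the btree SAOP code and not what the patch does
internally: the "index" is just a sorted, duplicate-free int array, and every
name (toy_flush_array, toy_cmp_int) is hypothetical. It only shows the shape of
the technique: sort and deduplicate the buffered FK values, walk the keys once
in order, then mark which batch items were matched.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* qsort/bsearch comparator for ints; avoids overflow from subtraction. */
static int
toy_cmp_int(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

/*
 * Toy stand-in for a batched probe.  pk_keys[] is sorted and duplicate-free
 * and plays the role of the unique index.  Sets matched[i] for every
 * fk_vals[i] that has a matching key.  Returns the number of unmatched
 * items, or -1 if memory could not be allocated.
 */
static int
toy_flush_array(const int *pk_keys, int npk,
				const int *fk_vals, int nfk, bool *matched)
{
	int		   *probe;
	int			nprobe = 0;
	int			nfound = 0;
	int			unmatched = 0;
	int			p = 0;

	if (nfk <= 0)
		return 0;

	probe = malloc(sizeof(int) * (size_t) nfk);
	if (probe == NULL)
		return -1;
	memcpy(probe, fk_vals, sizeof(int) * (size_t) nfk);

	/* Sort the probe values, then remove duplicates in place. */
	qsort(probe, (size_t) nfk, sizeof(int), toy_cmp_int);
	for (int i = 0; i < nfk; i++)
	{
		if (nprobe == 0 || probe[nprobe - 1] != probe[i])
			probe[nprobe++] = probe[i];
	}

	/*
	 * One ordered pass over the keys: the key cursor p only moves forward.
	 * Values that exist are compacted to the front of probe[].
	 */
	for (int i = 0; i < nprobe; i++)
	{
		while (p < npk && pk_keys[p] < probe[i])
			p++;
		if (p < npk && pk_keys[p] == probe[i])
			probe[nfound++] = probe[i];
	}

	/* Map the found values back onto the original batch positions. */
	for (int i = 0; i < nfk; i++)
	{
		matched[i] = bsearch(&fk_vals[i], probe, (size_t) nfound,
							 sizeof(int), toy_cmp_int) != NULL;
		if (!matched[i])
			unmatched++;
	}

	free(probe);
	return unmatched;
}
```

With keys {1, 3, 5, 7} and a batch {5, 2, 5, 1, 9}, the items holding 2 and 9
end up unmatched. A multi-column version would need a different structure,
which is the open question above.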
Attached is v6, three patches -- combined the old 0003 (buffering) and
0004 (SK_SEARCHARRAY) into a single 0003, since buffering alone yields
little or no performance benefit and the split only added diff/rebase
churn.
The biggest change in this version is the snapshot handling. Looking
more carefully at what the SPI path actually does for RI_FKey_check
(non-partitioned PK, detectNewRows = false), I found that
ri_PerformCheck passes InvalidSnapshot to SPI_execute_snapshot, and
_SPI_execute_plan ends up doing
PushActiveSnapshot(GetTransactionSnapshot()). So the SPI path scans
with the transaction snapshot, not the latest snapshot.
So I've changed the fast path to match: ri_FastPathCheck now uses
GetTransactionSnapshot() directly. Under READ COMMITTED this is a
fresh snapshot; under REPEATABLE READ it's the frozen
transaction-start snapshot, so PK rows committed after the transaction
started are simply not visible. This means the second index probe
(the IsolationUsesXactSnapshot crosscheck block) is no longer needed
and is removed. The existing fk-snapshot isolation test confirms this
is the correct behavior.
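
To make the visibility difference above concrete, here is a toy model in C.
It is only a sketch under simplifying assumptions, not PostgreSQL code: time is
a plain integer, a PK row is visible if it committed before the snapshot was
taken, and deletes, in-progress transactions and a transaction's own changes
are ignored. All names (ToyPkRow, toy_snapshot_time, toy_fk_check) are
hypothetical, and xact_start simply stands for whenever the frozen snapshot
was taken.

```c
#include <stdbool.h>

typedef enum
{
	TOY_READ_COMMITTED,
	TOY_REPEATABLE_READ
} ToyIsolation;

/* A PK row version and the (toy) time at which its insert committed. */
typedef struct ToyPkRow
{
	int			key;
	int			commit_time;
} ToyPkRow;

/*
 * Time at which the snapshot used for the check was taken: a fresh one per
 * statement under READ COMMITTED, the frozen one under REPEATABLE READ.
 */
static int
toy_snapshot_time(ToyIsolation level, int xact_start, int stmt_start)
{
	return (level == TOY_READ_COMMITTED) ? stmt_start : xact_start;
}

/* The FK check succeeds only if a matching key is visible to the snapshot. */
static bool
toy_fk_check(const ToyPkRow *pk, int npk, int fk_value, int snapshot_time)
{
	for (int i = 0; i < npk; i++)
	{
		if (pk[i].key == fk_value && pk[i].commit_time < snapshot_time)
			return true;
	}
	return false;
}
```

In this model, a PK row committed at time 15 is found by a check whose
statement starts at time 20 under READ COMMITTED, but not by a REPEATABLE READ
transaction whose snapshot dates from time 10, so a single probe with that
snapshot already gives the answer the separate crosscheck probe was meant to
enforce.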
Other changes since v5:
* Fixed the batch callback firing during nested SPI: another AFTER
trigger doing DML via SPI would call AfterTriggerEndQuery at a nested
level, tearing down our cache mid-batch. Fixed by checking
query_depth inside FireAfterTriggerBatchCallbacks. Added a test case
with a trigger that auto-provisions PK rows via SPI.
* Security context is now restored before ri_ReportViolation, not
after (the ereport doesn't return).
* search_vals[] and matched[] moved from RI_FastPathEntry to stack
locals in ri_FastPathFlushArray -- they're rewritten from scratch on
every flush with no state carried between calls.
* Various comment fixes.
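
The point in the list above about restoring the security context before
reporting can be sketched with a small self-contained C toy. It uses
setjmp/longjmp as a stand-in for an error report that does not return; the
names (toy_current_user, toy_report_violation and so on) are hypothetical and
nothing here is PostgreSQL code. Real error recovery in the server may well
reset such state on its own, so the toy only shows why a restore placed after
a non-returning call is dead code on the violation path.

```c
#include <setjmp.h>
#include <stdbool.h>

#define TOY_SESSION_USER	100
#define TOY_TABLE_OWNER		1

static jmp_buf toy_error_jmp;
static int	toy_current_user = TOY_SESSION_USER;

/* Stand-in for an error report: control never comes back to the caller. */
static void
toy_report_violation(void)
{
	longjmp(toy_error_jmp, 1);
}

/* Restore placed after the report: skipped whenever a violation is raised. */
static void
toy_check_restore_after(bool found)
{
	int			saved_user = toy_current_user;

	toy_current_user = TOY_TABLE_OWNER;
	if (!found)
		toy_report_violation();
	toy_current_user = saved_user;
}

/* Restore placed before the report: always runs. */
static void
toy_check_restore_before(bool found)
{
	int			saved_user = toy_current_user;

	toy_current_user = TOY_TABLE_OWNER;
	/* ... the probe would run here, as the table owner ... */
	toy_current_user = saved_user;
	if (!found)
		toy_report_violation();
}

/* Run one check and return the user id left in effect afterwards. */
static int
toy_user_after_check(void (*check) (bool), bool found)
{
	toy_current_user = TOY_SESSION_USER;
	if (setjmp(toy_error_jmp) == 0)
		check(found);
	return toy_current_user;
}
```

Calling toy_user_after_check(toy_check_restore_after, false) leaves the toy
owner id in effect, while the restore-before variant returns to the session
user on both the success and the violation path.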
I think this is getting close to committable shape. That said, another
pair of eyes would be reassuring before I pull the trigger. Tomas, if
you've had a chance to look, I'd welcome your thoughts.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v6-0003-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pat.patch (24.7K, 2-v6-0003-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pat.patch)
From a1e04ae6d5f3502be1b31317346e4705e1500472 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Feb 2026 21:25:14 +0900
Subject: [PATCH v6 3/3] Batch FK rows and use SK_SEARCHARRAY for fast-path
probes
Instead of probing the PK index on each trigger invocation, buffer
FK rows in the per-constraint cache entry (RI_FastPathEntry) and
flush them as a batch. When the buffer fills (64 rows) or the
trigger-firing cycle ends, ri_FastPathBatchFlush() probes the index
for all buffered rows, sharing a single CommandCounterIncrement and
security context switch across the batch.
For single-column foreign keys, the flush builds an ArrayType from
the buffered FK values (casting to the PK-side type if needed) and
constructs a scan key with the SK_SEARCHARRAY flag. The index AM
sorts and deduplicates the array internally, then walks matching
leaf pages in one ordered traversal instead of descending from the
root once per row. A matched[] bitmap tracks which batch items
were satisfied; unmatched items are reported as violations.
Multi-column foreign keys fall back to a per-row probe loop via
ri_FastPathFlushLoop().
FK tuples are materialized via ExecCopySlotHeapTuple() into
TopTransactionContext so they survive across trigger invocations.
Violations are reported immediately during the flush via
ri_ReportViolation(), which does not return.
ri_FastPathEndBatch() flushes any partial batch before tearing down
cached resources. Since the FK relation may already be closed by
flush time (e.g. for deferred constraints at COMMIT), the entry
stashes fk_relid and reopens it if needed.
The non-cached path (ALTER TABLE validation) bypasses batching and
continues to call ri_FastPathCheck() directly per row.
---
src/backend/utils/adt/ri_triggers.c | 391 +++++++++++++++++++---
src/test/regress/expected/foreign_key.out | 40 +++
src/test/regress/sql/foreign_key.sql | 38 +++
3 files changed, 421 insertions(+), 48 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 467418cadc0..cc085120c79 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -196,13 +196,18 @@ typedef struct RI_CompareHashEntry
FmgrInfo cast_func_finfo; /* in case we must coerce input */
} RI_CompareHashEntry;
+#define RI_FASTPATH_BATCH_SIZE 64
+
/*
* RI_FastPathEntry
- * Per-constraint cache of resources needed by ri_FastPathCheckCached().
+ * Per-constraint cache of resources needed by ri_FastPathFlushBatch().
*
* One entry per constraint, keyed by pg_constraint OID. Created lazily
* by ri_FastPathGetEntry() on first use within a trigger-firing batch
* and torn down by ri_FastPathTeardown() at batch end.
+ *
+ * FK tuples are buffered in batch[] across trigger invocations and
+ * flushed when the buffer fills or the batch ends.
*/
typedef struct RI_FastPathEntry
{
@@ -210,8 +215,15 @@ typedef struct RI_FastPathEntry
Relation pk_rel;
Relation idx_rel;
IndexScanDesc scandesc;
- TupleTableSlot *slot;
+ TupleTableSlot *pk_slot;
+ TupleTableSlot *fk_slot;
Snapshot snapshot; /* registered snapshot for the scan */
+
+ HeapTuple batch[RI_FASTPATH_BATCH_SIZE];
+ int batch_count;
+
+ /* For ri_FastPathEndBatch() */
+ const RI_ConstraintInfo *riinfo;
} RI_FastPathEntry;
/*
@@ -274,8 +286,14 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
bool detectNewRows, int expect_OK);
static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot);
-static void ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
- Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel);
+static void ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel);
+static void ri_FastPathBatchFlush(RI_FastPathEntry *fpentry,
+ Relation fk_rel);
static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
IndexScanDesc scandesc, TupleTableSlot *slot,
Snapshot snapshot, const RI_ConstraintInfo *riinfo,
@@ -300,8 +318,8 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
int queryno, bool is_restrict, bool partgone);
static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo,
Relation fk_rel);
-static void ri_FastPathTeardown(void *arg);
-
+static void ri_FastPathEndBatch(void *arg);
+static void ri_FastPathTeardown(void);
/*
* RI_FKey_check -
@@ -411,16 +429,22 @@ RI_FKey_check(TriggerData *trigdata)
* index scan + tuple lock. This is semantically equivalent to
* the SPI path below but avoids the per-row executor overhead.
*
- * ri_FastPathCheckCached and ri_FastPathCheck() reports the violation
+ * ri_FastPathBatchAdd() and ri_FastPathCheck() report the violation
* themselves if no matching PK row is found, so it only returns on
* success.
*/
if (ri_fastpath_is_applicable(riinfo))
{
if (AfterTriggerBatchIsActive())
- ri_FastPathCheckCached(riinfo, fk_rel, newslot);
+ {
+ /* Batched path: buffer and probe in groups */
+ ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
+ }
else
+ {
+ /* ALTER TABLE validation: per-row, no cache */
ri_FastPathCheck(riinfo, fk_rel, newslot);
+ }
return PointerGetDatum(NULL);
}
@@ -2703,10 +2727,14 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/*
* ri_FastPathCheck
- * Perform FK existence check via direct index probe, bypassing SPI.
+ * Perform per-row FK existence check via direct index probe,
+ * bypassing SPI.
*
* If no matching PK row exists, report the violation via ri_ReportViolation(),
* otherwise, the function returns normally.
+ *
+ * Note: This is only used by the ALTER TABLE validation path. Other paths use
+ * ri_FastPathBatchAdd().
*/
static void
ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
@@ -2771,72 +2799,295 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
}
/*
- * ri_FastPathCheckCached
- * Cached-resource variant of ri_FastPathCheck for use within the
- * after-trigger framework.
+ * ri_FastPathBatchAdd
+ * Buffer a FK row for batched probing.
+ *
+ * Adds the row to the batch buffer. When the buffer is full, flushes all
+ * buffered rows by probing the PK index. Any violation is reported
+ * immediately during the flush via ri_ReportViolation (which does not return).
*
* Uses the per-batch cache (RI_FastPathEntry) to avoid per-row relation
* open/close, scan begin/end, and snapshot registration. The snapshot's
- * curcid is patched each call so the scan sees effects of prior triggers.
+ * curcid is patched at flush time so the scan sees effects of prior triggers.
*
- * Like ri_FastPathCheck, reports the violation via ri_ReportViolation()
- * if no matching PK row is found.
+ * The batch is also flushed at end of trigger-firing cycle via
+ * ri_FastPathEndBatch().
*/
static void
-ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
- Relation fk_rel, TupleTableSlot *newslot)
+ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
{
RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
+ MemoryContext oldcxt;
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ fpentry->batch[fpentry->batch_count] =
+ ExecCopySlotHeapTuple(newslot);
+ fpentry->batch_count++;
+ MemoryContextSwitchTo(oldcxt);
+
+ if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
+ ri_FastPathBatchFlush(fpentry, fk_rel);
+}
+
+/*
+ * ri_FastPathBatchFlush
+ * Flush all buffered FK rows by probing the PK index.
+ *
+ * Dispatches to ri_FastPathFlushArray() for single-column FKs
+ * (using SK_SEARCHARRAY) or ri_FastPathFlushLoop() for multi-column
+ * FKs (per-row probing). Violations are reported immediately via
+ * ri_ReportViolation(), which does not return.
+ */
+static void
+ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
+{
+ const RI_ConstraintInfo *riinfo = fpentry->riinfo;
Relation pk_rel = fpentry->pk_rel;
Relation idx_rel = fpentry->idx_rel;
- IndexScanDesc scandesc = fpentry->scandesc;
Snapshot snapshot = fpentry->snapshot;
- TupleTableSlot *slot = fpentry->slot;
- Datum pk_vals[INDEX_MAX_KEYS];
- char pk_nulls[INDEX_MAX_KEYS];
- ScanKeyData skey[INDEX_MAX_KEYS];
- bool found;
+ TupleTableSlot *fk_slot = fpentry->fk_slot;
Oid saved_userid;
int saved_sec_context;
MemoryContext oldcxt;
- /*
- * Advance the command counter and patch the cached snapshot's curcid so
- * the scan sees PK rows inserted by earlier triggers in this statement.
- */
- CommandCounterIncrement();
- fpentry->snapshot->curcid = GetCurrentCommandId(false);
+ if (fpentry->batch_count == 0)
+ return;
if (riinfo->fpmeta == NULL)
ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
fk_rel, idx_rel);
Assert(riinfo->fpmeta);
+
+ /*
+ * CCI and security context switch are done once for the entire batch.
+ * Per-row CCI is unnecessary because by the time a flush runs, all
+ * AFTER triggers for the buffered rows have already fired (trigger
+ * invocations strictly alternate per row), so a single CCI advances
+ * past all their effects. Per-row security context switch is
+ * unnecessary because each row's probe runs entirely as the PK table
+ * owner, same as the SPI path -- the only difference is that the SPI
+ * path sets and restores the context per row whereas we do it once
+ * around the whole batch.
+ */
+ CommandCounterIncrement();
+ snapshot->curcid = GetCurrentCommandId(false);
+
GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
saved_sec_context |
SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
- ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
- build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
-
/*
- * The cached scandesc lives in TopTransactionContext, but the btree AM
- * defers some allocations to the first index_getnext_slot call. Ensure
- * those land in TopTransactionContext too.
+ * The cached scandesc lives in TopTransactionContext, but the index AMs
+ * might defer some allocations to the first index_getnext_slot call.
+ * Ensure those land in TopTransactionContext too.
*/
oldcxt = MemoryContextSwitchTo(TopTransactionContext);
- found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, slot, snapshot,
- riinfo, skey, riinfo->nkeys);
+ if (riinfo->nkeys == 1)
+ ri_FastPathFlushArray(fpentry, fk_slot, riinfo, fk_rel);
+ else
+ ri_FastPathFlushLoop(fpentry, fk_slot, riinfo, fk_rel);
MemoryContextSwitchTo(oldcxt);
SetUserIdAndSecContext(saved_userid, saved_sec_context);
- if (!found)
- ri_ReportViolation(riinfo, pk_rel, fk_rel, newslot, NULL,
- RI_PLAN_CHECK_LOOKUPPK, false, false);
+ /* Free materialized tuples and reset */
+ for (int i = 0; i < fpentry->batch_count; i++)
+ heap_freetuple(fpentry->batch[i]);
+ fpentry->batch_count = 0;
}
+/*
+ * ri_FastPathFlushLoop
+ * Multi-column fallback: probe the index once per buffered row.
+ *
+ * Used for composite foreign keys where SK_SEARCHARRAY does not
+ * apply.
+ */
+static void
+ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ IndexScanDesc scandesc = fpentry->scandesc;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Snapshot snapshot = fpentry->snapshot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+
+ for (int i = 0; i < fpentry->batch_count; i++)
+ {
+ bool found = false;
+
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, pk_slot,
+ snapshot, riinfo, skey, riinfo->nkeys);
+
+ if (!found)
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ fk_slot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
+}
+
+/*
+ * ri_FastPathFlushArray
+ * Single-column fast path using SK_SEARCHARRAY.
+ *
+ * Builds an array of FK values and does one index scan with
+ * SK_SEARCHARRAY. The index AM sorts and deduplicates the array
+ * internally, then walks matching leaf pages in order. Each
+ * matched PK tuple is locked and rechecked as before; a matched[]
+ * array of flags tracks which batch items were satisfied.
+ */
+static void
+ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+ FastPathMeta *fpmeta = riinfo->fpmeta;
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ IndexScanDesc scandesc = fpentry->scandesc;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Snapshot snapshot = fpentry->snapshot;
+ Datum search_vals[RI_FASTPATH_BATCH_SIZE];
+ bool matched[RI_FASTPATH_BATCH_SIZE];
+ int nvals = fpentry->batch_count;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[1];
+ RI_CompareHashEntry *entry;
+ Oid elem_type;
+ int16 elem_len;
+ bool elem_byval;
+ char elem_align;
+ ArrayType *arr;
+
+ Assert(fpmeta);
+
+ memset(matched, 0, nvals * sizeof(bool));
+
+ /*
+ * Extract FK values, casting to the operator's expected input
+ * type if needed (e.g. int8 FK -> numeric PK).
+ */
+ entry = fpmeta->compare_entries[0];
+ for (int i = 0; i < nvals; i++)
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+
+ /* Cast if needed (e.g. int8 FK -> numeric PK) */
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ search_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[0],
+ Int32GetDatum(-1),
+ BoolGetDatum(false));
+ else
+ search_vals[i] = pk_vals[0];
+ }
+
+ /*
+ * Array element type must match the operator's right-hand input
+ * type, which is what the index comparison expects on the search
+ * side. ri_populate_fastpath_metadata() stores exactly this via
+ * get_op_opfamily_properties(), which returns the operator's
+ * right-hand type as the subtype for cross-type operators (e.g.
+ * int8 for int48eq) and the common type for same-type operators.
+ */
+ elem_type = fpmeta->subtypes[0];
+ Assert(OidIsValid(elem_type));
+ get_typlenbyvalalign(elem_type, &elem_len, &elem_byval, &elem_align);
+
+ arr = construct_array(search_vals, nvals,
+ elem_type, elem_len, elem_byval, elem_align);
+
+ /*
+ * Build scan key with SK_SEARCHARRAY. The index AM code will
+ * internally sort and deduplicate, then walk leaf pages in order.
+ */
+ ScanKeyEntryInitialize(&skey[0],
+ SK_SEARCHARRAY,
+ 1, /* attno */
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ PointerGetDatum(arr));
+
+ index_rescan(scandesc, skey, 1, NULL, 0);
+
+ /*
+ * Walk all matches. The index AM returns them in index order. For each
+ * match, find which batch item(s) it satisfies.
+ */
+ while (index_getnext_slot(scandesc, ForwardScanDirection, pk_slot))
+ {
+ Datum found_val;
+ bool found_null;
+ bool concurrently_updated;
+ ScanKeyData recheck_skey[1];
+
+ if (!ri_LockPKTuple(pk_rel, pk_slot, snapshot, &concurrently_updated))
+ continue;
+
+ /* Extract the PK value from the matched and locked tuple */
+ found_val = slot_getattr(pk_slot, riinfo->pk_attnums[0], &found_null);
+ Assert(!found_null);
+
+ if (concurrently_updated)
+ {
+ /*
+ * Build a single-key scankey for recheck. We need the
+ * actual PK value that was found, not the FK search value.
+ */
+ ScanKeyEntryInitialize(&recheck_skey[0], 0, 1,
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ found_val);
+ if (!recheck_matched_pk_tuple(idx_rel, recheck_skey, pk_slot))
+ continue;
+ }
+
+ /*
+ * Linear scan to mark all batch items matching this PK value.
+ * O(batch_size) per match, O(batch_size^2) worst case -- fine
+ * for the current batch size of 64.
+ */
+ for (int i = 0; i < nvals; i++)
+ {
+ if (!matched[i] &&
+ DatumGetBool(FunctionCall2Coll(&entry->eq_opr_finfo,
+ idx_rel->rd_indcollation[0],
+ found_val,
+ search_vals[i])))
+ matched[i] = true;
+ }
+ }
+
+ /* Report first unmatched row */
+ for (int i = 0; i < nvals; i++)
+ {
+ if (!matched[i])
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ fk_slot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
+ }
+
+ pfree(arr);
+}
/*
* ri_FastPathProbeOne
* Probe the PK index for one set of scan keys, lock the matching
@@ -3768,14 +4019,51 @@ RI_FKey_trigger_type(Oid tgfoid)
return RI_TRIGGER_NONE;
}
+/*
+ * ri_FastPathEndBatch
+ * Flush remaining rows and tear down cached state.
+ *
+ * Registered as an AfterTriggerBatchCallback. Note: the flush can
+ * do real work (CCI, security context switch, index probes) and can
+ * throw ERROR on a constraint violation. If that happens,
+ * ri_FastPathTeardown never runs; ResourceOwner + XactCallback
+ * handle resource cleanup on the abort path.
+ */
+static void
+ri_FastPathEndBatch(void *arg)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ /* Flush any partial batches -- can throw ERROR */
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->batch_count > 0)
+ {
+ Relation fk_rel = table_open(entry->riinfo->fk_relid,
+ AccessShareLock);
+
+ ri_FastPathBatchFlush(entry, fk_rel);
+ table_close(fk_rel, NoLock);
+ }
+ }
+
+ /* Orderly teardown */
+ ri_FastPathTeardown();
+}
+
/*
* ri_FastPathTeardown
- * Tear down all cached fast-path state.
+ * Release all cached resources (scans, relations, snapshots).
*
- * Called as an AfterTriggerBatchCallback at end of batch.
+ * Called from ri_FastPathEndBatch() after flushing any remaining rows.
*/
static void
-ri_FastPathTeardown(void *arg)
+ri_FastPathTeardown(void)
{
HASH_SEQ_STATUS status;
RI_FastPathEntry *entry;
@@ -3793,8 +4081,10 @@ ri_FastPathTeardown(void *arg)
index_close(entry->idx_rel, NoLock);
if (entry->pk_rel)
table_close(entry->pk_rel, NoLock);
- if (entry->slot)
- ExecDropSingleTupleTableSlot(entry->slot);
+ if (entry->pk_slot)
+ ExecDropSingleTupleTableSlot(entry->pk_slot);
+ if (entry->fk_slot)
+ ExecDropSingleTupleTableSlot(entry->fk_slot);
if (entry->snapshot)
UnregisterSnapshot(entry->snapshot);
}
@@ -3910,12 +4200,14 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
/*
* Register an initial snapshot. Its curcid will be patched in place
- * on each subsequent row (see ri_FastPathCheckCached()), avoiding
+ * at each batch flush (see ri_FastPathBatchFlush()), avoiding
* per-row GetSnapshotData() overhead.
*/
entry->snapshot = RegisterSnapshot(GetTransactionSnapshot());
- entry->slot = table_slot_create(entry->pk_rel, NULL);
+ entry->pk_slot = table_slot_create(entry->pk_rel, NULL);
+ entry->fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
+ &TTSOpsHeapTuple);
entry->scandesc = index_beginscan(entry->pk_rel, entry->idx_rel,
entry->snapshot, NULL,
@@ -3926,7 +4218,7 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
/* Ensure cleanup at end of this trigger-firing batch */
if (!ri_fastpath_callback_registered)
{
- RegisterAfterTriggerBatchCallback(ri_FastPathTeardown, NULL);
+ RegisterAfterTriggerBatchCallback(ri_FastPathEndBatch, NULL);
ri_fastpath_callback_registered = true;
}
@@ -3937,6 +4229,9 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
SECURITY_NOFORCE_RLS);
ri_CheckPermissions(entry->pk_rel);
SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+ /* For ri_FastPathEndBatch() */
+ entry->riinfo = riinfo;
}
return entry;
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 25d505c6c12..2179d2a8e8f 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3590,3 +3590,43 @@ NOTICE: fp_auto_pk called
NOTICE: fp_auto_pk called
DROP TABLE fp_fk_cci, fp_pk_cci;
DROP FUNCTION fp_auto_pk;
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+ERROR: insert or update on table "fp_fk_multi" violates foreign key constraint "fp_fk_multi_a_b_fkey"
+DETAIL: Key (a, b)=(999, 999) is not present in table "fp_pk_multi".
+DROP TABLE fp_fk_multi, fp_pk_multi;
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+ERROR: insert or update on table "fp_fk_commit" violates foreign key constraint "fp_fk_commit_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_commit".
+DROP TABLE fp_fk_commit, fp_pk_commit;
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+ERROR: insert or update on table "fp_fk_cross" violates foreign key constraint "fp_fk_cross_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_cross".
+DROP TABLE fp_fk_cross, fp_pk_cross;
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index cedd20c8d11..7a729486bc2 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2578,3 +2578,41 @@ INSERT INTO fp_fk_cci VALUES (1), (2), (3);
DROP TABLE fp_fk_cci, fp_pk_cci;
DROP FUNCTION fp_auto_pk;
+
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+DROP TABLE fp_fk_multi, fp_pk_multi;
+
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+DROP TABLE fp_fk_commit, fp_pk_commit;
+
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+DROP TABLE fp_fk_cross, fp_pk_cross;
+
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
--
2.47.3
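
The buffer-then-flush pattern in the patch above can be sketched in isolation. The following is only a minimal model under stated assumptions, not the patch's code: Batch, PkSet, batch_add(), batch_flush() and BATCH_SIZE are hypothetical names, the PK index is stood in for by a plain array of unique ints, and the flush visits every key instead of only the matches an SK_SEARCHARRAY scan would return. What it does try to show is the duplicate handling: one PK key must satisfy every buffered row that carries the same value, not just the first one.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BATCH_SIZE 4			/* hypothetical; small so a flush is easy to trigger */

/* Hypothetical batch buffer holding "FK" values awaiting a check. */
typedef struct Batch
{
	int			vals[BATCH_SIZE];
	int			count;
} Batch;

/* Hypothetical stand-in for the PK index: an array of unique keys. */
typedef struct PkSet
{
	const int  *keys;
	int			nkeys;
} PkSet;

/*
 * Check all buffered values against the PK set, then empty the buffer.
 * Returns -1 if every value matched, else the position (within the
 * flushed batch) of the first unmatched value.
 */
static int
batch_flush(Batch *b, const PkSet *pk)
{
	bool		matched[BATCH_SIZE];
	int			first_bad = -1;

	memset(matched, 0, sizeof(matched));

	/* For each key, mark every batch item equal to it, duplicates included. */
	for (int k = 0; k < pk->nkeys; k++)
		for (int i = 0; i < b->count; i++)
			if (!matched[i] && b->vals[i] == pk->keys[k])
				matched[i] = true;

	for (int i = 0; i < b->count; i++)
	{
		if (!matched[i])
		{
			first_bad = i;
			break;
		}
	}

	b->count = 0;
	return first_bad;
}

/*
 * Buffer one value and flush once the buffer is full.  Returns -1 when
 * no flush happened or the flush found no violation.
 */
static int
batch_add(Batch *b, const PkSet *pk, int val)
{
	assert(b->count < BATCH_SIZE);
	b->vals[b->count++] = val;

	if (b->count >= BATCH_SIZE)
		return batch_flush(b, pk);
	return -1;
}
```

In this sketch a caller has to call batch_flush() once more after the last batch_add() so that a partially filled buffer is still checked. That corresponds to the end-of-batch flush that the patch does from ri_FastPathEndBatch().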
[application/octet-stream] v6-0001-Add-fast-path-for-foreign-key-constraint-checks.patch (29.5K, 3-v6-0001-Add-fast-path-for-foreign-key-constraint-checks.patch)
From 78d151fa917a30952a7380bb51721fe33a030288 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sun, 15 Mar 2026 16:53:27 +0900
Subject: [PATCH v6 1/3] Add fast path for foreign key constraint checks
Add a fast-path optimization for foreign key checks that bypasses SPI
by directly probing the unique index on the referenced table.
The fast path applies when the referenced table is not partitioned and
the constraint does not involve temporal semantics. ri_FastPathCheck()
extracts the FK values, builds scan keys, performs an index scan, and
locks the matching tuple with LockTupleKeyShare via ri_LockPKTuple(),
which handles the RI-specific subset of table_tuple_lock() results.
If the locked tuple was reached by chasing an update chain
(tmfd.traversed), recheck_matched_pk_tuple() verifies that the key
is still the same, emulating EvalPlanQual.
The scan uses GetTransactionSnapshot(), matching what the SPI path
uses (via _SPI_execute_plan pushing GetTransactionSnapshot() as the
active snapshot). Under READ COMMITTED this is a fresh snapshot;
under REPEATABLE READ / SERIALIZABLE it is the frozen transaction-
start snapshot, so PK rows committed after the transaction started
are not visible.
The ri_CheckPermissions() function performs schema USAGE and table
SELECT checks, matching what the SPI path does implicitly.
ri_HashCompareOp() is adjusted to handle cross-type equality operators
(e.g. int48eq for int4 PK / int8 FK) which can appear in conpfeqop.
The original code asserted same-type operators only.
Per-key metadata (compare entries, operator procedures, strategy
numbers) is cached in RI_ConstraintInfo via
ri_populate_fastpath_metadata() on first use, eliminating repeated
calls to ri_HashCompareOp() and get_op_opfamily_properties().
conindid and pk_is_partitioned are also cached at constraint load
time, avoiding per-invocation syscache lookups and the need to open
pk_rel before deciding whether the fast path applies.
New regression tests cover RLS bypass and ACL enforcement for the
fast-path permission checks. New isolation tests exercise concurrent
PK updates under both READ COMMITTED and REPEATABLE READ.
Author: Junwang Zhao <[email protected]>
Author: Amit Langote <[email protected]>
Discussion: https://postgr.es/m/
---
src/backend/utils/adt/ri_triggers.c | 464 +++++++++++++++++-
.../expected/fk-concurrent-pk-upd.out | 105 ++++
src/test/isolation/isolation_schedule | 1 +
.../isolation/specs/fk-concurrent-pk-upd.spec | 53 ++
src/test/regress/expected/foreign_key.out | 47 ++
src/test/regress/sql/foreign_key.sql | 64 +++
6 files changed, 722 insertions(+), 12 deletions(-)
create mode 100644 src/test/isolation/expected/fk-concurrent-pk-upd.out
create mode 100644 src/test/isolation/specs/fk-concurrent-pk-upd.spec
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index d22b8ef7f3c..2357735c4c8 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -24,12 +24,15 @@
#include "postgres.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/index.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
+#include "catalog/pg_namespace.h"
#include "commands/trigger.h"
#include "executor/executor.h"
#include "executor/spi.h"
@@ -91,6 +94,7 @@
#define RI_TRIGTYPE_UPDATE 2
#define RI_TRIGTYPE_DELETE 3
+typedef struct FastPathMeta FastPathMeta;
/*
* RI_ConstraintInfo
@@ -132,8 +136,24 @@ typedef struct RI_ConstraintInfo
Oid period_intersect_oper; /* anyrange * anyrange (or
* multiranges) */
dlist_node valid_link; /* Link in list of valid entries */
+
+ Oid conindid;
+ bool pk_is_partitioned;
+
+ FastPathMeta *fpmeta;
} RI_ConstraintInfo;
+typedef struct RI_CompareHashEntry RI_CompareHashEntry;
+
+/* Fast-path metadata for RI checks on foreign key referencing tables */
+typedef struct FastPathMeta
+{
+ RI_CompareHashEntry *compare_entries[RI_MAX_NUMKEYS];
+ RegProcedure regops[RI_MAX_NUMKEYS];
+ Oid subtypes[RI_MAX_NUMKEYS];
+ int strats[RI_MAX_NUMKEYS];
+} FastPathMeta;
+
/*
* RI_QueryKey
*
@@ -233,6 +253,23 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool is_restrict,
bool detectNewRows, int expect_OK);
+static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
+ IndexScanDesc scandesc, TupleTableSlot *slot,
+ Snapshot snapshot, const RI_ConstraintInfo *riinfo,
+ ScanKeyData *skey, int nkeys);
+static bool ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
+ bool *concurrently_updated);
+static bool ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo);
+static void ri_CheckPermissions(Relation query_rel);
+static bool recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+ TupleTableSlot *new_slot);
+static void build_index_scankeys(const RI_ConstraintInfo *riinfo,
+ Relation idx_rel, Datum *pk_vals,
+ char *pk_nulls, ScanKey skeys);
+static void ri_populate_fastpath_metadata(RI_ConstraintInfo *riinfo,
+ Relation fk_rel, Relation idx_rel);
static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
const RI_ConstraintInfo *riinfo, bool rel_is_pk,
Datum *vals, char *nulls);
@@ -276,14 +313,7 @@ RI_FKey_check(TriggerData *trigdata)
if (!table_tuple_satisfies_snapshot(trigdata->tg_relation, newslot, SnapshotSelf))
return PointerGetDatum(NULL);
- /*
- * Get the relation descriptors of the FK and PK tables.
- *
- * pk_rel is opened in RowShareLock mode since that's what our eventual
- * SELECT FOR KEY SHARE will get on it.
- */
fk_rel = trigdata->tg_relation;
- pk_rel = table_open(riinfo->pk_relid, RowShareLock);
switch (ri_NullCheck(RelationGetDescr(fk_rel), newslot, riinfo, false))
{
@@ -293,7 +323,6 @@ RI_FKey_check(TriggerData *trigdata)
* No further check needed - an all-NULL key passes every type of
* foreign key constraint.
*/
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
case RI_KEYS_SOME_NULL:
@@ -318,7 +347,6 @@ RI_FKey_check(TriggerData *trigdata)
errdetail("MATCH FULL does not allow mixing of null and nonnull key values."),
errtableconstraint(fk_rel,
NameStr(riinfo->conname))));
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
case FKCONSTR_MATCH_SIMPLE:
@@ -327,7 +355,6 @@ RI_FKey_check(TriggerData *trigdata)
* MATCH SIMPLE - if ANY column is null, the key passes
* the constraint.
*/
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
#ifdef NOT_USED
@@ -352,8 +379,31 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
+ /*
+ * Fast path: probe the PK unique index directly, bypassing SPI.
+ *
+ * For non-partitioned, non-temporal FKs, we can skip the SPI
+ * machinery (plan cache, executor setup, etc.) and do a direct
+ * index scan + tuple lock. This is semantically equivalent to
+ * the SPI path below but avoids the per-row executor overhead.
+ *
+ * ri_FastPathCheck() reports the violation itself (via ereport)
+ * if no matching PK row is found, so it only returns on success.
+ */
+ if (ri_fastpath_is_applicable(riinfo))
+ {
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
+ return PointerGetDatum(NULL);
+ }
+
SPI_connect();
+ /*
+ * pk_rel is opened in RowShareLock mode since that's what our eventual
+ * SELECT FOR KEY SHARE will get on it.
+ */
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+
/* Fetch or prepare a saved plan for the real check */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
@@ -2356,6 +2406,12 @@ ri_LoadConstraintInfo(Oid constraintOid)
riinfo->valid = true;
+ riinfo->conindid = conForm->conindid;
+ riinfo->pk_is_partitioned =
+ (get_rel_relkind(riinfo->pk_relid) == RELKIND_PARTITIONED_TABLE);
+
+ riinfo->fpmeta = NULL;
+
return riinfo;
}
@@ -2617,6 +2673,382 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
return SPI_processed != 0;
}
+/*
+ * ri_FastPathCheck
+ * Perform FK existence check via direct index probe, bypassing SPI.
+ *
+ * If no matching PK row exists, report the violation via ri_ReportViolation(),
+ * otherwise, the function returns normally.
+ */
+static void
+ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ Relation pk_rel;
+ Relation idx_rel;
+ IndexScanDesc scandesc;
+ TupleTableSlot *slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found = false;
+ Oid saved_userid;
+ int saved_sec_context;
+ Snapshot snapshot;
+
+ /*
+ * Advance the command counter so the snapshot sees the effects of prior
+ * triggers in this statement. Mirrors what the SPI path does in
+ * ri_PerformCheck().
+ */
+ CommandCounterIncrement();
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ idx_rel = index_open(riinfo->conindid, AccessShareLock);
+
+ slot = table_slot_create(pk_rel, NULL);
+ scandesc = index_beginscan(pk_rel, idx_rel,
+ snapshot, NULL,
+ riinfo->nkeys, 0);
+
+ if (riinfo->fpmeta == NULL)
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ Assert(riinfo->fpmeta);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ ri_CheckPermissions(pk_rel);
+
+ ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, slot,
+ snapshot, riinfo, skey, riinfo->nkeys);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ index_endscan(scandesc);
+ ExecDropSingleTupleTableSlot(slot);
+ UnregisterSnapshot(snapshot);
+
+ if (!found)
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ newslot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+
+ index_close(idx_rel, NoLock);
+ table_close(pk_rel, NoLock);
+}
+
+/*
+ * ri_FastPathProbeOne
+ * Probe the PK index for one set of scan keys, lock the matching
+ * tuple
+ *
+ * Returns true if a matching PK row was found, locked, and (if
+ * applicable) visible to the transaction snapshot.
+ *
+ * The caller must ensure CurrentMemoryContext is long-lived enough
+ * for the scan descriptor's internal allocations (typically
+ * TopTransactionContext when using a cached scandesc).
+ */
+static bool
+ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
+ IndexScanDesc scandesc, TupleTableSlot *slot,
+ Snapshot snapshot, const RI_ConstraintInfo *riinfo,
+ ScanKeyData *skey, int nkeys)
+{
+ bool found = false;
+
+ index_rescan(scandesc, skey, nkeys, NULL, 0);
+
+ if (index_getnext_slot(scandesc, ForwardScanDirection, slot))
+ {
+ bool concurrently_updated;
+
+ if (ri_LockPKTuple(pk_rel, slot, snapshot,
+ &concurrently_updated))
+ {
+ if (concurrently_updated)
+ found = recheck_matched_pk_tuple(idx_rel, skey, slot);
+ else
+ found = true;
+ }
+ }
+
+ return found;
+}
+
+/*
+ * ri_LockPKTuple
+ * Lock a PK tuple found by the fast-path index scan.
+ *
+ * Calls table_tuple_lock() directly with handling specific to RI checks.
+ * Returns true if the tuple was successfully locked.
+ *
+ * Sets *concurrently_updated to true if the locked tuple was reached
+ * by following an update chain (tmfd.traversed), indicating the caller
+ * should recheck the key.
+ */
+static bool
+ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
+ bool *concurrently_updated)
+{
+ TM_FailureData tmfd;
+ TM_Result result;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+
+ *concurrently_updated = false;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ result = table_tuple_lock(pk_rel, &slot->tts_tid, snap,
+ slot, GetCurrentCommandId(false),
+ LockTupleKeyShare, LockWaitBlock,
+ lockflags, &tmfd);
+
+ switch (result)
+ {
+ case TM_Ok:
+ if (tmfd.traversed)
+ *concurrently_updated = true;
+ return true;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ return false;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+
+ /*
+ * In READ COMMITTED, FIND_LAST_VERSION should have chased the
+ * chain and returned TM_Ok. Getting here means something
+ * unexpected -- fall through to error.
+ */
+ elog(ERROR, "unexpected table_tuple_lock status: %u", result);
+ break;
+
+ case TM_SelfModified:
+
+ /*
+ * The current command or a later command in this transaction
+ * modified the PK row. This shouldn't normally happen during an
+ * FK check (we're not modifying pk_rel), but handle it safely by
+ * treating the tuple as not found.
+ */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ break;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", result);
+ break;
+ }
+
+ return false; /* keep compiler quiet */
+}
+
+static bool
+ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo)
+{
+ /*
+ * Partitioned referenced tables are skipped for simplicity, since they
+ * require routing the probe through the correct partition using
+ * PartitionDirectory.
+ */
+ if (riinfo->pk_is_partitioned)
+ return false;
+
+ /*
+ * Temporal foreign keys use range overlap and containment semantics (&&,
+ * <@, range_agg()) that inherently involve aggregation and multiple-row
+ * reasoning, so they stay on the SPI path.
+ */
+ if (riinfo->hasperiod)
+ return false;
+
+ return true;
+}
+
+/*
+ * ri_CheckPermissions
+ * Check that the current user has permissions to look into the schema of
+ * and SELECT from 'query_rel'
+ */
+static void
+ri_CheckPermissions(Relation query_rel)
+{
+ AclResult aclresult;
+
+ /* USAGE on schema. */
+ aclresult = object_aclcheck(NamespaceRelationId,
+ RelationGetNamespace(query_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(query_rel)));
+
+ /* SELECT on relation. */
+ aclresult = pg_class_aclcheck(RelationGetRelid(query_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(query_rel));
+}
+
+/*
+ * recheck_matched_pk_tuple
+ * After following an update chain (tmfd.traversed), verify that
+ * the locked PK tuple still matches the original search keys.
+ *
+ * A non-key update (e.g. changing a non-PK column) creates a new tuple version
+ * that we've now locked, but the key is unchanged -- that's fine. A key
+ * update means the value we were looking for is gone, so we should treat it as
+ * not found.
+ */
+static bool
+recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+ TupleTableSlot *new_slot)
+{
+ /*
+ * TODO: BuildIndexInfo does a syscache lookup + palloc on every call.
+ * This only fires on the concurrent-update path (tmfd.traversed), which
+ * should be rare, so the cost is acceptable for now. If profiling shows
+ * otherwise, cache the IndexInfo in FastPathMeta.
+ */
+ IndexInfo *indexInfo = BuildIndexInfo(idxrel);
+ Datum values[INDEX_MAX_KEYS];
+ bool isnull[INDEX_MAX_KEYS];
+ bool matched = true;
+
+ /* PK indexes never have these. */
+ Assert(indexInfo->ii_Expressions == NIL &&
+ indexInfo->ii_ExclusionOps == NULL);
+
+ /* Form the index values and isnull flags given the table tuple. */
+ FormIndexDatum(indexInfo, new_slot, NULL, values, isnull);
+ for (int i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
+ {
+ ScanKeyData *skey = &skeys[i];
+
+ /* A PK column can never be set to NULL. */
+ Assert(!isnull[i]);
+ if (!DatumGetBool(FunctionCall2Coll(&skey->sk_func,
+ skey->sk_collation,
+ values[i],
+ skey->sk_argument)))
+ {
+ matched = false;
+ break;
+ }
+ }
+
+ return matched;
+}
+
+/*
+ * build_index_scankeys
+ * Build ScanKeys for a direct index probe of the PK's unique index.
+ *
+ * Uses cached compare entries, operator procedures, and strategy numbers
+ * from ri_populate_fastpath_metadata() rather than looking them up on
+ * each invocation. Casts FK values to the operator's expected input
+ * type if needed.
+ */
+static void
+build_index_scankeys(const RI_ConstraintInfo *riinfo,
+ Relation idx_rel, Datum *pk_vals,
+ char *pk_nulls, ScanKey skeys)
+{
+ FastPathMeta *fpmeta = riinfo->fpmeta;
+
+ Assert(fpmeta);
+
+ /*
+ * May need to cast each of the individual values of the foreign key to
+ * the corresponding PK column's type if the equality operator demands it.
+ */
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ RI_CompareHashEntry *entry = fpmeta->compare_entries[i];
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+
+ /*
+ * Set up ScanKeys for the index scan. This is essentially how
+ * ExecIndexBuildScanKeys() sets them up.
+ */
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ int pkattrno = i + 1;
+
+ ScanKeyEntryInitialize(&skeys[i], 0, pkattrno,
+ fpmeta->strats[i], fpmeta->subtypes[i],
+ idx_rel->rd_indcollation[i], fpmeta->regops[i],
+ pk_vals[i]);
+ }
+}
+
+/*
+ * ri_populate_fastpath_metadata
+ * Cache per-key metadata needed by build_index_scankeys().
+ *
+ * Looks up the compare hash entry, operator procedure OID, and index
+ * strategy/subtype for each key column. Called lazily on first use; the
+ * result persists for the lifetime of the RI_ConstraintInfo entry.
+ */
+static void
+ri_populate_fastpath_metadata(RI_ConstraintInfo *riinfo,
+ Relation fk_rel, Relation idx_rel)
+{
+ FastPathMeta *fpmeta;
+ MemoryContext oldcxt = MemoryContextSwitchTo(TopMemoryContext);
+
+ Assert(riinfo != NULL && riinfo->valid);
+
+ fpmeta = palloc_object(FastPathMeta);
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ Oid eq_opr = riinfo->pf_eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ Oid lefttype;
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ fpmeta->compare_entries[i] = entry;
+ fpmeta->regops[i] = get_opcode(eq_opr);
+
+ get_op_opfamily_properties(eq_opr,
+ idx_rel->rd_opfamily[i],
+ false,
+ &fpmeta->strats[i],
+ &lefttype,
+ &fpmeta->subtypes[i]);
+ }
+
+ riinfo->fpmeta = fpmeta;
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* Extract fields from a tuple into Datum/nulls arrays
*/
@@ -3169,8 +3601,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the FK column type already matches what the
+ * operator expects. For same-type operators, that's the common type.
+ * For cross-type operators (e.g. int48eq for int4 PK / int8 FK), the
+ * FK value is the right operand, so skip the cast if typeid matches
+ * righttype.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/test/isolation/expected/fk-concurrent-pk-upd.out b/src/test/isolation/expected/fk-concurrent-pk-upd.out
new file mode 100644
index 00000000000..4dd9535d3c0
--- /dev/null
+++ b/src/test/isolation/expected/fk-concurrent-pk-upd.out
@@ -0,0 +1,105 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2b s2ukey s1b s1i s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2c: COMMIT;
+step s1i: <... completed>
+ERROR: insert or update on table "child" violates foreign key constraint "child_parent_key_fkey"
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 2|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+(0 rows)
+
+
+starting permutation: s2b s2uaux s1b s1i s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2uaux: UPDATE parent SET aux = 'bar' WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1);
+step s2c: COMMIT;
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|bar
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
+
+starting permutation: s2b s2ukey s1b s1i s2ukey2 s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2ukey2: UPDATE parent SET parent_key = 1 WHERE parent_key = 2;
+step s2c: COMMIT;
+step s1i: <... completed>
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
+
+starting permutation: s2b s2ukey s3b s3i s2c s3c s2s s3s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s3b: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s3i: INSERT INTO child VALUES (2, 1); <waiting ...>
+step s2c: COMMIT;
+step s3i: <... completed>
+ERROR: could not serialize access due to concurrent update
+step s3c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 2|foo
+(1 row)
+
+step s3s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+(0 rows)
+
+
+starting permutation: s2b s2uaux s3b s3i s2c s3c s2s s3s
+step s2b: BEGIN;
+step s2uaux: UPDATE parent SET aux = 'bar' WHERE parent_key = 1;
+step s3b: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s3i: INSERT INTO child VALUES (2, 1);
+step s2c: COMMIT;
+step s3c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|bar
+(1 row)
+
+step s3s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 2| 1
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 4e466580cd4..c1a999bf1d2 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -37,6 +37,7 @@ test: fk-partitioned-2
test: fk-snapshot
test: fk-snapshot-2
test: fk-snapshot-3
+test: fk-concurrent-pk-upd
test: subxid-overflow
test: eval-plan-qual
test: eval-plan-qual-trigger
diff --git a/src/test/isolation/specs/fk-concurrent-pk-upd.spec b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
new file mode 100644
index 00000000000..03dc7f260cd
--- /dev/null
+++ b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
@@ -0,0 +1,53 @@
+# Tests that an INSERT on the referencing table correctly fails when
+# the referenced value disappears due to a concurrent update
+setup
+{
+ CREATE TABLE parent (
+ parent_key int PRIMARY KEY,
+ aux text NOT NULL
+ );
+
+ CREATE TABLE child (
+ child_key int PRIMARY KEY,
+ parent_key int8 NOT NULL REFERENCES parent
+ );
+
+ INSERT INTO parent VALUES (1, 'foo');
+}
+
+teardown
+{
+ DROP TABLE parent, child;
+}
+
+session s1
+step s1b { BEGIN; }
+step s1i { INSERT INTO child VALUES (1, 1); }
+step s1c { COMMIT; }
+step s1s { SELECT * FROM child; }
+
+session s2
+step s2b { BEGIN; }
+step s2ukey { UPDATE parent SET parent_key = 2 WHERE parent_key = 1; }
+step s2uaux { UPDATE parent SET aux = 'bar' WHERE parent_key = 1; }
+step s2ukey2 { UPDATE parent SET parent_key = 1 WHERE parent_key = 2; }
+step s2c { COMMIT; }
+step s2s { SELECT * FROM parent; }
+
+session s3
+step s3b { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s3i { INSERT INTO child VALUES (2, 1); }
+step s3c { COMMIT; }
+step s3s { SELECT * FROM child; }
+
+# fail
+permutation s2b s2ukey s1b s1i s2c s1c s2s s1s
+# ok
+permutation s2b s2uaux s1b s1i s2c s1c s2s s1s
+# ok
+permutation s2b s2ukey s1b s1i s2ukey2 s2c s1c s2s s1s
+
+# RR: key update -> serialization failure
+permutation s2b s2ukey s3b s3i s2c s3c s2s s3s
+# RR: non-key update -> old version visible via transaction snapshot
+permutation s2b s2uaux s3b s3i s2c s3c s2s s3s
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 9ae4dbf1b0a..0826f518004 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -370,6 +370,53 @@ SELECT * FROM PKTABLE;
DROP TABLE FKTABLE;
DROP TABLE PKTABLE;
--
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+SET ROLE regress_foreign_key_user;
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+RESET ROLE;
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+ERROR: permission denied for table pktable
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
-- Check initial check upon ALTER TABLE
--
CREATE TABLE PKTABLE ( ptest1 int, ptest2 int, PRIMARY KEY(ptest1, ptest2) );
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index 3b8c95bf893..e9ee29331cb 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -242,6 +242,70 @@ SELECT * FROM PKTABLE;
DROP TABLE FKTABLE;
DROP TABLE PKTABLE;
+--
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+SET ROLE regress_foreign_key_user;
+
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+
+RESET ROLE;
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+
+DROP USER regress_foreign_key_user;
+
--
-- Check initial check upon ALTER TABLE
--
--
2.47.3
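The cast-skipping condition that this patch adds to ri_HashCompareOp() can be exercised on its own. The following is a minimal sketch, not code from the patch: fk_value_needs_cast() is a hypothetical helper, and the only PostgreSQL facts it relies on are the built-in type OIDs for int2, int4, and int8 (21, 23, and 20).

```c
#include <stdbool.h>

typedef unsigned int Oid;

/* Built-in type OIDs as they appear in pg_type. */
#define INT8OID 20
#define INT2OID 21
#define INT4OID 23

/*
 * Hypothetical helper mirroring the condition in ri_HashCompareOp():
 * returns true when the FK value must be coerced before it is handed
 * to the equality operator.
 *
 * lefttype/righttype are the operator's declared input types and
 * fk_typeid is the type of the FK column.
 */
bool
fk_value_needs_cast(Oid lefttype, Oid righttype, Oid fk_typeid)
{
    /* Same-type operator: no cast if the FK column has the common type. */
    if (lefttype == righttype)
        return fk_typeid != lefttype;

    /* Cross-type operator: the FK value is the right operand. */
    return fk_typeid != righttype;
}
```

For int48eq (left int4, right int8) with an int8 FK column, fk_value_needs_cast(INT4OID, INT8OID, INT8OID) returns false, so the FK value is passed to the operator unchanged. Both branches end up comparing the FK type against the operator's right input type.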
[application/octet-stream] v6-0002-Cache-per-batch-resources-for-fast-path-foreign-k.patch (28.2K, 4-v6-0002-Cache-per-batch-resources-for-fast-path-foreign-k.patch)
From 5bc904406b619e1020a1e56d2b77b25d787ae34e Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Mon, 16 Mar 2026 20:57:35 +0900
Subject: [PATCH v6 2/3] Cache per-batch resources for fast-path foreign key
checks
The fast-path FK check introduced in the previous commit opens and
closes the PK relation, index, scan descriptor, and tuple slot on
every trigger invocation. For bulk operations that fire thousands of
FK triggers in a single statement, this repeated setup/teardown
dominates the cost.
Introduce RI_FastPathEntry, a per-constraint hash table that caches
the open Relation (pk_rel, idx_rel), IndexScanDesc, TupleTableSlot,
and a registered Snapshot across all trigger invocations within a
single trigger-firing batch. Entries are created lazily on first use
via ri_FastPathGetEntry() and persist until the batch ends.
The snapshot is registered once at entry creation time, and its
curcid is patched in place on each subsequent row rather than
taking a fresh snapshot per invocation. This avoids the per-row
GetSnapshotData() cost. Under REPEATABLE READ the transaction
snapshot is immutable so caching is a no-op. Under READ COMMITTED
the cached snapshot will not reflect PK rows committed by other
backends mid-batch, but this is acceptable for three reasons. First,
the FK check only needs PK rows visible before the statement began
plus the effects of earlier triggers, which curcid tracks. Second,
concurrent commits would not be reliably visible even with per-row
snapshots, since trigger firing order is nondeterministic. Third,
LockTupleKeyShare prevents the PK row from disappearing regardless.
SnapshotSetCommandId() only patches the process-global statics, not
registered copies, so we patch entry->snapshot->curcid directly.
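Why patching curcid on the cached copy is enough can be illustrated with a toy model. This is a sketch under stated assumptions, not PostgreSQL code: ToySnapshot, ToyRow, toy_row_visible(), and toy_refresh_cached_snapshot() are hypothetical names, and the visibility rule is reduced to the single check that applies to rows written by our own transaction (a row is visible only if its inserting command id is lower than the snapshot's curcid).

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t CommandId;

/* Toy snapshot: only the field that governs same-transaction visibility. */
typedef struct ToySnapshot
{
    CommandId   curcid;     /* rows from earlier commands are visible */
} ToySnapshot;

/* Toy row written by our own transaction, stamped with its command id. */
typedef struct ToyRow
{
    int         key;
    CommandId   cmin;       /* command that inserted the row */
} ToyRow;

/* A row from our own transaction is visible iff an earlier command wrote it. */
bool
toy_row_visible(const ToyRow *row, const ToySnapshot *snap)
{
    return row->cmin < snap->curcid;
}

/*
 * Update the cached (registered) copy in place.  Updating some other,
 * global snapshot would leave this copy stale, which is why the copy
 * itself has to be patched.
 */
void
toy_refresh_cached_snapshot(ToySnapshot *cached, CommandId current_cid)
{
    cached->curcid = current_cid;
}
```

In this model, a PK row inserted by command 1 is invisible to a cached snapshot whose curcid is still 1, and becomes visible once the cached copy is refreshed to 2. No new snapshot is taken for that.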
Permission checks (schema USAGE + table SELECT) are performed once at
cache entry creation rather than per flush. The RI check runs as the
PK table owner (via SetUserIdAndSecContext), so in practice these
checks verify that the owner has access to their own table -- a
condition that holds unless privileges have been explicitly revoked
from the owner, which would equally break the SPI path. Checking
once per batch avoids repeated syscache lookups from
pg_class_aclcheck() with no user-visible behavior change.
Lifecycle management:
- AfterTriggerBatchCallback: A new general-purpose callback
mechanism in trigger.c. Callbacks registered via
RegisterAfterTriggerBatchCallback() fire at the end of each
trigger-firing batch (AfterTriggerEndQuery for immediate
constraints, AfterTriggerFireDeferred at COMMIT, and
AfterTriggerSetState for SET CONSTRAINTS IMMEDIATE). The RI code
registers ri_FastPathTeardown as a batch callback, which does
orderly teardown: index_endscan, index_close, table_close,
ExecDropSingleTupleTableSlot, UnregisterSnapshot.
- XactCallback: ri_FastPathXactCallback NULLs the static cache
pointer at transaction end. On the normal path, cleanup already
ran via the batch callback; this handles the abort path where
TopTransactionContext destruction frees the memory but
ResourceOwner handles the actual resource cleanup.
- SubXactCallback: ri_FastPathSubXactCallback NULLs the static
cache pointer on subtransaction abort. ResourceOwner already
cleaned up the resources; this prevents the batch callback from
trying to double-close them.
- AfterTriggerBatchIsActive(): Exported accessor that returns true
when afterTriggers.query_depth >= 0. During ALTER TABLE ... ADD
FOREIGN KEY validation, RI triggers are called directly outside
the after-trigger framework, so batch callbacks would never fire.
The fast-path code uses this to fall back to a non-cached
per-invocation path (open/scan/close each call) in that context.
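The register, fire, and clear cycle of the batch callback mechanism can be sketched as a standalone model. This is an illustration only: it uses a fixed-size array instead of a List, takes query_depth as a parameter instead of reading afterTriggers, and every name in it (register_batch_callback, fire_batch_callbacks, count_teardown) is hypothetical.

```c
#include <stddef.h>

#define MAX_BATCH_CALLBACKS 8

typedef void (*BatchCallback) (void *arg);

typedef struct BatchCallbackItem
{
    BatchCallback callback;
    void       *arg;
} BatchCallbackItem;

static BatchCallbackItem batch_callbacks[MAX_BATCH_CALLBACKS];
static int  n_batch_callbacks = 0;

/* Queue a callback for the current batch; returns -1 if the list is full. */
int
register_batch_callback(BatchCallback callback, void *arg)
{
    if (n_batch_callbacks >= MAX_BATCH_CALLBACKS)
        return -1;
    batch_callbacks[n_batch_callbacks].callback = callback;
    batch_callbacks[n_batch_callbacks].arg = arg;
    n_batch_callbacks++;
    return 0;
}

/*
 * Invoke and clear all queued callbacks, but only at the outermost
 * level (depth 0) or from a top-level operation (depth -1).  Nested
 * queries run at depth > 0 and must leave the list alone.  Returns the
 * number of callbacks invoked.
 */
int
fire_batch_callbacks(int query_depth)
{
    int         fired;

    if (query_depth > 0)
        return 0;

    for (int i = 0; i < n_batch_callbacks; i++)
        batch_callbacks[i].callback(batch_callbacks[i].arg);

    fired = n_batch_callbacks;
    n_batch_callbacks = 0;      /* callers must re-register per batch */
    return fired;
}

/* Example callback: counts how many times teardown ran. */
void
count_teardown(void *arg)
{
    (*(int *) arg)++;
}
```

Registering count_teardown and firing at depth 1 leaves the counter untouched. Firing at depth 0 runs it once and empties the list, so a second fire at depth 0 does nothing until the callback is registered again.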
---
src/backend/commands/trigger.c | 90 +++++++
src/backend/utils/adt/ri_triggers.c | 275 +++++++++++++++++++++-
src/include/commands/trigger.h | 18 ++
src/test/regress/expected/foreign_key.out | 86 +++++++
src/test/regress/sql/foreign_key.sql | 80 +++++++
src/tools/pgindent/typedefs.list | 3 +
6 files changed, 549 insertions(+), 3 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 98d402c0a3b..57d33f76a8c 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3891,6 +3891,8 @@ typedef struct AfterTriggersData
/* per-subtransaction-level data: */
AfterTriggersTransData *trans_stack; /* array of structs shown below */
int maxtransdepth; /* allocated len of above array */
+
+ List *batch_callbacks; /* List of AfterTriggerCallbackItem */
} AfterTriggersData;
struct AfterTriggersQueryData
@@ -3927,6 +3929,13 @@ struct AfterTriggersTableData
TupleTableSlot *storeslot; /* for converting to tuplestore's format */
};
+/* Entry in afterTriggers.batch_callbacks */
+typedef struct AfterTriggerCallbackItem
+{
+ AfterTriggerBatchCallback callback;
+ void *arg;
+} AfterTriggerCallbackItem;
+
static AfterTriggersData afterTriggers;
static void AfterTriggerExecute(EState *estate,
@@ -3962,6 +3971,7 @@ static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
Oid tgoid, bool tgisdeferred);
static void cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent);
+static void FireAfterTriggerBatchCallbacks(void);
/*
* Get the FDW tuplestore for the current trigger query level, creating it
@@ -5087,6 +5097,7 @@ AfterTriggerBeginXact(void)
*/
afterTriggers.firing_counter = (CommandId) 1; /* mustn't be 0 */
afterTriggers.query_depth = -1;
+ afterTriggers.batch_callbacks = NIL;
/*
* Verify that there is no leftover state remaining. If these assertions
@@ -5208,6 +5219,8 @@ AfterTriggerEndQuery(EState *estate)
break;
}
+ FireAfterTriggerBatchCallbacks();
+
/* Release query-level-local storage, including tuplestores if any */
AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
@@ -5315,6 +5328,8 @@ AfterTriggerFireDeferred(void)
break; /* all fired */
}
+ FireAfterTriggerBatchCallbacks();
+
/*
* We don't bother freeing the event list, since it will go away anyway
* (and more efficiently than via pfree) in AfterTriggerEndXact.
@@ -6057,6 +6072,8 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
break; /* all fired */
}
+ FireAfterTriggerBatchCallbacks();
+
if (snapshot_set)
PopActiveSnapshot();
}
@@ -6753,3 +6770,76 @@ check_modified_virtual_generated(TupleDesc tupdesc, HeapTuple tuple)
return tuple;
}
+
+/*
+ * RegisterAfterTriggerBatchCallback
+ * Register a function to be called when the current trigger-firing
+ * batch completes.
+ *
+ * Must be called from within a trigger function's execution context
+ * (i.e., while afterTriggers state is active).
+ *
+ * The callback list is cleared after invocation, so the caller must
+ * re-register for each new batch if needed.
+ */
+void
+RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg)
+{
+ AfterTriggerCallbackItem *item;
+ MemoryContext oldcxt;
+
+ /*
+ * Allocate in TopTransactionContext so the item survives for the duration
+ * of the batch, which may span multiple trigger invocations.
+ */
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ item = palloc(sizeof(AfterTriggerCallbackItem));
+ item->callback = callback;
+ item->arg = arg;
+ afterTriggers.batch_callbacks =
+ lappend(afterTriggers.batch_callbacks, item);
+ MemoryContextSwitchTo(oldcxt);
+}
+
+/*
+ * FireAfterTriggerBatchCallbacks
+ * Invoke and clear all registered batch callbacks.
+ *
+ * Only fires at the outermost query level (query_depth == 0) or from
+ * top-level operations (query_depth == -1, e.g. AfterTriggerFireDeferred
+ * at COMMIT). Nested queries from SPI inside AFTER triggers run at
+ * depth > 0 and must not tear down resources the outer batch still needs.
+ */
+static void
+FireAfterTriggerBatchCallbacks(void)
+{
+ ListCell *lc;
+
+ if (afterTriggers.query_depth > 0)
+ return;
+
+ foreach(lc, afterTriggers.batch_callbacks)
+ {
+ AfterTriggerCallbackItem *item = lfirst(lc);
+
+ item->callback(item->arg);
+ }
+
+ list_free_deep(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = NIL;
+}
+
+/*
+ * AfterTriggerBatchIsActive
+ * Returns true if we're inside a query-level trigger batch where
+ * registered batch callbacks will actually be invoked.
+ *
+ * This is false during validateForeignKeyConstraint(), which calls
+ * RI trigger functions directly outside the after-trigger framework.
+ */
+bool
+AfterTriggerBatchIsActive(void)
+{
+ return afterTriggers.query_depth >= 0;
+}
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 2357735c4c8..467418cadc0 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -196,6 +196,23 @@ typedef struct RI_CompareHashEntry
FmgrInfo cast_func_finfo; /* in case we must coerce input */
} RI_CompareHashEntry;
+/*
+ * RI_FastPathEntry
+ * Per-constraint cache of resources needed by ri_FastPathCheckCached().
+ *
+ * One entry per constraint, keyed by pg_constraint OID. Created lazily
+ * by ri_FastPathGetEntry() on first use within a trigger-firing batch
+ * and torn down by ri_FastPathTeardown() at batch end.
+ */
+typedef struct RI_FastPathEntry
+{
+ Oid conoid; /* hash key: pg_constraint OID */
+ Relation pk_rel;
+ Relation idx_rel;
+ IndexScanDesc scandesc;
+ TupleTableSlot *slot;
+ Snapshot snapshot; /* registered snapshot for the scan */
+} RI_FastPathEntry;
/*
* Local data
@@ -205,6 +222,8 @@ static HTAB *ri_query_cache = NULL;
static HTAB *ri_compare_cache = NULL;
static dclist_head ri_constraint_cache_valid_list;
+static HTAB *ri_fastpath_cache = NULL;
+static bool ri_fastpath_callback_registered = false;
/*
* Local function prototypes
@@ -255,6 +274,8 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
bool detectNewRows, int expect_OK);
static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
IndexScanDesc scandesc, TupleTableSlot *slot,
Snapshot snapshot, const RI_ConstraintInfo *riinfo,
@@ -277,6 +298,9 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool is_restrict, bool partgone);
+static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel);
+static void ri_FastPathTeardown(void *arg);
/*
@@ -387,12 +411,16 @@ RI_FKey_check(TriggerData *trigdata)
* index scan + tuple lock. This is semantically equivalent to
* the SPI path below but avoids the per-row executor overhead.
*
- * ri_FastPathCheck() reports the violation itself (via ereport)
- * if no matching PK row is found, so it only returns on success.
+ * ri_FastPathCheckCached() and ri_FastPathCheck() report the violation
+ * themselves if no matching PK row is found, so they only return on
+ * success.
*/
if (ri_fastpath_is_applicable(riinfo))
{
- ri_FastPathCheck(riinfo, fk_rel, newslot);
+ if (AfterTriggerBatchIsActive())
+ ri_FastPathCheckCached(riinfo, fk_rel, newslot);
+ else
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
return PointerGetDatum(NULL);
}
@@ -2742,6 +2770,73 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
table_close(pk_rel, NoLock);
}
+/*
+ * ri_FastPathCheckCached
+ * Cached-resource variant of ri_FastPathCheck for use within the
+ * after-trigger framework.
+ *
+ * Uses the per-batch cache (RI_FastPathEntry) to avoid per-row relation
+ * open/close, scan begin/end, and snapshot registration. The snapshot's
+ * curcid is patched each call so the scan sees effects of prior triggers.
+ *
+ * Like ri_FastPathCheck, reports the violation via ri_ReportViolation()
+ * if no matching PK row is found.
+ */
+static void
+ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ IndexScanDesc scandesc = fpentry->scandesc;
+ Snapshot snapshot = fpentry->snapshot;
+ TupleTableSlot *slot = fpentry->slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found;
+ Oid saved_userid;
+ int saved_sec_context;
+ MemoryContext oldcxt;
+
+ /*
+ * Advance the command counter and patch the cached snapshot's curcid so
+ * the scan sees PK rows inserted by earlier triggers in this statement.
+ */
+ CommandCounterIncrement();
+ fpentry->snapshot->curcid = GetCurrentCommandId(false);
+
+ if (riinfo->fpmeta == NULL)
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ Assert(riinfo->fpmeta);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+
+ /*
+ * The cached scandesc lives in TopTransactionContext, but the btree AM
+ * defers some allocations to the first index_getnext_slot call. Ensure
+ * those land in TopTransactionContext too.
+ */
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, slot, snapshot,
+ riinfo, skey, riinfo->nkeys);
+ MemoryContextSwitchTo(oldcxt);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+ if (!found)
+ ri_ReportViolation(riinfo, pk_rel, fk_rel, newslot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+}
+
/*
* ri_FastPathProbeOne
* Probe the PK index for one set of scan keys, lock the matching
@@ -3672,3 +3767,177 @@ RI_FKey_trigger_type(Oid tgfoid)
return RI_TRIGGER_NONE;
}
+
+/*
+ * ri_FastPathTeardown
+ * Tear down all cached fast-path state.
+ *
+ * Called as an AfterTriggerBatchCallback at end of batch.
+ */
+static void
+ri_FastPathTeardown(void *arg)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ /* Close the scan before closing idx_rel. */
+ if (entry->scandesc)
+ index_endscan(entry->scandesc);
+ if (entry->idx_rel)
+ index_close(entry->idx_rel, NoLock);
+ if (entry->pk_rel)
+ table_close(entry->pk_rel, NoLock);
+ if (entry->slot)
+ ExecDropSingleTupleTableSlot(entry->slot);
+ if (entry->snapshot)
+ UnregisterSnapshot(entry->snapshot);
+ }
+
+ hash_destroy(ri_fastpath_cache);
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static bool ri_fastpath_xact_callback_registered = false;
+
+static void
+ri_FastPathXactCallback(XactEvent event, void *arg)
+{
+ /*
+ * TopTransactionContext is destroyed at end of transaction, taking the
+ * hash table and all cached resources with it. Just reset our static
+ * pointers so we don't dereference freed memory.
+ *
+ * In the normal (non-error) path, ri_FastPathTeardown already ran via the
+ * batch callback and did orderly teardown. Here we're just handling the
+ * abort path where that callback never fired.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static void
+ri_FastPathSubXactCallback(SubXactEvent event, SubTransactionId mySubid,
+ SubTransactionId parentSubid, void *arg)
+{
+ if (event == SUBXACT_EVENT_ABORT_SUB)
+ {
+ /*
+ * ResourceOwner already cleaned up relations and snapshots. Just
+ * NULL our pointers so the still-registered batch callback becomes a
+ * no-op. The hash table memory in TopTransactionContext will be
+ * freed at transaction end.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+ }
+}
+
+/*
+ * ri_FastPathGetEntry
+ * Look up or create a per-batch cache entry for the given constraint.
+ *
+ * On first call for a constraint within a batch: opens pk_rel and the
+ * index, begins an index scan, allocates a result slot, and registers
+ * the cleanup callback.
+ *
+ * On subsequent calls: returns the existing entry. Caller uses
+ * index_rescan() with new keys.
+ */
+static RI_FastPathEntry *
+ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+ RI_FastPathEntry *entry;
+ bool found;
+
+ /* Create hash table on first use in this batch */
+ if (ri_fastpath_cache == NULL)
+ {
+ HASHCTL ctl;
+
+ if (!ri_fastpath_xact_callback_registered)
+ {
+ RegisterXactCallback(ri_FastPathXactCallback, NULL);
+ RegisterSubXactCallback(ri_FastPathSubXactCallback, NULL);
+ ri_fastpath_xact_callback_registered = true;
+ }
+
+ ctl.keysize = sizeof(Oid);
+ ctl.entrysize = sizeof(RI_FastPathEntry);
+ ctl.hcxt = TopTransactionContext;
+ ri_fastpath_cache = hash_create("RI fast-path cache",
+ 16,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ }
+
+ entry = hash_search(ri_fastpath_cache, &riinfo->constraint_id,
+ HASH_ENTER, &found);
+
+ if (!found)
+ {
+ MemoryContext oldcxt;
+ Oid saved_userid;
+ int saved_sec_context;
+
+ /*
+ * Zero out non-key fields so ri_FastPathTeardown is safe if we error
+ * out during partial initialization below.
+ */
+ memset(((char *) entry) + offsetof(RI_FastPathEntry, pk_rel), 0,
+ sizeof(RI_FastPathEntry) - offsetof(RI_FastPathEntry, pk_rel));
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+
+ /*
+ * Open PK table and its unique index.
+ *
+ * RowShareLock on pk_rel matches what the SPI path's SELECT ... FOR
+ * KEY SHARE would acquire as a relation-level lock. AccessShareLock
+ * on the index is standard for index scans.
+ *
+ * We don't release these locks until end of transaction, matching SPI
+ * behavior.
+ */
+ entry->pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ entry->idx_rel = index_open(riinfo->conindid, AccessShareLock);
+
+ /*
+ * Register an initial snapshot. Its curcid will be patched in place
+ * on each subsequent row (see ri_FastPathCheckCached()), avoiding
+ * per-row GetSnapshotData() overhead.
+ */
+ entry->snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+ entry->slot = table_slot_create(entry->pk_rel, NULL);
+
+ entry->scandesc = index_beginscan(entry->pk_rel, entry->idx_rel,
+ entry->snapshot, NULL,
+ riinfo->nkeys, 0);
+
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Ensure cleanup at end of this trigger-firing batch */
+ if (!ri_fastpath_callback_registered)
+ {
+ RegisterAfterTriggerBatchCallback(ri_FastPathTeardown, NULL);
+ ri_fastpath_callback_registered = true;
+ }
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(entry->pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ ri_CheckPermissions(entry->pk_rel);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ }
+
+ return entry;
+}
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 556c86bf5e1..4304abffc8d 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -289,4 +289,22 @@ extern void RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel,
extern int RI_FKey_trigger_type(Oid tgfoid);
+/*
+ * Callback type for end-of-trigger-batch notifications.
+ *
+ * Registered via RegisterAfterTriggerBatchCallback(). Invoked when
+ * a batch of after-trigger processing completes:
+ * - AfterTriggerEndQuery() (immediate constraints)
+ * - AfterTriggerFireDeferred() (deferred constraints at COMMIT)
+ * - AfterTriggerSetState() (SET CONSTRAINTS IMMEDIATE)
+ *
+ * The callback list is cleared after each batch. Callers must
+ * re-register if they need to be called again in a subsequent batch.
+ */
+typedef void (*AfterTriggerBatchCallback) (void *arg);
+
+extern void RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg);
+extern bool AfterTriggerBatchIsActive(void);
+
#endif /* TRIGGER_H */
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 0826f518004..25d505c6c12 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3504,3 +3504,89 @@ DETAIL: drop cascades to table fkpart13_t1
drop cascades to table fkpart13_t2
drop cascades to table fkpart13_t3
RESET search_path;
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+ERROR: insert or update on table "fp_fk_alter" violates foreign key constraint "fp_fk_alter_a_fkey"
+DETAIL: Key (a)=(101) is not present in table "fp_pk_alter".
+DROP TABLE fp_fk_alter, fp_pk_alter;
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+ERROR: insert or update on table "fp_fk_alter2" violates foreign key constraint "fp_fk_alter2_a_fkey"
+DETAIL: Key (a)=(200) is not present in table "fp_pk_alter2".
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+ERROR: insert or update on table "fp_multi_fk" violates foreign key constraint "fp_multi_fk_b_fkey"
+DETAIL: Key (b)=(2) is not present in table "fp_pk2".
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+ERROR: insert or update on table "fp_fk_defer" violates foreign key constraint "fp_fk_defer_a_fkey"
+DETAIL: Key (a)=(3) is not present in table "fp_pk_defer".
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+ a
+---
+ 1
+ 1
+(2 rows)
+
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index e9ee29331cb..cedd20c8d11 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2498,3 +2498,83 @@ WITH cte AS (
DROP SCHEMA fkpart13 CASCADE;
RESET search_path;
+
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+DROP TABLE fp_fk_alter, fp_pk_alter;
+
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3da19d41413..9b90f70ecce 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -30,6 +30,8 @@ AddForeignUpdateTargets_function
AddrInfo
AffixNode
AffixNodeData
+AfterTriggerBatchCallback
+AfterTriggerCallbackItem
AfterTriggerEvent
AfterTriggerEventChunk
AfterTriggerEventData
@@ -2448,6 +2450,7 @@ RIX
RI_CompareHashEntry
RI_CompareKey
RI_ConstraintInfo
+RI_FastPathEntry
RI_QueryHashEntry
RI_QueryKey
RTEKind
--
2.47.3
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Amit Langote @ 2026-03-20 08:20 UTC (permalink / raw)
To: Junwang Zhao <[email protected]>; +Cc: Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
On Mon, Mar 16, 2026 at 23:03 Amit Langote <[email protected]> wrote:
> Hi Junwang,
>
> Thanks for sending the new version.
>
> On Tue, Mar 10, 2026 at ... Junwang Zhao <[email protected]> wrote:
> > 1.
> > Move ri_ReportViolation into ri_FastPathCheck, so table_open is no
> > longer needed, and ri_FastPathCheck now returns void.
>
> Good, kept.
>
> > 2.
> > After adding the batch fast path, the original ri_FastPathCheck is only
> > used by the ALTER TABLE validation path. This path cannot use the
> > cache because the registered AfterTriggerBatch callback will never run.
> > Therefore, the use_cache branch can be removed.
>
> Agreed. I went a step further and restructured 0002 to avoid the
> use_cache branching entirely. Instead of adding if/else blocks to
> ri_FastPathCheck, 0002 now adds a separate ri_FastPathCheckCached()
> function with its own resource lifecycle. 0003 then replaces it with
> ri_FastPathBatchAdd() -- a clean swap rather than completely undoing
> what 0002 added. This also removes the use_cache parameter from
> ri_FastPathProbeOne; the memory context switch to
> TopTransactionContext is now the caller's responsibility.
>
> > 3.
> > ri_FastPathBatchFlush creates a new fk_slot but does not cache it in
> > RI_FastPathEntry. I tried caching it in v5-0006 and ran some benchmarks,
> > but it didn't show much improvement.
>
> I put the fk_slot in the cache entry since it's a small change.
>
> > 4.
> > ri_FastPathFlushArray currently uses SK_SEARCHARRAY only for
> > single-column checks. [...] my current understanding is that
> > SK_SEARCHARRAY may not work for multi-column checks.
>
> Right, I haven't investigated this deeply either. The FlushLoop
> fallback is the right approach for now. If we want to explore a
> SEARCHARRAY approach for multi-column keys in a follow-up, it would be
> worth checking with Peter Geoghegan or someone else familiar with the
> btree SAOP internals on how multiple array keys across columns
> are iterated and whether that's usable at all for this use case.
>
> Attached is v6, three patches -- combined the old 0003 (buffering) and
> 0004 (SK_SEARCHARRAY) into a single 0003, since the buffering alone
> has no performance benefit (or at least only minor) and the split
> added unnecessary diff/rebase churn.
>
> The biggest change in this version is the snapshot handling. Looking
> more carefully at what the SPI path actually does for RI_FKey_check
> (non-partitioned PK, detectNewRows = false), I found that
> ri_PerformCheck passes InvalidSnapshot to SPI_execute_snapshot, and
> _SPI_execute_plan ends up doing
> PushActiveSnapshot(GetTransactionSnapshot()). So the SPI path scans
> with the transaction snapshot, not the latest
> snapshot.
>
> So I've changed the fast path to match: ri_FastPathCheck now uses
> GetTransactionSnapshot() directly. Under READ COMMITTED this is a
> fresh snapshot; under REPEATABLE READ it's the frozen
> transaction-start snapshot, so PK rows committed after the transaction
> started are simply not visible. This means the second index probe
> (the IsolationUsesXactSnapshot crosscheck block) is no longer needed
> and is removed. The existing fk-snapshot isolation test confirms this
> is the correct behavior.
>
> Other changes since v5:
>
> * Fixed the batch callback firing during nested SPI: another AFTER
> trigger doing DML via SPI would call AfterTriggerEndQuery at a nested
> level, tearing down our cache mid-batch. Fixed by checking
> query_depth inside FireAfterTriggerBatchCallbacks. Added a test case
> with a trigger that auto-provisions PK rows via SPI.
>
> * Security context is now restored before ri_ReportViolation, not
> after (the ereport doesn't return).
>
> * search_vals[] and matched[] moved from RI_FastPathEntry to stack
> locals in ri_FastPathFlushArray -- they're rewritten from scratch on
> every flush with no state carried between calls.
>
> * Various comment fixes.
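
The nested-SPI fix in the first bullet above can be modelled in isolation.
This is a minimal sketch, not the patch's code: register_batch_callback(),
begin_query(), end_query(), count_teardown(), and the depth convention
(-1 when idle, 0 for the outermost query) are hypothetical names and
assumptions chosen for illustration. It shows the callback list being fired
and cleared only when the outermost query level ends, so a nested query
finishing inside a trigger leaves the cached state alone.

```c
#include <assert.h>

/* Hypothetical stand-ins; none of these names come from the patch. */
typedef void (*BatchCallback) (void *arg);

typedef struct CallbackItem
{
    BatchCallback fn;
    void       *arg;
} CallbackItem;

#define MAX_BATCH_CALLBACKS 8

static CallbackItem batch_callbacks[MAX_BATCH_CALLBACKS];
static int  n_batch_callbacks = 0;
static int  query_depth = -1;   /* assumed: -1 = idle, 0 = outermost query */

/* Queue a callback to run when the current trigger batch ends. */
static void
register_batch_callback(BatchCallback fn, void *arg)
{
    assert(n_batch_callbacks < MAX_BATCH_CALLBACKS);
    batch_callbacks[n_batch_callbacks].fn = fn;
    batch_callbacks[n_batch_callbacks].arg = arg;
    n_batch_callbacks++;
}

static void
begin_query(void)
{
    query_depth++;
}

static void
end_query(void)
{
    /* Fire and clear only at the outermost level; nested levels skip this. */
    if (query_depth == 0)
    {
        for (int i = 0; i < n_batch_callbacks; i++)
            batch_callbacks[i].fn(batch_callbacks[i].arg);
        n_batch_callbacks = 0;
    }
    query_depth--;
}

/* Example callback: counts how many times teardown ran. */
static void
count_teardown(void *arg)
{
    (*(int *) arg)++;
}
```

With this model, a begin_query()/end_query() pair issued while an outer
query is still active does not run count_teardown(); only the outer
end_query() does, after which the list is empty and a later batch has to
register again.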
>
> I think this is getting close to committable shape. That said, another
> pair of eyes would be reassuring before I pull the trigger. Tomas, if
> you've had a chance to look, would welcome your thoughts.
Adding Tomas to cc -- I thought he was already on the thread when I sent
this. Tomas, context is in the quoted email below. Junwang has posted a
couple of updated versions since then addressing review feedback from Haibo
Yan (memory context handling for the flush path, batch size rationale).
Latest patches are in the thread. If you have time, would appreciate a look.
Thanks, Amit
>
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Junwang Zhao @ 2026-03-18 15:34 UTC (permalink / raw)
To: Haibo Yan <[email protected]>; +Cc: [email protected]; Pavel Stehule <[email protected]>; pgsql-hackers
Hi Haibo,
On Tue, Mar 17, 2026 at 8:28 AM Haibo Yan <[email protected]> wrote:
>
> Hi, Amit and Junwang
>
> Thanks for the latest patch. I think the overall direction makes sense, and the single-column SK_SEARCHARRAY path looks like one of the most valuable optimizations here. The patch also seems to cover several important cases, including deferred constraints, duplicate FK values, and multi-column fallback behavior.
>
> After reading through the patch, I have one major comment and a few smaller ones.
Thanks for your review.
>
> 1. TopTransactionContext usage during batched flush may be too coarse-grained
> My biggest concern is the use of TopTransactionContext around the batched flush path.
> As written, ri_FastPathBatchFlush() switches to TopTransactionContext before calling ri_FastPathFlushArray() / ri_FastPathFlushLoop(). That seems broad enough that temporary allocations made during the flush may end up there.
> In particular, in ri_FastPathFlushArray(), I think the objects worth checking carefully are the pass-by-reference Datums returned by the per-element cast call and stored in search_vals[], e.g.
>
> ```search_vals[i] = FunctionCall3(&entry->cast_func_finfo, ...);```
> If those cast results are separately allocated in the current memory context, then pfree(arr) only frees the constructed array object itself; it does not obviously free those intermediate cast results. If so, those allocations could survive until end of transaction rather than just until the end of the current flush.
> Maybe this is harmless in practice, but I think it needs a closer look. It might be better to use a dedicated short-lived context for per-flush temporary allocations, reset it after each flush, or otherwise separate allocations that really need transaction lifetime from those that are only needed transiently during batched processing.
Yeah, that concern is reasonable. After a brief discussion with Amit,
we have replaced the TopTransactionContext usage with two
purpose-specific contexts: scan_cxt for index AM allocations, which is
freed at teardown, and flush_cxt for per-flush transient work, which is
reset after each flush. TopTransactionContext is the parent of scan_cxt,
and scan_cxt is the parent of flush_cxt, so MemoryContextDelete(scan_cxt)
in teardown cleans up both.
These changes are in v7-0004.
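
The parent/child arrangement can be illustrated with a toy model. This is
only a sketch under stated assumptions: ToyContext, toy_alloc(),
toy_context_reset(), and toy_context_delete() are hypothetical names that
imitate the general behavior of memory contexts (resetting frees a
context's allocations, deleting a parent also deletes its children); they
are not PostgreSQL's implementation and not code from v7-0004.

```c
#include <assert.h>
#include <stdlib.h>

/* One allocation tracked by a toy context. */
typedef struct ToyChunk
{
    struct ToyChunk *next;
    void       *mem;
} ToyChunk;

/* Hypothetical stand-in for a memory context with child contexts. */
typedef struct ToyContext
{
    struct ToyContext *parent;
    struct ToyContext *first_child;
    struct ToyContext *next_sibling;
    ToyChunk   *chunks;
} ToyContext;

static int  toy_live_chunks = 0;    /* outstanding allocations, all contexts */

static ToyContext *
toy_context_create(ToyContext *parent)
{
    ToyContext *cxt = calloc(1, sizeof(ToyContext));

    if (cxt == NULL)
        abort();
    cxt->parent = parent;
    if (parent != NULL)
    {
        cxt->next_sibling = parent->first_child;
        parent->first_child = cxt;
    }
    return cxt;
}

static void *
toy_alloc(ToyContext *cxt, size_t size)
{
    ToyChunk   *chunk = malloc(sizeof(ToyChunk));

    if (chunk == NULL)
        abort();
    chunk->mem = malloc(size > 0 ? size : 1);
    if (chunk->mem == NULL)
        abort();
    chunk->next = cxt->chunks;
    cxt->chunks = chunk;
    toy_live_chunks++;
    return chunk->mem;
}

/* Free this context's own chunks but keep the context (a simplification). */
static void
toy_context_reset(ToyContext *cxt)
{
    while (cxt->chunks != NULL)
    {
        ToyChunk   *next = cxt->chunks->next;

        free(cxt->chunks->mem);
        free(cxt->chunks);
        cxt->chunks = next;
        toy_live_chunks--;
    }
}

/* Delete a context, all of its descendants, and everything they hold. */
static void
toy_context_delete(ToyContext *cxt)
{
    while (cxt->first_child != NULL)
        toy_context_delete(cxt->first_child);
    toy_context_reset(cxt);
    if (cxt->parent != NULL)
    {
        ToyContext **link = &cxt->parent->first_child;

        while (*link != cxt)
            link = &(*link)->next_sibling;
        *link = cxt->next_sibling;
    }
    free(cxt);
}
```

In this model, resetting the flush-level context after each flush releases
only the transient allocations, while deleting the scan-level context at
teardown releases both levels in one call.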
>
> 2. RI_FastPathEntry comment mentions the wrong function name
> The comment above RI_FastPathEntry says it contains resources needed by ri_FastPathFlushBatch(), but the function is named ri_FastPathBatchFlush().
Fixed.
>
> 3. RI_FASTPATH_BATCH_SIZE needs some rationale
> RI_FASTPATH_BATCH_SIZE = 64 may well be a reasonable compromise, but right now it reads like a magic number.
> This choice seems especially relevant because the patch has two opposing effects:
> 3-1. larger batches should amortize the array-scan work better,
> 3-2. but the matched[] bookkeeping in ri_FastPathFlushArray() is O(batch_size^2) in the worst case.
> So I think it would help to include at least a brief rationale in a comment or in the commit message.
Added.
>
> 4. Commit message says the entry stashes fk_relid, but the code actually stashes riinfo
> The commit message says the entry stashes fk_relid and can reopen the relation if needed. Unless I am misreading it, the code actually stores riinfo and later uses riinfo->fk_relid. The distinction is small, but I think the wording should match the implementation more closely.
I changed the commit message to:
Since the FK relation may already be closed by flush time (e.g. for
deferred constraints at COMMIT), reopens the relation using
entry->riinfo->fk_relid if needed.
>
> Thanks again for working on this.
>
> Best regards,
>
> Haibo Yan
>
>
> On Mar 10, 2026, at 5:28 AM, Junwang Zhao <[email protected]> wrote:
>
> Hi,
>
> On Mon, Mar 2, 2026 at 11:30 PM Junwang Zhao <[email protected]> wrote:
>
>
> On Sat, Feb 28, 2026 at 3:08 PM Amit Langote <[email protected]> wrote:
>
>
> Hi Junwang,
>
> On Mon, Feb 23, 2026 at 10:45 PM Junwang Zhao <[email protected]> wrote:
>
> On Thu, Feb 19, 2026 at 5:21 PM Amit Langote <[email protected]> wrote:
>
> I re-ran the benchmarks (same test as yours, different machine):
>
> create table pk (a numeric primary key);
> create table fk (a bigint references pk);
> insert into pk select generate_series(1, 2000000);
> insert into fk select generate_series(1, 2000000, 2);
>
> master: 2444 ms (median of 3 runs)
> 0001: 1382 ms (43% faster)
> 0001+0002: 1202 ms (51% faster, 13% over 0001 alone)
>
>
> I get a similar improvement on my old Intel Mac:
>
> master: 12963.993 ms
> 0001: 6641.692 ms, 48.8% faster
> 0001+0002: 5771.703 ms, 55.5% faster
>
>
> Also, with int PK / int FK (1M rows):
>
> create table pk (a int primary key);
> create table fk (a int references pk);
> insert into pk select generate_series(1, 1000000);
> insert into fk select generate_series(1, 1000000);
>
> master: 1000 ms
> 0001: 520 ms (48% faster)
> 0001+0002: 432 ms (57% faster, 17% over 0001 alone)
>
>
> master: 11134.583 ms
> 0001: 5240.298 ms, 52.9% faster
> 0001+0002: 4554.215 ms, 59.1% faster
>
>
> Thanks for testing, good to see similar numbers. I had forgotten to
> note that these results are when these PK index probes don't do any
> I/O, though you might be aware of that. Below, I report some numbers
> that Tomas Vondra shared with me off-list where the probes do have to
> perform I/O and there the benefits from only this patch set are only
> marginal.
>
> I don't have any additional comments on the patch except one minor nit:
> maybe merge the following two if blocks into one. Not a strong opinion,
> though.
>
> if (use_cache)
> {
> /*
> * The snapshot was registered once when the cache entry was created.
> * We just patch curcid to reflect the new command counter.
> * SnapshotSetCommandId() only patches process-global statics, not
> * registered copies, so we do it directly.
> *
> * The xmin/xmax/xip fields don't need refreshing: within a single
> * statement batch, only curcid changes between rows.
> */
> Assert(fpentry && fpentry->snapshot != NULL);
> snapshot = fpentry->snapshot;
> snapshot->curcid = GetCurrentCommandId(false);
> }
> else
> snapshot = RegisterSnapshot(GetLatestSnapshot());
>
> if (use_cache)
> {
> pk_rel = fpentry->pk_rel;
> idx_rel = fpentry->idx_rel;
> scandesc = fpentry->scandesc;
> slot = fpentry->slot;
> }
> else
> {
> pk_rel = table_open(riinfo->pk_relid, RowShareLock);
> idx_rel = index_open(riinfo->conindid, AccessShareLock);
> scandesc = index_beginscan(pk_rel, idx_rel,
> snapshot, NULL,
> riinfo->nkeys, 0);
> slot = table_slot_create(pk_rel, NULL);
> }
>
>
> Good idea, done.
>
> While polishing 0002, I revisited the snapshot caching semantics. The
> previous commit message hand-waved about only curcid changing between
> rows, but GetLatestSnapshot() also reflects other backends' commits,
> so reusing the snapshot is a deliberate semantic change from the SPI
> path. I think it's safe because curcid is all we need for
> intra-statement visibility, concurrent commits either already happened
> before our snapshot (and are visible) or are racing with our statement
> and wouldn't be seen reliably even with per-row snapshots since the
> order in which FK rows are checked is nondeterministic, and
> LockTupleKeyShare prevents the PK row from disappearing regardless. In
> essence, we're treating all the FK checks within a trigger-firing
> cycle as a single plan execution that happens to scan N rows, rather
> than N independent SPI queries each taking a fresh snapshot. That's
> the natural model -- a normal SELECT ... FOR KEY SHARE plan doesn't
> re-take GetLatestSnapshot() between rows either.
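
The role of curcid in intra-statement visibility can be shown with a toy
model. This is a sketch under stated assumptions: ToySnapshot, ToyTuple,
own_tuple_visible(), and snapshot_patch_curcid() are hypothetical names,
and the rule modelled is only the one for tuples inserted by the current
transaction (visible when the inserting command's id is lower than the
snapshot's curcid). Everything concerning other transactions is left out.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int ToyCommandId;

/* Hypothetical snapshot: only the command-id part is modelled. */
typedef struct ToySnapshot
{
    ToyCommandId curcid;
} ToySnapshot;

/* Hypothetical tuple inserted by our own transaction. */
typedef struct ToyTuple
{
    int         key;
    ToyCommandId cmin;          /* command that inserted the tuple */
} ToyTuple;

/* Own-transaction rule: visible only if inserted before the scan's command. */
static bool
own_tuple_visible(const ToyTuple *tup, const ToySnapshot *snap)
{
    return tup->cmin < snap->curcid;
}

/* Reuse the snapshot for the next row, advancing only its command id. */
static void
snapshot_patch_curcid(ToySnapshot *snap, ToyCommandId current_cid)
{
    assert(current_cid >= snap->curcid);    /* command ids only move forward */
    snap->curcid = current_cid;
}
```

A PK row inserted by command 1 is invisible to a snapshot whose curcid is
1 and becomes visible once curcid is patched to 2, which is the situation
exercised by the regression test where an earlier AFTER trigger inserts
the PK row.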
>
> Similarly, the permission check (schema USAGE + table SELECT) is now
> done once at cache entry creation in ri_FastPathGetEntry() rather than
> on every flush.
>
>
> nice improvement.
>
> The RI check runs as the PK table owner, so we're
> verifying that the owner can access their own table -- a condition
> that won't change unless someone explicitly revokes from the owner,
> which would also break the SPI path.
>
> David Rowley mentioned off-list that it might be worth batching
> multiple FK values into a single index probe, leveraging the
> ScalarArrayOp btree improvements from PostgreSQL 17. The idea would be
> to buffer FK values across trigger invocations in the per-constraint
> cache (0002 already has the right structure for this), build a
> SK_SEARCHARRAY scan key, and let the btree AM walk the matching leaf
> pages in one sorted traversal instead of one tree descent per row. The
> locking and recheck would still be per-tuple, but the index traversal
> cost drops significantly. Single-column FKs are the obvious starting
> point. That seems worth exploring but can be done as a separate patch
> on top of this.
>
>
> I will take a look at this in the following weeks.
>
>
> I ended up going ahead with the batching and SAOP idea that David
> mentioned -- I had a proof-of-concept working shortly after posting v3
> and kept iterating on it. So attached set is now:
>
> 0001 - Core fast path (your 0001+0002 reworked, as before)
>
> 0002 - Per-batch resource caching (PK relation, index, scandesc, snapshot)
>
> 0003 - FK row buffering: materialize FK tuples into a per-constraint
> batch buffer (64 rows), flush when full or at batch end
>
> 0004 - SK_SEARCHARRAY for single-column FKs: build an array from the
> buffered FK values and do one index scan instead of 64 separate tree
> descents. Multi-column FKs fall back to a per-row loop.
>
> 0003 is pure infrastructure -- it doesn't improve performance on its
> own because the per-row index descent still dominates. The payoff
> comes in 0004.
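
The buffering-plus-batched-probe idea behind 0003 and 0004 can be sketched
outside PostgreSQL. The model below is an assumption-laden illustration,
not the patch's code: ToyFkBatch, toy_batch_add(), and toy_batch_flush()
are hypothetical names, a sorted int array stands in for the PK index, and
a single forward walk over that array stands in for the one sorted
traversal that the array scan key gives the btree AM. The real index AM
does not start from the first entry on every flush.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_BATCH_SIZE 64       /* mirrors the 64-row buffer in the thread */

/* Hypothetical per-constraint buffer of pending FK values. */
typedef struct ToyFkBatch
{
    const int  *pk_sorted;      /* stand-in for the PK index, ascending */
    size_t      npk;
    int         keys[TOY_BATCH_SIZE];
    int         nkeys;
    int         nmissing;       /* FK values with no PK match so far */
} ToyFkBatch;

static int
cmp_int(const void *a, const void *b)
{
    int         x = *(const int *) a;
    int         y = *(const int *) b;

    return (x > y) - (x < y);
}

/* Check all buffered keys in one ordered pass, then empty the buffer. */
static void
toy_batch_flush(ToyFkBatch *batch)
{
    size_t      pos = 0;

    qsort(batch->keys, (size_t) batch->nkeys, sizeof(int), cmp_int);
    for (int i = 0; i < batch->nkeys; i++)
    {
        /* Advance the shared cursor; it never moves backwards. */
        while (pos < batch->npk && batch->pk_sorted[pos] < batch->keys[i])
            pos++;
        if (pos == batch->npk || batch->pk_sorted[pos] != batch->keys[i])
            batch->nmissing++;
    }
    batch->nkeys = 0;
}

/* Buffer one FK value; flush automatically when the buffer fills up. */
static void
toy_batch_add(ToyFkBatch *batch, int key)
{
    assert(batch->nkeys < TOY_BATCH_SIZE);
    batch->keys[batch->nkeys++] = key;
    if (batch->nkeys == TOY_BATCH_SIZE)
        toy_batch_flush(batch);
}
```

Buffering alone changes nothing about the number of lookups; the saving in
this model comes from the flush, where sorted keys share one cursor
instead of each starting a fresh search. A caller still has to flush any
partial batch at the end.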
>
> Numbers (same machine as before, median of 3 runs):
>
> numeric PK / bigint FK, 1M rows:
> master: 2487 ms
> 0001..0004: 1168 ms (2.1x)
>
> int PK / int FK, 500K rows:
> master: 1043 ms
> 0001..0004: 335 ms (3.1x)
>
> The int/int case benefits most because the per-row cost is lower, so
> the SAOP traversal savings are a larger fraction of the total. The
> numeric/bigint case still sees a solid improvement despite the
> cross-type cast overhead.
>
> Tomas Vondra also tested with an I/O-intensive workload (dataset
> larger than shared_buffers, combined with his and Peter Geoghegan's
> I/O prefetching patches) and confirmed that the batching + SAOP
> approach helps there too, not just in the CPU-bound / memory-resident
> case. On their own, though, the patches here don't make a big dent
> when the main bottleneck is I/O, as shown by the numbers he shared in
> an off-list email:
>
> master: 161617 ms
> ri-check (0001..0004): 149446 ms (1.08x)
> ri-check + i/o prefetching: 50885 ms (3.2x)
>
> So the RI patches alone only give ~8% here since most time is waiting
> on reads. But the batching gives the prefetch machinery a window of
> upcoming probes to issue readahead against, so the two together yield
> 3.2x.
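
The thread reports results both as "N% faster" and as "Nx". The two small
helpers below are a sketch for relating them, with hypothetical names
(pct_faster(), speedup()); they only restate arithmetic on the timings
quoted in this thread. pct_faster() is the reduction in elapsed time
relative to the baseline, and speedup() is the ratio of baseline time to
new time.

```c
#include <assert.h>

/* Reduction in elapsed time, as a percentage of the baseline. */
static double
pct_faster(double base_ms, double new_ms)
{
    assert(base_ms > 0.0);
    return (base_ms - new_ms) / base_ms * 100.0;
}

/* Ratio of baseline time to new time ("Nx"). */
static double
speedup(double base_ms, double new_ms)
{
    assert(new_ms > 0.0);
    return base_ms / new_ms;
}
```

For example, pct_faster(2444, 1382) is about 43.5 and
pct_faster(1382, 1202) is about 13.0, matching the "43% faster" and
"13% over 0001 alone" figures. For the I/O-bound run,
speedup(161617, 149446) is about 1.08 and speedup(161617, 50885) is about
3.18; expressed as a time reduction, the first pair is about 7.5%.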
>
>
> impressive!
>
>
> Tomas also caught a memory context bug in the batch flush path: the
> cached scandesc lives in TopTransactionContext, but the btree AM
> defers _bt_preprocess_keys allocation to the first getnext call, which
> pallocs into CurrentMemoryContext. If that's a short-lived
> per-trigger-row context, the scandesc has dangling pointers on the
> next rescan. Fixed by switching to TopTransactionContext before the
> probe loop.
>
> Finally, I've fixed a number of other small and not-so-small bugs
> found while polishing the old patches and made other stylistic
> improvements. One notable change is that I introduced a FastPathMeta
>
>
> Yeah, this is much better than the fpmeta_valid field.
>
> struct to store the fast path metadata instead of dumping those arrays
> in the RI_ConstraintInfo. It's allocated lazily on first use and holds
> the per-key compare entries, operator procedures, and index strategy
> info needed by the scan key construction, so RI_ConstraintInfo doesn't
> pay for them when the fast path isn't used.
>
>
> On Mon, Feb 23, 2026 at 10:45 PM Junwang Zhao <[email protected]> wrote:
>
>
> Hi Amit,
>
> On Thu, Feb 19, 2026 at 5:21 PM Amit Langote <[email protected]> wrote:
>
>
> Hi Junwang,
>
> On Mon, Dec 1, 2025 at 3:09 PM Junwang Zhao <[email protected]> wrote:
>
> As Amit has already stated, we are approaching a hybrid "fast-path + fallback"
> design.
>
> 0001 adds a fast path optimization for foreign key constraint checks
> that bypasses the SPI executor. The fast path applies when the referenced
> table is not partitioned and the constraint does not involve temporal
> semantics.
>
> With the following test:
>
> create table pk (a numeric primary key);
> create table fk (a bigint references pk);
> insert into pk select generate_series(1, 2000000);
>
> head:
>
> [local] zhjwpku@postgres:5432-90419=# insert into fk select
> generate_series(1, 2000000, 2);
> INSERT 0 1000000
> Time: 13516.177 ms (00:13.516)
>
> [local] zhjwpku@postgres:5432-90419=# update fk set a = a + 1;
> UPDATE 1000000
> Time: 15057.638 ms (00:15.058)
>
> patched:
>
> [local] zhjwpku@postgres:5432-98673=# insert into fk select
> generate_series(1, 2000000, 2);
> INSERT 0 1000000
> Time: 8248.777 ms (00:08.249)
>
> [local] zhjwpku@postgres:5432-98673=# update fk set a = a + 1;
> UPDATE 1000000
> Time: 10117.002 ms (00:10.117)
>
> 0002 caches fast-path metadata used by the index probe; currently that is
> only the comparison operator hash entries, operator function OIDs, and
> strategy numbers and subtypes for index scans. This cache doesn't buy any
> performance improvement, though.
>
> Caching additional metadata should improve performance for foreign key checks.
>
> Amit suggested introducing a mechanism for ri_triggers.c to register a
> cleanup callback in the EState, which AfterTriggerEndQuery() could then
> invoke to release per-statement cached metadata (such as the IndexScanDesc).
> However, I haven't been able to implement this mechanism yet.
>
>
> Thanks for working on this. I've taken your patches as a starting
> point and reworked the series into two patches (attached): 1st is your
> 0001+0002 as the core patch that adds a gated fast-path alternative to
> SPI and 2nd where I added per-statement resource caching. Doing the
> latter turned out to be not so hard thanks to the structure you chose
> to build the core fast path. Good call on adding the RLS and ACL test
> cases, btw.
>
> So, 0001 is a functionally complete fast path: concurrency handling,
> REPEATABLE READ crosscheck, cross-type operators, security context,
> and metadata caching. 0002 implements the per-statement resource
> caching we discussed, though instead of sharing the EState between
> trigger.c and ri_triggers.c it uses a new AfterTriggerBatchCallback
> mechanism that fires at the end of each trigger-firing cycle
> (per-statement for immediate constraints, or until COMMIT for deferred
> ones). It layers resource caching on top so that the PK relation,
> index, scan descriptor, and snapshot stay open across all FK trigger
> invocations within a single trigger-firing cycle rather than being
> opened and closed per row.
>
> Note that the previous 0002 (metadata caching) is folded into 0001,
> and most of the new fast-path logic added in 0001 now lives in
> ri_FastPathCheck() rather than inline in RI_FKey_check(), so the
> RI_FKey_check diff is just the gating call and SPI fallback.
>
> I re-ran the benchmarks (same test as yours, different machine):
>
> create table pk (a numeric primary key);
> create table fk (a bigint references pk);
> insert into pk select generate_series(1, 2000000);
> insert into fk select generate_series(1, 2000000, 2);
>
> master: 2444 ms (median of 3 runs)
> 0001: 1382 ms (43% faster)
> 0001+0002: 1202 ms (51% faster, 13% over 0001 alone)
>
>
> I get a similar improvement on my old Intel Mac:
>
> master: 12963.993 ms
> 0001: 6641.692 ms, 48.8% faster
> 0001+0002: 5771.703 ms, 55.5% faster
>
>
> Also, with int PK / int FK (1M rows):
>
> create table pk (a int primary key);
> create table fk (a int references pk);
> insert into pk select generate_series(1, 1000000);
> insert into fk select generate_series(1, 1000000);
>
> master: 1000 ms
> 0001: 520 ms (48% faster)
> 0001+0002: 432 ms (57% faster, 17% over 0001 alone)
>
>
> master: 11134.583 ms
> 0001: 5240.298 ms, 52.9% faster
> 0001+0002: 4554.215 ms, 59.1% faster
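
The "% faster" figures quoted above are consistent with a relative reduction in elapsed time, (baseline - patched) / baseline. A minimal sketch of that arithmetic follows; the helper name pct_faster is hypothetical and not part of the patches, and the formula is inferred from the posted numbers rather than stated by either author.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from the patch series): relative reduction in
 * elapsed time, expressed as a percentage of the baseline timing.
 */
static double
pct_faster(double baseline_ms, double patched_ms)
{
    assert(baseline_ms > 0.0);
    return (baseline_ms - patched_ms) / baseline_ms * 100.0;
}
```

With the numeric PK / bigint FK timings posted above, pct_faster(2444, 1382) comes out at roughly 43, and pct_faster(1382, 1202) at roughly 13, which matches the "13% over 0001 alone" figure.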
>
>
> The incremental gain from 0002 comes from eliminating per-row relation
> open/close, scan begin/end, slot alloc/free, and replacing per-row
> GetSnapshotData() with only curcid adjustment on the registered
> snapshot copy in the cache.
>
> The two current limitations are partitioned referenced tables and
> temporal foreign keys. Partitioned PKs are relatively uncommon in
> practice, so the non-partitioned case should cover most FK workloads;
> I'm not sure it's worth the added complexity to support them.
> Temporal FKs are inherently multi-row, so they're a poor fit for a
> single-probe fast path.
>
> David Rowley mentioned off-list that it might be worth batching
> multiple FK values into a single index probe, leveraging the
> ScalarArrayOp btree improvements from PostgreSQL 17. The idea would be
> to buffer FK values across trigger invocations in the per-constraint
> cache (0002 already has the right structure for this), build a
> SK_SEARCHARRAY scan key, and let the btree AM walk the matching leaf
> pages in one sorted traversal instead of one tree descent per row. The
> locking and recheck would still be per-tuple, but the index traversal
> cost drops significantly. Single-column FKs are the obvious starting
> point. That seems worth exploring but can be done as a separate patch
> on top of this.
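
The batching idea described above can be illustrated outside PostgreSQL with a small standalone sketch. It assumes a sorted array of unique long keys as a stand-in for the btree leaf level; batch_probe, cmp_long and BATCH_MAX are hypothetical names, and none of this is the patch's code or the btree AM's actual algorithm. The batch is sorted and deduplicated, the "index" is walked once in key order, and a matched[] array records which buffered items were satisfied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define BATCH_MAX 64            /* hypothetical cap, mirrors the batch size of 64 */

static int
cmp_long(const void *a, const void *b)
{
    long        x = *(const long *) a;
    long        y = *(const long *) b;

    return (x > y) - (x < y);
}

/*
 * Probe a sorted, unique key array (stand-in for index leaf pages) for all
 * batch items in one forward pass.  Fills matched[] and returns the position
 * of the first unmatched batch item, or -1 if every item has a match.
 */
static int
batch_probe(const long *index_keys, size_t nindex,
            const long *batch, int nbatch, bool *matched)
{
    long        sorted[BATCH_MAX];
    int         nsorted = 0;
    size_t      pos = 0;

    assert(nbatch >= 0 && nbatch <= BATCH_MAX);
    if (nbatch == 0)
        return -1;

    memset(matched, 0, (size_t) nbatch * sizeof(bool));
    memcpy(sorted, batch, (size_t) nbatch * sizeof(long));
    qsort(sorted, (size_t) nbatch, sizeof(long), cmp_long);

    /* Deduplicate in place so each distinct key is probed once */
    for (int i = 0; i < nbatch; i++)
    {
        if (nsorted == 0 || sorted[nsorted - 1] != sorted[i])
            sorted[nsorted++] = sorted[i];
    }

    /* Single ordered traversal: pos never moves backwards */
    for (int k = 0; k < nsorted && pos < nindex; k++)
    {
        while (pos < nindex && index_keys[pos] < sorted[k])
            pos++;
        if (pos == nindex || index_keys[pos] != sorted[k])
            continue;

        /* Linear scan to mark every batch item equal to this key */
        for (int i = 0; i < nbatch; i++)
        {
            if (!matched[i] && batch[i] == sorted[k])
                matched[i] = true;
        }
    }

    for (int i = 0; i < nbatch; i++)
    {
        if (!matched[i])
            return i;
    }
    return -1;
}
```

The sketch leaves out tuple locking and rechecking, which the message above says would remain per-tuple; it only shows why one sorted traversal can replace one descent per row.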
>
>
> I will take a look at this in the following weeks.
>
>
> I think the series is in reasonable shape but would appreciate extra
> eyeballs, especially on the concurrency handling in ri_LockPKTuple()
> in 0001 and the snapshot lifecycle in 0002. Or anything else that
> catches one's eye.
>
> --
> Thanks, Amit Langote
>
>
> I don't have any additional comments on the patch except one minor nit:
> maybe merge the following two if conditions into one. Not a strong opinion,
> though.
>
> if (use_cache)
> {
> /*
> * The snapshot was registered once when the cache entry was created.
> * We just patch curcid to reflect the new command counter.
> * SnapshotSetCommandId() only patches process-global statics, not
> * registered copies, so we do it directly.
> *
> * The xmin/xmax/xip fields don't need refreshing: within a single
> * statement batch, only curcid changes between rows.
> */
> Assert(fpentry && fpentry->snapshot != NULL);
> snapshot = fpentry->snapshot;
> snapshot->curcid = GetCurrentCommandId(false);
> }
> else
> snapshot = RegisterSnapshot(GetLatestSnapshot());
>
> if (use_cache)
> {
> pk_rel = fpentry->pk_rel;
> idx_rel = fpentry->idx_rel;
> scandesc = fpentry->scandesc;
> slot = fpentry->slot;
> }
> else
> {
> pk_rel = table_open(riinfo->pk_relid, RowShareLock);
> idx_rel = index_open(riinfo->conindid, AccessShareLock);
> scandesc = index_beginscan(pk_rel, idx_rel,
> snapshot, NULL,
> riinfo->nkeys, 0);
> slot = table_slot_create(pk_rel, NULL);
> }
>
> --
> Regards
> Junwang Zhao
>
>
> I had an offline discussion with Amit today. There were a few small things
> that could be improved, so I posted a new version of the patch set.
>
> 1.
>
> + if (ri_fastpath_is_applicable(riinfo))
> + {
> + bool found = ri_FastPathCheck(riinfo, fk_rel, newslot);
> +
> + if (found)
> + return PointerGetDatum(NULL);
> +
> + /*
> + * ri_FastPathCheck opens pk_rel internally; we need it for
> + * ri_ReportViolation. Re-open briefly.
> + */
> + pk_rel = table_open(riinfo->pk_relid, RowShareLock);
> + ri_ReportViolation(riinfo, pk_rel, fk_rel,
> + newslot, NULL,
> + RI_PLAN_CHECK_LOOKUPPK, false, false);
> + }
>
> Move ri_ReportViolation into ri_FastPathCheck, so table_open is no
> longer needed, and ri_FastPathCheck now returns void. Since Amit
> agreed this is the right approach, I included it directly in v5-0001.
>
> 2.
>
> After adding the batch fast path, the original ri_FastPathCheck is only
> used by the ALTER TABLE validation path. This path cannot use the
> cache because the registered AfterTriggerBatch callback will never run.
> Therefore, the use_cache branch can be removed.
>
> I made this change in v5-0004 and also updated some related comments.
> Once we agree the changes are correct, it can be merged into v5-0003.
>
> 3.
>
> + fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
> + &TTSOpsHeapTuple);
>
> ri_FastPathBatchFlush creates a new fk_slot but does not cache it in
> RI_FastPathEntry. I tried caching it in v5-0006 and ran some benchmarks, but
> it didn't show much improvement. This might be because the slot creation
> function is called once per batch rather than once per row, so the overall
> impact is minimal. I'm posting this here for Amit to take a look and decide
> whether we should adopt it or drop it, since I mentioned the idea to
> him earlier.
>
> 4.
>
> ri_FastPathFlushArray currently uses SK_SEARCHARRAY only for
> single-column checks. I asked whether this could be extended to support
> multi-column cases, and Amit encouraged me to look into it.
>
> After a brief investigation, it seems that ScanKeyEntryInitialize only allows
> passing a single subtype/collation/procedure, which makes it difficult to
> handle multiple types. Based on this, my current understanding is that
> SK_SEARCHARRAY may not work for multi-column checks.
>
> --
> Regards
> Junwang Zhao
> <v5-0005-Use-SK_SEARCHARRAY-for-batched-fast-path-FK-probe.patch><v5-0006-Reuse-FK-tuple-slot-across-fast-path-batches.patch><v5-0002-Cache-per-batch-resources-for-fast-path-foreign-k.patch><v5-0004-Refine-fast-path-FK-validation-path.patch><v5-0003-Buffer-FK-rows-for-batched-fast-path-probing.patch><v5-0001-Add-fast-path-for-foreign-key-constraint-checks.patch>
>
>
--
Regards
Junwang Zhao
Attachments:
[application/octet-stream] v7-0004-Refactor-RI-fast-path-to-use-scan_cxt-and-flush_c.patch (7.9K, 2-v7-0004-Refactor-RI-fast-path-to-use-scan_cxt-and-flush_c.patch)
From 063e128ac2398706ff8c9f9f72dbb5bf3a733c49 Mon Sep 17 00:00:00 2001
From: Junwang Zhao <[email protected]>
Date: Tue, 17 Mar 2026 22:57:00 +0800
Subject: [PATCH v7 4/4] Refactor RI fast-path to use scan_cxt and flush_cxt
Replace TopTransactionContext usage with two purpose-specific contexts:
- scan_cxt: Child of TopTransactionContext for index scan allocations
(e.g. _bt_preprocess_keys). Lives for the batch, deleted at teardown,
so these allocations are freed when the trigger batch ends instead of
at transaction end.
- flush_cxt: Child of scan_cxt for per-flush transient work (cast
results, search array). Reset after each flush; deleting scan_cxt in
teardown also frees flush_cxt.
FlushArray and FlushLoop switch to scan_cxt around index_getnext_slot()
calls and to flush_cxt for per-flush work. BatchFlush restores the
caller's memory context after the flush.
Also document the rationale for RI_FASTPATH_BATCH_SIZE = 64 and fix
related comments.
---
src/backend/utils/adt/ri_triggers.c | 68 +++++++++++++++++++++++------
1 file changed, 54 insertions(+), 14 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index cc085120c79..0727cbd1656 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -196,11 +196,21 @@ typedef struct RI_CompareHashEntry
FmgrInfo cast_func_finfo; /* in case we must coerce input */
} RI_CompareHashEntry;
+/*
+ * Maximum number of FK rows buffered before flushing.
+ *
+ * Larger batches amortize per-flush overhead and let the SK_SEARCHARRAY
+ * path walk more leaf pages in a single sorted traversal. But each
+ * buffered row is a materialized HeapTuple in TopTransactionContext,
+ * and the matched[] scan in ri_FastPathFlushArray() is O(batch_size)
+ * per index match. Benchmarking showed little difference between 16
+ * and 64, with 256 consistently slower. 64 is a reasonable default.
+ */
#define RI_FASTPATH_BATCH_SIZE 64
/*
* RI_FastPathEntry
- * Per-constraint cache of resources needed by ri_FastPathFlushBatch().
+ * Per-constraint cache of resources needed by ri_FastPathBatchFlush().
*
* One entry per constraint, keyed by pg_constraint OID. Created lazily
* by ri_FastPathGetEntry() on first use within a trigger-firing batch
@@ -218,6 +228,8 @@ typedef struct RI_FastPathEntry
TupleTableSlot *pk_slot;
TupleTableSlot *fk_slot;
Snapshot snapshot; /* registered snapshot for the scan */
+ MemoryContext scan_cxt; /* index scan allocations */
+ MemoryContext flush_cxt; /* short-lived context for per-flush work */
HeapTuple batch[RI_FASTPATH_BATCH_SIZE];
int batch_count;
@@ -430,7 +442,7 @@ RI_FKey_check(TriggerData *trigdata)
* the SPI path below but avoids the per-row executor overhead.
*
* ri_FastPathBatchAdd() and ri_FastPathCheck() report the violation
- * themselves if no matching PK row is found, so it only returns on
+ * themselves if no matching PK row is found, so they only return on
* success.
*/
if (ri_fastpath_is_applicable(riinfo))
@@ -2849,7 +2861,7 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
TupleTableSlot *fk_slot = fpentry->fk_slot;
Oid saved_userid;
int saved_sec_context;
- MemoryContext oldcxt;
+ MemoryContext oldcxt = CurrentMemoryContext;
if (fpentry->batch_count == 0)
return;
@@ -2859,7 +2871,6 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
fk_rel, idx_rel);
Assert(riinfo->fpmeta);
-
/*
* CCI and security context switch are done once for the entire batch.
* Per-row CCI is unnecessary because by the time a flush runs, all
@@ -2880,12 +2891,6 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
- /*
- * The cached scandesc lives in TopTransactionContext, but the index AMs
- * might defer some allocations to the first index_getnext_slot call.
- * Ensure those land in TopTransactionContext too.
- */
- oldcxt = MemoryContextSwitchTo(TopTransactionContext);
if (riinfo->nkeys == 1)
ri_FastPathFlushArray(fpentry, fk_slot, riinfo, fk_rel);
else
@@ -2925,8 +2930,16 @@ ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ /*
+ * build_index_scankeys() may palloc cast results for cross-type FKs.
+ * Use the entry's short-lived flush context so these don't accumulate
+ * across batches.
+ */
+ MemoryContextSwitchTo(fpentry->flush_cxt);
ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+ MemoryContextSwitchTo(fpentry->scan_cxt);
+
found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, pk_slot,
snapshot, riinfo, skey, riinfo->nkeys);
@@ -2935,6 +2948,7 @@ ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
fk_slot, NULL,
RI_PLAN_CHECK_LOOKUPPK, false, false);
}
+ MemoryContextReset(fpentry->flush_cxt);
}
/*
@@ -2974,6 +2988,13 @@ ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
memset(matched, 0, nvals * sizeof(bool));
+ /*
+ * Transient per-flush allocations (cast results, the search array) must
+ * not accumulate across repeated flushes. Use the entry's short-lived
+ * flush context, reset after each flush.
+ */
+ MemoryContextSwitchTo(fpentry->flush_cxt);
+
/*
* Extract FK values, casting to the operator's expected input
* type if needed (e.g. int8 FK -> int4 for int48eq).
@@ -3022,6 +3043,14 @@ ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
fpmeta->regops[0],
PointerGetDatum(arr));
+ /*
+ * Switch to scan_cxt for the index scan: index AMs may defer internal
+ * allocations (e.g. _bt_preprocess_keys) to the first index_getnext_slot()
+ * call. Those must survive across rescans within a batch; scan_cxt is
+ * deleted in teardown, cleaning them up when the batch ends.
+ */
+ MemoryContextSwitchTo(fpentry->scan_cxt);
+
index_rescan(scandesc, skey, 1, NULL, 0);
/*
@@ -3086,8 +3115,9 @@ ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
}
}
- pfree(arr);
+ MemoryContextReset(fpentry->flush_cxt);
}
+
/*
* ri_FastPathProbeOne
* Probe the PK index for one set of scan keys, lock the matching
@@ -3096,9 +3126,10 @@ ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
* Returns true if a matching PK row was found, locked, and (if
* applicable) visible to the transaction snapshot.
*
- * The caller must ensure CurrentMemoryContext is long-lived enough
- * for the scan descriptor's internal allocations (typically
- * TopTransactionContext when using a cached scandesc).
+ * When using a cached scandesc (from the batch path), the caller must switch
+ * to the entry's scan_cxt before calling so that index AM allocations during
+ * index_getnext_slot() survive across rescans. ri_FastPathCheck uses a
+ * one-shot scan and ends it immediately, so no such switch is needed.
*/
static bool
ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
@@ -4087,6 +4118,8 @@ ri_FastPathTeardown(void)
ExecDropSingleTupleTableSlot(entry->fk_slot);
if (entry->snapshot)
UnregisterSnapshot(entry->snapshot);
+ if (entry->scan_cxt)
+ MemoryContextDelete(entry->scan_cxt);
}
hash_destroy(ri_fastpath_cache);
@@ -4213,6 +4246,13 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
entry->snapshot, NULL,
riinfo->nkeys, 0);
+ entry->scan_cxt = AllocSetContextCreate(TopTransactionContext,
+ "RI fast path scan context",
+ ALLOCSET_DEFAULT_SIZES);
+ entry->flush_cxt = AllocSetContextCreate(entry->scan_cxt,
+ "RI fast path flush temporary context",
+ ALLOCSET_SMALL_SIZES);
+
MemoryContextSwitchTo(oldcxt);
/* Ensure cleanup at end of this trigger-firing batch */
--
2.41.0
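
The two-context arrangement described in the commit message above (a longer-lived scan_cxt with a child flush_cxt that is reset after each flush) can be pictured with a toy bump allocator. This is only a sketch under stated assumptions: Arena, arena_create, arena_alloc, arena_reset and arena_delete are hypothetical names, not PostgreSQL's MemoryContext API, and unlike MemoryContextReset() this arena_reset does not touch children.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy arena: a fixed buffer with a bump pointer and a list of children */
typedef struct Arena
{
    struct Arena *first_child;
    struct Arena *next_sibling;
    char       *buf;
    size_t      cap;
    size_t      used;
} Arena;

static Arena *
arena_create(Arena *parent, size_t cap)
{
    Arena      *a = calloc(1, sizeof(Arena));

    if (a == NULL)
        return NULL;
    a->buf = malloc(cap);
    if (a->buf == NULL)
    {
        free(a);
        return NULL;
    }
    a->cap = cap;
    if (parent != NULL)
    {
        /* Link under the parent so deleting the parent frees this too */
        a->next_sibling = parent->first_child;
        parent->first_child = a;
    }
    return a;
}

static void *
arena_alloc(Arena *a, size_t n)
{
    size_t      aligned = (n + 7) & ~(size_t) 7;    /* round up to 8 bytes */
    void       *p;

    if (a->used + aligned > a->cap)
        return NULL;            /* toy arena: no growth */
    p = a->buf + a->used;
    a->used += aligned;
    return p;
}

/* Drop everything allocated in this arena but keep the arena itself */
static void
arena_reset(Arena *a)
{
    a->used = 0;
}

/* Delete an arena and all of its children; call only on a root arena here */
static void
arena_delete(Arena *a)
{
    Arena      *child = a->first_child;

    while (child != NULL)
    {
        Arena      *next = child->next_sibling;

        arena_delete(child);
        child = next;
    }
    free(a->buf);
    free(a);
}
```

In this picture, per-flush scratch data goes into the child arena and is discarded with arena_reset() after every flush, while allocations that must survive across rescans stay in the parent until arena_delete() runs at teardown and frees both.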
[application/octet-stream] v7-0003-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pat.patch (24.7K, 3-v7-0003-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pat.patch)
From 4d2359770bcd8c013b2d119f8eac0733f6f17075 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Feb 2026 21:25:14 +0900
Subject: [PATCH v7 3/4] Batch FK rows and use SK_SEARCHARRAY for fast-path
probes
Instead of probing the PK index on each trigger invocation, buffer
FK rows in the per-constraint cache entry (RI_FastPathEntry) and
flush them as a batch. When the buffer fills (64 rows) or the
trigger-firing cycle ends, ri_FastPathBatchFlush() probes the index
for all buffered rows, sharing a single CommandCounterIncrement and
security context switch across the batch.
For single-column foreign keys, the flush builds an ArrayType from
the buffered FK values (casting to the PK-side type if needed) and
constructs a scan key with the SK_SEARCHARRAY flag. The index AM
sorts and deduplicates the array internally, then walks matching
leaf pages in one ordered traversal instead of descending from the
root once per row. A matched[] bitmap tracks which batch items
were satisfied; unmatched items are reported as violations.
Multi-column foreign keys fall back to a per-row probe loop via
ri_FastPathFlushLoop().
FK tuples are materialized via ExecCopySlotHeapTuple() into
TopTransactionContext so they survive across trigger invocations.
Violations are reported immediately during the flush via
ri_ReportViolation(), which does not return.
ri_FastPathEndBatch() flushes any partial batch before tearing down
cached resources. Since the FK relation may already be closed by
flush time (e.g. for deferred constraints at COMMIT), reopens the
relation using entry->riinfo->fk_relid if needed.
The non-cached path (ALTER TABLE validation) bypasses batching and
continues to call ri_FastPathCheck() directly per row.
---
src/backend/utils/adt/ri_triggers.c | 391 +++++++++++++++++++---
src/test/regress/expected/foreign_key.out | 40 +++
src/test/regress/sql/foreign_key.sql | 38 +++
3 files changed, 421 insertions(+), 48 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 467418cadc0..cc085120c79 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -196,13 +196,18 @@ typedef struct RI_CompareHashEntry
FmgrInfo cast_func_finfo; /* in case we must coerce input */
} RI_CompareHashEntry;
+#define RI_FASTPATH_BATCH_SIZE 64
+
/*
* RI_FastPathEntry
- * Per-constraint cache of resources needed by ri_FastPathCheckCached().
+ * Per-constraint cache of resources needed by ri_FastPathFlushBatch().
*
* One entry per constraint, keyed by pg_constraint OID. Created lazily
* by ri_FastPathGetEntry() on first use within a trigger-firing batch
* and torn down by ri_FastPathTeardown() at batch end.
+ *
+ * FK tuples are buffered in batch[] across trigger invocations and
+ * flushed when the buffer fills or the batch ends.
*/
typedef struct RI_FastPathEntry
{
@@ -210,8 +215,15 @@ typedef struct RI_FastPathEntry
Relation pk_rel;
Relation idx_rel;
IndexScanDesc scandesc;
- TupleTableSlot *slot;
+ TupleTableSlot *pk_slot;
+ TupleTableSlot *fk_slot;
Snapshot snapshot; /* registered snapshot for the scan */
+
+ HeapTuple batch[RI_FASTPATH_BATCH_SIZE];
+ int batch_count;
+
+ /* For ri_FastPathEndBatch() */
+ const RI_ConstraintInfo *riinfo;
} RI_FastPathEntry;
/*
@@ -274,8 +286,14 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
bool detectNewRows, int expect_OK);
static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot);
-static void ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
- Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel);
+static void ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel);
+static void ri_FastPathBatchFlush(RI_FastPathEntry *fpentry,
+ Relation fk_rel);
static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
IndexScanDesc scandesc, TupleTableSlot *slot,
Snapshot snapshot, const RI_ConstraintInfo *riinfo,
@@ -300,8 +318,8 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
int queryno, bool is_restrict, bool partgone);
static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo,
Relation fk_rel);
-static void ri_FastPathTeardown(void *arg);
-
+static void ri_FastPathEndBatch(void *arg);
+static void ri_FastPathTeardown(void);
/*
* RI_FKey_check -
@@ -411,16 +429,22 @@ RI_FKey_check(TriggerData *trigdata)
* index scan + tuple lock. This is semantically equivalent to
* the SPI path below but avoids the per-row executor overhead.
*
- * ri_FastPathCheckCached and ri_FastPathCheck() reports the violation
+ * ri_FastPathBatchAdd() and ri_FastPathCheck() report the violation
* themselves if no matching PK row is found, so it only returns on
* success.
*/
if (ri_fastpath_is_applicable(riinfo))
{
if (AfterTriggerBatchIsActive())
- ri_FastPathCheckCached(riinfo, fk_rel, newslot);
+ {
+ /* Batched path: buffer and probe in groups */
+ ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
+ }
else
+ {
+ /* ALTER TABLE validation: per-row, no cache */
ri_FastPathCheck(riinfo, fk_rel, newslot);
+ }
return PointerGetDatum(NULL);
}
@@ -2703,10 +2727,14 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/*
* ri_FastPathCheck
- * Perform FK existence check via direct index probe, bypassing SPI.
+ * Perform per row FK existence check via direct index probe,
+ * bypassing SPI.
*
* If no matching PK row exists, report the violation via ri_ReportViolation(),
* otherwise, the function returns normally.
+ *
+ * Note: This is only used by the ALTER TABLE validation path. Other paths use
+ * ri_FastPathBatchAdd().
*/
static void
ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
@@ -2771,72 +2799,295 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
}
/*
- * ri_FastPathCheckCached
- * Cached-resource variant of ri_FastPathCheck for use within the
- * after-trigger framework.
+ * ri_FastPathBatchAdd
+ * Buffer a FK row for batched probing.
+ *
+ * Adds the row to the batch buffer. When the buffer is full, flushes all
+ * buffered rows by probing the PK index. Any violation is reported
+ * immediately during the flush via ri_ReportViolation (which does not return).
*
* Uses the per-batch cache (RI_FastPathEntry) to avoid per-row relation
* open/close, scan begin/end, and snapshot registration. The snapshot's
- * curcid is patched each call so the scan sees effects of prior triggers.
+ * curcid is patched at flush time so the scan sees effects of prior triggers.
*
- * Like ri_FastPathCheck, reports the violation via ri_ReportViolation()
- * if no matching PK row is found.
+ * The batch is also flushed at end of trigger-firing cycle via
+ * ri_FastPathEndBatch().
*/
static void
-ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
- Relation fk_rel, TupleTableSlot *newslot)
+ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
{
RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
+ MemoryContext oldcxt;
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ fpentry->batch[fpentry->batch_count] =
+ ExecCopySlotHeapTuple(newslot);
+ fpentry->batch_count++;
+ MemoryContextSwitchTo(oldcxt);
+
+ if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
+ ri_FastPathBatchFlush(fpentry, fk_rel);
+}
+
+/*
+ * ri_FastPathBatchFlush
+ * Flush all buffered FK rows by probing the PK index.
+ *
+ * Dispatches to ri_FastPathFlushArray() for single-column FKs
+ * (using SK_SEARCHARRAY) or ri_FastPathFlushLoop() for multi-column
+ * FKs (per-row probing). Violations are reported immediately via
+ * ri_ReportViolation(), which does not return.
+ */
+static void
+ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
+{
+ const RI_ConstraintInfo *riinfo = fpentry->riinfo;
Relation pk_rel = fpentry->pk_rel;
Relation idx_rel = fpentry->idx_rel;
- IndexScanDesc scandesc = fpentry->scandesc;
Snapshot snapshot = fpentry->snapshot;
- TupleTableSlot *slot = fpentry->slot;
- Datum pk_vals[INDEX_MAX_KEYS];
- char pk_nulls[INDEX_MAX_KEYS];
- ScanKeyData skey[INDEX_MAX_KEYS];
- bool found;
+ TupleTableSlot *fk_slot = fpentry->fk_slot;
Oid saved_userid;
int saved_sec_context;
MemoryContext oldcxt;
- /*
- * Advance the command counter and patch the cached snapshot's curcid so
- * the scan sees PK rows inserted by earlier triggers in this statement.
- */
- CommandCounterIncrement();
- fpentry->snapshot->curcid = GetCurrentCommandId(false);
+ if (fpentry->batch_count == 0)
+ return;
if (riinfo->fpmeta == NULL)
ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
fk_rel, idx_rel);
Assert(riinfo->fpmeta);
+
+ /*
+ * CCI and security context switch are done once for the entire batch.
+ * Per-row CCI is unnecessary because by the time a flush runs, all
+ * AFTER triggers for the buffered rows have already fired (trigger
+ * invocations strictly alternate per row), so a single CCI advances
+ * past all their effects. Per-row security context switch is
+ * unnecessary because each row's probe runs entirely as the PK table
+ * owner, same as the SPI path -- the only difference is that the SPI
+ * path sets and restores the context per row whereas we do it once
+ * around the whole batch.
+ */
+ CommandCounterIncrement();
+ snapshot->curcid = GetCurrentCommandId(false);
+
GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
saved_sec_context |
SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
- ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
- build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
-
/*
- * The cached scandesc lives in TopTransactionContext, but the btree AM
- * defers some allocations to the first index_getnext_slot call. Ensure
- * those land in TopTransactionContext too.
+ * The cached scandesc lives in TopTransactionContext, but the index AMs
+ * might defer some allocations to the first index_getnext_slot call.
+ * Ensure those land in TopTransactionContext too.
*/
oldcxt = MemoryContextSwitchTo(TopTransactionContext);
- found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, slot, snapshot,
- riinfo, skey, riinfo->nkeys);
+ if (riinfo->nkeys == 1)
+ ri_FastPathFlushArray(fpentry, fk_slot, riinfo, fk_rel);
+ else
+ ri_FastPathFlushLoop(fpentry, fk_slot, riinfo, fk_rel);
MemoryContextSwitchTo(oldcxt);
SetUserIdAndSecContext(saved_userid, saved_sec_context);
- if (!found)
- ri_ReportViolation(riinfo, pk_rel, fk_rel, newslot, NULL,
- RI_PLAN_CHECK_LOOKUPPK, false, false);
+ /* Free materialized tuples and reset */
+ for (int i = 0; i < fpentry->batch_count; i++)
+ heap_freetuple(fpentry->batch[i]);
+ fpentry->batch_count = 0;
}
+/*
+ * ri_FastPathFlushLoop
+ * Multi-column fallback: probe the index once per buffered row.
+ *
+ * Used for composite foreign keys where SK_SEARCHARRAY does not
+ * apply.
+ */
+static void
+ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ IndexScanDesc scandesc = fpentry->scandesc;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Snapshot snapshot = fpentry->snapshot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+
+ for (int i = 0; i < fpentry->batch_count; i++)
+ {
+ bool found = false;
+
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, pk_slot,
+ snapshot, riinfo, skey, riinfo->nkeys);
+
+ if (!found)
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ fk_slot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
+}
+
+/*
+ * ri_FastPathFlushArray
+ * Single-column fast path using SK_SEARCHARRAY.
+ *
+ * Builds an array of FK values and does one index scan with
+ * SK_SEARCHARRAY. The index AM sorts and deduplicates the array
+ * internally, then walks matching leaf pages in order. Each
+ * matched PK tuple is locked and rechecked as before; a matched[]
+ * bitmap tracks which batch items were satisfied.
+ */
+static void
+ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+ FastPathMeta *fpmeta = riinfo->fpmeta;
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ IndexScanDesc scandesc = fpentry->scandesc;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Snapshot snapshot = fpentry->snapshot;
+ Datum search_vals[RI_FASTPATH_BATCH_SIZE];
+ bool matched[RI_FASTPATH_BATCH_SIZE];
+ int nvals = fpentry->batch_count;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[1];
+ RI_CompareHashEntry *entry;
+ Oid elem_type;
+ int16 elem_len;
+ bool elem_byval;
+ char elem_align;
+ ArrayType *arr;
+
+ Assert(fpmeta);
+
+ memset(matched, 0, nvals * sizeof(bool));
+
+ /*
+ * Extract FK values, casting to the operator's expected input
+ * type if needed (e.g. int8 FK -> int4 for int48eq).
+ */
+ entry = fpmeta->compare_entries[0];
+ for (int i = 0; i < nvals; i++)
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+
+ /* Cast if needed (e.g. int8 FK -> numeric PK) */
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ search_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[0],
+ Int32GetDatum(-1),
+ BoolGetDatum(false));
+ else
+ search_vals[i] = pk_vals[0];
+ }
+
+ /*
+ * Array element type must match the operator's right-hand input
+ * type, which is what the index comparison expects on the search
+ * side. ri_populate_fastpath_metadata() stores exactly this via
+ * get_op_opfamily_properties(), which returns the operator's
+ * right-hand type as the subtype for cross-type operators (e.g.
+ * int8 for int48eq) and the common type for same-type operators.
+ */
+ elem_type = fpmeta->subtypes[0];
+ Assert(OidIsValid(elem_type));
+ get_typlenbyvalalign(elem_type, &elem_len, &elem_byval, &elem_align);
+
+ arr = construct_array(search_vals, nvals,
+ elem_type, elem_len, elem_byval, elem_align);
+
+ /*
+ * Build scan key with SK_SEARCHARRAY. The index AM code will
+ * internally sort and deduplicate, then walk leaf pages in order.
+ */
+ ScanKeyEntryInitialize(&skey[0],
+ SK_SEARCHARRAY,
+ 1, /* attno */
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ PointerGetDatum(arr));
+
+ index_rescan(scandesc, skey, 1, NULL, 0);
+
+ /*
+ * Walk all matches. The index AM returns them in index order. For each
+ * match, find which batch item(s) it satisfies.
+ */
+ while (index_getnext_slot(scandesc, ForwardScanDirection, pk_slot))
+ {
+ Datum found_val;
+ bool found_null;
+ bool concurrently_updated;
+ ScanKeyData recheck_skey[1];
+
+ if (!ri_LockPKTuple(pk_rel, pk_slot, snapshot, &concurrently_updated))
+ continue;
+
+ /* Extract the PK value from the matched and locked tuple */
+ found_val = slot_getattr(pk_slot, riinfo->pk_attnums[0], &found_null);
+ Assert(!found_null);
+
+ if (concurrently_updated)
+ {
+ /*
+ * Build a single-key scankey for recheck. We need the
+ * actual PK value that was found, not the FK search value.
+ */
+ ScanKeyEntryInitialize(&recheck_skey[0], 0, 1,
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ found_val);
+ if (!recheck_matched_pk_tuple(idx_rel, recheck_skey, pk_slot))
+ continue;
+ }
+
+ /*
+ * Linear scan to mark all batch items matching this PK value.
+ * O(batch_size) per match, O(batch_size^2) worst case -- fine
+ * for the current batch size of 64.
+ */
+ for (int i = 0; i < nvals; i++)
+ {
+ if (!matched[i] &&
+ DatumGetBool(FunctionCall2Coll(&entry->eq_opr_finfo,
+ idx_rel->rd_indcollation[0],
+ found_val,
+ search_vals[i])))
+ matched[i] = true;
+ }
+ }
+
+ /* Report first unmatched row */
+ for (int i = 0; i < nvals; i++)
+ {
+ if (!matched[i])
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ fk_slot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
+ }
+
+ pfree(arr);
+}
/*
* ri_FastPathProbeOne
* Probe the PK index for one set of scan keys, lock the matching
@@ -3768,14 +4019,51 @@ RI_FKey_trigger_type(Oid tgfoid)
return RI_TRIGGER_NONE;
}
+/*
+ * ri_FastPathEndBatch
+ * Flush remaining rows and tear down cached state.
+ *
+ * Registered as an AfterTriggerBatchCallback. Note: the flush can
+ * do real work (CCI, security context switch, index probes) and can
+ * throw ERROR on a constraint violation. If that happens,
+ * ri_FastPathTeardown never runs; ResourceOwner + XactCallback
+ * handle resource cleanup on the abort path.
+ */
+static void
+ri_FastPathEndBatch(void *arg)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ /* Flush any partial batches -- can throw ERROR */
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->batch_count > 0)
+ {
+ Relation fk_rel = table_open(entry->riinfo->fk_relid,
+ AccessShareLock);
+
+ ri_FastPathBatchFlush(entry, fk_rel);
+ table_close(fk_rel, NoLock);
+ }
+ }
+
+ /* Orderly teardown */
+ ri_FastPathTeardown();
+}
+
/*
* ri_FastPathTeardown
- * Tear down all cached fast-path state.
+ * Release all cached resources (scans, relations, snapshots).
*
- * Called as an AfterTriggerBatchCallback at end of batch.
+ * Called from ri_FastPathEndBatch() after flushing any remaining rows.
*/
static void
-ri_FastPathTeardown(void *arg)
+ri_FastPathTeardown(void)
{
HASH_SEQ_STATUS status;
RI_FastPathEntry *entry;
@@ -3793,8 +4081,10 @@ ri_FastPathTeardown(void *arg)
index_close(entry->idx_rel, NoLock);
if (entry->pk_rel)
table_close(entry->pk_rel, NoLock);
- if (entry->slot)
- ExecDropSingleTupleTableSlot(entry->slot);
+ if (entry->pk_slot)
+ ExecDropSingleTupleTableSlot(entry->pk_slot);
+ if (entry->fk_slot)
+ ExecDropSingleTupleTableSlot(entry->fk_slot);
if (entry->snapshot)
UnregisterSnapshot(entry->snapshot);
}
@@ -3910,12 +4200,14 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
/*
* Register an initial snapshot. Its curcid will be patched in place
- * on each subsequent row (see ri_FastPathCheckCached()), avoiding
+ * on each subsequent row (see ri_FastPathBatchFlush()), avoiding
* per-row GetSnapshotData() overhead.
*/
entry->snapshot = RegisterSnapshot(GetTransactionSnapshot());
- entry->slot = table_slot_create(entry->pk_rel, NULL);
+ entry->pk_slot = table_slot_create(entry->pk_rel, NULL);
+ entry->fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
+ &TTSOpsHeapTuple);
entry->scandesc = index_beginscan(entry->pk_rel, entry->idx_rel,
entry->snapshot, NULL,
@@ -3926,7 +4218,7 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
/* Ensure cleanup at end of this trigger-firing batch */
if (!ri_fastpath_callback_registered)
{
- RegisterAfterTriggerBatchCallback(ri_FastPathTeardown, NULL);
+ RegisterAfterTriggerBatchCallback(ri_FastPathEndBatch, NULL);
ri_fastpath_callback_registered = true;
}
@@ -3937,6 +4229,9 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
SECURITY_NOFORCE_RLS);
ri_CheckPermissions(entry->pk_rel);
SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+ /* For ri_FastPathEndBatch() */
+ entry->riinfo = riinfo;
}
return entry;
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 25d505c6c12..2179d2a8e8f 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3590,3 +3590,43 @@ NOTICE: fp_auto_pk called
NOTICE: fp_auto_pk called
DROP TABLE fp_fk_cci, fp_pk_cci;
DROP FUNCTION fp_auto_pk;
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+ERROR: insert or update on table "fp_fk_multi" violates foreign key constraint "fp_fk_multi_a_b_fkey"
+DETAIL: Key (a, b)=(999, 999) is not present in table "fp_pk_multi".
+DROP TABLE fp_fk_multi, fp_pk_multi;
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+ERROR: insert or update on table "fp_fk_commit" violates foreign key constraint "fp_fk_commit_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_commit".
+DROP TABLE fp_fk_commit, fp_pk_commit;
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+ERROR: insert or update on table "fp_fk_cross" violates foreign key constraint "fp_fk_cross_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_cross".
+DROP TABLE fp_fk_cross, fp_pk_cross;
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index cedd20c8d11..7a729486bc2 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2578,3 +2578,41 @@ INSERT INTO fp_fk_cci VALUES (1), (2), (3);
DROP TABLE fp_fk_cci, fp_pk_cci;
DROP FUNCTION fp_auto_pk;
+
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+DROP TABLE fp_fk_multi, fp_pk_multi;
+
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+DROP TABLE fp_fk_commit, fp_pk_commit;
+
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+DROP TABLE fp_fk_cross, fp_pk_cross;
+
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
--
2.41.0
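
A note on the duplicate-value test above: the marking loop in the batched path has to flag every pending row that equals a returned PK value, because the index AM deduplicates the search array and so returns each matching PK tuple only once. The following is a minimal standalone sketch of that idea, not the patch's code. Plain ints stand in for Datums and for the equality operator call, and the names mark_matches and first_unmatched are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the marking loop: flag every pending batch
 * item that equals a PK value returned by the index scan.  Items already
 * marked are skipped, as in the patch.
 */
void
mark_matches(const int *search_vals, bool *matched, int nvals, int found_val)
{
	assert(nvals >= 0);

	for (int i = 0; i < nvals; i++)
	{
		if (!matched[i] && search_vals[i] == found_val)
			matched[i] = true;
	}
}

/* Return the index of the first unmatched item, or -1 if all matched. */
int
first_unmatched(const bool *matched, int nvals)
{
	assert(nvals >= 0);

	for (int i = 0; i < nvals; i++)
	{
		if (!matched[i])
			return i;
	}
	return -1;
}
```

With a batch of {1, 1, 999, 1} and a single PK match on 1, all three rows holding 1 are marked and index 2 is the row that would be reported as a violation.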
[application/octet-stream] v7-0002-Cache-per-batch-resources-for-fast-path-foreign-k.patch (28.2K, 4-v7-0002-Cache-per-batch-resources-for-fast-path-foreign-k.patch)
From f2a24b790cdfb2e5d2fbbc0f1fb1e5630f3a2c8c Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Mon, 16 Mar 2026 20:57:35 +0900
Subject: [PATCH v7 2/4] Cache per-batch resources for fast-path foreign key
checks
The fast-path FK check introduced in the previous commit opens and
closes the PK relation, index, scan descriptor, and tuple slot on
every trigger invocation. For bulk operations that fire thousands of
FK triggers in a single statement, this repeated setup/teardown
dominates the cost.
Introduce RI_FastPathEntry, the entry type of a per-constraint hash
table that caches the open Relations (pk_rel, idx_rel), IndexScanDesc,
TupleTableSlot, and a registered Snapshot across all trigger
invocations within a single trigger-firing batch. Entries are created
lazily on first use via ri_FastPathGetEntry() and persist until the
batch ends.
The snapshot is registered once at entry creation time, and its
curcid is patched in place on each subsequent row rather than
taking a fresh snapshot per invocation. This avoids the per-row
GetSnapshotData() cost. Under REPEATABLE READ the transaction
snapshot is immutable, so caching is a no-op. Under READ COMMITTED
the cached snapshot will not reflect PK rows committed by other
backends mid-batch, but this is acceptable for three reasons: the FK
check only needs PK rows visible before the statement began plus the
effects of earlier triggers (tracked by curcid); concurrent commits
would not be reliably visible even with per-row snapshots, since
trigger firing order is nondeterministic; and LockTupleKeyShare
prevents the PK row from disappearing regardless.
SnapshotSetCommandId() only patches the process-global statics, not
registered copies, so we patch entry->snapshot->curcid directly.
Permission checks (schema USAGE + table SELECT) are performed once at
cache entry creation rather than per flush. The RI check runs as the
PK table owner (via SetUserIdAndSecContext), so in practice these
checks verify that the owner has access to their own table -- a
condition that holds unless privileges have been explicitly revoked
from the owner, which would equally break the SPI path. Checking
once per batch avoids repeated syscache lookups from
pg_class_aclcheck() with no user-visible behavior change.
Lifecycle management:
- AfterTriggerBatchCallback: A new general-purpose callback
mechanism in trigger.c. Callbacks registered via
RegisterAfterTriggerBatchCallback() fire at the end of each
trigger-firing batch (AfterTriggerEndQuery for immediate
constraints, AfterTriggerFireDeferred at COMMIT, and
AfterTriggerSetState for SET CONSTRAINTS IMMEDIATE). The RI code
registers ri_FastPathTeardown as a batch callback, which does
orderly teardown: index_endscan, index_close, table_close,
ExecDropSingleTupleTableSlot, UnregisterSnapshot.
- XactCallback: ri_FastPathXactCallback NULLs the static cache
pointer at transaction end. On the normal path, cleanup already
ran via the batch callback; this handles the abort path where
TopTransactionContext destruction frees the memory but
ResourceOwner handles the actual resource cleanup.
- SubXactCallback: ri_FastPathSubXactCallback NULLs the static
cache pointer on subtransaction abort. ResourceOwner already
cleaned up the resources; this prevents the batch callback from
trying to double-close them.
- AfterTriggerBatchIsActive(): Exported accessor that returns true
when afterTriggers.query_depth >= 0. During ALTER TABLE ... ADD
FOREIGN KEY validation, RI triggers are called directly outside
the after-trigger framework, so batch callbacks would never fire.
The fast-path code uses this to fall back to a non-cached
per-invocation path (open/scan/close each call) in that context.
---
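
Reviewer-style note, not part of the commit message: the batch-callback lifecycle described above can be modelled outside the backend. The sketch below is a simplified standalone illustration under stated assumptions. It uses a fixed-size array instead of a List in TopTransactionContext, passes query_depth as a parameter instead of reading afterTriggers, and all names (register_batch_callback, fire_batch_callbacks, count_call) are hypothetical. It shows the two properties the patch relies on: callbacks do not fire from nested query levels, and the list is cleared after firing so callers must re-register for the next batch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BATCH_CALLBACKS 8

typedef void (*batch_callback) (void *arg);

typedef struct
{
	batch_callback callback;
	void	   *arg;
} callback_item;

static callback_item items[MAX_BATCH_CALLBACKS];
static int	nitems = 0;

/* Remember a callback for the end of the current batch. */
void
register_batch_callback(batch_callback callback, void *arg)
{
	assert(nitems < MAX_BATCH_CALLBACKS);
	items[nitems].callback = callback;
	items[nitems].arg = arg;
	nitems++;
}

/*
 * Invoke and clear all registered callbacks, unless we are in a nested
 * query (depth > 0) whose outer batch still needs the cached state.
 * Returns the number of callbacks invoked.
 */
int
fire_batch_callbacks(int query_depth)
{
	int			fired = 0;

	if (query_depth > 0)
		return 0;

	for (int i = 0; i < nitems; i++)
	{
		items[i].callback(items[i].arg);
		fired++;
	}
	nitems = 0;					/* callers must re-register per batch */
	return fired;
}

/* Example callback: bump the counter that arg points to. */
void
count_call(void *arg)
{
	(*(int *) arg)++;
}
```

Registering count_call once and then firing at depth 1 invokes nothing; firing at depth 0 invokes it once, and a second fire at depth 0 is a no-op because the list was cleared.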
src/backend/commands/trigger.c | 90 +++++++
src/backend/utils/adt/ri_triggers.c | 275 +++++++++++++++++++++-
src/include/commands/trigger.h | 18 ++
src/test/regress/expected/foreign_key.out | 86 +++++++
src/test/regress/sql/foreign_key.sql | 80 +++++++
src/tools/pgindent/typedefs.list | 3 +
6 files changed, 549 insertions(+), 3 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 9c0438a125a..c828396843b 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3893,6 +3893,8 @@ typedef struct AfterTriggersData
/* per-subtransaction-level data: */
AfterTriggersTransData *trans_stack; /* array of structs shown below */
int maxtransdepth; /* allocated len of above array */
+
+ List *batch_callbacks; /* List of AfterTriggerCallbackItem */
} AfterTriggersData;
struct AfterTriggersQueryData
@@ -3929,6 +3931,13 @@ struct AfterTriggersTableData
TupleTableSlot *storeslot; /* for converting to tuplestore's format */
};
+/* Entry in afterTriggers.batch_callbacks */
+typedef struct AfterTriggerCallbackItem
+{
+ AfterTriggerBatchCallback callback;
+ void *arg;
+} AfterTriggerCallbackItem;
+
static AfterTriggersData afterTriggers;
static void AfterTriggerExecute(EState *estate,
@@ -3964,6 +3973,7 @@ static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
Oid tgoid, bool tgisdeferred);
static void cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent);
+static void FireAfterTriggerBatchCallbacks(void);
/*
* Get the FDW tuplestore for the current trigger query level, creating it
@@ -5089,6 +5099,7 @@ AfterTriggerBeginXact(void)
*/
afterTriggers.firing_counter = (CommandId) 1; /* mustn't be 0 */
afterTriggers.query_depth = -1;
+ afterTriggers.batch_callbacks = NIL;
/*
* Verify that there is no leftover state remaining. If these assertions
@@ -5210,6 +5221,8 @@ AfterTriggerEndQuery(EState *estate)
break;
}
+ FireAfterTriggerBatchCallbacks();
+
/* Release query-level-local storage, including tuplestores if any */
AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
@@ -5317,6 +5330,8 @@ AfterTriggerFireDeferred(void)
break; /* all fired */
}
+ FireAfterTriggerBatchCallbacks();
+
/*
* We don't bother freeing the event list, since it will go away anyway
* (and more efficiently than via pfree) in AfterTriggerEndXact.
@@ -6059,6 +6074,8 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
break; /* all fired */
}
+ FireAfterTriggerBatchCallbacks();
+
if (snapshot_set)
PopActiveSnapshot();
}
@@ -6755,3 +6772,76 @@ check_modified_virtual_generated(TupleDesc tupdesc, HeapTuple tuple)
return tuple;
}
+
+/*
+ * RegisterAfterTriggerBatchCallback
+ * Register a function to be called when the current trigger-firing
+ * batch completes.
+ *
+ * Must be called from within a trigger function's execution context
+ * (i.e., while afterTriggers state is active).
+ *
+ * The callback list is cleared after invocation, so the caller must
+ * re-register for each new batch if needed.
+ */
+void
+RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg)
+{
+ AfterTriggerCallbackItem *item;
+ MemoryContext oldcxt;
+
+ /*
+ * Allocate in TopTransactionContext so the item survives for the duration
+ * of the batch, which may span multiple trigger invocations.
+ */
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ item = palloc(sizeof(AfterTriggerCallbackItem));
+ item->callback = callback;
+ item->arg = arg;
+ afterTriggers.batch_callbacks =
+ lappend(afterTriggers.batch_callbacks, item);
+ MemoryContextSwitchTo(oldcxt);
+}
+
+/*
+ * FireAfterTriggerBatchCallbacks
+ * Invoke and clear all registered batch callbacks.
+ *
+ * Only fires at the outermost query level (query_depth == 0) or from
+ * top-level operations (query_depth == -1, e.g. AfterTriggerFireDeferred
+ * at COMMIT). Nested queries from SPI inside AFTER triggers run at
+ * depth > 0 and must not tear down resources the outer batch still needs.
+ */
+static void
+FireAfterTriggerBatchCallbacks(void)
+{
+ ListCell *lc;
+
+ if (afterTriggers.query_depth > 0)
+ return;
+
+ foreach(lc, afterTriggers.batch_callbacks)
+ {
+ AfterTriggerCallbackItem *item = lfirst(lc);
+
+ item->callback(item->arg);
+ }
+
+ list_free_deep(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = NIL;
+}
+
+/*
+ * AfterTriggerBatchIsActive
+ * Returns true if we're inside a query-level trigger batch where
+ * registered batch callbacks will actually be invoked.
+ *
+ * This is false during validateForeignKeyConstraint(), which calls
+ * RI trigger functions directly outside the after-trigger framework.
+ */
+bool
+AfterTriggerBatchIsActive(void)
+{
+ return afterTriggers.query_depth >= 0;
+}
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 2357735c4c8..467418cadc0 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -196,6 +196,23 @@ typedef struct RI_CompareHashEntry
FmgrInfo cast_func_finfo; /* in case we must coerce input */
} RI_CompareHashEntry;
+/*
+ * RI_FastPathEntry
+ * Per-constraint cache of resources needed by ri_FastPathCheckCached().
+ *
+ * One entry per constraint, keyed by pg_constraint OID. Created lazily
+ * by ri_FastPathGetEntry() on first use within a trigger-firing batch
+ * and torn down by ri_FastPathTeardown() at batch end.
+ */
+typedef struct RI_FastPathEntry
+{
+ Oid conoid; /* hash key: pg_constraint OID */
+ Relation pk_rel;
+ Relation idx_rel;
+ IndexScanDesc scandesc;
+ TupleTableSlot *slot;
+ Snapshot snapshot; /* registered snapshot for the scan */
+} RI_FastPathEntry;
/*
* Local data
@@ -205,6 +222,8 @@ static HTAB *ri_query_cache = NULL;
static HTAB *ri_compare_cache = NULL;
static dclist_head ri_constraint_cache_valid_list;
+static HTAB *ri_fastpath_cache = NULL;
+static bool ri_fastpath_callback_registered = false;
/*
* Local function prototypes
@@ -255,6 +274,8 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
bool detectNewRows, int expect_OK);
static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
IndexScanDesc scandesc, TupleTableSlot *slot,
Snapshot snapshot, const RI_ConstraintInfo *riinfo,
@@ -277,6 +298,9 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool is_restrict, bool partgone);
+static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel);
+static void ri_FastPathTeardown(void *arg);
/*
@@ -387,12 +411,16 @@ RI_FKey_check(TriggerData *trigdata)
* index scan + tuple lock. This is semantically equivalent to
* the SPI path below but avoids the per-row executor overhead.
*
- * ri_FastPathCheck() reports the violation itself (via ereport)
- * if no matching PK row is found, so it only returns on success.
+ * ri_FastPathCheckCached() and ri_FastPathCheck() report the violation
+ * themselves if no matching PK row is found, so they only return on
+ * success.
*/
if (ri_fastpath_is_applicable(riinfo))
{
- ri_FastPathCheck(riinfo, fk_rel, newslot);
+ if (AfterTriggerBatchIsActive())
+ ri_FastPathCheckCached(riinfo, fk_rel, newslot);
+ else
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
return PointerGetDatum(NULL);
}
@@ -2742,6 +2770,73 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
table_close(pk_rel, NoLock);
}
+/*
+ * ri_FastPathCheckCached
+ * Cached-resource variant of ri_FastPathCheck for use within the
+ * after-trigger framework.
+ *
+ * Uses the per-batch cache (RI_FastPathEntry) to avoid per-row relation
+ * open/close, scan begin/end, and snapshot registration. The snapshot's
+ * curcid is patched each call so the scan sees effects of prior triggers.
+ *
+ * Like ri_FastPathCheck, reports the violation via ri_ReportViolation()
+ * if no matching PK row is found.
+ */
+static void
+ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ IndexScanDesc scandesc = fpentry->scandesc;
+ Snapshot snapshot = fpentry->snapshot;
+ TupleTableSlot *slot = fpentry->slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found;
+ Oid saved_userid;
+ int saved_sec_context;
+ MemoryContext oldcxt;
+
+ /*
+ * Advance the command counter and patch the cached snapshot's curcid so
+ * the scan sees PK rows inserted by earlier triggers in this statement.
+ */
+ CommandCounterIncrement();
+ fpentry->snapshot->curcid = GetCurrentCommandId(false);
+
+ if (riinfo->fpmeta == NULL)
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ Assert(riinfo->fpmeta);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+
+ /*
+ * The cached scandesc lives in TopTransactionContext, but the btree AM
+ * defers some allocations to the first index_getnext_slot call. Ensure
+ * those land in TopTransactionContext too.
+ */
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, slot, snapshot,
+ riinfo, skey, riinfo->nkeys);
+ MemoryContextSwitchTo(oldcxt);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+ if (!found)
+ ri_ReportViolation(riinfo, pk_rel, fk_rel, newslot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+}
+
/*
* ri_FastPathProbeOne
* Probe the PK index for one set of scan keys, lock the matching
@@ -3672,3 +3767,177 @@ RI_FKey_trigger_type(Oid tgfoid)
return RI_TRIGGER_NONE;
}
+
+/*
+ * ri_FastPathTeardown
+ * Tear down all cached fast-path state.
+ *
+ * Called as an AfterTriggerBatchCallback at end of batch.
+ */
+static void
+ri_FastPathTeardown(void *arg)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ /* Close the scan before closing idx_rel. */
+ if (entry->scandesc)
+ index_endscan(entry->scandesc);
+ if (entry->idx_rel)
+ index_close(entry->idx_rel, NoLock);
+ if (entry->pk_rel)
+ table_close(entry->pk_rel, NoLock);
+ if (entry->slot)
+ ExecDropSingleTupleTableSlot(entry->slot);
+ if (entry->snapshot)
+ UnregisterSnapshot(entry->snapshot);
+ }
+
+ hash_destroy(ri_fastpath_cache);
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static bool ri_fastpath_xact_callback_registered = false;
+
+static void
+ri_FastPathXactCallback(XactEvent event, void *arg)
+{
+ /*
+ * TopTransactionContext is destroyed at end of transaction, taking the
+ * hash table and all cached resources with it. Just reset our static
+ * pointers so we don't dereference freed memory.
+ *
+ * In the normal (non-error) path, ri_FastPathTeardown already ran via the
+ * batch callback and did orderly teardown. Here we're just handling the
+ * abort path where that callback never fired.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static void
+ri_FastPathSubXactCallback(SubXactEvent event, SubTransactionId mySubid,
+ SubTransactionId parentSubid, void *arg)
+{
+ if (event == SUBXACT_EVENT_ABORT_SUB)
+ {
+ /*
+ * ResourceOwner already cleaned up relations and snapshots. Just
+ * NULL our pointers so the still-registered batch callback becomes a
+ * no-op. The hash table memory in TopTransactionContext will be
+ * freed at transaction end.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+ }
+}
+
+/*
+ * ri_FastPathGetEntry
+ * Look up or create a per-batch cache entry for the given constraint.
+ *
+ * On first call for a constraint within a batch: opens pk_rel and the
+ * index, begins an index scan, allocates a result slot, and registers
+ * the cleanup callback.
+ *
+ * On subsequent calls: returns the existing entry. Caller uses
+ * index_rescan() with new keys.
+ */
+static RI_FastPathEntry *
+ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+ RI_FastPathEntry *entry;
+ bool found;
+
+ /* Create hash table on first use in this batch */
+ if (ri_fastpath_cache == NULL)
+ {
+ HASHCTL ctl;
+
+ if (!ri_fastpath_xact_callback_registered)
+ {
+ RegisterXactCallback(ri_FastPathXactCallback, NULL);
+ RegisterSubXactCallback(ri_FastPathSubXactCallback, NULL);
+ ri_fastpath_xact_callback_registered = true;
+ }
+
+ ctl.keysize = sizeof(Oid);
+ ctl.entrysize = sizeof(RI_FastPathEntry);
+ ctl.hcxt = TopTransactionContext;
+ ri_fastpath_cache = hash_create("RI fast-path cache",
+ 16,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ }
+
+ entry = hash_search(ri_fastpath_cache, &riinfo->constraint_id,
+ HASH_ENTER, &found);
+
+ if (!found)
+ {
+ MemoryContext oldcxt;
+ Oid saved_userid;
+ int saved_sec_context;
+
+ /*
+ * Zero out non-key fields so ri_FastPathTeardown is safe if we error
+ * out during partial initialization below.
+ */
+ memset(((char *) entry) + offsetof(RI_FastPathEntry, pk_rel), 0,
+ sizeof(RI_FastPathEntry) - offsetof(RI_FastPathEntry, pk_rel));
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+
+ /*
+ * Open PK table and its unique index.
+ *
+ * RowShareLock on pk_rel matches what the SPI path's SELECT ... FOR
+ * KEY SHARE would acquire as a relation-level lock. AccessShareLock
+ * on the index is standard for index scans.
+ *
+ * We don't release these locks until end of transaction, matching SPI
+ * behavior.
+ */
+ entry->pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ entry->idx_rel = index_open(riinfo->conindid, AccessShareLock);
+
+ /*
+ * Register an initial snapshot. Its curcid will be patched in place
+ * on each subsequent row (see ri_FastPathCheckCached()), avoiding
+ * per-row GetSnapshotData() overhead.
+ */
+ entry->snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+ entry->slot = table_slot_create(entry->pk_rel, NULL);
+
+ entry->scandesc = index_beginscan(entry->pk_rel, entry->idx_rel,
+ entry->snapshot, NULL,
+ riinfo->nkeys, 0);
+
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Ensure cleanup at end of this trigger-firing batch */
+ if (!ri_fastpath_callback_registered)
+ {
+ RegisterAfterTriggerBatchCallback(ri_FastPathTeardown, NULL);
+ ri_fastpath_callback_registered = true;
+ }
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(entry->pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ ri_CheckPermissions(entry->pk_rel);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ }
+
+ return entry;
+}
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 556c86bf5e1..4304abffc8d 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -289,4 +289,22 @@ extern void RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel,
extern int RI_FKey_trigger_type(Oid tgfoid);
+/*
+ * Callback type for end-of-trigger-batch notifications.
+ *
+ * Registered via RegisterAfterTriggerBatchCallback(). Invoked when
+ * a batch of after-trigger processing completes:
+ * - AfterTriggerEndQuery() (immediate constraints)
+ * - AfterTriggerFireDeferred() (deferred constraints at COMMIT)
+ * - AfterTriggerSetState() (SET CONSTRAINTS IMMEDIATE)
+ *
+ * The callback list is cleared after each batch. Callers must
+ * re-register if they need to be called again in a subsequent batch.
+ */
+typedef void (*AfterTriggerBatchCallback) (void *arg);
+
+extern void RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg);
+extern bool AfterTriggerBatchIsActive(void);
+
#endif /* TRIGGER_H */
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 0826f518004..25d505c6c12 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3504,3 +3504,89 @@ DETAIL: drop cascades to table fkpart13_t1
drop cascades to table fkpart13_t2
drop cascades to table fkpart13_t3
RESET search_path;
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+ERROR: insert or update on table "fp_fk_alter" violates foreign key constraint "fp_fk_alter_a_fkey"
+DETAIL: Key (a)=(101) is not present in table "fp_pk_alter".
+DROP TABLE fp_fk_alter, fp_pk_alter;
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+ERROR: insert or update on table "fp_fk_alter2" violates foreign key constraint "fp_fk_alter2_a_fkey"
+DETAIL: Key (a)=(200) is not present in table "fp_pk_alter2".
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+ERROR: insert or update on table "fp_multi_fk" violates foreign key constraint "fp_multi_fk_b_fkey"
+DETAIL: Key (b)=(2) is not present in table "fp_pk2".
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+ERROR: insert or update on table "fp_fk_defer" violates foreign key constraint "fp_fk_defer_a_fkey"
+DETAIL: Key (a)=(3) is not present in table "fp_pk_defer".
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+ a
+---
+ 1
+ 1
+(2 rows)
+
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index e9ee29331cb..cedd20c8d11 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2498,3 +2498,83 @@ WITH cte AS (
DROP SCHEMA fkpart13 CASCADE;
RESET search_path;
+
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+DROP TABLE fp_fk_alter, fp_pk_alter;
+
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 52f8603a7be..e00c5bf63dd 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -30,6 +30,8 @@ AddForeignUpdateTargets_function
AddrInfo
AffixNode
AffixNodeData
+AfterTriggerBatchCallback
+AfterTriggerCallbackItem
AfterTriggerEvent
AfterTriggerEventChunk
AfterTriggerEventData
@@ -2469,6 +2471,7 @@ RIX
RI_CompareHashEntry
RI_CompareKey
RI_ConstraintInfo
+RI_FastPathEntry
RI_QueryHashEntry
RI_QueryKey
RTEKind
--
2.41.0
[application/octet-stream] v7-0001-Add-fast-path-for-foreign-key-constraint-checks.patch (29.5K, 5-v7-0001-Add-fast-path-for-foreign-key-constraint-checks.patch)
From faad7b6a4e9b96f93066b778d803abeff76e25c3 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sun, 15 Mar 2026 16:53:27 +0900
Subject: [PATCH v7 1/4] Add fast path for foreign key constraint checks
Add a fast-path optimization for foreign key checks that bypasses SPI
by directly probing the unique index on the referenced table.

The fast path applies when the referenced table is not partitioned and
the constraint does not involve temporal semantics. ri_FastPathCheck()
extracts the FK values, builds scan keys, performs an index scan, and
locks the matching tuple with LockTupleKeyShare via ri_LockPKTuple(),
which handles the RI-specific subset of table_tuple_lock() results.

If the locked tuple was reached by chasing an update chain
(tmfd.traversed), recheck_matched_pk_tuple() verifies that the key
is still the same, emulating EvalPlanQual.

The scan uses GetTransactionSnapshot(), matching what the SPI path
uses (via _SPI_execute_plan pushing GetTransactionSnapshot() as the
active snapshot). Under READ COMMITTED this is a fresh snapshot;
under REPEATABLE READ / SERIALIZABLE it is the frozen transaction-
start snapshot, so PK rows committed after the transaction started
are not visible.

The ri_CheckPermissions() function performs schema USAGE and table
SELECT checks, matching what the SPI path does implicitly.

ri_HashCompareOp() is adjusted to handle cross-type equality operators
(e.g. int48eq for int4 PK / int8 FK) which can appear in conpfeqop.
The original code asserted same-type operators only.

Per-key metadata (compare entries, operator procedures, strategy
numbers) is cached in RI_ConstraintInfo via
ri_populate_fastpath_metadata() on first use, eliminating repeated
calls to ri_HashCompareOp() and get_op_opfamily_properties().

conindid and pk_is_partitioned are also cached at constraint load
time, avoiding per-invocation syscache lookups and the need to open
pk_rel before deciding whether the fast path applies.

New regression tests cover RLS bypass and ACL enforcement for the
fast-path permission checks. New isolation tests exercise concurrent
PK updates under both READ COMMITTED and REPEATABLE READ.

Author: Junwang Zhao <[email protected]>
Author: Amit Langote <[email protected]>
Discussion: https://postgr.es/m/
---
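A note on the ri_HashCompareOp() change described in the commit message: the
rule for when the FK value needs no cast can be modelled outside the backend.
The following is only a standalone sketch, not code from the patch. The helper
name fk_cast_not_needed() and the Oid typedef are made up for illustration;
the two type OIDs are the built-in pg_type values for int4 and int8.

```c
#include <stdbool.h>

typedef unsigned int Oid;		/* stand-in for the backend's Oid type */

/* Built-in pg_type OIDs, used here only as sample inputs. */
#define INT4OID 23
#define INT8OID 20

/*
 * Hypothetical helper (not part of the patch): returns true when the FK
 * column value can be handed to the equality operator without a cast.
 *
 * lefttype/righttype are the operator's input types and fk_typeid is the
 * FK column's type.  The condition mirrors the one added to
 * ri_HashCompareOp(): for a same-type operator the FK type must be that
 * common type; for a cross-type operator the FK value is the right
 * operand, so it must match righttype.
 */
bool
fk_cast_not_needed(Oid lefttype, Oid righttype, Oid fk_typeid)
{
	return (lefttype == righttype && fk_typeid == lefttype) ||
		(lefttype != righttype && fk_typeid == righttype);
}
```

With an int4 PK and an int8 FK, the operator takes (int4, int8) and the FK
type is int8, so the helper returns true and no cast is set up. Since
lefttype and righttype coincide in the same-type branch, both branches appear
to reduce to comparing the FK type with righttype; the two-branch form is kept
in the sketch to match the patch's wording.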
src/backend/utils/adt/ri_triggers.c | 464 +++++++++++++++++-
.../expected/fk-concurrent-pk-upd.out | 105 ++++
src/test/isolation/isolation_schedule | 1 +
.../isolation/specs/fk-concurrent-pk-upd.spec | 53 ++
src/test/regress/expected/foreign_key.out | 47 ++
src/test/regress/sql/foreign_key.sql | 64 +++
6 files changed, 722 insertions(+), 12 deletions(-)
create mode 100644 src/test/isolation/expected/fk-concurrent-pk-upd.out
create mode 100644 src/test/isolation/specs/fk-concurrent-pk-upd.spec
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index d22b8ef7f3c..2357735c4c8 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -24,12 +24,15 @@
#include "postgres.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/index.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
+#include "catalog/pg_namespace.h"
#include "commands/trigger.h"
#include "executor/executor.h"
#include "executor/spi.h"
@@ -91,6 +94,7 @@
#define RI_TRIGTYPE_UPDATE 2
#define RI_TRIGTYPE_DELETE 3
+typedef struct FastPathMeta FastPathMeta;
/*
* RI_ConstraintInfo
@@ -132,8 +136,24 @@ typedef struct RI_ConstraintInfo
Oid period_intersect_oper; /* anyrange * anyrange (or
* multiranges) */
dlist_node valid_link; /* Link in list of valid entries */
+
+ Oid conindid;
+ bool pk_is_partitioned;
+
+ FastPathMeta *fpmeta;
} RI_ConstraintInfo;
+typedef struct RI_CompareHashEntry RI_CompareHashEntry;
+
+/* Fast-path metadata for RI checks on foreign key referencing tables */
+typedef struct FastPathMeta
+{
+ RI_CompareHashEntry *compare_entries[RI_MAX_NUMKEYS];
+ RegProcedure regops[RI_MAX_NUMKEYS];
+ Oid subtypes[RI_MAX_NUMKEYS];
+ int strats[RI_MAX_NUMKEYS];
+} FastPathMeta;
+
/*
* RI_QueryKey
*
@@ -233,6 +253,23 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool is_restrict,
bool detectNewRows, int expect_OK);
+static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
+ IndexScanDesc scandesc, TupleTableSlot *slot,
+ Snapshot snapshot, const RI_ConstraintInfo *riinfo,
+ ScanKeyData *skey, int nkeys);
+static bool ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
+ bool *concurrently_updated);
+static bool ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo);
+static void ri_CheckPermissions(Relation query_rel);
+static bool recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+ TupleTableSlot *new_slot);
+static void build_index_scankeys(const RI_ConstraintInfo *riinfo,
+ Relation idx_rel, Datum *pk_vals,
+ char *pk_nulls, ScanKey skeys);
+static void ri_populate_fastpath_metadata(RI_ConstraintInfo *riinfo,
+ Relation fk_rel, Relation idx_rel);
static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
const RI_ConstraintInfo *riinfo, bool rel_is_pk,
Datum *vals, char *nulls);
@@ -276,14 +313,7 @@ RI_FKey_check(TriggerData *trigdata)
if (!table_tuple_satisfies_snapshot(trigdata->tg_relation, newslot, SnapshotSelf))
return PointerGetDatum(NULL);
- /*
- * Get the relation descriptors of the FK and PK tables.
- *
- * pk_rel is opened in RowShareLock mode since that's what our eventual
- * SELECT FOR KEY SHARE will get on it.
- */
fk_rel = trigdata->tg_relation;
- pk_rel = table_open(riinfo->pk_relid, RowShareLock);
switch (ri_NullCheck(RelationGetDescr(fk_rel), newslot, riinfo, false))
{
@@ -293,7 +323,6 @@ RI_FKey_check(TriggerData *trigdata)
* No further check needed - an all-NULL key passes every type of
* foreign key constraint.
*/
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
case RI_KEYS_SOME_NULL:
@@ -318,7 +347,6 @@ RI_FKey_check(TriggerData *trigdata)
errdetail("MATCH FULL does not allow mixing of null and nonnull key values."),
errtableconstraint(fk_rel,
NameStr(riinfo->conname))));
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
case FKCONSTR_MATCH_SIMPLE:
@@ -327,7 +355,6 @@ RI_FKey_check(TriggerData *trigdata)
* MATCH SIMPLE - if ANY column is null, the key passes
* the constraint.
*/
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
#ifdef NOT_USED
@@ -352,8 +379,31 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
+ /*
+ * Fast path: probe the PK unique index directly, bypassing SPI.
+ *
+ * For non-partitioned, non-temporal FKs, we can skip the SPI
+ * machinery (plan cache, executor setup, etc.) and do a direct
+ * index scan + tuple lock. This is semantically equivalent to
+ * the SPI path below but avoids the per-row executor overhead.
+ *
+ * ri_FastPathCheck() reports the violation itself (via ereport)
+ * if no matching PK row is found, so it only returns on success.
+ */
+ if (ri_fastpath_is_applicable(riinfo))
+ {
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
+ return PointerGetDatum(NULL);
+ }
+
SPI_connect();
+ /*
+ * pk_rel is opened in RowShareLock mode since that's what our eventual
+ * SELECT FOR KEY SHARE will get on it.
+ */
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+
/* Fetch or prepare a saved plan for the real check */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
@@ -2356,6 +2406,12 @@ ri_LoadConstraintInfo(Oid constraintOid)
riinfo->valid = true;
+ riinfo->conindid = conForm->conindid;
+ riinfo->pk_is_partitioned =
+ (get_rel_relkind(riinfo->pk_relid) == RELKIND_PARTITIONED_TABLE);
+
+ riinfo->fpmeta = NULL;
+
return riinfo;
}
@@ -2617,6 +2673,382 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
return SPI_processed != 0;
}
+/*
+ * ri_FastPathCheck
+ * Perform FK existence check via direct index probe, bypassing SPI.
+ *
+ * If no matching PK row exists, report the violation via ri_ReportViolation();
+ * otherwise, the function returns normally.
+ */
+static void
+ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ Relation pk_rel;
+ Relation idx_rel;
+ IndexScanDesc scandesc;
+ TupleTableSlot *slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found = false;
+ Oid saved_userid;
+ int saved_sec_context;
+ Snapshot snapshot;
+
+ /*
+ * Advance the command counter so the snapshot sees the effects of prior
+ * triggers in this statement. Mirrors what the SPI path does in
+ * ri_PerformCheck().
+ */
+ CommandCounterIncrement();
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ idx_rel = index_open(riinfo->conindid, AccessShareLock);
+
+ slot = table_slot_create(pk_rel, NULL);
+ scandesc = index_beginscan(pk_rel, idx_rel,
+ snapshot, NULL,
+ riinfo->nkeys, 0);
+
+ if (riinfo->fpmeta == NULL)
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ Assert(riinfo->fpmeta);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ ri_CheckPermissions(pk_rel);
+
+ ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, slot,
+ snapshot, riinfo, skey, riinfo->nkeys);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ index_endscan(scandesc);
+ ExecDropSingleTupleTableSlot(slot);
+ UnregisterSnapshot(snapshot);
+
+ if (!found)
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ newslot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+
+ index_close(idx_rel, NoLock);
+ table_close(pk_rel, NoLock);
+}
+
+/*
+ * ri_FastPathProbeOne
+ * Probe the PK index for one set of scan keys and lock the matching
+ * tuple.
+ *
+ * Returns true if a matching PK row was found, locked, and (if
+ * applicable) visible to the transaction snapshot.
+ *
+ * The caller must ensure CurrentMemoryContext is long-lived enough
+ * for the scan descriptor's internal allocations (typically
+ * TopTransactionContext when using a cached scandesc).
+ */
+static bool
+ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
+ IndexScanDesc scandesc, TupleTableSlot *slot,
+ Snapshot snapshot, const RI_ConstraintInfo *riinfo,
+ ScanKeyData *skey, int nkeys)
+{
+ bool found = false;
+
+ index_rescan(scandesc, skey, nkeys, NULL, 0);
+
+ if (index_getnext_slot(scandesc, ForwardScanDirection, slot))
+ {
+ bool concurrently_updated;
+
+ if (ri_LockPKTuple(pk_rel, slot, snapshot,
+ &concurrently_updated))
+ {
+ if (concurrently_updated)
+ found = recheck_matched_pk_tuple(idx_rel, skey, slot);
+ else
+ found = true;
+ }
+ }
+
+ return found;
+}
+
+/*
+ * ri_LockPKTuple
+ * Lock a PK tuple found by the fast-path index scan.
+ *
+ * Calls table_tuple_lock() directly with handling specific to RI checks.
+ * Returns true if the tuple was successfully locked.
+ *
+ * Sets *concurrently_updated to true if the locked tuple was reached
+ * by following an update chain (tmfd.traversed), indicating the caller
+ * should recheck the key.
+ */
+static bool
+ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
+ bool *concurrently_updated)
+{
+ TM_FailureData tmfd;
+ TM_Result result;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+
+ *concurrently_updated = false;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ result = table_tuple_lock(pk_rel, &slot->tts_tid, snap,
+ slot, GetCurrentCommandId(false),
+ LockTupleKeyShare, LockWaitBlock,
+ lockflags, &tmfd);
+
+ switch (result)
+ {
+ case TM_Ok:
+ if (tmfd.traversed)
+ *concurrently_updated = true;
+ return true;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ return false;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+
+ /*
+ * In READ COMMITTED, FIND_LAST_VERSION should have chased the
+ * chain and returned TM_Ok. Getting here means something
+ * unexpected -- fall through to error.
+ */
+ elog(ERROR, "unexpected table_tuple_lock status: %u", result);
+ break;
+
+ case TM_SelfModified:
+
+ /*
+ * The current command or a later command in this transaction
+ * modified the PK row. This shouldn't normally happen during an
+ * FK check (we're not modifying pk_rel), but handle it safely by
+ * treating the tuple as not found.
+ */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ break;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", result);
+ break;
+ }
+
+ return false; /* keep compiler quiet */
+}
+
+static bool
+ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo)
+{
+ /*
+ * Partitioned referenced tables are skipped for simplicity, since they
+ * require routing the probe through the correct partition using
+ * PartitionDirectory.
+ */
+ if (riinfo->pk_is_partitioned)
+ return false;
+
+ /*
+ * Temporal foreign keys use range overlap and containment semantics (&&,
+ * <@, range_agg()) that inherently involve aggregation and multiple-row
+ * reasoning, so they stay on the SPI path.
+ */
+ if (riinfo->hasperiod)
+ return false;
+
+ return true;
+}
+
+/*
+ * ri_CheckPermissions
+ * Check that the current user has permissions to look into the schema of
+ * and SELECT from 'query_rel'
+ */
+static void
+ri_CheckPermissions(Relation query_rel)
+{
+ AclResult aclresult;
+
+ /* USAGE on schema. */
+ aclresult = object_aclcheck(NamespaceRelationId,
+ RelationGetNamespace(query_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(query_rel)));
+
+ /* SELECT on relation. */
+ aclresult = pg_class_aclcheck(RelationGetRelid(query_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(query_rel));
+}
+
+/*
+ * recheck_matched_pk_tuple
+ * After following an update chain (tmfd.traversed), verify that
+ * the locked PK tuple still matches the original search keys.
+ *
+ * A non-key update (e.g. changing a non-PK column) creates a new tuple version
+ * that we've now locked, but the key is unchanged -- that's fine. A key
+ * update means the value we were looking for is gone, so we should treat it as
+ * not found.
+ */
+static bool
+recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+ TupleTableSlot *new_slot)
+{
+ /*
+ * TODO: BuildIndexInfo does a syscache lookup + palloc on every call.
+ * This only fires on the concurrent-update path (tmfd.traversed), which
+ * should be rare, so the cost is acceptable for now. If profiling shows
+ * otherwise, cache the IndexInfo in FastPathMeta.
+ */
+ IndexInfo *indexInfo = BuildIndexInfo(idxrel);
+ Datum values[INDEX_MAX_KEYS];
+ bool isnull[INDEX_MAX_KEYS];
+ bool matched = true;
+
+ /* PK indexes never have these. */
+ Assert(indexInfo->ii_Expressions == NIL &&
+ indexInfo->ii_ExclusionOps == NULL);
+
+ /* Form the index values and isnull flags given the table tuple. */
+ FormIndexDatum(indexInfo, new_slot, NULL, values, isnull);
+ for (int i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
+ {
+ ScanKeyData *skey = &skeys[i];
+
+ /* A PK column can never be set to NULL. */
+ Assert(!isnull[i]);
+ if (!DatumGetBool(FunctionCall2Coll(&skey->sk_func,
+ skey->sk_collation,
+ values[i],
+ skey->sk_argument)))
+ {
+ matched = false;
+ break;
+ }
+ }
+
+ return matched;
+}
+
+/*
+ * build_index_scankeys
+ * Build ScanKeys for a direct index probe of the PK's unique index.
+ *
+ * Uses cached compare entries, operator procedures, and strategy numbers
+ * from ri_populate_fastpath_metadata() rather than looking them up on
+ * each invocation. Casts FK values to the operator's expected input
+ * type if needed.
+ */
+static void
+build_index_scankeys(const RI_ConstraintInfo *riinfo,
+ Relation idx_rel, Datum *pk_vals,
+ char *pk_nulls, ScanKey skeys)
+{
+ FastPathMeta *fpmeta = riinfo->fpmeta;
+
+ Assert(fpmeta);
+
+ /*
+ * May need to cast each of the individual values of the foreign key to
+ * the corresponding PK column's type if the equality operator demands it.
+ */
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ RI_CompareHashEntry *entry = fpmeta->compare_entries[i];
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+
+ /*
+ * Set up ScanKeys for the index scan. This is essentially how
+ * ExecIndexBuildScanKeys() sets them up.
+ */
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ int pkattrno = i + 1;
+
+ ScanKeyEntryInitialize(&skeys[i], 0, pkattrno,
+ fpmeta->strats[i], fpmeta->subtypes[i],
+ idx_rel->rd_indcollation[i], fpmeta->regops[i],
+ pk_vals[i]);
+ }
+}
+
+/*
+ * ri_populate_fastpath_metadata
+ * Cache per-key metadata needed by build_index_scankeys().
+ *
+ * Looks up the compare hash entry, operator procedure OID, and index
+ * strategy/subtype for each key column. Called lazily on first use
+ * and persists for the lifetime of the RI_ConstraintInfo entry.
+ */
+static void
+ri_populate_fastpath_metadata(RI_ConstraintInfo *riinfo,
+ Relation fk_rel, Relation idx_rel)
+{
+ FastPathMeta *fpmeta;
+ MemoryContext oldcxt = MemoryContextSwitchTo(TopMemoryContext);
+
+ Assert(riinfo != NULL && riinfo->valid);
+
+ fpmeta = palloc_object(FastPathMeta);
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ Oid eq_opr = riinfo->pf_eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ Oid lefttype;
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ fpmeta->compare_entries[i] = entry;
+ fpmeta->regops[i] = get_opcode(eq_opr);
+
+ get_op_opfamily_properties(eq_opr,
+ idx_rel->rd_opfamily[i],
+ false,
+ &fpmeta->strats[i],
+ &lefttype,
+ &fpmeta->subtypes[i]);
+ }
+
+ riinfo->fpmeta = fpmeta;
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* Extract fields from a tuple into Datum/nulls arrays
*/
@@ -3169,8 +3601,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the FK column type already matches what the
+ * operator expects. For same-type operators, that's the common type.
+ * For cross-type operators (e.g. int48eq for int4 PK / int8 FK), the
+ * FK value is the right operand, so skip the cast if typeid matches
+ * righttype.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/test/isolation/expected/fk-concurrent-pk-upd.out b/src/test/isolation/expected/fk-concurrent-pk-upd.out
new file mode 100644
index 00000000000..4dd9535d3c0
--- /dev/null
+++ b/src/test/isolation/expected/fk-concurrent-pk-upd.out
@@ -0,0 +1,105 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2b s2ukey s1b s1i s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2c: COMMIT;
+step s1i: <... completed>
+ERROR: insert or update on table "child" violates foreign key constraint "child_parent_key_fkey"
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 2|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+(0 rows)
+
+
+starting permutation: s2b s2uaux s1b s1i s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2uaux: UPDATE parent SET aux = 'bar' WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1);
+step s2c: COMMIT;
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|bar
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
+
+starting permutation: s2b s2ukey s1b s1i s2ukey2 s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2ukey2: UPDATE parent SET parent_key = 1 WHERE parent_key = 2;
+step s2c: COMMIT;
+step s1i: <... completed>
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
+
+starting permutation: s2b s2ukey s3b s3i s2c s3c s2s s3s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s3b: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s3i: INSERT INTO child VALUES (2, 1); <waiting ...>
+step s2c: COMMIT;
+step s3i: <... completed>
+ERROR: could not serialize access due to concurrent update
+step s3c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 2|foo
+(1 row)
+
+step s3s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+(0 rows)
+
+
+starting permutation: s2b s2uaux s3b s3i s2c s3c s2s s3s
+step s2b: BEGIN;
+step s2uaux: UPDATE parent SET aux = 'bar' WHERE parent_key = 1;
+step s3b: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s3i: INSERT INTO child VALUES (2, 1);
+step s2c: COMMIT;
+step s3c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|bar
+(1 row)
+
+step s3s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 2| 1
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 4e466580cd4..c1a999bf1d2 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -37,6 +37,7 @@ test: fk-partitioned-2
test: fk-snapshot
test: fk-snapshot-2
test: fk-snapshot-3
+test: fk-concurrent-pk-upd
test: subxid-overflow
test: eval-plan-qual
test: eval-plan-qual-trigger
diff --git a/src/test/isolation/specs/fk-concurrent-pk-upd.spec b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
new file mode 100644
index 00000000000..03dc7f260cd
--- /dev/null
+++ b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
@@ -0,0 +1,53 @@
+# Tests that an INSERT on the referencing table correctly fails when
+# the referenced value disappears due to a concurrent update.
+setup
+{
+ CREATE TABLE parent (
+ parent_key int PRIMARY KEY,
+ aux text NOT NULL
+ );
+
+ CREATE TABLE child (
+ child_key int PRIMARY KEY,
+ parent_key int8 NOT NULL REFERENCES parent
+ );
+
+ INSERT INTO parent VALUES (1, 'foo');
+}
+
+teardown
+{
+ DROP TABLE parent, child;
+}
+
+session s1
+step s1b { BEGIN; }
+step s1i { INSERT INTO child VALUES (1, 1); }
+step s1c { COMMIT; }
+step s1s { SELECT * FROM child; }
+
+session s2
+step s2b { BEGIN; }
+step s2ukey { UPDATE parent SET parent_key = 2 WHERE parent_key = 1; }
+step s2uaux { UPDATE parent SET aux = 'bar' WHERE parent_key = 1; }
+step s2ukey2 { UPDATE parent SET parent_key = 1 WHERE parent_key = 2; }
+step s2c { COMMIT; }
+step s2s { SELECT * FROM parent; }
+
+session s3
+step s3b { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s3i { INSERT INTO child VALUES (2, 1); }
+step s3c { COMMIT; }
+step s3s { SELECT * FROM child; }
+
+# fail
+permutation s2b s2ukey s1b s1i s2c s1c s2s s1s
+# ok
+permutation s2b s2uaux s1b s1i s2c s1c s2s s1s
+# ok
+permutation s2b s2ukey s1b s1i s2ukey2 s2c s1c s2s s1s
+
+# RR: key update -> serialization failure
+permutation s2b s2ukey s3b s3i s2c s3c s2s s3s
+# RR: non-key update -> old version visible via transaction snapshot
+permutation s2b s2uaux s3b s3i s2c s3c s2s s3s
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 9ae4dbf1b0a..0826f518004 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -370,6 +370,53 @@ SELECT * FROM PKTABLE;
DROP TABLE FKTABLE;
DROP TABLE PKTABLE;
--
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+SET ROLE regress_foreign_key_user;
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+RESET ROLE;
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+ERROR: permission denied for table pktable
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
-- Check initial check upon ALTER TABLE
--
CREATE TABLE PKTABLE ( ptest1 int, ptest2 int, PRIMARY KEY(ptest1, ptest2) );
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index 3b8c95bf893..e9ee29331cb 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -242,6 +242,70 @@ SELECT * FROM PKTABLE;
DROP TABLE FKTABLE;
DROP TABLE PKTABLE;
+--
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+SET ROLE regress_foreign_key_user;
+
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+
+RESET ROLE;
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+
+DROP USER regress_foreign_key_user;
+
--
-- Check initial check upon ALTER TABLE
--
--
2.41.0
* Re: Eliminating SPI / SQL from some RI triggers - take 3
2024-12-20 04:23 Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 04:07 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 05:10 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Pavel Stehule <[email protected]>
2025-10-22 13:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-12-01 06:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-02 15:30 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-10 12:28 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-18 15:34 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
@ 2026-03-19 16:19 ` Junwang Zhao <[email protected]>
2026-03-24 11:47 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
0 siblings, 1 reply; 61+ messages in thread
From: Junwang Zhao @ 2026-03-19 16:19 UTC (permalink / raw)
To: Haibo Yan <[email protected]>; +Cc: [email protected]; Pavel Stehule <[email protected]>; pgsql-hackers
On Wed, Mar 18, 2026 at 11:34 PM Junwang Zhao <[email protected]> wrote:
>
> Hi Haibo,
>
> On Tue, Mar 17, 2026 at 8:28 AM Haibo Yan <[email protected]> wrote:
> >
> > Hi, Amit and Junwang
> >
> > Thanks for the latest patch. I think the overall direction makes sense, and the single-column SK_SEARCHARRAY path looks like one of the most valuable optimizations here. The patch also seems to cover several important cases, including deferred constraints, duplicate FK values, and multi-column fallback behavior.
> >
> > After reading through the patch, I have one major comment and a few smaller ones.
>
> Thanks for your review.
>
> >
> > 1. TopTransactionContext usage during batched flush may be too coarse-grained
> > My biggest concern is the use of TopTransactionContext around the batched flush path.
> > As written, ri_FastPathBatchFlush() switches to TopTransactionContext before calling ri_FastPathFlushArray() / ri_FastPathFlushLoop(). That seems broad enough that temporary allocations made during the flush may end up there.
> > In particular, in ri_FastPathFlushArray(), I think the objects worth checking carefully are the pass-by-reference Datums returned by the per-element cast call and stored in search_vals[], e.g.
> >
> > ```search_vals[i] = FunctionCall3(&entry->cast_func_finfo, ...);```
> > If those cast results are separately allocated in the current memory context, then pfree(arr) only frees the constructed array object itself; it does not obviously free those intermediate cast results. If so, those allocations could survive until end of transaction rather than just until the end of the current flush.
> > Maybe this is harmless in practice, but I think it needs a closer look. It might be better to use a dedicated short-lived context for per-flush temporary allocations, reset it after each flush, or otherwise separate allocations that really need transaction lifetime from those that are only needed transiently during batched processing.
>
> Yeah, that concern is reasonable. After a brief discussion with Amit,
> we now replace the TopTransactionContext usage with two
> purpose-specific contexts: scan_cxt for index AM allocations, freed
> at teardown, and flush_cxt for per-flush transient work, reset on each
> flush. TopTransactionContext is the parent of scan_cxt, and scan_cxt is
> the parent of flush_cxt, so MemoryContextDelete(scan_cxt) in teardown
> cleans up both.
>
> These changes are in v7-0004.
>
> >
> > 2. RI_FastPathEntry comment mentions the wrong function name
> > The comment above RI_FastPathEntry says it contains resources needed by ri_FastPathFlushBatch(), but the function is named ri_FastPathBatchFlush().
>
> Fixed.
>
> >
> > 3. RI_FASTPATH_BATCH_SIZE needs some rationale
> > RI_FASTPATH_BATCH_SIZE = 64 may well be a reasonable compromise, but right now it reads like a magic number.
> > This choice seems especially relevant because the patch has two opposing effects:
> > 3-1. larger batches should amortize the array-scan work better,
> > 3-2. but the matched[] bookkeeping in ri_FastPathFlushArray() is O(batch_size^2) in the worst case.
> > So I think it would help to include at least a brief rationale in a comment or in the commit message.
>
> Added.
>
> >
> > 4. Commit message says the entry stashes fk_relid, but the code actually stashes riinfo
> > The commit message says the entry stashes fk_relid and can reopen the relation if needed. Unless I am misreading it, the code actually stores riinfo and later uses riinfo->fk_relid. The distinction is small, but I think the wording should match the implementation more closely.
>
> I changed the commit message to:
>
> Since the FK relation may already be closed by flush time (e.g. for
> deferred constraints at COMMIT), reopens the relation using
> entry->riinfo->fk_relid if needed.
>
> >
> > Thanks again for working on this.
> >
> > Best regards,
> >
> > Haibo Yan
> >
> >
> > On Mar 10, 2026, at 5:28 AM, Junwang Zhao <[email protected]> wrote:
> >
> > Hi,
> >
> > On Mon, Mar 2, 2026 at 11:30 PM Junwang Zhao <[email protected]> wrote:
> >
> >
> > On Sat, Feb 28, 2026 at 3:08 PM Amit Langote <[email protected]> wrote:
> >
> >
> > Hi Junwang,
> >
> > On Mon, Feb 23, 2026 at 10:45 PM Junwang Zhao <[email protected]> wrote:
> >
> > On Thu, Feb 19, 2026 at 5:21 PM Amit Langote <[email protected]> wrote:
> >
> > I re-ran the benchmarks (same test as yours, different machine):
> >
> > create table pk (a numeric primary key);
> > create table fk (a bigint references pk);
> > insert into pk select generate_series(1, 2000000);
> > insert into fk select generate_series(1, 2000000, 2);
> >
> > master: 2444 ms (median of 3 runs)
> > 0001: 1382 ms (43% faster)
> > 0001+0002: 1202 ms (51% faster, 13% over 0001 alone)
> >
> >
> > I can get similar improvement on my old mac intel chip:
> >
> > master: 12963.993 ms
> > 0001: 6641.692 ms, 48.8% faster
> > 0001+0002: 5771.703 ms, 55.5% faster
> >
> >
> > Also, with int PK / int FK (1M rows):
> >
> > create table pk (a int primary key);
> > create table fk (a int references pk);
> > insert into pk select generate_series(1, 1000000);
> > insert into fk select generate_series(1, 1000000);
> >
> > master: 1000 ms
> > 0001: 520 ms (48% faster)
> > 0001+0002: 432 ms (57% faster, 17% over 0001 alone)
> >
> >
> > master: 11134.583 ms
> > 0001: 5240.298 ms, 52.9% faster
> > 0001+0002: 4554.215 ms, 59.1% faster
> >
> >
> > Thanks for testing, good to see similar numbers. I had forgotten to
> > note that these results are when these PK index probes don't do any
> > I/O, though you might be aware of that. Below, I report some numbers
> > that Tomas Vondra shared with me off-list where the probes do have to
> > perform I/O and there the benefits from only this patch set are only
> > marginal.
> >
> > I don't have any additional comments on the patch except one minor nit,
> > maybe merge the following two if conditions into one, not a strong opinion
> > though.
> >
> > if (use_cache)
> > {
> > /*
> > * The snapshot was registered once when the cache entry was created.
> > * We just patch curcid to reflect the new command counter.
> > * SnapshotSetCommandId() only patches process-global statics, not
> > * registered copies, so we do it directly.
> > *
> > * The xmin/xmax/xip fields don't need refreshing: within a single
> > * statement batch, only curcid changes between rows.
> > */
> > Assert(fpentry && fpentry->snapshot != NULL);
> > snapshot = fpentry->snapshot;
> > snapshot->curcid = GetCurrentCommandId(false);
> > }
> > else
> > snapshot = RegisterSnapshot(GetLatestSnapshot());
> >
> > if (use_cache)
> > {
> > pk_rel = fpentry->pk_rel;
> > idx_rel = fpentry->idx_rel;
> > scandesc = fpentry->scandesc;
> > slot = fpentry->slot;
> > }
> > else
> > {
> > pk_rel = table_open(riinfo->pk_relid, RowShareLock);
> > idx_rel = index_open(riinfo->conindid, AccessShareLock);
> > scandesc = index_beginscan(pk_rel, idx_rel,
> > snapshot, NULL,
> > riinfo->nkeys, 0);
> > slot = table_slot_create(pk_rel, NULL);
> > }
> >
> >
> > Good idea, done.
> >
> > While polishing 0002, I revisited the snapshot caching semantics. The
> > previous commit message hand-waved about only curcid changing between
> > rows, but GetLatestSnapshot() also reflects other backends' commits,
> > so reusing the snapshot is a deliberate semantic change from the SPI
> > path. I think it's safe because curcid is all we need for
> > intra-statement visibility, concurrent commits either already happened
> > before our snapshot (and are visible) or are racing with our statement
> > and wouldn't be seen reliably even with per-row snapshots since the
> > order in which FK rows are checked is nondeterministic, and
> > LockTupleKeyShare prevents the PK row from disappearing regardless. In
> > essence, we're treating all the FK checks within a trigger-firing
> > cycle as a single plan execution that happens to scan N rows, rather
> > than N independent SPI queries each taking a fresh snapshot. That's
> > the natural model -- a normal SELECT ... FOR KEY SHARE plan doesn't
> > re-take GetLatestSnapshot() between rows either.
> >
> > Similarly, the permission check (schema USAGE + table SELECT) is now
> > done once at cache entry creation in ri_FastPathGetEntry() rather than
> > on every flush.
> >
> >
> > nice improvement.
> >
> > The RI check runs as the PK table owner, so we're
> > verifying that the owner can access their own table -- a condition
> > that won't change unless someone explicitly revokes from the owner,
> > which would also break the SPI path.
> >
> > David Rowley mentioned off-list that it might be worth batching
> > multiple FK values into a single index probe, leveraging the
> > ScalarArrayOp btree improvements from PostgreSQL 17. The idea would be
> > to buffer FK values across trigger invocations in the per-constraint
> > cache (0002 already has the right structure for this), build a
> > SK_SEARCHARRAY scan key, and let the btree AM walk the matching leaf
> > pages in one sorted traversal instead of one tree descent per row. The
> > locking and recheck would still be per-tuple, but the index traversal
> > cost drops significantly. Single-column FKs are the obvious starting
> > point. That seems worth exploring but can be done as a separate patch
> > on top of this.
> >
> >
> > I will take a look at this in the following weeks.
> >
> >
> > I ended up going ahead with the batching and SAOP idea that David
> > mentioned -- I had a proof-of-concept working shortly after posting v3
> > and kept iterating on it. So attached set is now:
> >
> > 0001 - Core fast path (your 0001+0002 reworked, as before)
> >
> > 0002 - Per-batch resource caching (PK relation, index, scandesc, snapshot)
> >
> > 0003 - FK row buffering: materialize FK tuples into a per-constraint
> > batch buffer (64 rows), flush when full or at batch end
> >
> > 0004 - SK_SEARCHARRAY for single-column FKs: build an array from the
> > buffered FK values and do one index scan instead of 64 separate tree
> > descents. Multi-column FKs fall back to a per-row loop.
> >
> > 0003 is pure infrastructure -- it doesn't improve performance on its
> > own because the per-row index descent still dominates. The payoff
> > comes in 0004.
> >
> > Numbers (same machine as before, median of 3 runs):
> >
> > numeric PK / bigint FK, 1M rows:
> > master: 2487 ms
> > 0001..0004: 1168 ms (2.1x)
> >
> > int PK / int FK, 500K rows:
> > master: 1043 ms
> > 0001..0004: 335 ms (3.1x)
> >
> > The int/int case benefits most because the per-row cost is lower, so
> > the SAOP traversal savings are a larger fraction of the total. The
> > numeric/bigint case still sees a solid improvement despite the
> > cross-type cast overhead.
> >
> > Tomas Vondra also tested with an I/O-intensive workload (dataset
> > larger than shared_buffers, combined with his and Peter Geoghegan's
> > I/O prefetching patches) and confirmed that the batching + SAOP
> > approach helps there too, not just in the CPU-bound / memory-resident
> > case. In fact he showed that the patches here don't make a big dent
> > when the main bottleneck is I/O as shown in numbers that he shared in
> > an off-list email:
> >
> > master: 161617 ms
> > ri-check (0001..0004): 149446 ms (1.08x)
> > ri-check + i/o prefetching: 50885 ms (3.2x)
> >
> > So the RI patches alone only give ~8% here since most time is waiting
> > on reads. But the batching gives the prefetch machinery a window of
> > upcoming probes to issue readahead against, so the two together yield
> > 3.2x.
> >
> >
> > impressive!
> >
> >
> > Tomas also caught a memory context bug in the batch flush path: the
> > cached scandesc lives in TopTransactionContext, but the btree AM
> > defers _bt_preprocess_keys allocation to the first getnext call, which
> > pallocs into CurrentMemoryContext. If that's a short-lived
> > per-trigger-row context, the scandesc has dangling pointers on the
> > next rescan. Fixed by switching to TopTransactionContext before the
> > probe loop.
> >
> > Finally, I've fixed a number of other small and not-so-small bugs
> > found while polishing the old patches and made other stylistic
> > improvements. One notable change is that I introduced a FastPathMeta
> >
> >
> > Yeah, this is much better than the fpmeta_valid field.
> >
> > struct to store the fast path metadata instead of dumping those arrays
> > in the RI_ConstraintInfo. It's allocated lazily on first use and holds
> > the per-key compare entries, operator procedures, and index strategy
> > info needed by the scan key construction, so RI_ConstraintInfo doesn't
> > pay for them when the fast path isn't used.
> >
> >
> > On Mon, Feb 23, 2026 at 10:45 PM Junwang Zhao <[email protected]> wrote:
> >
> >
> > Hi Amit,
> >
> > On Thu, Feb 19, 2026 at 5:21 PM Amit Langote <[email protected]> wrote:
> >
> >
> > Hi Junwang,
> >
> > On Mon, Dec 1, 2025 at 3:09 PM Junwang Zhao <[email protected]> wrote:
> >
> > As Amit has already stated, we are approaching a hybrid "fast-path + fallback"
> > design.
> >
> > 0001 adds a fast-path optimization for foreign key constraint checks
> > that bypasses the SPI executor. The fast path applies when the referenced
> > table is not partitioned and the constraint does not involve temporal
> > semantics.
> >
> > With the following test:
> >
> > create table pk (a numeric primary key);
> > create table fk (a bigint references pk);
> > insert into pk select generate_series(1, 2000000);
> >
> > head:
> >
> > [local] zhjwpku@postgres:5432-90419=# insert into fk select
> > generate_series(1, 2000000, 2);
> > INSERT 0 1000000
> > Time: 13516.177 ms (00:13.516)
> >
> > [local] zhjwpku@postgres:5432-90419=# update fk set a = a + 1;
> > UPDATE 1000000
> > Time: 15057.638 ms (00:15.058)
> >
> > patched:
> >
> > [local] zhjwpku@postgres:5432-98673=# insert into fk select
> > generate_series(1, 2000000, 2);
> > INSERT 0 1000000
> > Time: 8248.777 ms (00:08.249)
> >
> > [local] zhjwpku@postgres:5432-98673=# update fk set a = a + 1;
> > UPDATE 1000000
> > Time: 10117.002 ms (00:10.117)
> >
> > 0002 caches the fast-path metadata used by the index probe. At the
> > moment that is only the comparison operator hash entries, operator
> > function OIDs, strategy numbers, and subtypes for index scans. However,
> > this cache doesn't bring any performance improvement.
> >
> > Caching additional metadata should improve performance for foreign key checks.
> >
> > Amit suggested introducing a mechanism for ri_triggers.c to register a
> > cleanup callback in the EState, which AfterTriggerEndQuery() could then
> > invoke to release per-statement cached metadata (such as the IndexScanDesc).
> > However, I haven't been able to implement this mechanism yet.
> >
> >
> > Thanks for working on this. I've taken your patches as a starting
> > point and reworked the series into two patches (attached): 1st is your
> > 0001+0002 as the core patch that adds a gated fast-path alternative to
> > SPI and 2nd where I added per-statement resource caching. Doing the
> > latter turned out to be not so hard thanks to the structure you chose
> > to build the core fast path. Good call on adding the RLS and ACL test
> > cases, btw.
> >
> > So, 0001 is a functionally complete fast path: concurrency handling,
> > REPEATABLE READ crosscheck, cross-type operators, security context,
> > and metadata caching. 0002 implements the per-statement resource
> > caching we discussed, though instead of sharing the EState between
> > trigger.c and ri_triggers.c it uses a new AfterTriggerBatchCallback
> > mechanism that fires at the end of each trigger-firing cycle
> > (per-statement for immediate constraints, or until COMMIT for deferred
> > ones). It layers resource caching on top so that the PK relation,
> > index, scan descriptor, and snapshot stay open across all FK trigger
> > invocations within a single trigger-firing cycle rather than being
> > opened and closed per row.
> >
> > Note that the previous 0002 (metadata caching) is folded into 0001,
> > and most of the new fast-path logic added in 0001 now lives in
> > ri_FastPathCheck() rather than inline in RI_FKey_check(), so the
> > RI_FKey_check diff is just the gating call and SPI fallback.
> >
> > I re-ran the benchmarks (same test as yours, different machine):
> >
> > create table pk (a numeric primary key);
> > create table fk (a bigint references pk);
> > insert into pk select generate_series(1, 2000000);
> > insert into fk select generate_series(1, 2000000, 2);
> >
> > master: 2444 ms (median of 3 runs)
> > 0001: 1382 ms (43% faster)
> > 0001+0002: 1202 ms (51% faster, 13% over 0001 alone)
> >
> >
> > I can get similar improvement on my old mac intel chip:
> >
> > master: 12963.993 ms
> > 0001: 6641.692 ms, 48.8% faster
> > 0001+0002: 5771.703 ms, 55.5% faster
> >
> >
> > Also, with int PK / int FK (1M rows):
> >
> > create table pk (a int primary key);
> > create table fk (a int references pk);
> > insert into pk select generate_series(1, 1000000);
> > insert into fk select generate_series(1, 1000000);
> >
> > master: 1000 ms
> > 0001: 520 ms (48% faster)
> > 0001+0002: 432 ms (57% faster, 17% over 0001 alone)
> >
> >
> > master: 11134.583 ms
> > 0001: 5240.298 ms, 52.9% faster
> > 0001+0002: 4554.215 ms, 59.1% faster
> >
> >
> > The incremental gain from 0002 comes from eliminating per-row relation
> > open/close, scan begin/end, slot alloc/free, and replacing per-row
> > GetSnapshotData() with only curcid adjustment on the registered
> > snapshot copy in the cache.
> >
> > The two current limitations are partitioned referenced tables and
> > temporal foreign keys. Partitioned PKs are relatively uncommon in
> > practice and the non-partitioned case should cover most FK workloads,
> > so I'm not sure it's worth the added complexity to support them.
> > Temporal FKs are inherently multi-row, so they're a poor fit for a
> > single-probe fast path.
> >
> > David Rowley mentioned off-list that it might be worth batching
> > multiple FK values into a single index probe, leveraging the
> > ScalarArrayOp btree improvements from PostgreSQL 17. The idea would be
> > to buffer FK values across trigger invocations in the per-constraint
> > cache (0002 already has the right structure for this), build a
> > SK_SEARCHARRAY scan key, and let the btree AM walk the matching leaf
> > pages in one sorted traversal instead of one tree descent per row. The
> > locking and recheck would still be per-tuple, but the index traversal
> > cost drops significantly. Single-column FKs are the obvious starting
> > point. That seems worth exploring but can be done as a separate patch
> > on top of this.
> >
> >
> > I will take a look at this in the following weeks.
> >
> >
> > I think the series is in reasonable shape but would appreciate extra
> > eyeballs, especially on the concurrency handling in ri_LockPKTuple()
> > in 0001 and the snapshot lifecycle in 0002. Or anything else that
> > catches one's eye.
> >
> > --
> > Thanks, Amit Langote
> >
> >
> > I don't have any additional comments on the patch except one minor nit,
> > maybe merge the following two if conditions into one, not a strong opinion
> > though.
> >
> > if (use_cache)
> > {
> > /*
> > * The snapshot was registered once when the cache entry was created.
> > * We just patch curcid to reflect the new command counter.
> > * SnapshotSetCommandId() only patches process-global statics, not
> > * registered copies, so we do it directly.
> > *
> > * The xmin/xmax/xip fields don't need refreshing: within a single
> > * statement batch, only curcid changes between rows.
> > */
> > Assert(fpentry && fpentry->snapshot != NULL);
> > snapshot = fpentry->snapshot;
> > snapshot->curcid = GetCurrentCommandId(false);
> > }
> > else
> > snapshot = RegisterSnapshot(GetLatestSnapshot());
> >
> > if (use_cache)
> > {
> > pk_rel = fpentry->pk_rel;
> > idx_rel = fpentry->idx_rel;
> > scandesc = fpentry->scandesc;
> > slot = fpentry->slot;
> > }
> > else
> > {
> > pk_rel = table_open(riinfo->pk_relid, RowShareLock);
> > idx_rel = index_open(riinfo->conindid, AccessShareLock);
> > scandesc = index_beginscan(pk_rel, idx_rel,
> > snapshot, NULL,
> > riinfo->nkeys, 0);
> > slot = table_slot_create(pk_rel, NULL);
> > }
> >
> > --
> > Regards
> > Junwang Zhao
> >
> >
> >
> >
> > --
> > Thanks, Amit Langote
> >
> >
> >
> >
> > --
> > Regards
> > Junwang Zhao
> >
> >
> > I had an offline discussion with Amit today. There were a few small things
> > that could be improved, so I posted a new version of the patch set.
> >
> > 1.
> >
> > + if (ri_fastpath_is_applicable(riinfo))
> > + {
> > + bool found = ri_FastPathCheck(riinfo, fk_rel, newslot);
> > +
> > + if (found)
> > + return PointerGetDatum(NULL);
> > +
> > + /*
> > + * ri_FastPathCheck opens pk_rel internally; we need it for
> > + * ri_ReportViolation. Re-open briefly.
> > + */
> > + pk_rel = table_open(riinfo->pk_relid, RowShareLock);
> > + ri_ReportViolation(riinfo, pk_rel, fk_rel,
> > + newslot, NULL,
> > + RI_PLAN_CHECK_LOOKUPPK, false, false);
> > + }
> >
> > Move ri_ReportViolation into ri_FastPathCheck, so table_open is no
> > longer needed, and ri_FastPathCheck now returns void. Since Amit
> > agreed this is the right approach, I included it directly in v5-0001.
> >
> > 2.
> >
> > After adding the batch fast path, the original ri_FastPathCheck is only
> > used by the ALTER TABLE validation path. This path cannot use the
> > cache because the registered AfterTriggerBatch callback will never run.
> > Therefore, the use_cache branch can be removed.
> >
> > I made this change in v5-0004 and also updated some related comments.
> > Once we agree the changes are correct, it can be merged into v5-0003.
> >
> > 3.
> >
> > + fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
> > + &TTSOpsHeapTuple);
> >
> > ri_FastPathBatchFlush creates a new fk_slot but does not cache it in
> > RI_FastPathEntry. I tried caching it in v5-0006 and ran some benchmarks,
> > it didn't show much improvement. This might be because the slot creation
> > function is called once per batch rather than once per row, so the overall
> > impact is minimal. I'm posting this here for Amit to take a look and decide
> > whether we should adopt it or drop it, since I mentioned the idea to
> > him earlier.
> >
> > 4.
> >
> > ri_FastPathFlushArray currently uses SK_SEARCHARRAY only for
> > single-column checks. I asked whether this could be extended to support
> > multi-column cases, and Amit encouraged me to look into it.
> >
> > After a brief investigation, it seems that ScanKeyEntryInitialize only allows
> > passing a single subtype/collation/procedure, which makes it difficult to
> > handle multiple types. Based on this, my current understanding is that
> > SK_SEARCHARRAY may not work for multi-column checks.
> >
> > --
> > Regards
> > Junwang Zhao
> > <v5-0005-Use-SK_SEARCHARRAY-for-batched-fast-path-FK-probe.patch><v5-0006-Reuse-FK-tuple-slot-across-fast-path-batches.patch><v5-0002-Cache-per-batch-resources-for-fast-path-foreign-k.patch><v5-0004-Refine-fast-path-FK-validation-path.patch><v5-0003-Buffer-FK-rows-for-batched-fast-path-probing.patch><v5-0001-Add-fast-path-for-foreign-key-constraint-checks.patch>
> >
> >
>
>
> --
> Regards
> Junwang Zhao
I squashed 0004 into 0003 so that each file can be committed independently.
I also ran pgindent on each file.
--
Regards
Junwang Zhao
Attachments:
[application/octet-stream] v8-0001-Add-fast-path-for-foreign-key-constraint-checks.patch (29.9K, 2-v8-0001-Add-fast-path-for-foreign-key-constraint-checks.patch)
download | inline diff:
From 672dad9c3cfe97069d8257f251e4a34e93ffaf18 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sun, 15 Mar 2026 16:53:27 +0900
Subject: [PATCH v8 1/3] Add fast path for foreign key constraint checks
Add a fast-path optimization for foreign key checks that bypasses SPI
by directly probing the unique index on the referenced table.
The fast path applies when the referenced table is not partitioned and
the constraint does not involve temporal semantics. ri_FastPathCheck()
extracts the FK values, builds scan keys, performs an index scan, and
locks the matching tuple with LockTupleKeyShare via ri_LockPKTuple(),
which handles the RI-specific subset of table_tuple_lock() results.
If the locked tuple was reached by chasing an update chain
(tmfd.traversed), recheck_matched_pk_tuple() verifies that the key
is still the same, emulating EvalPlanQual.
The scan uses GetTransactionSnapshot(), matching what the SPI path
uses (via _SPI_execute_plan pushing GetTransactionSnapshot() as the
active snapshot). Under READ COMMITTED this is a fresh snapshot;
under REPEATABLE READ / SERIALIZABLE it is the frozen transaction-
start snapshot, so PK rows committed after the transaction started
are not visible.
The ri_CheckPermissions() function performs schema USAGE and table
SELECT checks, matching what the SPI path does implicitly.
ri_HashCompareOp() is adjusted to handle cross-type equality operators
(e.g. int48eq for int4 PK / int8 FK) which can appear in conpfeqop.
The original code asserted same-type operators only.
Per-key metadata (compare entries, operator procedures, strategy
numbers) is cached in RI_ConstraintInfo via
ri_populate_fastpath_metadata() on first use, eliminating repeated
calls to ri_HashCompareOp() and get_op_opfamily_properties().
conindid and pk_is_partitioned are also cached at constraint load
time, avoiding per-invocation syscache lookups and the need to open
pk_rel before deciding whether the fast path applies.
New regression tests cover RLS bypass and ACL enforcement for the
fast-path permission checks. New isolation tests exercise concurrent
PK updates under both READ COMMITTED and REPEATABLE READ.
Author: Junwang Zhao <[email protected]>
Author: Amit Langote <[email protected]>
Discussion: https://postgr.es/m/
---
src/backend/utils/adt/ri_triggers.c | 464 +++++++++++++++++-
.../expected/fk-concurrent-pk-upd.out | 105 ++++
src/test/isolation/isolation_schedule | 1 +
.../isolation/specs/fk-concurrent-pk-upd.spec | 53 ++
src/test/regress/expected/foreign_key.out | 47 ++
src/test/regress/sql/foreign_key.sql | 64 +++
src/tools/pgindent/typedefs.list | 1 +
7 files changed, 723 insertions(+), 12 deletions(-)
create mode 100644 src/test/isolation/expected/fk-concurrent-pk-upd.out
create mode 100644 src/test/isolation/specs/fk-concurrent-pk-upd.spec
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index d22b8ef7f3c..ce0f5c120f4 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -24,12 +24,15 @@
#include "postgres.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/index.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
+#include "catalog/pg_namespace.h"
#include "commands/trigger.h"
#include "executor/executor.h"
#include "executor/spi.h"
@@ -91,6 +94,7 @@
#define RI_TRIGTYPE_UPDATE 2
#define RI_TRIGTYPE_DELETE 3
+typedef struct FastPathMeta FastPathMeta;
/*
* RI_ConstraintInfo
@@ -132,8 +136,24 @@ typedef struct RI_ConstraintInfo
Oid period_intersect_oper; /* anyrange * anyrange (or
* multiranges) */
dlist_node valid_link; /* Link in list of valid entries */
+
+ Oid conindid;
+ bool pk_is_partitioned;
+
+ FastPathMeta *fpmeta;
} RI_ConstraintInfo;
+typedef struct RI_CompareHashEntry RI_CompareHashEntry;
+
+/* Fast-path metadata for RI checks on foreign key referencing tables */
+typedef struct FastPathMeta
+{
+ RI_CompareHashEntry *compare_entries[RI_MAX_NUMKEYS];
+ RegProcedure regops[RI_MAX_NUMKEYS];
+ Oid subtypes[RI_MAX_NUMKEYS];
+ int strats[RI_MAX_NUMKEYS];
+} FastPathMeta;
+
/*
* RI_QueryKey
*
@@ -233,6 +253,23 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool is_restrict,
bool detectNewRows, int expect_OK);
+static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
+ IndexScanDesc scandesc, TupleTableSlot *slot,
+ Snapshot snapshot, const RI_ConstraintInfo *riinfo,
+ ScanKeyData *skey, int nkeys);
+static bool ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
+ bool *concurrently_updated);
+static bool ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo);
+static void ri_CheckPermissions(Relation query_rel);
+static bool recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+ TupleTableSlot *new_slot);
+static void build_index_scankeys(const RI_ConstraintInfo *riinfo,
+ Relation idx_rel, Datum *pk_vals,
+ char *pk_nulls, ScanKey skeys);
+static void ri_populate_fastpath_metadata(RI_ConstraintInfo *riinfo,
+ Relation fk_rel, Relation idx_rel);
static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
const RI_ConstraintInfo *riinfo, bool rel_is_pk,
Datum *vals, char *nulls);
@@ -276,14 +313,7 @@ RI_FKey_check(TriggerData *trigdata)
if (!table_tuple_satisfies_snapshot(trigdata->tg_relation, newslot, SnapshotSelf))
return PointerGetDatum(NULL);
- /*
- * Get the relation descriptors of the FK and PK tables.
- *
- * pk_rel is opened in RowShareLock mode since that's what our eventual
- * SELECT FOR KEY SHARE will get on it.
- */
fk_rel = trigdata->tg_relation;
- pk_rel = table_open(riinfo->pk_relid, RowShareLock);
switch (ri_NullCheck(RelationGetDescr(fk_rel), newslot, riinfo, false))
{
@@ -293,7 +323,6 @@ RI_FKey_check(TriggerData *trigdata)
* No further check needed - an all-NULL key passes every type of
* foreign key constraint.
*/
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
case RI_KEYS_SOME_NULL:
@@ -318,7 +347,6 @@ RI_FKey_check(TriggerData *trigdata)
errdetail("MATCH FULL does not allow mixing of null and nonnull key values."),
errtableconstraint(fk_rel,
NameStr(riinfo->conname))));
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
case FKCONSTR_MATCH_SIMPLE:
@@ -327,7 +355,6 @@ RI_FKey_check(TriggerData *trigdata)
* MATCH SIMPLE - if ANY column is null, the key passes
* the constraint.
*/
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
#ifdef NOT_USED
@@ -352,8 +379,31 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
+ /*
+ * Fast path: probe the PK unique index directly, bypassing SPI.
+ *
+ * For non-partitioned, non-temporal FKs, we can skip the SPI machinery
+ * (plan cache, executor setup, etc.) and do a direct index scan + tuple
+ * lock. This is semantically equivalent to the SPI path below but avoids
+ * the per-row executor overhead.
+ *
+ * ri_FastPathCheck() reports the violation itself (via ereport) if no
+ * matching PK row is found, so it only returns on success.
+ */
+ if (ri_fastpath_is_applicable(riinfo))
+ {
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
+ return PointerGetDatum(NULL);
+ }
+
SPI_connect();
+ /*
+ * pk_rel is opened in RowShareLock mode since that's what our eventual
+ * SELECT FOR KEY SHARE will get on it.
+ */
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+
/* Fetch or prepare a saved plan for the real check */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
@@ -2356,6 +2406,12 @@ ri_LoadConstraintInfo(Oid constraintOid)
riinfo->valid = true;
+ riinfo->conindid = conForm->conindid;
+ riinfo->pk_is_partitioned =
+ (get_rel_relkind(riinfo->pk_relid) == RELKIND_PARTITIONED_TABLE);
+
+ riinfo->fpmeta = NULL;
+
return riinfo;
}
@@ -2617,6 +2673,382 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
return SPI_processed != 0;
}
+/*
+ * ri_FastPathCheck
+ * Perform FK existence check via direct index probe, bypassing SPI.
+ *
+ * If no matching PK row exists, report the violation via ri_ReportViolation();
+ * otherwise, return normally.
+ */
+static void
+ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ Relation pk_rel;
+ Relation idx_rel;
+ IndexScanDesc scandesc;
+ TupleTableSlot *slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found = false;
+ Oid saved_userid;
+ int saved_sec_context;
+ Snapshot snapshot;
+
+ /*
+ * Advance the command counter so the snapshot sees the effects of prior
+ * triggers in this statement. Mirrors what the SPI path does in
+ * ri_PerformCheck().
+ */
+ CommandCounterIncrement();
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ idx_rel = index_open(riinfo->conindid, AccessShareLock);
+
+ slot = table_slot_create(pk_rel, NULL);
+ scandesc = index_beginscan(pk_rel, idx_rel,
+ snapshot, NULL,
+ riinfo->nkeys, 0);
+
+ if (riinfo->fpmeta == NULL)
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ Assert(riinfo->fpmeta);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ ri_CheckPermissions(pk_rel);
+
+ ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, slot,
+ snapshot, riinfo, skey, riinfo->nkeys);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ index_endscan(scandesc);
+ ExecDropSingleTupleTableSlot(slot);
+ UnregisterSnapshot(snapshot);
+
+ if (!found)
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ newslot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+
+ index_close(idx_rel, NoLock);
+ table_close(pk_rel, NoLock);
+}
+
+/*
+ * ri_FastPathProbeOne
+ * Probe the PK index for one set of scan keys and lock the matching
+ * tuple.
+ *
+ * Returns true if a matching PK row was found, locked, and (if
+ * applicable) visible to the transaction snapshot.
+ *
+ * The caller must ensure CurrentMemoryContext is long-lived enough
+ * for the scan descriptor's internal allocations (typically
+ * TopTransactionContext when using a cached scandesc).
+ */
+static bool
+ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
+ IndexScanDesc scandesc, TupleTableSlot *slot,
+ Snapshot snapshot, const RI_ConstraintInfo *riinfo,
+ ScanKeyData *skey, int nkeys)
+{
+ bool found = false;
+
+ index_rescan(scandesc, skey, nkeys, NULL, 0);
+
+ if (index_getnext_slot(scandesc, ForwardScanDirection, slot))
+ {
+ bool concurrently_updated;
+
+ if (ri_LockPKTuple(pk_rel, slot, snapshot,
+ &concurrently_updated))
+ {
+ if (concurrently_updated)
+ found = recheck_matched_pk_tuple(idx_rel, skey, slot);
+ else
+ found = true;
+ }
+ }
+
+ return found;
+}
+
+/*
+ * ri_LockPKTuple
+ * Lock a PK tuple found by the fast-path index scan.
+ *
+ * Calls table_tuple_lock() directly with handling specific to RI checks.
+ * Returns true if the tuple was successfully locked.
+ *
+ * Sets *concurrently_updated to true if the locked tuple was reached
+ * by following an update chain (tmfd.traversed), indicating the caller
+ * should recheck the key.
+ */
+static bool
+ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
+ bool *concurrently_updated)
+{
+ TM_FailureData tmfd;
+ TM_Result result;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+
+ *concurrently_updated = false;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ result = table_tuple_lock(pk_rel, &slot->tts_tid, snap,
+ slot, GetCurrentCommandId(false),
+ LockTupleKeyShare, LockWaitBlock,
+ lockflags, &tmfd);
+
+ switch (result)
+ {
+ case TM_Ok:
+ if (tmfd.traversed)
+ *concurrently_updated = true;
+ return true;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ return false;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+
+ /*
+ * In READ COMMITTED, FIND_LAST_VERSION should have chased the
+ * chain and returned TM_Ok. Getting here means something
+ * unexpected happened, so raise an error.
+ */
+ elog(ERROR, "unexpected table_tuple_lock status: %u", result);
+ break;
+
+ case TM_SelfModified:
+
+ /*
+ * The current command or a later command in this transaction
+ * modified the PK row. This shouldn't normally happen during an
+ * FK check (we're not modifying pk_rel), but handle it safely by
+ * treating the tuple as not found.
+ */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ break;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", result);
+ break;
+ }
+
+ return false; /* keep compiler quiet */
+}
+
+static bool
+ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo)
+{
+ /*
+ * Partitioned referenced tables are skipped for simplicity, since they
+ * require routing the probe through the correct partition using
+ * PartitionDirectory.
+ */
+ if (riinfo->pk_is_partitioned)
+ return false;
+
+ /*
+ * Temporal foreign keys use range overlap and containment semantics (&&,
+ * <@, range_agg()) that inherently involve aggregation and multiple-row
+ * reasoning, so they stay on the SPI path.
+ */
+ if (riinfo->hasperiod)
+ return false;
+
+ return true;
+}
+
+/*
+ * ri_CheckPermissions
+ * Check that the current user has USAGE on the schema of 'query_rel'
+ * and SELECT on 'query_rel' itself.
+ */
+static void
+ri_CheckPermissions(Relation query_rel)
+{
+ AclResult aclresult;
+
+ /* USAGE on schema. */
+ aclresult = object_aclcheck(NamespaceRelationId,
+ RelationGetNamespace(query_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(query_rel)));
+
+ /* SELECT on relation. */
+ aclresult = pg_class_aclcheck(RelationGetRelid(query_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(query_rel));
+}
+
+/*
+ * recheck_matched_pk_tuple
+ * After following an update chain (tmfd.traversed), verify that
+ * the locked PK tuple still matches the original search keys.
+ *
+ * A non-key update (e.g. changing a non-PK column) creates a new tuple version
+ * that we've now locked, but the key is unchanged -- that's fine. A key
+ * update means the value we were looking for is gone, so we should treat it as
+ * not found.
+ */
+static bool
+recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+ TupleTableSlot *new_slot)
+{
+ /*
+ * TODO: BuildIndexInfo does a syscache lookup + palloc on every call.
+ * This only fires on the concurrent-update path (tmfd.traversed), which
+ * should be rare, so the cost is acceptable for now. If profiling shows
+ * otherwise, cache the IndexInfo in FastPathMeta.
+ */
+ IndexInfo *indexInfo = BuildIndexInfo(idxrel);
+ Datum values[INDEX_MAX_KEYS];
+ bool isnull[INDEX_MAX_KEYS];
+ bool matched = true;
+
+ /* PK indexes never have these. */
+ Assert(indexInfo->ii_Expressions == NIL &&
+ indexInfo->ii_ExclusionOps == NULL);
+
+ /* Form the index values and isnull flags given the table tuple. */
+ FormIndexDatum(indexInfo, new_slot, NULL, values, isnull);
+ for (int i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
+ {
+ ScanKeyData *skey = &skeys[i];
+
+ /* A PK column can never be set to NULL. */
+ Assert(!isnull[i]);
+ if (!DatumGetBool(FunctionCall2Coll(&skey->sk_func,
+ skey->sk_collation,
+ values[i],
+ skey->sk_argument)))
+ {
+ matched = false;
+ break;
+ }
+ }
+
+ return matched;
+}
+
+/*
+ * build_index_scankeys
+ * Build ScanKeys for a direct index probe of the PK's unique index.
+ *
+ * Uses cached compare entries, operator procedures, and strategy numbers
+ * from ri_populate_fastpath_metadata() rather than looking them up on
+ * each invocation. Casts FK values to the operator's expected input
+ * type if needed.
+ */
+static void
+build_index_scankeys(const RI_ConstraintInfo *riinfo,
+ Relation idx_rel, Datum *pk_vals,
+ char *pk_nulls, ScanKey skeys)
+{
+ FastPathMeta *fpmeta = riinfo->fpmeta;
+
+ Assert(fpmeta);
+
+ /*
+ * May need to cast each of the individual values of the foreign key to
+ * the corresponding PK column's type if the equality operator demands it.
+ */
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ RI_CompareHashEntry *entry = fpmeta->compare_entries[i];
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+
+ /*
+ * Set up ScanKeys for the index scan. This is essentially how
+ * ExecIndexBuildScanKeys() sets them up.
+ */
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ int pkattrno = i + 1;
+
+ ScanKeyEntryInitialize(&skeys[i], 0, pkattrno,
+ fpmeta->strats[i], fpmeta->subtypes[i],
+ idx_rel->rd_indcollation[i], fpmeta->regops[i],
+ pk_vals[i]);
+ }
+}
+
+/*
+ * ri_populate_fastpath_metadata
+ * Cache per-key metadata needed by build_index_scankeys().
+ *
+ * Looks up the compare hash entry, operator procedure OID, and index
+ * strategy/subtype for each key column. Called lazily on first use
+ * and persists for the lifetime of the RI_ConstraintInfo entry.
+ */
+static void
+ri_populate_fastpath_metadata(RI_ConstraintInfo *riinfo,
+ Relation fk_rel, Relation idx_rel)
+{
+ FastPathMeta *fpmeta;
+ MemoryContext oldcxt = MemoryContextSwitchTo(TopMemoryContext);
+
+ Assert(riinfo != NULL && riinfo->valid);
+
+ fpmeta = palloc_object(FastPathMeta);
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ Oid eq_opr = riinfo->pf_eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ Oid lefttype;
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ fpmeta->compare_entries[i] = entry;
+ fpmeta->regops[i] = get_opcode(eq_opr);
+
+ get_op_opfamily_properties(eq_opr,
+ idx_rel->rd_opfamily[i],
+ false,
+ &fpmeta->strats[i],
+ &lefttype,
+ &fpmeta->subtypes[i]);
+ }
+
+ riinfo->fpmeta = fpmeta;
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* Extract fields from a tuple into Datum/nulls arrays
*/
@@ -3169,8 +3601,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the FK column type already matches what the
+ * operator expects. For same-type operators, that's the common type.
+ * For cross-type operators (e.g. int48eq for int4 PK / int8 FK), the
+ * FK value is the right operand, so skip the cast if typeid matches
+ * righttype.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/test/isolation/expected/fk-concurrent-pk-upd.out b/src/test/isolation/expected/fk-concurrent-pk-upd.out
new file mode 100644
index 00000000000..4dd9535d3c0
--- /dev/null
+++ b/src/test/isolation/expected/fk-concurrent-pk-upd.out
@@ -0,0 +1,105 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2b s2ukey s1b s1i s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2c: COMMIT;
+step s1i: <... completed>
+ERROR: insert or update on table "child" violates foreign key constraint "child_parent_key_fkey"
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 2|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+(0 rows)
+
+
+starting permutation: s2b s2uaux s1b s1i s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2uaux: UPDATE parent SET aux = 'bar' WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1);
+step s2c: COMMIT;
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|bar
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
+
+starting permutation: s2b s2ukey s1b s1i s2ukey2 s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2ukey2: UPDATE parent SET parent_key = 1 WHERE parent_key = 2;
+step s2c: COMMIT;
+step s1i: <... completed>
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
+
+starting permutation: s2b s2ukey s3b s3i s2c s3c s2s s3s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s3b: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s3i: INSERT INTO child VALUES (2, 1); <waiting ...>
+step s2c: COMMIT;
+step s3i: <... completed>
+ERROR: could not serialize access due to concurrent update
+step s3c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 2|foo
+(1 row)
+
+step s3s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+(0 rows)
+
+
+starting permutation: s2b s2uaux s3b s3i s2c s3c s2s s3s
+step s2b: BEGIN;
+step s2uaux: UPDATE parent SET aux = 'bar' WHERE parent_key = 1;
+step s3b: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s3i: INSERT INTO child VALUES (2, 1);
+step s2c: COMMIT;
+step s3c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|bar
+(1 row)
+
+step s3s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 2| 1
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 4e466580cd4..c1a999bf1d2 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -37,6 +37,7 @@ test: fk-partitioned-2
test: fk-snapshot
test: fk-snapshot-2
test: fk-snapshot-3
+test: fk-concurrent-pk-upd
test: subxid-overflow
test: eval-plan-qual
test: eval-plan-qual-trigger
diff --git a/src/test/isolation/specs/fk-concurrent-pk-upd.spec b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
new file mode 100644
index 00000000000..03dc7f260cd
--- /dev/null
+++ b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
@@ -0,0 +1,53 @@
+# Tests that an INSERT on the referencing table correctly fails when
+# the referenced value disappears due to a concurrent update
+setup
+{
+ CREATE TABLE parent (
+ parent_key int PRIMARY KEY,
+ aux text NOT NULL
+ );
+
+ CREATE TABLE child (
+ child_key int PRIMARY KEY,
+ parent_key int8 NOT NULL REFERENCES parent
+ );
+
+ INSERT INTO parent VALUES (1, 'foo');
+}
+
+teardown
+{
+ DROP TABLE parent, child;
+}
+
+session s1
+step s1b { BEGIN; }
+step s1i { INSERT INTO child VALUES (1, 1); }
+step s1c { COMMIT; }
+step s1s { SELECT * FROM child; }
+
+session s2
+step s2b { BEGIN; }
+step s2ukey { UPDATE parent SET parent_key = 2 WHERE parent_key = 1; }
+step s2uaux { UPDATE parent SET aux = 'bar' WHERE parent_key = 1; }
+step s2ukey2 { UPDATE parent SET parent_key = 1 WHERE parent_key = 2; }
+step s2c { COMMIT; }
+step s2s { SELECT * FROM parent; }
+
+session s3
+step s3b { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s3i { INSERT INTO child VALUES (2, 1); }
+step s3c { COMMIT; }
+step s3s { SELECT * FROM child; }
+
+# fail
+permutation s2b s2ukey s1b s1i s2c s1c s2s s1s
+# ok
+permutation s2b s2uaux s1b s1i s2c s1c s2s s1s
+# ok
+permutation s2b s2ukey s1b s1i s2ukey2 s2c s1c s2s s1s
+
+# RR: key update -> serialization failure
+permutation s2b s2ukey s3b s3i s2c s3c s2s s3s
+# RR: non-key update -> old version visible via transaction snapshot
+permutation s2b s2uaux s3b s3i s2c s3c s2s s3s
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 9ae4dbf1b0a..0826f518004 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -370,6 +370,53 @@ SELECT * FROM PKTABLE;
DROP TABLE FKTABLE;
DROP TABLE PKTABLE;
--
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+SET ROLE regress_foreign_key_user;
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+RESET ROLE;
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+ERROR: permission denied for table pktable
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
-- Check initial check upon ALTER TABLE
--
CREATE TABLE PKTABLE ( ptest1 int, ptest2 int, PRIMARY KEY(ptest1, ptest2) );
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index 3b8c95bf893..e9ee29331cb 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -242,6 +242,70 @@ SELECT * FROM PKTABLE;
DROP TABLE FKTABLE;
DROP TABLE PKTABLE;
+--
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+SET ROLE regress_foreign_key_user;
+
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+
+RESET ROLE;
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+
+DROP USER regress_foreign_key_user;
+
--
-- Check initial check upon ALTER TABLE
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 4673eca9cd6..f840f471b35 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -811,6 +811,7 @@ ExtensionInfo
ExtensionLocation
ExtensionSiblingCache
ExtensionVersionInfo
+FastPathMeta
FDWCollateState
FD_SET
FILE
--
2.41.0
[application/octet-stream] v8-0002-Cache-per-batch-resources-for-fast-path-foreign-k.patch (28.2K, 3-v8-0002-Cache-per-batch-resources-for-fast-path-foreign-k.patch)
From 5c97f903235b72ed021a095381f403fbdf8d2e4f Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Mon, 16 Mar 2026 20:57:35 +0900
Subject: [PATCH v8 2/3] Cache per-batch resources for fast-path foreign key
checks
The fast-path FK check introduced in the previous commit opens and
closes the PK relation, index, scan descriptor, and tuple slot on
every trigger invocation. For bulk operations that fire thousands of
FK triggers in a single statement, this repeated setup/teardown
dominates the cost.
Introduce RI_FastPathEntry, a per-constraint hash table entry that
caches the open Relations (pk_rel, idx_rel), IndexScanDesc,
TupleTableSlot, and a registered Snapshot across all trigger
invocations within a single trigger-firing batch. Entries are created
lazily on first use via ri_FastPathGetEntry() and persist until the
batch ends.
The snapshot is registered once at entry creation time, and its
curcid is patched in place on each subsequent row rather than
taking a fresh snapshot per invocation. This avoids the per-row
GetSnapshotData() cost. Under REPEATABLE READ the transaction
snapshot is immutable, so caching it changes nothing. Under READ
COMMITTED the cached snapshot will not reflect PK rows committed by
other backends mid-batch, but this is acceptable for three reasons.
First, the FK check only needs PK rows visible before the statement
began plus the effects of earlier triggers (tracked by curcid).
Second, concurrent commits would not be reliably visible even with
per-row snapshots, since trigger firing order is nondeterministic.
Third, LockTupleKeyShare prevents the PK row from disappearing
regardless.
SnapshotSetCommandId() only patches the process-global statics, not
registered copies, so we patch entry->snapshot->curcid directly.
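The curcid point above may be easier to see in a toy model. This is only
a sketch and not PostgreSQL code: ToySnapshot, toy_register_snapshot(),
toy_set_command_id() and toy_patch_curcid() are hypothetical stand-ins,
and the model assumes that registering a snapshot hands back a private
copy of a process-global one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t ToyCommandId;

/* Hypothetical stand-in for a snapshot; only the field of interest. */
typedef struct ToySnapshot
{
	ToyCommandId curcid;
} ToySnapshot;

/* Models the process-global static snapshot. */
static ToySnapshot toy_current_snapshot = {0};

/* Models registration: the caller gets a private heap copy. */
static ToySnapshot *
toy_register_snapshot(void)
{
	ToySnapshot *copy = malloc(sizeof(ToySnapshot));

	if (copy == NULL)
		abort();
	*copy = toy_current_snapshot;
	return copy;
}

/* Models a setter that touches only the global static, never copies. */
static void
toy_set_command_id(ToyCommandId cid)
{
	toy_current_snapshot.curcid = cid;
}

/* What a cached entry has to do instead: patch its own copy directly. */
static void
toy_patch_curcid(ToySnapshot *registered, ToyCommandId cid)
{
	assert(registered != NULL);
	registered->curcid = cid;
}
```

In this model, a copy taken while the global curcid is 1 keeps reporting
1 after toy_set_command_id(2) and only moves to 2 once
toy_patch_curcid() is called on it. That is the behaviour the paragraph
above attributes to registered snapshots.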
Permission checks (schema USAGE + table SELECT) are performed once at
cache entry creation rather than per flush. The RI check runs as the
PK table owner (via SetUserIdAndSecContext), so in practice these
checks verify that the owner has access to their own table -- a
condition that holds unless privileges have been explicitly revoked
from the owner, which would equally break the SPI path. Checking
once per batch avoids repeated syscache lookups from
pg_class_aclcheck() with no user-visible behavior change.
Lifecycle management:
- AfterTriggerBatchCallback: A new general-purpose callback
mechanism in trigger.c. Callbacks registered via
RegisterAfterTriggerBatchCallback() fire at the end of each
trigger-firing batch (AfterTriggerEndQuery for immediate
constraints, AfterTriggerFireDeferred at COMMIT, and
AfterTriggerSetState for SET CONSTRAINTS IMMEDIATE). The RI code
registers ri_FastPathTeardown as a batch callback, which does
orderly teardown: index_endscan, index_close, table_close,
ExecDropSingleTupleTableSlot, UnregisterSnapshot.
- XactCallback: ri_FastPathXactCallback NULLs the static cache
pointer at transaction end. On the normal path, cleanup already
ran via the batch callback; this handles the abort path where
TopTransactionContext destruction frees the memory but
ResourceOwner handles the actual resource cleanup.
- SubXactCallback: ri_FastPathSubXactCallback NULLs the static
cache pointer on subtransaction abort. ResourceOwner already
cleaned up the resources; this prevents the batch callback from
trying to double-close them.
- AfterTriggerBatchIsActive(): Exported accessor that returns true
when afterTriggers.query_depth >= 0. During ALTER TABLE ... ADD
FOREIGN KEY validation, RI triggers are called directly outside
the after-trigger framework, so batch callbacks would never fire.
The fast-path code uses this to fall back to a non-cached
per-invocation path (open/scan/close each call) in that context.
---
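The batch-callback lifecycle described in the commit message can be
sketched as a small standalone model. All names below (toy_query_depth,
toy_register_batch_callback(), toy_fire_batch_callbacks(),
toy_batch_is_active(), toy_teardown()) are hypothetical, and the fixed
array stands in for the List kept in afterTriggers. It is meant as an
illustration of the register / fire-once / clear pattern under those
assumptions, not as the patch's implementation.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CALLBACKS 8

typedef void (*ToyBatchCallback) (void *arg);

typedef struct ToyCallbackItem
{
	ToyBatchCallback callback;
	void	   *arg;
} ToyCallbackItem;

/* Toy counterparts of the query depth and the callback list. */
static int	toy_query_depth = -1;
static ToyCallbackItem toy_callbacks[TOY_MAX_CALLBACKS];
static int	toy_ncallbacks = 0;

/* Same test as the exported accessor: true when query depth is >= 0. */
static int
toy_batch_is_active(void)
{
	return toy_query_depth >= 0;
}

/* Append a callback; it stays queued until the next successful fire. */
static void
toy_register_batch_callback(ToyBatchCallback callback, void *arg)
{
	assert(toy_ncallbacks < TOY_MAX_CALLBACKS);
	toy_callbacks[toy_ncallbacks].callback = callback;
	toy_callbacks[toy_ncallbacks].arg = arg;
	toy_ncallbacks++;
}

/* Invoke and clear all callbacks, but never from a nested query level. */
static void
toy_fire_batch_callbacks(void)
{
	if (toy_query_depth > 0)
		return;
	for (int i = 0; i < toy_ncallbacks; i++)
		toy_callbacks[i].callback(toy_callbacks[i].arg);
	toy_ncallbacks = 0;
}

/* Example teardown callback: counts how many times it has run. */
static void
toy_teardown(void *arg)
{
	(*(int *) arg)++;
}
```

A caller in this model would check toy_batch_is_active() first and
register toy_teardown only when it returns true, falling back to
per-call setup and teardown otherwise. Because the list is cleared after
firing, a callback registered in one batch does not run again in the
next unless it is registered again.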
src/backend/commands/trigger.c | 90 +++++++
src/backend/utils/adt/ri_triggers.c | 275 +++++++++++++++++++++-
src/include/commands/trigger.h | 18 ++
src/test/regress/expected/foreign_key.out | 86 +++++++
src/test/regress/sql/foreign_key.sql | 80 +++++++
src/tools/pgindent/typedefs.list | 3 +
6 files changed, 549 insertions(+), 3 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 6596843a8d8..b7442cf6cb1 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3893,6 +3893,8 @@ typedef struct AfterTriggersData
/* per-subtransaction-level data: */
AfterTriggersTransData *trans_stack; /* array of structs shown below */
int maxtransdepth; /* allocated len of above array */
+
+ List *batch_callbacks; /* List of AfterTriggerCallbackItem */
} AfterTriggersData;
struct AfterTriggersQueryData
@@ -3929,6 +3931,13 @@ struct AfterTriggersTableData
TupleTableSlot *storeslot; /* for converting to tuplestore's format */
};
+/* Entry in afterTriggers.batch_callbacks */
+typedef struct AfterTriggerCallbackItem
+{
+ AfterTriggerBatchCallback callback;
+ void *arg;
+} AfterTriggerCallbackItem;
+
static AfterTriggersData afterTriggers;
static void AfterTriggerExecute(EState *estate,
@@ -3964,6 +3973,7 @@ static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
Oid tgoid, bool tgisdeferred);
static void cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent);
+static void FireAfterTriggerBatchCallbacks(void);
/*
* Get the FDW tuplestore for the current trigger query level, creating it
@@ -5089,6 +5099,7 @@ AfterTriggerBeginXact(void)
*/
afterTriggers.firing_counter = (CommandId) 1; /* mustn't be 0 */
afterTriggers.query_depth = -1;
+ afterTriggers.batch_callbacks = NIL;
/*
* Verify that there is no leftover state remaining. If these assertions
@@ -5210,6 +5221,8 @@ AfterTriggerEndQuery(EState *estate)
break;
}
+ FireAfterTriggerBatchCallbacks();
+
/* Release query-level-local storage, including tuplestores if any */
AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
@@ -5317,6 +5330,8 @@ AfterTriggerFireDeferred(void)
break; /* all fired */
}
+ FireAfterTriggerBatchCallbacks();
+
/*
* We don't bother freeing the event list, since it will go away anyway
* (and more efficiently than via pfree) in AfterTriggerEndXact.
@@ -6059,6 +6074,8 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
break; /* all fired */
}
+ FireAfterTriggerBatchCallbacks();
+
if (snapshot_set)
PopActiveSnapshot();
}
@@ -6755,3 +6772,76 @@ check_modified_virtual_generated(TupleDesc tupdesc, HeapTuple tuple)
return tuple;
}
+
+/*
+ * RegisterAfterTriggerBatchCallback
+ * Register a function to be called when the current trigger-firing
+ * batch completes.
+ *
+ * Must be called from within a trigger function's execution context
+ * (i.e., while afterTriggers state is active).
+ *
+ * The callback list is cleared after invocation, so the caller must
+ * re-register for each new batch if needed.
+ */
+void
+RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg)
+{
+ AfterTriggerCallbackItem *item;
+ MemoryContext oldcxt;
+
+ /*
+ * Allocate in TopTransactionContext so the item survives for the duration
+ * of the batch, which may span multiple trigger invocations.
+ */
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ item = palloc(sizeof(AfterTriggerCallbackItem));
+ item->callback = callback;
+ item->arg = arg;
+ afterTriggers.batch_callbacks =
+ lappend(afterTriggers.batch_callbacks, item);
+ MemoryContextSwitchTo(oldcxt);
+}
+
+/*
+ * FireAfterTriggerBatchCallbacks
+ * Invoke and clear all registered batch callbacks.
+ *
+ * Only fires at the outermost query level (query_depth == 0) or from
+ * top-level operations (query_depth == -1, e.g. AfterTriggerFireDeferred
+ * at COMMIT). Nested queries from SPI inside AFTER triggers run at
+ * depth > 0 and must not tear down resources the outer batch still needs.
+ */
+static void
+FireAfterTriggerBatchCallbacks(void)
+{
+ ListCell *lc;
+
+ if (afterTriggers.query_depth > 0)
+ return;
+
+ foreach(lc, afterTriggers.batch_callbacks)
+ {
+ AfterTriggerCallbackItem *item = lfirst(lc);
+
+ item->callback(item->arg);
+ }
+
+ list_free_deep(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = NIL;
+}
+
+/*
+ * AfterTriggerBatchIsActive
+ * Returns true if we're inside a query-level trigger batch where
+ * registered batch callbacks will actually be invoked.
+ *
+ * This is false during validateForeignKeyConstraint(), which calls
+ * RI trigger functions directly outside the after-trigger framework.
+ */
+bool
+AfterTriggerBatchIsActive(void)
+{
+ return afterTriggers.query_depth >= 0;
+}
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index ce0f5c120f4..84bf7d74ec5 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -196,6 +196,23 @@ typedef struct RI_CompareHashEntry
FmgrInfo cast_func_finfo; /* in case we must coerce input */
} RI_CompareHashEntry;
+/*
+ * RI_FastPathEntry
+ * Per-constraint cache of resources needed by ri_FastPathCheckCached().
+ *
+ * One entry per constraint, keyed by pg_constraint OID. Created lazily
+ * by ri_FastPathGetEntry() on first use within a trigger-firing batch
+ * and torn down by ri_FastPathTeardown() at batch end.
+ */
+typedef struct RI_FastPathEntry
+{
+ Oid conoid; /* hash key: pg_constraint OID */
+ Relation pk_rel;
+ Relation idx_rel;
+ IndexScanDesc scandesc;
+ TupleTableSlot *slot;
+ Snapshot snapshot; /* registered snapshot for the scan */
+} RI_FastPathEntry;
/*
* Local data
@@ -205,6 +222,8 @@ static HTAB *ri_query_cache = NULL;
static HTAB *ri_compare_cache = NULL;
static dclist_head ri_constraint_cache_valid_list;
+static HTAB *ri_fastpath_cache = NULL;
+static bool ri_fastpath_callback_registered = false;
/*
* Local function prototypes
@@ -255,6 +274,8 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
bool detectNewRows, int expect_OK);
static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
IndexScanDesc scandesc, TupleTableSlot *slot,
Snapshot snapshot, const RI_ConstraintInfo *riinfo,
@@ -277,6 +298,9 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool is_restrict, bool partgone);
+static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel);
+static void ri_FastPathTeardown(void *arg);
/*
@@ -387,12 +411,16 @@ RI_FKey_check(TriggerData *trigdata)
* lock. This is semantically equivalent to the SPI path below but avoids
* the per-row executor overhead.
*
- * ri_FastPathCheck() reports the violation itself (via ereport) if no
- * matching PK row is found, so it only returns on success.
+ * ri_FastPathCheckCached() and ri_FastPathCheck() report the violation
+ * themselves if no matching PK row is found, so they only return on
+ * success.
*/
if (ri_fastpath_is_applicable(riinfo))
{
- ri_FastPathCheck(riinfo, fk_rel, newslot);
+ if (AfterTriggerBatchIsActive())
+ ri_FastPathCheckCached(riinfo, fk_rel, newslot);
+ else
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
return PointerGetDatum(NULL);
}
@@ -2742,6 +2770,73 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
table_close(pk_rel, NoLock);
}
+/*
+ * ri_FastPathCheckCached
+ * Cached-resource variant of ri_FastPathCheck for use within the
+ * after-trigger framework.
+ *
+ * Uses the per-batch cache (RI_FastPathEntry) to avoid per-row relation
+ * open/close, scan begin/end, and snapshot registration. The snapshot's
+ * curcid is patched each call so the scan sees effects of prior triggers.
+ *
+ * Like ri_FastPathCheck, reports the violation via ri_ReportViolation()
+ * if no matching PK row is found.
+ */
+static void
+ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ IndexScanDesc scandesc = fpentry->scandesc;
+ Snapshot snapshot = fpentry->snapshot;
+ TupleTableSlot *slot = fpentry->slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found;
+ Oid saved_userid;
+ int saved_sec_context;
+ MemoryContext oldcxt;
+
+ /*
+ * Advance the command counter and patch the cached snapshot's curcid so
+ * the scan sees PK rows inserted by earlier triggers in this statement.
+ */
+ CommandCounterIncrement();
+ fpentry->snapshot->curcid = GetCurrentCommandId(false);
+
+ if (riinfo->fpmeta == NULL)
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ Assert(riinfo->fpmeta);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+
+ /*
+ * The cached scandesc lives in TopTransactionContext, but the btree AM
+ * defers some allocations to the first index_getnext_slot call. Ensure
+ * those land in TopTransactionContext too.
+ */
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, slot, snapshot,
+ riinfo, skey, riinfo->nkeys);
+ MemoryContextSwitchTo(oldcxt);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+ if (!found)
+ ri_ReportViolation(riinfo, pk_rel, fk_rel, newslot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+}
+
/*
* ri_FastPathProbeOne
* Probe the PK index for one set of scan keys, lock the matching
@@ -3672,3 +3767,177 @@ RI_FKey_trigger_type(Oid tgfoid)
return RI_TRIGGER_NONE;
}
+
+/*
+ * ri_FastPathTeardown
+ * Tear down all cached fast-path state.
+ *
+ * Called as an AfterTriggerBatchCallback at end of batch.
+ */
+static void
+ri_FastPathTeardown(void *arg)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ /* End the scan before closing idx_rel. */
+ if (entry->scandesc)
+ index_endscan(entry->scandesc);
+ if (entry->idx_rel)
+ index_close(entry->idx_rel, NoLock);
+ if (entry->pk_rel)
+ table_close(entry->pk_rel, NoLock);
+ if (entry->slot)
+ ExecDropSingleTupleTableSlot(entry->slot);
+ if (entry->snapshot)
+ UnregisterSnapshot(entry->snapshot);
+ }
+
+ hash_destroy(ri_fastpath_cache);
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static bool ri_fastpath_xact_callback_registered = false;
+
+static void
+ri_FastPathXactCallback(XactEvent event, void *arg)
+{
+ /*
+ * TopTransactionContext is destroyed at end of transaction, taking the
+ * hash table and all cached resources with it. Just reset our static
+ * pointers so we don't dereference freed memory.
+ *
+ * In the normal (non-error) path, ri_FastPathTeardown already ran via the
+ * batch callback and did orderly teardown. Here we're just handling the
+ * abort path where that callback never fired.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static void
+ri_FastPathSubXactCallback(SubXactEvent event, SubTransactionId mySubid,
+ SubTransactionId parentSubid, void *arg)
+{
+ if (event == SUBXACT_EVENT_ABORT_SUB)
+ {
+ /*
+ * ResourceOwner already cleaned up relations and snapshots. Just
+ * NULL our pointers so the still-registered batch callback becomes a
+ * no-op. The hash table memory in TopTransactionContext will be
+ * freed at transaction end.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+ }
+}
+
+/*
+ * ri_FastPathGetEntry
+ * Look up or create a per-batch cache entry for the given constraint.
+ *
+ * On first call for a constraint within a batch: opens pk_rel and the
+ * index, begins an index scan, allocates a result slot, and registers
+ * the cleanup callback.
+ *
+ * On subsequent calls: returns the existing entry. Caller uses
+ * index_rescan() with new keys.
+ */
+static RI_FastPathEntry *
+ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+ RI_FastPathEntry *entry;
+ bool found;
+
+ /* Create hash table on first use in this batch */
+ if (ri_fastpath_cache == NULL)
+ {
+ HASHCTL ctl;
+
+ if (!ri_fastpath_xact_callback_registered)
+ {
+ RegisterXactCallback(ri_FastPathXactCallback, NULL);
+ RegisterSubXactCallback(ri_FastPathSubXactCallback, NULL);
+ ri_fastpath_xact_callback_registered = true;
+ }
+
+ ctl.keysize = sizeof(Oid);
+ ctl.entrysize = sizeof(RI_FastPathEntry);
+ ctl.hcxt = TopTransactionContext;
+ ri_fastpath_cache = hash_create("RI fast-path cache",
+ 16,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ }
+
+ entry = hash_search(ri_fastpath_cache, &riinfo->constraint_id,
+ HASH_ENTER, &found);
+
+ if (!found)
+ {
+ MemoryContext oldcxt;
+ Oid saved_userid;
+ int saved_sec_context;
+
+ /*
+ * Zero out non-key fields so ri_FastPathTeardown is safe if we error
+ * out during partial initialization below.
+ */
+ memset(((char *) entry) + offsetof(RI_FastPathEntry, pk_rel), 0,
+ sizeof(RI_FastPathEntry) - offsetof(RI_FastPathEntry, pk_rel));
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+
+ /*
+ * Open PK table and its unique index.
+ *
+ * RowShareLock on pk_rel matches what the SPI path's SELECT ... FOR
+ * KEY SHARE would acquire as a relation-level lock. AccessShareLock
+ * on the index is standard for index scans.
+ *
+ * We don't release these locks until end of transaction, matching SPI
+ * behavior.
+ */
+ entry->pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ entry->idx_rel = index_open(riinfo->conindid, AccessShareLock);
+
+ /*
+ * Register an initial snapshot. Its curcid will be patched in place
+ * on each subsequent row (see ri_FastPathCheckCached()), avoiding
+ * per-row GetSnapshotData() overhead.
+ */
+ entry->snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+ entry->slot = table_slot_create(entry->pk_rel, NULL);
+
+ entry->scandesc = index_beginscan(entry->pk_rel, entry->idx_rel,
+ entry->snapshot, NULL,
+ riinfo->nkeys, 0);
+
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Ensure cleanup at end of this trigger-firing batch */
+ if (!ri_fastpath_callback_registered)
+ {
+ RegisterAfterTriggerBatchCallback(ri_FastPathTeardown, NULL);
+ ri_fastpath_callback_registered = true;
+ }
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(entry->pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ ri_CheckPermissions(entry->pk_rel);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ }
+
+ return entry;
+}
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 27af5284406..7664298f5c8 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -289,4 +289,22 @@ extern void RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel,
extern int RI_FKey_trigger_type(Oid tgfoid);
+/*
+ * Callback type for end-of-trigger-batch notifications.
+ *
+ * Registered via RegisterAfterTriggerBatchCallback(). Invoked when
+ * a batch of after-trigger processing completes:
+ * - AfterTriggerEndQuery() (immediate constraints)
+ * - AfterTriggerFireDeferred() (deferred constraints at COMMIT)
+ * - AfterTriggerSetState() (SET CONSTRAINTS IMMEDIATE)
+ *
+ * The callback list is cleared after each batch. Callers must
+ * re-register if they need to be called again in a subsequent batch.
+ */
+typedef void (*AfterTriggerBatchCallback) (void *arg);
+
+extern void RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg);
+extern bool AfterTriggerBatchIsActive(void);
+
#endif /* TRIGGER_H */
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 0826f518004..25d505c6c12 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3504,3 +3504,89 @@ DETAIL: drop cascades to table fkpart13_t1
drop cascades to table fkpart13_t2
drop cascades to table fkpart13_t3
RESET search_path;
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+ERROR: insert or update on table "fp_fk_alter" violates foreign key constraint "fp_fk_alter_a_fkey"
+DETAIL: Key (a)=(101) is not present in table "fp_pk_alter".
+DROP TABLE fp_fk_alter, fp_pk_alter;
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+ERROR: insert or update on table "fp_fk_alter2" violates foreign key constraint "fp_fk_alter2_a_fkey"
+DETAIL: Key (a)=(200) is not present in table "fp_pk_alter2".
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+ERROR: insert or update on table "fp_multi_fk" violates foreign key constraint "fp_multi_fk_b_fkey"
+DETAIL: Key (b)=(2) is not present in table "fp_pk2".
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+ERROR: insert or update on table "fp_fk_defer" violates foreign key constraint "fp_fk_defer_a_fkey"
+DETAIL: Key (a)=(3) is not present in table "fp_pk_defer".
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+ a
+---
+ 1
+ 1
+(2 rows)
+
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index e9ee29331cb..cedd20c8d11 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2498,3 +2498,83 @@ WITH cte AS (
DROP SCHEMA fkpart13 CASCADE;
RESET search_path;
+
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+DROP TABLE fp_fk_alter, fp_pk_alter;
+
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f840f471b35..100986b3d84 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -30,6 +30,8 @@ AddForeignUpdateTargets_function
AddrInfo
AffixNode
AffixNodeData
+AfterTriggerBatchCallback
+AfterTriggerCallbackItem
AfterTriggerEvent
AfterTriggerEventChunk
AfterTriggerEventData
@@ -2472,6 +2474,7 @@ RIX
RI_CompareHashEntry
RI_CompareKey
RI_ConstraintInfo
+RI_FastPathEntry
RI_QueryHashEntry
RI_QueryKey
RTEKind
--
2.41.0
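The batch-callback mechanism added by 0002 (register a function, fire it only at the outermost level, then clear the list) can be modeled outside the backend. The following is a minimal standalone sketch, not PostgreSQL code: batch_state, register_batch_callback(), fire_batch_callbacks() and count_call() are hypothetical names, and malloc/free stand in for allocation in TopTransactionContext.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for AfterTriggerBatchCallback. */
typedef void (*batch_callback) (void *arg);

typedef struct callback_item
{
	batch_callback callback;
	void	   *arg;
	struct callback_item *next;
} callback_item;

/* Toy model of the relevant afterTriggers state (an assumption). */
typedef struct batch_state
{
	int			query_depth;	/* -1 = top level, 0 = outermost query */
	callback_item *head;
	callback_item *tail;
} batch_state;

/* Append a callback; it is invoked at most once, at the next batch end. */
void
register_batch_callback(batch_state *st, batch_callback cb, void *arg)
{
	callback_item *item = malloc(sizeof(callback_item));

	assert(item != NULL);
	item->callback = cb;
	item->arg = arg;
	item->next = NULL;
	if (st->tail != NULL)
		st->tail->next = item;
	else
		st->head = item;
	st->tail = item;
}

/*
 * Invoke and clear all registered callbacks, but only when not nested
 * (query_depth <= 0).  Returns the number of callbacks invoked.
 */
int
fire_batch_callbacks(batch_state *st)
{
	callback_item *item;
	int			fired = 0;

	if (st->query_depth > 0)
		return 0;				/* nested query: the outer batch still owns the list */

	item = st->head;
	st->head = NULL;
	st->tail = NULL;
	while (item != NULL)
	{
		callback_item *next = item->next;

		item->callback(item->arg);
		free(item);
		item = next;
		fired++;
	}
	return fired;
}

/* Example callback: bump the counter that arg points to. */
void
count_call(void *arg)
{
	(*(int *) arg)++;
}
```

In this sketch, a caller registers count_call once and observes that firing at depth 1 does nothing, firing at depth 0 runs it once, and a second firing at depth 0 runs nothing because the list was cleared. That is why a callback must be registered again for each new batch.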
[application/octet-stream] v8-0003-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pat.patch (27.9K, 4-v8-0003-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pat.patch)
From 4005fa410f851ac84f7eeabda9bc9c64957a6b6f Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Feb 2026 21:25:14 +0900
Subject: [PATCH v8 3/3] Batch FK rows and use SK_SEARCHARRAY for fast-path
probes
Instead of probing the PK index on each trigger invocation, buffer
FK rows in the per-constraint cache entry (RI_FastPathEntry) and
flush them as a batch. When the buffer fills (64 rows) or the
trigger-firing cycle ends, ri_FastPathBatchFlush() probes the index
for all buffered rows, sharing a single CommandCounterIncrement and
security context switch across the batch.
For single-column foreign keys, the flush builds an ArrayType from
the buffered FK values (casting to the PK-side type if needed) and
constructs a scan key with the SK_SEARCHARRAY flag. The index AM
sorts and deduplicates the array internally, then walks matching
leaf pages in one ordered traversal instead of descending from the
root once per row. A matched[] bitmap tracks which batch items
were satisfied; unmatched items are reported as violations.
Multi-column foreign keys fall back to a per-row probe loop via
ri_FastPathFlushLoop().
FK tuples are materialized via ExecCopySlotHeapTuple() into
TopTransactionContext so they survive across trigger invocations.
Violations are reported immediately during the flush via
ri_ReportViolation(), which does not return.
ri_FastPathEndBatch() flushes any partial batch before tearing down
cached resources. Since the FK relation may already be closed by
flush time (e.g. for deferred constraints at COMMIT), it reopens the
relation using entry->riinfo->fk_relid if needed.
The non-cached path (ALTER TABLE validation) bypasses batching and
continues to call ri_FastPathCheck() directly per row.
Introduce two purpose-specific memory contexts:
- scan_cxt: Child of TopTransactionContext for index scan
allocations (e.g. _bt_preprocess_keys). Lives for the batch,
deleted at teardown, so these allocations are freed when the
trigger batch ends instead of at transaction end.
- flush_cxt: Child of scan_cxt for per-flush transient work (cast
results, search array). Reset after each flush; deleting scan_cxt
in teardown also frees flush_cxt.
ri_FastPathFlushArray() and ri_FastPathFlushLoop() switch to scan_cxt
around index_getnext_slot() calls and to flush_cxt for per-flush
work. ri_FastPathBatchFlush() restores the caller's memory context
after the flush.
---
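The buffer-then-flush scheme described in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions: a sorted int array stands in for the PK index, qsort plus an in-place dedupe stands in for what the index AM does with the search array, and toy_batch, toy_batch_add() and toy_batch_flush() are hypothetical names that do not appear in the patch. Where the patch reports a violation via ri_ReportViolation(), the sketch returns the position of the first unmatched buffered row.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_BATCH_SIZE 64		/* mirrors RI_FASTPATH_BATCH_SIZE */

typedef struct toy_batch
{
	int			keys[TOY_BATCH_SIZE];	/* buffered "FK" values */
	int			count;
} toy_batch;

int
toy_cmp_int(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

/*
 * Probe every buffered key against pk_sorted (ascending, unique) in one
 * ordered pass.  Returns the batch position of the first unmatched row,
 * or -1 if all rows matched.  The buffer is reset either way.
 */
int
toy_batch_flush(toy_batch *b, const int *pk_sorted, size_t npk)
{
	int			search[TOY_BATCH_SIZE];
	bool		matched[TOY_BATCH_SIZE] = {false};
	int			nsearch = 0;
	int			violation = -1;
	size_t		p = 0;

	/* Sort and deduplicate a copy of the buffered keys. */
	for (int i = 0; i < b->count; i++)
		search[i] = b->keys[i];
	qsort(search, (size_t) b->count, sizeof(int), toy_cmp_int);
	for (int i = 0; i < b->count; i++)
	{
		if (nsearch == 0 || search[nsearch - 1] != search[i])
			search[nsearch++] = search[i];
	}

	/* Single forward walk over the "index"; p never moves backwards. */
	for (int s = 0; s < nsearch; s++)
	{
		while (p < npk && pk_sorted[p] < search[s])
			p++;
		if (p < npk && pk_sorted[p] == search[s])
		{
			/* O(batch size) scan: mark every buffered row with this key. */
			for (int i = 0; i < b->count; i++)
			{
				if (b->keys[i] == search[s])
					matched[i] = true;
			}
		}
	}

	for (int i = 0; i < b->count; i++)
	{
		if (!matched[i])
		{
			violation = i;
			break;
		}
	}
	b->count = 0;
	return violation;
}

/* Buffer one key; flush automatically when the buffer becomes full. */
int
toy_batch_add(toy_batch *b, int key, const int *pk_sorted, size_t npk)
{
	b->keys[b->count++] = key;
	if (b->count >= TOY_BATCH_SIZE)
		return toy_batch_flush(b, pk_sorted, npk);
	return -1;
}
```

With pk_sorted = {1, 2, 3, 5, 8}, buffering 1, 4, 5 and then calling toy_batch_flush() returns 1, the position of the row whose key 4 has no match. The explicit flush at the end plays the role that ri_FastPathEndBatch() plays for a partial batch.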
src/backend/utils/adt/ri_triggers.c | 443 +++++++++++++++++++---
src/test/regress/expected/foreign_key.out | 40 ++
src/test/regress/sql/foreign_key.sql | 38 ++
3 files changed, 467 insertions(+), 54 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 84bf7d74ec5..8cffe11f564 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -196,13 +196,28 @@ typedef struct RI_CompareHashEntry
FmgrInfo cast_func_finfo; /* in case we must coerce input */
} RI_CompareHashEntry;
+/*
+ * Maximum number of FK rows buffered before flushing.
+ *
+ * Larger batches amortize per-flush overhead and let the SK_SEARCHARRAY
+ * path walk more leaf pages in a single sorted traversal. But each
+ * buffered row is a materialized HeapTuple in TopTransactionContext,
+ * and the matched[] scan in ri_FastPathFlushArray() is O(batch_size)
+ * per index match. Benchmarking showed little difference between 16
+ * and 64, with 256 consistently slower. 64 is a reasonable default.
+ */
+#define RI_FASTPATH_BATCH_SIZE 64
+
/*
* RI_FastPathEntry
- * Per-constraint cache of resources needed by ri_FastPathCheckCached().
+ * Per-constraint cache of resources needed by ri_FastPathBatchFlush().
*
* One entry per constraint, keyed by pg_constraint OID. Created lazily
* by ri_FastPathGetEntry() on first use within a trigger-firing batch
* and torn down by ri_FastPathTeardown() at batch end.
+ *
+ * FK tuples are buffered in batch[] across trigger invocations and
+ * flushed when the buffer fills or the batch ends.
*/
typedef struct RI_FastPathEntry
{
@@ -210,8 +225,17 @@ typedef struct RI_FastPathEntry
Relation pk_rel;
Relation idx_rel;
IndexScanDesc scandesc;
- TupleTableSlot *slot;
+ TupleTableSlot *pk_slot;
+ TupleTableSlot *fk_slot;
Snapshot snapshot; /* registered snapshot for the scan */
+ MemoryContext scan_cxt; /* index scan allocations */
+ MemoryContext flush_cxt; /* short-lived context for per-flush work */
+
+ HeapTuple batch[RI_FASTPATH_BATCH_SIZE];
+ int batch_count;
+
+ /* For ri_FastPathEndBatch() */
+ const RI_ConstraintInfo *riinfo;
} RI_FastPathEntry;
/*
@@ -274,8 +298,14 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
bool detectNewRows, int expect_OK);
static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot);
-static void ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
- Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel);
+static void ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel);
+static void ri_FastPathBatchFlush(RI_FastPathEntry *fpentry,
+ Relation fk_rel);
static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
IndexScanDesc scandesc, TupleTableSlot *slot,
Snapshot snapshot, const RI_ConstraintInfo *riinfo,
@@ -300,8 +330,8 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
int queryno, bool is_restrict, bool partgone);
static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo,
Relation fk_rel);
-static void ri_FastPathTeardown(void *arg);
-
+static void ri_FastPathEndBatch(void *arg);
+static void ri_FastPathTeardown(void);
/*
* RI_FKey_check -
@@ -411,16 +441,22 @@ RI_FKey_check(TriggerData *trigdata)
* lock. This is semantically equivalent to the SPI path below but avoids
* the per-row executor overhead.
*
- * ri_FastPathCheckCached() and ri_FastPathCheck() report the violation
+ * ri_FastPathBatchAdd() and ri_FastPathCheck() report the violation
* themselves if no matching PK row is found, so they only return on
* success.
*/
if (ri_fastpath_is_applicable(riinfo))
{
if (AfterTriggerBatchIsActive())
- ri_FastPathCheckCached(riinfo, fk_rel, newslot);
+ {
+ /* Batched path: buffer and probe in groups */
+ ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
+ }
else
+ {
+ /* ALTER TABLE validation: per-row, no cache */
ri_FastPathCheck(riinfo, fk_rel, newslot);
+ }
return PointerGetDatum(NULL);
}
@@ -2703,10 +2739,14 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/*
* ri_FastPathCheck
- * Perform FK existence check via direct index probe, bypassing SPI.
+ * Perform a per-row FK existence check via direct index probe,
+ * bypassing SPI.
*
* If no matching PK row exists, report the violation via ri_ReportViolation(),
* otherwise, the function returns normally.
+ *
+ * Note: This is only used by the ALTER TABLE validation path. Other paths use
+ * ri_FastPathBatchAdd().
*/
static void
ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
@@ -2771,70 +2811,311 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
}
/*
- * ri_FastPathCheckCached
- * Cached-resource variant of ri_FastPathCheck for use within the
- * after-trigger framework.
+ * ri_FastPathBatchAdd
+ * Buffer a FK row for batched probing.
+ *
+ * Adds the row to the batch buffer. When the buffer is full, flushes all
+ * buffered rows by probing the PK index. Any violation is reported
+ * immediately during the flush via ri_ReportViolation (which does not return).
*
* Uses the per-batch cache (RI_FastPathEntry) to avoid per-row relation
* open/close, scan begin/end, and snapshot registration. The snapshot's
- * curcid is patched each call so the scan sees effects of prior triggers.
+ * curcid is patched at flush time so the scan sees effects of prior triggers.
*
- * Like ri_FastPathCheck, reports the violation via ri_ReportViolation()
- * if no matching PK row is found.
+ * The batch is also flushed at end of trigger-firing cycle via
+ * ri_FastPathEndBatch().
*/
static void
-ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
- Relation fk_rel, TupleTableSlot *newslot)
+ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
{
RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
+ MemoryContext oldcxt;
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ fpentry->batch[fpentry->batch_count] =
+ ExecCopySlotHeapTuple(newslot);
+ fpentry->batch_count++;
+ MemoryContextSwitchTo(oldcxt);
+
+ if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
+ ri_FastPathBatchFlush(fpentry, fk_rel);
+}
+
+/*
+ * ri_FastPathBatchFlush
+ * Flush all buffered FK rows by probing the PK index.
+ *
+ * Dispatches to ri_FastPathFlushArray() for single-column FKs
+ * (using SK_SEARCHARRAY) or ri_FastPathFlushLoop() for multi-column
+ * FKs (per-row probing). Violations are reported immediately via
+ * ri_ReportViolation(), which does not return.
+ */
+static void
+ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
+{
+ const RI_ConstraintInfo *riinfo = fpentry->riinfo;
Relation pk_rel = fpentry->pk_rel;
Relation idx_rel = fpentry->idx_rel;
- IndexScanDesc scandesc = fpentry->scandesc;
Snapshot snapshot = fpentry->snapshot;
- TupleTableSlot *slot = fpentry->slot;
- Datum pk_vals[INDEX_MAX_KEYS];
- char pk_nulls[INDEX_MAX_KEYS];
- ScanKeyData skey[INDEX_MAX_KEYS];
- bool found;
+ TupleTableSlot *fk_slot = fpentry->fk_slot;
Oid saved_userid;
int saved_sec_context;
- MemoryContext oldcxt;
+ MemoryContext oldcxt = CurrentMemoryContext;
- /*
- * Advance the command counter and patch the cached snapshot's curcid so
- * the scan sees PK rows inserted by earlier triggers in this statement.
- */
- CommandCounterIncrement();
- fpentry->snapshot->curcid = GetCurrentCommandId(false);
+ if (fpentry->batch_count == 0)
+ return;
if (riinfo->fpmeta == NULL)
ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
fk_rel, idx_rel);
Assert(riinfo->fpmeta);
+ /*
+ * CCI and security context switch are done once for the entire batch.
+ * Per-row CCI is unnecessary because by the time a flush runs, all AFTER
+ * triggers for the buffered rows have already fired (trigger invocations
+ * strictly alternate per row), so a single CCI advances past all their
+ * effects. Per-row security context switch is unnecessary because each
+ * row's probe runs entirely as the PK table owner, same as the SPI path
+ * -- the only difference is that the SPI path sets and restores the
+ * context per row whereas we do it once around the whole batch.
+ */
+ CommandCounterIncrement();
+ snapshot->curcid = GetCurrentCommandId(false);
+
GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
saved_sec_context |
SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
- ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
- build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+ if (riinfo->nkeys == 1)
+ ri_FastPathFlushArray(fpentry, fk_slot, riinfo, fk_rel);
+ else
+ ri_FastPathFlushLoop(fpentry, fk_slot, riinfo, fk_rel);
+ MemoryContextSwitchTo(oldcxt);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+ /* Free materialized tuples and reset */
+ for (int i = 0; i < fpentry->batch_count; i++)
+ heap_freetuple(fpentry->batch[i]);
+ fpentry->batch_count = 0;
+}
+
+/*
+ * ri_FastPathFlushLoop
+ * Multi-column fallback: probe the index once per buffered row.
+ *
+ * Used for composite foreign keys where SK_SEARCHARRAY does not
+ * apply.
+ */
+static void
+ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ IndexScanDesc scandesc = fpentry->scandesc;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Snapshot snapshot = fpentry->snapshot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+
+ for (int i = 0; i < fpentry->batch_count; i++)
+ {
+ bool found = false;
+
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+
+ /*
+ * build_index_scankeys() may palloc cast results for cross-type FKs.
+ * Use the entry's short-lived flush context so these don't accumulate
+ * across batches.
+ */
+ MemoryContextSwitchTo(fpentry->flush_cxt);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+ MemoryContextSwitchTo(fpentry->scan_cxt);
+
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, pk_slot,
+ snapshot, riinfo, skey, riinfo->nkeys);
+
+ if (!found)
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ fk_slot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
+ MemoryContextReset(fpentry->flush_cxt);
+}
+
+/*
+ * ri_FastPathFlushArray
+ * Single-column fast path using SK_SEARCHARRAY.
+ *
+ * Builds an array of FK values and does one index scan with
+ * SK_SEARCHARRAY. The index AM sorts and deduplicates the array
+ * internally, then walks matching leaf pages in order. Each
+ * matched PK tuple is locked and rechecked as before; a matched[]
+ * bitmap tracks which batch items were satisfied.
+ */
+static void
+ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+ FastPathMeta *fpmeta = riinfo->fpmeta;
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ IndexScanDesc scandesc = fpentry->scandesc;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Snapshot snapshot = fpentry->snapshot;
+ Datum search_vals[RI_FASTPATH_BATCH_SIZE];
+ bool matched[RI_FASTPATH_BATCH_SIZE];
+ int nvals = fpentry->batch_count;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[1];
+ RI_CompareHashEntry *entry;
+ Oid elem_type;
+ int16 elem_len;
+ bool elem_byval;
+ char elem_align;
+ ArrayType *arr;
+
+ Assert(fpmeta);
+
+ memset(matched, 0, nvals * sizeof(bool));
/*
- * The cached scandesc lives in TopTransactionContext, but the btree AM
- * defers some allocations to the first index_getnext_slot call. Ensure
- * those land in TopTransactionContext too.
+ * Transient per-flush allocations (cast results, the search array) must
+ * not accumulate across repeated flushes. Use the entry's short-lived
+ * flush context, reset after each flush.
*/
- oldcxt = MemoryContextSwitchTo(TopTransactionContext);
- found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, slot, snapshot,
- riinfo, skey, riinfo->nkeys);
- MemoryContextSwitchTo(oldcxt);
- SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ MemoryContextSwitchTo(fpentry->flush_cxt);
- if (!found)
- ri_ReportViolation(riinfo, pk_rel, fk_rel, newslot, NULL,
- RI_PLAN_CHECK_LOOKUPPK, false, false);
+ /*
+ * Extract FK values, casting to the operator's expected input type if
+ * needed (e.g. int8 FK -> int4 for int48eq).
+ */
+ entry = fpmeta->compare_entries[0];
+ for (int i = 0; i < nvals; i++)
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+
+ /* Cast if needed (e.g. int8 FK -> numeric PK) */
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ search_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[0],
+ Int32GetDatum(-1),
+ BoolGetDatum(false));
+ else
+ search_vals[i] = pk_vals[0];
+ }
+
+ /*
+ * Array element type must match the operator's right-hand input type,
+ * which is what the index comparison expects on the search side.
+ * ri_populate_fastpath_metadata() stores exactly this via
+ * get_op_opfamily_properties(), which returns the operator's right-hand
+ * type as the subtype for cross-type operators (e.g. int8 for int48eq)
+ * and the common type for same-type operators.
+ */
+ elem_type = fpmeta->subtypes[0];
+ Assert(OidIsValid(elem_type));
+ get_typlenbyvalalign(elem_type, &elem_len, &elem_byval, &elem_align);
+
+ arr = construct_array(search_vals, nvals,
+ elem_type, elem_len, elem_byval, elem_align);
+
+ /*
+ * Build scan key with SK_SEARCHARRAY. The index AM code will internally
+ * sort and deduplicate, then walk leaf pages in order.
+ */
+ ScanKeyEntryInitialize(&skey[0],
+ SK_SEARCHARRAY,
+ 1, /* attno */
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ PointerGetDatum(arr));
+
+ /*
+ * Switch to scan_cxt for the index scan: index AMs may defer internal
+ * allocations (e.g. _bt_preprocess_keys) to the first
+ * index_getnext_slot() call. Those must survive across rescans within a
+ * batch; scan_cxt is deleted in teardown, cleaning them up when the batch
+ * ends.
+ */
+ MemoryContextSwitchTo(fpentry->scan_cxt);
+
+ index_rescan(scandesc, skey, 1, NULL, 0);
+
+ /*
+ * Walk all matches. The index AM returns them in index order. For each
+ * match, find which batch item(s) it satisfies.
+ */
+ while (index_getnext_slot(scandesc, ForwardScanDirection, pk_slot))
+ {
+ Datum found_val;
+ bool found_null;
+ bool concurrently_updated;
+ ScanKeyData recheck_skey[1];
+
+ if (!ri_LockPKTuple(pk_rel, pk_slot, snapshot, &concurrently_updated))
+ continue;
+
+ /* Extract the PK value from the matched and locked tuple */
+ found_val = slot_getattr(pk_slot, riinfo->pk_attnums[0], &found_null);
+ Assert(!found_null);
+
+ if (concurrently_updated)
+ {
+ /*
+ * Build a single-key scankey for recheck. We need the actual PK
+ * value that was found, not the FK search value.
+ */
+ ScanKeyEntryInitialize(&recheck_skey[0], 0, 1,
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ found_val);
+ if (!recheck_matched_pk_tuple(idx_rel, recheck_skey, pk_slot))
+ continue;
+ }
+
+ /*
+ * Linear scan to mark all batch items matching this PK value.
+ * O(batch_size) per match, O(batch_size^2) worst case -- fine for the
+ * current batch size of 64.
+ */
+ for (int i = 0; i < nvals; i++)
+ {
+ if (!matched[i] &&
+ DatumGetBool(FunctionCall2Coll(&entry->eq_opr_finfo,
+ idx_rel->rd_indcollation[0],
+ found_val,
+ search_vals[i])))
+ matched[i] = true;
+ }
+ }
+
+ /* Report first unmatched row */
+ for (int i = 0; i < nvals; i++)
+ {
+ if (!matched[i])
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ fk_slot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
+ }
+
+ MemoryContextReset(fpentry->flush_cxt);
}
/*
@@ -2845,9 +3126,10 @@ ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
* Returns true if a matching PK row was found, locked, and (if
* applicable) visible to the transaction snapshot.
*
- * The caller must ensure CurrentMemoryContext is long-lived enough
- * for the scan descriptor's internal allocations (typically
- * TopTransactionContext when using a cached scandesc).
+ * When using a cached scandesc (from the batch path), the caller must switch
+ * to the entry's scan_cxt before calling so that index AM allocations during
+ * index_getnext_slot() survive across rescans. ri_FastPathCheck uses a
+ * one-shot scan and ends it immediately, so no such switch is needed.
*/
static bool
ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
@@ -3768,14 +4050,51 @@ RI_FKey_trigger_type(Oid tgfoid)
return RI_TRIGGER_NONE;
}
+/*
+ * ri_FastPathEndBatch
+ * Flush remaining rows and tear down cached state.
+ *
+ * Registered as an AfterTriggerBatchCallback. Note: the flush can
+ * do real work (CCI, security context switch, index probes) and can
+ * throw ERROR on a constraint violation. If that happens,
+ * ri_FastPathTeardown never runs; ResourceOwner + XactCallback
+ * handle resource cleanup on the abort path.
+ */
+static void
+ri_FastPathEndBatch(void *arg)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ /* Flush any partial batches -- can throw ERROR */
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->batch_count > 0)
+ {
+ Relation fk_rel = table_open(entry->riinfo->fk_relid,
+ AccessShareLock);
+
+ ri_FastPathBatchFlush(entry, fk_rel);
+ table_close(fk_rel, NoLock);
+ }
+ }
+
+ /* Orderly teardown */
+ ri_FastPathTeardown();
+}
+
/*
* ri_FastPathTeardown
- * Tear down all cached fast-path state.
+ * Release all cached resources (scans, relations, snapshots).
*
- * Called as an AfterTriggerBatchCallback at end of batch.
+ * Called from ri_FastPathEndBatch() after flushing any remaining rows.
*/
static void
-ri_FastPathTeardown(void *arg)
+ri_FastPathTeardown(void)
{
HASH_SEQ_STATUS status;
RI_FastPathEntry *entry;
@@ -3793,10 +4112,14 @@ ri_FastPathTeardown(void *arg)
index_close(entry->idx_rel, NoLock);
if (entry->pk_rel)
table_close(entry->pk_rel, NoLock);
- if (entry->slot)
- ExecDropSingleTupleTableSlot(entry->slot);
+ if (entry->pk_slot)
+ ExecDropSingleTupleTableSlot(entry->pk_slot);
+ if (entry->fk_slot)
+ ExecDropSingleTupleTableSlot(entry->fk_slot);
if (entry->snapshot)
UnregisterSnapshot(entry->snapshot);
+ if (entry->scan_cxt)
+ MemoryContextDelete(entry->scan_cxt);
}
hash_destroy(ri_fastpath_cache);
@@ -3910,23 +4233,32 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
/*
* Register an initial snapshot. Its curcid will be patched in place
- * on each subsequent row (see ri_FastPathCheckCached()), avoiding
+ * on each subsequent row (see ri_FastPathBatchFlush()), avoiding
* per-row GetSnapshotData() overhead.
*/
entry->snapshot = RegisterSnapshot(GetTransactionSnapshot());
- entry->slot = table_slot_create(entry->pk_rel, NULL);
+ entry->pk_slot = table_slot_create(entry->pk_rel, NULL);
+ entry->fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
+ &TTSOpsHeapTuple);
entry->scandesc = index_beginscan(entry->pk_rel, entry->idx_rel,
entry->snapshot, NULL,
riinfo->nkeys, 0);
+ entry->scan_cxt = AllocSetContextCreate(TopTransactionContext,
+ "RI fast path scan context",
+ ALLOCSET_DEFAULT_SIZES);
+ entry->flush_cxt = AllocSetContextCreate(entry->scan_cxt,
+ "RI fast path flush temporary context",
+ ALLOCSET_SMALL_SIZES);
+
MemoryContextSwitchTo(oldcxt);
/* Ensure cleanup at end of this trigger-firing batch */
if (!ri_fastpath_callback_registered)
{
- RegisterAfterTriggerBatchCallback(ri_FastPathTeardown, NULL);
+ RegisterAfterTriggerBatchCallback(ri_FastPathEndBatch, NULL);
ri_fastpath_callback_registered = true;
}
@@ -3937,6 +4269,9 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
SECURITY_NOFORCE_RLS);
ri_CheckPermissions(entry->pk_rel);
SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+ /* For ri_FastPathEndBatch() */
+ entry->riinfo = riinfo;
}
return entry;
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 25d505c6c12..2179d2a8e8f 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3590,3 +3590,43 @@ NOTICE: fp_auto_pk called
NOTICE: fp_auto_pk called
DROP TABLE fp_fk_cci, fp_pk_cci;
DROP FUNCTION fp_auto_pk;
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+ERROR: insert or update on table "fp_fk_multi" violates foreign key constraint "fp_fk_multi_a_b_fkey"
+DETAIL: Key (a, b)=(999, 999) is not present in table "fp_pk_multi".
+DROP TABLE fp_fk_multi, fp_pk_multi;
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+ERROR: insert or update on table "fp_fk_commit" violates foreign key constraint "fp_fk_commit_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_commit".
+DROP TABLE fp_fk_commit, fp_pk_commit;
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+ERROR: insert or update on table "fp_fk_cross" violates foreign key constraint "fp_fk_cross_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_cross".
+DROP TABLE fp_fk_cross, fp_pk_cross;
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index cedd20c8d11..7a729486bc2 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2578,3 +2578,41 @@ INSERT INTO fp_fk_cci VALUES (1), (2), (3);
DROP TABLE fp_fk_cci, fp_pk_cci;
DROP FUNCTION fp_auto_pk;
+
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+DROP TABLE fp_fk_multi, fp_pk_multi;
+
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+DROP TABLE fp_fk_commit, fp_pk_commit;
+
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+DROP TABLE fp_fk_cross, fp_pk_cross;
+
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
--
2.41.0
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Amit Langote @ 2026-03-24 11:47 UTC
To: Junwang Zhao <[email protected]>; +Cc: Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
Hi Junwang,
On Fri, Mar 20, 2026 at 1:20 AM Junwang Zhao <[email protected]> wrote:
> I squashed 0004 into 0003 so that each file can be committed independently.
> I also ran pgindent on each file.
Thanks for that.
Here's another version.
In 0001, I noticed that the condition change in ri_HashCompareOp could
be simplified further. Also improved the commentary surrounding that.
I also updated the commit message to clarify parity with the SPI path.
Updated the commit message of 0002 to explain the trade-off in caching
the snapshot for the entire trigger-firing cycle of a given constraint.
The SPI path retakes the snapshot for every row checked, so it could in
principle avoid failing for FK rows whose corresponding PK row was added
by a concurrently committed transaction, at least in the READ COMMITTED
case. The cached snapshot gives that up.
Updated the commit message of 0003 to clarify that it replaces
ri_FastPathCheckCached() from 0002 with the BatchAdd/BatchFlush pair,
and that the cached resources are used unchanged -- only the probing
cadence changes from per-row to per-flush. Per-flush CCI is safe
because all AFTER triggers for the buffered rows have already fired
by flush time; a new test case is added to show that.
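
To illustrate the per-row versus per-flush cadence described above, here
is a minimal, standalone sketch of the buffer-and-flush pattern. It is
not the patch code: the Batch struct and the batch_add()/batch_flush()
names are hypothetical stand-ins for RI_FastPathEntry,
ri_FastPathBatchAdd() and ri_FastPathBatchFlush(), rows are plain ints,
and the counters merely stand in for the CCI / security context switch
and the index probes.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 64			/* same value as RI_FASTPATH_BATCH_SIZE */

/* Hypothetical stand-in for RI_FastPathEntry: buffering state only. */
typedef struct Batch
{
	int			rows[BATCH_SIZE];	/* buffered "FK rows" (plain ints here) */
	int			count;			/* rows currently buffered */
	int			flushes;		/* times the shared per-flush setup ran */
	int			probed;			/* rows probed across all flushes */
} Batch;

/* Flush: run the shared setup once, then probe every buffered row. */
static void
batch_flush(Batch *b)
{
	if (b->count == 0)
		return;					/* nothing buffered, so skip the setup too */

	b->flushes++;				/* stands in for CCI + security context switch */
	for (int i = 0; i < b->count; i++)
		b->probed++;			/* stands in for one index probe */
	b->count = 0;
}

/* Add: buffer the row and flush as soon as the buffer is full. */
static void
batch_add(Batch *b, int row)
{
	assert(b->count < BATCH_SIZE);
	b->rows[b->count++] = row;
	if (b->count >= BATCH_SIZE)
		batch_flush(b);
}
```

In this sketch, adding 150 rows triggers two full flushes and leaves 22
rows buffered; a final batch_flush() call, playing the role of the
end-of-batch callback, probes the remainder.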
Finally, I added a short line at the end of each patch's commit message
to mention the speedup observed at each stage. There are placeholders
such as <commit-hash-0001> that I will replace with the actual commit
hash before pushing.
I will continue staring at these for any remaining issues before
pushing them one-by-one at some point by early next week. Happy to
hear any thoughts before I push.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v9-0003-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pat.patch (28.6K)
From 3086452291a81844c9f9789082362a7e5769de64 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 24 Mar 2026 20:09:07 +0900
Subject: [PATCH v9 3/3] Batch FK rows and use SK_SEARCHARRAY for fast-path
probes
Instead of probing the PK index on each trigger invocation, buffer
FK rows in the per-constraint cache entry (RI_FastPathEntry) and
flush them as a batch. When the buffer fills (64 rows) or the
trigger-firing cycle ends, ri_FastPathBatchFlush() probes the index
for all buffered rows, sharing a single CommandCounterIncrement and
security context switch across the batch.
ri_FastPathCheckCached() from <commit-hash-0002>, which probed the
index once per trigger invocation using cached resources, is replaced
by ri_FastPathBatchAdd() which buffers rows, and
ri_FastPathBatchFlush() which probes for the entire batch at once.
The cached resources (pk_rel, idx_rel, scandesc, slot, snapshot) are
used unchanged; the difference is that CCI, security context switch,
and curcid patching now happen once per flush rather than per row.
Per-flush CCI is safe because by the time a flush runs, all AFTER
triggers for the buffered rows have already fired.
For single-column foreign keys, the flush builds an ArrayType from
the buffered FK values (casting to the PK-side type if needed) and
constructs a scan key with the SK_SEARCHARRAY flag. The index AM
sorts and deduplicates the array internally, then walks matching
leaf pages in one ordered traversal instead of descending from the
root once per row. A matched[] bitmap tracks which batch items
were satisfied; unmatched items are reported as violations.
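
The matching logic in the paragraph above can be sketched in standalone C
as follows. This is only an illustration under simplifying assumptions,
not the patch code: keys are plain ints rather than Datums, a sorted
array probed with bsearch() stands in for the btree (a real
SK_SEARCHARRAY scan walks leaf pages instead), and there is no tuple
locking or recheck. The names first_unmatched(), cmp_int() and MAX_BATCH
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BATCH 64

static int
cmp_int(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

/*
 * Return the index of the first batch item that has no match in pk[],
 * or -1 if every item is satisfied.  pk[] must be sorted ascending; it
 * stands in for the PK index.
 */
static int
first_unmatched(const int *batch, int nvals, const int *pk, int npk)
{
	int			keys[MAX_BATCH];
	bool		matched[MAX_BATCH];
	int			nkeys = 0;

	assert(nvals >= 0 && nvals <= MAX_BATCH);
	memset(matched, 0, sizeof(matched));

	/* Sort a copy of the search values and drop duplicates. */
	memcpy(keys, batch, nvals * sizeof(int));
	qsort(keys, nvals, sizeof(int), cmp_int);
	for (int i = 0; i < nvals; i++)
	{
		if (nkeys == 0 || keys[i] != keys[nkeys - 1])
			keys[nkeys++] = keys[i];
	}

	/* For each key found in the "index", mark every batch item equal to it. */
	for (int k = 0; k < nkeys; k++)
	{
		if (bsearch(&keys[k], pk, npk, sizeof(int), cmp_int) == NULL)
			continue;
		for (int i = 0; i < nvals; i++)
		{
			if (!matched[i] && batch[i] == keys[k])
				matched[i] = true;
		}
	}

	/* The first item left unmarked is the one that would be reported. */
	for (int i = 0; i < nvals; i++)
	{
		if (!matched[i])
			return i;
	}
	return -1;
}
```

The inner marking loop is why duplicate FK values all count as
satisfied even though the deduplicated key is looked up only once, which
is the case the fp_fk_dup regression test exercises.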
Multi-column foreign keys fall back to a per-row probe loop via
ri_FastPathFlushLoop().
FK tuples are materialized via ExecCopySlotHeapTuple() into
TopTransactionContext so they survive across trigger invocations.
Violations are reported immediately during the flush via
ri_ReportViolation(), which does not return.
ri_FastPathEndBatch() flushes any partial batch before tearing down
cached resources. Since the FK relation may already be closed by
flush time (e.g. for deferred constraints at COMMIT), it reopens the
relation using entry->riinfo->fk_relid if needed.
The non-cached path (ALTER TABLE validation) bypasses batching and
continues to call ri_FastPathCheck() directly per row.
Introduce two purpose-specific memory contexts:
- scan_cxt: child of TopTransactionContext for index scan
allocations (e.g. _bt_preprocess_keys). Lives for the
trigger-firing batch, deleted at teardown, so these allocations
are freed when the batch ends instead of at transaction end.
- flush_cxt: child of scan_cxt for per-flush transient work (cast
results, search array). Reset after each flush; deleting
scan_cxt in teardown also frees flush_cxt.
Benchmarking shows that, together with <commit-hash-0001> and
<commit-hash-0002>, bulk FK inserts are ~2.9x faster (int PK / int FK,
1M rows, PK table and index cached).
Author: Amit Langote <[email protected]>
Co-authored-by: Junwang Zhao <[email protected]>
Reviewed-by: Haibo Yan <[email protected]>
Tested-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
---
src/backend/utils/adt/ri_triggers.c | 441 +++++++++++++++++++---
src/test/regress/expected/foreign_key.out | 40 ++
src/test/regress/sql/foreign_key.sql | 38 ++
3 files changed, 466 insertions(+), 53 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 12de0dd2cf6..993c3ac49a3 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -196,13 +196,28 @@ typedef struct RI_CompareHashEntry
FmgrInfo cast_func_finfo; /* in case we must coerce input */
} RI_CompareHashEntry;
+/*
+ * Maximum number of FK rows buffered before flushing.
+ *
+ * Larger batches amortize per-flush overhead and let the SK_SEARCHARRAY
+ * path walk more leaf pages in a single sorted traversal. But each
+ * buffered row is a materialized HeapTuple in TopTransactionContext,
+ * and the matched[] scan in ri_FastPathFlushArray() is O(batch_size)
+ * per index match. Benchmarking showed little difference between 16
+ * and 64, with 256 consistently slower. 64 is a reasonable default.
+ */
+#define RI_FASTPATH_BATCH_SIZE 64
+
/*
* RI_FastPathEntry
- * Per-constraint cache of resources needed by ri_FastPathCheckCached().
+ * Per-constraint cache of resources needed by ri_FastPathBatchFlush().
*
* One entry per constraint, keyed by pg_constraint OID. Created lazily
* by ri_FastPathGetEntry() on first use within a trigger-firing batch
* and torn down by ri_FastPathTeardown() at batch end.
+ *
+ * FK tuples are buffered in batch[] across trigger invocations and
+ * flushed when the buffer fills or the batch ends.
*/
typedef struct RI_FastPathEntry
{
@@ -210,8 +225,17 @@ typedef struct RI_FastPathEntry
Relation pk_rel;
Relation idx_rel;
IndexScanDesc scandesc;
- TupleTableSlot *slot;
+ TupleTableSlot *pk_slot;
+ TupleTableSlot *fk_slot;
Snapshot snapshot; /* registered snapshot for the scan */
+ MemoryContext scan_cxt; /* index scan allocations */
+ MemoryContext flush_cxt; /* short-lived context for per-flush work */
+
+ HeapTuple batch[RI_FASTPATH_BATCH_SIZE];
+ int batch_count;
+
+ /* For ri_FastPathEndBatch() */
+ const RI_ConstraintInfo *riinfo;
} RI_FastPathEntry;
/*
@@ -274,8 +298,14 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
bool detectNewRows, int expect_OK);
static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot);
-static void ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
- Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel);
+static void ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel);
+static void ri_FastPathBatchFlush(RI_FastPathEntry *fpentry,
+ Relation fk_rel);
static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
IndexScanDesc scandesc, TupleTableSlot *slot,
Snapshot snapshot, const RI_ConstraintInfo *riinfo,
@@ -300,8 +330,8 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
int queryno, bool is_restrict, bool partgone);
static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo,
Relation fk_rel);
-static void ri_FastPathTeardown(void *arg);
-
+static void ri_FastPathEndBatch(void *arg);
+static void ri_FastPathTeardown(void);
/*
* RI_FKey_check -
@@ -411,16 +441,22 @@ RI_FKey_check(TriggerData *trigdata)
* lock. This is semantically equivalent to the SPI path below but avoids
* the per-row executor overhead.
*
- * ri_FastPathCheckCached() and ri_FastPathCheck() report the violation
+ * ri_FastPathBatchAdd() and ri_FastPathCheck() report the violation
* themselves if no matching PK row is found, so they only return on
* success.
*/
if (ri_fastpath_is_applicable(riinfo))
{
if (AfterTriggerBatchIsActive())
- ri_FastPathCheckCached(riinfo, fk_rel, newslot);
+ {
+ /* Batched path: buffer and probe in groups */
+ ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
+ }
else
+ {
+ /* ALTER TABLE validation: per-row, no cache */
ri_FastPathCheck(riinfo, fk_rel, newslot);
+ }
return PointerGetDatum(NULL);
}
@@ -2703,10 +2739,14 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/*
* ri_FastPathCheck
- * Perform FK existence check via direct index probe, bypassing SPI.
+ * Perform per row FK existence check via direct index probe,
+ * bypassing SPI.
*
* If no matching PK row exists, report the violation via ri_ReportViolation(),
* otherwise, the function returns normally.
+ *
+ * Note: This is only used by the ALTER TABLE validation path. Other paths use
+ * ri_FastPathBatchAdd().
*/
static void
ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
@@ -2771,70 +2811,311 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
}
/*
- * ri_FastPathCheckCached
- * Cached-resource variant of ri_FastPathCheck for use within the
- * after-trigger framework.
+ * ri_FastPathBatchAdd
+ * Buffer a FK row for batched probing.
+ *
+ * Adds the row to the batch buffer. When the buffer is full, flushes all
+ * buffered rows by probing the PK index. Any violation is reported
+ * immediately during the flush via ri_ReportViolation (which does not return).
*
* Uses the per-batch cache (RI_FastPathEntry) to avoid per-row relation
* open/close, scan begin/end, and snapshot registration. The snapshot's
- * curcid is patched each call so the scan sees effects of prior triggers.
+ * curcid is patched at flush time so the scan sees effects of prior triggers.
*
- * Like ri_FastPathCheck, reports the violation via ri_ReportViolation()
- * if no matching PK row is found.
+ * The batch is also flushed at end of trigger-firing cycle via
+ * ri_FastPathEndBatch().
*/
static void
-ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
- Relation fk_rel, TupleTableSlot *newslot)
+ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
{
RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
+ MemoryContext oldcxt;
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ fpentry->batch[fpentry->batch_count] =
+ ExecCopySlotHeapTuple(newslot);
+ fpentry->batch_count++;
+ MemoryContextSwitchTo(oldcxt);
+
+ if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
+ ri_FastPathBatchFlush(fpentry, fk_rel);
+}
+
+/*
+ * ri_FastPathBatchFlush
+ * Flush all buffered FK rows by probing the PK index.
+ *
+ * Dispatches to ri_FastPathFlushArray() for single-column FKs
+ * (using SK_SEARCHARRAY) or ri_FastPathFlushLoop() for multi-column
+ * FKs (per-row probing). Violations are reported immediately via
+ * ri_ReportViolation(), which does not return.
+ */
+static void
+ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
+{
+ const RI_ConstraintInfo *riinfo = fpentry->riinfo;
Relation pk_rel = fpentry->pk_rel;
Relation idx_rel = fpentry->idx_rel;
- IndexScanDesc scandesc = fpentry->scandesc;
Snapshot snapshot = fpentry->snapshot;
- TupleTableSlot *slot = fpentry->slot;
- Datum pk_vals[INDEX_MAX_KEYS];
- char pk_nulls[INDEX_MAX_KEYS];
- ScanKeyData skey[INDEX_MAX_KEYS];
- bool found;
+ TupleTableSlot *fk_slot = fpentry->fk_slot;
Oid saved_userid;
int saved_sec_context;
- MemoryContext oldcxt;
+ MemoryContext oldcxt = CurrentMemoryContext;
- /*
- * Advance the command counter and patch the cached snapshot's curcid so
- * the scan sees PK rows inserted by earlier triggers in this statement.
- */
- CommandCounterIncrement();
- fpentry->snapshot->curcid = GetCurrentCommandId(false);
+ if (fpentry->batch_count == 0)
+ return;
if (riinfo->fpmeta == NULL)
ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
fk_rel, idx_rel);
Assert(riinfo->fpmeta);
+ /*
+ * CCI and security context switch are done once for the entire batch.
+ * Per-row CCI is unnecessary because by the time a flush runs, all AFTER
+ * triggers for the buffered rows have already fired (trigger invocations
+ * strictly alternate per row), so a single CCI advances past all their
+ * effects. Per-row security context switch is unnecessary because each
+ * row's probe runs entirely as the PK table owner, same as the SPI path
+ * -- the only difference is that the SPI path sets and restores the
+ * context per row whereas we do it once around the whole batch.
+ */
+ CommandCounterIncrement();
+ snapshot->curcid = GetCurrentCommandId(false);
+
GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
saved_sec_context |
SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
- ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
- build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+ if (riinfo->nkeys == 1)
+ ri_FastPathFlushArray(fpentry, fk_slot, riinfo, fk_rel);
+ else
+ ri_FastPathFlushLoop(fpentry, fk_slot, riinfo, fk_rel);
+ MemoryContextSwitchTo(oldcxt);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+ /* Free materialized tuples and reset */
+ for (int i = 0; i < fpentry->batch_count; i++)
+ heap_freetuple(fpentry->batch[i]);
+ fpentry->batch_count = 0;
+}
+
+/*
+ * ri_FastPathFlushLoop
+ * Multi-column fallback: probe the index once per buffered row.
+ *
+ * Used for composite foreign keys where SK_SEARCHARRAY does not
+ * apply.
+ */
+static void
+ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ IndexScanDesc scandesc = fpentry->scandesc;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Snapshot snapshot = fpentry->snapshot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+
+ for (int i = 0; i < fpentry->batch_count; i++)
+ {
+ bool found = false;
+
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+
+ /*
+ * build_index_scankeys() may palloc cast results for cross-type FKs.
+ * Use the entry's short-lived flush context so these don't accumulate
+ * across batches.
+ */
+ MemoryContextSwitchTo(fpentry->flush_cxt);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+ MemoryContextSwitchTo(fpentry->scan_cxt);
+
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, pk_slot,
+ snapshot, riinfo, skey, riinfo->nkeys);
+
+ if (!found)
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ fk_slot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
+ MemoryContextReset(fpentry->flush_cxt);
+}
+
+/*
+ * ri_FastPathFlushArray
+ * Single-column fast path using SK_SEARCHARRAY.
+ *
+ * Builds an array of FK values and does one index scan with
+ * SK_SEARCHARRAY. The index AM sorts and deduplicates the array
+ * internally, then walks matching leaf pages in order. Each
+ * matched PK tuple is locked and rechecked as before; a matched[]
+ * bitmap tracks which batch items were satisfied.
+ */
+static void
+ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+ FastPathMeta *fpmeta = riinfo->fpmeta;
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ IndexScanDesc scandesc = fpentry->scandesc;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Snapshot snapshot = fpentry->snapshot;
+ Datum search_vals[RI_FASTPATH_BATCH_SIZE];
+ bool matched[RI_FASTPATH_BATCH_SIZE];
+ int nvals = fpentry->batch_count;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[1];
+ RI_CompareHashEntry *entry;
+ Oid elem_type;
+ int16 elem_len;
+ bool elem_byval;
+ char elem_align;
+ ArrayType *arr;
+
+ Assert(fpmeta);
+
+ memset(matched, 0, nvals * sizeof(bool));
/*
- * The cached scandesc lives in TopTransactionContext, but the btree AM
- * defers some allocations to the first index_getnext_slot call. Ensure
- * those land in TopTransactionContext too.
+ * Transient per-flush allocations (cast results, the search array) must
+ * not accumulate across repeated flushes. Use the entry's short-lived
+ * flush context, reset after each flush.
*/
- oldcxt = MemoryContextSwitchTo(TopTransactionContext);
- found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, slot, snapshot,
- riinfo, skey, riinfo->nkeys);
- MemoryContextSwitchTo(oldcxt);
- SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ MemoryContextSwitchTo(fpentry->flush_cxt);
- if (!found)
- ri_ReportViolation(riinfo, pk_rel, fk_rel, newslot, NULL,
- RI_PLAN_CHECK_LOOKUPPK, false, false);
+ /*
+ * Extract FK values, casting to the operator's expected input type if
+ * needed (e.g. int8 FK -> numeric for a numeric PK).
+ */
+ entry = fpmeta->compare_entries[0];
+ for (int i = 0; i < nvals; i++)
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+
+ /* Cast if needed (e.g. int8 FK -> numeric PK) */
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ search_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[0],
+ Int32GetDatum(-1),
+ BoolGetDatum(false));
+ else
+ search_vals[i] = pk_vals[0];
+ }
+
+ /*
+ * Array element type must match the operator's right-hand input type,
+ * which is what the index comparison expects on the search side.
+ * ri_populate_fastpath_metadata() stores exactly this via
+ * get_op_opfamily_properties(), which returns the operator's right-hand
+ * type as the subtype for cross-type operators (e.g. int8 for int48eq)
+ * and the common type for same-type operators.
+ */
+ elem_type = fpmeta->subtypes[0];
+ Assert(OidIsValid(elem_type));
+ get_typlenbyvalalign(elem_type, &elem_len, &elem_byval, &elem_align);
+
+ arr = construct_array(search_vals, nvals,
+ elem_type, elem_len, elem_byval, elem_align);
+
+ /*
+ * Build scan key with SK_SEARCHARRAY. The index AM code will internally
+ * sort and deduplicate, then walk leaf pages in order.
+ */
+ ScanKeyEntryInitialize(&skey[0],
+ SK_SEARCHARRAY,
+ 1, /* attno */
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ PointerGetDatum(arr));
+
+ /*
+ * Switch to scan_cxt for the index scan: index AMs may defer internal
+ * allocations (e.g. _bt_preprocess_keys) to the first
+ * index_getnext_slot() call. Those must survive across rescans within a
+ * batch; scan_cxt is deleted in teardown, cleaning them up when the batch
+ * ends.
+ */
+ MemoryContextSwitchTo(fpentry->scan_cxt);
+
+ index_rescan(scandesc, skey, 1, NULL, 0);
+
+ /*
+ * Walk all matches. The index AM returns them in index order. For each
+ * match, find which batch item(s) it satisfies.
+ */
+ while (index_getnext_slot(scandesc, ForwardScanDirection, pk_slot))
+ {
+ Datum found_val;
+ bool found_null;
+ bool concurrently_updated;
+ ScanKeyData recheck_skey[1];
+
+ if (!ri_LockPKTuple(pk_rel, pk_slot, snapshot, &concurrently_updated))
+ continue;
+
+ /* Extract the PK value from the matched and locked tuple */
+ found_val = slot_getattr(pk_slot, riinfo->pk_attnums[0], &found_null);
+ Assert(!found_null);
+
+ if (concurrently_updated)
+ {
+ /*
+ * Build a single-key scankey for recheck. We need the actual PK
+ * value that was found, not the FK search value.
+ */
+ ScanKeyEntryInitialize(&recheck_skey[0], 0, 1,
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ found_val);
+ if (!recheck_matched_pk_tuple(idx_rel, recheck_skey, pk_slot))
+ continue;
+ }
+
+ /*
+ * Linear scan to mark all batch items matching this PK value.
+ * O(batch_size) per match, O(batch_size^2) worst case -- fine for the
+ * current batch size of 64.
+ */
+ for (int i = 0; i < nvals; i++)
+ {
+ if (!matched[i] &&
+ DatumGetBool(FunctionCall2Coll(&entry->eq_opr_finfo,
+ idx_rel->rd_indcollation[0],
+ found_val,
+ search_vals[i])))
+ matched[i] = true;
+ }
+ }
+
+ /* Report first unmatched row */
+ for (int i = 0; i < nvals; i++)
+ {
+ if (!matched[i])
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ fk_slot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
+ }
+
+ MemoryContextReset(fpentry->flush_cxt);
}
/*
@@ -2845,9 +3126,10 @@ ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
* Returns true if a matching PK row was found, locked, and (if
* applicable) visible to the transaction snapshot.
*
- * The caller must ensure CurrentMemoryContext is long-lived enough
- * for the scan descriptor's internal allocations (typically
- * TopTransactionContext when using a cached scandesc).
+ * When using a cached scandesc (from the batch path), the caller must switch
+ * to the entry's scan_cxt before calling so that index AM allocations during
+ * index_getnext_slot() survive across rescans. ri_FastPathCheck uses a
+ * one-shot scan and ends it immediately, so no such switch is needed.
*/
static bool
ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
@@ -3769,14 +4051,51 @@ RI_FKey_trigger_type(Oid tgfoid)
return RI_TRIGGER_NONE;
}
+/*
+ * ri_FastPathEndBatch
+ * Flush remaining rows and tear down cached state.
+ *
+ * Registered as an AfterTriggerBatchCallback. Note: the flush can
+ * do real work (CCI, security context switch, index probes) and can
+ * throw ERROR on a constraint violation. If that happens,
+ * ri_FastPathTeardown never runs; ResourceOwner + XactCallback
+ * handle resource cleanup on the abort path.
+ */
+static void
+ri_FastPathEndBatch(void *arg)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ /* Flush any partial batches -- can throw ERROR */
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->batch_count > 0)
+ {
+ Relation fk_rel = table_open(entry->riinfo->fk_relid,
+ AccessShareLock);
+
+ ri_FastPathBatchFlush(entry, fk_rel);
+ table_close(fk_rel, NoLock);
+ }
+ }
+
+ /* Orderly teardown */
+ ri_FastPathTeardown();
+}
+
/*
* ri_FastPathTeardown
* Tear down all cached fast-path state.
*
- * Called as an AfterTriggerBatchCallback at end of batch.
+ * Called from ri_FastPathEndBatch() after flushing any remaining rows.
*/
static void
-ri_FastPathTeardown(void *arg)
+ri_FastPathTeardown(void)
{
HASH_SEQ_STATUS status;
RI_FastPathEntry *entry;
@@ -3794,10 +4113,14 @@ ri_FastPathTeardown(void *arg)
index_close(entry->idx_rel, NoLock);
if (entry->pk_rel)
table_close(entry->pk_rel, NoLock);
- if (entry->slot)
- ExecDropSingleTupleTableSlot(entry->slot);
+ if (entry->pk_slot)
+ ExecDropSingleTupleTableSlot(entry->pk_slot);
+ if (entry->fk_slot)
+ ExecDropSingleTupleTableSlot(entry->fk_slot);
if (entry->snapshot)
UnregisterSnapshot(entry->snapshot);
+ if (entry->scan_cxt)
+ MemoryContextDelete(entry->scan_cxt);
}
hash_destroy(ri_fastpath_cache);
@@ -3911,23 +4234,32 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
/*
* Register an initial snapshot. Its curcid will be patched in place
- * on each subsequent row (see ri_FastPathCheckCached()), avoiding
+ * on each subsequent row (see ri_FastPathBatchFlush()), avoiding
* per-row GetSnapshotData() overhead.
*/
entry->snapshot = RegisterSnapshot(GetTransactionSnapshot());
- entry->slot = table_slot_create(entry->pk_rel, NULL);
+ entry->pk_slot = table_slot_create(entry->pk_rel, NULL);
+ entry->fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
+ &TTSOpsHeapTuple);
entry->scandesc = index_beginscan(entry->pk_rel, entry->idx_rel,
entry->snapshot, NULL,
riinfo->nkeys, 0);
+ entry->scan_cxt = AllocSetContextCreate(TopTransactionContext,
+ "RI fast path scan context",
+ ALLOCSET_DEFAULT_SIZES);
+ entry->flush_cxt = AllocSetContextCreate(entry->scan_cxt,
+ "RI fast path flush temporary context",
+ ALLOCSET_SMALL_SIZES);
+
MemoryContextSwitchTo(oldcxt);
/* Ensure cleanup at end of this trigger-firing batch */
if (!ri_fastpath_callback_registered)
{
- RegisterAfterTriggerBatchCallback(ri_FastPathTeardown, NULL);
+ RegisterAfterTriggerBatchCallback(ri_FastPathEndBatch, NULL);
ri_fastpath_callback_registered = true;
}
@@ -3938,6 +4270,9 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
SECURITY_NOFORCE_RLS);
ri_CheckPermissions(entry->pk_rel);
SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+ /* For ri_FastPathEndBatch() */
+ entry->riinfo = riinfo;
}
return entry;
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 25d505c6c12..2179d2a8e8f 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3590,3 +3590,43 @@ NOTICE: fp_auto_pk called
NOTICE: fp_auto_pk called
DROP TABLE fp_fk_cci, fp_pk_cci;
DROP FUNCTION fp_auto_pk;
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+ERROR: insert or update on table "fp_fk_multi" violates foreign key constraint "fp_fk_multi_a_b_fkey"
+DETAIL: Key (a, b)=(999, 999) is not present in table "fp_pk_multi".
+DROP TABLE fp_fk_multi, fp_pk_multi;
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+ERROR: insert or update on table "fp_fk_commit" violates foreign key constraint "fp_fk_commit_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_commit".
+DROP TABLE fp_fk_commit, fp_pk_commit;
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+ERROR: insert or update on table "fp_fk_cross" violates foreign key constraint "fp_fk_cross_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_cross".
+DROP TABLE fp_fk_cross, fp_pk_cross;
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index cedd20c8d11..7a729486bc2 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2578,3 +2578,41 @@ INSERT INTO fp_fk_cci VALUES (1), (2), (3);
DROP TABLE fp_fk_cci, fp_pk_cci;
DROP FUNCTION fp_auto_pk;
+
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+DROP TABLE fp_fk_multi, fp_pk_multi;
+
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+DROP TABLE fp_fk_commit, fp_pk_commit;
+
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+DROP TABLE fp_fk_cross, fp_pk_cross;
+
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
--
2.47.3
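
For readers following the duplicate-handling logic in ri_FastPathFlushArray()
above, here is a minimal standalone sketch of the matched[] bookkeeping. It is
not code from the patch: plain ints stand in for Datums, the "found" array
stands in for the PK values returned by the SK_SEARCHARRAY index scan, and the
function name first_unmatched() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the matched[] bookkeeping in
 * ri_FastPathFlushArray().  "batch" holds the buffered FK values and
 * "found" holds the (deduplicated) PK values the index scan returned.
 *
 * Returns the index of the first batch item with no matching PK value,
 * or -1 if every item is satisfied.
 */
static int
first_unmatched(const int *batch, size_t nvals,
                const int *found, size_t nfound,
                bool *matched)
{
    assert(batch != NULL && matched != NULL);
    assert(nfound == 0 || found != NULL);

    memset(matched, 0, nvals * sizeof(bool));

    for (size_t f = 0; f < nfound; f++)
    {
        /* Mark every batch item equal to this PK value, not just the first */
        for (size_t i = 0; i < nvals; i++)
        {
            if (!matched[i] && batch[i] == found[f])
                matched[i] = true;
        }
    }

    /* The first item left unmarked is the one that would be reported */
    for (size_t i = 0; i < nvals; i++)
    {
        if (!matched[i])
            return (int) i;
    }
    return -1;
}
```

With a batch of {1, 1, 999, 1} and a single found value 1, items 0, 1 and 3
are all marked and index 2 is returned, which is the behavior the fp_fk_dup
test is meant to pin down.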
[application/octet-stream] v9-0002-Cache-per-batch-resources-for-fast-path-foreign-k.patch (29.2K, 3-v9-0002-Cache-per-batch-resources-for-fast-path-foreign-k.patch)
From 81a0149aaf044dc32610355c9178a40e7f7b4d57 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 24 Mar 2026 20:13:03 +0900
Subject: [PATCH v9 2/3] Cache per-batch resources for fast-path foreign key
checks
The fast-path FK check introduced in <commit-hash-0001> opens and
closes the PK relation, index, scan descriptor, and tuple slot on
every trigger invocation. For bulk operations that fire thousands of
FK triggers in a single statement, this repeated setup/teardown
dominates the cost.
Introduce RI_FastPathEntry, a per-constraint hash table entry that caches
the open Relations (pk_rel, idx_rel), IndexScanDesc, TupleTableSlot,
and a registered Snapshot across all trigger invocations within a
single trigger-firing batch. Entries are created lazily on first use
via ri_FastPathGetEntry() and persist until the batch ends.
The snapshot is registered once at entry creation time, and its
curcid is patched in place on each subsequent row rather than
taking a fresh snapshot per invocation. This avoids the per-row
ProcArrayLock acquire/release that GetSnapshotData() requires even
in its fast-path reuse case. Under REPEATABLE READ the transaction
snapshot is immutable so caching is a no-op. Under READ COMMITTED
the cached snapshot will not reflect PK rows committed by other
backends mid-batch. The SPI path's per-row GetSnapshotData() might
catch these depending on timing, but that visibility is
non-deterministic -- whether a given row's FK check happens to see
a concurrent commit is a race, not a guarantee. The FK check only
needs PK rows visible before the statement began plus effects of
earlier triggers (tracked by curcid), and LockTupleKeyShare prevents
the PK row from disappearing regardless. CommandCounterIncrement
still runs on each invocation of ri_FastPathCheckCached(), matching
the SPI path's per-row CCI inside _SPI_execute_plan.
SnapshotSetCommandId() only patches the process-global statics, not
registered copies, so we patch entry->snapshot->curcid directly.
Permission checks (schema USAGE + table SELECT) are performed once at
cache entry creation rather than per flush. The RI check runs as the
PK table owner (via SetUserIdAndSecContext), so in practice these
checks verify that the owner has access to their own table -- a
condition that holds unless privileges have been explicitly revoked
from the owner, which would equally break the SPI path. Checking
once per batch avoids repeated syscache lookups from
pg_class_aclcheck() with no user-visible behavior change.
Lifecycle management:
- AfterTriggerBatchCallback: A new general-purpose callback
mechanism in trigger.c. Callbacks registered via
RegisterAfterTriggerBatchCallback() fire at the end of each
trigger-firing batch (AfterTriggerEndQuery for immediate
constraints, AfterTriggerFireDeferred at COMMIT, and
AfterTriggerSetState for SET CONSTRAINTS IMMEDIATE). The RI code
registers ri_FastPathTeardown as a batch callback, which does
orderly teardown: index_endscan, index_close, table_close,
ExecDropSingleTupleTableSlot, UnregisterSnapshot.
- Batch callbacks only fire at the outermost query level
(query_depth == 0 in AfterTriggerEndQuery and checked inside
FireAfterTriggerBatchCallbacks), so nested queries from SPI
inside other AFTER triggers do not tear down the cache mid-batch.
- XactCallback: ri_FastPathXactCallback NULLs the static cache
pointer at transaction end. On the normal path, cleanup already
ran via the batch callback; this handles the abort path, where
TopTransactionContext destruction frees the memory and
ResourceOwner releases the underlying resources.
- SubXactCallback: ri_FastPathSubXactCallback NULLs the static
cache pointer on subtransaction abort. ResourceOwner already
cleaned up the resources; this prevents the batch callback from
trying to double-close them.
- AfterTriggerBatchIsActive(): Exported accessor that returns true
when afterTriggers.query_depth >= 0. During ALTER TABLE ... ADD
FOREIGN KEY validation, RI triggers are called directly outside
the after-trigger framework, so batch callbacks would never fire.
The fast-path code uses this to fall back to a non-cached
per-invocation path (open/scan/close each call) in that context.
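
The register/fire/clear cycle of the batch callback mechanism described above
can be sketched outside the backend as follows. This is only an illustration
under stated assumptions: a hand-rolled linked list and malloc() replace List
and TopTransactionContext, and the names register_batch_callback(),
fire_batch_callbacks() and count_call() are hypothetical stand-ins, not the
functions added to trigger.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*BatchCallback) (void *arg);

typedef struct CallbackItem
{
    BatchCallback callback;
    void       *arg;
    struct CallbackItem *next;
} CallbackItem;

/* Hypothetical stand-ins for afterTriggers.batch_callbacks / query_depth */
static CallbackItem *batch_callbacks = NULL;
static int  query_depth = -1;

/* Example callback: counts how many times it has been fired */
static void
count_call(void *arg)
{
    (*(int *) arg)++;
}

static void
register_batch_callback(BatchCallback callback, void *arg)
{
    CallbackItem *item = malloc(sizeof(CallbackItem));
    CallbackItem **tail = &batch_callbacks;

    assert(item != NULL);
    item->callback = callback;
    item->arg = arg;
    item->next = NULL;

    /* Append, so callbacks fire in registration order */
    while (*tail != NULL)
        tail = &(*tail)->next;
    *tail = item;
}

static void
fire_batch_callbacks(void)
{
    CallbackItem *item;

    /* Nested query levels must not tear down the outer batch's state */
    if (query_depth > 0)
        return;

    for (item = batch_callbacks; item != NULL; item = item->next)
        item->callback(item->arg);

    /* Clear the list: callers must re-register for the next batch */
    while (batch_callbacks != NULL)
    {
        item = batch_callbacks;
        batch_callbacks = item->next;
        free(item);
    }
}
```

Firing at depth 1 is a no-op and leaves the registration in place; firing at
depth 0 (or -1) runs each callback once and empties the list, so a second
fire does nothing until something registers again.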
Benchmarking shows that together with <commit-hash-0001>, bulk FK
inserts are ~2.2x faster (int PK / int FK, 1M rows, PK table
and index cached).
Author: Amit Langote <[email protected]>
Co-authored-by: Junwang Zhao <[email protected]>
Reviewed-by: Haibo Yan <[email protected]>
Tested-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
---
src/backend/commands/trigger.c | 90 +++++++
src/backend/utils/adt/ri_triggers.c | 275 +++++++++++++++++++++-
src/include/commands/trigger.h | 18 ++
src/test/regress/expected/foreign_key.out | 86 +++++++
src/test/regress/sql/foreign_key.sql | 80 +++++++
src/tools/pgindent/typedefs.list | 3 +
6 files changed, 549 insertions(+), 3 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 6596843a8d8..b7442cf6cb1 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3893,6 +3893,8 @@ typedef struct AfterTriggersData
/* per-subtransaction-level data: */
AfterTriggersTransData *trans_stack; /* array of structs shown below */
int maxtransdepth; /* allocated len of above array */
+
+ List *batch_callbacks; /* List of AfterTriggerCallbackItem */
} AfterTriggersData;
struct AfterTriggersQueryData
@@ -3929,6 +3931,13 @@ struct AfterTriggersTableData
TupleTableSlot *storeslot; /* for converting to tuplestore's format */
};
+/* Entry in afterTriggers.batch_callbacks */
+typedef struct AfterTriggerCallbackItem
+{
+ AfterTriggerBatchCallback callback;
+ void *arg;
+} AfterTriggerCallbackItem;
+
static AfterTriggersData afterTriggers;
static void AfterTriggerExecute(EState *estate,
@@ -3964,6 +3973,7 @@ static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
Oid tgoid, bool tgisdeferred);
static void cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent);
+static void FireAfterTriggerBatchCallbacks(void);
/*
* Get the FDW tuplestore for the current trigger query level, creating it
@@ -5089,6 +5099,7 @@ AfterTriggerBeginXact(void)
*/
afterTriggers.firing_counter = (CommandId) 1; /* mustn't be 0 */
afterTriggers.query_depth = -1;
+ afterTriggers.batch_callbacks = NIL;
/*
* Verify that there is no leftover state remaining. If these assertions
@@ -5210,6 +5221,8 @@ AfterTriggerEndQuery(EState *estate)
break;
}
+ FireAfterTriggerBatchCallbacks();
+
/* Release query-level-local storage, including tuplestores if any */
AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
@@ -5317,6 +5330,8 @@ AfterTriggerFireDeferred(void)
break; /* all fired */
}
+ FireAfterTriggerBatchCallbacks();
+
/*
* We don't bother freeing the event list, since it will go away anyway
* (and more efficiently than via pfree) in AfterTriggerEndXact.
@@ -6059,6 +6074,8 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
break; /* all fired */
}
+ FireAfterTriggerBatchCallbacks();
+
if (snapshot_set)
PopActiveSnapshot();
}
@@ -6755,3 +6772,76 @@ check_modified_virtual_generated(TupleDesc tupdesc, HeapTuple tuple)
return tuple;
}
+
+/*
+ * RegisterAfterTriggerBatchCallback
+ * Register a function to be called when the current trigger-firing
+ * batch completes.
+ *
+ * Must be called from within a trigger function's execution context
+ * (i.e., while afterTriggers state is active).
+ *
+ * The callback list is cleared after invocation, so the caller must
+ * re-register for each new batch if needed.
+ */
+void
+RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg)
+{
+ AfterTriggerCallbackItem *item;
+ MemoryContext oldcxt;
+
+ /*
+ * Allocate in TopTransactionContext so the item survives for the duration
+ * of the batch, which may span multiple trigger invocations.
+ */
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ item = palloc(sizeof(AfterTriggerCallbackItem));
+ item->callback = callback;
+ item->arg = arg;
+ afterTriggers.batch_callbacks =
+ lappend(afterTriggers.batch_callbacks, item);
+ MemoryContextSwitchTo(oldcxt);
+}
+
+/*
+ * FireAfterTriggerBatchCallbacks
+ * Invoke and clear all registered batch callbacks.
+ *
+ * Only fires at the outermost query level (query_depth == 0) or from
+ * top-level operations (query_depth == -1, e.g. AfterTriggerFireDeferred
+ * at COMMIT). Nested queries from SPI inside AFTER triggers run at
+ * depth > 0 and must not tear down resources the outer batch still needs.
+ */
+static void
+FireAfterTriggerBatchCallbacks(void)
+{
+ ListCell *lc;
+
+ if (afterTriggers.query_depth > 0)
+ return;
+
+ foreach(lc, afterTriggers.batch_callbacks)
+ {
+ AfterTriggerCallbackItem *item = lfirst(lc);
+
+ item->callback(item->arg);
+ }
+
+ list_free_deep(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = NIL;
+}
+
+/*
+ * AfterTriggerBatchIsActive
+ * Returns true if we're inside a query-level trigger batch where
+ * registered batch callbacks will actually be invoked.
+ *
+ * This is false during validateForeignKeyConstraint(), which calls
+ * RI trigger functions directly outside the after-trigger framework.
+ */
+bool
+AfterTriggerBatchIsActive(void)
+{
+ return afterTriggers.query_depth >= 0;
+}
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 6d8de64471f..12de0dd2cf6 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -196,6 +196,23 @@ typedef struct RI_CompareHashEntry
FmgrInfo cast_func_finfo; /* in case we must coerce input */
} RI_CompareHashEntry;
+/*
+ * RI_FastPathEntry
+ * Per-constraint cache of resources needed by ri_FastPathCheckCached().
+ *
+ * One entry per constraint, keyed by pg_constraint OID. Created lazily
+ * by ri_FastPathGetEntry() on first use within a trigger-firing batch
+ * and torn down by ri_FastPathTeardown() at batch end.
+ */
+typedef struct RI_FastPathEntry
+{
+ Oid conoid; /* hash key: pg_constraint OID */
+ Relation pk_rel;
+ Relation idx_rel;
+ IndexScanDesc scandesc;
+ TupleTableSlot *slot;
+ Snapshot snapshot; /* registered snapshot for the scan */
+} RI_FastPathEntry;
/*
* Local data
@@ -205,6 +222,8 @@ static HTAB *ri_query_cache = NULL;
static HTAB *ri_compare_cache = NULL;
static dclist_head ri_constraint_cache_valid_list;
+static HTAB *ri_fastpath_cache = NULL;
+static bool ri_fastpath_callback_registered = false;
/*
* Local function prototypes
@@ -255,6 +274,8 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
bool detectNewRows, int expect_OK);
static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
IndexScanDesc scandesc, TupleTableSlot *slot,
Snapshot snapshot, const RI_ConstraintInfo *riinfo,
@@ -277,6 +298,9 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool is_restrict, bool partgone);
+static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel);
+static void ri_FastPathTeardown(void *arg);
/*
@@ -387,12 +411,16 @@ RI_FKey_check(TriggerData *trigdata)
* lock. This is semantically equivalent to the SPI path below but avoids
* the per-row executor overhead.
*
- * ri_FastPathCheck() reports the violation itself (via ereport) if no
- * matching PK row is found, so it only returns on success.
+ * ri_FastPathCheckCached() and ri_FastPathCheck() report the violation
+ * themselves if no matching PK row is found, so they only return on
+ * success.
*/
if (ri_fastpath_is_applicable(riinfo))
{
- ri_FastPathCheck(riinfo, fk_rel, newslot);
+ if (AfterTriggerBatchIsActive())
+ ri_FastPathCheckCached(riinfo, fk_rel, newslot);
+ else
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
return PointerGetDatum(NULL);
}
@@ -2742,6 +2770,73 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
table_close(pk_rel, NoLock);
}
+/*
+ * ri_FastPathCheckCached
+ * Cached-resource variant of ri_FastPathCheck for use within the
+ * after-trigger framework.
+ *
+ * Uses the per-batch cache (RI_FastPathEntry) to avoid per-row relation
+ * open/close, scan begin/end, and snapshot registration. The snapshot's
+ * curcid is patched each call so the scan sees effects of prior triggers.
+ *
+ * Like ri_FastPathCheck, reports the violation via ri_ReportViolation()
+ * if no matching PK row is found.
+ */
+static void
+ri_FastPathCheckCached(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ IndexScanDesc scandesc = fpentry->scandesc;
+ Snapshot snapshot = fpentry->snapshot;
+ TupleTableSlot *slot = fpentry->slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found;
+ Oid saved_userid;
+ int saved_sec_context;
+ MemoryContext oldcxt;
+
+ /*
+ * Advance the command counter and patch the cached snapshot's curcid so
+ * the scan sees PK rows inserted by earlier triggers in this statement.
+ */
+ CommandCounterIncrement();
+ fpentry->snapshot->curcid = GetCurrentCommandId(false);
+
+ if (riinfo->fpmeta == NULL)
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ Assert(riinfo->fpmeta);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+
+ /*
+ * The cached scandesc lives in TopTransactionContext, but the btree AM
+ * defers some allocations to the first index_getnext_slot call. Ensure
+ * those land in TopTransactionContext too.
+ */
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, slot, snapshot,
+ riinfo, skey, riinfo->nkeys);
+ MemoryContextSwitchTo(oldcxt);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+ if (!found)
+ ri_ReportViolation(riinfo, pk_rel, fk_rel, newslot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+}
+
/*
* ri_FastPathProbeOne
* Probe the PK index for one set of scan keys, lock the matching
@@ -3673,3 +3768,177 @@ RI_FKey_trigger_type(Oid tgfoid)
return RI_TRIGGER_NONE;
}
+
+/*
+ * ri_FastPathTeardown
+ * Tear down all cached fast-path state.
+ *
+ * Called as an AfterTriggerBatchCallback at end of batch.
+ */
+static void
+ri_FastPathTeardown(void *arg)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ /* Close the scan before closing idx_rel. */
+ if (entry->scandesc)
+ index_endscan(entry->scandesc);
+ if (entry->idx_rel)
+ index_close(entry->idx_rel, NoLock);
+ if (entry->pk_rel)
+ table_close(entry->pk_rel, NoLock);
+ if (entry->slot)
+ ExecDropSingleTupleTableSlot(entry->slot);
+ if (entry->snapshot)
+ UnregisterSnapshot(entry->snapshot);
+ }
+
+ hash_destroy(ri_fastpath_cache);
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static bool ri_fastpath_xact_callback_registered = false;
+
+static void
+ri_FastPathXactCallback(XactEvent event, void *arg)
+{
+ /*
+ * TopTransactionContext is destroyed at end of transaction, taking the
+ * hash table and all cached resources with it. Just reset our static
+ * pointers so we don't dereference freed memory.
+ *
+ * In the normal (non-error) path, ri_FastPathTeardown already ran via the
+ * batch callback and did orderly teardown. Here we're just handling the
+ * abort path where that callback never fired.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static void
+ri_FastPathSubXactCallback(SubXactEvent event, SubTransactionId mySubid,
+ SubTransactionId parentSubid, void *arg)
+{
+ if (event == SUBXACT_EVENT_ABORT_SUB)
+ {
+ /*
+ * ResourceOwner already cleaned up relations and snapshots. Just
+ * NULL our pointers so the still-registered batch callback becomes a
+ * no-op. The hash table memory in TopTransactionContext will be
+ * freed at transaction end.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+ }
+}
+
+/*
+ * ri_FastPathGetEntry
+ * Look up or create a per-batch cache entry for the given constraint.
+ *
+ * On first call for a constraint within a batch: opens pk_rel and the
+ * index, begins an index scan, allocates a result slot, and registers
+ * the cleanup callback.
+ *
+ * On subsequent calls: returns the existing entry. Caller uses
+ * index_rescan() with new keys.
+ */
+static RI_FastPathEntry *
+ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+ RI_FastPathEntry *entry;
+ bool found;
+
+ /* Create hash table on first use in this batch */
+ if (ri_fastpath_cache == NULL)
+ {
+ HASHCTL ctl;
+
+ if (!ri_fastpath_xact_callback_registered)
+ {
+ RegisterXactCallback(ri_FastPathXactCallback, NULL);
+ RegisterSubXactCallback(ri_FastPathSubXactCallback, NULL);
+ ri_fastpath_xact_callback_registered = true;
+ }
+
+ ctl.keysize = sizeof(Oid);
+ ctl.entrysize = sizeof(RI_FastPathEntry);
+ ctl.hcxt = TopTransactionContext;
+ ri_fastpath_cache = hash_create("RI fast-path cache",
+ 16,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ }
+
+ entry = hash_search(ri_fastpath_cache, &riinfo->constraint_id,
+ HASH_ENTER, &found);
+
+ if (!found)
+ {
+ MemoryContext oldcxt;
+ Oid saved_userid;
+ int saved_sec_context;
+
+ /*
+ * Zero out non-key fields so ri_FastPathTeardown is safe if we error
+ * out during partial initialization below.
+ */
+ memset(((char *) entry) + offsetof(RI_FastPathEntry, pk_rel), 0,
+ sizeof(RI_FastPathEntry) - offsetof(RI_FastPathEntry, pk_rel));
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+
+ /*
+ * Open PK table and its unique index.
+ *
+ * RowShareLock on pk_rel matches what the SPI path's SELECT ... FOR
+ * KEY SHARE would acquire as a relation-level lock. AccessShareLock
+ * on the index is standard for index scans.
+ *
+ * We don't release these locks until end of transaction, matching SPI
+ * behavior.
+ */
+ entry->pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ entry->idx_rel = index_open(riinfo->conindid, AccessShareLock);
+
+ /*
+ * Register an initial snapshot. Its curcid will be patched in place
+ * on each subsequent row (see ri_FastPathCheckCached()), avoiding
+ * per-row GetSnapshotData() overhead.
+ */
+ entry->snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+ entry->slot = table_slot_create(entry->pk_rel, NULL);
+
+ entry->scandesc = index_beginscan(entry->pk_rel, entry->idx_rel,
+ entry->snapshot, NULL,
+ riinfo->nkeys, 0);
+
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Ensure cleanup at end of this trigger-firing batch */
+ if (!ri_fastpath_callback_registered)
+ {
+ RegisterAfterTriggerBatchCallback(ri_FastPathTeardown, NULL);
+ ri_fastpath_callback_registered = true;
+ }
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(entry->pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ ri_CheckPermissions(entry->pk_rel);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ }
+
+ return entry;
+}
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 27af5284406..7664298f5c8 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -289,4 +289,22 @@ extern void RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel,
extern int RI_FKey_trigger_type(Oid tgfoid);
+/*
+ * Callback type for end-of-trigger-batch notifications.
+ *
+ * Registered via RegisterAfterTriggerBatchCallback(). Invoked when
+ * a batch of after-trigger processing completes:
+ * - AfterTriggerEndQuery() (immediate constraints)
+ * - AfterTriggerFireDeferred() (deferred constraints at COMMIT)
+ * - AfterTriggerSetState() (SET CONSTRAINTS IMMEDIATE)
+ *
+ * The callback list is cleared after each batch. Callers must
+ * re-register if they need to be called again in a subsequent batch.
+ */
+typedef void (*AfterTriggerBatchCallback) (void *arg);
+
+extern void RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg);
+extern bool AfterTriggerBatchIsActive(void);
+
#endif /* TRIGGER_H */
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 0826f518004..25d505c6c12 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3504,3 +3504,89 @@ DETAIL: drop cascades to table fkpart13_t1
drop cascades to table fkpart13_t2
drop cascades to table fkpart13_t3
RESET search_path;
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+ERROR: insert or update on table "fp_fk_alter" violates foreign key constraint "fp_fk_alter_a_fkey"
+DETAIL: Key (a)=(101) is not present in table "fp_pk_alter".
+DROP TABLE fp_fk_alter, fp_pk_alter;
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+ERROR: insert or update on table "fp_fk_alter2" violates foreign key constraint "fp_fk_alter2_a_fkey"
+DETAIL: Key (a)=(200) is not present in table "fp_pk_alter2".
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+ERROR: insert or update on table "fp_multi_fk" violates foreign key constraint "fp_multi_fk_b_fkey"
+DETAIL: Key (b)=(2) is not present in table "fp_pk2".
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+ERROR: insert or update on table "fp_fk_defer" violates foreign key constraint "fp_fk_defer_a_fkey"
+DETAIL: Key (a)=(3) is not present in table "fp_pk_defer".
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+ a
+---
+ 1
+ 1
+(2 rows)
+
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index e9ee29331cb..cedd20c8d11 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2498,3 +2498,83 @@ WITH cte AS (
DROP SCHEMA fkpart13 CASCADE;
RESET search_path;
+
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+DROP TABLE fp_fk_alter, fp_pk_alter;
+
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c51a0a903a6..0b05304a294 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -30,6 +30,8 @@ AddForeignUpdateTargets_function
AddrInfo
AffixNode
AffixNodeData
+AfterTriggerBatchCallback
+AfterTriggerCallbackItem
AfterTriggerEvent
AfterTriggerEventChunk
AfterTriggerEventData
@@ -2478,6 +2480,7 @@ RIX
RI_CompareHashEntry
RI_CompareKey
RI_ConstraintInfo
+RI_FastPathEntry
RI_QueryHashEntry
RI_QueryKey
RTEKind
--
2.47.3
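A note on the re-registration contract documented for AfterTriggerBatchCallback in the trigger.h hunk above: the callback list is cleared after each batch, so a consumer has to track whether it is currently registered and register again on first use in the next batch. The following is only a minimal standalone sketch of that pattern, not the patch's implementation. Every name in it (sketch_register, sketch_fire_batch, sketch_cache_use, and so on) is hypothetical, and the fixed-size array stands in for whatever list the real code keeps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the callback type; not the PostgreSQL one. */
typedef void (*sketch_batch_callback) (void *arg);

#define SKETCH_MAX_CALLBACKS 8

static sketch_batch_callback sketch_callbacks[SKETCH_MAX_CALLBACKS];
static void *sketch_args[SKETCH_MAX_CALLBACKS];
static int	sketch_ncallbacks = 0;

/* Append a callback to the list for the current batch. */
static void
sketch_register(sketch_batch_callback cb, void *arg)
{
	assert(sketch_ncallbacks < SKETCH_MAX_CALLBACKS);
	sketch_callbacks[sketch_ncallbacks] = cb;
	sketch_args[sketch_ncallbacks] = arg;
	sketch_ncallbacks++;
}

/* End of batch: run each callback once, then clear the list. */
static void
sketch_fire_batch(void)
{
	for (int i = 0; i < sketch_ncallbacks; i++)
		sketch_callbacks[i] (sketch_args[i]);
	sketch_ncallbacks = 0;
}

/* Consumer side: a cache that must be torn down once per batch. */
static bool sketch_cache_live = false;
static bool sketch_teardown_registered = false;
static int	sketch_teardown_count = 0;

static void
sketch_cache_teardown(void *arg)
{
	(void) arg;
	sketch_cache_live = false;
	/* Reset the flag so the next batch registers the callback again. */
	sketch_teardown_registered = false;
	sketch_teardown_count++;
}

/* Called once per row; registers the teardown only on first use. */
static void
sketch_cache_use(void)
{
	sketch_cache_live = true;
	if (!sketch_teardown_registered)
	{
		sketch_register(sketch_cache_teardown, NULL);
		sketch_teardown_registered = true;
	}
}
```

Calling sketch_cache_use() for many rows in one batch leaves a single entry in the list, and the flag reset inside the teardown is what lets the next batch register again.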
[application/octet-stream] v9-0001-Add-fast-path-for-foreign-key-constraint-checks.patch (31.1K, 4-v9-0001-Add-fast-path-for-foreign-key-constraint-checks.patch)
From 92e8fd30d87a08fa675e7d15cf60b40c11d9afc8 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 24 Mar 2026 18:28:00 +0900
Subject: [PATCH v9 1/3] Add fast path for foreign key constraint checks
Add a fast-path optimization for foreign key checks that bypasses SPI
by directly probing the unique index on the referenced table.
The fast path applies when the referenced table is not partitioned and
the constraint does not involve temporal semantics. Otherwise, the
existing SPI path is used.
ri_FastPathCheck() extracts the FK values, builds scan keys, performs
an index scan, and locks the matching tuple with LockTupleKeyShare
via ri_LockPKTuple(), which handles the RI-specific subset of
table_tuple_lock() results.
If the locked tuple was reached by chasing an update chain
(tmfd.traversed), recheck_matched_pk_tuple() verifies that the key
is still the same, emulating EvalPlanQual.
The scan uses GetTransactionSnapshot(), matching what the SPI path
uses (via _SPI_execute_plan pushing GetTransactionSnapshot() as the
active snapshot). Under READ COMMITTED this is a fresh snapshot;
under REPEATABLE READ / SERIALIZABLE it is the frozen transaction-
start snapshot, so PK rows committed after the transaction started
are not visible.
The ri_CheckPermissions() function performs schema USAGE and table
SELECT checks, matching what the SPI path gets implicitly through
the executor's permission checks. The fast path also switches to
the PK table owner's security context (with SECURITY_NOFORCE_RLS)
before the index probe, matching the SPI path where the query runs
as the table owner.
ri_HashCompareOp() is adjusted to handle cross-type equality operators
(e.g. int48eq for int4 PK / int8 FK) which can appear in conpfeqop.
The existing code asserted same-type operators only, which was correct
for its existing callers (ri_KeysEqual compares same-type FK column
values via ff_eq_oprs), but the fast path is the first caller to pass
pf_eq_oprs, which can be cross-type.
Per-key metadata (compare entries, operator procedures, strategy
numbers) is cached in RI_ConstraintInfo via
ri_populate_fastpath_metadata() on first use, eliminating repeated
calls to ri_HashCompareOp() and get_op_opfamily_properties().
conindid and pk_is_partitioned are also cached at constraint load
time, avoiding per-invocation syscache lookups and the need to open
pk_rel before deciding whether the fast path applies.
New regression tests cover RLS bypass and ACL enforcement for the
fast-path permission checks. New isolation tests exercise concurrent
PK updates under both READ COMMITTED and REPEATABLE READ.
Benchmarking shows ~1.8x speedup for bulk FK inserts (int PK/int FK,
1M rows, with the PK table and index cached).
Author: Junwang Zhao <[email protected]>
Co-authored-by: Amit Langote <[email protected]>
Reviewed-by: Haibo Yan <[email protected]>
Tested-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
---
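Reader's note on the lock-result handling described in the commit message: the outcome of the tuple lock depends on both the lock result and the isolation level, and a lock reached by chasing an update chain needs a key recheck. The sketch below is a simplified standalone model of that decision table as it appears in ri_LockPKTuple() and recheck_matched_pk_tuple() in the diff that follows. It is not the patch's code: all names are hypothetical, the enum values only mirror the TM_* cases the patch handles, and plain integer comparison stands in for the equality operator calls the real recheck makes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the TM_Result values handled by the patch. */
typedef enum
{
	SKETCH_TM_OK,
	SKETCH_TM_DELETED,
	SKETCH_TM_UPDATED,
	SKETCH_TM_SELF_MODIFIED,
	SKETCH_TM_INVISIBLE
} SketchLockResult;

typedef enum
{
	SKETCH_FOUND,
	SKETCH_NOT_FOUND,
	SKETCH_SERIALIZATION_FAILURE,
	SKETCH_INTERNAL_ERROR
} SketchOutcome;

/* Assumption: integer keys; the real recheck calls the equality operator. */
static bool
sketch_key_matches(const int *search_key, const int *locked_key, int nkeys)
{
	assert(nkeys > 0);
	for (int i = 0; i < nkeys; i++)
	{
		if (search_key[i] != locked_key[i])
			return false;
	}
	return true;
}

/*
 * Map a lock result to an outcome.  "traversed" models tmfd.traversed and
 * "xact_snapshot" models IsolationUsesXactSnapshot().
 */
static SketchOutcome
sketch_classify(SketchLockResult result, bool traversed, bool xact_snapshot,
				const int *search_key, const int *locked_key, int nkeys)
{
	switch (result)
	{
		case SKETCH_TM_OK:
			/* Locked a newer version: the key may have changed meanwhile. */
			if (traversed &&
				!sketch_key_matches(search_key, locked_key, nkeys))
				return SKETCH_NOT_FOUND;
			return SKETCH_FOUND;
		case SKETCH_TM_DELETED:
			return xact_snapshot ? SKETCH_SERIALIZATION_FAILURE
				: SKETCH_NOT_FOUND;
		case SKETCH_TM_UPDATED:
			/* Without an xact snapshot the chain should have been chased. */
			return xact_snapshot ? SKETCH_SERIALIZATION_FAILURE
				: SKETCH_INTERNAL_ERROR;
		case SKETCH_TM_SELF_MODIFIED:
			return SKETCH_NOT_FOUND;
		case SKETCH_TM_INVISIBLE:
			break;
	}
	return SKETCH_INTERNAL_ERROR;
}
```

In this model a concurrent key update that commits first ends as SKETCH_NOT_FOUND under READ COMMITTED, which the caller would report as a foreign key violation.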
src/backend/utils/adt/ri_triggers.c | 469 +++++++++++++++++-
.../expected/fk-concurrent-pk-upd.out | 105 ++++
src/test/isolation/isolation_schedule | 1 +
.../isolation/specs/fk-concurrent-pk-upd.spec | 53 ++
src/test/regress/expected/foreign_key.out | 47 ++
src/test/regress/sql/foreign_key.sql | 64 +++
src/tools/pgindent/typedefs.list | 1 +
7 files changed, 726 insertions(+), 14 deletions(-)
create mode 100644 src/test/isolation/expected/fk-concurrent-pk-upd.out
create mode 100644 src/test/isolation/specs/fk-concurrent-pk-upd.spec
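Reader's note on the ri_HashCompareOp() adjustment: once cross-type operators are allowed, the "no cast needed" test compares the caller's value type against the operator's right-hand input type, since the left and right types can no longer be assumed equal. The sketch below models only that comparison. The type tags and the SketchEqOperator struct are hypothetical illustrations, not PostgreSQL catalog structures, and the cast lookup that the real function performs in the other branch is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type tags standing in for type OIDs. */
typedef enum
{
	SKETCH_INT4,
	SKETCH_INT8
} SketchType;

/* Hypothetical description of an equality operator "PK = FK". */
typedef struct
{
	const char *name;
	SketchType	lefttype;		/* PK column side */
	SketchType	righttype;		/* FK value side */
} SketchEqOperator;

/*
 * Return true when the value must be coerced before it is handed to the
 * operator, i.e. when its type differs from the right-hand input type.
 */
static bool
sketch_needs_cast(const SketchEqOperator *op, SketchType value_type)
{
	assert(op != NULL);
	return value_type != op->righttype;
}
```

For the commit message's example, an operator described as (int4, int8) applied to an int8 FK value needs no cast, while a value of any other type would go through the cast lookup.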
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index d22b8ef7f3c..6d8de64471f 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -24,12 +24,15 @@
#include "postgres.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/index.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
+#include "catalog/pg_namespace.h"
#include "commands/trigger.h"
#include "executor/executor.h"
#include "executor/spi.h"
@@ -91,6 +94,7 @@
#define RI_TRIGTYPE_UPDATE 2
#define RI_TRIGTYPE_DELETE 3
+typedef struct FastPathMeta FastPathMeta;
/*
* RI_ConstraintInfo
@@ -132,8 +136,24 @@ typedef struct RI_ConstraintInfo
Oid period_intersect_oper; /* anyrange * anyrange (or
* multiranges) */
dlist_node valid_link; /* Link in list of valid entries */
+
+ Oid conindid;
+ bool pk_is_partitioned;
+
+ FastPathMeta *fpmeta;
} RI_ConstraintInfo;
+typedef struct RI_CompareHashEntry RI_CompareHashEntry;
+
+/* Fast-path metadata for RI checks on foreign key referencing tables */
+typedef struct FastPathMeta
+{
+ RI_CompareHashEntry *compare_entries[RI_MAX_NUMKEYS];
+ RegProcedure regops[RI_MAX_NUMKEYS];
+ Oid subtypes[RI_MAX_NUMKEYS];
+ int strats[RI_MAX_NUMKEYS];
+} FastPathMeta;
+
/*
* RI_QueryKey
*
@@ -233,6 +253,23 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool is_restrict,
bool detectNewRows, int expect_OK);
+static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
+ IndexScanDesc scandesc, TupleTableSlot *slot,
+ Snapshot snapshot, const RI_ConstraintInfo *riinfo,
+ ScanKeyData *skey, int nkeys);
+static bool ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
+ bool *concurrently_updated);
+static bool ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo);
+static void ri_CheckPermissions(Relation query_rel);
+static bool recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+ TupleTableSlot *new_slot);
+static void build_index_scankeys(const RI_ConstraintInfo *riinfo,
+ Relation idx_rel, Datum *pk_vals,
+ char *pk_nulls, ScanKey skeys);
+static void ri_populate_fastpath_metadata(RI_ConstraintInfo *riinfo,
+ Relation fk_rel, Relation idx_rel);
static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
const RI_ConstraintInfo *riinfo, bool rel_is_pk,
Datum *vals, char *nulls);
@@ -276,14 +313,7 @@ RI_FKey_check(TriggerData *trigdata)
if (!table_tuple_satisfies_snapshot(trigdata->tg_relation, newslot, SnapshotSelf))
return PointerGetDatum(NULL);
- /*
- * Get the relation descriptors of the FK and PK tables.
- *
- * pk_rel is opened in RowShareLock mode since that's what our eventual
- * SELECT FOR KEY SHARE will get on it.
- */
fk_rel = trigdata->tg_relation;
- pk_rel = table_open(riinfo->pk_relid, RowShareLock);
switch (ri_NullCheck(RelationGetDescr(fk_rel), newslot, riinfo, false))
{
@@ -293,7 +323,6 @@ RI_FKey_check(TriggerData *trigdata)
* No further check needed - an all-NULL key passes every type of
* foreign key constraint.
*/
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
case RI_KEYS_SOME_NULL:
@@ -318,7 +347,6 @@ RI_FKey_check(TriggerData *trigdata)
errdetail("MATCH FULL does not allow mixing of null and nonnull key values."),
errtableconstraint(fk_rel,
NameStr(riinfo->conname))));
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
case FKCONSTR_MATCH_SIMPLE:
@@ -327,7 +355,6 @@ RI_FKey_check(TriggerData *trigdata)
* MATCH SIMPLE - if ANY column is null, the key passes
* the constraint.
*/
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
#ifdef NOT_USED
@@ -352,8 +379,31 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
+ /*
+ * Fast path: probe the PK unique index directly, bypassing SPI.
+ *
+ * For non-partitioned, non-temporal FKs, we can skip the SPI machinery
+ * (plan cache, executor setup, etc.) and do a direct index scan + tuple
+ * lock. This is semantically equivalent to the SPI path below but avoids
+ * the per-row executor overhead.
+ *
+ * ri_FastPathCheck() reports the violation itself (via ereport) if no
+ * matching PK row is found, so it only returns on success.
+ */
+ if (ri_fastpath_is_applicable(riinfo))
+ {
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
+ return PointerGetDatum(NULL);
+ }
+
SPI_connect();
+ /*
+ * pk_rel is opened in RowShareLock mode since that's what our eventual
+ * SELECT FOR KEY SHARE will get on it.
+ */
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+
/* Fetch or prepare a saved plan for the real check */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
@@ -2356,6 +2406,12 @@ ri_LoadConstraintInfo(Oid constraintOid)
riinfo->valid = true;
+ riinfo->conindid = conForm->conindid;
+ riinfo->pk_is_partitioned =
+ (get_rel_relkind(riinfo->pk_relid) == RELKIND_PARTITIONED_TABLE);
+
+ riinfo->fpmeta = NULL;
+
return riinfo;
}
@@ -2617,6 +2673,382 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
return SPI_processed != 0;
}
+/*
+ * ri_FastPathCheck
+ * Perform FK existence check via direct index probe, bypassing SPI.
+ *
+ * If no matching PK row exists, report the violation via
+ * ri_ReportViolation(); otherwise, return normally.
+ */
+static void
+ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ Relation pk_rel;
+ Relation idx_rel;
+ IndexScanDesc scandesc;
+ TupleTableSlot *slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found = false;
+ Oid saved_userid;
+ int saved_sec_context;
+ Snapshot snapshot;
+
+ /*
+ * Advance the command counter so the snapshot sees the effects of prior
+ * triggers in this statement. Mirrors what the SPI path does in
+ * ri_PerformCheck().
+ */
+ CommandCounterIncrement();
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ idx_rel = index_open(riinfo->conindid, AccessShareLock);
+
+ slot = table_slot_create(pk_rel, NULL);
+ scandesc = index_beginscan(pk_rel, idx_rel,
+ snapshot, NULL,
+ riinfo->nkeys, 0);
+
+ if (riinfo->fpmeta == NULL)
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ Assert(riinfo->fpmeta);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ ri_CheckPermissions(pk_rel);
+
+ ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, slot,
+ snapshot, riinfo, skey, riinfo->nkeys);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ index_endscan(scandesc);
+ ExecDropSingleTupleTableSlot(slot);
+ UnregisterSnapshot(snapshot);
+
+ if (!found)
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ newslot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+
+ index_close(idx_rel, NoLock);
+ table_close(pk_rel, NoLock);
+}
+
+/*
+ * ri_FastPathProbeOne
+ * Probe the PK index for one set of scan keys and lock the matching
+ * tuple.
+ *
+ * Returns true if a matching PK row was found, locked, and (if
+ * applicable) visible to the transaction snapshot.
+ *
+ * The caller must ensure CurrentMemoryContext is long-lived enough
+ * for the scan descriptor's internal allocations (typically
+ * TopTransactionContext when using a cached scandesc).
+ */
+static bool
+ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
+ IndexScanDesc scandesc, TupleTableSlot *slot,
+ Snapshot snapshot, const RI_ConstraintInfo *riinfo,
+ ScanKeyData *skey, int nkeys)
+{
+ bool found = false;
+
+ index_rescan(scandesc, skey, nkeys, NULL, 0);
+
+ if (index_getnext_slot(scandesc, ForwardScanDirection, slot))
+ {
+ bool concurrently_updated;
+
+ if (ri_LockPKTuple(pk_rel, slot, snapshot,
+ &concurrently_updated))
+ {
+ if (concurrently_updated)
+ found = recheck_matched_pk_tuple(idx_rel, skey, slot);
+ else
+ found = true;
+ }
+ }
+
+ return found;
+}
+
+/*
+ * ri_LockPKTuple
+ * Lock a PK tuple found by the fast-path index scan.
+ *
+ * Calls table_tuple_lock() directly with handling specific to RI checks.
+ * Returns true if the tuple was successfully locked.
+ *
+ * Sets *concurrently_updated to true if the locked tuple was reached
+ * by following an update chain (tmfd.traversed), indicating the caller
+ * should recheck the key.
+ */
+static bool
+ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
+ bool *concurrently_updated)
+{
+ TM_FailureData tmfd;
+ TM_Result result;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+
+ *concurrently_updated = false;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ result = table_tuple_lock(pk_rel, &slot->tts_tid, snap,
+ slot, GetCurrentCommandId(false),
+ LockTupleKeyShare, LockWaitBlock,
+ lockflags, &tmfd);
+
+ switch (result)
+ {
+ case TM_Ok:
+ if (tmfd.traversed)
+ *concurrently_updated = true;
+ return true;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ return false;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+
+ /*
+ * In READ COMMITTED, FIND_LAST_VERSION should have chased the
+ * chain and returned TM_Ok. Getting here means something
+ * unexpected -- fall through to error.
+ */
+ elog(ERROR, "unexpected table_tuple_lock status: %u", result);
+ break;
+
+ case TM_SelfModified:
+
+ /*
+ * The current command or a later command in this transaction
+ * modified the PK row. This shouldn't normally happen during an
+ * FK check (we're not modifying pk_rel), but handle it safely by
+ * treating the tuple as not found.
+ */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ break;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", result);
+ break;
+ }
+
+ return false; /* keep compiler quiet */
+}
+
+static bool
+ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo)
+{
+ /*
+ * Partitioned referenced tables are skipped for simplicity, since they
+ * require routing the probe through the correct partition using
+ * PartitionDirectory.
+ */
+ if (riinfo->pk_is_partitioned)
+ return false;
+
+ /*
+ * Temporal foreign keys use range overlap and containment semantics (&&,
+ * <@, range_agg()) that inherently involve aggregation and multiple-row
+ * reasoning, so they stay on the SPI path.
+ */
+ if (riinfo->hasperiod)
+ return false;
+
+ return true;
+}
+
+/*
+ * ri_CheckPermissions
+ * Check that the current user has permissions to look into the schema of
+ * and SELECT from 'query_rel'
+ */
+static void
+ri_CheckPermissions(Relation query_rel)
+{
+ AclResult aclresult;
+
+ /* USAGE on schema. */
+ aclresult = object_aclcheck(NamespaceRelationId,
+ RelationGetNamespace(query_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(query_rel)));
+
+ /* SELECT on relation. */
+ aclresult = pg_class_aclcheck(RelationGetRelid(query_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(query_rel));
+}
+
+/*
+ * recheck_matched_pk_tuple
+ * After following an update chain (tmfd.traversed), verify that
+ * the locked PK tuple still matches the original search keys.
+ *
+ * A non-key update (e.g. changing a non-PK column) creates a new tuple version
+ * that we've now locked, but the key is unchanged -- that's fine. A key
+ * update means the value we were looking for is gone, so we should treat it as
+ * not found.
+ */
+static bool
+recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+ TupleTableSlot *new_slot)
+{
+ /*
+ * TODO: BuildIndexInfo does a syscache lookup + palloc on every call.
+ * This only fires on the concurrent-update path (tmfd.traversed), which
+ * should be rare, so the cost is acceptable for now. If profiling shows
+ * otherwise, cache the IndexInfo in FastPathMeta.
+ */
+ IndexInfo *indexInfo = BuildIndexInfo(idxrel);
+ Datum values[INDEX_MAX_KEYS];
+ bool isnull[INDEX_MAX_KEYS];
+ bool matched = true;
+
+ /* PK indexes never have these. */
+ Assert(indexInfo->ii_Expressions == NIL &&
+ indexInfo->ii_ExclusionOps == NULL);
+
+ /* Form the index values and isnull flags given the table tuple. */
+ FormIndexDatum(indexInfo, new_slot, NULL, values, isnull);
+ for (int i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
+ {
+ ScanKeyData *skey = &skeys[i];
+
+ /* A PK column can never be set to NULL. */
+ Assert(!isnull[i]);
+ if (!DatumGetBool(FunctionCall2Coll(&skey->sk_func,
+ skey->sk_collation,
+ values[i],
+ skey->sk_argument)))
+ {
+ matched = false;
+ break;
+ }
+ }
+
+ return matched;
+}
+
+/*
+ * build_index_scankeys
+ * Build ScanKeys for a direct index probe of the PK's unique index.
+ *
+ * Uses cached compare entries, operator procedures, and strategy numbers
+ * from ri_populate_fastpath_metadata() rather than looking them up on
+ * each invocation. Casts FK values to the operator's expected input
+ * type if needed.
+ */
+static void
+build_index_scankeys(const RI_ConstraintInfo *riinfo,
+ Relation idx_rel, Datum *pk_vals,
+ char *pk_nulls, ScanKey skeys)
+{
+ FastPathMeta *fpmeta = riinfo->fpmeta;
+
+ Assert(fpmeta);
+
+ /*
+ * May need to cast each of the individual values of the foreign key to
+ * the corresponding PK column's type if the equality operator demands it.
+ */
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ RI_CompareHashEntry *entry = fpmeta->compare_entries[i];
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+
+ /*
+ * Set up ScanKeys for the index scan. This is essentially how
+ * ExecIndexBuildScanKeys() sets them up.
+ */
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ int pkattrno = i + 1;
+
+ ScanKeyEntryInitialize(&skeys[i], 0, pkattrno,
+ fpmeta->strats[i], fpmeta->subtypes[i],
+ idx_rel->rd_indcollation[i], fpmeta->regops[i],
+ pk_vals[i]);
+ }
+}
+
+/*
+ * ri_populate_fastpath_metadata
+ * Cache per-key metadata needed by build_index_scankeys().
+ *
+ * Looks up the compare hash entry, operator procedure OID, and index
+ * strategy/subtype for each key column. Called lazily on first use
+ * and persists for the lifetime of the RI_ConstraintInfo entry.
+ */
+static void
+ri_populate_fastpath_metadata(RI_ConstraintInfo *riinfo,
+ Relation fk_rel, Relation idx_rel)
+{
+ FastPathMeta *fpmeta;
+ MemoryContext oldcxt = MemoryContextSwitchTo(TopMemoryContext);
+
+ Assert(riinfo != NULL && riinfo->valid);
+
+ fpmeta = palloc_object(FastPathMeta);
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ Oid eq_opr = riinfo->pf_eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ Oid lefttype;
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ fpmeta->compare_entries[i] = entry;
+ fpmeta->regops[i] = get_opcode(eq_opr);
+
+ get_op_opfamily_properties(eq_opr,
+ idx_rel->rd_opfamily[i],
+ false,
+ &fpmeta->strats[i],
+ &lefttype,
+ &fpmeta->subtypes[i]);
+ }
+
+ riinfo->fpmeta = fpmeta;
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* Extract fields from a tuple into Datum/nulls arrays
*/
@@ -3112,8 +3544,11 @@ ri_CompareWithCast(Oid eq_opr, Oid typeid, Oid collid,
/*
* ri_HashCompareOp -
*
- * See if we know how to compare two values, and create a new hash entry
- * if not.
+ * Look up or create a cache entry for the given equality operator and
+ * the caller's value type (typeid). The entry holds the operator's
+ * FmgrInfo and, if typeid doesn't match what the operator expects as
+ * its right-hand input, a cast function to coerce the value before
+ * comparison.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -3169,8 +3604,14 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * pf_eq_oprs (used by the fast path) can be cross-type when the
+ * FK and PK columns differ in type, e.g. int48eq for int4 PK /
+ * int8 FK. If the FK column's type already matches what the
+ * operator expects as its right-hand input, no cast is needed.
+ */
+ if (typeid == righttype)
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/test/isolation/expected/fk-concurrent-pk-upd.out b/src/test/isolation/expected/fk-concurrent-pk-upd.out
new file mode 100644
index 00000000000..4dd9535d3c0
--- /dev/null
+++ b/src/test/isolation/expected/fk-concurrent-pk-upd.out
@@ -0,0 +1,105 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2b s2ukey s1b s1i s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2c: COMMIT;
+step s1i: <... completed>
+ERROR: insert or update on table "child" violates foreign key constraint "child_parent_key_fkey"
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 2|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+(0 rows)
+
+
+starting permutation: s2b s2uaux s1b s1i s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2uaux: UPDATE parent SET aux = 'bar' WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1);
+step s2c: COMMIT;
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|bar
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
+
+starting permutation: s2b s2ukey s1b s1i s2ukey2 s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2ukey2: UPDATE parent SET parent_key = 1 WHERE parent_key = 2;
+step s2c: COMMIT;
+step s1i: <... completed>
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
+
+starting permutation: s2b s2ukey s3b s3i s2c s3c s2s s3s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s3b: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s3i: INSERT INTO child VALUES (2, 1); <waiting ...>
+step s2c: COMMIT;
+step s3i: <... completed>
+ERROR: could not serialize access due to concurrent update
+step s3c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 2|foo
+(1 row)
+
+step s3s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+(0 rows)
+
+
+starting permutation: s2b s2uaux s3b s3i s2c s3c s2s s3s
+step s2b: BEGIN;
+step s2uaux: UPDATE parent SET aux = 'bar' WHERE parent_key = 1;
+step s3b: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s3i: INSERT INTO child VALUES (2, 1);
+step s2c: COMMIT;
+step s3c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|bar
+(1 row)
+
+step s3s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 2| 1
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 4e466580cd4..c1a999bf1d2 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -37,6 +37,7 @@ test: fk-partitioned-2
test: fk-snapshot
test: fk-snapshot-2
test: fk-snapshot-3
+test: fk-concurrent-pk-upd
test: subxid-overflow
test: eval-plan-qual
test: eval-plan-qual-trigger
diff --git a/src/test/isolation/specs/fk-concurrent-pk-upd.spec b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
new file mode 100644
index 00000000000..03dc7f260cd
--- /dev/null
+++ b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
@@ -0,0 +1,53 @@
+# Tests that an INSERT on referencing table correctly fails when
+# the referenced value disappears due to a concurrent update
+setup
+{
+ CREATE TABLE parent (
+ parent_key int PRIMARY KEY,
+ aux text NOT NULL
+ );
+
+ CREATE TABLE child (
+ child_key int PRIMARY KEY,
+ parent_key int8 NOT NULL REFERENCES parent
+ );
+
+ INSERT INTO parent VALUES (1, 'foo');
+}
+
+teardown
+{
+ DROP TABLE parent, child;
+}
+
+session s1
+step s1b { BEGIN; }
+step s1i { INSERT INTO child VALUES (1, 1); }
+step s1c { COMMIT; }
+step s1s { SELECT * FROM child; }
+
+session s2
+step s2b { BEGIN; }
+step s2ukey { UPDATE parent SET parent_key = 2 WHERE parent_key = 1; }
+step s2uaux { UPDATE parent SET aux = 'bar' WHERE parent_key = 1; }
+step s2ukey2 { UPDATE parent SET parent_key = 1 WHERE parent_key = 2; }
+step s2c { COMMIT; }
+step s2s { SELECT * FROM parent; }
+
+session s3
+step s3b { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s3i { INSERT INTO child VALUES (2, 1); }
+step s3c { COMMIT; }
+step s3s { SELECT * FROM child; }
+
+# fail
+permutation s2b s2ukey s1b s1i s2c s1c s2s s1s
+# ok
+permutation s2b s2uaux s1b s1i s2c s1c s2s s1s
+# ok
+permutation s2b s2ukey s1b s1i s2ukey2 s2c s1c s2s s1s
+
+# RR: key update -> serialization failure
+permutation s2b s2ukey s3b s3i s2c s3c s2s s3s
+# RR: non-key update -> old version visible via transaction snapshot
+permutation s2b s2uaux s3b s3i s2c s3c s2s s3s
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 9ae4dbf1b0a..0826f518004 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -370,6 +370,53 @@ SELECT * FROM PKTABLE;
DROP TABLE FKTABLE;
DROP TABLE PKTABLE;
--
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+SET ROLE regress_foreign_key_user;
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+RESET ROLE;
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+ERROR: permission denied for table pktable
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
-- Check initial check upon ALTER TABLE
--
CREATE TABLE PKTABLE ( ptest1 int, ptest2 int, PRIMARY KEY(ptest1, ptest2) );
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index 3b8c95bf893..e9ee29331cb 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -242,6 +242,70 @@ SELECT * FROM PKTABLE;
DROP TABLE FKTABLE;
DROP TABLE PKTABLE;
+--
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+SET ROLE regress_foreign_key_user;
+
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+
+RESET ROLE;
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+
+DROP USER regress_foreign_key_user;
+
--
-- Check initial check upon ALTER TABLE
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0c07c945f05..c51a0a903a6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -815,6 +815,7 @@ ExtensionInfo
ExtensionLocation
ExtensionSiblingCache
ExtensionVersionInfo
+FastPathMeta
FDWCollateState
FD_SET
FILE
--
2.47.3
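As a rough illustration of the ri_HashCompareOp() change in the patch above: the old code asserted lefttype == righttype and compared typeid against lefttype, while the new code compares against righttype only, so a cross-type operator such as int48eq needs no cast when the FK value already has the right-hand type. The sketch below models just that decision. ToyType, ToyEqOp and toy_needs_cast() are hypothetical names that do not appear in the patch, and the real code compares type OIDs and goes on to look up a coercion pathway.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type tags; the real code compares type OIDs. */
typedef enum ToyType
{
	TOY_INT4,
	TOY_INT8
} ToyType;

/* Hypothetical operator descriptor: input types of "left = right". */
typedef struct ToyEqOp
{
	ToyType		lefttype;		/* PK side */
	ToyType		righttype;		/* FK side */
} ToyEqOp;

/*
 * Models the patched test: a cast is needed only when the value's type
 * differs from the operator's right-hand input type.  The left-hand type
 * is deliberately not consulted.
 */
static bool
toy_needs_cast(const ToyEqOp *op, ToyType value_type)
{
	assert(op != NULL);
	return value_type != op->righttype;
}
```

With an operator modeled as (int4, int8), an int8 value needs no cast. With a same-type (int4, int4) operator, an int8 value does.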
* Re: Eliminating SPI / SQL from some RI triggers - take 3
2024-12-20 04:23 Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 04:07 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 05:10 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Pavel Stehule <[email protected]>
2025-10-22 13:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-12-01 06:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-02 15:30 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-10 12:28 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-18 15:34 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-19 16:19 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-24 11:47 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
@ 2026-03-24 13:56 ` Amit Langote <[email protected]>
2026-03-25 00:41 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
0 siblings, 1 reply; 61+ messages in thread
From: Amit Langote @ 2026-03-24 13:56 UTC (permalink / raw)
To: Junwang Zhao <[email protected]>; +Cc: Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
On Tue, Mar 24, 2026 at 8:47 PM Amit Langote <[email protected]> wrote:
>
> Hi Junwang,
>
> On Fri, Mar 20, 2026 at 1:20 AM Junwang Zhao <[email protected]> wrote:
> > I squashed 0004 into 0003 so that each file can be committed independently.
> > I also ran pgindent for each file.
>
> Thanks for that.
>
> Here's another version.
>
> In 0001, I noticed that the condition change in ri_HashCompareOp could
> be simplified further. Also improved the commentary surrounding that.
> I also updated the commit message to clarify parity with the SPI path.
>
> Updated the commit message of 0002 to talk about why caching the
> snapshot for the entire trigger firing cycle of a given constraint
> makes a trade off compared to the SPI path which retakes the snapshot
> for every row checked and could in principle avoid failure for FK rows
> whose corresponding PK row was added by a concurrently committed
> transaction, at least in the READ COMMITTED case.
>
> Updated the commit message of 0003 to clarify that it replaces
> ri_FastPathCheckCached() from 0002 with the BatchAdd/BatchFlush pair,
> and that the cached resources are used unchanged -- only the probing
> cadence changes from per-row to per-flush. Per-flush CCI is safe
> because all AFTER triggers for the buffered rows have already fired
> by flush time; a new test case is added to show that.
I kept thinking about this on a walk after sending the above and came to
the conclusion that it might be better to just not cache the snapshot,
given that the above argument is the only one in its favor. If repeated
GetSnapshotData() is expensive, the solution should be to fix that
instead of simply side-stepping it.
With a snapshot taken per batch instead of cached, and likewise the
IndexScanDesc, I'm still seeing the same ~3x speedup in the batched
SK_SEARCHARRAY case, so I don't see much point in being stubborn about
snapshot caching. See the attached delta, which also includes an
unrelated fix for a memory context switch thinko. Note that the
relations (pk_rel, idx_rel) and the slots remain cached across batches;
only the snapshot and scandesc are taken fresh per flush.
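A toy model of the lifecycle described above may help. It is only a sketch, not the patch code: ToyEntry, toy_flush(), toy_batch_add() and TOY_BATCH_SIZE are hypothetical names, and plain counters stand in for the real relation, snapshot and index scan calls.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_BATCH_SIZE 4		/* hypothetical batch size for the sketch */

/* Hypothetical stand-in for RI_FastPathEntry; counters replace resources. */
typedef struct ToyEntry
{
	bool		rels_open;		/* models cached pk_rel, idx_rel and slots */
	int			rel_opens;		/* times the cached resources were set up */
	int			snapshots_taken;	/* models RegisterSnapshot() calls */
	int			snapshots_live; /* registered but not yet unregistered */
	int			scans_live;		/* scans begun but not yet ended */
	int			batch_count;	/* rows buffered since the last flush */
} ToyEntry;

/* Per-flush work: fresh snapshot and scan, both released before returning. */
static void
toy_flush(ToyEntry *e)
{
	if (e->batch_count == 0)
		return;					/* nothing buffered, nothing to acquire */

	assert(e->rels_open);		/* cached resources must already exist */

	e->snapshots_taken++;		/* models RegisterSnapshot() */
	e->snapshots_live++;
	e->scans_live++;			/* models index_beginscan() */

	/* ... probe the index for each of the batch_count buffered rows ... */

	e->scans_live--;			/* models index_endscan() */
	e->snapshots_live--;		/* models UnregisterSnapshot() */
	e->batch_count = 0;
}

/* Buffer one row; set up cached resources on first use, flush when full. */
static void
toy_batch_add(ToyEntry *e)
{
	if (!e->rels_open)
	{
		e->rels_open = true;
		e->rel_opens++;
	}
	if (++e->batch_count == TOY_BATCH_SIZE)
		toy_flush(e);
}
```

In this model, buffering ten rows and then flushing the remainder sets up the cached resources once and takes three snapshots, none of which outlives its flush.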
I'll post an updated version tomorrow morning. I think it might be
better to just merge 0003 into 0002, because without snapshot and
scandesc caching the standalone value of 0002 is mostly just relation
and slot caching -- the interesting parts (batch callbacks, lifecycle
management) are all scaffolding for the batching. So v10 will be two
patches: 0001 core fast path, 0002 everything else.
--
Thanks, Amit Langote
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 993c3ac49a3..f271ffccc00 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -224,10 +224,8 @@ typedef struct RI_FastPathEntry
Oid conoid; /* hash key: pg_constraint OID */
Relation pk_rel;
Relation idx_rel;
- IndexScanDesc scandesc;
TupleTableSlot *pk_slot;
TupleTableSlot *fk_slot;
- Snapshot snapshot; /* registered snapshot for the scan */
MemoryContext scan_cxt; /* index scan allocations */
MemoryContext flush_cxt; /* short-lived context for per-flush work */
@@ -301,9 +299,11 @@ static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
static void ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot);
static void ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
- const RI_ConstraintInfo *riinfo, Relation fk_rel);
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot);
static void ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
- const RI_ConstraintInfo *riinfo, Relation fk_rel);
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot);
static void ri_FastPathBatchFlush(RI_FastPathEntry *fpentry,
Relation fk_rel);
static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
@@ -2857,8 +2857,8 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
const RI_ConstraintInfo *riinfo = fpentry->riinfo;
Relation pk_rel = fpentry->pk_rel;
Relation idx_rel = fpentry->idx_rel;
- Snapshot snapshot = fpentry->snapshot;
TupleTableSlot *fk_slot = fpentry->fk_slot;
+ Snapshot snapshot;
Oid saved_userid;
int saved_sec_context;
MemoryContext oldcxt = CurrentMemoryContext;
@@ -2882,7 +2882,7 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
* context per row whereas we do it once around the whole batch.
*/
CommandCounterIncrement();
- snapshot->curcid = GetCurrentCommandId(false);
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
@@ -2891,11 +2891,12 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
SECURITY_NOFORCE_RLS);
if (riinfo->nkeys == 1)
- ri_FastPathFlushArray(fpentry, fk_slot, riinfo, fk_rel);
+ ri_FastPathFlushArray(fpentry, fk_slot, riinfo, fk_rel, snapshot);
else
- ri_FastPathFlushLoop(fpentry, fk_slot, riinfo, fk_rel);
+ ri_FastPathFlushLoop(fpentry, fk_slot, riinfo, fk_rel, snapshot);
MemoryContextSwitchTo(oldcxt);
SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ UnregisterSnapshot(snapshot);
/* Free materialized tuples and reset */
for (int i = 0; i < fpentry->batch_count; i++)
@@ -2912,29 +2913,30 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
*/
static void
ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
- const RI_ConstraintInfo *riinfo, Relation fk_rel)
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot)
{
Relation pk_rel = fpentry->pk_rel;
Relation idx_rel = fpentry->idx_rel;
- IndexScanDesc scandesc = fpentry->scandesc;
TupleTableSlot *pk_slot = fpentry->pk_slot;
- Snapshot snapshot = fpentry->snapshot;
+ IndexScanDesc scandesc;
Datum pk_vals[INDEX_MAX_KEYS];
char pk_nulls[INDEX_MAX_KEYS];
ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found = true;
+ /*
+ * build_index_scankeys() may palloc cast results for cross-type FKs.
+ * Use the entry's short-lived flush context so these don't accumulate
+ * across batches.
+ */
+ MemoryContext oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
+ scandesc = index_beginscan(pk_rel, idx_rel, snapshot, NULL,
+ riinfo->nkeys, 0);
for (int i = 0; i < fpentry->batch_count; i++)
{
- bool found = false;
ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
-
- /*
- * build_index_scankeys() may palloc cast results for cross-type FKs.
- * Use the entry's short-lived flush context so these don't accumulate
- * across batches.
- */
- MemoryContextSwitchTo(fpentry->flush_cxt);
ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
MemoryContextSwitchTo(fpentry->scan_cxt);
@@ -2943,11 +2945,15 @@ ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
snapshot, riinfo, skey, riinfo->nkeys);
if (!found)
- ri_ReportViolation(riinfo, pk_rel, fk_rel,
- fk_slot, NULL,
- RI_PLAN_CHECK_LOOKUPPK, false, false);
+ break;
}
+ index_endscan(scandesc);
MemoryContextReset(fpentry->flush_cxt);
+ MemoryContextSwitchTo(oldcxt);
+ if (!found)
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ fk_slot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
}
/*
@@ -2962,14 +2968,14 @@ ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
*/
static void
ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
- const RI_ConstraintInfo *riinfo, Relation fk_rel)
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot)
{
FastPathMeta *fpmeta = riinfo->fpmeta;
Relation pk_rel = fpentry->pk_rel;
Relation idx_rel = fpentry->idx_rel;
- IndexScanDesc scandesc = fpentry->scandesc;
TupleTableSlot *pk_slot = fpentry->pk_slot;
- Snapshot snapshot = fpentry->snapshot;
+ IndexScanDesc scandesc;
Datum search_vals[RI_FASTPATH_BATCH_SIZE];
bool matched[RI_FASTPATH_BATCH_SIZE];
int nvals = fpentry->batch_count;
@@ -2983,16 +2989,19 @@ ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
char elem_align;
ArrayType *arr;
- Assert(fpmeta);
-
- memset(matched, 0, nvals * sizeof(bool));
-
/*
* Transient per-flush allocations (cast results, the search array) must
* not accumulate across repeated flushes. Use the entry's short-lived
* flush context, reset after each flush.
*/
- MemoryContextSwitchTo(fpentry->flush_cxt);
+ MemoryContext oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
+
+ Assert(fpmeta);
+
+ memset(matched, 0, nvals * sizeof(bool));
+
+ scandesc = index_beginscan(pk_rel, idx_rel, snapshot, NULL,
+ riinfo->nkeys, 0);
/*
* Extract FK values, casting to the operator's expected input type if
@@ -3103,6 +3112,11 @@ ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
}
}
+ index_endscan(scandesc);
+
+ MemoryContextReset(fpentry->flush_cxt);
+ MemoryContextSwitchTo(oldcxt);
+
/* Report first unmatched row */
for (int i = 0; i < nvals; i++)
{
@@ -3114,8 +3128,6 @@ ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
RI_PLAN_CHECK_LOOKUPPK, false, false);
}
}
-
- MemoryContextReset(fpentry->flush_cxt);
}
/*
@@ -4106,9 +4118,6 @@ ri_FastPathTeardown(void)
hash_seq_init(&status, ri_fastpath_cache);
while ((entry = hash_seq_search(&status)) != NULL)
{
- /* Close both scans before closing idx_rel. */
- if (entry->scandesc)
- index_endscan(entry->scandesc);
if (entry->idx_rel)
index_close(entry->idx_rel, NoLock);
if (entry->pk_rel)
@@ -4117,8 +4126,6 @@ ri_FastPathTeardown(void)
ExecDropSingleTupleTableSlot(entry->pk_slot);
if (entry->fk_slot)
ExecDropSingleTupleTableSlot(entry->fk_slot);
- if (entry->snapshot)
- UnregisterSnapshot(entry->snapshot);
if (entry->scan_cxt)
MemoryContextDelete(entry->scan_cxt);
}
@@ -4232,21 +4239,10 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
entry->pk_rel = table_open(riinfo->pk_relid, RowShareLock);
entry->idx_rel = index_open(riinfo->conindid, AccessShareLock);
- /*
- * Register an initial snapshot. Its curcid will be patched in place
- * on each subsequent row (see ri_FastPathBatchFlush()), avoiding
- * per-row GetSnapshotData() overhead.
- */
- entry->snapshot = RegisterSnapshot(GetTransactionSnapshot());
-
entry->pk_slot = table_slot_create(entry->pk_rel, NULL);
entry->fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
&TTSOpsHeapTuple);
- entry->scandesc = index_beginscan(entry->pk_rel, entry->idx_rel,
- entry->snapshot, NULL,
- riinfo->nkeys, 0);
-
entry->scan_cxt = AllocSetContextCreate(TopTransactionContext,
"RI fast path scan context",
ALLOCSET_DEFAULT_SIZES);
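In the delta above, ri_FastPathFlushLoop() no longer calls ri_ReportViolation() from inside the probe loop. It breaks out at the first miss, ends the scan, resets the flush context, and only then reports. A minimal sketch of that control flow follows, assuming a sorted integer array in place of the PK index and a return value in place of the error report. ToyScan, cmp_int() and toy_flush_loop() are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an IndexScanDesc: only tracks open/closed. */
typedef struct ToyScan
{
	bool		open;
} ToyScan;

/* Three-way integer comparison for bsearch(). */
static int
cmp_int(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

/*
 * Probe each FK value against the sorted pk_keys array.  Returns the index
 * of the first FK value with no match, or -1 if all matched.  The scan is
 * always closed before returning, so the caller can report the violation
 * after cleanup has happened.
 */
static int
toy_flush_loop(const int *pk_keys, size_t npk,
			   const int *fk_vals, int nfk, ToyScan *scan)
{
	int			first_missing = -1;

	assert(scan != NULL);
	scan->open = true;			/* models index_beginscan() */

	for (int i = 0; i < nfk; i++)
	{
		bool		found = bsearch(&fk_vals[i], pk_keys, npk,
									sizeof(int), cmp_int) != NULL;

		if (!found)
		{
			first_missing = i;
			break;				/* defer the report until after cleanup */
		}
	}

	scan->open = false;			/* models index_endscan() */
	return first_missing;
}
```

The caller checks the return value and raises the error itself, which mirrors how the delta places the report after index_endscan() and MemoryContextReset().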
Attachments:
[text/plain] v9_delta.patch.txt (8.4K, 2-v9_delta.patch.txt)
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Amit Langote @ 2026-03-25 00:41 UTC (permalink / raw)
To: Junwang Zhao <[email protected]>; +Cc: Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
On Tue, Mar 24, 2026 at 10:56 PM Amit Langote <[email protected]> wrote:
> On Tue, Mar 24, 2026 at 8:47 PM Amit Langote <[email protected]> wrote:
> >
> > Hi Junwang,
> >
> > On Fri, Mar 20, 2026 at 1:20 AM Junwang Zhao <[email protected]> wrote:
> > > I squashed 0004 into 0003 so that each file can be committed independently.
> > > I also ran pgindent on each file.
> >
> > Thanks for that.
> >
> > Here's another version.
> >
> > In 0001, I noticed that the condition change in ri_HashCompareOp could
> > be simplified further. Also improved the commentary surrounding that.
> > I also updated the commit message to clarify parity with the SPI path.
> >
> > Updated the commit message of 0002 to explain the trade-off in
> > caching the snapshot for the entire trigger-firing cycle of a given
> > constraint: the SPI path retakes the snapshot for every row checked
> > and so could in principle avoid failure for FK rows whose
> > corresponding PK row was added by a concurrently committed
> > transaction, at least in the READ COMMITTED case.
> >
> > Updated the commit message of 0003 to clarify that it replaces
> > ri_FastPathCheckCached() from 0002 with the BatchAdd/BatchFlush pair,
> > and that the cached resources are used unchanged -- only the probing
> > cadence changes from per-row to per-flush. Per-flush CCI is safe
> > because all AFTER triggers for the buffered rows have already fired
> > by flush time; a new test case is added to show that.
>
> Kept thinking about this on a walk after I sent this and came to the
> conclusion that it might be better to just not cache the snapshot with
> only the above argument in its favor. If repeated GetSnapshotData()
> is expensive, the solution should be to fix that instead of simply
> side-stepping it.
>
> By taking a snapshot per batch without caching it, and likewise the
> IndexScanDesc, I'm seeing the same ~3x speedup in the batched
> SK_SEARCHARRAY case, so I don't see much point in being very stubborn
> about snapshot caching. See the attached, which also includes an
> unrelated fix for a memory context switch thinko. Note that relations
> (pk_rel, idx_rel) and the slot remain cached across the batch; only the
> snapshot and scandesc are taken fresh per flush.
>
> I'll post an updated version tomorrow morning. I think it might be
> better to just merge 0003 into 0002, because without snapshot and
> scandesc caching the standalone value of 0002 is mostly just relation
> and slot caching -- the interesting parts (batch callbacks, lifecycle
> management) are all scaffolding for the batching. So v10 will be two
> patches: 0001 core fast path, 0002 everything else.
And here's a set like that. I noticed that we don't need a dedicated
scan_cxt now that the scandesc is not cached, and I made a few other
simplifications.
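
The buffering cadence described above can be modelled in a few lines of
standalone C. This is only a toy sketch, not code from the patch:
BatchState, batch_add(), batch_flush() and snapshots_for() are
hypothetical names, and bumping a counter stands in for taking a fresh
snapshot and scandesc in each flush. It assumes the 64-row buffer size
that the patch defines as RI_FASTPATH_BATCH_SIZE.

```c
#include <stddef.h>

#define BATCH_SIZE 64           /* mirrors RI_FASTPATH_BATCH_SIZE in the patch */

/* Hypothetical stand-in for the per-constraint cache entry. */
typedef struct BatchState
{
    int         rows[BATCH_SIZE];   /* buffered FK values */
    int         count;              /* rows currently buffered */
    int         snapshots_taken;    /* one per flush, never cached */
    int         rows_probed;        /* total rows handed to a flush */
} BatchState;

/* Take a fresh "snapshot", probe every buffered row, then empty the buffer. */
void
batch_flush(BatchState *st)
{
    if (st->count == 0)
        return;                 /* nothing buffered, so no snapshot needed */
    st->snapshots_taken++;      /* stands in for taking a new snapshot */
    st->rows_probed += st->count;
    st->count = 0;
}

/* Buffer one row; flush as soon as the buffer is full. */
void
batch_add(BatchState *st, int fk_value)
{
    st->rows[st->count++] = fk_value;
    if (st->count >= BATCH_SIZE)
        batch_flush(st);
}

/* Feed nrows rows through the buffer, then flush the partial tail. */
int
snapshots_for(int nrows)
{
    BatchState  st = {0};

    for (int i = 0; i < nrows; i++)
        batch_add(&st, i);
    batch_flush(&st);           /* models the end of the trigger-firing cycle */
    return st.snapshots_taken;
}
```

In this model, 200 buffered rows cost four snapshots (three full batches
plus one partial batch), where a per-row scheme would take 200.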
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v10-0002-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch (42.3K, 2-v10-0002-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch)
From 9332a7086f8563da1aef56c458bfec9e60164b1b Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Mar 2026 09:37:04 +0900
Subject: [PATCH v10 2/2] Batch FK rows and use SK_SEARCHARRAY for fast-path
probes
Instead of probing the PK index on each trigger invocation, buffer
FK rows in the per-constraint cache entry (RI_FastPathEntry) and
flush them as a batch. When the buffer fills (64 rows) or the
trigger-firing cycle ends, ri_FastPathBatchFlush() probes the index
for all buffered rows, sharing a single CommandCounterIncrement and
security context switch across the batch. Per-flush CCI is safe
because all AFTER triggers for the buffered rows have already fired
by flush time.
For single-column foreign keys, the flush builds an ArrayType from
the buffered FK values (casting to the PK-side type if needed) and
constructs a scan key with the SK_SEARCHARRAY flag. The index AM
sorts and deduplicates the array internally, then walks matching
leaf pages in one ordered traversal instead of descending from the
root once per row. A matched[] bitmap tracks which batch items
were satisfied; unmatched items are reported as violations.
Multi-column foreign keys fall back to a per-row probe loop via
ri_FastPathFlushLoop().
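
The single-column flush described above can be approximated outside the
backend as follows. This is a rough sketch under simplifying assumptions,
not the patch's implementation: the PK index is modelled as a sorted int
array, the sort and deduplication that the index AM performs internally
are done by hand here, and first_violation() and cmp_int() are
hypothetical names.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define BATCH_SIZE 64           /* same bound as the patch's batch buffer */

/* qsort comparator for ints that avoids overflow from subtraction. */
int
cmp_int(const void *a, const void *b)
{
    int         x = *(const int *) a;
    int         y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Returns the position in batch[] of the first value that has no match in
 * the sorted array pk_index[], -1 if every value matched, or -2 if nvals
 * is outside 0..BATCH_SIZE.
 */
int
first_violation(const int *pk_index, int npk, const int *batch, int nvals)
{
    int         keys[BATCH_SIZE];
    bool        matched[BATCH_SIZE];
    int         nkeys = 0;
    int         p = 0;

    if (nvals < 0 || nvals > BATCH_SIZE)
        return -2;
    if (nvals == 0)
        return -1;

    memset(matched, 0, sizeof(matched));

    /* Sort a copy of the search values and drop duplicates. */
    memcpy(keys, batch, nvals * sizeof(int));
    qsort(keys, nvals, sizeof(int), cmp_int);
    for (int i = 0; i < nvals; i++)
    {
        if (nkeys == 0 || keys[i] != keys[nkeys - 1])
            keys[nkeys++] = keys[i];
    }

    /* One ordered pass over the "index"; p never moves backwards. */
    for (int k = 0; k < nkeys; k++)
    {
        while (p < npk && pk_index[p] < keys[k])
            p++;
        if (p < npk && pk_index[p] == keys[k])
        {
            /* Linear scan: mark every batch item equal to this PK value. */
            for (int i = 0; i < nvals; i++)
            {
                if (batch[i] == keys[k])
                    matched[i] = true;
            }
        }
    }

    /* The first unmatched batch item is the one to report. */
    for (int i = 0; i < nvals; i++)
    {
        if (!matched[i])
            return i;
    }
    return -1;
}
```

Duplicate FK values in a batch are all marked by the same index match,
which is why the marking step scans the whole batch rather than stopping
at the first hit.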
FK tuples are materialized via ExecCopySlotHeapTuple() into
TopTransactionContext so they survive across trigger invocations.
Violations are reported immediately during the flush via
ri_ReportViolation(), which does not return.
The PK relation, index, tuple slots, and fast-path metadata are
cached in RI_FastPathEntry across trigger invocations within a
trigger-firing batch, avoiding repeated open/close overhead. The
snapshot and IndexScanDesc are taken fresh per flush.
ri_FastPathEndBatch() flushes any partial batch before tearing down
cached resources. Since the FK relation may already be closed by
flush time (e.g. for deferred constraints at COMMIT), it reopens the
relation using entry->riinfo->fk_relid if needed.
The non-cached path (ALTER TABLE validation) bypasses batching and
continues to call ri_FastPathCheck() directly per row.
Lifecycle management:
- AfterTriggerBatchCallback: A new general-purpose callback
mechanism in trigger.c. Callbacks registered via
RegisterAfterTriggerBatchCallback() fire at the end of each
trigger-firing batch (AfterTriggerEndQuery for immediate
constraints, AfterTriggerFireDeferred at COMMIT, and
AfterTriggerSetState for SET CONSTRAINTS IMMEDIATE). The RI code
registers ri_FastPathEndBatch as a batch callback.
- Batch callbacks only fire at the outermost query level
(checked inside FireAfterTriggerBatchCallbacks), so nested
queries from SPI inside other AFTER triggers do not tear down
the cache mid-batch.
- XactCallback: ri_FastPathXactCallback NULLs the static cache
pointer at transaction end, handling the abort path where the
batch callback never fired.
- SubXactCallback: ri_FastPathSubXactCallback NULLs the static
cache pointer on subtransaction abort, preventing the batch
callback from accessing already-released resources.
- AfterTriggerBatchIsActive(): Exported accessor that returns true
when afterTriggers.query_depth >= 0. During ALTER TABLE ... ADD
FOREIGN KEY validation, RI triggers are called directly outside
the after-trigger framework, so batch callbacks would never fire.
The fast-path code uses this to fall back to the non-cached
per-invocation path in that context.
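
The register, fire, and clear cycle listed above can be illustrated with
a small standalone model. It is a sketch only: the fixed-size array,
register_batch_callback(), fire_batch_callbacks(), count_call() and
calls_when_fired_at() are hypothetical names, and the patch itself keeps
the items in a List allocated in TopTransactionContext. The one rule
taken directly from the text is that callbacks are skipped while the
query depth is greater than zero.

```c
#include <stddef.h>

#define MAX_CALLBACKS 8         /* arbitrary bound for this sketch */

typedef void (*BatchCallback) (void *arg);

typedef struct CallbackItem
{
    BatchCallback callback;
    void       *arg;
} CallbackItem;

static CallbackItem callbacks[MAX_CALLBACKS];
static int  ncallbacks = 0;
static int  query_depth = -1;   /* -1 models "not inside any query" */

/* Append a callback; returns 0 on success, -1 if the list is full. */
int
register_batch_callback(BatchCallback cb, void *arg)
{
    if (ncallbacks >= MAX_CALLBACKS)
        return -1;
    callbacks[ncallbacks].callback = cb;
    callbacks[ncallbacks].arg = arg;
    ncallbacks++;
    return 0;
}

/* Invoke and clear all callbacks, but only at the outermost level. */
void
fire_batch_callbacks(void)
{
    if (query_depth > 0)
        return;                 /* nested query: leave the list alone */
    for (int i = 0; i < ncallbacks; i++)
        callbacks[i].callback(callbacks[i].arg);
    ncallbacks = 0;             /* callers must re-register per batch */
}

/* Example callback: bump the int counter that arg points to. */
void
count_call(void *arg)
{
    (*(int *) arg)++;
}

/* Register one callback, fire twice at the given depth, report the calls. */
int
calls_when_fired_at(int depth)
{
    int         calls = 0;

    ncallbacks = 0;
    register_batch_callback(count_call, &calls);
    query_depth = depth;
    fire_batch_callbacks();     /* skipped entirely when depth > 0 */
    fire_batch_callbacks();     /* list already cleared, so no second call */
    ncallbacks = 0;             /* drop any leftover pointer to 'calls' */
    return calls;
}
```

Clearing the list after each firing is what forces a caller to
re-register for every new batch.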
A purpose-specific memory context (flush_cxt), child of
TopTransactionContext, is used for per-flush transient work: cast
results, the search array, and index scan allocations. Reset after
each flush; deleted in teardown.
Together with <commit-hash-0001>, bulk FK inserts are ~2.9x faster
(int PK / int FK, 1M rows, PK table and index cached in memory).
Author: Amit Langote <[email protected]>
Co-authored-by: Junwang Zhao <[email protected]>
Reviewed-by: Haibo Yan <[email protected]>
Tested-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
---
src/backend/commands/trigger.c | 90 ++++
src/backend/utils/adt/ri_triggers.c | 581 +++++++++++++++++++++-
src/include/commands/trigger.h | 18 +
src/test/regress/expected/foreign_key.out | 126 +++++
src/test/regress/sql/foreign_key.sql | 118 +++++
src/tools/pgindent/typedefs.list | 3 +
6 files changed, 927 insertions(+), 9 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 6596843a8d8..b7442cf6cb1 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3893,6 +3893,8 @@ typedef struct AfterTriggersData
/* per-subtransaction-level data: */
AfterTriggersTransData *trans_stack; /* array of structs shown below */
int maxtransdepth; /* allocated len of above array */
+
+ List *batch_callbacks; /* List of AfterTriggerCallbackItem */
} AfterTriggersData;
struct AfterTriggersQueryData
@@ -3929,6 +3931,13 @@ struct AfterTriggersTableData
TupleTableSlot *storeslot; /* for converting to tuplestore's format */
};
+/* Entry in afterTriggers.batch_callbacks */
+typedef struct AfterTriggerCallbackItem
+{
+ AfterTriggerBatchCallback callback;
+ void *arg;
+} AfterTriggerCallbackItem;
+
static AfterTriggersData afterTriggers;
static void AfterTriggerExecute(EState *estate,
@@ -3964,6 +3973,7 @@ static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
Oid tgoid, bool tgisdeferred);
static void cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent);
+static void FireAfterTriggerBatchCallbacks(void);
/*
* Get the FDW tuplestore for the current trigger query level, creating it
@@ -5089,6 +5099,7 @@ AfterTriggerBeginXact(void)
*/
afterTriggers.firing_counter = (CommandId) 1; /* mustn't be 0 */
afterTriggers.query_depth = -1;
+ afterTriggers.batch_callbacks = NIL;
/*
* Verify that there is no leftover state remaining. If these assertions
@@ -5210,6 +5221,8 @@ AfterTriggerEndQuery(EState *estate)
break;
}
+ FireAfterTriggerBatchCallbacks();
+
/* Release query-level-local storage, including tuplestores if any */
AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
@@ -5317,6 +5330,8 @@ AfterTriggerFireDeferred(void)
break; /* all fired */
}
+ FireAfterTriggerBatchCallbacks();
+
/*
* We don't bother freeing the event list, since it will go away anyway
* (and more efficiently than via pfree) in AfterTriggerEndXact.
@@ -6059,6 +6074,8 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
break; /* all fired */
}
+ FireAfterTriggerBatchCallbacks();
+
if (snapshot_set)
PopActiveSnapshot();
}
@@ -6755,3 +6772,76 @@ check_modified_virtual_generated(TupleDesc tupdesc, HeapTuple tuple)
return tuple;
}
+
+/*
+ * RegisterAfterTriggerBatchCallback
+ * Register a function to be called when the current trigger-firing
+ * batch completes.
+ *
+ * Must be called from within a trigger function's execution context
+ * (i.e., while afterTriggers state is active).
+ *
+ * The callback list is cleared after invocation, so the caller must
+ * re-register for each new batch if needed.
+ */
+void
+RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg)
+{
+ AfterTriggerCallbackItem *item;
+ MemoryContext oldcxt;
+
+ /*
+ * Allocate in TopTransactionContext so the item survives for the duration
+ * of the batch, which may span multiple trigger invocations.
+ */
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ item = palloc(sizeof(AfterTriggerCallbackItem));
+ item->callback = callback;
+ item->arg = arg;
+ afterTriggers.batch_callbacks =
+ lappend(afterTriggers.batch_callbacks, item);
+ MemoryContextSwitchTo(oldcxt);
+}
+
+/*
+ * FireAfterTriggerBatchCallbacks
+ * Invoke and clear all registered batch callbacks.
+ *
+ * Only fires at the outermost query level (query_depth == 0) or from
+ * top-level operations (query_depth == -1, e.g. AfterTriggerFireDeferred
+ * at COMMIT). Nested queries from SPI inside AFTER triggers run at
+ * depth > 0 and must not tear down resources the outer batch still needs.
+ */
+static void
+FireAfterTriggerBatchCallbacks(void)
+{
+ ListCell *lc;
+
+ if (afterTriggers.query_depth > 0)
+ return;
+
+ foreach(lc, afterTriggers.batch_callbacks)
+ {
+ AfterTriggerCallbackItem *item = lfirst(lc);
+
+ item->callback(item->arg);
+ }
+
+ list_free_deep(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = NIL;
+}
+
+/*
+ * AfterTriggerBatchIsActive
+ * Returns true if we're inside a query-level trigger batch where
+ * registered batch callbacks will actually be invoked.
+ *
+ * This is false during validateForeignKeyConstraint(), which calls
+ * RI trigger functions directly outside the after-trigger framework.
+ */
+bool
+AfterTriggerBatchIsActive(void)
+{
+ return afterTriggers.query_depth >= 0;
+}
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 6d8de64471f..689c8c08a78 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -196,6 +196,44 @@ typedef struct RI_CompareHashEntry
FmgrInfo cast_func_finfo; /* in case we must coerce input */
} RI_CompareHashEntry;
+/*
+ * Maximum number of FK rows buffered before flushing.
+ *
+ * Larger batches amortize per-flush overhead and let the SK_SEARCHARRAY
+ * path walk more leaf pages in a single sorted traversal. But each
+ * buffered row is a materialized HeapTuple in TopTransactionContext,
+ * and the matched[] scan in ri_FastPathFlushArray() is O(batch_size)
+ * per index match. Benchmarking showed little difference between 16
+ * and 64, with 256 consistently slower. 64 is a reasonable default.
+ */
+#define RI_FASTPATH_BATCH_SIZE 64
+
+/*
+ * RI_FastPathEntry
+ * Per-constraint cache of resources needed by ri_FastPathBatchFlush().
+ *
+ * One entry per constraint, keyed by pg_constraint OID. Created lazily
+ * by ri_FastPathGetEntry() on first use within a trigger-firing batch
+ * and torn down by ri_FastPathTeardown() at batch end.
+ *
+ * FK tuples are buffered in batch[] across trigger invocations and
+ * flushed when the buffer fills or the batch ends.
+ */
+typedef struct RI_FastPathEntry
+{
+ Oid conoid; /* hash key: pg_constraint OID */
+ Relation pk_rel;
+ Relation idx_rel;
+ TupleTableSlot *pk_slot;
+ TupleTableSlot *fk_slot;
+ MemoryContext flush_cxt; /* short-lived context for per-flush work */
+
+ HeapTuple batch[RI_FASTPATH_BATCH_SIZE];
+ int batch_count;
+
+ /* For ri_FastPathEndBatch() */
+ const RI_ConstraintInfo *riinfo;
+} RI_FastPathEntry;
/*
* Local data
@@ -205,6 +243,8 @@ static HTAB *ri_query_cache = NULL;
static HTAB *ri_compare_cache = NULL;
static dclist_head ri_constraint_cache_valid_list;
+static HTAB *ri_fastpath_cache = NULL;
+static bool ri_fastpath_callback_registered = false;
/*
* Local function prototypes
@@ -255,6 +295,16 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
bool detectNewRows, int expect_OK);
static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc);
+static void ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc);
+static void ri_FastPathBatchFlush(RI_FastPathEntry *fpentry,
+ Relation fk_rel);
static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
IndexScanDesc scandesc, TupleTableSlot *slot,
Snapshot snapshot, const RI_ConstraintInfo *riinfo,
@@ -277,7 +327,10 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool is_restrict, bool partgone);
-
+static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel);
+static void ri_FastPathEndBatch(void *arg);
+static void ri_FastPathTeardown(void);
/*
* RI_FKey_check -
@@ -387,12 +440,22 @@ RI_FKey_check(TriggerData *trigdata)
* lock. This is semantically equivalent to the SPI path below but avoids
* the per-row executor overhead.
*
- * ri_FastPathCheck() reports the violation itself (via ereport) if no
- * matching PK row is found, so it only returns on success.
+ * ri_FastPathBatchAdd() and ri_FastPathCheck() report the violation
+ * themselves if no matching PK row is found, so they only return on
+ * success.
*/
if (ri_fastpath_is_applicable(riinfo))
{
- ri_FastPathCheck(riinfo, fk_rel, newslot);
+ if (AfterTriggerBatchIsActive())
+ {
+ /* Batched path: buffer and probe in groups */
+ ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
+ }
+ else
+ {
+ /* ALTER TABLE validation: per-row, no cache */
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
+ }
return PointerGetDatum(NULL);
}
@@ -2675,10 +2738,14 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/*
* ri_FastPathCheck
- * Perform FK existence check via direct index probe, bypassing SPI.
+ * Perform a per-row FK existence check via direct index probe,
+ * bypassing SPI.
*
* If no matching PK row exists, report the violation via ri_ReportViolation(),
* otherwise, the function returns normally.
+ *
+ * Note: This is only used by the ALTER TABLE validation path. Other paths use
+ * ri_FastPathBatchAdd().
*/
static void
ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
@@ -2742,6 +2809,305 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
table_close(pk_rel, NoLock);
}
+/*
+ * ri_FastPathBatchAdd
+ * Buffer a FK row for batched probing.
+ *
+ * Adds the row to the batch buffer. When the buffer is full, flushes all
+ * buffered rows by probing the PK index. Any violation is reported
+ * immediately during the flush via ri_ReportViolation (which does not return).
+ *
+ * Uses the per-batch cache (RI_FastPathEntry) to avoid per-row relation
+ * open/close, slot creation, etc.
+ *
+ * The batch is also flushed at end of trigger-firing cycle via
+ * ri_FastPathEndBatch().
+ */
+static void
+ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
+ MemoryContext oldcxt;
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ fpentry->batch[fpentry->batch_count] =
+ ExecCopySlotHeapTuple(newslot);
+ fpentry->batch_count++;
+ MemoryContextSwitchTo(oldcxt);
+
+ if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
+ ri_FastPathBatchFlush(fpentry, fk_rel);
+}
+
+/*
+ * ri_FastPathBatchFlush
+ * Flush all buffered FK rows by probing the PK index.
+ *
+ * Dispatches to ri_FastPathFlushArray() for single-column FKs
+ * (using SK_SEARCHARRAY) or ri_FastPathFlushLoop() for multi-column
+ * FKs (per-row probing). Violations are reported immediately via
+ * ri_ReportViolation(), which does not return.
+ */
+static void
+ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
+{
+ const RI_ConstraintInfo *riinfo = fpentry->riinfo;
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *fk_slot = fpentry->fk_slot;
+ Snapshot snapshot;
+ IndexScanDesc scandesc;
+ Oid saved_userid;
+ int saved_sec_context;
+ MemoryContext oldcxt;
+
+ if (fpentry->batch_count == 0)
+ return;
+
+ if (riinfo->fpmeta == NULL)
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ Assert(riinfo->fpmeta);
+
+ /*
+ * CCI and security context switch are done once for the entire batch.
+ * Per-row CCI is unnecessary because by the time a flush runs, all AFTER
+ * triggers for the buffered rows have already fired (trigger invocations
+ * strictly alternate per row), so a single CCI advances past all their
+ * effects. Per-row security context switch is unnecessary because each
+ * row's probe runs entirely as the PK table owner, same as the SPI path
+ * -- the only difference is that the SPI path sets and restores the
+ * context per row whereas we do it once around the whole batch.
+ */
+ CommandCounterIncrement();
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+ /*
+ * build_index_scankeys() may palloc cast results for cross-type FKs.
+ * Use the entry's short-lived flush context so these don't accumulate
+ * across batches.
+ */
+ oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
+
+ scandesc = index_beginscan(pk_rel, idx_rel, snapshot, NULL,
+ riinfo->nkeys, 0);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ if (riinfo->nkeys == 1)
+ ri_FastPathFlushArray(fpentry, fk_slot, riinfo, fk_rel, snapshot,
+ scandesc);
+ else
+ ri_FastPathFlushLoop(fpentry, fk_slot, riinfo, fk_rel, snapshot,
+ scandesc);
+
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ UnregisterSnapshot(snapshot);
+ index_endscan(scandesc);
+ MemoryContextReset(fpentry->flush_cxt);
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Free materialized tuples and reset */
+ for (int i = 0; i < fpentry->batch_count; i++)
+ heap_freetuple(fpentry->batch[i]);
+ fpentry->batch_count = 0;
+}
+
+/*
+ * ri_FastPathFlushLoop
+ * Multi-column fallback: probe the index once per buffered row.
+ *
+ * Used for composite foreign keys where SK_SEARCHARRAY does not
+ * apply.
+ */
+static void
+ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc)
+{
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found = true;
+
+ for (int i = 0; i < fpentry->batch_count; i++)
+ {
+
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, pk_slot,
+ snapshot, riinfo, skey, riinfo->nkeys);
+
+ if (!found)
+ break;
+ }
+
+ /* fk_slot contains the tuple that failed. */
+ if (!found)
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ fk_slot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+}
+
+/*
+ * ri_FastPathFlushArray
+ * Single-column fast path using SK_SEARCHARRAY.
+ *
+ * Builds an array of FK values and does one index scan with
+ * SK_SEARCHARRAY. The index AM sorts and deduplicates the array
+ * internally, then walks matching leaf pages in order. Each
+ * matched PK tuple is locked and rechecked as before; a matched[]
+ * bitmap tracks which batch items were satisfied.
+ */
+static void
+ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc)
+{
+ FastPathMeta *fpmeta = riinfo->fpmeta;
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Datum search_vals[RI_FASTPATH_BATCH_SIZE];
+ bool matched[RI_FASTPATH_BATCH_SIZE];
+ int nvals = fpentry->batch_count;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[1];
+ RI_CompareHashEntry *entry;
+ Oid elem_type;
+ int16 elem_len;
+ bool elem_byval;
+ char elem_align;
+ ArrayType *arr;
+
+ Assert(fpmeta);
+
+ memset(matched, 0, nvals * sizeof(bool));
+
+ /*
+ * Extract FK values, casting to the operator's expected input type if
+ * needed (e.g. int8 FK -> int4 for int48eq).
+ */
+ entry = fpmeta->compare_entries[0];
+ for (int i = 0; i < nvals; i++)
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+
+ /* Cast if needed (e.g. int8 FK -> numeric PK) */
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ search_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[0],
+ Int32GetDatum(-1),
+ BoolGetDatum(false));
+ else
+ search_vals[i] = pk_vals[0];
+ }
+
+ /*
+ * Array element type must match the operator's right-hand input type,
+ * which is what the index comparison expects on the search side.
+ * ri_populate_fastpath_metadata() stores exactly this via
+ * get_op_opfamily_properties(), which returns the operator's right-hand
+ * type as the subtype for cross-type operators (e.g. int8 for int48eq)
+ * and the common type for same-type operators.
+ */
+ elem_type = fpmeta->subtypes[0];
+ Assert(OidIsValid(elem_type));
+ get_typlenbyvalalign(elem_type, &elem_len, &elem_byval, &elem_align);
+
+ arr = construct_array(search_vals, nvals,
+ elem_type, elem_len, elem_byval, elem_align);
+
+ /*
+ * Build scan key with SK_SEARCHARRAY. The index AM code will internally
+ * sort and deduplicate, then walk leaf pages in order.
+ */
+ ScanKeyEntryInitialize(&skey[0],
+ SK_SEARCHARRAY,
+ 1, /* attno */
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ PointerGetDatum(arr));
+
+ index_rescan(scandesc, skey, 1, NULL, 0);
+
+ /*
+ * Walk all matches. The index AM returns them in index order. For each
+ * match, find which batch item(s) it satisfies.
+ */
+ while (index_getnext_slot(scandesc, ForwardScanDirection, pk_slot))
+ {
+ Datum found_val;
+ bool found_null;
+ bool concurrently_updated;
+ ScanKeyData recheck_skey[1];
+
+ if (!ri_LockPKTuple(pk_rel, pk_slot, snapshot, &concurrently_updated))
+ continue;
+
+ /* Extract the PK value from the matched and locked tuple */
+ found_val = slot_getattr(pk_slot, riinfo->pk_attnums[0], &found_null);
+ Assert(!found_null);
+
+ if (concurrently_updated)
+ {
+ /*
+ * Build a single-key scankey for recheck. We need the actual PK
+ * value that was found, not the FK search value.
+ */
+ ScanKeyEntryInitialize(&recheck_skey[0], 0, 1,
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ found_val);
+ if (!recheck_matched_pk_tuple(idx_rel, recheck_skey, pk_slot))
+ continue;
+ }
+
+ /*
+ * Linear scan to mark all batch items matching this PK value.
+ * O(batch_size) per match, O(batch_size^2) worst case -- fine for the
+ * current batch size of 64.
+ */
+ for (int i = 0; i < nvals; i++)
+ {
+ if (!matched[i] &&
+ DatumGetBool(FunctionCall2Coll(&entry->eq_opr_finfo,
+ idx_rel->rd_indcollation[0],
+ found_val,
+ search_vals[i])))
+ matched[i] = true;
+ }
+ }
+
+ /* Report first unmatched row */
+ for (int i = 0; i < nvals; i++)
+ {
+ if (!matched[i])
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ fk_slot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
+ }
+}
+
/*
* ri_FastPathProbeOne
* Probe the PK index for one set of scan keys, lock the matching
@@ -2749,10 +3115,6 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
*
* Returns true if a matching PK row was found, locked, and (if
* applicable) visible to the transaction snapshot.
- *
- * The caller must ensure CurrentMemoryContext is long-lived enough
- * for the scan descriptor's internal allocations (typically
- * TopTransactionContext when using a cached scandesc).
*/
static bool
ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
@@ -3673,3 +4035,204 @@ RI_FKey_trigger_type(Oid tgfoid)
return RI_TRIGGER_NONE;
}
+
+/*
+ * ri_FastPathEndBatch
+ * Flush remaining rows and tear down cached state.
+ *
+ * Registered as an AfterTriggerBatchCallback. Note: the flush can
+ * do real work (CCI, security context switch, index probes) and can
+ * throw ERROR on a constraint violation. If that happens,
+ * ri_FastPathTeardown never runs; ResourceOwner + XactCallback
+ * handle resource cleanup on the abort path.
+ */
+static void
+ri_FastPathEndBatch(void *arg)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ /* Flush any partial batches -- can throw ERROR */
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->batch_count > 0)
+ {
+ Relation fk_rel = table_open(entry->riinfo->fk_relid,
+ AccessShareLock);
+
+ ri_FastPathBatchFlush(entry, fk_rel);
+ table_close(fk_rel, NoLock);
+ }
+ }
+
+ /* Orderly teardown */
+ ri_FastPathTeardown();
+}
+
+/*
+ * ri_FastPathTeardown
+ * Tear down all cached fast-path state.
+ *
+ * Called from ri_FastPathEndBatch() after flushing any remaining rows.
+ */
+static void
+ri_FastPathTeardown(void)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->idx_rel)
+ index_close(entry->idx_rel, NoLock);
+ if (entry->pk_rel)
+ table_close(entry->pk_rel, NoLock);
+ if (entry->pk_slot)
+ ExecDropSingleTupleTableSlot(entry->pk_slot);
+ if (entry->fk_slot)
+ ExecDropSingleTupleTableSlot(entry->fk_slot);
+ if (entry->flush_cxt)
+ MemoryContextDelete(entry->flush_cxt);
+ }
+
+ hash_destroy(ri_fastpath_cache);
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static bool ri_fastpath_xact_callback_registered = false;
+
+static void
+ri_FastPathXactCallback(XactEvent event, void *arg)
+{
+ /*
+ * On abort, ResourceOwner already released relations; on commit,
+ * ri_FastPathTeardown already ran. Either way, just NULL the
+ * static pointers so they don't dangle into the next transaction.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static void
+ri_FastPathSubXactCallback(SubXactEvent event, SubTransactionId mySubid,
+ SubTransactionId parentSubid, void *arg)
+{
+ if (event == SUBXACT_EVENT_ABORT_SUB)
+ {
+ /*
+ * ResourceOwner already released relations. NULL the static
+ * pointers so the still-registered batch callback becomes a
+ * no-op for the rest of this transaction.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+ }
+}
+
+/*
+ * ri_FastPathGetEntry
+ * Look up or create a per-batch cache entry for the given constraint.
+ *
+ * On first call for a constraint within a batch: opens pk_rel and the
+ * index, allocates a result slot, and registers the cleanup callback.
+ *
+ * On subsequent calls: returns the existing entry.
+ */
+static RI_FastPathEntry *
+ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+ RI_FastPathEntry *entry;
+ bool found;
+
+ /* Create hash table on first use in this batch */
+ if (ri_fastpath_cache == NULL)
+ {
+ HASHCTL ctl;
+
+ if (!ri_fastpath_xact_callback_registered)
+ {
+ RegisterXactCallback(ri_FastPathXactCallback, NULL);
+ RegisterSubXactCallback(ri_FastPathSubXactCallback, NULL);
+ ri_fastpath_xact_callback_registered = true;
+ }
+
+ ctl.keysize = sizeof(Oid);
+ ctl.entrysize = sizeof(RI_FastPathEntry);
+ ctl.hcxt = TopTransactionContext;
+ ri_fastpath_cache = hash_create("RI fast-path cache",
+ 16,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ }
+
+ entry = hash_search(ri_fastpath_cache, &riinfo->constraint_id,
+ HASH_ENTER, &found);
+
+ if (!found)
+ {
+ MemoryContext oldcxt;
+ Oid saved_userid;
+ int saved_sec_context;
+
+ /*
+ * Zero out non-key fields so ri_FastPathTeardown is safe if we error
+ * out during partial initialization below.
+ */
+ memset(((char *) entry) + offsetof(RI_FastPathEntry, pk_rel), 0,
+ sizeof(RI_FastPathEntry) - offsetof(RI_FastPathEntry, pk_rel));
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+
+ /*
+ * Open PK table and its unique index.
+ *
+ * RowShareLock on pk_rel matches what the SPI path's SELECT ... FOR
+ * KEY SHARE would acquire as a relation-level lock. AccessShareLock
+ * on the index is standard for index scans.
+ *
+ * We don't release these locks until end of transaction, matching SPI
+ * behavior.
+ */
+ entry->pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ entry->idx_rel = index_open(riinfo->conindid, AccessShareLock);
+
+ entry->pk_slot = table_slot_create(entry->pk_rel, NULL);
+ entry->fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
+ &TTSOpsHeapTuple);
+
+ entry->flush_cxt = AllocSetContextCreate(TopTransactionContext,
+ "RI fast path flush temporary context",
+ ALLOCSET_SMALL_SIZES);
+
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Ensure cleanup at end of this trigger-firing batch */
+ if (!ri_fastpath_callback_registered)
+ {
+ RegisterAfterTriggerBatchCallback(ri_FastPathEndBatch, NULL);
+ ri_fastpath_callback_registered = true;
+ }
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(entry->pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ ri_CheckPermissions(entry->pk_rel);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+ /* For ri_FastPathEndBatch() */
+ entry->riinfo = riinfo;
+ }
+
+ return entry;
+}
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 27af5284406..7664298f5c8 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -289,4 +289,22 @@ extern void RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel,
extern int RI_FKey_trigger_type(Oid tgfoid);
+/*
+ * Callback type for end-of-trigger-batch notifications.
+ *
+ * Registered via RegisterAfterTriggerBatchCallback(). Invoked when
+ * a batch of after-trigger processing completes:
+ * - AfterTriggerEndQuery() (immediate constraints)
+ * - AfterTriggerFireDeferred() (deferred constraints at COMMIT)
+ * - AfterTriggerSetState() (SET CONSTRAINTS IMMEDIATE)
+ *
+ * The callback list is cleared after each batch. Callers must
+ * re-register if they need to be called again in a subsequent batch.
+ */
+typedef void (*AfterTriggerBatchCallback) (void *arg);
+
+extern void RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg);
+extern bool AfterTriggerBatchIsActive(void);
+
#endif /* TRIGGER_H */
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 0826f518004..2179d2a8e8f 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3504,3 +3504,129 @@ DETAIL: drop cascades to table fkpart13_t1
drop cascades to table fkpart13_t2
drop cascades to table fkpart13_t3
RESET search_path;
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+ERROR: insert or update on table "fp_fk_alter" violates foreign key constraint "fp_fk_alter_a_fkey"
+DETAIL: Key (a)=(101) is not present in table "fp_pk_alter".
+DROP TABLE fp_fk_alter, fp_pk_alter;
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+ERROR: insert or update on table "fp_fk_alter2" violates foreign key constraint "fp_fk_alter2_a_fkey"
+DETAIL: Key (a)=(200) is not present in table "fp_pk_alter2".
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+ERROR: insert or update on table "fp_multi_fk" violates foreign key constraint "fp_multi_fk_b_fkey"
+DETAIL: Key (b)=(2) is not present in table "fp_pk2".
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+ERROR: insert or update on table "fp_fk_defer" violates foreign key constraint "fp_fk_defer_a_fkey"
+DETAIL: Key (a)=(3) is not present in table "fp_pk_defer".
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+ a
+---
+ 1
+ 1
+(2 rows)
+
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+ERROR: insert or update on table "fp_fk_multi" violates foreign key constraint "fp_fk_multi_a_b_fkey"
+DETAIL: Key (a, b)=(999, 999) is not present in table "fp_pk_multi".
+DROP TABLE fp_fk_multi, fp_pk_multi;
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+ERROR: insert or update on table "fp_fk_commit" violates foreign key constraint "fp_fk_commit_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_commit".
+DROP TABLE fp_fk_commit, fp_pk_commit;
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+ERROR: insert or update on table "fp_fk_cross" violates foreign key constraint "fp_fk_cross_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_cross".
+DROP TABLE fp_fk_cross, fp_pk_cross;
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index e9ee29331cb..7a729486bc2 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2498,3 +2498,121 @@ WITH cte AS (
DROP SCHEMA fkpart13 CASCADE;
RESET search_path;
+
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+DROP TABLE fp_fk_alter, fp_pk_alter;
+
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
+
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+DROP TABLE fp_fk_multi, fp_pk_multi;
+
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+DROP TABLE fp_fk_commit, fp_pk_commit;
+
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+DROP TABLE fp_fk_cross, fp_pk_cross;
+
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c51a0a903a6..0b05304a294 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -30,6 +30,8 @@ AddForeignUpdateTargets_function
AddrInfo
AffixNode
AffixNodeData
+AfterTriggerBatchCallback
+AfterTriggerCallbackItem
AfterTriggerEvent
AfterTriggerEventChunk
AfterTriggerEventData
@@ -2478,6 +2480,7 @@ RIX
RI_CompareHashEntry
RI_CompareKey
RI_ConstraintInfo
+RI_FastPathEntry
RI_QueryHashEntry
RI_QueryKey
RTEKind
--
2.47.3
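
The trigger.h comment in the patch above says the batch callback list is cleared after each batch, so callers must re-register. A minimal standalone sketch of that one-shot contract follows. It is not the patch's code: register_batch_callback(), end_batch(), use_cache() and cleanup_cb() are hypothetical names, and resetting the "registered" flag inside the callback is an assumption about how ri_FastPathEndBatch() behaves, since that function is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLBACKS 8

/* Same shape as AfterTriggerBatchCallback in the patch. */
typedef void (*BatchCallback) (void *arg);

typedef struct CallbackItem
{
	BatchCallback fn;
	void	   *arg;
} CallbackItem;

/* Hypothetical registry; the real one lives in trigger.c. */
static CallbackItem batch_callbacks[MAX_CALLBACKS];
static int	n_batch_callbacks = 0;

static void
register_batch_callback(BatchCallback fn, void *arg)
{
	assert(n_batch_callbacks < MAX_CALLBACKS);
	batch_callbacks[n_batch_callbacks].fn = fn;
	batch_callbacks[n_batch_callbacks].arg = arg;
	n_batch_callbacks++;
}

/* Run every registered callback once, then clear the list. */
static void
end_batch(void)
{
	for (int i = 0; i < n_batch_callbacks; i++)
		batch_callbacks[i].fn(batch_callbacks[i].arg);
	n_batch_callbacks = 0;
}

/* Caller-side guard, modelled on ri_fastpath_callback_registered. */
static bool cleanup_registered = false;
static int	cleanup_runs = 0;

static void
cleanup_cb(void *arg)
{
	(void) arg;
	cleanup_runs++;
	/* Assumption: the flag is reset here so the next batch re-registers. */
	cleanup_registered = false;
}

/* Called once per row; registers the cleanup at most once per batch. */
static void
use_cache(void)
{
	if (!cleanup_registered)
	{
		register_batch_callback(cleanup_cb, NULL);
		cleanup_registered = true;
	}
}
```

Calling use_cache() many times within one batch leaves a single registered callback; after end_batch() the list is empty, and the next use_cache() registers again.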
[application/octet-stream] v10-0001-Add-fast-path-for-foreign-key-constraint-checks.patch (31.1K, 3-v10-0001-Add-fast-path-for-foreign-key-constraint-checks.patch)
download | inline diff:
From 92e8fd30d87a08fa675e7d15cf60b40c11d9afc8 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 24 Mar 2026 18:28:00 +0900
Subject: [PATCH v10 1/2] Add fast path for foreign key constraint checks
Add a fast-path optimization for foreign key checks that bypasses SPI
by directly probing the unique index on the referenced table.
The fast path applies when the referenced table is not partitioned and
the constraint does not involve temporal semantics. Otherwise, the
existing SPI path is used.
ri_FastPathCheck() extracts the FK values, builds scan keys, performs
an index scan, and locks the matching tuple with LockTupleKeyShare
via ri_LockPKTuple(), which handles the RI-specific subset of
table_tuple_lock() results.
If the locked tuple was reached by chasing an update chain
(tmfd.traversed), recheck_matched_pk_tuple() verifies that the key
is still the same, emulating EvalPlanQual.
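
The outcomes described above, taken together with the switch in ri_LockPKTuple() further down, can be summarised as a small decision function. This is a standalone sketch and not the patch's code: the enums and probe_outcome() are hypothetical stand-ins for TM_Result, tmfd.traversed, the result of recheck_matched_pk_tuple() and IsolationUsesXactSnapshot().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the TM_Result values the patch handles. */
typedef enum LockResult
{
	LOCK_OK,
	LOCK_DELETED,
	LOCK_UPDATED,
	LOCK_SELF_MODIFIED
} LockResult;

typedef enum ProbeOutcome
{
	PROBE_FOUND,				/* FK check passes */
	PROBE_NOT_FOUND,			/* caller reports an FK violation */
	PROBE_SERIALIZATION_FAILURE,	/* ereport(ERROR) under RR/SERIALIZABLE */
	PROBE_INTERNAL_ERROR		/* elog(ERROR), should not happen */
} ProbeOutcome;

/*
 * traversed:         lock reached the tuple by chasing an update chain
 * key_still_matches: result of rechecking the key on the locked version
 * xact_snapshot:     true for REPEATABLE READ / SERIALIZABLE
 */
static ProbeOutcome
probe_outcome(LockResult result, bool traversed,
			  bool key_still_matches, bool xact_snapshot)
{
	switch (result)
	{
		case LOCK_OK:
			/* Only a traversed chain needs the key recheck. */
			if (traversed)
				return key_still_matches ? PROBE_FOUND : PROBE_NOT_FOUND;
			return PROBE_FOUND;

		case LOCK_DELETED:
			return xact_snapshot ? PROBE_SERIALIZATION_FAILURE
				: PROBE_NOT_FOUND;

		case LOCK_UPDATED:
			/* READ COMMITTED should have chased the chain instead. */
			return xact_snapshot ? PROBE_SERIALIZATION_FAILURE
				: PROBE_INTERNAL_ERROR;

		case LOCK_SELF_MODIFIED:
			return PROBE_NOT_FOUND;
	}

	return PROBE_INTERNAL_ERROR;
}
```

A non-key update under READ COMMITTED maps to (LOCK_OK, traversed, key still matches) and passes; a key update maps to (LOCK_OK, traversed, key no longer matches) and is treated as not found.
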
The scan uses GetTransactionSnapshot(), matching what the SPI path
uses (via _SPI_execute_plan pushing GetTransactionSnapshot() as the
active snapshot). Under READ COMMITTED this is a fresh snapshot;
under REPEATABLE READ / SERIALIZABLE it is the frozen transaction-
start snapshot, so PK rows committed after the transaction started
are not visible.
The ri_CheckPermissions() function performs schema USAGE and table
SELECT checks, matching what the SPI path gets implicitly through
the executor's permission checks. The fast path also switches to
the PK table owner's security context (with SECURITY_NOFORCE_RLS)
before the index probe, matching the SPI path where the query runs
as the table owner.
ri_HashCompareOp() is adjusted to handle cross-type equality operators
(e.g. int48eq for int4 PK / int8 FK) which can appear in conpfeqop.
The existing code asserted same-type operators only, which was correct
for its existing callers (ri_KeysEqual compares same-type FK column
values via ff_eq_oprs), but the fast path is the first caller to pass
pf_eq_oprs, which can be cross-type.
Per-key metadata (compare entries, operator procedures, strategy
numbers) is cached in RI_ConstraintInfo via
ri_populate_fastpath_metadata() on first use, eliminating repeated
calls to ri_HashCompareOp() and get_op_opfamily_properties().
conindid and pk_is_partitioned are also cached at constraint load
time, avoiding per-invocation syscache lookups and the need to open
pk_rel before deciding whether the fast path applies.
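
The populate-on-first-use caching described above follows a common pattern, sketched below in standalone form. This is an illustration only: Constraint, KeyMeta, get_meta() and populate_meta() are hypothetical names, malloc() stands in for allocation in a long-lived memory context, and the stored strategy value is a placeholder rather than a real catalog lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_KEYS 4

/* Hypothetical per-key metadata, analogous in role to FastPathMeta. */
typedef struct KeyMeta
{
	int			strategy[MAX_KEYS];
} KeyMeta;

/* Hypothetical constraint entry; meta stays NULL until first use. */
typedef struct Constraint
{
	int			nkeys;
	KeyMeta    *meta;
} Constraint;

/* Counts how often the expensive lookups run. */
static int	populate_calls = 0;

static void
populate_meta(Constraint *c)
{
	KeyMeta    *m = malloc(sizeof(KeyMeta));

	assert(m != NULL);
	for (int i = 0; i < c->nkeys; i++)
		m->strategy[i] = 3;		/* placeholder for a looked-up value */

	c->meta = m;
	populate_calls++;
}

/* Return cached metadata, populating it only on the first call. */
static const KeyMeta *
get_meta(Constraint *c)
{
	if (c->meta == NULL)
		populate_meta(c);
	return c->meta;
}
```

Every call after the first returns the same pointer without redoing the lookups, which is the saving the commit message attributes to caching in RI_ConstraintInfo.
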
New regression tests cover RLS bypass and ACL enforcement for the
fast-path permission checks. New isolation tests exercise concurrent
PK updates under both READ COMMITTED and REPEATABLE READ.
Benchmarking shows ~1.8x speedup for bulk FK inserts (int PK/int FK,
1M rows, with the PK table and its index cached).
Author: Junwang Zhao <[email protected]>
Co-authored-by: Amit Langote <[email protected]>
Reviewed-by: Haibo Yan <[email protected]>
Tested-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
---
src/backend/utils/adt/ri_triggers.c | 469 +++++++++++++++++-
.../expected/fk-concurrent-pk-upd.out | 105 ++++
src/test/isolation/isolation_schedule | 1 +
.../isolation/specs/fk-concurrent-pk-upd.spec | 53 ++
src/test/regress/expected/foreign_key.out | 47 ++
src/test/regress/sql/foreign_key.sql | 64 +++
src/tools/pgindent/typedefs.list | 1 +
7 files changed, 726 insertions(+), 14 deletions(-)
create mode 100644 src/test/isolation/expected/fk-concurrent-pk-upd.out
create mode 100644 src/test/isolation/specs/fk-concurrent-pk-upd.spec
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index d22b8ef7f3c..6d8de64471f 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -24,12 +24,15 @@
#include "postgres.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/index.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
+#include "catalog/pg_namespace.h"
#include "commands/trigger.h"
#include "executor/executor.h"
#include "executor/spi.h"
@@ -91,6 +94,7 @@
#define RI_TRIGTYPE_UPDATE 2
#define RI_TRIGTYPE_DELETE 3
+typedef struct FastPathMeta FastPathMeta;
/*
* RI_ConstraintInfo
@@ -132,8 +136,24 @@ typedef struct RI_ConstraintInfo
Oid period_intersect_oper; /* anyrange * anyrange (or
* multiranges) */
dlist_node valid_link; /* Link in list of valid entries */
+
+ Oid conindid;
+ bool pk_is_partitioned;
+
+ FastPathMeta *fpmeta;
} RI_ConstraintInfo;
+typedef struct RI_CompareHashEntry RI_CompareHashEntry;
+
+/* Fast-path metadata for RI checks on foreign key referencing tables */
+typedef struct FastPathMeta
+{
+ RI_CompareHashEntry *compare_entries[RI_MAX_NUMKEYS];
+ RegProcedure regops[RI_MAX_NUMKEYS];
+ Oid subtypes[RI_MAX_NUMKEYS];
+ int strats[RI_MAX_NUMKEYS];
+} FastPathMeta;
+
/*
* RI_QueryKey
*
@@ -233,6 +253,23 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool is_restrict,
bool detectNewRows, int expect_OK);
+static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
+ IndexScanDesc scandesc, TupleTableSlot *slot,
+ Snapshot snapshot, const RI_ConstraintInfo *riinfo,
+ ScanKeyData *skey, int nkeys);
+static bool ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
+ bool *concurrently_updated);
+static bool ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo);
+static void ri_CheckPermissions(Relation query_rel);
+static bool recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+ TupleTableSlot *new_slot);
+static void build_index_scankeys(const RI_ConstraintInfo *riinfo,
+ Relation idx_rel, Datum *pk_vals,
+ char *pk_nulls, ScanKey skeys);
+static void ri_populate_fastpath_metadata(RI_ConstraintInfo *riinfo,
+ Relation fk_rel, Relation idx_rel);
static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
const RI_ConstraintInfo *riinfo, bool rel_is_pk,
Datum *vals, char *nulls);
@@ -276,14 +313,7 @@ RI_FKey_check(TriggerData *trigdata)
if (!table_tuple_satisfies_snapshot(trigdata->tg_relation, newslot, SnapshotSelf))
return PointerGetDatum(NULL);
- /*
- * Get the relation descriptors of the FK and PK tables.
- *
- * pk_rel is opened in RowShareLock mode since that's what our eventual
- * SELECT FOR KEY SHARE will get on it.
- */
fk_rel = trigdata->tg_relation;
- pk_rel = table_open(riinfo->pk_relid, RowShareLock);
switch (ri_NullCheck(RelationGetDescr(fk_rel), newslot, riinfo, false))
{
@@ -293,7 +323,6 @@ RI_FKey_check(TriggerData *trigdata)
* No further check needed - an all-NULL key passes every type of
* foreign key constraint.
*/
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
case RI_KEYS_SOME_NULL:
@@ -318,7 +347,6 @@ RI_FKey_check(TriggerData *trigdata)
errdetail("MATCH FULL does not allow mixing of null and nonnull key values."),
errtableconstraint(fk_rel,
NameStr(riinfo->conname))));
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
case FKCONSTR_MATCH_SIMPLE:
@@ -327,7 +355,6 @@ RI_FKey_check(TriggerData *trigdata)
* MATCH SIMPLE - if ANY column is null, the key passes
* the constraint.
*/
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
#ifdef NOT_USED
@@ -352,8 +379,31 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
+ /*
+ * Fast path: probe the PK unique index directly, bypassing SPI.
+ *
+ * For non-partitioned, non-temporal FKs, we can skip the SPI machinery
+ * (plan cache, executor setup, etc.) and do a direct index scan + tuple
+ * lock. This is semantically equivalent to the SPI path below but avoids
+ * the per-row executor overhead.
+ *
+ * ri_FastPathCheck() reports the violation itself (via ereport) if no
+ * matching PK row is found, so it only returns on success.
+ */
+ if (ri_fastpath_is_applicable(riinfo))
+ {
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
+ return PointerGetDatum(NULL);
+ }
+
SPI_connect();
+ /*
+ * pk_rel is opened in RowShareLock mode since that's what our eventual
+ * SELECT FOR KEY SHARE will get on it.
+ */
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+
/* Fetch or prepare a saved plan for the real check */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
@@ -2356,6 +2406,12 @@ ri_LoadConstraintInfo(Oid constraintOid)
riinfo->valid = true;
+ riinfo->conindid = conForm->conindid;
+ riinfo->pk_is_partitioned =
+ (get_rel_relkind(riinfo->pk_relid) == RELKIND_PARTITIONED_TABLE);
+
+ riinfo->fpmeta = NULL;
+
return riinfo;
}
@@ -2617,6 +2673,382 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
return SPI_processed != 0;
}
+/*
+ * ri_FastPathCheck
+ * Perform FK existence check via direct index probe, bypassing SPI.
+ *
+ * If no matching PK row exists, report the violation via
+ * ri_ReportViolation(); otherwise, the function returns normally.
+ */
+static void
+ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ Relation pk_rel;
+ Relation idx_rel;
+ IndexScanDesc scandesc;
+ TupleTableSlot *slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found = false;
+ Oid saved_userid;
+ int saved_sec_context;
+ Snapshot snapshot;
+
+ /*
+ * Advance the command counter so the snapshot sees the effects of prior
+ * triggers in this statement. Mirrors what the SPI path does in
+ * ri_PerformCheck().
+ */
+ CommandCounterIncrement();
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ idx_rel = index_open(riinfo->conindid, AccessShareLock);
+
+ slot = table_slot_create(pk_rel, NULL);
+ scandesc = index_beginscan(pk_rel, idx_rel,
+ snapshot, NULL,
+ riinfo->nkeys, 0);
+
+ if (riinfo->fpmeta == NULL)
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ Assert(riinfo->fpmeta);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ ri_CheckPermissions(pk_rel);
+
+ ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, slot,
+ snapshot, riinfo, skey, riinfo->nkeys);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ index_endscan(scandesc);
+ ExecDropSingleTupleTableSlot(slot);
+ UnregisterSnapshot(snapshot);
+
+ if (!found)
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ newslot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+
+ index_close(idx_rel, NoLock);
+ table_close(pk_rel, NoLock);
+}
+
+/*
+ * ri_FastPathProbeOne
+ * Probe the PK index for one set of scan keys and lock the matching
+ * tuple.
+ *
+ * Returns true if a matching PK row was found, locked, and (if
+ * applicable) visible to the transaction snapshot.
+ *
+ * The caller must ensure CurrentMemoryContext is long-lived enough
+ * for the scan descriptor's internal allocations (typically
+ * TopTransactionContext when using a cached scandesc).
+ */
+static bool
+ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
+ IndexScanDesc scandesc, TupleTableSlot *slot,
+ Snapshot snapshot, const RI_ConstraintInfo *riinfo,
+ ScanKeyData *skey, int nkeys)
+{
+ bool found = false;
+
+ index_rescan(scandesc, skey, nkeys, NULL, 0);
+
+ if (index_getnext_slot(scandesc, ForwardScanDirection, slot))
+ {
+ bool concurrently_updated;
+
+ if (ri_LockPKTuple(pk_rel, slot, snapshot,
+ &concurrently_updated))
+ {
+ if (concurrently_updated)
+ found = recheck_matched_pk_tuple(idx_rel, skey, slot);
+ else
+ found = true;
+ }
+ }
+
+ return found;
+}
+
+/*
+ * ri_LockPKTuple
+ * Lock a PK tuple found by the fast-path index scan.
+ *
+ * Calls table_tuple_lock() directly with handling specific to RI checks.
+ * Returns true if the tuple was successfully locked.
+ *
+ * Sets *concurrently_updated to true if the locked tuple was reached
+ * by following an update chain (tmfd.traversed), indicating the caller
+ * should recheck the key.
+ */
+static bool
+ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
+ bool *concurrently_updated)
+{
+ TM_FailureData tmfd;
+ TM_Result result;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+
+ *concurrently_updated = false;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ result = table_tuple_lock(pk_rel, &slot->tts_tid, snap,
+ slot, GetCurrentCommandId(false),
+ LockTupleKeyShare, LockWaitBlock,
+ lockflags, &tmfd);
+
+ switch (result)
+ {
+ case TM_Ok:
+ if (tmfd.traversed)
+ *concurrently_updated = true;
+ return true;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ return false;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+
+ /*
+ * In READ COMMITTED, FIND_LAST_VERSION should have chased the
+ * chain and returned TM_Ok. Getting here means something
+ * unexpected -- fall through to error.
+ */
+ elog(ERROR, "unexpected table_tuple_lock status: %u", result);
+ break;
+
+ case TM_SelfModified:
+
+ /*
+ * The current command or a later command in this transaction
+ * modified the PK row. This shouldn't normally happen during an
+ * FK check (we're not modifying pk_rel), but handle it safely by
+ * treating the tuple as not found.
+ */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ break;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", result);
+ break;
+ }
+
+ return false; /* keep compiler quiet */
+}
+
+static bool
+ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo)
+{
+ /*
+ * Partitioned referenced tables are skipped for simplicity, since they
+ * require routing the probe through the correct partition using
+ * PartitionDirectory.
+ */
+ if (riinfo->pk_is_partitioned)
+ return false;
+
+ /*
+ * Temporal foreign keys use range overlap and containment semantics (&&,
+ * <@, range_agg()) that inherently involve aggregation and multiple-row
+ * reasoning, so they stay on the SPI path.
+ */
+ if (riinfo->hasperiod)
+ return false;
+
+ return true;
+}
+
+/*
+ * ri_CheckPermissions
+ * Check that the current user has USAGE on the schema containing
+ * 'query_rel' and SELECT on 'query_rel' itself.
+ */
+static void
+ri_CheckPermissions(Relation query_rel)
+{
+ AclResult aclresult;
+
+ /* USAGE on schema. */
+ aclresult = object_aclcheck(NamespaceRelationId,
+ RelationGetNamespace(query_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(query_rel)));
+
+ /* SELECT on relation. */
+ aclresult = pg_class_aclcheck(RelationGetRelid(query_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(query_rel));
+}
+
+/*
+ * recheck_matched_pk_tuple
+ * After following an update chain (tmfd.traversed), verify that
+ * the locked PK tuple still matches the original search keys.
+ *
+ * A non-key update (e.g. changing a non-PK column) creates a new tuple version
+ * that we've now locked, but the key is unchanged -- that's fine. A key
+ * update means the value we were looking for is gone, so we should treat it as
+ * not found.
+ */
+static bool
+recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+ TupleTableSlot *new_slot)
+{
+ /*
+ * TODO: BuildIndexInfo does a syscache lookup + palloc on every call.
+ * This only fires on the concurrent-update path (tmfd.traversed), which
+ * should be rare, so the cost is acceptable for now. If profiling shows
+ * otherwise, cache the IndexInfo in FastPathMeta.
+ */
+ IndexInfo *indexInfo = BuildIndexInfo(idxrel);
+ Datum values[INDEX_MAX_KEYS];
+ bool isnull[INDEX_MAX_KEYS];
+ bool matched = true;
+
+ /* PK indexes never have these. */
+ Assert(indexInfo->ii_Expressions == NIL &&
+ indexInfo->ii_ExclusionOps == NULL);
+
+ /* Form the index values and isnull flags given the table tuple. */
+ FormIndexDatum(indexInfo, new_slot, NULL, values, isnull);
+ for (int i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
+ {
+ ScanKeyData *skey = &skeys[i];
+
+ /* A PK column can never be set to NULL. */
+ Assert(!isnull[i]);
+ if (!DatumGetBool(FunctionCall2Coll(&skey->sk_func,
+ skey->sk_collation,
+ values[i],
+ skey->sk_argument)))
+ {
+ matched = false;
+ break;
+ }
+ }
+
+ return matched;
+}
+
+/*
+ * build_index_scankeys
+ * Build ScanKeys for a direct index probe of the PK's unique index.
+ *
+ * Uses cached compare entries, operator procedures, and strategy numbers
+ * from ri_populate_fastpath_metadata() rather than looking them up on
+ * each invocation. Casts FK values to the operator's expected input
+ * type if needed.
+ */
+static void
+build_index_scankeys(const RI_ConstraintInfo *riinfo,
+ Relation idx_rel, Datum *pk_vals,
+ char *pk_nulls, ScanKey skeys)
+{
+ FastPathMeta *fpmeta = riinfo->fpmeta;
+
+ Assert(fpmeta);
+
+ /*
+ * May need to cast each of the individual values of the foreign key to
+ * the corresponding PK column's type if the equality operator demands it.
+ */
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ RI_CompareHashEntry *entry = fpmeta->compare_entries[i];
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+
+ /*
+ * Set up ScanKeys for the index scan. This is essentially how
+ * ExecIndexBuildScanKeys() sets them up.
+ */
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ int pkattrno = i + 1;
+
+ ScanKeyEntryInitialize(&skeys[i], 0, pkattrno,
+ fpmeta->strats[i], fpmeta->subtypes[i],
+ idx_rel->rd_indcollation[i], fpmeta->regops[i],
+ pk_vals[i]);
+ }
+}
+
+/*
+ * ri_populate_fastpath_metadata
+ * Cache per-key metadata needed by build_index_scankeys().
+ *
+ * Looks up the compare hash entry, operator procedure OID, and index
+ * strategy/subtype for each key column. Called lazily on first use
+ * and persists for the lifetime of the RI_ConstraintInfo entry.
+ */
+static void
+ri_populate_fastpath_metadata(RI_ConstraintInfo *riinfo,
+ Relation fk_rel, Relation idx_rel)
+{
+ FastPathMeta *fpmeta;
+ MemoryContext oldcxt = MemoryContextSwitchTo(TopMemoryContext);
+
+ Assert(riinfo != NULL && riinfo->valid);
+
+ fpmeta = palloc_object(FastPathMeta);
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ Oid eq_opr = riinfo->pf_eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ Oid lefttype;
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ fpmeta->compare_entries[i] = entry;
+ fpmeta->regops[i] = get_opcode(eq_opr);
+
+ get_op_opfamily_properties(eq_opr,
+ idx_rel->rd_opfamily[i],
+ false,
+ &fpmeta->strats[i],
+ &lefttype,
+ &fpmeta->subtypes[i]);
+ }
+
+ riinfo->fpmeta = fpmeta;
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* Extract fields from a tuple into Datum/nulls arrays
*/
@@ -3112,8 +3544,11 @@ ri_CompareWithCast(Oid eq_opr, Oid typeid, Oid collid,
/*
* ri_HashCompareOp -
*
- * See if we know how to compare two values, and create a new hash entry
- * if not.
+ * Look up or create a cache entry for the given equality operator and
+ * the caller's value type (typeid). The entry holds the operator's
+ * FmgrInfo and, if typeid doesn't match what the operator expects as
+ * its right-hand input, a cast function to coerce the value before
+ * comparison.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -3169,8 +3604,14 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * pf_eq_oprs (used by the fast path) can be cross-type when the
+ * FK and PK columns differ in type, e.g. int48eq for int4 PK /
+ * int8 FK. If the FK column's type already matches what the
+ * operator expects as its right-hand input, no cast is needed.
+ */
+ if (typeid == righttype)
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/test/isolation/expected/fk-concurrent-pk-upd.out b/src/test/isolation/expected/fk-concurrent-pk-upd.out
new file mode 100644
index 00000000000..4dd9535d3c0
--- /dev/null
+++ b/src/test/isolation/expected/fk-concurrent-pk-upd.out
@@ -0,0 +1,105 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2b s2ukey s1b s1i s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2c: COMMIT;
+step s1i: <... completed>
+ERROR: insert or update on table "child" violates foreign key constraint "child_parent_key_fkey"
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 2|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+(0 rows)
+
+
+starting permutation: s2b s2uaux s1b s1i s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2uaux: UPDATE parent SET aux = 'bar' WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1);
+step s2c: COMMIT;
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|bar
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
+
+starting permutation: s2b s2ukey s1b s1i s2ukey2 s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2ukey2: UPDATE parent SET parent_key = 1 WHERE parent_key = 2;
+step s2c: COMMIT;
+step s1i: <... completed>
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
+
+starting permutation: s2b s2ukey s3b s3i s2c s3c s2s s3s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s3b: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s3i: INSERT INTO child VALUES (2, 1); <waiting ...>
+step s2c: COMMIT;
+step s3i: <... completed>
+ERROR: could not serialize access due to concurrent update
+step s3c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 2|foo
+(1 row)
+
+step s3s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+(0 rows)
+
+
+starting permutation: s2b s2uaux s3b s3i s2c s3c s2s s3s
+step s2b: BEGIN;
+step s2uaux: UPDATE parent SET aux = 'bar' WHERE parent_key = 1;
+step s3b: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s3i: INSERT INTO child VALUES (2, 1);
+step s2c: COMMIT;
+step s3c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|bar
+(1 row)
+
+step s3s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 2| 1
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 4e466580cd4..c1a999bf1d2 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -37,6 +37,7 @@ test: fk-partitioned-2
test: fk-snapshot
test: fk-snapshot-2
test: fk-snapshot-3
+test: fk-concurrent-pk-upd
test: subxid-overflow
test: eval-plan-qual
test: eval-plan-qual-trigger
diff --git a/src/test/isolation/specs/fk-concurrent-pk-upd.spec b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
new file mode 100644
index 00000000000..03dc7f260cd
--- /dev/null
+++ b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
@@ -0,0 +1,53 @@
+# Tests that an INSERT on referencing table correctly fails when
+# the referenced value disappears due to a concurrent update
+setup
+{
+ CREATE TABLE parent (
+ parent_key int PRIMARY KEY,
+ aux text NOT NULL
+ );
+
+ CREATE TABLE child (
+ child_key int PRIMARY KEY,
+ parent_key int8 NOT NULL REFERENCES parent
+ );
+
+ INSERT INTO parent VALUES (1, 'foo');
+}
+
+teardown
+{
+ DROP TABLE parent, child;
+}
+
+session s1
+step s1b { BEGIN; }
+step s1i { INSERT INTO child VALUES (1, 1); }
+step s1c { COMMIT; }
+step s1s { SELECT * FROM child; }
+
+session s2
+step s2b { BEGIN; }
+step s2ukey { UPDATE parent SET parent_key = 2 WHERE parent_key = 1; }
+step s2uaux { UPDATE parent SET aux = 'bar' WHERE parent_key = 1; }
+step s2ukey2 { UPDATE parent SET parent_key = 1 WHERE parent_key = 2; }
+step s2c { COMMIT; }
+step s2s { SELECT * FROM parent; }
+
+session s3
+step s3b { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s3i { INSERT INTO child VALUES (2, 1); }
+step s3c { COMMIT; }
+step s3s { SELECT * FROM child; }
+
+# fail
+permutation s2b s2ukey s1b s1i s2c s1c s2s s1s
+# ok
+permutation s2b s2uaux s1b s1i s2c s1c s2s s1s
+# ok
+permutation s2b s2ukey s1b s1i s2ukey2 s2c s1c s2s s1s
+
+# RR: key update -> serialization failure
+permutation s2b s2ukey s3b s3i s2c s3c s2s s3s
+# RR: non-key update -> old version visible via transaction snapshot
+permutation s2b s2uaux s3b s3i s2c s3c s2s s3s
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 9ae4dbf1b0a..0826f518004 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -370,6 +370,53 @@ SELECT * FROM PKTABLE;
DROP TABLE FKTABLE;
DROP TABLE PKTABLE;
--
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+SET ROLE regress_foreign_key_user;
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+RESET ROLE;
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+ERROR: permission denied for table pktable
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
-- Check initial check upon ALTER TABLE
--
CREATE TABLE PKTABLE ( ptest1 int, ptest2 int, PRIMARY KEY(ptest1, ptest2) );
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index 3b8c95bf893..e9ee29331cb 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -242,6 +242,70 @@ SELECT * FROM PKTABLE;
DROP TABLE FKTABLE;
DROP TABLE PKTABLE;
+--
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+SET ROLE regress_foreign_key_user;
+
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+
+RESET ROLE;
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+
+DROP USER regress_foreign_key_user;
+
--
-- Check initial check upon ALTER TABLE
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0c07c945f05..c51a0a903a6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -815,6 +815,7 @@ ExtensionInfo
ExtensionLocation
ExtensionSiblingCache
ExtensionVersionInfo
+FastPathMeta
FDWCollateState
FD_SET
FILE
--
2.47.3
* Re: Eliminating SPI / SQL from some RI triggers - take 3
2024-12-20 04:23 Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 04:07 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 05:10 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Pavel Stehule <[email protected]>
2025-10-22 13:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-12-01 06:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-02 15:30 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-10 12:28 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-18 15:34 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-19 16:19 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-24 11:47 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-24 13:56 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-25 00:41 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
@ 2026-03-30 04:55 ` Amit Langote <[email protected]>
2026-03-30 11:15 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
0 siblings, 1 reply; 61+ messages in thread
From: Amit Langote @ 2026-03-30 04:55 UTC (permalink / raw)
To: Junwang Zhao <[email protected]>; +Cc: Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
On Wed, Mar 25, 2026 at 9:41 AM Amit Langote <[email protected]> wrote:
> On Tue, Mar 24, 2026 at 10:56 PM Amit Langote <[email protected]> wrote:
> > On Tue, Mar 24, 2026 at 8:47 PM Amit Langote <[email protected]> wrote:
> > >
> > > Hi Junwang,
> > >
> > > On Fri, Mar 20, 2026 at 1:20 AM Junwang Zhao <[email protected]> wrote:
> > > > I squashed 0004 into 0003 so that each file can be committed independently.
> > > > I also ran pgindent for each file.
> > >
> > > Thanks for that.
> > >
> > > Here's another version.
> > >
> > > In 0001, I noticed that the condition change in ri_HashCompareOp could
> > > be simplified further. Also improved the commentary surrounding that.
> > > I also updated the commit message to clarify parity with the SPI path.
> > >
> > > Updated the commit message of 0002 to explain why caching the
> > > snapshot for the entire trigger firing cycle of a given constraint
> > > is a trade-off compared to the SPI path, which retakes the snapshot
> > > for every row checked and could in principle avoid failure for FK rows
> > > whose corresponding PK row was added by a concurrently committed
> > > transaction, at least in the READ COMMITTED case.
> > >
> > > Updated the commit message of 0003 to clarify that it replaces
> > > ri_FastPathCheckCached() from 0002 with the BatchAdd/BatchFlush pair,
> > > and that the cached resources are used unchanged -- only the probing
> > > cadence changes from per-row to per-flush. Per-flush CCI is safe
> > > because all AFTER triggers for the buffered rows have already fired
> > > by flush time; a new test case is added to show that.
> >
> > Kept thinking about this on a walk after I sent this and came to the
> > conclusion that it might be better to just not cache the snapshot with
> > only the above argument in its favor. If repeated GetSnapshotData()
> > is expensive, the solution should be to fix that instead of simply
> > side-stepping it.
> >
> > By taking a snapshot per-batch without caching it, and so likewise the
> > IndexScanDesc, I'm seeing the same ~3x speedup in the batched
> > SK_SEARCHARRAY case, so I don't see much point in being very stubborn
> > about snapshot caching. Like in the attached (there's an unrelated
> > memory context switch thinko fix). Note that relations (pk_rel,
> > idx_rel) and the slot remain cached across the batch; only the
> > snapshot and scandesc are taken fresh per flush.
> >
> > I'll post an updated version tomorrow morning. I think it might be
> > better to just merge 0003 into 0002, because without snapshot and
> > scandesc caching the standalone value of 0002 is mostly just relation
> > and slot caching -- the interesting parts (batch callbacks, lifecycle
> > management) are all scaffolding for the batching. So v10 will be two
> > patches: 0001 core fast path, 0002 everything else.
>
> And here's a set like that. I noticed that we don't need a dedicated
> scan_cxt now that scandesc is not cached and a few other
> simplifications.
Junwang pointed out off-list that FK tuples added to
RI_FastPathEntry.batch[] were being copied into TopTransactionContext
rather than flush_cxt, so they would accumulate until the batch was
exhausted rather than being reclaimed per flush. Fixed in
ri_FastPathBatchAdd() in 0002.
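To illustrate why the allocation context matters, here is a minimal standalone sketch, not the patch's code. It assumes a toy fixed-size bump allocator (the hypothetical Arena) standing in for flush_cxt, sized to hold exactly one batch. If the per-flush reset were skipped, the second batch would run out of space, which is the accumulation problem described above in miniature. All names here (Arena, Batch, batch_add, and so on) are made up for the example.

```c
#include <stddef.h>
#include <stdlib.h>

#define BATCH_SIZE 64
#define SLOT_SIZE 16			/* every allocation is rounded up to this */

/* Toy stand-in for a MemoryContext: a fixed-size bump allocator. */
typedef struct Arena
{
	char	   *base;
	size_t		used;
	size_t		cap;
} Arena;

static void *
arena_alloc(Arena *a, size_t n)
{
	void	   *p;

	n = (n + SLOT_SIZE - 1) / SLOT_SIZE * SLOT_SIZE;
	if (a->base == NULL || a->used + n > a->cap)
		return NULL;			/* out of space: nothing was reclaimed */
	p = a->base + a->used;
	a->used += n;
	return p;
}

typedef struct Batch
{
	Arena		flush_cxt;		/* holds row copies for one flush only */
	int		   *rows[BATCH_SIZE];
	int			count;
	int			flushes;
} Batch;

static int
batch_init(Batch *b)
{
	b->flush_cxt.cap = BATCH_SIZE * SLOT_SIZE;	/* room for one batch */
	b->flush_cxt.base = malloc(b->flush_cxt.cap);
	b->flush_cxt.used = 0;
	b->count = 0;
	b->flushes = 0;
	return b->flush_cxt.base ? 0 : -1;
}

static void
batch_flush(Batch *b)
{
	/* A real flush would probe the PK index for rows[0..count-1] here. */
	b->count = 0;
	b->flushes++;
	b->flush_cxt.used = 0;		/* analogous to resetting flush_cxt */
}

/* Copy the row into the per-flush arena; flush when the buffer fills. */
static int
batch_add(Batch *b, int key)
{
	int		   *copy = arena_alloc(&b->flush_cxt, sizeof(int));

	if (copy == NULL)
		return -1;
	*copy = key;
	b->rows[b->count++] = copy;
	if (b->count >= BATCH_SIZE)
		batch_flush(b);
	return 0;
}

static void
batch_destroy(Batch *b)
{
	free(b->flush_cxt.base);
	b->flush_cxt.base = NULL;
}
```

Because the arena holds only one batch worth of rows, adding more than 64 rows succeeds only when each flush resets it; memory use stays bounded by the batch size rather than by the number of rows processed.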
Also added a couple of comments in trigger.c that were missing: an
Assert and explanation in RegisterAfterTriggerBatchCallback()
clarifying the query_depth >= 0 precondition, a comment at the
AfterTriggerEndQuery call site explaining why
FireAfterTriggerBatchCallbacks() must precede the query_depth
decrement and AfterTriggerFreeQuery, and brief intent comments at the
AfterTriggerFireDeferred and AfterTriggerSetState call sites.
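The ordering constraint at the AfterTriggerEndQuery call site can be pictured with a small standalone model. This is only a sketch under simplifying assumptions: a fixed-size array replaces the List, register_batch_callback() returns -1 where the patch uses an Assert, and begin_query()/end_query() are hypothetical stand-ins for the real query-level bookkeeping. The point it shows is that firing callbacks before decrementing the depth lets the depth guard suppress nested queries.

```c
#include <stddef.h>

#define MAX_CALLBACKS 8

typedef void (*BatchCallback) (void *arg);

typedef struct CallbackItem
{
	BatchCallback callback;
	void	   *arg;
} CallbackItem;

static CallbackItem callbacks[MAX_CALLBACKS];
static int	ncallbacks = 0;
static int	query_depth = -1;	/* -1 means no query is active */

/* Registration only makes sense while a query-level batch is active. */
static int
register_batch_callback(BatchCallback cb, void *arg)
{
	if (query_depth < 0 || ncallbacks >= MAX_CALLBACKS)
		return -1;
	callbacks[ncallbacks].callback = cb;
	callbacks[ncallbacks].arg = arg;
	ncallbacks++;
	return 0;
}

/* Invoke and clear all callbacks, but only at the outermost level. */
static void
fire_batch_callbacks(void)
{
	if (query_depth > 0)
		return;					/* nested query: the outer batch continues */

	for (int i = 0; i < ncallbacks; i++)
		callbacks[i].callback(callbacks[i].arg);
	ncallbacks = 0;
}

static void
begin_query(void)
{
	query_depth++;
}

static void
end_query(void)
{
	/* Fire first, so the guard still sees this query's own depth. */
	fire_batch_callbacks();
	query_depth--;
}

/* Example callback: counts how many times it was invoked. */
static void
count_call(void *arg)
{
	(*(int *) arg)++;
}
```

If end_query() decremented first, a nested query ending at depth 1 would be seen as depth 0 and would fire the outer batch's callbacks too early.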
Plan is to commit 0001 tomorrow barring objections and let it sit for
a bit before committing 0002. Feedback on 0002, particularly on the
AfterTriggerBatchCallback mechanism in trigger.c, welcome in the
meantime.
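For readers new to the thread, the single-column flush strategy in 0002 (sort and deduplicate the buffered keys, walk the index once in order, track hits in matched[]) can be sketched roughly as below. This is an illustration only: a sorted int array stands in for the PK index, and flush_array() is a hypothetical name, not a function in the patch. In the patch the sorting and deduplication happen inside the index AM via SK_SEARCHARRAY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define BATCH_SIZE 64

static int
cmp_int(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

/*
 * Check every buffered key against pk_index, which must be sorted
 * ascending and free of duplicates.  Returns the position in batch[] of
 * the first key with no match, or -1 if all keys matched.
 */
static int
flush_array(const int *batch, int nbatch, const int *pk_index, int npk)
{
	int			keys[BATCH_SIZE];
	bool		matched[BATCH_SIZE] = {false};
	int			nkeys = 0;
	int			pos = 0;

	assert(nbatch >= 0 && nbatch <= BATCH_SIZE);

	/* Sort a copy of the batch and drop duplicate keys. */
	memcpy(keys, batch, nbatch * sizeof(int));
	qsort(keys, nbatch, sizeof(int), cmp_int);
	for (int i = 0; i < nbatch; i++)
	{
		if (nkeys == 0 || keys[i] != keys[nkeys - 1])
			keys[nkeys++] = keys[i];
	}

	/* One ordered pass over the index; pos never moves backwards. */
	for (int k = 0; k < nkeys; k++)
	{
		while (pos < npk && pk_index[pos] < keys[k])
			pos++;
		if (pos < npk && pk_index[pos] == keys[k])
		{
			/* Mark every batch item carrying this key. */
			for (int i = 0; i < nbatch; i++)
			{
				if (batch[i] == keys[k])
					matched[i] = true;
			}
		}
	}

	/* Any item left unmatched is a violation. */
	for (int i = 0; i < nbatch; i++)
	{
		if (!matched[i])
			return i;
	}
	return -1;
}
```

The inner marking loop is linear in the batch size for each index match, which is one reason a very large batch size does not necessarily help.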
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v11-0002-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch (42.6K, 2-v11-0002-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch)
From 4c25679b57dfcffb538de722c7224819610e6e3c Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Mon, 30 Mar 2026 12:58:26 +0900
Subject: [PATCH v11 2/2] Batch FK rows and use SK_SEARCHARRAY for fast-path FK
probes
Instead of probing the PK index on each trigger invocation, buffer
FK rows in the per-constraint cache entry (RI_FastPathEntry) and
flush them as a batch. When the buffer fills (64 rows) or the
trigger-firing cycle ends, ri_FastPathBatchFlush() probes the index
for all buffered rows, sharing a single CommandCounterIncrement,
snapshot, and security context switch across the batch. Per-flush
CCI is safe because all AFTER triggers for the buffered rows have
already fired by flush time.
For single-column foreign keys, the flush builds an ArrayType from
the buffered FK values (casting to the PK-side type if needed) and
constructs a scan key with the SK_SEARCHARRAY flag. The index AM
sorts and deduplicates the array internally, then walks matching
leaf pages in one ordered traversal instead of descending from the
root once per row. A matched[] bitmap tracks which batch items
were satisfied; unmatched items are reported as violations.
Multi-column foreign keys fall back to a per-row probe loop via
ri_FastPathFlushLoop().
FK tuples are materialized via ExecCopySlotHeapTuple() into a
purpose-specific memory context (flush_cxt), a child of
TopTransactionContext, which is also used for per-flush transient
work: cast results, the search array, and index scan allocations.
The context is reset after each flush and deleted in teardown.
The PK relation, index, tuple slots, and fast-path metadata are
cached in RI_FastPathEntry across trigger invocations within a
trigger-firing batch, avoiding repeated open/close overhead. The
snapshot and IndexScanDesc are taken fresh per flush.
ri_FastPathEndBatch() flushes any partial batch before tearing down
cached resources. Since the FK relation may already be closed by
flush time (e.g. for deferred constraints at COMMIT), it reopens the
relation using entry->riinfo->fk_relid if needed.
The non-cached path (ALTER TABLE validation) bypasses batching and
continues to call ri_FastPathCheck() directly per row.
Lifecycle management for RI_FastPathEntry:
- AfterTriggerBatchCallback: A new general-purpose callback
mechanism in trigger.c. Callbacks registered via
RegisterAfterTriggerBatchCallback() fire at the end of each
trigger-firing batch (AfterTriggerEndQuery for immediate
constraints, AfterTriggerFireDeferred at COMMIT, and
AfterTriggerSetState for SET CONSTRAINTS IMMEDIATE). The RI code
registers ri_FastPathEndBatch as a batch callback.
- Batch callbacks only fire at the outermost query level
(checked inside FireAfterTriggerBatchCallbacks), so nested
queries from SPI inside other AFTER triggers do not tear down
the cache mid-batch.
- XactCallback: ri_FastPathXactCallback NULLs the static cache
pointer at transaction end, handling the abort path where the
batch callback never fired.
- SubXactCallback: ri_FastPathSubXactCallback NULLs the static
cache pointer on subtransaction abort, preventing the batch
callback from accessing already-released resources.
- AfterTriggerBatchIsActive(): Exported accessor that returns true
when afterTriggers.query_depth >= 0. During ALTER TABLE ... ADD
FOREIGN KEY validation, RI triggers are called directly outside
the after-trigger framework, so batch callbacks would never fire.
The fast-path code uses this to fall back to the non-cached
per-invocation path in that context.
Together with <commit-hash-0001>, bulk FK inserts are ~2.9x faster
(int PK / int FK, 1M rows, PK table and index cached in memory).
Author: Amit Langote <[email protected]>
Co-authored-by: Junwang Zhao <[email protected]>
Reviewed-by: Haibo Yan <[email protected]>
Tested-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
---
src/backend/commands/trigger.c | 105 ++++
src/backend/utils/adt/ri_triggers.c | 583 +++++++++++++++++++++-
src/include/commands/trigger.h | 18 +
src/test/regress/expected/foreign_key.out | 126 +++++
src/test/regress/sql/foreign_key.sql | 118 +++++
src/tools/pgindent/typedefs.list | 3 +
6 files changed, 949 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 6596843a8d8..4bc31cabff2 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3893,6 +3893,8 @@ typedef struct AfterTriggersData
/* per-subtransaction-level data: */
AfterTriggersTransData *trans_stack; /* array of structs shown below */
int maxtransdepth; /* allocated len of above array */
+
+ List *batch_callbacks; /* List of AfterTriggerCallbackItem */
} AfterTriggersData;
struct AfterTriggersQueryData
@@ -3929,6 +3931,13 @@ struct AfterTriggersTableData
TupleTableSlot *storeslot; /* for converting to tuplestore's format */
};
+/* Entry in afterTriggers.batch_callbacks */
+typedef struct AfterTriggerCallbackItem
+{
+ AfterTriggerBatchCallback callback;
+ void *arg;
+} AfterTriggerCallbackItem;
+
static AfterTriggersData afterTriggers;
static void AfterTriggerExecute(EState *estate,
@@ -3964,6 +3973,7 @@ static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
Oid tgoid, bool tgisdeferred);
static void cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent);
+static void FireAfterTriggerBatchCallbacks(void);
/*
* Get the FDW tuplestore for the current trigger query level, creating it
@@ -5089,6 +5099,7 @@ AfterTriggerBeginXact(void)
*/
afterTriggers.firing_counter = (CommandId) 1; /* mustn't be 0 */
afterTriggers.query_depth = -1;
+ afterTriggers.batch_callbacks = NIL;
/*
* Verify that there is no leftover state remaining. If these assertions
@@ -5210,6 +5221,15 @@ AfterTriggerEndQuery(EState *estate)
break;
}
+ /*
+ * Fire batch callbacks before releasing query-level storage and before
+ * decrementing query_depth. Callbacks may do real work (index probes,
+ * error reporting) and rely on query_depth still reflecting the current
+ * batch level so that nested calls from SPI inside AFTER triggers are
+ * correctly suppressed by FireAfterTriggerBatchCallbacks's depth guard.
+ */
+ FireAfterTriggerBatchCallbacks();
+
/* Release query-level-local storage, including tuplestores if any */
AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
@@ -5317,6 +5337,9 @@ AfterTriggerFireDeferred(void)
break; /* all fired */
}
+ /* Flush any fast-path batches accumulated by the triggers just fired. */
+ FireAfterTriggerBatchCallbacks();
+
/*
* We don't bother freeing the event list, since it will go away anyway
* (and more efficiently than via pfree) in AfterTriggerEndXact.
@@ -6059,6 +6082,11 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
break; /* all fired */
}
+ /*
+ * Flush any fast-path batches accumulated by the triggers just fired.
+ */
+ FireAfterTriggerBatchCallbacks();
+
if (snapshot_set)
PopActiveSnapshot();
}
@@ -6755,3 +6783,80 @@ check_modified_virtual_generated(TupleDesc tupdesc, HeapTuple tuple)
return tuple;
}
+
+/*
+ * RegisterAfterTriggerBatchCallback
+ * Register a function to be called when the current trigger-firing
+ * batch completes.
+ *
+ * Must be called from within a trigger function's execution context
+ * (i.e., while afterTriggers state is active).
+ *
+ * The callback list is cleared after invocation, so the caller must
+ * re-register for each new batch if needed.
+ */
+void
+RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg)
+{
+ AfterTriggerCallbackItem *item;
+ MemoryContext oldcxt;
+
+ /*
+ * Allocate in TopTransactionContext so the item survives for the duration
+ * of the batch, which may span multiple trigger invocations.
+ *
+ * Must be called while afterTriggers is active (query_depth >= 0);
+ * callbacks registered outside a trigger-firing context would never fire.
+ */
+ Assert(afterTriggers.query_depth >= 0);
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ item = palloc(sizeof(AfterTriggerCallbackItem));
+ item->callback = callback;
+ item->arg = arg;
+ afterTriggers.batch_callbacks =
+ lappend(afterTriggers.batch_callbacks, item);
+ MemoryContextSwitchTo(oldcxt);
+}
+
+/*
+ * FireAfterTriggerBatchCallbacks
+ * Invoke and clear all registered batch callbacks.
+ *
+ * Only fires at the outermost query level (query_depth == 0) or from
+ * top-level operations (query_depth == -1, e.g. AfterTriggerFireDeferred
+ * at COMMIT). Nested queries from SPI inside AFTER triggers run at
+ * depth > 0 and must not tear down resources the outer batch still needs.
+ */
+static void
+FireAfterTriggerBatchCallbacks(void)
+{
+ ListCell *lc;
+
+ if (afterTriggers.query_depth > 0)
+ return;
+
+ foreach(lc, afterTriggers.batch_callbacks)
+ {
+ AfterTriggerCallbackItem *item = lfirst(lc);
+
+ item->callback(item->arg);
+ }
+
+ list_free_deep(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = NIL;
+}
+
+/*
+ * AfterTriggerBatchIsActive
+ * Returns true if we're inside a query-level trigger batch where
+ * registered batch callbacks will actually be invoked.
+ *
+ * This is false during validateForeignKeyConstraint(), which calls
+ * RI trigger functions directly outside the after-trigger framework.
+ */
+bool
+AfterTriggerBatchIsActive(void)
+{
+ return afterTriggers.query_depth >= 0;
+}
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 18d489d790d..1661a3ea5c7 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -196,6 +196,44 @@ typedef struct RI_CompareHashEntry
FmgrInfo cast_func_finfo; /* in case we must coerce input */
} RI_CompareHashEntry;
+/*
+ * Maximum number of FK rows buffered before flushing.
+ *
+ * Larger batches amortize per-flush overhead and let the SK_SEARCHARRAY
+ * path walk more leaf pages in a single sorted traversal. But each
+ * buffered row is a materialized HeapTuple in flush_cxt, and the matched[]
+ * scan in ri_FastPathFlushArray() is O(batch_size) per index match.
+ * Benchmarking showed little difference between 16 and 64, with 256
+ * consistently slower. 64 is a reasonable default.
+ */
+#define RI_FASTPATH_BATCH_SIZE 64
+
+/*
+ * RI_FastPathEntry
+ * Per-constraint cache of resources needed by ri_FastPathBatchFlush().
+ *
+ * One entry per constraint, keyed by pg_constraint OID. Created lazily
+ * by ri_FastPathGetEntry() on first use within a trigger-firing batch
+ * and torn down by ri_FastPathTeardown() at batch end.
+ *
+ * FK tuples are buffered in batch[] across trigger invocations and
+ * flushed when the buffer fills or the batch ends.
+ */
+typedef struct RI_FastPathEntry
+{
+ Oid conoid; /* hash key: pg_constraint OID */
+ Relation pk_rel;
+ Relation idx_rel;
+ TupleTableSlot *pk_slot;
+ TupleTableSlot *fk_slot;
+ MemoryContext flush_cxt; /* short-lived context for per-flush work */
+
+ HeapTuple batch[RI_FASTPATH_BATCH_SIZE];
+ int batch_count;
+
+ /* For ri_FastPathEndBatch() */
+ const RI_ConstraintInfo *riinfo;
+} RI_FastPathEntry;
/*
* Local data
@@ -205,6 +243,8 @@ static HTAB *ri_query_cache = NULL;
static HTAB *ri_compare_cache = NULL;
static dclist_head ri_constraint_cache_valid_list;
+static HTAB *ri_fastpath_cache = NULL;
+static bool ri_fastpath_callback_registered = false;
/*
* Local function prototypes
@@ -255,6 +295,16 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
bool detectNewRows, int expect_OK);
static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static int ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc);
+static int ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc);
+static void ri_FastPathBatchFlush(RI_FastPathEntry *fpentry,
+ Relation fk_rel);
static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
IndexScanDesc scandesc, TupleTableSlot *slot,
Snapshot snapshot, const RI_ConstraintInfo *riinfo,
@@ -277,6 +327,10 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool is_restrict, bool partgone);
+static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel);
+static void ri_FastPathEndBatch(void *arg);
+static void ri_FastPathTeardown(void);
/*
@@ -387,12 +441,22 @@ RI_FKey_check(TriggerData *trigdata)
* lock. This is semantically equivalent to the SPI path below but avoids
* the per-row executor overhead.
*
- * ri_FastPathCheck() reports the violation itself (via ereport) if no
- * matching PK row is found, so it only returns on success.
+ * ri_FastPathBatchAdd() and ri_FastPathCheck() report the violation
+ * themselves if no matching PK row is found, so they only return on
+ * success.
*/
if (ri_fastpath_is_applicable(riinfo))
{
- ri_FastPathCheck(riinfo, fk_rel, newslot);
+ if (AfterTriggerBatchIsActive())
+ {
+ /* Batched path: buffer and probe in groups */
+ ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
+ }
+ else
+ {
+ /* ALTER TABLE validation: per-row, no cache */
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
+ }
return PointerGetDatum(NULL);
}
@@ -2681,10 +2745,14 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/*
* ri_FastPathCheck
- * Perform FK existence check via direct index probe, bypassing SPI.
+ * Perform per-row FK existence check via direct index probe,
+ * bypassing SPI.
*
* If no matching PK row exists, report the violation via ri_ReportViolation(),
* otherwise, the function returns normally.
+ *
+ * Note: This is only used by the ALTER TABLE validation path. Other paths use
+ * ri_FastPathBatchAdd().
*/
static void
ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
@@ -2748,6 +2816,312 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
table_close(pk_rel, NoLock);
}
+/*
+ * ri_FastPathBatchAdd
+ * Buffer a FK row for batched probing.
+ *
+ * Adds the row to the batch buffer. When the buffer is full, flushes all
+ * buffered rows by probing the PK index. Any violation is reported
+ * immediately during the flush via ri_ReportViolation (which does not return).
+ *
+ * Uses the per-batch cache (RI_FastPathEntry) to avoid per-row relation
+ * open/close, slot creation, etc.
+ *
+ * The batch is also flushed at the end of the trigger-firing cycle via
+ * ri_FastPathEndBatch().
+ */
+static void
+ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
+ MemoryContext oldcxt;
+
+ oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
+ fpentry->batch[fpentry->batch_count] =
+ ExecCopySlotHeapTuple(newslot);
+ fpentry->batch_count++;
+ MemoryContextSwitchTo(oldcxt);
+
+ if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
+ ri_FastPathBatchFlush(fpentry, fk_rel);
+}
+
+/*
+ * ri_FastPathBatchFlush
+ * Flush all buffered FK rows by probing the PK index.
+ *
+ * Dispatches to ri_FastPathFlushArray() for single-column FKs
+ * (using SK_SEARCHARRAY) or ri_FastPathFlushLoop() for multi-column
+ * FKs (per-row probing). Violations are reported immediately via
+ * ri_ReportViolation(), which does not return.
+ */
+static void
+ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
+{
+ const RI_ConstraintInfo *riinfo = fpentry->riinfo;
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *fk_slot = fpentry->fk_slot;
+ Snapshot snapshot;
+ IndexScanDesc scandesc;
+ Oid saved_userid;
+ int saved_sec_context;
+ MemoryContext oldcxt;
+ int violation_index;
+
+ if (fpentry->batch_count == 0)
+ return;
+
+ if (riinfo->fpmeta == NULL)
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ Assert(riinfo->fpmeta);
+
+ /*
+ * CCI and security context switch are done once for the entire batch.
+ * Per-row CCI is unnecessary because by the time a flush runs, all AFTER
+ * triggers for the buffered rows have already fired (trigger invocations
+ * strictly alternate per row), so a single CCI advances past all their
+ * effects. Per-row security context switch is unnecessary because each
+ * row's probe runs entirely as the PK table owner, same as the SPI path
+ * -- the only difference is that the SPI path sets and restores the
+ * context per row whereas we do it once around the whole batch.
+ */
+ CommandCounterIncrement();
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+ /*
+ * build_index_scankeys() may palloc cast results for cross-type FKs. Use
+ * the entry's short-lived flush context so these don't accumulate across
+ * batches.
+ */
+ oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
+
+ scandesc = index_beginscan(pk_rel, idx_rel, snapshot, NULL,
+ riinfo->nkeys, 0);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ if (riinfo->nkeys == 1)
+ violation_index = ri_FastPathFlushArray(fpentry, fk_slot, riinfo,
+ fk_rel, snapshot, scandesc);
+ else
+ violation_index = ri_FastPathFlushLoop(fpentry, fk_slot, riinfo,
+ fk_rel, snapshot, scandesc);
+
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ UnregisterSnapshot(snapshot);
+ index_endscan(scandesc);
+
+ if (violation_index >= 0)
+ {
+ ExecStoreHeapTuple(fpentry->batch[violation_index], fk_slot, false);
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ fk_slot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
+
+ MemoryContextReset(fpentry->flush_cxt);
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Reset. */
+ fpentry->batch_count = 0;
+}
+
+/*
+ * ri_FastPathFlushLoop
+ * Multi-column fallback: probe the index once per buffered row.
+ *
+ * Used for composite foreign keys where SK_SEARCHARRAY does not
+ * apply.
+ *
+ * Returns the index of the first violating row in the batch array, or -1 if
+ * all rows are valid.
+ */
+static int
+ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc)
+{
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found = true;
+
+ for (int i = 0; i < fpentry->batch_count; i++)
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, pk_slot,
+ snapshot, riinfo, skey, riinfo->nkeys);
+
+ /* Report first unmatched row */
+ if (!found)
+ return i;
+ }
+
+ /* All pass. */
+ return -1;
+}
+
+/*
+ * ri_FastPathFlushArray
+ * Single-column fast path using SK_SEARCHARRAY.
+ *
+ * Builds an array of FK values and does one index scan with
+ * SK_SEARCHARRAY. The index AM sorts and deduplicates the array
+ * internally, then walks matching leaf pages in order. Each
+ * matched PK tuple is locked and rechecked as before; a matched[]
+ * bool array tracks which batch items were satisfied.
+ *
+ * Returns the index of the first violating row in the batch array, or -1 if
+ * all rows are valid.
+ */
+static int
+ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc)
+{
+ FastPathMeta *fpmeta = riinfo->fpmeta;
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Datum search_vals[RI_FASTPATH_BATCH_SIZE];
+ bool matched[RI_FASTPATH_BATCH_SIZE];
+ int nvals = fpentry->batch_count;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[1];
+ RI_CompareHashEntry *entry;
+ Oid elem_type;
+ int16 elem_len;
+ bool elem_byval;
+ char elem_align;
+ ArrayType *arr;
+
+ Assert(fpmeta);
+
+ memset(matched, 0, nvals * sizeof(bool));
+
+ /*
+ * Extract FK values, casting to the operator's expected input type if
+ * needed (e.g. int8 FK -> numeric when the PK column is numeric).
+ */
+ entry = fpmeta->compare_entries[0];
+ for (int i = 0; i < nvals; i++)
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+
+ /* Cast if needed (e.g. int8 FK -> numeric PK) */
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ search_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[0],
+ Int32GetDatum(-1),
+ BoolGetDatum(false));
+ else
+ search_vals[i] = pk_vals[0];
+ }
+
+ /*
+ * Array element type must match the operator's right-hand input type,
+ * which is what the index comparison expects on the search side.
+ * ri_populate_fastpath_metadata() stores exactly this via
+ * get_op_opfamily_properties(), which returns the operator's right-hand
+ * type as the subtype for cross-type operators (e.g. int8 for int48eq)
+ * and the common type for same-type operators.
+ */
+ elem_type = fpmeta->subtypes[0];
+ Assert(OidIsValid(elem_type));
+ get_typlenbyvalalign(elem_type, &elem_len, &elem_byval, &elem_align);
+
+ arr = construct_array(search_vals, nvals,
+ elem_type, elem_len, elem_byval, elem_align);
+
+ /*
+ * Build scan key with SK_SEARCHARRAY. The index AM code will internally
+ * sort and deduplicate, then walk leaf pages in order.
+ */
+ ScanKeyEntryInitialize(&skey[0],
+ SK_SEARCHARRAY,
+ 1, /* attno */
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ PointerGetDatum(arr));
+
+ index_rescan(scandesc, skey, 1, NULL, 0);
+
+ /*
+ * Walk all matches. The index AM returns them in index order. For each
+ * match, find which batch item(s) it satisfies.
+ */
+ while (index_getnext_slot(scandesc, ForwardScanDirection, pk_slot))
+ {
+ Datum found_val;
+ bool found_null;
+ bool concurrently_updated;
+ ScanKeyData recheck_skey[1];
+
+ if (!ri_LockPKTuple(pk_rel, pk_slot, snapshot, &concurrently_updated))
+ continue;
+
+ /* Extract the PK value from the matched and locked tuple */
+ found_val = slot_getattr(pk_slot, riinfo->pk_attnums[0], &found_null);
+ Assert(!found_null);
+
+ if (concurrently_updated)
+ {
+ /*
+ * Build a single-key scankey for recheck. We need the actual PK
+ * value that was found, not the FK search value.
+ */
+ ScanKeyEntryInitialize(&recheck_skey[0], 0, 1,
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ found_val);
+ if (!recheck_matched_pk_tuple(idx_rel, recheck_skey, pk_slot))
+ continue;
+ }
+
+ /*
+ * Linear scan to mark all batch items matching this PK value.
+ * O(batch_size) per match, O(batch_size^2) worst case -- fine for the
+ * current batch size of 64.
+ */
+ for (int i = 0; i < nvals; i++)
+ {
+ if (!matched[i] &&
+ DatumGetBool(FunctionCall2Coll(&entry->eq_opr_finfo,
+ idx_rel->rd_indcollation[0],
+ found_val,
+ search_vals[i])))
+ matched[i] = true;
+ }
+ }
+
+ /* Report first unmatched row */
+ for (int i = 0; i < nvals; i++)
+ if (!matched[i])
+ return i;
+
+ /* All pass. */
+ return -1;
+}
+
/*
* ri_FastPathProbeOne
* Probe the PK index for one set of scan keys, lock the matching
@@ -3675,3 +4049,204 @@ RI_FKey_trigger_type(Oid tgfoid)
return RI_TRIGGER_NONE;
}
+
+/*
+ * ri_FastPathEndBatch
+ * Flush remaining rows and tear down cached state.
+ *
+ * Registered as an AfterTriggerBatchCallback. Note: the flush can
+ * do real work (CCI, security context switch, index probes) and can
+ * throw ERROR on a constraint violation. If that happens,
+ * ri_FastPathTeardown never runs; ResourceOwner + XactCallback
+ * handle resource cleanup on the abort path.
+ */
+static void
+ri_FastPathEndBatch(void *arg)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ /* Flush any partial batches -- can throw ERROR */
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->batch_count > 0)
+ {
+ Relation fk_rel = table_open(entry->riinfo->fk_relid,
+ AccessShareLock);
+
+ ri_FastPathBatchFlush(entry, fk_rel);
+ table_close(fk_rel, NoLock);
+ }
+ }
+
+ /* Orderly teardown */
+ ri_FastPathTeardown();
+}
+
+/*
+ * ri_FastPathTeardown
+ * Tear down all cached fast-path state.
+ *
+ * Called from ri_FastPathEndBatch() after flushing any remaining rows.
+ */
+static void
+ri_FastPathTeardown(void)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->idx_rel)
+ index_close(entry->idx_rel, NoLock);
+ if (entry->pk_rel)
+ table_close(entry->pk_rel, NoLock);
+ if (entry->pk_slot)
+ ExecDropSingleTupleTableSlot(entry->pk_slot);
+ if (entry->fk_slot)
+ ExecDropSingleTupleTableSlot(entry->fk_slot);
+ if (entry->flush_cxt)
+ MemoryContextDelete(entry->flush_cxt);
+ }
+
+ hash_destroy(ri_fastpath_cache);
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static bool ri_fastpath_xact_callback_registered = false;
+
+static void
+ri_FastPathXactCallback(XactEvent event, void *arg)
+{
+ /*
+ * On abort, ResourceOwner already released relations; on commit,
+ * ri_FastPathTeardown already ran. Either way, just NULL the static
+ * pointers so they don't dangle into the next transaction.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static void
+ri_FastPathSubXactCallback(SubXactEvent event, SubTransactionId mySubid,
+ SubTransactionId parentSubid, void *arg)
+{
+ if (event == SUBXACT_EVENT_ABORT_SUB)
+ {
+ /*
+ * ResourceOwner already released relations. NULL the static pointers
+ * so the still-registered batch callback becomes a no-op for the rest
+ * of this transaction.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+ }
+}
+
+/*
+ * ri_FastPathGetEntry
+ * Look up or create a per-batch cache entry for the given constraint.
+ *
+ * On first call for a constraint within a batch: opens pk_rel and the index,
+ * creates the PK/FK slots and flush context, and registers the cleanup callback.
+ *
+ * On subsequent calls: returns the existing entry.
+ */
+static RI_FastPathEntry *
+ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+ RI_FastPathEntry *entry;
+ bool found;
+
+ /* Create hash table on first use in this batch */
+ if (ri_fastpath_cache == NULL)
+ {
+ HASHCTL ctl;
+
+ if (!ri_fastpath_xact_callback_registered)
+ {
+ RegisterXactCallback(ri_FastPathXactCallback, NULL);
+ RegisterSubXactCallback(ri_FastPathSubXactCallback, NULL);
+ ri_fastpath_xact_callback_registered = true;
+ }
+
+ ctl.keysize = sizeof(Oid);
+ ctl.entrysize = sizeof(RI_FastPathEntry);
+ ctl.hcxt = TopTransactionContext;
+ ri_fastpath_cache = hash_create("RI fast-path cache",
+ 16,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ }
+
+ entry = hash_search(ri_fastpath_cache, &riinfo->constraint_id,
+ HASH_ENTER, &found);
+
+ if (!found)
+ {
+ MemoryContext oldcxt;
+ Oid saved_userid;
+ int saved_sec_context;
+
+ /*
+ * Zero out non-key fields so ri_FastPathTeardown is safe if we error
+ * out during partial initialization below.
+ */
+ memset(((char *) entry) + offsetof(RI_FastPathEntry, pk_rel), 0,
+ sizeof(RI_FastPathEntry) - offsetof(RI_FastPathEntry, pk_rel));
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+
+ /*
+ * Open PK table and its unique index.
+ *
+ * RowShareLock on pk_rel matches what the SPI path's SELECT ... FOR
+ * KEY SHARE would acquire as a relation-level lock. AccessShareLock
+ * on the index is standard for index scans.
+ *
+ * We don't release these locks until end of transaction, matching SPI
+ * behavior.
+ */
+ entry->pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ entry->idx_rel = index_open(riinfo->conindid, AccessShareLock);
+
+ entry->pk_slot = table_slot_create(entry->pk_rel, NULL);
+ entry->fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
+ &TTSOpsHeapTuple);
+
+ entry->flush_cxt = AllocSetContextCreate(TopTransactionContext,
+ "RI fast path flush temporary context",
+ ALLOCSET_SMALL_SIZES);
+
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Ensure cleanup at end of this trigger-firing batch */
+ if (!ri_fastpath_callback_registered)
+ {
+ RegisterAfterTriggerBatchCallback(ri_FastPathEndBatch, NULL);
+ ri_fastpath_callback_registered = true;
+ }
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(entry->pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ ri_CheckPermissions(entry->pk_rel);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+ /* For ri_FastPathEndBatch() */
+ entry->riinfo = riinfo;
+ }
+
+ return entry;
+}
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 27af5284406..7664298f5c8 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -289,4 +289,22 @@ extern void RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel,
extern int RI_FKey_trigger_type(Oid tgfoid);
+/*
+ * Callback type for end-of-trigger-batch notifications.
+ *
+ * Registered via RegisterAfterTriggerBatchCallback(). Invoked when
+ * a batch of after-trigger processing completes:
+ * - AfterTriggerEndQuery() (immediate constraints)
+ * - AfterTriggerFireDeferred() (deferred constraints at COMMIT)
+ * - AfterTriggerSetState() (SET CONSTRAINTS IMMEDIATE)
+ *
+ * The callback list is cleared after each batch. Callers must
+ * re-register if they need to be called again in a subsequent batch.
+ */
+typedef void (*AfterTriggerBatchCallback) (void *arg);
+
+extern void RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg);
+extern bool AfterTriggerBatchIsActive(void);
+
#endif /* TRIGGER_H */
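
For readers following along, the "cleared after each batch, re-register if
needed" contract described in the trigger.h comment above can be modelled
with a small standalone sketch. Everything here is hypothetical
(register_batch_callback(), fire_batch_callbacks(), MAX_CALLBACKS and
count_call() are made-up names for illustration) and it says nothing about
how the patch actually stores its callback list; it only shows the one-shot
behaviour a caller should expect.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLBACKS 8         /* hypothetical fixed capacity for this sketch */

typedef void (*BatchCallback) (void *arg);

typedef struct CallbackItem
{
    BatchCallback fn;
    void       *arg;
} CallbackItem;

static CallbackItem callbacks[MAX_CALLBACKS];
static int  ncallbacks = 0;

/* Remember a callback to be invoked at the end of the current batch. */
static void
register_batch_callback(BatchCallback fn, void *arg)
{
    assert(ncallbacks < MAX_CALLBACKS);
    callbacks[ncallbacks].fn = fn;
    callbacks[ncallbacks].arg = arg;
    ncallbacks++;
}

/*
 * End of batch: invoke every registered callback once, then clear the
 * list.  A callback that wants to run for the next batch as well has to
 * be registered again by its owner.
 */
static void
fire_batch_callbacks(void)
{
    for (int i = 0; i < ncallbacks; i++)
        callbacks[i].fn(callbacks[i].arg);
    ncallbacks = 0;
}

/* Example callback: bump the counter passed through arg. */
static void
count_call(void *arg)
{
    (*(int *) arg)++;
}
```

Registering count_call once and firing twice bumps the counter only once,
which is why a caller that keeps its own "already registered" flag has to
reset that flag when the batch ends.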
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 0826f518004..2179d2a8e8f 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3504,3 +3504,129 @@ DETAIL: drop cascades to table fkpart13_t1
drop cascades to table fkpart13_t2
drop cascades to table fkpart13_t3
RESET search_path;
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+ERROR: insert or update on table "fp_fk_alter" violates foreign key constraint "fp_fk_alter_a_fkey"
+DETAIL: Key (a)=(101) is not present in table "fp_pk_alter".
+DROP TABLE fp_fk_alter, fp_pk_alter;
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+ERROR: insert or update on table "fp_fk_alter2" violates foreign key constraint "fp_fk_alter2_a_fkey"
+DETAIL: Key (a)=(200) is not present in table "fp_pk_alter2".
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+ERROR: insert or update on table "fp_multi_fk" violates foreign key constraint "fp_multi_fk_b_fkey"
+DETAIL: Key (b)=(2) is not present in table "fp_pk2".
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+ERROR: insert or update on table "fp_fk_defer" violates foreign key constraint "fp_fk_defer_a_fkey"
+DETAIL: Key (a)=(3) is not present in table "fp_pk_defer".
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+ a
+---
+ 1
+ 1
+(2 rows)
+
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+ERROR: insert or update on table "fp_fk_multi" violates foreign key constraint "fp_fk_multi_a_b_fkey"
+DETAIL: Key (a, b)=(999, 999) is not present in table "fp_pk_multi".
+DROP TABLE fp_fk_multi, fp_pk_multi;
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+ERROR: insert or update on table "fp_fk_commit" violates foreign key constraint "fp_fk_commit_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_commit".
+DROP TABLE fp_fk_commit, fp_pk_commit;
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+ERROR: insert or update on table "fp_fk_cross" violates foreign key constraint "fp_fk_cross_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_cross".
+DROP TABLE fp_fk_cross, fp_pk_cross;
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index e9ee29331cb..7a729486bc2 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2498,3 +2498,121 @@ WITH cte AS (
DROP SCHEMA fkpart13 CASCADE;
RESET search_path;
+
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+DROP TABLE fp_fk_alter, fp_pk_alter;
+
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
+
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+DROP TABLE fp_fk_multi, fp_pk_multi;
+
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+DROP TABLE fp_fk_commit, fp_pk_commit;
+
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+DROP TABLE fp_fk_cross, fp_pk_cross;
+
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 66cb18ba5b9..2a5d9387faf 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -30,6 +30,8 @@ AddForeignUpdateTargets_function
AddrInfo
AffixNode
AffixNodeData
+AfterTriggerBatchCallback
+AfterTriggerCallbackItem
AfterTriggerEvent
AfterTriggerEventChunk
AfterTriggerEventData
@@ -2485,6 +2487,7 @@ RIX
RI_CompareHashEntry
RI_CompareKey
RI_ConstraintInfo
+RI_FastPathEntry
RI_QueryHashEntry
RI_QueryKey
RTEKind
--
2.47.3
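
As an aside, the single-column batched flush in the patch above (build an
array of FK values, let the index side sort and deduplicate it, then mark
which batch items were satisfied) can be pictured with a small standalone
model. This is only a sketch under stated assumptions: a sorted int array
stands in for the PK's unique index, bsearch() stands in for the index
probe, and probe_batch(), cmp_int() and BATCH_SIZE are hypothetical names.
None of it is PostgreSQL code, and locking and rechecking are left out
entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define BATCH_SIZE 64           /* hypothetical; mirrors the batch size the
                                 * patch comment mentions */

/* qsort/bsearch comparator for ints */
static int
cmp_int(const void *a, const void *b)
{
    int         x = *(const int *) a;
    int         y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Check every buffered FK value against a sorted array of PK values.
 * Returns the position of the first FK value with no match, or -1 if
 * all values have a match.
 */
static int
probe_batch(const int *pk_sorted, size_t npk, const int *fk_vals, int nvals)
{
    int         keys[BATCH_SIZE];
    bool        matched[BATCH_SIZE];
    int         nkeys = 0;

    assert(nvals >= 0 && nvals <= BATCH_SIZE);
    memset(matched, 0, sizeof(matched));

    /* Sort and deduplicate the search keys. */
    memcpy(keys, fk_vals, (size_t) nvals * sizeof(int));
    qsort(keys, (size_t) nvals, sizeof(int), cmp_int);
    for (int i = 0; i < nvals; i++)
        if (nkeys == 0 || keys[nkeys - 1] != keys[i])
            keys[nkeys++] = keys[i];

    /* One probe per distinct key; a hit marks every batch item with that value. */
    for (int k = 0; k < nkeys; k++)
    {
        if (npk == 0 ||
            bsearch(&keys[k], pk_sorted, npk, sizeof(int), cmp_int) == NULL)
            continue;
        for (int i = 0; i < nvals; i++)
            if (fk_vals[i] == keys[k])
                matched[i] = true;
    }

    /* Report the first unmatched item in original batch order. */
    for (int i = 0; i < nvals; i++)
        if (!matched[i])
            return i;
    return -1;
}
```

The point of the inner marking loop is the duplicate-value case exercised by
the fp_fk_dup test: one PK hit has to satisfy every buffered row carrying
that value, not just the first one.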
[application/octet-stream] v11-0001-Add-fast-path-for-foreign-key-constraint-checks.patch (30.9K, 3-v11-0001-Add-fast-path-for-foreign-key-constraint-checks.patch)
From 02b08afd3b64379dc08eab1fbdc85c8b36ff81bc Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 24 Mar 2026 18:28:00 +0900
Subject: [PATCH v11 1/2] Add fast path for foreign key constraint checks
Add a fast-path optimization for foreign key checks that bypasses SPI
by directly probing the unique index on the referenced table.
The fast path applies when the referenced table is not partitioned and
the constraint does not involve temporal semantics. Otherwise, the
existing SPI path is used.
ri_FastPathCheck() extracts the FK values, builds scan keys, performs
an index scan, and locks the matching tuple with LockTupleKeyShare
via ri_LockPKTuple(), which handles the RI-specific subset of
table_tuple_lock() results.
If the locked tuple was reached by chasing an update chain
(tmfd.traversed), recheck_matched_pk_tuple() verifies that the key
is still the same, emulating EvalPlanQual.
The scan uses GetTransactionSnapshot(), matching what the SPI path
uses (via _SPI_execute_plan pushing GetTransactionSnapshot() as the
active snapshot). Under READ COMMITTED this is a fresh snapshot;
under REPEATABLE READ / SERIALIZABLE it is the frozen transaction-
start snapshot, so PK rows committed after the transaction started
are not visible.
The ri_CheckPermissions() function performs schema USAGE and table
SELECT checks, matching what the SPI path gets implicitly through
the executor's permission checks. The fast path also switches to
the PK table owner's security context (with SECURITY_NOFORCE_RLS)
before the index probe, matching the SPI path where the query runs
as the table owner.
ri_HashCompareOp() is adjusted to handle cross-type equality operators
(e.g. int48eq for int4 PK / int8 FK) which can appear in conpfeqop.
The existing code asserted same-type operators only, which was correct
for its existing callers (ri_KeysEqual compares same-type FK column
values via ff_eq_oprs), but the fast path is the first caller to pass
pf_eq_oprs, which can be cross-type.
Per-key metadata (compare entries, operator procedures, strategy
numbers) is cached in RI_ConstraintInfo via
ri_populate_fastpath_metadata() on first use, eliminating repeated
calls to ri_HashCompareOp() and get_op_opfamily_properties().
conindid and pk_is_partitioned are also cached at constraint load
time, avoiding per-invocation syscache lookups and the need to open
pk_rel before deciding whether the fast path applies.
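
The populate-on-first-use caching described above might look roughly like
this in isolation. It is a sketch only: ConstraintInfo, KeyMeta, get_meta()
and populate_meta() are hypothetical stand-ins rather than the patch's
RI_ConstraintInfo / FastPathMeta code, and the value 3 is just a
placeholder for a looked-up strategy number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_KEYS 4              /* hypothetical upper bound on key columns */

typedef struct KeyMeta
{
    int         strategy[MAX_KEYS];     /* per-key data that is costly to look up */
} KeyMeta;

typedef struct ConstraintInfo
{
    int         nkeys;
    KeyMeta    *meta;           /* NULL until first use */
} ConstraintInfo;

static int  populate_calls = 0; /* counts how often the expensive path runs */

/* Expensive step: in the real code this would involve catalog lookups. */
static void
populate_meta(ConstraintInfo *info)
{
    KeyMeta    *meta = malloc(sizeof(KeyMeta));

    if (meta == NULL)
        abort();
    assert(info->nkeys <= MAX_KEYS);
    for (int i = 0; i < MAX_KEYS; i++)
        meta->strategy[i] = (i < info->nkeys) ? 3 : 0;
    info->meta = meta;
    populate_calls++;
}

/* Return the cached metadata, building it on the first call only. */
static const KeyMeta *
get_meta(ConstraintInfo *info)
{
    if (info->meta == NULL)
        populate_meta(info);
    return info->meta;
}
```

Repeated get_meta() calls on the same ConstraintInfo return the same pointer
and run populate_meta() once, which is the per-row saving the paragraph
above is after.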
New regression tests cover RLS bypass and ACL enforcement for the
fast-path permission checks. New isolation tests exercise concurrent
PK updates under both READ COMMITTED and REPEATABLE READ.
Benchmarking shows ~1.8x speedup for bulk FK inserts (int PK/int FK,
1M rows, with the PK table and index cached).
Author: Junwang Zhao <[email protected]>
Co-authored-by: Amit Langote <[email protected]>
Reviewed-by: Haibo Yan <[email protected]>
Tested-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
---
src/backend/utils/adt/ri_triggers.c | 465 +++++++++++++++++-
.../expected/fk-concurrent-pk-upd.out | 105 ++++
src/test/isolation/isolation_schedule | 1 +
.../isolation/specs/fk-concurrent-pk-upd.spec | 53 ++
src/test/regress/expected/foreign_key.out | 47 ++
src/test/regress/sql/foreign_key.sql | 64 +++
src/tools/pgindent/typedefs.list | 1 +
7 files changed, 722 insertions(+), 14 deletions(-)
create mode 100644 src/test/isolation/expected/fk-concurrent-pk-upd.out
create mode 100644 src/test/isolation/specs/fk-concurrent-pk-upd.spec
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 6230a2ea9ad..18d489d790d 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -24,12 +24,15 @@
#include "postgres.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/index.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
+#include "catalog/pg_namespace.h"
#include "commands/trigger.h"
#include "executor/executor.h"
#include "executor/spi.h"
@@ -91,6 +94,7 @@
#define RI_TRIGTYPE_UPDATE 2
#define RI_TRIGTYPE_DELETE 3
+typedef struct FastPathMeta FastPathMeta;
/*
* RI_ConstraintInfo
@@ -132,8 +136,24 @@ typedef struct RI_ConstraintInfo
Oid period_intersect_oper; /* anyrange * anyrange (or
* multiranges) */
dlist_node valid_link; /* Link in list of valid entries */
+
+ Oid conindid;
+ bool pk_is_partitioned;
+
+ FastPathMeta *fpmeta;
} RI_ConstraintInfo;
+typedef struct RI_CompareHashEntry RI_CompareHashEntry;
+
+/* Fast-path metadata for RI checks on foreign key referencing tables */
+typedef struct FastPathMeta
+{
+ RI_CompareHashEntry *compare_entries[RI_MAX_NUMKEYS];
+ RegProcedure regops[RI_MAX_NUMKEYS];
+ Oid subtypes[RI_MAX_NUMKEYS];
+ int strats[RI_MAX_NUMKEYS];
+} FastPathMeta;
+
/*
* RI_QueryKey
*
@@ -233,6 +253,23 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool is_restrict,
bool detectNewRows, int expect_OK);
+static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
+ IndexScanDesc scandesc, TupleTableSlot *slot,
+ Snapshot snapshot, const RI_ConstraintInfo *riinfo,
+ ScanKeyData *skey, int nkeys);
+static bool ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
+ bool *concurrently_updated);
+static bool ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo);
+static void ri_CheckPermissions(Relation query_rel);
+static bool recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+ TupleTableSlot *new_slot);
+static void build_index_scankeys(const RI_ConstraintInfo *riinfo,
+ Relation idx_rel, Datum *pk_vals,
+ char *pk_nulls, ScanKey skeys);
+static void ri_populate_fastpath_metadata(RI_ConstraintInfo *riinfo,
+ Relation fk_rel, Relation idx_rel);
static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
const RI_ConstraintInfo *riinfo, bool rel_is_pk,
Datum *vals, char *nulls);
@@ -276,14 +313,7 @@ RI_FKey_check(TriggerData *trigdata)
if (!table_tuple_satisfies_snapshot(trigdata->tg_relation, newslot, SnapshotSelf))
return PointerGetDatum(NULL);
- /*
- * Get the relation descriptors of the FK and PK tables.
- *
- * pk_rel is opened in RowShareLock mode since that's what our eventual
- * SELECT FOR KEY SHARE will get on it.
- */
fk_rel = trigdata->tg_relation;
- pk_rel = table_open(riinfo->pk_relid, RowShareLock);
switch (ri_NullCheck(RelationGetDescr(fk_rel), newslot, riinfo, false))
{
@@ -293,7 +323,6 @@ RI_FKey_check(TriggerData *trigdata)
* No further check needed - an all-NULL key passes every type of
* foreign key constraint.
*/
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
case RI_KEYS_SOME_NULL:
@@ -318,7 +347,6 @@ RI_FKey_check(TriggerData *trigdata)
errdetail("MATCH FULL does not allow mixing of null and nonnull key values."),
errtableconstraint(fk_rel,
NameStr(riinfo->conname))));
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
case FKCONSTR_MATCH_SIMPLE:
@@ -327,7 +355,6 @@ RI_FKey_check(TriggerData *trigdata)
* MATCH SIMPLE - if ANY column is null, the key passes
* the constraint.
*/
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
#ifdef NOT_USED
@@ -352,8 +379,31 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
+ /*
+ * Fast path: probe the PK unique index directly, bypassing SPI.
+ *
+ * For non-partitioned, non-temporal FKs, we can skip the SPI machinery
+ * (plan cache, executor setup, etc.) and do a direct index scan + tuple
+ * lock. This is semantically equivalent to the SPI path below but avoids
+ * the per-row executor overhead.
+ *
+ * ri_FastPathCheck() reports the violation itself (via ereport) if no
+ * matching PK row is found, so it only returns on success.
+ */
+ if (ri_fastpath_is_applicable(riinfo))
+ {
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
+ return PointerGetDatum(NULL);
+ }
+
SPI_connect();
+ /*
+ * pk_rel is opened in RowShareLock mode since that's what our eventual
+ * SELECT FOR KEY SHARE will get on it.
+ */
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+
/* Fetch or prepare a saved plan for the real check */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
@@ -2356,6 +2406,12 @@ ri_LoadConstraintInfo(Oid constraintOid)
riinfo->valid = true;
+ riinfo->conindid = conForm->conindid;
+ riinfo->pk_is_partitioned =
+ (get_rel_relkind(riinfo->pk_relid) == RELKIND_PARTITIONED_TABLE);
+
+ riinfo->fpmeta = NULL;
+
return riinfo;
}
@@ -2623,6 +2679,378 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
return SPI_processed != 0;
}
+/*
+ * ri_FastPathCheck
+ * Perform FK existence check via direct index probe, bypassing SPI.
+ *
+ * If no matching PK row exists, report the violation via ri_ReportViolation(),
+ * otherwise, the function returns normally.
+ */
+static void
+ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ Relation pk_rel;
+ Relation idx_rel;
+ IndexScanDesc scandesc;
+ TupleTableSlot *slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found = false;
+ Oid saved_userid;
+ int saved_sec_context;
+ Snapshot snapshot;
+
+ /*
+ * Advance the command counter so the snapshot sees the effects of prior
+ * triggers in this statement. Mirrors what the SPI path does in
+ * ri_PerformCheck().
+ */
+ CommandCounterIncrement();
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ idx_rel = index_open(riinfo->conindid, AccessShareLock);
+
+ slot = table_slot_create(pk_rel, NULL);
+ scandesc = index_beginscan(pk_rel, idx_rel,
+ snapshot, NULL,
+ riinfo->nkeys, 0);
+
+ if (riinfo->fpmeta == NULL)
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ Assert(riinfo->fpmeta);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ ri_CheckPermissions(pk_rel);
+
+ ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, slot,
+ snapshot, riinfo, skey, riinfo->nkeys);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ index_endscan(scandesc);
+ ExecDropSingleTupleTableSlot(slot);
+ UnregisterSnapshot(snapshot);
+
+ if (!found)
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ newslot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+
+ index_close(idx_rel, NoLock);
+ table_close(pk_rel, NoLock);
+}
+
+/*
+ * ri_FastPathProbeOne
+ * Probe the PK index for one set of scan keys, lock the matching
+ * tuple
+ *
+ * Returns true if a matching PK row was found, locked, and (if
+ * applicable) visible to the transaction snapshot.
+ */
+static bool
+ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
+ IndexScanDesc scandesc, TupleTableSlot *slot,
+ Snapshot snapshot, const RI_ConstraintInfo *riinfo,
+ ScanKeyData *skey, int nkeys)
+{
+ bool found = false;
+
+ index_rescan(scandesc, skey, nkeys, NULL, 0);
+
+ if (index_getnext_slot(scandesc, ForwardScanDirection, slot))
+ {
+ bool concurrently_updated;
+
+ if (ri_LockPKTuple(pk_rel, slot, snapshot,
+ &concurrently_updated))
+ {
+ if (concurrently_updated)
+ found = recheck_matched_pk_tuple(idx_rel, skey, slot);
+ else
+ found = true;
+ }
+ }
+
+ return found;
+}
+
+/*
+ * ri_LockPKTuple
+ * Lock a PK tuple found by the fast-path index scan.
+ *
+ * Calls table_tuple_lock() directly with handling specific to RI checks.
+ * Returns true if the tuple was successfully locked.
+ *
+ * Sets *concurrently_updated to true if the locked tuple was reached
+ * by following an update chain (tmfd.traversed), indicating the caller
+ * should recheck the key.
+ */
+static bool
+ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
+ bool *concurrently_updated)
+{
+ TM_FailureData tmfd;
+ TM_Result result;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+
+ *concurrently_updated = false;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ result = table_tuple_lock(pk_rel, &slot->tts_tid, snap,
+ slot, GetCurrentCommandId(false),
+ LockTupleKeyShare, LockWaitBlock,
+ lockflags, &tmfd);
+
+ switch (result)
+ {
+ case TM_Ok:
+ if (tmfd.traversed)
+ *concurrently_updated = true;
+ return true;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ return false;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+
+ /*
+ * In READ COMMITTED, FIND_LAST_VERSION should have chased the
+ * chain and returned TM_Ok. Getting here means something
+ * unexpected -- fall through to error.
+ */
+ elog(ERROR, "unexpected table_tuple_lock status: %u", result);
+ break;
+
+ case TM_SelfModified:
+
+ /*
+ * The current command or a later command in this transaction
+ * modified the PK row. This shouldn't normally happen during an
+ * FK check (we're not modifying pk_rel), but handle it safely by
+ * treating the tuple as not found.
+ */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ break;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", result);
+ break;
+ }
+
+ return false; /* keep compiler quiet */
+}
+
+static bool
+ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo)
+{
+ /*
+ * Partitioned referenced tables are skipped for simplicity, since they
+ * require routing the probe through the correct partition using
+ * PartitionDirectory.
+ */
+ if (riinfo->pk_is_partitioned)
+ return false;
+
+ /*
+ * Temporal foreign keys use range overlap and containment semantics (&&,
+ * <@, range_agg()) that inherently involve aggregation and multiple-row
+ * reasoning, so they stay on the SPI path.
+ */
+ if (riinfo->hasperiod)
+ return false;
+
+ return true;
+}
+
+/*
+ * ri_CheckPermissions
+ * Check that the current user has permissions to look into the schema of
+ * and SELECT from 'query_rel'
+ */
+static void
+ri_CheckPermissions(Relation query_rel)
+{
+ AclResult aclresult;
+
+ /* USAGE on schema. */
+ aclresult = object_aclcheck(NamespaceRelationId,
+ RelationGetNamespace(query_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(query_rel)));
+
+ /* SELECT on relation. */
+ aclresult = pg_class_aclcheck(RelationGetRelid(query_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(query_rel));
+}
+
+/*
+ * recheck_matched_pk_tuple
+ * After following an update chain (tmfd.traversed), verify that
+ * the locked PK tuple still matches the original search keys.
+ *
+ * A non-key update (e.g. changing a non-PK column) creates a new tuple version
+ * that we've now locked, but the key is unchanged -- that's fine. A key
+ * update means the value we were looking for is gone, so we should treat it as
+ * not found.
+ */
+static bool
+recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+ TupleTableSlot *new_slot)
+{
+ /*
+ * TODO: BuildIndexInfo does a syscache lookup + palloc on every call.
+ * This only fires on the concurrent-update path (tmfd.traversed), which
+ * should be rare, so the cost is acceptable for now. If profiling shows
+ * otherwise, cache the IndexInfo in FastPathMeta.
+ */
+ IndexInfo *indexInfo = BuildIndexInfo(idxrel);
+ Datum values[INDEX_MAX_KEYS];
+ bool isnull[INDEX_MAX_KEYS];
+ bool matched = true;
+
+ /* PK indexes never have these. */
+ Assert(indexInfo->ii_Expressions == NIL &&
+ indexInfo->ii_ExclusionOps == NULL);
+
+ /* Form the index values and isnull flags given the table tuple. */
+ FormIndexDatum(indexInfo, new_slot, NULL, values, isnull);
+ for (int i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
+ {
+ ScanKeyData *skey = &skeys[i];
+
+ /* A PK column can never be set to NULL. */
+ Assert(!isnull[i]);
+ if (!DatumGetBool(FunctionCall2Coll(&skey->sk_func,
+ skey->sk_collation,
+ values[i],
+ skey->sk_argument)))
+ {
+ matched = false;
+ break;
+ }
+ }
+
+ return matched;
+}
+
+/*
+ * build_index_scankeys
+ * Build ScanKeys for a direct index probe of the PK's unique index.
+ *
+ * Uses cached compare entries, operator procedures, and strategy numbers
+ * from ri_populate_fastpath_metadata() rather than looking them up on
+ * each invocation. Casts FK values to the operator's expected input
+ * type if needed.
+ */
+static void
+build_index_scankeys(const RI_ConstraintInfo *riinfo,
+ Relation idx_rel, Datum *pk_vals,
+ char *pk_nulls, ScanKey skeys)
+{
+ FastPathMeta *fpmeta = riinfo->fpmeta;
+
+ Assert(fpmeta);
+
+ /*
+ * May need to cast each of the individual values of the foreign key to
+ * the corresponding PK column's type if the equality operator demands it.
+ */
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ RI_CompareHashEntry *entry = fpmeta->compare_entries[i];
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+
+ /*
+ * Set up ScanKeys for the index scan. This is essentially how
+ * ExecIndexBuildScanKeys() sets them up.
+ */
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ int pkattrno = i + 1;
+
+ ScanKeyEntryInitialize(&skeys[i], 0, pkattrno,
+ fpmeta->strats[i], fpmeta->subtypes[i],
+ idx_rel->rd_indcollation[i], fpmeta->regops[i],
+ pk_vals[i]);
+ }
+}
+
+/*
+ * ri_populate_fastpath_metadata
+ * Cache per-key metadata needed by build_index_scankeys().
+ *
+ * Looks up the compare hash entry, operator procedure OID, and index
+ * strategy/subtype for each key column. Called lazily on first use
+ * and persists for the lifetime of the RI_ConstraintInfo entry.
+ */
+static void
+ri_populate_fastpath_metadata(RI_ConstraintInfo *riinfo,
+ Relation fk_rel, Relation idx_rel)
+{
+ FastPathMeta *fpmeta;
+ MemoryContext oldcxt = MemoryContextSwitchTo(TopMemoryContext);
+
+ Assert(riinfo != NULL && riinfo->valid);
+
+ fpmeta = palloc_object(FastPathMeta);
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ Oid eq_opr = riinfo->pf_eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ Oid lefttype;
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ fpmeta->compare_entries[i] = entry;
+ fpmeta->regops[i] = get_opcode(eq_opr);
+
+ get_op_opfamily_properties(eq_opr,
+ idx_rel->rd_opfamily[i],
+ false,
+ &fpmeta->strats[i],
+ &lefttype,
+ &fpmeta->subtypes[i]);
+ }
+
+ riinfo->fpmeta = fpmeta;
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* Extract fields from a tuple into Datum/nulls arrays
*/
@@ -3118,8 +3546,11 @@ ri_CompareWithCast(Oid eq_opr, Oid typeid, Oid collid,
/*
* ri_HashCompareOp -
*
- * See if we know how to compare two values, and create a new hash entry
- * if not.
+ * Look up or create a cache entry for the given equality operator and
+ * the caller's value type (typeid). The entry holds the operator's
+ * FmgrInfo and, if typeid doesn't match what the operator expects as
+ * its right-hand input, a cast function to coerce the value before
+ * comparison.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -3175,8 +3606,14 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * pf_eq_oprs (used by the fast path) can be cross-type when the FK
+ * and PK columns differ in type, e.g. int48eq for int4 PK / int8 FK.
+ * If the FK column's type already matches what the operator expects
+ * as its right-hand input, no cast is needed.
+ */
+ if (typeid == righttype)
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/test/isolation/expected/fk-concurrent-pk-upd.out b/src/test/isolation/expected/fk-concurrent-pk-upd.out
new file mode 100644
index 00000000000..4dd9535d3c0
--- /dev/null
+++ b/src/test/isolation/expected/fk-concurrent-pk-upd.out
@@ -0,0 +1,105 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2b s2ukey s1b s1i s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2c: COMMIT;
+step s1i: <... completed>
+ERROR: insert or update on table "child" violates foreign key constraint "child_parent_key_fkey"
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 2|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+(0 rows)
+
+
+starting permutation: s2b s2uaux s1b s1i s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2uaux: UPDATE parent SET aux = 'bar' WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1);
+step s2c: COMMIT;
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|bar
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
+
+starting permutation: s2b s2ukey s1b s1i s2ukey2 s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2ukey2: UPDATE parent SET parent_key = 1 WHERE parent_key = 2;
+step s2c: COMMIT;
+step s1i: <... completed>
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
+
+starting permutation: s2b s2ukey s3b s3i s2c s3c s2s s3s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s3b: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s3i: INSERT INTO child VALUES (2, 1); <waiting ...>
+step s2c: COMMIT;
+step s3i: <... completed>
+ERROR: could not serialize access due to concurrent update
+step s3c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 2|foo
+(1 row)
+
+step s3s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+(0 rows)
+
+
+starting permutation: s2b s2uaux s3b s3i s2c s3c s2s s3s
+step s2b: BEGIN;
+step s2uaux: UPDATE parent SET aux = 'bar' WHERE parent_key = 1;
+step s3b: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s3i: INSERT INTO child VALUES (2, 1);
+step s2c: COMMIT;
+step s3c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|bar
+(1 row)
+
+step s3s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 2| 1
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 4e466580cd4..c1a999bf1d2 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -37,6 +37,7 @@ test: fk-partitioned-2
test: fk-snapshot
test: fk-snapshot-2
test: fk-snapshot-3
+test: fk-concurrent-pk-upd
test: subxid-overflow
test: eval-plan-qual
test: eval-plan-qual-trigger
diff --git a/src/test/isolation/specs/fk-concurrent-pk-upd.spec b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
new file mode 100644
index 00000000000..03dc7f260cd
--- /dev/null
+++ b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
@@ -0,0 +1,53 @@
+# Tests that an INSERT on referencing table correctly fails when
+# the referenced value disappears due to a concurrent update
+setup
+{
+ CREATE TABLE parent (
+ parent_key int PRIMARY KEY,
+ aux text NOT NULL
+ );
+
+ CREATE TABLE child (
+ child_key int PRIMARY KEY,
+ parent_key int8 NOT NULL REFERENCES parent
+ );
+
+ INSERT INTO parent VALUES (1, 'foo');
+}
+
+teardown
+{
+ DROP TABLE parent, child;
+}
+
+session s1
+step s1b { BEGIN; }
+step s1i { INSERT INTO child VALUES (1, 1); }
+step s1c { COMMIT; }
+step s1s { SELECT * FROM child; }
+
+session s2
+step s2b { BEGIN; }
+step s2ukey { UPDATE parent SET parent_key = 2 WHERE parent_key = 1; }
+step s2uaux { UPDATE parent SET aux = 'bar' WHERE parent_key = 1; }
+step s2ukey2 { UPDATE parent SET parent_key = 1 WHERE parent_key = 2; }
+step s2c { COMMIT; }
+step s2s { SELECT * FROM parent; }
+
+session s3
+step s3b { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s3i { INSERT INTO child VALUES (2, 1); }
+step s3c { COMMIT; }
+step s3s { SELECT * FROM child; }
+
+# fail
+permutation s2b s2ukey s1b s1i s2c s1c s2s s1s
+# ok
+permutation s2b s2uaux s1b s1i s2c s1c s2s s1s
+# ok
+permutation s2b s2ukey s1b s1i s2ukey2 s2c s1c s2s s1s
+
+# RR: key update -> serialization failure
+permutation s2b s2ukey s3b s3i s2c s3c s2s s3s
+# RR: non-key update -> old version visible via transaction snapshot
+permutation s2b s2uaux s3b s3i s2c s3c s2s s3s
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 9ae4dbf1b0a..0826f518004 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -370,6 +370,53 @@ SELECT * FROM PKTABLE;
DROP TABLE FKTABLE;
DROP TABLE PKTABLE;
--
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+SET ROLE regress_foreign_key_user;
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+RESET ROLE;
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+ERROR: permission denied for table pktable
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
-- Check initial check upon ALTER TABLE
--
CREATE TABLE PKTABLE ( ptest1 int, ptest2 int, PRIMARY KEY(ptest1, ptest2) );
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index 3b8c95bf893..e9ee29331cb 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -242,6 +242,70 @@ SELECT * FROM PKTABLE;
DROP TABLE FKTABLE;
DROP TABLE PKTABLE;
+--
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+SET ROLE regress_foreign_key_user;
+
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+
+RESET ROLE;
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+
+DROP USER regress_foreign_key_user;
+
--
-- Check initial check upon ALTER TABLE
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e3c1007abdf..66cb18ba5b9 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -817,6 +817,7 @@ ExtensionInfo
ExtensionLocation
ExtensionSiblingCache
ExtensionVersionInfo
+FastPathMeta
FDWCollateState
FD_SET
FILE
--
2.47.3
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Amit Langote @ 2026-03-30 11:15 UTC (permalink / raw)
To: Junwang Zhao <[email protected]>; +Cc: Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
On Mon, Mar 30, 2026 at 1:55 PM Amit Langote <[email protected]> wrote:
> Junwang pointed out off-list that FK tuples added to
> RI_FastPathEntry.batch[] were being copied into TopTransactionContext
> rather than flush_cxt, so they would accumulate until the batch was
> exhausted rather than being reclaimed per flush. Fixed in
> ri_FastPathBatchAdd() in 0002.
>
> Also added a couple of comments in trigger.c that were missing: an
> Assert and explanation in RegisterAfterTriggerBatchCallback()
> clarifying the query_depth >= 0 precondition, a comment at the
> AfterTriggerEndQuery call site explaining why
> FireAfterTriggerBatchCallbacks() must precede the query_depth
> decrement and AfterTriggerFreeQuery, and brief intent comments at the
> AfterTriggerFireDeferred and AfterTriggerSetState call sites.
>
> Plan is to commit 0001 tomorrow barring objections and let it sit for
> a bit before committing 0002. Feedback on 0002, particularly on the
> AfterTriggerBatchCallback mechanism in trigger.c, welcome in the
> meantime.
I kept looking at 0002 and found a couple of things to improve or
reconsider. I decided to move the permission check from fast-path
cache entry creation into ri_FastPathBatchFlush(), alongside the
snapshot, so that permission changes between flushes are respected
rather than checked only once at batch start. For comparison, the
check happens for every row in the SPI and non-batched fast paths.
I also improved comments in a few places to explain design decisions
better.
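As a rough illustration of checking permissions at flush time rather than once per batch, here is a standalone toy model. It is not taken from 0002: ToyBatch, toy_batch_add(), toy_batch_flush() and the toy_select_granted flag are hypothetical, it assumes a flush happens when the batch fills up, and it returns -1 where real code would raise an error.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_BATCH_SIZE 4

/* Hypothetical buffer of pending FK rows awaiting a batched probe. */
typedef struct ToyBatch
{
	int			rows[TOY_BATCH_SIZE];
	int			nrows;
} ToyBatch;

static bool toy_select_granted = true;	/* stand-in for the current ACL state */
static int	toy_perm_checks = 0;

/* Check permission on every flush; return rows probed, or -1 if denied. */
static int
toy_batch_flush(ToyBatch *batch)
{
	int			probed;

	toy_perm_checks++;
	if (!toy_select_granted)
	{
		batch->nrows = 0;
		return -1;
	}
	probed = batch->nrows;
	batch->nrows = 0;
	return probed;
}

/* Buffer one row; flush when the batch is full, otherwise return 0. */
static int
toy_batch_add(ToyBatch *batch, int row)
{
	batch->rows[batch->nrows++] = row;
	if (batch->nrows == TOY_BATCH_SIZE)
		return toy_batch_flush(batch);
	return 0;
}
```

In this model a revoke that lands between two flushes is noticed by the second flush, whereas a check done only when the batch state is first created would miss it.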
0001 is mostly unchanged from v11, except that I updated its commit
message to explain why only RI_FKey_check is covered and not the
action triggers, since that question has come up in previous threads
on this topic.
Still planning to commit 0001 tomorrow.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v12-0001-Add-fast-path-for-foreign-key-constraint-checks.patch (31.5K, 2-v12-0001-Add-fast-path-for-foreign-key-constraint-checks.patch)
From 5fa5213281c6b5fb25d93272a07606083141e1c7 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Mon, 30 Mar 2026 16:32:56 +0900
Subject: [PATCH v12 1/2] Add fast path for foreign key constraint checks
Add a fast-path optimization for foreign key checks that bypasses SPI
by directly probing the unique index on the referenced table.
Benchmarking shows ~1.8x speedup for bulk FK inserts (int PK/int FK,
1M rows, where PK table and index are cached).
The fast path applies when the referenced table is not partitioned and
the constraint does not involve temporal semantics. Otherwise, the
existing SPI path is used.
This optimization covers only the referential check trigger
(RI_FKey_check). The action triggers (CASCADE, SET NULL, SET DEFAULT,
RESTRICT, NO ACTION) must find rows on the FK side to modify, which
requires a table scan with no guaranteed index available, and then
execute DML against those rows through the full executor path including
any triggered actions. Replicating that without substantial code
duplication is not feasible, so those triggers remain on the SPI path.
Extending the fast path to action triggers remains possible as future
work if the necessary infrastructure is built.
The new ri_FastPathCheck() function extracts the FK values, builds scan
keys, performs an index scan, and locks the matching tuple with
LockTupleKeyShare via ri_LockPKTuple(), which handles the RI-specific
subset of table_tuple_lock() results.
If the locked tuple was reached by chasing an update chain
(tmfd.traversed), recheck_matched_pk_tuple() verifies that the key
is still the same, emulating EvalPlanQual.
The scan uses GetTransactionSnapshot(), matching what the SPI path
uses (via _SPI_execute_plan pushing GetTransactionSnapshot() as the
active snapshot). Under READ COMMITTED this is a fresh snapshot;
under REPEATABLE READ / SERIALIZABLE it is the frozen transaction-
start snapshot, so PK rows committed after the transaction started
are not visible.
The ri_CheckPermissions() function performs schema USAGE and table
SELECT checks, matching what the SPI path gets implicitly through
the executor's permission checks. The fast path also switches to
the PK table owner's security context (with SECURITY_NOFORCE_RLS)
before the index probe, matching the SPI path where the query runs
as the table owner.
ri_HashCompareOp() is adjusted to handle cross-type equality operators
(e.g. int48eq for int4 PK / int8 FK) which can appear in conpfeqop.
The existing code asserted same-type operators only, which was correct
for its existing callers (ri_KeysEqual compares same-type FK column
values via ff_eq_oprs), but the fast path is the first caller to pass
pf_eq_oprs, which can be cross-type.
Per-key metadata (compare entries, operator procedures, strategy
numbers) is cached in RI_ConstraintInfo via
ri_populate_fastpath_metadata() on first use, eliminating repeated
calls to ri_HashCompareOp() and get_op_opfamily_properties().
conindid and pk_is_partitioned are also cached at constraint load
time, avoiding per-invocation syscache lookups and the need to open
pk_rel before deciding whether the fast path applies.
New regression tests cover RLS bypass and ACL enforcement for the
fast-path permission checks. New isolation tests exercise concurrent
PK updates under both READ COMMITTED and REPEATABLE READ.
Author: Junwang Zhao <[email protected]>
Co-authored-by: Amit Langote <[email protected]>
Reviewed-by: Haibo Yan <[email protected]>
Tested-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
---
src/backend/utils/adt/ri_triggers.c | 465 +++++++++++++++++-
.../expected/fk-concurrent-pk-upd.out | 105 ++++
src/test/isolation/isolation_schedule | 1 +
.../isolation/specs/fk-concurrent-pk-upd.spec | 53 ++
src/test/regress/expected/foreign_key.out | 47 ++
src/test/regress/sql/foreign_key.sql | 64 +++
src/tools/pgindent/typedefs.list | 1 +
7 files changed, 722 insertions(+), 14 deletions(-)
create mode 100644 src/test/isolation/expected/fk-concurrent-pk-upd.out
create mode 100644 src/test/isolation/specs/fk-concurrent-pk-upd.spec
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 6230a2ea9ad..18d489d790d 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -24,12 +24,15 @@
#include "postgres.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/index.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
+#include "catalog/pg_namespace.h"
#include "commands/trigger.h"
#include "executor/executor.h"
#include "executor/spi.h"
@@ -91,6 +94,7 @@
#define RI_TRIGTYPE_UPDATE 2
#define RI_TRIGTYPE_DELETE 3
+typedef struct FastPathMeta FastPathMeta;
/*
* RI_ConstraintInfo
@@ -132,8 +136,24 @@ typedef struct RI_ConstraintInfo
Oid period_intersect_oper; /* anyrange * anyrange (or
* multiranges) */
dlist_node valid_link; /* Link in list of valid entries */
+
+ Oid conindid;
+ bool pk_is_partitioned;
+
+ FastPathMeta *fpmeta;
} RI_ConstraintInfo;
+typedef struct RI_CompareHashEntry RI_CompareHashEntry;
+
+/* Fast-path metadata for RI checks on foreign key referencing tables */
+typedef struct FastPathMeta
+{
+ RI_CompareHashEntry *compare_entries[RI_MAX_NUMKEYS];
+ RegProcedure regops[RI_MAX_NUMKEYS];
+ Oid subtypes[RI_MAX_NUMKEYS];
+ int strats[RI_MAX_NUMKEYS];
+} FastPathMeta;
+
/*
* RI_QueryKey
*
@@ -233,6 +253,23 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool is_restrict,
bool detectNewRows, int expect_OK);
+static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
+ IndexScanDesc scandesc, TupleTableSlot *slot,
+ Snapshot snapshot, const RI_ConstraintInfo *riinfo,
+ ScanKeyData *skey, int nkeys);
+static bool ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
+ bool *concurrently_updated);
+static bool ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo);
+static void ri_CheckPermissions(Relation query_rel);
+static bool recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+ TupleTableSlot *new_slot);
+static void build_index_scankeys(const RI_ConstraintInfo *riinfo,
+ Relation idx_rel, Datum *pk_vals,
+ char *pk_nulls, ScanKey skeys);
+static void ri_populate_fastpath_metadata(RI_ConstraintInfo *riinfo,
+ Relation fk_rel, Relation idx_rel);
static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
const RI_ConstraintInfo *riinfo, bool rel_is_pk,
Datum *vals, char *nulls);
@@ -276,14 +313,7 @@ RI_FKey_check(TriggerData *trigdata)
if (!table_tuple_satisfies_snapshot(trigdata->tg_relation, newslot, SnapshotSelf))
return PointerGetDatum(NULL);
- /*
- * Get the relation descriptors of the FK and PK tables.
- *
- * pk_rel is opened in RowShareLock mode since that's what our eventual
- * SELECT FOR KEY SHARE will get on it.
- */
fk_rel = trigdata->tg_relation;
- pk_rel = table_open(riinfo->pk_relid, RowShareLock);
switch (ri_NullCheck(RelationGetDescr(fk_rel), newslot, riinfo, false))
{
@@ -293,7 +323,6 @@ RI_FKey_check(TriggerData *trigdata)
* No further check needed - an all-NULL key passes every type of
* foreign key constraint.
*/
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
case RI_KEYS_SOME_NULL:
@@ -318,7 +347,6 @@ RI_FKey_check(TriggerData *trigdata)
errdetail("MATCH FULL does not allow mixing of null and nonnull key values."),
errtableconstraint(fk_rel,
NameStr(riinfo->conname))));
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
case FKCONSTR_MATCH_SIMPLE:
@@ -327,7 +355,6 @@ RI_FKey_check(TriggerData *trigdata)
* MATCH SIMPLE - if ANY column is null, the key passes
* the constraint.
*/
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
#ifdef NOT_USED
@@ -352,8 +379,31 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
+ /*
+ * Fast path: probe the PK unique index directly, bypassing SPI.
+ *
+ * For non-partitioned, non-temporal FKs, we can skip the SPI machinery
+ * (plan cache, executor setup, etc.) and do a direct index scan + tuple
+ * lock. This is semantically equivalent to the SPI path below but avoids
+ * the per-row executor overhead.
+ *
+ * ri_FastPathCheck() reports the violation itself (via ereport) if no
+ * matching PK row is found, so it only returns on success.
+ */
+ if (ri_fastpath_is_applicable(riinfo))
+ {
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
+ return PointerGetDatum(NULL);
+ }
+
SPI_connect();
+ /*
+ * pk_rel is opened in RowShareLock mode since that's what our eventual
+ * SELECT FOR KEY SHARE will get on it.
+ */
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+
/* Fetch or prepare a saved plan for the real check */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
@@ -2356,6 +2406,12 @@ ri_LoadConstraintInfo(Oid constraintOid)
riinfo->valid = true;
+ riinfo->conindid = conForm->conindid;
+ riinfo->pk_is_partitioned =
+ (get_rel_relkind(riinfo->pk_relid) == RELKIND_PARTITIONED_TABLE);
+
+ riinfo->fpmeta = NULL;
+
return riinfo;
}
@@ -2623,6 +2679,378 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
return SPI_processed != 0;
}
+/*
+ * ri_FastPathCheck
+ * Perform FK existence check via direct index probe, bypassing SPI.
+ *
+ * If no matching PK row exists, report the violation via ri_ReportViolation();
+ * otherwise, return normally.
+ */
+static void
+ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ Relation pk_rel;
+ Relation idx_rel;
+ IndexScanDesc scandesc;
+ TupleTableSlot *slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found = false;
+ Oid saved_userid;
+ int saved_sec_context;
+ Snapshot snapshot;
+
+ /*
+ * Advance the command counter so the snapshot sees the effects of prior
+ * triggers in this statement. Mirrors what the SPI path does in
+ * ri_PerformCheck().
+ */
+ CommandCounterIncrement();
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ idx_rel = index_open(riinfo->conindid, AccessShareLock);
+
+ slot = table_slot_create(pk_rel, NULL);
+ scandesc = index_beginscan(pk_rel, idx_rel,
+ snapshot, NULL,
+ riinfo->nkeys, 0);
+
+ if (riinfo->fpmeta == NULL)
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ Assert(riinfo->fpmeta);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ ri_CheckPermissions(pk_rel);
+
+ ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, slot,
+ snapshot, riinfo, skey, riinfo->nkeys);
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ index_endscan(scandesc);
+ ExecDropSingleTupleTableSlot(slot);
+ UnregisterSnapshot(snapshot);
+
+ if (!found)
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ newslot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+
+ index_close(idx_rel, NoLock);
+ table_close(pk_rel, NoLock);
+}
+
+/*
+ * ri_FastPathProbeOne
+ * Probe the PK index for one set of scan keys and lock the matching
+ * tuple.
+ *
+ * Returns true if a matching PK row was found, locked, and (if
+ * applicable) visible to the transaction snapshot.
+ */
+static bool
+ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
+ IndexScanDesc scandesc, TupleTableSlot *slot,
+ Snapshot snapshot, const RI_ConstraintInfo *riinfo,
+ ScanKeyData *skey, int nkeys)
+{
+ bool found = false;
+
+ index_rescan(scandesc, skey, nkeys, NULL, 0);
+
+ if (index_getnext_slot(scandesc, ForwardScanDirection, slot))
+ {
+ bool concurrently_updated;
+
+ if (ri_LockPKTuple(pk_rel, slot, snapshot,
+ &concurrently_updated))
+ {
+ if (concurrently_updated)
+ found = recheck_matched_pk_tuple(idx_rel, skey, slot);
+ else
+ found = true;
+ }
+ }
+
+ return found;
+}
+
+/*
+ * ri_LockPKTuple
+ * Lock a PK tuple found by the fast-path index scan.
+ *
+ * Calls table_tuple_lock() directly with handling specific to RI checks.
+ * Returns true if the tuple was successfully locked.
+ *
+ * Sets *concurrently_updated to true if the locked tuple was reached
+ * by following an update chain (tmfd.traversed), indicating the caller
+ * should recheck the key.
+ */
+static bool
+ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
+ bool *concurrently_updated)
+{
+ TM_FailureData tmfd;
+ TM_Result result;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+
+ *concurrently_updated = false;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ result = table_tuple_lock(pk_rel, &slot->tts_tid, snap,
+ slot, GetCurrentCommandId(false),
+ LockTupleKeyShare, LockWaitBlock,
+ lockflags, &tmfd);
+
+ switch (result)
+ {
+ case TM_Ok:
+ if (tmfd.traversed)
+ *concurrently_updated = true;
+ return true;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ return false;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+
+ /*
+ * In READ COMMITTED, FIND_LAST_VERSION should have chased the
+ * chain and returned TM_Ok. Getting here means something
+ * unexpected -- fall through to error.
+ */
+ elog(ERROR, "unexpected table_tuple_lock status: %u", result);
+ break;
+
+ case TM_SelfModified:
+
+ /*
+ * The current command or a later command in this transaction
+ * modified the PK row. This shouldn't normally happen during an
+ * FK check (we're not modifying pk_rel), but handle it safely by
+ * treating the tuple as not found.
+ */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ break;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", result);
+ break;
+ }
+
+ return false; /* keep compiler quiet */
+}
+
+static bool
+ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo)
+{
+ /*
+ * Partitioned referenced tables are skipped for simplicity, since they
+ * require routing the probe through the correct partition using
+ * PartitionDirectory.
+ */
+ if (riinfo->pk_is_partitioned)
+ return false;
+
+ /*
+ * Temporal foreign keys use range overlap and containment semantics (&&,
+ * <@, range_agg()) that inherently involve aggregation and multiple-row
+ * reasoning, so they stay on the SPI path.
+ */
+ if (riinfo->hasperiod)
+ return false;
+
+ return true;
+}
+
+/*
+ * ri_CheckPermissions
+ * Check that the current user has permissions to look into the schema of
+ * and SELECT from 'query_rel'
+ */
+static void
+ri_CheckPermissions(Relation query_rel)
+{
+ AclResult aclresult;
+
+ /* USAGE on schema. */
+ aclresult = object_aclcheck(NamespaceRelationId,
+ RelationGetNamespace(query_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(query_rel)));
+
+ /* SELECT on relation. */
+ aclresult = pg_class_aclcheck(RelationGetRelid(query_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(query_rel));
+}
+
+/*
+ * recheck_matched_pk_tuple
+ * After following an update chain (tmfd.traversed), verify that
+ * the locked PK tuple still matches the original search keys.
+ *
+ * A non-key update (e.g. changing a non-PK column) creates a new tuple version
+ * that we've now locked, but the key is unchanged -- that's fine. A key
+ * update means the value we were looking for is gone, so we should treat it as
+ * not found.
+ */
+static bool
+recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+ TupleTableSlot *new_slot)
+{
+ /*
+ * TODO: BuildIndexInfo does a syscache lookup + palloc on every call.
+ * This only fires on the concurrent-update path (tmfd.traversed), which
+ * should be rare, so the cost is acceptable for now. If profiling shows
+ * otherwise, cache the IndexInfo in FastPathMeta.
+ */
+ IndexInfo *indexInfo = BuildIndexInfo(idxrel);
+ Datum values[INDEX_MAX_KEYS];
+ bool isnull[INDEX_MAX_KEYS];
+ bool matched = true;
+
+ /* PK indexes never have these. */
+ Assert(indexInfo->ii_Expressions == NIL &&
+ indexInfo->ii_ExclusionOps == NULL);
+
+ /* Form the index values and isnull flags given the table tuple. */
+ FormIndexDatum(indexInfo, new_slot, NULL, values, isnull);
+ for (int i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
+ {
+ ScanKeyData *skey = &skeys[i];
+
+ /* A PK column can never be set to NULL. */
+ Assert(!isnull[i]);
+ if (!DatumGetBool(FunctionCall2Coll(&skey->sk_func,
+ skey->sk_collation,
+ values[i],
+ skey->sk_argument)))
+ {
+ matched = false;
+ break;
+ }
+ }
+
+ return matched;
+}
+
+/*
+ * build_index_scankeys
+ * Build ScanKeys for a direct index probe of the PK's unique index.
+ *
+ * Uses cached compare entries, operator procedures, and strategy numbers
+ * from ri_populate_fastpath_metadata() rather than looking them up on
+ * each invocation. Casts FK values to the operator's expected input
+ * type if needed.
+ */
+static void
+build_index_scankeys(const RI_ConstraintInfo *riinfo,
+ Relation idx_rel, Datum *pk_vals,
+ char *pk_nulls, ScanKey skeys)
+{
+ FastPathMeta *fpmeta = riinfo->fpmeta;
+
+ Assert(fpmeta);
+
+ /*
+ * May need to cast each of the individual values of the foreign key to
+ * the corresponding PK column's type if the equality operator demands it.
+ */
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ RI_CompareHashEntry *entry = fpmeta->compare_entries[i];
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+
+ /*
+ * Set up ScanKeys for the index scan. This is essentially how
+ * ExecIndexBuildScanKeys() sets them up.
+ */
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ int pkattrno = i + 1;
+
+ ScanKeyEntryInitialize(&skeys[i], 0, pkattrno,
+ fpmeta->strats[i], fpmeta->subtypes[i],
+ idx_rel->rd_indcollation[i], fpmeta->regops[i],
+ pk_vals[i]);
+ }
+}
+
+/*
+ * ri_populate_fastpath_metadata
+ * Cache per-key metadata needed by build_index_scankeys().
+ *
+ * Looks up the compare hash entry, operator procedure OID, and index
+ * strategy/subtype for each key column. Called lazily on first use
+ * and persists for the lifetime of the RI_ConstraintInfo entry.
+ */
+static void
+ri_populate_fastpath_metadata(RI_ConstraintInfo *riinfo,
+ Relation fk_rel, Relation idx_rel)
+{
+ FastPathMeta *fpmeta;
+ MemoryContext oldcxt = MemoryContextSwitchTo(TopMemoryContext);
+
+ Assert(riinfo != NULL && riinfo->valid);
+
+ fpmeta = palloc_object(FastPathMeta);
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ Oid eq_opr = riinfo->pf_eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ Oid lefttype;
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ fpmeta->compare_entries[i] = entry;
+ fpmeta->regops[i] = get_opcode(eq_opr);
+
+ get_op_opfamily_properties(eq_opr,
+ idx_rel->rd_opfamily[i],
+ false,
+ &fpmeta->strats[i],
+ &lefttype,
+ &fpmeta->subtypes[i]);
+ }
+
+ riinfo->fpmeta = fpmeta;
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* Extract fields from a tuple into Datum/nulls arrays
*/
@@ -3118,8 +3546,11 @@ ri_CompareWithCast(Oid eq_opr, Oid typeid, Oid collid,
/*
* ri_HashCompareOp -
*
- * See if we know how to compare two values, and create a new hash entry
- * if not.
+ * Look up or create a cache entry for the given equality operator and
+ * the caller's value type (typeid). The entry holds the operator's
+ * FmgrInfo and, if typeid doesn't match what the operator expects as
+ * its right-hand input, a cast function to coerce the value before
+ * comparison.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -3175,8 +3606,14 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * pf_eq_oprs (used by the fast path) can be cross-type when the FK
+ * and PK columns differ in type, e.g. int48eq for int4 PK / int8 FK.
+ * If the FK column's type already matches what the operator expects
+ * as its right-hand input, no cast is needed.
+ */
+ if (typeid == righttype)
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/test/isolation/expected/fk-concurrent-pk-upd.out b/src/test/isolation/expected/fk-concurrent-pk-upd.out
new file mode 100644
index 00000000000..4dd9535d3c0
--- /dev/null
+++ b/src/test/isolation/expected/fk-concurrent-pk-upd.out
@@ -0,0 +1,105 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s2b s2ukey s1b s1i s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2c: COMMIT;
+step s1i: <... completed>
+ERROR: insert or update on table "child" violates foreign key constraint "child_parent_key_fkey"
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 2|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+(0 rows)
+
+
+starting permutation: s2b s2uaux s1b s1i s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2uaux: UPDATE parent SET aux = 'bar' WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1);
+step s2c: COMMIT;
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|bar
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
+
+starting permutation: s2b s2ukey s1b s1i s2ukey2 s2c s1c s2s s1s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1b: BEGIN;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2ukey2: UPDATE parent SET parent_key = 1 WHERE parent_key = 2;
+step s2c: COMMIT;
+step s1i: <... completed>
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
+
+starting permutation: s2b s2ukey s3b s3i s2c s3c s2s s3s
+step s2b: BEGIN;
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s3b: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s3i: INSERT INTO child VALUES (2, 1); <waiting ...>
+step s2c: COMMIT;
+step s3i: <... completed>
+ERROR: could not serialize access due to concurrent update
+step s3c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 2|foo
+(1 row)
+
+step s3s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+(0 rows)
+
+
+starting permutation: s2b s2uaux s3b s3i s2c s3c s2s s3s
+step s2b: BEGIN;
+step s2uaux: UPDATE parent SET aux = 'bar' WHERE parent_key = 1;
+step s3b: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s3i: INSERT INTO child VALUES (2, 1);
+step s2c: COMMIT;
+step s3c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|bar
+(1 row)
+
+step s3s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 2| 1
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 4e466580cd4..c1a999bf1d2 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -37,6 +37,7 @@ test: fk-partitioned-2
test: fk-snapshot
test: fk-snapshot-2
test: fk-snapshot-3
+test: fk-concurrent-pk-upd
test: subxid-overflow
test: eval-plan-qual
test: eval-plan-qual-trigger
diff --git a/src/test/isolation/specs/fk-concurrent-pk-upd.spec b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
new file mode 100644
index 00000000000..03dc7f260cd
--- /dev/null
+++ b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
@@ -0,0 +1,53 @@
+# Tests that an INSERT on referencing table correctly fails when
+# the referenced value disappears due to a concurrent update
+setup
+{
+ CREATE TABLE parent (
+ parent_key int PRIMARY KEY,
+ aux text NOT NULL
+ );
+
+ CREATE TABLE child (
+ child_key int PRIMARY KEY,
+ parent_key int8 NOT NULL REFERENCES parent
+ );
+
+ INSERT INTO parent VALUES (1, 'foo');
+}
+
+teardown
+{
+ DROP TABLE parent, child;
+}
+
+session s1
+step s1b { BEGIN; }
+step s1i { INSERT INTO child VALUES (1, 1); }
+step s1c { COMMIT; }
+step s1s { SELECT * FROM child; }
+
+session s2
+step s2b { BEGIN; }
+step s2ukey { UPDATE parent SET parent_key = 2 WHERE parent_key = 1; }
+step s2uaux { UPDATE parent SET aux = 'bar' WHERE parent_key = 1; }
+step s2ukey2 { UPDATE parent SET parent_key = 1 WHERE parent_key = 2; }
+step s2c { COMMIT; }
+step s2s { SELECT * FROM parent; }
+
+session s3
+step s3b { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s3i { INSERT INTO child VALUES (2, 1); }
+step s3c { COMMIT; }
+step s3s { SELECT * FROM child; }
+
+# fail
+permutation s2b s2ukey s1b s1i s2c s1c s2s s1s
+# ok
+permutation s2b s2uaux s1b s1i s2c s1c s2s s1s
+# ok
+permutation s2b s2ukey s1b s1i s2ukey2 s2c s1c s2s s1s
+
+# RR: key update -> serialization failure
+permutation s2b s2ukey s3b s3i s2c s3c s2s s3s
+# RR: non-key update -> old version visible via transaction snapshot
+permutation s2b s2uaux s3b s3i s2c s3c s2s s3s
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 9ae4dbf1b0a..0826f518004 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -370,6 +370,53 @@ SELECT * FROM PKTABLE;
DROP TABLE FKTABLE;
DROP TABLE PKTABLE;
--
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+SET ROLE regress_foreign_key_user;
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+RESET ROLE;
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+ERROR: permission denied for table pktable
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
-- Check initial check upon ALTER TABLE
--
CREATE TABLE PKTABLE ( ptest1 int, ptest2 int, PRIMARY KEY(ptest1, ptest2) );
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index 3b8c95bf893..e9ee29331cb 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -242,6 +242,70 @@ SELECT * FROM PKTABLE;
DROP TABLE FKTABLE;
DROP TABLE PKTABLE;
+--
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+SET ROLE regress_foreign_key_user;
+
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+
+RESET ROLE;
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+
+DROP USER regress_foreign_key_user;
+
--
-- Check initial check upon ALTER TABLE
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e3c1007abdf..66cb18ba5b9 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -817,6 +817,7 @@ ExtensionInfo
ExtensionLocation
ExtensionSiblingCache
ExtensionVersionInfo
+FastPathMeta
FDWCollateState
FD_SET
FILE
--
2.47.3
[application/octet-stream] v12-0002-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch (43.5K, 3-v12-0002-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch)
From 9899f6a34f167025ac5c4ee11558103280f4608d Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Mon, 30 Mar 2026 17:38:55 +0900
Subject: [PATCH v12 2/2] Batch FK rows and use SK_SEARCHARRAY for fast-path FK
probes
Instead of probing the PK index on each trigger invocation, buffer
FK rows in a new per-constraint cache entry (RI_FastPathEntry) and
flush them as a batch. Combined with the fast path introduced in
the previous commit, bulk FK inserts are ~2.9x faster (int PK /
int FK, 1M rows, PK table and index cached in memory).
On each trigger invocation, the new ri_FastPathBatchAdd() buffers
the FK row in RI_FastPathEntry. When the buffer fills (64 rows)
or the trigger-firing cycle ends, the new ri_FastPathBatchFlush()
probes the index for all buffered rows, sharing a single
CommandCounterIncrement, snapshot, permission check, and security
context switch across the batch, rather than repeating each per row
as the SPI path does. Per-flush CCI is safe because all AFTER
triggers for the buffered rows have already fired by flush time.
For single-column foreign keys, the flush builds an ArrayType from
the buffered FK values (casting to the PK-side type if needed) and
constructs a scan key with the SK_SEARCHARRAY flag. The index AM
sorts and deduplicates the array internally, then walks matching
leaf pages in one ordered traversal instead of descending from the
root once per row. A matched[] bitmap tracks which batch items
were satisfied; the first unmatched item is reported as a
violation. Multi-column foreign keys fall back to per-row probing
via the new ri_FastPathFlushLoop().
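The matched[] bookkeeping for the single-column case can be illustrated with a toy model. This sketch makes several assumptions that are not in the patch: the PK index is a sorted array of unique ints, the sort and deduplication that the index AM does internally are done explicitly with qsort(), and flush_array() is a hypothetical name. Tuple locking and the concurrent-update recheck are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define BATCH_SIZE 64

static int
cmp_int(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

/*
 * pk_index is a sorted array of unique ints standing in for the PK index.
 * Returns the position of the first FK value without a PK match, or -1.
 */
static int
flush_array(const int *pk_index, int npk, const int *fk_vals, int nvals)
{
	int			keys[BATCH_SIZE];
	bool		matched[BATCH_SIZE];
	int			nkeys = 0;
	int			pos = 0;

	assert(nvals >= 0 && nvals <= BATCH_SIZE);
	memset(matched, 0, sizeof(matched));

	/* Sort and deduplicate the search keys, as the index AM would. */
	memcpy(keys, fk_vals, (size_t) nvals * sizeof(int));
	qsort(keys, (size_t) nvals, sizeof(int), cmp_int);
	for (int i = 0; i < nvals; i++)
		if (nkeys == 0 || keys[nkeys - 1] != keys[i])
			keys[nkeys++] = keys[i];

	/* One ordered pass over the "index"; pos never moves backwards. */
	for (int k = 0; k < nkeys; k++)
	{
		while (pos < npk && pk_index[pos] < keys[k])
			pos++;
		if (pos == npk)
			break;
		if (pk_index[pos] != keys[k])
			continue;

		/* Mark every batch item satisfied by this PK value. */
		for (int i = 0; i < nvals; i++)
			if (!matched[i] && fk_vals[i] == pk_index[pos])
				matched[i] = true;
	}

	/* Report the first unmatched batch item, in batch order. */
	for (int i = 0; i < nvals; i++)
		if (!matched[i])
			return i;
	return -1;
}
```

Duplicate FK values in the batch are probed once but marked individually, so the violation is reported by position in the batch rather than by position in the sorted key array.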
FK tuples are materialized via ExecCopySlotHeapTuple() into a new
purpose-specific memory context (flush_cxt), child of
TopTransactionContext, which is also used for per-flush transient
work: cast results, the search array, and index scan allocations.
It is reset after each flush and deleted in teardown.
The PK relation, index, tuple slots, and fast-path metadata are
cached in RI_FastPathEntry across trigger invocations within a
trigger-firing batch, avoiding repeated open/close overhead. The
snapshot and IndexScanDesc are taken fresh per flush.
ri_FastPathEndBatch() flushes any partial batch before tearing
down cached resources. Since the FK relation may already be
closed by flush time (e.g. for deferred constraints at COMMIT),
it reopens the relation using entry->riinfo->fk_relid if needed.
The existing ALTER TABLE validation path bypasses batching and
continues to call ri_FastPathCheck() directly per row.
Lifecycle management for RI_FastPathEntry relies on the following
mechanisms:
- AfterTriggerBatchCallback: A new general-purpose callback
mechanism in trigger.c. Callbacks registered via
RegisterAfterTriggerBatchCallback() fire at the end of each
trigger-firing batch (AfterTriggerEndQuery for immediate
constraints, AfterTriggerFireDeferred at COMMIT, and
AfterTriggerSetState for SET CONSTRAINTS IMMEDIATE). The RI
code registers ri_FastPathEndBatch as a batch callback.
- Batch callbacks only fire at the outermost query level
(checked inside FireAfterTriggerBatchCallbacks), so nested
queries from SPI inside other AFTER triggers do not tear down
the cache mid-batch.
- XactCallback: ri_FastPathXactCallback NULLs the static cache
pointer at transaction end, handling the abort path where the
batch callback never fired.
- SubXactCallback: ri_FastPathSubXactCallback NULLs the static
cache pointer on subtransaction abort, preventing the batch
callback from accessing already-released resources.
- AfterTriggerBatchIsActive(): A new exported accessor that
returns true when afterTriggers.query_depth >= 0. During
ALTER TABLE ... ADD FOREIGN KEY validation, RI triggers are
called directly outside the after-trigger framework, so batch
callbacks would never fire. The fast-path code uses this to
fall back to the non-cached per-invocation path in that
context.
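The register/fire/clear behaviour of the batch callbacks, including the depth guard, can be modelled outside the backend. This is only a sketch under simplifying assumptions: a fixed-size array replaces the List in afterTriggers, query_depth is a plain global the caller sets by hand, and register_batch_callback(), fire_batch_callbacks() and count_call() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLBACKS 8

typedef void (*BatchCallback) (void *arg);

typedef struct CallbackItem
{
	BatchCallback callback;
	void	   *arg;
} CallbackItem;

static CallbackItem callbacks[MAX_CALLBACKS];
static int	ncallbacks = 0;

/* -1 means no query is active, like afterTriggers.query_depth. */
static int	query_depth = -1;

/* Register a callback for the current batch only. */
static void
register_batch_callback(BatchCallback cb, void *arg)
{
	assert(query_depth >= 0);
	assert(ncallbacks < MAX_CALLBACKS);
	callbacks[ncallbacks].callback = cb;
	callbacks[ncallbacks].arg = arg;
	ncallbacks++;
}

/* Invoke and clear the callbacks, but never from a nested query level. */
static void
fire_batch_callbacks(void)
{
	if (query_depth > 0)
		return;

	for (int i = 0; i < ncallbacks; i++)
		callbacks[i].callback(callbacks[i].arg);

	/* The list is cleared, so callers must re-register per batch. */
	ncallbacks = 0;
}

/* Example callback: bump the counter passed as arg. */
static void
count_call(void *arg)
{
	(*(int *) arg)++;
}
```

A nested firing at depth 1 leaves the registered callback untouched, the outermost firing runs it once, and a second firing finds an empty list. That is why the RI code resets ri_fastpath_callback_registered in teardown.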
Author: Amit Langote <[email protected]>
Co-authored-by: Junwang Zhao <[email protected]>
Reviewed-by: Haibo Yan <[email protected]>
Tested-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
---
src/backend/commands/trigger.c | 105 ++++
src/backend/utils/adt/ri_triggers.c | 597 +++++++++++++++++++++-
src/include/commands/trigger.h | 18 +
src/test/regress/expected/foreign_key.out | 126 +++++
src/test/regress/sql/foreign_key.sql | 118 +++++
src/tools/pgindent/typedefs.list | 3 +
6 files changed, 963 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 6596843a8d8..4bc31cabff2 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3893,6 +3893,8 @@ typedef struct AfterTriggersData
/* per-subtransaction-level data: */
AfterTriggersTransData *trans_stack; /* array of structs shown below */
int maxtransdepth; /* allocated len of above array */
+
+ List *batch_callbacks; /* List of AfterTriggerCallbackItem */
} AfterTriggersData;
struct AfterTriggersQueryData
@@ -3929,6 +3931,13 @@ struct AfterTriggersTableData
TupleTableSlot *storeslot; /* for converting to tuplestore's format */
};
+/* Entry in afterTriggers.batch_callbacks */
+typedef struct AfterTriggerCallbackItem
+{
+ AfterTriggerBatchCallback callback;
+ void *arg;
+} AfterTriggerCallbackItem;
+
static AfterTriggersData afterTriggers;
static void AfterTriggerExecute(EState *estate,
@@ -3964,6 +3973,7 @@ static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
Oid tgoid, bool tgisdeferred);
static void cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent);
+static void FireAfterTriggerBatchCallbacks(void);
/*
* Get the FDW tuplestore for the current trigger query level, creating it
@@ -5089,6 +5099,7 @@ AfterTriggerBeginXact(void)
*/
afterTriggers.firing_counter = (CommandId) 1; /* mustn't be 0 */
afterTriggers.query_depth = -1;
+ afterTriggers.batch_callbacks = NIL;
/*
* Verify that there is no leftover state remaining. If these assertions
@@ -5210,6 +5221,15 @@ AfterTriggerEndQuery(EState *estate)
break;
}
+ /*
+ * Fire batch callbacks before releasing query-level storage and before
+ * decrementing query_depth. Callbacks may do real work (index probes,
+ * error reporting) and rely on query_depth still reflecting the current
+ * batch level so that nested calls from SPI inside AFTER triggers are
+ * correctly suppressed by FireAfterTriggerBatchCallbacks's depth guard.
+ */
+ FireAfterTriggerBatchCallbacks();
+
/* Release query-level-local storage, including tuplestores if any */
AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
@@ -5317,6 +5337,9 @@ AfterTriggerFireDeferred(void)
break; /* all fired */
}
+ /* Flush any fast-path batches accumulated by the triggers just fired. */
+ FireAfterTriggerBatchCallbacks();
+
/*
* We don't bother freeing the event list, since it will go away anyway
* (and more efficiently than via pfree) in AfterTriggerEndXact.
@@ -6059,6 +6082,11 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
break; /* all fired */
}
+ /*
+ * Flush any fast-path batches accumulated by the triggers just fired.
+ */
+ FireAfterTriggerBatchCallbacks();
+
if (snapshot_set)
PopActiveSnapshot();
}
@@ -6755,3 +6783,80 @@ check_modified_virtual_generated(TupleDesc tupdesc, HeapTuple tuple)
return tuple;
}
+
+/*
+ * RegisterAfterTriggerBatchCallback
+ * Register a function to be called when the current trigger-firing
+ * batch completes.
+ *
+ * Must be called from within a trigger function's execution context
+ * (i.e., while afterTriggers state is active).
+ *
+ * The callback list is cleared after invocation, so the caller must
+ * re-register for each new batch if needed.
+ */
+void
+RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg)
+{
+ AfterTriggerCallbackItem *item;
+ MemoryContext oldcxt;
+
+ /*
+ * Allocate in TopTransactionContext so the item survives for the duration
+ * of the batch, which may span multiple trigger invocations.
+ *
+ * Must be called while afterTriggers is active (query_depth >= 0);
+ * callbacks registered outside a trigger-firing context would never fire.
+ */
+ Assert(afterTriggers.query_depth >= 0);
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ item = palloc(sizeof(AfterTriggerCallbackItem));
+ item->callback = callback;
+ item->arg = arg;
+ afterTriggers.batch_callbacks =
+ lappend(afterTriggers.batch_callbacks, item);
+ MemoryContextSwitchTo(oldcxt);
+}
+
+/*
+ * FireAfterTriggerBatchCallbacks
+ * Invoke and clear all registered batch callbacks.
+ *
+ * Only fires at the outermost query level (query_depth == 0) or from
+ * top-level operations (query_depth == -1, e.g. AfterTriggerFireDeferred
+ * at COMMIT). Nested queries from SPI inside AFTER triggers run at
+ * depth > 0 and must not tear down resources the outer batch still needs.
+ */
+static void
+FireAfterTriggerBatchCallbacks(void)
+{
+ ListCell *lc;
+
+ if (afterTriggers.query_depth > 0)
+ return;
+
+ foreach(lc, afterTriggers.batch_callbacks)
+ {
+ AfterTriggerCallbackItem *item = lfirst(lc);
+
+ item->callback(item->arg);
+ }
+
+ list_free_deep(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = NIL;
+}
+
+/*
+ * AfterTriggerBatchIsActive
+ * Returns true if we're inside a query-level trigger batch where
+ * registered batch callbacks will actually be invoked.
+ *
+ * This is false during validateForeignKeyConstraint(), which calls
+ * RI trigger functions directly outside the after-trigger framework.
+ */
+bool
+AfterTriggerBatchIsActive(void)
+{
+ return afterTriggers.query_depth >= 0;
+}
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 18d489d790d..d97505bd41e 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -196,6 +196,50 @@ typedef struct RI_CompareHashEntry
FmgrInfo cast_func_finfo; /* in case we must coerce input */
} RI_CompareHashEntry;
+/*
+ * Maximum number of FK rows buffered before flushing.
+ *
+ * Larger batches amortize per-flush overhead and let the SK_SEARCHARRAY
+ * path walk more leaf pages in a single sorted traversal. But each
+ * buffered row is a materialized HeapTuple in flush_cxt, and the matched[]
+ * scan in ri_FastPathFlushArray() is O(batch_size) per index match.
+ * Benchmarking showed little difference between 16 and 64, with 256
+ * consistently slower. 64 is a reasonable default.
+ */
+#define RI_FASTPATH_BATCH_SIZE 64
+
+/*
+ * RI_FastPathEntry
+ * Per-constraint cache of resources needed by ri_FastPathBatchFlush().
+ *
+ * One entry per constraint, keyed by pg_constraint OID. Created lazily
+ * by ri_FastPathGetEntry() on first use within a trigger-firing batch
+ * and torn down by ri_FastPathTeardown() at batch end.
+ *
+ * FK tuples are buffered in batch[] across trigger invocations and
+ * flushed when the buffer fills or the batch ends.
+ */
+typedef struct RI_FastPathEntry
+{
+ Oid conoid; /* hash key: pg_constraint OID */
+ Relation pk_rel;
+ Relation idx_rel;
+ TupleTableSlot *pk_slot;
+ TupleTableSlot *fk_slot;
+ MemoryContext flush_cxt; /* short-lived context for per-flush work */
+
+ /*
+ * TODO: batch[] is HeapTuple[] because the AFTER trigger machinery
+ * currently passes tuples as HeapTuples. Once trigger infrastructure is
+ * slotified, this should use a slot array or whatever batched tuple
+ * storage abstraction exists at that point to be TAM-agnostic.
+ */
+ HeapTuple batch[RI_FASTPATH_BATCH_SIZE];
+ int batch_count;
+
+ /* For ri_FastPathEndBatch() */
+ const RI_ConstraintInfo *riinfo;
+} RI_FastPathEntry;
/*
* Local data
@@ -205,6 +249,8 @@ static HTAB *ri_query_cache = NULL;
static HTAB *ri_compare_cache = NULL;
static dclist_head ri_constraint_cache_valid_list;
+static HTAB *ri_fastpath_cache = NULL;
+static bool ri_fastpath_callback_registered = false;
/*
* Local function prototypes
@@ -255,6 +301,16 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
bool detectNewRows, int expect_OK);
static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static int ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc);
+static int ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc);
+static void ri_FastPathBatchFlush(RI_FastPathEntry *fpentry,
+ Relation fk_rel);
static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
IndexScanDesc scandesc, TupleTableSlot *slot,
Snapshot snapshot, const RI_ConstraintInfo *riinfo,
@@ -277,6 +333,10 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool is_restrict, bool partgone);
+static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel);
+static void ri_FastPathEndBatch(void *arg);
+static void ri_FastPathTeardown(void);
/*
@@ -387,12 +447,22 @@ RI_FKey_check(TriggerData *trigdata)
* lock. This is semantically equivalent to the SPI path below but avoids
* the per-row executor overhead.
*
- * ri_FastPathCheck() reports the violation itself (via ereport) if no
- * matching PK row is found, so it only returns on success.
+ * ri_FastPathBatchAdd() and ri_FastPathCheck() report the violation
+ * themselves if no matching PK row is found, so they only return on
+ * success.
*/
if (ri_fastpath_is_applicable(riinfo))
{
- ri_FastPathCheck(riinfo, fk_rel, newslot);
+ if (AfterTriggerBatchIsActive())
+ {
+ /* Batched path: buffer and probe in groups */
+ ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
+ }
+ else
+ {
+ /* ALTER TABLE validation: per-row, no cache */
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
+ }
return PointerGetDatum(NULL);
}
@@ -2681,10 +2751,14 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/*
* ri_FastPathCheck
- * Perform FK existence check via direct index probe, bypassing SPI.
+ * Perform per-row FK existence check via direct index probe,
+ * bypassing SPI.
*
* If no matching PK row exists, report the violation via ri_ReportViolation(),
* otherwise, the function returns normally.
+ *
+ * Note: This is only used by the ALTER TABLE validation path. Other paths use
+ * ri_FastPathBatchAdd().
*/
static void
ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
@@ -2748,6 +2822,321 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
table_close(pk_rel, NoLock);
}
+/*
+ * ri_FastPathBatchAdd
+ * Buffer a FK row for batched probing.
+ *
+ * Adds the row to the batch buffer. When the buffer is full, flushes all
+ * buffered rows by probing the PK index. Any violation is reported
+ * immediately during the flush via ri_ReportViolation (which does not return).
+ *
+ * Uses the per-batch cache (RI_FastPathEntry) to avoid per-row relation
+ * open/close, slot creation, etc.
+ *
+ * The batch is also flushed at end of trigger-firing cycle via
+ * ri_FastPathEndBatch().
+ */
+static void
+ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
+ MemoryContext oldcxt;
+
+ oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
+ fpentry->batch[fpentry->batch_count] =
+ ExecCopySlotHeapTuple(newslot);
+ fpentry->batch_count++;
+ MemoryContextSwitchTo(oldcxt);
+
+ if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
+ ri_FastPathBatchFlush(fpentry, fk_rel);
+}
+
+/*
+ * ri_FastPathBatchFlush
+ * Flush all buffered FK rows by probing the PK index.
+ *
+ * Dispatches to ri_FastPathFlushArray() for single-column FKs
+ * (using SK_SEARCHARRAY) or ri_FastPathFlushLoop() for multi-column
+ * FKs (per-row probing). Violations are reported immediately via
+ * ri_ReportViolation(), which does not return.
+ */
+static void
+ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
+{
+ const RI_ConstraintInfo *riinfo = fpentry->riinfo;
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *fk_slot = fpentry->fk_slot;
+ Snapshot snapshot;
+ IndexScanDesc scandesc;
+ Oid saved_userid;
+ int saved_sec_context;
+ MemoryContext oldcxt;
+ int violation_index;
+
+ if (fpentry->batch_count == 0)
+ return;
+
+ if (riinfo->fpmeta == NULL)
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ Assert(riinfo->fpmeta);
+
+ /*
+ * CCI and security context switch are done once for the entire batch.
+ * Per-row CCI is unnecessary because by the time a flush runs, all AFTER
+ * triggers for the buffered rows have already fired (trigger invocations
+ * strictly alternate per row), so a single CCI advances past all their
+ * effects. Per-row security context switch is unnecessary because each
+ * row's probe runs entirely as the PK table owner, same as the SPI path
+ * -- the only difference is that the SPI path sets and restores the
+ * context per row whereas we do it once around the whole batch.
+ */
+ CommandCounterIncrement();
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+ /*
+ * build_index_scankeys() may palloc cast results for cross-type FKs. Use
+ * the entry's short-lived flush context so these don't accumulate across
+ * batches.
+ */
+ oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
+
+ scandesc = index_beginscan(pk_rel, idx_rel, snapshot, NULL,
+ riinfo->nkeys, 0);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ /*
+ * Check that the current user has permission to access pk_rel. Done here
+ * rather than at entry creation so that permission changes between
+ * flushes are respected. This matches the behavior of the SPI path,
+ * except that the check runs once per flush rather than once per row as
+ * it does in ri_FastPathCheck().
+ */
+ ri_CheckPermissions(pk_rel);
+
+ if (riinfo->nkeys == 1)
+ violation_index = ri_FastPathFlushArray(fpentry, fk_slot, riinfo,
+ fk_rel, snapshot, scandesc);
+ else
+ violation_index = ri_FastPathFlushLoop(fpentry, fk_slot, riinfo,
+ fk_rel, snapshot, scandesc);
+
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ UnregisterSnapshot(snapshot);
+ index_endscan(scandesc);
+
+ if (violation_index >= 0)
+ {
+ ExecStoreHeapTuple(fpentry->batch[violation_index], fk_slot, false);
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ fk_slot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
+
+ MemoryContextReset(fpentry->flush_cxt);
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Reset. */
+ fpentry->batch_count = 0;
+}
+
+/*
+ * ri_FastPathFlushLoop
+ * Multi-column fallback: probe the index once per buffered row.
+ *
+ * Used for composite foreign keys where SK_SEARCHARRAY does not
+ * apply.
+ *
+ * Returns the index of the first violating row in the batch array, or -1 if
+ * all rows are valid.
+ */
+static int
+ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc)
+{
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found = true;
+
+ for (int i = 0; i < fpentry->batch_count; i++)
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, pk_slot,
+ snapshot, riinfo, skey, riinfo->nkeys);
+
+ /* Report first unmatched row */
+ if (!found)
+ return i;
+ }
+
+ /* All pass. */
+ return -1;
+}
+
+/*
+ * ri_FastPathFlushArray
+ * Single-column fast path using SK_SEARCHARRAY.
+ *
+ * Builds an array of FK values and does one index scan with
+ * SK_SEARCHARRAY. The index AM sorts and deduplicates the array
+ * internally, then walks matching leaf pages in order. Each
+ * matched PK tuple is locked and rechecked as before; a matched[]
+ * bitmap tracks which batch items were satisfied.
+ *
+ * Returns the index of the first violating row in the batch array, or -1 if
+ * all rows are valid.
+ */
+static int
+ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc)
+{
+ FastPathMeta *fpmeta = riinfo->fpmeta;
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Datum search_vals[RI_FASTPATH_BATCH_SIZE];
+ bool matched[RI_FASTPATH_BATCH_SIZE];
+ int nvals = fpentry->batch_count;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[1];
+ RI_CompareHashEntry *entry;
+ Oid elem_type;
+ int16 elem_len;
+ bool elem_byval;
+ char elem_align;
+ ArrayType *arr;
+
+ Assert(fpmeta);
+
+ memset(matched, 0, nvals * sizeof(bool));
+
+ /*
+ * Extract FK values, casting to the operator's expected input type if
+ * needed (e.g. int8 FK -> int4 for int48eq).
+ */
+ entry = fpmeta->compare_entries[0];
+ for (int i = 0; i < nvals; i++)
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+
+ /* Cast if needed (e.g. int8 FK -> numeric PK) */
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ search_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[0],
+ Int32GetDatum(-1),
+ BoolGetDatum(false));
+ else
+ search_vals[i] = pk_vals[0];
+ }
+
+ /*
+ * Array element type must match the operator's right-hand input type,
+ * which is what the index comparison expects on the search side.
+ * ri_populate_fastpath_metadata() stores exactly this via
+ * get_op_opfamily_properties(), which returns the operator's right-hand
+ * type as the subtype for cross-type operators (e.g. int8 for int48eq)
+ * and the common type for same-type operators.
+ */
+ elem_type = fpmeta->subtypes[0];
+ Assert(OidIsValid(elem_type));
+ get_typlenbyvalalign(elem_type, &elem_len, &elem_byval, &elem_align);
+
+ arr = construct_array(search_vals, nvals,
+ elem_type, elem_len, elem_byval, elem_align);
+
+ /*
+ * Build scan key with SK_SEARCHARRAY. The index AM code will internally
+ * sort and deduplicate, then walk leaf pages in order.
+ */
+ ScanKeyEntryInitialize(&skey[0],
+ SK_SEARCHARRAY,
+ 1, /* attno */
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ PointerGetDatum(arr));
+
+ index_rescan(scandesc, skey, 1, NULL, 0);
+
+ /*
+ * Walk all matches. The index AM returns them in index order. For each
+ * match, find which batch item(s) it satisfies.
+ */
+ while (index_getnext_slot(scandesc, ForwardScanDirection, pk_slot))
+ {
+ Datum found_val;
+ bool found_null;
+ bool concurrently_updated;
+ ScanKeyData recheck_skey[1];
+
+ if (!ri_LockPKTuple(pk_rel, pk_slot, snapshot, &concurrently_updated))
+ continue;
+
+ /* Extract the PK value from the matched and locked tuple */
+ found_val = slot_getattr(pk_slot, riinfo->pk_attnums[0], &found_null);
+ Assert(!found_null);
+
+ if (concurrently_updated)
+ {
+ /*
+ * Build a single-key scankey for recheck. We need the actual PK
+ * value that was found, not the FK search value.
+ */
+ ScanKeyEntryInitialize(&recheck_skey[0], 0, 1,
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ found_val);
+ if (!recheck_matched_pk_tuple(idx_rel, recheck_skey, pk_slot))
+ continue;
+ }
+
+ /*
+ * Linear scan to mark all batch items matching this PK value.
+ * O(batch_size) per match, O(batch_size^2) worst case -- fine for the
+ * current batch size of 64.
+ */
+ for (int i = 0; i < nvals; i++)
+ {
+ if (!matched[i] &&
+ DatumGetBool(FunctionCall2Coll(&entry->eq_opr_finfo,
+ idx_rel->rd_indcollation[0],
+ found_val,
+ search_vals[i])))
+ matched[i] = true;
+ }
+ }
+
+ /* Report first unmatched row */
+ for (int i = 0; i < nvals; i++)
+ if (!matched[i])
+ return i;
+
+ /* All pass. */
+ return -1;
+}
+
/*
* ri_FastPathProbeOne
* Probe the PK index for one set of scan keys, lock the matching
@@ -3675,3 +4064,203 @@ RI_FKey_trigger_type(Oid tgfoid)
return RI_TRIGGER_NONE;
}
+
+/*
+ * ri_FastPathEndBatch
+ * Flush remaining rows and tear down cached state.
+ *
+ * Registered as an AfterTriggerBatchCallback. Note: the flush can
+ * do real work (CCI, security context switch, index probes) and can
+ * throw ERROR on a constraint violation. If that happens,
+ * ri_FastPathTeardown never runs; ResourceOwner + XactCallback
+ * handle resource cleanup on the abort path.
+ */
+static void
+ri_FastPathEndBatch(void *arg)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ /* Flush any partial batches -- can throw ERROR */
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->batch_count > 0)
+ {
+ Relation fk_rel = table_open(entry->riinfo->fk_relid,
+ AccessShareLock);
+
+ ri_FastPathBatchFlush(entry, fk_rel);
+ table_close(fk_rel, NoLock);
+ }
+ }
+
+ /* Orderly teardown */
+ ri_FastPathTeardown();
+}
+
+/*
+ * ri_FastPathTeardown
+ * Tear down all cached fast-path state.
+ *
+ * Called from ri_FastPathEndBatch() after flushing any remaining rows.
+ */
+static void
+ri_FastPathTeardown(void)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->idx_rel)
+ index_close(entry->idx_rel, NoLock);
+ if (entry->pk_rel)
+ table_close(entry->pk_rel, NoLock);
+ if (entry->pk_slot)
+ ExecDropSingleTupleTableSlot(entry->pk_slot);
+ if (entry->fk_slot)
+ ExecDropSingleTupleTableSlot(entry->fk_slot);
+ if (entry->flush_cxt)
+ MemoryContextDelete(entry->flush_cxt);
+ }
+
+ hash_destroy(ri_fastpath_cache);
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static bool ri_fastpath_xact_callback_registered = false;
+
+static void
+ri_FastPathXactCallback(XactEvent event, void *arg)
+{
+ /*
+ * On abort, ResourceOwner already released relations; on commit,
+ * ri_FastPathTeardown already ran. Either way, just NULL the static
+ * pointers so they don't dangle into the next transaction.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static void
+ri_FastPathSubXactCallback(SubXactEvent event, SubTransactionId mySubid,
+ SubTransactionId parentSubid, void *arg)
+{
+ if (event == SUBXACT_EVENT_ABORT_SUB)
+ {
+ /*
+ * ResourceOwner already released relations. NULL the static pointers
+ * so the still-registered batch callback becomes a no-op for the rest
+ * of this transaction.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+ }
+}
+
+/*
+ * ri_FastPathGetEntry
+ * Look up or create a per-batch cache entry for the given constraint.
+ *
+ * On first call for a constraint within a batch: opens pk_rel and the index,
+ * allocates slots for both FK row and the looked up PK row, and registers the
+ * cleanup callback.
+ *
+ * On subsequent calls: returns the existing entry.
+ */
+static RI_FastPathEntry *
+ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+ RI_FastPathEntry *entry;
+ bool found;
+
+ /* Create hash table on first use in this batch */
+ if (ri_fastpath_cache == NULL)
+ {
+ HASHCTL ctl;
+
+ if (!ri_fastpath_xact_callback_registered)
+ {
+ RegisterXactCallback(ri_FastPathXactCallback, NULL);
+ RegisterSubXactCallback(ri_FastPathSubXactCallback, NULL);
+ ri_fastpath_xact_callback_registered = true;
+ }
+
+ ctl.keysize = sizeof(Oid);
+ ctl.entrysize = sizeof(RI_FastPathEntry);
+ ctl.hcxt = TopTransactionContext;
+ ri_fastpath_cache = hash_create("RI fast-path cache",
+ 16,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ }
+
+ entry = hash_search(ri_fastpath_cache, &riinfo->constraint_id,
+ HASH_ENTER, &found);
+
+ if (!found)
+ {
+ MemoryContext oldcxt;
+
+ /*
+ * Zero out non-key fields so ri_FastPathTeardown is safe if we error
+ * out during partial initialization below.
+ */
+ memset(((char *) entry) + offsetof(RI_FastPathEntry, pk_rel), 0,
+ sizeof(RI_FastPathEntry) - offsetof(RI_FastPathEntry, pk_rel));
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+
+ /*
+ * Open PK table and its unique index.
+ *
+ * RowShareLock on pk_rel matches what the SPI path's SELECT ... FOR
+ * KEY SHARE would acquire as a relation-level lock. AccessShareLock
+ * on the index is standard for index scans.
+ *
+ * We don't release these locks until end of transaction, matching SPI
+ * behavior.
+ */
+ entry->pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ entry->idx_rel = index_open(riinfo->conindid, AccessShareLock);
+ entry->pk_slot = table_slot_create(entry->pk_rel, NULL);
+
+ /*
+ * Must be TTSOpsHeapTuple because ExecStoreHeapTuple() is used to
+ * load entries from batch[] into this slot for value extraction.
+ */
+ entry->fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
+ &TTSOpsHeapTuple);
+
+ entry->flush_cxt = AllocSetContextCreate(TopTransactionContext,
+ "RI fast path flush temporary context",
+ ALLOCSET_SMALL_SIZES);
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Ensure cleanup at end of this trigger-firing batch */
+ if (!ri_fastpath_callback_registered)
+ {
+ RegisterAfterTriggerBatchCallback(ri_FastPathEndBatch, NULL);
+ ri_fastpath_callback_registered = true;
+ }
+
+ /*
+ * Store riinfo so ri_FastPathEndBatch() can flush any remaining
+ * buffered rows and reopen the FK relation if needed (e.g. for
+ * deferred constraints, the FK relation may already be closed by the
+ * time the batch ends).
+ */
+ entry->riinfo = riinfo;
+ }
+
+ return entry;
+}
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 27af5284406..7664298f5c8 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -289,4 +289,22 @@ extern void RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel,
extern int RI_FKey_trigger_type(Oid tgfoid);
+/*
+ * Callback type for end-of-trigger-batch notifications.
+ *
+ * Registered via RegisterAfterTriggerBatchCallback(). Invoked when
+ * a batch of after-trigger processing completes:
+ * - AfterTriggerEndQuery() (immediate constraints)
+ * - AfterTriggerFireDeferred() (deferred constraints at COMMIT)
+ * - AfterTriggerSetState() (SET CONSTRAINTS IMMEDIATE)
+ *
+ * The callback list is cleared after each batch. Callers must
+ * re-register if they need to be called again in a subsequent batch.
+ */
+typedef void (*AfterTriggerBatchCallback) (void *arg);
+
+extern void RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg);
+extern bool AfterTriggerBatchIsActive(void);
+
#endif /* TRIGGER_H */
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 0826f518004..2179d2a8e8f 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3504,3 +3504,129 @@ DETAIL: drop cascades to table fkpart13_t1
drop cascades to table fkpart13_t2
drop cascades to table fkpart13_t3
RESET search_path;
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+ERROR: insert or update on table "fp_fk_alter" violates foreign key constraint "fp_fk_alter_a_fkey"
+DETAIL: Key (a)=(101) is not present in table "fp_pk_alter".
+DROP TABLE fp_fk_alter, fp_pk_alter;
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+ERROR: insert or update on table "fp_fk_alter2" violates foreign key constraint "fp_fk_alter2_a_fkey"
+DETAIL: Key (a)=(200) is not present in table "fp_pk_alter2".
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+ERROR: insert or update on table "fp_multi_fk" violates foreign key constraint "fp_multi_fk_b_fkey"
+DETAIL: Key (b)=(2) is not present in table "fp_pk2".
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+ERROR: insert or update on table "fp_fk_defer" violates foreign key constraint "fp_fk_defer_a_fkey"
+DETAIL: Key (a)=(3) is not present in table "fp_pk_defer".
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+ a
+---
+ 1
+ 1
+(2 rows)
+
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+ERROR: insert or update on table "fp_fk_multi" violates foreign key constraint "fp_fk_multi_a_b_fkey"
+DETAIL: Key (a, b)=(999, 999) is not present in table "fp_pk_multi".
+DROP TABLE fp_fk_multi, fp_pk_multi;
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+ERROR: insert or update on table "fp_fk_commit" violates foreign key constraint "fp_fk_commit_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_commit".
+DROP TABLE fp_fk_commit, fp_pk_commit;
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+ERROR: insert or update on table "fp_fk_cross" violates foreign key constraint "fp_fk_cross_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_cross".
+DROP TABLE fp_fk_cross, fp_pk_cross;
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index e9ee29331cb..7a729486bc2 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2498,3 +2498,121 @@ WITH cte AS (
DROP SCHEMA fkpart13 CASCADE;
RESET search_path;
+
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+DROP TABLE fp_fk_alter, fp_pk_alter;
+
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
+
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+DROP TABLE fp_fk_multi, fp_pk_multi;
+
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+DROP TABLE fp_fk_commit, fp_pk_commit;
+
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+DROP TABLE fp_fk_cross, fp_pk_cross;
+
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 66cb18ba5b9..2a5d9387faf 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -30,6 +30,8 @@ AddForeignUpdateTargets_function
AddrInfo
AffixNode
AffixNodeData
+AfterTriggerBatchCallback
+AfterTriggerCallbackItem
AfterTriggerEvent
AfterTriggerEventChunk
AfterTriggerEventData
@@ -2485,6 +2487,7 @@ RIX
RI_CompareHashEntry
RI_CompareKey
RI_ConstraintInfo
+RI_FastPathEntry
RI_QueryHashEntry
RI_QueryKey
RTEKind
--
2.47.3
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Chao Li @ 2026-03-31 09:09 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Junwang Zhao <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
> On Mar 30, 2026, at 19:15, Amit Langote <[email protected]> wrote:
>
> On Mon, Mar 30, 2026 at 1:55 PM Amit Langote <[email protected]> wrote:
>> Junwang pointed out off-list that FK tuples added to
>> RI_FastPathEntry.batch[] were being copied into TopTransactionContext
>> rather than flush_cxt, so they would accumulate until the batch was
>> exhausted rather than being reclaimed per flush. Fixed in
>> ri_FastPathBatchAdd() in 0002.
>>
>> Also added a couple of comments in trigger.c that were missing: an
>> Assert and explanation in RegisterAfterTriggerBatchCallback()
>> clarifying the query_depth >= 0 precondition, a comment at the
>> AfterTriggerEndQuery call site explaining why
>> FireAfterTriggerBatchCallbacks() must precede the query_depth
>> decrement and AfterTriggerFreeQuery, and brief intent comments at the
>> AfterTriggerFireDeferred and AfterTriggerSetState call sites.
>>
>> Plan is to commit 0001 tomorrow barring objections and let it sit for
>> a bit before committing 0002. Feedback on 0002, particularly on the
>> AfterTriggerBatchCallback mechanism in trigger.c, welcome in the
>> meantime.
>
> Kept looking at 0002 and found a couple of things to improve or change
> my thoughts about. I decided to move the permission check from fast
> path cache entry creation into ri_FastPathBatchFlush(), alongside the
> snapshot, so that permission changes between flushes are respected
> rather than checked once at batch start; the check happens for every
> row in the SPI and non-batched fast path. Also, improved comments in
> a few places to mention design decisions better.
>
> 0001 is mostly unchanged from v11 except I updated its commit message
> to explain why only RI_FKey_check is covered and not the action
> triggers as the topic has come up in previous threads about this
> topic.
>
> Still planning to commit 0001 tomorrow.
>
> --
> Thanks, Amit Langote
> <v12-0001-Add-fast-path-for-foreign-key-constraint-checks.patch><v12-0002-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch>
Hi Amit,
While reading the recent commits, I saw that 0001 has been pushed as 2da86c1ef9b5446e0e22c0b6a5846293e58d98e3. However, I also just noticed a use-after-free issue in ri_LoadConstraintInfo(). It dereferences conForm after ReleaseSysCache(tup), which is unsafe. I am attaching a tiny patch to fix that.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Attachments:
[application/octet-stream] v1-0001-Fix-a-use-after-problem-in-ri_LoadConstraintInfo.patch (1007B, 2-v1-0001-Fix-a-use-after-problem-in-ri_LoadConstraintInfo.patch)
download | inline diff:
From 85898f7825631f9a46b057ea486a766484c77f9b Mon Sep 17 00:00:00 2001
From: "Chao Li (Evan)" <[email protected]>
Date: Tue, 31 Mar 2026 17:06:12 +0800
Subject: [PATCH v1] Fix a use-after-problem in ri_LoadConstraintInfo()
Author: Chao Li <[email protected]>
---
src/backend/utils/adt/ri_triggers.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index da7640a8005..94bb180325b 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -2396,8 +2396,6 @@ ri_LoadConstraintInfo(Oid constraintOid)
&riinfo->period_intersect_oper);
}
- ReleaseSysCache(tup);
-
/*
* For efficient processing of invalidation messages below, we keep a
* doubly-linked count list of all currently valid entries.
@@ -2412,6 +2410,8 @@ ri_LoadConstraintInfo(Oid constraintOid)
riinfo->fpmeta = NULL;
+ ReleaseSysCache(tup);
+
return riinfo;
}
--
2.50.1 (Apple Git-155)
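For readers following along, the ordering problem that the patch above addresses can be sketched in a few lines of standalone C. This is only an illustration under stated assumptions: it does not use PostgreSQL's syscache API, and `MockTuple`, `mock_search()`, `mock_release()`, `POISON`, and the two `load_*` functions are hypothetical names invented here. The poisoning on release is likewise an assumption, meant to mimic a build that recycles a cache entry as soon as its last pin is dropped.

```c
#include <assert.h>

#define POISON (-1)             /* hypothetical marker for a recycled entry */

/* Hypothetical stand-in for a pinned catalog cache tuple. */
typedef struct MockTuple
{
    int refcount;               /* number of pins currently held */
    int deltype;                /* field the caller wants to copy out */
} MockTuple;

static MockTuple slot;          /* single-entry mock cache */

/* Pin the entry, (re)loading its contents if nobody held it. */
MockTuple *
mock_search(void)
{
    if (slot.refcount == 0)
        slot.deltype = 'c';
    slot.refcount++;
    return &slot;
}

/* Drop one pin; once unpinned, the contents are no longer trustworthy. */
void
mock_release(MockTuple *tup)
{
    assert(tup->refcount > 0);
    if (--tup->refcount == 0)
        tup->deltype = POISON;
}

/* Buggy ordering: the tuple is read after its pin has been dropped. */
int
load_release_too_early(void)
{
    MockTuple *tup = mock_search();

    mock_release(tup);
    return tup->deltype;        /* reads the recycled entry */
}

/* Fixed ordering: copy what is needed, then release as the last step. */
int
load_release_last(void)
{
    MockTuple *tup = mock_search();
    int deltype = tup->deltype;

    mock_release(tup);
    return deltype;
}
```

In this mock, `load_release_too_early()` returns `POISON` while `load_release_last()` returns `'c'`. In a real backend the early-release ordering may appear to work on ordinary builds, since the entry usually stays in place after release, which is why such bugs tend to surface only on buildfarm animals with stricter cache settings.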
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Amit Langote @ 2026-03-31 09:17 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Junwang Zhao <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
Hi,
On Tue, Mar 31, 2026 at 6:09 PM Chao Li <[email protected]> wrote:
> > On Mar 30, 2026, at 19:15, Amit Langote <[email protected]> wrote:
> >
> > On Mon, Mar 30, 2026 at 1:55 PM Amit Langote <[email protected]> wrote:
> >> Junwang pointed out off-list that FK tuples added to
> >> RI_FastPathEntry.batch[] were being copied into TopTransactionContext
> >> rather than flush_cxt, so they would accumulate until the batch was
> >> exhausted rather than being reclaimed per flush. Fixed in
> >> ri_FastPathBatchAdd() in 0002.
> >>
> >> Also added a couple of comments in trigger.c that were missing: an
> >> Assert and explanation in RegisterAfterTriggerBatchCallback()
> >> clarifying the query_depth >= 0 precondition, a comment at the
> >> AfterTriggerEndQuery call site explaining why
> >> FireAfterTriggerBatchCallbacks() must precede the query_depth
> >> decrement and AfterTriggerFreeQuery, and brief intent comments at the
> >> AfterTriggerFireDeferred and AfterTriggerSetState call sites.
> >>
> >> Plan is to commit 0001 tomorrow barring objections and let it sit for
> >> a bit before committing 0002. Feedback on 0002, particularly on the
> >> AfterTriggerBatchCallback mechanism in trigger.c, welcome in the
> >> meantime.
> >
> > Kept looking at 0002 and found a couple of things to improve or change
> > my thoughts about. I decided to move the permission check from fast
> > path cache entry creation into ri_FastPathBatchFlush(), alongside the
> > snapshot, so that permission changes between flushes are respected
> > rather than checked once at batch start; the check happens for every
> > row in the SPI and non-batched fast path. Also, improved comments in
> > a few places to mention design decisions better.
> >
> > 0001 is mostly unchanged from v11 except I updated its commit message
> > to explain why only RI_FKey_check is covered and not the action
> > triggers as the topic has come up in previous threads about this
> > topic.
> >
> > Still planning to commit 0001 tomorrow.
> >
> > --
> > Thanks, Amit Langote
> > <v12-0001-Add-fast-path-for-foreign-key-constraint-checks.patch><v12-0002-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch>
>
> Hi Amit,
>
> While reading the recent commits, I saw that 0001 has been pushed as 2da86c1ef9b5446e0e22c0b6a5846293e58d98e3. However, I also just noticed a use-after-free issue in ri_LoadConstraintInfo(). It dereferences conForm after ReleaseSysCache(tup), which is unsafe. I am attaching a tiny patch to fix that.
Thanks. I noticed that too and pushed the fix an hour ago:
https://www.postgresql.org/message-id/E1w7U6V-002H6n-0o%40gemulon.postgresql.org
--
Thanks, Amit Langote
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Junwang Zhao @ 2026-03-31 10:57 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Chao Li <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
On Tue, Mar 31, 2026 at 5:17 PM Amit Langote <[email protected]> wrote:
>
> Hi,
>
> On Tue, Mar 31, 2026 at 6:09 PM Chao Li <[email protected]> wrote:
> > > On Mar 30, 2026, at 19:15, Amit Langote <[email protected]> wrote:
> > >
> > > On Mon, Mar 30, 2026 at 1:55 PM Amit Langote <[email protected]> wrote:
> > >> Junwang pointed out off-list that FK tuples added to
> > >> RI_FastPathEntry.batch[] were being copied into TopTransactionContext
> > >> rather than flush_cxt, so they would accumulate until the batch was
> > >> exhausted rather than being reclaimed per flush. Fixed in
> > >> ri_FastPathBatchAdd() in 0002.
> > >>
> > >> Also added a couple of comments in trigger.c that were missing: an
> > >> Assert and explanation in RegisterAfterTriggerBatchCallback()
> > >> clarifying the query_depth >= 0 precondition, a comment at the
> > >> AfterTriggerEndQuery call site explaining why
> > >> FireAfterTriggerBatchCallbacks() must precede the query_depth
> > >> decrement and AfterTriggerFreeQuery, and brief intent comments at the
> > >> AfterTriggerFireDeferred and AfterTriggerSetState call sites.
> > >>
> > >> Plan is to commit 0001 tomorrow barring objections and let it sit for
> > >> a bit before committing 0002. Feedback on 0002, particularly on the
> > >> AfterTriggerBatchCallback mechanism in trigger.c, welcome in the
> > >> meantime.
> > >
> > > Kept looking at 0002 and found a couple of things to improve or change
> > > my thoughts about. I decided to move the permission check from fast
> > > path cache entry creation into ri_FastPathBatchFlush(), alongside the
> > > snapshot, so that permission changes between flushes are respected
> > > rather than checked once at batch start; the check happens for every
> > > row in the SPI and non-batched fast path. Also, improved comments in
> > > a few places to mention design decisions better.
> > >
> > > 0001 is mostly unchanged from v11 except I updated its commit message
> > > to explain why only RI_FKey_check is covered and not the action
> > > triggers as the topic has come up in previous threads about this
> > > topic.
> > >
> > > Still planning to commit 0001 tomorrow.
> > >
> > > --
> > > Thanks, Amit Langote
> > > <v12-0001-Add-fast-path-for-foreign-key-constraint-checks.patch><v12-0002-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch>
> >
> > Hi Amit,
> >
> > While reading the recent commits, I saw that 0001 has been pushed as 2da86c1ef9b5446e0e22c0b6a5846293e58d98e3. However, I also just noticed a use-after-free issue in ri_LoadConstraintInfo(). It dereferences conForm after ReleaseSysCache(tup), which is unsafe. I am attaching a tiny patch to fix that.
>
> Thanks. I noticed that too and pushed the fix an hour ago:
>
> https://www.postgresql.org/message-id/E1w7U6V-002H6n-0o%40gemulon.postgresql.org
>
> --
> Thanks, Amit Langote
prion is happy now, the fix works, thanks.
--
Regards
Junwang Zhao
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Amit Langote @ 2026-03-31 11:26 UTC (permalink / raw)
To: Daniel Gustafsson <[email protected]>; +Cc: Junwang Zhao <[email protected]>; Chao Li <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
On Tue, Mar 31, 2026 at 8:22 PM Daniel Gustafsson <[email protected]> wrote:
> > On 31 Mar 2026, at 12:57, Junwang Zhao <[email protected]> wrote:
>
> > prion is happy now, the fix works, thanks.
>
> The widowbird failure seems to be SPI related as well, relevant portion of log
> below. Is that the same or another error?
prion was unhappy about something else, which I've fixed:
https://www.postgresql.org/message-id/E1w7U6V-002H6n-0o%40gemulon.postgresql.org
Though I'm not sure whether, or why, that fix would now be the reason for
widowbird's failure.
> echo "# +++ tap install-check in src/test/modules/worker_spi +++" && rm -rf '/mnt/data/buildfarm/buildroot/HEAD/pgsql.build/src/test/modules/worker_spi'/tmp_check && /usr/bin/mkdir -p '/mnt/data/buildfarm/buildroot/HEAD/pgsql.build/src/test/modules/worker_spi'/tmp_check && cd . && TESTLOGDIR='/mnt/data/buildfarm/buildroot/HEAD/pgsql.build/src/test/modules/worker_spi/tmp_check/log' TESTDATADIR='/mnt/data/buildfarm/buildroot/HEAD/pgsql.build/src/test/modules/worker_spi/tmp_check' PATH="/mnt/data/buildfarm/buildroot/HEAD/inst/bin:/mnt/data/buildfarm/buildroot/HEAD/pgsql.build/src/test/modules/worker_spi:$PATH" PGPORT='65678' top_builddir='/mnt/data/buildfarm/buildroot/HEAD/pgsql.build/src/test/modules/worker_spi/../../../..' PG_REGRESS='/mnt/data/buildfarm/buildroot/HEAD/pgsql.build/src/test/modules/worker_spi/../../../../src/test/regress/pg_regress' share_contrib_dir='/mnt/data/buildfarm/buildroot/HEAD/inst/share/postgresql/extension' /usr/bin/prove -I ../../../../src/test/perl/ -I . t/*.pl
> # +++ tap install-check in src/test/modules/worker_spi +++
> t/001_worker_spi.pl ........ ok
> # Tests were run but no plan was declared and done_testing() was not seen.
> # Looks like your test exited with 29 just after 8.
> t/002_worker_terminate.pl ..
> Dubious, test returned 29 (wstat 7424, 0x1d00)
> All 8 subtests passed
>
> Test Summary Report
> -------------------
> t/002_worker_terminate.pl (Wstat: 7424 (exited 29) Tests: 8 Failed: 0)
> Non-zero exit status: 29
> Parse errors: No plan found in TAP output
> Files=2, Tests=16, 28 wallclock secs ( 0.08 usr 0.01 sys + 6.63 cusr 2.93 csys = 9.65 CPU)
> Result: FAIL
> make[1]: *** [../../../../src/makefiles/pgxs.mk:439: installcheck] Error 1
> make[1]: Leaving directory '/mnt/data/buildfarm/buildroot/HEAD/pgsql.build/src/test/modules/worker_spi'
> make: *** [Makefile:87: installcheck-worker_spi-recurse] Error 2
> log files for step testmodules-install-check-en_GB.UTF-8:
Not sure what's going on here or how it's related to 68a8601ee.
--
Thanks, Amit Langote
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Daniel Gustafsson @ 2026-03-31 11:35 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Junwang Zhao <[email protected]>; Chao Li <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
> On 31 Mar 2026, at 13:26, Amit Langote <[email protected]> wrote:
>
> On Tue, Mar 31, 2026 at 8:22 PM Daniel Gustafsson <[email protected]> wrote:
>>> On 31 Mar 2026, at 12:57, Junwang Zhao <[email protected]> wrote:
>>
>>> prion is happy now, the fix works, thanks.
>>
>> The widowbird failure seems to be SPI related as well, relevant portion of log
>> below. Is that the same or another error?
>
> prion was unhappy about something else, which I've fixed:
> https://www.postgresql.org/message-id/E1w7U6V-002H6n-0o%40gemulon.postgresql.org
>
> Though, I'm not sure if or why the fix is now the reason for
> widowbird's failure.
>
>> echo "# +++ tap install-check in src/test/modules/worker_spi +++" && rm -rf '/mnt/data/buildfarm/buildroot/HEAD/pgsql.build/src/test/modules/worker_spi'/tmp_check && /usr/bin/mkdir -p '/mnt/data/buildfarm/buildroot/HEAD/pgsql.build/src/test/modules/worker_spi'/tmp_check && cd . && TESTLOGDIR='/mnt/data/buildfarm/buildroot/HEAD/pgsql.build/src/test/modules/worker_spi/tmp_check/log' TESTDATADIR='/mnt/data/buildfarm/buildroot/HEAD/pgsql.build/src/test/modules/worker_spi/tmp_check' PATH="/mnt/data/buildfarm/buildroot/HEAD/inst/bin:/mnt/data/buildfarm/buildroot/HEAD/pgsql.build/src/test/modules/worker_spi:$PATH" PGPORT='65678' top_builddir='/mnt/data/buildfarm/buildroot/HEAD/pgsql.build/src/test/modules/worker_spi/../../../..' PG_REGRESS='/mnt/data/buildfarm/buildroot/HEAD/pgsql.build/src/test/modules/worker_spi/../../../../src/test/regress/pg_regress' share_contrib_dir='/mnt/data/buildfarm/buildroot/HEAD/inst/share/postgresql/extension' /usr/bin/prove -I ../../../../src/test/perl/ -I . t/*.pl
>> # +++ tap install-check in src/test/modules/worker_spi +++
>> t/001_worker_spi.pl ........ ok
>> # Tests were run but no plan was declared and done_testing() was not seen.
>> # Looks like your test exited with 29 just after 8.
>> t/002_worker_terminate.pl ..
>> Dubious, test returned 29 (wstat 7424, 0x1d00)
>> All 8 subtests passed
>>
>> Test Summary Report
>> -------------------
>> t/002_worker_terminate.pl (Wstat: 7424 (exited 29) Tests: 8 Failed: 0)
>> Non-zero exit status: 29
>> Parse errors: No plan found in TAP output
>> Files=2, Tests=16, 28 wallclock secs ( 0.08 usr 0.01 sys + 6.63 cusr 2.93 csys = 9.65 CPU)
>> Result: FAIL
>> make[1]: *** [../../../../src/makefiles/pgxs.mk:439: installcheck] Error 1
>> make[1]: Leaving directory '/mnt/data/buildfarm/buildroot/HEAD/pgsql.build/src/test/modules/worker_spi'
>> make: *** [Makefile:87: installcheck-worker_spi-recurse] Error 2
>> log files for step testmodules-install-check-en_GB.UTF-8:
>
> Not sure what's going on here or how it's related to 68a8601ee.
Not sure either, I just saw SPI when digging and wanted to check, but I agree
that it doesn't seem related. More log excerpts:
[11:15:49.966](3.762s) ok 1 - dynamic bgworker 0 launched
[11:15:50.372](0.407s) ok 2 - background worker blocked the database creation
[11:15:50.413](0.041s) ok 3 - background worker is still running after CREATE DATABASE WITH TEMPLATE
[11:15:50.663](0.250s) ok 4 - dynamic bgworker 1 launched
[11:15:51.085](0.421s) ok 5 - dynamic bgworker stopped for CREATE DATABASE WITH TEMPLATE
[11:15:51.264](0.179s) ok 6 - dynamic bgworker 2 launched
[11:15:51.438](0.175s) ok 7 - dynamic bgworker stopped for ALTER DATABASE RENAME
[11:15:51.655](0.216s) ok 8 - dynamic bgworker 3 launched
error running SQL: 'psql:<stdin>:1: ERROR: database "renameddb" is being accessed by other users
DETAIL: There is 1 other session using the database.'
while running 'psql --no-psqlrc --no-align --tuples-only --quiet --dbname port=17048 host=/mnt/data/buildfarm/buildroot/tmp/Li3aVcqeUa dbname='postgres' --file - --variable ON_ERROR_STOP=1' with sql 'ALTER DATABASE renameddb SET TABLESPACE test_tablespace' at /mnt/data/buildfarm/buildroot/HEAD/pgsql.build/src/test/modules/worker_spi/../../../../src/test/perl/PostgreSQL/Test/Cluster.pm line 2335.
# Postmaster PID for node "mynode" is 792281
### Stopping node "mynode" using mode immediate
# Running: pg_ctl --pgdata /mnt/data/buildfarm/buildroot/HEAD/pgsql.build/src/test/modules/worker_spi/tmp_check/t_002_worker_terminate_mynode_data/pgdata --mode immediate stop
waiting for server to shut down...... done
server stopped
# No postmaster PID for node "mynode"
[11:16:02.658](11.003s) # Tests were run but no plan was declared and done_testing() was not seen.
[11:16:02.659](0.001s) # Looks like your test exited with 29 just after 8.
2026-03-31 11:15:51.686 UTC [792340:4] 002_worker_terminate.pl LOG: statement: ALTER DATABASE renameddb SET TABLESPACE test_tablespace
2026-03-31 11:15:51.687 UTC [792340:5] 002_worker_terminate.pl DEBUG: attempting worker termination for database 16413
2026-03-31 11:15:51.687 UTC [792340:6] 002_worker_terminate.pl DEBUG: termination requested for worker (PID 792336) on database 16413
2026-03-31 11:15:51.787 UTC [792340:7] 002_worker_terminate.pl DEBUG: attempting worker termination for database 16413
2026-03-31 11:15:51.787 UTC [792340:8] 002_worker_terminate.pl DEBUG: termination requested for worker (PID 792336) on database 16413
2026-03-31 11:15:51.887 UTC [792340:9] 002_worker_terminate.pl DEBUG: attempting worker termination for database 16413
2026-03-31 11:15:51.887 UTC [792340:10] 002_worker_terminate.pl DEBUG: termination requested for worker (PID 792336) on database 16413
<-- snip -->
2026-03-31 11:15:57.088 UTC [792340:102] 002_worker_terminate.pl DEBUG: termination requested for worker (PID 792336) on database 16413
2026-03-31 11:15:57.188 UTC [792340:103] 002_worker_terminate.pl DEBUG: attempting worker termination for database 16413
2026-03-31 11:15:57.188 UTC [792340:104] 002_worker_terminate.pl DEBUG: termination requested for worker (PID 792336) on database 16413
2026-03-31 11:15:57.288 UTC [792340:105] 002_worker_terminate.pl ERROR: database "renameddb" is being accessed by other users
2026-03-31 11:15:57.288 UTC [792340:106] 002_worker_terminate.pl DETAIL: There is 1 other session using the database.
2026-03-31 11:15:57.288 UTC [792340:107] 002_worker_terminate.pl STATEMENT: ALTER DATABASE renameddb SET TABLESPACE test_tablespace
2026-03-31 11:15:57.289 UTC [792340:108] 002_worker_terminate.pl LOG: disconnection: session time: 0:00:05.611 user=buildfarm database=postgres host=[local]
2026-03-31 11:15:57.300 UTC [792281:23] LOG: received immediate shutdown request
2026-03-31 11:15:57.300 UTC [792281:24] DEBUG: updating PMState from PM_RUN to PM_WAIT_BACKENDS
2026-03-31 11:16:02.309 UTC [792281:25] LOG: issuing SIGKILL to recalcitrant children
2026-03-31 11:16:02.547 UTC [792281:26] DEBUG: updating PMState from PM_WAIT_BACKENDS to PM_WAIT_DEAD_END
2026-03-31 11:16:02.547 UTC [792281:27] DEBUG: updating PMState from PM_WAIT_DEAD_END to PM_NO_CHILDREN
2026-03-31 11:16:02.574 UTC [792281:28] LOG: database system is shut down
--
Daniel Gustafsson
* Re: Eliminating SPI / SQL from some RI triggers - take 3
2024-12-20 04:23 Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 04:07 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 05:10 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Pavel Stehule <[email protected]>
2025-10-22 13:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-12-01 06:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-02 15:30 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-10 12:28 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-18 15:34 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-19 16:19 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-24 11:47 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-24 13:56 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-25 00:41 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-30 04:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-30 11:15 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-31 09:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Chao Li <[email protected]>
2026-03-31 09:17 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-31 10:57 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-31 11:26 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-31 11:35 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Daniel Gustafsson <[email protected]>
@ 2026-03-31 13:33 ` Tomas Vondra <[email protected]>
2026-04-01 00:06 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
0 siblings, 1 reply; 61+ messages in thread
From: Tomas Vondra @ 2026-03-31 13:33 UTC (permalink / raw)
To: Daniel Gustafsson <[email protected]>; Amit Langote <[email protected]>; +Cc: Junwang Zhao <[email protected]>; Chao Li <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers
On 3/31/26 13:35, Daniel Gustafsson wrote:
>> On 31 Mar 2026, at 13:26, Amit Langote <[email protected]> wrote:
>>
>> On Tue, Mar 31, 2026 at 8:22 PM Daniel Gustafsson <[email protected]> wrote:
>>>> On 31 Mar 2026, at 12:57, Junwang Zhao <[email protected]> wrote:
>>>
>>>> prion is happy now, the fix works, thanks.
>>>
>>> The widowbird failure seems to be SPI related as well, relevant portion of log
>>> below. Is that the same or another error?
>>
>> prion was unhappy about something else, which I've fixed:
>> https://www.postgresql.org/message-id/E1w7U6V-002H6n-0o%40gemulon.postgresql.org
>>
>> Though, I'm not sure if or why the fix is now the reason for
>> widowbird's failure.
>>
Right, that failure doesn't seem related. It first appeared ~2 weeks
ago, i.e. before this got committed.
I don't know what triggered it. It might even be a simple timing issue.
This is a rpi4 machine, running from a USB flashdrive, so it's pretty
slow, and a process can occasionally "hang" for a little bit and not
disconnect quickly enough.
Not sure if something changed ~2 weeks ago. It might also be the flash
drive getting flaky (even though I don't see anything in dmesg).
Anyway, this is likely unrelated to the commit.
regards
--
Tomas Vondra
* Re: Eliminating SPI / SQL from some RI triggers - take 3
2024-12-20 04:23 Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 04:07 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 05:10 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Pavel Stehule <[email protected]>
2025-10-22 13:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-12-01 06:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-02 15:30 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-10 12:28 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-18 15:34 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-19 16:19 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-24 11:47 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-24 13:56 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-25 00:41 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-30 04:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-30 11:15 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-31 09:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Chao Li <[email protected]>
2026-03-31 09:17 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-31 10:57 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-31 11:26 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-31 11:35 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Daniel Gustafsson <[email protected]>
2026-03-31 13:33 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Tomas Vondra <[email protected]>
@ 2026-04-01 00:06 ` Amit Langote <[email protected]>
0 siblings, 0 replies; 61+ messages in thread
From: Amit Langote @ 2026-04-01 00:06 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Daniel Gustafsson <[email protected]>; Junwang Zhao <[email protected]>; Chao Li <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers
On Tue, Mar 31, 2026 at 10:33 PM Tomas Vondra <[email protected]> wrote:
> On 3/31/26 13:35, Daniel Gustafsson wrote:
> >> On 31 Mar 2026, at 13:26, Amit Langote <[email protected]> wrote:
> >>
> >> On Tue, Mar 31, 2026 at 8:22 PM Daniel Gustafsson <[email protected]> wrote:
> >>>> On 31 Mar 2026, at 12:57, Junwang Zhao <[email protected]> wrote:
> >>>
> >>>> prion is happy now, the fix works, thanks.
> >>>
> >>> The widowbird failure seems to be SPI related as well, relevant portion of log
> >>> below. Is that the same or another error?
> >>
> >> prion was unhappy about something else, which I've fixed:
> >> https://www.postgresql.org/message-id/E1w7U6V-002H6n-0o%40gemulon.postgresql.org
> >>
> >> Though, I'm not sure if or why the fix is now the reason for
> >> widowbird's failure.
> >>
>
> Right, that failure doesn't seem related. It first appeared ~2 weeks
> ago, i.e. before this got committed.
>
> I don't know what triggered it. It might even be a simple timing issue.
> This is a rpi4 machine, running from a USB flashdrive, so it's pretty
> slow, and a process can occasionally "hang" for a little bit and not
> disconnect quickly enough.
>
> Not sure if something changed ~2 weeks ago. It might also be the flash
> drive getting flaky (even though I don't see anything in dmesg).
I see, thanks for checking that.
--
Thanks, Amit Langote
* Re: Eliminating SPI / SQL from some RI triggers - take 3
2024-12-20 04:23 Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 04:07 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 05:10 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Pavel Stehule <[email protected]>
2025-10-22 13:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-12-01 06:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-02 15:30 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-10 12:28 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-18 15:34 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-19 16:19 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-24 11:47 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-24 13:56 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-25 00:41 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-30 04:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-30 11:15 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-31 09:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Chao Li <[email protected]>
2026-03-31 09:17 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-31 10:57 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
@ 2026-03-31 12:15 ` Amit Langote <[email protected]>
2026-03-31 15:54 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
1 sibling, 1 reply; 61+ messages in thread
From: Amit Langote @ 2026-03-31 12:15 UTC (permalink / raw)
To: Junwang Zhao <[email protected]>; +Cc: Chao Li <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
On Tue, Mar 31, 2026 at 7:57 PM Junwang Zhao <[email protected]> wrote:
> On Tue, Mar 31, 2026 at 5:17 PM Amit Langote <[email protected]> wrote:
> > On Tue, Mar 31, 2026 at 6:09 PM Chao Li <[email protected]> wrote:
> > > > On Mar 30, 2026, at 19:15, Amit Langote <[email protected]> wrote:
> > > > Kept looking at 0002 and found a couple of things to improve or change
> > > > my thoughts about. I decided to move the permission check from fast
> > > > path cache entry creation into ri_FastPathBatchFlush(), alongside the
> > > > snapshot, so that permission changes between flushes are respected
> > > > rather than checked once at batch start; the check happens for every
> > > > row in the SPI and non-batched fast path. Also, improved comments in
> > > > a few places to mention design decisions better.
> > > >
> > > > 0001 is mostly unchanged from v11 except I updated its commit message
> > > > to explain why only RI_FKey_check is covered and not the action
> > > > triggers as the topic has come up in previous threads about this
> > > > topic.
> > > >
> > > > Still planning to commit 0001 tomorrow.
> > > >
> > > > --
> > > > Thanks, Amit Langote
> > > > <v12-0001-Add-fast-path-for-foreign-key-constraint-checks.patch><v12-0002-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch>
> > >
> > > Hi Amit,
> > >
> > > While reading the recent commits, I saw that 0001 has been pushed as 2da86c1ef9b5446e0e22c0b6a5846293e58d98e3. However, I also just noticed a use-after-free issue in ri_LoadConstraintInfo(). It dereferences conForm after ReleaseSysCache(tup), which is unsafe. I am attaching a tiny patch to fix that.
> >
> > Thanks. I noticed that too and pushed the fix an hour ago:
> >
> > https://www.postgresql.org/message-id/E1w7U6V-002H6n-0o%40gemulon.postgresql.org
> >
> > --
> > Thanks, Amit Langote
>
> prion is happy now, the fix works, thanks.
Yep, good.
Because prion exposed a use-after-free, I decided to check how well we
hold up under CLOBBER_CACHE_ALWAYS and found two issues in the
committed patch (and in similar code in 0002): riinfo going stale
inside ri_FastPathCheck() after the relation opens, and a dangling
fpmeta pointer after riinfo invalidation. 0001 fixes those; I'll apply
it tomorrow morning.
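For readers following along, the second issue (the dangling fpmeta
pointer) can be illustrated with a minimal standalone sketch. It is not
the patch code: ConstraintEntry, FastPathMeta, entry_invalidate() and
entry_get_meta() are hypothetical stand-ins for RI_ConstraintInfo, its
fpmeta field, the invalidation callback and the lazy population step,
and malloc/free stand in for palloc/pfree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the cached fast-path metadata. */
typedef struct FastPathMeta
{
	int			nkeys;
} FastPathMeta;

/* Hypothetical stand-in for a constraint cache entry. */
typedef struct ConstraintEntry
{
	bool		valid;
	FastPathMeta *fpmeta;		/* lazily built, owned by the entry */
} ConstraintEntry;

/*
 * Invalidation: mark the entry stale and also drop derived state, so
 * that a later "fpmeta == NULL" guard really triggers repopulation.
 */
static void
entry_invalidate(ConstraintEntry *e)
{
	e->valid = false;
	if (e->fpmeta)
	{
		free(e->fpmeta);
		e->fpmeta = NULL;
	}
}

/*
 * Lazy accessor: revalidate the entry and rebuild the metadata when it
 * is missing.  Returns NULL only if allocation fails.
 */
static FastPathMeta *
entry_get_meta(ConstraintEntry *e, int nkeys)
{
	if (e->fpmeta == NULL)
	{
		e->valid = true;		/* stands in for reloading the entry */
		e->fpmeta = malloc(sizeof(FastPathMeta));
		if (e->fpmeta == NULL)
			return NULL;
		e->fpmeta->nkeys = nkeys;
	}
	return e->fpmeta;
}
```

Without the free-and-NULL step in entry_invalidate(), the guard in
entry_get_meta() would skip repopulation and keep using stale metadata,
which is the failure mode the 0001 commit message describes.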
0002 is the rebased batching patch.
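The batching idea in 0002, as its commit message describes it, is:
buffer rows per constraint, flush when the buffer reaches 64 rows, and
flush the remainder from a callback fired at the end of the
trigger-firing batch. A rough standalone sketch under those assumptions
follows. All names here (Batch, batch_add, batch_flush,
register_batch_callback, fire_batch_callbacks) are hypothetical, a
single callback slot stands in for the patch's callback list, and a
flush merely counts rows rather than probing an index.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 64			/* buffer size stated in the commit message */

typedef void (*BatchCallback) (void *arg);

/* Hypothetical per-constraint buffer. */
typedef struct Batch
{
	int			keys[BATCH_SIZE];
	int			nkeys;			/* rows currently buffered */
	int			nflushes;		/* number of flushes performed */
	int			nprobed;		/* total rows handed to a flush */
	int			registered;		/* end-of-batch callback pending? */
} Batch;

/* One pending callback slot; the real patch keeps a list. */
static BatchCallback pending_cb = NULL;
static void *pending_arg = NULL;

static void
register_batch_callback(BatchCallback cb, void *arg)
{
	pending_cb = cb;
	pending_arg = arg;
}

/* Invoke and clear the pending callback, so each batch must re-register. */
static void
fire_batch_callbacks(void)
{
	BatchCallback cb = pending_cb;
	void	   *arg = pending_arg;

	pending_cb = NULL;
	pending_arg = NULL;
	if (cb)
		cb(arg);
}

/* Process all buffered rows at once; here we only count them. */
static void
batch_flush(Batch *b)
{
	if (b->nkeys == 0)
		return;
	b->nprobed += b->nkeys;
	b->nflushes++;
	b->nkeys = 0;
}

/* End-of-batch callback: flush the partial batch and allow re-registration. */
static void
batch_end(void *arg)
{
	Batch	   *b = (Batch *) arg;

	batch_flush(b);
	b->registered = 0;
}

/* Buffer one row; flush when the buffer is full. */
static void
batch_add(Batch *b, int key)
{
	if (!b->registered)
	{
		register_batch_callback(batch_end, b);
		b->registered = 1;
	}
	b->keys[b->nkeys++] = key;
	if (b->nkeys == BATCH_SIZE)
		batch_flush(b);
}
```

With 100 rows added, this sketch flushes once when the buffer fills and
leaves 36 rows for the end-of-batch callback. The sketch ignores the
nested-query depth guard and the abort paths that the commit message
covers separately.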
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v13-0001-Fix-two-issues-in-fast-path-FK-check-introduced-.patch (3.4K, 2-v13-0001-Fix-two-issues-in-fast-path-FK-check-introduced-.patch)
From 5ea789323d2005b6df8e0adc88151823e1fbed28 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 31 Mar 2026 20:53:43 +0900
Subject: [PATCH v13 1/2] Fix two issues in fast-path FK check introduced by
commit 2da86c1ef9
First, under CLOBBER_CACHE_ALWAYS, the RI_ConstraintInfo entry can
be invalidated by relcache callbacks triggered inside table_open()
or index_open(), leaving ri_FastPathCheck() calling
ri_populate_fastpath_metadata() with a stale entry whose valid flag
is false. Fix by reloading riinfo after the relation opens and
populating fpmeta immediately, then calling ri_ExtractValues() and
build_index_scankeys() before any further operations that could
trigger invalidation.
Second, fpmeta was not freed or cleared when the entry was
invalidated in InvalidateConstraintCacheCallBack(), leaving a
dangling pointer that caused a crash on the next invocation when
the guard `if (riinfo->fpmeta == NULL)` incorrectly skipped
repopulation. Fix by freeing and NULLing fpmeta at invalidation
time.
Noticed locally when testing with CLOBBER_CACHE_ALWAYS.
---
src/backend/utils/adt/ri_triggers.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index ffaa0e749cb..d57343f85d8 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -2486,6 +2486,16 @@ InvalidateConstraintCacheCallBack(Datum arg, SysCacheIdentifier cacheid,
riinfo->rootHashValue == hashvalue)
{
riinfo->valid = false;
+ /*
+ * Free and clear any cached fast-path metadata so the next use
+ * repopulates it from scratch rather than following a dangling
+ * pointer.
+ */
+ if (riinfo->fpmeta)
+ {
+ pfree(riinfo->fpmeta);
+ riinfo->fpmeta = NULL;
+ }
/* Remove invalidated entries from the list, too */
dclist_delete_from(&ri_constraint_cache_valid_list, iter.cur);
}
@@ -2714,17 +2724,23 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
pk_rel = table_open(riinfo->pk_relid, RowShareLock);
idx_rel = index_open(riinfo->conindid, AccessShareLock);
+ if (riinfo->fpmeta == NULL)
+ {
+ /* Reload to ensure it's valid. */
+ riinfo = ri_LoadConstraintInfo(riinfo->constraint_id);
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ }
+ Assert(riinfo->fpmeta);
+ ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+
slot = table_slot_create(pk_rel, NULL);
scandesc = index_beginscan(pk_rel, idx_rel,
snapshot, NULL,
riinfo->nkeys, 0,
SO_NONE);
- if (riinfo->fpmeta == NULL)
- ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
- fk_rel, idx_rel);
- Assert(riinfo->fpmeta);
-
GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
saved_sec_context |
@@ -2732,8 +2748,6 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
SECURITY_NOFORCE_RLS);
ri_CheckPermissions(pk_rel);
- ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
- build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, slot,
snapshot, riinfo, skey, riinfo->nkeys);
SetUserIdAndSecContext(saved_userid, saved_sec_context);
--
2.47.3
[application/octet-stream] v13-0002-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch (43.4K, 3-v13-0002-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch)
From 2d39a8cb90c16d586d5fd3f0d6f880ddbb578247 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 31 Mar 2026 18:22:23 +0900
Subject: [PATCH v13 2/2] Batch FK rows and use SK_SEARCHARRAY for fast-path FK
probes
Instead of probing the PK index on each trigger invocation, buffer
FK rows in a new per-constraint cache entry (RI_FastPathEntry) and
flush them as a batch. Combined with the fast path introduced in
the previous commit, bulk FK inserts are ~2.9x faster (int PK /
int FK, 1M rows, PK table and index cached in memory).
On each trigger invocation, the new ri_FastPathBatchAdd() buffers
the FK row in RI_FastPathEntry. When the buffer fills (64 rows)
or the trigger-firing cycle ends, the new ri_FastPathBatchFlush()
probes the index for all buffered rows, sharing a single
CommandCounterIncrement, snapshot, permission check, and security
context switch across the batch, rather than repeating each per row
as the SPI path does. Per-flush CCI is safe because all AFTER
triggers for the buffered rows have already fired by flush time.
For single-column foreign keys, the flush builds an ArrayType from
the buffered FK values (casting to the PK-side type if needed) and
constructs a scan key with the SK_SEARCHARRAY flag. The index AM
sorts and deduplicates the array internally, then walks matching
leaf pages in one ordered traversal instead of descending from the
root once per row. A matched[] bitmap tracks which batch items
were satisfied; the first unmatched item is reported as a
violation. Multi-column foreign keys fall back to per-row probing
via the new ri_FastPathFlushLoop().
FK tuples are materialized via ExecCopySlotHeapTuple() into a new
purpose-specific memory context (flush_cxt), child of
TopTransactionContext, which is also used for per-flush transient
work: cast results, the search array, and index scan allocations.
It is reset after each flush and deleted in teardown.
The PK relation, index, tuple slots, and fast-path metadata are
cached in RI_FastPathEntry across trigger invocations within a
trigger-firing batch, avoiding repeated open/close overhead. The
snapshot and IndexScanDesc are taken fresh per flush.
ri_FastPathEndBatch() flushes any partial batch before tearing
down cached resources. Since the FK relation may already be
closed by flush time (e.g. for deferred constraints at COMMIT),
it reopens the relation using entry->fk_relid if needed.
The existing ALTER TABLE validation path bypasses batching and
continues to call ri_FastPathCheck() directly per row.
Lifecycle management for RI_FastPathEntry relies on the following
new mechanisms:
- AfterTriggerBatchCallback: A new general-purpose callback
mechanism in trigger.c. Callbacks registered via
RegisterAfterTriggerBatchCallback() fire at the end of each
trigger-firing batch (AfterTriggerEndQuery for immediate
constraints, AfterTriggerFireDeferred at COMMIT, and
AfterTriggerSetState for SET CONSTRAINTS IMMEDIATE). The RI
code registers ri_FastPathEndBatch as a batch callback.
- Batch callbacks only fire at the outermost query level
(checked inside FireAfterTriggerBatchCallbacks), so nested
queries from SPI inside other AFTER triggers do not tear down
the cache mid-batch.
- XactCallback: ri_FastPathXactCallback NULLs the static cache
pointer at transaction end, handling the abort path where the
batch callback never fired.
- SubXactCallback: ri_FastPathSubXactCallback NULLs the static
cache pointer on subtransaction abort, preventing the batch
callback from accessing already-released resources.
- AfterTriggerBatchIsActive(): A new exported accessor that
returns true when afterTriggers.query_depth >= 0. During
ALTER TABLE ... ADD FOREIGN KEY validation, RI triggers are
called directly outside the after-trigger framework, so batch
callbacks would never fire. The fast-path code uses this to
fall back to the non-cached per-invocation path in that
context.
Author: Amit Langote <[email protected]>
Co-authored-by: Junwang Zhao <[email protected]>
Reviewed-by: Haibo Yan <[email protected]>
Tested-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
---
src/backend/commands/trigger.c | 105 ++++
src/backend/utils/adt/ri_triggers.c | 592 +++++++++++++++++++++-
src/include/commands/trigger.h | 18 +
src/test/regress/expected/foreign_key.out | 126 +++++
src/test/regress/sql/foreign_key.sql | 118 +++++
src/tools/pgindent/typedefs.list | 3 +
6 files changed, 958 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 6596843a8d8..4bc31cabff2 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3893,6 +3893,8 @@ typedef struct AfterTriggersData
/* per-subtransaction-level data: */
AfterTriggersTransData *trans_stack; /* array of structs shown below */
int maxtransdepth; /* allocated len of above array */
+
+ List *batch_callbacks; /* List of AfterTriggerCallbackItem */
} AfterTriggersData;
struct AfterTriggersQueryData
@@ -3929,6 +3931,13 @@ struct AfterTriggersTableData
TupleTableSlot *storeslot; /* for converting to tuplestore's format */
};
+/* Entry in afterTriggers.batch_callbacks */
+typedef struct AfterTriggerCallbackItem
+{
+ AfterTriggerBatchCallback callback;
+ void *arg;
+} AfterTriggerCallbackItem;
+
static AfterTriggersData afterTriggers;
static void AfterTriggerExecute(EState *estate,
@@ -3964,6 +3973,7 @@ static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
Oid tgoid, bool tgisdeferred);
static void cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent);
+static void FireAfterTriggerBatchCallbacks(void);
/*
* Get the FDW tuplestore for the current trigger query level, creating it
@@ -5089,6 +5099,7 @@ AfterTriggerBeginXact(void)
*/
afterTriggers.firing_counter = (CommandId) 1; /* mustn't be 0 */
afterTriggers.query_depth = -1;
+ afterTriggers.batch_callbacks = NIL;
/*
* Verify that there is no leftover state remaining. If these assertions
@@ -5210,6 +5221,15 @@ AfterTriggerEndQuery(EState *estate)
break;
}
+ /*
+ * Fire batch callbacks before releasing query-level storage and before
+ * decrementing query_depth. Callbacks may do real work (index probes,
+ * error reporting) and rely on query_depth still reflecting the current
+ * batch level so that nested calls from SPI inside AFTER triggers are
+ * correctly suppressed by FireAfterTriggerBatchCallbacks's depth guard.
+ */
+ FireAfterTriggerBatchCallbacks();
+
/* Release query-level-local storage, including tuplestores if any */
AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
@@ -5317,6 +5337,9 @@ AfterTriggerFireDeferred(void)
break; /* all fired */
}
+ /* Flush any fast-path batches accumulated by the triggers just fired. */
+ FireAfterTriggerBatchCallbacks();
+
/*
* We don't bother freeing the event list, since it will go away anyway
* (and more efficiently than via pfree) in AfterTriggerEndXact.
@@ -6059,6 +6082,11 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
break; /* all fired */
}
+ /*
+ * Flush any fast-path batches accumulated by the triggers just fired.
+ */
+ FireAfterTriggerBatchCallbacks();
+
if (snapshot_set)
PopActiveSnapshot();
}
@@ -6755,3 +6783,80 @@ check_modified_virtual_generated(TupleDesc tupdesc, HeapTuple tuple)
return tuple;
}
+
+/*
+ * RegisterAfterTriggerBatchCallback
+ * Register a function to be called when the current trigger-firing
+ * batch completes.
+ *
+ * Must be called from within a trigger function's execution context
+ * (i.e., while afterTriggers state is active).
+ *
+ * The callback list is cleared after invocation, so the caller must
+ * re-register for each new batch if needed.
+ */
+void
+RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg)
+{
+ AfterTriggerCallbackItem *item;
+ MemoryContext oldcxt;
+
+ /*
+ * Allocate in TopTransactionContext so the item survives for the duration
+ * of the batch, which may span multiple trigger invocations.
+ *
+ * Must be called while afterTriggers is active (query_depth >= 0);
+ * callbacks registered outside a trigger-firing context would never fire.
+ */
+ Assert(afterTriggers.query_depth >= 0);
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ item = palloc(sizeof(AfterTriggerCallbackItem));
+ item->callback = callback;
+ item->arg = arg;
+ afterTriggers.batch_callbacks =
+ lappend(afterTriggers.batch_callbacks, item);
+ MemoryContextSwitchTo(oldcxt);
+}
+
+/*
+ * FireAfterTriggerBatchCallbacks
+ * Invoke and clear all registered batch callbacks.
+ *
+ * Only fires at the outermost query level (query_depth == 0) or from
+ * top-level operations (query_depth == -1, e.g. AfterTriggerFireDeferred
+ * at COMMIT). Nested queries from SPI inside AFTER triggers run at
+ * depth > 0 and must not tear down resources the outer batch still needs.
+ */
+static void
+FireAfterTriggerBatchCallbacks(void)
+{
+ ListCell *lc;
+
+ if (afterTriggers.query_depth > 0)
+ return;
+
+ foreach(lc, afterTriggers.batch_callbacks)
+ {
+ AfterTriggerCallbackItem *item = lfirst(lc);
+
+ item->callback(item->arg);
+ }
+
+ list_free_deep(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = NIL;
+}
+
+/*
+ * AfterTriggerBatchIsActive
+ * Returns true if we're inside a query-level trigger batch where
+ * registered batch callbacks will actually be invoked.
+ *
+ * This is false during validateForeignKeyConstraint(), which calls
+ * RI trigger functions directly outside the after-trigger framework.
+ */
+bool
+AfterTriggerBatchIsActive(void)
+{
+ return afterTriggers.query_depth >= 0;
+}
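
For readers following the trigger.c hunk above, the register/fire/clear protocol with its depth guard can be sketched in isolation. This is a minimal standalone illustration, not the patch's code: it assumes a plain singly linked list instead of a PostgreSQL List, malloc instead of TopTransactionContext, and the names register_batch_callback, fire_batch_callbacks and query_depth are hypothetical stand-ins for the functions and state in the hunk.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*batch_callback) (void *arg);

typedef struct callback_item
{
    batch_callback callback;
    void       *arg;
    struct callback_item *next;
} callback_item;

/* Hypothetical stand-ins for afterTriggers.batch_callbacks / query_depth */
static callback_item *callbacks_head = NULL;
static callback_item **callbacks_tail = &callbacks_head;
static int  query_depth = -1;

/* Append a callback; only legal while a trigger batch is active. */
static void
register_batch_callback(batch_callback cb, void *arg)
{
    callback_item *item;

    assert(query_depth >= 0);
    item = malloc(sizeof(callback_item));
    if (item == NULL)
        abort();
    item->callback = cb;
    item->arg = arg;
    item->next = NULL;
    *callbacks_tail = item;
    callbacks_tail = &item->next;
}

/* Invoke and clear all callbacks, but only at depth 0 or -1. */
static void
fire_batch_callbacks(void)
{
    callback_item *item;

    /* Nested query levels must not end the outer batch. */
    if (query_depth > 0)
        return;

    for (item = callbacks_head; item != NULL; item = item->next)
        item->callback(item->arg);

    /* The list is one-shot: free it so callers must re-register. */
    while (callbacks_head != NULL)
    {
        item = callbacks_head;
        callbacks_head = item->next;
        free(item);
    }
    callbacks_tail = &callbacks_head;
}

/* Example callback: bump an int counter passed through arg. */
static void
count_call(void *arg)
{
    (*(int *) arg)++;
}
```

In this sketch, a callback registered at depth 0 is skipped by a fire attempted at depth 1 and runs exactly once on the next fire at depth 0. A further fire is a no-op because the list has been cleared.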
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index d57343f85d8..7ea1755b270 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -196,6 +196,48 @@ typedef struct RI_CompareHashEntry
FmgrInfo cast_func_finfo; /* in case we must coerce input */
} RI_CompareHashEntry;
+/*
+ * Maximum number of FK rows buffered before flushing.
+ *
+ * Larger batches amortize per-flush overhead and let the SK_SEARCHARRAY
+ * path walk more leaf pages in a single sorted traversal. But each
+ * buffered row is a materialized HeapTuple in flush_cxt, and the matched[]
+ * scan in ri_FastPathFlushArray() is O(batch_size) per index match.
+ * Benchmarking showed little difference between 16 and 64, with 256
+ * consistently slower. 64 is a reasonable default.
+ */
+#define RI_FASTPATH_BATCH_SIZE 64
+
+/*
+ * RI_FastPathEntry
+ * Per-constraint cache of resources needed by ri_FastPathBatchFlush().
+ *
+ * One entry per constraint, keyed by pg_constraint OID. Created lazily
+ * by ri_FastPathGetEntry() on first use within a trigger-firing batch
+ * and torn down by ri_FastPathTeardown() at batch end.
+ *
+ * FK tuples are buffered in batch[] across trigger invocations and
+ * flushed when the buffer fills or the batch ends.
+ */
+typedef struct RI_FastPathEntry
+{
+ Oid conoid; /* hash key: pg_constraint OID */
+ Oid fk_relid; /* for ri_FastPathEndBatch() */
+ Relation pk_rel;
+ Relation idx_rel;
+ TupleTableSlot *pk_slot;
+ TupleTableSlot *fk_slot;
+ MemoryContext flush_cxt; /* short-lived context for per-flush work */
+
+ /*
+ * TODO: batch[] is HeapTuple[] because the AFTER trigger machinery
+ * currently passes tuples as HeapTuples. Once trigger infrastructure is
+ * slotified, this should use a slot array or whatever batched tuple
+ * storage abstraction exists at that point to be TAM-agnostic.
+ */
+ HeapTuple batch[RI_FASTPATH_BATCH_SIZE];
+ int batch_count;
+} RI_FastPathEntry;
/*
* Local data
@@ -205,6 +247,8 @@ static HTAB *ri_query_cache = NULL;
static HTAB *ri_compare_cache = NULL;
static dclist_head ri_constraint_cache_valid_list;
+static HTAB *ri_fastpath_cache = NULL;
+static bool ri_fastpath_callback_registered = false;
/*
* Local function prototypes
@@ -255,6 +299,16 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
bool detectNewRows, int expect_OK);
static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static int ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc);
+static int ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc);
+static void ri_FastPathBatchFlush(RI_FastPathEntry *fpentry,
+ Relation fk_rel);
static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
IndexScanDesc scandesc, TupleTableSlot *slot,
Snapshot snapshot, const RI_ConstraintInfo *riinfo,
@@ -277,6 +331,10 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool is_restrict, bool partgone);
+static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel);
+static void ri_FastPathEndBatch(void *arg);
+static void ri_FastPathTeardown(void);
/*
@@ -387,12 +445,22 @@ RI_FKey_check(TriggerData *trigdata)
* lock. This is semantically equivalent to the SPI path below but avoids
* the per-row executor overhead.
*
- * ri_FastPathCheck() reports the violation itself (via ereport) if no
- * matching PK row is found, so it only returns on success.
+ * ri_FastPathBatchAdd() and ri_FastPathCheck() report the violation
+ * themselves if no matching PK row is found, so they only return on
+ * success.
*/
if (ri_fastpath_is_applicable(riinfo))
{
- ri_FastPathCheck(riinfo, fk_rel, newslot);
+ if (AfterTriggerBatchIsActive())
+ {
+ /* Batched path: buffer and probe in groups */
+ ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
+ }
+ else
+ {
+ /* ALTER TABLE validation: per-row, no cache */
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
+ }
return PointerGetDatum(NULL);
}
@@ -2692,10 +2760,14 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/*
* ri_FastPathCheck
- * Perform FK existence check via direct index probe, bypassing SPI.
+ * Perform per-row FK existence check via direct index probe,
+ * bypassing SPI.
*
* If no matching PK row exists, report the violation via ri_ReportViolation(),
* otherwise, the function returns normally.
+ *
+ * Note: This is only used by the ALTER TABLE validation path. Other paths use
+ * ri_FastPathBatchAdd().
*/
static void
ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
@@ -2764,6 +2836,325 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
table_close(pk_rel, NoLock);
}
+/*
+ * ri_FastPathBatchAdd
+ * Buffer a FK row for batched probing.
+ *
+ * Adds the row to the batch buffer. When the buffer is full, flushes all
+ * buffered rows by probing the PK index. Any violation is reported
+ * immediately during the flush via ri_ReportViolation (which does not return).
+ *
+ * Uses the per-batch cache (RI_FastPathEntry) to avoid per-row relation
+ * open/close, slot creation, etc.
+ *
+ * The batch is also flushed at end of trigger-firing cycle via
+ * ri_FastPathEndBatch().
+ */
+static void
+ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
+ MemoryContext oldcxt;
+
+ oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
+ fpentry->batch[fpentry->batch_count] =
+ ExecCopySlotHeapTuple(newslot);
+ fpentry->batch_count++;
+ MemoryContextSwitchTo(oldcxt);
+
+ if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
+ ri_FastPathBatchFlush(fpentry, fk_rel);
+}
+
+/*
+ * ri_FastPathBatchFlush
+ * Flush all buffered FK rows by probing the PK index.
+ *
+ * Dispatches to ri_FastPathFlushArray() for single-column FKs
+ * (using SK_SEARCHARRAY) or ri_FastPathFlushLoop() for multi-column
+ * FKs (per-row probing). Violations are reported immediately via
+ * ri_ReportViolation(), which does not return.
+ */
+static void
+ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
+{
+ /* Reload; may have been invalidated since last batch accumulation. */
+ const RI_ConstraintInfo *riinfo = ri_LoadConstraintInfo(fpentry->conoid);
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *fk_slot = fpentry->fk_slot;
+ Snapshot snapshot;
+ IndexScanDesc scandesc;
+ Oid saved_userid;
+ int saved_sec_context;
+ MemoryContext oldcxt;
+ int violation_index;
+
+ if (fpentry->batch_count == 0)
+ return;
+
+ /*
+ * CCI and security context switch are done once for the entire batch.
+ * Per-row CCI is unnecessary because by the time a flush runs, all AFTER
+ * triggers for the buffered rows have already fired (trigger invocations
+ * strictly alternate per row), so a single CCI advances past all their
+ * effects. Per-row security context switch is unnecessary because each
+ * row's probe runs entirely as the PK table owner, same as the SPI path
+ * -- the only difference is that the SPI path sets and restores the
+ * context per row whereas we do it once around the whole batch.
+ */
+ CommandCounterIncrement();
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+ /*
+ * build_index_scankeys() may palloc cast results for cross-type FKs. Use
+ * the entry's short-lived flush context so these don't accumulate across
+ * batches.
+ */
+ oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
+
+ scandesc = index_beginscan(pk_rel, idx_rel, snapshot, NULL,
+ riinfo->nkeys, 0, SO_NONE);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ /*
+ * Check that the current user has permission to access pk_rel. Done here
+ * rather than at entry creation so that permission changes between
+ * flushes are respected. This mirrors the SPI path and
+ * ri_FastPathCheck(), except that they check once per row whereas we
+ * check once per flush.
+ */
+ ri_CheckPermissions(pk_rel);
+
+ if (riinfo->fpmeta == NULL)
+ {
+ /* Reload to ensure it's valid. */
+ riinfo = ri_LoadConstraintInfo(riinfo->constraint_id);
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ }
+ Assert(riinfo->fpmeta);
+ if (riinfo->nkeys == 1)
+ violation_index = ri_FastPathFlushArray(fpentry, fk_slot, riinfo,
+ fk_rel, snapshot, scandesc);
+ else
+ violation_index = ri_FastPathFlushLoop(fpentry, fk_slot, riinfo,
+ fk_rel, snapshot, scandesc);
+
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ UnregisterSnapshot(snapshot);
+ index_endscan(scandesc);
+
+ if (violation_index >= 0)
+ {
+ ExecStoreHeapTuple(fpentry->batch[violation_index], fk_slot, false);
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ fk_slot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
+
+ MemoryContextReset(fpentry->flush_cxt);
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Reset. */
+ fpentry->batch_count = 0;
+}
+
+/*
+ * ri_FastPathFlushLoop
+ * Multi-column fallback: probe the index once per buffered row.
+ *
+ * Used for composite foreign keys where SK_SEARCHARRAY does not
+ * apply.
+ *
+ * Returns the index of the first violating row in the batch array, or -1 if
+ * all rows are valid.
+ */
+static int
+ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc)
+{
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found = true;
+
+ for (int i = 0; i < fpentry->batch_count; i++)
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, pk_slot,
+ snapshot, riinfo, skey, riinfo->nkeys);
+
+ /* Report first unmatched row */
+ if (!found)
+ return i;
+ }
+
+ /* All pass. */
+ return -1;
+}
+
+/*
+ * ri_FastPathFlushArray
+ * Single-column fast path using SK_SEARCHARRAY.
+ *
+ * Builds an array of FK values and does one index scan with
+ * SK_SEARCHARRAY. The index AM sorts and deduplicates the array
+ * internally, then walks matching leaf pages in order. Each
+ * matched PK tuple is locked and rechecked as before; a matched[]
+ * array tracks which batch items were satisfied.
+ *
+ * Returns the index of the first violating row in the batch array, or -1 if
+ * all rows are valid.
+ */
+static int
+ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc)
+{
+ FastPathMeta *fpmeta = riinfo->fpmeta;
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Datum search_vals[RI_FASTPATH_BATCH_SIZE];
+ bool matched[RI_FASTPATH_BATCH_SIZE];
+ int nvals = fpentry->batch_count;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[1];
+ RI_CompareHashEntry *entry;
+ Oid elem_type;
+ int16 elem_len;
+ bool elem_byval;
+ char elem_align;
+ ArrayType *arr;
+
+ Assert(fpmeta);
+
+ memset(matched, 0, nvals * sizeof(bool));
+
+ /*
+ * Extract FK values, casting to the operator's expected input type if
+ * needed (e.g. int8 FK -> numeric for a numeric PK).
+ */
+ entry = fpmeta->compare_entries[0];
+ for (int i = 0; i < nvals; i++)
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+
+ /* Cast if needed (e.g. int8 FK -> numeric PK) */
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ search_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[0],
+ Int32GetDatum(-1),
+ BoolGetDatum(false));
+ else
+ search_vals[i] = pk_vals[0];
+ }
+
+ /*
+ * Array element type must match the operator's right-hand input type,
+ * which is what the index comparison expects on the search side.
+ * ri_populate_fastpath_metadata() stores exactly this via
+ * get_op_opfamily_properties(), which returns the operator's right-hand
+ * type as the subtype for cross-type operators (e.g. int8 for int48eq)
+ * and the common type for same-type operators.
+ */
+ elem_type = fpmeta->subtypes[0];
+ Assert(OidIsValid(elem_type));
+ get_typlenbyvalalign(elem_type, &elem_len, &elem_byval, &elem_align);
+
+ arr = construct_array(search_vals, nvals,
+ elem_type, elem_len, elem_byval, elem_align);
+
+ /*
+ * Build scan key with SK_SEARCHARRAY. The index AM code will internally
+ * sort and deduplicate, then walk leaf pages in order.
+ */
+ ScanKeyEntryInitialize(&skey[0],
+ SK_SEARCHARRAY,
+ 1, /* attno */
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ PointerGetDatum(arr));
+
+ index_rescan(scandesc, skey, 1, NULL, 0);
+
+ /*
+ * Walk all matches. The index AM returns them in index order. For each
+ * match, find which batch item(s) it satisfies.
+ */
+ while (index_getnext_slot(scandesc, ForwardScanDirection, pk_slot))
+ {
+ Datum found_val;
+ bool found_null;
+ bool concurrently_updated;
+ ScanKeyData recheck_skey[1];
+
+ if (!ri_LockPKTuple(pk_rel, pk_slot, snapshot, &concurrently_updated))
+ continue;
+
+ /* Extract the PK value from the matched and locked tuple */
+ found_val = slot_getattr(pk_slot, riinfo->pk_attnums[0], &found_null);
+ Assert(!found_null);
+
+ if (concurrently_updated)
+ {
+ /*
+ * Build a single-key scankey for recheck. We need the actual PK
+ * value that was found, not the FK search value.
+ */
+ ScanKeyEntryInitialize(&recheck_skey[0], 0, 1,
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ found_val);
+ if (!recheck_matched_pk_tuple(idx_rel, recheck_skey, pk_slot))
+ continue;
+ }
+
+ /*
+ * Linear scan to mark all batch items matching this PK value.
+ * O(batch_size) per match, O(batch_size^2) worst case -- fine for the
+ * current batch size of 64.
+ */
+ for (int i = 0; i < nvals; i++)
+ {
+ if (!matched[i] &&
+ DatumGetBool(FunctionCall2Coll(&entry->eq_opr_finfo,
+ idx_rel->rd_indcollation[0],
+ found_val,
+ search_vals[i])))
+ matched[i] = true;
+ }
+ }
+
+ /* Report first unmatched row */
+ for (int i = 0; i < nvals; i++)
+ if (!matched[i])
+ return i;
+
+ /* All pass. */
+ return -1;
+}
+
/*
* ri_FastPathProbeOne
* Probe the PK index for one set of scan keys, lock the matching
@@ -3691,3 +4082,196 @@ RI_FKey_trigger_type(Oid tgfoid)
return RI_TRIGGER_NONE;
}
+
+/*
+ * ri_FastPathEndBatch
+ * Flush remaining rows and tear down cached state.
+ *
+ * Registered as an AfterTriggerBatchCallback. Note: the flush can
+ * do real work (CCI, security context switch, index probes) and can
+ * throw ERROR on a constraint violation. If that happens,
+ * ri_FastPathTeardown never runs; ResourceOwner + XactCallback
+ * handle resource cleanup on the abort path.
+ */
+static void
+ri_FastPathEndBatch(void *arg)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ /* Flush any partial batches -- can throw ERROR */
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->batch_count > 0)
+ {
+ Relation fk_rel = table_open(entry->fk_relid, AccessShareLock);
+
+ ri_FastPathBatchFlush(entry, fk_rel);
+ table_close(fk_rel, NoLock);
+ }
+ }
+
+ /* Orderly teardown */
+ ri_FastPathTeardown();
+}
+
+/*
+ * ri_FastPathTeardown
+ * Tear down all cached fast-path state.
+ *
+ * Called from ri_FastPathEndBatch() after flushing any remaining rows.
+ */
+static void
+ri_FastPathTeardown(void)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->idx_rel)
+ index_close(entry->idx_rel, NoLock);
+ if (entry->pk_rel)
+ table_close(entry->pk_rel, NoLock);
+ if (entry->pk_slot)
+ ExecDropSingleTupleTableSlot(entry->pk_slot);
+ if (entry->fk_slot)
+ ExecDropSingleTupleTableSlot(entry->fk_slot);
+ if (entry->flush_cxt)
+ MemoryContextDelete(entry->flush_cxt);
+ }
+
+ hash_destroy(ri_fastpath_cache);
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static bool ri_fastpath_xact_callback_registered = false;
+
+static void
+ri_FastPathXactCallback(XactEvent event, void *arg)
+{
+ /*
+ * On abort, ResourceOwner already released relations; on commit,
+ * ri_FastPathTeardown already ran. Either way, just NULL the static
+ * pointers so they don't dangle into the next transaction.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static void
+ri_FastPathSubXactCallback(SubXactEvent event, SubTransactionId mySubid,
+ SubTransactionId parentSubid, void *arg)
+{
+ if (event == SUBXACT_EVENT_ABORT_SUB)
+ {
+ /*
+ * ResourceOwner already released relations. NULL the static pointers
+ * so the still-registered batch callback becomes a no-op for the rest
+ * of this transaction.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+ }
+}
+
+/*
+ * ri_FastPathGetEntry
+ * Look up or create a per-batch cache entry for the given constraint.
+ *
+ * On first call for a constraint within a batch: opens pk_rel and the index,
+ * allocates slots for the FK row and the looked-up PK row, and registers the
+ * cleanup callback.
+ *
+ * On subsequent calls: returns the existing entry.
+ */
+static RI_FastPathEntry *
+ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+ RI_FastPathEntry *entry;
+ bool found;
+
+ /* Create hash table on first use in this batch */
+ if (ri_fastpath_cache == NULL)
+ {
+ HASHCTL ctl;
+
+ if (!ri_fastpath_xact_callback_registered)
+ {
+ RegisterXactCallback(ri_FastPathXactCallback, NULL);
+ RegisterSubXactCallback(ri_FastPathSubXactCallback, NULL);
+ ri_fastpath_xact_callback_registered = true;
+ }
+
+ ctl.keysize = sizeof(Oid);
+ ctl.entrysize = sizeof(RI_FastPathEntry);
+ ctl.hcxt = TopTransactionContext;
+ ri_fastpath_cache = hash_create("RI fast-path cache",
+ 16,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ }
+
+ entry = hash_search(ri_fastpath_cache, &riinfo->constraint_id,
+ HASH_ENTER, &found);
+
+ if (!found)
+ {
+ MemoryContext oldcxt;
+
+ /*
+ * Zero out non-key fields so ri_FastPathTeardown is safe if we error
+ * out during partial initialization below.
+ */
+ memset(((char *) entry) + offsetof(RI_FastPathEntry, pk_rel), 0,
+ sizeof(RI_FastPathEntry) - offsetof(RI_FastPathEntry, pk_rel));
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+
+ entry->fk_relid = RelationGetRelid(fk_rel);
+
+ /*
+ * Open PK table and its unique index.
+ *
+ * RowShareLock on pk_rel matches what the SPI path's SELECT ... FOR
+ * KEY SHARE would acquire as a relation-level lock. AccessShareLock
+ * on the index is standard for index scans.
+ *
+ * We don't release these locks until end of transaction, matching SPI
+ * behavior.
+ */
+ entry->pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ entry->idx_rel = index_open(riinfo->conindid, AccessShareLock);
+ entry->pk_slot = table_slot_create(entry->pk_rel, NULL);
+
+ /*
+ * Must be TTSOpsHeapTuple because ExecStoreHeapTuple() is used to
+ * load entries from batch[] into this slot for value extraction.
+ */
+ entry->fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
+ &TTSOpsHeapTuple);
+
+ entry->flush_cxt = AllocSetContextCreate(TopTransactionContext,
+ "RI fast path flush temporary context",
+ ALLOCSET_SMALL_SIZES);
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Ensure cleanup at end of this trigger-firing batch */
+ if (!ri_fastpath_callback_registered)
+ {
+ RegisterAfterTriggerBatchCallback(ri_FastPathEndBatch, NULL);
+ ri_fastpath_callback_registered = true;
+ }
+ }
+
+ return entry;
+}
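
The buffering and matched[] bookkeeping done by ri_FastPathBatchAdd() and ri_FastPathFlushArray() above can be illustrated with a small standalone sketch. It makes several assumptions that are not in the patch: keys are plain ints, the "index scan" is a loop over an in-memory pk[] array, there is no locking, recheck, or casting, and the names fk_batch, add_to_batch, flush_batch and BATCH_SIZE are hypothetical. It only shows the control flow: buffer until full, mark every batch item satisfied by each PK value found, and return the first unmatched position.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BATCH_SIZE 4            /* small on purpose; the patch uses 64 */

typedef struct fk_batch
{
    int         vals[BATCH_SIZE];
    int         count;
} fk_batch;

/*
 * Check all buffered values against pk[], then empty the buffer.
 * Returns the position of the first value with no match, or -1.
 */
static int
flush_batch(fk_batch *b, const int *pk, int npk)
{
    bool        matched[BATCH_SIZE];
    int         first_bad = -1;

    memset(matched, 0, sizeof(matched));

    /* Stand-in for the index scan: visit each PK value once. */
    for (int p = 0; p < npk; p++)
    {
        /* Mark every batch item equal to this PK value, duplicates too. */
        for (int i = 0; i < b->count; i++)
        {
            if (!matched[i] && b->vals[i] == pk[p])
                matched[i] = true;
        }
    }

    for (int i = 0; i < b->count; i++)
    {
        if (!matched[i])
        {
            first_bad = i;
            break;
        }
    }

    b->count = 0;
    return first_bad;
}

/*
 * Buffer one value; flush when the buffer fills.
 * Returns -1 if nothing was flushed or everything flushed was valid.
 */
static int
add_to_batch(fk_batch *b, int val, const int *pk, int npk)
{
    assert(b->count < BATCH_SIZE);
    b->vals[b->count++] = val;

    if (b->count >= BATCH_SIZE)
        return flush_batch(b, pk, npk);
    return -1;
}
```

Note that a bad value is not detected at the moment it is added, only when its batch is flushed. That is why the patch also needs the end-of-batch callback to flush partially filled buffers.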
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 27af5284406..7664298f5c8 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -289,4 +289,22 @@ extern void RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel,
extern int RI_FKey_trigger_type(Oid tgfoid);
+/*
+ * Callback type for end-of-trigger-batch notifications.
+ *
+ * Registered via RegisterAfterTriggerBatchCallback(). Invoked when
+ * a batch of after-trigger processing completes:
+ * - AfterTriggerEndQuery() (immediate constraints)
+ * - AfterTriggerFireDeferred() (deferred constraints at COMMIT)
+ * - AfterTriggerSetState() (SET CONSTRAINTS IMMEDIATE)
+ *
+ * The callback list is cleared after each batch. Callers must
+ * re-register if they need to be called again in a subsequent batch.
+ */
+typedef void (*AfterTriggerBatchCallback) (void *arg);
+
+extern void RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg);
+extern bool AfterTriggerBatchIsActive(void);
+
#endif /* TRIGGER_H */
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 6c607d36222..91295754bab 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3557,3 +3557,129 @@ DETAIL: drop cascades to table fkpart13_t1
drop cascades to table fkpart13_t2
drop cascades to table fkpart13_t3
RESET search_path;
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+ERROR: insert or update on table "fp_fk_alter" violates foreign key constraint "fp_fk_alter_a_fkey"
+DETAIL: Key (a)=(101) is not present in table "fp_pk_alter".
+DROP TABLE fp_fk_alter, fp_pk_alter;
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+ERROR: insert or update on table "fp_fk_alter2" violates foreign key constraint "fp_fk_alter2_a_fkey"
+DETAIL: Key (a)=(200) is not present in table "fp_pk_alter2".
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+ERROR: insert or update on table "fp_multi_fk" violates foreign key constraint "fp_multi_fk_b_fkey"
+DETAIL: Key (b)=(2) is not present in table "fp_pk2".
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+ERROR: insert or update on table "fp_fk_defer" violates foreign key constraint "fp_fk_defer_a_fkey"
+DETAIL: Key (a)=(3) is not present in table "fp_pk_defer".
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+ a
+---
+ 1
+ 1
+(2 rows)
+
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+ERROR: insert or update on table "fp_fk_multi" violates foreign key constraint "fp_fk_multi_a_b_fkey"
+DETAIL: Key (a, b)=(999, 999) is not present in table "fp_pk_multi".
+DROP TABLE fp_fk_multi, fp_pk_multi;
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+ERROR: insert or update on table "fp_fk_commit" violates foreign key constraint "fp_fk_commit_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_commit".
+DROP TABLE fp_fk_commit, fp_pk_commit;
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+ERROR: insert or update on table "fp_fk_cross" violates foreign key constraint "fp_fk_cross_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_cross".
+DROP TABLE fp_fk_cross, fp_pk_cross;
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index fcdd006c971..f646dd10401 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2535,3 +2535,121 @@ WITH cte AS (
DROP SCHEMA fkpart13 CASCADE;
RESET search_path;
+
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+DROP TABLE fp_fk_alter, fp_pk_alter;
+
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
+
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+DROP TABLE fp_fk_multi, fp_pk_multi;
+
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+DROP TABLE fp_fk_commit, fp_pk_commit;
+
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+DROP TABLE fp_fk_cross, fp_pk_cross;
+
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c9da1f91cb9..0092a1d9027 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -30,6 +30,8 @@ AddForeignUpdateTargets_function
AddrInfo
AffixNode
AffixNodeData
+AfterTriggerBatchCallback
+AfterTriggerCallbackItem
AfterTriggerEvent
AfterTriggerEventChunk
AfterTriggerEventData
@@ -2485,6 +2487,7 @@ RIX
RI_CompareHashEntry
RI_CompareKey
RI_ConstraintInfo
+RI_FastPathEntry
RI_QueryHashEntry
RI_QueryKey
RTEKind
--
2.47.3
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Junwang Zhao @ 2026-03-31 15:54 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Chao Li <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
Hi Amit,
On Tue, Mar 31, 2026 at 8:15 PM Amit Langote <[email protected]> wrote:
>
> On Tue, Mar 31, 2026 at 7:57 PM Junwang Zhao <[email protected]> wrote:
> > On Tue, Mar 31, 2026 at 5:17 PM Amit Langote <[email protected]> wrote:
> > > On Tue, Mar 31, 2026 at 6:09 PM Chao Li <[email protected]> wrote:
> > > > > On Mar 30, 2026, at 19:15, Amit Langote <[email protected]> wrote:
> > > > > Kept looking at 0002 and found a couple of things to improve or change
> > > > > my thoughts about. I decided to move the permission check from fast
> > > > > path cache entry creation into ri_FastPathBatchFlush(), alongside the
> > > > > snapshot, so that permission changes between flushes are respected
> > > > > rather than checked once at batch start; the check happens for every
> > > > > row in the SPI and non-batched fast path. Also, improved comments in
> > > > > a few places to mention design decisions better.
> > > > >
> > > > > 0001 is mostly unchanged from v11 except I updated its commit message
> > > > > to explain why only RI_FKey_check is covered and not the action
> > > > > triggers as the topic has come up in previous threads about this
> > > > > topic.
> > > > >
> > > > > Still planning to commit 0001 tomorrow.
> > > > >
> > > > > --
> > > > > Thanks, Amit Langote
> > > > > <v12-0001-Add-fast-path-for-foreign-key-constraint-checks.patch><v12-0002-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch>
> > > >
> > > > Hi Amit,
> > > >
> > > > While reading the recent commits, I saw that 0001 has been pushed as 2da86c1ef9b5446e0e22c0b6a5846293e58d98e3. However, I also just noticed a use-after-free issue in ri_LoadConstraintInfo(). It dereferences conForm after ReleaseSysCache(tup), which is unsafe. I am attaching a tiny patch to fix that.
> > >
> > > Thanks. I noticed that too and pushed the fix an hour ago:
> > >
> > > https://www.postgresql.org/message-id/E1w7U6V-002H6n-0o%40gemulon.postgresql.org
> > >
> > > --
> > > Thanks, Amit Langote
> >
> > prion is happy now, the fix works, thanks.
>
> Yep, good.
>
> Because I noticed a use-after-free with prion, I thought to check our
> preparedness for CLOBBER_CACHE_ALWAYS and found issues in both the
> committed patch (and similar code in 0002): riinfo going stale inside
> ri_FastPathCheck() after relation opens and dangling fpmeta pointer
> after riinfo invalidation. 0001 fixes those; I'll apply it tomorrow
> morning.
+ if (riinfo->fpmeta == NULL)
+ {
+ /* Reload to ensure it's valid. */
+ riinfo = ri_LoadConstraintInfo(riinfo->constraint_id);
I was thinking of wrapping the reload in a conditional check like
`!riinfo->valid`, since `riinfo` can be valid even when `fpmeta == NULL`.
However, `if (riinfo->fpmeta == NULL)` should rarely be true, so the
unconditional reload is harmless, and the code is cleaner.
+1 to the fix.
>
> 0002 is the rebased batching patch.
The change of RI_FastPathEntry from storing riinfo to fk_relid
makes sense to me. I'll do another review on 0002 tomorrow.
>
> --
> Thanks, Amit Langote
--
Regards
Junwang Zhao
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Amit Langote @ 2026-04-01 08:51 UTC (permalink / raw)
To: Junwang Zhao <[email protected]>; +Cc: Chao Li <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
On Wed, Apr 1, 2026 at 12:54 AM Junwang Zhao <[email protected]> wrote:
> + if (riinfo->fpmeta == NULL)
> + {
> + /* Reload to ensure it's valid. */
> + riinfo = ri_LoadConstraintInfo(riinfo->constraint_id);
>
> I was thinking of wrapping the reload in a conditional check like
> `!riinfo->valid`, since `riinfo` can be valid even when `fpmeta == NULL`.
> However, `if (riinfo->fpmeta == NULL)` should rarely be true, so the
> unconditional reload is harmless, and the code is cleaner.
>
> +1 to the fix.
Thanks for checking.
I have just pushed a slightly modified version of that.
> > 0002 is the rebased batching patch.
>
> The change of RI_FastPathEntry from storing riinfo to fk_relid
> makes sense to me. I'll do another review on 0002 tomorrow.
Here's another version.
This time, I have another fixup patch (0001) to make FastPathMeta
self-contained by copying the FmgrInfo structs it needs out of
RI_CompareHashEntry rather than storing pointers into it. This avoids
any dependency on those cache entries remaining stable. I'll push
that once the just committed patch has seen enough BF animals.
0002 is rebased over that.
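
To illustrate the hazard that 0001 addresses, here is a minimal standalone C sketch. It is not PostgreSQL code: FnInfo, CacheEntry, Meta, cache_lookup(), cache_reset(), and meta_populate() are hypothetical stand-ins for FmgrInfo, RI_CompareHashEntry, FastPathMeta, ri_HashCompareOp(), a cache invalidation, and the populate step. It only shows why copying the needed fields by value removes the dependency on the cache entry's lifetime.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for FmgrInfo: just enough to identify a function. */
typedef struct FnInfo
{
    unsigned int fn_oid;
} FnInfo;

/* Hypothetical stand-in for a cache entry such as RI_CompareHashEntry. */
typedef struct CacheEntry
{
    FnInfo      eq_opr_finfo;
    FnInfo      cast_func_finfo;
} CacheEntry;

/* Self-contained metadata: holds copies, not pointers into the cache. */
typedef struct Meta
{
    FnInfo      eq_opr_finfo;
    FnInfo      cast_func_finfo;
} Meta;

/* A one-entry "cache", for brevity; arguments are used only on first call. */
static CacheEntry *cache = NULL;

/* Return the cache entry, creating it on first use. */
static CacheEntry *
cache_lookup(unsigned int eq_oid, unsigned int cast_oid)
{
    if (cache == NULL)
    {
        cache = malloc(sizeof(CacheEntry));
        assert(cache != NULL);
        cache->eq_opr_finfo.fn_oid = eq_oid;
        cache->cast_func_finfo.fn_oid = cast_oid;
    }
    return cache;
}

/* Simulate an invalidation: the entry's memory goes away. */
static void
cache_reset(void)
{
    free(cache);
    cache = NULL;
}

/* Populate meta by value so that later cache resets cannot affect it. */
static void
meta_populate(Meta *meta, const CacheEntry *entry)
{
    meta->eq_opr_finfo = entry->eq_opr_finfo;
    meta->cast_func_finfo = entry->cast_func_finfo;
}
```

In the sketch a plain struct assignment is enough because FnInfo has no internal pointers; the actual patch uses fmgr_info_copy() for the real FmgrInfo structs. Had Meta stored a CacheEntry pointer instead, any read after cache_reset() would touch freed memory.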
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v14-0001-Make-FastPathMeta-self-contained-by-copying-Fmgr.patch (2.9K, 2-v14-0001-Make-FastPathMeta-self-contained-by-copying-Fmgr.patch)
From 0a4187aa72bae647f621ea2462469ff409c00757 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 1 Apr 2026 15:53:13 +0900
Subject: [PATCH v14 1/2] Make FastPathMeta self-contained by copying FmgrInfo
structs
FastPathMeta stored pointers into ri_compare_cache entries via
compare_entries[], creating a dependency on that cache remaining
stable. If ri_compare_cache entries were invalidated after fpmeta
was populated, the pointers would dangle.
Replace compare_entries[] with inline copies of the two FmgrInfo
fields actually needed (cast_func_finfo and eq_opr_finfo), copied
at populate time via fmgr_info_copy(). fpmeta now depends only on
riinfo remaining valid, which is already handled by the invalidation
callback.
This issue was introduced by commit 2da86c1ef9 ("Add fast path for
foreign key constraint checks").
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
---
src/backend/utils/adt/ri_triggers.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 52bb2f2fee9..2de08da6539 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -150,7 +150,8 @@ typedef struct RI_CompareHashEntry RI_CompareHashEntry;
/* Fast-path metadata for RI checks on foreign key referencing tables */
typedef struct FastPathMeta
{
- RI_CompareHashEntry *compare_entries[RI_MAX_NUMKEYS];
+ FmgrInfo eq_opr_finfo[RI_MAX_NUMKEYS];
+ FmgrInfo cast_func_finfo[RI_MAX_NUMKEYS];
RegProcedure regops[RI_MAX_NUMKEYS];
Oid subtypes[RI_MAX_NUMKEYS];
int strats[RI_MAX_NUMKEYS];
@@ -2996,16 +2997,12 @@ build_index_scankeys(const RI_ConstraintInfo *riinfo,
*/
for (int i = 0; i < riinfo->nkeys; i++)
{
- if (pk_nulls[i] != 'n')
- {
- RI_CompareHashEntry *entry = fpmeta->compare_entries[i];
-
- if (OidIsValid(entry->cast_func_finfo.fn_oid))
- pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
- pk_vals[i],
- Int32GetDatum(-1), /* typmod */
- BoolGetDatum(false)); /* implicit coercion */
- }
+ if (pk_nulls[i] != 'n' &&
+ OidIsValid(fpmeta->cast_func_finfo[i].fn_oid))
+ pk_vals[i] = FunctionCall3(&fpmeta->cast_func_finfo[i],
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
}
/*
@@ -3048,7 +3045,10 @@ ri_populate_fastpath_metadata(RI_ConstraintInfo *riinfo,
Oid lefttype;
RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
- fpmeta->compare_entries[i] = entry;
+ fmgr_info_copy(&fpmeta->cast_func_finfo[i], &entry->cast_func_finfo,
+ CurrentMemoryContext);
+ fmgr_info_copy(&fpmeta->eq_opr_finfo[i], &entry->eq_opr_finfo,
+ CurrentMemoryContext);
fpmeta->regops[i] = get_opcode(eq_opr);
get_op_opfamily_properties(eq_opr,
--
2.47.3
[application/octet-stream] v14-0002-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch (43.5K, 3-v14-0002-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch)
From 2bd9744ed3d4bd6156a6f3d82590aae5e3fa66aa Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 31 Mar 2026 18:22:23 +0900
Subject: [PATCH v14 2/2] Batch FK rows and use SK_SEARCHARRAY for fast-path FK
probes
Instead of probing the PK index on each trigger invocation, buffer
FK rows in a new per-constraint cache entry (RI_FastPathEntry) and
flush them as a batch. Combined with the fast path introduced in
the previous commit, bulk FK inserts are ~2.9x faster (int PK /
int FK, 1M rows, PK table and index cached in memory).
On each trigger invocation, the new ri_FastPathBatchAdd() buffers
the FK row in RI_FastPathEntry. When the buffer fills (64 rows)
or the trigger-firing cycle ends, the new ri_FastPathBatchFlush()
probes the index for all buffered rows, sharing a single
CommandCounterIncrement, snapshot, permission check, and security
context switch across the batch, rather than repeating each per row
as the SPI path does. Per-flush CCI is safe because all AFTER
triggers for the buffered rows have already fired by flush time.
For single-column foreign keys, the flush builds an ArrayType from
the buffered FK values (casting to the PK-side type if needed) and
constructs a scan key with the SK_SEARCHARRAY flag. The index AM
sorts and deduplicates the array internally, then walks matching
leaf pages in one ordered traversal instead of descending from the
root once per row. A matched[] bitmap tracks which batch items
were satisfied; the first unmatched item is reported as a
violation. Multi-column foreign keys fall back to per-row probing
via the new ri_FastPathFlushLoop().
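
The single-column flush described above can be sketched in standalone C as follows. This is a simplified model under stated assumptions, not the patch's code: a sorted int array stands in for the PK index, and flush_batch(), cmp_int(), and BATCH_SIZE are hypothetical names. It shows the sort-and-deduplicate step, the single ordered pass, and how a matched[] array makes duplicate FK values count as satisfied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define BATCH_SIZE 64           /* mirrors the 64-row buffer described above */

/* qsort comparator for ints */
static int
cmp_int(const void *a, const void *b)
{
    int         x = *(const int *) a;
    int         y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Check a batch of FK values against a sorted array of PK values (the
 * stand-in for the index).  Returns the position of the first unmatched
 * batch item, or -1 if every item has a match.
 */
static int
flush_batch(const int *fk_vals, int nfk, const int *pk_sorted, int npk)
{
    int         keys[BATCH_SIZE];
    bool        matched[BATCH_SIZE] = {false};
    int         nkeys = 0;
    int         pk_pos = 0;

    assert(nfk >= 0 && nfk <= BATCH_SIZE);

    /* Sort and deduplicate the search keys, as an index AM would. */
    for (int i = 0; i < nfk; i++)
        keys[i] = fk_vals[i];
    qsort(keys, (size_t) nfk, sizeof(int), cmp_int);
    for (int i = 0; i < nfk; i++)
    {
        if (nkeys == 0 || keys[nkeys - 1] != keys[i])
            keys[nkeys++] = keys[i];
    }

    /* One ordered pass over the PK values instead of one search per row. */
    for (int k = 0; k < nkeys; k++)
    {
        while (pk_pos < npk && pk_sorted[pk_pos] < keys[k])
            pk_pos++;
        if (pk_pos < npk && pk_sorted[pk_pos] == keys[k])
        {
            /* Mark every batch item with this value, duplicates included. */
            for (int i = 0; i < nfk; i++)
            {
                if (fk_vals[i] == keys[k])
                    matched[i] = true;
            }
        }
    }

    /* The first unmatched item is the one reported as a violation. */
    for (int i = 0; i < nfk; i++)
    {
        if (!matched[i])
            return i;
    }
    return -1;
}
```

The inner loop over the batch for each index match is the O(batch_size) cost per match that the patch's comment on RI_FASTPATH_BATCH_SIZE mentions as a reason not to grow the batch too far.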
FK tuples are materialized via ExecCopySlotHeapTuple() into a new
purpose-specific memory context (flush_cxt), child of
TopTransactionContext, which is also used for per-flush transient
work: cast results, the search array, and index scan allocations.
It is reset after each flush and deleted in teardown.
The PK relation, index, tuple slots, and fast-path metadata are
cached in RI_FastPathEntry across trigger invocations within a
trigger-firing batch, avoiding repeated open/close overhead. The
snapshot and IndexScanDesc are taken fresh per flush.
ri_FastPathEndBatch() flushes any partial batch before tearing
down cached resources. Since the FK relation may already be
closed by flush time (e.g. for deferred constraints at COMMIT),
it reopens the relation using entry->fk_relid if needed.
The existing ALTER TABLE validation path bypasses batching and
continues to call ri_FastPathCheck() directly per row.
Lifecycle management for RI_FastPathEntry relies on the following new
mechanisms:
- AfterTriggerBatchCallback: A new general-purpose callback
mechanism in trigger.c. Callbacks registered via
RegisterAfterTriggerBatchCallback() fire at the end of each
trigger-firing batch (AfterTriggerEndQuery for immediate
constraints, AfterTriggerFireDeferred at COMMIT, and
AfterTriggerSetState for SET CONSTRAINTS IMMEDIATE). The RI
code registers ri_FastPathEndBatch as a batch callback.
- Batch callbacks only fire at the outermost query level
(checked inside FireAfterTriggerBatchCallbacks), so nested
queries from SPI inside other AFTER triggers do not tear down
the cache mid-batch.
- XactCallback: ri_FastPathXactCallback NULLs the static cache
pointer at transaction end, handling the abort path where the
batch callback never fired.
- SubXactCallback: ri_FastPathSubXactCallback NULLs the static
cache pointer on subtransaction abort, preventing the batch
callback from accessing already-released resources.
- AfterTriggerBatchIsActive(): A new exported accessor that
returns true when afterTriggers.query_depth >= 0. During
ALTER TABLE ... ADD FOREIGN KEY validation, RI triggers are
called directly outside the after-trigger framework, so batch
callbacks would never fire. The fast-path code uses this to
fall back to the non-cached per-invocation path in that
context.
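
The register, fire-and-clear, and depth-guard behavior of the batch callbacks can be modeled with a small standalone C sketch. It is only an illustration under stated assumptions: a hand-rolled linked list replaces PostgreSQL's List, malloc/free replace TopTransactionContext allocation, and register_batch_callback(), fire_batch_callbacks(), count_call(), and the global query_depth are hypothetical stand-ins for the functions and state described above.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*BatchCallback) (void *arg);

typedef struct CallbackItem
{
    BatchCallback callback;
    void       *arg;
    struct CallbackItem *next;
} CallbackItem;

static CallbackItem *batch_callbacks = NULL;
static int  query_depth = -1;   /* -1 means no query is active */

/* Append a callback; only meaningful while a batch is active. */
static void
register_batch_callback(BatchCallback callback, void *arg)
{
    CallbackItem *item = malloc(sizeof(CallbackItem));
    CallbackItem **tail = &batch_callbacks;

    assert(query_depth >= 0);
    assert(item != NULL);
    item->callback = callback;
    item->arg = arg;
    item->next = NULL;
    while (*tail != NULL)
        tail = &(*tail)->next;
    *tail = item;
}

/* Invoke and clear all callbacks, but never from a nested query level. */
static void
fire_batch_callbacks(void)
{
    if (query_depth > 0)
        return;

    for (CallbackItem *item = batch_callbacks; item != NULL; item = item->next)
        item->callback(item->arg);

    while (batch_callbacks != NULL)
    {
        CallbackItem *next = batch_callbacks->next;

        free(batch_callbacks);
        batch_callbacks = next;
    }
}

/* Example callback: counts how many times it has been invoked. */
static void
count_call(void *arg)
{
    (*(int *) arg)++;
}
```

Because the list is cleared after each firing, a caller has to register again for every new batch, which matches the note in the header comment of RegisterAfterTriggerBatchCallback() in the patch.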
Author: Amit Langote <[email protected]>
Co-authored-by: Junwang Zhao <[email protected]>
Reviewed-by: Haibo Yan <[email protected]>
Tested-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
---
src/backend/commands/trigger.c | 105 ++++
src/backend/utils/adt/ri_triggers.c | 594 +++++++++++++++++++++-
src/include/commands/trigger.h | 18 +
src/test/regress/expected/foreign_key.out | 126 +++++
src/test/regress/sql/foreign_key.sql | 118 +++++
src/tools/pgindent/typedefs.list | 3 +
6 files changed, 960 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 6596843a8d8..4bc31cabff2 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3893,6 +3893,8 @@ typedef struct AfterTriggersData
/* per-subtransaction-level data: */
AfterTriggersTransData *trans_stack; /* array of structs shown below */
int maxtransdepth; /* allocated len of above array */
+
+ List *batch_callbacks; /* List of AfterTriggerCallbackItem */
} AfterTriggersData;
struct AfterTriggersQueryData
@@ -3929,6 +3931,13 @@ struct AfterTriggersTableData
TupleTableSlot *storeslot; /* for converting to tuplestore's format */
};
+/* Entry in afterTriggers.batch_callbacks */
+typedef struct AfterTriggerCallbackItem
+{
+ AfterTriggerBatchCallback callback;
+ void *arg;
+} AfterTriggerCallbackItem;
+
static AfterTriggersData afterTriggers;
static void AfterTriggerExecute(EState *estate,
@@ -3964,6 +3973,7 @@ static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
Oid tgoid, bool tgisdeferred);
static void cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent);
+static void FireAfterTriggerBatchCallbacks(void);
/*
* Get the FDW tuplestore for the current trigger query level, creating it
@@ -5089,6 +5099,7 @@ AfterTriggerBeginXact(void)
*/
afterTriggers.firing_counter = (CommandId) 1; /* mustn't be 0 */
afterTriggers.query_depth = -1;
+ afterTriggers.batch_callbacks = NIL;
/*
* Verify that there is no leftover state remaining. If these assertions
@@ -5210,6 +5221,15 @@ AfterTriggerEndQuery(EState *estate)
break;
}
+ /*
+ * Fire batch callbacks before releasing query-level storage and before
+ * decrementing query_depth. Callbacks may do real work (index probes,
+ * error reporting) and rely on query_depth still reflecting the current
+ * batch level so that nested calls from SPI inside AFTER triggers are
+ * correctly suppressed by FireAfterTriggerBatchCallbacks's depth guard.
+ */
+ FireAfterTriggerBatchCallbacks();
+
/* Release query-level-local storage, including tuplestores if any */
AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
@@ -5317,6 +5337,9 @@ AfterTriggerFireDeferred(void)
break; /* all fired */
}
+ /* Flush any fast-path batches accumulated by the triggers just fired. */
+ FireAfterTriggerBatchCallbacks();
+
/*
* We don't bother freeing the event list, since it will go away anyway
* (and more efficiently than via pfree) in AfterTriggerEndXact.
@@ -6059,6 +6082,11 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
break; /* all fired */
}
+ /*
+ * Flush any fast-path batches accumulated by the triggers just fired.
+ */
+ FireAfterTriggerBatchCallbacks();
+
if (snapshot_set)
PopActiveSnapshot();
}
@@ -6755,3 +6783,80 @@ check_modified_virtual_generated(TupleDesc tupdesc, HeapTuple tuple)
return tuple;
}
+
+/*
+ * RegisterAfterTriggerBatchCallback
+ * Register a function to be called when the current trigger-firing
+ * batch completes.
+ *
+ * Must be called from within a trigger function's execution context
+ * (i.e., while afterTriggers state is active).
+ *
+ * The callback list is cleared after invocation, so the caller must
+ * re-register for each new batch if needed.
+ */
+void
+RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg)
+{
+ AfterTriggerCallbackItem *item;
+ MemoryContext oldcxt;
+
+ /*
+ * Allocate in TopTransactionContext so the item survives for the duration
+ * of the batch, which may span multiple trigger invocations.
+ *
+ * Must be called while afterTriggers is active (query_depth >= 0);
+ * callbacks registered outside a trigger-firing context would never fire.
+ */
+ Assert(afterTriggers.query_depth >= 0);
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ item = palloc(sizeof(AfterTriggerCallbackItem));
+ item->callback = callback;
+ item->arg = arg;
+ afterTriggers.batch_callbacks =
+ lappend(afterTriggers.batch_callbacks, item);
+ MemoryContextSwitchTo(oldcxt);
+}
+
+/*
+ * FireAfterTriggerBatchCallbacks
+ * Invoke and clear all registered batch callbacks.
+ *
+ * Only fires at the outermost query level (query_depth == 0) or from
+ * top-level operations (query_depth == -1, e.g. AfterTriggerFireDeferred
+ * at COMMIT). Nested queries from SPI inside AFTER triggers run at
+ * depth > 0 and must not tear down resources the outer batch still needs.
+ */
+static void
+FireAfterTriggerBatchCallbacks(void)
+{
+ ListCell *lc;
+
+ if (afterTriggers.query_depth > 0)
+ return;
+
+ foreach(lc, afterTriggers.batch_callbacks)
+ {
+ AfterTriggerCallbackItem *item = lfirst(lc);
+
+ item->callback(item->arg);
+ }
+
+ list_free_deep(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = NIL;
+}
+
+/*
+ * AfterTriggerBatchIsActive
+ * Returns true if we're inside a query-level trigger batch where
+ * registered batch callbacks will actually be invoked.
+ *
+ * This is false during validateForeignKeyConstraint(), which calls
+ * RI trigger functions directly outside the after-trigger framework.
+ */
+bool
+AfterTriggerBatchIsActive(void)
+{
+ return afterTriggers.query_depth >= 0;
+}
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 2de08da6539..b60e7955636 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -199,6 +199,48 @@ typedef struct RI_CompareHashEntry
FmgrInfo cast_func_finfo; /* in case we must coerce input */
} RI_CompareHashEntry;
+/*
+ * Maximum number of FK rows buffered before flushing.
+ *
+ * Larger batches amortize per-flush overhead and let the SK_SEARCHARRAY
+ * path walk more leaf pages in a single sorted traversal. But each
+ * buffered row is a materialized HeapTuple in flush_cxt, and the matched[]
+ * scan in ri_FastPathFlushArray() is O(batch_size) per index match.
+ * Benchmarking showed little difference between 16 and 64, with 256
+ * consistently slower. 64 is a reasonable default.
+ */
+#define RI_FASTPATH_BATCH_SIZE 64
+
+/*
+ * RI_FastPathEntry
+ * Per-constraint cache of resources needed by ri_FastPathBatchFlush().
+ *
+ * One entry per constraint, keyed by pg_constraint OID. Created lazily
+ * by ri_FastPathGetEntry() on first use within a trigger-firing batch
+ * and torn down by ri_FastPathTeardown() at batch end.
+ *
+ * FK tuples are buffered in batch[] across trigger invocations and
+ * flushed when the buffer fills or the batch ends.
+ */
+typedef struct RI_FastPathEntry
+{
+ Oid conoid; /* hash key: pg_constraint OID */
+ Oid fk_relid; /* for ri_FastPathEndBatch() */
+ Relation pk_rel;
+ Relation idx_rel;
+ TupleTableSlot *pk_slot;
+ TupleTableSlot *fk_slot;
+ MemoryContext flush_cxt; /* short-lived context for per-flush work */
+
+ /*
+ * TODO: batch[] is HeapTuple[] because the AFTER trigger machinery
+ * currently passes tuples as HeapTuples. Once trigger infrastructure is
+ * slotified, this should use a slot array or whatever batched tuple
+ * storage abstraction exists at that point to be TAM-agnostic.
+ */
+ HeapTuple batch[RI_FASTPATH_BATCH_SIZE];
+ int batch_count;
+} RI_FastPathEntry;
/*
* Local data
@@ -208,6 +250,8 @@ static HTAB *ri_query_cache = NULL;
static HTAB *ri_compare_cache = NULL;
static dclist_head ri_constraint_cache_valid_list;
+static HTAB *ri_fastpath_cache = NULL;
+static bool ri_fastpath_callback_registered = false;
/*
* Local function prototypes
@@ -258,6 +302,16 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
bool detectNewRows, int expect_OK);
static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static int ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc);
+static int ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc);
+static void ri_FastPathBatchFlush(RI_FastPathEntry *fpentry,
+ Relation fk_rel);
static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
IndexScanDesc scandesc, TupleTableSlot *slot,
Snapshot snapshot, const RI_ConstraintInfo *riinfo,
@@ -280,6 +334,10 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool is_restrict, bool partgone);
+static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel);
+static void ri_FastPathEndBatch(void *arg);
+static void ri_FastPathTeardown(void);
/*
@@ -390,12 +448,22 @@ RI_FKey_check(TriggerData *trigdata)
* lock. This is semantically equivalent to the SPI path below but avoids
* the per-row executor overhead.
*
- * ri_FastPathCheck() reports the violation itself (via ereport) if no
- * matching PK row is found, so it only returns on success.
+ * ri_FastPathBatchAdd() and ri_FastPathCheck() report the violation
+ * themselves if no matching PK row is found, so they only return on
+ * success.
*/
if (ri_fastpath_is_applicable(riinfo))
{
- ri_FastPathCheck(riinfo, fk_rel, newslot);
+ if (AfterTriggerBatchIsActive())
+ {
+ /* Batched path: buffer and probe in groups */
+ ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
+ }
+ else
+ {
+ /* ALTER TABLE validation: per-row, no cache */
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
+ }
return PointerGetDatum(NULL);
}
@@ -2690,10 +2758,14 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/*
* ri_FastPathCheck
- * Perform FK existence check via direct index probe, bypassing SPI.
+ * Perform per-row FK existence check via direct index probe,
+ * bypassing SPI.
*
* If no matching PK row exists, report the violation via ri_ReportViolation(),
* otherwise, the function returns normally.
+ *
+ * Note: This is only used by the ALTER TABLE validation path. Other paths use
+ * ri_FastPathBatchAdd().
*/
static void
ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
@@ -2761,6 +2833,327 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
table_close(pk_rel, NoLock);
}
+/*
+ * ri_FastPathBatchAdd
+ * Buffer a FK row for batched probing.
+ *
+ * Adds the row to the batch buffer. When the buffer is full, flushes all
+ * buffered rows by probing the PK index. Any violation is reported
+ * immediately during the flush via ri_ReportViolation (which does not return).
+ *
+ * Uses the per-batch cache (RI_FastPathEntry) to avoid per-row relation
+ * open/close, slot creation, etc.
+ *
+ * The batch is also flushed at end of trigger-firing cycle via
+ * ri_FastPathEndBatch().
+ */
+static void
+ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
+ MemoryContext oldcxt;
+
+ oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
+ fpentry->batch[fpentry->batch_count] =
+ ExecCopySlotHeapTuple(newslot);
+ fpentry->batch_count++;
+ MemoryContextSwitchTo(oldcxt);
+
+ if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
+ ri_FastPathBatchFlush(fpentry, fk_rel);
+}
+
+/*
+ * ri_FastPathBatchFlush
+ * Flush all buffered FK rows by probing the PK index.
+ *
+ * Dispatches to ri_FastPathFlushArray() for single-column FKs
+ * (using SK_SEARCHARRAY) or ri_FastPathFlushLoop() for multi-column
+ * FKs (per-row probing). Violations are reported immediately via
+ * ri_ReportViolation(), which does not return.
+ */
+static void
+ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
+{
+ /* Reload; may have been invalidated since last batch accumulation. */
+ const RI_ConstraintInfo *riinfo = ri_LoadConstraintInfo(fpentry->conoid);
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *fk_slot = fpentry->fk_slot;
+ Snapshot snapshot;
+ IndexScanDesc scandesc;
+ Oid saved_userid;
+ int saved_sec_context;
+ MemoryContext oldcxt;
+ int violation_index;
+
+ if (fpentry->batch_count == 0)
+ return;
+
+ /*
+ * CCI and security context switch are done once for the entire batch.
+ * Per-row CCI is unnecessary because by the time a flush runs, all AFTER
+ * triggers for the buffered rows have already fired (trigger invocations
+ * strictly alternate per row), so a single CCI advances past all their
+ * effects. Per-row security context switch is unnecessary because each
+ * row's probe runs entirely as the PK table owner, same as the SPI path
+ * -- the only difference is that the SPI path sets and restores the
+ * context per row whereas we do it once around the whole batch.
+ */
+ CommandCounterIncrement();
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+ /*
+ * build_index_scankeys() may palloc cast results for cross-type FKs. Use
+ * the entry's short-lived flush context so these don't accumulate across
+ * batches.
+ */
+ oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
+
+ scandesc = index_beginscan(pk_rel, idx_rel, snapshot, NULL,
+ riinfo->nkeys, 0, SO_NONE);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ /*
+ * Check that the current user has permission to access pk_rel. Done here
+ * rather than at entry creation so that permission changes between
+ * flushes are respected. The SPI path and ri_FastPathCheck() check once
+ * per row; here the check runs once per flush.
+ */
+ ri_CheckPermissions(pk_rel);
+
+ if (riinfo->fpmeta == NULL)
+ {
+ /* Reload to ensure it's valid. */
+ riinfo = ri_LoadConstraintInfo(riinfo->constraint_id);
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ }
+ Assert(riinfo->fpmeta);
+ if (riinfo->nkeys == 1)
+ violation_index = ri_FastPathFlushArray(fpentry, fk_slot, riinfo,
+ fk_rel, snapshot, scandesc);
+ else
+ violation_index = ri_FastPathFlushLoop(fpentry, fk_slot, riinfo,
+ fk_rel, snapshot, scandesc);
+
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ UnregisterSnapshot(snapshot);
+ index_endscan(scandesc);
+
+ if (violation_index >= 0)
+ {
+ ExecStoreHeapTuple(fpentry->batch[violation_index], fk_slot, false);
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ fk_slot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
+
+ MemoryContextReset(fpentry->flush_cxt);
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Reset. */
+ fpentry->batch_count = 0;
+}
+
+/*
+ * ri_FastPathFlushLoop
+ * Multi-column fallback: probe the index once per buffered row.
+ *
+ * Used for composite foreign keys where SK_SEARCHARRAY does not
+ * apply.
+ *
+ * Returns the index of the first violating row in the batch array, or -1 if
+ * all rows are valid.
+ */
+static int
+ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc)
+{
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found = true;
+
+ for (int i = 0; i < fpentry->batch_count; i++)
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, pk_slot,
+ snapshot, riinfo, skey, riinfo->nkeys);
+
+ /* Report first unmatched row */
+ if (!found)
+ return i;
+ }
+
+ /* All pass. */
+ return -1;
+}
+
+/*
+ * ri_FastPathFlushArray
+ * Single-column fast path using SK_SEARCHARRAY.
+ *
+ * Builds an array of FK values and does one index scan with
+ * SK_SEARCHARRAY. The index AM sorts and deduplicates the array
+ * internally, then walks matching leaf pages in order. Each
+ * matched PK tuple is locked and, if it was concurrently updated,
+ * rechecked, as in the per-row path; a matched[] flag array tracks
+ * which batch items were satisfied.
+ *
+ * Returns the index of the first violating row in the batch array, or -1 if
+ * all rows are valid.
+ */
+static int
+ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc)
+{
+ FastPathMeta *fpmeta = riinfo->fpmeta;
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Datum search_vals[RI_FASTPATH_BATCH_SIZE];
+ bool matched[RI_FASTPATH_BATCH_SIZE];
+ int nvals = fpentry->batch_count;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[1];
+ FmgrInfo *cast_func_finfo;
+ FmgrInfo *eq_opr_finfo;
+ Oid elem_type;
+ int16 elem_len;
+ bool elem_byval;
+ char elem_align;
+ ArrayType *arr;
+
+ Assert(fpmeta);
+
+ memset(matched, 0, nvals * sizeof(bool));
+
+ /*
+ * Extract FK values, casting to the operator's expected input type if
+ * needed (e.g. int8 FK -> numeric for a numeric PK).
+ */
+ cast_func_finfo = &fpmeta->cast_func_finfo[0];
+ eq_opr_finfo = &fpmeta->eq_opr_finfo[0];
+ for (int i = 0; i < nvals; i++)
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+
+ /* Cast if needed (e.g. int8 FK -> numeric PK) */
+ if (OidIsValid(cast_func_finfo->fn_oid))
+ search_vals[i] = FunctionCall3(cast_func_finfo,
+ pk_vals[0],
+ Int32GetDatum(-1),
+ BoolGetDatum(false));
+ else
+ search_vals[i] = pk_vals[0];
+ }
+
+ /*
+ * Array element type must match the operator's right-hand input type,
+ * which is what the index comparison expects on the search side.
+ * ri_populate_fastpath_metadata() stores exactly this via
+ * get_op_opfamily_properties(), which returns the operator's right-hand
+ * type as the subtype for cross-type operators (e.g. int8 for int48eq)
+ * and the common type for same-type operators.
+ */
+ elem_type = fpmeta->subtypes[0];
+ Assert(OidIsValid(elem_type));
+ get_typlenbyvalalign(elem_type, &elem_len, &elem_byval, &elem_align);
+
+ arr = construct_array(search_vals, nvals,
+ elem_type, elem_len, elem_byval, elem_align);
+
+ /*
+ * Build scan key with SK_SEARCHARRAY. The index AM code will internally
+ * sort and deduplicate, then walk leaf pages in order.
+ */
+ ScanKeyEntryInitialize(&skey[0],
+ SK_SEARCHARRAY,
+ 1, /* attno */
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ PointerGetDatum(arr));
+
+ index_rescan(scandesc, skey, 1, NULL, 0);
+
+ /*
+ * Walk all matches. The index AM returns them in index order. For each
+ * match, find which batch item(s) it satisfies.
+ */
+ while (index_getnext_slot(scandesc, ForwardScanDirection, pk_slot))
+ {
+ Datum found_val;
+ bool found_null;
+ bool concurrently_updated;
+ ScanKeyData recheck_skey[1];
+
+ if (!ri_LockPKTuple(pk_rel, pk_slot, snapshot, &concurrently_updated))
+ continue;
+
+ /* Extract the PK value from the matched and locked tuple */
+ found_val = slot_getattr(pk_slot, riinfo->pk_attnums[0], &found_null);
+ Assert(!found_null);
+
+ if (concurrently_updated)
+ {
+ /*
+ * Build a single-key scankey for recheck. We need the actual PK
+ * value that was found, not the FK search value.
+ */
+ ScanKeyEntryInitialize(&recheck_skey[0], 0, 1,
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ found_val);
+ if (!recheck_matched_pk_tuple(idx_rel, recheck_skey, pk_slot))
+ continue;
+ }
+
+ /*
+ * Linear scan to mark all batch items matching this PK value.
+ * O(batch_size) per match, O(batch_size^2) worst case -- fine for the
+ * current batch size of 64.
+ */
+ for (int i = 0; i < nvals; i++)
+ {
+ if (!matched[i] &&
+ DatumGetBool(FunctionCall2Coll(eq_opr_finfo,
+ idx_rel->rd_indcollation[0],
+ found_val,
+ search_vals[i])))
+ matched[i] = true;
+ }
+ }
+
+ /* Report first unmatched row */
+ for (int i = 0; i < nvals; i++)
+ if (!matched[i])
+ return i;
+
+ /* All pass. */
+ return -1;
+}
+
/*
* ri_FastPathProbeOne
* Probe the PK index for one set of scan keys, lock the matching
@@ -3687,3 +4080,196 @@ RI_FKey_trigger_type(Oid tgfoid)
return RI_TRIGGER_NONE;
}
+
+/*
+ * ri_FastPathEndBatch
+ * Flush remaining rows and tear down cached state.
+ *
+ * Registered as an AfterTriggerBatchCallback. Note: the flush can
+ * do real work (CCI, security context switch, index probes) and can
+ * throw ERROR on a constraint violation. If that happens,
+ * ri_FastPathTeardown never runs; ResourceOwner + XactCallback
+ * handle resource cleanup on the abort path.
+ */
+static void
+ri_FastPathEndBatch(void *arg)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ /* Flush any partial batches -- can throw ERROR */
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->batch_count > 0)
+ {
+ Relation fk_rel = table_open(entry->fk_relid, AccessShareLock);
+
+ ri_FastPathBatchFlush(entry, fk_rel);
+ table_close(fk_rel, NoLock);
+ }
+ }
+
+ /* Orderly teardown */
+ ri_FastPathTeardown();
+}
+
+/*
+ * ri_FastPathTeardown
+ * Tear down all cached fast-path state.
+ *
+ * Called from ri_FastPathEndBatch() after flushing any remaining rows.
+ */
+static void
+ri_FastPathTeardown(void)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->idx_rel)
+ index_close(entry->idx_rel, NoLock);
+ if (entry->pk_rel)
+ table_close(entry->pk_rel, NoLock);
+ if (entry->pk_slot)
+ ExecDropSingleTupleTableSlot(entry->pk_slot);
+ if (entry->fk_slot)
+ ExecDropSingleTupleTableSlot(entry->fk_slot);
+ if (entry->flush_cxt)
+ MemoryContextDelete(entry->flush_cxt);
+ }
+
+ hash_destroy(ri_fastpath_cache);
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static bool ri_fastpath_xact_callback_registered = false;
+
+static void
+ri_FastPathXactCallback(XactEvent event, void *arg)
+{
+ /*
+ * On abort, ResourceOwner already released relations; on commit,
+ * ri_FastPathTeardown already ran. Either way, just NULL the static
+ * pointers so they don't dangle into the next transaction.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static void
+ri_FastPathSubXactCallback(SubXactEvent event, SubTransactionId mySubid,
+ SubTransactionId parentSubid, void *arg)
+{
+ if (event == SUBXACT_EVENT_ABORT_SUB)
+ {
+ /*
+ * ResourceOwner already released relations. NULL the static pointers
+ * so the still-registered batch callback becomes a no-op for the rest
+ * of this transaction.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+ }
+}
+
+/*
+ * ri_FastPathGetEntry
+ * Look up or create a per-batch cache entry for the given constraint.
+ *
+ * On first call for a constraint within a batch: opens pk_rel and the index,
+ * allocates slots for both the FK row and the looked-up PK row, and registers
+ * the cleanup callback.
+ *
+ * On subsequent calls: returns the existing entry.
+ */
+static RI_FastPathEntry *
+ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+ RI_FastPathEntry *entry;
+ bool found;
+
+ /* Create hash table on first use in this batch */
+ if (ri_fastpath_cache == NULL)
+ {
+ HASHCTL ctl;
+
+ if (!ri_fastpath_xact_callback_registered)
+ {
+ RegisterXactCallback(ri_FastPathXactCallback, NULL);
+ RegisterSubXactCallback(ri_FastPathSubXactCallback, NULL);
+ ri_fastpath_xact_callback_registered = true;
+ }
+
+ ctl.keysize = sizeof(Oid);
+ ctl.entrysize = sizeof(RI_FastPathEntry);
+ ctl.hcxt = TopTransactionContext;
+ ri_fastpath_cache = hash_create("RI fast-path cache",
+ 16,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ }
+
+ entry = hash_search(ri_fastpath_cache, &riinfo->constraint_id,
+ HASH_ENTER, &found);
+
+ if (!found)
+ {
+ MemoryContext oldcxt;
+
+ /*
+ * Zero out non-key fields so ri_FastPathTeardown is safe if we error
+ * out during partial initialization below.
+ */
+ memset(((char *) entry) + offsetof(RI_FastPathEntry, pk_rel), 0,
+ sizeof(RI_FastPathEntry) - offsetof(RI_FastPathEntry, pk_rel));
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+
+ entry->fk_relid = RelationGetRelid(fk_rel);
+
+ /*
+ * Open PK table and its unique index.
+ *
+ * RowShareLock on pk_rel matches what the SPI path's SELECT ... FOR
+ * KEY SHARE would acquire as a relation-level lock. AccessShareLock
+ * on the index is standard for index scans.
+ *
+ * We don't release these locks until end of transaction, matching SPI
+ * behavior.
+ */
+ entry->pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ entry->idx_rel = index_open(riinfo->conindid, AccessShareLock);
+ entry->pk_slot = table_slot_create(entry->pk_rel, NULL);
+
+ /*
+ * Must be TTSOpsHeapTuple because ExecStoreHeapTuple() is used to
+ * load entries from batch[] into this slot for value extraction.
+ */
+ entry->fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
+ &TTSOpsHeapTuple);
+
+ entry->flush_cxt = AllocSetContextCreate(TopTransactionContext,
+ "RI fast path flush temporary context",
+ ALLOCSET_SMALL_SIZES);
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Ensure cleanup at end of this trigger-firing batch */
+ if (!ri_fastpath_callback_registered)
+ {
+ RegisterAfterTriggerBatchCallback(ri_FastPathEndBatch, NULL);
+ ri_fastpath_callback_registered = true;
+ }
+ }
+
+ return entry;
+}
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 27af5284406..7664298f5c8 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -289,4 +289,22 @@ extern void RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel,
extern int RI_FKey_trigger_type(Oid tgfoid);
+/*
+ * Callback type for end-of-trigger-batch notifications.
+ *
+ * Registered via RegisterAfterTriggerBatchCallback(). Invoked when
+ * a batch of after-trigger processing completes:
+ * - AfterTriggerEndQuery() (immediate constraints)
+ * - AfterTriggerFireDeferred() (deferred constraints at COMMIT)
+ * - AfterTriggerSetState() (SET CONSTRAINTS IMMEDIATE)
+ *
+ * The callback list is cleared after each batch. Callers must
+ * re-register if they need to be called again in a subsequent batch.
+ */
+typedef void (*AfterTriggerBatchCallback) (void *arg);
+
+extern void RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg);
+extern bool AfterTriggerBatchIsActive(void);
+
#endif /* TRIGGER_H */
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 6c607d36222..91295754bab 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3557,3 +3557,129 @@ DETAIL: drop cascades to table fkpart13_t1
drop cascades to table fkpart13_t2
drop cascades to table fkpart13_t3
RESET search_path;
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+ERROR: insert or update on table "fp_fk_alter" violates foreign key constraint "fp_fk_alter_a_fkey"
+DETAIL: Key (a)=(101) is not present in table "fp_pk_alter".
+DROP TABLE fp_fk_alter, fp_pk_alter;
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+ERROR: insert or update on table "fp_fk_alter2" violates foreign key constraint "fp_fk_alter2_a_fkey"
+DETAIL: Key (a)=(200) is not present in table "fp_pk_alter2".
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+ERROR: insert or update on table "fp_multi_fk" violates foreign key constraint "fp_multi_fk_b_fkey"
+DETAIL: Key (b)=(2) is not present in table "fp_pk2".
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+ERROR: insert or update on table "fp_fk_defer" violates foreign key constraint "fp_fk_defer_a_fkey"
+DETAIL: Key (a)=(3) is not present in table "fp_pk_defer".
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+ a
+---
+ 1
+ 1
+(2 rows)
+
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+ERROR: insert or update on table "fp_fk_multi" violates foreign key constraint "fp_fk_multi_a_b_fkey"
+DETAIL: Key (a, b)=(999, 999) is not present in table "fp_pk_multi".
+DROP TABLE fp_fk_multi, fp_pk_multi;
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+ERROR: insert or update on table "fp_fk_commit" violates foreign key constraint "fp_fk_commit_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_commit".
+DROP TABLE fp_fk_commit, fp_pk_commit;
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+ERROR: insert or update on table "fp_fk_cross" violates foreign key constraint "fp_fk_cross_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_cross".
+DROP TABLE fp_fk_cross, fp_pk_cross;
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index fcdd006c971..f646dd10401 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2535,3 +2535,121 @@ WITH cte AS (
DROP SCHEMA fkpart13 CASCADE;
RESET search_path;
+
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+DROP TABLE fp_fk_alter, fp_pk_alter;
+
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
+
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+DROP TABLE fp_fk_multi, fp_pk_multi;
+
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+DROP TABLE fp_fk_commit, fp_pk_commit;
+
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+DROP TABLE fp_fk_cross, fp_pk_cross;
+
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8e9c06547d6..c0b9e51e335 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -30,6 +30,8 @@ AddForeignUpdateTargets_function
AddrInfo
AffixNode
AffixNodeData
+AfterTriggerBatchCallback
+AfterTriggerCallbackItem
AfterTriggerEvent
AfterTriggerEventChunk
AfterTriggerEventData
@@ -2485,6 +2487,7 @@ RIX
RI_CompareHashEntry
RI_CompareKey
RI_ConstraintInfo
+RI_FastPathEntry
RI_QueryHashEntry
RI_QueryKey
RTEKind
--
2.47.3
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Amit Langote @ 2026-04-01 09:51 UTC (permalink / raw)
To: Junwang Zhao <[email protected]>; +Cc: Chao Li <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
On Wed, Apr 1, 2026 at 5:51 PM Amit Langote <[email protected]> wrote:
> On Wed, Apr 1, 2026 at 12:54 AM Junwang Zhao <[email protected]> wrote:
> > + if (riinfo->fpmeta == NULL)
> > + {
> > + /* Reload to ensure it's valid. */
> > + riinfo = ri_LoadConstraintInfo(riinfo->constraint_id);
> >
> > I was thinking of wrapping the reload in a conditional check like
> > `!riinfo->valid`, since `riinfo` can be valid even when `fpmeta == NULL`.
> > However, `if (riinfo->fpmeta == NULL)` should rarely be true, so the
> > unconditional reload is harmless, and the code is cleaner.
> >
> > +1 to the fix.
>
> Thanks for checking.
>
> I have just pushed a slightly modified version of that.
>
> > > 0002 is the rebased batching patch.
> >
> > The change of RI_FastPathEntry from storing riinfo to fk_relid
> > makes sense to me. I'll do another review on 0002 tomorrow.
>
> Here's another version.
>
> This time, I have another fixup patch (0001) to make FastPathMeta
> self-contained by copying the FmgrInfo structs it needs out of
> RI_CompareHashEntry rather than storing pointers into it. This avoids
> any dependency on those cache entries remaining stable. I'll push
> that once the just-committed patch has seen enough BF animals.
Pushed.
> 0002 is rebased over that.
Rebased again.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v15-0001-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch (43.5K, 2-v15-0001-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch)
download | inline diff:
From 191c4e665213b38602762c068445d98b6c39139e Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 31 Mar 2026 18:22:23 +0900
Subject: [PATCH v15] Batch FK rows and use SK_SEARCHARRAY for fast-path FK
probes
Instead of probing the PK index on each trigger invocation, buffer
FK rows in a new per-constraint cache entry (RI_FastPathEntry) and
flush them as a batch. Combined with the fast path introduced in
the previous commit, bulk FK inserts are ~2.9x faster (int PK /
int FK, 1M rows, PK table and index cached in memory).
On each trigger invocation, the new ri_FastPathBatchAdd() buffers
the FK row in RI_FastPathEntry. When the buffer fills (64 rows)
or the trigger-firing cycle ends, the new ri_FastPathBatchFlush()
probes the index for all buffered rows, sharing a single
CommandCounterIncrement, snapshot, permission check, and security
context switch across the batch, rather than repeating each per row
as the SPI path does. Per-flush CCI is safe because all AFTER
triggers for the buffered rows have already fired by flush time.
For single-column foreign keys, the flush builds an ArrayType from
the buffered FK values (casting to the PK-side type if needed) and
constructs a scan key with the SK_SEARCHARRAY flag. The index AM
sorts and deduplicates the array internally, then walks matching
leaf pages in one ordered traversal instead of descending from the
root once per row. A matched[] bitmap tracks which batch items
were satisfied; the first unmatched item is reported as a
violation. Multi-column foreign keys fall back to per-row probing
via the new ri_FastPathFlushLoop().
FK tuples are materialized via ExecCopySlotHeapTuple() into a new
purpose-specific memory context (flush_cxt), child of
TopTransactionContext, which is also used for per-flush transient
work: cast results, the search array, and index scan allocations.
It is reset after each flush and deleted in teardown.
The PK relation, index, tuple slots, and fast-path metadata are
cached in RI_FastPathEntry across trigger invocations within a
trigger-firing batch, avoiding repeated open/close overhead. The
snapshot and IndexScanDesc are taken fresh per flush.
ri_FastPathEndBatch() flushes any partial batch before tearing
down cached resources. Since the FK relation may already be
closed by flush time (e.g. for deferred constraints at COMMIT),
it reopens the relation using entry->fk_relid if needed.
The existing ALTER TABLE validation path bypasses batching and
continues to call ri_FastPathCheck() directly per row.
Lifecycle management for RI_FastPathEntry relies on the following
mechanisms:
- AfterTriggerBatchCallback: A new general-purpose callback
mechanism in trigger.c. Callbacks registered via
RegisterAfterTriggerBatchCallback() fire at the end of each
trigger-firing batch (AfterTriggerEndQuery for immediate
constraints, AfterTriggerFireDeferred at COMMIT, and
AfterTriggerSetState for SET CONSTRAINTS IMMEDIATE). The RI
code registers ri_FastPathEndBatch as a batch callback.
- Batch callbacks only fire at the outermost query level
(checked inside FireAfterTriggerBatchCallbacks), so nested
queries from SPI inside other AFTER triggers do not tear down
the cache mid-batch.
- XactCallback: ri_FastPathXactCallback NULLs the static cache
pointer at transaction end, handling the abort path where the
batch callback never fired.
- SubXactCallback: ri_FastPathSubXactCallback NULLs the static
cache pointer on subtransaction abort, preventing the batch
callback from accessing already-released resources.
- AfterTriggerBatchIsActive(): A new exported accessor that
returns true when afterTriggers.query_depth >= 0. During
ALTER TABLE ... ADD FOREIGN KEY validation, RI triggers are
called directly outside the after-trigger framework, so batch
callbacks would never fire. The fast-path code uses this to
fall back to the non-cached per-invocation path in that
context.
Author: Amit Langote <[email protected]>
Co-authored-by: Junwang Zhao <[email protected]>
Reviewed-by: Haibo Yan <[email protected]>
Tested-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
---
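Note for readers (not part of the commit message): the single-column
flush described above can be modelled outside the backend. The sketch
below is a standalone toy under stated assumptions, not code from the
patch: a sorted int array stands in for the PK index, plain int
comparison stands in for the equality operator, and toy_flush_array(),
cmp_int() and BATCH_SIZE are hypothetical names. Tuple locking, the
recheck after concurrent updates, and cross-type casts are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define BATCH_SIZE 64			/* mirrors RI_FASTPATH_BATCH_SIZE */

static int
cmp_int(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

/*
 * Toy stand-in for the SK_SEARCHARRAY flush.  "pk" is a sorted array of
 * unique keys playing the role of the PK index.  Returns the position of
 * the first FK value with no match, or -1 if every value matched.
 */
static int
toy_flush_array(const int *pk, int npk, const int *fk, int nfk)
{
	int			probes[BATCH_SIZE];
	bool		matched[BATCH_SIZE];
	int			nprobes = 0;
	int			pos = 0;

	assert(nfk >= 0 && nfk <= BATCH_SIZE);
	memset(matched, 0, sizeof(matched));

	/* Sort and deduplicate the search values, as the index AM would. */
	memcpy(probes, fk, nfk * sizeof(int));
	qsort(probes, nfk, sizeof(int), cmp_int);
	for (int i = 0; i < nfk; i++)
		if (nprobes == 0 || probes[nprobes - 1] != probes[i])
			probes[nprobes++] = probes[i];

	/* One ordered pass over the "index" instead of one descent per row. */
	for (int p = 0; p < nprobes; p++)
	{
		while (pos < npk && pk[pos] < probes[p])
			pos++;
		if (pos == npk)
			break;				/* remaining probes exceed every key */
		if (pk[pos] != probes[p])
			continue;			/* no key equals this probe value */

		/* Mark every buffered row satisfied by this PK value. */
		for (int i = 0; i < nfk; i++)
			if (!matched[i] && fk[i] == pk[pos])
				matched[i] = true;
	}

	/* Report the first unmatched row, in original batch order. */
	for (int i = 0; i < nfk; i++)
		if (!matched[i])
			return i;
	return -1;
}
```

With pk = {1, 2, 3, 5, 8}, a batch of {3, 1, 3, 8} returns -1, while
{5, 4, 2, 9} returns 1: the value 4 is the first buffered row without a
match, even though the probes themselves run in sorted order.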
src/backend/commands/trigger.c | 105 ++++
src/backend/utils/adt/ri_triggers.c | 594 +++++++++++++++++++++-
src/include/commands/trigger.h | 18 +
src/test/regress/expected/foreign_key.out | 126 +++++
src/test/regress/sql/foreign_key.sql | 118 +++++
src/tools/pgindent/typedefs.list | 3 +
6 files changed, 960 insertions(+), 4 deletions(-)
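Note for readers (not part of the patch): the batch-callback lifecycle
from the commit message (register during a batch, fire only at the
outermost level, clear after firing) can be sketched in isolation. This
is a minimal model under stated assumptions, not the trigger.c code: a
fixed-size array replaces the List kept in TopTransactionContext, the
file-level query_depth variable imitates afterTriggers.query_depth, and
toy_register_callback(), toy_fire_callbacks(), count_flush() and
MAX_CALLBACKS are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*batch_callback) (void *arg);

#define MAX_CALLBACKS 8

static struct
{
	batch_callback fn;
	void	   *arg;
}			callbacks[MAX_CALLBACKS];
static int	ncallbacks = 0;
static int	query_depth = -1;	/* -1: no query-level batch is active */

/* Register a callback for the current batch; only valid inside a batch. */
static void
toy_register_callback(batch_callback fn, void *arg)
{
	assert(query_depth >= 0);
	assert(ncallbacks < MAX_CALLBACKS);
	callbacks[ncallbacks].fn = fn;
	callbacks[ncallbacks].arg = arg;
	ncallbacks++;
}

/*
 * Invoke and clear the registered callbacks, unless we are in a nested
 * query (depth > 0), where the outer batch still needs its state.
 */
static void
toy_fire_callbacks(void)
{
	if (query_depth > 0)
		return;
	for (int i = 0; i < ncallbacks; i++)
		callbacks[i].fn(callbacks[i].arg);
	ncallbacks = 0;				/* callers must re-register per batch */
}

/* Example callback: counts how many times a flush would have happened. */
static void
count_flush(void *arg)
{
	(*(int *) arg)++;
}
```

Registering count_flush at depth 0, calling toy_fire_callbacks() at
depth 1 and then again at depth 0 runs the callback exactly once; a
further call does nothing because the list was cleared, which is why
the RI code re-registers ri_FastPathEndBatch for each new batch.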
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 6596843a8d8..4bc31cabff2 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3893,6 +3893,8 @@ typedef struct AfterTriggersData
/* per-subtransaction-level data: */
AfterTriggersTransData *trans_stack; /* array of structs shown below */
int maxtransdepth; /* allocated len of above array */
+
+ List *batch_callbacks; /* List of AfterTriggerCallbackItem */
} AfterTriggersData;
struct AfterTriggersQueryData
@@ -3929,6 +3931,13 @@ struct AfterTriggersTableData
TupleTableSlot *storeslot; /* for converting to tuplestore's format */
};
+/* Entry in afterTriggers.batch_callbacks */
+typedef struct AfterTriggerCallbackItem
+{
+ AfterTriggerBatchCallback callback;
+ void *arg;
+} AfterTriggerCallbackItem;
+
static AfterTriggersData afterTriggers;
static void AfterTriggerExecute(EState *estate,
@@ -3964,6 +3973,7 @@ static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
Oid tgoid, bool tgisdeferred);
static void cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent);
+static void FireAfterTriggerBatchCallbacks(void);
/*
* Get the FDW tuplestore for the current trigger query level, creating it
@@ -5089,6 +5099,7 @@ AfterTriggerBeginXact(void)
*/
afterTriggers.firing_counter = (CommandId) 1; /* mustn't be 0 */
afterTriggers.query_depth = -1;
+ afterTriggers.batch_callbacks = NIL;
/*
* Verify that there is no leftover state remaining. If these assertions
@@ -5210,6 +5221,15 @@ AfterTriggerEndQuery(EState *estate)
break;
}
+ /*
+ * Fire batch callbacks before releasing query-level storage and before
+ * decrementing query_depth. Callbacks may do real work (index probes,
+ * error reporting) and rely on query_depth still reflecting the current
+ * batch level so that nested calls from SPI inside AFTER triggers are
+ * correctly suppressed by FireAfterTriggerBatchCallbacks's depth guard.
+ */
+ FireAfterTriggerBatchCallbacks();
+
/* Release query-level-local storage, including tuplestores if any */
AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
@@ -5317,6 +5337,9 @@ AfterTriggerFireDeferred(void)
break; /* all fired */
}
+ /* Flush any fast-path batches accumulated by the triggers just fired. */
+ FireAfterTriggerBatchCallbacks();
+
/*
* We don't bother freeing the event list, since it will go away anyway
* (and more efficiently than via pfree) in AfterTriggerEndXact.
@@ -6059,6 +6082,11 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
break; /* all fired */
}
+ /*
+ * Flush any fast-path batches accumulated by the triggers just fired.
+ */
+ FireAfterTriggerBatchCallbacks();
+
if (snapshot_set)
PopActiveSnapshot();
}
@@ -6755,3 +6783,80 @@ check_modified_virtual_generated(TupleDesc tupdesc, HeapTuple tuple)
return tuple;
}
+
+/*
+ * RegisterAfterTriggerBatchCallback
+ * Register a function to be called when the current trigger-firing
+ * batch completes.
+ *
+ * Must be called from within a trigger function's execution context
+ * (i.e., while afterTriggers state is active).
+ *
+ * The callback list is cleared after invocation, so the caller must
+ * re-register for each new batch if needed.
+ */
+void
+RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg)
+{
+ AfterTriggerCallbackItem *item;
+ MemoryContext oldcxt;
+
+ /*
+ * Allocate in TopTransactionContext so the item survives for the duration
+ * of the batch, which may span multiple trigger invocations.
+ *
+ * Must be called while afterTriggers is active (query_depth >= 0);
+ * callbacks registered outside a trigger-firing context would never fire.
+ */
+ Assert(afterTriggers.query_depth >= 0);
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ item = palloc(sizeof(AfterTriggerCallbackItem));
+ item->callback = callback;
+ item->arg = arg;
+ afterTriggers.batch_callbacks =
+ lappend(afterTriggers.batch_callbacks, item);
+ MemoryContextSwitchTo(oldcxt);
+}
+
+/*
+ * FireAfterTriggerBatchCallbacks
+ * Invoke and clear all registered batch callbacks.
+ *
+ * Only fires at the outermost query level (query_depth == 0) or from
+ * top-level operations (query_depth == -1, e.g. AfterTriggerFireDeferred
+ * at COMMIT). Nested queries from SPI inside AFTER triggers run at
+ * depth > 0 and must not tear down resources the outer batch still needs.
+ */
+static void
+FireAfterTriggerBatchCallbacks(void)
+{
+ ListCell *lc;
+
+ if (afterTriggers.query_depth > 0)
+ return;
+
+ foreach(lc, afterTriggers.batch_callbacks)
+ {
+ AfterTriggerCallbackItem *item = lfirst(lc);
+
+ item->callback(item->arg);
+ }
+
+ list_free_deep(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = NIL;
+}
+
+/*
+ * AfterTriggerBatchIsActive
+ * Returns true if we're inside a query-level trigger batch where
+ * registered batch callbacks will actually be invoked.
+ *
+ * This is false during validateForeignKeyConstraint(), which calls
+ * RI trigger functions directly outside the after-trigger framework.
+ */
+bool
+AfterTriggerBatchIsActive(void)
+{
+ return afterTriggers.query_depth >= 0;
+}
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 2de08da6539..b60e7955636 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -199,6 +199,48 @@ typedef struct RI_CompareHashEntry
FmgrInfo cast_func_finfo; /* in case we must coerce input */
} RI_CompareHashEntry;
+/*
+ * Maximum number of FK rows buffered before flushing.
+ *
+ * Larger batches amortize per-flush overhead and let the SK_SEARCHARRAY
+ * path walk more leaf pages in a single sorted traversal. But each
+ * buffered row is a materialized HeapTuple in flush_cxt, and the matched[]
+ * scan in ri_FastPathFlushArray() is O(batch_size) per index match.
+ * Benchmarking showed little difference between 16 and 64, with 256
+ * consistently slower. 64 is a reasonable default.
+ */
+#define RI_FASTPATH_BATCH_SIZE 64
+
+/*
+ * RI_FastPathEntry
+ * Per-constraint cache of resources needed by ri_FastPathBatchFlush().
+ *
+ * One entry per constraint, keyed by pg_constraint OID. Created lazily
+ * by ri_FastPathGetEntry() on first use within a trigger-firing batch
+ * and torn down by ri_FastPathTeardown() at batch end.
+ *
+ * FK tuples are buffered in batch[] across trigger invocations and
+ * flushed when the buffer fills or the batch ends.
+ */
+typedef struct RI_FastPathEntry
+{
+ Oid conoid; /* hash key: pg_constraint OID */
+ Oid fk_relid; /* for ri_FastPathEndBatch() */
+ Relation pk_rel;
+ Relation idx_rel;
+ TupleTableSlot *pk_slot;
+ TupleTableSlot *fk_slot;
+ MemoryContext flush_cxt; /* short-lived context for per-flush work */
+
+ /*
+ * TODO: batch[] is HeapTuple[] because the AFTER trigger machinery
+ * currently passes tuples as HeapTuples. Once trigger infrastructure is
+ * slotified, this should use a slot array or whatever batched tuple
+ * storage abstraction exists at that point to be TAM-agnostic.
+ */
+ HeapTuple batch[RI_FASTPATH_BATCH_SIZE];
+ int batch_count;
+} RI_FastPathEntry;
/*
* Local data
@@ -208,6 +250,8 @@ static HTAB *ri_query_cache = NULL;
static HTAB *ri_compare_cache = NULL;
static dclist_head ri_constraint_cache_valid_list;
+static HTAB *ri_fastpath_cache = NULL;
+static bool ri_fastpath_callback_registered = false;
/*
* Local function prototypes
@@ -258,6 +302,16 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
bool detectNewRows, int expect_OK);
static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static int ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc);
+static int ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc);
+static void ri_FastPathBatchFlush(RI_FastPathEntry *fpentry,
+ Relation fk_rel);
static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
IndexScanDesc scandesc, TupleTableSlot *slot,
Snapshot snapshot, const RI_ConstraintInfo *riinfo,
@@ -280,6 +334,10 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool is_restrict, bool partgone);
+static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel);
+static void ri_FastPathEndBatch(void *arg);
+static void ri_FastPathTeardown(void);
/*
@@ -390,12 +448,22 @@ RI_FKey_check(TriggerData *trigdata)
* lock. This is semantically equivalent to the SPI path below but avoids
* the per-row executor overhead.
*
- * ri_FastPathCheck() reports the violation itself (via ereport) if no
- * matching PK row is found, so it only returns on success.
+ * ri_FastPathBatchAdd() and ri_FastPathCheck() report the violation
+ * themselves if no matching PK row is found, so they only return on
+ * success.
*/
if (ri_fastpath_is_applicable(riinfo))
{
- ri_FastPathCheck(riinfo, fk_rel, newslot);
+ if (AfterTriggerBatchIsActive())
+ {
+ /* Batched path: buffer and probe in groups */
+ ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
+ }
+ else
+ {
+ /* ALTER TABLE validation: per-row, no cache */
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
+ }
return PointerGetDatum(NULL);
}
@@ -2690,10 +2758,14 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/*
* ri_FastPathCheck
- * Perform FK existence check via direct index probe, bypassing SPI.
+ * Perform per-row FK existence check via direct index probe,
+ * bypassing SPI.
*
* If no matching PK row exists, report the violation via ri_ReportViolation(),
* otherwise, the function returns normally.
+ *
+ * Note: This is only used by the ALTER TABLE validation path. Other paths use
+ * ri_FastPathBatchAdd().
*/
static void
ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
@@ -2761,6 +2833,327 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
table_close(pk_rel, NoLock);
}
+/*
+ * ri_FastPathBatchAdd
+ * Buffer a FK row for batched probing.
+ *
+ * Adds the row to the batch buffer. When the buffer is full, flushes all
+ * buffered rows by probing the PK index. Any violation is reported
+ * immediately during the flush via ri_ReportViolation (which does not return).
+ *
+ * Uses the per-batch cache (RI_FastPathEntry) to avoid per-row relation
+ * open/close, slot creation, etc.
+ *
+ * The batch is also flushed at end of trigger-firing cycle via
+ * ri_FastPathEndBatch().
+ */
+static void
+ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
+ MemoryContext oldcxt;
+
+ oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
+ fpentry->batch[fpentry->batch_count] =
+ ExecCopySlotHeapTuple(newslot);
+ fpentry->batch_count++;
+ MemoryContextSwitchTo(oldcxt);
+
+ if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
+ ri_FastPathBatchFlush(fpentry, fk_rel);
+}
+
+/*
+ * ri_FastPathBatchFlush
+ * Flush all buffered FK rows by probing the PK index.
+ *
+ * Dispatches to ri_FastPathFlushArray() for single-column FKs
+ * (using SK_SEARCHARRAY) or ri_FastPathFlushLoop() for multi-column
+ * FKs (per-row probing). Violations are reported immediately via
+ * ri_ReportViolation(), which does not return.
+ */
+static void
+ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
+{
+ /* Reload; may have been invalidated since last batch accumulation. */
+ const RI_ConstraintInfo *riinfo = ri_LoadConstraintInfo(fpentry->conoid);
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *fk_slot = fpentry->fk_slot;
+ Snapshot snapshot;
+ IndexScanDesc scandesc;
+ Oid saved_userid;
+ int saved_sec_context;
+ MemoryContext oldcxt;
+ int violation_index;
+
+ if (fpentry->batch_count == 0)
+ return;
+
+ /*
+ * CCI and security context switch are done once for the entire batch.
+ * Per-row CCI is unnecessary because by the time a flush runs, all AFTER
+ * triggers for the buffered rows have already fired (trigger invocations
+ * strictly alternate per row), so a single CCI advances past all their
+ * effects. Per-row security context switch is unnecessary because each
+ * row's probe runs entirely as the PK table owner, same as the SPI path
+ * -- the only difference is that the SPI path sets and restores the
+ * context per row whereas we do it once around the whole batch.
+ */
+ CommandCounterIncrement();
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+ /*
+ * build_index_scankeys() may palloc cast results for cross-type FKs. Use
+ * the entry's short-lived flush context so these don't accumulate across
+ * batches.
+ */
+ oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
+
+ scandesc = index_beginscan(pk_rel, idx_rel, snapshot, NULL,
+ riinfo->nkeys, 0, SO_NONE);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ /*
+ * Check that the current user has permission to access pk_rel. Done here
+ * rather than at entry creation so that permission changes between
+ * flushes are respected. This approximates the per-row checking done by
+ * the SPI path and by ri_FastPathCheck(), except that the check happens
+ * once per flush rather than once per row.
+ */
+ ri_CheckPermissions(pk_rel);
+
+ if (riinfo->fpmeta == NULL)
+ {
+ /* Reload to ensure it's valid. */
+ riinfo = ri_LoadConstraintInfo(riinfo->constraint_id);
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ }
+ Assert(riinfo->fpmeta);
+ if (riinfo->nkeys == 1)
+ violation_index = ri_FastPathFlushArray(fpentry, fk_slot, riinfo,
+ fk_rel, snapshot, scandesc);
+ else
+ violation_index = ri_FastPathFlushLoop(fpentry, fk_slot, riinfo,
+ fk_rel, snapshot, scandesc);
+
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ UnregisterSnapshot(snapshot);
+ index_endscan(scandesc);
+
+ if (violation_index >= 0)
+ {
+ ExecStoreHeapTuple(fpentry->batch[violation_index], fk_slot, false);
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ fk_slot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
+
+ MemoryContextReset(fpentry->flush_cxt);
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Reset. */
+ fpentry->batch_count = 0;
+}
+
+/*
+ * ri_FastPathFlushLoop
+ * Multi-column fallback: probe the index once per buffered row.
+ *
+ * Used for composite foreign keys where SK_SEARCHARRAY does not
+ * apply.
+ *
+ * Returns the index of the first violating row in the batch array, or -1 if
+ * all rows are valid.
+ */
+static int
+ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc)
+{
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found = true;
+
+ for (int i = 0; i < fpentry->batch_count; i++)
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, pk_slot,
+ snapshot, riinfo, skey, riinfo->nkeys);
+
+ /* Report first unmatched row */
+ if (!found)
+ return i;
+ }
+
+ /* All pass. */
+ return -1;
+}
+
+/*
+ * ri_FastPathFlushArray
+ * Single-column fast path using SK_SEARCHARRAY.
+ *
+ * Builds an array of FK values and does one index scan with
+ * SK_SEARCHARRAY. The index AM sorts and deduplicates the array
+ * internally, then walks matching leaf pages in order. Each
+ * matched PK tuple is locked and rechecked as before; a matched[]
+ * bitmap tracks which batch items were satisfied.
+ *
+ * Returns the index of the first violating row in the batch array, or -1 if
+ * all rows are valid.
+ */
+static int
+ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc)
+{
+ FastPathMeta *fpmeta = riinfo->fpmeta;
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Datum search_vals[RI_FASTPATH_BATCH_SIZE];
+ bool matched[RI_FASTPATH_BATCH_SIZE];
+ int nvals = fpentry->batch_count;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[1];
+ FmgrInfo *cast_func_finfo;
+ FmgrInfo *eq_opr_finfo;
+ Oid elem_type;
+ int16 elem_len;
+ bool elem_byval;
+ char elem_align;
+ ArrayType *arr;
+
+ Assert(fpmeta);
+
+ memset(matched, 0, nvals * sizeof(bool));
+
+ /*
+ * Extract FK values, casting to the operator's expected input type if
+ * needed (e.g. int8 FK -> int4 for int48eq).
+ */
+ cast_func_finfo = &fpmeta->cast_func_finfo[0];
+ eq_opr_finfo = &fpmeta->eq_opr_finfo[0];
+ for (int i = 0; i < nvals; i++)
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+
+ /* Cast if needed (e.g. int8 FK -> numeric PK) */
+ if (OidIsValid(cast_func_finfo->fn_oid))
+ search_vals[i] = FunctionCall3(cast_func_finfo,
+ pk_vals[0],
+ Int32GetDatum(-1),
+ BoolGetDatum(false));
+ else
+ search_vals[i] = pk_vals[0];
+ }
+
+ /*
+ * Array element type must match the operator's right-hand input type,
+ * which is what the index comparison expects on the search side.
+ * ri_populate_fastpath_metadata() stores exactly this via
+ * get_op_opfamily_properties(), which returns the operator's right-hand
+ * type as the subtype for cross-type operators (e.g. int8 for int48eq)
+ * and the common type for same-type operators.
+ */
+ elem_type = fpmeta->subtypes[0];
+ Assert(OidIsValid(elem_type));
+ get_typlenbyvalalign(elem_type, &elem_len, &elem_byval, &elem_align);
+
+ arr = construct_array(search_vals, nvals,
+ elem_type, elem_len, elem_byval, elem_align);
+
+ /*
+ * Build scan key with SK_SEARCHARRAY. The index AM code will internally
+ * sort and deduplicate, then walk leaf pages in order.
+ */
+ ScanKeyEntryInitialize(&skey[0],
+ SK_SEARCHARRAY,
+ 1, /* attno */
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ PointerGetDatum(arr));
+
+ index_rescan(scandesc, skey, 1, NULL, 0);
+
+ /*
+ * Walk all matches. The index AM returns them in index order. For each
+ * match, find which batch item(s) it satisfies.
+ */
+ while (index_getnext_slot(scandesc, ForwardScanDirection, pk_slot))
+ {
+ Datum found_val;
+ bool found_null;
+ bool concurrently_updated;
+ ScanKeyData recheck_skey[1];
+
+ if (!ri_LockPKTuple(pk_rel, pk_slot, snapshot, &concurrently_updated))
+ continue;
+
+ /* Extract the PK value from the matched and locked tuple */
+ found_val = slot_getattr(pk_slot, riinfo->pk_attnums[0], &found_null);
+ Assert(!found_null);
+
+ if (concurrently_updated)
+ {
+ /*
+ * Build a single-key scankey for recheck. We need the actual PK
+ * value that was found, not the FK search value.
+ */
+ ScanKeyEntryInitialize(&recheck_skey[0], 0, 1,
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ found_val);
+ if (!recheck_matched_pk_tuple(idx_rel, recheck_skey, pk_slot))
+ continue;
+ }
+
+ /*
+ * Linear scan to mark all batch items matching this PK value.
+ * O(batch_size) per match, O(batch_size^2) worst case -- fine for the
+ * current batch size of 64.
+ */
+ for (int i = 0; i < nvals; i++)
+ {
+ if (!matched[i] &&
+ DatumGetBool(FunctionCall2Coll(eq_opr_finfo,
+ idx_rel->rd_indcollation[0],
+ found_val,
+ search_vals[i])))
+ matched[i] = true;
+ }
+ }
+
+ /* Report first unmatched row */
+ for (int i = 0; i < nvals; i++)
+ if (!matched[i])
+ return i;
+
+ /* All pass. */
+ return -1;
+}
+
/*
* ri_FastPathProbeOne
* Probe the PK index for one set of scan keys, lock the matching
@@ -3687,3 +4080,196 @@ RI_FKey_trigger_type(Oid tgfoid)
return RI_TRIGGER_NONE;
}
+
+/*
+ * ri_FastPathEndBatch
+ * Flush remaining rows and tear down cached state.
+ *
+ * Registered as an AfterTriggerBatchCallback. Note: the flush can
+ * do real work (CCI, security context switch, index probes) and can
+ * throw ERROR on a constraint violation. If that happens,
+ * ri_FastPathTeardown never runs; ResourceOwner + XactCallback
+ * handle resource cleanup on the abort path.
+ */
+static void
+ri_FastPathEndBatch(void *arg)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ /* Flush any partial batches -- can throw ERROR */
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->batch_count > 0)
+ {
+ Relation fk_rel = table_open(entry->fk_relid, AccessShareLock);
+
+ ri_FastPathBatchFlush(entry, fk_rel);
+ table_close(fk_rel, NoLock);
+ }
+ }
+
+ /* Orderly teardown */
+ ri_FastPathTeardown();
+}
+
+/*
+ * ri_FastPathTeardown
+ * Tear down all cached fast-path state.
+ *
+ * Called from ri_FastPathEndBatch() after flushing any remaining rows.
+ */
+static void
+ri_FastPathTeardown(void)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->idx_rel)
+ index_close(entry->idx_rel, NoLock);
+ if (entry->pk_rel)
+ table_close(entry->pk_rel, NoLock);
+ if (entry->pk_slot)
+ ExecDropSingleTupleTableSlot(entry->pk_slot);
+ if (entry->fk_slot)
+ ExecDropSingleTupleTableSlot(entry->fk_slot);
+ if (entry->flush_cxt)
+ MemoryContextDelete(entry->flush_cxt);
+ }
+
+ hash_destroy(ri_fastpath_cache);
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static bool ri_fastpath_xact_callback_registered = false;
+
+static void
+ri_FastPathXactCallback(XactEvent event, void *arg)
+{
+ /*
+ * On abort, ResourceOwner already released relations; on commit,
+ * ri_FastPathTeardown already ran. Either way, just NULL the static
+ * pointers so they don't dangle into the next transaction.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static void
+ri_FastPathSubXactCallback(SubXactEvent event, SubTransactionId mySubid,
+ SubTransactionId parentSubid, void *arg)
+{
+ if (event == SUBXACT_EVENT_ABORT_SUB)
+ {
+ /*
+ * ResourceOwner already released relations. NULL the static pointers
+ * so the still-registered batch callback becomes a no-op for the rest
+ * of this transaction.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+ }
+}
+
+/*
+ * ri_FastPathGetEntry
+ * Look up or create a per-batch cache entry for the given constraint.
+ *
+ * On first call for a constraint within a batch: opens pk_rel and the index,
+ * allocates slots for both the FK row and the looked-up PK row, and registers
+ * the cleanup callback.
+ *
+ * On subsequent calls: returns the existing entry.
+ */
+static RI_FastPathEntry *
+ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+ RI_FastPathEntry *entry;
+ bool found;
+
+ /* Create hash table on first use in this batch */
+ if (ri_fastpath_cache == NULL)
+ {
+ HASHCTL ctl;
+
+ if (!ri_fastpath_xact_callback_registered)
+ {
+ RegisterXactCallback(ri_FastPathXactCallback, NULL);
+ RegisterSubXactCallback(ri_FastPathSubXactCallback, NULL);
+ ri_fastpath_xact_callback_registered = true;
+ }
+
+ ctl.keysize = sizeof(Oid);
+ ctl.entrysize = sizeof(RI_FastPathEntry);
+ ctl.hcxt = TopTransactionContext;
+ ri_fastpath_cache = hash_create("RI fast-path cache",
+ 16,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ }
+
+ entry = hash_search(ri_fastpath_cache, &riinfo->constraint_id,
+ HASH_ENTER, &found);
+
+ if (!found)
+ {
+ MemoryContext oldcxt;
+
+ /*
+ * Zero out non-key fields so ri_FastPathTeardown is safe if we error
+ * out during partial initialization below.
+ */
+ memset(((char *) entry) + offsetof(RI_FastPathEntry, pk_rel), 0,
+ sizeof(RI_FastPathEntry) - offsetof(RI_FastPathEntry, pk_rel));
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+
+ entry->fk_relid = RelationGetRelid(fk_rel);
+
+ /*
+ * Open PK table and its unique index.
+ *
+ * RowShareLock on pk_rel matches what the SPI path's SELECT ... FOR
+ * KEY SHARE would acquire as a relation-level lock. AccessShareLock
+ * on the index is standard for index scans.
+ *
+ * We don't release these locks until end of transaction, matching SPI
+ * behavior.
+ */
+ entry->pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ entry->idx_rel = index_open(riinfo->conindid, AccessShareLock);
+ entry->pk_slot = table_slot_create(entry->pk_rel, NULL);
+
+ /*
+ * Must be TTSOpsHeapTuple because ExecStoreHeapTuple() is used to
+ * load entries from batch[] into this slot for value extraction.
+ */
+ entry->fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
+ &TTSOpsHeapTuple);
+
+ entry->flush_cxt = AllocSetContextCreate(TopTransactionContext,
+ "RI fast path flush temporary context",
+ ALLOCSET_SMALL_SIZES);
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Ensure cleanup at end of this trigger-firing batch */
+ if (!ri_fastpath_callback_registered)
+ {
+ RegisterAfterTriggerBatchCallback(ri_FastPathEndBatch, NULL);
+ ri_fastpath_callback_registered = true;
+ }
+ }
+
+ return entry;
+}
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 27af5284406..7664298f5c8 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -289,4 +289,22 @@ extern void RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel,
extern int RI_FKey_trigger_type(Oid tgfoid);
+/*
+ * Callback type for end-of-trigger-batch notifications.
+ *
+ * Registered via RegisterAfterTriggerBatchCallback(). Invoked when
+ * a batch of after-trigger processing completes:
+ * - AfterTriggerEndQuery() (immediate constraints)
+ * - AfterTriggerFireDeferred() (deferred constraints at COMMIT)
+ * - AfterTriggerSetState() (SET CONSTRAINTS IMMEDIATE)
+ *
+ * The callback list is cleared after each batch. Callers must
+ * re-register if they need to be called again in a subsequent batch.
+ */
+typedef void (*AfterTriggerBatchCallback) (void *arg);
+
+extern void RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg);
+extern bool AfterTriggerBatchIsActive(void);
+
#endif /* TRIGGER_H */
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 6c607d36222..91295754bab 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3557,3 +3557,129 @@ DETAIL: drop cascades to table fkpart13_t1
drop cascades to table fkpart13_t2
drop cascades to table fkpart13_t3
RESET search_path;
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+ERROR: insert or update on table "fp_fk_alter" violates foreign key constraint "fp_fk_alter_a_fkey"
+DETAIL: Key (a)=(101) is not present in table "fp_pk_alter".
+DROP TABLE fp_fk_alter, fp_pk_alter;
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+ERROR: insert or update on table "fp_fk_alter2" violates foreign key constraint "fp_fk_alter2_a_fkey"
+DETAIL: Key (a)=(200) is not present in table "fp_pk_alter2".
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+ERROR: insert or update on table "fp_multi_fk" violates foreign key constraint "fp_multi_fk_b_fkey"
+DETAIL: Key (b)=(2) is not present in table "fp_pk2".
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+ERROR: insert or update on table "fp_fk_defer" violates foreign key constraint "fp_fk_defer_a_fkey"
+DETAIL: Key (a)=(3) is not present in table "fp_pk_defer".
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+ a
+---
+ 1
+ 1
+(2 rows)
+
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+ERROR: insert or update on table "fp_fk_multi" violates foreign key constraint "fp_fk_multi_a_b_fkey"
+DETAIL: Key (a, b)=(999, 999) is not present in table "fp_pk_multi".
+DROP TABLE fp_fk_multi, fp_pk_multi;
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+ERROR: insert or update on table "fp_fk_commit" violates foreign key constraint "fp_fk_commit_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_commit".
+DROP TABLE fp_fk_commit, fp_pk_commit;
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+ERROR: insert or update on table "fp_fk_cross" violates foreign key constraint "fp_fk_cross_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_cross".
+DROP TABLE fp_fk_cross, fp_pk_cross;
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index fcdd006c971..f646dd10401 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2535,3 +2535,121 @@ WITH cte AS (
DROP SCHEMA fkpart13 CASCADE;
RESET search_path;
+
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+DROP TABLE fp_fk_alter, fp_pk_alter;
+
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
+
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+DROP TABLE fp_fk_multi, fp_pk_multi;
+
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+DROP TABLE fp_fk_commit, fp_pk_commit;
+
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+DROP TABLE fp_fk_cross, fp_pk_cross;
+
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8e9c06547d6..c0b9e51e335 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -30,6 +30,8 @@ AddForeignUpdateTargets_function
AddrInfo
AffixNode
AffixNodeData
+AfterTriggerBatchCallback
+AfterTriggerCallbackItem
AfterTriggerEvent
AfterTriggerEventChunk
AfterTriggerEventData
@@ -2485,6 +2487,7 @@ RIX
RI_CompareHashEntry
RI_CompareKey
RI_ConstraintInfo
+RI_FastPathEntry
RI_QueryHashEntry
RI_QueryKey
RTEKind
--
2.47.3
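For readers following the trigger.h comment in the patch above, the batch callback contract (callbacks run once at the end of a batch, the list is then cleared, and nested query levels do not fire it) can be modelled in a few lines of standalone C. This is only a sketch under stated assumptions, not the patch's code: register_batch_callback(), fire_batch_callbacks(), the global query_depth, and count_flush() are hypothetical stand-ins, and malloc()/free() replace the palloc() in TopTransactionContext that the patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for AfterTriggerBatchCallback. */
typedef void (*batch_callback) (void *arg);

typedef struct callback_item
{
    batch_callback callback;
    void       *arg;
    struct callback_item *next;
} callback_item;

/* Toy model of the relevant trigger state (assumed, not the real struct). */
static callback_item *callbacks_head = NULL;
static callback_item **callbacks_tail = &callbacks_head;
static int  query_depth = -1;   /* -1: no query in progress */

/* Append a callback; it stays registered only until the next fire. */
static void
register_batch_callback(batch_callback callback, void *arg)
{
    callback_item *item = malloc(sizeof(callback_item));

    assert(item != NULL);
    item->callback = callback;
    item->arg = arg;
    item->next = NULL;
    *callbacks_tail = item;
    callbacks_tail = &item->next;
}

/*
 * Invoke callbacks in registration order, then forget them.  Nested
 * levels (depth > 0) return early so the outer batch keeps its callbacks.
 */
static void
fire_batch_callbacks(void)
{
    callback_item *item;

    if (query_depth > 0)
        return;

    /* Detach the list first so the loop never touches the global list. */
    item = callbacks_head;
    callbacks_head = NULL;
    callbacks_tail = &callbacks_head;

    while (item != NULL)
    {
        callback_item *next = item->next;

        item->callback(item->arg);
        free(item);
        item = next;
    }
}

/* Example callback: counts how many times it ran. */
static void
count_flush(void *arg)
{
    (*(int *) arg)++;
}
```

With a counter registered at depth 0, firing at depth 1 leaves it untouched, firing at depth 0 bumps it once, and a second fire does nothing because the list was cleared. That is why a caller has to re-register for every new batch.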
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Junwang Zhao @ 2026-04-01 11:56 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Chao Li <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
On Wed, Apr 1, 2026 at 5:51 PM Amit Langote <[email protected]> wrote:
>
> On Wed, Apr 1, 2026 at 5:51 PM Amit Langote <[email protected]> wrote:
> > On Wed, Apr 1, 2026 at 12:54 AM Junwang Zhao <[email protected]> wrote:
> > > + if (riinfo->fpmeta == NULL)
> > > + {
> > > + /* Reload to ensure it's valid. */
> > > + riinfo = ri_LoadConstraintInfo(riinfo->constraint_id);
> > >
> > > I was thinking of wrapping the reload in a conditional check like
> > > `!riinfo->valid`, since `riinfo` can be valid even when `fpmeta == NULL`.
> > > However, `if (riinfo->fpmeta == NULL)` should rarely be true, so the
> > > unconditional reload is harmless, and the code is cleaner.
> > >
> > > +1 to the fix.
> >
> > Thanks for checking.
> >
> > I have just pushed a slightly modified version of that.
> >
> > > > 0002 is the rebased batching patch.
> > >
> > > The change of RI_FastPathEntry from storing riinfo to fk_relid
> > > makes sense to me. I'll do another review on 0002 tomorrow.
> >
> > Here's another version.
> >
> > This time, I have another fixup patch (0001) to make FastPathMeta
> > self-contained by copying the FmgrInfo structs it needs out of
> > RI_CompareHashEntry rather than storing pointers into it. This avoids
> > any dependency on those cache entries remaining stable. I'll push
> > that once the just committed patch has seen enough BF animals.
>
> Pushed.
>
> > 0002 is rebased over that.
>
> Rebased again.
+static void
+ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
+{
+ /* Reload; may have been invalidated since last batch accumulation. */
+ const RI_ConstraintInfo *riinfo = ri_LoadConstraintInfo(fpentry->conoid);
...
+ if (riinfo->fpmeta == NULL)
+ {
+ /* Reload to ensure it's valid. */
+ riinfo = ri_LoadConstraintInfo(riinfo->constraint_id);
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ }
ri_LoadConstraintInfo is currently invoked twice within
ri_FastPathBatchFlush. Should we eliminate the second call?
Alternatively, we could refactor ri_FastPathBatchFlush to accept
an additional parameter, `const RI_ConstraintInfo *riinfo`, so we
can remove the need for the first call. In that case, we need to call
ri_LoadConstraintInfo in ri_FastPathEndBatch.
>
> --
> Thanks, Amit Langote
--
Regards
Junwang Zhao
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Amit Langote @ 2026-04-01 12:18 UTC (permalink / raw)
To: Junwang Zhao <[email protected]>; +Cc: Chao Li <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
On Wed, Apr 1, 2026 at 8:56 PM Junwang Zhao <[email protected]> wrote:
> On Wed, Apr 1, 2026 at 5:51 PM Amit Langote <[email protected]> wrote:
> >
> > On Wed, Apr 1, 2026 at 5:51 PM Amit Langote <[email protected]> wrote:
> > > On Wed, Apr 1, 2026 at 12:54 AM Junwang Zhao <[email protected]> wrote:
> > > > + if (riinfo->fpmeta == NULL)
> > > > + {
> > > > + /* Reload to ensure it's valid. */
> > > > + riinfo = ri_LoadConstraintInfo(riinfo->constraint_id);
> > > >
> > > > I was thinking of wrapping the reload in a conditional check like
> > > > `!riinfo->valid`, since `riinfo` can be valid even when `fpmeta == NULL`.
> > > > However, `if (riinfo->fpmeta == NULL)` should rarely be true, so the
> > > > unconditional reload is harmless, and the code is cleaner.
> > > >
> > > > +1 to the fix.
> > >
> > > Thanks for checking.
> > >
> > > I have just pushed a slightly modified version of that.
> > >
> > > > > 0002 is the rebased batching patch.
> > > >
> > > > The change of RI_FastPathEntry from storing riinfo to fk_relid
> > > > makes sense to me. I'll do another review on 0002 tomorrow.
> > >
> > > Here's another version.
> > >
> > > This time, I have another fixup patch (0001) to make FastPathMeta
> > > self-contained by copying the FmgrInfo structs it needs out of
> > > RI_CompareHashEntry rather than storing pointers into it. This avoids
> > > any dependency on those cache entries remaining stable. I'll push
> > > that once the just committed patch has seen enough BF animals.
> >
> > Pushed.
> >
> > > 0002 is rebased over that.
> >
> > Rebased again.
>
> +static void
> +ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
> +{
> + /* Reload; may have been invalidated since last batch accumulation. */
> + const RI_ConstraintInfo *riinfo = ri_LoadConstraintInfo(fpentry->conoid);
>
> ...
> + if (riinfo->fpmeta == NULL)
> + {
> + /* Reload to ensure it's valid. */
> + riinfo = ri_LoadConstraintInfo(riinfo->constraint_id);
> + ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
> + fk_rel, idx_rel);
> + }
>
> ri_LoadConstraintInfo is currently invoked twice within
> ri_FastPathBatchFlush. Should we eliminate the second call?
I think we can't, because the entry may be stale by the time we get to
the ri_populate_fastpath_metadata() call due to intervening steps;
even something as benign-looking as index_beginscan() may reach code
paths that trigger invalidation in rare cases, and perhaps predictably
so in CLOBBER_CACHE_ALWAYS builds.
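To make the hazard concrete, here is a small standalone sketch of the "reload before populating" pattern. It is a toy model, not the ri_triggers.c code: toy_constraint_info, toy_load(), toy_invalidate(), and toy_flush_prepare() are hypothetical names, and it assumes that rebuilding an invalidated entry also drops its derived metadata.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy cache entry; the real RI_ConstraintInfo has many more fields. */
typedef struct toy_constraint_info
{
    bool        valid;
    int         generation;     /* bumped on each rebuild */
    const void *fpmeta;         /* derived metadata, NULL until populated */
} toy_constraint_info;

static toy_constraint_info cache_entry = {false, 0, NULL};
static const int toy_meta = 42;

/* Return the cache entry, rebuilding it first if it was invalidated. */
static toy_constraint_info *
toy_load(void)
{
    if (!cache_entry.valid)
    {
        cache_entry.generation++;
        cache_entry.fpmeta = NULL;  /* assumption: a rebuild drops metadata */
        cache_entry.valid = true;
    }
    return &cache_entry;
}

/* Stand-in for an invalidation message arriving. */
static void
toy_invalidate(void)
{
    cache_entry.valid = false;
}

/*
 * Model of the start of a flush: load, do some intervening work that may
 * invalidate the entry, then reload before attaching derived metadata.
 */
static toy_constraint_info *
toy_flush_prepare(bool invalidate_midway)
{
    toy_constraint_info *info = toy_load();

    if (invalidate_midway)
        toy_invalidate();       /* e.g. triggered while opening a scan */

    if (info->fpmeta == NULL)
    {
        info = toy_load();      /* reload: the entry may have gone stale */
        info->fpmeta = &toy_meta;
    }

    assert(info->fpmeta != NULL);
    return info;
}
```

Without the second toy_load(), the metadata would be attached to an entry that is still marked invalid and would be discarded by the next rebuild. With it, the entry is valid again when the metadata is populated.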
> Alternatively, we could refactor ri_FastPathBatchFlush to accept
> an additional parameter, `const RI_ConstraintInfo *riinfo`, so we
> can remove the need for the first call. In that case, we need to call
> ri_LoadConstraintInfo in ri_FastPathEndBatch.
Yeah, I think that's fine. Done that way in the attached.
Also, I realized that we could do:
@@ -2937,7 +2937,7 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
fk_rel, idx_rel);
}
Assert(riinfo->fpmeta);
- if (riinfo->nkeys == 1)
+ if (riinfo->nkeys == 1 && fpentry->batch_count > 1)
violation_index = ri_FastPathFlushArray(fpentry, fk_slot, riinfo,
fk_rel, snapshot, scandesc);
so that the fixed overhead of ri_FastPathFlushArray() (allocating the
matched[] array on the stack, constructing the ArrayType, etc.) is not
paid unnecessarily for single-row batches.
The attached patch is updated that way.
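As an illustration of the trade-off, the two flush strategies can be sketched in standalone C, with a sorted int array standing in for the PK index. This is a simplified model under stated assumptions, not the patch's implementation: toy_flush(), toy_flush_array(), toy_flush_loop(), and TOY_BATCH_SIZE are hypothetical names, the sort and dedup that the index AM does internally are done by hand here, and only the single-column case is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BATCH_SIZE 64

static int
cmp_int(const void *a, const void *b)
{
    int         x = *(const int *) a;
    int         y = *(const int *) b;

    return (x > y) - (x < y);
}

/* Per-row probing: one independent lookup per buffered row. */
static int
toy_flush_loop(const int *batch, int n, const int *pk, size_t npk)
{
    for (int i = 0; i < n; i++)
        if (bsearch(&batch[i], pk, npk, sizeof(int), cmp_int) == NULL)
            return i;           /* index of first violating row */
    return -1;
}

/* Array probing: sort and dedup the keys, then one ordered pass over pk. */
static int
toy_flush_array(const int *batch, int n, const int *pk, size_t npk)
{
    int         keys[TOY_BATCH_SIZE];
    bool        matched[TOY_BATCH_SIZE] = {false};
    int         nkeys = 0;
    size_t      p = 0;

    assert(n > 0 && n <= TOY_BATCH_SIZE);
    memcpy(keys, batch, n * sizeof(int));
    qsort(keys, n, sizeof(int), cmp_int);
    for (int i = 0; i < n; i++)
        if (nkeys == 0 || keys[nkeys - 1] != keys[i])
            keys[nkeys++] = keys[i];

    for (int k = 0; k < nkeys; k++)
    {
        while (p < npk && pk[p] < keys[k])
            p++;
        if (p < npk && pk[p] == keys[k])
        {
            /* Mark every batch row carrying this key, duplicates included. */
            for (int i = 0; i < n; i++)
                if (batch[i] == keys[k])
                    matched[i] = true;
        }
    }

    for (int i = 0; i < n; i++)
        if (!matched[i])
            return i;           /* index of first violating row */
    return -1;
}

/* Dispatch: skip the array path's fixed setup cost for one-row batches. */
static int
toy_flush(const int *batch, int n, const int *pk, size_t npk)
{
    if (n > 1)
        return toy_flush_array(batch, n, pk, npk);
    return toy_flush_loop(batch, n, pk, npk);
}
```

Both paths report the same first violating row. The array path pays for the copy, sort, and matched[] bookkeeping up front, which only makes sense when more than one row shares the ordered pass.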
I will continue looking at this tomorrow morning with the aim of
committing it by EOD or Friday.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v16-0001-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch (43.7K, 2-v16-0001-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch)
From 257f55f20e8a196235da33b2d4b581eec2dddd27 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 31 Mar 2026 18:22:23 +0900
Subject: [PATCH v16] Batch FK rows and use SK_SEARCHARRAY for fast-path FK
probes
Instead of probing the PK index on each trigger invocation, buffer
FK rows in a new per-constraint cache entry (RI_FastPathEntry) and
flush them as a batch. Combined with the fast path introduced in
the previous commit, bulk FK inserts are ~2.9x faster (int PK /
int FK, 1M rows, PK table and index cached in memory).
On each trigger invocation, the new ri_FastPathBatchAdd() buffers
the FK row in RI_FastPathEntry. When the buffer fills (64 rows)
or the trigger-firing cycle ends, the new ri_FastPathBatchFlush()
probes the index for all buffered rows, sharing a single
CommandCounterIncrement, snapshot, permission check, and security
context switch across the batch, rather than repeating each per row
as the SPI path does. Per-flush CCI is safe because all AFTER
triggers for the buffered rows have already fired by flush time.
For single-column foreign keys, the flush builds an ArrayType from
the buffered FK values (casting to the PK-side type if needed) and
constructs a scan key with the SK_SEARCHARRAY flag. The index AM
sorts and deduplicates the array internally, then walks matching
leaf pages in one ordered traversal instead of descending from the
root once per row. A matched[] bitmap tracks which batch items
were satisfied; the first unmatched item is reported as a
violation. Multi-column foreign keys fall back to per-row probing
via the new ri_FastPathFlushLoop().
FK tuples are materialized via ExecCopySlotHeapTuple() into a new
purpose-specific memory context (flush_cxt), child of
TopTransactionContext, which is also used for per-flush transient
work: cast results, the search array, and index scan allocations.
It is reset after each flush and deleted in teardown.
The PK relation, index, tuple slots, and fast-path metadata are
cached in RI_FastPathEntry across trigger invocations within a
trigger-firing batch, avoiding repeated open/close overhead. The
snapshot and IndexScanDesc are taken fresh per flush.
ri_FastPathEndBatch() flushes any partial batch before tearing
down cached resources. Since the FK relation may already be
closed by flush time (e.g. for deferred constraints at COMMIT),
it reopens the relation using entry->fk_relid if needed.
The existing ALTER TABLE validation path bypasses batching and
continues to call ri_FastPathCheck() directly per row.
Lifecycle management for RI_FastPathEntry relies on three new
mechanisms:
- AfterTriggerBatchCallback: A new general-purpose callback
mechanism in trigger.c. Callbacks registered via
RegisterAfterTriggerBatchCallback() fire at the end of each
trigger-firing batch (AfterTriggerEndQuery for immediate
constraints, AfterTriggerFireDeferred at COMMIT, and
AfterTriggerSetState for SET CONSTRAINTS IMMEDIATE). The RI
code registers ri_FastPathEndBatch as a batch callback.
- Batch callbacks only fire at the outermost query level
(checked inside FireAfterTriggerBatchCallbacks), so nested
queries from SPI inside other AFTER triggers do not tear down
the cache mid-batch.
- XactCallback: ri_FastPathXactCallback NULLs the static cache
pointer at transaction end, handling the abort path where the
batch callback never fired.
- SubXactCallback: ri_FastPathSubXactCallback NULLs the static
cache pointer on subtransaction abort, preventing the batch
callback from accessing already-released resources.
- AfterTriggerBatchIsActive(): A new exported accessor that
returns true when afterTriggers.query_depth >= 0. During
ALTER TABLE ... ADD FOREIGN KEY validation, RI triggers are
called directly outside the after-trigger framework, so batch
callbacks would never fire. The fast-path code uses this to
fall back to the non-cached per-invocation path in that
context.
Author: Amit Langote <[email protected]>
Co-authored-by: Junwang Zhao <[email protected]>
Reviewed-by: Haibo Yan <[email protected]>
Tested-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
---
src/backend/commands/trigger.c | 105 ++++
src/backend/utils/adt/ri_triggers.c | 597 +++++++++++++++++++++-
src/include/commands/trigger.h | 18 +
src/test/regress/expected/foreign_key.out | 126 +++++
src/test/regress/sql/foreign_key.sql | 118 +++++
src/tools/pgindent/typedefs.list | 3 +
6 files changed, 963 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 6596843a8d8..4bc31cabff2 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3893,6 +3893,8 @@ typedef struct AfterTriggersData
/* per-subtransaction-level data: */
AfterTriggersTransData *trans_stack; /* array of structs shown below */
int maxtransdepth; /* allocated len of above array */
+
+ List *batch_callbacks; /* List of AfterTriggerCallbackItem */
} AfterTriggersData;
struct AfterTriggersQueryData
@@ -3929,6 +3931,13 @@ struct AfterTriggersTableData
TupleTableSlot *storeslot; /* for converting to tuplestore's format */
};
+/* Entry in afterTriggers.batch_callbacks */
+typedef struct AfterTriggerCallbackItem
+{
+ AfterTriggerBatchCallback callback;
+ void *arg;
+} AfterTriggerCallbackItem;
+
static AfterTriggersData afterTriggers;
static void AfterTriggerExecute(EState *estate,
@@ -3964,6 +3973,7 @@ static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
Oid tgoid, bool tgisdeferred);
static void cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent);
+static void FireAfterTriggerBatchCallbacks(void);
/*
* Get the FDW tuplestore for the current trigger query level, creating it
@@ -5089,6 +5099,7 @@ AfterTriggerBeginXact(void)
*/
afterTriggers.firing_counter = (CommandId) 1; /* mustn't be 0 */
afterTriggers.query_depth = -1;
+ afterTriggers.batch_callbacks = NIL;
/*
* Verify that there is no leftover state remaining. If these assertions
@@ -5210,6 +5221,15 @@ AfterTriggerEndQuery(EState *estate)
break;
}
+ /*
+ * Fire batch callbacks before releasing query-level storage and before
+ * decrementing query_depth. Callbacks may do real work (index probes,
+ * error reporting) and rely on query_depth still reflecting the current
+ * batch level so that nested calls from SPI inside AFTER triggers are
+ * correctly suppressed by FireAfterTriggerBatchCallbacks's depth guard.
+ */
+ FireAfterTriggerBatchCallbacks();
+
/* Release query-level-local storage, including tuplestores if any */
AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
@@ -5317,6 +5337,9 @@ AfterTriggerFireDeferred(void)
break; /* all fired */
}
+ /* Flush any fast-path batches accumulated by the triggers just fired. */
+ FireAfterTriggerBatchCallbacks();
+
/*
* We don't bother freeing the event list, since it will go away anyway
* (and more efficiently than via pfree) in AfterTriggerEndXact.
@@ -6059,6 +6082,11 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
break; /* all fired */
}
+ /*
+ * Flush any fast-path batches accumulated by the triggers just fired.
+ */
+ FireAfterTriggerBatchCallbacks();
+
if (snapshot_set)
PopActiveSnapshot();
}
@@ -6755,3 +6783,80 @@ check_modified_virtual_generated(TupleDesc tupdesc, HeapTuple tuple)
return tuple;
}
+
+/*
+ * RegisterAfterTriggerBatchCallback
+ * Register a function to be called when the current trigger-firing
+ * batch completes.
+ *
+ * Must be called from within a trigger function's execution context
+ * (i.e., while afterTriggers state is active).
+ *
+ * The callback list is cleared after invocation, so the caller must
+ * re-register for each new batch if needed.
+ */
+void
+RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg)
+{
+ AfterTriggerCallbackItem *item;
+ MemoryContext oldcxt;
+
+ /*
+ * Allocate in TopTransactionContext so the item survives for the duration
+ * of the batch, which may span multiple trigger invocations.
+ *
+ * Must be called while afterTriggers is active (query_depth >= 0);
+ * callbacks registered outside a trigger-firing context would never fire.
+ */
+ Assert(afterTriggers.query_depth >= 0);
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ item = palloc(sizeof(AfterTriggerCallbackItem));
+ item->callback = callback;
+ item->arg = arg;
+ afterTriggers.batch_callbacks =
+ lappend(afterTriggers.batch_callbacks, item);
+ MemoryContextSwitchTo(oldcxt);
+}
+
+/*
+ * FireAfterTriggerBatchCallbacks
+ * Invoke and clear all registered batch callbacks.
+ *
+ * Only fires at the outermost query level (query_depth == 0) or from
+ * top-level operations (query_depth == -1, e.g. AfterTriggerFireDeferred
+ * at COMMIT). Nested queries from SPI inside AFTER triggers run at
+ * depth > 0 and must not tear down resources the outer batch still needs.
+ */
+static void
+FireAfterTriggerBatchCallbacks(void)
+{
+ ListCell *lc;
+
+ if (afterTriggers.query_depth > 0)
+ return;
+
+ foreach(lc, afterTriggers.batch_callbacks)
+ {
+ AfterTriggerCallbackItem *item = lfirst(lc);
+
+ item->callback(item->arg);
+ }
+
+ list_free_deep(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = NIL;
+}
+
+/*
+ * AfterTriggerBatchIsActive
+ * Returns true if we're inside a query-level trigger batch where
+ * registered batch callbacks will actually be invoked.
+ *
+ * This is false during validateForeignKeyConstraint(), which calls
+ * RI trigger functions directly outside the after-trigger framework.
+ */
+bool
+AfterTriggerBatchIsActive(void)
+{
+ return afterTriggers.query_depth >= 0;
+}
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 2de08da6539..78ba9c7cc34 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -199,6 +199,48 @@ typedef struct RI_CompareHashEntry
FmgrInfo cast_func_finfo; /* in case we must coerce input */
} RI_CompareHashEntry;
+/*
+ * Maximum number of FK rows buffered before flushing.
+ *
+ * Larger batches amortize per-flush overhead and let the SK_SEARCHARRAY
+ * path walk more leaf pages in a single sorted traversal. But each
+ * buffered row is a materialized HeapTuple in flush_cxt, and the matched[]
+ * scan in ri_FastPathFlushArray() is O(batch_size) per index match.
+ * Benchmarking showed little difference between 16 and 64, with 256
+ * consistently slower. 64 is a reasonable default.
+ */
+#define RI_FASTPATH_BATCH_SIZE 64
+
+/*
+ * RI_FastPathEntry
+ * Per-constraint cache of resources needed by ri_FastPathBatchFlush().
+ *
+ * One entry per constraint, keyed by pg_constraint OID. Created lazily
+ * by ri_FastPathGetEntry() on first use within a trigger-firing batch
+ * and torn down by ri_FastPathTeardown() at batch end.
+ *
+ * FK tuples are buffered in batch[] across trigger invocations and
+ * flushed when the buffer fills or the batch ends.
+ */
+typedef struct RI_FastPathEntry
+{
+ Oid conoid; /* hash key: pg_constraint OID */
+ Oid fk_relid; /* for ri_FastPathEndBatch() */
+ Relation pk_rel;
+ Relation idx_rel;
+ TupleTableSlot *pk_slot;
+ TupleTableSlot *fk_slot;
+ MemoryContext flush_cxt; /* short-lived context for per-flush work */
+
+ /*
+ * TODO: batch[] is HeapTuple[] because the AFTER trigger machinery
+ * currently passes tuples as HeapTuples. Once trigger infrastructure is
+ * slotified, this should use a slot array or whatever batched tuple
+ * storage abstraction exists at that point to be TAM-agnostic.
+ */
+ HeapTuple batch[RI_FASTPATH_BATCH_SIZE];
+ int batch_count;
+} RI_FastPathEntry;
/*
* Local data
@@ -208,6 +250,8 @@ static HTAB *ri_query_cache = NULL;
static HTAB *ri_compare_cache = NULL;
static dclist_head ri_constraint_cache_valid_list;
+static HTAB *ri_fastpath_cache = NULL;
+static bool ri_fastpath_callback_registered = false;
/*
* Local function prototypes
@@ -258,6 +302,16 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
bool detectNewRows, int expect_OK);
static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static int ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc);
+static int ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc);
+static void ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel,
+ const RI_ConstraintInfo *riinfo);
static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
IndexScanDesc scandesc, TupleTableSlot *slot,
Snapshot snapshot, const RI_ConstraintInfo *riinfo,
@@ -280,6 +334,10 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool is_restrict, bool partgone);
+static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel);
+static void ri_FastPathEndBatch(void *arg);
+static void ri_FastPathTeardown(void);
/*
@@ -390,12 +448,22 @@ RI_FKey_check(TriggerData *trigdata)
* lock. This is semantically equivalent to the SPI path below but avoids
* the per-row executor overhead.
*
- * ri_FastPathCheck() reports the violation itself (via ereport) if no
- * matching PK row is found, so it only returns on success.
+ * ri_FastPathBatchAdd() and ri_FastPathCheck() report the violation
+ * themselves if no matching PK row is found, so they only return on
+ * success.
*/
if (ri_fastpath_is_applicable(riinfo))
{
- ri_FastPathCheck(riinfo, fk_rel, newslot);
+ if (AfterTriggerBatchIsActive())
+ {
+ /* Batched path: buffer and probe in groups */
+ ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
+ }
+ else
+ {
+ /* ALTER TABLE validation: per-row, no cache */
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
+ }
return PointerGetDatum(NULL);
}
@@ -2690,10 +2758,14 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/*
* ri_FastPathCheck
- * Perform FK existence check via direct index probe, bypassing SPI.
+ * Perform per-row FK existence check via direct index probe,
+ * bypassing SPI.
*
* If no matching PK row exists, report the violation via ri_ReportViolation(),
* otherwise, the function returns normally.
+ *
+ * Note: This is only used by the ALTER TABLE validation path. Other paths use
+ * ri_FastPathBatchAdd().
*/
static void
ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
@@ -2761,6 +2833,329 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
table_close(pk_rel, NoLock);
}
+/*
+ * ri_FastPathBatchAdd
+ * Buffer a FK row for batched probing.
+ *
+ * Adds the row to the batch buffer. When the buffer is full, flushes all
+ * buffered rows by probing the PK index. Any violation is reported
+ * immediately during the flush via ri_ReportViolation (which does not return).
+ *
+ * Uses the per-batch cache (RI_FastPathEntry) to avoid per-row relation
+ * open/close, slot creation, etc.
+ *
+ * The batch is also flushed at end of trigger-firing cycle via
+ * ri_FastPathEndBatch().
+ */
+static void
+ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
+ MemoryContext oldcxt;
+
+ oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
+ fpentry->batch[fpentry->batch_count] =
+ ExecCopySlotHeapTuple(newslot);
+ fpentry->batch_count++;
+ MemoryContextSwitchTo(oldcxt);
+
+ if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
+ ri_FastPathBatchFlush(fpentry, fk_rel, riinfo);
+}
+
+/*
+ * ri_FastPathBatchFlush
+ * Flush all buffered FK rows by probing the PK index.
+ *
+ * Dispatches to ri_FastPathFlushArray() for single-column FKs
+ * (using SK_SEARCHARRAY) or ri_FastPathFlushLoop() for multi-column
+ * FKs (per-row probing). Violations are reported immediately via
+ * ri_ReportViolation(), which does not return.
+ */
+static void
+ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel,
+ const RI_ConstraintInfo *riinfo)
+{
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *fk_slot = fpentry->fk_slot;
+ Snapshot snapshot;
+ IndexScanDesc scandesc;
+ Oid saved_userid;
+ int saved_sec_context;
+ MemoryContext oldcxt;
+ int violation_index;
+
+ if (fpentry->batch_count == 0)
+ return;
+
+ /*
+ * CCI and security context switch are done once for the entire batch.
+ * Per-row CCI is unnecessary because by the time a flush runs, all AFTER
+ * triggers for the buffered rows have already fired (trigger invocations
+ * strictly alternate per row), so a single CCI advances past all their
+ * effects. Per-row security context switch is unnecessary because each
+ * row's probe runs entirely as the PK table owner, same as the SPI path
+ * -- the only difference is that the SPI path sets and restores the
+ * context per row whereas we do it once around the whole batch.
+ */
+ CommandCounterIncrement();
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+ /*
+ * build_index_scankeys() may palloc cast results for cross-type FKs. Use
+ * the entry's short-lived flush context so these don't accumulate across
+ * batches.
+ */
+ oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
+
+ scandesc = index_beginscan(pk_rel, idx_rel, snapshot, NULL,
+ riinfo->nkeys, 0, SO_NONE);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ /*
+ * Check that the current user has permission to access pk_rel. Done here
+ * rather than at entry creation so that permission changes between
+ * flushes are respected. The SPI path and ri_FastPathCheck() perform
+ * this check once per row; the batched path performs it once per
+ * flush.
+ */
+ ri_CheckPermissions(pk_rel);
+
+ if (riinfo->fpmeta == NULL)
+ {
+ /* Reload to ensure it's valid. */
+ riinfo = ri_LoadConstraintInfo(riinfo->constraint_id);
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ }
+ Assert(riinfo->fpmeta);
+
+ /* Skip array overhead for single-row batches. */
+ if (riinfo->nkeys == 1 && fpentry->batch_count > 1)
+ violation_index = ri_FastPathFlushArray(fpentry, fk_slot, riinfo,
+ fk_rel, snapshot, scandesc);
+ else
+ violation_index = ri_FastPathFlushLoop(fpentry, fk_slot, riinfo,
+ fk_rel, snapshot, scandesc);
+
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ UnregisterSnapshot(snapshot);
+ index_endscan(scandesc);
+
+ if (violation_index >= 0)
+ {
+ ExecStoreHeapTuple(fpentry->batch[violation_index], fk_slot, false);
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ fk_slot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
+
+ MemoryContextReset(fpentry->flush_cxt);
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Reset. */
+ fpentry->batch_count = 0;
+}
+
+/*
+ * ri_FastPathFlushLoop
+ * Multi-column fallback: probe the index once per buffered row.
+ *
+ * Used for composite foreign keys where SK_SEARCHARRAY does not
+ * apply, and also for single-row batches of single-column FKs where
+ * the array overhead is not worth it.
+ *
+ * Returns the index of the first violating row in the batch array, or -1 if
+ * all rows are valid.
+ */
+static int
+ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc)
+{
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found = true;
+
+ for (int i = 0; i < fpentry->batch_count; i++)
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, pk_slot,
+ snapshot, riinfo, skey, riinfo->nkeys);
+
+ /* Report first unmatched row */
+ if (!found)
+ return i;
+ }
+
+ /* All pass. */
+ return -1;
+}
+
+/*
+ * ri_FastPathFlushArray
+ * Single-column fast path using SK_SEARCHARRAY.
+ *
+ * Builds an array of FK values and does one index scan with
+ * SK_SEARCHARRAY. The index AM sorts and deduplicates the array
+ * internally, then walks matching leaf pages in order. Each
+ * matched PK tuple is locked and rechecked as before; a matched[]
+ * array of flags tracks which batch items were satisfied.
+ *
+ * Returns the index of the first violating row in the batch array, or -1 if
+ * all rows are valid.
+ */
+static int
+ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc)
+{
+ FastPathMeta *fpmeta = riinfo->fpmeta;
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Datum search_vals[RI_FASTPATH_BATCH_SIZE];
+ bool matched[RI_FASTPATH_BATCH_SIZE];
+ int nvals = fpentry->batch_count;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[1];
+ FmgrInfo *cast_func_finfo;
+ FmgrInfo *eq_opr_finfo;
+ Oid elem_type;
+ int16 elem_len;
+ bool elem_byval;
+ char elem_align;
+ ArrayType *arr;
+
+ Assert(fpmeta);
+
+ memset(matched, 0, nvals * sizeof(bool));
+
+ /*
+ * Extract FK values, casting to the operator's expected input type if
+ * needed (e.g. an int8 FK referencing a numeric PK is cast to numeric).
+ */
+ cast_func_finfo = &fpmeta->cast_func_finfo[0];
+ eq_opr_finfo = &fpmeta->eq_opr_finfo[0];
+ for (int i = 0; i < nvals; i++)
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+
+ /* Cast if needed (e.g. int8 FK -> numeric PK) */
+ if (OidIsValid(cast_func_finfo->fn_oid))
+ search_vals[i] = FunctionCall3(cast_func_finfo,
+ pk_vals[0],
+ Int32GetDatum(-1),
+ BoolGetDatum(false));
+ else
+ search_vals[i] = pk_vals[0];
+ }
+
+ /*
+ * Array element type must match the operator's right-hand input type,
+ * which is what the index comparison expects on the search side.
+ * ri_populate_fastpath_metadata() stores exactly this via
+ * get_op_opfamily_properties(), which returns the operator's right-hand
+ * type as the subtype for cross-type operators (e.g. int8 for int48eq)
+ * and the common type for same-type operators.
+ */
+ elem_type = fpmeta->subtypes[0];
+ Assert(OidIsValid(elem_type));
+ get_typlenbyvalalign(elem_type, &elem_len, &elem_byval, &elem_align);
+
+ arr = construct_array(search_vals, nvals,
+ elem_type, elem_len, elem_byval, elem_align);
+
+ /*
+ * Build scan key with SK_SEARCHARRAY. The index AM code will internally
+ * sort and deduplicate, then walk leaf pages in order.
+ */
+ ScanKeyEntryInitialize(&skey[0],
+ SK_SEARCHARRAY,
+ 1, /* attno */
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ PointerGetDatum(arr));
+
+ index_rescan(scandesc, skey, 1, NULL, 0);
+
+ /*
+ * Walk all matches. The index AM returns them in index order. For each
+ * match, find which batch item(s) it satisfies.
+ */
+ while (index_getnext_slot(scandesc, ForwardScanDirection, pk_slot))
+ {
+ Datum found_val;
+ bool found_null;
+ bool concurrently_updated;
+ ScanKeyData recheck_skey[1];
+
+ if (!ri_LockPKTuple(pk_rel, pk_slot, snapshot, &concurrently_updated))
+ continue;
+
+ /* Extract the PK value from the matched and locked tuple */
+ found_val = slot_getattr(pk_slot, riinfo->pk_attnums[0], &found_null);
+ Assert(!found_null);
+
+ if (concurrently_updated)
+ {
+ /*
+ * Build a single-key scankey for recheck. We need the actual PK
+ * value that was found, not the FK search value.
+ */
+ ScanKeyEntryInitialize(&recheck_skey[0], 0, 1,
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ found_val);
+ if (!recheck_matched_pk_tuple(idx_rel, recheck_skey, pk_slot))
+ continue;
+ }
+
+ /*
+ * Linear scan to mark all batch items matching this PK value.
+ * O(batch_size) per match, O(batch_size^2) worst case -- fine for the
+ * current batch size of 64.
+ */
+ for (int i = 0; i < nvals; i++)
+ {
+ if (!matched[i] &&
+ DatumGetBool(FunctionCall2Coll(eq_opr_finfo,
+ idx_rel->rd_indcollation[0],
+ found_val,
+ search_vals[i])))
+ matched[i] = true;
+ }
+ }
+
+ /* Report first unmatched row */
+ for (int i = 0; i < nvals; i++)
+ if (!matched[i])
+ return i;
+
+ /* All pass. */
+ return -1;
+}
+
/*
* ri_FastPathProbeOne
* Probe the PK index for one set of scan keys, lock the matching
@@ -3687,3 +4082,197 @@ RI_FKey_trigger_type(Oid tgfoid)
return RI_TRIGGER_NONE;
}
+
+/*
+ * ri_FastPathEndBatch
+ * Flush remaining rows and tear down cached state.
+ *
+ * Registered as an AfterTriggerBatchCallback. Note: the flush can
+ * do real work (CCI, security context switch, index probes) and can
+ * throw ERROR on a constraint violation. If that happens,
+ * ri_FastPathTeardown never runs; ResourceOwner + XactCallback
+ * handle resource cleanup on the abort path.
+ */
+static void
+ri_FastPathEndBatch(void *arg)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ /* Flush any partial batches -- can throw ERROR */
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->batch_count > 0)
+ {
+ Relation fk_rel = table_open(entry->fk_relid, AccessShareLock);
+ const RI_ConstraintInfo *riinfo = ri_LoadConstraintInfo(entry->conoid);
+
+ ri_FastPathBatchFlush(entry, fk_rel, riinfo);
+ table_close(fk_rel, NoLock);
+ }
+ }
+
+ /* Orderly teardown */
+ ri_FastPathTeardown();
+}
+
+/*
+ * ri_FastPathTeardown
+ * Tear down all cached fast-path state.
+ *
+ * Called from ri_FastPathEndBatch() after flushing any remaining rows.
+ */
+static void
+ri_FastPathTeardown(void)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->idx_rel)
+ index_close(entry->idx_rel, NoLock);
+ if (entry->pk_rel)
+ table_close(entry->pk_rel, NoLock);
+ if (entry->pk_slot)
+ ExecDropSingleTupleTableSlot(entry->pk_slot);
+ if (entry->fk_slot)
+ ExecDropSingleTupleTableSlot(entry->fk_slot);
+ if (entry->flush_cxt)
+ MemoryContextDelete(entry->flush_cxt);
+ }
+
+ hash_destroy(ri_fastpath_cache);
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static bool ri_fastpath_xact_callback_registered = false;
+
+static void
+ri_FastPathXactCallback(XactEvent event, void *arg)
+{
+ /*
+ * On abort, ResourceOwner already released relations; on commit,
+ * ri_FastPathTeardown already ran. Either way, just NULL the static
+ * pointers so they don't dangle into the next transaction.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static void
+ri_FastPathSubXactCallback(SubXactEvent event, SubTransactionId mySubid,
+ SubTransactionId parentSubid, void *arg)
+{
+ if (event == SUBXACT_EVENT_ABORT_SUB)
+ {
+ /*
+ * ResourceOwner already released relations. NULL the static pointers
+ * so the still-registered batch callback becomes a no-op for the rest
+ * of this transaction.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+ }
+}
+
+/*
+ * ri_FastPathGetEntry
+ * Look up or create a per-batch cache entry for the given constraint.
+ *
+ * On first call for a constraint within a batch: opens pk_rel and the index,
+ * allocates slots for both the FK row and the looked-up PK row, and
+ * registers the cleanup callback.
+ *
+ * On subsequent calls: returns the existing entry.
+ */
+static RI_FastPathEntry *
+ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+ RI_FastPathEntry *entry;
+ bool found;
+
+ /* Create hash table on first use in this batch */
+ if (ri_fastpath_cache == NULL)
+ {
+ HASHCTL ctl;
+
+ if (!ri_fastpath_xact_callback_registered)
+ {
+ RegisterXactCallback(ri_FastPathXactCallback, NULL);
+ RegisterSubXactCallback(ri_FastPathSubXactCallback, NULL);
+ ri_fastpath_xact_callback_registered = true;
+ }
+
+ ctl.keysize = sizeof(Oid);
+ ctl.entrysize = sizeof(RI_FastPathEntry);
+ ctl.hcxt = TopTransactionContext;
+ ri_fastpath_cache = hash_create("RI fast-path cache",
+ 16,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ }
+
+ entry = hash_search(ri_fastpath_cache, &riinfo->constraint_id,
+ HASH_ENTER, &found);
+
+ if (!found)
+ {
+ MemoryContext oldcxt;
+
+ /*
+ * Zero out non-key fields so ri_FastPathTeardown is safe if we error
+ * out during partial initialization below.
+ */
+ memset(((char *) entry) + offsetof(RI_FastPathEntry, pk_rel), 0,
+ sizeof(RI_FastPathEntry) - offsetof(RI_FastPathEntry, pk_rel));
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+
+ entry->fk_relid = RelationGetRelid(fk_rel);
+
+ /*
+ * Open PK table and its unique index.
+ *
+ * RowShareLock on pk_rel matches what the SPI path's SELECT ... FOR
+ * KEY SHARE would acquire as a relation-level lock. AccessShareLock
+ * on the index is standard for index scans.
+ *
+ * We don't release these locks until end of transaction, matching SPI
+ * behavior.
+ */
+ entry->pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ entry->idx_rel = index_open(riinfo->conindid, AccessShareLock);
+ entry->pk_slot = table_slot_create(entry->pk_rel, NULL);
+
+ /*
+ * Must be TTSOpsHeapTuple because ExecStoreHeapTuple() is used to
+ * load entries from batch[] into this slot for value extraction.
+ */
+ entry->fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
+ &TTSOpsHeapTuple);
+
+ entry->flush_cxt = AllocSetContextCreate(TopTransactionContext,
+ "RI fast path flush temporary context",
+ ALLOCSET_SMALL_SIZES);
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Ensure cleanup at end of this trigger-firing batch */
+ if (!ri_fastpath_callback_registered)
+ {
+ RegisterAfterTriggerBatchCallback(ri_FastPathEndBatch, NULL);
+ ri_fastpath_callback_registered = true;
+ }
+ }
+
+ return entry;
+}
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 27af5284406..7664298f5c8 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -289,4 +289,22 @@ extern void RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel,
extern int RI_FKey_trigger_type(Oid tgfoid);
+/*
+ * Callback type for end-of-trigger-batch notifications.
+ *
+ * Registered via RegisterAfterTriggerBatchCallback(). Invoked when
+ * a batch of after-trigger processing completes:
+ * - AfterTriggerEndQuery() (immediate constraints)
+ * - AfterTriggerFireDeferred() (deferred constraints at COMMIT)
+ * - AfterTriggerSetState() (SET CONSTRAINTS IMMEDIATE)
+ *
+ * The callback list is cleared after each batch. Callers must
+ * re-register if they need to be called again in a subsequent batch.
+ */
+typedef void (*AfterTriggerBatchCallback) (void *arg);
+
+extern void RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg);
+extern bool AfterTriggerBatchIsActive(void);
+
#endif /* TRIGGER_H */
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 6c607d36222..91295754bab 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3557,3 +3557,129 @@ DETAIL: drop cascades to table fkpart13_t1
drop cascades to table fkpart13_t2
drop cascades to table fkpart13_t3
RESET search_path;
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+ERROR: insert or update on table "fp_fk_alter" violates foreign key constraint "fp_fk_alter_a_fkey"
+DETAIL: Key (a)=(101) is not present in table "fp_pk_alter".
+DROP TABLE fp_fk_alter, fp_pk_alter;
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+ERROR: insert or update on table "fp_fk_alter2" violates foreign key constraint "fp_fk_alter2_a_fkey"
+DETAIL: Key (a)=(200) is not present in table "fp_pk_alter2".
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+ERROR: insert or update on table "fp_multi_fk" violates foreign key constraint "fp_multi_fk_b_fkey"
+DETAIL: Key (b)=(2) is not present in table "fp_pk2".
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+ERROR: insert or update on table "fp_fk_defer" violates foreign key constraint "fp_fk_defer_a_fkey"
+DETAIL: Key (a)=(3) is not present in table "fp_pk_defer".
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+ a
+---
+ 1
+ 1
+(2 rows)
+
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+ERROR: insert or update on table "fp_fk_multi" violates foreign key constraint "fp_fk_multi_a_b_fkey"
+DETAIL: Key (a, b)=(999, 999) is not present in table "fp_pk_multi".
+DROP TABLE fp_fk_multi, fp_pk_multi;
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+ERROR: insert or update on table "fp_fk_commit" violates foreign key constraint "fp_fk_commit_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_commit".
+DROP TABLE fp_fk_commit, fp_pk_commit;
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+ERROR: insert or update on table "fp_fk_cross" violates foreign key constraint "fp_fk_cross_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_cross".
+DROP TABLE fp_fk_cross, fp_pk_cross;
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index fcdd006c971..f646dd10401 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2535,3 +2535,121 @@ WITH cte AS (
DROP SCHEMA fkpart13 CASCADE;
RESET search_path;
+
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+DROP TABLE fp_fk_alter, fp_pk_alter;
+
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
+
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+DROP TABLE fp_fk_multi, fp_pk_multi;
+
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+DROP TABLE fp_fk_commit, fp_pk_commit;
+
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+DROP TABLE fp_fk_cross, fp_pk_cross;
+
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8e9c06547d6..c0b9e51e335 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -30,6 +30,8 @@ AddForeignUpdateTargets_function
AddrInfo
AffixNode
AffixNodeData
+AfterTriggerBatchCallback
+AfterTriggerCallbackItem
AfterTriggerEvent
AfterTriggerEventChunk
AfterTriggerEventData
@@ -2485,6 +2487,7 @@ RIX
RI_CompareHashEntry
RI_CompareKey
RI_ConstraintInfo
+RI_FastPathEntry
RI_QueryHashEntry
RI_QueryKey
RTEKind
--
2.47.3
* Re: Eliminating SPI / SQL from some RI triggers - take 3
2024-12-20 04:23 Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 04:07 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 05:10 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Pavel Stehule <[email protected]>
2025-10-22 13:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-12-01 06:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-02 15:30 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-10 12:28 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-18 15:34 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-19 16:19 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-24 11:47 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-24 13:56 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-25 00:41 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-30 04:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-30 11:15 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-31 09:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Chao Li <[email protected]>
2026-03-31 09:17 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-31 10:57 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-31 12:15 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-31 15:54 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-04-01 08:51 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-01 09:51 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-01 11:56 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-04-01 12:18 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
@ 2026-04-02 07:41 ` Amit Langote <[email protected]>
2026-04-02 07:59 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Chao Li <[email protected]>
2026-04-20 20:50 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Peter Eisentraut <[email protected]>
0 siblings, 2 replies; 61+ messages in thread
From: Amit Langote @ 2026-04-02 07:41 UTC (permalink / raw)
To: Junwang Zhao <[email protected]>; +Cc: Chao Li <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
On Wed, Apr 1, 2026 at 9:18 PM Amit Langote <[email protected]> wrote:
> On Wed, Apr 1, 2026 at 8:56 PM Junwang Zhao <[email protected]> wrote:
> > On Wed, Apr 1, 2026 at 5:51 PM Amit Langote <[email protected]> wrote:
> > >
> > > On Wed, Apr 1, 2026 at 5:51 PM Amit Langote <[email protected]> wrote:
> > > > On Wed, Apr 1, 2026 at 12:54 AM Junwang Zhao <[email protected]> wrote:
> > > > > + if (riinfo->fpmeta == NULL)
> > > > > + {
> > > > > + /* Reload to ensure it's valid. */
> > > > > + riinfo = ri_LoadConstraintInfo(riinfo->constraint_id);
> > > > >
> > > > > I was thinking of wrapping the reload in a conditional check like
> > > > > `!riinfo->valid`, since `riinfo` can be valid even when `fpmeta == NULL`.
> > > > > However, `if (riinfo->fpmeta == NULL)` should rarely be true, so the
> > > > > unconditional reload is harmless, and the code is cleaner.
> > > > >
> > > > > +1 to the fix.
> > > >
> > > > Thanks for checking.
> > > >
> > > > I have just pushed a slightly modified version of that.
> > > >
> > > > > > 0002 is the rebased batching patch.
> > > > >
> > > > > The change of RI_FastPathEntry from storing riinfo to fk_relid
> > > > > makes sense to me. I'll do another review on 0002 tomorrow.
> > > >
> > > > Here's another version.
> > > >
> > > > This time, I have another fixup patch (0001) to make FastPathMeta
> > > > self-contained by copying the FmgrInfo structs it needs out of
> > > > RI_CompareHashEntry rather than storing pointers into it. This avoids
> > > > any dependency on those cache entries remaining stable. I'll push
> > > > that once the just committed patch has seen enough BF animals.
> > >
> > > Pushed.
> > >
> > > > 0002 is rebased over that.
> > >
> > > Rebased again.
> >
> > +static void
> > +ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
> > +{
> > + /* Reload; may have been invalidated since last batch accumulation. */
> > + const RI_ConstraintInfo *riinfo = ri_LoadConstraintInfo(fpentry->conoid);
> >
> > ...
> > + if (riinfo->fpmeta == NULL)
> > + {
> > + /* Reload to ensure it's valid. */
> > + riinfo = ri_LoadConstraintInfo(riinfo->constraint_id);
> > + ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
> > + fk_rel, idx_rel);
> > + }
> >
> > ri_LoadConstraintInfo is currently invoked twice within
> > ri_FastPathBatchFlush. Should we eliminate the second call?
>
> I think we can't because the entry may be stale by the time we get to
> the ri_populate_fastpath_metadata() call due to intervening steps;
> even something as benign-looking as index_beginscan() may call code paths
> that can trigger invalidation in rare cases. Maybe predictably in
> CLOBBER_CACHE_ALWAYS builds.
>
> > Alternatively, we could refactor ri_FastPathBatchFlush to accept
> > an additional parameter, `const RI_ConstraintInfo *riinfo`, so we
> > can remove the need for the first call. In that case, we need to call
> > ri_LoadConstraintInfo in ri_FastPathEndBatch.
>
> Yeah, I think that's fine. Done that way in the attached.
>
> Also, I realized that we could do:
>
> @@ -2937,7 +2937,7 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry,
> Relation fk_rel)
> fk_rel, idx_rel);
> }
> Assert(riinfo->fpmeta);
> - if (riinfo->nkeys == 1)
> + if (riinfo->nkeys == 1 && fpentry->batch_count > 1)
> violation_index = ri_FastPathFlushArray(fpentry, fk_slot, riinfo,
> fk_rel, snapshot, scandesc);
>
> so that the fixed overhead of ri_FastPathFlushArray (allocating
> matched[] array on stack and constructing ArrayType, etc.) is not paid
> unnecessarily for single-row batches.
There's another case in which it is not ok to use FlushArray and that
is if the index AM's amsearcharray is false (should be true in all
cases because the unique index used for PK is always btree). Added an
Assert to that effect next to where SK_SEARCHARRAY is set in
ri_FastPathFlushArray rather than a runtime check in the dispatch
condition.
Patch updated. Also added a comment about the invalidation requirement,
or lack thereof, for RI_FastPathEntry, renamed AfterTriggerBatchIsActive()
to simply AfterTriggerIsActive(), and fixed the comments in trigger.h
describing the callback mechanism.
Will push tomorrow morning (Friday) barring objections.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v17-0001-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch (44.7K, 2-v17-0001-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch)
From 810eb44a1693ac836eefad0acb4a80523c60944d Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 2 Apr 2026 14:28:36 +0900
Subject: [PATCH v17] Batch FK rows and use SK_SEARCHARRAY for fast-path FK
probes
Instead of probing the PK index on each trigger invocation, buffer
FK rows in a new per-constraint cache entry (RI_FastPathEntry) and
flush them as a batch. Combined with the fast path introduced in
the previous commit (2da86c1ef9), this change produces a ~2.9x speedup
for bulk FK inserts (int PK / int FK, 1M rows, PK table and index
cached).
On each trigger invocation, the new ri_FastPathBatchAdd() buffers
the FK row in RI_FastPathEntry. When the buffer fills (64 rows)
or the trigger-firing cycle ends, the new ri_FastPathBatchFlush()
probes the index for all buffered rows, sharing a single
CommandCounterIncrement, snapshot, permission check, and security
context switch across the batch, rather than repeating each per row
as the SPI path does. Per-flush CCI is safe because all AFTER
triggers for the buffered rows have already fired by flush time.
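
The buffer-and-flush cycle described above can be pictured with a small standalone sketch. This is not code from the patch: `Batch`, `batch_add()`, `batch_flush()` and `simulate()` are hypothetical names, and the "probe" is reduced to counters so that only the control flow (flush when full, flush the tail at the end) is shown.

```c
#include <assert.h>

#define BATCH_SIZE 64			/* mirrors RI_FASTPATH_BATCH_SIZE */

/* Hypothetical stand-in for RI_FastPathEntry: a buffer plus counters. */
typedef struct Batch
{
	int			rows[BATCH_SIZE];
	int			count;			/* rows currently buffered */
	int			flushes;		/* times the per-flush overhead was paid */
	int			probed;			/* total rows handed to the "index probe" */
} Batch;

/* Probe all buffered rows at once; a no-op when the buffer is empty. */
static void
batch_flush(Batch *b)
{
	if (b->count == 0)
		return;
	b->flushes++;
	b->probed += b->count;
	b->count = 0;
}

/* Buffer one row, flushing as soon as the buffer is full. */
static void
batch_add(Batch *b, int row)
{
	assert(b->count < BATCH_SIZE);
	b->rows[b->count++] = row;
	if (b->count >= BATCH_SIZE)
		batch_flush(b);
}

/* Feed nrows rows, then flush the partial tail; returns the flush count. */
static int
simulate(int nrows)
{
	Batch		b = {0};

	for (int i = 0; i < nrows; i++)
		batch_add(&b, i);
	batch_flush(&b);			/* plays the role of the end-of-batch callback */
	assert(b.probed == nrows);
	return b.flushes;
}
```

With a buffer of 64, `simulate(200)` pays the per-flush cost four times (three full batches plus a tail of eight) instead of 200 times.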
For single-column foreign keys, the flush builds an ArrayType from
the buffered FK values (casting to the PK-side type if needed) and
constructs a scan key with the SK_SEARCHARRAY flag. The index AM
sorts and deduplicates the array internally, then walks matching
leaf pages in one ordered traversal instead of descending from the
root once per row. A matched[] bitmap tracks which batch items
were satisfied; the first unmatched item is reported as a
violation. Multi-column foreign keys fall back to per-row probing
via the new ri_FastPathFlushLoop().
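
As a rough illustration of the matched[] bookkeeping, here is a self-contained sketch that replaces the index with a sorted, unique int array. The names (`first_violation()`, `cmp_int()`) are made up for this example and the sort/dedup step only imitates what the index AM is described as doing internally; the real code works on Datums, locks each PK tuple, and handles rechecks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define BATCH_SIZE 64

static int
cmp_int(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

/*
 * pk[] must be sorted ascending and unique; it plays the role of the index.
 * Returns the position of the first value in batch[] with no match, or -1.
 */
static int
first_violation(const int *pk, int npk, const int *batch, int nvals)
{
	int			keys[BATCH_SIZE];
	bool		matched[BATCH_SIZE];
	int			nkeys = 0;
	int			p = 0;

	assert(nvals >= 0 && nvals <= BATCH_SIZE);
	memset(matched, 0, sizeof(matched));

	/* Sort and deduplicate the search values. */
	memcpy(keys, batch, nvals * sizeof(int));
	qsort(keys, nvals, sizeof(int), cmp_int);
	for (int i = 0; i < nvals; i++)
		if (nkeys == 0 || keys[nkeys - 1] != keys[i])
			keys[nkeys++] = keys[i];

	/* One ordered pass over the "index" serves every key. */
	for (int k = 0; k < nkeys; k++)
	{
		while (p < npk && pk[p] < keys[k])
			p++;
		if (p < npk && pk[p] == keys[k])
		{
			/* A single PK match may satisfy several duplicate FK rows. */
			for (int i = 0; i < nvals; i++)
				if (batch[i] == keys[k])
					matched[i] = true;
		}
	}

	/* Report the first unsatisfied batch item, in original batch order. */
	for (int i = 0; i < nvals; i++)
		if (!matched[i])
			return i;
	return -1;
}
```

The inner loop over the batch for each match is the O(batch_size)-per-match cost that the comment on RI_FASTPATH_BATCH_SIZE cites as one reason not to make the batch arbitrarily large.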
FK tuples are materialized via ExecCopySlotHeapTuple() into a new
purpose-specific memory context (flush_cxt), child of
TopTransactionContext, which is also used for per-flush transient
work: cast results, the search array, and index scan allocations.
It is reset after each flush and deleted in teardown.
The PK relation, index, tuple slots, and fast-path metadata are
cached in RI_FastPathEntry across trigger invocations within a
trigger-firing batch, avoiding repeated open/close overhead. The
snapshot and IndexScanDesc are taken fresh per flush. The entry is
not subject to cache invalidation: cached relations are held with
locks for the transaction duration, and the entry's lifetime is
bounded by the trigger-firing cycle.
ri_FastPathEndBatch() flushes any partial batch before tearing
down cached resources. Since the FK relation may already be
closed by flush time (e.g. for deferred constraints at COMMIT),
it reopens the relation using entry->fk_relid if needed.
The existing ALTER TABLE validation path bypasses batching and
continues to call ri_FastPathCheck() directly per row.
Lifecycle management for RI_FastPathEntry relies on the following new
mechanisms:
- AfterTriggerBatchCallback: A new general-purpose callback
mechanism in trigger.c. Callbacks registered via
RegisterAfterTriggerBatchCallback() fire at the end of each
trigger-firing batch (AfterTriggerEndQuery for immediate
constraints, AfterTriggerFireDeferred at COMMIT, and
AfterTriggerSetState for SET CONSTRAINTS IMMEDIATE). The RI
code registers ri_FastPathEndBatch as a batch callback.
- Batch callbacks only fire at the outermost query level
(checked inside FireAfterTriggerBatchCallbacks), so nested
queries from SPI inside other AFTER triggers do not tear down
the cache mid-batch.
- XactCallback: ri_FastPathXactCallback NULLs the static cache
pointer at transaction end, handling the abort path where the
batch callback never fired.
- SubXactCallback: ri_FastPathSubXactCallback NULLs the static
cache pointer on subtransaction abort, preventing the batch
callback from accessing already-released resources.
- AfterTriggerIsActive(): A new exported accessor that
returns true when afterTriggers.query_depth >= 0. During
ALTER TABLE ... ADD FOREIGN KEY validation, RI triggers are
called directly outside the after-trigger framework, so batch
callbacks would never fire. The fast-path code uses this to
fall back to the non-cached per-invocation path in that
context.
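
The depth guard on batch callbacks can be sketched outside PostgreSQL as follows. Everything here is a simplified assumption for illustration: a fixed-size array stands in for the List in afterTriggers, `query_depth` is a plain global, and `register_batch_callback()`, `fire_batch_callbacks()`, `count_call()` and `demo()` are hypothetical names rather than the functions in trigger.c.

```c
#include <assert.h>

#define MAX_CALLBACKS 8

typedef void (*BatchCallback) (void *arg);

typedef struct CallbackItem
{
	BatchCallback callback;
	void	   *arg;
} CallbackItem;

static CallbackItem callbacks[MAX_CALLBACKS];
static int	ncallbacks = 0;
static int	query_depth = -1;	/* -1 means no query is active */

/* Register a callback for the current batch; only valid inside a query. */
static void
register_batch_callback(BatchCallback callback, void *arg)
{
	assert(query_depth >= 0);
	assert(ncallbacks < MAX_CALLBACKS);
	callbacks[ncallbacks].callback = callback;
	callbacks[ncallbacks].arg = arg;
	ncallbacks++;
}

/* Invoke and clear all callbacks, but only at the outermost level. */
static void
fire_batch_callbacks(void)
{
	if (query_depth > 0)
		return;					/* nested query: leave the outer batch alone */
	for (int i = 0; i < ncallbacks; i++)
		callbacks[i].callback(callbacks[i].arg);
	ncallbacks = 0;
}

static void
count_call(void *arg)
{
	(*(int *) arg)++;
}

/* Returns the total call count; *after_nested is the count at depth 1. */
static int
demo(int *after_nested)
{
	int			calls = 0;

	query_depth = 0;
	register_batch_callback(count_call, &calls);

	query_depth = 1;			/* a nested query ends first */
	fire_batch_callbacks();		/* suppressed by the depth guard */
	*after_nested = calls;

	query_depth = 0;			/* the outer query ends */
	fire_batch_callbacks();		/* fires once and clears the list */
	fire_batch_callbacks();		/* list is empty: nothing happens */

	query_depth = -1;
	return calls;
}
```

Because the list is cleared after firing, a caller in this sketch has to register again for each new batch, which matches the re-registration requirement stated for RegisterAfterTriggerBatchCallback().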
Author: Amit Langote <[email protected]>
Co-authored-by: Junwang Zhao <[email protected]>
Reviewed-by: Haibo Yan <[email protected]>
Tested-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
---
src/backend/commands/trigger.c | 105 ++++
src/backend/utils/adt/ri_triggers.c | 608 +++++++++++++++++++++-
src/include/commands/trigger.h | 21 +
src/test/regress/expected/foreign_key.out | 126 +++++
src/test/regress/sql/foreign_key.sql | 118 +++++
src/tools/pgindent/typedefs.list | 3 +
6 files changed, 977 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 6596843a8d8..90e94fb8a5a 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3893,6 +3893,8 @@ typedef struct AfterTriggersData
/* per-subtransaction-level data: */
AfterTriggersTransData *trans_stack; /* array of structs shown below */
int maxtransdepth; /* allocated len of above array */
+
+ List *batch_callbacks; /* List of AfterTriggerCallbackItem */
} AfterTriggersData;
struct AfterTriggersQueryData
@@ -3929,6 +3931,13 @@ struct AfterTriggersTableData
TupleTableSlot *storeslot; /* for converting to tuplestore's format */
};
+/* Entry in afterTriggers.batch_callbacks */
+typedef struct AfterTriggerCallbackItem
+{
+ AfterTriggerBatchCallback callback;
+ void *arg;
+} AfterTriggerCallbackItem;
+
static AfterTriggersData afterTriggers;
static void AfterTriggerExecute(EState *estate,
@@ -3964,6 +3973,7 @@ static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
Oid tgoid, bool tgisdeferred);
static void cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent);
+static void FireAfterTriggerBatchCallbacks(void);
/*
* Get the FDW tuplestore for the current trigger query level, creating it
@@ -5089,6 +5099,7 @@ AfterTriggerBeginXact(void)
*/
afterTriggers.firing_counter = (CommandId) 1; /* mustn't be 0 */
afterTriggers.query_depth = -1;
+ afterTriggers.batch_callbacks = NIL;
/*
* Verify that there is no leftover state remaining. If these assertions
@@ -5210,6 +5221,15 @@ AfterTriggerEndQuery(EState *estate)
break;
}
+ /*
+ * Fire batch callbacks before releasing query-level storage and before
+ * decrementing query_depth. Callbacks may do real work (index probes,
+ * error reporting) and rely on query_depth still reflecting the current
+ * batch level so that nested calls from SPI inside AFTER triggers are
+ * correctly suppressed by FireAfterTriggerBatchCallbacks's depth guard.
+ */
+ FireAfterTriggerBatchCallbacks();
+
/* Release query-level-local storage, including tuplestores if any */
AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
@@ -5317,6 +5337,9 @@ AfterTriggerFireDeferred(void)
break; /* all fired */
}
+ /* Flush any fast-path batches accumulated by the triggers just fired. */
+ FireAfterTriggerBatchCallbacks();
+
/*
* We don't bother freeing the event list, since it will go away anyway
* (and more efficiently than via pfree) in AfterTriggerEndXact.
@@ -6059,6 +6082,11 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
break; /* all fired */
}
+ /*
+ * Flush any fast-path batches accumulated by the triggers just fired.
+ */
+ FireAfterTriggerBatchCallbacks();
+
if (snapshot_set)
PopActiveSnapshot();
}
@@ -6755,3 +6783,80 @@ check_modified_virtual_generated(TupleDesc tupdesc, HeapTuple tuple)
return tuple;
}
+
+/*
+ * RegisterAfterTriggerBatchCallback
+ * Register a function to be called when the current trigger-firing
+ * batch completes.
+ *
+ * Must be called from within a trigger function's execution context
+ * (i.e., while afterTriggers state is active).
+ *
+ * The callback list is cleared after invocation, so the caller must
+ * re-register for each new batch if needed.
+ */
+void
+RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg)
+{
+ AfterTriggerCallbackItem *item;
+ MemoryContext oldcxt;
+
+ /*
+ * Allocate in TopTransactionContext so the item survives for the duration
+ * of the batch, which may span multiple trigger invocations.
+ *
+ * Must be called while afterTriggers is active (query_depth >= 0);
+ * callbacks registered outside a trigger-firing context would never fire.
+ */
+ Assert(afterTriggers.query_depth >= 0);
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ item = palloc(sizeof(AfterTriggerCallbackItem));
+ item->callback = callback;
+ item->arg = arg;
+ afterTriggers.batch_callbacks =
+ lappend(afterTriggers.batch_callbacks, item);
+ MemoryContextSwitchTo(oldcxt);
+}
+
+/*
+ * FireAfterTriggerBatchCallbacks
+ * Invoke and clear all registered batch callbacks.
+ *
+ * Only fires at the outermost query level (query_depth == 0) or from
+ * top-level operations (query_depth == -1, e.g. AfterTriggerFireDeferred
+ * at COMMIT). Nested queries from SPI inside AFTER triggers run at
+ * depth > 0 and must not tear down resources the outer batch still needs.
+ */
+static void
+FireAfterTriggerBatchCallbacks(void)
+{
+ ListCell *lc;
+
+ if (afterTriggers.query_depth > 0)
+ return;
+
+ foreach(lc, afterTriggers.batch_callbacks)
+ {
+ AfterTriggerCallbackItem *item = lfirst(lc);
+
+ item->callback(item->arg);
+ }
+
+ list_free_deep(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = NIL;
+}
+
+/*
+ * AfterTriggerIsActive
+ * Returns true if we're inside the after-trigger framework where
+ * registered batch callbacks will actually be invoked.
+ *
+ * This is false during validateForeignKeyConstraint(), which calls
+ * RI trigger functions directly outside the after-trigger framework.
+ */
+bool
+AfterTriggerIsActive(void)
+{
+ return afterTriggers.query_depth >= 0;
+}
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 2de08da6539..e78fc24f48e 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,6 +23,7 @@
#include "postgres.h"
+#include "access/amapi.h"
#include "access/genam.h"
#include "access/htup_details.h"
#include "access/skey.h"
@@ -199,6 +200,55 @@ typedef struct RI_CompareHashEntry
FmgrInfo cast_func_finfo; /* in case we must coerce input */
} RI_CompareHashEntry;
+/*
+ * Maximum number of FK rows buffered before flushing.
+ *
+ * Larger batches amortize per-flush overhead and let the SK_SEARCHARRAY
+ * path walk more leaf pages in a single sorted traversal. But each
+ * buffered row is a materialized HeapTuple in flush_cxt, and the matched[]
+ * scan in ri_FastPathFlushArray() is O(batch_size) per index match.
+ * Benchmarking showed little difference between 16 and 64, with 256
+ * consistently slower. 64 is a reasonable default.
+ */
+#define RI_FASTPATH_BATCH_SIZE 64
+
+/*
+ * RI_FastPathEntry
+ * Per-constraint cache of resources needed by ri_FastPathBatchFlush().
+ *
+ * One entry per constraint, keyed by pg_constraint OID. Created lazily
+ * by ri_FastPathGetEntry() on first use within a trigger-firing batch
+ * and torn down by ri_FastPathTeardown() at batch end.
+ *
+ * FK tuples are buffered in batch[] across trigger invocations and
+ * flushed when the buffer fills or the batch ends.
+ *
+ * RI_FastPathEntry is not subject to cache invalidation. The cached
+ * relations are held open with locks for the transaction duration, preventing
+ * relcache invalidation. The entry itself is torn down at batch end
+ * by ri_FastPathEndBatch(); on abort, ResourceOwner releases the cached
+ * relations and the XactCallback/SubXactCallback NULL the static cache pointer
+ * to prevent any subsequent access.
+ */
+typedef struct RI_FastPathEntry
+{
+ Oid conoid; /* hash key: pg_constraint OID */
+ Oid fk_relid; /* for ri_FastPathEndBatch() */
+ Relation pk_rel;
+ Relation idx_rel;
+ TupleTableSlot *pk_slot;
+ TupleTableSlot *fk_slot;
+ MemoryContext flush_cxt; /* short-lived context for per-flush work */
+
+ /*
+ * TODO: batch[] is HeapTuple[] because the AFTER trigger machinery
+ * currently passes tuples as HeapTuples. Once trigger infrastructure is
+ * slotified, this should use a slot array or whatever batched tuple
+ * storage abstraction exists at that point to be TAM-agnostic.
+ */
+ HeapTuple batch[RI_FASTPATH_BATCH_SIZE];
+ int batch_count;
+} RI_FastPathEntry;
/*
* Local data
@@ -208,6 +258,8 @@ static HTAB *ri_query_cache = NULL;
static HTAB *ri_compare_cache = NULL;
static dclist_head ri_constraint_cache_valid_list;
+static HTAB *ri_fastpath_cache = NULL;
+static bool ri_fastpath_callback_registered = false;
/*
* Local function prototypes
@@ -258,6 +310,16 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
bool detectNewRows, int expect_OK);
static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot);
+static void ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static int ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc);
+static int ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc);
+static void ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel,
+ const RI_ConstraintInfo *riinfo);
static bool ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
IndexScanDesc scandesc, TupleTableSlot *slot,
Snapshot snapshot, const RI_ConstraintInfo *riinfo,
@@ -280,6 +342,10 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool is_restrict, bool partgone);
+static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel);
+static void ri_FastPathEndBatch(void *arg);
+static void ri_FastPathTeardown(void);
/*
@@ -390,12 +456,22 @@ RI_FKey_check(TriggerData *trigdata)
* lock. This is semantically equivalent to the SPI path below but avoids
* the per-row executor overhead.
*
- * ri_FastPathCheck() reports the violation itself (via ereport) if no
- * matching PK row is found, so it only returns on success.
+ * ri_FastPathBatchAdd() and ri_FastPathCheck() report the violation
+ * themselves if no matching PK row is found, so they only return on
+ * success.
*/
if (ri_fastpath_is_applicable(riinfo))
{
- ri_FastPathCheck(riinfo, fk_rel, newslot);
+ if (AfterTriggerIsActive())
+ {
+ /* Batched path: buffer and probe in groups */
+ ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
+ }
+ else
+ {
+ /* ALTER TABLE validation: per-row, no cache */
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
+ }
return PointerGetDatum(NULL);
}
@@ -2690,10 +2766,14 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/*
* ri_FastPathCheck
- * Perform FK existence check via direct index probe, bypassing SPI.
+ * Perform per-row FK existence check via direct index probe,
+ * bypassing SPI.
*
* If no matching PK row exists, report the violation via ri_ReportViolation(),
* otherwise, the function returns normally.
+ *
+ * Note: This is only used by the ALTER TABLE validation path. Other paths use
+ * ri_FastPathBatchAdd().
*/
static void
ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
@@ -2761,6 +2841,332 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
table_close(pk_rel, NoLock);
}
+/*
+ * ri_FastPathBatchAdd
+ * Buffer a FK row for batched probing.
+ *
+ * Adds the row to the batch buffer. When the buffer is full, flushes all
+ * buffered rows by probing the PK index. Any violation is reported
+ * immediately during the flush via ri_ReportViolation (which does not return).
+ *
+ * Uses the per-batch cache (RI_FastPathEntry) to avoid per-row relation
+ * open/close, slot creation, etc.
+ *
+ * The batch is also flushed at end of trigger-firing cycle via
+ * ri_FastPathEndBatch().
+ */
+static void
+ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
+ MemoryContext oldcxt;
+
+ oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
+ fpentry->batch[fpentry->batch_count] =
+ ExecCopySlotHeapTuple(newslot);
+ fpentry->batch_count++;
+ MemoryContextSwitchTo(oldcxt);
+
+ if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
+ ri_FastPathBatchFlush(fpentry, fk_rel, riinfo);
+}
+
+/*
+ * ri_FastPathBatchFlush
+ * Flush all buffered FK rows by probing the PK index.
+ *
+ * Dispatches to ri_FastPathFlushArray() for single-column FKs
+ * (using SK_SEARCHARRAY) or ri_FastPathFlushLoop() for multi-column
+ * FKs (per-row probing). Violations are reported immediately via
+ * ri_ReportViolation(), which does not return.
+ */
+static void
+ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel,
+ const RI_ConstraintInfo *riinfo)
+{
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *fk_slot = fpentry->fk_slot;
+ Snapshot snapshot;
+ IndexScanDesc scandesc;
+ Oid saved_userid;
+ int saved_sec_context;
+ MemoryContext oldcxt;
+ int violation_index;
+
+ if (fpentry->batch_count == 0)
+ return;
+
+ /*
+ * CCI and security context switch are done once for the entire batch.
+ * Per-row CCI is unnecessary because by the time a flush runs, all AFTER
+ * triggers for the buffered rows have already fired (trigger invocations
+ * strictly alternate per row), so a single CCI advances past all their
+ * effects. Per-row security context switch is unnecessary because each
+ * row's probe runs entirely as the PK table owner, same as the SPI path
+ * -- the only difference is that the SPI path sets and restores the
+ * context per row whereas we do it once around the whole batch.
+ */
+ CommandCounterIncrement();
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+
+ /*
+ * build_index_scankeys() may palloc cast results for cross-type FKs. Use
+ * the entry's short-lived flush context so these don't accumulate across
+ * batches.
+ */
+ oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
+
+ scandesc = index_beginscan(pk_rel, idx_rel, snapshot, NULL,
+ riinfo->nkeys, 0, SO_NONE);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ /*
+ * Check that the current user has permission to access pk_rel. Done here
+ * rather than at entry creation so that permission changes between
+ * flushes are respected, matching the per-row behavior of the SPI path,
+ * albeit checked once per flush rather than once per row, like in
+ * ri_FastPathCheck().
+ */
+ ri_CheckPermissions(pk_rel);
+
+ if (riinfo->fpmeta == NULL)
+ {
+ /* Reload to ensure it's valid. */
+ riinfo = ri_LoadConstraintInfo(riinfo->constraint_id);
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ }
+ Assert(riinfo->fpmeta);
+
+ /* Skip array overhead for single-row batches. */
+ if (riinfo->nkeys == 1 && fpentry->batch_count > 1)
+ violation_index = ri_FastPathFlushArray(fpentry, fk_slot, riinfo,
+ fk_rel, snapshot, scandesc);
+ else
+ violation_index = ri_FastPathFlushLoop(fpentry, fk_slot, riinfo,
+ fk_rel, snapshot, scandesc);
+
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ UnregisterSnapshot(snapshot);
+ index_endscan(scandesc);
+
+ if (violation_index >= 0)
+ {
+ ExecStoreHeapTuple(fpentry->batch[violation_index], fk_slot, false);
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ fk_slot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
+
+ MemoryContextReset(fpentry->flush_cxt);
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Reset. */
+ fpentry->batch_count = 0;
+}
+
+/*
+ * ri_FastPathFlushLoop
+ * Multi-column fallback: probe the index once per buffered row.
+ *
+ * Used for composite foreign keys where SK_SEARCHARRAY does not
+ * apply, and also for single-row batches of single-column FKs where
+ * the array overhead is not worth it.
+ *
+ * Returns the index of the first violating row in the batch array, or -1 if
+ * all rows are valid.
+ */
+static int
+ri_FastPathFlushLoop(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc)
+{
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found = true;
+
+ for (int i = 0; i < fpentry->batch_count; i++)
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+
+ found = ri_FastPathProbeOne(pk_rel, idx_rel, scandesc, pk_slot,
+ snapshot, riinfo, skey, riinfo->nkeys);
+
+ /* Report first unmatched row */
+ if (!found)
+ return i;
+ }
+
+ /* All pass. */
+ return -1;
+}
+
+/*
+ * ri_FastPathFlushArray
+ * Single-column fast path using SK_SEARCHARRAY.
+ *
+ * Builds an array of FK values and does one index scan with
+ * SK_SEARCHARRAY. The index AM sorts and deduplicates the array
+ * internally, then walks matching leaf pages in order. Each
+ * matched PK tuple is locked and rechecked as before; a matched[]
+ * bitmap tracks which batch items were satisfied.
+ *
+ * Returns the index of the first violating row in the batch array, or -1 if
+ * all rows are valid.
+ */
+static int
+ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
+ const RI_ConstraintInfo *riinfo, Relation fk_rel,
+ Snapshot snapshot, IndexScanDesc scandesc)
+{
+ FastPathMeta *fpmeta = riinfo->fpmeta;
+ Relation pk_rel = fpentry->pk_rel;
+ Relation idx_rel = fpentry->idx_rel;
+ TupleTableSlot *pk_slot = fpentry->pk_slot;
+ Datum search_vals[RI_FASTPATH_BATCH_SIZE];
+ bool matched[RI_FASTPATH_BATCH_SIZE];
+ int nvals = fpentry->batch_count;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[1];
+ FmgrInfo *cast_func_finfo;
+ FmgrInfo *eq_opr_finfo;
+ Oid elem_type;
+ int16 elem_len;
+ bool elem_byval;
+ char elem_align;
+ ArrayType *arr;
+
+ Assert(fpmeta);
+
+ memset(matched, 0, nvals * sizeof(bool));
+
+ /*
+ * Extract FK values, casting to the operator's expected input type if
+ * needed (e.g. int8 FK -> int4 for int48eq).
+ */
+ cast_func_finfo = &fpmeta->cast_func_finfo[0];
+ eq_opr_finfo = &fpmeta->eq_opr_finfo[0];
+ for (int i = 0; i < nvals; i++)
+ {
+ ExecStoreHeapTuple(fpentry->batch[i], fk_slot, false);
+ ri_ExtractValues(fk_rel, fk_slot, riinfo, false, pk_vals, pk_nulls);
+
+ /* Cast if needed (e.g. int8 FK -> numeric PK) */
+ if (OidIsValid(cast_func_finfo->fn_oid))
+ search_vals[i] = FunctionCall3(cast_func_finfo,
+ pk_vals[0],
+ Int32GetDatum(-1),
+ BoolGetDatum(false));
+ else
+ search_vals[i] = pk_vals[0];
+ }
+
+ /*
+ * Array element type must match the operator's right-hand input type,
+ * which is what the index comparison expects on the search side.
+ * ri_populate_fastpath_metadata() stores exactly this via
+ * get_op_opfamily_properties(), which returns the operator's right-hand
+ * type as the subtype for cross-type operators (e.g. int8 for int48eq)
+ * and the common type for same-type operators.
+ */
+ elem_type = fpmeta->subtypes[0];
+ Assert(OidIsValid(elem_type));
+ get_typlenbyvalalign(elem_type, &elem_len, &elem_byval, &elem_align);
+
+ arr = construct_array(search_vals, nvals,
+ elem_type, elem_len, elem_byval, elem_align);
+
+ /*
+ * Build scan key with SK_SEARCHARRAY. The index AM code will internally
+ * sort and deduplicate, then walk leaf pages in order.
+ *
+ * PK indexes are always btree, which supports SK_SEARCHARRAY.
+ */
+ Assert(idx_rel->rd_indam->amsearcharray);
+ ScanKeyEntryInitialize(&skey[0],
+ SK_SEARCHARRAY,
+ 1, /* attno */
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ PointerGetDatum(arr));
+
+ index_rescan(scandesc, skey, 1, NULL, 0);
+
+ /*
+ * Walk all matches. The index AM returns them in index order. For each
+ * match, find which batch item(s) it satisfies.
+ */
+ while (index_getnext_slot(scandesc, ForwardScanDirection, pk_slot))
+ {
+ Datum found_val;
+ bool found_null;
+ bool concurrently_updated;
+ ScanKeyData recheck_skey[1];
+
+ if (!ri_LockPKTuple(pk_rel, pk_slot, snapshot, &concurrently_updated))
+ continue;
+
+ /* Extract the PK value from the matched and locked tuple */
+ found_val = slot_getattr(pk_slot, riinfo->pk_attnums[0], &found_null);
+ Assert(!found_null);
+
+ if (concurrently_updated)
+ {
+ /*
+ * Build a single-key scankey for recheck. We need the actual PK
+ * value that was found, not the FK search value.
+ */
+ ScanKeyEntryInitialize(&recheck_skey[0], 0, 1,
+ fpmeta->strats[0],
+ fpmeta->subtypes[0],
+ idx_rel->rd_indcollation[0],
+ fpmeta->regops[0],
+ found_val);
+ if (!recheck_matched_pk_tuple(idx_rel, recheck_skey, pk_slot))
+ continue;
+ }
+
+ /*
+ * Linear scan to mark all batch items matching this PK value.
+ * O(batch_size) per match, O(batch_size^2) worst case -- fine for the
+ * current batch size of 64.
+ */
+ for (int i = 0; i < nvals; i++)
+ {
+ if (!matched[i] &&
+ DatumGetBool(FunctionCall2Coll(eq_opr_finfo,
+ idx_rel->rd_indcollation[0],
+ found_val,
+ search_vals[i])))
+ matched[i] = true;
+ }
+ }
+
+ /* Report first unmatched row */
+ for (int i = 0; i < nvals; i++)
+ if (!matched[i])
+ return i;
+
+ /* All pass. */
+ return -1;
+}
+
/*
* ri_FastPathProbeOne
* Probe the PK index for one set of scan keys, lock the matching
@@ -3687,3 +4093,197 @@ RI_FKey_trigger_type(Oid tgfoid)
return RI_TRIGGER_NONE;
}
+
+/*
+ * ri_FastPathEndBatch
+ * Flush remaining rows and tear down cached state.
+ *
+ * Registered as an AfterTriggerBatchCallback. Note: the flush can
+ * do real work (CCI, security context switch, index probes) and can
+ * throw ERROR on a constraint violation. If that happens,
+ * ri_FastPathTeardown never runs; ResourceOwner + XactCallback
+ * handle resource cleanup on the abort path.
+ */
+static void
+ri_FastPathEndBatch(void *arg)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ /* Flush any partial batches -- can throw ERROR */
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->batch_count > 0)
+ {
+ Relation fk_rel = table_open(entry->fk_relid, AccessShareLock);
+ const RI_ConstraintInfo *riinfo = ri_LoadConstraintInfo(entry->conoid);
+
+ ri_FastPathBatchFlush(entry, fk_rel, riinfo);
+ table_close(fk_rel, NoLock);
+ }
+ }
+
+ /* Orderly teardown */
+ ri_FastPathTeardown();
+}
+
+/*
+ * ri_FastPathTeardown
+ * Tear down all cached fast-path state.
+ *
+ * Called from ri_FastPathEndBatch() after flushing any remaining rows.
+ */
+static void
+ri_FastPathTeardown(void)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->idx_rel)
+ index_close(entry->idx_rel, NoLock);
+ if (entry->pk_rel)
+ table_close(entry->pk_rel, NoLock);
+ if (entry->pk_slot)
+ ExecDropSingleTupleTableSlot(entry->pk_slot);
+ if (entry->fk_slot)
+ ExecDropSingleTupleTableSlot(entry->fk_slot);
+ if (entry->flush_cxt)
+ MemoryContextDelete(entry->flush_cxt);
+ }
+
+ hash_destroy(ri_fastpath_cache);
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static bool ri_fastpath_xact_callback_registered = false;
+
+static void
+ri_FastPathXactCallback(XactEvent event, void *arg)
+{
+ /*
+ * On abort, ResourceOwner already released relations; on commit,
+ * ri_FastPathTeardown already ran. Either way, just NULL the static
+ * pointers so they don't dangle into the next transaction.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static void
+ri_FastPathSubXactCallback(SubXactEvent event, SubTransactionId mySubid,
+ SubTransactionId parentSubid, void *arg)
+{
+ if (event == SUBXACT_EVENT_ABORT_SUB)
+ {
+ /*
+ * ResourceOwner already released relations. NULL the static pointers
+ * so the still-registered batch callback becomes a no-op for the rest
+ * of this transaction.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+ }
+}
+
+/*
+ * ri_FastPathGetEntry
+ * Look up or create a per-batch cache entry for the given constraint.
+ *
+ * On first call for a constraint within a batch: opens pk_rel and the index,
+ * allocates slots for both FK row and the looked up PK row, and registers the
+ * cleanup callback.
+ *
+ * On subsequent calls: returns the existing entry.
+ */
+static RI_FastPathEntry *
+ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
+{
+ RI_FastPathEntry *entry;
+ bool found;
+
+ /* Create hash table on first use in this batch */
+ if (ri_fastpath_cache == NULL)
+ {
+ HASHCTL ctl;
+
+ if (!ri_fastpath_xact_callback_registered)
+ {
+ RegisterXactCallback(ri_FastPathXactCallback, NULL);
+ RegisterSubXactCallback(ri_FastPathSubXactCallback, NULL);
+ ri_fastpath_xact_callback_registered = true;
+ }
+
+ ctl.keysize = sizeof(Oid);
+ ctl.entrysize = sizeof(RI_FastPathEntry);
+ ctl.hcxt = TopTransactionContext;
+ ri_fastpath_cache = hash_create("RI fast-path cache",
+ 16,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ }
+
+ entry = hash_search(ri_fastpath_cache, &riinfo->constraint_id,
+ HASH_ENTER, &found);
+
+ if (!found)
+ {
+ MemoryContext oldcxt;
+
+ /*
+ * Zero out non-key fields so ri_FastPathTeardown is safe if we error
+ * out during partial initialization below.
+ */
+ memset(((char *) entry) + offsetof(RI_FastPathEntry, pk_rel), 0,
+ sizeof(RI_FastPathEntry) - offsetof(RI_FastPathEntry, pk_rel));
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+
+ entry->fk_relid = RelationGetRelid(fk_rel);
+
+ /*
+ * Open PK table and its unique index.
+ *
+ * RowShareLock on pk_rel matches what the SPI path's SELECT ... FOR
+ * KEY SHARE would acquire as a relation-level lock. AccessShareLock
+ * on the index is standard for index scans.
+ *
+ * We don't release these locks until end of transaction, matching SPI
+ * behavior.
+ */
+ entry->pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ entry->idx_rel = index_open(riinfo->conindid, AccessShareLock);
+ entry->pk_slot = table_slot_create(entry->pk_rel, NULL);
+
+ /*
+ * Must be TTSOpsHeapTuple because ExecStoreHeapTuple() is used to
+ * load entries from batch[] into this slot for value extraction.
+ */
+ entry->fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
+ &TTSOpsHeapTuple);
+
+ entry->flush_cxt = AllocSetContextCreate(TopTransactionContext,
+ "RI fast path flush temporary context",
+ ALLOCSET_SMALL_SIZES);
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Ensure cleanup at end of this trigger-firing batch */
+ if (!ri_fastpath_callback_registered)
+ {
+ RegisterAfterTriggerBatchCallback(ri_FastPathEndBatch, NULL);
+ ri_fastpath_callback_registered = true;
+ }
+ }
+
+ return entry;
+}
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 27af5284406..1d9869973c0 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -289,4 +289,25 @@ extern void RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel,
extern int RI_FKey_trigger_type(Oid tgfoid);
+/*
+ * Callback type for end-of-trigger-batch callbacks.
+ *
+ * Currently used by ri_triggers.c to flush fast-path FK batches and
+ * clean up associated resources.
+ *
+ * Registered via RegisterAfterTriggerBatchCallback(). Invoked when
+ * the current trigger-firing batch completes:
+ * - AfterTriggerEndQuery() (immediate constraints)
+ * - AfterTriggerFireDeferred() (deferred constraints at COMMIT)
+ * - AfterTriggerSetState() (SET CONSTRAINTS IMMEDIATE)
+ *
+ * The callback list is cleared after each batch. Callers must
+ * re-register if they need to be called again in a subsequent batch.
+ */
+typedef void (*AfterTriggerBatchCallback) (void *arg);
+
+extern void RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg);
+extern bool AfterTriggerIsActive(void);
+
#endif /* TRIGGER_H */
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 6c607d36222..91295754bab 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3557,3 +3557,129 @@ DETAIL: drop cascades to table fkpart13_t1
drop cascades to table fkpart13_t2
drop cascades to table fkpart13_t3
RESET search_path;
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+ERROR: insert or update on table "fp_fk_alter" violates foreign key constraint "fp_fk_alter_a_fkey"
+DETAIL: Key (a)=(101) is not present in table "fp_pk_alter".
+DROP TABLE fp_fk_alter, fp_pk_alter;
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+ERROR: insert or update on table "fp_fk_alter2" violates foreign key constraint "fp_fk_alter2_a_fkey"
+DETAIL: Key (a)=(200) is not present in table "fp_pk_alter2".
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+ERROR: insert or update on table "fp_multi_fk" violates foreign key constraint "fp_multi_fk_b_fkey"
+DETAIL: Key (b)=(2) is not present in table "fp_pk2".
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+ERROR: insert or update on table "fp_fk_defer" violates foreign key constraint "fp_fk_defer_a_fkey"
+DETAIL: Key (a)=(3) is not present in table "fp_pk_defer".
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+ a
+---
+ 1
+ 1
+(2 rows)
+
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+NOTICE: fp_auto_pk called
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+ERROR: insert or update on table "fp_fk_multi" violates foreign key constraint "fp_fk_multi_a_b_fkey"
+DETAIL: Key (a, b)=(999, 999) is not present in table "fp_pk_multi".
+DROP TABLE fp_fk_multi, fp_pk_multi;
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+ERROR: insert or update on table "fp_fk_commit" violates foreign key constraint "fp_fk_commit_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_commit".
+DROP TABLE fp_fk_commit, fp_pk_commit;
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+ERROR: insert or update on table "fp_fk_cross" violates foreign key constraint "fp_fk_cross_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "fp_pk_cross".
+DROP TABLE fp_fk_cross, fp_pk_cross;
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index fcdd006c971..f646dd10401 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2535,3 +2535,121 @@ WITH cte AS (
DROP SCHEMA fkpart13 CASCADE;
RESET search_path;
+
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+DROP TABLE fp_fk_alter, fp_pk_alter;
+
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
+
+-- Subtransaction abort: cached state must be invalidated on ROLLBACK TO
+CREATE TABLE fp_pk_subxact (a int PRIMARY KEY);
+CREATE TABLE fp_fk_subxact (a int REFERENCES fp_pk_subxact);
+INSERT INTO fp_pk_subxact VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_subxact VALUES (1);
+SAVEPOINT sp1;
+INSERT INTO fp_fk_subxact VALUES (2);
+ROLLBACK TO sp1;
+INSERT INTO fp_fk_subxact VALUES (1);
+COMMIT;
+SELECT * FROM fp_fk_subxact;
+DROP TABLE fp_fk_subxact, fp_pk_subxact;
+
+-- FK check must see PK rows inserted by earlier AFTER triggers
+-- firing on the same statement
+CREATE TABLE fp_pk_cci (a int PRIMARY KEY);
+CREATE TABLE fp_fk_cci (a int REFERENCES fp_pk_cci);
+
+CREATE FUNCTION fp_auto_pk() RETURNS trigger AS $$
+BEGIN
+ RAISE NOTICE 'fp_auto_pk called';
+ INSERT INTO fp_pk_cci VALUES (NEW.a);
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+
+-- Name sorts before the RI trigger, so fires first per row
+CREATE TRIGGER "AAA_auto" AFTER INSERT ON fp_fk_cci
+ FOR EACH ROW EXECUTE FUNCTION fp_auto_pk();
+
+-- Should succeed: AAA_auto provisions the PK row before RI check
+INSERT INTO fp_fk_cci VALUES (1), (2), (3);
+
+DROP TABLE fp_fk_cci, fp_pk_cci;
+DROP FUNCTION fp_auto_pk;
+
+-- Multi-column FK: exercises batched per-row probing with composite keys
+CREATE TABLE fp_pk_multi (a int, b int, PRIMARY KEY (a, b));
+INSERT INTO fp_pk_multi SELECT i, i FROM generate_series(1, 100) i;
+CREATE TABLE fp_fk_multi (x int, a int, b int,
+ FOREIGN KEY (a, b) REFERENCES fp_pk_multi);
+INSERT INTO fp_fk_multi SELECT i, i, i FROM generate_series(1, 100) i;
+INSERT INTO fp_fk_multi VALUES (1, 999, 999);
+DROP TABLE fp_fk_multi, fp_pk_multi;
+
+-- Deferred constraint: batch flushed at COMMIT, not at statement end
+CREATE TABLE fp_pk_commit (a int PRIMARY KEY);
+CREATE TABLE fp_fk_commit (a int REFERENCES fp_pk_commit
+ DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_commit VALUES (1);
+BEGIN;
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (1);
+INSERT INTO fp_fk_commit VALUES (999);
+COMMIT;
+DROP TABLE fp_fk_commit, fp_pk_commit;
+
+-- Cross-type FK with bulk insert: int8 FK referencing int4 PK,
+-- values cast during array construction
+CREATE TABLE fp_pk_cross (a int4 PRIMARY KEY);
+INSERT INTO fp_pk_cross SELECT generate_series(1, 200);
+CREATE TABLE fp_fk_cross (a int8 REFERENCES fp_pk_cross);
+INSERT INTO fp_fk_cross SELECT generate_series(1, 200);
+INSERT INTO fp_fk_cross VALUES (999);
+DROP TABLE fp_fk_cross, fp_pk_cross;
+
+-- Duplicate FK values: when using the batched SAOP path, every
+-- row must be recognized as satisfied, not just the first match
+CREATE TABLE fp_pk_dup (a int PRIMARY KEY);
+INSERT INTO fp_pk_dup VALUES (1);
+CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
+INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
+DROP TABLE fp_fk_dup, fp_pk_dup;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8e9c06547d6..c0b9e51e335 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -30,6 +30,8 @@ AddForeignUpdateTargets_function
AddrInfo
AffixNode
AffixNodeData
+AfterTriggerBatchCallback
+AfterTriggerCallbackItem
AfterTriggerEvent
AfterTriggerEventChunk
AfterTriggerEventData
@@ -2485,6 +2487,7 @@ RIX
RI_CompareHashEntry
RI_CompareKey
RI_ConstraintInfo
+RI_FastPathEntry
RI_QueryHashEntry
RI_QueryKey
RTEKind
--
2.47.3
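
For readers skimming the patch, the matching step in ri_FastPathFlushArray() can be modeled outside the backend. What follows is a minimal standalone sketch, not the patch's code: plain ints stand in for Datums, a small int array stands in for the PK tuples returned by the index scan, and the names `BATCH_SIZE` and `flush_array` are hypothetical. It illustrates why every duplicate FK value in a batch has to be marked in matched[], and how the first violating index is reported.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BATCH_SIZE 64	/* hypothetical stand-in for RI_FASTPATH_BATCH_SIZE */

/*
 * Model of the array flush.  pk[] plays the role of the PK values returned
 * by the index scan, fk[] the batched FK values.  Returns the index of the
 * first FK value with no PK match, or -1 if every value is matched.
 */
static int
flush_array(const int *pk, int npk, const int *fk, int nvals)
{
	bool		matched[BATCH_SIZE];

	assert(nvals <= BATCH_SIZE);
	memset(matched, 0, nvals * sizeof(bool));

	for (int p = 0; p < npk; p++)
	{
		/* Mark all batch items equal to this PK value, duplicates included. */
		for (int i = 0; i < nvals; i++)
		{
			if (!matched[i] && fk[i] == pk[p])
				matched[i] = true;
		}
	}

	/* Report the first unmatched row. */
	for (int i = 0; i < nvals; i++)
		if (!matched[i])
			return i;

	return -1;
}
```

In this model, with pk = {1, 2, 3}, the batch {1, 1, 3, 1} returns -1, while {2, 999, 1, 998} returns 1, the position of the first value that has no PK match.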
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Chao Li @ 2026-04-02 07:59 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Junwang Zhao <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
> On Apr 2, 2026, at 15:41, Amit Langote <[email protected]> wrote:
>
> On Wed, Apr 1, 2026 at 9:18 PM Amit Langote <[email protected]> wrote:
>> On Wed, Apr 1, 2026 at 8:56 PM Junwang Zhao <[email protected]> wrote:
>>> On Wed, Apr 1, 2026 at 5:51 PM Amit Langote <[email protected]> wrote:
>>>>
>>>> On Wed, Apr 1, 2026 at 5:51 PM Amit Langote <[email protected]> wrote:
>>>>> On Wed, Apr 1, 2026 at 12:54 AM Junwang Zhao <[email protected]> wrote:
>>>>>> + if (riinfo->fpmeta == NULL)
>>>>>> + {
>>>>>> + /* Reload to ensure it's valid. */
>>>>>> + riinfo = ri_LoadConstraintInfo(riinfo->constraint_id);
>>>>>>
>>>>>> I was thinking of wrapping the reload in a conditional check like
>>>>>> `!riinfo->valid`, since `riinfo` can be valid even when `fpmeta == NULL`.
>>>>>> However, `if (riinfo->fpmeta == NULL)` should rarely be true, so the
>>>>>> unconditional reload is harmless, and the code is cleaner.
>>>>>>
>>>>>> +1 to the fix.
>>>>>
>>>>> Thanks for checking.
>>>>>
>>>>> I have just pushed a slightly modified version of that.
>>>>>
>>>>>>> 0002 is the rebased batching patch.
>>>>>>
>>>>>> The change of RI_FastPathEntry from storing riinfo to fk_relid
>>>>>> makes sense to me. I'll do another review on 0002 tomorrow.
>>>>>
>>>>> Here's another version.
>>>>>
>>>>> This time, I have another fixup patch (0001) to make FastPathMeta
>>>>> self-contained by copying the FmgrInfo structs it needs out of
>>>>> RI_CompareHashEntry rather than storing pointers into it. This avoids
>>>>> any dependency on those cache entries remaining stable. I'll push
>>>>> that once the just committed patch has seen enough BF animals.
>>>>
>>>> Pushed.
>>>>
>>>>> 0002 is rebased over that.
>>>>
>>>> Rebased again.
>>>
>>> +static void
>>> +ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel)
>>> +{
>>> + /* Reload; may have been invalidated since last batch accumulation. */
>>> + const RI_ConstraintInfo *riinfo = ri_LoadConstraintInfo(fpentry->conoid);
>>>
>>> ...
>>> + if (riinfo->fpmeta == NULL)
>>> + {
>>> + /* Reload to ensure it's valid. */
>>> + riinfo = ri_LoadConstraintInfo(riinfo->constraint_id);
>>> + ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
>>> + fk_rel, idx_rel);
>>> + }
>>>
>>> ri_LoadConstraintInfo is currently invoked twice within
>>> ri_FastPathBatchFlush. Should we eliminate the second call?
>>
>> I think we can't because the entry may be stale by the time we get to
>> the ri_populate_fastpath_metadata() call due to intervening steps;
>> even something as benign-looking as index_beginscan() may call code paths
>> that can trigger invalidation in rare cases. Maybe predictably in
>> CLOBBER_CACHE_ALWAYS builds.
>>
>>> Alternatively, we could refactor ri_FastPathBatchFlush to accept
>>> an additional parameter, `const RI_ConstraintInfo *riinfo`, so we
>>> can remove the need for the first call. In that case, we need to call
>>> ri_LoadConstraintInfo in ri_FastPathEndBatch.
>>
>> Yeah, I think that's fine. Done that way in the attached.
>>
>> Also, I realized that we could do:
>>
>> @@ -2937,7 +2937,7 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry,
>> Relation fk_rel)
>> fk_rel, idx_rel);
>> }
>> Assert(riinfo->fpmeta);
>> - if (riinfo->nkeys == 1)
>> + if (riinfo->nkeys == 1 && fpentry->batch_count > 1)
>> violation_index = ri_FastPathFlushArray(fpentry, fk_slot, riinfo,
>> fk_rel, snapshot, scandesc);
>>
>> so that the fixed overhead of ri_FastPathFlushArray (allocating
>> matched[] array on stack and constructing ArrayType, etc.) is not paid
>> unnecessarily for single-row batches.
>
> There's another case in which it is not ok to use FlushArray and that
> is if the index AM's amsearcharray is false (should be true in all
> cases because the unique index used for PK is always btree). Added an
> Assert to that effect next to where SK_SEARCHARRAY is set in
> ri_FastPathFlushArray rather than a runtime check in the dispatch
> condition.
>
> Patch updated. Also added a comment about the invalidation requirement, or
> lack thereof, for RI_FastPathEntry, renamed AfterTriggerBatchIsActive()
> to simply AfterTriggerIsActive(), and fixed the comments in trigger.h
> describing the callback mechanism.
>
> Will push tomorrow morning (Friday) barring objections.
>
> --
> Thanks, Amit Langote
> <v17-0001-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch>
With a quick eyeball review, I found a typo:
```
+ * relcache invalidation. The entry itself is torn down at batch at batch end
```
There are two “at batch”.
I plan to spend time testing and tracing this patch tomorrow, but I don’t want to block your progress. If I find anything, I will report it to you.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
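
The dispatch rule discussed in the quoted text above can be sketched as follows. This is a simplified standalone model, not the committed code: `FlushPath`, its enumerators, and `choose_flush_path` are hypothetical names, and only `nkeys`, `batch_count`, and `amsearcharray` come from the thread. The array path is taken only for a single-column key with more than one batched row. In the patch the amsearcharray Assert sits inside ri_FastPathFlushArray(); it is placed in the dispatch branch here only for brevity.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum FlushPath
{
	FLUSH_PER_ROW,	/* probe the index once per batched row */
	FLUSH_ARRAY		/* one SK_SEARCHARRAY scan for the whole batch */
} FlushPath;

static FlushPath
choose_flush_path(int nkeys, int batch_count, bool amsearcharray)
{
	if (nkeys == 1 && batch_count > 1)
	{
		/* PK indexes are btree, so this is expected to hold in all cases. */
		assert(amsearcharray);
		return FLUSH_ARRAY;
	}

	/* Multi-column keys and single-row batches skip the array setup cost. */
	return FLUSH_PER_ROW;
}
```

A single-row batch therefore never pays for the matched[] array or the ArrayType construction, which is the fixed overhead the quoted change avoids.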
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Amit Langote @ 2026-04-03 05:52 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Junwang Zhao <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
Hi,
On Thu, Apr 2, 2026 at 5:00 PM Chao Li <[email protected]> wrote:
> > On Apr 2, 2026, at 15:41, Amit Langote <[email protected]> wrote:
> > Will push tomorrow morning (Friday) barring objections.
> > <v17-0001-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch>
>
> With a quick eyeball review, I found a typo:
> ```
> + * relcache invalidation. The entry itself is torn down at batch at batch end
> ```
>
> There are two “at batch”.
Thanks for spotting that. Fixed and pushed.
> I plan to spend time testing and tracing this patch tomorrow. But I don’t want to block your progress, if I find anything, I will report to you.
Sure, I didn't want to leave committing this to the weekend or the next week.
--
Thanks, Amit Langote
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Chao Li @ 2026-04-03 08:57 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Junwang Zhao <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
> On Apr 3, 2026, at 13:52, Amit Langote <[email protected]> wrote:
>
> Hi,
>
> On Thu, Apr 2, 2026 at 5:00 PM Chao Li <[email protected]> wrote:
>>> On Apr 2, 2026, at 15:41, Amit Langote <[email protected]> wrote:
>>> Will push tomorrow morning (Friday) barring objections.
>>> <v17-0001-Batch-FK-rows-and-use-SK_SEARCHARRAY-for-fast-pa.patch>
>>
>> With a quick eyeball review, I found a typo:
>> ```
>> + * relcache invalidation. The entry itself is torn down at batch at batch end
>> ```
>>
>> There are two “at batch”.
>
> Thanks for spotting that. Fixed and pushed.
>
>> I plan to spend time testing and tracing this patch tomorrow. But I don’t want to block your progress, if I find anything, I will report to you.
>
> Sure, I didn't want to leave committing this to the weekend or the next week.
>
> --
> Thanks, Amit Langote
Hi Amit,
I spent several hours debugging this patch today, and I found a problem where the batch mode doesn't seem to handle deferred RI triggers, although the commit message suggests that it should.
I traced this scenario:
```
CREATE TABLE pk (a int primary key);
CREATE TABLE fk (a int references pk(a) DEFERRABLE INITIALLY DEFERRED);
BEGIN;
INSERT INTO fk VALUES (1);
INSERT INTO pk VALUES (1);
COMMIT;
```
When COMMIT is executed, it reaches RI_FKey_check(), where AfterTriggerIsActive() checks whether afterTriggers.query_depth >= 0. But in the deferred case, afterTriggers.query_depth is -1.
From the code:
```
if (ri_fastpath_is_applicable(riinfo))
{
    if (AfterTriggerIsActive())
    {
        /* Batched path: buffer and probe in groups */
        ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
    }
    else
    {
        /* ALTER TABLE validation: per-row, no cache */
        ri_FastPathCheck(riinfo, fk_rel, newslot);
    }
    return PointerGetDatum(NULL);
}
```
So this ends up falling back to the per-row path for deferred RI checks at COMMIT, even though the intent here seems to be only to bypass the ALTER TABLE validation case, where batch callbacks would never fire, and MyTriggerDepth is 0. So, maybe we can just check MyTriggerDepth>0 in AfterTriggerIsActive().
I tried the attached fix. With it, deferred triggers go through the batch mode, and all existing tests still pass. But I am still new to PG development, so I’m not sure whether I may have missed something.
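To make the three call contexts easier to compare, here is a minimal standalone sketch of the two predicates. It is not backend code: `TriggerState`, the helper names, and the constants are hypothetical. The values `query_depth = -1` at COMMIT and `MyTriggerDepth = 0` during ALTER TABLE validation come from the description above; `query_depth = 0` and `MyTriggerDepth = 1` for the immediate case, and `query_depth = -1` for ALTER TABLE validation, are assumptions chosen to match the branch comments in the quoted code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the two backend variables discussed above. */
typedef struct TriggerState
{
    int query_depth;      /* stands in for afterTriggers.query_depth */
    int my_trigger_depth; /* stands in for MyTriggerDepth */
} TriggerState;

/* Check as committed: true only while a query-level trigger queue is open. */
static bool
is_active_by_query_depth(TriggerState s)
{
    return s.query_depth >= 0;
}

/* Check proposed in the attached fix: true while any trigger function runs. */
static bool
is_active_by_trigger_depth(TriggerState s)
{
    return s.my_trigger_depth > 0;
}

/* Illustrative values only; they are not read from a live backend. */
static const TriggerState IMMEDIATE_CHECK = {0, 1};
static const TriggerState DEFERRED_AT_COMMIT = {-1, 1};
static const TriggerState ALTER_TABLE_VALIDATION = {-1, 0};

static void
check_cases(void)
{
    /* Both predicates agree for immediate checks and ALTER TABLE validation. */
    assert(is_active_by_query_depth(IMMEDIATE_CHECK));
    assert(is_active_by_trigger_depth(IMMEDIATE_CHECK));
    assert(!is_active_by_query_depth(ALTER_TABLE_VALIDATION));
    assert(!is_active_by_trigger_depth(ALTER_TABLE_VALIDATION));

    /* They differ only for deferred checks at COMMIT: the reported gap. */
    assert(!is_active_by_query_depth(DEFERRED_AT_COMMIT));
    assert(is_active_by_trigger_depth(DEFERRED_AT_COMMIT));
}
```

Calling `check_cases()` from a `main()` passes silently under these assumed values, which is the behavior the trace above describes: only the deferred case is classified differently by the two checks.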
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Attachments:
[application/octet-stream] fix_deferred_trigger.diff (1.0K, 2-fix_deferred_trigger.diff)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 90e94fb8a5a..63355ebb02f 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -6806,10 +6806,10 @@ RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
* Allocate in TopTransactionContext so the item survives for the duration
* of the batch, which may span multiple trigger invocations.
*
- * Must be called while afterTriggers is active (query_depth >= 0);
+ * Must be called while afterTriggers is active (MyTriggerDepth > 0);
* callbacks registered outside a trigger-firing context would never fire.
*/
- Assert(afterTriggers.query_depth >= 0);
+ Assert(MyTriggerDepth > 0);
oldcxt = MemoryContextSwitchTo(TopTransactionContext);
item = palloc(sizeof(AfterTriggerCallbackItem));
item->callback = callback;
@@ -6858,5 +6858,5 @@ FireAfterTriggerBatchCallbacks(void)
bool
AfterTriggerIsActive(void)
{
- return afterTriggers.query_depth >= 0;
+ return MyTriggerDepth > 0;
}
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Amit Langote @ 2026-04-03 09:39 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Junwang Zhao <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
On Fri, Apr 3, 2026 at 5:58 PM Chao Li <[email protected]> wrote:
> > On Apr 3, 2026, at 13:52, Amit Langote <[email protected]> wrote:
> > On Thu, Apr 2, 2026 at 5:00 PM Chao Li <[email protected]> wrote:
> >> I plan to spend time testing and tracing this patch tomorrow. But I don’t want to block your progress, if I find anything, I will report to you.
> >
> > Sure, I didn't want to leave committing this to the weekend or the next week.
>
> I spent several hours debugging this patch today, and I found a problem where the batch mode doesn't seem to handle deferred RI triggers, although the commit message suggests that it should.
>
> I traced this scenario:
> ```
> CREATE TABLE pk (a int primary key);
> CREATE TABLE fk (a int references pk(a) DEFERRABLE INITIALLY DEFERRED);
> BEGIN;
> INSERT INTO fk VALUES (1);
> INSERT INTO pk VALUES (1);
> COMMIT;
> ```
>
> When COMMIT is executed, it reaches RI_FKey_check(), where AfterTriggerIsActive() checks whether afterTriggers.query_depth >= 0. But in the deferred case, afterTriggers.query_depth is -1.
>
> From the code:
> ```
> if (ri_fastpath_is_applicable(riinfo))
> {
>     if (AfterTriggerIsActive())
>     {
>         /* Batched path: buffer and probe in groups */
>         ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
>     }
>     else
>     {
>         /* ALTER TABLE validation: per-row, no cache */
>         ri_FastPathCheck(riinfo, fk_rel, newslot);
>     }
>     return PointerGetDatum(NULL);
> }
> ```
>
> So this ends up falling back to the per-row path for deferred RI checks at COMMIT, even though the intent here seems to be only to bypass the ALTER TABLE validation case, where batch callbacks would never fire, and MyTriggerDepth is 0. So, maybe we can just check MyTriggerDepth>0 in AfterTriggerIsActive().
>
> I tried the attached fix. With it, deferred triggers go through the batch mode, and all existing tests still pass.
I think you might be right. Thanks for the patch. It looks correct
to me at a glance, but I will need to check it a bit more closely
before committing.
--
Thanks, Amit Langote
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Amit Langote @ 2026-04-06 09:45 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Junwang Zhao <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
On Fri, Apr 3, 2026 at 6:39 PM Amit Langote <[email protected]> wrote:
> On Fri, Apr 3, 2026 at 5:58 PM Chao Li <[email protected]> wrote:
> > > On Apr 3, 2026, at 13:52, Amit Langote <[email protected]> wrote:
> > > On Thu, Apr 2, 2026 at 5:00 PM Chao Li <[email protected]> wrote:
> > >> I plan to spend time testing and tracing this patch tomorrow. But I don’t want to block your progress, if I find anything, I will report to you.
> > >
> > > Sure, I didn't want to leave committing this to the weekend or the next week.
> >
> > I spent several hours debugging this patch today, and I found a problem where the batch mode doesn't seem to handle deferred RI triggers, although the commit message suggests that it should.
> >
> > I traced this scenario:
> > ```
> > CREATE TABLE pk (a int primary key);
> > CREATE TABLE fk (a int references pk(a) DEFERRABLE INITIALLY DEFERRED);
> > BEGIN;
> > INSERT INTO fk VALUES (1);
> > INSERT INTO pk VALUES (1);
> > COMMIT;
> > ```
> >
> > When COMMIT is executed, it reaches RI_FKey_check(), where AfterTriggerIsActive() checks whether afterTriggers.query_depth >= 0. But in the deferred case, afterTriggers.query_depth is -1.
> >
> > From the code:
> > ```
> > if (ri_fastpath_is_applicable(riinfo))
> > {
> >     if (AfterTriggerIsActive())
> >     {
> >         /* Batched path: buffer and probe in groups */
> >         ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
> >     }
> >     else
> >     {
> >         /* ALTER TABLE validation: per-row, no cache */
> >         ri_FastPathCheck(riinfo, fk_rel, newslot);
> >     }
> >     return PointerGetDatum(NULL);
> > }
> > ```
> >
> > So this ends up falling back to the per-row path for deferred RI checks at COMMIT, even though the intent here seems to be only to bypass the ALTER TABLE validation case, where batch callbacks would never fire, and MyTriggerDepth is 0. So, maybe we can just check MyTriggerDepth>0 in AfterTriggerIsActive().
> >
> > I tried the attached fix. With it, deferred triggers go through the batch mode, and all existing tests still pass.
>
> I think you might be right. Thanks for the patch. It looks correct
> to me at a glance, but I will need to check it a bit more closely
> before committing.
Thinking about this some more, your fix is on the right track but
needs a bit more work -- MyTriggerDepth > 0 is too broad, since it is
also true while BEFORE triggers are running. I have a revised version
using a new afterTriggerFiringDepth counter that I'll push shortly.
Added an open item for tracking in the meantime:
https://wiki.postgresql.org/wiki/PostgreSQL_19_Open_Items#Open_Issues
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v2-0001-Fix-deferred-FK-check-batching-introduced-by-comm.patch (5.3K, 2-v2-0001-Fix-deferred-FK-check-batching-introduced-by-comm.patch)
From c208c7cf13c6968a12e4c9b321ebeebebd931a42 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Mon, 6 Apr 2026 18:40:05 +0900
Subject: [PATCH v2] Fix deferred FK check batching introduced by commit b7b27eb41a5
That commit introduced AfterTriggerIsActive() to detect whether
we are inside the after-trigger firing machinery, so that RI trigger
functions can take the batched fast path. It was implemented using
query_depth >= 0, which correctly identified immediate trigger firing
but missed the deferred case where query_depth is -1 at COMMIT via
AfterTriggerFireDeferred(). This caused deferred FK checks to fall
back to the per-row fast path instead of the batched path.
The correct check is whether we are inside an after-trigger firing
loop specifically. Introduce afterTriggerFiringDepth, a counter
incremented around the trigger-firing loops in AfterTriggerEndQuery,
AfterTriggerFireDeferred, and AfterTriggerSetState, and decremented
after FireAfterTriggerBatchCallbacks() returns. AfterTriggerIsActive()
now returns afterTriggerFiringDepth > 0.
Reported-by: Chao Li <[email protected]>
Author: Chao Li <[email protected]>
Co-authored-by: Amit Langote <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
src/backend/commands/trigger.c | 26 ++++++++++++++++++++++----
1 file changed, 22 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 4d4e96a5302..5fe2585c88f 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3940,6 +3940,13 @@ typedef struct AfterTriggerCallbackItem
static AfterTriggersData afterTriggers;
+/*
+ * Incremented before invoking afterTriggerInvokeEvents(). Used by
+ * AfterTriggerIsActive() to determine whether batch callbacks will fire,
+ * so that RI trigger functions can take the batched fast path.
+ */
+static int afterTriggerFiringDepth = 0;
+
static void AfterTriggerExecute(EState *estate,
AfterTriggerEvent event,
ResultRelInfo *relInfo,
@@ -5113,6 +5120,7 @@ AfterTriggerBeginXact(void)
Assert(afterTriggers.events.head == NULL);
Assert(afterTriggers.trans_stack == NULL);
Assert(afterTriggers.maxtransdepth == 0);
+ Assert(afterTriggerFiringDepth == 0);
}
@@ -5184,6 +5192,7 @@ AfterTriggerEndQuery(EState *estate)
*/
qs = &afterTriggers.query_stack[afterTriggers.query_depth];
+ afterTriggerFiringDepth++;
for (;;)
{
if (afterTriggerMarkEvents(&qs->events, &afterTriggers.events, true))
@@ -5234,6 +5243,7 @@ AfterTriggerEndQuery(EState *estate)
AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
afterTriggers.query_depth--;
+ afterTriggerFiringDepth--;
}
@@ -5329,6 +5339,7 @@ AfterTriggerFireDeferred(void)
* Run all the remaining triggers. Loop until they are all gone, in case
* some trigger queues more for us to do.
*/
+ afterTriggerFiringDepth++;
while (afterTriggerMarkEvents(events, NULL, false))
{
CommandId firing_id = afterTriggers.firing_counter++;
@@ -5340,6 +5351,8 @@ AfterTriggerFireDeferred(void)
/* Flush any fast-path batches accumulated by the triggers just fired. */
FireAfterTriggerBatchCallbacks();
+ afterTriggerFiringDepth--;
+
/*
* We don't bother freeing the event list, since it will go away anyway
* (and more efficiently than via pfree) in AfterTriggerEndXact.
@@ -5404,6 +5417,8 @@ AfterTriggerEndXact(bool isCommit)
/* No more afterTriggers manipulation until next transaction starts. */
afterTriggers.query_depth = -1;
+
+ afterTriggerFiringDepth = 0;
}
/*
@@ -6053,6 +6068,7 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
AfterTriggerEventList *events = &afterTriggers.events;
bool snapshot_set = false;
+ afterTriggerFiringDepth++;
while (afterTriggerMarkEvents(events, NULL, true))
{
CommandId firing_id = afterTriggers.firing_counter++;
@@ -6086,6 +6102,7 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
* Flush any fast-path batches accumulated by the triggers just fired.
*/
FireAfterTriggerBatchCallbacks();
+ afterTriggerFiringDepth--;
if (snapshot_set)
PopActiveSnapshot();
@@ -6806,10 +6823,10 @@ RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
* Allocate in TopTransactionContext so the item survives for the duration
* of the batch, which may span multiple trigger invocations.
*
- * Must be called while afterTriggers is active (query_depth >= 0);
- * callbacks registered outside a trigger-firing context would never fire.
+ * Must be called while afterTriggers is active; callbacks registered
+ * outside a trigger-firing context would never fire.
*/
- Assert(afterTriggers.query_depth >= 0);
+ Assert(afterTriggerFiringDepth > 0);
oldcxt = MemoryContextSwitchTo(TopTransactionContext);
item = palloc(sizeof(AfterTriggerCallbackItem));
item->callback = callback;
@@ -6836,6 +6853,7 @@ FireAfterTriggerBatchCallbacks(void)
if (afterTriggers.query_depth > 0)
return;
+ Assert(afterTriggerFiringDepth > 0);
foreach(lc, afterTriggers.batch_callbacks)
{
AfterTriggerCallbackItem *item = lfirst(lc);
@@ -6858,5 +6876,5 @@ FireAfterTriggerBatchCallbacks(void)
bool
AfterTriggerIsActive(void)
{
- return afterTriggers.query_depth >= 0;
+ return afterTriggerFiringDepth > 0;
}
--
2.47.3
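The reasoning behind the dedicated counter in the patch above can be illustrated with a small standalone model. This is a sketch under stated assumptions rather than the backend code: the variables mirror afterTriggerFiringDepth and MyTriggerDepth, while `Observation`, `call_trigger_function()`, `fire_after_trigger()` and `fire_before_trigger()` are hypothetical names. It assumes, as the message above says, that MyTriggerDepth is raised for every trigger function call, including BEFORE triggers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for backend state; the names mirror the patch. */
static int after_trigger_firing_depth = 0; /* afterTriggerFiringDepth */
static int my_trigger_depth = 0;           /* MyTriggerDepth */

static bool
after_trigger_is_active(void)
{
    return after_trigger_firing_depth > 0;
}

/* What a trigger function would observe at the moment it runs. */
typedef struct Observation
{
    bool active;        /* result of after_trigger_is_active() */
    int  trigger_depth; /* my_trigger_depth at that moment */
} Observation;

/* Models calling any trigger function: the depth is raised for all of them. */
static Observation
call_trigger_function(void)
{
    Observation obs;

    my_trigger_depth++;
    obs.active = after_trigger_is_active();
    obs.trigger_depth = my_trigger_depth;
    my_trigger_depth--;
    return obs;
}

/* Models an after-trigger firing loop such as AfterTriggerFireDeferred(). */
static Observation
fire_after_trigger(void)
{
    Observation obs;

    after_trigger_firing_depth++;
    obs = call_trigger_function();
    /* Batch callbacks would be flushed here, before the decrement. */
    after_trigger_firing_depth--;
    return obs;
}

/* Models a BEFORE trigger: called directly, outside any firing loop. */
static Observation
fire_before_trigger(void)
{
    return call_trigger_function();
}

static void
check_counter_model(void)
{
    Observation after = fire_after_trigger();
    Observation before = fire_before_trigger();

    /* Both see a trigger depth of 1, so that alone cannot tell them apart. */
    assert(after.trigger_depth == 1 && before.trigger_depth == 1);
    /* Only the after-trigger firing loop reports "active". */
    assert(after.active);
    assert(!before.active);
}
```

In this model a check based on the trigger depth alone would select the batched path inside a BEFORE trigger, where no firing loop exists to flush the batch afterwards; the separate counter avoids that.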
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Chao Li @ 2026-04-07 01:45 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Junwang Zhao <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
> On Apr 6, 2026, at 17:45, Amit Langote <[email protected]> wrote:
>
> On Fri, Apr 3, 2026 at 6:39 PM Amit Langote <[email protected]> wrote:
>> On Fri, Apr 3, 2026 at 5:58 PM Chao Li <[email protected]> wrote:
>>>> On Apr 3, 2026, at 13:52, Amit Langote <[email protected]> wrote:
>>>> On Thu, Apr 2, 2026 at 5:00 PM Chao Li <[email protected]> wrote:
>>>>> I plan to spend time testing and tracing this patch tomorrow. But I don’t want to block your progress, if I find anything, I will report to you.
>>>>
>>>> Sure, I didn't want to leave committing this to the weekend or the next week.
>>>
>>> I spent several hours debugging this patch today, and I found a problem where the batch mode doesn't seem to handle deferred RI triggers, although the commit message suggests that it should.
>>>
>>> I traced this scenario:
>>> ```
>>> CREATE TABLE pk (a int primary key);
>>> CREATE TABLE fk (a int references pk(a) DEFERRABLE INITIALLY DEFERRED);
>>> BEGIN;
>>> INSERT INTO fk VALUES (1);
>>> INSERT INTO pk VALUES (1);
>>> COMMIT;
>>> ```
>>>
>>> When COMMIT is executed, it reaches RI_FKey_check(), where AfterTriggerIsActive() checks whether afterTriggers.query_depth >= 0. But in the deferred case, afterTriggers.query_depth is -1.
>>>
>>> From the code:
>>> ```
>>> if (ri_fastpath_is_applicable(riinfo))
>>> {
>>>     if (AfterTriggerIsActive())
>>>     {
>>>         /* Batched path: buffer and probe in groups */
>>>         ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
>>>     }
>>>     else
>>>     {
>>>         /* ALTER TABLE validation: per-row, no cache */
>>>         ri_FastPathCheck(riinfo, fk_rel, newslot);
>>>     }
>>>     return PointerGetDatum(NULL);
>>> }
>>> ```
>>>
>>> So this ends up falling back to the per-row path for deferred RI checks at COMMIT, even though the intent here seems to be only to bypass the ALTER TABLE validation case, where batch callbacks would never fire, and MyTriggerDepth is 0. So, maybe we can just check MyTriggerDepth>0 in AfterTriggerIsActive().
>>>
>>> I tried the attached fix. With it, deferred triggers go through the batch mode, and all existing tests still pass.
>>
>> I think you might be right. Thanks for the patch. It looks correct
>> to me at a glance, but I will need to check it a bit more closely
>> before committing.
>
> Thinking about this some more, your fix is on the right track but
> needs a bit more work -- MyTriggerDepth > 0 is too broad, since it is
> also true while BEFORE triggers are running. I have a revised version
> using a new afterTriggerFiringDepth counter that I'll push shortly.
>
> Added an open item for tracking in the meantime:
> https://wiki.postgresql.org/wiki/PostgreSQL_19_Open_Items#Open_Issues
>
> --
> Thanks, Amit Langote
> <v2-0001-Fix-deferred-FK-check-batching-introduced-by-comm.patch>
V2 looks good to me. Besides the normal cases, I also traced an abnormal case to verify that afterTriggerFiringDepth is always reset to 0:
```
evantest=# begin;
BEGIN
evantest=*# INSERT INTO fk VALUES (2);
INSERT 0 1
evantest=*# commit;
ERROR: insert or update on table "fk" violates foreign key constraint "fk_a_fkey"
DETAIL: Key (a)=(2) is not present in table "pk".
```
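Why the reset matters on this error path can be sketched as follows. This is a simplified model, not the backend implementation: `firing_depth` stands in for afterTriggerFiringDepth, `longjmp()` stands in for an ERROR being raised inside the firing loop, and `fire_deferred()`, `end_xact()` and `commit_with_violation()` are hypothetical names. The only behavior taken from the v2 patch is that the counter is incremented before the firing loop, decremented after it, and set to 0 in AfterTriggerEndXact().

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

static int firing_depth = 0;  /* stands in for afterTriggerFiringDepth */
static jmp_buf error_context; /* stands in for the error recovery point */

/* Models the deferred firing loop; an FK violation never returns normally. */
static void
fire_deferred(bool violate)
{
    firing_depth++;
    if (violate)
        longjmp(error_context, 1); /* models raising an ERROR */
    firing_depth--;                /* skipped when the error is raised */
}

/* Models the transaction-end cleanup, where the patch resets the counter. */
static void
end_xact(void)
{
    firing_depth = 0;
}

/*
 * Runs a COMMIT that fails the deferred check.  Returns the counter value
 * observed right after the error, before transaction cleanup runs.
 */
static int
commit_with_violation(void)
{
    volatile int depth_after_error = 0;

    if (setjmp(error_context) == 0)
        fire_deferred(true);
    else
        depth_after_error = firing_depth;

    end_xact();
    return depth_after_error;
}

static void
check_reset(void)
{
    /* The decrement was skipped, so the counter is still 1 after the error. */
    assert(commit_with_violation() == 1);
    /* Transaction cleanup brings it back to 0 for the next transaction. */
    assert(firing_depth == 0);
}
```

Without the reset in the cleanup step, the stale value in this model would carry over into the next transaction and make the "active" check true outside any firing loop.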
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
* Re: Eliminating SPI / SQL from some RI triggers - take 3
2024-12-20 04:23 Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 04:07 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 05:10 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Pavel Stehule <[email protected]>
2025-10-22 13:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-12-01 06:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-02 15:30 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-10 12:28 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-18 15:34 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-19 16:19 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-24 11:47 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-24 13:56 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-25 00:41 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-30 04:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-30 11:15 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-31 09:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Chao Li <[email protected]>
2026-03-31 09:17 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-31 10:57 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-31 12:15 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-31 15:54 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-04-01 08:51 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-01 09:51 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-01 11:56 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-04-01 12:18 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-02 07:41 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-02 07:59 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Chao Li <[email protected]>
2026-04-03 05:52 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-03 08:57 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Chao Li <[email protected]>
2026-04-03 09:39 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-06 09:45 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-07 01:45 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Chao Li <[email protected]>
@ 2026-04-07 02:12 ` Amit Langote <[email protected]>
2026-04-07 12:59 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Evan Montgomery-Recht <[email protected]>
0 siblings, 1 reply; 61+ messages in thread
From: Amit Langote @ 2026-04-07 02:12 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Junwang Zhao <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
On Tue, Apr 7, 2026 at 10:46 AM Chao Li <[email protected]> wrote:
> > On Apr 6, 2026, at 17:45, Amit Langote <[email protected]> wrote:
> > On Fri, Apr 3, 2026 at 6:39 PM Amit Langote <[email protected]> wrote:
> >> On Fri, Apr 3, 2026 at 5:58 PM Chao Li <[email protected]> wrote:
> >>> I spent several hours debugging this patch today, and I found a problem where the batch mode doesn't seem to handle deferred RI triggers, although the commit message suggests that it should.
> >>>
> >>> I traced this scenario:
> >>> ```
> >>> CREATE TABLE pk (a int primary key);
> >>> CREATE TABLE fk (a int references pk(a) DEFERRABLE INITIALLY DEFERRED);
> >>> BEGIN;
> >>> INSERT INTO fk VALUES (1);
> >>> INSERT INTO pk VALUES (1);
> >>> COMMIT;
> >>> ```
> >>>
> >>> When COMMIT is executed, it reaches RI_FKey_check(), where AfterTriggerIsActive() checks whether afterTriggers.query_depth >= 0. But in the deferred case, afterTriggers.query_depth is -1.
> >>>
> >>> From the code:
> >>> ```
> >>> if (ri_fastpath_is_applicable(riinfo))
> >>> {
> >>> if (AfterTriggerIsActive())
> >>> {
> >>> /* Batched path: buffer and probe in groups */
> >>> ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
> >>> }
> >>> else
> >>> {
> >>> /* ALTER TABLE validation: per-row, no cache */
> >>> ri_FastPathCheck(riinfo, fk_rel, newslot);
> >>> }
> >>> return PointerGetDatum(NULL);
> >>> }
> >>> ```
> >>>
> >>> So this ends up falling back to the per-row path for deferred RI checks at COMMIT, even though the intent here seems to be only to bypass the ALTER TABLE validation case, where batch callbacks would never fire, and MyTriggerDepth is 0. So, maybe we can just check MyTriggerDepth>0 in AfterTriggerIsActive().
> >>>
> >>> I tried the attached fix. With it, deferred triggers go through the batch mode, and all existing tests still pass.
> >>
> >> I think you might be right. Thanks for the patch. It looks correct
> >> to me at a glance, but I will need to check it a bit more closely
> >> before committing.
> >
> > Thinking about this some more, your fix is on the right track but
> > needs a bit more work -- MyTriggerDepth > 0 is too broad since it
> > fires for BEFORE triggers too. I have a revised version using a new
> > afterTriggerFiringDepth counter that I'll push shortly.
> >
> > Added an open item for tracking in the meantime:
> > https://wiki.postgresql.org/wiki/PostgreSQL_19_Open_Items#Open_Issues
>
> V2 looks good to me. Besides the normal cases, I also traced an abnormal case to verify that afterTriggerFiringDepth is always reset to 0:
> ```
> evantest=# begin;
> BEGIN
> evantest=*# INSERT INTO fk VALUES (2);
> INSERT 0 1
> evantest=*# commit;
> ERROR: insert or update on table "fk" violates foreign key constraint "fk_a_fkey"
> DETAIL: Key (a)=(2) is not present in table "pk".
> ```
Thanks for checking. Pushed.
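For readers following the quoted exchange, the three candidate checks can be compared in a small standalone C sketch. This is a toy model, not the PostgreSQL implementation: ToyTriggerState and the toy_ helpers are hypothetical names. The query_depth of -1 for deferred checks at COMMIT and the MyTriggerDepth of 0 for ALTER TABLE validation come from the thread; the other counter values are assumptions chosen to match the behavior described there.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the three counters discussed in the thread. */
typedef struct ToyTriggerState
{
    int query_depth;        /* -1 when no query-level trigger context is open */
    int my_trigger_depth;   /* > 0 while any trigger function is running */
    int after_firing_depth; /* > 0 only while AFTER triggers are being fired */
} ToyTriggerState;

/* Original check: misses deferred triggers fired at COMMIT. */
static bool
toy_active_by_query_depth(const ToyTriggerState *s)
{
    return s->query_depth >= 0;
}

/* First proposed fix: also true inside BEFORE triggers, so too broad. */
static bool
toy_active_by_trigger_depth(const ToyTriggerState *s)
{
    return s->my_trigger_depth > 0;
}

/* Revised fix: true only while AFTER triggers are firing. */
static bool
toy_active_by_firing_depth(const ToyTriggerState *s)
{
    return s->after_firing_depth > 0;
}

/* Deferred FK check at COMMIT: query_depth is -1 per the thread. */
static const ToyTriggerState toy_deferred_at_commit = {
    .query_depth = -1, .my_trigger_depth = 1, .after_firing_depth = 1
};

/* ALTER TABLE validation: MyTriggerDepth is 0 per the thread. */
static const ToyTriggerState toy_alter_table_validation = {
    .query_depth = -1, .my_trigger_depth = 0, .after_firing_depth = 0
};

/* BEFORE trigger during a query (query_depth value is an assumption). */
static const ToyTriggerState toy_before_trigger = {
    .query_depth = 0, .my_trigger_depth = 1, .after_firing_depth = 0
};
```

In this model only the firing-depth check selects the batched path for deferred checks while rejecting both the ALTER TABLE validation case and the BEFORE trigger case.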
--
Thanks, Amit Langote
* Re: Eliminating SPI / SQL from some RI triggers - take 3
@ 2026-04-07 12:59 ` Evan Montgomery-Recht <[email protected]>
2026-04-08 01:23 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
0 siblings, 1 reply; 61+ messages in thread
From: Evan Montgomery-Recht @ 2026-04-07 12:59 UTC (permalink / raw)
To: pgsql-hackers
Hi Amit,
This is my first time contributing to this project, so let me know if
I missed something or need to adjust what I put together.
I found a crash in the RI fast-path FK check code introduced by
2da86c1ef9b and extended by b7b27eb41a5. C-language extensions that
use SPI to INSERT into tables with multiple FK constraints hit an
assertion failure (or, without assertions, a server crash) when the
batch callback fires. I discovered this via PostGIS's topology CI --
toTopoGeom() uses SPI to insert into edge_data, which has 4 immediate
FK constraints referencing node and face. PG 18 passes the same test;
master crashes.
This appears to be a separate issue from Chao Li's deferred-trigger
batching bug; the patches touch different files and don't conflict. I
ran the regression tests for both PostgreSQL and PostGIS with the two
fixes merged to validate that they work together.
The problem: ri_FastPathGetEntry() opens pk_rel/idx_rel and creates
TupleTableSlots, registering them with the current resource owner --
the SPI portal's. The batch callback ri_FastPathTeardown() only fires
when query_depth == 0 (via FireAfterTriggerBatchCallbacks), but by
that time the inner portal has finished and its resource owner has
released the relation references and TupleDesc pins. The teardown then
tries to close relations whose refcounts are already zero:
TRAP: failed Assert("rel->rd_refcnt > 0"), File: "relcache.c"
RelationDecrementReferenceCount
-> RelationClose -> index_close -> ri_FastPathTeardown
-> ri_FastPathEndBatch -> FireAfterTriggerBatchCallbacks
-> AfterTriggerEndQuery -> standard_ExecutorFinish
and TupleDesc pins that are no longer tracked by any resource owner
cause "tupdesc reference not owned by resource owner" errors.
Note that simple PL/pgSQL functions don't trigger this because
PL/pgSQL's SPI connection spans the entire function call, so the
portal's resource owner outlives the batch callback. The crash
requires nested SPI from a C extension, which creates a shorter-lived
portal.
The attached patch (against master, for application) fixes this by
transferring both relation references and TupleDesc pins from the
current resource owner to CurTransactionResourceOwner immediately
after creating them in ri_FastPathGetEntry(). The transfer uses
RelationIncrementReferenceCount / PinTupleDesc under the target owner
followed by RelationDecrementReferenceCount / ReleaseTupleDesc under
the original. I chose this over switching CurrentResourceOwner around
the table_open/index_open calls because the latter also affects
transient buffer pins acquired during catalog lookups inside those
functions.
ri_FastPathTeardown is updated to clear any buffered tuples (whose
buffer pins belong to the current resource owner) before switching to
CurTransactionResourceOwner for the close/drop operations.
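As a rough illustration of the ownership problem and the transfer described above, here is a self-contained toy model in plain C. It is not PostgreSQL code: ToyRel, ToyOwner, toy_transfer() and the other toy_ names are hypothetical stand-ins for the relcache reference count, ResourceOwner, and the increment-then-decrement transfer. The only assumption it makes is that releasing an owner drops every reference that owner still tracks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a relation with a reference count. */
typedef struct ToyRel
{
    int refcnt;
} ToyRel;

#define TOY_MAX_REFS 8

/* Toy stand-in for a resource owner that tracks relation references. */
typedef struct ToyOwner
{
    ToyRel *refs[TOY_MAX_REFS];
    int     nrefs;
} ToyOwner;

static ToyOwner *toy_current_owner;

/* Take a reference and remember it under the current owner. */
static void
toy_rel_incr(ToyRel *rel)
{
    assert(toy_current_owner->nrefs < TOY_MAX_REFS);
    rel->refcnt++;
    toy_current_owner->refs[toy_current_owner->nrefs++] = rel;
}

/* Drop a reference; returns false if the current owner does not track it. */
static bool
toy_rel_decr(ToyRel *rel)
{
    ToyOwner *o = toy_current_owner;

    for (int i = o->nrefs - 1; i >= 0; i--)
    {
        if (o->refs[i] == rel)
        {
            o->refs[i] = o->refs[--o->nrefs];
            rel->refcnt--;
            return true;
        }
    }
    return false;
}

/* Releasing an owner drops every reference it still tracks. */
static void
toy_owner_release(ToyOwner *o)
{
    while (o->nrefs > 0)
        o->refs[--o->nrefs]->refcnt--;
}

/* Move one reference from the current owner to a longer-lived target. */
static void
toy_transfer(ToyRel *rel, ToyOwner *target)
{
    ToyOwner *saved = toy_current_owner;

    if (saved == target)
        return;
    toy_current_owner = target;   /* add the ref under the target owner */
    toy_rel_incr(rel);
    toy_current_owner = saved;    /* remove it from the original owner */
    (void) toy_rel_decr(rel);
}

/* Returns the refcount a late teardown would see after the inner owner dies. */
static int
toy_refcnt_at_teardown(bool transfer)
{
    ToyOwner xact = {0}, portal = {0};
    ToyRel   rel = {0};

    toy_current_owner = &portal;
    toy_rel_incr(&rel);            /* opened under the inner (portal) owner */
    if (transfer)
        toy_transfer(&rel, &xact);
    toy_owner_release(&portal);    /* inner portal finishes first */
    toy_current_owner = &xact;
    return rel.refcnt;
}
```

Without the transfer, the count is already 0 when teardown runs, which corresponds to the state the rd_refcnt assertion trips on; with the transfer, one reference survives under the outer owner and can be closed there.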
The patch also adds a test module (test_spi_func) with a C function
that executes SQL via SPI_connect/SPI_execute/SPI_finish, since this
crash cannot be triggered from PL/pgSQL. The test exercises the
C-level SPI INSERT with multiple FK constraints, FK violations, and
nested PL/pgSQL-calls-C-SPI (matching the PostGIS call pattern).
This is purely a correctness fix with no performance or backward
compatibility impact. No documentation changes are needed since this
is an internal bug fix.
The patch compiles cleanly and passes pgindent, clang-tidy, and
cppcheck. All 247 regression subtests pass, along with the full meson
test suite (370 ok, 0 fail, 21 skipped; the skips are due to hardware
availability on my side this week). I also verified that the PostGIS
topology test (toTopoGeom) passes cleanly with no warnings, and tested
abort paths (FK violation, transaction rollback, subtransaction abort
via EXCEPTION blocks). Those checks are outside the scope of
PostgreSQL itself and were mainly for my own verification that things
work. Code coverage on the new lines is 100%. Tested on macOS
(aarch64) and Linux (aarch64, via Docker).
Unrelated to my patch, SonarCloud flagged a potential issue in
recheck_matched_pk_tuple() (line 3370): the function loops over
ii_NumIndexKeyAttrs elements of the skeys array, but the caller in
ri_FastPathFlushArray passes recheck_skey[1] -- an array of exactly
one element. This is safe because ri_FastPathFlushArray is the
single-column FK path, so ii_NumIndexKeyAttrs is always 1 there.
However, the function signature doesn't communicate this constraint,
which is why the tool flags it as CWE-125 (out-of-bounds read) / CERT C
ARR30-C. Adding an nkeys parameter (like ri_FastPathProbeOne already
has) would make the contract explicit.
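A minimal sketch of what an explicit key count buys, written as standalone C rather than against the real ri_triggers.c code. ToyKey, toy_recheck_keys(), and toy_recheck_single() are hypothetical names, and the integer comparison stands in for whatever the real recheck does per key.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scan key: which attribute to compare and the expected value. */
typedef struct ToyKey
{
    int attno;
    int value;
} ToyKey;

/*
 * Recheck a tuple against nkeys scan keys.  The caller states how many
 * entries skeys holds, so the loop bound no longer depends on a count
 * stored somewhere else.
 */
static bool
toy_recheck_keys(const ToyKey *skeys, int nkeys, const int *tuple_values)
{
    assert(nkeys > 0);
    for (int i = 0; i < nkeys; i++)
    {
        if (tuple_values[skeys[i].attno] != skeys[i].value)
            return false;
    }
    return true;
}

/* Single-column caller: a one-element array passed with nkeys = 1. */
static bool
toy_recheck_single(int attno, int expected, const int *tuple_values)
{
    ToyKey recheck_skey[1] = {{attno, expected}};

    return toy_recheck_keys(recheck_skey, 1, tuple_values);
}
```

With the count in the signature, both a reader and a static analyzer can see at the call site that a one-element array is paired with a bound of 1.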
Not directly related to PostgreSQL: I currently have a workaround in
the PostGIS changes I'm making to test some performance enhancements.
I left comments in that code so that, if this fix is accepted, I can
revert to a cleaner approach; based on my testing so far, the problem
only affects PG 19.
If there's a cleaner approach or a larger underlying issue, I'm
definitely willing to keep testing to find a better solution.
--
thanks, Evan Montgomery-Recht
Attachments:
[application/octet-stream] v1-0001-Fix-RI-fast-path-crash-when-FK-triggers-fire-unde.patch (16.2K, 2-v1-0001-Fix-RI-fast-path-crash-when-FK-triggers-fire-unde.patch)
download | inline diff:
From 396ffef4af4dea6268fc74b4b5a5f48b4b04c892 Mon Sep 17 00:00:00 2001
From: Evan Montgomery-Recht <[email protected]>
Date: Sun, 5 Apr 2026 22:23:26 -0400
Subject: [PATCH v1] Fix RI fast-path crash when FK triggers fire under nested
SPI
The fast-path FK check code (ri_FastPathGetEntry) opens PK relations,
creates TupleTableSlots, and caches them in RI_FastPathEntry for the
duration of a trigger batch. The batch callback (ri_FastPathTeardown)
that closes them fires only at query_depth == 0 via
FireAfterTriggerBatchCallbacks().
When FK triggers fire inside a nested SPI context -- for example, a
C-language function such as PostGIS's topogeo_addLineString() that
uses SPI_connect/SPI_execute to INSERT into a table with multiple FK
constraints -- the relations and TupleDesc pins are registered with
the SPI portal's resource owner. When that portal finishes and its
resource owner is released, the references are decremented. Later,
when the batch callback fires at query_depth == 0, ri_FastPathTeardown
attempts to close relations whose reference counts are already zero,
triggering:
TRAP: failed Assert("rel->rd_refcnt > 0")
in RelationDecrementReferenceCount, called from index_close
and TupleDesc pins that are no longer tracked by any resource owner
cause "tupdesc reference not owned by resource owner" errors.
Fix by transferring both relation references and TupleDesc pins from
the current (inner) resource owner to CurTransactionResourceOwner
immediately after creating them. ri_FastPathTeardown is updated to
clear any buffered tuples (whose buffer pins belong to the current
resource owner) before switching to CurTransactionResourceOwner for
the close/drop operations.
The transfer uses RelationIncrementReferenceCount / PinTupleDesc under
the target owner followed by RelationDecrementReferenceCount /
ReleaseTupleDesc under the original, rather than switching
CurrentResourceOwner around the table_open/index_open calls, because
the latter would also affect transient buffer pins acquired during
catalog lookups inside those functions.
Add a test module (test_spi_func) with a C function that executes
SQL via SPI, reproducing the crash scenario that simple PL/pgSQL
cannot trigger.
Bug found via PostGIS topology CI: toTopoGeom() -> SPI INSERT into
edge_data (4 immediate FK constraints) crashes on PG 19devel but
passes on PG 18.
Discussion: https://postgr.es/m/CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com
---
src/backend/utils/adt/ri_triggers.c | 59 ++++++++++++++
src/test/modules/meson.build | 1 +
.../test_spi_func/expected/ri_fastpath.out | 79 +++++++++++++++++++
src/test/modules/test_spi_func/meson.build | 31 ++++++++
.../modules/test_spi_func/sql/ri_fastpath.sql | 65 +++++++++++++++
.../test_spi_func/test_spi_func--1.0.sql | 9 +++
.../modules/test_spi_func/test_spi_func.c | 51 ++++++++++++
.../test_spi_func/test_spi_func.control | 4 +
8 files changed, 299 insertions(+)
create mode 100644 src/test/modules/test_spi_func/expected/ri_fastpath.out
create mode 100644 src/test/modules/test_spi_func/meson.build
create mode 100644 src/test/modules/test_spi_func/sql/ri_fastpath.sql
create mode 100644 src/test/modules/test_spi_func/test_spi_func--1.0.sql
create mode 100644 src/test/modules/test_spi_func/test_spi_func.c
create mode 100644 src/test/modules/test_spi_func/test_spi_func.control
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 84f9fecdb4c..94346892151 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -52,6 +52,7 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+#include "utils/resowner.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
#include "utils/snapmgr.h"
@@ -4148,6 +4149,25 @@ ri_FastPathTeardown(void)
hash_seq_init(&status, ri_fastpath_cache);
while ((entry = hash_seq_search(&status)) != NULL)
{
+ ResourceOwner oldowner;
+
+ /*
+ * First, clear any buffered tuple from the slots. This must happen
+ * under the current resource owner because buffer pins from the last
+ * index scan belong to it.
+ */
+ if (entry->pk_slot)
+ ExecClearTuple(entry->pk_slot);
+ if (entry->fk_slot)
+ ExecClearTuple(entry->fk_slot);
+
+ /*
+ * Now switch to CurTransactionResourceOwner for closing relations and
+ * dropping slots, since that's where their refs were transferred in
+ * ri_FastPathGetEntry().
+ */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = CurTransactionResourceOwner;
if (entry->idx_rel)
index_close(entry->idx_rel, NoLock);
if (entry->pk_rel)
@@ -4156,6 +4176,7 @@ ri_FastPathTeardown(void)
ExecDropSingleTupleTableSlot(entry->pk_slot);
if (entry->fk_slot)
ExecDropSingleTupleTableSlot(entry->fk_slot);
+ CurrentResourceOwner = oldowner;
if (entry->flush_cxt)
MemoryContextDelete(entry->flush_cxt);
}
@@ -4271,6 +4292,44 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
entry->fk_slot = MakeSingleTupleTableSlot(RelationGetDescr(fk_rel),
&TTSOpsHeapTuple);
+ /*
+ * Transfer relation and TupleDesc references from the current
+ * resource owner to CurTransactionResourceOwner so they survive
+ * cleanup of inner resource owners (e.g., SPI portals from C-language
+ * functions). The batch callback that closes them
+ * (ri_FastPathTeardown) fires at query_depth == 0, which may be long
+ * after the resource owner that was current when the trigger fired
+ * has been released.
+ *
+ * We open relations and create slots under the current resource owner
+ * (to avoid affecting transient buffer pins from catalog lookups),
+ * then transfer the relation refs and TupleDesc pins by incrementing
+ * under the target owner and decrementing under the original.
+ *
+ * Relation TupleDescs (rd_att) are reference-counted (tdrefcount >=
+ * 1), so PinTupleDesc inside table_slot_create /
+ * MakeSingleTupleTableSlot registers them with the resource owner.
+ * These must also be transferred.
+ */
+ if (CurrentResourceOwner != CurTransactionResourceOwner)
+ {
+ ResourceOwner saved = CurrentResourceOwner;
+
+ /* Add refs under CurTransactionResourceOwner */
+ CurrentResourceOwner = CurTransactionResourceOwner;
+ RelationIncrementReferenceCount(entry->pk_rel);
+ RelationIncrementReferenceCount(entry->idx_rel);
+ PinTupleDesc(entry->pk_slot->tts_tupleDescriptor);
+ PinTupleDesc(entry->fk_slot->tts_tupleDescriptor);
+
+ /* Remove refs from the original resource owner */
+ CurrentResourceOwner = saved;
+ RelationDecrementReferenceCount(entry->pk_rel);
+ RelationDecrementReferenceCount(entry->idx_rel);
+ ReleaseTupleDesc(entry->pk_slot->tts_tupleDescriptor);
+ ReleaseTupleDesc(entry->fk_slot->tts_tupleDescriptor);
+ }
+
entry->flush_cxt = AllocSetContextCreate(TopTransactionContext,
"RI fast path flush temporary context",
ALLOCSET_SMALL_SIZES);
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 4bca42bb370..3d5d016c46f 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -53,6 +53,7 @@ subdir('test_saslprep')
subdir('test_shmem')
subdir('test_shm_mq')
subdir('test_slru')
+subdir('test_spi_func')
subdir('test_tidstore')
subdir('typcache')
subdir('unsafe_tests')
diff --git a/src/test/modules/test_spi_func/expected/ri_fastpath.out b/src/test/modules/test_spi_func/expected/ri_fastpath.out
new file mode 100644
index 00000000000..5b2bf9e9310
--- /dev/null
+++ b/src/test/modules/test_spi_func/expected/ri_fastpath.out
@@ -0,0 +1,79 @@
+--
+-- Test RI fast-path FK check under C-level SPI.
+--
+-- The RI fast-path caches relation references in ri_FastPathGetEntry()
+-- under the current resource owner. When FK triggers fire inside a
+-- C-level SPI context (SPI_connect/SPI_execute/SPI_finish), the inner
+-- resource owner is released before the batch callback that closes
+-- those relations fires at query_depth == 0. Without the fix, this
+-- crashes with Assert(rel->rd_refcnt > 0) in index_close.
+--
+-- Simple PL/pgSQL does NOT trigger this because its SPI connection
+-- outlives the batch callback. A C function using SPI is required.
+--
+CREATE EXTENSION test_spi_func;
+CREATE TABLE ri_fp_pk1 (id serial PRIMARY KEY);
+CREATE TABLE ri_fp_pk2 (id serial PRIMARY KEY);
+CREATE TABLE ri_fp_pk3 (id serial PRIMARY KEY);
+INSERT INTO ri_fp_pk1 VALUES (1);
+INSERT INTO ri_fp_pk2 VALUES (1);
+INSERT INTO ri_fp_pk3 VALUES (1);
+CREATE TABLE ri_fp_fk (
+ id serial PRIMARY KEY,
+ a int REFERENCES ri_fp_pk1(id),
+ b int REFERENCES ri_fp_pk2(id),
+ c int REFERENCES ri_fp_pk3(id),
+ d int REFERENCES ri_fp_pk1(id),
+ e int REFERENCES ri_fp_pk2(id),
+ f int REFERENCES ri_fp_pk3(id)
+);
+-- C-level SPI INSERT: the critical test case.
+-- Without the fix this crashes the server.
+SELECT spi_exec(
+ 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (1, 1, 1, 1, 1, 1)');
+ spi_exec
+----------
+
+(1 row)
+
+-- Additional C-level SPI INSERTs to exercise batch reuse across calls.
+-- Use different column orderings to ensure each is a distinct statement.
+SELECT spi_exec(
+ 'INSERT INTO ri_fp_fk (f, e, d, c, b, a) VALUES (1, 1, 1, 1, 1, 1)');
+ spi_exec
+----------
+
+(1 row)
+
+SELECT spi_exec(
+ 'INSERT INTO ri_fp_fk (a, c, e, b, d, f) VALUES (1, 1, 1, 1, 1, 1)');
+ spi_exec
+----------
+
+(1 row)
+
+-- C-level SPI with FK violation: should error, not crash
+SELECT spi_exec(
+ 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (999, 1, 1, 1, 1, 1)');
+ERROR: insert or update on table "ri_fp_fk" violates foreign key constraint "ri_fp_fk_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "ri_fp_pk1".
+CONTEXT: SQL statement "INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (999, 1, 1, 1, 1, 1)"
+-- Nested: PL/pgSQL calling C SPI (mimics PostGIS toTopoGeom pattern)
+CREATE FUNCTION plpgsql_calls_c_spi() RETURNS void AS $$
+DECLARE
+ ins_stmt text := 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (1, 1, 1, 1, 1, 1)';
+BEGIN
+ PERFORM spi_exec(ins_stmt);
+END;
+$$ LANGUAGE plpgsql;
+SELECT plpgsql_calls_c_spi();
+ plpgsql_calls_c_spi
+---------------------
+
+(1 row)
+
+-- Cleanup
+DROP FUNCTION plpgsql_calls_c_spi();
+DROP TABLE ri_fp_fk;
+DROP TABLE ri_fp_pk3, ri_fp_pk2, ri_fp_pk1;
+DROP EXTENSION test_spi_func;
diff --git a/src/test/modules/test_spi_func/meson.build b/src/test/modules/test_spi_func/meson.build
new file mode 100644
index 00000000000..939edc898a4
--- /dev/null
+++ b/src/test/modules/test_spi_func/meson.build
@@ -0,0 +1,31 @@
+test_spi_func_sources = files(
+ 'test_spi_func.c',
+)
+
+if host_system == 'windows'
+ test_spi_func_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+ '--NAME', 'test_spi_func',
+ '--FILEDESC', 'test_spi_func - SQL-callable C SPI function',])
+endif
+
+test_spi_func = shared_module('test_spi_func',
+ test_spi_func_sources,
+ kwargs: pg_test_mod_args,
+)
+test_install_libs += test_spi_func
+
+test_install_data += files(
+ 'test_spi_func.control',
+ 'test_spi_func--1.0.sql',
+)
+
+tests += {
+ 'name': 'test_spi_func',
+ 'sd': meson.current_source_dir(),
+ 'bd': meson.current_build_dir(),
+ 'regress': {
+ 'sql': [
+ 'ri_fastpath',
+ ],
+ },
+}
diff --git a/src/test/modules/test_spi_func/sql/ri_fastpath.sql b/src/test/modules/test_spi_func/sql/ri_fastpath.sql
new file mode 100644
index 00000000000..002f4ad5e52
--- /dev/null
+++ b/src/test/modules/test_spi_func/sql/ri_fastpath.sql
@@ -0,0 +1,65 @@
+--
+-- Test RI fast-path FK check under C-level SPI.
+--
+-- The RI fast-path caches relation references in ri_FastPathGetEntry()
+-- under the current resource owner. When FK triggers fire inside a
+-- C-level SPI context (SPI_connect/SPI_execute/SPI_finish), the inner
+-- resource owner is released before the batch callback that closes
+-- those relations fires at query_depth == 0. Without the fix, this
+-- crashes with Assert(rel->rd_refcnt > 0) in index_close.
+--
+-- Simple PL/pgSQL does NOT trigger this because its SPI connection
+-- outlives the batch callback. A C function using SPI is required.
+--
+
+CREATE EXTENSION test_spi_func;
+
+CREATE TABLE ri_fp_pk1 (id serial PRIMARY KEY);
+CREATE TABLE ri_fp_pk2 (id serial PRIMARY KEY);
+CREATE TABLE ri_fp_pk3 (id serial PRIMARY KEY);
+INSERT INTO ri_fp_pk1 VALUES (1);
+INSERT INTO ri_fp_pk2 VALUES (1);
+INSERT INTO ri_fp_pk3 VALUES (1);
+
+CREATE TABLE ri_fp_fk (
+ id serial PRIMARY KEY,
+ a int REFERENCES ri_fp_pk1(id),
+ b int REFERENCES ri_fp_pk2(id),
+ c int REFERENCES ri_fp_pk3(id),
+ d int REFERENCES ri_fp_pk1(id),
+ e int REFERENCES ri_fp_pk2(id),
+ f int REFERENCES ri_fp_pk3(id)
+);
+
+-- C-level SPI INSERT: the critical test case.
+-- Without the fix this crashes the server.
+SELECT spi_exec(
+ 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (1, 1, 1, 1, 1, 1)');
+
+-- Additional C-level SPI INSERTs to exercise batch reuse across calls.
+-- Use different column orderings to ensure each is a distinct statement.
+SELECT spi_exec(
+ 'INSERT INTO ri_fp_fk (f, e, d, c, b, a) VALUES (1, 1, 1, 1, 1, 1)');
+SELECT spi_exec(
+ 'INSERT INTO ri_fp_fk (a, c, e, b, d, f) VALUES (1, 1, 1, 1, 1, 1)');
+
+-- C-level SPI with FK violation: should error, not crash
+SELECT spi_exec(
+ 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (999, 1, 1, 1, 1, 1)');
+
+-- Nested: PL/pgSQL calling C SPI (mimics PostGIS toTopoGeom pattern)
+CREATE FUNCTION plpgsql_calls_c_spi() RETURNS void AS $$
+DECLARE
+ ins_stmt text := 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (1, 1, 1, 1, 1, 1)';
+BEGIN
+ PERFORM spi_exec(ins_stmt);
+END;
+$$ LANGUAGE plpgsql;
+
+SELECT plpgsql_calls_c_spi();
+
+-- Cleanup
+DROP FUNCTION plpgsql_calls_c_spi();
+DROP TABLE ri_fp_fk;
+DROP TABLE ri_fp_pk3, ri_fp_pk2, ri_fp_pk1;
+DROP EXTENSION test_spi_func;
diff --git a/src/test/modules/test_spi_func/test_spi_func--1.0.sql b/src/test/modules/test_spi_func/test_spi_func--1.0.sql
new file mode 100644
index 00000000000..d5d67974d5b
--- /dev/null
+++ b/src/test/modules/test_spi_func/test_spi_func--1.0.sql
@@ -0,0 +1,9 @@
+/* src/test/modules/test_spi_func/test_spi_func--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_spi_func" to load this file. \quit
+
+CREATE FUNCTION spi_exec(query text)
+RETURNS void
+AS 'MODULE_PATHNAME', 'spi_exec'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_spi_func/test_spi_func.c b/src/test/modules/test_spi_func/test_spi_func.c
new file mode 100644
index 00000000000..51f4b9c4f73
--- /dev/null
+++ b/src/test/modules/test_spi_func/test_spi_func.c
@@ -0,0 +1,51 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_spi_func.c
+ * SQL-callable C function that uses SPI to execute a query.
+ *
+ * Useful for testing code paths that only trigger under C-level
+ * SPI (not PL/pgSQL), such as resource owner interactions with
+ * RI fast-path FK checks.
+ *
+ * Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/test_spi_func/test_spi_func.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "executor/spi.h"
+#include "utils/builtins.h"
+
+PG_MODULE_MAGIC;
+
+PG_FUNCTION_INFO_V1(spi_exec);
+
+/*
+ * spi_exec(query text) - execute a SQL query via SPI.
+ *
+ * Opens a fresh SPI connection, executes the query, and closes the
+ * connection. This mimics the SPI usage pattern of C-language
+ * extensions (e.g., PostGIS topology functions) where each call
+ * to SPI_connect / SPI_execute / SPI_finish creates and destroys
+ * a short-lived SPI context.
+ */
+Datum
+spi_exec(PG_FUNCTION_ARGS)
+{
+ const char *query = text_to_cstring(PG_GETARG_TEXT_PP(0));
+ int ret;
+
+ SPI_connect();
+
+ ret = SPI_execute(query, false, 0);
+
+ if (ret < 0)
+ elog(ERROR, "SPI_execute failed: error code %d", ret);
+
+ SPI_finish();
+
+ PG_RETURN_VOID();
+}
diff --git a/src/test/modules/test_spi_func/test_spi_func.control b/src/test/modules/test_spi_func/test_spi_func.control
new file mode 100644
index 00000000000..87bd9dc9782
--- /dev/null
+++ b/src/test/modules/test_spi_func/test_spi_func.control
@@ -0,0 +1,4 @@
+comment = 'Test SQL-callable C function that uses SPI'
+default_version = '1.0'
+module_pathname = '$libdir/test_spi_func'
+relocatable = true
--
2.50.1 (Apple Git-155)
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Amit Langote @ 2026-04-08 01:23 UTC (permalink / raw)
To: Evan Montgomery-Recht <[email protected]>; +Cc: pgsql-hackers
Hi Evan,
On Tue, Apr 7, 2026 at 10:00 PM Evan Montgomery-Recht
<[email protected]> wrote:
>
> Hi Amit,
>
> First time contributing to this project, let me know if I missed
> something or need to adjust what I put together.
>
> I found a crash in the RI fast-path FK check code introduced by
> 2da86c1ef9b and extended by b7b27eb41a5. C-language extensions that
> use SPI to INSERT into tables with multiple FK constraints hit an
> assertion failure (or, without assertions, a server crash) when the
> batch callback fires. I discovered this via PostGIS's topology CI --
> toTopoGeom() uses SPI to insert into edge_data, which has 4 immediate
> FK constraints referencing node and face. PG 18 passes the same test;
> master crashes.
>
> This appears to be a separate issue from Chao Li's deferred-trigger
> batching bug; the patches touch different files and don't conflict. I
> also ran the regression tests with both patches merged, against both
> PostgreSQL and PostGIS, to validate that they work together.
>
> The problem: ri_FastPathGetEntry() opens pk_rel/idx_rel and creates
> TupleTableSlots, registering them with the current resource owner --
> the SPI portal's. The batch callback ri_FastPathTeardown() only fires
> when query_depth == 0 (via FireAfterTriggerBatchCallbacks), but by
> that time the inner portal has finished and its resource owner has
> released the relation references and TupleDesc pins. The teardown then
> tries to close relations whose refcounts are already zero:
>
> TRAP: failed Assert("rel->rd_refcnt > 0"), File: "relcache.c"
>
> RelationDecrementReferenceCount
> -> RelationClose -> index_close -> ri_FastPathTeardown
> -> ri_FastPathEndBatch -> FireAfterTriggerBatchCallbacks
> -> AfterTriggerEndQuery -> standard_ExecutorFinish
>
> and TupleDesc pins that are no longer tracked by any resource owner
> cause "tupdesc reference not owned by resource owner" errors.
>
> Note that simple PL/pgSQL functions don't trigger this because
> PL/pgSQL's SPI connection spans the entire function call, so the
> portal's resource owner outlives the batch callback. The crash
> requires nested SPI from a C extension, which creates a shorter-lived
> portal.
>
> The attached patch (against master, for application) fixes this by
> transferring both relation references and TupleDesc pins from the
> current resource owner to CurTransactionResourceOwner immediately
> after creating them in ri_FastPathGetEntry(). The transfer uses
> RelationIncrementReferenceCount / PinTupleDesc under the target owner
> followed by RelationDecrementReferenceCount / ReleaseTupleDesc under
> the original. I chose this over switching CurrentResourceOwner around
> the table_open/index_open calls because the latter also affects
> transient buffer pins acquired during catalog lookups inside those
> functions.
>
> ri_FastPathTeardown is updated to clear any buffered tuples (whose
> buffer pins belong to the current resource owner) before switching to
> CurTransactionResourceOwner for the close/drop operations.
>
> The patch also adds a test module (test_spi_func) with a C function
> that executes SQL via SPI_connect/SPI_execute/SPI_finish, since this
> crash cannot be triggered from PL/pgSQL. The test exercises the
> C-level SPI INSERT with multiple FK constraints, FK violations, and
> nested PL/pgSQL-calls-C-SPI (matching the PostGIS call pattern).
>
> This is purely a correctness fix with no performance or backward
> compatibility impact. No documentation changes are needed since this
> is an internal bug fix.
>
> The patch compiles cleanly and passes pgindent, clang-tidy, and
> cppcheck. All 247 regression subtests pass, along with the full meson
> test suite (370 ok, 0 fail, 21 skipped; the skips are due to hardware
> availability on my side this week). I also verified that the PostGIS
> topology test (toTopoGeom) passes cleanly with no warnings, and tested
> abort paths (FK violation, transaction rollback, subtransaction abort
> via EXCEPTION blocks); these are outside the scope of the PostgreSQL
> tests and were for my own verification. Code coverage on the new
> lines is 100%. Tested on macOS (aarch64) and Linux (aarch64, via
> Docker).
Thanks for the report and the patch.
I'll need to study this one a bit more closely. Added an open item
for the time being:
https://wiki.postgresql.org/wiki/PostgreSQL_19_Open_Items
> Unrelated to my patch, SonarCloud flagged a potential issue in
> recheck_matched_pk_tuple() (line 3370): the function loops over
> ii_NumIndexKeyAttrs elements of the skeys array, but the caller in
> ri_FastPathFlushArray passes recheck_skey[1] -- an array of exactly
> one element. This is safe because ri_FastPathFlushArray is the
> single-column FK path, so ii_NumIndexKeyAttrs is always 1 there.
> However, the function signature doesn't communicate this constraint,
> which is flagged as CWE-125 (out-of-bounds read) / CERT C ARR30-C.
> Adding an nkeys parameter (like ri_FastPathProbeOne already has) would
> make the contract explicit.
Makes sense. Will push the attached patch for this.
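For readers following along, the contract being made explicit can be
illustrated with a rough standalone sketch in plain C. This is not the
patch itself: ToyKey, toy_recheck and the index_nkeyatts argument are
hypothetical stand-ins (the last one plays the role of
ii_NumIndexKeyAttrs). The point is only that the caller states how many
keys it passes and the callee checks that count before reading the
array:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a scan key: one expected value per index column. */
typedef struct ToyKey
{
	int			expected;
} ToyKey;

/*
 * Return true if each of the nkeys index values matches its key.
 *
 * nkeys is the length of the keys array as known to the caller;
 * index_nkeyatts is the number of key columns the index reports.
 * Asserting that they agree documents the contract and lets the loop
 * be bounded by the array length the caller actually supplied.
 */
bool
toy_recheck(const ToyKey *keys, int nkeys,
			const int *index_values, int index_nkeyatts)
{
	assert(nkeys == index_nkeyatts);

	for (int i = 0; i < nkeys; i++)
	{
		if (keys[i].expected != index_values[i])
			return false;
	}
	return true;
}
```

A single-column caller would pass a one-element array together with
nkeys = 1, which mirrors the ri_FastPathFlushArray call site described
above.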
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v1-0001-Add-nkeys-parameter-to-recheck_matched_pk_tuple.patch (3.0K, 2-v1-0001-Add-nkeys-parameter-to-recheck_matched_pk_tuple.patch)
From 344ba694451fbaa57bbb2e12930fe20fd425e806 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 8 Apr 2026 10:20:15 +0900
Subject: [PATCH v1] Add nkeys parameter to recheck_matched_pk_tuple()
The function looped over ii_NumIndexKeyAttrs elements of the skeys
array, but one caller (ri_FastPathFlushArray) passes a one-element
array since it only handles single-column FKs. The function
signature did not communicate this constraint, which static analysis
flags as a potential out-of-bounds read.
Add an nkeys parameter and assert that it matches
ii_NumIndexKeyAttrs, then use it in the loop. The call sites
already know the key count.
Reported-by: Evan Montgomery-Recht <[email protected]>
Discussion: https://postgr.es/m/CAEg7pwcKf01FmDqFAf-Hzu_pYnMYScY_Otid-pe9uw3BJ6gq9g@mail.gmail.com
---
src/backend/utils/adt/ri_triggers.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 84f9fecdb4c..18ec858357d 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -329,7 +329,7 @@ static bool ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
static bool ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo);
static void ri_CheckPermissions(Relation query_rel);
static bool recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
- TupleTableSlot *new_slot);
+ int nkeys, TupleTableSlot *new_slot);
static void build_index_scankeys(const RI_ConstraintInfo *riinfo,
Relation idx_rel, Datum *pk_vals,
char *pk_nulls, ScanKey skeys);
@@ -3138,7 +3138,7 @@ ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
idx_rel->rd_indcollation[0],
fpmeta->regops[0],
found_val);
- if (!recheck_matched_pk_tuple(idx_rel, recheck_skey, pk_slot))
+ if (!recheck_matched_pk_tuple(idx_rel, recheck_skey, 1, pk_slot))
continue;
}
@@ -3193,7 +3193,7 @@ ri_FastPathProbeOne(Relation pk_rel, Relation idx_rel,
&concurrently_updated))
{
if (concurrently_updated)
- found = recheck_matched_pk_tuple(idx_rel, skey, slot);
+ found = recheck_matched_pk_tuple(idx_rel, skey, nkeys, slot);
else
found = true;
}
@@ -3340,7 +3340,7 @@ ri_CheckPermissions(Relation query_rel)
* not found.
*/
static bool
-recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys, int nkeys,
TupleTableSlot *new_slot)
{
/*
@@ -3359,8 +3359,9 @@ recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
indexInfo->ii_ExclusionOps == NULL);
/* Form the index values and isnull flags given the table tuple. */
+ Assert(nkeys == indexInfo->ii_NumIndexKeyAttrs);
FormIndexDatum(indexInfo, new_slot, NULL, values, isnull);
- for (int i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
+ for (int i = 0; i < nkeys; i++)
{
ScanKeyData *skey = &skeys[i];
--
2.47.3
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Amit Langote @ 2026-04-08 09:58 UTC (permalink / raw)
To: Evan Montgomery-Recht <[email protected]>; +Cc: pgsql-hackers
On Wed, Apr 8, 2026 at 10:23 AM Amit Langote <[email protected]> wrote:
> On Tue, Apr 7, 2026 at 10:00 PM Evan Montgomery-Recht
> <[email protected]> wrote:
> > The patch also adds a test module (test_spi_func) with a C function
> > that executes SQL via SPI_connect/SPI_execute/SPI_finish, since this
> > crash cannot be triggered from PL/pgSQL. The test exercises the
> > C-level SPI INSERT with multiple FK constraints, FK violations, and
> > nested PL/pgSQL-calls-C-SPI (matching the PostGIS call pattern).
I applied only the test module changes and it passes (without
crashing) even without your proposed fix. It seems that's because the
C function in test_spi_func calling SPI is using the same resource
owner as the parent SELECT. I think you'd need to create a resource
owner manually in the spi_exec() C function to reproduce the crash, as
done in the attached 0001, which contains the src/test changes
extracted from your patch modified as described, including renaming
the C function to spi_exec_sql().

Also, the test cases that call spi_exec() (now spi_exec_sql()) directly
from a SELECT don't actually exercise the crash path, because there is
no outer trigger-firing loop active. query_depth is 0 inside the inner
SPI's AfterTriggerEndQuery, so the old guard wouldn't suppress the
callback there anyway. The critical case requires spi_exec_sql() to be
called from inside an AFTER trigger, where query_depth > 0 causes the
guard to defer the callback past the inner resource owner's lifetime.
I've added that test case. I kept your original test cases, as they
still provide useful coverage of C-level SPI FK behavior even if they
don't exercise the crash path specifically. Maybe your original
PostGIS test suite that hit the crash did have the right structure,
but that's not reflected in the patch as far as I can tell.

I've also renamed the module to test_spi_resowner to better reflect
what it's about.

For the fix, I have a different proposal. As you observed, the
query_depth > 0 early return in FireAfterTriggerBatchCallbacks() means
that the nested SPI's callbacks get called under the outer resource
owner, which may not be the same as the one that SPI used. I think it
was a mistake to have that early return in the first place. Instead, we
could remember for each callback the firing level it should be called
at, so that the nested SPI's callbacks fire before returning to the
parent level and parent-level callbacks fire when the parent level
completes. I have implemented that in the attached 0002, along with
transaction-boundary cleanup of callbacks. It passes check-world for
me, but I'll need to stare at it some more before committing.
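
To make the depth-matching idea concrete, here is a rough standalone
sketch in plain C. It is not the code in 0002: ToyBatchCallback,
toy_register_callback, toy_fire_callbacks and toy_count are
hypothetical names invented for illustration, and the sketch assumes
that callbacks do not register further callbacks while firing. Each
callback is stored together with the depth at which it was registered,
and firing at a given depth runs and retires only the entries for that
depth:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CALLBACKS 16

typedef void (*ToyCallback) (void *arg);

/* Hypothetical registry entry: a callback plus the depth it belongs to. */
typedef struct ToyBatchCallback
{
	ToyCallback func;
	void	   *arg;
	int			depth;
} ToyBatchCallback;

static ToyBatchCallback toy_callbacks[TOY_MAX_CALLBACKS];
static int	toy_ncallbacks = 0;

/* Remember a callback and the firing depth at which it must run. */
int
toy_register_callback(ToyCallback func, void *arg, int depth)
{
	if (toy_ncallbacks >= TOY_MAX_CALLBACKS)
		return -1;
	toy_callbacks[toy_ncallbacks].func = func;
	toy_callbacks[toy_ncallbacks].arg = arg;
	toy_callbacks[toy_ncallbacks].depth = depth;
	toy_ncallbacks++;
	return 0;
}

/*
 * Fire and retire only the callbacks registered at "depth"; entries for
 * other depths stay queued.  Returns the number of callbacks fired.
 */
int
toy_fire_callbacks(int depth)
{
	int			fired = 0;
	int			kept = 0;

	for (int i = 0; i < toy_ncallbacks; i++)
	{
		ToyBatchCallback cb = toy_callbacks[i];

		if (cb.depth == depth)
		{
			cb.func(cb.arg);
			fired++;
		}
		else
			toy_callbacks[kept++] = cb;
	}
	toy_ncallbacks = kept;
	return fired;
}

/* Example callback: increments the int counter passed as arg. */
void
toy_count(void *arg)
{
	(*(int *) arg)++;
}
```

With one callback registered at depth 0 and another at depth 1, firing
at depth 1 runs only the inner one, so in this model the inner cleanup
happens before control returns to the outer level.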
Let me know if this also fixes your own in-house test suite or if you
have any other suggestions or if you think I am missing something.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v2-0001-Modified-test-suite-from-Evan-s-patch.patch (17.1K, 2-v2-0001-Modified-test-suite-from-Evan-s-patch.patch)
From 2da74146aafbd9d505e9e0d9038138bc46f0cd08 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 8 Apr 2026 13:53:08 +0900
Subject: [PATCH v2 1/2] Modified test suite from Evan's patch
The C function in the original test module was using the same resource
owner as the parent SELECT, so the crash could not be reproduced.
Added a dedicated resource owner around the SPI call to ensure the inner
resource owner is released before the outer trigger-firing batch callback
fires, which is necessary to trigger the crash this test is meant to
cover.
---
src/test/modules/Makefile | 1 +
src/test/modules/meson.build | 1 +
src/test/modules/test_spi_resowner/Makefile | 23 ++++
.../expected/ri_fastpath.out | 118 ++++++++++++++++++
.../modules/test_spi_resowner/meson.build | 31 +++++
.../test_spi_resowner/sql/ri_fastpath.sql | 107 ++++++++++++++++
.../test_spi_resowner--1.0.sql | 9 ++
.../test_spi_resowner/test_spi_resowner.c | 70 +++++++++++
.../test_spi_resowner.control | 4 +
9 files changed, 364 insertions(+)
create mode 100644 src/test/modules/test_spi_resowner/Makefile
create mode 100644 src/test/modules/test_spi_resowner/expected/ri_fastpath.out
create mode 100644 src/test/modules/test_spi_resowner/meson.build
create mode 100644 src/test/modules/test_spi_resowner/sql/ri_fastpath.sql
create mode 100644 src/test/modules/test_spi_resowner/test_spi_resowner--1.0.sql
create mode 100644 src/test/modules/test_spi_resowner/test_spi_resowner.c
create mode 100644 src/test/modules/test_spi_resowner/test_spi_resowner.control
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 0a74ab5c86f..016b328c8c5 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -52,6 +52,7 @@ SUBDIRS = \
test_shmem \
test_shm_mq \
test_slru \
+ test_spi_resowner \
test_tidstore \
unsafe_tests \
worker_spi \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 4bca42bb370..3ca454064d0 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -53,6 +53,7 @@ subdir('test_saslprep')
subdir('test_shmem')
subdir('test_shm_mq')
subdir('test_slru')
+subdir('test_spi_resowner')
subdir('test_tidstore')
subdir('typcache')
subdir('unsafe_tests')
diff --git a/src/test/modules/test_spi_resowner/Makefile b/src/test/modules/test_spi_resowner/Makefile
new file mode 100644
index 00000000000..5a69e3a3c42
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_spi_resowner/Makefile
+
+MODULE_big = test_spi_resowner
+OBJS = \
+ $(WIN32RES) \
+ test_spi_resowner.o
+PGFILEDESC = "test_spi_resowner - SQL-callable C SPI function under a dedicated ResourceOwner"
+
+EXTENSION = test_spi_resowner
+DATA = test_spi_resowner--1.0.sql
+
+REGRESS = ri_fastpath
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_spi_resowner
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_spi_resowner/expected/ri_fastpath.out b/src/test/modules/test_spi_resowner/expected/ri_fastpath.out
new file mode 100644
index 00000000000..ad6b0f7c9b3
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/expected/ri_fastpath.out
@@ -0,0 +1,118 @@
+--
+-- Test RI fast-path FK check under C-level SPI.
+--
+-- The RI fast-path caches PK relation references in ri_FastPathGetEntry()
+-- under the current resource owner. When FK triggers fire inside a
+-- C-level SPI context that creates a dedicated short-lived resource owner,
+-- those references must be released before the inner resource owner is
+-- released. The fix ensures batch callbacks fire at the same firing depth
+-- at which they were registered, while the corresponding resource owner
+-- is still alive. Without this, ri_FastPathTeardown would crash with
+-- Assert(rel->rd_refcnt > 0) in index_close.
+--
+-- Simple PL/pgSQL does not trigger this because its SPI connection spans
+-- the entire function call, so its resource owner outlives the batch
+-- callback. The critical test case requires a C function that creates a
+-- dedicated short-lived resource owner around its SPI call.
+--
+CREATE EXTENSION test_spi_resowner;
+CREATE TABLE ri_fp_pk1 (id serial PRIMARY KEY);
+CREATE TABLE ri_fp_pk2 (id serial PRIMARY KEY);
+CREATE TABLE ri_fp_pk3 (id serial PRIMARY KEY);
+INSERT INTO ri_fp_pk1 VALUES (1);
+INSERT INTO ri_fp_pk2 VALUES (1);
+INSERT INTO ri_fp_pk3 VALUES (1);
+CREATE TABLE ri_fp_fk (
+ id serial PRIMARY KEY,
+ a int REFERENCES ri_fp_pk1(id),
+ b int REFERENCES ri_fp_pk2(id),
+ c int REFERENCES ri_fp_pk3(id),
+ d int REFERENCES ri_fp_pk1(id),
+ e int REFERENCES ri_fp_pk2(id),
+ f int REFERENCES ri_fp_pk3(id)
+);
+-- C-level SPI INSERT: the critical test case.
+-- Without the fix this crashes the server.
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (1, 1, 1, 1, 1, 1)');
+ spi_exec_sql
+--------------
+
+(1 row)
+
+-- Additional C-level SPI INSERTs to exercise batch reuse across calls.
+-- Use different column orderings to ensure each is a distinct statement.
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (f, e, d, c, b, a) VALUES (1, 1, 1, 1, 1, 1)');
+ spi_exec_sql
+--------------
+
+(1 row)
+
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, c, e, b, d, f) VALUES (1, 1, 1, 1, 1, 1)');
+ spi_exec_sql
+--------------
+
+(1 row)
+
+-- C-level SPI with FK violation: should error, not crash
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (999, 1, 1, 1, 1, 1)');
+ERROR: insert or update on table "ri_fp_fk" violates foreign key constraint "ri_fp_fk_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "ri_fp_pk1".
+CONTEXT: SQL statement "INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (999, 1, 1, 1, 1, 1)"
+-- Nested: PL/pgSQL calling C SPI (mimics PostGIS toTopoGeom pattern)
+CREATE FUNCTION plpgsql_calls_c_spi() RETURNS void AS $$
+DECLARE
+ ins_stmt text := 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (1, 1, 1, 1, 1, 1)';
+BEGIN
+ PERFORM spi_exec_sql(ins_stmt);
+END;
+$$ LANGUAGE plpgsql;
+SELECT plpgsql_calls_c_spi();
+ plpgsql_calls_c_spi
+---------------------
+
+(1 row)
+
+-- AFTER trigger that uses C-level SPI to insert into an FK-referencing table.
+-- The FK batch callback is registered at the inner SPI's firing depth and
+-- must fire before the inner resource owner is released. This exercises
+-- the depth-matched callback firing introduced to fix that crash.
+CREATE TABLE ri_fp_outer (id int PRIMARY KEY);
+CREATE TABLE ri_fp_inner (id int REFERENCES ri_fp_pk1(id));
+CREATE FUNCTION outer_trigger_spi_ok() RETURNS trigger AS $$
+BEGIN
+ PERFORM spi_exec_sql('INSERT INTO ri_fp_inner VALUES (1)');
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+CREATE TRIGGER outer_tg AFTER INSERT ON ri_fp_outer
+ FOR EACH ROW EXECUTE FUNCTION outer_trigger_spi_ok();
+-- Fires outer_tg, whose PL/pgSQL body calls spi_exec_sql(). The C function
+-- creates a dedicated resource owner; the FK batch callback fires at the
+-- inner SPI's firing depth before that resource owner is released.
+INSERT INTO ri_fp_outer VALUES (1);
+CREATE FUNCTION outer_trigger_spi_fail() RETURNS trigger AS $$
+BEGIN
+ PERFORM spi_exec_sql('INSERT INTO ri_fp_inner VALUES (3)');
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+DROP TRIGGER outer_tg ON ri_fp_outer;
+DROP FUNCTION outer_trigger_spi_ok();
+CREATE TRIGGER outer_tg AFTER INSERT ON ri_fp_outer
+ FOR EACH ROW EXECUTE FUNCTION outer_trigger_spi_fail();
+-- Like above but the inner insert fails.
+INSERT INTO ri_fp_outer VALUES (2);
+ERROR: insert or update on table "ri_fp_inner" violates foreign key constraint "ri_fp_inner_id_fkey"
+DETAIL: Key (id)=(3) is not present in table "ri_fp_pk1".
+CONTEXT: SQL statement "INSERT INTO ri_fp_inner VALUES (3)"
+SQL statement "SELECT spi_exec_sql('INSERT INTO ri_fp_inner VALUES (3)')"
+PL/pgSQL function outer_trigger_spi_fail() line 3 at PERFORM
+DROP TRIGGER outer_tg ON ri_fp_outer;
+DROP FUNCTION outer_trigger_spi_fail();
+DROP TABLE ri_fp_inner, ri_fp_outer;
+-- Cleanup
+DROP TABLE ri_fp_fk;
+DROP TABLE ri_fp_pk3, ri_fp_pk2, ri_fp_pk1;
+DROP EXTENSION test_spi_resowner;
diff --git a/src/test/modules/test_spi_resowner/meson.build b/src/test/modules/test_spi_resowner/meson.build
new file mode 100644
index 00000000000..fbb027e05c7
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/meson.build
@@ -0,0 +1,31 @@
+test_spi_resowner_sources = files(
+ 'test_spi_resowner.c',
+)
+
+if host_system == 'windows'
+ test_spi_resowner_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+ '--NAME', 'test_spi_resowner',
+ '--FILEDESC', 'test_spi_resowner - SQL-callable C SPI function under a dedicated ResourceOwner',])
+endif
+
+test_spi_resowner = shared_module('test_spi_resowner',
+ test_spi_resowner_sources,
+ kwargs: pg_test_mod_args,
+)
+test_install_libs += test_spi_resowner
+
+test_install_data += files(
+ 'test_spi_resowner.control',
+ 'test_spi_resowner--1.0.sql',
+)
+
+tests += {
+ 'name': 'test_spi_resowner',
+ 'sd': meson.current_source_dir(),
+ 'bd': meson.current_build_dir(),
+ 'regress': {
+ 'sql': [
+ 'ri_fastpath',
+ ],
+ },
+}
diff --git a/src/test/modules/test_spi_resowner/sql/ri_fastpath.sql b/src/test/modules/test_spi_resowner/sql/ri_fastpath.sql
new file mode 100644
index 00000000000..4517b2437c4
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/sql/ri_fastpath.sql
@@ -0,0 +1,107 @@
+--
+-- Test RI fast-path FK check under C-level SPI.
+--
+-- The RI fast-path caches PK relation references in ri_FastPathGetEntry()
+-- under the current resource owner. When FK triggers fire inside a
+-- C-level SPI context that creates a dedicated short-lived resource owner,
+-- those references must be released before the inner resource owner is
+-- released. The fix ensures batch callbacks fire at the same firing depth
+-- at which they were registered, while the corresponding resource owner
+-- is still alive. Without this, ri_FastPathTeardown would crash with
+-- Assert(rel->rd_refcnt > 0) in index_close.
+--
+-- Simple PL/pgSQL does not trigger this because its SPI connection spans
+-- the entire function call, so its resource owner outlives the batch
+-- callback. The critical test case requires a C function that creates a
+-- dedicated short-lived resource owner around its SPI call.
+--
+CREATE EXTENSION test_spi_resowner;
+
+CREATE TABLE ri_fp_pk1 (id serial PRIMARY KEY);
+CREATE TABLE ri_fp_pk2 (id serial PRIMARY KEY);
+CREATE TABLE ri_fp_pk3 (id serial PRIMARY KEY);
+INSERT INTO ri_fp_pk1 VALUES (1);
+INSERT INTO ri_fp_pk2 VALUES (1);
+INSERT INTO ri_fp_pk3 VALUES (1);
+
+CREATE TABLE ri_fp_fk (
+ id serial PRIMARY KEY,
+ a int REFERENCES ri_fp_pk1(id),
+ b int REFERENCES ri_fp_pk2(id),
+ c int REFERENCES ri_fp_pk3(id),
+ d int REFERENCES ri_fp_pk1(id),
+ e int REFERENCES ri_fp_pk2(id),
+ f int REFERENCES ri_fp_pk3(id)
+);
+
+-- C-level SPI INSERT: the critical test case.
+-- Without the fix this crashes the server.
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (1, 1, 1, 1, 1, 1)');
+
+-- Additional C-level SPI INSERTs to exercise batch reuse across calls.
+-- Use different column orderings to ensure each is a distinct statement.
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (f, e, d, c, b, a) VALUES (1, 1, 1, 1, 1, 1)');
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, c, e, b, d, f) VALUES (1, 1, 1, 1, 1, 1)');
+
+-- C-level SPI with FK violation: should error, not crash
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (999, 1, 1, 1, 1, 1)');
+
+-- Nested: PL/pgSQL calling C SPI (mimics PostGIS toTopoGeom pattern)
+CREATE FUNCTION plpgsql_calls_c_spi() RETURNS void AS $$
+DECLARE
+ ins_stmt text := 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (1, 1, 1, 1, 1, 1)';
+BEGIN
+ PERFORM spi_exec_sql(ins_stmt);
+END;
+$$ LANGUAGE plpgsql;
+
+SELECT plpgsql_calls_c_spi();
+
+-- AFTER trigger that uses C-level SPI to insert into an FK-referencing table.
+-- The FK batch callback is registered at the inner SPI's firing depth and
+-- must fire before the inner resource owner is released. This exercises
+-- the depth-matched callback firing introduced to fix that crash.
+CREATE TABLE ri_fp_outer (id int PRIMARY KEY);
+CREATE TABLE ri_fp_inner (id int REFERENCES ri_fp_pk1(id));
+
+CREATE FUNCTION outer_trigger_spi_ok() RETURNS trigger AS $$
+BEGIN
+ PERFORM spi_exec_sql('INSERT INTO ri_fp_inner VALUES (1)');
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+
+CREATE TRIGGER outer_tg AFTER INSERT ON ri_fp_outer
+ FOR EACH ROW EXECUTE FUNCTION outer_trigger_spi_ok();
+
+-- Fires outer_tg, whose PL/pgSQL body calls spi_exec_sql(). The C function
+-- creates a dedicated resource owner; the FK batch callback fires at the
+-- inner SPI's firing depth before that resource owner is released.
+INSERT INTO ri_fp_outer VALUES (1);
+
+CREATE FUNCTION outer_trigger_spi_fail() RETURNS trigger AS $$
+BEGIN
+ PERFORM spi_exec_sql('INSERT INTO ri_fp_inner VALUES (3)');
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+
+DROP TRIGGER outer_tg ON ri_fp_outer;
+DROP FUNCTION outer_trigger_spi_ok();
+
+CREATE TRIGGER outer_tg AFTER INSERT ON ri_fp_outer
+ FOR EACH ROW EXECUTE FUNCTION outer_trigger_spi_fail();
+
+-- Like above but the inner insert fails.
+INSERT INTO ri_fp_outer VALUES (2);
+
+DROP TRIGGER outer_tg ON ri_fp_outer;
+DROP FUNCTION outer_trigger_spi_fail();
+DROP TABLE ri_fp_inner, ri_fp_outer;
+
+-- Cleanup
+DROP TABLE ri_fp_fk;
+DROP TABLE ri_fp_pk3, ri_fp_pk2, ri_fp_pk1;
+DROP EXTENSION test_spi_resowner;
diff --git a/src/test/modules/test_spi_resowner/test_spi_resowner--1.0.sql b/src/test/modules/test_spi_resowner/test_spi_resowner--1.0.sql
new file mode 100644
index 00000000000..29ef70ee0dc
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/test_spi_resowner--1.0.sql
@@ -0,0 +1,9 @@
+/* src/test/modules/test_spi_resowner/test_spi_resowner--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_spi_resowner" to load this file. \quit
+
+CREATE FUNCTION spi_exec_sql(query text)
+RETURNS void
+AS 'MODULE_PATHNAME', 'spi_exec_sql'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_spi_resowner/test_spi_resowner.c b/src/test/modules/test_spi_resowner/test_spi_resowner.c
new file mode 100644
index 00000000000..0306139b5c0
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/test_spi_resowner.c
@@ -0,0 +1,70 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_spi_resowner.c
+ * SQL-callable C function that uses SPI to execute a query.
+ *
+ * Useful for testing code paths that only trigger under C-level
+ * SPI (not PL/pgSQL), such as resource owner interactions with
+ * RI fast-path FK checks.
+ *
+ * Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/test_spi_resowner/test_spi_resowner.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "executor/spi.h"
+#include "utils/builtins.h"
+
+PG_MODULE_MAGIC;
+
+PG_FUNCTION_INFO_V1(spi_exec_sql);
+
+/*
+ * spi_exec_sql(query text) - execute a SQL query via SPI.
+ *
+ * Opens a fresh SPI connection, executes the query, and closes the
+ * connection. Creates a dedicated child resource owner around the
+ * SPI_execute call and releases it before returning, ensuring that
+ * any resources registered under it (such as relation references
+ * opened by RI fast-path FK checks) are released before the outer
+ * trigger-firing batch callback fires. This reproduces the resource
+ * owner mismatch that occurs with C-language extensions like PostGIS
+ * topology functions, which cannot be triggered from PL/pgSQL since
+ * PL/pgSQL's SPI connection spans the entire function call.
+ */
+Datum
+spi_exec_sql(PG_FUNCTION_ARGS)
+{
+ const char *query = text_to_cstring(PG_GETARG_TEXT_PP(0));
+ int ret;
+ ResourceOwner save = CurrentResourceOwner;
+ ResourceOwner childowner = ResourceOwnerCreate(save, "test_spi inner");
+
+ SPI_connect();
+
+ CurrentResourceOwner = childowner;
+ ret = SPI_execute(query, false, 0);
+
+ if (ret < 0)
+ elog(ERROR, "SPI_execute failed: error code %d", ret);
+
+ SPI_finish();
+
+ CurrentResourceOwner = save;
+ ResourceOwnerRelease(childowner,
+ RESOURCE_RELEASE_BEFORE_LOCKS,
+ true, false);
+ ResourceOwnerRelease(childowner,
+ RESOURCE_RELEASE_LOCKS,
+ true, false);
+ ResourceOwnerRelease(childowner,
+ RESOURCE_RELEASE_AFTER_LOCKS,
+ true, false);
+ ResourceOwnerDelete(childowner);
+
+ PG_RETURN_VOID();
+}
diff --git a/src/test/modules/test_spi_resowner/test_spi_resowner.control b/src/test/modules/test_spi_resowner/test_spi_resowner.control
new file mode 100644
index 00000000000..2120ae9442f
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/test_spi_resowner.control
@@ -0,0 +1,4 @@
+comment = 'Test SQL-callable C function that runs SPI under a dedicated ResourceOwner'
+default_version = '1.0'
+module_pathname = '$libdir/test_spi_resowner'
+relocatable = true
--
2.47.3
[application/octet-stream] v2-0002-Fix-RI-fast-path-crash-under-nested-C-level-SPI.patch (5.6K, 3-v2-0002-Fix-RI-fast-path-crash-under-nested-C-level-SPI.patch)
From 312fad1c36e064ab9e7dc1780575e8c07f300751 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 8 Apr 2026 18:17:40 +0900
Subject: [PATCH v2 2/2] Fix RI fast-path crash under nested C-level SPI
When a C-language function uses SPI_connect/SPI_execute/SPI_finish to
INSERT into a table with FK constraints, the FK AFTER triggers
register ri_FastPathEndBatch as a batch callback and open PK relations
under the SPI portal's resource owner. FireAfterTriggerBatchCallbacks
was suppressed at that point by the query_depth > 0 guard, deferring
teardown to the outer query's AfterTriggerEndQuery. By then the SPI
portal's resource owner had been released, decrementing the cached
relations' refcounts to zero. ri_FastPathTeardown then crashed
attempting to close them:
TRAP: failed Assert("rel->rd_refcnt > 0")
Fix by tagging each AfterTriggerCallbackItem with the
afterTriggerFiringDepth (added in 5c54c3ed1b9) at registration time
and firing only callbacks whose depth matches the current depth. This
replaces the query_depth > 0 suppression guard. Callbacks now fire at
the same firing depth at which they were registered, while the
resource owner that was active during registration is still alive,
eliminating the mismatch.
While at it, ensure callbacks are properly accounted for at all
transaction boundaries, as cleanup of b7b27eb41a5c: assert on commit
that no callbacks remain unfired, and discard any remaining callbacks
on transaction abort. Also restructure FireAfterTriggerBatchCallbacks()
to update afterTriggers.batch_callbacks before invoking any callbacks,
so that if a callback throws an ERROR the list is already in a
consistent state.
Note that ri_PerformCheck() uses fire_triggers=false, which skips
AfterTriggerBeginQuery/EndQuery and thus never increments
afterTriggerFiringDepth; events queued there fire at the outer
query's depth and are unaffected by this change.
Reported-by: Evan Montgomery-Recht <[email protected]>
Author: Evan Montgomery-Recht <[email protected]>
Co-authored-by: Amit Langote <[email protected]>
Discussion: https://postgr.es/m/CAEg7pwcKf01FmDqFAf-Hzu_pYnMYScY_Otid-pe9uw3BJ6gq9g@mail.gmail.com
---
src/backend/commands/trigger.c | 54 +++++++++++++++++++++++++++-------
1 file changed, 43 insertions(+), 11 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index c41005ba44e..f59537fe86e 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3935,6 +3935,8 @@ struct AfterTriggersTableData
typedef struct AfterTriggerCallbackItem
{
AfterTriggerBatchCallback callback;
+ int firing_depth; /* afterTriggerFiringDepth when registered;
+ * callback fires only at this depth */
void *arg;
} AfterTriggerCallbackItem;
@@ -5419,6 +5421,15 @@ AfterTriggerEndXact(bool isCommit)
afterTriggers.query_depth = -1;
afterTriggerFiringDepth = 0;
+
+ Assert(afterTriggers.batch_callbacks == NIL || !isCommit);
+
+ /* On abort, discard any pending callbacks without firing them. */
+ if (!isCommit)
+ {
+ list_free_deep(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = NIL;
+ }
}
/*
@@ -6830,6 +6841,7 @@ RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
oldcxt = MemoryContextSwitchTo(TopTransactionContext);
item = palloc(sizeof(AfterTriggerCallbackItem));
item->callback = callback;
+ item->firing_depth = afterTriggerFiringDepth;
item->arg = arg;
afterTriggers.batch_callbacks =
lappend(afterTriggers.batch_callbacks, item);
@@ -6838,31 +6850,51 @@ RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
/*
* FireAfterTriggerBatchCallbacks
- * Invoke and clear all registered batch callbacks.
+ * Invoke callbacks registered at the current firing depth.
+ *
+ * Each callback is tagged with the afterTriggerFiringDepth at registration
+ * time. Only callbacks matching the current depth are invoked; the rest
+ * are retained for when their own depth fires. This ensures that nested
+ * trigger-firing contexts (e.g., SPI calls inside AFTER triggers) only
+ * fire the callbacks they registered, leaving outer-level callbacks intact
+ * until their firing depth is reached.
*
- * Only fires at the outermost query level (query_depth == 0) or from
- * top-level operations (query_depth == -1, e.g. AfterTriggerFireDeferred
- * at COMMIT). Nested queries from SPI inside AFTER triggers run at
- * depth > 0 and must not tear down resources the outer batch still needs.
+ * The list is updated before any callbacks are invoked so that if a
+ * callback throws an ERROR the list is already in a consistent state.
*/
static void
FireAfterTriggerBatchCallbacks(void)
{
+ List *remaining = NIL;
+ List *to_fire = NIL;
ListCell *lc;
- if (afterTriggers.query_depth > 0)
- return;
+ /* remaining and to_fire lists must survive until callbacks complete */
+ MemoryContext oldcxt = MemoryContextSwitchTo(TopTransactionContext);
- Assert(afterTriggerFiringDepth > 0);
foreach(lc, afterTriggers.batch_callbacks)
{
AfterTriggerCallbackItem *item = lfirst(lc);
- item->callback(item->arg);
+ if (item->firing_depth == afterTriggerFiringDepth)
+ to_fire = lappend(to_fire, item);
+ else
+ remaining = lappend(remaining, item);
}
- list_free_deep(afterTriggers.batch_callbacks);
- afterTriggers.batch_callbacks = NIL;
+ list_free(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = remaining;
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Now fire; if one throws, the list is already clean */
+ foreach(lc, to_fire)
+ {
+ AfterTriggerCallbackItem *item = lfirst(lc);
+
+ item->callback(item->arg);
+ pfree(item);
+ }
+ list_free(to_fire);
}
/*
--
2.47.3
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Amit Langote @ 2026-04-08 14:26 UTC (permalink / raw)
To: Evan Montgomery-Recht <[email protected]>; +Cc: pgsql-hackers
On Wed, Apr 8, 2026 at 6:58 PM Amit Langote <[email protected]> wrote:
> On Wed, Apr 8, 2026 at 10:23 AM Amit Langote <[email protected]> wrote:
> > On Tue, Apr 7, 2026 at 10:00 PM Evan Montgomery-Recht
> > <[email protected]> wrote:
> > > The patch also adds a test module (test_spi_func) with a C function
> > > that executes SQL via SPI_connect/SPI_execute/SPI_finish, since this
> > > crash cannot be triggered from PL/pgSQL. The test exercises the
> > > C-level SPI INSERT with multiple FK constraints, FK violations, and
> > > nested PL/pgSQL-calls-C-SPI (matching the PostGIS call pattern).
>
> I applied only the test module changes and it passes (without
> crashing) even without your proposed fix. It seems that's because the
> C function in test_spi_func calling SPI is using the same resource
> owner as the parent SELECT. I think you'd need to create a resource
> owner manually in the spi_exec() C function to reproduce the crash, as
> done in the attached 0001, which contains the src/test changes
> extracted from your patch modified as described, including renaming
> the C function to spi_exec_sql().
>
> Also, the test cases that call spi_exec() (_sql()) directly from a
> SELECT don't actually exercise the crash path because there is no
> outer trigger-firing loop active. query_depth is 0 inside the inner
> SPI's AfterTriggerEndQuery, so the old guard wouldn't suppress the
> callback there anyway. The critical case requires spi_exec_sql() to be
> called from inside an AFTER trigger, where query_depth > 0 causes the
> guard to defer the callback past the inner resource owner's lifetime.
> I've added that test case. I kept your original test cases as they
> still provide useful coverage of C-level SPI FK behavior even if they
> don't exercise the crash path specifically. Maybe your original
> PostGIS test suite that hit the crash did have the right structure,
> but that's not reflected in the patch as far as I can tell.
>
> I've also renamed the module to test_spi_resowner to better reflect
> what it's about.
>
> For the fix, I have a different proposal. As you observed, the
> query_depth > 0 early return in FireAfterTriggerBatchCallbacks() means
> that the nested SPI's callbacks get called under the outer resource
> owner, which may not be the same as the one that SPI used. I think it
> was a mistake to have that early return in the first place. Instead we
> could remember for each callback what firing level it should be called
> at, so the nested SPI's callbacks fire before returning to the parent
> level and parent-level callbacks fire when the parent level completes.
> I have implemented that in the attached 0002 along with transaction
> boundary cleanup of callbacks, which passes the check-world for me,
> but I'll need to stare some more at it before committing.
>
> Let me know if this also fixes your own in-house test suite or if you
> have any other suggestions or if you think I am missing something.
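The depth-tagging idea described in the quoted text can be sketched outside PostgreSQL as a small standalone C program fragment. This is only an illustration under stated assumptions: the fixed-size array, current_depth (a stand-in for afterTriggerFiringDepth), register_callback(), fire_callbacks() and count_call() are all hypothetical names, not the code in the patch, which uses a List allocated in TopTransactionContext.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLBACKS 16

typedef void (*batch_callback) (void *arg);

typedef struct CallbackItem
{
	batch_callback callback;
	int			firing_depth;	/* depth at registration time */
	void	   *arg;
} CallbackItem;

/* Hypothetical stand-ins for afterTriggers.batch_callbacks and the depth */
static CallbackItem pending[MAX_CALLBACKS];
static int	npending = 0;
static int	current_depth = 0;

/* Remember a callback together with the depth it was registered at. */
static void
register_callback(batch_callback cb, void *arg)
{
	assert(npending < MAX_CALLBACKS);
	pending[npending].callback = cb;
	pending[npending].firing_depth = current_depth;
	pending[npending].arg = arg;
	npending++;
}

/*
 * Fire only the callbacks registered at the current depth and return how
 * many were fired.  The pending array is compacted before any callback
 * runs, so it is already consistent if a callback bails out.
 */
static int
fire_callbacks(void)
{
	CallbackItem to_fire[MAX_CALLBACKS];
	int			nfire = 0;
	int			nkeep = 0;

	for (int i = 0; i < npending; i++)
	{
		if (pending[i].firing_depth == current_depth)
			to_fire[nfire++] = pending[i];
		else
			pending[nkeep++] = pending[i];
	}
	npending = nkeep;

	for (int i = 0; i < nfire; i++)
		to_fire[i].callback(to_fire[i].arg);

	return nfire;
}

/* Example callback: bump the int counter passed as arg. */
static void
count_call(void *arg)
{
	(*(int *) arg)++;
}
```

Registering one callback at depth 1 and another at depth 2, then calling fire_callbacks() while current_depth is 2, runs only the inner callback; the outer one stays pending until the depth returns to 1.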
One more cleanup patch attached as 0003: afterTriggerFiringDepth was
added by commit 5c54c3ed1 as a file-static variable, which in
hindsight should have been a field in AfterTriggersData alongside the
other per-transaction after-trigger state. This patch makes that
correction.
One alternative design worth considering for 0002: storing
batch_callbacks per query level in AfterTriggersQueryData rather than
as a single list in AfterTriggersData, so callbacks naturally live at
the query level where they were registered and get cleaned up with
AfterTriggerFreeQuery on abort. Deferred constraints still need a
top-level list in AfterTriggersData since they fire outside any query
level. FireAfterTriggerBatchCallbacks() takes a list parameter and the
caller passes either the query-level or top-level list as appropriate.
This eliminates the need for firing_depth-matched firing entirely. I
did that in 0004, and I think I prefer it to 0002. Will look more
closely tomorrow morning.
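As a rough standalone sketch of the per-level storage idea (not the patch itself), the following C fragment keeps one callback list per query level plus a top-level list for the deferred case. All names here (CallbackList, query_level, top_level, begin_query(), end_query(), count_call()) and the fixed-size arrays are assumptions made for illustration; the real code uses List fields in AfterTriggersQueryData and AfterTriggersData.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 8
#define MAX_CALLBACKS 8

typedef void (*batch_callback) (void *arg);

typedef struct CallbackItem
{
	batch_callback callback;
	void	   *arg;
} CallbackItem;

typedef struct CallbackList
{
	CallbackItem items[MAX_CALLBACKS];
	int			nitems;
} CallbackList;

/* Hypothetical stand-ins for the per-query and top-level lists */
static CallbackList query_level[MAX_DEPTH];
static CallbackList top_level;
static int	query_depth = -1;	/* -1 means "outside any query" */

/* Route the callback to the current query level, else to the top level. */
static void
register_callback(batch_callback cb, void *arg)
{
	CallbackList *list = (query_depth >= 0) ?
		&query_level[query_depth] : &top_level;

	assert(list->nitems < MAX_CALLBACKS);
	list->items[list->nitems].callback = cb;
	list->items[list->nitems].arg = arg;
	list->nitems++;
}

/* Invoke every callback in the given list; cleanup is the caller's job. */
static int
fire_callbacks(const CallbackList *list)
{
	for (int i = 0; i < list->nitems; i++)
		list->items[i].callback(list->items[i].arg);
	return list->nitems;
}

static void
begin_query(void)
{
	assert(query_depth + 1 < MAX_DEPTH);
	query_depth++;
	query_level[query_depth].nitems = 0;
}

/* Fire this level's callbacks, then discard them and pop the level. */
static int
end_query(void)
{
	CallbackList *list = &query_level[query_depth];
	int			fired = fire_callbacks(list);

	list->nitems = 0;
	query_depth--;
	return fired;
}

/* Example callback: bump the int counter passed as arg. */
static void
count_call(void *arg)
{
	(*(int *) arg)++;
}
```

With this layout no depth comparison is needed: ending a nested query fires and discards only that level's callbacks, and the top-level list is fired separately by whoever handles the deferred case.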
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v3-0004-Store-batch-callbacks-at-the-appropriate-level-ra.patch (7.8K, 2-v3-0004-Store-batch-callbacks-at-the-appropriate-level-ra.patch)
From 14eb87d068e46939c325c2f070c66f4dfb4f064a Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 8 Apr 2026 23:22:50 +0900
Subject: [PATCH v3 4/4] Store batch callbacks at the appropriate level rather
than depth-matching
Instead of tagging each AfterTriggerCallbackItem with a firing depth
and matching at invocation time, store callbacks directly at the level
where they should fire: in AfterTriggersQueryData.batch_callbacks for
immediate constraints (fired by AfterTriggerEndQuery) and in
AfterTriggersData.batch_callbacks for deferred constraints (fired by
AfterTriggerFireDeferred and AfterTriggerSetState).
RegisterAfterTriggerBatchCallback() routes the callback to the current
query-level list when query_depth >= 0, and to the top-level list
otherwise (deferred firing at COMMIT).
FireAfterTriggerBatchCallbacks() is simplified to just iterate and
invoke the passed list. Memory cleanup is handled by the caller:
AfterTriggerFreeQuery() for query-level callbacks and
AfterTriggerEndXact() for the top-level list.
This eliminates the firing_depth field from AfterTriggerCallbackItem
and the depth-matched iteration logic, replacing it with natural
list-level scoping. The firing_depth counter in AfterTriggersData
is retained solely for AfterTriggerIsActive().
---
src/backend/commands/trigger.c | 90 ++++++++++++----------------------
1 file changed, 32 insertions(+), 58 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index bbc2405cc4a..993a00aec8c 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3902,8 +3902,8 @@ typedef struct AfterTriggersData
*/
int firing_depth;
- List *batch_callbacks; /* List of AfterTriggerCallbackItem,
- * possibly from multiple firing depths */
+ List *batch_callbacks; /* List of AfterTriggerCallbackItem;
+ * for deferred constraints */
} AfterTriggersData;
struct AfterTriggersQueryData
@@ -3911,6 +3911,7 @@ struct AfterTriggersQueryData
AfterTriggerEventList events; /* events pending from this query */
Tuplestorestate *fdw_tuplestore; /* foreign tuples for said events */
List *tables; /* list of AfterTriggersTableData, see below */
+ List *batch_callbacks; /* List of AfterTriggerCallbackItem */
};
struct AfterTriggersTransData
@@ -3944,8 +3945,6 @@ struct AfterTriggersTableData
typedef struct AfterTriggerCallbackItem
{
AfterTriggerBatchCallback callback;
- int firing_depth; /* afterTriggerFiringDepth when registered;
- * callback fires only at this depth */
void *arg;
} AfterTriggerCallbackItem;
@@ -3984,7 +3983,7 @@ static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
Oid tgoid, bool tgisdeferred);
static void cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent);
-static void FireAfterTriggerBatchCallbacks(void);
+static void FireAfterTriggerBatchCallbacks(List *callbacks);
/*
* Get the FDW tuplestore for the current trigger query level, creating it
@@ -5237,17 +5236,15 @@ AfterTriggerEndQuery(EState *estate)
/*
* Fire batch callbacks before releasing query-level storage and before
* decrementing query_depth. Callbacks may do real work (index probes,
- * error reporting) and rely on query_depth still reflecting the current
- * batch level so that nested calls from SPI inside AFTER triggers are
- * correctly suppressed by FireAfterTriggerBatchCallbacks's depth guard.
+ * error reporting).
*/
- FireAfterTriggerBatchCallbacks();
+ FireAfterTriggerBatchCallbacks(qs->batch_callbacks);
+ afterTriggers.firing_depth--;
/* Release query-level-local storage, including tuplestores if any */
AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
afterTriggers.query_depth--;
- afterTriggers.firing_depth--;
}
@@ -5304,6 +5301,9 @@ AfterTriggerFreeQuery(AfterTriggersQueryData *qs)
*/
qs->tables = NIL;
list_free_deep(tables);
+
+ list_free_deep(qs->batch_callbacks);
+ qs->batch_callbacks = NIL;
}
@@ -5353,8 +5353,7 @@ AfterTriggerFireDeferred(void)
}
/* Flush any fast-path batches accumulated by the triggers just fired. */
- FireAfterTriggerBatchCallbacks();
-
+ FireAfterTriggerBatchCallbacks(afterTriggers.batch_callbacks);
afterTriggers.firing_depth--;
/*
@@ -5424,14 +5423,8 @@ AfterTriggerEndXact(bool isCommit)
afterTriggers.firing_depth = 0;
- Assert(afterTriggers.batch_callbacks == NIL || !isCommit);
-
- /* On abort, discard any pending callbacks without firing them. */
- if (!isCommit)
- {
- list_free_deep(afterTriggers.batch_callbacks);
- afterTriggers.batch_callbacks = NIL;
- }
+ list_free_deep(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = NIL;
}
/*
@@ -5732,6 +5725,7 @@ AfterTriggerEnlargeQueryState(void)
qs->events.tailfree = NULL;
qs->fdw_tuplestore = NULL;
qs->tables = NIL;
+ qs->batch_callbacks = NIL;
++init_depth;
}
@@ -6114,7 +6108,9 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
/*
* Flush any fast-path batches accumulated by the triggers just fired.
*/
- FireAfterTriggerBatchCallbacks();
+ FireAfterTriggerBatchCallbacks(afterTriggers.batch_callbacks);
+ list_free_deep(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = NIL;
afterTriggers.firing_depth--;
if (snapshot_set)
@@ -6843,60 +6839,38 @@ RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
oldcxt = MemoryContextSwitchTo(TopTransactionContext);
item = palloc(sizeof(AfterTriggerCallbackItem));
item->callback = callback;
- item->firing_depth = afterTriggers.firing_depth;
item->arg = arg;
- afterTriggers.batch_callbacks =
- lappend(afterTriggers.batch_callbacks, item);
+ if (afterTriggers.query_depth >= 0)
+ {
+ AfterTriggersQueryData *qs =
+ &afterTriggers.query_stack[afterTriggers.query_depth];
+ qs->batch_callbacks = lappend(qs->batch_callbacks, item);
+ }
+ else
+ afterTriggers.batch_callbacks =
+ lappend(afterTriggers.batch_callbacks, item);
MemoryContextSwitchTo(oldcxt);
}
/*
* FireAfterTriggerBatchCallbacks
- * Invoke callbacks registered at the current firing depth.
- *
- * Each callback is tagged with the afterTriggerFiringDepth at registration
- * time. Only callbacks matching the current depth are invoked; the rest
- * are retained for when their own depth fires. This ensures that nested
- * trigger-firing contexts (e.g., SPI calls inside AFTER triggers) only
- * fire the callbacks they registered, leaving outer-level callbacks intact
- * until their firing depth is reached.
+ * Invoke all callbacks in the given list.
*
- * The list is updated before any callbacks are invoked so that if a
- * callback throws an ERROR the list is already in a consistent state.
+ * Memory cleanup of the list and its items is handled by the caller
+ * (AfterTriggerFreeQuery for query-level callbacks, AfterTriggerEndXact
+ * for top-level deferred callbacks).
*/
static void
-FireAfterTriggerBatchCallbacks(void)
+FireAfterTriggerBatchCallbacks(List *callbacks)
{
- List *remaining = NIL;
- List *to_fire = NIL;
ListCell *lc;
- /* remaining and to_fire lists must survive until callbacks complete */
- MemoryContext oldcxt = MemoryContextSwitchTo(TopTransactionContext);
-
- foreach(lc, afterTriggers.batch_callbacks)
- {
- AfterTriggerCallbackItem *item = lfirst(lc);
-
- if (item->firing_depth == afterTriggers.firing_depth)
- to_fire = lappend(to_fire, item);
- else
- remaining = lappend(remaining, item);
- }
-
- list_free(afterTriggers.batch_callbacks);
- afterTriggers.batch_callbacks = remaining;
- MemoryContextSwitchTo(oldcxt);
-
- /* Now fire; if one throws, the list is already clean */
- foreach(lc, to_fire)
+ foreach(lc, callbacks)
{
AfterTriggerCallbackItem *item = lfirst(lc);
item->callback(item->arg);
- pfree(item);
}
- list_free(to_fire);
}
/*
--
2.47.3
[application/octet-stream] v3-0002-Fix-RI-fast-path-crash-under-nested-C-level-SPI.patch (5.6K, 3-v3-0002-Fix-RI-fast-path-crash-under-nested-C-level-SPI.patch)
From 312fad1c36e064ab9e7dc1780575e8c07f300751 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 8 Apr 2026 18:17:40 +0900
Subject: [PATCH v3 2/4] Fix RI fast-path crash under nested C-level SPI
When a C-language function uses SPI_connect/SPI_execute/SPI_finish to
INSERT into a table with FK constraints, the FK AFTER triggers
register ri_FastPathEndBatch as a batch callback and open PK relations
under the SPI portal's resource owner. FireAfterTriggerBatchCallbacks
was suppressed at that point by the query_depth > 0 guard, deferring
teardown to the outer query's AfterTriggerEndQuery. By then the SPI
portal's resource owner had been released, decrementing the cached
relations' refcounts to zero. ri_FastPathTeardown then crashed
attempting to close them:
TRAP: failed Assert("rel->rd_refcnt > 0")
Fix by tagging each AfterTriggerCallbackItem with the
afterTriggerFiringDepth (added in 5c54c3ed1b9) at registration time
and firing only callbacks whose depth matches the current depth. This
replaces the query_depth > 0 suppression guard. Callbacks now fire at
the same firing depth at which they were registered, while the
resource owner that was active during registration is still alive,
eliminating the mismatch.
While at it, ensure callbacks are properly accounted for at all
transaction boundaries, as cleanup of b7b27eb41a5c: assert on commit
that no callbacks remain unfired, and discard any remaining callbacks
on transaction abort. Also restructure FireAfterTriggerBatchCallbacks()
to update afterTriggers.batch_callbacks before invoking any callbacks,
so that if a callback throws an ERROR the list is already in a
consistent state.
Note that ri_PerformCheck() uses fire_triggers=false, which skips
AfterTriggerBeginQuery/EndQuery and thus never increments
afterTriggerFiringDepth; events queued there fire at the outer
query's depth and are unaffected by this change.
Reported-by: Evan Montgomery-Recht <[email protected]>
Author: Evan Montgomery-Recht <[email protected]>
Co-authored-by: Amit Langote <[email protected]>
Discussion: https://postgr.es/m/CAEg7pwcKf01FmDqFAf-Hzu_pYnMYScY_Otid-pe9uw3BJ6gq9g@mail.gmail.com
---
src/backend/commands/trigger.c | 54 +++++++++++++++++++++++++++-------
1 file changed, 43 insertions(+), 11 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index c41005ba44e..f59537fe86e 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3935,6 +3935,8 @@ struct AfterTriggersTableData
typedef struct AfterTriggerCallbackItem
{
AfterTriggerBatchCallback callback;
+ int firing_depth; /* afterTriggerFiringDepth when registered;
+ * callback fires only at this depth */
void *arg;
} AfterTriggerCallbackItem;
@@ -5419,6 +5421,15 @@ AfterTriggerEndXact(bool isCommit)
afterTriggers.query_depth = -1;
afterTriggerFiringDepth = 0;
+
+ Assert(afterTriggers.batch_callbacks == NIL || !isCommit);
+
+ /* On abort, discard any pending callbacks without firing them. */
+ if (!isCommit)
+ {
+ list_free_deep(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = NIL;
+ }
}
/*
@@ -6830,6 +6841,7 @@ RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
oldcxt = MemoryContextSwitchTo(TopTransactionContext);
item = palloc(sizeof(AfterTriggerCallbackItem));
item->callback = callback;
+ item->firing_depth = afterTriggerFiringDepth;
item->arg = arg;
afterTriggers.batch_callbacks =
lappend(afterTriggers.batch_callbacks, item);
@@ -6838,31 +6850,51 @@ RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
/*
* FireAfterTriggerBatchCallbacks
- * Invoke and clear all registered batch callbacks.
+ * Invoke callbacks registered at the current firing depth.
+ *
+ * Each callback is tagged with the afterTriggerFiringDepth at registration
+ * time. Only callbacks matching the current depth are invoked; the rest
+ * are retained for when their own depth fires. This ensures that nested
+ * trigger-firing contexts (e.g., SPI calls inside AFTER triggers) only
+ * fire the callbacks they registered, leaving outer-level callbacks intact
+ * until their firing depth is reached.
*
- * Only fires at the outermost query level (query_depth == 0) or from
- * top-level operations (query_depth == -1, e.g. AfterTriggerFireDeferred
- * at COMMIT). Nested queries from SPI inside AFTER triggers run at
- * depth > 0 and must not tear down resources the outer batch still needs.
+ * The list is updated before any callbacks are invoked so that if a
+ * callback throws an ERROR the list is already in a consistent state.
*/
static void
FireAfterTriggerBatchCallbacks(void)
{
+ List *remaining = NIL;
+ List *to_fire = NIL;
ListCell *lc;
- if (afterTriggers.query_depth > 0)
- return;
+ /* remaining and to_fire lists must survive until callbacks complete */
+ MemoryContext oldcxt = MemoryContextSwitchTo(TopTransactionContext);
- Assert(afterTriggerFiringDepth > 0);
foreach(lc, afterTriggers.batch_callbacks)
{
AfterTriggerCallbackItem *item = lfirst(lc);
- item->callback(item->arg);
+ if (item->firing_depth == afterTriggerFiringDepth)
+ to_fire = lappend(to_fire, item);
+ else
+ remaining = lappend(remaining, item);
}
- list_free_deep(afterTriggers.batch_callbacks);
- afterTriggers.batch_callbacks = NIL;
+ list_free(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = remaining;
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Now fire; if one throws, the list is already clean */
+ foreach(lc, to_fire)
+ {
+ AfterTriggerCallbackItem *item = lfirst(lc);
+
+ item->callback(item->arg);
+ pfree(item);
+ }
+ list_free(to_fire);
}
/*
--
2.47.3
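As a rough, standalone illustration of the depth-matched firing described in the 0002 commit message above: the sketch below is not PostgreSQL code. A plain linked list stands in for List, malloc/free stand in for palloc/pfree, and the names pending, register_callback, fire_callbacks and run_nested_demo are made up for this example. It only shows the idea of tagging each callback with the firing depth at registration, then firing the matching ones after the pending list has already been updated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void (*batch_callback) (void *arg);

typedef struct CallbackItem
{
	batch_callback callback;
	int			firing_depth;	/* depth at registration time */
	void	   *arg;
	struct CallbackItem *next;
} CallbackItem;

static CallbackItem *pending = NULL;	/* hypothetical stand-in for batch_callbacks */
static int	firing_depth = 0;

static void
register_callback(batch_callback cb, void *arg)
{
	CallbackItem *item = malloc(sizeof(CallbackItem));
	CallbackItem **tail = &pending;

	if (item == NULL)
		abort();
	item->callback = cb;
	item->firing_depth = firing_depth;
	item->arg = arg;
	item->next = NULL;
	while (*tail != NULL)		/* append, keeping registration order */
		tail = &(*tail)->next;
	*tail = item;
}

static void
fire_callbacks(void)
{
	CallbackItem *to_fire = NULL, **fire_tail = &to_fire;
	CallbackItem *remaining = NULL, **rem_tail = &remaining;
	CallbackItem *item, *next;

	/* Partition first, so "pending" is consistent before anything runs. */
	for (item = pending; item != NULL; item = next)
	{
		next = item->next;
		item->next = NULL;
		if (item->firing_depth == firing_depth)
		{
			*fire_tail = item;
			fire_tail = &item->next;
		}
		else
		{
			*rem_tail = item;
			rem_tail = &item->next;
		}
	}
	pending = remaining;

	for (item = to_fire; item != NULL; item = next)
	{
		next = item->next;
		item->callback(item->arg);
		free(item);
	}
}

static char fired_log[16];
static size_t fired_len = 0;

static void
record(void *arg)
{
	if (fired_len < sizeof(fired_log) - 1)
		fired_log[fired_len++] = *(const char *) arg;
	fired_log[fired_len] = '\0';
}

/* Outer level registers 'O', nested level registers and fires 'I' first. */
static const char *
run_nested_demo(void)
{
	static const char outer = 'O', inner = 'I';

	fired_len = 0;
	fired_log[0] = '\0';
	firing_depth++;				/* outer trigger-firing loop */
	register_callback(record, (void *) &outer);
	firing_depth++;				/* nested loop, e.g. SPI inside a trigger */
	register_callback(record, (void *) &inner);
	fire_callbacks();			/* only the depth-2 callback matches */
	firing_depth--;
	fire_callbacks();			/* now the depth-1 callback matches */
	firing_depth--;
	return fired_log;
}
```

Calling run_nested_demo() from a small main() should show the nested callback running before the outer one, which is the ordering the fix relies on so that inner-level resources are torn down while the inner resource owner is still alive.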
[application/octet-stream] v3-0003-Move-afterTriggerFiringDepth-into-AfterTriggersDa.patch (5.6K, 4-v3-0003-Move-afterTriggerFiringDepth-into-AfterTriggersDa.patch)
From 4ed90a5b74c77c5e4a2f5b0a602a06a372aec795 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 8 Apr 2026 23:22:41 +0900
Subject: [PATCH v3 3/4] Move afterTriggerFiringDepth into AfterTriggersData
The static variable afterTriggerFiringDepth is logically part of the
after-trigger state. Move it into AfterTriggersData as firing_depth,
alongside query_depth and the other per-transaction after-trigger
state.
---
src/backend/commands/trigger.c | 42 ++++++++++++++++++----------------
1 file changed, 22 insertions(+), 20 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index f59537fe86e..bbc2405cc4a 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3894,7 +3894,16 @@ typedef struct AfterTriggersData
AfterTriggersTransData *trans_stack; /* array of structs shown below */
int maxtransdepth; /* allocated len of above array */
- List *batch_callbacks; /* List of AfterTriggerCallbackItem */
+ /*
+ * Incremented around the trigger-firing loops in AfterTriggerEndQuery,
+ * AfterTriggerFireDeferred, and AfterTriggerSetState. Used by
+ * AfterTriggerIsActive() and to tag batch callbacks with the depth at
+ * which they should fire.
+ */
+ int firing_depth;
+
+ List *batch_callbacks; /* List of AfterTriggerCallbackItem,
+ * possibly from multiple firing depths */
} AfterTriggersData;
struct AfterTriggersQueryData
@@ -3942,13 +3951,6 @@ typedef struct AfterTriggerCallbackItem
static AfterTriggersData afterTriggers;
-/*
- * Incremented before invoking afterTriggerInvokeEvents(). Used by
- * AfterTriggerIsActive() to determine whether batch callbacks will fire,
- * so that RI trigger functions can take the batched fast path.
- */
-static int afterTriggerFiringDepth = 0;
-
static void AfterTriggerExecute(EState *estate,
AfterTriggerEvent event,
ResultRelInfo *relInfo,
@@ -5108,6 +5110,7 @@ AfterTriggerBeginXact(void)
*/
afterTriggers.firing_counter = (CommandId) 1; /* mustn't be 0 */
afterTriggers.query_depth = -1;
+ afterTriggers.firing_depth = 0;
afterTriggers.batch_callbacks = NIL;
/*
@@ -5122,7 +5125,6 @@ AfterTriggerBeginXact(void)
Assert(afterTriggers.events.head == NULL);
Assert(afterTriggers.trans_stack == NULL);
Assert(afterTriggers.maxtransdepth == 0);
- Assert(afterTriggerFiringDepth == 0);
}
@@ -5194,7 +5196,7 @@ AfterTriggerEndQuery(EState *estate)
*/
qs = &afterTriggers.query_stack[afterTriggers.query_depth];
- afterTriggerFiringDepth++;
+ afterTriggers.firing_depth++;
for (;;)
{
if (afterTriggerMarkEvents(&qs->events, &afterTriggers.events, true))
@@ -5245,7 +5247,7 @@ AfterTriggerEndQuery(EState *estate)
AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
afterTriggers.query_depth--;
- afterTriggerFiringDepth--;
+ afterTriggers.firing_depth--;
}
@@ -5341,7 +5343,7 @@ AfterTriggerFireDeferred(void)
* Run all the remaining triggers. Loop until they are all gone, in case
* some trigger queues more for us to do.
*/
- afterTriggerFiringDepth++;
+ afterTriggers.firing_depth++;
while (afterTriggerMarkEvents(events, NULL, false))
{
CommandId firing_id = afterTriggers.firing_counter++;
@@ -5353,7 +5355,7 @@ AfterTriggerFireDeferred(void)
/* Flush any fast-path batches accumulated by the triggers just fired. */
FireAfterTriggerBatchCallbacks();
- afterTriggerFiringDepth--;
+ afterTriggers.firing_depth--;
/*
* We don't bother freeing the event list, since it will go away anyway
@@ -5420,7 +5422,7 @@ AfterTriggerEndXact(bool isCommit)
/* No more afterTriggers manipulation until next transaction starts. */
afterTriggers.query_depth = -1;
- afterTriggerFiringDepth = 0;
+ afterTriggers.firing_depth = 0;
Assert(afterTriggers.batch_callbacks == NIL || !isCommit);
@@ -6079,7 +6081,7 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
AfterTriggerEventList *events = &afterTriggers.events;
bool snapshot_set = false;
- afterTriggerFiringDepth++;
+ afterTriggers.firing_depth++;
while (afterTriggerMarkEvents(events, NULL, true))
{
CommandId firing_id = afterTriggers.firing_counter++;
@@ -6113,7 +6115,7 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
* Flush any fast-path batches accumulated by the triggers just fired.
*/
FireAfterTriggerBatchCallbacks();
- afterTriggerFiringDepth--;
+ afterTriggers.firing_depth--;
if (snapshot_set)
PopActiveSnapshot();
@@ -6837,11 +6839,11 @@ RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
* Must be called while afterTriggers is active; callbacks registered
* outside a trigger-firing context would never fire.
*/
- Assert(afterTriggerFiringDepth > 0);
+ Assert(afterTriggers.firing_depth > 0);
oldcxt = MemoryContextSwitchTo(TopTransactionContext);
item = palloc(sizeof(AfterTriggerCallbackItem));
item->callback = callback;
- item->firing_depth = afterTriggerFiringDepth;
+ item->firing_depth = afterTriggers.firing_depth;
item->arg = arg;
afterTriggers.batch_callbacks =
lappend(afterTriggers.batch_callbacks, item);
@@ -6876,7 +6878,7 @@ FireAfterTriggerBatchCallbacks(void)
{
AfterTriggerCallbackItem *item = lfirst(lc);
- if (item->firing_depth == afterTriggerFiringDepth)
+ if (item->firing_depth == afterTriggers.firing_depth)
to_fire = lappend(to_fire, item);
else
remaining = lappend(remaining, item);
@@ -6908,5 +6910,5 @@ FireAfterTriggerBatchCallbacks(void)
bool
AfterTriggerIsActive(void)
{
- return afterTriggerFiringDepth > 0;
+ return afterTriggers.firing_depth > 0;
}
--
2.47.3
[application/octet-stream] v3-0001-Modified-test-suite-from-Evan-s-patch.patch (17.1K, 5-v3-0001-Modified-test-suite-from-Evan-s-patch.patch)
From 2da74146aafbd9d505e9e0d9038138bc46f0cd08 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 8 Apr 2026 13:53:08 +0900
Subject: [PATCH v3 1/4] Modified test suite from Evan's patch
The C function in the original test module was using the same resource
owner as the parent SELECT, so the crash could not be reproduced.
Added a dedicated resource owner around the SPI call to ensure the inner
resource owner is released before the outer trigger-firing batch callback
fires, which is necessary to trigger the crash this test is meant to
cover.
---
src/test/modules/Makefile | 1 +
src/test/modules/meson.build | 1 +
src/test/modules/test_spi_resowner/Makefile | 23 ++++
.../expected/ri_fastpath.out | 118 ++++++++++++++++++
.../modules/test_spi_resowner/meson.build | 31 +++++
.../test_spi_resowner/sql/ri_fastpath.sql | 107 ++++++++++++++++
.../test_spi_resowner--1.0.sql | 9 ++
.../test_spi_resowner/test_spi_resowner.c | 70 +++++++++++
.../test_spi_resowner.control | 4 +
9 files changed, 364 insertions(+)
create mode 100644 src/test/modules/test_spi_resowner/Makefile
create mode 100644 src/test/modules/test_spi_resowner/expected/ri_fastpath.out
create mode 100644 src/test/modules/test_spi_resowner/meson.build
create mode 100644 src/test/modules/test_spi_resowner/sql/ri_fastpath.sql
create mode 100644 src/test/modules/test_spi_resowner/test_spi_resowner--1.0.sql
create mode 100644 src/test/modules/test_spi_resowner/test_spi_resowner.c
create mode 100644 src/test/modules/test_spi_resowner/test_spi_resowner.control
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 0a74ab5c86f..016b328c8c5 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -52,6 +52,7 @@ SUBDIRS = \
test_shmem \
test_shm_mq \
test_slru \
+ test_spi_resowner \
test_tidstore \
unsafe_tests \
worker_spi \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 4bca42bb370..3ca454064d0 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -53,6 +53,7 @@ subdir('test_saslprep')
subdir('test_shmem')
subdir('test_shm_mq')
subdir('test_slru')
+subdir('test_spi_resowner')
subdir('test_tidstore')
subdir('typcache')
subdir('unsafe_tests')
diff --git a/src/test/modules/test_spi_resowner/Makefile b/src/test/modules/test_spi_resowner/Makefile
new file mode 100644
index 00000000000..5a69e3a3c42
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_spi_resowner/Makefile
+
+MODULE_big = test_spi_resowner
+OBJS = \
+ $(WIN32RES) \
+ test_spi_resowner.o
+PGFILEDESC = "test_spi_resowner - SQL-callable C SPI function under a dedicated ResourceOwner"
+
+EXTENSION = test_spi_resowner
+DATA = test_spi_resowner--1.0.sql
+
+REGRESS = ri_fastpath
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_spi_resowner
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_spi_resowner/expected/ri_fastpath.out b/src/test/modules/test_spi_resowner/expected/ri_fastpath.out
new file mode 100644
index 00000000000..ad6b0f7c9b3
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/expected/ri_fastpath.out
@@ -0,0 +1,118 @@
+--
+-- Test RI fast-path FK check under C-level SPI.
+--
+-- The RI fast-path caches PK relation references in ri_FastPathGetEntry()
+-- under the current resource owner. When FK triggers fire inside a
+-- C-level SPI context that creates a dedicated short-lived resource owner,
+-- those references must be released before the inner resource owner is
+-- released. The fix ensures batch callbacks fire at the same firing depth
+-- at which they were registered, while the corresponding resource owner
+-- is still alive. Without this, ri_FastPathTeardown would crash with
+-- Assert(rel->rd_refcnt > 0) in index_close.
+--
+-- Simple PL/pgSQL does not trigger this because its SPI connection spans
+-- the entire function call, so its resource owner outlives the batch
+-- callback. The critical test case requires a C function that creates a
+-- dedicated short-lived resource owner around its SPI call.
+--
+CREATE EXTENSION test_spi_resowner;
+CREATE TABLE ri_fp_pk1 (id serial PRIMARY KEY);
+CREATE TABLE ri_fp_pk2 (id serial PRIMARY KEY);
+CREATE TABLE ri_fp_pk3 (id serial PRIMARY KEY);
+INSERT INTO ri_fp_pk1 VALUES (1);
+INSERT INTO ri_fp_pk2 VALUES (1);
+INSERT INTO ri_fp_pk3 VALUES (1);
+CREATE TABLE ri_fp_fk (
+ id serial PRIMARY KEY,
+ a int REFERENCES ri_fp_pk1(id),
+ b int REFERENCES ri_fp_pk2(id),
+ c int REFERENCES ri_fp_pk3(id),
+ d int REFERENCES ri_fp_pk1(id),
+ e int REFERENCES ri_fp_pk2(id),
+ f int REFERENCES ri_fp_pk3(id)
+);
+-- C-level SPI INSERT: the critical test case.
+-- Without the fix this crashes the server.
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (1, 1, 1, 1, 1, 1)');
+ spi_exec_sql
+--------------
+
+(1 row)
+
+-- Additional C-level SPI INSERTs to exercise batch reuse across calls.
+-- Use different column orderings to ensure each is a distinct statement.
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (f, e, d, c, b, a) VALUES (1, 1, 1, 1, 1, 1)');
+ spi_exec_sql
+--------------
+
+(1 row)
+
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, c, e, b, d, f) VALUES (1, 1, 1, 1, 1, 1)');
+ spi_exec_sql
+--------------
+
+(1 row)
+
+-- C-level SPI with FK violation: should error, not crash
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (999, 1, 1, 1, 1, 1)');
+ERROR: insert or update on table "ri_fp_fk" violates foreign key constraint "ri_fp_fk_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "ri_fp_pk1".
+CONTEXT: SQL statement "INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (999, 1, 1, 1, 1, 1)"
+-- Nested: PL/pgSQL calling C SPI (mimics PostGIS toTopoGeom pattern)
+CREATE FUNCTION plpgsql_calls_c_spi() RETURNS void AS $$
+DECLARE
+ ins_stmt text := 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (1, 1, 1, 1, 1, 1)';
+BEGIN
+ PERFORM spi_exec_sql(ins_stmt);
+END;
+$$ LANGUAGE plpgsql;
+SELECT plpgsql_calls_c_spi();
+ plpgsql_calls_c_spi
+---------------------
+
+(1 row)
+
+-- AFTER trigger that uses C-level SPI to insert into an FK-referencing table.
+-- The FK batch callback is registered at the inner SPI's firing depth and
+-- must fire before the inner resource owner is released. This exercises
+-- the depth-matched callback firing introduced to fix that crash.
+CREATE TABLE ri_fp_outer (id int PRIMARY KEY);
+CREATE TABLE ri_fp_inner (id int REFERENCES ri_fp_pk1(id));
+CREATE FUNCTION outer_trigger_spi_ok() RETURNS trigger AS $$
+BEGIN
+ PERFORM spi_exec_sql('INSERT INTO ri_fp_inner VALUES (1)');
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+CREATE TRIGGER outer_tg AFTER INSERT ON ri_fp_outer
+ FOR EACH ROW EXECUTE FUNCTION outer_trigger_spi_ok();
+-- Fires outer_tg, whose PL/pgSQL body calls spi_exec_sql(). The C function
+-- creates a dedicated resource owner; the FK batch callback fires at the
+-- inner SPI's firing depth before that resource owner is released.
+INSERT INTO ri_fp_outer VALUES (1);
+CREATE FUNCTION outer_trigger_spi_fail() RETURNS trigger AS $$
+BEGIN
+ PERFORM spi_exec_sql('INSERT INTO ri_fp_inner VALUES (3)');
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+DROP TRIGGER outer_tg ON ri_fp_outer;
+DROP FUNCTION outer_trigger_spi_ok();
+CREATE TRIGGER outer_tg AFTER INSERT ON ri_fp_outer
+ FOR EACH ROW EXECUTE FUNCTION outer_trigger_spi_fail();
+-- Like above but the inner insert fails.
+INSERT INTO ri_fp_outer VALUES (2);
+ERROR: insert or update on table "ri_fp_inner" violates foreign key constraint "ri_fp_inner_id_fkey"
+DETAIL: Key (id)=(3) is not present in table "ri_fp_pk1".
+CONTEXT: SQL statement "INSERT INTO ri_fp_inner VALUES (3)"
+SQL statement "SELECT spi_exec_sql('INSERT INTO ri_fp_inner VALUES (3)')"
+PL/pgSQL function outer_trigger_spi_fail() line 3 at PERFORM
+DROP TRIGGER outer_tg ON ri_fp_outer;
+DROP FUNCTION outer_trigger_spi_fail();
+DROP TABLE ri_fp_inner, ri_fp_outer;
+-- Cleanup
+DROP TABLE ri_fp_fk;
+DROP TABLE ri_fp_pk3, ri_fp_pk2, ri_fp_pk1;
+DROP EXTENSION test_spi_resowner;
diff --git a/src/test/modules/test_spi_resowner/meson.build b/src/test/modules/test_spi_resowner/meson.build
new file mode 100644
index 00000000000..fbb027e05c7
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/meson.build
@@ -0,0 +1,31 @@
+test_spi_resowner_sources = files(
+ 'test_spi_resowner.c',
+)
+
+if host_system == 'windows'
+ test_spi_resowner_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+ '--NAME', 'test_spi_resowner',
+ '--FILEDESC', 'test_spi_resowner - SQL-callable C SPI function under a dedicated ResourceOwner',])
+endif
+
+test_spi_resowner = shared_module('test_spi_resowner',
+ test_spi_resowner_sources,
+ kwargs: pg_test_mod_args,
+)
+test_install_libs += test_spi_resowner
+
+test_install_data += files(
+ 'test_spi_resowner.control',
+ 'test_spi_resowner--1.0.sql',
+)
+
+tests += {
+ 'name': 'test_spi_resowner',
+ 'sd': meson.current_source_dir(),
+ 'bd': meson.current_build_dir(),
+ 'regress': {
+ 'sql': [
+ 'ri_fastpath',
+ ],
+ },
+}
diff --git a/src/test/modules/test_spi_resowner/sql/ri_fastpath.sql b/src/test/modules/test_spi_resowner/sql/ri_fastpath.sql
new file mode 100644
index 00000000000..4517b2437c4
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/sql/ri_fastpath.sql
@@ -0,0 +1,107 @@
+--
+-- Test RI fast-path FK check under C-level SPI.
+--
+-- The RI fast-path caches PK relation references in ri_FastPathGetEntry()
+-- under the current resource owner. When FK triggers fire inside a
+-- C-level SPI context that creates a dedicated short-lived resource owner,
+-- those references must be released before the inner resource owner is
+-- released. The fix ensures batch callbacks fire at the same firing depth
+-- at which they were registered, while the corresponding resource owner
+-- is still alive. Without this, ri_FastPathTeardown would crash with
+-- Assert(rel->rd_refcnt > 0) in index_close.
+--
+-- Simple PL/pgSQL does not trigger this because its SPI connection spans
+-- the entire function call, so its resource owner outlives the batch
+-- callback. The critical test case requires a C function that creates a
+-- dedicated short-lived resource owner around its SPI call.
+--
+CREATE EXTENSION test_spi_resowner;
+
+CREATE TABLE ri_fp_pk1 (id serial PRIMARY KEY);
+CREATE TABLE ri_fp_pk2 (id serial PRIMARY KEY);
+CREATE TABLE ri_fp_pk3 (id serial PRIMARY KEY);
+INSERT INTO ri_fp_pk1 VALUES (1);
+INSERT INTO ri_fp_pk2 VALUES (1);
+INSERT INTO ri_fp_pk3 VALUES (1);
+
+CREATE TABLE ri_fp_fk (
+ id serial PRIMARY KEY,
+ a int REFERENCES ri_fp_pk1(id),
+ b int REFERENCES ri_fp_pk2(id),
+ c int REFERENCES ri_fp_pk3(id),
+ d int REFERENCES ri_fp_pk1(id),
+ e int REFERENCES ri_fp_pk2(id),
+ f int REFERENCES ri_fp_pk3(id)
+);
+
+-- C-level SPI INSERT: the critical test case.
+-- Without the fix this crashes the server.
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (1, 1, 1, 1, 1, 1)');
+
+-- Additional C-level SPI INSERTs to exercise batch reuse across calls.
+-- Use different column orderings to ensure each is a distinct statement.
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (f, e, d, c, b, a) VALUES (1, 1, 1, 1, 1, 1)');
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, c, e, b, d, f) VALUES (1, 1, 1, 1, 1, 1)');
+
+-- C-level SPI with FK violation: should error, not crash
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (999, 1, 1, 1, 1, 1)');
+
+-- Nested: PL/pgSQL calling C SPI (mimics PostGIS toTopoGeom pattern)
+CREATE FUNCTION plpgsql_calls_c_spi() RETURNS void AS $$
+DECLARE
+ ins_stmt text := 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (1, 1, 1, 1, 1, 1)';
+BEGIN
+ PERFORM spi_exec_sql(ins_stmt);
+END;
+$$ LANGUAGE plpgsql;
+
+SELECT plpgsql_calls_c_spi();
+
+-- AFTER trigger that uses C-level SPI to insert into an FK-referencing table.
+-- The FK batch callback is registered at the inner SPI's firing depth and
+-- must fire before the inner resource owner is released. This exercises
+-- the depth-matched callback firing introduced to fix that crash.
+CREATE TABLE ri_fp_outer (id int PRIMARY KEY);
+CREATE TABLE ri_fp_inner (id int REFERENCES ri_fp_pk1(id));
+
+CREATE FUNCTION outer_trigger_spi_ok() RETURNS trigger AS $$
+BEGIN
+ PERFORM spi_exec_sql('INSERT INTO ri_fp_inner VALUES (1)');
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+
+CREATE TRIGGER outer_tg AFTER INSERT ON ri_fp_outer
+ FOR EACH ROW EXECUTE FUNCTION outer_trigger_spi_ok();
+
+-- Fires outer_tg, whose PL/pgSQL body calls spi_exec_sql(). The C function
+-- creates a dedicated resource owner; the FK batch callback fires at the
+-- inner SPI's firing depth before that resource owner is released.
+INSERT INTO ri_fp_outer VALUES (1);
+
+CREATE FUNCTION outer_trigger_spi_fail() RETURNS trigger AS $$
+BEGIN
+ PERFORM spi_exec_sql('INSERT INTO ri_fp_inner VALUES (3)');
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+
+DROP TRIGGER outer_tg ON ri_fp_outer;
+DROP FUNCTION outer_trigger_spi_ok();
+
+CREATE TRIGGER outer_tg AFTER INSERT ON ri_fp_outer
+ FOR EACH ROW EXECUTE FUNCTION outer_trigger_spi_fail();
+
+-- Like above but the inner insert fails.
+INSERT INTO ri_fp_outer VALUES (2);
+
+DROP TRIGGER outer_tg ON ri_fp_outer;
+DROP FUNCTION outer_trigger_spi_fail();
+DROP TABLE ri_fp_inner, ri_fp_outer;
+
+-- Cleanup
+DROP TABLE ri_fp_fk;
+DROP TABLE ri_fp_pk3, ri_fp_pk2, ri_fp_pk1;
+DROP EXTENSION test_spi_resowner;
diff --git a/src/test/modules/test_spi_resowner/test_spi_resowner--1.0.sql b/src/test/modules/test_spi_resowner/test_spi_resowner--1.0.sql
new file mode 100644
index 00000000000..29ef70ee0dc
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/test_spi_resowner--1.0.sql
@@ -0,0 +1,9 @@
+/* src/test/modules/test_spi_resowner/test_spi_resowner--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_spi_resowner" to load this file. \quit
+
+CREATE FUNCTION spi_exec_sql(query text)
+RETURNS void
+AS 'MODULE_PATHNAME', 'spi_exec_sql'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_spi_resowner/test_spi_resowner.c b/src/test/modules/test_spi_resowner/test_spi_resowner.c
new file mode 100644
index 00000000000..0306139b5c0
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/test_spi_resowner.c
@@ -0,0 +1,70 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_spi_resowner.c
+ * SQL-callable C function that uses SPI to execute a query.
+ *
+ * Useful for testing code paths that only trigger under C-level
+ * SPI (not PL/pgSQL), such as resource owner interactions with
+ * RI fast-path FK checks.
+ *
+ * Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/test_spi_resowner/test_spi_resowner.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "executor/spi.h"
+#include "utils/builtins.h"
+
+PG_MODULE_MAGIC;
+
+PG_FUNCTION_INFO_V1(spi_exec_sql);
+
+/*
+ * spi_exec_sql(query text) - execute a SQL query via SPI.
+ *
+ * Opens a fresh SPI connection, executes the query, and closes the
+ * connection. Creates a dedicated child resource owner around the
+ * SPI_execute call and releases it before returning, ensuring that
+ * any resources registered under it (such as relation references
+ * opened by RI fast-path FK checks) are released before the outer
+ * trigger-firing batch callback fires. This reproduces the resource
+ * owner mismatch that occurs with C-language extensions like PostGIS
+ * topology functions, which cannot be triggered from PL/pgSQL since
+ * PL/pgSQL's SPI connection spans the entire function call.
+ */
+Datum
+spi_exec_sql(PG_FUNCTION_ARGS)
+{
+ const char *query = text_to_cstring(PG_GETARG_TEXT_PP(0));
+ int ret;
+ ResourceOwner save = CurrentResourceOwner;
+ ResourceOwner childowner = ResourceOwnerCreate(save, "test_spi inner");
+
+ SPI_connect();
+
+ CurrentResourceOwner = childowner;
+ ret = SPI_execute(query, false, 0);
+
+ if (ret < 0)
+ elog(ERROR, "SPI_execute failed: error code %d", ret);
+
+ SPI_finish();
+
+ CurrentResourceOwner = save;
+ ResourceOwnerRelease(childowner,
+ RESOURCE_RELEASE_BEFORE_LOCKS,
+ true, false);
+ ResourceOwnerRelease(childowner,
+ RESOURCE_RELEASE_LOCKS,
+ true, false);
+ ResourceOwnerRelease(childowner,
+ RESOURCE_RELEASE_AFTER_LOCKS,
+ true, false);
+ ResourceOwnerDelete(childowner);
+
+ PG_RETURN_VOID();
+}
diff --git a/src/test/modules/test_spi_resowner/test_spi_resowner.control b/src/test/modules/test_spi_resowner/test_spi_resowner.control
new file mode 100644
index 00000000000..2120ae9442f
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/test_spi_resowner.control
@@ -0,0 +1,4 @@
+comment = 'Test SQL-callable C function that uses SPI using dedicated ResourceOwner'
+default_version = '1.0'
+module_pathname = '$libdir/test_spi_resowner'
+relocatable = true
--
2.47.3
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Chao Li @ 2026-04-09 07:39 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Evan Montgomery-Recht <[email protected]>; pgsql-hackers
> On Apr 8, 2026, at 22:26, Amit Langote <[email protected]> wrote:
>
> On Wed, Apr 8, 2026 at 6:58 PM Amit Langote <[email protected]> wrote:
>> On Wed, Apr 8, 2026 at 10:23 AM Amit Langote <[email protected]> wrote:
>>> On Tue, Apr 7, 2026 at 10:00 PM Evan Montgomery-Recht
>>> <[email protected]> wrote:
>>>> The patch also adds a test module (test_spi_func) with a C function
>>>> that executes SQL via SPI_connect/SPI_execute/SPI_finish, since this
>>>> crash cannot be triggered from PL/pgSQL. The test exercises the
>>>> C-level SPI INSERT with multiple FK constraints, FK violations, and
>>>> nested PL/pgSQL-calls-C-SPI (matching the PostGIS call pattern).
>>
>> I applied only the test module changes and it passes (without
>> crashing) even without your proposed fix. It seems that's because the
>> C function in test_spi_func calling SPI is using the same resource
>> owner as the parent SELECT. I think you'd need to create a resource
>> owner manually in the spi_exec() C function to reproduce the crash, as
>> done in the attached 0001, which contains the src/test changes
>> extracted from your patch modified as described, including renaming
>> the C function to spi_exec_sql().
>>
>> Also, the test cases that call spi_exec() (_sql()) directly from a
>> SELECT don't actually exercise the crash path because there is no
>> outer trigger-firing loop active. query_depth is 0 inside the inner
>> SPI's AfterTriggerEndQuery, so the old guard wouldn't suppress the
>> callback there anyway. The critical case requires spi_exec_sql() to be
>> called from inside an AFTER trigger, where query_depth > 0 causes the
>> guard to defer the callback past the inner resource owner's lifetime.
>> I've added that test case. I kept your original test cases as they
>> still provide useful coverage of C-level SPI FK behavior even if they
>> don't exercise the crash path specifically. Maybe your original
>> PostGIS test suite that hit the crash did have the right structure,
>> but that's not reflected in the patch as far as I can tell.
>>
>> I've also renamed the module to test_spi_resowner to better reflect
>> what it's about.
>>
>> For the fix, I have a different proposal. As you observed, the
>> query_depth > 0 early return in FireAfterTriggerBatchCallbacks() means
>> that the nested SPI's callbacks get called under the outer resource
>> owner, which may not be the same as the one that SPI used. I think it
>> was a mistake to have that early return in the first place. Instead we
>> could remember for each callback what firing level it should be called
>> at, so the nested SPI's callbacks fire before returning to the parent
>> level and parent-level callbacks fire when the parent level completes.
>> I have implemented that in the attached 0002 along with transaction
>> boundary cleanup of callbacks, which passes the check-world for me,
>> but I'll need to stare some more at it before committing.
>>
>> Let me know if this also fixes your own in-house test suite or if you
>> have any other suggestions or if you think I am missing something.
>
> One more cleanup patch attached as 0003: afterTriggerFiringDepth was
> added by commit 5c54c3ed1 as a file-static variable, which in
> hindsight should have been a field in AfterTriggersData alongside the
> other per-transaction after-trigger state. This patch makes that
> correction.
>
> One alternative design worth considering for 0002: storing
> batch_callbacks per query level in AfterTriggersQueryData rather than
> as a single list in AfterTriggersData, so callbacks naturally live at
> the query level where they were registered and get cleaned up with
> AfterTriggerFreeQuery on abort. Deferred constraints still need a
> top-level list in AfterTriggersData since they fire outside any query
> level. FireAfterTriggerBatchCallbacks() takes a list parameter and the
> caller passes either the query-level or top-level list as appropriate.
> This eliminates the need for firing_depth-matched firing entirely. I
> did that in 0004. I think I like it over 0002. Will look more
> closely tomorrow morning.
>
>
> --
> Thanks, Amit Langote
> <v3-0004-Store-batch-callbacks-at-the-appropriate-level-ra.patch><v3-0002-Fix-RI-fast-path-crash-under-nested-C-level-SPI.patch><v3-0003-Move-afterTriggerFiringDepth-into-AfterTriggersDa.patch><v3-0001-Modified-test-suite-from-Evan-s-patch.patch>
A few comments on v3:
1 - 0002
```
static void
FireAfterTriggerBatchCallbacks(void)
{
+ List *remaining = NIL;
+ List *to_fire = NIL;
ListCell *lc;
- if (afterTriggers.query_depth > 0)
- return;
+ /* remaining and to_fire lists must survive until callbacks complete */
+ MemoryContext oldcxt = MemoryContextSwitchTo(TopTransactionContext);
```
I think remaining and to_fire should live in the same memory context as afterTriggers.batch_callbacks, so instead of hard-coding TopTransactionContext, we could use GetMemoryChunkContext(afterTriggers.batch_callbacks), which makes the intention explicit.
2 - 0004, I noticed one potential problem, although I am not sure whether it can really happen in practice. This version stores callback items at the individual query depth, and FireAfterTriggerBatchCallbacks() now iterates the callback list for that depth and invokes each callback directly. My concern is that if one of those callbacks needs to register a new callback, that would append a new item to the same list while it is being iterated. That seems unsafe to me, because list append may create a new list structure underneath. If that happens, we may end up modifying the list being traversed, which does not look safe.
This problem doesn’t exist in 0002, because 0002 splits afterTriggers.batch_callbacks into remaining and to_fire, and resets afterTriggers.batch_callbacks = remaining before running callbacks. But 0002 has a different problem: if a callback registers a new callback, the new callback goes to afterTriggers.batch_callbacks, so it won’t get executed.
From this perspective, I would assume a callback should not be allowed to register a new callback. Can you please help confirm?
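To make the two scenarios above concrete, here is a minimal standalone C sketch. It does not use PostgreSQL's List API; CallbackList, register_callback() and fire_detached() are hypothetical names for a plain growable array, introduced only for illustration. It models the 0002 behavior as described above: the pending list is detached before firing, so a callback that registers another callback cannot disturb the iteration, but the new entry stays in the pending list and is not fired in that pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*batch_cb)(void *arg);

typedef struct CallbackItem
{
    batch_cb fn;
    void    *arg;
} CallbackItem;

/* Hypothetical growable array; a stand-in, not PostgreSQL's List. */
typedef struct CallbackList
{
    CallbackItem *items;
    size_t        len;
    size_t        cap;
} CallbackList;

static CallbackList pending = {NULL, 0, 0};
static int fired_count = 0;

/* Append to the pending list; realloc() may move the array. */
static void
register_callback(batch_cb fn, void *arg)
{
    if (pending.len == pending.cap)
    {
        size_t        newcap = pending.cap ? pending.cap * 2 : 1;
        CallbackItem *p = realloc(pending.items, newcap * sizeof(CallbackItem));

        assert(p != NULL);
        pending.items = p;
        pending.cap = newcap;
    }
    pending.items[pending.len].fn = fn;
    pending.items[pending.len].arg = arg;
    pending.len++;
}

/*
 * Detach the pending list, then fire the detached entries.  Anything
 * registered while firing lands in the fresh pending list and is NOT
 * fired by this call.  Returns the number of callbacks fired.
 */
static size_t
fire_detached(void)
{
    CallbackList to_fire = pending;
    size_t       i;

    pending.items = NULL;
    pending.len = 0;
    pending.cap = 0;

    for (i = 0; i < to_fire.len; i++)
        to_fire.items[i].fn(to_fire.items[i].arg);

    free(to_fire.items);
    return to_fire.len;
}

/* An ordinary callback. */
static void
count_cb(void *arg)
{
    (void) arg;
    fired_count++;
}

/* A callback that registers another callback while being fired. */
static void
reentrant_cb(void *arg)
{
    (void) arg;
    fired_count++;
    register_callback(count_cb, NULL);
}
```

In this model, a loop that walked pending.items by pointer instead of detaching it first would correspond to the 0004 concern: realloc() inside register_callback() may move the array while the loop still points into the old one.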
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Amit Langote @ 2026-04-09 08:40 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Evan Montgomery-Recht <[email protected]>; pgsql-hackers
Hi,
On Thu, Apr 9, 2026 at 4:40 PM Chao Li <[email protected]> wrote:
> A few comments on v3:
Thanks for the review.
> 1 - 0002
> ```
> static void
> FireAfterTriggerBatchCallbacks(void)
> {
> + List *remaining = NIL;
> + List *to_fire = NIL;
> ListCell *lc;
>
> - if (afterTriggers.query_depth > 0)
> - return;
> + /* remaining and to_fire lists must survive until callbacks complete */
> + MemoryContext oldcxt = MemoryContextSwitchTo(TopTransactionContext);
> ```
>
> I think remaining and to_fire should live in the same memory context as afterTriggers.batch_callbacks, so instead of hard-coding TopTransactionContext, we could use GetMemoryChunkContext(afterTriggers.batch_callbacks), which makes the intention explicit.
I'm dropping 0002, or rather I have merged 0004 into it, so this memory
context switch is no longer present.
> 2 - 0004, I noticed one potential problem, although I am not sure whether it can really happen in practice. This version stores callback items at the individual query depth, and FireAfterTriggerBatchCallbacks() now iterates the callback list for that depth and invokes each callback directly. My concern is that if one of those callbacks needs to register a new callback, that would append a new item to the same list while it is being iterated. That seems unsafe to me, because list append may create a new list structure underneath. If that happens, we may end up modifying the list being traversed, which does not look safe.
>
> This problem doesn’t exist in 0002, because 0002 splits afterTriggers.batch_callbacks into remaining and to_fire, and resets afterTriggers.batch_callbacks = remaining before running callbacks. But 0002 has a different problem: if a callback registers a new callback, the new callback goes to afterTriggers.batch_callbacks, so it won’t get executed.
>
> From this perspective, I would assume a callback should not be allowed to register a new callback. Can you please help confirm?
Good point on the re-entrant registration concern. I've added a
firing_batch_callbacks flag to AfterTriggersData that prevents
callbacks from registering new callbacks during
FireAfterTriggerBatchCallbacks(), with an Assert in
RegisterAfterTriggerBatchCallback() to enforce it. That should keep
the list being iterated from being modified.
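As a rough illustration of that guard, here is a standalone C sketch. It is a simplified model and not the patch itself: Registry, registry_add(), registry_fire() and Probe are hypothetical names, and the sketch reports a rejected registration through a return value so the behavior can be observed, whereas the patch enforces the rule with an Assert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLBACKS 8

typedef void (*batch_cb)(void *arg);

/* Hypothetical fixed-size registry with a "currently firing" flag. */
typedef struct Registry
{
    batch_cb fns[MAX_CALLBACKS];
    void    *args[MAX_CALLBACKS];
    size_t   len;
    bool     firing;    /* true while registry_fire() is iterating */
} Registry;

/* Refuse registration while the registry is being iterated. */
static bool
registry_add(Registry *r, batch_cb fn, void *arg)
{
    if (r->firing || r->len == MAX_CALLBACKS)
        return false;
    r->fns[r->len] = fn;
    r->args[r->len] = arg;
    r->len++;
    return true;
}

/* Fire every registered callback, then clear the registry. */
static void
registry_fire(Registry *r)
{
    size_t i;

    r->firing = true;
    for (i = 0; i < r->len; i++)
        r->fns[i](r->args[i]);
    r->firing = false;
    r->len = 0;
}

static void
noop_cb(void *arg)
{
    (void) arg;
}

/* Records whether a registration attempted during firing was accepted. */
typedef struct Probe
{
    Registry *registry;
    bool      accepted;
} Probe;

/* A misbehaving callback that tries to register while being fired. */
static void
reentrant_cb(void *arg)
{
    Probe *p = arg;

    p->accepted = registry_add(p->registry, noop_cb, NULL);
}
```

The sketch has no error path. In the backend a callback can throw an error and skip the line that clears the flag, which is why the patch also resets firing_batch_callbacks at transaction and subtransaction boundaries.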
The attached patches are updated accordingly. 0001 is the main fix
incorporating the per-query-level storage design, the transaction
boundary cleanup, and the firing_batch_callbacks guard. 0002 is a
followup that moves afterTriggerFiringDepth into AfterTriggersData as
a minor cleanup of 5c54c3ed1b9. Barring further feedback I plan to
commit 0001 and 0002 shortly. For 0003, I need to check on the policy
around adding new test modules during feature freeze before committing
it.
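For readers following the per-query-level storage design, the routing can be sketched in standalone C as below. This is a simplified model under stated assumptions, not the code in 0001: TriggerState, register_cb(), begin_query(), end_query() and fire_deferred() are hypothetical names, fixed-size arrays replace the real lists, and the firing_depth assertion is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH     4
#define MAX_CALLBACKS 4

typedef void (*batch_cb)(void *arg);

typedef struct CallbackSlot
{
    batch_cb fn;
    void    *arg;
} CallbackSlot;

typedef struct CallbackList
{
    CallbackSlot slots[MAX_CALLBACKS];
    size_t       len;
} CallbackList;

/* Hypothetical stand-in for the after-trigger state. */
typedef struct TriggerState
{
    int          query_depth;            /* -1 when outside any query */
    CallbackList query_level[MAX_DEPTH]; /* one list per query level */
    CallbackList top_level;              /* for deferred work */
} TriggerState;

static int    fire_log[8];
static size_t fire_log_len = 0;

static void
state_init(TriggerState *s)
{
    int i;

    s->query_depth = -1;
    s->top_level.len = 0;
    for (i = 0; i < MAX_DEPTH; i++)
        s->query_level[i].len = 0;
}

static void
list_add(CallbackList *l, batch_cb fn, void *arg)
{
    assert(l->len < MAX_CALLBACKS);
    l->slots[l->len].fn = fn;
    l->slots[l->len].arg = arg;
    l->len++;
}

static void
list_fire_and_clear(CallbackList *l)
{
    size_t i;

    for (i = 0; i < l->len; i++)
        l->slots[i].fn(l->slots[i].arg);
    l->len = 0;
}

static void
begin_query(TriggerState *s)
{
    assert(s->query_depth + 1 < MAX_DEPTH);
    s->query_depth++;
    s->query_level[s->query_depth].len = 0;
}

/* Route to the current query level if there is one, else to the top level. */
static void
register_cb(TriggerState *s, batch_cb fn, void *arg)
{
    if (s->query_depth >= 0)
        list_add(&s->query_level[s->query_depth], fn, arg);
    else
        list_add(&s->top_level, fn, arg);
}

/* Fire only the callbacks registered at the level that is ending. */
static void
end_query(TriggerState *s)
{
    assert(s->query_depth >= 0);
    list_fire_and_clear(&s->query_level[s->query_depth]);
    s->query_depth--;
}

static void
fire_deferred(TriggerState *s)
{
    list_fire_and_clear(&s->top_level);
}

/* Appends the integer behind arg to fire_log, to record firing order. */
static void
log_cb(void *arg)
{
    assert(fire_log_len < 8);
    fire_log[fire_log_len++] = *(int *) arg;
}
```

In this model a callback registered by a nested query fires when that nested level ends, before control returns to the outer level, and the outer level's callbacks fire when the outer level ends. No depth check is needed at firing time because each list already belongs to exactly one level.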
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v4-0002-Move-afterTriggerFiringDepth-into-AfterTriggersDa.patch (5.6K, 2-v4-0002-Move-afterTriggerFiringDepth-into-AfterTriggersDa.patch)
download | inline diff:
From 4bd1ded2c80d1c1294af5e6a190debafd4866ceb Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 9 Apr 2026 13:46:45 +0900
Subject: [PATCH v4 2/3] Move afterTriggerFiringDepth into AfterTriggersData
The static variable afterTriggerFiringDepth introduced by commit
5c54c3ed1b9 is logically part of the after-trigger state. Move it
into AfterTriggersData as firing_depth, alongside query_depth and
the other per-transaction after-trigger state. Also update its
comment to accurately reflect its sole remaining purpose: signaling
to AfterTriggerIsActive() that after-trigger firing is active.
Discussion: https://postgr.es/m/CA+HiwqFt4NGTNk7BinOsHHM48E9zGAa852vCfGoSe1bbL=JNFQ@mail.gmail.com
---
src/backend/commands/trigger.c | 36 +++++++++++++++++-----------------
1 file changed, 18 insertions(+), 18 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 9c6125623e0..28187fe8c06 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3898,6 +3898,13 @@ typedef struct AfterTriggersData
* for deferred constraints */
bool firing_batch_callbacks; /* true when in
* FireAfterTriggerBatchCallbacks() */
+
+ /*
+ * Incremented around the trigger-firing loops in AfterTriggerEndQuery,
+ * AfterTriggerFireDeferred, and AfterTriggerSetState. Used by
+ * AfterTriggerIsActive() to signal that after-trigger firing is active.
+ */
+ int firing_depth;
} AfterTriggersData;
struct AfterTriggersQueryData
@@ -3944,13 +3951,6 @@ typedef struct AfterTriggerCallbackItem
static AfterTriggersData afterTriggers;
-/*
- * Incremented before invoking afterTriggerInvokeEvents(). Used by
- * AfterTriggerIsActive() to determine whether batch callbacks will fire,
- * so that RI trigger functions can take the batched fast path.
- */
-static int afterTriggerFiringDepth = 0;
-
static void AfterTriggerExecute(EState *estate,
AfterTriggerEvent event,
ResultRelInfo *relInfo,
@@ -5110,6 +5110,7 @@ AfterTriggerBeginXact(void)
*/
afterTriggers.firing_counter = (CommandId) 1; /* mustn't be 0 */
afterTriggers.query_depth = -1;
+ afterTriggers.firing_depth = 0;
afterTriggers.batch_callbacks = NIL;
afterTriggers.firing_batch_callbacks = false;
@@ -5125,7 +5126,6 @@ AfterTriggerBeginXact(void)
Assert(afterTriggers.events.head == NULL);
Assert(afterTriggers.trans_stack == NULL);
Assert(afterTriggers.maxtransdepth == 0);
- Assert(afterTriggerFiringDepth == 0);
}
@@ -5197,7 +5197,7 @@ AfterTriggerEndQuery(EState *estate)
*/
qs = &afterTriggers.query_stack[afterTriggers.query_depth];
- afterTriggerFiringDepth++;
+ afterTriggers.firing_depth++;
for (;;)
{
if (afterTriggerMarkEvents(&qs->events, &afterTriggers.events, true))
@@ -5246,7 +5246,7 @@ AfterTriggerEndQuery(EState *estate)
AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
afterTriggers.query_depth--;
- afterTriggerFiringDepth--;
+ afterTriggers.firing_depth--;
}
@@ -5345,7 +5345,7 @@ AfterTriggerFireDeferred(void)
* Run all the remaining triggers. Loop until they are all gone, in case
* some trigger queues more for us to do.
*/
- afterTriggerFiringDepth++;
+ afterTriggers.firing_depth++;
while (afterTriggerMarkEvents(events, NULL, false))
{
CommandId firing_id = afterTriggers.firing_counter++;
@@ -5357,7 +5357,7 @@ AfterTriggerFireDeferred(void)
/* Flush any fast-path batches accumulated by the triggers just fired. */
FireAfterTriggerBatchCallbacks(afterTriggers.batch_callbacks);
- afterTriggerFiringDepth--;
+ afterTriggers.firing_depth--;
/*
* We don't bother freeing the event list or batch_callbacks, since
@@ -5425,7 +5425,7 @@ AfterTriggerEndXact(bool isCommit)
/* No more afterTriggers manipulation until next transaction starts. */
afterTriggers.query_depth = -1;
- afterTriggerFiringDepth = 0;
+ afterTriggers.firing_depth = 0;
list_free_deep(afterTriggers.batch_callbacks);
afterTriggers.batch_callbacks = NIL;
@@ -6083,7 +6083,7 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
AfterTriggerEventList *events = &afterTriggers.events;
bool snapshot_set = false;
- afterTriggerFiringDepth++;
+ afterTriggers.firing_depth++;
while (afterTriggerMarkEvents(events, NULL, true))
{
CommandId firing_id = afterTriggers.firing_counter++;
@@ -6117,7 +6117,7 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
* Flush any fast-path batches accumulated by the triggers just fired.
*/
FireAfterTriggerBatchCallbacks(afterTriggers.batch_callbacks);
- afterTriggerFiringDepth--;
+ afterTriggers.firing_depth--;
list_free_deep(afterTriggers.batch_callbacks);
afterTriggers.batch_callbacks = NIL;
@@ -6843,7 +6843,7 @@ RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
* Must be called while afterTriggers is active; callbacks registered
* outside a trigger-firing context would never fire.
*/
- Assert(afterTriggerFiringDepth > 0);
+ Assert(afterTriggers.firing_depth > 0);
Assert(!afterTriggers.firing_batch_callbacks);
oldcxt = MemoryContextSwitchTo(TopTransactionContext);
item = palloc(sizeof(AfterTriggerCallbackItem));
@@ -6874,7 +6874,7 @@ FireAfterTriggerBatchCallbacks(List *callbacks)
{
ListCell *lc;
- Assert(afterTriggerFiringDepth > 0);
+ Assert(afterTriggers.firing_depth > 0);
afterTriggers.firing_batch_callbacks = true;
foreach(lc, callbacks)
{
@@ -6896,5 +6896,5 @@ FireAfterTriggerBatchCallbacks(List *callbacks)
bool
AfterTriggerIsActive(void)
{
- return afterTriggerFiringDepth > 0;
+ return afterTriggers.firing_depth > 0;
}
--
2.47.3
[application/octet-stream] v4-0001-Fix-RI-fast-path-crash-under-nested-C-level-SPI.patch (9.3K, 3-v4-0001-Fix-RI-fast-path-crash-under-nested-C-level-SPI.patch)
download | inline diff:
From 2343a90020cf2445dd574d7ca43ea4d460820a74 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 9 Apr 2026 17:39:25 +0900
Subject: [PATCH v4 1/3] Fix RI fast-path crash under nested C-level SPI
When a C-language function uses SPI_connect/SPI_execute/SPI_finish to
INSERT into a table with FK constraints, the FK AFTER triggers
register ri_FastPathEndBatch as a batch callback and open PK relations
under the SPI portal's resource owner. FireAfterTriggerBatchCallbacks
was suppressed at that point by the query_depth > 0 guard, deferring
teardown to the outer query's AfterTriggerEndQuery. By then the SPI
portal's resource owner had been released, decrementing the cached
relations' refcounts to zero. ri_FastPathTeardown then crashed
attempting to close them:
TRAP: failed Assert("rel->rd_refcnt > 0")
Fix by storing batch callbacks at the level where they should fire:
in AfterTriggersQueryData.batch_callbacks for immediate constraints
(fired by AfterTriggerEndQuery) and in AfterTriggersData.batch_callbacks
for deferred constraints (fired by AfterTriggerFireDeferred and
AfterTriggerSetState). RegisterAfterTriggerBatchCallback() routes the
callback to the current query-level list when query_depth >= 0, and to
the top-level list otherwise. FireAfterTriggerBatchCallbacks() takes a
list parameter and simply iterates and invokes it; memory cleanup is
handled by the caller. This replaces the query_depth > 0 guard and
the firing_depth field in AfterTriggerCallbackItem with natural
list-level scoping. The firing_depth counter in AfterTriggersData is
retained solely for AfterTriggerIsActive().
Also add firing_batch_callbacks to AfterTriggersData to detect and
prevent re-entrant callback registration during
FireAfterTriggerBatchCallbacks(), which would be unsafe as it could
modify the list being iterated. The flag is reset at transaction and
subtransaction boundaries to handle cases where an error thrown by a
callback is caught and the subtransaction is rolled back.
While at it, ensure callbacks are properly accounted for at all
transaction boundaries, as cleanup of b7b27eb41a5c: discard any
remaining top-level callbacks on both commit and abort in
AfterTriggerEndXact(), and clean up query-level callbacks in
AfterTriggerFreeQuery().
Note that ri_PerformCheck() calls SPI with fire_triggers=false, which
skips AfterTriggerBeginQuery/EndQuery for that SPI command. Any
callbacks registered by triggers fired during that SPI command land at
the outer query's level and fire when the outer query completes, which
is the correct behavior.
Reported-by: Evan Montgomery-Recht <[email protected]>
Analyzed-by: Evan Montgomery-Recht <[email protected]>
Author: Amit Langote <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/CAEg7pwcKf01FmDqFAf-Hzu_pYnMYScY_Otid-pe9uw3BJ6gq9g@mail.gmail.com
---
src/backend/commands/trigger.c | 70 ++++++++++++++++++++++------------
1 file changed, 45 insertions(+), 25 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index c41005ba44e..9c6125623e0 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3894,7 +3894,10 @@ typedef struct AfterTriggersData
AfterTriggersTransData *trans_stack; /* array of structs shown below */
int maxtransdepth; /* allocated len of above array */
- List *batch_callbacks; /* List of AfterTriggerCallbackItem */
+ List *batch_callbacks; /* List of AfterTriggerCallbackItem;
+ * for deferred constraints */
+ bool firing_batch_callbacks; /* true when in
+ * FireAfterTriggerBatchCallbacks() */
} AfterTriggersData;
struct AfterTriggersQueryData
@@ -3902,6 +3905,7 @@ struct AfterTriggersQueryData
AfterTriggerEventList events; /* events pending from this query */
Tuplestorestate *fdw_tuplestore; /* foreign tuples for said events */
List *tables; /* list of AfterTriggersTableData, see below */
+ List *batch_callbacks; /* List of AfterTriggerCallbackItem */
};
struct AfterTriggersTransData
@@ -3980,7 +3984,7 @@ static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
Oid tgoid, bool tgisdeferred);
static void cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent);
-static void FireAfterTriggerBatchCallbacks(void);
+static void FireAfterTriggerBatchCallbacks(List *callbacks);
/*
* Get the FDW tuplestore for the current trigger query level, creating it
@@ -5107,6 +5111,7 @@ AfterTriggerBeginXact(void)
afterTriggers.firing_counter = (CommandId) 1; /* mustn't be 0 */
afterTriggers.query_depth = -1;
afterTriggers.batch_callbacks = NIL;
+ afterTriggers.firing_batch_callbacks = false;
/*
* Verify that there is no leftover state remaining. If these assertions
@@ -5233,11 +5238,9 @@ AfterTriggerEndQuery(EState *estate)
/*
* Fire batch callbacks before releasing query-level storage and before
* decrementing query_depth. Callbacks may do real work (index probes,
- * error reporting) and rely on query_depth still reflecting the current
- * batch level so that nested calls from SPI inside AFTER triggers are
- * correctly suppressed by FireAfterTriggerBatchCallbacks's depth guard.
+ * error reporting).
*/
- FireAfterTriggerBatchCallbacks();
+ FireAfterTriggerBatchCallbacks(qs->batch_callbacks);
/* Release query-level-local storage, including tuplestores if any */
AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
@@ -5300,6 +5303,9 @@ AfterTriggerFreeQuery(AfterTriggersQueryData *qs)
*/
qs->tables = NIL;
list_free_deep(tables);
+
+ list_free_deep(qs->batch_callbacks);
+ qs->batch_callbacks = NIL;
}
@@ -5349,13 +5355,14 @@ AfterTriggerFireDeferred(void)
}
/* Flush any fast-path batches accumulated by the triggers just fired. */
- FireAfterTriggerBatchCallbacks();
+ FireAfterTriggerBatchCallbacks(afterTriggers.batch_callbacks);
afterTriggerFiringDepth--;
/*
- * We don't bother freeing the event list, since it will go away anyway
- * (and more efficiently than via pfree) in AfterTriggerEndXact.
+ * We don't bother freeing the event list or batch_callbacks, since
+ * they will go away anyway (and more efficiently than via pfree) in
+ * AfterTriggerEndXact.
*/
if (snap_pushed)
@@ -5419,6 +5426,10 @@ AfterTriggerEndXact(bool isCommit)
afterTriggers.query_depth = -1;
afterTriggerFiringDepth = 0;
+
+ list_free_deep(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = NIL;
+ afterTriggers.firing_batch_callbacks = false;
}
/*
@@ -5565,6 +5576,9 @@ AfterTriggerEndSubXact(bool isCommit)
}
}
}
+
+ /* Reset in case a callback threw an error while firing. */
+ afterTriggers.firing_batch_callbacks = false;
}
/*
@@ -5719,6 +5733,7 @@ AfterTriggerEnlargeQueryState(void)
qs->events.tailfree = NULL;
qs->fdw_tuplestore = NULL;
qs->tables = NIL;
+ qs->batch_callbacks = NIL;
++init_depth;
}
@@ -6101,8 +6116,10 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
/*
* Flush any fast-path batches accumulated by the triggers just fired.
*/
- FireAfterTriggerBatchCallbacks();
+ FireAfterTriggerBatchCallbacks(afterTriggers.batch_callbacks);
afterTriggerFiringDepth--;
+ list_free_deep(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = NIL;
if (snapshot_set)
PopActiveSnapshot();
@@ -6827,42 +6844,45 @@ RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
* outside a trigger-firing context would never fire.
*/
Assert(afterTriggerFiringDepth > 0);
+ Assert(!afterTriggers.firing_batch_callbacks);
oldcxt = MemoryContextSwitchTo(TopTransactionContext);
item = palloc(sizeof(AfterTriggerCallbackItem));
item->callback = callback;
item->arg = arg;
- afterTriggers.batch_callbacks =
- lappend(afterTriggers.batch_callbacks, item);
+ if (afterTriggers.query_depth >= 0)
+ {
+ AfterTriggersQueryData *qs =
+ &afterTriggers.query_stack[afterTriggers.query_depth];
+ qs->batch_callbacks = lappend(qs->batch_callbacks, item);
+ }
+ else
+ afterTriggers.batch_callbacks =
+ lappend(afterTriggers.batch_callbacks, item);
MemoryContextSwitchTo(oldcxt);
}
/*
* FireAfterTriggerBatchCallbacks
- * Invoke and clear all registered batch callbacks.
+ * Invoke all callbacks in the given list.
*
- * Only fires at the outermost query level (query_depth == 0) or from
- * top-level operations (query_depth == -1, e.g. AfterTriggerFireDeferred
- * at COMMIT). Nested queries from SPI inside AFTER triggers run at
- * depth > 0 and must not tear down resources the outer batch still needs.
+ * Memory cleanup of the list and its items is handled by the caller
+ * (AfterTriggerFreeQuery for query-level callbacks, AfterTriggerEndXact
+ * for top-level deferred callbacks).
*/
static void
-FireAfterTriggerBatchCallbacks(void)
+FireAfterTriggerBatchCallbacks(List *callbacks)
{
ListCell *lc;
- if (afterTriggers.query_depth > 0)
- return;
-
Assert(afterTriggerFiringDepth > 0);
- foreach(lc, afterTriggers.batch_callbacks)
+ afterTriggers.firing_batch_callbacks = true;
+ foreach(lc, callbacks)
{
AfterTriggerCallbackItem *item = lfirst(lc);
item->callback(item->arg);
}
-
- list_free_deep(afterTriggers.batch_callbacks);
- afterTriggers.batch_callbacks = NIL;
+ afterTriggers.firing_batch_callbacks = false;
}
/*
--
2.47.3
[application/octet-stream] v4-0003-Add-test-module-for-RI-fast-path-FK-checks-under-.patch (17.5K, 4-v4-0003-Add-test-module-for-RI-fast-path-FK-checks-under-.patch)
download | inline diff:
From 0cfd3e2ab44bcd5fcddad3173e37de62bfd1a842 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 9 Apr 2026 14:44:51 +0900
Subject: [PATCH v4 3/3] Add test module for RI fast-path FK checks under
C-level SPI
Add test_spi_resowner, a test module providing a SQL-callable C function
that executes SQL via SPI with a dedicated short-lived resource owner.
This reproduces the crash scenario fixed by the previous commit that
cannot be triggered from PL/pgSQL, since PL/pgSQL's SPI connection spans
the entire function call and its resource owner outlives the batch
callback.
The critical test case calls spi_exec_sql() from inside an AFTER trigger,
where the FK checks fire under a nested SPI context while the outer
trigger-firing loop is active. The dedicated resource owner ensures it is
released before the outer batch callback fires, reproducing the resource
owner mismatch that previously caused a crash. Additional test cases
exercise multiple FK constraints, FK violations, and PL/pgSQL calling the
C SPI function, matching the PostGIS toTopoGeom() call pattern reported
by Evan Montgomery-Recht.
Reported-by: Evan Montgomery-Recht <[email protected]>
Author: Evan Montgomery-Recht <[email protected]>
Co-authored-by: Amit Langote <[email protected]>
Discussion: https://postgr.es/m/CAEg7pwcKf01FmDqFAf-Hzu_pYnMYScY_Otid-pe9uw3BJ6gq9g@mail.gmail.com
---
src/test/modules/Makefile | 1 +
src/test/modules/meson.build | 1 +
src/test/modules/test_spi_resowner/Makefile | 23 ++++
.../expected/ri_fastpath.out | 116 ++++++++++++++++++
.../modules/test_spi_resowner/meson.build | 31 +++++
.../test_spi_resowner/sql/ri_fastpath.sql | 105 ++++++++++++++++
.../test_spi_resowner--1.0.sql | 9 ++
.../test_spi_resowner/test_spi_resowner.c | 70 +++++++++++
.../test_spi_resowner.control | 4 +
9 files changed, 360 insertions(+)
create mode 100644 src/test/modules/test_spi_resowner/Makefile
create mode 100644 src/test/modules/test_spi_resowner/expected/ri_fastpath.out
create mode 100644 src/test/modules/test_spi_resowner/meson.build
create mode 100644 src/test/modules/test_spi_resowner/sql/ri_fastpath.sql
create mode 100644 src/test/modules/test_spi_resowner/test_spi_resowner--1.0.sql
create mode 100644 src/test/modules/test_spi_resowner/test_spi_resowner.c
create mode 100644 src/test/modules/test_spi_resowner/test_spi_resowner.control
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 0a74ab5c86f..016b328c8c5 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -52,6 +52,7 @@ SUBDIRS = \
test_shmem \
test_shm_mq \
test_slru \
+ test_spi_resowner \
test_tidstore \
unsafe_tests \
worker_spi \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 4bca42bb370..3ca454064d0 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -53,6 +53,7 @@ subdir('test_saslprep')
subdir('test_shmem')
subdir('test_shm_mq')
subdir('test_slru')
+subdir('test_spi_resowner')
subdir('test_tidstore')
subdir('typcache')
subdir('unsafe_tests')
diff --git a/src/test/modules/test_spi_resowner/Makefile b/src/test/modules/test_spi_resowner/Makefile
new file mode 100644
index 00000000000..5a69e3a3c42
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_spi_resowner/Makefile
+
+MODULE_big = test_spi_resowner
+OBJS = \
+ $(WIN32RES) \
+ test_spi_resowner.o
+PGFILEDESC = "test_spi_resowner - SQL-callable C SPI function under a dedicated ResourceOwner"
+
+EXTENSION = test_spi_resowner
+DATA = test_spi_resowner--1.0.sql
+
+REGRESS = ri_fastpath
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_spi_resowner
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_spi_resowner/expected/ri_fastpath.out b/src/test/modules/test_spi_resowner/expected/ri_fastpath.out
new file mode 100644
index 00000000000..03984ca892e
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/expected/ri_fastpath.out
@@ -0,0 +1,116 @@
+--
+-- Test RI fast-path FK check under C-level SPI.
+--
+-- The RI fast-path caches PK relation references in ri_FastPathGetEntry()
+-- under the current resource owner. When FK triggers fire inside a
+-- C-level SPI context that creates a dedicated short-lived resource owner,
+-- those references must be released before the inner resource owner is
+-- released. The fix ensures batch callbacks fire at the same firing depth
+-- at which they were registered, while the corresponding resource owner
+-- is still alive. Without this, ri_FastPathTeardown would crash with
+-- Assert(rel->rd_refcnt > 0) in index_close.
+--
+-- Simple PL/pgSQL does not trigger this because its SPI connection spans
+-- the entire function call, so its resource owner outlives the batch
+-- callback. The critical test case requires a C function that creates a
+-- dedicated short-lived resource owner around its SPI call.
+--
+CREATE EXTENSION test_spi_resowner;
+CREATE TABLE ri_fp_pk1 (id serial PRIMARY KEY);
+CREATE TABLE ri_fp_pk2 (id serial PRIMARY KEY);
+CREATE TABLE ri_fp_pk3 (id serial PRIMARY KEY);
+INSERT INTO ri_fp_pk1 VALUES (1);
+INSERT INTO ri_fp_pk2 VALUES (1);
+INSERT INTO ri_fp_pk3 VALUES (1);
+CREATE TABLE ri_fp_fk (
+ id serial PRIMARY KEY,
+ a int REFERENCES ri_fp_pk1(id),
+ b int REFERENCES ri_fp_pk2(id),
+ c int REFERENCES ri_fp_pk3(id),
+ d int REFERENCES ri_fp_pk1(id),
+ e int REFERENCES ri_fp_pk2(id),
+ f int REFERENCES ri_fp_pk3(id)
+);
+-- C-level SPI INSERT: the critical test case.
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (1, 1, 1, 1, 1, 1)');
+ spi_exec_sql
+--------------
+
+(1 row)
+
+-- Additional C-level SPI INSERTs to exercise batch reuse across calls.
+-- Use different column orderings to ensure each is a distinct statement.
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (f, e, d, c, b, a) VALUES (1, 1, 1, 1, 1, 1)');
+ spi_exec_sql
+--------------
+
+(1 row)
+
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, c, e, b, d, f) VALUES (1, 1, 1, 1, 1, 1)');
+ spi_exec_sql
+--------------
+
+(1 row)
+
+-- C-level SPI with FK violation: should error
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (999, 1, 1, 1, 1, 1)');
+ERROR: insert or update on table "ri_fp_fk" violates foreign key constraint "ri_fp_fk_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "ri_fp_pk1".
+CONTEXT: SQL statement "INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (999, 1, 1, 1, 1, 1)"
+-- Nested: PL/pgSQL calling C SPI (mimics PostGIS toTopoGeom pattern)
+CREATE FUNCTION plpgsql_calls_c_spi() RETURNS void AS $$
+DECLARE
+ ins_stmt text := 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (1, 1, 1, 1, 1, 1)';
+BEGIN
+ PERFORM spi_exec_sql(ins_stmt);
+END;
+$$ LANGUAGE plpgsql;
+SELECT plpgsql_calls_c_spi();
+ plpgsql_calls_c_spi
+---------------------
+
+(1 row)
+
+-- AFTER trigger that uses C-level SPI to insert into an FK-referencing table.
+-- The FK batch callback is registered at the inner SPI's query level and
+-- must fire before the inner resource owner is released.
+CREATE TABLE ri_fp_outer (id int PRIMARY KEY);
+CREATE TABLE ri_fp_inner (id int REFERENCES ri_fp_pk1(id));
+CREATE FUNCTION outer_trigger_spi_ok() RETURNS trigger AS $$
+BEGIN
+ PERFORM spi_exec_sql('INSERT INTO ri_fp_inner VALUES (1)');
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+CREATE TRIGGER outer_tg AFTER INSERT ON ri_fp_outer
+ FOR EACH ROW EXECUTE FUNCTION outer_trigger_spi_ok();
+-- Fires outer_tg, whose PL/pgSQL body calls spi_exec_sql(). The C function
+-- creates a dedicated resource owner that is released after the FK batch
+-- callback fires.
+INSERT INTO ri_fp_outer VALUES (1);
+CREATE FUNCTION outer_trigger_spi_fail() RETURNS trigger AS $$
+BEGIN
+ PERFORM spi_exec_sql('INSERT INTO ri_fp_inner VALUES (3)');
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+DROP TRIGGER outer_tg ON ri_fp_outer;
+DROP FUNCTION outer_trigger_spi_ok();
+CREATE TRIGGER outer_tg AFTER INSERT ON ri_fp_outer
+ FOR EACH ROW EXECUTE FUNCTION outer_trigger_spi_fail();
+-- Like above but the inner insert fails.
+INSERT INTO ri_fp_outer VALUES (2);
+ERROR: insert or update on table "ri_fp_inner" violates foreign key constraint "ri_fp_inner_id_fkey"
+DETAIL: Key (id)=(3) is not present in table "ri_fp_pk1".
+CONTEXT: SQL statement "INSERT INTO ri_fp_inner VALUES (3)"
+SQL statement "SELECT spi_exec_sql('INSERT INTO ri_fp_inner VALUES (3)')"
+PL/pgSQL function outer_trigger_spi_fail() line 3 at PERFORM
+DROP TRIGGER outer_tg ON ri_fp_outer;
+DROP FUNCTION outer_trigger_spi_fail();
+DROP TABLE ri_fp_inner, ri_fp_outer;
+-- Cleanup
+DROP TABLE ri_fp_fk;
+DROP TABLE ri_fp_pk3, ri_fp_pk2, ri_fp_pk1;
+DROP EXTENSION test_spi_resowner;
diff --git a/src/test/modules/test_spi_resowner/meson.build b/src/test/modules/test_spi_resowner/meson.build
new file mode 100644
index 00000000000..fbb027e05c7
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/meson.build
@@ -0,0 +1,31 @@
+test_spi_resowner_sources = files(
+ 'test_spi_resowner.c',
+)
+
+if host_system == 'windows'
+ test_spi_resowner_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+ '--NAME', 'test_spi_resowner',
+ '--FILEDESC', 'test_spi_resowner - SQL-callable C SPI function under a dedicated ResourceOwner',])
+endif
+
+test_spi_resowner = shared_module('test_spi_resowner',
+ test_spi_resowner_sources,
+ kwargs: pg_test_mod_args,
+)
+test_install_libs += test_spi_resowner
+
+test_install_data += files(
+ 'test_spi_resowner.control',
+ 'test_spi_resowner--1.0.sql',
+)
+
+tests += {
+ 'name': 'test_spi_resowner',
+ 'sd': meson.current_source_dir(),
+ 'bd': meson.current_build_dir(),
+ 'regress': {
+ 'sql': [
+ 'ri_fastpath',
+ ],
+ },
+}
diff --git a/src/test/modules/test_spi_resowner/sql/ri_fastpath.sql b/src/test/modules/test_spi_resowner/sql/ri_fastpath.sql
new file mode 100644
index 00000000000..11a561a06ac
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/sql/ri_fastpath.sql
@@ -0,0 +1,105 @@
+--
+-- Test RI fast-path FK check under C-level SPI.
+--
+-- The RI fast-path caches PK relation references in ri_FastPathGetEntry()
+-- under the current resource owner. When FK triggers fire inside a
+-- C-level SPI context that creates a dedicated short-lived resource owner,
+-- those references must be released before the inner resource owner is
+-- released. The fix ensures batch callbacks fire at the same firing depth
+-- at which they were registered, while the corresponding resource owner
+-- is still alive. Without this, ri_FastPathTeardown would crash with
+-- Assert(rel->rd_refcnt > 0) in index_close.
+--
+-- Simple PL/pgSQL does not trigger this because its SPI connection spans
+-- the entire function call, so its resource owner outlives the batch
+-- callback. The critical test case requires a C function that creates a
+-- dedicated short-lived resource owner around its SPI call.
+--
+CREATE EXTENSION test_spi_resowner;
+
+CREATE TABLE ri_fp_pk1 (id serial PRIMARY KEY);
+CREATE TABLE ri_fp_pk2 (id serial PRIMARY KEY);
+CREATE TABLE ri_fp_pk3 (id serial PRIMARY KEY);
+INSERT INTO ri_fp_pk1 VALUES (1);
+INSERT INTO ri_fp_pk2 VALUES (1);
+INSERT INTO ri_fp_pk3 VALUES (1);
+
+CREATE TABLE ri_fp_fk (
+ id serial PRIMARY KEY,
+ a int REFERENCES ri_fp_pk1(id),
+ b int REFERENCES ri_fp_pk2(id),
+ c int REFERENCES ri_fp_pk3(id),
+ d int REFERENCES ri_fp_pk1(id),
+ e int REFERENCES ri_fp_pk2(id),
+ f int REFERENCES ri_fp_pk3(id)
+);
+
+-- C-level SPI INSERT: the critical test case.
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (1, 1, 1, 1, 1, 1)');
+
+-- Additional C-level SPI INSERTs to exercise batch reuse across calls.
+-- Use different column orderings to ensure each is a distinct statement.
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (f, e, d, c, b, a) VALUES (1, 1, 1, 1, 1, 1)');
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, c, e, b, d, f) VALUES (1, 1, 1, 1, 1, 1)');
+
+-- C-level SPI with FK violation: should error
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (999, 1, 1, 1, 1, 1)');
+
+-- Nested: PL/pgSQL calling C SPI (mimics PostGIS toTopoGeom pattern)
+CREATE FUNCTION plpgsql_calls_c_spi() RETURNS void AS $$
+DECLARE
+ ins_stmt text := 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (1, 1, 1, 1, 1, 1)';
+BEGIN
+ PERFORM spi_exec_sql(ins_stmt);
+END;
+$$ LANGUAGE plpgsql;
+
+SELECT plpgsql_calls_c_spi();
+
+-- AFTER trigger that uses C-level SPI to insert into an FK-referencing table.
+-- The FK batch callback is registered at the inner SPI's query level and
+-- must fire before the inner resource owner is released.
+CREATE TABLE ri_fp_outer (id int PRIMARY KEY);
+CREATE TABLE ri_fp_inner (id int REFERENCES ri_fp_pk1(id));
+
+CREATE FUNCTION outer_trigger_spi_ok() RETURNS trigger AS $$
+BEGIN
+ PERFORM spi_exec_sql('INSERT INTO ri_fp_inner VALUES (1)');
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+
+CREATE TRIGGER outer_tg AFTER INSERT ON ri_fp_outer
+ FOR EACH ROW EXECUTE FUNCTION outer_trigger_spi_ok();
+
+-- Fires outer_tg, whose PL/pgSQL body calls spi_exec_sql(). The C function
+-- creates a dedicated resource owner that is released after the FK batch
+-- callback fires.
+INSERT INTO ri_fp_outer VALUES (1);
+
+CREATE FUNCTION outer_trigger_spi_fail() RETURNS trigger AS $$
+BEGIN
+ PERFORM spi_exec_sql('INSERT INTO ri_fp_inner VALUES (3)');
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+
+DROP TRIGGER outer_tg ON ri_fp_outer;
+DROP FUNCTION outer_trigger_spi_ok();
+
+CREATE TRIGGER outer_tg AFTER INSERT ON ri_fp_outer
+ FOR EACH ROW EXECUTE FUNCTION outer_trigger_spi_fail();
+
+-- Like above but the inner insert fails.
+INSERT INTO ri_fp_outer VALUES (2);
+
+DROP TRIGGER outer_tg ON ri_fp_outer;
+DROP FUNCTION outer_trigger_spi_fail();
+DROP TABLE ri_fp_inner, ri_fp_outer;
+
+-- Cleanup
+DROP TABLE ri_fp_fk;
+DROP TABLE ri_fp_pk3, ri_fp_pk2, ri_fp_pk1;
+DROP EXTENSION test_spi_resowner;
diff --git a/src/test/modules/test_spi_resowner/test_spi_resowner--1.0.sql b/src/test/modules/test_spi_resowner/test_spi_resowner--1.0.sql
new file mode 100644
index 00000000000..29ef70ee0dc
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/test_spi_resowner--1.0.sql
@@ -0,0 +1,9 @@
+/* src/test/modules/test_spi_resowner/test_spi_resowner--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_spi_resowner" to load this file. \quit
+
+CREATE FUNCTION spi_exec_sql(query text)
+RETURNS void
+AS 'MODULE_PATHNAME', 'spi_exec_sql'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_spi_resowner/test_spi_resowner.c b/src/test/modules/test_spi_resowner/test_spi_resowner.c
new file mode 100644
index 00000000000..0306139b5c0
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/test_spi_resowner.c
@@ -0,0 +1,70 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_spi_resowner.c
+ * SQL-callable C function that uses SPI to execute a query.
+ *
+ * Useful for testing code paths that only trigger under C-level
+ * SPI (not PL/pgSQL), such as resource owner interactions with
+ * RI fast-path FK checks.
+ *
+ * Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/test_spi_resowner/test_spi_resowner.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "executor/spi.h"
+#include "utils/builtins.h"
+
+PG_MODULE_MAGIC;
+
+PG_FUNCTION_INFO_V1(spi_exec_sql);
+
+/*
+ * spi_exec_sql(query text) - execute a SQL query via SPI.
+ *
+ * Opens a fresh SPI connection, executes the query, and closes the
+ * connection. Creates a dedicated child resource owner around the
+ * SPI_execute call and releases it before returning, ensuring that
+ * any resources registered under it (such as relation references
+ * opened by RI fast-path FK checks) are released before the outer
+ * trigger-firing batch callback fires. This reproduces the resource
+ * owner mismatch that occurs with C-language extensions like PostGIS
+ * topology functions, which cannot be triggered from PL/pgSQL since
+ * PL/pgSQL's SPI connection spans the entire function call.
+ */
+Datum
+spi_exec_sql(PG_FUNCTION_ARGS)
+{
+ const char *query = text_to_cstring(PG_GETARG_TEXT_PP(0));
+ int ret;
+ ResourceOwner save = CurrentResourceOwner;
+ ResourceOwner childowner = ResourceOwnerCreate(save, "test_spi inner");
+
+ SPI_connect();
+
+ CurrentResourceOwner = childowner;
+ ret = SPI_execute(query, false, 0);
+
+ if (ret < 0)
+ elog(ERROR, "SPI_execute failed: error code %d", ret);
+
+ SPI_finish();
+
+ CurrentResourceOwner = save;
+ ResourceOwnerRelease(childowner,
+ RESOURCE_RELEASE_BEFORE_LOCKS,
+ true, false);
+ ResourceOwnerRelease(childowner,
+ RESOURCE_RELEASE_LOCKS,
+ true, false);
+ ResourceOwnerRelease(childowner,
+ RESOURCE_RELEASE_AFTER_LOCKS,
+ true, false);
+ ResourceOwnerDelete(childowner);
+
+ PG_RETURN_VOID();
+}
diff --git a/src/test/modules/test_spi_resowner/test_spi_resowner.control b/src/test/modules/test_spi_resowner/test_spi_resowner.control
new file mode 100644
index 00000000000..2120ae9442f
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/test_spi_resowner.control
@@ -0,0 +1,4 @@
+comment = 'Test SQL-callable C function that uses SPI using dedicated ResourceOwner'
+default_version = '1.0'
+module_pathname = '$libdir/test_spi_resowner'
+relocatable = true
--
2.47.3
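The ordering problem that the test comments above describe can be pictured with a small standalone model. This is only a sketch and not PostgreSQL code: ToyRel, ToyOwner and the toy_* functions are hypothetical names standing in for a relcache entry, a resource owner and the fast-path teardown callback. It assumes, as the comments say, that teardown must run while the owner holding the reference is still alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a relcache entry: only the reference count matters. */
typedef struct ToyRel
{
	int			refcnt;
} ToyRel;

/* Toy stand-in for a resource owner: remembers the references it holds. */
typedef struct ToyOwner
{
	ToyRel	   *held[4];
	int			nheld;
} ToyOwner;

/* Open a relation under an owner: bump the refcount and remember it. */
static void
toy_open(ToyOwner *owner, ToyRel *rel)
{
	assert(owner->nheld < 4);
	rel->refcnt++;
	owner->held[owner->nheld++] = rel;
}

/* Release an owner: drop every reference it still remembers. */
static void
toy_owner_release(ToyOwner *owner)
{
	for (int i = 0; i < owner->nheld; i++)
		owner->held[i]->refcnt--;
	owner->nheld = 0;
}

/*
 * Teardown callback: close the cached relation.  Returns false in the
 * situation where the real code would trip Assert(rel->rd_refcnt > 0).
 */
static bool
toy_teardown(ToyOwner *owner, ToyRel *rel)
{
	if (rel->refcnt <= 0)
		return false;

	/* Make the owner forget the reference, then drop it. */
	for (int i = 0; i < owner->nheld; i++)
	{
		if (owner->held[i] == rel)
		{
			owner->held[i] = owner->held[--owner->nheld];
			break;
		}
	}
	rel->refcnt--;
	return true;
}

/* Run the scenario with either ordering; true means teardown was safe. */
bool
toy_teardown_ok(bool fire_before_release)
{
	ToyRel		pk_index = {0};
	ToyOwner	inner = {{NULL}, 0};
	bool		ok;

	toy_open(&inner, &pk_index);	/* FK check caches the PK index */

	if (fire_before_release)
	{
		ok = toy_teardown(&inner, &pk_index);	/* callback fires first */
		toy_owner_release(&inner);
	}
	else
	{
		toy_owner_release(&inner);	/* inner owner is gone already */
		ok = toy_teardown(&inner, &pk_index);
	}
	return ok;
}
```

In this model toy_teardown_ok(true) corresponds to the fixed ordering, and toy_teardown_ok(false) corresponds to the deferred callback that finds the reference already released.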
* Re: Eliminating SPI / SQL from some RI triggers - take 3
2024-12-20 04:23 Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 04:07 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 05:10 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Pavel Stehule <[email protected]>
2025-10-22 13:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-12-01 06:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-02 15:30 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-10 12:28 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-18 15:34 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-19 16:19 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-24 11:47 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-24 13:56 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-25 00:41 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-30 04:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-30 11:15 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-31 09:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Chao Li <[email protected]>
2026-03-31 09:17 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-31 10:57 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-31 12:15 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-31 15:54 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-04-01 08:51 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-01 09:51 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-01 11:56 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-04-01 12:18 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-02 07:41 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-02 07:59 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Chao Li <[email protected]>
2026-04-03 05:52 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-03 08:57 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Chao Li <[email protected]>
2026-04-03 09:39 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-06 09:45 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-07 01:45 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Chao Li <[email protected]>
2026-04-07 02:12 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-07 12:59 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Evan Montgomery-Recht <[email protected]>
2026-04-08 01:23 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-08 09:58 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-08 14:26 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-09 07:39 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Chao Li <[email protected]>
2026-04-09 08:40 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
@ 2026-04-09 09:21 ` jie wang <[email protected]>
2026-04-09 09:24 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
1 sibling, 1 reply; 61+ messages in thread
From: jie wang @ 2026-04-09 09:21 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Chao Li <[email protected]>; Evan Montgomery-Recht <[email protected]>; pgsql-hackers
On Thu, Apr 9, 2026 at 16:41, Amit Langote <[email protected]> wrote:
> Hi,
>
> On Thu, Apr 9, 2026 at 4:40 PM Chao Li <[email protected]> wrote:
> > > On Apr 8, 2026, at 22:26, Amit Langote <[email protected]> wrote:
> > > On Wed, Apr 8, 2026 at 6:58 PM Amit Langote <[email protected]> wrote:
> > >> On Wed, Apr 8, 2026 at 10:23 AM Amit Langote <[email protected]> wrote:
> > >>> On Tue, Apr 7, 2026 at 10:00 PM Evan Montgomery-Recht
> > >>> <[email protected]> wrote:
> > >>>> The patch also adds a test module (test_spi_func) with a C function
> > >>>> that executes SQL via SPI_connect/SPI_execute/SPI_finish, since this
> > >>>> crash cannot be triggered from PL/pgSQL. The test exercises the
> > >>>> C-level SPI INSERT with multiple FK constraints, FK violations, and
> > >>>> nested PL/pgSQL-calls-C-SPI (matching the PostGIS call pattern).
> > >>
> > >> I applied only the test module changes and it passes (without
> > >> crashing) even without your proposed fix. It seems that's because the
> > >> C function in test_spi_func calling SPI is using the same resource
> > >> owner as the parent SELECT. I think you'd need to create a resource
> > >> owner manually in the spi_exec() C function to reproduce the crash, as
> > >> done in the attached 0001, which contains the src/test changes
> > >> extracted from your patch modified as described, including renaming
> > >> the C function to spi_exec_sql().
> > >>
> > >> Also, the test cases that call spi_exec() (_sql()) directly from a
> > >> SELECT don't actually exercise the crash path because there is no
> > >> outer trigger-firing loop active. query_depth is 0 inside the inner
> > >> SPI's AfterTriggerEndQuery, so the old guard wouldn't suppress the
> > >> callback there anyway. The critical case requires spi_exec_sql() to be
> > >> called from inside an AFTER trigger, where query_depth > 0 causes the
> > >> guard to defer the callback past the inner resource owner's lifetime.
> > >> I've added that test case. I kept your original test cases as they
> > >> still provide useful coverage of C-level SPI FK behavior even if they
> > >> don't exercise the crash path specifically. Maybe your original
> > >> PostGIS test suite that hit the crash did have the right structure,
> > >> but that's not reflected in the patch as far as I can tell.
> > >>
> > >> I've also renamed the module to test_spi_resowner to better reflect
> > >> what it's about.
> > >>
> > >> For the fix, I have a different proposal. As you observed, the
> > >> query_depth > 0 early return in FireAfterTriggerBatchCallbacks() means
> > >> that the nested SPI's callbacks get called under the outer resource
> > >> owner, which may not be the same as the one that SPI used. I think it
> > >> was a mistake to have that early return in the first place. Instead we
> > >> could remember for each callback what firing level it should be called
> > >> at, so the nested SPI's callbacks fire before returning to the parent
> > >> level and parent-level callbacks fire when the parent level completes.
> > >> I have implemented that in the attached 0002 along with transaction
> > >> boundary cleanup of callbacks, which passes the check-world for me,
> > >> but I'll need to stare some more at it before committing.
> > >>
> > >> Let me know if this also fixes your own in-house test suite or if you
> > >> have any other suggestions or if you think I am missing something.
> > >
> > > One more cleanup patch attached as 0003: afterTriggerFiringDepth was
> > > added by commit 5c54c3ed1 as a file-static variable, which in
> > > hindsight should have been a field in AfterTriggersData alongside the
> > > other per-transaction after-trigger state. This patch makes that
> > > correction.
> > >
> > > One alternative design worth considering for 0002: storing
> > > batch_callbacks per query level in AfterTriggersQueryData rather than
> > > as a single list in AfterTriggersData, so callbacks naturally live at
> > > the query level where they were registered and get cleaned up with
> > > AfterTriggerFreeQuery on abort. Deferred constraints still need a
> > > top-level list in AfterTriggersData since they fire outside any query
> > > level. FireAfterTriggerBatchCallbacks() takes a list parameter and the
> > > caller passes either the query-level or top-level list as appropriate.
> > > This eliminates the need for firing_depth-matched firing entirely. I
> > > did that in 0004. I think I like it over 0002. Will look more
> > > closely tomorrow morning.
> > A few comments on v3:
>
> Thanks for the review.
>
> > 1 - 0002
> > ```
> > static void
> > FireAfterTriggerBatchCallbacks(void)
> > {
> > + List *remaining = NIL;
> > + List *to_fire = NIL;
> > ListCell *lc;
> >
> > - if (afterTriggers.query_depth > 0)
> > - return;
> > + /* remaining and to_fire lists must survive until callbacks complete */
> > + MemoryContext oldcxt = MemoryContextSwitchTo(TopTransactionContext);
> > ```
> >
> > I think remaining and to_fire should stay in the same context of
> > afterTriggers.batch_callbacks, so instead of hard coding
> > TopTransactionContext, we can use
> > GetMemoryChunkContext(afterTriggers.batch_callbacks), which makes the
> > intention explicit.
>
> I'm dropping 0002 or have merged 0004 into it so this memory context
> switch is no longer present.
>
> > 2 - 0004, I noticed one potential problem, although I am not sure whether
> > it can really happen in practice. This version stores callback items at
> > the individual query depth, and FireAfterTriggerBatchCallbacks() now
> > iterates the callback list for that depth and invokes each callback
> > directly. My concern is that if one of those callbacks needs to register
> > a new callback, that would append a new item to the same list while it is
> > being iterated. That seems unsafe to me, because list append may create a
> > new list structure underneath. If that happens, we may end up modifying
> > the list being traversed, which does not look safe.
> >
> > This problem doesn’t exist in 0002, because 0002 splits
> > afterTriggers.batch_callbacks into remaining and to_fire, and reset
> > afterTriggers.batch_callbacks = remaining before running callbacks. But
> > the problem is, if a callback registers a new callback, the new callback
> > goes to afterTriggers.batch_callbacks, so it won’t get executed.
> >
> > From this perspective, I would assume a callback should not be allowed to
> > register a new callback. Can you please help confirm?
>
> Good point on the re-entrant registration concern. I've added a
> firing_batch_callbacks flag to AfterTriggersData that prevents
> callbacks from registering new callbacks during
> FireAfterTriggerBatchCallbacks(), with an Assert in
> RegisterAfterTriggerBatchCallback() to enforce it. That should keep
> the list being iterated from being modified.
>
> The attached patches are updated accordingly. 0001 is the main fix
> incorporating the per-query-level storage design, the transaction
> boundary cleanup, and the firing_batch_callbacks guard. 0002 is a
> followup that moves afterTriggerFiringDepth into AfterTriggersData as
> a minor cleanup of 5c54c3ed1b9. Barring further feedback I plan to
> commit 0001 and 0002 shortly. For 0003, I need to check on the policy
> around adding new test modules during feature freeze before committing
> it.
>
> --
> Thanks, Amit Langote
>
Hi,
I took a glance at the patch, overall looks good to me. A nitpick on 0001:
+ bool firing_batch_callbacks; /* true when in
+ * FireAfterTriggersBatchCallbacks() */
Looks like a typo in the comment. The function name is
FireAfterTriggerBatchCallbacks, no “s” after Trigger.
Best regards,
--
wang jie
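For readers following the design discussed in the quoted message above, the per-query-level callback lists and the firing_batch_callbacks guard can be sketched as a standalone toy. Everything here is hypothetical (the toy_* names, the fixed-size arrays, the recorded ids) and none of it is the patch's code. One deliberate difference: the patch is described as enforcing the guard with an Assert, whereas this sketch refuses the registration and counts the refusal so the behavior can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_DEPTH		4
#define TOY_MAX_CALLBACKS	8

typedef void (*toy_callback) (void *arg);

typedef struct ToyCallbackItem
{
	toy_callback callback;
	void	   *arg;
} ToyCallbackItem;

/* One callback list per query level, mirroring the per-level design. */
typedef struct ToyQueryLevel
{
	ToyCallbackItem items[TOY_MAX_CALLBACKS];
	int			nitems;
} ToyQueryLevel;

static ToyQueryLevel toy_levels[TOY_MAX_DEPTH];
static int	toy_query_depth = -1;	/* -1: no query level active */
static bool toy_firing = false;		/* models firing_batch_callbacks */
static int	toy_fired[TOY_MAX_CALLBACKS];
static int	toy_nfired = 0;
static int	toy_refused = 0;

/* Register at the current level; refused while callbacks are firing. */
static bool
toy_register(toy_callback cb, void *arg)
{
	ToyQueryLevel *lvl;

	if (toy_firing || toy_query_depth < 0)
		return false;
	lvl = &toy_levels[toy_query_depth];
	if (lvl->nitems >= TOY_MAX_CALLBACKS)
		return false;
	lvl->items[lvl->nitems].callback = cb;
	lvl->items[lvl->nitems].arg = arg;
	lvl->nitems++;
	return true;
}

static void
toy_begin_query(void)
{
	assert(toy_query_depth + 1 < TOY_MAX_DEPTH);
	toy_query_depth++;
	toy_levels[toy_query_depth].nitems = 0;
}

/* End the current level: fire only the callbacks registered at it. */
static void
toy_end_query(void)
{
	ToyQueryLevel *lvl = &toy_levels[toy_query_depth];

	toy_firing = true;
	for (int i = 0; i < lvl->nitems; i++)
		lvl->items[i].callback(lvl->items[i].arg);
	toy_firing = false;
	lvl->nitems = 0;
	toy_query_depth--;
}

/* Records its id, then attempts a re-entrant registration. */
static void
toy_record(void *arg)
{
	if (toy_nfired < TOY_MAX_CALLBACKS)
		toy_fired[toy_nfired++] = *(int *) arg;
	if (!toy_register(toy_record, arg))
		toy_refused++;
}

/* Outer statement with one nested statement; returns callbacks fired. */
int
toy_run_nested(int *order, int max)
{
	static int	outer_id = 1;
	static int	inner_id = 2;
	int			n;

	toy_nfired = 0;
	toy_refused = 0;
	toy_begin_query();			/* outer statement */
	toy_register(toy_record, &outer_id);
	toy_begin_query();			/* nested SPI statement */
	toy_register(toy_record, &inner_id);
	toy_end_query();			/* inner callback fires here */
	toy_end_query();			/* outer callback fires here */

	n = (toy_nfired < max) ? toy_nfired : max;
	for (int i = 0; i < n; i++)
		order[i] = toy_fired[i];
	return toy_nfired;
}

int
toy_refused_count(void)
{
	return toy_refused;
}
```

In this toy the nested level's callback runs when the nested level ends, before control returns to the outer level, and a callback that tries to register another callback while the list is being walked is turned away.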
* Re: Eliminating SPI / SQL from some RI triggers - take 3
2026-04-09 09:21 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 jie wang <[email protected]>
@ 2026-04-09 09:24 ` Amit Langote <[email protected]>
0 siblings, 0 replies; 61+ messages in thread
From: Amit Langote @ 2026-04-09 09:24 UTC (permalink / raw)
To: jie wang <[email protected]>; +Cc: Chao Li <[email protected]>; Evan Montgomery-Recht <[email protected]>; pgsql-hackers
On Thu, Apr 9, 2026 at 6:22 PM jie wang <[email protected]> wrote:
> Hi,
>
> I took a glance at the patch, overall looks good to me. A nitpick on 0001:
>
> + bool firing_batch_callbacks; /* true when in
> + * FireAfterTriggersBatchCallbacks() */
>
> Looks like a typo in the comment. The function name is FireAfterTriggerBatchCallbacks, no “s” after Trigger.
Thanks, I've fixed the typo in my local tree.
--
Thanks, Amit Langote
* Re: Eliminating SPI / SQL from some RI triggers - take 3
@ 2026-04-09 10:25 ` Chao Li <[email protected]>
2026-04-10 07:39 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
1 sibling, 1 reply; 61+ messages in thread
From: Chao Li @ 2026-04-09 10:25 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Evan Montgomery-Recht <[email protected]>; pgsql-hackers
> On Apr 9, 2026, at 16:40, Amit Langote <[email protected]> wrote:
>
> Hi,
>
> On Thu, Apr 9, 2026 at 4:40 PM Chao Li <[email protected]> wrote:
>>> On Apr 8, 2026, at 22:26, Amit Langote <[email protected]> wrote:
>>> On Wed, Apr 8, 2026 at 6:58 PM Amit Langote <[email protected]> wrote:
>>>> On Wed, Apr 8, 2026 at 10:23 AM Amit Langote <[email protected]> wrote:
>>>>> On Tue, Apr 7, 2026 at 10:00 PM Evan Montgomery-Recht
>>>>> <[email protected]> wrote:
>>>>>> The patch also adds a test module (test_spi_func) with a C function
>>>>>> that executes SQL via SPI_connect/SPI_execute/SPI_finish, since this
>>>>>> crash cannot be triggered from PL/pgSQL. The test exercises the
>>>>>> C-level SPI INSERT with multiple FK constraints, FK violations, and
>>>>>> nested PL/pgSQL-calls-C-SPI (matching the PostGIS call pattern).
>>>>
>>>> I applied only the test module changes and it passes (without
>>>> crashing) even without your proposed fix. It seems that's because the
>>>> C function in test_spi_func calling SPI is using the same resource
>>>> owner as the parent SELECT. I think you'd need to create a resource
>>>> owner manually in the spi_exec() C function to reproduce the crash, as
>>>> done in the attached 0001, which contains the src/test changes
>>>> extracted from your patch modified as described, including renaming
>>>> the C function to spi_exec_sql().
>>>>
>>>> Also, the test cases that call spi_exec() (_sql()) directly from a
>>>> SELECT don't actually exercise the crash path because there is no
>>>> outer trigger-firing loop active. query_depth is 0 inside the inner
>>>> SPI's AfterTriggerEndQuery, so the old guard wouldn't suppress the
>>>> callback there anyway. The critical case requires spi_exec_sql() to be
>>>> called from inside an AFTER trigger, where query_depth > 0 causes the
>>>> guard to defer the callback past the inner resource owner's lifetime.
>>>> I've added that test case. I kept your original test cases as they
>>>> still provide useful coverage of C-level SPI FK behavior even if they
>>>> don't exercise the crash path specifically. Maybe your original
>>>> PostGIS test suite that hit the crash did have the right structure,
>>>> but that's not reflected in the patch as far as I can tell.
>>>>
>>>> I've also renamed the module to test_spi_resowner to better reflect
>>>> what it's about.
>>>>
>>>> For the fix, I have a different proposal. As you observed, the
>>>> query_depth > 0 early return in FireAfterTriggerBatchCallbacks() means
>>>> that the nested SPI's callbacks get called under the outer resource
>>>> owner, which may not be the same as the one that SPI used. I think it
>>>> was a mistake to have that early return in the first place. Instead we
>>>> could remember for each callback what firing level it should be called
>>>> at, so the nested SPI's callbacks fire before returning to the parent
>>>> level and parent-level callbacks fire when the parent level completes.
>>>> I have implemented that in the attached 0002 along with transaction
>>>> boundary cleanup of callbacks, which passes the check-world for me,
>>>> but I'll need to stare some more at it before committing.
>>>>
>>>> Let me know if this also fixes your own in-house test suite or if you
>>>> have any other suggestions or if you think I am missing something.
>>>
>>> One more cleanup patch attached as 0003: afterTriggerFiringDepth was
>>> added by commit 5c54c3ed1 as a file-static variable, which in
>>> hindsight should have been a field in AfterTriggersData alongside the
>>> other per-transaction after-trigger state. This patch makes that
>>> correction.
>>>
>>> One alternative design worth considering for 0002: storing
>>> batch_callbacks per query level in AfterTriggersQueryData rather than
>>> as a single list in AfterTriggersData, so callbacks naturally live at
>>> the query level where they were registered and get cleaned up with
>>> AfterTriggerFreeQuery on abort. Deferred constraints still need a
>>> top-level list in AfterTriggersData since they fire outside any query
>>> level. FireAfterTriggerBatchCallbacks() takes a list parameter and the
>>> caller passes either the query-level or top-level list as appropriate.
>>> This eliminates the need for firing_depth-matched firing entirely. I
>>> did that in 0004. I think I like it over 0002. Will look more
>>> closely tomorrow morning.
>> A few comments on v3:
>
> Thanks for the review.
>
>> 1 - 0002
>> ```
>> static void
>> FireAfterTriggerBatchCallbacks(void)
>> {
>> + List *remaining = NIL;
>> + List *to_fire = NIL;
>> ListCell *lc;
>>
>> - if (afterTriggers.query_depth > 0)
>> - return;
>> + /* remaining and to_fire lists must survive until callbacks complete */
>> + MemoryContext oldcxt = MemoryContextSwitchTo(TopTransactionContext);
>> ```
>>
>> I think remaining and to_fire should live in the same memory context as afterTriggers.batch_callbacks, so instead of hard-coding TopTransactionContext, we can use GetMemoryChunkContext(afterTriggers.batch_callbacks), which makes the intention explicit.
>
> I'm dropping 0002, or rather have merged 0004 into it, so this memory
> context switch is no longer present.
>
>> 2 - 0004, I noticed one potential problem, although I am not sure whether it can really happen in practice. This version stores callback items at the individual query depth, and FireAfterTriggerBatchCallbacks() now iterates the callback list for that depth and invokes each callback directly. My concern is that if one of those callbacks needs to register a new callback, that would append a new item to the same list while it is being iterated. That seems unsafe to me, because list append may create a new list structure underneath. If that happens, we may end up modifying the list being traversed, which does not look safe.
>>
>> This problem doesn’t exist in 0002, because 0002 splits afterTriggers.batch_callbacks into remaining and to_fire, and resets afterTriggers.batch_callbacks = remaining before running callbacks. But the problem there is that if a callback registers a new callback, the new callback goes to afterTriggers.batch_callbacks, so it won’t get executed.
>>
>> From this perspective, I would assume a callback should not be allowed to register a new callback. Can you please help confirm?
>
> Good point on the re-entrant registration concern. I've added a
> firing_batch_callbacks flag to AfterTriggersData that prevents
> callbacks from registering new callbacks during
> FireAfterTriggerBatchCallbacks(), with an Assert in
> RegisterAfterTriggerBatchCallback() to enforce it. That should keep
> the list being iterated from being modified.
>
> The attached patches are updated accordingly. 0001 is the main fix
> incorporating the per-query-level storage design, the transaction
> boundary cleanup, and the firing_batch_callbacks guard. 0002 is a
> followup that moves afterTriggerFiringDepth into AfterTriggersData as
> a minor cleanup of 5c54c3ed1b9. Barring further feedback I plan to
> commit 0001 and 0002 shortly. For 0003, I need to check on the policy
> around adding new test modules during feature freeze before committing
> it.
>
> --
> Thanks, Amit Langote
> <v4-0002-Move-afterTriggerFiringDepth-into-AfterTriggersDa.patch><v4-0001-Fix-RI-fast-path-crash-under-nested-C-level-SPI.patch><v4-0003-Add-test-module-for-RI-fast-path-FK-checks-under-.patch>
0001 and 0002 look good to me. I didn’t review 0003 and don’t intend to review it.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
* Re: Eliminating SPI / SQL from some RI triggers - take 3
@ 2026-04-10 07:39 ` Amit Langote <[email protected]>
2026-04-10 23:34 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Haibo Yan <[email protected]>
0 siblings, 1 reply; 61+ messages in thread
From: Amit Langote @ 2026-04-10 07:39 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Evan Montgomery-Recht <[email protected]>; pgsql-hackers
On Thu, Apr 9, 2026 at 7:26 PM Chao Li <[email protected]> wrote:
> 0001 and 0002 look good to me. I didn’t review 0003 and don’t intend to review it.
I've now pushed 0001 (34a3078629) and 0002 (d6e96bacd3c).
Here's the remaining patch to add src/test/modules/test_spi_resowner
rebased against master. I'm holding off on committing the test module
until I confirm the policy on new test modules during feature freeze.
It's also worth discussing whether this is the right place for testing
C extensions that use SPI with a dedicated resource owner, or whether
that coverage belongs elsewhere.
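
For anyone following the thread without the backend source at hand, the
per-query-level callback storage and the firing_batch_callbacks guard
discussed upthread can be modeled with a small standalone C sketch. This
is only a toy model under stated assumptions, not the committed code: it
uses no PostgreSQL headers, and register_batch_callback(),
fire_batch_callbacks(), run_nested_demo(), the levels array, fire_log and
the MAX_* limits are all hypothetical names invented for illustration.
Only the flag name firing_batch_callbacks is taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH      4	/* hypothetical cap on nested query levels */
#define MAX_CALLBACKS  8	/* hypothetical cap on callbacks per level */

typedef void (*batch_callback) (void *arg);

typedef struct CallbackItem
{
	batch_callback fn;
	void	   *arg;
} CallbackItem;

/* One callback list per query level (toy stand-in for per-level state). */
typedef struct QueryLevel
{
	CallbackItem items[MAX_CALLBACKS];
	int			nitems;
} QueryLevel;

static QueryLevel levels[MAX_DEPTH];
static bool firing_batch_callbacks = false;

/* Firing order is appended here so it can be inspected afterwards. */
static char fire_log[64];

static void
append_log(const char *s)
{
	strncat(fire_log, s, sizeof(fire_log) - strlen(fire_log) - 1);
}

/* Register a callback at the query level where the work was queued. */
static void
register_batch_callback(int depth, batch_callback fn, void *arg)
{
	QueryLevel *lvl;

	/* A callback must not register callbacks: the list is being iterated. */
	assert(!firing_batch_callbacks);
	assert(depth >= 0 && depth < MAX_DEPTH);

	lvl = &levels[depth];
	assert(lvl->nitems < MAX_CALLBACKS);
	lvl->items[lvl->nitems].fn = fn;
	lvl->items[lvl->nitems].arg = arg;
	lvl->nitems++;
}

/* Fire and clear only the callbacks that belong to the given level. */
static void
fire_batch_callbacks(int depth)
{
	QueryLevel *lvl = &levels[depth];

	firing_batch_callbacks = true;
	for (int i = 0; i < lvl->nitems; i++)
		lvl->items[i].fn(lvl->items[i].arg);
	firing_batch_callbacks = false;
	lvl->nitems = 0;
}

static void
log_callback(void *arg)
{
	append_log((const char *) arg);
}

/* Outer query at depth 0, nested SPI query at depth 1. */
static const char *
run_nested_demo(void)
{
	fire_log[0] = '\0';
	register_batch_callback(0, log_callback, (void *) "outer;");
	register_batch_callback(1, log_callback, (void *) "inner;");

	fire_batch_callbacks(1);			/* nested query ends first */
	append_log("release-inner-owner;");	/* stand-in for resowner release */
	fire_batch_callbacks(0);			/* then the parent level completes */
	return fire_log;
}
```

Calling run_nested_demo() from a main() yields the order
"inner;release-inner-owner;outer;", that is, the nested level's callback
runs before its (simulated) resource owner goes away, and the parent
level's callback runs only when the parent level completes.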
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v5-0001-Add-test-module-for-RI-fast-path-FK-checks-under-.patch (17.5K, 2-v5-0001-Add-test-module-for-RI-fast-path-FK-checks-under-.patch)
From 01f695db4778fcd1d730b315395565cfd0c3d38d Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Fri, 10 Apr 2026 15:19:14 +0900
Subject: [PATCH v5] Add test module for RI fast-path FK checks under C-level
SPI
Add test_spi_resowner, a test module providing a SQL-callable C function
that executes SQL via SPI with a dedicated short-lived resource owner.
This reproduces the crash scenario fixed by the previous commit that
cannot be triggered from PL/pgSQL, since PL/pgSQL's SPI connection spans
the entire function call and its resource owner outlives the batch
callback.
The critical test case calls spi_exec_sql() from inside an AFTER trigger,
where the FK checks fire under a nested SPI context while the outer
trigger-firing loop is active. The dedicated resource owner ensures it is
released before the outer batch callback fires, reproducing the resource
owner mismatch that previously caused a crash. Additional test cases
exercise multiple FK constraints, FK violations, and PL/pgSQL calling the
C SPI function, matching the PostGIS toTopoGeom() call pattern reported
by Evan Montgomery-Recht.
Reported-by: Evan Montgomery-Recht <[email protected]>
Author: Evan Montgomery-Recht <[email protected]>
Co-authored-by: Amit Langote <[email protected]>
Discussion: https://postgr.es/m/CAEg7pwcKf01FmDqFAf-Hzu_pYnMYScY_Otid-pe9uw3BJ6gq9g@mail.gmail.com
---
src/test/modules/Makefile | 1 +
src/test/modules/meson.build | 1 +
src/test/modules/test_spi_resowner/Makefile | 23 ++++
.../expected/ri_fastpath.out | 116 ++++++++++++++++++
.../modules/test_spi_resowner/meson.build | 31 +++++
.../test_spi_resowner/sql/ri_fastpath.sql | 105 ++++++++++++++++
.../test_spi_resowner--1.0.sql | 9 ++
.../test_spi_resowner/test_spi_resowner.c | 70 +++++++++++
.../test_spi_resowner.control | 4 +
9 files changed, 360 insertions(+)
create mode 100644 src/test/modules/test_spi_resowner/Makefile
create mode 100644 src/test/modules/test_spi_resowner/expected/ri_fastpath.out
create mode 100644 src/test/modules/test_spi_resowner/meson.build
create mode 100644 src/test/modules/test_spi_resowner/sql/ri_fastpath.sql
create mode 100644 src/test/modules/test_spi_resowner/test_spi_resowner--1.0.sql
create mode 100644 src/test/modules/test_spi_resowner/test_spi_resowner.c
create mode 100644 src/test/modules/test_spi_resowner/test_spi_resowner.control
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 0a74ab5c86f..016b328c8c5 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -52,6 +52,7 @@ SUBDIRS = \
test_shmem \
test_shm_mq \
test_slru \
+ test_spi_resowner \
test_tidstore \
unsafe_tests \
worker_spi \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 4bca42bb370..3ca454064d0 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -53,6 +53,7 @@ subdir('test_saslprep')
subdir('test_shmem')
subdir('test_shm_mq')
subdir('test_slru')
+subdir('test_spi_resowner')
subdir('test_tidstore')
subdir('typcache')
subdir('unsafe_tests')
diff --git a/src/test/modules/test_spi_resowner/Makefile b/src/test/modules/test_spi_resowner/Makefile
new file mode 100644
index 00000000000..5a69e3a3c42
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_spi_resowner/Makefile
+
+MODULE_big = test_spi_resowner
+OBJS = \
+ $(WIN32RES) \
+ test_spi_resowner.o
+PGFILEDESC = "test_spi_resowner - SQL-callable C SPI function under a dedicated ResourceOwner"
+
+EXTENSION = test_spi_resowner
+DATA = test_spi_resowner--1.0.sql
+
+REGRESS = ri_fastpath
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_spi_resowner
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_spi_resowner/expected/ri_fastpath.out b/src/test/modules/test_spi_resowner/expected/ri_fastpath.out
new file mode 100644
index 00000000000..03984ca892e
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/expected/ri_fastpath.out
@@ -0,0 +1,116 @@
+--
+-- Test RI fast-path FK check under C-level SPI.
+--
+-- The RI fast-path caches PK relation references in ri_FastPathGetEntry()
+-- under the current resource owner. When FK triggers fire inside a
+-- C-level SPI context that creates a dedicated short-lived resource owner,
+-- those references must be released before the inner resource owner is
+-- released. The fix ensures batch callbacks fire at the same firing depth
+-- at which they were registered, while the corresponding resource owner
+-- is still alive. Without this, ri_FastPathTeardown would crash with
+-- Assert(rel->rd_refcnt > 0) in index_close.
+--
+-- Simple PL/pgSQL does not trigger this because its SPI connection spans
+-- the entire function call, so its resource owner outlives the batch
+-- callback. The critical test case requires a C function that creates a
+-- dedicated short-lived resource owner around its SPI call.
+--
+CREATE EXTENSION test_spi_resowner;
+CREATE TABLE ri_fp_pk1 (id serial PRIMARY KEY);
+CREATE TABLE ri_fp_pk2 (id serial PRIMARY KEY);
+CREATE TABLE ri_fp_pk3 (id serial PRIMARY KEY);
+INSERT INTO ri_fp_pk1 VALUES (1);
+INSERT INTO ri_fp_pk2 VALUES (1);
+INSERT INTO ri_fp_pk3 VALUES (1);
+CREATE TABLE ri_fp_fk (
+ id serial PRIMARY KEY,
+ a int REFERENCES ri_fp_pk1(id),
+ b int REFERENCES ri_fp_pk2(id),
+ c int REFERENCES ri_fp_pk3(id),
+ d int REFERENCES ri_fp_pk1(id),
+ e int REFERENCES ri_fp_pk2(id),
+ f int REFERENCES ri_fp_pk3(id)
+);
+-- C-level SPI INSERT: the critical test case.
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (1, 1, 1, 1, 1, 1)');
+ spi_exec_sql
+--------------
+
+(1 row)
+
+-- Additional C-level SPI INSERTs to exercise batch reuse across calls.
+-- Use different column orderings to ensure each is a distinct statement.
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (f, e, d, c, b, a) VALUES (1, 1, 1, 1, 1, 1)');
+ spi_exec_sql
+--------------
+
+(1 row)
+
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, c, e, b, d, f) VALUES (1, 1, 1, 1, 1, 1)');
+ spi_exec_sql
+--------------
+
+(1 row)
+
+-- C-level SPI with FK violation: should error
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (999, 1, 1, 1, 1, 1)');
+ERROR: insert or update on table "ri_fp_fk" violates foreign key constraint "ri_fp_fk_a_fkey"
+DETAIL: Key (a)=(999) is not present in table "ri_fp_pk1".
+CONTEXT: SQL statement "INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (999, 1, 1, 1, 1, 1)"
+-- Nested: PL/pgSQL calling C SPI (mimics PostGIS toTopoGeom pattern)
+CREATE FUNCTION plpgsql_calls_c_spi() RETURNS void AS $$
+DECLARE
+ ins_stmt text := 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (1, 1, 1, 1, 1, 1)';
+BEGIN
+ PERFORM spi_exec_sql(ins_stmt);
+END;
+$$ LANGUAGE plpgsql;
+SELECT plpgsql_calls_c_spi();
+ plpgsql_calls_c_spi
+---------------------
+
+(1 row)
+
+-- AFTER trigger that uses C-level SPI to insert into an FK-referencing table.
+-- The FK batch callback is registered at the inner SPI's query level and
+-- must fire before the inner resource owner is released.
+CREATE TABLE ri_fp_outer (id int PRIMARY KEY);
+CREATE TABLE ri_fp_inner (id int REFERENCES ri_fp_pk1(id));
+CREATE FUNCTION outer_trigger_spi_ok() RETURNS trigger AS $$
+BEGIN
+ PERFORM spi_exec_sql('INSERT INTO ri_fp_inner VALUES (1)');
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+CREATE TRIGGER outer_tg AFTER INSERT ON ri_fp_outer
+ FOR EACH ROW EXECUTE FUNCTION outer_trigger_spi_ok();
+-- Fires outer_tg, whose PL/pgSQL body calls spi_exec_sql(). The C function
+-- creates a dedicated resource owner that is released after the FK batch
+-- callback fires.
+INSERT INTO ri_fp_outer VALUES (1);
+CREATE FUNCTION outer_trigger_spi_fail() RETURNS trigger AS $$
+BEGIN
+ PERFORM spi_exec_sql('INSERT INTO ri_fp_inner VALUES (3)');
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+DROP TRIGGER outer_tg ON ri_fp_outer;
+DROP FUNCTION outer_trigger_spi_ok();
+CREATE TRIGGER outer_tg AFTER INSERT ON ri_fp_outer
+ FOR EACH ROW EXECUTE FUNCTION outer_trigger_spi_fail();
+-- Like above but the inner insert fails.
+INSERT INTO ri_fp_outer VALUES (2);
+ERROR: insert or update on table "ri_fp_inner" violates foreign key constraint "ri_fp_inner_id_fkey"
+DETAIL: Key (id)=(3) is not present in table "ri_fp_pk1".
+CONTEXT: SQL statement "INSERT INTO ri_fp_inner VALUES (3)"
+SQL statement "SELECT spi_exec_sql('INSERT INTO ri_fp_inner VALUES (3)')"
+PL/pgSQL function outer_trigger_spi_fail() line 3 at PERFORM
+DROP TRIGGER outer_tg ON ri_fp_outer;
+DROP FUNCTION outer_trigger_spi_fail();
+DROP TABLE ri_fp_inner, ri_fp_outer;
+-- Cleanup
+DROP TABLE ri_fp_fk;
+DROP TABLE ri_fp_pk3, ri_fp_pk2, ri_fp_pk1;
+DROP EXTENSION test_spi_resowner;
diff --git a/src/test/modules/test_spi_resowner/meson.build b/src/test/modules/test_spi_resowner/meson.build
new file mode 100644
index 00000000000..fbb027e05c7
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/meson.build
@@ -0,0 +1,31 @@
+test_spi_resowner_sources = files(
+ 'test_spi_resowner.c',
+)
+
+if host_system == 'windows'
+ test_spi_resowner_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+ '--NAME', 'test_spi_resowner',
+ '--FILEDESC', 'test_spi_resowner - SQL-callable C SPI function under a dedicated ResourceOwner',])
+endif
+
+test_spi_resowner = shared_module('test_spi_resowner',
+ test_spi_resowner_sources,
+ kwargs: pg_test_mod_args,
+)
+test_install_libs += test_spi_resowner
+
+test_install_data += files(
+ 'test_spi_resowner.control',
+ 'test_spi_resowner--1.0.sql',
+)
+
+tests += {
+ 'name': 'test_spi_resowner',
+ 'sd': meson.current_source_dir(),
+ 'bd': meson.current_build_dir(),
+ 'regress': {
+ 'sql': [
+ 'ri_fastpath',
+ ],
+ },
+}
diff --git a/src/test/modules/test_spi_resowner/sql/ri_fastpath.sql b/src/test/modules/test_spi_resowner/sql/ri_fastpath.sql
new file mode 100644
index 00000000000..11a561a06ac
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/sql/ri_fastpath.sql
@@ -0,0 +1,105 @@
+--
+-- Test RI fast-path FK check under C-level SPI.
+--
+-- The RI fast-path caches PK relation references in ri_FastPathGetEntry()
+-- under the current resource owner. When FK triggers fire inside a
+-- C-level SPI context that creates a dedicated short-lived resource owner,
+-- those references must be released before the inner resource owner is
+-- released. The fix ensures batch callbacks fire at the same firing depth
+-- at which they were registered, while the corresponding resource owner
+-- is still alive. Without this, ri_FastPathTeardown would crash with
+-- Assert(rel->rd_refcnt > 0) in index_close.
+--
+-- Simple PL/pgSQL does not trigger this because its SPI connection spans
+-- the entire function call, so its resource owner outlives the batch
+-- callback. The critical test case requires a C function that creates a
+-- dedicated short-lived resource owner around its SPI call.
+--
+CREATE EXTENSION test_spi_resowner;
+
+CREATE TABLE ri_fp_pk1 (id serial PRIMARY KEY);
+CREATE TABLE ri_fp_pk2 (id serial PRIMARY KEY);
+CREATE TABLE ri_fp_pk3 (id serial PRIMARY KEY);
+INSERT INTO ri_fp_pk1 VALUES (1);
+INSERT INTO ri_fp_pk2 VALUES (1);
+INSERT INTO ri_fp_pk3 VALUES (1);
+
+CREATE TABLE ri_fp_fk (
+ id serial PRIMARY KEY,
+ a int REFERENCES ri_fp_pk1(id),
+ b int REFERENCES ri_fp_pk2(id),
+ c int REFERENCES ri_fp_pk3(id),
+ d int REFERENCES ri_fp_pk1(id),
+ e int REFERENCES ri_fp_pk2(id),
+ f int REFERENCES ri_fp_pk3(id)
+);
+
+-- C-level SPI INSERT: the critical test case.
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (1, 1, 1, 1, 1, 1)');
+
+-- Additional C-level SPI INSERTs to exercise batch reuse across calls.
+-- Use different column orderings to ensure each is a distinct statement.
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (f, e, d, c, b, a) VALUES (1, 1, 1, 1, 1, 1)');
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, c, e, b, d, f) VALUES (1, 1, 1, 1, 1, 1)');
+
+-- C-level SPI with FK violation: should error
+SELECT spi_exec_sql(
+ 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (999, 1, 1, 1, 1, 1)');
+
+-- Nested: PL/pgSQL calling C SPI (mimics PostGIS toTopoGeom pattern)
+CREATE FUNCTION plpgsql_calls_c_spi() RETURNS void AS $$
+DECLARE
+ ins_stmt text := 'INSERT INTO ri_fp_fk (a, b, c, d, e, f) VALUES (1, 1, 1, 1, 1, 1)';
+BEGIN
+ PERFORM spi_exec_sql(ins_stmt);
+END;
+$$ LANGUAGE plpgsql;
+
+SELECT plpgsql_calls_c_spi();
+
+-- AFTER trigger that uses C-level SPI to insert into an FK-referencing table.
+-- The FK batch callback is registered at the inner SPI's query level and
+-- must fire before the inner resource owner is released.
+CREATE TABLE ri_fp_outer (id int PRIMARY KEY);
+CREATE TABLE ri_fp_inner (id int REFERENCES ri_fp_pk1(id));
+
+CREATE FUNCTION outer_trigger_spi_ok() RETURNS trigger AS $$
+BEGIN
+ PERFORM spi_exec_sql('INSERT INTO ri_fp_inner VALUES (1)');
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+
+CREATE TRIGGER outer_tg AFTER INSERT ON ri_fp_outer
+ FOR EACH ROW EXECUTE FUNCTION outer_trigger_spi_ok();
+
+-- Fires outer_tg, whose PL/pgSQL body calls spi_exec_sql(). The C function
+-- creates a dedicated resource owner that is released after the FK batch
+-- callback fires.
+INSERT INTO ri_fp_outer VALUES (1);
+
+CREATE FUNCTION outer_trigger_spi_fail() RETURNS trigger AS $$
+BEGIN
+ PERFORM spi_exec_sql('INSERT INTO ri_fp_inner VALUES (3)');
+ RETURN NEW;
+END $$ LANGUAGE plpgsql;
+
+DROP TRIGGER outer_tg ON ri_fp_outer;
+DROP FUNCTION outer_trigger_spi_ok();
+
+CREATE TRIGGER outer_tg AFTER INSERT ON ri_fp_outer
+ FOR EACH ROW EXECUTE FUNCTION outer_trigger_spi_fail();
+
+-- Like above but the inner insert fails.
+INSERT INTO ri_fp_outer VALUES (2);
+
+DROP TRIGGER outer_tg ON ri_fp_outer;
+DROP FUNCTION outer_trigger_spi_fail();
+DROP TABLE ri_fp_inner, ri_fp_outer;
+
+-- Cleanup
+DROP TABLE ri_fp_fk;
+DROP TABLE ri_fp_pk3, ri_fp_pk2, ri_fp_pk1;
+DROP EXTENSION test_spi_resowner;
diff --git a/src/test/modules/test_spi_resowner/test_spi_resowner--1.0.sql b/src/test/modules/test_spi_resowner/test_spi_resowner--1.0.sql
new file mode 100644
index 00000000000..29ef70ee0dc
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/test_spi_resowner--1.0.sql
@@ -0,0 +1,9 @@
+/* src/test/modules/test_spi_resowner/test_spi_resowner--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_spi_resowner" to load this file. \quit
+
+CREATE FUNCTION spi_exec_sql(query text)
+RETURNS void
+AS 'MODULE_PATHNAME', 'spi_exec_sql'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/test_spi_resowner/test_spi_resowner.c b/src/test/modules/test_spi_resowner/test_spi_resowner.c
new file mode 100644
index 00000000000..0306139b5c0
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/test_spi_resowner.c
@@ -0,0 +1,70 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_spi_resowner.c
+ * SQL-callable C function that uses SPI to execute a query.
+ *
+ * Useful for testing code paths that only trigger under C-level
+ * SPI (not PL/pgSQL), such as resource owner interactions with
+ * RI fast-path FK checks.
+ *
+ * Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/test_spi_resowner/test_spi_resowner.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "executor/spi.h"
+#include "utils/builtins.h"
+
+PG_MODULE_MAGIC;
+
+PG_FUNCTION_INFO_V1(spi_exec_sql);
+
+/*
+ * spi_exec_sql(query text) - execute a SQL query via SPI.
+ *
+ * Opens a fresh SPI connection, executes the query, and closes the
+ * connection. Creates a dedicated child resource owner around the
+ * SPI_execute call and releases it before returning, ensuring that
+ * any resources registered under it (such as relation references
+ * opened by RI fast-path FK checks) are released before the outer
+ * trigger-firing batch callback fires. This reproduces the resource
+ * owner mismatch that occurs with C-language extensions like PostGIS
+ * topology functions, which cannot be triggered from PL/pgSQL since
+ * PL/pgSQL's SPI connection spans the entire function call.
+ */
+Datum
+spi_exec_sql(PG_FUNCTION_ARGS)
+{
+ const char *query = text_to_cstring(PG_GETARG_TEXT_PP(0));
+ int ret;
+ ResourceOwner save = CurrentResourceOwner;
+ ResourceOwner childowner = ResourceOwnerCreate(save, "test_spi inner");
+
+ SPI_connect();
+
+ CurrentResourceOwner = childowner;
+ ret = SPI_execute(query, false, 0);
+
+ if (ret < 0)
+ elog(ERROR, "SPI_execute failed: error code %d", ret);
+
+ SPI_finish();
+
+ CurrentResourceOwner = save;
+ ResourceOwnerRelease(childowner,
+ RESOURCE_RELEASE_BEFORE_LOCKS,
+ true, false);
+ ResourceOwnerRelease(childowner,
+ RESOURCE_RELEASE_LOCKS,
+ true, false);
+ ResourceOwnerRelease(childowner,
+ RESOURCE_RELEASE_AFTER_LOCKS,
+ true, false);
+ ResourceOwnerDelete(childowner);
+
+ PG_RETURN_VOID();
+}
diff --git a/src/test/modules/test_spi_resowner/test_spi_resowner.control b/src/test/modules/test_spi_resowner/test_spi_resowner.control
new file mode 100644
index 00000000000..2120ae9442f
--- /dev/null
+++ b/src/test/modules/test_spi_resowner/test_spi_resowner.control
@@ -0,0 +1,4 @@
+comment = 'Test SQL-callable C function that uses SPI using dedicated ResourceOwner'
+default_version = '1.0'
+module_pathname = '$libdir/test_spi_resowner'
+relocatable = true
--
2.47.3
* Re: Eliminating SPI / SQL from some RI triggers - take 3
2024-12-20 04:23 Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 04:07 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 05:10 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Pavel Stehule <[email protected]>
2025-10-22 13:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-12-01 06:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-02 15:30 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-10 12:28 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-18 15:34 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-19 16:19 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-24 11:47 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-24 13:56 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-25 00:41 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-30 04:55 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-30 11:15 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-31 09:09 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Chao Li <[email protected]>
2026-03-31 09:17 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-31 10:57 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-03-31 12:15 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-03-31 15:54 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-04-01 08:51 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-01 09:51 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-01 11:56 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Junwang Zhao <[email protected]>
2026-04-01 12:18 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-02 07:41 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-02 07:59 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Chao Li <[email protected]>
2026-04-03 05:52 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-03 08:57 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Chao Li <[email protected]>
2026-04-03 09:39 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-06 09:45 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-07 01:45 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Chao Li <[email protected]>
2026-04-07 02:12 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-07 12:59 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Evan Montgomery-Recht <[email protected]>
2026-04-08 01:23 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-08 09:58 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-08 14:26 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-09 07:39 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Chao Li <[email protected]>
2026-04-09 08:40 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2026-04-09 10:25 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Chao Li <[email protected]>
2026-04-10 07:39 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
@ 2026-04-10 23:34 ` Haibo Yan <[email protected]>
2026-04-15 06:03 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
0 siblings, 1 reply; 61+ messages in thread
From: Haibo Yan @ 2026-04-10 23:34 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Chao Li <[email protected]>; Evan Montgomery-Recht <[email protected]>; pgsql-hackers
On Fri, Apr 10, 2026 at 12:39 AM Amit Langote <[email protected]>
wrote:
> On Thu, Apr 9, 2026 at 7:26 PM Chao Li <[email protected]> wrote:
> > 0001 and 0002 look good to me. I didn’t review 0003 and don’t intend to
> > review it.
>
> I've now pushed 0001 (34a3078629) and 0002 (d6e96bacd3c).
>
> Here's the remaining patch to add src/test/modules/test_spi_resowner
> rebased against master. I'm holding off on committing the test module
> until I confirm the policy on new test modules during feature freeze.
> It's also worth discussing whether this is the right place for testing
> C extensions that use SPI with a dedicated resource owner, or whether
> that coverage belongs elsewhere.
>
> --
> Thanks, Amit Langote
>
I reviewed the patch, and overall it looks close. I have a few comments:
1. Should spi_exec_sql() be made exception-safe?
   The current implementation does not restore CurrentResourceOwner or
   release/delete childowner on all error paths, and it also does not check
   for SPI_connect() failure. Since this module is specifically meant to
   exercise ResourceOwner lifetime interactions, I think the helper itself
   should be robust in both success and error paths.
2. Consider adding a follow-up test that does failure first, then success.
   That would help show that the helper does not leave any lingering state
   behind after an error.
3. Consider trimming the long explanatory comments in the regression test
   a bit.
   The rationale is useful, but some of it is repeated across the commit
   message, the SQL file header, and the expected output.
Regards
Haibo
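For readers following the first two review points, the cleanup shape being asked for can be sketched in isolation. This is a standalone model, not PostgreSQL code: setjmp/longjmp stand in for PG_TRY()/PG_FINALLY(), and Owner, current_owner, run_query, and exec_with_child_owner are hypothetical stand-ins for a resource owner, CurrentResourceOwner, SPI_execute(), and spi_exec_sql(). One deliberate simplification: the model reports failure through its return value, whereas a real PG_FINALLY block would re-throw the error after cleanup.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a resource owner. */
typedef struct Owner
{
	const char *name;
	bool		released;
} Owner;

static Owner top_owner = {"top", false};
static Owner *current_owner = &top_owner;	/* models CurrentResourceOwner */
static jmp_buf *error_jump = NULL;	/* models the innermost error handler */

/* Models ereport(ERROR): transfer control to the innermost handler. */
static void
throw_error(void)
{
	assert(error_jump != NULL);
	longjmp(*error_jump, 1);
}

/* Models SPI_execute(): any query containing "bad" raises an error. */
static void
run_query(const char *query)
{
	if (strstr(query, "bad") != NULL)
		throw_error();
}

/*
 * Run a query under a child owner.  The same cleanup runs on the success
 * path and on the error path, so no state lingers after a failure.
 * Returns true on success, false if the query raised an error.
 */
bool
exec_with_child_owner(const char *query, Owner *child)
{
	Owner	   *save = current_owner;
	jmp_buf    *save_jump = error_jump;
	jmp_buf		local_jump;
	volatile bool ok = true;

	child->released = false;
	if (setjmp(local_jump) == 0)
	{
		error_jump = &local_jump;
		current_owner = child;
		run_query(query);
	}
	else
		ok = false;			/* arrived here via longjmp */

	/* Cleanup shared by both paths. */
	error_jump = save_jump;
	current_owner = save;
	child->released = true;
	return ok;
}
```

Calling exec_with_child_owner() first with a failing query and then with a succeeding one mirrors the "failure first, then success" test suggested in point 2: after each call the saved owner is current again and the child has been released.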
* Re: Eliminating SPI / SQL from some RI triggers - take 3
@ 2026-04-15 06:03 ` Amit Langote <[email protected]>
0 siblings, 0 replies; 61+ messages in thread
From: Amit Langote @ 2026-04-15 06:03 UTC (permalink / raw)
To: Haibo Yan <[email protected]>; +Cc: Chao Li <[email protected]>; Evan Montgomery-Recht <[email protected]>; pgsql-hackers
On Sat, Apr 11, 2026 at 8:34 AM Haibo Yan <[email protected]> wrote:
> On Fri, Apr 10, 2026 at 12:39 AM Amit Langote <[email protected]> wrote:
>> Here's the remaining patch to add src/test/modules/test_spi_resowner
>> rebased against master. I'm holding off on committing the test module
>> until I confirm the policy on new test modules during feature freeze.
>> It's also worth discussing whether this is the right place for testing
>> C extensions that use SPI with a dedicated resource owner, or whether
>> that coverage belongs elsewhere.
>
> I reviewed the patch, and overall it looks close. I have a few comments:
>
> Should spi_exec_sql() be made exception-safe?
>
> The current implementation does not restore CurrentResourceOwner or release/delete childowner on all error paths, and it also does not check for SPI_connect() failure. Since this module is specifically meant to exercise ResourceOwner lifetime interactions, I think the helper itself should be robust in both success and error paths.
>
> Consider adding a follow-up test that does failure first, then success.
>
> That would help show that the helper does not leave any lingering state behind after an error.
>
> Consider trimming the long explanatory comments in the regression test a bit.
>
> The rationale is useful, but some of it is repeated across the commit message, the SQL file header, and the expected output.
Thanks Haibo for the review. Your points are well taken and would need
to be addressed if this module were to be committed, but I've been
reconsidering whether to commit it at all. It was written to reproduce
a specific crash caused by an extension's idiosyncratic use of SPI
with a dedicated resource owner, a pattern that's specific to PostGIS
and similar extensions rather than something core PostgreSQL
exercises. Now that the crash is fixed, the module's main value is as
a regression test for that one scenario. I'm not convinced it pulls
its weight as a permanent addition to the test suite, especially given
the maintenance burden and the time it adds to test runs.
I'll hold off on committing it unless someone feels strongly that it
should be included.
--
Thanks, Amit Langote
* Re: Eliminating SPI / SQL from some RI triggers - take 3
@ 2026-04-09 09:29 ` Amit Langote <[email protected]>
1 sibling, 0 replies; 61+ messages in thread
From: Amit Langote @ 2026-04-09 09:29 UTC (permalink / raw)
To: Evan Montgomery-Recht <[email protected]>; +Cc: pgsql-hackers
On Wed, Apr 8, 2026 at 10:23 AM Amit Langote <[email protected]> wrote:
> On Tue, Apr 7, 2026 at 10:00 PM Evan Montgomery-Recht
> <[email protected]> wrote:
> > Unrelated to my patch, SonarCloud flagged a potential issue in
> > recheck_matched_pk_tuple() (line 3370): the function loops over
> > ii_NumIndexKeyAttrs elements of the skeys array, but the caller in
> > ri_FastPathFlushArray passes recheck_skey[1] -- an array of exactly
> > one element. This is safe because ri_FastPathFlushArray is the
> > single-column FK path, so ii_NumIndexKeyAttrs is always 1 there.
> > However, the function signature doesn't communicate this constraint,
> > which flags as CWE-125 (out-of-bounds read) / CERT C ARR30-C. Adding
> > an nkeys parameter (like ri_FastPathProbeOne already has) would make
> > the contract explicit.
>
> Makes sense. Will push the attached patch for this.
Pushed this fix.
--
Thanks, Amit Langote
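The contract issue discussed above can be illustrated outside of ri_triggers.c. The following is a minimal standalone sketch, assuming a hypothetical Key type and a hypothetical recheck_tuple() function; it is not the pushed fix, only an instance of the general idea of passing the element count alongside the array so the loop bound is explicit at the call site.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a scan key: one expected value per column. */
typedef struct Key
{
	int			expected;
} Key;

/*
 * Recheck a tuple against nkeys scan keys.  The caller states how many
 * elements keys[] holds, so the loop bound is visible in the signature
 * instead of being read from unrelated index metadata.
 */
bool
recheck_tuple(const int *tuple, const Key *keys, int nkeys)
{
	for (int i = 0; i < nkeys; i++)
	{
		if (tuple[i] != keys[i].expected)
			return false;
	}
	return true;
}
```

A single-column caller passes a one-element array together with nkeys = 1, which a reader or a static analyzer can check locally without knowing that the surrounding code path only handles single-column keys.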
* Re: Eliminating SPI / SQL from some RI triggers - take 3
@ 2026-04-20 20:50 ` Peter Eisentraut <[email protected]>
2026-04-21 00:52 ` Re: Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
1 sibling, 1 reply; 61+ messages in thread
From: Peter Eisentraut @ 2026-04-20 20:50 UTC (permalink / raw)
To: Amit Langote <[email protected]>; Junwang Zhao <[email protected]>; +Cc: Chao Li <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
On 02.04.26 09:41, Amit Langote wrote:
> There's another case in which it is not ok to use FlushArray and that
> is if the index AM's amsearcharray is false (should be true in all
> cases because the unique index used for PK is always btree). Added an
> Assert to that effect next to where SK_SEARCHARRAY is set in
> ri_FastPathFlushArray rather than a runtime check in the dispatch
> condition.
>
> Patch updated. Also added a comment about invalidation requirement or
> lack thereof for RI_FastPathEntry, renamed AfterTriggerBatchIsActive()
> to simply AfterTriggerIsActive(), and fixed the comments in trigger.h
> describing the callback mechanism.
>
> Will push tomorrow morning (Friday) barring objections.
This commit contains a couple of calls

    ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
                                  fk_rel, idx_rel);

where the cast casts away the const-ness of riinfo.
But this is kind of a lie, since the purpose of
ri_populate_fastpath_metadata() is to modify riinfo.
I think the right thing to do here is to unwind the const qualifiers up
the stack. See attached patch.
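As a standalone illustration of the point about const, here is a minimal sketch with hypothetical names (Info, populate_meta, fast_path_check) standing in for RI_ConstraintInfo, ri_populate_fastpath_metadata(), and its callers. It is not taken from the patch; it only shows the shape of a call chain in which every function that may modify the struct says so in its signature.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a struct with lazily built metadata. */
typedef struct Info
{
	int			constraint_id;
	bool		has_meta;	/* models "riinfo->fpmeta != NULL" */
} Info;

/* Modifies its argument, so it takes a non-const pointer. */
static void
populate_meta(Info *info)
{
	info->has_meta = true;
}

/*
 * Because this function may call populate_meta(), its own parameter is
 * non-const as well.  Declaring it "const Info *" and casting at the
 * call site would compile, but the signature would then promise
 * something the body does not honor.
 */
void
fast_path_check(Info *info)
{
	if (!info->has_meta)
		populate_meta(info);
	assert(info->has_meta);
}
```

With the qualifier dropped consistently, no cast is needed anywhere in the chain, and a caller can tell from the prototype alone that the struct may be updated.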
From 35f273c812aa4f1345a3c1a9eb1443e3c7439254 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <[email protected]>
Date: Mon, 20 Apr 2026 22:40:05 +0200
Subject: [PATCH] Fix some const qualifier use in ri_triggers.c
---
src/backend/utils/adt/ri_triggers.c | 34 ++++++++++++++---------------
1 file changed, 16 insertions(+), 18 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index e060280fcd4..f63a7f0b580 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -297,9 +297,9 @@ static RI_CompareHashEntry *ri_HashCompareOp(Oid eq_opr, Oid typeid);
static void ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname,
int tgkind);
-static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
- Relation trig_rel, bool rel_is_pk);
-static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
+static RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
+ Relation trig_rel, bool rel_is_pk);
+static RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
static SPIPlanPtr ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
@@ -309,12 +309,12 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool is_restrict,
bool detectNewRows, int expect_OK);
-static void ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+static void ri_FastPathCheck(RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot);
-static void ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+static void ri_FastPathBatchAdd(RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot);
static void ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel,
- const RI_ConstraintInfo *riinfo);
+ RI_ConstraintInfo *riinfo);
static int ri_FastPathFlushArray(RI_FastPathEntry *fpentry, TupleTableSlot *fk_slot,
const RI_ConstraintInfo *riinfo, Relation fk_rel,
Snapshot snapshot, IndexScanDesc scandesc);
@@ -357,7 +357,7 @@ static void ri_FastPathTeardown(void);
static Datum
RI_FKey_check(TriggerData *trigdata)
{
- const RI_ConstraintInfo *riinfo;
+ RI_ConstraintInfo *riinfo;
Relation fk_rel;
Relation pk_rel;
TupleTableSlot *newslot;
@@ -2341,11 +2341,11 @@ ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname, int tgkind)
/*
* Fetch the RI_ConstraintInfo struct for the trigger's FK constraint.
*/
-static const RI_ConstraintInfo *
+static RI_ConstraintInfo *
ri_FetchConstraintInfo(Trigger *trigger, Relation trig_rel, bool rel_is_pk)
{
Oid constraintOid = trigger->tgconstraint;
- const RI_ConstraintInfo *riinfo;
+ RI_ConstraintInfo *riinfo;
/*
* Check that the FK constraint's OID is available; it might not be if
@@ -2395,7 +2395,7 @@ ri_FetchConstraintInfo(Trigger *trigger, Relation trig_rel, bool rel_is_pk)
/*
* Fetch or create the RI_ConstraintInfo struct for an FK constraint.
*/
-static const RI_ConstraintInfo *
+static RI_ConstraintInfo *
ri_LoadConstraintInfo(Oid constraintOid)
{
RI_ConstraintInfo *riinfo;
@@ -2777,7 +2777,7 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
* ri_FastPathBatchAdd().
*/
static void
-ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+ri_FastPathCheck(RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot)
{
Relation pk_rel;
@@ -2820,8 +2820,7 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
{
/* Reload to ensure it's valid. */
riinfo = ri_LoadConstraintInfo(riinfo->constraint_id);
- ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
- fk_rel, idx_rel);
+ ri_populate_fastpath_metadata(riinfo, fk_rel, idx_rel);
}
Assert(riinfo->fpmeta);
ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
@@ -2857,7 +2856,7 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
* ri_FastPathEndBatch().
*/
static void
-ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
+ri_FastPathBatchAdd(RI_ConstraintInfo *riinfo,
Relation fk_rel, TupleTableSlot *newslot)
{
RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
@@ -2884,7 +2883,7 @@ ri_FastPathBatchAdd(const RI_ConstraintInfo *riinfo,
*/
static void
ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel,
- const RI_ConstraintInfo *riinfo)
+ RI_ConstraintInfo *riinfo)
{
Relation pk_rel = fpentry->pk_rel;
Relation idx_rel = fpentry->idx_rel;
@@ -2941,8 +2940,7 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel,
{
/* Reload to ensure it's valid. */
riinfo = ri_LoadConstraintInfo(riinfo->constraint_id);
- ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
- fk_rel, idx_rel);
+ ri_populate_fastpath_metadata(riinfo, fk_rel, idx_rel);
}
Assert(riinfo->fpmeta);
@@ -4147,7 +4145,7 @@ ri_FastPathEndBatch(void *arg)
if (entry->batch_count > 0)
{
Relation fk_rel = table_open(entry->fk_relid, AccessShareLock);
- const RI_ConstraintInfo *riinfo = ri_LoadConstraintInfo(entry->conoid);
+ RI_ConstraintInfo *riinfo = ri_LoadConstraintInfo(entry->conoid);
ri_FastPathBatchFlush(entry, fk_rel, riinfo);
table_close(fk_rel, NoLock);
--
2.53.0
Attachments:
[text/plain] 0001-Fix-some-const-qualifier-use-in-ri_triggers.c.patch (5.3K, 2-0001-Fix-some-const-qualifier-use-in-ri_triggers.c.patch)
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Amit Langote @ 2026-04-21 00:52 UTC (permalink / raw)
To: Peter Eisentraut <[email protected]>; +Cc: Junwang Zhao <[email protected]>; Chao Li <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
On Tue, Apr 21, 2026 at 5:50 AM Peter Eisentraut <[email protected]> wrote:
> On 02.04.26 09:41, Amit Langote wrote:
> > There's another case in which it is not ok to use FlushArray and that
> > is if the index AM's amsearcharray is false (should be true in all
> > cases because the unique index used for PK is always btree). Added an
> > Assert to that effect next to where SK_SEARCHARRAY is set in
> > ri_FastPathFlushArray rather than a runtime check in the dispatch
> > condition.
> >
> > Patch updated. Also added a comment about the invalidation requirement,
> > or lack thereof, for RI_FastPathEntry, renamed AfterTriggerBatchIsActive()
> > to simply AfterTriggerIsActive(), and fixed the comments in trigger.h
> > describing the callback mechanism.
> >
> > Will push tomorrow morning (Friday) barring objections.
>
> This commit contains a couple of calls
>
> ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
> fk_rel, idx_rel);
>
> where the cast casts away the const-ness of riinfo.
>
> But this is kind of a lie, since the purpose of
> ri_populate_fastpath_metadata() is to modify riinfo.
>
> I think the right thing to do here is to unwind the const qualifiers up
> the stack. See attached patch.
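The point quoted above can be illustrated with a minimal, self-contained sketch. This is not the ri_triggers.c code: DemoInfo, demo_populate(), demo_check_with_cast() and demo_check() are hypothetical stand-ins invented for illustration, and only the shape of the problem (a const-qualified parameter that is cast back to non-const so a callee can modify it) is taken from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached constraint-info struct. */
typedef struct DemoInfo
{
	int			constraint_id;
	void	   *fpmeta;			/* lazily populated, NULL until first use */
} DemoInfo;

static int	demo_meta = 42;		/* dummy payload for the sketch */

/* Modifies its argument, so the parameter cannot honestly be const. */
static void
demo_populate(DemoInfo *info)
{
	info->fpmeta = &demo_meta;
}

/*
 * "Before" shape: the signature promises read-only access, but the cast
 * quietly discards that promise so the callee can write to the struct.
 */
static void
demo_check_with_cast(const DemoInfo *info)
{
	if (info->fpmeta == NULL)
		demo_populate((DemoInfo *) info);
	assert(info->fpmeta != NULL);
}

/*
 * "After" shape: const is dropped from the caller's parameter as well, so
 * the signature says what the function really does and no cast is needed.
 */
static void
demo_check(DemoInfo *info)
{
	if (info->fpmeta == NULL)
		demo_populate(info);
	assert(info->fpmeta != NULL);
}
```

Both variants behave the same when handed a non-const object. The difference is that the second one lets the compiler flag any caller that only holds a const pointer, which is presumably why the patch removes the qualifier from each function up the call chain rather than keeping the cast.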
Thanks for the patch. LGTM.
Are you planning to push it or do you want me to?
--
Thanks, Amit Langote
* Re: Eliminating SPI / SQL from some RI triggers - take 3
From: Amit Langote @ 2026-04-22 04:04 UTC (permalink / raw)
To: Peter Eisentraut <[email protected]>; +Cc: Junwang Zhao <[email protected]>; Chao Li <[email protected]>; Haibo Yan <[email protected]>; Pavel Stehule <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>
On Tue, Apr 21, 2026 at 9:52 AM Amit Langote <[email protected]> wrote:
> On Tue, Apr 21, 2026 at 5:50 AM Peter Eisentraut <[email protected]> wrote:
> > On 02.04.26 09:41, Amit Langote wrote:
> > > There's another case in which it is not ok to use FlushArray and that
> > > is if the index AM's amsearcharray is false (should be true in all
> > > cases because the unique index used for PK is always btree). Added an
> > > Assert to that effect next to where SK_SEARCHARRAY is set in
> > > ri_FastPathFlushArray rather than a runtime check in the dispatch
> > > condition.
> > >
> > > Patch updated. Also added a comment about the invalidation requirement,
> > > or lack thereof, for RI_FastPathEntry, renamed AfterTriggerBatchIsActive()
> > > to simply AfterTriggerIsActive(), and fixed the comments in trigger.h
> > > describing the callback mechanism.
> > >
> > > Will push tomorrow morning (Friday) barring objections.
> >
> > This commit contains a couple of calls
> >
> > ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
> > fk_rel, idx_rel);
> >
> > where the cast casts away the const-ness of riinfo.
> >
> > But this is kind of a lie, since the purpose of
> > ri_populate_fastpath_metadata() is to modify riinfo.
> >
> > I think the right thing to do here is to unwind the const qualifiers up
> > the stack. See attached patch.
>
> Thanks for the patch. LGTM.
>
> Are you planning to push it or do you want me to?
Pushed.
--
Thanks, Amit Langote
Thread overview: 61+ messages
2024-12-20 04:23 Eliminating SPI / SQL from some RI triggers - take 3 Amit Langote <[email protected]>
2025-10-21 04:07 ` Amit Langote <[email protected]>
2025-10-21 05:10 ` Pavel Stehule <[email protected]>
2025-10-22 13:55 ` Amit Langote <[email protected]>
2025-12-01 06:09 ` Junwang Zhao <[email protected]>
2026-03-02 07:49 ` Amit Langote <[email protected]>
2026-04-09 11:07 ` Sandro Santilli <[email protected]>
2026-04-09 11:55 ` Amit Langote <[email protected]>
2026-04-09 16:01 ` Sandro Santilli <[email protected]>
2026-04-10 04:14 ` Amit Langote <[email protected]>
2026-04-10 04:20 ` Chao Li <[email protected]>
2026-04-10 04:21 ` Amit Langote <[email protected]>
2026-04-10 18:35 ` Sandro Santilli <[email protected]>
2026-03-02 15:30 ` Junwang Zhao <[email protected]>
2026-03-10 12:28 ` Junwang Zhao <[email protected]>
2026-03-16 14:03 ` Amit Langote <[email protected]>
2026-03-20 08:20 ` Amit Langote <[email protected]>
2026-03-18 15:34 ` Junwang Zhao <[email protected]>
2026-03-19 16:19 ` Junwang Zhao <[email protected]>
2026-03-24 11:47 ` Amit Langote <[email protected]>
2026-03-24 13:56 ` Amit Langote <[email protected]>
2026-03-25 00:41 ` Amit Langote <[email protected]>
2026-03-30 04:55 ` Amit Langote <[email protected]>
2026-03-30 11:15 ` Amit Langote <[email protected]>
2026-03-31 09:09 ` Chao Li <[email protected]>
2026-03-31 09:17 ` Amit Langote <[email protected]>
2026-03-31 10:57 ` Junwang Zhao <[email protected]>
2026-03-31 11:26 ` Amit Langote <[email protected]>
2026-03-31 11:35 ` Daniel Gustafsson <[email protected]>
2026-03-31 13:33 ` Tomas Vondra <[email protected]>
2026-04-01 00:06 ` Amit Langote <[email protected]>
2026-03-31 12:15 ` Amit Langote <[email protected]>
2026-03-31 15:54 ` Junwang Zhao <[email protected]>
2026-04-01 08:51 ` Amit Langote <[email protected]>
2026-04-01 09:51 ` Amit Langote <[email protected]>
2026-04-01 11:56 ` Junwang Zhao <[email protected]>
2026-04-01 12:18 ` Amit Langote <[email protected]>
2026-04-02 07:41 ` Amit Langote <[email protected]>
2026-04-02 07:59 ` Chao Li <[email protected]>
2026-04-03 05:52 ` Amit Langote <[email protected]>
2026-04-03 08:57 ` Chao Li <[email protected]>
2026-04-03 09:39 ` Amit Langote <[email protected]>
2026-04-06 09:45 ` Amit Langote <[email protected]>
2026-04-07 01:45 ` Chao Li <[email protected]>
2026-04-07 02:12 ` Amit Langote <[email protected]>
2026-04-07 12:59 ` Evan Montgomery-Recht <[email protected]>
2026-04-08 01:23 ` Amit Langote <[email protected]>
2026-04-08 09:58 ` Amit Langote <[email protected]>
2026-04-08 14:26 ` Amit Langote <[email protected]>
2026-04-09 07:39 ` Chao Li <[email protected]>
2026-04-09 08:40 ` Amit Langote <[email protected]>
2026-04-09 09:21 ` jie wang <[email protected]>
2026-04-09 09:24 ` Amit Langote <[email protected]>
2026-04-09 10:25 ` Chao Li <[email protected]>
2026-04-10 07:39 ` Amit Langote <[email protected]>
2026-04-10 23:34 ` Haibo Yan <[email protected]>
2026-04-15 06:03 ` Amit Langote <[email protected]>
2026-04-09 09:29 ` Amit Langote <[email protected]>
2026-04-20 20:50 ` Peter Eisentraut <[email protected]>
2026-04-21 00:52 ` Amit Langote <[email protected]>
2026-04-22 04:04 ` Amit Langote <[email protected]>