public inbox for [email protected]
help / color / mirror / Atom feedFrom: Amit Langote <[email protected]>
To: Junwang Zhao <[email protected]>
Cc: Pavel Stehule <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Subject: Re: Eliminating SPI / SQL from some RI triggers - take 3
Date: Thu, 19 Feb 2026 18:21:27 +0900
Message-ID: <CA+HiwqEyiLCY6MTLbOJXDdLNNQLaURYHvdW797MQgbjEK9od4Q@mail.gmail.com> (raw)
In-Reply-To: <CAEG8a3JM=NoqiTK0V6S9FNxZPvy1+C5F7rfafTtPKBVWnunL-g@mail.gmail.com>
References: <CA+HiwqF4C0ws3cO+z5cLkPuvwnAwkSp7sfvgGj3yQ=Li6KNMqA@mail.gmail.com>
<CA+HiwqGM6nvAV5O+=Nr+BXMPWOma0oeCRVzVP0XiLE8zX5TVAg@mail.gmail.com>
<CA+HiwqGMaovCUgDbGxVGnK0Mrivr+ph3YE2Ws+47-ugyPb4f7g@mail.gmail.com>
<CAFj8pRDaiBe_GOLk_yyYHTtPiDAAaLOM8u1-=Q3ZgXBTH+1igg@mail.gmail.com>
<CA+HiwqGA5Ay_MR0eJEEbt4j6WrVh4F+AasTp8yCbs5aJLOJn6Q@mail.gmail.com>
<CAEG8a3JM=NoqiTK0V6S9FNxZPvy1+C5F7rfafTtPKBVWnunL-g@mail.gmail.com>
Hi Junwang,
On Mon, Dec 1, 2025 at 3:09 PM Junwang Zhao <[email protected]> wrote:
> As Amit has already stated, we are approaching a hybrid "fast-path + fallback"
> design.
>
> 0001 adds a fast path optimization for foreign key constraint checks
> that bypasses the SPI executor, the fast path applies when the referenced
> table is not partitioned, and the constraint does not involve temporal
> semantics.
>
> With the following test:
>
> create table pk (a numeric primary key);
> create table fk (a bigint references pk);
> insert into pk select generate_series(1, 2000000);
>
> head:
>
> [local] zhjwpku@postgres:5432-90419=# insert into fk select
> generate_series(1, 2000000, 2);
> INSERT 0 1000000
> Time: 13516.177 ms (00:13.516)
>
> [local] zhjwpku@postgres:5432-90419=# update fk set a = a + 1;
> UPDATE 1000000
> Time: 15057.638 ms (00:15.058)
>
> patched:
>
> [local] zhjwpku@postgres:5432-98673=# insert into fk select
> generate_series(1, 2000000, 2);
> INSERT 0 1000000
> Time: 8248.777 ms (00:08.249)
>
> [local] zhjwpku@postgres:5432-98673=# update fk set a = a + 1;
> UPDATE 1000000
> Time: 10117.002 ms (00:10.117)
>
> 0002 cache fast-path metadata used by the index probe, at the current
> time only comparison operator hash entries, operator function OIDs
> and strategy numbers and subtypes for index scans. But this cache
> doesn't buy any performance improvement.
>
> Caching additional metadata should improve performance for foreign key checks.
>
> Amit suggested introducing a mechanism for ri_triggers.c to register a
> cleanup callback in the EState, which AfterTriggerEndQuery() could then
> invoke to release per-statement cached metadata (such as the IndexScanDesc).
> However, I haven't been able to implement this mechanism yet.
Thanks for working on this. I've taken your patches as a starting
point and reworked the series into two patches (attached): 1st is your
0001+0002 as the core patch that adds a gated fast-path alternative to
SPI and 2nd where I added per-statement resource caching. Doing the
latter turned out to be not so hard thanks to the structure you chose
to build the core fast path. Good call on adding the RLS and ACL test
cases, btw.
So, 0001 is a functionally complete fast path: concurrency handling,
REPEATABLE READ crosscheck, cross-type operators, security context,
and metadata caching. 0002 implements the per-statement resource
caching we discussed, though instead of sharing the EState between
trigger.c and ri_triggers.c it uses a new AfterTriggerBatchCallback
mechanism that fires at the end of each trigger-firing cycle
(per-statement for immediate constraints, or until COMMIT for deferred
ones). It layers resource caching on top so that the PK relation,
index, scan descriptor, and snapshot stay open across all FK trigger
invocations within a single trigger-firing cycle rather than being
opened and closed per row.
Note that phe previous 0002 (metadata caching) is folded into 0001,
and most of the new fast-path logic added in 0001 now lives in
ri_FastPathCheck() rather than inline in RI_FKey_check(), so the
RI_FKey_check diff is just the gating call and SPI fallback.
I re-ran the benchmarks (same test as yours, different machine):
create table pk (a numeric primary key);
create table fk (a bigint references pk);
insert into pk select generate_series(1, 2000000);
insert into fk select generate_series(1, 2000000, 2);
master: 2444 ms (median of 3 runs)
0001: 1382 ms (43% faster)
0001+0002: 1202 ms (51% faster, 13% over 0001 alone)
Also, with int PK / int FK (1M rows):
create table pk (a int primary key);
create table fk (a int references pk);
insert into pk select generate_series(1, 1000000);
insert into fk select generate_series(1, 1000000);
master: 1000 ms
0001: 520 ms (48% faster)
0001+0002: 432 ms (57% faster, 17% over 0001 alone)
The incremental gain from 0002 comes from eliminating per-row relation
open/close, scan begin/end, slot alloc/free, and replacing per-row
GetSnapshotData() with only curcid adjustment on the registered
snapshot copy in the cache.
The two current limitations are partitioned referenced tables and
temporal foreign keys. Partitioned PKs are relatively uncommon in
practice, so the non-partitioned case should cover most FK workloads,
so I'm not sure it's worth the added complexity to support them.
Temporal FKs are inherently multi-row, so they're a poor fit for a
single-probe fast path.
David Rowley mentioned off-list that it might be worth batching
multiple FK values into a single index probe, leveraging the
ScalarArrayOp btree improvements from PostgreSQL 17. The idea would be
to buffer FK values across trigger invocations in the per-constraint
cache (0002 already has the right structure for this), build a
SK_SEARCHARRAY scan key, and let the btree AM walk the matching leaf
pages in one sorted traversal instead of one tree descent per row. The
locking and recheck would still be per-tuple, but the index traversal
cost drops significantly. Single-column FKs are the obvious starting
point. That seems worth exploring but can be done as a separate patch
on top of this.
I think the series is in reasonable shape but would appreciate extra
eyeballs, especially on the concurrency handling in ri_LockPKTuple()
in 0001 and the snapshot lifecycle in 0002. Or anything else that
catches one's eye.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v3-0001-Add-fast-path-for-foreign-key-constraint-checks.patch (27.5K, 2-v3-0001-Add-fast-path-for-foreign-key-constraint-checks.patch)
download | inline diff:
From 0359b2a0db7a195e797851b56c380a61ac549ecc Mon Sep 17 00:00:00 2001
From: Junwang Zhao <[email protected]>
Date: Thu, 19 Feb 2026 18:06:34 +0900
Subject: [PATCH v3 1/2] Add fast path for foreign key constraint checks
Add a fast-path optimization for foreign key checks that bypasses SPI
by directly probing the unique index on the referenced table.
The fast path applies when the referenced table is not partitioned and
the constraint does not involve temporal semantics. ri_FastPathCheck()
extracts the FK values, builds scan keys, performs an index scan, and
locks the matching tuple with LockTupleKeyShare via ri_LockPKTuple(),
which handles the RI-specific subset of table_tuple_lock() results.
If the locked tuple was reached by chasing an update chain
(tmfd.traversed), recheck_matched_pk_tuple() verifies that the key
is still the same, emulating EvalPlanQual.
For REPEATABLE READ / SERIALIZABLE, a second index probe with
GetTransactionSnapshot() replicates the SPI path's crosscheck_snapshot
behavior: a PK row visible to the latest snapshot but not to the
transaction snapshot is rejected.
The ri_CheckPermissions() function performs schema USAGE and table
SELECT checks, matching what the SPI path does implicitly.
ri_HashCompareOp() is adjusted to handle cross-type equality operators
(e.g. int48eq for int4 PK / int8 FK) which can appear in conpfeqop.
The original code asserted same-type operators only.
Per-key metadata (compare entries, operator procedures, strategy
numbers) is cached in RI_ConstraintInfo via
ri_populate_fastpath_metadata() on first use, eliminating repeated
calls to ri_HashCompareOp() and get_op_opfamily_properties().
conindid and pk_is_partitioned are also cached at constraint load
time, avoiding per-invocation syscache lookups and the need to open
pk_rel before deciding whether the fast path applies.
Author: Junwang Zhao <[email protected]>
Author: Amit Langote <[email protected]>
Discussion: https://postgr.es/m/
---
src/backend/utils/adt/ri_triggers.c | 465 +++++++++++++++++-
.../expected/fk-concurrent-pk-upd.out | 58 +++
src/test/isolation/isolation_schedule | 1 +
.../isolation/specs/fk-concurrent-pk-upd.spec | 42 ++
src/test/regress/expected/foreign_key.out | 47 ++
src/test/regress/sql/foreign_key.sql | 64 +++
6 files changed, 665 insertions(+), 12 deletions(-)
create mode 100644 src/test/isolation/expected/fk-concurrent-pk-upd.out
create mode 100644 src/test/isolation/specs/fk-concurrent-pk-upd.spec
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index d22b8ef7f3c..45cc742fa19 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -24,12 +24,15 @@
#include "postgres.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/index.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
+#include "catalog/pg_namespace.h"
#include "commands/trigger.h"
#include "executor/executor.h"
#include "executor/spi.h"
@@ -91,6 +94,7 @@
#define RI_TRIGTYPE_UPDATE 2
#define RI_TRIGTYPE_DELETE 3
+struct RI_CompareHashEntry;
/*
* RI_ConstraintInfo
@@ -132,6 +136,16 @@ typedef struct RI_ConstraintInfo
Oid period_intersect_oper; /* anyrange * anyrange (or
* multiranges) */
dlist_node valid_link; /* Link in list of valid entries */
+
+ Oid conindid;
+ bool pk_is_partitioned;
+
+ /* Fast-path metadata for RI checks on foreign tables */
+ bool fpmeta_valid; /* is fast-path metadata valid? */
+ struct RI_CompareHashEntry *compare_entries[RI_MAX_NUMKEYS];
+ RegProcedure regops[RI_MAX_NUMKEYS];
+ Oid subtypes[RI_MAX_NUMKEYS];
+ int strats[RI_MAX_NUMKEYS];
} RI_ConstraintInfo;
/*
@@ -233,6 +247,19 @@ static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool is_restrict,
bool detectNewRows, int expect_OK);
+static bool ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot);
+static bool ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
+ bool *concurrently_updated);
+static bool ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo);
+static void ri_CheckPermissions(Relation query_rel);
+static bool recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+ TupleTableSlot *new_slot);
+static void build_index_scankeys(const RI_ConstraintInfo *riinfo,
+ Relation idx_rel, Datum *pk_vals,
+ char *pk_nulls, ScanKey skeys);
+static void ri_populate_fastpath_metadata(RI_ConstraintInfo *riinfo,
+ Relation fk_rel, Relation idx_rel);
static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
const RI_ConstraintInfo *riinfo, bool rel_is_pk,
Datum *vals, char *nulls);
@@ -276,14 +303,7 @@ RI_FKey_check(TriggerData *trigdata)
if (!table_tuple_satisfies_snapshot(trigdata->tg_relation, newslot, SnapshotSelf))
return PointerGetDatum(NULL);
- /*
- * Get the relation descriptors of the FK and PK tables.
- *
- * pk_rel is opened in RowShareLock mode since that's what our eventual
- * SELECT FOR KEY SHARE will get on it.
- */
fk_rel = trigdata->tg_relation;
- pk_rel = table_open(riinfo->pk_relid, RowShareLock);
switch (ri_NullCheck(RelationGetDescr(fk_rel), newslot, riinfo, false))
{
@@ -293,7 +313,6 @@ RI_FKey_check(TriggerData *trigdata)
* No further check needed - an all-NULL key passes every type of
* foreign key constraint.
*/
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
case RI_KEYS_SOME_NULL:
@@ -318,7 +337,6 @@ RI_FKey_check(TriggerData *trigdata)
errdetail("MATCH FULL does not allow mixing of null and nonnull key values."),
errtableconstraint(fk_rel,
NameStr(riinfo->conname))));
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
case FKCONSTR_MATCH_SIMPLE:
@@ -327,7 +345,6 @@ RI_FKey_check(TriggerData *trigdata)
* MATCH SIMPLE - if ANY column is null, the key passes
* the constraint.
*/
- table_close(pk_rel, RowShareLock);
return PointerGetDatum(NULL);
#ifdef NOT_USED
@@ -352,8 +369,42 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
+ /*
+ * Fast path: probe the PK unique index directly, bypassing SPI.
+ *
+ * Note: pk_rel is NOT opened above. ri_fastpath_is_applicable() uses
+ * cached metadata (pk_is_partitioned) rather than an open Relation, and
+ * ri_FastPathCheck() opens it internally.
+ */
+ if (ri_fastpath_is_applicable(riinfo))
+ {
+ bool found = ri_FastPathCheck(riinfo, fk_rel, newslot);
+
+ if (found)
+ return PointerGetDatum(NULL);
+
+ /*
+ * ri_FastPathCheck opens pk_rel internally; we need it for
+ * ri_ReportViolation. Re-open briefly.
+ */
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ ri_ReportViolation(riinfo, pk_rel, fk_rel,
+ newslot, NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false, false);
+ }
+
+ /* Fall back to SPI */
+ elog(DEBUG1, "RI fastpath: constraint \"%s\" falling back to SPI",
+ NameStr(riinfo->conname));
+
SPI_connect();
+ /*
+ * pk_rel is opened in RowShareLock mode since that's what our eventual
+ * SELECT FOR KEY SHARE will get on it.
+ */
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+
/* Fetch or prepare a saved plan for the real check */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
@@ -2355,6 +2406,11 @@ ri_LoadConstraintInfo(Oid constraintOid)
dclist_push_tail(&ri_constraint_cache_valid_list, &riinfo->valid_link);
riinfo->valid = true;
+ riinfo->fpmeta_valid = false;
+
+ riinfo->conindid = conForm->conindid;
+ riinfo->pk_is_partitioned =
+ (get_rel_relkind(riinfo->pk_relid) == RELKIND_PARTITIONED_TABLE);
return riinfo;
}
@@ -2617,6 +2673,383 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
return SPI_processed != 0;
}
+/*
+ * ri_FastPathCheck
+ * Perform FK existence check via direct index probe, bypassing SPI.
+ *
+ * Returns true if the PK row was found (constraint satisfied).
+ * Returns false if no matching PK row exists; caller should call
+ * ri_ReportViolation().
+ */
+static bool
+ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
+ Relation fk_rel, TupleTableSlot *newslot)
+{
+ Relation pk_rel;
+ Relation idx_rel;
+ IndexScanDesc scandesc;
+ TupleTableSlot *slot;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ bool found = false;
+ Oid saved_userid;
+ int saved_sec_context;
+ Snapshot snapshot;
+
+ /*
+ * Advance the command counter so the snapshot sees the effects of prior
+ * triggers in this statement. Mirrors what the SPI path does in
+ * ri_PerformCheck().
+ */
+ CommandCounterIncrement();
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
+
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ idx_rel = index_open(riinfo->conindid, AccessShareLock);
+
+ slot = table_slot_create(pk_rel, NULL);
+ scandesc = index_beginscan(pk_rel, idx_rel,
+ snapshot, NULL,
+ riinfo->nkeys, 0);
+
+ if (!riinfo->fpmeta_valid)
+ ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
+ fk_rel, idx_rel);
+ Assert(riinfo->fpmeta_valid);
+
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+ ri_CheckPermissions(pk_rel);
+
+ ri_ExtractValues(fk_rel, newslot, riinfo, false, pk_vals, pk_nulls);
+ build_index_scankeys(riinfo, idx_rel, pk_vals, pk_nulls, skey);
+
+ index_rescan(scandesc, skey, riinfo->nkeys, NULL, 0);
+
+ if (index_getnext_slot(scandesc, ForwardScanDirection, slot))
+ {
+ bool concurrently_updated;
+
+ if (ri_LockPKTuple(pk_rel, slot, snapshot,
+ &concurrently_updated))
+ {
+ if (concurrently_updated)
+ found = recheck_matched_pk_tuple(idx_rel, skey, slot);
+ else
+ found = true;
+ }
+ }
+
+ /*--------
+ * Crosscheck for REPEATABLE READ / SERIALIZABLE:
+ *
+ * The latest snapshot can see PK rows committed after our transaction
+ * started. But the FK check must only succeed if the key also exists in a
+ * version visible to our transaction snapshot. We can't just do
+ * table_tuple_satisfies_snapshot on the locked tuple, because a non-key
+ * update creates a new version invisible to our snapshot even though the
+ * key hasn't changed.
+ *
+ * Instead, do a second index probe with the transaction snapshot. This
+ * correctly handles both cases:
+ * - Newly inserted PK row: not found -> reject
+ * - Non-key update of existing row: old version found -> accept
+ *
+ * This matches the crosscheck_snapshot behavior in the SPI path's
+ * ri_PerformCheck().
+ */
+ if (found && IsolationUsesXactSnapshot())
+ {
+ IndexScanDesc xact_scan;
+ TupleTableSlot *xact_slot;
+ Snapshot xact_snap = GetTransactionSnapshot();
+
+ xact_slot = table_slot_create(pk_rel, NULL);
+ xact_scan = index_beginscan(pk_rel,
+ idx_rel,
+ xact_snap, NULL,
+ riinfo->nkeys, 0);
+ index_rescan(xact_scan, skey, riinfo->nkeys, NULL, 0);
+
+ if (!index_getnext_slot(xact_scan, ForwardScanDirection, xact_slot))
+ found = false;
+
+ index_endscan(xact_scan);
+ ExecDropSingleTupleTableSlot(xact_slot);
+ }
+
+ index_endscan(scandesc);
+ index_close(idx_rel, NoLock);
+ table_close(pk_rel, NoLock);
+ ExecDropSingleTupleTableSlot(slot);
+
+ UnregisterSnapshot(snapshot);
+
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+ return found;
+}
+
+/*
+ * ri_LockPKTuple
+ * Lock a PK tuple found by the fast-path index scan.
+ *
+ * Calls table_tuple_lock() directly with handling specific to RI checks.
+ * Returns true if the tuple was successfully locked.
+ *
+ * Sets *concurrently_updated to true if the locked tuple was reached
+ * by following an update chain (tmfd.traversed), indicating the caller
+ * should recheck the key.
+ */
+static bool
+ri_LockPKTuple(Relation pk_rel, TupleTableSlot *slot, Snapshot snap,
+ bool *concurrently_updated)
+{
+ TM_FailureData tmfd;
+ TM_Result result;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+
+ *concurrently_updated = false;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ result = table_tuple_lock(pk_rel, &slot->tts_tid, snap,
+ slot, GetCurrentCommandId(false),
+ LockTupleKeyShare, LockWaitBlock,
+ lockflags, &tmfd);
+
+ switch (result)
+ {
+ case TM_Ok:
+ if (tmfd.traversed)
+ *concurrently_updated = true;
+ return true;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ return false;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+
+ /*
+ * In READ COMMITTED, FIND_LAST_VERSION should have chased the
+ * chain and returned TM_Ok. Getting here means something
+ * unexpected -- fall through to error.
+ */
+ elog(ERROR, "unexpected table_tuple_lock status: %u", result);
+ break;
+
+ case TM_SelfModified:
+
+ /*
+ * The current command or a later command in this transaction
+ * modified the PK row. This shouldn't normally happen during an
+ * FK check (we're not modifying pk_rel), but handle it safely by
+ * treating the tuple as not found.
+ */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ break;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", result);
+ break;
+ }
+
+ return false; /* keep compiler quiet */
+}
+
+static bool
+ri_fastpath_is_applicable(const RI_ConstraintInfo *riinfo)
+{
+ /*
+ * Partitioned referenced tables are skipped for simplicity, since they
+ * require routing the probe through the correct partition using
+ * PartitionDirectory.
+ */
+ if (riinfo->pk_is_partitioned)
+ return false;
+
+ /*
+ * Temporal foreign keys use range overlap and containment semantics (&&,
+ * <@, range_agg()) that inherently involve aggregation and multiple-row
+ * reasoning, so they stay on the SPI path.
+ */
+ if (riinfo->hasperiod)
+ return false;
+
+ return true;
+}
+
+/*
+ * ri_CheckPermissions
+ * Check that the new user has permissions to look into the schema of
+ * and SELECT from 'query_rel'
+ *
+ * Provided for non-SQL implementors of an RI_Plan.
+ */
+static void
+ri_CheckPermissions(Relation query_rel)
+{
+ AclResult aclresult;
+
+ /* USAGE on schema. */
+ aclresult = object_aclcheck(NamespaceRelationId,
+ RelationGetNamespace(query_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(query_rel)));
+
+ /* SELECT on relation. */
+ aclresult = pg_class_aclcheck(RelationGetRelid(query_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(query_rel));
+}
+
+/*
+ * This checks that the index key of the tuple specified in 'new_slot' matches
+ * the key that has already been found in the PK index relation 'idxrel'.
+ *
+ * Returns true if the index key of the tuple matches the existing index
+ * key, false otherwise.
+ */
+static bool
+recheck_matched_pk_tuple(Relation idxrel, ScanKeyData *skeys,
+ TupleTableSlot *new_slot)
+{
+ IndexInfo *indexInfo = BuildIndexInfo(idxrel);
+ Datum values[INDEX_MAX_KEYS];
+ bool isnull[INDEX_MAX_KEYS];
+ bool matched = true;
+
+ /* PK indexes never have these. */
+ Assert(indexInfo->ii_Expressions == NIL &&
+ indexInfo->ii_ExclusionOps == NULL);
+
+ /* Form the index values and isnull flags given the table tuple. */
+ FormIndexDatum(indexInfo, new_slot, NULL, values, isnull);
+ for (int i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
+ {
+ ScanKeyData *skey = &skeys[i];
+
+ /* A PK column can never be set to NULL. */
+ Assert(!isnull[i]);
+ if (!DatumGetBool(FunctionCall2Coll(&skey->sk_func,
+ skey->sk_collation,
+ values[i],
+ skey->sk_argument)))
+ {
+ matched = false;
+ break;
+ }
+ }
+
+ return matched;
+}
+
+/*
+ * build_index_scankeys
+ * Build ScanKeys for a direct index probe of the PK's unique index.
+ *
+ * Uses cached compare entries, operator procedures, and strategy numbers
+ * from ri_populate_fastpath_metadata() rather than looking them up on
+ * each invocation. Casts FK values to the operator's expected input
+ * type if needed.
+ */
+static void
+build_index_scankeys(const RI_ConstraintInfo *riinfo,
+ Relation idx_rel, Datum *pk_vals,
+ char *pk_nulls, ScanKey skeys)
+{
+ /*
+ * May need to cast each of the individual values of the foreign key to
+ * the corresponding PK column's type if the equality operator demands it.
+ */
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ RI_CompareHashEntry *entry = riinfo->compare_entries[i];
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ else
+ {
+ Assert(false);
+ }
+ }
+
+ /*
+ * Set up ScanKeys for the index scan. This is essentially how
+ * ExecIndexBuildScanKeys() sets them up.
+ */
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ int pkattrno = i + 1;
+
+ ScanKeyEntryInitialize(&skeys[i], 0, pkattrno,
+ riinfo->strats[i], riinfo->subtypes[i],
+ idx_rel->rd_indcollation[i], riinfo->regops[i],
+ pk_vals[i]);
+ }
+}
+
+/*
+ * ri_populate_fastpath_metadata
+ * Cache per-key metadata needed by build_index_scankeys().
+ *
+ * Looks up the compare hash entry, operator procedure OID, and index
+ * strategy/subtype for each key column. Called lazily on first use
+ * and persists for the lifetime of the RI_ConstraintInfo entry.
+ */
+static void
+ri_populate_fastpath_metadata(RI_ConstraintInfo *riinfo,
+ Relation fk_rel, Relation idx_rel)
+{
+ Assert(riinfo != NULL && riinfo->valid);
+
+ for (int i = 0; i < riinfo->nkeys; i++)
+ {
+ Oid eq_opr = riinfo->pf_eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ Oid lefttype;
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ riinfo->compare_entries[i] = entry;
+ riinfo->regops[i] = get_opcode(eq_opr);
+
+ get_op_opfamily_properties(eq_opr,
+ idx_rel->rd_opfamily[i],
+ false,
+ &riinfo->strats[i],
+ &lefttype,
+ &riinfo->subtypes[i]);
+ }
+
+ riinfo->fpmeta_valid = true;
+}
+
/*
* Extract fields from a tuple into Datum/nulls arrays
*/
@@ -3169,8 +3602,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the FK column type already matches what the
+ * operator expects. For same-type operators, that's the common type.
+ * For cross-type operators (e.g. int48eq for int4 PK / int8 FK), the
+ * FK value is the right operand, so skip the cast if typeid matches
+ * righttype.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/test/isolation/expected/fk-concurrent-pk-upd.out b/src/test/isolation/expected/fk-concurrent-pk-upd.out
new file mode 100644
index 00000000000..9bbec638ac9
--- /dev/null
+++ b/src/test/isolation/expected/fk-concurrent-pk-upd.out
@@ -0,0 +1,58 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s2ukey s1i s2c s1c s2s s1s
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2c: COMMIT;
+step s1i: <... completed>
+ERROR: insert or update on table "child" violates foreign key constraint "child_parent_key_fkey"
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 2|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+(0 rows)
+
+
+starting permutation: s2uaux s1i s2c s1c s2s s1s
+step s2uaux: UPDATE parent SET aux = 'bar' WHERE parent_key = 1;
+step s1i: INSERT INTO child VALUES (1, 1);
+step s2c: COMMIT;
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|bar
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
+
+starting permutation: s2ukey s1i s2ukey2 s2c s1c s2s s1s
+step s2ukey: UPDATE parent SET parent_key = 2 WHERE parent_key = 1;
+step s1i: INSERT INTO child VALUES (1, 1); <waiting ...>
+step s2ukey2: UPDATE parent SET parent_key = 1 WHERE parent_key = 2;
+step s2c: COMMIT;
+step s1i: <... completed>
+step s1c: COMMIT;
+step s2s: SELECT * FROM parent;
+parent_key|aux
+----------+---
+ 1|foo
+(1 row)
+
+step s1s: SELECT * FROM child;
+child_key|parent_key
+---------+----------
+ 1| 1
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 4e466580cd4..c1a999bf1d2 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -37,6 +37,7 @@ test: fk-partitioned-2
test: fk-snapshot
test: fk-snapshot-2
test: fk-snapshot-3
+test: fk-concurrent-pk-upd
test: subxid-overflow
test: eval-plan-qual
test: eval-plan-qual-trigger
diff --git a/src/test/isolation/specs/fk-concurrent-pk-upd.spec b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
new file mode 100644
index 00000000000..d59a14d4de1
--- /dev/null
+++ b/src/test/isolation/specs/fk-concurrent-pk-upd.spec
@@ -0,0 +1,42 @@
+# Tests that an INSERT on referencing table correctly fails when
+# the referenced value disappears due to a concurrent update
+setup
+{
+ CREATE TABLE parent (
+ parent_key int PRIMARY KEY,
+ aux text NOT NULL
+ );
+
+ CREATE TABLE child (
+ child_key int PRIMARY KEY,
+ parent_key int8 NOT NULL REFERENCES parent
+ );
+
+ INSERT INTO parent VALUES (1, 'foo');
+}
+
+teardown
+{
+ DROP TABLE parent, child;
+}
+
+session s1
+setup { BEGIN; }
+step s1i { INSERT INTO child VALUES (1, 1); }
+step s1c { COMMIT; }
+step s1s { SELECT * FROM child; }
+
+session s2
+setup { BEGIN; }
+step s2ukey { UPDATE parent SET parent_key = 2 WHERE parent_key = 1; }
+step s2uaux { UPDATE parent SET aux = 'bar' WHERE parent_key = 1; }
+step s2ukey2 { UPDATE parent SET parent_key = 1 WHERE parent_key = 2; }
+step s2c { COMMIT; }
+step s2s { SELECT * FROM parent; }
+
+# fail
+permutation s2ukey s1i s2c s1c s2s s1s
+# ok
+permutation s2uaux s1i s2c s1c s2s s1s
+# ok
+permutation s2ukey s1i s2ukey2 s2c s1c s2s s1s
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 9ae4dbf1b0a..0826f518004 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -370,6 +370,53 @@ SELECT * FROM PKTABLE;
DROP TABLE FKTABLE;
DROP TABLE PKTABLE;
--
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+SET ROLE regress_foreign_key_user;
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+RESET ROLE;
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+ERROR: permission denied for table pktable
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+--
-- Check initial check upon ALTER TABLE
--
CREATE TABLE PKTABLE ( ptest1 int, ptest2 int, PRIMARY KEY(ptest1, ptest2) );
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index 3b8c95bf893..e9ee29331cb 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -242,6 +242,70 @@ SELECT * FROM PKTABLE;
DROP TABLE FKTABLE;
DROP TABLE PKTABLE;
+--
+-- Check RLS
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant privileges on PKTABLE/FKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+GRANT SELECT, INSERT ON FKTABLE TO regress_foreign_key_user;
+
+-- Enable RLS on PKTABLE and Create policies
+ALTER TABLE PKTABLE ENABLE ROW LEVEL SECURITY;
+CREATE POLICY pktable_view_odd_policy ON PKTABLE TO regress_foreign_key_user USING (ptest1 % 2 = 1);
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+SET ROLE regress_foreign_key_user;
+
+INSERT INTO FKTABLE VALUES (3, 5);
+INSERT INTO FKTABLE VALUES (2, 5); -- success, REFERENCES are not subject to row security
+
+RESET ROLE;
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+DROP USER regress_foreign_key_user;
+
+--
+-- Check ACL
+--
+CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY, ptest2 text );
+CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE, ftest2 int );
+
+-- Insert test data into PKTABLE
+INSERT INTO PKTABLE VALUES (1, 'Test1');
+INSERT INTO PKTABLE VALUES (2, 'Test2');
+INSERT INTO PKTABLE VALUES (3, 'Test3');
+
+-- Grant usage on PKTABLE to user regress_foreign_key_user
+CREATE USER regress_foreign_key_user NOLOGIN;
+GRANT SELECT ON PKTABLE TO regress_foreign_key_user;
+
+ALTER TABLE PKTABLE OWNER to regress_foreign_key_user;
+
+-- Inserting into FKTABLE should work
+INSERT INTO FKTABLE VALUES (3, 5);
+
+-- Revoke usage on PKTABLE from user regress_foreign_key_user
+REVOKE SELECT ON PKTABLE FROM regress_foreign_key_user;
+
+-- Inserting into FKTABLE should fail
+INSERT INTO FKTABLE VALUES (2, 6);
+
+DROP TABLE FKTABLE;
+DROP TABLE PKTABLE;
+
+DROP USER regress_foreign_key_user;
+
--
-- Check initial check upon ALTER TABLE
--
--
2.47.3
[application/octet-stream] v3-0002-Cache-per-batch-resources-for-fast-path-foreign-k.patch (22.9K, 3-v3-0002-Cache-per-batch-resources-for-fast-path-foreign-k.patch)
download | inline diff:
From 6cb6c402d894cdbe44701d97cd644e4a4fd2cf75 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 18 Feb 2026 23:37:19 +0900
Subject: [PATCH v3 2/2] Cache per-batch resources for fast-path foreign key
checks
The fast-path FK check introduced in the previous commits opens and
closes the PK relation, index, scan descriptor, and tuple slot on
every trigger invocation. For bulk operations that fire thousands of
FK triggers in a single statement, this repeated setup/teardown
dominates the cost.
Introduce RI_FastPathEntry, a per-constraint hash table that caches
the open Relation (pk_rel, idx_rel), IndexScanDesc, TupleTableSlot,
and a registered Snapshot across all trigger invocations within a
single trigger-firing batch. Entries are created lazily on first use
via ri_FastPathGetEntry() and persist until the batch ends.
The snapshot is registered once at entry creation. On subsequent
rows, only CommandCounterIncrement() + a direct curcid patch on the
registered copy is needed, avoiding the per-row GetSnapshotData()
cost (which takes ProcArrayLock and iterates all backend slots).
SnapshotSetCommandId() only patches the process-global statics, not
registered copies, so we patch entry->snapshot->curcid ourselves.
Lifecycle management:
- AfterTriggerBatchCallback: A new general-purpose callback
mechanism in trigger.c. Callbacks registered via
RegisterAfterTriggerBatchCallback() fire at the end of each
trigger-firing batch (AfterTriggerEndQuery for immediate
constraints, AfterTriggerFireDeferred at COMMIT, and
AfterTriggerSetState for SET CONSTRAINTS IMMEDIATE). The RI code
registers ri_FastPathCleanup as a batch callback, which does
orderly teardown: index_endscan, index_close, table_close,
ExecDropSingleTupleTableSlot, UnregisterSnapshot.
- XactCallback: ri_FastPathXactCallback NULLs the static cache
pointer at transaction end. On the normal path, cleanup already
ran via the batch callback; this handles the abort path where
TopTransactionContext destruction frees the memory but
ResourceOwner handles the actual resource cleanup.
- SubXactCallback: ri_FastPathSubXactCallback NULLs the static
cache pointer on subtransaction abort. ResourceOwner already
cleaned up the resources; this prevents the batch callback from
trying to double-close them.
- AfterTriggerBatchIsActive(): Exported accessor that returns true
when afterTriggers.query_depth >= 0. During ALTER TABLE ... ADD
FOREIGN KEY validation, RI triggers are called directly outside
the after-trigger framework, so batch callbacks would never fire.
The fast-path code uses this to fall back to a non-cached
per-invocation path (open/scan/close each call) in that context.
---
src/backend/commands/trigger.c | 84 +++++++
src/backend/utils/adt/ri_triggers.c | 253 ++++++++++++++++++++--
src/include/commands/trigger.h | 18 ++
src/test/regress/expected/foreign_key.out | 47 ++++
src/test/regress/sql/foreign_key.sql | 44 ++++
src/tools/pgindent/typedefs.list | 3 +
6 files changed, 435 insertions(+), 14 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 8df915f63fb..7adeae5c7e5 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3891,6 +3891,8 @@ typedef struct AfterTriggersData
/* per-subtransaction-level data: */
AfterTriggersTransData *trans_stack; /* array of structs shown below */
int maxtransdepth; /* allocated len of above array */
+
+ List *batch_callbacks; /* List of AfterTriggerCallbackItem */
} AfterTriggersData;
struct AfterTriggersQueryData
@@ -3927,6 +3929,13 @@ struct AfterTriggersTableData
TupleTableSlot *storeslot; /* for converting to tuplestore's format */
};
+/* Entry in afterTriggers.batch_callbacks */
+typedef struct AfterTriggerCallbackItem
+{
+ AfterTriggerBatchCallback callback;
+ void *arg;
+} AfterTriggerCallbackItem;
+
static AfterTriggersData afterTriggers;
static void AfterTriggerExecute(EState *estate,
@@ -3962,6 +3971,7 @@ static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
Oid tgoid, bool tgisdeferred);
static void cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent);
+static void FireAfterTriggerBatchCallbacks(void);
/*
* Get the FDW tuplestore for the current trigger query level, creating it
@@ -5087,6 +5097,7 @@ AfterTriggerBeginXact(void)
*/
afterTriggers.firing_counter = (CommandId) 1; /* mustn't be 0 */
afterTriggers.query_depth = -1;
+ afterTriggers.batch_callbacks = NIL;
/*
* Verify that there is no leftover state remaining. If these assertions
@@ -5208,6 +5219,8 @@ AfterTriggerEndQuery(EState *estate)
break;
}
+ FireAfterTriggerBatchCallbacks();
+
/* Release query-level-local storage, including tuplestores if any */
AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
@@ -5315,6 +5328,8 @@ AfterTriggerFireDeferred(void)
break; /* all fired */
}
+ FireAfterTriggerBatchCallbacks();
+
/*
* We don't bother freeing the event list, since it will go away anyway
* (and more efficiently than via pfree) in AfterTriggerEndXact.
@@ -6057,6 +6072,8 @@ AfterTriggerSetState(ConstraintsSetStmt *stmt)
break; /* all fired */
}
+ FireAfterTriggerBatchCallbacks();
+
if (snapshot_set)
PopActiveSnapshot();
}
@@ -6753,3 +6770,70 @@ check_modified_virtual_generated(TupleDesc tupdesc, HeapTuple tuple)
return tuple;
}
+
+/*
+ * RegisterAfterTriggerBatchCallback
+ * Register a function to be called when the current trigger-firing
+ * batch completes.
+ *
+ * Must be called from within a trigger function's execution context
+ * (i.e., while afterTriggers state is active).
+ *
+ * The callback list is cleared after invocation, so the caller must
+ * re-register for each new batch if needed.
+ */
+void
+RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg)
+{
+ AfterTriggerCallbackItem *item;
+ MemoryContext oldcxt;
+
+ /*
+ * Allocate in TopTransactionContext so the item survives for the duration
+ * of the batch, which may span multiple trigger invocations.
+ */
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ item = palloc(sizeof(AfterTriggerCallbackItem));
+ item->callback = callback;
+ item->arg = arg;
+ afterTriggers.batch_callbacks =
+ lappend(afterTriggers.batch_callbacks, item);
+ MemoryContextSwitchTo(oldcxt);
+}
+
+/*
+ * FireAfterTriggerBatchCallbacks
+ * Invoke and clear all registered batch callbacks.
+ *
+ * Called at the end of each trigger-firing batch.
+ */
+static void
+FireAfterTriggerBatchCallbacks(void)
+{
+ ListCell *lc;
+
+ foreach(lc, afterTriggers.batch_callbacks)
+ {
+ AfterTriggerCallbackItem *item = lfirst(lc);
+
+ item->callback(item->arg);
+ }
+
+ list_free_deep(afterTriggers.batch_callbacks);
+ afterTriggers.batch_callbacks = NIL;
+}
+
+/*
+ * AfterTriggerBatchIsActive
+ * Returns true if we're inside a query-level trigger batch where
+ * registered batch callbacks will actually be invoked.
+ *
+ * This is false during validateForeignKeyConstraint(), which calls
+ * RI trigger functions directly outside the after-trigger framework.
+ */
+bool
+AfterTriggerBatchIsActive(void)
+{
+ return afterTriggers.query_depth >= 0;
+}
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 45cc742fa19..2dbb33f436d 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -190,6 +190,23 @@ typedef struct RI_CompareHashEntry
FmgrInfo cast_func_finfo; /* in case we must coerce input */
} RI_CompareHashEntry;
+/*
+ * RI_FastPathEntry
+ * Per-constraint cache of resources needed by ri_FastPathCheck().
+ *
+ * One entry per constraint, keyed by pg_constraint OID. Created lazily
+ * by ri_FastPathGetEntry() on first use within a trigger-firing batch
+ * and torn down by ri_FastPathCleanup() at batch end.
+ */
+typedef struct RI_FastPathEntry
+{
+ Oid conoid; /* hash key: pg_constraint OID */
+ Relation pk_rel;
+ Relation idx_rel;
+ IndexScanDesc scandesc;
+ TupleTableSlot *slot;
+ Snapshot snapshot; /* registered snapshot for the scan */
+} RI_FastPathEntry;
/*
* Local data
@@ -199,6 +216,8 @@ static HTAB *ri_query_cache = NULL;
static HTAB *ri_compare_cache = NULL;
static dclist_head ri_constraint_cache_valid_list;
+static HTAB *ri_fastpath_cache = NULL;
+static bool ri_fastpath_callback_registered = false;
/*
* Local function prototypes
@@ -267,6 +286,8 @@ pg_noreturn static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool is_restrict, bool partgone);
+static RI_FastPathEntry *ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo);
+static void ri_FastPathCleanup(void *arg);
/*
@@ -2696,6 +2717,19 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
Oid saved_userid;
int saved_sec_context;
Snapshot snapshot;
+ bool use_cache;
+ RI_FastPathEntry *fpentry = NULL;
+
+ /*
+ * Use the per-batch cache only if we're inside the after-trigger
+ * framework, where our cleanup callback will fire. During ALTER TABLE
+ * ... ADD FOREIGN KEY validation, triggers are called directly and the
+ * callback would never run, leaking resources.
+ */
+ use_cache = AfterTriggerBatchIsActive();
+
+ if (use_cache)
+ fpentry = ri_FastPathGetEntry(riinfo);
/*
* Advance the command counter so the snapshot sees the effects of prior
@@ -2703,15 +2737,40 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
* ri_PerformCheck().
*/
CommandCounterIncrement();
- snapshot = RegisterSnapshot(GetLatestSnapshot());
-
- pk_rel = table_open(riinfo->pk_relid, RowShareLock);
- idx_rel = index_open(riinfo->conindid, AccessShareLock);
+ if (use_cache)
+ {
+ /*
+ * The snapshot was registered once when the cache entry was created.
+ * We just patch curcid to reflect the new command counter.
+ * SnapshotSetCommandId() only patches process-global statics, not
+ * registered copies, so we do it directly.
+ *
+ * The xmin/xmax/xip fields don't need refreshing: within a single
+ * statement batch, only curcid changes between rows.
+ */
+ Assert(fpentry && fpentry->snapshot != NULL);
+ snapshot = fpentry->snapshot;
+ snapshot->curcid = GetCurrentCommandId(false);
+ }
+ else
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
- slot = table_slot_create(pk_rel, NULL);
- scandesc = index_beginscan(pk_rel, idx_rel,
- snapshot, NULL,
- riinfo->nkeys, 0);
+ if (use_cache)
+ {
+ pk_rel = fpentry->pk_rel;
+ idx_rel = fpentry->idx_rel;
+ scandesc = fpentry->scandesc;
+ slot = fpentry->slot;
+ }
+ else
+ {
+ pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ idx_rel = index_open(riinfo->conindid, AccessShareLock);
+ scandesc = index_beginscan(pk_rel, idx_rel,
+ snapshot, NULL,
+ riinfo->nkeys, 0);
+ slot = table_slot_create(pk_rel, NULL);
+ }
if (!riinfo->fpmeta_valid)
ri_populate_fastpath_metadata((RI_ConstraintInfo *) riinfo,
@@ -2782,12 +2841,15 @@ ri_FastPathCheck(const RI_ConstraintInfo *riinfo,
ExecDropSingleTupleTableSlot(xact_slot);
}
- index_endscan(scandesc);
- index_close(idx_rel, NoLock);
- table_close(pk_rel, NoLock);
- ExecDropSingleTupleTableSlot(slot);
-
- UnregisterSnapshot(snapshot);
+ /* Non-cached path: clean up per-invocation resources */
+ if (!use_cache)
+ {
+ index_endscan(scandesc);
+ index_close(idx_rel, NoLock);
+ table_close(pk_rel, NoLock);
+ ExecDropSingleTupleTableSlot(slot);
+ UnregisterSnapshot(snapshot);
+ }
SetUserIdAndSecContext(saved_userid, saved_sec_context);
@@ -3673,3 +3735,166 @@ RI_FKey_trigger_type(Oid tgfoid)
return RI_TRIGGER_NONE;
}
+
+/*
+ * ri_FastPathCleanup
+ * Tear down all cached fast-path state.
+ *
+ * Called as an AfterTriggerBatchCallback at end of batch.
+ */
+static void
+ri_FastPathCleanup(void *arg)
+{
+ HASH_SEQ_STATUS status;
+ RI_FastPathEntry *entry;
+
+ if (ri_fastpath_cache == NULL)
+ return;
+
+ hash_seq_init(&status, ri_fastpath_cache);
+ while ((entry = hash_seq_search(&status)) != NULL)
+ {
+ if (entry->scandesc)
+ index_endscan(entry->scandesc);
+ if (entry->idx_rel)
+ index_close(entry->idx_rel, NoLock);
+ if (entry->pk_rel)
+ table_close(entry->pk_rel, NoLock);
+ if (entry->slot)
+ ExecDropSingleTupleTableSlot(entry->slot);
+ if (entry->snapshot)
+ UnregisterSnapshot(entry->snapshot);
+ }
+
+ hash_destroy(ri_fastpath_cache);
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static bool ri_fastpath_xact_callback_registered = false;
+
+static void
+ri_FastPathXactCallback(XactEvent event, void *arg)
+{
+ /*
+ * TopTransactionContext is destroyed at end of transaction, taking the
+ * hash table and all cached resources with it. Just reset our static
+ * pointers so we don't dereference freed memory.
+ *
+ * In the normal (non-error) path, ri_FastPathCleanup already ran via the
+ * batch callback and did orderly teardown. Here we're just handling the
+ * abort path where that callback never fired.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+}
+
+static void
+ri_FastPathSubXactCallback(SubXactEvent event, SubTransactionId mySubid,
+ SubTransactionId parentSubid, void *arg)
+{
+ if (event == SUBXACT_EVENT_ABORT_SUB)
+ {
+ /*
+ * ResourceOwner already cleaned up relations, scans, and snapshots.
+ * Just NULL our pointers so the still-registered batch callback
+ * becomes a no-op. The hash table memory in TopTransactionContext
+ * will be freed at transaction end.
+ */
+ ri_fastpath_cache = NULL;
+ ri_fastpath_callback_registered = false;
+ }
+}
+
+/*
+ * ri_FastPathGetEntry
+ * Look up or create a per-batch cache entry for the given constraint.
+ *
+ * On first call for a constraint within a batch: opens pk_rel and the
+ * index, begins an index scan, allocates a result slot, and registers
+ * the cleanup callback.
+ *
+ * On subsequent calls: returns the existing entry. Caller uses
+ * index_rescan() with new keys.
+ */
+static RI_FastPathEntry *
+ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo)
+{
+ RI_FastPathEntry *entry;
+ bool found;
+
+ /* Create hash table on first use in this batch */
+ if (ri_fastpath_cache == NULL)
+ {
+ HASHCTL ctl;
+
+ if (!ri_fastpath_xact_callback_registered)
+ {
+ RegisterXactCallback(ri_FastPathXactCallback, NULL);
+ RegisterSubXactCallback(ri_FastPathSubXactCallback, NULL);
+ ri_fastpath_xact_callback_registered = true;
+ }
+
+ ctl.keysize = sizeof(Oid);
+ ctl.entrysize = sizeof(RI_FastPathEntry);
+ ctl.hcxt = TopTransactionContext;
+ ri_fastpath_cache = hash_create("RI fast-path cache",
+ 16,
+ &ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+ }
+
+ entry = hash_search(ri_fastpath_cache, &riinfo->constraint_id,
+ HASH_ENTER, &found);
+
+ if (!found)
+ {
+ MemoryContext oldcxt;
+
+ /*
+ * Zero out non-key fields so ri_FastPathCleanup is safe if we error
+ * out during partial initialization below.
+ */
+ memset(((char *) entry) + sizeof(Oid), 0,
+ sizeof(RI_FastPathEntry) - sizeof(Oid));
+
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+
+ /*
+ * Open PK table and its unique index.
+ *
+ * RowShareLock on pk_rel matches what the SPI path's SELECT ... FOR
+ * KEY SHARE would acquire as a relation-level lock. AccessShareLock
+ * on the index is standard for index scans.
+ *
+ * We don't release these locks until end of transaction, matching SPI
+ * behavior.
+ */
+ entry->pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+ entry->idx_rel = index_open(riinfo->conindid, AccessShareLock);
+
+ /*
+ * Register an initial snapshot. Its curcid will be patched in place
+ * on each subsequent row (see ri_FastPathCheck()), avoiding per-row
+ * GetSnapshotData() overhead.
+ */
+ entry->snapshot = RegisterSnapshot(GetLatestSnapshot());
+
+ entry->slot = table_slot_create(entry->pk_rel, NULL);
+
+ entry->scandesc = index_beginscan(entry->pk_rel, entry->idx_rel,
+ entry->snapshot, NULL,
+ riinfo->nkeys, 0);
+
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Ensure cleanup at end of this trigger-firing batch */
+ if (!ri_fastpath_callback_registered)
+ {
+ RegisterAfterTriggerBatchCallback(ri_FastPathCleanup, NULL);
+ ri_fastpath_callback_registered = true;
+ }
+ }
+
+ return entry;
+}
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 556c86bf5e1..4304abffc8d 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -289,4 +289,22 @@ extern void RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel,
extern int RI_FKey_trigger_type(Oid tgfoid);
+/*
+ * Callback type for end-of-trigger-batch notifications.
+ *
+ * Registered via RegisterAfterTriggerBatchCallback(). Invoked when
+ * a batch of after-trigger processing completes:
+ * - AfterTriggerEndQuery() (immediate constraints)
+ * - AfterTriggerFireDeferred() (deferred constraints at COMMIT)
+ * - AfterTriggerSetState() (SET CONSTRAINTS IMMEDIATE)
+ *
+ * The callback list is cleared after each batch. Callers must
+ * re-register if they need to be called again in a subsequent batch.
+ */
+typedef void (*AfterTriggerBatchCallback) (void *arg);
+
+extern void RegisterAfterTriggerBatchCallback(AfterTriggerBatchCallback callback,
+ void *arg);
+extern bool AfterTriggerBatchIsActive(void);
+
#endif /* TRIGGER_H */
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 0826f518004..db831ddfc5c 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3504,3 +3504,50 @@ DETAIL: drop cascades to table fkpart13_t1
drop cascades to table fkpart13_t2
drop cascades to table fkpart13_t3
RESET search_path;
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+ERROR: insert or update on table "fp_fk_alter" violates foreign key constraint "fp_fk_alter_a_fkey"
+DETAIL: Key (a)=(101) is not present in table "fp_pk_alter".
+DROP TABLE fp_fk_alter, fp_pk_alter;
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+ERROR: insert or update on table "fp_fk_alter2" violates foreign key constraint "fp_fk_alter2_a_fkey"
+DETAIL: Key (a)=(200) is not present in table "fp_pk_alter2".
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+ERROR: insert or update on table "fp_multi_fk" violates foreign key constraint "fp_multi_fk_b_fkey"
+DETAIL: Key (b)=(2) is not present in table "fp_pk2".
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+ERROR: insert or update on table "fp_fk_defer" violates foreign key constraint "fp_fk_defer_a_fkey"
+DETAIL: Key (a)=(3) is not present in table "fp_pk_defer".
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index e9ee29331cb..0762caa3682 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2498,3 +2498,47 @@ WITH cte AS (
DROP SCHEMA fkpart13 CASCADE;
RESET search_path;
+
+-- Tests foreign key check fast-path no-cache path.
+CREATE TABLE fp_pk_alter (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter SELECT generate_series(1, 100);
+CREATE TABLE fp_fk_alter (a int);
+INSERT INTO fp_fk_alter SELECT generate_series(1, 100);
+-- Validation path: should succeed
+ALTER TABLE fp_fk_alter ADD FOREIGN KEY (a) REFERENCES fp_pk_alter;
+INSERT INTO fp_fk_alter VALUES (101); -- should fail (constraint active)
+DROP TABLE fp_fk_alter, fp_pk_alter;
+
+-- Separate test: validation catches existing violation
+CREATE TABLE fp_pk_alter2 (a int PRIMARY KEY);
+INSERT INTO fp_pk_alter2 VALUES (1);
+CREATE TABLE fp_fk_alter2 (a int);
+INSERT INTO fp_fk_alter2 VALUES (1), (200); -- 200 has no PK match
+ALTER TABLE fp_fk_alter2 ADD FOREIGN KEY (a) REFERENCES fp_pk_alter2; -- should fail
+DROP TABLE fp_fk_alter2, fp_pk_alter2;
+
+-- Tests that the fast-path handles caching for multiple constraints
+CREATE TABLE fp_pk1 (a int PRIMARY KEY);
+CREATE TABLE fp_pk2 (b int PRIMARY KEY);
+INSERT INTO fp_pk1 VALUES (1);
+INSERT INTO fp_pk2 VALUES (1);
+CREATE TABLE fp_multi_fk (
+ a int REFERENCES fp_pk1,
+ b int REFERENCES fp_pk2
+);
+INSERT INTO fp_multi_fk VALUES (1, 1); -- two constraints, one batch
+INSERT INTO fp_multi_fk VALUES (1, 2); -- second constraint fails
+DROP TABLE fp_multi_fk, fp_pk1, fp_pk2;
+
+-- Test that fast-path cache handles deferred constraints and SET CONSTRAINTS IMMEDIATE
+CREATE TABLE fp_pk_defer (a int PRIMARY KEY);
+CREATE TABLE fp_fk_defer (a int REFERENCES fp_pk_defer DEFERRABLE INITIALLY DEFERRED);
+INSERT INTO fp_pk_defer VALUES (1), (2);
+
+BEGIN;
+INSERT INTO fp_fk_defer VALUES (1);
+INSERT INTO fp_fk_defer VALUES (2);
+SET CONSTRAINTS ALL IMMEDIATE; -- fires batch callback here
+INSERT INTO fp_fk_defer VALUES (3); -- should fail, also tests that cache was cleaned up
+COMMIT;
+DROP TABLE fp_pk_defer, fp_fk_defer;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 241945734ec..c8388ea33d3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -30,6 +30,8 @@ AddForeignUpdateTargets_function
AddrInfo
AffixNode
AffixNodeData
+AfterTriggerBatchCallback
+AfterTriggerCallbackItem
AfterTriggerEvent
AfterTriggerEventChunk
AfterTriggerEventData
@@ -2446,6 +2448,7 @@ RIX
RI_CompareHashEntry
RI_CompareKey
RI_ConstraintInfo
+RI_FastPathEntry
RI_QueryHashEntry
RI_QueryKey
RTEKind
--
2.47.3
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected]
Subject: Re: Eliminating SPI / SQL from some RI triggers - take 3
In-Reply-To: <CA+HiwqEyiLCY6MTLbOJXDdLNNQLaURYHvdW797MQgbjEK9od4Q@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox