PG19 FK fast path: OOB write and missed FK checks during batched

public inbox for [email protected]  
help / color / mirror / Atom feed

PG19 FK fast path: OOB write and missed FK checks during batched
14+ messages / 4 participants
[nested] [flat]

* PG19 FK fast path: OOB write and missed FK checks during batched
@ 2026-06-06 08:30  Nikolay Samokhvalov <[email protected]>
  0 siblings, 1 reply; 14+ messages in thread

From: Nikolay Samokhvalov @ 2026-06-06 08:30 UTC (permalink / raw)
  To: pgsql-hackers; +Cc: Andrey Borodin <[email protected]>; Kirk Wolak <[email protected]>; [email protected] <[email protected]>

Hi hackers,

The new FK existence-check fast path in ri_triggers.c (ri_FastPath*) runs
user-defined code in the middle of a deferred batch flush, which yields at
least three defects reachable by an unprivileged table owner. Present in
master and verified inREL_19_BETA1.

I identified these issues during recent security research with LLMs. While
they have clear security implications (OOB write, integrity bypass),
reporting them here because they are isolated to 19beta1, absent in PG18
and earlier; I don't have patches, only reproducibility.

Mechanism:

For an INSERT/UPDATE on the referencing side the fast path buffers rows in
a transaction-lived cache (ri_fastpath_cache, keyed by pg_constraint OID)
and probes the PK index in groups, flushing when a

per-constraint buffer reaches RI_FASTPATH_BATCH_SIZE (64) or when the

trigger-firing pass ends (ri_FastPathEndBatch, an
AfterTriggerBatchCallback). For a cross-type FK the flush calls the
column's cast function (ri_FastPathFlushArray, the FunctionCall3 at line
3069) and the equality operator -- arbitrary user code, mid-flush.  Line
numbers below are from a REL_19_BETA1 build (commit 4b0bf07).

Unprivileged vehicle (defects 1 and 3).  No superuser, no contrib: a
role creates
a type it owns and an IMPLICIT cast from it to the PK type with a PL/pgSQL
function, which ri_HashCompareOp wires into the fast path's cast

slot. Below uses a composite type. Default btree opclass, ordinary
single-column
FK, no GUC (fast path is unconditional for non-partitioned, non-temporal
FKs, per ri_fastpath_is_applicable).

1) ri_FastPathBatchAdd (line 2859): out-of-bounds write on re-entry

The write precedes the bound check, and batch_count is reset to 0 only at end
of flush (ri_FastPathBatchFlush, line 2971), so it is 64 throughout a
full-batch
flush:

    fpentry->batch[fpentry->batch_count] = ExecCopySlotHeapTuple(newslot);

    fpentry->batch_count++;

    if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)

        ri_FastPathBatchFlush(fpentry, fk_rel, riinfo);

There is no re-entrancy guard and ri_FastPathGetEntry returns the same entry,
so user code that does DML on the same table during a full-batch flush
re-enters with batch_count == 64 and writes batch[64], one past the

array, overwriting the adjacent batch_count field (struct layout, lines
250-251). A single re-entrant row only stomps batch_count, which is then reset
to 0 before reuse; the crash manifests once the re-entrant insert is

itself large enough to fill and flush a batch, so the stomped batch_count
is used as an array index (batch[garbage]) and as nvals in memset(matched,
0, nvals * sizeof(bool)) (line 3054).

Reproduction (non-superuser; reliable SIGSEGV on --enable-cassert -O0;
under -O2 the out-of-bounds write is of undefined effect):

    create table parent(id int primary key);

    insert into parent select g from generate_series(1,2000) g;

    create type vch as (v int);

    create function vcast(vch) returns int language plpgsql as $$

    begin

      if $1.v = 64 then

        insert into child select row(g)::vch from
generate_series(1001,1064) g;

      end if;

      return $1.v;

    end$$;

    create cast (vch as int) with function vcast(vch) as implicit;

    create table child(a vch);

    alter table child add constraint child_fkey

      foreign key (a) references parent(id);

    insert into child select row(g)::vch from generate_series(1,64) g;  --
crash

    -- gdb: crash at ri_FastPathBatchAdd line 2866 with batch_count holding
a

    -- stomped HeapTuple pointer's low bits, i.e. batch[64] overwrote

    -- batch_count; backend SIGSEGVs and the cluster restarts.

2) ri_FastPathSubXactCallback (line 4208): batch dropped on subxact abort

On SUBXACT_EVENT_ABORT_SUB the callback discards the whole cache:

    ri_fastpath_cache = NULL;

    ri_fastpath_callback_registered = false;

But batch[] holds outstanding rows of the enclosing transaction, not
the aborting
subxact. An internal subxact abort during after-trigger firing (PL/pgSQL
BEGIN ... EXCEPTION) drops the buffered rows unflushed; their FK checks
never run and orphans commit behind a constraint that still reports itself
valid. No cast needed:

    create table pk(id int primary key);

    create table fk(a int, tag text);

    insert into pk select g from generate_series(1,10) g;

    alter table fk add constraint fk_a_fkey foreign key (a) references
pk(id);

    create function abort_subxact() returns trigger language plpgsql as $$

    begin

      if NEW.tag = 'boom' then

        begin perform 1/0; exception when others then null; end;

      end if;

      return NEW;

    end$$;

    create trigger fk_after after insert on fk

      for each row execute function abort_subxact();

    insert into fk values (999,'bad'),(0,'boom'),(1,'ok'),(2,'ok'),(3,'ok');

    -- INSERT 0 5, no error

    select f.a from fk f left join pk p on f.a=p.id where p.id is null;

    --  a

    -- -----

    -- 999

    --   0   (orphans)

    -- the constraint still reports itself valid, and re-validation passes

    -- while the orphans remain:

    select convalidated from pg_constraint where conname = 'fk_a_fkey';

    -- convalidated

    -- --------------

    -- t

    alter table fk validate constraint fk_a_fkey;

    -- ALTER TABLE   (succeeds; does not re-scan committed rows)

    select f.a from fk f left join pk p on f.a=p.id where p.id is null;

    -- 999, 0  (orphans still present)

Controls (no EXCEPTION; between-statement SAVEPOINT; DEFERRABLE
INITIALLY DEFERRED)
all behave correctly (FK violation raised, no orphans). The whole statement's
buffered batch is discarded, not just the aborting row's check. The abort
path also emits "WARNING: resource was not closed" (relation /

index / TupleDesc), a resource leak consistent with the missing flush.

3) ri_FastPathEndBatch (line 4133): cross-table re-entry drops a check

EndBatch flushes by iterating the cache with hash_seq_search (line 4143). If
flush-time user code INSERTs into a different fast-path FK table,
ri_FastPathGetEntry
adds a new cache entry mid-scan; it can land in a bucket hash_seq_search
already passed and is never reached. ri_FastPathTeardown (line 4165) then
hash_destroys the cache (line 4188) without flushing entries that still
have batch_count > 0, so that buffered check is discarded. This survives a

per-entry guard for [1] (different entry, not a re-entry of the busy one):

    create table parent(id int primary key);

    insert into parent select g from generate_series(1,64) g;

    create table child2(a int);

    alter table child2 add constraint child2_fkey

      foreign key (a) references parent(id);

    create type vch as (v int);

    create function vcast(vch) returns int language plpgsql as $$

    begin

      if $1.v = 1 then

        insert into child2 values (999999);   -- orphan into a *different*
FK

      end if;

      return $1.v;

    end$$;

    create cast (vch as int) with function vcast(vch) as implicit;

    create table child(a vch);

    alter table child add constraint child_fkey

      foreign key (a) references parent(id);

    insert into child values (row(1)::vch);    -- flushed at
ri_FastPathEndBatch

    select a from child2 where a not in (select id from parent);  -- =>
999999

    -- control: INSERT INTO child2 VALUES (999999); -- correctly raises FK
error

Root cause / thoughts:

All three stem from invoking user cast/operator code inside a deferred batch
flush: while a per-entry batch is half-updated [1], while a cache-wide
hash_seq_search
is in progress and teardown drops non-empty entries [3], and against a
subxact-abort invalidation that cannot tell parent-xact rows from
aborted-subxact
rows [2].

- [1] Bound-check before the write in ri_FastPathBatchAdd, and add a "flushing"
flag to RI_FastPathEntry, rejecting re-entrant modification of a busy entry
(a nested per-row probe is unsafe: the flush may hold PK-index buffer
locks).

- [3] Loop-flush in ri_FastPathEndBatch until no entry has batch_count
> 0, and/or
flush non-empty entries in ri_FastPathTeardown before hash_destroy.

- [2] Do not discard outstanding parent-xact rows on
SUBXACT_EVENT_ABORT_SUB; track the buffering subxact, or flush
immediate-constraint batches subxact boundaries.

- Unifying: a global "in fast-path flush" guard routing any re-entrant FK check
to the immediate per-row path, and reconsidering running user code mid-flush
at all.

Nik

^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: PG19 FK fast path: OOB write and missed FK checks during batched
@ 2026-06-06 09:13  Amit Langote <[email protected]>
  parent: Nikolay Samokhvalov <[email protected]>
  0 siblings, 1 reply; 14+ messages in thread

From: Amit Langote @ 2026-06-06 09:13 UTC (permalink / raw)
  To: Nikolay Samokhvalov <[email protected]>; +Cc: pgsql-hackers; Andrey Borodin <[email protected]>; Kirk Wolak <[email protected]>

On Sat, Jun 6, 2026 at 17:31 Nikolay Samokhvalov <[email protected]> wrote:

> Hi hackers,
>
>
> The new FK existence-check fast path in ri_triggers.c (ri_FastPath*) runs
> user-defined code in the middle of a deferred batch flush, which yields at
> least three defects reachable by an unprivileged table owner. Present in
> master and verified inREL_19_BETA1.
>
>
> I identified these issues during recent security research with LLMs. While
> they have clear security implications (OOB write, integrity bypass),
> reporting them here because they are isolated to 19beta1, absent in PG18
> and earlier; I don't have patches, only reproducibility.
>
>
> Mechanism:
>
>
> For an INSERT/UPDATE on the referencing side the fast path buffers rows
> in a transaction-lived cache (ri_fastpath_cache, keyed by pg_constraint
> OID) and probes the PK index in groups, flushing when a
>
> per-constraint buffer reaches RI_FASTPATH_BATCH_SIZE (64) or when the
>
> trigger-firing pass ends (ri_FastPathEndBatch, an
> AfterTriggerBatchCallback). For a cross-type FK the flush calls the
> column's cast function (ri_FastPathFlushArray, the FunctionCall3 at line
> 3069) and the equality operator -- arbitrary user code, mid-flush.  Line
> numbers below are from a REL_19_BETA1 build (commit 4b0bf07).
>
>
> Unprivileged vehicle (defects 1 and 3).  No superuser, no contrib: a role creates
> a type it owns and an IMPLICIT cast from it to the PK type with a PL/pgSQL
> function, which ri_HashCompareOp wires into the fast path's cast
>
> slot. Below uses a composite type. Default btree opclass, ordinary single-column
> FK, no GUC (fast path is unconditional for non-partitioned, non-temporal
> FKs, per ri_fastpath_is_applicable).
>
>
>
> 1) ri_FastPathBatchAdd (line 2859): out-of-bounds write on re-entry
>
>
> The write precedes the bound check, and batch_count is reset to 0 only at end
> of flush (ri_FastPathBatchFlush, line 2971), so it is 64 throughout a full-batch
> flush:
>
>
>     fpentry->batch[fpentry->batch_count] = ExecCopySlotHeapTuple(newslot);
>
>     fpentry->batch_count++;
>
>     if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
>
>         ri_FastPathBatchFlush(fpentry, fk_rel, riinfo);
>
>
> There is no re-entrancy guard and ri_FastPathGetEntry returns the same entry,
> so user code that does DML on the same table during a full-batch flush
> re-enters with batch_count == 64 and writes batch[64], one past the
>
> array, overwriting the adjacent batch_count field (struct layout, lines
> 250-251). A single re-entrant row only stomps batch_count, which is then reset
> to 0 before reuse; the crash manifests once the re-entrant insert is
>
> itself large enough to fill and flush a batch, so the stomped batch_count
> is used as an array index (batch[garbage]) and as nvals in memset(matched,
> 0, nvals * sizeof(bool)) (line 3054).
>
>
> Reproduction (non-superuser; reliable SIGSEGV on --enable-cassert -O0;
> under -O2 the out-of-bounds write is of undefined effect):
>
>
>     create table parent(id int primary key);
>
>     insert into parent select g from generate_series(1,2000) g;
>
>     create type vch as (v int);
>
>     create function vcast(vch) returns int language plpgsql as $$
>
>     begin
>
>       if $1.v = 64 then
>
>         insert into child select row(g)::vch from
> generate_series(1001,1064) g;
>
>       end if;
>
>       return $1.v;
>
>     end$$;
>
>     create cast (vch as int) with function vcast(vch) as implicit;
>
>     create table child(a vch);
>
>     alter table child add constraint child_fkey
>
>       foreign key (a) references parent(id);
>
>     insert into child select row(g)::vch from generate_series(1,64) g;  --
> crash
>
>     -- gdb: crash at ri_FastPathBatchAdd line 2866 with batch_count
> holding a
>
>     -- stomped HeapTuple pointer's low bits, i.e. batch[64] overwrote
>
>     -- batch_count; backend SIGSEGVs and the cluster restarts.
>
>
>
> 2) ri_FastPathSubXactCallback (line 4208): batch dropped on subxact abort
>
>
> On SUBXACT_EVENT_ABORT_SUB the callback discards the whole cache:
>
>
>     ri_fastpath_cache = NULL;
>
>     ri_fastpath_callback_registered = false;
>
>
> But batch[] holds outstanding rows of the enclosing transaction, not the aborting
> subxact. An internal subxact abort during after-trigger firing (PL/pgSQL
> BEGIN ... EXCEPTION) drops the buffered rows unflushed; their FK checks
> never run and orphans commit behind a constraint that still reports itself
> valid. No cast needed:
>
>
>     create table pk(id int primary key);
>
>     create table fk(a int, tag text);
>
>     insert into pk select g from generate_series(1,10) g;
>
>     alter table fk add constraint fk_a_fkey foreign key (a) references
> pk(id);
>
>     create function abort_subxact() returns trigger language plpgsql as $$
>
>     begin
>
>       if NEW.tag = 'boom' then
>
>         begin perform 1/0; exception when others then null; end;
>
>       end if;
>
>       return NEW;
>
>     end$$;
>
>     create trigger fk_after after insert on fk
>
>       for each row execute function abort_subxact();
>
>     insert into fk values
> (999,'bad'),(0,'boom'),(1,'ok'),(2,'ok'),(3,'ok');
>
>     -- INSERT 0 5, no error
>
>     select f.a from fk f left join pk p on f.a=p.id where p.id is null;
>
>     --  a
>
>     -- -----
>
>     -- 999
>
>     --   0   (orphans)
>
>
>     -- the constraint still reports itself valid, and re-validation passes
>
>     -- while the orphans remain:
>
>     select convalidated from pg_constraint where conname = 'fk_a_fkey';
>
>     -- convalidated
>
>     -- --------------
>
>     -- t
>
>     alter table fk validate constraint fk_a_fkey;
>
>     -- ALTER TABLE   (succeeds; does not re-scan committed rows)
>
>     select f.a from fk f left join pk p on f.a=p.id where p.id is null;
>
>     -- 999, 0  (orphans still present)
>
>
> Controls (no EXCEPTION; between-statement SAVEPOINT; DEFERRABLE INITIALLY DEFERRED)
> all behave correctly (FK violation raised, no orphans). The whole statement's
> buffered batch is discarded, not just the aborting row's check. The abort
> path also emits "WARNING: resource was not closed" (relation /
>
> index / TupleDesc), a resource leak consistent with the missing flush.
>
>
>
> 3) ri_FastPathEndBatch (line 4133): cross-table re-entry drops a check
>
>
> EndBatch flushes by iterating the cache with hash_seq_search (line 4143). If
> flush-time user code INSERTs into a different fast-path FK table, ri_FastPathGetEntry
> adds a new cache entry mid-scan; it can land in a bucket hash_seq_search
> already passed and is never reached. ri_FastPathTeardown (line 4165) then
> hash_destroys the cache (line 4188) without flushing entries that still
> have batch_count > 0, so that buffered check is discarded. This survives a
>
> per-entry guard for [1] (different entry, not a re-entry of the busy one):
>
>
>     create table parent(id int primary key);
>
>     insert into parent select g from generate_series(1,64) g;
>
>     create table child2(a int);
>
>     alter table child2 add constraint child2_fkey
>
>       foreign key (a) references parent(id);
>
>     create type vch as (v int);
>
>     create function vcast(vch) returns int language plpgsql as $$
>
>     begin
>
>       if $1.v = 1 then
>
>         insert into child2 values (999999);   -- orphan into a
> *different* FK
>
>       end if;
>
>       return $1.v;
>
>     end$$;
>
>     create cast (vch as int) with function vcast(vch) as implicit;
>
>     create table child(a vch);
>
>     alter table child add constraint child_fkey
>
>       foreign key (a) references parent(id);
>
>     insert into child values (row(1)::vch);    -- flushed at
> ri_FastPathEndBatch
>
>     select a from child2 where a not in (select id from parent);  -- =>
> 999999
>
>     -- control: INSERT INTO child2 VALUES (999999); -- correctly raises
> FK error
>
>
>
> Root cause / thoughts:
>
>
> All three stem from invoking user cast/operator code inside a deferred batch
> flush: while a per-entry batch is half-updated [1], while a cache-wide hash_seq_search
> is in progress and teardown drops non-empty entries [3], and against a
> subxact-abort invalidation that cannot tell parent-xact rows from aborted-subxact
> rows [2].
>
>
> - [1] Bound-check before the write in ri_FastPathBatchAdd, and add a "flushing"
> flag to RI_FastPathEntry, rejecting re-entrant modification of a busy
> entry (a nested per-row probe is unsafe: the flush may hold PK-index buffer
> locks).
>
> - [3] Loop-flush in ri_FastPathEndBatch until no entry has batch_count >
> 0, and/or flush non-empty entries in ri_FastPathTeardown before
> hash_destroy.
>
> - [2] Do not discard outstanding parent-xact rows on
> SUBXACT_EVENT_ABORT_SUB; track the buffering subxact, or flush
> immediate-constraint batches subxact boundaries.
>
> - Unifying: a global "in fast-path flush" guard routing any re-entrant FK check
> to the immediate per-row path, and reconsidering running user code mid-flush
> at all.
>
>
> Nik
>

Thanks for the detailed report and reproducers. I’ve started looking into
this.

- thanks, Amit

>


^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: PG19 FK fast path: OOB write and missed FK checks during batched
@ 2026-06-08 08:18  Amit Langote <[email protected]>
  parent: Amit Langote <[email protected]>
  0 siblings, 1 reply; 14+ messages in thread

From: Amit Langote @ 2026-06-08 08:18 UTC (permalink / raw)
  To: Nikolay Samokhvalov <[email protected]>; +Cc: pgsql-hackers; Andrey Borodin <[email protected]>; Kirk Wolak <[email protected]>

On Sat, Jun 6, 2026 at 6:13 PM Amit Langote <[email protected]> wrote:
> Thanks for the detailed report and reproducers. I’ve started looking into this.

Continuing to look.  Appended this to the open items list:

https://wiki.postgresql.org/wiki/PostgreSQL_19_Open_Items#Open_Issues

-- 
Thanks, Amit Langote






^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: PG19 FK fast path: OOB write and missed FK checks during batched
@ 2026-06-09 13:31  Amit Langote <[email protected]>
  parent: Amit Langote <[email protected]>
  0 siblings, 1 reply; 14+ messages in thread

From: Amit Langote @ 2026-06-09 13:31 UTC (permalink / raw)
  To: Nikolay Samokhvalov <[email protected]>; +Cc: pgsql-hackers; Andrey Borodin <[email protected]>; Kirk Wolak <[email protected]>

On Mon, Jun 8, 2026 at 5:18 PM Amit Langote <[email protected]> wrote:
> On Sat, Jun 6, 2026 at 6:13 PM Amit Langote <[email protected]> wrote:
> > Thanks for the detailed report and reproducers. I’ve started looking into this.
>
> Continuing to look.  Appended this to the open items list:
>
> https://wiki.postgresql.org/wiki/PostgreSQL_19_Open_Items#Open_Issues

Thanks again, Nik, for the thorough analysis and the reproducers --
they made all three easy to confirm and pin down. Patches attached:
0001 for defect 1, 0002 for defects 2 and 3.

0001 (defect 1): check and flush before writing the row rather than
after, and add a per-entry "flushing" flag so a re-entrant add on the
same entry during a flush takes the per-row path instead of touching
the mid-flush batch. The flag is cleared in a PG_FINALLY, which also
resets batch_count, so the entry stays reusable if a flush error is
caught by a savepoint.

0002 (defects 2 and 3): rather than track subxact membership per row,
confine batching to the top transaction level -- in RI_FKey_check,
when GetCurrentTransactionNestLevel() > 1, use the per-row path. I
went this way because per-entry subxact tracking isn't enough (one
entry's batch can mix rows from several levels, since the cache is
keyed by constraint), and flushing at subxact boundaries doesn't work
for deferred constraints. Once the cache only ever holds top-level
rows, a subxact abort has nothing of its own to discard, so
ri_FastPathSubXactCallback goes away -- that's what fixes your defect
2 reproducer. For defect 3, which is still reachable at the top level,
the same patch adds a cache-wide flag set while ri_FastPathEndBatch
iterates, so a re-entrant check during the scan takes the per-row path
instead of inserting into the cache being scanned.

The per-row path still bypasses SPI, so these stay well ahead of the
pre-19 check in terms of performance. I'd like to recover batching
across subtransactions properly in v20 but didn't want to rush it now.

On defect 3, can you check whether your reproducer still commits the
orphan with 0002 applied, or whether (like on my build) it now raises
the violation? I'd like to be sure the bucket-placement variation you
hit is actually covered. And of course any review of the patches is
welcome.

--
Thanks, Amit Langote


Attachments:

  [application/octet-stream] v1-0001-Fix-out-of-bounds-write-in-RI-fast-path-batch-on-.patch (10.0K, 2-v1-0001-Fix-out-of-bounds-write-in-RI-fast-path-batch-on-.patch)
  download | inline diff:
From 00fa1c1137009cd64fb04b718c1de4c8c130f08e Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 9 Jun 2026 18:27:24 +0900
Subject: [PATCH v1 1/2] Fix out-of-bounds write in RI fast-path batch on
 re-entry

The FK fast-path batching added in b7b27eb41a5 wrote the incoming row
into the batch array before checking whether the array was full:

    fpentry->batch[fpentry->batch_count] = ExecCopySlotHeapTuple(newslot);
    fpentry->batch_count++;
    if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
        ri_FastPathBatchFlush(fpentry, fk_rel, riinfo);

batch_count is reset to zero only at the end of ri_FastPathBatchFlush(),
so it remains at RI_FASTPATH_BATCH_SIZE throughout a full-batch flush.
A flush runs user-defined cast functions and equality operators; if that
user code performs DML on the same FK table, ri_FastPathBatchAdd()
re-enters with batch_count == RI_FASTPATH_BATCH_SIZE and writes one past
the end of the array, corrupting the adjacent batch_count field.  This
is reachable by an unprivileged table owner via an implicit cast with a
PL/pgSQL function and causes a SIGSEGV in assert-enabled builds.

Fix by checking whether the batch is full and flushing before writing
the new row, and by adding a "flushing" flag to RI_FastPathEntry that
routes re-entrant ri_FastPathBatchAdd() calls on a busy entry to the
per-row path (ri_FastPathCheck) instead of touching the mid-flush batch
array.  The flag is set around the probe in ri_FastPathBatchFlush() and
cleared in a PG_FINALLY, which also resets batch_count, so the entry is
left empty and reusable if a flush error (including a reported FK
violation) is caught by a savepoint.

Add regression tests for both the re-entrant flush and reuse of an entry
after a flush error caught by a savepoint.

Reported-by: Nikolay Samokhvalov <[email protected]>
Discussion: https://postgr.es/m/CAM527d9exRCdWrhJOnAxk_vACg7sr_yPoaJp_+uCFY0qP8v=aw@mail.gmail.com
---
 src/backend/utils/adt/ri_triggers.c       | 58 ++++++++++++++++++-----
 src/test/regress/expected/foreign_key.out | 56 ++++++++++++++++++++++
 src/test/regress/sql/foreign_key.sql      | 46 ++++++++++++++++++
 3 files changed, 147 insertions(+), 13 deletions(-)

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index dc89c686394..453b83ce85d 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -249,6 +249,12 @@ typedef struct RI_FastPathEntry
 	 */
 	HeapTuple	batch[RI_FASTPATH_BATCH_SIZE];
 	int			batch_count;
+
+	/*
+	 * true while this entry's batch is being flushed; guards against
+	 * re-entrant ri_FastPathBatchAdd from user code run during the flush.
+	 */
+	bool		flushing;
 } RI_FastPathEntry;
 
 /*
@@ -2862,14 +2868,26 @@ ri_FastPathBatchAdd(RI_ConstraintInfo *riinfo,
 	RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
 	MemoryContext oldcxt;
 
+	/*
+	 * If this entry is already being flushed, a cast function or an operator
+	 * invoked during the flush has re-entered with DML on the same FK.  Fall
+	 * back to the per-row path rather than touching the batch array, which is
+	 * mid-flush.
+	 */
+	if (fpentry->flushing)
+	{
+		ri_FastPathCheck(riinfo, fk_rel, newslot);
+		return;
+	}
+
+	if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
+		ri_FastPathBatchFlush(fpentry, fk_rel, riinfo);
+
 	oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
 	fpentry->batch[fpentry->batch_count] =
 		ExecCopySlotHeapTuple(newslot);
 	fpentry->batch_count++;
 	MemoryContextSwitchTo(oldcxt);
-
-	if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
-		ri_FastPathBatchFlush(fpentry, fk_rel, riinfo);
 }
 
 /*
@@ -2944,13 +2962,30 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel,
 	}
 	Assert(riinfo->fpmeta);
 
-	/* Skip array overhead for single-row batches. */
-	if (riinfo->nkeys == 1 && fpentry->batch_count > 1)
-		violation_index = ri_FastPathFlushArray(fpentry, fk_slot, riinfo,
-												fk_rel, snapshot, scandesc);
-	else
-		violation_index = ri_FastPathFlushLoop(fpentry, fk_slot, riinfo,
-											   fk_rel, snapshot, scandesc);
+	/*
+	 * The probe runs user-defined cast and equality functions.  Set the
+	 * flushing flag around it so a re-entrant ri_FastPathBatchAdd on this
+	 * entry takes the per-row path, and clear it even on error so the entry
+	 * is reusable if the error is caught by a savepoint.
+	 */
+	Assert(!fpentry->flushing);
+	fpentry->flushing = true;
+	PG_TRY();
+	{
+		/* Skip array overhead for single-row batches. */
+		if (riinfo->nkeys == 1 && fpentry->batch_count > 1)
+			violation_index = ri_FastPathFlushArray(fpentry, fk_slot, riinfo,
+													fk_rel, snapshot, scandesc);
+		else
+			violation_index = ri_FastPathFlushLoop(fpentry, fk_slot, riinfo,
+												   fk_rel, snapshot, scandesc);
+	}
+	PG_FINALLY();
+	{
+		fpentry->flushing = false;
+		fpentry->batch_count = 0;
+	}
+	PG_END_TRY();
 
 	SetUserIdAndSecContext(saved_userid, saved_sec_context);
 	UnregisterSnapshot(snapshot);
@@ -2966,9 +3001,6 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel,
 
 	MemoryContextReset(fpentry->flush_cxt);
 	MemoryContextSwitchTo(oldcxt);
-
-	/* Reset. */
-	fpentry->batch_count = 0;
 }
 
 /*
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 8b3b268de0f..fa80d9c915f 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3712,3 +3712,59 @@ INSERT INTO fp_pk_dup VALUES (1);
 CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
 INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
 DROP TABLE fp_fk_dup, fp_pk_dup;
+-- Re-entrant FK fast-path: DML on the same FK table from a cast function
+-- during a full-batch flush must not corrupt the batch array.
+CREATE TABLE fp_reentry_pk (id int PRIMARY KEY);
+INSERT INTO fp_reentry_pk VALUES (1), (2);
+CREATE TYPE fp_vch AS (v int);
+CREATE FUNCTION fp_vcast(fp_vch) RETURNS int LANGUAGE plpgsql AS $$
+BEGIN
+    IF $1.v = 1 THEN
+        INSERT INTO fp_reentry_fk VALUES (row(2)::fp_vch);
+    END IF;
+    RETURN $1.v;
+END$$;
+CREATE CAST (fp_vch AS int) WITH FUNCTION fp_vcast(fp_vch) AS IMPLICIT;
+CREATE TABLE fp_reentry_fk (a fp_vch
+    REFERENCES fp_reentry_pk (id));
+-- Fill exactly one batch so the flush fires; the cast re-enters with DML
+-- on the same FK and must take the per-row path.
+INSERT INTO fp_reentry_fk SELECT row(1)::fp_vch FROM generate_series(1, 64);
+SELECT a, count(*) FROM fp_reentry_fk GROUP BY a ORDER BY a;
+  a  | count 
+-----+-------
+ (1) |    64
+ (2) |    64
+(2 rows)
+
+DROP TABLE fp_reentry_fk, fp_reentry_pk;
+DROP CAST (fp_vch AS int);
+DROP FUNCTION fp_vcast(fp_vch);
+DROP TYPE fp_vch;
+-- Flush error caught by a savepoint must leave the entry empty and reusable.
+CREATE TABLE fp_reentry_pk2 (id int PRIMARY KEY);
+INSERT INTO fp_reentry_pk2 VALUES (1);
+CREATE TABLE fp_reentry_fk2 (a int REFERENCES fp_reentry_pk2 (id));
+DO $$
+BEGIN
+    -- A batch containing a violating row; the flush reports the violation.
+    BEGIN
+        INSERT INTO fp_reentry_fk2 SELECT CASE WHEN g = 32 THEN 999 ELSE 1 END
+            FROM generate_series(1, 64) g;
+    EXCEPTION WHEN foreign_key_violation THEN
+        RAISE NOTICE 'caught fk violation';
+    END;
+
+	-- Reuse the same FK with a full batch in the same transaction.  The
+	-- entry must be empty after the caught violation: no stale rows from the
+	-- rolled-back batch (in particular no 999), and no array overflow.
+    INSERT INTO fp_reentry_fk2 SELECT 1 FROM generate_series(1, 64);
+END$$;
+NOTICE:  caught fk violation
+SELECT count(*), max(a) FROM fp_reentry_fk2;  -- 64 rows, max 1
+ count | max 
+-------+-----
+    64 |   1
+(1 row)
+
+DROP TABLE fp_reentry_fk2, fp_reentry_pk2;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index 7eb86b188f0..0f2ce8f76f5 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2680,3 +2680,49 @@ INSERT INTO fp_pk_dup VALUES (1);
 CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
 INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
 DROP TABLE fp_fk_dup, fp_pk_dup;
+
+-- Re-entrant FK fast-path: DML on the same FK table from a cast function
+-- during a full-batch flush must not corrupt the batch array.
+CREATE TABLE fp_reentry_pk (id int PRIMARY KEY);
+INSERT INTO fp_reentry_pk VALUES (1), (2);
+CREATE TYPE fp_vch AS (v int);
+CREATE FUNCTION fp_vcast(fp_vch) RETURNS int LANGUAGE plpgsql AS $$
+BEGIN
+    IF $1.v = 1 THEN
+        INSERT INTO fp_reentry_fk VALUES (row(2)::fp_vch);
+    END IF;
+    RETURN $1.v;
+END$$;
+CREATE CAST (fp_vch AS int) WITH FUNCTION fp_vcast(fp_vch) AS IMPLICIT;
+CREATE TABLE fp_reentry_fk (a fp_vch
+    REFERENCES fp_reentry_pk (id));
+-- Fill exactly one batch so the flush fires; the cast re-enters with DML
+-- on the same FK and must take the per-row path.
+INSERT INTO fp_reentry_fk SELECT row(1)::fp_vch FROM generate_series(1, 64);
+SELECT a, count(*) FROM fp_reentry_fk GROUP BY a ORDER BY a;
+DROP TABLE fp_reentry_fk, fp_reentry_pk;
+DROP CAST (fp_vch AS int);
+DROP FUNCTION fp_vcast(fp_vch);
+DROP TYPE fp_vch;
+
+-- Flush error caught by a savepoint must leave the entry empty and reusable.
+CREATE TABLE fp_reentry_pk2 (id int PRIMARY KEY);
+INSERT INTO fp_reentry_pk2 VALUES (1);
+CREATE TABLE fp_reentry_fk2 (a int REFERENCES fp_reentry_pk2 (id));
+DO $$
+BEGIN
+    -- A batch containing a violating row; the flush reports the violation.
+    BEGIN
+        INSERT INTO fp_reentry_fk2 SELECT CASE WHEN g = 32 THEN 999 ELSE 1 END
+            FROM generate_series(1, 64) g;
+    EXCEPTION WHEN foreign_key_violation THEN
+        RAISE NOTICE 'caught fk violation';
+    END;
+
+	-- Reuse the same FK with a full batch in the same transaction.  The
+	-- entry must be empty after the caught violation: no stale rows from the
+	-- rolled-back batch (in particular no 999), and no array overflow.
+    INSERT INTO fp_reentry_fk2 SELECT 1 FROM generate_series(1, 64);
+END$$;
+SELECT count(*), max(a) FROM fp_reentry_fk2;  -- 64 rows, max 1
+DROP TABLE fp_reentry_fk2, fp_reentry_pk2;
-- 
2.47.3



  [application/octet-stream] v1-0002-Confine-RI-fast-path-batching-to-the-top-transact.patch (10.9K, 3-v1-0002-Confine-RI-fast-path-batching-to-the-top-transact.patch)
  download | inline diff:
From d136a8f5edc723899daafb59377930ee7fec3838 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 9 Jun 2026 22:11:54 +0900
Subject: [PATCH v1 2/2] Confine RI fast-path batching to the top transaction
 level

The FK fast-path batching added in b7b27eb41a5 buffers rows in a
transaction-lived cache (ri_fastpath_cache) keyed by constraint OID.
Running user-defined cast and equality functions during a batch flush,
together with the cache's lifetime and iteration, exposed two defects
reachable by an unprivileged table owner.

First, on subtransaction abort ri_FastPathSubXactCallback discarded the
entire cache.  An entry's batch holds rows buffered by the enclosing
transaction, not just the aborting subxact -- the cache is keyed by
constraint, so a single entry can mix rows from multiple subxact levels.
An internal subxact abort during after-trigger firing (e.g. a PL/pgSQL
BEGIN ... EXCEPTION block) therefore dropped buffered rows of the outer
transaction without running their FK checks, letting orphan rows commit
behind a constraint that still reported itself valid.  The discard also
left relations opened by the batch unclosed, producing "resource was not
closed" warnings.

Second, ri_FastPathEndBatch flushes by iterating the cache with
hash_seq_search.  If flush-time user code inserts into a different
fast-path FK table, a new entry is added to the cache mid-scan; it may
land in a bucket the scan has already passed and never be reached, and
ri_FastPathTeardown then destroys the cache without flushing it,
silently dropping that check.

Cleanly unwinding the cache on subxact abort would require tracking the
originating subxact of each buffered row, since rows from different
levels share an entry (the cache is keyed by constraint) and deferred
constraints cannot be flushed early at a subxact boundary.  Rather than
add that bookkeeping, confine batching to the top transaction level: in
RI_FKey_check, when GetCurrentTransactionNestLevel() > 1, use the
per-row fast path (ri_FastPathCheck) instead of buffering.  Rows checked
inside a subtransaction are then verified immediately and roll back
cleanly with their subtransaction, and the cache only ever holds
top-level rows.  With the cache confined to the top level, a
subtransaction abort has nothing of its own to discard, so
ri_FastPathSubXactCallback is removed along with its registration.

For the second defect, add a cache-wide flag (ri_fastpath_flushing) set
while ri_FastPathEndBatch iterates the cache.  A re-entrant FK check
arriving while the flag is set takes the per-row path rather than adding
an entry to the cache being scanned, so no entry can be missed and torn
down unflushed.  The flag is cleared in a PG_FINALLY so a flush that
throws (a reported violation or an error from user code) does not leave
it stuck.

The per-row fast path still bypasses SPI and stays well ahead of the
pre-19 SPI-based check.  A fuller fix that preserves batching across
subtransactions -- whether by tracking the originating subxact of each
buffered row or by per-subxact cache stacks merged into the parent on
commit -- is left for a future release.

The subtransaction-abort case is covered by a new regression test.  The
mid-scan cross-table case depends on hash bucket placement and so is not
reliably reproducible in a portable test, but the flag prevents it by
construction.

Reported-by: Nikolay Samokhvalov <[email protected]>
Discussion: https://postgr.es/m/CAM527d9exRCdWrhJOnAxk_vACg7sr_yPoaJp_+uCFY0qP8v=aw@mail.gmail.com
---
 src/backend/utils/adt/ri_triggers.c       | 77 +++++++++++++++--------
 src/test/regress/expected/foreign_key.out | 24 +++++++
 src/test/regress/sql/foreign_key.sql      | 23 +++++++
 3 files changed, 97 insertions(+), 27 deletions(-)

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 453b83ce85d..9b49d87279f 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -267,6 +267,7 @@ static dclist_head ri_constraint_cache_valid_list;
 
 static HTAB *ri_fastpath_cache = NULL;
 static bool ri_fastpath_callback_registered = false;
+static bool ri_fastpath_flushing = false;
 
 /*
  * Local function prototypes
@@ -469,14 +470,31 @@ RI_FKey_check(TriggerData *trigdata)
 	 */
 	if (ri_fastpath_is_applicable(riinfo))
 	{
-		if (AfterTriggerIsActive())
+		if (AfterTriggerIsActive() &&
+			GetCurrentTransactionNestLevel() == 1 &&
+			!ri_fastpath_flushing)
 		{
 			/* Batched path: buffer and probe in groups */
 			ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
 		}
 		else
 		{
-			/* ALTER TABLE validation: per-row, no cache */
+			/*
+			 * Per-row path, used when batching is not safe or not
+			 * applicable:
+			 *
+			 * - ALTER TABLE validation, where no after-trigger firing is
+			 *   active;
+			 *
+			 * - any FK check inside a subtransaction, since the batch cache
+			 *   is confined to the top transaction level (it cannot be
+			 *   cleanly unwound on subxact abort);
+			 *
+			 * - a re-entrant check from user cast/operator code running
+			 *   during a batch flush, since adding a cache entry while
+			 *   ri_FastPathEndBatch is iterating the cache could leave it
+			 *   unflushed.
+			 */
 			ri_FastPathCheck(riinfo, fk_rel, newslot);
 		}
 		return PointerGetDatum(NULL);
@@ -4170,19 +4188,41 @@ ri_FastPathEndBatch(void *arg)
 	if (ri_fastpath_cache == NULL)
 		return;
 
-	/* Flush any partial batches -- can throw ERROR */
-	hash_seq_init(&status, ri_fastpath_cache);
-	while ((entry = hash_seq_search(&status)) != NULL)
+	/*
+	 * Set a flag for the duration of the scan so that any FK check triggered
+	 * by user cast or operator code during a flush takes the per-row path
+	 * instead of adding a new entry to the cache we are iterating.  A new
+	 * entry could land in an already-scanned bucket and then be torn down
+	 * unflushed below.
+	 *
+	 * The flush can throw ERROR (a reported constraint violation, or an error
+	 * from the user code it runs).  In that case ri_FastPathTeardown below is
+	 * skipped; the ResourceOwner and the transaction-end callback handle
+	 * resource cleanup on the abort path.  The PG_FINALLY only resets the
+	 * flag and deliberately does not attempt teardown.
+	 */
+	Assert(!ri_fastpath_flushing);
+	ri_fastpath_flushing = true;
+	PG_TRY();
 	{
-		if (entry->batch_count > 0)
+		hash_seq_init(&status, ri_fastpath_cache);
+		while ((entry = hash_seq_search(&status)) != NULL)
 		{
-			Relation	fk_rel = table_open(entry->fk_relid, AccessShareLock);
-			RI_ConstraintInfo *riinfo = ri_LoadConstraintInfo(entry->conoid);
+			if (entry->batch_count > 0)
+			{
+				Relation	fk_rel = table_open(entry->fk_relid, AccessShareLock);
+				RI_ConstraintInfo *riinfo = ri_LoadConstraintInfo(entry->conoid);
 
-			ri_FastPathBatchFlush(entry, fk_rel, riinfo);
-			table_close(fk_rel, NoLock);
+				ri_FastPathBatchFlush(entry, fk_rel, riinfo);
+				table_close(fk_rel, NoLock);
+			}
 		}
 	}
+	PG_FINALLY();
+	{
+		ri_fastpath_flushing = false;
+	}
+	PG_END_TRY();
 
 	ri_FastPathTeardown();
 }
@@ -4236,22 +4276,6 @@ ri_FastPathXactCallback(XactEvent event, void *arg)
 	ri_fastpath_callback_registered = false;
 }
 
-static void
-ri_FastPathSubXactCallback(SubXactEvent event, SubTransactionId mySubid,
-						   SubTransactionId parentSubid, void *arg)
-{
-	if (event == SUBXACT_EVENT_ABORT_SUB)
-	{
-		/*
-		 * ResourceOwner already released relations.  NULL the static pointers
-		 * so the still-registered batch callback becomes a no-op for the rest
-		 * of this transaction.
-		 */
-		ri_fastpath_cache = NULL;
-		ri_fastpath_callback_registered = false;
-	}
-}
-
 /*
  * ri_FastPathGetEntry
  *		Look up or create a per-batch cache entry for the given constraint.
@@ -4276,7 +4300,6 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
 		if (!ri_fastpath_xact_callback_registered)
 		{
 			RegisterXactCallback(ri_FastPathXactCallback, NULL);
-			RegisterSubXactCallback(ri_FastPathSubXactCallback, NULL);
 			ri_fastpath_xact_callback_registered = true;
 		}
 
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index fa80d9c915f..3f99a4a59cc 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3768,3 +3768,27 @@ SELECT count(*), max(a) FROM fp_reentry_fk2;  -- 64 rows, max 1
 (1 row)
 
 DROP TABLE fp_reentry_fk2, fp_reentry_pk2;
+-- Subtransaction abort during after-trigger firing must not drop FK checks
+-- for rows buffered earlier in the same statement.  Batching is confined to
+-- the top transaction level and the buffered batch is no longer discarded on
+-- subxact abort, so the violating rows are detected.
+CREATE TABLE fp_subxact_pk (id int PRIMARY KEY);
+INSERT INTO fp_subxact_pk SELECT g FROM generate_series(1, 10) g;
+CREATE TABLE fp_subxact_fk (a int, tag text);
+ALTER TABLE fp_subxact_fk ADD CONSTRAINT fp_subxact_fk_fkey
+    FOREIGN KEY (a) REFERENCES fp_subxact_pk (id);
+CREATE FUNCTION fp_abort_subxact() RETURNS trigger LANGUAGE plpgsql AS $$
+BEGIN
+    IF NEW.tag = 'boom' THEN
+        BEGIN PERFORM 1/0; EXCEPTION WHEN division_by_zero THEN NULL; END;
+    END IF;
+    RETURN NEW;
+END$$;
+CREATE TRIGGER fp_subxact_trg AFTER INSERT ON fp_subxact_fk
+    FOR EACH ROW EXECUTE FUNCTION fp_abort_subxact();
+INSERT INTO fp_subxact_fk VALUES (999, 'bad'), (0, 'boom'), (1, 'ok');
+ERROR:  insert or update on table "fp_subxact_fk" violates foreign key constraint "fp_subxact_fk_fkey"
+DETAIL:  Key (a)=(999) is not present in table "fp_subxact_pk".
+DROP TRIGGER fp_subxact_trg ON fp_subxact_fk;
+DROP FUNCTION fp_abort_subxact();
+DROP TABLE fp_subxact_fk, fp_subxact_pk;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index 0f2ce8f76f5..18c3e166f02 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2726,3 +2726,26 @@ BEGIN
 END$$;
 SELECT count(*), max(a) FROM fp_reentry_fk2;  -- 64 rows, max 1
 DROP TABLE fp_reentry_fk2, fp_reentry_pk2;
+
+-- Subtransaction abort during after-trigger firing must not drop FK checks
+-- for rows buffered earlier in the same statement.  Batching is confined to
+-- the top transaction level and the buffered batch is no longer discarded on
+-- subxact abort, so the violating rows are detected.
+CREATE TABLE fp_subxact_pk (id int PRIMARY KEY);
+INSERT INTO fp_subxact_pk SELECT g FROM generate_series(1, 10) g;
+CREATE TABLE fp_subxact_fk (a int, tag text);
+ALTER TABLE fp_subxact_fk ADD CONSTRAINT fp_subxact_fk_fkey
+    FOREIGN KEY (a) REFERENCES fp_subxact_pk (id);
+CREATE FUNCTION fp_abort_subxact() RETURNS trigger LANGUAGE plpgsql AS $$
+BEGIN
+    IF NEW.tag = 'boom' THEN
+        BEGIN PERFORM 1/0; EXCEPTION WHEN division_by_zero THEN NULL; END;
+    END IF;
+    RETURN NEW;
+END$$;
+CREATE TRIGGER fp_subxact_trg AFTER INSERT ON fp_subxact_fk
+    FOR EACH ROW EXECUTE FUNCTION fp_abort_subxact();
+INSERT INTO fp_subxact_fk VALUES (999, 'bad'), (0, 'boom'), (1, 'ok');
+DROP TRIGGER fp_subxact_trg ON fp_subxact_fk;
+DROP FUNCTION fp_abort_subxact();
+DROP TABLE fp_subxact_fk, fp_subxact_pk;
-- 
2.47.3



^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: PG19 FK fast path: OOB write and missed FK checks during batched
@ 2026-06-10 08:16  Nikolay Samokhvalov <[email protected]>
  parent: Amit Langote <[email protected]>
  0 siblings, 1 reply; 14+ messages in thread

From: Nikolay Samokhvalov @ 2026-06-10 08:16 UTC (permalink / raw)
  To: Amit Langote <[email protected]>; +Cc: pgsql-hackers; Andrey Borodin <[email protected]>; Kirk Wolak <[email protected]>

On Tue, Jun 9, 2026 at 6:31 AM Amit Langote <[email protected]> wrote:

> On Mon, Jun 8, 2026 at 5:18 PM Amit Langote <[email protected]>
> wrote:
> > On Sat, Jun 6, 2026 at 6:13 PM Amit Langote <[email protected]>
> wrote:
> > > Thanks for the detailed report and reproducers. I’ve started looking
> into this.
> >
> > Continuing to look.  Appended this to the open items list:
> >
> > https://wiki.postgresql.org/wiki/PostgreSQL_19_Open_Items#Open_Issues
>
> Thanks again, Nik, for the thorough analysis and the reproducers --
> they made all three easy to confirm and pin down. Patches attached:
> 0001 for defect 1, 0002 for defects 2 and 3.
>
> 0001 (defect 1): check and flush before writing the row rather than
> after, and add a per-entry "flushing" flag so a re-entrant add on the
> same entry during a flush takes the per-row path instead of touching
> the mid-flush batch. The flag is cleared in a PG_FINALLY, which also
> resets batch_count, so the entry stays reusable if a flush error is
> caught by a savepoint.
>
> 0002 (defects 2 and 3): rather than track subxact membership per row,
> confine batching to the top transaction level -- in RI_FKey_check,
> when GetCurrentTransactionNestLevel() > 1, use the per-row path. I
> went this way because per-entry subxact tracking isn't enough (one
> entry's batch can mix rows from several levels, since the cache is
> keyed by constraint), and flushing at subxact boundaries doesn't work
> for deferred constraints. Once the cache only ever holds top-level
> rows, a subxact abort has nothing of its own to discard, so
> ri_FastPathSubXactCallback goes away -- that's what fixes your defect
> 2 reproducer. For defect 3, which is still reachable at the top level,
> the same patch adds a cache-wide flag set while ri_FastPathEndBatch
> iterates, so a re-entrant check during the scan takes the per-row path
> instead of inserting into the cache being scanned.
>
> The per-row path still bypasses SPI, so these stay well ahead of the
> pre-19 check in terms of performance. I'd like to recover batching
> across subtransactions properly in v20 but didn't want to rush it now.
>
> On defect 3, can you check whether your reproducer still commits the
> orphan with 0002 applied, or whether (like on my build) it now raises
> the violation? I'd like to be sure the bucket-placement variation you
> hit is actually covered. And of course any review of the patches is
> welcome.
>
> --
> Thanks, Amit Langote
>

Hi Amit,

Thanks for the quick fixes.

I checked v1-0001 + v1-0002 against current master (e18b0cb7) with an
assertion/debug build.

- Both apply cleanly to master (in sequence)
- Defect 1 same-FK re-entry no longer crashes; the original shape completes
and leaves the expected rows
- Defect 2 subtransaction-abort case now raises the FK violation instead of
committing orphans
- For your defect 3 question: with 0002 applied, my reproducer no longer
commits the child2 orphan. It raises:
    ERROR: insert or update on table "child2" violates foreign key
constraint "child2_fkey"
    DETAIL: Key (a)=(999999) is not present in table "parent".

After the error, child2_orphans = 0 and child2 is empty in my run.

I also ran the regression suite in that tree; foreign_key passed, and the
full run reported all 245 tests passed.

So v1 looks good to me for the three reported cases.

Thanks!

Nik


^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: PG19 FK fast path: OOB write and missed FK checks during batched
@ 2026-06-10 08:32  Amit Langote <[email protected]>
  parent: Nikolay Samokhvalov <[email protected]>
  0 siblings, 1 reply; 14+ messages in thread

From: Amit Langote @ 2026-06-10 08:32 UTC (permalink / raw)
  To: Nikolay Samokhvalov <[email protected]>; +Cc: pgsql-hackers; Andrey Borodin <[email protected]>; Kirk Wolak <[email protected]>

On Wed, Jun 10, 2026 at 5:16 PM Nikolay Samokhvalov <[email protected]> wrote:
> On Tue, Jun 9, 2026 at 6:31 AM Amit Langote <[email protected]> wrote:
>> On Mon, Jun 8, 2026 at 5:18 PM Amit Langote <[email protected]> wrote:
>> > On Sat, Jun 6, 2026 at 6:13 PM Amit Langote <[email protected]> wrote:
>> > > Thanks for the detailed report and reproducers. I’ve started looking into this.
>> >
>> > Continuing to look.  Appended this to the open items list:
>> >
>> > https://wiki.postgresql.org/wiki/PostgreSQL_19_Open_Items#Open_Issues
>>
>> Thanks again, Nik, for the thorough analysis and the reproducers --
>> they made all three easy to confirm and pin down. Patches attached:
>> 0001 for defect 1, 0002 for defects 2 and 3.
>>
>> 0001 (defect 1): check and flush before writing the row rather than
>> after, and add a per-entry "flushing" flag so a re-entrant add on the
>> same entry during a flush takes the per-row path instead of touching
>> the mid-flush batch. The flag is cleared in a PG_FINALLY, which also
>> resets batch_count, so the entry stays reusable if a flush error is
>> caught by a savepoint.
>>
>> 0002 (defects 2 and 3): rather than track subxact membership per row,
>> confine batching to the top transaction level -- in RI_FKey_check,
>> when GetCurrentTransactionNestLevel() > 1, use the per-row path. I
>> went this way because per-entry subxact tracking isn't enough (one
>> entry's batch can mix rows from several levels, since the cache is
>> keyed by constraint), and flushing at subxact boundaries doesn't work
>> for deferred constraints. Once the cache only ever holds top-level
>> rows, a subxact abort has nothing of its own to discard, so
>> ri_FastPathSubXactCallback goes away -- that's what fixes your defect
>> 2 reproducer. For defect 3, which is still reachable at the top level,
>> the same patch adds a cache-wide flag set while ri_FastPathEndBatch
>> iterates, so a re-entrant check during the scan takes the per-row path
>> instead of inserting into the cache being scanned.
>>
>> The per-row path still bypasses SPI, so these stay well ahead of the
>> pre-19 check in terms of performance. I'd like to recover batching
>> across subtransactions properly in v20 but didn't want to rush it now.
>>
>> On defect 3, can you check whether your reproducer still commits the
>> orphan with 0002 applied, or whether (like on my build) it now raises
>> the violation? I'd like to be sure the bucket-placement variation you
>> hit is actually covered. And of course any review of the patches is
>> welcome.
>
> Hi Amit,
>
> Thanks for the quick fixes.
>
> I checked v1-0001 + v1-0002 against current master (e18b0cb7) with an assertion/debug build.
>
> - Both apply cleanly to master (in sequence)
> - Defect 1 same-FK re-entry no longer crashes; the original shape completes and leaves the expected rows
> - Defect 2 subtransaction-abort case now raises the FK violation instead of committing orphans
> - For your defect 3 question: with 0002 applied, my reproducer no longer commits the child2 orphan. It raises:
>     ERROR: insert or update on table "child2" violates foreign key constraint "child2_fkey"
>     DETAIL: Key (a)=(999999) is not present in table "parent".
>
> After the error, child2_orphans = 0 and child2 is empty in my run.
>
> I also ran the regression suite in that tree; foreign_key passed, and the full run reported all 245 tests passed.
>
> So v1 looks good to me for the three reported cases.

Thanks for checking.  I will review them a bit more closely before
committing by Friday.  Other reviews are welcome.

-- 
Thanks, Amit Langote






^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: PG19 FK fast path: OOB write and missed FK checks during batched
@ 2026-06-10 10:08  Ayush Tiwari <[email protected]>
  parent: Amit Langote <[email protected]>
  0 siblings, 1 reply; 14+ messages in thread

From: Ayush Tiwari @ 2026-06-10 10:08 UTC (permalink / raw)
  To: Amit Langote <[email protected]>; +Cc: Nikolay Samokhvalov <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>; Kirk Wolak <[email protected]>

Hi,

On Wed, 10 Jun 2026 at 14:02, Amit Langote <[email protected]> wrote

>
> Thanks for checking.  I will review them a bit more closely before
> committing by Friday.  Other reviews are welcome.
>

Thanks for the patch!

I read through v1-0001 and v1-0002 and tried them locally. I had a couple of
things I wanted to ask about.

1. The per-entry "flushing" flag and test coverage.  If I'm reading the two
patches together correctly, with both applied the 64-row re-entry test in
0001
reaches the flush through ri_FastPathEndBatch(), where 0002's cache-wide
ri_fastpath_flushing guard already routes the re-entrant check to the
per-row
path before it gets back into ri_FastPathBatchAdd().  Does that mean the
per-entry flag from 0001 isn't really exercised by that test once 0002 is
in?
As far as I can tell you'd need the flush to fire from ri_FastPathBatchAdd()
itself (a 65th row) to reach it.  I tried a 65-row variant (same FK,
re-entrant
DML from the cast during the full-batch flush), including a case where the
re-entrant row was an orphan, and it seemed to do the right thing; the
per-row fallback still raised the violation.  Would it be worth switching
the
test to 65 rows, or adding that variant, so the per-entry guard is covered
too?
Or am I missing a path where the committed test already hits it?

2. Resetting ri_fastpath_flushing.  I noticed it's cleared only in the
PG_FINALLY of ri_FastPathEndBatch(), which does seem to cover the cases I
could
think of.  Since ri_FastPathXactCallback already NULLs ri_fastpath_cache and
clears ri_fastpath_callback_registered at transaction end, I wondered
whether
it might be worth clearing ri_fastpath_flushing there too, just as cheap
insurance against some future path that leaves it set across transactions
though maybe that's unnecessary given the PG_FINALLY.

Other than the above queries, the patch looks good to me.

Regards,
Ayush

^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: PG19 FK fast path: OOB write and missed FK checks during batched
@ 2026-06-10 12:17  Amit Langote <[email protected]>
  parent: Ayush Tiwari <[email protected]>
  0 siblings, 2 replies; 14+ messages in thread

From: Amit Langote @ 2026-06-10 12:17 UTC (permalink / raw)
  To: Ayush Tiwari <[email protected]>; +Cc: Nikolay Samokhvalov <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>; Kirk Wolak <[email protected]>

Hi Ayush,

Thanks for the review.

On Wed, Jun 10, 2026 at 7:09 PM Ayush Tiwari
<[email protected]> wrote:
> On Wed, 10 Jun 2026 at 14:02, Amit Langote <[email protected]> wrote
>> Thanks for checking.  I will review them a bit more closely before
>> committing by Friday.  Other reviews are welcome.
>
> Thanks for the patch!
>
> I read through v1-0001 and v1-0002 and tried them locally. I had a couple of
> things I wanted to ask about.
>
> 1. The per-entry "flushing" flag and test coverage.  If I'm reading the two
> patches together correctly, with both applied the 64-row re-entry test in 0001
> reaches the flush through ri_FastPathEndBatch(), where 0002's cache-wide
> ri_fastpath_flushing guard already routes the re-entrant check to the per-row
> path before it gets back into ri_FastPathBatchAdd().  Does that mean the
> per-entry flag from 0001 isn't really exercised by that test once 0002 is in?
> As far as I can tell you'd need the flush to fire from ri_FastPathBatchAdd()
> itself (a 65th row) to reach it.  I tried a 65-row variant (same FK, re-entrant
> DML from the cast during the full-batch flush), including a case where the
> re-entrant row was an orphan, and it seemed to do the right thing; the
> per-row fallback still raised the violation.  Would it be worth switching the
> test to 65 rows, or adding that variant, so the per-entry guard is covered too?
> Or am I missing a path where the committed test already hits it?

You're right. With 0002 applied, the 64-row test reaches the flush
through ri_FastPathEndBatch(), where the cache-wide
ri_fastpath_flushing guard catches the re-entry before it returns to
ri_FastPathBatchAdd(), so the per-entry flag is no longer exercised by
that test. To hit the per-entry flag the flush has to fire from
ri_FastPathBatchAdd() itself, which the 64-row case no longer does
once the add and flush are reordered.

Rather than bump the test to 65 rows, I'd prefer to keep the flush
firing from ri_FastPathBatchAdd() at 64 by not reordering the add and
flush, and prevent the OOB write by bounds-checking the write instead,
as done in the attached updated 0001. A re-entrant add then can't
overrun the array regardless of the flag, the per-entry flushing guard
still routes the re-entry to the per-row path, and a 64-row statement
flushes from ri_FastPathBatchAdd() on the 64th row, so the existing
test exercises the per-entry guard.

> 2. Resetting ri_fastpath_flushing.  I noticed it's cleared only in the
> PG_FINALLY of ri_FastPathEndBatch(), which does seem to cover the cases I could
> think of.  Since ri_FastPathXactCallback already NULLs ri_fastpath_cache and
> clears ri_fastpath_callback_registered at transaction end, I wondered whether
> it might be worth clearing ri_fastpath_flushing there too, just as cheap
> insurance against some future path that leaves it set across transactions
> though maybe that's unnecessary given the PG_FINALLY.

Agreed, it's cheap and matches the existing resets there, so I've
added it to ri_FastPathXactCallback() in v2-0002.

> Other than the above queries, the patch looks good to me.

Updated patches attached.

-- 
Thanks, Amit Langote


Attachments:

  [application/octet-stream] v2-0001-Fix-out-of-bounds-write-in-RI-fast-path-batch-on-.patch (10.5K, 2-v2-0001-Fix-out-of-bounds-write-in-RI-fast-path-batch-on-.patch)
  download | inline diff:
From f2979edb939d37e81d8144d18328aafcceb501c5 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 10 Jun 2026 21:07:09 +0900
Subject: [PATCH v2 1/2] Fix out-of-bounds write in RI fast-path batch on
 re-entry

The FK fast-path batching added in b7b27eb41a5 wrote the incoming row
into the batch array before checking whether the array was full:

    fpentry->batch[fpentry->batch_count] = ExecCopySlotHeapTuple(newslot);
    fpentry->batch_count++;
    if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
        ri_FastPathBatchFlush(fpentry, fk_rel, riinfo);

batch_count is reset to zero only at the end of ri_FastPathBatchFlush(),
so it remains at RI_FASTPATH_BATCH_SIZE throughout a full-batch flush.
A flush runs user-defined cast functions and equality operators; if that
user code performs DML on the same FK table, ri_FastPathBatchAdd()
re-enters with batch_count == RI_FASTPATH_BATCH_SIZE and writes one past
the end of the array, corrupting the adjacent batch_count field.  This
is reachable by an unprivileged table owner via an implicit cast with a
PL/pgSQL function and causes a SIGSEGV in assert-enabled builds.

Fix by bounds-checking the write into the batch array so a re-entrant
add can never write past the end, and by adding a "flushing" flag to
RI_FastPathEntry that routes re-entrant ri_FastPathBatchAdd() calls on
a busy entry to the per-row path (ri_FastPathCheck) instead of touching
the mid-flush batch array.  The flag is set around the probe in
ri_FastPathBatchFlush() and cleared in a PG_FINALLY, which also resets
batch_count, so the entry is left empty and reusable if a flush error
(including a reported FK violation) is caught by a savepoint.

Add regression tests for both the re-entrant flush and reuse of an entry
after a flush error caught by a savepoint.

Reported-by: Nikolay Samokhvalov <[email protected]>
Reviewed-by: Nikolay Samokhvalov <[email protected]>
Reviewed-by: Ayush Tiwari <[email protected]>
Discussion: https://postgr.es/m/CAM527d9exRCdWrhJOnAxk_vACg7sr_yPoaJp_+uCFY0qP8v=aw@mail.gmail.com
---
 src/backend/utils/adt/ri_triggers.c       | 70 +++++++++++++++++------
 src/test/regress/expected/foreign_key.out | 56 ++++++++++++++++++
 src/test/regress/sql/foreign_key.sql      | 46 +++++++++++++++
 3 files changed, 155 insertions(+), 17 deletions(-)

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index dc89c686394..6d0d4204886 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -249,6 +249,12 @@ typedef struct RI_FastPathEntry
 	 */
 	HeapTuple	batch[RI_FASTPATH_BATCH_SIZE];
 	int			batch_count;
+
+	/*
+	 * true while this entry's batch is being flushed; guards against
+	 * re-entrant ri_FastPathBatchAdd from user code run during the flush.
+	 */
+	bool		flushing;
 } RI_FastPathEntry;
 
 /*
@@ -2860,15 +2866,31 @@ ri_FastPathBatchAdd(RI_ConstraintInfo *riinfo,
 					Relation fk_rel, TupleTableSlot *newslot)
 {
 	RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
-	MemoryContext oldcxt;
 
-	oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
-	fpentry->batch[fpentry->batch_count] =
-		ExecCopySlotHeapTuple(newslot);
-	fpentry->batch_count++;
-	MemoryContextSwitchTo(oldcxt);
+	/*
+	 * If this entry is already being flushed, a cast function or an operator
+	 * invoked during the flush has re-entered with DML on the same FK.  Fall
+	 * back to the per-row path rather than touching the batch array, which is
+	 * mid-flush.
+	 */
+	if (fpentry->flushing)
+	{
+		ri_FastPathCheck(riinfo, fk_rel, newslot);
+		return;
+	}
+
+	/* Buffer the row if there is room; otherwise it is flushed below. */
+	if (fpentry->batch_count < RI_FASTPATH_BATCH_SIZE)
+	{
+		MemoryContext oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
+		fpentry->batch[fpentry->batch_count] =
+			ExecCopySlotHeapTuple(newslot);
+		fpentry->batch_count++;
+		MemoryContextSwitchTo(oldcxt);
+	}
 
-	if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
+	/* Flush as soon as the batch is full. */
+	if (fpentry->batch_count == RI_FASTPATH_BATCH_SIZE)
 		ri_FastPathBatchFlush(fpentry, fk_rel, riinfo);
 }
 
@@ -2944,13 +2966,30 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel,
 	}
 	Assert(riinfo->fpmeta);
 
-	/* Skip array overhead for single-row batches. */
-	if (riinfo->nkeys == 1 && fpentry->batch_count > 1)
-		violation_index = ri_FastPathFlushArray(fpentry, fk_slot, riinfo,
-												fk_rel, snapshot, scandesc);
-	else
-		violation_index = ri_FastPathFlushLoop(fpentry, fk_slot, riinfo,
-											   fk_rel, snapshot, scandesc);
+	/*
+	 * The probe runs user-defined cast and equality functions.  Set the
+	 * flushing flag around it so a re-entrant ri_FastPathBatchAdd on this
+	 * entry takes the per-row path, and clear it even on error so the entry
+	 * is reusable if the error is caught by a savepoint.
+	 */
+	Assert(!fpentry->flushing);
+	fpentry->flushing = true;
+	PG_TRY();
+	{
+		/* Skip array overhead for single-row batches. */
+		if (riinfo->nkeys == 1 && fpentry->batch_count > 1)
+			violation_index = ri_FastPathFlushArray(fpentry, fk_slot, riinfo,
+													fk_rel, snapshot, scandesc);
+		else
+			violation_index = ri_FastPathFlushLoop(fpentry, fk_slot, riinfo,
+												   fk_rel, snapshot, scandesc);
+	}
+	PG_FINALLY();
+	{
+		fpentry->flushing = false;
+		fpentry->batch_count = 0;
+	}
+	PG_END_TRY();
 
 	SetUserIdAndSecContext(saved_userid, saved_sec_context);
 	UnregisterSnapshot(snapshot);
@@ -2966,9 +3005,6 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel,
 
 	MemoryContextReset(fpentry->flush_cxt);
 	MemoryContextSwitchTo(oldcxt);
-
-	/* Reset. */
-	fpentry->batch_count = 0;
 }
 
 /*
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 8b3b268de0f..e08dff99f03 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3712,3 +3712,59 @@ INSERT INTO fp_pk_dup VALUES (1);
 CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
 INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
 DROP TABLE fp_fk_dup, fp_pk_dup;
+-- Re-entrant FK fast-path: DML on the same FK table from a cast function
+-- during a full-batch flush must not corrupt the batch array.
+CREATE TABLE fp_reentry_pk (id int PRIMARY KEY);
+INSERT INTO fp_reentry_pk VALUES (1), (2);
+CREATE TYPE fp_vch AS (v int);
+CREATE FUNCTION fp_vcast(fp_vch) RETURNS int LANGUAGE plpgsql AS $$
+BEGIN
+    IF $1.v = 1 THEN
+        INSERT INTO fp_reentry_fk VALUES (row(2)::fp_vch);
+    END IF;
+    RETURN $1.v;
+END$$;
+CREATE CAST (fp_vch AS int) WITH FUNCTION fp_vcast(fp_vch) AS IMPLICIT;
+CREATE TABLE fp_reentry_fk (a fp_vch
+    REFERENCES fp_reentry_pk (id));
+-- Fill exactly one batch so the flush fires; the cast re-enters with DML
+-- on the same FK and must take the per-row path.
+INSERT INTO fp_reentry_fk SELECT row(1)::fp_vch FROM generate_series(1, 64);
+SELECT a, count(*) FROM fp_reentry_fk GROUP BY a ORDER BY a;
+  a  | count 
+-----+-------
+ (1) |    64
+ (2) |    64
+(2 rows)
+
+DROP TABLE fp_reentry_fk, fp_reentry_pk;
+DROP CAST (fp_vch AS int);
+DROP FUNCTION fp_vcast(fp_vch);
+DROP TYPE fp_vch;
+-- Flush error caught by a savepoint must leave the entry empty and reusable.
+CREATE TABLE fp_reentry_pk2 (id int PRIMARY KEY);
+INSERT INTO fp_reentry_pk2 VALUES (1);
+CREATE TABLE fp_reentry_fk2 (a int REFERENCES fp_reentry_pk2 (id));
+DO $$
+BEGIN
+    -- A batch containing a violating row; the flush reports the violation.
+    BEGIN
+        INSERT INTO fp_reentry_fk2 SELECT CASE WHEN g = 32 THEN 999 ELSE 1 END
+            FROM generate_series(1, 64) g;
+    EXCEPTION WHEN foreign_key_violation THEN
+        RAISE NOTICE 'caught fk violation';
+    END;
+
+    -- Reuse the same FK with a full batch in the same transaction.  The
+    -- entry must be empty after the caught violation: no stale rows from the
+    -- rolled-back batch (in particular no 999), and no array overflow.
+    INSERT INTO fp_reentry_fk2 SELECT 1 FROM generate_series(1, 64);
+END$$;
+NOTICE:  caught fk violation
+SELECT count(*), max(a) FROM fp_reentry_fk2;  -- 64 rows, max 1
+ count | max 
+-------+-----
+    64 |   1
+(1 row)
+
+DROP TABLE fp_reentry_fk2, fp_reentry_pk2;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index 7eb86b188f0..87381194f41 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2680,3 +2680,49 @@ INSERT INTO fp_pk_dup VALUES (1);
 CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
 INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
 DROP TABLE fp_fk_dup, fp_pk_dup;
+
+-- Re-entrant FK fast-path: DML on the same FK table from a cast function
+-- during a full-batch flush must not corrupt the batch array.
+CREATE TABLE fp_reentry_pk (id int PRIMARY KEY);
+INSERT INTO fp_reentry_pk VALUES (1), (2);
+CREATE TYPE fp_vch AS (v int);
+CREATE FUNCTION fp_vcast(fp_vch) RETURNS int LANGUAGE plpgsql AS $$
+BEGIN
+    IF $1.v = 1 THEN
+        INSERT INTO fp_reentry_fk VALUES (row(2)::fp_vch);
+    END IF;
+    RETURN $1.v;
+END$$;
+CREATE CAST (fp_vch AS int) WITH FUNCTION fp_vcast(fp_vch) AS IMPLICIT;
+CREATE TABLE fp_reentry_fk (a fp_vch
+    REFERENCES fp_reentry_pk (id));
+-- Fill exactly one batch so the flush fires; the cast re-enters with DML
+-- on the same FK and must take the per-row path.
+INSERT INTO fp_reentry_fk SELECT row(1)::fp_vch FROM generate_series(1, 64);
+SELECT a, count(*) FROM fp_reentry_fk GROUP BY a ORDER BY a;
+DROP TABLE fp_reentry_fk, fp_reentry_pk;
+DROP CAST (fp_vch AS int);
+DROP FUNCTION fp_vcast(fp_vch);
+DROP TYPE fp_vch;
+
+-- Flush error caught by a savepoint must leave the entry empty and reusable.
+CREATE TABLE fp_reentry_pk2 (id int PRIMARY KEY);
+INSERT INTO fp_reentry_pk2 VALUES (1);
+CREATE TABLE fp_reentry_fk2 (a int REFERENCES fp_reentry_pk2 (id));
+DO $$
+BEGIN
+    -- A batch containing a violating row; the flush reports the violation.
+    BEGIN
+        INSERT INTO fp_reentry_fk2 SELECT CASE WHEN g = 32 THEN 999 ELSE 1 END
+            FROM generate_series(1, 64) g;
+    EXCEPTION WHEN foreign_key_violation THEN
+        RAISE NOTICE 'caught fk violation';
+    END;
+
+    -- Reuse the same FK with a full batch in the same transaction.  The
+    -- entry must be empty after the caught violation: no stale rows from the
+    -- rolled-back batch (in particular no 999), and no array overflow.
+    INSERT INTO fp_reentry_fk2 SELECT 1 FROM generate_series(1, 64);
+END$$;
+SELECT count(*), max(a) FROM fp_reentry_fk2;  -- 64 rows, max 1
+DROP TABLE fp_reentry_fk2, fp_reentry_pk2;
-- 
2.47.3



  [application/octet-stream] v2-0002-Confine-RI-fast-path-batching-to-the-top-transact.patch (11.3K, 3-v2-0002-Confine-RI-fast-path-batching-to-the-top-transact.patch)
  download | inline diff:
From dd87a2e45f2f2dbdccd4d65cbd90256b7f72e277 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 10 Jun 2026 21:11:10 +0900
Subject: [PATCH v2 2/2] Confine RI fast-path batching to the top transaction
 level

The FK fast-path batching added in b7b27eb41a5 buffers rows in a
transaction-lived cache (ri_fastpath_cache) keyed by constraint OID.
Running user-defined cast and equality functions during a batch flush,
together with the cache's lifetime and iteration, exposed two defects
reachable by an unprivileged table owner.

First, on subtransaction abort ri_FastPathSubXactCallback discarded the
entire cache.  An entry's batch holds rows buffered by the enclosing
transaction, not just the aborting subxact -- the cache is keyed by
constraint, so a single entry can mix rows from multiple subxact levels.
An internal subxact abort during after-trigger firing (e.g. a PL/pgSQL
BEGIN ... EXCEPTION block) therefore dropped buffered rows of the outer
transaction without running their FK checks, letting orphan rows commit
behind a constraint that still reported itself valid.  The discard also
left relations opened by the batch unclosed, producing "resource was not
closed" warnings.

Second, ri_FastPathEndBatch flushes by iterating the cache with
hash_seq_search.  If flush-time user code inserts into a different
fast-path FK table, a new entry is added to the cache mid-scan; it may
land in a bucket the scan has already passed and never be reached, and
ri_FastPathTeardown then destroys the cache without flushing it,
silently dropping that check.

Cleanly unwinding the cache on subxact abort would require tracking the
originating subxact of each buffered row, since rows from different
levels share an entry (the cache is keyed by constraint) and deferred
constraints cannot be flushed early at a subxact boundary.  Rather than
add that bookkeeping, confine batching to the top transaction level: in
RI_FKey_check, when GetCurrentTransactionNestLevel() > 1, use the
per-row fast path (ri_FastPathCheck) instead of buffering.  Rows checked
inside a subtransaction are then verified immediately and roll back
cleanly with their subtransaction, and the cache only ever holds
top-level rows.  With the cache confined to the top level, a
subtransaction abort has nothing of its own to discard, so
ri_FastPathSubXactCallback is removed along with its registration.

For the second defect, add a cache-wide flag (ri_fastpath_flushing) set
while ri_FastPathEndBatch iterates the cache.  A re-entrant FK check
arriving while the flag is set takes the per-row path rather than adding
an entry to the cache being scanned, so no entry can be missed and torn
down unflushed.  The flag is cleared in a PG_FINALLY so a flush that
throws (a reported violation or an error from user code) does not leave
it stuck.  As defensive insurance it is also cleared in
ri_FastPathXactCallback() at transaction end.

The per-row fast path still bypasses SPI and stays well ahead of the
pre-19 SPI-based check.  A fuller fix that preserves batching across
subtransactions -- whether by tracking the originating subxact of each
buffered row or by per-subxact cache stacks merged into the parent on
commit -- is left for a future release.

The subtransaction-abort case is covered by a new regression test.  The
mid-scan cross-table case depends on hash bucket placement and so is not
reliably reproducible in a portable test, but the flag prevents it by
construction.

Reported-by: Nikolay Samokhvalov <[email protected]>
Reviewed-by: Nikolay Samokhvalov <[email protected]>
Reviewed-by: Ayush Tiwari <[email protected]>
Discussion: https://postgr.es/m/CAM527d9exRCdWrhJOnAxk_vACg7sr_yPoaJp_+uCFY0qP8v=aw@mail.gmail.com
---
 src/backend/utils/adt/ri_triggers.c       | 82 ++++++++++++++++-------
 src/test/regress/expected/foreign_key.out | 24 +++++++
 src/test/regress/sql/foreign_key.sql      | 23 +++++++
 3 files changed, 103 insertions(+), 26 deletions(-)

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 6d0d4204886..6ad521fba34 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -267,6 +267,7 @@ static dclist_head ri_constraint_cache_valid_list;
 
 static HTAB *ri_fastpath_cache = NULL;
 static bool ri_fastpath_callback_registered = false;
+static bool ri_fastpath_flushing = false;
 
 /*
  * Local function prototypes
@@ -469,14 +470,31 @@ RI_FKey_check(TriggerData *trigdata)
 	 */
 	if (ri_fastpath_is_applicable(riinfo))
 	{
-		if (AfterTriggerIsActive())
+		if (AfterTriggerIsActive() &&
+			GetCurrentTransactionNestLevel() == 1 &&
+			!ri_fastpath_flushing)
 		{
 			/* Batched path: buffer and probe in groups */
 			ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
 		}
 		else
 		{
-			/* ALTER TABLE validation: per-row, no cache */
+			/*
+			 * Per-row path, used when batching is not safe or not
+			 * applicable:
+			 *
+			 * - ALTER TABLE validation, where no after-trigger firing is
+			 *   active;
+			 *
+			 * - any FK check inside a subtransaction, since the batch cache
+			 *   is confined to the top transaction level (it cannot be
+			 *   cleanly unwound on subxact abort);
+			 *
+			 * - a re-entrant check from user cast/operator code running
+			 *   during a batch flush, since adding a cache entry while
+			 *   ri_FastPathEndBatch is iterating the cache could leave it
+			 *   unflushed.
+			 */
 			ri_FastPathCheck(riinfo, fk_rel, newslot);
 		}
 		return PointerGetDatum(NULL);
@@ -4174,19 +4192,41 @@ ri_FastPathEndBatch(void *arg)
 	if (ri_fastpath_cache == NULL)
 		return;
 
-	/* Flush any partial batches -- can throw ERROR */
-	hash_seq_init(&status, ri_fastpath_cache);
-	while ((entry = hash_seq_search(&status)) != NULL)
+	/*
+	 * Set a flag for the duration of the scan so that any FK check triggered
+	 * by user cast or operator code during a flush takes the per-row path
+	 * instead of adding a new entry to the cache we are iterating.  A new
+	 * entry could land in an already-scanned bucket and then be torn down
+	 * unflushed below.
+	 *
+	 * The flush can throw ERROR (a reported constraint violation, or an error
+	 * from the user code it runs).  In that case ri_FastPathTeardown below is
+	 * skipped; the ResourceOwner and the transaction-end callback handle
+	 * resource cleanup on the abort path.  The PG_FINALLY only resets the
+	 * flag and deliberately does not attempt teardown.
+	 */
+	Assert(!ri_fastpath_flushing);
+	ri_fastpath_flushing = true;
+	PG_TRY();
 	{
-		if (entry->batch_count > 0)
+		hash_seq_init(&status, ri_fastpath_cache);
+		while ((entry = hash_seq_search(&status)) != NULL)
 		{
-			Relation	fk_rel = table_open(entry->fk_relid, AccessShareLock);
-			RI_ConstraintInfo *riinfo = ri_LoadConstraintInfo(entry->conoid);
+			if (entry->batch_count > 0)
+			{
+				Relation	fk_rel = table_open(entry->fk_relid, AccessShareLock);
+				RI_ConstraintInfo *riinfo = ri_LoadConstraintInfo(entry->conoid);
 
-			ri_FastPathBatchFlush(entry, fk_rel, riinfo);
-			table_close(fk_rel, NoLock);
+				ri_FastPathBatchFlush(entry, fk_rel, riinfo);
+				table_close(fk_rel, NoLock);
+			}
 		}
 	}
+	PG_FINALLY();
+	{
+		ri_fastpath_flushing = false;
+	}
+	PG_END_TRY();
 
 	ri_FastPathTeardown();
 }
@@ -4238,22 +4278,13 @@ ri_FastPathXactCallback(XactEvent event, void *arg)
 	 */
 	ri_fastpath_cache = NULL;
 	ri_fastpath_callback_registered = false;
-}
 
-static void
-ri_FastPathSubXactCallback(SubXactEvent event, SubTransactionId mySubid,
-						   SubTransactionId parentSubid, void *arg)
-{
-	if (event == SUBXACT_EVENT_ABORT_SUB)
-	{
-		/*
-		 * ResourceOwner already released relations.  NULL the static pointers
-		 * so the still-registered batch callback becomes a no-op for the rest
-		 * of this transaction.
-		 */
-		ri_fastpath_cache = NULL;
-		ri_fastpath_callback_registered = false;
-	}
+	/*
+	 * Also clear the in-flush flag.  ri_FastPathEndBatch() already clears it
+	 * via PG_FINALLY, so this is just defensive: it keeps a stale flag from
+	 * surviving into the next transaction should any future path leave it set.
+	 */
+	ri_fastpath_flushing = false;
 }
 
 /*
@@ -4280,7 +4311,6 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
 		if (!ri_fastpath_xact_callback_registered)
 		{
 			RegisterXactCallback(ri_FastPathXactCallback, NULL);
-			RegisterSubXactCallback(ri_FastPathSubXactCallback, NULL);
 			ri_fastpath_xact_callback_registered = true;
 		}
 
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index e08dff99f03..e1563144d4c 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3768,3 +3768,27 @@ SELECT count(*), max(a) FROM fp_reentry_fk2;  -- 64 rows, max 1
 (1 row)
 
 DROP TABLE fp_reentry_fk2, fp_reentry_pk2;
+-- Subtransaction abort during after-trigger firing must not drop FK checks
+-- for rows buffered earlier in the same statement.  Batching is confined to
+-- the top transaction level and the buffered batch is no longer discarded on
+-- subxact abort, so the violating rows are detected.
+CREATE TABLE fp_subxact_pk (id int PRIMARY KEY);
+INSERT INTO fp_subxact_pk SELECT g FROM generate_series(1, 10) g;
+CREATE TABLE fp_subxact_fk (a int, tag text);
+ALTER TABLE fp_subxact_fk ADD CONSTRAINT fp_subxact_fk_fkey
+    FOREIGN KEY (a) REFERENCES fp_subxact_pk (id);
+CREATE FUNCTION fp_abort_subxact() RETURNS trigger LANGUAGE plpgsql AS $$
+BEGIN
+    IF NEW.tag = 'boom' THEN
+        BEGIN PERFORM 1/0; EXCEPTION WHEN division_by_zero THEN NULL; END;
+    END IF;
+    RETURN NEW;
+END$$;
+CREATE TRIGGER fp_subxact_trg AFTER INSERT ON fp_subxact_fk
+    FOR EACH ROW EXECUTE FUNCTION fp_abort_subxact();
+INSERT INTO fp_subxact_fk VALUES (999, 'bad'), (0, 'boom'), (1, 'ok');
+ERROR:  insert or update on table "fp_subxact_fk" violates foreign key constraint "fp_subxact_fk_fkey"
+DETAIL:  Key (a)=(999) is not present in table "fp_subxact_pk".
+DROP TRIGGER fp_subxact_trg ON fp_subxact_fk;
+DROP FUNCTION fp_abort_subxact();
+DROP TABLE fp_subxact_fk, fp_subxact_pk;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index 87381194f41..abeb85965b9 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2726,3 +2726,26 @@ BEGIN
 END$$;
 SELECT count(*), max(a) FROM fp_reentry_fk2;  -- 64 rows, max 1
 DROP TABLE fp_reentry_fk2, fp_reentry_pk2;
+
+-- Subtransaction abort during after-trigger firing must not drop FK checks
+-- for rows buffered earlier in the same statement.  Batching is confined to
+-- the top transaction level and the buffered batch is no longer discarded on
+-- subxact abort, so the violating rows are detected.
+CREATE TABLE fp_subxact_pk (id int PRIMARY KEY);
+INSERT INTO fp_subxact_pk SELECT g FROM generate_series(1, 10) g;
+CREATE TABLE fp_subxact_fk (a int, tag text);
+ALTER TABLE fp_subxact_fk ADD CONSTRAINT fp_subxact_fk_fkey
+    FOREIGN KEY (a) REFERENCES fp_subxact_pk (id);
+CREATE FUNCTION fp_abort_subxact() RETURNS trigger LANGUAGE plpgsql AS $$
+BEGIN
+    IF NEW.tag = 'boom' THEN
+        BEGIN PERFORM 1/0; EXCEPTION WHEN division_by_zero THEN NULL; END;
+    END IF;
+    RETURN NEW;
+END$$;
+CREATE TRIGGER fp_subxact_trg AFTER INSERT ON fp_subxact_fk
+    FOR EACH ROW EXECUTE FUNCTION fp_abort_subxact();
+INSERT INTO fp_subxact_fk VALUES (999, 'bad'), (0, 'boom'), (1, 'ok');
+DROP TRIGGER fp_subxact_trg ON fp_subxact_fk;
+DROP FUNCTION fp_abort_subxact();
+DROP TABLE fp_subxact_fk, fp_subxact_pk;
-- 
2.47.3



^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: PG19 FK fast path: OOB write and missed FK checks during batched
@ 2026-06-10 12:48  Ayush Tiwari <[email protected]>
  parent: Amit Langote <[email protected]>
  1 sibling, 0 replies; 14+ messages in thread

From: Ayush Tiwari @ 2026-06-10 12:48 UTC (permalink / raw)
  To: Amit Langote <[email protected]>; +Cc: Nikolay Samokhvalov <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>; Kirk Wolak <[email protected]>

Hi,

On Wed, 10 Jun 2026 at 17:47, Amit Langote <[email protected]> wrote:

> Hi Ayush,
>
> Thanks for the review.
>
> On Wed, Jun 10, 2026 at 7:09 PM Ayush Tiwari
> <[email protected]> wrote:
> > On Wed, 10 Jun 2026 at 14:02, Amit Langote <[email protected]>
> wrote
> >> Thanks for checking.  I will review them a bit more closely before
> >> committing by Friday.  Other reviews are welcome.
> >
> > Thanks for the patch!
> >
> > I read through v1-0001 and v1-0002 and tried them locally. I had a
> couple of
> > things I wanted to ask about.
> >
> > 1. The per-entry "flushing" flag and test coverage.  If I'm reading the
> two
> > patches together correctly, with both applied the 64-row re-entry test
> in 0001
> > reaches the flush through ri_FastPathEndBatch(), where 0002's cache-wide
> > ri_fastpath_flushing guard already routes the re-entrant check to the
> per-row
> > path before it gets back into ri_FastPathBatchAdd().  Does that mean the
> > per-entry flag from 0001 isn't really exercised by that test once 0002
> is in?
> > As far as I can tell you'd need the flush to fire from
> ri_FastPathBatchAdd()
> > itself (a 65th row) to reach it.  I tried a 65-row variant (same FK,
> re-entrant
> > DML from the cast during the full-batch flush), including a case where
> the
> > re-entrant row was an orphan, and it seemed to do the right thing; the
> > per-row fallback still raised the violation.  Would it be worth
> switching the
> > test to 65 rows, or adding that variant, so the per-entry guard is
> covered too?
> > Or am I missing a path where the committed test already hits it?
>
> You're right. With 0002 applied, the 64-row test reaches the flush
> through ri_FastPathEndBatch(), where the cache-wide
> ri_fastpath_flushing guard catches the re-entry before it returns to
> ri_FastPathBatchAdd(), so the per-entry flag is no longer exercised by
> that test. To hit the per-entry flag the flush has to fire from
> ri_FastPathBatchAdd() itself, which the 64-row case no longer does
> once the add and flush are reordered.
>
> Rather than bump the test to 65 rows, I'd prefer to keep the flush
> firing from ri_FastPathBatchAdd() at 64 by not reordering the add and
> flush, and prevent the OOB write by bounds-checking the write instead,
> as done in the attached updated 0001. A re-entrant add then can't
> overrun the array regardless of the flag, the per-entry flushing guard
> still routes the re-entry to the per-row path, and a 64-row statement
> flushes from ri_FastPathBatchAdd() on the 64th row, so the existing
> test exercises the per-entry guard.
>

Makes sense, it is better.

> 2. Resetting ri_fastpath_flushing.  I noticed it's cleared only in the
> > PG_FINALLY of ri_FastPathEndBatch(), which does seem to cover the cases
> I could
> > think of.  Since ri_FastPathXactCallback already NULLs ri_fastpath_cache
> and
> > clears ri_fastpath_callback_registered at transaction end, I wondered
> whether
> > it might be worth clearing ri_fastpath_flushing there too, just as cheap
> > insurance against some future path that leaves it set across transactions
> > though maybe that's unnecessary given the PG_FINALLY.
>
> Agreed, it's cheap and matches the existing resets there, so I've
> added it to ri_FastPathXactCallback() in v2-0002.
>
> > Other than the above queries, the patch looks good to me.
>
> Updated patches attached.
>

Thanks for the updated patches!

Both patches, lgtm.

Regards,
Ayush


^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: PG19 FK fast path: OOB write and missed FK checks during batched
@ 2026-06-11 08:18  Junwang Zhao <[email protected]>
  parent: Amit Langote <[email protected]>
  1 sibling, 1 reply; 14+ messages in thread

From: Junwang Zhao @ 2026-06-11 08:18 UTC (permalink / raw)
  To: Amit Langote <[email protected]>; +Cc: Ayush Tiwari <[email protected]>; Nikolay Samokhvalov <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>; Kirk Wolak <[email protected]>

Hi Amit,

On Wed, Jun 10, 2026 at 8:17 PM Amit Langote <[email protected]> wrote:
>
> Hi Ayush,
>
> Thanks for the review.
>
> On Wed, Jun 10, 2026 at 7:09 PM Ayush Tiwari
> <[email protected]> wrote:
> > On Wed, 10 Jun 2026 at 14:02, Amit Langote <[email protected]> wrote
> >> Thanks for checking.  I will review them a bit more closely before
> >> committing by Friday.  Other reviews are welcome.
> >
> > Thanks for the patch!
> >
> > I read through v1-0001 and v1-0002 and tried them locally. I had a couple of
> > things I wanted to ask about.
> >
> > 1. The per-entry "flushing" flag and test coverage.  If I'm reading the two
> > patches together correctly, with both applied the 64-row re-entry test in 0001
> > reaches the flush through ri_FastPathEndBatch(), where 0002's cache-wide
> > ri_fastpath_flushing guard already routes the re-entrant check to the per-row
> > path before it gets back into ri_FastPathBatchAdd().  Does that mean the
> > per-entry flag from 0001 isn't really exercised by that test once 0002 is in?
> > As far as I can tell you'd need the flush to fire from ri_FastPathBatchAdd()
> > itself (a 65th row) to reach it.  I tried a 65-row variant (same FK, re-entrant
> > DML from the cast during the full-batch flush), including a case where the
> > re-entrant row was an orphan, and it seemed to do the right thing; the
> > per-row fallback still raised the violation.  Would it be worth switching the
> > test to 65 rows, or adding that variant, so the per-entry guard is covered too?
> > Or am I missing a path where the committed test already hits it?
>
> You're right. With 0002 applied, the 64-row test reaches the flush
> through ri_FastPathEndBatch(), where the cache-wide
> ri_fastpath_flushing guard catches the re-entry before it returns to
> ri_FastPathBatchAdd(), so the per-entry flag is no longer exercised by
> that test. To hit the per-entry flag the flush has to fire from
> ri_FastPathBatchAdd() itself, which the 64-row case no longer does
> once the add and flush are reordered.
>
> Rather than bump the test to 65 rows, I'd prefer to keep the flush
> firing from ri_FastPathBatchAdd() at 64 by not reordering the add and
> flush, and prevent the OOB write by bounds-checking the write instead,
> as done in the attached updated 0001. A re-entrant add then can't
> overrun the array regardless of the flag, the per-entry flushing guard
> still routes the re-entry to the per-row path, and a 64-row statement
> flushes from ri_FastPathBatchAdd() on the 64th row, so the existing
> test exercises the per-entry guard.
>
> > 2. Resetting ri_fastpath_flushing.  I noticed it's cleared only in the
> > PG_FINALLY of ri_FastPathEndBatch(), which does seem to cover the cases I could
> > think of.  Since ri_FastPathXactCallback already NULLs ri_fastpath_cache and
> > clears ri_fastpath_callback_registered at transaction end, I wondered whether
> > it might be worth clearing ri_fastpath_flushing there too, just as cheap
> > insurance against some future path that leaves it set across transactions
> > though maybe that's unnecessary given the PG_FINALLY.
>
> Agreed, it's cheap and matches the existing resets there, so I've
> added it to ri_FastPathXactCallback() in v2-0002.
>
> > Other than the above queries, the patch looks good to me.
>
> Updated patches attached.

I only reviewed and applied patch 0001 on my local machine, and it
successfully fixed the crash.

One minor comment:

+ if (fpentry->flushing)
+ {
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
+ return;
+ }

Would it be worth wrapping the condition with unlikely()? It seems
this branch is expected to be false in most cases, not a strong
opinion though.

>
> --
> Thanks, Amit Langote



-- 
Regards
Junwang Zhao






^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: PG19 FK fast path: OOB write and missed FK checks during batched
@ 2026-06-11 09:05  Amit Langote <[email protected]>
  parent: Junwang Zhao <[email protected]>
  0 siblings, 1 reply; 14+ messages in thread

From: Amit Langote @ 2026-06-11 09:05 UTC (permalink / raw)
  To: Junwang Zhao <[email protected]>; +Cc: Ayush Tiwari <[email protected]>; Nikolay Samokhvalov <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>; Kirk Wolak <[email protected]>

On Thu, Jun 11, 2026 at 5:18 PM Junwang Zhao <[email protected]> wrote:
> I only reviewed and applied patch 0001 on my local machine, and it
> successfully fixed the crash.
>
> One minor comment:
>
> + if (fpentry->flushing)
> + {
> + ri_FastPathCheck(riinfo, fk_rel, newslot);
> + return;
> + }
>
> Would it be worth wrapping the condition with unlikely()? It seems
> this branch is expected to be false in most cases, not a strong
> opinion though.

Good idea.  Will do.

Are you planning to look at 0002?

-- 
Thanks, Amit Langote





^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: PG19 FK fast path: OOB write and missed FK checks during batched
@ 2026-06-11 09:50  Junwang Zhao <[email protected]>
  parent: Amit Langote <[email protected]>
  0 siblings, 1 reply; 14+ messages in thread

From: Junwang Zhao @ 2026-06-11 09:50 UTC (permalink / raw)
  To: Amit Langote <[email protected]>; +Cc: Ayush Tiwari <[email protected]>; Nikolay Samokhvalov <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>; Kirk Wolak <[email protected]>

Hi Amit,

On Thu, Jun 11, 2026 at 5:05 PM Amit Langote <[email protected]> wrote:
>
> On Thu, Jun 11, 2026 at 5:18 PM Junwang Zhao <[email protected]> wrote:
> > I only reviewed and applied patch 0001 on my local machine, and it
> > successfully fixed the crash.
> >
> > One minor comment:
> >
> > + if (fpentry->flushing)
> > + {
> > + ri_FastPathCheck(riinfo, fk_rel, newslot);
> > + return;
> > + }
> >
> > Would it be worth wrapping the condition with unlikely()? It seems
> > this branch is expected to be false in most cases, not a strong
> > opinion though.
>
> Good idea.  Will do.
>
> Are you planning to look at 0002?

I just applied 0002 and ran the regression successfully.

I have one trivial comment, subXact abort doesn't NULL the
ri_fastpath_cache, so I think the following comment of
RI_FastPathEntry should be polished accordingly by removing the
`SubXactCallback`.

* ri_FastPathEndBatch(); on abort, ResourceOwner releases the cached
* relations and the XactCallback/SubXactCallback NULL the static cache pointer
* to prevent any subsequent access.

>
> --
> Thanks, Amit Langote


-- 
Regards
Junwang Zhao





^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: PG19 FK fast path: OOB write and missed FK checks during batched
@ 2026-06-11 10:47  Amit Langote <[email protected]>
  parent: Junwang Zhao <[email protected]>
  0 siblings, 1 reply; 14+ messages in thread

From: Amit Langote @ 2026-06-11 10:47 UTC (permalink / raw)
  To: Junwang Zhao <[email protected]>; +Cc: Ayush Tiwari <[email protected]>; Nikolay Samokhvalov <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>; Kirk Wolak <[email protected]>

On Thu, Jun 11, 2026 at 6:51 PM Junwang Zhao <[email protected]> wrote:
> On Thu, Jun 11, 2026 at 5:05 PM Amit Langote <[email protected]> wrote:
> >
> > On Thu, Jun 11, 2026 at 5:18 PM Junwang Zhao <[email protected]> wrote:
> > > I only reviewed and applied patch 0001 on my local machine, and it
> > > successfully fixed the crash.
> > >
> > > One minor comment:
> > >
> > > + if (fpentry->flushing)
> > > + {
> > > + ri_FastPathCheck(riinfo, fk_rel, newslot);
> > > + return;
> > > + }
> > >
> > > Would it be worth wrapping the condition with unlikely()? It seems
> > > this branch is expected to be false in most cases, not a strong
> > > opinion though.
> >
> > Good idea.  Will do.
> >
> > Are you planning to look at 0002?
>
> I just applied 0002 and ran the regression successfully.
>
> I have one trivial comment, subXact abort doesn't NULL the
> ri_fastpath_cache, so I think the following comment of
> RI_FastPathEntry should be polished accordingly by removing the
> `SubXactCallback`.
>
> * ri_FastPathEndBatch(); on abort, ResourceOwner releases the cached
> * relations and the XactCallback/SubXactCallback NULL the static cache pointer
> * to prevent any subsequent access.

Thanks for the review.  Yes, I missed that.

I've updated the patches to address your comments and did some other polishing.


--
Thanks, Amit Langote


Attachments:

  [application/x-patch] v3-0002-Confine-RI-fast-path-batching-to-the-top-transact.patch (11.9K, 2-v3-0002-Confine-RI-fast-path-batching-to-the-top-transact.patch)
  download | inline diff:
From 00e3f4ff89b3c533899eb996be4e98e3ee82c63c Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 10 Jun 2026 21:11:10 +0900
Subject: [PATCH v3 2/2] Confine RI fast-path batching to the top transaction
 level

The FK fast-path batching added in b7b27eb41a5 buffers rows in a
transaction-lived cache (ri_fastpath_cache) keyed by constraint OID.
Running user-defined cast and equality functions during a batch flush,
together with the cache's lifetime and iteration, exposed two defects
reachable by an unprivileged table owner.

First, on subtransaction abort ri_FastPathSubXactCallback discarded the
entire cache.  An entry's batch holds rows buffered by the enclosing
transaction, not just the aborting subxact -- the cache is keyed by
constraint, so a single entry can mix rows from multiple subxact levels.
An internal subxact abort during after-trigger firing (e.g. a PL/pgSQL
BEGIN ... EXCEPTION block) therefore dropped buffered rows of the outer
transaction without running their FK checks, letting orphan rows commit
behind a constraint that still reported itself valid.  The discard also
left relations opened by the batch unclosed, producing "resource was not
closed" warnings.

Second, ri_FastPathEndBatch flushes by iterating the cache with
hash_seq_search.  If flush-time user code inserts into a different
fast-path FK table, a new entry is added to the cache mid-scan; it may
land in a bucket the scan has already passed and never be reached, and
ri_FastPathTeardown then destroys the cache without flushing it,
silently dropping that check.

Cleanly unwinding the cache on subxact abort would require tracking the
originating subxact of each buffered row, since rows from different
levels share an entry (the cache is keyed by constraint) and deferred
constraints cannot be flushed early at a subxact boundary.  Rather than
add that bookkeeping, confine batching to the top transaction level: in
RI_FKey_check, when GetCurrentTransactionNestLevel() > 1, use the
per-row fast path (ri_FastPathCheck) instead of buffering.  Rows checked
inside a subtransaction are then verified immediately and roll back
cleanly with their subtransaction, and the cache only ever holds
top-level rows.  With the cache confined to the top level, a
subtransaction abort has nothing of its own to discard, so
ri_FastPathSubXactCallback is removed along with its registration.

For the second defect, add a cache-wide flag (ri_fastpath_flushing) set
while ri_FastPathEndBatch iterates the cache.  A re-entrant FK check
arriving while the flag is set takes the per-row path rather than adding
an entry to the cache being scanned, so no entry can be missed and torn
down unflushed.  The flag is cleared in a PG_FINALLY so a flush that
throws (a reported violation or an error from user code) does not leave
it stuck.  As defensive insurance it is also cleared in
ri_FastPathXactCallback() at transaction end.

The per-row fast path still bypasses SPI and stays well ahead of the
pre-19 SPI-based check.  A fuller fix that preserves batching across
subtransactions -- whether by tracking the originating subxact of each
buffered row or by per-subxact cache stacks merged into the parent on
commit -- is left for a future release.

The subtransaction-abort case is covered by a new regression test.  The
mid-scan cross-table case depends on hash bucket placement and so is not
reliably reproducible in a portable test, but the flag prevents it by
construction.

Reported-by: Nikolay Samokhvalov <[email protected]>
Reviewed-by: Nikolay Samokhvalov <[email protected]>
Reviewed-by: Ayush Tiwari <[email protected]>
Discussion: https://postgr.es/m/CAM527d9exRCdWrhJOnAxk_vACg7sr_yPoaJp_+uCFY0qP8v=aw@mail.gmail.com
---
 src/backend/utils/adt/ri_triggers.c       | 86 +++++++++++++++--------
 src/test/regress/expected/foreign_key.out | 24 +++++++
 src/test/regress/sql/foreign_key.sql      | 23 ++++++
 3 files changed, 105 insertions(+), 28 deletions(-)

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 917a544c32b..06e7728c45d 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -228,8 +228,8 @@ typedef struct RI_CompareHashEntry
  * relations are held open with locks for the transaction duration, preventing
  * relcache invalidation.  The entry itself is torn down at batch end by
  * ri_FastPathEndBatch(); on abort, ResourceOwner releases the cached
- * relations and the XactCallback/SubXactCallback NULL the static cache pointer
- * to prevent any subsequent access.
+ * relations and the XactCallback NULLs the static cache pointer to prevent
+ * any subsequent access.
  */
 typedef struct RI_FastPathEntry
 {
@@ -267,6 +267,7 @@ static dclist_head ri_constraint_cache_valid_list;
 
 static HTAB *ri_fastpath_cache = NULL;
 static bool ri_fastpath_callback_registered = false;
+static bool ri_fastpath_flushing = false;
 
 /*
  * Local function prototypes
@@ -469,14 +470,30 @@ RI_FKey_check(TriggerData *trigdata)
 	 */
 	if (ri_fastpath_is_applicable(riinfo))
 	{
-		if (AfterTriggerIsActive())
+		if (AfterTriggerIsActive() &&
+			GetCurrentTransactionNestLevel() == 1 &&
+			!ri_fastpath_flushing)
 		{
 			/* Batched path: buffer and probe in groups */
 			ri_FastPathBatchAdd(riinfo, fk_rel, newslot);
 		}
 		else
 		{
-			/* ALTER TABLE validation: per-row, no cache */
+			/*
+			 * Per-row path, used when batching is not safe or not applicable:
+			 *
+			 * - ALTER TABLE validation, where no after-trigger firing is
+			 * active;
+			 *
+			 * - any FK check inside a subtransaction, since the batch cache
+			 * is confined to the top transaction level (it cannot be cleanly
+			 * unwound on subxact abort);
+			 *
+			 * - a re-entrant check from user cast/operator code running
+			 * during a batch flush, since adding a cache entry while
+			 * ri_FastPathEndBatch is iterating the cache could leave it
+			 * unflushed.
+			 */
 			ri_FastPathCheck(riinfo, fk_rel, newslot);
 		}
 		return PointerGetDatum(NULL);
@@ -4181,19 +4198,41 @@ ri_FastPathEndBatch(void *arg)
 	if (ri_fastpath_cache == NULL)
 		return;
 
-	/* Flush any partial batches -- can throw ERROR */
-	hash_seq_init(&status, ri_fastpath_cache);
-	while ((entry = hash_seq_search(&status)) != NULL)
+	/*
+	 * Set a flag for the duration of the scan so that any FK check triggered
+	 * by user cast or operator code during a flush takes the per-row path
+	 * instead of adding a new entry to the cache we are iterating.  A new
+	 * entry could land in an already-scanned bucket and then be torn down
+	 * unflushed below.
+	 *
+	 * The flush can throw ERROR (a reported constraint violation, or an error
+	 * from the user code it runs).  In that case ri_FastPathTeardown below is
+	 * skipped; the ResourceOwner and the transaction-end callback handle
+	 * resource cleanup on the abort path.  The PG_FINALLY only resets the
+	 * flag and deliberately does not attempt teardown.
+	 */
+	Assert(!ri_fastpath_flushing);
+	ri_fastpath_flushing = true;
+	PG_TRY();
 	{
-		if (entry->batch_count > 0)
+		hash_seq_init(&status, ri_fastpath_cache);
+		while ((entry = hash_seq_search(&status)) != NULL)
 		{
-			Relation	fk_rel = table_open(entry->fk_relid, AccessShareLock);
-			RI_ConstraintInfo *riinfo = ri_LoadConstraintInfo(entry->conoid);
+			if (entry->batch_count > 0)
+			{
+				Relation	fk_rel = table_open(entry->fk_relid, AccessShareLock);
+				RI_ConstraintInfo *riinfo = ri_LoadConstraintInfo(entry->conoid);
 
-			ri_FastPathBatchFlush(entry, fk_rel, riinfo);
-			table_close(fk_rel, NoLock);
+				ri_FastPathBatchFlush(entry, fk_rel, riinfo);
+				table_close(fk_rel, NoLock);
+			}
 		}
 	}
+	PG_FINALLY();
+	{
+		ri_fastpath_flushing = false;
+	}
+	PG_END_TRY();
 
 	ri_FastPathTeardown();
 }
@@ -4245,22 +4284,14 @@ ri_FastPathXactCallback(XactEvent event, void *arg)
 	 */
 	ri_fastpath_cache = NULL;
 	ri_fastpath_callback_registered = false;
-}
 
-static void
-ri_FastPathSubXactCallback(SubXactEvent event, SubTransactionId mySubid,
-						   SubTransactionId parentSubid, void *arg)
-{
-	if (event == SUBXACT_EVENT_ABORT_SUB)
-	{
-		/*
-		 * ResourceOwner already released relations.  NULL the static pointers
-		 * so the still-registered batch callback becomes a no-op for the rest
-		 * of this transaction.
-		 */
-		ri_fastpath_cache = NULL;
-		ri_fastpath_callback_registered = false;
-	}
+	/*
+	 * Also clear the in-flush flag.  ri_FastPathEndBatch() already clears it
+	 * via PG_FINALLY, so this is just defensive: it keeps a stale flag from
+	 * surviving into the next transaction should any future path leave it
+	 * set.
+	 */
+	ri_fastpath_flushing = false;
 }
 
 /*
@@ -4287,7 +4318,6 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
 		if (!ri_fastpath_xact_callback_registered)
 		{
 			RegisterXactCallback(ri_FastPathXactCallback, NULL);
-			RegisterSubXactCallback(ri_FastPathSubXactCallback, NULL);
 			ri_fastpath_xact_callback_registered = true;
 		}
 
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index e08dff99f03..e1563144d4c 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3768,3 +3768,27 @@ SELECT count(*), max(a) FROM fp_reentry_fk2;  -- 64 rows, max 1
 (1 row)
 
 DROP TABLE fp_reentry_fk2, fp_reentry_pk2;
+-- Subtransaction abort during after-trigger firing must not drop FK checks
+-- for rows buffered earlier in the same statement.  Batching is confined to
+-- the top transaction level and the buffered batch is no longer discarded on
+-- subxact abort, so the violating rows are detected.
+CREATE TABLE fp_subxact_pk (id int PRIMARY KEY);
+INSERT INTO fp_subxact_pk SELECT g FROM generate_series(1, 10) g;
+CREATE TABLE fp_subxact_fk (a int, tag text);
+ALTER TABLE fp_subxact_fk ADD CONSTRAINT fp_subxact_fk_fkey
+    FOREIGN KEY (a) REFERENCES fp_subxact_pk (id);
+CREATE FUNCTION fp_abort_subxact() RETURNS trigger LANGUAGE plpgsql AS $$
+BEGIN
+    IF NEW.tag = 'boom' THEN
+        BEGIN PERFORM 1/0; EXCEPTION WHEN division_by_zero THEN NULL; END;
+    END IF;
+    RETURN NEW;
+END$$;
+CREATE TRIGGER fp_subxact_trg AFTER INSERT ON fp_subxact_fk
+    FOR EACH ROW EXECUTE FUNCTION fp_abort_subxact();
+INSERT INTO fp_subxact_fk VALUES (999, 'bad'), (0, 'boom'), (1, 'ok');
+ERROR:  insert or update on table "fp_subxact_fk" violates foreign key constraint "fp_subxact_fk_fkey"
+DETAIL:  Key (a)=(999) is not present in table "fp_subxact_pk".
+DROP TRIGGER fp_subxact_trg ON fp_subxact_fk;
+DROP FUNCTION fp_abort_subxact();
+DROP TABLE fp_subxact_fk, fp_subxact_pk;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index 87381194f41..abeb85965b9 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2726,3 +2726,26 @@ BEGIN
 END$$;
 SELECT count(*), max(a) FROM fp_reentry_fk2;  -- 64 rows, max 1
 DROP TABLE fp_reentry_fk2, fp_reentry_pk2;
+
+-- Subtransaction abort during after-trigger firing must not drop FK checks
+-- for rows buffered earlier in the same statement.  Batching is confined to
+-- the top transaction level and the buffered batch is no longer discarded on
+-- subxact abort, so the violating rows are detected.
+CREATE TABLE fp_subxact_pk (id int PRIMARY KEY);
+INSERT INTO fp_subxact_pk SELECT g FROM generate_series(1, 10) g;
+CREATE TABLE fp_subxact_fk (a int, tag text);
+ALTER TABLE fp_subxact_fk ADD CONSTRAINT fp_subxact_fk_fkey
+    FOREIGN KEY (a) REFERENCES fp_subxact_pk (id);
+CREATE FUNCTION fp_abort_subxact() RETURNS trigger LANGUAGE plpgsql AS $$
+BEGIN
+    IF NEW.tag = 'boom' THEN
+        BEGIN PERFORM 1/0; EXCEPTION WHEN division_by_zero THEN NULL; END;
+    END IF;
+    RETURN NEW;
+END$$;
+CREATE TRIGGER fp_subxact_trg AFTER INSERT ON fp_subxact_fk
+    FOR EACH ROW EXECUTE FUNCTION fp_abort_subxact();
+INSERT INTO fp_subxact_fk VALUES (999, 'bad'), (0, 'boom'), (1, 'ok');
+DROP TRIGGER fp_subxact_trg ON fp_subxact_fk;
+DROP FUNCTION fp_abort_subxact();
+DROP TABLE fp_subxact_fk, fp_subxact_pk;
-- 
2.47.3



  [application/x-patch] v3-0001-Fix-out-of-bounds-write-in-RI-fast-path-batch-on-.patch (10.9K, 3-v3-0001-Fix-out-of-bounds-write-in-RI-fast-path-batch-on-.patch)
  download | inline diff:
From 9a078d0f332f96d8e16040558ddb22daaf199778 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 10 Jun 2026 21:07:09 +0900
Subject: [PATCH v3 1/2] Fix out-of-bounds write in RI fast-path batch on
 re-entry

The FK fast-path batching added in b7b27eb41a5 wrote the incoming row
into the batch array before checking whether the array was full:

    fpentry->batch[fpentry->batch_count] = ExecCopySlotHeapTuple(newslot);
    fpentry->batch_count++;
    if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
        ri_FastPathBatchFlush(fpentry, fk_rel, riinfo);

batch_count is reset to zero only at the end of ri_FastPathBatchFlush(),
so it remains at RI_FASTPATH_BATCH_SIZE throughout a full-batch flush.
A flush runs user-defined cast functions and equality operators; if that
user code performs DML on the same FK table, ri_FastPathBatchAdd()
re-enters with batch_count == RI_FASTPATH_BATCH_SIZE and writes one past
the end of the array, corrupting the adjacent batch_count field.  This
is reachable by an unprivileged table owner via an implicit cast with a
PL/pgSQL function and causes a SIGSEGV in assert-enabled builds.

Fix by bounds-checking the write into the batch array so a re-entrant
add can never write past the end, and by adding a "flushing" flag to
RI_FastPathEntry that routes re-entrant ri_FastPathBatchAdd() calls on
a busy entry to the per-row path (ri_FastPathCheck) instead of touching
the mid-flush batch array.  The flag is set around the probe in
ri_FastPathBatchFlush() and cleared in a PG_FINALLY, which also resets
batch_count, so the entry is left empty and reusable if a flush error
(including a reported FK violation) is caught by a savepoint.

Add regression tests for both the re-entrant flush and reuse of an entry
after a flush error caught by a savepoint.

Reported-by: Nikolay Samokhvalov <[email protected]>
Reviewed-by: Nikolay Samokhvalov <[email protected]>
Reviewed-by: Ayush Tiwari <[email protected]>
Discussion: https://postgr.es/m/CAM527d9exRCdWrhJOnAxk_vACg7sr_yPoaJp_+uCFY0qP8v=aw@mail.gmail.com
---
 src/backend/utils/adt/ri_triggers.c       | 80 ++++++++++++++++++-----
 src/test/regress/expected/foreign_key.out | 56 ++++++++++++++++
 src/test/regress/sql/foreign_key.sql      | 46 +++++++++++++
 3 files changed, 165 insertions(+), 17 deletions(-)

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index dc89c686394..917a544c32b 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -249,6 +249,12 @@ typedef struct RI_FastPathEntry
 	 */
 	HeapTuple	batch[RI_FASTPATH_BATCH_SIZE];
 	int			batch_count;
+
+	/*
+	 * true while this entry's batch is being flushed; guards against
+	 * re-entrant ri_FastPathBatchAdd from user code run during the flush.
+	 */
+	bool		flushing;
 } RI_FastPathEntry;
 
 /*
@@ -2860,15 +2866,38 @@ ri_FastPathBatchAdd(RI_ConstraintInfo *riinfo,
 					Relation fk_rel, TupleTableSlot *newslot)
 {
 	RI_FastPathEntry *fpentry = ri_FastPathGetEntry(riinfo, fk_rel);
-	MemoryContext oldcxt;
 
-	oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
-	fpentry->batch[fpentry->batch_count] =
-		ExecCopySlotHeapTuple(newslot);
-	fpentry->batch_count++;
-	MemoryContextSwitchTo(oldcxt);
+	/*
+	 * If this entry is already being flushed, a cast function or an operator
+	 * invoked during the flush has re-entered with DML on the same FK.  Fall
+	 * back to the per-row path rather than touching the batch array, which is
+	 * mid-flush.
+	 */
+	if (unlikely(fpentry->flushing))
+	{
+		ri_FastPathCheck(riinfo, fk_rel, newslot);
+		return;
+	}
+
+	/*
+	 * Buffer the row.  A full batch is flushed below and re-entry is handled
+	 * above, so there is always room here; the bounds check just guards the
+	 * array write.
+	 */
+	if (fpentry->batch_count < RI_FASTPATH_BATCH_SIZE)
+	{
+		MemoryContext oldcxt = MemoryContextSwitchTo(fpentry->flush_cxt);
 
-	if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
+		fpentry->batch[fpentry->batch_count] =
+			ExecCopySlotHeapTuple(newslot);
+		fpentry->batch_count++;
+		MemoryContextSwitchTo(oldcxt);
+	}
+	else
+		elog(ERROR, "RI fast-path batch unexpectedly full");
+
+	/* Flush as soon as the batch is full. */
+	if (fpentry->batch_count == RI_FASTPATH_BATCH_SIZE)
 		ri_FastPathBatchFlush(fpentry, fk_rel, riinfo);
 }
 
@@ -2944,13 +2973,30 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel,
 	}
 	Assert(riinfo->fpmeta);
 
-	/* Skip array overhead for single-row batches. */
-	if (riinfo->nkeys == 1 && fpentry->batch_count > 1)
-		violation_index = ri_FastPathFlushArray(fpentry, fk_slot, riinfo,
-												fk_rel, snapshot, scandesc);
-	else
-		violation_index = ri_FastPathFlushLoop(fpentry, fk_slot, riinfo,
-											   fk_rel, snapshot, scandesc);
+	/*
+	 * The probe runs user-defined cast and equality functions.  Set the
+	 * flushing flag around it so a re-entrant ri_FastPathBatchAdd on this
+	 * entry takes the per-row path, and clear it even on error so the entry
+	 * is reusable if the error is caught by a savepoint.
+	 */
+	Assert(!fpentry->flushing);
+	fpentry->flushing = true;
+	PG_TRY();
+	{
+		/* Skip array overhead for single-row batches. */
+		if (riinfo->nkeys == 1 && fpentry->batch_count > 1)
+			violation_index = ri_FastPathFlushArray(fpentry, fk_slot, riinfo,
+													fk_rel, snapshot, scandesc);
+		else
+			violation_index = ri_FastPathFlushLoop(fpentry, fk_slot, riinfo,
+												   fk_rel, snapshot, scandesc);
+	}
+	PG_FINALLY();
+	{
+		fpentry->flushing = false;
+		fpentry->batch_count = 0;
+	}
+	PG_END_TRY();
 
 	SetUserIdAndSecContext(saved_userid, saved_sec_context);
 	UnregisterSnapshot(snapshot);
@@ -2966,9 +3012,6 @@ ri_FastPathBatchFlush(RI_FastPathEntry *fpentry, Relation fk_rel,
 
 	MemoryContextReset(fpentry->flush_cxt);
 	MemoryContextSwitchTo(oldcxt);
-
-	/* Reset. */
-	fpentry->batch_count = 0;
 }
 
 /*
@@ -4307,6 +4350,9 @@ ri_FastPathGetEntry(const RI_ConstraintInfo *riinfo, Relation fk_rel)
 			RegisterAfterTriggerBatchCallback(ri_FastPathEndBatch, NULL);
 			ri_fastpath_callback_registered = true;
 		}
+
+		entry->flushing = false;
+		entry->batch_count = 0;
 	}
 
 	return entry;
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 8b3b268de0f..e08dff99f03 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -3712,3 +3712,59 @@ INSERT INTO fp_pk_dup VALUES (1);
 CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
 INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
 DROP TABLE fp_fk_dup, fp_pk_dup;
+-- Re-entrant FK fast-path: DML on the same FK table from a cast function
+-- during a full-batch flush must not corrupt the batch array.
+CREATE TABLE fp_reentry_pk (id int PRIMARY KEY);
+INSERT INTO fp_reentry_pk VALUES (1), (2);
+CREATE TYPE fp_vch AS (v int);
+CREATE FUNCTION fp_vcast(fp_vch) RETURNS int LANGUAGE plpgsql AS $$
+BEGIN
+    IF $1.v = 1 THEN
+        INSERT INTO fp_reentry_fk VALUES (row(2)::fp_vch);
+    END IF;
+    RETURN $1.v;
+END$$;
+CREATE CAST (fp_vch AS int) WITH FUNCTION fp_vcast(fp_vch) AS IMPLICIT;
+CREATE TABLE fp_reentry_fk (a fp_vch
+    REFERENCES fp_reentry_pk (id));
+-- Fill exactly one batch so the flush fires; the cast re-enters with DML
+-- on the same FK and must take the per-row path.
+INSERT INTO fp_reentry_fk SELECT row(1)::fp_vch FROM generate_series(1, 64);
+SELECT a, count(*) FROM fp_reentry_fk GROUP BY a ORDER BY a;
+  a  | count 
+-----+-------
+ (1) |    64
+ (2) |    64
+(2 rows)
+
+DROP TABLE fp_reentry_fk, fp_reentry_pk;
+DROP CAST (fp_vch AS int);
+DROP FUNCTION fp_vcast(fp_vch);
+DROP TYPE fp_vch;
+-- Flush error caught by a savepoint must leave the entry empty and reusable.
+CREATE TABLE fp_reentry_pk2 (id int PRIMARY KEY);
+INSERT INTO fp_reentry_pk2 VALUES (1);
+CREATE TABLE fp_reentry_fk2 (a int REFERENCES fp_reentry_pk2 (id));
+DO $$
+BEGIN
+    -- A batch containing a violating row; the flush reports the violation.
+    BEGIN
+        INSERT INTO fp_reentry_fk2 SELECT CASE WHEN g = 32 THEN 999 ELSE 1 END
+            FROM generate_series(1, 64) g;
+    EXCEPTION WHEN foreign_key_violation THEN
+        RAISE NOTICE 'caught fk violation';
+    END;
+
+    -- Reuse the same FK with a full batch in the same transaction.  The
+    -- entry must be empty after the caught violation: no stale rows from the
+    -- rolled-back batch (in particular no 999), and no array overflow.
+    INSERT INTO fp_reentry_fk2 SELECT 1 FROM generate_series(1, 64);
+END$$;
+NOTICE:  caught fk violation
+SELECT count(*), max(a) FROM fp_reentry_fk2;  -- 64 rows, max 1
+ count | max 
+-------+-----
+    64 |   1
+(1 row)
+
+DROP TABLE fp_reentry_fk2, fp_reentry_pk2;
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index 7eb86b188f0..87381194f41 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -2680,3 +2680,49 @@ INSERT INTO fp_pk_dup VALUES (1);
 CREATE TABLE fp_fk_dup (a int REFERENCES fp_pk_dup);
 INSERT INTO fp_fk_dup SELECT 1 FROM generate_series(1, 100);
 DROP TABLE fp_fk_dup, fp_pk_dup;
+
+-- Re-entrant FK fast-path: DML on the same FK table from a cast function
+-- during a full-batch flush must not corrupt the batch array.
+CREATE TABLE fp_reentry_pk (id int PRIMARY KEY);
+INSERT INTO fp_reentry_pk VALUES (1), (2);
+CREATE TYPE fp_vch AS (v int);
+CREATE FUNCTION fp_vcast(fp_vch) RETURNS int LANGUAGE plpgsql AS $$
+BEGIN
+    IF $1.v = 1 THEN
+        INSERT INTO fp_reentry_fk VALUES (row(2)::fp_vch);
+    END IF;
+    RETURN $1.v;
+END$$;
+CREATE CAST (fp_vch AS int) WITH FUNCTION fp_vcast(fp_vch) AS IMPLICIT;
+CREATE TABLE fp_reentry_fk (a fp_vch
+    REFERENCES fp_reentry_pk (id));
+-- Fill exactly one batch so the flush fires; the cast re-enters with DML
+-- on the same FK and must take the per-row path.
+INSERT INTO fp_reentry_fk SELECT row(1)::fp_vch FROM generate_series(1, 64);
+SELECT a, count(*) FROM fp_reentry_fk GROUP BY a ORDER BY a;
+DROP TABLE fp_reentry_fk, fp_reentry_pk;
+DROP CAST (fp_vch AS int);
+DROP FUNCTION fp_vcast(fp_vch);
+DROP TYPE fp_vch;
+
+-- Flush error caught by a savepoint must leave the entry empty and reusable.
+CREATE TABLE fp_reentry_pk2 (id int PRIMARY KEY);
+INSERT INTO fp_reentry_pk2 VALUES (1);
+CREATE TABLE fp_reentry_fk2 (a int REFERENCES fp_reentry_pk2 (id));
+DO $$
+BEGIN
+    -- A batch containing a violating row; the flush reports the violation.
+    BEGIN
+        INSERT INTO fp_reentry_fk2 SELECT CASE WHEN g = 32 THEN 999 ELSE 1 END
+            FROM generate_series(1, 64) g;
+    EXCEPTION WHEN foreign_key_violation THEN
+        RAISE NOTICE 'caught fk violation';
+    END;
+
+    -- Reuse the same FK with a full batch in the same transaction.  The
+    -- entry must be empty after the caught violation: no stale rows from the
+    -- rolled-back batch (in particular no 999), and no array overflow.
+    INSERT INTO fp_reentry_fk2 SELECT 1 FROM generate_series(1, 64);
+END$$;
+SELECT count(*), max(a) FROM fp_reentry_fk2;  -- 64 rows, max 1
+DROP TABLE fp_reentry_fk2, fp_reentry_pk2;
-- 
2.47.3



^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: PG19 FK fast path: OOB write and missed FK checks during batched
@ 2026-06-12 02:46  Amit Langote <[email protected]>
  parent: Amit Langote <[email protected]>
  0 siblings, 0 replies; 14+ messages in thread

From: Amit Langote @ 2026-06-12 02:46 UTC (permalink / raw)
  To: Junwang Zhao <[email protected]>; +Cc: Ayush Tiwari <[email protected]>; Nikolay Samokhvalov <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>; Kirk Wolak <[email protected]>

On Thu, Jun 11, 2026 at 7:47 PM Amit Langote <[email protected]> wrote:
>
> On Thu, Jun 11, 2026 at 6:51 PM Junwang Zhao <[email protected]> wrote:
> > On Thu, Jun 11, 2026 at 5:05 PM Amit Langote <[email protected]> wrote:
> > >
> > > On Thu, Jun 11, 2026 at 5:18 PM Junwang Zhao <[email protected]> wrote:
> > > > I only reviewed and applied patch 0001 on my local machine, and it
> > > > successfully fixed the crash.
> > > >
> > > > One minor comment:
> > > >
> > > > + if (fpentry->flushing)
> > > > + {
> > > > + ri_FastPathCheck(riinfo, fk_rel, newslot);
> > > > + return;
> > > > + }
> > > >
> > > > Would it be worth wrapping the condition with unlikely()? It seems
> > > > this branch is expected to be false in most cases, not a strong
> > > > opinion though.
> > >
> > > Good idea.  Will do.
> > >
> > > Are you planning to look at 0002?
> >
> > I just applied 0002 and ran the regression successfully.
> >
> > I have one trivial comment, subXact abort doesn't NULL the
> > ri_fastpath_cache, so I think the following comment of
> > RI_FastPathEntry should be polished accordingly by removing the
> > `SubXactCallback`.
> >
> > * ri_FastPathEndBatch(); on abort, ResourceOwner releases the cached
> > * relations and the XactCallback/SubXactCallback NULL the static cache pointer
> > * to prevent any subsequent access.
>
> Thanks for the review.  Yes, I missed that.
>
> I've updated the patches to address your comments and did some other polishing.

I've pushed these now.  Thank you everyone.

-- 
Thanks, Amit Langote






^ permalink  raw  reply  [nested|flat] 14+ messages in thread

end of thread, other threads:[~2026-06-12 02:46 UTC | newest]

Thread overview: 14+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-06-06 08:30 PG19 FK fast path: OOB write and missed FK checks during batched Nikolay Samokhvalov <[email protected]>
2026-06-06 09:13 ` Amit Langote <[email protected]>
2026-06-08 08:18   ` Amit Langote <[email protected]>
2026-06-09 13:31     ` Amit Langote <[email protected]>
2026-06-10 08:16       ` Nikolay Samokhvalov <[email protected]>
2026-06-10 08:32         ` Amit Langote <[email protected]>
2026-06-10 10:08           ` Ayush Tiwari <[email protected]>
2026-06-10 12:17             ` Amit Langote <[email protected]>
2026-06-10 12:48               ` Ayush Tiwari <[email protected]>
2026-06-11 08:18               ` Junwang Zhao <[email protected]>
2026-06-11 09:05                 ` Amit Langote <[email protected]>
2026-06-11 09:50                   ` Junwang Zhao <[email protected]>
2026-06-11 10:47                     ` Amit Langote <[email protected]>
2026-06-12 02:46                       ` Amit Langote <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox