Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vuWG6-00Az6Y-1W for pgsql-hackers@arkaria.postgresql.org; Mon, 23 Feb 2026 13:45:58 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vuWG5-00DUHV-0N for pgsql-hackers@arkaria.postgresql.org; Mon, 23 Feb 2026 13:45:57 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vuWG4-00DUHN-2X for pgsql-hackers@lists.postgresql.org; Mon, 23 Feb 2026 13:45:56 +0000 Received: from mail-ej1-x62b.google.com ([2a00:1450:4864:20::62b]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vuWG1-00000000sBU-3YxB for pgsql-hackers@postgresql.org; Mon, 23 Feb 2026 13:45:56 +0000 Received: by mail-ej1-x62b.google.com with SMTP id a640c23a62f3a-b8f96f6956aso596182966b.3 for ; Mon, 23 Feb 2026 05:45:54 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1771854353; cv=none; d=google.com; s=arc-20240605; b=L2H1DLY1yBDimvktbVfSeCa0rK4ZAZXDo18cezVAitkpalIMjiQEejqIZD5ZHBX6Is Uy+jkoRRaqk7VyEogVo8NOm2cAlWaZzDHjji+MmuRGk4MjIYB/g4aeSS2ZTFOBnSggw1 pwnYXTUHz8tjfBeD37HSUNWhOYnPQox+uf4XlkdlpBvD4rku/4QCd1t1iIpDE61huiNU XqRVgK6N4i5hhOaE30EcvBCrI1s6coJBQZ81ngRov0BbvJzVft1pB09bTNTbjWGbjfyu SrzRO/qkN1roHHo6hsw4F5EsXuAnH+bG/a4tr/tI5/MmZ3Gm/0lw8gloEwtG9MrH7IcF s+Bg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=i075zHHq12KkvS3pB/3Yv6rlk4JEovOqEbZ3KaFhQ5w=; fh=qmUBXfs6fBmr+WvAqgVVUQ6CUWnuNP4bqREOhuVmn7M=; b=ZUQCoXkh3yu0AyCEPF9jq1lGdG5QtB9BJQvsXfJVX0DnJVhceWkjjXbUAZGAmhs4Cf n+xHasXRpgCB9FshWQdaPIxuEyaQ7MY7m1zzM3RlWIDgOPyASOc5Csc+NWClgVjP1xd8 NYWqixm88XyH/f44FW+8pyrYOqTgqysBfyP+eb02i/HnKKgshiJnMsPGs9jGzI6LiTdQ Z8ShTypoP5qXQA7z4VqRKCa57S2gkC7Kmy5lYcCiAzFW6pm711pUS73QYj8SbRZJ5O07 aZmqqYk0DNcnT3ZpWDK+9pv6WcDh15euAb7fWPkmc4svQTCPv37c/lMW3+m9euWh/1NJ DXuw==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1771854353; x=1772459153; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=i075zHHq12KkvS3pB/3Yv6rlk4JEovOqEbZ3KaFhQ5w=; b=LDhZveFDPBdJW7si3pQs784uDICyb29A4gc2gsxFYqwBScTr2Tnq6EWZeOnSI3f+tS 4Fyn1yFhNC24A4DSMoO8xmEEMa5V7sovmHpTZFVtryBfnjspIHl527OIryUYzBfv8Kqk 14t8ZsMU1P+XyG5iZxHH0jDvYwpbB7ibIKGzACGWdFxwAEDJAEer0wGFSvCiL/yr+N0a vekmClitfO9nmtB6B5Y79WIRVtQDgEc0PkilHzogcu0DNLl52d/ahgic0JPM/SdS67TC n38EBM3zVwyulvODpqqedJS/y/LURLjrVivd36ejPCcsw9X4xmkPVn9fSMyk8mFDYjSM 6TFw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1771854353; x=1772459153; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=i075zHHq12KkvS3pB/3Yv6rlk4JEovOqEbZ3KaFhQ5w=; b=m/ZD7sNJSno+PXjj324BX+gpckG/kPgAjwWgkT8ydXMKD6FBUFG4r8JWRbm1VIDz5Q l63MRrUG4kIsIzrzeylJcmZz7saQXzMezuWTq0EXdUTNxM+a8+6e9MGleXuoiHhcDr2G 6YGvwt8BGLg5Q7gYiN8Q+VkvX7sciqwBP6RbgYCvgDhT8EvX2lsIX7P91AX5Bm+hya43 REL6imyczsO22m2lTe7SBkc40KApHp9aALQ/efS5gtVwjSYRRo4UGYWx7yOsQqmVXX2x PxNAJZranWa4GUpkKL9BYKLHRDLvClBVeStKKCVp1whevvhxRhJGA530P8Xst72tPo6o SFyQ== X-Forwarded-Encrypted: i=1; AJvYcCV+wbIYtVgpMx1/XySglpII3tIYfti7tmhhB0QjkHjgsSlHuCbtDgElBUmX3VcyE5EnoB3V79GkpxcWEcg2@postgresql.org X-Gm-Message-State: AOJu0YxfiThD/PEDybBWwz+uUpGyHzPLLb2cc14hvpdFeKtVTlAPh5EV ymYClVBmE27TzzKyqlxQwuiFFyXC82XE4jh2fEupVNzBkbL8kDMYHPHBSXKjSS2z12bAwMfcRg0 /KAuLGhyOG708NchhdLLM+hBKdVw76LE= X-Gm-Gg: AZuq6aKWfNsCNr68MoDHWoaNkhf23gVAciCi+u+OI6Jz8BAK10aI/uJ3BVfa3xfI2Sf PqOcygnQP8notvKKVJb22u/JjdNHvQDCO9b/B06lWGmtK7Mgd9rGfOZDURfu1tzxUA6i9V5ZzIX DVn5FosWX2rcsePRVUishsJcN0snBCrlw96Qdd41okyq3ExUDFmfCuvgvR2bk93HS03BOowsxpt d96jEIyY2YmQO88mfTxeMQWEUmO6Gygnh2mY+O08tQt9hUd2FiYpJn4x3MmcRoLeYcwoZ1e62ZX +CXc2h3lhb+cnpluCjKt3yEnQx+TD2zSmbaNeA== X-Received: by 2002:a17:907:d8b:b0:b87:1c20:7c63 with SMTP id a640c23a62f3a-b90819bfa47mr587811566b.20.1771854351421; Mon, 23 Feb 2026 05:45:51 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Junwang Zhao Date: Mon, 23 Feb 2026 21:45:39 +0800 X-Gm-Features: AaiRm53EThgK3QXmXctQM0t0Y7KiCpX2AbF3w1JWaphtoCwopEi92hXZcCdy2A8 Message-ID: Subject: Re: Eliminating SPI / SQL from some RI triggers - take 3 To: Amit Langote Cc: Pavel Stehule , PostgreSQL-development Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi Amit, On Thu, Feb 19, 2026 at 5:21=E2=80=AFPM Amit Langote wrote: > > Hi Junwang, > > On Mon, Dec 1, 2025 at 3:09=E2=80=AFPM Junwang Zhao w= rote: > > As Amit has already stated, we are approaching a hybrid "fast-path + fa= llback" > > design. > > > > 0001 adds a fast path optimization for foreign key constraint checks > > that bypasses the SPI executor, the fast path applies when the referenc= ed > > table is not partitioned, and the constraint does not involve temporal > > semantics. > > > > With the following test: > > > > create table pk (a numeric primary key); > > create table fk (a bigint references pk); > > insert into pk select generate_series(1, 2000000); > > > > head: > > > > [local] zhjwpku@postgres:5432-90419=3D# insert into fk select > > generate_series(1, 2000000, 2); > > INSERT 0 1000000 > > Time: 13516.177 ms (00:13.516) > > > > [local] zhjwpku@postgres:5432-90419=3D# update fk set a =3D a + 1; > > UPDATE 1000000 > > Time: 15057.638 ms (00:15.058) > > > > patched: > > > > [local] zhjwpku@postgres:5432-98673=3D# insert into fk select > > generate_series(1, 2000000, 2); > > INSERT 0 1000000 > > Time: 8248.777 ms (00:08.249) > > > > [local] zhjwpku@postgres:5432-98673=3D# update fk set a =3D a + 1; > > UPDATE 1000000 > > Time: 10117.002 ms (00:10.117) > > > > 0002 cache fast-path metadata used by the index probe, at the current > > time only comparison operator hash entries, operator function OIDs > > and strategy numbers and subtypes for index scans. But this cache > > doesn't buy any performance improvement. > > > > Caching additional metadata should improve performance for foreign key = checks. > > > > Amit suggested introducing a mechanism for ri_triggers.c to register a > > cleanup callback in the EState, which AfterTriggerEndQuery() could then > > invoke to release per-statement cached metadata (such as the IndexScanD= esc). > > However, I haven't been able to implement this mechanism yet. > > Thanks for working on this. I've taken your patches as a starting > point and reworked the series into two patches (attached): 1st is your > 0001+0002 as the core patch that adds a gated fast-path alternative to > SPI and 2nd where I added per-statement resource caching. Doing the > latter turned out to be not so hard thanks to the structure you chose > to build the core fast path. Good call on adding the RLS and ACL test > cases, btw. > > So, 0001 is a functionally complete fast path: concurrency handling, > REPEATABLE READ crosscheck, cross-type operators, security context, > and metadata caching. 0002 implements the per-statement resource > caching we discussed, though instead of sharing the EState between > trigger.c and ri_triggers.c it uses a new AfterTriggerBatchCallback > mechanism that fires at the end of each trigger-firing cycle > (per-statement for immediate constraints, or until COMMIT for deferred > ones). It layers resource caching on top so that the PK relation, > index, scan descriptor, and snapshot stay open across all FK trigger > invocations within a single trigger-firing cycle rather than being > opened and closed per row. > > Note that phe previous 0002 (metadata caching) is folded into 0001, > and most of the new fast-path logic added in 0001 now lives in > ri_FastPathCheck() rather than inline in RI_FKey_check(), so the > RI_FKey_check diff is just the gating call and SPI fallback. > > I re-ran the benchmarks (same test as yours, different machine): > > create table pk (a numeric primary key); > create table fk (a bigint references pk); > insert into pk select generate_series(1, 2000000); > insert into fk select generate_series(1, 2000000, 2); > > master: 2444 ms (median of 3 runs) > 0001: 1382 ms (43% faster) > 0001+0002: 1202 ms (51% faster, 13% over 0001 alone) I can get similar improvement on my old mac intel chip: master: 12963.993 ms 0001: 6641.692 ms, 48.8% faster 0001+0002: 5771.703 ms, 55.5% faster > > Also, with int PK / int FK (1M rows): > > create table pk (a int primary key); > create table fk (a int references pk); > insert into pk select generate_series(1, 1000000); > insert into fk select generate_series(1, 1000000); > > master: 1000 ms > 0001: 520 ms (48% faster) > 0001+0002: 432 ms (57% faster, 17% over 0001 alone) master: 11134.583 ms 0001: 5240.298 ms, 52.9% faster 0001+0002: 4554.215 ms, 59.1% faster > > The incremental gain from 0002 comes from eliminating per-row relation > open/close, scan begin/end, slot alloc/free, and replacing per-row > GetSnapshotData() with only curcid adjustment on the registered > snapshot copy in the cache. > > The two current limitations are partitioned referenced tables and > temporal foreign keys. Partitioned PKs are relatively uncommon in > practice, so the non-partitioned case should cover most FK workloads, > so I'm not sure it's worth the added complexity to support them. > Temporal FKs are inherently multi-row, so they're a poor fit for a > single-probe fast path. > > David Rowley mentioned off-list that it might be worth batching > multiple FK values into a single index probe, leveraging the > ScalarArrayOp btree improvements from PostgreSQL 17. The idea would be > to buffer FK values across trigger invocations in the per-constraint > cache (0002 already has the right structure for this), build a > SK_SEARCHARRAY scan key, and let the btree AM walk the matching leaf > pages in one sorted traversal instead of one tree descent per row. The > locking and recheck would still be per-tuple, but the index traversal > cost drops significantly. Single-column FKs are the obvious starting > point. That seems worth exploring but can be done as a separate patch > on top of this. I will take a look at this in the following weeks. > > I think the series is in reasonable shape but would appreciate extra > eyeballs, especially on the concurrency handling in ri_LockPKTuple() > in 0001 and the snapshot lifecycle in 0002. Or anything else that > catches one's eye. > > -- > Thanks, Amit Langote I don't have any additional comments on the patch except one minor nit, maybe merge the following two if conditions into one, not a strong opinion though. if (use_cache) { /* * The snapshot was registered once when the cache entry was created. * We just patch curcid to reflect the new command counter. * SnapshotSetCommandId() only patches process-global statics, not * registered copies, so we do it directly. * * The xmin/xmax/xip fields don't need refreshing: within a single * statement batch, only curcid changes between rows. */ Assert(fpentry && fpentry->snapshot !=3D NULL); snapshot =3D fpentry->snapshot; snapshot->curcid =3D GetCurrentCommandId(false); } else snapshot =3D RegisterSnapshot(GetLatestSnapshot()); if (use_cache) { pk_rel =3D fpentry->pk_rel; idx_rel =3D fpentry->idx_rel; scandesc =3D fpentry->scandesc; slot =3D fpentry->slot; } else { pk_rel =3D table_open(riinfo->pk_relid, RowShareLock); idx_rel =3D index_open(riinfo->conindid, AccessShareLock); scandesc =3D index_beginscan(pk_rel, idx_rel, snapshot, NULL, riinfo->nkeys, 0); slot =3D table_slot_create(pk_rel, NULL); } --=20 Regards Junwang Zhao