Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vwy1u-007AOm-1c for pgsql-hackers@arkaria.postgresql.org; Mon, 02 Mar 2026 07:49:26 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vwy1t-00G67Y-0z for pgsql-hackers@arkaria.postgresql.org; Mon, 02 Mar 2026 07:49:25 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vwy1s-00G67Q-2q for pgsql-hackers@lists.postgresql.org; Mon, 02 Mar 2026 07:49:25 +0000 Received: from mail-pj1-x102f.google.com ([2607:f8b0:4864:20::102f]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vwy1p-000000027xL-37HK for pgsql-hackers@postgresql.org; Mon, 02 Mar 2026 07:49:24 +0000 Received: by mail-pj1-x102f.google.com with SMTP id 98e67ed59e1d1-3591cc98871so1740448a91.3 for ; Sun, 01 Mar 2026 23:49:22 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1772437761; cv=none; d=google.com; s=arc-20240605; b=P13eiWY2x4JsHLwOhdR9nbcddwizmTez+R734PvInhFZyM21QR7vjJMaIfUn9C0arh cqhpeHdOO5Md17oIs/oKQh+O6FxABd6Ft2jYNZgQfnrJ9Ko9zuA06EYxwsoxrBSD9Wja gB/J0rTQ73DYFnUC4HXq5iNIA1P/XnDD2gBnr7qeupl4Wld+P2+BXUYoYTbqkFZBltAP Rw17ZpIDcpF7n1kOQ5lvOYa3Qj1FBEwxJcDMMt7g/CHAMmaUC6Eg6SQX+Bm6Ft26vlnA zBSLJYzIAwr69AOJkvXYmoW+gjnWY10tu0FLXAu73XHzKKSZ0yEPDR+q7k554aZaa0yU jANA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=lrRuWiGU+0pBW8Qz0dlvSI7HwyX6vQhiAS6czec/SI8=; fh=pCuDoNbjUb7gpEzJSR4sQcztowzSp6UN6QYTYPID/Gk=; b=FrhzYfs0CLuOx0y+BhgOatRVVaNHOj3jiU1w5NsuZVL9sFrAfIuQO6m7Hbzh/tPqvG 4+d8qNG4DfgWjpgg947mWt16Gkpi/aPNytuuVnhQPGt25ceRqEtdln+WP3C/ERBgs0pr DuQWsvhdunVPOq0jntgLIrMqVdSWJk2AzMcMVW5+EXWSuc9wa0QP9SYHHyrZRA2mVMw6 +KfNhV7LgJOsRtgQR8oOqI8nxlARVbyJLPj4W0B9/+G5/gPYhqh4dfJ3IQaJION6Ccni a1SZ05dnMnovC2WDQLtHiAbnfydtK9t8ZohmEiykRVyvMexf6FZa+gqrR7y6Txu4wN2P AOrw==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1772437761; x=1773042561; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=lrRuWiGU+0pBW8Qz0dlvSI7HwyX6vQhiAS6czec/SI8=; b=SzRzGcJKxOklvSlrdnu+rIloN1cOZyqjVAp1G0NxidId3M+zSw7lE04xWBRcTdi2Ja t01svcA8f3ggpFqzqi7WFh0qu0Mpijl8AVr0DaYui0wHJYpcxe8nEIVq3PeII2uncml6 87wB3yRgtqpIAKk/H+zuljjz7kIbBj1936b5sO+dRtn1twETa4o2MHzm0tJPvocBK3BW /7oyuI1jK38egbGnj+/st3aNVqq0JCxdeeX5ZxFbI8RTyIjhTmCB6I8D004ZKfzbQN0j FLM/xSCh/3EvqqByVvtzgdf7ykQ/YnwmsBzUPlFZllp1TRHfUoSWTpymot9/xlKFAXms i4hA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1772437761; x=1773042561; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=lrRuWiGU+0pBW8Qz0dlvSI7HwyX6vQhiAS6czec/SI8=; b=Rno07OsbNrQ0ssZPcE4RdQ4Uz9k2bsHUG2qxl/0SsWlU87vMtlC7J/lMzrTzHtFQJ2 vmJpZmjQvr/yZ++JABsHi258z5UOfpVcuJCdfcoDNnGOEjcV6QRFb0nciKcwxZ9UOOjL LRbo9R/yIMnqaBf46nCvRnrw9napxtoxyPM7Fb/iySM2aAp+WmQwGe1VOAfqam36pU6j 1aBXo3VFbKuoYF1a98Yo6B0wGvEJ8n33BLn5zU29HMj7xjRdXRWFss33Z03uQDt+20vz SH+QiH0bTjT3A1RqNh7MuqYCWB8NMLcMBNonar+FZEzAaZoB+DDG7Cd4a/ZrK/DHpCDs Qu7w== X-Forwarded-Encrypted: i=1; AJvYcCUhA/BQihVwiNFFYsLwQCpuTKfUvD4Jvv6elpqS6f3J7205qWsLjnw+jtHRyqzB3bLyVX3fe7sXj6U/LO3X@postgresql.org X-Gm-Message-State: AOJu0Yws+nrRU9a00xmwdrnP2Cdj9DSjJqCxbZTYHSP476kmhg4cc64H mHQY1/vymeZNXB6bysKXWrcHjpdh1tvz8hoyu+rcWO9r9v+ExMgx+vpZ/8B7DSMlgBX6mIgGbU3 sUyC0Uar+x3cu1BUqw7OUe99bNX/kZug= X-Gm-Gg: ATEYQzzdsSHtcl1N8D9U08HzOva3Fj8ixICClGzNkH4ePEJFr0Bb25/nYgcS6NQ80Vs A0uwUwTrFZ+I8pN26uK4R1YaVh5Amxyk9Whsv+KiYSyvZGU7cM3wmAWXfhBes542+RlJWHfqZ8G fg6GwdXSQZFrgRE6aGHBPbxUILxN0sETt9U4e1OImo+K8hY/MIk7hZ2rbLqDvKzSO0nbv5ic5Zx TEzZ7eVLKuXYl8BpMnK9zkNlClaTTcZbjSz3sPAzDIByMs2u107eIHVQi6HZ3Vpcab4CKLyG6IJ gHOtJTdM X-Received: by 2002:a17:90a:e7ca:b0:359:95c2:b41 with SMTP id 98e67ed59e1d1-35995c20d79mr227553a91.23.1772437761160; Sun, 01 Mar 2026 23:49:21 -0800 (PST) MIME-Version: 1.0 References: <43b56e3b-80d6-4a45-a313-cb1caa587b22@vondra.me> In-Reply-To: <43b56e3b-80d6-4a45-a313-cb1caa587b22@vondra.me> From: Amit Langote Date: Mon, 2 Mar 2026 16:49:05 +0900 X-Gm-Features: AaiRm50PDkiviF9QExzDg5wvzcv6QTM7n_q_5D_flXUo6D6p3y_NTTHw4CWlaaA Message-ID: Subject: Re: Eliminating SPI / SQL from some RI triggers - take 3 To: Tomas Vondra Cc: Junwang Zhao , Pavel Stehule , PostgreSQL-development Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi Tomas, Thanks for the thorough benchmarking. On Sun, Mar 1, 2026 at 9:22=E2=80=AFPM Tomas Vondra wrote= : > On 2/28/26 08:08, Amit Langote wrote: > > Tomas Vondra also tested with an I/O-intensive workload (dataset > > larger than shared_buffers, combined with his and Peter Geoghegan's > > I/O prefetching patches) and confirmed that the batching + SAOP > > approach helps there too, not just in the CPU-bound / memory-resident > > case. In fact he showed that the patches here don't make a big dent > > when the main bottleneck is I/O as shown in numbers that he shared in > > an off-list email: > > > > master: 161617 ms > > ri-check (0001..0004): 149446 ms (1.08x) > > ri-check + i/o prefetching: 50885 ms (3.2x) > > > > So the RI patches alone only give ~8% here since most time is waiting > > on reads. But the batching gives the prefetch machinery a window of > > upcoming probes to issue readahead against, so the two together yield > > 3.2x. > > > > I tested this (with the index prefetching v11 patch), because I wanted > to check if the revised API works fine for other use cases, not just the > regular index scans. Turns out the answer is "yes", the necessary tweaks > to the FK batching patch were pretty minimal, and at the same time it > did help quite a bit for cases bottle-necked on I/O. Do you think those changes to the FK batching are only necessary for making it work with your patch or is that worth including with the set here because it's generally applicable? > FWIW I wonder how difficult would it be to do something like this for > inserts into indexes. It's an orthogonal issue to FK checks (especially > for the CPU-bound cases this thread focuses on), but it's a bit similar > to the I/O-bound case. In fact, I now realize I actually did a PoC for > that in 2023-11 [1], but it went stale ... Interesting. I hadn't seen your earlier PoC. Does the current I/O prefetching infrastructure simplify that approach, or are they independent paths? The old patch calls PrefetchBuffer() directly on the leaf, which seems orthogonal to the scan-side prefetching. Either way, would be nice to see more paths benefit from batching. > benchmarks > ---------- > > Anyway, thinking about the CPU-bound case, I decided to do a bit of > testing on my own. I was wondering about three things: > > (a) how does the improvement depend on data distribution > (b) could it cause regressions for small inserts > (c) how sensitive is the batch size > > So I devised two simple benchmarks: > > 1) run-pattern.sh - Inserts batches of values into a table, both the > batch and table can be either random or sequential. It's either 100k or > 1M rows, logged or unlogged, etc. > > 2) run-pgbench.sh - Runs short pgbench inserting data into a table, > similar to (1), but with very few rows - so the timing approach is not > suitable to measure this. > > Both scripts run against master, and then patched branch with three > batch sizes (default 64, 16 and 256). > > > results > ------- > > The results are very positive - see the attached PDF files comparing the > patched builds to master. > > I have not found a single case where the batching causes regressions. > This surprised me a bit, I've expected small regressions for single-row > inserts in the pgbench test, but even that shows a small (~5%) gain. > Even just 2-row inserts show +25% improvement in pgbench throughput. This is reassuring. I too was half-expecting the batching infrastructure to add measurable overhead for single-row inserts, but it looks like the SPI bypass alone more than covers it. > There are a couple cases where it matches master, I assume that's for > I/O bound cases where the CPU optimizations do not really matter. That's > expected, of course. > > I don't see much sensitivity on the batch size. The 256 batches seem to > be a bit slower, but there's little difference between 16 and 64. So I'd > say 64 seems reasonable. Agreed. Interesting that 16 is consistently a little better than 64 in the patterns benchmark. I'd guess that's the per-PK-index-match linear scan over the batch cost showing up, since it's O(batch_size) per PK match. 256 being noticeably worse fits that picture. 64 seems like a good middle ground since the pgbench numbers show virtually no difference between 16 and 64. The best-case numbers are striking -- when both the PK table and the FK values being inserted are in sequential order, the unlogged patterns case hits 4-5x, wow. I guess that makes sense because sequential FK values turn into a sorted SAOP array that walks consecutive leaf pages, so it's essentially a single sequential scan of the relevant index portion. > Overall, I think these results looks quite good. I haven't looked at the > code very closely, not beyond adjusting it to work with index prefetch. If you get a chance, I'd welcome a closer look. Your memory context catch was a real bug that I'd missed entirely. The area that would benefit most from a second pair of eyes is the snapshot and permission caching semantics in 0002. The argument for why reusing the snapshot and checking permissions once per batch is safe rather than per-row is sound I think, but the effects are global and hard to validate by testing alone.. --=20 Thanks, Amit Langote