From: Junwang Zhao
Date: Sun, 1 Feb 2026 22:49:31 +0800
Subject: Re: Batching in executor
To: Amit Langote
Cc: Daniil Davydov <3danissimo@gmail.com>, cca5507 <2624345507@qq.com>, PostgreSQL-development, Tomas Vondra

Hi Amit,

On Thu, Jan 29, 2026 at 3:35 PM Amit Langote wrote:
>
> Hi,
>
> Here is v5 of the patch series.
>
> Patches 0001-0003 add the core batching infrastructure. 0001 adds the
> batch table AM API with heapam implementation, 0002 wires up SeqScan
> to use it (still returning one slot at a time), and 0003 adds EXPLAIN
> (BATCHES). I'd love to hear people's thoughts around TupleBatch
> structure added in 0002. I thought about making it a separate patch so
> that 0002 will still populate the single ScanState.ss_scanTupleSlot,
> but that means we'd still have to call the TAM callback to populate
> the tuple in the TAM's batch struct into the slot, defeating the whole
> point. With TupleBatch, you have executor_batch_rows number of slots
> which are filled in one TAM callback (materialize_all) call. So I
> decided to keep the TupleBatch and related things in 0002.
>
> For scans without quals, batching shows 20-30% improvement with no
> visible regressions when batching is disabled (batch_rows=0):
>
> SELECT * FROM t LIMIT n (no qual)
>
> Rows    Master      batch=0     %diff    batch=64    %diff
> ------  --------    -------     -----    --------    -----
> 1M      12.42 ms    11.96 ms    3.7%     8.56 ms     31.0%
> 3M      38.95 ms    38.92 ms    0.1%     28.59 ms    26.6%
> 10M     153.64 ms   150.28 ms   2.2%     112.95 ms   26.5%
>
> (%diff: positive = faster than master, negative = slower)
>
> Patches 0004-0005 add batched qual evaluation and are more
> experimental (see below on why 0005 exists). For quals referencing
> early columns, the improvement is significant:
>
> SELECT * FROM t WHERE a = 0 ... OFFSET n (qual on 1st col)
>
> Rows    Master      batch=64    %diff
> ------  --------    --------    -----
> 1M      30.19 ms    15.55 ms    48.5%
> 3M      92.47 ms    50.01 ms    45.9%
> 10M     325.58 ms   211.83 ms   34.9%
>
> However, for quals on later columns (e.g., 15th), batching provides no
> benefit - deformation dominates and batching doesn't help:
>
> SELECT * FROM t WHERE o = 0 ... OFFSET n (qual on 15th col)
>
> Rows    Master      batch=64    %diff
> ------  --------    --------    -----
> 1M      44.14 ms    44.56 ms    -0.9%
> 3M      133.89 ms   137.77 ms   -2.9%
> 10M     503.33 ms   528.88 ms   -5.1%
>
> I don't have a satisfactory explanation for why batching doesn't help
> the deform-heavy case at all. One would expect at least some benefit
> from reduced per-tuple overhead, but that's not materializing.
>
> I've also been struggling to understand why 0004 affects the per-tuple
> path even when batch_rows=0. For quals with 0% selectivity (all rows
> fail the qual), perf shows ExecInterpExpr is noticeably hotter with
> the patched code compared to master, even though batching is disabled:
>
> SELECT * FROM t WHERE a = 0 ... OFFSET n (0% selectivity)
>
> Rows    Master      batch=0     %diff    batch=64    %diff
> ------  --------    -------     -----    --------    -----
> 1M      24.37 ms    28.67 ms    -17.6%   12.46 ms    48.9%
> 3M      73.95 ms    85.07 ms    -15.0%   41.64 ms    43.7%
> 10M     287.63 ms   316.81 ms   -10.1%   188.01 ms   34.6%
>
> Compare that to 100% selectivity (all rows pass), where there's no regression:
>
> SELECT * FROM t WHERE a > 0 ... OFFSET n (100% selectivity)
>
> Rows    Master      batch=0     %diff    batch=64    %diff
> ------  --------    -------     -----    --------    -----
> 1M      29.44 ms    29.10 ms    1.2%     16.61 ms    43.6%
> 3M      91.22 ms    90.28 ms    1.0%     54.10 ms    40.7%
> 10M     360.77 ms   331.25 ms   8.2%     224.00 ms   37.9%
>
> I tried moving batch opcodes to a separate interpreter (0005) thinking
> it might be register pressure or jump table effects from adding cases
> to ExecInterpExpr's switch. With 0005, the generated assembly for
> ExecInterpExpr looks identical to master (same stack frame size, same
> epilogue), yet the performance still differs. Specifically, the ldp
> instruction in the function epilogue shows 53% hotness in patched vs
> 35% in master. We still need placeholder entries in the dispatch
> table, so it's unclear if this fully isolates the per-tuple path. I'll
> continue looking at perf, but I feel like at a bit of a loss here and
> would appreciate any insights.
>
> Other changes worth noting:
>
> - I removed the BatchVector intermediate representation that copied
> Datums into columnar arrays before qual evaluation (it used to be in
> the batched qual patch 0004). Now quals access batch slots' tts_values
> directly. This simplifies the code and the copy overhead wasn't paying
> off. If we pursue serious vectorization later, this may need to be
> revisited, but removing it doesn't degrade performance.
>
> --
> Thanks, Amit Langote

Here are some comments for v5:

0001:

+/*
+ * heap_scan_begin_batch
+ *
+ * Allocate a HeapBatch with space for 'maxitems' tuple headers. No pin is
+ * taken here. Memory is allocated under the scan's memory context.
+ */
+void *
+heap_begin_batch(TableScanDesc sscan, int maxitems)

+/*
+ * heap_scan_end_batch
+ *
+ * Release any outstanding pin and free the batch allocations. Caller will
+ * not use 'am_batch' after this point.
+ */
+void
+heap_end_batch(TableScanDesc sscan, void *am_batch)

These function names are not consistent with the names in their comments.

0002:

+/*
+ * heap_scan_materialize_all
+ *
+ * Bind all tuples of the current batch into 'slots'. We bind the
+ * HeapTupleData header that points into the pinned page. No per-row copy.
+ */
+void
+heap_materialize_batch_all(void *am_batch, TupleTableSlot **slots, int n)

ditto.

+const TupleBatchOps *
+table_batch_callbacks(Relation relation)
+{
+    if (relation->rd_tableam)
+        return relation->rd_tableam->batch_callbacks(relation);
+    elog(ERROR, "relation does not support TupleBatch operations");
+}

Is there any chance this batch_callbacks can be NULL? In that case it
can cause a segfault. I felt changing to

    if (relation->rd_tableam && relation->rd_tableam->batch_callbacks)

should be more robust, but then I found table_slot_callbacks follows the
same pattern, so this shouldn't be a problem.

0003:

+++ b/src/include/executor/execBatch.h
@@ -13,6 +13,8 @@
 #ifndef EXECBATCH_H
 #define EXECBATCH_H

+#include

I guess the reason for including this header is because of the use of
INT_MAX, so maybe put that line into execBatch.c?

-- 
Regards
Junwang Zhao
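
For illustration only, here is a minimal standalone sketch of the stricter check discussed for 0002 above. The struct definitions are simplified stand-ins, not PostgreSQL's real Relation or TableAmRoutine, and lookup_batch_callbacks, mock_batch_callbacks and sketch_self_check are hypothetical names. The lookup returns NULL where the patch would call elog(ERROR), so that it compiles outside the server:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins (assumptions), not the real PostgreSQL definitions. */
typedef struct TupleBatchOps
{
    int         dummy;
} TupleBatchOps;

struct RelationData;

typedef struct TableAmRoutine
{
    /* May be NULL if a table AM does not provide batch support. */
    const TupleBatchOps *(*batch_callbacks) (struct RelationData *rel);
} TableAmRoutine;

typedef struct RelationData
{
    const TableAmRoutine *rd_tableam;
} RelationData;

static const TupleBatchOps mock_ops = {0};

/* Hypothetical AM callback that hands back a fixed ops table. */
const TupleBatchOps *
mock_batch_callbacks(RelationData *rel)
{
    (void) rel;
    return &mock_ops;
}

/*
 * Check both the AM pointer and the callback pointer before calling.
 * Returns NULL where the patch would report an error via elog(ERROR).
 */
const TupleBatchOps *
lookup_batch_callbacks(RelationData *rel)
{
    if (rel->rd_tableam && rel->rd_tableam->batch_callbacks)
        return rel->rd_tableam->batch_callbacks(rel);
    return NULL;
}

/* Exercise the three cases: callback set, callback NULL, no AM at all. */
int
sketch_self_check(void)
{
    TableAmRoutine with_cb = {mock_batch_callbacks};
    TableAmRoutine without_cb = {NULL};
    RelationData rel_ok = {&with_cb};
    RelationData rel_no_cb = {&without_cb};
    RelationData rel_no_am = {NULL};

    assert(lookup_batch_callbacks(&rel_ok) == &mock_ops);
    assert(lookup_batch_callbacks(&rel_no_cb) == NULL);
    assert(lookup_batch_callbacks(&rel_no_am) == NULL);
    return 0;
}
```

With the single-pointer check from the patch, the second case (an AM whose batch_callbacks is NULL) would dereference a NULL function pointer. Whether that case can arise depends on whether every table AM is required to fill in the callback, as is the convention for slot_callbacks.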