Re: Batching in executor

From: Amit Langote <[email protected]>
To: Tomas Vondra <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Subject: Re: Batching in executor
Date: Sat, 20 Dec 2025 23:12:03 +0900
Message-ID: <CA+HiwqEZja5rJ78p3FBDZNvynWsHwanxyt6h0YaK_r84NemXng@mail.gmail.com>
In-Reply-To: <CA+HiwqEPMwhg6pUE4XML2rG4fRqKMpeGTtMwRPGes90f9iOqtg@mail.gmail.com>
References: <CA+HiwqFfAY_ZFqN8wcAEMw71T9hM_kA8UtyHaZZEZtuT3UyogA@mail.gmail.com>
	<[email protected]>
	<CA+HiwqGqeS_94wxiYa8VpymiR_OtFPKDSpX+Me=MYWO45f5yig@mail.gmail.com>
	<[email protected]>
	<CA+HiwqGM0ZTeVHicSkGnCp-2U-jvU-KBQCkPJ0N7nAj_c2LjZg@mail.gmail.com>
	<CA+HiwqHyE7-oOvtZ+OC-4N7DvKSr8Jbu75erMLQ7O4d6gfkBhg@mail.gmail.com>
	<CA+HiwqEPMwhg6pUE4XML2rG4fRqKMpeGTtMwRPGes90f9iOqtg@mail.gmail.com>

On Fri, Dec 5, 2025 at 12:54 AM Amit Langote <[email protected]> wrote:
> On Wed, Oct 29, 2025 at 3:37 PM Amit Langote <[email protected]> wrote:
> > On Tue, Oct 28, 2025 at 10:40 PM Amit Langote <[email protected]> wrote:
> > > That would be nice to see if you have the time, but maybe after I post
> > > a new version.
> >
> > I’ve created a CF entry marked WoA for this in the next CF under the
> > title “Batching in executor, part 1: add batch variant of table AM
> > scan API.” The idea is to track this piece separately so that later
> > parts can have their own entries and we don’t end up with a single
> > long-lived entry that never gets marked done. :-)
>
> I intend to continue working on this, so have just moved it into the
> next fest.  I will post a new patch version next week that addresses
> Daniil's comments and implements a few other things I mentioned I will
> in my reply to Tomas on Oct 28; sorry for the delay.

Before I go on vacation for a couple of weeks, here's an updated patch
set.  I am only including the patches that add the TAM interface, add the
TupleBatch executor wrapper for TAM batches, and use it in SeqScan, as I
had posted before.  There is a new patch to add a BATCHES option to
EXPLAIN.  I renamed the testing GUC from the boolean executor_batching to
executor_batch_rows (integer).  EXPLAIN (BATCHES) example:

+-- Basic batch stats output
+select explain_filter('explain (analyze, batches, buffers off, costs off) select * from batch_test');
+                         explain_filter
+----------------------------------------------------------------
+ Seq Scan on batch_test (actual time=N.N..N.N rows=N.N loops=N)
+   Batches: N  Avg Rows: N.N  Max: N  Min: N
+ Planning Time: N.N ms
+ Execution Time: N.N ms
+(4 rows)
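
For illustration only, the four numbers on the "Batches:" line could be
accumulated along the lines of the standalone sketch below.  The struct
and function names (BatchStats, batch_stats_*) are hypothetical and are
not taken from the patches; the actual instrumentation may well differ.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical accumulator; not the structure used by the patch. */
typedef struct BatchStats
{
	long		nbatches;	/* number of non-empty batches seen */
	long		total_rows;	/* sum of rows over all batches */
	int			min_rows;	/* smallest batch; INT_MAX until the first one */
	int			max_rows;	/* largest batch */
} BatchStats;

static void
batch_stats_init(BatchStats *s)
{
	s->nbatches = 0;
	s->total_rows = 0;
	s->min_rows = INT_MAX;
	s->max_rows = 0;
}

/* Record one batch holding 'nrows' rows (must be > 0). */
static void
batch_stats_add(BatchStats *s, int nrows)
{
	assert(nrows > 0);
	s->nbatches++;
	s->total_rows += nrows;
	if (nrows < s->min_rows)
		s->min_rows = nrows;
	if (nrows > s->max_rows)
		s->max_rows = nrows;
}

/* Average rows per batch; 0.0 if no batch was recorded. */
static double
batch_stats_avg(const BatchStats *s)
{
	return s->nbatches > 0 ? (double) s->total_rows / s->nbatches : 0.0;
}
```

Because batches never cross a page, a page with 150 visible tuples and a
batch capacity of 64 would be recorded as batches of 64, 64, and 22 rows,
which is why Min and Avg Rows can sit below the configured maximum.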

What I have not included in this set are the patches that add
ExecProcNodeBatch(), so that a TupleBatch can be passed from one plan node
to its parent, and the ExprEvalOps (EEOPs) for batched expression
evaluation (qual and aggregate transition).  I would like to focus on the
patches that allow reading batches from TAM into Scan nodes (only SeqScan
for now).

After I'm back from vacation, I will post patches for batched qual
evaluation in SeqScan filter quals (once the bugs are fixed and the code
is polished).  Batching in the Agg node can wait for now.

In the meantime, I would like to hear thoughts on:

* the shape of the TAM APIs -- should I add a TAMBatch or something that is
created, populated, and destroyed by the TAM instead of the current void
pointer and TupleBatchOps that are initialized in the executor like this
(excerpt from 0002):

+    /* Lazily create the AM batch payload. */
+    if (node->ss.ps.ps_Batch->am_payload == NULL)
+    {
+        const TableAmRoutine *tam PG_USED_FOR_ASSERTS_ONLY = scandesc->rs_rd->rd_tableam;
+
+        Assert(tam && tam->scan_begin_batch);
+        node->ss.ps.ps_Batch->am_payload =
+            table_scan_begin_batch(scandesc, node->ss.ps.ps_Batch->maxslots);
+        node->ss.ps.ps_Batch->ops = table_batch_callbacks(node->ss.ss_currentRelation);
+    }

* the shape of TupleBatch itself -- its contents and operations defined in
execBatch.c/h.

* any other thoughts you might have on the project or the patches.
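
To make the first question easier to picture, here is a standalone toy
of the current shape: the caller holds an opaque void pointer that the AM
creates in begin, fills in getnext (from one page at a time), and frees
in end.  Everything in it (ToyScan, ToyBatch, ToyAmRoutine, toy_*) is
hypothetical and only mimics the lifecycle; it is not code from the
patches and does not use any PostgreSQL API.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins; none of these names exist in PostgreSQL or the patches. */
typedef struct ToyScan
{
	const int  *rows;		/* the whole "table" */
	int			nrows;
	int			page_size;	/* rows per "page" */
	int			next;		/* next row to return */
} ToyScan;

typedef struct ToyBatch
{
	const int **items;		/* pointers into the current page, no copies */
	int			nitems;
	int			maxitems;
} ToyBatch;

/* AM callbacks: the caller only ever sees the payload as void *. */
typedef struct ToyAmRoutine
{
	void	   *(*begin_batch) (ToyScan *scan, int maxitems);
	int			(*getnextbatch) (ToyScan *scan, void *am_batch);
	void		(*end_batch) (ToyScan *scan, void *am_batch);
} ToyAmRoutine;

static void *
toy_begin_batch(ToyScan *scan, int maxitems)
{
	ToyBatch   *b;

	(void) scan;
	assert(maxitems > 0);
	b = malloc(sizeof(ToyBatch));
	if (b == NULL)
		abort();
	b->items = malloc(sizeof(const int *) * (size_t) maxitems);
	if (b->items == NULL)
		abort();
	b->nitems = 0;
	b->maxitems = maxitems;
	return b;
}

/* Fill the batch from one page only; returns 0 at end of scan. */
static int
toy_getnextbatch(ToyScan *scan, void *am_batch)
{
	ToyBatch   *b = am_batch;
	int			page_end;

	b->nitems = 0;
	if (scan->next >= scan->nrows)
		return 0;

	/* One past the last row of the page that contains scan->next. */
	page_end = (scan->next / scan->page_size + 1) * scan->page_size;
	if (page_end > scan->nrows)
		page_end = scan->nrows;

	while (scan->next < page_end && b->nitems < b->maxitems)
		b->items[b->nitems++] = &scan->rows[scan->next++];

	return b->nitems;
}

static void
toy_end_batch(ToyScan *scan, void *am_batch)
{
	ToyBatch   *b = am_batch;

	(void) scan;
	free(b->items);
	free(b);
}

static const ToyAmRoutine toy_am = {
	.begin_batch = toy_begin_batch,
	.getnextbatch = toy_getnextbatch,
	.end_batch = toy_end_batch
};
```

With 10 rows, 4 rows per page, and a capacity of 3, successive
getnextbatch calls in this toy return 3, 1, 3, 1, 2, and then 0.  A
TAMBatch-style alternative would replace the void pointer with a typed
struct that the AM owns; the toy only shows the opaque-pointer variant.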

Benchmark:

Scripts attached if you want to try them.

(Negative % = faster than master)

SELECT * FROM table LIMIT 1 OFFSET N:
Rows      Master    batch=0   vs master   batch=64   vs master
--------------------------------------------------------------
1M          11ms       11ms        -0%        8ms       -23%
2M          23ms       22ms        -1%       18ms       -23%
3M          36ms       34ms        -5%       27ms       -25%
4M          51ms       50ms        -2%       38ms       -26%
5M          64ms       64ms        -1%       48ms       -26%
10M        147ms      145ms        -1%      114ms       -22%

SELECT * FROM table WHERE a > 0 LIMIT 1 OFFSET N:
Rows      Master    batch=0   vs master   batch=64   vs master
--------------------------------------------------------------
1M          31ms       31ms        +0%       16ms       -48%
2M          64ms       64ms        -0%       34ms       -47%
3M          67ms       66ms        -1%       50ms       -25%
4M          91ms       90ms        -1%       71ms       -22%
5M         119ms      113ms        -5%       88ms       -26%
10M        262ms      261ms        -0%      205ms       -21%

SELECT * FROM table WHERE o > 0 LIMIT 1 OFFSET N (last column - deform-heavy):
Rows      Master    batch=0   vs master   batch=64   vs master
--------------------------------------------------------------
1M          38ms       37ms        -2%       38ms        +0%
2M          79ms       75ms        -6%       77ms        -4%
3M         182ms      186ms        +2%      160ms       -12%
4M         250ms      252ms        +1%      219ms       -12%
5M         314ms      316ms        +1%      273ms       -13%
10M        647ms      651ms        +1%      604ms        -7%

The smaller improvement with WHERE o > 0 is expected since accessing the
last column requires deforming most of the tuple, which dominates the
execution time. Future work on batched tuple deformation could help here.

Note on regressions with executor_batch_rows = 0 vs master:

I am not seeing the regressions with batch_rows=0 vs master as I did
before.  I think some of it might have to do with my removing some stray
fields from HeapScanData that were accidentally left there in the earlier
patches.  Also, the regressions I observed earlier seemed to have more to
do with compiling the master tree with gcc and the patched tree with
clang, which resulted in code layout changes that seemed to make the
patched binary regress.  It would be nice if others could verify these
numbers.

-- 
Thanks, Amit Langote


Attachments:

  [application/octet-stream] v4-0001-Add-batch-table-AM-API-and-heapam-implementation.patch (13.4K, 3-v4-0001-Add-batch-table-AM-API-and-heapam-implementation.patch)
From 24a3d208db93312788745882a01b526957919966 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sat, 20 Dec 2025 17:21:56 +0900
Subject: [PATCH v4 1/3] Add batch table AM API and heapam implementation

Introduce new table AM callbacks to fetch multiple tuples per call.
This reduces per-tuple call overhead by letting executor nodes work
in batches.

Define a HeapBatch structure and supporting code in heapam.h.
Batches are limited to tuples from a single page and at most
EXEC_BATCH_ROWS (currently 64) entries.

Provide initial heapam support with heapgettup_pagemode_batch().
No executor node is switched over yet; a later commit will adapt
SeqScan to use this API. Other nodes may adopt it in the future.

Also add pgstat_count_heap_getnext_batch() to record batched fetches
in pgstat.

Reviewed-by: Daniil Davydov <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqFfAY_ZFqN8wcAEMw71T9hM_kA8UtyHaZZEZtuT3UyogA@mail.gmail.com
---
 src/backend/access/heap/heapam.c         | 219 ++++++++++++++++++++++-
 src/backend/access/heap/heapam_handler.c |   4 +
 src/include/access/heapam.h              |  18 ++
 src/include/access/tableam.h             |  58 ++++++
 src/include/pgstat.h                     |   5 +
 5 files changed, 303 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6daf4a87dec..fcc0813f139 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1023,7 +1023,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 					int nkeys,
 					ScanKey key)
 {
-	HeapTuple	tuple = &(scan->rs_ctup);
+	HeapTuple tuple = &scan->rs_ctup;
 	Page		page;
 	uint32		lineindex;
 	uint32		linesleft;
@@ -1104,6 +1104,132 @@ continue_page:
 	scan->rs_inited = false;
 }
 
+/*
+ * heapgettup_pagemode_batch
+ *		Collect up to 'maxitems' visible tuples from a single page in page mode.
+ *
+ * This function returns a *batch* of tuples from one heap page. If the
+ * current page (as tracked by the scan desc) has no more tuples left,
+ * it will advance to the next page and prepare it (via heap_prepare_pagescan).
+ * It will not cross a page boundary while filling the batch.
+ *
+ * Return value:
+ *		number of tuples written into 'tdata' (0 at end-of-scan).
+ *
+ * Side effects:
+ *	- Ensures rs_cbuf pins the page from which tuples were produced.
+ *	- Sets rs_cblock, rs_cindex, rs_ntuples consistently (same as
+ *	  heapgettup_pagemode’s inner-loop effects).
+ *	- Does *not* change buffer pin counts except through normal page
+ *	  transitions performed by heap_fetch_next_buffer().
+ */
+static int
+heapgettup_pagemode_batch(HeapScanDesc scan,
+						  ScanDirection dir,
+						  int nkeys, ScanKey key,
+						  HeapTupleData *tdata,
+						  int maxitems)
+{
+	Page		page;
+	uint32		lineindex;
+	uint32		linesleft;
+	int			nout = 0;
+	Relation	rel = scan->rs_base.rs_rd;
+	Oid			tableOid = RelationGetRelid(rel);
+	TupleDesc	tupdesc = key ? RelationGetDescr(rel) : NULL;
+
+	/*
+	 * Current batching limitations (may be relaxed in future):
+	 *
+	 * - Forward scans only: backward scan support would require changes to
+	 *   batch iteration and page advancement logic.
+	 *
+	 * - Pagemode required: batching relies on the pre-built rs_vistuples[]
+	 *   array from heap_prepare_pagescan(). This is guaranteed by
+	 *   ScanCanUseBatching() which only enables batching when SO_ALLOW_PAGEMODE
+	 *   is set. Unlike heap_getnextslot, we don't support dynamic fallback to
+	 *   tuple-at-a-time mode since the batch execution path is selected at
+	 *   ExecInit time.
+	 */
+	Assert(ScanDirectionIsForward(dir));
+	Assert(scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE);
+	Assert(maxitems > 0);
+
+	/*
+	 * If we have no current page (or the current page is exhausted),
+	 * advance to the next page that has any visible tuples and prepare it.
+	 * This mirrors the outer loop of heapgettup_pagemode(), but we stop
+	 * as soon as we have a prepared page; we never produce from two pages.
+	 */
+	for (;;)
+	{
+		if (BufferIsValid(scan->rs_cbuf))
+		{
+			/* Are there more visible tuples left on this page? */
+			lineindex = scan->rs_cindex + dir;
+			linesleft = (lineindex <= (uint32) scan->rs_ntuples) ?
+				(scan->rs_ntuples - lineindex) : 0;
+			if (linesleft > 0)
+				break;	/* continue on this page */
+		}
+
+		/* Move to next page and prepare its visible tuple list. */
+		heap_fetch_next_buffer(scan, dir);
+
+		if (!BufferIsValid(scan->rs_cbuf))
+		{
+			/* end of scan; keep rs_cbuf invalid like heapgettup_pagemode */
+			scan->rs_cblock = InvalidBlockNumber;
+			scan->rs_prefetch_block = InvalidBlockNumber;
+			scan->rs_inited = false;
+			return 0;
+		}
+
+		Assert(BufferGetBlockNumber(scan->rs_cbuf) == scan->rs_cblock);
+		heap_prepare_pagescan((TableScanDesc) scan);
+
+		/* After prepare, either rs_ntuples > 0 or we'll loop again. */
+		if (scan->rs_ntuples > 0)
+		{
+			lineindex = 0;
+			linesleft = scan->rs_ntuples;
+			break;
+		}
+		/* else: page had no visible tuples; continue to next page */
+	}
+
+	/* From here on, we must only read tuples from this single page. */
+	page = BufferGetPage(scan->rs_cbuf);
+
+	/*
+	 * Walk rs_vistuples[] from 'lineindex', copying headers into tdata[]
+	 * until either the page is exhausted or the batch capacity is reached.
+	 */
+	for (; linesleft > 0 && nout < maxitems; linesleft--, lineindex += dir)
+	{
+		OffsetNumber	lineoff;
+		ItemId			lpp;
+		HeapTupleData *dst = &tdata[nout];
+
+		Assert(lineindex <= (uint32) scan->rs_ntuples);
+		lineoff = scan->rs_vistuples[lineindex];
+		lpp = PageGetItemId(page, lineoff);
+		Assert(ItemIdIsNormal(lpp));
+
+		dst->t_data = (HeapTupleHeader) PageGetItem(page, lpp);
+		dst->t_len = ItemIdGetLength(lpp);
+		dst->t_tableOid = tableOid;
+		ItemPointerSet(&(dst->t_self), scan->rs_cblock, lineoff);
+
+		if (key != NULL && !HeapKeyTest(dst, tupdesc, nkeys, key))
+			continue;
+
+		scan->rs_cindex = lineindex;
+		nout++;
+	}
+
+	return nout;
+}
 
 /* ----------------------------------------------------------------
  *					 heap access method interface
@@ -1436,6 +1562,97 @@ heap_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableSlot *s
 	return true;
 }
 
+/*---------- Batching support -----------*/
+
+/*
+ * heap_scan_begin_batch
+ *
+ * Allocate a HeapBatch with space for 'maxitems' tuple headers. No pin is
+ * taken here. Memory is allocated under the scan's memory context.
+ */
+void *
+heap_begin_batch(TableScanDesc sscan, int maxitems)
+{
+	HeapBatch  *hb;
+	Oid			relid;
+
+	Assert(maxitems > 0);
+
+	hb = palloc(sizeof(HeapBatch));
+	hb->tupdata = palloc(sizeof(HeapTupleData) * maxitems);
+	hb->maxitems = maxitems;
+	hb->nitems = 0;
+	hb->buf = InvalidBuffer;
+
+	/* Initialize static fields of HeapTupleData. Row bodies remain on page. */
+	relid = RelationGetRelid(sscan->rs_rd);
+	for (int i = 0; i < maxitems; i++)
+		hb->tupdata[i].t_tableOid = relid;
+
+	return hb;
+}
+
+/*
+ * heap_scan_end_batch
+ *
+ * Release any outstanding pin and free the batch allocations. Caller will
+ * not use 'am_batch' after this point.
+ */
+void
+heap_end_batch(TableScanDesc sscan, void *am_batch)
+{
+	HeapBatch *hb = (HeapBatch *) am_batch;
+
+	if (BufferIsValid(hb->buf))
+		ReleaseBuffer(hb->buf);
+
+	pfree(hb->tupdata);
+	pfree(hb);
+}
+
+int
+heap_getnextbatch(TableScanDesc sscan, void *am_batch, ScanDirection dir)
+{
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+	HeapBatch  *hb = (HeapBatch *) am_batch;
+	Buffer		curbuf;
+	int			n;
+
+	Assert(ScanDirectionIsForward(dir));
+	Assert(sscan->rs_flags & SO_ALLOW_PAGEMODE);
+	Assert(hb->maxitems > 0);
+
+	/* Drop prior batch pin, if any. */
+	if (BufferIsValid(hb->buf))
+	{
+		ReleaseBuffer(hb->buf);
+		hb->buf = InvalidBuffer;
+	}
+
+	hb->nitems = 0;
+
+	/* One call per batch, never crosses a page. */
+	n = heapgettup_pagemode_batch(scan, dir,
+								  sscan->rs_nkeys, sscan->rs_key,
+								  hb->tupdata, hb->maxitems);
+
+	if (n == 0)
+		return 0;	/* end of scan */
+
+	/* Hold a shared pin for the batch lifetime so t_data stays valid. */
+	curbuf = scan->rs_cbuf;
+	IncrBufferRefCount(curbuf);
+	hb->buf = curbuf;
+
+	/* Per-tuple stats (can be collapsed into a future _multi() call). */
+	pgstat_count_heap_getnext_batch(sscan->rs_rd, n);
+
+	hb->nitems = n;
+	return n;
+}
+
+/*----- End of batching support -----*/
+
 void
 heap_set_tidrange(TableScanDesc sscan, ItemPointer mintid,
 				  ItemPointer maxtid)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index dd4fe6bf62f..550b788553c 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2623,6 +2623,10 @@ static const TableAmRoutine heapam_methods = {
 	.scan_rescan = heap_rescan,
 	.scan_getnextslot = heap_getnextslot,
 
+	.scan_begin_batch = heap_begin_batch,
+	.scan_getnextbatch = heap_getnextbatch,
+	.scan_end_batch = heap_end_batch,
+
 	.scan_set_tidrange = heap_set_tidrange,
 	.scan_getnextslot_tidrange = heap_getnextslot_tidrange,
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f7e4ae3843c..f6675043fb3 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -101,6 +101,19 @@ typedef struct HeapScanDescData
 } HeapScanDescData;
 typedef struct HeapScanDescData *HeapScanDesc;
 
+/*
+ * HeapBatch -- stateless per-batch buffer. A batch pins one page and
+ * exposes up to maxitems HeapTupleData headers whose t_data point into that
+ * page.
+ */
+typedef struct HeapBatch
+{
+	HeapTupleData  *tupdata;	/* len = maxitems; headers only */
+	int				nitems;		/* tuples produced in last getnextbatch() */
+	int				maxitems;	/* fixed capacity set at begin_batch() */
+	Buffer			buf;		/* single pinned buffer for this batch */
+} HeapBatch;
+
 typedef struct BitmapHeapScanDescData
 {
 	HeapScanDescData rs_heap_base;
@@ -337,6 +350,11 @@ extern void heap_endscan(TableScanDesc sscan);
 extern HeapTuple heap_getnext(TableScanDesc sscan, ScanDirection direction);
 extern bool heap_getnextslot(TableScanDesc sscan,
 							 ScanDirection direction, TupleTableSlot *slot);
+
+extern void *heap_begin_batch(TableScanDesc sscan, int maxitems);
+extern void heap_end_batch(TableScanDesc sscan, void *am_batch);
+extern int heap_getnextbatch(TableScanDesc sscan, void *am_batch, ScanDirection dir);
+
 extern void heap_set_tidrange(TableScanDesc sscan, ItemPointer mintid,
 							  ItemPointer maxtid);
 extern bool heap_getnextslot_tidrange(TableScanDesc sscan,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2fa790b6bf5..3ec3c3dd008 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -351,6 +351,16 @@ typedef struct TableAmRoutine
 									 ScanDirection direction,
 									 TupleTableSlot *slot);
 
+	/* ------------------------------------------------------------------------
+	 * Batched scan support
+	 * ------------------------------------------------------------------------
+	 */
+
+	void	   *(*scan_begin_batch)(TableScanDesc sscan, int maxitems);
+	int			(*scan_getnextbatch)(TableScanDesc sscan, void *am_batch,
+									 ScanDirection dir);
+	void		(*scan_end_batch)(TableScanDesc sscan, void *am_batch);
+
 	/*-----------
 	 * Optional functions to provide scanning for ranges of ItemPointers.
 	 * Implementations must either provide both of these functions, or neither
@@ -1036,6 +1046,54 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 	return sscan->rs_rd->rd_tableam->scan_getnextslot(sscan, direction, slot);
 }
 
+/*
+ * table_scan_begin_batch
+ *		Allocate AM-owned batch payload with capacity 'maxitems'.
+ */
+static inline void *
+table_scan_begin_batch(TableScanDesc sscan, int maxitems)
+{
+	const TableAmRoutine *tam = sscan->rs_rd->rd_tableam;
+
+	Assert(tam->scan_begin_batch != NULL);
+
+	return tam->scan_begin_batch(sscan, maxitems);
+}
+
+/*
+ * table_scan_getnextbatch
+ *		Fill next batch from the AM. Returns number of tuples, 0 => EOS.
+ *		Batches are single-page in v1. Direction is forward only in v1.
+ */
+static inline int
+table_scan_getnextbatch(TableScanDesc sscan, void *am_batch, ScanDirection dir)
+{
+	const TableAmRoutine *tam = sscan->rs_rd->rd_tableam;
+
+	/* Only forward scans are supported in the batched mode. */
+	Assert(dir == ForwardScanDirection);
+	Assert(tam->scan_getnextbatch != NULL);
+
+	return tam->scan_getnextbatch(sscan, am_batch, dir);
+}
+
+/*
+ * table_scan_end_batch
+ *		Release AM-owned resources for the batch payload.
+ */
+static inline void
+table_scan_end_batch(TableScanDesc sscan, void *am_batch)
+{
+	const TableAmRoutine *tam = sscan->rs_rd->rd_tableam;
+
+	if (am_batch == NULL)
+		return;
+
+	Assert(tam->scan_end_batch != NULL);
+
+	tam->scan_end_batch(sscan, am_batch);
+}
+
 /* ----------------------------------------------------------------------------
  * TID Range scanning related functions.
  * ----------------------------------------------------------------------------
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 6714363144a..85f76dee468 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -697,6 +697,11 @@ extern void pgstat_report_analyze(Relation rel,
 		if (pgstat_should_count_relation(rel))						\
 			(rel)->pgstat_info->counts.tuples_returned++;			\
 	} while (0)
+#define pgstat_count_heap_getnext_batch(rel, n)						\
+	do {															\
+		if (pgstat_should_count_relation(rel))						\
+			(rel)->pgstat_info->counts.tuples_returned += n;		\
+	} while (0)
 #define pgstat_count_heap_fetch(rel)								\
 	do {															\
 		if (pgstat_should_count_relation(rel))						\
-- 
2.47.3



  [application/octet-stream] v4-0002-SeqScan-add-batch-driven-variants-returning-slots.patch (27.6K, 4-v4-0002-SeqScan-add-batch-driven-variants-returning-slots.patch)
From 5630836aefb87948bb745d7faad01e9e3534a64c Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sat, 20 Dec 2025 17:23:12 +0900
Subject: [PATCH v4 2/3] SeqScan: add batch-driven variants returning slots

Teach SeqScan to drive the table AM via the new batch API added in
the previous commit, while still returning one TupleTableSlot at a
time to callers. This reduces per-tuple AM crossings without
changing the node interface seen by parents.

Add TupleBatch and supporting code in execBatch.c/h to hold executor
side batching state. PlanState gains ps_Batch to carry the active
TupleBatch when a node supports batching.

Wire up runtime selection in ExecInitSeqScan using
ScanCanUseBatching(). When executor_batch_rows is greater than zero,
EPQ is inactive, the scan is not backward, and the relation supports
batching, ps.ExecProcNode is set to a batch-driven variant. Otherwise
the non-batch path is used.

Plan shape and EXPLAIN output remain unchanged; only the internal
tuple flow differs when batching is enabled and allowed.

Add executor_batch_rows GUC to specify the maximum number of rows
that can be added into a batch.

Notes / current limits:

- With the current heapam, batches are composed from a single page, so
  the batch may not always be full. Future work may let SeqScan and/or
  AMs top up batches across pages when safe to do so.

Reviewed-by: Daniil Davydov <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqFfAY_ZFqN8wcAEMw71T9hM_kA8UtyHaZZEZtuT3UyogA@mail.gmail.com
---
 src/backend/access/heap/heapam.c          |  29 ++++
 src/backend/access/heap/heapam_handler.c  |  16 ++
 src/backend/access/table/tableam.c        |  11 ++
 src/backend/executor/Makefile             |   1 +
 src/backend/executor/execBatch.c          | 117 ++++++++++++++
 src/backend/executor/execScan.c           |  31 ++++
 src/backend/executor/meson.build          |   1 +
 src/backend/executor/nodeSeqscan.c        | 176 +++++++++++++++++++++-
 src/backend/utils/init/globals.c          |   3 +
 src/backend/utils/misc/guc_parameters.dat |   9 ++
 src/include/access/heapam.h               |   1 +
 src/include/access/tableam.h              |  27 ++++
 src/include/executor/execBatch.h          |  99 ++++++++++++
 src/include/executor/execScan.h           |  69 +++++++++
 src/include/executor/executor.h           |   4 +
 src/include/miscadmin.h                   |   1 +
 src/include/nodes/execnodes.h             |   4 +
 17 files changed, 598 insertions(+), 1 deletion(-)
 create mode 100644 src/backend/executor/execBatch.c
 create mode 100644 src/include/executor/execBatch.h

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fcc0813f139..0c0b2384f0e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1592,6 +1592,35 @@ heap_begin_batch(TableScanDesc sscan, int maxitems)
 	return hb;
 }
 
+/*
+ * heap_materialize_batch_all
+ *
+ * Bind all tuples of the current batch into 'slots'. We bind the
+ * HeapTupleData header that points into the pinned page. No per-row copy.
+ */
+void
+heap_materialize_batch_all(void *am_batch, TupleTableSlot **slots, int n)
+{
+	HeapBatch *hb = (HeapBatch *) am_batch;
+
+	Assert(n <= hb->nitems);
+
+	for (int i = 0; i < n; i++)
+	{
+		HeapTupleData *tuple = &hb->tupdata[i];
+		HeapTupleTableSlot *slot = (HeapTupleTableSlot *) slots[i];
+
+		/* Inline of ExecStoreHeapTuple(tuple, slot, false) */
+		slot->tuple = tuple;
+		slot->off = 0;
+		slot->base.tts_nvalid = 0;
+		slot->base.tts_flags &= ~(TTS_FLAG_EMPTY | TTS_FLAG_SHOULDFREE);
+		slot->base.tts_tid = tuple->t_self;
+		slot->base.tts_tableOid = tuple->t_tableOid;
+		slot->base.tts_flags &= ~(TTS_FLAG_SHOULDFREE | TTS_FLAG_EMPTY);
+	}
+}
+
 /*
  * heap_scan_end_batch
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 550b788553c..a4de7e5b4f5 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -72,6 +72,21 @@ heapam_slot_callbacks(Relation relation)
 	return &TTSOpsBufferHeapTuple;
 }
 
+/* ------------------------------------------------------------------------
+ * TupleBatch related callbacks for heap AM
+ * ------------------------------------------------------------------------
+ */
+
+static const TupleBatchOps TupleBatchHeapOps =
+{
+	.materialize_all = heap_materialize_batch_all
+};
+
+static const TupleBatchOps *
+heapam_batch_callbacks(Relation relation)
+{
+	return &TupleBatchHeapOps;
+}
 
 /* ------------------------------------------------------------------------
  * Index Scan Callbacks for heap AM
@@ -2617,6 +2632,7 @@ static const TableAmRoutine heapam_methods = {
 	.type = T_TableAmRoutine,
 
 	.slot_callbacks = heapam_slot_callbacks,
+	.batch_callbacks = heapam_batch_callbacks,
 
 	.scan_begin = heap_beginscan,
 	.scan_end = heap_endscan,
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 73ebc01a08f..d281aacaf94 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -103,6 +103,17 @@ table_slot_create(Relation relation, List **reglist)
 	return slot;
 }
 
+/* ----------------------------------------------------------------------------
+ * TupleBatch support routines
+ * ----------------------------------------------------------------------------
+ */
+const TupleBatchOps *
+table_batch_callbacks(Relation relation)
+{
+	if (relation->rd_tableam)
+		return relation->rd_tableam->batch_callbacks(relation);
+	elog(ERROR, "relation does not support TupleBatch operations");
+}
 
 /* ----------------------------------------------------------------------------
  * Table scan functions.
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 11118d0ce02..3e72f3fe03c 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -15,6 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = \
 	execAmi.o \
 	execAsync.o \
+	execBatch.o \
 	execCurrent.o \
 	execExpr.o \
 	execExprInterp.o \
diff --git a/src/backend/executor/execBatch.c b/src/backend/executor/execBatch.c
new file mode 100644
index 00000000000..007ae535687
--- /dev/null
+++ b/src/backend/executor/execBatch.c
@@ -0,0 +1,117 @@
+/*-------------------------------------------------------------------------
+ *
+ * execBatch.c
+ *		Helpers for TupleBatch
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/execBatch.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include "executor/execBatch.h"
+
+/*
+ * TupleBatchCreate
+ *		Allocate and initialize a new TupleBatch envelope.
+ */
+TupleBatch *
+TupleBatchCreate(TupleDesc scandesc, int capacity)
+{
+	TupleBatch  *b;
+	TupleTableSlot **inslots,
+				   **outslots;
+
+	inslots = palloc(sizeof(TupleTableSlot *) * capacity);
+	outslots = palloc(sizeof(TupleTableSlot *) * capacity);
+	for (int i = 0; i < capacity; i++)
+		inslots[i] = MakeSingleTupleTableSlot(scandesc, &TTSOpsHeapTuple);
+
+	b = (TupleBatch *) palloc(sizeof(TupleBatch));
+
+	/* Initial state: empty envelope */
+	b->am_payload = NULL;
+	b->ntuples = 0;
+	b->inslots = inslots;
+	b->outslots = outslots;
+	b->activeslots = NULL;
+	b->outslots = outslots;
+	b->maxslots = capacity;
+
+	b->nvalid = 0;
+	b->next = 0;
+
+	return b;
+}
+
+/*
+ * TupleBatchReset
+ *		Reset an existing TupleBatch envelope to empty.
+ */
+void
+TupleBatchReset(TupleBatch *b, bool drop_slots)
+{
+	if (b == NULL)
+		return;
+
+	for (int i = 0; i < b->maxslots; i++)
+	{
+		ExecClearTuple(b->inslots[i]);
+		if (drop_slots)
+			ExecDropSingleTupleTableSlot(b->inslots[i]);
+	}
+
+	if (drop_slots)
+	{
+		pfree(b->inslots);
+		pfree(b->outslots);
+		b->inslots = b->outslots = NULL;
+	}
+
+	b->ntuples = 0;
+	b->nvalid = 0;
+	b->next = 0;
+	b->activeslots = NULL;
+}
+
+void
+TupleBatchUseInput(TupleBatch *b, int nvalid)
+{
+	b->materialized = true;
+	b->activeslots = b->inslots;
+	b->nvalid = nvalid;
+	b->next = 0;
+}
+
+void
+TupleBatchUseOutput(TupleBatch *b, int nvalid)
+{
+	b->materialized = true;
+	b->activeslots = b->outslots;
+	b->nvalid = nvalid;
+	b->next = 0;
+}
+
+bool
+TupleBatchIsValid(TupleBatch *b)
+{
+	return	b != NULL &&
+			b->maxslots > 0 &&
+			b->inslots != NULL &&
+			b->outslots != NULL;
+}
+
+void
+TupleBatchRewind(TupleBatch *b)
+{
+	b->next = 0;
+}
+
+int
+TupleBatchGetNumValid(TupleBatch *b)
+{
+	return b->nvalid;
+}
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 31ed4783c1d..ba25daa5e46 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -18,6 +18,7 @@
  */
 #include "postgres.h"
 
+#include "access/tableam.h"
 #include "executor/executor.h"
 #include "executor/execScan.h"
 #include "miscadmin.h"
@@ -154,3 +155,33 @@ ExecScanReScan(ScanState *node)
 		}
 	}
 }
+
+bool
+ScanCanUseBatching(ScanState *scanstate, int eflags)
+{
+	Relation	relation = scanstate->ss_currentRelation;
+
+	return	executor_batch_rows > 0 &&
+			(scanstate->ps.state->es_epq_active == NULL) &&
+			!(eflags & EXEC_FLAG_BACKWARD) &&
+			relation && table_supports_batching(relation);
+}
+
+void
+ScanResetBatching(ScanState *scanstate, bool drop)
+{
+	TupleBatch *b = scanstate->ps.ps_Batch;
+
+	if (b)
+	{
+		TupleBatchReset(b, drop);
+		if (b->am_payload)
+		{
+			table_scan_end_batch(scanstate->ss_currentScanDesc,
+								 b->am_payload);
+			b->am_payload = NULL;
+		}
+		if (drop)
+			pfree(b);
+	}
+}
diff --git a/src/backend/executor/meson.build b/src/backend/executor/meson.build
index 2cea41f8771..40ffc28f3cb 100644
--- a/src/backend/executor/meson.build
+++ b/src/backend/executor/meson.build
@@ -3,6 +3,7 @@
 backend_sources += files(
   'execAmi.c',
   'execAsync.c',
+  'execBatch.c',
   'execCurrent.c',
   'execExpr.c',
   'execExprInterp.c',
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..a9071e32560 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -203,6 +203,171 @@ ExecSeqScanEPQ(PlanState *pstate)
 					(ExecScanRecheckMtd) SeqRecheck);
 }
 
+/* ----------------------------------------------------------------
+ *						Batch Support
+ * ----------------------------------------------------------------
+ */
+static inline bool
+SeqNextBatch(SeqScanState *node)
+{
+	TableScanDesc scandesc;
+	EState	   *estate;
+	ScanDirection direction;
+
+	Assert(node->ss.ps.ps_Batch != NULL);
+
+	/*
+	 * get information from the estate and scan state
+	 */
+	scandesc = node->ss.ss_currentScanDesc;
+	estate = node->ss.ps.state;
+	direction = estate->es_direction;
+	Assert(direction == ForwardScanDirection);
+
+	if (scandesc == NULL)
+	{
+		/*
+		 * We reach here if the scan is not parallel, or if we're serially
+		 * executing a scan that was planned to be parallel.
+		 */
+		scandesc = table_beginscan(node->ss.ss_currentRelation,
+								   estate->es_snapshot,
+								   0, NULL);
+		node->ss.ss_currentScanDesc = scandesc;
+	}
+
+	/* Lazily create the AM batch payload. */
+	if (node->ss.ps.ps_Batch->am_payload == NULL)
+	{
+		const TableAmRoutine *tam PG_USED_FOR_ASSERTS_ONLY = scandesc->rs_rd->rd_tableam;
+
+		Assert(tam && tam->scan_begin_batch);
+		node->ss.ps.ps_Batch->am_payload =
+			table_scan_begin_batch(scandesc, node->ss.ps.ps_Batch->maxslots);
+		node->ss.ps.ps_Batch->ops = table_batch_callbacks(node->ss.ss_currentRelation);
+	}
+
+	node->ss.ps.ps_Batch->ntuples =
+		table_scan_getnextbatch(scandesc, node->ss.ps.ps_Batch->am_payload, direction);
+	node->ss.ps.ps_Batch->nvalid = node->ss.ps.ps_Batch->ntuples;
+	node->ss.ps.ps_Batch->materialized = false;
+
+	return node->ss.ps.ps_Batch->ntuples > 0;
+}
+
+static inline bool
+SeqNextBatchMaterialize(SeqScanState *node)
+{
+	if (SeqNextBatch(node))
+	{
+		TupleBatchMaterializeAll(node->ss.ps.ps_Batch);
+		return true;
+	}
+
+	return false;
+}
+
+static TupleTableSlot *
+ExecSeqScanBatchSlot(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual == NULL);
+	Assert(pstate->ps_ProjInfo == NULL);
+
+	return ExecScanExtendedBatchSlot(&node->ss,
+									 (ExecScanAccessBatchMtd) SeqNextBatchMaterialize,
+									 NULL, NULL);
+}
+
+static TupleTableSlot *
+ExecSeqScanBatchSlotWithQual(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
+	/*
+	 * Use pg_assume() for != NULL tests to make the compiler realize no
+	 * runtime check for the field is needed in ExecScanExtendedBatchSlot().
+	 */
+	Assert(pstate->state->es_epq_active == NULL);
+	pg_assume(pstate->qual != NULL);
+	Assert(pstate->ps_ProjInfo == NULL);
+
+	return ExecScanExtendedBatchSlot(&node->ss,
+									 (ExecScanAccessBatchMtd) SeqNextBatchMaterialize,
+									 pstate->qual, NULL);
+}
+
+/*
+ * Variant of ExecSeqScanBatchSlot() for when projection is required.
+ */
+static TupleTableSlot *
+ExecSeqScanBatchSlotWithProject(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
+	Assert(pstate->state->es_epq_active == NULL);
+	Assert(pstate->qual == NULL);
+	pg_assume(pstate->ps_ProjInfo != NULL);
+
+	return ExecScanExtendedBatchSlot(&node->ss,
+									 (ExecScanAccessBatchMtd) SeqNextBatchMaterialize,
+									 NULL, pstate->ps_ProjInfo);
+}
+
+/*
+ * Variant of ExecSeqScanBatchSlot() for when both qual evaluation and
+ * projection are required.
+ */
+static TupleTableSlot *
+ExecSeqScanBatchSlotWithQualProject(PlanState *pstate)
+{
+	SeqScanState *node = castNode(SeqScanState, pstate);
+
+	Assert(pstate->state->es_epq_active == NULL);
+	pg_assume(pstate->qual != NULL);
+	pg_assume(pstate->ps_ProjInfo != NULL);
+
+	return ExecScanExtendedBatchSlot(&node->ss,
+									 (ExecScanAccessBatchMtd) SeqNextBatchMaterialize,
+									 pstate->qual, pstate->ps_ProjInfo);
+}
+
+/* Batch SeqScan enablement and dispatch */
+static void
+SeqScanInitBatching(SeqScanState *scanstate, int eflags)
+{
+	const int cap = executor_batch_rows;
+	TupleDesc	scandesc = RelationGetDescr(scanstate->ss.ss_currentRelation);
+
+	scanstate->ss.ps.ps_Batch = TupleBatchCreate(scandesc, cap);
+
+	/* Choose batch variant to preserve your specialization matrix */
+	if (scanstate->ss.ps.qual == NULL)
+	{
+		if (scanstate->ss.ps.ps_ProjInfo == NULL)
+		{
+			scanstate->ss.ps.ExecProcNode = ExecSeqScanBatchSlot;
+		}
+		else
+		{
+			scanstate->ss.ps.ExecProcNode = ExecSeqScanBatchSlotWithProject;
+		}
+	}
+	else
+	{
+		if (scanstate->ss.ps.ps_ProjInfo == NULL)
+		{
+			scanstate->ss.ps.ExecProcNode = ExecSeqScanBatchSlotWithQual;
+		}
+		else
+		{
+			scanstate->ss.ps.ExecProcNode = ExecSeqScanBatchSlotWithQualProject;
+		}
+	}
+}
+
 /* ----------------------------------------------------------------
  *		ExecInitSeqScan
  * ----------------------------------------------------------------
@@ -211,6 +376,7 @@ SeqScanState *
 ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 {
 	SeqScanState *scanstate;
+	bool		use_batching;
 
 	/*
 	 * Once upon a time it was possible to have an outerPlan of a SeqScan, but
@@ -241,9 +407,12 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 							 node->scan.scanrelid,
 							 eflags);
 
+	use_batching = ScanCanUseBatching(&scanstate->ss, eflags);
+
 	/* and create slot with the appropriate rowtype */
 	ExecInitScanTupleSlot(estate, &scanstate->ss,
 						  RelationGetDescr(scanstate->ss.ss_currentRelation),
+						  use_batching ? &TTSOpsHeapTuple :
 						  table_slot_callbacks(scanstate->ss.ss_currentRelation));
 
 	/*
@@ -280,6 +449,9 @@ ExecInitSeqScan(SeqScan *node, EState *estate, int eflags)
 			scanstate->ss.ps.ExecProcNode = ExecSeqScanWithQualProject;
 	}
 
+	if (use_batching)
+		SeqScanInitBatching(scanstate, eflags);
+
 	return scanstate;
 }
 
@@ -299,6 +471,8 @@ ExecEndSeqScan(SeqScanState *node)
 	 */
 	scanDesc = node->ss.ss_currentScanDesc;
 
+	ScanResetBatching(&node->ss, true);
+
 	/*
 	 * close heap scan
 	 */
@@ -327,7 +501,7 @@ ExecReScanSeqScan(SeqScanState *node)
 	if (scan != NULL)
 		table_rescan(scan,		/* scan desc */
 					 NULL);		/* new scan keys */
-
+	ScanResetBatching(&node->ss, false);
 	ExecScanReScan((ScanState *) node);
 }
 
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index d31cb45a058..266502e9778 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -165,3 +165,6 @@ int			notify_buffers = 16;
 int			serializable_buffers = 32;
 int			subtransaction_buffers = 0;
 int			transaction_buffers = 0;
+
+/* executor batching */
+int			executor_batch_rows = 64;
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 3b9d8349078..fd97d26c073 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -1001,6 +1001,15 @@
   boot_val => 'true',
 },
 
+{ name => 'executor_batch_rows', type => 'int', context => 'PGC_USERSET', group => 'DEVELOPER_OPTIONS',
+  short_desc => 'Number of rows to include in batches during execution.',
+  flags => 'GUC_NOT_IN_SAMPLE',
+  variable => 'executor_batch_rows',
+  boot_val => '64',
+  min => '0',
+  max => '1024',
+},
+
 { name => 'exit_on_error', type => 'bool', context => 'PGC_USERSET', group => 'ERROR_HANDLING_OPTIONS',
   short_desc => 'Terminate session on any error.',
   variable => 'ExitOnAnyError',
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f6675043fb3..fe07b21eaa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -354,6 +354,7 @@ extern bool heap_getnextslot(TableScanDesc sscan,
 extern void *heap_begin_batch(TableScanDesc sscan, int maxitems);
 extern void heap_end_batch(TableScanDesc sscan, void *am_batch);
 extern int heap_getnextbatch(TableScanDesc sscan, void *am_batch, ScanDirection dir);
+extern void heap_materialize_batch_all(void *am_batch, TupleTableSlot **slots, int n);
 
 extern void heap_set_tidrange(TableScanDesc sscan, ItemPointer mintid,
 							  ItemPointer maxtid);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 3ec3c3dd008..13a95f7a589 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -21,6 +21,7 @@
 #include "access/sdir.h"
 #include "access/xact.h"
 #include "commands/vacuum.h"
+#include "executor/execBatch.h"
 #include "executor/tuptable.h"
 #include "storage/read_stream.h"
 #include "utils/rel.h"
@@ -39,6 +40,7 @@ typedef struct BulkInsertStateData BulkInsertStateData;
 typedef struct IndexInfo IndexInfo;
 typedef struct SampleScanState SampleScanState;
 typedef struct ValidateIndexState ValidateIndexState;
+typedef struct TupleBatchOps TupleBatchOps;
 
 /*
  * Bitmask values for the flags argument to the scan_begin callback.
@@ -301,6 +303,7 @@ typedef struct TableAmRoutine
 	 * Return slot implementation suitable for storing a tuple of this AM.
 	 */
 	const TupleTableSlotOps *(*slot_callbacks) (Relation rel);
+	const TupleBatchOps *(*batch_callbacks)(Relation rel);
 
 
 	/* ------------------------------------------------------------------------
@@ -361,6 +364,7 @@ typedef struct TableAmRoutine
 									 ScanDirection dir);
 	void		(*scan_end_batch)(TableScanDesc sscan, void *am_batch);
 
+
 	/*-----------
 	 * Optional functions to provide scanning for ranges of ItemPointers.
 	 * Implementations must either provide both of these functions, or neither
@@ -872,6 +876,16 @@ extern const TupleTableSlotOps *table_slot_callbacks(Relation relation);
  */
 extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
 
+/* ----------------------------------------------------------------------------
+ * TupleBatch functions.
+ * ----------------------------------------------------------------------------
+ */
+
+/*
+ * Returns callbacks for manipulating TupleBatch for tuples of the given
+ * relation.
+ */
+extern const TupleBatchOps *table_batch_callbacks(Relation relation);
 
 /* ----------------------------------------------------------------------------
  * Table scan functions.
@@ -1046,6 +1060,18 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 	return sscan->rs_rd->rd_tableam->scan_getnextslot(sscan, direction, slot);
 }
 
+/*
+ * table_supports_batching
+ *		Does the relation's AM support batching?
+ */
+static inline bool
+table_supports_batching(Relation relation)
+{
+	const TableAmRoutine *tam = relation->rd_tableam;
+
+	return tam->scan_getnextbatch != NULL;
+}
+
 /*
  * table_scan_begin_batch
  *		Allocate AM-owned batch payload with capacity 'maxitems'.
@@ -2128,5 +2154,6 @@ extern const TableAmRoutine *GetTableAmRoutine(Oid amhandler);
  */
 
 extern const TableAmRoutine *GetHeapamTableAmRoutine(void);
+extern struct TupleBatchOps *GetHeapamTupleBatchOps(void);
 
 #endif							/* TABLEAM_H */
diff --git a/src/include/executor/execBatch.h b/src/include/executor/execBatch.h
new file mode 100644
index 00000000000..2d0066103ce
--- /dev/null
+++ b/src/include/executor/execBatch.h
@@ -0,0 +1,99 @@
+/*-------------------------------------------------------------------------
+ *
+ * execBatch.h
+ *		Executor batch envelope for passing tuple batch state upward
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/executor/execBatch.h
+ *-------------------------------------------------------------------------
+ */
+#ifndef EXECBATCH_H
+#define EXECBATCH_H
+
+#include "executor/tuptable.h"
+
+/*
+ * TupleBatchOps -- AM-specific helpers for lazy materialization.
+ */
+typedef struct TupleBatchOps
+{
+	void (*materialize_all)(void *am_payload,
+							TupleTableSlot **dst,
+							int maxslots);
+} TupleBatchOps;
+
+/*
+ * TupleBatch
+ *
+ * Envelope for a batch of tuples produced by a plan node (e.g., SeqScan) per
+ * call to a batch variant of ExecSeqScan().
+ */
+typedef struct TupleBatch
+{
+	void	   *am_payload;
+	const TupleBatchOps *ops;
+	int			ntuples;				/* number of tuples in am_payload */
+	bool		materialized;		 /* tuples in slots valid? */
+	struct TupleTableSlot **inslots; /* slots for tuples read "into" batch */
+	struct TupleTableSlot **outslots; /* slots for tuples going "out of"
+									   * batch */
+	struct TupleTableSlot **activeslots;
+	int			maxslots;
+
+	int		nvalid;		/* number of returnable tuples in outslots */
+	int		next;		/* 0-based index of next tuple to be returned */
+} TupleBatch;
+
+
+/* Helpers */
+extern TupleBatch *TupleBatchCreate(TupleDesc scandesc, int capacity);
+extern void TupleBatchReset(TupleBatch *b, bool drop_slots);
+extern void TupleBatchUseInput(TupleBatch *b, int nvalid);
+extern void TupleBatchUseOutput(TupleBatch *b, int nvalid);
+extern bool TupleBatchIsValid(TupleBatch *b);
+extern void TupleBatchRewind(TupleBatch *b);
+extern int TupleBatchGetNumValid(TupleBatch *b);
+
+static inline TupleTableSlot *
+TupleBatchGetNextSlot(TupleBatch *b)
+{
+	return b->next < b->nvalid ? b->activeslots[b->next++] : NULL;
+}
+
+static inline TupleTableSlot *
+TupleBatchGetSlot(TupleBatch *b, int index)
+{
+	Assert(index < b->nvalid);
+	return b->activeslots[index];
+}
+
+static inline void
+TupleBatchStoreInOut(TupleBatch *b, int index, TupleTableSlot *out)
+{
+	Assert(TupleBatchIsValid(b));
+	b->outslots[index] = out;
+}
+
+static inline bool
+TupleBatchHasMore(TupleBatch *b)
+{
+	return b->activeslots && b->next < b->nvalid;
+}
+
+static inline void
+TupleBatchMaterializeAll(TupleBatch *b)
+{
+	if (b->materialized)
+		return;
+
+	if (b->ops == NULL || b->ops->materialize_all == NULL)
+		elog(ERROR, "TupleBatch has no slots and no materialize_all op");
+
+	b->ops->materialize_all(b->am_payload, b->inslots, b->ntuples);
+	TupleBatchUseInput(b, b->ntuples);
+}
+
+#endif	/* EXECBATCH_H */
diff --git a/src/include/executor/execScan.h b/src/include/executor/execScan.h
index 2003cbc7ed5..c1add8ca331 100644
--- a/src/include/executor/execScan.h
+++ b/src/include/executor/execScan.h
@@ -251,4 +251,73 @@ ExecScanExtended(ScanState *node,
 	}
 }
 
+/*
+ * ExecScanExtendedBatchSlot
+ *		Batch-driven variant of ExecScanExtended.
+ *
+ * Returns one tuple at a time to callers, but internally fetches tuples
+ * in batches from the AM via accessBatchMtd. This reduces per-tuple AM
+ * call overhead while preserving the single-slot interface expected by
+ * parent nodes.
+ *
+ * The batch is refilled when exhausted by calling accessBatchMtd, which
+ * returns false at end-of-scan.
+ *
+ * Note: EPQ is not supported in the batch path; callers must ensure
+ * es_epq_active is NULL before using this function.
+ */
+static inline TupleTableSlot *
+ExecScanExtendedBatchSlot(ScanState *node,
+						  ExecScanAccessBatchMtd accessBatchMtd,
+						  ExprState *qual, ProjectionInfo *projInfo)
+{
+	ExprContext *econtext = node->ps.ps_ExprContext;
+	TupleBatch *b = node->ps.ps_Batch;
+
+	/* Batch path does not support EPQ */
+	Assert(node->ps.state->es_epq_active == NULL);
+	Assert(TupleBatchIsValid(b));
+
+	for (;;)
+	{
+		TupleTableSlot *in;
+
+		CHECK_FOR_INTERRUPTS();
+
+		/* Get next input slot from current batch, or refill */
+		if (!TupleBatchHasMore(b))
+		{
+			if (!accessBatchMtd(node))
+				return NULL;
+		}
+
+		in = TupleBatchGetNextSlot(b);
+		Assert(in);
+
+		/* No qual, no projection: direct return */
+		if (qual == NULL && projInfo == NULL)
+			return in;
+
+		ResetExprContext(econtext);
+		econtext->ecxt_scantuple = in;
+
+		/* Qual only */
+		if (projInfo == NULL)
+		{
+			if (qual == NULL || ExecQual(qual, econtext))
+				return in;
+			else
+				InstrCountFiltered1(node, 1);
+			continue;
+		}
+
+		/* Projection (with or without qual) */
+		if (qual == NULL || ExecQual(qual, econtext))
+			return ExecProject(projInfo);
+		else
+			InstrCountFiltered1(node, 1);
+		/* else try next tuple */
+	}
+}
+
 #endif							/* EXECSCAN_H */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 7cd6a49309f..c1f05ce6273 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -578,12 +578,16 @@ extern Datum ExecMakeFunctionResultSet(SetExprState *fcache,
  */
 typedef TupleTableSlot *(*ExecScanAccessMtd) (ScanState *node);
 typedef bool (*ExecScanRecheckMtd) (ScanState *node, TupleTableSlot *slot);
+typedef bool (*ExecScanAccessBatchMtd)(ScanState *node);
 
 extern TupleTableSlot *ExecScan(ScanState *node, ExecScanAccessMtd accessMtd,
 								ExecScanRecheckMtd recheckMtd);
+
 extern void ExecAssignScanProjectionInfo(ScanState *node);
 extern void ExecAssignScanProjectionInfoWithVarno(ScanState *node, int varno);
 extern void ExecScanReScan(ScanState *node);
+extern bool ScanCanUseBatching(ScanState *scanstate, int eflags);
+extern void ScanResetBatching(ScanState *scanstate, bool drop);
 
 /*
  * prototypes from functions in execTuples.c
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 9a7d733ddef..13285210998 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -288,6 +288,7 @@ extern PGDLLIMPORT double VacuumCostDelay;
 extern PGDLLIMPORT int VacuumCostBalance;
 extern PGDLLIMPORT bool VacuumCostActive;
 
+extern PGDLLIMPORT int executor_batch_rows;
 
 /* in utils/misc/stack_depth.c */
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3968429f991..219a722c49a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -30,6 +30,7 @@
 #define EXECNODES_H
 
 #include "access/tupconvert.h"
+#include "executor/execBatch.h"
 #include "executor/instrument.h"
 #include "fmgr.h"
 #include "lib/ilist.h"
@@ -1204,6 +1205,9 @@ typedef struct PlanState
 	ExprContext *ps_ExprContext;	/* node's expression-evaluation context */
 	ProjectionInfo *ps_ProjInfo;	/* info for doing tuple projection */
 
+	/* Batching state if node supports it. */
+	TupleBatch *ps_Batch;
+
 	bool		async_capable;	/* true if node is async-capable */
 
 	/*
-- 
2.47.3
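
A reading aid for the patch above, not part of the series: the control flow of ExecScanExtendedBatchSlot() can be modeled in a few dozen lines of standalone C. This is only a sketch under stated assumptions. Every name in it (ToySource, ToyBatch, toy_next_batch, toy_scan_next, toy_is_even, toy_count, BATCH_CAP) is hypothetical, plain integers stand in for tuples, and there is no materialization step, projection, or interrupt check. What it does show is the shape of the loop: the caller still receives one row per call, while the source is asked for rows only once per batch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_CAP 4				/* stand-in for executor_batch_rows */

/* Toy "table": the integers 0 .. nrows-1, read front to back. */
typedef struct ToySource
{
	int			nrows;
	int			pos;
} ToySource;

/* Toy analogue of TupleBatch: fixed-capacity buffer plus a read cursor. */
typedef struct ToyBatch
{
	int			vals[BATCH_CAP];
	int			nvalid;			/* number of returnable values */
	int			next;			/* index of next value to return */
	int			nfetches;		/* how many times the source was called */
} ToyBatch;

/* Plays the role of accessBatchMtd: refill, return false at end of scan. */
static bool
toy_next_batch(ToySource *src, ToyBatch *b)
{
	int			n = 0;

	while (n < BATCH_CAP && src->pos < src->nrows)
		b->vals[n++] = src->pos++;

	b->nvalid = n;
	b->next = 0;
	b->nfetches++;
	return n > 0;
}

/*
 * Hand back one value per call, refilling only when the current batch is
 * exhausted.  'qual' may be NULL, meaning "accept everything".
 */
static const int *
toy_scan_next(ToySource *src, ToyBatch *b, bool (*qual) (int))
{
	for (;;)
	{
		const int  *in;

		if (b->next >= b->nvalid && !toy_next_batch(src, b))
			return NULL;		/* end of scan */

		in = &b->vals[b->next++];
		if (qual == NULL || qual(*in))
			return in;
		/* else filtered out; try the next value */
	}
}

static bool
toy_is_even(int v)
{
	return (v % 2) == 0;
}

/* Drain a scan over 'nrows' rows; report rows returned and source calls. */
static int
toy_count(int nrows, bool (*qual) (int), int *nfetches)
{
	ToySource	src = {nrows, 0};
	ToyBatch	b = {{0}, 0, 0, 0};
	int			count = 0;

	assert(nfetches != NULL);
	while (toy_scan_next(&src, &b, qual) != NULL)
		count++;
	*nfetches = b.nfetches;
	return count;
}
```

With 10 rows and a BATCH_CAP of 4, toy_count(10, NULL, &n) returns 10 and sets n to 4: three non-empty fetches (4, 4 and 2 rows) plus the final empty one that signals end of scan. Passing toy_is_even as the qual returns 5 rows for the same 4 fetches, since filtering happens after the batch has been fetched.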



  [application/octet-stream] v4-0003-Add-EXPLAIN-BATCHES-option-for-tuple-batching-sta.patch (13.8K, 5-v4-0003-Add-EXPLAIN-BATCHES-option-for-tuple-batching-sta.patch)
  download | inline diff:
From 189edab507d407cce6446a944b3a48c327167ec3 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sat, 20 Dec 2025 23:09:37 +0900
Subject: [PATCH v4 3/3] Add EXPLAIN (BATCHES) option for tuple batching
 statistics

Add a BATCHES option to EXPLAIN that reports per-node batch statistics
when a node uses batch mode execution.

For nodes that support batching (currently SeqScan), this shows the
number of batches fetched along with average, minimum, and maximum
rows per batch. Output is supported in both text and non-text formats.

Add regression tests covering text output, JSON format, filtered scans,
LIMIT, and disabled batching.

Discussion: https://postgr.es/m/CA+HiwqFfAY_ZFqN8wcAEMw71T9hM_kA8UtyHaZZEZtuT3UyogA@mail.gmail.com
---
 src/backend/commands/explain.c        | 30 ++++++++++++++
 src/backend/commands/explain_state.c  |  2 +
 src/backend/executor/execBatch.c      |  8 +++-
 src/backend/executor/nodeSeqscan.c    | 24 +++++------
 src/include/commands/explain_state.h  |  1 +
 src/include/executor/execBatch.h      | 35 +++++++++++++++-
 src/include/executor/instrument.h     |  1 +
 src/test/regress/expected/explain.out | 57 +++++++++++++++++++++++++++
 src/test/regress/sql/explain.sql      | 26 ++++++++++++
 9 files changed, 171 insertions(+), 13 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5a6390631eb..3a639a13807 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -22,6 +22,7 @@
 #include "commands/explain_format.h"
 #include "commands/explain_state.h"
 #include "commands/prepare.h"
+#include "executor/execBatch.h"
 #include "foreign/fdwapi.h"
 #include "jit/jit.h"
 #include "libpq/pqformat.h"
@@ -517,6 +518,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 		instrument_option |= INSTRUMENT_BUFFERS;
 	if (es->wal)
 		instrument_option |= INSTRUMENT_WAL;
+	if (es->batches)
+		instrument_option |= INSTRUMENT_BATCHES;
 
 	/*
 	 * We always collect timing for the entire statement, even when node-level
@@ -2292,6 +2295,33 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		show_buffer_usage(es, &planstate->instrument->bufusage);
 	if (es->wal && planstate->instrument)
 		show_wal_usage(es, &planstate->instrument->walusage);
+	if (es->batches && planstate->ps_Batch)
+	{
+		TupleBatch *b = planstate->ps_Batch;
+
+		if (b->stat_batches > 0)
+		{
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfo(es->str,
+								 "Batches: %lld  Avg Rows: %.1f  Max: %d  Min: %d\n",
+								 (long long) b->stat_batches,
+								 TupleBatchAvgRows(b),
+								 b->stat_max_rows,
+								 b->stat_min_rows == INT_MAX ? 0 : b->stat_min_rows);
+			}
+			else
+			{
+				ExplainPropertyInteger("Batches", NULL, b->stat_batches, es);
+				ExplainPropertyFloat("Average Batch Rows", NULL,
+									 TupleBatchAvgRows(b), 1, es);
+				ExplainPropertyInteger("Max Batch Rows", NULL, b->stat_max_rows, es);
+				ExplainPropertyInteger("Min Batch Rows", NULL,
+									   b->stat_min_rows == INT_MAX ? 0 : b->stat_min_rows, es);
+			}
+		}
+	}
 
 	/* Prepare per-worker buffer/WAL usage */
 	if (es->workers_state && (es->buffers || es->wal) && es->verbose)
diff --git a/src/backend/commands/explain_state.c b/src/backend/commands/explain_state.c
index a6623f8fa52..6ef6055c479 100644
--- a/src/backend/commands/explain_state.c
+++ b/src/backend/commands/explain_state.c
@@ -159,6 +159,8 @@ ParseExplainOptionList(ExplainState *es, List *options, ParseState *pstate)
 								"EXPLAIN", opt->defname, p),
 						 parser_errposition(pstate, opt->location)));
 		}
+		else if (strcmp(opt->defname, "batches") == 0)
+			es->batches = defGetBoolean(opt);
 		else if (!ApplyExtensionExplainOption(es, opt, pstate))
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
diff --git a/src/backend/executor/execBatch.c b/src/backend/executor/execBatch.c
index 007ae535687..93c90680d3d 100644
--- a/src/backend/executor/execBatch.c
+++ b/src/backend/executor/execBatch.c
@@ -19,7 +19,7 @@
  *		Allocate and initialize a new TupleBatch envelope.
  */
 TupleBatch *
-TupleBatchCreate(TupleDesc scandesc, int capacity)
+TupleBatchCreate(TupleDesc scandesc, int capacity, bool track_stats)
 {
 	TupleBatch  *b;
 	TupleTableSlot **inslots,
@@ -44,6 +44,12 @@ TupleBatchCreate(TupleDesc scandesc, int capacity)
 	b->nvalid = 0;
 	b->next = 0;
 
+	b->track_stats = track_stats;
+	b->stat_batches = 0;
+	b->stat_rows = 0;
+	b->stat_max_rows = 0;
+	b->stat_min_rows = INT_MAX;
+
 	return b;
 }
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index a9071e32560..73eb9b6a51e 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -213,8 +213,9 @@ SeqNextBatch(SeqScanState *node)
 	TableScanDesc scandesc;
 	EState	   *estate;
 	ScanDirection direction;
+	TupleBatch *b = node->ss.ps.ps_Batch;
 
-	Assert(node->ss.ps.ps_Batch != NULL);
+	Assert(b != NULL);
 
 	/*
 	 * get information from the estate and scan state
@@ -237,22 +238,21 @@ SeqNextBatch(SeqScanState *node)
 	}
 
 	/* Lazily create the AM batch payload. */
-	if (node->ss.ps.ps_Batch->am_payload == NULL)
+	if (b->am_payload == NULL)
 	{
 		const TableAmRoutine *tam PG_USED_FOR_ASSERTS_ONLY = scandesc->rs_rd->rd_tableam;
 
 		Assert(tam && tam->scan_begin_batch);
-		node->ss.ps.ps_Batch->am_payload =
-			table_scan_begin_batch(scandesc, node->ss.ps.ps_Batch->maxslots);
-		node->ss.ps.ps_Batch->ops = table_batch_callbacks(node->ss.ss_currentRelation);
+		b->am_payload = table_scan_begin_batch(scandesc, b->maxslots);
+		b->ops = table_batch_callbacks(node->ss.ss_currentRelation);
 	}
 
-	node->ss.ps.ps_Batch->ntuples =
-		table_scan_getnextbatch(scandesc, node->ss.ps.ps_Batch->am_payload, direction);
-	node->ss.ps.ps_Batch->nvalid = node->ss.ps.ps_Batch->ntuples;
-	node->ss.ps.ps_Batch->materialized = false;
+	b->ntuples = table_scan_getnextbatch(scandesc, b->am_payload, direction);
+	b->nvalid = b->ntuples;
+	b->materialized = false;
+	TupleBatchRecordStats(b, b->ntuples);
 
-	return node->ss.ps.ps_Batch->ntuples > 0;
+	return b->ntuples > 0;
 }
 
 static inline bool
@@ -340,8 +340,10 @@ SeqScanInitBatching(SeqScanState *scanstate, int eflags)
 {
 	const int cap = executor_batch_rows;
 	TupleDesc	scandesc = RelationGetDescr(scanstate->ss.ss_currentRelation);
+	EState *estate = scanstate->ss.ps.state;
+	bool track_stats = (estate->es_instrument & INSTRUMENT_BATCHES) != 0;
 
-	scanstate->ss.ps.ps_Batch = TupleBatchCreate(scandesc, cap);
+	scanstate->ss.ps.ps_Batch = TupleBatchCreate(scandesc, cap, track_stats);
 
 	/* Choose batch variant to preserve your specialization matrix */
 	if (scanstate->ss.ps.qual == NULL)
diff --git a/src/include/commands/explain_state.h b/src/include/commands/explain_state.h
index ba073b86918..b82f7ac0829 100644
--- a/src/include/commands/explain_state.h
+++ b/src/include/commands/explain_state.h
@@ -55,6 +55,7 @@ typedef struct ExplainState
 	bool		memory;			/* print planner's memory usage information */
 	bool		settings;		/* print modified settings */
 	bool		generic;		/* generate a generic plan */
+	bool		batches;		/* print batch statistics */
 	ExplainSerializeOption serialize;	/* serialize the query's output? */
 	ExplainFormat format;		/* output format */
 	/* state for output formatting --- not reset for each new plan tree */
diff --git a/src/include/executor/execBatch.h b/src/include/executor/execBatch.h
index 2d0066103ce..e3a4f762284 100644
--- a/src/include/executor/execBatch.h
+++ b/src/include/executor/execBatch.h
@@ -13,6 +13,7 @@
 #ifndef EXECBATCH_H
 #define EXECBATCH_H
 
+#include <limits.h>
 #include "executor/tuptable.h"
 
 /*
@@ -45,11 +46,18 @@ typedef struct TupleBatch
 
 	int		nvalid;		/* number of returnable tuples in outslots */
 	int		next;		/* 0-based index of next tuple to be returned */
+
+	/* Statistics (populated for EXPLAIN (ANALYZE, BATCHES)) */
+	bool	track_stats;	/* whether to collect stats */
+	int64	stat_batches;	/* total number of batches fetched */
+	int64	stat_rows;		/* total tuples across all batches */
+	int		stat_max_rows;	/* max rows in any single batch */
+	int		stat_min_rows;	/* min rows in any single batch (non-zero) */
 } TupleBatch;
 
 
 /* Helpers */
-extern TupleBatch *TupleBatchCreate(TupleDesc scandesc, int capacity);
+extern TupleBatch *TupleBatchCreate(TupleDesc scandesc, int capacity, bool track_stats);
 extern void TupleBatchReset(TupleBatch *b, bool drop_slots);
 extern void TupleBatchUseInput(TupleBatch *b, int nvalid);
 extern void TupleBatchUseOutput(TupleBatch *b, int nvalid);
@@ -96,4 +104,29 @@ TupleBatchMaterializeAll(TupleBatch *b)
 	TupleBatchUseInput(b, b->ntuples);
 }
 
+/* === Batching stats === */
+
+static inline void
+TupleBatchRecordStats(TupleBatch *b, int rows)
+{
+	if (!b->track_stats)
+		return;
+
+	b->stat_batches++;
+	b->stat_rows += rows;
+	if (rows > b->stat_max_rows)
+		b->stat_max_rows = rows;
+	if (rows < b->stat_min_rows && rows > 0)
+		b->stat_min_rows = rows;
+}
+
+static inline double
+TupleBatchAvgRows(TupleBatch *b)
+{
+	if (b->stat_batches == 0)
+		return 0.0;
+
+	return (double) b->stat_rows / b->stat_batches;
+}
+
 #endif	/* EXECBATCH_H */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index ffe470f2b84..0af02db3760 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -64,6 +64,7 @@ typedef enum InstrumentOption
 	INSTRUMENT_BUFFERS = 1 << 1,	/* needs buffer usage */
 	INSTRUMENT_ROWS = 1 << 2,	/* needs row count */
 	INSTRUMENT_WAL = 1 << 3,	/* needs WAL usage */
+	INSTRUMENT_BATCHES = 1 << 4,	/* needs batch statistics */
 	INSTRUMENT_ALL = PG_INT32_MAX
 } InstrumentOption;
 
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
index 7c1f26b182c..fef3b4a5497 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -822,3 +822,60 @@ select explain_filter('explain (analyze,buffers off,costs off) select sum(n) ove
 (9 rows)
 
 reset work_mem;
+-- Test BATCHES option
+set executor_batch_rows = 64;
+create table batch_test (a int, b text);
+insert into batch_test select i, repeat('x', 100) from generate_series(1, 10000) i;
+analyze batch_test;
+-- Basic batch stats output
+select explain_filter('explain (analyze, batches, buffers off, costs off) select * from batch_test');
+                         explain_filter                         
+----------------------------------------------------------------
+ Seq Scan on batch_test (actual time=N.N..N.N rows=N.N loops=N)
+   Batches: N  Avg Rows: N.N  Max: N  Min: N
+ Planning Time: N.N ms
+ Execution Time: N.N ms
+(4 rows)
+
+-- With filter
+select explain_filter('explain (analyze, batches, buffers off, costs off) select * from batch_test where a > 5000');
+                         explain_filter                         
+----------------------------------------------------------------
+ Seq Scan on batch_test (actual time=N.N..N.N rows=N.N loops=N)
+   Filter: (a > N)
+   Rows Removed by Filter: N
+   Batches: N  Avg Rows: N.N  Max: N  Min: N
+ Planning Time: N.N ms
+ Execution Time: N.N ms
+(6 rows)
+
+-- With LIMIT - partial scan shows fewer batches
+select explain_filter('explain (analyze, batches, buffers off, costs off) select * from batch_test limit 100');
+                            explain_filter                            
+----------------------------------------------------------------------
+ Limit (actual time=N.N..N.N rows=N.N loops=N)
+   ->  Seq Scan on batch_test (actual time=N.N..N.N rows=N.N loops=N)
+         Batches: N  Avg Rows: N.N  Max: N  Min: N
+ Planning Time: N.N ms
+ Execution Time: N.N ms
+(5 rows)
+
+-- Batching disabled - no batch line
+set executor_batch_rows = 0;
+select explain_filter('explain (analyze, batches, buffers off, costs off) select * from batch_test');
+                         explain_filter                         
+----------------------------------------------------------------
+ Seq Scan on batch_test (actual time=N.N..N.N rows=N.N loops=N)
+ Planning Time: N.N ms
+ Execution Time: N.N ms
+(3 rows)
+
+reset executor_batch_rows;
+-- JSON format
+select explain_filter_to_json('explain (analyze, batches, buffers off, format json) select * from batch_test where a < 1000') #> '{0,Plan,Batches}';
+ ?column? 
+----------
+ 0
+(1 row)
+
+drop table batch_test;
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
index ebdab42604b..87bb179ced9 100644
--- a/src/test/regress/sql/explain.sql
+++ b/src/test/regress/sql/explain.sql
@@ -188,3 +188,29 @@ select explain_filter('explain (analyze,buffers off,costs off) select sum(n) ove
 -- Test tuplestore storage usage in Window aggregate (memory and disk case, final result is disk)
 select explain_filter('explain (analyze,buffers off,costs off) select sum(n) over(partition by m) from (SELECT n < 3 as m, n from generate_series(1,2500) a(n))');
 reset work_mem;
+
+-- Test BATCHES option
+set executor_batch_rows = 64;
+
+create table batch_test (a int, b text);
+insert into batch_test select i, repeat('x', 100) from generate_series(1, 10000) i;
+analyze batch_test;
+
+-- Basic batch stats output
+select explain_filter('explain (analyze, batches, buffers off, costs off) select * from batch_test');
+
+-- With filter
+select explain_filter('explain (analyze, batches, buffers off, costs off) select * from batch_test where a > 5000');
+
+-- With LIMIT - partial scan shows fewer batches
+select explain_filter('explain (analyze, batches, buffers off, costs off) select * from batch_test limit 100');
+
+-- Batching disabled - no batch line
+set executor_batch_rows = 0;
+select explain_filter('explain (analyze, batches, buffers off, costs off) select * from batch_test');
+reset executor_batch_rows;
+
+-- JSON format
+select explain_filter_to_json('explain (analyze, batches, buffers off, format json) select * from batch_test where a < 1000') #> '{0,Plan,Batches}';
+
+drop table batch_test;
-- 
2.47.3
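
A reading aid for the statistics bookkeeping in the patch above, not part of the series: the following standalone C sketch mirrors the arithmetic of TupleBatchRecordStats() and TupleBatchAvgRows() outside the executor. The names ToyBatchStats, toy_stats_init, toy_stats_record, toy_stats_avg and toy_stats_min are hypothetical, and the track_stats guard is left out for brevity. The point is the INT_MAX sentinel for the minimum, which is mapped back to 0 for display when no non-empty batch was ever recorded.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

typedef struct ToyBatchStats
{
	int64_t		batches;		/* number of fetches recorded */
	int64_t		rows;			/* total rows across all fetches */
	int			max_rows;		/* largest single batch */
	int			min_rows;		/* INT_MAX until a non-empty batch is seen */
} ToyBatchStats;

static void
toy_stats_init(ToyBatchStats *s)
{
	s->batches = 0;
	s->rows = 0;
	s->max_rows = 0;
	s->min_rows = INT_MAX;
}

/* Record one fetch; empty fetches count as a batch but never set the min. */
static void
toy_stats_record(ToyBatchStats *s, int rows)
{
	assert(rows >= 0);

	s->batches++;
	s->rows += rows;
	if (rows > s->max_rows)
		s->max_rows = rows;
	if (rows > 0 && rows < s->min_rows)
		s->min_rows = rows;
}

/* Average rows per recorded fetch; 0.0 when nothing was recorded. */
static double
toy_stats_avg(const ToyBatchStats *s)
{
	if (s->batches == 0)
		return 0.0;
	return (double) s->rows / (double) s->batches;
}

/* Minimum for display: the sentinel is reported as 0. */
static int
toy_stats_min(const ToyBatchStats *s)
{
	return s->min_rows == INT_MAX ? 0 : s->min_rows;
}
```

Recording fetches of 64, 64, 30 and 0 rows gives 4 batches, an average of 39.5, a maximum of 64 and a minimum of 30. As far as I can tell from the diff, SeqNextBatch() calls TupleBatchRecordStats() unconditionally, so the final zero-row fetch at end of scan is counted as a batch and pulls the average down; the sketch deliberately mirrors that behavior rather than correcting it.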



  [text/x-sh] bar_limit.sh (1.7K, 6-bar_limit.sh)
  download | inline:
home=$HOME
master=$home/pg/install/master-opt/bin
patched=$home/pg/install/patched-opt/bin
master_data=$home/pg/data/master
patched_data=$home/pg/data/patched
logdir=$home/pg/log

# master
export PATH=$master:$PATH
which postgres
pg_ctl -D  $master_data -l $logdir/pg_master_log start

for i in 1000000 2000000 3000000 4000000 5000000 10000000; do
	psql -c "select pg_prewarm('bar_$i')" > /dev/null 2>&1
	psql -c "vacuum bar_$i" > /dev/null 2>&1
	printf "%s\t" "$i"
	echo "select * from bar_$i limit 1 offset $i" > /tmp/bar_limit.sql
	pgbench -n -T5 -f /tmp/bar_limit.sql | grep latency
done

pg_ctl -D  $master_data -l $logdir/pg_master_log stop

# patched, batching disabled (executor_batch_rows=0)
export PATH=$patched:$PATH
which postgres
echo "executor_batch_rows=0" >> $patched_data/postgresql.conf
echo "executor_batch_rows=0"
pg_ctl -D  $patched_data -l $logdir/pg_master_log start

for i in 1000000 2000000 3000000 4000000 5000000 10000000; do
	psql -c "select pg_prewarm('bar_$i')" > /dev/null 2>&1
	psql -c "vacuum bar_$i" > /dev/null 2>&1
	printf "%s\t" "$i"
	echo "select * from bar_$i limit 1 offset $i" > /tmp/bar_limit.sql
	pgbench -n -T5 -f /tmp/bar_limit.sql | grep latency
done

pg_ctl -D  $patched_data -l $logdir/pg_master_log stop

# patched, batching enabled (executor_batch_rows=64)
which postgres
echo "executor_batch_rows=64" >> $patched_data/postgresql.conf
echo "executor_batch_rows=64"
pg_ctl -D  $patched_data -l $logdir/pg_master_log start

for i in 1000000 2000000 3000000 4000000 5000000 10000000; do
	psql -c "select pg_prewarm('bar_$i')" > /dev/null 2>&1
	psql -c "vacuum bar_$i" > /dev/null 2>&1
	printf "%s\t" "$i"
	echo "select * from bar_$i limit 1 offset $i" > /tmp/bar_limit.sql
	pgbench -n -T5 -f /tmp/bar_limit.sql | grep latency
done

pg_ctl -D  $patched_data -l $logdir/pg_master_log stop

  [text/x-sh] bar_limit_where_o.sh (1.7K, 7-bar_limit_where_o.sh)
  download | inline:
home=$HOME
master=$home/pg/install/master-opt/bin
patched=$home/pg/install/patched-opt/bin
master_data=$home/pg/data/master
patched_data=$home/pg/data/patched
logdir=$home/pg/log

# master
export PATH=$master:$PATH
which postgres
pg_ctl -D  $master_data -l $logdir/pg_master_log start

for i in 1000000 2000000 3000000 4000000 5000000 10000000; do
	psql -c "select pg_prewarm('bar_$i')" > /dev/null 2>&1
	psql -c "vacuum bar_$i" > /dev/null 2>&1
	printf "%s\t" "$i"
	echo "select * from bar_$i where o > 0 limit 1 offset $i" > /tmp/bar_limit.sql
	pgbench -n -T5 -f /tmp/bar_limit.sql | grep latency
done

pg_ctl -D  $master_data -l $logdir/pg_master_log stop

# patched, batching disabled (executor_batch_rows=0)
export PATH=$patched:$PATH
which postgres
echo "executor_batch_rows=0" >> $patched_data/postgresql.conf
echo "executor_batch_rows=0"
pg_ctl -D  $patched_data -l $logdir/pg_master_log start

for i in 1000000 2000000 3000000 4000000 5000000 10000000; do
	psql -c "select pg_prewarm('bar_$i')" > /dev/null 2>&1
	psql -c "vacuum bar_$i" > /dev/null 2>&1
	printf "%s\t" "$i"
	echo "select * from bar_$i where o > 0 limit 1 offset $i" > /tmp/bar_limit.sql
	pgbench -n -T5 -f /tmp/bar_limit.sql | grep latency
done

pg_ctl -D  $patched_data -l $logdir/pg_master_log stop

# patched, batching enabled (executor_batch_rows=64)
which postgres
echo "executor_batch_rows=64" >> $patched_data/postgresql.conf
echo "executor_batch_rows=64"
pg_ctl -D  $patched_data -l $logdir/pg_master_log start

for i in 1000000 2000000 3000000 4000000 5000000 10000000; do
	psql -c "select pg_prewarm('bar_$i')" > /dev/null 2>&1
	psql -c "vacuum bar_$i" > /dev/null 2>&1
	printf "%s\t" "$i"
	echo "select * from bar_$i where o > 0 limit 1 offset $i" > /tmp/bar_limit.sql
	pgbench -n -T5 -f /tmp/bar_limit.sql | grep latency
done

pg_ctl -D  $patched_data -l $logdir/pg_master_log stop

  [text/x-sh] bar_limit_where_a.sh (1.7K, 8-bar_limit_where_a.sh)
  download | inline:
home=$HOME
master=$home/pg/install/master-opt/bin
patched=$home/pg/install/patched-opt/bin
master_data=$home/pg/data/master
patched_data=$home/pg/data/patched
logdir=$home/pg/log

# master
export PATH=$master:$PATH
which postgres
pg_ctl -D  $master_data -l $logdir/pg_master_log start

for i in 1000000 2000000 3000000 4000000 5000000 10000000; do
	psql -c "select pg_prewarm('bar_$i')" > /dev/null 2>&1
	psql -c "vacuum bar_$i" > /dev/null 2>&1
	printf "%s\t" "$i"
	echo "select * from bar_$i where a > 0 limit 1 offset $i" > /tmp/bar_limit.sql
	pgbench -n -T5 -f /tmp/bar_limit.sql | grep latency
done

pg_ctl -D  $master_data -l $logdir/pg_master_log stop

# patched, batching disabled (executor_batch_rows=0)
export PATH=$patched:$PATH
which postgres
echo "executor_batch_rows=0" >> $patched_data/postgresql.conf
echo "executor_batch_rows=0"
pg_ctl -D  $patched_data -l $logdir/pg_master_log start

for i in 1000000 2000000 3000000 4000000 5000000 10000000; do
	psql -c "select pg_prewarm('bar_$i')" > /dev/null 2>&1
	psql -c "vacuum bar_$i" > /dev/null 2>&1
	printf "%s\t" "$i"
	echo "select * from bar_$i where a > 0 limit 1 offset $i" > /tmp/bar_limit.sql
	pgbench -n -T5 -f /tmp/bar_limit.sql | grep latency
done

pg_ctl -D  $patched_data -l $logdir/pg_master_log stop

# patched, batching enabled (executor_batch_rows=64)
which postgres
echo "executor_batch_rows=64" >> $patched_data/postgresql.conf
echo "executor_batch_rows=64"
pg_ctl -D  $patched_data -l $logdir/pg_master_log start

for i in 1000000 2000000 3000000 4000000 5000000 10000000; do
	psql -c "select pg_prewarm('bar_$i')" > /dev/null 2>&1
	psql -c "vacuum bar_$i" > /dev/null 2>&1
	printf "%s\t" "$i"
	echo "select * from bar_$i where a > 0 limit 1 offset $i" > /tmp/bar_limit.sql
	pgbench -n -T5 -f /tmp/bar_limit.sql | grep latency
done

pg_ctl -D  $patched_data -l $logdir/pg_master_log stop
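
The three attached scripts differ only in the WHERE clause of the
benchmark query.  As a rough sketch (not part of the attachments), the
shared loop could be factored into two helpers.  build_query and run_pass
are hypothetical names introduced here; the sketch assumes the same
bar_N tables, pg_prewarm, and pgbench settings used in the scripts above.

```shell
# Hypothetical helper: build the benchmark query for table bar_<rows>.
# $1 = row count, $2 = optional WHERE predicate (e.g. "a > 0").
build_query() {
	rows=$1
	pred=$2
	if [ -n "$pred" ]; then
		echo "select * from bar_$rows where $pred limit 1 offset $rows"
	else
		echo "select * from bar_$rows limit 1 offset $rows"
	fi
}

# Hypothetical helper: run one pass over all table sizes against whichever
# server is currently running.  $1 = optional WHERE predicate.
run_pass() {
	for i in 1000000 2000000 3000000 4000000 5000000 10000000; do
		psql -c "select pg_prewarm('bar_$i')" > /dev/null 2>&1
		psql -c "vacuum bar_$i" > /dev/null 2>&1
		printf "%s\t" "$i"
		build_query "$i" "$1" > /tmp/bar_limit.sql
		pgbench -n -T5 -f /tmp/bar_limit.sql | grep latency
	done
}
```

With these, each server configuration would call run_pass "" for
bar_limit.sh, run_pass "o > 0" for bar_limit_where_o.sh, and
run_pass "a > 0" for bar_limit_where_a.sh, between the same pg_ctl
start/stop calls as in the originals.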
