public inbox for [email protected]  
help / color / mirror / Atom feed
From: Konstantin Knizhnik <[email protected]>
To: Peter Geoghegan <[email protected]>
To: Tomas Vondra <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Melanie Plageman <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Georgios <[email protected]>
Cc: Dilip Kumar <[email protected]>
Subject: Re: index prefetching
Date: Thu, 25 Dec 2025 17:39:24 +0200
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAH2-Wzm7-QuDOs6TcqfhhDsGEZCuHtn=D-SriOTnTZ_fiXNBvA@mail.gmail.com>
References: <[email protected]>
	<qdl4fojnbfcnm2k7b4zpvgd6gwzwdgtbl5c7shpimrb76dbyy6@scdnspus3ejh>
	<bpdeohyqvltb77viyft4bza4xc4peed3jcoep74d2ih6ynqlke@wbnhcwmq3ril>
	<CAH2-WznFdjY_OB2S7_BY4iAyeffK+XrE2qsX6aghgP63VocRfQ@mail.gmail.com>
	<4zeu5yb73byiquvf3eefsunnrydyqfxy3eup66jrliutrtd4xl@5iifjey4n5m5>
	<[email protected]>
	<CAH2-WzkAC_gWQgc3MV3MWBx2EUaE4TfPk8XQPkJgs1R_qNvo=A@mail.gmail.com>
	<CAH2-WznLRnUZHC4kmR+J3MgGpN0iTUaZ-4xeSHPiaYR=y3C8Og@mail.gmail.com>
	<CAH2-WzkHK0++yn3_nmWL2rWzXiu8Qc=2=NPQDG4nfVjvYkRDrQ@mail.gmail.com>
	<CAH2-Wzk9=x=a2TbcqYcX+XXmDHQr5=1v9m4Z_v8a-KwF1Zoz0A@mail.gmail.com>
	<CAH2-WzmYqhacBH161peAWb5eF=Ja7CFAQ+0jSEMq=qnfLVTOOg@mail.gmail.com>
	<CAH2-Wzmm+mXuv_r+eYgcwOKZzNSmbYoEmxi5mdKMUtkmYa_qXw@mail.gmail.com>
	<CAH2-WzkC_EdGB_tdEQr63BiOuJbcvZiV=xO+OJYux4wOaxPD8w@mail.gmail.com>
	<CAH2-Wzn8whv-RE4E5L2WQB_Ha_jy0UReJq1yypu83gDYHeXq5g@mail.gmail.com>
	<CAH2-Wzm7-QuDOs6TcqfhhDsGEZCuHtn=D-SriOTnTZ_fiXNBvA@mail.gmail.com>


On 21/12/2025 7:55 PM, Peter Geoghegan wrote:
> On Wed, Dec 10, 2025 at 9:21 PM Peter Geoghegan <[email protected]> wrote:
>> Attached is v4.
> Attached is v5. Changes from v4:
>
> * Simplified and optimized index-only scans, with a particular
> emphasis on avoiding regressions with nested loop joins with an inner
> index-only scan.
>
> There were quite a number of small problems/dead code related to
> index-only scans fixed by this new v5. Overall, I'm quite a bit
> happier with the state of index-only scans, which I'd not paid too
> much attention to before now.
>
> * Added Valgrind instrumentation to the hash index patch, which was
> required to fix some false positives.
>
> The generic indexam_util_batch_unlock routine had Valgrind
> instrumentation in earlier versions, just to keep nbtree's buffer
> locking checks from generating similar false positives. Some time
> later, when I added the hashgetbatch patch, there were new Valgrind
> false positives during hash index scans -- which I missed at first.
> This new v5 revisions adds similar Valgrind checks to hash itself
> (changes that add code that is more or less a direct port of the stuff
> added to nbtree by commit 4a70f829), which fixes the false positives,
> and is independently useful.
>
> The rule for amgetbatch-based index AMs is that they must have similar
> buffer locking instrumentation. That seems like a good thing.
>
> --
> Peter Geoghegan

I the previous mail I shared results of my experiments with different 
prefetch distance.
I think that we should start prefetching of heap tuples not from the 
second batch, but after some number of proceeded tids.

Attached please find a patch which implements this approach.
And below are updated results:

limit\prefetch    on      off   always  inc    threshold
1                 12074   12765  3146    3282     12394
2                 5912    6198   2463    2438      6124
4                 2919    3047   1334    1964      2910
8                 1554    1496   1166    1409      1588
16                815     775    947     940        600
32                424     403    687     695        478
64                223     208    446     453        358
128               115     106    258     270        232
256               68      53     138     149        131
512               43      27     72      78          71
1024              28      13     38      40          38

Last column is result of prefetch with read_stream_threshold=10.





diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index b9d42b15a18..aac663edfdd 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -36,6 +36,7 @@
 #include "commands/progress.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
+#include "optimizer/cost.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/bufpage.h"
@@ -46,6 +47,8 @@
 #include "utils/builtins.h"
 #include "utils/rel.h"
 
+int read_stream_threshold = DEFAULT_READ_STREAM_THRESHOLD;
+
 static void reform_and_rewrite_tuple(HeapTuple tuple,
 									 Relation OldHeap, Relation NewHeap,
 									 Datum *values, bool *isnull, RewriteState rwstate);
@@ -229,6 +232,7 @@ heap_batch_advance_pos(IndexScanDesc scan, struct BatchQueueItemPos *pos,
 					   ScanDirection direction)
 {
 	BatchIndexScan batch;
+	int proceed_items;
 
 	/* make sure we have batching initialized and consistent */
 	batch_assert_batches_valid(scan);
@@ -288,6 +292,24 @@ heap_batch_advance_pos(IndexScanDesc scan, struct BatchQueueItemPos *pos,
 	 */
 	batch = INDEX_SCAN_BATCH(scan, pos->batch);
 
+	proceed_items = ScanDirectionIsForward(direction)
+		? pos->item - batch->firstItem
+		: batch->lastItem - pos->item;
+
+	/* Delay initializing stream until proceeding */
+	if (proceed_items >= read_stream_threshold
+		&& !scan->xs_heapfetch->rs
+		&& !scan->batchqueue->disabled
+		&& !scan->xs_want_itup	/* XXX prefetching disabled for IoS, for now */
+		&& enable_indexscan_prefetch)
+	{
+		scan->xs_heapfetch->rs =
+			read_stream_begin_relation(READ_STREAM_DEFAULT, NULL,
+									   scan->heapRelation, MAIN_FORKNUM,
+									   scan->heapRelation->rd_tableam->index_getnext_stream,
+									   scan, 0);
+	}
+
 	if (ScanDirectionIsForward(direction))
 	{
 		if (++pos->item <= batch->lastItem)
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 8f6fa6843cb..0c2081e32ba 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -2322,6 +2322,15 @@
   max => 'DBL_MAX',
 },
 
+{ name => 'read_stream_threshold', type => 'int', context => 'PGC_USERSET', group => 'QUERY_TUNING_COST',
+  short_desc => 'Minimal number of heap tuples for creation read stream.',
+  flags => 'GUC_EXPLAIN',
+  variable => 'read_stream_threshold',
+  boot_val => 'DEFAULT_READ_STREAM_THRESHOLD',
+  min => '0',
+  max => 'INT_MAX',
+},
+
 { name => 'recovery_end_command', type => 'string', context => 'PGC_SIGHUP', group => 'WAL_ARCHIVE_RECOVERY',
   short_desc => 'Sets the shell command that will be executed once at the end of recovery.',
   variable => 'recoveryEndCommand',
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 00f4c3d0011..97150433c99 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -28,6 +28,7 @@
 #define DEFAULT_CPU_OPERATOR_COST  0.0025
 #define DEFAULT_PARALLEL_TUPLE_COST 0.1
 #define DEFAULT_PARALLEL_SETUP_COST  1000.0
+#define DEFAULT_READ_STREAM_THRESHOLD	10
 
 /* defaults for non-Cost parameters */
 #define DEFAULT_RECURSIVE_WORKTABLE_FACTOR  10.0
@@ -72,6 +73,7 @@ extern PGDLLIMPORT bool enable_partition_pruning;
 extern PGDLLIMPORT bool enable_presorted_aggregate;
 extern PGDLLIMPORT bool enable_async_append;
 extern PGDLLIMPORT int constraint_exclusion;
+extern PGDLLIMPORT int read_stream_threshold;
 
 extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
 								  double index_pages, PlannerInfo *root);


Attachments:

  [text/plain] read_stream_threshold.patch (3.5K, 2-read_stream_threshold.patch)
  download | inline diff:
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index b9d42b15a18..aac663edfdd 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -36,6 +36,7 @@
 #include "commands/progress.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
+#include "optimizer/cost.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/bufpage.h"
@@ -46,6 +47,8 @@
 #include "utils/builtins.h"
 #include "utils/rel.h"
 
+int read_stream_threshold = DEFAULT_READ_STREAM_THRESHOLD;
+
 static void reform_and_rewrite_tuple(HeapTuple tuple,
 									 Relation OldHeap, Relation NewHeap,
 									 Datum *values, bool *isnull, RewriteState rwstate);
@@ -229,6 +232,7 @@ heap_batch_advance_pos(IndexScanDesc scan, struct BatchQueueItemPos *pos,
 					   ScanDirection direction)
 {
 	BatchIndexScan batch;
+	int proceed_items;
 
 	/* make sure we have batching initialized and consistent */
 	batch_assert_batches_valid(scan);
@@ -288,6 +292,24 @@ heap_batch_advance_pos(IndexScanDesc scan, struct BatchQueueItemPos *pos,
 	 */
 	batch = INDEX_SCAN_BATCH(scan, pos->batch);
 
+	proceed_items = ScanDirectionIsForward(direction)
+		? pos->item - batch->firstItem
+		: batch->lastItem - pos->item;
+
+	/* Delay initializing stream until proceeding */
+	if (proceed_items >= read_stream_threshold
+		&& !scan->xs_heapfetch->rs
+		&& !scan->batchqueue->disabled
+		&& !scan->xs_want_itup	/* XXX prefetching disabled for IoS, for now */
+		&& enable_indexscan_prefetch)
+	{
+		scan->xs_heapfetch->rs =
+			read_stream_begin_relation(READ_STREAM_DEFAULT, NULL,
+									   scan->heapRelation, MAIN_FORKNUM,
+									   scan->heapRelation->rd_tableam->index_getnext_stream,
+									   scan, 0);
+	}
+
 	if (ScanDirectionIsForward(direction))
 	{
 		if (++pos->item <= batch->lastItem)
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 8f6fa6843cb..0c2081e32ba 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -2322,6 +2322,15 @@
   max => 'DBL_MAX',
 },
 
+{ name => 'read_stream_threshold', type => 'int', context => 'PGC_USERSET', group => 'QUERY_TUNING_COST',
+  short_desc => 'Minimal number of heap tuples for creation read stream.',
+  flags => 'GUC_EXPLAIN',
+  variable => 'read_stream_threshold',
+  boot_val => 'DEFAULT_READ_STREAM_THRESHOLD',
+  min => '0',
+  max => 'INT_MAX',
+},
+
 { name => 'recovery_end_command', type => 'string', context => 'PGC_SIGHUP', group => 'WAL_ARCHIVE_RECOVERY',
   short_desc => 'Sets the shell command that will be executed once at the end of recovery.',
   variable => 'recoveryEndCommand',
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 00f4c3d0011..97150433c99 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -28,6 +28,7 @@
 #define DEFAULT_CPU_OPERATOR_COST  0.0025
 #define DEFAULT_PARALLEL_TUPLE_COST 0.1
 #define DEFAULT_PARALLEL_SETUP_COST  1000.0
+#define DEFAULT_READ_STREAM_THRESHOLD	10
 
 /* defaults for non-Cost parameters */
 #define DEFAULT_RECURSIVE_WORKTABLE_FACTOR  10.0
@@ -72,6 +73,7 @@ extern PGDLLIMPORT bool enable_partition_pruning;
 extern PGDLLIMPORT bool enable_presorted_aggregate;
 extern PGDLLIMPORT bool enable_async_append;
 extern PGDLLIMPORT int constraint_exclusion;
+extern PGDLLIMPORT int read_stream_threshold;
 
 extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
 								  double index_pages, PlannerInfo *root);


view thread (348+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: index prefetching
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox