Re: index prefetching - Andres Freund

public inbox for [email protected]  

From: Andres Freund <[email protected]>
To: Peter Geoghegan <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: Alexandre Felipe <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Melanie Plageman <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Georgios <[email protected]>
Cc: Konstantin Knizhnik <[email protected]>
Cc: Dilip Kumar <[email protected]>
Subject: Re: index prefetching
Date: Thu, 26 Feb 2026 23:18:03 -0500
Message-ID: <issqornf6vdn3vb64fjuoathypmu3e5pgputd3lpfuvoeqyvzr@qfordnhplp2v>
In-Reply-To: <CAH2-Wzmy7NMba9k8m_VZ-XNDZJEUQBU8TeLEeL960-rAKb-+tQ@mail.gmail.com>
References: <CAE8JnxN_EwnTLLMWGhvgwaomYZ0ysm7NeogA-BqBd=Rs3S7Oqw@mail.gmail.com>
	<64a2re223ajj4popowsyu4xekbuvvyfwkrihn5yzyrkwsmsuvp@2lls3tpww5dl>
	<a67mvhyi2q45eg4eimhpwdg6l3s3dmpahti2svffvmvzwmss27@r4nohusvndbq>
	<[email protected]>
	<il7jtfowpatrlg33qb5plj7v7pferes4ogerq5fdczszi4kokh@sbwvb2ukfgos>
	<[email protected]>
	<ws47e3wly6skt36b23zy5qfvcxzueo6od3uicunuodsqnxl7os@7v2qi7qkxzbz>
	<CAH2-Wzk-89uCvdJ1Q6NsM6LvDvUEt6Qy66T6A60J=D_voWxZDg@mail.gmail.com>
	<64mfcfv7iihc4pmqlxarii4esnmqry52ckz5m7lmwylnfnuxuz@oxh4ioxkjtep>
	<CAH2-Wzmy7NMba9k8m_VZ-XNDZJEUQBU8TeLEeL960-rAKb-+tQ@mail.gmail.com>

Hi,

On 2026-02-24 13:13:25 -0500, Peter Geoghegan wrote:
> > Plausible.  It could be that we could get away with controlling the rampup to
> > be slower in potentially problematic cases, without needing the yielding, but
> > not sure.
> 
> Attached is v11, which makes the read stream yielding mechanism better
> cooperate with index prefetching, so as to avoid interfering with
> io_combine_limit. This should deal with the odd performance that you
> complained about. See
> v11-0006-Introduce-read_stream_-pause-resume-yield.patch (and the
> later prefetching patch
> v11-0007-Add-heapam-index-scan-I-O-prefetching.patch) for details.
> 
> The whole idea of measuring "batch distance" is gone in this version,
> though we do still only consider whether now is a good time to yield
> at "batch boundaries". We always refuse to yield on the first few batches
> of the scan, so the idea of caring about batch boundaries is still
> there, albeit in a much more limited form.
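
The two constraints described above (yielding is only considered at batch boundaries, and never during the first few batches) can be sketched as a simple gate. This is a minimal illustration under stated assumptions, not code from the v11 patches: `ScanState`, `may_yield` and `MIN_BATCHES_BEFORE_YIELD` are hypothetical names, the threshold value is arbitrary because the message only says "the first few batches", and the actual "is now a good time to yield" heuristic is left as an opaque input.

```python
from dataclasses import dataclass

# Hypothetical threshold: the message only says "the first few batches".
MIN_BATCHES_BEFORE_YIELD = 3


@dataclass
class ScanState:
    """Hypothetical per-scan state; not a structure from the actual patch."""
    batches_started: int = 0


def may_yield(state: ScanState, at_batch_boundary: bool, heuristic_says_yield: bool) -> bool:
    """Decide whether yielding is allowed at this point in the scan."""
    # Yielding is only ever considered at batch boundaries.
    if not at_batch_boundary:
        return False
    # Never yield during the first few batches of the scan.
    if state.batches_started < MIN_BATCHES_BEFORE_YIELD:
        return False
    # Past that point, defer to whatever "is now a good time" heuristic applies.
    return heuristic_says_yield
```

The gate only ever vetoes a yield; whether yielding actually happens past the early batches is decided entirely by the unspecified heuristic.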

I'm planning to do some reviewing in the next few days. In preparation, I just
re-ran a benchmark and saw some odd results. After a while I was able to
reproduce them even with a simpler setup:

-c shared_buffers=2GB -c debug_io_direct=data -c io_method=io_uring


pgbench -i -q -s 100 --fillfactor=90
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                            QUERY PLAN                                                                            │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.43..441511.11 rows=10000045 width=97) (actual time=0.308..6101.837 rows=10000000.00 loops=1) │
│   Index Searches: 1                                                                                                                                              │
│   Buffers: shared hit=27325 read=181819                                                                                                                          │
│   I/O Timings: shared read=4538.003                                                                                                                              │
│ Planning Time: 0.041 ms                                                                                                                                          │
│ Execution Time: 6433.192 ms                                                                                                                                      │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

pgbench -i -q -s 100 --fillfactor=50
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                           QUERY PLAN                                                                            │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.43..593022.41 rows=9999798 width=97) (actual time=0.131..3973.698 rows=10000000.00 loops=1) │
│   Index Searches: 1                                                                                                                                             │
│   Buffers: shared hit=19239 read=341420                                                                                                                         │
│   I/O Timings: shared read=1752.057                                                                                                                             │
│ Planning:                                                                                                                                                       │
│   Buffers: shared hit=42 read=15                                                                                                                                │
│ Planning Time: 1.668 ms                                                                                                                                         │
│ Execution Time: 4308.182 ms                                                                                                                                     │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

pgbench -i -q -s 100 --fillfactor=25
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                            QUERY PLAN                                                                            │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.43..926358.51 rows=10000005 width=97) (actual time=0.112..3259.362 rows=10000000.00 loops=1) │
│   Index Searches: 1                                                                                                                                              │
│   Buffers: shared hit=9610 read=684382                                                                                                                           │
│   I/O Timings: shared read=242.259                                                                                                                               │
│ Planning:                                                                                                                                                        │
│   Buffers: shared hit=18                                                                                                                                         │
│ Planning Time: 0.097 ms                                                                                                                                          │
│ Execution Time: 3594.782 ms                                                                                                                                      │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘


Note how the increase in heap pages read (from ~182k to ~684k buffers) actually
*decreases* the overall execution time rather substantially (from ~6.4s to ~3.6s).
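
To put numbers on this, the sketch below re-derives two ratios from the three EXPLAIN outputs above (values copied by hand). "Wait per read" is the "I/O Timings: shared read" figure divided by the number of buffers read; it is only a rough proxy, since it ignores how individual blocks were combined into larger I/Os.

```python
# Figures copied from the three EXPLAIN (ANALYZE) outputs above, keyed by fillfactor.
# Each tuple: (shared buffers read, I/O read time in ms, execution time in ms)
runs = {
    90: (181_819, 4538.003, 6433.192),
    50: (341_420, 1752.057, 4308.182),
    25: (684_382, 242.259, 3594.782),
}


def io_stats(reads, io_ms, exec_ms):
    """Return (I/O wait per buffer read in microseconds, I/O wait share of runtime)."""
    wait_us_per_read = io_ms * 1000.0 / reads
    io_share = io_ms / exec_ms
    return wait_us_per_read, io_share


for ff, (reads, io_ms, exec_ms) in runs.items():
    wait_us, share = io_stats(reads, io_ms, exec_ms)
    print(f"fillfactor={ff}: {reads} reads, "
          f"{wait_us:.2f} us waited per read, {share:.1%} of runtime in I/O wait")
```

The wait per buffer read drops from roughly 25 us (fillfactor=90) to about 5 us (fillfactor=50) and about 0.35 us (fillfactor=25), and the share of runtime spent waiting falls from about 70% to under 7%. That is consistent with the I/O concurrency pattern described below.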

It's quite visible, both in iostat and with a query like
  SELECT pid, target_desc, off, length FROM pg_aios \watch 0.5

that the first query has basically no IO concurrency, the second has only very
intermittent IO concurrency, and the third has nice IO concurrency.


If I disable the yield logic, the fillfactor=90 case is good:
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                            QUERY PLAN                                                                            │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.43..441511.11 rows=10000045 width=97) (actual time=0.470..1662.331 rows=10000000.00 loops=1) │
│   Index Searches: 1                                                                                                                                              │
│   Buffers: shared hit=27325 read=181819                                                                                                                          │
│   I/O Timings: shared read=21.113                                                                                                                                │
│ Planning Time: 0.043 ms                                                                                                                                          │
│ Execution Time: 1995.723 ms                                                                                                                                      │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘


Of course this is a silly query, but you'd also see this with a merge join or
similar.
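
Comparing the two fillfactor=90 runs shows where the time goes. The sketch below uses only figures from the EXPLAIN outputs above and treats "execution time minus I/O read time" as a rough estimate of non-I/O time. That is an approximation, not something EXPLAIN reports directly.

```python
# fillfactor=90 run, with and without the read stream yield logic.
# Each tuple: (I/O read time in ms, execution time in ms), copied from EXPLAIN above.
with_yield = (4538.003, 6433.192)
without_yield = (21.113, 1995.723)


def non_io_ms(io_ms, exec_ms):
    """Rough estimate of time not spent waiting for reads."""
    return exec_ms - io_ms


speedup = with_yield[1] / without_yield[1]
saved_ms = with_yield[1] - without_yield[1]
io_saved_ms = with_yield[0] - without_yield[0]

print(f"speedup without yield: {speedup:.2f}x")
print(f"execution time saved: {saved_ms:.1f} ms, reduction in I/O wait: {io_saved_ms:.1f} ms")
print(f"non-I/O time: {non_io_ms(*with_yield):.1f} ms vs {non_io_ms(*without_yield):.1f} ms")
```

The scan is about 3.2x faster without yielding. The reduction in I/O wait (~4.5 s) accounts for all of the saved time, while the estimated non-I/O time stays in the same ~1.9 to 2.0 s range in both runs.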

Greetings,

Andres Freund
