From: Andres Freund <[email protected]>
To: Tomas Vondra <[email protected]>
Cc: Peter Geoghegan <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Melanie Plageman <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Georgios <[email protected]>
Cc: Konstantin Knizhnik <[email protected]>
Cc: Dilip Kumar <[email protected]>
Subject: Re: index prefetching
Date: Wed, 13 Aug 2025 10:44:33 -0400
Message-ID: <c7a77pcyc5eynme376wvyojryijtlieyxsu3bvxp4eiy6au6uf@caniulyi4jr5>
In-Reply-To: <[email protected]>
References: <[email protected]>
	<CAH2-WzmdiO4fHA1O06SYUjgMQZG7haysY7Tu5DS5z-CHsv5MLQ@mail.gmail.com>
	<[email protected]>
	<CAH2-Wz=Y-PsC6_tZOPhHWvPx0geGnrh9VKjUZ-168ezUM_XM2Q@mail.gmail.com>
	<CA+hUKGKMaZLmNQHaa_DZMw9MJJKGegjrqnTY3KOZB-_nvFa3wQ@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CAH2-Wzko86NwiENCJGtakJ=fOhWpr-Yz-F+1oxgv2Ku1mvXwvA@mail.gmail.com>
	<[email protected]>

Hi,

On 2025-08-13 14:15:37 +0200, Tomas Vondra wrote:
> In fact, I believe this is about io_method. I initially didn't see the
> difference you described, and then I realized I set io_method=sync to
> make it easier to track the block access. And if I change io_method to
> worker, I get different stats, that also change between runs.
>
> With "sync" I always get this (after a restart):
>
>    Buffers: shared hit=7435 read=52801
>
> while with "worker" I get this:
>
>    Buffers: shared hit=4879 read=52801
>    Buffers: shared hit=5151 read=52801
>    Buffers: shared hit=4978 read=52801
>
> So not only it changes run to run, it also does not add up to 60236.

This is reproducible on master? If so, how?


> I vaguely recall I ran into this some time ago during AIO benchmarking,
> and IIRC it's due to how StartReadBuffersImpl() may behave differently
> depending on I/O started earlier. It only calls PinBufferForBlock() in
> some cases, and PinBufferForBlock() is what updates the hits.

Hm, I don't immediately see an issue there. The only case we don't call
PinBufferForBlock() is if we already have pinned the relevant buffer in a
prior call to StartReadBuffersImpl().


If this happens only with the prefetching patch applied, is it possible that
we occasionally re-request buffers that are already in the process of being
read in? That would only happen with a read stream and io_method != sync
(since with sync we won't read ahead). If we have to start reading in a buffer
that's already undergoing IO, we wait for the IO to complete and count that
access as a hit:

	/*
	 * Check if we can start IO on the first to-be-read buffer.
	 *
	 * If an I/O is already in progress in another backend, we want to wait
	 * for the outcome: either done, or something went wrong and we will
	 * retry.
	 */
	if (!ReadBuffersCanStartIO(buffers[nblocks_done], false))
	{
...
		/*
		 * Report and track this as a 'hit' for this backend, even though it
		 * must have started out as a miss in PinBufferForBlock(). The other
		 * backend will track this as a 'read'.
		 */
...
		if (persistence == RELPERSISTENCE_TEMP)
			pgBufferUsage.local_blks_hit += 1;
		else
			pgBufferUsage.shared_blks_hit += 1;
...
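
To make the accounting rule in the excerpt concrete, here is a minimal toy
model. It is not the bufmgr.c code: every name in it (ToyBlockState,
ToyBufferUsage, toy_request_block, toy_complete_io) is hypothetical, and it
assumes a single block whose IO state is tracked by a plain enum. It only
illustrates that a request which finds an IO already in progress is counted as
a hit, while the request that started the IO is counted as a read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in PostgreSQL. */
typedef enum
{
    BLK_ABSENT,             /* not in the buffer pool */
    BLK_IO_IN_PROGRESS,     /* a read was started but has not completed */
    BLK_VALID               /* contents are valid */
} ToyBlockState;

typedef struct
{
    long        hits;
    long        reads;
} ToyBufferUsage;

/* Mark an in-flight read as finished (assumed to always succeed here). */
static void
toy_complete_io(ToyBlockState *state)
{
    assert(state != NULL);
    if (*state == BLK_IO_IN_PROGRESS)
        *state = BLK_VALID;
}

/*
 * Request one block and update the counters.
 * Returns true if this call started a new read.
 */
static bool
toy_request_block(ToyBlockState *state, ToyBufferUsage *usage)
{
    assert(state != NULL && usage != NULL);

    switch (*state)
    {
        case BLK_VALID:
            /* ordinary cache hit */
            usage->hits += 1;
            return false;

        case BLK_IO_IN_PROGRESS:
            /*
             * Someone already started the read: wait for it to finish and
             * count the access as a hit, as in the excerpt above.
             */
            toy_complete_io(state);
            usage->hits += 1;
            return false;

        case BLK_ABSENT:
            /* we start the read ourselves, so this is counted as a read */
            *state = BLK_IO_IN_PROGRESS;
            usage->reads += 1;
            return true;
    }

    return false;               /* not reached */
}
```

In this model, requesting an absent block twice in a row yields reads=1 and
hits=1, with the second request taking the "IO in progress" path. Whether that
path is what produces the varying hit counts quoted above is unverified.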


Greetings,

Andres Freund




