public inbox for [email protected]  
help / color / mirror / Atom feed
From: Thomas Munro <[email protected]>
To: Peter Geoghegan <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Melanie Plageman <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Georgios <[email protected]>
Cc: Konstantin Knizhnik <[email protected]>
Cc: Dilip Kumar <[email protected]>
Subject: Re: index prefetching
Date: Tue, 12 Aug 2025 17:06:47 +1200
Message-ID: <CA+hUKGKMaZLmNQHaa_DZMw9MJJKGegjrqnTY3KOZB-_nvFa3wQ@mail.gmail.com> (raw)
In-Reply-To: <CAH2-Wz=Y-PsC6_tZOPhHWvPx0geGnrh9VKjUZ-168ezUM_XM2Q@mail.gmail.com>
References: <[email protected]>
	<[email protected]>
	<CAH2-WzmER9kc4OtmkDh+h51QV=v6Yc5BGsJikwJHtucf1C1HWw@mail.gmail.com>
	<[email protected]>
	<CAH2-Wz=0enySZ5g0k0BLY3tHRs=wyG=7yXDYP=Abt=6GM=7XkQ@mail.gmail.com>
	<[email protected]>
	<CAH2-WzkX2fwtiNOX4RrBR8=XKta999NM_5+ghTUnwUBkeyfcHQ@mail.gmail.com>
	<[email protected]>
	<CAH2-WzkPh+L2u8_4jG=NgGgzFNqW7ZZhSxGb6mJR=2YdouL1_Q@mail.gmail.com>
	<[email protected]>
	<xhe4l63oxs36sunvqbf3bzlzrondjyhumm7ywt4arkxu622f52@yriy7rejf5g2>
	<[email protected]>
	<CAH2-WzmdiO4fHA1O06SYUjgMQZG7haysY7Tu5DS5z-CHsv5MLQ@mail.gmail.com>
	<[email protected]>
	<CAH2-Wz=Y-PsC6_tZOPhHWvPx0geGnrh9VKjUZ-168ezUM_XM2Q@mail.gmail.com>

On Tue, Aug 12, 2025 at 11:42 AM Peter Geoghegan <[email protected]> wrote:
> On Mon, Aug 11, 2025 at 5:07 PM Tomas Vondra <[email protected]> wrote:
> > I can do some tests with forward vs. backwards scans. Of course, the
> > trouble with finding these weird cases is that they may be fairly rare.
> > So hitting them is a matter or luck or just happening to generate the
> > right data / query. But I'll give it a try and we'll see.
>
> I was talking more about finding "performance bugs" through a
> semi-directed process of trying random things while looking out for
> discrepancies. Something like that shouldn't require the usual
> "benchmarking rigor", since suspicious inconsistencies should be
> fairly obvious once encountered. I expect similar queries to have
> similar performance, regardless of superficial differences such as
> scan direction, DESC vs ASC column order, etc.

I'd be interested to hear more about reverse scans.  Bilal was
speculating about backwards I/O combining in read_stream.c a while
back, but we didn't have anything interesting to use it yet.  You'll
probably see a flood of uncombined 8KB IOs in the pg_aios view while
travelling up the heap with cache misses today.  I suspect Linux does
reverse sequential prefetching with buffered I/O (less sure about
other OSes) which should help but we'd still have more overheads than
we could if we combined them, not to mention direct I/O.

Not tested, but something like this might do it:

                /* Can we merge it with the pending read? */
-               if (stream->pending_read_nblocks > 0 &&
-                       stream->pending_read_blocknum +
stream->pending_read_nblocks == blocknum)
+               if (stream->pending_read_nblocks > 0)
                {
-                       stream->pending_read_nblocks++;
-                       continue;
+                       if (stream->pending_read_blocknum +
stream->pending_read_nblocks ==
+                               blocknum)
+                       {
+                               stream->pending_read_nblocks++;
+                               continue;
+                       }
+                       else if (stream->pending_read_blocknum ==
blocknum + 1 &&
+                                        stream->forwarded_buffers == 0)
+                       {
+                               stream->pending_read_blocknum--;
+                               stream->pending_read_nblocks++;
+                               continue;
+                       }
                }

> I tested this issue again (using my original pgbench_account query),
> having rebased on top of HEAD as of today. I found that the
> inconsistency seems to be much smaller now -- so much so that I don't
> think that the remaining inconsistency is particularly suspicious.
>
> I also think that performance might have improved across the board. I
> see that the same TPC-C query that took 768.454 ms a few weeks back
> now takes only 617.408 ms. Also, while I originally saw "I/O Timings:
> shared read=138.856" with this query, I now see "I/O Timings: shared
> read=46.745". That feels like a performance bug fix to me.
>
> I wonder if today's commit b4212231 from Thomas ("Fix rare bug in
> read_stream.c's split IO handling") fixed the issue, without anyone
> realizing that the bug in question could manifest like this.

I can't explain that.  If you can consistently reproduce the change at
the two base commits, maybe bisect?  If it's a real phenomenon I'm
definitely curious to know what you're seeing.





view thread (348+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: index prefetching
  In-Reply-To: <CA+hUKGKMaZLmNQHaa_DZMw9MJJKGegjrqnTY3KOZB-_nvFa3wQ@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox