From: Xuneng Zhou <[email protected]>
To: Michael Paquier <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Subject: Re: Streamify more code paths
Date: Tue, 10 Mar 2026 21:23:26 +0800
Message-ID: <CABPTF7XD51Qx2043p80npKmYEd67qMagK5AW=s6LNXyZt5s2nw@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <CAN55FZ02eR083kPV_8_boWEJphXZW=-hRxJKp7nwR-WomyKb6g@mail.gmail.com>
<CABPTF7VSa5L=k6ONVUZHfRrO2Y2_iYz6npWj0Na69RoCvSevpQ@mail.gmail.com>
<CABPTF7V3+QGC+0W9ERCcAY14jq_w_XvmwrRs9vXbi_oqv4FnTQ@mail.gmail.com>
<CABPTF7VyePb8O-WDgs2hCCXYhZzGzdjg0N3NkxojZ=ke4SB3pA@mail.gmail.com>
<CAN55FZ39HSsXKTSi66ASq+i4Ed5FuGXD11hmJ+8c0F0O0+ozew@mail.gmail.com>
<CABPTF7Vd4JWSHi9N7pGTzn6xmOdtAToCe1NGbZAH8U9_mXOqpw@mail.gmail.com>
<CABPTF7W-f_zPN442FCp4Xaopi721oDmGYimq=VhAk=F7jwYZDQ@mail.gmail.com>
<CABPTF7VUaRnvsXqa+628YkuR4oPVRr1mR2seXTkxabfiqQ3NHw@mail.gmail.com>
<CABPTF7VtSYmC5LZSnkJWYn9PCkxgOJd9QbtAM79qftBK-fbA4w@mail.gmail.com>
<CABPTF7UVCkub6jFXVk-qrYd4xjgiwRt1FTFL2=rBVV9SYcgfkQ@mail.gmail.com>
<[email protected]>
Hi Michael,
On Tue, Mar 10, 2026 at 6:28 PM Michael Paquier <[email protected]> wrote:
>
> On Tue, Mar 10, 2026 at 02:06:12PM +0800, Xuneng Zhou wrote:
> > Here’s v5 of the patchset. The wal_logging_large patch has been
> > removed, as no performance gains were observed in the benchmark runs.
>
> Looking at the numbers you are posting, it is harder to get excited
> about the hash, gin, bloom_vacuum and wal_logging. The worker method
> seems more efficient, may show that we are out of noise level.
> The results associated to pgstattuple and the bloom scans are on a
> different level for the three methods.
>
> Saying that, it is really nice that you have sent the benchmark. The
> measurement method looks in line with the goal here after review (IO
> stats, calculations), and I have taken some time to run it to get an
> idea of the difference for these five code paths, as of (slightly
> edited the script for my own environment, result is the same):
> ./run_streaming_benchmark --baseline --io-method=io_uring/worker
>
> I am not much interested in the sync case, so I have tested the two
> other methods:
>
> 1) method=IO-uring
> bloom_scan_large base= 725.3ms patch= 99.9ms 7.26x
> ( 86.2%) (reads=19676->1294, io_time=688.36->33.69ms)
> bloom_vacuum_large base= 7414.9ms patch= 7455.2ms 0.99x
> ( -0.5%) (reads=48361->11597, io_time=459.02->257.51ms)
> pgstattuple_large base= 12642.9ms patch= 11873.5ms 1.06x
> ( 6.1%) (reads=206945->12983, io_time=6516.70->143.46ms)
> gin_vacuum_large base= 3546.8ms patch= 2317.9ms 1.53x
> ( 34.6%) (reads=20734->17735, io_time=3244.40->2021.53ms)
> hash_vacuum_large base= 12268.5ms patch= 11751.1ms 1.04x
> ( 4.2%) (reads=76677->15606, io_time=1483.10->315.03ms)
> wal_logging_large base= 33713.0ms patch= 32773.9ms 1.03x
> ( 2.8%) (reads=21641->21641, io_time=81.18->77.25ms)
>
> 2) method=worker io-workers=3
> bloom_scan_large base= 725.0ms patch= 465.7ms 1.56x
> ( 35.8%) (reads=19676->1294, io_time=688.70->52.20ms)
> bloom_vacuum_large base= 7138.3ms patch= 7156.0ms 1.00x
> ( -0.2%) (reads=48361->11597, io_time=284.56->64.37ms)
> pgstattuple_large base= 12429.3ms patch= 11916.8ms 1.04x
> ( 4.1%) (reads=206945->12983, io_time=6501.91->32.24ms)
> gin_vacuum_large base= 3769.4ms patch= 3716.7ms 1.01x
> ( 1.4%) (reads=20775->17684, io_time=3562.21->3528.14ms)
> hash_vacuum_large base= 11750.1ms patch= 11289.0ms 1.04x
> ( 3.9%) (reads=76677->15606, io_time=1296.03->98.72ms)
> wal_logging_large base= 32862.3ms patch= 33179.7ms 0.99x
> ( -1.0%) (reads=21641->21641, io_time=91.42->90.59ms)
>
> The bloom scan case is a winner in runtime for both cases, and in
> terms of stats we get much better numbers for all of them. These feel
> rather in line with what you have, except for pgstattuple's runtime,
> still its IO numbers feel good.
Thanks for running the benchmarks! The performance gains for hash,
gin, bloom_vacuum, and wal_logging are insignificant, likely because
these workloads are not I/O-bound. The default number of I/O workers
is three, which is fairly conservative. When I ran the benchmark
script with a higher number of I/O workers, some runs showed improved
performance.
> pgstattuple_large base= 12429.3ms patch= 11916.8ms 1.04x
> ( 4.1%) (reads=206945->12983, io_time=6501.91->32.24ms)
> pgstattuple_large base= 12642.9ms patch= 11873.5ms 1.06x
> ( 6.1%) (reads=206945->12983, io_time=6516.70->143.46ms)
Yeah, this looks somewhat strange. The io_time has been reduced
significantly, which should also lead to a substantial reduction in
runtime.
method=io_uring
pgstattuple_large base= 5551.5ms patch= 3498.2ms 1.59x
( 37.0%) (reads=206945→12983, io_time=2323.49→207.14ms)
I ran the benchmark for this test again with io_uring, and the result
is consistent with previous runs. I’m not sure what might be
contributing to this behavior.
Another code path that showed significant performance improvement is
pgstatindex [1]. I've incorporated the test into the script too. Here
are the results from my testing:
method=worker io-workers=12
pgstatindex_large base= 233.8ms patch= 54.1ms 4.32x
( 76.8%) (reads=27460→1757, io_time=213.94→6.31ms)
method=io_uring
pgstatindex_large base= 224.2ms patch= 56.4ms 3.98x
( 74.9%) (reads=27460→1757, io_time=204.41→4.88ms)
> That's just to say that I'll review
> them and try to do something about at least some of the pieces for
> this release.
Thanks for that.
[1] https://www.postgresql.org/message-id/flat/CABPTF7UeN2o-trr9r7K76rZExnO2M4SLfvTfbUY2CwQjCekgnQ%40mai...
--
Best,
Xuneng
Attachments:
[application/x-patch] v6-0001-Use-streaming-read-in-pgstatindex-functions.patch (4.4K, 2-v6-0001-Use-streaming-read-in-pgstatindex-functions.patch)
From 2e925f32aada5b5aad4b7a82fe6d76c8db9fb075 Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Tue, 10 Mar 2026 20:28:16 +0800
Subject: [PATCH v6] Use streaming read API in pgstatindex functions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Replace synchronous ReadBufferExtended() loops with the streaming read
API in pgstatindex_impl() and pgstathashindex().
Author: Xuneng Zhou <[email protected]>
Reviewed-by: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: wenhui qiu <[email protected]>
Reviewed-by: Shinya Kato <[email protected]>
---
contrib/pgstattuple/pgstatindex.c | 57 ++++++++++++++++++++++++++-----
1 file changed, 48 insertions(+), 9 deletions(-)
diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index ef723af1f19..41cafe8559a 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -37,6 +37,7 @@
#include "funcapi.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
+#include "storage/read_stream.h"
#include "utils/rel.h"
#include "utils/varlena.h"
@@ -217,6 +218,8 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
BlockNumber blkno;
BTIndexStat indexStat;
BufferAccessStrategy bstrategy = GetAccessStrategy(BAS_BULKREAD);
+ BlockRangeReadStreamPrivate p;
+ ReadStream *stream;
if (!IS_INDEX(rel) || !IS_BTREE(rel))
ereport(ERROR,
@@ -273,10 +276,26 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
indexStat.fragments = 0;
/*
- * Scan all blocks except the metapage
+ * Scan all blocks except the metapage (0th page) using streaming reads
*/
nblocks = RelationGetNumberOfBlocks(rel);
+ p.current_blocknum = BTREE_METAPAGE + 1;
+ p.last_exclusive = nblocks;
+
+ /*
+ * It is safe to use batchmode as block_range_read_stream_cb takes no
+ * locks.
+ */
+ stream = read_stream_begin_relation(READ_STREAM_FULL |
+ READ_STREAM_USE_BATCHING,
+ bstrategy,
+ rel,
+ MAIN_FORKNUM,
+ block_range_read_stream_cb,
+ &p,
+ 0);
+
for (blkno = 1; blkno < nblocks; blkno++)
{
Buffer buffer;
@@ -285,8 +304,7 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
CHECK_FOR_INTERRUPTS();
- /* Read and lock buffer */
- buffer = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy);
+ buffer = read_stream_next_buffer(stream, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
page = BufferGetPage(buffer);
@@ -322,11 +340,12 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
else
indexStat.internal_pages++;
- /* Unlock and release buffer */
- LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
- ReleaseBuffer(buffer);
+ UnlockReleaseBuffer(buffer);
}
+ Assert(read_stream_next_buffer(stream, NULL) == InvalidBuffer);
+ read_stream_end(stream);
+
relation_close(rel, AccessShareLock);
/*----------------------------
@@ -600,6 +619,8 @@ pgstathashindex(PG_FUNCTION_ARGS)
HashMetaPage metap;
float8 free_percent;
uint64 total_space;
+ BlockRangeReadStreamPrivate p;
+ ReadStream *stream;
/*
* This uses relation_open() and not index_open(). The latter allows
@@ -644,7 +665,23 @@ pgstathashindex(PG_FUNCTION_ARGS)
/* prepare access strategy for this index */
bstrategy = GetAccessStrategy(BAS_BULKREAD);
- /* Start from blkno 1 as 0th block is metapage */
+ /* Scan all blocks except the metapage (0th page) using streaming reads */
+ p.current_blocknum = HASH_METAPAGE + 1;
+ p.last_exclusive = nblocks;
+
+ /*
+ * It is safe to use batchmode as block_range_read_stream_cb takes no
+ * locks.
+ */
+ stream = read_stream_begin_relation(READ_STREAM_FULL |
+ READ_STREAM_USE_BATCHING,
+ bstrategy,
+ rel,
+ MAIN_FORKNUM,
+ block_range_read_stream_cb,
+ &p,
+ 0);
+
for (blkno = 1; blkno < nblocks; blkno++)
{
Buffer buf;
@@ -652,8 +689,7 @@ pgstathashindex(PG_FUNCTION_ARGS)
CHECK_FOR_INTERRUPTS();
- buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- bstrategy);
+ buf = read_stream_next_buffer(stream, NULL);
LockBuffer(buf, BUFFER_LOCK_SHARE);
page = BufferGetPage(buf);
@@ -698,6 +734,9 @@ pgstathashindex(PG_FUNCTION_ARGS)
UnlockReleaseBuffer(buf);
}
+ Assert(read_stream_next_buffer(stream, NULL) == InvalidBuffer);
+ read_stream_end(stream);
+
/* Done accessing the index */
relation_close(rel, AccessShareLock);
--
2.51.0
[application/x-sh] run_streaming_benchmark.sh (28.2K, 3-run_streaming_benchmark.sh)