Re: Idea to enhance pgbench by more modes to generate data (multi-TXNs, UNNEST, COPY BINARY)

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Boris Mironov <[email protected]>
To: Fujii Masao <[email protected]>
Cc: Madyshev Egor <[email protected]>
Cc: Hayato Kuroda (Fujitsu) <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: Idea to enhance pgbench by more modes to generate data (multi-TXNs, UNNEST, COPY BINARY)
Date: Fri, 20 Mar 2026 02:59:01 +0000
Message-ID: <PH0PR08MB70208B8F2C961EC78C092772884CA@PH0PR08MB7020.namprd08.prod.outlook.com> (raw)
In-Reply-To: <CAHGQGwGKAr7GFD5mS3aGi4W4KpmP9wVkbYRWEHOQM8u18qX9Cg@mail.gmail.com>
References: <OS9PR01MB121499E64344675A86179219CF579A@OS9PR01MB12149.jpnprd01.prod.outlook.com>
	<PH0PR08MB7020F6EA8877834CCDD5FABB8879A@PH0PR08MB7020.namprd08.prod.outlook.com>
	<[email protected]>
	<SA1PR08MB702308B377B110460FCDADA18847A@SA1PR08MB7023.namprd08.prod.outlook.com>
	<[email protected]>
	<PH0PR08MB7020956FB86755AD6B4E2DCF8844A@PH0PR08MB7020.namprd08.prod.outlook.com>
	<CAHGQGwGKAr7GFD5mS3aGi4W4KpmP9wVkbYRWEHOQM8u18qX9Cg@mail.gmail.com>

Dear Fujii Masao,

Thank you for your interest in this patch.

For convenience I duplicate my results from previous post and
reformat your results below:

pgbench (PostgreSQL) 19devel / client-side generation
Mode
Scale | 1 | 2 | 10 | 20 | 100 | 200 | 1000 | 2000
Mc | 0.15 | 0.29 | 1.39 | 2.93 | 14.79 | 30.78 | 161.52 | 330.63
Sc | 0.13 | 0.27 | 1.37 | 2.69 | 14.71 | 29.99 | 152.83 | 298.88
Mg | 0.17 | 0.30 | 1.58 | 3.26 | 15.91 | 31.95 | 160.31 | 326.46
Sg | 0.20 | 0.38 | 1.66 | 3.39 | 18.72 | 36.26 | 176.54 | 351.66

pgbench (PostgreSQL) 19devel / server-side generation
  Mode
Scale  |  1   |  2   |  10  |  20  |  100  |  200  |  1000  |  2000
   MU | 0.22 | 0.47 | 2.35 | 4.71 | 24.80 | 53.19 | 261.44 | 536.05
   SU | 0.22 | 0.44 | 2.35 | 4.78 | 24.42 | 49.16 | 246.41 | 495.51
   MG | 0.22 | 0.45 | 2.29 | 4.44 | 24.03 | 50.21 | 256.54 | 544.17
   SG | 0.23 | 0.43 | 2.35 | 5.04 | 27.74 | 52.72 | 250.27 | 492.77


Fujii's single run results / client-side generation
  Mode
Scale |  100  |  1000
Sc    | 18.31 | 204.47
Sg    | 13.38 | 151.81

Fujii's single run results / server-side generation
  Mode
Scale |  100  |  1000
SU    | 23.16 | 249.07
SG    | 21.30 | 230.93

> I ran some tests on my MacBook (server and pgbench on the same machine)

In my environment I ran pgbench on different virtual machine from DB
server and win of COPY BINARY over COPY TEXT was clear. All results are
average of 5 runs. You can use "-Idtccccc" parameter to run 5 times
same init mode. By default it will be "single transaction for whole data
set". Otherwise "-IdtMccccc" will execute init in "one transaction per
scale" mode 5 times. Could you please rerun your tests with these
tweaks?

> So I'm wondering if there are specific conditions where binary mode or unnest()
> performs better. If not, it may not be worth supporting these additional data
> loading options...

Besides introduction of new data generation modes this patch prepares ground
for multi-threaded data initialization with introduction of "one COMMIT per scale"
mode. By the way, all of your tests were executed in old (default) "single transaction
for whole data set" mode.

For addition of UNNEST init mode, it was added for PostgreSQL forks like TigerDB
that add columnar mode of data storage. This is quite useful for OLAP systems as
much more efficient way of data retrieval of few columns from wide tables. These
are also quite efficient as databases for monitoring systems like Prometheus.

Best regards,
Boris

________________________________
From: Fujii Masao <[email protected]>
Sent: March 19, 2026 11:29 AM
To: Boris Mironov <[email protected]>
Cc: Madyshev Egor <[email protected]>; Hayato Kuroda (Fujitsu) <[email protected]>; [email protected] <[email protected]>
Subject: Re: Idea to enhance pgbench by more modes to generate data (multi-TXNs, UNNEST, COPY BINARY)

On Thu, Mar 12, 2026 at 11:28 PM Boris Mironov
<[email protected]> wrote:
>
> Hi Egor,
>
> Thank you very much for your time reviewing this patch
> and guiding through some of its inefficiencies.

I like the idea of improving the performance of the initial data load
in pgbench. That's definitely useful.

I ran some tests on my MacBook (server and pgbench on the same machine)
using different data loading options. Here are the results:

------------------------------------------------
[ Client / Text mode ]
pgbench -i -Idtg -s100
done in 13.38 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 13.38 s).

pgbench -i -Idtg -s1000
done in 151.81 s (drop tables 0.00 s, create tables 0.01 s,
client-side generate 151.81 s).

[ Client / Binary mode ]
pgbench -i -Idtc -s100
done in 18.32 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 18.31 s).

pgbench -i -Idtc -s1000
done in 204.48 s (drop tables 0.00 s, create tables 0.01 s,
client-side generate 204.47 s).

[ Server / generate_series ]
pgbench -i -IdtG -s100
done in 21.30 s (drop tables 0.00 s, create tables 0.00 s, server-side
generate 21.30 s).

pgbench -i -IdtG -s1000
done in 230.94 s (drop tables 0.00 s, create tables 0.01 s,
server-side generate 230.93 s).

[ Server / Unnest ]
pgbench -i -IdtU -s100
done in 23.16 s (drop tables 0.00 s, create tables 0.00 s, server-side
generate 23.16 s).

pgbench -i -IdtU -s1000
done in 249.08 s (drop tables 0.00 s, create tables 0.01 s,
server-side generate 249.07 s).
------------------------------------------------

In my tests, text mode was faster than binary mode. Also, on the server side,
generate_series() was faster than unnest().

So I'm wondering if there are specific conditions where binary mode or unnest()
performs better. If not, it may not be worth supporting these additional data
loading options...

Regards,


--
Fujii Masao

view thread (21+ messages)

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Idea to enhance pgbench by more modes to generate data (multi-TXNs, UNNEST, COPY BINARY)
  In-Reply-To: <PH0PR08MB70208B8F2C961EC78C092772884CA@PH0PR08MB7020.namprd08.prod.outlook.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox