public inbox for [email protected]
help / color / mirror / Atom feedFrom: Heikki Linnakangas <[email protected]>
To: lakshmi <[email protected]>
To: Mircea Cadariu <[email protected]>
Cc: Hayato Kuroda (Fujitsu) <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: parallel data loading for pgbench -i
Date: Tue, 7 Apr 2026 12:00:28 +0300
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAEvyyTircZ-tHgap=J6Aog0CBgXp4Dqx6dHYyK1iqgfoT+8D_A@mail.gmail.com>
References: <[email protected]>
<CAEvyyTj0rEsgcQOQgkARbRPbupHR_mc=TUzHBBLNzd8JByUUTw@mail.gmail.com>
<[email protected]>
<OS9PR01MB12149A6CEE200E1A5A88D9F14F563A@OS9PR01MB12149.jpnprd01.prod.outlook.com>
<CAEvyyTiQqd=rv3XUxc0YEaW-feopksBveZKKjVZNeSVG=GrY+A@mail.gmail.com>
<TYRPR01MB121560B291DA3CD262CC7A09AF568A@TYRPR01MB12156.jpnprd01.prod.outlook.com>
<CAEvyyTjPWfvJLn3c_G_zLRffZ3=YqzMYj6c5znaNxpHyZAg3XQ@mail.gmail.com>
<[email protected]>
<CAEvyyTircZ-tHgap=J6Aog0CBgXp4Dqx6dHYyK1iqgfoT+8D_A@mail.gmail.com>
On 18/03/2026 12:37, lakshmi wrote:
> So overall, the benefit of parallel loading is much clearer in the
> partitioned case.
>
> I’ll try to look further into the VACUUM behavior.
As discussed already, the slower VACUUM is surely because of the lack of
COPY FREEZE option. That's unfortunate...
The way this patch uses the connections and workers is a little bonkers.
The main thread uses the first connection to execute:
begin; TRUNCATE TABLE pgbench_accounts;
That connection is handed over to the first worker thread, and new
connections are opened for the other workers. But thanks to the
TRUNCATE, the open transaction on the first connection holds an
AccessExclusiveLock, preventing the other workers from starting the COPY
until the first worker has finished! I added some debugging prints to
show this:
$ pgbench -s500 -i -j10 postgres
dropping old tables...
creating tables...
generating data (client-side)...
loading pgbench_accounts with 10 threads...
0.00: thread 0: sending COPY command, use_freeze: 1
0.00: thread 1: sending COPY command, use_freeze: 0
0.00: thread 2: sending COPY command, use_freeze: 0
0.00: thread 0: COPY started for rows between 0 and 5000000
0.00: thread 6: sending COPY command, use_freeze: 0
0.00: thread 3: sending COPY command, use_freeze: 0
0.00: thread 9: sending COPY command, use_freeze: 0
0.00: thread 4: sending COPY command, use_freeze: 0
0.00: thread 5: sending COPY command, use_freeze: 0
0.00: thread 7: sending COPY command, use_freeze: 0
0.00: thread 8: sending COPY command, use_freeze: 0
6.19: thread 0: COPY done!
6.27: thread 9: COPY started for rows between 45000000 and 50000000
6.27: thread 1: COPY started for rows between 5000000 and 10000000
6.27: thread 5: COPY started for rows between 25000000 and 30000000
6.27: thread 2: COPY started for rows between 10000000 and 15000000
6.27: thread 6: COPY started for rows between 30000000 and 35000000
6.27: thread 3: COPY started for rows between 15000000 and 20000000
6.27: thread 8: COPY started for rows between 40000000 and 45000000
6.27: thread 4: COPY started for rows between 20000000 and 25000000
6.27: thread 7: COPY started for rows between 35000000 and 40000000
19.19: thread 1: COPY done!
19.21: thread 9: COPY done!
19.26: thread 6: COPY done!
19.27: thread 7: COPY done!
19.28: thread 3: COPY done!
19.28: thread 5: COPY done!
19.28: thread 4: COPY done!
19.29: thread 8: COPY done!
19.36: thread 2: COPY done!
vacuuming...
creating primary keys...
done in 71.58 s (drop tables 0.07 s, create tables 0.01 s, client-side
generate 19.41 s, vacuum 26.50 s, primary keys 25.59 s).
The straightforward fix is to commit the TRUNCATE transaction, and not
use FREEZE on any of the COPY commands.
This all makes more sense in the partitioned case. Perhaps we should
parallelize only when partitioned are used, and use only one thread per
partition.
- Heikki
view thread (11+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: parallel data loading for pgbench -i
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox