Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w2oDg-000ckv-1F for pgsql-hackers@arkaria.postgresql.org; Wed, 18 Mar 2026 10:33:44 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w2oDf-009a11-0k for pgsql-hackers@arkaria.postgresql.org; Wed, 18 Mar 2026 10:33:43 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w2oDe-009a0s-2x for pgsql-hackers@lists.postgresql.org; Wed, 18 Mar 2026 10:33:42 +0000 Received: from mail-dy1-x1342.google.com ([2607:f8b0:4864:20::1342]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w2oDb-00000000uij-42HG for pgsql-hackers@lists.postgresql.org; Wed, 18 Mar 2026 10:33:42 +0000 Received: by mail-dy1-x1342.google.com with SMTP id 5a478bee46e88-2bdd40d3c61so6125930eec.1 for ; Wed, 18 Mar 2026 03:33:40 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1773830018; cv=none; d=google.com; s=arc-20240605; b=VD8b4fPu49Fb1vQQKYkyfweyPzChmTu1kf/LkCNoF4N4ul2l+0N4IMg8NPuAFqhl7I dDvbaMGW40EKw2cK44ESk/o4mvJtveL8Lk5p3yMpiHtbzUIIn/IB56rurFGcEgbdkjdt BuoHlgpOuJOrNe8OAEeJLOpZWzf+HdoRncfMEJ3F/CNVJNrMPaF9CRxsfpDdwZthkLIM oAKlAyaH49gDOxF8eMfs8tbDUyrstmjnRAAI3PCQuu6hyYwz5+shSpY6hhLvftMY4qce 4oC/gf1gaABttmJHgIpdMugFLgB3r+Mb86zoKX+eiYHEa3goQzF7d8RiM755drgYpo2t 9FKA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=/vE11WTt1s6Gz8vI5BI4ePsAFR3hi/mzyUm8enq/mMU=; fh=KLtCwyI24/nGf1vnS7F+ykpLpu+7AQ9N9yczsoH2uxQ=; b=NjhwfzWdGWHB1paEOpTkHoZC1rLBadfSy5wt72vIGIi8sgD+C41aeCwgwjkFkPHk8r zSUGmdfR6k1cIDI2rTuvHGdW+Yis0dcTRm9uguGLC43Zq2sWJrmXLq8Eiy1dngO+kETx Tn3vYCR4rTni4FycJZifoq4VFO1DHJ3uGV37XKmGX1MTIkHIfky43JXx6yOdNWhax2pD 0fN90Vc6mBaNsxo8XH658p1oWXk4ygT9eWf3+xehIIc7t6SuClysvcmzDAGSeuY3canC 9nMl5X3NbO5Yz76092r1jU/29PmJm8zXAYG7L3Je4mfPEDWpsCX01RKVjbMdQp1yTYko 9eaw==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1773830018; x=1774434818; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=/vE11WTt1s6Gz8vI5BI4ePsAFR3hi/mzyUm8enq/mMU=; b=BB0x0o+t1mrr8ymYXAWu8A6UFV4vPM7P9yGH3WX/e0y5NcAZ49bAuyZW1hG/wL/Hg7 qN6tGP0fz56J7Wb9xOM8QtZgna5bhjtHm9wJ1blNK0BeREGUMj619B3RcK6ZQBtk/s5M vCpvYsP0KDb9ZL9X66YGIvSGDxZ1DN6j0S0P6u40g9Aa1eMJeMoo6HKavOaozFz4Hs1C DOntXazs8hPM66M9znMiwmFB87nTegvJAaVotXnDUG158f9feyAcyFUezpsN516JpboB B92haM8ADhAd0tYb9uEJ1kHEEiYSpE0/AQPiU1R/74CGHSvsqWuOxtu/ha0Yls5RkWs1 JtSQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1773830018; x=1774434818; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=/vE11WTt1s6Gz8vI5BI4ePsAFR3hi/mzyUm8enq/mMU=; b=GV67LmSyb7MA9uor96kbU0VPpCg1AT7ZMDb6uyQpT8smC3UFq86y2gh1qGqggsFt/Z tleE9y/CGhxu+DQeXK3ITH7oUzZsVrY0pRozJKmmnU68bLBAw+HN9mtAonRJXRSoLyrR j9BbHB5rZXrOdaYFEKfTVyBxpVQfkJoAALq8PQ3Yl5q0qdPO0WpXdUfIHEhqhH2pgmv1 vcYn9W9MRrPDAqMsfVD5EM0+RI/YRPxShKWpy9JguWSrrvKneTHJnughcwUAwT97/zZI ++om4cl2lvH1/S4BBtHPzS1MXS+pLlTeWPS9gj6Qp5igAYRp+/cZosIAF7eN4q/Nr6rP AiQQ== X-Forwarded-Encrypted: i=1; AJvYcCUnqM4LWSdisXsB6mQ/AemAC6Ay84zLBXKpOhCh7IVbXnMssrbMhMhK+tG5KVcEAVVwdDOGBOkPJqfJXyiB@lists.postgresql.org X-Gm-Message-State: AOJu0Yz+GH5mFonydapIHZDNQw7nnxHnwj2E4JBWiH0nkexHrNwCQPjT 3jIa5p8cNc/OQ/p8skeL42F2xEMDOPfu/LhfIs1coVDrfSMtoE4KxH9Ytn3FgmPQT/PAIOPu1ly 0ZMgSZXRMjROtJn2MYkwklqFzDuJHdGw= X-Gm-Gg: ATEYQzyX3xz8i80GXIvdmZfJOt/HJzcsDuqPo8dtDAEFzZ9N8Plw101CH2yN8m5AtXY YAdwr1MxuhIxYsc+LOngiSM4s3mf87mKJJ2R5J5JC4gyKO4xdUnApumngflUYVVs29Y02VTGvhB 2z/+1kEUbH4tVsGyD3+h3lMtNYd89/zG0NU2brGsSAP0f0l52Et1Jn/gTEZaNVleNGVlyN87kxH XKqgD6GHayr9WYNVQciq1n21ylE5sN7UN6PjoDIf7Ku+dx8fzqu1zfQ+wVLoPRYU0R9lC8uGefz d4KE7yQ= X-Received: by 2002:a05:7300:fb97:b0:2c0:e209:d3a3 with SMTP id 5a478bee46e88-2c0e50629cdmr1390778eec.22.1773830017997; Wed, 18 Mar 2026 03:33:37 -0700 (PDT) MIME-Version: 1.0 References: <4c1d0b97-a5f8-472c-afdd-bdeb09b93f33@gmail.com> <10868918-cdf9-49dc-99af-8e8ccd6e368c@gmail.com> In-Reply-To: <10868918-cdf9-49dc-99af-8e8ccd6e368c@gmail.com> From: lakshmi Date: Wed, 18 Mar 2026 16:07:01 +0530 X-Gm-Features: AaiRm50rHDp5tvFPaJN-dCD8q0LxiX9CcSXTNXRltSe1asKpRICCNELIP_QnPdU Message-ID: Subject: Re: parallel data loading for pgbench -i To: Mircea Cadariu Cc: "Hayato Kuroda (Fujitsu)" , PostgreSQL Hackers , "tomas@vondra.me" Content-Type: multipart/alternative; boundary="0000000000001d5dd2064d49fc32" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000001d5dd2064d49fc32 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi Mircea, Hayato, Thanks for the updated v2 patches. I applied 0001 and 0002 on 19devel and ran some tests. The results look consistent. For scale 100, parallel loading speeds up data generation, but in the non-partitioned case, the VACUUM phase becomes noticeably slower. In contrast, the partitioned + parallel case performs best overall with much lower vacuum cost. For scale 500, I see the same pattern: non-partitioned parallel runs are dominated by VACUUM time, while the partitioned setup shows a clear overall speedup. I also verified correctness, and row counts match expected values. So overall, the benefit of parallel loading is much clearer in the partitioned case. I=E2=80=99ll try to look further into the VACUUM behavior. Thanks again for the work on this. Best regards, Lakshmi On Fri, Mar 13, 2026 at 11:59=E2=80=AFPM Mircea Cadariu wrote: > Hi Lakshmi, Hayato, > > > Thanks a lot for your input! > > I'm not sure why the VACUUM phase takes longer compared to the serial > run. We can potentially get a clue with a profiler. I know there is an > ongoing effort to introduce parallel heap vacuum [1] which I expect will > help with this. > > The code comments you have provided me have been applied to the v2 patch > attached. Below I provide answers to the questions. > > > Also, why is -j accepted in case of non-partitions? > For non-partitioned tables, each worker loads a separate range of rows > via its own connection in parallel. > > > Copying seems to be divided into chunks per COPY_BATCH_SIZE. Is it real= ly > > essential to parallelize the initialization? I feel it may optimize eve= n > > serialized case thus can be discussed independently. > You're right that the COPY batching is an optimization that's > independent. I wanted to see how fast I can get this patch, so I looked > for bottlenecks in the new code with a profiler and this was one of > them. I agree it makes sense to apply this for the serialised case > separately. > > > Per my understanding, each thread creates its tables, and all of them a= re > > attached to the parent table. Is it right? I think it needs more code > > changes, and I am not sure it is critical to make initialization faster= . > Yes, that's correct. Each worker creates its assigned partitions as > standalone tables, loads data into them, and then the main thread > attaches them all to the parent after loading completes. It's to avoid > AccessExclusiveLock contention on the parent table during parallel > loading and allow each worker to use COPY FREEZE on its standalone table. > > > So I suggest using the incremental approach. The first patch only > > parallelizes > > the data load, and the second patch implements the CREATE TABLE and > > ALTER TABLE > > ATTACH PARTITION. You can benchmark three patterns, master, 0001, and > > 0001 + 0002, then compare the results. IIUC, this is the common > > approach to > > reduce the patch size and make them more reviewable. > > Thanks for the recommendation, I extracted 0001 and 0002 as per your > suggestion. I will see if I can split it more, as indeed it helps with > the review. > > Results are similar with the previous runs. > > master > > pgbench -i -s 100 -j 10 > done in 20.95 s (drop tables 0.00 s, create tables 0.01 s, client-side > generate 14.51 s, vacuum 0.27 s, primary keys 6.16 s). > > pgbench -i -s 100 -j 10 --partitions=3D10 > done in 29.73 s (drop tables 0.00 s, create tables 0.02 s, client-side > generate 16.33 s, vacuum 8.72 s, primary keys 4.67 s). > > > 0001 > pgbench -i -s 100 -j 10 > done in 18.75 s (drop tables 0.00 s, create tables 0.01 s, client-side > generate 6.51 s, vacuum 5.73 s, primary keys 6.50 s). > > pgbench -i -s 100 -j 10 --partitions=3D10 > done in 29.33 s (drop tables 0.00 s, create tables 0.02 s, client-side > generate 16.48 s, vacuum 7.59 s, primary keys 5.24 s). > > 0002 > pgbench -i -s 100 -j 10 > done in 18.12 s (drop tables 0.00 s, create tables 0.01 s, client-side > generate 6.64 s, vacuum 5.81 s, primary keys 5.65 s). > > pgbench -i -s 100 -j 10 --partitions=3D10 > done in 14.38 s (drop tables 0.00 s, create tables 0.01 s, client-side > generate 7.97 s, vacuum 1.55 s, primary keys 4.85 s). > > > Looking forward to your feedback. > > [1]: > > https://www.postgresql.org/message-id/CAD21AoAEfCNv-GgaDheDJ%2Bs-p_Lv1H24= AiJeNoPGCmZNSwL1YA%40mail.gmail.com > > -- > Thanks, > Mircea Cadariu > --0000000000001d5dd2064d49fc32 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

Hi Mircea, Hayato,

Thanks for the updated v2 patches.

I applied 0001 and 0002 on 19devel and ran some t= ests. The results look consistent.

For = scale 100, parallel loading speeds up data generation, but in the non-parti= tioned case, the VACUUM phase becomes noticeably slower. In contrast, the p= artitioned + parallel case performs best overall with much lower vacuum cos= t.

For scale 500, I see the same patter= n: non-partitioned parallel runs are dominated by VACUUM time, while the pa= rtitioned setup shows a clear overall speedup.

I also verified correctness, and row counts match expected values.<= /p>

So overall, the benefit of parallel loa= ding is much clearer in the partitioned case.

I=E2=80=99ll try to look further into the VACUUM behavior.

Thanks again for the work on this.

Best re= gards,
Lakshmi


On Fri, Mar 13, 2026 at 11:59=E2= =80=AFPM Mircea Cadariu <cad= ariu.mircea@gmail.com> wrote:
Hi Lakshmi, Hayato,


Thanks a lot for your input!

I'm not sure why the VACUUM phase takes longer compared to the serial <= br> run. We can potentially get a clue with a profiler. I know there is an
ongoing effort to introduce parallel heap vacuum [1] which I expect will help with this.

The code comments you have provided me have been applied to the v2 patch attached. Below I provide answers to the questions.

> Also, why is -j accepted in case of non-partitions?
For non-partitioned tables, each worker loads a separate range of rows
via its own connection in parallel.

> Copying seems to be divided into chunks per COPY_BATCH_SIZE. Is it rea= lly
> essential to parallelize the initialization? I feel it may optimize ev= en
> serialized case thus can be discussed independently.
You're right that the COPY batching is an optimization that's
independent. I wanted to see how fast I can get this patch, so I looked for bottlenecks in the new code with a profiler and this was one of
them. I agree it makes sense to apply this for the serialised case
separately.

> Per my understanding, each thread creates its tables, and all of them = are
> attached to the parent table. Is it right? I think it needs more code<= br> > changes, and I am not sure it is critical to make initialization faste= r.
Yes, that's correct. Each worker creates its assigned partitions as standalone tables, loads data into them, and then the main thread
attaches them all to the parent after loading completes. It's to avoid =
AccessExclusiveLock contention on the parent table during parallel
loading and allow each worker to use COPY FREEZE on its standalone table.
> So I suggest using the incremental approach. The first patch only
> parallelizes
> the data load, and the second patch implements the CREATE TABLE and > ALTER TABLE
> ATTACH PARTITION. You can benchmark three patterns, master, 0001, and<= br> > 0001 + 0002, then compare the results. IIUC, this is the common
> approach to
> reduce the patch size and make them more reviewable.

Thanks for the recommendation, I extracted 0001 and 0002 as per your
suggestion. I will see if I can split it more, as indeed it helps with
the review.

Results are similar with the previous runs.

master

pgbench -i -s 100 -j 10
done in 20.95 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 14.51 s, vacuum 0.27 s, primary keys 6.16 s).

pgbench -i -s 100 -j 10 --partitions=3D10
done in 29.73 s (drop tables 0.00 s, create tables 0.02 s, client-side
generate 16.33 s, vacuum 8.72 s, primary keys 4.67 s).


0001
pgbench -i -s 100 -j 10
done in 18.75 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 6.51 s, vacuum 5.73 s, primary keys 6.50 s).

pgbench -i -s 100 -j 10 --partitions=3D10
done in 29.33 s (drop tables 0.00 s, create tables 0.02 s, client-side
generate 16.48 s, vacuum 7.59 s, primary keys 5.24 s).

0002
pgbench -i -s 100 -j 10
done in 18.12 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 6.64 s, vacuum 5.81 s, primary keys 5.65 s).

pgbench -i -s 100 -j 10 --partitions=3D10
done in 14.38 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 7.97 s, vacuum 1.55 s, primary keys 4.85 s).


Looking forward to your feedback.

[1]:
https://www.postgresql.org/message-id/CAD21AoAEfCNv-GgaDheDJ%2Bs-p_= Lv1H24AiJeNoPGCmZNSwL1YA%40mail.gmail.com

--
Thanks,
Mircea Cadariu
--0000000000001d5dd2064d49fc32--