Re: Speed up COPY FROM text/CSV parsing using SIMD

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Manni Wood <[email protected]>
To: Nazir Bilal Yavuz <[email protected]>
Cc: KAZAR Ayoub <[email protected]>
Cc: Nathan Bossart <[email protected]>
Cc: Andrew Dunstan <[email protected]>
Cc: Neil Conway <[email protected]>
Cc: Shinya Kato <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: Mon, 9 Mar 2026 08:31:39 -0500
Message-ID: <CAKWEB6pWU+2mUa41t0Tb+3XmKyDuGSS=8XaDirumjSc9=d8WJQ@mail.gmail.com> (raw)
In-Reply-To: <CAN55FZ1pM3PgKfAuTT5YmG4enCoh93bOjxNi95nqKgwbHmh3hg@mail.gmail.com>
References: <aZ3kYQnF9_u6sUQp@nathan>
	<CAN55FZ3+NYF1TkKyNtpRQuLiaauSYk9G5tA+fpruOA4-14Y_ZA@mail.gmail.com>
	<aaXrGSyq4u2d9qEC@nathan>
	<CAN55FZ2DNaKCK3Kf_kHizb2pAbQvULeDYtzaiz97B_xz7YbrkQ@mail.gmail.com>
	<[email protected]>
	<CAKWEB6r0CrN-a2P=2ey3EK7p1MxsbQx2C8=hpNGfxLxnRaX66Q@mail.gmail.com>
	<CAN55FZ1BYbdt97RLkd-vAVZ5EKiX=KLkhJ_S-4vgvNJRHvQZ6w@mail.gmail.com>
	<CAKWEB6pmXvv3sbQBNLAR_B=8wzX8rn4VFsJ4WwmNbuJ2etSwDQ@mail.gmail.com>
	<CAN55FZ25HV0FNziFR8xBpQpHkNPS48w8yEXCc91W+=0c0jr+KA@mail.gmail.com>
	<CAKWEB6rnMKYGSt=t9=pL2kKUefsAdzKfjtwdqW_acv1+vMTKVA@mail.gmail.com>
	<aatfiTMLvLok2cUc@nathan>
	<CA+K2RunT=1P7_E4jrZMogDiPunNCOGw9P_UHLaJRJk=5_odKmA@mail.gmail.com>
	<CAN55FZ1k2sOZn1qkDDVTX8Z1t99ZUkywBhdp_43euDHaAi5bRA@mail.gmail.com>
	<CAKWEB6pJ-5b7QUmVtG12hC0bQ82OvDv4XsidAcnngN36q28qTQ@mail.gmail.com>
	<CAN55FZ1pM3PgKfAuTT5YmG4enCoh93bOjxNi95nqKgwbHmh3hg@mail.gmail.com>

On Mon, Mar 9, 2026 at 3:10 AM Nazir Bilal Yavuz <[email protected]> wrote:

> Hi,
>
> On Sun, 8 Mar 2026 at 22:45, Manni Wood <[email protected]>
> wrote:
> >
> > As requested, here are some numbers based on the latest master but with
> the copy code inlining excised (`git revert
> dc592a41557b072178f1798700bf9c69cd8e4235`), compared to master with copy
> code inlining left in place and the v11 patch applied.
> > Both results have lz4 compression in place.
>
> Thank you for the benchmark!
>
> > I have not run numbers without lz4. I assume I could use the two
> postgres instances that I have compiled with lz4, but just set
> `default_toast_compression = pglz` in postgesql.conf for both instances.
> Let me know if that is a mistaken assumption on my part.
>
> I am a bit confused. Are you asking that for the current benchmark you
> shared or future benchmarks? I assume your current benchmark has
> 'default_toast_compression = lz4' because your benchmark results are
> very similar to my benchmark with 'default_toast_compression = lz4'
> but I just wanted to make sure.
>
> What you said about editing postgresql.conf is correct but you need to
> make this change before creating the Postgres instance with 'pg_ctl
> ... start' command, otherwise it won't have an effect and you need to
> restart the instance to see the effect. Also, If you want to benchmark
> without lz4 change, you can just use the "SET
> default_toast_compression to 'pglz';" command in psql, then you don't
> need to edit postgresql.conf. Please note that this will affect only
> the psql instance you typed the command. To make things easier, you
> can run the 'SHOW default_toast_compression;' command to see the
> current value of 'default_toast_compression'.
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft
>

Hello, Nazir!

I was being too brief.

The benchmarks I shared were absolutely with lz4 compiled in
and 'default_toast_compression = lz4' set in postgresql.conf for every
postgres instance I tested with. (Furthermore, I ran `show
default_toast_compression` via `psql` on each postgres instance to be
sure 'default_toast_compression = lz4' was really set!)

Also, all were compiled using meson using `debugoptimized` which results in
`-g -O2`.

So those are the benchmarks that I shared.

OK, so my final question, hopefully clarified: If I run additional
benchmarks where pglz is used for default_toast_compression, is it enough
to use the instances I have already compiled with lz4 in them, but
with 'default_toast_compression = pglz` explicitly set in postgresql.conf
in a brand new data dir created by initdb? (In other words, existing data
dir deleted, then initdb run to make a new data dir, then postgresql.conf
edited to ensure 'default_toast_compression = pglz` explicitly set, then
and only then starting up the cluster for the first time... and finally
verifying via `show default_toast_compression` for good measure.)

Or should I re-compile with the lz4-is-now-the-default commit completely
excised?

Thanks so much!

-Manni

--
-- Manni Wood EDB: https://www.enterprisedb.com

view thread (114+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
  In-Reply-To: <CAKWEB6pWU+2mUa41t0Tb+3XmKyDuGSS=8XaDirumjSc9=d8WJQ@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox