From: Manni Wood <[email protected]>
To: KAZAR Ayoub <[email protected]>
Cc: Neil Conway <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Cc: Nathan Bossart <[email protected]>
Cc: Andrew Dunstan <[email protected]>
Cc: Shinya Kato <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: Mon, 2 Feb 2026 08:17:55 -0600
Message-ID: <CAKWEB6p5+3VL4s61=zD4UBFp4ybNo1NrBnBw+avXsxgjBREqew@mail.gmail.com>
In-Reply-To: <CA+K2Ru=C_woAnd-3-pGHoNSTR8FOf=7eeSWE1xaLt9ojVWndVg@mail.gmail.com>
References: <aPkvi5P7kpA8oQKc@nathan>
<[email protected]>
<CAKWEB6qdyhN3EoUNAK23etXX-kXH-_79NNbTsKqtF1g1WkuaBQ@mail.gmail.com>
<CA+K2RumMC+avYGSX-AWNeod3w+XOGHrVPz8HiqkvJj7AZ5tZXA@mail.gmail.com>
<CAKWEB6pev=pNVi4qDYWS50N=YFrKRbjH1h=5F1bXpnK7WR5CYg@mail.gmail.com>
<aRue0D4QQkUf2B_N@nathan>
<CAOzEurTHCGL-Txqf5rxMsPgTF=dTCOsr=uhJdXebqjEJy-0L7g@mail.gmail.com>
<CAN55FZ0+JZvKYVCnJqLhHaWF9eBGmTaF1BCEpttxw1aT3G_+Qw@mail.gmail.com>
<[email protected]>
<CAN55FZ1XF=R7F7B__gq04rp2nQnJqs1yfExEXo4riWc68+Pe0w@mail.gmail.com>
<aR4wDwNdLc5TmcQq@nathan>
<CA+K2Rump8NoMRZRZ2r4jHXUJwByasy_c3_b0oaO+TLkSbMD-jw@mail.gmail.com>
<CAKWEB6rLxPVtN4ffZ3CMTL518zhk_BWzzBt6ZE2oUSaErdphxA@mail.gmail.com>
<CAKWEB6oO4gQd+UJBrU=uuUTE8Hv7GMznjMouvn0Lskr52UqjhQ@mail.gmail.com>
<CAN55FZ0Nd9FL=aDSjOTJTeFAn8VNrZgWG+WbcHR+R7GkDMvUyw@mail.gmail.com>
<CAN55FZ1fwKgGo2wEie1w2M2jzJko6cMi1NWD05Xm47_L9a3D+g@mail.gmail.com>
<CAKWEB6oZdQhhBV3ojHLBwjQgKzfDw0fkqncurt9oi7vNsq41ww@mail.gmail.com>
<CAN55FZ1p5UyUdTRO7iWR_ukjhJDOnpOR2rYNOq=+hcC45OuahQ@mail.gmail.com>
<CAOW5sYZEx=fPw2wp7y2nK_-ifXFeYW4CTmFx_OQeoHFjG7rbHw@mail.gmail.com>
<CA+K2Ru=C_woAnd-3-pGHoNSTR8FOf=7eeSWE1xaLt9ojVWndVg@mail.gmail.com>
On Sat, Jan 31, 2026 at 10:21 AM KAZAR Ayoub <[email protected]> wrote:
> Hello,
>
> On Wed, Jan 21, 2026 at 9:50 PM Neil Conway <[email protected]> wrote:
>
>> A few suggestions:
>>
>> * I'm curious if we'll see better performance on large inputs if we flush
>> to `line_buf` periodically (e.g., at least every few thousand bytes or so).
>> Otherwise we might see poor data cache behavior if large inputs with no
>> control characters get evicted before we've copied them over. See the
>> approach taken in escape_json_with_len() in utils/adt/json.c
>
> So I gave this a try. Attached is a small patch that has v3 plus the
> suggestion; here are the results with different thresholds for the
> line_buf refill:
>
> Execution time compared to master:
> Workload    v3       v3.1 (2k)  v3.1 (4k)  v3.1 (8k)  v3.1 (16k)  v3.1 (20k)  v3.1 (28k)
> text/none   -16.5%   -17.4%     -14.3%     -12.6%     -13.6%      -10.5%      -16.3%
> text/esc    +5.6%    +11.1%     +3.1%      +7.6%      +3.0%       +4.9%       +4.2%
> csv/none    -31.0%   -29.9%     -26.7%     -30.1%     -27.9%      -30.2%      -29.6%
> csv/quote   +0.2%    -0.6%      -0.4%      -1.0%      +0.1%       +2.5%       -1.0%
>
> L1d cache miss rates:
> Workload    Master   v3       v3.1 (2k)  v3.1 (4k)  v3.1 (8k)  v3.1 (16k)  v3.1 (20k)  v3.1 (28k)
> text/none   0.20%    0.23%    0.21%      0.22%      0.21%      0.21%       0.21%       0.22%
> text/esc    0.21%    0.22%    0.22%      0.22%      0.22%      0.21%       0.22%       0.22%
> csv/none    0.17%    0.22%    0.21%      0.22%      0.21%      0.21%       0.22%       0.22%
> csv/quote   0.18%    0.22%    0.19%      0.20%      0.20%      0.19%       0.20%       0.20%
>
> My laptop has a 32KB L1 cache per core.
> The results are very close. The difference is hard to see in the
> cache-miss numbers, but the execution times tell a different story:
> periodically flushing to line_buf seems worth doing.
> If Manni can rerun the benchmarks with these thresholds too, it would
> be nice to confirm this.
>
>
> Regards,
> Ayoub
>
Hello, All!
Ayoub, I will try to benchmark v3.1 this week on my standalone x86 and arm
PCs. Sadly, other work has been taking priority these last couple weeks,
but I will carve out some time.
Neil, thanks so much for looking at this patch!
-Manni
--
-- Manni Wood EDB: https://www.enterprisedb.com
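
For readers following the thread, here is a minimal sketch of the periodic-flush idea quoted above. It is not the code from the v3.1 patch. The names OutBuf, outbuf_append, copy_until_newline and FLUSH_THRESHOLD are hypothetical, a plain scalar loop stands in for the SIMD scan, and a newline is the only special character considered. OutBuf is a fixed-capacity stand-in for line_buf.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold; the thread tried values from 2k to 28k. */
#define FLUSH_THRESHOLD 4096

/* Hypothetical fixed-capacity stand-in for line_buf. */
typedef struct
{
    char   *data;
    size_t  len;
    size_t  cap;
} OutBuf;

/* Append n bytes to out; returns 0 on success, -1 if they do not fit. */
static int
outbuf_append(OutBuf *out, const char *src, size_t n)
{
    if (n > out->cap - out->len)
        return -1;
    memcpy(out->data + out->len, src, n);
    out->len += n;
    return 0;
}

/*
 * Copy src into out up to (not including) the first '\n', or all of src
 * if there is none.  The pending run is flushed whenever it reaches
 * FLUSH_THRESHOLD bytes, so it is copied while it is likely still in the
 * data cache.  Returns the number of bytes consumed, or (size_t) -1 if
 * out is too small.  *nflushes counts the copies performed.
 */
static size_t
copy_until_newline(const char *src, size_t srclen, OutBuf *out, int *nflushes)
{
    size_t  start = 0;      /* first byte not yet copied */
    size_t  i;

    for (i = 0; i < srclen; i++)
    {
        if (src[i] == '\n')
            break;

        /* Periodic flush: do not let the pending run grow unbounded. */
        if (i + 1 - start >= FLUSH_THRESHOLD)
        {
            if (outbuf_append(out, src + start, i + 1 - start) != 0)
                return (size_t) -1;
            start = i + 1;
            (*nflushes)++;
        }
    }

    /* Final flush of whatever is left before the newline or end of input. */
    if (i > start)
    {
        if (outbuf_append(out, src + start, i - start) != 0)
            return (size_t) -1;
        (*nflushes)++;
    }
    return i;
}
```

With a 4096-byte threshold, a 10000-byte line with no special characters is copied in three pieces (4096, 4096 and 1808 bytes) rather than in a single copy at the end of the scan.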