From: Manni Wood <[email protected]>
To: KAZAR Ayoub <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Cc: Mark Wong <[email protected]>
Cc: Nathan Bossart <[email protected]>
Cc: Andrew Dunstan <[email protected]>
Cc: Shinya Kato <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: Mon, 29 Dec 2025 11:03:17 -0600
Message-ID: <CAKWEB6qa4V+aU5-S_Eq=J2o09xp=3e-iLFVqimB0Zu6iq3GKdw@mail.gmail.com>
In-Reply-To: <CA+K2RumOaH-daBGN6uTo6+_0XSg7HQ10Na8OzScCV5j6eKkFgA@mail.gmail.com>
References: <CAOzEurTHCGL-Txqf5rxMsPgTF=dTCOsr=uhJdXebqjEJy-0L7g@mail.gmail.com>
<CAN55FZ0+JZvKYVCnJqLhHaWF9eBGmTaF1BCEpttxw1aT3G_+Qw@mail.gmail.com>
<[email protected]>
<CAN55FZ1XF=R7F7B__gq04rp2nQnJqs1yfExEXo4riWc68+Pe0w@mail.gmail.com>
<aR4wDwNdLc5TmcQq@nathan>
<CA+K2Rump8NoMRZRZ2r4jHXUJwByasy_c3_b0oaO+TLkSbMD-jw@mail.gmail.com>
<CAKWEB6rLxPVtN4ffZ3CMTL518zhk_BWzzBt6ZE2oUSaErdphxA@mail.gmail.com>
<CAKWEB6oO4gQd+UJBrU=uuUTE8Hv7GMznjMouvn0Lskr52UqjhQ@mail.gmail.com>
<CAN55FZ0Nd9FL=aDSjOTJTeFAn8VNrZgWG+WbcHR+R7GkDMvUyw@mail.gmail.com>
<CAN55FZ1fwKgGo2wEie1w2M2jzJko6cMi1NWD05Xm47_L9a3D+g@mail.gmail.com>
<aTx-LDyiHV-7wfOP@ltdrgnflg2>
<CAKWEB6r=axZsG-s7zyWURZ-s9-s1dTV9ohkZXO0ynfLEU5ha3Q@mail.gmail.com>
<CAN55FZ2DE2XSrFUhsOqbpBo+BtzTwsJWOD0MffvdGnHtbsPRuw@mail.gmail.com>
<CA+K2RumOaH-daBGN6uTo6+_0XSg7HQ10Na8OzScCV5j6eKkFgA@mail.gmail.com>
On Wed, Dec 24, 2025 at 9:08 AM KAZAR Ayoub <[email protected]> wrote:
> Hello,
> Following the same path of optimizing COPY FROM using SIMD, I found that
> COPY TO can also benefit from this.
>
> I attached a small patch that uses SIMD to skip data and advance until
> the first special character is found, then falls back to scalar processing
> for that character and re-enters the SIMD path again...
> There are two ways to do this:
> 1) Essentially we do SIMD until we find a special character, then continue
> scalar path without re-entering SIMD again.
> - This gives speedups of 10% to 30%, depending on the weight of special
> characters in the attribute. We don't lose anything here, since it advances
> with SIMD until it can't (using the previous scripts: 1/3, 2/3 special
> chars).
>
> 2) Do SIMD path, then use scalar path when we hit a special character,
> keep re-entering the SIMD path each time.
> - This is equivalent to the COPY FROM story: we'll need to find the same
> heuristic to use for both COPY FROM/TO to reduce the regressions (same
> regressions: around 20% to 30% with 1/3, 2/3 special chars).
>
> Something else to note is that the scalar path for COPY TO isn't as heavy
> as the state machine in COPY FROM.
>
> So if we find the sweet spot for the heuristic, doing the same for COPY TO
> will be trivial and always beneficial.
> Attached is 0004 which is option 1 (SIMD without re-entering), 0005 is the
> second one.
>
>
> Regards,
> Ayoub
>
Hello, Nazir and Ayoub!
Nazir, sorry for the late reply, I am on holiday. :-) I wanted to thank you
for the tips on using cpupower to get less variance in my test results.
Ayoub, I suppose it was inevitable the SIMD patch would work for copying
out as well as copying in!
I am back at work on 5 Jan 2026, so I will try to carve out time to test
this then, using Nazir's tips.
Happy Holidays!
-Manni
--
-- Manni Wood EDB: https://www.enterprisedb.com