Re: Speed up COPY FROM text/CSV parsing using SIMD

From: Manni Wood <[email protected]>
To: KAZAR Ayoub <[email protected]>
Cc: Nathan Bossart <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Cc: Andrew Dunstan <[email protected]>
Cc: Shinya Kato <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: Wed, 26 Nov 2025 08:21:46 -0600
Message-ID: <CAKWEB6rLxPVtN4ffZ3CMTL518zhk_BWzzBt6ZE2oUSaErdphxA@mail.gmail.com>
In-Reply-To: <CA+K2Rump8NoMRZRZ2r4jHXUJwByasy_c3_b0oaO+TLkSbMD-jw@mail.gmail.com>
References: <aPkvi5P7kpA8oQKc@nathan>
	<[email protected]>
	<CAKWEB6qdyhN3EoUNAK23etXX-kXH-_79NNbTsKqtF1g1WkuaBQ@mail.gmail.com>
	<CA+K2RumMC+avYGSX-AWNeod3w+XOGHrVPz8HiqkvJj7AZ5tZXA@mail.gmail.com>
	<CAKWEB6pev=pNVi4qDYWS50N=YFrKRbjH1h=5F1bXpnK7WR5CYg@mail.gmail.com>
	<aRue0D4QQkUf2B_N@nathan>
	<CAOzEurTHCGL-Txqf5rxMsPgTF=dTCOsr=uhJdXebqjEJy-0L7g@mail.gmail.com>
	<CAN55FZ0+JZvKYVCnJqLhHaWF9eBGmTaF1BCEpttxw1aT3G_+Qw@mail.gmail.com>
	<[email protected]>
	<CAN55FZ1XF=R7F7B__gq04rp2nQnJqs1yfExEXo4riWc68+Pe0w@mail.gmail.com>
	<aR4wDwNdLc5TmcQq@nathan>
	<CA+K2Rump8NoMRZRZ2r4jHXUJwByasy_c3_b0oaO+TLkSbMD-jw@mail.gmail.com>

On Wed, Nov 26, 2025 at 5:51 AM KAZAR Ayoub <[email protected]> wrote:

> Hello,
> On Wed, Nov 19, 2025 at 10:01 PM Nathan Bossart <[email protected]> wrote:
>
>> On Tue, Nov 18, 2025 at 05:20:05PM +0300, Nazir Bilal Yavuz wrote:
>> > Thanks, done.
>>
>> I took a look at the v3 patches.  Here are my high-level thoughts:
>>
>> +    /*
>> +     * Parse data and transfer into line_buf. To get benefit from inlining,
>> +     * call CopyReadLineText() with the constant boolean variables.
>> +     */
>> +    if (cstate->simd_continue)
>> +        result = CopyReadLineText(cstate, is_csv, true);
>> +    else
>> +        result = CopyReadLineText(cstate, is_csv, false);
>>
>> I'm curious whether this actually generates different code, and if it
>> does, if it's actually faster.  We're already branching on
>> cstate->simd_continue here.
>
> I've compiled both versions with -O2 and confirmed they generate different
> code. When simd_continue is passed as a constant to CopyReadLineText, the
> compiler optimizes out the condition checks from the SIMD path.
> A small benchmark on a 1GB+ file shows the expected benefit: roughly a
> 6% performance improvement.
> I've attached the assembly outputs in case someone wants to check
> something else.
>
>
> Regards,
> Ayoub Kazar
>
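
For readers following the inlining question quoted above, here is a minimal
sketch of the general technique, not the patch itself. scan_line() and
scan_line_dispatch() are hypothetical names, and the 8-byte chunk loop is
only a stand-in for real SIMD code. The point is that each call site passes
a literal true or false, so once the callee is inlined the compiler can fold
the if (use_fast_path) test away in each copy.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a line scanner; this is NOT PostgreSQL's
 * CopyReadLineText(). Returns the index of the first '\n' in buf[0..len),
 * or len if there is none. When use_fast_path is a compile-time constant at
 * the call site and the function is inlined, the compiler can drop the
 * branch that is never taken.
 */
static inline size_t
scan_line(const char *buf, size_t len, bool use_fast_path)
{
    size_t i = 0;

    if (use_fast_path)
    {
        /* Stand-in for a SIMD loop: skip 8-byte chunks with no newline. */
        while (i + 8 <= len)
        {
            bool found = false;

            for (size_t j = 0; j < 8; j++)
                found |= (buf[i + j] == '\n');
            if (found)
                break;
            i += 8;
        }
    }

    /* Scalar loop: finishes the tail, or does all the work on the slow path. */
    while (i < len && buf[i] != '\n')
        i++;
    return i;
}

/* Hypothetical caller mirroring the shape of the quoted hunk. */
size_t
scan_line_dispatch(const char *buf, size_t len, bool simd_continue)
{
    if (simd_continue)
        return scan_line(buf, len, true);
    else
        return scan_line(buf, len, false);
}
```

Whether the two inlined copies really differ depends on the compiler
choosing to inline, so comparing the generated assembly at -O2 is the way to
confirm it for a given build.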

Correction to my last post:

I also tried files that alternated lines with no special characters and
lines with one-third special characters, thinking I could force the
algorithm to continually re-check whether it should use SIMD and therefore
incur more overhead in the try-simd/don't-try-simd housekeeping code. The
text file was still 20% faster (not 50% faster as I originally stated; that
was a typo). The CSV file was still 13% faster.
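
In case it helps anyone reproduce that kind of input, below is a rough
sketch of how such alternating lines could be generated. It is not the
script used for the numbers above. make_line() is a hypothetical helper, and
which characters count as "special" is an assumption: a backslash for text
format or a quote for CSV are plausible choices, and the caller is
responsible for picking a character and width that keep the file valid COPY
input.

```c
#include <stddef.h>

/*
 * Hypothetical helper: fill out[] with one line of `width` data characters
 * followed by '\n' and a terminating NUL. out must hold width + 2 bytes.
 * Even-numbered lines are all 'a'. On odd-numbered lines, one character in
 * every three is replaced by `special` (an assumed definition of "special").
 */
void
make_line(char *out, size_t width, unsigned long lineno, char special)
{
    for (size_t i = 0; i < width; i++)
    {
        if (lineno % 2 == 1 && i % 3 == 1)
            out[i] = special;
        else
            out[i] = 'a';
    }
    out[width] = '\n';
    out[width + 1] = '\0';
}
```

Calling make_line() in a loop over line numbers and writing each buffer with
fputs() yields a file whose lines alternate between the two shapes.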

Also, apologies for posting at the top in my last e-mail.
-- 
-- Manni Wood EDB: https://www.enterprisedb.com
