public inbox for [email protected]  
help / color / mirror / Atom feed
From: Nazir Bilal Yavuz <[email protected]>
To: Nathan Bossart <[email protected]>
Cc: Andrew Dunstan <[email protected]>
Cc: Shinya Kato <[email protected]>
Cc: Manni Wood <[email protected]>
Cc: KAZAR Ayoub <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: Thu, 20 Nov 2025 15:55:43 +0300
Message-ID: <CAN55FZ0e_L_O2O5W4E39vap1rz=OJjVqT7w--7gYeHpHK0a2aQ@mail.gmail.com> (raw)
In-Reply-To: <aR4wDwNdLc5TmcQq@nathan>
References: <aPkvi5P7kpA8oQKc@nathan>
	<[email protected]>
	<CAKWEB6qdyhN3EoUNAK23etXX-kXH-_79NNbTsKqtF1g1WkuaBQ@mail.gmail.com>
	<CA+K2RumMC+avYGSX-AWNeod3w+XOGHrVPz8HiqkvJj7AZ5tZXA@mail.gmail.com>
	<CAKWEB6pev=pNVi4qDYWS50N=YFrKRbjH1h=5F1bXpnK7WR5CYg@mail.gmail.com>
	<aRue0D4QQkUf2B_N@nathan>
	<CAOzEurTHCGL-Txqf5rxMsPgTF=dTCOsr=uhJdXebqjEJy-0L7g@mail.gmail.com>
	<CAN55FZ0+JZvKYVCnJqLhHaWF9eBGmTaF1BCEpttxw1aT3G_+Qw@mail.gmail.com>
	<[email protected]>
	<CAN55FZ1XF=R7F7B__gq04rp2nQnJqs1yfExEXo4riWc68+Pe0w@mail.gmail.com>
	<aR4wDwNdLc5TmcQq@nathan>

Hi,

Thank you for looking into this!

On Thu, 20 Nov 2025 at 00:01, Nathan Bossart <[email protected]> wrote:
>
> On Tue, Nov 18, 2025 at 05:20:05PM +0300, Nazir Bilal Yavuz wrote:
> > Thanks, done.
>
> I took a look at the v3 patches.  Here are my high-level thoughts:
>
> +    /*
> +     * Parse data and transfer into line_buf. To get benefit from inlining,
> +     * call CopyReadLineText() with the constant boolean variables.
> +     */
> +    if (cstate->simd_continue)
> +        result = CopyReadLineText(cstate, is_csv, true);
> +    else
> +        result = CopyReadLineText(cstate, is_csv, false);
>
> I'm curious whether this actually generates different code, and if it does,
> if it's actually faster.  We're already branching on cstate->simd_continue
> here.

I had the same doubts before but my benchmark shows nice speedup. I
used a test which is full of delimiters. The current code gives 2700
ms but when I changed these lines with the 'result =
CopyReadLineText(cstate, is_csv, cstate->simd_continue);', the result
was 2920 ms. I compiled code with both -O3 and -O2 and the results
were similar.

>
> +            /* Load a chunk of data into a vector register */
> +            vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
>
> In other places, processing 2 or 4 vectors of data at a time has proven
> faster.  Have you tried that here?

Sorry, I could not find the related code piece. I only saw the
vector8_load() inside of hex_decode_safe() function and its comment
says:

/*
 * We must process 2 vectors at a time since the output will be half the
 * length of the input.
 */

But this does not mention any speedup from using 2 vectors at a time.
Could you please show the related code?

>
> +            /* \n and \r are not special inside quotes */
> +            if (!in_quote)
> +                match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
> +
> +            if (is_csv)
> +            {
> +                match = vector8_or(match, vector8_eq(chunk, quote));
> +                if (escapec != '\0')
> +                    match = vector8_or(match, vector8_eq(chunk, escape));
> +            }
> +            else
> +                match = vector8_or(match, vector8_eq(chunk, bs));
>
> The amount of branching here catches my eye.  Some branching might be
> unavoidable, but in general we want to keep these SIMD paths as branch-free
> as possible.

You are right, I will check these branches and will try to remove as
many branches as possible.

>
> +                /*
> +                 * Found a special character. Advance up to that point and let
> +                 * the scalar code handle it.
> +                 */
> +                int         advance = pg_rightmost_one_pos32(mask);
> +
> +                input_buf_ptr += advance;
> +                simd_total_advance += advance;
>
> Do we actually need to advance here?  Or could we just fall through to the
> scalar path?  My suspicion is that this extra code doesn't gain us much.

My testing shows that if we advance more than ~5 characters then SIMD
is worth it, but if we advance less than ~5; then code causes a
regression. I used this information while writing a heuristic.

>
> +            if (simd_last_sleep_cycle == 0)
> +                simd_last_sleep_cycle = 1;
> +            else if (simd_last_sleep_cycle >= SIMD_SLEEP_MAX / 2)
> +                simd_last_sleep_cycle = SIMD_SLEEP_MAX;
> +            else
> +                simd_last_sleep_cycle <<= 1;
> +            cstate->simd_current_sleep_cycle = simd_last_sleep_cycle;
> +            cstate->simd_last_sleep_cycle = simd_last_sleep_cycle;
>
> IMHO we should be looking for ways to simplify this should-we-use-SIMD
> code.  For example, perhaps we could just disable the SIMD path for 10K or
> 100K lines any time a special character is found.  I'm dubious that a lot
> of complexity is warranted.

I think this is a bit too harsh since SIMD is still worth it if SIMD
can advance more than ~5 character average. I am trying to use SIMD as
much as possible when it is worth it but what you said can remove the
regression completely, perhaps that is the correct way.

--
Regards,
Nazir Bilal Yavuz
Microsoft





view thread (99+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
  In-Reply-To: <CAN55FZ0e_L_O2O5W4E39vap1rz=OJjVqT7w--7gYeHpHK0a2aQ@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox