public inbox for [email protected]  
help / color / mirror / Atom feed
From: Ants Aasma <[email protected]>
To: Nazir Bilal Yavuz <[email protected]>
Cc: Shinya Kato <[email protected]>
Cc: [email protected]
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: Tue, 19 Aug 2025 12:09:20 +0300
Message-ID: <CANwKhkMnay=xrVNcuw45G+8nMAGkWee9KtFSGussZX8-16+zNg@mail.gmail.com> (raw)
In-Reply-To: <CAN55FZ247JdiT8Sd1SRiyOJxk3Ei=pDCL4kpdP=HqLRjOhKf1Q@mail.gmail.com>
References: <CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig@mail.gmail.com>
	<CAN55FZ247JdiT8Sd1SRiyOJxk3Ei=pDCL4kpdP=HqLRjOhKf1Q@mail.gmail.com>

On Thu, 7 Aug 2025 at 14:15, Nazir Bilal Yavuz <[email protected]> wrote:
> I have a couple of ideas that I was working on:
> ---
>
> +         * However, SIMD optimization cannot be applied in the following cases:
> +         * - Inside quoted fields, where escape sequences and closing quotes
> +         *   require sequential processing to handle correctly.
>
> I think you can continue SIMD inside quoted fields. Only important
> thing is you need to set last_was_esc to false when SIMD skipped the
> chunk.

There is a trick with doing carryless multiplication with -1 that can
be used to SIMD process transitions between quoted/not-quoted. [1]
This is able to convert a bitmask of unescaped quote character
positions to a quote mask in a single operation. I last looked at it 5
years ago, but I remember coming to the conclusion that it would work
for implementing PostgreSQL's interpretation of CSV.

[1] https://github.com/geofflangdale/simdcsv/blob/master/src/main.cpp#L76

--
Ants





view thread (99+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
  In-Reply-To: <CANwKhkMnay=xrVNcuw45G+8nMAGkWee9KtFSGussZX8-16+zNg@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox