From: Ants Aasma <[email protected]>
To: Nazir Bilal Yavuz <[email protected]>
Cc: Shinya Kato <[email protected]>
Cc: [email protected]
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: Tue, 19 Aug 2025 12:09:20 +0300
Message-ID: <CANwKhkMnay=xrVNcuw45G+8nMAGkWee9KtFSGussZX8-16+zNg@mail.gmail.com>
In-Reply-To: <CAN55FZ247JdiT8Sd1SRiyOJxk3Ei=pDCL4kpdP=HqLRjOhKf1Q@mail.gmail.com>
References: <CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig@mail.gmail.com>
<CAN55FZ247JdiT8Sd1SRiyOJxk3Ei=pDCL4kpdP=HqLRjOhKf1Q@mail.gmail.com>
On Thu, 7 Aug 2025 at 14:15, Nazir Bilal Yavuz <[email protected]> wrote:
> I have a couple of ideas that I was working on:
> ---
>
> + * However, SIMD optimization cannot be applied in the following cases:
> + * - Inside quoted fields, where escape sequences and closing quotes
> + * require sequential processing to handle correctly.
>
> I think you can continue SIMD inside quoted fields. Only important
> thing is you need to set last_was_esc to false when SIMD skipped the
> chunk.
There is a trick using carry-less multiplication by -1 (all ones) that
lets SIMD code handle transitions between quoted and unquoted state. [1]
It converts a bitmask of unescaped quote character positions into a
quote mask in a single operation. I last looked at it 5 years ago, but
I remember concluding that it would work for implementing PostgreSQL's
interpretation of CSV.
[1] https://github.com/geofflangdale/simdcsv/blob/master/src/main.cpp#L76
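For illustration, here is a minimal sketch of the idea in portable C.
It is not taken from any patch in this thread. A carry-less multiply of
a 64-bit value by all ones yields, in its low 64 bits, the prefix XOR
of that value, so this sketch computes the prefix XOR with a
shift-and-XOR cascade instead of a CLMUL instruction. The names
prefix_xor, char_mask and quote_mask are hypothetical, the chunk size
of 64 bytes is an assumption, and the caller is assumed to have already
dropped escaped quotes from the quote bitmask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Prefix XOR: bit i of the result is the XOR of bits 0..i of x.
 * This matches the low 64 bits of a carry-less multiply of x by an
 * all-ones operand; it is done here with plain shifts for portability.
 */
static uint64_t
prefix_xor(uint64_t x)
{
    x ^= x << 1;
    x ^= x << 2;
    x ^= x << 4;
    x ^= x << 8;
    x ^= x << 16;
    x ^= x << 32;
    return x;
}

/*
 * Scalar stand-in for a SIMD compare + movemask: set bit i when
 * buf[i] == c. Handles at most one 64-byte chunk (assumption).
 */
static uint64_t
char_mask(const char *buf, size_t len, char c)
{
    uint64_t mask = 0;

    assert(len <= 64);
    for (size_t i = 0; i < len; i++)
        if (buf[i] == c)
            mask |= UINT64_C(1) << i;
    return mask;
}

/*
 * Turn a bitmask of unescaped quote positions into a mask of bytes that
 * are inside quotes (opening quote included, closing quote excluded).
 * *in_quote is 0 or all ones and carries the state between chunks.
 */
static uint64_t
quote_mask(uint64_t quote_bits, uint64_t *in_quote)
{
    uint64_t mask = prefix_xor(quote_bits) ^ *in_quote;

    /* Bit 63 tells whether the chunk ends inside a quoted field. */
    *in_quote = (uint64_t) 0 - (mask >> 63);
    return mask;
}
```

With this, the delimiters that really separate fields in a chunk are
char_mask(buf, len, ',') & ~quote_mask(char_mask(buf, len, '"'), &state),
where state starts at 0 for the first chunk.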
--
Ants