Re: Speed up COPY FROM text/CSV parsing using SIMD

public inbox for [email protected]  

From: Andrew Dunstan <[email protected]>
To: Nazir Bilal Yavuz <[email protected]>
To: Nathan Bossart <[email protected]>
Cc: Shinya Kato <[email protected]>
Cc: Manni Wood <[email protected]>
Cc: KAZAR Ayoub <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: Fri, 21 Nov 2025 09:48:53 -0500
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAN55FZ0e_L_O2O5W4E39vap1rz=OJjVqT7w--7gYeHpHK0a2aQ@mail.gmail.com>
References: <aPkvi5P7kpA8oQKc@nathan>
	<[email protected]>
	<CAKWEB6qdyhN3EoUNAK23etXX-kXH-_79NNbTsKqtF1g1WkuaBQ@mail.gmail.com>
	<CA+K2RumMC+avYGSX-AWNeod3w+XOGHrVPz8HiqkvJj7AZ5tZXA@mail.gmail.com>
	<CAKWEB6pev=pNVi4qDYWS50N=YFrKRbjH1h=5F1bXpnK7WR5CYg@mail.gmail.com>
	<aRue0D4QQkUf2B_N@nathan>
	<CAOzEurTHCGL-Txqf5rxMsPgTF=dTCOsr=uhJdXebqjEJy-0L7g@mail.gmail.com>
	<CAN55FZ0+JZvKYVCnJqLhHaWF9eBGmTaF1BCEpttxw1aT3G_+Qw@mail.gmail.com>
	<[email protected]>
	<CAN55FZ1XF=R7F7B__gq04rp2nQnJqs1yfExEXo4riWc68+Pe0w@mail.gmail.com>
	<aR4wDwNdLc5TmcQq@nathan>
	<CAN55FZ0e_L_O2O5W4E39vap1rz=OJjVqT7w--7gYeHpHK0a2aQ@mail.gmail.com>


On 2025-11-20 Th 7:55 AM, Nazir Bilal Yavuz wrote:
> Hi,
>
> Thank you for looking into this!
>
> On Thu, 20 Nov 2025 at 00:01, Nathan Bossart <[email protected]> wrote:
>
>> IMHO we should be looking for ways to simplify this should-we-use-SIMD
>> code.  For example, perhaps we could just disable the SIMD path for 10K or
>> 100K lines any time a special character is found.  I'm dubious that a lot
>> of complexity is warranted.
> I think this is a bit too harsh, since SIMD is still worth it if it can
> advance more than ~5 characters on average. I am trying to use SIMD as
> much as possible when it is worth it, but what you said can remove the
> regression completely, so perhaps that is the correct way.
>

Perhaps a very small regression (say under 1%) in the worst case would 
be OK. But the closer you can get that to zero the more acceptable this 
will be. Very large loads of sparse data, which will often have lots of 
special characters AIUI, are very common, so we should not dismiss the 
worst case as an outlier. I still like the idea of testing, say, a 
thousand lines every million, or something like that.
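
To make the sampling idea concrete, here is a rough sketch in C. It is not
taken from any posted patch: the SimdSampler struct, the function names, and
the three constants are all hypothetical, and the break-even run length of 5
only echoes the "~5 characters" figure quoted above. The assumption is that
the scalar parser can cheaply report how many special characters it saw in
each line, so sampling windows run on the scalar path and the decision made
at the end of a window stays in force until the next one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning knobs; the values only echo numbers floated in this thread. */
#define SAMPLE_LINES   1000ULL      /* lines examined per sampling window */
#define SAMPLE_PERIOD  1000000ULL   /* a new window starts every this many lines */
#define MIN_AVG_RUN    5ULL         /* assumed break-even run length for SIMD */

typedef struct SimdSampler
{
    uint64_t line_no;         /* lines processed so far */
    uint64_t sample_bytes;    /* bytes seen in the current window */
    uint64_t sample_special;  /* special characters seen in the current window */
    bool     use_simd;        /* decision in force outside sampling windows */
} SimdSampler;

static void
sampler_init(SimdSampler *s)
{
    assert(s != NULL);
    s->line_no = 0;
    s->sample_bytes = 0;
    s->sample_special = 0;
    s->use_simd = false;      /* stay on the scalar path until a window says otherwise */
}

/* True while the next line to be parsed falls inside a sampling window. */
static bool
sampler_in_window(const SimdSampler *s)
{
    return (s->line_no % SAMPLE_PERIOD) < SAMPLE_LINES;
}

/*
 * Call once per parsed line.  Returns whether the NEXT line should take the
 * SIMD path.  Lines inside a window always take the scalar path, which is
 * assumed to supply n_special.
 */
static bool
sampler_line_done(SimdSampler *s, size_t line_len, size_t n_special)
{
    if (sampler_in_window(s))
    {
        s->sample_bytes += line_len;
        s->sample_special += n_special;
    }
    s->line_no++;

    if (s->line_no % SAMPLE_PERIOD == SAMPLE_LINES)
    {
        /* Window just closed: compare average run length with the break-even. */
        s->use_simd = s->sample_bytes > MIN_AVG_RUN * (s->sample_special + 1);
        s->sample_bytes = 0;
        s->sample_special = 0;
    }
    return !sampler_in_window(s) && s->use_simd;
}
```

With something of this shape, the per-line overhead outside a window is one
increment and two modulo tests, and the worst case (data full of special
characters) pays for SIMD on no lines at all once the first window has
closed.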


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com
