Re: Speed up COPY FROM text/CSV parsing using SIMD

From: Nathan Bossart <[email protected]>
To: KAZAR Ayoub <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Cc: Neil Conway <[email protected]>
Cc: Manni Wood <[email protected]>
Cc: Andrew Dunstan <[email protected]>
Cc: Shinya Kato <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: Fri, 6 Feb 2026 15:29:13 -0600
Message-ID: <aYZdKSTw6N3khsVE@nathan> (raw)
In-Reply-To: <CA+K2Run1VdLnmp-5_Qv2Fax0KgT7LLJMH-uzjaaf-NZD1oU-=w@mail.gmail.com>
References: <CAKWEB6rLxPVtN4ffZ3CMTL518zhk_BWzzBt6ZE2oUSaErdphxA@mail.gmail.com>
	<CAKWEB6oO4gQd+UJBrU=uuUTE8Hv7GMznjMouvn0Lskr52UqjhQ@mail.gmail.com>
	<CAN55FZ0Nd9FL=aDSjOTJTeFAn8VNrZgWG+WbcHR+R7GkDMvUyw@mail.gmail.com>
	<CAN55FZ1fwKgGo2wEie1w2M2jzJko6cMi1NWD05Xm47_L9a3D+g@mail.gmail.com>
	<CAKWEB6oZdQhhBV3ojHLBwjQgKzfDw0fkqncurt9oi7vNsq41ww@mail.gmail.com>
	<CAN55FZ1p5UyUdTRO7iWR_ukjhJDOnpOR2rYNOq=+hcC45OuahQ@mail.gmail.com>
	<CAOW5sYZEx=fPw2wp7y2nK_-ifXFeYW4CTmFx_OQeoHFjG7rbHw@mail.gmail.com>
	<CA+K2Ru=C_woAnd-3-pGHoNSTR8FOf=7eeSWE1xaLt9ojVWndVg@mail.gmail.com>
	<CAN55FZ0FRB2OD6-oEESLvgUT4bLZQVD72pAqUqzdw7Rx5cN0ig@mail.gmail.com>
	<CA+K2Run1VdLnmp-5_Qv2Fax0KgT7LLJMH-uzjaaf-NZD1oU-=w@mail.gmail.com>

Sorry for disappearing from this thread for a while.

It looks like a lot of energy has been put into benchmarking and refining
the heuristic for deciding when to use the SIMD path so that we avoid large
regressions when there are special characters.  I think this is all
valuable work, but I'm a bit concerned that we are putting the cart before
the horse.  IMHO it would be better to first get the SIMD code committed
with the absolute simplest heuristic we can think of (e.g., as soon as we
see a special character, switch to the scalar path for the remainder of
COPY FROM).  My hope is that this would be far easier to reason about from a
performance angle.  If we immediately fall back to the existing code path,
we don't need to worry about how many special characters there are and
whether they are sparse or clustered or whatever.  We just need to measure
the overhead of the new branches and ensure they don't produce meaningful
regressions.  Assuming that all looks good, we can then focus on the SIMD
code itself and make sure that is correct and optimal.  And once we get
that portion committed, we could then consider more sophisticated
heuristics.
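
To make the "simplest heuristic" concrete, here is a rough, self-contained
sketch of the control flow. It is not taken from the patch. The names
(CopyScanState, fast_path_ok, find_line_end, scan_scalar, CHUNK) are
hypothetical, backslash stands in for the full set of special characters,
and a plain 8-byte loop stands in for the real vector comparisons. The
point is only that the flag is sticky: once a special character is seen,
every later call goes straight to the scalar loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHUNK 8					/* stand-in for a SIMD vector width */

/* Hypothetical per-COPY state; not a PostgreSQL structure. */
typedef struct CopyScanState
{
	bool		fast_path_ok;	/* cleared for good on first special char */
} CopyScanState;

/* Scalar loop: offset of the first unescaped '\n' at or after pos, or len. */
static size_t
scan_scalar(const char *buf, size_t pos, size_t len)
{
	while (pos < len)
	{
		if (buf[pos] == '\\')
		{
			pos += 2;			/* skip the backslash and the escaped byte */
			continue;
		}
		if (buf[pos] == '\n')
			return pos;
		pos++;
	}
	return len;
}

/* Return the offset of the line-ending '\n' in buf, or len if none. */
static size_t
find_line_end(CopyScanState *st, const char *buf, size_t len)
{
	size_t		pos = 0;

	assert(st != NULL && buf != NULL);

	/* Chunked path: only skips chunks with no newline and no backslash. */
	while (st->fast_path_ok && pos + CHUNK <= len)
	{
		bool		saw_newline = false;
		bool		saw_special = false;

		for (size_t i = 0; i < CHUNK; i++)
		{
			char		c = buf[pos + i];

			saw_newline |= (c == '\n');
			saw_special |= (c == '\\');
		}

		if (saw_special)
			st->fast_path_ok = false;	/* sticky: never re-enabled */
		if (saw_special || saw_newline)
			break;				/* let the scalar loop finish this chunk */
		pos += CHUNK;
	}

	/* Tail bytes, or everything once the fast path has been disabled. */
	return scan_scalar(buf, pos, len);
}
```

With this shape, the only cost added to the existing path after a fallback
is the one fast_path_ok test per call, which is the overhead we would need
to measure.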

FWIW I'm hoping to get something in this area committed for v19, and IMO
now is a good time to start thinking about how to get things over the
finish line.  Thanks for working on it.

-- 
nathan
