Re: Speed up COPY FROM text/CSV parsing using SIMD


From: KAZAR Ayoub <[email protected]>
To: Nazir Bilal Yavuz <[email protected]>
Cc: Manni Wood <[email protected]>
Cc: Nathan Bossart <[email protected]>
Cc: Neil Conway <[email protected]>
Cc: Andrew Dunstan <[email protected]>
Cc: Shinya Kato <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: Tue, 24 Feb 2026 16:07:38 +0100
Message-ID: <CA+K2RukFH57QPAfTEzvy7PEyrLzav3HkyCiu-2yqR+uW_Niorw@mail.gmail.com>
In-Reply-To: <CAN55FZ2O2Ls==sdpROHqxWRx-PMBZ0riJ6eVKoHj8=vssTavxw@mail.gmail.com>
References: <CAN55FZ3g6QaiC8G4GMjdJ24egvgc-HG_xpoOztxnM_wnQNn5aw@mail.gmail.com>
	<aY-vJe_ENCB-fux9@nathan>
	<CAN55FZ2OpqRxUUEvgPpHCk2HnY0xZSH1x09fgFGOUyXSv8HcEA@mail.gmail.com>
	<aZYudtuBLVb36pZE@nathan>
	<CAN55FZ0J5iz9wFJLHcK7yNQqPb10_4ROoZiDu1wBZWSGC_fATg@mail.gmail.com>
	<CAKWEB6qY=mU62oAQFAVPCFWvwRuTPKBwxvM2aZ+J7p_9_MBmhQ@mail.gmail.com>
	<CAN55FZ2RPMxquXE6TH7dQkhtoiBcOOOZq8EOXj5COHv3ecP_cw@mail.gmail.com>
	<CA+K2Ru=fFTUVgEDr-fBed5aOMeDbH9vrOEhapXzHEpBeOxkucg@mail.gmail.com>
	<CAKWEB6pq7C0Wv1wT9Y1_c_1fn-+cR8pb210Pj3w2FcEOmNGxbQ@mail.gmail.com>
	<CAN55FZ2DT4-k06umn=7NYG+NoM6gnVJVQCCwRrr2qOraO+Jadw@mail.gmail.com>
	<aZikzQP6WPJ5Rq2S@nathan>
	<CAN55FZ3cBN_TncLVWyXAKm-KfewguN1AUjyRhoR6zL_QCxHh7A@mail.gmail.com>
	<CAKWEB6qzsZEQ4Czo9QBFiMXqdXVJknHUJwg6wjRwNzLn4+Jw0g@mail.gmail.com>
	<CAN55FZ2O2Ls==sdpROHqxWRx-PMBZ0riJ6eVKoHj8=vssTavxw@mail.gmail.com>

Hello,

On Tue, Feb 24, 2026 at 2:57 PM Nazir Bilal Yavuz <[email protected]>
wrote:

> Hi,
>
> On Tue, 24 Feb 2026 at 07:44, Manni Wood <[email protected]>
> wrote:
> >
> > Hello!
> >
> > I ran some speed tests on Nazir's v10 SIMD-only patch. I'm a bit
> > surprised at the regression for x86 with wide rows for the 1/3rd special
> > characters scenarios. I'm hoping it's something I did wrong. If anyone else
> > has numbers to share, that would be excellent.
>
> Thank you for doing this!
>
> I see a similar regression in the wide & CSV 1/3 case using your
> benchmark script. I didn't see this regression with my own
> benchmark when I shared v9 [1].
>
> +-------------+---------------------------+---------------------------+
> |     (ms)    |            Text           |            CSV            |
> +-------------+-------------+-------------+-------------+-------------+
> |  WIDE TEST  |     None    |     1/3     |     None    |     1/3     |
> +-------------+-------------+-------------+-------------+-------------+
> |    Master   |     9996    |    10769    |    11548    |    13960    |
> +-------------+-------------+-------------+-------------+-------------+
> |     v10     | 8912 -10.8% | 10902 +1.2% | 8952 -22.4% | 15123 +8.3% |
> +-------------+-------------+-------------+-------------+-------------+
>
> +-------------+---------------------------+---------------------------+
> |     (ms)    |            Text           |            CSV            |
> +-------------+-------------+-------------+-------------+-------------+
> | NARROW TEST |     None    |     1/3     |     None    |     1/3     |
> +-------------+-------------+-------------+-------------+-------------+
> |    Master   |     9441    |     9561    |     9734    |     9830    |
> +-------------+-------------+-------------+-------------+-------------+
> |     v10     |  9291 -1.5% |  9504 -0.5% |  9644 -0.9% | 10078 +2.5% |
> +-------------+-------------+-------------+-------------+-------------+
>
> I will investigate this. However, please note that the current master
> includes the inlining commit (dc592a4155), which makes COPY FROM
> faster. In my case:
>
> 1: current master without dc592a4155: 14400ms
> 2: current master: 13960ms (3% improvement against #1)
> 3: current master + SIMD: 15123ms (5% regression against #1 and 8%
> regression against #2)
>
> Could you run a similar test, i.e. drop dc592a4155 from the current
> master and re-run the benchmark? That would be helpful.
>
> [1]
> https://postgr.es/m/CAN55FZ0MiFCgK26gRgE05a%3D_ggenkxDM8H%3DA2uTHpywczqt%3D-Q%40mail.gmail.com

Here are some numbers for v10 from my end; each figure is the average of
multiple long runs. Master includes the inlining commit (dc592a4155).

These were run on an Intel Core i7-1255U CPU.

WIDE (500k rows)

Case          Master avg    v10 avg      Change
TXT | none    20,721 ms     17,980 ms    -13.23%
CSV | none    26,608 ms     18,433 ms    -30.73%
TXT | escape  25,069 ms     22,910 ms     -8.61%
CSV | quote   31,931 ms     31,493 ms     -1.37%

--------------------------------------

NARROW (15M rows)

Case          Master avg    v10 avg      Change
TXT | none    20,687 ms     20,824 ms    +0.67%
CSV | none    21,187 ms     21,153 ms    -0.16%
TXT | escape  20,870 ms     21,341 ms    +2.25%
CSV | quote   22,074 ms     22,267 ms    +0.87%

Change = (v10 avg - master avg) / master avg; negative means v10 is faster.
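For anyone re-checking the Change column, it is just the relative
difference against master. A minimal C sketch; `pct_change` and
`print_row` are hypothetical helpers, not part of PostgreSQL or of
either benchmark script:

```c
#include <stdio.h>

/*
 * Relative change of a new average timing against a baseline, in percent.
 * A negative result means the new build took less time (is faster).
 */
static double pct_change(double baseline_ms, double new_ms)
{
    return (new_ms - baseline_ms) / baseline_ms * 100.0;
}

/* Print one "case, master avg, new avg, change" row. */
static void print_row(const char *label, double master_ms, double new_ms)
{
    printf("%-12s %9.0f ms %9.0f ms %+8.2f%%\n",
           label, master_ms, new_ms, pct_change(master_ms, new_ms));
}
```

For example, print_row("CSV | none", 26608, 18433) reports -30.72%.
Because the averages are rounded to whole milliseconds, a recomputed
value can differ from the listed one in the last digit.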

For the narrow case, I'd attribute the differences mostly to noise and
the cost of the extra branches.
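To illustrate the extra-branch effect, here is a portable SWAR (SIMD
within a register) sketch that tests 8 bytes at a time for special
characters and falls back to a byte-by-byte loop once a word contains
one. It is not the v10 patch's code: `find_special` and the other names
are hypothetical, and the special-byte set ('\n', '\r', '\\', as for
text-format line splitting) is an assumption. With short rows, or when
special characters are dense (as in the 1/3 cases), the word test
rarely skips a full word, so its extra comparison and branch mostly add
cost on top of the scalar loop. This is one plausible reading, not a
measured explanation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Broadcast byte c into all 8 lanes of a 64-bit word. */
#define BCAST(c) ((uint64_t) (unsigned char) (c) * UINT64_C(0x0101010101010101))

/* Nonzero iff some byte of v is zero (the classic "haszero" bit trick). */
static inline uint64_t has_zero_byte(uint64_t v)
{
    return (v - UINT64_C(0x0101010101010101)) & ~v & UINT64_C(0x8080808080808080);
}

/* Nonzero iff some byte of w equals c. */
static inline uint64_t has_byte(uint64_t w, unsigned char c)
{
    return has_zero_byte(w ^ BCAST(c));
}

/* Assumed special-byte set for this sketch. */
static int is_special(unsigned char b)
{
    return b == '\n' || b == '\r' || b == '\\';
}

/*
 * Return the index of the first special byte in buf[0..len),
 * or len if there is none.
 */
static size_t find_special(const char *buf, size_t len)
{
    size_t i = 0;

    /* Word path: skip 8 bytes per step while no special byte is seen. */
    while (len - i >= 8)
    {
        uint64_t w;

        memcpy(&w, buf + i, sizeof(w));     /* alignment-safe load */
        if (has_byte(w, '\n') | has_byte(w, '\r') | has_byte(w, '\\'))
            break;                          /* a match is in this word */
        i += 8;
    }

    /* Scalar path: locate the match inside the word, or scan the tail. */
    for (; i < len; i++)
        if (is_special((unsigned char) buf[i]))
            return i;
    return len;
}
```

Locating the match with a scalar loop after the word test keeps the
sketch independent of byte order, at the price of re-reading up to 8
bytes.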

Regards,
Ayoub
