Re: Speed up COPY FROM text/CSV parsing using SIMD

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-06 13:51  Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Nazir Bilal Yavuz @ 2026-02-06 13:51 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Neil Conway <[email protected]>; Manni Wood <[email protected]>; Nathan Bossart <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Sat, 31 Jan 2026 at 19:21, KAZAR Ayoub <[email protected]> wrote:
>
> On Wed, Jan 21, 2026 at 9:50 PM Neil Conway <[email protected]> wrote:
>>
>> * I'm curious if we'll see better performance on large inputs if we flush to `line_buf` periodically (e.g., at least every few thousand bytes or so). Otherwise we might see poor data cache behavior if large inputs with no control characters get evicted before we've copied them over. See the approach taken in escape_json_with_len() in utils/adt/json.c
>>
> So i gave this a try, attached is the small patch that has v3 + the suggestion added, here are the results with different threshold for line_buf refill:
>
> Execution time compared to master:
> Workload   | v3     | v3.1 (2k) | v3.1 (4k) | v3.1 (8k) | v3.1 (16k) | v3.1 (20k) | v3.1 (28k)
> text/none  | -16.5% | -17.4%    | -14.3%    | -12.6%    | -13.6%     | -10.5%     | -16.3%
> text/esc   | +5.6%  | +11.1%    | +3.1%     | +7.6%     | +3.0%      | +4.9%      | +4.2%
> csv/none   | -31.0% | -29.9%    | -26.7%    | -30.1%    | -27.9%     | -30.2%     | -29.6%
> csv/quote  | +0.2%  | -0.6%     | -0.4%     | -1.0%     | +0.1%      | +2.5%      | -1.0%
>
> L1d cache miss rates:
> Workload   | Master | v3    | v3.1 (2k) | v3.1 (4k) | v3.1 (8k) | v3.1 (16k) | v3.1 (20k) | v3.1 (28k)
> text/none  | 0.20%  | 0.23% | 0.21%     | 0.22%     | 0.21%     | 0.21%      | 0.21%      | 0.22%
> text/esc   | 0.21%  | 0.22% | 0.22%     | 0.22%     | 0.22%     | 0.21%      | 0.22%      | 0.22%
> csv/none   | 0.17%  | 0.22% | 0.21%     | 0.22%     | 0.21%     | 0.21%      | 0.22%      | 0.22%
> csv/quote  | 0.18%  | 0.22% | 0.19%     | 0.20%     | 0.20%     | 0.19%      | 0.20%      | 0.20%
>
> On my laptop I have 32KB L1 cache per core.
> Results are super close, it is hard to see in the cache misses numbers but execution times are saying other things, doing the periodic filling of line_buf seems good to do.
> If Manni can rerun the benchmarks on these too, it would be nice to confirm this.

I looked at this change and had a couple of points.

We already have REFILL_LINEBUF at the start of the for loop in the
CopyReadLineText() function (let’s call this refill #1). This refills
when the input_buf_ptr >= copy_buf_len check is true. On my end,
copy_buf_len stays at 8191 until the end of the input, and then it
becomes the remaining amount. So when I set LINE_BUF_FLUSH_AFTER to
8192, the REFILL_LINEBUF you added shouldn’t be called; instead,
refill #1 should be triggered.

I verified this manually by adding some logging, and the results seem
to confirm this behavior. Based on that, there shouldn’t be a
performance difference when LINE_BUF_FLUSH_AFTER >= 8k.
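
As a rough illustration of the point above, here is a simplified model (not
the actual CopyReadLineText() code) that counts how often a periodic flush
would fire while scanning one input buffer with no line endings. The helper
name and the byte-at-a-time loop are assumptions made for illustration; only
the 8191-byte copy_buf_len and the general shape of the threshold check come
from this thread.

```c
#include <assert.h>

/*
 * Simplified model: count how often the periodic flush would fire while
 * scanning one input buffer that contains no line endings.
 * Hypothetical helper, not PostgreSQL code.
 */
static int
periodic_flushes_per_buffer(int copy_buf_len, int flush_after)
{
    int input_buf_index = 0;    /* start of the not-yet-flushed bytes */
    int flushes = 0;

    assert(copy_buf_len >= 0 && flush_after > 0);

    /* The loop ends where refill #1 would take over (ptr >= copy_buf_len). */
    for (int input_buf_ptr = 0; input_buf_ptr < copy_buf_len; input_buf_ptr++)
    {
        if (input_buf_ptr - input_buf_index >= flush_after)
        {
            flushes++;
            input_buf_index = input_buf_ptr;
        }
    }
    return flushes;
}
```

With copy_buf_len = 8191, this model gives 3 periodic flushes for a 2048-byte
threshold, 1 for 4096, and 0 for 8192 or more, which is consistent with
refill #1 doing all the work at 8k and above.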

Could you please take a look and confirm whether you see the same behavior?

Also, I noticed that json.c uses ESCAPE_JSON_FLUSH_AFTER set to 512,
so it might be worth trying smaller values here as well.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-06 18:11  KAZAR Ayoub <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: KAZAR Ayoub @ 2026-02-06 18:11 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Neil Conway <[email protected]>; Manni Wood <[email protected]>; Nathan Bossart <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hello,

On Fri, Feb 6, 2026 at 2:51 PM Nazir Bilal Yavuz <[email protected]> wrote:

> Hi,
>
> On Sat, 31 Jan 2026 at 19:21, KAZAR Ayoub <[email protected]> wrote:
> >
> > On Wed, Jan 21, 2026 at 9:50 PM Neil Conway <[email protected]>
> wrote:
> >>
> >> * I'm curious if we'll see better performance on large inputs if we
> flush to `line_buf` periodically (e.g., at least every few thousand bytes
> or so). Otherwise we might see poor data cache behavior if large inputs
> with no control characters get evicted before we've copied them over. See
> the approach taken in escape_json_with_len() in utils/adt/json.c
> >>
> > So i gave this a try, attached is the small patch that has v3 + the
> suggestion added, here are the results with different threshold for
> line_buf refill:
> >
> > Execution time compared to master:
> > Workload   | v3     | v3.1 (2k) | v3.1 (4k) | v3.1 (8k) | v3.1 (16k) | v3.1 (20k) | v3.1 (28k)
> > text/none  | -16.5% | -17.4%    | -14.3%    | -12.6%    | -13.6%     | -10.5%     | -16.3%
> > text/esc   | +5.6%  | +11.1%    | +3.1%     | +7.6%     | +3.0%      | +4.9%      | +4.2%
> > csv/none   | -31.0% | -29.9%    | -26.7%    | -30.1%    | -27.9%     | -30.2%     | -29.6%
> > csv/quote  | +0.2%  | -0.6%     | -0.4%     | -1.0%     | +0.1%      | +2.5%      | -1.0%
> >
> > L1d cache miss rates:
> > Workload   | Master | v3    | v3.1 (2k) | v3.1 (4k) | v3.1 (8k) | v3.1 (16k) | v3.1 (20k) | v3.1 (28k)
> > text/none  | 0.20%  | 0.23% | 0.21%     | 0.22%     | 0.21%     | 0.21%      | 0.21%      | 0.22%
> > text/esc   | 0.21%  | 0.22% | 0.22%     | 0.22%     | 0.22%     | 0.21%      | 0.22%      | 0.22%
> > csv/none   | 0.17%  | 0.22% | 0.21%     | 0.22%     | 0.21%     | 0.21%      | 0.22%      | 0.22%
> > csv/quote  | 0.18%  | 0.22% | 0.19%     | 0.20%     | 0.20%     | 0.19%      | 0.20%      | 0.20%
> >
> > On my laptop I have 32KB L1 cache per core.
> > Results are super close, it is hard to see in the cache misses numbers
> but execution times are saying other things, doing the periodic filling of
> line_buf seems good to do.
> > If Manni can rerun the benchmarks on these too, it would be nice to
> confirm this.
>
> I looked at this change and had a couple of points.
>
> We already have REFILL_LINEBUF at the start of the for loop in the
> CopyReadLineText() function (let’s call this refill #1). This refills
> when the input_buf_ptr >= copy_buf_len check is true. On my end,
> copy_buf_len stays at 8191 until the end of the input, and then it
> becomes the remaining amount. So when I set LINE_BUF_FLUSH_AFTER to
> 8192, the REFILL_LINEBUF you added shouldn’t be called; instead,
> refill #1 should be triggered.
>
> I verified this manually by adding some logging, and the results seem
> to confirm this behavior. Based on that, there shouldn’t be a
> performance difference when LINE_BUF_FLUSH_AFTER >= 8k.
>

> Could you please take a look and confirm whether you see the same behavior?
>
Just to make sure I understand this correctly: line_buf holds the processed
bytes of ONE line. The periodic flush I added triggers on:
input_buf_ptr - cstate->input_buf_index >= LINE_BUF_FLUSH_AFTER
So if a line in the file is shorter than LINE_BUF_FLUSH_AFTER, the #2
REFILL_LINEBUF is never reached within a CopyReadLine call, because line_buf
is reset for every line:

CopyReadLine(CopyFromState cstate, bool is_csv)
{
        bool result;
        resetStringInfo(&cstate->line_buf);
....
}

So my previous benchmarks with LINE_BUF_FLUSH_AFTER > 4096 are invalid,
since I was working with lines of 4096 bytes.
If line length < LINE_BUF_FLUSH_AFTER < INPUT_BUF_SIZE, we hit neither the
#1 REFILL_LINEBUF nor the #2 periodic REFILL_LINEBUF; we only reach the
#3 REFILL_LINEBUF just before CopyReadLineText returns.

So flushing here only mitigates L1d cache misses for long lines (for
example, lines that occupy more than 70% of input_buf, maybe?).
Does this make sense for your case with 8k too?
I propose removing the REFILL_LINEBUF in #1, since it no longer makes sense:
the periodic REFILL_LINEBUF already does the job at sizes smaller than
INPUT_BUF_SIZE (in line with most L1d cache sizes).
So basically it becomes:
if (line length < LINE_BUF_FLUSH_AFTER): flush once at the very end
if (line length > LINE_BUF_FLUSH_AFTER): flush (line length
/ LINE_BUF_FLUSH_AFTER) times, plus once more for the leftover

> Also, I noticed that json.c uses ESCAPE_JSON_FLUSH_AFTER set to 512,
> so it might be worth trying smaller values here as well.

If I'm right about how LINE_BUF_FLUSH_AFTER is used above, I think smaller
values would imply too many unnecessary memory loads: at 512 we are no
longer fighting L1d cache misses. I'll still try it in the next benchmark.
If this sounds right, I'll re-benchmark multiple row sizes with different
LINE_BUF_FLUSH_AFTER values.

Regards,
Ayoub



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-06 21:29  Nathan Bossart <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Nathan Bossart @ 2026-02-06 21:29 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Sorry for disappearing from this thread for a while.

It looks like a lot of energy has been put into benchmarking and refining
the heuristic for deciding when to use the SIMD path so that we avoid large
regressions when there are special characters.  I think this is all
valuable work, but I'm a bit concerned that we are putting the cart before
the horse.  IMHO it would be better to first get the SIMD code committed
with the absolute simplest heuristic we can think of (e.g., as soon as we
see a special character, switch to the scalar path for the remainder of
COPY FROM).  My hope is that would be far easier to reason about from a
performance angle.  If we immediately fall back to the existing code path,
we don't need to worry about how many special characters there are and
whether they are sparse or clustered or whatever.  We just need to measure
the overhead of the new branches and ensure they don't produce meaningful
regressions.  Assuming that all looks good, we can then focus on the SIMD
code itself and make sure that is correct and optimal.  And once we get
that portion committed, we could then consider more sophisticated
heuristics.

FWIW I'm hoping to get something in this area committed for v19, and IMO
now is a good time to start thinking about how to get things over the
finish line.  Thanks for working on it.

-- 
nathan


* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-06 22:19  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 2 replies; 21+ messages in thread

From: Nazir Bilal Yavuz @ 2026-02-06 22:19 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

Thank you for sharing your thoughts!

On Sat, 7 Feb 2026 at 00:29, Nathan Bossart <[email protected]> wrote:
>
> It looks like a lot of energy has been put into benchmarking and refining
> the heuristic for deciding when to use the SIMD path so that we avoid large
> regressions when there are special characters.  I think this is all
> valuable work, but I'm a bit concerned that we are putting the cart before
> the horse.  IMHO it would be better to first get the SIMD code committed
> with the absolute simplest heuristic we can think of (e.g., as soon as we
> see a special character, switch to the scalar path for the remainder of
> COPY FROM).  My hope is that would be far easier to reason about from a
> performance angle.  If we immediately fall back to the existing code path,
> we don't need to worry about how many special characters there are and
> whether they are sparse or clustered or whatever.  We just need to measure
> the overhead of the new branches and ensure they don't produce meaningful
> regressions.  Assuming that all looks good, we can then focus on the SIMD
> code itself and make sure that is correct and optimal.  And once we get
> that portion committed, we could then consider more sophisticated
> heuristics.

I have three possible approaches in mind; they are quite similar to
each other.

1- After encountering a special character, disable SIMD for the rest
of the current line and also for the rest of the data.

2- It is a mixed version of the current heuristic and #1. After
encountering a special character, skip SIMD for the current line (let's
say line 1) and for the next line (line 2). Then try running SIMD for
the next line (line 3), if there is no special character continue to
run SIMD but if there is a special character then skip running SIMD
for two lines this time. And it goes like that, every time a special
character is encountered in the SIMD run, skipped SIMD lines are
doubled.

3- This version is a bit different from #2. Instead of calculating the
number of lines to skip dynamically, skip a constant number of lines, N,
and then try to run SIMD again after these lines. N could be
something like 100, 1000, or 10000. Actually, you and Andrew
suggested this approach before [1].

I think what you suggested is closer to #1 or #3. I just wanted to
hear your opinions, and whether you think any of these approaches is
worth implementing / working on.

[1] https://postgr.es/m/aR4wDwNdLc5TmcQq%40nathan
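
To make the three approaches easier to compare, here is a minimal sketch of
them as one per-line policy. All names (SimdPolicy, simd_policy_*) are
hypothetical and not part of any patch in this thread. The sketch only decides
whether the next line starts with SIMD; falling back to scalar code for the
rest of the current line is left to the parser. It also assumes that, in
approach 2, the penalty is never reset after a clean line, which the
description above does not specify.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

typedef enum
{
    SIMD_OFF_AFTER_FIRST_HIT,   /* approach 1 */
    SIMD_SKIP_DOUBLING,         /* approach 2 */
    SIMD_SKIP_FIXED             /* approach 3 */
} SimdPolicyKind;

typedef struct
{
    SimdPolicyKind kind;
    bool off_forever;       /* approach 1: set on the first hit */
    long lines_to_skip;     /* scalar-only lines still to process */
    long next_penalty;      /* approach 2: lines to skip on the next hit */
    long fixed_penalty;     /* approach 3: the constant N */
} SimdPolicy;

static SimdPolicy
simd_policy_init(SimdPolicyKind kind, long fixed_penalty)
{
    SimdPolicy p = {kind, false, 0, 1, fixed_penalty};

    assert(fixed_penalty >= 0);
    return p;
}

/* Should the next line be scanned with SIMD? */
static bool
simd_policy_use_simd(const SimdPolicy *p)
{
    return !p->off_forever && p->lines_to_skip == 0;
}

/* Call once per finished line; hit_special says whether it had a special char. */
static void
simd_policy_line_done(SimdPolicy *p, bool hit_special)
{
    if (p->off_forever)
        return;
    if (p->lines_to_skip > 0)
    {
        p->lines_to_skip--;     /* this was a scalar-only line */
        return;
    }
    if (!hit_special)
        return;                 /* clean SIMD line: keep using SIMD */

    switch (p->kind)
    {
        case SIMD_OFF_AFTER_FIRST_HIT:
            p->off_forever = true;
            break;
        case SIMD_SKIP_DOUBLING:
            p->lines_to_skip = p->next_penalty;
            if (p->next_penalty <= LONG_MAX / 2)
                p->next_penalty *= 2;   /* double, but avoid overflow */
            break;
        case SIMD_SKIP_FIXED:
            p->lines_to_skip = p->fixed_penalty;
            break;
    }
}
```

If every line contains a special character, the doubling policy in this sketch
uses SIMD on lines 1, 3, and 6, matching the description of approach 2.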

-- 
Regards,
Nazir Bilal Yavuz
Microsoft


* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-06 22:36  KAZAR Ayoub <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 0 replies; 21+ messages in thread

From: KAZAR Ayoub @ 2026-02-06 22:36 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hello,

On Fri, Feb 6, 2026 at 11:19 PM Nazir Bilal Yavuz <[email protected]>
wrote:

> Hi,
>
> Thank you for sharing your thoughts!
>
> On Sat, 7 Feb 2026 at 00:29, Nathan Bossart <[email protected]>
> wrote:
> >
> > It looks like a lot of energy has been put into benchmarking and refining
> > the heuristic for deciding when to use the SIMD path so that we avoid
> large
> > regressions when there are special characters.  I think this is all
> > valuable work, but I'm a bit concerned that we are putting the cart
> before
> > the horse.  IMHO it would be better to first get the SIMD code committed
> > with the absolute simplest heuristic we can think of (e.g., as soon as we
> > see a special character, switch to the scalar path for the remainder of
> > COPY FROM).  My hope is that would be far easier to reason about from a
> > performance angle.  If we immediately fall back to the existing code
> path,
> > we don't need to worry about how many special characters there are and
> > whether they are sparse or clustered or whatever.  We just need to
> measure
> > the overhead of the new branches and ensure they don't produce meaningful
> > regressions.  Assuming that all looks good, we can then focus on the SIMD
> > code itself and make sure that is correct and optimal.  And once we get
> > that portion committed, we could then consider more sophisticated
> > heuristics.
>
I agree with this, especially regarding the line_buf refilling idea: finding
a good threshold value will take more time than the work on the
heuristic.

>
> I have three possible approaches in my mind, they are actually similar
> to each other.
>
> 1- After encountering a special character, disable SIMD for the rest
> of the current line and also for the rest of the data.
>
> 2- It is a mixed version of the current heuristic and #1. After
> encountering a special character, skip SIMD for the current line (let's
> say line 1) and for the next line (line 2). Then try running SIMD for
> the next line (line 3), if there is no special character continue to
> run SIMD but if there is a special character then skip running SIMD
> for two lines this time. And it goes like that, every time a special
> character is encountered in the SIMD run, skipped SIMD lines are
> doubled.
>
> 3- This version is a bit different from #2. Instead of calculating the
> number of lines to skip dynamically, skip the constant N number of
> lines and then try to run SIMD again after these lines. N could be
> something like 100, 1000, or 10000 etc.. Actually, you and Andrew
> suggested this approach before [1].
>
> I think what you suggested is closer to #1 or #3. I just wanted to
> hear your opinions, and whether you think any of these approaches are
> good to implement / work on.
>
For v19, #1 seems like wasted potential. #3 sounds more relaxed than
v4.2, so it has good potential; I can fully benchmark it against v3 as
soon as you send a patch for it.


Regards,
Ayoub



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-06 22:47  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 1 reply; 21+ messages in thread

From: Nathan Bossart @ 2026-02-06 22:47 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Sat, Feb 07, 2026 at 01:19:16AM +0300, Nazir Bilal Yavuz wrote:
> I have three possible approaches in my mind, they are actually similar
> to each other.
> 
> 1- After encountering a special character, disable SIMD for the rest
> of the current line and also for the rest of the data.
> 
> 2- It is a mixed version of the current heuristic and #1. After
> encountering a special character, skip SIMD for the current line (let's
> say line 1) and for the next line (line 2). Then try running SIMD for
> the next line (line 3), if there is no special character continue to
> run SIMD but if there is a special character then skip running SIMD
> for two lines this time. And it goes like that, every time a special
> character is encountered in the SIMD run, skipped SIMD lines are
> doubled.
> 
> 3- This version is a bit different from #2. Instead of calculating the
> number of lines to skip dynamically, skip the constant N number of
> lines and then try to run SIMD again after these lines. N could be
> something like 100, 1000, or 10000 etc.. Actually, you and Andrew
> suggested this approach before [1].
> 
> I think what you suggested is closer to #1 or #3. I just wanted to
> hear your opinions, and whether you think any of these approaches are
> good to implement / work on.

Yeah, I think either (1) or (3) would be a good starting point.  (1) is
basically just (3) with N set to infinity, anyway.  I imagine there's some
value less than infinity that is acceptable, but if I had to pick an
approach right now, I'd probably go with (1) to essentially remove the
heuristic from the discussion until we're ready to focus on it.

-- 
nathan

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-11 13:27  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Nazir Bilal Yavuz @ 2026-02-11 13:27 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Sat, 7 Feb 2026 at 01:47, Nathan Bossart <[email protected]> wrote:
>
> On Sat, Feb 07, 2026 at 01:19:16AM +0300, Nazir Bilal Yavuz wrote:
> > I have three possible approaches in my mind, they are actually similar
> > to each other.
> >
> > 1- After encountering a special character, disable SIMD for the rest
> > of the current line and also for the rest of the data.
> >
> > 2- It is a mixed version of the current heuristic and #1. After
> > encountering a special character, skip SIMD for the current line (let's
> > say line 1) and for the next line (line 2). Then try running SIMD for
> > the next line (line 3), if there is no special character continue to
> > run SIMD but if there is a special character then skip running SIMD
> > for two lines this time. And it goes like that, every time a special
> > character is encountered in the SIMD run, skipped SIMD lines are
> > doubled.
> >
> > 3- This version is a bit different from #2. Instead of calculating the
> > number of lines to skip dynamically, skip the constant N number of
> > lines and then try to run SIMD again after these lines. N could be
> > something like 100, 1000, or 10000 etc.. Actually, you and Andrew
> > suggested this approach before [1].
> >
> > I think what you suggested is closer to #1 or #3. I just wanted to
> > hear your opinions, and whether you think any of these approaches are
> > good to implement / work on.
>
> Yeah, I think either (1) or (3) would be a good starting point.  (1) is
> basically just (3) with N set to infinity, anyway.  I imagine there's some
> value less than infinity that is acceptable, but if I had to pick an
> approach right now, I'd probably go with (1) to essentially remove the
> heuristic from the discussion until we're ready to focus on it.

I am sharing a v6 which implements (1). My benchmark results show
almost no difference for the special-character cases and a nice
improvement for the no-special-character cases.

Timing results after running Manni's v1.2.1 benchmark:

+---------+---------------+----------------+--------------+----------------+
|         | text, no sp.  | text, 1/3 sp.  | csv, no sp.  | csv, 1/3 sp.   |
+---------+---------------+----------------+--------------+----------------+
| master  | 104437        | 118711         | 121173       | 151589         |
+---------+---------------+----------------+--------------+----------------+
| patched | 90062 -13.7%  | 119070 +0.3%   | 88964 -26.5% | 153710 +1.4%   |
+---------+---------------+----------------+--------------+----------------+

In case the table does not render well in your email client, here is a
short summary:

- Text, no special characters: 13.7% faster
- Text, 1/3 special characters: 0.3% slower, no meaningful change

- CSV, no special characters: 26.5% faster
- CSV, 1/3 special characters: 1.4% slower, no meaningful change
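
For reference, the relative changes can be recomputed from the raw timings in
the table as (patched - master) / master, expressed in percent. A minimal
sketch follows; the function name is hypothetical.

```c
#include <assert.h>

/* Relative change of patched vs. master, in percent (negative = faster). */
static double
relative_change_percent(double master, double patched)
{
    assert(master > 0.0);
    return (patched - master) / master * 100.0;
}
```

Applied to the timings above, this gives roughly -13.8% and -26.6% for the
no-special-character cases, and roughly +0.3% and +1.4% for the
1/3-special-character cases.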

--
Regards,
Nazir Bilal Yavuz
Microsoft


Attachments:

  [text/x-patch] v6-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (8.0K, 2-v6-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
  download | inline diff:
From 494f86e2cd01c9d55e90f7683e151828d127b8e4 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Wed, 11 Feb 2026 14:49:21 +0300
Subject: [PATCH v6] Speed up COPY FROM text/CSV parsing using SIMD

This patch disables SIMD when SIMD encounters a special character which
is neither EOF nor EOL.

Author: Shinya Kato <[email protected]>
Author: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Kazar Ayoub <[email protected]>
Reviewed-by: Nathan Bossart <[email protected]>
Reviewed-by: Neil Conway <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Manni Wood <[email protected]>
Reviewed-by: Mark Wong <[email protected]>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
---
 src/backend/commands/copyfrom.c          |   4 +
 src/backend/commands/copyfromparse.c     | 132 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h |   4 +
 3 files changed, 135 insertions(+), 5 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 25ee20b23db..fbf78b6698b 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1721,6 +1721,10 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
 
+	/* Initialize SIMD variables */
+	cstate->simd_enabled = false;
+	cstate->simd_initialized = false;
+
 	/*
 	 * Allocate buffers for the input pipeline.
 	 *
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 94d6f415a06..554b3cb9bf8 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 
@@ -141,12 +142,14 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
 static bool CopyReadLine(CopyFromState cstate, bool is_csv);
-static bool CopyReadLineText(CopyFromState cstate, bool is_csv);
 static int	CopyReadAttributesText(CopyFromState cstate);
 static int	CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
 									 Oid typioparam, int32 typmod,
 									 bool *isnull);
+static pg_attribute_always_inline bool CopyReadLineText(CopyFromState cstate,
+														bool is_csv,
+														bool simd_enabled);
 static pg_attribute_always_inline bool CopyFromTextLikeOneRow(CopyFromState cstate,
 															  ExprContext *econtext,
 															  Datum *values,
@@ -1173,8 +1176,21 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 	resetStringInfo(&cstate->line_buf);
 	cstate->line_buf_valid = false;
 
-	/* Parse data and transfer into line_buf */
-	result = CopyReadLineText(cstate, is_csv);
+	/* Initialize SIMD on the first read */
+	if (unlikely(!cstate->simd_initialized))
+	{
+		cstate->simd_initialized = true;
+		cstate->simd_enabled = true;
+	}
+
+	/*
+	 * Parse data and transfer into line_buf. To benefit from inlining, call
+	 * CopyReadLineText() with constant boolean arguments.
+	 */
+	if (cstate->simd_enabled)
+		result = CopyReadLineText(cstate, is_csv, true);
+	else
+		result = CopyReadLineText(cstate, is_csv, false);
 
 	if (result)
 	{
@@ -1241,8 +1257,8 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
-static bool
-CopyReadLineText(CopyFromState cstate, bool is_csv)
+static pg_attribute_always_inline bool
+CopyReadLineText(CopyFromState cstate, bool is_csv, bool simd_enabled)
 {
 	char	   *copy_input_buf;
 	int			input_buf_ptr;
@@ -1257,6 +1273,14 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	char		quotec = '\0';
 	char		escapec = '\0';
 
+#ifndef USE_NO_SIMD
+	Vector8		nl = vector8_broadcast('\n');
+	Vector8		cr = vector8_broadcast('\r');
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		quote = vector8_broadcast(0);
+	Vector8		escape = vector8_broadcast(0);
+#endif
+
 	if (is_csv)
 	{
 		quotec = cstate->opts.quote[0];
@@ -1264,6 +1288,12 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* ignore special escape processing if it's the same as quotec */
 		if (quotec == escapec)
 			escapec = '\0';
+
+#ifndef USE_NO_SIMD
+		quote = vector8_broadcast(quotec);
+		if (quotec != escapec)
+			escape = vector8_broadcast(escapec);
+#endif
 	}
 
 	/*
@@ -1330,6 +1360,98 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			need_data = false;
 		}
 
+#ifndef USE_NO_SIMD
+
+		/*
+		 * Use SIMD instructions to efficiently scan the input buffer for
+		 * special characters (e.g., newline, carriage return, quote, and
+		 * escape). This is faster than byte-by-byte iteration, especially on
+		 * large buffers.
+		 *
+		 * We do not apply the SIMD fast path in either of the following
+		 * cases: - When the previously processed character was an escape
+		 * character (last_was_esc), since the next byte must be examined
+		 * sequentially. - When the remaining buffer is smaller than one
+		 * vector width (sizeof(Vector8)), since SIMD operates on fixed-size
+		 * chunks.
+		 *
+		 * Note that, SIMD may become slower when the input contains many
+		 * special characters. To avoid this regression, we disable SIMD for
+		 * the rest of the input once we encounter a special character which
+		 * is neither EOF nor EOL.
+		 */
+		if (simd_enabled && !last_was_esc && copy_buf_len - input_buf_ptr > sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match = vector8_broadcast(0);
+			uint32		mask;
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+			if (is_csv)
+			{
+				/* \n and \r are not special inside quotes */
+				if (!in_quote)
+					match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+
+				match = vector8_or(match, vector8_eq(chunk, quote));
+				if (escapec != '\0')
+					match = vector8_or(match, vector8_eq(chunk, escape));
+			}
+			else
+			{
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				match = vector8_or(match, vector8_eq(chunk, bs));
+			}
+
+			/* Check if we found any special characters */
+			mask = vector8_highbit_mask(match);
+			if (mask != 0)
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				int			advance = pg_rightmost_one_pos32(mask);
+				char		c1,
+							c2;
+				bool		simd_hit_eol,
+							simd_hit_eof;
+
+				input_buf_ptr += advance;
+				c1 = copy_input_buf[input_buf_ptr];
+
+				/*
+				 * Since we stopped within the chunk and ((copy_buf_len -
+				 * input_buf_ptr) > sizeof(Vector8)) is true,
+				 * copy_input_buf[input_buf_ptr + 1] is guaranteed to be
+				 * readable.
+				 */
+				c2 = copy_input_buf[input_buf_ptr + 1];
+				simd_hit_eol = (c1 == '\r' || c1 == '\n') && (!is_csv || !in_quote);
+				simd_hit_eof = c1 == '\\' && c2 == '.' && !is_csv;
+
+				/*
+				 * Do not disable SIMD when we hit EOL or EOF characters. In
+				 * practice, it does not matter for EOF because parsing ends
+				 * there, but we keep the behavior consistent.
+				 */
+				if (!(simd_hit_eof || simd_hit_eol))
+				{
+					simd_enabled = false;
+					cstate->simd_enabled = false;
+				}
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				input_buf_ptr += sizeof(Vector8);
+				continue;
+			}
+		}
+#endif
+
 		/* OK to fetch a character */
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 822ef33cf69..56942a15469 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -89,6 +89,10 @@ typedef struct CopyFromStateData
 	const char *cur_attval;		/* current att value for error messages */
 	bool		relname_only;	/* don't output line number, att, etc. */
 
+	/* SIMD variables */
+	bool		simd_enabled;
+	bool		simd_initialized;
+
 	/*
 	 * Working state
 	 */
-- 
2.47.3


* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-11 22:39  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Nathan Bossart @ 2026-02-11 22:39 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Feb 11, 2026 at 04:27:50PM +0300, Nazir Bilal Yavuz wrote:
> I am sharing a v6 which implements (1). My benchmark results show
> almost no difference for the special-character cases and a nice
> improvement for the no-special-character cases.

Thanks!

> +	/* Initialize SIMD variables */
> +	cstate->simd_enabled = false;
> +	cstate->simd_initialized = false;

> +	/* Initialize SIMD on the first read */
> +	if (unlikely(!cstate->simd_initialized))
> +	{
> +		cstate->simd_initialized = true;
> +		cstate->simd_enabled = true;
> +	}

Why do we do this initialization in CopyReadLine() as opposed to setting
simd_enabled to true when initializing cstate in BeginCopyFrom()?  If we
can initialize it in BeginCopyFrom, we could probably remove
simd_initialized.

> +	if (cstate->simd_enabled)
> +		result = CopyReadLineText(cstate, is_csv, true);
> +	else
> +		result = CopyReadLineText(cstate, is_csv, false);

I know we discussed this upthread, but I'd like to take a closer look at
this to see whether/why it makes such a big difference.  It's a bit awkward
that CopyReadLineText() needs to manage both its local simd_enabled and
cstate->simd_enabled.

+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);

As mentioned upthread [0], I think it's worth testing whether processing
multiple vectors worth of data in each loop iteration is worthwhile.

[0] https://postgr.es/m/aSTVOe6BIe5f1l3i%40nathan

-- 
nathan


^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-13 11:45  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Nazir Bilal Yavuz @ 2026-02-13 11:45 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

Thanks for the review!

On Thu, 12 Feb 2026 at 01:39, Nathan Bossart <[email protected]> wrote:
>
> On Wed, Feb 11, 2026 at 04:27:50PM +0300, Nazir Bilal Yavuz wrote:
> > I am sharing a v6 which implements (1). My benchmark results show
> > almost no difference for the special-character cases and a nice
> > improvement for the no-special-character cases.
>
> Thanks!
>
> > +     /* Initialize SIMD variables */
> > +     cstate->simd_enabled = false;
> > +     cstate->simd_initialized = false;
>
> > +     /* Initialize SIMD on the first read */
> > +     if (unlikely(!cstate->simd_initialized))
> > +     {
> > +             cstate->simd_initialized = true;
> > +             cstate->simd_enabled = true;
> > +     }
>
> Why do we do this initialization in CopyReadLine() as opposed to setting
> simd_enabled to true when initializing cstate in BeginCopyFrom()?  If we
> can initialize it in BeginCopyFrom, we could probably remove
> simd_initialized.

Correct, I guess this is left over from the earlier versions.

> > +     if (cstate->simd_enabled)
> > +             result = CopyReadLineText(cstate, is_csv, true);
> > +     else
> > +             result = CopyReadLineText(cstate, is_csv, false);
>
> I know we discussed this upthread, but I'd like to take a closer look at
> this to see whether/why it makes such a big difference.  It's a bit awkward
> that CopyReadLineText() needs to manage both its local simd_enabled and
> cstate->simd_enabled.

I extensively benchmarked this with the new v6 version. If I change
this to either of:

CopyReadLineText(cstate, is_csv);
or
CopyReadLineText(cstate, is_csv, cstate->simd_enabled);

then there is a 5-10% regression in the scalar path. I ran my
benchmarks with both "meson --buildtype=debugoptimized" and "meson
--buildtype=release", and the result is the same in both.

Also, if I change this code to:

    if (cstate->simd_enabled)
    {
        if (is_csv)
            result = CopyReadLineText(cstate, true, true);
        else
            result = CopyReadLineText(cstate, false, true);
    }
    else
    {
        if (is_csv)
            result = CopyReadLineText(cstate, true, false);
        else
            result = CopyReadLineText(cstate, false, false);
    }

then I see a ~5% performance improvement in the scalar path compared to master.
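
To illustrate why passing literal booleans can matter, here is a
standalone sketch (not code from the patch). The names scan_line,
scan_dispatch and always_inline_sketch are made up for this example, and
the macro is only a stand-in for pg_attribute_always_inline on
GCC/Clang. Because each call site passes a constant, the compiler is
free to emit one specialized copy of the loop per call and drop the
per-byte is_csv test from each copy. Whether it actually does so depends
on the compiler and optimization level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for pg_attribute_always_inline (GCC/Clang syntax). */
#if defined(__GNUC__) || defined(__clang__)
#define always_inline_sketch inline __attribute__((always_inline))
#else
#define always_inline_sketch inline
#endif

/*
 * Toy scanner: returns the offset of the first "special" byte, or len if
 * there is none.  is_csv is tested on every iteration, which is the kind
 * of branch a constant argument lets the compiler remove.
 */
static always_inline_sketch size_t
scan_line(const char *buf, size_t len, bool is_csv)
{
    assert(buf != NULL);

    for (size_t i = 0; i < len; i++)
    {
        char c = buf[i];

        if (c == '\n' || c == '\r')
            return i;
        if (is_csv ? (c == '"') : (c == '\\'))
            return i;
    }
    return len;
}

/*
 * Dispatcher: both branches look redundant, but each call passes a
 * literal, so each inlined copy of scan_line() sees a constant is_csv.
 */
static size_t
scan_dispatch(const char *buf, size_t len, bool is_csv)
{
    if (is_csv)
        return scan_line(buf, len, true);
    else
        return scan_line(buf, len, false);
}
```

The four-way dispatch above applies the same idea to two booleans at
once, at the cost of four inlined copies of the function body.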

> +                       /* Load a chunk of data into a vector register */
> +                       vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
>
> As mentioned upthread [0], I think it's worth testing whether processing
> multiple vectors worth of data in each loop iteration is worthwhile.
>
> [0] https://postgr.es/m/aSTVOe6BIe5f1l3i%40nathan

Unlike pg_lfind32(), which searches for a single key,
CopyReadLineText() has to match several keys. I am not sure whether I
used multiple vectors correctly, but I attached what I did as 0002;
could you please look at it? I didn't see any performance benefit in my
benchmarks, though.

--
Regards,
Nazir Bilal Yavuz
Microsoft


Attachments:

  [text/x-patch] v7-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (7.8K, 2-v7-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
  download | inline diff:
From c4b29849ad9f87f51022b947a9a0ab695dd1cde2 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Fri, 13 Feb 2026 13:28:55 +0300
Subject: [PATCH v7 1/2] Speed up COPY FROM text/CSV parsing using SIMD

This patch disables SIMD when SIMD encounters a special character which
is neither EOF nor EOL.

Author: Shinya Kato <[email protected]>
Author: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Kazar Ayoub <[email protected]>
Reviewed-by: Nathan Bossart <[email protected]>
Reviewed-by: Neil Conway <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Manni Wood <[email protected]>
Reviewed-by: Mark Wong <[email protected]>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
---
 src/backend/commands/copyfrom.c          |   3 +
 src/backend/commands/copyfromparse.c     | 125 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h |   3 +
 3 files changed, 126 insertions(+), 5 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 25ee20b23db..40dae0bdacc 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1721,6 +1721,9 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
 
+	/* Initialize SIMD */
+	cstate->simd_enabled = true;
+
 	/*
 	 * Allocate buffers for the input pipeline.
 	 *
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 94d6f415a06..4a127d1af90 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 
@@ -141,12 +142,14 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
 static bool CopyReadLine(CopyFromState cstate, bool is_csv);
-static bool CopyReadLineText(CopyFromState cstate, bool is_csv);
 static int	CopyReadAttributesText(CopyFromState cstate);
 static int	CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
 									 Oid typioparam, int32 typmod,
 									 bool *isnull);
+static pg_attribute_always_inline bool CopyReadLineText(CopyFromState cstate,
+														bool is_csv,
+														bool simd_enabled);
 static pg_attribute_always_inline bool CopyFromTextLikeOneRow(CopyFromState cstate,
 															  ExprContext *econtext,
 															  Datum *values,
@@ -1173,8 +1176,14 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 	resetStringInfo(&cstate->line_buf);
 	cstate->line_buf_valid = false;
 
-	/* Parse data and transfer into line_buf */
-	result = CopyReadLineText(cstate, is_csv);
+	/*
+	 * Parse data and transfer into line_buf. To benefit from inlining, call
+	 * CopyReadLineText() with constant boolean arguments.
+	 */
+	if (cstate->simd_enabled)
+		result = CopyReadLineText(cstate, is_csv, true);
+	else
+		result = CopyReadLineText(cstate, is_csv, false);
 
 	if (result)
 	{
@@ -1241,8 +1250,8 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
-static bool
-CopyReadLineText(CopyFromState cstate, bool is_csv)
+static pg_attribute_always_inline bool
+CopyReadLineText(CopyFromState cstate, bool is_csv, bool simd_enabled)
 {
 	char	   *copy_input_buf;
 	int			input_buf_ptr;
@@ -1257,6 +1266,14 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	char		quotec = '\0';
 	char		escapec = '\0';
 
+#ifndef USE_NO_SIMD
+	Vector8		nl = vector8_broadcast('\n');
+	Vector8		cr = vector8_broadcast('\r');
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		quote = vector8_broadcast(0);
+	Vector8		escape = vector8_broadcast(0);
+#endif
+
 	if (is_csv)
 	{
 		quotec = cstate->opts.quote[0];
@@ -1264,6 +1281,12 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* ignore special escape processing if it's the same as quotec */
 		if (quotec == escapec)
 			escapec = '\0';
+
+#ifndef USE_NO_SIMD
+		quote = vector8_broadcast(quotec);
+		if (quotec != escapec)
+			escape = vector8_broadcast(escapec);
+#endif
 	}
 
 	/*
@@ -1330,6 +1353,98 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			need_data = false;
 		}
 
+#ifndef USE_NO_SIMD
+
+		/*
+		 * Use SIMD instructions to efficiently scan the input buffer for
+		 * special characters (e.g., newline, carriage return, quote, and
+		 * escape). This is faster than byte-by-byte iteration, especially on
+		 * large buffers.
+		 *
+		 * We do not apply the SIMD fast path in either of the following
+		 * cases: - When the previously processed character was an escape
+		 * character (last_was_esc), since the next byte must be examined
+		 * sequentially. - When the remaining buffer is smaller than one
+		 * vector width (sizeof(Vector8)), since SIMD operates on fixed-size
+		 * chunks.
+		 *
+		 * Note that, SIMD may become slower when the input contains many
+		 * special characters. To avoid this regression, we disable SIMD for
+		 * the rest of the input once we encounter a special character which
+		 * is neither EOF nor EOL.
+		 */
+		if (simd_enabled && !last_was_esc && copy_buf_len - input_buf_ptr > sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match = vector8_broadcast(0);
+			uint32		mask;
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+			if (is_csv)
+			{
+				/* \n and \r are not special inside quotes */
+				if (!in_quote)
+					match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+
+				match = vector8_or(match, vector8_eq(chunk, quote));
+				if (escapec != '\0')
+					match = vector8_or(match, vector8_eq(chunk, escape));
+			}
+			else
+			{
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				match = vector8_or(match, vector8_eq(chunk, bs));
+			}
+
+			/* Check if we found any special characters */
+			mask = vector8_highbit_mask(match);
+			if (mask != 0)
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				int			advance = pg_rightmost_one_pos32(mask);
+				char		c1,
+							c2;
+				bool		simd_hit_eol,
+							simd_hit_eof;
+
+				input_buf_ptr += advance;
+				c1 = copy_input_buf[input_buf_ptr];
+
+				/*
+				 * Since we stopped within the chunk and ((copy_buf_len -
+				 * input_buf_ptr) > sizeof(Vector8)) is true,
+				 * copy_input_buf[input_buf_ptr + 1] is guaranteed to be
+				 * readable.
+				 */
+				c2 = copy_input_buf[input_buf_ptr + 1];
+				simd_hit_eol = (c1 == '\r' || c1 == '\n') && (!is_csv || !in_quote);
+				simd_hit_eof = c1 == '\\' && c2 == '.' && !is_csv;
+
+				/*
+				 * Do not disable SIMD when we hit EOL or EOF characters. In
+				 * practice, it does not matter for EOF because parsing ends
+				 * there, but we keep the behavior consistent.
+				 */
+				if (!(simd_hit_eof || simd_hit_eol))
+				{
+					simd_enabled = false;
+					cstate->simd_enabled = false;
+				}
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				input_buf_ptr += sizeof(Vector8);
+				continue;
+			}
+		}
+#endif
+
 		/* OK to fetch a character */
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 822ef33cf69..73ce777c52b 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -89,6 +89,9 @@ typedef struct CopyFromStateData
 	const char *cur_attval;		/* current att value for error messages */
 	bool		relname_only;	/* don't output line number, att, etc. */
 
+	/* SIMD variables */
+	bool		simd_enabled;
+
 	/*
 	 * Working state
 	 */
-- 
2.47.3



  [text/x-patch] v7-0002-Use-4-vectors-in-CopyReadLineText-SIMD.patch (6.4K, 3-v7-0002-Use-4-vectors-in-CopyReadLineText-SIMD.patch)
  download | inline diff:
From 2de9b5bc18bfa169b3ba3507b6bdf79d277c0ad4 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Fri, 13 Feb 2026 13:36:34 +0300
Subject: [PATCH v7 2/2] Use 4 vectors in CopyReadLineText() SIMD

---
 src/backend/commands/copyfromparse.c | 116 +++++++++++++++++++++------
 1 file changed, 92 insertions(+), 24 deletions(-)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 4a127d1af90..caadc40cc8b 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1361,6 +1361,9 @@ CopyReadLineText(CopyFromState cstate, bool is_csv, bool simd_enabled)
 		 * escape). This is faster than byte-by-byte iteration, especially on
 		 * large buffers.
 		 *
+		 * For better instruction-level parallelism, we try to process four
+		 * vectors at a time.
+		 *
 		 * We do not apply the SIMD fast path in either of the following
 		 * cases: - When the previously processed character was an escape
 		 * character (last_was_esc), since the next byte must be examined
@@ -1373,53 +1376,118 @@ CopyReadLineText(CopyFromState cstate, bool is_csv, bool simd_enabled)
 		 * the rest of the input once we encounter a special character which
 		 * is neither EOF nor EOL.
 		 */
-		if (simd_enabled && !last_was_esc && copy_buf_len - input_buf_ptr > sizeof(Vector8))
+		if (simd_enabled && !last_was_esc && copy_buf_len - input_buf_ptr >= 4 * sizeof(Vector8))
 		{
-			Vector8		chunk;
-			Vector8		match = vector8_broadcast(0);
-			uint32		mask;
-
-			/* Load a chunk of data into a vector register */
-			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+			Vector8		chunk1,
+						chunk2,
+						chunk3,
+						chunk4;
+			Vector8		match1,
+						match2,
+						match3,
+						match4;
+			Vector8		tmp1,
+						tmp2,
+						result;
+
+			/* Load four chunks of data into vector registers */
+			vector8_load(&chunk1, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+			vector8_load(&chunk2, (const uint8 *) &copy_input_buf[input_buf_ptr + sizeof(Vector8)]);
+			vector8_load(&chunk3, (const uint8 *) &copy_input_buf[input_buf_ptr + 2 * sizeof(Vector8)]);
+			vector8_load(&chunk4, (const uint8 *) &copy_input_buf[input_buf_ptr + 3 * sizeof(Vector8)]);
 
 			if (is_csv)
 			{
 				/* \n and \r are not special inside quotes */
 				if (!in_quote)
-					match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				{
+					match1 = vector8_or(vector8_eq(chunk1, nl), vector8_eq(chunk1, cr));
+					match2 = vector8_or(vector8_eq(chunk2, nl), vector8_eq(chunk2, cr));
+					match3 = vector8_or(vector8_eq(chunk3, nl), vector8_eq(chunk3, cr));
+					match4 = vector8_or(vector8_eq(chunk4, nl), vector8_eq(chunk4, cr));
+				}
+				else
+				{
+					match1 = vector8_broadcast(0);
+					match2 = vector8_broadcast(0);
+					match3 = vector8_broadcast(0);
+					match4 = vector8_broadcast(0);
+				}
 
-				match = vector8_or(match, vector8_eq(chunk, quote));
+				match1 = vector8_or(match1, vector8_eq(chunk1, quote));
+				match2 = vector8_or(match2, vector8_eq(chunk2, quote));
+				match3 = vector8_or(match3, vector8_eq(chunk3, quote));
+				match4 = vector8_or(match4, vector8_eq(chunk4, quote));
 				if (escapec != '\0')
-					match = vector8_or(match, vector8_eq(chunk, escape));
+				{
+					match1 = vector8_or(match1, vector8_eq(chunk1, escape));
+					match2 = vector8_or(match2, vector8_eq(chunk2, escape));
+					match3 = vector8_or(match3, vector8_eq(chunk3, escape));
+					match4 = vector8_or(match4, vector8_eq(chunk4, escape));
+				}
 			}
 			else
 			{
-				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
-				match = vector8_or(match, vector8_eq(chunk, bs));
+				match1 = vector8_or(vector8_eq(chunk1, nl), vector8_eq(chunk1, cr));
+				match2 = vector8_or(vector8_eq(chunk2, nl), vector8_eq(chunk2, cr));
+				match3 = vector8_or(vector8_eq(chunk3, nl), vector8_eq(chunk3, cr));
+				match4 = vector8_or(vector8_eq(chunk4, nl), vector8_eq(chunk4, cr));
+
+				match1 = vector8_or(match1, vector8_eq(chunk1, bs));
+				match2 = vector8_or(match2, vector8_eq(chunk2, bs));
+				match3 = vector8_or(match3, vector8_eq(chunk3, bs));
+				match4 = vector8_or(match4, vector8_eq(chunk4, bs));
 			}
 
-			/* Check if we found any special characters */
-			mask = vector8_highbit_mask(match);
-			if (mask != 0)
+			/* Combine results to check if any chunk has special characters */
+			tmp1 = vector8_or(match1, match2);
+			tmp2 = vector8_or(match3, match4);
+			result = vector8_or(tmp1, tmp2);
+
+			if (vector8_is_highbit_set(result))
 			{
 				/*
-				 * Found a special character. Advance up to that point and let
-				 * the scalar code handle it.
+				 * Found a special character somewhere in the four chunks.
+				 * Identify the first chunk containing it.
 				 */
-				int			advance = pg_rightmost_one_pos32(mask);
+				uint32		mask;
+				int			advance;
 				char		c1,
 							c2;
 				bool		simd_hit_eol,
 							simd_hit_eof;
 
+				mask = vector8_highbit_mask(match1);
+				if (mask == 0)
+				{
+					input_buf_ptr += sizeof(Vector8);
+					mask = vector8_highbit_mask(match2);
+				}
+				if (mask == 0)
+				{
+					input_buf_ptr += sizeof(Vector8);
+					mask = vector8_highbit_mask(match3);
+				}
+				if (mask == 0)
+				{
+					input_buf_ptr += sizeof(Vector8);
+					mask = vector8_highbit_mask(match4);
+				}
+				Assert(mask != 0);
+
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				advance = pg_rightmost_one_pos32(mask);
 				input_buf_ptr += advance;
 				c1 = copy_input_buf[input_buf_ptr];
 
 				/*
-				 * Since we stopped within the chunk and ((copy_buf_len -
-				 * input_buf_ptr) > sizeof(Vector8)) is true,
-				 * copy_input_buf[input_buf_ptr + 1] is guaranteed to be
-				 * readable.
+				 * Since we stopped within the block and ((copy_buf_len -
+				 * input_buf_ptr) >= 4 * sizeof(Vector8)) was true at the
+				 * start, copy_input_buf[input_buf_ptr + 1] is guaranteed to
+				 * be readable.
 				 */
 				c2 = copy_input_buf[input_buf_ptr + 1];
 				simd_hit_eol = (c1 == '\r' || c1 == '\n') && (!is_csv || !in_quote);
@@ -1438,8 +1506,8 @@ CopyReadLineText(CopyFromState cstate, bool is_csv, bool simd_enabled)
 			}
 			else
 			{
-				/* No special characters found, so skip the entire chunk */
-				input_buf_ptr += sizeof(Vector8);
+				/* No special characters found, so skip the entire block */
+				input_buf_ptr += 4 * sizeof(Vector8);
 				continue;
 			}
 		}
-- 
2.47.3



^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-13 23:09  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 2 replies; 21+ messages in thread

From: Nathan Bossart @ 2026-02-13 23:09 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Fri, Feb 13, 2026 at 02:45:30PM +0300, Nazir Bilal Yavuz wrote:
> Also, if I change this code to:
> 
>     if (cstate->simd_enabled)
>     {
>         if (is_csv)
>             result = CopyReadLineText(cstate, true, true);
>         else
>             result = CopyReadLineText(cstate, false, true);
>     }
>     else
>     {
>         if (is_csv)
>             result = CopyReadLineText(cstate, true, false);
>         else
>             result = CopyReadLineText(cstate, false, false);
>     }
> 
> then I see ~%5 performance improvement in scalar path compared to master.

Hm.  What difference do you see if you just do

	if (is_csv)
		result = CopyReadLineText(cstate, true);
	else
		result = CopyReadLineText(cstate, false);

both with and without the SIMD stuff?  IIUC this is allowing the compiler
to remove several branches in CopyReadLineText(), which might be a nice
improvement on its own.  That being said, I'm less convinced that adding a
simd_enabled parameter to CopyReadLineText() helps, because 1) it's
involved in fewer branches and 2) we change it within the function, so the
compiler can't remove the branches, anyway.  But perhaps I'm missing
something.

Some other random thoughts:

+                    match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));

+                match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));

Since \n and \r are well below "normal" ASCII values, I wonder if we could
simplify these to something like

	match = vector8_gt(... vector with all lanes set to \r + 1 ..., chunk);
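
One thing to keep in mind with a range comparison is that it matches
more bytes than the exact test does. The scalar sketch below (plain C,
no intrinsics; is_eol_exact, is_eol_range and range_false_positives are
hypothetical names for this example) lists every byte value that
"c < '\r' + 1" flags but "c == '\n' || c == '\r'" does not. Tab (0x09)
is among them, and tab is the default delimiter in text format, so the
scalar path would presumably have to cope with those extra hits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Exact test, as in the patch: the byte is \n or \r. */
static bool
is_eol_exact(unsigned char c)
{
    return c == '\n' || c == '\r';
}

/* Range test, as suggested: the byte is below '\r' + 1 (unsigned compare). */
static bool
is_eol_range(unsigned char c)
{
    return c < (unsigned char) ('\r' + 1);
}

/*
 * Store in out[] every byte value that the range test flags but the
 * exact test does not, in ascending order, and return how many there are.
 */
static size_t
range_false_positives(unsigned char *out, size_t out_size)
{
    size_t n = 0;

    for (int c = 0; c < 256; c++)
    {
        if (is_eol_range((unsigned char) c) &&
            !is_eol_exact((unsigned char) c))
        {
            assert(n < out_size);
            out[n++] = (unsigned char) c;
        }
    }
    return n;
}
```

With ASCII values, the extra matches are bytes 0x00 through 0x09 plus
0x0B and 0x0C.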

+            /* Check if we found any special characters */
+            mask = vector8_highbit_mask(match);
+            if (mask != 0)

vector8_highbit_mask() is somewhat expensive on AArch64, so I wonder if
waiting until we enter the "if" block to calculate it has any benefit.
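
A sketch of that ordering, written in portable C instead of real
intrinsics so it runs anywhere. CHUNK, chunk_any, chunk_highbit_mask and
first_newline are hypothetical names; chunk_any plays the role of a
cheap "is any lane set?" test such as vector8_is_highbit_set(), and
chunk_highbit_mask plays the role of vector8_highbit_mask(). The
position mask is only built once a hit is known, so the common no-match
case never pays for it. Whether this helps in practice would need
benchmarking.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK 16                /* stand-in for sizeof(Vector8) */

/* Cheap test: is the high bit set in any lane of the match vector? */
static int
chunk_any(const uint8_t *match)
{
    uint8_t acc = 0;

    for (int i = 0; i < CHUNK; i++)
        acc |= match[i];
    return (acc & 0x80) != 0;
}

/* Costlier step: collect the high bit of each lane into one bitmask. */
static uint32_t
chunk_highbit_mask(const uint8_t *match)
{
    uint32_t mask = 0;

    for (int i = 0; i < CHUNK; i++)
        mask |= (uint32_t) (match[i] >> 7) << i;
    return mask;
}

/* Return the offset of the first '\n' in a CHUNK-byte block, or CHUNK. */
static int
first_newline(const uint8_t *chunk)
{
    uint8_t  match[CHUNK];
    uint32_t mask;
    int      pos = 0;

    /* Lane-wise equality: 0xFF where the byte matches, 0x00 elsewhere. */
    for (int i = 0; i < CHUNK; i++)
        match[i] = (chunk[i] == '\n') ? 0xFF : 0x00;

    /* Common case: nothing matched, so the mask is never computed. */
    if (!chunk_any(match))
        return CHUNK;

    /* Only now build the mask and locate the lowest set bit. */
    mask = chunk_highbit_mask(match);
    while ((mask & 1) == 0)
    {
        mask >>= 1;
        pos++;
    }
    return pos;
}
```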

+                simd_hit_eol = (c1 == '\r' || c1 == '\n') && (!is_csv || !in_quote);

If (is_csv && in_quote), we shouldn't have picked up \r or \n in the first
place, right?

+                simd_hit_eof = c1 == '\\' && c2 == '.' && !is_csv;
+
+                /*
+                 * Do not disable SIMD when we hit EOL or EOF characters. In
+                 * practice, it does not matter for EOF because parsing ends
+                 * there, but we keep the behavior consistent.
+                 */
+                if (!(simd_hit_eof || simd_hit_eol))

I'd think that doing less unnecessary work would outweigh the benefits of
consistency for the EOF case.

-- 
nathan






^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-14 03:34  Manni Wood <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 1 reply; 21+ messages in thread

From: Manni Wood @ 2026-02-14 03:34 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hello!

I ran some COPY FROM tests using master and then Nazir's v7-0001 and
v7-0002 patches applied to master.

x86 master
TXT :                 29222.524250 ms
CSV :                 36162.588500 ms
TXT with 1/3 escapes: 32922.649750 ms
CSV with 1/3 quotes:  47631.423750 ms

x86 v7-0001
TXT :                 23247.834250 ms  20.445496% improvement
CSV :                 23162.711750 ms  35.948413% improvement
TXT with 1/3 escapes: 31786.386000 ms  3.451313% improvement
CSV with 1/3 quotes:  43330.475500 ms  9.029645% improvement

x86 v7-0002
TXT :                 22394.812500 ms  23.364552% improvement
CSV :                 22374.645750 ms  38.127643% improvement
TXT with 1/3 escapes: 32378.929750 ms  1.651507% improvement
CSV with 1/3 quotes:  47139.171750 ms  1.033461% improvement

arm master
TXT :                 9448.900500 ms
CSV :                 11135.871500 ms
TXT with 1/3 escapes: 10786.418750 ms
CSV with 1/3 quotes:  14115.335500 ms

arm v7-0001
TXT :                 7271.170500 ms  23.047443% improvement
CSV :                 7259.866750 ms  34.806479% improvement
TXT with 1/3 escapes: 10894.445500 ms  -1.001507% regression
CSV with 1/3 quotes:  13398.444000 ms  5.078813% improvement

arm v7-0002
TXT :                 7165.707250 ms  24.163587% improvement
CSV :                 7140.497250 ms  35.878416% improvement
TXT with 1/3 escapes: 10308.782250 ms  4.428129% improvement
CSV with 1/3 quotes:  12576.179500 ms  10.904140% improvement

v7-0001 + v7-0002 applied to master certainly seems promising: nice to see
speed improvements across the board on both x86 and arm!
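
For anyone reproducing these numbers: the percentages appear to be
computed as (master - patched) / master * 100. That formula is inferred
from the figures above rather than stated anywhere, and improvement_pct
is a hypothetical helper name. A negative result corresponds to the rows
labelled "regression".

```c
#include <assert.h>

/*
 * Percentage improvement of a patched run over master, from execution
 * times in milliseconds.  Positive means the patched build was faster.
 */
static double
improvement_pct(double master_ms, double patched_ms)
{
    assert(master_ms > 0.0);
    return (master_ms - patched_ms) / master_ms * 100.0;
}
```

For example, improvement_pct(29222.524250, 23247.834250) gives roughly
20.4455, in line with the x86 v7-0001 TXT row.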

On Fri, Feb 13, 2026 at 5:09 PM Nathan Bossart <[email protected]>
wrote:

> On Fri, Feb 13, 2026 at 02:45:30PM +0300, Nazir Bilal Yavuz wrote:
> > Also, if I change this code to:
> >
> >     if (cstate->simd_enabled)
> >     {
> >         if (is_csv)
> >             result = CopyReadLineText(cstate, true, true);
> >         else
> >             result = CopyReadLineText(cstate, false, true);
> >     }
> >     else
> >     {
> >         if (is_csv)
> >             result = CopyReadLineText(cstate, true, false);
> >         else
> >             result = CopyReadLineText(cstate, false, false);
> >     }
> >
> > then I see ~%5 performance improvement in scalar path compared to master.
>
> Hm.  What difference do you see if you just do
>
>         if (is_csv)
>                 result = CopyReadLineText(cstate, true);
>         else
>                 result = CopyReadLineText(cstate, false);
>
> both with and without the SIMD stuff?  IIUC this is allowing the compiler
> to remove several branches in CopyReadLineText(), which might be a nice
> improvement on its own.  That being said, I'm less convinced that adding a
> simd_enabled parameter to CopyReadLineText() helps, because 1) it's
> involved in fewer branches and 2) we change it within the function, so the
> compiler can't remove the branches, anyway.  But perhaps I'm missing
> something.
>
> Some other random thoughts:
>
> +                    match = vector8_or(vector8_eq(chunk, nl),
> vector8_eq(chunk, cr));
>
> +                match = vector8_or(vector8_eq(chunk, nl),
> vector8_eq(chunk, cr));
>
> Since \n and \r are well below "normal" ASCII values, I wonder if we could
> simplify these to something like
>
>         match = vector8_gt(... vector with all lanes set to \r + 1 ...,
> chunk);
>
> +            /* Check if we found any special characters */
> +            mask = vector8_highbit_mask(match);
> +            if (mask != 0)
>
> vector8_highbit_mask() is somewhat expensive on AArch64, so I wonder if
> waiting until we enter the "if" block to calculate it has any benefit.
>
> +                simd_hit_eol = (c1 == '\r' || c1 == '\n') && (!is_csv ||
> !in_quote);
>
> If (is_csv && in_quote), we shouldn't have picked up \r or \n in the first
> place, right?
>
> +                simd_hit_eof = c1 == '\\' && c2 == '.' && !is_csv;
> +
> +                /*
> +                 * Do not disable SIMD when we hit EOL or EOF characters.
> In
> +                 * practice, it does not matter for EOF because parsing
> ends
> +                 * there, but we keep the behavior consistent.
> +                 */
> +                if (!(simd_hit_eof || simd_hit_eol))
>
> I'd think that doing less unnecessary work would outweigh the benefits of
> consistency for the EOF case.
>
> --
> nathan
>


-- 
-- Manni Wood EDB: https://www.enterprisedb.com


^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-16 17:04  Nathan Bossart <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Nathan Bossart @ 2026-02-16 17:04 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Fri, Feb 13, 2026 at 09:34:13PM -0600, Manni Wood wrote:
> v7-0001 + v7-0002 applied to master certainly seems promising: nice to see
> speed improvements across the board on both x86 and arm!

Thanks for testing.  Based on these results, I think we can abandon 0002,
at least for now.

-- 
nathan






^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-16 18:15  Nathan Bossart <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Nathan Bossart @ 2026-02-16 18:15 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Mon, Feb 16, 2026 at 11:04:58AM -0600, Nathan Bossart wrote:
> On Fri, Feb 13, 2026 at 09:34:13PM -0600, Manni Wood wrote:
>> v7-0001 + v7-0002 applied to master certainly seems promising: nice to see
>> speed improvements across the board on both x86 and arm!
> 
> Thanks for testing.  Based on these results, I think we can abandon 0002,
> at least for now.

Have you tested small rows, i.e., less than 16 bytes per row?  I'm
wondering if that regresses at all.

-- 
nathan






^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-17 05:01  Manni Wood <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Manni Wood @ 2026-02-17 05:01 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Mon, Feb 16, 2026 at 12:15 PM Nathan Bossart <[email protected]>
wrote:

> On Mon, Feb 16, 2026 at 11:04:58AM -0600, Nathan Bossart wrote:
> > On Fri, Feb 13, 2026 at 09:34:13PM -0600, Manni Wood wrote:
> >> v7-0001 + v7-0002 applied to master certainly seems promising: nice to
> see
> >> speed improvements across the board on both x86 and arm!
> >
> > Thanks for testing.  Based on these results, I think we can abandon 0002,
> > at least for now.
>
> Have you tested small rows, i.e., less than 16 bytes per row?  I'm
> wondering if that regresses at all.
>
> --
> nathan
>

I ran some tests using narrow rows that look like this:

$ head t_none.txt
BB      AA
BB      AA
BB      AA

$ head t_none.csv
BB,AA
BB,AA
BB,AA

$ head t_escape.txt
B\\B    A\\A
B\\B    A\\A
B\\B    A\\A

$ head t_quote.csv
"B""B","A""A"
"B""B","A""A"
"B""B","A""A"

Here are the results on my x86 tower and my arm raspberry pi 5:

x86 NARROW master copy from
TXT :                 2477.022500 ms
CSV :                 2825.095500 ms
TXT with 1/3 escapes: 2620.575000 ms
CSV with 1/3 quotes:  3249.058750 ms

x86 NARROW v70001 copy from
TXT :                 2475.659000 ms  0.055046% improvement
CSV :                 2421.976750 ms  14.269208% improvement
TXT with 1/3 escapes: 2660.953750 ms  -1.540836% regression
CSV with 1/3 quotes:  3255.546750 ms  -0.199689% regression

x86 NARROW v70002 copy from
TXT :                 2481.372250 ms  -0.175604% regression
CSV :                 2437.541250 ms  13.718271% improvement
TXT with 1/3 escapes: 2646.300000 ms  -0.981655% regression
CSV with 1/3 quotes:  3202.014500 ms  1.447935% improvement


arm NARROW master copy from
TXT :                 2294.270500 ms
CSV :                 2085.839000 ms
TXT with 1/3 escapes: 2467.966000 ms
CSV with 1/3 quotes:  2485.533000 ms

arm NARROW v70001 copy from
TXT :                 1982.497500 ms  13.589200% improvement
CSV :                 2005.829500 ms  3.835843% improvement
TXT with 1/3 escapes: 2111.778250 ms  14.432442% improvement
CSV with 1/3 quotes:  2441.370000 ms  1.776802% improvement

arm NARROW v70002 copy from
TXT :                 1975.982250 ms  13.873179% improvement
CSV :                 2022.744000 ms  3.024922% improvement
TXT with 1/3 escapes: 2080.273000 ms  15.709009% improvement
CSV with 1/3 quotes:  2476.819000 ms  0.350589% improvement
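
For anyone reproducing these numbers: the percentages above are
consistent with (master - patched) / master * 100, so a positive value
means the patched build was faster. A minimal sketch of that arithmetic
follows; the function name pct_improvement is made up for illustration
and is not taken from the benchmark script.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the benchmark script): relative change
 * of a patched timing against master, in percent.  A positive result means
 * the patched build was faster than master.
 */
static double
pct_improvement(double master_ms, double patched_ms)
{
	return (master_ms - patched_ms) / master_ms * 100.0;
}

/* Print one result line in roughly the layout used in this thread. */
static void
report(const char *label, double master_ms, double patched_ms)
{
	double		pct = pct_improvement(master_ms, patched_ms);

	printf("%-22s %f ms  %f%% %s\n", label, patched_ms, pct,
		   pct >= 0 ? "improvement" : "regression");
}
```

For example, report("CSV :", 2825.095500, 2421.976750) uses the x86
NARROW CSV timings from above.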

Hope this helps!
-- 
-- Manni Wood EDB: https://www.enterprisedb.com



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-17 22:29  Nathan Bossart <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 0 replies; 21+ messages in thread

From: Nathan Bossart @ 2026-02-17 22:29 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Mon, Feb 16, 2026 at 11:01:21PM -0600, Manni Wood wrote:
> Here are the results on my x86 tower and my arm raspberry pi 5:
> 
> x86 NARROW master copy from
> TXT :                 2477.022500 ms
> CSV :                 2825.095500 ms
> TXT with 1/3 escapes: 2620.575000 ms
> CSV with 1/3 quotes:  3249.058750 ms
> 
> x86 NARROW v7-0001 copy from
> TXT :                 2475.659000 ms  0.055046% improvement
> CSV :                 2421.976750 ms  14.269208% improvement
> TXT with 1/3 escapes: 2660.953750 ms  -1.540836% regression
> CSV with 1/3 quotes:  3255.546750 ms  -0.199689% regression
> 
> x86 NARROW v7-0002 copy from
> TXT :                 2481.372250 ms  -0.175604% regression
> CSV :                 2437.541250 ms  13.718271% improvement
> TXT with 1/3 escapes: 2646.300000 ms  -0.981655% regression
> CSV with 1/3 quotes:  3202.014500 ms  1.447935% improvement
> 
> 
> arm NARROW master copy from
> TXT :                 2294.270500 ms
> CSV :                 2085.839000 ms
> TXT with 1/3 escapes: 2467.966000 ms
> CSV with 1/3 quotes:  2485.533000 ms
> 
> arm NARROW v7-0001 copy from
> TXT :                 1982.497500 ms  13.589200% improvement
> CSV :                 2005.829500 ms  3.835843% improvement
> TXT with 1/3 escapes: 2111.778250 ms  14.432442% improvement
> CSV with 1/3 quotes:  2441.370000 ms  1.776802% improvement
> 
> arm NARROW v7-0002 copy from
> TXT :                 1975.982250 ms  13.873179% improvement
> CSV :                 2022.744000 ms  3.024922% improvement
> TXT with 1/3 escapes: 2080.273000 ms  15.709009% improvement
> CSV with 1/3 quotes:  2476.819000 ms  0.350589% improvement

Thanks.  I don't see anything too terribly concerning here, but we should
still plan to redo these tests once the patch is ready for commit.

-- 
nathan







* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-18 13:38  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 1 reply; 21+ messages in thread

From: Nazir Bilal Yavuz @ 2026-02-18 13:38 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Sat, 14 Feb 2026 at 02:09, Nathan Bossart <[email protected]> wrote:
>
> On Fri, Feb 13, 2026 at 02:45:30PM +0300, Nazir Bilal Yavuz wrote:
> > Also, if I change this code to:
> >
> >     if (cstate->simd_enabled)
> >     {
> >         if (is_csv)
> >             result = CopyReadLineText(cstate, true, true);
> >         else
> >             result = CopyReadLineText(cstate, false, true);
> >     }
> >     else
> >     {
> >         if (is_csv)
> >             result = CopyReadLineText(cstate, true, false);
> >         else
> >             result = CopyReadLineText(cstate, false, false);
> >     }
> >
> > then I see a ~5% performance improvement in the scalar path compared to master.
>
> Hm.  What difference do you see if you just do
>
>         if (is_csv)
>                 result = CopyReadLineText(cstate, true);
>         else
>                 result = CopyReadLineText(cstate, false);
>
> both with and without the SIMD stuff?  IIUC this is allowing the compiler
> to remove several branches in CopyReadLineText(), which might be a nice
> improvement on its own.  That being said, I'm less convinced that adding a
> simd_enabled parameter to CopyReadLineText() helps, because 1) it's
> involved in fewer branches and 2) we change it within the function, so the
> compiler can't remove the branches, anyway.  But perhaps I'm missing
> something.

I ran a couple of benchmarks. Some notes about them:

- Benchmarks show percentage comparisons of timings against the master
branch. Positive values mean a regression, while negative values mean
an improvement.

- Each input has 200000 lines in total, and each line is 8192 bytes.

- For the columns, none means there are no special characters. The other
numbers represent the ratio of normal characters to special
characters. For example, 0 means all the data is special characters, 4
means 25% of the data is special characters, 16 means 1/16 of the data
is special characters, and so on.
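
To make the column meaning concrete, here is a rough sketch of how one
input line with a given ratio could be filled. This is an assumption
for illustration only, not the script used to produce the numbers
below; fill_line and its parameters are hypothetical names. Real input
also has to stay valid COPY data (escapes and quotes come in pairs, for
instance), which this sketch ignores.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical generator, not the one used for this thread's benchmarks.
 * Fills buf[0..len) with 'A', placing `special` at every ratio-th position.
 * ratio < 0 stands for the "none" column (no special characters), and
 * ratio == 0 makes every byte special.  Returns the number of special
 * bytes written.
 */
static size_t
fill_line(char *buf, size_t len, int ratio, char special)
{
	size_t		nspecial = 0;

	for (size_t i = 0; i < len; i++)
	{
		bool		is_special;

		if (ratio < 0)
			is_special = false;
		else if (ratio == 0)
			is_special = true;
		else
			is_special = ((i + 1) % (size_t) ratio == 0);

		buf[i] = is_special ? special : 'A';
		if (is_special)
			nspecial++;
	}
	return nspecial;
}
```

With an 8192-byte line, a ratio of 4 gives 2048 special bytes (25%) and
a ratio of 16 gives 512 (1/16).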

--------------------

This is the benchmark without the SIMD stuff:

- only_inline: Only change is CopyReadLineText() being inlined.

- is_csv_verbose-wo-inline: is_csv is sent as a constant boolean like
you suggested but CopyReadLineText() isn't inlined.

- is_csv_verbose-w-inline: is_csv is sent as a constant boolean and
CopyReadLineText() is inlined.

+-------------------------------+-------+------+------+-------+-------+------+
| TEXT                          |  none |   0  |   4  |   8   |   16  |  32  |
+-------------------------------+-------+------+------+-------+-------+------+
| only_inline                   |   0   | +1.6 |   0  |   0   |   -1  |   0  |
+-------------------------------+-------+------+------+-------+-------+------+
| is_csv_verbose-wo-inline      |   0   |   0  |   0  |   0   |   0   |   0  |
+-------------------------------+-------+------+------+-------+-------+------+
| is_csv_verbose-w-inline       |  -3.5 |   0  | -7.7 |  -2.3 |  -4.1 | -3.4 |
+-------------------------------+-------+------+------+-------+-------+------+
|                               |       |      |      |       |       |      |
+-------------------------------+-------+------+------+-------+-------+------+
|                               |       |      |      |       |       |      |
+-------------------------------+-------+------+------+-------+-------+------+
| CSV                           |  none |   0  |   4  |   8   |   16  |  32  |
+-------------------------------+-------+------+------+-------+-------+------+
| only_inline                   |  -1.1 |   0  |   0  |   0   |   0   |  -1  |
+-------------------------------+-------+------+------+-------+-------+------+
| is_csv_verbose-wo-inline      |   0   |   0  |   0  |   0   |  -0.3 |   0  |
+-------------------------------+-------+------+------+-------+-------+------+
| is_csv_verbose-w-inline       | -4    | -2.3 | -1.2 | -1.8  | -2.8  | -2.7 |
+-------------------------------+-------+------+------+-------+-------+------+

Looking at the benchmark results,

        if (is_csv)
                result = CopyReadLineText(cstate, true);
        else
                result = CopyReadLineText(cstate, false);

with CopyReadLineText() inlined helps performance even without SIMD.
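
As a standalone illustration of why this can help, here is a toy
version of the pattern. It is not the real CopyReadLineText(); the
names scan_line, find_line_end and demo_always_inline are made up, and
the macro is only a stand-in for pg_attribute_always_inline on
GCC/Clang. The idea is that each call site passes a literal constant,
so after forced inlining the compiler can fold the is_csv tests in each
copy.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for pg_attribute_always_inline; an assumption for this sketch. */
#if defined(__GNUC__)
#define demo_always_inline __attribute__((always_inline)) inline
#else
#define demo_always_inline inline
#endif

/*
 * Toy line scanner.  Returns the offset of the first newline that ends a
 * line, or len if there is none.  In CSV mode, newlines inside double
 * quotes do not end the line.
 */
static demo_always_inline size_t
scan_line(const char *buf, size_t len, bool is_csv)
{
	bool		in_quote = false;

	for (size_t i = 0; i < len; i++)
	{
		char		c = buf[i];

		/* When is_csv is the constant false, this test can be dropped. */
		if (is_csv && c == '"')
			in_quote = !in_quote;

		if (c == '\n' && !in_quote)
			return i;
	}
	return len;
}

/* The caller branches once and hands each inlined copy a constant. */
static size_t
find_line_end(const char *buf, size_t len, bool is_csv)
{
	if (is_csv)
		return scan_line(buf, len, true);
	else
		return scan_line(buf, len, false);
}
```

Whether a given compiler actually removes the branches has to be
checked in the generated code; the benchmark above is the evidence for
the real function.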

----------------------------------------

This is the same benchmark with SIMD stuff:

only_inline: Only change is CopyReadLineText() being inlined, there is
no simd_enabled argument in the CopyReadLineText().

is_csv_verbose-w-inline: is_csv is sent as a constant boolean and
CopyReadLineText() is inlined. There is no simd_enabled argument in
the CopyReadLineText().

simd_enabled_verbose-w-inline: simd_enabled is sent as a constant
boolean and CopyReadLineText() is inlined.

both_verbose_w_inline: both is_csv and simd_enabled are sent as a
constant boolean and CopyReadLineText() is inlined.

+-------------------------------+-------+------+------+-------+-------+------+
| TEXT                          |  none |   0  |   4  |   8   |   16  |  32  |
+-------------------------------+-------+------+------+-------+-------+------+
| only_inline                   |  -11  | +9.7 | +9.1 | +11.4 | +14.8 |  +8  |
+-------------------------------+-------+------+------+-------+-------+------+
| is_csv_verbose-wo-inline      | -11.9 | +4.5 | +2.4 |  +3.5 |  +2.2 | +1.8 |
+-------------------------------+-------+------+------+-------+-------+------+
| is_csv_verbose-w-inline       | -12.6 |   0  | -2.4 |  +2.8 |  +1.6 |   0  |
+-------------------------------+-------+------+------+-------+-------+------+
| both_verbose_w_inline         | -12.1 |   0  |  -5  |   0   |  +2.5 | -1.8 |
+-------------------------------+-------+------+------+-------+-------+------+
|                               |       |      |      |       |       |      |
+-------------------------------+-------+------+------+-------+-------+------+
| CSV                           |  none |   0  |   4  |   8   |   16  |  32  |
+-------------------------------+-------+------+------+-------+-------+------+
| only_inline                   | -22.6 | +4.2 | +2.1 |  +2.6 |   0   | +2.2 |
+-------------------------------+-------+------+------+-------+-------+------+
| is_csv_verbose-w-inline       | -22.5 | -2.1 | -3.4 |  -3.9 |  -6.4 | -3.4 |
+-------------------------------+-------+------+------+-------+-------+------+
| simd_enabled_verbose-w-inline | -23   | 0    | -1.9 | -2.2  | -4.8  | -1.6 |
+-------------------------------+-------+------+------+-------+-------+------+
| both_verbose_w_inline         | -23.3 | -2.9 | -5   | -4.5  | -7.1  | -4.3 |
+-------------------------------+-------+------+------+-------+-------+------+

Based on these results, having both is_csv and simd_enabled as
arguments and passing them as constant booleans helps the most.
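
To illustrate the two-flag variant and the concern that simd_enabled is
changed inside the function, here is a toy sketch. It is not the patch
code: scan_line2, find_line_end2, demo_always_inline and CHUNK are
hypothetical, and memchr() over a fixed-size chunk merely stands in for
a vector comparison. In the copy where simd_enabled is the constant
false, the flag is never set back to true, so the compiler should be
able to treat the chunked path as dead code; in the copy where it
starts as true, the branch has to stay.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for pg_attribute_always_inline; an assumption for this sketch. */
#if defined(__GNUC__)
#define demo_always_inline __attribute__((always_inline)) inline
#else
#define demo_always_inline inline
#endif

#define CHUNK 16				/* pretend vector width */

static demo_always_inline size_t
scan_line2(const char *buf, size_t len, bool is_csv, bool simd_enabled)
{
	bool		in_quote = false;
	size_t		i = 0;

	while (i < len)
	{
		char		c;

		/* Chunked fast path: skip CHUNK bytes with nothing special. */
		if (simd_enabled && !in_quote && len - i >= CHUNK)
		{
			const char *p = buf + i;

			if (memchr(p, '\n', CHUNK) == NULL &&
				(!is_csv || memchr(p, '"', CHUNK) == NULL))
			{
				i += CHUNK;
				continue;
			}
			/* Special character nearby: stay scalar for the rest. */
			simd_enabled = false;
		}

		c = buf[i];
		if (is_csv && c == '"')
			in_quote = !in_quote;
		if (c == '\n' && !in_quote)
			return i;
		i++;
	}
	return len;
}

/* Four call sites, each passing literal constants for both flags. */
static size_t
find_line_end2(const char *buf, size_t len, bool is_csv, bool simd_enabled)
{
	if (simd_enabled)
	{
		if (is_csv)
			return scan_line2(buf, len, true, true);
		else
			return scan_line2(buf, len, false, true);
	}
	else
	{
		if (is_csv)
			return scan_line2(buf, len, true, false);
		else
			return scan_line2(buf, len, false, false);
	}
}
```

Both settings of simd_enabled return the same offsets here; only the
amount of work per byte differs.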

>
> Some other random thoughts:

I am sending the benchmark results for now. I haven't tested other
suggestions yet. I will follow up with another email once I have
tested them.

--
Regards,
Nazir Bilal Yavuz
Microsoft







* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-18 21:26  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Nathan Bossart @ 2026-02-18 21:26 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Feb 18, 2026 at 04:38:07PM +0300, Nazir Bilal Yavuz wrote:
> Based on these results, having both is_csv and simd_enabled as
> arguments and passing them as constant booleans helps the most.

Thanks for doing these tests.  ISTM we might as well get this initial
inlining stuff committed.  Thoughts?

-- 
nathan

From 516176894a682e8bd4cb32202e3726f6621ba44d Mon Sep 17 00:00:00 2001
From: Nathan Bossart <[email protected]>
Date: Wed, 18 Feb 2026 14:55:39 -0600
Subject: [PATCH v8 1/1] Speedup COPY FROM with additional function inlining.

Following the example set by commit 58a359e585, we can squeeze out
a little more performance from COPY FROM (FORMAT {text,csv}) by
forcing CopyReadLineText() to be inlined and by passing the is_csv
parameter as a constant.  This allows the compiler to emit
specialized code with the known-const false comparisons and
subsequent branches removed.

Author: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Ayoub Kazar <[email protected]>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
---
 src/backend/commands/copyfromparse.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 94d6f415a06..0aa549837b5 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -141,7 +141,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
 static bool CopyReadLine(CopyFromState cstate, bool is_csv);
-static bool CopyReadLineText(CopyFromState cstate, bool is_csv);
+static pg_attribute_always_inline bool CopyReadLineText(CopyFromState cstate,
+														bool is_csv);
 static int	CopyReadAttributesText(CopyFromState cstate);
 static int	CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -1173,8 +1174,18 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 	resetStringInfo(&cstate->line_buf);
 	cstate->line_buf_valid = false;
 
-	/* Parse data and transfer into line_buf */
-	result = CopyReadLineText(cstate, is_csv);
+	/*
+	 * Parse data and transfer into line_buf.
+	 *
+	 * Because this is performance critical, we inline CopyReadLineText() and
+	 * pass the boolean parameters as constants to allow the compiler to emit
+	 * specialized code with the known-const false comparisons and subsequent
+	 * branches removed.
+	 */
+	if (is_csv)
+		result = CopyReadLineText(cstate, true);
+	else
+		result = CopyReadLineText(cstate, false);
 
 	if (result)
 	{
@@ -1241,7 +1252,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
-static bool
+static pg_attribute_always_inline bool
 CopyReadLineText(CopyFromState cstate, bool is_csv)
 {
 	char	   *copy_input_buf;
-- 
2.50.1 (Apple Git-155)



Attachments:

  [text/plain] v8-0001-Speedup-COPY-FROM-with-additional-function-inlini.patch (2.6K, 2-v8-0001-Speedup-COPY-FROM-with-additional-function-inlini.patch)

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-18 21:54  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Nazir Bilal Yavuz @ 2026-02-18 21:54 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Thu, 19 Feb 2026 at 00:26, Nathan Bossart <[email protected]> wrote:
>
> On Wed, Feb 18, 2026 at 04:38:07PM +0300, Nazir Bilal Yavuz wrote:
> > Based on these results, having both is_csv and simd_enabled as
> > arguments and passing them as constant booleans helps the most.
>
> Thanks for doing these tests.  ISTM we might as well get this initial
> inlining stuff committed.  Thoughts?

nitpick:

-static bool CopyReadLineText(CopyFromState cstate, bool is_csv);
+static pg_attribute_always_inline bool CopyReadLineText(CopyFromState cstate,
+                                                        bool is_csv);

Do we want to move the new CopyReadLineText() declaration below to
group it with the other functions marked pg_attribute_always_inline?

Other than that, LGTM. I think it makes sense to separately commit this.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft







* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-19 04:02  Manni Wood <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Manni Wood @ 2026-02-19 04:02 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Nathan Bossart <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Feb 18, 2026 at 3:54 PM Nazir Bilal Yavuz <[email protected]>
wrote:

> Hi,
>
> On Thu, 19 Feb 2026 at 00:26, Nathan Bossart <[email protected]>
> wrote:
> >
> > On Wed, Feb 18, 2026 at 04:38:07PM +0300, Nazir Bilal Yavuz wrote:
> > > Based on these results, having both is_csv and simd_enabled as
> > > arguments and passing them as constant booleans helps the most.
> >
> > Thanks for doing these tests.  ISTM we might as well get this initial
> > inlining stuff committed.  Thoughts?
>
> nitpick:
>
> -static bool CopyReadLineText(CopyFromState cstate, bool is_csv);
> +static pg_attribute_always_inline bool CopyReadLineText(CopyFromState cstate,
> +                                                        bool is_csv);
>
> Do we want to move the new CopyReadLineText() declaration below to
> group it with the other functions marked pg_attribute_always_inline?
>
> Other than that, LGTM. I think it makes sense to separately commit this.
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft
>

Hello.

I took some time tonight to apply v8 to the latest master (759b03b2) on my
x86 tower and arm raspberry pi 5.

Here are the results, using both narrow columns and the wider columns we've
been using throughout:

x86 master NARROW
TXT :                 2587.642000 ms
CSV :                 2621.759000 ms
TXT with 1/3 escapes: 2707.933500 ms
CSV with 1/3 quotes:  3254.896500 ms

x86 v8 NARROW
TXT :                 2488.655250 ms  3.825365% improvement
CSV :                 2628.818000 ms  -0.269247% regression
TXT with 1/3 escapes: 2615.522000 ms  3.412621% improvement
CSV with 1/3 quotes:  3446.368000 ms  -5.882568% regression

x86 master WIDE
TXT :                 30583.229500 ms
CSV :                 35054.533500 ms
TXT with 1/3 escapes: 32767.421500 ms
CSV with 1/3 quotes:  44214.163500 ms

x86 v8 WIDE
TXT :                 26527.494250 ms  13.261305% improvement
CSV :                 33364.443750 ms  4.821316% improvement
TXT with 1/3 escapes: 29320.648000 ms  10.518904% improvement
CSV with 1/3 quotes:  42334.074750 ms  4.252232% improvement



arm master NARROW
TXT :                 1999.401000 ms
CSV :                 2081.610750 ms
TXT with 1/3 escapes: 2053.230250 ms
CSV with 1/3 quotes:  2431.608750 ms

arm v8 NARROW
TXT :                 1981.663750 ms  0.887128% improvement
CSV :                 2023.892500 ms  2.772769% improvement
TXT with 1/3 escapes: 2004.215250 ms  2.387214% improvement
CSV with 1/3 quotes:  2616.872750 ms  -7.618989% regression

arm master WIDE
TXT :                 9120.731750 ms
CSV :                 11114.478250 ms
TXT with 1/3 escapes: 10338.124500 ms
CSV with 1/3 quotes:  13404.430250 ms

arm v8 WIDE
TXT :                 8430.090750 ms  7.572210% improvement
CSV :                 10115.135500 ms  8.991360% improvement
TXT with 1/3 escapes: 9624.383500 ms  6.903970% improvement
CSV with 1/3 quotes:  12331.714000 ms  8.002699% improvement


-- 
-- Manni Wood EDB: https://www.enterprisedb.com



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-19 09:01  Nazir Bilal Yavuz <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Nazir Bilal Yavuz @ 2026-02-19 09:01 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: Nathan Bossart <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Thu, 19 Feb 2026 at 07:02, Manni Wood <[email protected]> wrote:
>
> I took some time tonight to apply v8 to the latest master (759b03b2) on my x86 tower and arm raspberry pi 5.
>
> Here are the results, using both narrow columns and the wider columns we've been using throughout:
>
> x86 master NARROW
> TXT :                 2587.642000 ms
> CSV :                 2621.759000 ms
> TXT with 1/3 escapes: 2707.933500 ms
> CSV with 1/3 quotes:  3254.896500 ms
>
> x86 v8 NARROW
> TXT :                 2488.655250 ms  3.825365% improvement
> CSV :                 2628.818000 ms  -0.269247% regression
> TXT with 1/3 escapes: 2615.522000 ms  3.412621% improvement
> CSV with 1/3 quotes:  3446.368000 ms  -5.882568% regression
>
> x86 master WIDE
> TXT :                 30583.229500 ms
> CSV :                 35054.533500 ms
> TXT with 1/3 escapes: 32767.421500 ms
> CSV with 1/3 quotes:  44214.163500 ms
>
> x86 v8 WIDE
> TXT :                 26527.494250 ms  13.261305% improvement
> CSV :                 33364.443750 ms  4.821316% improvement
> TXT with 1/3 escapes: 29320.648000 ms  10.518904% improvement
> CSV with 1/3 quotes:  42334.074750 ms  4.252232% improvement
>
>
>
> arm master NARROW
> TXT :                 1999.401000 ms
> CSV :                 2081.610750 ms
> TXT with 1/3 escapes: 2053.230250 ms
> CSV with 1/3 quotes:  2431.608750 ms
>
> arm v8 NARROW
> TXT :                 1981.663750 ms  0.887128% improvement
> CSV :                 2023.892500 ms  2.772769% improvement
> TXT with 1/3 escapes: 2004.215250 ms  2.387214% improvement
> CSV with 1/3 quotes:  2616.872750 ms  -7.618989% regression
>
> arm master WIDE
> TXT :                 9120.731750 ms
> CSV :                 11114.478250 ms
> TXT with 1/3 escapes: 10338.124500 ms
> CSV with 1/3 quotes:  13404.430250 ms
>
> arm v8 WIDE
> TXT :                 8430.090750 ms  7.572210% improvement
> CSV :                 10115.135500 ms  8.991360% improvement
> TXT with 1/3 escapes: 9624.383500 ms  6.903970% improvement
> CSV with 1/3 quotes:  12331.714000 ms  8.002699% improvement

Thank you for the results, they are interesting. I didn't expect to
see any regression for this benchmark. Also, I would expect the
non-special character cases and the 1/3 special character cases to
perform similarly, since we are not using SIMD for this benchmark.

I noticed that the timings in your narrow benchmark (both x86 and ARM)
are quite short. Would it be possible to extend the test so that the
total runtime is closer to ~10,000 ms? That might give us more stable
results.

Here is my benchmark using your script:

WIDE: Total 500000 lines and each line is 4096 bytes.
NARROW: Total 1500000 lines and each line is 2-4 bytes (`"A""A"` and `A\\A`).

+---------+---------------+---------------+---------------+----------------+
| WIDE    | TXT None      | TXT 1/3       | CSV None      | CSV 1/3        |
+---------+---------------+---------------+---------------+----------------+
| master  | 10512         | 11133         | 12241         | 14321          |
+---------+---------------+---------------+---------------+----------------+
| patched | 10000 (-4.8%) | 10804 (-2.9%) | 11571 (-5.4%) | 14008 (-2.18%) |
+---------+---------------+---------------+---------------+----------------+
|         |               |               |               |                |
+---------+---------------+---------------+---------------+----------------+
| NARROW  |               |               |               |                |
+---------+---------------+---------------+---------------+----------------+
| master  | 9702          | 9745          | 9784          | 10149          |
+---------+---------------+---------------+---------------+----------------+
| patched | 9344 (-3.6%)  | 9477 (-2.7%)  | 9439 (-3.5%)  | 9751 (-3.9%)   |
+---------+---------------+---------------+---------------+----------------+

The results look promising to me.

--
Regards,
Nazir Bilal Yavuz
Microsoft







* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-19 22:37  KAZAR Ayoub <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 0 replies; 21+ messages in thread

From: KAZAR Ayoub @ 2026-02-19 22:37 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; Nathan Bossart <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hello,

I ran some long benchmarks on this and got stable results across
multiple runs (a few milliseconds of difference).

This is on an Intel i7-1255U CPU with:
sudo cpupower frequency-set --governor=performance
sudo cpupower idle-set -D 0
echo "1" | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo

WIDE (500k rows)

TXT | none
Master avg: 22,183 ms
New avg: 20,435 ms
Improvement: -7.88%

CSV | none
Master avg: 26,737 ms
New avg: 24,625 ms
Improvement: -7.90%

TXT | escape
Master avg: 26,720 ms
New avg: 23,658 ms
Improvement: -11.46%

CSV | quote
Master avg: 35,961 ms
New avg: 33,317 ms
Improvement: -7.35%

--------------------------------------

NARROW (1.5M rows)

TXT | none
Master avg: 2,220 ms
New avg: 2,125 ms
Improvement: -4.28%

CSV | none
Master avg: 2,330 ms
New avg: 2,145 ms
Improvement: -7.92%

TXT | escape
Master avg: 2,425 ms
New avg: 2,187 ms
Improvement: -9.79%

CSV | quote
Master avg: 2,272 ms
New avg: 2,253 ms
Improvement: -0.85%

No regressions, as expected. Overall this looks good.

Regards,

Ayoub

On Thu, Feb 19, 2026 at 10:01 AM Nazir Bilal Yavuz <[email protected]>
wrote:

> Hi,
>
> On Thu, 19 Feb 2026 at 07:02, Manni Wood <[email protected]>
> wrote:
> >
> > I took some time tonight to apply v8 to the latest master (759b03b2) on my x86 tower and arm raspberry pi 5.
> >
> > Here are the results, using both narrow columns and the wider columns we've been using throughout:
> >
> > x86 master NARROW
> > TXT :                 2587.642000 ms
> > CSV :                 2621.759000 ms
> > TXT with 1/3 escapes: 2707.933500 ms
> > CSV with 1/3 quotes:  3254.896500 ms
> >
> > x86 v8 NARROW
> > TXT :                 2488.655250 ms  3.825365% improvement
> > CSV :                 2628.818000 ms  -0.269247% regression
> > TXT with 1/3 escapes: 2615.522000 ms  3.412621% improvement
> > CSV with 1/3 quotes:  3446.368000 ms  -5.882568% regression
> >
> > x86 master WIDE
> > TXT :                 30583.229500 ms
> > CSV :                 35054.533500 ms
> > TXT with 1/3 escapes: 32767.421500 ms
> > CSV with 1/3 quotes:  44214.163500 ms
> >
> > x86 v8 WIDE
> > TXT :                 26527.494250 ms  13.261305% improvement
> > CSV :                 33364.443750 ms  4.821316% improvement
> > TXT with 1/3 escapes: 29320.648000 ms  10.518904% improvement
> > CSV with 1/3 quotes:  42334.074750 ms  4.252232% improvement
> >
> >
> >
> > arm master NARROW
> > TXT :                 1999.401000 ms
> > CSV :                 2081.610750 ms
> > TXT with 1/3 escapes: 2053.230250 ms
> > CSV with 1/3 quotes:  2431.608750 ms
> >
> > arm v8 NARROW
> > TXT :                 1981.663750 ms  0.887128% improvement
> > CSV :                 2023.892500 ms  2.772769% improvement
> > TXT with 1/3 escapes: 2004.215250 ms  2.387214% improvement
> > CSV with 1/3 quotes:  2616.872750 ms  -7.618989% regression
> >
> > arm master WIDE
> > TXT :                 9120.731750 ms
> > CSV :                 11114.478250 ms
> > TXT with 1/3 escapes: 10338.124500 ms
> > CSV with 1/3 quotes:  13404.430250 ms
> >
> > arm v8 WIDE
> > TXT :                 8430.090750 ms  7.572210% improvement
> > CSV :                 10115.135500 ms  8.991360% improvement
> > TXT with 1/3 escapes: 9624.383500 ms  6.903970% improvement
> > CSV with 1/3 quotes:  12331.714000 ms  8.002699% improvement
>
> Thank you for the results, they are interesting. I didn't expect to
> see any regression for this benchmark. Also, I would expect the
> non-special character cases and the 1/3 special character cases to
> perform similarly, since we are not using SIMD for this benchmark.
>
> I noticed that the timings in your narrow benchmark (both x86 and ARM)
> are quite short. Would it be possible to extend the test so that the
> total runtime is closer to ~10,000 ms? That might give us more stable
> results.
>
> Here is my benchmark using your script:
>
> WIDE: Total 500000 lines and each line is 4096 bytes.
> NARROW: Total 1500000 lines and each line is 2-4 bytes (`"A""A"` and
> `A\\A`).
>
>
> +---------+---------------+---------------+---------------+----------------+
> | WIDE    | TXT None      | TXT 1/3       | CSV None      | CSV 1/3        |
> +---------+---------------+---------------+---------------+----------------+
> | master  | 10512         | 11133         | 12241         | 14321          |
> +---------+---------------+---------------+---------------+----------------+
> | patched | 10000 (-4.8%) | 10804 (-2.9%) | 11571 (-5.4%) | 14008 (-2.18%) |
> +---------+---------------+---------------+---------------+----------------+
> |         |               |               |               |                |
> +---------+---------------+---------------+---------------+----------------+
> | NARROW  |               |               |               |                |
> +---------+---------------+---------------+---------------+----------------+
> | master  | 9702          | 9745          | 9784          | 10149          |
> +---------+---------------+---------------+---------------+----------------+
> | patched | 9344 (-3.6%)  | 9477 (-2.7%)  | 9439 (-3.5%)  | 9751 (-3.9%)   |
> +---------+---------------+---------------+---------------+----------------+
>
> The results look promising to me.
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft
>



end of thread, other threads:[~2026-02-19 22:37 UTC | newest]

Thread overview: 21+ messages
2026-02-06 13:51 Re: Speed up COPY FROM text/CSV parsing using SIMD Nazir Bilal Yavuz <[email protected]>
2026-02-06 18:11 ` KAZAR Ayoub <[email protected]>
2026-02-06 21:29   ` Nathan Bossart <[email protected]>
2026-02-06 22:19     ` Nazir Bilal Yavuz <[email protected]>
2026-02-06 22:36       ` KAZAR Ayoub <[email protected]>
2026-02-06 22:47       ` Nathan Bossart <[email protected]>
2026-02-11 13:27         ` Nazir Bilal Yavuz <[email protected]>
2026-02-11 22:39           ` Nathan Bossart <[email protected]>
2026-02-13 11:45             ` Nazir Bilal Yavuz <[email protected]>
2026-02-13 23:09               ` Nathan Bossart <[email protected]>
2026-02-14 03:34                 ` Manni Wood <[email protected]>
2026-02-16 17:04                   ` Nathan Bossart <[email protected]>
2026-02-16 18:15                     ` Nathan Bossart <[email protected]>
2026-02-17 05:01                       ` Manni Wood <[email protected]>
2026-02-17 22:29                         ` Nathan Bossart <[email protected]>
2026-02-18 13:38                 ` Nazir Bilal Yavuz <[email protected]>
2026-02-18 21:26                   ` Nathan Bossart <[email protected]>
2026-02-18 21:54                     ` Nazir Bilal Yavuz <[email protected]>
2026-02-19 04:02                       ` Manni Wood <[email protected]>
2026-02-19 09:01                         ` Nazir Bilal Yavuz <[email protected]>
2026-02-19 22:37                           ` KAZAR Ayoub <[email protected]>
