Re: Speed up COPY FROM text/CSV parsing using SIMD

59+ messages / 6 participants

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-20 00:09  Manni Wood <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Manni Wood @ 2026-02-20 00:09 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; Nathan Bossart <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Thu, Feb 19, 2026 at 4:37 PM KAZAR Ayoub <[email protected]> wrote:

> Hello,
>
> I ran some long benchmarks on this, and I got stable results across
> multiple runs (few milliseconds difference)
>
> This is on an Intel I7-1255U CPU with:
> sudo cpupower frequency-set --governor=performance
> sudo cpupower idle-set -D 0
> echo "1" | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
>
> WIDE (500k rows)
>
> TXT | none
> Master avg: 22,183 ms
> New avg: 20,435 ms
> Improvement: -7.88%
>
> CSV | none
> Master avg: 26,737 ms
> New avg: 24,625 ms
> Improvement: -7.90%
>
> TXT | escape
> Master avg: 26,720 ms
> New avg: 23,658 ms
> Improvement: -11.46%
>
> CSV | quote
> Master avg: 35,961 ms
> New avg: 33,317 ms
> Improvement: -7.35%
>
> --------------------------------------
>
> NARROW (1.5M rows)
>
> TXT | none
> Master avg: 2,220 ms
> New avg: 2,125 ms
> Improvement: -4.28%
>
> CSV | none
> Master avg: 2,330 ms
> New avg: 2,145 ms
> Improvement: -7.92%
>
> TXT | escape
> Master avg: 2,425 ms
> New avg: 2,187 ms
> Improvement: -9.79%
>
> CSV | quote
> Master avg: 2,272 ms
> New avg: 2,253 ms
> Improvement: -0.85%
>
> No regressions as expected, overall this looks good.
>
> Regards,
>
> Ayoub
>
> On Thu, Feb 19, 2026 at 10:01 AM Nazir Bilal Yavuz <[email protected]>
> wrote:
>
>> Hi,
>>
>> On Thu, 19 Feb 2026 at 07:02, Manni Wood <[email protected]>
>> wrote:
>> >
>> > I took some time tonight to apply v8 to the latest master (759b03b2) on
>> my x86 tower and arm raspberry pi 5.
>> >
>> > Here are the results, using both narrow columns and the wider columns
>> we've been using throughout:
>> >
>> > x86 master NARROW
>> > TXT :                 2587.642000 ms
>> > CSV :                 2621.759000 ms
>> > TXT with 1/3 escapes: 2707.933500 ms
>> > CSV with 1/3 quotes:  3254.896500 ms
>> >
>> > x86 v8 NARROW
>> > TXT :                 2488.655250 ms  3.825365% improvement
>> > CSV :                 2628.818000 ms  -0.269247% regression
>> > TXT with 1/3 escapes: 2615.522000 ms  3.412621% improvement
>> > CSV with 1/3 quotes:  3446.368000 ms  -5.882568% regression
>> >
>> > x86 master WIDE
>> > TXT :                 30583.229500 ms
>> > CSV :                 35054.533500 ms
>> > TXT with 1/3 escapes: 32767.421500 ms
>> > CSV with 1/3 quotes:  44214.163500 ms
>> >
>> > x86 v8 WIDE
>> > TXT :                 26527.494250 ms  13.261305% improvement
>> > CSV :                 33364.443750 ms  4.821316% improvement
>> > TXT with 1/3 escapes: 29320.648000 ms  10.518904% improvement
>> > CSV with 1/3 quotes:  42334.074750 ms  4.252232% improvement
>> >
>> >
>> >
>> > arm master NARROW
>> > TXT :                 1999.401000 ms
>> > CSV :                 2081.610750 ms
>> > TXT with 1/3 escapes: 2053.230250 ms
>> > CSV with 1/3 quotes:  2431.608750 ms
>> >
>> > arm v8 NARROW
>> > TXT :                 1981.663750 ms  0.887128% improvement
>> > CSV :                 2023.892500 ms  2.772769% improvement
>> > TXT with 1/3 escapes: 2004.215250 ms  2.387214% improvement
>> > CSV with 1/3 quotes:  2616.872750 ms  -7.618989% regression
>> >
>> > arm master WIDE
>> > TXT :                 9120.731750 ms
>> > CSV :                 11114.478250 ms
>> > TXT with 1/3 escapes: 10338.124500 ms
>> > CSV with 1/3 quotes:  13404.430250 ms
>> >
>> > arm v8 WIDE
>> > TXT :                 8430.090750 ms  7.572210% improvement
>> > CSV :                 10115.135500 ms  8.991360% improvement
>> > TXT with 1/3 escapes: 9624.383500 ms  6.903970% improvement
>> > CSV with 1/3 quotes:  12331.714000 ms  8.002699% improvement
>>
>> Thank you for the results, they are interesting. I didn't expect to
>> see any regression for this benchmark. Also, I would expect the
>> non-special character cases and the 1/3 special character cases to
>> perform similarly, since we are not using SIMD for this benchmark.
>>
>> I noticed that the timings in your narrow benchmark (both x86 and ARM)
>> are quite short. Would it be possible to extend the test so that the
>> total runtime is closer to ~10,000 ms? That might give us more stable
>> results.
>>
>> Here is my benchmark with using your script:
>>
>> WIDE: Total 500000 lines and each line is 4096 bytes.
>> NARROW: Total 1500000 lines and each line is 2-4 bytes (`"A""A"` and
>> `A\\A`).
>>
>>
>> +---------+---------------+---------------+---------------+----------------+
>> | WIDE    | TXT None      | TXT 1/3       | CSV None      | CSV 1/3        |
>> +---------+---------------+---------------+---------------+----------------+
>> | master  | 10512         | 11133         | 12241         | 14321          |
>> +---------+---------------+---------------+---------------+----------------+
>> | patched | 10000 (-%4.8) | 10804 (-%2.9) | 11571 (-%5.4) | 14008 (-%2.18) |
>> +---------+---------------+---------------+---------------+----------------+
>> |         |               |               |               |                |
>> +---------+---------------+---------------+---------------+----------------+
>> | NARROW  |               |               |               |                |
>> +---------+---------------+---------------+---------------+----------------+
>> | master  | 9702          | 9745          | 9784          | 10149          |
>> +---------+---------------+---------------+---------------+----------------+
>> | patched | 9344 (-%3.6)  | 9477 (-%2.7)  | 9439 (-%3.5)  | 9751 (-%3.9)   |
>> +---------+---------------+---------------+---------------+----------------+
>>
>> The results look promising to me.
>>
>> --
>> Regards,
>> Nazir Bilal Yavuz
>> Microsoft
>>
>
Hello!

Thanks for running benchmarks, Ayoub.

Nazir, I ran my benchmarks with more rows this time --- as many rows as
would fit on my test computers without exhausting their RAM disks. That
seems to have brought things more into line with what Ayoub saw. I did get
some small regressions, but I suspect those are not a big deal. (For
instance, on both machines I also noticed the occasional "truncate table"
would take longer than the others, despite my scripts' best efforts to
steady a CPU core and pin postmaster and children to that core.)

x86 WIDE master 500,000 rows
TXT :                 30602.244000 ms
CSV :                 35062.451250 ms
TXT with 1/3 escapes: 32704.250250 ms
CSV with 1/3 quotes:  44128.072500 ms

x86 WIDE v8 500,000 rows
TXT :                 26611.953250 ms  13.039210% improvement
CSV :                 33366.184000 ms  4.837846% improvement
TXT with 1/3 escapes: 29251.310000 ms  10.558078% improvement
CSV with 1/3 quotes:  42368.421000 ms  3.987601% improvement

x86 NARROW master 50mil rows
TXT :                 25898.004000 ms
CSV :                 27212.684500 ms
TXT with 1/3 escapes: 29189.518250 ms
CSV with 1/3 quotes:  33222.510250 ms

x86 NARROW v8 50mil rows
TXT :                 26368.765000 ms  -1.817750% regression
CSV :                 26711.122250 ms  1.843119% improvement
TXT with 1/3 escapes: 28081.150750 ms  3.797142% improvement
CSV with 1/3 quotes:  32851.963500 ms  1.115348% improvement


arm WIDE master 250,000 rows
TXT :                 11392.462750 ms
CSV :                 13887.576500 ms
TXT with 1/3 escapes: 12908.560750 ms
CSV with 1/3 quotes:  16699.337000 ms

arm WIDE v8 250,000 rows
TXT :                 10524.567750 ms  7.618151% improvement
CSV :                 12621.211250 ms  9.118691% improvement
TXT with 1/3 escapes: 12017.030250 ms  6.906506% improvement
CSV with 1/3 quotes:  15428.020500 ms  7.612976% improvement

arm NARROW master 25mil rows
TXT :                 10030.274000 ms
CSV :                 10245.238750 ms
TXT with 1/3 escapes: 10345.224500 ms
CSV with 1/3 quotes:  12186.313250 ms

arm NARROW v8 25mil rows
TXT :                 10197.386500 ms  -1.666081% regression
CSV :                 10257.918750 ms  -0.123765% regression
TXT with 1/3 escapes: 10084.978500 ms  2.515615% improvement
CSV with 1/3 quotes:  12064.215000 ms  1.001929% improvement
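For reference, the percentage column in these tables appears to be the change relative to master, computed as (master - patched) / master * 100, so positive values are improvements and negative values are regressions. A minimal C sketch of that arithmetic follows; the function name pct_change is made up for illustration and is not part of the benchmark script.

```c
#include <assert.h>

/*
 * Percentage change of a patched timing relative to master.
 * Positive result: patched is faster (improvement).
 * Negative result: patched is slower (regression).
 * Hypothetical helper, not taken from the benchmark script.
 */
static double
pct_change(double master_ms, double patched_ms)
{
    assert(master_ms > 0.0);
    return (master_ms - patched_ms) / master_ms * 100.0;
}
```

Feeding it the x86 WIDE TXT pair (30602.244 ms on master, 26611.953250 ms with v8) reproduces the 13.04% improvement reported above, and the x86 NARROW TXT pair reproduces the -1.82% regression.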

-- 
-- Manni Wood EDB: https://www.enterprisedb.com


* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-20 09:50  Nazir Bilal Yavuz <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-02-20 09:50 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Nathan Bossart <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Fri, 20 Feb 2026 at 03:09, Manni Wood <[email protected]> wrote:
>
> Thanks for running benchmarks, Ayoub.
>
> Nazir, I ran my benchmarks with more rows this time --- as many rows as would fit on my test computers without exhausting their RAM disks. That seems to have brought things more into line with what Ayoub saw. I did get some small regressions, but I suspect those are not a big deal. (For instance, on both machines I also noticed the occasional "truncate table" would take longer than the others, despite my scripts' best efforts to steady a CPU core and pin postmaster and children to that core.)

Thank you both for the benchmarks. Results look good to me!

-- 
Regards,
Nazir Bilal Yavuz
Microsoft

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-20 18:15  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Nathan Bossart @ 2026-02-20 18:15 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Fri, Feb 20, 2026 at 12:50:35PM +0300, Nazir Bilal Yavuz wrote:
> On Fri, 20 Feb 2026 at 03:09, Manni Wood <[email protected]> wrote:
>> Nazir, I ran my benchmarks with more rows this time --- as many rows as
>> would fit on my test computers without exhausting their RAM disks. That
>> seems to have brought things more into line with what Ayoub saw. I did
>> get some small regressions, but I suspect those are not a big deal. (For
>> instance, on both machines I also noticed the occasional "truncate
>> table" would take longer than the others, despite my scripts' best
>> efforts to steady a CPU core and pin postmaster and children to that
>> core.)

Yeah, the couple of small regressions seem close to (or below) the noise
level, and IIUC yours were the only benchmarks that showed them, anyway.
Plus, I think we'll need this change regardless as a prerequisite for the
SIMD work.

> Thank you both for the benchmarks. Results look good to me!

Committed that part.

-- 
nathan

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-23 09:10  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-02-23 09:10 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Fri, 20 Feb 2026 at 21:15, Nathan Bossart <[email protected]> wrote:
>
> Yeah, the couple of small regressions seem close to (or below) the noise
> level, and IIUC yours were the only benchmarks that showed them, anyway.
> Plus, I think we'll need this change regardless as a prerequisite for the
> SIMD work.
>
> > Thank you both for the benchmarks. Results look good to me!
>
> Committed that part.

Thank you! Attaching the SIMD patch only.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft


Attachments:

  [text/x-patch] v10-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (7.7K, 2-v10-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
  download | inline diff:
From 9ef4e1376657b577cd4b4c42fb6a592ebd5fae24 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Fri, 13 Feb 2026 13:28:55 +0300
Subject: [PATCH v10] Speed up COPY FROM text/CSV parsing using SIMD

This patch disables SIMD when SIMD encounters a special character which
is neither EOF nor EOL.

Author: Shinya Kato <[email protected]>
Author: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Kazar Ayoub <[email protected]>
Reviewed-by: Nathan Bossart <[email protected]>
Reviewed-by: Neil Conway <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Manni Wood <[email protected]>
Reviewed-by: Mark Wong <[email protected]>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
---
 src/backend/commands/copyfrom.c          |   3 +
 src/backend/commands/copyfromparse.c     | 135 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h |   3 +
 3 files changed, 137 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2b7556b287c..3dd159f15b2 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1717,6 +1717,9 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
 
+	/* Initialize SIMD */
+	cstate->simd_enabled = true;
+
 	/*
 	 * Allocate buffers for the input pipeline.
 	 *
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 6b00d49c50f..7bdf5681628 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 
@@ -142,7 +143,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 /* non-export function prototypes */
 static bool CopyReadLine(CopyFromState cstate, bool is_csv);
 static pg_attribute_always_inline bool CopyReadLineText(CopyFromState cstate,
-														bool is_csv);
+														bool is_csv,
+														bool simd_enabled);
 static int	CopyReadAttributesText(CopyFromState cstate);
 static int	CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -1182,9 +1184,19 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 	 * specialized code with fewer branches.
 	 */
 	if (is_csv)
-		result = CopyReadLineText(cstate, true);
+	{
+		if (cstate->simd_enabled)
+			result = CopyReadLineText(cstate, true, true);
+		else
+			result = CopyReadLineText(cstate, true, false);
+	}
 	else
-		result = CopyReadLineText(cstate, false);
+	{
+		if (cstate->simd_enabled)
+			result = CopyReadLineText(cstate, false, true);
+		else
+			result = CopyReadLineText(cstate, false, false);
+	}
 
 	if (result)
 	{
@@ -1252,7 +1264,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
 static pg_attribute_always_inline bool
-CopyReadLineText(CopyFromState cstate, bool is_csv)
+CopyReadLineText(CopyFromState cstate, bool is_csv, bool simd_enabled)
 {
 	char	   *copy_input_buf;
 	int			input_buf_ptr;
@@ -1267,6 +1279,14 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	char		quotec = '\0';
 	char		escapec = '\0';
 
+#ifndef USE_NO_SIMD
+	Vector8		nl = vector8_broadcast('\n');
+	Vector8		cr = vector8_broadcast('\r');
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		quote = vector8_broadcast(0);
+	Vector8		escape = vector8_broadcast(0);
+#endif
+
 	if (is_csv)
 	{
 		quotec = cstate->opts.quote[0];
@@ -1274,6 +1294,12 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* ignore special escape processing if it's the same as quotec */
 		if (quotec == escapec)
 			escapec = '\0';
+
+#ifndef USE_NO_SIMD
+		quote = vector8_broadcast(quotec);
+		if (quotec != escapec)
+			escape = vector8_broadcast(escapec);
+#endif
 	}
 
 	/*
@@ -1340,6 +1366,107 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			need_data = false;
 		}
 
+#ifndef USE_NO_SIMD
+
+		/*
+		 * Use SIMD instructions to efficiently scan the input buffer for
+		 * special characters (e.g., newline, carriage return, quote, and
+		 * escape). This is faster than byte-by-byte iteration, especially on
+		 * large buffers.
+		 *
+		 * We do not apply the SIMD fast path in either of the following
+		 * cases: - When the previously processed character was an escape
+		 * character (last_was_esc), since the next byte must be examined
+		 * sequentially. - When the remaining buffer is smaller than one
+		 * vector width (sizeof(Vector8)), since SIMD operates on fixed-size
+		 * chunks.
+		 *
+		 * Note that SIMD may become slower when the input contains many
+		 * special characters. To avoid this regression, we disable SIMD for
+		 * the rest of the input once we encounter a special character which
+		 * is neither EOF nor EOL.
+		 */
+		if (simd_enabled && !last_was_esc && copy_buf_len - input_buf_ptr > sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match = vector8_broadcast(0);
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+			if (is_csv)
+			{
+				/* \n and \r are not special inside quotes */
+				if (!in_quote)
+					match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+
+				match = vector8_or(match, vector8_eq(chunk, quote));
+				if (escapec != '\0')
+					match = vector8_or(match, vector8_eq(chunk, escape));
+			}
+			else
+			{
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				match = vector8_or(match, vector8_eq(chunk, bs));
+			}
+
+			/* Check if we found any special characters */
+			if (vector8_is_highbit_set(match))
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				uint32		mask;
+				int			advance;
+				char		c1,
+							c2;
+				bool		simd_hit_eol,
+							simd_hit_eof;
+
+				mask = vector8_highbit_mask(match);
+				advance = pg_rightmost_one_pos32(mask);
+
+				input_buf_ptr += advance;
+				c1 = copy_input_buf[input_buf_ptr];
+
+				/*
+				 * Since we stopped within the chunk and ((copy_buf_len -
+				 * input_buf_ptr) > sizeof(Vector8)) is true,
+				 * copy_input_buf[input_buf_ptr + 1] is guaranteed to be
+				 * readable.
+				 */
+				c2 = copy_input_buf[input_buf_ptr + 1];
+
+				simd_hit_eof = (c1 == '\\' && c2 == '.' && !is_csv);
+				simd_hit_eol = (c1 == '\r' || c1 == '\n');
+
+				/*
+				 * If (is_csv && in_quote), we shouldn't have picked up '\r'
+				 * or '\n' in the first place.
+				 */
+				Assert(!simd_hit_eol || !(is_csv && in_quote));
+
+				/*
+				 * Do not disable SIMD when we hit EOL or EOF characters. In
+				 * practice, it does not matter for EOF because parsing ends
+				 * there, but we keep the behavior consistent.
+				 */
+				if (!(simd_hit_eof || simd_hit_eol))
+				{
+					simd_enabled = false;
+					cstate->simd_enabled = false;
+				}
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				input_buf_ptr += sizeof(Vector8);
+				continue;
+			}
+		}
+#endif
+
 		/* OK to fetch a character */
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 822ef33cf69..73ce777c52b 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -89,6 +89,9 @@ typedef struct CopyFromStateData
 	const char *cur_attval;		/* current att value for error messages */
 	bool		relname_only;	/* don't output line number, att, etc. */
 
+	/* SIMD variables */
+	bool		simd_enabled;
+
 	/*
 	 * Working state
 	 */
-- 
2.47.3
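
The scan strategy in the patch above can be illustrated outside PostgreSQL. The following is a rough, portable sketch, not the patch's code. It assumes an 8-byte SWAR word in place of Vector8 and the port/simd.h helpers, handles only text mode ('\n', '\r', '\\'), and treats every backslash as a reason to disable the fast path (the patch additionally keeps SIMD enabled for the "\." end-of-data marker). The names ScanState, load_chunk, match_byte and find_special are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK     8     /* bytes per "vector"; stands in for sizeof(Vector8) */
#define LOW_BITS  UINT64_C(0x0101010101010101)
#define HIGH_BITS UINT64_C(0x8080808080808080)

typedef struct ScanState
{
    bool simd_enabled;  /* sticky: cleared at the first backslash found */
} ScanState;

/* Pack 8 bytes so that input byte i occupies bits 8*i .. 8*i+7. */
static uint64_t
load_chunk(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 0; i < CHUNK; i++)
        v |= (uint64_t) p[i] << (8 * i);
    return v;
}

/*
 * Set the high bit of each byte of v that equals c.  Bits above the
 * lowest set one may be false positives, so callers use only the lowest.
 */
static uint64_t
match_byte(uint64_t v, unsigned char c)
{
    uint64_t x = v ^ (LOW_BITS * c);    /* matching bytes become zero */

    return (x - LOW_BITS) & ~x & HIGH_BITS;
}

/*
 * Return the offset of the first '\n', '\r' or '\\' in buf[0..len),
 * or len if there is none.
 */
static size_t
find_special(ScanState *st, const char *buf, size_t len)
{
    const unsigned char *p = (const unsigned char *) buf;
    size_t pos = 0;

    while (st->simd_enabled && len - pos >= CHUNK)
    {
        uint64_t v = load_chunk(p + pos);
        uint64_t m = match_byte(v, '\n') | match_byte(v, '\r') |
                     match_byte(v, '\\');

        if (m == 0)
        {
            pos += CHUNK;       /* whole chunk is ordinary data */
            continue;
        }

        /* Advance to the first flagged byte in this chunk. */
        while ((m & 0xFF) == 0)
        {
            m >>= 8;
            pos++;
        }

        /* Line endings keep the fast path on; a backslash turns it off. */
        if (p[pos] == '\\')
            st->simd_enabled = false;
        return pos;
    }

    /* Byte-by-byte fallback: short tail, or fast path disabled. */
    while (pos < len && p[pos] != '\n' && p[pos] != '\r' && p[pos] != '\\')
        pos++;
    return pos;
}
```

Calling find_special repeatedly on input with an early backslash shows the sticky behaviour: the first call clears simd_enabled, and later calls go straight to the byte-by-byte loop. That matches the intent stated in the patch comment, which is to stop paying for the chunked check once the input looks heavy in special characters.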



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-24 04:44  Manni Wood <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Manni Wood @ 2026-02-24 04:44 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Nathan Bossart <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Mon, Feb 23, 2026 at 3:10 AM Nazir Bilal Yavuz <[email protected]>
wrote:

> Hi,
>
> On Fri, 20 Feb 2026 at 21:15, Nathan Bossart <[email protected]>
> wrote:
> >
> > Yeah, the couple of small regressions seem close to (or below) the noise
> > level, and IIUC yours were the only benchmarks that showed them, anyway.
> > Plus, I think we'll need this change regardless as a prerequisite for the
> > SIMD work.
> >
> > > Thank you both for the benchmarks. Results look good to me!
> >
> > Committed that part.
>
> Thank you! Attaching the SIMD patch only.
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft
>

Hello!

I ran some speed tests on Nazir's v10 SIMD-only patch. I'm a bit surprised
at the regression for x86 with wide rows for the 1/3rd special characters
scenarios. I'm hoping it's something I did wrong. If anyone else has
numbers to share, that would be excellent.

x86 NARROW master 50,000,000 rows
TXT :                 26359.319000 ms
CSV :                 25661.199750 ms
TXT with 1/3 escapes: 28170.085250 ms
CSV with 1/3 quotes:  32638.147500 ms

x86 NARROW v10 50,000,000 rows
TXT :                 26416.331500 ms  -0.216290% regression
CSV :                 25318.727500 ms  1.334592% improvement
TXT with 1/3 escapes: 28608.007500 ms  -1.554565% regression
CSV with 1/3 quotes:  32805.627750 ms  -0.513143% regression

x86 WIDE master 500,000 rows
TXT :                 26475.164250 ms
CSV :                 31963.478500 ms
TXT with 1/3 escapes: 29671.120750 ms
CSV with 1/3 quotes:  40391.616250 ms

x86 WIDE v10 500,000 rows
TXT :                 23067.046750 ms  12.872885% improvement
CSV :                 23259.092250 ms  27.232287% improvement
TXT with 1/3 escapes: 31796.098250 ms  -7.161770% regression
CSV with 1/3 quotes:  42925.792250 ms  -6.274015% regression



arm NARROW master 25,000,000 rows
TXT :                 10077.096250 ms
CSV :                 10310.671250 ms
TXT with 1/3 escapes: 9893.155000 ms
CSV with 1/3 quotes:  12133.064750 ms

arm NARROW v10 25,000,000 rows
TXT :                 10467.816750 ms  -3.877312% regression
CSV :                 9986.288000 ms  3.146092% improvement
TXT with 1/3 escapes: 10323.173750 ms  -4.346629% regression
CSV with 1/3 quotes:  11843.611750 ms  2.385654% improvement

arm WIDE master 250,000 rows
TXT :                 10568.344750 ms
CSV :                 13046.610500 ms
TXT with 1/3 escapes: 12193.088500 ms
CSV with 1/3 quotes:  16629.319000 ms

arm WIDE v10 250,000 rows
TXT :                 9064.959000 ms  14.225366% improvement
CSV :                 9019.553250 ms  30.866693% improvement
TXT with 1/3 escapes: 12344.497250 ms  -1.241759% regression
CSV with 1/3 quotes:  15495.863750 ms  6.816005% improvement

-- 
-- Manni Wood EDB: https://www.enterprisedb.com


* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-24 13:57  Nazir Bilal Yavuz <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 2 replies; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-02-24 13:57 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: Nathan Bossart <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Tue, 24 Feb 2026 at 07:44, Manni Wood <[email protected]> wrote:
>
> Hello!
>
> I ran some speed tests on Nazir's v10 SIMD-only patch. I'm a bit surprised at the regression for x86 with wide rows for the 1/3rd special characters scenarios. I'm hoping it's something I did wrong. If anyone else has numbers to share, that would be excellent.

Thank you for doing this!

I see similar regression on the wide & CSV 1/3 case by using your
benchmark script. I didn't see this regression when I used my
benchmark while sharing v9 [1].

+-------------+---------------------------+---------------------------+
|             |            Text           |            CSV            |
+-------------+-------------+-------------+-------------+-------------+
|  WIDE TEST  |     None    |     1/3     |     None    |     1/3     |
+-------------+-------------+-------------+-------------+-------------+
|    Master   |     9996    |    10769    |    11548    |    13960    |
+-------------+-------------+-------------+-------------+-------------+
|     v10     | 8912 %-10.8 | 10902 %+1.2 | 8952 %-22.4 | 15123 %+8.3 |
+-------------+-------------+-------------+-------------+-------------+
|             |             |             |             |             |
+-------------+---------------------------+---------------------------+
|             |            Text           |            CSV            |
+-------------+-------------+-------------+-------------+-------------+
| NARROW TEST |     None    |     1/3     |     None    |     1/3     |
+-------------+-------------+-------------+-------------+-------------+
|    Master   |     9441    |     9561    |     9734    |     9830    |
+-------------+-------------+-------------+-------------+-------------+
|     v10     |  9291 %-1.5 |  9504 %-0.5 |  9644 %-0.9 | 10078 %-2.4 |
+-------------+-------------+-------------+-------------+-------------+

I will investigate this. However, please note that the current master
includes the inlining commit (dc592a4155), which makes the COPY FROM
faster. In my case,

1: current master without dc592a4155: 14400ms
2: current master: 13960ms (%3 improvement against #1)
3: current master + SIMD: 15123ms (%5 regression against #1 and %8
regression against #2)

Is it possible for you to do a similar test? I mean dropping
dc592a4155 from the current master and re-running the benchmark, that
would be helpful.

[1] https://postgr.es/m/CAN55FZ0MiFCgK26gRgE05a%3D_ggenkxDM8H%3DA2uTHpywczqt%3D-Q%40mail.gmail.com
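
As background on why an inlining change can matter here: the v10 diff context shows CopyReadLine calling CopyReadLineText with literal true/false arguments, and CopyReadLineText is marked pg_attribute_always_inline. Assuming the inlining commit refers to that pattern, the idea is that each call site passes compile-time constants, so the compiler can emit a separate copy of the loop per mode with the mode checks folded away. A minimal sketch with hypothetical names (scan_line, find_line_end) and plain "static inline" in place of the PostgreSQL attribute:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return the offset of the first line-ending '\n', or len if none.
 * In CSV mode, newlines inside double quotes do not end the line.
 * Hypothetical stand-in for the real line reader.
 */
static inline size_t
scan_line(const char *buf, size_t len, bool is_csv)
{
    bool in_quote = false;

    for (size_t i = 0; i < len; i++)
    {
        char c = buf[i];

        if (is_csv && c == '"')
            in_quote = !in_quote;
        else if (c == '\n' && !(is_csv && in_quote))
            return i;
    }
    return len;
}

/*
 * Dispatch once on the runtime flag, then call with a literal so that
 * an inlining compiler can drop the is_csv tests from each copy.
 */
static size_t
find_line_end(const char *buf, size_t len, bool is_csv)
{
    if (is_csv)
        return scan_line(buf, len, true);
    else
        return scan_line(buf, len, false);
}
```

Whether a compiler actually specializes the two calls depends on the compiler and optimization level; plain "inline" is only a hint, which is presumably why the PostgreSQL code uses an always-inline attribute instead.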

--
Regards,
Nazir Bilal Yavuz
Microsoft

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-24 15:07  KAZAR Ayoub <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 0 replies; 59+ messages in thread

From: KAZAR Ayoub @ 2026-02-24 15:07 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; Nathan Bossart <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hello,

On Tue, Feb 24, 2026 at 2:57 PM Nazir Bilal Yavuz <[email protected]>
wrote:

> Hi,
>
> On Tue, 24 Feb 2026 at 07:44, Manni Wood <[email protected]>
> wrote:
> >
> > Hello!
> >
> > I ran some speed tests on Nazir's v10 SIMD-only patch. I'm a bit
> surprised at the regression for x86 with wide rows for the 1/3rd special
> characters scenarios. I'm hoping it's something I did wrong. If anyone else
> has numbers to share, that would be excellent.
>
> Thank you for doing this!
>
> I see similar regression on the wide & CSV 1/3 case by using your
> benchmark script. I didn't see this regression when I used my
> benchmark while sharing v9 [1].
>
> +-------------+---------------------------+---------------------------+
> |             |            Text           |            CSV            |
> +-------------+-------------+-------------+-------------+-------------+
> |  WIDE TEST  |     None    |     1/3     |     None    |     1/3     |
> +-------------+-------------+-------------+-------------+-------------+
> |    Master   |     9996    |    10769    |    11548    |    13960    |
> +-------------+-------------+-------------+-------------+-------------+
> |     v10     | 8912 %-10.8 | 10902 %+1.2 | 8952 %-22.4 | 15123 %+8.3 |
> +-------------+-------------+-------------+-------------+-------------+
> |             |             |             |             |             |
> +-------------+---------------------------+---------------------------+
> |             |            Text           |            CSV            |
> +-------------+-------------+-------------+-------------+-------------+
> | NARROW TEST |     None    |     1/3     |     None    |     1/3     |
> +-------------+-------------+-------------+-------------+-------------+
> |    Master   |     9441    |     9561    |     9734    |     9830    |
> +-------------+-------------+-------------+-------------+-------------+
> |     v10     |  9291 %-1.5 |  9504 %-0.5 |  9644 %-0.9 | 10078 %-2.4 |
> +-------------+-------------+-------------+-------------+-------------+
>
> I will investigate this. However, please note that the current master
> includes the inlining commit (dc592a4155), which makes the COPY FROM
> faster. In my case,
>
> 1: current master without dc592a4155: 14400ms
> 2: current master: 13960ms (%3 improvement against #1)
> 3: current master + SIMD: 15123ms (%5 regression against #1 and %8
> regression against #2)
>
> Is it possible for you to do a similar test? I mean dropping
> dc592a4155 from the current master and re-running the benchmark, that
> would be helpful.
>
> [1]
> https://postgr.es/m/CAN55FZ0MiFCgK26gRgE05a%3D_ggenkxDM8H%3DA2uTHpywczqt%3D-Q%40mail.gmail.com

Here are some numbers for v10 from my end, these are multiple long runs:
Master contains the previous inlining patch.

This is on an Intel I7-1255U CPU

WIDE (500k rows)

TXT | none
Master avg: 20,721 ms
New avg: 17,980 ms
Improvement: -13.23%

CSV | none
Master avg: 26,608 ms
New avg: 18,433 ms
Improvement: -30.73%

TXT | escape
Master avg: 25,069 ms
New avg: 22,910 ms
Improvement: -8.61%

CSV | quote
Master avg: 31,931 ms
New avg: 31,493 ms
Improvement: -1.37%

--------------------------------------

NARROW (15M rows)

TXT | none
Master avg: 20,687 ms
New avg: 20,824 ms
Regression: +0.67%

CSV | none
Master avg: 21,187 ms
New avg: 21,153 ms
Improvement: -0.16%

TXT | escape
Master avg: 20,870 ms
New avg: 21,341 ms
Regression: +2.25%

CSV | quote
Master avg: 22,074 ms
New avg: 22,267 ms
Regression: +0.87%

For narrow that would be mostly noise and extra branch effects.

Regards,
Ayoub


* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-24 17:48  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 2 replies; 59+ messages in thread

From: Nathan Bossart @ 2026-02-24 17:48 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Tue, Feb 24, 2026 at 04:57:21PM +0300, Nazir Bilal Yavuz wrote:
> I will investigate this. However, please note that the current master
> includes the inlining commit (dc592a4155), which makes the COPY FROM
> faster. In my case,
> 
> 1: current master without dc592a4155: 14400ms
> 2: current master: 13960ms (%3 improvement against #1)
> 3: current master + SIMD: 15123ms (%5 regression against #1 and %8
> regression against #2)
> 
> Is it possible for you to do a similar test? I mean dropping
> dc592a4155 from the current master and re-running the benchmark, that
> would be helpful.

IMHO as long as the difference from v18 looks reasonable, commit-by-commit
regressions and improvements that even out in the end are okay.  That's
perhaps a bit of mental gymnastics (e.g., what if we had committed the
inlining patch for v18?), but I believe that's how we've dealt with similar
problems in the past.  But maybe there are ways to avoid even these
in-development regressions, too...

-- 
nathan







* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-25 04:06  Manni Wood <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 0 replies; 59+ messages in thread

From: Manni Wood @ 2026-02-25 04:06 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Tue, Feb 24, 2026 at 11:48 AM Nathan Bossart <[email protected]>
wrote:

> On Tue, Feb 24, 2026 at 04:57:21PM +0300, Nazir Bilal Yavuz wrote:
> > I will investigate this. However, please note that the current master
> > includes the inlining commit (dc592a4155), which makes the COPY FROM
> > faster. In my case,
> >
> > 1: current master without dc592a4155: 14400ms
> > 2: current master: 13960ms (%3 improvement against #1)
> > 3: current master + SIMD: 15123ms (%5 regression against #1 and %8
> > regression against #2)
> >
> > Is it possible for you to do a similar test? I mean dropping
> > dc592a4155 from the current master and re-running the benchmark, that
> > would be helpful.
>
> IMHO as long as the difference from v18 looks reasonable, commit-by-commit
> regressions and improvements that even out in the end are okay.  That's
> perhaps a bit of mental gymnastics (e.g., what if we had committed the
> inlining patch for v18?), but I believe that's how we've dealt with similar
> problems in the past.  But maybe there are ways to avoid even these
> in-development regressions, too...
>
> --
> nathan
>

Oh yes, I see now.

Commit 18bcdb75 is just before the v9 patch got applied, so I used that as
"old master" and compared that with master (v9 applied) and then "master
(v9 applied) + v10 applied".

arm NARROW old master 18bcdb75
TXT :                 10997.568250 ms
CSV :                 10797.549000 ms
TXT with 1/3 escapes: 10299.047000 ms
CSV with 1/3 quotes:  12559.385750 ms

arm NARROW master (v9 applied)
TXT :                 10077.096250 ms  8.369778% improvement
CSV :                 10310.671250 ms  4.509151% improvement
TXT with 1/3 escapes: 9893.155000 ms  3.941064% improvement
CSV with 1/3 quotes:  12133.064750 ms  3.394441% improvement

arm NARROW v10
TXT :                 10467.816750 ms  4.816988% improvement
CSV :                 9986.288000 ms  7.513381% improvement
TXT with 1/3 escapes: 10323.173750 ms  -0.234262% regression
CSV with 1/3 quotes:  11843.611750 ms  5.699116% improvement


arm WIDE old master 18bcdb75
TXT :                 11825.771250 ms
CSV :                 13907.074000 ms
TXT with 1/3 escapes: 13430.691250 ms
CSV with 1/3 quotes:  17557.954500 ms

arm WIDE master (v9 applied)
TXT :                 10568.344750 ms  10.632934% improvement
CSV :                 13046.610500 ms  6.187236% improvement
TXT with 1/3 escapes: 12193.088500 ms  9.214736% improvement
CSV with 1/3 quotes:  16629.319000 ms  5.288973% improvement

arm WIDE v10
TXT :                 9064.959000 ms  23.345727% improvement
CSV :                 9019.553250 ms  35.144134% improvement
TXT with 1/3 escapes: 12344.497250 ms  8.087402% improvement
CSV with 1/3 quotes:  15495.863750 ms  11.744482% improvement



x86 NARROW old master 18bcdb75
TXT :                 25909.060500 ms
CSV :                 28137.591250 ms
TXT with 1/3 escapes: 27794.177000 ms
CSV with 1/3 quotes:  34541.704750 ms

x86 NARROW master
TXT :                 26359.319000 ms  -1.737842% regression
CSV :                 25661.199750 ms  8.801007% improvement
TXT with 1/3 escapes: 28170.085250 ms  -1.352471% regression
CSV with 1/3 quotes:  32638.147500 ms  5.510895% improvement

x86 NARROW v10
TXT :                 26416.331500 ms  -1.957890% regression
CSV :                 25318.727500 ms  10.018142% improvement
TXT with 1/3 escapes: 28608.007500 ms  -2.928061% regression
CSV with 1/3 quotes:  32805.627750 ms  5.026032% improvement

x86 WIDE old master 18bcdb75
TXT :                 28778.426500 ms
CSV :                 35671.908000 ms
TXT with 1/3 escapes: 32441.549750 ms
CSV with 1/3 quotes:  47024.416000 ms

x86 WIDE master
TXT :                 26475.164250 ms  8.003434% improvement
CSV :                 31963.478500 ms  10.395938% improvement
TXT with 1/3 escapes: 29671.120750 ms  8.539755% improvement
CSV with 1/3 quotes:  40391.616250 ms  14.105012% improvement

x86 WIDE v10
TXT :                 23067.046750 ms  19.846046% improvement
CSV :                 23259.092250 ms  34.797174% improvement
TXT with 1/3 escapes: 31796.098250 ms  1.989583% improvement
CSV with 1/3 quotes:  42925.792250 ms  8.715948% improvement

-- 
-- Manni Wood EDB: https://www.enterprisedb.com



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-25 14:24  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 2 replies; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-02-25 14:24 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Tue, 24 Feb 2026 at 20:48, Nathan Bossart <[email protected]> wrote:
>
> On Tue, Feb 24, 2026 at 04:57:21PM +0300, Nazir Bilal Yavuz wrote:
> > I will investigate this. However, please note that the current master
> > includes the inlining commit (dc592a4155), which makes the COPY FROM
> > faster. In my case,
> >
> > 1: current master without dc592a4155: 14400ms
> > 2: current master: 13960ms (%3 improvement against #1)
> > 3: current master + SIMD: 15123ms (%5 regression against #1 and %8
> > regression against #2)
> >
> > Is it possible for you to do a similar test? I mean dropping
> > dc592a4155 from the current master and re-running the benchmark, that
> > would be helpful.
>
> IMHO as long as the difference from v18 looks reasonable, commit-by-commit
> regressions and improvements that even out in the end are okay.  That's
> perhaps a bit of mental gymnastics (e.g., what if we had committed the
> inlining patch for v18?), but I believe that's how we've dealt with similar
> problems in the past.  But maybe there are ways to avoid even these
> in-development regressions, too...

I agree with you. However, unfortunately, I see a regression on master +
v10 compared to REL_18_3 (62d6c7d3df6).

Thank you Kazar and Manni for the benchmarks in [1] and [2]!

I can still reproduce the regression for the 'wide & CSV 1/3' case
[3] using Manni's benchmark script. I consistently see a ~5%
regression, and I am curious whether I am doing something wrong. I am a
bit surprised because I didn't see this regression before, and Kazar
and Manni don't see any regression in their benchmarks [1] and [2]. I
am still investigating this regression. Hopefully, I will come back
with more information soon.

If anyone has any suggestions/ideas, please let me know!

[1] https://postgr.es/m/CA%2BK2RukFH57QPAfTEzvy7PEyrLzav3HkyCiu-2yqR%2BuW_Niorw%40mail.gmail.com
[2] https://postgr.es/m/CAKWEB6oT5KbyF%2BuRRhjjJi7p2PmRdOzxp3T6vFcN04BCR-%3DB2w%40mail.gmail.com
[3]
1: current master without dc592a4155: 14400ms
2: current master: 13960ms (3% improvement against #1)
3: current master + v10: 15123ms (5% regression against #1 and 8%
regression against #2)

--
Regards,
Nazir Bilal Yavuz
Microsoft







* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-26 12:19  Nazir Bilal Yavuz <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 1 reply; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-02-26 12:19 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Wed, 25 Feb 2026 at 17:24, Nazir Bilal Yavuz <[email protected]> wrote:
>
> I agree with you. However, unfortunately, I see regression on master +
> v10 compared to REL_18_3 (62d6c7d3df6).
>
> Thank you Kazar and Manni for benchmarks in [1] and [2]!

Kazar and Manni, could you please share the build commands you use, if
possible? I see regressions for the inlining patch (dc592a4155) too when
I build postgres with -O2.

My build commands are:

-O2: meson setup --buildtype=debugoptimized ...

-O3: meson setup --buildtype=release ...

These results are for the wide benchmark only (timings in ms); "old
master" means b2ff2a0b529 without the 0001 inlining patch (dc592a4155):

+--------------------------+---------------+---------------+
|                          |      Text     |      CSV      |
+--------------------------+-------+-------+-------+-------+
|            -O2           |  None |  1/3  |  None |  1/3  |
+--------------------------+-------+-------+-------+-------+
|        Old Master        | 10440 | 11000 | 11940 | 13600 |
+--------------------------+-------+-------+-------+-------+
|     Old Master + 0001    | 10140 | 10800 | 11600 | 14300 |
+--------------------------+-------+-------+-------+-------+
| Old Master + 0001 + 0002 |  9000 | 11000 |  8850 | 15300 |
+--------------------------+-------+-------+-------+-------+
|                          |       |       |       |       |
+--------------------------+-------+-------+-------+-------+
|                          |      Text     |      CSV      |
+--------------------------+-------+-------+-------+-------+
|            -O3           |  None |  1/3  |  None |  1/3  |
+--------------------------+-------+-------+-------+-------+
|        Old Master        | 10440 | 11200 | 12200 | 14390 |
+--------------------------+-------+-------+-------+-------+
|     Old Master + 0001    | 10000 | 10700 | 11540 | 13960 |
+--------------------------+-------+-------+-------+-------+
| Old Master + 0001 + 0002 |  8880 | 10900 |  8900 | 15100 |
+--------------------------+-------+-------+-------+-------+

This result shows that when we compare v18 and v18 + SIMD (0001 +
0002), the only regression is in the CSV 1/3 case: 12.5% at -O2 and
about 5% at -O3.
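
For reference, those percentages come from the CSV 1/3 column of the
tables above: 13600 -> 15300 at -O2 and 14390 -> 15100 at -O3. A minimal
sketch of the arithmetic follows; percent_change() is a hypothetical
helper written for this illustration and is not part of the patch or of
the benchmark script.

```c
#include <assert.h>

/*
 * Percentage change of a new timing relative to a baseline timing.
 * A positive result means slower (regression), a negative result means
 * faster (improvement). Hypothetical helper, for illustration only.
 */
double
percent_change(double baseline_ms, double new_ms)
{
	assert(baseline_ms > 0.0);
	return (new_ms - baseline_ms) / baseline_ms * 100.0;
}
```

Calling percent_change(13600, 15300) gives the -O2 figure and
percent_change(14390, 15100) gives the -O3 figure, which is slightly
under 5%.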

--
Regards,
Nazir Bilal Yavuz
Microsoft







* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-26 14:31  KAZAR Ayoub <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: KAZAR Ayoub @ 2026-02-26 14:31 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Manni Wood <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hello,

On Thu, Feb 26, 2026 at 1:19 PM Nazir Bilal Yavuz <[email protected]>
wrote:

> Hi,
>
> On Wed, 25 Feb 2026 at 17:24, Nazir Bilal Yavuz <[email protected]>
> wrote:
> >
> > I agree with you. However, unfortunately, I see regression on master +
> > v10 compared to REL_18_3 (62d6c7d3df6).
> >
> > Thank you Kazar and Manni for benchmarks in [1] and [2]!
>
> Kazar and Manni, if possible could you please share the build commands
> you use? I see regressions for an inlining patch (dc592a4155) too when
> I build postgres with -O2.
>
> My build commands are:
>
> -O2: meson setup buildtype=debugoptimized ...
>
> -O3: meson setup buildtype=release ...

All my builds are with CFLAGS='-O2 -g'

Regards,
Ayoub



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-26 14:36  Manni Wood <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Manni Wood @ 2026-02-26 14:36 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; Nathan Bossart <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Thu, Feb 26, 2026 at 8:31 AM KAZAR Ayoub <[email protected]> wrote:

> Hello,
>
> On Thu, Feb 26, 2026 at 1:19 PM Nazir Bilal Yavuz <[email protected]>
> wrote:
>
>> Hi,
>>
>> On Wed, 25 Feb 2026 at 17:24, Nazir Bilal Yavuz <[email protected]>
>> wrote:
>> >
>> > I agree with you. However, unfortunately, I see regression on master +
>> > v10 compared to REL_18_3 (62d6c7d3df6).
>> >
>> > Thank you Kazar and Manni for benchmarks in [1] and [2]!
>>
>> Kazar and Manni, if possible could you please share the build commands
>> you use? I see regressions for an inlining patch (dc592a4155) too when
>> I build postgres with -O2.
>>
>> My build commands are:
>>
>> -O2: meson setup buildtype=debugoptimized ...
>>
>> -O3: meson setup buildtype=release ...
>
> All my builds are with CFLAGS='-O2 -g'
>
> Regards,
> Ayoub
>

Hello!

I have been building with this command:

meson setup build --prefix=/home/mwood/compiled-pg-instances/${BRANCH}
--buildtype=debugoptimized

And in my notes I have "If I use `--buildtype=debugoptimized` it optimizes
`-O2` and uses `-g`"

Best,
-Manni
-- 
-- Manni Wood EDB: https://www.enterprisedb.com



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-26 15:32  Manni Wood <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Manni Wood @ 2026-02-26 15:32 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; Nathan Bossart <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

I have a thought and a question:

My notes say "If I use `--buildtype=release` it optimizes `-O2` and the
executable contains no debug symbols."

So, seeing as end users will presumably be seeing the performance generated
by `--buildtype=release`, should we be building with that for all
performance testing?

Best,
-Manni

On Thu, Feb 26, 2026 at 8:36 AM Manni Wood <[email protected]>
wrote:

>
>
> On Thu, Feb 26, 2026 at 8:31 AM KAZAR Ayoub <[email protected]> wrote:
>
>> Hello,
>>
>> On Thu, Feb 26, 2026 at 1:19 PM Nazir Bilal Yavuz <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> On Wed, 25 Feb 2026 at 17:24, Nazir Bilal Yavuz <[email protected]>
>>> wrote:
>>> >
>>> > I agree with you. However, unfortunately, I see regression on master +
>>> > v10 compared to REL_18_3 (62d6c7d3df6).
>>> >
>>> > Thank you Kazar and Manni for benchmarks in [1] and [2]!
>>>
>>> Kazar and Manni, if possible could you please share the build commands
>>> you use? I see regressions for an inlining patch (dc592a4155) too when
>>> I build postgres with -O2.
>>>
>>> My build commands are:
>>>
>>> -O2: meson setup buildtype=debugoptimized ...
>>>
>>> -O3: meson setup buildtype=release ...
>>
>> All my builds are with CFLAGS='-O2 -g'
>>
>> Regards,
>> Ayoub
>>
>
> Hello!
>
> I have been building with this command:
>
> meson setup build --prefix=/home/mwood/compiled-pg-instances/${BRANCH}
> --buildtype=debugoptimized
>
> And in my notes I have "If I use `--buildtype=debugoptimized` it optimizes
> `-O2` and uses `-g`"
>
> Best,
> -Manni
> --
> -- Manni Wood EDB: https://www.enterprisedb.com
>


-- 
-- Manni Wood EDB: https://www.enterprisedb.com



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-26 15:51  KAZAR Ayoub <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 0 replies; 59+ messages in thread

From: KAZAR Ayoub @ 2026-02-26 15:51 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; Nathan Bossart <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Thu, Feb 26, 2026 at 4:32 PM Manni Wood <[email protected]>
wrote:

> I have a thought and a question:
>
> My notes say "If I use `--buildtype=release` it optimizes `-O2` and the
> executable contains no debug symbols."
>
That would be `debugoptimized`, not `release`; from [1] I see that
`release` is -O3 with no debug info.

>
> So, seeing as end users will presumably be seeing the performance
> generated by `--buildtype=release`, should we be building with that for all
> performance testing?
>
I know that Debian builds with  'CFLAGS=-g -O2 -flto=auto -ffat-lto-objects
-flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat
-Werror=format-security -fno-omit-frame-pointer'
'LDFLAGS=-Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto
-Wl,-z,relro -Wl,-z,now' ; this is from pg_config for v18.

[1] https://mesonbuild.com/Builtin-options.html

Regards,
Ayoub



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-02 19:55  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 1 reply; 59+ messages in thread

From: Nathan Bossart @ 2026-03-02 19:55 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Feb 25, 2026 at 05:24:27PM +0300, Nazir Bilal Yavuz wrote:
> If anyone has any suggestions/ideas, please let me know!

A couple of random ideas:

* Additional inlining for callers.  I looked around a little bit and didn't
see any great candidates, so I don't have much faith in this, but maybe
you'll see something I don't.

* Disable SIMD if we are consistently getting small rows.  That won't help
your "wide & CSV 1/3" case in all likelihood, but perhaps it'll help with
the regression for narrow rows described elsewhere.

* Surround the variable initializations with "if (simd_enabled)".
Presumably compilers are smart enough to remove those in the non-SIMD paths
already, but it could be worth a try.

* Add simd_enabled function parameter to CopyReadLine(),
NextCopyFromRawFieldsInternal(), and CopyFromTextLikeOneRow(), and do the
bool literal trick in CopyFrom{Text,CSV}OneRow().  That could encourage the
compiler to do some additional optimizations to reduce branching.
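
The "bool literal trick" in the last item can be sketched roughly as
follows. This is a standalone illustration, not PostgreSQL code: the
function names are hypothetical, plain "static inline" stands in for
pg_attribute_always_inline, and the flag shown is is_csv (the
suggestion above would pass simd_enabled the same way). The idea is
that each thin wrapper passes a compile-time constant, so after
inlining the compiler can specialize the shared body and drop the
branches that cannot be taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Shared worker (hypothetical). Counts line terminators; in CSV mode a
 * newline inside double quotes does not end a line.
 */
static inline int
count_lines_internal(const char *buf, size_t len, bool is_csv)
{
	bool		in_quote = false;
	int			lines = 0;

	assert(buf != NULL || len == 0);

	for (size_t i = 0; i < len; i++)
	{
		char		c = buf[i];

		/* With a literal "false" for is_csv this test can be folded away. */
		if (is_csv && c == '"')
			in_quote = !in_quote;
		else if (c == '\n' && !in_quote)
			lines++;
	}
	return lines;
}

/* Thin wrappers: each one passes a literal, giving a specialized copy. */
int
count_lines_text(const char *buf, size_t len)
{
	return count_lines_internal(buf, len, false);
}

int
count_lines_csv(const char *buf, size_t len)
{
	return count_lines_internal(buf, len, true);
}
```

Whether the specialization actually happens depends on the compiler and
on the inlining decision, which is why the real code forces inlining
rather than relying on a plain "inline" hint.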

-- 
nathan


* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-04 15:15  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 2 replies; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-04 15:15 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Mon, 2 Mar 2026 at 22:55, Nathan Bossart <[email protected]> wrote:
>
> On Wed, Feb 25, 2026 at 05:24:27PM +0300, Nazir Bilal Yavuz wrote:
> > If anyone has any suggestions/ideas, please let me know!

I was able to fix the problem. My first assumption was that the
branching in the SIMD code caused it, so I moved the SIMD code into a
CopyReadLineTextSIMDHelper() function and then called this helper at
the top of CopyReadLineText(), so that there is no extra branching in
the non-SIMD (scalar) code path. This didn't solve the problem. I then
realized that even when I disable the SIMD code path with 'if (false)',
there is still a regression, but if I comment out the whole
'if (cstate->simd_enabled)' branch, there is no regression at all.

To find out more, I compared the assembly output of both builds and
found the likely reason. As far as I understand, the compiler can't
keep a variable in a register; instead the variable lives on the stack,
which is slower. Please see the two different assembly outputs:

Slow code:

        c = copy_input_buf[input_buf_ptr++];
     db0:    48 8b 55 b8              mov    -0x48(%rbp),%rdx
     db4:    48 63 c6                 movslq %esi,%rax
     db7:    44 8d 66 01              lea    0x1(%rsi),%r12d
     dbb:    44 89 65 cc              mov    %r12d,-0x34(%rbp)
     dbf:    0f be 14 02              movsbl (%rdx,%rax,1),%edx

Fast code:

        c = copy_input_buf[input_buf_ptr++];
     d80:    49 63 c4                 movslq %r12d,%rax
     d83:    45 8d 5c 24 01           lea    0x1(%r12),%r11d
     d88:    41 0f be 04 06           movsbl (%r14,%rax,1),%eax

The reason is that the address of input_buf_ptr is passed to the
helper, i.e. CopyReadLineTextSIMDHelper(..., &input_buf_ptr). If I
change it to this:

int            temp_input_buf_ptr = input_buf_ptr;
CopyReadLineTextSIMDHelper(..., &temp_input_buf_ptr);

Then there is no regression. However, I am still not completely sure
that this is the same problem as in v10; I am planning to spend more
time debugging this.
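
A standalone sketch of the pattern described above, with hypothetical
function names (this is not the patch code). Once a local's address is
handed to a function that is not inlined, the local needs a memory
location, and the compiler may keep loading and storing it in the hot
loop instead of holding it in a register. Passing the address of a
short-lived copy keeps the loop variable's address private to the
function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper standing in for the SIMD pre-scan: advances *pos
 * over bytes that need no special handling.
 */
void
skip_plain_prefix(const char *buf, int len, int *pos)
{
	while (*pos < len && buf[*pos] != '\n' && buf[*pos] != '\\')
		(*pos)++;
}

/* Returns the index just past the first unescaped newline, or len. */
int
find_line_end(const char *buf, int len)
{
	int			input_buf_ptr = 0;
	int			temp_input_buf_ptr = input_buf_ptr;

	assert(buf != NULL && len >= 0);

	/* Only the temporary copy has its address taken. */
	skip_plain_prefix(buf, len, &temp_input_buf_ptr);
	input_buf_ptr = temp_input_buf_ptr;

	/*
	 * Scalar loop: the address of input_buf_ptr never escapes, so the
	 * compiler is free to keep it in a register here.
	 */
	while (input_buf_ptr < len)
	{
		char		c = buf[input_buf_ptr++];

		if (c == '\\' && input_buf_ptr < len)
			input_buf_ptr++;	/* skip the escaped byte */
		else if (c == '\n')
			break;
	}
	return input_buf_ptr;
}
```

In a toy example this small the helper would probably be inlined and
both variants would compile to the same code; the difference is only
expected to show up with a large caller and an out-of-line helper, as
in the assembly quoted above.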

> A couple of random ideas:
>
> * Additional inlining for callers.  I looked around a little bit and didn't
> see any great candidates, so I don't have much faith in this, but maybe
> you'll see something I don't.

I agree with you. CopyReadLineText() is already quite a big function.

> * Disable SIMD if we are consistently getting small rows.  That won't help
> your "wide & CSV 1/3" case in all likelihood, but perhaps it'll help with
> the regression for narrow rows described elsewhere.

I implemented this; two consecutive small rows disable SIMD.
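
A rough sketch of such a heuristic is below. The field names
simd_enabled and simd_failed_first_vector match the ones the patch adds
to the copy state, but the struct, the function, the 16-byte width and
the exact update rule are assumptions made for this illustration; the
real logic lives in the attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHUNK_SIZE 16			/* assumed vector width in bytes */

/* Hypothetical stand-in for the two flags kept in the copy state. */
typedef struct ScanState
{
	bool		simd_enabled;	/* still worth trying the vector path? */
	bool		simd_failed_first_vector;	/* previous line was too
											 * short for one vector */
} ScanState;

/* Call once per line with that line's length in bytes. */
void
note_line_length(ScanState *st, size_t line_len)
{
	assert(st != NULL);

	if (!st->simd_enabled)
		return;					/* once disabled, stays disabled */

	if (line_len < CHUNK_SIZE)
	{
		/* Second short line in a row: give up on the vector path. */
		if (st->simd_failed_first_vector)
			st->simd_enabled = false;
		st->simd_failed_first_vector = true;
	}
	else
		st->simd_failed_first_vector = false;	/* streak broken */
}
```

A single short line between long ones leaves the vector path enabled;
only two short lines back to back switch it off for the rest of the
input.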

> * Surround the variable initializations with "if (simd_enabled)".
> Presumably compilers are smart enough to remove those in the non-SIMD paths
> already, but it could be worth a try.

Done.

> * Add simd_enabled function parameter to CopyReadLine(),
> NextCopyFromRawFieldsInternal(), and CopyFromTextLikeOneRow(), and do the
> bool literal trick in CopyFrom{Text,CSV}OneRow().  That could encourage the
> compiler to do some additional optimizations to reduce branching.

I think we don't need this. At least the implementation with
CopyReadLineTextSIMDHelper() doesn't need it, since the branch is at
the top and is taken only once per line.

I think v11 looks better compared to v10. I liked the
CopyReadLineTextSIMDHelper() helper function. I also liked it being at
the top of CopyReadLineText(), not being in the scalar path. This
gives us more optimization options without affecting the scalar path.
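
The overall shape (fast pre-scan first, unchanged scalar loop
afterwards) can be sketched in portable C as follows. This is only an
illustration under assumptions: all names are hypothetical, the chunk
width is assumed to be 16 bytes, and the byte loop in
chunk_has_special() stands in for the vector comparisons that the patch
performs with the Vector8 helpers from port/simd.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHUNK_SIZE 16			/* assumed vector width in bytes */

/* Stand-in for a vector compare: does this chunk hold a special byte? */
static bool
chunk_has_special(const char *p)
{
	for (int i = 0; i < CHUNK_SIZE; i++)
	{
		if (p[i] == '\n' || p[i] == '\r' || p[i] == '\\')
			return true;
	}
	return false;
}

/* Fast path: skip whole chunks that contain no special bytes. */
size_t
skip_clean_chunks(const char *buf, size_t len, size_t pos)
{
	while (len - pos >= CHUNK_SIZE && !chunk_has_special(buf + pos))
		pos += CHUNK_SIZE;
	return pos;
}

/* Returns the offset of the first '\n', or len if there is none. */
size_t
find_newline(const char *buf, size_t len, bool fast_path_enabled)
{
	size_t		pos = 0;

	assert(buf != NULL || len == 0);

	/* The only fast-path branch, evaluated once per line. */
	if (fast_path_enabled)
		pos = skip_clean_chunks(buf, len, pos);

	/* Scalar loop, identical whether or not the fast path ran. */
	while (pos < len && buf[pos] != '\n')
		pos++;
	return pos;
}
```

Because the fast path only moves the starting position forward, the
scalar loop does not need to know whether it ran, which is what keeps
the scalar path free of extra branches.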

Here are the new benchmark results (timings in ms). I benchmarked the
changes with both -O2 and -O3, and both with and without the commit that
changes default_toast_compression to lz4 (65def42b1d5). The results show
no regression, and the improvement is much bigger with 65def42b1d5: for
wide rows without special characters it is roughly 1.7x for the text
format and more than 2x for the CSV format.

------------------------------

Benchmark results:

With 65def42b1d5:

+---------------------------------------------------------+
|                    Optimization: -O2                    |
+--------------------------+--------------+---------------+
|                          |     Text     |      CSV      |
+--------------------------+------+-------+-------+-------+
|           WIDE           | None |  1/3  |  None |  1/3  |
+--------------------------+------+-------+-------+-------+
|        Old Master        | 4220 |  4780 |  5930 |  8250 |
+--------------------------+------+-------+-------+-------+
| Old Master + 0001 + 0002 | 2520 |  4500 |  2520 |  7800 |
+--------------------------+------+-------+-------+-------+
|                          |      |       |       |       |
+--------------------------+------+-------+-------+-------+
|                          |     Text     |      CSV      |
+--------------------------+------+-------+-------+-------+
|          NARROW          | None |  1/3  |  None |  1/3  |
+--------------------------+------+-------+-------+-------+
|        Old Master        | 9920 | 10100 | 10200 | 10470 |
+--------------------------+------+-------+-------+-------+
| Old Master + 0001 + 0002 | 9970 | 10000 | 10180 | 10350 |
+--------------------------+------+-------+-------+-------+
|                                                         |
+---------------------------------------------------------+
|                                                         |
+---------------------------------------------------------+
|                    Optimization: -O3                    |
+--------------------------+--------------+---------------+
|                          |     Text     |      CSV      |
+--------------------------+------+-------+-------+-------+
|           WIDE           | None |  1/3  |  None |  1/3  |
+--------------------------+------+-------+-------+-------+
|        Old Master        | 4100 |  4900 |  6200 |  8300 |
+--------------------------+------+-------+-------+-------+
| Old Master + 0001 + 0002 | 2470 |  4440 |  2570 |  7700 |
+--------------------------+------+-------+-------+-------+
|                          |      |       |       |       |
+--------------------------+------+-------+-------+-------+
|                          |     Text     |      CSV      |
+--------------------------+------+-------+-------+-------+
|          NARROW          | None |  1/3  |  None |  1/3  |
+--------------------------+------+-------+-------+-------+
|        Old Master        | 9530 |  9690 |  9800 | 10080 |
+--------------------------+------+-------+-------+-------+
| Old Master + 0001 + 0002 | 9350 |  9450 |  9700 | 10000 |
+--------------------------+------+-------+-------+-------+

------------------------------

Without 65def42b1d5:

+----------------------------------------------------------+
|                     Optimization: -O2                    |
+--------------------------+---------------+---------------+
|                          |      Text     |      CSV      |
+--------------------------+-------+-------+-------+-------+
|           WIDE           |  None |  1/3  |  None |  1/3  |
+--------------------------+-------+-------+-------+-------+
|        Old Master        | 10550 | 11030 | 12250 | 14400 |
+--------------------------+-------+-------+-------+-------+
| Old Master + 0001 + 0002 |  8890 | 10700 |  8870 | 14070 |
+--------------------------+-------+-------+-------+-------+
|                          |       |       |       |       |
+--------------------------+-------+-------+-------+-------+
|                          |      Text     |      CSV      |
+--------------------------+-------+-------+-------+-------+
|          NARROW          |  None |  1/3  |  None |  1/3  |
+--------------------------+-------+-------+-------+-------+
|        Old Master        |  9921 | 10205 | 10123 | 10420 |
+--------------------------+-------+-------+-------+-------+
| Old Master + 0001 + 0002 |  9880 | 10070 | 10150 | 10400 |
+--------------------------+-------+-------+-------+-------+
|                                                          |
+----------------------------------------------------------+
|                                                          |
+----------------------------------------------------------+
|                     Optimization: -O3                    |
+--------------------------+---------------+---------------+
|                          |      Text     |      CSV      |
+--------------------------+-------+-------+-------+-------+
|           WIDE           |  None |  1/3  |  None |  1/3  |
+--------------------------+-------+-------+-------+-------+
|        Old Master        | 10500 | 11100 | 12600 | 14580 |
+--------------------------+-------+-------+-------+-------+
| Old Master + 0001 + 0002 |  8900 | 10660 |  8860 | 13990 |
+--------------------------+-------+-------+-------+-------+
|                          |       |       |       |       |
+--------------------------+-------+-------+-------+-------+
|                          |      Text     |      CSV      |
+--------------------------+-------+-------+-------+-------+
|          NARROW          |  None |  1/3  |  None |  1/3  |
+--------------------------+-------+-------+-------+-------+
|        Old Master        |  9600 |  9700 |  9800 | 10150 |
+--------------------------+-------+-------+-------+-------+
| Old Master + 0001 + 0002 |  9300 |  9470 |  9600 |  9880 |
+--------------------------+-------+-------+-------+-------+

--
Regards,
Nazir Bilal Yavuz
Microsoft


Attachments:

  [text/x-patch] v11-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (9.4K, 2-v11-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
  download | inline diff:
From 7acaeb3201ae4ae279bf8b25641bea7f8cb92cbe Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Wed, 4 Mar 2026 17:28:54 +0300
Subject: [PATCH v11] Speed up COPY FROM text/CSV parsing using SIMD

This patch disables SIMD when SIMD encounters a special character which
is neither EOF nor EOL.

Author: Shinya Kato <[email protected]>
Author: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Kazar Ayoub <[email protected]>
Reviewed-by: Nathan Bossart <[email protected]>
Reviewed-by: Neil Conway <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Manni Wood <[email protected]>
Reviewed-by: Mark Wong <[email protected]>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
---
 src/backend/commands/copyfrom.c          |   4 +
 src/backend/commands/copyfromparse.c     | 222 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h |   4 +
 3 files changed, 223 insertions(+), 7 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2f42f55e229..2aa52810ff1 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1747,6 +1747,10 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
 
+	/* Initialize SIMD */
+	cstate->simd_enabled = true;
+	cstate->simd_failed_first_vector = false;
+
 	/*
 	 * Allocate buffers for the input pipeline.
 	 *
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index fbd13353efc..70e1a5a0410 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 
@@ -158,6 +159,12 @@ static pg_attribute_always_inline bool NextCopyFromRawFieldsInternal(CopyFromSta
 																	 int *nfields,
 																	 bool is_csv);
 
+/* SIMD functions */
+#ifndef USE_NO_SIMD
+static bool CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
+									   bool *temp_hit_eof, int *temp_input_buf_ptr);
+#endif
+
 
 /* Low-level communications functions */
 static int	CopyGetData(CopyFromState cstate, void *databuf,
@@ -1310,6 +1317,182 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 	return result;
 }
 
+#ifndef USE_NO_SIMD
+/*
+ * Use SIMD instructions to efficiently scan the input buffer for special
+ * characters (e.g., newline, carriage return, quote, and escape). This is
+ * faster than byte-by-byte iteration, especially on large buffers.
+ *
+ * Note that SIMD may become slower when the input contains many special
+ * characters. To avoid this regression, we disable SIMD for the rest of the
+ * input once we encounter a special character that is neither EOF nor EOL.
+ * SIMD is also disabled after two consecutive lines that end within their
+ * first full-sized vector, i.e. lines too short to skip a whole vector.
+ */
+static bool
+CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv, bool *temp_hit_eof, int *temp_input_buf_ptr)
+{
+	char		quotec = '\0';
+	char		escapec = '\0';
+	char	   *copy_input_buf;
+	int			input_buf_ptr;
+	int			copy_buf_len;
+	bool		result = false;
+	bool		unique_escapec = false;
+	bool		first_vector = true;
+	Vector8		nl = vector8_broadcast('\n');
+	Vector8		cr = vector8_broadcast('\r');
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		quote = vector8_broadcast(0);
+	Vector8		escape = vector8_broadcast(0);
+
+	if (is_csv)
+	{
+		quotec = cstate->opts.quote[0];
+		escapec = cstate->opts.escape[0];
+
+		quote = vector8_broadcast(quotec);
+		if (quotec != escapec)
+		{
+			unique_escapec = true;
+			escape = vector8_broadcast(escapec);
+		}
+	}
+
+	/* For a little extra speed we copy these into local variables */
+	copy_input_buf = cstate->input_buf;
+	input_buf_ptr = cstate->input_buf_index;
+	copy_buf_len = cstate->input_buf_len;
+
+	while (true)
+	{
+		/* Load more data if needed */
+		if (sizeof(Vector8) >= copy_buf_len - input_buf_ptr)
+		{
+			REFILL_LINEBUF;
+
+			CopyLoadInputBuf(cstate);
+			/* update our local variables */
+			*temp_hit_eof = cstate->input_reached_eof;
+			input_buf_ptr = cstate->input_buf_index;
+			copy_buf_len = cstate->input_buf_len;
+
+			/*
+			 * If we are completely out of data, break out of the loop,
+			 * reporting EOF.
+			 */
+			if (INPUT_BUF_BYTES(cstate) <= 0)
+			{
+				result = true;
+				break;
+			}
+		}
+
+		if (copy_buf_len - input_buf_ptr > sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match = vector8_broadcast(0);
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+			if (is_csv)
+			{
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				match = vector8_or(match, vector8_eq(chunk, quote));
+				if (unique_escapec)
+					match = vector8_or(match, vector8_eq(chunk, escape));
+			}
+			else
+			{
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				match = vector8_or(match, vector8_eq(chunk, bs));
+			}
+
+			/* Check if we found any special characters */
+			if (vector8_is_highbit_set(match))
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				uint32		mask;
+				int			advance;
+				char		c1,
+							c2;
+				bool		simd_hit_eol,
+							simd_hit_eof;
+
+				mask = vector8_highbit_mask(match);
+				advance = pg_rightmost_one_pos32(mask);
+
+				input_buf_ptr += advance;
+				c1 = copy_input_buf[input_buf_ptr];
+
+				/*
+				 * Since we stopped within the chunk and ((copy_buf_len -
+				 * input_buf_ptr) > sizeof(Vector8)) is true,
+				 * copy_input_buf[input_buf_ptr + 1] is guaranteed to be
+				 * readable.
+				 */
+				c2 = copy_input_buf[input_buf_ptr + 1];
+
+				simd_hit_eof = (c1 == '\\' && c2 == '.' && !is_csv);
+				simd_hit_eol = (c1 == '\r' || c1 == '\n');
+
+				/*
+				 * Do not disable SIMD when we hit EOL or EOF characters. In
+				 * practice, it does not matter for EOF because parsing ends
+				 * there, but we keep the behavior consistent.
+				 */
+				if (!(simd_hit_eof || simd_hit_eol))
+					cstate->simd_enabled = false;
+
+				/*
+				 * We encountered an EOL or EOF in the first vector. This means
+				 * the line is not long enough to skip a full-sized vector. If
+				 * this happens twice in a row, disable SIMD for the rest of
+				 * the input.
+				 */
+				if (first_vector)
+				{
+					if (cstate->simd_failed_first_vector)
+						cstate->simd_enabled = false;
+
+					cstate->simd_failed_first_vector = true;
+				}
+
+				break;
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				input_buf_ptr += sizeof(Vector8);
+				first_vector = false;
+			}
+		}
+
+		/*
+		 * Even after loading more data, there are not enough bytes left to
+		 * fill a full-sized vector. This does not mean that we encountered
+		 * a line too short to fill a full-sized vector.
+		 *
+		 * Scalar code will handle the rest of this line. Then, SIMD will
+		 * continue from the next line.
+		 */
+		else
+		{
+			first_vector = false;
+			break;
+		}
+	}
+
+	cstate->simd_failed_first_vector = first_vector;
+	*temp_input_buf_ptr = input_buf_ptr;
+	return result;
+}
+#endif
+
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
@@ -1338,6 +1521,38 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			escapec = '\0';
 	}
 
+	/* input_buf_ptr will be used in the SIMD Helper function */
+	input_buf_ptr = cstate->input_buf_index;
+
+#ifndef USE_NO_SIMD
+	/* First try to run SIMD, then continue with the scalar path */
+	if (cstate->simd_enabled)
+	{
+		int			temp_input_buf_ptr = input_buf_ptr;
+		bool		temp_hit_eof = false;
+
+		result = CopyReadLineTextSIMDHelper(cstate, is_csv, &temp_hit_eof,
+											&temp_input_buf_ptr);
+		input_buf_ptr = temp_input_buf_ptr;
+		hit_eof = temp_hit_eof;
+
+		/* Short exit from SIMD */
+		if (result)
+		{
+			/*
+			 * Transfer any still-uncopied data to line_buf.
+			 */
+			REFILL_LINEBUF;
+
+			return result;
+		}
+	}
+#endif
+
+	/* For a little extra speed we copy these into local variables */
+	copy_input_buf = cstate->input_buf;
+	copy_buf_len = cstate->input_buf_len;
+
 	/*
 	 * The objective of this loop is to transfer the entire next input line
 	 * into line_buf.  Hence, we only care for detecting newlines (\r and/or
@@ -1359,14 +1574,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	 * character to examine; any characters from input_buf_index to
 	 * input_buf_ptr have been determined to be part of the line, but not yet
 	 * transferred to line_buf.
-	 *
-	 * For a little extra speed within the loop, we copy input_buf and
-	 * input_buf_len into local variables.
 	 */
-	copy_input_buf = cstate->input_buf;
-	input_buf_ptr = cstate->input_buf_index;
-	copy_buf_len = cstate->input_buf_len;
-
 	for (;;)
 	{
 		int			prev_raw_ptr;
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index f892c343157..4a748df8ac8 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -89,6 +89,10 @@ typedef struct CopyFromStateData
 	const char *cur_attval;		/* current att value for error messages */
 	bool		relname_only;	/* don't output line number, att, etc. */
 
+	/* SIMD variables */
+	bool		simd_enabled;
+	bool		simd_failed_first_vector;
+
 	/*
 	 * Working state
 	 */
-- 
2.47.3
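
For readers skimming the patch, the core of CopyReadLineTextSIMDHelper()
is "build a per-byte match mask for one chunk, then jump to the lowest
set bit". Below is a minimal portable C sketch of that idea. It is not
the patch's code: it uses a plain loop instead of the Vector8 helpers
from port/simd.h, it assumes a 16-byte chunk, and CHUNK, special_mask(),
lowest_set_bit() and skip_plain_chunks() are hypothetical names chosen
only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK 16				/* assumed vector width in bytes */

/* Set bit i of the result when chunk[i] equals one of the special bytes. */
static uint32_t
special_mask(const unsigned char *chunk, char a, char b, char c)
{
	uint32_t	mask = 0;

	for (int i = 0; i < CHUNK; i++)
	{
		unsigned char ch = chunk[i];

		if (ch == (unsigned char) a ||
			ch == (unsigned char) b ||
			ch == (unsigned char) c)
			mask |= (uint32_t) 1 << i;
	}
	return mask;
}

/* Position of the lowest set bit; the mask must be nonzero. */
static int
lowest_set_bit(uint32_t mask)
{
	int			pos = 0;

	assert(mask != 0);
	while ((mask & 1) == 0)
	{
		mask >>= 1;
		pos++;
	}
	return pos;
}

/*
 * Skip whole chunks that contain no special byte.  Returns the offset of
 * the first special byte found by the chunked scan, or the offset at which
 * fewer than CHUNK + 1 bytes remain (that tail is left to scalar code).
 */
static size_t
skip_plain_chunks(const char *buf, size_t len, char a, char b, char c)
{
	size_t		pos = 0;

	/* Like the patch, require strictly more than one chunk of input. */
	while (len - pos > CHUNK)
	{
		uint32_t	mask = special_mask((const unsigned char *) buf + pos,
										a, b, c);

		if (mask != 0)
			return pos + (size_t) lowest_set_bit(mask);
		pos += CHUNK;
	}
	return pos;
}
```

The tail that does not fill a whole chunk is deliberately left to the
caller, in the same way the patch hands the remainder of the line to the
scalar loop in CopyReadLineText().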




* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-05 21:25  Andrew Dunstan <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 1 reply; 59+ messages in thread

From: Andrew Dunstan @ 2026-03-05 21:25 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers


On 2026-03-04 We 10:15 AM, Nazir Bilal Yavuz wrote:
> Hi,
>
> On Mon, 2 Mar 2026 at 22:55, Nathan Bossart <[email protected]> wrote:
>> On Wed, Feb 25, 2026 at 05:24:27PM +0300, Nazir Bilal Yavuz wrote:
>>> If anyone has any suggestions/ideas, please let me know!
> I was able to fix the problem. My first assumption was that the
> branching in the SIMD code caused it, so I moved the SIMD code into
> the CopyReadLineTextSIMDHelper() function. Then I moved the call to
> CopyReadLineTextSIMDHelper() to the top of CopyReadLineText(), so
> that there is no branching in the non-SIMD (scalar) code path.
> This didn't solve the problem. I then realized that even when I
> disable the SIMD code path with 'if (false)', there is still a
> regression, but if I comment out the whole
> 'if (cstate->simd_enabled)' branch, there is no regression at all.
>
> To find out more, I compared the assembly output of both and found a
> possible reason. As far as I understand, the compiler can't promote a
> variable to a register; instead the variable lives on the stack,
> which is slower. Please see the two different assembly outputs:
>
> Slow code:
>
>          c = copy_input_buf[input_buf_ptr++];
>       db0:    48 8b 55 b8              mov    -0x48(%rbp),%rdx
>       db4:    48 63 c6                 movslq %esi,%rax
>       db7:    44 8d 66 01              lea    0x1(%rsi),%r12d
>       dbb:    44 89 65 cc              mov    %r12d,-0x34(%rbp)
>       dbf:    0f be 14 02              movsbl (%rdx,%rax,1),%edx
>
> Fast code:
>
>          c = copy_input_buf[input_buf_ptr++];
>       d80:    49 63 c4                 movslq %r12d,%rax
>       d83:    45 8d 5c 24 01           lea    0x1(%r12),%r11d
>       d88:    41 0f be 04 06           movsbl (%r14,%rax,1),%eax
>
> The reason is passing the address of input_buf_ptr to
> CopyReadLineTextSIMDHelper(..., &input_buf_ptr). If I change it to
> this:
>
> int            temp_input_buf_ptr = input_buf_ptr;
> CopyReadLineTextSIMDHelper(..., &temp_input_buf_ptr);
>
> Then there is no regression. However, I am still not completely sure
> whether that is the same problem as in v10; I am planning to spend
> more time debugging this.
>
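
The temporary-variable trick described above can be shown in isolation.
This is only a sketch of the source-level pattern, with hypothetical
names (skip_spaces(), count_x()); whether a given compiler keeps the
index in a register depends on the compiler, the optimization level and
inlining decisions, so the assembly difference quoted above should not
be assumed to reproduce everywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper that reports its progress through a pointer. */
static void
skip_spaces(const char *buf, int len, int *pos)
{
	assert(pos != NULL);
	while (*pos < len && buf[*pos] == ' ')
		(*pos)++;
}

/* Count the 'x' bytes that follow any leading spaces. */
static int
count_x(const char *buf, int len)
{
	int			pos = 0;
	int			count = 0;

	/*
	 * Hand the helper a short-lived temporary instead of &pos, so the
	 * address of the hot-loop index is never taken.
	 */
	int			tmp_pos = pos;

	skip_spaces(buf, len, &tmp_pos);
	pos = tmp_pos;

	/* Hot loop: pos is only ever read and written as a plain local. */
	while (pos < len)
	{
		if (buf[pos++] == 'x')
			count++;
	}
	return count;
}
```

The only design point is that the variable used in the per-byte loop is
a different object from the one whose address is passed to the helper.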
>> A couple of random ideas:
>>
>> * Additional inlining for callers.  I looked around a little bit and didn't
>> see any great candidates, so I don't have much faith in this, but maybe
>> you'll see something I don't.
> I agree with you. CopyReadLineText() is already quite a big function.
>
>> * Disable SIMD if we are consistently getting small rows.  That won't help
>> your "wide & CSV 1/3" case in all likelihood, but perhaps it'll help with
>> the regression for narrow rows described elsewhere.
> I implemented this: two consecutive small rows disable SIMD.
>
>> * Surround the variable initializations with "if (simd_enabled)".
>> Presumably compilers are smart enough to remove those in the non-SIMD paths
>> already, but it could be worth a try.
> Done.
>
>> * Add simd_enabled function parameter to CopyReadLine(),
>> NextCopyFromRawFieldsInternal(), and CopyFromTextLikeOneRow(), and do the
>> bool literal trick in CopyFrom{Text,CSV}OneRow().  That could encourage the
>> compiler to do some additional optimizations to reduce branching.
> I think we don't need this. At least the implementation with
> CopyReadLineTextSIMDHelper() doesn't need this since branching will be
> at the top and it will be once per line.
>
> I think v11 looks better compared to v10. I liked the
> CopyReadLineTextSIMDHelper() helper function. I also liked it being at
> the top of CopyReadLineText(), not being in the scalar path. This
> gives us more optimization options without affecting the scalar path.
>
> Here are the new benchmark results. I benchmarked the changes with
> both -O2 and -O3, and both with and without the 'changing
> default_toast_compression to lz4' commit (65def42b1d5). The results
> show that there is no regression and that the performance
> improvement is much bigger with 65def42b1d5: it is close to 2x for
> the text format and more than 2x for the CSV format.


I spent some time exploring different ideas for improving this, but 
found none that didn't cause regression in some cases, so good to go 
from my POV.


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com







* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-06 16:59  Manni Wood <[email protected]>
  parent: Andrew Dunstan <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Manni Wood @ 2026-03-06 16:59 UTC (permalink / raw)
  To: Andrew Dunstan <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; Nathan Bossart <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hello.

I ran Nazir's v11 patch on my x86 tower PC and my arm raspberry pi using
the same build I've been using: meson with "debugoptimized", which
translates to "-g -O2" gcc flags.

x86 NARROW old master (18bcdb75)
TXT :                 25909.060500 ms
CSV :                 28137.591250 ms
TXT with 1/3 escapes: 27794.177000 ms
CSV with 1/3 quotes:  34541.704750 ms

x86 NARROW v10
TXT :                 26416.331500 ms  -1.957890% regression
CSV :                 25318.727500 ms  10.018142% improvement
TXT with 1/3 escapes: 28608.007500 ms  -2.928061% regression
CSV with 1/3 quotes:  32805.627750 ms  5.026032% improvement

x86 NARROW v11
TXT :                 27212.945750 ms  -5.032545% regression
CSV :                 26985.971250 ms  4.092817% improvement
TXT with 1/3 escapes: 27216.510000 ms  2.078374% improvement
CSV with 1/3 quotes:  32817.267500 ms  4.992334% improvement


x86 WIDE old master (18bcdb75)
TXT :                 28778.426500 ms
CSV :                 35671.908000 ms
TXT with 1/3 escapes: 32441.549750 ms
CSV with 1/3 quotes:  47024.416000 ms

x86 WIDE v10
TXT :                 23067.046750 ms  19.846046% improvement
CSV :                 23259.092250 ms  34.797174% improvement
TXT with 1/3 escapes: 31796.098250 ms  1.989583% improvement
CSV with 1/3 quotes:  42925.792250 ms  8.715948% improvement

x86 WIDE v11
TXT :                 22571.305750 ms  21.568659% improvement
CSV :                 22711.524750 ms  36.332184% improvement
TXT with 1/3 escapes: 29236.453000 ms  9.879604% improvement
CSV with 1/3 quotes:  40022.110750 ms  14.890786% improvement



arm NARROW old master (18bcdb75)
TXT :                 10997.568250 ms
CSV :                 10797.549000 ms
TXT with 1/3 escapes: 10299.047000 ms
CSV with 1/3 quotes:  12559.385750 ms

arm NARROW v10
TXT :                 10467.816750 ms  4.816988% improvement
CSV :                 9986.288000 ms  7.513381% improvement
TXT with 1/3 escapes: 10323.173750 ms  -0.234262% regression
CSV with 1/3 quotes:  11843.611750 ms  5.699116% improvement

arm NARROW v11
TXT :                 10340.966250 ms  5.970429% improvement
CSV :                 10224.399500 ms  5.308144% improvement
TXT with 1/3 escapes: 10438.216750 ms  -1.351288% regression
CSV with 1/3 quotes:  11865.934000 ms  5.521383% improvement


arm WIDE old master (18bcdb75)
TXT :                 11825.771250 ms
CSV :                 13907.074000 ms
TXT with 1/3 escapes: 13430.691250 ms
CSV with 1/3 quotes:  17557.954500 ms

arm WIDE v10
TXT :                 9064.959000 ms  23.345727% improvement
CSV :                 9019.553250 ms  35.144134% improvement
TXT with 1/3 escapes: 12344.497250 ms  8.087402% improvement
CSV with 1/3 quotes:  15495.863750 ms  11.744482% improvement

arm WIDE v11
TXT :                 9001.442250 ms  23.882831% improvement
CSV :                 8940.928750 ms  35.709490% improvement
TXT with 1/3 escapes: 12049.668500 ms  10.282589% improvement
CSV with 1/3 quotes:  15277.843250 ms  12.986201% improvement
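
The percentages in these tables are consistent with the relative change
against the old-master timing, (master - patched) / master * 100, where a
negative value is a regression. A minimal sketch of that calculation,
with improvement_pct() as a hypothetical helper name:

```c
#include <assert.h>

/*
 * Relative improvement of patched_ms over baseline_ms, in percent.
 * Positive means the patched build was faster; negative means slower.
 */
static double
improvement_pct(double baseline_ms, double patched_ms)
{
	assert(baseline_ms > 0.0);
	return (baseline_ms - patched_ms) / baseline_ms * 100.0;
}
```

For example, the x86 WIDE TXT pair (28778.426500 ms on master,
22571.305750 ms with v11) gives roughly 21.57, in line with the figure
listed above.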

Best,

-Manni

-- 
-- Manni Wood EDB: https://www.enterprisedb.com



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-06 17:39  Nazir Bilal Yavuz <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-06 17:39 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Nathan Bossart <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Fri, 6 Mar 2026 at 20:00, Manni Wood <[email protected]> wrote:
>
> Hello.
>
> I ran Nazir's v11 patch on my x86 tower PC and my arm raspberry pi using the same build I've been using: meson with "debugoptimized", which translates to "-g -O2" gcc flags.

Thanks for the benchmark! The results look nice.

One question: does your benchmark include the 34dfca2934 LZ4 commit,
and is LZ4 enabled on your system?

-- 
Regards,
Nazir Bilal Yavuz
Microsoft






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-06 18:13  Manni Wood <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Manni Wood @ 2026-03-06 18:13 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Nathan Bossart <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Fri, Mar 6, 2026 at 11:39 AM Nazir Bilal Yavuz <[email protected]>
wrote:

> Hi,
>
> On Fri, 6 Mar 2026 at 20:00, Manni Wood <[email protected]>
> wrote:
> >
> > Hello.
> >
> > I ran Nazir's v11 patch on my x86 tower PC and my arm raspberry pi using
> the same build I've been using: meson with "debugoptimized", which
> translates to "-g -O2" gcc flags.
>
> Thanks for the benchmark! The results look nice.
>
> One question: does your benchmark include the 34dfca2934 LZ4 commit,
> and is LZ4 enabled on your system?
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft
>

Hello, Nazir!

When I ran `meson setup build --buildtype=debugoptimized` on both my x86
machine and my arm machine, the response on both was:

"External libraries
"  lz4                      : NO"

However, I did not remove commit 34dfca2934 from any of my Postgres builds;
I left that commit in place.

Let me know if that helps!

And I agree, the results look nice.

Best,

-Manni
-- 
-- Manni Wood EDB: https://www.enterprisedb.com



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-06 18:55  Nazir Bilal Yavuz <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-06 18:55 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Nathan Bossart <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi Manni!

On Fri, 6 Mar 2026 at 21:13, Manni Wood <[email protected]> wrote:
>
> When I ran `meson setup build --buildtype=debugoptimized` on both my x86 machine and my arm machine, the response on both was:
>
> "External libraries
> "  lz4                      : NO"
>
> However, I did not remove commit 34dfca2934 from any of my Postgres builds; I left that commit in place.
>
> Let me know if that helps!

That definitely helps, thanks! If you have a chance, could you also
run the benchmark with LZ4 enabled? I expect you may see significantly
better performance, similar to what I observed.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-06 21:25  Manni Wood <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Manni Wood @ 2026-03-06 21:25 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Nathan Bossart <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Fri, Mar 6, 2026 at 12:55 PM Nazir Bilal Yavuz <[email protected]>
wrote:

> Hi Manni!
>
> On Fri, 6 Mar 2026 at 21:13, Manni Wood <[email protected]>
> wrote:
> >
> > When I ran `meson setup build --buildtype=debugoptimized` on both my x86
> machine and my arm machine, the response on both was:
> >
> > "External libraries
> > "  lz4                      : NO"
> >
> > However, I did not remove commit 34dfca2934 from any of my Postgres
> builds; I left that commit in place.
> >
> > Let me know if that helps!
>
> That definitely helps, thanks! If you have a chance, could you also
> run the benchmark with LZ4 enabled? I expect you may see significantly
> better performance, similar to what I observed.
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft
>

Hi, Nazir.

Well, golly! Look at these numbers. Old master with no lz4, your v11 patch
with no lz4, and then your v11 patch with lz4 compiled in.

(Aside: I assume everything is good for my lz4 build after installing the
lz4 dev library and seeing this with my meson config:

"  External libraries
"    lz4                      : YES 1.9.4"

and this from the db itself:

$ ./bin/psql -U mwood -d postgres -c 'show default_toast_compression'
 default_toast_compression
---------------------------
 lz4
)

x86 NARROW old master
TXT :                 25909.060500 ms
CSV :                 28137.591250 ms
TXT with 1/3 escapes: 27794.177000 ms
CSV with 1/3 quotes:  34541.704750 ms

x86 NARROW v11
TXT :                 27212.945750 ms  -5.032545% regression
CSV :                 26985.971250 ms  4.092817% improvement
TXT with 1/3 escapes: 27216.510000 ms  2.078374% improvement
CSV with 1/3 quotes:  32817.267500 ms  4.992334% improvement

x86 NARROW v11 lz4
TXT :                 26471.776500 ms  -2.171889% regression
CSV :                 25607.026250 ms  8.993538% improvement
TXT with 1/3 escapes: 28628.729750 ms  -3.002617% regression
CSV with 1/3 quotes:  34729.006750 ms  -0.542249% regression


x86 WIDE old master
TXT :                 28778.426500 ms
CSV :                 35671.908000 ms
TXT with 1/3 escapes: 32441.549750 ms
CSV with 1/3 quotes:  47024.416000 ms

x86 WIDE v11
TXT :                 22571.305750 ms  21.568659% improvement
CSV :                 22711.524750 ms  36.332184% improvement
TXT with 1/3 escapes: 29236.453000 ms  9.879604% improvement
CSV with 1/3 quotes:  40022.110750 ms  14.890786% improvement

x86 WIDE v11 lz4
TXT :                 8032.912750 ms  72.087033% improvement
CSV :                 8047.098000 ms  77.441358% improvement
TXT with 1/3 escapes: 15428.139500 ms  52.443272% improvement
CSV with 1/3 quotes:  27517.084500 ms  41.483410% improvement



arm NARROW old master
TXT :                 10997.568250 ms
CSV :                 10797.549000 ms
TXT with 1/3 escapes: 10299.047000 ms
CSV with 1/3 quotes:  12559.385750 ms

arm NARROW v11
TXT :                 10340.966250 ms  5.970429% improvement
CSV :                 10224.399500 ms  5.308144% improvement
TXT with 1/3 escapes: 10438.216750 ms  -1.351288% regression
CSV with 1/3 quotes:  11865.934000 ms  5.521383% improvement

arm NARROW v11 lz4
TXT :                 9783.737000 ms  11.037270% improvement
CSV :                 10122.890750 ms  6.248254% improvement
TXT with 1/3 escapes: 10298.780250 ms  0.002590% improvement
CSV with 1/3 quotes:  11738.992250 ms  6.532115% improvement


arm WIDE old master
TXT :                 11825.771250 ms
CSV :                 13907.074000 ms
TXT with 1/3 escapes: 13430.691250 ms
CSV with 1/3 quotes:  17557.954500 ms

arm WIDE v11
TXT :                 9001.442250 ms  23.882831% improvement
CSV :                 8940.928750 ms  35.709490% improvement
TXT with 1/3 escapes: 12049.668500 ms  10.282589% improvement
CSV with 1/3 quotes:  15277.843250 ms  12.986201% improvement

arm WIDE v11 lz4
TXT :                 3186.825500 ms  73.051859% improvement
CSV :                 3142.526500 ms  77.403396% improvement
TXT with 1/3 escapes: 6180.176000 ms  53.984677% improvement
CSV with 1/3 quotes:  9460.505500 ms  46.118407% improvement

Cheers,

-Manni

-- 
-- Manni Wood EDB: https://www.enterprisedb.com



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-06 23:13  Nathan Bossart <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Nathan Bossart @ 2026-03-06 23:13 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; Andrew Dunstan <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Fri, Mar 06, 2026 at 03:25:46PM -0600, Manni Wood wrote:
> Well, golly! Look at these numbers. Old master with no lz4, your v11 patch
> with no lz4, and then your v11 patch with lz4 compiled in.

I'm appreciative of all the benchmarking that you and others are doing, but
wouldn't we be more interested in the difference between "old master with
lz4" and "v11 with lz4"?  Else, we have multiple variables in play.

-- 
nathan






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-06 23:31  KAZAR Ayoub <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: KAZAR Ayoub @ 2026-03-06 23:31 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; Nazir Bilal Yavuz <[email protected]>; Andrew Dunstan <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Sat, Mar 7, 2026 at 12:13 AM Nathan Bossart <[email protected]>
wrote:

> On Fri, Mar 06, 2026 at 03:25:46PM -0600, Manni Wood wrote:
> > Well, golly! Look at these numbers. Old master with no lz4, your v11
> patch
> > with no lz4, and then your v11 patch with lz4 compiled in.
>
> I'm appreciative of all the benchmarking that you and others are doing, but
> wouldn't we be more interested in the difference between "old master with
> lz4" and "v11 with lz4"?  Else, we have multiple variables in play.
>
Yes, I agree, because the lz4 effect doesn't prove anything about the SIMD
patch itself, right? So a comparison for the SIMD effect should be
"master with/out lz4 vs patched with/out lz4, respectively", and nothing
more. Is this correct?

Regards,
Ayoub



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-08 10:31  Nazir Bilal Yavuz <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-08 10:31 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Sat, 7 Mar 2026 at 02:31, KAZAR Ayoub <[email protected]> wrote:
>
> On Sat, Mar 7, 2026 at 12:13 AM Nathan Bossart <[email protected]> wrote:
>>
>> On Fri, Mar 06, 2026 at 03:25:46PM -0600, Manni Wood wrote:
>> > Well, golly! Look at these numbers. Old master with no lz4, your v11 patch
>> > with no lz4, and then your v11 patch with lz4 compiled in.
>>
>> I'm appreciative of all the benchmarking that you and others are doing, but
>> wouldn't we be more interested in the difference between "old master with
>> lz4" and "v11 with lz4"?  Else, we have multiple variables in play.
>
> Yes, I agree, because the lz4 effect doesn't prove anything about the SIMD patch itself, right? So basically, a comparison for the SIMD effect should be "master with/out lz4 vs patched with/out lz4, respectively", and nothing more. Is this correct?

Yes, I think 'master with/out lz4 vs patched with/out lz4,
respectively' is enough to determine the effect of the SIMD patch.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 59+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-08 19:45  Manni Wood <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Manni Wood @ 2026-03-08 19:45 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Nathan Bossart <[email protected]>; Andrew Dunstan <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Sun, Mar 8, 2026 at 5:31 AM Nazir Bilal Yavuz <[email protected]> wrote:

> Hi,
>
> On Sat, 7 Mar 2026 at 02:31, KAZAR Ayoub <[email protected]> wrote:
> >
> > On Sat, Mar 7, 2026 at 12:13 AM Nathan Bossart <[email protected]> wrote:
> >>
> >> On Fri, Mar 06, 2026 at 03:25:46PM -0600, Manni Wood wrote:
> >> > Well, golly! Look at these numbers. Old master with no lz4, your v11 patch
> >> > with no lz4, and then your v11 patch with lz4 compiled in.
> >>
> >> I'm appreciative of all the benchmarking that you and others are doing, but
> >> wouldn't we be more interested in the difference between "old master with
> >> lz4" and "v11 with lz4"?  Else, we have multiple variables in play.
> >
> > Yes, I agree, because the lz4 effect doesn't prove anything about the SIMD patch itself, right? So basically, a comparison for the SIMD effect should be "master with/out lz4 vs patched with/out lz4, respectively", and nothing more. Is this correct?
>
> Yes, I think 'master with/out lz4 vs patched with/out lz4,
> respectively' is enough to determine the effect of the SIMD patch.
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft
>

Hello!

As requested, here are some numbers based on the latest master but with the
copy code inlining excised (`git revert
dc592a41557b072178f1798700bf9c69cd8e4235`), compared to master with copy
code inlining left in place and the v11 patch applied.
Both results have lz4 compression in place.

I have not run numbers without lz4. I assume I could use the two postgres
instances that I have compiled with lz4, but just set
`default_toast_compression = pglz` in postgresql.conf for both instances.
Let me know if that is a mistaken assumption on my part.

arm NARROW master without inline with lz4
TXT :                 10362.799500 ms
CSV :                 10288.791000 ms
TXT with 1/3 escapes: 10411.416250 ms
CSV with 1/3 quotes:  12318.385750 ms

arm NARROW master with inline with lz4 with v11patch
TXT :                 10317.125750 ms  0.440747% improvement
CSV :                 10418.020250 ms -1.256020% regression
TXT with 1/3 escapes: 10188.319500 ms  2.142809% improvement
CSV with 1/3 quotes:  12032.964500 ms  2.317035% improvement


arm WIDE master without inline with lz4
TXT :                  5608.834500 ms
CSV :                  8115.155000 ms
TXT with 1/3 escapes:  7037.290500 ms
CSV with 1/3 quotes:  10894.615750 ms

arm WIDE master with inline with lz4 with v11patch
TXT :                  3190.268750 ms  43.120647% improvement
CSV :                  3135.177000 ms  61.366394% improvement
TXT with 1/3 escapes:  6373.746750 ms   9.428966% improvement
CSV with 1/3 quotes:  10336.763500 ms   5.120440% improvement



x86 NARROW-master-without-inline-with-lz4.log
TXT :                 26701.079250 ms
CSV :                 26492.235500 ms
TXT with 1/3 escapes: 28590.508250 ms
CSV with 1/3 quotes:  34876.742750 ms

x86 NARROW-master-with-inline-with-lz4-with-v11patch.log
TXT :                 26511.747750 ms  0.709078% improvement
CSV :                 26261.269750 ms  0.871824% improvement
TXT with 1/3 escapes: 27702.964750 ms  3.104329% improvement
CSV with 1/3 quotes:  32339.393000 ms  7.275191% improvement


x86 WIDE-master-without-inline-with-lz4.log
TXT :                 14485.563250 ms
CSV :                 21392.582000 ms
TXT with 1/3 escapes: 18081.514750 ms
CSV with 1/3 quotes:  32547.086250 ms

x86 WIDE-master-with-inline-with-lz4-with-v11patch.log
TXT :                  8080.378250 ms  44.217714% improvement
CSV :                  8283.723000 ms  61.277591% improvement
TXT with 1/3 escapes: 15054.111000 ms  16.743087% improvement
CSV with 1/3 quotes:  25668.009750 ms  21.135768% improvement
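
The percentages above appear to be the relative change from the baseline timing, i.e. (baseline - patched) / baseline * 100. A minimal sketch of that calculation follows; the function names `pct_improvement` and `print_row` are hypothetical and are not taken from the benchmark scripts used in this thread.

```c
#include <stdio.h>

/*
 * Relative change from a baseline timing to a patched timing, in percent.
 * Positive means the patched build was faster (an improvement); negative
 * means it was slower (a regression).
 */
static double
pct_improvement(double baseline_ms, double patched_ms)
{
	return (baseline_ms - patched_ms) / baseline_ms * 100.0;
}

/*
 * Print one result row in roughly the layout used in the tables above.
 * Hypothetical helper, shown only to illustrate the sign convention.
 */
static void
print_row(const char *label, double baseline_ms, double patched_ms)
{
	double		pct = pct_improvement(baseline_ms, patched_ms);

	printf("%-22s %12.6f ms  %10.6f%% %s\n", label, patched_ms, pct,
		   pct >= 0.0 ? "improvement" : "regression");
}
```

For example, `pct_improvement(5608.834500, 3190.268750)` gives about 43.12, which matches the arm WIDE TXT row, and `pct_improvement(10288.791000, 10418.020250)` gives about -1.26, which matches the arm NARROW CSV regression.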
-- 
-- Manni Wood EDB: https://www.enterprisedb.com


^ permalink  raw  reply  [nested|flat] 59+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-09 08:10  Nazir Bilal Yavuz <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-09 08:10 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Nathan Bossart <[email protected]>; Andrew Dunstan <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Sun, 8 Mar 2026 at 22:45, Manni Wood <[email protected]> wrote:
>
> As requested, here are some numbers based on the latest master but with the copy code inlining excised (`git revert dc592a41557b072178f1798700bf9c69cd8e4235`), compared to master with copy code inlining left in place and the v11 patch applied.
> Both results have lz4 compression in place.

Thank you for the benchmark!

> I have not run numbers without lz4. I assume I could use the two postgres instances that I have compiled with lz4, but just set `default_toast_compression = pglz` in postgresql.conf for both instances. Let me know if that is a mistaken assumption on my part.

I am a bit confused. Are you asking that for the current benchmark you
shared or future benchmarks? I assume your current benchmark has
'default_toast_compression = lz4' because your benchmark results are
very similar to my benchmark with 'default_toast_compression = lz4'
but I just wanted to make sure.

What you said about editing postgresql.conf is correct, but you need to
make this change before starting the Postgres instance with the 'pg_ctl
... start' command; otherwise it won't take effect until you restart
the instance. Also, if you want to benchmark without lz4, you can just
run the "SET default_toast_compression to 'pglz';" command in psql;
then you don't need to edit postgresql.conf. Please note that this
affects only the psql session in which you typed the command. To make
things easier, you can run the 'SHOW default_toast_compression;'
command to see the current value of 'default_toast_compression'.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft

^ permalink  raw  reply  [nested|flat] 59+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-09 13:31  Manni Wood <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Manni Wood @ 2026-03-09 13:31 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Nathan Bossart <[email protected]>; Andrew Dunstan <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Mon, Mar 9, 2026 at 3:10 AM Nazir Bilal Yavuz <[email protected]> wrote:

> Hi,
>
> On Sun, 8 Mar 2026 at 22:45, Manni Wood <[email protected]> wrote:
> >
> > As requested, here are some numbers based on the latest master but with the copy code inlining excised (`git revert dc592a41557b072178f1798700bf9c69cd8e4235`), compared to master with copy code inlining left in place and the v11 patch applied.
> > Both results have lz4 compression in place.
>
> Thank you for the benchmark!
>
> > I have not run numbers without lz4. I assume I could use the two postgres instances that I have compiled with lz4, but just set `default_toast_compression = pglz` in postgresql.conf for both instances. Let me know if that is a mistaken assumption on my part.
>
> I am a bit confused. Are you asking that for the current benchmark you
> shared or future benchmarks? I assume your current benchmark has
> 'default_toast_compression = lz4' because your benchmark results are
> very similar to my benchmark with 'default_toast_compression = lz4'
> but I just wanted to make sure.
>
> What you said about editing postgresql.conf is correct, but you need to
> make this change before starting the Postgres instance with the 'pg_ctl
> ... start' command; otherwise it won't take effect until you restart
> the instance. Also, if you want to benchmark without lz4, you can just
> run the "SET default_toast_compression to 'pglz';" command in psql;
> then you don't need to edit postgresql.conf. Please note that this
> affects only the psql session in which you typed the command. To make
> things easier, you can run the 'SHOW default_toast_compression;'
> command to see the current value of 'default_toast_compression'.
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft
>

Hello, Nazir!

I was being too brief.

The benchmarks I shared were absolutely with lz4 compiled in
and 'default_toast_compression = lz4' set in postgresql.conf for every
postgres instance I tested with. (Furthermore, I ran `show
default_toast_compression` via `psql` on each postgres instance to be
sure 'default_toast_compression = lz4' was really set!)

Also, all were compiled with meson using `debugoptimized`, which results in
`-g -O2`.

So those are the benchmarks that I shared.

OK, so my final question, hopefully clarified: If I run additional
benchmarks where pglz is used for default_toast_compression, is it enough
to use the instances I have already compiled with lz4 in them, but
with 'default_toast_compression = pglz' explicitly set in postgresql.conf
in a brand new data dir created by initdb? (In other words, existing data
dir deleted, then initdb run to make a new data dir, then postgresql.conf
edited to ensure 'default_toast_compression = pglz' is explicitly set, then
and only then starting up the cluster for the first time... and finally
verifying via `show default_toast_compression` for good measure.)

Or should I re-compile with the lz4-is-now-the-default commit completely
excised?

Thanks so much!

-Manni

--
-- Manni Wood EDB: https://www.enterprisedb.com

^ permalink  raw  reply  [nested|flat] 59+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-09 13:43  Nazir Bilal Yavuz <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 0 replies; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-09 13:43 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Nathan Bossart <[email protected]>; Andrew Dunstan <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi Manni!

On Mon, 9 Mar 2026 at 16:31, Manni Wood <[email protected]> wrote:
>
> I was being too brief.
>
> The benchmarks I shared were absolutely with lz4 compiled in and 'default_toast_compression = lz4' set in postgresql.conf for every postgres instance I tested with. (Furthermore, I ran `show default_toast_compression` via `psql` on each postgres instance to be sure 'default_toast_compression = lz4' was really set!)
>
> Also, all were compiled with meson using `debugoptimized`, which results in `-g -O2`.
>
> So those are the benchmarks that I shared.

Thanks for the clarification.

> OK, so my final question, hopefully clarified: If I run additional benchmarks where pglz is used for default_toast_compression, is it enough to use the instances I have already compiled with lz4 in them, but with 'default_toast_compression = pglz' explicitly set in postgresql.conf in a brand new data dir created by initdb? (In other words, existing data dir deleted, then initdb run to make a new data dir, then postgresql.conf edited to ensure 'default_toast_compression = pglz' is explicitly set, then and only then starting up the cluster for the first time... and finally verifying via `show default_toast_compression` for good measure.)
>
> Or should I re-compile with the lz4-is-now-the-default commit completely excised?

Yes, it is clear now; thanks. You don't need to compile without the
lz4-is-now-the-default commit. You can compile with the lz4 commit and
set 'default_toast_compression = pglz' in postgresql.conf, like you
said. This should be enough.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 59+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-09 18:25  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 2 replies; 59+ messages in thread

From: Nathan Bossart @ 2026-03-09 18:25 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Mar 04, 2026 at 06:15:53PM +0300, Nazir Bilal Yavuz wrote:
> +#ifndef USE_NO_SIMD
> +static bool CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
> +									   bool *temp_hit_eof, int *temp_input_buf_ptr);
> +#endif

Should we inline this, too?

> +				/*
> +				 * Do not disable SIMD when we hit EOL or EOF characters. In
> +				 * practice, it does not matter for EOF because parsing ends
> +				 * there, but we keep the behavior consistent.
> +				 */
> +				if (!(simd_hit_eof || simd_hit_eol))
> +					cstate->simd_enabled = false;

nitpick: I would personally avoid disabling it for EOF.  It probably
doesn't amount to much, but I don't see any point in the extra
complexity/work solely for consistency.

> +				/*
> +				 * We encountered a EOL or EOF on the first vector. This means
> +				 * lines are not long enough to skip fully sized vector. If
> +				 * this happens two times consecutively, then disable the
> +				 * SIMD.
> +				 */
> +				if (first_vector)
> +				{
> +					if (cstate->simd_failed_first_vector)
> +						cstate->simd_enabled = false;
> +
> +					cstate->simd_failed_first_vector = true;
> +				}

The first time I saw this, my mind immediately went to the extreme case
where this likely regresses: alternating long and short lines.  We might
just want to disable it the first time we see a short line, like we do for
special characters.  This is another thing that we can improve
independently later on.
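
To make the alternating long/short case concrete, here is a small standalone model of the two policies. It is only a sketch: `SimdPolicy`, `count_simd_attempts`, and the 16-byte `CHUNK_LEN` are hypothetical names and values, and it assumes a long line clears the "failed first vector" flag, which the quoted hunk does not show.

```c
#include <stdbool.h>
#include <stddef.h>

#define CHUNK_LEN 16			/* assumed vector width in bytes */

/* Hypothetical stand-in for the SIMD-related fields of the copy state. */
typedef struct SimdPolicy
{
	bool		enabled;
	bool		failed_first_vector;
} SimdPolicy;

/*
 * Feed line lengths through the policy and count how many lines still
 * attempt the SIMD path.  With disable_on_first_short = false this models
 * the quoted logic (two consecutive short lines disable SIMD); with true
 * it models disabling on the first short line.
 *
 * Assumption: a long line resets failed_first_vector.
 */
static size_t
count_simd_attempts(const size_t *line_lens, size_t nlines,
					bool disable_on_first_short)
{
	SimdPolicy	st = {true, false};
	size_t		attempts = 0;

	for (size_t i = 0; i < nlines; i++)
	{
		if (!st.enabled)
			break;

		attempts++;

		if (line_lens[i] < CHUNK_LEN)
		{
			/* EOL was found inside the first vector */
			if (disable_on_first_short || st.failed_first_vector)
				st.enabled = false;
			st.failed_first_vector = true;
		}
		else
			st.failed_first_vector = false;
	}

	return attempts;
}
```

Under these assumptions, an input that alternates 100-byte and 4-byte lines never trips the "two consecutive" rule, so every short line keeps paying for a SIMD attempt that cannot help. Disabling on the first short line stops after the second line.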

> +	/* First try to run SIMD, then continue with the scalar path */
> +	if (cstate->simd_enabled)
> +	{
> +		int			temp_input_buf_ptr = input_buf_ptr;
> +		bool		temp_hit_eof = false;
> +
> +		result = CopyReadLineTextSIMDHelper(cstate, is_csv, &temp_hit_eof,
> +											&temp_input_buf_ptr);
> +		input_buf_ptr = temp_input_buf_ptr;
> +		hit_eof = temp_hit_eof;

Given CopyReadLineTextSIMDHelper() doesn't have too much duplicated code,
moving the SIMD stuff to its own function is nice.  The temp variables seem
a bit too magical to me, though.  If those really make a difference, IMHO
there ought to be a big comment explaining why.

-- 
nathan





^ permalink  raw  reply  [nested|flat] 59+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-10 02:30  Manni Wood <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 1 reply; 59+ messages in thread

From: Manni Wood @ 2026-03-10 02:30 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Mon, Mar 9, 2026 at 1:25 PM Nathan Bossart <[email protected]> wrote:

> On Wed, Mar 04, 2026 at 06:15:53PM +0300, Nazir Bilal Yavuz wrote:
> > +#ifndef USE_NO_SIMD
> > +static bool CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
> > +                                       bool *temp_hit_eof, int *temp_input_buf_ptr);
> > +#endif
>
> Should we inline this, too?
>
> > +                             /*
> > +                              * Do not disable SIMD when we hit EOL or EOF characters. In
> > +                              * practice, it does not matter for EOF because parsing ends
> > +                              * there, but we keep the behavior consistent.
> > +                              */
> > +                             if (!(simd_hit_eof || simd_hit_eol))
> > +                                     cstate->simd_enabled = false;
>
> nitpick: I would personally avoid disabling it for EOF.  It probably
> doesn't amount to much, but I don't see any point in the extra
> complexity/work solely for consistency.
>
> > +                             /*
> > +                              * We encountered a EOL or EOF on the first vector. This means
> > +                              * lines are not long enough to skip fully sized vector. If
> > +                              * this happens two times consecutively, then disable the
> > +                              * SIMD.
> > +                              */
> > +                             if (first_vector)
> > +                             {
> > +                                     if (cstate->simd_failed_first_vector)
> > +                                             cstate->simd_enabled = false;
> > +
> > +                                     cstate->simd_failed_first_vector = true;
> > +                             }
>
> The first time I saw this, my mind immediately went to the extreme case
> where this likely regresses: alternating long and short lines.  We might
> just want to disable it the first time we see a short line, like we do for
> special characters.  This is another thing that we can improve
> independently later on.
>
> > +     /* First try to run SIMD, then continue with the scalar path */
> > +     if (cstate->simd_enabled)
> > +     {
> > +             int                     temp_input_buf_ptr = input_buf_ptr;
> > +             bool            temp_hit_eof = false;
> > +
> > +             result = CopyReadLineTextSIMDHelper(cstate, is_csv, &temp_hit_eof,
> > +                                                 &temp_input_buf_ptr);
> > +             input_buf_ptr = temp_input_buf_ptr;
> > +             hit_eof = temp_hit_eof;
>
> Given CopyReadLineTextSIMDHelper() doesn't have too much duplicated code,
> moving the SIMD stuff to its own function is nice.  The temp variables seem
> a bit too magical to me, though.  If those really make a difference, IMHO
> there ought to be a big comment explaining why.
>
> --
> nathan
>

Here are some benchmarks showing what performance will look like for users
who continue to use default_toast_compression = pglz.

all compiled by meson with debugoptimized (-g -O2)

arm NARROW master without inline (git revert
dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
TXT :                 10055.141000 ms
CSV :                 10549.174500 ms
TXT with 1/3 escapes: 10213.864750 ms
CSV with 1/3 quotes:  12188.039000 ms

arm NARROW master with inline with v11patch default_toast_compression = pglz
TXT :                 10070.153750 ms  -0.149304% regression
CSV :                 10161.348750 ms   3.676361% improvement
TXT with 1/3 escapes: 10618.005000 ms  -3.956781% regression
CSV with 1/3 quotes:  12279.366250 ms  -0.749319% regression

arm WIDE master without inline (git revert
dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
TXT :                 11355.602750 ms
CSV :                 13893.110500 ms
TXT with 1/3 escapes: 12872.690500 ms
CSV with 1/3 quotes:  16722.262500 ms

arm WIDE master with inline with v11patch default_toast_compression = pglz
TXT :                 9001.007250 ms  20.735099% improvement
CSV :                 8988.679750 ms  35.301171% improvement
TXT with 1/3 escapes: 12191.137000 ms  5.294569% improvement
CSV with 1/3 quotes:  16297.541500 ms  2.539854% improvement


x86 NARROW master without inline (git revert
dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
TXT :                 26243.084500 ms
CSV :                 27719.564000 ms
TXT with 1/3 escapes: 29578.192750 ms
CSV with 1/3 quotes:  34467.571250 ms

x86 NARROW master with inline with v11patch default_toast_compression = pglz
TXT :                 26371.996750 ms  -0.491224% regression
CSV :                 26137.186500 ms   5.708522% improvement
TXT with 1/3 escapes: 28080.201000 ms   5.064514% improvement
CSV with 1/3 quotes:  32557.377500 ms   5.542003% improvement

x86 WIDE master without inline (git revert
dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
TXT :                 28734.774750 ms
CSV :                 35700.485000 ms
TXT with 1/3 escapes: 32376.878250 ms
CSV with 1/3 quotes:  47024.985750 ms

x86 WIDE master with inline with v11patch default_toast_compression = pglz
TXT :                 22753.755750 ms  20.814567% improvement
CSV :                 22977.195500 ms  35.638982% improvement
TXT with 1/3 escapes: 29526.887000 ms   8.802551% improvement
CSV with 1/3 quotes:  40298.196750 ms  14.304712% improvement
-- 
-- Manni Wood EDB: https://www.enterprisedb.com


^ permalink  raw  reply  [nested|flat] 59+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-10 11:42  Nazir Bilal Yavuz <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 0 replies; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-10 11:42 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: Nathan Bossart <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Tue, 10 Mar 2026 at 05:30, Manni Wood <[email protected]> wrote:
>
> Here are some benchmarks showing what performance will look like for users who continue to use default_toast_compression = pglz.
>
> all compiled by meson with debugoptimized (-g -O2)
>
> arm NARROW master without inline (git revert dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
> TXT :                 10055.141000 ms
> CSV :                 10549.174500 ms
> TXT with 1/3 escapes: 10213.864750 ms
> CSV with 1/3 quotes:  12188.039000 ms
>
> arm NARROW master with inline with v11patch default_toast_compression = pglz
> TXT :                 10070.153750 ms  -0.149304% regression
> CSV :                 10161.348750 ms   3.676361% improvement
> TXT with 1/3 escapes: 10618.005000 ms  -3.956781% regression
> CSV with 1/3 quotes:  12279.366250 ms  -0.749319% regression
>
> arm WIDE master without inline (git revert dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
> TXT :                 11355.602750 ms
> CSV :                 13893.110500 ms
> TXT with 1/3 escapes: 12872.690500 ms
> CSV with 1/3 quotes:  16722.262500 ms
>
> arm WIDE master with inline with v11patch default_toast_compression = pglz
> TXT :                 9001.007250 ms  20.735099% improvement
> CSV :                 8988.679750 ms  35.301171% improvement
> TXT with 1/3 escapes: 12191.137000 ms  5.294569% improvement
> CSV with 1/3 quotes:  16297.541500 ms  2.539854% improvement
>
>
> x86 NARROW master without inline (git revert dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
> TXT :                 26243.084500 ms
> CSV :                 27719.564000 ms
> TXT with 1/3 escapes: 29578.192750 ms
> CSV with 1/3 quotes:  34467.571250 ms
>
> x86 NARROW master with inline with v11patch default_toast_compression = pglz
> TXT :                 26371.996750 ms  -0.491224% regression
> CSV :                 26137.186500 ms   5.708522% improvement
> TXT with 1/3 escapes: 28080.201000 ms   5.064514% improvement
> CSV with 1/3 quotes:  32557.377500 ms   5.542003% improvement
>
> x86 WIDE master without inline (git revert dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
> TXT :                 28734.774750 ms
> CSV :                 35700.485000 ms
> TXT with 1/3 escapes: 32376.878250 ms
> CSV with 1/3 quotes:  47024.985750 ms
>
> x86 WIDE master with inline with v11patch default_toast_compression = pglz
> TXT :                 22753.755750 ms  20.814567% improvement
> CSV :                 22977.195500 ms  35.638982% improvement
> TXT with 1/3 escapes: 29526.887000 ms   8.802551% improvement
> CSV with 1/3 quotes:  40298.196750 ms  14.304712% improvement

Thank you for the benchmark; the results look nice! So, there is almost
no regression in either the pglz or the lz4 toast compression mode. The
best case (WIDE CSV) is a ~60% improvement with lz4 and a ~35%
improvement with pglz.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 59+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-10 12:35  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 1 reply; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-10 12:35 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Mon, 9 Mar 2026 at 21:25, Nathan Bossart <[email protected]> wrote:
>
> On Wed, Mar 04, 2026 at 06:15:53PM +0300, Nazir Bilal Yavuz wrote:
> > +#ifndef USE_NO_SIMD
> > +static bool CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
> > +                                                                        bool *temp_hit_eof, int *temp_input_buf_ptr);
> > +#endif
>
> Should we inline this, too?

I think there is no need to inline this function. In the previous
version, the SIMD code was inside the main for loop, which iterates
over every character in the data. This means there was a branch for
every character. In the current version, the SIMD code is outside of
this loop, so there is no per-character branching.
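
A rough standalone illustration of that structure is below. It is not the patch code: `find_first_special`, `chunk_has_special`, and the 16-byte `CHUNK_LEN` are hypothetical, and the vector comparison is emulated with a byte loop so the sketch stays portable. The point is only that the chunked scan runs once before the per-byte loop, so the per-byte loop itself has no "is SIMD enabled?" check.

```c
#include <stdbool.h>
#include <stddef.h>

#define CHUNK_LEN 16			/* stands in for sizeof(Vector8) */

/* True if c needs the scalar parser's attention (text mode only here). */
static bool
is_special(char c)
{
	return c == '\n' || c == '\r' || c == '\\';
}

/*
 * Emulates one vector comparison: reports whether any of the CHUNK_LEN
 * bytes at p is special.  Real SIMD code would do this with a few vector
 * instructions instead of a byte loop.
 */
static bool
chunk_has_special(const char *p)
{
	bool		found = false;

	for (int i = 0; i < CHUNK_LEN; i++)
		found |= is_special(p[i]);

	return found;
}

/*
 * Return the offset of the first special byte in buf[0..len), or len if
 * there is none.
 */
static size_t
find_first_special(const char *buf, size_t len, bool simd_enabled)
{
	size_t		pos = 0;

	/* Chunked pre-scan: skip whole chunks that contain nothing special. */
	if (simd_enabled)
	{
		while (len - pos >= CHUNK_LEN && !chunk_has_special(buf + pos))
			pos += CHUNK_LEN;
	}

	/* Scalar path: finish the chunk that had a special byte, or the tail. */
	while (pos < len && !is_special(buf[pos]))
		pos++;

	return pos;
}
```

Both settings of `simd_enabled` return the same offset; the flag only changes how many bytes the scalar loop has to walk.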


> > +                             /*
> > +                              * Do not disable SIMD when we hit EOL or EOF characters. In
> > +                              * practice, it does not matter for EOF because parsing ends
> > +                              * there, but we keep the behavior consistent.
> > +                              */
> > +                             if (!(simd_hit_eof || simd_hit_eol))
> > +                                     cstate->simd_enabled = false;
>
> nitpick: I would personally avoid disabling it for EOF.  It probably
> doesn't amount to much, but I don't see any point in the extra
> complexity/work solely for consistency.

Done. I expected this to be a small change, but it removed more
complexity than I thought it would.


>
> > +                             /*
> > +                              * We encountered a EOL or EOF on the first vector. This means
> > +                              * lines are not long enough to skip fully sized vector. If
> > +                              * this happens two times consecutively, then disable the
> > +                              * SIMD.
> > +                              */
> > +                             if (first_vector)
> > +                             {
> > +                                     if (cstate->simd_failed_first_vector)
> > +                                             cstate->simd_enabled = false;
> > +
> > +                                     cstate->simd_failed_first_vector = true;
> > +                             }
>
> The first time I saw this, my mind immediately went to the extreme case
> where this likely regresses: alternating long and short lines.  We might
> just want to disable it the first time we see a short line, like we do for
> special characters.  This is another thing that we can improve
> independently later on.

I agree with you, done.


>
> > +     /* First try to run SIMD, then continue with the scalar path */
> > +     if (cstate->simd_enabled)
> > +     {
> > +             int                     temp_input_buf_ptr = input_buf_ptr;
> > +             bool            temp_hit_eof = false;
> > +
> > +             result = CopyReadLineTextSIMDHelper(cstate, is_csv, &temp_hit_eof,
> > +                                                                                     &temp_input_buf_ptr);
> > +             input_buf_ptr = temp_input_buf_ptr;
> > +             hit_eof = temp_hit_eof;
>
> Given CopyReadLineTextSIMDHelper() doesn't have too much duplicated code,
> moving the SIMD stuff to its own function is nice.  The temp variables seem
> a bit too magical to me, though.  If those really make a difference, IMHO
> there ought to be a big comment explaining why.

I added a comment; please let me know if you don't like it.


--
Regards,
Nazir Bilal Yavuz
Microsoft


Attachments:

  [text/x-patch] v12-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (10.0K, 2-v12-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
  download | inline diff:
From de695aaf5c7ceeb4f62d2352fabbb111047a4434 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Wed, 4 Mar 2026 17:28:54 +0300
Subject: [PATCH v12] Speed up COPY FROM text/CSV parsing using SIMD

COPY FROM text and CSV parsing previously scanned the input buffer
one byte at a time looking for special characters (newline, carriage
return, backslash, and in CSV mode, quote and escape characters).
This patch adds a SIMD-accelerated fast path that processes a full
vector-width chunk per iteration, significantly reducing the number
of iterations needed on inputs with long lines.

A new helper function, CopyReadLineTextSIMDHelper(), loads chunks
of the input buffer into vector registers and checks for any special
characters using SIMD comparisons. When no special characters are
found, the entire chunk is skipped at once. When a special character
is found, the helper advances to that position and hands off to the
existing scalar code to handle it correctly.

To avoid a regression on inputs that contain many special characters,
SIMD is disabled for the remainder of the current input once a
non-EOL special character is encountered. SIMD is also disabled when
the processed line is shorter than a full vector, since SIMD can't
provide a benefit there.

Author: Shinya Kato <[email protected]>
Author: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Kazar Ayoub <[email protected]>
Reviewed-by: Nathan Bossart <[email protected]>
Reviewed-by: Neil Conway <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Manni Wood <[email protected]>
Reviewed-by: Mark Wong <[email protected]>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
---
 src/backend/commands/copyfrom.c          |   3 +
 src/backend/commands/copyfromparse.c     | 206 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h |   3 +
 3 files changed, 205 insertions(+), 7 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2f42f55e229..fe18bd70890 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1747,6 +1747,9 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
 
+	/* Initialize SIMD */
+	cstate->simd_enabled = true;
+
 	/*
 	 * Allocate buffers for the input pipeline.
 	 *
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 84c8809a889..166b1c4c415 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 #include "utils/wait_event.h"
@@ -159,6 +160,12 @@ static pg_attribute_always_inline bool NextCopyFromRawFieldsInternal(CopyFromSta
 																	 int *nfields,
 																	 bool is_csv);
 
+/* SIMD functions */
+#ifndef USE_NO_SIMD
+static bool CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
+									   bool *temp_hit_eof, int *temp_input_buf_ptr);
+#endif
+
 
 /* Low-level communications functions */
 static int	CopyGetData(CopyFromState cstate, void *databuf,
@@ -1311,6 +1318,155 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 	return result;
 }
 
+#ifndef USE_NO_SIMD
+/*
+ * Use SIMD instructions to efficiently scan the input buffer for special
+ * characters (e.g., newline, carriage return, quote, and escape). This is
+ * faster than byte-by-byte iteration, especially on large buffers.
+ *
+ * Note that, SIMD may become slower when the input contains many special
+ * characters. To avoid this regression, we disable SIMD for the rest of the
+ * input once we encounter a special character which isn't EOL.
+ * Also, SIMD is disabled when it encounters a short line that SIMD can't
+ * create a full sized Vector, too.
+ */
+static bool
+CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv, bool *temp_hit_eof, int *temp_input_buf_ptr)
+{
+	char		quotec = '\0';
+	char		escapec = '\0';
+	char	   *copy_input_buf;
+	int			input_buf_ptr;
+	int			copy_buf_len;
+	bool		result = false;
+	bool		unique_escapec = false;
+	bool		first_vector = true;
+	Vector8		nl = vector8_broadcast('\n');
+	Vector8		cr = vector8_broadcast('\r');
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		quote = vector8_broadcast(0);
+	Vector8		escape = vector8_broadcast(0);
+
+	if (is_csv)
+	{
+		quotec = cstate->opts.quote[0];
+		escapec = cstate->opts.escape[0];
+
+		quote = vector8_broadcast(quotec);
+		if (quotec != escapec)
+		{
+			unique_escapec = true;
+			escape = vector8_broadcast(escapec);
+		}
+	}
+
+	/* For a little extra speed we copy these into local variables */
+	copy_input_buf = cstate->input_buf;
+	input_buf_ptr = cstate->input_buf_index;
+	copy_buf_len = cstate->input_buf_len;
+
+	while (true)
+	{
+		/* Load more data if needed */
+		if (sizeof(Vector8) > copy_buf_len - input_buf_ptr)
+		{
+			REFILL_LINEBUF;
+
+			CopyLoadInputBuf(cstate);
+			/* update our local variables */
+			*temp_hit_eof = cstate->input_reached_eof;
+			input_buf_ptr = cstate->input_buf_index;
+			copy_buf_len = cstate->input_buf_len;
+
+			/*
+			 * If we are completely out of data, break out of the loop,
+			 * reporting EOF.
+			 */
+			if (INPUT_BUF_BYTES(cstate) <= 0)
+			{
+				result = true;
+				break;
+			}
+		}
+
+		if (copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match = vector8_broadcast(0);
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+			if (is_csv)
+			{
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				match = vector8_or(match, vector8_eq(chunk, quote));
+				if (unique_escapec)
+					match = vector8_or(match, vector8_eq(chunk, escape));
+			}
+			else
+			{
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				match = vector8_or(match, vector8_eq(chunk, bs));
+			}
+
+			/* Check if we found any special characters */
+			if (vector8_is_highbit_set(match))
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				uint32		mask;
+				int			advance;
+				char		c;
+
+				mask = vector8_highbit_mask(match);
+				advance = pg_rightmost_one_pos32(mask);
+
+				input_buf_ptr += advance;
+				c = copy_input_buf[input_buf_ptr];
+
+				/*
+				 * We encountered a special character in the first vector.
+				 * This means line is not long enough to skip fully sized
+				 * vector. To be cautios, disable SIMD for the rest.
+				 *
+				 * Otherwise, do not disable SIMD when we hit EOL characters.
+				 * We don't check for EOF because parsing ends there.
+				 */
+				if (first_vector || !(c == '\r' || c == '\n'))
+					cstate->simd_enabled = false;
+
+				break;
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				input_buf_ptr += sizeof(Vector8);
+				first_vector = false;
+			}
+		}
+
+		/*
+		 * Although we refill linebuf, there is not enough character to fill
+		 * full sized vector. This doesn't mean that we encountered a line
+		 * that is not enough to fill a full sized vector.
+		 *
+		 * Scalar code will handle the rest for this line. Then, SIMD will
+		 * continue from the next line.
+		 */
+		else
+		{
+			break;
+		}
+	}
+
+	*temp_input_buf_ptr = input_buf_ptr;
+	return result;
+}
+#endif
+
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
@@ -1339,6 +1495,49 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			escapec = '\0';
 	}
 
+	/*
+	 * input_buf_ptr might be updated in the SIMD Helper function, so it needs
+	 * to be set before calling CopyReadLineTextSIMDHelper().
+	 */
+	input_buf_ptr = cstate->input_buf_index;
+
+#ifndef USE_NO_SIMD
+	/* First try to run SIMD, then continue with the scalar path */
+	if (cstate->simd_enabled)
+	{
+		/*
+		 * Temporary variables are used here instead of passing the actual
+		 * variables (especially input_buf_ptr) directly to the helper. Taking
+		 * the address of a local variable might force the compiler to
+		 * allocate it on the stack rather than in a register.  Because
+		 * input_buf_ptr is used heavily in the hot scalar path below, keeping
+		 * it in a register is important for performance.
+		 */
+		int			temp_input_buf_ptr;
+		bool		temp_hit_eof = hit_eof;
+
+		result = CopyReadLineTextSIMDHelper(cstate, is_csv, &temp_hit_eof,
+											&temp_input_buf_ptr);
+		input_buf_ptr = temp_input_buf_ptr;
+		hit_eof = temp_hit_eof;
+
+		/* Short exit from SIMD */
+		if (result)
+		{
+			/*
+			 * Transfer any still-uncopied data to line_buf.
+			 */
+			REFILL_LINEBUF;
+
+			return result;
+		}
+	}
+#endif
+
+	/* For a little extra speed we copy these into local variables */
+	copy_input_buf = cstate->input_buf;
+	copy_buf_len = cstate->input_buf_len;
+
 	/*
 	 * The objective of this loop is to transfer the entire next input line
 	 * into line_buf.  Hence, we only care for detecting newlines (\r and/or
@@ -1360,14 +1559,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	 * character to examine; any characters from input_buf_index to
 	 * input_buf_ptr have been determined to be part of the line, but not yet
 	 * transferred to line_buf.
-	 *
-	 * For a little extra speed within the loop, we copy input_buf and
-	 * input_buf_len into local variables.
 	 */
-	copy_input_buf = cstate->input_buf;
-	input_buf_ptr = cstate->input_buf_index;
-	copy_buf_len = cstate->input_buf_len;
-
 	for (;;)
 	{
 		int			prev_raw_ptr;
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index f892c343157..5b020bf4d0b 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -89,6 +89,9 @@ typedef struct CopyFromStateData
 	const char *cur_attval;		/* current att value for error messages */
 	bool		relname_only;	/* don't output line number, att, etc. */
 
+	/* SIMD variables */
+	bool		simd_enabled;
+
 	/*
 	 * Working state
 	 */
-- 
2.47.3




* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-10 17:10  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Nathan Bossart @ 2026-03-10 17:10 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Tue, Mar 10, 2026 at 03:35:30PM +0300, Nazir Bilal Yavuz wrote:
> Subject: [PATCH v12] Speed up COPY FROM text/CSV parsing using SIMD

This looks pretty good to me.  I'm hoping to take a closer look in the near
future, but I think we are approaching something committable.

-- 
nathan






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-11 11:36  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 2 replies; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-11 11:36 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Tue, 10 Mar 2026 at 20:10, Nathan Bossart <[email protected]> wrote:
>
> On Tue, Mar 10, 2026 at 03:35:30PM +0300, Nazir Bilal Yavuz wrote:
> > Subject: [PATCH v12] Speed up COPY FROM text/CSV parsing using SIMD
>
> This looks pretty good to me.  I'm hoping to take a closer look in the near
> future, but I think we are approaching something committable.

Thanks for looking into it!

I am attaching v13 of the patch.

0001 is basically some typo fixes on top of v12, no functional changes.

0002 is an attempt to remove some branches from the SIMD code, but since
it is a functional change of sorts, I attached it as a separate patch.
I think we can apply some parts of it, if not all.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft


Attachments:

  [text/x-patch] v13-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (10.0K, 2-v13-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
  download | inline diff:
From f3e9a234ddc537544d510ad344eb1b8eb2127855 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Wed, 11 Mar 2026 14:04:29 +0300
Subject: [PATCH v13 1/2] Speed up COPY FROM text/CSV parsing using SIMD

COPY FROM text and CSV parsing previously scanned the input buffer
one byte at a time looking for special characters (newline, carriage
return, backslash, and in CSV mode, quote and escape characters).
This patch adds a SIMD-accelerated fast path that processes a full
vector-width chunk per iteration, significantly reducing the number
of iterations needed on inputs with long lines.

A new helper function, CopyReadLineTextSIMDHelper(), loads chunks
of the input buffer into vector registers and checks for any special
characters using SIMD comparisons. When no special characters are
found, the entire chunk is skipped at once. When a special character
is found, the helper advances to that position and hands off to the
existing scalar code to handle it correctly.

To avoid a regression on inputs that contain many special characters,
SIMD is disabled for the remainder of the current input once a
non-EOL special character is encountered. SIMD is also disabled when
the processed line is shorter than a full vector, since SIMD can't
provide a benefit there.

Author: Shinya Kato <[email protected]>
Author: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Kazar Ayoub <[email protected]>
Reviewed-by: Nathan Bossart <[email protected]>
Reviewed-by: Neil Conway <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Manni Wood <[email protected]>
Reviewed-by: Mark Wong <[email protected]>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
---
 src/backend/commands/copyfrom.c          |   3 +
 src/backend/commands/copyfromparse.c     | 206 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h |   3 +
 3 files changed, 205 insertions(+), 7 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2f42f55e229..fe18bd70890 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1747,6 +1747,9 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
 
+	/* Initialize SIMD */
+	cstate->simd_enabled = true;
+
 	/*
 	 * Allocate buffers for the input pipeline.
 	 *
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 84c8809a889..9f1256353c4 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 #include "utils/wait_event.h"
@@ -159,6 +160,12 @@ static pg_attribute_always_inline bool NextCopyFromRawFieldsInternal(CopyFromSta
 																	 int *nfields,
 																	 bool is_csv);
 
+/* SIMD functions */
+#ifndef USE_NO_SIMD
+static bool CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
+									   bool *temp_hit_eof, int *temp_input_buf_ptr);
+#endif
+
 
 /* Low-level communications functions */
 static int	CopyGetData(CopyFromState cstate, void *databuf,
@@ -1311,6 +1318,155 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 	return result;
 }
 
+#ifndef USE_NO_SIMD
+/*
+ * Use SIMD instructions to efficiently scan the input buffer for special
+ * characters (e.g., newline, carriage return, quote, and escape). This is
+ * faster than byte-by-byte iteration, especially on large buffers.
+ *
+ * Note that, SIMD may become slower when the input contains many special
+ * characters. To avoid this regression, we disable SIMD for the rest of the
+ * input once we encounter a special character which isn't EOL.
+ * Also, SIMD is disabled when it encounters a short line that SIMD can't
+ * create a full sized vector, too.
+ */
+static bool
+CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv, bool *temp_hit_eof, int *temp_input_buf_ptr)
+{
+	char		quotec = '\0';
+	char		escapec = '\0';
+	char	   *copy_input_buf;
+	int			input_buf_ptr;
+	int			copy_buf_len;
+	bool		result = false;
+	bool		unique_escapec = false;
+	bool		first_vector = true;
+	Vector8		nl = vector8_broadcast('\n');
+	Vector8		cr = vector8_broadcast('\r');
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		quote = vector8_broadcast(0);
+	Vector8		escape = vector8_broadcast(0);
+
+	if (is_csv)
+	{
+		quotec = cstate->opts.quote[0];
+		escapec = cstate->opts.escape[0];
+
+		quote = vector8_broadcast(quotec);
+		if (quotec != escapec)
+		{
+			unique_escapec = true;
+			escape = vector8_broadcast(escapec);
+		}
+	}
+
+	/* For a little extra speed we copy these into local variables */
+	copy_input_buf = cstate->input_buf;
+	input_buf_ptr = cstate->input_buf_index;
+	copy_buf_len = cstate->input_buf_len;
+
+	while (true)
+	{
+		/* Load more data if needed */
+		if (sizeof(Vector8) > copy_buf_len - input_buf_ptr)
+		{
+			REFILL_LINEBUF;
+
+			CopyLoadInputBuf(cstate);
+			/* update our local variables */
+			*temp_hit_eof = cstate->input_reached_eof;
+			input_buf_ptr = cstate->input_buf_index;
+			copy_buf_len = cstate->input_buf_len;
+
+			/*
+			 * If we are completely out of data, break out of the loop,
+			 * reporting EOF.
+			 */
+			if (INPUT_BUF_BYTES(cstate) <= 0)
+			{
+				result = true;
+				break;
+			}
+		}
+
+		if (copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match = vector8_broadcast(0);
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+			if (is_csv)
+			{
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				match = vector8_or(match, vector8_eq(chunk, quote));
+				if (unique_escapec)
+					match = vector8_or(match, vector8_eq(chunk, escape));
+			}
+			else
+			{
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				match = vector8_or(match, vector8_eq(chunk, bs));
+			}
+
+			/* Check if we found any special characters */
+			if (vector8_is_highbit_set(match))
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				uint32		mask;
+				int			advance;
+				char		c;
+
+				mask = vector8_highbit_mask(match);
+				advance = pg_rightmost_one_pos32(mask);
+
+				input_buf_ptr += advance;
+				c = copy_input_buf[input_buf_ptr];
+
+				/*
+				 * If we encountered a special character in the first vector,
+				 * this means line is not long enough to skip fully sized
+				 * vector. To be cautious, disable SIMD for the rest.
+				 *
+				 * Otherwise, do not disable SIMD when we hit EOL characters.
+				 * We don't check for EOF because parsing ends there.
+				 */
+				if (first_vector || !(c == '\r' || c == '\n'))
+					cstate->simd_enabled = false;
+
+				break;
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				input_buf_ptr += sizeof(Vector8);
+				first_vector = false;
+			}
+		}
+
+		/*
+		 * Although we refill linebuf, there is not enough character to fill
+		 * full sized vector. This doesn't mean that we encountered a line
+		 * that is not enough to fill a full sized vector.
+		 *
+		 * Scalar code will handle the rest for this line. Then, SIMD will
+		 * continue from the next line.
+		 */
+		else
+		{
+			break;
+		}
+	}
+
+	*temp_input_buf_ptr = input_buf_ptr;
+	return result;
+}
+#endif
+
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
@@ -1339,6 +1495,49 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			escapec = '\0';
 	}
 
+	/*
+	 * input_buf_ptr might be updated in the SIMD Helper function, so it needs
+	 * to be set before calling CopyReadLineTextSIMDHelper().
+	 */
+	input_buf_ptr = cstate->input_buf_index;
+
+#ifndef USE_NO_SIMD
+	/* First try to run SIMD, then continue with the scalar path */
+	if (cstate->simd_enabled)
+	{
+		/*
+		 * Temporary variables are used here instead of passing the actual
+		 * variables (especially input_buf_ptr) directly to the helper. Taking
+		 * the address of a local variable might force the compiler to
+		 * allocate it on the stack rather than in a register.  Because
+		 * input_buf_ptr is used heavily in the hot scalar path below, keeping
+		 * it in a register is important for performance.
+		 */
+		int			temp_input_buf_ptr;
+		bool		temp_hit_eof = hit_eof;
+
+		result = CopyReadLineTextSIMDHelper(cstate, is_csv, &temp_hit_eof,
+											&temp_input_buf_ptr);
+		input_buf_ptr = temp_input_buf_ptr;
+		hit_eof = temp_hit_eof;
+
+		/* Early exit from SIMD */
+		if (result)
+		{
+			/*
+			 * Transfer any still-uncopied data to line_buf.
+			 */
+			REFILL_LINEBUF;
+
+			return result;
+		}
+	}
+#endif
+
+	/* For a little extra speed we copy these into local variables */
+	copy_input_buf = cstate->input_buf;
+	copy_buf_len = cstate->input_buf_len;
+
 	/*
 	 * The objective of this loop is to transfer the entire next input line
 	 * into line_buf.  Hence, we only care for detecting newlines (\r and/or
@@ -1360,14 +1559,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	 * character to examine; any characters from input_buf_index to
 	 * input_buf_ptr have been determined to be part of the line, but not yet
 	 * transferred to line_buf.
-	 *
-	 * For a little extra speed within the loop, we copy input_buf and
-	 * input_buf_len into local variables.
 	 */
-	copy_input_buf = cstate->input_buf;
-	input_buf_ptr = cstate->input_buf_index;
-	copy_buf_len = cstate->input_buf_len;
-
 	for (;;)
 	{
 		int			prev_raw_ptr;
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index f892c343157..5b020bf4d0b 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -89,6 +89,9 @@ typedef struct CopyFromStateData
 	const char *cur_attval;		/* current att value for error messages */
 	bool		relname_only;	/* don't output line number, att, etc. */
 
+	/* SIMD variables */
+	bool		simd_enabled;
+
 	/*
 	 * Working state
 	 */
-- 
2.47.3



  [text/x-patch] v13-0002-Upcoming-improvements.patch (2.4K, 3-v13-0002-Upcoming-improvements.patch)
  download | inline diff:
From 1006dac44cb208a8b164f2553772de942e79c2d4 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Wed, 11 Mar 2026 14:04:55 +0300
Subject: [PATCH v13 2/2] Upcoming improvements

---
 src/backend/commands/copyfromparse.c | 27 ++++++++-------------------
 1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 9f1256353c4..55159b0122c 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1333,8 +1333,6 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 static bool
 CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv, bool *temp_hit_eof, int *temp_input_buf_ptr)
 {
-	char		quotec = '\0';
-	char		escapec = '\0';
 	char	   *copy_input_buf;
 	int			input_buf_ptr;
 	int			copy_buf_len;
@@ -1343,16 +1341,15 @@ CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv, bool *temp_hit_eof
 	bool		first_vector = true;
 	Vector8		nl = vector8_broadcast('\n');
 	Vector8		cr = vector8_broadcast('\r');
-	Vector8		bs = vector8_broadcast('\\');
-	Vector8		quote = vector8_broadcast(0);
+	Vector8		bs_or_quote = vector8_broadcast('\\');
 	Vector8		escape = vector8_broadcast(0);
 
 	if (is_csv)
 	{
-		quotec = cstate->opts.quote[0];
-		escapec = cstate->opts.escape[0];
+		char		quotec = cstate->opts.quote[0];
+		char		escapec = cstate->opts.escape[0];
 
-		quote = vector8_broadcast(quotec);
+		bs_or_quote = vector8_broadcast(quotec);
 		if (quotec != escapec)
 		{
 			unique_escapec = true;
@@ -1397,18 +1394,10 @@ CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv, bool *temp_hit_eof
 			/* Load a chunk of data into a vector register */
 			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
 
-			if (is_csv)
-			{
-				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
-				match = vector8_or(match, vector8_eq(chunk, quote));
-				if (unique_escapec)
-					match = vector8_or(match, vector8_eq(chunk, escape));
-			}
-			else
-			{
-				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
-				match = vector8_or(match, vector8_eq(chunk, bs));
-			}
+			match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+			match = vector8_or(match, vector8_eq(chunk, bs_or_quote));
+			if (unique_escapec)
+				match = vector8_or(match, vector8_eq(chunk, escape));
 
 			/* Check if we found any special characters */
 			if (vector8_is_highbit_set(match))
-- 
2.47.3




* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-11 12:19  KAZAR Ayoub <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 1 reply; 59+ messages in thread

From: KAZAR Ayoub @ 2026-03-11 12:19 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Manni Wood <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Mar 11, 2026, 12:36 PM Nazir Bilal Yavuz <[email protected]> wrote:

> 0002 has an attempt to remove some branches from SIMD code but since
> it is kind of functional change, I wanted to attach that as another
> patch. I think we can apply some parts of this, if not all.
>
0002 sounds really good to have, haven't measured the diff but it's very
logical.

Another quick question though, do we need USE_NO_SIMD for any reason? I
just remembered that there's some simd paths like json that don't use it.

Regards,
Ayoub




* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-11 13:10  Nazir Bilal Yavuz <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-11 13:10 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Manni Wood <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Wed, 11 Mar 2026 at 15:19, KAZAR Ayoub <[email protected]> wrote:
>
> On Wed, Mar 11, 2026, 12:36 PM Nazir Bilal Yavuz <[email protected]> wrote:
>>
>> 0002 has an attempt to remove some branches from SIMD code but since
>> it is kind of functional change, I wanted to attach that as another
>> patch. I think we can apply some parts of this, if not all.
>
> 0002 sounds really good to have, haven't measured the diff but it's very logical.

I agree with you. I saw very small speedups like 1%-2% but I think
changes make sense regardless of the performance improvement.


> Another quick question though, do we need USE_NO_SIMD for any reason? I just remembered that there's some simd paths like json that don't use it.

vector8_eq() and vector8_highbit_mask() don't have non-SIMD
implementations, so we need to use USE_NO_SIMD.
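
As a rough illustration of why the guard is needed, here is a minimal
standalone sketch (not PostgreSQL code). DEMO_NO_SIMD is a hypothetical
stand-in for USE_NO_SIMD, and first_newline_in_chunk() and find_newline()
are made-up names. The sketch assumes SSE2 intrinsics and the GCC/Clang
builtin __builtin_ctz on the vector path; everywhere else only the scalar
loop is compiled.

```c
#include <assert.h>
#include <stddef.h>

/*
 * DEMO_NO_SIMD is a hypothetical stand-in for PostgreSQL's USE_NO_SIMD.
 * In this sketch it is defined whenever SSE2 is not available.
 */
#if !defined(__SSE2__)
#define DEMO_NO_SIMD
#endif

#ifndef DEMO_NO_SIMD
#include <emmintrin.h>

/*
 * Return the offset of the first '\n' in the 16 bytes at p, or -1 if
 * there is none.  This helper has no scalar equivalent in the sketch,
 * so it only exists when the vector path is compiled in.
 */
static int
first_newline_in_chunk(const char *p)
{
	__m128i		chunk = _mm_loadu_si128((const __m128i *) p);
	__m128i		match = _mm_cmpeq_epi8(chunk, _mm_set1_epi8('\n'));
	int			mask = _mm_movemask_epi8(match);

	/* lowest set bit = first matching byte (GCC/Clang builtin) */
	return mask ? __builtin_ctz((unsigned int) mask) : -1;
}
#endif

/* Return the offset of the first '\n' in buf[0..len), or -1 if none. */
static int
find_newline(const char *buf, size_t len)
{
	size_t		i = 0;

	assert(buf != NULL);

#ifndef DEMO_NO_SIMD
	/* Vector path: only taken while a full 16-byte chunk remains. */
	while (len - i >= 16)
	{
		int			pos = first_newline_in_chunk(buf + i);

		if (pos >= 0)
			return (int) i + pos;
		i += 16;
	}
#endif

	/* Scalar path: handles the tail, or everything without SIMD. */
	for (; i < len; i++)
	{
		if (buf[i] == '\n')
			return (int) i;
	}
	return -1;
}
```

Without the #ifndef, a build with no vector support would fail to compile
the call to the chunk helper, which is the same situation the patch avoids
by wrapping CopyReadLineTextSIMDHelper() and its call site.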


-- 
Regards,
Nazir Bilal Yavuz
Microsoft






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-11 13:23  KAZAR Ayoub <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 0 replies; 59+ messages in thread

From: KAZAR Ayoub @ 2026-03-11 13:23 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Manni Wood <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Mar 11, 2026 at 2:10 PM Nazir Bilal Yavuz <[email protected]>
wrote:

> Hi,
>
> On Wed, 11 Mar 2026 at 15:19, KAZAR Ayoub <[email protected]> wrote:
> >
> > On Wed, Mar 11, 2026, 12:36 PM Nazir Bilal Yavuz <[email protected]>
> wrote:
> >>
> >> 0002 has an attempt to remove some branches from SIMD code but since
> >> it is kind of functional change, I wanted to attach that as another
> >> patch. I think we can apply some parts of this, if not all.
> >
> > 0002 sounds really good to have, haven't measured the diff but it's very
> logical.
>
> I agree with you. I saw very small speedups like 1%-2% but I think
> changes make sense regardless of the performance improvement.


>
> > Another quick question though, do we need USE_NO_SIMD for any reason? I
> just remembered that there's some simd paths like json that don't use it.
>
> vector8_eq() and vector8_highbit_mask() don't have non-SIMD
> implementations, so we need to use USE_NO_SIMD.
>
Aha! That's true, thanks.

Regards,
Ayoub



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-11 18:09  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 1 reply; 59+ messages in thread

From: Nathan Bossart @ 2026-03-11 18:09 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Mar 11, 2026 at 02:36:46PM +0300, Nazir Bilal Yavuz wrote:
> 0002 has an attempt to remove some branches from SIMD code but since
> it is kind of functional change, I wanted to attach that as another
> patch. I think we can apply some parts of this, if not all.

Could you describe what this is doing and what the performance impact is?

-- 
nathan






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-11 18:49  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-11 18:49 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Wed, 11 Mar 2026 at 21:09, Nathan Bossart <[email protected]> wrote:
>
> On Wed, Mar 11, 2026 at 02:36:46PM +0300, Nazir Bilal Yavuz wrote:
> > 0002 has an attempt to remove some branches from SIMD code but since
> > it is kind of functional change, I wanted to attach that as another
> > patch. I think we can apply some parts of this, if not all.
>
> Could you describe what this is doing and what the performance impact is?

The SIMD code checks these characters:

csv mode: nl, cr, quote and possibly escape.

text mode: nl, cr and bs.

v12 checks them like this:

            if (is_csv)
            {
                match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
                match = vector8_or(match, vector8_eq(chunk, quote));
                if (unique_escapec)
                    match = vector8_or(match, vector8_eq(chunk, escape));
            }
            else
            {
                match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
                match = vector8_or(match, vector8_eq(chunk, bs));
            }

But we know that we always check nl, cr, and exactly one of quote or
bs. So we can introduce a new variable named bs_or_quote, which is set
to bs in text mode and to quote in csv mode. Then we can remove the
'if (is_csv)' check and keep only the escape check ('if
(unique_escapec)'), which can only be true in csv mode. The code then
looks like this:

            match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
            match = vector8_or(match, vector8_eq(chunk, bs_or_quote));
            if (unique_escapec)
                match = vector8_or(match, vector8_eq(chunk, escape));

That is what v13-0002 does. I saw 1%-2% speedups with this change and
there was no regression.

Regardless of introducing the bs_or_quote variable, we can move 'match
= vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));' outside
of the if checks, though.
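
To make the equivalence easier to see, here is a small scalar model of
the two forms. It is only a sketch: eq_mask() stands in for vector8_eq()
followed by vector8_highbit_mask(), CHUNK assumes a 16-byte Vector8, and
match_branchy(), match_merged() and match_checked() are hypothetical
names that do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CHUNK 16				/* assumed sizeof(Vector8) for this model */

/* Bit i is set if chunk[i] == c; scalar model of the vector compare. */
static uint32_t
eq_mask(const char *chunk, char c)
{
	uint32_t	mask = 0;

	for (int i = 0; i < CHUNK; i++)
	{
		if (chunk[i] == c)
			mask |= (uint32_t) 1 << i;
	}
	return mask;
}

/* v12 form: the mode is tested for every chunk. */
static uint32_t
match_branchy(const char *chunk, bool is_csv, char quotec, char escapec)
{
	bool		unique_escapec = is_csv && quotec != escapec;
	uint32_t	match;

	if (is_csv)
	{
		match = eq_mask(chunk, '\n') | eq_mask(chunk, '\r');
		match |= eq_mask(chunk, quotec);
		if (unique_escapec)
			match |= eq_mask(chunk, escapec);
	}
	else
	{
		match = eq_mask(chunk, '\n') | eq_mask(chunk, '\r');
		match |= eq_mask(chunk, '\\');
	}
	return match;
}

/*
 * v13-0002 form: one character covers both modes.  In the patch the
 * choice is made once before the loop; it is done per call here only to
 * keep the model short.
 */
static uint32_t
match_merged(const char *chunk, bool is_csv, char quotec, char escapec)
{
	char		bs_or_quote = is_csv ? quotec : '\\';
	bool		unique_escapec = is_csv && quotec != escapec;
	uint32_t	match;

	match = eq_mask(chunk, '\n') | eq_mask(chunk, '\r');
	match |= eq_mask(chunk, bs_or_quote);
	if (unique_escapec)
		match |= eq_mask(chunk, escapec);
	return match;
}

/* Return the merged result after asserting both forms agree. */
static uint32_t
match_checked(const char *chunk, bool is_csv, char quotec, char escapec)
{
	uint32_t	merged = match_merged(chunk, is_csv, quotec, escapec);

	assert(merged == match_branchy(chunk, is_csv, quotec, escapec));
	return merged;
}
```

Since unique_escapec is only ever set in csv mode, the remaining branch
behaves the same as before in both modes.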

--
Regards,
Nazir Bilal Yavuz
Microsoft






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-11 19:02  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Nathan Bossart @ 2026-03-11 19:02 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Mar 11, 2026 at 09:49:22PM +0300, Nazir Bilal Yavuz wrote:
> That is what v13-0002 does. I saw 1%-2% speedups with this change and
> there was no regression.

Thanks for the explanation.  Is there any reason _not_ to add this to 0001?

-- 
nathan






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-11 19:22  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-11 19:22 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Wed, 11 Mar 2026 at 22:02, Nathan Bossart <[email protected]> wrote:
>
> On Wed, Mar 11, 2026 at 09:49:22PM +0300, Nazir Bilal Yavuz wrote:
> > That is what v13-0002 does. I saw 1%-2% speedups with this change and
> > there was no regression.
>
> Thanks for the explanation.  Is there any reason _not_ to add this to 0001?

I noticed this improvement today. To make it easier to review for
anyone who may have already started looking at v12, I attached the new
functional code changes as 0002. There was no other reason.

Here is v14 which is v13-0001 + v13-0002.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft


Attachments:

  [text/x-patch] v14-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (9.8K, 2-v14-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
From d19e62275db2943cc4275cac2262c63d0bd4436c Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Wed, 11 Mar 2026 14:04:29 +0300
Subject: [PATCH v14] Speed up COPY FROM text/CSV parsing using SIMD

COPY FROM text and CSV parsing previously scanned the input buffer
one byte at a time looking for special characters (newline, carriage
return, backslash, and in CSV mode, quote and escape characters).
This patch adds a SIMD-accelerated fast path that processes a full
vector-width chunk per iteration, significantly reducing the number
of iterations needed on inputs with long lines.

A new helper function, CopyReadLineTextSIMDHelper(), loads chunks
of the input buffer into vector registers and checks for any special
characters using SIMD comparisons. When no special characters are
found, the entire chunk is skipped at once. When a special character
is found, the helper advances to that position and hands off to the
existing scalar code to handle it correctly.

To avoid a regression on inputs that contain many special characters,
SIMD is disabled for the remainder of the current input once a
non-EOL special character is encountered. SIMD is also disabled when
the processed line is shorter than a full vector, since SIMD can't
provide a benefit there.

Author: Shinya Kato <[email protected]>
Author: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Kazar Ayoub <[email protected]>
Reviewed-by: Nathan Bossart <[email protected]>
Reviewed-by: Neil Conway <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Manni Wood <[email protected]>
Reviewed-by: Mark Wong <[email protected]>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
---
 src/backend/commands/copyfrom.c          |   3 +
 src/backend/commands/copyfromparse.c     | 195 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h |   3 +
 3 files changed, 194 insertions(+), 7 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2f42f55e229..fe18bd70890 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1747,6 +1747,9 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
 
+	/* Initialize SIMD */
+	cstate->simd_enabled = true;
+
 	/*
 	 * Allocate buffers for the input pipeline.
 	 *
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 84c8809a889..55159b0122c 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 #include "utils/wait_event.h"
@@ -159,6 +160,12 @@ static pg_attribute_always_inline bool NextCopyFromRawFieldsInternal(CopyFromSta
 																	 int *nfields,
 																	 bool is_csv);
 
+/* SIMD functions */
+#ifndef USE_NO_SIMD
+static bool CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
+									   bool *temp_hit_eof, int *temp_input_buf_ptr);
+#endif
+
 
 /* Low-level communications functions */
 static int	CopyGetData(CopyFromState cstate, void *databuf,
@@ -1311,6 +1318,144 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 	return result;
 }
 
+#ifndef USE_NO_SIMD
+/*
+ * Use SIMD instructions to efficiently scan the input buffer for special
+ * characters (e.g., newline, carriage return, quote, and escape). This is
+ * faster than byte-by-byte iteration, especially on large buffers.
+ *
+ * Note that SIMD may become slower when the input contains many special
+ * characters. To avoid this regression, we disable SIMD for the rest of the
+ * input once we encounter a special character that isn't EOL.
+ * SIMD is also disabled when we encounter a line that is too short to
+ * fill a full-sized vector.
+ */
+static bool
+CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv, bool *temp_hit_eof, int *temp_input_buf_ptr)
+{
+	char	   *copy_input_buf;
+	int			input_buf_ptr;
+	int			copy_buf_len;
+	bool		result = false;
+	bool		unique_escapec = false;
+	bool		first_vector = true;
+	Vector8		nl = vector8_broadcast('\n');
+	Vector8		cr = vector8_broadcast('\r');
+	Vector8		bs_or_quote = vector8_broadcast('\\');
+	Vector8		escape = vector8_broadcast(0);
+
+	if (is_csv)
+	{
+		char		quotec = cstate->opts.quote[0];
+		char		escapec = cstate->opts.escape[0];
+
+		bs_or_quote = vector8_broadcast(quotec);
+		if (quotec != escapec)
+		{
+			unique_escapec = true;
+			escape = vector8_broadcast(escapec);
+		}
+	}
+
+	/* For a little extra speed we copy these into local variables */
+	copy_input_buf = cstate->input_buf;
+	input_buf_ptr = cstate->input_buf_index;
+	copy_buf_len = cstate->input_buf_len;
+
+	while (true)
+	{
+		/* Load more data if needed */
+		if (sizeof(Vector8) > copy_buf_len - input_buf_ptr)
+		{
+			REFILL_LINEBUF;
+
+			CopyLoadInputBuf(cstate);
+			/* update our local variables */
+			*temp_hit_eof = cstate->input_reached_eof;
+			input_buf_ptr = cstate->input_buf_index;
+			copy_buf_len = cstate->input_buf_len;
+
+			/*
+			 * If we are completely out of data, break out of the loop,
+			 * reporting EOF.
+			 */
+			if (INPUT_BUF_BYTES(cstate) <= 0)
+			{
+				result = true;
+				break;
+			}
+		}
+
+		if (copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match = vector8_broadcast(0);
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+			match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+			match = vector8_or(match, vector8_eq(chunk, bs_or_quote));
+			if (unique_escapec)
+				match = vector8_or(match, vector8_eq(chunk, escape));
+
+			/* Check if we found any special characters */
+			if (vector8_is_highbit_set(match))
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				uint32		mask;
+				int			advance;
+				char		c;
+
+				mask = vector8_highbit_mask(match);
+				advance = pg_rightmost_one_pos32(mask);
+
+				input_buf_ptr += advance;
+				c = copy_input_buf[input_buf_ptr];
+
+				/*
+				 * If we encountered a special character in the first vector,
+				 * this means line is not long enough to skip fully sized
+				 * vector. To be cautious, disable SIMD for the rest.
+				 *
+				 * Otherwise, do not disable SIMD when we hit EOL characters.
+				 * We don't check for EOF because parsing ends there.
+				 */
+				if (first_vector || !(c == '\r' || c == '\n'))
+					cstate->simd_enabled = false;
+
+				break;
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				input_buf_ptr += sizeof(Vector8);
+				first_vector = false;
+			}
+		}
+
+		/*
+		 * Although we refill linebuf, there is not enough character to fill
+		 * full sized vector. This doesn't mean that we encountered a line
+		 * that is not enough to fill a full sized vector.
+		 *
+		 * Scalar code will handle the rest for this line. Then, SIMD will
+		 * continue from the next line.
+		 */
+		else
+		{
+			break;
+		}
+	}
+
+	*temp_input_buf_ptr = input_buf_ptr;
+	return result;
+}
+#endif
+
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
@@ -1339,6 +1484,49 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			escapec = '\0';
 	}
 
+	/*
+	 * input_buf_ptr might be updated in the SIMD Helper function, so it needs
+	 * to be set before calling CopyReadLineTextSIMDHelper().
+	 */
+	input_buf_ptr = cstate->input_buf_index;
+
+#ifndef USE_NO_SIMD
+	/* First try to run SIMD, then continue with the scalar path */
+	if (cstate->simd_enabled)
+	{
+		/*
+		 * Temporary variables are used here instead of passing the actual
+		 * variables (especially input_buf_ptr) directly to the helper. Taking
+		 * the address of a local variable might force the compiler to
+		 * allocate it on the stack rather than in a register.  Because
+		 * input_buf_ptr is used heavily in the hot scalar path below, keeping
+		 * it in a register is important for performance.
+		 */
+		int			temp_input_buf_ptr;
+		bool		temp_hit_eof = hit_eof;
+
+		result = CopyReadLineTextSIMDHelper(cstate, is_csv, &temp_hit_eof,
+											&temp_input_buf_ptr);
+		input_buf_ptr = temp_input_buf_ptr;
+		hit_eof = temp_hit_eof;
+
+		/* Early exit from SIMD */
+		if (result)
+		{
+			/*
+			 * Transfer any still-uncopied data to line_buf.
+			 */
+			REFILL_LINEBUF;
+
+			return result;
+		}
+	}
+#endif
+
+	/* For a little extra speed we copy these into local variables */
+	copy_input_buf = cstate->input_buf;
+	copy_buf_len = cstate->input_buf_len;
+
 	/*
 	 * The objective of this loop is to transfer the entire next input line
 	 * into line_buf.  Hence, we only care for detecting newlines (\r and/or
@@ -1360,14 +1548,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	 * character to examine; any characters from input_buf_index to
 	 * input_buf_ptr have been determined to be part of the line, but not yet
 	 * transferred to line_buf.
-	 *
-	 * For a little extra speed within the loop, we copy input_buf and
-	 * input_buf_len into local variables.
 	 */
-	copy_input_buf = cstate->input_buf;
-	input_buf_ptr = cstate->input_buf_index;
-	copy_buf_len = cstate->input_buf_len;
-
 	for (;;)
 	{
 		int			prev_raw_ptr;
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index f892c343157..5b020bf4d0b 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -89,6 +89,9 @@ typedef struct CopyFromStateData
 	const char *cur_attval;		/* current att value for error messages */
 	bool		relname_only;	/* don't output line number, att, etc. */
 
+	/* SIMD variables */
+	bool		simd_enabled;
+
 	/*
 	 * Working state
 	 */
-- 
2.47.3
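
The fast path described in the v14 commit message can be sketched in portable C. This is a simplified illustration, not the patch itself: ScanState, first_special_in_chunk() and skip_plain_chunks() are hypothetical names, only text-mode special characters are checked, buffer refilling is left out, and a byte loop replaces the vector comparisons. It shows whole chunks being skipped, the hand-off position for the scalar code, and the rule that disables the fast path on a short line or a non-EOL special character.

```c
#include <stdbool.h>
#include <stddef.h>

#define CHUNK_SIZE 16           /* stand-in for sizeof(Vector8) */

/* Hypothetical stand-in for the per-COPY state flag in the patch. */
typedef struct ScanState
{
    bool        simd_enabled;
} ScanState;

/*
 * Return the offset of the first text-mode special character in a
 * CHUNK_SIZE-byte chunk, or -1 if there is none.  The patch does this with
 * vector comparisons; a byte loop keeps the sketch portable.
 */
static int
first_special_in_chunk(const char *chunk)
{
    for (int i = 0; i < CHUNK_SIZE; i++)
    {
        if (chunk[i] == '\n' || chunk[i] == '\r' || chunk[i] == '\\')
            return i;
    }
    return -1;
}

/*
 * Skip whole chunks that contain no special characters.  Returns the
 * position where the scalar parser should take over.
 */
static size_t
skip_plain_chunks(ScanState *state, const char *buf, size_t len)
{
    size_t      pos = 0;
    bool        first_chunk = true;

    while (len - pos >= CHUNK_SIZE)
    {
        int         off = first_special_in_chunk(buf + pos);
        char        c;

        if (off < 0)
        {
            /* Nothing special here, so skip the entire chunk. */
            pos += CHUNK_SIZE;
            first_chunk = false;
            continue;
        }

        /* Stop at the special character and let scalar code handle it. */
        pos += (size_t) off;
        c = buf[pos];

        /* Short line or non-EOL special character: disable the fast path. */
        if (first_chunk || !(c == '\n' || c == '\r'))
            state->simd_enabled = false;
        break;
    }
    return pos;
}
```

A caller would check state->simd_enabled before each line and fall back to the byte-at-a-time loop once it has been cleared.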




* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-11 20:42  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Nathan Bossart @ 2026-03-11 20:42 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Mar 11, 2026 at 10:22:18PM +0300, Nazir Bilal Yavuz wrote:
> Here is v14 which is v13-0001 + v13-0002.

Thanks!  It's getting close.

> +		/*
> +		 * Temporary variables are used here instead of passing the actual
> +		 * variables (especially input_buf_ptr) directly to the helper. Taking
> +		 * the address of a local variable might force the compiler to
> +		 * allocate it on the stack rather than in a register.  Because
> +		 * input_buf_ptr is used heavily in the hot scalar path below, keeping
> +		 * it in a register is important for performance.
> +		 */
> +		int			temp_input_buf_ptr;
> +		bool		temp_hit_eof = hit_eof;

A few notes:

* Does using a temporary variable for hit_eof actually make a difference?
AFAICT that's only updated when loading more data.

* Does inlining the function produce the same results?

* Also, I'm curious what the usual benchmarks look like with and without
this hack for the latest patch.
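
For readers following along, the temporary-variable pattern under discussion can be sketched as below. The names skip_plain_prefix() and find_line_end() are hypothetical, and the helper is a plain byte loop rather than SIMD code. Whether the compiler really keeps the hot variable in a register depends on the compiler and on inlining decisions, so this only shows the shape of the code, not a guaranteed effect.

```c
#include <stdbool.h>

/*
 * Hypothetical helper that reports its position through an out parameter.
 * Returns true when it ran out of data before finding a special character.
 */
static bool
skip_plain_prefix(const char *buf, int len, int *pos_p)
{
    int         pos = *pos_p;

    while (pos < len && buf[pos] != '\n' && buf[pos] != '\\')
        pos++;
    *pos_p = pos;
    return pos >= len;
}

/* Return the offset of the first unescaped newline, or len if none. */
static int
find_line_end(const char *buf, int len)
{
    int         pos = 0;        /* hot variable; its address is never taken */
    int         temp_pos = pos; /* only this temporary is passed by address */

    if (skip_plain_prefix(buf, len, &temp_pos))
        return len;
    pos = temp_pos;

    /* Scalar loop: a backslash escapes the byte that follows it. */
    while (pos < len)
    {
        char        c = buf[pos];

        if (c == '\\')
        {
            pos += 2;
            continue;
        }
        if (c == '\n')
            break;
        pos++;
    }
    return pos < len ? pos : len;
}
```

Passing &pos directly would also work; the open question in this thread is only whether doing so changes the generated code for the scalar loop.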

-- 
nathan






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-12 10:59  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-12 10:59 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Wed, 11 Mar 2026 at 23:42, Nathan Bossart <[email protected]> wrote:
>
> On Wed, Mar 11, 2026 at 10:22:18PM +0300, Nazir Bilal Yavuz wrote:
> > Here is v14 which is v13-0001 + v13-0002.
>
> Thanks!  It's getting close.
>
> > +             /*
> > +              * Temporary variables are used here instead of passing the actual
> > +              * variables (especially input_buf_ptr) directly to the helper. Taking
> > +              * the address of a local variable might force the compiler to
> > +              * allocate it on the stack rather than in a register.  Because
> > +              * input_buf_ptr is used heavily in the hot scalar path below, keeping
> > +              * it in a register is important for performance.
> > +              */
> > +             int                     temp_input_buf_ptr;
> > +             bool            temp_hit_eof = hit_eof;
>
> A few notes:
>
> * Does using a temporary variable for hit_eof actually make a difference?
> AFAICT that's only updated when loading more data.
>
> * Does inlining the function produce the same results?
>
> * Also, I'm curious what the usual benchmarks look like with and without
> this hack for the latest patch.

I tried to benchmark all of these questions, here are the results:

Old master means d841ca2d14 with the commit that inlines CopyReadLineText
(dc592a4155) reverted.

v14 means d841ca2d14 + v14.

v14 + #1 means removing temporary variables.

v14 + #2 means removing temp_hit_eof variable only.

v14 + #3 means inlining CopyReadLineTextSIMDHelper().

v14 + #4 means inlining CopyReadLineTextSIMDHelper() + removing
temporary variables (#1).

------------------------------------------------------------

Results for default_toast_compression = 'lz4':

+-------------------------------------------+
|             Optimization: -O2             |
+------------+--------------+---------------+
|            |     Text     |      CSV      |
+------------+------+-------+-------+-------+
|    WIDE    | None |  1/3  |  None |  1/3  |
+------------+------+-------+-------+-------+
| Old master | 4260 |  4789 |  5930 |  8276 |
+------------+------+-------+-------+-------+
|     v14    | 2489 |  4439 |  2529 |  8098 |
+------------+------+-------+-------+-------+
|  v14 + #1  | 2472 |  5177 |  2479 |  9285 |
+------------+------+-------+-------+-------+
|  v14 + #2  | 2521 |  4252 |  2481 |  8050 |
+------------+------+-------+-------+-------+
|  v14 + #3  | 2632 |  4569 |  2458 |  8657 |
+------------+------+-------+-------+-------+
|  v14 + #4  | 2476 |  4239 |  2475 | 10544 |
+------------+------+-------+-------+-------+
|            |      |       |       |       |
+------------+------+-------+-------+-------+
|            |      |       |       |       |
+------------+------+-------+-------+-------+
|            |     Text     |      CSV      |
+------------+------+-------+-------+-------+
|   NARROW   | None |  1/3  |  None |  1/3  |
+------------+------+-------+-------+-------+
| Old master | 9955 | 10056 | 10329 | 10872 |
+------------+------+-------+-------+-------+
|     v14    | 9917 | 10080 | 10104 | 10510 |
+------------+------+-------+-------+-------+
|  v14 + #1  | 9913 | 10090 | 10120 | 10532 |
+------------+------+-------+-------+-------+
|  v14 + #2  | 9937 | 10130 | 10072 | 10520 |
+------------+------+-------+-------+-------+
|  v14 + #3  | 9880 | 10258 | 10220 | 10604 |
+------------+------+-------+-------+-------+
|  v14 + #4  | 9827 | 10306 | 10308 | 10734 |
+------------+------+-------+-------+-------+

------------------------------------------------------------

Results for default_toast_compression = 'pglz':

+-------------------------------------------+
|             Optimization: -O2             |
+------------+--------------+---------------+
|            |     Text     |      CSV      |
+------------+------+-------+-------+-------+
|    WIDE    | None |  1/3  |  None |  1/3  |
+------------+------+-------+-------+-------+
| Old master | 4260 |  4789 |  5930 |  8276 |
+------------+------+-------+-------+-------+
|     v14    | 2489 |  4439 |  2529 |  8098 |
+------------+------+-------+-------+-------+
|  v14 + #1  | 2472 |  5177 |  2479 |  9285 |
+------------+------+-------+-------+-------+
|  v14 + #2  | 2521 |  4252 |  2481 |  8050 |
+------------+------+-------+-------+-------+
|  v14 + #3  | 2632 |  4569 |  2458 |  8657 |
+------------+------+-------+-------+-------+
|  v14 + #4  | 2476 |  4239 |  2475 | 10544 |
+------------+------+-------+-------+-------+
|            |      |       |       |       |
+------------+------+-------+-------+-------+
|            |      |       |       |       |
+------------+------+-------+-------+-------+
|            |     Text     |      CSV      |
+------------+------+-------+-------+-------+
|   NARROW   | None |  1/3  |  None |  1/3  |
+------------+------+-------+-------+-------+
| Old master | 9955 | 10056 | 10329 | 10872 |
+------------+------+-------+-------+-------+
|     v14    | 9917 | 10080 | 10104 | 10510 |
+------------+------+-------+-------+-------+
|  v14 + #1  | 9913 | 10090 | 10120 | 10532 |
+------------+------+-------+-------+-------+
|  v14 + #2  | 9937 | 10130 | 10072 | 10520 |
+------------+------+-------+-------+-------+
|  v14 + #3  | 9880 | 10258 | 10220 | 10604 |
+------------+------+-------+-------+-------+
|  v14 + #4  | 9827 | 10306 | 10308 | 10734 |
+------------+------+-------+-------+-------+


------------------------------------------------------------

Looking at these results:

v14 + #1 and v14 + #3 perform worse on the wide & 1/3 cases.

v14 + #4 performs worse on the CSV & wide & 1/3 case.

v14 and v14 + #2 perform very similarly, and neither shows a regression. I
think we can move forward with one of these.

--
Regards,
Nazir Bilal Yavuz
Microsoft






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-12 17:37  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 2 replies; 59+ messages in thread

From: Nathan Bossart @ 2026-03-12 17:37 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Here is what I have staged for commit, which I'm planning to do tomorrow.
Please review and/or test if you are able.

-- 
nathan



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 02:39  Manni Wood <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 1 reply; 59+ messages in thread

From: Manni Wood @ 2026-03-13 02:39 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Thu, Mar 12, 2026 at 12:37 PM Nathan Bossart <[email protected]>
wrote:

> Here is what I have staged for commit, which I'm planning to do tomorrow.
> Please review and/or test if you are able.
>
> --
> nathan
>

Hello, Nathan!

I found some time this evening to run some benchmarks using your v15 patch.
I hope these help.

lz4 - arm

arm NARROW master (git revert dc592a41557b072178f1798700bf9c69cd8e4235)
default_toast_compression = lz4
TXT :                 10203.493250 ms
CSV :                 10217.946000 ms
TXT with 1/3 escapes: 10305.912750 ms
CSV with 1/3 quotes:  12339.182000 ms

arm NARROW v15 default_toast_compression = lz4
TXT :                 10205.261500 ms  -0.017330% regression
CSV :                 10358.898500 ms  -1.379460% regression
TXT with 1/3 escapes: 10053.073000 ms  2.453347% improvement
CSV with 1/3 quotes:  11881.337000 ms  3.710497% improvement

arm WIDE master (git revert dc592a41557b072178f1798700bf9c69cd8e4235)
default_toast_compression = lz4
TXT :                 5613.525250 ms
CSV :                 8069.692750 ms
TXT with 1/3 escapes: 7088.888250 ms
CSV with 1/3 quotes:  10902.545500 ms

arm WIDE v15 default_toast_compression = lz4
TXT :                 3201.494500 ms  42.968200% improvement
CSV :                 3146.033750 ms  61.014207% improvement
TXT with 1/3 escapes: 6677.907500 ms  5.797535% improvement
CSV with 1/3 quotes:  10766.909500 ms  1.244076% improvement

lz4 - x86

x86 NARROW master (git revert dc592a41557b072178f1798700bf9c69cd8e4235)
default_toast_compression = lz4
TXT :                 26110.287750 ms
CSV :                 27923.199750 ms
TXT with 1/3 escapes: 27984.483250 ms
CSV with 1/3 quotes:  34387.239000 ms

x86 NARROW v15 default_toast_compression = lz4
TXT :                 26019.629000 ms  0.347215% improvement
CSV :                 26379.889000 ms  5.526984% improvement
TXT with 1/3 escapes: 28865.322750 ms  -3.147600% regression
CSV with 1/3 quotes:  33218.293250 ms  3.399359% improvement

x86 WIDE master (git revert dc592a41557b072178f1798700bf9c69cd8e4235)
default_toast_compression = lz4
TXT :                 15829.765000 ms
CSV :                 20479.146000 ms
TXT with 1/3 escapes: 18437.507500 ms
CSV with 1/3 quotes:  29749.379250 ms

x86 WIDE v15 default_toast_compression = lz4
TXT :                 8056.305000 ms  49.106604% improvement
CSV :                 7997.555500 ms  60.947808% improvement
TXT with 1/3 escapes: 16324.925500 ms  11.458067% improvement
CSV with 1/3 quotes:  29978.346500 ms  -0.769654% regression



pglz - arm

arm NARROW master (git revert dc592a41557b072178f1798700bf9c69cd8e4235)
default_toast_compression = pglz
TXT :                 10334.666250 ms
CSV :                 10978.851250 ms
TXT with 1/3 escapes: 11076.502750 ms
CSV with 1/3 quotes:  12582.679000 ms

arm NARROW v15 default_toast_compression = pglz
TXT :                 10002.507750 ms  3.214023% improvement
CSV :                 10017.436250 ms  8.756973% improvement
TXT with 1/3 escapes: 10179.949000 ms  8.094195% improvement
CSV with 1/3 quotes:  12088.836750 ms  3.924778% improvement

arm WIDE master (git revert dc592a41557b072178f1798700bf9c69cd8e4235)
default_toast_compression = pglz
TXT :                 11403.206000 ms
CSV :                 13915.718750 ms
TXT with 1/3 escapes: 12888.060250 ms
CSV with 1/3 quotes:  16741.463000 ms

arm WIDE v15 default_toast_compression = pglz
TXT :                 9005.868250 ms  21.023366% improvement
CSV :                 8935.159250 ms  35.790889% improvement
TXT with 1/3 escapes: 12432.655250 ms  3.533542% improvement
CSV with 1/3 quotes:  16564.852250 ms  1.054930% improvement

pglz - x86

x86 NARROW master (git revert dc592a41557b072178f1798700bf9c69cd8e4235)
default_toast_compression = pglz
TXT :                 26404.516250 ms
CSV :                 28138.719000 ms
TXT with 1/3 escapes: 28084.379750 ms
CSV with 1/3 quotes:  34502.702250 ms

x86 NARROW v15 default_toast_compression = pglz
TXT :                 26438.415000 ms  -0.128382% regression
CSV :                 26869.718000 ms  4.509804% improvement
TXT with 1/3 escapes: 29379.299750 ms  -4.610819% regression
CSV with 1/3 quotes:  33371.390250 ms  3.278908% improvement

x86 WIDE master (git revert dc592a41557b072178f1798700bf9c69cd8e4235)
default_toast_compression = pglz
TXT :                 30595.372000 ms
CSV :                 35665.908500 ms
TXT with 1/3 escapes: 32746.252000 ms
CSV with 1/3 quotes:  44136.542750 ms

x86 WIDE v15 default_toast_compression = pglz
TXT :                 22681.770750 ms  25.865354% improvement
CSV :                 22692.153000 ms  36.375789% improvement
TXT with 1/3 escapes: 30638.978000 ms  6.435161% improvement
CSV with 1/3 quotes:  44330.233000 ms  -0.438843% regression
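
The percentages above are consistent with measuring the change relative to the master timing, where a positive value is an improvement and a negative value is a regression. A minimal sketch of that arithmetic follows; percent_improvement() is a hypothetical name, and the formula is inferred from the numbers in this message rather than taken from the benchmark script.

```c
/*
 * Relative change against master, in percent.  Positive means the patched
 * build was faster; negative means it was slower.
 */
static double
percent_improvement(double master_ms, double patched_ms)
{
    return (master_ms - patched_ms) / master_ms * 100.0;
}
```

For example, the arm WIDE lz4 TXT timings of 5613.525250 ms and 3201.494500 ms give roughly 42.97 percent with this formula.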
-- 
Manni Wood EDB: https://www.enterprisedb.com



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 11:57  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 1 reply; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-13 11:57 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Thu, 12 Mar 2026 at 20:37, Nathan Bossart <[email protected]> wrote:
>
> Here is what I have staged for commit, which I'm planning to do tomorrow.
> Please review and/or test if you are able.

Thank you!

Unfortunately, v15 causes a regression for the 'csv & wide & 1/3' case
on my end. v14 took ~8000ms but v15 took ~9100ms. If we add the
tmp_hit_eof variable, the regression disappears. The regression also
disappears if I use a struct like the one below.

typedef struct CopyReadLineSIMDResult
{
    int            input_buf_ptr;
    bool        hit_eof;
    bool        result;
} CopyReadLineSIMDResult;

When I removed the tmp_hit_eof variable from v14, I didn't encounter any
regression, so I really don't understand why this is happening on my end.
Manni didn't encounter any regression in his benchmark [1].
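
The struct-based variant mentioned above can be sketched as follows. The struct definition is the one from this message; scan_for_newline() is a hypothetical stand-in for the SIMD helper that only looks for a newline with a byte loop. The point is that the helper returns all three values by value, so the caller never takes the address of its own local variables.

```c
#include <stdbool.h>

typedef struct CopyReadLineSIMDResult
{
    int         input_buf_ptr;
    bool        hit_eof;
    bool        result;
} CopyReadLineSIMDResult;

/*
 * Hypothetical stand-in for the helper: advance to the next newline and
 * report everything through the returned struct instead of out parameters.
 */
static CopyReadLineSIMDResult
scan_for_newline(const char *buf, int len, int start)
{
    CopyReadLineSIMDResult res = {start, false, false};

    while (res.input_buf_ptr < len && buf[res.input_buf_ptr] != '\n')
        res.input_buf_ptr++;

    /* Ran out of data without finding a newline. */
    if (res.input_buf_ptr >= len)
    {
        res.hit_eof = true;
        res.result = true;
    }
    return res;
}
```

The caller would then copy res.input_buf_ptr and res.hit_eof into its own locals before entering the scalar loop.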

I benchmarked v15 and both of the cases above:

------------------------------------------------------------

Results for default_toast_compression = 'lz4':

+--------------------------------------------------+
|                 Optimization: -O2                |
+-------------------+--------------+---------------+
|                   |     Text     |      CSV      |
+-------------------+------+-------+-------+-------+
|        WIDE       | None |  1/3  |  None |  1/3  |
+-------------------+------+-------+-------+-------+
|     Old master    | 4260 |  4789 |  5930 |  8276 |
+-------------------+------+-------+-------+-------+
|        v14        | 2489 |  4439 |  2529 |  8098 |
+-------------------+------+-------+-------+-------+
|        v15        | 2494 |  4235 |  2490 |  9140 |
+-------------------+------+-------+-------+-------+
| v15 + tmp_hit_eof | 2487 |  4539 |  2478 |  8041 |
+-------------------+------+-------+-------+-------+
|    v15 + struct   | 2490 |  4531 |  2483 |  7756 |
+-------------------+------+-------+-------+-------+
|                   |      |       |       |       |
+-------------------+------+-------+-------+-------+
|                   |      |       |       |       |
+-------------------+------+-------+-------+-------+
|                   |     Text     |      CSV      |
+-------------------+------+-------+-------+-------+
|       NARROW      | None |  1/3  |  None |  1/3  |
+-------------------+------+-------+-------+-------+
|     Old master    | 9955 | 10056 | 10329 | 10872 |
+-------------------+------+-------+-------+-------+
|        v14        | 9917 | 10080 | 10104 | 10510 |
+-------------------+------+-------+-------+-------+
|        v15        | 9898 | 10062 | 10232 | 10483 |
+-------------------+------+-------+-------+-------+
| v15 + tmp_hit_eof | 9847 | 10004 | 10192 | 10437 |
+-------------------+------+-------+-------+-------+
|    v15 + struct   | 9877 | 10008 | 10107 | 10521 |
+-------------------+------+-------+-------+-------+


------------------------------------------------------------

Results for default_toast_compression = 'pglz':

+---------------------------------------------------+
|                 Optimization: -O2                 |
+-------------------+---------------+---------------+
|                   |      Text     |      CSV      |
+-------------------+-------+-------+-------+-------+
|        WIDE       |  None |  1/3  |  None |  1/3  |
+-------------------+-------+-------+-------+-------+
|     Old master    | 10579 | 10927 | 12276 | 14488 |
+-------------------+-------+-------+-------+-------+
|        v14        |  8832 | 10646 |  8815 | 14352 |
+-------------------+-------+-------+-------+-------+
|        v15        |  8859 | 10489 |  8835 | 15414 |
+-------------------+-------+-------+-------+-------+
| v15 + tmp_hit_eof |  8828 | 10829 |  8840 | 14297 |
+-------------------+-------+-------+-------+-------+
|    v15 + struct   |  8847 | 10829 |  8846 | 14003 |
+-------------------+-------+-------+-------+-------+
|                   |       |       |       |       |
+-------------------+-------+-------+-------+-------+
|                   |       |       |       |       |
+-------------------+-------+-------+-------+-------+
|                   |      Text     |      CSV      |
+-------------------+-------+-------+-------+-------+
|       NARROW      |  None |  1/3  |  None |  1/3  |
+-------------------+-------+-------+-------+-------+
|     Old master    |  9952 | 10342 | 10112 | 10861 |
+-------------------+-------+-------+-------+-------+
|        v14        |  9907 | 10344 | 10103 | 10492 |
+-------------------+-------+-------+-------+-------+
|        v15        |  9897 | 10261 | 10126 | 10490 |
+-------------------+-------+-------+-------+-------+
| v15 + tmp_hit_eof |  9848 | 10218 | 10184 | 10425 |
+-------------------+-------+-------+-------+-------+
|    v15 + struct   |  9858 | 10150 | 10116 | 10464 |
+-------------------+-------+-------+-------+-------+

------------------------------------------------------------

The 'csv & wide & 1/3' case is much better with 'v15 + struct' and
'v15 + tmp_hit_eof' than with v15. The 'text & wide & 1/3' case is a bit
worse than with v15, but still better than master.


Regardless of the issues above, I encountered a compiler warning with
v15. If 'USE_NO_SIMD' is defined, this warning appears:

copyfromparse.c:1780:1: warning: label ‘out’ defined but not used
[-Wunused-label]

The rest of the changes look good to me. v16 is attached; it fixes the
warning by protecting 'out' with '#ifndef USE_NO_SIMD', with no other
changes. In addition, I put 'using CopyReadLineSIMDResult struct' in
0002 to get opinions on it.
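
The shape of the warning fix can be sketched as below. This is a hypothetical function, not the code from copyfromparse.c: read_line() and its argument are made up for illustration. It shows why the label needs the same guard as the only goto that targets it.

```c
#include <stdbool.h>

/*
 * Hypothetical function with an optional fast path.  When USE_NO_SIMD is
 * defined, the only "goto out" disappears, so the label must disappear too
 * or the compiler reports it as unused.
 */
static int
read_line(bool fast_path_finished)
{
    int         result = 0;

#ifndef USE_NO_SIMD
    if (fast_path_finished)
    {
        result = 1;
        goto out;
    }
#else
    (void) fast_path_finished;  /* unused without the fast path */
#endif

    /* Scalar path. */
    result = 2;

#ifndef USE_NO_SIMD
out:
#endif
    return result;
}
```

Compiling this sketch with and without -DUSE_NO_SIMD should produce no unused-label warning in either configuration.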


[1] https://postgr.es/m/CAKWEB6pMbdMDvhfaX1Z0eSULVQFYhEhssaRHdOxAX_5OYubxKw%40mail.gmail.com

--
Regards,
Nazir Bilal Yavuz
Microsoft


Attachments:

  [text/x-patch] v16-0001-Optimize-COPY-FROM-FORMAT-text-csv-using-SIMD.patch (8.9K, 2-v16-0001-Optimize-COPY-FROM-FORMAT-text-csv-using-SIMD.patch)
From 49e82abfc752032fb10e2c144f7656f6fdf78366 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <[email protected]>
Date: Thu, 12 Mar 2026 12:32:23 -0500
Subject: [PATCH v16 1/2] Optimize COPY FROM (FORMAT {text,csv}) using SIMD.

Presently, such commands scan the input buffer one byte at a time
looking for special characters.  This commit adds a new path that
uses SIMD instructions to skip over chunks of data without any
special characters.  This can be much faster.

To avoid regressions, SIMD processing is disabled for the remainder
of the COPY FROM command as soon as we encounter a short line or a
special character (except for end-of-line characters, else we'd
always disable it after the first line).  This is perhaps too
conservative, but it could probably be made more lenient in the
future via fine-tuned heuristics.

Author: Nazir Bilal Yavuz <[email protected]>
Co-authored-by: Shinya Kato <[email protected]>
Reviewed-by: Ayoub Kazar <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Neil Conway <[email protected]>
Tested-by: Manni Wood <[email protected]>
Tested-by: Mark Wong <[email protected]>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
---
 src/backend/commands/copyfrom.c          |   1 +
 src/backend/commands/copyfromparse.c     | 182 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h |   1 +
 3 files changed, 181 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 0ece40557c8..95f6cb416a9 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1746,6 +1746,7 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attname = NULL;
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
+	cstate->simd_enabled = true;
 
 	/*
 	 * Allocate buffers for the input pipeline.
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 84c8809a889..bae3bf6fb0d 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 #include "utils/wait_event.h"
@@ -1311,6 +1312,152 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 	return result;
 }
 
+#ifndef USE_NO_SIMD
+/*
+ * Helper function for CopyReadLineText() that uses SIMD instructions to scan
+ * the input buffer for special characters.  This can be much faster.
+ *
+ * Note that we disable SIMD for the remainder of the COPY FROM command upon
+ * encountering a special character (except for end-of-line characters) or a
+ * short line.  This is perhaps too conservative, but it should help avoid
+ * regressions.  It could probably be made more lenient in the future via
+ * fine-tuned heuristics.
+ */
+static bool
+CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
+						   bool *hit_eof_p, int *input_buf_ptr_p)
+{
+	char	   *copy_input_buf;
+	int			input_buf_ptr;
+	int			copy_buf_len;
+	bool		unique_esc_char;	/* for csv, do quote/esc chars differ? */
+	bool		first = true;
+	bool		result = false;
+	const Vector8 nl_vec = vector8_broadcast('\n');
+	const Vector8 cr_vec = vector8_broadcast('\r');
+	Vector8		bs_or_quote_vec;	/* '\' for text, quote for csv */
+	Vector8		esc_vec;		/* only for csv */
+
+	if (is_csv)
+	{
+		char		quote = cstate->opts.quote[0];
+		char		esc = cstate->opts.escape[0];
+
+		bs_or_quote_vec = vector8_broadcast(quote);
+		esc_vec = vector8_broadcast(esc);
+		unique_esc_char = (quote != esc);
+	}
+	else
+	{
+		bs_or_quote_vec = vector8_broadcast('\\');
+		unique_esc_char = false;
+	}
+
+	/*
+	 * For a little extra speed within the loop, we copy some state members
+	 * into local variables. Note that we need to use a separate local
+	 * variable for input_buf_ptr so that the REFILL_LINEBUF macro works.  We
+	 * copy its value into the input_buf_ptr_p argument before returning.
+	 */
+	copy_input_buf = cstate->input_buf;
+	input_buf_ptr = cstate->input_buf_index;
+	copy_buf_len = cstate->input_buf_len;
+
+	/*
+	 * See the corresponding loop in CopyReadLineText() for more information
+	 * about the purpose of this loop.  This one does the same thing using
+	 * SIMD instructions, although we are quick to bail out to the scalar path
+	 * if we encounter a special character.
+	 */
+	for (;;)
+	{
+		Vector8		chunk;
+		Vector8		match;
+
+		/* Load more data if needed. */
+		if (copy_buf_len - input_buf_ptr < sizeof(Vector8))
+		{
+			REFILL_LINEBUF;
+
+			CopyLoadInputBuf(cstate);
+			/* update our local variables */
+			*hit_eof_p = cstate->input_reached_eof;
+			input_buf_ptr = cstate->input_buf_index;
+			copy_buf_len = cstate->input_buf_len;
+
+			/*
+			 * If we are completely out of data, break out of the loop,
+			 * reporting EOF.
+			 */
+			if (INPUT_BUF_BYTES(cstate) <= 0)
+			{
+				result = true;
+				break;
+			}
+		}
+
+		/*
+		 * If we still don't have enough data for the SIMD path, fall back to
+		 * the scalar code.  Note that this doesn't necessarily mean we
+		 * encountered a short line, so we leave cstate->simd_enabled set to
+		 * true.
+		 */
+		if (copy_buf_len - input_buf_ptr < sizeof(Vector8))
+			break;
+
+		/*
+		 * If we made it here, we have at least enough data to fit in a
+		 * Vector8, so we can use SIMD instructions to scan for special
+		 * characters.
+		 */
+		vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+		/*
+		 * Check for \n, \r, \\ (for text), quotes (for csv), and escapes (for
+		 * csv, if different from quotes).
+		 */
+		match = vector8_eq(chunk, nl_vec);
+		match = vector8_or(match, vector8_eq(chunk, cr_vec));
+		match = vector8_or(match, vector8_eq(chunk, bs_or_quote_vec));
+		if (unique_esc_char)
+			match = vector8_or(match, vector8_eq(chunk, esc_vec));
+
+		/*
+		 * If we found a special character, advance to it and hand off to the
+		 * scalar path.  Except for end-of-line characters, we also disable
+		 * SIMD processing for the remainder of the COPY FROM command.
+		 */
+		if (vector8_is_highbit_set(match))
+		{
+			uint32		mask;
+			char		c;
+
+			mask = vector8_highbit_mask(match);
+			input_buf_ptr += pg_rightmost_one_pos32(mask);
+
+			/*
+			 * Don't disable SIMD if we found \n or \r, else we'd stop using
+			 * SIMD instructions after the first line.  As an exception, we do
+			 * disable it if this is the first vector we processed, as that
+			 * means the line is too short for SIMD.
+			 */
+			c = copy_input_buf[input_buf_ptr];
+			if (first || (c != '\n' && c != '\r'))
+				cstate->simd_enabled = false;
+
+			break;
+		}
+
+		/* That chunk was clear of special characters, so we can skip it. */
+		input_buf_ptr += sizeof(Vector8);
+		first = false;
+	}
+
+	*input_buf_ptr_p = input_buf_ptr;
+	return result;
+}
+#endif							/* ! USE_NO_SIMD */
+
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
@@ -1361,11 +1508,36 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	 * input_buf_ptr have been determined to be part of the line, but not yet
 	 * transferred to line_buf.
 	 *
-	 * For a little extra speed within the loop, we copy input_buf and
-	 * input_buf_len into local variables.
+	 * For a little extra speed within the loop, we copy some state
+	 * information into local variables.  input_buf_ptr could be changed in
+	 * the SIMD path, so we must set that one before it.  The others are set
+	 * afterwards.
 	 */
-	copy_input_buf = cstate->input_buf;
 	input_buf_ptr = cstate->input_buf_index;
+
+	/*
+	 * We first try to use SIMD for the task described above, falling back to
+	 * the scalar path (i.e., the loop below) if needed.
+	 */
+#ifndef USE_NO_SIMD
+	if (cstate->simd_enabled)
+	{
+		/*
+		 * Using a temporary variable seems to encourage the compiler to keep
+		 * it in a register, which is beneficial for performance.
+		 */
+		int			tmp_input_buf_ptr;
+
+		result = CopyReadLineTextSIMDHelper(cstate, is_csv, &hit_eof,
+											&tmp_input_buf_ptr);
+		input_buf_ptr = tmp_input_buf_ptr;
+
+		if (result)
+			goto out;
+	}
+#endif							/* ! USE_NO_SIMD */
+
+	copy_input_buf = cstate->input_buf;
 	copy_buf_len = cstate->input_buf_len;
 
 	for (;;)
@@ -1605,6 +1777,10 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		}
 	}							/* end of outer loop */
 
+#ifndef USE_NO_SIMD
+out:
+#endif							/* ! USE_NO_SIMD */
+
 	/*
 	 * Transfer any still-uncopied data to line_buf.
 	 */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index f892c343157..9d3e244ee55 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -108,6 +108,7 @@ typedef struct CopyFromStateData
 								 * att */
 	bool	   *defaults;		/* if DEFAULT marker was found for
 								 * corresponding att */
+	bool		simd_enabled;	/* use SIMD to scan for special chars? */
 
 	/*
 	 * True if the corresponding attribute's is a constrained domain. This
-- 
2.47.3



  [text/x-patch] v16-0002-Use-CopyReadLineSIMDResult-struct.patch (4.3K, 3-v16-0002-Use-CopyReadLineSIMDResult-struct.patch)
  download | inline diff:
From a32d853e020b1660510f960e7ba52707bbd6afe3 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Fri, 13 Mar 2026 14:25:45 +0300
Subject: [PATCH v16 2/2] Use CopyReadLineSIMDResult struct

---
 src/backend/commands/copyfromparse.c | 44 +++++++++++++++++-----------
 src/tools/pgindent/typedefs.list     |  1 +
 2 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index bae3bf6fb0d..3e3358af9e0 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1313,6 +1313,17 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 }
 
 #ifndef USE_NO_SIMD
+/*
+ * Result of CopyReadLineTextSIMDHelper, returned by value to avoid
+ * pointer parameters that could inhibit register allocation in the caller.
+ */
+typedef struct CopyReadLineSIMDResult
+{
+	int			input_buf_ptr;
+	bool		hit_eof;
+	bool		result;
+} CopyReadLineSIMDResult;
+
 /*
  * Helper function for CopyReadLineText() that uses SIMD instructions to scan
  * the input buffer for special characters.  This can be much faster.
@@ -1323,21 +1334,23 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
  * regressions.  It could probably be made more lenient in the future via
  * fine-tuned heuristics.
  */
-static bool
-CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
-						   bool *hit_eof_p, int *input_buf_ptr_p)
+static CopyReadLineSIMDResult
+CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv)
 {
+	CopyReadLineSIMDResult ret;
 	char	   *copy_input_buf;
 	int			input_buf_ptr;
 	int			copy_buf_len;
 	bool		unique_esc_char;	/* for csv, do quote/esc chars differ? */
 	bool		first = true;
-	bool		result = false;
 	const Vector8 nl_vec = vector8_broadcast('\n');
 	const Vector8 cr_vec = vector8_broadcast('\r');
 	Vector8		bs_or_quote_vec;	/* '\' for text, quote for csv */
 	Vector8		esc_vec;		/* only for csv */
 
+	ret.hit_eof = false;
+	ret.result = false;
+
 	if (is_csv)
 	{
 		char		quote = cstate->opts.quote[0];
@@ -1357,7 +1370,7 @@ CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
 	 * For a little extra speed within the loop, we copy some state members
 	 * into local variables. Note that we need to use a separate local
 	 * variable for input_buf_ptr so that the REFILL_LINEBUF macro works.  We
-	 * copy its value into the input_buf_ptr_p argument before returning.
+	 * copy its value into the return struct before returning.
 	 */
 	copy_input_buf = cstate->input_buf;
 	input_buf_ptr = cstate->input_buf_index;
@@ -1381,7 +1394,7 @@ CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
 
 			CopyLoadInputBuf(cstate);
 			/* update our local variables */
-			*hit_eof_p = cstate->input_reached_eof;
+			ret.hit_eof = cstate->input_reached_eof;
 			input_buf_ptr = cstate->input_buf_index;
 			copy_buf_len = cstate->input_buf_len;
 
@@ -1391,7 +1404,7 @@ CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
 			 */
 			if (INPUT_BUF_BYTES(cstate) <= 0)
 			{
-				result = true;
+				ret.result = true;
 				break;
 			}
 		}
@@ -1453,8 +1466,8 @@ CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
 		first = false;
 	}
 
-	*input_buf_ptr_p = input_buf_ptr;
-	return result;
+	ret.input_buf_ptr = input_buf_ptr;
+	return ret;
 }
 #endif							/* ! USE_NO_SIMD */
 
@@ -1522,15 +1535,12 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 #ifndef USE_NO_SIMD
 	if (cstate->simd_enabled)
 	{
-		/*
-		 * Using a temporary variable seems to encourage the compiler to keep
-		 * it in a register, which is beneficial for performance.
-		 */
-		int			tmp_input_buf_ptr;
+		CopyReadLineSIMDResult simd_result;
 
-		result = CopyReadLineTextSIMDHelper(cstate, is_csv, &hit_eof,
-											&tmp_input_buf_ptr);
-		input_buf_ptr = tmp_input_buf_ptr;
+		simd_result = CopyReadLineTextSIMDHelper(cstate, is_csv);
+		hit_eof = simd_result.hit_eof;
+		input_buf_ptr = simd_result.input_buf_ptr;
+		result = simd_result.result;
 
 		if (result)
 			goto out;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0de55183793..2acc40533c6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -538,6 +538,7 @@ CopyMethod
 CopyMultiInsertBuffer
 CopyMultiInsertInfo
 CopyOnErrorChoice
+CopyReadLineSIMDResult
 CopySeqResult
 CopySource
 CopyStmt
-- 
2.47.3



^ permalink  raw  reply  [nested|flat] 59+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 13:34  Nazir Bilal Yavuz <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-13 13:34 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Fri, 13 Mar 2026 at 14:57, Nazir Bilal Yavuz <[email protected]> wrote:
>
> Unfortunately, v15 causes a regression for a 'csv & wide & 1/3' case
> on my end. v14 was taking 8000ms but v15 took ~9100ms. If we add the
> tmp_hit_eof variable then the regression disappears. Also, if I use a
> struct like below, regression disappears again.

> When I removed the tmp_hit_eof variable on v14, I didn't encounter any
> regression. I really don't understand why this is happening on my end.
> Manni didn't encounter any regression on the benchmark [1].

Problem might be related to gcc. I am using Debian Trixie and my
current gcc version is 'gcc version 14.2.0 (Debian 14.2.0-19)'. If I
compile Postgres with 'Debian clang version 19.1.7 (3+b1)', then there
is no regression, which makes more sense IMO.

Here is a comparison for the csv & wide & 1/3 case (timings in ms).
Postgres is compiled with buildtype=debugoptimized and
default_toast_compression is lz4.

+--------------------------------+
|   CSV & WIDE & 1/3, LZ4, -O2   |
+--------------+--------+--------+
|              |   gcc  |  clang |
|              | 14.2.0 | 19.1.7 |
+--------------+--------+--------+
|  old master  |  8250  |  10400 |
+--------------+--------+--------+
|      v14     |  8100  |  9800  |
+--------------+--------+--------+
|      v15     |  9200  |  9800  |
+--------------+--------+--------+
| v15 + struct |  7750  |  9800  |
+--------------+--------+--------+

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 59+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 14:05  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 2 replies; 59+ messages in thread

From: Nathan Bossart @ 2026-03-13 14:05 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Fri, Mar 13, 2026 at 04:34:49PM +0300, Nazir Bilal Yavuz wrote:
> On Fri, 13 Mar 2026 at 14:57, Nazir Bilal Yavuz <[email protected]> wrote:
>> Unfortunately, v15 causes a regression for a 'csv & wide & 1/3' case
>> on my end. v14 was taking 8000ms but v15 took ~9100ms. If we add the
>> tmp_hit_eof variable then the regression disappears. Also, if I use a
>> struct like below, regression disappears again.
> 
>> When I removed the tmp_hit_eof variable on v14, I didn't encounter any
>> regression. I really don't understand why this is happening on my end.
>> Manni didn't encounter any regression on the benchmark [1].
> 
> Problem might be related to gcc. I am using Debian Trixie and my
> current gcc version is 'gcc version 14.2.0 (Debian 14.2.0-19)'. If I
> compile Postgres with 'Debian clang version 19.1.7 (3+b1)', then there
> is no regression, which makes more sense IMO.

Let's just re-add the temporary variable for hit_eof.  The struct idea is
clever, but it's just a little more complicated than I think is necessary
here.

I've also removed the goto in favor of just duplicating the "out" code,
like you had before.  I'd like to avoid sporadic #ifndef USE_NO_SIMD uses,
and goto is out of fashion, anyway.
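
The temporary-variable approach can be illustrated outside of Postgres
with a sketch like the one below. All names (scan_helper, count_lines)
are hypothetical, and whether a given compiler really generates better
code this way is an assumption that needs a benchmark to confirm; in
this thread the effect was only reported with gcc 14.2.0.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper with out-parameters.  Advances to the first '\n'
 * at or after 'start'.  Returns true if a newline was found.
 */
bool
scan_helper(const char *buf, size_t len, size_t start,
            bool *hit_end_p, size_t *pos_p)
{
    size_t      pos = start;

    while (pos < len && buf[pos] != '\n')
        pos++;

    *hit_end_p = (pos >= len);
    *pos_p = pos;
    return pos < len;
}

/* Counts newline-terminated lines in buf. */
size_t
count_lines(const char *buf, size_t len)
{
    size_t      pos = 0;        /* loop state; its address is never taken */
    bool        hit_end = false;
    size_t      nlines = 0;

    while (!hit_end)
    {
        /* Only these temporaries have their address passed around... */
        size_t      tmp_pos;
        bool        tmp_hit_end;

        if (scan_helper(buf, len, pos, &tmp_hit_end, &tmp_pos))
        {
            nlines++;
            tmp_pos++;          /* step over the newline */
        }

        /* ...and their values are copied back into the loop state. */
        pos = tmp_pos;
        hit_end = tmp_hit_end;
    }

    return nlines;
}
```

The idea is that pos and hit_end never escape the function, so the
compiler is free to keep them in registers for the whole loop.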

-- 
nathan


^ permalink  raw  reply  [nested|flat] 59+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 15:00  Nathan Bossart <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 0 replies; 59+ messages in thread

From: Nathan Bossart @ 2026-03-13 15:00 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Thu, Mar 12, 2026 at 09:39:38PM -0500, Manni Wood wrote:
> I found some time this evening to run some benchmarks using your v15 patch.
> I hope these help.

Thanks!

> x86 NARROW v15 default_toast_compression = lz4
> TXT :                 26019.629000 ms  0.347215% improvement
> CSV :                 26379.889000 ms  5.526984% improvement
> TXT with 1/3 escapes: 28865.322750 ms  -3.147600% regression
> CSV with 1/3 quotes:  33218.293250 ms  3.399359% improvement

> x86 NARROW v15 default_toast_compression = pglz
> TXT :                 26438.415000 ms  -0.128382% regression
> CSV :                 26869.718000 ms  4.509804% improvement
> TXT with 1/3 escapes: 29379.299750 ms  -4.610819% regression
> CSV with 1/3 quotes:  33371.390250 ms  3.278908% improvement

Those 3-5% regressions are interesting, but given there are similar
"improvements" for the surrounding cases, I'm going to consider them as
noise for now and proceed with the patch.  If folks feel strongly about
digging deeper here, I'm happy to revisit the subject.

-- 
nathan





^ permalink  raw  reply  [nested|flat] 59+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 15:58  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 1 reply; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-13 15:58 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Fri, 13 Mar 2026 at 17:05, Nathan Bossart <[email protected]> wrote:
>
> On Fri, Mar 13, 2026 at 04:34:49PM +0300, Nazir Bilal Yavuz wrote:
> > On Fri, 13 Mar 2026 at 14:57, Nazir Bilal Yavuz <[email protected]> wrote:
> >> Unfortunately, v15 causes a regression for a 'csv & wide & 1/3' case
> >> on my end. v14 was taking 8000ms but v15 took ~9100ms. If we add the
> >> tmp_hit_eof variable then the regression disappears. Also, if I use a
> >> struct like below, regression disappears again.
> >
> >> When I removed the tmp_hit_eof variable on v14, I didn't encounter any
> >> regression. I really don't understand why this is happening on my end.
> >> Manni didn't encounter any regression on the benchmark [1].
> >
> > Problem might be related to gcc. I am using Debian Trixie and my
> > current gcc version is 'gcc version 14.2.0 (Debian 14.2.0-19)'. If I
> > compile Postgres with 'Debian clang version 19.1.7 (3+b1)', then there
> > is no regression, which makes more sense IMO.
>
> Let's just re-add the temporary variable for hit_eof.  The struct idea is
> clever, but it's just a little more complicated than I think is necessary
> here.
>
> I've also removed the goto in favor of just duplicating the "out" code,
> like you had before.  I'd like to avoid sporadic #ifndef USE_NO_SIMD uses,
> and goto is out of fashion, anyway.

Thanks! v17 LGTM. I didn't encounter any regressions.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft


^ permalink  raw  reply  [nested|flat] 59+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 16:08  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 59+ messages in thread

From: Nathan Bossart @ 2026-03-13 16:08 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Fri, Mar 13, 2026 at 06:58:49PM +0300, Nazir Bilal Yavuz wrote:
> Thanks! v17 LGTM. I didn't encounter any regressions.

Committed.

-- 
nathan





^ permalink  raw  reply  [nested|flat] 59+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 16:13  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 0 replies; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-13 16:13 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Fri, 13 Mar 2026 at 19:08, Nathan Bossart <[email protected]> wrote:
>
> On Fri, Mar 13, 2026 at 06:58:49PM +0300, Nazir Bilal Yavuz wrote:
> > Thanks! v17 LGTM. I didn't encounter any regressions.
>
> Committed.

Thank you for taking care of this!

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 59+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 16:14  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 2 replies; 59+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-13 16:14 UTC (permalink / raw)
  To: Greg Burd <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi Greg,

On Fri, 13 Mar 2026 at 18:29, Greg Burd <[email protected]> wrote:
>
> I've always been a fan of these kinds of optimization so I couldn't resist reviewing, but I know you're ready to commit so I'll just check on some systems I have. :)

Thank you for the review!

> At first glance the implementation seems conservative, but correct and safe. Local testing on Linux/FreeBSD x86_64 and Win11/aarch64/MSVC seems good. I also tried IllumOS/SPARCv9 with some fixes (from another active thread) to the build system, and it worked just fine too.  I'm sure the 10 people who care will be thrilled. ;-)

Yes, we can probably improve this further with heuristics, but for now
we wanted to avoid introducing any potential regressions.
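
To make the conservative behaviour concrete, here is a rough scalar
emulation of the rule described in the commit message. It is only a
sketch under stated assumptions: CHUNK stands in for sizeof(Vector8),
first_special() stands in for the vector comparisons, and scan_state
and skip_clean_chunks() are made-up names, not the Postgres code.

```c
#include <stdbool.h>
#include <stddef.h>

#define CHUNK 16                /* assumed vector width in bytes */

typedef struct scan_state
{
    bool        fast_enabled;   /* plays the role of simd_enabled */
} scan_state;

/* Offset of the first special character in a chunk, or CHUNK if none. */
size_t
first_special(const char *p, char special)
{
    for (size_t i = 0; i < CHUNK; i++)
    {
        if (p[i] == '\n' || p[i] == '\r' || p[i] == special)
            return i;
    }
    return CHUNK;
}

/*
 * Skips whole chunks that contain no special characters, starting at
 * offset 0.  Returns the offset at which a byte-at-a-time scanner
 * should take over.
 */
size_t
skip_clean_chunks(scan_state *st, const char *buf, size_t len, char special)
{
    size_t      pos = 0;
    bool        first = true;

    while (st->fast_enabled && len - pos >= CHUNK)
    {
        size_t      off = first_special(buf + pos, special);

        if (off < CHUNK)
        {
            char        c;

            pos += off;
            c = buf[pos];

            /*
             * An end-of-line character keeps the fast path enabled,
             * unless it showed up in the very first chunk (short line).
             * Any other special character disables it.
             */
            if (first || (c != '\n' && c != '\r'))
                st->fast_enabled = false;
            break;
        }

        pos += CHUNK;
        first = false;
    }

    return pos;
}
```

With CHUNK = 16, a newline within the first 16 bytes turns the fast
path off, as does a backslash anywhere in a scanned chunk; a more
lenient heuristic would change only the condition that clears
fast_enabled.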

> I also created a few tests (attached) to check boundary conditions, I might add some along with the RISC-V work.

Thank you for the tests! I have checked them and the output is the
same on both v17 and master. Do you think it would make sense to add
them as regression tests?

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 59+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 16:16  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 1 reply; 59+ messages in thread

From: Nathan Bossart @ 2026-03-13 16:16 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Greg Burd <[email protected]>; Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Fri, Mar 13, 2026 at 07:14:06PM +0300, Nazir Bilal Yavuz wrote:
> On Fri, 13 Mar 2026 at 18:29, Greg Burd <[email protected]> wrote:
>> I also created a few tests (attached) to check boundary conditions, I
>> might add some along with the RISC-V work.
> 
> Thank you for the tests! I have checked them and the output is the
> same on both v17 and master. Do you think it would make sense to add
> them as regression tests?

Seems like a good idea.  I was curious what the test coverage looked like
without extra tests.  Once there's a report, we could choose a subset of
these to close any gaps.

-- 
nathan





^ permalink  raw  reply  [nested|flat] 59+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 17:21  Greg Burd <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 1 reply; 59+ messages in thread

From: Greg Burd @ 2026-03-13 17:21 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers


On Fri, Mar 13, 2026, at 12:14 PM, Nazir Bilal Yavuz wrote:
> Hi Greg,

Hello Nazir,

> On Fri, 13 Mar 2026 at 18:29, Greg Burd <[email protected]> wrote:
>>
>> I've always been a fan of these kinds of optimization so I couldn't resist reviewing, but I know you're ready to commit so I'll just check on some systems I have. :)
>
> Thank you for the review!

Thank YOU for the work fixing this. :)

>> At first glance the implementation seems conservative, but correct and safe. Local testing on Linux/FreeBSD x86_64 and Win11/aarch64/MSVC seems good. I also tried IllumOS/SPARCv9 with some fixes (from another active thread) to the build system, and it worked just fine too.  I'm sure the 10 people who care will be thrilled. ;-)
>
> Yes, we can probably improve this further with heuristics, but for now
> we wanted to avoid introducing any potential regressions.
>
>> I also created a few tests (attached) to check boundary conditions, I might add some along with the RISC-V work.
>
> Thank you for the tests! I have checked them and the output is the
> same on both v17 and master. Do you think it would make sense to add
> them as regression tests?

If any of the tests materially add to the coverage, they are worth considering.  I don't think all of those tests are necessary.

best.

-greg

> -- 
> Regards,
> Nazir Bilal Yavuz
> Microsoft





^ permalink  raw  reply  [nested|flat] 59+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 17:22  Greg Burd <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 0 replies; 59+ messages in thread

From: Greg Burd @ 2026-03-13 17:22 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers


On Fri, Mar 13, 2026, at 12:16 PM, Nathan Bossart wrote:
> On Fri, Mar 13, 2026 at 07:14:06PM +0300, Nazir Bilal Yavuz wrote:
>> On Fri, 13 Mar 2026 at 18:29, Greg Burd <[email protected]> wrote:
>>> I also created a few tests (attached) to check boundary conditions, I
>>> might add some along with the RISC-V work.
>> 
>> Thank you for the tests! I have checked them and the output is the
>> same on both v17 and master. Do you think it would make sense to add
>> them as regression tests?
>
> Seems like a good idea.  I was curious what the test coverage looked like
> without extra tests.  Once there's a report, we could choose a subset of
> these to close any gaps.

+1, you said it better than I did! :)

> -- 
> nathan

best.

-greg





^ permalink  raw  reply  [nested|flat] 59+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 17:33  Nathan Bossart <[email protected]>
  parent: Greg Burd <[email protected]>
  0 siblings, 0 replies; 59+ messages in thread

From: Nathan Bossart @ 2026-03-13 17:33 UTC (permalink / raw)
  To: Greg Burd <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Fri, Mar 13, 2026 at 01:21:38PM -0400, Greg Burd wrote:
> On Fri, Mar 13, 2026, at 12:14 PM, Nazir Bilal Yavuz wrote:
>> On Fri, 13 Mar 2026 at 18:29, Greg Burd <[email protected]> wrote:
>>> I also created a few tests (attached) to check boundary conditions, I
>>> might add some along with the RISC-V work.
>>
>> Thank you for the tests! I have checked them and the output is the
>> same on both v17 and master. Do you think it would make sense to add
>> them as regression tests?
> 
> If there are tests that materially add to the coverage that's a good
> thing to consider adding.  I don't think all those tests are necessary.

We seem to have good coverage on the new code [0].  I still wouldn't mind
adding a couple of tests for correctness, if folks want them.

[0] https://coverage.postgresql.org/src/backend/commands/copyfromparse.c.gcov.html

-- 
nathan





^ permalink  raw  reply  [nested|flat] 59+ messages in thread

end of thread, other threads:[~2026-03-13 17:33 UTC | newest]

Thread overview: 59+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-02-20 00:09 Re: Speed up COPY FROM text/CSV parsing using SIMD Manni Wood <[email protected]>
2026-02-20 09:50 ` Nazir Bilal Yavuz <[email protected]>
2026-02-20 18:15   ` Nathan Bossart <[email protected]>
2026-02-23 09:10     ` Nazir Bilal Yavuz <[email protected]>
2026-02-24 04:44       ` Manni Wood <[email protected]>
2026-02-24 13:57         ` Nazir Bilal Yavuz <[email protected]>
2026-02-24 15:07           ` KAZAR Ayoub <[email protected]>
2026-02-24 17:48           ` Nathan Bossart <[email protected]>
2026-02-25 04:06             ` Manni Wood <[email protected]>
2026-02-25 14:24             ` Nazir Bilal Yavuz <[email protected]>
2026-02-26 12:19               ` Nazir Bilal Yavuz <[email protected]>
2026-02-26 14:31                 ` KAZAR Ayoub <[email protected]>
2026-02-26 14:36                   ` Manni Wood <[email protected]>
2026-02-26 15:32                     ` Manni Wood <[email protected]>
2026-02-26 15:51                       ` KAZAR Ayoub <[email protected]>
2026-03-02 19:55               ` Nathan Bossart <[email protected]>
2026-03-04 15:15                 ` Nazir Bilal Yavuz <[email protected]>
2026-03-05 21:25                   ` Andrew Dunstan <[email protected]>
2026-03-06 16:59                     ` Manni Wood <[email protected]>
2026-03-06 17:39                       ` Nazir Bilal Yavuz <[email protected]>
2026-03-06 18:13                         ` Manni Wood <[email protected]>
2026-03-06 18:55                           ` Nazir Bilal Yavuz <[email protected]>
2026-03-06 21:25                             ` Manni Wood <[email protected]>
2026-03-06 23:13                               ` Nathan Bossart <[email protected]>
2026-03-06 23:31                                 ` KAZAR Ayoub <[email protected]>
2026-03-08 10:31                                   ` Nazir Bilal Yavuz <[email protected]>
2026-03-08 19:45                                     ` Manni Wood <[email protected]>
2026-03-09 08:10                                       ` Nazir Bilal Yavuz <[email protected]>
2026-03-09 13:31                                         ` Manni Wood <[email protected]>
2026-03-09 13:43                                           ` Nazir Bilal Yavuz <[email protected]>
2026-03-09 18:25                   ` Nathan Bossart <[email protected]>
2026-03-10 02:30                     ` Manni Wood <[email protected]>
2026-03-10 11:42                       ` Nazir Bilal Yavuz <[email protected]>
2026-03-10 12:35                     ` Nazir Bilal Yavuz <[email protected]>
2026-03-10 17:10                       ` Nathan Bossart <[email protected]>
2026-03-11 11:36                         ` Nazir Bilal Yavuz <[email protected]>
2026-03-11 12:19                           ` KAZAR Ayoub <[email protected]>
2026-03-11 13:10                             ` Nazir Bilal Yavuz <[email protected]>
2026-03-11 13:23                               ` KAZAR Ayoub <[email protected]>
2026-03-11 18:09                           ` Nathan Bossart <[email protected]>
2026-03-11 18:49                             ` Nazir Bilal Yavuz <[email protected]>
2026-03-11 19:02                               ` Nathan Bossart <[email protected]>
2026-03-11 19:22                                 ` Nazir Bilal Yavuz <[email protected]>
2026-03-11 20:42                                   ` Nathan Bossart <[email protected]>
2026-03-12 10:59                                     ` Nazir Bilal Yavuz <[email protected]>
2026-03-12 17:37                                       ` Nathan Bossart <[email protected]>
2026-03-13 02:39                                         ` Manni Wood <[email protected]>
2026-03-13 15:00                                           ` Nathan Bossart <[email protected]>
2026-03-13 11:57                                         ` Nazir Bilal Yavuz <[email protected]>
2026-03-13 13:34                                           ` Nazir Bilal Yavuz <[email protected]>
2026-03-13 14:05                                             ` Nathan Bossart <[email protected]>
2026-03-13 15:58                                               ` Nazir Bilal Yavuz <[email protected]>
2026-03-13 16:08                                                 ` Nathan Bossart <[email protected]>
2026-03-13 16:13                                                   ` Nazir Bilal Yavuz <[email protected]>
2026-03-13 16:14                                               ` Nazir Bilal Yavuz <[email protected]>
2026-03-13 16:16                                                 ` Nathan Bossart <[email protected]>
2026-03-13 17:22                                                   ` Greg Burd <[email protected]>
2026-03-13 17:21                                                 ` Greg Burd <[email protected]>
2026-03-13 17:33                                                   ` Nathan Bossart <[email protected]>
