Speed up COPY FROM text/CSV parsing using SIMD

114+ messages / 10 participants

* Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-08-07 01:48  Shinya Kato <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Shinya Kato @ 2025-08-07 01:48 UTC (permalink / raw)
  To: pgsql-hackers

Hi hackers,

I have implemented SIMD optimization for the COPY FROM (FORMAT {csv,
text}) command and observed approximately a 5% performance
improvement. Please see the detailed test results below.

Idea
====
The current text/CSV parser processes input byte-by-byte, checking
whether each byte is a special character (\n, \r, quote, escape) or a
regular character, and transitions states in a state machine. This
sequential processing is inefficient and likely causes frequent branch
mispredictions due to the many if statements.

I thought this problem could be addressed by leveraging SIMD and
vectorized operations for faster processing.

Implementation Overview
=======================
1. Create a vector of special characters (e.g., Vector8 nl =
vector8_broadcast('\n');).
2. Load the input buffer into a Vector8 variable called chunk.
3. Perform vectorized operations between chunk and the special
character vectors to check if the buffer contains any special
characters.
4-1. If no special characters are found, advance the input_buf_ptr by
sizeof(Vector8).
4-2. If special characters are found, advance the input_buf_ptr as far
as possible, then fall back to the original text/CSV parser for
byte-by-byte processing.
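
The steps above can be sketched outside PostgreSQL as follows. This is a minimal illustration, not the patch itself: it calls SSE2 intrinsics directly instead of the Vector8 helpers from port/simd.h, the function name scan_plain_prefix is hypothetical, it assumes GCC or Clang for __builtin_ctz, and it assumes quotec and escapec are both non-zero. Where SSE2 is unavailable, only the scalar loop runs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define CHUNK 16				/* bytes per vector, like sizeof(Vector8) */

/*
 * Hypothetical helper: return the offset of the first byte in buf[0..len)
 * that the byte-by-byte parser must inspect (newline, carriage return,
 * quote, or escape), or len if there is no such byte.
 */
size_t
scan_plain_prefix(const char *buf, size_t len, char quotec, char escapec)
{
	size_t		pos = 0;

	assert(buf != NULL);

#if defined(__SSE2__)
	/* Step 1: one vector per special character */
	const __m128i nl = _mm_set1_epi8('\n');
	const __m128i cr = _mm_set1_epi8('\r');
	const __m128i quote = _mm_set1_epi8(quotec);
	const __m128i escape = _mm_set1_epi8(escapec);

	while (len - pos >= CHUNK)
	{
		/* Step 2: load 16 input bytes (unaligned load) */
		__m128i		chunk = _mm_loadu_si128((const __m128i *) (buf + pos));

		/* Step 3: OR together the per-character comparisons */
		__m128i		match = _mm_or_si128(_mm_cmpeq_epi8(chunk, nl),
										 _mm_cmpeq_epi8(chunk, cr));

		match = _mm_or_si128(match, _mm_cmpeq_epi8(chunk, quote));
		match = _mm_or_si128(match, _mm_cmpeq_epi8(chunk, escape));

		unsigned	mask = (unsigned) _mm_movemask_epi8(match);

		/* Step 4-2: the lowest set bit is the first special byte */
		if (mask != 0)
			return pos + (size_t) __builtin_ctz(mask);

		/* Step 4-1: nothing special here, skip the whole chunk */
		pos += CHUNK;
	}
#endif

	/* Scalar tail (and the whole scan when SSE2 is not available) */
	while (pos < len && buf[pos] != '\n' && buf[pos] != '\r' &&
		   buf[pos] != quotec && buf[pos] != escapec)
		pos++;

	return pos;
}
```

In the patch, the scalar work after a match is done by the existing state machine rather than by a separate loop; the tail loop here only keeps the sketch self-contained.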

Test
====
I tested the performance by measuring the time it takes to load a CSV
file created using the attached SQL script with the following COPY
command:
=# COPY t FROM '/tmp/t.csv' (FORMAT csv);

Environment
-----------
OS: Rocky Linux 9.6
CPU: Intel Core i7-10710U (6 Cores / 12 Threads, 1.1 GHz Base / 4.7
GHz Boost, AVX2 & FMA supported)

Time
----
master: 02:44.943
patch applied: 02:36.878 (about 5% faster)
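
As a quick check of the "about 5%" figure, assuming both times are in psql's mm:ss.mmm format (so the master time reads 02:44.943):

```latex
\[
t_{\text{master}} = 2 \cdot 60 + 44.943 = 164.943\ \text{s}, \qquad
t_{\text{patch}} = 2 \cdot 60 + 36.878 = 156.878\ \text{s}
\]
\[
\frac{t_{\text{master}} - t_{\text{patch}}}{t_{\text{master}}}
  = \frac{8.065}{164.943} \approx 0.049 \quad (\text{about } 4.9\%)
\]
```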

Perf
----
The call graphs are attached; the share of time spent in CopyReadLineText is:
master: 12.15%
patch applied: 8.04%

Thoughts?
I would appreciate feedback on the implementation and any suggestions
for further improvement.

-- 
Best regards,
Shinya Kato
NTT OSS Center


Attachments:

  [application/octet-stream] v1-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (3.9K, 2-v1-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
  download | inline diff:
From 5ae3be7d262e4251bf21ac0c73b3e0ebc2ba615d Mon Sep 17 00:00:00 2001
From: Shinya Kato <[email protected]>
Date: Mon, 28 Jul 2025 22:08:20 +0900
Subject: [PATCH v1] Speed up COPY FROM text/CSV parsing using SIMD

The inner loop of CopyReadLineText scans for newlines and other special
characters by processing the input byte-by-byte. For large inputs, this
can be a performance bottleneck.

This commit introduces a SIMD-accelerated path. When not parsing inside
a quoted field, we can use vector instructions to scan the input buffer
for any character of interest in 16-byte chunks. This significantly
improves performance, especially for data with long, unquoted fields.
---
 src/backend/commands/copyfromparse.c | 72 ++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index b1ae97b833d..5aba0fa6cb7 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -71,7 +71,9 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "port/pg_bitutils.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 
@@ -1255,6 +1257,14 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	char		quotec = '\0';
 	char		escapec = '\0';
 
+#ifndef USE_NO_SIMD
+	Vector8		nl = vector8_broadcast('\n');
+	Vector8		cr = vector8_broadcast('\r');
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		quote;
+	Vector8		escape;
+#endif
+
 	if (is_csv)
 	{
 		quotec = cstate->opts.quote[0];
@@ -1262,6 +1272,12 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* ignore special escape processing if it's the same as quotec */
 		if (quotec == escapec)
 			escapec = '\0';
+
+#ifndef USE_NO_SIMD
+		quote = vector8_broadcast(quotec);
+		if (quotec != escapec)
+			escape = vector8_broadcast(escapec);
+#endif
 	}
 
 	/*
@@ -1328,6 +1344,62 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			need_data = false;
 		}
 
+#ifndef USE_NO_SIMD
+		/*
+		 * SIMD instructions are used here to efficiently scan the input buffer
+		 * for special characters (e.g., newline, carriage return, quotes, or
+		 * escape characters). This approach significantly improves performance
+		 * compared to byte-by-byte iteration, especially for large input
+		 * buffers.
+		 *
+		 * However, SIMD optimization cannot be applied in the following cases:
+		 * - Inside quoted fields, where escape sequences and closing quotes
+		 *   require sequential processing to handle correctly.
+		 * - When the remaining buffer size is smaller than the size of a SIMD
+		 *   vector register, as SIMD operations require processing data in
+		 *   fixed-size chunks.
+		 */
+		if (!in_quote && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match;
+			uint32		mask;
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+			/* Create a mask of all special characters we need to stop at */
+			match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+
+			if (is_csv)
+			{
+				match = vector8_or(match, vector8_eq(chunk, quote));
+				if (escapec != '\0')
+					match = vector8_or(match, vector8_eq(chunk, escape));
+			}
+			else
+				match = vector8_or(match, vector8_eq(chunk, bs));
+
+			/* Check if we found any special characters */
+			mask = vector8_highbit_mask(match);
+			if (mask != 0)
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				int advance = pg_rightmost_one_pos32(mask);
+				input_buf_ptr += advance;
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				input_buf_ptr += sizeof(Vector8);
+				continue;
+			}
+		}
+#endif
+
 		/* OK to fetch a character */
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
-- 
2.47.1



  [application/octet-stream] test.sql (1.5K, 3-test.sql)

  [image/svg+xml] master.svg (332.6K, 4-master.svg)

  [image/svg+xml] patch_applied.svg (343.1K, 5-patch_applied.svg)


* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-08-07 11:15  Nazir Bilal Yavuz <[email protected]>
  parent: Shinya Kato <[email protected]>
  0 siblings, 2 replies; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2025-08-07 11:15 UTC (permalink / raw)
  To: Shinya Kato <[email protected]>; +Cc: pgsql-hackers

Hi,

Thank you for working on this!

On Thu, 7 Aug 2025 at 04:49, Shinya Kato <[email protected]> wrote:
>
> Hi hackers,
>
> I have implemented SIMD optimization for the COPY FROM (FORMAT {csv,
> text}) command and observed approximately a 5% performance
> improvement. Please see the detailed test results below.

I have been working on the same idea. I was not moving input_buf_ptr
as far as possible, so I think your approach is better.

I also ran a benchmark on the text format, with line lengths ranging
from 1 byte to 1 megabyte. The peak improvement, more than 20% [1], is
at a line length of 4096 bytes. I saw no regression with your patch.

> Idea
> ====
> The current text/CSV parser processes input byte-by-byte, checking
> whether each byte is a special character (\n, \r, quote, escape) or a
> regular character, and transitions states in a state machine. This
> sequential processing is inefficient and likely causes frequent branch
> mispredictions due to the many if statements.
>
> I thought this problem could be addressed by leveraging SIMD and
> vectorized operations for faster processing.
>
> Implementation Overview
> =======================
> 1. Create a vector of special characters (e.g., Vector8 nl =
> vector8_broadcast('\n');).
> 2. Load the input buffer into a Vector8 variable called chunk.
> 3. Perform vectorized operations between chunk and the special
> character vectors to check if the buffer contains any special
> characters.
> 4-1. If no special characters are found, advance the input_buf_ptr by
> sizeof(Vector8).
> 4-2. If special characters are found, advance the input_buf_ptr as far
> as possible, then fall back to the original text/CSV parser for
> byte-by-byte processing.
>
...
> Thought?
> I would appreciate feedback on the implementation and any suggestions
> for further improvement.


I have a couple of ideas that I was working on:
---

+         * However, SIMD optimization cannot be applied in the following cases:
+         * - Inside quoted fields, where escape sequences and closing quotes
+         *   require sequential processing to handle correctly.

I think you can continue using SIMD inside quoted fields. The only
important thing is that you need to set last_was_esc to false when SIMD
skips a chunk.
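
A simplified model of why this is safe, as a sketch rather than the real CopyReadLineText() logic: a chunk is skipped only when it contains no newline, carriage return, quote, or escape byte, so in_quote cannot change inside it, and the scalar loop would have cleared last_was_esc on its first byte anyway. The names find_line_end and chunk_is_plain are hypothetical, the chunk test is a scalar stand-in for the vector comparison, and the escape character is assumed to differ from the quote character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 16				/* stands in for sizeof(Vector8) */

/* Scalar stand-in for the vector compare: no special byte in p[0..CHUNK)? */
bool
chunk_is_plain(const char *p, char quotec, char escapec)
{
	for (int i = 0; i < CHUNK; i++)
	{
		if (p[i] == '\n' || p[i] == '\r' || p[i] == quotec || p[i] == escapec)
			return false;
	}
	return true;
}

/*
 * Return the offset of the first line ending that is not inside a quoted
 * field, or len if there is none. With use_chunks, plain chunks are
 * skipped even while in_quote is true.
 */
size_t
find_line_end(const char *buf, size_t len, char quotec, char escapec,
			  bool use_chunks)
{
	bool		in_quote = false;
	bool		last_was_esc = false;
	size_t		pos = 0;

	assert(buf != NULL);

	while (pos < len)
	{
		char		c;

		if (use_chunks && len - pos >= CHUNK &&
			chunk_is_plain(buf + pos, quotec, escapec))
		{
			pos += CHUNK;
			last_was_esc = false;	/* what the scalar loop would have done */
			continue;
		}

		/* Byte-by-byte quote tracking */
		c = buf[pos];
		if (in_quote && c == escapec)
			last_was_esc = !last_was_esc;
		if (c == quotec && !last_was_esc)
			in_quote = !in_quote;
		if (c != escapec)
			last_was_esc = false;
		if (!in_quote && (c == '\n' || c == '\r'))
			return pos;
		pos++;
	}
	return len;
}
```

Calling find_line_end() with use_chunks set to false and then true on the same input should return the same offset, including for quoted fields that contain embedded newlines.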
---

+         * - When the remaining buffer size is smaller than the size of a SIMD
+         *   vector register, as SIMD operations require processing data in
+         *   fixed-size chunks.

You run SIMD when 'copy_buf_len - input_buf_ptr >= sizeof(Vector8)',
but you only call CopyLoadInputBuf() when 'input_buf_ptr >=
copy_buf_len || need_data'. So once fewer than sizeof(Vector8) bytes
remain, those bytes have to be processed one at a time before the
buffer is refilled and SIMD can run again. In the worst case, if
CopyLoadInputBuf() keeps delivering one byte fewer than
sizeof(Vector8), SIMD never runs at all. I think we need to make sure
that CopyLoadInputBuf() loads at least sizeof(Vector8) bytes into
input_buf so that we do not hit this problem.
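
The effect can be illustrated with a toy model. This is only a sketch under stated assumptions: every input byte is an ordinary character, each refill delivers a fixed number of bytes, and a refill happens only once the buffer is fully drained. The function name bytes_on_vector_path and the fixed refill size are hypothetical, and the model does not describe how CopyLoadInputBuf() actually sizes its reads.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK 16				/* stands in for sizeof(Vector8) */

/*
 * Toy model: out of `total` plain input bytes, how many are consumed by
 * the vector path if each refill delivers `refill` bytes and refills
 * happen only when no unread bytes are left?
 */
size_t
bytes_on_vector_path(size_t total, size_t refill)
{
	size_t		avail = 0;		/* copy_buf_len - input_buf_ptr */
	size_t		left = total;	/* bytes not yet loaded */
	size_t		vectored = 0;

	assert(refill > 0);

	while (avail > 0 || left > 0)
	{
		if (avail == 0)
		{
			/* input_buf_ptr >= copy_buf_len: load more data */
			size_t		n = left < refill ? left : refill;

			avail = n;
			left -= n;
		}

		if (avail >= CHUNK)
		{
			/* vector path: skip a whole chunk */
			avail -= CHUNK;
			vectored += CHUNK;
		}
		else
			avail -= 1;			/* scalar path: one byte at a time */
	}
	return vectored;
}
```

With 15-byte refills the vector path is never taken; with 40-byte refills 32 of every 40 bytes go through it; with 16-byte refills all of them do.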
---

What do you think about adding SIMD to CopyReadAttributesText() and
CopyReadAttributesCSV() functions? When I add your SIMD approach to
CopyReadAttributesText() function, the improvement on the 4096 byte
line length input [1] goes from 20% to 30%.
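
The attribute-splitting case differs from line splitting in that the scanned bytes must also be copied to the output. A minimal sketch of that copy-while-scanning pattern follows. The names copy_plain_run and first_special are hypothetical, the chunk test is a scalar stand-in for the vector comparison, and unlike the patch the same helper also handles the final partial chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 16				/* stands in for sizeof(Vector8) */

/*
 * Scalar stand-in for the vector compare: offset of the first backslash
 * or delimiter in p[0..n), or n if there is none.
 */
size_t
first_special(const char *p, size_t n, char delimc)
{
	size_t		i = 0;

	while (i < n && p[i] != '\\' && p[i] != delimc)
		i++;
	return i;
}

/*
 * Copy the leading run of ordinary bytes from src to dst, at most one
 * chunk per memcpy. Returns the number of bytes copied, so src[result]
 * is the first byte that needs the scalar parser (or result == len).
 * dst must have room for len bytes.
 */
size_t
copy_plain_run(char *dst, const char *src, size_t len, char delimc)
{
	size_t		done = 0;

	assert(dst != NULL && src != NULL);

	while (done < len)
	{
		size_t		want = len - done < CHUNK ? len - done : CHUNK;
		size_t		advance = first_special(src + done, want, delimc);

		/* Copy everything up to the special byte in one call */
		memcpy(dst + done, src + done, advance);
		done += advance;

		if (advance < want)
			break;				/* stopped at a backslash or delimiter */
	}
	return done;
}
```

The gain in this pattern comes from replacing one store per byte with one memcpy per chunk whenever a chunk holds no backslash or delimiter.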
---

I shared my ideas as a Feedback.txt file (.txt to stay off CFBot's
radar for this thread). I hope these help; please let me know if you
have any questions.

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachments:

  [text/plain] Feedback.txt (3.4K, 2-Feedback.txt)
  download | inline diff:
From b13f4cdf134eef5fbecf9ea06f9b1c99890b7c02 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Thu, 7 Aug 2025 13:27:34 +0300
Subject: [PATCH] Feedback

---
 src/backend/commands/copyfromparse.c | 55 ++++++++++++++++++++++++++--
 1 file changed, 51 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 5aba0fa6cb7..dae5c1f698c 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -670,8 +670,12 @@ CopyLoadInputBuf(CopyFromState cstate)
 		/* If we now have some unconverted data, try to convert it */
 		CopyConvertBuf(cstate);
 
-		/* If we now have some more input bytes ready, return them */
-		if (INPUT_BUF_BYTES(cstate) > nbytes)
+		/*
+		 * If we now have at least sizeof(Vector8) input bytes ready, return
+		 * them. This is beneficial for SIMD processing in the
+		 * CopyReadLineText() function.
+		 */
+		if (INPUT_BUF_BYTES(cstate) > nbytes + sizeof(Vector8))
 			return;
 
 		/*
@@ -1322,7 +1326,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		 * unsafe with the old v2 COPY protocol, but we don't support that
 		 * anymore.
 		 */
-		if (input_buf_ptr >= copy_buf_len || need_data)
+		if (input_buf_ptr + sizeof(Vector8) >= copy_buf_len || need_data)
 		{
 			REFILL_LINEBUF;
 
@@ -1359,7 +1363,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		 *   vector register, as SIMD operations require processing data in
 		 *   fixed-size chunks.
 		 */
-		if (!in_quote && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		if (copy_buf_len - input_buf_ptr >= sizeof(Vector8))
 		{
 			Vector8		chunk;
 			Vector8		match;
@@ -1395,6 +1399,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			{
 				/* No special characters found, so skip the entire chunk */
 				input_buf_ptr += sizeof(Vector8);
+				last_was_esc = false;
 				continue;
 			}
 		}
@@ -1650,6 +1655,11 @@ CopyReadAttributesText(CopyFromState cstate)
 	char	   *cur_ptr;
 	char	   *line_end_ptr;
 
+#ifndef USE_NO_SIMD
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		delim = vector8_broadcast(delimc);
+#endif
+
 	/*
 	 * We need a special case for zero-column tables: check that the input
 	 * line is empty, and return.
@@ -1717,6 +1727,43 @@ CopyReadAttributesText(CopyFromState cstate)
 		{
 			char		c;
 
+#ifndef USE_NO_SIMD
+		if (line_end_ptr - cur_ptr >= sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match;
+			uint32		mask;
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) cur_ptr);
+
+			/* Create a mask of all special characters we need to stop at */
+			match = vector8_or(vector8_eq(chunk, bs), vector8_eq(chunk, delim));
+
+			/* Check if we found any special characters */
+			mask = vector8_highbit_mask(match);
+			if (mask != 0)
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				int advance = pg_rightmost_one_pos32(mask);
+				memcpy(output_ptr, cur_ptr, advance);
+				output_ptr += advance;
+				cur_ptr += advance;
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				memcpy(output_ptr, cur_ptr, sizeof(Vector8));
+				output_ptr += sizeof(Vector8);
+				cur_ptr += sizeof(Vector8);
+				continue;
+			}
+		}
+#endif
+
 			end_ptr = cur_ptr;
 			if (cur_ptr >= line_end_ptr)
 				break;
-- 
2.50.1




* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-08-11 08:52  Nazir Bilal Yavuz <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2025-08-11 08:52 UTC (permalink / raw)
  To: Shinya Kato <[email protected]>; +Cc: pgsql-hackers

Hi,

On Thu, 7 Aug 2025 at 14:15, Nazir Bilal Yavuz <[email protected]> wrote:
>
> On Thu, 7 Aug 2025 at 04:49, Shinya Kato <[email protected]> wrote:
> >
> > I have implemented SIMD optimization for the COPY FROM (FORMAT {csv,
> > text}) command and observed approximately a 5% performance
> > improvement. Please see the detailed test results below.
>
> Also, I did a benchmark on text format. I created a benchmark for line
> length in a table being from 1 byte to 1 megabyte.The peak improvement
> is line length being 4096 and the improvement is more than 20% [1], I
> saw no regression on your patch.

I ran the same benchmark for the CSV format. The peak improvement, more
than 25% [1], is again at a line length of 4096 bytes. I saw a 5%
regression on the 1 byte benchmark; there are no other regressions.

> What do you think about adding SIMD to CopyReadAttributesText() and
> CopyReadAttributesCSV() functions? When I add your SIMD approach to
> CopyReadAttributesText() function, the improvement on the 4096 byte
> line length input [1] goes from 20% to 30%.

I wanted to try using SIMD in CopyReadAttributesCSV() as well. The
improvement on the 4096 byte line length input [1] goes from 25% to
35%; the regression on the 1 byte input is unchanged.

CopyReadAttributesCSV() changes are attached as feedback v2.

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachments:

  [text/plain] v2-0001-Feedback.txt (7.9K, 2-v2-0001-Feedback.txt)
  download | inline diff:
From 203d648c4cf64c6d629f2abc719a371dd0393e22 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Thu, 7 Aug 2025 13:27:34 +0300
Subject: [PATCH v2] Feedback

---
 src/backend/commands/copyfromparse.c | 176 ++++++++++++++++++++++++---
 1 file changed, 160 insertions(+), 16 deletions(-)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 5aba0fa6cb7..7b83e64e23b 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -670,8 +670,12 @@ CopyLoadInputBuf(CopyFromState cstate)
 		/* If we now have some unconverted data, try to convert it */
 		CopyConvertBuf(cstate);
 
-		/* If we now have some more input bytes ready, return them */
-		if (INPUT_BUF_BYTES(cstate) > nbytes)
+		/*
+		 * If we now have at least sizeof(Vector8) input bytes ready, return
+		 * them. This is beneficial for SIMD processing in the
+		 * CopyReadLineText() function.
+		 */
+		if (INPUT_BUF_BYTES(cstate) > nbytes + sizeof(Vector8))
 			return;
 
 		/*
@@ -1322,7 +1326,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		 * unsafe with the old v2 COPY protocol, but we don't support that
 		 * anymore.
 		 */
-		if (input_buf_ptr >= copy_buf_len || need_data)
+		if (input_buf_ptr + sizeof(Vector8) >= copy_buf_len || need_data)
 		{
 			REFILL_LINEBUF;
 
@@ -1345,21 +1349,22 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		}
 
 #ifndef USE_NO_SIMD
+
 		/*
-		 * SIMD instructions are used here to efficiently scan the input buffer
-		 * for special characters (e.g., newline, carriage return, quotes, or
-		 * escape characters). This approach significantly improves performance
-		 * compared to byte-by-byte iteration, especially for large input
-		 * buffers.
+		 * SIMD instructions are used here to efficiently scan the input
+		 * buffer for special characters (e.g., newline, carriage return,
+		 * quotes, or escape characters). This approach significantly improves
+		 * performance compared to byte-by-byte iteration, especially for
+		 * large input buffers.
 		 *
-		 * However, SIMD optimization cannot be applied in the following cases:
-		 * - Inside quoted fields, where escape sequences and closing quotes
-		 *   require sequential processing to handle correctly.
-		 * - When the remaining buffer size is smaller than the size of a SIMD
-		 *   vector register, as SIMD operations require processing data in
-		 *   fixed-size chunks.
+		 * However, SIMD optimization cannot be applied in the following
+		 * cases: - Inside quoted fields, where escape sequences and closing
+		 * quotes require sequential processing to handle correctly. - When
+		 * the remaining buffer size is smaller than the size of a SIMD vector
+		 * register, as SIMD operations require processing data in fixed-size
+		 * chunks.
 		 */
-		if (!in_quote && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		if (copy_buf_len - input_buf_ptr >= sizeof(Vector8))
 		{
 			Vector8		chunk;
 			Vector8		match;
@@ -1388,13 +1393,15 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 				 * Found a special character. Advance up to that point and let
 				 * the scalar code handle it.
 				 */
-				int advance = pg_rightmost_one_pos32(mask);
+				int			advance = pg_rightmost_one_pos32(mask);
+
 				input_buf_ptr += advance;
 			}
 			else
 			{
 				/* No special characters found, so skip the entire chunk */
 				input_buf_ptr += sizeof(Vector8);
+				last_was_esc = false;
 				continue;
 			}
 		}
@@ -1650,6 +1657,11 @@ CopyReadAttributesText(CopyFromState cstate)
 	char	   *cur_ptr;
 	char	   *line_end_ptr;
 
+#ifndef USE_NO_SIMD
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		delim = vector8_broadcast(delimc);
+#endif
+
 	/*
 	 * We need a special case for zero-column tables: check that the input
 	 * line is empty, and return.
@@ -1717,6 +1729,44 @@ CopyReadAttributesText(CopyFromState cstate)
 		{
 			char		c;
 
+#ifndef USE_NO_SIMD
+			if (line_end_ptr - cur_ptr >= sizeof(Vector8))
+			{
+				Vector8		chunk;
+				Vector8		match;
+				uint32		mask;
+
+				/* Load a chunk of data into a vector register */
+				vector8_load(&chunk, (const uint8 *) cur_ptr);
+
+				/* Create a mask of all special characters we need to stop at */
+				match = vector8_or(vector8_eq(chunk, bs), vector8_eq(chunk, delim));
+
+				/* Check if we found any special characters */
+				mask = vector8_highbit_mask(match);
+				if (mask != 0)
+				{
+					/*
+					 * Found a special character. Advance up to that point and
+					 * let the scalar code handle it.
+					 */
+					int			advance = pg_rightmost_one_pos32(mask);
+
+					memcpy(output_ptr, cur_ptr, advance);
+					output_ptr += advance;
+					cur_ptr += advance;
+				}
+				else
+				{
+					/* No special characters found, so skip the entire chunk */
+					memcpy(output_ptr, cur_ptr, sizeof(Vector8));
+					output_ptr += sizeof(Vector8);
+					cur_ptr += sizeof(Vector8);
+					continue;
+				}
+			}
+#endif
+
 			end_ptr = cur_ptr;
 			if (cur_ptr >= line_end_ptr)
 				break;
@@ -1906,6 +1956,12 @@ CopyReadAttributesCSV(CopyFromState cstate)
 	char	   *cur_ptr;
 	char	   *line_end_ptr;
 
+#ifndef USE_NO_SIMD
+	Vector8		quote = vector8_broadcast(quotec);
+	Vector8		delim = vector8_broadcast(delimc);
+	Vector8		escape = vector8_broadcast(escapec);
+#endif
+
 	/*
 	 * We need a special case for zero-column tables: check that the input
 	 * line is empty, and return.
@@ -1972,6 +2028,50 @@ CopyReadAttributesCSV(CopyFromState cstate)
 			/* Not in quote */
 			for (;;)
 			{
+#ifndef USE_NO_SIMD
+				if (line_end_ptr - cur_ptr >= sizeof(Vector8))
+				{
+					Vector8		chunk;
+					Vector8		match;
+					uint32		mask;
+
+					/* Load a chunk of data into a vector register */
+					vector8_load(&chunk, (const uint8 *) cur_ptr);
+
+					/*
+					 * Create a mask of all special characters we need to stop
+					 * at
+					 */
+					match = vector8_or(vector8_eq(chunk, quote), vector8_eq(chunk, delim));
+
+					/* Check if we found any special characters */
+					mask = vector8_highbit_mask(match);
+					if (mask != 0)
+					{
+						/*
+						 * Found a special character. Advance up to that point
+						 * and let the scalar code handle it.
+						 */
+						int			advance = pg_rightmost_one_pos32(mask);
+
+						memcpy(output_ptr, cur_ptr, advance);
+						output_ptr += advance;
+						cur_ptr += advance;
+					}
+					else
+					{
+						/*
+						 * No special characters found, so skip the entire
+						 * chunk
+						 */
+						memcpy(output_ptr, cur_ptr, sizeof(Vector8));
+						output_ptr += sizeof(Vector8);
+						cur_ptr += sizeof(Vector8);
+						continue;
+					}
+				}
+#endif
+
 				end_ptr = cur_ptr;
 				if (cur_ptr >= line_end_ptr)
 					goto endfield;
@@ -1995,6 +2095,50 @@ CopyReadAttributesCSV(CopyFromState cstate)
 			/* In quote */
 			for (;;)
 			{
+#ifndef USE_NO_SIMD
+				if (line_end_ptr - cur_ptr >= sizeof(Vector8))
+				{
+					Vector8		chunk;
+					Vector8		match;
+					uint32		mask;
+
+					/* Load a chunk of data into a vector register */
+					vector8_load(&chunk, (const uint8 *) cur_ptr);
+
+					/*
+					 * Create a mask of all special characters we need to stop
+					 * at
+					 */
+					match = vector8_or(vector8_eq(chunk, quote), vector8_eq(chunk, escape));
+
+					/* Check if we found any special characters */
+					mask = vector8_highbit_mask(match);
+					if (mask != 0)
+					{
+						/*
+						 * Found a special character. Advance up to that point
+						 * and let the scalar code handle it.
+						 */
+						int			advance = pg_rightmost_one_pos32(mask);
+
+						memcpy(output_ptr, cur_ptr, advance);
+						output_ptr += advance;
+						cur_ptr += advance;
+					}
+					else
+					{
+						/*
+						 * No special characters found, so skip the entire
+						 * chunk
+						 */
+						memcpy(output_ptr, cur_ptr, sizeof(Vector8));
+						output_ptr += sizeof(Vector8);
+						cur_ptr += sizeof(Vector8);
+						continue;
+					}
+				}
+#endif
+
 				end_ptr = cur_ptr;
 				if (cur_ptr >= line_end_ptr)
 					ereport(ERROR,
-- 
2.50.1



^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-08-12 07:25  Shinya Kato <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Shinya Kato @ 2025-08-12 07:25 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: pgsql-hackers

On Thu, Aug 7, 2025 at 8:15 PM Nazir Bilal Yavuz <[email protected]> wrote:
>
> Hi,
>
> Thank you for working on this!
>
> On Thu, 7 Aug 2025 at 04:49, Shinya Kato <[email protected]> wrote:
> >
> > Hi hackers,
> >
> > I have implemented SIMD optimization for the COPY FROM (FORMAT {csv,
> > text}) command and observed approximately a 5% performance
> > improvement. Please see the detailed test results below.
>
> I have been working on the same idea. I was not moving input_buf_ptr
> as far as possible, so I think your approach is better.

Great. I'm looking forward to working with you on this feature implementation.

> Also, I did a benchmark on text format. I created a benchmark for line
> lengths in a table ranging from 1 byte to 1 megabyte. The peak improvement
> is at a line length of 4096, where the improvement is more than 20% [1]. I
> saw no regression with your patch.

Thank you for the additional benchmarks.

> I have a couple of ideas that I was working on:
> ---
>
> +         * However, SIMD optimization cannot be applied in the following cases:
> +         * - Inside quoted fields, where escape sequences and closing quotes
> +         *   require sequential processing to handle correctly.
>
> I think you can continue SIMD inside quoted fields. Only important
> thing is you need to set last_was_esc to false when SIMD skipped the
> chunk.

That's a clever point that last_was_esc should be reset to false when
a SIMD chunk is skipped. You're right about that specific case.

However, the core challenge is not what happens when we skip a chunk,
but what happens when a chunk contains special characters like quotes
or escapes. The main reason we avoid SIMD inside quoted fields is that
the parsing logic becomes fundamentally sequential and
context-dependent.

To correctly parse a "" as a single literal quote, we must perform a
lookahead to check the next character. This is an inherently
sequential operation that doesn't map well to SIMD's parallel nature.

Trying to handle this stateful logic with SIMD would lead to
significant implementation complexity, especially with edge cases like
an escape character falling on the last byte of a chunk.

> +         * - When the remaining buffer size is smaller than the size of a SIMD
> +         *   vector register, as SIMD operations require processing data in
> +         *   fixed-size chunks.
>
> You run SIMD when 'copy_buf_len - input_buf_ptr >= sizeof(Vector8)'
> but you only call CopyLoadInputBuf() when 'input_buf_ptr >=
> copy_buf_len || need_data', so basically you need to wait for at least
> sizeof(Vector8) characters to pass before the next SIMD run. And in the
> worst case, if CopyLoadInputBuf() loads one character fewer than
> sizeof(Vector8), then you can never run SIMD. I think we need to make
> sure that CopyLoadInputBuf() loads at least sizeof(Vector8)
> characters into input_buf so we do not encounter that problem.

I think you're probably right, but we only need to account for
sizeof(Vector8) when USE_NO_SIMD is not defined.
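
To make the refill condition concrete, here is a minimal sketch. It rests
on assumptions: want_more_input() and MIN_SCAN_BYTES are hypothetical names
that do not exist in copyfromparse.c, and 16 merely stands in for
sizeof(Vector8). It only shows how the threshold could collapse to a single
byte when USE_NO_SIMD is defined.

```c
#include <stdbool.h>

/*
 * Hypothetical threshold: one vector width when SIMD is available,
 * otherwise a single byte. 16 is an assumed stand-in for sizeof(Vector8).
 */
#ifndef USE_NO_SIMD
#define MIN_SCAN_BYTES 16
#else
#define MIN_SCAN_BYTES 1
#endif

/*
 * Decide whether the caller should try to load more input.
 *
 * The first test mirrors the existing condition
 * (input_buf_ptr >= copy_buf_len || need_data). The second test is the
 * assumed addition: top up the buffer when fewer than MIN_SCAN_BYTES
 * remain, unless the input is already exhausted.
 */
static bool
want_more_input(int buf_len, int buf_ptr, bool need_data, bool eof)
{
	int			remaining = buf_len - buf_ptr;

	if (need_data || remaining <= 0)
		return true;

	return !eof && remaining < MIN_SCAN_BYTES;
}
```

With a helper of this shape, a tail shorter than one vector would trigger a
refill instead of being handled byte by byte, while a build without SIMD
would keep the current behavior.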

> What do you think about adding SIMD to CopyReadAttributesText() and
> CopyReadAttributesCSV() functions? When I add your SIMD approach to
> CopyReadAttributesText() function, the improvement on the 4096 byte
> line length input [1] goes from 20% to 30%.

Agreed, I will.
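
The idea for CopyReadAttributesText() can be sketched in portable C as
follows. This is only an illustration under stated assumptions: CHUNK
stands in for sizeof(Vector8), first_special() is a hypothetical scalar
stand-in for the vector compare and mask steps, and copy_plain_run() is a
made-up name, not a PostgreSQL function.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 16				/* assumed stand-in for sizeof(Vector8) */

/*
 * Scalar stand-in for the vector compare: offset of the first delimiter
 * or backslash within the first n bytes of p, or n if there is none.
 */
static size_t
first_special(const char *p, size_t n, char delimc)
{
	for (size_t i = 0; i < n; i++)
	{
		if (p[i] == delimc || p[i] == '\\')
			return i;
	}
	return n;
}

/*
 * Copy the leading run of ordinary bytes from src to dst and return its
 * length. dst must have room for len bytes. Whole chunks are copied while
 * they contain no special byte; the first chunk that does contain one is
 * copied only up to that byte, which is left for the caller to handle.
 */
static size_t
copy_plain_run(char *dst, const char *src, size_t len, char delimc)
{
	size_t		done = 0;
	size_t		adv;

	while (len - done >= CHUNK)
	{
		adv = first_special(src + done, CHUNK, delimc);
		memcpy(dst + done, src + done, adv);
		done += adv;
		if (adv < CHUNK)
			return done;		/* stopped at a special byte */
	}

	/* Tail shorter than one chunk: plain scalar handling */
	adv = first_special(src + done, len - done, delimc);
	memcpy(dst + done, src + done, adv);
	return done + adv;
}
```

The design choice illustrated here is that the fast path copies and
advances in one step, so the scalar code only ever sees the special byte
itself.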

> I shared my ideas as a Feedback.txt file (.txt to stay off CFBot's
> radar for this thread). I hope these help, please let me know if you
> have any questions.

Thanks a lot!


On Mon, Aug 11, 2025 at 5:52 PM Nazir Bilal Yavuz <[email protected]> wrote:
>
> Hi,
>
> On Thu, 7 Aug 2025 at 14:15, Nazir Bilal Yavuz <[email protected]> wrote:
> >
> > On Thu, 7 Aug 2025 at 04:49, Shinya Kato <[email protected]> wrote:
> > >
> > > I have implemented SIMD optimization for the COPY FROM (FORMAT {csv,
> > > text}) command and observed approximately a 5% performance
> > > improvement. Please see the detailed test results below.
> >
> > Also, I did a benchmark on text format. I created a benchmark for line
> > lengths in a table ranging from 1 byte to 1 megabyte. The peak improvement
> > is at a line length of 4096, where the improvement is more than 20% [1]. I
> > saw no regression with your patch.
>
> I did the same benchmark for the CSV format. The peak improvement is
> at a line length of 4096, where the improvement is more than 25% [1].
> I saw a 5% regression on the 1 byte benchmark; there are no other
> regressions.

Thank you. I'm not too concerned about a regression when there's only
one byte per line.

> > What do you think about adding SIMD to CopyReadAttributesText() and
> > CopyReadAttributesCSV() functions? When I add your SIMD approach to
> > CopyReadAttributesText() function, the improvement on the 4096 byte
> > line length input [1] goes from 20% to 30%.
>
> I wanted to try using SIMD in CopyReadAttributesCSV() as well. The
> improvement on the 4096 byte line length input [1] goes from 25% to
> 35%, the regression on the 1 byte input is the same.

Yes, I'm on it. I'm currently adding the SIMD logic to
CopyReadAttributesCSV() as you suggested. I'll share the new version
of the patch soon.


--
Best regards,
Shinya Kato
NTT OSS Center





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-08-13 06:21  Shinya Kato <[email protected]>
  parent: Shinya Kato <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Shinya Kato @ 2025-08-13 06:21 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: pgsql-hackers

On Tue, Aug 12, 2025 at 4:25 PM Shinya Kato <[email protected]> wrote:

> > +         * However, SIMD optimization cannot be applied in the following cases:
> > +         * - Inside quoted fields, where escape sequences and closing quotes
> > +         *   require sequential processing to handle correctly.
> >
> > I think you can continue SIMD inside quoted fields. Only important
> > thing is you need to set last_was_esc to false when SIMD skipped the
> > chunk.
>
> That's a clever point that last_was_esc should be reset to false when
> a SIMD chunk is skipped. You're right about that specific case.
>
> However, the core challenge is not what happens when we skip a chunk,
> but what happens when a chunk contains special characters like quotes
> or escapes. The main reason we avoid SIMD inside quoted fields is that
> the parsing logic becomes fundamentally sequential and
> context-dependent.
>
> To correctly parse a "" as a single literal quote, we must perform a
> lookahead to check the next character. This is an inherently
> sequential operation that doesn't map well to SIMD's parallel nature.
>
> Trying to handle this stateful logic with SIMD would lead to
> significant implementation complexity, especially with edge cases like
> an escape character falling on the last byte of a chunk.

Ah, you're right. My apologies, I misunderstood the implementation. It
appears that SIMD can be used even within quoted strings.

I think it would be better not to use the SIMD path when last_was_esc
is true. The next character is likely to be a special character, and
handling this case outside the SIMD loop would also improve
readability by consolidating the last_was_esc toggle logic in one
place.

Furthermore, when inside a quote (in_quote) in CSV mode, the detection
of \n and \r can be disabled.

+               last_was_esc = false;

Regarding the implementation, I believe we must set last_was_esc to
false when advancing input_buf_ptr, as shown in the code below. For
this reason, I think it’s best to keep the current logic for toggling
last_was_esc.

+               int advance = pg_rightmost_one_pos32(mask);
+               input_buf_ptr += advance;

I've attached a new patch that includes these changes. Further
modifications are still in progress.
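
The rules above (which bytes count as special in each mode, and that \n and
\r stop mattering inside a CSV quote) can be modeled in portable C. This is
a sketch under assumptions, not the patch itself: CHUNK stands in for
sizeof(Vector8), eq_mask() is a scalar stand-in for vector8_eq() followed
by vector8_highbit_mask(), and chunk_advance() is a hypothetical name. The
caller is assumed to skip this path when last_was_esc is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CHUNK 16				/* assumed stand-in for sizeof(Vector8) */

/* Scalar model of a byte-wise compare: bit i is set when p[i] == c. */
static uint32_t
eq_mask(const unsigned char *p, unsigned char c)
{
	uint32_t	mask = 0;

	for (int i = 0; i < CHUNK; i++)
	{
		if (p[i] == c)
			mask |= (uint32_t) 1 << i;
	}
	return mask;
}

/* Position of the lowest set bit; the mask must be nonzero. */
static int
lowest_bit(uint32_t mask)
{
	int			pos = 0;

	assert(mask != 0);
	while ((mask & 1) == 0)
	{
		mask >>= 1;
		pos++;
	}
	return pos;
}

/*
 * Number of leading bytes in the chunk at p that need no special handling
 * (0..CHUNK). The mask starts out empty so that the in-quote case, where
 * the newline comparison is skipped, is still well defined.
 */
static int
chunk_advance(const unsigned char *p, bool is_csv, bool in_quote,
			  char quotec, char escapec)
{
	uint32_t	mask = 0;

	/* \n and \r are not special inside a CSV quote */
	if (!in_quote)
		mask |= eq_mask(p, '\n') | eq_mask(p, '\r');

	if (is_csv)
	{
		mask |= eq_mask(p, (unsigned char) quotec);
		if (escapec != '\0')
			mask |= eq_mask(p, (unsigned char) escapec);
	}
	else
		mask |= eq_mask(p, '\\');

	return mask != 0 ? lowest_bit(mask) : CHUNK;
}
```

A return value of CHUNK corresponds to skipping the whole chunk; anything
smaller is the offset at which the scalar state machine takes over.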

-- 
Best regards,
Shinya Kato
NTT OSS Center


Attachments:

  [application/octet-stream] v2-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (3.4K, 2-v2-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
  download | inline diff:
From 69e16f8c7a52d967385a1dc9b1602bbd4472df60 Mon Sep 17 00:00:00 2001
From: Shinya Kato <[email protected]>
Date: Mon, 28 Jul 2025 22:08:20 +0900
Subject: [PATCH v2] Speed up COPY FROM text/CSV parsing using SIMD

---
 src/backend/commands/copyfromparse.c | 71 ++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index b1ae97b833d..f1a6ea81dd1 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -71,7 +71,9 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "port/pg_bitutils.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 
@@ -1255,6 +1257,14 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	char		quotec = '\0';
 	char		escapec = '\0';
 
+#ifndef USE_NO_SIMD
+	Vector8		nl = vector8_broadcast('\n');
+	Vector8		cr = vector8_broadcast('\r');
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		quote;
+	Vector8		escape;
+#endif
+
 	if (is_csv)
 	{
 		quotec = cstate->opts.quote[0];
@@ -1262,6 +1272,12 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* ignore special escape processing if it's the same as quotec */
 		if (quotec == escapec)
 			escapec = '\0';
+
+#ifndef USE_NO_SIMD
+		quote = vector8_broadcast(quotec);
+		if (quotec != escapec)
+			escape = vector8_broadcast(escapec);
+#endif
 	}
 
 	/*
@@ -1328,6 +1344,61 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			need_data = false;
 		}
 
+#ifndef USE_NO_SIMD
+		/*
+		 * Use SIMD instructions to efficiently scan the input buffer for
+		 * special characters (e.g., newline, carriage return, quote, and
+		 * escape). This is faster than byte-by-byte iteration, especially on
+		 * large buffers.
+		 *
+		 * We do not apply the SIMD fast path in either of the following cases:
+		 * - When the previously processed character was an escape character
+		 *   (last_was_esc), since the next byte must be examined sequentially.
+		 * - The remaining buffer is smaller than one vector width
+		 *   (sizeof(Vector8)); SIMD operates on fixed-size chunks.
+		 */
+		if (!last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match = vector8_broadcast(0);
+			uint32		mask;
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+			/* \n and \r are not special inside quotes */
+			if (!in_quote)
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+
+			if (is_csv)
+			{
+				match = vector8_or(match, vector8_eq(chunk, quote));
+				if (escapec != '\0')
+					match = vector8_or(match, vector8_eq(chunk, escape));
+			}
+			else
+				match = vector8_or(match, vector8_eq(chunk, bs));
+
+			/* Check if we found any special characters */
+			mask = vector8_highbit_mask(match);
+			if (mask != 0)
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				int advance = pg_rightmost_one_pos32(mask);
+				input_buf_ptr += advance;
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				input_buf_ptr += sizeof(Vector8);
+				continue;
+			}
+		}
+#endif
+
 		/* OK to fetch a character */
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
-- 
2.47.3



^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-08-14 02:24  KAZAR Ayoub <[email protected]>
  parent: Shinya Kato <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: KAZAR Ayoub @ 2025-08-14 02:24 UTC (permalink / raw)
  To: Shinya Kato <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; pgsql-hackers

Following Nazir's findings about 4096 bytes being the best-performing line
length, I ran more benchmarks on my side for both TEXT and CSV formats,
with two different cases: normal data (no special characters) and data
with many special characters.

Results are as good as expected and similar to previous benchmarks:
 ~30.9% faster copy in TEXT format
 ~32.4% faster copy in CSV format
 20%-30% reduction in cycles per instruction

When lines contain a lot of special characters (e.g., tables with a
large number of columns), we obviously expect regressions because of
the overhead of many fallbacks to scalar processing.
Results when special characters make up 1/3 of the line length:
~43.9% slower copy in TEXT format
~16.7% slower copy in CSV format
So even with fewer special characters, or with a wider distance between
them, there might still be some regression. This may not be a
significant case, but it could be addressed in other patches if we
consider not using the SIMD path in some situations.

I hope this helps more and confirms the patch.

Regards,
Ayoub Kazar

On Thu, Aug 14, 2025 at 01:55, Shinya Kato <[email protected]>
wrote:

> On Tue, Aug 12, 2025 at 4:25 PM Shinya Kato <[email protected]>
> wrote:
>
> > > +         * However, SIMD optimization cannot be applied in the following cases:
> > > +         * - Inside quoted fields, where escape sequences and closing quotes
> > > +         *   require sequential processing to handle correctly.
> > >
> > > I think you can continue SIMD inside quoted fields. Only important
> > > thing is you need to set last_was_esc to false when SIMD skipped the
> > > chunk.
> >
> > That's a clever point that last_was_esc should be reset to false when
> > a SIMD chunk is skipped. You're right about that specific case.
> >
> > However, the core challenge is not what happens when we skip a chunk,
> > but what happens when a chunk contains special characters like quotes
> > or escapes. The main reason we avoid SIMD inside quoted fields is that
> > the parsing logic becomes fundamentally sequential and
> > context-dependent.
> >
> > To correctly parse a "" as a single literal quote, we must perform a
> > lookahead to check the next character. This is an inherently
> > sequential operation that doesn't map well to SIMD's parallel nature.
> >
> > Trying to handle this stateful logic with SIMD would lead to
> > significant implementation complexity, especially with edge cases like
> > an escape character falling on the last byte of a chunk.
>
> Ah, you're right. My apologies, I misunderstood the implementation. It
> appears that SIMD can be used even within quoted strings.
>
> I think it would be better not to use the SIMD path when last_was_esc
> is true. The next character is likely to be a special character, and
> handling this case outside the SIMD loop would also improve
> readability by consolidating the last_was_esc toggle logic in one
> place.
>
> Furthermore, when inside a quote (in_quote) in CSV mode, the detection
> of \n and \r can be disabled.
>
> +               last_was_esc = false;
>
> Regarding the implementation, I believe we must set last_was_esc to
> false when advancing input_buf_ptr, as shown in the code below. For
> this reason, I think it’s best to keep the current logic for toggling
> last_was_esc.
>
> +               int advance = pg_rightmost_one_pos32(mask);
> +               input_buf_ptr += advance;
>
> I've attached a new patch that includes these changes. Further
> modifications are still in progress.
>
> --
> Best regards,
> Shinya Kato
> NTT OSS Center
>


^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-08-14 10:29  Nazir Bilal Yavuz <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2025-08-14 10:29 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Thu, 14 Aug 2025 at 05:25, KAZAR Ayoub <[email protected]> wrote:
>
> Following Nazir's findings about 4096 bytes being the best-performing line length, I ran more benchmarks on my side for both TEXT and CSV formats, with two different cases: normal data (no special characters) and data with many special characters.
>
> Results are as good as expected and similar to previous benchmarks:
>  ~30.9% faster copy in TEXT format
>  ~32.4% faster copy in CSV format
>  20%-30% reduction in cycles per instruction
>
> When lines contain a lot of special characters (e.g., tables with a large number of columns), we obviously expect regressions because of the overhead of many fallbacks to scalar processing.
> Results when special characters make up 1/3 of the line length:
> ~43.9% slower copy in TEXT format
> ~16.7% slower copy in CSV format
> So even with fewer special characters, or with a wider distance between them, there might still be some regression. This may not be a significant case, but it could be addressed in other patches if we consider not using the SIMD path in some situations.
>
> I hope this helps more and confirms the patch.

Thanks for running that benchmark! Would you mind sharing a reproducer
for the regression you observed?

--
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-08-14 14:59  KAZAR Ayoub <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: KAZAR Ayoub @ 2025-08-14 14:59 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Shinya Kato <[email protected]>; pgsql-hackers

> Hi,
>
> On Thu, 14 Aug 2025 at 05:25, KAZAR Ayoub <[email protected]> wrote:
> >
> > Following Nazir's findings about 4096 bytes being the best-performing
> > line length, I ran more benchmarks on my side for both TEXT and CSV
> > formats, with two different cases: normal data (no special characters)
> > and data with many special characters.
> >
> > Results are as good as expected and similar to previous benchmarks:
> >  ~30.9% faster copy in TEXT format
> >  ~32.4% faster copy in CSV format
> >  20%-30% reduction in cycles per instruction
> >
> > When lines contain a lot of special characters (e.g., tables with a
> > large number of columns), we obviously expect regressions because of
> > the overhead of many fallbacks to scalar processing.
> > Results when special characters make up 1/3 of the line length:
> > ~43.9% slower copy in TEXT format
> > ~16.7% slower copy in CSV format
> > So even with fewer special characters, or with a wider distance between
> > them, there might still be some regression. This may not be a
> > significant case, but it could be addressed in other patches if we
> > consider not using the SIMD path in some situations.
> >
> > I hope this helps more and confirms the patch.
>
> Thanks for running that benchmark! Would you mind sharing a reproducer
> for the regression you observed?
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft


Of course, I have attached the SQL that generates the text and CSV test
files. Having special characters make up 1/3 of the line length may be an
exaggeration, but a lower ratio might still reproduce some regression for
the same reason.

Best regards,
Ayoub Kazar


Attachments:

  [application/sql] simd-copy-from-bench.sql (812B, 3-simd-copy-from-bench.sql)
  download

^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-08-19 09:09  Ants Aasma <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 0 replies; 114+ messages in thread

From: Ants Aasma @ 2025-08-19 09:09 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Shinya Kato <[email protected]>; pgsql-hackers

On Thu, 7 Aug 2025 at 14:15, Nazir Bilal Yavuz <[email protected]> wrote:
> I have a couple of ideas that I was working on:
> ---
>
> +         * However, SIMD optimization cannot be applied in the following cases:
> +         * - Inside quoted fields, where escape sequences and closing quotes
> +         *   require sequential processing to handle correctly.
>
> I think you can continue SIMD inside quoted fields. Only important
> thing is you need to set last_was_esc to false when SIMD skipped the
> chunk.

There is a trick with doing carryless multiplication with -1 that can
be used to SIMD process transitions between quoted/not-quoted. [1]
This is able to convert a bitmask of unescaped quote character
positions to a quote mask in a single operation. I last looked at it 5
years ago, but I remember coming to the conclusion that it would work
for implementing PostgreSQL's interpretation of CSV.

[1] https://github.com/geofflangdale/simdcsv/blob/master/src/main.cpp#L76
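
For readers unfamiliar with the trick: a carryless multiplication by an
all-ones operand is equivalent to a prefix XOR over the bits, so bit i of
the result is set when an odd number of quotes occur at positions 0..i.
The sketch below is a portable illustration that uses shifts and XORs in
place of a CLMUL instruction. The function names are hypothetical, and it
assumes the input bitmask already marks only unescaped quote characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit i is set when buf[i] is the quote character; len must be <= 64. */
static uint64_t
quote_bits(const char *buf, size_t len, char quotec)
{
	uint64_t	bits = 0;

	assert(len <= 64);
	for (size_t i = 0; i < len; i++)
	{
		if (buf[i] == quotec)
			bits |= UINT64_C(1) << i;
	}
	return bits;
}

/* Prefix XOR: bit i of the result is the XOR of bits 0..i of x. */
static uint64_t
prefix_xor(uint64_t x)
{
	x ^= x << 1;
	x ^= x << 2;
	x ^= x << 4;
	x ^= x << 8;
	x ^= x << 16;
	x ^= x << 32;
	return x;
}

/*
 * Turn a bitmask of quote positions into a bitmask of "inside quote"
 * positions for one 64-byte block. An opening quote is included in the
 * mask and a closing quote is not. *carry is all ones when the previous
 * block ended inside a quote, zero otherwise, and is updated for the
 * next block.
 */
static uint64_t
inside_quote_mask(uint64_t quotes, uint64_t *carry)
{
	uint64_t	mask = prefix_xor(quotes) ^ *carry;

	*carry = (mask >> 63) ? ~UINT64_C(0) : 0;
	return mask;
}
```

A doubled quote inside a quoted field toggles the state twice, so the
characters on either side of it remain marked as inside the quote. A
distinct escape character would need extra work to drop escaped quotes
from the input bitmask first.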

--
Ants





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-08-19 12:33  Nazir Bilal Yavuz <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  0 siblings, 2 replies; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2025-08-19 12:33 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Thu, 14 Aug 2025 at 18:00, KAZAR Ayoub <[email protected]> wrote:
>> Thanks for running that benchmark! Would you mind sharing a reproducer
>> for the regression you observed?
>
> Of course, I have attached the SQL that generates the text and CSV test files.
> Having special characters make up 1/3 of the line length may be an exaggeration, but a lower ratio might still reproduce some regression for the same reason.

Thank you so much!

I am able to reproduce the regression you mentioned, but both
regressions are 20% on my end. By experimenting, I found that SIMD
causes a regression if it advances fewer than 5 characters.

So, I implemented a small heuristic. It works like this:

- If advance < 5 -> insert a sleep penalty (n cycles).
- Each time advance < 5, n is doubled.
- Each time advance ≥ 5, n is halved.

I am sharing a POC patch to show the heuristic; it can be applied on top
of v1-0001. The heuristic version has the same performance improvements
as v1-0001, but the regression is 5% instead of 20% compared to
master.
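
The three rules above can be sketched as a small standalone state machine.
This is an illustration under assumptions, not the POC patch: the struct
and function names are hypothetical, the threshold of 5 and the INT16_MAX
cap are taken from the description and patch in this message, and unlike
the patch the counter here is clamped at zero instead of being decremented
unconditionally.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIN_USEFUL_ADVANCE 5	/* below this, SIMD was measured to lose */
#define MAX_SKIP INT16_MAX		/* cap on the penalty */

typedef struct SimdBackoff
{
	int			skip;			/* scalar iterations left before retrying */
	int			last_skip;		/* size of the most recent penalty */
} SimdBackoff;

static void
backoff_init(SimdBackoff *b)
{
	b->skip = 0;
	b->last_skip = 1;
}

/* The SIMD path may run only when no penalty is pending. */
static bool
backoff_simd_allowed(const SimdBackoff *b)
{
	return b->skip <= 0;
}

/* Record how far the last SIMD attempt advanced. */
static void
backoff_report(SimdBackoff *b, int advance)
{
	if (advance < MIN_USEFUL_ADVANCE)
	{
		/* Short advance: double the penalty, saturating at MAX_SKIP */
		if (b->last_skip >= MAX_SKIP / 2)
			b->last_skip = MAX_SKIP;
		else
			b->last_skip <<= 1;
		b->skip = b->last_skip;
	}
	else
	{
		/* Useful advance: halve the next penalty, never below 1 */
		b->last_skip = (b->last_skip > 1) ? (b->last_skip >> 1) : 1;
		b->skip = 0;
	}
}

/* Call once per scalar iteration; clamped so it cannot underflow. */
static void
backoff_tick(SimdBackoff *b)
{
	if (b->skip > 0)
		b->skip--;
}
```

Clamping in backoff_tick() is one possible way to sidestep the overflow
concern noted in the patch comment for long quoted runs.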

--
Regards,
Nazir Bilal Yavuz
Microsoft

From aa55843b0c64bed9f72cf8cd7854df9df7ef989b Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Tue, 19 Aug 2025 15:16:02 +0300
Subject: [PATCH v1] COPY SIMD: add heuristic to avoid regression on small
 advances
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When SIMD advances fewer than 5 characters, performance regresses.
To mitigate this, introduce a heuristic:

- If advance < 5 -> insert a sleep penalty (n cycles).
- Each time advance < 5, n is doubled.
- Each time advance ≥ 5, n is halved.
---
 src/backend/commands/copyfromparse.c | 42 ++++++++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 5aba0fa6cb7..e58d7d4e353 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1263,6 +1263,9 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	Vector8		bs = vector8_broadcast('\\');
 	Vector8		quote;
 	Vector8		escape;
+
+	int			sleep_cyle = 0;
+	int			last_sleep_cyle = 1;
 #endif
 
 	if (is_csv)
@@ -1359,7 +1362,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		 *   vector register, as SIMD operations require processing data in
 		 *   fixed-size chunks.
 		 */
-		if (!in_quote && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		if (sleep_cyle <= 0 && !in_quote && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
 		{
 			Vector8		chunk;
 			Vector8		match;
@@ -1390,14 +1393,49 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 				 */
 				int advance = pg_rightmost_one_pos32(mask);
 				input_buf_ptr += advance;
+
+				/*
+				 * If we advance less than 5 characters we cause regression.
+				 * Sleep a bit then try again. Sleep time increases
+				 * exponentially.
+				 */
+				if (advance < 5)
+				{
+					if (last_sleep_cyle >= PG_INT16_MAX / 2)
+						last_sleep_cyle = PG_INT16_MAX;
+					else
+						last_sleep_cyle = last_sleep_cyle << 1;
+
+					sleep_cyle = last_sleep_cyle;
+				}
+
+				/*
+				 * If we advance more than 4 characters this means we have
+				 * performance improvement. Halve sleep time for next sleep.
+				 */
+				else
+				{
+					last_sleep_cyle = Max(last_sleep_cyle >> 1, 1);
+					sleep_cyle = 0;
+				}
 			}
 			else
 			{
-				/* No special characters found, so skip the entire chunk */
+				/*
+				 * No special characters found, so skip the entire chunk and
+				 * halve sleep time for next sleep.
+				 */
 				input_buf_ptr += sizeof(Vector8);
+				last_sleep_cyle = Max(last_sleep_cyle >> 1, 1);
 				continue;
 			}
 		}
+
+		/*
+		 * Vulnerable to overflow if we are in quote for more than INT16_MAX
+		 * characters.
+		 */
+		sleep_cyle--;
 #endif
 
 		/* OK to fetch a character */
-- 
2.50.1



Attachments:

  [text/plain] COPY-SIMD-add-heuristic-to-avoid-regression-on-sm.txt (2.8K, 2-COPY-SIMD-add-heuristic-to-avoid-regression-on-sm.txt)
  download



^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-08-19 14:14  Nazir Bilal Yavuz <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2025-08-19 14:14 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Tue, 19 Aug 2025 at 15:33, Nazir Bilal Yavuz <[email protected]> wrote:
>
> I am able to reproduce the regression you mentioned but both
> regressions are %20 on my end. I found that (by experimenting) SIMD
> causes a regression if it advances less than 5 characters.
>
> So, I implemented a small heuristic. It works like that:
>
> - If advance < 5 -> insert a sleep penalty (n cycles).

'sleep' might be a poor word choice here. I meant skipping SIMD for the
next n iterations.
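
To make the "skip SIMD n times" idea concrete, here is a minimal
standalone sketch of the doubling/halving counter from the POC patch.
The struct and function names (SimdBackoff, backoff_*) are made up for
illustration and do not appear in the patch. Unlike the POC, this sketch
stops the counter at zero instead of decrementing it unconditionally.

```c
#include <assert.h>
#include <stdint.h>

#define SKIP_MAX    INT16_MAX   /* cap on the penalty, as PG_INT16_MAX in the POC */
#define MIN_ADVANCE 5           /* below this, a SIMD attempt is considered a loss */

/* Hypothetical container for the two counters kept by the POC patch. */
typedef struct SimdBackoff
{
    int skip_left;  /* scalar iterations left before SIMD is tried again */
    int last_skip;  /* length of the most recent penalty */
} SimdBackoff;

static void
backoff_init(SimdBackoff *b)
{
    b->skip_left = 0;
    b->last_skip = 1;
}

/* SIMD is attempted only when no penalty is pending. */
static int
backoff_should_try(const SimdBackoff *b)
{
    return b->skip_left <= 0;
}

/* Record how many bytes a single SIMD attempt advanced. */
static void
backoff_record(SimdBackoff *b, int advance)
{
    assert(advance >= 0);

    if (advance < MIN_ADVANCE)
    {
        /* Poor attempt: double the penalty, saturating at SKIP_MAX. */
        b->last_skip = (b->last_skip >= SKIP_MAX / 2) ? SKIP_MAX : b->last_skip * 2;
        b->skip_left = b->last_skip;
    }
    else
    {
        /* Good attempt: halve the next penalty, never going below 1. */
        b->last_skip = (b->last_skip / 2 > 1) ? b->last_skip / 2 : 1;
        b->skip_left = 0;
    }
}

/* Call once per byte handled by the scalar path. */
static void
backoff_tick(SimdBackoff *b)
{
    if (b->skip_left > 0)
        b->skip_left--;
}
```

The caller would check backoff_should_try() before each vector load,
call backoff_record() after a chunk that contained a special character,
and call backoff_tick() for every byte consumed by the scalar loop.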

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-08-21 15:47  Andrew Dunstan <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Andrew Dunstan @ 2025-08-21 15:47 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; KAZAR Ayoub <[email protected]>; +Cc: Shinya Kato <[email protected]>; pgsql-hackers

On 2025-08-19 Tu 10:14 AM, Nazir Bilal Yavuz wrote:
> Hi,
>
> On Tue, 19 Aug 2025 at 15:33, Nazir Bilal Yavuz <[email protected]> wrote:
>> I am able to reproduce the regression you mentioned but both
>> regressions are %20 on my end. I found that (by experimenting) SIMD
>> causes a regression if it advances less than 5 characters.
>>
>> So, I implemented a small heuristic. It works like that:
>>
>> - If advance < 5 -> insert a sleep penalty (n cycles).
> 'sleep' might be a poor word choice here. I meant skipping SIMD for n
> number of times.
>

I was thinking a bit about that this morning. I wonder if, instead of
having a constantly applied heuristic like this, it might be better to
do a little extra accounting in the first, say, 1000 lines of an input
file, and if less than some portion of the input is found to be special
characters then switch to the SIMD code. What that portion should be
would need to be determined by some experimentation with a variety of
typical workloads, but given your findings 20% seems like a good
starting point.
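
A rough sketch of that one-shot accounting, under stated assumptions:
the names (SimdSampler, sampler_add_line, is_special) are hypothetical,
the set of "special" bytes is simplified to newline, carriage return,
double quote and backslash, and the 1000-line window and 20% threshold
are just the starting values suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_LINES      1000  /* lines inspected before deciding */
#define SPECIAL_PCT_LIMIT 20    /* use SIMD only below this share of special bytes */

/* Hypothetical per-COPY accounting state; zero-initialize before use. */
typedef struct SimdSampler
{
    uint64_t lines_seen;
    uint64_t bytes_seen;
    uint64_t special_seen;
    bool     decided;   /* true once SAMPLE_LINES lines have been counted */
    bool     use_simd;  /* valid only when decided is true */
} SimdSampler;

/* Simplified notion of "special"; the real set depends on the COPY options. */
static bool
is_special(char c)
{
    return c == '\n' || c == '\r' || c == '"' || c == '\\';
}

/* Account for one input line; the decision is made once and then frozen. */
static void
sampler_add_line(SimdSampler *s, const char *line, size_t len)
{
    assert(line != NULL || len == 0);

    if (s->decided)
        return;

    for (size_t i = 0; i < len; i++)
    {
        if (is_special(line[i]))
            s->special_seen++;
    }
    s->bytes_seen += len;

    if (++s->lines_seen >= SAMPLE_LINES)
    {
        s->decided = true;
        /* Integer form of: special_seen / bytes_seen < 20% */
        s->use_simd = s->special_seen * 100 < s->bytes_seen * SPECIAL_PCT_LIMIT;
    }
}
```

Once decided is set, the rest of the file would be parsed with whichever
path was chosen, with no further accounting.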

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-08-21 19:36  KAZAR Ayoub <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 0 replies; 114+ messages in thread

From: KAZAR Ayoub @ 2025-08-21 19:36 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Shinya Kato <[email protected]>; pgsql-hackers

> On Thu, 14 Aug 2025 at 18:00, KAZAR Ayoub <[email protected]> wrote:
> >> Thanks for running that benchmark! Would you mind sharing a reproducer
> >> for the regression you observed?
> >
> > Of course, I attached the sql to generate the text and csv test files.
> > If having a 1/3 of line length of special characters can be an
> exaggeration, something lower might still reproduce some regressions of
> course for the same idea.
>
> Thank you so much!
>
> I am able to reproduce the regression you mentioned but both
> regressions are %20 on my end. I found that (by experimenting) SIMD
> causes a regression if it advances less than 5 characters.
>
> So, I implemented a small heuristic. It works like that:
>
> - If advance < 5 -> insert a sleep penalty (n cycles).
> - Each time advance < 5, n is doubled.
> - Each time advance ≥ 5, n is halved.
>
> I am sharing a POC patch to show heuristic, it can be applied on top
> of v1-0001. Heuristic version has the same performance improvements
> with the v1-0001 but the regression is %5 instead of %20 compared to
> the master.
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft

Yes, this is good. I'm also getting only about a 5% regression now.



Regards,
Ayoub Kazar


^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-10-16 14:29  Nazir Bilal Yavuz <[email protected]>
  parent: Andrew Dunstan <[email protected]>
  0 siblings, 3 replies; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2025-10-16 14:29 UTC (permalink / raw)
  To: Andrew Dunstan <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Thu, 21 Aug 2025 at 18:47, Andrew Dunstan <[email protected]> wrote:
>
>
> On 2025-08-19 Tu 10:14 AM, Nazir Bilal Yavuz wrote:
> > Hi,
> >
> > On Tue, 19 Aug 2025 at 15:33, Nazir Bilal Yavuz <[email protected]> wrote:
> >> I am able to reproduce the regression you mentioned but both
> >> regressions are %20 on my end. I found that (by experimenting) SIMD
> >> causes a regression if it advances less than 5 characters.
> >>
> >> So, I implemented a small heuristic. It works like that:
> >>
> >> - If advance < 5 -> insert a sleep penalty (n cycles).
> > 'sleep' might be a poor word choice here. I meant skipping SIMD for n
> > number of times.
> >
>
> I was thinking a bit about that this morning. I wonder if it might be
> better instead of having a constantly applied heuristic like this, it
> might be better to do a little extra accounting in the first, say, 1000
> lines of an input file, and if less than some portion of the input is
> found to be special characters then switch to the SIMD code. What that
> portion should be would need to be determined by some experimentation
> with a variety of typical workloads, but given your findings 20% seems
> like a good starting point.

I implemented a heuristic similar to this. It is a mix of the previous
heuristic and your idea, and it works like this:

Overall logic is that we will not run SIMD for the entire line and we
decide if it is worth it to run SIMD for the next lines.

1 - We will try SIMD and decide if it is worth it to run SIMD.
1.1 - If it is worth it, we will continue to run SIMD and we will
halve the simd_last_sleep_cycle variable.
1.2 - If it is not worth it, we will double the simd_last_sleep_cycle
and we will not run SIMD for these many lines.
1.3 - After skipping simd_last_sleep_cycle lines, we will go back to the #1.
Note: simd_last_sleep_cycle cannot exceed 1024, so in the worst case we
still run SIMD on one line out of roughly every 1024.
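
The steps above can be modelled outside the backend as a small state
machine. This is only a sketch: LineHeuristic and its functions are
invented names, while the constants and the "average advance per SIMD
attempt" test follow the attached 0002 patch. As in the patch, a line
with no SIMD attempt at all counts as "not worth it".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLEEP_MAX       1024    /* upper bound on lines skipped between SIMD tries */
#define MIN_AVG_ADVANCE 5       /* average bytes per SIMD attempt needed to keep SIMD on */

/* Hypothetical stand-in for the simd_* fields added to CopyFromStateData. */
typedef struct LineHeuristic
{
    bool     use_simd;      /* run SIMD on the next line? */
    uint16_t last_sleep;    /* length of the most recent skip period */
    uint16_t sleep_left;    /* lines still to skip */
} LineHeuristic;

static void
heuristic_init(LineHeuristic *h)
{
    h->use_simd = true;
    h->last_sleep = 0;
    h->sleep_left = 0;
}

/*
 * Call after each line. "attempts" is the number of vector loads done on
 * that line and "advanced" the number of bytes those loads skipped over.
 */
static void
heuristic_end_of_line(LineHeuristic *h, uint64_t attempts, uint64_t advanced)
{
    if (!h->use_simd)
    {
        /* Step 1.3: count down the skip period, then retry SIMD. */
        assert(h->sleep_left > 0);
        if (--h->sleep_left == 0)
            h->use_simd = true;
        return;
    }

    if (attempts > 0 && advanced / attempts >= MIN_AVG_ADVANCE)
    {
        /* Step 1.1: SIMD paid off, keep it and halve the next penalty. */
        h->last_sleep >>= 1;
        return;
    }

    /* Step 1.2: SIMD did not pay off, double the penalty up to SLEEP_MAX. */
    if (h->last_sleep == 0)
        h->last_sleep = 1;
    else if (h->last_sleep >= SLEEP_MAX / 2)
        h->last_sleep = SLEEP_MAX;
    else
        h->last_sleep <<= 1;

    h->sleep_left = h->last_sleep;
    h->use_simd = false;
}
```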

With this heuristic the regression is limited to 2% in the worst case.

Patches are attached, the first patch is v2-0001 from Shinya with the
'-Werror=maybe-uninitialized' fixes and the pgindent changes. 0002 is
the actual heuristic patch.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft


Attachments:

  [text/x-patch] v3-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (3.4K, 2-v3-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
  download | inline diff:
From 2d2372e90305a81c80fe182003933039bf32f97e Mon Sep 17 00:00:00 2001
From: Shinya Kato <[email protected]>
Date: Mon, 28 Jul 2025 22:08:20 +0900
Subject: [PATCH v3 1/2] Speed up COPY FROM text/CSV parsing using SIMD

---
 src/backend/commands/copyfromparse.c | 73 ++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index b1ae97b833d..99959a40fab 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -71,7 +71,9 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "port/pg_bitutils.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 
@@ -1255,6 +1257,14 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	char		quotec = '\0';
 	char		escapec = '\0';
 
+#ifndef USE_NO_SIMD
+	Vector8		nl = vector8_broadcast('\n');
+	Vector8		cr = vector8_broadcast('\r');
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		quote = vector8_broadcast(0);
+	Vector8		escape = vector8_broadcast(0);
+#endif
+
 	if (is_csv)
 	{
 		quotec = cstate->opts.quote[0];
@@ -1262,6 +1272,12 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* ignore special escape processing if it's the same as quotec */
 		if (quotec == escapec)
 			escapec = '\0';
+
+#ifndef USE_NO_SIMD
+		quote = vector8_broadcast(quotec);
+		if (quotec != escapec)
+			escape = vector8_broadcast(escapec);
+#endif
 	}
 
 	/*
@@ -1328,6 +1344,63 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			need_data = false;
 		}
 
+#ifndef USE_NO_SIMD
+
+		/*
+		 * Use SIMD instructions to efficiently scan the input buffer for
+		 * special characters (e.g., newline, carriage return, quote, and
+		 * escape). This is faster than byte-by-byte iteration, especially on
+		 * large buffers.
+		 *
+		 * We do not apply the SIMD fast path in either of the following
+		 * cases: - When the previously processed character was an escape
+		 * character (last_was_esc), since the next byte must be examined
+		 * sequentially. - The remaining buffer is smaller than one vector
+		 * width (sizeof(Vector8)); SIMD operates on fixed-size chunks.
+		 */
+		if (!last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match = vector8_broadcast(0);
+			uint32		mask;
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+			/* \n and \r are not special inside quotes */
+			if (!in_quote)
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+
+			if (is_csv)
+			{
+				match = vector8_or(match, vector8_eq(chunk, quote));
+				if (escapec != '\0')
+					match = vector8_or(match, vector8_eq(chunk, escape));
+			}
+			else
+				match = vector8_or(match, vector8_eq(chunk, bs));
+
+			/* Check if we found any special characters */
+			mask = vector8_highbit_mask(match);
+			if (mask != 0)
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				int			advance = pg_rightmost_one_pos32(mask);
+
+				input_buf_ptr += advance;
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				input_buf_ptr += sizeof(Vector8);
+				continue;
+			}
+		}
+#endif
+
 		/* OK to fetch a character */
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
-- 
2.51.0



  [text/x-patch] v3-0002-COPY-SIMD-per-line-heuristic.patch (6.5K, 3-v3-0002-COPY-SIMD-per-line-heuristic.patch)
  download | inline diff:
From ad050583d3c14bdec44266d8d2110b384fa9d7dc Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Tue, 14 Oct 2025 13:18:13 +0300
Subject: [PATCH v3 2/2] COPY SIMD per-line heuristic

---
 src/include/commands/copyfrom_internal.h |  7 ++
 src/backend/commands/copyfrom.c          |  6 ++
 src/backend/commands/copyfromparse.c     | 82 ++++++++++++++++++++++--
 3 files changed, 89 insertions(+), 6 deletions(-)

diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..9dd31320f52 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -89,6 +89,13 @@ typedef struct CopyFromStateData
 	const char *cur_attval;		/* current att value for error messages */
 	bool		relname_only;	/* don't output line number, att, etc. */
 
+	/* SIMD variables */
+	bool		simd_continue;
+	bool		simd_initialized;
+	uint16		simd_last_sleep_cycle;
+	uint16		simd_current_sleep_cycle;
+
+
 	/*
 	 * Working state
 	 */
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 12781963b4f..4bdfd96c244 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1721,6 +1721,12 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
 
+	/* Initialize SIMD variables */
+	cstate->simd_continue = false;
+	cstate->simd_initialized = false;
+	cstate->simd_current_sleep_cycle = 0;
+	cstate->simd_last_sleep_cycle = 0;
+
 	/*
 	 * Allocate buffers for the input pipeline.
 	 *
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 99959a40fab..24cef54e5e4 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -143,12 +143,14 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
 static bool CopyReadLine(CopyFromState cstate, bool is_csv);
-static bool CopyReadLineText(CopyFromState cstate, bool is_csv);
 static int	CopyReadAttributesText(CopyFromState cstate);
 static int	CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
 									 Oid typioparam, int32 typmod,
 									 bool *isnull);
+static pg_attribute_always_inline bool CopyReadLineText(CopyFromState cstate,
+														bool is_csv,
+														bool simd_continue);
 static pg_attribute_always_inline bool CopyFromTextLikeOneRow(CopyFromState cstate,
 															  ExprContext *econtext,
 															  Datum *values,
@@ -1173,8 +1175,23 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 	resetStringInfo(&cstate->line_buf);
 	cstate->line_buf_valid = false;
 
-	/* Parse data and transfer into line_buf */
-	result = CopyReadLineText(cstate, is_csv);
+	/* If this is the first read, initialize the SIMD state */
+	if (unlikely(!cstate->simd_initialized))
+	{
+		cstate->simd_initialized = true;
+		cstate->simd_continue = true;
+		cstate->simd_current_sleep_cycle = 0;
+		cstate->simd_last_sleep_cycle = 0;
+	}
+
+	/*
+	 * Parse data and transfer into line_buf. To get benefit from inlining,
+	 * call CopyReadLineText() with the constant boolean variables.
+	 */
+	if (cstate->simd_continue)
+		result = CopyReadLineText(cstate, is_csv, true);
+	else
+		result = CopyReadLineText(cstate, is_csv, false);
 
 	if (result)
 	{
@@ -1241,8 +1258,8 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
-static bool
-CopyReadLineText(CopyFromState cstate, bool is_csv)
+static pg_attribute_always_inline bool
+CopyReadLineText(CopyFromState cstate, bool is_csv, bool simd_continue)
 {
 	char	   *copy_input_buf;
 	int			input_buf_ptr;
@@ -1258,11 +1275,16 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	char		escapec = '\0';
 
 #ifndef USE_NO_SIMD
+#define SIMD_SLEEP_MAX 1024
+#define SIMD_ADVANCE_AT_LEAST 5
 	Vector8		nl = vector8_broadcast('\n');
 	Vector8		cr = vector8_broadcast('\r');
 	Vector8		bs = vector8_broadcast('\\');
 	Vector8		quote = vector8_broadcast(0);
 	Vector8		escape = vector8_broadcast(0);
+
+	uint64		simd_total_cycle = 0;
+	uint64		simd_total_advance = 0;
 #endif
 
 	if (is_csv)
@@ -1358,12 +1380,14 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		 * sequentially. - The remaining buffer is smaller than one vector
 		 * width (sizeof(Vector8)); SIMD operates on fixed-size chunks.
 		 */
-		if (!last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		if (simd_continue && !last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
 		{
 			Vector8		chunk;
 			Vector8		match = vector8_broadcast(0);
 			uint32		mask;
 
+			simd_total_cycle++;
+
 			/* Load a chunk of data into a vector register */
 			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
 
@@ -1391,11 +1415,13 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 				int			advance = pg_rightmost_one_pos32(mask);
 
 				input_buf_ptr += advance;
+				simd_total_advance += advance;
 			}
 			else
 			{
 				/* No special characters found, so skip the entire chunk */
 				input_buf_ptr += sizeof(Vector8);
+				simd_total_advance += sizeof(Vector8);
 				continue;
 			}
 		}
@@ -1603,6 +1629,50 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		}
 	}							/* end of outer loop */
 
+#ifndef USE_NO_SIMD
+
+	/* SIMD was enabled */
+	if (simd_continue)
+	{
+		/* SIMD is worth it */
+		if (simd_total_cycle && simd_total_advance / simd_total_cycle >= SIMD_ADVANCE_AT_LEAST)
+		{
+			Assert(cstate->simd_current_sleep_cycle == 0);
+			cstate->simd_last_sleep_cycle >>= 1;
+		}
+		/* SIMD was enabled but it isn't worth it */
+		else
+		{
+			uint16		simd_last_sleep_cycle = cstate->simd_last_sleep_cycle;
+
+			cstate->simd_continue = false;
+
+			if (simd_last_sleep_cycle == 0)
+				simd_last_sleep_cycle = 1;
+			else if (simd_last_sleep_cycle >= SIMD_SLEEP_MAX / 2)
+				simd_last_sleep_cycle = SIMD_SLEEP_MAX;
+			else
+				simd_last_sleep_cycle <<= 1;
+			cstate->simd_current_sleep_cycle = simd_last_sleep_cycle;
+			cstate->simd_last_sleep_cycle = simd_last_sleep_cycle;
+		}
+	}
+	/* SIMD was disabled */
+	else
+	{
+		/*
+		 * We get here only while counting down
+		 * cstate->simd_current_sleep_cycle from a positive number.
+		 */
+		Assert(cstate->simd_current_sleep_cycle != 0);
+		cstate->simd_current_sleep_cycle--;
+
+		if (cstate->simd_current_sleep_cycle == 0)
+			cstate->simd_continue = true;
+	}
+
+#endif
+
 	/*
 	 * Transfer any still-uncopied data to line_buf.
 	 */
-- 
2.51.0



^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-10-18 18:46  KAZAR Ayoub <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  2 siblings, 1 reply; 114+ messages in thread

From: KAZAR Ayoub @ 2025-10-18 18:46 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hello,

I’ve rebenchmarked the new heuristic patch. We still have the previous
improvements ranging from 15% to 30%. For regressions I see at most 3%
or 4% in the worst case, so this is solid.

I'm also trying the idea of doing SIMD inside quotes with prefix XOR using
carry-less multiplication, avoiding the slow path in all cases even with
weird-looking input. However, it needs to take into account the
availability of the PCLMULQDQ instruction set with <wmmintrin.h>, and it
quickly starts to become dirty. Alternatively, we can wait for the decision
to start requiring x86-64-v2 or v3, which have SSE4.2 and AVX2.

Regards,
Ayoub Kazar

^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-10-18 20:01  Nazir Bilal Yavuz <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  2 siblings, 0 replies; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2025-10-18 20:01 UTC (permalink / raw)
  To: Andrew Dunstan <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Thu, 16 Oct 2025 at 17:29, Nazir Bilal Yavuz <[email protected]> wrote:
>
> Overall logic is that we will not run SIMD for the entire line and we
> decide if it is worth it to run SIMD for the next lines.

I had a typo there; the correct sentence is:

"Overall logic is that we *will* run SIMD for the entire line and we
decide if it is worth it to run SIMD for the next lines."

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-10-18 20:01  Nazir Bilal Yavuz <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2025-10-18 20:01 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Sat, 18 Oct 2025 at 21:46, KAZAR Ayoub <[email protected]> wrote:
>
> Hello,
>
> I’ve rebenchmarked the new heuristic patch, We still have the previous improvements ranging from 15% to 30%. For regressions i see at maximum 3% or 4% in the worst case, so this is solid.

Thank you so much for doing this! The results look nice, do you think
there are any other benchmarks that might be interesting to try?

> I'm also trying the idea of doing SIMD inside quotes with prefix XOR using carry less multiplication avoiding the slow path in all cases even with weird looking input, but it needs to take into consideration the availability of PCLMULQDQ instruction set with <wmmintrin.h> and here we go, it quickly starts to become dirty OR we can wait for the decision to start requiring x86-64-v2 or v3 which has SSE4.2 and AVX2.

I can not quite picture this, would you mind sharing a few examples or patches?

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-10-20 14:02  Andrew Dunstan <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  2 siblings, 1 reply; 114+ messages in thread

From: Andrew Dunstan @ 2025-10-20 14:02 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers


On 2025-10-16 Th 10:29 AM, Nazir Bilal Yavuz wrote:
> Hi,
>
> On Thu, 21 Aug 2025 at 18:47, Andrew Dunstan<[email protected]> wrote:
>>
>> On 2025-08-19 Tu 10:14 AM, Nazir Bilal Yavuz wrote:
>>> Hi,
>>>
>>> On Tue, 19 Aug 2025 at 15:33, Nazir Bilal Yavuz<[email protected]> wrote:
>>>> I am able to reproduce the regression you mentioned but both
>>>> regressions are %20 on my end. I found that (by experimenting) SIMD
>>>> causes a regression if it advances less than 5 characters.
>>>>
>>>> So, I implemented a small heuristic. It works like that:
>>>>
>>>> - If advance < 5 -> insert a sleep penalty (n cycles).
>>> 'sleep' might be a poor word choice here. I meant skipping SIMD for n
>>> number of times.
>>>
>> I was thinking a bit about that this morning. I wonder if it might be
>> better instead of having a constantly applied heuristic like this, it
>> might be better to do a little extra accounting in the first, say, 1000
>> lines of an input file, and if less than some portion of the input is
>> found to be special characters then switch to the SIMD code. What that
>> portion should be would need to be determined by some experimentation
>> with a variety of typical workloads, but given your findings 20% seems
>> like a good starting point.
> I implemented a heuristic something similar to this. It is a mix of
> previous heuristic and your idea, it works like that:
>
> Overall logic is that we will not run SIMD for the entire line and we
> decide if it is worth it to run SIMD for the next lines.
>
> 1 - We will try SIMD and decide if it is worth it to run SIMD.
> 1.1 - If it is worth it, we will continue to run SIMD and we will
> halve the simd_last_sleep_cycle variable.
> 1.2 - If it is not worth it, we will double the simd_last_sleep_cycle
> and we will not run SIMD for these many lines.
> 1.3 - After skipping simd_last_sleep_cycle lines, we will go back to the #1.
> Note: simd_last_sleep_cycle can not pass 1024, so we will run SIMD for
> each 1024 lines at max.
>
> With this heuristic the regression is limited by %2 in the worst case.
>

My worry is that the worst case is actually quite common. Sparse data 
sets dominated by a lot of null values (and hence lots of special 
characters) are very common. Are people prepared to accept a 2% 
regression on load times for such data sets?


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-10-20 17:04  Nathan Bossart <[email protected]>
  parent: Andrew Dunstan <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nathan Bossart @ 2025-10-20 17:04 UTC (permalink / raw)
  To: Andrew Dunstan <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; KAZAR Ayoub <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Mon, Oct 20, 2025 at 10:02:23AM -0400, Andrew Dunstan wrote:
> On 2025-10-16 Th 10:29 AM, Nazir Bilal Yavuz wrote:
>> With this heuristic the regression is limited by %2 in the worst case.
> 
> My worry is that the worst case is actually quite common. Sparse data sets
> dominated by a lot of null values (and hence lots of special characters) are
> very common. Are people prepared to accept a 2% regression on load times for
> such data sets?

Without knowing how common it is, I think it's difficult to judge whether
2% is a reasonable trade-off.  If <5% of workloads might see a small
regression while the other >95% see double-digit percentage improvements,
then I might argue that it's fine.  But I'm not sure we have any way to
know those sorts of details at the moment.

I'm also at least a little skeptical about the 2% number.  IME that's
generally within the noise range and can vary greatly between machines and
test runs.

-- 
nathan

^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-10-20 20:31  Andrew Dunstan <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Andrew Dunstan @ 2025-10-20 20:31 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; KAZAR Ayoub <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On 2025-10-20 Mo 1:04 PM, Nathan Bossart wrote:
> On Mon, Oct 20, 2025 at 10:02:23AM -0400, Andrew Dunstan wrote:
>> On 2025-10-16 Th 10:29 AM, Nazir Bilal Yavuz wrote:
>>> With this heuristic the regression is limited by %2 in the worst case.
>> My worry is that the worst case is actually quite common. Sparse data sets
>> dominated by a lot of null values (and hence lots of special characters) are
>> very common. Are people prepared to accept a 2% regression on load times for
>> such data sets?
> Without knowing how common it is, I think it's difficult to judge whether
> 2% is a reasonable trade-off.  If <5% of workloads might see a small
> regression while the other >95% see double-digit percentage improvements,
> then I might argue that it's fine.  But I'm not sure we have any way to
> know those sorts of details at the moment.

I guess what I don't understand is why we actually need to do the test 
continuously, even using an adaptive algorithm. Data files in my 
experience usually have lines with fairly similar shapes. It's highly 
unlikely that you will get the first 1000 (say) lines of a file that 
are rich in special characters and then some later significant section 
that isn't, or vice versa. Therefore, doing the test once should yield 
the correct answer that can be applied to the rest of the file. That 
should reduce the worst case regression to ~0% without sacrificing any 
of the performance gains. I appreciate the elegance of what Bilal has 
done here, but it does seem like overkill.

> I'm also at least a little skeptical about the 2% number.  IME that's
> generally within the noise range and can vary greatly between machines and
> test runs.
>

Fair point.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-10-20 21:09  Nazir Bilal Yavuz <[email protected]>
  parent: Andrew Dunstan <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2025-10-20 21:09 UTC (permalink / raw)
  To: Andrew Dunstan <[email protected]>; +Cc: Nathan Bossart <[email protected]>; KAZAR Ayoub <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Mon, 20 Oct 2025 at 23:32, Andrew Dunstan <[email protected]> wrote:
>
>
> On 2025-10-20 Mo 1:04 PM, Nathan Bossart wrote:
>
> On Mon, Oct 20, 2025 at 10:02:23AM -0400, Andrew Dunstan wrote:
>
> On 2025-10-16 Th 10:29 AM, Nazir Bilal Yavuz wrote:
>
> With this heuristic the regression is limited by %2 in the worst case.
>
> My worry is that the worst case is actually quite common. Sparse data sets
> dominated by a lot of null values (and hence lots of special characters) are
> very common. Are people prepared to accept a 2% regression on load times for
> such data sets?
>
> Without knowing how common it is, I think it's difficult to judge whether
> 2% is a reasonable trade-off.  If <5% of workloads might see a small
> regression while the other >95% see double-digit percentage improvements,
> then I might argue that it's fine.  But I'm not sure we have any way to
> know those sorts of details at the moment.
>
>
> I guess what I don't understand is why we actually need to do the test continuously, even using an adaptive algorithm. Data files in my experience usually have lines with fairly similar shapes. It's highly unlikely that you will get the the first 1000 (say) lines of a file that are rich in special characters and then some later significant section that isn't, or vice versa. Therefore, doing the test once should yield the correct answer that can be applied to the rest of the file. That should reduce the worst case regression to ~0% without sacrificing any of the performance gains. I appreciate the elegance of what Bilal has done here, but it does seem like overkill.

I think the problem is deciding how many lines to process before
deciding for the rest. 1000 lines could work for small data sets but
might not work for large ones. It might also cause worse regressions
for small data sets. For this reason, I tried to implement a heuristic
that works regardless of the size of the data. The last heuristic I
suggested will run SIMD for approximately (number_of_lines / 1024)
lines if all characters in the data are special characters, where 1024
is the maximum number of lines to sleep before running SIMD again.
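
As a sanity check on that estimate, the worst case can be counted with a
few lines of standalone C. This is a simplified model, not backend code:
worst_case_simd_lines is a made-up name, and it assumes every line
processed with SIMD is judged "not worth it", so the skip period doubles
from 1 up to 1024 and then stays there.

```c
#include <assert.h>
#include <stdint.h>

#define SLEEP_MAX 1024  /* same cap as SIMD_SLEEP_MAX in the 0002 patch */

/*
 * Count how many of total_lines would be parsed with SIMD if SIMD never
 * pays off: one SIMD line, then "sleep" skipped lines, with the skip
 * period doubling until it reaches SLEEP_MAX.
 */
static uint64_t
worst_case_simd_lines(uint64_t total_lines)
{
    uint64_t simd_lines = 0;
    uint64_t line = 0;
    uint32_t sleep = 0;

    while (line < total_lines)
    {
        /* One line is parsed with SIMD and found not worth it. */
        simd_lines++;
        line++;

        /* Double the skip period, saturating at SLEEP_MAX. */
        if (sleep == 0)
            sleep = 1;
        else if (sleep >= SLEEP_MAX / 2)
            sleep = SLEEP_MAX;
        else
            sleep *= 2;
        assert(sleep <= SLEEP_MAX);

        /* The following "sleep" lines use the scalar path only. */
        line += sleep;
    }

    return simd_lines;
}
```

In this model the ramp-up takes 11 SIMD lines over the first 2058 lines,
and after that one line in every 1025 uses SIMD. For 1,000,000 lines the
function returns 985, close to 1,000,000 / 1024 (about 977).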

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-10-21 06:17  KAZAR Ayoub <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 2 replies; 114+ messages in thread

From: KAZAR Ayoub @ 2025-10-21 06:17 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Sat, Oct 18, 2025 at 10:01 PM Nazir Bilal Yavuz <[email protected]>
wrote:

> Thank you so much for doing this! The results look nice, do you think
> there are any other benchmarks that might be interesting to try?
>

> > I'm also trying the idea of doing SIMD inside quotes with prefix XOR
> using carry less multiplication avoiding the slow path in all cases even
> with weird looking input, but it needs to take into consideration the
> availability of PCLMULQDQ instruction set with <wmmintrin.h> and here we
> go, it quickly starts to become dirty OR we can wait for the decision to
> start requiring x86-64-v2 or v3 which has SSE4.2 and AVX2.
>
> I can not quite picture this, would you mind sharing a few examples or
> patches?
>
The idea is to avoid stopping at characters that are not actually special
in their position (inside quotes, escaped, etc.).
This is done by building several masks from the original chunk, such as
quote_mask, escape_mask, and a mask of odd-length escape sequences; from
these we can deduce which quotes are not special and need not stop the scan.
Then, for quoted regions, we want to know which characters in the chunk are
inside quotes (also keeping track of the previous chunk's quote state), and
there's a clever, fast way to do that [1].
After this you match against LF, CR, etc., all while maintaining the state
of what you've seen (the annoying part).
In the end you only reach the scalar path after advancing to the position
of the first real special character that requires special treatment.
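
To make the quote-state step above concrete, here is a minimal, portable
sketch (not taken from the patch or from [2]). It builds the bitmasks with a
plain loop instead of SIMD compares and computes the prefix XOR with
shifts + XOR rather than CLMUL. All function names are hypothetical, and
escape characters are deliberately ignored, so it only covers the
"which bytes are inside quotes" part.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit i of the result is set when byte i of the 16-byte chunk equals c. */
static uint16_t
byte_eq_mask16(const unsigned char *chunk, unsigned char c)
{
    uint16_t    mask = 0;

    for (size_t i = 0; i < 16; i++)
        if (chunk[i] == c)
            mask |= (uint16_t) (1u << i);
    return mask;
}

/* Prefix XOR: bit i of the result is the XOR of bits 0..i of m. */
static uint16_t
prefix_xor16(uint16_t m)
{
    m ^= (uint16_t) (m << 1);
    m ^= (uint16_t) (m << 2);
    m ^= (uint16_t) (m << 4);
    m ^= (uint16_t) (m << 8);
    return m;
}

/*
 * Mask of the bytes that lie inside a quoted region.  An opening quote
 * counts as inside, a closing quote as outside.  *in_quote carries the
 * quote state from the previous chunk and is updated for the next one.
 */
static uint16_t
inside_quote_mask16(const unsigned char *chunk, unsigned char quote,
                    int *in_quote)
{
    uint16_t    quotes = byte_eq_mask16(chunk, quote);
    uint16_t    inside = prefix_xor16(quotes);

    if (*in_quote)
        inside = (uint16_t) ~inside;
    *in_quote = (inside >> 15) & 1;
    return inside;
}
```

Newlines that actually end a line would then be
byte_eq_mask16(chunk, '\n') & ~inside, and the scalar parser only needs to
take over at the first set bit of that result.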

However, after trying to implement this in the existing COPY pipeline [2]
(a broken, hopeless attempt, but it shows the idea), it becomes
unreasonable for several reasons:
- It is very challenging to correctly handle commas inside quoted fields
and to track quoted vs. unquoted state (especially across chunk boundaries
or with escaped quotes).
- Using carry-less multiplication (CLMUL) for prefix XOR on a 16-byte
chunk is overkill on some architectures where PCLMULQDQ latency is high
[3][4], to the point where it performs worse than unrolled shifts + XOR (5
cycles).
- Handling these cases starts to feel inherently scalar; doing all that
work for a 16-byte chunk is unreasonable since it's not free, compared to
the simple SIMD help plus Nazir's heuristic, which is much nicer in
general.

Currently we are at 200-400Mbps, which isn't that terrible compared to
production-grade and non-production-grade parsers (of course, in our case
we do more than just parse). Also, we are using SSE2 only, so theoretically
if we add support for AVX later on we'll have even better numbers.
Maybe more micro-optimizations to the current heuristic can squeeze out more.


[1]
https://branchfree.org/2019/03/06/code-fragment-finding-quote-pairs-with-carry-less-multiply-pclmulq...
[2]
https://github.com/AyoubKaz07/postgres/commit/73c6ecfedae4cce5c3f375fd6074b1ca9dfe1daf
[3] https://agner.org/optimize/instruction_tables.pdf
[4] https://www.uops.info/table.html

Regards,
Ayoub Kazar.


^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-10-21 06:44  KAZAR Ayoub <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  1 sibling, 0 replies; 114+ messages in thread

From: KAZAR Ayoub @ 2025-10-21 06:44 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; [email protected]; [email protected]; +Cc: Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Tue, Oct 21, 2025, 8:17 AM KAZAR Ayoub <[email protected]> wrote:

>
> Currently we are at 200-400Mbps which isn't that terrible compared to
> production and non production grade parsers (of course we don't only parse
> in our case), also we are using SSE2 only so theoretically if we add
> support for avx later on we'll have even better numbers.
> Maybe more micro optimizations to the current heuristic can squeeze it
> more.
>
>
> [1]
> https://branchfree.org/2019/03/06/code-fragment-finding-quote-pairs-with-carry-less-multiply-pclmulq...
> [2]
> https://github.com/AyoubKaz07/postgres/commit/73c6ecfedae4cce5c3f375fd6074b1ca9dfe1daf
> [3] https://agner.org/optimize/instruction_tables.pdf
> [4] https://www.uops.info/table.html
>
> Regards,
> Ayoub Kazar.
>
Sorry, I meant 200-400MB/s.


Regards.
Ayoub Kazar.


^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-10-21 18:40  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nathan Bossart @ 2025-10-21 18:40 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; KAZAR Ayoub <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Tue, Oct 21, 2025 at 12:09:27AM +0300, Nazir Bilal Yavuz wrote:
> I think the problem is deciding how many lines to process before
> deciding for the rest. 1000 lines could work for the small sized data
> but it might not work for the big sized data. Also, it might cause a
> worse regressions for the small sized data.

IMHO we have some leeway with smaller amounts of data.  If COPY FROM for
1000 rows takes 19 milliseconds as opposed to 11 milliseconds, it seems
unlikely users would be inconvenienced all that much.  (Those numbers are
completely made up in order to illustrate my point.)

> Because of this reason, I
> tried to implement a heuristic that will work regardless of the size
> of the data. The last heuristic I suggested will run SIMD for
> approximately (#number_of_lines / 1024 [1024 is the max number of
> lines to sleep before running SIMD again]) lines if all characters in
> the data are special characters.

I wonder if we could mitigate the regression further by spacing out the
checks a bit more.  It could be worth comparing a variety of values to
identify what works best with the test data.

-- 
nathan





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-10-21 18:55  Nathan Bossart <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  1 sibling, 0 replies; 114+ messages in thread

From: Nathan Bossart @ 2025-10-21 18:55 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; [email protected] <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Tue, Oct 21, 2025 at 08:17:01AM +0200, KAZAR Ayoub wrote:
>>> I'm also trying the idea of doing SIMD inside quotes with prefix XOR
>>> using carry less multiplication avoiding the slow path in all cases even
>>> with weird looking input, but it needs to take into consideration the
>>> availability of PCLMULQDQ instruction set with <wmmintrin.h> and here we
>>> go, it quickly starts to become dirty OR we can wait for the decision to
>>> start requiring x86-64-v2 or v3 which has SSE4.2 and AVX2.
>
> [...]
> 
> Currently we are at 200-400Mbps which isn't that terrible compared to
> production and non production grade parsers (of course we don't only parse
> in our case), also we are using SSE2 only so theoretically if we add
> support for avx later on we'll have even better numbers.
> Maybe more micro optimizations to the current heuristic can squeeze it more.

I'd greatly prefer that we stick with SSE2/Neon (i.e., simd.h) unless the
gains are extraordinary.  Beyond the inherent complexity of using
architecture-specific intrinsics, you also have to deal with configure-time
checks, runtime checks, and function pointer overhead juggling.  That tends
to be a lot of work for the amount of gain.

-- 
nathan





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-10-22 12:33  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2025-10-22 12:33 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; KAZAR Ayoub <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Tue, 21 Oct 2025 at 21:40, Nathan Bossart <[email protected]> wrote:
>
> On Tue, Oct 21, 2025 at 12:09:27AM +0300, Nazir Bilal Yavuz wrote:
> > I think the problem is deciding how many lines to process before
> > deciding for the rest. 1000 lines could work for the small sized data
> > but it might not work for the big sized data. Also, it might cause a
> > worse regressions for the small sized data.
>
> IMHO we have some leeway with smaller amounts of data.  If COPY FROM for
> 1000 rows takes 19 milliseconds as opposed to 11 milliseconds, it seems
> unlikely users would be inconvenienced all that much.  (Those numbers are
> completely made up in order to illustrate my point.)
>
> > Because of this reason, I
> > tried to implement a heuristic that will work regardless of the size
> > of the data. The last heuristic I suggested will run SIMD for
> > approximately (#number_of_lines / 1024 [1024 is the max number of
> > lines to sleep before running SIMD again]) lines if all characters in
> > the data are special characters.
>
> I wonder if we could mitigate the regression further by spacing out the
> checks a bit more.  It could be worth comparing a variety of values to
> identify what works best with the test data.

Do you mean that instead of doubling the SIMD sleep, we should
multiply it by 3 (or another factor)? Or are you referring to
increasing the maximum sleep from 1024? Or possibly both?
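
For readers following along, the "doubling sleep" idea being discussed can
be sketched roughly as below. This is only an illustration under
assumptions, not the code from the v3 patches: the struct, the function
names, and the reset-on-success behaviour are hypothetical; only the
doubling and the cap of 1024 lines come from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIMD_SLEEP_MIN 1        /* hypothetical starting value */
#define SIMD_SLEEP_MAX 1024     /* cap mentioned in this thread */

typedef struct SimdBackoff
{
    uint32_t    sleep_len;      /* lines to skip after the next failure */
    uint32_t    sleep_left;     /* lines still to parse without SIMD */
} SimdBackoff;

static void
simd_backoff_init(SimdBackoff *b)
{
    b->sleep_len = SIMD_SLEEP_MIN;
    b->sleep_left = 0;
}

/* Should the next line be parsed with the SIMD path? */
static bool
simd_backoff_should_try(SimdBackoff *b)
{
    if (b->sleep_left > 0)
    {
        b->sleep_left--;
        return false;
    }
    return true;
}

/* Report whether SIMD helped on the line that was just parsed. */
static void
simd_backoff_report(SimdBackoff *b, bool helped)
{
    if (helped)
    {
        b->sleep_len = SIMD_SLEEP_MIN;
        return;
    }
    /* SIMD did not help: sleep, and sleep twice as long next time. */
    b->sleep_left = b->sleep_len;
    if (b->sleep_len < SIMD_SLEEP_MAX)
        b->sleep_len *= 2;
}
```

With data where SIMD never helps, this settles at one SIMD attempt per
1025 lines, which matches the "#number_of_lines / 1024" estimate quoted
above. Changing the multiplier or SIMD_SLEEP_MAX are the two knobs the
question is about.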

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-10-22 19:24  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nathan Bossart @ 2025-10-22 19:24 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; KAZAR Ayoub <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Oct 22, 2025 at 03:33:37PM +0300, Nazir Bilal Yavuz wrote:
> On Tue, 21 Oct 2025 at 21:40, Nathan Bossart <[email protected]> wrote:
>> I wonder if we could mitigate the regression further by spacing out the
>> checks a bit more.  It could be worth comparing a variety of values to
>> identify what works best with the test data.
> 
> Do you mean that instead of doubling the SIMD sleep, we should
> multiply it by 3 (or another factor)? Or are you referring to
> increasing the maximum sleep from 1024? Or possibly both?

I'm not sure of the precise details, but the main thrust of my suggestion
is to assume that whatever sampling you do to determine whether to use SIMD
is good for a larger chunk of data.  That is, if you are sampling 1K lines
and then using the result to choose whether to use SIMD for the next 100K
lines, we could instead bump the latter number to 1M lines (or something).
That way we minimize the regression for relatively uniform data sets while
retaining some ability to adapt in case things change halfway through a
large table.
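
A rough sketch of that sample-then-apply scheme, for illustration only.
Everything here is an assumption rather than something from the patches:
the names, the "fewer than half of the sampled lines contain special
characters" threshold, and the choice to use the scalar path while
sampling are all hypothetical. The 1K and 1M figures are the ones
mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

#define SAMPLE_LINES 1000       /* lines inspected per sample */
#define APPLY_LINES  1000000    /* lines the decision is reused for */

typedef enum
{
    PHASE_SAMPLING,
    PHASE_APPLYING
} SamplerPhase;

typedef struct SimdSampler
{
    SamplerPhase phase;
    uint64_t    lines_left;     /* lines remaining in the current phase */
    uint64_t    special_lines;  /* sampled lines with special characters */
    bool        use_simd;       /* decision from the last sample */
} SimdSampler;

static void
sampler_init(SimdSampler *s)
{
    s->phase = PHASE_SAMPLING;
    s->lines_left = SAMPLE_LINES;
    s->special_lines = 0;
    s->use_simd = false;
}

/*
 * Call once per parsed line.  had_special says whether that line contained
 * special characters.  Returns whether the next line should use SIMD.
 */
static bool
sampler_next(SimdSampler *s, bool had_special)
{
    if (s->phase == PHASE_SAMPLING)
    {
        if (had_special)
            s->special_lines++;
        if (--s->lines_left == 0)
        {
            /* hypothetical threshold: mostly plain lines => use SIMD */
            s->use_simd = (s->special_lines * 2 < SAMPLE_LINES);
            s->phase = PHASE_APPLYING;
            s->lines_left = APPLY_LINES;
        }
    }
    else if (--s->lines_left == 0)
    {
        /* decision expired: take a fresh sample */
        s->phase = PHASE_SAMPLING;
        s->lines_left = SAMPLE_LINES;
        s->special_lines = 0;
    }
    return s->phase == PHASE_APPLYING && s->use_simd;
}
```

The ratio SAMPLE_LINES / APPLY_LINES bounds how much of a uniform data set
is spent re-checking, while the periodic resample keeps some ability to
adapt if the data changes shape partway through.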

-- 
nathan

^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-10-29 22:22  Andrew Dunstan <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Andrew Dunstan @ 2025-10-29 22:22 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; Nazir Bilal Yavuz <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers


On 2025-10-22 We 3:24 PM, Nathan Bossart wrote:
> On Wed, Oct 22, 2025 at 03:33:37PM +0300, Nazir Bilal Yavuz wrote:
>> On Tue, 21 Oct 2025 at 21:40, Nathan Bossart <[email protected]> wrote:
>>> I wonder if we could mitigate the regression further by spacing out the
>>> checks a bit more.  It could be worth comparing a variety of values to
>>> identify what works best with the test data.
>> Do you mean that instead of doubling the SIMD sleep, we should
>> multiply it by 3 (or another factor)? Or are you referring to
>> increasing the maximum sleep from 1024? Or possibly both?
> I'm not sure of the precise details, but the main thrust of my suggestion
> is to assume that whatever sampling you do to determine whether to use SIMD
> is good for a larger chunk of data.  That is, if you are sampling 1K lines
> and then using the result to choose whether to use SIMD for the next 100K
> lines, we could instead bump the latter number to 1M lines (or something).
> That way we minimize the regression for relatively uniform data sets while
> retaining some ability to adapt in case things change halfway through a
> large table.
>


I'd be ok with numbers like this, although I suspect the number of
cases where we see shape shifts like this in the middle of a data set
would be vanishingly small.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com






^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-11-11 22:23  Manni Wood <[email protected]>
  parent: Andrew Dunstan <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Manni Wood @ 2025-11-11 22:23 UTC (permalink / raw)
  To: Andrew Dunstan <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Nazir Bilal Yavuz <[email protected]>; KAZAR Ayoub <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Oct 29, 2025 at 5:23 PM Andrew Dunstan <[email protected]> wrote:

>
> On 2025-10-22 We 3:24 PM, Nathan Bossart wrote:
> > On Wed, Oct 22, 2025 at 03:33:37PM +0300, Nazir Bilal Yavuz wrote:
> >> On Tue, 21 Oct 2025 at 21:40, Nathan Bossart <[email protected]>
> wrote:
> >>> I wonder if we could mitigate the regression further by spacing out the
> >>> checks a bit more.  It could be worth comparing a variety of values to
> >>> identify what works best with the test data.
> >> Do you mean that instead of doubling the SIMD sleep, we should
> >> multiply it by 3 (or another factor)? Or are you referring to
> >> increasing the maximum sleep from 1024? Or possibly both?
> > I'm not sure of the precise details, but the main thrust of my suggestion
> > is to assume that whatever sampling you do to determine whether to use
> SIMD
> > is good for a larger chunk of data.  That is, if you are sampling 1K
> lines
> > and then using the result to choose whether to use SIMD for the next 100K
> > lines, we could instead bump the latter number to 1M lines (or
> something).
> > That way we minimize the regression for relatively uniform data sets
> while
> > retaining some ability to adapt in case things change halfway through a
> > large table.
> >
>
>
> I'd be ok with numbers like this, although I suspect the numbers of
> cases where we see shape shifts like this in the middle of a data set
> would be vanishingly small.
>
>
> cheers
>
>
> andrew
>
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>
>
>
Hello!

I wanted to reproduce the results using the files attached by Shinya Kato
and Ayoub Kazar. I installed a postgres compiled from master, and then a
postgres built from master with Nazir Bilal Yavuz's v3 patches applied.

The master+v3patches postgres naturally performed better on copying into
the database: anywhere from 11% better for the t.csv file produced by
Shinya's test.sql, to 35% better copying in the t_4096_none.csv file
created by Ayoub Kazar's simd-copy-from-bench.sql.

But here's where it gets weird. The two files created by Ayoub Kazar's
simd-copy-from-bench.sql that are supposed to be slower, t_4096_escape.txt,
and t_4096_quote.csv, actually ran faster on my machine, by 11% and 5%
respectively.

This seems impossible.

A few things I should note:

I timed the commands using the Unix time command, like so:

time psql -X -U mwood -h localhost -d postgres -c '\copy t from
/tmp/t_4096_escape.txt'

For each file, I timed the copy 6 times and took the average.

This was done on my work Linux machine while also running Chrome and an
Open Office spreadsheet; not a dedicated machine only running postgres.

All of the copy runs took between 4.5 seconds (Shinya's t.csv copied
into postgres compiled from master) and 2 seconds (Ayoub
Kazar's t_4096_none.csv copied into postgres compiled from master plus
Nazir's v3 patches).

Perhaps I need to fiddle with the provided SQL to produce larger files to
get longer run times? Maybe sub-second differences won't tell as
interesting a story as minutes-long copy commands?

Thanks for reading this.
-- 
-- Manni Wood EDB: https://www.enterprisedb.com

^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-11-12 14:44  KAZAR Ayoub <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: KAZAR Ayoub @ 2025-11-12 14:44 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Nathan Bossart <[email protected]>; Nazir Bilal Yavuz <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Tue, Nov 11, 2025 at 11:23 PM Manni Wood <[email protected]>
wrote:

> Hello!
>
> I wanted reproduce the results using files attached by Shinya Kato and
> Ayoub Kazar. I installed a postgres compiled from master, and then I
> installed a postgres built from master plus Nazir Bilal Yavuz's v3 patches
> applied.
>
> The master+v3patches postgres naturally performed better on copying into
> the database: anywhere from 11% better for the t.csv file produced by
> Shinyo's test.sql, to 35% better copying in the t_4096_none.csv file
> created by Ayoub Kazar's simd-copy-from-bench.sql.
>
> But here's where it gets weird. The two files created by Ayoub Kazar's
> simd-copy-from-bench.sql that are supposed to be slower, t_4096_escape.txt,
> and t_4096_quote.csv, actually ran faster on my machine, by 11% and 5%
> respectively.
>
> This seems impossible.
>
> A few things I should note:
>
> I timed the commands using the Unix time command, like so:
>
> time psql -X -U mwood -h localhost -d postgres -c '\copy t from
> /tmp/t_4096_escape.txt'
>
> For each file, I timed the copy 6 times and took the average.
>
> This was done on my work Linux machine while also running Chrome and an
> Open Office spreadsheet; not a dedicated machine only running postgres.
>
Hello,
I think if you do a perf benchmark (if it still reproduces), it would
probably be possible to explain why it's performing like that by looking at
the CPI and other metrics and comparing them to my findings.
What I also suggest is to make the data even closer to the worst case,
i.e. more special characters placed where they hurt the switching between
SIMD and scalar processing (in the simd-copy-from-bench.sql file); if it
still does a good job, then there's something to look at.

>
>

> All of the copy results took between 4.5 seconds (Shinyo's t.csv copied
> into postgres compiled from master) to 2 seconds (Ayoub
> Kazar's t_4096_none.csv copied into postgres compiled from master plus
> Nazir's v3 patches).
>
> Perhaps I need to fiddle with the provided SQL to produce larger files to
> get longer run times? Maybe sub-second differences won't tell as
> interesting a story as minutes-long copy commands?
>
I did try it on a few GB (around 2-5GB only), and the differences were not
that large, but if you can run this on more data (at least 10GB) it would
be good to look at, although I don't expect anything interesting since the
shape of the data is the same for the whole COPY.

>
> Thanks for reading this.
> --
> -- Manni Wood EDB: https://www.enterprisedb.com
>
Thanks for the info.


Regards,
Ayoub Kazar.


^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-11-13 02:40  Manni Wood <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Manni Wood @ 2025-11-13 02:40 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Nathan Bossart <[email protected]>; Nazir Bilal Yavuz <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Nov 12, 2025 at 8:44 AM KAZAR Ayoub <[email protected]> wrote:

> On Tue, Nov 11, 2025 at 11:23 PM Manni Wood <[email protected]>
> wrote:
>
>> Hello!
>>
>> I wanted reproduce the results using files attached by Shinya Kato and
>> Ayoub Kazar. I installed a postgres compiled from master, and then I
>> installed a postgres built from master plus Nazir Bilal Yavuz's v3 patches
>> applied.
>>
>> The master+v3patches postgres naturally performed better on copying into
>> the database: anywhere from 11% better for the t.csv file produced by
>> Shinyo's test.sql, to 35% better copying in the t_4096_none.csv file
>> created by Ayoub Kazar's simd-copy-from-bench.sql.
>>
>> But here's where it gets weird. The two files created by Ayoub Kazar's
>> simd-copy-from-bench.sql that are supposed to be slower, t_4096_escape.txt,
>> and t_4096_quote.csv, actually ran faster on my machine, by 11% and 5%
>> respectively.
>>
>> This seems impossible.
>>
>> A few things I should note:
>>
>> I timed the commands using the Unix time command, like so:
>>
>> time psql -X -U mwood -h localhost -d postgres -c '\copy t from
>> /tmp/t_4096_escape.txt'
>>
>> For each file, I timed the copy 6 times and took the average.
>>
>> This was done on my work Linux machine while also running Chrome and an
>> Open Office spreadsheet; not a dedicated machine only running postgres.
>>
> Hello,
> I think if you do a perf benchmark (if it still reproduces) it would
> probably be possible to explain why it's performing like that looking at
> the CPI and other metrics and compare it to my findings.
> What i also suggest is to make the data close even closer to the worst
> case i.e: more special characters where it hurts the switching between SIMD
> and scalar processing (in simd-copy-from-bench.sql file), if still does a
> good job then there's something to look at.
>
>>
>>
>
>> All of the copy results took between 4.5 seconds (Shinyo's t.csv copied
>> into postgres compiled from master) to 2 seconds (Ayoub
>> Kazar's t_4096_none.csv copied into postgres compiled from master plus
>> Nazir's v3 patches).
>>
>> Perhaps I need to fiddle with the provided SQL to produce larger files to
>> get longer run times? Maybe sub-second differences won't tell as
>> interesting a story as minutes-long copy commands?
>>
> I did try it on some GBs (around 2-5GB only), the differences were not
> that much, but if you can run this on more GBs (at least 10GB) it would be
> good to look at, although i don't suspect anything interesting since the
> shape of data is the same for the totality of the COPY.
>
>>
>> Thanks for reading this.
>> --
>> -- Manni Wood EDB: https://www.enterprisedb.com
>>
> Thanks for the info.
>
>
> Regards,
> Ayoub Kazar.
>

Hello again!

It looks like using 10 times the data removed the apparent speedup in the
simd code when the simd code has to deal with t_4096_escape.txt
and t_4096_quote.csv. When both files contain 1,000,000 lines each,
postgres master+v3patch imports 0.63% slower and 0.54% slower respectively.
For 1,000,000 lines of t_4096_none.txt, the v3 patch yields a 30% speedup.
For 1,000,000 lines of t_4096_none.csv, the v3 patch yields a 33% speedup.

I got these numbers just via simple timing, though this time I used psql's
\timing feature. I left psql running rather than launching it each time as
I did when I used the unix "time" command. I ran the copy command 5 times
for each file and averaged the results. Again, this happened on a Linux
machine that also happened to be running Chrome and Open Office's
spreadsheet.

I should probably try to construct some .txt or .csv files that would trip
up the simd on/off heuristic in the v3 patch.

If data "in the wild" tend to be roughly the same "shape" from row to row,
as Andrew's experience has shown, I imagine these million row results bode
well for the v3 patch...
-- 
-- Manni Wood EDB: https://www.enterprisedb.com


^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-11-17 22:16  Nathan Bossart <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 2 replies; 114+ messages in thread

From: Nathan Bossart @ 2025-11-17 22:16 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Andrew Dunstan <[email protected]>; Nazir Bilal Yavuz <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

I'd like to mark myself as the committer for this one, but I noticed that the
commitfest entry [0] has been marked as Withdrawn.  Could someone either
reopen it or create a new one as appropriate (assuming there is a desire to
continue with it)?  I'm hoping to start spending more time on it soon.

[0] https://commitfest.postgresql.org/patch/5952/

-- 
nathan

^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-11-17 22:52  Shinya Kato <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 1 reply; 114+ messages in thread

From: Shinya Kato @ 2025-11-17 22:52 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Andrew Dunstan <[email protected]>; Nazir Bilal Yavuz <[email protected]>; pgsql-hackers

On Tue, Nov 18, 2025, 07:16 Nathan Bossart <[email protected]> wrote:

> I'd like to mark myself as the committer this one, but I noticed that the
> commitfest entry [0] has been marked as Withdrawn.  Could someone either
> reopen it or create a new one as appropriate (assuming there is a desire to
> continue with it)?  I'm hoping to start spending more time on it soon.
>
> [0] https://commitfest.postgresql.org/patch/5952/


I closed this entry because I currently don't have enough time to continue
developing this patch. It is fine if someone else reopens it; I will do my
best to see the patch whenever I can.

Shinya


^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-11-18 08:04  Nazir Bilal Yavuz <[email protected]>
  parent: Shinya Kato <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2025-11-18 08:04 UTC (permalink / raw)
  To: Shinya Kato <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Andrew Dunstan <[email protected]>; pgsql-hackers

Hi,

On Tue, 18 Nov 2025 at 01:53, Shinya Kato <[email protected]> wrote:
>
>
> On Tue, Nov 18, 2025, 07:16 Nathan Bossart <[email protected]> wrote:
>>
>> I'd like to mark myself as the committer this one, but I noticed that the
>> commitfest entry [0] has been marked as Withdrawn.  Could someone either
>> reopen it or create a new one as appropriate (assuming there is a desire to
>> continue with it)?  I'm hoping to start spending more time on it soon.
>>
>> [0] https://commitfest.postgresql.org/patch/5952/
>
>
> I closed this entry because I currently don't have enough time to continue developing this patch. It is fine if someone else reopens it; I will do my best to see the patch whenever I can.

Thank you for all your work on this patch.

I would like to continue working on this, but I am not sure what the
correct steps are to reopen this commitfest entry. Do I just need to
change the commitfest entry's status to 'Needs review'?

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-11-18 14:01  Andrew Dunstan <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Andrew Dunstan @ 2025-11-18 14:01 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; Shinya Kato <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; pgsql-hackers


On 2025-11-18 Tu 3:04 AM, Nazir Bilal Yavuz wrote:
> Hi,
>
> On Tue, 18 Nov 2025 at 01:53, Shinya Kato <[email protected]> wrote:
>>
>> On Tue, Nov 18, 2025, 07:16 Nathan Bossart <[email protected]> wrote:
>>> I'd like to mark myself as the committer this one, but I noticed that the
>>> commitfest entry [0] has been marked as Withdrawn.  Could someone either
>>> reopen it or create a new one as appropriate (assuming there is a desire to
>>> continue with it)?  I'm hoping to start spending more time on it soon.
>>>
>>> [0] https://commitfest.postgresql.org/patch/5952/
>>
>> I closed this entry because I currently don't have enough time to continue developing this patch. It is fine if someone else reopens it; I will do my best to see the patch whenever I can.
> Thank you for all your work on this patch.
>
> I would like to continue working on this but I am not sure what are
> the correct steps to reopen this commitfest entry. Do I just need to
> change commitfest entry's status to 'Needs review'?


That should do it, I believe.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com






^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-11-18 14:20  Nazir Bilal Yavuz <[email protected]>
  parent: Andrew Dunstan <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2025-11-18 14:20 UTC (permalink / raw)
  To: Andrew Dunstan <[email protected]>; +Cc: Shinya Kato <[email protected]>; Nathan Bossart <[email protected]>; Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; pgsql-hackers

Hi,

On Tue, 18 Nov 2025 at 17:01, Andrew Dunstan <[email protected]> wrote:
>
>
> On 2025-11-18 Tu 3:04 AM, Nazir Bilal Yavuz wrote:
> > Hi,
> >
> > On Tue, 18 Nov 2025 at 01:53, Shinya Kato <[email protected]> wrote:
> >>
> >> On Tue, Nov 18, 2025, 07:16 Nathan Bossart <[email protected]> wrote:
> >>> I'd like to mark myself as the committer this one, but I noticed that the
> >>> commitfest entry [0] has been marked as Withdrawn.  Could someone either
> >>> reopen it or create a new one as appropriate (assuming there is a desire to
> >>> continue with it)?  I'm hoping to start spending more time on it soon.
> >>>
> >>> [0] https://commitfest.postgresql.org/patch/5952/
> >>
> >> I closed this entry because I currently don't have enough time to continue developing this patch. It is fine if someone else reopens it; I will do my best to see the patch whenever I can.
> > Thank you for all your work on this patch.
> >
> > I would like to continue working on this but I am not sure what are
> > the correct steps to reopen this commitfest entry. Do I just need to
> > change commitfest entry's status to 'Needs review'?
>
> That should do it, I believe.

Thanks, done.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-11-18 20:42  KAZAR Ayoub <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 0 replies; 114+ messages in thread

From: KAZAR Ayoub @ 2025-11-18 20:42 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Nazir Bilal Yavuz <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Mon, Nov 17, 2025, 11:16 PM Nathan Bossart <[email protected]>
wrote:

> (assuming there is a desire to
> continue with it)?  I'm hoping to start spending more time on it soon.
>
Somethings worth noting for future reference (so someone else wouldn't
waste time thinking about it), previously I tried extra several micro
optimizations inside and around CopyReadLineText:

SIMD alignment*:* Forcing 16-byte aligned buffers so we could use aligned
memory instructions (_mm_load_si128 vs _mm_loadu_si128) provided no
measurable benefit on modern CPUs (there's definitely a thread somewhere
talking about it that i didn't encounter yet). This likely explains why
simd.h exclusively uses unaligned load intrinsics the performance
difference has become negligible since Nehalem processors.

Memory prefetching: Explicit prefetch instructions for the COPY buffer
pipeline (copy_raw_buf, input buffers, etc.) either showed no improvement
or slight regression. Multiple chunks are already within a cache line,
other buffers are too far to prefetch and the next part of the buffer is
easily prefetched, nothing special, so it turns out to be not worth having
more uops.

Instruction-level parallelism: Spreading too many independent vector
operations to increase ILP eventually degrades performance, likely due to
backend saturation observed through perf (most likely execution port and
execution unit contention).

This suggests that further optimization work should focus on the pipeline
as a whole to get large benefits (parallel copy[0], maybe?).

[0]
https://www.postgresql.org/message-id/[email protected]...

--
Regards,
Ayoub Kazar


* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-11-19 21:01  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 2 replies; 114+ messages in thread

From: Nathan Bossart @ 2025-11-19 21:01 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; pgsql-hackers

On Tue, Nov 18, 2025 at 05:20:05PM +0300, Nazir Bilal Yavuz wrote:
> Thanks, done.

I took a look at the v3 patches.  Here are my high-level thoughts:

+    /*
+     * Parse data and transfer into line_buf. To get benefit from inlining,
+     * call CopyReadLineText() with the constant boolean variables.
+     */
+    if (cstate->simd_continue)
+        result = CopyReadLineText(cstate, is_csv, true);
+    else
+        result = CopyReadLineText(cstate, is_csv, false);

I'm curious whether this actually generates different code, and if it does,
if it's actually faster.  We're already branching on cstate->simd_continue
here.

+            /* Load a chunk of data into a vector register */
+            vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);

In other places, processing 2 or 4 vectors of data at a time has proven
faster.  Have you tried that here?

+            /* \n and \r are not special inside quotes */
+            if (!in_quote)
+                match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+
+            if (is_csv)
+            {
+                match = vector8_or(match, vector8_eq(chunk, quote));
+                if (escapec != '\0')
+                    match = vector8_or(match, vector8_eq(chunk, escape));
+            }
+            else
+                match = vector8_or(match, vector8_eq(chunk, bs));

The amount of branching here catches my eye.  Some branching might be
unavoidable, but in general we want to keep these SIMD paths as branch-free
as possible.
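
One way to read the branch concern above, as a sketch only: decide the set
of special bytes once, outside the loop, and pad unused slots with a
character that is already special, so that the per-chunk comparison has no
conditionals. The code below is a portable scalar model, not simd.h code;
SpecialSet, init_special_sets and special_mask are hypothetical names, and
a real version would hold Vector8 values instead of single bytes. Whether
it beats the branchy version would need measuring.

```c
#include <stdint.h>

#define CHUNK_SIZE 16

/* Up to four bytes that stop the fast path; duplicates are harmless. */
typedef struct SpecialSet
{
	uint8_t		c[4];
} SpecialSet;

/*
 * Fill sets[0] (outside quotes) and sets[1] (inside quotes) once per COPY.
 * Unused slots repeat an already-special byte instead of being skipped.
 */
void
init_special_sets(SpecialSet sets[2], int is_csv, char quotec, char escapec)
{
	uint8_t		third = is_csv ? (uint8_t) quotec : (uint8_t) '\\';
	uint8_t		fourth = (is_csv && escapec != '\0') ? (uint8_t) escapec : third;

	/* outside quotes, line endings are special */
	sets[0].c[0] = '\n';
	sets[0].c[1] = '\r';
	sets[0].c[2] = third;
	sets[0].c[3] = fourth;

	/* inside quotes, \n and \r are ordinary data */
	sets[1].c[0] = third;
	sets[1].c[1] = third;
	sets[1].c[2] = third;
	sets[1].c[3] = fourth;
}

/* Bit i of the result is set if chunk[i] is one of the special bytes. */
uint32_t
special_mask(const uint8_t *chunk, const SpecialSet *set)
{
	uint32_t	mask = 0;

	for (int i = 0; i < CHUNK_SIZE; i++)
	{
		uint32_t	hit = (uint32_t) ((chunk[i] == set->c[0]) |
									  (chunk[i] == set->c[1]) |
									  (chunk[i] == set->c[2]) |
									  (chunk[i] == set->c[3]));

		mask |= hit << i;
	}
	return mask;
}
```

The inner loop would then call special_mask(p, &sets[in_quote]), turning
the in_quote test into an index rather than a branch.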

+                /*
+                 * Found a special character. Advance up to that point and let
+                 * the scalar code handle it.
+                 */
+                int         advance = pg_rightmost_one_pos32(mask);
+
+                input_buf_ptr += advance;
+                simd_total_advance += advance;

Do we actually need to advance here?  Or could we just fall through to the
scalar path?  My suspicion is that this extra code doesn't gain us much.

+            if (simd_last_sleep_cycle == 0)
+                simd_last_sleep_cycle = 1;
+            else if (simd_last_sleep_cycle >= SIMD_SLEEP_MAX / 2)
+                simd_last_sleep_cycle = SIMD_SLEEP_MAX;
+            else
+                simd_last_sleep_cycle <<= 1;
+            cstate->simd_current_sleep_cycle = simd_last_sleep_cycle;
+            cstate->simd_last_sleep_cycle = simd_last_sleep_cycle;

IMHO we should be looking for ways to simplify this should-we-use-SIMD
code.  For example, perhaps we could just disable the SIMD path for 10K or
100K lines any time a special character is found.  I'm dubious that a lot
of complexity is warranted.
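
To illustrate how small the simpler alternative could be, here is a sketch
of a fixed cooldown, under the assumption that the check runs once per
line. SimdGate, simd_gate_allows and simd_gate_penalize are hypothetical
names, and the constant is just the lower of the two figures mentioned
above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical value; the suggestion above is "10K or 100K lines". */
#define SIMD_COOLDOWN_LINES 10000

typedef struct SimdGate
{
	uint32_t	lines_to_skip;	/* lines left before SIMD is tried again */
} SimdGate;

/* Call once per line, before parsing it: may the SIMD path be used? */
bool
simd_gate_allows(SimdGate *gate)
{
	if (gate->lines_to_skip > 0)
	{
		gate->lines_to_skip--;
		return false;
	}
	return true;
}

/* Call when the SIMD scan runs into a special character. */
void
simd_gate_penalize(SimdGate *gate)
{
	gate->lines_to_skip = SIMD_COOLDOWN_LINES;
}
```

Compared with the doubling logic quoted above, this keeps a single counter
and no maximum or history, at the cost of reacting less quickly when the
data becomes SIMD-friendly again.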

-- 
nathan






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-11-20 12:55  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 2 replies; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2025-11-20 12:55 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; pgsql-hackers

Hi,

Thank you for looking into this!

On Thu, 20 Nov 2025 at 00:01, Nathan Bossart <[email protected]> wrote:
>
> On Tue, Nov 18, 2025 at 05:20:05PM +0300, Nazir Bilal Yavuz wrote:
> > Thanks, done.
>
> I took a look at the v3 patches.  Here are my high-level thoughts:
>
> +    /*
> +     * Parse data and transfer into line_buf. To get benefit from inlining,
> +     * call CopyReadLineText() with the constant boolean variables.
> +     */
> +    if (cstate->simd_continue)
> +        result = CopyReadLineText(cstate, is_csv, true);
> +    else
> +        result = CopyReadLineText(cstate, is_csv, false);
>
> I'm curious whether this actually generates different code, and if it does,
> if it's actually faster.  We're already branching on cstate->simd_continue
> here.

I had the same doubts, but my benchmark shows a nice speedup. I
used a test file that is full of delimiters. The current code takes 2700
ms, but when I replaced these lines with 'result =
CopyReadLineText(cstate, is_csv, cstate->simd_continue);', it took
2920 ms. I compiled the code with both -O3 and -O2 and the results
were similar.

>
> +            /* Load a chunk of data into a vector register */
> +            vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
>
> In other places, processing 2 or 4 vectors of data at a time has proven
> faster.  Have you tried that here?

Sorry, I could not find the related code. I only saw the
vector8_load() call inside the hex_decode_safe() function, and its comment
says:

/*
 * We must process 2 vectors at a time since the output will be half the
 * length of the input.
 */

But this does not mention any speedup from using 2 vectors at a time.
Could you please show the related code?

>
> +            /* \n and \r are not special inside quotes */
> +            if (!in_quote)
> +                match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
> +
> +            if (is_csv)
> +            {
> +                match = vector8_or(match, vector8_eq(chunk, quote));
> +                if (escapec != '\0')
> +                    match = vector8_or(match, vector8_eq(chunk, escape));
> +            }
> +            else
> +                match = vector8_or(match, vector8_eq(chunk, bs));
>
> The amount of branching here catches my eye.  Some branching might be
> unavoidable, but in general we want to keep these SIMD paths as branch-free
> as possible.

You are right. I will check these branches and try to remove as many of
them as possible.

>
> +                /*
> +                 * Found a special character. Advance up to that point and let
> +                 * the scalar code handle it.
> +                 */
> +                int         advance = pg_rightmost_one_pos32(mask);
> +
> +                input_buf_ptr += advance;
> +                simd_total_advance += advance;
>
> Do we actually need to advance here?  Or could we just fall through to the
> scalar path?  My suspicion is that this extra code doesn't gain us much.

My testing shows that SIMD is worth it if we advance more than ~5
characters, but causes a regression if we advance fewer than ~5. I used
this information while writing the heuristic.
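
For reference, a sketch of what the advance computation amounts to,
assuming bit i of the mask corresponds to byte i of the chunk. The function
rightmost_one_pos32 here is a portable stand-in for
pg_rightmost_one_pos32(), and simd_advance, simd_was_worth_it and
MIN_USEFUL_ADVANCE are hypothetical names; the value 5 is only the
approximate figure from my testing, not a constant from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Approximate break-even point from the benchmark above (an assumption). */
#define MIN_USEFUL_ADVANCE 5

/* Index of the least significant set bit; mask must be nonzero. */
int
rightmost_one_pos32(uint32_t mask)
{
	int			pos = 0;

	while ((mask & 1) == 0)
	{
		mask >>= 1;
		pos++;
	}
	return pos;
}

/* Bytes that can be skipped before the first special character. */
int
simd_advance(uint32_t mask, int chunk_size)
{
	return (mask == 0) ? chunk_size : rightmost_one_pos32(mask);
}

/* Did this chunk skip enough bytes to pay for the vector work? */
bool
simd_was_worth_it(int advance)
{
	return advance > MIN_USEFUL_ADVANCE;
}
```

For example, a mask of 0x88 (special bytes at offsets 3 and 7) yields an
advance of 3, which is below the break-even point.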

>
> +            if (simd_last_sleep_cycle == 0)
> +                simd_last_sleep_cycle = 1;
> +            else if (simd_last_sleep_cycle >= SIMD_SLEEP_MAX / 2)
> +                simd_last_sleep_cycle = SIMD_SLEEP_MAX;
> +            else
> +                simd_last_sleep_cycle <<= 1;
> +            cstate->simd_current_sleep_cycle = simd_last_sleep_cycle;
> +            cstate->simd_last_sleep_cycle = simd_last_sleep_cycle;
>
> IMHO we should be looking for ways to simplify this should-we-use-SIMD
> code.  For example, perhaps we could just disable the SIMD path for 10K or
> 100K lines any time a special character is found.  I'm dubious that a lot
> of complexity is warranted.

I think this is a bit too harsh, since SIMD is still worth it if it
can advance more than ~5 characters on average. I am trying to use SIMD as
much as possible when it is worth it, but what you suggest would remove the
regression completely, so perhaps that is the correct way.

--
Regards,
Nazir Bilal Yavuz
Microsoft






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-11-21 14:48  Andrew Dunstan <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 0 replies; 114+ messages in thread

From: Andrew Dunstan @ 2025-11-21 14:48 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; Nathan Bossart <[email protected]>; +Cc: Shinya Kato <[email protected]>; Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; pgsql-hackers


On 2025-11-20 Th 7:55 AM, Nazir Bilal Yavuz wrote:
> Hi,
>
> Thank you for looking into this!
>
> On Thu, 20 Nov 2025 at 00:01, Nathan Bossart <[email protected]> wrote:
>
>> IMHO we should be looking for ways to simplify this should-we-use-SIMD
>> code.  For example, perhaps we could just disable the SIMD path for 10K or
>> 100K lines any time a special character is found.  I'm dubious that a lot
>> of complexity is warranted.
> I think this is a bit too harsh since SIMD is still worth it if SIMD
> can advance more than ~5 character average. I am trying to use SIMD as
> much as possible when it is worth it but what you said can remove the
> regression completely, perhaps that is the correct way.
>

Perhaps a very small regression (say under 1%) in the worst case would 
be OK. But the closer you can get that to zero the more acceptable this 
will be. Very large loads of sparse data, which will often have lots of 
special characters AIUI, are very common, so we should not dismiss the 
worst case as an outlier. I still like the idea of testing, say, a 
thousand lines every million, or something like that.
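
A rough sketch of that sampling idea, to make the bookkeeping concrete. It
assumes the scalar parser can report, per line, how many bytes and how many
special characters it saw. All names and constants are hypothetical;
MIN_BYTES_PER_SPECIAL reuses the ~5 characters figure quoted above as a
stand-in threshold.

```c
#include <stdbool.h>
#include <stdint.h>

#define SAMPLE_PERIOD_LINES 1000000	/* re-test every million lines */
#define SAMPLE_WINDOW_LINES 1000	/* ... using a thousand lines */
#define MIN_BYTES_PER_SPECIAL 5		/* assumed break-even density */

typedef struct SimdSampler
{
	uint64_t	line_no;		/* lines seen so far */
	uint64_t	window_bytes;	/* bytes seen in the current window */
	uint64_t	window_specials;	/* special chars seen in the window */
	bool		use_simd;		/* decision from the last full window */
} SimdSampler;

/* Record one parsed line; update the decision when a window closes. */
void
sampler_note_line(SimdSampler *s, uint64_t line_bytes, uint64_t line_specials)
{
	uint64_t	pos = s->line_no % SAMPLE_PERIOD_LINES;

	if (pos < SAMPLE_WINDOW_LINES)
	{
		s->window_bytes += line_bytes;
		s->window_specials += line_specials;

		if (pos == SAMPLE_WINDOW_LINES - 1)
		{
			/* sparse enough special characters => SIMD should pay off */
			s->use_simd = s->window_specials * MIN_BYTES_PER_SPECIAL
				<= s->window_bytes;
			s->window_bytes = 0;
			s->window_specials = 0;
		}
	}
	s->line_no++;
}
```

Outside the window the function only increments a counter, so the
steady-state overhead stays close to zero, which is the property that
matters for the worst case.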


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com







* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-11-24 21:59  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 1 reply; 114+ messages in thread

From: Nathan Bossart @ 2025-11-24 21:59 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; pgsql-hackers

On Thu, Nov 20, 2025 at 03:55:43PM +0300, Nazir Bilal Yavuz wrote:
> On Thu, 20 Nov 2025 at 00:01, Nathan Bossart <[email protected]> wrote:
>> +            /* Load a chunk of data into a vector register */
>> +            vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
>>
>> In other places, processing 2 or 4 vectors of data at a time has proven
>> faster.  Have you tried that here?
> 
> Sorry, I could not find the related code piece. I only saw the
> vector8_load() inside of hex_decode_safe() function and its comment
> says:
> 
> /*
>  * We must process 2 vectors at a time since the output will be half the
>  * length of the input.
>  */
> 
> But this does not mention any speedup from using 2 vectors at a time.
> Could you please show the related code?

See pg_lfind32().
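
As a pointer for readers without the tree at hand: pg_lfind32() handles
several vectors per loop iteration and combines the comparison results
before testing them once. The following is only a portable scalar model of
that shape applied to byte scanning, not the pg_lfind32() code itself;
vec_has_byte and skip_clean_blocks are hypothetical names, and each
"vector" is modelled as 16 plain bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16
#define VECS_PER_ITER 4

/* Model of one vector compare: nonzero if any of 16 bytes equals key. */
static inline uint32_t
vec_has_byte(const uint8_t *p, uint8_t key)
{
	uint32_t	any = 0;

	for (int i = 0; i < VEC_BYTES; i++)
		any |= (uint32_t) (p[i] == key);
	return any;
}

/*
 * Return how many leading bytes (a multiple of 64) contain no 'key'.
 * Four independent compares are issued, then tested with one branch.
 */
size_t
skip_clean_blocks(const uint8_t *buf, size_t len, uint8_t key)
{
	size_t		i = 0;

	while (i + VEC_BYTES * VECS_PER_ITER <= len)
	{
		uint32_t	r0 = vec_has_byte(buf + i, key);
		uint32_t	r1 = vec_has_byte(buf + i + VEC_BYTES, key);
		uint32_t	r2 = vec_has_byte(buf + i + 2 * VEC_BYTES, key);
		uint32_t	r3 = vec_has_byte(buf + i + 3 * VEC_BYTES, key);

		/* combine first, branch once per 64 bytes */
		if ((r0 | r1) | (r2 | r3))
			break;

		i += VEC_BYTES * VECS_PER_ITER;
	}
	return i;
}
```

The remaining bytes after the returned offset would still go through a
single-vector or scalar loop to locate the exact position.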

-- 
nathan






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-11-26 00:09  Manni Wood <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 0 replies; 114+ messages in thread

From: Manni Wood @ 2025-11-26 00:09 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; KAZAR Ayoub <[email protected]>; pgsql-hackers

Hello.

I tried Ayoub Kazar's test files again, using Nazir Bilal Yavuz's v3
patches, but with one difference since my last attempt: this time, I used 5
million lines per file. For each 5 million line file, I ran the import 5
times and averaged the results.

(I found that even using 1 million lines could sometimes produce surprising
speedups where the newer algorithm should be at least a tiny bit slower
than the non-simd version.)

The text file with no special characters is 30% faster. The CSV file with
no special characters is 39% faster. The text file with roughly 1/3rd
special characters is 0.5% slower. The CSV file with roughly 1/3rd special
characters is 2.7% slower.

I also tried files that alternated lines with no special characters and
lines with 1/3rd special characters, thinking I could force the algorithm
to continually check whether or not it should use simd and therefore force
more overhead in the try-simd/don't-try-simd housekeeping code. The text
file was still 50% faster. The CSV file was still 13% faster.

On Mon, Nov 24, 2025 at 3:59 PM Nathan Bossart <[email protected]>
wrote:

> On Thu, Nov 20, 2025 at 03:55:43PM +0300, Nazir Bilal Yavuz wrote:
> > On Thu, 20 Nov 2025 at 00:01, Nathan Bossart <[email protected]>
> wrote:
> >> +            /* Load a chunk of data into a vector register */
> >> +            vector8_load(&chunk, (const uint8 *)
> &copy_input_buf[input_buf_ptr]);
> >>
> >> In other places, processing 2 or 4 vectors of data at a time has proven
> >> faster.  Have you tried that here?
> >
> > Sorry, I could not find the related code piece. I only saw the
> > vector8_load() inside of hex_decode_safe() function and its comment
> > says:
> >
> > /*
> >  * We must process 2 vectors at a time since the output will be half the
> >  * length of the input.
> >  */
> >
> > But this does not mention any speedup from using 2 vectors at a time.
> > Could you please show the related code?
>
> See pg_lfind32().
>
> --
> nathan
>

-- 
-- Manni Wood EDB: https://www.enterprisedb.com


* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-11-26 11:50  KAZAR Ayoub <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 1 reply; 114+ messages in thread

From: KAZAR Ayoub @ 2025-11-26 11:50 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; Manni Wood <[email protected]>; pgsql-hackers

Hello,
On Wed, Nov 19, 2025 at 10:01 PM Nathan Bossart <[email protected]>
wrote:

> On Tue, Nov 18, 2025 at 05:20:05PM +0300, Nazir Bilal Yavuz wrote:
> > Thanks, done.
>
> I took a look at the v3 patches.  Here are my high-level thoughts:
>
> +    /*
> +     * Parse data and transfer into line_buf. To get benefit from
> inlining,
> +     * call CopyReadLineText() with the constant boolean variables.
> +     */
> +    if (cstate->simd_continue)
> +        result = CopyReadLineText(cstate, is_csv, true);
> +    else
> +        result = CopyReadLineText(cstate, is_csv, false);
>
> I'm curious whether this actually generates different code, and if it does,
> if it's actually faster.  We're already branching on cstate->simd_continue
> here.

I've compiled both versions with -O2 and confirmed they generate different
code. When simd_continue is passed as a constant to CopyReadLineText, the
compiler optimizes out the condition checks from the SIMD path.
A small benchmark on a 1GB+ file shows the expected benefit: around a
6% performance improvement.
I've attached the assembly outputs in case someone wants to check something
else.
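
For anyone who wants to reproduce the effect outside the tree, here is a
reduced sketch of the pattern, assuming the callee gets inlined at both
call sites. scan_line and scan_line_dispatch are hypothetical stand-ins for
CopyReadLineText and its caller, and the 8-byte skip loop merely stands in
for the SIMD path.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return the offset of the first '\n' in buf, or len if there is none.
 * When try_fast_path is a compile-time constant at the call site, an
 * inlining compiler can drop the untaken side of the 'if' entirely.
 */
static inline size_t
scan_line(const char *buf, size_t len, bool try_fast_path)
{
	size_t		i = 0;

	if (try_fast_path)
	{
		/* stand-in for the SIMD path: skip 8 clean bytes at a time */
		while (i + 8 <= len)
		{
			bool		hit = false;

			for (int k = 0; k < 8; k++)
				hit |= (buf[i + k] == '\n');
			if (hit)
				break;
			i += 8;
		}
	}

	/* scalar path finishes the job byte by byte */
	while (i < len && buf[i] != '\n')
		i++;
	return i;
}

/* Branch once here, so each call passes a literal true or false. */
size_t
scan_line_dispatch(const char *buf, size_t len, bool fast_path_enabled)
{
	if (fast_path_enabled)
		return scan_line(buf, len, true);
	else
		return scan_line(buf, len, false);
}
```

Building this with -O2 and comparing the output against a version that
passes fast_path_enabled straight through should show the same kind of
difference as the attached assembly, although the exact result depends on
the compiler.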


Regards,
Ayoub Kazar


Attachments:

  [application/octet-stream] copyfromparse-constant.asm (48.0K, 3-copyfromparse-constant.asm)

  [application/octet-stream] copyfromparse-variable.asm (47.1K, 4-copyfromparse-variable.asm)


* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-11-26 14:21  Manni Wood <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Manni Wood @ 2025-11-26 14:21 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Nazir Bilal Yavuz <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Nov 26, 2025 at 5:51 AM KAZAR Ayoub <[email protected]> wrote:

> Hello,
> On Wed, Nov 19, 2025 at 10:01 PM Nathan Bossart <[email protected]>
> wrote:
>
>> On Tue, Nov 18, 2025 at 05:20:05PM +0300, Nazir Bilal Yavuz wrote:
>> > Thanks, done.
>>
>> I took a look at the v3 patches.  Here are my high-level thoughts:
>>
>> +    /*
>> +     * Parse data and transfer into line_buf. To get benefit from
>> inlining,
>> +     * call CopyReadLineText() with the constant boolean variables.
>> +     */
>> +    if (cstate->simd_continue)
>> +        result = CopyReadLineText(cstate, is_csv, true);
>> +    else
>> +        result = CopyReadLineText(cstate, is_csv, false);
>>
>> I'm curious whether this actually generates different code, and if it
>> does,
>> if it's actually faster.  We're already branching on cstate->simd_continue
>> here.
>
> I've compiled both versions with -O2 and confirmed they generate different
> code. When simd_continue is passed as a constant to CopyReadLineText, the
> compiler optimizes out the condition checks from the SIMD path.
> A small benchmark on a 1GB+ file shows the expected benefit which is
> around 6% performance improvement.
> I've attached the assembly outputs in case someone wants to check
> something else.
>
>
> Regards,
> Ayoub Kazar
>

Correction to my last post:

I also tried files that alternated lines with no special characters and
lines with 1/3rd special characters, thinking I could force the algorithm
to continually check whether or not it should use simd and therefore force
more overhead in the try-simd/don't-try-simd housekeeping code. The text
file was still 20% faster (not 50% faster as I originally stated --- that
was a typo). The CSV file was still 13% faster.

Also, apologies for posting at the top in my last e-mail.
-- 
-- Manni Wood EDB: https://www.enterprisedb.com



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-12-06 01:39  Manni Wood <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Manni Wood @ 2025-12-06 01:39 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Nazir Bilal Yavuz <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Nov 26, 2025 at 8:21 AM Manni Wood <[email protected]>
wrote:

>
>
> On Wed, Nov 26, 2025 at 5:51 AM KAZAR Ayoub <[email protected]> wrote:
>
>> Hello,
>> On Wed, Nov 19, 2025 at 10:01 PM Nathan Bossart <[email protected]>
>> wrote:
>>
>>> On Tue, Nov 18, 2025 at 05:20:05PM +0300, Nazir Bilal Yavuz wrote:
>>> > Thanks, done.
>>>
>>> I took a look at the v3 patches.  Here are my high-level thoughts:
>>>
>>> +    /*
>>> +     * Parse data and transfer into line_buf. To get benefit from
>>> inlining,
>>> +     * call CopyReadLineText() with the constant boolean variables.
>>> +     */
>>> +    if (cstate->simd_continue)
>>> +        result = CopyReadLineText(cstate, is_csv, true);
>>> +    else
>>> +        result = CopyReadLineText(cstate, is_csv, false);
>>>
>>> I'm curious whether this actually generates different code, and if it
>>> does,
>>> if it's actually faster.  We're already branching on
>>> cstate->simd_continue
>>> here.
>>
>> I've compiled both versions with -O2 and confirmed they generate
>> different code. When simd_continue is passed as a constant to
>> CopyReadLineText, the compiler optimizes out the condition checks from the
>> SIMD path.
>> A small benchmark on a 1GB+ file shows the expected benefit which is
>> around 6% performance improvement.
>> I've attached the assembly outputs in case someone wants to check
>> something else.
>>
>>
>> Regards,
>> Ayoub Kazar
>>
>
> Correction to my last post:
>
> I also tried files that alternated lines with no special characters and
> lines with 1/3rd special characters, thinking I could force the algorithm
> to continually check whether or not it should use simd and therefore force
> more overhead in the try-simd/don't-try-simd housekeeping code. The text
> file was still 20% faster (not 50% faster as I originally stated --- that
> was a typo). The CSV file was still 13% faster.
>
> Also, apologies for posting at the top in my last e-mail.
> --
> -- Manni Wood EDB: https://www.enterprisedb.com
>

Hello, all.

Andrew, I tried your suggestion of just reading the first chunk of the copy
file to determine if SIMD is worth using. Attached are v4 versions of the
patches showing a first attempt at doing that.

I attached test.sh.txt to show how I've been testing, with 5 million lines
of the various copy file variations introduced by Ayoub Kazar.

The text copy with no special chars is 30% faster. The CSV copy with no
special chars is 48% faster. The text with 1/3rd escapes is 3% slower. The
CSV with 1/3rd quotes is 0.27% slower.

This set of patches follows the simplest suggestion of just testing the
first N lines (actually first N bytes) of the file and then deciding
whether or not to enable SIMD. This set of patches does not follow Andrew's
later suggestion of maybe checking again every million lines or so.
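
Pulled out of the parser, the decision in the v4-0002 patch reduces to
roughly the following. This is a standalone restatement for discussion, not
the patch itself: the two constants and the field names are taken from the
patch, while SimdProbe and simd_probe_update are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in copyfrom_internal.h by the v4-0002 patch. */
#define BYTES_PROCESSED_UNTIL_SIMD_CHECK 100000
#define SPECIAL_CHAR_SIMD_THRESHOLD 20000

typedef struct SimdProbe
{
	uint64_t	bytes_processed;
	uint64_t	special_chars_encountered;
	bool		checked_simd;
	bool		use_simd;
} SimdProbe;

/* One-shot check: runs its body only the first time the threshold is crossed. */
void
simd_probe_update(SimdProbe *p)
{
	if (!p->checked_simd &&
		p->bytes_processed > BYTES_PROCESSED_UNTIL_SIMD_CHECK)
	{
		p->checked_simd = true;

		/* few special characters so far => enable SIMD for the rest */
		if (p->special_chars_encountered < SPECIAL_CHAR_SIMD_THRESHOLD)
			p->use_simd = true;
	}
}
```

With these values the cutoff is about one counted special character per
five bytes. Input where roughly one byte in three is a counted special
character reaches about 33,000 by the time of the check and keeps SIMD off,
while the 4 KB filler lines with a single newline each reach only a few
dozen and turn it on.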
-- 
-- Manni Wood EDB: https://www.enterprisedb.com


Attachments:

  [text/plain] test.sh.txt (2.6K, 3-test.sh.txt)
  download | inline:
#!/bin/bash

set -e
set -o pipefail
set -u

PG="psql -X -U postgres -h localhost -d postgres"

echo " ======= Create table t"

${PG} <<EOF
DROP TABLE IF EXISTS t;
CREATE UNLOGGED TABLE t (id INT PRIMARY KEY, filler TEXT);
EOF


echo " ======= Text, no special characters; create /tmp/t_none.txt"

${PG} <<EOF
truncate t;
INSERT INTO t
SELECT s, repeat('A', 4096)
FROM generate_series(1, 5000000) AS s;
COPY t TO '/tmp/t_none.txt' (FORMAT text);
EOF

echo " ======= Text, no special characters; load times"

${PG} <<'EOF'
\timing
truncate table t;
\copy t from /tmp/t_none.txt
truncate table t;
\copy t from /tmp/t_none.txt
truncate table t;
\copy t from /tmp/t_none.txt
truncate table t;
\copy t from /tmp/t_none.txt
truncate table t;
\copy t from /tmp/t_none.txt
EOF

rm /tmp/t_none.txt


echo " ======= CSV, no special characters; create /tmp/t_none.csv"

${PG} <<EOF
truncate t;
INSERT INTO t
SELECT s, repeat('A', 4096)
FROM generate_series(1, 5000000) AS s;
COPY t TO '/tmp/t_none.csv' (FORMAT csv, QUOTE '"');
EOF

echo " ======= CSV, no special characters; load times"

${PG} <<'EOF'
\timing
truncate table t;
\copy t from /tmp/t_none.csv (format csv)
truncate table t;
\copy t from /tmp/t_none.csv (format csv)
truncate table t;
\copy t from /tmp/t_none.csv (format csv)
truncate table t;
\copy t from /tmp/t_none.csv (format csv)
truncate table t;
\copy t from /tmp/t_none.csv (format csv)
EOF

rm /tmp/t_none.csv


echo " ======= Text, with 1/3 escapes; create /tmp/t_escape.txt"

${PG} <<'EOF'
truncate t;
INSERT INTO t
SELECT s, repeat('A\A', 1365)
FROM generate_series(1, 5000000) AS s;
COPY t TO '/tmp/t_escape.txt' (FORMAT text);
EOF

echo " ======= Text, with 1/3 escapes; load times"

${PG} <<'EOF'
\timing
truncate table t;
\copy t from /tmp/t_escape.txt
truncate table t;
\copy t from /tmp/t_escape.txt
truncate table t;
\copy t from /tmp/t_escape.txt
truncate table t;
\copy t from /tmp/t_escape.txt
truncate table t;
\copy t from /tmp/t_escape.txt
EOF

rm /tmp/t_escape.txt


echo " ======= CSV, with 1/3 quotes; create /tmp/t_quote.csv"

${PG} <<EOF
truncate t;
INSERT INTO t
SELECT s, repeat('A"A', 1365)
FROM generate_series(1, 5000000) AS s;
COPY t TO '/tmp/t_quote.csv' (FORMAT csv, QUOTE '"');
EOF

echo " ======= CSV, with 1/3 quotes; load times"

${PG} <<'EOF'
\timing
truncate table t;
\copy t from /tmp/t_quote.csv (format csv)
truncate table t;
\copy t from /tmp/t_quote.csv (format csv)
truncate table t;
\copy t from /tmp/t_quote.csv (format csv)
truncate table t;
\copy t from /tmp/t_quote.csv (format csv)
truncate table t;
\copy t from /tmp/t_quote.csv (format csv)
EOF

rm /tmp/t_quote.csv

echo " ======= Drop table t"

${PG} <<EOF
DROP TABLE IF EXISTS t;
EOF


  [text/x-patch] v4-0002-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (5.0K, 4-v4-0002-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
  download | inline diff:
From 38b587dda44cb7160ee734cdea55a573f302c3a9 Mon Sep 17 00:00:00 2001
From: Manni Wood <[email protected]>
Date: Fri, 5 Dec 2025 18:33:46 -0600
Subject: [PATCH v4 2/2] Speed up COPY FROM text/CSV parsing using SIMD

Authors: Shinya Kato <[email protected]>,
Nazir Bilal Yavuz <[email protected]>,
Ayoub Kazar <[email protected]>
Reviewers: Andrew Dunstan <[email protected]>
Discussion:
https://www.postgresql.org/message-id/flat/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig@mail.gmail.com
---
 src/backend/commands/copyfrom.c          |  3 +++
 src/backend/commands/copyfromparse.c     | 29 +++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h | 11 +++++++++
 3 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 12781963b4f..e638623e5b5 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1720,6 +1720,9 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attname = NULL;
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
+	cstate->special_chars_encountered = 0;
+	cstate->checked_simd = false;
+	cstate->use_simd = false;

 	/*
 	 * Allocate buffers for the input pipeline.
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 1edb525f072..8cfdfcd4cd8 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1346,6 +1346,28 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)

 #ifndef USE_NO_SIMD

+		/*
+		 * Wait until we have read more than BYTES_PROCESSED_UNTIL_SIMD_CHECK.
+		 * cstate->bytes_processed will grow an unpredictable amount with each
+		 * call to this function, so just wait until we have crossed the
+		 * threshold.
+		 */
+		if (!cstate->checked_simd && cstate->bytes_processed > BYTES_PROCESSED_UNTIL_SIMD_CHECK)
+		{
+			cstate->checked_simd = true;
+
+			/*
+			 * If we have not read too many special characters
+			 * (SPECIAL_CHAR_SIMD_THRESHOLD) then start using SIMD to speed up
+			 * processing. This heuristic assumes that input does not vary too
+			 * much from line to line and that number of special characters
+			 * encountered in the first BYTES_PROCESSED_UNTIL_SIMD_CHECK are
+			 * indicitive of the whole file.
+			 * indicative of the whole file.
+			if (cstate->special_chars_encountered < SPECIAL_CHAR_SIMD_THRESHOLD)
+				cstate->use_simd = true;
+		}
+
 		/*
 		 * Use SIMD instructions to efficiently scan the input buffer for
 		 * special characters (e.g., newline, carriage return, quote, and
@@ -1358,7 +1380,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		 * sequentially. - The remaining buffer is smaller than one vector
 		 * width (sizeof(Vector8)); SIMD operates on fixed-size chunks.
 		 */
-		if (!last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		if (cstate->use_simd && !last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
 		{
 			Vector8		chunk;
 			Vector8		match = vector8_broadcast(0);
@@ -1415,6 +1437,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			 */
 			if (c == '\r')
 			{
+				cstate->special_chars_encountered++;
 				IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
 			}

@@ -1446,6 +1469,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* Process \r */
 		if (c == '\r' && (!is_csv || !in_quote))
 		{
+			cstate->special_chars_encountered++;
 			/* Check for \r\n on first line, _and_ handle \r\n. */
 			if (cstate->eol_type == EOL_UNKNOWN ||
 				cstate->eol_type == EOL_CRNL)
@@ -1502,6 +1526,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* Process \n */
 		if (c == '\n' && (!is_csv || !in_quote))
 		{
+			cstate->special_chars_encountered++;
 			if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
@@ -1524,6 +1549,8 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		{
 			char		c2;

+			cstate->special_chars_encountered++;
+
 			IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
 			IF_NEED_REFILL_AND_EOF_BREAK(0);

diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..215215f909f 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -181,6 +181,17 @@ typedef struct CopyFromStateData
 #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)

 	uint64		bytes_processed;	/* number of bytes processed so far */
+
+	/* the amount of bytes to read until checking if we should try simd */
+#define BYTES_PROCESSED_UNTIL_SIMD_CHECK 100000
+	/* the number of special chars read below which we use simd */
+#define SPECIAL_CHAR_SIMD_THRESHOLD 20000
+	uint64		special_chars_encountered;	/* number of special chars
+											 * encountered so far */
+	bool		checked_simd;	/* we read BYTES_PROCESSED_UNTIL_SIMD_CHECK
+								 * and checked if we should use SIMD on the
+								 * rest of the file */
+	bool		use_simd;		/* use simd to speed up copying */
 } CopyFromStateData;

 extern void ReceiveCopyBegin(CopyFromState cstate);
--
2.52.0



  [text/x-patch] v4-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (3.7K, 5-v4-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
  download | inline diff:
From 0b1f786bf58c3d90e078d4afa83b7d43dda08491 Mon Sep 17 00:00:00 2001
From: Manni Wood <[email protected]>
Date: Fri, 5 Dec 2025 18:30:00 -0600
Subject: [PATCH v4 1/2] Speed up COPY FROM text/CSV parsing using SIMD

Authors: Shinya Kato <[email protected]>,
Nazir Bilal Yavuz <[email protected]>,
Ayoub Kazar <[email protected]>
Reviewers: Andrew Dunstan <[email protected]>
Discussion:
https://www.postgresql.org/message-id/flat/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig@mail.gmail.com
---
 src/backend/commands/copyfromparse.c | 73 ++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index a09e7fbace3..1edb525f072 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -71,7 +71,9 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "port/pg_bitutils.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 
@@ -1255,6 +1257,14 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	char		quotec = '\0';
 	char		escapec = '\0';
 
+#ifndef USE_NO_SIMD
+	Vector8		nl = vector8_broadcast('\n');
+	Vector8		cr = vector8_broadcast('\r');
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		quote = vector8_broadcast(0);
+	Vector8		escape = vector8_broadcast(0);
+#endif
+
 	if (is_csv)
 	{
 		quotec = cstate->opts.quote[0];
@@ -1262,6 +1272,12 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* ignore special escape processing if it's the same as quotec */
 		if (quotec == escapec)
 			escapec = '\0';
+
+#ifndef USE_NO_SIMD
+		quote = vector8_broadcast(quotec);
+		if (quotec != escapec)
+			escape = vector8_broadcast(escapec);
+#endif
 	}
 
 	/*
@@ -1328,6 +1344,63 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			need_data = false;
 		}
 
+#ifndef USE_NO_SIMD
+
+		/*
+		 * Use SIMD instructions to efficiently scan the input buffer for
+		 * special characters (e.g., newline, carriage return, quote, and
+		 * escape). This is faster than byte-by-byte iteration, especially on
+		 * large buffers.
+		 *
+		 * We do not apply the SIMD fast path in either of the following
+		 * cases: - When the previously processed character was an escape
+		 * character (last_was_esc), since the next byte must be examined
+		 * sequentially. - The remaining buffer is smaller than one vector
+		 * width (sizeof(Vector8)); SIMD operates on fixed-size chunks.
+		 */
+		if (!last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match = vector8_broadcast(0);
+			uint32		mask;
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+			/* \n and \r are not special inside quotes */
+			if (!in_quote)
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+
+			if (is_csv)
+			{
+				match = vector8_or(match, vector8_eq(chunk, quote));
+				if (escapec != '\0')
+					match = vector8_or(match, vector8_eq(chunk, escape));
+			}
+			else
+				match = vector8_or(match, vector8_eq(chunk, bs));
+
+			/* Check if we found any special characters */
+			mask = vector8_highbit_mask(match);
+			if (mask != 0)
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				int			advance = pg_rightmost_one_pos32(mask);
+
+				input_buf_ptr += advance;
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				input_buf_ptr += sizeof(Vector8);
+				continue;
+			}
+		}
+#endif
+
 		/* OK to fetch a character */
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
-- 
2.52.0



^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-12-06 07:55  Bilal Yavuz <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Bilal Yavuz @ 2025-12-06 07:55 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Nathan Bossart <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Sat, 6 Dec 2025 at 04:40, Manni Wood <[email protected]> wrote:
> Hello, all.
>
> Andrew, I tried your suggestion of just reading the first chunk of the copy file to determine if SIMD is worth using. Attached are v4 versions of the patches showing a first attempt at doing that.

Thank you for doing this!

> I attached test.sh.txt to show how I've been testing, with 5 million lines of the various copy file variations introduced by Ayub Kazar.
>
> The text copy with no special chars is 30% faster. The CSV copy with no special chars is 48% faster. The text with 1/3rd escapes is 3% slower. The CSV with 1/3rd quotes is 0.27% slower.
>
> This set of patches follows the simplest suggestion of just testing the first N lines (actually first N bytes) of the file and then deciding whether or not to enable SIMD. This set of patches does not follow Andrew's later suggestion of maybe checking again every million lines or so.

My input-generation script is not ready to share yet, but the inputs
follow this format: text_${n}.input, where n represents the number of
normal characters before the delimiter. For example:

n = 0 -> "\n\n\n\n\n..." (no normal characters)
n = 1 -> "a\n..." (1 normal character before the delimiter)
...
n = 5 -> "aaaaa\n..."
… continuing up to n = 32.

Each line has 4096 chars and there are a total of 100000 lines in each
input file.

I only benchmarked the text format. I compared the latest heuristic I
shared [1] with the current method. The benchmarks show a ~16%
regression in the worst case (n = 2), with regressions up to n = 5.
For the remaining values, performance was similar.

Actual comparison of timings (in ms):

current method / heuristic
n = 0 -> 3252.7253 / 2856.2753 (12%)
n = 1 -> 2910.321 / 2520.7717 (13%)
n = 2 -> 2865.008 / 2403.2017 (16%)
n = 3 -> 2608.649 / 2353.1477 (9%)
n = 4 -> 2460.74 / 2300.1783 (6%)
n = 5 -> 2451.696 / 2362.1573 (3%)
No difference for the rest.
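
The percentages in the timing table are consistent with
(current - heuristic) / current, truncated to a whole number. That
formula is an inference from the numbers, not something stated in this
mail. A minimal sketch under that assumption (regression_percent() and
the two arrays are hypothetical names used only for illustration):

```c
/*
 * Percentage by which the heuristic is faster than the current method,
 * relative to the current method, truncated toward zero.
 * Hypothetical helper for illustration; not part of any patch.
 */
static int
regression_percent(double current_ms, double heuristic_ms)
{
	return (int) (100.0 * (current_ms - heuristic_ms) / current_ms);
}

/* Timings in ms for n = 0..5, copied from the table in this mail. */
static const double current_ms[] = {
	3252.7253, 2910.321, 2865.008, 2608.649, 2460.74, 2451.696
};
static const double heuristic_ms[] = {
	2856.2753, 2520.7717, 2403.2017, 2353.1477, 2300.1783, 2362.1573
};
```

Calling regression_percent(current_ms[n], heuristic_ms[n]) for each n
reproduces the figures in parentheses.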

Side note: Sorry for the delay in responding, I will continue working
on this next week.

[1] https://postgr.es/m/CAN55FZ1KF7XNpm2XyG%3DM-sFUODai%3D6Z8a11xE3s4YRBeBKY3tA%40mail.gmail.com

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-12-09 13:40  Bilal Yavuz <[email protected]>
  parent: Bilal Yavuz <[email protected]>
  0 siblings, 2 replies; 114+ messages in thread

From: Bilal Yavuz @ 2025-12-09 13:40 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Nathan Bossart <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Sat, 6 Dec 2025 at 10:55, Bilal Yavuz <[email protected]> wrote:
>
> Hi,
>
> On Sat, 6 Dec 2025 at 04:40, Manni Wood <[email protected]> wrote:
> > Hello, all.
> >
> > Andrew, I tried your suggestion of just reading the first chunk of the copy file to determine if SIMD is worth using. Attached are v4 versions of the patches showing a first attempt at doing that.
>
> Thank you for doing this!
>
> > I attached test.sh.txt to show how I've been testing, with 5 million lines of the various copy file variations introduced by Ayub Kazar.
> >
> > The text copy with no special chars is 30% faster. The CSV copy with no special chars is 48% faster. The text with 1/3rd escapes is 3% slower. The CSV with 1/3rd quotes is 0.27% slower.
> >
> > This set of patches follows the simplest suggestion of just testing the first N lines (actually first N bytes) of the file and then deciding whether or not to enable SIMD. This set of patches does not follow Andrew's later suggestion of maybe checking again every million lines or so.
>
> My input-generation script is not ready to share yet, but the inputs
> follow this format: text_${n}.input, where n represents the number of
> normal characters before the delimiter. For example:
>
> n = 0 -> "\n\n\n\n\n..." (no normal characters)
> n = 1 -> "a\n..." (1 normal character before the delimiter)
> ...
> n = 5 -> "aaaaa\n..."
> … continuing up to n = 32.
>
> Each line has 4096 chars and there are a total of 100000 lines in each
> input file.
>
> I only benchmarked the text format. I compared the latest heuristic I
> shared [1] with the current method. The benchmarks show roughly a ~16%
> regression at the worst case (n = 2), with regressions up to n = 5.
> For the remaining values, performance was similar.

I tried to improve the v4 patchset. My changes are:

1 - I marked CopyReadLineText() as always-inline and passed the
use_simd variable as an argument to get help from inlining.

2 - The main for loop in the CopyReadLineText() function runs many
times, so I moved the use_simd check to the CopyReadLine() function.

3 - Instead of 'bytes_processed', I used 'chars_processed' because
cstate->bytes_processed is increased before we process the bytes, and
this can cause wrong results.

4 - Because of #2 and #3, instead of having
'SPECIAL_CHAR_SIMD_THRESHOLD', I used the ratio of 'chars_processed /
special_chars_encountered' to determine whether we want to use SIMD.

5 - cstate->special_chars_encountered is incremented wrongly for the
CSV case: it is not incremented for the quote and escape characters. I
moved all increments of cstate->special_chars_encountered to one
central place and tried to optimize it, but it still causes a
regression because it adds one more branch.
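
The idea behind #1 and #2 can be sketched outside PostgreSQL as
follows. This is only an illustration under assumptions: scan_line()
and scan_line_dispatch() are hypothetical names, the "chunk" path is a
scalar stand-in for the SIMD code, and __attribute__((always_inline))
is the GCC/Clang spelling of what the patch writes as
pg_attribute_always_inline. Because each call site passes a constant,
the compiler can emit two specialized copies of the loop and drop the
per-iteration flag test.

```c
#include <stdbool.h>
#include <stddef.h>

#define CHUNK 8					/* stand-in for sizeof(Vector8) */

/* Return the offset of the first '\n' in buf, or len if there is none. */
static inline __attribute__((always_inline)) size_t
scan_line(const char *buf, size_t len, bool use_chunks)
{
	size_t		i = 0;

	while (i < len)
	{
		if (use_chunks && len - i >= CHUNK)
		{
			size_t		j;

			/* Scalar stand-in for the vectorized chunk comparison. */
			for (j = 0; j < CHUNK; j++)
				if (buf[i + j] == '\n')
					break;

			i += j;
			if (j == CHUNK)
				continue;		/* nothing special, skip the whole chunk */
		}

		if (buf[i] == '\n')
			return i;
		i++;
	}
	return len;
}

/* Branch once per line; each call passes a compile-time constant. */
static size_t
scan_line_dispatch(const char *buf, size_t len, bool use_chunks)
{
	if (use_chunks)
		return scan_line(buf, len, true);
	else
		return scan_line(buf, len, false);
}
```

Both specializations must return the same offset; only the amount of
work per iteration differs.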

With these changes, I am able to decrease the regression from 16% to
10%. The regression decreases to 7% if I restrict #5 to text input
only, but I did not do that.

My changes are in the 0003 patch.
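
For readers who do not want to dig through the diff, the ratio check
from #4 can be isolated as below. The two constants and the field names
follow the 0003 patch; the SimdHeuristic struct and maybe_enable_simd()
are hypothetical stand-ins for the relevant part of CopyFromStateData
and CopyReadLine(), so treat this as a sketch rather than the patch
itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define CHARS_PROCESSED_UNTIL_SIMD_CHECK 100000
#define SPECIAL_CHAR_SIMD_RATIO 4

/* Hypothetical stand-in for the counters kept in CopyFromStateData. */
typedef struct SimdHeuristic
{
	uint64_t	chars_processed;
	uint64_t	special_chars_encountered;
	bool		checked_simd;
	bool		use_simd;
} SimdHeuristic;

/*
 * Decide once, after enough input has been seen, whether to enable SIMD:
 * enable it when at most one character in SPECIAL_CHAR_SIMD_RATIO was
 * special.
 */
static void
maybe_enable_simd(SimdHeuristic *h)
{
	if (!h->checked_simd &&
		h->chars_processed > CHARS_PROCESSED_UNTIL_SIMD_CHECK)
	{
		h->checked_simd = true;

		if (h->chars_processed / SPECIAL_CHAR_SIMD_RATIO >=
			h->special_chars_encountered)
			h->use_simd = true;
	}
}
```

With 100001 characters processed, 25000 special characters still enable
SIMD while 25001 do not, because the division is integer division.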

-- 
Regards,
Nazir Bilal Yavuz
Microsoft


Attachments:

  [text/x-patch] v4.1-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (3.5K, 2-v4.1-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
  download | inline diff:
From a1b4d28069786c3fb506c79e096312fcfd585fdb Mon Sep 17 00:00:00 2001
From: Shinya Kato <[email protected]>
Date: Mon, 28 Jul 2025 22:08:20 +0900
Subject: [PATCH v4.1 1/3] Speed up COPY FROM text/CSV parsing using SIMD

---
 src/backend/commands/copyfromparse.c | 76 ++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 62afcd8fad1..cf110767542 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -71,7 +71,9 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "port/pg_bitutils.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 
@@ -1255,6 +1257,14 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	char		quotec = '\0';
 	char		escapec = '\0';
 
+#ifndef USE_NO_SIMD
+	Vector8		nl = vector8_broadcast('\n');
+	Vector8		cr = vector8_broadcast('\r');
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		quote = vector8_broadcast(0);
+	Vector8		escape = vector8_broadcast(0);
+#endif
+
 	if (is_csv)
 	{
 		quotec = cstate->opts.quote[0];
@@ -1262,6 +1272,12 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* ignore special escape processing if it's the same as quotec */
 		if (quotec == escapec)
 			escapec = '\0';
+
+#ifndef USE_NO_SIMD
+		quote = vector8_broadcast(quotec);
+		if (quotec != escapec)
+			escape = vector8_broadcast(escapec);
+#endif
 	}
 
 	/*
@@ -1328,6 +1344,66 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			need_data = false;
 		}
 
+#ifndef USE_NO_SIMD
+
+		/*
+		 * Use SIMD instructions to efficiently scan the input buffer for
+		 * special characters (e.g., newline, carriage return, quote, and
+		 * escape). This is faster than byte-by-byte iteration, especially on
+		 * large buffers.
+		 *
+		 * We do not apply the SIMD fast path in either of the following
+		 * cases: - When the previously processed character was an escape
+		 * character (last_was_esc), since the next byte must be examined
+		 * sequentially. - The remaining buffer is smaller than one vector
+		 * width (sizeof(Vector8)); SIMD operates on fixed-size chunks.
+		 */
+		if (!last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match = vector8_broadcast(0);
+			uint32		mask;
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+			if (is_csv)
+			{
+				/* \n and \r are not special inside quotes */
+				if (!in_quote)
+					match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+
+				match = vector8_or(match, vector8_eq(chunk, quote));
+				if (escapec != '\0')
+					match = vector8_or(match, vector8_eq(chunk, escape));
+			}
+			else
+			{
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				match = vector8_or(match, vector8_eq(chunk, bs));
+			}
+
+			/* Check if we found any special characters */
+			mask = vector8_highbit_mask(match);
+			if (mask != 0)
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				int			advance = pg_rightmost_one_pos32(mask);
+
+				input_buf_ptr += advance;
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				input_buf_ptr += sizeof(Vector8);
+				continue;
+			}
+		}
+#endif
+
 		/* OK to fetch a character */
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
-- 
2.51.0



  [text/x-patch] v4.1-0002-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (5.0K, 3-v4.1-0002-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
  download | inline diff:
From 3a2f9ff26755a5248b7a33770f4603fec483d3bc Mon Sep 17 00:00:00 2001
From: Manni Wood <[email protected]>
Date: Fri, 5 Dec 2025 18:33:46 -0600
Subject: [PATCH v4.1 2/3] Speed up COPY FROM text/CSV parsing using SIMD

Authors: Shinya Kato <[email protected]>,
Nazir Bilal Yavuz <[email protected]>,
Ayoub Kazar <[email protected]>
Reviewers: Andrew Dunstan <[email protected]>
Discussion:
https://www.postgresql.org/message-id/flat/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig@mail.gmail.com
---
 src/include/commands/copyfrom_internal.h | 11 +++++++++
 src/backend/commands/copyfrom.c          |  3 +++
 src/backend/commands/copyfromparse.c     | 29 +++++++++++++++++++++++-
 3 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..215215f909f 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -181,6 +181,17 @@ typedef struct CopyFromStateData
 #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
 
 	uint64		bytes_processed;	/* number of bytes processed so far */
+
+	/* the amount of bytes to read until checking if we should try simd */
+#define BYTES_PROCESSED_UNTIL_SIMD_CHECK 100000
+	/* the number of special chars read below which we use simd */
+#define SPECIAL_CHAR_SIMD_THRESHOLD 20000
+	uint64		special_chars_encountered;	/* number of special chars
+											 * encountered so far */
+	bool		checked_simd;	/* we read BYTES_PROCESSED_UNTIL_SIMD_CHECK
+								 * and checked if we should use SIMD on the
+								 * rest of the file */
+	bool		use_simd;		/* use simd to speed up copying */
 } CopyFromStateData;
 
 extern void ReceiveCopyBegin(CopyFromState cstate);
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 12781963b4f..e638623e5b5 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1720,6 +1720,9 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attname = NULL;
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
+	cstate->special_chars_encountered = 0;
+	cstate->checked_simd = false;
+	cstate->use_simd = false;
 
 	/*
 	 * Allocate buffers for the input pipeline.
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index cf110767542..549b56c21fb 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1346,6 +1346,28 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 
 #ifndef USE_NO_SIMD
 
+		/*
+		 * Wait until we have read more than BYTES_PROCESSED_UNTIL_SIMD_CHECK.
+		 * cstate->bytes_processed will grow an unpredictable amount with each
+		 * call to this function, so just wait until we have crossed the
+		 * threshold.
+		 */
+		if (!cstate->checked_simd && cstate->bytes_processed > BYTES_PROCESSED_UNTIL_SIMD_CHECK)
+		{
+			cstate->checked_simd = true;
+
+			/*
+			 * If we have not read too many special characters
+			 * (SPECIAL_CHAR_SIMD_THRESHOLD) then start using SIMD to speed up
+			 * processing. This heuristic assumes that input does not vary too
+			 * much from line to line and that number of special characters
+			 * encountered in the first BYTES_PROCESSED_UNTIL_SIMD_CHECK are
+			 * indicitive of the whole file.
+			 */
+			if (cstate->special_chars_encountered < SPECIAL_CHAR_SIMD_THRESHOLD)
+				cstate->use_simd = true;
+		}
+
 		/*
 		 * Use SIMD instructions to efficiently scan the input buffer for
 		 * special characters (e.g., newline, carriage return, quote, and
@@ -1358,7 +1380,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		 * sequentially. - The remaining buffer is smaller than one vector
 		 * width (sizeof(Vector8)); SIMD operates on fixed-size chunks.
 		 */
-		if (!last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		if (cstate->use_simd && !last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
 		{
 			Vector8		chunk;
 			Vector8		match = vector8_broadcast(0);
@@ -1418,6 +1440,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			 */
 			if (c == '\r')
 			{
+				cstate->special_chars_encountered++;
 				IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
 			}
 
@@ -1449,6 +1472,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* Process \r */
 		if (c == '\r' && (!is_csv || !in_quote))
 		{
+			cstate->special_chars_encountered++;
 			/* Check for \r\n on first line, _and_ handle \r\n. */
 			if (cstate->eol_type == EOL_UNKNOWN ||
 				cstate->eol_type == EOL_CRNL)
@@ -1505,6 +1529,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* Process \n */
 		if (c == '\n' && (!is_csv || !in_quote))
 		{
+			cstate->special_chars_encountered++;
 			if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
@@ -1527,6 +1552,8 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		{
 			char		c2;
 
+			cstate->special_chars_encountered++;
+
 			IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
 			IF_NEED_REFILL_AND_EOF_BREAK(0);
 
-- 
2.51.0



  [text/x-patch] v4.1-0003-Feedback-Changes.patch (7.9K, 4-v4.1-0003-Feedback-Changes.patch)
  download | inline diff:
From 8d0e6766175abac15b39884126c29da03657be40 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Tue, 9 Dec 2025 15:32:10 +0300
Subject: [PATCH v4.1 3/3] Feedback / Changes

---
 src/include/commands/copyfrom_internal.h |  9 +--
 src/backend/commands/copyfrom.c          |  1 +
 src/backend/commands/copyfromparse.c     | 88 +++++++++++++++---------
 3 files changed, 60 insertions(+), 38 deletions(-)

diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 215215f909f..397720bf875 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -183,12 +183,13 @@ typedef struct CopyFromStateData
 	uint64		bytes_processed;	/* number of bytes processed so far */
 
 	/* the amount of bytes to read until checking if we should try simd */
-#define BYTES_PROCESSED_UNTIL_SIMD_CHECK 100000
-	/* the number of special chars read below which we use simd */
-#define SPECIAL_CHAR_SIMD_THRESHOLD 20000
+#define CHARS_PROCESSED_UNTIL_SIMD_CHECK 100000
+	/* the ratio of special chars read below which we use simd */
+#define SPECIAL_CHAR_SIMD_RATIO 4
+	uint64		chars_processed;
 	uint64		special_chars_encountered;	/* number of special chars
 											 * encountered so far */
-	bool		checked_simd;	/* we read BYTES_PROCESSED_UNTIL_SIMD_CHECK
+	bool		checked_simd;	/* we read CHARS_PROCESSED_UNTIL_SIMD_CHECK
 								 * and checked if we should use SIMD on the
 								 * rest of the file */
 	bool		use_simd;		/* use simd to speed up copying */
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index e638623e5b5..d44dd16eced 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1720,6 +1720,7 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attname = NULL;
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
+	cstate->chars_processed = 0;
 	cstate->special_chars_encountered = 0;
 	cstate->checked_simd = false;
 	cstate->use_simd = false;
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 549b56c21fb..86a268d0df9 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -143,7 +143,7 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
 static bool CopyReadLine(CopyFromState cstate, bool is_csv);
-static bool CopyReadLineText(CopyFromState cstate, bool is_csv);
+static pg_attribute_always_inline bool CopyReadLineText(CopyFromState cstate, bool is_csv, bool use_simd);
 static int	CopyReadAttributesText(CopyFromState cstate);
 static int	CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -1173,8 +1173,40 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 	resetStringInfo(&cstate->line_buf);
 	cstate->line_buf_valid = false;
 
-	/* Parse data and transfer into line_buf */
-	result = CopyReadLineText(cstate, is_csv);
+#ifndef USE_NO_SIMD
+
+	/*
+	 * Wait until we have read more than CHARS_PROCESSED_UNTIL_SIMD_CHECK.
+	 * cstate->chars_processed will grow an unpredictable amount with each
+	 * call to this function, so just wait until we have crossed the
+	 * threshold.
+	 */
+	if (!cstate->checked_simd && cstate->chars_processed > CHARS_PROCESSED_UNTIL_SIMD_CHECK)
+	{
+		cstate->checked_simd = true;
+
+		/*
+		 * If we have not read too many special characters then start using
+		 * SIMD to speed up processing. This heuristic assumes that input does
+		 * not vary too much from line to line and that number of special
+		 * characters encountered in the first
+		 * CHARS_PROCESSED_UNTIL_SIMD_CHECK are indicative of the whole file.
+		 */
+		if (cstate->chars_processed / SPECIAL_CHAR_SIMD_RATIO >= cstate->special_chars_encountered)
+		{
+			cstate->use_simd = true;
+		}
+	}
+#endif
+
+	/*
+	 * Parse data and transfer into line_buf. To get benefit from inlining,
+	 * call CopyReadLineText() with the constant boolean variables.
+	 */
+	if (cstate->use_simd)
+		result = CopyReadLineText(cstate, is_csv, true);
+	else
+		result = CopyReadLineText(cstate, is_csv, false);
 
 	if (result)
 	{
@@ -1241,8 +1273,8 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
-static bool
-CopyReadLineText(CopyFromState cstate, bool is_csv)
+static pg_attribute_always_inline bool
+CopyReadLineText(CopyFromState cstate, bool is_csv, bool use_simd)
 {
 	char	   *copy_input_buf;
 	int			input_buf_ptr;
@@ -1309,7 +1341,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	input_buf_ptr = cstate->input_buf_index;
 	copy_buf_len = cstate->input_buf_len;
 
-	for (;;)
+	for (;; cstate->chars_processed++)
 	{
 		int			prev_raw_ptr;
 		char		c;
@@ -1346,28 +1378,6 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 
 #ifndef USE_NO_SIMD
 
-		/*
-		 * Wait until we have read more than BYTES_PROCESSED_UNTIL_SIMD_CHECK.
-		 * cstate->bytes_processed will grow an unpredictable amount with each
-		 * call to this function, so just wait until we have crossed the
-		 * threshold.
-		 */
-		if (!cstate->checked_simd && cstate->bytes_processed > BYTES_PROCESSED_UNTIL_SIMD_CHECK)
-		{
-			cstate->checked_simd = true;
-
-			/*
-			 * If we have not read too many special characters
-			 * (SPECIAL_CHAR_SIMD_THRESHOLD) then start using SIMD to speed up
-			 * processing. This heuristic assumes that input does not vary too
-			 * much from line to line and that number of special characters
-			 * encountered in the first BYTES_PROCESSED_UNTIL_SIMD_CHECK are
-			 * indicitive of the whole file.
-			 */
-			if (cstate->special_chars_encountered < SPECIAL_CHAR_SIMD_THRESHOLD)
-				cstate->use_simd = true;
-		}
-
 		/*
 		 * Use SIMD instructions to efficiently scan the input buffer for
 		 * special characters (e.g., newline, carriage return, quote, and
@@ -1380,7 +1390,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		 * sequentially. - The remaining buffer is smaller than one vector
 		 * width (sizeof(Vector8)); SIMD operates on fixed-size chunks.
 		 */
-		if (cstate->use_simd && !last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		if (use_simd && !last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
 		{
 			Vector8		chunk;
 			Vector8		match = vector8_broadcast(0);
@@ -1430,6 +1440,21 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
 
+		/* Use this calculation to decide whether to use SIMD later */
+		if (!use_simd && unlikely(!cstate->checked_simd))
+		{
+			if (is_csv)
+			{
+				if (c == '\r' || c == '\n' || c == quotec || c == escapec)
+					cstate->special_chars_encountered++;
+			}
+			else
+			{
+				if (c == '\r' || c == '\n' || c == '\\')
+					cstate->special_chars_encountered++;
+			}
+		}
+
 		if (is_csv)
 		{
 			/*
@@ -1440,7 +1465,6 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			 */
 			if (c == '\r')
 			{
-				cstate->special_chars_encountered++;
 				IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
 			}
 
@@ -1472,7 +1496,6 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* Process \r */
 		if (c == '\r' && (!is_csv || !in_quote))
 		{
-			cstate->special_chars_encountered++;
 			/* Check for \r\n on first line, _and_ handle \r\n. */
 			if (cstate->eol_type == EOL_UNKNOWN ||
 				cstate->eol_type == EOL_CRNL)
@@ -1529,7 +1552,6 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* Process \n */
 		if (c == '\n' && (!is_csv || !in_quote))
 		{
-			cstate->special_chars_encountered++;
 			if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
@@ -1552,8 +1574,6 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		{
 			char		c2;
 
-			cstate->special_chars_encountered++;
-
 			IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
 			IF_NEED_REFILL_AND_EOF_BREAK(0);
 
-- 
2.51.0



^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-12-09 22:13  Manni Wood <[email protected]>
  parent: Bilal Yavuz <[email protected]>
  1 sibling, 1 reply; 114+ messages in thread

From: Manni Wood @ 2025-12-09 22:13 UTC (permalink / raw)
  To: Bilal Yavuz <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Nathan Bossart <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Tue, Dec 9, 2025 at 7:40 AM Bilal Yavuz <[email protected]> wrote:

> Hi,
>
> On Sat, 6 Dec 2025 at 10:55, Bilal Yavuz <[email protected]> wrote:
> >
> > Hi,
> >
> > On Sat, 6 Dec 2025 at 04:40, Manni Wood <[email protected]>
> wrote:
> > > Hello, all.
> > >
> > > Andrew, I tried your suggestion of just reading the first chunk of the
> copy file to determine if SIMD is worth using. Attached are v4 versions of
> the patches showing a first attempt at doing that.
> >
> > Thank you for doing this!
> >
> > > I attached test.sh.txt to show how I've been testing, with 5 million
> lines of the various copy file variations introduced by Ayub Kazar.
> > >
> > > The text copy with no special chars is 30% faster. The CSV copy with
> no special chars is 48% faster. The text with 1/3rd escapes is 3% slower.
> The CSV with 1/3rd quotes is 0.27% slower.
> > >
> > > This set of patches follows the simplest suggestion of just testing
> the first N lines (actually first N bytes) of the file and then deciding
> whether or not to enable SIMD. This set of patches does not follow Andrew's
> later suggestion of maybe checking again every million lines or so.
> >
> > My input-generation script is not ready to share yet, but the inputs
> > follow this format: text_${n}.input, where n represents the number of
> > normal characters before the delimiter. For example:
> >
> > n = 0 -> "\n\n\n\n\n..." (no normal characters)
> > n = 1 -> "a\n..." (1 normal character before the delimiter)
> > ...
> > n = 5 -> "aaaaa\n..."
> > … continuing up to n = 32.
> >
> > Each line has 4096 chars and there are a total of 100000 lines in each
> > input file.
> >
> > I only benchmarked the text format. I compared the latest heuristic I
> > shared [1] with the current method. The benchmarks show roughly a ~16%
> > regression at the worst case (n = 2), with regressions up to n = 5.
> > For the remaining values, performance was similar.
>
> I tried to improve the v4 patchset. My changes are:
>
> 1 - I changed CopyReadLineText() to an inline function and sent the
> use_simd variable as an argument to get help from inlining.
>
> 2 - A main for loop in the CopyReadLineText() function is called many
> times, so I moved the use_simd check to the CopyReadLine() function.
>
> 3 - Instead of 'bytes_processed', I used 'chars_processed' because
> cstate->bytes_processed is increased before we process them and this
> can cause wrong results.
>
> 4 - Because of #2 and #3, instead of having
> 'SPECIAL_CHAR_SIMD_THRESHOLD', I used the ratio of 'chars_processed /
> special_chars_encountered' to determine whether we want to use SIMD.
>
> 5 - cstate->special_chars_encountered is incremented wrongly for the
> CSV case. It is not incremented for the quote and escape delimiters. I
> moved all increments of cstate->special_chars_encountered to the
> central place and tried to optimize it but it still causes a
> regression as it creates one more branching.
>
> With these changes, I am able to decrease the regression to %10 from
> %16. Regression decreases to %7 if I modify #5 for the only text input
> but I did not do that.
>
> My changes are in the 0003.
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft
>

Bilal Yavuz (Nazir Bilal Yavuz?), I did not get a chance to do any work on
this today, but wanted to thank you for finding my logic errors in counting
special chars for CSV, and hacking on my naive solution to make it faster.
By attempting Andrew Dunstan's suggestion, I got a better feel for the
reality that the "housekeeping" code produces a significant amount of
overhead.
-- 
-- Manni Wood EDB: https://www.enterprisedb.com


^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-12-10 11:59  Nazir Bilal Yavuz <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 0 replies; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2025-12-10 11:59 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Nathan Bossart <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Wed, 10 Dec 2025 at 01:13, Manni Wood <[email protected]> wrote:
>
> Bilal Yavuz (Nazir Bilal Yavuz?),

It is Nazir Bilal Yavuz, I changed some settings on my phone and it
seems that it affected my mail account, hopefully it should be fixed
now.

> I did not get a chance to do any work on this today, but wanted to thank you for finding my logic errors in counting special chars for CSV, and hacking on my naive solution to make it faster. By attempting Andrew Dunstan's suggestion, I got a better feel for the reality that the "housekeeping" code produces a significant amount of overhead.

You are welcome! v4.1 has some problems with the in_quote case in the
SIMD handling code and with counting the cstate->chars_processed
variable. I fixed them in v4.2.
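
To make the in_quote case concrete, here is a scalar sketch of what the vectorized scan in 0001 computes for CSV input: a bitmask of special bytes in one chunk, then the offset of the first one. CHUNK_SIZE of 16 is an assumed vector width, and special_mask() and bytes_to_skip() are hypothetical names; the patch itself uses the Vector8 helpers from port/simd.h and pg_rightmost_one_pos32() instead of these loops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 16			/* assumed width; sizeof(Vector8) in the patch */

/*
 * Build a bitmask with bit i set when buf[i] is special in CSV mode.
 * Newline and carriage return count only outside quotes, mirroring the
 * in_quote check in the patch.
 */
static uint32_t
special_mask(const char *buf, bool in_quote, char quotec, char escapec)
{
	uint32_t	mask = 0;

	for (int i = 0; i < CHUNK_SIZE; i++)
	{
		char		c = buf[i];
		bool		special = (c == quotec) ||
			(escapec != '\0' && c == escapec);

		if (!in_quote && (c == '\n' || c == '\r'))
			special = true;

		if (special)
			mask |= (uint32_t) 1 << i;
	}

	return mask;
}

/*
 * Number of bytes that can be skipped before the scalar parser has to
 * take over: the whole chunk if nothing special was found, otherwise the
 * position of the lowest set bit in the mask.
 */
static int
bytes_to_skip(const char *buf, bool in_quote, char quotec, char escapec)
{
	uint32_t	mask;
	int			advance = 0;

	assert(buf != NULL);

	mask = special_mask(buf, in_quote, quotec, escapec);
	if (mask == 0)
		return CHUNK_SIZE;

	while ((mask & 1) == 0)
	{
		mask >>= 1;
		advance++;
	}

	return advance;
}
```

For a chunk such as "abc\ndefghijklmno", this sketch stops after 3 bytes outside quotes but skips all 16 bytes inside quotes, because the newline is ordinary data there.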

-- 
Regards,
Nazir Bilal Yavuz
Microsoft


Attachments:

  [text/x-patch] v4.2-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (3.7K, 2-v4.2-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
  download | inline diff:
From e4546b0612bd2fde6190a9ade6e60a1f08299184 Mon Sep 17 00:00:00 2001
From: Manni Wood <[email protected]>
Date: Fri, 5 Dec 2025 18:30:00 -0600
Subject: [PATCH v4.2 1/3] Speed up COPY FROM text/CSV parsing using SIMD

Authors: Shinya Kato <[email protected]>,
Nazir Bilal Yavuz <[email protected]>,
Ayoub Kazar <[email protected]>
Reviewers: Andrew Dunstan <[email protected]>
Discussion:
https://www.postgresql.org/message-id/flat/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig@mail.gmail.com
---
 src/backend/commands/copyfromparse.c | 73 ++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 62afcd8fad1..673d6683a72 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -71,7 +71,9 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "port/pg_bitutils.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 
@@ -1255,6 +1257,14 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	char		quotec = '\0';
 	char		escapec = '\0';
 
+#ifndef USE_NO_SIMD
+	Vector8		nl = vector8_broadcast('\n');
+	Vector8		cr = vector8_broadcast('\r');
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		quote = vector8_broadcast(0);
+	Vector8		escape = vector8_broadcast(0);
+#endif
+
 	if (is_csv)
 	{
 		quotec = cstate->opts.quote[0];
@@ -1262,6 +1272,12 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* ignore special escape processing if it's the same as quotec */
 		if (quotec == escapec)
 			escapec = '\0';
+
+#ifndef USE_NO_SIMD
+		quote = vector8_broadcast(quotec);
+		if (quotec != escapec)
+			escape = vector8_broadcast(escapec);
+#endif
 	}
 
 	/*
@@ -1328,6 +1344,63 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			need_data = false;
 		}
 
+#ifndef USE_NO_SIMD
+
+		/*
+		 * Use SIMD instructions to efficiently scan the input buffer for
+		 * special characters (e.g., newline, carriage return, quote, and
+		 * escape). This is faster than byte-by-byte iteration, especially on
+		 * large buffers.
+		 *
+		 * We do not apply the SIMD fast path in either of the following
+		 * cases: - When the previously processed character was an escape
+		 * character (last_was_esc), since the next byte must be examined
+		 * sequentially. - The remaining buffer is smaller than one vector
+		 * width (sizeof(Vector8)); SIMD operates on fixed-size chunks.
+		 */
+		if (!last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match = vector8_broadcast(0);
+			uint32		mask;
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+			/* \n and \r are not special inside quotes */
+			if (!in_quote)
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+
+			if (is_csv)
+			{
+				match = vector8_or(match, vector8_eq(chunk, quote));
+				if (escapec != '\0')
+					match = vector8_or(match, vector8_eq(chunk, escape));
+			}
+			else
+				match = vector8_or(match, vector8_eq(chunk, bs));
+
+			/* Check if we found any special characters */
+			mask = vector8_highbit_mask(match);
+			if (mask != 0)
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				int			advance = pg_rightmost_one_pos32(mask);
+
+				input_buf_ptr += advance;
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				input_buf_ptr += sizeof(Vector8);
+				continue;
+			}
+		}
+#endif
+
 		/* OK to fetch a character */
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
-- 
2.51.0



  [text/x-patch] v4.2-0002-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (5.0K, 3-v4.2-0002-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
  download | inline diff:
From 92ac4ada1e4833f81ce30164b48868dc1ade102f Mon Sep 17 00:00:00 2001
From: Manni Wood <[email protected]>
Date: Fri, 5 Dec 2025 18:33:46 -0600
Subject: [PATCH v4.2 2/3] Speed up COPY FROM text/CSV parsing using SIMD

Authors: Shinya Kato <[email protected]>,
Nazir Bilal Yavuz <[email protected]>,
Ayoub Kazar <[email protected]>
Reviewers: Andrew Dunstan <[email protected]>
Discussion:
https://www.postgresql.org/message-id/flat/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig@mail.gmail.com
---
 src/include/commands/copyfrom_internal.h | 11 +++++++++
 src/backend/commands/copyfrom.c          |  3 +++
 src/backend/commands/copyfromparse.c     | 29 +++++++++++++++++++++++-
 3 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..215215f909f 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -181,6 +181,17 @@ typedef struct CopyFromStateData
 #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
 
 	uint64		bytes_processed;	/* number of bytes processed so far */
+
+	/* the amount of bytes to read until checking if we should try simd */
+#define BYTES_PROCESSED_UNTIL_SIMD_CHECK 100000
+	/* the number of special chars read below which we use simd */
+#define SPECIAL_CHAR_SIMD_THRESHOLD 20000
+	uint64		special_chars_encountered;	/* number of special chars
+											 * encountered so far */
+	bool		checked_simd;	/* we read BYTES_PROCESSED_UNTIL_SIMD_CHECK
+								 * and checked if we should use SIMD on the
+								 * rest of the file */
+	bool		use_simd;		/* use simd to speed up copying */
 } CopyFromStateData;
 
 extern void ReceiveCopyBegin(CopyFromState cstate);
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2ae3d2ba86e..6711c0cfcdd 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1720,6 +1720,9 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attname = NULL;
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
+	cstate->special_chars_encountered = 0;
+	cstate->checked_simd = false;
+	cstate->use_simd = false;
 
 	/*
 	 * Allocate buffers for the input pipeline.
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 673d6683a72..d548674c8ff 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1346,6 +1346,28 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 
 #ifndef USE_NO_SIMD
 
+		/*
+		 * Wait until we have read more than BYTES_PROCESSED_UNTIL_SIMD_CHECK.
+		 * cstate->bytes_processed will grow an unpredictable amount with each
+		 * call to this function, so just wait until we have crossed the
+		 * threshold.
+		 */
+		if (!cstate->checked_simd && cstate->bytes_processed > BYTES_PROCESSED_UNTIL_SIMD_CHECK)
+		{
+			cstate->checked_simd = true;
+
+			/*
+			 * If we have not read too many special characters
+			 * (SPECIAL_CHAR_SIMD_THRESHOLD) then start using SIMD to speed up
+			 * processing. This heuristic assumes that input does not vary too
+			 * much from line to line and that number of special characters
+			 * encountered in the first BYTES_PROCESSED_UNTIL_SIMD_CHECK are
+			 * indicitive of the whole file.
+			 */
+			if (cstate->special_chars_encountered < SPECIAL_CHAR_SIMD_THRESHOLD)
+				cstate->use_simd = true;
+		}
+
 		/*
 		 * Use SIMD instructions to efficiently scan the input buffer for
 		 * special characters (e.g., newline, carriage return, quote, and
@@ -1358,7 +1380,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		 * sequentially. - The remaining buffer is smaller than one vector
 		 * width (sizeof(Vector8)); SIMD operates on fixed-size chunks.
 		 */
-		if (!last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		if (cstate->use_simd && !last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
 		{
 			Vector8		chunk;
 			Vector8		match = vector8_broadcast(0);
@@ -1415,6 +1437,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			 */
 			if (c == '\r')
 			{
+				cstate->special_chars_encountered++;
 				IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
 			}
 
@@ -1446,6 +1469,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* Process \r */
 		if (c == '\r' && (!is_csv || !in_quote))
 		{
+			cstate->special_chars_encountered++;
 			/* Check for \r\n on first line, _and_ handle \r\n. */
 			if (cstate->eol_type == EOL_UNKNOWN ||
 				cstate->eol_type == EOL_CRNL)
@@ -1502,6 +1526,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* Process \n */
 		if (c == '\n' && (!is_csv || !in_quote))
 		{
+			cstate->special_chars_encountered++;
 			if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
@@ -1524,6 +1549,8 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		{
 			char		c2;
 
+			cstate->special_chars_encountered++;
+
 			IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
 			IF_NEED_REFILL_AND_EOF_BREAK(0);
 
-- 
2.51.0



  [text/x-patch] v4.2-0003-Feedback-Changes.patch (8.5K, 4-v4.2-0003-Feedback-Changes.patch)
  download | inline diff:
From 128574f80963c5b532c8aa7e7fad84a7e6e20874 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Tue, 9 Dec 2025 15:32:10 +0300
Subject: [PATCH v4.2 3/3] Feedback / Changes

---
 src/include/commands/copyfrom_internal.h |  9 +--
 src/backend/commands/copyfrom.c          |  1 +
 src/backend/commands/copyfromparse.c     | 92 +++++++++++++++---------
 3 files changed, 65 insertions(+), 37 deletions(-)

diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 215215f909f..397720bf875 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -183,12 +183,13 @@ typedef struct CopyFromStateData
 	uint64		bytes_processed;	/* number of bytes processed so far */
 
 	/* the amount of bytes to read until checking if we should try simd */
-#define BYTES_PROCESSED_UNTIL_SIMD_CHECK 100000
-	/* the number of special chars read below which we use simd */
-#define SPECIAL_CHAR_SIMD_THRESHOLD 20000
+#define CHARS_PROCESSED_UNTIL_SIMD_CHECK 100000
+	/* the ratio of special chars read below which we use simd */
+#define SPECIAL_CHAR_SIMD_RATIO 4
+	uint64		chars_processed;
 	uint64		special_chars_encountered;	/* number of special chars
 											 * encountered so far */
-	bool		checked_simd;	/* we read BYTES_PROCESSED_UNTIL_SIMD_CHECK
+	bool		checked_simd;	/* we read CHARS_PROCESSED_UNTIL_SIMD_CHECK
 								 * and checked if we should use SIMD on the
 								 * rest of the file */
 	bool		use_simd;		/* use simd to speed up copying */
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 6711c0cfcdd..2b77ba2556c 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1720,6 +1720,7 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attname = NULL;
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
+	cstate->chars_processed = 0;
 	cstate->special_chars_encountered = 0;
 	cstate->checked_simd = false;
 	cstate->use_simd = false;
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index d548674c8ff..720222152da 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -143,7 +143,7 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
 static bool CopyReadLine(CopyFromState cstate, bool is_csv);
-static bool CopyReadLineText(CopyFromState cstate, bool is_csv);
+static pg_attribute_always_inline bool CopyReadLineText(CopyFromState cstate, bool is_csv, bool use_simd);
 static int	CopyReadAttributesText(CopyFromState cstate);
 static int	CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -1173,8 +1173,40 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 	resetStringInfo(&cstate->line_buf);
 	cstate->line_buf_valid = false;
 
-	/* Parse data and transfer into line_buf */
-	result = CopyReadLineText(cstate, is_csv);
+#ifndef USE_NO_SIMD
+
+	/*
+	 * Wait until we have read more than CHARS_PROCESSED_UNTIL_SIMD_CHECK.
+	 * cstate->chars_processed will grow an unpredictable amount with each
+	 * call to this function, so just wait until we have crossed the
+	 * threshold.
+	 */
+	if (!cstate->checked_simd && cstate->chars_processed > CHARS_PROCESSED_UNTIL_SIMD_CHECK)
+	{
+		cstate->checked_simd = true;
+
+		/*
+		 * If we have not read too many special characters then start using
+		 * SIMD to speed up processing. This heuristic assumes that input does
+		 * not vary too much from line to line and that number of special
+		 * characters encountered in the first
+		 * CHARS_PROCESSED_UNTIL_SIMD_CHECK are indicative of the whole file.
+		 */
+		if (cstate->chars_processed / SPECIAL_CHAR_SIMD_RATIO >= cstate->special_chars_encountered)
+		{
+			cstate->use_simd = true;
+		}
+	}
+#endif
+
+	/*
+	 * Parse data and transfer into line_buf. To get benefit from inlining,
+	 * call CopyReadLineText() with the constant boolean variables.
+	 */
+	if (cstate->use_simd)
+		result = CopyReadLineText(cstate, is_csv, true);
+	else
+		result = CopyReadLineText(cstate, is_csv, false);
 
 	if (result)
 	{
@@ -1241,11 +1273,12 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
-static bool
-CopyReadLineText(CopyFromState cstate, bool is_csv)
+static pg_attribute_always_inline bool
+CopyReadLineText(CopyFromState cstate, bool is_csv, bool use_simd)
 {
 	char	   *copy_input_buf;
 	int			input_buf_ptr;
+	int			start_input_buf_ptr;
 	int			copy_buf_len;
 	bool		need_data = false;
 	bool		hit_eof = false;
@@ -1309,6 +1342,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	input_buf_ptr = cstate->input_buf_index;
 	copy_buf_len = cstate->input_buf_len;
 
+	start_input_buf_ptr = input_buf_ptr;
 	for (;;)
 	{
 		int			prev_raw_ptr;
@@ -1327,9 +1361,11 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			REFILL_LINEBUF;
 
 			CopyLoadInputBuf(cstate);
+			cstate->chars_processed += (input_buf_ptr - start_input_buf_ptr);
 			/* update our local variables */
 			hit_eof = cstate->input_reached_eof;
 			input_buf_ptr = cstate->input_buf_index;
+			start_input_buf_ptr = input_buf_ptr;
 			copy_buf_len = cstate->input_buf_len;
 
 			/*
@@ -1346,28 +1382,6 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 
 #ifndef USE_NO_SIMD
 
-		/*
-		 * Wait until we have read more than BYTES_PROCESSED_UNTIL_SIMD_CHECK.
-		 * cstate->bytes_processed will grow an unpredictable amount with each
-		 * call to this function, so just wait until we have crossed the
-		 * threshold.
-		 */
-		if (!cstate->checked_simd && cstate->bytes_processed > BYTES_PROCESSED_UNTIL_SIMD_CHECK)
-		{
-			cstate->checked_simd = true;
-
-			/*
-			 * If we have not read too many special characters
-			 * (SPECIAL_CHAR_SIMD_THRESHOLD) then start using SIMD to speed up
-			 * processing. This heuristic assumes that input does not vary too
-			 * much from line to line and that number of special characters
-			 * encountered in the first BYTES_PROCESSED_UNTIL_SIMD_CHECK are
-			 * indicitive of the whole file.
-			 */
-			if (cstate->special_chars_encountered < SPECIAL_CHAR_SIMD_THRESHOLD)
-				cstate->use_simd = true;
-		}
-
 		/*
 		 * Use SIMD instructions to efficiently scan the input buffer for
 		 * special characters (e.g., newline, carriage return, quote, and
@@ -1380,7 +1394,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		 * sequentially. - The remaining buffer is smaller than one vector
 		 * width (sizeof(Vector8)); SIMD operates on fixed-size chunks.
 		 */
-		if (cstate->use_simd && !last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		if (use_simd && !last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
 		{
 			Vector8		chunk;
 			Vector8		match = vector8_broadcast(0);
@@ -1427,6 +1441,21 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
 
+		/* Use this calculation to decide whether to use SIMD later */
+		if (!use_simd && unlikely(!cstate->checked_simd))
+		{
+			if (is_csv)
+			{
+				if (c == '\r' || c == '\n' || c == quotec || c == escapec)
+					cstate->special_chars_encountered++;
+			}
+			else
+			{
+				if (c == '\r' || c == '\n' || c == '\\')
+					cstate->special_chars_encountered++;
+			}
+		}
+
 		if (is_csv)
 		{
 			/*
@@ -1437,7 +1466,6 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			 */
 			if (c == '\r')
 			{
-				cstate->special_chars_encountered++;
 				IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
 			}
 
@@ -1469,7 +1497,6 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* Process \r */
 		if (c == '\r' && (!is_csv || !in_quote))
 		{
-			cstate->special_chars_encountered++;
 			/* Check for \r\n on first line, _and_ handle \r\n. */
 			if (cstate->eol_type == EOL_UNKNOWN ||
 				cstate->eol_type == EOL_CRNL)
@@ -1526,7 +1553,6 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* Process \n */
 		if (c == '\n' && (!is_csv || !in_quote))
 		{
-			cstate->special_chars_encountered++;
 			if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
@@ -1549,8 +1575,6 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		{
 			char		c2;
 
-			cstate->special_chars_encountered++;
-
 			IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
 			IF_NEED_REFILL_AND_EOF_BREAK(0);
 
@@ -1635,6 +1659,8 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	 */
 	REFILL_LINEBUF;
 
+	cstate->chars_processed += (input_buf_ptr - start_input_buf_ptr);
+
 	return result;
 }
 
-- 
2.51.0




* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-12-12 20:42  Mark Wong <[email protected]>
  parent: Bilal Yavuz <[email protected]>
  1 sibling, 1 reply; 114+ messages in thread

From: Mark Wong @ 2025-12-12 20:42 UTC (permalink / raw)
  To: Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Nathan Bossart <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi everyone,

On Tue, Dec 09, 2025 at 04:40:19PM +0300, Bilal Yavuz wrote:
> Hi,
> 
> On Sat, 6 Dec 2025 at 10:55, Bilal Yavuz <[email protected]> wrote:
> >
> > Hi,
> >
> > On Sat, 6 Dec 2025 at 04:40, Manni Wood <[email protected]> wrote:
> > > Hello, all.
> > >
> > > Andrew, I tried your suggestion of just reading the first chunk of the copy file to determine if SIMD is worth using. Attached are v4 versions of the patches showing a first attempt at doing that.
> >
> > Thank you for doing this!
> >
> > > I attached test.sh.txt to show how I've been testing, with 5 million lines of the various copy file variations introduced by Ayoub Kazar.
> > >
> > > The text copy with no special chars is 30% faster. The CSV copy with no special chars is 48% faster. The text with 1/3rd escapes is 3% slower. The CSV with 1/3rd quotes is 0.27% slower.
> > >
> > > This set of patches follows the simplest suggestion of just testing the first N lines (actually first N bytes) of the file and then deciding whether or not to enable SIMD. This set of patches does not follow Andrew's later suggestion of maybe checking again every million lines or so.
> >
> > My input-generation script is not ready to share yet, but the inputs
> > follow this format: text_${n}.input, where n represents the number of
> > normal characters before the delimiter. For example:
> >
> > n = 0 -> "\n\n\n\n\n..." (no normal characters)
> > n = 1 -> "a\n..." (1 normal character before the delimiter)
> > ...
> > n = 5 -> "aaaaa\n..."
> > … continuing up to n = 32.
> >
> > Each line has 4096 chars and there are a total of 100000 lines in each
> > input file.
> >
> > I only benchmarked the text format. I compared the latest heuristic I
> > shared [1] with the current method. The benchmarks show roughly a 16%
> > regression at the worst case (n = 2), with regressions up to n = 5.
> > For the remaining values, performance was similar.
> 
> I tried to improve the v4 patchset. My changes are:
> 
> 1 - I changed CopyReadLineText() to an inline function and passed the
> use_simd variable as an argument to benefit from inlining.
> 
> 2 - The main for loop in the CopyReadLineText() function runs many
> times, so I moved the use_simd check to the CopyReadLine() function.
> 
> 3 - Instead of 'bytes_processed', I used 'chars_processed' because
> cstate->bytes_processed is increased before we process the bytes, which
> can cause wrong results.
> 
> 4 - Because of #2 and #3, instead of having
> 'SPECIAL_CHAR_SIMD_THRESHOLD', I used the ratio of 'chars_processed /
> special_chars_encountered' to determine whether we want to use SIMD.
> 
> 5 - cstate->special_chars_encountered is incremented wrongly for the
> CSV case. It is not incremented for the quote and escape delimiters. I
> moved all increments of cstate->special_chars_encountered to one
> central place and tried to optimize it, but it still causes a
> regression because it adds one more branch.
> 
> With these changes, I am able to decrease the regression from 16% to
> 10%. The regression decreases to 7% if I modify #5 to handle only text
> input, but I did not do that.
> 
> My changes are in 0003.
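
For readers unfamiliar with the trick in items 1 and 2 of the quoted list, the following is a minimal sketch of specializing an inlined function on a constant boolean. count_lines_impl() and count_lines() are hypothetical names, and the four-bytes-at-a-time loop is only a stand-in for the SIMD branch; the patch uses pg_attribute_always_inline on CopyReadLineText(), whereas this sketch uses plain static inline and so relies on the compiler choosing to inline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Count newline bytes. When use_fast_path is a compile-time constant at
 * the call site and the function is inlined, the compiler can drop the
 * branch that does not apply.
 */
static inline size_t
count_lines_impl(const char *buf, size_t len, bool use_fast_path)
{
	size_t		lines = 0;
	size_t		i = 0;

	assert(buf != NULL || len == 0);

	if (use_fast_path)
	{
		/* Stand-in for the SIMD branch: handle four bytes per iteration. */
		for (; i + 4 <= len; i += 4)
			lines += (buf[i] == '\n') + (buf[i + 1] == '\n') +
				(buf[i + 2] == '\n') + (buf[i + 3] == '\n');
	}

	/* Scalar loop: the whole input, or the tail left by the fast path. */
	for (; i < len; i++)
		lines += (buf[i] == '\n');

	return lines;
}

/*
 * Dispatcher: test the runtime flag once, then call with literal
 * constants so that each call site can be specialized.
 */
static size_t
count_lines(const char *buf, size_t len, bool use_fast_path)
{
	if (use_fast_path)
		return count_lines_impl(buf, len, true);
	else
		return count_lines_impl(buf, len, false);
}
```

Both paths return the same count; the point of the pattern is only that the flag is tested once per call to the dispatcher and not once per loop iteration.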

I was helping collect some data, but I'm a little behind in sharing what
I ran against the v4.1 patches (on commit 07961ef8) now that the v4.2
version is out there...

I hope it's still helpful that I share what I collected even though the
results are not quite as nice, but maybe that's more about how and where
I ran them.

My laptop has an Intel(R) Core(TM) Ultra 7 165H, where most of these
tests were using up 95%+ of one of the cores (I have hyperthreading
disabled), and using about 10% of the SSD's capacity.

Summarizing my results from the same script Manni ran, I didn't see as
much of an improvement in the positive tests, and I saw more negative
results in the other tests.

text copy with no special chars: 18% improvement, 15s faster than the
80s before the patch

CSV copy with no special chars: 23% improvement, 23s faster than the
96s before the patch

text with 1/3rd escapes: 6% slower, an additional 5s on top of the 85s
before the patch

CSV with 1/3rd quotes: 7% slower, an additional 10s on top of the 129s
before the patch
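
As a quick cross-check of the percentages above, the change relative to the pre-patch time can be computed as follows. percent_change() is a hypothetical helper, and reading each "before the patch" figure as the baseline is an assumption about how the numbers were reported.

```c
#include <assert.h>

/*
 * Change in run time as a percentage of the pre-patch baseline.
 * delta_s is the number of seconds gained or lost by the patch.
 */
static double
percent_change(double baseline_s, double delta_s)
{
	assert(baseline_s > 0.0);

	return 100.0 * delta_s / baseline_s;
}
```

Calling percent_change(80.0, 15.0), percent_change(96.0, 23.0), percent_change(85.0, 5.0) and percent_change(129.0, 10.0) gives values within one percentage point of the reported 18%, 23%, 6% and 7%.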


I'm wondering if my laptop/processor isn't the best test bed for this...

Regards,
Mark
--
Mark Wong <[email protected]>
EDB https://enterprisedb.com






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-12-12 23:09  Manni Wood <[email protected]>
  parent: Mark Wong <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Manni Wood @ 2025-12-12 23:09 UTC (permalink / raw)
  To: Mark Wong <[email protected]>; +Cc: Bilal Yavuz <[email protected]>; KAZAR Ayoub <[email protected]>; Nathan Bossart <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Fri, Dec 12, 2025 at 2:42 PM Mark Wong <[email protected]> wrote:

> Hi everyone,
>
> On Tue, Dec 09, 2025 at 04:40:19PM +0300, Bilal Yavuz wrote:
> > Hi,
> >
> > On Sat, 6 Dec 2025 at 10:55, Bilal Yavuz <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > On Sat, 6 Dec 2025 at 04:40, Manni Wood <[email protected]>
> wrote:
> > > > Hello, all.
> > > >
> > > > Andrew, I tried your suggestion of just reading the first chunk of
> the copy file to determine if SIMD is worth using. Attached are v4 versions
> of the patches showing a first attempt at doing that.
> > >
> > > Thank you for doing this!
> > >
> > > > I attached test.sh.txt to show how I've been testing, with 5 million
> lines of the various copy file variations introduced by Ayoub Kazar.
> > > >
> > > > The text copy with no special chars is 30% faster. The CSV copy with
> no special chars is 48% faster. The text with 1/3rd escapes is 3% slower.
> The CSV with 1/3rd quotes is 0.27% slower.
> > > >
> > > > This set of patches follows the simplest suggestion of just testing
> the first N lines (actually first N bytes) of the file and then deciding
> whether or not to enable SIMD. This set of patches does not follow Andrew's
> later suggestion of maybe checking again every million lines or so.
> > >
> > > My input-generation script is not ready to share yet, but the inputs
> > > follow this format: text_${n}.input, where n represents the number of
> > > normal characters before the delimiter. For example:
> > >
> > > n = 0 -> "\n\n\n\n\n..." (no normal characters)
> > > n = 1 -> "a\n..." (1 normal character before the delimiter)
> > > ...
> > > n = 5 -> "aaaaa\n..."
> > > … continuing up to n = 32.
> > >
> > > Each line has 4096 chars and there are a total of 100000 lines in each
> > > input file.
> > >
> > > I only benchmarked the text format. I compared the latest heuristic I
> > > shared [1] with the current method. The benchmarks show roughly a 16%
> > > regression at the worst case (n = 2), with regressions up to n = 5.
> > > For the remaining values, performance was similar.
> >
> > I tried to improve the v4 patchset. My changes are:
> >
> > 1 - I changed CopyReadLineText() to an inline function and passed the
> > use_simd variable as an argument to benefit from inlining.
> >
> > 2 - The main for loop in the CopyReadLineText() function runs many
> > times, so I moved the use_simd check to the CopyReadLine() function.
> >
> > 3 - Instead of 'bytes_processed', I used 'chars_processed' because
> > cstate->bytes_processed is increased before we process the bytes, which
> > can cause wrong results.
> >
> > 4 - Because of #2 and #3, instead of having
> > 'SPECIAL_CHAR_SIMD_THRESHOLD', I used the ratio of 'chars_processed /
> > special_chars_encountered' to determine whether we want to use SIMD.
> >
> > 5 - cstate->special_chars_encountered is incremented wrongly for the
> > CSV case. It is not incremented for the quote and escape delimiters. I
> > moved all increments of cstate->special_chars_encountered to one
> > central place and tried to optimize it, but it still causes a
> > regression because it adds one more branch.
> >
> > With these changes, I am able to decrease the regression from 16% to
> > 10%. The regression decreases to 7% if I modify #5 to handle only text
> > input, but I did not do that.
> >
> > My changes are in 0003.
>
> I was helping collect some data, but I'm a little behind in sharing what
> I ran against the v4.1 patches (on commit 07961ef8) now that the v4.2
> version is out there...
>
> I hope it's still helpful that I share what I collected even though the
> results are not quite as nice, but maybe that's more about how and where
> I ran them.
>
> My laptop has an Intel(R) Core(TM) Ultra 7 165H, where most of these
> tests were using up 95%+ of one of the cores (I have hyperthreading
> disabled), and using about 10% of the SSD's capacity.
>
> Summarizing my results from the same script Manni ran, I didn't see as
> much of an improvement in the positive tests, and I saw more negative
> results in the other tests.
>
> text copy with no special chars: 18% improvement, 15s faster than the
> 80s before the patch
>
> CSV copy with no special chars: 23% improvement, 23s faster than the
> 96s before the patch
>
> text with 1/3rd escapes: 6% slower, an additional 5s on top of the 85s
> before the patch
>
> CSV with 1/3rd quotes: 7% slower, an additional 10s on top of the 129s
> before the patch
>
>
> I'm wondering if my laptop/processor isn't the best test bed for this...
>
> Regards,
> Mark
> --
> Mark Wong <[email protected]>
> EDB https://enterprisedb.com
>

Hello, Everyone!

I have attached two files. 1) the shell script that Mark and I have been
using to get our test results, and 2) a screenshot of a spreadsheet of my
latest test results. (Please let me know if there's a different format than
a screenshot that I could share my spreadsheet in.)

I took greater care this time to compile all three variants of Postgres
(master at bfb335df, master at bfb335df with v4.2 patches installed, master
at bfb335df with v3 patches installed) with the same gcc optimization flags
that would be used to build Postgres packages. To the best of my knowledge,
the two gcc flags of greatest interest would be -g and -O2. I built all
three variants of Postgres using meson like so:

BRANCH=$(git branch --show-current)
meson setup build --prefix=/home/mwood/compiled-pg-instances/${BRANCH} --buildtype=debugoptimized

It occurred to me that end users care only about 1) wall clock time (is
the speedup noticeable in "real time" or just technically faster / uses
less CPU?) and 2) Postgres binaries compiled with the same optimization
level one would get when installing Postgres from packages like .deb or
.rpm; in other words, will the user see speedups without having to
manually compile Postgres?

My interesting finding, on my laptop (ThinkPad P14s Gen 1 running Ubuntu
24.04.3), is different from Mark Wong's. On my laptop, using three Postgres
installations all compiled with the -O2 optimization flag, I see speedups
with the v4.2 patch except for a 2% slowdown for CSV with 1/3rd quotes.
But with Nazir's proposed v3 patch, I see improvements across the board:
even for a text file with 1/3rd escape characters and a CSV file with
1/3rd quotes, I see speedups of 11% and 26%, respectively.

The format of these test files originally comes from Ayoub Kazar's test
scripts; all Mark and I have done in playing with them is make them much
larger: 5,000,000 rows, based on the assumption that longer tests are
better tests.

I find my results interesting enough that I'd be curious to know if anybody
else can reproduce them. It is very interesting that Mark's results are
noticeably different from mine.
--
-- Manni Wood EDB: https://www.enterprisedb.com


Attachments:

  [application/x-shellscript] manni-simd-copy-bench-v1.2.1.sh (2.6K, 3-manni-simd-copy-bench-v1.2.1.sh)
  download

  [image/png] simd_copy_performance_2025_12_12.png (364.8K, 4-simd_copy_performance_2025_12_12.png)
  download | view image


* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-12-18 07:35  Nazir Bilal Yavuz <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2025-12-18 07:35 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: Mark Wong <[email protected]>; KAZAR Ayoub <[email protected]>; Nathan Bossart <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Sat, 13 Dec 2025 at 02:09, Manni Wood <[email protected]> wrote:
>
> Hello, Everyone!
>
> I have attached two files. 1) the shell script that Mark and I have been using to get our test results, and 2) a screenshot of a spreadsheet of my latest test results. (Please let me know if there's a different format than a screenshot that I could share my spreadsheet in.)
>
> I took greater care this time to compile all three variants of Postgres (master at bfb335df, master at bfb335df with v4.2 patches installed, master at bfb335df with v3 patches installed) with the same gcc optimization flags that would be used to build Postgres packages. To the best of my knowledge, the two gcc flags of greatest interest would be -g and -O2. I built all three variants of Postgres using meson like so:
>
> BRANCH=$(git branch --show-current)
> meson setup build --prefix=/home/mwood/compiled-pg-instances/${BRANCH} --buildtype=debugoptimized
>
> It occurred to me that in addition to end users only caring about 1) wall clock time (is the speedup noticeable in "real time" or just technically faster / uses less CPU?) and 2) Postgres binaries compiled with the same optimization level one would get when installing Postgres from packages like .deb or .rpm; in other words, will the user see speedups without having to manually compile postgres.
>
> My interesting finding, on my laptop (ThinkPad P14s Gen 1 running Ubuntu 24.04.3), is different from Mark Wong's. On my laptop, using three Postgres installations all compiled with the -O2 optimization flag, I see speedups with the v4.2 patch except for a 2% slowdown with CSV with 1/3rd quotes (a 2% slowdown). But with Nazir's proposed v3 patch, I see improvements across the board. So even for a text file with 1/3rd escape characters, and even with a CSV file with 1/3rd quotes, I see speedups of 11% and 26% respectively.
>
> The format of these test files originally comes from Ayoub Kazar's test scripts; all Mark and I have done in playing with them is make them much larger: 5,000,000 rows, based on the assumption that longer tests are better tests.
>
> I find my results interesting enough that I'd be curious to know if anybody else can reproduce them. It is very interesting that Mark's results are noticeably different from mine.

Thank you for sharing the benchmark script! I ran the benchmarks using
your script with --buildtype=debugoptimized. My results are below:

master: 85ddcc2f4c

text, no special: 102294
text, 1/3 special: 108946
csv, no special: 121831
csv, 1/3 special: 140063

v3

text, no special: 88890 (13.1% speedup)
text, 1/3 special: 110463 (1.4% regression)
csv, no special: 89781 (26.3% speedup)
csv, 1/3 special: 147094 (5.0% regression)

v4.2

text, no special: 87785 (14.2% speedup)
text, 1/3 special: 127008 (16.6% regression)
csv, no special: 88093 (27.7% speedup)
csv, 1/3 special: 164487 (17.4% regression)
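
For anyone checking the percentages above, they appear to follow the usual "relative to master" formula, e.g. (102294 - 88890) / 102294 is roughly 13.1%. A minimal sketch in C, assuming that formula; the function name speedup_percent is made up for illustration and is not part of any patch or script in this thread. The timing units are not stated in the message, but the formula does not depend on them.

```c
#include <assert.h>

/*
 * Percentage change of a patched timing relative to master.
 * Positive result: the patched build was faster (speedup).
 * Negative result: the patched build was slower (regression).
 * Hypothetical helper, not taken from the benchmark script.
 */
double
speedup_percent(double master, double patched)
{
	assert(master > 0.0);	/* a zero or negative baseline is meaningless */
	return (master - patched) / master * 100.0;
}
```

Calling speedup_percent(102294, 88890) gives about 13.1 (the v3 "text, no special" row), and speedup_percent(140063, 164487) gives about -17.4 (the v4.2 "csv, 1/3 special" row).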

One thing I noticed is that your benchmark timings appear to have some
variance. In my runs, I did not observe differences greater than one
second between runs. It is possible that this variance is affecting
your results.

Before running the benchmarks, I use these commands [1] to improve
result stability; they might be helpful if you are not already using
something similar.

I ran this benchmark on my local machine, whose specs are an Intel i5
13600k, 32GB of memory and a SATA SSD.

[1]
sudo cpupower frequency-set --governor=performance
sudo cpupower idle-set -D 0 # disable idle
echo "1" | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo # Intel only

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-12-24 15:07  KAZAR Ayoub <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 2 replies; 114+ messages in thread

From: KAZAR Ayoub @ 2025-12-24 15:07 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; Mark Wong <[email protected]>; Nathan Bossart <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hello,
Following the same path of optimizing COPY FROM using SIMD, I found that
COPY TO can also benefit from this.

I attached a small patch that uses SIMD to skip data and advance until
the first special character is found, then falls back to scalar processing
for that character and re-enters the SIMD path again...
There are two ways to do this:
1) Essentially we do SIMD until we find a special character, then continue
on the scalar path without re-entering SIMD again.
- This gives 10% to 30% speedups depending on the weight of special
characters in the attribute; we don't lose anything here since it advances
with SIMD until it can't (using the previous scripts: 1/3, 2/3 special
chars).

2) Do the SIMD path, then use the scalar path when we hit a special
character, and keep re-entering the SIMD path each time.
- This is equivalent to the COPY FROM story: we'll need to find the same
heuristic to use for both COPY FROM/TO to reduce the regressions (same
regressions: around 20% to 30% with 1/3, 2/3 special chars).

Something else to note is that the scalar path for COPY TO isn't as heavy
as the state machine in COPY FROM.

So if we find the sweet spot for the heuristic, doing the same for COPY TO
will be trivial and always beneficial.
Attached is 0004 which is option 1 (SIMD without re-entering), 0005 is the
second one.
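
To make the difference between the two options concrete, here is a minimal stand-alone sketch in C. It is not the patch code: the names CHUNK, is_special, skip_clean_chunks and count_specials are hypothetical, the per-chunk test is a plain scalar loop standing in for the vector compares from port/simd.h, and the "work" done per special character is just counting it. Only the control flow is meant to match: option 1 runs the chunked scan once, option 2 retries it after every special character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 16				/* stand-in for sizeof(Vector8) */

/* Same classes the text-format patch looks for: control, backslash, delimiter. */
static int
is_special(char c, char delim)
{
	return (unsigned char) c < 0x20 || c == '\\' || c == delim;
}

/*
 * Skip whole CHUNK-byte blocks that contain no special character.  If a
 * block contains one, stop at the first special byte in it.  Never reads
 * past end.  The inner loop is a scalar stand-in for a vector compare.
 */
const char *
skip_clean_chunks(const char *p, const char *end, char delim)
{
	while ((size_t) (end - p) >= CHUNK)
	{
		int			i;

		for (i = 0; i < CHUNK; i++)
			if (is_special(p[i], delim))
				return p + i;
		p += CHUNK;
	}
	return p;					/* fewer than CHUNK bytes left */
}

/*
 * Count special characters in a NUL-terminated string.
 * reenter == 0: option 1, one chunked scan, then scalar to the end.
 * reenter != 0: option 2, retry the chunked scan after each special byte.
 */
size_t
count_specials(const char *s, char delim, int reenter)
{
	const char *end = s + strlen(s);
	const char *p = skip_clean_chunks(s, end, delim);
	size_t		n = 0;

	while (p < end)
	{
		if (is_special(*p, delim))
		{
			n++;
			p++;
			if (reenter)
				p = skip_clean_chunks(p, end, delim);
		}
		else
			p++;
	}
	return n;
}
```

Both modes return the same count for any input; they differ only in how often the chunked scan is attempted, which is where the trade-off between the speedups of option 1 and the regressions of option 2 comes from.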


Regards,
Ayoub


Attachments:

  [text/x-patch] 0005-Speed-up-COPY-TO-text-CSV-using-SIMD.patch (9.2K, 3-0005-Speed-up-COPY-TO-text-CSV-using-SIMD.patch)
  download | inline diff:
From 319e5402e35429943d80ba136f27e6185410e6f5 Mon Sep 17 00:00:00 2001
From: AyoubKAZ <[email protected]>
Date: Wed, 24 Dec 2025 15:20:53 +0100
Subject: [PATCH] Speed up COPY TO text CSV using SIMD

---
 src/backend/commands/copyto.c | 252 ++++++++++++++++++++++------------
 1 file changed, 167 insertions(+), 85 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e1306728509..b9d7b55f1ab 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1268,38 +1268,63 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	if (cstate->encoding_embeds_ascii)
 	{
 		start = ptr;
-		#ifndef USE_NO_SIMD
+		const char *end = ptr + strlen(ptr);
+
+		while ((c = *ptr) != '\0')
+		{
+#ifndef USE_NO_SIMD
+			/*
+			 * SIMD fast path: scan ahead for special characters.
+			 * We re-enter this path after handling each special character
+			 * to maximize the benefit of vectorization.
+			 */
 			{
-				const char* end = ptr + strlen(ptr);
-				while (ptr + sizeof(Vector8) <= end) {
-					Vector8 chunk;
-					Vector8 control_mask;
-					Vector8 backslash_mask;
-					Vector8 delim_mask;
-					Vector8 special_mask;
-					uint32 mask;
+				
+				while (ptr + sizeof(Vector8) <= end)
+				{
+					Vector8		chunk;
+					Vector8		control_mask;
+					Vector8		backslash_mask;
+					Vector8		delim_mask;
+					Vector8		special_mask;
+					uint32		mask;
 
 					vector8_load(&chunk, (const uint8 *) ptr);
+					
+					/* Check for control characters (< 0x20) */
 					control_mask = vector8_gt(vector8_broadcast(0x20), chunk);
-					backslash_mask = vector8_eq(vector8_broadcast('\\'), chunk);
-					delim_mask = vector8_eq(vector8_broadcast(delimc), chunk);
+					
+					/* Check for backslash and delimiter */
+					backslash_mask = vector8_eq(chunk, vector8_broadcast('\\'));
+					delim_mask = vector8_eq(chunk, vector8_broadcast(delimc));
+					
 
-					special_mask = vector8_or(control_mask, vector8_or(backslash_mask, delim_mask));
+					/* Combine all masks */
+					special_mask = vector8_or(
+						vector8_or(control_mask, backslash_mask), delim_mask);
 
 					mask = vector8_highbit_mask(special_mask);
-					if (mask != 0) {
+					if (mask != 0)
+					{
+						/* Found special character, advance to it */
 						int advance = pg_rightmost_one_pos32(mask);
 						ptr += advance;
 						break;
 					}
 
+					/* No special characters in this chunk, advance */
 					ptr += sizeof(Vector8);
 				}
-			} 
-		#endif
+				
+				/* Update c after SIMD scan */
+				c = *ptr;
+			}
+#endif /* !USE_NO_SIMD */
+
+			/* Scalar handling - same code for SIMD and non-SIMD builds */
+			if (c == '\0')
+				break;
 
-		while ((c = *ptr) != '\0')
-		{
 			if ((unsigned char) c < (unsigned char) 0x20)
 			{
 				/*
@@ -1358,38 +1383,60 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	else
 	{
 		start = ptr;
-		#ifndef USE_NO_SIMD
+		const char *end = ptr + strlen(ptr);
+
+		while ((c = *ptr) != '\0')
+		{
+#ifndef USE_NO_SIMD
+			/*
+			 * SIMD fast path: scan ahead for special characters.
+			 */
 			{
-				const char* end = ptr + strlen(ptr);
-				while (ptr + sizeof(Vector8) <= end) {
-					Vector8 chunk;
-					Vector8 control_mask;
-					Vector8 backslash_mask;
-					Vector8 delim_mask;
-					Vector8 special_mask;
-					uint32 mask;
+				
+				while (ptr + sizeof(Vector8) <= end)
+				{
+					Vector8		chunk;
+					Vector8		control_mask;
+					Vector8		backslash_mask;
+					Vector8		delim_mask;
+					Vector8		special_mask;
+					uint32		mask;
 
 					vector8_load(&chunk, (const uint8 *) ptr);
+					
+					/* Check for control characters (< 0x20) */
 					control_mask = vector8_gt(vector8_broadcast(0x20), chunk);
-					backslash_mask = vector8_eq(vector8_broadcast('\\'), chunk);
-					delim_mask = vector8_eq(vector8_broadcast(delimc), chunk);
+					
+					/* Check for backslash and delimiter */
+					backslash_mask = vector8_eq(chunk, vector8_broadcast('\\'));
+					delim_mask = vector8_eq(chunk, vector8_broadcast(delimc));
 
-					special_mask = vector8_or(control_mask, vector8_or(backslash_mask, delim_mask));
+					/* Combine masks */
+					special_mask = vector8_or(control_mask, 
+											  vector8_or(backslash_mask, delim_mask));
 
 					mask = vector8_highbit_mask(special_mask);
-					if (mask != 0) {
+					if (mask != 0)
+					{
+						/* Found special character */
 						int advance = pg_rightmost_one_pos32(mask);
 						ptr += advance;
 						break;
 					}
 
+					/* No special characters, advance */
 					ptr += sizeof(Vector8);
 				}
-			} 
-		#endif
+				
+				/* Update c after SIMD scan */
+				c = *ptr;
+			}
+#endif /* !USE_NO_SIMD */
+
+			/* Scalar handling - same for SIMD and non-SIMD */
+			if (c == '\0')
+				break;
 
-		while ((c = *ptr) != '\0')
-		{
 			if ((unsigned char) c < (unsigned char) 0x20)
 			{
 				/*
@@ -1489,53 +1536,68 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 		else
 		{
 			const char *tptr = ptr;
+			const char *end = tptr + strlen(tptr);
+			
+			while ((c = *tptr) != '\0') 
+			{
+#ifndef USE_NO_SIMD
+			/*
+			 * SIMD accelerated quote detection.
+			 */
+			{	
+				Vector8		delim_vec;
+				Vector8		quote_vec;
+				Vector8		newline_vec;
+				Vector8		cr_vec;
+				
+				delim_vec = vector8_broadcast(delimc);
+				quote_vec = vector8_broadcast(quotec);
+				newline_vec = vector8_broadcast('\n');
+				cr_vec = vector8_broadcast('\r');
+
+				while (tptr + sizeof(Vector8) <= end)
+				{
+					Vector8		chunk;
+					Vector8		special_mask;
+					uint32		mask;
 
-			#ifndef USE_NO_SIMD
-				{	
-					const char* end = tptr + strlen(tptr);
-
-					Vector8 delim_mask = vector8_broadcast(delimc);
-					Vector8 quote_mask = vector8_broadcast(quotec);
-					Vector8 newline_mask = vector8_broadcast('\n');
-					Vector8 carriage_return_mask = vector8_broadcast('\r');
-
-					while (tptr + sizeof(Vector8) <= end) {
-						Vector8 chunk;
-						Vector8 special_mask;
-						uint32 mask;
-
-						vector8_load(&chunk, (const uint8 *) tptr);
-						special_mask = vector8_or(
-							vector8_or(vector8_eq(chunk, delim_mask),
-									   vector8_eq(chunk, quote_mask)),
-							vector8_or(vector8_eq(chunk, newline_mask),
-									   vector8_eq(chunk, carriage_return_mask))
-						);
-
-						mask = vector8_highbit_mask(special_mask);
-						if (mask != 0) {
-							tptr += pg_rightmost_one_pos32(mask);
-							use_quote = true;
-							break;
-						}
+					vector8_load(&chunk, (const uint8 *) tptr);
+					
+					special_mask = vector8_or(
+						vector8_or(vector8_eq(chunk, delim_vec),
+								   vector8_eq(chunk, quote_vec)),
+						vector8_or(vector8_eq(chunk, newline_vec),
+								   vector8_eq(chunk, cr_vec)));
 
-						tptr += sizeof(Vector8);
+					mask = vector8_highbit_mask(special_mask);
+					if (mask != 0)
+					{
+						tptr += pg_rightmost_one_pos32(mask);
+						use_quote = true;
+						break;
 					}
+
+					tptr += sizeof(Vector8);
 				}
-			#endif
+			}
+#endif /* !USE_NO_SIMD */
 
-			while ((c = *tptr) != '\0')
+			/*
+			 * Scalar scan for remaining bytes (tail after SIMD, or entire
+			 * string if USE_NO_SIMD).
+			 */
+			if ((c = *tptr) != '\0')
 			{
 				if (c == delimc || c == quotec || c == '\n' || c == '\r')
 				{
 					use_quote = true;
-					break;
 				}
 				if (IS_HIGHBIT_SET(c) && cstate->encoding_embeds_ascii)
 					tptr += pg_encoding_mblen(cstate->file_encoding, tptr);
 				else
 					tptr++;
 			}
+			}
 		}
 	}
 
@@ -1548,37 +1610,57 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 		 */
 		start = ptr;
 
-		#ifndef USE_NO_SIMD
-			{	
-				const char* end = ptr + strlen(ptr);
-
-				Vector8 escape_mask = vector8_broadcast(escapec);
-				Vector8 quote_mask = vector8_broadcast(quotec);
+		const char *end = ptr + strlen(ptr);
 
-				while (ptr + sizeof(Vector8) <= end) {
-					Vector8 chunk;
-					Vector8 special_mask;
-					uint32 mask;
+		while ((c = *ptr) != '\0')
+		{
+#ifndef USE_NO_SIMD
+			/*
+			 * SIMD fast path: scan ahead for quote/escape characters.
+			 * Re-enter after handling each special character.
+			 */
+			{	
+				Vector8		escape_vec;
+				Vector8		quote_vec;
+				
+				/* Pre-compute broadcast vectors */
+				escape_vec = vector8_broadcast(escapec);
+				quote_vec = vector8_broadcast(quotec);
+
+				while (ptr + sizeof(Vector8) <= end)
+				{
+					Vector8		chunk;
+					Vector8		special_mask;
+					uint32		mask;
 
 					vector8_load(&chunk, (const uint8 *) ptr);
+					
 					special_mask = vector8_or(
-						vector8_eq(chunk, escape_mask), 
-							vector8_eq(chunk, quote_mask));
+						vector8_eq(chunk, escape_vec), 
+						vector8_eq(chunk, quote_vec));
 
 					mask = vector8_highbit_mask(special_mask);
-					if (mask != 0) {
-						ptr += pg_rightmost_one_pos32(mask);
-						use_quote = true;
+					if (mask != 0)
+					{
+						/* Found special character */
+						int advance = pg_rightmost_one_pos32(mask);
+						ptr += advance;
 						break;
 					}
 
+					/* No special characters in this chunk */
 					ptr += sizeof(Vector8);
 				}
+				
+				/* Update c after SIMD scan */
+				c = *ptr;
 			}
-		#endif
-		
-		while ((c = *ptr) != '\0')
-		{
+#endif /* !USE_NO_SIMD */
+
+			/* Scalar handling - same code for SIMD and non-SIMD builds */
+			if (c == '\0')
+				break;
+
 			if (c == quotec || c == escapec)
 			{
 				DUMPSOFAR();
-- 
2.34.1



  [text/x-patch] 0004-Speed-up-COPY-TO-text-CSV-using-SIMD.patch (4.8K, 4-0004-Speed-up-COPY-TO-text-CSV-using-SIMD.patch)
  download | inline diff:
From bfc580b17ad5e6d981adc146c24690afe4634ce1 Mon Sep 17 00:00:00 2001
From: AyoubKAZ <[email protected]>
Date: Wed, 24 Dec 2025 12:55:15 +0100
Subject: [PATCH] Speed up COPY TO text CSV using SIMD

---
 src/backend/commands/copyto.c | 126 ++++++++++++++++++++++++++++++++++
 1 file changed, 126 insertions(+)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index dae91630ac3..e1306728509 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -31,6 +31,8 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "port/pg_bitutils.h"
+#include "port/simd.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
 #include "utils/lsyscache.h"
@@ -1266,6 +1268,36 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	if (cstate->encoding_embeds_ascii)
 	{
 		start = ptr;
+		#ifndef USE_NO_SIMD
+			{
+				const char* end = ptr + strlen(ptr);
+				while (ptr + sizeof(Vector8) <= end) {
+					Vector8 chunk;
+					Vector8 control_mask;
+					Vector8 backslash_mask;
+					Vector8 delim_mask;
+					Vector8 special_mask;
+					uint32 mask;
+
+					vector8_load(&chunk, (const uint8 *) ptr);
+					control_mask = vector8_gt(vector8_broadcast(0x20), chunk);
+					backslash_mask = vector8_eq(vector8_broadcast('\\'), chunk);
+					delim_mask = vector8_eq(vector8_broadcast(delimc), chunk);
+
+					special_mask = vector8_or(control_mask, vector8_or(backslash_mask, delim_mask));
+
+					mask = vector8_highbit_mask(special_mask);
+					if (mask != 0) {
+						int advance = pg_rightmost_one_pos32(mask);
+						ptr += advance;
+						break;
+					}
+
+					ptr += sizeof(Vector8);
+				}
+			} 
+		#endif
+
 		while ((c = *ptr) != '\0')
 		{
 			if ((unsigned char) c < (unsigned char) 0x20)
@@ -1326,6 +1358,36 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	else
 	{
 		start = ptr;
+		#ifndef USE_NO_SIMD
+			{
+				const char* end = ptr + strlen(ptr);
+				while (ptr + sizeof(Vector8) <= end) {
+					Vector8 chunk;
+					Vector8 control_mask;
+					Vector8 backslash_mask;
+					Vector8 delim_mask;
+					Vector8 special_mask;
+					uint32 mask;
+
+					vector8_load(&chunk, (const uint8 *) ptr);
+					control_mask = vector8_gt(vector8_broadcast(0x20), chunk);
+					backslash_mask = vector8_eq(vector8_broadcast('\\'), chunk);
+					delim_mask = vector8_eq(vector8_broadcast(delimc), chunk);
+
+					special_mask = vector8_or(control_mask, vector8_or(backslash_mask, delim_mask));
+
+					mask = vector8_highbit_mask(special_mask);
+					if (mask != 0) {
+						int advance = pg_rightmost_one_pos32(mask);
+						ptr += advance;
+						break;
+					}
+
+					ptr += sizeof(Vector8);
+				}
+			} 
+		#endif
+
 		while ((c = *ptr) != '\0')
 		{
 			if ((unsigned char) c < (unsigned char) 0x20)
@@ -1428,6 +1490,40 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 		{
 			const char *tptr = ptr;
 
+			#ifndef USE_NO_SIMD
+				{	
+					const char* end = tptr + strlen(tptr);
+
+					Vector8 delim_mask = vector8_broadcast(delimc);
+					Vector8 quote_mask = vector8_broadcast(quotec);
+					Vector8 newline_mask = vector8_broadcast('\n');
+					Vector8 carriage_return_mask = vector8_broadcast('\r');
+
+					while (tptr + sizeof(Vector8) <= end) {
+						Vector8 chunk;
+						Vector8 special_mask;
+						uint32 mask;
+
+						vector8_load(&chunk, (const uint8 *) tptr);
+						special_mask = vector8_or(
+							vector8_or(vector8_eq(chunk, delim_mask),
+									   vector8_eq(chunk, quote_mask)),
+							vector8_or(vector8_eq(chunk, newline_mask),
+									   vector8_eq(chunk, carriage_return_mask))
+						);
+
+						mask = vector8_highbit_mask(special_mask);
+						if (mask != 0) {
+							tptr += pg_rightmost_one_pos32(mask);
+							use_quote = true;
+							break;
+						}
+
+						tptr += sizeof(Vector8);
+					}
+				}
+			#endif
+
 			while ((c = *tptr) != '\0')
 			{
 				if (c == delimc || c == quotec || c == '\n' || c == '\r')
@@ -1451,6 +1547,36 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 		 * We adopt the same optimization strategy as in CopyAttributeOutText
 		 */
 		start = ptr;
+
+		#ifndef USE_NO_SIMD
+			{	
+				const char* end = ptr + strlen(ptr);
+
+				Vector8 escape_mask = vector8_broadcast(escapec);
+				Vector8 quote_mask = vector8_broadcast(quotec);
+
+				while (ptr + sizeof(Vector8) <= end) {
+					Vector8 chunk;
+					Vector8 special_mask;
+					uint32 mask;
+
+					vector8_load(&chunk, (const uint8 *) ptr);
+					special_mask = vector8_or(
+						vector8_eq(chunk, escape_mask), 
+							vector8_eq(chunk, quote_mask));
+
+					mask = vector8_highbit_mask(special_mask);
+					if (mask != 0) {
+						ptr += pg_rightmost_one_pos32(mask);
+						use_quote = true;
+						break;
+					}
+
+					ptr += sizeof(Vector8);
+				}
+			}
+		#endif
+		
 		while ((c = *ptr) != '\0')
 		{
 			if (c == quotec || c == escapec)
-- 
2.34.1



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-12-29 17:03  Manni Wood <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  1 sibling, 0 replies; 114+ messages in thread

From: Manni Wood @ 2025-12-29 17:03 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; Mark Wong <[email protected]>; Nathan Bossart <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Dec 24, 2025 at 9:08 AM KAZAR Ayoub <[email protected]> wrote:

> Hello,
> Following the same path of optimizing COPY FROM using SIMD, I found that
> COPY TO can also benefit from this.
>
> I attached a small patch that uses SIMD to skip data and advance as far as
> the first special character is found, then fallback to scalar processing
> for that character and re-enter the SIMD path again...
> There's two ways to do this:
> 1) Essentially we do SIMD until we find a special character, then continue
> scalar path without re-entering SIMD again.
> - This gives from 10% to 30% speedups depending on the weight of special
> characters in the attribute, we don't lose anything here since it advances
> with SIMD until it can't (using the previous scripts: 1/3, 2/3 specials
> chars).
>
> 2) Do SIMD path, then use scalar path when we hit a special character,
> keep re-entering the SIMD path each time.
> - This is equivalent to the COPY FROM story, we'll need to find the same
> heuristic to use for both COPY FROM/TO to reduce the regressions (same
> regressions: around from 20% to 30% with 1/3, 2/3 specials chars).
>
> Something else to note is that the scalar path for COPY TO isn't as heavy
> as the state machine in COPY FROM.
>
> So if we find the sweet spot for the heuristic, doing the same for COPY TO
> will be trivial and always beneficial.
> Attached is 0004 which is option 1 (SIMD without re-entering), 0005 is the
> second one.
>
>
> Regards,
> Ayoub
>

Hello, Nazir and Ayoub!

Nazir, sorry for the late reply, I am on holiday. :-) I wanted to thank you
for the tips on using cpupower to get less variance in my test results.

Ayoub, I suppose it was inevitable the SIMD patch would work for copying
out as well as copying in!

I am back at work on 5 Jan 2026, so I will try to carve out time to test
this then, using Nazir's tips.

Happy Holidays!

-Manni
-- 
-- Manni Wood EDB: https://www.enterprisedb.com


* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2025-12-31 13:04  Nazir Bilal Yavuz <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  1 sibling, 0 replies; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2025-12-31 13:04 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Manni Wood <[email protected]>; Mark Wong <[email protected]>; Nathan Bossart <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Wed, 24 Dec 2025 at 18:08, KAZAR Ayoub <[email protected]> wrote:
>
> Hello,
> Following the same path of optimizing COPY FROM using SIMD, I found that COPY TO can also benefit from this.
>
> I attached a small patch that uses SIMD to skip data and advance as far as the first special character is found, then fallback to scalar processing for that character and re-enter the SIMD path again...
> There's two ways to do this:
> 1) Essentially we do SIMD until we find a special character, then continue scalar path without re-entering SIMD again.
> - This gives from 10% to 30% speedups depending on the weight of special characters in the attribute, we don't lose anything here since it advances with SIMD until it can't (using the previous scripts: 1/3, 2/3 specials chars).
>
> 2) Do SIMD path, then use scalar path when we hit a special character, keep re-entering the SIMD path each time.
> - This is equivalent to the COPY FROM story, we'll need to find the same heuristic to use for both COPY FROM/TO to reduce the regressions (same regressions: around from 20% to 30% with 1/3, 2/3 specials chars).
>
> Something else to note is that the scalar path for COPY TO isn't as heavy as the state machine in COPY FROM.
>
> So if we find the sweet spot for the heuristic, doing the same for COPY TO will be trivial and always beneficial.
> Attached is 0004 which is option 1 (SIMD without re-entering), 0005 is the second one.

Patches look correct to me. I think we could move these SIMD code
portions into a shared function to remove duplication, although that
might have a performance impact. I have not benchmarked these patches
yet.

Another consideration is that these patches might need their own
thread, though I am not completely sure about this yet.

One question: what do you think about having a 0004-style approach for
COPY FROM? What I have in mind is running SIMD for each line & column,
stopping SIMD once it can no longer skip an entire chunk, and then
continuing with the next line & column.
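
A rough sketch of what I mean, in stand-alone C rather than against copyfromparse.c. Everything here is hypothetical (CHUNK, chunk_has_special, count_fields); the per-chunk test is a scalar loop standing in for the vector compares, and only text-format backslash escapes and a single-byte delimiter are handled. The point is the try_chunks flag: chunk skipping is dropped for the rest of a field as soon as one chunk cannot be skipped whole, and allowed again at the start of the next field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 16				/* stand-in for sizeof(Vector8) */

/* Returns 1 if any of the CHUNK bytes at p is the delimiter or a backslash. */
static int
chunk_has_special(const char *p, char delim)
{
	int			i;

	for (i = 0; i < CHUNK; i++)
		if (p[i] == delim || p[i] == '\\')
			return 1;
	return 0;
}

/*
 * Count the fields of one text-format line (without its trailing newline).
 * *fast_bytes receives the number of bytes consumed by whole-chunk skips.
 */
size_t
count_fields(const char *line, char delim, size_t *fast_bytes)
{
	const char *p = line;
	const char *end = line + strlen(line);
	size_t		nfields = 1;
	int			try_chunks = 1;	/* reset at the start of each field */

	*fast_bytes = 0;
	while (p < end)
	{
		if (try_chunks && (size_t) (end - p) >= CHUNK)
		{
			if (!chunk_has_special(p, delim))
			{
				p += CHUNK;
				*fast_bytes += CHUNK;
				continue;
			}
			try_chunks = 0;		/* give up until the next field */
		}

		/* Scalar path: one byte, or one escape pair, at a time. */
		if (*p == '\\' && p + 1 < end)
			p += 2;				/* skip the escaped byte as well */
		else if (*p == delim)
		{
			nfields++;
			try_chunks = 1;		/* new field: chunk skipping allowed again */
			p++;
		}
		else
			p++;
	}
	return nfields;
}
```

For a line made of 32 plain bytes, a short field containing an escaped delimiter, and 20 more plain bytes, this counts 3 fields and consumes 48 bytes through whole-chunk skips; the short, escape-heavy field is handled entirely by the scalar path.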

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-20 00:09  Manni Wood <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Manni Wood @ 2026-02-20 00:09 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; Nathan Bossart <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Thu, Feb 19, 2026 at 4:37 PM KAZAR Ayoub <[email protected]> wrote:

> Hello,
>
> I ran some long benchmarks on this, and I got stable results across
> multiple runs (few milliseconds difference)
>
> This is on an Intel I7-1255U CPU with:
> sudo cpupower frequency-set --governor=performance
> sudo cpupower idle-set -D 0
> echo "1" | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
>
> WIDE (500k rows)
>
> TXT | none
> Master avg: 22,183 ms
> New avg: 20,435 ms
> Improvement: -7.88%
>
> CSV | none
> Master avg: 26,737 ms
> New avg: 24,625 ms
> Improvement: -7.90%
>
> TXT | escape
> Master avg: 26,720 ms
> New avg: 23,658 ms
> Improvement: -11.46%
>
> CSV | quote
> Master avg: 35,961 ms
> New avg: 33,317 ms
> Improvement: -7.35%
>
> --------------------------------------
>
> NARROW (1.5M rows)
>
> TXT | none
> Master avg: 2,220 ms
> New avg: 2,125 ms
> Improvement: -4.28%
>
> CSV | none
> Master avg: 2,330 ms
> New avg: 2,145 ms
> Improvement: -7.92%
>
> TXT | escape
> Master avg: 2,425 ms
> New avg: 2,187 ms
> Improvement: -9.79%
>
> CSV | quote
> Master avg: 2,272 ms
> New avg: 2,253 ms
> Improvement: -0.85%
>
> No regressions as expected, overall this looks good.
>
> Regards,
>
> Ayoub
>
> On Thu, Feb 19, 2026 at 10:01 AM Nazir Bilal Yavuz <[email protected]>
> wrote:
>
>> Hi,
>>
>> On Thu, 19 Feb 2026 at 07:02, Manni Wood <[email protected]>
>> wrote:
>> >
>> > I took some time tonight to apply v8 to the latest master (759b03b2) on
>> my x86 tower and arm raspberry pi 5.
>> >
>> > Here are the results, using both narrow columns and the wider columns
>> we've been using throughout:
>> >
>> > x86 master NARROW
>> > TXT :                 2587.642000 ms
>> > CSV :                 2621.759000 ms
>> > TXT with 1/3 escapes: 2707.933500 ms
>> > CSV with 1/3 quotes:  3254.896500 ms
>> >
>> > x86 v8 NARROW
>> > TXT :                 2488.655250 ms  3.825365% improvement
>> > CSV :                 2628.818000 ms  -0.269247% regression
>> > TXT with 1/3 escapes: 2615.522000 ms  3.412621% improvement
>> > CSV with 1/3 quotes:  3446.368000 ms  -5.882568% regression
>> >
>> > x86 master WIDE
>> > TXT :                 30583.229500 ms
>> > CSV :                 35054.533500 ms
>> > TXT with 1/3 escapes: 32767.421500 ms
>> > CSV with 1/3 quotes:  44214.163500 ms
>> >
>> > x86 v8 WIDE
>> > TXT :                 26527.494250 ms  13.261305% improvement
>> > CSV :                 33364.443750 ms  4.821316% improvement
>> > TXT with 1/3 escapes: 29320.648000 ms  10.518904% improvement
>> > CSV with 1/3 quotes:  42334.074750 ms  4.252232% improvement
>> >
>> >
>> >
>> > arm master NARROW
>> > TXT :                 1999.401000 ms
>> > CSV :                 2081.610750 ms
>> > TXT with 1/3 escapes: 2053.230250 ms
>> > CSV with 1/3 quotes:  2431.608750 ms
>> >
>> > arm v8 NARROW
>> > TXT :                 1981.663750 ms  0.887128% improvement
>> > CSV :                 2023.892500 ms  2.772769% improvement
>> > TXT with 1/3 escapes: 2004.215250 ms  2.387214% improvement
>> > CSV with 1/3 quotes:  2616.872750 ms  -7.618989% regression
>> >
>> > arm master WIDE
>> > TXT :                 9120.731750 ms
>> > CSV :                 11114.478250 ms
>> > TXT with 1/3 escapes: 10338.124500 ms
>> > CSV with 1/3 quotes:  13404.430250 ms
>> >
>> > arm v8 WIDE
>> > TXT :                 8430.090750 ms  7.572210% improvement
>> > CSV :                 10115.135500 ms  8.991360% improvement
>> > TXT with 1/3 escapes: 9624.383500 ms  6.903970% improvement
>> > CSV with 1/3 quotes:  12331.714000 ms  8.002699% improvement
>>
>> Thank you for the results, they are interesting. I didn't expect to
>> see any regression for this benchmark. Also, I would expect the
>> non-special character cases and the 1/3 special character cases to
>> perform similarly, since we are not using SIMD for this benchmark.
>>
>> I noticed that the timings in your narrow benchmark (both x86 and ARM)
>> are quite short. Would it be possible to extend the test so that the
>> total runtime is closer to ~10,000 ms? That might give us more stable
>> results.
>>
>> Here is my benchmark with using your script:
>>
>> WIDE: Total 500000 lines and each line is 4096 bytes.
>> NARROW: Total 1500000 lines and each line is 2-4 bytes (`"A""A"` and
>> `A\\A`).
>>
>>
>> +---------+---------------+---------------+---------------+----------------+
>> | WIDE    | TXT None      | TXT 1/3       | CSV None      | CSV 1/3        |
>> +---------+---------------+---------------+---------------+----------------+
>> | master  | 10512         | 11133         | 12241         | 14321          |
>> +---------+---------------+---------------+---------------+----------------+
>> | patched | 10000 (-%4.8) | 10804 (-%2.9) | 11571 (-%5.4) | 14008 (-%2.18) |
>> +---------+---------------+---------------+---------------+----------------+
>> |         |               |               |               |                |
>> +---------+---------------+---------------+---------------+----------------+
>> | NARROW  |               |               |               |                |
>> +---------+---------------+---------------+---------------+----------------+
>> | master  | 9702          | 9745          | 9784          | 10149          |
>> +---------+---------------+---------------+---------------+----------------+
>> | patched | 9344 (-%3.6)  | 9477 (-%2.7)  | 9439 (-%3.5)  | 9751 (-%3.9)   |
>> +---------+---------------+---------------+---------------+----------------+
>>
>> The results look promising to me.
>>
>> --
>> Regards,
>> Nazir Bilal Yavuz
>> Microsoft
>>
>
Hello!

Thanks for running benchmarks, Ayoub.

Nazir, I ran my benchmarks with more rows this time: as many rows as
would fit on my test computers without exhausting their RAM disks. That
seems to have brought things more into line with what Ayoub saw. I did get
some small regressions, but I suspect those are not a big deal. (For
instance, on both machines I also noticed the occasional "truncate table"
would take longer than the others, despite my scripts' best efforts to
steady a CPU core and pin postmaster and children to that core.)

x86 WIDE master 500,000 rows
TXT :                 30602.244000 ms
CSV :                 35062.451250 ms
TXT with 1/3 escapes: 32704.250250 ms
CSV with 1/3 quotes:  44128.072500 ms

x86 WIDE v8 500,000 rows
TXT :                 26611.953250 ms  13.039210% improvement
CSV :                 33366.184000 ms  4.837846% improvement
TXT with 1/3 escapes: 29251.310000 ms  10.558078% improvement
CSV with 1/3 quotes:  42368.421000 ms  3.987601% improvement

x86 NARROW master 50,000,000 rows
TXT :                 25898.004000 ms
CSV :                 27212.684500 ms
TXT with 1/3 escapes: 29189.518250 ms
CSV with 1/3 quotes:  33222.510250 ms

x86 NARROW v8 50,000,000 rows
TXT :                 26368.765000 ms  -1.817750% regression
CSV :                 26711.122250 ms  1.843119% improvement
TXT with 1/3 escapes: 28081.150750 ms  3.797142% improvement
CSV with 1/3 quotes:  32851.963500 ms  1.115348% improvement


arm WIDE master 250,000 rows
TXT :                 11392.462750 ms
CSV :                 13887.576500 ms
TXT with 1/3 escapes: 12908.560750 ms
CSV with 1/3 quotes:  16699.337000 ms

arm WIDE v8 250,000 rows
TXT :                 10524.567750 ms  7.618151% improvement
CSV :                 12621.211250 ms  9.118691% improvement
TXT with 1/3 escapes: 12017.030250 ms  6.906506% improvement
CSV with 1/3 quotes:  15428.020500 ms  7.612976% improvement

arm NARROW master 25,000,000 rows
TXT :                 10030.274000 ms
CSV :                 10245.238750 ms
TXT with 1/3 escapes: 10345.224500 ms
CSV with 1/3 quotes:  12186.313250 ms

arm NARROW v8 25,000,000 rows
TXT :                 10197.386500 ms  -1.666081% regression
CSV :                 10257.918750 ms  -0.123765% regression
TXT with 1/3 escapes: 10084.978500 ms  2.515615% improvement
CSV with 1/3 quotes:  12064.215000 ms  1.001929% improvement

-- 
-- Manni Wood EDB: https://www.enterprisedb.com



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-20 09:50  Nazir Bilal Yavuz <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-02-20 09:50 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Nathan Bossart <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Fri, 20 Feb 2026 at 03:09, Manni Wood <[email protected]> wrote:
>
> Thanks for running benchmarks, Ayoub.
>
> Nazir, I ran my benchmarks with more rows this time --- as many rows as would fit on  my test computers without exhausting their RAM disks. That seems to have brought things more into line with what Ayoub saw. I did get some small regressions, but I suspect those are not a big deal. (For instance, on both machines I also noticed the occasional "truncate table" would take longer than the others, despite my scripts' best efforts to steady a CPU core and pin postmaster and children to that core.)

Thank you both for the benchmarks. Results look good to me!

-- 
Regards,
Nazir Bilal Yavuz
Microsoft







* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-20 18:15  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nathan Bossart @ 2026-02-20 18:15 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Fri, Feb 20, 2026 at 12:50:35PM +0300, Nazir Bilal Yavuz wrote:
> On Fri, 20 Feb 2026 at 03:09, Manni Wood <[email protected]> wrote:
>> Nazir, I ran my benchmarks with more rows this time --- as many rows as
>> would fit on  my test computers without exhausting their RAM disks. That
>> seems to have brought things more into line with what Ayoub saw. I did
>> get some small regressions, but I suspect those are not a big deal. (For
>> instance, on both machines I also noticed the occasional "truncate
>> table" would take longer than the others, despite my scripts' best
>> efforts to steady a CPU core and pin postmaster and children to that
>> core.)

Yeah, the couple of small regressions seem close to (or below) the noise
level, and IIUC yours were the only benchmarks that showed them, anyway.
Plus, I think we'll need this change regardless as a prerequisite for the
SIMD work.

> Thank you both for the benchmarks. Results look good to me!

Committed that part.
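
For readers following along: judging only from the context lines of the v10 diff later in this thread, the committed part makes CopyReadLineText always-inline and calls it with a literal is_csv, so the compiler can emit one specialized copy per call site. Below is a minimal standalone sketch of that pattern in C. The names count_specials and scan_line are hypothetical (they are not PostgreSQL functions), the quote character is hard-coded as an assumption, and the attribute spelling assumes GCC or Clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: GCC or Clang; other compilers fall back to plain inline. */
#if defined(__GNUC__)
#define always_inline_sketch inline __attribute__((always_inline))
#else
#define always_inline_sketch inline
#endif

/*
 * Count the bytes a line reader would treat as special.  Every caller
 * passes a literal for is_csv, so after inlining the compiler can fold
 * the branch on is_csv away in each copy.
 */
static always_inline_sketch size_t
count_specials(const char *buf, size_t len, bool is_csv)
{
    size_t      n = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
    {
        char        c = buf[i];

        if (c == '\n' || c == '\r')
            n++;
        else if (is_csv ? (c == '"') : (c == '\\'))
            n++;
    }
    return n;
}

/* Branch once on the runtime flag, then call with a constant argument. */
size_t
scan_line(const char *buf, size_t len, bool is_csv)
{
    if (is_csv)
        return count_specials(buf, len, true);
    else
        return count_specials(buf, len, false);
}
```

The v10 patch extends the same dispatch with a second constant argument (simd_enabled), which is why it ends up with four call sites in CopyReadLine.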

-- 
nathan







* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-23 09:10  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-02-23 09:10 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Fri, 20 Feb 2026 at 21:15, Nathan Bossart <[email protected]> wrote:
>
> Yeah, the couple of small regressions seem close to (or below) the noise
> level, and IIUC yours were the only benchmarks that showed them, anyway.
> Plus, I think we'll need this change regardless as a prerequisite for the
> SIMD work.
>
> > Thank you both for the benchmarks. Results look good to me!
>
> Committed that part.

Thank you! Attaching the SIMD patch only.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft


Attachments:

  [text/x-patch] v10-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (7.7K, 2-v10-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
From 9ef4e1376657b577cd4b4c42fb6a592ebd5fae24 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Fri, 13 Feb 2026 13:28:55 +0300
Subject: [PATCH v10] Speed up COPY FROM text/CSV parsing using SIMD

This patch disables SIMD when SIMD encounters a special character which
is neither EOF nor EOL.

Author: Shinya Kato <[email protected]>
Author: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Kazar Ayoub <[email protected]>
Reviewed-by: Nathan Bossart <[email protected]>
Reviewed-by: Neil Conway <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Manni Wood <[email protected]>
Reviewed-by: Mark Wong <[email protected]>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
---
 src/backend/commands/copyfrom.c          |   3 +
 src/backend/commands/copyfromparse.c     | 135 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h |   3 +
 3 files changed, 137 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2b7556b287c..3dd159f15b2 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1717,6 +1717,9 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
 
+	/* Initialize SIMD */
+	cstate->simd_enabled = true;
+
 	/*
 	 * Allocate buffers for the input pipeline.
 	 *
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 6b00d49c50f..7bdf5681628 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 
@@ -142,7 +143,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 /* non-export function prototypes */
 static bool CopyReadLine(CopyFromState cstate, bool is_csv);
 static pg_attribute_always_inline bool CopyReadLineText(CopyFromState cstate,
-														bool is_csv);
+														bool is_csv,
+														bool simd_enabled);
 static int	CopyReadAttributesText(CopyFromState cstate);
 static int	CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -1182,9 +1184,19 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 	 * specialized code with fewer branches.
 	 */
 	if (is_csv)
-		result = CopyReadLineText(cstate, true);
+	{
+		if (cstate->simd_enabled)
+			result = CopyReadLineText(cstate, true, true);
+		else
+			result = CopyReadLineText(cstate, true, false);
+	}
 	else
-		result = CopyReadLineText(cstate, false);
+	{
+		if (cstate->simd_enabled)
+			result = CopyReadLineText(cstate, false, true);
+		else
+			result = CopyReadLineText(cstate, false, false);
+	}
 
 	if (result)
 	{
@@ -1252,7 +1264,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
 static pg_attribute_always_inline bool
-CopyReadLineText(CopyFromState cstate, bool is_csv)
+CopyReadLineText(CopyFromState cstate, bool is_csv, bool simd_enabled)
 {
 	char	   *copy_input_buf;
 	int			input_buf_ptr;
@@ -1267,6 +1279,14 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	char		quotec = '\0';
 	char		escapec = '\0';
 
+#ifndef USE_NO_SIMD
+	Vector8		nl = vector8_broadcast('\n');
+	Vector8		cr = vector8_broadcast('\r');
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		quote = vector8_broadcast(0);
+	Vector8		escape = vector8_broadcast(0);
+#endif
+
 	if (is_csv)
 	{
 		quotec = cstate->opts.quote[0];
@@ -1274,6 +1294,12 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* ignore special escape processing if it's the same as quotec */
 		if (quotec == escapec)
 			escapec = '\0';
+
+#ifndef USE_NO_SIMD
+		quote = vector8_broadcast(quotec);
+		if (quotec != escapec)
+			escape = vector8_broadcast(escapec);
+#endif
 	}
 
 	/*
@@ -1340,6 +1366,107 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			need_data = false;
 		}
 
+#ifndef USE_NO_SIMD
+
+		/*
+		 * Use SIMD instructions to efficiently scan the input buffer for
+		 * special characters (e.g., newline, carriage return, quote, and
+		 * escape). This is faster than byte-by-byte iteration, especially on
+		 * large buffers.
+		 *
+		 * We do not apply the SIMD fast path in either of the following
+		 * cases: - When the previously processed character was an escape
+		 * character (last_was_esc), since the next byte must be examined
+		 * sequentially. - When the remaining buffer is smaller than one
+		 * vector width (sizeof(Vector8)), since SIMD operates on fixed-size
+		 * chunks.
+		 *
+		 * Note that, SIMD may become slower when the input contains many
+		 * special characters. To avoid this regression, we disable SIMD for
+		 * the rest of the input once we encounter a special character which
+		 * is neither EOF nor EOL.
+		 */
+		if (simd_enabled && !last_was_esc && copy_buf_len - input_buf_ptr > sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match = vector8_broadcast(0);
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+			if (is_csv)
+			{
+				/* \n and \r are not special inside quotes */
+				if (!in_quote)
+					match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+
+				match = vector8_or(match, vector8_eq(chunk, quote));
+				if (escapec != '\0')
+					match = vector8_or(match, vector8_eq(chunk, escape));
+			}
+			else
+			{
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				match = vector8_or(match, vector8_eq(chunk, bs));
+			}
+
+			/* Check if we found any special characters */
+			if (vector8_is_highbit_set(match))
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				uint32		mask;
+				int			advance;
+				char		c1,
+							c2;
+				bool		simd_hit_eol,
+							simd_hit_eof;
+
+				mask = vector8_highbit_mask(match);
+				advance = pg_rightmost_one_pos32(mask);
+
+				input_buf_ptr += advance;
+				c1 = copy_input_buf[input_buf_ptr];
+
+				/*
+				 * Since we stopped within the chunk and ((copy_buf_len -
+				 * input_buf_ptr) > sizeof(Vector8)) is true,
+				 * copy_input_buf[input_buf_ptr + 1] is guaranteed to be
+				 * readable.
+				 */
+				c2 = copy_input_buf[input_buf_ptr + 1];
+
+				simd_hit_eof = (c1 == '\\' && c2 == '.' && !is_csv);
+				simd_hit_eol = (c1 == '\r' || c1 == '\n');
+
+				/*
+				 * If (is_csv && in_quote), we shouldn't have picked up '\r'
+				 * or '\n' in the first place.
+				 */
+				Assert(!simd_hit_eol || !(is_csv && in_quote));
+
+				/*
+				 * Do not disable SIMD when we hit EOL or EOF characters. In
+				 * practice, it does not matter for EOF because parsing ends
+				 * there, but we keep the behavior consistent.
+				 */
+				if (!(simd_hit_eof || simd_hit_eol))
+				{
+					simd_enabled = false;
+					cstate->simd_enabled = false;
+				}
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				input_buf_ptr += sizeof(Vector8);
+				continue;
+			}
+		}
+#endif
+
 		/* OK to fetch a character */
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 822ef33cf69..73ce777c52b 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -89,6 +89,9 @@ typedef struct CopyFromStateData
 	const char *cur_attval;		/* current att value for error messages */
 	bool		relname_only;	/* don't output line number, att, etc. */
 
+	/* SIMD variables */
+	bool		simd_enabled;
+
 	/*
 	 * Working state
 	 */
-- 
2.47.3
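
To make the control flow of the patch above easier to follow outside the PostgreSQL tree, here is a rough, simplified sketch of the same idea for text mode only. It is not the patch's code. As a stated assumption, it uses a portable 8-byte "SIMD within a register" word in place of Vector8, and the helpers broadcast, load_chunk, eq_mask, first_match and find_eol are hypothetical names. It also leaves out the "\." end-of-data exemption and all CSV quote handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK 8                 /* stand-in for sizeof(Vector8) */

/* Replicate one byte into all eight lanes, like vector8_broadcast(). */
static uint64_t
broadcast(unsigned char c)
{
    return UINT64_C(0x0101010101010101) * c;
}

/* Load eight bytes so that p[0] always lands in the lowest lane. */
static uint64_t
load_chunk(const unsigned char *p)
{
    uint64_t    v = 0;

    for (int i = CHUNK - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Set the high bit of every lane in which chunk equals pattern. */
static uint64_t
eq_mask(uint64_t chunk, uint64_t pattern)
{
    const uint64_t low7 = UINT64_C(0x7F7F7F7F7F7F7F7F);
    uint64_t    x = chunk ^ pattern;    /* equal lanes become 0x00 */

    return ~(((x & low7) + low7) | x | low7);
}

/* Index of the lowest lane whose high bit is set in a nonzero mask. */
static int
first_match(uint64_t mask)
{
    int         lane = 0;

    assert(mask != 0);
    while ((mask & 0x80) == 0)
    {
        mask >>= 8;
        lane++;
    }
    return lane;
}

/*
 * Return the index of the first unescaped '\n' or '\r' in input[0..len),
 * or len if there is none.  *simd_enabled plays the role of
 * cstate->simd_enabled and is cleared on the first backslash found by the
 * chunked path.
 */
static size_t
find_eol(const char *input, size_t len, bool *simd_enabled)
{
    const unsigned char *buf = (const unsigned char *) input;
    const uint64_t nl = broadcast('\n');
    const uint64_t cr = broadcast('\r');
    const uint64_t bs = broadcast('\\');
    bool        last_was_esc = false;
    size_t      pos = 0;

    while (pos < len)
    {
        if (*simd_enabled && !last_was_esc && len - pos > CHUNK)
        {
            uint64_t    chunk = load_chunk(buf + pos);
            uint64_t    match = eq_mask(chunk, nl) | eq_mask(chunk, cr) |
                eq_mask(chunk, bs);

            if (match == 0)
            {
                pos += CHUNK;   /* nothing special: skip the whole chunk */
                continue;
            }
            pos += first_match(match);  /* stop at the special byte */
            if (buf[pos] == '\\')
                *simd_enabled = false;  /* not EOL: stay scalar from now on */
        }

        /* scalar fallback, one byte at a time */
        if (last_was_esc)
            last_was_esc = false;
        else if (buf[pos] == '\\')
            last_was_esc = true;
        else if (buf[pos] == '\n' || buf[pos] == '\r')
            return pos;
        pos++;
    }
    return len;
}
```

The trade-off the thread is measuring shows up directly here: input without special bytes advances CHUNK bytes per iteration, while input full of backslashes pays for one chunk comparison and then runs the plain byte-by-byte loop.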




* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-24 04:44  Manni Wood <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Manni Wood @ 2026-02-24 04:44 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Nathan Bossart <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Mon, Feb 23, 2026 at 3:10 AM Nazir Bilal Yavuz <[email protected]>
wrote:

> Hi,
>
> On Fri, 20 Feb 2026 at 21:15, Nathan Bossart <[email protected]>
> wrote:
> >
> > Yeah, the couple of small regressions seem close to (or below) the noise
> > level, and IIUC yours were the only benchmarks that showed them, anyway.
> > Plus, I think we'll need this change regardless as a prerequisite for the
> > SIMD work.
> >
> > > Thank you both for the benchmarks. Results look good to me!
> >
> > Committed that part.
>
> Thank you! Attaching the SIMD patch only.
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft
>

Hello!

I ran some speed tests on Nazir's v10 SIMD-only patch. I'm a bit surprised
at the regression for x86 with wide rows for the 1/3rd special characters
scenarios. I'm hoping it's something I did wrong. If anyone else has
numbers to share, that would be excellent.

x86 NARROW master 50,000,000 rows
TXT :                 26359.319000 ms
CSV :                 25661.199750 ms
TXT with 1/3 escapes: 28170.085250 ms
CSV with 1/3 quotes:  32638.147500 ms

x86 NARROW v10 50,000,000 rows
TXT :                 26416.331500 ms  -0.216290% regression
CSV :                 25318.727500 ms  1.334592% improvement
TXT with 1/3 escapes: 28608.007500 ms  -1.554565% regression
CSV with 1/3 quotes:  32805.627750 ms  -0.513143% regression

x86 WIDE master 500,000 rows
TXT :                 26475.164250 ms
CSV :                 31963.478500 ms
TXT with 1/3 escapes: 29671.120750 ms
CSV with 1/3 quotes:  40391.616250 ms

x86 WIDE v10 500,000 rows
TXT :                 23067.046750 ms  12.872885% improvement
CSV :                 23259.092250 ms  27.232287% improvement
TXT with 1/3 escapes: 31796.098250 ms  -7.161770% regression
CSV with 1/3 quotes:  42925.792250 ms  -6.274015% regression



arm NARROW master 25,000,000 rows
TXT :                 10077.096250 ms
CSV :                 10310.671250 ms
TXT with 1/3 escapes: 9893.155000 ms
CSV with 1/3 quotes:  12133.064750 ms

arm NARROW v10 25,000,000 rows
TXT :                 10467.816750 ms  -3.877312% regression
CSV :                 9986.288000 ms  3.146092% improvement
TXT with 1/3 escapes: 10323.173750 ms  -4.346629% regression
CSV with 1/3 quotes:  11843.611750 ms  2.385654% improvement

arm WIDE master 250,000 rows
TXT :                 10568.344750 ms
CSV :                 13046.610500 ms
TXT with 1/3 escapes: 12193.088500 ms
CSV with 1/3 quotes:  16629.319000 ms

arm WIDE v10 250,000 rows
TXT :                 9064.959000 ms  14.225366% improvement
CSV :                 9019.553250 ms  30.866693% improvement
TXT with 1/3 escapes: 12344.497250 ms  -1.241759% regression
CSV with 1/3 quotes:  15495.863750 ms  6.816005% improvement

-- 
-- Manni Wood EDB: https://www.enterprisedb.com



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-24 13:57  Nazir Bilal Yavuz <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 2 replies; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-02-24 13:57 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: Nathan Bossart <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Tue, 24 Feb 2026 at 07:44, Manni Wood <[email protected]> wrote:
>
> Hello!
>
> I ran some speed tests on Nazir's v10 SIMD-only patch. I'm a bit surprised at the regression for x86 with wide rows for the 1/3rd special characters scenarios. I'm hoping it's something I did wrong. If anyone else has numbers to share, that would be excellent.

Thank you for doing this!

I see a similar regression in the wide CSV 1/3 case when using your
benchmark script. I did not see this regression when I used my own
benchmark while sharing v9 [1].

+-------------+---------------------------+---------------------------+
|             |            Text           |            CSV            |
+-------------+-------------+-------------+-------------+-------------+
|  WIDE TEST  |     None    |     1/3     |     None    |     1/3     |
+-------------+-------------+-------------+-------------+-------------+
|    Master   |     9996    |    10769    |    11548    |    13960    |
+-------------+-------------+-------------+-------------+-------------+
|     v10     | 8912 -10.8% | 10902 +1.2% | 8952 -22.4% | 15123 +8.3% |
+-------------+-------------+-------------+-------------+-------------+
|             |             |             |             |             |
+-------------+-------------+-------------+-------------+-------------+
|             |            Text           |            CSV            |
+-------------+-------------+-------------+-------------+-------------+
| NARROW TEST |     None    |     1/3     |     None    |     1/3     |
+-------------+-------------+-------------+-------------+-------------+
|    Master   |     9441    |     9561    |     9734    |     9830    |
+-------------+-------------+-------------+-------------+-------------+
|     v10     |  9291 -1.5% |  9504 -0.5% |  9644 -0.9% | 10078 -2.4% |
+-------------+-------------+-------------+-------------+-------------+

I will investigate this. However, please note that current master
includes the inlining commit (dc592a4155), which makes COPY FROM
faster. In my case, for the wide CSV 1/3 test:

1: current master without dc592a4155: 14400ms
2: current master: 13960ms (3% improvement over #1)
3: current master + SIMD: 15123ms (5% regression against #1 and 8%
regression against #2)

Could you run a similar test? That is, drop dc592a4155 from current
master and re-run the benchmark. That would be helpful.
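
The percentages in the three-way comparison depend on which run is taken as the baseline. A small sketch in C of the arithmetic, using only the three timings quoted above (pct_slower and print_vs_baseline are hypothetical helper names for illustration):

```c
#include <assert.h>
#include <stdio.h>

/*
 * Percentage by which measured_ms is slower than baseline_ms.  Negative
 * values mean the measured run was faster than the baseline.
 */
static double
pct_slower(double baseline_ms, double measured_ms)
{
    assert(baseline_ms > 0.0);
    return (measured_ms - baseline_ms) / baseline_ms * 100.0;
}

/* Print one comparison line with an explicit sign. */
static void
print_vs_baseline(const char *label, double baseline_ms, double measured_ms)
{
    printf("%s: %+.2f%%\n", label, pct_slower(baseline_ms, measured_ms));
}
```

Calling pct_slower(14400, 13960), pct_slower(14400, 15123) and pct_slower(13960, 15123) gives roughly -3.06, +5.02 and +8.33, which match the rounded 3%, 5% and 8% figures above.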

[1] https://postgr.es/m/CAN55FZ0MiFCgK26gRgE05a%3D_ggenkxDM8H%3DA2uTHpywczqt%3D-Q%40mail.gmail.com

--
Regards,
Nazir Bilal Yavuz
Microsoft







* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-24 15:07  KAZAR Ayoub <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 0 replies; 114+ messages in thread

From: KAZAR Ayoub @ 2026-02-24 15:07 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; Nathan Bossart <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hello,

On Tue, Feb 24, 2026 at 2:57 PM Nazir Bilal Yavuz <[email protected]>
wrote:

> Hi,
>
> On Tue, 24 Feb 2026 at 07:44, Manni Wood <[email protected]>
> wrote:
> >
> > Hello!
> >
> > I ran some speed tests on Nazir's v10 SIMD-only patch. I'm a bit
> surprised at the regression for x86 with wide rows for the 1/3rd special
> characters scenarios. I'm hoping it's something I did wrong. If anyone else
> has numbers to share, that would be excellent.
>
> Thank you for doing this!
>
> I see similar regression on the wide & CSV 1/3 case by using your
> benchmark script. I didn't see this regression when I used my
> benchmark while sharing v9 [1].
>
> +-------------+---------------------------+---------------------------+
> |             |            Text           |            CSV            |
> +-------------+-------------+-------------+-------------+-------------+
> |  WIDE TEST  |     None    |     1/3     |     None    |     1/3     |
> +-------------+-------------+-------------+-------------+-------------+
> |    Master   |     9996    |    10769    |    11548    |    13960    |
> +-------------+-------------+-------------+-------------+-------------+
> |     v10     | 8912 %-10.8 | 10902 %+1.2 | 8952 %-22.4 | 15123 %+8.3 |
> +-------------+-------------+-------------+-------------+-------------+
> |             |             |             |             |             |
> +-------------+-------------+-------------+-------------+-------------+
> |             |            Text           |             |     CSV     |
> +-------------+-------------+-------------+-------------+-------------+
> | NARROW TEST |     None    |     1/3     |     None    |     1/3     |
> +-------------+-------------+-------------+-------------+-------------+
> |    Master   |     9441    |     9561    |     9734    |     9830    |
> +-------------+-------------+-------------+-------------+-------------+
> |     v10     |  9291 %-1.5 |  9504 -%0.5 |  9644 %-0.9 | 10078 %-2.4 |
> +-------------+-------------+-------------+-------------+-------------+
>
> I will investigate this. However, please note that the current master
> includes the inlining commit (dc592a4155), which makes the COPY FROM
> faster. In my case,
>
> 1: current master without dc592a4155: 14400ms
> 2: current master: 13960ms (%3 improvement against #1)
> 3: current master + SIMD: 15123ms (%5 regression against #1 and %8
> regression against #2)
>
> Is it possible for you to do a similar test? I mean dropping
> dc592a4155 from the current master and re-running the benchmark, that
> would be helpful.
>
> [1]
> https://postgr.es/m/CAN55FZ0MiFCgK26gRgE05a%3D_ggenkxDM8H%3DA2uTHpywczqt%3D-Q%40mail.gmail.com

Here are some numbers for v10 from my end, averaged over multiple long runs.
Master includes the previously committed inlining patch.

This is on an Intel i7-1255U CPU.

WIDE (500k rows)

TXT | none
Master avg: 20,721 ms
New avg: 17,980 ms
Improvement: -13.23%

CSV | none
Master avg: 26,608 ms
New avg: 18,433 ms
Improvement: -30.73%

TXT | escape
Master avg: 25,069 ms
New avg: 22,910 ms
Improvement: -8.61%

CSV | quote
Master avg: 31,931 ms
New avg: 31,493 ms
Improvement: -1.37%

--------------------------------------

NARROW (15M rows)

TXT | none
Master avg: 20,687 ms
New avg: 20,824 ms
Regression: +0.67%

CSV | none
Master avg: 21,187 ms
New avg: 21,153 ms
Improvement: -0.16%

TXT | escape
Master avg: 20,870 ms
New avg: 21,341 ms
Regression: +2.25%

CSV | quote
Master avg: 22,074 ms
New avg: 22,267 ms
Regression: +0.87%

For the narrow case, the differences are most likely noise plus the cost
of the extra branch.

Regards,
Ayoub



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-24 17:48  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 2 replies; 114+ messages in thread

From: Nathan Bossart @ 2026-02-24 17:48 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Tue, Feb 24, 2026 at 04:57:21PM +0300, Nazir Bilal Yavuz wrote:
> I will investigate this. However, please note that the current master
> includes the inlining commit (dc592a4155), which makes the COPY FROM
> faster. In my case,
> 
> 1: current master without dc592a4155: 14400ms
> 2: current master: 13960ms (%3 improvement against #1)
> 3: current master + SIMD: 15123ms (%5 regression against #1 and %8
> regression against #2)
> 
> Is it possible for you to do a similar test? I mean dropping
> dc592a4155 from the current master and re-running the benchmark, that
> would be helpful.

IMHO as long as the difference from v18 looks reasonable, commit-by-commit
regressions and improvements that even out in the end are okay.  That's
perhaps a bit of mental gymnastics (e.g., what if we had committed the
inlining patch for v18?), but I believe that's how we've dealt with similar
problems in the past.  But maybe there are ways to avoid even these
in-development regressions, too...

-- 
nathan







* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-25 04:06  Manni Wood <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 0 replies; 114+ messages in thread

From: Manni Wood @ 2026-02-25 04:06 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Tue, Feb 24, 2026 at 11:48 AM Nathan Bossart <[email protected]>
wrote:

> On Tue, Feb 24, 2026 at 04:57:21PM +0300, Nazir Bilal Yavuz wrote:
> > I will investigate this. However, please note that the current master
> > includes the inlining commit (dc592a4155), which makes the COPY FROM
> > faster. In my case,
> >
> > 1: current master without dc592a4155: 14400ms
> > 2: current master: 13960ms (%3 improvement against #1)
> > 3: current master + SIMD: 15123ms (%5 regression against #1 and %8
> > regression against #2)
> >
> > Is it possible for you to do a similar test? I mean dropping
> > dc592a4155 from the current master and re-running the benchmark, that
> > would be helpful.
>
> IMHO as long as the difference from v18 looks reasonable, commit-by-commit
> regressions and improvements that even out in the end are okay.  That's
> perhaps a bit of mental gymnastics (e.g., what if we had committed the
> inlining patch for v18?), but I believe that's how we've dealt with similar
> problems in the past.  But maybe there are ways to avoid even these
> in-development regressions, too...
>
> --
> nathan
>

Oh yes, I see now.

Commit 18bcdb75 is just before the v9 patch got applied, so I used that as
"old master" and compared that with master (v9 applied) and then "master
(v9 applied) + v10 applied".

arm NARROW old master 18bcdb75
TXT :                 10997.568250 ms
CSV :                 10797.549000 ms
TXT with 1/3 escapes: 10299.047000 ms
CSV with 1/3 quotes:  12559.385750 ms

arm NARROW master (v9 applied)
TXT :                 10077.096250 ms  8.369778% improvement
CSV :                 10310.671250 ms  4.509151% improvement
TXT with 1/3 escapes: 9893.155000 ms  3.941064% improvement
CSV with 1/3 quotes:  12133.064750 ms  3.394441% improvement

arm NARROW v10
TXT :                 10467.816750 ms  4.816988% improvement
CSV :                 9986.288000 ms  7.513381% improvement
TXT with 1/3 escapes: 10323.173750 ms  -0.234262% regression
CSV with 1/3 quotes:  11843.611750 ms  5.699116% improvement


arm WIDE old master 18bcdb75
TXT :                 11825.771250 ms
CSV :                 13907.074000 ms
TXT with 1/3 escapes: 13430.691250 ms
CSV with 1/3 quotes:  17557.954500 ms

arm WIDE master (v9 applied)
TXT :                 10568.344750 ms  10.632934% improvement
CSV :                 13046.610500 ms  6.187236% improvement
TXT with 1/3 escapes: 12193.088500 ms  9.214736% improvement
CSV with 1/3 quotes:  16629.319000 ms  5.288973% improvement

arm WIDE v10
TXT :                 9064.959000 ms  23.345727% improvement
CSV :                 9019.553250 ms  35.144134% improvement
TXT with 1/3 escapes: 12344.497250 ms  8.087402% improvement
CSV with 1/3 quotes:  15495.863750 ms  11.744482% improvement



x86 NARROW old master 18bcdb75
TXT :                 25909.060500 ms
CSV :                 28137.591250 ms
TXT with 1/3 escapes: 27794.177000 ms
CSV with 1/3 quotes:  34541.704750 ms

x86 NARROW master
TXT :                 26359.319000 ms  -1.737842% regression
CSV :                 25661.199750 ms  8.801007% improvement
TXT with 1/3 escapes: 28170.085250 ms  -1.352471% regression
CSV with 1/3 quotes:  32638.147500 ms  5.510895% improvement

x86 NARROW v10
TXT :                 26416.331500 ms  -1.957890% regression
CSV :                 25318.727500 ms  10.018142% improvement
TXT with 1/3 escapes: 28608.007500 ms  -2.928061% regression
CSV with 1/3 quotes:  32805.627750 ms  5.026032% improvement

x86 WIDE old master 18bcdb75
TXT :                 28778.426500 ms
CSV :                 35671.908000 ms
TXT with 1/3 escapes: 32441.549750 ms
CSV with 1/3 quotes:  47024.416000 ms

x86 WIDE master
TXT :                 26475.164250 ms  8.003434% improvement
CSV :                 31963.478500 ms  10.395938% improvement
TXT with 1/3 escapes: 29671.120750 ms  8.539755% improvement
CSV with 1/3 quotes:  40391.616250 ms  14.105012% improvement

x86 WIDE v10
TXT :                 23067.046750 ms  19.846046% improvement
CSV :                 23259.092250 ms  34.797174% improvement
TXT with 1/3 escapes: 31796.098250 ms  1.989583% improvement
CSV with 1/3 quotes:  42925.792250 ms  8.715948% improvement
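
As a cross-check on how the per-commit numbers combine: relative time changes multiply, they do not add. A short sketch in C, where time_ratio and compose_improvements are hypothetical helper names and "improvement" means a percentage reduction in run time, as in the lists above:

```c
#include <assert.h>

/* Turn "pct% improvement" (a time reduction) into a new/old time ratio. */
static double
time_ratio(double improvement_pct)
{
    return 1.0 - improvement_pct / 100.0;
}

/*
 * Cumulative improvement, in percent, after two successive changes whose
 * improvements are each measured against the step before them.  A
 * regression is passed as a negative improvement.
 */
static double
compose_improvements(double step1_pct, double step2_pct)
{
    double      ratio = time_ratio(step1_pct) * time_ratio(step2_pct);

    assert(ratio > 0.0);
    return (1.0 - ratio) * 100.0;
}
```

For x86 WIDE "CSV with 1/3 quotes", master is a 14.105012% improvement over 18bcdb75 and v10 was a 6.274015% regression against master in my previous mail; compose_improvements(14.105012, -6.274015) comes out at about 8.716%, in line with the 8.715948% listed above for v10 against 18bcdb75.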

-- 
-- Manni Wood EDB: https://www.enterprisedb.com



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-25 14:24  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 2 replies; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-02-25 14:24 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Tue, 24 Feb 2026 at 20:48, Nathan Bossart <[email protected]> wrote:
>
> On Tue, Feb 24, 2026 at 04:57:21PM +0300, Nazir Bilal Yavuz wrote:
> > I will investigate this. However, please note that the current master
> > includes the inlining commit (dc592a4155), which makes the COPY FROM
> > faster. In my case,
> >
> > 1: current master without dc592a4155: 14400ms
> > 2: current master: 13960ms (%3 improvement against #1)
> > 3: current master + SIMD: 15123ms (%5 regression against #1 and %8
> > regression against #2)
> >
> > Is it possible for you to do a similar test? I mean dropping
> > dc592a4155 from the current master and re-running the benchmark, that
> > would be helpful.
>
> IMHO as long as the difference from v18 looks reasonable, commit-by-commit
> regressions and improvements that even out in the end are okay.  That's
> perhaps a bit of mental gymnastics (e.g., what if we had committed the
> inlining patch for v18?), but I believe that's how we've dealt with similar
> problems in the past.  But maybe there are ways to avoid even these
> in-development regressions, too...

I agree with you. However, unfortunately, I see regression on master +
v10 compared to REL_18_3 (62d6c7d3df6).

Thank you Kazar and Manni for benchmarks in [1] and [2]!

I am still able to reproduce the regression for the 'wide & CSV 1/3' case
[3] using Manni's benchmark script. I consistently see a ~5%
regression, and I wonder whether I am doing something wrong. I am a
bit surprised because I didn't see this regression before, and Kazar
and Manni don't see any regression in their benchmarks [1] and [2]. I
am still investigating and will hopefully come back with more
information soon.

If anyone has any suggestions/ideas, please let me know!

[1] https://postgr.es/m/CA%2BK2RukFH57QPAfTEzvy7PEyrLzav3HkyCiu-2yqR%2BuW_Niorw%40mail.gmail.com
[2] https://postgr.es/m/CAKWEB6oT5KbyF%2BuRRhjjJi7p2PmRdOzxp3T6vFcN04BCR-%3DB2w%40mail.gmail.com
[3]
1: current master without dc592a4155: 14400ms
2: current master: 13960ms (3% improvement against #1)
3: current master + v10: 15123ms (5% regression against #1 and 8%
regression against #2)

--
Regards,
Nazir Bilal Yavuz
Microsoft






^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-26 12:19  Nazir Bilal Yavuz <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-02-26 12:19 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Wed, 25 Feb 2026 at 17:24, Nazir Bilal Yavuz <[email protected]> wrote:
>
> I agree with you. However, unfortunately, I see regression on master +
> v10 compared to REL_18_3 (62d6c7d3df6).
>
> Thank you Kazar and Manni for benchmarks in [1] and [2]!

Kazar and Manni, if possible could you please share the build commands
you use? I see regressions for an inlining patch (dc592a4155) too when
I build postgres with -O2.

My build commands are:

-O2: meson setup --buildtype=debugoptimized ...

-O3: meson setup --buildtype=release ...

This is the wide benchmark only. 'Old master' means b2ff2a0b529 without
the 0001 inlining patch (dc592a4155). Timings are in ms:

+--------------------------+---------------+---------------+
|                          |      Text     |      CSV      |
+--------------------------+-------+-------+-------+-------+
|            -O2           |  None |  1/3  |  None |  1/3  |
+--------------------------+-------+-------+-------+-------+
|        Old Master        | 10440 | 11000 | 11940 | 13600 |
+--------------------------+-------+-------+-------+-------+
|     Old Master + 0001    | 10140 | 10800 | 11600 | 14300 |
+--------------------------+-------+-------+-------+-------+
| Old Master + 0001 + 0002 |  9000 | 11000 |  8850 | 15300 |
+--------------------------+-------+-------+-------+-------+
|                          |       |       |       |       |
+--------------------------+-------+-------+-------+-------+
|                          |      Text     |      CSV      |
+--------------------------+-------+-------+-------+-------+
|            -O3           |  None |  1/3  |  None |  1/3  |
+--------------------------+-------+-------+-------+-------+
|        Old Master        | 10440 | 11200 | 12200 | 14390 |
+--------------------------+-------+-------+-------+-------+
|     Old Master + 0001    | 10000 | 10700 | 11540 | 13960 |
+--------------------------+-------+-------+-------+-------+
| Old Master + 0001 + 0002 |  8880 | 10900 |  8900 | 15100 |
+--------------------------+-------+-------+-------+-------+

This result shows that when we compare v18 and v18 + SIMD (0001 +
0002), the only regression is in the CSV 1/3 case: 12.5% with -O2 and
5% with -O3.

--
Regards,
Nazir Bilal Yavuz
Microsoft






^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-26 14:31  KAZAR Ayoub <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: KAZAR Ayoub @ 2026-02-26 14:31 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Manni Wood <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hello,

On Thu, Feb 26, 2026 at 1:19 PM Nazir Bilal Yavuz <[email protected]>
wrote:

> Hi,
>
> On Wed, 25 Feb 2026 at 17:24, Nazir Bilal Yavuz <[email protected]>
> wrote:
> >
> > I agree with you. However, unfortunately, I see regression on master +
> > v10 compared to REL_18_3 (62d6c7d3df6).
> >
> > Thank you Kazar and Manni for benchmarks in [1] and [2]!
>
> Kazar and Manni, if possible could you please share the build commands
> you use? I see regressions for an inlining patch (dc592a4155) too when
> I build postgres with -O2.
>
> My build commands are:
>
> -O2: meson setup buildtype=debugoptimized ...
>
> -O3: meson setup buildtype=release ...

All my builds are with CFLAGS='-O2 -g'

Regards,
Ayoub


^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-26 14:36  Manni Wood <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Manni Wood @ 2026-02-26 14:36 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; Nathan Bossart <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Thu, Feb 26, 2026 at 8:31 AM KAZAR Ayoub <[email protected]> wrote:

> Hello,
>
> On Thu, Feb 26, 2026 at 1:19 PM Nazir Bilal Yavuz <[email protected]>
> wrote:
>
>> Hi,
>>
>> On Wed, 25 Feb 2026 at 17:24, Nazir Bilal Yavuz <[email protected]>
>> wrote:
>> >
>> > I agree with you. However, unfortunately, I see regression on master +
>> > v10 compared to REL_18_3 (62d6c7d3df6).
>> >
>> > Thank you Kazar and Manni for benchmarks in [1] and [2]!
>>
>> Kazar and Manni, if possible could you please share the build commands
>> you use? I see regressions for an inlining patch (dc592a4155) too when
>> I build postgres with -O2.
>>
>> My build commands are:
>>
>> -O2: meson setup buildtype=debugoptimized ...
>>
>> -O3: meson setup buildtype=release ...
>
> All my builds are with CFLAGS='-O2 -g'
>
> Regards,
> Ayoub
>

Hello!

I have been building with this command:

meson setup build --prefix=/home/mwood/compiled-pg-instances/${BRANCH}
--buildtype=debugoptimized

And in my notes I have "If I use `--buildtype=debugoptimized` it optimizes
`-O2` and uses `-g`"

Best,
-Manni
-- 
-- Manni Wood EDB: https://www.enterprisedb.com


^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-26 15:32  Manni Wood <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Manni Wood @ 2026-02-26 15:32 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; Nathan Bossart <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

I have a thought and a question:

My notes say "If I use `--buildtype=release` it optimizes `-O2` and the
executable contains no debug symbols."

So, seeing as end users will presumably be seeing the performance generated
by `--buildtype=release`, should we be building with that for all
performance testing?

Best,
-Manni

On Thu, Feb 26, 2026 at 8:36 AM Manni Wood <[email protected]>
wrote:

>
>
> On Thu, Feb 26, 2026 at 8:31 AM KAZAR Ayoub <[email protected]> wrote:
>
>> Hello,
>>
>> On Thu, Feb 26, 2026 at 1:19 PM Nazir Bilal Yavuz <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> On Wed, 25 Feb 2026 at 17:24, Nazir Bilal Yavuz <[email protected]>
>>> wrote:
>>> >
>>> > I agree with you. However, unfortunately, I see regression on master +
>>> > v10 compared to REL_18_3 (62d6c7d3df6).
>>> >
>>> > Thank you Kazar and Manni for benchmarks in [1] and [2]!
>>>
>>> Kazar and Manni, if possible could you please share the build commands
>>> you use? I see regressions for an inlining patch (dc592a4155) too when
>>> I build postgres with -O2.
>>>
>>> My build commands are:
>>>
>>> -O2: meson setup buildtype=debugoptimized ...
>>>
>>> -O3: meson setup buildtype=release ...
>>
>> All my builds are with CFLAGS='-O2 -g'
>>
>> Regards,
>> Ayoub
>>
>
> Hello!
>
> I have been building with this command:
>
> meson setup build --prefix=/home/mwood/compiled-pg-instances/${BRANCH}
> --buildtype=debugoptimized
>
> And in my notes I have "If I use `--buildtype=debugoptimized` it optimizes
> `-O2` and uses `-g`"
>
> Best,
> -Manni
> --
> -- Manni Wood EDB: https://www.enterprisedb.com
>


-- 
-- Manni Wood EDB: https://www.enterprisedb.com


^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-02-26 15:51  KAZAR Ayoub <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 0 replies; 114+ messages in thread

From: KAZAR Ayoub @ 2026-02-26 15:51 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; Nathan Bossart <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Thu, Feb 26, 2026 at 4:32 PM Manni Wood <[email protected]>
wrote:

> I have a thought and a question:
>
> My notes say "If I use `--buildtype=release` it optimizes `-O2` and the
> executable contains no debug symbols."
>
That would be `debugoptimized`, not `release`. From [1] I see that `release`
is -O3 with no debug.

>
> So, seeing as end users will presumably be seeing the performance
> generated by `--buildtype=release`, should we be building with that for all
> performance testing?
>
I know that Debian builds with  'CFLAGS=-g -O2 -flto=auto -ffat-lto-objects
-flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat
-Werror=format-security -fno-omit-frame-pointer'
'LDFLAGS=-Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto
-Wl,-z,relro -Wl,-z,now' ; this is from pg_config for v18.

[1] https://mesonbuild.com/Builtin-options.html

Regards,
Ayoub


^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-02 19:55  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 1 reply; 114+ messages in thread

From: Nathan Bossart @ 2026-03-02 19:55 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Feb 25, 2026 at 05:24:27PM +0300, Nazir Bilal Yavuz wrote:
> If anyone has any suggestions/ideas, please let me know!

A couple of random ideas:

* Additional inlining for callers.  I looked around a little bit and didn't
see any great candidates, so I don't have much faith in this, but maybe
you'll see something I don't.

* Disable SIMD if we are consistently getting small rows.  That won't help
your "wide & CSV 1/3" case in all likelihood, but perhaps it'll help with
the regression for narrow rows described elsewhere.

* Surround the variable initializations with "if (simd_enabled)".
Presumably compilers are smart enough to remove those in the non-SIMD paths
already, but it could be worth a try.

* Add simd_enabled function parameter to CopyReadLine(),
NextCopyFromRawFieldsInternal(), and CopyFromTextLikeOneRow(), and do the
bool literal trick in CopyFrom{Text,CSV}OneRow().  That could encourage the
compiler to do some additional optimizations to reduce branching.
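
The "bool literal trick" in the last item can be sketched as follows. This is only an illustration of the general technique under stated assumptions: count_until_newline(), count_scalar() and count_fast() are hypothetical names, the 8-byte skip loop is a stand-in for a vectorized path, and __attribute__((always_inline)) is the GCC/Clang spelling used here in place of PostgreSQL's own macro:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical shared worker. Because it is forcibly inlined into callers
 * that pass a literal for use_fast_path, the compiler can fold the test
 * away and emit one specialized copy per caller.
 */
static inline __attribute__((always_inline)) size_t
count_until_newline(const char *buf, size_t len, bool use_fast_path)
{
    size_t i = 0;

    assert(buf != NULL);

    if (use_fast_path)
    {
        /* Stand-in for a vectorized skip: step over 8 clean bytes at once. */
        while (i + 8 <= len)
        {
            bool found = false;

            for (size_t j = 0; j < 8; j++)
                if (buf[i + j] == '\n')
                    found = true;
            if (found)
                break;
            i += 8;
        }
    }

    /* Scalar tail shared by both variants. */
    while (i < len && buf[i] != '\n')
        i++;
    return i;
}

/* Thin wrappers: each passes a compile-time constant. */
static size_t
count_scalar(const char *buf, size_t len)
{
    return count_until_newline(buf, len, false);
}

static size_t
count_fast(const char *buf, size_t len)
{
    return count_until_newline(buf, len, true);
}
```

Both wrappers return the same offset; the point is only that the flag is resolved at compile time instead of being tested per call.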

-- 
nathan

^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-04 15:15  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 2 replies; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-04 15:15 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Mon, 2 Mar 2026 at 22:55, Nathan Bossart <[email protected]> wrote:
>
> On Wed, Feb 25, 2026 at 05:24:27PM +0300, Nazir Bilal Yavuz wrote:
> > If anyone has any suggestions/ideas, please let me know!

I was able to fix the problem. My first assumption was that the
branching in the SIMD code caused it, so I moved the SIMD code into
a CopyReadLineTextSIMDHelper() function and called it at the top of
CopyReadLineText(); that way there is no extra branching in the
non-SIMD (scalar) code path. This didn't solve the problem. Then I
realized that the regression remains even when I disable the SIMD code
path with 'if (false)', but it disappears completely when I comment
out the whole 'if (cstate->simd_enabled)' branch.

To find out more, I compared the assembly output of both builds and
found the likely reason. As far as I understand, the compiler can't
promote a variable to a register; instead the variable lives on the
stack, which is slower. Please see the two assembly outputs:

Slow code:

        c = copy_input_buf[input_buf_ptr++];
     db0:    48 8b 55 b8              mov    -0x48(%rbp),%rdx
     db4:    48 63 c6                 movslq %esi,%rax
     db7:    44 8d 66 01              lea    0x1(%rsi),%r12d
     dbb:    44 89 65 cc              mov    %r12d,-0x34(%rbp)
     dbf:    0f be 14 02              movsbl (%rdx,%rax,1),%edx

Fast code:

        c = copy_input_buf[input_buf_ptr++];
     d80:    49 63 c4                 movslq %r12d,%rax
     d83:    45 8d 5c 24 01           lea    0x1(%r12),%r11d
     d88:    41 0f be 04 06           movsbl (%r14,%rax,1),%eax

The reason is that the address of input_buf_ptr is passed to
CopyReadLineTextSIMDHelper(..., &input_buf_ptr). If I change it to
this:

int            temp_input_buf_ptr = input_buf_ptr;
CopyReadLineTextSIMDHelper(..., &temp_input_buf_ptr);

Then there is no regression. However, I am still not completely sure
whether this is the same problem as in v10; I am planning to spend
more time debugging this.
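
The pattern described above can be sketched in isolation like this. It is a simplified illustration, not code from the patch: skip_prefix(), sum_direct() and sum_via_temp() are hypothetical names, and __attribute__((noinline)) (GCC/Clang) is used only to keep the helper call opaque. Whether a given compiler actually keeps the escaped variable on the stack depends on the surrounding code; in a loop that makes further opaque calls it may have to reload it.

```c
#include <assert.h>

/*
 * Hypothetical out-of-line helper that writes through a pointer, standing
 * in for a function like CopyReadLineTextSIMDHelper().
 */
static __attribute__((noinline)) void
skip_prefix(const char *buf, int len, int *pos)
{
    assert(pos != 0);
    while (*pos < len && buf[*pos] == ' ')
        (*pos)++;
}

/* Variant 1: the hot loop's index has its address taken. */
static int
sum_direct(const char *buf, int len)
{
    int pos = 0;
    int sum = 0;

    skip_prefix(buf, len, &pos);    /* &pos escapes here */
    while (pos < len)
        sum += buf[pos++];
    return sum;
}

/* Variant 2: only a short-lived temporary has its address taken. */
static int
sum_via_temp(const char *buf, int len)
{
    int pos = 0;
    int sum = 0;
    int tmp_pos = pos;

    skip_prefix(buf, len, &tmp_pos);
    pos = tmp_pos;                  /* pos itself never escapes */
    while (pos < len)
        sum += buf[pos++];
    return sum;
}
```

Both variants compute the same result; the only difference is which variable's address is visible to the helper.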

> A couple of random ideas:
>
> * Additional inlining for callers.  I looked around a little bit and didn't
> see any great candidates, so I don't have much faith in this, but maybe
> you'll see something I don't.

I agree with you. CopyReadLineText() is already quite a big function.

> * Disable SIMD if we are consistently getting small rows.  That won't help
> your "wide & CSV 1/3" case in all likelihood, but perhaps it'll help with
> the regression for narrow rows described elsewhere.

I implemented this: two consecutive small rows disable SIMD.
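
A minimal sketch of that heuristic, assuming two flags like the simd_enabled and simd_failed_first_vector fields in the attached patch. ScanState and note_line() are hypothetical names used only for this illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state mirroring the two flags the patch adds. */
typedef struct ScanState
{
    bool simd_enabled;
    bool failed_first_vector;
} ScanState;

/*
 * Record the outcome of one line. stopped_in_first_vector means the scan
 * hit a special character before it could skip a single full vector.
 * Two such lines in a row switch SIMD off.
 */
static void
note_line(ScanState *st, bool stopped_in_first_vector)
{
    assert(st != 0);
    if (stopped_in_first_vector && st->failed_first_vector)
        st->simd_enabled = false;
    st->failed_first_vector = stopped_in_first_vector;
}
```

A long line between two short ones resets the flag, so only consecutive short lines disable SIMD.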

> * Surround the variable initializations with "if (simd_enabled)".
> Presumably compilers are smart enough to remove those in the non-SIMD paths
> already, but it could be worth a try.

Done.

> * Add simd_enabled function parameter to CopyReadLine(),
> NextCopyFromRawFieldsInternal(), and CopyFromTextLikeOneRow(), and do the
> bool literal trick in CopyFrom{Text,CSV}OneRow().  That could encourage the
> compiler to do some additional optimizations to reduce branching.

I think we don't need this. At least the implementation with
CopyReadLineTextSIMDHelper() doesn't need this since branching will be
at the top and it will be once per line.

I think v11 looks better compared to v10. I liked the
CopyReadLineTextSIMDHelper() helper function. I also liked it being at
the top of CopyReadLineText(), not being in the scalar path. This
gives us more optimization options without affecting the scalar path.
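
To make the division of labor concrete, here is a portable emulation of what such a helper does for the text format: skip clean chunks, stop at the first special byte, and leave everything else to the scalar loop. This is a simplified sketch, not the patch itself. CHUNK, is_special() and scan_chunks() are hypothetical names, the per-byte loop merely imitates a vector compare plus bitmask, and the boundary condition is simplified compared to the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK 16    /* stand-in for sizeof(Vector8) */

static bool
is_special(char c)
{
    return c == '\n' || c == '\r' || c == '\\';
}

/*
 * Return the offset of the first special byte found in a full chunk.
 * If no full chunk contains one, return the offset at which the scalar
 * code would have to take over.
 */
static size_t
scan_chunks(const char *buf, size_t len)
{
    size_t pos = 0;

    assert(buf != NULL);

    while (len - pos >= CHUNK)
    {
        uint32_t mask = 0;

        /* Emulates a vector compare plus bitmask: one bit per byte. */
        for (int i = 0; i < CHUNK; i++)
            if (is_special(buf[pos + i]))
                mask |= (uint32_t) 1 << i;

        if (mask != 0)
        {
            /* Lowest set bit marks the first special byte in the chunk. */
            int advance = 0;

            while (((mask >> advance) & 1) == 0)
                advance++;
            return pos + advance;
        }
        pos += CHUNK;   /* whole chunk is clean, skip it */
    }
    return pos;
}
```

The scalar parser would resume at the returned offset, so correctness never depends on the chunked scan finding anything.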

Here are the new benchmark results (timings in ms). I benchmarked the
changes with both -O2 and -O3, and both with and without the 'changing
default_toast_compression to lz4' commit (65def42b1d5). The results
show that there is no regression, and that the performance improvement
is much bigger with 65def42b1d5: it is close to 2x for the text format
and more than 2x for the CSV format.

------------------------------

Benchmark results:

With 65def42b1d5:

+---------------------------------------------------------+
|                    Optimization: -O2                    |
+--------------------------+--------------+---------------+
|                          |     Text     |      CSV      |
+--------------------------+------+-------+-------+-------+
|           WIDE           | None |  1/3  |  None |  1/3  |
+--------------------------+------+-------+-------+-------+
|        Old Master        | 4220 |  4780 |  5930 |  8250 |
+--------------------------+------+-------+-------+-------+
| Old Master + 0001 + 0002 | 2520 |  4500 |  2520 |  7800 |
+--------------------------+------+-------+-------+-------+
|                          |      |       |       |       |
+--------------------------+------+-------+-------+-------+
|                          |     Text     |      CSV      |
+--------------------------+------+-------+-------+-------+
|          NARROW          | None |  1/3  |  None |  1/3  |
+--------------------------+------+-------+-------+-------+
|        Old Master        | 9920 | 10100 | 10200 | 10470 |
+--------------------------+------+-------+-------+-------+
| Old Master + 0001 + 0002 | 9970 | 10000 | 10180 | 10350 |
+--------------------------+------+-------+-------+-------+
|                                                         |
+---------------------------------------------------------+
|                                                         |
+---------------------------------------------------------+
|                    Optimization: -O3                    |
+--------------------------+--------------+---------------+
|                          |     Text     |      CSV      |
+--------------------------+------+-------+-------+-------+
|           WIDE           | None |  1/3  |  None |  1/3  |
+--------------------------+------+-------+-------+-------+
|        Old Master        | 4100 |  4900 |  6200 |  8300 |
+--------------------------+------+-------+-------+-------+
| Old Master + 0001 + 0002 | 2470 |  4440 |  2570 |  7700 |
+--------------------------+------+-------+-------+-------+
|                          |      |       |       |       |
+--------------------------+------+-------+-------+-------+
|                          |     Text     |      CSV      |
+--------------------------+------+-------+-------+-------+
|          NARROW          | None |  1/3  |  None |  1/3  |
+--------------------------+------+-------+-------+-------+
|        Old Master        | 9530 |  9690 |  9800 | 10080 |
+--------------------------+------+-------+-------+-------+
| Old Master + 0001 + 0002 | 9350 |  9450 |  9700 | 10000 |
+--------------------------+------+-------+-------+-------+

------------------------------

Without 65def42b1d5:

+----------------------------------------------------------+
|                     Optimization: -O2                    |
+--------------------------+---------------+---------------+
|                          |      Text     |      CSV      |
+--------------------------+-------+-------+-------+-------+
|           WIDE           |  None |  1/3  |  None |  1/3  |
+--------------------------+-------+-------+-------+-------+
|        Old Master        | 10550 | 11030 | 12250 | 14400 |
+--------------------------+-------+-------+-------+-------+
| Old Master + 0001 + 0002 |  8890 | 10700 |  8870 | 14070 |
+--------------------------+-------+-------+-------+-------+
|                          |       |       |       |       |
+--------------------------+-------+-------+-------+-------+
|                          |      Text     |      CSV      |
+--------------------------+-------+-------+-------+-------+
|          NARROW          |  None |  1/3  |  None |  1/3  |
+--------------------------+-------+-------+-------+-------+
|        Old Master        |  9921 | 10205 | 10123 | 10420 |
+--------------------------+-------+-------+-------+-------+
| Old Master + 0001 + 0002 |  9880 | 10070 | 10150 | 10400 |
+--------------------------+-------+-------+-------+-------+
|                                                          |
+----------------------------------------------------------+
|                                                          |
+----------------------------------------------------------+
|                     Optimization: -O3                    |
+--------------------------+---------------+---------------+
|                          |      Text     |      CSV      |
+--------------------------+-------+-------+-------+-------+
|           WIDE           |  None |  1/3  |  None |  1/3  |
+--------------------------+-------+-------+-------+-------+
|        Old Master        | 10500 | 11100 | 12600 | 14580 |
+--------------------------+-------+-------+-------+-------+
| Old Master + 0001 + 0002 |  8900 | 10660 |  8860 | 13990 |
+--------------------------+-------+-------+-------+-------+
|                          |       |       |       |       |
+--------------------------+-------+-------+-------+-------+
|                          |      Text     |      CSV      |
+--------------------------+-------+-------+-------+-------+
|          NARROW          |  None |  1/3  |  None |  1/3  |
+--------------------------+-------+-------+-------+-------+
|        Old Master        |  9600 |  9700 |  9800 | 10150 |
+--------------------------+-------+-------+-------+-------+
| Old Master + 0001 + 0002 |  9300 |  9470 |  9600 |  9880 |
+--------------------------+-------+-------+-------+-------+

--
Regards,
Nazir Bilal Yavuz
Microsoft


Attachments:

  [text/x-patch] v11-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (9.4K, 2-v11-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
  download | inline diff:
From 7acaeb3201ae4ae279bf8b25641bea7f8cb92cbe Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Wed, 4 Mar 2026 17:28:54 +0300
Subject: [PATCH v11] Speed up COPY FROM text/CSV parsing using SIMD

This patch disables SIMD when SIMD encounters a special character which
is neither EOF nor EOL.

Author: Shinya Kato <[email protected]>
Author: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Kazar Ayoub <[email protected]>
Reviewed-by: Nathan Bossart <[email protected]>
Reviewed-by: Neil Conway <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Manni Wood <[email protected]>
Reviewed-by: Mark Wong <[email protected]>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
---
 src/backend/commands/copyfrom.c          |   4 +
 src/backend/commands/copyfromparse.c     | 222 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h |   4 +
 3 files changed, 223 insertions(+), 7 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2f42f55e229..2aa52810ff1 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1747,6 +1747,10 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
 
+	/* Initialize SIMD */
+	cstate->simd_enabled = true;
+	cstate->simd_failed_first_vector = false;
+
 	/*
 	 * Allocate buffers for the input pipeline.
 	 *
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index fbd13353efc..70e1a5a0410 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 
@@ -158,6 +159,12 @@ static pg_attribute_always_inline bool NextCopyFromRawFieldsInternal(CopyFromSta
 																	 int *nfields,
 																	 bool is_csv);
 
+/* SIMD functions */
+#ifndef USE_NO_SIMD
+static bool CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
+									   bool *temp_hit_eof, int *temp_input_buf_ptr);
+#endif
+
 
 /* Low-level communications functions */
 static int	CopyGetData(CopyFromState cstate, void *databuf,
@@ -1310,6 +1317,182 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 	return result;
 }
 
+#ifndef USE_NO_SIMD
+/*
+ * Use SIMD instructions to efficiently scan the input buffer for special
+ * characters (e.g., newline, carriage return, quote, and escape). This is
+ * faster than byte-by-byte iteration, especially on large buffers.
+ *
+ * Note that SIMD may become slower when the input contains many special
+ * characters. To avoid this regression, we disable SIMD for the rest of the
+ * input once we encounter a special character which is neither EOF nor EOL.
+ * SIMD is also disabled after two consecutive lines that are too short for
+ * SIMD to process a full-sized Vector8.
+ */
+static bool
+CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv, bool *temp_hit_eof, int *temp_input_buf_ptr)
+{
+	char		quotec = '\0';
+	char		escapec = '\0';
+	char	   *copy_input_buf;
+	int			input_buf_ptr;
+	int			copy_buf_len;
+	bool		result = false;
+	bool		unique_escapec = false;
+	bool		first_vector = true;
+	Vector8		nl = vector8_broadcast('\n');
+	Vector8		cr = vector8_broadcast('\r');
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		quote = vector8_broadcast(0);
+	Vector8		escape = vector8_broadcast(0);
+
+	if (is_csv)
+	{
+		quotec = cstate->opts.quote[0];
+		escapec = cstate->opts.escape[0];
+
+		quote = vector8_broadcast(quotec);
+		if (quotec != escapec)
+		{
+			unique_escapec = true;
+			escape = vector8_broadcast(escapec);
+		}
+	}
+
+	/* For a little extra speed we copy these into local variables */
+	copy_input_buf = cstate->input_buf;
+	input_buf_ptr = cstate->input_buf_index;
+	copy_buf_len = cstate->input_buf_len;
+
+	while (true)
+	{
+		/* Load more data if needed */
+		if (sizeof(Vector8) >= copy_buf_len - input_buf_ptr)
+		{
+			REFILL_LINEBUF;
+
+			CopyLoadInputBuf(cstate);
+			/* update our local variables */
+			*temp_hit_eof = cstate->input_reached_eof;
+			input_buf_ptr = cstate->input_buf_index;
+			copy_buf_len = cstate->input_buf_len;
+
+			/*
+			 * If we are completely out of data, break out of the loop,
+			 * reporting EOF.
+			 */
+			if (INPUT_BUF_BYTES(cstate) <= 0)
+			{
+				result = true;
+				break;
+			}
+		}
+
+		if (copy_buf_len - input_buf_ptr > sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match = vector8_broadcast(0);
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+			if (is_csv)
+			{
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				match = vector8_or(match, vector8_eq(chunk, quote));
+				if (unique_escapec)
+					match = vector8_or(match, vector8_eq(chunk, escape));
+			}
+			else
+			{
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				match = vector8_or(match, vector8_eq(chunk, bs));
+			}
+
+			/* Check if we found any special characters */
+			if (vector8_is_highbit_set(match))
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				uint32		mask;
+				int			advance;
+				char		c1,
+							c2;
+				bool		simd_hit_eol,
+							simd_hit_eof;
+
+				mask = vector8_highbit_mask(match);
+				advance = pg_rightmost_one_pos32(mask);
+
+				input_buf_ptr += advance;
+				c1 = copy_input_buf[input_buf_ptr];
+
+				/*
+				 * Since we stopped within the chunk and ((copy_buf_len -
+				 * input_buf_ptr) > sizeof(Vector8)) is true,
+				 * copy_input_buf[input_buf_ptr + 1] is guaranteed to be
+				 * readable.
+				 */
+				c2 = copy_input_buf[input_buf_ptr + 1];
+
+				simd_hit_eof = (c1 == '\\' && c2 == '.' && !is_csv);
+				simd_hit_eol = (c1 == '\r' || c1 == '\n');
+
+				/*
+				 * Do not disable SIMD when we hit EOL or EOF characters. In
+				 * practice, it does not matter for EOF because parsing ends
+				 * there, but we keep the behavior consistent.
+				 */
+				if (!(simd_hit_eof || simd_hit_eol))
+					cstate->simd_enabled = false;
+
+				/*
+				 * We stopped within the first vector of this line. This means
+				 * the line is not long enough to skip a full-sized vector. If
+				 * this happens on two consecutive lines, then disable
+				 * SIMD.
+				 */
+				if (first_vector)
+				{
+					if (cstate->simd_failed_first_vector)
+						cstate->simd_enabled = false;
+
+					cstate->simd_failed_first_vector = true;
+				}
+
+				break;
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				input_buf_ptr += sizeof(Vector8);
+				first_vector = false;
+			}
+		}
+
+		/*
+		 * Although we loaded more data, there are not enough characters to
+		 * fill a full-sized vector. This doesn't mean that we encountered a
+		 * line that is too short to fill a full-sized vector.
+		 *
+		 * Scalar code will handle the rest for this line. Then, SIMD will
+		 * continue from the next line.
+		 */
+		else
+		{
+			first_vector = false;
+			break;
+		}
+	}
+
+	cstate->simd_failed_first_vector = first_vector;
+	*temp_input_buf_ptr = input_buf_ptr;
+	return result;
+}
+#endif
+
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
@@ -1338,6 +1521,38 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			escapec = '\0';
 	}
 
+	/* input_buf_ptr will be used in the SIMD Helper function */
+	input_buf_ptr = cstate->input_buf_index;
+
+#ifndef USE_NO_SIMD
+	/* First try to run SIMD, then continue with the scalar path */
+	if (cstate->simd_enabled)
+	{
+		int			temp_input_buf_ptr = input_buf_ptr;
+		bool		temp_hit_eof = false;
+
+		result = CopyReadLineTextSIMDHelper(cstate, is_csv, &temp_hit_eof,
+											&temp_input_buf_ptr);
+		input_buf_ptr = temp_input_buf_ptr;
+		hit_eof = temp_hit_eof;
+
+		/* Short exit from SIMD */
+		if (result)
+		{
+			/*
+			 * Transfer any still-uncopied data to line_buf.
+			 */
+			REFILL_LINEBUF;
+
+			return result;
+		}
+	}
+#endif
+
+	/* For a little extra speed we copy these into local variables */
+	copy_input_buf = cstate->input_buf;
+	copy_buf_len = cstate->input_buf_len;
+
 	/*
 	 * The objective of this loop is to transfer the entire next input line
 	 * into line_buf.  Hence, we only care for detecting newlines (\r and/or
@@ -1359,14 +1574,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	 * character to examine; any characters from input_buf_index to
 	 * input_buf_ptr have been determined to be part of the line, but not yet
 	 * transferred to line_buf.
-	 *
-	 * For a little extra speed within the loop, we copy input_buf and
-	 * input_buf_len into local variables.
 	 */
-	copy_input_buf = cstate->input_buf;
-	input_buf_ptr = cstate->input_buf_index;
-	copy_buf_len = cstate->input_buf_len;
-
 	for (;;)
 	{
 		int			prev_raw_ptr;
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index f892c343157..4a748df8ac8 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -89,6 +89,10 @@ typedef struct CopyFromStateData
 	const char *cur_attval;		/* current att value for error messages */
 	bool		relname_only;	/* don't output line number, att, etc. */
 
+	/* SIMD variables */
+	bool		simd_enabled;
+	bool		simd_failed_first_vector;
+
 	/*
 	 * Working state
 	 */
-- 
2.47.3
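
For readers following the patch, the chunk-skipping idea can be sketched
in portable C. This is only an illustration under stated assumptions: it
works on 8-byte words with bit tricks rather than the Vector8 operations
the patch uses, and broadcast(), has_zero_byte(), chunk_has() and
skip_plain_chunks() are hypothetical names, not PostgreSQL functions. It
stops at the first chunk that contains a special character and leaves the
rest of the line to a scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK sizeof(uint64_t)

/* Replicate one byte value into all eight bytes of a word. */
static inline uint64_t
broadcast(unsigned char c)
{
	return UINT64_C(0x0101010101010101) * c;
}

/* Nonzero if and only if at least one byte of v is zero. */
static inline int
has_zero_byte(uint64_t v)
{
	return ((v - UINT64_C(0x0101010101010101)) & ~v &
			UINT64_C(0x8080808080808080)) != 0;
}

/* Nonzero if any byte of chunk equals c (XOR turns a match into zero). */
static inline int
chunk_has(uint64_t chunk, char c)
{
	return has_zero_byte(chunk ^ broadcast((unsigned char) c));
}

/*
 * Return how many leading bytes of buf[0..len) lie in whole chunks that
 * contain no newline, carriage return, quote, or escape character.
 * The caller is expected to continue byte-by-byte from that offset.
 */
static size_t
skip_plain_chunks(const char *buf, size_t len, char quote, char escape)
{
	size_t		pos = 0;

	assert(buf != NULL);

	while (len - pos >= CHUNK)
	{
		uint64_t	chunk;

		/* memcpy avoids unaligned-access problems */
		memcpy(&chunk, buf + pos, CHUNK);

		if (chunk_has(chunk, '\n') || chunk_has(chunk, '\r') ||
			chunk_has(chunk, quote) || chunk_has(chunk, escape))
			break;

		pos += CHUNK;
	}

	return pos;
}
```

A tail shorter than one chunk is never examined here, which mirrors the
comment above about scalar code handling input that does not fill a full
vector.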



^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-05 21:25  Andrew Dunstan <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 1 reply; 114+ messages in thread

From: Andrew Dunstan @ 2026-03-05 21:25 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers


On 2026-03-04 We 10:15 AM, Nazir Bilal Yavuz wrote:
> Hi,
>
> On Mon, 2 Mar 2026 at 22:55, Nathan Bossart <[email protected]> wrote:
>> On Wed, Feb 25, 2026 at 05:24:27PM +0300, Nazir Bilal Yavuz wrote:
>>> If anyone has any suggestions/ideas, please let me know!
> I am able to fix the problem. My first assumption was that the
> branching of SIMD code caused that problem, so I moved SIMD code to
> the CopyReadLineTextSIMDHelper() function. Then I moved this
> CopyReadLineTextSIMDHelper() to top of CopyReadLineText(), by doing
> that we won't have any branching in the non-SIMD (scalar) code path.
> This didn't solve the problem and then I realized that even though I
> disable SIMD code path with 'if (false)', there is still regression
> but if I comment all of the 'if (cstate->simd_enabled)' branch, then
> there is no regression at all.
>
> To find out more, I compared assembly outputs of both and found out
> the possible reason. What I understood is that the compiler can't
> promote a variable to register, instead these variables live in the
> stack; which is slower. Please see the two different assembly outputs:
>
> Slow code:
>
>          c = copy_input_buf[input_buf_ptr++];
>       db0:    48 8b 55 b8              mov    -0x48(%rbp),%rdx
>       db4:    48 63 c6                 movslq %esi,%rax
>       db7:    44 8d 66 01              lea    0x1(%rsi),%r12d
>       dbb:    44 89 65 cc              mov    %r12d,-0x34(%rbp)
>       dbf:    0f be 14 02              movsbl (%rdx,%rax,1),%edx
>
> Fast code:
>
>          c = copy_input_buf[input_buf_ptr++];
>       d80:    49 63 c4                 movslq %r12d,%rax
>       d83:    45 8d 5c 24 01           lea    0x1(%r12),%r11d
>       d88:    41 0f be 04 06           movsbl (%r14,%rax,1),%eax
>
> And the reason for that is sending the address of input_buf_ptr to a
> CopyReadLineTextSIMDHelper(..., &input_buf_ptr). If I change it to
> this:
>
> int            temp_input_buf_ptr = input_buf_ptr;
> CopyReadLineTextSIMDHelper(..., &temp_input_buf_ptr);
>
> Then there is no regression. However, I am still not completely sure
> if that is the same problem in the v10, I am planning to spend more
> time debugging this.
>
>> A couple of random ideas:
>>
>> * Additional inlining for callers.  I looked around a little bit and didn't
>> see any great candidates, so I don't have much faith in this, but maybe
>> you'll see something I don't.
> I agree with you. CopyReadLineText() is already quite a big function.
>
>> * Disable SIMD if we are consistently getting small rows.  That won't help
>> your "wide & CSV 1/3" case in all likelihood, but perhaps it'll help with
>> the regression for narrow rows described elsewhere.
> I implemented this: two consecutive small rows disable SIMD.
>
>> * Surround the variable initializations with "if (simd_enabled)".
>> Presumably compilers are smart enough to remove those in the non-SIMD paths
>> already, but it could be worth a try.
> Done.
>
>> * Add simd_enabled function parameter to CopyReadLine(),
>> NextCopyFromRawFieldsInternal(), and CopyFromTextLikeOneRow(), and do the
>> bool literal trick in CopyFrom{Text,CSV}OneRow().  That could encourage the
>> compiler to do some additional optimizations to reduce branching.
> I think we don't need this. At least the implementation with
> CopyReadLineTextSIMDHelper() doesn't need this since branching will be
> at the top and it will be once per line.
>
> I think v11 looks better compared to v10. I liked the
> CopyReadLineTextSIMDHelper() helper function. I also liked it being at
> the top of CopyReadLineText(), not being in the scalar path. This
> gives us more optimization options without affecting the scalar path.
>
> Here are the new benchmark results, I benchmarked the changes with
> both -O2 and -O3 and also both with and without 'changing
> default_toast_compression to lz4' commit (65def42b1d5). Benchmark
> results show that there is no regression and the performance
> improvement is much bigger with 65def42b1d5, it is close to 2x for
> text format and more than 2x for the csv format.


I spent some time exploring different ideas for improving this, but 
found none that didn't cause regression in some cases, so good to go 
from my POV.


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com
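
The temporary-variable workaround quoted above can be illustrated with a
small standalone sketch. It assumes a simplified stand-in for the helper:
fast_skip() and find_line_end() are hypothetical names, and memchr()
replaces the real SIMD work. Only the copy's address is passed out, so
the hot-loop index never has its address taken. Whether a given compiler
then keeps it in a register depends on the compiler and flags; the thread
reports only what was observed with one build.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for the SIMD helper: advance *pos in 8-byte
 * steps while the next 8 bytes contain no newline.
 */
static void
fast_skip(const char *buf, int len, int *pos)
{
	while (len - *pos >= 8 && memchr(buf + *pos, '\n', 8) == NULL)
		*pos += 8;
}

/* Return the index just past the first newline at or after start. */
static int
find_line_end(const char *buf, int len, int start)
{
	int			input_buf_ptr = start;		/* hot-loop index */
	int			temp_input_buf_ptr = input_buf_ptr; /* only this escapes */

	assert(start >= 0 && start <= len);

	fast_skip(buf, len, &temp_input_buf_ptr);
	input_buf_ptr = temp_input_buf_ptr;

	/* Scalar loop: input_buf_ptr itself was never passed by address. */
	while (input_buf_ptr < len)
	{
		char		c = buf[input_buf_ptr++];

		if (c == '\n')
			break;
	}

	return input_buf_ptr;
}
```

Passing &input_buf_ptr directly would be functionally identical; the
difference discussed in the thread is purely in the generated code.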






^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-06 16:59  Manni Wood <[email protected]>
  parent: Andrew Dunstan <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Manni Wood @ 2026-03-06 16:59 UTC (permalink / raw)
  To: Andrew Dunstan <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; Nathan Bossart <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hello.

I ran Nazir's v11 patch on my x86 tower PC and my arm raspberry pi using
the same build I've been using: meson with "debugoptimized", which
translates to "-g -O2" gcc flags.

x86 NARROW old master (18bcdb75)
TXT :                 25909.060500 ms
CSV :                 28137.591250 ms
TXT with 1/3 escapes: 27794.177000 ms
CSV with 1/3 quotes:  34541.704750 ms

x86 NARROW v10
TXT :                 26416.331500 ms  -1.957890% regression
CSV :                 25318.727500 ms  10.018142% improvement
TXT with 1/3 escapes: 28608.007500 ms  -2.928061% regression
CSV with 1/3 quotes:  32805.627750 ms  5.026032% improvement

x86 NARROW v11
TXT :                 27212.945750 ms  -5.032545% regression
CSV :                 26985.971250 ms  4.092817% improvement
TXT with 1/3 escapes: 27216.510000 ms  2.078374% improvement
CSV with 1/3 quotes:  32817.267500 ms  4.992334% improvement


x86 WIDE old master (18bcdb75)
TXT :                 28778.426500 ms
CSV :                 35671.908000 ms
TXT with 1/3 escapes: 32441.549750 ms
CSV with 1/3 quotes:  47024.416000 ms

x86 WIDE v10
TXT :                 23067.046750 ms  19.846046% improvement
CSV :                 23259.092250 ms  34.797174% improvement
TXT with 1/3 escapes: 31796.098250 ms  1.989583% improvement
CSV with 1/3 quotes:  42925.792250 ms  8.715948% improvement

x86 WIDE v11
TXT :                 22571.305750 ms  21.568659% improvement
CSV :                 22711.524750 ms  36.332184% improvement
TXT with 1/3 escapes: 29236.453000 ms  9.879604% improvement
CSV with 1/3 quotes:  40022.110750 ms  14.890786% improvement



arm NARROW old master (18bcdb75)
TXT :                 10997.568250 ms
CSV :                 10797.549000 ms
TXT with 1/3 escapes: 10299.047000 ms
CSV with 1/3 quotes:  12559.385750 ms

arm NARROW v10
TXT :                 10467.816750 ms  4.816988% improvement
CSV :                 9986.288000 ms  7.513381% improvement
TXT with 1/3 escapes: 10323.173750 ms  -0.234262% regression
CSV with 1/3 quotes:  11843.611750 ms  5.699116% improvement

arm NARROW v11
TXT :                 10340.966250 ms  5.970429% improvement
CSV :                 10224.399500 ms  5.308144% improvement
TXT with 1/3 escapes: 10438.216750 ms  -1.351288% regression
CSV with 1/3 quotes:  11865.934000 ms  5.521383% improvement


arm WIDE old master (18bcdb75)
TXT :                 11825.771250 ms
CSV :                 13907.074000 ms
TXT with 1/3 escapes: 13430.691250 ms
CSV with 1/3 quotes:  17557.954500 ms

arm WIDE v10
TXT :                 9064.959000 ms  23.345727% improvement
CSV :                 9019.553250 ms  35.144134% improvement
TXT with 1/3 escapes: 12344.497250 ms  8.087402% improvement
CSV with 1/3 quotes:  15495.863750 ms  11.744482% improvement

arm WIDE v11
TXT :                 9001.442250 ms  23.882831% improvement
CSV :                 8940.928750 ms  35.709490% improvement
TXT with 1/3 escapes: 12049.668500 ms  10.282589% improvement
CSV with 1/3 quotes:  15277.843250 ms  12.986201% improvement
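
For anyone reproducing these figures, the percentages above are
consistent with (baseline - patched) / baseline * 100, where a negative
value is a regression. A minimal sketch follows; improvement_pct() is a
hypothetical helper name, not part of any benchmark script posted in
this thread.

```c
#include <assert.h>

/*
 * Percentage improvement of patched_ms over baseline_ms.
 * Positive means the patched build was faster; negative is a regression.
 */
static double
improvement_pct(double baseline_ms, double patched_ms)
{
	assert(baseline_ms > 0.0);

	return (baseline_ms - patched_ms) / baseline_ms * 100.0;
}
```

For example, the x86 NARROW TXT row for v10 (25909.060500 ms on master,
26416.331500 ms patched) gives roughly -1.96, matching the table.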

Best,

-Manni

-- 
-- Manni Wood EDB: https://www.enterprisedb.com


^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-06 17:39  Nazir Bilal Yavuz <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-06 17:39 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Nathan Bossart <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Fri, 6 Mar 2026 at 20:00, Manni Wood <[email protected]> wrote:
>
> Hello.
>
> I ran Nazir's v11 patch on my x86 tower PC and my arm raspberry pi using the same build I've been using: meson with "debugoptimized", which translates to "-g -O2" gcc flags.

Thanks for the benchmark! The results look nice.

One question: does your benchmark include the 34dfca2934 LZ4 commit,
and is LZ4 enabled on your system?

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-06 18:13  Manni Wood <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Manni Wood @ 2026-03-06 18:13 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Nathan Bossart <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Fri, Mar 6, 2026 at 11:39 AM Nazir Bilal Yavuz <[email protected]>
wrote:

> Hi,
>
> On Fri, 6 Mar 2026 at 20:00, Manni Wood <[email protected]>
> wrote:
> >
> > Hello.
> >
> > I ran Nazir's v11 patch on my x86 tower PC and my arm raspberry pi using
> the same build I've been using: meson with "debugoptimized", which
> translates to "-g -O2" gcc flags.
>
> Thanks for the benchmark! The results look nice.
>
> One question: does your benchmark include the 34dfca2934 LZ4 commit,
> and is LZ4 enabled on your system?
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft
>

Hello, Nazir!

When I ran `meson setup build --buildtype=debugoptimized` on both my x86
machine and my arm machine, the response on both was:

"External libraries
"  lz4                      : NO"

However, I did not remove commit 34dfca2934 from any of my Postgres builds;
I left that commit in place.

Let me know if that helps!

And I agree, the results look nice.

Best,

-Manni
-- 
-- Manni Wood EDB: https://www.enterprisedb.com


^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-06 18:55  Nazir Bilal Yavuz <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-06 18:55 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Nathan Bossart <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi Manni!

On Fri, 6 Mar 2026 at 21:13, Manni Wood <[email protected]> wrote:
>
> When I ran `meson setup build --buildtype=debugoptimized` on both my x86 machine and my arm machine, the response on both was:
>
> "External libraries
> "  lz4                      : NO"
>
> However, I did not remove commit 34dfca2934 from any of my Postgres builds; I left that commit in place.
>
> Let me know if that helps!

That definitely helps, thanks! If you have a chance, could you also
run the benchmark with LZ4 enabled? I expect you may see significantly
better performance, similar to what I observed.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-06 21:25  Manni Wood <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Manni Wood @ 2026-03-06 21:25 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Nathan Bossart <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Fri, Mar 6, 2026 at 12:55 PM Nazir Bilal Yavuz <[email protected]>
wrote:

> Hi Manni!
>
> On Fri, 6 Mar 2026 at 21:13, Manni Wood <[email protected]>
> wrote:
> >
> > When I ran `meson setup build --buildtype=debugoptimized` on both my x86
> machine and my arm machine, the response on both was:
> >
> > "External libraries
> > "  lz4                      : NO"
> >
> > However, I did not remove commit 34dfca2934 from any of my Postgres
> builds; I left that commit in place.
> >
> > Let me know if that helps!
>
> That definitely helps, thanks! If you have a chance, could you also
> run the benchmark with LZ4 enabled? I expect you may see significantly
> better performance, similar to what I observed.
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft
>

Hi, Nazir.

Well, golly! Look at these numbers. Old master with no lz4, your v11 patch
with no lz4, and then your v11 patch with lz4 compiled in.

(Aside: I assume everything is good for my lz4 build after installing the
lz4 dev library and seeing this with my meson config:

"  External libraries
"    lz4                      : YES 1.9.4"

and this from the db itself:

$ ./bin/psql -U mwood -d postgres -c 'show default_toast_compression'
 default_toast_compression
---------------------------
 lz4
)

x86 NARROW old master
TXT :                 25909.060500 ms
CSV :                 28137.591250 ms
TXT with 1/3 escapes: 27794.177000 ms
CSV with 1/3 quotes:  34541.704750 ms

x86 NARROW v11
TXT :                 27212.945750 ms  -5.032545% regression
CSV :                 26985.971250 ms  4.092817% improvement
TXT with 1/3 escapes: 27216.510000 ms  2.078374% improvement
CSV with 1/3 quotes:  32817.267500 ms  4.992334% improvement

x86 NARROW v11 lz4
TXT :                 26471.776500 ms  -2.171889% regression
CSV :                 25607.026250 ms  8.993538% improvement
TXT with 1/3 escapes: 28628.729750 ms  -3.002617% regression
CSV with 1/3 quotes:  34729.006750 ms  -0.542249% regression


x86 WIDE old master
TXT :                 28778.426500 ms
CSV :                 35671.908000 ms
TXT with 1/3 escapes: 32441.549750 ms
CSV with 1/3 quotes:  47024.416000 ms

x86 WIDE v11
TXT :                 22571.305750 ms  21.568659% improvement
CSV :                 22711.524750 ms  36.332184% improvement
TXT with 1/3 escapes: 29236.453000 ms  9.879604% improvement
CSV with 1/3 quotes:  40022.110750 ms  14.890786% improvement

x86 WIDE v11 lz4
TXT :                 8032.912750 ms  72.087033% improvement
CSV :                 8047.098000 ms  77.441358% improvement
TXT with 1/3 escapes: 15428.139500 ms  52.443272% improvement
CSV with 1/3 quotes:  27517.084500 ms  41.483410% improvement



arm NARROW old master
TXT :                 10997.568250 ms
CSV :                 10797.549000 ms
TXT with 1/3 escapes: 10299.047000 ms
CSV with 1/3 quotes:  12559.385750 ms

arm NARROW v11
TXT :                 10340.966250 ms  5.970429% improvement
CSV :                 10224.399500 ms  5.308144% improvement
TXT with 1/3 escapes: 10438.216750 ms  -1.351288% regression
CSV with 1/3 quotes:  11865.934000 ms  5.521383% improvement

arm NARROW v11 lz4
TXT :                 9783.737000 ms  11.037270% improvement
CSV :                 10122.890750 ms  6.248254% improvement
TXT with 1/3 escapes: 10298.780250 ms  0.002590% improvement
CSV with 1/3 quotes:  11738.992250 ms  6.532115% improvement


arm WIDE old master
TXT :                 11825.771250 ms
CSV :                 13907.074000 ms
TXT with 1/3 escapes: 13430.691250 ms
CSV with 1/3 quotes:  17557.954500 ms

arm WIDE v11
TXT :                 9001.442250 ms  23.882831% improvement
CSV :                 8940.928750 ms  35.709490% improvement
TXT with 1/3 escapes: 12049.668500 ms  10.282589% improvement
CSV with 1/3 quotes:  15277.843250 ms  12.986201% improvement

arm WIDE v11 lz4
TXT :                 3186.825500 ms  73.051859% improvement
CSV :                 3142.526500 ms  77.403396% improvement
TXT with 1/3 escapes: 6180.176000 ms  53.984677% improvement
CSV with 1/3 quotes:  9460.505500 ms  46.118407% improvement

Cheers,

-Manni

-- 
-- Manni Wood EDB: https://www.enterprisedb.com


^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-06 23:13  Nathan Bossart <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nathan Bossart @ 2026-03-06 23:13 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; Andrew Dunstan <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Fri, Mar 06, 2026 at 03:25:46PM -0600, Manni Wood wrote:
> Well, golly! Look at these numbers. Old master with no lz4, your v11 patch
> with no lz4, and then your v11 patch with lz4 compiled in.

I'm appreciative of all the benchmarking that you and others are doing, but
wouldn't we be more interested in the difference between "old master with
lz4" and "v11 with lz4"?  Else, we have multiple variables in play.

-- 
nathan





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-06 23:31  KAZAR Ayoub <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: KAZAR Ayoub @ 2026-03-06 23:31 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; Nazir Bilal Yavuz <[email protected]>; Andrew Dunstan <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Sat, Mar 7, 2026 at 12:13 AM Nathan Bossart <[email protected]>
wrote:

> On Fri, Mar 06, 2026 at 03:25:46PM -0600, Manni Wood wrote:
> > Well, golly! Look at these numbers. Old master with no lz4, your v11
> patch
> > with no lz4, and then your v11 patch with lz4 compiled in.
>
> I'm appreciative of all the benchmarking that you and others are doing, but
> wouldn't we be more interested in the difference between "old master with
> lz4" and "v11 with lz4"?  Else, we have multiple variables in play.
>
Yes, I agree, because the lz4 effect doesn't prove anything about the SIMD
patch itself, right? So basically a comparison for the SIMD effect should
be "master with/out lz4 vs. patched with/out lz4, respectively, and nothing
more!" Is this correct?

Regards,
Ayoub


^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-08 10:31  Nazir Bilal Yavuz <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-08 10:31 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Sat, 7 Mar 2026 at 02:31, KAZAR Ayoub <[email protected]> wrote:
>
> On Sat, Mar 7, 2026 at 12:13 AM Nathan Bossart <[email protected]> wrote:
>>
>> On Fri, Mar 06, 2026 at 03:25:46PM -0600, Manni Wood wrote:
>> > Well, golly! Look at these numbers. Old master with no lz4, your v11 patch
>> > with no lz4, and then your v11 patch with lz4 compiled in.
>>
>> I'm appreciative of all the benchmarking that you and others are doing, but
>> wouldn't we be more interested in the difference between "old master with
>> lz4" and "v11 with lz4"?  Else, we have multiple variables in play.
>
> Yes I agree because the lz4 effect doesn't prove anything for the SIMD patch itself right ? So basically a comparison for the SIMD effect should be "master with/out lz4 vs patched with/out lz4, respectively and nothing more!", is this correct ?

Yes, I think 'master with/out lz4 vs patched with/out lz4,
respectively' is enough to determine the effect of the SIMD patch.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-08 19:45  Manni Wood <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Manni Wood @ 2026-03-08 19:45 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Nathan Bossart <[email protected]>; Andrew Dunstan <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Sun, Mar 8, 2026 at 5:31 AM Nazir Bilal Yavuz <[email protected]> wrote:

> Hi,
>
> On Sat, 7 Mar 2026 at 02:31, KAZAR Ayoub <[email protected]> wrote:
> >
> > On Sat, Mar 7, 2026 at 12:13 AM Nathan Bossart <[email protected]>
> wrote:
> >>
> >> On Fri, Mar 06, 2026 at 03:25:46PM -0600, Manni Wood wrote:
> >> > Well, golly! Look at these numbers. Old master with no lz4, your v11
> patch
> >> > with no lz4, and then your v11 patch with lz4 compiled in.
> >>
> >> I'm appreciative of all the benchmarking that you and others are doing,
> but
> >> wouldn't we be more interested in the difference between "old master
> with
> >> lz4" and "v11 with lz4"?  Else, we have multiple variables in play.
> >
> > Yes I agree because the lz4 effect doesn't prove anything for the SIMD
> patch itself right ? So basically a comparison for the SIMD effect should
> be "master with/out lz4 vs patched with/out lz4, respectively and nothing
> more!", is this correct ?
>
> Yes, I think 'master with/out lz4 vs patched with/out lz4,
> respectively' is enough to determine the effect of the SIMD patch.
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft
>

Hello!

As requested, here are some numbers based on the latest master but with the
copy code inlining excised (`git revert
dc592a41557b072178f1798700bf9c69cd8e4235`), compared to master with copy
code inlining left in place and the v11 patch applied.
Both results have lz4 compression in place.

I have not run numbers without lz4. I assume I could use the two postgres
instances that I have compiled with lz4, but just set
`default_toast_compression = pglz` in postgresql.conf for both instances.
Let me know if that is a mistaken assumption on my part.

arm NARROW master without inline with lz4
TXT :                 10362.799500 ms
CSV :                 10288.791000 ms
TXT with 1/3 escapes: 10411.416250 ms
CSV with 1/3 quotes:  12318.385750 ms

arm NARROW master with inline with lz4 with v11patch
TXT :                 10317.125750 ms  0.440747% improvement
CSV :                 10418.020250 ms -1.256020% regression
TXT with 1/3 escapes: 10188.319500 ms  2.142809% improvement
CSV with 1/3 quotes:  12032.964500 ms  2.317035% improvement


arm WIDE master without inline with lz4
TXT :                  5608.834500 ms
CSV :                  8115.155000 ms
TXT with 1/3 escapes:  7037.290500 ms
CSV with 1/3 quotes:  10894.615750 ms

arm WIDE master with inline with lz4 with v11patch
TXT :                  3190.268750 ms  43.120647% improvement
CSV :                  3135.177000 ms  61.366394% improvement
TXT with 1/3 escapes:  6373.746750 ms   9.428966% improvement
CSV with 1/3 quotes:  10336.763500 ms   5.120440% improvement



x86 NARROW master without inline with lz4
TXT :                 26701.079250 ms
CSV :                 26492.235500 ms
TXT with 1/3 escapes: 28590.508250 ms
CSV with 1/3 quotes:  34876.742750 ms

x86 NARROW master with inline with lz4 with v11patch
TXT :                 26511.747750 ms  0.709078% improvement
CSV :                 26261.269750 ms  0.871824% improvement
TXT with 1/3 escapes: 27702.964750 ms  3.104329% improvement
CSV with 1/3 quotes:  32339.393000 ms  7.275191% improvement


x86 WIDE master without inline with lz4
TXT :                 14485.563250 ms
CSV :                 21392.582000 ms
TXT with 1/3 escapes: 18081.514750 ms
CSV with 1/3 quotes:  32547.086250 ms

x86 WIDE master with inline with lz4 with v11patch
TXT :                  8080.378250 ms  44.217714% improvement
CSV :                  8283.723000 ms  61.277591% improvement
TXT with 1/3 escapes: 15054.111000 ms  16.743087% improvement
CSV with 1/3 quotes:  25668.009750 ms  21.135768% improvement
-- 
-- Manni Wood EDB: https://www.enterprisedb.com


^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-09 08:10  Nazir Bilal Yavuz <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-09 08:10 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Nathan Bossart <[email protected]>; Andrew Dunstan <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Sun, 8 Mar 2026 at 22:45, Manni Wood <[email protected]> wrote:
>
> As requested, here are some numbers based on the latest master but with the copy code inlining excised (`git revert dc592a41557b072178f1798700bf9c69cd8e4235`), compared to master with copy code inlining left in place and the v11 patch applied.
> Both results have lz4 compression in place.

Thank you for the benchmark!

> I have not run numbers without lz4. I assume I could use the two postgres instances that I have compiled with lz4, but just set `default_toast_compression = pglz` in postgresql.conf for both instances. Let me know if that is a mistaken assumption on my part.

I am a bit confused. Are you asking that for the current benchmark you
shared or future benchmarks? I assume your current benchmark has
'default_toast_compression = lz4' because your benchmark results are
very similar to my benchmark with 'default_toast_compression = lz4'
but I just wanted to make sure.

What you said about editing postgresql.conf is correct, but you need to
make this change before starting the Postgres instance with the 'pg_ctl
... start' command; otherwise it won't have an effect until you restart
the instance. Also, if you want to benchmark without the lz4 change, you
can just run "SET default_toast_compression TO 'pglz';" in psql; then
you don't need to edit postgresql.conf. Please note that this affects
only the psql session in which you ran the command. To make things
easier, you can run 'SHOW default_toast_compression;' to see the current
value of 'default_toast_compression'.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft

^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-09 13:31  Manni Wood <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Manni Wood @ 2026-03-09 13:31 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Nathan Bossart <[email protected]>; Andrew Dunstan <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Mon, Mar 9, 2026 at 3:10 AM Nazir Bilal Yavuz <[email protected]> wrote:

> Hi,
>
> On Sun, 8 Mar 2026 at 22:45, Manni Wood <[email protected]>
> wrote:
> >
> > As requested, here are some numbers based on the latest master but with
> the copy code inlining excised (`git revert
> dc592a41557b072178f1798700bf9c69cd8e4235`), compared to master with copy
> code inlining left in place and the v11 patch applied.
> > Both results have lz4 compression in place.
>
> Thank you for the benchmark!
>
> > I have not run numbers without lz4. I assume I could use the two
> postgres instances that I have compiled with lz4, but just set
> `default_toast_compression = pglz` in postgresql.conf for both instances.
> Let me know if that is a mistaken assumption on my part.
>
> I am a bit confused. Are you asking that for the current benchmark you
> shared or future benchmarks? I assume your current benchmark has
> 'default_toast_compression = lz4' because your benchmark results are
> very similar to my benchmark with 'default_toast_compression = lz4'
> but I just wanted to make sure.
>
> What you said about editing postgresql.conf is correct, but you need to
> make this change before starting the Postgres instance with the 'pg_ctl
> ... start' command; otherwise it won't take effect until you restart
> the instance. Also, if you want to benchmark without the lz4 change,
> you can just run the "SET default_toast_compression TO 'pglz';" command
> in psql; then you don't need to edit postgresql.conf. Please note that
> this affects only the psql session in which you ran the command. To
> make things easier, you can run the 'SHOW default_toast_compression;'
> command to see the current value of 'default_toast_compression'.
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft
>

Hello, Nazir!

I was being too brief.

The benchmarks I shared were absolutely with lz4 compiled in
and 'default_toast_compression = lz4' set in postgresql.conf for every
postgres instance I tested with. (Furthermore, I ran `show
default_toast_compression` via `psql` on each postgres instance to be
sure 'default_toast_compression = lz4' was really set!)

Also, all were compiled with meson using `debugoptimized`, which results in
`-g -O2`.

So those are the benchmarks that I shared.

OK, so my final question, hopefully clarified: If I run additional
benchmarks where pglz is used for default_toast_compression, is it enough
to use the instances I have already compiled with lz4 in them, but
with 'default_toast_compression = pglz' explicitly set in postgresql.conf
in a brand new data dir created by initdb? (In other words, existing data
dir deleted, then initdb run to make a new data dir, then postgresql.conf
edited to ensure 'default_toast_compression = pglz' is explicitly set, then
and only then starting up the cluster for the first time... and finally
verifying via `show default_toast_compression` for good measure.)

Or should I re-compile with the lz4-is-now-the-default commit completely
excised?

Thanks so much!

-Manni

--
-- Manni Wood EDB: https://www.enterprisedb.com

^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-09 13:43  Nazir Bilal Yavuz <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 0 replies; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-09 13:43 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: KAZAR Ayoub <[email protected]>; Nathan Bossart <[email protected]>; Andrew Dunstan <[email protected]>; Neil Conway <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi Manni!

On Mon, 9 Mar 2026 at 16:31, Manni Wood <[email protected]> wrote:
>
> I was being too brief.
>
> The benchmarks I shared were absolutely with lz4 compiled in and 'default_toast_compression = lz4' set in postgresql.conf for every postgres instance I tested with. (Furthermore, I ran `show default_toast_compression` via `psql` on each postgres instance to be sure 'default_toast_compression = lz4' was really set!)
>
> Also, all were compiled using meson using `debugoptimized` which results in `-g -O2`.
>
> So those are the benchmarks that I shared.

Thanks for the clarification.

> OK, so my final question, hopefully clarified: If I run additional benchmarks where pglz is used for default_toast_compression, is it enough to use the instances I have already compiled with lz4 in them, but with 'default_toast_compression = pglz' explicitly set in postgresql.conf in a brand new data dir created by initdb? (In other words, existing data dir deleted, then initdb run to make a new data dir, then postgresql.conf edited to ensure 'default_toast_compression = pglz' is explicitly set, then and only then starting up the cluster for the first time... and finally verifying via `show default_toast_compression` for good measure.)
>
> Or should I re-compile with the lz4-is-now-the-default commit completely excised?

Yes, it is clear now; thanks. You don't need to compile without the
lz4-is-now-the-default commit. You can compile with the lz4 commit and
set 'default_toast_compression = pglz' in postgresql.conf as you
described. This should be enough.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-09 18:25  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 2 replies; 114+ messages in thread

From: Nathan Bossart @ 2026-03-09 18:25 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Mar 04, 2026 at 06:15:53PM +0300, Nazir Bilal Yavuz wrote:
> +#ifndef USE_NO_SIMD
> +static bool CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
> +									   bool *temp_hit_eof, int *temp_input_buf_ptr);
> +#endif

Should we inline this, too?

> +				/*
> +				 * Do not disable SIMD when we hit EOL or EOF characters. In
> +				 * practice, it does not matter for EOF because parsing ends
> +				 * there, but we keep the behavior consistent.
> +				 */
> +				if (!(simd_hit_eof || simd_hit_eol))
> +					cstate->simd_enabled = false;

nitpick: I would personally avoid disabling it for EOF.  It probably
doesn't amount to much, but I don't see any point in the extra
complexity/work solely for consistency.

> +				/*
> +				 * We encountered a EOL or EOF on the first vector. This means
> +				 * lines are not long enough to skip fully sized vector. If
> +				 * this happens two times consecutively, then disable the
> +				 * SIMD.
> +				 */
> +				if (first_vector)
> +				{
> +					if (cstate->simd_failed_first_vector)
> +						cstate->simd_enabled = false;
> +
> +					cstate->simd_failed_first_vector = true;
> +				}

The first time I saw this, my mind immediately went to the extreme case
where this likely regresses: alternating long and short lines.  We might
just want to disable it the first time we see a short line, like we do for
special characters.  This is another thing that we can improve
independently later on.
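
To make the alternating-lines concern concrete, here is a small standalone
sketch that counts how many short lines still get a SIMD attempt under each
policy. It is a model, not the patch's code: VECTOR_WIDTH, ShortLinePolicy
and count_wasted_simd_attempts() are hypothetical names, "short" is
approximated as "line length below the vector width", and the reset of the
failure flag after a long line is an assumption read from the phrase "two
times consecutively" (the quoted hunk does not show it).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed vector width in bytes; the real value depends on the platform. */
#define VECTOR_WIDTH 16

typedef enum
{
	DISABLE_AFTER_TWO_CONSECUTIVE,	/* models the quoted v11 behaviour */
	DISABLE_ON_FIRST				/* models the suggested alternative */
} ShortLinePolicy;

/*
 * Count the short lines on which SIMD is still attempted (and therefore
 * wasted) before the policy switches SIMD off.
 */
static int
count_wasted_simd_attempts(const size_t *line_lens, size_t nlines,
						   ShortLinePolicy policy)
{
	bool		simd_enabled = true;
	bool		failed_prev = false;
	int			wasted = 0;

	assert(line_lens != NULL || nlines == 0);

	for (size_t i = 0; i < nlines && simd_enabled; i++)
	{
		if (line_lens[i] >= VECTOR_WIDTH)
		{
			/* Assumption: a long line clears the "failed" flag. */
			failed_prev = false;
			continue;
		}

		/* SIMD was tried on a line too short to benefit from it. */
		wasted++;

		if (policy == DISABLE_ON_FIRST || failed_prev)
			simd_enabled = false;

		failed_prev = true;
	}

	return wasted;
}
```

In this model, strictly alternating long and short lines never trip the
two-consecutive rule, so every short line pays for a SIMD attempt, while the
first-short-line rule pays exactly once.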

> +	/* First try to run SIMD, then continue with the scalar path */
> +	if (cstate->simd_enabled)
> +	{
> +		int			temp_input_buf_ptr = input_buf_ptr;
> +		bool		temp_hit_eof = false;
> +
> +		result = CopyReadLineTextSIMDHelper(cstate, is_csv, &temp_hit_eof,
> +											&temp_input_buf_ptr);
> +		input_buf_ptr = temp_input_buf_ptr;
> +		hit_eof = temp_hit_eof;

Given CopyReadLineTextSIMDHelper() doesn't have too much duplicated code,
moving the SIMD stuff to its own function is nice.  The temp variables seem
a bit too magical to me, though.  If those really make a difference, IMHO
there ought to be a big comment explaining why.

-- 
nathan





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-10 02:30  Manni Wood <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 1 reply; 114+ messages in thread

From: Manni Wood @ 2026-03-10 02:30 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Mon, Mar 9, 2026 at 1:25 PM Nathan Bossart <[email protected]>
wrote:

> On Wed, Mar 04, 2026 at 06:15:53PM +0300, Nazir Bilal Yavuz wrote:
> > +#ifndef USE_NO_SIMD
> > +static bool CopyReadLineTextSIMDHelper(CopyFromState cstate, bool
> is_csv,
> > +
> bool *temp_hit_eof, int *temp_input_buf_ptr);
> > +#endif
>
> Should we inline this, too?
>
> > +                             /*
> > +                              * Do not disable SIMD when we hit EOL or
> EOF characters. In
> > +                              * practice, it does not matter for EOF
> because parsing ends
> > +                              * there, but we keep the behavior
> consistent.
> > +                              */
> > +                             if (!(simd_hit_eof || simd_hit_eol))
> > +                                     cstate->simd_enabled = false;
>
> nitpick: I would personally avoid disabling it for EOF.  It probably
> doesn't amount to much, but I don't see any point in the extra
> complexity/work solely for consistency.
>
> > +                             /*
> > +                              * We encountered a EOL or EOF on the
> first vector. This means
> > +                              * lines are not long enough to skip fully
> sized vector. If
> > +                              * this happens two times consecutively,
> then disable the
> > +                              * SIMD.
> > +                              */
> > +                             if (first_vector)
> > +                             {
> > +                                     if
> (cstate->simd_failed_first_vector)
> > +                                             cstate->simd_enabled =
> false;
> > +
> > +                                     cstate->simd_failed_first_vector =
> true;
> > +                             }
>
> The first time I saw this, my mind immediately went to the extreme case
> where this likely regresses: alternating long and short lines.  We might
> just want to disable it the first time we see a short line, like we do for
> special characters.  This is another thing that we can improve
> independently later on.
>
> > +     /* First try to run SIMD, then continue with the scalar path */
> > +     if (cstate->simd_enabled)
> > +     {
> > +             int                     temp_input_buf_ptr = input_buf_ptr;
> > +             bool            temp_hit_eof = false;
> > +
> > +             result = CopyReadLineTextSIMDHelper(cstate, is_csv,
> &temp_hit_eof,
> > +
>              &temp_input_buf_ptr);
> > +             input_buf_ptr = temp_input_buf_ptr;
> > +             hit_eof = temp_hit_eof;
>
> Given CopyReadLineTextSIMDHelper() doesn't have too much duplicated code,
> moving the SIMD stuff to its own function is nice.  The temp variables seem
> a bit too magical to me, though.  If those really make a difference, IMHO
> there ought to be a big comment explaining why.
>
> --
> nathan
>

Here are some benchmarks showing what performance will look like for users
who continue to use default_toast_compression = pglz.

all compiled by meson with debugoptimized (-g -O2)

arm NARROW master without inline (git revert
dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
TXT :                 10055.141000 ms
CSV :                 10549.174500 ms
TXT with 1/3 escapes: 10213.864750 ms
CSV with 1/3 quotes:  12188.039000 ms

arm NARROW master with inline with v11patch default_toast_compression = pglz
TXT :                 10070.153750 ms  -0.149304% regression
CSV :                 10161.348750 ms   3.676361% improvement
TXT with 1/3 escapes: 10618.005000 ms  -3.956781% regression
CSV with 1/3 quotes:  12279.366250 ms  -0.749319% regression

arm WIDE master without inline (git revert
dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
TXT :                 11355.602750 ms
CSV :                 13893.110500 ms
TXT with 1/3 escapes: 12872.690500 ms
CSV with 1/3 quotes:  16722.262500 ms

arm WIDE master with inline with v11patch default_toast_compression = pglz
TXT :                  9001.007250 ms  20.735099% improvement
CSV :                  8988.679750 ms  35.301171% improvement
TXT with 1/3 escapes: 12191.137000 ms   5.294569% improvement
CSV with 1/3 quotes:  16297.541500 ms   2.539854% improvement


x86 NARROW master without inline (git revert
dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
TXT :                 26243.084500 ms
CSV :                 27719.564000 ms
TXT with 1/3 escapes: 29578.192750 ms
CSV with 1/3 quotes:  34467.571250 ms

x86 NARROW master with inline with v11patch default_toast_compression = pglz
TXT :                 26371.996750 ms  -0.491224% regression
CSV :                 26137.186500 ms   5.708522% improvement
TXT with 1/3 escapes: 28080.201000 ms   5.064514% improvement
CSV with 1/3 quotes:  32557.377500 ms   5.542003% improvement

x86 WIDE master without inline (git revert
dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
TXT :                 28734.774750 ms
CSV :                 35700.485000 ms
TXT with 1/3 escapes: 32376.878250 ms
CSV with 1/3 quotes:  47024.985750 ms

x86 WIDE master with inline with v11patch default_toast_compression = pglz
TXT :                 22753.755750 ms  20.814567% improvement
CSV :                 22977.195500 ms  35.638982% improvement
TXT with 1/3 escapes: 29526.887000 ms   8.802551% improvement
CSV with 1/3 quotes:  40298.196750 ms  14.304712% improvement
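
For readers checking the figures: the percentages above are consistent with
(baseline - patched) / baseline * 100, where the baseline is the "master
without inline" run in the same group. A minimal sketch of that arithmetic
follows; percent_improvement() is a hypothetical name and the formula is
inferred from the numbers rather than stated by the benchmark script.

```c
#include <assert.h>

/*
 * Relative change of a patched timing against a baseline timing, in
 * percent.  Positive means the patched run was faster (improvement),
 * negative means it was slower (regression).
 */
static double
percent_improvement(double baseline_ms, double patched_ms)
{
	assert(baseline_ms > 0.0);

	return (baseline_ms - patched_ms) / baseline_ms * 100.0;
}
```

For example, the x86 WIDE CSV pair (35700.485000 ms baseline, 22977.195500 ms
patched) comes out at roughly 35.64%, and the arm NARROW TXT pair comes out
slightly negative, matching the rows reported as a regression.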
-- 
-- Manni Wood EDB: https://www.enterprisedb.com


^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-10 11:42  Nazir Bilal Yavuz <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 0 replies; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-10 11:42 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: Nathan Bossart <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Tue, 10 Mar 2026 at 05:30, Manni Wood <[email protected]> wrote:
>
> Here are some benchmarks showing what performance will look like for users who continue to use default_toast_compression = pglz.
>
> all compiled by meson with debugoptimized (-g -O2)
>
> arm NARROW master without inline (git revert dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
> TXT :                 10055.141000 ms
> CSV :                 10549.174500 ms
> TXT with 1/3 escapes: 10213.864750 ms
> CSV with 1/3 quotes:  12188.039000 ms
>
> arm NARROW master with inline with v11patch default_toast_compression = pglz
> TXT :                 10070.153750 ms  -0.149304% regression
> CSV :                 10161.348750 ms   3.676361% improvement
> TXT with 1/3 escapes: 10618.005000 ms  -3.956781% regression
> CSV with 1/3 quotes:  12279.366250 ms  -0.749319% regression
>
> arm WIDE master without inline (git revert dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
> TXT :                 11355.602750 ms
> CSV :                 13893.110500 ms
> TXT with 1/3 escapes: 12872.690500 ms
> CSV with 1/3 quotes:  16722.262500 ms
>
> arm WIDE master with inline with v11patch default_toast_compression = pglz
> TXT :                  9001.007250 ms  20.735099% improvement
> CSV :                  8988.679750 ms  35.301171% improvement
> TXT with 1/3 escapes: 12191.137000 ms   5.294569% improvement
> CSV with 1/3 quotes:  16297.541500 ms   2.539854% improvement
>
>
> x86 NARROW master without inline (git revert dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
> TXT :                 26243.084500 ms
> CSV :                 27719.564000 ms
> TXT with 1/3 escapes: 29578.192750 ms
> CSV with 1/3 quotes:  34467.571250 ms
>
> x86 NARROW master with inline with v11patch default_toast_compression = pglz
> TXT :                 26371.996750 ms  -0.491224% regression
> CSV :                 26137.186500 ms   5.708522% improvement
> TXT with 1/3 escapes: 28080.201000 ms   5.064514% improvement
> CSV with 1/3 quotes:  32557.377500 ms   5.542003% improvement
>
> x86 WIDE master without inline (git revert dc592a41557b072178f1798700bf9c69cd8e4235) default_toast_compression = pglz
> TXT :                 28734.774750 ms
> CSV :                 35700.485000 ms
> TXT with 1/3 escapes: 32376.878250 ms
> CSV with 1/3 quotes:  47024.985750 ms
>
> x86 WIDE master with inline with v11patch default_toast_compression = pglz
> TXT :                 22753.755750 ms  20.814567% improvement
> CSV :                 22977.195500 ms  35.638982% improvement
> TXT with 1/3 escapes: 29526.887000 ms   8.802551% improvement
> CSV with 1/3 quotes:  40298.196750 ms  14.304712% improvement

Thank you for the benchmark, the results look nice! So, there is almost
no regression with either pglz or lz4 TOAST compression. The best case is
a ~60% improvement with lz4 and a ~35% improvement with pglz.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-10 12:35  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-10 12:35 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Mon, 9 Mar 2026 at 21:25, Nathan Bossart <[email protected]> wrote:
>
> On Wed, Mar 04, 2026 at 06:15:53PM +0300, Nazir Bilal Yavuz wrote:
> > +#ifndef USE_NO_SIMD
> > +static bool CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
> > +                                                                        bool *temp_hit_eof, int *temp_input_buf_ptr);
> > +#endif
>
> Should we inline this, too?

I think there is no need to inline this function. In the previous
version, the SIMD code was inside the main for loop, which iterates over
every character in the data, so the SIMD check added a branch for every
character. In the current version, the SIMD code runs before this loop,
so that per-character branch is gone.
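
To illustrate that structure (a chunked fast path that runs before the
per-character loop rather than inside it), here is a minimal standalone
sketch. It is not the patch's code: it swaps Vector8 from port/simd.h for a
portable 8-byte SWAR check, covers only the text-mode special characters,
and chunk_has_byte() and find_first_special() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK sizeof(uint64_t)

/* True if any of the 8 bytes packed in v equals c (classic SWAR test). */
static bool
chunk_has_byte(uint64_t v, unsigned char c)
{
	/* After the XOR, bytes equal to c become zero. */
	uint64_t	x = v ^ (0x0101010101010101ULL * c);

	/* Nonzero exactly when x contains at least one zero byte. */
	return ((x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL) != 0;
}

/*
 * Return the index of the first '\n', '\r' or '\\' in buf, or len if
 * there is none.
 */
static size_t
find_first_special(const char *buf, size_t len)
{
	size_t		pos = 0;

	assert(buf != NULL || len == 0);

	/* Fast path: skip whole chunks that contain no special character. */
	while (len - pos >= CHUNK)
	{
		uint64_t	v;

		memcpy(&v, buf + pos, CHUNK);
		if (chunk_has_byte(v, '\n') || chunk_has_byte(v, '\r') ||
			chunk_has_byte(v, '\\'))
			break;
		pos += CHUNK;
	}

	/* Scalar path: the per-character loop carries no chunk check. */
	for (; pos < len; pos++)
	{
		char		c = buf[pos];

		if (c == '\n' || c == '\r' || c == '\\')
			break;
	}

	return pos;
}
```

The chunk test is evaluated once per 8 bytes in the first loop only; the
scalar loop that follows is unchanged by it, which is the property argued
above for keeping the helper out of line.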


> > +                             /*
> > +                              * Do not disable SIMD when we hit EOL or EOF characters. In
> > +                              * practice, it does not matter for EOF because parsing ends
> > +                              * there, but we keep the behavior consistent.
> > +                              */
> > +                             if (!(simd_hit_eof || simd_hit_eol))
> > +                                     cstate->simd_enabled = false;
>
> nitpick: I would personally avoid disabling it for EOF.  It probably
> doesn't amount to much, but I don't see any point in the extra
> complexity/work solely for consistency.

Done. I expected this to be a small change, but it removed more
complexity than I anticipated.


>
> > +                             /*
> > +                              * We encountered a EOL or EOF on the first vector. This means
> > +                              * lines are not long enough to skip fully sized vector. If
> > +                              * this happens two times consecutively, then disable the
> > +                              * SIMD.
> > +                              */
> > +                             if (first_vector)
> > +                             {
> > +                                     if (cstate->simd_failed_first_vector)
> > +                                             cstate->simd_enabled = false;
> > +
> > +                                     cstate->simd_failed_first_vector = true;
> > +                             }
>
> The first time I saw this, my mind immediately went to the extreme case
> where this likely regresses: alternating long and short lines.  We might
> just want to disable it the first time we see a short line, like we do for
> special characters.  This is another thing that we can improve
> independently later on.

I agree with you, done.


>
> > +     /* First try to run SIMD, then continue with the scalar path */
> > +     if (cstate->simd_enabled)
> > +     {
> > +             int                     temp_input_buf_ptr = input_buf_ptr;
> > +             bool            temp_hit_eof = false;
> > +
> > +             result = CopyReadLineTextSIMDHelper(cstate, is_csv, &temp_hit_eof,
> > +                                                                                     &temp_input_buf_ptr);
> > +             input_buf_ptr = temp_input_buf_ptr;
> > +             hit_eof = temp_hit_eof;
>
> Given CopyReadLineTextSIMDHelper() doesn't have too much duplicated code,
> moving the SIMD stuff to its own function is nice.  The temp variables seem
> a bit too magical to me, though.  If those really make a difference, IMHO
> there ought to be a big comment explaining why.

I added a comment; please let me know if you don't like it.
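
The pattern that comment describes can be shown in isolation. The sketch
below is only an illustration under assumptions: skip_spaces() and
count_commas_after_spaces() are hypothetical names, and whether a given
compiler would actually spill the variable without the temporary depends on
the compiler and optimization level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper that reports its progress through a pointer. */
static void
skip_spaces(const char *buf, size_t len, size_t *pos)
{
	assert(buf != NULL && pos != NULL);

	while (*pos < len && buf[*pos] == ' ')
		(*pos)++;
}

/* Count commas in buf after skipping any leading spaces. */
static size_t
count_commas_after_spaces(const char *buf, size_t len)
{
	size_t		pos = 0;		/* hot variable: its address is never taken */
	size_t		commas = 0;

	{
		/* Only the short-lived temporary has its address passed out. */
		size_t		tmp_pos = pos;

		skip_spaces(buf, len, &tmp_pos);
		pos = tmp_pos;
	}

	/* Hot loop: pos can stay in a register for its whole lifetime here. */
	for (; pos < len; pos++)
	{
		if (buf[pos] == ',')
			commas++;
	}

	return commas;
}
```

The copy in and out costs two assignments per call, which is negligible next
to a loop that runs once per input byte.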


--
Regards,
Nazir Bilal Yavuz
Microsoft


Attachments:

  [text/x-patch] v12-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (10.0K, 2-v12-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
  download | inline diff:
From de695aaf5c7ceeb4f62d2352fabbb111047a4434 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Wed, 4 Mar 2026 17:28:54 +0300
Subject: [PATCH v12] Speed up COPY FROM text/CSV parsing using SIMD

COPY FROM text and CSV parsing previously scanned the input buffer
one byte at a time looking for special characters (newline, carriage
return, backslash, and in CSV mode, quote and escape characters).
This patch adds a SIMD-accelerated fast path that processes a full
vector-width chunk per iteration, significantly reducing the number
of iterations needed on inputs with long lines.

A new helper function, CopyReadLineTextSIMDHelper(), loads chunks
of the input buffer into vector registers and checks for any special
characters using SIMD comparisons. When no special characters are
found, the entire chunk is skipped at once. When a special character
is found, the helper advances to that position and hands off to the
existing scalar code to handle it correctly.

To avoid a regression on inputs that contain many special characters,
SIMD is disabled for the remainder of the current input once a
non-EOL special character is encountered. SIMD is also disabled when
the processed line is shorter than a full vector, since SIMD can't
provide a benefit there.

Author: Shinya Kato <[email protected]>
Author: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Kazar Ayoub <[email protected]>
Reviewed-by: Nathan Bossart <[email protected]>
Reviewed-by: Neil Conway <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Manni Wood <[email protected]>
Reviewed-by: Mark Wong <[email protected]>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
---
 src/backend/commands/copyfrom.c          |   3 +
 src/backend/commands/copyfromparse.c     | 206 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h |   3 +
 3 files changed, 205 insertions(+), 7 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2f42f55e229..fe18bd70890 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1747,6 +1747,9 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
 
+	/* Initialize SIMD */
+	cstate->simd_enabled = true;
+
 	/*
 	 * Allocate buffers for the input pipeline.
 	 *
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 84c8809a889..166b1c4c415 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 #include "utils/wait_event.h"
@@ -159,6 +160,12 @@ static pg_attribute_always_inline bool NextCopyFromRawFieldsInternal(CopyFromSta
 																	 int *nfields,
 																	 bool is_csv);
 
+/* SIMD functions */
+#ifndef USE_NO_SIMD
+static bool CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
+									   bool *temp_hit_eof, int *temp_input_buf_ptr);
+#endif
+
 
 /* Low-level communications functions */
 static int	CopyGetData(CopyFromState cstate, void *databuf,
@@ -1311,6 +1318,155 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 	return result;
 }
 
+#ifndef USE_NO_SIMD
+/*
+ * Use SIMD instructions to efficiently scan the input buffer for special
+ * characters (e.g., newline, carriage return, quote, and escape). This is
+ * faster than byte-by-byte iteration, especially on large buffers.
+ *
+ * Note that SIMD may become slower when the input contains many special
+ * characters. To avoid this regression, we disable SIMD for the rest of the
+ * input once we encounter a special character which isn't EOL.
+ * SIMD is also disabled when we encounter a line that is too short to fill
+ * a full-sized vector.
+ */
+static bool
+CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv, bool *temp_hit_eof, int *temp_input_buf_ptr)
+{
+	char		quotec = '\0';
+	char		escapec = '\0';
+	char	   *copy_input_buf;
+	int			input_buf_ptr;
+	int			copy_buf_len;
+	bool		result = false;
+	bool		unique_escapec = false;
+	bool		first_vector = true;
+	Vector8		nl = vector8_broadcast('\n');
+	Vector8		cr = vector8_broadcast('\r');
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		quote = vector8_broadcast(0);
+	Vector8		escape = vector8_broadcast(0);
+
+	if (is_csv)
+	{
+		quotec = cstate->opts.quote[0];
+		escapec = cstate->opts.escape[0];
+
+		quote = vector8_broadcast(quotec);
+		if (quotec != escapec)
+		{
+			unique_escapec = true;
+			escape = vector8_broadcast(escapec);
+		}
+	}
+
+	/* For a little extra speed we copy these into local variables */
+	copy_input_buf = cstate->input_buf;
+	input_buf_ptr = cstate->input_buf_index;
+	copy_buf_len = cstate->input_buf_len;
+
+	while (true)
+	{
+		/* Load more data if needed */
+		if (sizeof(Vector8) > copy_buf_len - input_buf_ptr)
+		{
+			REFILL_LINEBUF;
+
+			CopyLoadInputBuf(cstate);
+			/* update our local variables */
+			*temp_hit_eof = cstate->input_reached_eof;
+			input_buf_ptr = cstate->input_buf_index;
+			copy_buf_len = cstate->input_buf_len;
+
+			/*
+			 * If we are completely out of data, break out of the loop,
+			 * reporting EOF.
+			 */
+			if (INPUT_BUF_BYTES(cstate) <= 0)
+			{
+				result = true;
+				break;
+			}
+		}
+
+		if (copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match = vector8_broadcast(0);
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+			if (is_csv)
+			{
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				match = vector8_or(match, vector8_eq(chunk, quote));
+				if (unique_escapec)
+					match = vector8_or(match, vector8_eq(chunk, escape));
+			}
+			else
+			{
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				match = vector8_or(match, vector8_eq(chunk, bs));
+			}
+
+			/* Check if we found any special characters */
+			if (vector8_is_highbit_set(match))
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				uint32		mask;
+				int			advance;
+				char		c;
+
+				mask = vector8_highbit_mask(match);
+				advance = pg_rightmost_one_pos32(mask);
+
+				input_buf_ptr += advance;
+				c = copy_input_buf[input_buf_ptr];
+
+				/*
+				 * If we encountered a special character in the first vector,
+				 * the line is not long enough to skip a full-sized vector.
+				 * To be cautious, disable SIMD for the rest of the input.
+				 *
+				 * Otherwise, do not disable SIMD when we hit EOL characters.
+				 * We don't check for EOF because parsing ends there.
+				 */
+				if (first_vector || !(c == '\r' || c == '\n'))
+					cstate->simd_enabled = false;
+
+				break;
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				input_buf_ptr += sizeof(Vector8);
+				first_vector = false;
+			}
+		}
+
+		/*
+		 * Although we refilled linebuf, there are not enough characters to
+		 * fill a full-sized vector. This doesn't mean that we encountered a
+		 * line that is too short to fill a full-sized vector.
+		 *
+		 * Scalar code will handle the rest for this line. Then, SIMD will
+		 * continue from the next line.
+		 */
+		else
+		{
+			break;
+		}
+	}
+
+	*temp_input_buf_ptr = input_buf_ptr;
+	return result;
+}
+#endif
+
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
@@ -1339,6 +1495,49 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			escapec = '\0';
 	}
 
+	/*
+	 * input_buf_ptr might be updated in the SIMD Helper function, so it needs
+	 * to be set before calling CopyReadLineTextSIMDHelper().
+	 */
+	input_buf_ptr = cstate->input_buf_index;
+
+#ifndef USE_NO_SIMD
+	/* First try to run SIMD, then continue with the scalar path */
+	if (cstate->simd_enabled)
+	{
+		/*
+		 * Temporary variables are used here instead of passing the actual
+		 * variables (especially input_buf_ptr) directly to the helper. Taking
+		 * the address of a local variable might force the compiler to
+		 * allocate it on the stack rather than in a register.  Because
+		 * input_buf_ptr is used heavily in the hot scalar path below, keeping
+		 * it in a register is important for performance.
+		 */
+		int			temp_input_buf_ptr;
+		bool		temp_hit_eof = hit_eof;
+
+		result = CopyReadLineTextSIMDHelper(cstate, is_csv, &temp_hit_eof,
+											&temp_input_buf_ptr);
+		input_buf_ptr = temp_input_buf_ptr;
+		hit_eof = temp_hit_eof;
+
+		/* Short exit from SIMD */
+		if (result)
+		{
+			/*
+			 * Transfer any still-uncopied data to line_buf.
+			 */
+			REFILL_LINEBUF;
+
+			return result;
+		}
+	}
+#endif
+
+	/* For a little extra speed we copy these into local variables */
+	copy_input_buf = cstate->input_buf;
+	copy_buf_len = cstate->input_buf_len;
+
 	/*
 	 * The objective of this loop is to transfer the entire next input line
 	 * into line_buf.  Hence, we only care for detecting newlines (\r and/or
@@ -1360,14 +1559,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	 * character to examine; any characters from input_buf_index to
 	 * input_buf_ptr have been determined to be part of the line, but not yet
 	 * transferred to line_buf.
-	 *
-	 * For a little extra speed within the loop, we copy input_buf and
-	 * input_buf_len into local variables.
 	 */
-	copy_input_buf = cstate->input_buf;
-	input_buf_ptr = cstate->input_buf_index;
-	copy_buf_len = cstate->input_buf_len;
-
 	for (;;)
 	{
 		int			prev_raw_ptr;
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index f892c343157..5b020bf4d0b 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -89,6 +89,9 @@ typedef struct CopyFromStateData
 	const char *cur_attval;		/* current att value for error messages */
 	bool		relname_only;	/* don't output line number, att, etc. */
 
+	/* SIMD variables */
+	bool		simd_enabled;
+
 	/*
 	 * Working state
 	 */
-- 
2.47.3



^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-10 17:10  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nathan Bossart @ 2026-03-10 17:10 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Tue, Mar 10, 2026 at 03:35:30PM +0300, Nazir Bilal Yavuz wrote:
> Subject: [PATCH v12] Speed up COPY FROM text/CSV parsing using SIMD

This looks pretty good to me.  I'm hoping to take a closer look in the near
future, but I think we are approaching something committable.

-- 
nathan





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-11 11:36  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 2 replies; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-11 11:36 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Tue, 10 Mar 2026 at 20:10, Nathan Bossart <[email protected]> wrote:
>
> On Tue, Mar 10, 2026 at 03:35:30PM +0300, Nazir Bilal Yavuz wrote:
> > Subject: [PATCH v12] Speed up COPY FROM text/CSV parsing using SIMD
>
> This looks pretty good to me.  I'm hoping to take a closer look in the near
> future, but I think we are approaching something committable.

Thanks for looking into it!

I am attaching v13 of the patch.

0001 is basically some typo fixes on top of v12, no functional changes.

0002 has an attempt to remove some branches from SIMD code but since
it is kind of functional change, I wanted to attach that as another
patch. I think we can apply some parts of this, if not all.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft


Attachments:

  [text/x-patch] v13-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (10.0K, 2-v13-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
  download | inline diff:
From f3e9a234ddc537544d510ad344eb1b8eb2127855 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Wed, 11 Mar 2026 14:04:29 +0300
Subject: [PATCH v13 1/2] Speed up COPY FROM text/CSV parsing using SIMD

COPY FROM text and CSV parsing previously scanned the input buffer
one byte at a time looking for special characters (newline, carriage
return, backslash, and in CSV mode, quote and escape characters).
This patch adds a SIMD-accelerated fast path that processes a full
vector-width chunk per iteration, significantly reducing the number
of iterations needed on inputs with long lines.

A new helper function, CopyReadLineTextSIMDHelper(), loads chunks
of the input buffer into vector registers and checks for any special
characters using SIMD comparisons. When no special characters are
found, the entire chunk is skipped at once. When a special character
is found, the helper advances to that position and hands off to the
existing scalar code to handle it correctly.

To avoid a regression on inputs that contain many special characters,
SIMD is disabled for the remainder of the current input once a
non-EOL special character is encountered. SIMD is also disabled when
the processed line is shorter than a full vector, since SIMD can't
provide a benefit there.

Author: Shinya Kato <[email protected]>
Author: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Kazar Ayoub <[email protected]>
Reviewed-by: Nathan Bossart <[email protected]>
Reviewed-by: Neil Conway <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Manni Wood <[email protected]>
Reviewed-by: Mark Wong <[email protected]>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
---
 src/backend/commands/copyfrom.c          |   3 +
 src/backend/commands/copyfromparse.c     | 206 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h |   3 +
 3 files changed, 205 insertions(+), 7 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2f42f55e229..fe18bd70890 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1747,6 +1747,9 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
 
+	/* Initialize SIMD */
+	cstate->simd_enabled = true;
+
 	/*
 	 * Allocate buffers for the input pipeline.
 	 *
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 84c8809a889..9f1256353c4 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 #include "utils/wait_event.h"
@@ -159,6 +160,12 @@ static pg_attribute_always_inline bool NextCopyFromRawFieldsInternal(CopyFromSta
 																	 int *nfields,
 																	 bool is_csv);
 
+/* SIMD functions */
+#ifndef USE_NO_SIMD
+static bool CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
+									   bool *temp_hit_eof, int *temp_input_buf_ptr);
+#endif
+
 
 /* Low-level communications functions */
 static int	CopyGetData(CopyFromState cstate, void *databuf,
@@ -1311,6 +1318,155 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 	return result;
 }
 
+#ifndef USE_NO_SIMD
+/*
+ * Use SIMD instructions to efficiently scan the input buffer for special
+ * characters (e.g., newline, carriage return, quote, and escape). This is
+ * faster than byte-by-byte iteration, especially on large buffers.
+ *
+ * Note that, SIMD may become slower when the input contains many special
+ * characters. To avoid this regression, we disable SIMD for the rest of the
+ * input once we encounter a special character which isn't EOL.
+ * Also, SIMD is disabled when it encounters a short line that SIMD can't
+ * create a full sized vector, too.
+ */
+static bool
+CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv, bool *temp_hit_eof, int *temp_input_buf_ptr)
+{
+	char		quotec = '\0';
+	char		escapec = '\0';
+	char	   *copy_input_buf;
+	int			input_buf_ptr;
+	int			copy_buf_len;
+	bool		result = false;
+	bool		unique_escapec = false;
+	bool		first_vector = true;
+	Vector8		nl = vector8_broadcast('\n');
+	Vector8		cr = vector8_broadcast('\r');
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		quote = vector8_broadcast(0);
+	Vector8		escape = vector8_broadcast(0);
+
+	if (is_csv)
+	{
+		quotec = cstate->opts.quote[0];
+		escapec = cstate->opts.escape[0];
+
+		quote = vector8_broadcast(quotec);
+		if (quotec != escapec)
+		{
+			unique_escapec = true;
+			escape = vector8_broadcast(escapec);
+		}
+	}
+
+	/* For a little extra speed we copy these into local variables */
+	copy_input_buf = cstate->input_buf;
+	input_buf_ptr = cstate->input_buf_index;
+	copy_buf_len = cstate->input_buf_len;
+
+	while (true)
+	{
+		/* Load more data if needed */
+		if (sizeof(Vector8) > copy_buf_len - input_buf_ptr)
+		{
+			REFILL_LINEBUF;
+
+			CopyLoadInputBuf(cstate);
+			/* update our local variables */
+			*temp_hit_eof = cstate->input_reached_eof;
+			input_buf_ptr = cstate->input_buf_index;
+			copy_buf_len = cstate->input_buf_len;
+
+			/*
+			 * If we are completely out of data, break out of the loop,
+			 * reporting EOF.
+			 */
+			if (INPUT_BUF_BYTES(cstate) <= 0)
+			{
+				result = true;
+				break;
+			}
+		}
+
+		if (copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match = vector8_broadcast(0);
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+			if (is_csv)
+			{
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				match = vector8_or(match, vector8_eq(chunk, quote));
+				if (unique_escapec)
+					match = vector8_or(match, vector8_eq(chunk, escape));
+			}
+			else
+			{
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				match = vector8_or(match, vector8_eq(chunk, bs));
+			}
+
+			/* Check if we found any special characters */
+			if (vector8_is_highbit_set(match))
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				uint32		mask;
+				int			advance;
+				char		c;
+
+				mask = vector8_highbit_mask(match);
+				advance = pg_rightmost_one_pos32(mask);
+
+				input_buf_ptr += advance;
+				c = copy_input_buf[input_buf_ptr];
+
+				/*
+				 * If we encountered a special character in the first vector,
+				 * this means line is not long enough to skip fully sized
+				 * vector. To be cautious, disable SIMD for the rest.
+				 *
+				 * Otherwise, do not disable SIMD when we hit EOL characters.
+				 * We don't check for EOF because parsing ends there.
+				 */
+				if (first_vector || !(c == '\r' || c == '\n'))
+					cstate->simd_enabled = false;
+
+				break;
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				input_buf_ptr += sizeof(Vector8);
+				first_vector = false;
+			}
+		}
+
+		/*
+		 * Although we refill linebuf, there is not enough character to fill
+		 * full sized vector. This doesn't mean that we encountered a line
+		 * that is not enough to fill a full sized vector.
+		 *
+		 * Scalar code will handle the rest for this line. Then, SIMD will
+		 * continue from the next line.
+		 */
+		else
+		{
+			break;
+		}
+	}
+
+	*temp_input_buf_ptr = input_buf_ptr;
+	return result;
+}
+#endif
+
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
@@ -1339,6 +1495,49 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			escapec = '\0';
 	}
 
+	/*
+	 * input_buf_ptr might be updated in the SIMD Helper function, so it needs
+	 * to be set before calling CopyReadLineTextSIMDHelper().
+	 */
+	input_buf_ptr = cstate->input_buf_index;
+
+#ifndef USE_NO_SIMD
+	/* First try to run SIMD, then continue with the scalar path */
+	if (cstate->simd_enabled)
+	{
+		/*
+		 * Temporary variables are used here instead of passing the actual
+		 * variables (especially input_buf_ptr) directly to the helper. Taking
+		 * the address of a local variable might force the compiler to
+		 * allocate it on the stack rather than in a register.  Because
+		 * input_buf_ptr is used heavily in the hot scalar path below, keeping
+		 * it in a register is important for performance.
+		 */
+		int			temp_input_buf_ptr;
+		bool		temp_hit_eof = hit_eof;
+
+		result = CopyReadLineTextSIMDHelper(cstate, is_csv, &temp_hit_eof,
+											&temp_input_buf_ptr);
+		input_buf_ptr = temp_input_buf_ptr;
+		hit_eof = temp_hit_eof;
+
+		/* Early exit from SIMD */
+		if (result)
+		{
+			/*
+			 * Transfer any still-uncopied data to line_buf.
+			 */
+			REFILL_LINEBUF;
+
+			return result;
+		}
+	}
+#endif
+
+	/* For a little extra speed we copy these into local variables */
+	copy_input_buf = cstate->input_buf;
+	copy_buf_len = cstate->input_buf_len;
+
 	/*
 	 * The objective of this loop is to transfer the entire next input line
 	 * into line_buf.  Hence, we only care for detecting newlines (\r and/or
@@ -1360,14 +1559,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	 * character to examine; any characters from input_buf_index to
 	 * input_buf_ptr have been determined to be part of the line, but not yet
 	 * transferred to line_buf.
-	 *
-	 * For a little extra speed within the loop, we copy input_buf and
-	 * input_buf_len into local variables.
 	 */
-	copy_input_buf = cstate->input_buf;
-	input_buf_ptr = cstate->input_buf_index;
-	copy_buf_len = cstate->input_buf_len;
-
 	for (;;)
 	{
 		int			prev_raw_ptr;
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index f892c343157..5b020bf4d0b 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -89,6 +89,9 @@ typedef struct CopyFromStateData
 	const char *cur_attval;		/* current att value for error messages */
 	bool		relname_only;	/* don't output line number, att, etc. */
 
+	/* SIMD variables */
+	bool		simd_enabled;
+
 	/*
 	 * Working state
 	 */
-- 
2.47.3



  [text/x-patch] v13-0002-Upcoming-improvements.patch (2.4K, 3-v13-0002-Upcoming-improvements.patch)
  download | inline diff:
From 1006dac44cb208a8b164f2553772de942e79c2d4 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Wed, 11 Mar 2026 14:04:55 +0300
Subject: [PATCH v13 2/2] Upcoming improvements

---
 src/backend/commands/copyfromparse.c | 27 ++++++++-------------------
 1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 9f1256353c4..55159b0122c 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1333,8 +1333,6 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 static bool
 CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv, bool *temp_hit_eof, int *temp_input_buf_ptr)
 {
-	char		quotec = '\0';
-	char		escapec = '\0';
 	char	   *copy_input_buf;
 	int			input_buf_ptr;
 	int			copy_buf_len;
@@ -1343,16 +1341,15 @@ CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv, bool *temp_hit_eof
 	bool		first_vector = true;
 	Vector8		nl = vector8_broadcast('\n');
 	Vector8		cr = vector8_broadcast('\r');
-	Vector8		bs = vector8_broadcast('\\');
-	Vector8		quote = vector8_broadcast(0);
+	Vector8		bs_or_quote = vector8_broadcast('\\');
 	Vector8		escape = vector8_broadcast(0);
 
 	if (is_csv)
 	{
-		quotec = cstate->opts.quote[0];
-		escapec = cstate->opts.escape[0];
+		char		quotec = cstate->opts.quote[0];
+		char		escapec = cstate->opts.escape[0];
 
-		quote = vector8_broadcast(quotec);
+		bs_or_quote = vector8_broadcast(quotec);
 		if (quotec != escapec)
 		{
 			unique_escapec = true;
@@ -1397,18 +1394,10 @@ CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv, bool *temp_hit_eof
 			/* Load a chunk of data into a vector register */
 			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
 
-			if (is_csv)
-			{
-				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
-				match = vector8_or(match, vector8_eq(chunk, quote));
-				if (unique_escapec)
-					match = vector8_or(match, vector8_eq(chunk, escape));
-			}
-			else
-			{
-				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
-				match = vector8_or(match, vector8_eq(chunk, bs));
-			}
+			match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+			match = vector8_or(match, vector8_eq(chunk, bs_or_quote));
+			if (unique_escapec)
+				match = vector8_or(match, vector8_eq(chunk, escape));
 
 			/* Check if we found any special characters */
 			if (vector8_is_highbit_set(match))
-- 
2.47.3



^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-11 12:19  KAZAR Ayoub <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 1 reply; 114+ messages in thread

From: KAZAR Ayoub @ 2026-03-11 12:19 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Manni Wood <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Mar 11, 2026, 12:36 PM Nazir Bilal Yavuz <[email protected]> wrote:

> 0002 has an attempt to remove some branches from SIMD code but since
> it is kind of functional change, I wanted to attach that as another
> patch. I think we can apply some parts of this, if not all.
>
0002 sounds really good to have, haven't measured the diff but it's very
logical.

Another quick question though, do we need USE_NO_SIMD for any reason? I
just remembered that there's some simd paths like json that don't use it.

Regards,
Ayoub



^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-11 13:10  Nazir Bilal Yavuz <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-11 13:10 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Manni Wood <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Wed, 11 Mar 2026 at 15:19, KAZAR Ayoub <[email protected]> wrote:
>
> On Wed, Mar 11, 2026, 12:36 PM Nazir Bilal Yavuz <[email protected]> wrote:
>>
>> 0002 has an attempt to remove some branches from SIMD code but since
>> it is kind of functional change, I wanted to attach that as another
>> patch. I think we can apply some parts of this, if not all.
>
> 0002 sounds really good to have, haven't measured the diff but it's very logical.

I agree with you. I saw very small speedups, around 1%-2%, but I think
the changes make sense regardless of the performance improvement.


> Another quick question though, do we need USE_NO_SIMD for any reason? I just remembered that there's some simd paths like json that don't use it.

vector8_eq() and vector8_highbit_mask() don't have non-SIMD
implementations, so we need to use USE_NO_SIMD.
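
To illustrate the guard pattern, here is a minimal standalone sketch (not
code from the patch). DEMO_NO_SIMD, skip_clean_chunks() and find_newline()
are hypothetical names, and raw SSE2 intrinsics stand in for the simd.h
wrappers. The vector helper exists only when a vector instruction set is
available, so both its definition and its call site sit behind the same
guard, and the scalar loop is always compiled:

```c
#include <assert.h>
#include <stddef.h>

/*
 * DEMO_NO_SIMD is a hypothetical stand-in for USE_NO_SIMD: it is defined
 * when no supported vector instruction set is available at compile time.
 */
#if !defined(__SSE2__)
#define DEMO_NO_SIMD
#endif

#ifndef DEMO_NO_SIMD
#include <emmintrin.h>

/* Skip whole 16-byte chunks that contain no '\n'. Compiled only with SIMD. */
static size_t
skip_clean_chunks(const char *buf, size_t len)
{
    const __m128i nl = _mm_set1_epi8('\n');
    size_t      i = 0;

    while (len - i >= 16)
    {
        __m128i     chunk = _mm_loadu_si128((const __m128i *) (buf + i));

        /* A nonzero mask means some byte in this chunk equals '\n'. */
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, nl)) != 0)
            break;
        i += 16;
    }
    return i;
}
#endif

/* Return the offset of the first '\n' in buf, or len if there is none. */
static size_t
find_newline(const char *buf, size_t len)
{
    size_t      i = 0;

    assert(buf != NULL);

#ifndef DEMO_NO_SIMD
    /* Fast path: the call must be guarded because the helper may not exist. */
    i = skip_clean_chunks(buf, len);
#endif

    /* Scalar path: finishes the job, or does all of it without SIMD. */
    while (i < len && buf[i] != '\n')
        i++;
    return i;
}
```

Both builds return the same offsets; the guard only decides whether the
chunk-skipping step is present.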


-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-11 13:23  KAZAR Ayoub <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 0 replies; 114+ messages in thread

From: KAZAR Ayoub @ 2026-03-11 13:23 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Manni Wood <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Mar 11, 2026 at 2:10 PM Nazir Bilal Yavuz <[email protected]>
wrote:

> Hi,
>
> On Wed, 11 Mar 2026 at 15:19, KAZAR Ayoub <[email protected]> wrote:
> >
> > On Wed, Mar 11, 2026, 12:36 PM Nazir Bilal Yavuz <[email protected]>
> wrote:
> >>
> >> 0002 has an attempt to remove some branches from SIMD code but since
> >> it is kind of functional change, I wanted to attach that as another
> >> patch. I think we can apply some parts of this, if not all.
> >
> > 0002 sounds really good to have, haven't measured the diff but it's very
> logical.
>
> I agree with you. I saw very small speedups like 1%-2% but I think
> changes make sense regardless of the performance improvement.


>
> > Another quick question though, do we need USE_NO_SIMD for any reason? I
> just remembered that there's some simd paths like json that don't use it.
>
> vector8_eq() and vector8_highbit_mask() don't have non-SIMD
> implementations, so we need to use USE_NO_SIMD.
>
Aha! That's true, thanks.

Regards,
Ayoub


^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-11 18:09  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 1 reply; 114+ messages in thread

From: Nathan Bossart @ 2026-03-11 18:09 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Mar 11, 2026 at 02:36:46PM +0300, Nazir Bilal Yavuz wrote:
> 0002 has an attempt to remove some branches from SIMD code but since
> it is kind of functional change, I wanted to attach that as another
> patch. I think we can apply some parts of this, if not all.

Could you describe what this is doing and what the performance impact is?

-- 
nathan





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-11 18:49  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-11 18:49 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Wed, 11 Mar 2026 at 21:09, Nathan Bossart <[email protected]> wrote:
>
> On Wed, Mar 11, 2026 at 02:36:46PM +0300, Nazir Bilal Yavuz wrote:
> > 0002 has an attempt to remove some branches from SIMD code but since
> > it is kind of functional change, I wanted to attach that as another
> > patch. I think we can apply some parts of this, if not all.
>
> Could you describe what this is doing and what the performance impact is?

The SIMD code checks for these characters:

csv mode: nl, cr, quote and possibly escape.

text mode: nl, cr and bs.

v12 checks them like this:

            if (is_csv)
            {
                match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
                match = vector8_or(match, vector8_eq(chunk, quote));
                if (unique_escapec)
                    match = vector8_or(match, vector8_eq(chunk, escape));
            }
            else
            {
                match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
                match = vector8_or(match, vector8_eq(chunk, bs));
            }

But we know that the code always checks nl, cr and one of quote or bs.
So, we can introduce a new variable named bs_or_quote, which is set to
bs in text mode and to quote in csv mode. Then, we can remove the
'if (is_csv)' check and keep only the check for escape ('if
(unique_escapec)'). The code then looks like this:

            match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
            match = vector8_or(match, vector8_eq(chunk, bs_or_quote));
            if (unique_escapec)
                match = vector8_or(match, vector8_eq(chunk, escape));

That is what v13-0002 does. I saw 1%-2% speedups with this change and
there was no regression.

Regardless of introducing the bs_or_quote variable, we can move 'match
= vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));' outside
of the if checks, though.
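
For anyone who wants to try the idea outside the tree, here is a minimal
standalone sketch (not code from the patch). first_special(), scan_chunk()
and CHUNK are hypothetical names, and a per-byte loop stands in for the
vector8_eq()/vector8_or() calls, so it only models the logic and not the
speed. The point is that the third byte to compare is chosen once, before
the scan, and the only remaining branch is the escape check:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK 16                /* assumed vector width in bytes */

/*
 * Return the offset of the first special byte in a CHUNK-byte block, or
 * CHUNK if the block contains none. Each bit of mask plays the role of one
 * lane of the match vector.
 */
static int
first_special(const unsigned char *p, unsigned char bs_or_quote,
              int unique_escapec, unsigned char escape)
{
    uint32_t    mask = 0;
    int         i;

    for (i = 0; i < CHUNK; i++)
    {
        int         hit = (p[i] == '\n') | (p[i] == '\r') | (p[i] == bs_or_quote);

        if (unique_escapec)
            hit |= (p[i] == escape);
        mask |= (uint32_t) hit << i;
    }

    /* Position of the lowest set bit, the job pg_rightmost_one_pos32() does. */
    for (i = 0; i < CHUNK; i++)
        if (mask & (UINT32_C(1) << i))
            return i;
    return CHUNK;
}

/* Pick the third comparison byte once per call instead of once per chunk. */
static int
scan_chunk(const char *chunk, int is_csv, char quotec, char escapec)
{
    unsigned char bs_or_quote = is_csv ? (unsigned char) quotec : '\\';
    int         unique_escapec = is_csv && quotec != escapec;

    assert(chunk != NULL);

    return first_special((const unsigned char *) chunk, bs_or_quote,
                         unique_escapec, (unsigned char) escapec);
}
```

In text mode a backslash stops the scan, while in csv mode with the default
quote and escape it does not; the same three comparisons serve both modes.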

--
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-11 19:02  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nathan Bossart @ 2026-03-11 19:02 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Mar 11, 2026 at 09:49:22PM +0300, Nazir Bilal Yavuz wrote:
> That is what v13-0002 does. I saw 1%-2% speedups with this change and
> there was no regression.

Thanks for the explanation.  Is there any reason _not_ to add this to 0001?

-- 
nathan





^ permalink  raw  reply  [nested|flat] 114+ messages in thread

* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-11 19:22  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-11 19:22 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Wed, 11 Mar 2026 at 22:02, Nathan Bossart <[email protected]> wrote:
>
> On Wed, Mar 11, 2026 at 09:49:22PM +0300, Nazir Bilal Yavuz wrote:
> > That is what v13-0002 does. I saw 1%-2% speedups with this change and
> > there was no regression.
>
> Thanks for the explanation.  Is there any reason _not_ to add this to 0001?

I noticed this improvement today. To make it easier to review for
anyone who may have already started looking at v12, I attached the new
functional code changes as 0002. There was no other reason.

Here is v14 which is v13-0001 + v13-0002.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft


Attachments:

  [text/x-patch] v14-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (9.8K, 2-v14-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
  download | inline diff:
From d19e62275db2943cc4275cac2262c63d0bd4436c Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Wed, 11 Mar 2026 14:04:29 +0300
Subject: [PATCH v14] Speed up COPY FROM text/CSV parsing using SIMD

COPY FROM text and CSV parsing previously scanned the input buffer
one byte at a time looking for special characters (newline, carriage
return, backslash, and in CSV mode, quote and escape characters).
This patch adds a SIMD-accelerated fast path that processes a full
vector-width chunk per iteration, significantly reducing the number
of iterations needed on inputs with long lines.

A new helper function, CopyReadLineTextSIMDHelper(), loads chunks
of the input buffer into vector registers and checks for any special
characters using SIMD comparisons. When no special characters are
found, the entire chunk is skipped at once. When a special character
is found, the helper advances to that position and hands off to the
existing scalar code to handle it correctly.

To avoid a regression on inputs that contain many special characters,
SIMD is disabled for the remainder of the current input once a
non-EOL special character is encountered. SIMD is also disabled when
the processed line is shorter than a full vector, since SIMD can't
provide a benefit there.

Author: Shinya Kato <[email protected]>
Author: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Kazar Ayoub <[email protected]>
Reviewed-by: Nathan Bossart <[email protected]>
Reviewed-by: Neil Conway <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Manni Wood <[email protected]>
Reviewed-by: Mark Wong <[email protected]>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
---
 src/backend/commands/copyfrom.c          |   3 +
 src/backend/commands/copyfromparse.c     | 195 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h |   3 +
 3 files changed, 194 insertions(+), 7 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2f42f55e229..fe18bd70890 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1747,6 +1747,9 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
 
+	/* Initialize SIMD */
+	cstate->simd_enabled = true;
+
 	/*
 	 * Allocate buffers for the input pipeline.
 	 *
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 84c8809a889..55159b0122c 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 #include "utils/wait_event.h"
@@ -159,6 +160,12 @@ static pg_attribute_always_inline bool NextCopyFromRawFieldsInternal(CopyFromSta
 																	 int *nfields,
 																	 bool is_csv);
 
+/* SIMD functions */
+#ifndef USE_NO_SIMD
+static bool CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
+									   bool *temp_hit_eof, int *temp_input_buf_ptr);
+#endif
+
 
 /* Low-level communications functions */
 static int	CopyGetData(CopyFromState cstate, void *databuf,
@@ -1311,6 +1318,144 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 	return result;
 }
 
+#ifndef USE_NO_SIMD
+/*
+ * Use SIMD instructions to efficiently scan the input buffer for special
+ * characters (e.g., newline, carriage return, quote, and escape). This is
+ * faster than byte-by-byte iteration, especially on large buffers.
+ *
+ * Note that, SIMD may become slower when the input contains many special
+ * characters. To avoid this regression, we disable SIMD for the rest of the
+ * input once we encounter a special character which isn't EOL.
+ * Also, SIMD is disabled when it encounters a short line that SIMD can't
+ * create a full sized vector, too.
+ */
+static bool
+CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv, bool *temp_hit_eof, int *temp_input_buf_ptr)
+{
+	char	   *copy_input_buf;
+	int			input_buf_ptr;
+	int			copy_buf_len;
+	bool		result = false;
+	bool		unique_escapec = false;
+	bool		first_vector = true;
+	Vector8		nl = vector8_broadcast('\n');
+	Vector8		cr = vector8_broadcast('\r');
+	Vector8		bs_or_quote = vector8_broadcast('\\');
+	Vector8		escape = vector8_broadcast(0);
+
+	if (is_csv)
+	{
+		char		quotec = cstate->opts.quote[0];
+		char		escapec = cstate->opts.escape[0];
+
+		bs_or_quote = vector8_broadcast(quotec);
+		if (quotec != escapec)
+		{
+			unique_escapec = true;
+			escape = vector8_broadcast(escapec);
+		}
+	}
+
+	/* For a little extra speed we copy these into local variables */
+	copy_input_buf = cstate->input_buf;
+	input_buf_ptr = cstate->input_buf_index;
+	copy_buf_len = cstate->input_buf_len;
+
+	while (true)
+	{
+		/* Load more data if needed */
+		if (sizeof(Vector8) > copy_buf_len - input_buf_ptr)
+		{
+			REFILL_LINEBUF;
+
+			CopyLoadInputBuf(cstate);
+			/* update our local variables */
+			*temp_hit_eof = cstate->input_reached_eof;
+			input_buf_ptr = cstate->input_buf_index;
+			copy_buf_len = cstate->input_buf_len;
+
+			/*
+			 * If we are completely out of data, break out of the loop,
+			 * reporting EOF.
+			 */
+			if (INPUT_BUF_BYTES(cstate) <= 0)
+			{
+				result = true;
+				break;
+			}
+		}
+
+		if (copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match = vector8_broadcast(0);
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+			match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+			match = vector8_or(match, vector8_eq(chunk, bs_or_quote));
+			if (unique_escapec)
+				match = vector8_or(match, vector8_eq(chunk, escape));
+
+			/* Check if we found any special characters */
+			if (vector8_is_highbit_set(match))
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				uint32		mask;
+				int			advance;
+				char		c;
+
+				mask = vector8_highbit_mask(match);
+				advance = pg_rightmost_one_pos32(mask);
+
+				input_buf_ptr += advance;
+				c = copy_input_buf[input_buf_ptr];
+
+				/*
+				 * If we encountered a special character in the first vector,
+				 * the line is too short to skip even one full-sized vector.
+				 * To be cautious, disable SIMD for the rest of the command.
+				 *
+				 * Otherwise, do not disable SIMD when we hit EOL characters.
+				 * We don't check for EOF because parsing ends there.
+				 */
+				if (first_vector || !(c == '\r' || c == '\n'))
+					cstate->simd_enabled = false;
+
+				break;
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				input_buf_ptr += sizeof(Vector8);
+				first_vector = false;
+			}
+		}
+
+		/*
+		 * Even after refilling, there are not enough bytes to fill a
+		 * full-sized vector.  This doesn't necessarily mean that we
+		 * encountered a short line, so SIMD stays enabled.
+		 *
+		 * Scalar code will handle the rest of this line.  Then, SIMD will
+		 * continue from the next line.
+		 */
+		else
+		{
+			break;
+		}
+	}
+
+	*temp_input_buf_ptr = input_buf_ptr;
+	return result;
+}
+#endif
+
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
@@ -1339,6 +1484,49 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			escapec = '\0';
 	}
 
+	/*
+	 * input_buf_ptr might be updated in the SIMD Helper function, so it needs
+	 * to be set before calling CopyReadLineTextSIMDHelper().
+	 */
+	input_buf_ptr = cstate->input_buf_index;
+
+#ifndef USE_NO_SIMD
+	/* First try to run SIMD, then continue with the scalar path */
+	if (cstate->simd_enabled)
+	{
+		/*
+		 * Temporary variables are used here instead of passing the actual
+		 * variables (especially input_buf_ptr) directly to the helper. Taking
+		 * the address of a local variable might force the compiler to
+		 * allocate it on the stack rather than in a register.  Because
+		 * input_buf_ptr is used heavily in the hot scalar path below, keeping
+		 * it in a register is important for performance.
+		 */
+		int			temp_input_buf_ptr;
+		bool		temp_hit_eof = hit_eof;
+
+		result = CopyReadLineTextSIMDHelper(cstate, is_csv, &temp_hit_eof,
+											&temp_input_buf_ptr);
+		input_buf_ptr = temp_input_buf_ptr;
+		hit_eof = temp_hit_eof;
+
+		/* Early exit from SIMD */
+		if (result)
+		{
+			/*
+			 * Transfer any still-uncopied data to line_buf.
+			 */
+			REFILL_LINEBUF;
+
+			return result;
+		}
+	}
+#endif
+
+	/* For a little extra speed we copy these into local variables */
+	copy_input_buf = cstate->input_buf;
+	copy_buf_len = cstate->input_buf_len;
+
 	/*
 	 * The objective of this loop is to transfer the entire next input line
 	 * into line_buf.  Hence, we only care for detecting newlines (\r and/or
@@ -1360,14 +1548,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	 * character to examine; any characters from input_buf_index to
 	 * input_buf_ptr have been determined to be part of the line, but not yet
 	 * transferred to line_buf.
-	 *
-	 * For a little extra speed within the loop, we copy input_buf and
-	 * input_buf_len into local variables.
 	 */
-	copy_input_buf = cstate->input_buf;
-	input_buf_ptr = cstate->input_buf_index;
-	copy_buf_len = cstate->input_buf_len;
-
 	for (;;)
 	{
 		int			prev_raw_ptr;
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index f892c343157..5b020bf4d0b 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -89,6 +89,9 @@ typedef struct CopyFromStateData
 	const char *cur_attval;		/* current att value for error messages */
 	bool		relname_only;	/* don't output line number, att, etc. */
 
+	/* SIMD variables */
+	bool		simd_enabled;
+
 	/*
 	 * Working state
 	 */
-- 
2.47.3




* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-11 20:42  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nathan Bossart @ 2026-03-11 20:42 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Wed, Mar 11, 2026 at 10:22:18PM +0300, Nazir Bilal Yavuz wrote:
> Here is v14 which is v13-0001 + v13-0002.

Thanks!  It's getting close.

> +		/*
> +		 * Temporary variables are used here instead of passing the actual
> +		 * variables (especially input_buf_ptr) directly to the helper. Taking
> +		 * the address of a local variable might force the compiler to
> +		 * allocate it on the stack rather than in a register.  Because
> +		 * input_buf_ptr is used heavily in the hot scalar path below, keeping
> +		 * it in a register is important for performance.
> +		 */
> +		int			temp_input_buf_ptr;
> +		bool		temp_hit_eof = hit_eof;

A few notes:

* Does using a temporary variable for hit_eof actually make a difference?
AFAICT that's only updated when loading more data.

* Does inlining the function produce the same results?

* Also, I'm curious what the usual benchmarks look like with and without
this hack for the latest patch.
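
For readers following along, the pattern in question can be sketched in
isolation. This is a simplified illustration, not the patch itself:
fast_skip() and find_line_end() are hypothetical names, and whether a given
compiler actually keeps pos in a register is exactly what the benchmarks are
meant to show.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the SIMD helper: skips plain bytes and reports
 * where it stopped through an out parameter.  Returns true if it ran off
 * the end of the buffer.
 */
static bool
fast_skip(const char *buf, int len, int start, int *pos_out)
{
    int         pos = start;

    while (pos < len && buf[pos] != '\n' && buf[pos] != '\\')
        pos++;
    *pos_out = pos;
    return pos >= len;
}

/* Returns the index of the first unescaped newline, or len if none. */
static int
find_line_end(const char *buf, int len)
{
    int         pos = 0;        /* hot variable; its address is never taken */
    int         tmp_pos;        /* only this temporary is passed by address */

    if (fast_skip(buf, len, pos, &tmp_pos))
        return len;
    pos = tmp_pos;

    /* "scalar path": uses pos heavily */
    while (pos < len && buf[pos] != '\n')
    {
        if (buf[pos] == '\\' && pos + 1 < len)
            pos++;              /* skip the escaped byte */
        pos++;
    }
    return pos;
}
```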

-- 
nathan






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-12 10:59  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-12 10:59 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Wed, 11 Mar 2026 at 23:42, Nathan Bossart <[email protected]> wrote:
>
> On Wed, Mar 11, 2026 at 10:22:18PM +0300, Nazir Bilal Yavuz wrote:
> > Here is v14 which is v13-0001 + v13-0002.
>
> Thanks!  It's getting close.
>
> > +             /*
> > +              * Temporary variables are used here instead of passing the actual
> > +              * variables (especially input_buf_ptr) directly to the helper. Taking
> > +              * the address of a local variable might force the compiler to
> > +              * allocate it on the stack rather than in a register.  Because
> > +              * input_buf_ptr is used heavily in the hot scalar path below, keeping
> > +              * it in a register is important for performance.
> > +              */
> > +             int                     temp_input_buf_ptr;
> > +             bool            temp_hit_eof = hit_eof;
>
> A few notes:
>
> * Does using a temporary variable for hit_eof actually make a difference?
> AFAICT that's only updated when loading more data.
>
> * Does inlining the function produce the same results?
>
> * Also, I'm curious what the usual benchmarks look like with and without
> this hack for the latest patch.

I benchmarked each of these; here are the results:

Old master means d841ca2d14 with the commit that inlines CopyReadLineText
(dc592a4155) reverted.

v14 means d841ca2d14 + v14.

v14 + #1 means removing temporary variables.

v14 + #2 means removing temp_hit_eof variable only.

v14 + #3 means inlining CopyReadLineTextSIMDHelper().

v14 + #4 means inlining CopyReadLineTextSIMDHelper() + removing
temporary variables (#1).

------------------------------------------------------------

Results for default_toast_compression = 'lz4':

+-------------------------------------------+
|             Optimization: -O2             |
+------------+--------------+---------------+
|            |     Text     |      CSV      |
+------------+------+-------+-------+-------+
|    WIDE    | None |  1/3  |  None |  1/3  |
+------------+------+-------+-------+-------+
| Old master | 4260 |  4789 |  5930 |  8276 |
+------------+------+-------+-------+-------+
|     v14    | 2489 |  4439 |  2529 |  8098 |
+------------+------+-------+-------+-------+
|  v14 + #1  | 2472 |  5177 |  2479 |  9285 |
+------------+------+-------+-------+-------+
|  v14 + #2  | 2521 |  4252 |  2481 |  8050 |
+------------+------+-------+-------+-------+
|  v14 + #3  | 2632 |  4569 |  2458 |  8657 |
+------------+------+-------+-------+-------+
|  v14 + #4  | 2476 |  4239 |  2475 | 10544 |
+------------+------+-------+-------+-------+
|            |      |       |       |       |
+------------+------+-------+-------+-------+
|            |      |       |       |       |
+------------+------+-------+-------+-------+
|            |     Text     |      CSV      |
+------------+------+-------+-------+-------+
|   NARROW   | None |  1/3  |  None |  1/3  |
+------------+------+-------+-------+-------+
| Old master | 9955 | 10056 | 10329 | 10872 |
+------------+------+-------+-------+-------+
|     v14    | 9917 | 10080 | 10104 | 10510 |
+------------+------+-------+-------+-------+
|  v14 + #1  | 9913 | 10090 | 10120 | 10532 |
+------------+------+-------+-------+-------+
|  v14 + #2  | 9937 | 10130 | 10072 | 10520 |
+------------+------+-------+-------+-------+
|  v14 + #3  | 9880 | 10258 | 10220 | 10604 |
+------------+------+-------+-------+-------+
|  v14 + #4  | 9827 | 10306 | 10308 | 10734 |
+------------+------+-------+-------+-------+

------------------------------------------------------------

Results for default_toast_compression = 'pglz':

+-------------------------------------------+
|             Optimization: -O2             |
+------------+--------------+---------------+
|            |     Text     |      CSV      |
+------------+------+-------+-------+-------+
|    WIDE    | None |  1/3  |  None |  1/3  |
+------------+------+-------+-------+-------+
| Old master | 4260 |  4789 |  5930 |  8276 |
+------------+------+-------+-------+-------+
|     v14    | 2489 |  4439 |  2529 |  8098 |
+------------+------+-------+-------+-------+
|  v14 + #1  | 2472 |  5177 |  2479 |  9285 |
+------------+------+-------+-------+-------+
|  v14 + #2  | 2521 |  4252 |  2481 |  8050 |
+------------+------+-------+-------+-------+
|  v14 + #3  | 2632 |  4569 |  2458 |  8657 |
+------------+------+-------+-------+-------+
|  v14 + #4  | 2476 |  4239 |  2475 | 10544 |
+------------+------+-------+-------+-------+
|            |      |       |       |       |
+------------+------+-------+-------+-------+
|            |      |       |       |       |
+------------+------+-------+-------+-------+
|            |     Text     |      CSV      |
+------------+------+-------+-------+-------+
|   NARROW   | None |  1/3  |  None |  1/3  |
+------------+------+-------+-------+-------+
| Old master | 9955 | 10056 | 10329 | 10872 |
+------------+------+-------+-------+-------+
|     v14    | 9917 | 10080 | 10104 | 10510 |
+------------+------+-------+-------+-------+
|  v14 + #1  | 9913 | 10090 | 10120 | 10532 |
+------------+------+-------+-------+-------+
|  v14 + #2  | 9937 | 10130 | 10072 | 10520 |
+------------+------+-------+-------+-------+
|  v14 + #3  | 9880 | 10258 | 10220 | 10604 |
+------------+------+-------+-------+-------+
|  v14 + #4  | 9827 | 10306 | 10308 | 10734 |
+------------+------+-------+-------+-------+


------------------------------------------------------------

Looking at these results:

v14 + #1 and v14 + #3 perform worse in the wide & 1/3 cases.

v14 + #4 performs worse in the CSV & wide & 1/3 case.

v14 and v14 + #2 perform very similarly, and neither shows a regression. I
think we can move forward with one of these.

--
Regards,
Nazir Bilal Yavuz
Microsoft






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-12 17:37  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 2 replies; 114+ messages in thread

From: Nathan Bossart @ 2026-03-12 17:37 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Here is what I have staged for commit, which I'm planning to do tomorrow.
Please review and/or test if you are able.

-- 
nathan



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 02:39  Manni Wood <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 1 reply; 114+ messages in thread

From: Manni Wood @ 2026-03-13 02:39 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Thu, Mar 12, 2026 at 12:37 PM Nathan Bossart <[email protected]>
wrote:

> Here is what I have staged for commit, which I'm planning to do tomorrow.
> Please review and/or test if you are able.
>
> --
> nathan
>

Hello, Nathan!

I found some time this evening to run some benchmarks using your v15 patch.
I hope these help.

lz4 - arm

arm NARROW master (git revert dc592a41557b072178f1798700bf9c69cd8e4235)
default_toast_compression = lz4
TXT :                 10203.493250 ms
CSV :                 10217.946000 ms
TXT with 1/3 escapes: 10305.912750 ms
CSV with 1/3 quotes:  12339.182000 ms

arm NARROW v15 default_toast_compression = lz4
TXT :                 10205.261500 ms  -0.017330% regression
CSV :                 10358.898500 ms  -1.379460% regression
TXT with 1/3 escapes: 10053.073000 ms  2.453347% improvement
CSV with 1/3 quotes:  11881.337000 ms  3.710497% improvement

arm WIDE master (git revert dc592a41557b072178f1798700bf9c69cd8e4235)
default_toast_compression = lz4
TXT :                 5613.525250 ms
CSV :                 8069.692750 ms
TXT with 1/3 escapes: 7088.888250 ms
CSV with 1/3 quotes:  10902.545500 ms

arm WIDE v15 default_toast_compression = lz4
TXT :                 3201.494500 ms  42.968200% improvement
CSV :                 3146.033750 ms  61.014207% improvement
TXT with 1/3 escapes: 6677.907500 ms  5.797535% improvement
CSV with 1/3 quotes:  10766.909500 ms  1.244076% improvement

lz4 - x86

x86 NARROW master (git revert dc592a41557b072178f1798700bf9c69cd8e4235)
default_toast_compression = lz4
TXT :                 26110.287750 ms
CSV :                 27923.199750 ms
TXT with 1/3 escapes: 27984.483250 ms
CSV with 1/3 quotes:  34387.239000 ms

x86 NARROW v15 default_toast_compression = lz4
TXT :                 26019.629000 ms  0.347215% improvement
CSV :                 26379.889000 ms  5.526984% improvement
TXT with 1/3 escapes: 28865.322750 ms  -3.147600% regression
CSV with 1/3 quotes:  33218.293250 ms  3.399359% improvement

x86 WIDE master (git revert dc592a41557b072178f1798700bf9c69cd8e4235)
default_toast_compression = lz4
TXT :                 15829.765000 ms
CSV :                 20479.146000 ms
TXT with 1/3 escapes: 18437.507500 ms
CSV with 1/3 quotes:  29749.379250 ms

x86 WIDE v15 default_toast_compression = lz4
TXT :                 8056.305000 ms  49.106604% improvement
CSV :                 7997.555500 ms  60.947808% improvement
TXT with 1/3 escapes: 16324.925500 ms  11.458067% improvement
CSV with 1/3 quotes:  29978.346500 ms  -0.769654% regression



pglz - arm

arm NARROW master (git revert dc592a41557b072178f1798700bf9c69cd8e4235)
default_toast_compression = pglz
TXT :                 10334.666250 ms
CSV :                 10978.851250 ms
TXT with 1/3 escapes: 11076.502750 ms
CSV with 1/3 quotes:  12582.679000 ms

arm NARROW v15 default_toast_compression = pglz
TXT :                 10002.507750 ms  3.214023% improvement
CSV :                 10017.436250 ms  8.756973% improvement
TXT with 1/3 escapes: 10179.949000 ms  8.094195% improvement
CSV with 1/3 quotes:  12088.836750 ms  3.924778% improvement

arm WIDE master (git revert dc592a41557b072178f1798700bf9c69cd8e4235)
default_toast_compression = pglz
TXT :                 11403.206000 ms
CSV :                 13915.718750 ms
TXT with 1/3 escapes: 12888.060250 ms
CSV with 1/3 quotes:  16741.463000 ms

arm WIDE v15 default_toast_compression = pglz
TXT :                 9005.868250 ms  21.023366% improvement
CSV :                 8935.159250 ms  35.790889% improvement
TXT with 1/3 escapes: 12432.655250 ms  3.533542% improvement
CSV with 1/3 quotes:  16564.852250 ms  1.054930% improvement

pglz - x86

x86 NARROW master (git revert dc592a41557b072178f1798700bf9c69cd8e4235)
default_toast_compression = pglz
TXT :                 26404.516250 ms
CSV :                 28138.719000 ms
TXT with 1/3 escapes: 28084.379750 ms
CSV with 1/3 quotes:  34502.702250 ms

x86 NARROW v15 default_toast_compression = pglz
TXT :                 26438.415000 ms  -0.128382% regression
CSV :                 26869.718000 ms  4.509804% improvement
TXT with 1/3 escapes: 29379.299750 ms  -4.610819% regression
CSV with 1/3 quotes:  33371.390250 ms  3.278908% improvement

x86 WIDE master (git revert dc592a41557b072178f1798700bf9c69cd8e4235)
default_toast_compression = pglz
TXT :                 30595.372000 ms
CSV :                 35665.908500 ms
TXT with 1/3 escapes: 32746.252000 ms
CSV with 1/3 quotes:  44136.542750 ms

x86 WIDE v15 default_toast_compression = pglz
TXT :                 22681.770750 ms  25.865354% improvement
CSV :                 22692.153000 ms  36.375789% improvement
TXT with 1/3 escapes: 30638.978000 ms  6.435161% improvement
CSV with 1/3 quotes:  44330.233000 ms  -0.438843% regression
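
The percentages above appear to be computed relative to the master timing,
with positive values meaning v15 is faster. A minimal sketch of that
arithmetic (percent_improvement() is a hypothetical name used only for this
illustration):

```c
#include <assert.h>

/*
 * Relative change versus master, in percent.  Positive means the patched
 * build is faster; negative means it is slower (a regression).
 */
static double
percent_improvement(double master_ms, double patched_ms)
{
    assert(master_ms > 0.0);
    return (master_ms - patched_ms) / master_ms * 100.0;
}
```

For example, the arm WIDE lz4 TXT case gives
(5613.525250 - 3201.494500) / 5613.525250 * 100, which is about 42.97.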
-- 
-- Manni Wood EDB: https://www.enterprisedb.com



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 11:57  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-13 11:57 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Thu, 12 Mar 2026 at 20:37, Nathan Bossart <[email protected]> wrote:
>
> Here is what I have staged for commit, which I'm planning to do tomorrow.
> Please review and/or test if you are able.

Thank you!

Unfortunately, v15 causes a regression in the 'csv & wide & 1/3' case
on my end: v14 took ~8100ms but v15 takes ~9100ms. If I add the
tmp_hit_eof variable back, the regression disappears. It also
disappears if I use a struct like the one below.

typedef struct CopyReadLineSIMDResult
{
    int            input_buf_ptr;
    bool        hit_eof;
    bool        result;
} CopyReadLineSIMDResult;

When I removed the tmp_hit_eof variable on v14, I didn't encounter any
regression. I really don't understand why this is happening on my end.
Manni didn't encounter any regression on the benchmark [1].
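
One way such a struct could be used is to return it by value, so that the
caller never takes the address of its own locals. The sketch below only
illustrates that idea and is an assumption about the shape of the change;
scan_for_newline() is a hypothetical stand-in, not the helper from the
patch.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct CopyReadLineSIMDResult
{
    int         input_buf_ptr;
    bool        hit_eof;
    bool        result;
} CopyReadLineSIMDResult;

/*
 * Hypothetical helper: scans for a newline and returns everything the
 * caller needs in one struct instead of through pointer arguments.
 */
static CopyReadLineSIMDResult
scan_for_newline(const char *buf, int len, int start)
{
    CopyReadLineSIMDResult res = {start, false, false};

    while (res.input_buf_ptr < len && buf[res.input_buf_ptr] != '\n')
        res.input_buf_ptr++;

    if (res.input_buf_ptr >= len)
    {
        /* ran out of data without finding a newline */
        res.hit_eof = true;
        res.result = true;
    }
    return res;
}
```

The caller would then copy res.input_buf_ptr and res.hit_eof into its own
local variables after the call.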

I benchmarked v15 and both of the cases above:

------------------------------------------------------------

Results for default_toast_compression = 'lz4':

+--------------------------------------------------+
|                 Optimization: -O2                |
+-------------------+--------------+---------------+
|                   |     Text     |      CSV      |
+-------------------+------+-------+-------+-------+
|        WIDE       | None |  1/3  |  None |  1/3  |
+-------------------+------+-------+-------+-------+
|     Old master    | 4260 |  4789 |  5930 |  8276 |
+-------------------+------+-------+-------+-------+
|        v14        | 2489 |  4439 |  2529 |  8098 |
+-------------------+------+-------+-------+-------+
|        v15        | 2494 |  4235 |  2490 |  9140 |
+-------------------+------+-------+-------+-------+
| v15 + tmp_hit_eof | 2487 |  4539 |  2478 |  8041 |
+-------------------+------+-------+-------+-------+
|    v15 + struct   | 2490 |  4531 |  2483 |  7756 |
+-------------------+------+-------+-------+-------+
|                   |      |       |       |       |
+-------------------+------+-------+-------+-------+
|                   |      |       |       |       |
+-------------------+------+-------+-------+-------+
|                   |     Text     |      CSV      |
+-------------------+------+-------+-------+-------+
|       NARROW      | None |  1/3  |  None |  1/3  |
+-------------------+------+-------+-------+-------+
|     Old master    | 9955 | 10056 | 10329 | 10872 |
+-------------------+------+-------+-------+-------+
|        v14        | 9917 | 10080 | 10104 | 10510 |
+-------------------+------+-------+-------+-------+
|        v15        | 9898 | 10062 | 10232 | 10483 |
+-------------------+------+-------+-------+-------+
| v15 + tmp_hit_eof | 9847 | 10004 | 10192 | 10437 |
+-------------------+------+-------+-------+-------+
|    v15 + struct   | 9877 | 10008 | 10107 | 10521 |
+-------------------+------+-------+-------+-------+


------------------------------------------------------------

Results for default_toast_compression = 'pglz':

+---------------------------------------------------+
|                 Optimization: -O2                 |
+-------------------+---------------+---------------+
|                   |      Text     |      CSV      |
+-------------------+-------+-------+-------+-------+
|        WIDE       |  None |  1/3  |  None |  1/3  |
+-------------------+-------+-------+-------+-------+
|     Old master    | 10579 | 10927 | 12276 | 14488 |
+-------------------+-------+-------+-------+-------+
|        v14        |  8832 | 10646 |  8815 | 14352 |
+-------------------+-------+-------+-------+-------+
|        v15        |  8859 | 10489 |  8835 | 15414 |
+-------------------+-------+-------+-------+-------+
| v15 + tmp_hit_eof |  8828 | 10829 |  8840 | 14297 |
+-------------------+-------+-------+-------+-------+
|    v15 + struct   |  8847 | 10829 |  8846 | 14003 |
+-------------------+-------+-------+-------+-------+
|                   |       |       |       |       |
+-------------------+-------+-------+-------+-------+
|                   |       |       |       |       |
+-------------------+-------+-------+-------+-------+
|                   |      Text     |      CSV      |
+-------------------+-------+-------+-------+-------+
|       NARROW      |  None |  1/3  |  None |  1/3  |
+-------------------+-------+-------+-------+-------+
|     Old master    |  9952 | 10342 | 10112 | 10861 |
+-------------------+-------+-------+-------+-------+
|        v14        |  9907 | 10344 | 10103 | 10492 |
+-------------------+-------+-------+-------+-------+
|        v15        |  9897 | 10261 | 10126 | 10490 |
+-------------------+-------+-------+-------+-------+
| v15 + tmp_hit_eof |  9848 | 10218 | 10184 | 10425 |
+-------------------+-------+-------+-------+-------+
|    v15 + struct   |  9858 | 10150 | 10116 | 10464 |
+-------------------+-------+-------+-------+-------+

------------------------------------------------------------

The 'csv & wide & 1/3' case is much better with 'v15 + struct' and 'v15
+ tmp_hit_eof'. The 'text & wide & 1/3' case is a bit worse than with
plain v15, but still better than master.


Separately from the issue above, I encountered a compiler warning with
v15. If 'USE_NO_SIMD' is defined, this warning appears:

copyfromparse.c:1780:1: warning: label ‘out’ defined but not used
[-Wunused-label]

The rest of the changes look good to me. v16 is attached; it fixes the
warning by protecting 'out' with '#ifndef USE_NO_SIMD' and has no other
changes. In addition, I included 'using CopyReadLineSIMDResult struct'
as 0002 to get opinions.
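
A minimal standalone illustration of the label issue (finish_line() is a
hypothetical function, not code from the patch): when the only goto that
targets a label is compiled out, the label itself has to be guarded as well,
or -Wunused-label fires.

```c
#include <assert.h>
#include <stdbool.h>

static int
finish_line(bool fast_path_done)
{
    int         steps = 0;

#ifndef USE_NO_SIMD
    if (fast_path_done)
        goto out;               /* the only user of the label */
#else
    (void) fast_path_done;      /* unused when the fast path is compiled out */
#endif

    steps++;                    /* stands in for the scalar loop */

#ifndef USE_NO_SIMD
out:                            /* guarded, so no unused label without SIMD */
#endif
    steps++;                    /* common cleanup, always executed */
    return steps;
}
```

Built with or without -DUSE_NO_SIMD, this should compile cleanly under
-Wunused-label.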


[1] https://postgr.es/m/CAKWEB6pMbdMDvhfaX1Z0eSULVQFYhEhssaRHdOxAX_5OYubxKw%40mail.gmail.com

--
Regards,
Nazir Bilal Yavuz
Microsoft


Attachments:

  [text/x-patch] v16-0001-Optimize-COPY-FROM-FORMAT-text-csv-using-SIMD.patch (8.9K, 2-v16-0001-Optimize-COPY-FROM-FORMAT-text-csv-using-SIMD.patch)
  download | inline diff:
From 49e82abfc752032fb10e2c144f7656f6fdf78366 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <[email protected]>
Date: Thu, 12 Mar 2026 12:32:23 -0500
Subject: [PATCH v16 1/2] Optimize COPY FROM (FORMAT {text,csv}) using SIMD.

Presently, such commands scan the input buffer one byte at a time
looking for special characters.  This commit adds a new path that
uses SIMD instructions to skip over chunks of data without any
special characters.  This can be much faster.

To avoid regressions, SIMD processing is disabled for the remainder
of the COPY FROM command as soon as we encounter a short line or a
special character (except for end-of-line characters, else we'd
always disable it after the first line).  This is perhaps too
conservative, but it could probably be made more lenient in the
future via fine-tuned heuristics.

Author: Nazir Bilal Yavuz <[email protected]>
Co-authored-by: Shinya Kato <[email protected]>
Reviewed-by: Ayoub Kazar <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Neil Conway <[email protected]>
Tested-by: Manni Wood <[email protected]>
Tested-by: Mark Wong <[email protected]>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
---
 src/backend/commands/copyfrom.c          |   1 +
 src/backend/commands/copyfromparse.c     | 182 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h |   1 +
 3 files changed, 181 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 0ece40557c8..95f6cb416a9 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1746,6 +1746,7 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attname = NULL;
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
+	cstate->simd_enabled = true;
 
 	/*
 	 * Allocate buffers for the input pipeline.
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 84c8809a889..bae3bf6fb0d 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 #include "utils/wait_event.h"
@@ -1311,6 +1312,152 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 	return result;
 }
 
+#ifndef USE_NO_SIMD
+/*
+ * Helper function for CopyReadLineText() that uses SIMD instructions to scan
+ * the input buffer for special characters.  This can be much faster.
+ *
+ * Note that we disable SIMD for the remainder of the COPY FROM command upon
+ * encountering a special character (except for end-of-line characters) or a
+ * short line.  This is perhaps too conservative, but it should help avoid
+ * regressions.  It could probably be made more lenient in the future via
+ * fine-tuned heuristics.
+ */
+static bool
+CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
+						   bool *hit_eof_p, int *input_buf_ptr_p)
+{
+	char	   *copy_input_buf;
+	int			input_buf_ptr;
+	int			copy_buf_len;
+	bool		unique_esc_char;	/* for csv, do quote/esc chars differ? */
+	bool		first = true;
+	bool		result = false;
+	const Vector8 nl_vec = vector8_broadcast('\n');
+	const Vector8 cr_vec = vector8_broadcast('\r');
+	Vector8		bs_or_quote_vec;	/* '\' for text, quote for csv */
+	Vector8		esc_vec;		/* only for csv */
+
+	if (is_csv)
+	{
+		char		quote = cstate->opts.quote[0];
+		char		esc = cstate->opts.escape[0];
+
+		bs_or_quote_vec = vector8_broadcast(quote);
+		esc_vec = vector8_broadcast(esc);
+		unique_esc_char = (quote != esc);
+	}
+	else
+	{
+		bs_or_quote_vec = vector8_broadcast('\\');
+		unique_esc_char = false;
+	}
+
+	/*
+	 * For a little extra speed within the loop, we copy some state members
+	 * into local variables. Note that we need to use a separate local
+	 * variable for input_buf_ptr so that the REFILL_LINEBUF macro works.  We
+	 * copy its value into the input_buf_ptr_p argument before returning.
+	 */
+	copy_input_buf = cstate->input_buf;
+	input_buf_ptr = cstate->input_buf_index;
+	copy_buf_len = cstate->input_buf_len;
+
+	/*
+	 * See the corresponding loop in CopyReadLineText() for more information
+	 * about the purpose of this loop.  This one does the same thing using
+	 * SIMD instructions, although we are quick to bail out to the scalar path
+	 * if we encounter a special character.
+	 */
+	for (;;)
+	{
+		Vector8		chunk;
+		Vector8		match;
+
+		/* Load more data if needed. */
+		if (copy_buf_len - input_buf_ptr < sizeof(Vector8))
+		{
+			REFILL_LINEBUF;
+
+			CopyLoadInputBuf(cstate);
+			/* update our local variables */
+			*hit_eof_p = cstate->input_reached_eof;
+			input_buf_ptr = cstate->input_buf_index;
+			copy_buf_len = cstate->input_buf_len;
+
+			/*
+			 * If we are completely out of data, break out of the loop,
+			 * reporting EOF.
+			 */
+			if (INPUT_BUF_BYTES(cstate) <= 0)
+			{
+				result = true;
+				break;
+			}
+		}
+
+		/*
+		 * If we still don't have enough data for the SIMD path, fall back to
+		 * the scalar code.  Note that this doesn't necessarily mean we
+		 * encountered a short line, so we leave cstate->simd_enabled set to
+		 * true.
+		 */
+		if (copy_buf_len - input_buf_ptr < sizeof(Vector8))
+			break;
+
+		/*
+		 * If we made it here, we have at least enough data to fit in a
+		 * Vector8, so we can use SIMD instructions to scan for special
+		 * characters.
+		 */
+		vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+		/*
+		 * Check for \n, \r, \\ (for text), quotes (for csv), and escapes (for
+		 * csv, if different from quotes).
+		 */
+		match = vector8_eq(chunk, nl_vec);
+		match = vector8_or(match, vector8_eq(chunk, cr_vec));
+		match = vector8_or(match, vector8_eq(chunk, bs_or_quote_vec));
+		if (unique_esc_char)
+			match = vector8_or(match, vector8_eq(chunk, esc_vec));
+
+		/*
+		 * If we found a special character, advance to it and hand off to the
+		 * scalar path.  Except for end-of-line characters, we also disable
+		 * SIMD processing for the remainder of the COPY FROM command.
+		 */
+		if (vector8_is_highbit_set(match))
+		{
+			uint32		mask;
+			char		c;
+
+			mask = vector8_highbit_mask(match);
+			input_buf_ptr += pg_rightmost_one_pos32(mask);
+
+			/*
+			 * Don't disable SIMD if we found \n or \r, else we'd stop using
+			 * SIMD instructions after the first line.  As an exception, we do
+			 * disable it if this is the first vector we processed, as that
+			 * means the line is too short for SIMD.
+			 */
+			c = copy_input_buf[input_buf_ptr];
+			if (first || (c != '\n' && c != '\r'))
+				cstate->simd_enabled = false;
+
+			break;
+		}
+
+		/* That chunk was clear of special characters, so we can skip it. */
+		input_buf_ptr += sizeof(Vector8);
+		first = false;
+	}
+
+	*input_buf_ptr_p = input_buf_ptr;
+	return result;
+}
+#endif							/* ! USE_NO_SIMD */
+
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
@@ -1361,11 +1508,36 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	 * input_buf_ptr have been determined to be part of the line, but not yet
 	 * transferred to line_buf.
 	 *
-	 * For a little extra speed within the loop, we copy input_buf and
-	 * input_buf_len into local variables.
+	 * For a little extra speed within the loop, we copy some state
+	 * information into local variables.  input_buf_ptr could be changed in
+	 * the SIMD path, so we must set that one before it.  The others are set
+	 * afterwards.
 	 */
-	copy_input_buf = cstate->input_buf;
 	input_buf_ptr = cstate->input_buf_index;
+
+	/*
+	 * We first try to use SIMD for the task described above, falling back to
+	 * the scalar path (i.e., the loop below) if needed.
+	 */
+#ifndef USE_NO_SIMD
+	if (cstate->simd_enabled)
+	{
+		/*
+		 * Using a temporary variable seems to encourage the compiler to keep
+		 * it in a register, which is beneficial for performance.
+		 */
+		int			tmp_input_buf_ptr;
+
+		result = CopyReadLineTextSIMDHelper(cstate, is_csv, &hit_eof,
+											&tmp_input_buf_ptr);
+		input_buf_ptr = tmp_input_buf_ptr;
+
+		if (result)
+			goto out;
+	}
+#endif							/* ! USE_NO_SIMD */
+
+	copy_input_buf = cstate->input_buf;
 	copy_buf_len = cstate->input_buf_len;
 
 	for (;;)
@@ -1605,6 +1777,10 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		}
 	}							/* end of outer loop */
 
+#ifndef USE_NO_SIMD
+out:
+#endif							/* ! USE_NO_SIMD */
+
 	/*
 	 * Transfer any still-uncopied data to line_buf.
 	 */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index f892c343157..9d3e244ee55 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -108,6 +108,7 @@ typedef struct CopyFromStateData
 								 * att */
 	bool	   *defaults;		/* if DEFAULT marker was found for
 								 * corresponding att */
+	bool		simd_enabled;	/* use SIMD to scan for special chars? */
 
 	/*
 	 * True if the corresponding attribute's is a constrained domain. This
-- 
2.47.3
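
The scanning loop in the patch above can be illustrated with a small
portable sketch.  This is not the patch's code: the Vector8 compares are
emulated with a plain loop, CHUNK_SIZE stands in for sizeof(Vector8), and
the names chunk_match_mask, lowest_set_bit and skip_clean_chunks are made
up for this illustration.  The simd_enabled heuristic and buffer refills
are left out.  What it tries to show is the idea from the patch comments:
build a per-byte match mask for a chunk, skip the chunk if the mask is
empty, otherwise advance to the lowest set bit and hand off to a
byte-by-byte parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 16			/* assumed stand-in for sizeof(Vector8) */

/*
 * Return a bitmask with bit i set when chunk[i] is \n, \r, "special"
 * (backslash for text, quote for csv) or "esc".  A SIMD build would get
 * the same mask from a few vector compares instead of this loop.
 */
static uint32_t
chunk_match_mask(const char *chunk, char special, char esc)
{
	uint32_t	mask = 0;

	for (int i = 0; i < CHUNK_SIZE; i++)
	{
		char		c = chunk[i];

		if (c == '\n' || c == '\r' || c == special || c == esc)
			mask |= (uint32_t) 1 << i;
	}
	return mask;
}

/* Position of the lowest set bit; the mask must be nonzero. */
static int
lowest_set_bit(uint32_t mask)
{
	int			pos = 0;

	assert(mask != 0);
	while ((mask & 1) == 0)
	{
		mask >>= 1;
		pos++;
	}
	return pos;
}

/*
 * Skip whole chunks that contain no special byte.  Returns the offset of
 * the first special byte found, or the offset at which fewer than
 * CHUNK_SIZE bytes remain; the caller continues byte by byte from there.
 */
static size_t
skip_clean_chunks(const char *buf, size_t len, size_t start,
				  char special, char esc)
{
	size_t		pos = start;

	assert(start <= len);
	while (len - pos >= CHUNK_SIZE)
	{
		uint32_t	mask = chunk_match_mask(buf + pos, special, esc);

		if (mask != 0)
			return pos + (size_t) lowest_set_bit(mask);
		pos += CHUNK_SIZE;
	}
	return pos;
}
```

With a 41-byte buffer whose first newline sits at offset 24,
skip_clean_chunks() skips the first 16-byte chunk and stops at 24.  Input
shorter than one chunk is left entirely to the scalar loop.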



  [text/x-patch] v16-0002-Use-CopyReadLineSIMDResult-struct.patch (4.3K, 3-v16-0002-Use-CopyReadLineSIMDResult-struct.patch)
From a32d853e020b1660510f960e7ba52707bbd6afe3 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Fri, 13 Mar 2026 14:25:45 +0300
Subject: [PATCH v16 2/2] Use CopyReadLineSIMDResult struct

---
 src/backend/commands/copyfromparse.c | 44 +++++++++++++++++-----------
 src/tools/pgindent/typedefs.list     |  1 +
 2 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index bae3bf6fb0d..3e3358af9e0 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1313,6 +1313,17 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 }
 
 #ifndef USE_NO_SIMD
+/*
+ * Result of CopyReadLineTextSIMDHelper, returned by value to avoid
+ * pointer parameters that could inhibit register allocation in the caller.
+ */
+typedef struct CopyReadLineSIMDResult
+{
+	int			input_buf_ptr;
+	bool		hit_eof;
+	bool		result;
+} CopyReadLineSIMDResult;
+
 /*
  * Helper function for CopyReadLineText() that uses SIMD instructions to scan
  * the input buffer for special characters.  This can be much faster.
@@ -1323,21 +1334,23 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
  * regressions.  It could probably be made more lenient in the future via
  * fine-tuned heuristics.
  */
-static bool
-CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
-						   bool *hit_eof_p, int *input_buf_ptr_p)
+static CopyReadLineSIMDResult
+CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv)
 {
+	CopyReadLineSIMDResult ret;
 	char	   *copy_input_buf;
 	int			input_buf_ptr;
 	int			copy_buf_len;
 	bool		unique_esc_char;	/* for csv, do quote/esc chars differ? */
 	bool		first = true;
-	bool		result = false;
 	const Vector8 nl_vec = vector8_broadcast('\n');
 	const Vector8 cr_vec = vector8_broadcast('\r');
 	Vector8		bs_or_quote_vec;	/* '\' for text, quote for csv */
 	Vector8		esc_vec;		/* only for csv */
 
+	ret.hit_eof = false;
+	ret.result = false;
+
 	if (is_csv)
 	{
 		char		quote = cstate->opts.quote[0];
@@ -1357,7 +1370,7 @@ CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
 	 * For a little extra speed within the loop, we copy some state members
 	 * into local variables. Note that we need to use a separate local
 	 * variable for input_buf_ptr so that the REFILL_LINEBUF macro works.  We
-	 * copy its value into the input_buf_ptr_p argument before returning.
+	 * copy its value into the return struct before returning.
 	 */
 	copy_input_buf = cstate->input_buf;
 	input_buf_ptr = cstate->input_buf_index;
@@ -1381,7 +1394,7 @@ CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
 
 			CopyLoadInputBuf(cstate);
 			/* update our local variables */
-			*hit_eof_p = cstate->input_reached_eof;
+			ret.hit_eof = cstate->input_reached_eof;
 			input_buf_ptr = cstate->input_buf_index;
 			copy_buf_len = cstate->input_buf_len;
 
@@ -1391,7 +1404,7 @@ CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
 			 */
 			if (INPUT_BUF_BYTES(cstate) <= 0)
 			{
-				result = true;
+				ret.result = true;
 				break;
 			}
 		}
@@ -1453,8 +1466,8 @@ CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
 		first = false;
 	}
 
-	*input_buf_ptr_p = input_buf_ptr;
-	return result;
+	ret.input_buf_ptr = input_buf_ptr;
+	return ret;
 }
 #endif							/* ! USE_NO_SIMD */
 
@@ -1522,15 +1535,12 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 #ifndef USE_NO_SIMD
 	if (cstate->simd_enabled)
 	{
-		/*
-		 * Using a temporary variable seems to encourage the compiler to keep
-		 * it in a register, which is beneficial for performance.
-		 */
-		int			tmp_input_buf_ptr;
+		CopyReadLineSIMDResult simd_result;
 
-		result = CopyReadLineTextSIMDHelper(cstate, is_csv, &hit_eof,
-											&tmp_input_buf_ptr);
-		input_buf_ptr = tmp_input_buf_ptr;
+		simd_result = CopyReadLineTextSIMDHelper(cstate, is_csv);
+		hit_eof = simd_result.hit_eof;
+		input_buf_ptr = simd_result.input_buf_ptr;
+		result = simd_result.result;
 
 		if (result)
 			goto out;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0de55183793..2acc40533c6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -538,6 +538,7 @@ CopyMethod
 CopyMultiInsertBuffer
 CopyMultiInsertInfo
 CopyOnErrorChoice
+CopyReadLineSIMDResult
 CopySeqResult
 CopySource
 CopyStmt
-- 
2.47.3
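
The calling convention this patch switches to can be shown in a reduced,
standalone form.  ScanResult and scan_for_newline are hypothetical names;
only the shape of the struct mirrors CopyReadLineSIMDResult.  The helper
returns all of its outputs in one small struct by value, so the caller
never has to take the address of its own locals.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type, shaped like CopyReadLineSIMDResult. */
typedef struct ScanResult
{
	int			input_buf_ptr;	/* where the scan stopped */
	bool		hit_eof;		/* ran out of input? */
	bool		result;			/* true if the caller should stop early */
} ScanResult;

/*
 * Scan buf[start..len) for the first '\n'.  All outputs travel in the
 * returned struct, so there are no pointer parameters.
 */
static ScanResult
scan_for_newline(const char *buf, int len, int start)
{
	ScanResult	ret;
	int			pos = start;

	ret.hit_eof = false;
	ret.result = false;

	while (pos < len && buf[pos] != '\n')
		pos++;

	if (pos >= len)
	{
		/* no newline before the end of the input */
		ret.hit_eof = true;
		ret.result = true;
	}

	ret.input_buf_ptr = pos;
	return ret;
}
```

A caller copies the fields into its own locals right after the call, as
CopyReadLineText() does with simd_result in the patch.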




* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 13:34  Nazir Bilal Yavuz <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-13 13:34 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Fri, 13 Mar 2026 at 14:57, Nazir Bilal Yavuz <[email protected]> wrote:
>
> Unfortunately, v15 causes a regression for a 'csv & wide & 1/3' case
> on my end. v14 was taking 8000ms but v15 took ~9100ms. If we add the
> tmp_hit_eof variable then the regression disappears. Also, if I use a
> struct like below, regression disappears again.

> When I removed the tmp_hit_eof variable on v14, I didn't encounter any
> regression. I really don't understand why this is happening on my end.
> Manni didn't encounter any regression on the benchmark [1].

Problem might be related to gcc. I am using Debian Trixie and my
current gcc version is 'gcc version 14.2.0 (Debian 14.2.0-19)'. If I
compile Postgres with 'Debian clang version 19.1.7 (3+b1)', then there
is no regression, which makes more sense IMO.

Here is a comparison for the csv & wide & 1/3 case (timings in ms).
Postgres is compiled with buildtype=debugoptimized and
default_toast_compression is lz4.

+--------------------------------+
|   CSV & WIDE & 1/3, LZ4, -O2   |
+--------------+--------+--------+
|              |   gcc  |  clang |
|              | 14.2.0 | 19.1.7 |
+--------------+--------+--------+
|  old master  |  8250  |  10400 |
+--------------+--------+--------+
|      v14     |  8100  |  9800  |
+--------------+--------+--------+
|      v15     |  9200  |  9800  |
+--------------+--------+--------+
| v15 + struct |  7750  |  9800  |
+--------------+--------+--------+
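
To make the table easier to compare with the percentages quoted elsewhere
in the thread, the relative change against old master can be computed as
(candidate - baseline) / baseline * 100.  The helper name percent_change
is made up for this sketch, and the inputs are assumed to be the
millisecond timings from the table; a positive value means slower.

```c
#include <assert.h>

/*
 * Relative change of candidate_ms against baseline_ms, in percent.
 * Positive means the candidate took longer than the baseline.
 */
static double
percent_change(double baseline_ms, double candidate_ms)
{
	assert(baseline_ms > 0.0);
	return (candidate_ms - baseline_ms) / baseline_ms * 100.0;
}
```

For the gcc column this gives about -1.8% for v14, +11.5% for v15 and
-6.1% for v15 + struct relative to old master (8250).  For the clang
column, 9800 against 10400 is about -5.8%.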

-- 
Regards,
Nazir Bilal Yavuz
Microsoft






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 14:05  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 2 replies; 114+ messages in thread

From: Nathan Bossart @ 2026-03-13 14:05 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Fri, Mar 13, 2026 at 04:34:49PM +0300, Nazir Bilal Yavuz wrote:
> On Fri, 13 Mar 2026 at 14:57, Nazir Bilal Yavuz <[email protected]> wrote:
>> Unfortunately, v15 causes a regression for a 'csv & wide & 1/3' case
>> on my end. v14 was taking 8000ms but v15 took ~9100ms. If we add the
>> tmp_hit_eof variable then the regression disappears. Also, if I use a
>> struct like below, regression disappears again.
> 
>> When I removed the tmp_hit_eof variable on v14, I didn't encounter any
>> regression. I really don't understand why this is happening on my end.
>> Manni didn't encounter any regression on the benchmark [1].
> 
> Problem might be related to gcc. I am using Debian Trixie and my
> current gcc version is 'gcc version 14.2.0 (Debian 14.2.0-19)'. If I
> compile Postgres with 'Debian clang version 19.1.7 (3+b1)', then there
> is no regression, which makes more sense IMO.

Let's just re-add the temporary variable for hit_eof.  The struct idea is
clever, but it's just a little more complicated than I think is necessary
here.

I've also removed the goto in favor of just duplicating the "out" code,
like you had before.  I'd like to avoid sporadic #ifndef USE_NO_SIMD uses,
and goto is out of fashion, anyway.
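
The temporary-variable approach can be sketched in isolation.  This is a
hypothetical reduction, not the committed code: find_newline and
first_line_length are invented names.  The helper reports through pointer
parameters, the caller passes the addresses of short-lived temporaries and
copies them into its long-lived locals, and the early exit duplicates a
small epilogue instead of jumping to a label.  Whether this actually keeps
the locals in registers depends on the compiler, as the gcc and clang
numbers in this thread suggest.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: advance to the first '\n' at or after start.
 * Outputs go through pointer parameters; returns true if the input ran
 * out before a newline was seen.
 */
static bool
find_newline(const char *buf, int len, int start,
			 bool *hit_eof_p, int *pos_p)
{
	int			pos = start;

	while (pos < len && buf[pos] != '\n')
		pos++;

	*hit_eof_p = (pos >= len);
	*pos_p = pos;
	return pos >= len;
}

/* Returns the length of the first line and reports EOF via hit_eof_out. */
static int
first_line_length(const char *buf, int len, bool *hit_eof_out)
{
	int			pos = 0;
	bool		hit_eof = false;
	bool		done;

	{
		/*
		 * Only the temporaries have their address taken, so pos and
		 * hit_eof never escape to the helper.
		 */
		int			tmp_pos;
		bool		tmp_hit_eof;

		done = find_newline(buf, len, pos, &tmp_hit_eof, &tmp_pos);
		pos = tmp_pos;
		hit_eof = tmp_hit_eof;
	}

	if (done)
	{
		/* early exit: repeat the short epilogue rather than use goto */
		*hit_eof_out = hit_eof;
		return pos;
	}

	/* a byte-by-byte loop using pos and hit_eof would continue here */
	*hit_eof_out = hit_eof;
	return pos;
}
```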

-- 
nathan



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 15:00  Nathan Bossart <[email protected]>
  parent: Manni Wood <[email protected]>
  0 siblings, 0 replies; 114+ messages in thread

From: Nathan Bossart @ 2026-03-13 15:00 UTC (permalink / raw)
  To: Manni Wood <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Thu, Mar 12, 2026 at 09:39:38PM -0500, Manni Wood wrote:
> I found some time this evening to run some benchmarks using your v15 patch.
> I hope these help.

Thanks!

> x86 NARROW v15 default_toast_compression = lz4
> TXT :                 26019.629000 ms  0.347215% improvement
> CSV :                 26379.889000 ms  5.526984% improvement
> TXT with 1/3 escapes: 28865.322750 ms  -3.147600% regression
> CSV with 1/3 quotes:  33218.293250 ms  3.399359% improvement

> x86 NARROW v15 default_toast_compression = pglz
> TXT :                 26438.415000 ms  -0.128382% regression
> CSV :                 26869.718000 ms  4.509804% improvement
> TXT with 1/3 escapes: 29379.299750 ms  -4.610819% regression
> CSV with 1/3 quotes:  33371.390250 ms  3.278908% improvement

Those 3-5% regressions are interesting, but given there are similar
"improvements" for the surrounding cases, I'm going to consider them as
noise for now and proceed with the patch.  If folks feel strongly about
digging deeper here, I'm happy to revisit the subject.

-- 
nathan






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 15:58  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 1 reply; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-13 15:58 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Fri, 13 Mar 2026 at 17:05, Nathan Bossart <[email protected]> wrote:
>
> On Fri, Mar 13, 2026 at 04:34:49PM +0300, Nazir Bilal Yavuz wrote:
> > On Fri, 13 Mar 2026 at 14:57, Nazir Bilal Yavuz <[email protected]> wrote:
> >> Unfortunately, v15 causes a regression for a 'csv & wide & 1/3' case
> >> on my end. v14 was taking 8000ms but v15 took ~9100ms. If we add the
> >> tmp_hit_eof variable then the regression disappears. Also, if I use a
> >> struct like below, regression disappears again.
> >
> >> When I removed the tmp_hit_eof variable on v14, I didn't encounter any
> >> regression. I really don't understand why this is happening on my end.
> >> Manni didn't encounter any regression on the benchmark [1].
> >
> > Problem might be related to gcc. I am using Debian Trixie and my
> > current gcc version is 'gcc version 14.2.0 (Debian 14.2.0-19)'. If I
> > compile Postgres with 'Debian clang version 19.1.7 (3+b1)', then there
> > is no regression, which makes more sense IMO.
>
> Let's just re-add the temporary variable for hit_eof.  The struct idea is
> clever, but it's just a little more complicated than I think is necessary
> here.
>
> I've also removed the goto in favor of just duplicating the "out" code,
> like you had before.  I'd like to avoid sporadic #ifndef USE_NO_SIMD uses,
> and goto is out of fashion, anyway.

Thanks! v17 LGTM. I didn't encounter any regressions.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft



* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 16:08  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 114+ messages in thread

From: Nathan Bossart @ 2026-03-13 16:08 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Fri, Mar 13, 2026 at 06:58:49PM +0300, Nazir Bilal Yavuz wrote:
> Thanks! v17 LGTM. I didn't encounter any regressions.

Committed.

-- 
nathan






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 16:13  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 0 replies; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-13 16:13 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi,

On Fri, 13 Mar 2026 at 19:08, Nathan Bossart <[email protected]> wrote:
>
> On Fri, Mar 13, 2026 at 06:58:49PM +0300, Nazir Bilal Yavuz wrote:
> > Thanks! v17 LGTM. I didn't encounter any regressions.
>
> Committed.

Thank you for taking care of this!

-- 
Regards,
Nazir Bilal Yavuz
Microsoft






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 16:14  Nazir Bilal Yavuz <[email protected]>
  parent: Nathan Bossart <[email protected]>
  1 sibling, 2 replies; 114+ messages in thread

From: Nazir Bilal Yavuz @ 2026-03-13 16:14 UTC (permalink / raw)
  To: Greg Burd <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

Hi Greg,

On Fri, 13 Mar 2026 at 18:29, Greg Burd <[email protected]> wrote:
>
> I've always been a fan of these kinds of optimizations so I couldn't resist reviewing, but I know you're ready to commit so I'll just check on some systems I have. :)

Thank you for the review!

> At first glance the implementation seems conservative, but correct and safe. Local testing on Linux/FreeBSD x86_64 and Win11/aarch64/MSVC seems good. I also tried IllumOS/SPARCv9 with some fixes (from another active thread) to the build system, and it worked just fine too.  I'm sure the 10 people who care will be thrilled. ;-)

Yes, we can probably improve this further with heuristics, but for now
we wanted to avoid introducing any potential regressions.

> I also created a few tests (attached) to check boundary conditions, I might add some along with the RISC-V work.

Thank you for the tests! I have checked them and the output is the
same on both v17 and master. Do you think it would make sense to add
them as regression tests?
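
The attached tests are not reproduced here.  As a hypothetical
illustration of what checking boundary conditions can mean for a chunked
scanner, the sketch below compares a 16-byte-at-a-time scan against a
plain byte-by-byte scan for every line length up to three chunks and every
newline position.  All names are invented, CHUNK_SIZE is an assumed vector
width, and memchr() stands in for the vector compare.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 16			/* assumed vector width */

/* Reference scan: offset of the first '\n', or len if there is none. */
static size_t
scan_bytewise(const char *buf, size_t len)
{
	size_t		i;

	for (i = 0; i < len; i++)
		if (buf[i] == '\n')
			break;
	return i;
}

/* Skip newline-free chunks, then finish the tail byte by byte. */
static size_t
scan_chunked(const char *buf, size_t len)
{
	size_t		pos = 0;

	while (len - pos >= CHUNK_SIZE &&
		   memchr(buf + pos, '\n', CHUNK_SIZE) == NULL)
		pos += CHUNK_SIZE;

	while (pos < len && buf[pos] != '\n')
		pos++;
	return pos;
}

/* Count disagreements over all lengths and newline positions. */
static int
count_boundary_mismatches(void)
{
	char		buf[3 * CHUNK_SIZE + 1];
	int			mismatches = 0;

	for (size_t len = 0; len <= 3 * CHUNK_SIZE; len++)
	{
		assert(len < sizeof(buf));
		for (size_t nl = 0; nl <= len; nl++)
		{
			memset(buf, 'x', sizeof(buf));
			if (nl < len)
				buf[nl] = '\n';	/* nl == len means "no newline" */
			if (scan_chunked(buf, len) != scan_bytewise(buf, len))
				mismatches++;
		}
	}
	return mismatches;
}
```

The interesting positions are the last byte of a chunk, the first byte of
the next one, and inputs that end exactly on a chunk boundary.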

-- 
Regards,
Nazir Bilal Yavuz
Microsoft






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 16:16  Nathan Bossart <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 1 reply; 114+ messages in thread

From: Nathan Bossart @ 2026-03-13 16:16 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Greg Burd <[email protected]>; Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Fri, Mar 13, 2026 at 07:14:06PM +0300, Nazir Bilal Yavuz wrote:
> On Fri, 13 Mar 2026 at 18:29, Greg Burd <[email protected]> wrote:
>> I also created a few tests (attached) to check boundary conditions, I
>> might add some along with the RISC-V work.
> 
> Thank you for the tests! I have checked them and the output is the
> same on both v17 and master. Do you think it would make sense to add
> them as regression tests?

Seems like a good idea.  I was curious what the test coverage looked like
without extra tests.  Once there's a report, we could choose a subset of
these to close any gaps.

-- 
nathan






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 17:21  Greg Burd <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  1 sibling, 1 reply; 114+ messages in thread

From: Greg Burd @ 2026-03-13 17:21 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers


On Fri, Mar 13, 2026, at 12:14 PM, Nazir Bilal Yavuz wrote:
> Hi Greg,

Hello Nazir,

> On Fri, 13 Mar 2026 at 18:29, Greg Burd <[email protected]> wrote:
>>
>> I've always been a fan of these kinds of optimizations so I couldn't resist reviewing, but I know you're ready to commit so I'll just check on some systems I have. :)
>
> Thank you for the review!

Thank YOU for the work fixing this. :)

>> At first glance the implementation seems conservative, but correct and safe. Local testing on Linux/FreeBSD x86_64 and Win11/aarch64/MSVC seems good. I also tried IllumOS/SPARCv9 with some fixes (from another active thread) to the build system, and it worked just fine too.  I'm sure the 10 people who care will be thrilled. ;-)
>
> Yes, we can probably improve this further with heuristics, but for now
> we wanted to avoid introducing any potential regressions.
>
>> I also created a few tests (attached) to check boundary conditions, I might add some along with the RISC-V work.
>
> Thank you for the tests! I have checked them and the output is the
> same on both v17 and master. Do you think it would make sense to add
> them as regression tests?

If there are tests that materially add to the coverage that's a good thing to consider adding.  I don't think all those tests are necessary.

best.

-greg

> -- 
> Regards,
> Nazir Bilal Yavuz
> Microsoft






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 17:22  Greg Burd <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 0 replies; 114+ messages in thread

From: Greg Burd @ 2026-03-13 17:22 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; Nazir Bilal Yavuz <[email protected]>; +Cc: Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers


On Fri, Mar 13, 2026, at 12:16 PM, Nathan Bossart wrote:
> On Fri, Mar 13, 2026 at 07:14:06PM +0300, Nazir Bilal Yavuz wrote:
>> On Fri, 13 Mar 2026 at 18:29, Greg Burd <[email protected]> wrote:
>>> I also created a few tests (attached) to check boundary conditions, I
>>> might add some along with the RISC-V work.
>> 
>> Thank you for the tests! I have checked them and the output is the
>> same on both v17 and master. Do you think it would make sense to add
>> them as regression tests?
>
> Seems like a good idea.  I was curious what the test coverage looked like
> without extra tests.  Once there's a report, we could choose a subset of
> these to close any gaps.

+1, you said it better than I did! :)

> -- 
> nathan

best.

-greg






* Re: Speed up COPY FROM text/CSV parsing using SIMD
@ 2026-03-13 17:33  Nathan Bossart <[email protected]>
  parent: Greg Burd <[email protected]>
  0 siblings, 0 replies; 114+ messages in thread

From: Nathan Bossart @ 2026-03-13 17:33 UTC (permalink / raw)
  To: Greg Burd <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; Manni Wood <[email protected]>; KAZAR Ayoub <[email protected]>; Neil Conway <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; pgsql-hackers

On Fri, Mar 13, 2026 at 01:21:38PM -0400, Greg Burd wrote:
> On Fri, Mar 13, 2026, at 12:14 PM, Nazir Bilal Yavuz wrote:
>> On Fri, 13 Mar 2026 at 18:29, Greg Burd <[email protected]> wrote:
>>> I also created a few tests (attached) to check boundary conditions, I
>>> might add some along with the RISC-V work.
>>
>> Thank you for the tests! I have checked them and the output is the
>> same on both v17 and master. Do you think it would make sense to add
>> them as regression tests?
> 
> If there are tests that materially add to the coverage that's a good
> thing to consider adding.  I don't think all those tests are necessary.

We seem to have good coverage on the new code [0].  I still wouldn't mind
adding a couple of tests for correctness, if folks want them.

[0] https://coverage.postgresql.org/src/backend/commands/copyfromparse.c.gcov.html

-- 
nathan






end of thread (last message: 2026-03-13 17:33 UTC)

Thread overview: 114+ messages
2025-08-07 01:48 Speed up COPY FROM text/CSV parsing using SIMD Shinya Kato <[email protected]>
2025-08-07 11:15 ` Nazir Bilal Yavuz <[email protected]>
2025-08-11 08:52   ` Nazir Bilal Yavuz <[email protected]>
2025-08-12 07:25     ` Shinya Kato <[email protected]>
2025-08-13 06:21       ` Shinya Kato <[email protected]>
2025-08-14 02:24         ` KAZAR Ayoub <[email protected]>
2025-08-14 10:29           ` Nazir Bilal Yavuz <[email protected]>
2025-08-14 14:59             ` KAZAR Ayoub <[email protected]>
2025-08-19 12:33               ` Nazir Bilal Yavuz <[email protected]>
2025-08-19 14:14                 ` Nazir Bilal Yavuz <[email protected]>
2025-08-21 15:47                   ` Andrew Dunstan <[email protected]>
2025-10-16 14:29                     ` Nazir Bilal Yavuz <[email protected]>
2025-10-18 18:46                       ` KAZAR Ayoub <[email protected]>
2025-10-18 20:01                         ` Nazir Bilal Yavuz <[email protected]>
2025-10-21 06:17                           ` KAZAR Ayoub <[email protected]>
2025-10-21 06:44                             ` KAZAR Ayoub <[email protected]>
2025-10-21 18:55                             ` Nathan Bossart <[email protected]>
2025-10-18 20:01                       ` Nazir Bilal Yavuz <[email protected]>
2025-10-20 14:02                       ` Andrew Dunstan <[email protected]>
2025-10-20 17:04                         ` Nathan Bossart <[email protected]>
2025-10-20 20:31                           ` Andrew Dunstan <[email protected]>
2025-10-20 21:09                             ` Nazir Bilal Yavuz <[email protected]>
2025-10-21 18:40                               ` Nathan Bossart <[email protected]>
2025-10-22 12:33                                 ` Nazir Bilal Yavuz <[email protected]>
2025-10-22 19:24                                   ` Nathan Bossart <[email protected]>
2025-10-29 22:22                                     ` Andrew Dunstan <[email protected]>
2025-11-11 22:23                                       ` Manni Wood <[email protected]>
2025-11-12 14:44                                         ` KAZAR Ayoub <[email protected]>
2025-11-13 02:40                                           ` Manni Wood <[email protected]>
2025-11-17 22:16                                             ` Nathan Bossart <[email protected]>
2025-11-17 22:52                                               ` Shinya Kato <[email protected]>
2025-11-18 08:04                                                 ` Nazir Bilal Yavuz <[email protected]>
2025-11-18 14:01                                                   ` Andrew Dunstan <[email protected]>
2025-11-18 14:20                                                     ` Nazir Bilal Yavuz <[email protected]>
2025-11-19 21:01                                                       ` Nathan Bossart <[email protected]>
2025-11-20 12:55                                                         ` Nazir Bilal Yavuz <[email protected]>
2025-11-21 14:48                                                           ` Andrew Dunstan <[email protected]>
2025-11-24 21:59                                                           ` Nathan Bossart <[email protected]>
2025-11-26 00:09                                                             ` Manni Wood <[email protected]>
2025-11-26 11:50                                                         ` KAZAR Ayoub <[email protected]>
2025-11-26 14:21                                                           ` Manni Wood <[email protected]>
2025-12-06 01:39                                                             ` Manni Wood <[email protected]>
2025-12-06 07:55                                                               ` Bilal Yavuz <[email protected]>
2025-12-09 13:40                                                                 ` Bilal Yavuz <[email protected]>
2025-12-09 22:13                                                                   ` Manni Wood <[email protected]>
2025-12-10 11:59                                                                     ` Nazir Bilal Yavuz <[email protected]>
2025-12-12 20:42                                                                   ` Mark Wong <[email protected]>
2025-12-12 23:09                                                                     ` Manni Wood <[email protected]>
2025-12-18 07:35                                                                       ` Nazir Bilal Yavuz <[email protected]>
2025-12-24 15:07                                                                         ` KAZAR Ayoub <[email protected]>
2025-12-29 17:03                                                                           ` Manni Wood <[email protected]>
2025-12-31 13:04                                                                           ` Nazir Bilal Yavuz <[email protected]>
2025-11-18 20:42                                               ` KAZAR Ayoub <[email protected]>
2025-08-21 19:36                 ` KAZAR Ayoub <[email protected]>
2025-08-19 09:09   ` Ants Aasma <[email protected]>
2026-02-20 00:09 Re: Speed up COPY FROM text/CSV parsing using SIMD Manni Wood <[email protected]>
2026-02-20 09:50 ` Nazir Bilal Yavuz <[email protected]>
2026-02-20 18:15   ` Nathan Bossart <[email protected]>
2026-02-23 09:10     ` Nazir Bilal Yavuz <[email protected]>
2026-02-24 04:44       ` Manni Wood <[email protected]>
2026-02-24 13:57         ` Nazir Bilal Yavuz <[email protected]>
2026-02-24 15:07           ` KAZAR Ayoub <[email protected]>
2026-02-24 17:48           ` Nathan Bossart <[email protected]>
2026-02-25 04:06             ` Manni Wood <[email protected]>
2026-02-25 14:24             ` Nazir Bilal Yavuz <[email protected]>
2026-02-26 12:19               ` Nazir Bilal Yavuz <[email protected]>
2026-02-26 14:31                 ` KAZAR Ayoub <[email protected]>
2026-02-26 14:36                   ` Manni Wood <[email protected]>
2026-02-26 15:32                     ` Manni Wood <[email protected]>
2026-02-26 15:51                       ` KAZAR Ayoub <[email protected]>
2026-03-02 19:55               ` Nathan Bossart <[email protected]>
2026-03-04 15:15                 ` Nazir Bilal Yavuz <[email protected]>
2026-03-05 21:25                   ` Andrew Dunstan <[email protected]>
2026-03-06 16:59                     ` Manni Wood <[email protected]>
2026-03-06 17:39                       ` Nazir Bilal Yavuz <[email protected]>
2026-03-06 18:13                         ` Manni Wood <[email protected]>
2026-03-06 18:55                           ` Nazir Bilal Yavuz <[email protected]>
2026-03-06 21:25                             ` Manni Wood <[email protected]>
2026-03-06 23:13                               ` Nathan Bossart <[email protected]>
2026-03-06 23:31                                 ` KAZAR Ayoub <[email protected]>
2026-03-08 10:31                                   ` Nazir Bilal Yavuz <[email protected]>
2026-03-08 19:45                                     ` Manni Wood <[email protected]>
2026-03-09 08:10                                       ` Nazir Bilal Yavuz <[email protected]>
2026-03-09 13:31                                         ` Manni Wood <[email protected]>
2026-03-09 13:43                                           ` Nazir Bilal Yavuz <[email protected]>
2026-03-09 18:25                   ` Nathan Bossart <[email protected]>
2026-03-10 02:30                     ` Manni Wood <[email protected]>
2026-03-10 11:42                       ` Nazir Bilal Yavuz <[email protected]>
2026-03-10 12:35                     ` Nazir Bilal Yavuz <[email protected]>
2026-03-10 17:10                       ` Nathan Bossart <[email protected]>
2026-03-11 11:36                         ` Nazir Bilal Yavuz <[email protected]>
2026-03-11 12:19                           ` KAZAR Ayoub <[email protected]>
2026-03-11 13:10                             ` Nazir Bilal Yavuz <[email protected]>
2026-03-11 13:23                               ` KAZAR Ayoub <[email protected]>
2026-03-11 18:09                           ` Nathan Bossart <[email protected]>
2026-03-11 18:49                             ` Nazir Bilal Yavuz <[email protected]>
2026-03-11 19:02                               ` Nathan Bossart <[email protected]>
2026-03-11 19:22                                 ` Nazir Bilal Yavuz <[email protected]>
2026-03-11 20:42                                   ` Nathan Bossart <[email protected]>
2026-03-12 10:59                                     ` Nazir Bilal Yavuz <[email protected]>
2026-03-12 17:37                                       ` Nathan Bossart <[email protected]>
2026-03-13 02:39                                         ` Manni Wood <[email protected]>
2026-03-13 15:00                                           ` Nathan Bossart <[email protected]>
2026-03-13 11:57                                         ` Nazir Bilal Yavuz <[email protected]>
2026-03-13 13:34                                           ` Nazir Bilal Yavuz <[email protected]>
2026-03-13 14:05                                             ` Nathan Bossart <[email protected]>
2026-03-13 15:58                                               ` Nazir Bilal Yavuz <[email protected]>
2026-03-13 16:08                                                 ` Nathan Bossart <[email protected]>
2026-03-13 16:13                                                   ` Nazir Bilal Yavuz <[email protected]>
2026-03-13 16:14                                               ` Nazir Bilal Yavuz <[email protected]>
2026-03-13 16:16                                                 ` Nathan Bossart <[email protected]>
2026-03-13 17:22                                                   ` Greg Burd <[email protected]>
2026-03-13 17:21                                                 ` Greg Burd <[email protected]>
2026-03-13 17:33                                                   ` Nathan Bossart <[email protected]>
