public inbox for [email protected]  
help / color / mirror / Atom feed
From: Nazir Bilal Yavuz <[email protected]>
To: Shinya Kato <[email protected]>
Cc: [email protected]
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: Thu, 7 Aug 2025 14:15:06 +0300
Message-ID: <CAN55FZ247JdiT8Sd1SRiyOJxk3Ei=pDCL4kpdP=HqLRjOhKf1Q@mail.gmail.com> (raw)
In-Reply-To: <CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig@mail.gmail.com>
References: <CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig@mail.gmail.com>

Hi,

Thank you for working on this!

On Thu, 7 Aug 2025 at 04:49, Shinya Kato <[email protected]> wrote:
>
> Hi hackers,
>
> I have implemented SIMD optimization for the COPY FROM (FORMAT {csv,
> text}) command and observed approximately a 5% performance
> improvement. Please see the detailed test results below.

I have been working on the same idea. I was not moving input_buf_ptr
as far as possible, so I think your approach is better.

Also, I did a benchmark on text format. I created a benchmark for line
length in a table being from 1 byte to 1 megabyte.The peak improvement
is line length being 4096 and the improvement is more than 20% [1], I
saw no regression on your patch.

> Idea
> ====
> The current text/CSV parser processes input byte-by-byte, checking
> whether each byte is a special character (\n, \r, quote, escape) or a
> regular character, and transitions states in a state machine. This
> sequential processing is inefficient and likely causes frequent branch
> mispredictions due to the many if statements.
>
> I thought this problem could be addressed by leveraging SIMD and
> vectorized operations for faster processing.
>
> Implementation Overview
> =======================
> 1. Create a vector of special characters (e.g., Vector8 nl =
> vector8_broadcast('\n');).
> 2. Load the input buffer into a Vector8 variable called chunk.
> 3. Perform vectorized operations between chunk and the special
> character vectors to check if the buffer contains any special
> characters.
> 4-1. If no special characters are found, advance the input_buf_ptr by
> sizeof(Vector8).
> 4-2. If special characters are found, advance the input_buf_ptr as far
> as possible, then fall back to the original text/CSV parser for
> byte-by-byte processing.
>
...
> Thought?
> I would appreciate feedback on the implementation and any suggestions
> for further improvement.


I have a couple of ideas that I was working on:
---

+         * However, SIMD optimization cannot be applied in the following cases:
+         * - Inside quoted fields, where escape sequences and closing quotes
+         *   require sequential processing to handle correctly.

I think you can continue SIMD inside quoted fields. Only important
thing is you need to set last_was_esc to false when SIMD skipped the
chunk.
---

+         * - When the remaining buffer size is smaller than the size of a SIMD
+         *   vector register, as SIMD operations require processing data in
+         *   fixed-size chunks.

You run SIMD when 'copy_buf_len - input_buf_ptr >= sizeof(Vector8)'
but you only call CopyLoadInputBuf() when 'input_buf_ptr >=
copy_buf_len || need_data' so basically you need to wait at least the
sizeof(Vector8) character to pass for the next SIMD. And in the worst
case; if CopyLoadInputBuf() puts one character less than
sizeof(Vector8), then you can't ever run SIMD. I think we need to make
sure that CopyLoadInputBuf() loads at least the sizeof(Vector8)
character to the input_buf so we do not encounter that problem.
---

What do you think about adding SIMD to CopyReadAttributesText() and
CopyReadAttributesCSV() functions? When I add your SIMD approach to
CopyReadAttributesText() function, the improvement on the 4096 byte
line length input [1] goes from 20% to 30%.
---

I shared my ideas as a Feedback.txt file (.txt to stay off CFBot's
radar for this thread). I hope these help, please let me know if you
have any questions.

--
Regards,
Nazir Bilal Yavuz
Microsoft

From b13f4cdf134eef5fbecf9ea06f9b1c99890b7c02 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Thu, 7 Aug 2025 13:27:34 +0300
Subject: [PATCH] Feedback

---
 src/backend/commands/copyfromparse.c | 55 ++++++++++++++++++++++++++--
 1 file changed, 51 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 5aba0fa6cb7..dae5c1f698c 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -670,8 +670,12 @@ CopyLoadInputBuf(CopyFromState cstate)
 		/* If we now have some unconverted data, try to convert it */
 		CopyConvertBuf(cstate);
 
-		/* If we now have some more input bytes ready, return them */
-		if (INPUT_BUF_BYTES(cstate) > nbytes)
+		/*
+		 * If we now have at least sizeof(Vector8) input bytes ready, return
+		 * them. This is beneficial for SIMD processing in the
+		 * CopyReadLineText() function.
+		 */
+		if (INPUT_BUF_BYTES(cstate) > nbytes + sizeof(Vector8))
 			return;
 
 		/*
@@ -1322,7 +1326,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		 * unsafe with the old v2 COPY protocol, but we don't support that
 		 * anymore.
 		 */
-		if (input_buf_ptr >= copy_buf_len || need_data)
+		if (input_buf_ptr + sizeof(Vector8) >= copy_buf_len || need_data)
 		{
 			REFILL_LINEBUF;
 
@@ -1359,7 +1363,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		 *   vector register, as SIMD operations require processing data in
 		 *   fixed-size chunks.
 		 */
-		if (!in_quote && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		if (copy_buf_len - input_buf_ptr >= sizeof(Vector8))
 		{
 			Vector8		chunk;
 			Vector8		match;
@@ -1395,6 +1399,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			{
 				/* No special characters found, so skip the entire chunk */
 				input_buf_ptr += sizeof(Vector8);
+				last_was_esc = false;
 				continue;
 			}
 		}
@@ -1650,6 +1655,11 @@ CopyReadAttributesText(CopyFromState cstate)
 	char	   *cur_ptr;
 	char	   *line_end_ptr;
 
+#ifndef USE_NO_SIMD
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		delim = vector8_broadcast(delimc);;
+#endif
+
 	/*
 	 * We need a special case for zero-column tables: check that the input
 	 * line is empty, and return.
@@ -1717,6 +1727,43 @@ CopyReadAttributesText(CopyFromState cstate)
 		{
 			char		c;
 
+#ifndef USE_NO_SIMD
+		if (line_end_ptr - cur_ptr >= sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match;
+			uint32		mask;
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) cur_ptr);
+
+			/* Create a mask of all special characters we need to stop at */
+			match = vector8_or(vector8_eq(chunk, bs), vector8_eq(chunk, delim));
+
+			/* Check if we found any special characters */
+			mask = vector8_highbit_mask(match);
+			if (mask != 0)
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				int advance = pg_rightmost_one_pos32(mask);
+				memcpy(output_ptr, cur_ptr, advance);
+				output_ptr += advance;
+				cur_ptr += advance;
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				memcpy(output_ptr, cur_ptr, sizeof(Vector8));
+				output_ptr += sizeof(Vector8);
+				cur_ptr += sizeof(Vector8);
+				continue;
+			}
+		}
+#endif
+
 			end_ptr = cur_ptr;
 			if (cur_ptr >= line_end_ptr)
 				break;
-- 
2.50.1



Attachments:

  [text/plain] Feedback.txt (3.4K, 2-Feedback.txt)
  download | inline diff:
From b13f4cdf134eef5fbecf9ea06f9b1c99890b7c02 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Thu, 7 Aug 2025 13:27:34 +0300
Subject: [PATCH] Feedback

---
 src/backend/commands/copyfromparse.c | 55 ++++++++++++++++++++++++++--
 1 file changed, 51 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 5aba0fa6cb7..dae5c1f698c 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -670,8 +670,12 @@ CopyLoadInputBuf(CopyFromState cstate)
 		/* If we now have some unconverted data, try to convert it */
 		CopyConvertBuf(cstate);
 
-		/* If we now have some more input bytes ready, return them */
-		if (INPUT_BUF_BYTES(cstate) > nbytes)
+		/*
+		 * If we now have at least sizeof(Vector8) input bytes ready, return
+		 * them. This is beneficial for SIMD processing in the
+		 * CopyReadLineText() function.
+		 */
+		if (INPUT_BUF_BYTES(cstate) > nbytes + sizeof(Vector8))
 			return;
 
 		/*
@@ -1322,7 +1326,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		 * unsafe with the old v2 COPY protocol, but we don't support that
 		 * anymore.
 		 */
-		if (input_buf_ptr >= copy_buf_len || need_data)
+		if (input_buf_ptr + sizeof(Vector8) >= copy_buf_len || need_data)
 		{
 			REFILL_LINEBUF;
 
@@ -1359,7 +1363,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		 *   vector register, as SIMD operations require processing data in
 		 *   fixed-size chunks.
 		 */
-		if (!in_quote && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		if (copy_buf_len - input_buf_ptr >= sizeof(Vector8))
 		{
 			Vector8		chunk;
 			Vector8		match;
@@ -1395,6 +1399,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			{
 				/* No special characters found, so skip the entire chunk */
 				input_buf_ptr += sizeof(Vector8);
+				last_was_esc = false;
 				continue;
 			}
 		}
@@ -1650,6 +1655,11 @@ CopyReadAttributesText(CopyFromState cstate)
 	char	   *cur_ptr;
 	char	   *line_end_ptr;
 
+#ifndef USE_NO_SIMD
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		delim = vector8_broadcast(delimc);;
+#endif
+
 	/*
 	 * We need a special case for zero-column tables: check that the input
 	 * line is empty, and return.
@@ -1717,6 +1727,43 @@ CopyReadAttributesText(CopyFromState cstate)
 		{
 			char		c;
 
+#ifndef USE_NO_SIMD
+		if (line_end_ptr - cur_ptr >= sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match;
+			uint32		mask;
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) cur_ptr);
+
+			/* Create a mask of all special characters we need to stop at */
+			match = vector8_or(vector8_eq(chunk, bs), vector8_eq(chunk, delim));
+
+			/* Check if we found any special characters */
+			mask = vector8_highbit_mask(match);
+			if (mask != 0)
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				int advance = pg_rightmost_one_pos32(mask);
+				memcpy(output_ptr, cur_ptr, advance);
+				output_ptr += advance;
+				cur_ptr += advance;
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				memcpy(output_ptr, cur_ptr, sizeof(Vector8));
+				output_ptr += sizeof(Vector8);
+				cur_ptr += sizeof(Vector8);
+				continue;
+			}
+		}
+#endif
+
 			end_ptr = cur_ptr;
 			if (cur_ptr >= line_end_ptr)
 				break;
-- 
2.50.1



view thread (99+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
  In-Reply-To: <CAN55FZ247JdiT8Sd1SRiyOJxk3Ei=pDCL4kpdP=HqLRjOhKf1Q@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox