Re: Speed up COPY TO text/CSV parsing using SIMD

13+ messages / 3 participants

* Re: Speed up COPY TO text/CSV parsing using SIMD
@ 2026-02-12 21:25  Andres Freund <[email protected]>
  0 siblings, 1 reply; 13+ messages in thread

From: Andres Freund @ 2026-02-12 21:25 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: pgsql-hackers; Nathan Bossart <[email protected]>; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; Mark Wong <[email protected]>; Nazir Bilal Yavuz <[email protected]>

Hi,

On 2026-02-12 22:07:52 +0100, KAZAR Ayoub wrote:
> Currently optimizing COPY FROM using SIMD is still under review, but for
> the case of COPY TO using the same ideas, we found that the problem is
> trivial, the attached patch gives very nice speedups as confirmed by
> Manni's benchmarks.

I have a hard time believing that adding a strlen() to the handling of a short
column won't be a measurable overhead with lots of short attributes.
Particularly because the patch afaict will call it repeatedly if there are any
to-be-escaped characters.

I also don't think it's good how much code this repeats. I think you'd have to
start by moving the existing code into static inline helper functions as a
preparatory step, and then introduce SIMD into those.
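
A minimal sketch of what such a preparatory helper could look like, assuming the TEXT-mode rule described in the patch comments (control characters below 0x20, backslash, and the delimiter need escaping). The name `copy_find_text_special` is hypothetical and is not taken from copyto.c; the point is only that the scan lives in one function, so a vectorized fast path could later be added there rather than at each call site.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from copyto.c): return a pointer to the first
 * byte of a NUL-terminated string that TEXT-mode output must escape, or to
 * the terminating NUL if there is none.
 */
static inline const char *
copy_find_text_special(const char *ptr, char delimc)
{
	char		c;

	while ((c = *ptr) != '\0')
	{
		/* control character, backslash, or delimiter: stop here */
		if ((unsigned char) c < 0x20 || c == '\\' || c == delimc)
			break;
		ptr++;
	}
	return ptr;
}
```

A caller would send the bytes between its start pointer and the returned pointer in one go, then handle the special byte (if any) in the existing scalar code.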

Greetings,

Andres Freund


* Re: Speed up COPY TO text/CSV parsing using SIMD
@ 2026-02-14 15:02  KAZAR Ayoub <[email protected]>
  parent: Andres Freund <[email protected]>
  0 siblings, 1 reply; 13+ messages in thread

From: KAZAR Ayoub @ 2026-02-14 15:02 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: pgsql-hackers; Nathan Bossart <[email protected]>; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; Mark Wong <[email protected]>; Nazir Bilal Yavuz <[email protected]>

Hi,

On Thu, Feb 12, 2026 at 10:25 PM Andres Freund <[email protected]> wrote:

> Hi,
>
> On 2026-02-12 22:07:52 +0100, KAZAR Ayoub wrote:
> > Currently optimizing COPY FROM using SIMD is still under review, but for
> > the case of COPY TO using the same ideas, we found that the problem is
> > trivial, the attached patch gives very nice speedups as confirmed by
> > Manni's benchmarks.
>
> I have a hard time believing that adding a strlen() to the handling of a
> short column won't be a measurable overhead with lots of short attributes.
> Particularly because the patch afaict will call it repeatedly if there are
> any to-be-escaped characters.
>
Thanks for pointing that out. Here's what I did:
1) In the previous patch, strlen() was called twice if a CSV attribute needed
quoting. The attached patch computes the length once at the beginning and
uses it for both SIMD paths, so basically one call.
2) If an attribute needs transcoding, we have to recalculate the string length
because it can change (so at most 2 calls in all cases).
3) To cover the worst case, I benchmarked this against master on tables with
100, 500, and 1000 columns, all integers. With attributes this short, one
would rather process the whole thing in a single pass than first compute
its length:
1000 columns:
TEXT: 17% regression
CSV: 3.4% regression

500 columns:
TEXT: 17.7% regression
CSV: 3.1% regression

100 columns:
TEXT: 17.3% regression
CSV: 3% regression

The results are a bit unstable, but the overhead for worst cases like this is
really significant. I can't say whether this is worth it or not, so
thoughts on this?
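
For readers following along, here is a rough, portable sketch of the chunked scan the patch performs, and of why short values cannot benefit from it. It is not the patch code: the per-byte inner loop stands in for the vector compares, `CHUNK` is an assumed vector width of 16 bytes, and `skip_text_chunks` and `lowest_set_bit` are made-up names.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK 16				/* assumed vector width in bytes */

/* Position of the lowest set bit; mask must be nonzero. */
static int
lowest_set_bit(uint32_t mask)
{
	int			pos = 0;

	while ((mask & 1) == 0)
	{
		mask >>= 1;
		pos++;
	}
	return pos;
}

/*
 * Skip whole chunks that contain no TEXT-mode special byte (control
 * character, backslash, or delimiter).  Returns a pointer to the first
 * special byte found in a full chunk, or to the start of the unscanned
 * tail (fewer than CHUNK bytes), which the scalar loop must finish.
 */
static const char *
skip_text_chunks(const char *p, size_t len, char delimc)
{
	const char *end = p + len;

	while ((size_t) (end - p) >= CHUNK)
	{
		uint32_t	mask = 0;

		/* Stand-in for the vector compares: one mask bit per byte. */
		for (int i = 0; i < CHUNK; i++)
		{
			unsigned char c = (unsigned char) p[i];

			if (c < 0x20 || c == '\\' || c == (unsigned char) delimc)
				mask |= (uint32_t) 1 << i;
		}

		if (mask != 0)
			return p + lowest_set_bit(mask);

		p += CHUNK;
	}
	return p;
}
```

For a value shorter than `CHUNK` bytes, such as the text form of a small integer, the loop body never runs and the pointer comes back unchanged, so the length computation is pure overhead in that case.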

> I also don't think it's good how much code this repeats. I think you'd have
> to start with preparatory moving the existing code into static inline helper
> functions and then introduce SIMD into those.
>
Done, though I'm not too sure whether this is the right place to put it; let
me know.


Regards,
Ayoub


Attachments:

  [text/x-patch] v2-0001-Speed-up-COPY-TO-text-CSV-using-SIMD.patch (7.0K, 3-v2-0001-Speed-up-COPY-TO-text-CSV-using-SIMD.patch)
From 2060f79bd3ee29809788a466700a9dc49459ca8e Mon Sep 17 00:00:00 2001
From: AyoubKAZ <[email protected]>
Date: Sat, 14 Feb 2026 15:12:42 +0100
Subject: [PATCH] Speed up COPY TO text/CSV using SIMD

Use SIMD to scan for special characters in COPY TO, processing 16+ bytes
at a time instead of byte-by-byte. This speeds up export of fields that
contain few characters requiring escaping, so the code falls back to scalar path when we find a special char.
---
 src/backend/commands/copyto.c | 159 +++++++++++++++++++++++++++++++++-
 1 file changed, 157 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 9ceeff6d99e..49198137531 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -31,6 +31,8 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "port/pg_bitutils.h"
+#include "port/simd.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
 #include "utils/lsyscache.h"
@@ -121,6 +123,142 @@ static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
 								bool use_quote);
 static void CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel,
 						   uint64 *processed);
+static pg_attribute_always_inline void CopySkipTextSIMD(const char **ptr, 
+							 size_t len, char delimc);
+
+static pg_attribute_always_inline void CopyCheckCSVQuoteNeedSIMD(const char **ptr,
+							 size_t len, char delimc, char quotec);
+
+static pg_attribute_always_inline void CopySkipCSVEscapeSIMD(const char **ptr,
+							 size_t len, char escapec, char quotec);
+
+/*
+ * CopySkipTextSIMD - Skip forward past safe characters in TEXT mode using SIMD
+ *
+ * Advances ptr as far as possible, stopping at first special character.
+ */
+static pg_attribute_always_inline void
+CopySkipTextSIMD(const char **ptr, size_t len, char delimc)
+{
+#ifndef USE_NO_SIMD
+	const char *p = *ptr;
+
+	const char *end = p + len;
+
+	while (p + sizeof(Vector8) <= end)
+	{
+		Vector8 chunk;
+		Vector8 control_mask;
+		Vector8 backslash_mask;
+		Vector8 delim_mask;
+		Vector8 special_mask;
+		uint32 mask;
+
+		vector8_load(&chunk, (const uint8 *) p);
+		control_mask = vector8_gt(vector8_broadcast(0x20), chunk);
+		backslash_mask = vector8_eq(vector8_broadcast('\\'), chunk);
+		delim_mask = vector8_eq(vector8_broadcast(delimc), chunk);
+
+		special_mask = vector8_or(control_mask,
+								  vector8_or(backslash_mask, delim_mask));
+
+		mask = vector8_highbit_mask(special_mask);
+		if (mask != 0)
+		{
+			*ptr = p + pg_rightmost_one_pos32(mask);
+			return;
+		}
+
+		p += sizeof(Vector8);
+	}
+
+	*ptr = p;
+#endif
+}
+
+/*
+ * CopyCheckCSVQuoteNeedSIMD - Check if CSV field needs quoting using SIMD
+ *
+ * Advances ptr as far as possible, stopping at first special character.
+ */
+static pg_attribute_always_inline void
+CopyCheckCSVQuoteNeedSIMD(const char **ptr, size_t len, char delimc, char quotec)
+{
+#ifndef USE_NO_SIMD
+	const char *p = *ptr;
+	const char *end = p + len;
+
+	Vector8 delim_mask = vector8_broadcast(delimc);
+	Vector8 quote_mask = vector8_broadcast(quotec);
+	Vector8 newline_mask = vector8_broadcast('\n');
+	Vector8 carriage_return_mask = vector8_broadcast('\r');
+
+	while (p + sizeof(Vector8) <= end)
+	{
+		Vector8 chunk;
+		Vector8 special_mask;
+		uint32 mask;
+
+		vector8_load(&chunk, (const uint8 *) p);
+		special_mask = vector8_or(
+			vector8_or(vector8_eq(chunk, delim_mask),
+					   vector8_eq(chunk, quote_mask)),
+			vector8_or(vector8_eq(chunk, newline_mask),
+					   vector8_eq(chunk, carriage_return_mask))
+		);
+
+		mask = vector8_highbit_mask(special_mask);
+		if (mask != 0)
+		{
+			*ptr = p + pg_rightmost_one_pos32(mask);
+			return;
+		}
+
+		p += sizeof(Vector8);
+	}
+
+	*ptr = p;
+#endif
+}
+
+/*
+ * CopySkipCSVEscapeSIMD - Skip forward past safe characters in CSV mode using SIMD
+ *
+ * Advances ptr as far as possible, stopping at first quote or escape character.
+ */
+static pg_attribute_always_inline void
+CopySkipCSVEscapeSIMD(const char **ptr, size_t len, char escapec, char quotec)
+{
+#ifndef USE_NO_SIMD
+	const char *p = *ptr;
+	const char *end = p + len;
+
+	Vector8 escape_mask = vector8_broadcast(escapec);
+	Vector8 quote_mask = vector8_broadcast(quotec);
+
+	while (p + sizeof(Vector8) <= end)
+	{
+		Vector8 chunk;
+		Vector8 special_mask;
+		uint32 mask;
+
+		vector8_load(&chunk, (const uint8 *) p);
+		special_mask = vector8_or(vector8_eq(chunk, escape_mask),
+								  vector8_eq(chunk, quote_mask));
+
+		mask = vector8_highbit_mask(special_mask);
+		if (mask != 0)
+		{
+			*ptr = p + pg_rightmost_one_pos32(mask);
+			return;
+		}
+
+		p += sizeof(Vector8);
+	}
+
+	*ptr = p;
+#endif
+}
 
 /* built-in format-specific routines */
 static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
@@ -1245,9 +1383,14 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	const char *start;
 	char		c;
 	char		delimc = cstate->opts.delim[0];
+	size_t len = strlen(string);
 
 	if (cstate->need_transcoding)
-		ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+	{
+		ptr = pg_server_to_any(string, len, cstate->file_encoding);
+		/* We have to recalculate the length after transcoding, because it can change the string length */
+		len = strlen(ptr);
+	}
 	else
 		ptr = string;
 
@@ -1268,6 +1411,8 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	if (cstate->encoding_embeds_ascii)
 	{
 		start = ptr;
+		CopySkipTextSIMD(&ptr, len, delimc);
+
 		while ((c = *ptr) != '\0')
 		{
 			if ((unsigned char) c < (unsigned char) 0x20)
@@ -1328,6 +1473,8 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	else
 	{
 		start = ptr;
+		CopySkipTextSIMD(&ptr, len, delimc);
+
 		while ((c = *ptr) != '\0')
 		{
 			if ((unsigned char) c < (unsigned char) 0x20)
@@ -1402,13 +1549,18 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 	char		quotec = cstate->opts.quote[0];
 	char		escapec = cstate->opts.escape[0];
 	bool		single_attr = (list_length(cstate->attnumlist) == 1);
+	size_t 	len = strlen(string);
 
 	/* force quoting if it matches null_print (before conversion!) */
 	if (!use_quote && strcmp(string, cstate->opts.null_print) == 0)
 		use_quote = true;
 
 	if (cstate->need_transcoding)
-		ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+	{
+		ptr = pg_server_to_any(string, len, cstate->file_encoding);
+		/* We have to recalculate the length after transcoding, because it can change the string length */
+		len = strlen(ptr);
+	}
 	else
 		ptr = string;
 
@@ -1429,6 +1581,7 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 		else
 		{
 			const char *tptr = ptr;
+			CopyCheckCSVQuoteNeedSIMD(&tptr, len, delimc, quotec);
 
 			while ((c = *tptr) != '\0')
 			{
@@ -1453,6 +1606,8 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 		 * We adopt the same optimization strategy as in CopyAttributeOutText
 		 */
 		start = ptr;
+		CopySkipCSVEscapeSIMD(&ptr, len, escapec, quotec);
+
 		while ((c = *ptr) != '\0')
 		{
 			if (c == quotec || c == escapec)
-- 
2.34.1




* Re: Speed up COPY TO text/CSV parsing using SIMD
@ 2026-03-10 19:16  Nathan Bossart <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  0 siblings, 1 reply; 13+ messages in thread

From: Nathan Bossart @ 2026-03-10 19:16 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; Mark Wong <[email protected]>; Nazir Bilal Yavuz <[email protected]>

On Sat, Feb 14, 2026 at 04:02:21PM +0100, KAZAR Ayoub wrote:
> On Thu, Feb 12, 2026 at 10:25 PM Andres Freund <[email protected]> wrote:
>> I have a hard time believing that adding a strlen() to the handling of a
>> short column won't be a measurable overhead with lots of short attributes.
>> Particularly because the patch afaict will call it repeatedly if there are
>> any to-be-escaped characters.
> 
> [...]
> 
> 1000 columns:
> TEXT: 17% regression
> CSV: 3.4% regression
> 
> 500 columns:
> TEXT: 17.7% regression
> CSV: 3.1% regression
> 
> 100 columns:
> TEXT: 17.3% regression
> CSV: 3% regression
> 
> A bit unstable results, but yeah the overhead for worse cases like this is
> really significant, I can't argue whether this is worth it or not, so
> thoughts on this ?

I seriously doubt we'd commit something that produces a 17% regression
here.  Perhaps we should skip the SIMD paths whenever transcoding is
required.

-- 
nathan






* Re: Speed up COPY TO text/CSV parsing using SIMD
@ 2026-03-14 22:43  KAZAR Ayoub <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 13+ messages in thread

From: KAZAR Ayoub @ 2026-03-14 22:43 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; Mark Wong <[email protected]>; Nazir Bilal Yavuz <[email protected]>

Hello,
On Tue, Mar 10, 2026 at 8:17 PM Nathan Bossart <[email protected]> wrote:

> On Sat, Feb 14, 2026 at 04:02:21PM +0100, KAZAR Ayoub wrote:
> > On Thu, Feb 12, 2026 at 10:25 PM Andres Freund <[email protected]> wrote:
> >> I have a hard time believing that adding a strlen() to the handling of a
> >> short column won't be a measurable overhead with lots of short attributes.
> >> Particularly because the patch afaict will call it repeatedly if there are
> >> any to-be-escaped characters.
> >
> > [...]
> >
> > 1000 columns:
> > TEXT: 17% regression
> > CSV: 3.4% regression
> >
> > 500 columns:
> > TEXT: 17.7% regression
> > CSV: 3.1% regression
> >
> > 100 columns:
> > TEXT: 17.3% regression
> > CSV: 3% regression
> >
> > A bit unstable results, but yeah the overhead for worse cases like this is
> > really significant, I can't argue whether this is worth it or not, so
> > thoughts on this ?
>
> I seriously doubt we'd commit something that produces a 17% regression
> here.  Perhaps we should skip the SIMD paths whenever transcoding is
> required.
>
> --
> nathan
>
I've spent some time rethinking this, and here's what I've done in v3:
SIMD is only used for varlena attributes whose text representation is
longer than a single SIMD vector, and only when no transcoding is required.

Fixed-size types such as integers mostly produce short ASCII output
for which SIMD provides no benefit.

For eligible attributes, the stored varlena size is used as a cheap
pre-filter to avoid an unnecessary strlen() call on short values.
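
The decision described above can be sketched in isolation as follows. This is a simplified stand-in, not the patch itself: `want_simd` is a hypothetical name, `stored_size` plays the role of `VARSIZE_ANY_EXHDR()`, and `VECTOR_BYTES` assumes a 16-byte `Vector8`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define VECTOR_BYTES 16			/* assumed sizeof(Vector8) */

/*
 * Hypothetical stand-in for the v3 decision.  attlen == -1 marks a varlena
 * attribute.  strlen() runs only when the cheap size check passes; *len is
 * left at 0 otherwise.
 */
static bool
want_simd(int attlen, size_t stored_size, bool need_transcoding,
		  const char *string, size_t *len)
{
	*len = 0;

	/* fixed-size type or small varlena: skip strlen() entirely */
	if (attlen != -1 || stored_size <= VECTOR_BYTES)
		return false;

	/* the SIMD helpers need the text length, not the binary size */
	*len = strlen(string);
	return !need_transcoding && *len > VECTOR_BYTES;
}
```

Note that a long varlena in a transcoding session still pays for strlen() here, but the result is handed on so the transcoding call does not have to compute it again.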

Here are the benchmark results after many runs compared to master
(4deecb52aff):
TEXT clean: -34.0%
CSV clean: -39.3%
TEXT 1/3: +4.7%
CSV 1/3: -2.3%
The numbers above vary by 1% to 3% in either direction (improvement or
regression) across 20+ runs.

WIDE tables short attributes TEXT:
50 columns: -3.7%
100 columns: -1.7%
200 columns: +1.8%
500 columns: -0.5%
1000 columns: -0.3%

WIDE tables short attributes CSV:
50 columns: -2.5%
100 columns: +1.8%
200 columns: +1.4%
500 columns: -0.9%
1000 columns: -1.1%

The wide-table benchmarks were all similar noise: across 20+ runs the results
always stay between about -2% and +4% for every column count.

One small concern: some varlenas have a larger binary size than their text
representation, e.g.:
SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
 pg_column_size
----------------
             32

Its text representation is shorter than sizeof(Vector8), so v3 currently
enters the SIMD path and exits right at the beginning (two extra branches),
because it does this:
+ if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
+ VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))

I thought maybe we could apply a factor of 2 or 4 to the binary size in this
check. The right factor really depends on the type, and this is only a
suggestion in case this scenario is a concern.
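
One possible reading of that suggestion, sketched under assumptions: the stored size must exceed `factor` vectors before strlen() is attempted. `passes_size_prefilter` is a hypothetical name, `VECTOR_BYTES` assumes a 16-byte `Vector8`, and the factor values are only the ones floated above, not measured recommendations.

```c
#include <stdbool.h>
#include <stddef.h>

#define VECTOR_BYTES 16			/* assumed sizeof(Vector8) */

/*
 * Hypothetical variant of the pre-filter.  factor = 1 corresponds to the
 * v3 check; larger factors reject more values whose binary form is bigger
 * than their text form, at the price of skipping SIMD for some strings
 * that would have qualified.
 */
static bool
passes_size_prefilter(size_t stored_size, unsigned factor)
{
	return stored_size > (size_t) factor * VECTOR_BYTES;
}
```

With a factor of 2, a 28-byte payload no longer passes, while a plain text value of 100 bytes still does at any of the three factors.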

Thoughts?


Regards,
Ayoub


Attachments:

  [text/x-patch] v3-0001-Speed-up-COPY-TO-FORMAT-text-csv-using-SIMD.patch (12.6K, 3-v3-0001-Speed-up-COPY-TO-FORMAT-text-csv-using-SIMD.patch)
From a22258dfe42d9804cd6cc41c7a15151c4d30c8b9 Mon Sep 17 00:00:00 2001
From: AyoubKAZ <[email protected]>
Date: Sat, 14 Mar 2026 22:52:22 +0100
Subject: [PATCH] Speed up COPY TO (FORMAT {text,csv}) using SIMD. Presently,
 such commands scan each attribute's string representation one byte at a time
 looking for special characters.  This commit adds a new path that uses SIMD
 instructions to skip over chunks of data without any special characters. 
 This can be much faster.

SIMD processing is only used for varlena attributes whose text
representation is longer than a single SIMD vector, and only when
no encoding conversion is required.  Fixed-size types such as
integers and booleans always produce short ASCII output for which
SIMD provides no benefit, and when transcoding is needed the string
length may change after conversion.  For eligible attributes, the
stored varlena size is used as a cheap pre-filter to avoid an
unnecessary strlen() call on short values, this version also avoids
calling strlen twice when transcoding is necessary.

For TEXT mode, the SIMD path scans for ASCII control characters,
backslash, and the delimiter.  For CSV mode, two SIMD helpers are
used: one to determine whether a field requires quoting by scanning
for the delimiter, quote character, and end-of-line characters, and
one to scan for characters requiring escaping during the output pass.
In both modes, the scalar path handles any remaining characters after
the SIMD pre-pass.
---
 src/backend/commands/copyto.c | 254 +++++++++++++++++++++++++++++++---
 1 file changed, 236 insertions(+), 18 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d6ef7275a64..fde19f9a6a4 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -31,6 +31,8 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "port/pg_bitutils.h"
+#include "port/simd.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
 #include "utils/lsyscache.h"
@@ -117,11 +119,147 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 static void EndCopy(CopyToState cstate);
 static void ClosePipeToProgram(CopyToState cstate);
 static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
-static void CopyAttributeOutText(CopyToState cstate, const char *string);
-static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
-								bool use_quote);
+static pg_attribute_always_inline void CopyAttributeOutText(CopyToState cstate, const char *string,
+															bool use_simd, size_t len);
+static pg_attribute_always_inline void CopyAttributeOutCSV(CopyToState cstate, const char *string,
+														   bool use_quote, bool use_simd, size_t len);
 static void CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel,
 						   uint64 *processed);
+static void CopySkipTextSIMD(const char **ptr,
+							 size_t len, char delimc);
+static void CopyCheckCSVQuoteNeedSIMD(const char **ptr,
+									  size_t len, char delimc, char quotec);
+static void CopySkipCSVEscapeSIMD(const char **ptr,
+								  size_t len, char escapec, char quotec);
+
+/*
+ * CopySkipTextSIMD - Scan forward in TEXT mode using SIMD,
+ * stopping at the first special character then caller continues processing any remaining
+ * characters in the scalar path.
+ *
+ * Special characters for TEXT mode are: ASCII control characters (< 0x20),
+ * backslash, and the delimiter.
+ */
+static void
+CopySkipTextSIMD(const char **ptr, size_t len, char delimc)
+{
+#ifndef USE_NO_SIMD
+	const char *p = *ptr;
+	const char *end = p + len;
+
+	const Vector8 backslash_mask = vector8_broadcast('\\');
+	const Vector8 delim_mask = vector8_broadcast(delimc);
+	const Vector8 control_mask = vector8_broadcast(0x20);
+
+	while (p + sizeof(Vector8) <= end)
+	{
+		Vector8		chunk;
+		Vector8		match;
+
+		vector8_load(&chunk, (const uint8 *) p);
+
+		match = vector8_or(vector8_gt(control_mask, chunk),
+						   vector8_eq(chunk, backslash_mask));
+		match = vector8_or(match, vector8_eq(chunk, delim_mask));
+
+		if (vector8_is_highbit_set(match))
+		{
+			uint32		mask;
+
+			mask = vector8_highbit_mask(match);
+			*ptr = p + pg_rightmost_one_pos32(mask);
+			return;
+		}
+
+		p += sizeof(Vector8);
+	}
+
+	*ptr = p;
+#endif
+}
+
+/*
+ * CopyCheckCSVQuoteNeedSIMD - Scan a CSV field using SIMD to determine
+ * whether it needs quoting stopping at the first character that would require the field to be quoted:
+ * the delimiter, the quote character, newline, or carriage return.
+ */
+static void
+CopyCheckCSVQuoteNeedSIMD(const char **ptr, size_t len, char delimc, char quotec)
+{
+#ifndef USE_NO_SIMD
+	const char *p = *ptr;
+	const char *end = p + len;
+
+	const Vector8 delim_mask = vector8_broadcast(delimc);
+	const Vector8 quote_mask = vector8_broadcast(quotec);
+	const Vector8 nl_mask = vector8_broadcast('\n');
+	const Vector8 cr_mask = vector8_broadcast('\r');
+
+	while (p + sizeof(Vector8) <= end)
+	{
+		Vector8		chunk;
+		Vector8		match;
+
+		vector8_load(&chunk, (const uint8 *) p);
+
+		match = vector8_or(vector8_eq(chunk, nl_mask), vector8_eq(chunk, cr_mask));
+		match = vector8_or(match, vector8_or(vector8_eq(chunk, delim_mask),
+											 vector8_eq(chunk, quote_mask)));
+
+		if (vector8_is_highbit_set(match))
+		{
+			uint32		mask;
+
+			mask = vector8_highbit_mask(match);
+			*ptr = p + pg_rightmost_one_pos32(mask);
+			return;
+		}
+
+		p += sizeof(Vector8);
+	}
+
+	*ptr = p;
+#endif
+}
+
+/*
+ * CopySkipCSVEscapeSIMD - Same as CopyCheckCSVQuoteNeedSIMD, scan forward in CSV mode using SIMD,
+ * stopping at the first character that requires escaping.
+ */
+static void
+CopySkipCSVEscapeSIMD(const char **ptr, size_t len, char escapec, char quotec)
+{
+#ifndef USE_NO_SIMD
+	const char *p = *ptr;
+	const char *end = p + len;
+
+	const Vector8 escape_mask = vector8_broadcast(escapec);
+	const Vector8 quote_mask = vector8_broadcast(quotec);
+
+	while (p + sizeof(Vector8) <= end)
+	{
+		Vector8		chunk;
+		Vector8		match;
+
+		vector8_load(&chunk, (const uint8 *) p);
+
+		match = vector8_or(vector8_eq(chunk, quote_mask), vector8_eq(chunk, escape_mask));
+
+		if (vector8_is_highbit_set(match))
+		{
+			uint32		mask;
+
+			mask = vector8_highbit_mask(match);
+			*ptr = p + pg_rightmost_one_pos32(mask);
+			return;
+		}
+
+		p += sizeof(Vector8);
+	}
+
+	*ptr = p;
+#endif
+}
 
 /* built-in format-specific routines */
 static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
@@ -222,9 +360,9 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 			colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
 			if (cstate->opts.csv_mode)
-				CopyAttributeOutCSV(cstate, colname, false);
+				CopyAttributeOutCSV(cstate, colname, false, false, 0);
 			else
-				CopyAttributeOutText(cstate, colname);
+				CopyAttributeOutText(cstate, colname, false, 0);
 		}
 
 		CopySendTextLikeEndOfRow(cstate);
@@ -273,6 +411,7 @@ CopyToTextLikeOneRow(CopyToState cstate,
 {
 	bool		need_delim = false;
 	FmgrInfo   *out_functions = cstate->out_functions;
+	TupleDesc	tup_desc = slot->tts_tupleDescriptor;
 
 	foreach_int(attnum, cstate->attnumlist)
 	{
@@ -290,15 +429,48 @@ CopyToTextLikeOneRow(CopyToState cstate,
 		else
 		{
 			char	   *string;
+			bool		use_simd = false;
+			size_t		len = 0;
+
+			string = OutputFunctionCall(&out_functions[attnum - 1], value);
 
-			string = OutputFunctionCall(&out_functions[attnum - 1],
-										value);
+			/*
+			* Only use SIMD for varlena types without transcoding.  Fixed-size
+			* types (int4, bool, date, etc.) always produce short ASCII output
+			* for which SIMD provides no benefit.  When transcoding is needed,
+			* the string length may change after conversion, so we skip SIMD
+			* entirely in that case too.
+			*
+			* We use VARSIZE_ANY_EXHDR as a cheap pre-filter to avoid calling
+			* strlen() on short varlenas.  The actual length passed to the SIMD
+			* helpers is always strlen(string) so the text output length not
+			* the binary storage size.
+			*/
+			if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
+				VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))
+			{
+				len = strlen(string);
+				use_simd = !cstate->need_transcoding && (len > sizeof(Vector8));
+			}
 
 			if (is_csv)
-				CopyAttributeOutCSV(cstate, string,
-									cstate->opts.force_quote_flags[attnum - 1]);
+			{
+				if (use_simd)
+					CopyAttributeOutCSV(cstate, string,
+										cstate->opts.force_quote_flags[attnum - 1],
+										true, len);
+				else
+					CopyAttributeOutCSV(cstate, string,
+										cstate->opts.force_quote_flags[attnum - 1],
+										false, len);
+			}
 			else
-				CopyAttributeOutText(cstate, string);
+			{
+				if (use_simd)
+					CopyAttributeOutText(cstate, string, true, len);
+				else
+					CopyAttributeOutText(cstate, string, false, len);
+			}
 		}
 	}
 
@@ -1239,8 +1411,24 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			CopySendData(cstate, start, ptr - start); \
 	} while (0)
 
-static void
-CopyAttributeOutText(CopyToState cstate, const char *string)
+/*
+ * CopyAttributeOutText - Send text representation of one attribute,
+ * with conversion and escaping.
+ *
+ * For a little extra speed, if use_simd is true we first use SIMD
+ * instructions to skip over chunks of data that contain no special
+ * characters.  This pre-pass advances ptr as far as possible before
+ * handing off to the scalar loop below, which then processes any
+ * remaining characters.  use_simd is only set by the caller when the
+ * attribute is a varlena type whose text representation is longer than
+ * a single SIMD vector and no encoding conversion is required.  In all
+ * other cases we fall straight through to the scalar path.
+ *
+ * When use_simd is true, len must be the strlen() of string, otherwise it is unused
+ */
+static pg_attribute_always_inline void
+CopyAttributeOutText(CopyToState cstate, const char *string,
+					 bool use_simd, size_t len)
 {
 	const char *ptr;
 	const char *start;
@@ -1248,7 +1436,15 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	char		delimc = cstate->opts.delim[0];
 
 	if (cstate->need_transcoding)
-		ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+	{
+		/*
+		 * len may already be set by the caller for long varlenas, avoiding an extra
+		 * strlen() call.  For all other cases it is 0 and we compute it here.
+		 */
+		if (len == 0)
+			len = strlen(string);
+		ptr = pg_server_to_any(string, len, cstate->file_encoding);
+	}
 	else
 		ptr = string;
 
@@ -1269,6 +1465,9 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	if (cstate->encoding_embeds_ascii)
 	{
 		start = ptr;
+		if (use_simd)
+			CopySkipTextSIMD(&ptr, len, delimc);
+
 		while ((c = *ptr) != '\0')
 		{
 			if ((unsigned char) c < (unsigned char) 0x20)
@@ -1329,6 +1528,9 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	else
 	{
 		start = ptr;
+		if (use_simd)
+			CopySkipTextSIMD(&ptr, len, delimc);
+
 		while ((c = *ptr) != '\0')
 		{
 			if ((unsigned char) c < (unsigned char) 0x20)
@@ -1389,12 +1591,14 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 }
 
 /*
- * Send text representation of one attribute, with conversion and
- * CSV-style escaping
+ * CopyAttributeOutCSV - Send text representation of one attribute,
+ * with conversion and CSV-style escaping.
+ *
+ * We use the same simd optimization idea, see CopyAttributeOutText comment.
  */
-static void
+static pg_attribute_always_inline void
 CopyAttributeOutCSV(CopyToState cstate, const char *string,
-					bool use_quote)
+					bool use_quote, bool use_simd, size_t len)
 {
 	const char *ptr;
 	const char *start;
@@ -1409,7 +1613,15 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 		use_quote = true;
 
 	if (cstate->need_transcoding)
-		ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+	{
+		/*
+		 * len may already be set by the caller for long varlenas, avoiding an extra
+		 * strlen() call.  For all other cases it is 0 and we compute it here.
+		 */
+		if (len == 0)
+			len = strlen(string);
+		ptr = pg_server_to_any(string, len, cstate->file_encoding);
+	}
 	else
 		ptr = string;
 
@@ -1431,6 +1643,9 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 		{
 			const char *tptr = ptr;
 
+			if (use_simd)
+				CopyCheckCSVQuoteNeedSIMD(&tptr, len, delimc, quotec);
+
 			while ((c = *tptr) != '\0')
 			{
 				if (c == delimc || c == quotec || c == '\n' || c == '\r')
@@ -1454,6 +1669,9 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 		 * We adopt the same optimization strategy as in CopyAttributeOutText
 		 */
 		start = ptr;
+		if (use_simd)
+			CopySkipCSVEscapeSIMD(&ptr, len, escapec, quotec);
+
 		while ((c = *ptr) != '\0')
 		{
 			if (c == quotec || c == escapec)
-- 
2.34.1




* Re: Speed up COPY TO text/CSV parsing using SIMD
@ 2026-03-17 18:49  Nathan Bossart <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  0 siblings, 1 reply; 13+ messages in thread

From: Nathan Bossart @ 2026-03-17 18:49 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; Mark Wong <[email protected]>; Nazir Bilal Yavuz <[email protected]>

On Sat, Mar 14, 2026 at 11:43:38PM +0100, KAZAR Ayoub wrote:
> Just a small concern about where some varlenas have a larger binary size
> than its text representation ex:
> SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
>  pg_column_size
> ----------------
>              32
> 
> its text representation is less than sizeof(Vector8) so currently v3 would
> enter SIMD path and exit out just from the beginning (two extra branches)
> because it does this:
> + if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
> + VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))
> 
> I thought maybe we could do * 2 or * 4 its binary size, depends on the type
> really but this is just a proposition if this case is something concerning.

Can we measure the impact of this?  How likely is this case?

> +static pg_attribute_always_inline void CopyAttributeOutText(CopyToState cstate, const char *string,
> +															bool use_simd, size_t len);
> +static pg_attribute_always_inline void CopyAttributeOutCSV(CopyToState cstate, const char *string,
> +														   bool use_quote, bool use_simd, size_t len);

Can you test this on its own, too?  We might be able to separate this and
the change below into a prerequisite patch, assuming they show benefits.

>  			if (is_csv)
> -				CopyAttributeOutCSV(cstate, string,
> -									cstate->opts.force_quote_flags[attnum - 1]);
> +			{
> +				if (use_simd)
> +					CopyAttributeOutCSV(cstate, string,
> +										cstate->opts.force_quote_flags[attnum - 1],
> +										true, len);
> +				else
> +					CopyAttributeOutCSV(cstate, string,
> +										cstate->opts.force_quote_flags[attnum - 1],
> +										false, len);
> +			}
>  			else
> -				CopyAttributeOutText(cstate, string);
> +			{
> +				if (use_simd)
> +					CopyAttributeOutText(cstate, string, true, len);
> +				else
> +					CopyAttributeOutText(cstate, string, false, len);
> +			}

There isn't a terrible amount of branching on use_simd in these functions,
so I'm a little skeptical this makes much difference.  As above, it would
be good to measure it.

-- 
nathan





^ permalink  raw  reply  [nested|flat] 13+ messages in thread

* Re: Speed up COPY TO text/CSV parsing using SIMD
@ 2026-03-17 23:02  KAZAR Ayoub <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 2 replies; 13+ messages in thread

From: KAZAR Ayoub @ 2026-03-17 23:02 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; Mark Wong <[email protected]>; Nazir Bilal Yavuz <[email protected]>

On Tue, Mar 17, 2026 at 7:49 PM Nathan Bossart <[email protected]>
wrote:

> On Sat, Mar 14, 2026 at 11:43:38PM +0100, KAZAR Ayoub wrote:
> > Just a small concern about where some varlenas have a larger binary size
> > than its text representation ex:
> > SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
> >  pg_column_size
> > ----------------
> >              32
> >
> > its text representation is less than sizeof(Vector8) so currently v3 would
> > enter SIMD path and exit out just from the beginning (two extra branches)
> > because it does this:
> > + if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
> > + VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))
> >
> > I thought maybe we could do * 2 or * 4 its binary size, depends on the type
> > really but this is just a proposition if this case is something concerning.
>
> Can we measure the impact of this?  How likely is this case?
>
I'll respond to this separately in a different email.

>
> > +static pg_attribute_always_inline void CopyAttributeOutText(CopyToState cstate, const char *string,
> > +															bool use_simd, size_t len);
> > +static pg_attribute_always_inline void CopyAttributeOutCSV(CopyToState cstate, const char *string,
> > +														   bool use_quote, bool use_simd, size_t len);
>
> Can you test this on its own, too?  We might be able to separate this and
> the change below into a prerequisite patch, assuming they show benefits.
>
I tested the inlining on its own and saw an improvement of roughly 1% to 4%
across all configurations.
The inlining is only meaningful in combination with the SIMD work, for the
reason described below.

>
> >  			if (is_csv)
> > -				CopyAttributeOutCSV(cstate, string,
> > -									cstate->opts.force_quote_flags[attnum - 1]);
> > +			{
> > +				if (use_simd)
> > +					CopyAttributeOutCSV(cstate, string,
> > +										cstate->opts.force_quote_flags[attnum - 1],
> > +										true, len);
> > +				else
> > +					CopyAttributeOutCSV(cstate, string,
> > +										cstate->opts.force_quote_flags[attnum - 1],
> > +										false, len);
>
> There isn't a terrible amount of branching on use_simd in these functions,
> so I'm a little skeptical this makes much difference.  As above, it would
> be good to measure it

I compiled three variants:

v3: use_simd passed as a compile-time constant, CopyAttribute functions inlined.
v3_variable: use_simd passed as a runtime variable, CopyAttribute functions inlined.
v3_variable_noinline: use_simd passed as a runtime variable, CopyAttribute
functions not inlined.

None of the SIMD helpers are explicitly marked inline by us.

The assembly reveals two things:
1) The CSV SIMD helpers (CopyCheckCSVQuoteNeedSIMD, CopySkipCSVEscapeSIMD)
are inlined by the compiler on its own in all three variants, while
CopySkipTextSIMD is never inlined in any variant.

2) Passing use_simd as a constant (v3) does matter, though apparently only a
little, and specifically for CopySkipTextSIMD.
It's the same story as the first commit of the COPY FROM patch: the compiler
emits code without the use_simd branch:
     jbe  ...   ; len > sizeof(Vector8)
     je   ...   ; need_transcoding
     call CopySkipTextSIMD

Whether the extra call-site branching needed to pass a constant is worth it
is what the benchmark is meant to show.
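
As an illustration of the constant-passing idea, here is a minimal sketch outside of PostgreSQL. All names (ALWAYS_INLINE, skip_clean_prefix, count_delims, count_delims_dispatch) are hypothetical, and the "fast path" is a plain scalar loop standing in for a SIMD helper. The point is only the shape: the caller branches once and passes a literal, so that after inlining each copy of the callee can have its "if (use_fast)" test folded away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Forced-inline hint; an assumed spelling, not PostgreSQL's own macro. */
#if defined(__GNUC__) || defined(__clang__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE inline
#endif

/* Hypothetical fast pre-pass: skips leading bytes that are not ','. */
static size_t
skip_clean_prefix(const char *s, size_t len)
{
    size_t      i = 0;

    while (i < len && s[i] != ',')
        i++;
    return i;
}

/*
 * Counts ',' in s.  When use_fast is a literal at the call site and the
 * function is inlined, the compiler can drop the "if (use_fast)" test.
 */
static ALWAYS_INLINE size_t
count_delims(const char *s, size_t len, bool use_fast)
{
    size_t      i = 0;
    size_t      n = 0;

    assert(s != NULL);
    if (use_fast)
        i = skip_clean_prefix(s, len);
    for (; i < len; i++)
        n += (s[i] == ',');
    return n;
}

/* v3 style: branch once at the call site and pass literals down. */
static size_t
count_delims_dispatch(const char *s, size_t len, bool use_fast)
{
    if (use_fast)
        return count_delims(s, len, true);
    else
        return count_delims(s, len, false);
}
```

Both branches return the same result; only the generated code differs, which is what the v3 versus v3_var columns try to measure.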


  Test                   Master    v3       v3_var   v3_var_noinl
  TEXT clean             1504ms   -24.1%   -23.0%   -21.5%
  CSV clean              1760ms   -34.9%   -32.7%   -33.0%
  TEXT 1/3 backslashes   3763ms    +4.6%    +6.9%    +4.1%
  CSV 1/3 quotes         3885ms    +3.1%    +2.7%    -0.8%

Wide table TEXT (integer columns):

  Cols    Master    v3       v3_var   v3_var_noinl
  50      2083ms   -0.7%    -0.6%    +3.5%
  100     4094ms   -0.1%    -0.5%    +4.5%
  200     1560ms   +0.6%    -2.3%    +3.2%
  500     1905ms   -1.0%    -1.3%    +4.7%
  1000    1455ms   +1.8%    +0.4%    +4.3%

Wide table CSV:

  Cols    Master    v3       v3_var   v3_var_noinl
  50      2421ms   +4.0%    +6.7%    +5.8%
  100     4980ms   +0.1%    +2.0%    +0.1%
  200     1901ms   +1.4%    +3.5%    +1.4%
  500     2328ms   +1.8%    +2.7%    +2.2%
  1000    1815ms   +2.0%    +2.8%    +2.5%

I'm not sure there is a practical difference between v3 and v3_var.
What do you think?


Regards,
Ayoub


^ permalink  raw  reply  [nested|flat] 13+ messages in thread

* Re: Speed up COPY TO text/CSV parsing using SIMD
@ 2026-03-18 02:29  KAZAR Ayoub <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  1 sibling, 2 replies; 13+ messages in thread

From: KAZAR Ayoub @ 2026-03-18 02:29 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; Mark Wong <[email protected]>; Nazir Bilal Yavuz <[email protected]>

On Wed, Mar 18, 2026 at 12:02 AM KAZAR Ayoub <[email protected]> wrote:

> On Tue, Mar 17, 2026 at 7:49 PM Nathan Bossart <[email protected]>
> wrote:
>
>> On Sat, Mar 14, 2026 at 11:43:38PM +0100, KAZAR Ayoub wrote:
>> > Just a small concern about where some varlenas have a larger binary size
>> > than its text representation ex:
>> > SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
>> >  pg_column_size
>> > ----------------
>> >              32
>> >
>> > its text representation is less than sizeof(Vector8) so currently v3 would
>> > enter SIMD path and exit out just from the beginning (two extra branches)
>> > because it does this:
>> > + if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
>> > + VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))
>> >
>> > I thought maybe we could do * 2 or * 4 its binary size, depends on the type
>> > really but this is just a proposition if this case is something concerning.
>>
>> Can we measure the impact of this?  How likely is this case?
>>
> I'll respond to this separately in a different email.
>
My example was incorrect (the text representation of a tsvector is lexemes and
positions, not the original text, so it is lossy), but the point still holds.
If we have a json(b) column like {"key1":"val1","key2":"val2"}, then in CSV
format it would exit the SIMD path immediately because of the quote
character; for json(b) this will always be the case.
I measured the overhead of exiting the SIMD path this way many times (8 million
times in one COPY TO command) and found only a 3% regression for this case,
sometimes 2%.

For cases where we falsely commit to SIMD because we read a binary
size > sizeof(Vector8), which I also found to be very niche, the short circuit
to scalar each time is even more negligible (the CSV JSON case above is the
absolute worst case).
So I don't think any of this should be a concern.
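
To show where the early exit happens for the JSON example, here is a minimal portable sketch. It is not the patch's code: CHUNK, lowest_set_bit() and csv_scan_chunks() are hypothetical names, 16 bytes is an assumed vector width, and the inner loop only emulates a vector compare followed by a per-byte bitmask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 16                /* assumed vector width in bytes */

/* Index of the lowest set bit; mask must be nonzero. */
static int
lowest_set_bit(uint32_t mask)
{
    int         pos = 0;

    assert(mask != 0);
    while ((mask & 1) == 0)
    {
        mask >>= 1;
        pos++;
    }
    return pos;
}

/*
 * Returns the offset of the first byte that forces CSV quoting, scanning
 * only whole CHUNK-sized blocks.  If no full block contains such a byte,
 * returns the offset where a scalar tail loop would take over.
 */
static size_t
csv_scan_chunks(const char *s, size_t len, char delim, char quote)
{
    size_t      off = 0;

    while (off + CHUNK <= len)
    {
        uint32_t    mask = 0;
        int         i;

        /* Emulates a vector compare followed by a per-byte bitmask. */
        for (i = 0; i < CHUNK; i++)
        {
            char        c = s[off + i];

            if (c == delim || c == quote || c == '\n' || c == '\r')
                mask |= (uint32_t) 1 << i;
        }
        if (mask != 0)
            return off + (size_t) lowest_set_bit(mask);
        off += CHUNK;
    }
    return off;
}
```

For the JSON value above the scan stops at offset 1, the first quote character, so the whole chunk compare is spent to skip a single byte.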


Regards,
Ayoub


^ permalink  raw  reply  [nested|flat] 13+ messages in thread

* Re: Speed up COPY TO text/CSV parsing using SIMD
@ 2026-03-24 00:16  KAZAR Ayoub <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  1 sibling, 0 replies; 13+ messages in thread

From: KAZAR Ayoub @ 2026-03-24 00:16 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; Mark Wong <[email protected]>; Nazir Bilal Yavuz <[email protected]>

On Wed, Mar 18, 2026 at 3:29 AM KAZAR Ayoub <[email protected]> wrote:

> On Wed, Mar 18, 2026 at 12:02 AM KAZAR Ayoub <[email protected]> wrote:
>
>> On Tue, Mar 17, 2026 at 7:49 PM Nathan Bossart <[email protected]>
>> wrote:
>>
>>> On Sat, Mar 14, 2026 at 11:43:38PM +0100, KAZAR Ayoub wrote:
>>> > Just a small concern about where some varlenas have a larger binary size
>>> > than its text representation ex:
>>> > SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
>>> >  pg_column_size
>>> > ----------------
>>> >              32
>>> >
>>> > its text representation is less than sizeof(Vector8) so currently v3 would
>>> > enter SIMD path and exit out just from the beginning (two extra branches)
>>> > because it does this:
>>> > + if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
>>> > + VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))
>>> >
>>> > I thought maybe we could do * 2 or * 4 its binary size, depends on the type
>>> > really but this is just a proposition if this case is something concerning.
>>>
>>> Can we measure the impact of this?  How likely is this case?
>>>
>> I'll respond to this separately in a different email.
>>
> My example was already incorrect (the text representation is lexems and
> positions, not the text we read as it is, its lossy), anyways the point
> still holds.
> If we have some json(b) column like : {"key1":"val1","key2":"val2"}, for
> CSV format this would immediately exit the SIMD path because of quote
> character, for json(b) this is going to be always the case.
> I measured the overhead of exiting the SIMD path a lot (8 million times
> for one COPY TO command), i only found 3% regression for this case,
> sometimes 2%.
>
> For cases where we do a false commitment on SIMD because we read a binary
> size >= sizeof(Vector8), which i found very niche too, the short circuit to
> scalar each time is even more negligible (the above CSV JSON case is the
> absolute worst case).
> So I don't think any of this should be a concern.
>
>
> Regards,
> Ayoub
>
Rebased patch.

Regards,
Ayoub


Attachments:

  [text/x-patch] v3-0001-Speed-up-COPY-TO-FORMAT-text-csv-using-SIMD.patch (12.6K, 3-v3-0001-Speed-up-COPY-TO-FORMAT-text-csv-using-SIMD.patch)
  download | inline diff:
From dae1e6c444a73cf0ddc21b14e1e0b225fdf46107 Mon Sep 17 00:00:00 2001
From: AyoubKAZ <[email protected]>
Date: Sat, 14 Mar 2026 22:52:22 +0100
Subject: [PATCH] Speed up COPY TO (FORMAT {text,csv}) using SIMD. Presently,
 such commands scan each attribute's string representation one byte at a time
 looking for special characters.  This commit adds a new path that uses SIMD
 instructions to skip over chunks of data without any special characters. 
 This can be much faster.

SIMD processing is only used for varlena attributes whose text
representation is longer than a single SIMD vector, and only when
no encoding conversion is required.  Fixed-size types such as
integers and booleans always produce short ASCII output for which
SIMD provides no benefit, and when transcoding is needed the string
length may change after conversion.  For eligible attributes, the
stored varlena size is used as a cheap pre-filter to avoid an
unnecessary strlen() call on short values, this version also avoids
calling strlen twice when transcoding is necessary.

For TEXT mode, the SIMD path scans for ASCII control characters,
backslash, and the delimiter.  For CSV mode, two SIMD helpers are
used: one to determine whether a field requires quoting by scanning
for the delimiter, quote character, and end-of-line characters, and
one to scan for characters requiring escaping during the output pass.
In both modes, the scalar path handles any remaining characters after
the SIMD pre-pass.
---
 src/backend/commands/copyto.c | 254 +++++++++++++++++++++++++++++++---
 1 file changed, 236 insertions(+), 18 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index faf62d959b4..95d2b54761c 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -33,6 +33,8 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "port/pg_bitutils.h"
+#include "port/simd.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
 #include "utils/json.h"
@@ -128,11 +130,147 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 static void EndCopy(CopyToState cstate);
 static void ClosePipeToProgram(CopyToState cstate);
 static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
-static void CopyAttributeOutText(CopyToState cstate, const char *string);
-static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
-								bool use_quote);
+static pg_attribute_always_inline void CopyAttributeOutText(CopyToState cstate, const char *string,
+															bool use_simd, size_t len);
+static pg_attribute_always_inline void CopyAttributeOutCSV(CopyToState cstate, const char *string,
+														   bool use_quote, bool use_simd, size_t len);
 static void CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel,
 						   uint64 *processed);
+static void CopySkipTextSIMD(const char **ptr,
+							 size_t len, char delimc);
+static void CopyCheckCSVQuoteNeedSIMD(const char **ptr,
+									  size_t len, char delimc, char quotec);
+static void CopySkipCSVEscapeSIMD(const char **ptr,
+								  size_t len, char escapec, char quotec);
+
+/*
+ * CopySkipTextSIMD - Scan forward in TEXT mode using SIMD,
+ * stopping at the first special character then caller continues processing any remaining
+ * characters in the scalar path.
+ *
+ * Special characters for TEXT mode are: ASCII control characters (< 0x20),
+ * backslash, and the delimiter.
+ */
+static void
+CopySkipTextSIMD(const char **ptr, size_t len, char delimc)
+{
+#ifndef USE_NO_SIMD
+	const char *p = *ptr;
+	const char *end = p + len;
+
+	const Vector8 backslash_mask = vector8_broadcast('\\');
+	const Vector8 delim_mask = vector8_broadcast(delimc);
+	const Vector8 control_mask = vector8_broadcast(0x20);
+
+	while (p + sizeof(Vector8) <= end)
+	{
+		Vector8		chunk;
+		Vector8		match;
+
+		vector8_load(&chunk, (const uint8 *) p);
+
+		match = vector8_or(vector8_gt(control_mask, chunk),
+						   vector8_eq(chunk, backslash_mask));
+		match = vector8_or(match, vector8_eq(chunk, delim_mask));
+
+		if (vector8_is_highbit_set(match))
+		{
+			uint32		mask;
+
+			mask = vector8_highbit_mask(match);
+			*ptr = p + pg_rightmost_one_pos32(mask);
+			return;
+		}
+
+		p += sizeof(Vector8);
+	}
+
+	*ptr = p;
+#endif
+}
+
+/*
+ * CopyCheckCSVQuoteNeedSIMD - Scan a CSV field using SIMD to determine
+ * whether it needs quoting stopping at the first character that would require the field to be quoted:
+ * the delimiter, the quote character, newline, or carriage return.
+ */
+static void
+CopyCheckCSVQuoteNeedSIMD(const char **ptr, size_t len, char delimc, char quotec)
+{
+#ifndef USE_NO_SIMD
+	const char *p = *ptr;
+	const char *end = p + len;
+
+	const Vector8 delim_mask = vector8_broadcast(delimc);
+	const Vector8 quote_mask = vector8_broadcast(quotec);
+	const Vector8 nl_mask = vector8_broadcast('\n');
+	const Vector8 cr_mask = vector8_broadcast('\r');
+
+	while (p + sizeof(Vector8) <= end)
+	{
+		Vector8		chunk;
+		Vector8		match;
+
+		vector8_load(&chunk, (const uint8 *) p);
+
+		match = vector8_or(vector8_eq(chunk, nl_mask), vector8_eq(chunk, cr_mask));
+		match = vector8_or(match, vector8_or(vector8_eq(chunk, delim_mask),
+											 vector8_eq(chunk, quote_mask)));
+
+		if (vector8_is_highbit_set(match))
+		{
+			uint32		mask;
+
+			mask = vector8_highbit_mask(match);
+			*ptr = p + pg_rightmost_one_pos32(mask);
+			return;
+		}
+
+		p += sizeof(Vector8);
+	}
+
+	*ptr = p;
+#endif
+}
+
+/*
+ * CopySkipCSVEscapeSIMD - Same as CopyCheckCSVQuoteNeedSIMD, scan forward in CSV mode using SIMD,
+ * stopping at the first character that requires escaping.
+ */
+static void
+CopySkipCSVEscapeSIMD(const char **ptr, size_t len, char escapec, char quotec)
+{
+#ifndef USE_NO_SIMD
+	const char *p = *ptr;
+	const char *end = p + len;
+
+	const Vector8 escape_mask = vector8_broadcast(escapec);
+	const Vector8 quote_mask = vector8_broadcast(quotec);
+
+	while (p + sizeof(Vector8) <= end)
+	{
+		Vector8		chunk;
+		Vector8		match;
+
+		vector8_load(&chunk, (const uint8 *) p);
+
+		match = vector8_or(vector8_eq(chunk, quote_mask), vector8_eq(chunk, escape_mask));
+
+		if (vector8_is_highbit_set(match))
+		{
+			uint32		mask;
+
+			mask = vector8_highbit_mask(match);
+			*ptr = p + pg_rightmost_one_pos32(mask);
+			return;
+		}
+
+		p += sizeof(Vector8);
+	}
+
+	*ptr = p;
+#endif
+}
 
 /* built-in format-specific routines */
 static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
@@ -244,9 +382,9 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 			colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
 			if (cstate->opts.format == COPY_FORMAT_CSV)
-				CopyAttributeOutCSV(cstate, colname, false);
+				CopyAttributeOutCSV(cstate, colname, false, false, 0);
 			else
-				CopyAttributeOutText(cstate, colname);
+				CopyAttributeOutText(cstate, colname, false, 0);
 		}
 
 		CopySendTextLikeEndOfRow(cstate);
@@ -304,6 +442,7 @@ CopyToTextLikeOneRow(CopyToState cstate,
 {
 	bool		need_delim = false;
 	FmgrInfo   *out_functions = cstate->out_functions;
+	TupleDesc	tup_desc = slot->tts_tupleDescriptor;
 
 	foreach_int(attnum, cstate->attnumlist)
 	{
@@ -321,15 +460,48 @@ CopyToTextLikeOneRow(CopyToState cstate,
 		else
 		{
 			char	   *string;
+			bool		use_simd = false;
+			size_t		len = 0;
+
+			string = OutputFunctionCall(&out_functions[attnum - 1], value);
 
-			string = OutputFunctionCall(&out_functions[attnum - 1],
-										value);
+			/*
+			* Only use SIMD for varlena types without transcoding.  Fixed-size
+			* types (int4, bool, date, etc.) always produce short ASCII output
+			* for which SIMD provides no benefit.  When transcoding is needed,
+			* the string length may change after conversion, so we skip SIMD
+			* entirely in that case too.
+			*
+			* We use VARSIZE_ANY_EXHDR as a cheap pre-filter to avoid calling
+			* strlen() on short varlenas.  The actual length passed to the SIMD
+			* helpers is always strlen(string) so the text output length not
+			* the binary storage size.
+			*/
+			if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
+				VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))
+			{
+				len = strlen(string);
+				use_simd = !cstate->need_transcoding && (len > sizeof(Vector8));
+			}
 
 			if (is_csv)
-				CopyAttributeOutCSV(cstate, string,
-									cstate->opts.force_quote_flags[attnum - 1]);
+			{
+				if (use_simd)
+					CopyAttributeOutCSV(cstate, string,
+										cstate->opts.force_quote_flags[attnum - 1],
+										true, len);
+				else
+					CopyAttributeOutCSV(cstate, string,
+										cstate->opts.force_quote_flags[attnum - 1],
+										false, len);
+			}
 			else
-				CopyAttributeOutText(cstate, string);
+			{
+				if (use_simd)
+					CopyAttributeOutText(cstate, string, true, len);
+				else
+					CopyAttributeOutText(cstate, string, false, len);
+			}
 		}
 	}
 
@@ -1416,8 +1588,24 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			CopySendData(cstate, start, ptr - start); \
 	} while (0)
 
-static void
-CopyAttributeOutText(CopyToState cstate, const char *string)
+/*
+ * CopyAttributeOutText - Send text representation of one attribute,
+ * with conversion and escaping.
+ *
+ * For a little extra speed, if use_simd is true we first use SIMD
+ * instructions to skip over chunks of data that contain no special
+ * characters.  This pre-pass advances ptr as far as possible before
+ * handing off to the scalar loop below, which then processes any
+ * remaining characters.  use_simd is only set by the caller when the
+ * attribute is a varlena type whose text representation is longer than
+ * a single SIMD vector and no encoding conversion is required.  In all
+ * other cases we fall straight through to the scalar path.
+ *
+ * When use_simd is true, len must be the strlen() of string, otherwise it is unused
+ */
+static pg_attribute_always_inline void
+CopyAttributeOutText(CopyToState cstate, const char *string,
+					 bool use_simd, size_t len)
 {
 	const char *ptr;
 	const char *start;
@@ -1425,7 +1613,15 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	char		delimc = cstate->opts.delim[0];
 
 	if (cstate->need_transcoding)
-		ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+	{
+		/*
+		 * len may already be set by the caller for long varlenas, avoiding an extra
+		 * strlen() call.  For all other cases it is 0 and we compute it here.
+		 */
+		if (len == 0)
+			len = strlen(string);
+		ptr = pg_server_to_any(string, len, cstate->file_encoding);
+	}
 	else
 		ptr = string;
 
@@ -1446,6 +1642,9 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	if (cstate->encoding_embeds_ascii)
 	{
 		start = ptr;
+		if (use_simd)
+			CopySkipTextSIMD(&ptr, len, delimc);
+
 		while ((c = *ptr) != '\0')
 		{
 			if ((unsigned char) c < (unsigned char) 0x20)
@@ -1506,6 +1705,9 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	else
 	{
 		start = ptr;
+		if (use_simd)
+			CopySkipTextSIMD(&ptr, len, delimc);
+
 		while ((c = *ptr) != '\0')
 		{
 			if ((unsigned char) c < (unsigned char) 0x20)
@@ -1566,12 +1768,14 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 }
 
 /*
- * Send text representation of one attribute, with conversion and
- * CSV-style escaping
+ * CopyAttributeOutCSV - Send text representation of one attribute,
+ * with conversion and CSV-style escaping.
+ *
+ * We use the same simd optimization idea, see CopyAttributeOutText comment.
  */
-static void
+static pg_attribute_always_inline void
 CopyAttributeOutCSV(CopyToState cstate, const char *string,
-					bool use_quote)
+					bool use_quote, bool use_simd, size_t len)
 {
 	const char *ptr;
 	const char *start;
@@ -1586,7 +1790,15 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 		use_quote = true;
 
 	if (cstate->need_transcoding)
-		ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+	{
+		/*
+		 * len may already be set by the caller for long varlenas, avoiding an extra
+		 * strlen() call.  For all other cases it is 0 and we compute it here.
+		 */
+		if (len == 0)
+			len = strlen(string);
+		ptr = pg_server_to_any(string, len, cstate->file_encoding);
+	}
 	else
 		ptr = string;
 
@@ -1608,6 +1820,9 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 		{
 			const char *tptr = ptr;
 
+			if (use_simd)
+				CopyCheckCSVQuoteNeedSIMD(&tptr, len, delimc, quotec);
+
 			while ((c = *tptr) != '\0')
 			{
 				if (c == delimc || c == quotec || c == '\n' || c == '\r')
@@ -1631,6 +1846,9 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 		 * We adopt the same optimization strategy as in CopyAttributeOutText
 		 */
 		start = ptr;
+		if (use_simd)
+			CopySkipCSVEscapeSIMD(&ptr, len, escapec, quotec);
+
 		while ((c = *ptr) != '\0')
 		{
 			if (c == quotec || c == escapec)
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 13+ messages in thread

* Re: Speed up COPY TO text/CSV parsing using SIMD
@ 2026-03-26 21:09  Nathan Bossart <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  1 sibling, 0 replies; 13+ messages in thread

From: Nathan Bossart @ 2026-03-26 21:09 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; Mark Wong <[email protected]>; Nazir Bilal Yavuz <[email protected]>

On Wed, Mar 18, 2026 at 12:02:28AM +0100, KAZAR Ayoub wrote:
>   Test                 Master    v3       v3_var   v3_var_noinl
>   TEXT clean           1504ms   -24.1%   -23.0%   -21.5%
>   CSV clean            1760ms   -34.9%   -32.7%   -33.0%

Nice!

>   TEXT 1/3 backslashes     3763ms    +4.6%    +6.9%   +4.1%
>   CSV 1/3 quotes           3885ms    +3.1%    +2.7%    -0.8%

Hm.  These seem a little bit beyond what we could ignore as noise.
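
To put the percentages in absolute terms, the deltas can be converted to milliseconds against the Master column. A minimal sketch; delta_ms() is a hypothetical helper and not something from the patch or the benchmark scripts.

```c
#include <assert.h>

/*
 * Converts a relative change (in percent) against a baseline timing into
 * an absolute difference in milliseconds.  Positive means slower.
 */
static double
delta_ms(double baseline_ms, double percent)
{
    assert(baseline_ms >= 0.0);
    return baseline_ms * percent / 100.0;
}
```

With the figures quoted above, delta_ms(3763.0, 4.6) comes to roughly 173 ms for the TEXT 1/3 backslashes case.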

> Wide table TEXT (integer columns):
> 
>   Cols    Master    v3       v3_var   v3_var_noinl
>   50      2083ms   -0.7%    -0.6%    +3.5%
>   100     4094ms   -0.1%    -0.5%    +4.5%
>   200     1560ms   +0.6%    -2.3%    +3.2%
>   500     1905ms   -1.0%    -1.3%    +4.7%
>   1000    1455ms   +1.8%    +0.4%    +4.3%

These numbers look roughly within the noise range.

> Wide table CSV:
> 
>   Cols    Master    v3       v3_var   v3_var_noinl
>   50      2421ms   +4.0%    +6.7%    +5.8%

Hm.  Is this reproducible?  A 4% regression is a bit worrisome.

>   100     4980ms   +0.1%    +2.0%     +0.1%
>   200     1901ms   +1.4%    +3.5%    +1.4%
>   500     2328ms   +1.8%    +2.7%    +2.2%
>   1000    1815ms   +2.0%    +2.8%    +2.5%

These numbers don't bother me too much, but maybe there are some ways to
minimize the regressions further.

-- 
nathan





^ permalink  raw  reply  [nested|flat] 13+ messages in thread

* Re: Speed up COPY TO text/CSV parsing using SIMD
@ 2026-03-26 21:23  Nathan Bossart <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  1 sibling, 1 reply; 13+ messages in thread

From: Nathan Bossart @ 2026-03-26 21:23 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; Mark Wong <[email protected]>; Nazir Bilal Yavuz <[email protected]>

On Wed, Mar 18, 2026 at 03:29:32AM +0100, KAZAR Ayoub wrote:
> If we have some json(b) column like : {"key1":"val1","key2":"val2"}, for
> CSV format this would immediately exit the SIMD path because of quote
> character, for json(b) this is going to be always the case.
> I measured the overhead of exiting the SIMD path a lot (8 million times for
> one COPY TO command), i only found 3% regression for this case, sometimes
> 2%.

I'm a little worried that we might be dismissing small-yet-measurable
regressions for extremely common workloads.  Unlike the COPY FROM work,
this operates on a per-attribute level, meaning we only use SIMD when an
attribute is at least 16 bytes.  The extra branching for each attribute
might not be something we can just ignore.

> For cases where we do a false commitment on SIMD because we read a binary
> size >= sizeof(Vector8), which i found very niche too, the short circuit to
> scalar each time is even more negligible (the above CSV JSON case is the
> absolute worst case).

That's good to hear.

-- 
nathan





^ permalink  raw  reply  [nested|flat] 13+ messages in thread

* Re: Speed up COPY TO text/CSV parsing using SIMD
@ 2026-03-27 18:48  KAZAR Ayoub <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 13+ messages in thread

From: KAZAR Ayoub @ 2026-03-27 18:48 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; Mark Wong <[email protected]>; Nazir Bilal Yavuz <[email protected]>

Hello,
On Thu, Mar 26, 2026 at 10:23 PM Nathan Bossart <[email protected]>
wrote:

> On Wed, Mar 18, 2026 at 03:29:32AM +0100, KAZAR Ayoub wrote:
> > If we have some json(b) column like : {"key1":"val1","key2":"val2"}, for
> > CSV format this would immediately exit the SIMD path because of quote
> > character, for json(b) this is going to be always the case.
> > I measured the overhead of exiting the SIMD path a lot (8 million times for
> > one COPY TO command), i only found 3% regression for this case, sometimes
> > 2%.
>
> I'm a little worried that we might be dismissing small-yet-measurable
> regressions for extremely common workloads.  Unlike the COPY FROM work,
> this operates on a per-attribute level, meaning we only use SIMD when an
> attribute is at least 16 bytes.  The extra branching for each attribute
> might not be something we can just ignore.
>
Thanks for the review.

I added a prescan loop inside the SIMD helpers that tries to catch special
characters within the first sizeof(Vector8) characters. I measured how well
this reduces the overhead of starting SIMD and exiting at the first vector:
the scalar loop beats SIMD for one vector if it finds a special
character before the 6th character; in the worst case (not a clean vector)
the scalar loop needs 20 more cycles than SIMD.
This helps mitigate the JSON(B)-in-CSV case, which is why I added it
for the CSV case only.
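
The idea of the prescan can be sketched as follows. This is a simplified illustration and not the v4 code: CHUNK, csv_is_special() and csv_prescan() are hypothetical names, and 16 bytes is an assumed value for sizeof(Vector8).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHUNK 16                /* assumed vector width in bytes */

/* True for bytes that force a CSV field to be quoted. */
static bool
csv_is_special(char c, char delim, char quote)
{
    return c == delim || c == quote || c == '\n' || c == '\r';
}

/*
 * Scalar prescan over at most the first CHUNK bytes.  Returns true and
 * sets *pos to the offset of the first special byte if one is found, so
 * the caller can skip the vector loop entirely.  Otherwise returns false
 * and sets *pos to the number of bytes already checked.
 */
static bool
csv_prescan(const char *s, size_t len, char delim, char quote, size_t *pos)
{
    size_t      limit = len < CHUNK ? len : CHUNK;
    size_t      i;

    assert(s != NULL && pos != NULL);
    for (i = 0; i < limit; i++)
    {
        if (csv_is_special(s[i], delim, quote))
        {
            *pos = i;
            return true;
        }
    }
    *pos = limit;
    return false;
}
```

For a value such as {"key1":"val1"} the prescan returns at offset 1 without touching any vector instructions; for a clean first chunk the caller would continue with the SIMD loop from *pos.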

In a benchmark with 10M early SIMD exit like the JSONB case, the previous
3% regression is gone.

For the normal benchmark (clean, 1/3 specials, wide table), I ran v4 for
longer this time and found this:
  Test                       Master    V4
  TEXT clean                 1619ms    -28.0%
  CSV clean                  1866ms    -37.1%
  TEXT 1/3 backslashes       3913ms    +1.2%
  CSV 1/3 quotes             4012ms    -3.0%

Wide table TEXT:

  Cols    Master    V4
  50      2109ms    -2.9%
  100     2029ms    -1.6%
  200     3982ms    -2.9%
  500     1962ms    -6.1%
  1000    3812ms    -3.6%

Wide table CSV:

  Cols    Master    V4
  50      2531ms    +0.3%
  100     2465ms    +1.1%
  200     4965ms    -0.2%
  500     2346ms    +1.4%
  1000    4709ms    -0.4%

Do we need more benchmarks for other kinds of workloads? Am I perhaps
missing something else that has noticeable overhead?

Regards,
Ayoub


Attachments:

  [text/x-patch] v4-0001-Speed-up-COPY-TO-FORMAT-text-csv-using-SIMD.patch (13.3K, 3-v4-0001-Speed-up-COPY-TO-FORMAT-text-csv-using-SIMD.patch)
  download | inline diff:
From f268d3881a50f94352c8ee4abc334164d25f2d9e Mon Sep 17 00:00:00 2001
From: AyoubKAZ <[email protected]>
Date: Sat, 14 Mar 2026 22:52:22 +0100
Subject: [PATCH] Speed up COPY TO (FORMAT {text,csv}) using SIMD. Presently,
 such commands scan each attribute's string representation one byte at a time
 looking for special characters.  This commit adds a new path that uses SIMD
 instructions to skip over chunks of data without any special characters. 
 This can be much faster.

SIMD processing is only used for varlena attributes whose text
representation is longer than a single SIMD vector, and only when
no encoding conversion is required.  Fixed-size types such as
integers and booleans always produce short ASCII output for which
SIMD provides no benefit, and when transcoding is needed the string
length may change after conversion.  For eligible attributes, the
stored varlena size is used as a cheap pre-filter to avoid an
unnecessary strlen() call on short values, this version also avoids
calling strlen twice when transcoding is necessary.

For TEXT mode, the SIMD path scans for ASCII control characters,
backslash, and the delimiter.  For CSV mode, two SIMD helpers are
used: one to determine whether a field requires quoting by scanning
for the delimiter, quote character, and end-of-line characters, and
one to scan for characters requiring escaping during the output pass.
In both modes, the scalar path handles any remaining characters after
the SIMD pre-pass.
---
 src/backend/commands/copyto.c | 280 +++++++++++++++++++++++++++++++---
 1 file changed, 262 insertions(+), 18 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index faf62d959b4..ee09cef0307 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -33,6 +33,8 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "port/pg_bitutils.h"
+#include "port/simd.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
 #include "utils/json.h"
@@ -128,11 +130,173 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 static void EndCopy(CopyToState cstate);
 static void ClosePipeToProgram(CopyToState cstate);
 static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot);
-static void CopyAttributeOutText(CopyToState cstate, const char *string);
-static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
-								bool use_quote);
+static pg_attribute_always_inline void CopyAttributeOutText(CopyToState cstate, const char *string,
+															bool use_simd, size_t len);
+static pg_attribute_always_inline void CopyAttributeOutCSV(CopyToState cstate, const char *string,
+														   bool use_quote, bool use_simd, size_t len);
 static void CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel,
 						   uint64 *processed);
+static void CopySkipTextSIMD(const char **ptr,
+							 size_t len, char delimc);
+static void CopyCheckCSVQuoteNeedSIMD(const char **ptr,
+									  size_t len, char delimc, char quotec);
+static void CopySkipCSVEscapeSIMD(const char **ptr,
+								  size_t len, char escapec, char quotec);
+
+/*
+ * CopySkipTextSIMD - Scan forward in TEXT mode using SIMD, stopping at the
+ * first special character.  The caller then processes any remaining
+ * characters in the scalar path.
+ *
+ * Special characters for TEXT mode are: ASCII control characters (< 0x20),
+ * backslash, and the delimiter.
+ */
+static void
+CopySkipTextSIMD(const char **ptr, size_t len, char delimc)
+{
+#ifndef USE_NO_SIMD
+	const char *p = *ptr;
+	const char *end = p + len;
+
+	const Vector8 backslash_mask = vector8_broadcast('\\');
+	const Vector8 delim_mask = vector8_broadcast(delimc);
+	const Vector8 control_mask = vector8_broadcast(0x20);
+
+	while (p + sizeof(Vector8) <= end)
+	{
+		Vector8		chunk;
+		Vector8		match;
+
+		vector8_load(&chunk, (const uint8 *) p);
+
+		match = vector8_or(vector8_gt(control_mask, chunk),
+						   vector8_eq(chunk, backslash_mask));
+		match = vector8_or(match, vector8_eq(chunk, delim_mask));
+
+		if (vector8_is_highbit_set(match))
+		{
+			uint32		mask;
+
+			mask = vector8_highbit_mask(match);
+			*ptr = p + pg_rightmost_one_pos32(mask);
+			return;
+		}
+
+		p += sizeof(Vector8);
+	}
+
+	*ptr = p;
+#endif
+}
+
+/*
+ * CopyCheckCSVQuoteNeedSIMD - Scan a CSV field using SIMD to determine
+ * whether it needs quoting, stopping at the first character that would
+ * require it: the delimiter, the quote character, newline, or carriage return.
+ */
+static void
+CopyCheckCSVQuoteNeedSIMD(const char **ptr, size_t len, char delimc, char quotec)
+{
+#ifndef USE_NO_SIMD
+	const char *p = *ptr;
+	const char *end = p + len;
+
+	/* Scalar prescan of up to sizeof(Vector8) bytes, to avoid SIMD setup overhead on early stops. */
+	const char *prescan_end = p + sizeof(Vector8);
+	while (p < prescan_end && p < end)
+	{
+		char c = *p;
+		if (c == delimc || c == quotec || c == '\n' || c == '\r')
+		{
+			*ptr = p;
+			return;
+		}
+		p++;
+	}
+
+	const Vector8 delim_mask = vector8_broadcast(delimc);
+	const Vector8 quote_mask = vector8_broadcast(quotec);
+	const Vector8 nl_mask = vector8_broadcast('\n');
+	const Vector8 cr_mask = vector8_broadcast('\r');
+
+	while (p + sizeof(Vector8) <= end)
+	{
+		Vector8		chunk;
+		Vector8		match;
+
+		vector8_load(&chunk, (const uint8 *) p);
+
+		match = vector8_or(vector8_eq(chunk, nl_mask), vector8_eq(chunk, cr_mask));
+		match = vector8_or(match, vector8_or(vector8_eq(chunk, delim_mask),
+											 vector8_eq(chunk, quote_mask)));
+
+		if (vector8_is_highbit_set(match))
+		{
+			uint32		mask;
+
+			mask = vector8_highbit_mask(match);
+			*ptr = p + pg_rightmost_one_pos32(mask);
+			return;
+		}
+
+		p += sizeof(Vector8);
+	}
+
+	*ptr = p;
+#endif
+}
+
+/*
+ * CopySkipCSVEscapeSIMD - Like CopyCheckCSVQuoteNeedSIMD, but scans forward
+ * in CSV mode using SIMD, stopping at the first character that requires escaping.
+ */
+static void
+CopySkipCSVEscapeSIMD(const char **ptr, size_t len, char escapec, char quotec)
+{
+#ifndef USE_NO_SIMD
+	const char *p = *ptr;
+	const char *end = p + len;
+
+	/* Scalar prescan of up to sizeof(Vector8) bytes, to avoid SIMD setup overhead on early stops. */
+	const char *prescan_end = p + sizeof(Vector8);
+	while (p < prescan_end && p < end)
+	{
+		char c = *p;
+		if (c == escapec || c == quotec)
+		{
+			*ptr = p;
+			return;
+		}
+		p++;
+	}
+
+	const Vector8 escape_mask = vector8_broadcast(escapec);
+	const Vector8 quote_mask = vector8_broadcast(quotec);
+
+	while (p + sizeof(Vector8) <= end)
+	{
+		Vector8		chunk;
+		Vector8		match;
+
+		vector8_load(&chunk, (const uint8 *) p);
+
+		match = vector8_or(vector8_eq(chunk, quote_mask), vector8_eq(chunk, escape_mask));
+
+		if (vector8_is_highbit_set(match))
+		{
+			uint32		mask;
+
+			mask = vector8_highbit_mask(match);
+			*ptr = p + pg_rightmost_one_pos32(mask);
+			return;
+		}
+
+		p += sizeof(Vector8);
+	}
+
+	*ptr = p;
+#endif
+}
 
 /* built-in format-specific routines */
 static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
@@ -244,9 +408,9 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 			colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
 			if (cstate->opts.format == COPY_FORMAT_CSV)
-				CopyAttributeOutCSV(cstate, colname, false);
+				CopyAttributeOutCSV(cstate, colname, false, false, 0);
 			else
-				CopyAttributeOutText(cstate, colname);
+				CopyAttributeOutText(cstate, colname, false, 0);
 		}
 
 		CopySendTextLikeEndOfRow(cstate);
@@ -304,6 +468,7 @@ CopyToTextLikeOneRow(CopyToState cstate,
 {
 	bool		need_delim = false;
 	FmgrInfo   *out_functions = cstate->out_functions;
+	TupleDesc	tup_desc = slot->tts_tupleDescriptor;
 
 	foreach_int(attnum, cstate->attnumlist)
 	{
@@ -321,15 +486,48 @@ CopyToTextLikeOneRow(CopyToState cstate,
 		else
 		{
 			char	   *string;
+			bool		use_simd = false;
+			size_t		len = 0;
+
+			string = OutputFunctionCall(&out_functions[attnum - 1], value);
 
-			string = OutputFunctionCall(&out_functions[attnum - 1],
-										value);
+			/*
+			 * Only use SIMD for varlena types without transcoding.  Fixed-size
+			 * types (int4, bool, date, etc.) always produce short ASCII output
+			 * for which SIMD provides no benefit.  When transcoding is needed,
+			 * the string length may change after conversion, so we skip SIMD
+			 * entirely in that case too.
+			 *
+			 * We use VARSIZE_ANY_EXHDR as a cheap pre-filter to avoid calling
+			 * strlen() on short varlenas.  The actual length passed to the SIMD
+			 * helpers is always strlen(string), i.e. the text output length, not
+			 * the binary storage size.
+			 */
+			if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
+				VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))
+			{
+				len = strlen(string);
+				use_simd = !cstate->need_transcoding && (len > sizeof(Vector8));
+			}
 
 			if (is_csv)
-				CopyAttributeOutCSV(cstate, string,
-									cstate->opts.force_quote_flags[attnum - 1]);
+			{
+				if (use_simd)
+					CopyAttributeOutCSV(cstate, string,
+										cstate->opts.force_quote_flags[attnum - 1],
+										true, len);
+				else
+					CopyAttributeOutCSV(cstate, string,
+										cstate->opts.force_quote_flags[attnum - 1],
+										false, len);
+			}
 			else
-				CopyAttributeOutText(cstate, string);
+			{
+				if (use_simd)
+					CopyAttributeOutText(cstate, string, true, len);
+				else
+					CopyAttributeOutText(cstate, string, false, len);
+			}
 		}
 	}
 
@@ -1416,8 +1614,24 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 			CopySendData(cstate, start, ptr - start); \
 	} while (0)
 
-static void
-CopyAttributeOutText(CopyToState cstate, const char *string)
+/*
+ * CopyAttributeOutText - Send text representation of one attribute,
+ * with conversion and escaping.
+ *
+ * For a little extra speed, if use_simd is true we first use SIMD
+ * instructions to skip over chunks of data that contain no special
+ * characters.  This pre-pass advances ptr as far as possible before
+ * handing off to the scalar loop below, which then processes any
+ * remaining characters.  use_simd is only set by the caller when the
+ * attribute is a varlena type whose text representation is longer than
+ * a single SIMD vector and no encoding conversion is required.  In all
+ * other cases we fall straight through to the scalar path.
+ *
+ * When use_simd is true, len must be strlen(string); otherwise it is either strlen(string) or 0 (not yet computed).
+ */
+static pg_attribute_always_inline void
+CopyAttributeOutText(CopyToState cstate, const char *string,
+					 bool use_simd, size_t len)
 {
 	const char *ptr;
 	const char *start;
@@ -1425,7 +1639,15 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	char		delimc = cstate->opts.delim[0];
 
 	if (cstate->need_transcoding)
-		ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+	{
+		/*
+		 * len may already be set by the caller for long varlenas, avoiding an extra
+		 * strlen() call.  For all other cases it is 0 and we compute it here.
+		 */
+		if (len == 0)
+			len = strlen(string);
+		ptr = pg_server_to_any(string, len, cstate->file_encoding);
+	}
 	else
 		ptr = string;
 
@@ -1446,6 +1668,9 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	if (cstate->encoding_embeds_ascii)
 	{
 		start = ptr;
+		if (use_simd)
+			CopySkipTextSIMD(&ptr, len, delimc);
+
 		while ((c = *ptr) != '\0')
 		{
 			if ((unsigned char) c < (unsigned char) 0x20)
@@ -1506,6 +1731,9 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	else
 	{
 		start = ptr;
+		if (use_simd)
+			CopySkipTextSIMD(&ptr, len, delimc);
+
 		while ((c = *ptr) != '\0')
 		{
 			if ((unsigned char) c < (unsigned char) 0x20)
@@ -1566,12 +1794,14 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 }
 
 /*
- * Send text representation of one attribute, with conversion and
- * CSV-style escaping
+ * CopyAttributeOutCSV - Send text representation of one attribute,
+ * with conversion and CSV-style escaping.
+ *
+ * We use the same SIMD optimization as CopyAttributeOutText; see the comment there.
  */
-static void
+static pg_attribute_always_inline void
 CopyAttributeOutCSV(CopyToState cstate, const char *string,
-					bool use_quote)
+					bool use_quote, bool use_simd, size_t len)
 {
 	const char *ptr;
 	const char *start;
@@ -1586,7 +1816,15 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 		use_quote = true;
 
 	if (cstate->need_transcoding)
-		ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+	{
+		/*
+		 * len may already be set by the caller for long varlenas, avoiding an extra
+		 * strlen() call.  For all other cases it is 0 and we compute it here.
+		 */
+		if (len == 0)
+			len = strlen(string);
+		ptr = pg_server_to_any(string, len, cstate->file_encoding);
+	}
 	else
 		ptr = string;
 
@@ -1608,6 +1846,9 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 		{
 			const char *tptr = ptr;
 
+			if (use_simd)
+				CopyCheckCSVQuoteNeedSIMD(&tptr, len, delimc, quotec);
+
 			while ((c = *tptr) != '\0')
 			{
 				if (c == delimc || c == quotec || c == '\n' || c == '\r')
@@ -1631,6 +1872,9 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 		 * We adopt the same optimization strategy as in CopyAttributeOutText
 		 */
 		start = ptr;
+		if (use_simd)
+			CopySkipCSVEscapeSIMD(&ptr, len, escapec, quotec);
+
 		while ((c = *ptr) != '\0')
 		{
 			if (c == quotec || c == escapec)
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 13+ messages in thread

* Re: Speed up COPY TO text/CSV parsing using SIMD
@ 2026-03-31 16:30  Nathan Bossart <[email protected]>
  parent: KAZAR Ayoub <[email protected]>
  0 siblings, 1 reply; 13+ messages in thread

From: Nathan Bossart @ 2026-03-31 16:30 UTC (permalink / raw)
  To: KAZAR Ayoub <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; Mark Wong <[email protected]>; Nazir Bilal Yavuz <[email protected]>

On Fri, Mar 27, 2026 at 07:48:38PM +0100, KAZAR Ayoub wrote:
> I added a prescan loop inside the simd helpers trying to catch special
> chars in sizeof(Vector8) characters, i measured how good is this at
> reducing the overhead of starting simd and exiting at first vector:
> the scalar loop is better than SIMD for one vector if it finds a special
> character before 6th character, worst case is not a clean vector, where the
> scalar loop needs 20 more cycles compared to SIMD.
> This helps mitigate the case of JSON(B) in CSV format, this is why I only
> added this for CSV case only.

Interesting.
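
For readers following along, the prescan idea quoted above can be sketched roughly as follows. This is a standalone illustration under stated assumptions, not the patch's code: CHUNK stands in for sizeof(Vector8), chunk_has_special() is a plain loop standing in for the vector comparison, and csv_needs_quote() is a hypothetical name. It covers only the character scan, not any other conditions under which a CSV field is quoted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 16				/* assumed stand-in for sizeof(Vector8) */

/* True for characters that force a CSV field to be quoted. */
static bool
is_csv_special(char c, char delim, char quote)
{
	return c == delim || c == quote || c == '\n' || c == '\r';
}

/* Stand-in for a vector compare over CHUNK bytes starting at p. */
static bool
chunk_has_special(const char *p, char delim, char quote)
{
	bool		hit = false;

	for (size_t i = 0; i < CHUNK; i++)
		hit |= is_csv_special(p[i], delim, quote);
	return hit;
}

/* Hypothetical helper: does the field contain a character needing quotes? */
static bool
csv_needs_quote(const char *s, size_t len, char delim, char quote)
{
	size_t		i = 0;
	size_t		prescan_end = len < CHUNK ? len : CHUNK;

	/* Scalar prescan: cheap exit when a special character appears early. */
	for (; i < prescan_end; i++)
		if (is_csv_special(s[i], delim, quote))
			return true;

	/* Chunked scan over the remaining full chunks. */
	for (; i + CHUNK <= len; i += CHUNK)
		if (chunk_has_special(s + i, delim, quote))
			return true;

	/* Scalar tail for the bytes that do not fill a whole chunk. */
	for (; i < len; i++)
		if (is_csv_special(s[i], delim, quote))
			return true;
	return false;
}
```

With JSON-like input such as {"a": 1}, the quote character at offset 1 is caught by the prescan, so the chunked loop is never entered; that is the early-exit case the prescan is meant to make cheap.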

> In a benchmark with 10M early SIMD exit like the JSONB case, the previous
> 3% regression is gone.

While these are nice results, I think it's best that we target v20 for this
patch so that we have more time to benchmark and explore edge cases.

-- 
nathan





^ permalink  raw  reply  [nested|flat] 13+ messages in thread

* Re: Speed up COPY TO text/CSV parsing using SIMD
@ 2026-04-02 18:07  KAZAR Ayoub <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 0 replies; 13+ messages in thread

From: KAZAR Ayoub @ 2026-04-02 18:07 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers; Neil Conway <[email protected]>; Manni Wood <[email protected]>; Andrew Dunstan <[email protected]>; Shinya Kato <[email protected]>; Mark Wong <[email protected]>; Nazir Bilal Yavuz <[email protected]>

On Tue, Mar 31, 2026 at 6:30 PM Nathan Bossart <[email protected]>
wrote:

> On Fri, Mar 27, 2026 at 07:48:38PM +0100, KAZAR Ayoub wrote:
> > I added a prescan loop inside the simd helpers trying to catch special
> > chars in sizeof(Vector8) characters, i measured how good is this at
> > reducing the overhead of starting simd and exiting at first vector:
> > the scalar loop is better than SIMD for one vector if it finds a special
> > character before 6th character, worst case is not a clean vector, where the
> > scalar loop needs 20 more cycles compared to SIMD.
> > This helps mitigate the case of JSON(B) in CSV format, this is why I only
> > added this for CSV case only.
>
> Interesting.
>
> > In a benchmark with 10M early SIMD exit like the JSONB case, the previous
> > 3% regression is gone.
>
> While these are nice results, I think it's best that we target v20 for this
> patch so that we have more time to benchmark and explore edge cases.
>
Thanks for the review.
Fair enough, I'll try many more cases in the upcoming weeks to make sure
we're not missing anything.

>
> --
> nathan

Regards,
Ayoub


^ permalink  raw  reply  [nested|flat] 13+ messages in thread

end of thread, other threads:[~2026-04-02 18:07 UTC | newest]

Thread overview: 13+ messages
2026-02-12 21:25 Re: Speed up COPY TO text/CSV parsing using SIMD Andres Freund <[email protected]>
2026-02-14 15:02 ` KAZAR Ayoub <[email protected]>
2026-03-10 19:16   ` Nathan Bossart <[email protected]>
2026-03-14 22:43     ` KAZAR Ayoub <[email protected]>
2026-03-17 18:49       ` Nathan Bossart <[email protected]>
2026-03-17 23:02         ` KAZAR Ayoub <[email protected]>
2026-03-18 02:29           ` KAZAR Ayoub <[email protected]>
2026-03-24 00:16             ` KAZAR Ayoub <[email protected]>
2026-03-26 21:23             ` Nathan Bossart <[email protected]>
2026-03-27 18:48               ` KAZAR Ayoub <[email protected]>
2026-03-31 16:30                 ` Nathan Bossart <[email protected]>
2026-04-02 18:07                   ` KAZAR Ayoub <[email protected]>
2026-03-26 21:09           ` Nathan Bossart <[email protected]>
