From: Bilal Yavuz <[email protected]>
To: Manni Wood <[email protected]>
Cc: KAZAR Ayoub <[email protected]>
Cc: Nathan Bossart <[email protected]>
Cc: Andrew Dunstan <[email protected]>
Cc: Shinya Kato <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: Tue, 9 Dec 2025 16:40:19 +0300
Message-ID: <CAN55FZ1fwKgGo2wEie1w2M2jzJko6cMi1NWD05Xm47_L9a3D+g@mail.gmail.com> (raw)
In-Reply-To: <CAN55FZ0Nd9FL=aDSjOTJTeFAn8VNrZgWG+WbcHR+R7GkDMvUyw@mail.gmail.com>
References: <aPkvi5P7kpA8oQKc@nathan>
<[email protected]>
<CAKWEB6qdyhN3EoUNAK23etXX-kXH-_79NNbTsKqtF1g1WkuaBQ@mail.gmail.com>
<CA+K2RumMC+avYGSX-AWNeod3w+XOGHrVPz8HiqkvJj7AZ5tZXA@mail.gmail.com>
<CAKWEB6pev=pNVi4qDYWS50N=YFrKRbjH1h=5F1bXpnK7WR5CYg@mail.gmail.com>
<aRue0D4QQkUf2B_N@nathan>
<CAOzEurTHCGL-Txqf5rxMsPgTF=dTCOsr=uhJdXebqjEJy-0L7g@mail.gmail.com>
<CAN55FZ0+JZvKYVCnJqLhHaWF9eBGmTaF1BCEpttxw1aT3G_+Qw@mail.gmail.com>
<[email protected]>
<CAN55FZ1XF=R7F7B__gq04rp2nQnJqs1yfExEXo4riWc68+Pe0w@mail.gmail.com>
<aR4wDwNdLc5TmcQq@nathan>
<CA+K2Rump8NoMRZRZ2r4jHXUJwByasy_c3_b0oaO+TLkSbMD-jw@mail.gmail.com>
<CAKWEB6rLxPVtN4ffZ3CMTL518zhk_BWzzBt6ZE2oUSaErdphxA@mail.gmail.com>
<CAKWEB6oO4gQd+UJBrU=uuUTE8Hv7GMznjMouvn0Lskr52UqjhQ@mail.gmail.com>
<CAN55FZ0Nd9FL=aDSjOTJTeFAn8VNrZgWG+WbcHR+R7GkDMvUyw@mail.gmail.com>
Hi,
On Sat, 6 Dec 2025 at 10:55, Bilal Yavuz <[email protected]> wrote:
>
> Hi,
>
> On Sat, 6 Dec 2025 at 04:40, Manni Wood <[email protected]> wrote:
> > Hello, all.
> >
> > Andrew, I tried your suggestion of just reading the first chunk of the copy file to determine if SIMD is worth using. Attached are v4 versions of the patches showing a first attempt at doing that.
>
> Thank you for doing this!
>
> > I attached test.sh.txt to show how I've been testing, with 5 million lines of the various copy file variations introduced by Ayub Kazar.
> >
> > The text copy with no special chars is 30% faster. The CSV copy with no special chars is 48% faster. The text with 1/3rd escapes is 3% slower. The CSV with 1/3rd quotes is 0.27% slower.
> >
> > This set of patches follows the simplest suggestion of just testing the first N lines (actually first N bytes) of the file and then deciding whether or not to enable SIMD. This set of patches does not follow Andrew's later suggestion of maybe checking again every million lines or so.
>
> My input-generation script is not ready to share yet, but the inputs
> follow this format: text_${n}.input, where n represents the number of
> normal characters before the delimiter. For example:
>
> n = 0 -> "\n\n\n\n\n..." (no normal characters)
> n = 1 -> "a\n..." (1 normal character before the delimiter)
> ...
> n = 5 -> "aaaaa\n..."
> … continuing up to n = 32.
>
> Each line has 4096 chars and there are a total of 100000 lines in each
> input file.
>
> I only benchmarked the text format. I compared the latest heuristic I
> shared [1] with the current method. The benchmarks show roughly a ~16%
> regression at the worst case (n = 2), with regressions up to n = 5.
> For the remaining values, performance was similar.
I tried to improve the v4 patchset. My changes are:
1 - I made CopyReadLineText() an always-inline function and pass
use_simd to it as an argument, so that the compiler can specialize
the function for each constant value.
2 - The main for loop in CopyReadLineText() runs many times, so I
moved the use_simd check out of it and into CopyReadLine().
3 - Instead of 'bytes_processed', I used a new 'chars_processed'
counter, because cstate->bytes_processed is increased before the
bytes are actually processed, which can give wrong results.
4 - Because of #2 and #3, instead of 'SPECIAL_CHAR_SIMD_THRESHOLD', I
used the ratio 'chars_processed / special_chars_encountered' to
decide whether to use SIMD.
5 - cstate->special_chars_encountered was incremented incorrectly in
the CSV case: it was not incremented for the quote and escape
characters. I moved all increments of
cstate->special_chars_encountered to one central place and tried to
optimize it, but it still causes a regression because it adds one
more branch.
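To make #3 to #5 concrete, here is a minimal standalone sketch of the
sampling heuristic for the text format. The two constants mirror the
values in 0003; SampleState, is_special_text(), sample_chars() and
maybe_enable_simd() are hypothetical names used only for illustration.
In the patch itself the counters live in CopyFromStateData and the
counting happens inside the CopyReadLineText() loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as in 0003; everything else here is illustrative. */
#define CHARS_PROCESSED_UNTIL_SIMD_CHECK 100000
#define SPECIAL_CHAR_SIMD_RATIO 4

/* Hypothetical stand-in for the fields added to CopyFromStateData. */
typedef struct SampleState
{
	uint64_t	chars_processed;
	uint64_t	special_chars_encountered;
	bool		checked_simd;
	bool		use_simd;
} SampleState;

/* Text-format special characters, as counted in 0003. */
static bool
is_special_text(char c)
{
	return c == '\r' || c == '\n' || c == '\\';
}

/* Count every character, and separately the special ones. */
static void
sample_chars(SampleState *st, const char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
	{
		st->chars_processed++;
		if (is_special_text(buf[i]))
			st->special_chars_encountered++;
	}
}

/* Decide once, after enough characters have been sampled. */
static void
maybe_enable_simd(SampleState *st)
{
	if (st->checked_simd ||
		st->chars_processed <= CHARS_PROCESSED_UNTIL_SIMD_CHECK)
		return;

	st->checked_simd = true;

	/* Enable SIMD when at most 1 in SPECIAL_CHAR_SIMD_RATIO chars is special. */
	if (st->chars_processed / SPECIAL_CHAR_SIMD_RATIO >=
		st->special_chars_encountered)
		st->use_simd = true;
}
```

With these values, an input that has one newline per 1000 characters
enables SIMD after the first ~100000 characters, while an input made
only of newlines never does.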
With these changes, I was able to reduce the regression from 16% to
10%. The regression drops to 7% if I modify #5 to handle only text
input, but I did not do that.
My changes are in 0003.
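For readers unfamiliar with the trick in #1 and #2, this is a small
standalone sketch of it, not code from the patch. ALWAYS_INLINE is a
local stand-in for pg_attribute_always_inline, and scan_line() and
scan_line_dispatch() are hypothetical names. The 8-byte memchr() block
skip only stands in for the SIMD fast path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in for pg_attribute_always_inline (assumption). */
#if defined(__GNUC__) || defined(__clang__)
#define ALWAYS_INLINE __attribute__((always_inline)) inline
#else
#define ALWAYS_INLINE inline
#endif

/* Return the offset of the first '\n' in buf, or len if there is none. */
static ALWAYS_INLINE size_t
scan_line(const char *buf, size_t len, bool use_fast)
{
	size_t		i = 0;

	while (i < len)
	{
		/*
		 * When use_fast is a compile-time constant, the compiler can drop
		 * this branch from the "false" copy of the function.
		 */
		if (use_fast && len - i >= 8 && memchr(buf + i, '\n', 8) == NULL)
		{
			i += 8;				/* no newline in this block, skip it */
			continue;
		}

		if (buf[i] == '\n')
			return i;
		i++;
	}
	return len;
}

/* The runtime flag is tested once here, not on every loop iteration. */
static size_t
scan_line_dispatch(const char *buf, size_t len, bool use_fast)
{
	if (use_fast)
		return scan_line(buf, len, true);
	else
		return scan_line(buf, len, false);
}
```

Both branches of the dispatcher look identical, but each call site
passes a literal, so the inlined body can be specialized separately
for the fast and the scalar case.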
--
Regards,
Nazir Bilal Yavuz
Microsoft
Attachments:
[text/x-patch] v4.1-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (3.5K, 2-v4.1-0001-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
From a1b4d28069786c3fb506c79e096312fcfd585fdb Mon Sep 17 00:00:00 2001
From: Shinya Kato <[email protected]>
Date: Mon, 28 Jul 2025 22:08:20 +0900
Subject: [PATCH v4.1 1/3] Speed up COPY FROM text/CSV parsing using SIMD
---
src/backend/commands/copyfromparse.c | 76 ++++++++++++++++++++++++++++
1 file changed, 76 insertions(+)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 62afcd8fad1..cf110767542 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -71,7 +71,9 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "pgstat.h"
+#include "port/pg_bitutils.h"
#include "port/pg_bswap.h"
+#include "port/simd.h"
#include "utils/builtins.h"
#include "utils/rel.h"
@@ -1255,6 +1257,14 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
char quotec = '\0';
char escapec = '\0';
+#ifndef USE_NO_SIMD
+ Vector8 nl = vector8_broadcast('\n');
+ Vector8 cr = vector8_broadcast('\r');
+ Vector8 bs = vector8_broadcast('\\');
+ Vector8 quote = vector8_broadcast(0);
+ Vector8 escape = vector8_broadcast(0);
+#endif
+
if (is_csv)
{
quotec = cstate->opts.quote[0];
@@ -1262,6 +1272,12 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
/* ignore special escape processing if it's the same as quotec */
if (quotec == escapec)
escapec = '\0';
+
+#ifndef USE_NO_SIMD
+ quote = vector8_broadcast(quotec);
+ if (quotec != escapec)
+ escape = vector8_broadcast(escapec);
+#endif
}
/*
@@ -1328,6 +1344,66 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
need_data = false;
}
+#ifndef USE_NO_SIMD
+
+ /*
+ * Use SIMD instructions to efficiently scan the input buffer for
+ * special characters (e.g., newline, carriage return, quote, and
+ * escape). This is faster than byte-by-byte iteration, especially on
+ * large buffers.
+ *
+ * We do not apply the SIMD fast path in either of the following
+ * cases: - When the previously processed character was an escape
+ * character (last_was_esc), since the next byte must be examined
+ * sequentially. - The remaining buffer is smaller than one vector
+ * width (sizeof(Vector8)); SIMD operates on fixed-size chunks.
+ */
+ if (!last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+ {
+ Vector8 chunk;
+ Vector8 match = vector8_broadcast(0);
+ uint32 mask;
+
+ /* Load a chunk of data into a vector register */
+ vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+ if (is_csv)
+ {
+ /* \n and \r are not special inside quotes */
+ if (!in_quote)
+ match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+
+ match = vector8_or(match, vector8_eq(chunk, quote));
+ if (escapec != '\0')
+ match = vector8_or(match, vector8_eq(chunk, escape));
+ }
+ else
+ {
+ match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+ match = vector8_or(match, vector8_eq(chunk, bs));
+ }
+
+ /* Check if we found any special characters */
+ mask = vector8_highbit_mask(match);
+ if (mask != 0)
+ {
+ /*
+ * Found a special character. Advance up to that point and let
+ * the scalar code handle it.
+ */
+ int advance = pg_rightmost_one_pos32(mask);
+
+ input_buf_ptr += advance;
+ }
+ else
+ {
+ /* No special characters found, so skip the entire chunk */
+ input_buf_ptr += sizeof(Vector8);
+ continue;
+ }
+ }
+#endif
+
/* OK to fetch a character */
prev_raw_ptr = input_buf_ptr;
c = copy_input_buf[input_buf_ptr++];
--
2.51.0
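The mask handling in the fast path of 0001 can be illustrated without
any SIMD intrinsics. The sketch below is a portable emulation for the
text format, not the patch's code: CHUNK assumes a 16-byte Vector8,
and special_mask(), rightmost_one_pos() and skip_plain_prefix() are
hypothetical helpers standing in for vector8_eq()/vector8_or()/
vector8_highbit_mask() and pg_rightmost_one_pos32().

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK 16				/* assumed sizeof(Vector8) */

/* Bit i is set when chunk[i] is special in text mode. */
static uint32_t
special_mask(const char *chunk)
{
	uint32_t	mask = 0;

	for (int i = 0; i < CHUNK; i++)
	{
		char		c = chunk[i];

		if (c == '\n' || c == '\r' || c == '\\')
			mask |= (uint32_t) 1 << i;
	}
	return mask;
}

/* Index of the lowest set bit; mask must be nonzero. */
static int
rightmost_one_pos(uint32_t mask)
{
	int			pos = 0;

	while ((mask & 1) == 0)
	{
		mask >>= 1;
		pos++;
	}
	return pos;
}

/*
 * Return the offset at which byte-by-byte parsing has to take over:
 * either the first special character found in a full chunk, or the
 * point where fewer than CHUNK bytes remain.
 */
static size_t
skip_plain_prefix(const char *buf, size_t len)
{
	size_t		pos = 0;

	while (len - pos >= CHUNK)
	{
		uint32_t	mask = special_mask(buf + pos);

		if (mask != 0)
			return pos + rightmost_one_pos(mask);	/* stop at the special char */
		pos += CHUNK;			/* whole chunk is plain, skip it */
	}
	return pos;
}
```

The lowest set bit corresponds to the first special byte in the
chunk, which is why the patch advances by the rightmost one position
and then lets the scalar code handle that byte.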
[text/x-patch] v4.1-0002-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch (5.0K, 3-v4.1-0002-Speed-up-COPY-FROM-text-CSV-parsing-using-SIMD.patch)
From 3a2f9ff26755a5248b7a33770f4603fec483d3bc Mon Sep 17 00:00:00 2001
From: Manni Wood <[email protected]>
Date: Fri, 5 Dec 2025 18:33:46 -0600
Subject: [PATCH v4.1 2/3] Speed up COPY FROM text/CSV parsing using SIMD
Authors: Shinya Kato <[email protected]>,
Nazir Bilal Yavuz <[email protected]>,
Ayoub Kazar <[email protected]>
Reviewers: Andrew Dunstan <[email protected]>
Discussion:
https://www.postgresql.org/message-id/flat/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig@mail.gmail.com
---
src/include/commands/copyfrom_internal.h | 11 +++++++++
src/backend/commands/copyfrom.c | 3 +++
src/backend/commands/copyfromparse.c | 29 +++++++++++++++++++++++-
3 files changed, 42 insertions(+), 1 deletion(-)
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..215215f909f 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -181,6 +181,17 @@ typedef struct CopyFromStateData
#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
uint64 bytes_processed; /* number of bytes processed so far */
+
+ /* the amount of bytes to read until checking if we should try simd */
+#define BYTES_PROCESSED_UNTIL_SIMD_CHECK 100000
+ /* the number of special chars read below which we use simd */
+#define SPECIAL_CHAR_SIMD_THRESHOLD 20000
+ uint64 special_chars_encountered; /* number of special chars
+ * encountered so far */
+ bool checked_simd; /* we read BYTES_PROCESSED_UNTIL_SIMD_CHECK
+ * and checked if we should use SIMD on the
+ * rest of the file */
+ bool use_simd; /* use simd to speed up copying */
} CopyFromStateData;
extern void ReceiveCopyBegin(CopyFromState cstate);
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 12781963b4f..e638623e5b5 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1720,6 +1720,9 @@ BeginCopyFrom(ParseState *pstate,
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
cstate->relname_only = false;
+ cstate->special_chars_encountered = 0;
+ cstate->checked_simd = false;
+ cstate->use_simd = false;
/*
* Allocate buffers for the input pipeline.
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index cf110767542..549b56c21fb 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1346,6 +1346,28 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
#ifndef USE_NO_SIMD
+ /*
+ * Wait until we have read more than BYTES_PROCESSED_UNTIL_SIMD_CHECK.
+ * cstate->bytes_processed will grow an unpredictable amount with each
+ * call to this function, so just wait until we have crossed the
+ * threshold.
+ */
+ if (!cstate->checked_simd && cstate->bytes_processed > BYTES_PROCESSED_UNTIL_SIMD_CHECK)
+ {
+ cstate->checked_simd = true;
+
+ /*
+ * If we have not read too many special characters
+ * (SPECIAL_CHAR_SIMD_THRESHOLD) then start using SIMD to speed up
+ * processing. This heuristic assumes that input does not vary too
+ * much from line to line and that number of special characters
+ * encountered in the first BYTES_PROCESSED_UNTIL_SIMD_CHECK are
+ * indicitive of the whole file.
+ */
+ if (cstate->special_chars_encountered < SPECIAL_CHAR_SIMD_THRESHOLD)
+ cstate->use_simd = true;
+ }
+
/*
* Use SIMD instructions to efficiently scan the input buffer for
* special characters (e.g., newline, carriage return, quote, and
@@ -1358,7 +1380,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
* sequentially. - The remaining buffer is smaller than one vector
* width (sizeof(Vector8)); SIMD operates on fixed-size chunks.
*/
- if (!last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+ if (cstate->use_simd && !last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
{
Vector8 chunk;
Vector8 match = vector8_broadcast(0);
@@ -1418,6 +1440,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
*/
if (c == '\r')
{
+ cstate->special_chars_encountered++;
IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
}
@@ -1449,6 +1472,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
/* Process \r */
if (c == '\r' && (!is_csv || !in_quote))
{
+ cstate->special_chars_encountered++;
/* Check for \r\n on first line, _and_ handle \r\n. */
if (cstate->eol_type == EOL_UNKNOWN ||
cstate->eol_type == EOL_CRNL)
@@ -1505,6 +1529,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
/* Process \n */
if (c == '\n' && (!is_csv || !in_quote))
{
+ cstate->special_chars_encountered++;
if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
@@ -1527,6 +1552,8 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
{
char c2;
+ cstate->special_chars_encountered++;
+
IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
IF_NEED_REFILL_AND_EOF_BREAK(0);
--
2.51.0
[text/x-patch] v4.1-0003-Feedback-Changes.patch (7.9K, 4-v4.1-0003-Feedback-Changes.patch)
From 8d0e6766175abac15b39884126c29da03657be40 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Tue, 9 Dec 2025 15:32:10 +0300
Subject: [PATCH v4.1 3/3] Feedback / Changes
---
src/include/commands/copyfrom_internal.h | 9 +--
src/backend/commands/copyfrom.c | 1 +
src/backend/commands/copyfromparse.c | 88 +++++++++++++++---------
3 files changed, 60 insertions(+), 38 deletions(-)
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 215215f909f..397720bf875 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -183,12 +183,13 @@ typedef struct CopyFromStateData
uint64 bytes_processed; /* number of bytes processed so far */
/* the amount of bytes to read until checking if we should try simd */
-#define BYTES_PROCESSED_UNTIL_SIMD_CHECK 100000
- /* the number of special chars read below which we use simd */
-#define SPECIAL_CHAR_SIMD_THRESHOLD 20000
+#define CHARS_PROCESSED_UNTIL_SIMD_CHECK 100000
+ /* the ratio of special chars read below which we use simd */
+#define SPECIAL_CHAR_SIMD_RATIO 4
+ uint64 chars_processed;
uint64 special_chars_encountered; /* number of special chars
* encountered so far */
- bool checked_simd; /* we read BYTES_PROCESSED_UNTIL_SIMD_CHECK
+ bool checked_simd; /* we read CHARS_PROCESSED_UNTIL_SIMD_CHECK
* and checked if we should use SIMD on the
* rest of the file */
bool use_simd; /* use simd to speed up copying */
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index e638623e5b5..d44dd16eced 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1720,6 +1720,7 @@ BeginCopyFrom(ParseState *pstate,
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
cstate->relname_only = false;
+ cstate->chars_processed = 0;
cstate->special_chars_encountered = 0;
cstate->checked_simd = false;
cstate->use_simd = false;
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 549b56c21fb..86a268d0df9 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -143,7 +143,7 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
/* non-export function prototypes */
static bool CopyReadLine(CopyFromState cstate, bool is_csv);
-static bool CopyReadLineText(CopyFromState cstate, bool is_csv);
+static pg_attribute_always_inline bool CopyReadLineText(CopyFromState cstate, bool is_csv, bool use_simd);
static int CopyReadAttributesText(CopyFromState cstate);
static int CopyReadAttributesCSV(CopyFromState cstate);
static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -1173,8 +1173,40 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
resetStringInfo(&cstate->line_buf);
cstate->line_buf_valid = false;
- /* Parse data and transfer into line_buf */
- result = CopyReadLineText(cstate, is_csv);
+#ifndef USE_NO_SIMD
+
+ /*
+ * Wait until we have read more than CHARS_PROCESSED_UNTIL_SIMD_CHECK.
+ * cstate->chars_processed will grow an unpredictable amount with each
+ * call to this function, so just wait until we have crossed the
+ * threshold.
+ */
+ if (!cstate->checked_simd && cstate->chars_processed > CHARS_PROCESSED_UNTIL_SIMD_CHECK)
+ {
+ cstate->checked_simd = true;
+
+ /*
+ * If we have not read too many special characters then start using
+ * SIMD to speed up processing. This heuristic assumes that input does
+ * not vary too much from line to line and that number of special
+ * characters encountered in the first
+ * CHARS_PROCESSED_UNTIL_SIMD_CHECK are indicative of the whole file.
+ */
+ if (cstate->chars_processed / SPECIAL_CHAR_SIMD_RATIO >= cstate->special_chars_encountered)
+ {
+ cstate->use_simd = true;
+ }
+ }
+#endif
+
+ /*
+ * Parse data and transfer into line_buf. To get benefit from inlining,
+ * call CopyReadLineText() with the constant boolean variables.
+ */
+ if (cstate->use_simd)
+ result = CopyReadLineText(cstate, is_csv, true);
+ else
+ result = CopyReadLineText(cstate, is_csv, false);
if (result)
{
@@ -1241,8 +1273,8 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
/*
* CopyReadLineText - inner loop of CopyReadLine for text mode
*/
-static bool
-CopyReadLineText(CopyFromState cstate, bool is_csv)
+static pg_attribute_always_inline bool
+CopyReadLineText(CopyFromState cstate, bool is_csv, bool use_simd)
{
char *copy_input_buf;
int input_buf_ptr;
@@ -1309,7 +1341,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
input_buf_ptr = cstate->input_buf_index;
copy_buf_len = cstate->input_buf_len;
- for (;;)
+ for (;; cstate->chars_processed++)
{
int prev_raw_ptr;
char c;
@@ -1346,28 +1378,6 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
#ifndef USE_NO_SIMD
- /*
- * Wait until we have read more than BYTES_PROCESSED_UNTIL_SIMD_CHECK.
- * cstate->bytes_processed will grow an unpredictable amount with each
- * call to this function, so just wait until we have crossed the
- * threshold.
- */
- if (!cstate->checked_simd && cstate->bytes_processed > BYTES_PROCESSED_UNTIL_SIMD_CHECK)
- {
- cstate->checked_simd = true;
-
- /*
- * If we have not read too many special characters
- * (SPECIAL_CHAR_SIMD_THRESHOLD) then start using SIMD to speed up
- * processing. This heuristic assumes that input does not vary too
- * much from line to line and that number of special characters
- * encountered in the first BYTES_PROCESSED_UNTIL_SIMD_CHECK are
- * indicitive of the whole file.
- */
- if (cstate->special_chars_encountered < SPECIAL_CHAR_SIMD_THRESHOLD)
- cstate->use_simd = true;
- }
-
/*
* Use SIMD instructions to efficiently scan the input buffer for
* special characters (e.g., newline, carriage return, quote, and
@@ -1380,7 +1390,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
* sequentially. - The remaining buffer is smaller than one vector
* width (sizeof(Vector8)); SIMD operates on fixed-size chunks.
*/
- if (cstate->use_simd && !last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+ if (use_simd && !last_was_esc && copy_buf_len - input_buf_ptr >= sizeof(Vector8))
{
Vector8 chunk;
Vector8 match = vector8_broadcast(0);
@@ -1430,6 +1440,21 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
prev_raw_ptr = input_buf_ptr;
c = copy_input_buf[input_buf_ptr++];
+ /* Use this count to decide whether to use SIMD later */
+ if (!use_simd && unlikely(!cstate->checked_simd))
+ {
+ if (is_csv)
+ {
+ if (c == '\r' || c == '\n' || c == quotec || c == escapec)
+ cstate->special_chars_encountered++;
+ }
+ else
+ {
+ if (c == '\r' || c == '\n' || c == '\\')
+ cstate->special_chars_encountered++;
+ }
+ }
+
if (is_csv)
{
/*
@@ -1440,7 +1465,6 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
*/
if (c == '\r')
{
- cstate->special_chars_encountered++;
IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
}
@@ -1472,7 +1496,6 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
/* Process \r */
if (c == '\r' && (!is_csv || !in_quote))
{
- cstate->special_chars_encountered++;
/* Check for \r\n on first line, _and_ handle \r\n. */
if (cstate->eol_type == EOL_UNKNOWN ||
cstate->eol_type == EOL_CRNL)
@@ -1529,7 +1552,6 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
/* Process \n */
if (c == '\n' && (!is_csv || !in_quote))
{
- cstate->special_chars_encountered++;
if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
@@ -1552,8 +1574,6 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
{
char c2;
- cstate->special_chars_encountered++;
-
IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
IF_NEED_REFILL_AND_EOF_BREAK(0);
--
2.51.0