public inbox for [email protected]
help / color / mirror / Atom feedFrom: Kirill Reshke <[email protected]>
To: Fujii Masao <[email protected]>
Cc: jian he <[email protected]>
Cc: Jim Jones <[email protected]>
Cc: David G. Johnston <[email protected]>
Cc: Yugo NAGATA <[email protected]>
Cc: torikoshia <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
Date: Sat, 26 Oct 2024 02:03:32 +0500
Message-ID: <CALdSSPi1JE9xc31W6DPAdk-bQHeo3HNAYB-10Biruu-w4GJN0Q@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com>
<[email protected]>
<CACJufxFFU92-H7G8tmpxt9oTSfL062OA7n5rPx-YbOAtDUUzGw@mail.gmail.com>
<[email protected]>
<CACJufxGnc+=No=Ua6NFT2ADt0vRL=m1QsuCOM=9aKPKWh9_L6Q@mail.gmail.com>
<[email protected]>
<CACJufxFT9j8o5kEC8dPCQqLomWjeJm9V9m8eZjj2Gvc_F5ha=g@mail.gmail.com>
<[email protected]>
<CAKFQuwaYNw8U-9JkFdyOX4i4Y3J1sp6+dk-sh8YmZGCq8gMeVQ@mail.gmail.com>
<[email protected]>
<CACJufxFFdtPKk4B5rSVNEk6yCH2Amvi_8w3Gaz5wg9M_t9c5Rw@mail.gmail.com>
<[email protected]>
<CACJufxEgiysa2SMJPGp0aN476Ojm636MfJK88DZC7TVYsXYBBQ@mail.gmail.com>
<CALdSSPhgjCbyb=ZRgr4LaCFJV2-F9_CxMeX6poHuGCt_f9GYAw@mail.gmail.com>
<[email protected]>
Hi!
On Mon, 21 Oct 2024 at 17:39, Fujii Masao <[email protected]> wrote:
>
>
>
> On 2024/10/21 18:30, Kirill Reshke wrote:
> > v4 no longer applies. It now conflicts with
> > e7834a1a251d4a28245377f383ff20a657ba8262.
> > Also, there were review comments.
> >
> > So, I decided to rebase.
>
> Thanks for the patch! Here are my review comments:
>
> I noticed that on_error=set_to_null does not trigger NOTICE messages for rows
> and columns with errors. It's "unexpected" thing for columns to be silently
> replaced with NULL due to on_error=set_to_null. So, similar to on_error=ignore,
> there should be NOTICE messages indicating which input records had columns
> set to NULL because of data type incompatibility. Without these messages,
> users might not realize that some columns were set to NULL.
Nice catch. That is a feature introduced by
f5a227895e178bf528b18f82bbe554435fb3e64f.
>
> How should on_error=set_to_null behave when reject_limit is set? It seems
> intuitive to trigger an error if the number of rows with columns' data type
> issues exceeds reject_limit, similar to on_error=ignore. This is open to discussion.
Ok, let's discuss. My first suggestion was:
when the REJECT LIMIT is set to some non-zero number and the number of
row NULL replacements exceeds the limit, is it OK to fail. Because
there WAS errors, and we should not tolerate more than $limit errors .
I do find this behavior to be consistent.
But what if we don't set a REJECT LIMIT, it is sane to do all
replacements, as if REJECT LIMIT is inf. But our REJECT LIMIT is zero
(not set).
So, we ignore zero REJECT LIMIT if set_to_null is set.
But while I was trying to implement that, I realized that I don't
understand v4 of this patch. My misunderstanding is about
`t_on_error_null` tests. We are allowed to insert a NULL value for the
first column of t_on_error_null using COPY ON_ERROR SET_TO_NULL. Why
do we do that? My thought is we should try to execute
InputFunctionCallSafe with NULL value (i mean, here [1]) for the
column after we failed to insert the input value. And, if this second
call is successful, we do replacement, otherwise we count the row as
erroneous.
>
> psql's tab-completion should be updated to include SET_TO_NULL.
Agreed.
>
>
> An <replaceable class="parameter">error_action</replaceable> value of
> <literal>stop</literal> means fail the command, while
> <literal>ignore</literal> means discard the input row and continue with the next one.
> + <literal>set_to_null</literal> means the input value will be set to <literal>null</literal> and continue with the next one.
>
> How about merging these two descriptions to one and updating it to the following?
>
> -------------------
> An error_action value of stop means fail the command, ignore means discard the input
> row and continue with the next one, and set_to_null means replace columns with invalid
> input values with NULL and move to the next row.
> -------------------
>
Hm, good catch. Applied almost as you suggested. I did tweak this
"replace columns with invalid input values with " into "replace
columns containing erroneous input values with". Is that OK?
> The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
>
> This should be "... ignore and set_to_null options are ..."?
Sure!
> Regards,
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION
>
Here I sent v6 where review comments, except set_to_null vs reject
limit, were addressed. No new tests added yet, as some details are
unclear...
[1] https://github.com/postgresql-cfbot/postgresql/compare/cf/4810~1...cf/4810#diff-98d8bfd706468f77f8b0...
--
Best regards,
Kirill Reshke
Attachments:
[application/octet-stream] v6-0001-on_error-set_to_null.patch (13.4K, 2-v6-0001-on_error-set_to_null.patch)
download | inline diff:
From 85a62beb248cf053dd0e05b01df7572dff7766db Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Thu, 12 Sep 2024 17:07:02 +0800
Subject: [PATCH v6] on_error set_to_null
extent "on_error action", introduce new option: on_error set_to_null.
Due to current grammar, we cannot use "on_error null",
so I choose on_error set_to_null.
any data type conversion errors while the COPY FROM process will set that column value to be NULL.
this will only work with COPY FROM and non-binary format.
However this will respect the not-null constraint, meaning, if you actually converted error to null,
but the column has not-null constraint, not-null constraint violation ERROR will be raised.
discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
doc/src/sgml/ref/copy.sgml | 7 ++--
src/backend/commands/copy.c | 12 ++++---
src/backend/commands/copyfrom.c | 9 ++---
src/backend/commands/copyfromparse.c | 11 ++++++
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy2.out | 50 ++++++++++++++++++++++++++++
src/test/regress/sql/copy2.sql | 44 ++++++++++++++++++++++++
8 files changed, 123 insertions(+), 13 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f096..dcbfa17a3ce 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,12 +394,13 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Specifies how to behave when encountering an error converting a column's
input value into its data type.
An <replaceable class="parameter">error_action</replaceable> value of
- <literal>stop</literal> means fail the command, while
- <literal>ignore</literal> means discard the input row and continue with the next one.
+ <literal>stop</literal> means fail the command,
+ <literal>ignore</literal> means discard the input row and continue with the next one, and
+ <literal>set_to_null</literal> means replace columns containing erroneous input values with <literal>null</literal> and move to the next row.
The default is <literal>stop</literal>.
</para>
<para>
- The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+ The <literal>ignore</literal> and <literal>set_to_null</literal> options are applicable only for <command>COPY FROM</command>
when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
</para>
<para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3485ba8663f..304022cd867 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
parser_errposition(pstate, def->location)));
/*
- * Allow "stop", or "ignore" values.
+ * Allow "stop", "ignore", "set_to_null" values.
*/
if (pg_strcasecmp(sval, "stop") == 0)
return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
return COPY_ON_ERROR_IGNORE;
+ if (pg_strcasecmp(sval, "set_to_null") == 0)
+ return COPY_ON_ERROR_NULL;
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -904,13 +906,13 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
- if (opts_out->reject_limit && !opts_out->on_error)
+ if (opts_out->reject_limit && !(opts_out->on_error == COPY_ON_ERROR_NULL || opts_out->on_error == COPY_ON_ERROR_IGNORE))
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
/*- translator: first and second %s are the names of COPY option, e.g.
- * ON_ERROR, third is the value of the COPY option, e.g. IGNORE */
- errmsg("COPY %s requires %s to be set to %s",
- "REJECT_LIMIT", "ON_ERROR", "IGNORE")));
+ * ON_ERROR, third is the value of the COPY option, e.g. IGNORE or SET_TO_NULL */
+ errmsg("COPY %s requires %s to be set to %s or %s",
+ "REJECT_LIMIT", "ON_ERROR", "IGNORE", "SET_TO_NULL")));
}
/*
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b8..dde30989009 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1321,7 +1321,7 @@ CopyFrom(CopyFromState cstate)
/* Done, clean up */
error_context_stack = errcallback.previous;
- if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
cstate->num_errors > 0 &&
cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
ereport(NOTICE,
@@ -1474,10 +1474,11 @@ BeginCopyFrom(ParseState *pstate,
cstate->escontext->error_occurred = false;
/*
- * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
- * options later
+ * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+ * We'll add other options later
*/
- if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+ if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+ cstate->opts.on_error == COPY_ON_ERROR_NULL)
cstate->escontext->details_wanted = false;
}
else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 654fecb1b14..8ab42dbceee 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -961,6 +961,17 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
{
Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+ if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+ {
+ values[m] = (Datum) 0;
+ nulls[m] = true;
+ /*
+ * set error_occurred to false, so next
+ * InputFunctionCallSafe call behave sane.
+ */
+ cstate->escontext->error_occurred = false;
+ continue;
+ }
cstate->num_errors++;
if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 1be0056af73..33652b79f62 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3235,7 +3235,7 @@ match_previous_words(int pattern_id,
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
"FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
- "ON_ERROR", "LOG_VERBOSITY");
+ "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f5382..051ca12d107 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
{
COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
COPY_ON_ERROR_IGNORE, /* ignore errors */
+ COPY_ON_ERROR_NULL, /* set error field to null */
} CopyOnErrorChoice;
/*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..55ece18f9bf 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
^
+COPY x from stdin (on_error set_to_null, on_error set_to_null);
+ERROR: conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_to_null, on_error set_to_nul...
+ ^
COPY x from stdin (log_verbosity default, log_verbosity verbose);
ERROR: conflicting or redundant options
LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,8 @@ COPY x from stdin (format BINARY, null 'x');
ERROR: cannot specify NULL in BINARY mode
COPY x from stdin (format BINARY, on_error ignore);
ERROR: only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_to_null);
+ERROR: only ON_ERROR STOP is allowed in BINARY mode
COPY x from stdin (on_error unsupported);
ERROR: COPY ON_ERROR "unsupported" not recognized
LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +130,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
ERROR: COPY ON_ERROR cannot be used with COPY TO
LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
^
+COPY x to stdin (on_error set_to_null);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdin (on_error set_to_null);
+ ^
COPY x from stdin (log_verbosity unsupported);
ERROR: COPY LOG_VERBOSITY "unsupported" not recognized
LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +779,24 @@ CONTEXT: COPY check_ign_err
NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a"
CONTEXT: COPY check_ign_err
NOTICE: 6 rows were skipped due to data type incompatibility
+create domain d_int_not_null as int not null check(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, c int not null, b int);
+--fail
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR: null value in column "c" of relation "t_on_error_null" violates not-null constraint
+DETAIL: Failing row contains (11, null, 12).
+CONTEXT: COPY t_on_error_null, line 1: "11 a 12"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+select * from t_on_error_null;
+ a | c | b
+---+----+----
+ | 11 | 13
+ | 11 | 14
+(2 rows)
+
+drop table t_on_error_null;
+drop domain d_int_not_null;
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -813,6 +841,28 @@ ERROR: skipped more than REJECT_LIMIT (3) rows due to data type incompatibility
CONTEXT: COPY check_ign_err, line 5, column n: ""
COPY check_ign_err FROM STDIN WITH (on_error ignore, reject_limit 4);
NOTICE: 4 rows were skipped due to data type incompatibility
+-- tests for on_error set_to_null option
+truncate check_ign_err;
+COPY check_ign_err FROM STDIN WITH (on_error set_to_null);
+\pset null NULL
+SELECT * FROM check_ign_err;
+ n | m | k
+------+-----+------
+ 1 | {1} | NULL
+ 2 | {2} | 1
+ 3 | {3} | 2
+ 4 | {4} | NULL
+ NULL | {5} | NULL
+(5 rows)
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: missing data for column "k"
+CONTEXT: COPY check_ign_err, line 1, column m: ""
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR: extra data after last expected column
+CONTEXT: COPY check_ign_err, line 1: "1,{1},1,1"
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..8d974f4d9a9 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,14 @@ COPY x from stdin (force_null (a), force_null (b));
COPY x from stdin (convert_selectively (a), convert_selectively (b));
COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error set_to_null);
COPY x from stdin (log_verbosity default, log_verbosity verbose);
-- incorrect options
COPY x from stdin (format BINARY, delimiter ',');
COPY x from stdin (format BINARY, null 'x');
COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_to_null);
COPY x from stdin (on_error unsupported);
COPY x from stdin (format TEXT, force_quote(a));
COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +89,7 @@ COPY x from stdin (format TEXT, force_null *);
COPY x to stdout (format CSV, force_null(a));
COPY x to stdout (format CSV, force_null *);
COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdin (on_error set_to_null);
COPY x from stdin (log_verbosity unsupported);
COPY x from stdin with (reject_limit 1);
COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +537,24 @@ a {2} 2
8 {8} 8
\.
+create domain d_int_not_null as int not null check(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, c int not null, b int);
+
+--fail
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+11 a 12
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1 11 13
+a 11 14
+\.
+
+select * from t_on_error_null;
+drop table t_on_error_null;
+drop domain d_int_not_null;
+
-- tests for on_error option with log_verbosity and null constraint via domain
CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -588,6 +609,29 @@ a {7} 7
10 {10} 10
\.
+-- tests for on_error set_to_null option
+truncate check_ign_err;
+COPY check_ign_err FROM STDIN WITH (on_error set_to_null);
+1 {1} a
+2 {2} 1
+3 {3} 2
+4 {4} b
+a {5} c
+\.
+
+\pset null NULL
+SELECT * FROM check_ign_err;
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,
+\.
+
+--should fail.
+COPY check_ign_err FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,{1},1,1
+\.
+
-- clean up
DROP TABLE forcetest;
DROP TABLE vistest;
--
2.34.1
view thread (29+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
In-Reply-To: <CALdSSPi1JE9xc31W6DPAdk-bQHeo3HNAYB-10Biruu-w4GJN0Q@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox