public inbox for [email protected]help / color / mirror / Atom feed
Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row 20+ messages / 7 participants [nested] [flat]
* Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row @ 2024-01-26 15:08 David G. Johnston <[email protected]> 0 siblings, 3 replies; 20+ messages in thread From: David G. Johnston @ 2024-01-26 15:08 UTC (permalink / raw) To: PostgreSQL Hackers <[email protected]> Hi, The option choice of "ignore" in the COPY ON_ERROR clause seems overly generic. There would seem to be two relevant ways to ignore bad column input data - drop the entire row or just set the column value to null. I can see us wanting to provide the set to null option and in any case having the option name be explicit that it ignores the row seems like a good idea. David J. ^ permalink raw reply [nested|flat] 20+ messages in thread
* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row @ 2024-01-28 23:50 jian he <[email protected]> parent: David G. Johnston <[email protected]> 2 siblings, 1 reply; 20+ messages in thread From: jian he @ 2024-01-28 23:50 UTC (permalink / raw) To: David G. Johnston <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]> On Fri, Jan 26, 2024 at 11:09 PM David G. Johnston <[email protected]> wrote: > > Hi, > > The option choice of "ignore" in the COPY ON_ERROR clause seems overly generic. There would seem to be two relevant ways to ignore bad column input data - drop the entire row or just set the column value to null. I can see us wanting to provide the set to null option and in any case having the option name be explicit that it ignores the row seems like a good idea. two issue I found out while playing around with it; create table x1(a int not null, b int not null ); you can only do: COPY x1 from stdin (on_error 'null'); but you cannot do COPY x1 from stdin (on_error null); we need to hack the gram.y to escape the "null". I don't know how to make it work. related post I found: https://stackoverflow.com/questions/31786611/how-to-escape-flex-keyword another issue: COPY x1 from stdin (on_error null); when we already have `not null` top level constraint for table x1. Do we need an error immediately? "on_error null" seems to conflict with `not null` constraint (assume refers to the same column). it may fail while doing bulk inserts while on_error is set to null because of violating a not null constraint. ^ permalink raw reply [nested|flat] 20+ messages in thread
* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row @ 2024-01-29 00:19 David G. Johnston <[email protected]> parent: jian he <[email protected]> 0 siblings, 0 replies; 20+ messages in thread From: David G. Johnston @ 2024-01-29 00:19 UTC (permalink / raw) To: jian he <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]> On Sun, Jan 28, 2024 at 4:51 PM jian he <[email protected]> wrote: > On Fri, Jan 26, 2024 at 11:09 PM David G. Johnston > <[email protected]> wrote: > > > > Hi, > > > > The option choice of "ignore" in the COPY ON_ERROR clause seems overly > generic. There would seem to be two relevant ways to ignore bad column > input data - drop the entire row or just set the column value to null. I > can see us wanting to provide the set to null option and in any case having > the option name be explicit that it ignores the row seems like a good idea. > > two issue I found out while playing around with it; > create table x1(a int not null, b int not null ); > > another issue: > COPY x1 from stdin (on_error null); > > when we already have `not null` top level constraint for table x1. > Do we need an error immediately? > "on_error null" seems to conflict with `not null` constraint (assume > refers to the same column). > it may fail while doing bulk inserts while on_error is set to null > because of violating a not null constraint. > You should not error immediately since whether or not there is a problem is table and data dependent. I would not check for the case of all columns being defined not null and just let the mismatch happen. That said, maybe with this being a string we can accept something like: 'null, ignore' And so if attempting to place any one null fails, assuming we can make that a soft error too, we would then ignore the entire row. David J. ^ permalink raw reply [nested|flat] 20+ messages in thread
* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row @ 2024-01-29 08:28 Yugo NAGATA <[email protected]> parent: David G. Johnston <[email protected]> 2 siblings, 0 replies; 20+ messages in thread From: Yugo NAGATA @ 2024-01-29 08:28 UTC (permalink / raw) To: David G. Johnston <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]> On Fri, 26 Jan 2024 08:08:29 -0700 "David G. Johnston" <[email protected]> wrote: > Hi, > > The option choice of "ignore" in the COPY ON_ERROR clause seems overly > generic. There would seem to be two relevant ways to ignore bad column > input data - drop the entire row or just set the column value to null. I > can see us wanting to provide the set to null option and in any case having > the option name be explicit that it ignores the row seems like a good idea. I am not in favour of renaming the option name "ignore", instead we can use another style of name for the option to set the column value to NULL, for example, "set_to_null". (Maybe, we can make a more generic option "set_to (col, val)" that can set the value of column specified by "col" value to the specified value "val" (e.g. 'N/A') on a soft error, although the syntax would be a bit complex...) IMO, it is more simple to define "ignore" as to skip the entire row rather than having variety of "ignore". Once defined it so, the option to set the column value to NULL should not be called "ignore" because values in other columns will be inserted. Regards, Yugo Nagata > > David J. -- Yugo NAGATA <[email protected]> ^ permalink raw reply [nested|flat] 20+ messages in thread
* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row @ 2025-03-07 03:48 jian he <[email protected]> parent: David G. Johnston <[email protected]> 2 siblings, 1 reply; 20+ messages in thread From: jian he @ 2025-03-07 03:48 UTC (permalink / raw) To: Kirill Reshke <[email protected]>; +Cc: Fujii Masao <[email protected]>; Jim Jones <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]> hi. rebase only. Attachments: [text/x-patch] v12-0001-COPY-on_error-set_to_null.patch (19.8K, 2-v12-0001-COPY-on_error-set_to_null.patch) download | inline diff: From ce0ce6438094cad553e509db65b7fd27de2b9af6 Mon Sep 17 00:00:00 2001 From: jian he <[email protected]> Date: Fri, 7 Mar 2025 11:43:51 +0800 Subject: [PATCH v12 1/1] COPY (on_error set_to_null) Extent "on_error action", introduce new option: on_error set_to_null. Current grammar makes us unable to use "on_error null", so we choose "on_error set_to_null". Any data type conversion errors during the COPY FROM process will result in the affected column being set to NULL. This only applicable when using the non-binary format for COPY FROM. However, the not-null constraint will still be enforced. If a conversion error leads to a NULL value in a column that has a not-null constraint, a not-null constraint violation error will be triggered. A regression test for a domain with a not-null constraint has been added. Author: Jian He <[email protected]> Author: Kirill Reshke <[email protected]> Reviewed-by: Fujii Masao <[email protected]> Jim Jones <[email protected]> "David G. Johnston" <[email protected]> Yugo NAGATA <[email protected]> torikoshia <[email protected]> discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com --- doc/src/sgml/ref/copy.sgml | 34 +++++++++++----- src/backend/commands/copy.c | 6 ++- src/backend/commands/copyfrom.c | 29 +++++++++----- src/backend/commands/copyfromparse.c | 43 +++++++++++++++++++- src/bin/psql/tab-complete.in.c | 2 +- src/include/commands/copy.h | 1 + src/test/regress/expected/copy2.out | 60 ++++++++++++++++++++++++++++ src/test/regress/sql/copy2.sql | 46 +++++++++++++++++++++ 8 files changed, 196 insertions(+), 25 deletions(-) diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index df093da97c5..91bc25b9ab3 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -394,23 +394,34 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable Specifies how to behave when encountering an error converting a column's input value into its data type. An <replaceable class="parameter">error_action</replaceable> value of - <literal>stop</literal> means fail the command, while - <literal>ignore</literal> means discard the input row and continue with the next one. + <literal>stop</literal> means fail the command, + <literal>ignore</literal> means discard the input row and continue with the next one, and + <literal>set_to_null</literal> means replace columns containing erroneous input values with + <literal>NULL</literal> and move to the next field. The default is <literal>stop</literal>. </para> <para> - The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command> + The <literal>ignore</literal> and <literal>set_to_null</literal> + options are applicable only for <command>COPY FROM</command> when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>. </para> <para> - A <literal>NOTICE</literal> message containing the ignored row count is - emitted at the end of the <command>COPY FROM</command> if at least one - row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to - <literal>verbose</literal>, a <literal>NOTICE</literal> message + For <literal>ignore</literal> option, + a <literal>NOTICE</literal> message containing the ignored row count is + emitted at the end of the <command>COPY FROM</command> if at least one row was discarded. + For <literal>set_to_null</literal> option, + a <literal>NOTICE</literal> message containing the row count that erroneous input values replaced by to null + happened is emitted at the end of the <command>COPY FROM</command> if at least one row was replaced. + </para> + <para> + When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>, + for <literal>ignore</literal> option, a <literal>NOTICE</literal> message containing the line of the input file and the column name whose input - conversion has failed is emitted for each discarded row. - When it is set to <literal>silent</literal>, no message is emitted - regarding ignored rows. + conversion has failed is emitted for each discarded row; + for <literal>set_to_null</literal> option, + a <literal>NOTICE</literal> message containing the line of the input file and the column name + where value was replaced with <literal>NULL</literal> for each input conversion failure. + When it is set to <literal>silent</literal>, no message is emitted regarding input conversion failed rows. </para> </listitem> </varlistentry> @@ -458,7 +469,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable </para> <para> This is currently used in <command>COPY FROM</command> command when - <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>. + <literal>ON_ERROR</literal> option is set to <literal>ignore</literal> + or <literal>set_to_null</literal>. </para> </listitem> </varlistentry> diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index cfca9d9dc29..afe60758d40 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from) parser_errposition(pstate, def->location))); /* - * Allow "stop", or "ignore" values. + * Allow "stop", "ignore", "set_to_null" values. */ if (pg_strcasecmp(sval, "stop") == 0) return COPY_ON_ERROR_STOP; if (pg_strcasecmp(sval, "ignore") == 0) return COPY_ON_ERROR_IGNORE; + if (pg_strcasecmp(sval, "set_to_null") == 0) + return COPY_ON_ERROR_NULL; ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), @@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("only ON_ERROR STOP is allowed in BINARY mode"))); - if (opts_out->reject_limit && !opts_out->on_error) + if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), /*- translator: first and second %s are the names of COPY option, e.g. diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index bcf66f0adf8..4502ce2d366 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate) /* Done, clean up */ error_context_stack = errcallback.previous; - if (cstate->opts.on_error != COPY_ON_ERROR_STOP && - cstate->num_errors > 0 && + if (cstate->num_errors > 0 && cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT) - ereport(NOTICE, - errmsg_plural("%llu row was skipped due to data type incompatibility", - "%llu rows were skipped due to data type incompatibility", - (unsigned long long) cstate->num_errors, - (unsigned long long) cstate->num_errors)); + { + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + ereport(NOTICE, + errmsg_plural("%llu row was skipped due to data type incompatibility", + "%llu rows were skipped due to data type incompatibility", + (unsigned long long) cstate->num_errors, + (unsigned long long) cstate->num_errors)); + else if (cstate->opts.on_error == COPY_ON_ERROR_NULL) + ereport(NOTICE, + errmsg_plural("erroneous values in %llu row was replaced with null", + "erroneous values in %llu rows were replaced with null", + (unsigned long long) cstate->num_errors, + (unsigned long long) cstate->num_errors)); + } if (bistate != NULL) FreeBulkInsertState(bistate); @@ -1622,10 +1630,11 @@ BeginCopyFrom(ParseState *pstate, cstate->escontext->error_occurred = false; /* - * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other - * options later + * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL. + * We'll add other options later */ - if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE || + cstate->opts.on_error == COPY_ON_ERROR_NULL) cstate->escontext->details_wanted = false; } else diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index e8128f85e6b..e2b4d1f7ec9 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -947,6 +947,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, int fldct; int fieldno; char *string; + bool current_row_erroneous = false; tupDesc = RelationGetDescr(cstate->rel); attr_count = list_length(cstate->attnumlist); @@ -1025,6 +1026,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, /* * If ON_ERROR is specified with IGNORE, skip rows with soft errors + * If ON_ERROR is specified with set_to_null, try to replace with null. */ else if (!InputFunctionCallSafe(&in_functions[m], string, @@ -1035,9 +1037,46 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, { Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); + if (cstate->opts.on_error == COPY_ON_ERROR_NULL) + { + /* + * we use this count the number of rows (not fields) that + * successfully applied the on_error set_to_null + */ + if (!current_row_erroneous) + current_row_erroneous = true; + + /* + * we need another InputFunctionCallSafe so we can error out + * not-null violation for domain with not-null constraint. + */ + cstate->escontext->error_occurred = false; + if (InputFunctionCallSafe(&in_functions[m], + NULL, + typioparams[m], + att->atttypmod, + (Node *) cstate->escontext, + &values[m])) + { + nulls[m] = true; + values[m] = (Datum) 0; + if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) + ereport(NOTICE, + errmsg("column \"%s\" was set to null due to data type incompatibility at line %llu", + cstate->cur_attname, + (unsigned long long) cstate->cur_lineno)); + continue; + } + else + ereport(ERROR, + errcode(ERRCODE_NOT_NULL_VIOLATION), + errmsg("domain %s does not allow null values", format_type_be(typioparams[m])), + errdatatype(typioparams[m])); + } cstate->num_errors++; - if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) + if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE && + cstate->opts.on_error == COPY_ON_ERROR_IGNORE) { /* * Since we emit line number and column info in the below @@ -1076,6 +1115,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, cstate->cur_attval = NULL; } + if (current_row_erroneous) + cstate->num_errors++; Assert(fieldno == attr_count); return true; diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c index 8432be641ac..fcdc61bcb57 100644 --- a/src/bin/psql/tab-complete.in.c +++ b/src/bin/psql/tab-complete.in.c @@ -3279,7 +3279,7 @@ match_previous_words(int pattern_id, COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL", "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE", "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT", - "ON_ERROR", "LOG_VERBOSITY"); + "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY"); /* Complete COPY <sth> FROM|TO filename WITH (FORMAT */ else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT")) diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 06dfdfef721..7ebf4f78933 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice { COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */ COPY_ON_ERROR_IGNORE, /* ignore errors */ + COPY_ON_ERROR_NULL, /* set error field to null */ } CopyOnErrorChoice; /* diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out index 64ea33aeae8..9a5acef8db0 100644 --- a/src/test/regress/expected/copy2.out +++ b/src/test/regress/expected/copy2.out @@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore); ERROR: conflicting or redundant options LINE 1: COPY x from stdin (on_error ignore, on_error ignore); ^ +COPY x from stdin (on_error set_to_null, on_error ignore); +ERROR: conflicting or redundant options +LINE 1: COPY x from stdin (on_error set_to_null, on_error ignore); + ^ COPY x from stdin (log_verbosity default, log_verbosity verbose); ERROR: conflicting or redundant options LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb... @@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x'); ERROR: cannot specify NULL in BINARY mode COPY x from stdin (format BINARY, on_error ignore); ERROR: only ON_ERROR STOP is allowed in BINARY mode +COPY x from stdin (format BINARY, on_error set_to_null); +ERROR: only ON_ERROR STOP is allowed in BINARY mode +COPY x FROM stdin (on_error set_to_null, reject_limit 2); +ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE COPY x from stdin (on_error unsupported); ERROR: COPY ON_ERROR "unsupported" not recognized LINE 1: COPY x from stdin (on_error unsupported); @@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported); ERROR: COPY ON_ERROR cannot be used with COPY TO LINE 1: COPY x to stdout (format BINARY, on_error unsupported); ^ +COPY x to stdin (on_error set_to_null); +ERROR: COPY ON_ERROR cannot be used with COPY TO +LINE 1: COPY x to stdin (on_error set_to_null); + ^ COPY x from stdin (log_verbosity unsupported); ERROR: COPY LOG_VERBOSITY "unsupported" not recognized LINE 1: COPY x from stdin (log_verbosity unsupported); @@ -769,6 +781,51 @@ CONTEXT: COPY check_ign_err NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a" CONTEXT: COPY check_ign_err NOTICE: 6 rows were skipped due to data type incompatibility +CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0); +CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0); +CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT); +\pset null NULL +--fail, column a cannot set to null value +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +ERROR: domain d_int_not_null does not allow null values +CONTEXT: COPY t_on_error_null, line 1, column a: null input +--fail, column a is domain with not-null constraint +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +ERROR: domain d_int_not_null does not allow null values +CONTEXT: COPY t_on_error_null, line 1, column a: "a" +--fail, column a cannot set to null value +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +ERROR: domain d_int_not_null does not allow null values +CONTEXT: COPY t_on_error_null, line 1, column a: "-1" +--fail. less data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null); +ERROR: missing data for column "c" +CONTEXT: COPY t_on_error_null, line 1: "1,1" +--fail. extra data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null); +ERROR: extra data after last expected column +CONTEXT: COPY t_on_error_null, line 1: "1,2,3,4" +--ok +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose); +NOTICE: column "b" was set to null due to data type incompatibility at line 1 +CONTEXT: COPY t_on_error_null, line 1, column b: "a" +NOTICE: column "c" was set to null due to data type incompatibility at line 1 +CONTEXT: COPY t_on_error_null, line 1, column c: "d" +NOTICE: column "b" was set to null due to data type incompatibility at line 2 +CONTEXT: COPY t_on_error_null, line 2, column b: "b" +NOTICE: column "c" was set to null due to data type incompatibility at line 3 +CONTEXT: COPY t_on_error_null, line 3, column c: "e" +NOTICE: erroneous values in 3 rows were replaced with null +-- check inserted content +select * from t_on_error_null; + a | b | c +----+------+------ + 10 | NULL | NULL + 11 | NULL | 12 + 13 | 14 | NULL +(3 rows) + +\pset null '' -- tests for on_error option with log_verbosity and null constraint via domain CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL; CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2); @@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view; DROP VIEW instead_of_insert_tbl_view_2; DROP FUNCTION fun_instead_of_insert_tbl(); DROP TABLE check_ign_err; +DROP TABLE t_on_error_null; +DROP DOMAIN d_int_not_null; +DROP DOMAIN d_int_positive_maybe_null; DROP TABLE check_ign_err2; DROP DOMAIN dcheck_ign_err2; DROP TABLE hard_err; diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql index 45273557ce0..003a91648e2 100644 --- a/src/test/regress/sql/copy2.sql +++ b/src/test/regress/sql/copy2.sql @@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b)); COPY x from stdin (convert_selectively (a), convert_selectively (b)); COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii'); COPY x from stdin (on_error ignore, on_error ignore); +COPY x from stdin (on_error set_to_null, on_error ignore); COPY x from stdin (log_verbosity default, log_verbosity verbose); -- incorrect options COPY x from stdin (format BINARY, delimiter ','); COPY x from stdin (format BINARY, null 'x'); COPY x from stdin (format BINARY, on_error ignore); +COPY x from stdin (format BINARY, on_error set_to_null); +COPY x FROM stdin (on_error set_to_null, reject_limit 2); COPY x from stdin (on_error unsupported); COPY x from stdin (format TEXT, force_quote(a)); COPY x from stdin (format TEXT, force_quote *); @@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *); COPY x to stdout (format CSV, force_null(a)); COPY x to stdout (format CSV, force_null *); COPY x to stdout (format BINARY, on_error unsupported); +COPY x to stdin (on_error set_to_null); COPY x from stdin (log_verbosity unsupported); COPY x from stdin with (reject_limit 1); COPY x from stdin with (on_error ignore, reject_limit 0); @@ -534,6 +538,45 @@ a {2} 2 8 {8} 8 \. +CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0); +CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0); +CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT); + +\pset null NULL +--fail, column a cannot set to null value +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +\N 11 13 +\. + +--fail, column a is domain with not-null constraint +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +a 11 14 +\. + +--fail, column a cannot set to null value +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +-1 11 13 +\. + +--fail. less data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null); +1,1 +\. +--fail. extra data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null); +1,2,3,4 +\. + +--ok +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose); +10 a d +11 b 12 +13 14 e +\. + +-- check inserted content +select * from t_on_error_null; +\pset null '' -- tests for on_error option with log_verbosity and null constraint via domain CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL; CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2); @@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view; DROP VIEW instead_of_insert_tbl_view_2; DROP FUNCTION fun_instead_of_insert_tbl(); DROP TABLE check_ign_err; +DROP TABLE t_on_error_null; +DROP DOMAIN d_int_not_null; +DROP DOMAIN d_int_positive_maybe_null; DROP TABLE check_ign_err2; DROP DOMAIN dcheck_ign_err2; DROP TABLE hard_err; -- 2.34.1 ^ permalink raw reply [nested|flat] 20+ messages in thread
* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row @ 2025-03-11 10:31 Jim Jones <[email protected]> parent: jian he <[email protected]> 0 siblings, 1 reply; 20+ messages in thread From: Jim Jones @ 2025-03-11 10:31 UTC (permalink / raw) To: jian he <[email protected]>; Kirill Reshke <[email protected]>; +Cc: Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]> Hi Jian On 07.03.25 04:48, jian he wrote: > hi. > rebase only. I revisited this patch today. It applies and builds cleanly, and it works as expected. Some tests and minor comments: ==== 1) WARNING might be a better fit than NOTICE here. postgres=# \pset null NULL Null display is "NULL". postgres=# CREATE TEMPORARY TABLE t (x int, y int, z text); CREATE TABLE postgres=# COPY t (x,y) FROM STDIN WITH (on_error set_to_null, format csv); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> 1,a >> 2,1 >> 3,2 >> 4,b >> a,c >> \. NOTICE: erroneous values in 3 rows were replaced with null COPY 5 postgres=# SELECT * FROM t; x | y | z ------+------+------ 1 | NULL | NULL 2 | 1 | NULL 3 | 2 | NULL 4 | NULL | NULL NULL | NULL | NULL (5 rows) postgres=# \pset null NULL Null display is "NULL". postgres=# CREATE TEMPORARY TABLE t (x int, y int, z text); CREATE TABLE postgres=# COPY t (x,y) FROM STDIN WITH (on_error set_to_null, format csv, log_verbosity verbose); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> 1,a >> 2,1 >> 3,2 >> 4,b >> a,c >> \. NOTICE: column "y" was set to null due to data type incompatibility at line 1 NOTICE: column "y" was set to null due to data type incompatibility at line 4 NOTICE: column "x" was set to null due to data type incompatibility at line 5 NOTICE: column "y" was set to null due to data type incompatibility at line 5 NOTICE: erroneous values in 3 rows were replaced with null COPY 5 postgres=# SELECT * FROM t; x | y | z ------+------+------ 1 | NULL | NULL 2 | 1 | NULL 3 | 2 | NULL 4 | NULL | NULL NULL | NULL | NULL (5 rows) I would still leave the extra messages from "log_verbosity verbose" as NOTICE though. What do you think? ==== 2) Inconsistent terminology. Invalid values in "on_error set_to_null" mode are names as "erroneous", but as "invalid" in "on_error stop" mode. I don't want to get into the semantics of erroneous or invalid, but sticking to one terminology would IMHO look better. postgres=# \pset null NULL Null display is "NULL". postgres=# CREATE TEMPORARY TABLE t (x int, y int, z text); CREATE TABLE postgres=# COPY t (x,y) FROM STDIN WITH (on_error stop, format csv, log_verbosity verbose); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> 1,a >> 2,1 >> 3,2 >> 4,b >> a,c >> \. ERROR: invalid input syntax for type integer: "a" CONTEXT: COPY t, line 1, column y: "a" postgres=# SELECT * FROM t; x | y | z ---+---+--- (0 rows) ==== 3) same as in 1) postgres=# \pset null NULL Null display is "NULL". postgres=# CREATE TEMPORARY TABLE t (x int, y int, z text); CREATE TABLE postgres=# COPY t (x,y) FROM STDIN WITH (on_error ignore, format csv, log_verbosity verbose); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> 1,a >> 2,1 >> 3,2 >> 4,b >> a,c >> \. NOTICE: skipping row due to data type incompatibility at line 1 for column "y": "a" NOTICE: skipping row due to data type incompatibility at line 4 for column "y": "b" NOTICE: skipping row due to data type incompatibility at line 5 for column "x": "a" NOTICE: 3 rows were skipped due to data type incompatibility COPY 2 postgres=# SELECT * FROM t; x | y | z ---+---+------ 2 | 1 | NULL 3 | 2 | NULL (2 rows)==== ==== "on_error ignore" works well with "reject_limit #" postgres=# \pset null NULL Null display is "NULL". postgres=# CREATE TEMPORARY TABLE t (x int, y int, z text); CREATE TABLE postgres=# COPY t (x,y) FROM STDIN WITH (on_error ignore, format csv, log_verbosity verbose, reject_limit 1); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> 1,a >> 2,1 >> 3,2 >> 4,b >> a,c >> \. NOTICE: skipping row due to data type incompatibility at line 1 for column "y": "a" NOTICE: skipping row due to data type incompatibility at line 4 for column "y": "b" ERROR: skipped more than REJECT_LIMIT (1) rows due to data type incompatibility CONTEXT: COPY t, line 4, column y: "b" postgres=# SELECT * FROM t; x | y | z ---+---+--- (0 rows) best regards, Jim ^ permalink raw reply [nested|flat] 20+ messages in thread
* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row @ 2025-03-12 08:00 jian he <[email protected]> parent: Jim Jones <[email protected]> 0 siblings, 1 reply; 20+ messages in thread From: jian he @ 2025-03-12 08:00 UTC (permalink / raw) To: Jim Jones <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]> On Tue, Mar 11, 2025 at 6:31 PM Jim Jones <[email protected]> wrote: > > > I revisited this patch today. It applies and builds cleanly, and it > works as expected. > > Some tests and minor comments: > hi. Jim Jones. thanks for testsing it again! > ==== > > 1) WARNING might be a better fit than NOTICE here. > but NOTICE, on_errror set_to_null is aligned with on_errror ignore. > > I would still leave the extra messages from "log_verbosity verbose" as > NOTICE though. What do you think? > > ==== When LOG_VERBOSITY option is set to verbose, for ignore option, a NOTICE message containing the line of the input file and the column name whose input conversion has failed is emitted for each discarded row; for set_to_null option, a NOTICE message containing the line of the input file and the column name where value was replaced with NULL for each input conversion failure. see the above desciption, on_errror set_to_null is aligned with on_errror ignore. it's just on_errror ignore is per row, on_errror set_to_null is per column/field. so NOTICE is aligned with other on_error option. > > 2) Inconsistent terminology. Invalid values in "on_error set_to_null" > mode are names as "erroneous", but as "invalid" in "on_error stop" mode. > I don't want to get into the semantics of erroneous or invalid, but > sticking to one terminology would IMHO look better. > I am open to changing it. what do you think "invalid values in %llu row was replaced with null"? > ==== > > "on_error ignore" works well with "reject_limit #" > i remember there was some confusion about on_error set_to_null with reject_limit option. I choose to not suport it. obviously, if there is consenses, we can support it later. ^ permalink raw reply [nested|flat] 20+ messages in thread
* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row @ 2025-03-12 08:25 Jim Jones <[email protected]> parent: jian he <[email protected]> 0 siblings, 1 reply; 20+ messages in thread From: Jim Jones @ 2025-03-12 08:25 UTC (permalink / raw) To: jian he <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]> On 12.03.25 09:00, jian he wrote: >> 1) WARNING might be a better fit than NOTICE here. >> > but NOTICE, on_errror set_to_null is aligned with on_errror ignore. > >> I would still leave the extra messages from "log_verbosity verbose" as >> NOTICE though. What do you think? >> >> ==== > When LOG_VERBOSITY option is set to verbose, > for ignore option, a NOTICE message containing the line of the input > file and the column name > whose input conversion has failed is emitted for each discarded row; > for set_to_null option, a NOTICE message containing the line of the > input file and the column name > where value was replaced with NULL for each input conversion failure. > > see the above desciption, > on_errror set_to_null is aligned with on_errror ignore. > it's just on_errror ignore is per row, on_errror set_to_null is per > column/field. > so NOTICE is aligned with other on_error option. I considered using a WARNING due to the severity of the issue - the failure to import data - but either NOTICE or WARNING works for me. >> 2) Inconsistent terminology. Invalid values in "on_error set_to_null" >> mode are names as "erroneous", but as "invalid" in "on_error stop" mode. >> I don't want to get into the semantics of erroneous or invalid, but >> sticking to one terminology would IMHO look better. >> > I am open to changing it. > what do you think "invalid values in %llu row was replaced with null"? LGTM: "invalid values in %llu rows were replaced with null" Thanks for the patch! Best, Jim ^ permalink raw reply [nested|flat] 20+ messages in thread
* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row @ 2025-03-18 03:55 jian he <[email protected]> parent: Jim Jones <[email protected]> 0 siblings, 1 reply; 20+ messages in thread From: jian he @ 2025-03-18 03:55 UTC (permalink / raw) To: Jim Jones <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]> On Wed, Mar 12, 2025 at 4:26 PM Jim Jones <[email protected]> wrote: > > >> 2) Inconsistent terminology. Invalid values in "on_error set_to_null" > >> mode are names as "erroneous", but as "invalid" in "on_error stop" mode. > >> I don't want to get into the semantics of erroneous or invalid, but > >> sticking to one terminology would IMHO look better. > >> > > I am open to changing it. > > what do you think "invalid values in %llu row was replaced with null"? > > LGTM: "invalid values in %llu rows were replaced with null" > changed based on this. also minor documentation tweaks. Attachments: [text/x-patch] v13-0001-COPY-on_error-set_to_null.patch (19.8K, 2-v13-0001-COPY-on_error-set_to_null.patch) download | inline diff: From 3553eee56c8dd0c3ce334d1f37b511acbbc640af Mon Sep 17 00:00:00 2001 From: jian he <[email protected]> Date: Tue, 18 Mar 2025 11:51:48 +0800 Subject: [PATCH v13 1/1] COPY (on_error set_to_null) Extent "on_error action", introduce new option: on_error set_to_null. Current grammar makes us unable to use "on_error null", so we choose "on_error set_to_null". Any data type conversion errors during the COPY FROM process will result in the affected column being set to NULL. This only applies when using the non-binary format for COPY FROM. However, the not-null constraint will still be enforced. If a column have not-null constraint, successful (on_error set_to_null) action will cause not-null constraint violation. This also apply to column type is domain with not-null constraint. A regression test for a domain with a not-null constraint has been added. Author: Jian He <[email protected]> Author: Kirill Reshke <[email protected]> Reviewed-by: Fujii Masao <[email protected]> Jim Jones <[email protected]> "David G. Johnston" <[email protected]> Yugo NAGATA <[email protected]> torikoshia <[email protected]> discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com --- doc/src/sgml/ref/copy.sgml | 34 +++++++++++----- src/backend/commands/copy.c | 6 ++- src/backend/commands/copyfrom.c | 29 +++++++++----- src/backend/commands/copyfromparse.c | 43 +++++++++++++++++++- src/bin/psql/tab-complete.in.c | 2 +- src/include/commands/copy.h | 1 + src/test/regress/expected/copy2.out | 60 ++++++++++++++++++++++++++++ src/test/regress/sql/copy2.sql | 46 +++++++++++++++++++++ 8 files changed, 196 insertions(+), 25 deletions(-) diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index df093da97c5..cdb725d7565 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -394,23 +394,34 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable Specifies how to behave when encountering an error converting a column's input value into its data type. An <replaceable class="parameter">error_action</replaceable> value of - <literal>stop</literal> means fail the command, while - <literal>ignore</literal> means discard the input row and continue with the next one. + <literal>stop</literal> means fail the command, + <literal>ignore</literal> means discard the input row and continue with the next one, and + <literal>set_to_null</literal> means replace columns containing invalid input values with + <literal>NULL</literal> and move to the next field. The default is <literal>stop</literal>. </para> <para> - The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command> + The <literal>ignore</literal> and <literal>set_to_null</literal> + options are applicable only for <command>COPY FROM</command> when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>. </para> <para> - A <literal>NOTICE</literal> message containing the ignored row count is - emitted at the end of the <command>COPY FROM</command> if at least one - row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to - <literal>verbose</literal>, a <literal>NOTICE</literal> message + For <literal>ignore</literal> option, + a <literal>NOTICE</literal> message containing the ignored row count is + emitted at the end of the <command>COPY FROM</command> if at least one row was discarded. + For <literal>set_to_null</literal> option, + a <literal>NOTICE</literal> message indicating the number of rows where invalid input values were replaced with + null is emitted at the end of the <command>COPY FROM</command> if at least one row was replaced. + </para> + <para> + When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>, + for <literal>ignore</literal> option, a <literal>NOTICE</literal> message containing the line of the input file and the column name whose input - conversion has failed is emitted for each discarded row. - When it is set to <literal>silent</literal>, no message is emitted - regarding ignored rows. + conversion has failed is emitted for each discarded row; + for <literal>set_to_null</literal> option, + a <literal>NOTICE</literal> message containing the line of the input file and the column name + where value was replaced with <literal>NULL</literal> for each input conversion failure. + When it is set to <literal>silent</literal>, no message is emitted regarding input conversion failed rows. </para> </listitem> </varlistentry> @@ -458,7 +469,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable </para> <para> This is currently used in <command>COPY FROM</command> command when - <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>. + <literal>ON_ERROR</literal> option is set to <literal>ignore</literal> + or <literal>set_to_null</literal>. </para> </listitem> </varlistentry> diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index cfca9d9dc29..afe60758d40 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from) parser_errposition(pstate, def->location))); /* - * Allow "stop", or "ignore" values. + * Allow "stop", "ignore", "set_to_null" values. */ if (pg_strcasecmp(sval, "stop") == 0) return COPY_ON_ERROR_STOP; if (pg_strcasecmp(sval, "ignore") == 0) return COPY_ON_ERROR_IGNORE; + if (pg_strcasecmp(sval, "set_to_null") == 0) + return COPY_ON_ERROR_NULL; ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), @@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("only ON_ERROR STOP is allowed in BINARY mode"))); - if (opts_out->reject_limit && !opts_out->on_error) + if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), /*- translator: first and second %s are the names of COPY option, e.g. diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index bcf66f0adf8..43a227eae72 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate) /* Done, clean up */ error_context_stack = errcallback.previous; - if (cstate->opts.on_error != COPY_ON_ERROR_STOP && - cstate->num_errors > 0 && + if (cstate->num_errors > 0 && cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT) - ereport(NOTICE, - errmsg_plural("%llu row was skipped due to data type incompatibility", - "%llu rows were skipped due to data type incompatibility", - (unsigned long long) cstate->num_errors, - (unsigned long long) cstate->num_errors)); + { + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + ereport(NOTICE, + errmsg_plural("%llu row was skipped due to data type incompatibility", + "%llu rows were skipped due to data type incompatibility", + (unsigned long long) cstate->num_errors, + (unsigned long long) cstate->num_errors)); + else if (cstate->opts.on_error == COPY_ON_ERROR_NULL) + ereport(NOTICE, + errmsg_plural("invalid values in %llu row was replaced with null", + "invalid values in %llu rows were replaced with null", + (unsigned long long) cstate->num_errors, + (unsigned long long) cstate->num_errors)); + } if (bistate != NULL) FreeBulkInsertState(bistate); @@ -1622,10 +1630,11 @@ BeginCopyFrom(ParseState *pstate, cstate->escontext->error_occurred = false; /* - * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other - * options later + * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL. + * We'll add other options later */ - if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE || + cstate->opts.on_error == COPY_ON_ERROR_NULL) cstate->escontext->details_wanted = false; } else diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index e8128f85e6b..e2b4d1f7ec9 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -947,6 +947,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, int fldct; int fieldno; char *string; + bool current_row_erroneous = false; tupDesc = RelationGetDescr(cstate->rel); attr_count = list_length(cstate->attnumlist); @@ -1025,6 +1026,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, /* * If ON_ERROR is specified with IGNORE, skip rows with soft errors + * If ON_ERROR is specified with set_to_null, try to replace with null. */ else if (!InputFunctionCallSafe(&in_functions[m], string, @@ -1035,9 +1037,46 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, { Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); + if (cstate->opts.on_error == COPY_ON_ERROR_NULL) + { + /* + * we use this count the number of rows (not fields) that + * successfully applied the on_error set_to_null + */ + if (!current_row_erroneous) + current_row_erroneous = true; + + /* + * we need another InputFunctionCallSafe so we can error out + * not-null violation for domain with not-null constraint. + */ + cstate->escontext->error_occurred = false; + if (InputFunctionCallSafe(&in_functions[m], + NULL, + typioparams[m], + att->atttypmod, + (Node *) cstate->escontext, + &values[m])) + { + nulls[m] = true; + values[m] = (Datum) 0; + if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) + ereport(NOTICE, + errmsg("column \"%s\" was set to null due to data type incompatibility at line %llu", + cstate->cur_attname, + (unsigned long long) cstate->cur_lineno)); + continue; + } + else + ereport(ERROR, + errcode(ERRCODE_NOT_NULL_VIOLATION), + errmsg("domain %s does not allow null values", format_type_be(typioparams[m])), + errdatatype(typioparams[m])); + } cstate->num_errors++; - if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) + if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE && + cstate->opts.on_error == COPY_ON_ERROR_IGNORE) { /* * Since we emit line number and column info in the below @@ -1076,6 +1115,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, cstate->cur_attval = NULL; } + if (current_row_erroneous) + cstate->num_errors++; Assert(fieldno == attr_count); return true; diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c index 9a4d993e2bc..7980513a9bd 100644 --- a/src/bin/psql/tab-complete.in.c +++ b/src/bin/psql/tab-complete.in.c @@ -3280,7 +3280,7 @@ match_previous_words(int pattern_id, COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL", "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE", "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT", - "ON_ERROR", "LOG_VERBOSITY"); + "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY"); /* Complete COPY <sth> FROM|TO filename WITH (FORMAT */ else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT")) diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 06dfdfef721..7ebf4f78933 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice { COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */ COPY_ON_ERROR_IGNORE, /* ignore errors */ + COPY_ON_ERROR_NULL, /* set error field to null */ } CopyOnErrorChoice; /* diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out index 64ea33aeae8..caa94bfd526 100644 --- a/src/test/regress/expected/copy2.out +++ b/src/test/regress/expected/copy2.out @@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore); ERROR: conflicting or redundant options LINE 1: COPY x from stdin (on_error ignore, on_error ignore); ^ +COPY x from stdin (on_error set_to_null, on_error ignore); +ERROR: conflicting or redundant options +LINE 1: COPY x from stdin (on_error set_to_null, on_error ignore); + ^ COPY x from stdin (log_verbosity default, log_verbosity verbose); ERROR: conflicting or redundant options LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb... @@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x'); ERROR: cannot specify NULL in BINARY mode COPY x from stdin (format BINARY, on_error ignore); ERROR: only ON_ERROR STOP is allowed in BINARY mode +COPY x from stdin (format BINARY, on_error set_to_null); +ERROR: only ON_ERROR STOP is allowed in BINARY mode +COPY x FROM stdin (on_error set_to_null, reject_limit 2); +ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE COPY x from stdin (on_error unsupported); ERROR: COPY ON_ERROR "unsupported" not recognized LINE 1: COPY x from stdin (on_error unsupported); @@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported); ERROR: COPY ON_ERROR cannot be used with COPY TO LINE 1: COPY x to stdout (format BINARY, on_error unsupported); ^ +COPY x to stdin (on_error set_to_null); +ERROR: COPY ON_ERROR cannot be used with COPY TO +LINE 1: COPY x to stdin (on_error set_to_null); + ^ COPY x from stdin (log_verbosity unsupported); ERROR: COPY LOG_VERBOSITY "unsupported" not recognized LINE 1: COPY x from stdin (log_verbosity unsupported); @@ -769,6 +781,51 @@ CONTEXT: COPY check_ign_err NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a" CONTEXT: COPY check_ign_err NOTICE: 6 rows were skipped due to data type incompatibility +CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0); +CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0); +CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT); +\pset null NULL +--fail, column a cannot set to null value +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +ERROR: domain d_int_not_null does not allow null values +CONTEXT: COPY t_on_error_null, line 1, column a: null input +--fail, column a is domain with not-null constraint +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +ERROR: domain d_int_not_null does not allow null values +CONTEXT: COPY t_on_error_null, line 1, column a: "a" +--fail, column a cannot set to null value +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +ERROR: domain d_int_not_null does not allow null values +CONTEXT: COPY t_on_error_null, line 1, column a: "-1" +--fail. less data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null); +ERROR: missing data for column "c" +CONTEXT: COPY t_on_error_null, line 1: "1,1" +--fail. extra data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null); +ERROR: extra data after last expected column +CONTEXT: COPY t_on_error_null, line 1: "1,2,3,4" +--ok +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose); +NOTICE: column "b" was set to null due to data type incompatibility at line 1 +CONTEXT: COPY t_on_error_null, line 1, column b: "a" +NOTICE: column "c" was set to null due to data type incompatibility at line 1 +CONTEXT: COPY t_on_error_null, line 1, column c: "d" +NOTICE: column "b" was set to null due to data type incompatibility at line 2 +CONTEXT: COPY t_on_error_null, line 2, column b: "b" +NOTICE: column "c" was set to null due to data type incompatibility at line 3 +CONTEXT: COPY t_on_error_null, line 3, column c: "e" +NOTICE: invalid values in 3 rows were replaced with null +-- check inserted content +select * from t_on_error_null; + a | b | c +----+------+------ + 10 | NULL | NULL + 11 | NULL | 12 + 13 | 14 | NULL +(3 rows) + +\pset null '' -- tests for on_error option with log_verbosity and null constraint via domain CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL; CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2); @@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view; DROP VIEW instead_of_insert_tbl_view_2; DROP FUNCTION fun_instead_of_insert_tbl(); DROP TABLE check_ign_err; +DROP TABLE t_on_error_null; +DROP DOMAIN d_int_not_null; +DROP DOMAIN d_int_positive_maybe_null; DROP TABLE check_ign_err2; DROP DOMAIN dcheck_ign_err2; DROP TABLE hard_err; diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql index 45273557ce0..003a91648e2 100644 --- a/src/test/regress/sql/copy2.sql +++ b/src/test/regress/sql/copy2.sql @@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b)); COPY x from stdin (convert_selectively (a), convert_selectively (b)); COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii'); COPY x from stdin (on_error ignore, on_error ignore); +COPY x from stdin (on_error set_to_null, on_error ignore); COPY x from stdin (log_verbosity default, log_verbosity verbose); -- incorrect options COPY x from stdin (format BINARY, delimiter ','); COPY x from stdin (format BINARY, null 'x'); COPY x from stdin (format BINARY, on_error ignore); +COPY x from stdin (format BINARY, on_error set_to_null); +COPY x FROM stdin (on_error set_to_null, reject_limit 2); COPY x from stdin (on_error unsupported); COPY x from stdin (format TEXT, force_quote(a)); COPY x from stdin (format TEXT, force_quote *); @@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *); COPY x to stdout (format CSV, force_null(a)); COPY x to stdout (format CSV, force_null *); COPY x to stdout (format BINARY, on_error unsupported); +COPY x to stdin (on_error set_to_null); COPY x from stdin (log_verbosity unsupported); COPY x from stdin with (reject_limit 1); COPY x from stdin with (on_error ignore, reject_limit 0); @@ -534,6 +538,45 @@ a {2} 2 8 {8} 8 \. +CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0); +CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0); +CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT); + +\pset null NULL +--fail, column a cannot set to null value +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +\N 11 13 +\. + +--fail, column a is domain with not-null constraint +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +a 11 14 +\. + +--fail, column a cannot set to null value +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +-1 11 13 +\. + +--fail. less data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null); +1,1 +\. +--fail. extra data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null); +1,2,3,4 +\. + +--ok +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose); +10 a d +11 b 12 +13 14 e +\. + +-- check inserted content +select * from t_on_error_null; +\pset null '' -- tests for on_error option with log_verbosity and null constraint via domain CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL; CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2); @@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view; DROP VIEW instead_of_insert_tbl_view_2; DROP FUNCTION fun_instead_of_insert_tbl(); DROP TABLE check_ign_err; +DROP TABLE t_on_error_null; +DROP DOMAIN d_int_not_null; +DROP DOMAIN d_int_positive_maybe_null; DROP TABLE check_ign_err2; DROP DOMAIN dcheck_ign_err2; DROP TABLE hard_err; -- 2.34.1 ^ permalink raw reply [nested|flat] 20+ messages in thread
* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row @ 2025-03-21 06:34 vignesh C <[email protected]> parent: jian he <[email protected]> 0 siblings, 1 reply; 20+ messages in thread From: vignesh C @ 2025-03-21 06:34 UTC (permalink / raw) To: jian he <[email protected]>; +Cc: Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]> On Tue, 18 Mar 2025 at 09:26, jian he <[email protected]> wrote: > > changed based on this. > > also minor documentation tweaks. Few comments: 1) I felt this is wrong: diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c index 9a4d993e2bc..7980513a9bd 100644 --- a/src/bin/psql/tab-complete.in.c +++ b/src/bin/psql/tab-complete.in.c @@ -3280,7 +3280,7 @@ match_previous_words(int pattern_id, COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL", "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE", "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT", - "ON_ERROR", "LOG_VERBOSITY"); + "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY"); as the following fails: postgres=# copy t_on_error_null from stdin WITH ( set_to_null ); ERROR: option "set_to_null" not recognized LINE 1: copy t_on_error_null from stdin WITH ( set_to_null ); 2) Can you limit this to 80 chars if possible to improve the readability: + <literal>stop</literal> means fail the command, + <literal>ignore</literal> means discard the input row and continue with the next one, and + <literal>set_to_null</literal> means replace columns containing invalid input values with + <literal>NULL</literal> and move to the next field. 3) similarly here too: + For <literal>ignore</literal> option, + a <literal>NOTICE</literal> message containing the ignored row count is + emitted at the end of the <command>COPY FROM</command> if at least one row was discarded. + For <literal>set_to_null</literal> option, + a <literal>NOTICE</literal> message indicating the number of rows where invalid input values were replaced with + null is emitted at the end of the <command>COPY FROM</command> if at least one row was replaced. 4) Could you mention a brief one line in the commit message as to why "on_error null" cannot be used: Extent "on_error action", introduce new option: on_error set_to_null. Current grammar makes us unable to use "on_error null", so we choose "on_error set_to_null". Regards, Vignesh ^ permalink raw reply [nested|flat] 20+ messages in thread
* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row @ 2025-03-24 07:50 jian he <[email protected]> parent: vignesh C <[email protected]> 0 siblings, 1 reply; 20+ messages in thread From: jian he @ 2025-03-24 07:50 UTC (permalink / raw) To: vignesh C <[email protected]>; +Cc: Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]> On Fri, Mar 21, 2025 at 2:34 PM vignesh C <[email protected]> wrote: > > Few comments: > 1) I felt this is wrong: > diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c > index 9a4d993e2bc..7980513a9bd 100644 > --- a/src/bin/psql/tab-complete.in.c > +++ b/src/bin/psql/tab-complete.in.c > @@ -3280,7 +3280,7 @@ match_previous_words(int pattern_id, > COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL", > "HEADER", "QUOTE", "ESCAPE", > "FORCE_QUOTE", > "FORCE_NOT_NULL", > "FORCE_NULL", "ENCODING", "DEFAULT", > - "ON_ERROR", "LOG_VERBOSITY"); > + "ON_ERROR", "SET_TO_NULL", > "LOG_VERBOSITY"); > > as the following fails: > postgres=# copy t_on_error_null from stdin WITH ( set_to_null ); > ERROR: option "set_to_null" not recognized > LINE 1: copy t_on_error_null from stdin WITH ( set_to_null ); > - COMPLETE_WITH("stop", "ignore"); + COMPLETE_WITH("stop", "ignore", "set_to_null"); yech. I think I fixed this. > 2) Can you limit this to 80 chars if possible to improve the readability: > + <literal>stop</literal> means fail the command, > + <literal>ignore</literal> means discard the input row and > continue with the next one, and > + <literal>set_to_null</literal> means replace columns containing > invalid input values with > + <literal>NULL</literal> and move to the next field. > > 3) similarly here too: > + For <literal>ignore</literal> option, > + a <literal>NOTICE</literal> message containing the ignored row count is > + emitted at the end of the <command>COPY FROM</command> if at > least one row was discarded. > + For <literal>set_to_null</literal> option, > + a <literal>NOTICE</literal> message indicating the number of > rows where invalid input values were replaced with > + null is emitted at the end of the <command>COPY FROM</command> > if at least one row was replaced. > sure. > 4) Could you mention a brief one line in the commit message as to why > "on_error null" cannot be used: > Extent "on_error action", introduce new option: on_error set_to_null. > Current grammar makes us unable to use "on_error null", so we choose > "on_error set_to_null". by the following changes, we can change to (on_error null). --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -3579,6 +3579,7 @@ copy_generic_opt_elem: copy_generic_opt_arg: opt_boolean_or_string { $$ = (Node *) makeString($1); } + | NULL_P { $$ = (Node *) makeString("null"); } | NumericOnly { $$ = (Node *) $1; } | '*' { $$ = (Node *) makeNode(A_Star); } | DEFAULT { $$ = (Node *) makeString("default"); } COPY x from stdin (format null); ERROR: syntax error at or near "null" LINE 1: COPY x from stdin (format null); ^ will become COPY x from stdin (format null); ERROR: COPY format "null" not recognized LINE 1: COPY x from stdin (format null); ^ it will cause NULL_P from reserved word to non-reserved word in the COPY related command. I am not sure this is what we want. Anyway, I attached both two version (ON_ERROR SET_TO_NULL) (ON_ERROR NULL). Attachments: [text/x-patch] v14-0001-COPY-on_error-set_to_null.patch (19.9K, 2-v14-0001-COPY-on_error-set_to_null.patch) download | inline diff: From 7a6b7edf0877bd9c5bb2629888d3d9c29618ca5d Mon Sep 17 00:00:00 2001 From: jian he <[email protected]> Date: Mon, 24 Mar 2025 15:14:17 +0800 Subject: [PATCH v14 1/1] COPY (on_error set_to_null) Extent "on_error action", introduce new option: on_error set_to_null. Current grammar makes us unable to use "on_error null". if we did it, then in all the COPY command options's value, null will becomen reserved to non-reserved word. so we choose "on_error set_to_null". Any data type conversion errors during the COPY FROM process will result in the affected column being set to NULL. This only applies when using the non-binary format for COPY FROM. However, the not-null constraint will still be enforced. If a column have not-null constraint, successful (on_error set_to_null) action will cause not-null constraint violation. This also apply to column type is domain with not-null constraint. A regression test for a domain with a not-null constraint has been added. Author: Jian He <[email protected]> Author: Kirill Reshke <[email protected]> Reviewed-by: Fujii Masao <[email protected]> Jim Jones <[email protected]> "David G. Johnston" <[email protected]> Yugo NAGATA <[email protected]> torikoshia <[email protected]> discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com --- doc/src/sgml/ref/copy.sgml | 36 ++++++++++++----- src/backend/commands/copy.c | 6 ++- src/backend/commands/copyfrom.c | 29 +++++++++----- src/backend/commands/copyfromparse.c | 43 +++++++++++++++++++- src/bin/psql/tab-complete.in.c | 2 +- src/include/commands/copy.h | 1 + src/test/regress/expected/copy2.out | 60 ++++++++++++++++++++++++++++ src/test/regress/sql/copy2.sql | 46 +++++++++++++++++++++ 8 files changed, 198 insertions(+), 25 deletions(-) diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index df093da97c5..1909c11edff 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -394,23 +394,36 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable Specifies how to behave when encountering an error converting a column's input value into its data type. An <replaceable class="parameter">error_action</replaceable> value of - <literal>stop</literal> means fail the command, while - <literal>ignore</literal> means discard the input row and continue with the next one. + <literal>stop</literal> means fail the command, + <literal>ignore</literal> means discard the input row and continue with the next one, + and <literal>set_to_null</literal> means replace columns containing invalid + input values with <literal>NULL</literal> and move to the next field. The default is <literal>stop</literal>. </para> <para> - The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command> + The <literal>ignore</literal> and <literal>set_to_null</literal> + options are applicable only for <command>COPY FROM</command> when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>. </para> + <para> + For <literal>ignore</literal> option, a <literal>NOTICE</literal> message + containing the ignored row count is emitted at the end of the <command>COPY + FROM</command> if at least one row was discarded. + For <literal>set_to_null</literal> option, + a <literal>NOTICE</literal> message indicating the number of rows + where invalid input values were replaced with null is emitted + at the end of the <command>COPY FROM</command> if at least one row was replaced. + </para> <para> - A <literal>NOTICE</literal> message containing the ignored row count is - emitted at the end of the <command>COPY FROM</command> if at least one - row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to - <literal>verbose</literal>, a <literal>NOTICE</literal> message + When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>, + for <literal>ignore</literal> option, a <literal>NOTICE</literal> message containing the line of the input file and the column name whose input - conversion has failed is emitted for each discarded row. - When it is set to <literal>silent</literal>, no message is emitted - regarding ignored rows. + conversion has failed is emitted for each discarded row; + for <literal>set_to_null</literal> option, a <literal>NOTICE</literal> + message containing the line of the input file and the column name where + value was replaced with <literal>NULL</literal> for each input conversion + failure. + When it is set to <literal>silent</literal>, no message is emitted regarding input conversion failed rows. </para> </listitem> </varlistentry> @@ -458,7 +471,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable </para> <para> This is currently used in <command>COPY FROM</command> command when - <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>. + <literal>ON_ERROR</literal> option is set to <literal>ignore</literal> + or <literal>set_to_null</literal>. </para> </listitem> </varlistentry> diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index cfca9d9dc29..afe60758d40 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from) parser_errposition(pstate, def->location))); /* - * Allow "stop", or "ignore" values. + * Allow "stop", "ignore", "set_to_null" values. */ if (pg_strcasecmp(sval, "stop") == 0) return COPY_ON_ERROR_STOP; if (pg_strcasecmp(sval, "ignore") == 0) return COPY_ON_ERROR_IGNORE; + if (pg_strcasecmp(sval, "set_to_null") == 0) + return COPY_ON_ERROR_NULL; ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), @@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("only ON_ERROR STOP is allowed in BINARY mode"))); - if (opts_out->reject_limit && !opts_out->on_error) + if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), /*- translator: first and second %s are the names of COPY option, e.g. diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index bcf66f0adf8..43a227eae72 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate) /* Done, clean up */ error_context_stack = errcallback.previous; - if (cstate->opts.on_error != COPY_ON_ERROR_STOP && - cstate->num_errors > 0 && + if (cstate->num_errors > 0 && cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT) - ereport(NOTICE, - errmsg_plural("%llu row was skipped due to data type incompatibility", - "%llu rows were skipped due to data type incompatibility", - (unsigned long long) cstate->num_errors, - (unsigned long long) cstate->num_errors)); + { + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + ereport(NOTICE, + errmsg_plural("%llu row was skipped due to data type incompatibility", + "%llu rows were skipped due to data type incompatibility", + (unsigned long long) cstate->num_errors, + (unsigned long long) cstate->num_errors)); + else if (cstate->opts.on_error == COPY_ON_ERROR_NULL) + ereport(NOTICE, + errmsg_plural("invalid values in %llu row was replaced with null", + "invalid values in %llu rows were replaced with null", + (unsigned long long) cstate->num_errors, + (unsigned long long) cstate->num_errors)); + } if (bistate != NULL) FreeBulkInsertState(bistate); @@ -1622,10 +1630,11 @@ BeginCopyFrom(ParseState *pstate, cstate->escontext->error_occurred = false; /* - * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other - * options later + * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL. + * We'll add other options later */ - if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE || + cstate->opts.on_error == COPY_ON_ERROR_NULL) cstate->escontext->details_wanted = false; } else diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index e8128f85e6b..e2b4d1f7ec9 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -947,6 +947,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, int fldct; int fieldno; char *string; + bool current_row_erroneous = false; tupDesc = RelationGetDescr(cstate->rel); attr_count = list_length(cstate->attnumlist); @@ -1025,6 +1026,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, /* * If ON_ERROR is specified with IGNORE, skip rows with soft errors + * If ON_ERROR is specified with set_to_null, try to replace with null. */ else if (!InputFunctionCallSafe(&in_functions[m], string, @@ -1035,9 +1037,46 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, { Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); + if (cstate->opts.on_error == COPY_ON_ERROR_NULL) + { + /* + * we use this count the number of rows (not fields) that + * successfully applied the on_error set_to_null + */ + if (!current_row_erroneous) + current_row_erroneous = true; + + /* + * we need another InputFunctionCallSafe so we can error out + * not-null violation for domain with not-null constraint. + */ + cstate->escontext->error_occurred = false; + if (InputFunctionCallSafe(&in_functions[m], + NULL, + typioparams[m], + att->atttypmod, + (Node *) cstate->escontext, + &values[m])) + { + nulls[m] = true; + values[m] = (Datum) 0; + if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) + ereport(NOTICE, + errmsg("column \"%s\" was set to null due to data type incompatibility at line %llu", + cstate->cur_attname, + (unsigned long long) cstate->cur_lineno)); + continue; + } + else + ereport(ERROR, + errcode(ERRCODE_NOT_NULL_VIOLATION), + errmsg("domain %s does not allow null values", format_type_be(typioparams[m])), + errdatatype(typioparams[m])); + } cstate->num_errors++; - if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) + if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE && + cstate->opts.on_error == COPY_ON_ERROR_IGNORE) { /* * Since we emit line number and column info in the below @@ -1076,6 +1115,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, cstate->cur_attval = NULL; } + if (current_row_erroneous) + cstate->num_errors++; Assert(fieldno == attr_count); return true; diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c index 98951aef82c..c79b3af0495 100644 --- a/src/bin/psql/tab-complete.in.c +++ b/src/bin/psql/tab-complete.in.c @@ -3291,7 +3291,7 @@ match_previous_words(int pattern_id, /* Complete COPY <sth> FROM filename WITH (ON_ERROR */ else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR")) - COMPLETE_WITH("stop", "ignore"); + COMPLETE_WITH("stop", "ignore", "set_to_null"); /* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */ else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "LOG_VERBOSITY")) diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 06dfdfef721..7ebf4f78933 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice { COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */ COPY_ON_ERROR_IGNORE, /* ignore errors */ + COPY_ON_ERROR_NULL, /* set error field to null */ } CopyOnErrorChoice; /* diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out index 64ea33aeae8..caa94bfd526 100644 --- a/src/test/regress/expected/copy2.out +++ b/src/test/regress/expected/copy2.out @@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore); ERROR: conflicting or redundant options LINE 1: COPY x from stdin (on_error ignore, on_error ignore); ^ +COPY x from stdin (on_error set_to_null, on_error ignore); +ERROR: conflicting or redundant options +LINE 1: COPY x from stdin (on_error set_to_null, on_error ignore); + ^ COPY x from stdin (log_verbosity default, log_verbosity verbose); ERROR: conflicting or redundant options LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb... @@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x'); ERROR: cannot specify NULL in BINARY mode COPY x from stdin (format BINARY, on_error ignore); ERROR: only ON_ERROR STOP is allowed in BINARY mode +COPY x from stdin (format BINARY, on_error set_to_null); +ERROR: only ON_ERROR STOP is allowed in BINARY mode +COPY x FROM stdin (on_error set_to_null, reject_limit 2); +ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE COPY x from stdin (on_error unsupported); ERROR: COPY ON_ERROR "unsupported" not recognized LINE 1: COPY x from stdin (on_error unsupported); @@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported); ERROR: COPY ON_ERROR cannot be used with COPY TO LINE 1: COPY x to stdout (format BINARY, on_error unsupported); ^ +COPY x to stdin (on_error set_to_null); +ERROR: COPY ON_ERROR cannot be used with COPY TO +LINE 1: COPY x to stdin (on_error set_to_null); + ^ COPY x from stdin (log_verbosity unsupported); ERROR: COPY LOG_VERBOSITY "unsupported" not recognized LINE 1: COPY x from stdin (log_verbosity unsupported); @@ -769,6 +781,51 @@ CONTEXT: COPY check_ign_err NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a" CONTEXT: COPY check_ign_err NOTICE: 6 rows were skipped due to data type incompatibility +CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0); +CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0); +CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT); +\pset null NULL +--fail, column a cannot set to null value +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +ERROR: domain d_int_not_null does not allow null values +CONTEXT: COPY t_on_error_null, line 1, column a: null input +--fail, column a is domain with not-null constraint +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +ERROR: domain d_int_not_null does not allow null values +CONTEXT: COPY t_on_error_null, line 1, column a: "a" +--fail, column a cannot set to null value +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +ERROR: domain d_int_not_null does not allow null values +CONTEXT: COPY t_on_error_null, line 1, column a: "-1" +--fail. less data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null); +ERROR: missing data for column "c" +CONTEXT: COPY t_on_error_null, line 1: "1,1" +--fail. extra data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null); +ERROR: extra data after last expected column +CONTEXT: COPY t_on_error_null, line 1: "1,2,3,4" +--ok +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose); +NOTICE: column "b" was set to null due to data type incompatibility at line 1 +CONTEXT: COPY t_on_error_null, line 1, column b: "a" +NOTICE: column "c" was set to null due to data type incompatibility at line 1 +CONTEXT: COPY t_on_error_null, line 1, column c: "d" +NOTICE: column "b" was set to null due to data type incompatibility at line 2 +CONTEXT: COPY t_on_error_null, line 2, column b: "b" +NOTICE: column "c" was set to null due to data type incompatibility at line 3 +CONTEXT: COPY t_on_error_null, line 3, column c: "e" +NOTICE: invalid values in 3 rows were replaced with null +-- check inserted content +select * from t_on_error_null; + a | b | c +----+------+------ + 10 | NULL | NULL + 11 | NULL | 12 + 13 | 14 | NULL +(3 rows) + +\pset null '' -- tests for on_error option with log_verbosity and null constraint via domain CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL; CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2); @@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view; DROP VIEW instead_of_insert_tbl_view_2; DROP FUNCTION fun_instead_of_insert_tbl(); DROP TABLE check_ign_err; +DROP TABLE t_on_error_null; +DROP DOMAIN d_int_not_null; +DROP DOMAIN d_int_positive_maybe_null; DROP TABLE check_ign_err2; DROP DOMAIN dcheck_ign_err2; DROP TABLE hard_err; diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql index 45273557ce0..003a91648e2 100644 --- a/src/test/regress/sql/copy2.sql +++ b/src/test/regress/sql/copy2.sql @@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b)); COPY x from stdin (convert_selectively (a), convert_selectively (b)); COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii'); COPY x from stdin (on_error ignore, on_error ignore); +COPY x from stdin (on_error set_to_null, on_error ignore); COPY x from stdin (log_verbosity default, log_verbosity verbose); -- incorrect options COPY x from stdin (format BINARY, delimiter ','); COPY x from stdin (format BINARY, null 'x'); COPY x from stdin (format BINARY, on_error ignore); +COPY x from stdin (format BINARY, on_error set_to_null); +COPY x FROM stdin (on_error set_to_null, reject_limit 2); COPY x from stdin (on_error unsupported); COPY x from stdin (format TEXT, force_quote(a)); COPY x from stdin (format TEXT, force_quote *); @@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *); COPY x to stdout (format CSV, force_null(a)); COPY x to stdout (format CSV, force_null *); COPY x to stdout (format BINARY, on_error unsupported); +COPY x to stdin (on_error set_to_null); COPY x from stdin (log_verbosity unsupported); COPY x from stdin with (reject_limit 1); COPY x from stdin with (on_error ignore, reject_limit 0); @@ -534,6 +538,45 @@ a {2} 2 8 {8} 8 \. +CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0); +CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0); +CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT); + +\pset null NULL +--fail, column a cannot set to null value +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +\N 11 13 +\. + +--fail, column a is domain with not-null constraint +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +a 11 14 +\. + +--fail, column a cannot set to null value +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null); +-1 11 13 +\. + +--fail. less data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null); +1,1 +\. +--fail. extra data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null); +1,2,3,4 +\. + +--ok +COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose); +10 a d +11 b 12 +13 14 e +\. + +-- check inserted content +select * from t_on_error_null; +\pset null '' -- tests for on_error option with log_verbosity and null constraint via domain CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL; CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2); @@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view; DROP VIEW instead_of_insert_tbl_view_2; DROP FUNCTION fun_instead_of_insert_tbl(); DROP TABLE check_ign_err; +DROP TABLE t_on_error_null; +DROP DOMAIN d_int_not_null; +DROP DOMAIN d_int_positive_maybe_null; DROP TABLE check_ign_err2; DROP DOMAIN dcheck_ign_err2; DROP TABLE hard_err; -- 2.34.1 [application/octet-stream] v14-0001-COPY-on_error-null.nocfbot (20.0K, 3-v14-0001-COPY-on_error-null.nocfbot) download ^ permalink raw reply [nested|flat] 20+ messages in thread
* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row @ 2025-03-25 06:31 vignesh C <[email protected]> parent: jian he <[email protected]> 0 siblings, 1 reply; 20+ messages in thread From: vignesh C @ 2025-03-25 06:31 UTC (permalink / raw) To: jian he <[email protected]>; +Cc: Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]> On Mon, 24 Mar 2025 at 13:21, jian he <[email protected]> wrote: > > I am not sure this is what we want. > Anyway, I attached both two version Few comments 1) I understood the problem, your first approach is ok for me. 2) Here in error we say column c1 violates not-null constraint and in the context we show column c2, should the context also display c2 column: postgres=# create table t3(c1 int not null, c2 int, check (c1 > 10)); CREATE TABLE postgres=# COPY t3 FROM STDIN WITH (on_error set_to_null); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> a b >> \. ERROR: null value in column "c1" of relation "t3" violates not-null constraint DETAIL: Failing row contains (null, null). CONTEXT: COPY t3, line 1, column c2: "b" 3) typo becomen should be become: null will becomen reserved to non-reserved 4) There is a whitespace error while applying patch Applying: COPY (on_error set_to_null) .git/rebase-apply/patch:39: trailing whitespace. a <literal>NOTICE</literal> message indicating the number of rows warning: 1 line adds whitespace errors. Regards, Vignesh ^ permalink raw reply [nested|flat] 20+ messages in thread
* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row @ 2025-07-01 14:54 torikoshia <[email protected]> parent: vignesh C <[email protected]> 0 siblings, 1 reply; 20+ messages in thread From: torikoshia @ 2025-07-01 14:54 UTC (permalink / raw) To: jian he <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; vignesh C <[email protected]>; Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; PostgreSQL Hackers <[email protected]> Hi, Thanks for updating the patch and I've read v17-0001-COPY-on_error-set_null.patch and here are some comments. > +COPY x from stdin (on_error set_null, reject_limit 2); > +ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE I understand that REJECT_LIMIT is out of scope for this patch, but personally, I feel that supporting REJECT_LIMIT with ON_ERROR SET_NULL would be a natural extension. - Both IGNORE and SET_NULL share the common behavior of allowing COPY to continue despite soft errors. - Since REJECT_LIMIT defines the threshold for how many soft errors can be tolerated before COPY fails, it seems consistent to allow it with SET_NULL as well. + if (current_row_erroneous) + cstate->num_errors++; Is there any reason this error counting isn't placed inside the "if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)" block? As far as I can tell, current_row_erroneous is only modified within that block, so it might make sense to keep this logic together for clarity. These may be very minor, but I noticed a few inconsistencies in casing and wording: + * If ON_ERROR is specified with IGNORE, skip rows with soft errors. + * If ON_ERROR is specified with set_null, try to replace with null. IGNORE is in uppercase, but set_null is lowercase. + * we use it to count number of rows (not fields!) that + * successfully applied on_error set_null. The sentence should begin with a capital: "We use it..." Also, I felt it's unclear what "we use it" means. Does it necessary? +COPY x to stdout (on_error set_null); +ERROR: COPY ON_ERROR cannot be used with COPY TO +LINE 1: COPY x to stdout (on_error set_null); COPY is uppercase, but to is lowercase. +COPY x from stdin (format BINARY, on_error set_null); +ERROR: only ON_ERROR STOP is allowed in BINARY mode +COPY x from stdin (on_error set_null, reject_limit 2); +ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE ... +COPY t_on_error_null FROM STDIN WITH (on_error set_null); +ERROR: domain d_int_not_null does not allow null values +CONTEXT: COPY t_on_error_null, line 1, column a: null input It might be better to consider standardizing casing across all COPY statements (e.g., COPY ... TO, COPY ... FROM STDIN) for consistency. -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA Japan Corporation to SRA OSS K.K. ^ permalink raw reply [nested|flat] 20+ messages in thread
* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row @ 2025-07-02 09:25 jian he <[email protected]> parent: torikoshia <[email protected]> 0 siblings, 1 reply; 20+ messages in thread From: jian he @ 2025-07-02 09:25 UTC (permalink / raw) To: torikoshia <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; vignesh C <[email protected]>; Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; PostgreSQL Hackers <[email protected]> On Tue, Jul 1, 2025 at 10:54 PM torikoshia <[email protected]> wrote: > > Hi, > > Thanks for updating the patch and I've read > v17-0001-COPY-on_error-set_null.patch and here are some comments. > > + if (current_row_erroneous) > + cstate->num_errors++; > > Is there any reason this error counting isn't placed inside the "if > (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)" block? > As far as I can tell, current_row_erroneous is only modified within that > block, so it might make sense to keep this logic together for clarity. > > These may be very minor, but I noticed a few inconsistencies in casing > and wording: > > + * If ON_ERROR is specified with IGNORE, skip rows with > soft errors. > + * If ON_ERROR is specified with set_null, try to > replace with null. > > IGNORE is in uppercase, but set_null is lowercase. > > + * we use it to count number of rows > (not fields!) that > + * successfully applied on_error > set_null. > > The sentence should begin with a capital: "We use it..." > Also, I felt it's unclear what "we use it" means. Does it necessary? > hi. I changed this comment, also heavily refactored CopyFromTextLikeOneRow based on v17-0001-COPY-on_error-set_null.patch. Now it looks way more intuitive, IMHO. CopyFromTextLikeOneRow else if (!InputFunctionCallSafe(&in_functions[m], string, typioparams[m], att->atttypmod, (Node *) cstate->escontext, &values[m])) { if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) ////code for on_errr ignore else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL) ////code for on_errr set_null if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) //code for verbose message for on_error ignore or on_error set_null if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) ////code for on_errr ignore loop control else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL) ////code for on_errr set_null loop control } > +COPY x to stdout (on_error set_null); > +ERROR: COPY ON_ERROR cannot be used with COPY TO > +LINE 1: COPY x to stdout (on_error set_null); > > COPY is uppercase, but to is lowercase. > > +COPY x from stdin (format BINARY, on_error set_null); > +ERROR: only ON_ERROR STOP is allowed in BINARY mode > +COPY x from stdin (on_error set_null, reject_limit 2); > +ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE > ... > +COPY t_on_error_null FROM STDIN WITH (on_error set_null); > +ERROR: domain d_int_not_null does not allow null values > +CONTEXT: COPY t_on_error_null, line 1, column a: null input > > It might be better to consider standardizing casing across all COPY > statements (e.g., COPY ... TO, COPY ... FROM STDIN) for consistency. > I followed near code conventions, changing the casing here seems not necessary. Attachments: [text/x-patch] v18-0001-COPY-on_error-set_null.patch (22.7K, 2-v18-0001-COPY-on_error-set_null.patch) download | inline diff: From feded9f7562f608ec97cbb08399661c6494df021 Mon Sep 17 00:00:00 2001 From: jian he <[email protected]> Date: Wed, 2 Jul 2025 16:32:57 +0800 Subject: [PATCH v18 1/1] COPY (on_error set_null) Current grammar makes us unable to use "on_error null". if we did it, then in all the COPY command options's value, null will become reserved to non-reserved words. so we choose "on_error set_null". When COPY FROM, if ON_ERROR SET_NULL is specified, any data type conversion errors will result in the affected column being set to NULL. However, column's not-null constraints are still enforced, attempting to set a NULL value in such columns will raise a constraint violation error. This applies to column data type is a domain with a NOT NULL constraint. Author: Jian He <[email protected]> Author: Kirill Reshke <[email protected]> Reviewed-by: Fujii Masao <[email protected]> Jim Jones <[email protected]> "David G. Johnston" <[email protected]> Yugo NAGATA <[email protected]> torikoshia <[email protected]> Masahiko Sawada <[email protected]> Atsushi Torikoshi <[email protected]> discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com commitfest entry: https://commitfest.postgresql.org/patch/4810 --- doc/src/sgml/ref/copy.sgml | 35 +++++++--- src/backend/commands/copy.c | 6 +- src/backend/commands/copyfrom.c | 42 +++++++++--- src/backend/commands/copyfromparse.c | 84 ++++++++++++++++++++---- src/bin/psql/tab-complete.in.c | 2 +- src/include/commands/copy.h | 1 + src/include/commands/copyfrom_internal.h | 7 ++ src/test/regress/expected/copy2.out | 60 +++++++++++++++++ src/test/regress/sql/copy2.sql | 46 +++++++++++++ 9 files changed, 247 insertions(+), 36 deletions(-) diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 8433344e5b6..26fb4be1709 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -394,23 +394,37 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable Specifies how to behave when encountering an error converting a column's input value into its data type. An <replaceable class="parameter">error_action</replaceable> value of - <literal>stop</literal> means fail the command, while - <literal>ignore</literal> means discard the input row and continue with the next one. + <literal>stop</literal> means fail the command, + <literal>ignore</literal> means discard the input row and continue with the next one, + and <literal>set_null</literal> means replace column containing invalid + input value with <literal>NULL</literal> and move to the next field. The default is <literal>stop</literal>. </para> <para> - The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command> + The <literal>ignore</literal> and <literal>set_null</literal> + options are applicable only for <command>COPY FROM</command> when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>. </para> + <para> + For <literal>ignore</literal> option, a <literal>NOTICE</literal> message + containing the ignored row count is emitted at the end of the <command>COPY + FROM</command> if at least one row was discarded. + For <literal>set_null</literal> option, + a <literal>NOTICE</literal> message indicating the number of rows + where invalid input values were replaced with null is emitted + at the end of the <command>COPY FROM</command> if at least one row was replaced. + </para> <para> - A <literal>NOTICE</literal> message containing the ignored row count is - emitted at the end of the <command>COPY FROM</command> if at least one - row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to - <literal>verbose</literal>, a <literal>NOTICE</literal> message + When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>, + for <literal>ignore</literal> option, a <literal>NOTICE</literal> message containing the line of the input file and the column name whose input - conversion has failed is emitted for each discarded row. + conversion has failed is emitted for each discarded row; + for <literal>set_null</literal> option, a <literal>NOTICE</literal> + message containing the line of the input file and the column name where + value was replaced with <literal>NULL</literal> for each input conversion + failure. When it is set to <literal>silent</literal>, no message is emitted - regarding ignored rows. + regarding input conversion failed rows. </para> </listitem> </varlistentry> @@ -458,7 +472,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable </para> <para> This is currently used in <command>COPY FROM</command> command when - <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>. + <literal>ON_ERROR</literal> option is set to <literal>ignore</literal> + or <literal>set_null</literal>. </para> </listitem> </varlistentry> diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 74ae42b19a7..f963d0e51ff 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from) parser_errposition(pstate, def->location))); /* - * Allow "stop", or "ignore" values. + * Allow "stop", "ignore", "set_null" values. */ if (pg_strcasecmp(sval, "stop") == 0) return COPY_ON_ERROR_STOP; if (pg_strcasecmp(sval, "ignore") == 0) return COPY_ON_ERROR_IGNORE; + if (pg_strcasecmp(sval, "set_null") == 0) + return COPY_ON_ERROR_SET_NULL; ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), @@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("only ON_ERROR STOP is allowed in BINARY mode"))); - if (opts_out->reject_limit && !opts_out->on_error) + if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), /*- translator: first and second %s are the names of COPY option, e.g. diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index fbbbc09a97b..750d597d4d0 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate) /* Done, clean up */ error_context_stack = errcallback.previous; - if (cstate->opts.on_error != COPY_ON_ERROR_STOP && - cstate->num_errors > 0 && + if (cstate->num_errors > 0 && cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT) - ereport(NOTICE, - errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility", - "%" PRIu64 " rows were skipped due to data type incompatibility", - cstate->num_errors, - cstate->num_errors)); + { + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + ereport(NOTICE, + errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility", + "%" PRIu64 " rows were skipped due to data type incompatibility", + cstate->num_errors, + cstate->num_errors)); + else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL) + ereport(NOTICE, + errmsg_plural("invalid values in %" PRIu64 " row was replaced with null due to data type incompatibility", + "invalid values in %" PRIu64 " rows were replaced with null due to data type incompatibility", + cstate->num_errors, + cstate->num_errors)); + } if (bistate != NULL) FreeBulkInsertState(bistate); @@ -1614,6 +1622,19 @@ BeginCopyFrom(ParseState *pstate, } } + if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL) + { + int attr_count = list_length(cstate->attnumlist); + + cstate->domain_with_constraint = (bool *) palloc0(attr_count * sizeof(bool)); + foreach_int(attno, cstate->attnumlist) + { + int i = foreach_current_index(attno); + Form_pg_attribute att = TupleDescAttr(tupDesc, attno - 1); + cstate->domain_with_constraint[i] = DomainHasConstraints(att->atttypid); + } + } + /* Set up soft error handler for ON_ERROR */ if (cstate->opts.on_error != COPY_ON_ERROR_STOP) { @@ -1622,10 +1643,11 @@ BeginCopyFrom(ParseState *pstate, cstate->escontext->error_occurred = false; /* - * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other - * options later + * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_SET_NULL. + * We'll add other options later */ - if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE || + cstate->opts.on_error == COPY_ON_ERROR_SET_NULL) cstate->escontext->details_wanted = false; } else diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index f52f2477df1..2147d8423fd 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -947,6 +947,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, int fldct; int fieldno; char *string; + bool current_row_erroneous = false; tupDesc = RelationGetDescr(cstate->rel); attr_count = list_length(cstate->attnumlist); @@ -1024,7 +1025,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, } /* - * If ON_ERROR is specified with IGNORE, skip rows with soft errors + * If ON_ERROR is specified with IGNORE, skip rows with soft errors. + * If ON_ERROR is specified with SET_NULL, try to replace with null. */ else if (!InputFunctionCallSafe(&in_functions[m], string, @@ -1035,7 +1037,50 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, { Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); - cstate->num_errors++; + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + cstate->num_errors++; + else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL) + { + cstate->escontext->error_occurred = false; + Assert(cstate->domain_with_constraint != NULL); + + /* + * When the column's type is a domain with constraints, an + * additional InputFunctionCallSafe may be needed to raise + * errors for domain constraint violations. + */ + if (!cstate->domain_with_constraint[m] || + InputFunctionCallSafe(&in_functions[m], + NULL, + typioparams[m], + att->atttypmod, + (Node *) cstate->escontext, + &values[m])) + { + nulls[m] = true; + values[m] = (Datum) 0; + } + else if (string == NULL) + ereport(ERROR, + errcode(ERRCODE_NOT_NULL_VIOLATION), + errmsg("domain %s does not allow null values", format_type_be(typioparams[m])), + errdatatype(typioparams[m])); + else + ereport(ERROR, + errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("invalid input value for domain %s: \"%s\"", + format_type_be(typioparams[m]), string)); + + /* + * We count only the number of rows (not individual fields) + * where ON_ERROR SET_NULL was successfully applied. + */ + if (!current_row_erroneous) + { + current_row_erroneous = true; + cstate->num_errors++; + } + } if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) { @@ -1052,24 +1097,37 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, char *attval; attval = CopyLimitPrintoutLength(cstate->cur_attval); - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"", - cstate->cur_lineno, - cstate->cur_attname, - attval)); + + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"", + cstate->cur_lineno, + cstate->cur_attname, + attval)); + else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL) + ereport(NOTICE, + errmsg("setting to null due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"", + cstate->cur_lineno, + cstate->cur_attname, + attval)); pfree(attval); } else - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input", - cstate->cur_lineno, - cstate->cur_attname)); - + { + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input", + cstate->cur_lineno, + cstate->cur_attname)); + } /* reset relname_only */ cstate->relname_only = false; } - return true; + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + return true; + else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL) + continue; } cstate->cur_attname = NULL; diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c index 8c2ea0b9587..a587f4162ee 100644 --- a/src/bin/psql/tab-complete.in.c +++ b/src/bin/psql/tab-complete.in.c @@ -3305,7 +3305,7 @@ match_previous_words(int pattern_id, /* Complete COPY <sth> FROM filename WITH (ON_ERROR */ else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR")) - COMPLETE_WITH("stop", "ignore"); + COMPLETE_WITH("stop", "ignore", "set_null"); /* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */ else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "LOG_VERBOSITY")) diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 06dfdfef721..935d21ee77a 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice { COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */ COPY_ON_ERROR_IGNORE, /* ignore errors */ + COPY_ON_ERROR_SET_NULL, /* set error field to null */ } CopyOnErrorChoice; /* diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index c8b22af22d8..c82bfab4636 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -108,6 +108,13 @@ typedef struct CopyFromStateData * att */ bool *defaults; /* if DEFAULT marker was found for * corresponding att */ + /* + * Set to true if the corresponding attribute's data type is a domain with + * constraints. This field is usually NULL, except when ON_ERROR is set to + * SET_NULL. + */ + bool *domain_with_constraint; + bool volatile_defexprs; /* is any of defexprs volatile? */ List *range_table; /* single element list of RangeTblEntry */ List *rteperminfos; /* single element list of RTEPermissionInfo */ diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out index 64ea33aeae8..3f843d1cd5c 100644 --- a/src/test/regress/expected/copy2.out +++ b/src/test/regress/expected/copy2.out @@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore); ERROR: conflicting or redundant options LINE 1: COPY x from stdin (on_error ignore, on_error ignore); ^ +COPY x from stdin (on_error set_null, on_error ignore); +ERROR: conflicting or redundant options +LINE 1: COPY x from stdin (on_error set_null, on_error ignore); + ^ COPY x from stdin (log_verbosity default, log_verbosity verbose); ERROR: conflicting or redundant options LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb... @@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x'); ERROR: cannot specify NULL in BINARY mode COPY x from stdin (format BINARY, on_error ignore); ERROR: only ON_ERROR STOP is allowed in BINARY mode +COPY x from stdin (format BINARY, on_error set_null); +ERROR: only ON_ERROR STOP is allowed in BINARY mode +COPY x from stdin (on_error set_null, reject_limit 2); +ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE COPY x from stdin (on_error unsupported); ERROR: COPY ON_ERROR "unsupported" not recognized LINE 1: COPY x from stdin (on_error unsupported); @@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported); ERROR: COPY ON_ERROR cannot be used with COPY TO LINE 1: COPY x to stdout (format BINARY, on_error unsupported); ^ +COPY x to stdout (on_error set_null); +ERROR: COPY ON_ERROR cannot be used with COPY TO +LINE 1: COPY x to stdout (on_error set_null); + ^ COPY x from stdin (log_verbosity unsupported); ERROR: COPY LOG_VERBOSITY "unsupported" not recognized LINE 1: COPY x from stdin (log_verbosity unsupported); @@ -769,6 +781,51 @@ CONTEXT: COPY check_ign_err NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a" CONTEXT: COPY check_ign_err NOTICE: 6 rows were skipped due to data type incompatibility +CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0); +CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0); +CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT); +\pset null NULL +--fail, column a cannot set to null value +COPY t_on_error_null FROM STDIN WITH (on_error set_null); +ERROR: domain d_int_not_null does not allow null values +CONTEXT: COPY t_on_error_null, line 1, column a: null input +--fail, column a is domain with not-null constraint +COPY t_on_error_null FROM STDIN WITH (on_error set_null); +ERROR: invalid input value for domain d_int_not_null: "ss" +CONTEXT: COPY t_on_error_null, line 1, column a: "ss" +--fail, column a cannot set to null value +COPY t_on_error_null FROM STDIN WITH (on_error set_null); +ERROR: invalid input value for domain d_int_not_null: "-1" +CONTEXT: COPY t_on_error_null, line 1, column a: "-1" +--fail. less data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null); +ERROR: missing data for column "c" +CONTEXT: COPY t_on_error_null, line 1: "1,1" +--fail. extra data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null); +ERROR: extra data after last expected column +CONTEXT: COPY t_on_error_null, line 1: "1,2,3,4" +--ok +COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose); +NOTICE: setting to null due to data type incompatibility at line 1 for column "b": "x1" +CONTEXT: COPY t_on_error_null +NOTICE: setting to null due to data type incompatibility at line 1 for column "c": "yx" +CONTEXT: COPY t_on_error_null +NOTICE: setting to null due to data type incompatibility at line 2 for column "b": "zx" +CONTEXT: COPY t_on_error_null +NOTICE: setting to null due to data type incompatibility at line 3 for column "c": "ea" +CONTEXT: COPY t_on_error_null +NOTICE: invalid values in 3 rows were replaced with null due to data type incompatibility +-- check inserted content +select * from t_on_error_null; + a | b | c +----+------+------ + 10 | NULL | NULL + 11 | NULL | 12 + 13 | 14 | NULL +(3 rows) + +\pset null '' -- tests for on_error option with log_verbosity and null constraint via domain CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL; CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2); @@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view; DROP VIEW instead_of_insert_tbl_view_2; DROP FUNCTION fun_instead_of_insert_tbl(); DROP TABLE check_ign_err; +DROP TABLE t_on_error_null; +DROP DOMAIN d_int_not_null; +DROP DOMAIN d_int_positive_maybe_null; DROP TABLE check_ign_err2; DROP DOMAIN dcheck_ign_err2; DROP TABLE hard_err; diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql index 45273557ce0..d77a06668e8 100644 --- a/src/test/regress/sql/copy2.sql +++ b/src/test/regress/sql/copy2.sql @@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b)); COPY x from stdin (convert_selectively (a), convert_selectively (b)); COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii'); COPY x from stdin (on_error ignore, on_error ignore); +COPY x from stdin (on_error set_null, on_error ignore); COPY x from stdin (log_verbosity default, log_verbosity verbose); -- incorrect options COPY x from stdin (format BINARY, delimiter ','); COPY x from stdin (format BINARY, null 'x'); COPY x from stdin (format BINARY, on_error ignore); +COPY x from stdin (format BINARY, on_error set_null); +COPY x from stdin (on_error set_null, reject_limit 2); COPY x from stdin (on_error unsupported); COPY x from stdin (format TEXT, force_quote(a)); COPY x from stdin (format TEXT, force_quote *); @@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *); COPY x to stdout (format CSV, force_null(a)); COPY x to stdout (format CSV, force_null *); COPY x to stdout (format BINARY, on_error unsupported); +COPY x to stdout (on_error set_null); COPY x from stdin (log_verbosity unsupported); COPY x from stdin with (reject_limit 1); COPY x from stdin with (on_error ignore, reject_limit 0); @@ -534,6 +538,45 @@ a {2} 2 8 {8} 8 \. +CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0); +CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0); +CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT); + +\pset null NULL +--fail, column a cannot set to null value +COPY t_on_error_null FROM STDIN WITH (on_error set_null); +\N 11 13 +\. + +--fail, column a is domain with not-null constraint +COPY t_on_error_null FROM STDIN WITH (on_error set_null); +ss 11 14 +\. + +--fail, column a cannot set to null value +COPY t_on_error_null FROM STDIN WITH (on_error set_null); +-1 11 13 +\. + +--fail. less data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null); +1,1 +\. +--fail. extra data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null); +1,2,3,4 +\. + +--ok +COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose); +10 x1 yx +11 zx 12 +13 14 ea +\. + +-- check inserted content +select * from t_on_error_null; +\pset null '' -- tests for on_error option with log_verbosity and null constraint via domain CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL; CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2); @@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view; DROP VIEW instead_of_insert_tbl_view_2; DROP FUNCTION fun_instead_of_insert_tbl(); DROP TABLE check_ign_err; +DROP TABLE t_on_error_null; +DROP DOMAIN d_int_not_null; +DROP DOMAIN d_int_positive_maybe_null; DROP TABLE check_ign_err2; DROP DOMAIN dcheck_ign_err2; DROP TABLE hard_err; -- 2.34.1 ^ permalink raw reply [nested|flat] 20+ messages in thread
* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row @ 2025-07-30 04:44 jian he <[email protected]> parent: jian he <[email protected]> 0 siblings, 1 reply; 20+ messages in thread From: jian he @ 2025-07-30 04:44 UTC (permalink / raw) To: torikoshia <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; vignesh C <[email protected]>; Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; PostgreSQL Hackers <[email protected]> hi. rebase. Attachments: [text/x-patch] v19-0001-COPY-on_error-set_null.patch (22.7K, 2-v19-0001-COPY-on_error-set_null.patch) download | inline diff: From b3b2d794c83b36cf129d917d527ebf2cac46ca3b Mon Sep 17 00:00:00 2001 From: jian he <[email protected]> Date: Wed, 30 Jul 2025 11:06:17 +0800 Subject: [PATCH v19 1/1] COPY (on_error set_null) Current grammar makes us unable to use "on_error null". if we did it, then in all the COPY command options's value, null will become reserved to non-reserved words. so we choose "on_error set_null". When COPY FROM, if ON_ERROR SET_NULL is specified, any data type conversion errors will result in the affected column being set to NULL. However, column's not-null constraints are still enforced, attempting to set a NULL value in such columns will raise a constraint violation error. This applies to column data type is a domain with a NOT NULL constraint. Author: Jian He <[email protected]> Author: Kirill Reshke <[email protected]> Reviewed-by: Fujii Masao <[email protected]> Jim Jones <[email protected]> "David G. Johnston" <[email protected]> Yugo NAGATA <[email protected]> torikoshia <[email protected]> Masahiko Sawada <[email protected]> Atsushi Torikoshi <[email protected]> discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com commitfest entry: https://commitfest.postgresql.org/patch/4810 --- doc/src/sgml/ref/copy.sgml | 35 +++++++--- src/backend/commands/copy.c | 6 +- src/backend/commands/copyfrom.c | 42 +++++++++--- src/backend/commands/copyfromparse.c | 84 ++++++++++++++++++++---- src/bin/psql/tab-complete.in.c | 2 +- src/include/commands/copy.h | 1 + src/include/commands/copyfrom_internal.h | 7 ++ src/test/regress/expected/copy2.out | 60 +++++++++++++++++ src/test/regress/sql/copy2.sql | 46 +++++++++++++ 9 files changed, 247 insertions(+), 36 deletions(-) diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index c2d1fbc1fbe..a36e33f320f 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -412,23 +412,37 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable Specifies how to behave when encountering an error converting a column's input value into its data type. An <replaceable class="parameter">error_action</replaceable> value of - <literal>stop</literal> means fail the command, while - <literal>ignore</literal> means discard the input row and continue with the next one. + <literal>stop</literal> means fail the command, + <literal>ignore</literal> means discard the input row and continue with the next one, + and <literal>set_null</literal> means replace column containing invalid + input value with <literal>NULL</literal> and move to the next field. The default is <literal>stop</literal>. </para> <para> - The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command> + The <literal>ignore</literal> and <literal>set_null</literal> + options are applicable only for <command>COPY FROM</command> when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>. </para> + <para> + For <literal>ignore</literal> option, a <literal>NOTICE</literal> message + containing the ignored row count is emitted at the end of the <command>COPY + FROM</command> if at least one row was discarded. + For <literal>set_null</literal> option, + a <literal>NOTICE</literal> message indicating the number of rows + where invalid input values were replaced with null is emitted + at the end of the <command>COPY FROM</command> if at least one row was replaced. + </para> <para> - A <literal>NOTICE</literal> message containing the ignored row count is - emitted at the end of the <command>COPY FROM</command> if at least one - row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to - <literal>verbose</literal>, a <literal>NOTICE</literal> message + When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>, + for <literal>ignore</literal> option, a <literal>NOTICE</literal> message containing the line of the input file and the column name whose input - conversion has failed is emitted for each discarded row. + conversion has failed is emitted for each discarded row; + for <literal>set_null</literal> option, a <literal>NOTICE</literal> + message containing the line of the input file and the column name where + value was replaced with <literal>NULL</literal> for each input conversion + failure. When it is set to <literal>silent</literal>, no message is emitted - regarding ignored rows. + regarding input conversion failed rows. </para> </listitem> </varlistentry> @@ -476,7 +490,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable </para> <para> This is currently used in <command>COPY FROM</command> command when - <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>. + <literal>ON_ERROR</literal> option is set to <literal>ignore</literal> + or <literal>set_null</literal>. </para> </listitem> </varlistentry> diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index fae9c41db65..9213bfb167f 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -413,12 +413,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from) parser_errposition(pstate, def->location))); /* - * Allow "stop", or "ignore" values. + * Allow "stop", "ignore", "set_null" values. */ if (pg_strcasecmp(sval, "stop") == 0) return COPY_ON_ERROR_STOP; if (pg_strcasecmp(sval, "ignore") == 0) return COPY_ON_ERROR_IGNORE; + if (pg_strcasecmp(sval, "set_null") == 0) + return COPY_ON_ERROR_SET_NULL; ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), @@ -928,7 +930,7 @@ ProcessCopyOptions(ParseState *pstate, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("only ON_ERROR STOP is allowed in BINARY mode"))); - if (opts_out->reject_limit && !opts_out->on_error) + if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), /*- translator: first and second %s are the names of COPY option, e.g. diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index fbbbc09a97b..750d597d4d0 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate) /* Done, clean up */ error_context_stack = errcallback.previous; - if (cstate->opts.on_error != COPY_ON_ERROR_STOP && - cstate->num_errors > 0 && + if (cstate->num_errors > 0 && cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT) - ereport(NOTICE, - errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility", - "%" PRIu64 " rows were skipped due to data type incompatibility", - cstate->num_errors, - cstate->num_errors)); + { + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + ereport(NOTICE, + errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility", + "%" PRIu64 " rows were skipped due to data type incompatibility", + cstate->num_errors, + cstate->num_errors)); + else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL) + ereport(NOTICE, + errmsg_plural("invalid values in %" PRIu64 " row was replaced with null due to data type incompatibility", + "invalid values in %" PRIu64 " rows were replaced with null due to data type incompatibility", + cstate->num_errors, + cstate->num_errors)); + } if (bistate != NULL) FreeBulkInsertState(bistate); @@ -1614,6 +1622,19 @@ BeginCopyFrom(ParseState *pstate, } } + if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL) + { + int attr_count = list_length(cstate->attnumlist); + + cstate->domain_with_constraint = (bool *) palloc0(attr_count * sizeof(bool)); + foreach_int(attno, cstate->attnumlist) + { + int i = foreach_current_index(attno); + Form_pg_attribute att = TupleDescAttr(tupDesc, attno - 1); + cstate->domain_with_constraint[i] = DomainHasConstraints(att->atttypid); + } + } + /* Set up soft error handler for ON_ERROR */ if (cstate->opts.on_error != COPY_ON_ERROR_STOP) { @@ -1622,10 +1643,11 @@ BeginCopyFrom(ParseState *pstate, cstate->escontext->error_occurred = false; /* - * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other - * options later + * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_SET_NULL. + * We'll add other options later */ - if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE || + cstate->opts.on_error == COPY_ON_ERROR_SET_NULL) cstate->escontext->details_wanted = false; } else diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index b1ae97b833d..ee887a37afd 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -956,6 +956,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, int fldct; int fieldno; char *string; + bool current_row_erroneous = false; tupDesc = RelationGetDescr(cstate->rel); attr_count = list_length(cstate->attnumlist); @@ -1033,7 +1034,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, } /* - * If ON_ERROR is specified with IGNORE, skip rows with soft errors + * If ON_ERROR is specified with IGNORE, skip rows with soft errors. + * If ON_ERROR is specified with SET_NULL, try to replace with null. */ else if (!InputFunctionCallSafe(&in_functions[m], string, @@ -1044,7 +1046,50 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, { Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); - cstate->num_errors++; + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + cstate->num_errors++; + else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL) + { + cstate->escontext->error_occurred = false; + Assert(cstate->domain_with_constraint != NULL); + + /* + * If the column type is a domain with constraints, an + * additional InputFunctionCallSafe may be needed to raise + * errors for domain constraint violations. + */ + if (!cstate->domain_with_constraint[m] || + InputFunctionCallSafe(&in_functions[m], + NULL, + typioparams[m], + att->atttypmod, + (Node *) cstate->escontext, + &values[m])) + { + nulls[m] = true; + values[m] = (Datum) 0; + } + else if (string == NULL) + ereport(ERROR, + errcode(ERRCODE_NOT_NULL_VIOLATION), + errmsg("domain %s does not allow null values", format_type_be(typioparams[m])), + errdatatype(typioparams[m])); + else + ereport(ERROR, + errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("invalid input value for domain %s: \"%s\"", + format_type_be(typioparams[m]), string)); + + /* + * We count only the number of rows (not individual fields) + * where ON_ERROR SET_NULL was successfully applied. + */ + if (!current_row_erroneous) + { + current_row_erroneous = true; + cstate->num_errors++; + } + } if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) { @@ -1061,24 +1106,37 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, char *attval; attval = CopyLimitPrintoutLength(cstate->cur_attval); - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"", - cstate->cur_lineno, - cstate->cur_attname, - attval)); + + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"", + cstate->cur_lineno, + cstate->cur_attname, + attval)); + else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL) + ereport(NOTICE, + errmsg("setting to null due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"", + cstate->cur_lineno, + cstate->cur_attname, + attval)); pfree(attval); } else - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input", - cstate->cur_lineno, - cstate->cur_attname)); - + { + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input", + cstate->cur_lineno, + cstate->cur_attname)); + } /* reset relname_only */ cstate->relname_only = false; } - return true; + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + return true; + else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL) + continue; } cstate->cur_attname = NULL; diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c index dbc586c5bc3..c88593e4158 100644 --- a/src/bin/psql/tab-complete.in.c +++ b/src/bin/psql/tab-complete.in.c @@ -3343,7 +3343,7 @@ match_previous_words(int pattern_id, /* Complete COPY <sth> FROM filename WITH (ON_ERROR */ else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", "(", "ON_ERROR")) - COMPLETE_WITH("stop", "ignore"); + COMPLETE_WITH("stop", "ignore", "set_null"); /* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */ else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", "(", "LOG_VERBOSITY")) diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 541176e1980..da3622028e7 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -35,6 +35,7 @@ typedef enum CopyOnErrorChoice { COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */ COPY_ON_ERROR_IGNORE, /* ignore errors */ + COPY_ON_ERROR_SET_NULL, /* set error field to null */ } CopyOnErrorChoice; /* diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index c8b22af22d8..c82bfab4636 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -108,6 +108,13 @@ typedef struct CopyFromStateData * att */ bool *defaults; /* if DEFAULT marker was found for * corresponding att */ + /* + * Set to true if the corresponding attribute's data type is a domain with + * constraints. This field is usually NULL, except when ON_ERROR is set to + * SET_NULL. + */ + bool *domain_with_constraint; + bool volatile_defexprs; /* is any of defexprs volatile? */ List *range_table; /* single element list of RangeTblEntry */ List *rteperminfos; /* single element list of RTEPermissionInfo */ diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out index caa3c44f0d0..919a2296c27 100644 --- a/src/test/regress/expected/copy2.out +++ b/src/test/regress/expected/copy2.out @@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore); ERROR: conflicting or redundant options LINE 1: COPY x from stdin (on_error ignore, on_error ignore); ^ +COPY x from stdin (on_error set_null, on_error ignore); +ERROR: conflicting or redundant options +LINE 1: COPY x from stdin (on_error set_null, on_error ignore); + ^ COPY x from stdin (log_verbosity default, log_verbosity verbose); ERROR: conflicting or redundant options LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb... @@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x'); ERROR: cannot specify NULL in BINARY mode COPY x from stdin (format BINARY, on_error ignore); ERROR: only ON_ERROR STOP is allowed in BINARY mode +COPY x from stdin (format BINARY, on_error set_null); +ERROR: only ON_ERROR STOP is allowed in BINARY mode +COPY x from stdin (on_error set_null, reject_limit 2); +ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE COPY x from stdin (on_error unsupported); ERROR: COPY ON_ERROR "unsupported" not recognized LINE 1: COPY x from stdin (on_error unsupported); @@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported); ERROR: COPY ON_ERROR cannot be used with COPY TO LINE 1: COPY x to stdout (format BINARY, on_error unsupported); ^ +COPY x to stdout (on_error set_null); +ERROR: COPY ON_ERROR cannot be used with COPY TO +LINE 1: COPY x to stdout (on_error set_null); + ^ COPY x from stdin (log_verbosity unsupported); ERROR: COPY LOG_VERBOSITY "unsupported" not recognized LINE 1: COPY x from stdin (log_verbosity unsupported); @@ -775,6 +787,51 @@ CONTEXT: COPY check_ign_err NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a" CONTEXT: COPY check_ign_err NOTICE: 6 rows were skipped due to data type incompatibility +CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0); +CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0); +CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT); +\pset null NULL +--fail, column a cannot set to null value +COPY t_on_error_null FROM STDIN WITH (on_error set_null); +ERROR: domain d_int_not_null does not allow null values +CONTEXT: COPY t_on_error_null, line 1, column a: null input +--fail, column a is domain with not-null constraint +COPY t_on_error_null FROM STDIN WITH (on_error set_null); +ERROR: invalid input value for domain d_int_not_null: "ss" +CONTEXT: COPY t_on_error_null, line 1, column a: "ss" +--fail, column a cannot set to null value +COPY t_on_error_null FROM STDIN WITH (on_error set_null); +ERROR: invalid input value for domain d_int_not_null: "-1" +CONTEXT: COPY t_on_error_null, line 1, column a: "-1" +--fail. less data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null); +ERROR: missing data for column "c" +CONTEXT: COPY t_on_error_null, line 1: "1,1" +--fail. extra data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null); +ERROR: extra data after last expected column +CONTEXT: COPY t_on_error_null, line 1: "1,2,3,4" +--ok +COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose); +NOTICE: setting to null due to data type incompatibility at line 1 for column "b": "x1" +CONTEXT: COPY t_on_error_null +NOTICE: setting to null due to data type incompatibility at line 1 for column "c": "yx" +CONTEXT: COPY t_on_error_null +NOTICE: setting to null due to data type incompatibility at line 2 for column "b": "zx" +CONTEXT: COPY t_on_error_null +NOTICE: setting to null due to data type incompatibility at line 3 for column "c": "ea" +CONTEXT: COPY t_on_error_null +NOTICE: invalid values in 3 rows were replaced with null due to data type incompatibility +-- check inserted content +select * from t_on_error_null; + a | b | c +----+------+------ + 10 | NULL | NULL + 11 | NULL | 12 + 13 | 14 | NULL +(3 rows) + +\pset null '' -- tests for on_error option with log_verbosity and null constraint via domain CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL; CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2); @@ -834,6 +891,9 @@ DROP VIEW instead_of_insert_tbl_view; DROP VIEW instead_of_insert_tbl_view_2; DROP FUNCTION fun_instead_of_insert_tbl(); DROP TABLE check_ign_err; +DROP TABLE t_on_error_null; +DROP DOMAIN d_int_not_null; +DROP DOMAIN d_int_positive_maybe_null; DROP TABLE check_ign_err2; DROP DOMAIN dcheck_ign_err2; DROP TABLE hard_err; diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql index cef45868db5..be05ed52def 100644 --- a/src/test/regress/sql/copy2.sql +++ b/src/test/regress/sql/copy2.sql @@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b)); COPY x from stdin (convert_selectively (a), convert_selectively (b)); COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii'); COPY x from stdin (on_error ignore, on_error ignore); +COPY x from stdin (on_error set_null, on_error ignore); COPY x from stdin (log_verbosity default, log_verbosity verbose); -- incorrect options COPY x from stdin (format BINARY, delimiter ','); COPY x from stdin (format BINARY, null 'x'); COPY x from stdin (format BINARY, on_error ignore); +COPY x from stdin (format BINARY, on_error set_null); +COPY x from stdin (on_error set_null, reject_limit 2); COPY x from stdin (on_error unsupported); COPY x from stdin (format TEXT, force_quote(a)); COPY x from stdin (format TEXT, force_quote *); @@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *); COPY x to stdout (format CSV, force_null(a)); COPY x to stdout (format CSV, force_null *); COPY x to stdout (format BINARY, on_error unsupported); +COPY x to stdout (on_error set_null); COPY x from stdin (log_verbosity unsupported); COPY x from stdin with (reject_limit 1); COPY x from stdin with (on_error ignore, reject_limit 0); @@ -537,6 +541,45 @@ a {2} 2 8 {8} 8 \. +CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0); +CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0); +CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT); + +\pset null NULL +--fail, column a cannot set to null value +COPY t_on_error_null FROM STDIN WITH (on_error set_null); +\N 11 13 +\. + +--fail, column a is domain with not-null constraint +COPY t_on_error_null FROM STDIN WITH (on_error set_null); +ss 11 14 +\. + +--fail, column a cannot set to null value +COPY t_on_error_null FROM STDIN WITH (on_error set_null); +-1 11 13 +\. + +--fail. less data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null); +1,1 +\. +--fail. extra data +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null); +1,2,3,4 +\. + +--ok +COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose); +10 x1 yx +11 zx 12 +13 14 ea +\. + +-- check inserted content +select * from t_on_error_null; +\pset null '' -- tests for on_error option with log_verbosity and null constraint via domain CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL; CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2); @@ -606,6 +649,9 @@ DROP VIEW instead_of_insert_tbl_view; DROP VIEW instead_of_insert_tbl_view_2; DROP FUNCTION fun_instead_of_insert_tbl(); DROP TABLE check_ign_err; +DROP TABLE t_on_error_null; +DROP DOMAIN d_int_not_null; +DROP DOMAIN d_int_positive_maybe_null; DROP TABLE check_ign_err2; DROP DOMAIN dcheck_ign_err2; DROP TABLE hard_err; -- 2.34.1 ^ permalink raw reply [nested|flat] 20+ messages in thread
* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row @ 2025-11-10 10:22 jian he <[email protected]> parent: jian he <[email protected]> 0 siblings, 1 reply; 20+ messages in thread From: jian he @ 2025-11-10 10:22 UTC (permalink / raw) To: torikoshia <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; vignesh C <[email protected]>; Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; PostgreSQL Hackers <[email protected]> hi. rebase and minor cosmetic changes. -- jian https://www.enterprisedb.com/ Attachments: [text/x-patch] v20-0001-COPY-on_error-set_null.patch (22.4K, 2-v20-0001-COPY-on_error-set_null.patch) download | inline diff: From 05574920774c9bb2f93f9cb6ecea136aa673cba7 Mon Sep 17 00:00:00 2001 From: jian he <[email protected]> Date: Mon, 10 Nov 2025 18:17:09 +0800 Subject: [PATCH v20 1/1] COPY (on_error set_null) When COPY FROM, if ON_ERROR SET_NULL is specified, any data type conversion errors will result in the affected column being set to NULL. However, column's not-null constraints are still enforced, attempting to set a NULL value in such columns will raise a constraint violation error. This applies to column data type is a domain with a NOT NULL constraint. Author: Jian He <[email protected]> Author: Kirill Reshke <[email protected]> Reviewed-by: Fujii Masao <[email protected]> Jim Jones <[email protected]> "David G. Johnston" <[email protected]> Yugo NAGATA <[email protected]> torikoshia <[email protected]> Masahiko Sawada <[email protected]> Atsushi Torikoshi <[email protected]> discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com commitfest entry: https://commitfest.postgresql.org/patch/4810 --- doc/src/sgml/ref/copy.sgml | 35 +++++++--- src/backend/commands/copy.c | 6 +- src/backend/commands/copyfrom.c | 42 +++++++++--- src/backend/commands/copyfromparse.c | 84 ++++++++++++++++++++---- src/bin/psql/tab-complete.in.c | 2 +- src/include/commands/copy.h | 1 + src/include/commands/copyfrom_internal.h | 7 ++ src/test/regress/expected/copy2.out | 55 ++++++++++++++++ src/test/regress/sql/copy2.sql | 43 ++++++++++++ 9 files changed, 239 insertions(+), 36 deletions(-) diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index fdc24b36bb8..1f42fd0972d 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -412,23 +412,37 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable Specifies how to behave when encountering an error converting a column's input value into its data type. An <replaceable class="parameter">error_action</replaceable> value of - <literal>stop</literal> means fail the command, while - <literal>ignore</literal> means discard the input row and continue with the next one. + <literal>stop</literal> means fail the command, + <literal>ignore</literal> means discard the input row and continue with the next one, + and <literal>set_null</literal> means replace column containing invalid + input value with <literal>NULL</literal> and move to the next field. The default is <literal>stop</literal>. </para> <para> - The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command> + The <literal>ignore</literal> and <literal>set_null</literal> + options are applicable only for <command>COPY FROM</command> when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>. </para> + <para> + For <literal>ignore</literal> option, a <literal>NOTICE</literal> message + containing the ignored row count is emitted at the end of the <command>COPY + FROM</command> if at least one row was discarded. + For <literal>set_null</literal> option, + a <literal>NOTICE</literal> message indicating the number of rows + where invalid input values were replaced with null is emitted + at the end of the <command>COPY FROM</command> if at least one row was replaced. + </para> <para> - A <literal>NOTICE</literal> message containing the ignored row count is - emitted at the end of the <command>COPY FROM</command> if at least one - row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to - <literal>verbose</literal>, a <literal>NOTICE</literal> message + When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>, + for <literal>ignore</literal> option, a <literal>NOTICE</literal> message containing the line of the input file and the column name whose input - conversion has failed is emitted for each discarded row. + conversion has failed is emitted for each discarded row; + for <literal>set_null</literal> option, a <literal>NOTICE</literal> + message containing the line of the input file and the column name where + value was replaced with <literal>NULL</literal> for each input conversion + failure. When it is set to <literal>silent</literal>, no message is emitted - regarding ignored rows. + regarding input conversion failed rows. </para> </listitem> </varlistentry> @@ -476,7 +490,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable </para> <para> This is currently used in <command>COPY FROM</command> command when - <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>. + <literal>ON_ERROR</literal> option is set to <literal>ignore</literal> + or <literal>set_null</literal>. </para> </listitem> </varlistentry> diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 28e878c3688..9d0413f4cfa 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -456,12 +456,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from) parser_errposition(pstate, def->location))); /* - * Allow "stop", or "ignore" values. + * Allow "stop", "ignore", "set_null" values. */ if (pg_strcasecmp(sval, "stop") == 0) return COPY_ON_ERROR_STOP; if (pg_strcasecmp(sval, "ignore") == 0) return COPY_ON_ERROR_IGNORE; + if (pg_strcasecmp(sval, "set_null") == 0) + return COPY_ON_ERROR_SET_NULL; ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), @@ -971,7 +973,7 @@ ProcessCopyOptions(ParseState *pstate, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("only ON_ERROR STOP is allowed in BINARY mode"))); - if (opts_out->reject_limit && !opts_out->on_error) + if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), /*- translator: first and second %s are the names of COPY option, e.g. diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 12781963b4f..87588b688f2 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate) /* Done, clean up */ error_context_stack = errcallback.previous; - if (cstate->opts.on_error != COPY_ON_ERROR_STOP && - cstate->num_errors > 0 && + if (cstate->num_errors > 0 && cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT) - ereport(NOTICE, - errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility", - "%" PRIu64 " rows were skipped due to data type incompatibility", - cstate->num_errors, - cstate->num_errors)); + { + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + ereport(NOTICE, + errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility", + "%" PRIu64 " rows were skipped due to data type incompatibility", + cstate->num_errors, + cstate->num_errors)); + else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL) + ereport(NOTICE, + errmsg_plural("invalid values in %" PRIu64 " row was replaced with null due to data type incompatibility", + "invalid values in %" PRIu64 " rows were replaced with null due to data type incompatibility", + cstate->num_errors, + cstate->num_errors)); + } if (bistate != NULL) FreeBulkInsertState(bistate); @@ -1614,6 +1622,19 @@ BeginCopyFrom(ParseState *pstate, } } + if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL) + { + int attr_count = list_length(cstate->attnumlist); + + cstate->domain_with_constraint = (bool *) palloc0(attr_count * sizeof(bool)); + foreach_int(attno, cstate->attnumlist) + { + int i = foreach_current_index(attno); + Form_pg_attribute att = TupleDescAttr(tupDesc, attno - 1); + cstate->domain_with_constraint[i] = DomainHasConstraints(att->atttypid); + } + } + /* Set up soft error handler for ON_ERROR */ if (cstate->opts.on_error != COPY_ON_ERROR_STOP) { @@ -1622,10 +1643,11 @@ BeginCopyFrom(ParseState *pstate, cstate->escontext->error_occurred = false; /* - * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other - * options later + * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_SET_NULL. + * We'll add other options later */ - if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE || + cstate->opts.on_error == COPY_ON_ERROR_SET_NULL) cstate->escontext->details_wanted = false; } else diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index b1ae97b833d..7c0d13ab38b 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -956,6 +956,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, int fldct; int fieldno; char *string; + bool current_row_erroneous = false; tupDesc = RelationGetDescr(cstate->rel); attr_count = list_length(cstate->attnumlist); @@ -1033,7 +1034,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, } /* - * If ON_ERROR is specified with IGNORE, skip rows with soft errors + * If ON_ERROR is specified with IGNORE, skip rows with soft errors. + * If ON_ERROR is specified with SET_NULL, try to replace with null. */ else if (!InputFunctionCallSafe(&in_functions[m], string, @@ -1044,7 +1046,50 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, { Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); - cstate->num_errors++; + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + cstate->num_errors++; + else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL) + { + cstate->escontext->error_occurred = false; + Assert(cstate->domain_with_constraint != NULL); + + /* + * If the column type is a domain with constraints, an + * additional InputFunctionCallSafe may be needed to raise + * errors for domain constraint violations. + */ + if (!cstate->domain_with_constraint[m] || + InputFunctionCallSafe(&in_functions[m], + NULL, + typioparams[m], + att->atttypmod, + (Node *) cstate->escontext, + &values[m])) + { + nulls[m] = true; + values[m] = (Datum) 0; + } + else if (string == NULL) + ereport(ERROR, + errcode(ERRCODE_NOT_NULL_VIOLATION), + errmsg("domain %s does not allow null values", format_type_be(typioparams[m])), + errdatatype(typioparams[m])); + else + ereport(ERROR, + errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("invalid input value for domain %s: \"%s\"", + format_type_be(typioparams[m]), string)); + + /* + * We count only the number of rows (not individual fields) + * where ON_ERROR SET_NULL was successfully applied. + */ + if (!current_row_erroneous) + { + current_row_erroneous = true; + cstate->num_errors++; + } + } if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) { @@ -1061,24 +1106,37 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, char *attval; attval = CopyLimitPrintoutLength(cstate->cur_attval); - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"", - cstate->cur_lineno, - cstate->cur_attname, - attval)); + + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"", + cstate->cur_lineno, + cstate->cur_attname, + attval)); + else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL) + ereport(NOTICE, + errmsg("setting to null due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"", + cstate->cur_lineno, + cstate->cur_attname, + attval)); pfree(attval); } else - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input", - cstate->cur_lineno, - cstate->cur_attname)); - + { + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input", + cstate->cur_lineno, + cstate->cur_attname)); + } /* reset relname_only */ cstate->relname_only = false; } - return true; + if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE) + return true; + else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL) + continue; } cstate->cur_attname = NULL; diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c index 316a2dafbf1..7124f468ba4 100644 --- a/src/bin/psql/tab-complete.in.c +++ b/src/bin/psql/tab-complete.in.c @@ -3381,7 +3381,7 @@ match_previous_words(int pattern_id, /* Complete COPY <sth> FROM [PROGRAM] filename WITH (ON_ERROR */ else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAnyExcept("PROGRAM"), "WITH", "(", "ON_ERROR") || Matches("COPY|\\copy", MatchAny, "FROM", "PROGRAM", MatchAny, "WITH", "(", "ON_ERROR")) - COMPLETE_WITH("stop", "ignore"); + COMPLETE_WITH("stop", "ignore", "set_null"); /* Complete COPY <sth> FROM [PROGRAM] filename WITH (LOG_VERBOSITY */ else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAnyExcept("PROGRAM"), "WITH", "(", "LOG_VERBOSITY") || diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 541176e1980..da3622028e7 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -35,6 +35,7 @@ typedef enum CopyOnErrorChoice { COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */ COPY_ON_ERROR_IGNORE, /* ignore errors */ + COPY_ON_ERROR_SET_NULL, /* set error field to null */ } CopyOnErrorChoice; /* diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index c8b22af22d8..a606a38cc23 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -108,6 +108,13 @@ typedef struct CopyFromStateData * att */ bool *defaults; /* if DEFAULT marker was found for * corresponding att */ + /* + * Set to true if the corresponding attribute's data type is a domain with + * constraints. This field is usually NULL, except when ON_ERROR is set to + * SET_NULL. + */ + bool *domain_with_constraint; + bool volatile_defexprs; /* is any of defexprs volatile? */ List *range_table; /* single element list of RangeTblEntry */ List *rteperminfos; /* single element list of RTEPermissionInfo */ diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out index f3fdce23459..16e70ccb813 100644 --- a/src/test/regress/expected/copy2.out +++ b/src/test/regress/expected/copy2.out @@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore); ERROR: conflicting or redundant options LINE 1: COPY x from stdin (on_error ignore, on_error ignore); ^ +COPY x from stdin (on_error set_null, on_error set_null); +ERROR: conflicting or redundant options +LINE 1: COPY x from stdin (on_error set_null, on_error set_null); + ^ COPY x from stdin (log_verbosity default, log_verbosity verbose); ERROR: conflicting or redundant options LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb... @@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x'); ERROR: cannot specify NULL in BINARY mode COPY x from stdin (format BINARY, on_error ignore); ERROR: only ON_ERROR STOP is allowed in BINARY mode +COPY x from stdin (format BINARY, on_error set_null); +ERROR: only ON_ERROR STOP is allowed in BINARY mode +COPY x from stdin (on_error set_null, reject_limit 2); +ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE COPY x from stdin (on_error unsupported); ERROR: COPY ON_ERROR "unsupported" not recognized LINE 1: COPY x from stdin (on_error unsupported); @@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported); ERROR: COPY ON_ERROR cannot be used with COPY TO LINE 1: COPY x to stdout (format BINARY, on_error unsupported); ^ +COPY x to stdout (on_error set_null); +ERROR: COPY ON_ERROR cannot be used with COPY TO +LINE 1: COPY x to stdout (on_error set_null); + ^ COPY x from stdin (log_verbosity unsupported); ERROR: COPY LOG_VERBOSITY "unsupported" not recognized LINE 1: COPY x from stdin (log_verbosity unsupported); @@ -776,6 +788,46 @@ CONTEXT: COPY check_ign_err NOTICE: skipping row due to data type incompatibility at line 8 for column "k": "a" CONTEXT: COPY check_ign_err NOTICE: 6 rows were skipped due to data type incompatibility +CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0); +CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0); +CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT); +\pset null NULL +COPY t_on_error_null FROM STDIN WITH (on_error set_null); --fail +ERROR: domain d_int_not_null does not allow null values +CONTEXT: COPY t_on_error_null, line 1, column a: null input +COPY t_on_error_null FROM STDIN WITH (on_error set_null); --fail +ERROR: invalid input value for domain d_int_not_null: "ss" +CONTEXT: COPY t_on_error_null, line 1, column a: "ss" +COPY t_on_error_null FROM STDIN WITH (on_error set_null); --fail +ERROR: invalid input value for domain d_int_not_null: "-1" +CONTEXT: COPY t_on_error_null, line 1, column a: "-1" +--fail, less data. +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null); +ERROR: missing data for column "c" +CONTEXT: COPY t_on_error_null, line 1: "1,1" +--fail, extra data. +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null); +ERROR: extra data after last expected column +CONTEXT: COPY t_on_error_null, line 1: "1,2,3,4" +COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose); --ok +NOTICE: setting to null due to data type incompatibility at line 1 for column "b": "x1" +CONTEXT: COPY t_on_error_null +NOTICE: setting to null due to data type incompatibility at line 1 for column "c": "yx" +CONTEXT: COPY t_on_error_null +NOTICE: setting to null due to data type incompatibility at line 2 for column "b": "zx" +CONTEXT: COPY t_on_error_null +NOTICE: setting to null due to data type incompatibility at line 3 for column "c": "ea" +CONTEXT: COPY t_on_error_null +NOTICE: invalid values in 3 rows were replaced with null due to data type incompatibility +SELECT * FROM t_on_error_null; + a | b | c +----+------+------ + 10 | NULL | NULL + 11 | NULL | 12 + 13 | 14 | NULL +(3 rows) + +\pset null '' -- tests for on_error option with log_verbosity and null constraint via domain CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL; CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2); @@ -835,6 +887,9 @@ DROP VIEW instead_of_insert_tbl_view; DROP VIEW instead_of_insert_tbl_view_2; DROP FUNCTION fun_instead_of_insert_tbl(); DROP TABLE check_ign_err; +DROP TABLE t_on_error_null; +DROP DOMAIN d_int_not_null; +DROP DOMAIN d_int_positive_maybe_null; DROP TABLE check_ign_err2; DROP DOMAIN dcheck_ign_err2; DROP TABLE hard_err; diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql index cef45868db5..49200cf064d 100644 --- a/src/test/regress/sql/copy2.sql +++ b/src/test/regress/sql/copy2.sql @@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b)); COPY x from stdin (convert_selectively (a), convert_selectively (b)); COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii'); COPY x from stdin (on_error ignore, on_error ignore); +COPY x from stdin (on_error set_null, on_error set_null); COPY x from stdin (log_verbosity default, log_verbosity verbose); -- incorrect options COPY x from stdin (format BINARY, delimiter ','); COPY x from stdin (format BINARY, null 'x'); COPY x from stdin (format BINARY, on_error ignore); +COPY x from stdin (format BINARY, on_error set_null); +COPY x from stdin (on_error set_null, reject_limit 2); COPY x from stdin (on_error unsupported); COPY x from stdin (format TEXT, force_quote(a)); COPY x from stdin (format TEXT, force_quote *); @@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *); COPY x to stdout (format CSV, force_null(a)); COPY x to stdout (format CSV, force_null *); COPY x to stdout (format BINARY, on_error unsupported); +COPY x to stdout (on_error set_null); COPY x from stdin (log_verbosity unsupported); COPY x from stdin with (reject_limit 1); COPY x from stdin with (on_error ignore, reject_limit 0); @@ -537,6 +541,42 @@ a {2} 2 8 {8} 8 \. +CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0); +CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0); +CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT); + +\pset null NULL +COPY t_on_error_null FROM STDIN WITH (on_error set_null); --fail +\N 11 13 +\. + +COPY t_on_error_null FROM STDIN WITH (on_error set_null); --fail +ss 11 14 +\. + +COPY t_on_error_null FROM STDIN WITH (on_error set_null); --fail +-1 11 13 +\. + +--fail, less data. +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null); +1,1 +\. +--fail, extra data. +COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null); +1,2,3,4 +\. + +COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose); --ok +10 x1 yx +11 zx 12 +13 14 ea +\. + +SELECT * FROM t_on_error_null; + +\pset null '' + -- tests for on_error option with log_verbosity and null constraint via domain CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL; CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2); @@ -606,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view; DROP VIEW instead_of_insert_tbl_view_2; DROP FUNCTION fun_instead_of_insert_tbl(); DROP TABLE check_ign_err; +DROP TABLE t_on_error_null; +DROP DOMAIN d_int_not_null; +DROP DOMAIN d_int_positive_maybe_null; DROP TABLE check_ign_err2; DROP DOMAIN dcheck_ign_err2; DROP TABLE hard_err; -- 2.34.1 ^ permalink raw reply [nested|flat] 20+ messages in thread
* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row @ 2026-03-04 13:25 Fujii Masao <[email protected]> parent: jian he <[email protected]> 0 siblings, 1 reply; 20+ messages in thread From: Fujii Masao @ 2026-03-04 13:25 UTC (permalink / raw) To: Peter Eisentraut <[email protected]>; +Cc: jian he <[email protected]>; Matheus Alcantara <[email protected]>; torikoshia <[email protected]>; Masahiko Sawada <[email protected]>; vignesh C <[email protected]>; Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; PostgreSQL Hackers <[email protected]> On Tue, Mar 3, 2026 at 3:38 PM Peter Eisentraut <[email protected]> wrote: > Thanks, committed. Thanks for committing the patch! With this change, ON_ERROR = 'set_null' can now be used with foreign tables backed by file_fdw. However, unlike ON_ERROR = 'ignore', there is currently no regression test covering this behavior in file_fdw. How about adding a regression test to ensure that file_fdw works correctly with ON_ERROR = 'set_null', and to improve test coverage? Patch attached. Regards, -- Fujii Masao Attachments: [application/octet-stream] v1-0001-file_fdw-Add-regression-test-for-file_fdw-with-ON.patch (2.3K, 2-v1-0001-file_fdw-Add-regression-test-for-file_fdw-with-ON.patch) download | inline diff: From 400adb16a50cc817eaea816fc6735a7b4e7af9aa Mon Sep 17 00:00:00 2001 From: Fujii Masao <[email protected]> Date: Wed, 4 Mar 2026 22:19:06 +0900 Subject: [PATCH v1] file_fdw: Add regression test for file_fdw with ON_ERROR='set_null'. Commit 2a525cc97e1 introduced the ON_ERROR = 'set_null' option for COPY, allowing it to be used with foreign tables backed by file_fdw. However, unlike ON_ERROR = 'ignore', no regression test was added to verify this behavior for file_fdw. This commit adds a regression test to ensure that foreign tables using file_fdw work correctly with ON_ERROR = 'set_null', improving test coverage. --- contrib/file_fdw/expected/file_fdw.out | 12 +++++++++++- contrib/file_fdw/sql/file_fdw.sql | 4 +++- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out index 251f00bd258..640986528ae 100644 --- a/contrib/file_fdw/expected/file_fdw.out +++ b/contrib/file_fdw/expected/file_fdw.out @@ -246,7 +246,17 @@ SELECT * FROM agg_bad; -- ERROR ERROR: invalid input syntax for type real: "aaa" CONTEXT: COPY agg_bad, line 3, column b: "aaa" -- on_error, log_verbosity and reject_limit tests -ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore'); +ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'set_null'); +SELECT * FROM agg_bad; + a | b +-----+-------- + 100 | 99.097 + 0 | _null_ + 42 | 324.78 + 1 | _null_ +(4 rows) + +ALTER FOREIGN TABLE agg_bad OPTIONS (SET on_error 'ignore'); SELECT * FROM agg_bad; NOTICE: 2 rows were skipped due to data type incompatibility a | b diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql index 2cba84b1db7..56bfc926c00 100644 --- a/contrib/file_fdw/sql/file_fdw.sql +++ b/contrib/file_fdw/sql/file_fdw.sql @@ -171,7 +171,9 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a; SELECT * FROM agg_bad; -- ERROR -- on_error, log_verbosity and reject_limit tests -ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore'); +ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'set_null'); +SELECT * FROM agg_bad; +ALTER FOREIGN TABLE agg_bad OPTIONS (SET on_error 'ignore'); SELECT * FROM agg_bad; ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent'); SELECT * FROM agg_bad; -- 2.51.2 ^ permalink raw reply [nested|flat] 20+ messages in thread
* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row @ 2026-03-13 02:57 Fujii Masao <[email protected]> parent: Fujii Masao <[email protected]> 0 siblings, 1 reply; 20+ messages in thread From: Fujii Masao @ 2026-03-13 02:57 UTC (permalink / raw) To: Peter Eisentraut <[email protected]>; +Cc: jian he <[email protected]>; Matheus Alcantara <[email protected]>; torikoshia <[email protected]>; Masahiko Sawada <[email protected]>; vignesh C <[email protected]>; Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; PostgreSQL Hackers <[email protected]> On Wed, Mar 4, 2026 at 10:25 PM Fujii Masao <[email protected]> wrote: > > On Tue, Mar 3, 2026 at 3:38 PM Peter Eisentraut <[email protected]> wrote: > > Thanks, committed. > > Thanks for committing the patch! > > With this change, ON_ERROR = 'set_null' can now be used with foreign tables > backed by file_fdw. However, unlike ON_ERROR = 'ignore', there is currently > no regression test covering this behavior in file_fdw. > > How about adding a regression test to ensure that file_fdw works correctly > with ON_ERROR = 'set_null', and to improve test coverage? Patch attached. Barring any objections, I will commit the patch. Regards, -- Fujii Masao ^ permalink raw reply [nested|flat] 20+ messages in thread
* Re: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row @ 2026-03-13 13:50 Fujii Masao <[email protected]> parent: Fujii Masao <[email protected]> 0 siblings, 1 reply; 20+ messages in thread From: Fujii Masao @ 2026-03-13 13:50 UTC (permalink / raw) To: Yi Ding <[email protected]>; +Cc: Peter Eisentraut <[email protected]>; jian he <[email protected]>; Matheus Alcantara <[email protected]>; torikoshia <[email protected]>; Masahiko Sawada <[email protected]>; vignesh C <[email protected]>; Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; PostgreSQL Hackers <[email protected]> On Fri, Mar 13, 2026 at 2:51 PM Yi Ding <[email protected]> wrote: > The new test added in v1 makes sense to me. A small suggestion is that to verify if a field is really null, we can do: > > ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'set_null'); > SELECT a, b IS NULL FROM agg_bad; Since the file_fdw test runs "\pset null _null_", a NULL value is displayed as "_null_". So you can verify that the value is NULL by checking whether "_null_" is shown. One could argue that this cannot be distinguished from the literal text value "_null_". However, relying on "\pset null _null_" is sufficient for this test, I think. Regards, -- Fujii Masao ^ permalink raw reply [nested|flat] 20+ messages in thread
* Re: Re: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row @ 2026-03-16 03:15 Fujii Masao <[email protected]> parent: Fujii Masao <[email protected]> 0 siblings, 0 replies; 20+ messages in thread From: Fujii Masao @ 2026-03-16 03:15 UTC (permalink / raw) To: Yi Ding <[email protected]>; +Cc: Peter Eisentraut <[email protected]>; jian he <[email protected]>; Matheus Alcantara <[email protected]>; torikoshia <[email protected]>; Masahiko Sawada <[email protected]>; vignesh C <[email protected]>; Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; PostgreSQL Hackers <[email protected]> On Mon, Mar 16, 2026 at 11:39 AM Yi Ding <[email protected]> wrote: > > > At 2026-03-13 21:50:07, "Fujii Masao" <[email protected]> wrote: > >On Fri, Mar 13, 2026 at 2:51 PM Yi Ding <[email protected]> wrote: > >> The new test added in v1 makes sense to me. A small suggestion is that to verify if a field is really null, we can do: > >> > >> ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'set_null'); > >> SELECT a, b IS NULL FROM agg_bad; > > > >Since the file_fdw test runs "\pset null _null_", a NULL value is displayed as > >"_null_". So you can verify that the value is NULL by checking whether > >"_null_" is shown. > > > >One could argue that this cannot be distinguished from the literal text value > >"_null_". However, relying on "\pset null _null_" is sufficient for this test, > >I think. > > > >Regards, > > > >-- > >Fujii Masao > > > > Sounds reasonable,that addressed my comment. I've pushed the patch. Thanks! Regards, -- Fujii Masao ^ permalink raw reply [nested|flat] 20+ messages in thread
end of thread, other threads:[~2026-03-16 03:15 UTC | newest] Thread overview: 20+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2024-01-26 15:08 Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row David G. Johnston <[email protected]> 2024-01-28 23:50 ` jian he <[email protected]> 2024-01-29 00:19 ` David G. Johnston <[email protected]> 2024-01-29 08:28 ` Yugo NAGATA <[email protected]> 2025-03-07 03:48 ` jian he <[email protected]> 2025-03-11 10:31 ` Jim Jones <[email protected]> 2025-03-12 08:00 ` jian he <[email protected]> 2025-03-12 08:25 ` Jim Jones <[email protected]> 2025-03-18 03:55 ` jian he <[email protected]> 2025-03-21 06:34 ` vignesh C <[email protected]> 2025-03-24 07:50 ` jian he <[email protected]> 2025-03-25 06:31 ` vignesh C <[email protected]> 2025-07-01 14:54 ` torikoshia <[email protected]> 2025-07-02 09:25 ` jian he <[email protected]> 2025-07-30 04:44 ` jian he <[email protected]> 2025-11-10 10:22 ` jian he <[email protected]> 2026-03-04 13:25 ` Fujii Masao <[email protected]> 2026-03-13 02:57 ` Fujii Masao <[email protected]> 2026-03-13 13:50 ` Fujii Masao <[email protected]> 2026-03-16 03:15 ` Fujii Masao <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox