Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
22+ messages / 7 participants

* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2025-01-10 06:38  jian he <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: jian he @ 2025-01-10 06:38 UTC (permalink / raw)
  To: Kirill Reshke <[email protected]>; +Cc: Fujii Masao <[email protected]>; Jim Jones <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]>

On Wed, Jan 8, 2025 at 3:05 PM Kirill Reshke <[email protected]> wrote:
>
> So, in this version you essentially removed support for REJECT_LIMIT +
> SET_TO_NULL feature? Looks like a promising change. It is more likely
> to see this committed.
> So, +1 on that too.
>
> However, v9 lacks tests for REJECT_LIMIT vs erroneous rows tests.
> In short, we need  this message somewhere in a regression test.
> ```
> ERROR:  skipped more than REJECT_LIMIT (xxx) rows due to data type
> incompatibility
> ```
>

Hi.
You already answered this question yourself:
since we do not support REJECT_LIMIT together with SET_TO_NULL,
this code path is not reachable.
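
To illustrate why that message cannot be reached with set_to_null, here is a
minimal, self-contained sketch of the option check. It is only a model, not
the patch code: SketchCopyOpts and sketch_check_reject_limit are hypothetical
names, and the real check in ProcessCopyOptions reports the problem through
ereport() instead of returning a string.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the CopyOnErrorChoice enum as extended by the patch. */
typedef enum CopyOnErrorChoice
{
	COPY_ON_ERROR_STOP = 0,		/* immediately throw errors, default */
	COPY_ON_ERROR_IGNORE,		/* ignore errors */
	COPY_ON_ERROR_NULL			/* set error field to null */
} CopyOnErrorChoice;

/* Hypothetical, reduced stand-in for the COPY option struct. */
typedef struct SketchCopyOpts
{
	CopyOnErrorChoice on_error;
	long		reject_limit;	/* 0 means REJECT_LIMIT was not given */
} SketchCopyOpts;

/*
 * Returns NULL when the option combination is acceptable, otherwise the
 * error text.  REJECT_LIMIT is accepted only together with ON_ERROR ignore,
 * so a COPY using set_to_null never starts with a reject limit in effect.
 */
const char *
sketch_check_reject_limit(const SketchCopyOpts *opts)
{
	assert(opts != NULL);

	if (opts->reject_limit && opts->on_error != COPY_ON_ERROR_IGNORE)
		return "COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE";
	return NULL;
}
```

Because the combination is rejected while the options are processed, the
"skipped more than REJECT_LIMIT" error can only be raised under ON_ERROR
ignore.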

> Also, please update commit msg with all authors and reviewers. This
> will make committer job a little bit easier
>
I polished the commit message and made a few cosmetic changes here and there.

I think there are three remaining issues that may need more attention:

1. Table 27.42. pg_stat_progress_copy View
(<structname>pg_stat_progress_copy</structname>):
the description of column pg_stat_progress_copy.tuples_skipped is now
"""
When the ON_ERROR option is set to ignore, this value shows the number of tuples
skipped due to malformed data. When the ON_ERROR option is set to set_to_null,
this value shows the number of tuples where malformed data was converted to
NULL.
"""
The column name tuples_skipped is not really suitable for
(on_error set_to_null), since in that case no tuple is skipped; instead,
some value within a tuple is set to NULL.
Alternatively, we can skip progress reports for the (on_error set_to_null) case.

2. The doc is not very great, I guess.
3. Have we settled on the (on_error set_to_null) syntax?
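
For issue 1, the counter in question counts rows, not fields. The sketch
below models that with plain C: sketch_parse_int and sketch_convert_row are
hypothetical helpers (strtol stands in for the type input function), and the
domain not-null check done by the patch is left out.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical soft-error parser: returns false instead of raising. */
bool
sketch_parse_int(const char *s, int *out)
{
	char	   *end;
	long		v;

	errno = 0;
	v = strtol(s, &end, 10);
	if (end == s || *end != '\0' || errno == ERANGE ||
		v < INT_MIN || v > INT_MAX)
		return false;
	*out = (int) v;
	return true;
}

/*
 * Convert one input row.  Every field that fails conversion is marked
 * NULL and processing moves on to the next field.  The return value says
 * whether the row had at least one replaced field, so the caller can
 * bump a per-row counter once per row, however many fields failed.
 */
bool
sketch_convert_row(const char *const *fields, size_t nfields,
				   int *values, bool *nulls)
{
	bool		current_row_erroneous = false;

	for (size_t i = 0; i < nfields; i++)
	{
		nulls[i] = false;
		if (!sketch_parse_int(fields[i], &values[i]))
		{
			values[i] = 0;
			nulls[i] = true;
			current_row_erroneous = true;
		}
	}
	return current_row_erroneous;
}
```

Fed the three data rows of the "--ok" regression test, this model replaces
four fields but reports three affected rows, which matches the
"Erroneous values in 3 rows were replaced with NULL" notice and shows why
"tuples_skipped" is a misleading name for that number.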


Attachments:

  [text/x-patch] v10-0001-extent-on_error-action-introduce-new-option-on_e.patch (21.3K, 2-v10-0001-extent-on_error-action-introduce-new-option-on_e.patch)
  download | inline diff:
From 2f77abbb058c952715838d31b1e04f678a079e30 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Fri, 10 Jan 2025 14:33:55 +0800
Subject: [PATCH v10 1/1] extent "on_error action", introduce new option:
 on_error set_to_null.

Due to the current grammar, we cannot use "on_error null", so I chose on_error set_to_null.

Any data type conversion errors during the COPY FROM process will result in the
affected column being set to NULL. This is only applicable when using
the non-binary format for COPY FROM. However, the not-null constraint will still
be enforced. If a conversion error leads to a NULL value in a column that has a
not-null constraint, a not-null constraint violation error will be triggered.

A regression test for a domain with a not-null constraint has been added.

Author: Jian He <[email protected]>,
Author: Kirill Reshke <[email protected]>

Reviewed-by:
Fujii Masao <[email protected]>,
Jim Jones <[email protected]>,
"David G. Johnston" <[email protected]>,
Yugo NAGATA <[email protected]>,
torikoshia <[email protected]>

discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
 doc/src/sgml/monitoring.sgml             |  8 ++--
 doc/src/sgml/ref/copy.sgml               | 27 +++++++----
 src/backend/commands/copy.c              |  6 ++-
 src/backend/commands/copyfrom.c          | 33 +++++++++----
 src/backend/commands/copyfromparse.c     | 46 +++++++++++++++++-
 src/bin/psql/tab-complete.in.c           |  2 +-
 src/include/commands/copy.h              |  1 +
 src/include/commands/copyfrom_internal.h |  4 +-
 src/test/regress/expected/copy2.out      | 60 ++++++++++++++++++++++++
 src/test/regress/sql/copy2.sql           | 46 ++++++++++++++++++
 10 files changed, 205 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index d0d176cc54..6639561384 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5975,10 +5975,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
        <structfield>tuples_skipped</structfield> <type>bigint</type>
       </para>
       <para>
-       Number of tuples skipped because they contain malformed data.
-       This counter only advances when a value other than
-       <literal>stop</literal> is specified to the <literal>ON_ERROR</literal>
-       option.
+       When the <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>,
+       this value shows the number of tuples skipped due to malformed data.
+       When the <literal>ON_ERROR</literal> option is set to <literal>set_to_null</literal>,
+       this value shows the number of tuples where malformed data was converted to NULL.
       </para></entry>
      </row>
     </tbody>
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f09..4346fb0756 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,23 +394,34 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies how to behave when encountering an error converting a column's
       input value into its data type.
       An <replaceable class="parameter">error_action</replaceable> value of
-      <literal>stop</literal> means fail the command, while
-      <literal>ignore</literal> means discard the input row and continue with the next one.
+      <literal>stop</literal> means fail the command,
+      <literal>ignore</literal> means discard the input row and continue with the next one, and
+      <literal>set_to_null</literal> means replace columns containing erroneous input values with <literal>null</literal> and move to the next field.
       The default is <literal>stop</literal>.
      </para>
      <para>
-      The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+      The <literal>ignore</literal> and <literal>set_to_null</literal> options are applicable only for <command>COPY FROM</command>
       when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
      </para>
      <para>
-      A <literal>NOTICE</literal> message containing the ignored row count is
+      For <literal>ignore</literal> option,
+      a <literal>NOTICE</literal> message containing the ignored row count is
       emitted at the end of the <command>COPY FROM</command> if at least one
-      row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
-      <literal>verbose</literal>, a <literal>NOTICE</literal> message
+      row was discarded.
+      For <literal>set_to_null</literal> option,
+      a <literal>NOTICE</literal> message containing the row count that erroneous input values replaced by to null happened is
+      emitted at the end of the <command>COPY FROM</command> if at least one row was replaced.
+      </para>
+      <para>
+      When <literal>LOG_VERBOSITY</literal> option is set to
+      <literal>verbose</literal>, for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
       containing the line of the input file and the column name whose input
-      conversion has failed is emitted for each discarded row.
+      conversion has failed is emitted for each discarded row;
+      for <literal>set_to_null</literal> option,
+      a <literal>NOTICE</literal> message specifies the line number of the input file and column name
+      where the input value was replaced with NULL due to input conversion failure.
       When it is set to <literal>silent</literal>, no message is emitted
-      regarding ignored rows.
+      regarding input conversion failed rows.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc2..afe60758d4 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 				 parser_errposition(pstate, def->location)));
 
 	/*
-	 * Allow "stop", or "ignore" values.
+	 * Allow "stop", "ignore", "set_to_null" values.
 	 */
 	if (pg_strcasecmp(sval, "stop") == 0)
 		return COPY_ON_ERROR_STOP;
 	if (pg_strcasecmp(sval, "ignore") == 0)
 		return COPY_ON_ERROR_IGNORE;
+	if (pg_strcasecmp(sval, "set_to_null") == 0)
+		return COPY_ON_ERROR_NULL;
 
 	ereport(ERROR,
 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
 
-	if (opts_out->reject_limit && !opts_out->on_error)
+	if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 0cbd05f560..33bd67767e 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1029,6 +1029,10 @@ CopyFrom(CopyFromState cstate)
 			continue;
 		}
 
+		/* Report that this tuple some value was replaced with NULL by the ON_ERROR clause */
+		if (cstate->opts.on_error == COPY_ON_ERROR_NULL && cstate->num_errors > 0)
+			pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
+										 cstate->num_errors);
 		ExecStoreVirtualTuple(myslot);
 
 		/*
@@ -1321,14 +1325,22 @@ CopyFrom(CopyFromState cstate)
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
-	if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
-		cstate->num_errors > 0 &&
+	if (cstate->num_errors > 0 &&
 		cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
-		ereport(NOTICE,
-				errmsg_plural("%llu row was skipped due to data type incompatibility",
-							  "%llu rows were skipped due to data type incompatibility",
-							  (unsigned long long) cstate->num_errors,
-							  (unsigned long long) cstate->num_errors));
+	{
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+			ereport(NOTICE,
+					errmsg_plural("%llu row was skipped due to data type incompatibility",
+								  "%llu rows were skipped due to data type incompatibility",
+								  (unsigned long long) cstate->num_errors,
+								  (unsigned long long) cstate->num_errors));
+		else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+			ereport(NOTICE,
+					errmsg_plural("Erroneous values in %llu row was replaced with NULL",
+								  "Erroneous values in %llu rows were replaced with NULL",
+								  (unsigned long long) cstate->num_errors,
+								  (unsigned long long) cstate->num_errors));
+	}
 
 	if (bistate != NULL)
 		FreeBulkInsertState(bistate);
@@ -1474,10 +1486,11 @@ BeginCopyFrom(ParseState *pstate,
 		cstate->escontext->error_occurred = false;
 
 		/*
-		 * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
-		 * options later
+		 * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+		 * We'll add other options later
 		 */
-		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+			cstate->opts.on_error == COPY_ON_ERROR_NULL)
 			cstate->escontext->details_wanted = false;
 	}
 	else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index caccdc8563..8d5ab08491 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -871,6 +871,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 		int			fldct;
 		int			fieldno;
 		char	   *string;
+		bool		current_row_erroneous = false;
 
 		/* read raw fields in the next line */
 		if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
@@ -949,7 +950,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 
 			/*
 			 * If ON_ERROR is specified with IGNORE, skip rows with soft
-			 * errors
+			 * errors. If ON_ERROR is specified with set_to_null, try
+			 * to replace with NULL.
 			 */
 			else if (!InputFunctionCallSafe(&in_functions[m],
 											string,
@@ -960,9 +962,47 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 			{
 				Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
 
+				if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+				{
+					/*
+					 * we use this count the number of rows (not fields) that
+					 * successfully applied the on_error set_to_null
+					*/
+					if (!current_row_erroneous)
+						current_row_erroneous = true;
+
+					/*
+					 * we need another InputFunctionCallSafe so we can error out
+					 * not-null violation for domain with not-null constraint.
+					*/
+					cstate->escontext->error_occurred = false;
+					if (InputFunctionCallSafe(&in_functions[m],
+											  NULL,
+											  typioparams[m],
+											  att->atttypmod,
+											  (Node *) cstate->escontext,
+											  &values[m]))
+					{
+						nulls[m] = true;
+						values[m] = (Datum) 0;
+						if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+							ereport(NOTICE,
+									errmsg("column \"%s\" was set to NULL due to data type incompatibility at line %llu",
+											cstate->cur_attname,
+											(unsigned long long) cstate->cur_lineno));
+						continue;
+					}
+					else
+						ereport(ERROR,
+								errcode(ERRCODE_NOT_NULL_VIOLATION),
+								errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+								errdatatype(typioparams[m]));
+				}
+
 				cstate->num_errors++;
 
-				if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+				if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE &&
+					cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
 				{
 					/*
 					 * Since we emit line number and column info in the below
@@ -1001,6 +1041,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 			cstate->cur_attval = NULL;
 		}
 
+		if (current_row_erroneous)
+			cstate->num_errors++;
 		Assert(fieldno == attr_count);
 	}
 	else
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 81cbf10aa2..04a155ad5f 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3250,7 +3250,7 @@ match_previous_words(int pattern_id,
 		COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
 					  "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
 					  "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
-					  "ON_ERROR", "LOG_VERBOSITY");
+					  "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");
 
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef72..7ebf4f7893 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
 {
 	COPY_ON_ERROR_STOP = 0,		/* immediately throw errors, default */
 	COPY_ON_ERROR_IGNORE,		/* ignore errors */
+	COPY_ON_ERROR_NULL,			/* set error field to null */
 } CopyOnErrorChoice;
 
 /*
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 1d8ac8f62e..50759eaf1c 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -98,7 +98,9 @@ typedef struct CopyFromStateData
 	ErrorSaveContext *escontext;	/* soft error trapped during in_functions
 									 * execution */
 	uint64		num_errors;		/* total number of rows which contained soft
-								 * errors */
+								 * errors, for ON_ERROR set_to_null, it's the
+								 * number of rows successfully converted to null
+								*/
 	int		   *defmap;			/* array of default att numbers related to
 								 * missing att */
 	ExprState **defexprs;		/* array of default att expressions for all
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae..377be5b99d 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
                                             ^
+COPY x from stdin (on_error set_to_null, on_error ignore);
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_to_null, on_error ignore);
+                                                 ^
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
 ERROR:  cannot specify NULL in BINARY mode
 COPY x from stdin (format BINARY, on_error ignore);
 ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_to_null);
+ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
+ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
 COPY x from stdin (on_error unsupported);
 ERROR:  COPY ON_ERROR "unsupported" not recognized
 LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
 ERROR:  COPY ON_ERROR cannot be used with COPY TO
 LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
                                          ^
+COPY x to stdin (on_error set_to_null);
+ERROR:  COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdin (on_error set_to_null);
+                         ^
 COPY x from stdin (log_verbosity unsupported);
 ERROR:  COPY LOG_VERBOSITY "unsupported" not recognized
 LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,51 @@ CONTEXT:  COPY check_ign_err
 NOTICE:  skipping row due to data type incompatibility at line 8 for column "k": "a"
 CONTEXT:  COPY check_ign_err
 NOTICE:  6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to NULL value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: "a"
+--fail, column a cannot set to NULL value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR:  missing data for column "c"
+CONTEXT:  COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR:  extra data after last expected column
+CONTEXT:  COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+NOTICE:  column "b" was set to NULL due to data type incompatibility at line 1
+CONTEXT:  COPY t_on_error_null, line 1, column b: "a"
+NOTICE:  column "c" was set to NULL due to data type incompatibility at line 1
+CONTEXT:  COPY t_on_error_null, line 1, column c: "d"
+NOTICE:  column "b" was set to NULL due to data type incompatibility at line 2
+CONTEXT:  COPY t_on_error_null, line 2, column b: "b"
+NOTICE:  column "c" was set to NULL due to data type incompatibility at line 3
+CONTEXT:  COPY t_on_error_null, line 3, column c: "e"
+NOTICE:  Erroneous values in 3 rows were replaced with NULL
+-- check inserted content
+select * from t_on_error_null;
+ a  |  b   |  c   
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL |   12
+ 13 |   14 | NULL
+(3 rows)
+
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce..6cd477af14 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
 COPY x from stdin (convert_selectively (a), convert_selectively (b));
 COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
 COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error ignore);
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 
 -- incorrect options
 COPY x from stdin (format BINARY, delimiter ',');
 COPY x from stdin (format BINARY, null 'x');
 COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_to_null);
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
 COPY x from stdin (on_error unsupported);
 COPY x from stdin (format TEXT, force_quote(a));
 COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
 COPY x to stdout (format CSV, force_null(a));
 COPY x to stdout (format CSV, force_null *);
 COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdin (on_error set_to_null);
 COPY x from stdin (log_verbosity unsupported);
 COPY x from stdin with (reject_limit 1);
 COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a	{2}	2
 8	{8}	8
 \.
 
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to NULL value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+\N	11	13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+a	11	14
+\.
+
+--fail, column a cannot set to NULL value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1	11	13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+10	a	d
+11	b	12
+13	14	e
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
-- 
2.34.1




* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2025-01-11 09:53  Kirill Reshke <[email protected]>
  parent: jian he <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: Kirill Reshke @ 2025-01-11 09:53 UTC (permalink / raw)
  To: jian he <[email protected]>; +Cc: Fujii Masao <[email protected]>; Jim Jones <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]>

On Fri, 10 Jan 2025 at 11:38, jian he <[email protected]> wrote:
> I think there are three remaining issues that may need more attention
> 1.
> Table 27.42. pg_stat_progress_copy View
> (<structname>pg_stat_progress_copy</structname>)
> column pg_stat_progress_copy.tuples_skipped now the description is
> ""
> When the ON_ERROR option is set to ignore, this value shows the number of tuples
> skipped due to malformed data. When the ON_ERROR option is set to set_to_null,
> this value shows the number of tuples where malformed data was converted to
> NULL.
> """
> now the column name tuples_skipped would not be that suitable for
> (on_error set_to_null).
> since now it is not tuple skipped, it is in a tuple some value was set to null.

Indeed this is something we need to fix.

> Or
> we can skip progress reports for (on_error set_to_null) case.

Maybe we can add a `malformed_tuples` column to this view?


> 3. do we settled (on_error set_to_null) syntax.

I think so. I prefer this syntax to others discussed in this thread.



-- 
Best regards,
Kirill Reshke







* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2025-01-14 05:51  jian he <[email protected]>
  parent: Kirill Reshke <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: jian he @ 2025-01-14 05:51 UTC (permalink / raw)
  To: Kirill Reshke <[email protected]>; +Cc: Fujii Masao <[email protected]>; Jim Jones <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]>

On Sat, Jan 11, 2025 at 5:54 PM Kirill Reshke <[email protected]> wrote:
>
> On Fri, 10 Jan 2025 at 11:38, jian he <[email protected]> wrote:
> > I think there are three remaining issues that may need more attention
> > 1.
> > Table 27.42. pg_stat_progress_copy View
> > (<structname>pg_stat_progress_copy</structname>)
> > column pg_stat_progress_copy.tuples_skipped now the description is
> > ""
> > When the ON_ERROR option is set to ignore, this value shows the number of tuples
> > skipped due to malformed data. When the ON_ERROR option is set to set_to_null,
> > this value shows the number of tuples where malformed data was converted to
> > NULL.
> > """
> > now the column name tuples_skipped would not be that suitable for
> > (on_error set_to_null).
> > since now it is not tuple skipped, it is in a tuple some value was set to null.
>
> Indeed this is something we need to fix.
>
> > Or
> > we can skip progress reports for (on_error set_to_null) case.
>
> Maybe we can add a `malformed_tuples` column to this view?
>
We can do this later.
So for on_error set_to_null, I've removed the pgstat_progress_update_param
related code.

The attached patch also includes some doc and error message enhancements.


Attachments:

  [text/x-patch] v11-0001-COPY-on_error-set_to_null.patch (20.6K, 2-v11-0001-COPY-on_error-set_to_null.patch)
  download | inline diff:
From a95d42bf1e6044c6c9a2afbb15d168d6679eceab Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Tue, 14 Jan 2025 13:46:12 +0800
Subject: [PATCH v11 1/1] COPY (on_error set_to_null)

Extend the "on_error action": introduce a new option, on_error set_to_null.
The current grammar prevents us from using "on_error null", so we chose "on_error set_to_null".

Any data type conversion errors during the COPY FROM process will result in the
affected column being set to NULL. This is only applicable when using the
non-binary format for COPY FROM. However, the not-null constraint will still be
enforced. If a conversion error leads to a NULL value in a column that has a
not-null constraint, a not-null constraint violation error will be triggered.

A regression test for a domain with a not-null constraint has been added.

Author: Jian He <[email protected]>,
Author: Kirill Reshke <[email protected]>

Reviewed-by:
Fujii Masao <[email protected]>,
Jim Jones <[email protected]>,
"David G. Johnston" <[email protected]>,
Yugo NAGATA <[email protected]>,
torikoshia <[email protected]>

discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
 doc/src/sgml/ref/copy.sgml               | 34 +++++++++-----
 src/backend/commands/copy.c              |  6 ++-
 src/backend/commands/copyfrom.c          | 29 ++++++++----
 src/backend/commands/copyfromparse.c     | 46 +++++++++++++++++-
 src/bin/psql/tab-complete.in.c           |  2 +-
 src/include/commands/copy.h              |  1 +
 src/include/commands/copyfrom_internal.h |  4 +-
 src/test/regress/expected/copy2.out      | 60 ++++++++++++++++++++++++
 src/test/regress/sql/copy2.sql           | 46 ++++++++++++++++++
 9 files changed, 201 insertions(+), 27 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f09..5e1d08ab91 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,23 +394,34 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies how to behave when encountering an error converting a column's
       input value into its data type.
       An <replaceable class="parameter">error_action</replaceable> value of
-      <literal>stop</literal> means fail the command, while
-      <literal>ignore</literal> means discard the input row and continue with the next one.
+      <literal>stop</literal> means fail the command,
+      <literal>ignore</literal> means discard the input row and continue with the next one, and
+      <literal>set_to_null</literal> means replace columns containing erroneous input values with
+      <literal>NULL</literal> and move to the next field.
       The default is <literal>stop</literal>.
      </para>
      <para>
-      The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+      The <literal>ignore</literal> and <literal>set_to_null</literal>
+      options are applicable only for <command>COPY FROM</command>
       when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
      </para>
      <para>
-      A <literal>NOTICE</literal> message containing the ignored row count is
-      emitted at the end of the <command>COPY FROM</command> if at least one
-      row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
-      <literal>verbose</literal>, a <literal>NOTICE</literal> message
+      For <literal>ignore</literal> option,
+      a <literal>NOTICE</literal> message containing the ignored row count is
+      emitted at the end of the <command>COPY FROM</command> if at least one row was discarded.
+      For <literal>set_to_null</literal> option,
+      a <literal>NOTICE</literal> message containing the row count that erroneous input values replaced by to null
+      happened is emitted at the end of the <command>COPY FROM</command> if at least one row was replaced.
+     </para>
+     <para>
+      When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+      for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
       containing the line of the input file and the column name whose input
-      conversion has failed is emitted for each discarded row.
-      When it is set to <literal>silent</literal>, no message is emitted
-      regarding ignored rows.
+      conversion has failed is emitted for each discarded row;
+      for <literal>set_to_null</literal> option,
+      a <literal>NOTICE</literal> message containing the line of the input file and the column name
+      whose value was replaced with <literal>NULL</literal> is emitted for each input conversion failure.
+      When it is set to <literal>silent</literal>, no message is emitted regarding rows with input conversion failures.
      </para>
     </listitem>
    </varlistentry>
@@ -458,7 +469,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      </para>
      <para>
       This is currently used in <command>COPY FROM</command> command when
-      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+      or <literal>set_to_null</literal>.
       </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc2..afe60758d4 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 				 parser_errposition(pstate, def->location)));
 
 	/*
-	 * Allow "stop", or "ignore" values.
+	 * Allow "stop", "ignore", "set_to_null" values.
 	 */
 	if (pg_strcasecmp(sval, "stop") == 0)
 		return COPY_ON_ERROR_STOP;
 	if (pg_strcasecmp(sval, "ignore") == 0)
 		return COPY_ON_ERROR_IGNORE;
+	if (pg_strcasecmp(sval, "set_to_null") == 0)
+		return COPY_ON_ERROR_NULL;
 
 	ereport(ERROR,
 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
 
-	if (opts_out->reject_limit && !opts_out->on_error)
+	if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 0cbd05f560..c38ff3dc6f 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1321,14 +1321,22 @@ CopyFrom(CopyFromState cstate)
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
-	if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
-		cstate->num_errors > 0 &&
+	if (cstate->num_errors > 0 &&
 		cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
-		ereport(NOTICE,
-				errmsg_plural("%llu row was skipped due to data type incompatibility",
-							  "%llu rows were skipped due to data type incompatibility",
-							  (unsigned long long) cstate->num_errors,
-							  (unsigned long long) cstate->num_errors));
+	{
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+			ereport(NOTICE,
+					errmsg_plural("%llu row was skipped due to data type incompatibility",
+								  "%llu rows were skipped due to data type incompatibility",
+								  (unsigned long long) cstate->num_errors,
+								  (unsigned long long) cstate->num_errors));
+		else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+			ereport(NOTICE,
+					errmsg_plural("erroneous values in %llu row were replaced with null",
+								  "erroneous values in %llu rows were replaced with null",
+								  (unsigned long long) cstate->num_errors,
+								  (unsigned long long) cstate->num_errors));
+	}
 
 	if (bistate != NULL)
 		FreeBulkInsertState(bistate);
@@ -1474,10 +1482,11 @@ BeginCopyFrom(ParseState *pstate,
 		cstate->escontext->error_occurred = false;
 
 		/*
-		 * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
-		 * options later
+		 * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+		 * We'll add other options later
 		 */
-		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+			cstate->opts.on_error == COPY_ON_ERROR_NULL)
 			cstate->escontext->details_wanted = false;
 	}
 	else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index caccdc8563..c0f6ce5057 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -871,6 +871,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 		int			fldct;
 		int			fieldno;
 		char	   *string;
+		bool		current_row_erroneous = false;
 
 		/* read raw fields in the next line */
 		if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
@@ -949,7 +950,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 
 			/*
 			 * If ON_ERROR is specified with IGNORE, skip rows with soft
-			 * errors
+			 * errors. If ON_ERROR is specified with set_to_null, try
+			 * to replace with null.
 			 */
 			else if (!InputFunctionCallSafe(&in_functions[m],
 											string,
@@ -960,9 +962,47 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 			{
 				Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
 
+				if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+				{
+					/*
+					 * we use this to count the number of rows (not fields) to
+					 * which on_error set_to_null was successfully applied
+					*/
+					if (!current_row_erroneous)
+						current_row_erroneous = true;
+
+					/*
+					 * we need another InputFunctionCallSafe call so that we can raise
+					 * a not-null violation for a domain with a not-null constraint.
+					*/
+					cstate->escontext->error_occurred = false;
+					if (InputFunctionCallSafe(&in_functions[m],
+											  NULL,
+											  typioparams[m],
+											  att->atttypmod,
+											  (Node *) cstate->escontext,
+											  &values[m]))
+					{
+						nulls[m] = true;
+						values[m] = (Datum) 0;
+						if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+							ereport(NOTICE,
+									errmsg("column \"%s\" was set to null due to data type incompatibility at line %llu",
+											cstate->cur_attname,
+											(unsigned long long) cstate->cur_lineno));
+						continue;
+					}
+					else
+						ereport(ERROR,
+								errcode(ERRCODE_NOT_NULL_VIOLATION),
+								errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+								errdatatype(typioparams[m]));
+				}
+
 				cstate->num_errors++;
 
-				if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+				if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE &&
+					cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
 				{
 					/*
 					 * Since we emit line number and column info in the below
@@ -1001,6 +1041,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 			cstate->cur_attval = NULL;
 		}
 
+		if (current_row_erroneous)
+			cstate->num_errors++;
 		Assert(fieldno == attr_count);
 	}
 	else
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 81cbf10aa2..04a155ad5f 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3250,7 +3250,7 @@ match_previous_words(int pattern_id,
 		COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
 					  "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
 					  "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
-					  "ON_ERROR", "LOG_VERBOSITY");
+					  "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");
 
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef72..7ebf4f7893 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
 {
 	COPY_ON_ERROR_STOP = 0,		/* immediately throw errors, default */
 	COPY_ON_ERROR_IGNORE,		/* ignore errors */
+	COPY_ON_ERROR_NULL,			/* set error field to null */
 } CopyOnErrorChoice;
 
 /*
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 1d8ac8f62e..50759eaf1c 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -98,7 +98,9 @@ typedef struct CopyFromStateData
 	ErrorSaveContext *escontext;	/* soft error trapped during in_functions
 									 * execution */
 	uint64		num_errors;		/* total number of rows which contained soft
-								 * errors */
+								 * errors, for ON_ERROR set_to_null, it's the
+								 * number of rows successfully converted to null
+								*/
 	int		   *defmap;			/* array of default att numbers related to
 								 * missing att */
 	ExprState **defexprs;		/* array of default att expressions for all
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae..9a5acef8db 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
                                             ^
+COPY x from stdin (on_error set_to_null, on_error ignore);
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_to_null, on_error ignore);
+                                                 ^
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
 ERROR:  cannot specify NULL in BINARY mode
 COPY x from stdin (format BINARY, on_error ignore);
 ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_to_null);
+ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
+ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
 COPY x from stdin (on_error unsupported);
 ERROR:  COPY ON_ERROR "unsupported" not recognized
 LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
 ERROR:  COPY ON_ERROR cannot be used with COPY TO
 LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
                                          ^
+COPY x to stdin (on_error set_to_null);
+ERROR:  COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdin (on_error set_to_null);
+                         ^
 COPY x from stdin (log_verbosity unsupported);
 ERROR:  COPY LOG_VERBOSITY "unsupported" not recognized
 LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,51 @@ CONTEXT:  COPY check_ign_err
 NOTICE:  skipping row due to data type incompatibility at line 8 for column "k": "a"
 CONTEXT:  COPY check_ign_err
 NOTICE:  6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: "a"
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR:  missing data for column "c"
+CONTEXT:  COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR:  extra data after last expected column
+CONTEXT:  COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+NOTICE:  column "b" was set to null due to data type incompatibility at line 1
+CONTEXT:  COPY t_on_error_null, line 1, column b: "a"
+NOTICE:  column "c" was set to null due to data type incompatibility at line 1
+CONTEXT:  COPY t_on_error_null, line 1, column c: "d"
+NOTICE:  column "b" was set to null due to data type incompatibility at line 2
+CONTEXT:  COPY t_on_error_null, line 2, column b: "b"
+NOTICE:  column "c" was set to null due to data type incompatibility at line 3
+CONTEXT:  COPY t_on_error_null, line 3, column c: "e"
+NOTICE:  erroneous values in 3 rows were replaced with null
+-- check inserted content
+select * from t_on_error_null;
+ a  |  b   |  c   
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL |   12
+ 13 |   14 | NULL
+(3 rows)
+
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce..003a91648e 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
 COPY x from stdin (convert_selectively (a), convert_selectively (b));
 COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
 COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error ignore);
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 
 -- incorrect options
 COPY x from stdin (format BINARY, delimiter ',');
 COPY x from stdin (format BINARY, null 'x');
 COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_to_null);
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
 COPY x from stdin (on_error unsupported);
 COPY x from stdin (format TEXT, force_quote(a));
 COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
 COPY x to stdout (format CSV, force_null(a));
 COPY x to stdout (format CSV, force_null *);
 COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdin (on_error set_to_null);
 COPY x from stdin (log_verbosity unsupported);
 COPY x from stdin with (reject_limit 1);
 COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a	{2}	2
 8	{8}	8
 \.
 
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+\N	11	13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+a	11	14
+\.
+
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1	11	13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+10	a	d
+11	b	12
+13	14	e
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2025-01-20 14:03  Kirill Reshke <[email protected]>
  parent: jian he <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: Kirill Reshke @ 2025-01-20 14:03 UTC (permalink / raw)
  To: jian he <[email protected]>; +Cc: Fujii Masao <[email protected]>; Jim Jones <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]>

On Tue, 14 Jan 2025 at 10:51, jian he <[email protected]> wrote:
>
> the attached patch also did some doc enhancement, error message enhancement.

LGTM


-- 
Best regards,
Kirill Reshke



^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2025-03-07 03:48  jian he <[email protected]>
  parent: Kirill Reshke <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: jian he @ 2025-03-07 03:48 UTC (permalink / raw)
  To: Kirill Reshke <[email protected]>; +Cc: Fujii Masao <[email protected]>; Jim Jones <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]>

hi.
rebase only.


Attachments:

  [text/x-patch] v12-0001-COPY-on_error-set_to_null.patch (19.8K, 2-v12-0001-COPY-on_error-set_to_null.patch)
  download | inline diff:
From ce0ce6438094cad553e509db65b7fd27de2b9af6 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Fri, 7 Mar 2025 11:43:51 +0800
Subject: [PATCH v12 1/1] COPY (on_error set_to_null)

Extend the "on_error" action with a new option: on_error set_to_null.
The current grammar does not allow "on_error null", so we chose "on_error set_to_null".

Any data type conversion errors during the COPY FROM process will result in the
affected column being set to NULL. This is only applicable when using the
non-binary format for COPY FROM. However, the not-null constraint will still be
enforced. If a conversion error leads to a NULL value in a column that has a
not-null constraint, a not-null constraint violation error will be triggered.

A regression test for a domain with a not-null constraint has been added.

Author: Jian He <[email protected]>
Author: Kirill Reshke <[email protected]>

Reviewed-by: Fujii Masao <[email protected]>
Reviewed-by: Jim Jones <[email protected]>
Reviewed-by: "David G. Johnston" <[email protected]>
Reviewed-by: Yugo NAGATA <[email protected]>
Reviewed-by: torikoshia <[email protected]>
Discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
 doc/src/sgml/ref/copy.sgml           | 34 +++++++++++-----
 src/backend/commands/copy.c          |  6 ++-
 src/backend/commands/copyfrom.c      | 29 +++++++++-----
 src/backend/commands/copyfromparse.c | 43 +++++++++++++++++++-
 src/bin/psql/tab-complete.in.c       |  2 +-
 src/include/commands/copy.h          |  1 +
 src/test/regress/expected/copy2.out  | 60 ++++++++++++++++++++++++++++
 src/test/regress/sql/copy2.sql       | 46 +++++++++++++++++++++
 8 files changed, 196 insertions(+), 25 deletions(-)
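
Note (not part of the commit): a minimal standalone sketch of the row-counting
rule described above, where a row is counted once no matter how many of its
fields were replaced with NULL. It is an illustration only and does not use any
PostgreSQL code; SketchRow, sketch_parse_int and sketch_convert_row are
hypothetical names, every column is assumed to be a plain int, and the domain
not-null check done by the patch is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAX_COLS 8

/* Hypothetical stand-in for one converted row; not a PostgreSQL structure. */
typedef struct SketchRow
{
	int			values[SKETCH_MAX_COLS];
	bool		nulls[SKETCH_MAX_COLS];
} SketchRow;

/*
 * Return true if "s" is a complete base-10 int.  A false result plays the
 * role of a soft error reported by a data type input function.
 */
static bool
sketch_parse_int(const char *s, int *out)
{
	char	   *end;
	long		v;

	errno = 0;
	v = strtol(s, &end, 10);
	if (end == s || *end != '\0' || errno == ERANGE ||
		v < INT_MIN || v > INT_MAX)
		return false;
	*out = (int) v;
	return true;
}

/*
 * Convert one row under set_to_null-like semantics: a field that fails
 * conversion becomes NULL and conversion continues with the next field.
 * Returns true if at least one field was replaced, so that the caller can
 * count rows rather than fields.
 */
static bool
sketch_convert_row(const char *const *fields, size_t ncols, SketchRow *row)
{
	bool		row_erroneous = false;

	assert(ncols <= SKETCH_MAX_COLS);

	for (size_t i = 0; i < ncols; i++)
	{
		row->nulls[i] = false;
		if (!sketch_parse_int(fields[i], &row->values[i]))
		{
			row->values[i] = 0;
			row->nulls[i] = true;
			row_erroneous = true;
		}
	}
	return row_erroneous;
}
```

A caller would increment its own error counter once per row for which
sketch_convert_row() returns true, which mirrors how current_row_erroneous
feeds num_errors in the patch.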

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..91bc25b9ab3 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,23 +394,34 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies how to behave when encountering an error converting a column's
       input value into its data type.
       An <replaceable class="parameter">error_action</replaceable> value of
-      <literal>stop</literal> means fail the command, while
-      <literal>ignore</literal> means discard the input row and continue with the next one.
+      <literal>stop</literal> means fail the command,
+      <literal>ignore</literal> means discard the input row and continue with the next one, and
+      <literal>set_to_null</literal> means replace columns containing erroneous input values with
+      <literal>NULL</literal> and move to the next field.
       The default is <literal>stop</literal>.
      </para>
      <para>
-      The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+      The <literal>ignore</literal> and <literal>set_to_null</literal>
+      options are applicable only for <command>COPY FROM</command>
       when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
      </para>
      <para>
-      A <literal>NOTICE</literal> message containing the ignored row count is
-      emitted at the end of the <command>COPY FROM</command> if at least one
-      row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
-      <literal>verbose</literal>, a <literal>NOTICE</literal> message
+      For <literal>ignore</literal> option,
+      a <literal>NOTICE</literal> message containing the ignored row count is
+      emitted at the end of the <command>COPY FROM</command> if at least one row was discarded.
+      For <literal>set_to_null</literal> option,
+      a <literal>NOTICE</literal> message containing the number of rows in which erroneous input values
+      were replaced with null is emitted at the end of the <command>COPY FROM</command> if at least one such replacement occurred.
+     </para>
+     <para>
+      When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+      for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
       containing the line of the input file and the column name whose input
-      conversion has failed is emitted for each discarded row.
-      When it is set to <literal>silent</literal>, no message is emitted
-      regarding ignored rows.
+      conversion has failed is emitted for each discarded row;
+      for <literal>set_to_null</literal> option,
+      a <literal>NOTICE</literal> message containing the line of the input file and the column name
+      whose value was replaced with <literal>NULL</literal> is emitted for each input conversion failure.
+      When it is set to <literal>silent</literal>, no message is emitted regarding rows with input conversion failures.
      </para>
     </listitem>
    </varlistentry>
@@ -458,7 +469,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      </para>
      <para>
       This is currently used in <command>COPY FROM</command> command when
-      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+      or <literal>set_to_null</literal>.
       </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc29..afe60758d40 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 				 parser_errposition(pstate, def->location)));
 
 	/*
-	 * Allow "stop", or "ignore" values.
+	 * Allow "stop", "ignore", "set_to_null" values.
 	 */
 	if (pg_strcasecmp(sval, "stop") == 0)
 		return COPY_ON_ERROR_STOP;
 	if (pg_strcasecmp(sval, "ignore") == 0)
 		return COPY_ON_ERROR_IGNORE;
+	if (pg_strcasecmp(sval, "set_to_null") == 0)
+		return COPY_ON_ERROR_NULL;
 
 	ereport(ERROR,
 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
 
-	if (opts_out->reject_limit && !opts_out->on_error)
+	if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index bcf66f0adf8..4502ce2d366 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate)
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
-	if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
-		cstate->num_errors > 0 &&
+	if (cstate->num_errors > 0 &&
 		cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
-		ereport(NOTICE,
-				errmsg_plural("%llu row was skipped due to data type incompatibility",
-							  "%llu rows were skipped due to data type incompatibility",
-							  (unsigned long long) cstate->num_errors,
-							  (unsigned long long) cstate->num_errors));
+	{
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+			ereport(NOTICE,
+					errmsg_plural("%llu row was skipped due to data type incompatibility",
+								  "%llu rows were skipped due to data type incompatibility",
+								  (unsigned long long) cstate->num_errors,
+								  (unsigned long long) cstate->num_errors));
+		else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+			ereport(NOTICE,
+					errmsg_plural("erroneous values in %llu row were replaced with null",
+								  "erroneous values in %llu rows were replaced with null",
+								  (unsigned long long) cstate->num_errors,
+								  (unsigned long long) cstate->num_errors));
+	}
 
 	if (bistate != NULL)
 		FreeBulkInsertState(bistate);
@@ -1622,10 +1630,11 @@ BeginCopyFrom(ParseState *pstate,
 		cstate->escontext->error_occurred = false;
 
 		/*
-		 * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
-		 * options later
+		 * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+		 * We'll add other options later
 		 */
-		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+			cstate->opts.on_error == COPY_ON_ERROR_NULL)
 			cstate->escontext->details_wanted = false;
 	}
 	else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index e8128f85e6b..e2b4d1f7ec9 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -947,6 +947,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 	int			fldct;
 	int			fieldno;
 	char	   *string;
+	bool		current_row_erroneous = false;
 
 	tupDesc = RelationGetDescr(cstate->rel);
 	attr_count = list_length(cstate->attnumlist);
@@ -1025,6 +1026,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 
 		/*
 		 * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+		 * If ON_ERROR is specified with set_to_null, try to replace with null.
 		 */
 		else if (!InputFunctionCallSafe(&in_functions[m],
 										string,
@@ -1035,9 +1037,46 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 		{
 			Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
 
+			if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+			{
+				/*
+				 * we use this to count the number of rows (not fields) to
+				 * which on_error set_to_null was successfully applied
+				*/
+				if (!current_row_erroneous)
+					current_row_erroneous = true;
+
+				/*
+				 * we need another InputFunctionCallSafe call so that we can raise
+				 * a not-null violation for a domain with a not-null constraint.
+				*/
+				cstate->escontext->error_occurred = false;
+				if (InputFunctionCallSafe(&in_functions[m],
+										  NULL,
+										  typioparams[m],
+										  att->atttypmod,
+										  (Node *) cstate->escontext,
+										  &values[m]))
+				{
+					nulls[m] = true;
+					values[m] = (Datum) 0;
+					if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+						ereport(NOTICE,
+								errmsg("column \"%s\" was set to null due to data type incompatibility at line %llu",
+										cstate->cur_attname,
+										(unsigned long long) cstate->cur_lineno));
+					continue;
+				}
+				else
+					ereport(ERROR,
+							errcode(ERRCODE_NOT_NULL_VIOLATION),
+							errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+							errdatatype(typioparams[m]));
+			}
 			cstate->num_errors++;
 
-			if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+			if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE &&
+				cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
 			{
 				/*
 				 * Since we emit line number and column info in the below
@@ -1076,6 +1115,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 		cstate->cur_attval = NULL;
 	}
 
+	if (current_row_erroneous)
+		cstate->num_errors++;
 	Assert(fieldno == attr_count);
 
 	return true;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 8432be641ac..fcdc61bcb57 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3279,7 +3279,7 @@ match_previous_words(int pattern_id,
 		COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
 					  "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
 					  "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
-					  "ON_ERROR", "LOG_VERBOSITY");
+					  "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");
 
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..7ebf4f78933 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
 {
 	COPY_ON_ERROR_STOP = 0,		/* immediately throw errors, default */
 	COPY_ON_ERROR_IGNORE,		/* ignore errors */
+	COPY_ON_ERROR_NULL,			/* set error field to null */
 } CopyOnErrorChoice;
 
 /*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..9a5acef8db0 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
                                             ^
+COPY x from stdin (on_error set_to_null, on_error ignore);
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_to_null, on_error ignore);
+                                                 ^
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
 ERROR:  cannot specify NULL in BINARY mode
 COPY x from stdin (format BINARY, on_error ignore);
 ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_to_null);
+ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
+ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
 COPY x from stdin (on_error unsupported);
 ERROR:  COPY ON_ERROR "unsupported" not recognized
 LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
 ERROR:  COPY ON_ERROR cannot be used with COPY TO
 LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
                                          ^
+COPY x to stdin (on_error set_to_null);
+ERROR:  COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdin (on_error set_to_null);
+                         ^
 COPY x from stdin (log_verbosity unsupported);
 ERROR:  COPY LOG_VERBOSITY "unsupported" not recognized
 LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,51 @@ CONTEXT:  COPY check_ign_err
 NOTICE:  skipping row due to data type incompatibility at line 8 for column "k": "a"
 CONTEXT:  COPY check_ign_err
 NOTICE:  6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: "a"
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR:  missing data for column "c"
+CONTEXT:  COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR:  extra data after last expected column
+CONTEXT:  COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+NOTICE:  column "b" was set to null due to data type incompatibility at line 1
+CONTEXT:  COPY t_on_error_null, line 1, column b: "a"
+NOTICE:  column "c" was set to null due to data type incompatibility at line 1
+CONTEXT:  COPY t_on_error_null, line 1, column c: "d"
+NOTICE:  column "b" was set to null due to data type incompatibility at line 2
+CONTEXT:  COPY t_on_error_null, line 2, column b: "b"
+NOTICE:  column "c" was set to null due to data type incompatibility at line 3
+CONTEXT:  COPY t_on_error_null, line 3, column c: "e"
+NOTICE:  erroneous values in 3 rows were replaced with null
+-- check inserted content
+select * from t_on_error_null;
+ a  |  b   |  c   
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL |   12
+ 13 |   14 | NULL
+(3 rows)
+
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..003a91648e2 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
 COPY x from stdin (convert_selectively (a), convert_selectively (b));
 COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
 COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error ignore);
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 
 -- incorrect options
 COPY x from stdin (format BINARY, delimiter ',');
 COPY x from stdin (format BINARY, null 'x');
 COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_to_null);
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
 COPY x from stdin (on_error unsupported);
 COPY x from stdin (format TEXT, force_quote(a));
 COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
 COPY x to stdout (format CSV, force_null(a));
 COPY x to stdout (format CSV, force_null *);
 COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdin (on_error set_to_null);
 COPY x from stdin (log_verbosity unsupported);
 COPY x from stdin with (reject_limit 1);
 COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a	{2}	2
 8	{8}	8
 \.
 
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+\N	11	13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+a	11	14
+\.
+
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1	11	13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+10	a	d
+11	b	12
+13	14	e
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
-- 
2.34.1




* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2025-03-11 10:31  Jim Jones <[email protected]>
  parent: jian he <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: Jim Jones @ 2025-03-11 10:31 UTC (permalink / raw)
  To: jian he <[email protected]>; Kirill Reshke <[email protected]>; +Cc: Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]>

Hi Jian

On 07.03.25 04:48, jian he wrote:
> hi.
> rebase only.


I revisited this patch today. It applies and builds cleanly, and it
works as expected.

Some tests and minor comments:

====

1) WARNING might be a better fit than NOTICE here.

postgres=# \pset null NULL
Null display is "NULL".
postgres=# CREATE TEMPORARY TABLE t (x int, y int, z text);
CREATE TABLE
postgres=# COPY t (x,y) FROM STDIN WITH (on_error set_to_null, format csv);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,a
>> 2,1
>> 3,2
>> 4,b
>> a,c
>> \.
NOTICE:  erroneous values in 3 rows were replaced with null
COPY 5
postgres=# SELECT * FROM t;
  x   |  y   |  z   
------+------+------
    1 | NULL | NULL
    2 |    1 | NULL
    3 |    2 | NULL
    4 | NULL | NULL
 NULL | NULL | NULL
(5 rows)


postgres=# \pset null NULL
Null display is "NULL".
postgres=# CREATE TEMPORARY TABLE t (x int, y int, z text);
CREATE TABLE
postgres=# COPY t (x,y) FROM STDIN WITH (on_error set_to_null, format
csv, log_verbosity verbose);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,a
>> 2,1
>> 3,2
>> 4,b
>> a,c
>> \.
NOTICE:  column "y" was set to null due to data type incompatibility at
line 1
NOTICE:  column "y" was set to null due to data type incompatibility at
line 4
NOTICE:  column "x" was set to null due to data type incompatibility at
line 5
NOTICE:  column "y" was set to null due to data type incompatibility at
line 5
NOTICE:  erroneous values in 3 rows were replaced with null
COPY 5
postgres=# SELECT * FROM t;
  x   |  y   |  z   
------+------+------
    1 | NULL | NULL
    2 |    1 | NULL
    3 |    2 | NULL
    4 | NULL | NULL
 NULL | NULL | NULL
(5 rows)


I would still leave the extra messages from "log_verbosity verbose" as
NOTICE though. What do you think?

====

2) Inconsistent terminology. Invalid values in "on_error set_to_null"
mode are named "erroneous", but "invalid" in "on_error stop" mode.
I don't want to get into the semantics of erroneous versus invalid, but
sticking to one term would IMHO look better.

postgres=# \pset null NULL
Null display is "NULL".
postgres=# CREATE TEMPORARY TABLE t (x int, y int, z text);
CREATE TABLE
postgres=# COPY t (x,y) FROM STDIN WITH (on_error stop, format csv,
log_verbosity verbose);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,a
>> 2,1
>> 3,2
>> 4,b
>> a,c
>> \.
ERROR:  invalid input syntax for type integer: "a"
CONTEXT:  COPY t, line 1, column y: "a"
postgres=# SELECT * FROM t;
 x | y | z
---+---+---
(0 rows)

====

3) same as in 1)

postgres=# \pset null NULL
Null display is "NULL".
postgres=# CREATE TEMPORARY TABLE t (x int, y int, z text);
CREATE TABLE
postgres=# COPY t (x,y) FROM STDIN WITH (on_error ignore, format csv,
log_verbosity verbose);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,a
>> 2,1
>> 3,2
>> 4,b
>> a,c
>> \.
NOTICE:  skipping row due to data type incompatibility at line 1 for
column "y": "a"
NOTICE:  skipping row due to data type incompatibility at line 4 for
column "y": "b"
NOTICE:  skipping row due to data type incompatibility at line 5 for
column "x": "a"
NOTICE:  3 rows were skipped due to data type incompatibility
COPY 2
postgres=# SELECT * FROM t;
 x | y |  z   
---+---+------
 2 | 1 | NULL
 3 | 2 | NULL
(2 rows)

====

"on_error ignore" works well with "reject_limit #"

postgres=# \pset null NULL
Null display is "NULL".
postgres=# CREATE TEMPORARY TABLE t (x int, y int, z text);
CREATE TABLE
postgres=# COPY t (x,y) FROM STDIN WITH (on_error ignore, format csv,
log_verbosity verbose, reject_limit 1);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,a
>> 2,1
>> 3,2
>> 4,b
>> a,c
>> \.
NOTICE:  skipping row due to data type incompatibility at line 1 for
column "y": "a"
NOTICE:  skipping row due to data type incompatibility at line 4 for
column "y": "b"
ERROR:  skipped more than REJECT_LIMIT (1) rows due to data type
incompatibility
CONTEXT:  COPY t, line 4, column y: "b"
postgres=# SELECT * FROM t;
 x | y | z
---+---+---
(0 rows)
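
The REJECT_LIMIT behaviour seen above can be sketched in a few lines of
standalone C. This is an illustration only, not PostgreSQL source: the
function name skip_row_exceeds_limit is hypothetical, and treating a limit
of 0 as "no limit set" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Record one more skipped row and report whether COPY would have to abort.
 * Hypothetical helper; reject_limit == 0 is assumed to mean "no limit".
 */
bool
skip_row_exceeds_limit(unsigned long long *num_errors,
                       unsigned long long reject_limit)
{
    assert(num_errors != NULL);

    (*num_errors)++;

    /* Abort only once strictly more rows than the limit were skipped. */
    return reject_limit > 0 && *num_errors > reject_limit;
}
```

With a limit of 1, the first bad row is skipped and the second one makes the
check return true, which corresponds to the ERROR reported at line 4 above.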

best regards, Jim






* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2025-03-12 08:00  jian he <[email protected]>
  parent: Jim Jones <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: jian he @ 2025-03-12 08:00 UTC (permalink / raw)
  To: Jim Jones <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]>

On Tue, Mar 11, 2025 at 6:31 PM Jim Jones <[email protected]> wrote:
>
>
> I revisited this patch today. It applies and builds cleanly, and it
> works as expected.
>
> Some tests and minor comments:
>

Hi Jim Jones,
thanks for testing it again!


> ====
>
> 1) WARNING might be a better fit than NOTICE here.
>

But with NOTICE, on_error set_to_null is aligned with on_error ignore.

>
> I would still leave the extra messages from "log_verbosity verbose" as
> NOTICE though. What do you think?
>
> ====

When the LOG_VERBOSITY option is set to verbose:
for the ignore option, a NOTICE message containing the line of the input
file and the column name whose input conversion has failed is emitted
for each discarded row;
for the set_to_null option, a NOTICE message containing the line of the
input file and the column name where the value was replaced with NULL
is emitted for each input conversion failure.

See the above description:
on_error set_to_null is aligned with on_error ignore.
It's just that on_error ignore is per row, while on_error set_to_null is
per column/field.
So NOTICE is consistent with the other on_error option.
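
To illustrate the per-row versus per-field distinction, here is a minimal
standalone C sketch (not the patch code). The names OnErrorChoice,
CopyCounters and process_row are hypothetical; only the counting idea, one
summary count per affected row and one verbose NOTICE per replaced field,
follows the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the on_error choices discussed in this thread. */
typedef enum
{
    ON_ERROR_STOP = 0,
    ON_ERROR_IGNORE,
    ON_ERROR_NULL
} OnErrorChoice;

typedef struct
{
    unsigned long long num_errors;    /* rows reported in the summary NOTICE */
    unsigned long long field_notices; /* verbose NOTICEs for nulled fields */
    unsigned long long rows_kept;     /* rows that end up in the table */
} CopyCounters;

/*
 * Process one input row. field_ok[i] is false where conversion failed.
 * Returns true if the row is kept.
 */
bool
process_row(OnErrorChoice on_error, const bool *field_ok, size_t nfields,
            CopyCounters *c)
{
    bool row_had_bad_field = false;

    /* With "stop" the command would fail instead of reaching this path. */
    assert(on_error != ON_ERROR_STOP);

    for (size_t i = 0; i < nfields; i++)
    {
        if (field_ok[i])
            continue;

        if (on_error == ON_ERROR_IGNORE)
        {
            c->num_errors++;    /* ignore: the whole row is discarded */
            return false;
        }

        /* set_to_null: the field becomes NULL and parsing continues */
        row_had_bad_field = true;
        c->field_notices++;
    }

    if (row_had_bad_field)
        c->num_errors++;        /* counted once per row, not per field */
    c->rows_kept++;
    return true;
}
```

Fed the five rows from Jim's test (1,a / 2,1 / 3,2 / 4,b / a,c), set_to_null
keeps all 5 rows, counts 3 affected rows and 4 replaced fields, while ignore
keeps 2 rows and counts 3 skipped rows, which matches the psql output quoted
earlier in the thread.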

>
> 2) Inconsistent terminology. Invalid values in "on_error set_to_null"
> mode are named "erroneous", but "invalid" in "on_error stop" mode.
> I don't want to get into the semantics of erroneous or invalid, but
> sticking to one terminology would IMHO look better.
>
I am open to changing it.
What do you think of "invalid values in %llu row was replaced with null"?

> ====
>
> "on_error ignore" works well with "reject_limit #"
>
I remember there was some confusion about combining on_error set_to_null with
the reject_limit option.
I chose not to support it.
Obviously, if there is consensus, we can support it later.
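
As a rough sketch of the option checks involved (standalone C, not the patch
itself): the enum and check_copy_options are hypothetical names, the two
messages are the ones from the regression test output, and returning a
string instead of raising an error is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the on_error choices discussed in this thread. */
typedef enum
{
    ON_ERROR_STOP = 0,
    ON_ERROR_IGNORE,
    ON_ERROR_NULL
} OnErrorChoice;

/*
 * Return NULL if the option combination is accepted, otherwise the error
 * text. A reject_limit of 0 is assumed to mean "option not given".
 */
const char *
check_copy_options(OnErrorChoice on_error, unsigned long long reject_limit,
                   bool binary)
{
    assert(on_error <= ON_ERROR_NULL);

    /* Neither ignore nor set_to_null is available for binary input. */
    if (binary && on_error != ON_ERROR_STOP)
        return "only ON_ERROR STOP is allowed in BINARY mode";

    /* REJECT_LIMIT counts skipped rows, so it only pairs with ignore. */
    if (reject_limit != 0 && on_error != ON_ERROR_IGNORE)
        return "COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE";

    return NULL;
}
```

Under these assumptions, (on_error set_to_null, reject_limit 2) is rejected
while (on_error ignore, reject_limit 2) passes.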






* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2025-03-12 08:25  Jim Jones <[email protected]>
  parent: jian he <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: Jim Jones @ 2025-03-12 08:25 UTC (permalink / raw)
  To: jian he <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]>


On 12.03.25 09:00, jian he wrote:
>> 1) WARNING might be a better fit than NOTICE here.
>>
> But with NOTICE, on_error set_to_null is aligned with on_error ignore.
>
>> I would still leave the extra messages from "log_verbosity verbose" as
>> NOTICE though. What do you think?
>>
>> ====
> When the LOG_VERBOSITY option is set to verbose:
> for the ignore option, a NOTICE message containing the line of the input
> file and the column name whose input conversion has failed is emitted
> for each discarded row;
> for the set_to_null option, a NOTICE message containing the line of the
> input file and the column name where the value was replaced with NULL
> is emitted for each input conversion failure.
>
> See the above description:
> on_error set_to_null is aligned with on_error ignore.
> It's just that on_error ignore is per row, while on_error set_to_null is
> per column/field.
> So NOTICE is consistent with the other on_error option.


I considered using a WARNING due to the severity of the issue - the
failure to import data - but either NOTICE or WARNING works for me.


>> 2) Inconsistent terminology. Invalid values in "on_error set_to_null"
>> mode are named "erroneous", but "invalid" in "on_error stop" mode.
>> I don't want to get into the semantics of erroneous or invalid, but
>> sticking to one terminology would IMHO look better.
>>
> I am open to changing it.
> What do you think of "invalid values in %llu row was replaced with null"?


LGTM: "invalid values in %llu rows were replaced with null"
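
For reference, the singular/plural choice behind this message can be
sketched in standalone C. The patch uses errmsg_plural for this; the helper
below, format_null_summary, is a hypothetical stand-in that only handles the
English forms.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write the summary message for nrows affected rows into buf.
 * Returns what snprintf returns (the length of the full message).
 */
int
format_null_summary(char *buf, size_t buflen, unsigned long long nrows)
{
    assert(buf != NULL && buflen > 0);

    if (nrows == 1)
        return snprintf(buf, buflen,
                        "invalid values in %llu row was replaced with null",
                        nrows);

    return snprintf(buf, buflen,
                    "invalid values in %llu rows were replaced with null",
                    nrows);
}
```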

Thanks for the patch!

Best, Jim







* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2025-03-18 03:55  jian he <[email protected]>
  parent: Jim Jones <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: jian he @ 2025-03-18 03:55 UTC (permalink / raw)
  To: Jim Jones <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]>

On Wed, Mar 12, 2025 at 4:26 PM Jim Jones <[email protected]> wrote:
>
> >> 2) Inconsistent terminology. Invalid values in "on_error set_to_null"
> >> mode are named "erroneous", but "invalid" in "on_error stop" mode.
> >> I don't want to get into the semantics of erroneous or invalid, but
> >> sticking to one terminology would IMHO look better.
> >>
> > I am open to changing it.
> > What do you think of "invalid values in %llu row was replaced with null"?
>
> LGTM: "invalid values in %llu rows were replaced with null"
>
Changed based on this.

Also made some minor documentation tweaks.


Attachments:

  [text/x-patch] v13-0001-COPY-on_error-set_to_null.patch (19.8K, 2-v13-0001-COPY-on_error-set_to_null.patch)
From 3553eee56c8dd0c3ce334d1f37b511acbbc640af Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Tue, 18 Mar 2025 11:51:48 +0800
Subject: [PATCH v13 1/1] COPY (on_error set_to_null)

Extend "on_error action" with a new option: on_error set_to_null.
The current grammar does not allow "on_error null", so we chose "on_error set_to_null".

Any data type conversion error during the COPY FROM process will result in the
affected column being set to NULL. This only applies when using a non-binary
format for COPY FROM.

However, not-null constraints will still be enforced.
If a column has a not-null constraint, a successful (on_error set_to_null)
action will cause a not-null constraint violation.
This also applies when the column type is a domain with a not-null constraint.

A regression test for a domain with a not-null constraint has been added.

Author: Jian He <[email protected]>
Author: Kirill Reshke <[email protected]>

Reviewed-by:
Fujii Masao <[email protected]>
Jim Jones <[email protected]>
"David G. Johnston" <[email protected]>
Yugo NAGATA <[email protected]>
torikoshia <[email protected]>

discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
 doc/src/sgml/ref/copy.sgml           | 34 +++++++++++-----
 src/backend/commands/copy.c          |  6 ++-
 src/backend/commands/copyfrom.c      | 29 +++++++++-----
 src/backend/commands/copyfromparse.c | 43 +++++++++++++++++++-
 src/bin/psql/tab-complete.in.c       |  2 +-
 src/include/commands/copy.h          |  1 +
 src/test/regress/expected/copy2.out  | 60 ++++++++++++++++++++++++++++
 src/test/regress/sql/copy2.sql       | 46 +++++++++++++++++++++
 8 files changed, 196 insertions(+), 25 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..cdb725d7565 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,23 +394,34 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies how to behave when encountering an error converting a column's
       input value into its data type.
       An <replaceable class="parameter">error_action</replaceable> value of
-      <literal>stop</literal> means fail the command, while
-      <literal>ignore</literal> means discard the input row and continue with the next one.
+      <literal>stop</literal> means fail the command,
+      <literal>ignore</literal> means discard the input row and continue with the next one, and
+      <literal>set_to_null</literal> means replace columns containing invalid input values with
+      <literal>NULL</literal> and move to the next field.
       The default is <literal>stop</literal>.
      </para>
      <para>
-      The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+      The <literal>ignore</literal> and <literal>set_to_null</literal>
+      options are applicable only for <command>COPY FROM</command>
       when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
      </para>
      <para>
-      A <literal>NOTICE</literal> message containing the ignored row count is
-      emitted at the end of the <command>COPY FROM</command> if at least one
-      row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
-      <literal>verbose</literal>, a <literal>NOTICE</literal> message
+      For <literal>ignore</literal> option,
+      a <literal>NOTICE</literal> message containing the ignored row count is
+      emitted at the end of the <command>COPY FROM</command> if at least one row was discarded.
+      For <literal>set_to_null</literal> option,
+      a <literal>NOTICE</literal> message indicating the number of rows where invalid input values were replaced with
+      null is emitted at the end of the <command>COPY FROM</command> if at least one row was replaced.
+     </para>
+     <para>
+      When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+      for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
       containing the line of the input file and the column name whose input
-      conversion has failed is emitted for each discarded row.
-      When it is set to <literal>silent</literal>, no message is emitted
-      regarding ignored rows.
+      conversion has failed is emitted for each discarded row;
+      for <literal>set_to_null</literal> option,
+      a <literal>NOTICE</literal> message containing the line of the input file and the column name
+      where value was replaced with <literal>NULL</literal> for each input conversion failure.
+      When it is set to <literal>silent</literal>, no message is emitted regarding input conversion failed rows.
      </para>
     </listitem>
    </varlistentry>
@@ -458,7 +469,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      </para>
      <para>
       This is currently used in <command>COPY FROM</command> command when
-      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+      or <literal>set_to_null</literal>.
       </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc29..afe60758d40 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 				 parser_errposition(pstate, def->location)));
 
 	/*
-	 * Allow "stop", or "ignore" values.
+	 * Allow "stop", "ignore", "set_to_null" values.
 	 */
 	if (pg_strcasecmp(sval, "stop") == 0)
 		return COPY_ON_ERROR_STOP;
 	if (pg_strcasecmp(sval, "ignore") == 0)
 		return COPY_ON_ERROR_IGNORE;
+	if (pg_strcasecmp(sval, "set_to_null") == 0)
+		return COPY_ON_ERROR_NULL;
 
 	ereport(ERROR,
 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
 
-	if (opts_out->reject_limit && !opts_out->on_error)
+	if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index bcf66f0adf8..43a227eae72 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate)
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
-	if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
-		cstate->num_errors > 0 &&
+	if (cstate->num_errors > 0 &&
 		cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
-		ereport(NOTICE,
-				errmsg_plural("%llu row was skipped due to data type incompatibility",
-							  "%llu rows were skipped due to data type incompatibility",
-							  (unsigned long long) cstate->num_errors,
-							  (unsigned long long) cstate->num_errors));
+	{
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+			ereport(NOTICE,
+					errmsg_plural("%llu row was skipped due to data type incompatibility",
+								  "%llu rows were skipped due to data type incompatibility",
+								  (unsigned long long) cstate->num_errors,
+								  (unsigned long long) cstate->num_errors));
+		else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+			ereport(NOTICE,
+					errmsg_plural("invalid values in %llu row was replaced with null",
+								  "invalid values in %llu rows were replaced with null",
+								  (unsigned long long) cstate->num_errors,
+								  (unsigned long long) cstate->num_errors));
+	}
 
 	if (bistate != NULL)
 		FreeBulkInsertState(bistate);
@@ -1622,10 +1630,11 @@ BeginCopyFrom(ParseState *pstate,
 		cstate->escontext->error_occurred = false;
 
 		/*
-		 * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
-		 * options later
+		 * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+		 * We'll add other options later
 		 */
-		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+			cstate->opts.on_error == COPY_ON_ERROR_NULL)
 			cstate->escontext->details_wanted = false;
 	}
 	else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index e8128f85e6b..e2b4d1f7ec9 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -947,6 +947,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 	int			fldct;
 	int			fieldno;
 	char	   *string;
+	bool		current_row_erroneous = false;
 
 	tupDesc = RelationGetDescr(cstate->rel);
 	attr_count = list_length(cstate->attnumlist);
@@ -1025,6 +1026,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 
 		/*
 		 * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+		 * If ON_ERROR is specified with set_to_null, try to replace with null.
 		 */
 		else if (!InputFunctionCallSafe(&in_functions[m],
 										string,
@@ -1035,9 +1037,46 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 		{
 			Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
 
+			if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+			{
+				/*
+				 * we use this count the number of rows (not fields) that
+				 * successfully applied the on_error set_to_null
+				*/
+				if (!current_row_erroneous)
+					current_row_erroneous = true;
+
+				/*
+				 * we need another InputFunctionCallSafe so we can error out
+				 * not-null violation for domain with not-null constraint.
+				*/
+				cstate->escontext->error_occurred = false;
+				if (InputFunctionCallSafe(&in_functions[m],
+										  NULL,
+										  typioparams[m],
+										  att->atttypmod,
+										  (Node *) cstate->escontext,
+										  &values[m]))
+				{
+					nulls[m] = true;
+					values[m] = (Datum) 0;
+					if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+						ereport(NOTICE,
+								errmsg("column \"%s\" was set to null due to data type incompatibility at line %llu",
+										cstate->cur_attname,
+										(unsigned long long) cstate->cur_lineno));
+					continue;
+				}
+				else
+					ereport(ERROR,
+							errcode(ERRCODE_NOT_NULL_VIOLATION),
+							errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+							errdatatype(typioparams[m]));
+			}
 			cstate->num_errors++;
 
-			if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+			if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE &&
+				cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
 			{
 				/*
 				 * Since we emit line number and column info in the below
@@ -1076,6 +1115,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 		cstate->cur_attval = NULL;
 	}
 
+	if (current_row_erroneous)
+		cstate->num_errors++;
 	Assert(fieldno == attr_count);
 
 	return true;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 9a4d993e2bc..7980513a9bd 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3280,7 +3280,7 @@ match_previous_words(int pattern_id,
 		COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
 					  "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
 					  "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
-					  "ON_ERROR", "LOG_VERBOSITY");
+					  "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");
 
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..7ebf4f78933 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
 {
 	COPY_ON_ERROR_STOP = 0,		/* immediately throw errors, default */
 	COPY_ON_ERROR_IGNORE,		/* ignore errors */
+	COPY_ON_ERROR_NULL,			/* set error field to null */
 } CopyOnErrorChoice;
 
 /*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..caa94bfd526 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
                                             ^
+COPY x from stdin (on_error set_to_null, on_error ignore);
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_to_null, on_error ignore);
+                                                 ^
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
 ERROR:  cannot specify NULL in BINARY mode
 COPY x from stdin (format BINARY, on_error ignore);
 ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_to_null);
+ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
+ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
 COPY x from stdin (on_error unsupported);
 ERROR:  COPY ON_ERROR "unsupported" not recognized
 LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
 ERROR:  COPY ON_ERROR cannot be used with COPY TO
 LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
                                          ^
+COPY x to stdin (on_error set_to_null);
+ERROR:  COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdin (on_error set_to_null);
+                         ^
 COPY x from stdin (log_verbosity unsupported);
 ERROR:  COPY LOG_VERBOSITY "unsupported" not recognized
 LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,51 @@ CONTEXT:  COPY check_ign_err
 NOTICE:  skipping row due to data type incompatibility at line 8 for column "k": "a"
 CONTEXT:  COPY check_ign_err
 NOTICE:  6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: "a"
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR:  missing data for column "c"
+CONTEXT:  COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR:  extra data after last expected column
+CONTEXT:  COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+NOTICE:  column "b" was set to null due to data type incompatibility at line 1
+CONTEXT:  COPY t_on_error_null, line 1, column b: "a"
+NOTICE:  column "c" was set to null due to data type incompatibility at line 1
+CONTEXT:  COPY t_on_error_null, line 1, column c: "d"
+NOTICE:  column "b" was set to null due to data type incompatibility at line 2
+CONTEXT:  COPY t_on_error_null, line 2, column b: "b"
+NOTICE:  column "c" was set to null due to data type incompatibility at line 3
+CONTEXT:  COPY t_on_error_null, line 3, column c: "e"
+NOTICE:  invalid values in 3 rows were replaced with null
+-- check inserted content
+select * from t_on_error_null;
+ a  |  b   |  c   
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL |   12
+ 13 |   14 | NULL
+(3 rows)
+
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..003a91648e2 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
 COPY x from stdin (convert_selectively (a), convert_selectively (b));
 COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
 COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error ignore);
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 
 -- incorrect options
 COPY x from stdin (format BINARY, delimiter ',');
 COPY x from stdin (format BINARY, null 'x');
 COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_to_null);
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
 COPY x from stdin (on_error unsupported);
 COPY x from stdin (format TEXT, force_quote(a));
 COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
 COPY x to stdout (format CSV, force_null(a));
 COPY x to stdout (format CSV, force_null *);
 COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdin (on_error set_to_null);
 COPY x from stdin (log_verbosity unsupported);
 COPY x from stdin with (reject_limit 1);
 COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a	{2}	2
 8	{8}	8
 \.
 
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+\N	11	13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+a	11	14
+\.
+
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1	11	13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+10	a	d
+11	b	12
+13	14	e
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
-- 
2.34.1




* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2025-03-21 06:34  vignesh C <[email protected]>
  parent: jian he <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: vignesh C @ 2025-03-21 06:34 UTC (permalink / raw)
  To: jian he <[email protected]>; +Cc: Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]>

On Tue, 18 Mar 2025 at 09:26, jian he <[email protected]> wrote:
>
> changed based on this.
>
> also minor documentation tweaks.

Few comments:
1) I felt this is wrong:
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 9a4d993e2bc..7980513a9bd 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3280,7 +3280,7 @@ match_previous_words(int pattern_id,
                COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
                                          "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
                                          "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
-                                         "ON_ERROR", "LOG_VERBOSITY");
+                                         "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");

as the following fails:
postgres=# copy t_on_error_null from stdin WITH ( set_to_null );
ERROR:  option "set_to_null" not recognized
LINE 1: copy t_on_error_null from stdin WITH ( set_to_null );

2) Can you limit this to 80 chars if possible to improve the readability:
+      <literal>stop</literal> means fail the command,
+      <literal>ignore</literal> means discard the input row and continue with the next one, and
+      <literal>set_to_null</literal> means replace columns containing invalid input values with
+      <literal>NULL</literal> and move to the next field.

3) similarly here too:
+      For <literal>ignore</literal> option,
+      a <literal>NOTICE</literal> message containing the ignored row count is
+      emitted at the end of the <command>COPY FROM</command> if at least one row was discarded.
+      For <literal>set_to_null</literal> option,
+      a <literal>NOTICE</literal> message indicating the number of rows where invalid input values were replaced with
+      null is emitted at the end of the <command>COPY FROM</command> if at least one row was replaced.

4) Could you mention a brief one line in the commit message as to why
"on_error null" cannot be used:
Extend "on_error action", introduce new option:  on_error set_to_null.
Current grammar makes us unable to use "on_error null", so we choose
"on_error set_to_null".

Regards,
Vignesh






* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2025-03-24 07:50  jian he <[email protected]>
  parent: vignesh C <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: jian he @ 2025-03-24 07:50 UTC (permalink / raw)
  To: vignesh C <[email protected]>; +Cc: Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]>

On Fri, Mar 21, 2025 at 2:34 PM vignesh C <[email protected]> wrote:
>
> Few comments:
> 1) I felt this is wrong:
> diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
> index 9a4d993e2bc..7980513a9bd 100644
> --- a/src/bin/psql/tab-complete.in.c
> +++ b/src/bin/psql/tab-complete.in.c
> @@ -3280,7 +3280,7 @@ match_previous_words(int pattern_id,
>                 COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
>                                           "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
>                                           "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
> -                                         "ON_ERROR", "LOG_VERBOSITY");
> +                                         "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");
>
> as the following fails:
> postgres=# copy t_on_error_null from stdin WITH ( set_to_null );
> ERROR:  option "set_to_null" not recognized
> LINE 1: copy t_on_error_null from stdin WITH ( set_to_null );
>

- COMPLETE_WITH("stop", "ignore");
+ COMPLETE_WITH("stop", "ignore", "set_to_null");
yech. I think I fixed this.
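
To illustrate why the completion belongs in the ON_ERROR value list rather than the top-level option list: set_to_null is a value of ON_ERROR, matched case-insensitively the same way defGetCopyOnErrorChoice() does in the patch. Below is a minimal standalone sketch of that lookup, not the backend code. It assumes POSIX strcasecmp() in place of pg_strcasecmp(), and the names OnErrorChoice, ON_ERROR_INVALID and parse_on_error are made up for illustration (the real function raises an error via ereport instead of returning a sentinel).

```c
#include <stddef.h>
#include <strings.h>			/* strcasecmp (POSIX), stand-in for pg_strcasecmp */

/* Illustrative copy of the CopyOnErrorChoice values used in the patch. */
typedef enum
{
	ON_ERROR_INVALID = -1,		/* hypothetical sentinel; the backend uses ereport(ERROR) */
	ON_ERROR_STOP = 0,			/* fail the command, default */
	ON_ERROR_IGNORE,			/* discard the input row */
	ON_ERROR_NULL				/* replace the failing field with NULL */
} OnErrorChoice;

/*
 * Map the string given as the ON_ERROR value to a choice.
 * The comparison ignores case, so "SET_TO_NULL" and "set_to_null" match.
 */
OnErrorChoice
parse_on_error(const char *sval)
{
	if (sval == NULL)
		return ON_ERROR_INVALID;
	if (strcasecmp(sval, "stop") == 0)
		return ON_ERROR_STOP;
	if (strcasecmp(sval, "ignore") == 0)
		return ON_ERROR_IGNORE;
	if (strcasecmp(sval, "set_to_null") == 0)
		return ON_ERROR_NULL;
	return ON_ERROR_INVALID;	/* e.g. "unsupported" */
}
```

With this shape, "set_to_null" only has meaning after ON_ERROR, which is why offering SET_TO_NULL as a top-level COPY option leads to the "not recognized" error shown above.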

> 2) Can you limit this to 80 chars if possible to improve the readability:
> +      <literal>stop</literal> means fail the command,
> +      <literal>ignore</literal> means discard the input row and continue with the next one, and
> +      <literal>set_to_null</literal> means replace columns containing invalid input values with
> +      <literal>NULL</literal> and move to the next field.
>
> 3) similarly here too:
> +      For <literal>ignore</literal> option,
> +      a <literal>NOTICE</literal> message containing the ignored row count is
> +      emitted at the end of the <command>COPY FROM</command> if at least one row was discarded.
> +      For <literal>set_to_null</literal> option,
> +      a <literal>NOTICE</literal> message indicating the number of rows where invalid input values were replaced with
> +      null is emitted at the end of the <command>COPY FROM</command> if at least one row was replaced.
>
sure.
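
Since the documentation wording for set_to_null talks about the number of rows, not fields, here is a small standalone sketch of that counting rule. It is only an illustration under assumptions: parses_as_int() is a hypothetical stand-in for the integer input function (domain CHECK constraints are not modeled), and NullStats and process_row() are made-up names. The idea matches the current_row_erroneous flag in the patch: a row is counted once no matter how many of its fields were replaced.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an int input function: true if s is a base-10 integer. */
bool
parses_as_int(const char *s)
{
	char	   *end;

	if (s == NULL || *s == '\0')
		return false;
	errno = 0;
	(void) strtol(s, &end, 10);
	return errno == 0 && *end == '\0';
}

typedef struct
{
	size_t		fields_nulled;	/* every field replaced with NULL */
	size_t		rows_affected;	/* rows with at least one replaced field */
} NullStats;

/* Process one input row, updating both counters. */
void
process_row(const char *const *fields, size_t nfields, NullStats *stats)
{
	bool		row_erroneous = false;

	assert(fields != NULL && stats != NULL);

	for (size_t i = 0; i < nfields; i++)
	{
		if (!parses_as_int(fields[i]))
		{
			stats->fields_nulled++;
			row_erroneous = true;	/* remember, but count the row only once */
		}
	}

	if (row_erroneous)
		stats->rows_affected++;
}
```

Feeding it the three rows from the regression test (10/a/d, 11/b/12, 13/14/e) replaces four fields but touches three rows, which is the number reported by the "invalid values in 3 rows were replaced with null" NOTICE.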

> 4) Could you mention a brief one line in the commit message as to why
> "on_error null" cannot be used:
> Extend "on_error action", introduce new option:  on_error set_to_null.
> Current grammar makes us unable to use "on_error null", so we choose
> "on_error set_to_null".

With the following change, we could support (on_error null) instead.
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3579,6 +3579,7 @@ copy_generic_opt_elem:

 copy_generic_opt_arg:
                        opt_boolean_or_string           { $$ = (Node *) makeString($1); }
+                       | NULL_P                        { $$ = (Node *) makeString("null"); }
                        | NumericOnly                   { $$ = (Node *) $1; }
                        | '*'                           { $$ = (Node *) makeNode(A_Star); }
                        | DEFAULT                       { $$ = (Node *) makeString("default"); }

COPY x from stdin (format null);
ERROR:  syntax error at or near "null"
LINE 1: COPY x from stdin (format null);
                                  ^
will become

COPY x from stdin (format null);
ERROR:  COPY format "null" not recognized
LINE 1: COPY x from stdin (format null);
                           ^

This effectively changes NULL_P from a reserved word to a
non-reserved word within the option values of COPY commands.


I am not sure this is what we want.
Anyway, I attached both versions:
(ON_ERROR SET_TO_NULL) (ON_ERROR NULL).


Attachments:

  [text/x-patch] v14-0001-COPY-on_error-set_to_null.patch (19.9K, 2-v14-0001-COPY-on_error-set_to_null.patch)
From 7a6b7edf0877bd9c5bb2629888d3d9c29618ca5d Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Mon, 24 Mar 2025 15:14:17 +0800
Subject: [PATCH v14 1/1] COPY (on_error set_to_null)

Extend "on_error action" with a new option: on_error set_to_null.

The current grammar does not allow "on_error null". Supporting it would turn
null from a reserved word into a non-reserved word in all COPY option values,
so we choose "on_error set_to_null" instead.

Any data type conversion errors during the COPY FROM process will result in the
affected column being set to NULL. This only applies when using the non-binary
format for COPY FROM.

However, the not-null constraint will still be enforced.
If a column has a not-null constraint, a successful (on_error set_to_null)
action will cause a not-null constraint violation.
This also applies when the column type is a domain with a not-null constraint.

A regression test for a domain with a not-null constraint has been added.

Author: Jian He <[email protected]>
Author: Kirill Reshke <[email protected]>

Reviewed-by:
Fujii Masao <[email protected]>
Jim Jones <[email protected]>
"David G. Johnston" <[email protected]>
Yugo NAGATA <[email protected]>
torikoshia <[email protected]>

discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
 doc/src/sgml/ref/copy.sgml           | 36 ++++++++++++-----
 src/backend/commands/copy.c          |  6 ++-
 src/backend/commands/copyfrom.c      | 29 +++++++++-----
 src/backend/commands/copyfromparse.c | 43 +++++++++++++++++++-
 src/bin/psql/tab-complete.in.c       |  2 +-
 src/include/commands/copy.h          |  1 +
 src/test/regress/expected/copy2.out  | 60 ++++++++++++++++++++++++++++
 src/test/regress/sql/copy2.sql       | 46 +++++++++++++++++++++
 8 files changed, 198 insertions(+), 25 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..1909c11edff 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,23 +394,36 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies how to behave when encountering an error converting a column's
       input value into its data type.
       An <replaceable class="parameter">error_action</replaceable> value of
-      <literal>stop</literal> means fail the command, while
-      <literal>ignore</literal> means discard the input row and continue with the next one.
+      <literal>stop</literal> means fail the command,
+      <literal>ignore</literal> means discard the input row and continue with the next one,
+      and <literal>set_to_null</literal> means replace columns containing invalid
+      input values with <literal>NULL</literal> and move to the next field.
       The default is <literal>stop</literal>.
      </para>
      <para>
-      The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+      The <literal>ignore</literal> and <literal>set_to_null</literal>
+      options are applicable only for <command>COPY FROM</command>
       when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
      </para>
+    <para>
+      For <literal>ignore</literal> option, a <literal>NOTICE</literal> message
+      containing the ignored row count is emitted at the end of the <command>COPY
+      FROM</command> if at least one row was discarded.
+      For <literal>set_to_null</literal> option,
+      a <literal>NOTICE</literal> message indicating the number of rows 
+      where invalid input values were replaced with null is emitted
+      at the end of the <command>COPY FROM</command> if at least one row was replaced.
+     </para>
      <para>
-      A <literal>NOTICE</literal> message containing the ignored row count is
-      emitted at the end of the <command>COPY FROM</command> if at least one
-      row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
-      <literal>verbose</literal>, a <literal>NOTICE</literal> message
+      When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+      for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
       containing the line of the input file and the column name whose input
-      conversion has failed is emitted for each discarded row.
-      When it is set to <literal>silent</literal>, no message is emitted
-      regarding ignored rows.
+      conversion has failed is emitted for each discarded row;
+      for <literal>set_to_null</literal> option, a <literal>NOTICE</literal>
+      message containing the line of the input file and the column name where
+      value was replaced with <literal>NULL</literal> for each input conversion
+      failure.
+      When it is set to <literal>silent</literal>, no message is emitted regarding input conversion failed rows.
      </para>
     </listitem>
    </varlistentry>
@@ -458,7 +471,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      </para>
      <para>
       This is currently used in <command>COPY FROM</command> command when
-      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+      or <literal>set_to_null</literal>.
       </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc29..afe60758d40 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 				 parser_errposition(pstate, def->location)));
 
 	/*
-	 * Allow "stop", or "ignore" values.
+	 * Allow "stop", "ignore", "set_to_null" values.
 	 */
 	if (pg_strcasecmp(sval, "stop") == 0)
 		return COPY_ON_ERROR_STOP;
 	if (pg_strcasecmp(sval, "ignore") == 0)
 		return COPY_ON_ERROR_IGNORE;
+	if (pg_strcasecmp(sval, "set_to_null") == 0)
+		return COPY_ON_ERROR_NULL;
 
 	ereport(ERROR,
 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
 
-	if (opts_out->reject_limit && !opts_out->on_error)
+	if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index bcf66f0adf8..43a227eae72 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate)
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
-	if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
-		cstate->num_errors > 0 &&
+	if (cstate->num_errors > 0 &&
 		cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
-		ereport(NOTICE,
-				errmsg_plural("%llu row was skipped due to data type incompatibility",
-							  "%llu rows were skipped due to data type incompatibility",
-							  (unsigned long long) cstate->num_errors,
-							  (unsigned long long) cstate->num_errors));
+	{
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+			ereport(NOTICE,
+					errmsg_plural("%llu row was skipped due to data type incompatibility",
+								  "%llu rows were skipped due to data type incompatibility",
+								  (unsigned long long) cstate->num_errors,
+								  (unsigned long long) cstate->num_errors));
+		else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+			ereport(NOTICE,
+					errmsg_plural("invalid values in %llu row was replaced with null",
+								  "invalid values in %llu rows were replaced with null",
+								  (unsigned long long) cstate->num_errors,
+								  (unsigned long long) cstate->num_errors));
+	}
 
 	if (bistate != NULL)
 		FreeBulkInsertState(bistate);
@@ -1622,10 +1630,11 @@ BeginCopyFrom(ParseState *pstate,
 		cstate->escontext->error_occurred = false;
 
 		/*
-		 * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
-		 * options later
+		 * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+		 * We'll add other options later
 		 */
-		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+			cstate->opts.on_error == COPY_ON_ERROR_NULL)
 			cstate->escontext->details_wanted = false;
 	}
 	else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index e8128f85e6b..e2b4d1f7ec9 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -947,6 +947,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 	int			fldct;
 	int			fieldno;
 	char	   *string;
+	bool		current_row_erroneous = false;
 
 	tupDesc = RelationGetDescr(cstate->rel);
 	attr_count = list_length(cstate->attnumlist);
@@ -1025,6 +1026,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 
 		/*
 		 * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+		 * If ON_ERROR is specified with set_to_null, try to replace with null.
 		 */
 		else if (!InputFunctionCallSafe(&in_functions[m],
 										string,
@@ -1035,9 +1037,46 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 		{
 			Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
 
+			if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+			{
+				/*
+				 * we use this count the number of rows (not fields) that
+				 * successfully applied the on_error set_to_null
+				*/
+				if (!current_row_erroneous)
+					current_row_erroneous = true;
+
+				/*
+				 * we need another InputFunctionCallSafe so we can error out
+				 * not-null violation for domain with not-null constraint.
+				*/
+				cstate->escontext->error_occurred = false;
+				if (InputFunctionCallSafe(&in_functions[m],
+										  NULL,
+										  typioparams[m],
+										  att->atttypmod,
+										  (Node *) cstate->escontext,
+										  &values[m]))
+				{
+					nulls[m] = true;
+					values[m] = (Datum) 0;
+					if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+						ereport(NOTICE,
+								errmsg("column \"%s\" was set to null due to data type incompatibility at line %llu",
+										cstate->cur_attname,
+										(unsigned long long) cstate->cur_lineno));
+					continue;
+				}
+				else
+					ereport(ERROR,
+							errcode(ERRCODE_NOT_NULL_VIOLATION),
+							errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+							errdatatype(typioparams[m]));
+			}
 			cstate->num_errors++;
 
-			if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+			if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE &&
+				cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
 			{
 				/*
 				 * Since we emit line number and column info in the below
@@ -1076,6 +1115,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 		cstate->cur_attval = NULL;
 	}
 
+	if (current_row_erroneous)
+		cstate->num_errors++;
 	Assert(fieldno == attr_count);
 
 	return true;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 98951aef82c..c79b3af0495 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3291,7 +3291,7 @@ match_previous_words(int pattern_id,
 
 	/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
-		COMPLETE_WITH("stop", "ignore");
+		COMPLETE_WITH("stop", "ignore", "set_to_null");
 
 	/* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "LOG_VERBOSITY"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..7ebf4f78933 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
 {
 	COPY_ON_ERROR_STOP = 0,		/* immediately throw errors, default */
 	COPY_ON_ERROR_IGNORE,		/* ignore errors */
+	COPY_ON_ERROR_NULL,			/* set error field to null */
 } CopyOnErrorChoice;
 
 /*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..caa94bfd526 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
                                             ^
+COPY x from stdin (on_error set_to_null, on_error ignore);
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_to_null, on_error ignore);
+                                                 ^
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
 ERROR:  cannot specify NULL in BINARY mode
 COPY x from stdin (format BINARY, on_error ignore);
 ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_to_null);
+ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
+ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
 COPY x from stdin (on_error unsupported);
 ERROR:  COPY ON_ERROR "unsupported" not recognized
 LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
 ERROR:  COPY ON_ERROR cannot be used with COPY TO
 LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
                                          ^
+COPY x to stdin (on_error set_to_null);
+ERROR:  COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdin (on_error set_to_null);
+                         ^
 COPY x from stdin (log_verbosity unsupported);
 ERROR:  COPY LOG_VERBOSITY "unsupported" not recognized
 LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,51 @@ CONTEXT:  COPY check_ign_err
 NOTICE:  skipping row due to data type incompatibility at line 8 for column "k": "a"
 CONTEXT:  COPY check_ign_err
 NOTICE:  6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: "a"
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR:  missing data for column "c"
+CONTEXT:  COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR:  extra data after last expected column
+CONTEXT:  COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+NOTICE:  column "b" was set to null due to data type incompatibility at line 1
+CONTEXT:  COPY t_on_error_null, line 1, column b: "a"
+NOTICE:  column "c" was set to null due to data type incompatibility at line 1
+CONTEXT:  COPY t_on_error_null, line 1, column c: "d"
+NOTICE:  column "b" was set to null due to data type incompatibility at line 2
+CONTEXT:  COPY t_on_error_null, line 2, column b: "b"
+NOTICE:  column "c" was set to null due to data type incompatibility at line 3
+CONTEXT:  COPY t_on_error_null, line 3, column c: "e"
+NOTICE:  invalid values in 3 rows were replaced with null
+-- check inserted content
+select * from t_on_error_null;
+ a  |  b   |  c   
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL |   12
+ 13 |   14 | NULL
+(3 rows)
+
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..003a91648e2 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
 COPY x from stdin (convert_selectively (a), convert_selectively (b));
 COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
 COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error ignore);
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 
 -- incorrect options
 COPY x from stdin (format BINARY, delimiter ',');
 COPY x from stdin (format BINARY, null 'x');
 COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_to_null);
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
 COPY x from stdin (on_error unsupported);
 COPY x from stdin (format TEXT, force_quote(a));
 COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
 COPY x to stdout (format CSV, force_null(a));
 COPY x to stdout (format CSV, force_null *);
 COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdin (on_error set_to_null);
 COPY x from stdin (log_verbosity unsupported);
 COPY x from stdin with (reject_limit 1);
 COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a	{2}	2
 8	{8}	8
 \.
 
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+\N	11	13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+a	11	14
+\.
+
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1	11	13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+10	a	d
+11	b	12
+13	14	e
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
-- 
2.34.1



  [application/octet-stream] v14-0001-COPY-on_error-null.nocfbot (20.0K, 3-v14-0001-COPY-on_error-null.nocfbot)


* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2025-03-25 06:31  vignesh C <[email protected]>
  parent: jian he <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: vignesh C @ 2025-03-25 06:31 UTC (permalink / raw)
  To: jian he <[email protected]>; +Cc: Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]>

On Mon, 24 Mar 2025 at 13:21, jian he <[email protected]> wrote:
>
> I am not sure this is what we want.
> Anyway, I attached both two version

A few comments:
1) I understand the problem; your first approach is OK with me.

2) Here the error message says that column c1 violates the not-null
constraint, but the context shows column c2. Should the context display
column c2 in this case?
postgres=# create table t3(c1 int not null, c2 int, check (c1 > 10));
CREATE TABLE
postgres=# COPY t3 FROM STDIN WITH (on_error set_to_null);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> a  b
>> \.
ERROR:  null value in column "c1" of relation "t3" violates not-null constraint
DETAIL:  Failing row contains (null, null).
CONTEXT:  COPY t3, line 1, column c2: "b"

3) Typo: "becomen" should be "become":
null will becomen reserved to non-reserved

4) There is a whitespace error when applying the patch:
Applying: COPY (on_error set_to_null)
.git/rebase-apply/patch:39: trailing whitespace.
      a <literal>NOTICE</literal> message indicating the number of rows
warning: 1 line adds whitespace errors.

Regards,
Vignesh






* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2025-04-04 11:55  jian he <[email protected]>
  parent: vignesh C <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: jian he @ 2025-04-04 11:55 UTC (permalink / raw)
  To: vignesh C <[email protected]>; +Cc: Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]>

On Tue, Mar 25, 2025 at 2:31 PM vignesh C <[email protected]> wrote:
>
> 2) Here in error we say column c1 violates not-null constraint and in
> the context we show column c2, should the context also display c2
> column:
> postgres=# create table t3(c1 int not null, c2 int, check (c1 > 10));
> CREATE TABLE
> postgres=# COPY t3 FROM STDIN WITH (on_error set_to_null);
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself, or an EOF signal.
> >> a  b
> >> \.
> ERROR:  null value in column "c1" of relation "t3" violates not-null constraint
> DETAIL:  Failing row contains (null, null).
> CONTEXT:  COPY t3, line 1, column c2: "b"
>

It took me a while to figure out why.
With the attached patch, the error message now becomes:

ERROR:  null value in column "c1" of relation "t3" violates not-null constraint
DETAIL:  Failing row contains (null, null).
CONTEXT:  COPY t3, line 1: "a,b"

While at it: with (on_error set_to_null, log_verbosity verbose),
the message CONTEXT now shows only the relation name,
which aligns with (on_error ignore, log_verbosity verbose).

Here is an example of the message output:
+NOTICE:  column "b" was set to null due to data type incompatibility at line 2
+CONTEXT:  COPY t_on_error_null
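
The difference between the CONTEXT variants discussed above can be sketched
with a small self-contained C model. This is only an illustration: ToyCopyState
and toy_format_context are hypothetical names, and the branch order is a
simplified assumption about how the CONTEXT line gets chosen, not the actual
error context callback in copyfrom.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, stripped-down stand-in for the COPY FROM state. */
typedef struct ToyCopyState
{
    const char *relname;        /* target relation name */
    unsigned long long lineno;  /* current input line number */
    const char *attname;        /* column being converted, or NULL */
    const char *attval;         /* raw text of that field, or NULL */
    const char *line;           /* raw text of the whole input line */
    bool        relname_only;   /* emit only the relation name */
} ToyCopyState;

/* Write the CONTEXT text for the current state into buf. */
static int
toy_format_context(const ToyCopyState *st, char *buf, size_t buflen)
{
    assert(st != NULL && buf != NULL);

    /* The NOTICE already carries line and column, so keep CONTEXT short. */
    if (st->relname_only)
        return snprintf(buf, buflen, "COPY %s", st->relname);

    /* A column is "current": report it together with its raw value. */
    if (st->attname != NULL && st->attval != NULL)
        return snprintf(buf, buflen, "COPY %s, line %llu, column %s: \"%s\"",
                        st->relname, st->lineno, st->attname, st->attval);

    /* No current column: report the whole input line instead. */
    return snprintf(buf, buflen, "COPY %s, line %llu: \"%s\"",
                    st->relname, st->lineno, st->line);
}
```

In this model, leaving attname set to the last converted column gives the
misleading "column c2" context from the report, clearing it gives the
whole-line form, and setting relname_only gives the short form shown with the
NOTICE.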



> 3) typo becomen should be become:
> null will becomen reserved to non-reserved
fixed.

> 4) There is a whitespace error while applying patch
> Applying: COPY (on_error set_to_null)
> .git/rebase-apply/patch:39: trailing whitespace.
>       a <literal>NOTICE</literal> message indicating the number of rows
> warning: 1 line adds whitespace errors.
fixed.


Attachments:

  [text/x-patch] v15-0001-COPY-on_error-set_to_null.patch (20.2K, 2-v15-0001-COPY-on_error-set_to_null.patch)
  download | inline diff:
From cfd9afbc583aac39f73f224cb70c9196398c3176 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Fri, 4 Apr 2025 19:43:52 +0800
Subject: [PATCH v15 1/1] COPY (on_error set_to_null)

Extend the "on_error" action with a new option value: on_error set_to_null.

The current grammar does not let us use "on_error null": supporting it would
require changing null from a reserved to a non-reserved word in all COPY
option values.  So we chose "on_error set_to_null" instead.

Any data type conversion error during COPY FROM results in the affected
column being set to NULL.  This applies only when using a non-binary
format for COPY FROM.

However, not-null constraints are still enforced.
If a column has a not-null constraint, a successful (on_error set_to_null)
action causes a not-null constraint violation.
This also applies when the column type is a domain with a not-null constraint.

A regression test for a domain with a not-null constraint has been added.

Author: Jian He <[email protected]>
Author: Kirill Reshke <[email protected]>

Reviewed-by:
Fujii Masao <[email protected]>
Jim Jones <[email protected]>
"David G. Johnston" <[email protected]>
Yugo NAGATA <[email protected]>
torikoshia <[email protected]>

discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
 doc/src/sgml/ref/copy.sgml           | 36 +++++++++++-----
 src/backend/commands/copy.c          |  6 ++-
 src/backend/commands/copyfrom.c      | 29 ++++++++-----
 src/backend/commands/copyfromparse.c | 61 +++++++++++++++++++++++++++-
 src/bin/psql/tab-complete.in.c       |  2 +-
 src/include/commands/copy.h          |  1 +
 src/test/regress/expected/copy2.out  | 60 +++++++++++++++++++++++++++
 src/test/regress/sql/copy2.sql       | 46 +++++++++++++++++++++
 8 files changed, 215 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..ebe2eaa36e2 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,23 +394,36 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies how to behave when encountering an error converting a column's
       input value into its data type.
       An <replaceable class="parameter">error_action</replaceable> value of
-      <literal>stop</literal> means fail the command, while
-      <literal>ignore</literal> means discard the input row and continue with the next one.
+      <literal>stop</literal> means fail the command,
+      <literal>ignore</literal> means discard the input row and continue with the next one,
+      and <literal>set_to_null</literal> means replace columns containing invalid
+      input values with <literal>NULL</literal> and move to the next field.
       The default is <literal>stop</literal>.
      </para>
      <para>
-      The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+      The <literal>ignore</literal> and <literal>set_to_null</literal>
+      options are applicable only for <command>COPY FROM</command>
       when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
      </para>
+    <para>
+      For <literal>ignore</literal> option, a <literal>NOTICE</literal> message
+      containing the ignored row count is emitted at the end of the <command>COPY
+      FROM</command> if at least one row was discarded.
+      For <literal>set_to_null</literal> option,
+      a <literal>NOTICE</literal> message indicating the number of rows
+      where invalid input values were replaced with null is emitted
+      at the end of the <command>COPY FROM</command> if at least one row was replaced.
+     </para>
      <para>
-      A <literal>NOTICE</literal> message containing the ignored row count is
-      emitted at the end of the <command>COPY FROM</command> if at least one
-      row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
-      <literal>verbose</literal>, a <literal>NOTICE</literal> message
+      When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+      for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
       containing the line of the input file and the column name whose input
-      conversion has failed is emitted for each discarded row.
-      When it is set to <literal>silent</literal>, no message is emitted
-      regarding ignored rows.
+      conversion has failed is emitted for each discarded row;
+      for <literal>set_to_null</literal> option, a <literal>NOTICE</literal>
+      message containing the line of the input file and the column name whose
+      value was replaced with <literal>NULL</literal> is emitted for each input
+      conversion failure.
+      When it is set to <literal>silent</literal>, no message is emitted regarding rows with input conversion failures.
      </para>
     </listitem>
    </varlistentry>
@@ -458,7 +471,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      </para>
      <para>
       This is currently used in <command>COPY FROM</command> command when
-      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+      or <literal>set_to_null</literal>.
       </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 74ae42b19a7..13bbe58855c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 				 parser_errposition(pstate, def->location)));
 
 	/*
-	 * Allow "stop", or "ignore" values.
+	 * Allow "stop", "ignore", "set_to_null" values.
 	 */
 	if (pg_strcasecmp(sval, "stop") == 0)
 		return COPY_ON_ERROR_STOP;
 	if (pg_strcasecmp(sval, "ignore") == 0)
 		return COPY_ON_ERROR_IGNORE;
+	if (pg_strcasecmp(sval, "set_to_null") == 0)
+		return COPY_ON_ERROR_NULL;
 
 	ereport(ERROR,
 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
 
-	if (opts_out->reject_limit && !opts_out->on_error)
+	if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fbbbc09a97b..a3143ca4f29 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate)
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
-	if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
-		cstate->num_errors > 0 &&
+	if (cstate->num_errors > 0 &&
 		cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
-		ereport(NOTICE,
-				errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
-							  "%" PRIu64 " rows were skipped due to data type incompatibility",
-							  cstate->num_errors,
-							  cstate->num_errors));
+	{
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+			ereport(NOTICE,
+					errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
+								  "%" PRIu64 " rows were skipped due to data type incompatibility",
+								  cstate->num_errors,
+								  cstate->num_errors));
+		else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+			ereport(NOTICE,
+					errmsg_plural("invalid values in %" PRIu64 " row was replaced with null",
+								  "invalid values in %" PRIu64 " rows were replaced with null",
+								  cstate->num_errors,
+								  cstate->num_errors));
+	}
 
 	if (bistate != NULL)
 		FreeBulkInsertState(bistate);
@@ -1622,10 +1630,11 @@ BeginCopyFrom(ParseState *pstate,
 		cstate->escontext->error_occurred = false;
 
 		/*
-		 * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
-		 * options later
+		 * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+		 * We'll add other options later
 		 */
-		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+			cstate->opts.on_error == COPY_ON_ERROR_NULL)
 			cstate->escontext->details_wanted = false;
 	}
 	else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5fc346e201..63a4400c8a2 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -947,6 +947,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 	int			fldct;
 	int			fieldno;
 	char	   *string;
+	bool		current_row_erroneous = false;
 
 	tupDesc = RelationGetDescr(cstate->rel);
 	attr_count = list_length(cstate->attnumlist);
@@ -1024,7 +1025,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 		}
 
 		/*
-		 * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+		 * If ON_ERROR is specified with IGNORE, skip rows with soft errors.
+		 * If ON_ERROR is specified with set_to_null, try to replace with null.
 		 */
 		else if (!InputFunctionCallSafe(&in_functions[m],
 										string,
@@ -1035,9 +1037,62 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 		{
 			Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
 
+			if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+			{
+				/*
+				 * we use it to count number of rows (not fields!) that
+				 * successfully applied on_error set_to_null.
+				*/
+				if (!current_row_erroneous)
+					current_row_erroneous = true;
+
+				/*
+				 * when column type is domain with not-null constraint, we need
+				 * another InputFunctionCallSafe to error out not-null
+				 * violation.
+				*/
+				cstate->escontext->error_occurred = false;
+				if (InputFunctionCallSafe(&in_functions[m],
+										  NULL,
+										  typioparams[m],
+										  att->atttypmod,
+										  (Node *) cstate->escontext,
+										  &values[m]))
+				{
+					nulls[m] = true;
+					values[m] = (Datum) 0;
+					if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+					{
+						/*
+						 * Since we emit line number and column info in the below
+						 * notice message, we suppress error context information other
+						 * than the relation name.
+						*/
+						Assert(!cstate->relname_only);
+						cstate->relname_only = true;
+						ereport(NOTICE,
+								errmsg("column \"%s\" was set to null due to data type incompatibility at line %" PRIu64 "",
+										cstate->cur_attname,
+										cstate->cur_lineno));
+
+						/* reset relname_only */
+						cstate->relname_only = false;
+					}
+
+					cstate->cur_attname = NULL;
+
+					continue;
+				}
+				else
+					ereport(ERROR,
+							errcode(ERRCODE_NOT_NULL_VIOLATION),
+							errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+							errdatatype(typioparams[m]));
+			}
 			cstate->num_errors++;
 
-			if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+			if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE &&
+				cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
 			{
 				/*
 				 * Since we emit line number and column info in the below
@@ -1076,6 +1131,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 		cstate->cur_attval = NULL;
 	}
 
+	if (current_row_erroneous)
+		cstate->num_errors++;
 	Assert(fieldno == attr_count);
 
 	return true;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 98951aef82c..c79b3af0495 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3291,7 +3291,7 @@ match_previous_words(int pattern_id,
 
 	/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
-		COMPLETE_WITH("stop", "ignore");
+		COMPLETE_WITH("stop", "ignore", "set_to_null");
 
 	/* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "LOG_VERBOSITY"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..7ebf4f78933 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
 {
 	COPY_ON_ERROR_STOP = 0,		/* immediately throw errors, default */
 	COPY_ON_ERROR_IGNORE,		/* ignore errors */
+	COPY_ON_ERROR_NULL,			/* set error field to null */
 } CopyOnErrorChoice;
 
 /*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..879a898911a 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
                                             ^
+COPY x from stdin (on_error set_to_null, on_error ignore);
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_to_null, on_error ignore);
+                                                 ^
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
 ERROR:  cannot specify NULL in BINARY mode
 COPY x from stdin (format BINARY, on_error ignore);
 ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_to_null);
+ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (on_error set_to_null, reject_limit 2);
+ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
 COPY x from stdin (on_error unsupported);
 ERROR:  COPY ON_ERROR "unsupported" not recognized
 LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
 ERROR:  COPY ON_ERROR cannot be used with COPY TO
 LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
                                          ^
+COPY x to stdout (on_error set_to_null);
+ERROR:  COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdout (on_error set_to_null);
+                          ^
 COPY x from stdin (log_verbosity unsupported);
 ERROR:  COPY LOG_VERBOSITY "unsupported" not recognized
 LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,51 @@ CONTEXT:  COPY check_ign_err
 NOTICE:  skipping row due to data type incompatibility at line 8 for column "k": "a"
 CONTEXT:  COPY check_ign_err
 NOTICE:  6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: "a"
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR:  missing data for column "c"
+CONTEXT:  COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR:  extra data after last expected column
+CONTEXT:  COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+NOTICE:  column "b" was set to null due to data type incompatibility at line 1
+CONTEXT:  COPY t_on_error_null
+NOTICE:  column "c" was set to null due to data type incompatibility at line 1
+CONTEXT:  COPY t_on_error_null
+NOTICE:  column "b" was set to null due to data type incompatibility at line 2
+CONTEXT:  COPY t_on_error_null
+NOTICE:  column "c" was set to null due to data type incompatibility at line 3
+CONTEXT:  COPY t_on_error_null
+NOTICE:  invalid values in 3 rows were replaced with null
+-- check inserted content
+select * from t_on_error_null;
+ a  |  b   |  c   
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL |   12
+ 13 |   14 | NULL
+(3 rows)
+
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..fbf80004178 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
 COPY x from stdin (convert_selectively (a), convert_selectively (b));
 COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
 COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error ignore);
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 
 -- incorrect options
 COPY x from stdin (format BINARY, delimiter ',');
 COPY x from stdin (format BINARY, null 'x');
 COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_to_null);
+COPY x from stdin (on_error set_to_null, reject_limit 2);
 COPY x from stdin (on_error unsupported);
 COPY x from stdin (format TEXT, force_quote(a));
 COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
 COPY x to stdout (format CSV, force_null(a));
 COPY x to stdout (format CSV, force_null *);
 COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdout (on_error set_to_null);
 COPY x from stdin (log_verbosity unsupported);
 COPY x from stdin with (reject_limit 1);
 COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a	{2}	2
 8	{8}	8
 \.
 
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+\N	11	13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+a	11	14
+\.
+
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1	11	13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+10	a	d
+11	b	12
+13	14	e
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
-- 
2.34.1




* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2025-04-04 21:32  Masahiko Sawada <[email protected]>
  parent: jian he <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: Masahiko Sawada @ 2025-04-04 21:32 UTC (permalink / raw)
  To: jian he <[email protected]>; +Cc: vignesh C <[email protected]>; Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]>

On Fri, Apr 4, 2025 at 4:55 AM jian he <[email protected]> wrote:
>
> On Tue, Mar 25, 2025 at 2:31 PM vignesh C <[email protected]> wrote:
> >
> > 2) Here in error we say column c1 violates not-null constraint and in
> > the context we show column c2, should the context also display c2
> > column:
> > postgres=# create table t3(c1 int not null, c2 int, check (c1 > 10));
> > CREATE TABLE
> > postgres=# COPY t3 FROM STDIN WITH (on_error set_to_null);
> > Enter data to be copied followed by a newline.
> > End with a backslash and a period on a line by itself, or an EOF signal.
> > >> a  b
> > >> \.
> > ERROR:  null value in column "c1" of relation "t3" violates not-null constraint
> > DETAIL:  Failing row contains (null, null).
> > CONTEXT:  COPY t3, line 1, column c2: "b"
> >
>
> It took me a while to figure out why.
> with the attached, now the error message becomes:
>
> ERROR:  null value in column "c1" of relation "t3" violates not-null constraint
> DETAIL:  Failing row contains (null, null).
> CONTEXT:  COPY t3, line 1: "a,b"
>
> while at it,
> (on_error set_to_null, log_verbosity verbose)
> error message CONTEXT will only emit out relation name,
> this aligns with (on_error ignore, log_verbosity verbose).
>
> one of the message out example:
> +NOTICE:  column "b" was set to null due to data type incompatibility at line 2
> +CONTEXT:  COPY t_on_error_null
>
>
>
> > 3) typo becomen should be become:
> > null will becomen reserved to non-reserved
> fixed.
>
> > 4) There is a whitespace error while applying patch
> > Applying: COPY (on_error set_to_null)
> > .git/rebase-apply/patch:39: trailing whitespace.
> >       a <literal>NOTICE</literal> message indicating the number of rows
> > warning: 1 line adds whitespace errors.
> fixed.

I've reviewed the v15 patch and here are some comments:

How about renaming the new option value to 'set_null'? The 'to' in the
value name seems redundant to me.

---
+        COPY_ON_ERROR_NULL,                    /* set error field to null */

I think it's better to rename it to COPY_ON_ERROR_SET_TO_NULL (or
COPY_ON_ERROR_SET_NULL if we change the option value name) for
consistency with the value name.

---
+                else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+                        ereport(NOTICE,
+                                        errmsg_plural("invalid values in %" PRIu64 " row was replaced with null",
+                                                      "invalid values in %" PRIu64 " rows were replaced with null",
+                                                      cstate->num_errors,
+                                                      cstate->num_errors));

How about adding "due to data type incompatibility" at the end of the message?
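
A rough sketch of what the summary text could look like with that suffix
appended follows. The helper name toy_set_to_null_summary is hypothetical, the
base wording is copied from the v15 patch, and the n == 1 test only mimics
English plural selection; it is not how errmsg_plural() chooses a form for
translated messages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the end-of-COPY summary for
 * on_error set_to_null, with the proposed suffix appended.
 */
static int
toy_set_to_null_summary(unsigned long long nrows, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    /* Singular form, wording taken from the v15 patch plus the suffix. */
    if (nrows == 1)
        return snprintf(buf, buflen,
                        "invalid values in %llu row was replaced with null "
                        "due to data type incompatibility", nrows);

    /* Plural form. */
    return snprintf(buf, buflen,
                    "invalid values in %llu rows were replaced with null "
                    "due to data type incompatibility", nrows);
}
```

With the suffix, the set_to_null summary ends the same way as the existing
"rows were skipped due to data type incompatibility" message for on_error
ignore.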

---
+                                    ereport(NOTICE,
+                                                    errmsg("column \"%s\" was set to null due to data type incompatibility at line %" PRIu64 "",
+                                                           cstate->cur_attname,
+                                                           cstate->cur_lineno));

Similar to the IGNORE case, we can show the data in question in the message.
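
For illustration only, a per-column message that also carries the offending
value might be built like this. The helper toy_set_to_null_notice and the exact
wording (the value appended after a colon, as the IGNORE message does) are
assumptions made for this sketch, not text from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the per-column NOTICE text, appending the
 * raw input value that failed conversion.
 */
static int
toy_set_to_null_notice(const char *attname, unsigned long long lineno,
                       const char *attval, char *buf, size_t buflen)
{
    assert(attname != NULL && attval != NULL && buf != NULL);

    return snprintf(buf, buflen,
                    "column \"%s\" was set to null due to data type "
                    "incompatibility at line %llu: \"%s\"",
                    attname, lineno, attval);
}
```

A real implementation would presumably also have to bound the length of the
printed value; that is left out here.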

---
+                    else
+                            ereport(ERROR,
+                                            errcode(ERRCODE_NOT_NULL_VIOLATION),
+                                            errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+                                            errdatatype(typioparams[m]));

If a domain data type is the only case that does not accept NULL, can we
check for it beforehand to avoid calling InputFunctionCallSafe() a second
time for non-domain data types? Also, if we want to end up with an error
when setting NULL to a domain type with NOT NULL, I think we don't need to
try to handle a soft error by passing escontext to InputFunctionCallSafe().
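
The control flow I have in mind can be sketched with a toy model. Everything
here is hypothetical: ToyColumn, its domain_not_null flag (assumed to be
computed once per column before the row loop), and toy_parse_int as a stand-in
for the type's input function. It is not PostgreSQL code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-column info, computed once before reading rows. */
typedef struct ToyColumn
{
    const char *name;
    bool        domain_not_null;    /* domain type with NOT NULL? */
} ToyColumn;

typedef enum ToyFieldResult
{
    TOY_FIELD_OK,           /* input converted normally */
    TOY_FIELD_SET_NULL,     /* conversion failed, value replaced by NULL */
    TOY_FIELD_HARD_ERROR    /* conversion failed and NULL is not allowed */
} ToyFieldResult;

/* Stand-in for a type input function: accepts only a complete integer. */
static bool
toy_parse_int(const char *text, long *out)
{
    char   *end;
    long    v;

    errno = 0;
    v = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return false;
    *out = v;
    return true;
}

/* Convert one field under on_error set_to_null semantics. */
static ToyFieldResult
toy_convert_field(const ToyColumn *col, const char *text,
                  long *value, bool *isnull)
{
    assert(col != NULL && text != NULL && value != NULL && isnull != NULL);

    if (toy_parse_int(text, value))
    {
        *isnull = false;
        return TOY_FIELD_OK;
    }

    /* Known up front that NULL is rejected: fail without a second call. */
    if (col->domain_not_null)
        return TOY_FIELD_HARD_ERROR;

    /* Otherwise NULL is acceptable; no second input call is needed. */
    *value = 0;
    *isnull = true;
    return TOY_FIELD_SET_NULL;
}
```

The point is that only columns flagged up front can produce the hard error, so
the common non-domain case does a single conversion attempt per field.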

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com







* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2025-04-05 08:31  jian he <[email protected]>
  parent: Masahiko Sawada <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: jian he @ 2025-04-05 08:31 UTC (permalink / raw)
  To: Masahiko Sawada <[email protected]>; +Cc: vignesh C <[email protected]>; Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]>

On Sat, Apr 5, 2025 at 5:33 AM Masahiko Sawada <[email protected]> wrote:
>
> On Fri, Apr 4, 2025 at 4:55 AM jian he <[email protected]> wrote:
> >
> > On Tue, Mar 25, 2025 at 2:31 PM vignesh C <[email protected]> wrote:
> > >
> > > 2) Here in error we say column c1 violates not-null constraint and in
> > > the context we show column c2, should the context also display c2
> > > column:
> > > postgres=# create table t3(c1 int not null, c2 int, check (c1 > 10));
> > > CREATE TABLE
> > > postgres=# COPY t3 FROM STDIN WITH (on_error set_to_null);
> > > Enter data to be copied followed by a newline.
> > > End with a backslash and a period on a line by itself, or an EOF signal.
> > > >> a  b
> > > >> \.
> > > ERROR:  null value in column "c1" of relation "t3" violates not-null constraint
> > > DETAIL:  Failing row contains (null, null).
> > > CONTEXT:  COPY t3, line 1, column c2: "b"
> > >
> >
> > It took me a while to figure out why.
> > with the attached, now the error message becomes:
> >
> > ERROR:  null value in column "c1" of relation "t3" violates not-null constraint
> > DETAIL:  Failing row contains (null, null).
> > CONTEXT:  COPY t3, line 1: "a,b"
> >
> > while at it,
> > (on_error set_to_null, log_verbosity verbose)
> > error message CONTEXT will only emit out relation name,
> > this aligns with (on_error ignore, log_verbosity verbose).
> >
> > one of the message out example:
> > +NOTICE:  column "b" was set to null due to data type incompatibility at line 2
> > +CONTEXT:  COPY t_on_error_null
> >
> >
> >
> > > 3) typo becomen should be become:
> > > null will becomen reserved to non-reserved
> > fixed.
> >
> > > 4) There is a whitespace error while applying patch
> > > Applying: COPY (on_error set_to_null)
> > > .git/rebase-apply/patch:39: trailing whitespace.
> > >       a <literal>NOTICE</literal> message indicating the number of rows
> > > warning: 1 line adds whitespace errors.
> > fixed.
>
> I've reviewed the v15 patch and here are some comments:
>
> How about renaming the new option value to 'set_null"? The 'to' in the
> value name seems redundant to me.
>
> ---
> +        COPY_ON_ERROR_NULL,                    /* set error field to null */
>
> I think it's better to rename COPY_ON_ERROR_SET_TO_NULL (or
> COPY_ON_ERROR_SET_NULL if we change the option value name) for
> consistency with the value name.
>
> ---
> +                else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
> +                        ereport(NOTICE,
> +                                        errmsg_plural("invalid values
> in %" PRIu64 " row was replaced with null",
> +
> "invalid values in %" PRIu64 " rows were replaced with null",
> +
> cstate->num_errors,
> +
> cstate->num_errors));
>
> How about adding "due to data type incompatibility" at the end of the message?
>
> ---
> +                                    ereport(NOTICE,
> +                                                    errmsg("column
> \"%s\" was set to null due to data type incompatibility at line %"
> PRIu64 "",
> +
> cstate->cur_attname,
> +
> cstate->cur_lineno));
>
> Similar to the IGNORE case, we can show the data in question in the message.
>
> ---
> +                    else
> +                            ereport(ERROR,
> +
> errcode(ERRCODE_NOT_NULL_VIOLATION),
> +                                            errmsg("domain %s does
> not allow null values", format_type_be(typioparams[m])),
> +                                            errdatatype(typioparams[m]));
>
> If domain data type is the sole case where not to accept NULL, can we
> check it beforehand to avoid calling the second
> InputFunctionCallSafe() for non-domain data types? Also, if we want to
> end up with an error when setting NULL to a domain type with NOT NULL,
> I think we don't need to try to handle a soft error by passing
> econtext to InputFunctionCallSafe().
>

Please check the attached patch; I hope I have addressed all the points you mentioned.


> If domain data type is the sole case where not to accept NULL, can we
> check it beforehand to avoid calling the second
> InputFunctionCallSafe() for non-domain data types?

I doubt it.

we have
bool
InputFunctionCallSafe(FmgrInfo *flinfo, char *str,
                      Oid typioparam, int32 typmod,
                      fmNodePtr escontext,
                      Datum *result)
{
    LOCAL_FCINFO(fcinfo, 3);
    if (str == NULL && flinfo->fn_strict)
    {
        *result = (Datum) 0;    /* just return null result */
        return true;
    }
    /* ... rest of the function elided in this excerpt ... */
}

Most input functions of non-domain types are strict.
The following query lists the exceptions (non-strict input functions of
non-domain, non-pseudo types):

select proname, pt.typname, proisstrict,pt.typtype
from pg_type pt
join pg_proc pp on pp.oid = pt.typinput
where pt.typtype <> 'd'
and pt.typtype <> 'p'
and proisstrict is false;

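So the second InputFunctionCallSafe call will be cheap for non-domain types:
for a strict input function it returns a null result without invoking the
input function at all.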

Before CopyFromTextLikeOneRow, we don't know whether a column's type is a
domain with constraints.
We could conditionally call DomainHasConstraints beforehand to find out,
but DomainHasConstraints is expensive, so it may add overhead even for
non-domain types.

The second InputFunctionCallSafe call should not be a big issue for a
domain with constraints either, because the first domain_in call has
already cached the related structs.
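
To make the argument above concrete, here is a minimal, self-contained sketch of the early exit. It is a toy model, not PostgreSQL code: ToyInputFn, toy_input_call_safe, domain_not_null and body_calls are hypothetical names invented for illustration. It also simplifies one point: a NOT NULL domain is modeled as returning false for NULL input, whereas the real call with a NULL escontext would raise a hard error.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for FmgrInfo; only the fields needed for the sketch. */
typedef struct ToyInputFn
{
    bool fn_strict;        /* mirrors flinfo->fn_strict */
    bool domain_not_null;  /* only meaningful for a non-strict, domain-like fn */
    int  body_calls;       /* how often the "real" input function body ran */
} ToyInputFn;

/*
 * Toy model of the early exit discussed above: NULL input to a strict
 * function yields a NULL result without running the function body.
 * A non-strict (domain-like) function is always entered, so it can
 * reject NULL itself.
 */
static bool
toy_input_call_safe(ToyInputFn *fn, const char *str, long *result, bool *isnull)
{
    char *end;

    if (str == NULL && fn->fn_strict)
    {
        *result = 0;            /* just return null result */
        *isnull = true;
        return true;
    }

    fn->body_calls++;           /* the input function body runs from here on */

    if (str == NULL)
    {
        if (fn->domain_not_null)
            return false;       /* assumption: modeled as a soft failure */
        *result = 0;
        *isnull = true;
        return true;
    }

    /* Integer-like parsing: the whole string must be consumed. */
    *result = strtol(str, &end, 10);
    *isnull = false;
    return end != str && *end == '\0';
}
```

In this model, retrying a failed field with NULL costs nothing extra for a strict function (body_calls does not increase), while a domain-like function is entered a second time, which is the case the caching remark above refers to.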


Attachments:

  [text/x-patch] v16-0001-COPY-on_error-set_null.patch (20.2K, 2-v16-0001-COPY-on_error-set_null.patch)
  download | inline diff:
From f6cd33623f12d8f105af4e847726867e6ed53a6b Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Sat, 5 Apr 2025 16:30:10 +0800
Subject: [PATCH v16 1/1] COPY (on_error set_null)

Extend "on_error action" with a new option: on_error set_null.

The current grammar does not allow "on_error null": supporting it would
require null to change from a reserved to a non-reserved word in all COPY
option values.  So we choose "on_error set_null".

Any data type conversion errors during the COPY FROM process will result in the
affected column being set to NULL. This only applies when using the non-binary
format for COPY FROM.

However, the not-null constraint will still be enforced.
If a column has a not-null constraint, a successful (on_error set_null)
action will cause a not-null constraint violation.
This also applies when the column type is a domain with a not-null constraint.

A regression test for a domain with a not-null constraint has been added.

Author: Jian He <[email protected]>
Author: Kirill Reshke <[email protected]>

Reviewed-by:
Fujii Masao <[email protected]>
Jim Jones <[email protected]>
"David G. Johnston" <[email protected]>
Yugo NAGATA <[email protected]>
torikoshia <[email protected]>
Masahiko Sawada <[email protected]>

discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
 doc/src/sgml/ref/copy.sgml           | 36 ++++++++++-----
 src/backend/commands/copy.c          |  6 ++-
 src/backend/commands/copyfrom.c      | 29 ++++++++-----
 src/backend/commands/copyfromparse.c | 65 +++++++++++++++++++++++++++-
 src/bin/psql/tab-complete.in.c       |  2 +-
 src/include/commands/copy.h          |  1 +
 src/test/regress/expected/copy2.out  | 60 +++++++++++++++++++++++++
 src/test/regress/sql/copy2.sql       | 46 ++++++++++++++++++++
 8 files changed, 219 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index d6859276bed..db112867fa0 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,23 +394,36 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies how to behave when encountering an error converting a column's
       input value into its data type.
       An <replaceable class="parameter">error_action</replaceable> value of
-      <literal>stop</literal> means fail the command, while
-      <literal>ignore</literal> means discard the input row and continue with the next one.
+      <literal>stop</literal> means fail the command,
+      <literal>ignore</literal> means discard the input row and continue with the next one,
+      and <literal>set_null</literal> means replace columns containing invalid
+      input values with <literal>NULL</literal> and move to the next field.
       The default is <literal>stop</literal>.
      </para>
      <para>
-      The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+      The <literal>ignore</literal> and <literal>set_null</literal>
+      options are applicable only for <command>COPY FROM</command>
       when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
      </para>
+    <para>
+      For <literal>ignore</literal> option, a <literal>NOTICE</literal> message
+      containing the ignored row count is emitted at the end of the <command>COPY
+      FROM</command> if at least one row was discarded.
+      For <literal>set_null</literal> option,
+      a <literal>NOTICE</literal> message indicating the number of rows
+      where invalid input values were replaced with null is emitted
+      at the end of the <command>COPY FROM</command> if at least one row was replaced.
+     </para>
      <para>
-      A <literal>NOTICE</literal> message containing the ignored row count is
-      emitted at the end of the <command>COPY FROM</command> if at least one
-      row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
-      <literal>verbose</literal>, a <literal>NOTICE</literal> message
+      When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+      for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
       containing the line of the input file and the column name whose input
-      conversion has failed is emitted for each discarded row.
-      When it is set to <literal>silent</literal>, no message is emitted
-      regarding ignored rows.
+      conversion has failed is emitted for each discarded row;
+      for <literal>set_null</literal> option, a <literal>NOTICE</literal>
+      message containing the line of the input file and the column name where
+      value was replaced with <literal>NULL</literal> for each input conversion
+      failure.
+      When it is set to <literal>silent</literal>, no message is emitted regarding input conversion failed rows.
      </para>
     </listitem>
    </varlistentry>
@@ -458,7 +471,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      </para>
      <para>
       This is currently used in <command>COPY FROM</command> command when
-      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+      or <literal>set_null</literal>.
       </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 74ae42b19a7..f963d0e51ff 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 				 parser_errposition(pstate, def->location)));
 
 	/*
-	 * Allow "stop", or "ignore" values.
+	 * Allow "stop", "ignore", "set_null" values.
 	 */
 	if (pg_strcasecmp(sval, "stop") == 0)
 		return COPY_ON_ERROR_STOP;
 	if (pg_strcasecmp(sval, "ignore") == 0)
 		return COPY_ON_ERROR_IGNORE;
+	if (pg_strcasecmp(sval, "set_null") == 0)
+		return COPY_ON_ERROR_SET_NULL;
 
 	ereport(ERROR,
 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
 
-	if (opts_out->reject_limit && !opts_out->on_error)
+	if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fbbbc09a97b..d4a91b68ac1 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate)
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
-	if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
-		cstate->num_errors > 0 &&
+	if (cstate->num_errors > 0 &&
 		cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
-		ereport(NOTICE,
-				errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
-							  "%" PRIu64 " rows were skipped due to data type incompatibility",
-							  cstate->num_errors,
-							  cstate->num_errors));
+	{
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+			ereport(NOTICE,
+					errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
+								  "%" PRIu64 " rows were skipped due to data type incompatibility",
+								  cstate->num_errors,
+								  cstate->num_errors));
+		else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+			ereport(NOTICE,
+					errmsg_plural("invalid values in %" PRIu64 " row was replaced with null due to data type incompatibility",
+								  "invalid values in %" PRIu64 " rows were replaced with null due to data type incompatibility",
+								  cstate->num_errors,
+								  cstate->num_errors));
+	}
 
 	if (bistate != NULL)
 		FreeBulkInsertState(bistate);
@@ -1622,10 +1630,11 @@ BeginCopyFrom(ParseState *pstate,
 		cstate->escontext->error_occurred = false;
 
 		/*
-		 * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
-		 * options later
+		 * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_SET_NULL.
+		 * We'll add other options later
 		 */
-		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+			cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
 			cstate->escontext->details_wanted = false;
 	}
 	else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5fc346e201..79e726701ad 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -947,6 +947,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 	int			fldct;
 	int			fieldno;
 	char	   *string;
+	bool		current_row_erroneous = false;
 
 	tupDesc = RelationGetDescr(cstate->rel);
 	attr_count = list_length(cstate->attnumlist);
@@ -1024,7 +1025,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 		}
 
 		/*
-		 * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+		 * If ON_ERROR is specified with IGNORE, skip rows with soft errors.
+		 * If ON_ERROR is specified with set_null, try to replace with null.
 		 */
 		else if (!InputFunctionCallSafe(&in_functions[m],
 										string,
@@ -1035,9 +1037,65 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 		{
 			Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
 
+			if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+			{
+				/*
+				 * we use it to count number of rows (not fields!) that
+				 * successfully applied on_error set_null.
+				*/
+				if (!current_row_erroneous)
+					current_row_erroneous = true;
+
+				/*
+				 * when column type is domain with not-null constraint, we need
+				 * another InputFunctionCallSafe to error out domain constraint
+				 * violation.
+				*/
+				cstate->escontext->error_occurred = false;
+				if (InputFunctionCallSafe(&in_functions[m],
+										  NULL,
+										  typioparams[m],
+										  att->atttypmod,
+										  NULL,
+										  &values[m]))
+				{
+					nulls[m] = true;
+					values[m] = (Datum) 0;
+
+					if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+					{
+						char	   *attval;
+
+						/*
+						 * Since we emit line number and column info in the below
+						 * notice message, we suppress error context information other
+						 * than the relation name.
+						*/
+						Assert(!cstate->relname_only);
+						Assert(cstate->cur_attval);
+
+						cstate->relname_only = true;
+						attval = CopyLimitPrintoutLength(cstate->cur_attval);
+						ereport(NOTICE,
+								errmsg("setting to null due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+										cstate->cur_lineno,
+										cstate->cur_attname,
+										attval));
+						pfree(attval);
+
+						/* reset relname_only */
+						cstate->relname_only = false;
+					}
+
+					cstate->cur_attname = NULL;
+
+					continue;
+				}
+			}
 			cstate->num_errors++;
 
-			if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+			if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE &&
+				cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
 			{
 				/*
 				 * Since we emit line number and column info in the below
@@ -1076,6 +1134,9 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 		cstate->cur_attval = NULL;
 	}
 
+	if (current_row_erroneous)
+		cstate->num_errors++;
+
 	Assert(fieldno == attr_count);
 
 	return true;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index c916b9299a8..8e6f4930919 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3291,7 +3291,7 @@ match_previous_words(int pattern_id,
 
 	/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
-		COMPLETE_WITH("stop", "ignore");
+		COMPLETE_WITH("stop", "ignore", "set_null");
 
 	/* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "LOG_VERBOSITY"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..935d21ee77a 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
 {
 	COPY_ON_ERROR_STOP = 0,		/* immediately throw errors, default */
 	COPY_ON_ERROR_IGNORE,		/* ignore errors */
+	COPY_ON_ERROR_SET_NULL,		/* set error field to null */
 } CopyOnErrorChoice;
 
 /*
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..91fa2087cef 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
                                             ^
+COPY x from stdin (on_error set_null, on_error ignore);
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_null, on_error ignore);
+                                              ^
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
 ERROR:  cannot specify NULL in BINARY mode
 COPY x from stdin (format BINARY, on_error ignore);
 ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_null);
+ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (on_error set_null, reject_limit 2);
+ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
 COPY x from stdin (on_error unsupported);
 ERROR:  COPY ON_ERROR "unsupported" not recognized
 LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
 ERROR:  COPY ON_ERROR cannot be used with COPY TO
 LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
                                          ^
+COPY x to stdout (on_error set_null);
+ERROR:  COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdout (on_error set_null);
+                          ^
 COPY x from stdin (log_verbosity unsupported);
 ERROR:  COPY LOG_VERBOSITY "unsupported" not recognized
 LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,51 @@ CONTEXT:  COPY check_ign_err
 NOTICE:  skipping row due to data type incompatibility at line 8 for column "k": "a"
 CONTEXT:  COPY check_ign_err
 NOTICE:  6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: "a"
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR:  missing data for column "c"
+CONTEXT:  COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR:  extra data after last expected column
+CONTEXT:  COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose);
+NOTICE:  setting to null due to data type incompatibility at line 1 for column "b": "x1"
+CONTEXT:  COPY t_on_error_null
+NOTICE:  setting to null due to data type incompatibility at line 1 for column "c": "yx"
+CONTEXT:  COPY t_on_error_null
+NOTICE:  setting to null due to data type incompatibility at line 2 for column "b": "zx"
+CONTEXT:  COPY t_on_error_null
+NOTICE:  setting to null due to data type incompatibility at line 3 for column "c": "ea"
+CONTEXT:  COPY t_on_error_null
+NOTICE:  invalid values in 3 rows were replaced with null due to data type incompatibility
+-- check inserted content
+select * from t_on_error_null;
+ a  |  b   |  c   
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL |   12
+ 13 |   14 | NULL
+(3 rows)
+
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..d27f3495cf7 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
 COPY x from stdin (convert_selectively (a), convert_selectively (b));
 COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
 COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_null, on_error ignore);
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 
 -- incorrect options
 COPY x from stdin (format BINARY, delimiter ',');
 COPY x from stdin (format BINARY, null 'x');
 COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_null);
+COPY x from stdin (on_error set_null, reject_limit 2);
 COPY x from stdin (on_error unsupported);
 COPY x from stdin (format TEXT, force_quote(a));
 COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
 COPY x to stdout (format CSV, force_null(a));
 COPY x to stdout (format CSV, force_null *);
 COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdout (on_error set_null);
 COPY x from stdin (log_verbosity unsupported);
 COPY x from stdin with (reject_limit 1);
 COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a	{2}	2
 8	{8}	8
 \.
 
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+\N	11	13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+a	11	14
+\.
+
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+-1	11	13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose);
+10	x1	yx
+11	zx	12
+13	14	ea
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2025-04-07 22:41  Masahiko Sawada <[email protected]>
  parent: jian he <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: Masahiko Sawada @ 2025-04-07 22:41 UTC (permalink / raw)
  To: jian he <[email protected]>; +Cc: vignesh C <[email protected]>; Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]>

On Sat, Apr 5, 2025 at 1:31 AM jian he <[email protected]> wrote:
>
> On Sat, Apr 5, 2025 at 5:33 AM Masahiko Sawada <[email protected]> wrote:
> >
> > On Fri, Apr 4, 2025 at 4:55 AM jian he <[email protected]> wrote:
> > >
> > > On Tue, Mar 25, 2025 at 2:31 PM vignesh C <[email protected]> wrote:
> > > >
> > > > 2) Here in error we say column c1 violates not-null constraint and in
> > > > the context we show column c2, should the context also display c2
> > > > column:
> > > > postgres=# create table t3(c1 int not null, c2 int, check (c1 > 10));
> > > > CREATE TABLE
> > > > postgres=# COPY t3 FROM STDIN WITH (on_error set_to_null);
> > > > Enter data to be copied followed by a newline.
> > > > End with a backslash and a period on a line by itself, or an EOF signal.
> > > > >> a  b
> > > > >> \.
> > > > ERROR:  null value in column "c1" of relation "t3" violates not-null constraint
> > > > DETAIL:  Failing row contains (null, null).
> > > > CONTEXT:  COPY t3, line 1, column c2: "b"
> > > >
> > >
> > > It took me a while to figure out why.
> > > with the attached, now the error message becomes:
> > >
> > > ERROR:  null value in column "c1" of relation "t3" violates not-null constraint
> > > DETAIL:  Failing row contains (null, null).
> > > CONTEXT:  COPY t3, line 1: "a,b"
> > >
> > > while at it,
> > > (on_error set_to_null, log_verbosity verbose)
> > > error message CONTEXT will only emit out relation name,
> > > this aligns with (on_error ignore, log_verbosity verbose).
> > >
> > > one of the message out example:
> > > +NOTICE:  column "b" was set to null due to data type incompatibility at line 2
> > > +CONTEXT:  COPY t_on_error_null
> > >
> > >
> > >
> > > > 3) typo becomen should be become:
> > > > null will becomen reserved to non-reserved
> > > fixed.
> > >
> > > > 4) There is a whitespace error while applying patch
> > > > Applying: COPY (on_error set_to_null)
> > > > .git/rebase-apply/patch:39: trailing whitespace.
> > > >       a <literal>NOTICE</literal> message indicating the number of rows
> > > > warning: 1 line adds whitespace errors.
> > > fixed.
> >
> > I've reviewed the v15 patch and here are some comments:
> >
> > How about renaming the new option value to 'set_null"? The 'to' in the
> > value name seems redundant to me.
> >
> > ---
> > +        COPY_ON_ERROR_NULL,                    /* set error field to null */
> >
> > I think it's better to rename COPY_ON_ERROR_SET_TO_NULL (or
> > COPY_ON_ERROR_SET_NULL if we change the option value name) for
> > consistency with the value name.
> >
> > ---
> > +                else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
> > +                        ereport(NOTICE,
> > +                                        errmsg_plural("invalid values
> > in %" PRIu64 " row was replaced with null",
> > +
> > "invalid values in %" PRIu64 " rows were replaced with null",
> > +
> > cstate->num_errors,
> > +
> > cstate->num_errors));
> >
> > How about adding "due to data type incompatibility" at the end of the message?
> >
> > ---
> > +                                    ereport(NOTICE,
> > +                                                    errmsg("column
> > \"%s\" was set to null due to data type incompatibility at line %"
> > PRIu64 "",
> > +
> > cstate->cur_attname,
> > +
> > cstate->cur_lineno));
> >
> > Similar to the IGNORE case, we can show the data in question in the message.
> >
> > ---
> > +                    else
> > +                            ereport(ERROR,
> > +
> > errcode(ERRCODE_NOT_NULL_VIOLATION),
> > +                                            errmsg("domain %s does
> > not allow null values", format_type_be(typioparams[m])),
> > +                                            errdatatype(typioparams[m]));
> >
> > If domain data type is the sole case where not to accept NULL, can we
> > check it beforehand to avoid calling the second
> > InputFunctionCallSafe() for non-domain data types? Also, if we want to
> > end up with an error when setting NULL to a domain type with NOT NULL,
> > I think we don't need to try to handle a soft error by passing
> > econtext to InputFunctionCallSafe().
> >
>
> please check attached, hope i have addressed all the points you've mentioned.
>
>
> > If domain data type is the sole case where not to accept NULL, can we
> > check it beforehand to avoid calling the second
> > InputFunctionCallSafe() for non-domain data types?
>
> I doubt it.
>
> we have
> InputFunctionCallSafe(FmgrInfo *flinfo, char *str,
>                       Oid typioparam, int32 typmod,
>                       fmNodePtr escontext,
>                       Datum *result)
> {
>     LOCAL_FCINFO(fcinfo, 3);
>     if (str == NULL && flinfo->fn_strict)
>     {
>         *result = (Datum) 0;    /* just return null result */
>         return true;
>     }
> }
>
> Most of the non-domain type input functions are strict.
> see query result:
>
> select proname, pt.typname, proisstrict,pt.typtype
> from pg_type pt
> join pg_proc pp on pp.oid = pt.typinput
> where pt.typtype <> 'd'
> and pt.typtype <> 'p'
> and proisstrict is false;
>
> so the second InputFunctionCallSafe will be faster for non-domain types.

Agreed.

BTW have you measured the overheads of calling InputFunctionCallSafe
twice? If it's significant, we might want to find other ways to
achieve it as it would not be good to incur overhead just for
relatively rare cases.

Here are some comments:

+               if (InputFunctionCallSafe(&in_functions[m],
+                                         NULL,
+                                         typioparams[m],
+                                         att->atttypmod,
+                                         NULL,
+                                         &values[m]))

Given that we pass NULL to escontext, does this function return false
in an error case? Or can we use InputFunctionCall instead?

I think we should mention that SET_NULL still could fail if the data
type of the column doesn't accept NULL.

How about restructuring the code that handles data incompatibility
errors like this:

else if (!InputFunctionCallSafe(...))
{
    if (cstate->opts.on_error == IGNORE)
    {
        cstate->num_errors++;
        if (cstate->opts.log_verbosity == VERBOSE)
            write a NOTICE message;
        return true; // ignore whole row.
    }
    else if (cstate->opts.on_error == SET_NULL)
    {
        current_row_erroneous = true;
        set NULL to the column;
        if (cstate->opts.log_verbosity == VERBOSE)
            write a NOTICE message;
        continue; // go to the next column.
    }
}

That way, we have similar structures for both on_error handling and
don't need to reset cstate->cur_attname at the end of SET_NULL
handling.
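
As a rough illustration of that control flow, here is a self-contained toy model. It is not PostgreSQL code: ToyCopyState, toy_copy_one_row, toy_parse_int and the fields_nulled counter are hypothetical names, every column is assumed to be an integer, and NOTICE output is left out. It only shows that IGNORE drops the whole row at the first bad field, while SET_NULL nulls each bad field and counts the row once.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { ON_ERROR_STOP, ON_ERROR_IGNORE, ON_ERROR_SET_NULL } ToyOnError;
typedef enum { ROW_OK, ROW_SKIPPED, ROW_FAILED } ToyRowResult;

/* Hypothetical, heavily reduced stand-in for CopyFromState. */
typedef struct ToyCopyState
{
    ToyOnError    on_error;
    unsigned long num_errors;     /* rows skipped, or rows with >= 1 nulled field */
    unsigned long fields_nulled;  /* illustration only; no such counter in the patch */
} ToyCopyState;

/* Assumption: every column is an int and the whole string must parse. */
static bool
toy_parse_int(const char *s, long *out)
{
    char *end;

    *out = strtol(s, &end, 10);
    return end != s && *end == '\0';
}

static ToyRowResult
toy_copy_one_row(ToyCopyState *cs, const char *const *fields, int nfields,
                 long *values, bool *nulls)
{
    bool current_row_erroneous = false;

    for (int i = 0; i < nfields; i++)
    {
        nulls[i] = false;
        if (toy_parse_int(fields[i], &values[i]))
            continue;

        if (cs->on_error == ON_ERROR_IGNORE)
        {
            cs->num_errors++;
            return ROW_SKIPPED;     /* ignore whole row */
        }
        else if (cs->on_error == ON_ERROR_SET_NULL)
        {
            current_row_erroneous = true;
            values[i] = 0;
            nulls[i] = true;        /* set NULL to the column */
            cs->fields_nulled++;
            continue;               /* go to the next column */
        }
        return ROW_FAILED;          /* ON_ERROR_STOP: hard error in the real code */
    }

    /* Count rows, not fields, for the summary message. */
    if (current_row_erroneous)
        cs->num_errors++;
    return ROW_OK;
}
```

Feeding it the three rows from the "--ok" regression test (10/x1/yx, 11/zx/12, 13/14/ea) nulls four fields across three rows, which lines up with the four per-field NOTICEs and the "3 rows" summary in the expected output.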

---
From the regression tests:

--fail, column a is domain with not-null constraint
COPY t_on_error_null FROM STDIN WITH (on_error set_null);
a       11      14
\.
ERROR:  domain d_int_not_null does not allow null values
CONTEXT:  COPY t_on_error_null, line 1, column a: "a"

I guess that the log messages could confuse users since while the
actual error was caused by setting NULL to the non-NULL domain type
column, the context message says the data 'a' was erroneous.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com







* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2025-04-08 10:53  jian he <[email protected]>
  parent: Masahiko Sawada <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: jian he @ 2025-04-08 10:53 UTC (permalink / raw)
  To: Masahiko Sawada <[email protected]>; +Cc: vignesh C <[email protected]>; Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; torikoshia <[email protected]>; PostgreSQL Hackers <[email protected]>

On Tue, Apr 8, 2025 at 6:42 AM Masahiko Sawada <[email protected]> wrote:
>
>
> BTW have you measured the overheads of calling InputFunctionCallSafe
> twice? If it's significant, we might want to find other ways to
> achieve it as it would not be good to incur overhead just for
> relatively rare cases.
>

Please check the two attached patches:
v17-0001-COPY-on_error-set_null.original
v17-0001-COPY-on_error-set_null.patch

For non-domain types, (on_error set_null) performs the same with both
patches.
For domain types, with or without constraints, (on_error set_null) is
slower with v17.original than with v17.patch.


test script:

create unlogged table t2(a text);
insert into t2 select 'a' from generate_series(1, 10_000_000) g;
copy t2 to '/tmp/2.txt';
CREATE DOMAIN d1 AS INT ;
CREATE DOMAIN d2 AS INT check (value > 0);
create unlogged table t3(a int);
create unlogged table t4(a d1);
create unlogged table t5(a d2);


performance result:
v17-0001-COPY-on_error-set_null.patch
-- 764.903 ms
copy t3 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1
-- 779.253 ms
copy t4 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1
-- 750.390 ms
copy t5 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1

v17-0001-COPY-on_error-set_null.original
-- 774.943 ms
copy t3 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1
-- 867.671 ms
copy t4 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1
-- 927.685 ms
copy t5 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1
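
For reference, the relative slowdown implied by the timings above can be
worked out with a few lines of C. This is only a sketch: slowdown_pct is a
hypothetical helper, not part of either patch, and it assumes each figure
is a representative wall-clock time in milliseconds.

```c
#include <assert.h>

/*
 * Relative slowdown of the ".original" build versus the ".patch" build,
 * in percent. Both arguments are wall-clock times in milliseconds.
 * Hypothetical helper, for illustration only.
 */
static double
slowdown_pct(double original_ms, double patch_ms)
{
    assert(patch_ms > 0.0);
    return (original_ms - patch_ms) / patch_ms * 100.0;
}
```

Applied to the numbers in this message, this gives roughly 1.3% for t3
(plain int), 11.3% for t4 (domain without constraint) and 23.6% for t5
(domain with a check constraint).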


> Here are some comments:
>
> +               if (InputFunctionCallSafe(&in_functions[m],
> +                                         NULL,
> +                                         typioparams[m],
> +                                         att->atttypmod,
> +                                         NULL,
> +                                         &values[m]))
>
> Given that we pass NULL to escontext, does this function return false
> in an error case? Or can we use InputFunctionCall instead?
>
> I think we should mention that SET_NULL still could fail if the data
> type of the column doesn't accept NULL.
>
> How about restructuring the codes around handling data incompatibility
> errors like:
>
> else if (!InputFunctionCallSafe(...))
> {
>     if (cstate->opts.on_error == IGNORE)
>     {
>         cstate->num_errors++;
>         if (cstate->opts.log_verbosity == VERBOSE)
>             write a NOTICE message;
>         return true; // ignore whole row.
>     }
>     else if (cstate->opts.on_error == SET_NULL)
>     {
>         current_row_erroneous = true;
>         set NULL to the column;
>         if (cstate->opts.log_verbosity == VERBOSE)
>             write a NOTICE message;
>         continue; // go to the next column.
>     }
> }
>
> That way, we have similar structures for both on_error handling and
> don't need to reset cstate->cur_attname at the end of SET_NULL
> handling.
>

I think we still need to reset cstate->cur_attname.
The current code structure is:
``
foreach(cur, cstate->attnumlist)
{
    if (condition x)
        continue;
    cstate->cur_attname = NULL;
    cstate->cur_attval = NULL;
}
``
In some cases (for example, the last column satisfies condition x), once
we reach the ``continue`` we never reach the following lines:
``
        cstate->cur_attname = NULL;
        cstate->cur_attval = NULL;
``
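
The effect described above can be reproduced outside PostgreSQL. The
sketch below is a simplified stand-in, not the copyfromparse.c code:
DemoState, demo_loop and the skip[] array are made-up names, and skip[i]
plays the role of "condition x".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two CopyFromState fields discussed here. */
typedef struct DemoState
{
    const char *cur_attname;
    const char *cur_attval;
} DemoState;

/*
 * Walk ncols columns. When skip[i] is true the iteration ends with
 * "continue", which bypasses the resets at the bottom of the loop body
 * unless reset_before_continue asks for an explicit reset first.
 */
static void
demo_loop(DemoState *st, const char *const *names, const int *skip,
          int ncols, int reset_before_continue)
{
    assert(st != NULL);

    for (int i = 0; i < ncols; i++)
    {
        st->cur_attname = names[i];
        st->cur_attval = "raw input";

        if (skip[i])
        {
            if (reset_before_continue)
            {
                st->cur_attname = NULL;
                st->cur_attval = NULL;
            }
            continue;           /* the trailing resets are not executed */
        }

        st->cur_attname = NULL;
        st->cur_attval = NULL;
    }
}
```

With two columns and skip[] = {0, 1}, calling demo_loop without the
explicit reset leaves cur_attname pointing at the last column's name after
the loop, while calling it with the reset leaves both fields NULL.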



> ---
> From the regression tests:
>
> --fail, column a is domain with not-null constraint
> COPY t_on_error_null FROM STDIN WITH (on_error set_null);
> a       11      14
> \.
> ERROR:  domain d_int_not_null does not allow null values
> CONTEXT:  COPY t_on_error_null, line 1, column a: "a"
>
> I guess that the log messages could confuse users since while the
> actual error was caused by setting NULL to the non-NULL domain type
> column, the context message says the data 'a' was erroneous.
>

If the second call is InputFunctionCall, then we cannot customize the
error message; we can't have both.
I guess we need a second InputFunctionCallSafe with a non-NULL escontext.

I have now changed it to:
                if (!cstate->domain_with_constraint[m] ||
                    InputFunctionCallSafe(&in_functions[m],
                                          NULL,
                                          typioparams[m],
                                          att->atttypmod,
                                          (Node *) cstate->escontext,
                                          &values[m]))
                {
                    /* set the column to NULL and continue (see the attached patch) */
                }
                else if (string == NULL)
                    ereport(ERROR,
                            errcode(ERRCODE_NOT_NULL_VIOLATION),
                            errmsg("domain %s does not allow null values",
                                   format_type_be(typioparams[m])),
                            errdatatype(typioparams[m]));
                else
                    ereport(ERROR,
                            errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                            errmsg("invalid input value for domain %s: \"%s\"",
                                   format_type_be(typioparams[m]), string));


do these ``ELSE IF``, ``ELSE`` error report messages make sense to you?
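
To make the three branches easier to compare, here is a standalone sketch
of the decision only. It is not the patch code: SetNullOutcome,
set_null_outcome and its boolean parameters are hypothetical names, and
the second InputFunctionCallSafe call is reduced to a precomputed
"null_accepted" flag. The examples follow the regression tests in the
attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum SetNullOutcome
{
    OUTCOME_SET_NULL,           /* store NULL, go on to the next column */
    OUTCOME_NOT_NULL_VIOLATION, /* "domain ... does not allow null values" */
    OUTCOME_INVALID_INPUT       /* "invalid input value for domain ..." */
} SetNullOutcome;

/*
 * Decide what happens after the first conversion of "string" has failed.
 *   domain_with_constraint: column type is a domain with constraints
 *   null_accepted:          re-running the input function with NULL
 *                           succeeded (only consulted for such domains)
 *   string:                 original field text, NULL for a null input
 */
static SetNullOutcome
set_null_outcome(bool domain_with_constraint, bool null_accepted,
                 const char *string)
{
    if (!domain_with_constraint || null_accepted)
        return OUTCOME_SET_NULL;
    else if (string == NULL)
        return OUTCOME_NOT_NULL_VIOLATION;
    else
        return OUTCOME_INVALID_INPUT;
}

/* Cases taken from the t_on_error_null regression tests. */
static void
set_null_outcome_examples(void)
{
    /* column a (d_int_not_null), null input: NULL is rejected */
    assert(set_null_outcome(true, false, NULL) == OUTCOME_NOT_NULL_VIOLATION);
    /* column a, input "ss": NULL is rejected, report the bad value */
    assert(set_null_outcome(true, false, "ss") == OUTCOME_INVALID_INPUT);
    /* column b (d_int_positive_maybe_null), input "x1": NULL is accepted */
    assert(set_null_outcome(true, true, "x1") == OUTCOME_SET_NULL);
    /* column c (plain INT), input "yx": no second call is needed */
    assert(set_null_outcome(false, false, "yx") == OUTCOME_SET_NULL);
}
```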


Attachments:

  [application/octet-stream] v17-0001-COPY-on_error-set_null.original (22.3K, 2-v17-0001-COPY-on_error-set_null.original)
  download

  [text/x-patch] v17-0001-COPY-on_error-set_null.patch (23.9K, 3-v17-0001-COPY-on_error-set_null.patch)
  download | inline diff:
From 660389d38a84275a62e497b676c388c063374909 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Tue, 8 Apr 2025 15:07:55 +0800
Subject: [PATCH v17 1/1] COPY (on_error set_null)

Extend "on_error action", introduce new option:  on_error set_null.

The current grammar does not allow "on_error null". Supporting it would
require treating null as a non-reserved word in every COPY option value,
so we chose "on_error set_null" instead.

Any data type conversion errors during the COPY FROM process will result in the
affected column being set to NULL. This only applies when using the non-binary
format for COPY FROM.

However, the not-null constraint will still be enforced.
If a column has a not-null constraint, a successful (on_error set_null)
action will cause a not-null constraint violation.
This also applies when the column type is a domain with a not-null constraint.

A regression test for a domain with a not-null constraint has been added.

Author: Jian He <[email protected]>
Author: Kirill Reshke <[email protected]>

Reviewed-by:
Fujii Masao <[email protected]>
Jim Jones <[email protected]>
"David G. Johnston" <[email protected]>
Yugo NAGATA <[email protected]>
torikoshia <[email protected]>
Masahiko Sawada <[email protected]>

discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
---
 doc/src/sgml/ref/copy.sgml               |  36 +++++--
 src/backend/commands/copy.c              |   6 +-
 src/backend/commands/copyfrom.c          |  42 ++++++--
 src/backend/commands/copyfromparse.c     | 130 ++++++++++++++++++-----
 src/bin/psql/tab-complete.in.c           |   2 +-
 src/include/commands/copy.h              |   1 +
 src/include/commands/copyfrom_internal.h |   6 ++
 src/test/regress/expected/copy2.out      |  60 +++++++++++
 src/test/regress/sql/copy2.sql           |  46 ++++++++
 9 files changed, 277 insertions(+), 52 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index d6859276bed..db112867fa0 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,23 +394,36 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies how to behave when encountering an error converting a column's
       input value into its data type.
       An <replaceable class="parameter">error_action</replaceable> value of
-      <literal>stop</literal> means fail the command, while
-      <literal>ignore</literal> means discard the input row and continue with the next one.
+      <literal>stop</literal> means fail the command,
+      <literal>ignore</literal> means discard the input row and continue with the next one,
+      and <literal>set_null</literal> means replace columns containing invalid
+      input values with <literal>NULL</literal> and move to the next field.
       The default is <literal>stop</literal>.
      </para>
      <para>
-      The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+      The <literal>ignore</literal> and <literal>set_null</literal>
+      options are applicable only for <command>COPY FROM</command>
       when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
      </para>
+    <para>
+      For <literal>ignore</literal> option, a <literal>NOTICE</literal> message
+      containing the ignored row count is emitted at the end of the <command>COPY
+      FROM</command> if at least one row was discarded.
+      For <literal>set_null</literal> option,
+      a <literal>NOTICE</literal> message indicating the number of rows
+      where invalid input values were replaced with null is emitted
+      at the end of the <command>COPY FROM</command> if at least one row was replaced.
+     </para>
      <para>
-      A <literal>NOTICE</literal> message containing the ignored row count is
-      emitted at the end of the <command>COPY FROM</command> if at least one
-      row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
-      <literal>verbose</literal>, a <literal>NOTICE</literal> message
+      When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+      for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
       containing the line of the input file and the column name whose input
-      conversion has failed is emitted for each discarded row.
-      When it is set to <literal>silent</literal>, no message is emitted
-      regarding ignored rows.
+      conversion has failed is emitted for each discarded row;
+      for <literal>set_null</literal> option, a <literal>NOTICE</literal>
+      message containing the line of the input file and the column name where
+      value was replaced with <literal>NULL</literal> for each input conversion
+      failure.
+      When it is set to <literal>silent</literal>, no message is emitted regarding input conversion failed rows.
      </para>
     </listitem>
    </varlistentry>
@@ -458,7 +471,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      </para>
      <para>
       This is currently used in <command>COPY FROM</command> command when
-      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+      or <literal>set_null</literal>.
       </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 74ae42b19a7..f963d0e51ff 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 				 parser_errposition(pstate, def->location)));
 
 	/*
-	 * Allow "stop", or "ignore" values.
+	 * Allow "stop", "ignore", "set_null" values.
 	 */
 	if (pg_strcasecmp(sval, "stop") == 0)
 		return COPY_ON_ERROR_STOP;
 	if (pg_strcasecmp(sval, "ignore") == 0)
 		return COPY_ON_ERROR_IGNORE;
+	if (pg_strcasecmp(sval, "set_null") == 0)
+		return COPY_ON_ERROR_SET_NULL;
 
 	ereport(ERROR,
 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
 
-	if (opts_out->reject_limit && !opts_out->on_error)
+	if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fbbbc09a97b..750d597d4d0 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate)
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
-	if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
-		cstate->num_errors > 0 &&
+	if (cstate->num_errors > 0 &&
 		cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
-		ereport(NOTICE,
-				errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
-							  "%" PRIu64 " rows were skipped due to data type incompatibility",
-							  cstate->num_errors,
-							  cstate->num_errors));
+	{
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+			ereport(NOTICE,
+					errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
+								  "%" PRIu64 " rows were skipped due to data type incompatibility",
+								  cstate->num_errors,
+								  cstate->num_errors));
+		else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+			ereport(NOTICE,
+					errmsg_plural("invalid values in %" PRIu64 " row was replaced with null due to data type incompatibility",
+								  "invalid values in %" PRIu64 " rows were replaced with null due to data type incompatibility",
+								  cstate->num_errors,
+								  cstate->num_errors));
+	}
 
 	if (bistate != NULL)
 		FreeBulkInsertState(bistate);
@@ -1614,6 +1622,19 @@ BeginCopyFrom(ParseState *pstate,
 		}
 	}
 
+	if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+	{
+		int	attr_count = list_length(cstate->attnumlist);
+
+		cstate->domain_with_constraint = (bool *) palloc0(attr_count * sizeof(bool));
+		foreach_int(attno, cstate->attnumlist)
+		{
+			int			i = foreach_current_index(attno);
+			Form_pg_attribute att = TupleDescAttr(tupDesc, attno - 1);
+			cstate->domain_with_constraint[i] = DomainHasConstraints(att->atttypid);
+		}
+	}
+
 	/* Set up soft error handler for ON_ERROR */
 	if (cstate->opts.on_error != COPY_ON_ERROR_STOP)
 	{
@@ -1622,10 +1643,11 @@ BeginCopyFrom(ParseState *pstate,
 		cstate->escontext->error_occurred = false;
 
 		/*
-		 * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
-		 * options later
+		 * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_SET_NULL.
+		 * We'll add other options later
 		 */
-		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+			cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
 			cstate->escontext->details_wanted = false;
 	}
 	else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5fc346e201..e638d32e8f5 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -947,6 +947,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 	int			fldct;
 	int			fieldno;
 	char	   *string;
+	bool		current_row_erroneous = false;
 
 	tupDesc = RelationGetDescr(cstate->rel);
 	attr_count = list_length(cstate->attnumlist);
@@ -1024,7 +1025,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 		}
 
 		/*
-		 * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+		 * If ON_ERROR is specified with IGNORE, skip rows with soft errors.
+		 * If ON_ERROR is specified with set_null, try to replace with null.
 		 */
 		else if (!InputFunctionCallSafe(&in_functions[m],
 										string,
@@ -1035,47 +1037,119 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 		{
 			Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
 
-			cstate->num_errors++;
-
-			if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+			if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
 			{
 				/*
-				 * Since we emit line number and column info in the below
-				 * notice message, we suppress error context information other
-				 * than the relation name.
-				 */
-				Assert(!cstate->relname_only);
-				cstate->relname_only = true;
+				 * we use it to count number of rows (not fields!) that
+				 * successfully applied on_error set_null.
+				*/
+				if (!current_row_erroneous)
+					current_row_erroneous = true;
 
-				if (cstate->cur_attval)
+				cstate->escontext->error_occurred = false;
+				Assert(cstate->domain_with_constraint != NULL);
+
+				/*
+				 * when column type is domain with constraints, we may
+				 * need another InputFunctionCallSafe to error out domain
+				 * constraint violation.
+				*/
+				if (!cstate->domain_with_constraint[m] ||
+					InputFunctionCallSafe(&in_functions[m],
+										  NULL,
+										  typioparams[m],
+										  att->atttypmod,
+										  (Node *) cstate->escontext,
+										  &values[m]))
 				{
-					char	   *attval;
-
-					attval = CopyLimitPrintoutLength(cstate->cur_attval);
-					ereport(NOTICE,
-							errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
-								   cstate->cur_lineno,
-								   cstate->cur_attname,
-								   attval));
-					pfree(attval);
+					nulls[m] = true;
+					values[m] = (Datum) 0;
+
+					if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+					{
+						char	   *attval;
+
+						/*
+						* Since we emit line number and column info in the below
+						* notice message, we suppress error context information other
+						* than the relation name.
+						*/
+						Assert(!cstate->relname_only);
+						Assert(cstate->cur_attval);
+
+						cstate->relname_only = true;
+						attval = CopyLimitPrintoutLength(cstate->cur_attval);
+						ereport(NOTICE,
+								errmsg("setting to null due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+										cstate->cur_lineno,
+										cstate->cur_attname,
+										attval));
+						pfree(attval);
+
+						/* reset relname_only */
+						cstate->relname_only = false;
+					}
+
+					cstate->cur_attname = NULL;
+					continue;
 				}
+				else if (string == NULL)
+					ereport(ERROR,
+							errcode(ERRCODE_NOT_NULL_VIOLATION),
+							errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+							errdatatype(typioparams[m]));
 				else
-					ereport(NOTICE,
-							errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
-								   cstate->cur_lineno,
-								   cstate->cur_attname));
-
-				/* reset relname_only */
-				cstate->relname_only = false;
+					ereport(ERROR,
+							errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+							errmsg("invalid input value for domain %s: \"%s\"",
+								   format_type_be(typioparams[m]), string));
 			}
+			else if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+			{
+				cstate->num_errors++;
+
+				if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+				{
+					/*
+					* Since we emit line number and column info in the below
+					* notice message, we suppress error context information other
+					* than the relation name.
+					*/
+					Assert(!cstate->relname_only);
+					cstate->relname_only = true;
+
+					if (cstate->cur_attval)
+					{
+						char	   *attval;
 
-			return true;
+						attval = CopyLimitPrintoutLength(cstate->cur_attval);
+						ereport(NOTICE,
+								errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+									cstate->cur_lineno,
+									cstate->cur_attname,
+									attval));
+						pfree(attval);
+					}
+					else
+						ereport(NOTICE,
+								errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
+									cstate->cur_lineno,
+									cstate->cur_attname));
+
+					/* reset relname_only */
+					cstate->relname_only = false;
+				}
+				return true;
+			}
 		}
 
 		cstate->cur_attname = NULL;
 		cstate->cur_attval = NULL;
 	}
 
+	if (current_row_erroneous)
+		cstate->num_errors++;
+
 	Assert(fieldno == attr_count);
 
 	return true;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index c916b9299a8..8e6f4930919 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3291,7 +3291,7 @@ match_previous_words(int pattern_id,
 
 	/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
-		COMPLETE_WITH("stop", "ignore");
+		COMPLETE_WITH("stop", "ignore", "set_null");
 
 	/* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "LOG_VERBOSITY"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..935d21ee77a 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
 {
 	COPY_ON_ERROR_STOP = 0,		/* immediately throw errors, default */
 	COPY_ON_ERROR_IGNORE,		/* ignore errors */
+	COPY_ON_ERROR_SET_NULL,		/* set error field to null */
 } CopyOnErrorChoice;
 
 /*
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..b427e71b9b3 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -108,6 +108,12 @@ typedef struct CopyFromStateData
 								 * att */
 	bool	   *defaults;		/* if DEFAULT marker was found for
 								 * corresponding att */
+	/*
+	 * Set to true if the corresponding att data type is domain with constraint.
+	 * normally this field is NULL, except when on_error is specified as SET_NULL.
+	*/
+	bool	   *domain_with_constraint;
+
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;	/* single element list of RangeTblEntry */
 	List	   *rteperminfos;	/* single element list of RTEPermissionInfo */
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..3f843d1cd5c 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
                                             ^
+COPY x from stdin (on_error set_null, on_error ignore);
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_null, on_error ignore);
+                                              ^
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
 ERROR:  cannot specify NULL in BINARY mode
 COPY x from stdin (format BINARY, on_error ignore);
 ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_null);
+ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (on_error set_null, reject_limit 2);
+ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
 COPY x from stdin (on_error unsupported);
 ERROR:  COPY ON_ERROR "unsupported" not recognized
 LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
 ERROR:  COPY ON_ERROR cannot be used with COPY TO
 LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
                                          ^
+COPY x to stdout (on_error set_null);
+ERROR:  COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdout (on_error set_null);
+                          ^
 COPY x from stdin (log_verbosity unsupported);
 ERROR:  COPY LOG_VERBOSITY "unsupported" not recognized
 LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,51 @@ CONTEXT:  COPY check_ign_err
 NOTICE:  skipping row due to data type incompatibility at line 8 for column "k": "a"
 CONTEXT:  COPY check_ign_err
 NOTICE:  6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR:  invalid input value for domain d_int_not_null: "ss"
+CONTEXT:  COPY t_on_error_null, line 1, column a: "ss"
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR:  invalid input value for domain d_int_not_null: "-1"
+CONTEXT:  COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR:  missing data for column "c"
+CONTEXT:  COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR:  extra data after last expected column
+CONTEXT:  COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose);
+NOTICE:  setting to null due to data type incompatibility at line 1 for column "b": "x1"
+CONTEXT:  COPY t_on_error_null
+NOTICE:  setting to null due to data type incompatibility at line 1 for column "c": "yx"
+CONTEXT:  COPY t_on_error_null
+NOTICE:  setting to null due to data type incompatibility at line 2 for column "b": "zx"
+CONTEXT:  COPY t_on_error_null
+NOTICE:  setting to null due to data type incompatibility at line 3 for column "c": "ea"
+CONTEXT:  COPY t_on_error_null
+NOTICE:  invalid values in 3 rows were replaced with null due to data type incompatibility
+-- check inserted content
+select * from t_on_error_null;
+ a  |  b   |  c   
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL |   12
+ 13 |   14 | NULL
+(3 rows)
+
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..d77a06668e8 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
 COPY x from stdin (convert_selectively (a), convert_selectively (b));
 COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
 COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_null, on_error ignore);
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 
 -- incorrect options
 COPY x from stdin (format BINARY, delimiter ',');
 COPY x from stdin (format BINARY, null 'x');
 COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_null);
+COPY x from stdin (on_error set_null, reject_limit 2);
 COPY x from stdin (on_error unsupported);
 COPY x from stdin (format TEXT, force_quote(a));
 COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
 COPY x to stdout (format CSV, force_null(a));
 COPY x to stdout (format CSV, force_null *);
 COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdout (on_error set_null);
 COPY x from stdin (log_verbosity unsupported);
 COPY x from stdin with (reject_limit 1);
 COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a	{2}	2
 8	{8}	8
 \.
 
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+\N	11	13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ss	11	14
+\.
+
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+-1	11	13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose);
+10	x1	yx
+11	zx	12
+13	14	ea
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
-- 
2.34.1




* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2025-07-01 14:54  torikoshia <[email protected]>
  parent: jian he <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: torikoshia @ 2025-07-01 14:54 UTC (permalink / raw)
  To: jian he <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; vignesh C <[email protected]>; Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; PostgreSQL Hackers <[email protected]>

Hi,

Thanks for updating the patch. I've read
v17-0001-COPY-on_error-set_null.patch; here are some comments.

> +COPY x from stdin (on_error set_null, reject_limit 2);
> +ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE

I understand that REJECT_LIMIT is out of scope for this patch, but 
personally, I feel that supporting REJECT_LIMIT with ON_ERROR SET_NULL 
would be a natural extension.
- Both IGNORE and SET_NULL share the common behavior of allowing COPY to 
continue despite soft errors.
- Since REJECT_LIMIT defines the threshold for how many soft errors can 
be tolerated before COPY fails, it seems consistent to allow it with 
SET_NULL as well.
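
To make the suggestion concrete, here is a minimal sketch of a limit check that does not depend on which ON_ERROR mode is active. It is not taken from the patch: reject_limit_exceeded is a hypothetical helper, and treating a limit of 0 as "no limit" is an assumption made for illustration. It only assumes that num_errors counts affected rows in both the IGNORE and SET_NULL modes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: returns true when the number
 * of rows with soft errors seen so far exceeds the configured limit.
 * Assumption: a limit of 0 means "no limit was requested".
 */
static bool
reject_limit_exceeded(uint64_t num_errors, int64_t reject_limit)
{
	assert(reject_limit >= 0);
	return reject_limit > 0 && num_errors > (uint64_t) reject_limit;
}
```

Because the check looks only at the counter, the same test could in principle run after a skipped row (IGNORE) or after a row whose fields were nulled (SET_NULL).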


+       if (current_row_erroneous)
+               cstate->num_errors++;

Is there any reason this error counting isn't placed inside the "if 
(cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)" block?
As far as I can tell, current_row_erroneous is only modified within that 
block, so it might make sense to keep this logic together for clarity.


These may be very minor, but I noticed a few inconsistencies in casing 
and wording:

+                * If ON_ERROR is specified with IGNORE, skip rows with 
soft errors.
+                * If ON_ERROR is specified with set_null, try to 
replace with null.

IGNORE is in uppercase, but set_null is lowercase.

+                                * we use it to count number of rows 
(not fields!) that
+                                * successfully applied on_error 
set_null.

The sentence should begin with a capital: "We use it..."
Also, I felt it's unclear what "we use it" means. Is it necessary?


+COPY x to stdout (on_error set_null);
+ERROR:  COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdout (on_error set_null);

COPY is uppercase, but to is lowercase.


+COPY x from stdin (format BINARY, on_error set_null);
+ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (on_error set_null, reject_limit 2);
+ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
...
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: null input

It might be better to consider standardizing casing across all COPY 
statements (e.g., COPY ... TO, COPY ... FROM STDIN) for consistency.


-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.






* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2025-07-02 09:25  jian he <[email protected]>
  parent: torikoshia <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: jian he @ 2025-07-02 09:25 UTC (permalink / raw)
  To: torikoshia <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; vignesh C <[email protected]>; Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; PostgreSQL Hackers <[email protected]>

On Tue, Jul 1, 2025 at 10:54 PM torikoshia <[email protected]> wrote:
>
> Hi,
>
> Thanks for updating the patch. I've read
> v17-0001-COPY-on_error-set_null.patch; here are some comments.
>
> +       if (current_row_erroneous)
> +               cstate->num_errors++;
>
> Is there any reason this error counting isn't placed inside the "if
> (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)" block?
> As far as I can tell, current_row_erroneous is only modified within that
> block, so it might make sense to keep this logic together for clarity.
>
> These may be very minor, but I noticed a few inconsistencies in casing
> and wording:
>
> +                * If ON_ERROR is specified with IGNORE, skip rows with
> soft errors.
> +                * If ON_ERROR is specified with set_null, try to
> replace with null.
>
> IGNORE is in uppercase, but set_null is lowercase.
>
> +                                * we use it to count number of rows
> (not fields!) that
> +                                * successfully applied on_error
> set_null.
>
> The sentence should begin with a capital: "We use it..."
> Also, I felt it's unclear what "we use it" means. Is it necessary?
>

Hi.
I changed this comment and also heavily refactored CopyFromTextLikeOneRow,
based on v17-0001-COPY-on_error-set_null.patch.
Now it looks much more intuitive, IMHO.

CopyFromTextLikeOneRow now has this structure:
else if (!InputFunctionCallSafe(&in_functions[m],
                                string,
                                typioparams[m],
                                att->atttypmod,
                                (Node *) cstate->escontext,
                                &values[m]))
{
    if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
        /* code for on_error ignore */
    else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
        /* code for on_error set_null */

    if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
        /* verbose message for on_error ignore or on_error set_null */

    if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
        /* loop control for on_error ignore */
    else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
        /* loop control for on_error set_null */
}
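
For readers who want to see the control flow in isolation, the following is a self-contained model of it, not the patch code itself. All names here (OnErrorChoice, RowResult, parse_int_safe, convert_row) are hypothetical stand-ins, every column is assumed to be an integer, and strtol plays the role of InputFunctionCallSafe. It shows the two points discussed above: IGNORE gives up on the row at the first bad field, while SET_NULL nulls the field, moves on to the next one, and counts the row only once.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the CopyOnErrorChoice values. */
typedef enum { ON_ERROR_STOP, ON_ERROR_IGNORE, ON_ERROR_SET_NULL } OnErrorChoice;

typedef enum { ROW_OK, ROW_SKIPPED, ROW_FAILED } RowResult;

/* Soft-error integer parse: reports failure by returning false. */
static bool
parse_int_safe(const char *s, long *out)
{
	char	   *end;

	assert(s != NULL);
	errno = 0;
	*out = strtol(s, &end, 10);
	return errno == 0 && end != s && *end == '\0';
}

/* Convert one row of text fields; num_errors counts rows, not fields. */
static RowResult
convert_row(const char *const *fields, size_t nfields, OnErrorChoice on_error,
			long *values, bool *nulls, unsigned long *num_errors)
{
	bool		current_row_erroneous = false;

	for (size_t i = 0; i < nfields; i++)
	{
		nulls[i] = false;
		if (parse_int_safe(fields[i], &values[i]))
			continue;

		if (on_error == ON_ERROR_STOP)
			return ROW_FAILED;		/* hard error in the real code */

		if (on_error == ON_ERROR_IGNORE)
		{
			(*num_errors)++;
			return ROW_SKIPPED;		/* discard the whole row */
		}

		/* ON_ERROR_SET_NULL: null this field, keep processing the row */
		values[i] = 0;
		nulls[i] = true;
		if (!current_row_erroneous)
		{
			current_row_erroneous = true;
			(*num_errors)++;
		}
	}
	return ROW_OK;
}
```

Feeding the three data rows of the "--ok" regression test through convert_row with ON_ERROR_SET_NULL increments the counter three times, once per row, even though the first row has two bad fields.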


> +COPY x to stdout (on_error set_null);
> +ERROR:  COPY ON_ERROR cannot be used with COPY TO
> +LINE 1: COPY x to stdout (on_error set_null);
>
> COPY is uppercase, but to is lowercase.
>
> +COPY x from stdin (format BINARY, on_error set_null);
> +ERROR:  only ON_ERROR STOP is allowed in BINARY mode
> +COPY x from stdin (on_error set_null, reject_limit 2);
> +ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
> ...
> +COPY t_on_error_null FROM STDIN WITH (on_error set_null);
> +ERROR:  domain d_int_not_null does not allow null values
> +CONTEXT:  COPY t_on_error_null, line 1, column a: null input
>
> It might be better to consider standardizing casing across all COPY
> statements (e.g., COPY ... TO, COPY ... FROM STDIN) for consistency.
>
I followed the conventions of the nearby code; changing the casing here does not seem necessary.


Attachments:

  [text/x-patch] v18-0001-COPY-on_error-set_null.patch (22.7K, 2-v18-0001-COPY-on_error-set_null.patch)
From feded9f7562f608ec97cbb08399661c6494df021 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Wed, 2 Jul 2025 16:32:57 +0800
Subject: [PATCH v18 1/1] COPY (on_error set_null)

The current grammar does not let us use "on_error null": if we did, null would
have to change from a reserved to a non-reserved word in every COPY option
value.  So we chose "on_error set_null".

During COPY FROM, if ON_ERROR SET_NULL is specified, any data type conversion
error results in the affected column being set to NULL. However, not-null
constraints are still enforced: attempting to set a NULL value in such a
column raises a constraint violation error.  This also applies when the
column's data type is a domain with a NOT NULL constraint.

Author: Jian He <[email protected]>
Author: Kirill Reshke <[email protected]>

Reviewed-by:
Fujii Masao <[email protected]>
Jim Jones <[email protected]>
"David G. Johnston" <[email protected]>
Yugo NAGATA <[email protected]>
torikoshia <[email protected]>
Masahiko Sawada <[email protected]>
Atsushi Torikoshi <[email protected]>

discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/4810
---
 doc/src/sgml/ref/copy.sgml               | 35 +++++++---
 src/backend/commands/copy.c              |  6 +-
 src/backend/commands/copyfrom.c          | 42 +++++++++---
 src/backend/commands/copyfromparse.c     | 84 ++++++++++++++++++++----
 src/bin/psql/tab-complete.in.c           |  2 +-
 src/include/commands/copy.h              |  1 +
 src/include/commands/copyfrom_internal.h |  7 ++
 src/test/regress/expected/copy2.out      | 60 +++++++++++++++++
 src/test/regress/sql/copy2.sql           | 46 +++++++++++++
 9 files changed, 247 insertions(+), 36 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8433344e5b6..26fb4be1709 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,23 +394,37 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies how to behave when encountering an error converting a column's
       input value into its data type.
       An <replaceable class="parameter">error_action</replaceable> value of
-      <literal>stop</literal> means fail the command, while
-      <literal>ignore</literal> means discard the input row and continue with the next one.
+      <literal>stop</literal> means fail the command,
+      <literal>ignore</literal> means discard the input row and continue with the next one,
+      and <literal>set_null</literal> means replace column containing invalid
+      input value with <literal>NULL</literal> and move to the next field.
       The default is <literal>stop</literal>.
      </para>
      <para>
-      The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+      The <literal>ignore</literal> and <literal>set_null</literal>
+      options are applicable only for <command>COPY FROM</command>
       when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
      </para>
+    <para>
+      For <literal>ignore</literal> option, a <literal>NOTICE</literal> message
+      containing the ignored row count is emitted at the end of the <command>COPY
+      FROM</command> if at least one row was discarded.
+      For <literal>set_null</literal> option,
+      a <literal>NOTICE</literal> message indicating the number of rows
+      where invalid input values were replaced with null is emitted
+      at the end of the <command>COPY FROM</command> if at least one row was replaced.
+     </para>
      <para>
-      A <literal>NOTICE</literal> message containing the ignored row count is
-      emitted at the end of the <command>COPY FROM</command> if at least one
-      row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
-      <literal>verbose</literal>, a <literal>NOTICE</literal> message
+      When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+      for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
       containing the line of the input file and the column name whose input
-      conversion has failed is emitted for each discarded row.
+      conversion has failed is emitted for each discarded row;
+      for <literal>set_null</literal> option, a <literal>NOTICE</literal>
+      message containing the line of the input file and the column name where
+      value was replaced with <literal>NULL</literal> for each input conversion
+      failure.
       When it is set to <literal>silent</literal>, no message is emitted
-      regarding ignored rows.
+      regarding input conversion failed rows.
      </para>
     </listitem>
    </varlistentry>
@@ -458,7 +472,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      </para>
      <para>
       This is currently used in <command>COPY FROM</command> command when
-      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+      or <literal>set_null</literal>.
       </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 74ae42b19a7..f963d0e51ff 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 				 parser_errposition(pstate, def->location)));
 
 	/*
-	 * Allow "stop", or "ignore" values.
+	 * Allow "stop", "ignore", "set_null" values.
 	 */
 	if (pg_strcasecmp(sval, "stop") == 0)
 		return COPY_ON_ERROR_STOP;
 	if (pg_strcasecmp(sval, "ignore") == 0)
 		return COPY_ON_ERROR_IGNORE;
+	if (pg_strcasecmp(sval, "set_null") == 0)
+		return COPY_ON_ERROR_SET_NULL;
 
 	ereport(ERROR,
 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
 
-	if (opts_out->reject_limit && !opts_out->on_error)
+	if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fbbbc09a97b..750d597d4d0 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate)
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
-	if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
-		cstate->num_errors > 0 &&
+	if (cstate->num_errors > 0 &&
 		cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
-		ereport(NOTICE,
-				errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
-							  "%" PRIu64 " rows were skipped due to data type incompatibility",
-							  cstate->num_errors,
-							  cstate->num_errors));
+	{
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+			ereport(NOTICE,
+					errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
+								  "%" PRIu64 " rows were skipped due to data type incompatibility",
+								  cstate->num_errors,
+								  cstate->num_errors));
+		else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+			ereport(NOTICE,
+					errmsg_plural("invalid values in %" PRIu64 " row was replaced with null due to data type incompatibility",
+								  "invalid values in %" PRIu64 " rows were replaced with null due to data type incompatibility",
+								  cstate->num_errors,
+								  cstate->num_errors));
+	}
 
 	if (bistate != NULL)
 		FreeBulkInsertState(bistate);
@@ -1614,6 +1622,19 @@ BeginCopyFrom(ParseState *pstate,
 		}
 	}
 
+	if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+	{
+		int	attr_count = list_length(cstate->attnumlist);
+
+		cstate->domain_with_constraint = (bool *) palloc0(attr_count * sizeof(bool));
+		foreach_int(attno, cstate->attnumlist)
+		{
+			int			i = foreach_current_index(attno);
+			Form_pg_attribute att = TupleDescAttr(tupDesc, attno - 1);
+			cstate->domain_with_constraint[i] = DomainHasConstraints(att->atttypid);
+		}
+	}
+
 	/* Set up soft error handler for ON_ERROR */
 	if (cstate->opts.on_error != COPY_ON_ERROR_STOP)
 	{
@@ -1622,10 +1643,11 @@ BeginCopyFrom(ParseState *pstate,
 		cstate->escontext->error_occurred = false;
 
 		/*
-		 * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
-		 * options later
+		 * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_SET_NULL.
+		 * We'll add other options later
 		 */
-		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+			cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
 			cstate->escontext->details_wanted = false;
 	}
 	else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f52f2477df1..2147d8423fd 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -947,6 +947,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 	int			fldct;
 	int			fieldno;
 	char	   *string;
+	bool		current_row_erroneous = false;
 
 	tupDesc = RelationGetDescr(cstate->rel);
 	attr_count = list_length(cstate->attnumlist);
@@ -1024,7 +1025,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 		}
 
 		/*
-		 * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+		 * If ON_ERROR is specified with IGNORE, skip rows with soft errors.
+		 * If ON_ERROR is specified with SET_NULL, try to replace with null.
 		 */
 		else if (!InputFunctionCallSafe(&in_functions[m],
 										string,
@@ -1035,7 +1037,50 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 		{
 			Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
 
-			cstate->num_errors++;
+			if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+				cstate->num_errors++;
+			else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+			{
+				cstate->escontext->error_occurred = false;
+				Assert(cstate->domain_with_constraint != NULL);
+
+				/*
+				 * When the column's type is a domain with constraints, an
+				 * additional InputFunctionCallSafe may be needed to raise
+				 * errors for domain constraint violations.
+				*/
+				if (!cstate->domain_with_constraint[m] ||
+					InputFunctionCallSafe(&in_functions[m],
+										  NULL,
+										  typioparams[m],
+										  att->atttypmod,
+										  (Node *) cstate->escontext,
+										  &values[m]))
+				{
+					nulls[m] = true;
+					values[m] = (Datum) 0;
+				}
+				else if (string == NULL)
+					ereport(ERROR,
+							errcode(ERRCODE_NOT_NULL_VIOLATION),
+							errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+							errdatatype(typioparams[m]));
+				else
+					ereport(ERROR,
+							errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+							errmsg("invalid input value for domain %s: \"%s\"",
+								   format_type_be(typioparams[m]), string));
+
+				/*
+				 * We count only the number of rows (not individual fields)
+				 * where ON_ERROR SET_NULL was successfully applied.
+				*/
+				if (!current_row_erroneous)
+				{
+					current_row_erroneous = true;
+					cstate->num_errors++;
+				}
+			}
 
 			if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
 			{
@@ -1052,24 +1097,37 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 					char	   *attval;
 
 					attval = CopyLimitPrintoutLength(cstate->cur_attval);
-					ereport(NOTICE,
-							errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
-								   cstate->cur_lineno,
-								   cstate->cur_attname,
-								   attval));
+
+					if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+						ereport(NOTICE,
+								errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+									   cstate->cur_lineno,
+									   cstate->cur_attname,
+									   attval));
+					else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+						ereport(NOTICE,
+									errmsg("setting to null due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+											cstate->cur_lineno,
+											cstate->cur_attname,
+											attval));
 					pfree(attval);
 				}
 				else
-					ereport(NOTICE,
-							errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
-								   cstate->cur_lineno,
-								   cstate->cur_attname));
-
+				{
+					if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+						ereport(NOTICE,
+								errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
+									   cstate->cur_lineno,
+									   cstate->cur_attname));
+				}
 				/* reset relname_only */
 				cstate->relname_only = false;
 			}
 
-			return true;
+			if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+				return true;
+			else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+				continue;
 		}
 
 		cstate->cur_attname = NULL;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 8c2ea0b9587..a587f4162ee 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3305,7 +3305,7 @@ match_previous_words(int pattern_id,
 
 	/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
-		COMPLETE_WITH("stop", "ignore");
+		COMPLETE_WITH("stop", "ignore", "set_null");
 
 	/* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "LOG_VERBOSITY"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..935d21ee77a 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
 {
 	COPY_ON_ERROR_STOP = 0,		/* immediately throw errors, default */
 	COPY_ON_ERROR_IGNORE,		/* ignore errors */
+	COPY_ON_ERROR_SET_NULL,		/* set error field to null */
 } CopyOnErrorChoice;
 
 /*
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..c82bfab4636 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -108,6 +108,13 @@ typedef struct CopyFromStateData
 								 * att */
 	bool	   *defaults;		/* if DEFAULT marker was found for
 								 * corresponding att */
+	/*
+	 * Set to true if the corresponding attribute's data type is a domain with
+	 * constraints.  This field is usually NULL, except when ON_ERROR is set to
+	 * SET_NULL.
+	*/
+	bool	   *domain_with_constraint;
+
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;	/* single element list of RangeTblEntry */
 	List	   *rteperminfos;	/* single element list of RTEPermissionInfo */
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae8..3f843d1cd5c 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
                                             ^
+COPY x from stdin (on_error set_null, on_error ignore);
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_null, on_error ignore);
+                                              ^
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
 ERROR:  cannot specify NULL in BINARY mode
 COPY x from stdin (format BINARY, on_error ignore);
 ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_null);
+ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (on_error set_null, reject_limit 2);
+ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
 COPY x from stdin (on_error unsupported);
 ERROR:  COPY ON_ERROR "unsupported" not recognized
 LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
 ERROR:  COPY ON_ERROR cannot be used with COPY TO
 LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
                                          ^
+COPY x to stdout (on_error set_null);
+ERROR:  COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdout (on_error set_null);
+                          ^
 COPY x from stdin (log_verbosity unsupported);
 ERROR:  COPY LOG_VERBOSITY "unsupported" not recognized
 LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,51 @@ CONTEXT:  COPY check_ign_err
 NOTICE:  skipping row due to data type incompatibility at line 8 for column "k": "a"
 CONTEXT:  COPY check_ign_err
 NOTICE:  6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR:  invalid input value for domain d_int_not_null: "ss"
+CONTEXT:  COPY t_on_error_null, line 1, column a: "ss"
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR:  invalid input value for domain d_int_not_null: "-1"
+CONTEXT:  COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR:  missing data for column "c"
+CONTEXT:  COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR:  extra data after last expected column
+CONTEXT:  COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose);
+NOTICE:  setting to null due to data type incompatibility at line 1 for column "b": "x1"
+CONTEXT:  COPY t_on_error_null
+NOTICE:  setting to null due to data type incompatibility at line 1 for column "c": "yx"
+CONTEXT:  COPY t_on_error_null
+NOTICE:  setting to null due to data type incompatibility at line 2 for column "b": "zx"
+CONTEXT:  COPY t_on_error_null
+NOTICE:  setting to null due to data type incompatibility at line 3 for column "c": "ea"
+CONTEXT:  COPY t_on_error_null
+NOTICE:  invalid values in 3 rows were replaced with null due to data type incompatibility
+-- check inserted content
+select * from t_on_error_null;
+ a  |  b   |  c   
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL |   12
+ 13 |   14 | NULL
+(3 rows)
+
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce0..d77a06668e8 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
 COPY x from stdin (convert_selectively (a), convert_selectively (b));
 COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
 COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_null, on_error ignore);
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 
 -- incorrect options
 COPY x from stdin (format BINARY, delimiter ',');
 COPY x from stdin (format BINARY, null 'x');
 COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_null);
+COPY x from stdin (on_error set_null, reject_limit 2);
 COPY x from stdin (on_error unsupported);
 COPY x from stdin (format TEXT, force_quote(a));
 COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
 COPY x to stdout (format CSV, force_null(a));
 COPY x to stdout (format CSV, force_null *);
 COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdout (on_error set_null);
 COPY x from stdin (log_verbosity unsupported);
 COPY x from stdin with (reject_limit 1);
 COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a	{2}	2
 8	{8}	8
 \.
 
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+\N	11	13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ss	11	14
+\.
+
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+-1	11	13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose);
+10	x1	yx
+11	zx	12
+13	14	ea
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
-- 
2.34.1




* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2025-07-30 04:44  jian he <[email protected]>
  parent: jian he <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: jian he @ 2025-07-30 04:44 UTC (permalink / raw)
  To: torikoshia <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; vignesh C <[email protected]>; Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; PostgreSQL Hackers <[email protected]>

Hi.

Rebased; no other changes.


Attachments:

  [text/x-patch] v19-0001-COPY-on_error-set_null.patch (22.7K, 2-v19-0001-COPY-on_error-set_null.patch)
From b3b2d794c83b36cf129d917d527ebf2cac46ca3b Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Wed, 30 Jul 2025 11:06:17 +0800
Subject: [PATCH v19 1/1] COPY (on_error set_null)

The current grammar does not let us use "on_error null": if we did, null would
have to change from a reserved to a non-reserved word in every COPY option
value.  So we chose "on_error set_null".

During COPY FROM, if ON_ERROR SET_NULL is specified, any data type conversion
error results in the affected column being set to NULL. However, not-null
constraints are still enforced: attempting to set a NULL value in such a
column raises a constraint violation error.  This also applies when the
column's data type is a domain with a NOT NULL constraint.

Author: Jian He <[email protected]>
Author: Kirill Reshke <[email protected]>

Reviewed-by:
Fujii Masao <[email protected]>
Jim Jones <[email protected]>
"David G. Johnston" <[email protected]>
Yugo NAGATA <[email protected]>
torikoshia <[email protected]>
Masahiko Sawada <[email protected]>
Atsushi Torikoshi <[email protected]>

discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/4810
---
 doc/src/sgml/ref/copy.sgml               | 35 +++++++---
 src/backend/commands/copy.c              |  6 +-
 src/backend/commands/copyfrom.c          | 42 +++++++++---
 src/backend/commands/copyfromparse.c     | 84 ++++++++++++++++++++----
 src/bin/psql/tab-complete.in.c           |  2 +-
 src/include/commands/copy.h              |  1 +
 src/include/commands/copyfrom_internal.h |  7 ++
 src/test/regress/expected/copy2.out      | 60 +++++++++++++++++
 src/test/regress/sql/copy2.sql           | 46 +++++++++++++
 9 files changed, 247 insertions(+), 36 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c2d1fbc1fbe..a36e33f320f 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -412,23 +412,37 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies how to behave when encountering an error converting a column's
       input value into its data type.
       An <replaceable class="parameter">error_action</replaceable> value of
-      <literal>stop</literal> means fail the command, while
-      <literal>ignore</literal> means discard the input row and continue with the next one.
+      <literal>stop</literal> means fail the command,
+      <literal>ignore</literal> means discard the input row and continue with the next one,
+      and <literal>set_null</literal> means replace the column containing the
+      invalid input value with <literal>NULL</literal> and move to the next field.
       The default is <literal>stop</literal>.
      </para>
      <para>
-      The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+      The <literal>ignore</literal> and <literal>set_null</literal>
+      options are applicable only for <command>COPY FROM</command>
       when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
      </para>
+    <para>
+      For <literal>ignore</literal> option, a <literal>NOTICE</literal> message
+      containing the ignored row count is emitted at the end of the <command>COPY
+      FROM</command> if at least one row was discarded.
+      For <literal>set_null</literal> option,
+      a <literal>NOTICE</literal> message indicating the number of rows
+      where invalid input values were replaced with null is emitted
+      at the end of the <command>COPY FROM</command> if at least one row was replaced.
+     </para>
      <para>
-      A <literal>NOTICE</literal> message containing the ignored row count is
-      emitted at the end of the <command>COPY FROM</command> if at least one
-      row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
-      <literal>verbose</literal>, a <literal>NOTICE</literal> message
+      When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+      for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
       containing the line of the input file and the column name whose input
-      conversion has failed is emitted for each discarded row.
+      conversion has failed is emitted for each discarded row;
+      for <literal>set_null</literal> option, a <literal>NOTICE</literal>
+      message containing the line of the input file and the name of the column
+      whose value was replaced with <literal>NULL</literal> is emitted for each
+      input conversion failure.
       When it is set to <literal>silent</literal>, no message is emitted
-      regarding ignored rows.
+      regarding rows with input conversion failures.
      </para>
     </listitem>
    </varlistentry>
@@ -476,7 +490,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      </para>
      <para>
       This is currently used in <command>COPY FROM</command> command when
-      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+      or <literal>set_null</literal>.
       </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fae9c41db65..9213bfb167f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -413,12 +413,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 				 parser_errposition(pstate, def->location)));
 
 	/*
-	 * Allow "stop", or "ignore" values.
+	 * Allow "stop", "ignore", "set_null" values.
 	 */
 	if (pg_strcasecmp(sval, "stop") == 0)
 		return COPY_ON_ERROR_STOP;
 	if (pg_strcasecmp(sval, "ignore") == 0)
 		return COPY_ON_ERROR_IGNORE;
+	if (pg_strcasecmp(sval, "set_null") == 0)
+		return COPY_ON_ERROR_SET_NULL;
 
 	ereport(ERROR,
 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -928,7 +930,7 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
 
-	if (opts_out->reject_limit && !opts_out->on_error)
+	if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fbbbc09a97b..750d597d4d0 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate)
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
-	if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
-		cstate->num_errors > 0 &&
+	if (cstate->num_errors > 0 &&
 		cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
-		ereport(NOTICE,
-				errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
-							  "%" PRIu64 " rows were skipped due to data type incompatibility",
-							  cstate->num_errors,
-							  cstate->num_errors));
+	{
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+			ereport(NOTICE,
+					errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
+								  "%" PRIu64 " rows were skipped due to data type incompatibility",
+								  cstate->num_errors,
+								  cstate->num_errors));
+		else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+			ereport(NOTICE,
+					errmsg_plural("invalid values in %" PRIu64 " row was replaced with null due to data type incompatibility",
+								  "invalid values in %" PRIu64 " rows were replaced with null due to data type incompatibility",
+								  cstate->num_errors,
+								  cstate->num_errors));
+	}
 
 	if (bistate != NULL)
 		FreeBulkInsertState(bistate);
@@ -1614,6 +1622,19 @@ BeginCopyFrom(ParseState *pstate,
 		}
 	}
 
+	if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+	{
+		int	attr_count = list_length(cstate->attnumlist);
+
+		cstate->domain_with_constraint = (bool *) palloc0(attr_count * sizeof(bool));
+		foreach_int(attno, cstate->attnumlist)
+		{
+			int			i = foreach_current_index(attno);
+			Form_pg_attribute att = TupleDescAttr(tupDesc, attno - 1);
+			cstate->domain_with_constraint[i] = DomainHasConstraints(att->atttypid);
+		}
+	}
+
 	/* Set up soft error handler for ON_ERROR */
 	if (cstate->opts.on_error != COPY_ON_ERROR_STOP)
 	{
@@ -1622,10 +1643,11 @@ BeginCopyFrom(ParseState *pstate,
 		cstate->escontext->error_occurred = false;
 
 		/*
-		 * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
-		 * options later
+		 * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_SET_NULL.
+		 * We'll add other options later
 		 */
-		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+			cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
 			cstate->escontext->details_wanted = false;
 	}
 	else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index b1ae97b833d..ee887a37afd 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -956,6 +956,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 	int			fldct;
 	int			fieldno;
 	char	   *string;
+	bool		current_row_erroneous = false;
 
 	tupDesc = RelationGetDescr(cstate->rel);
 	attr_count = list_length(cstate->attnumlist);
@@ -1033,7 +1034,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 		}
 
 		/*
-		 * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+		 * If ON_ERROR is specified with IGNORE, skip rows with soft errors.
+		 * If ON_ERROR is specified with SET_NULL, try to replace with null.
 		 */
 		else if (!InputFunctionCallSafe(&in_functions[m],
 										string,
@@ -1044,7 +1046,50 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 		{
 			Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
 
-			cstate->num_errors++;
+			if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+				cstate->num_errors++;
+			else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+			{
+				cstate->escontext->error_occurred = false;
+				Assert(cstate->domain_with_constraint != NULL);
+
+				/*
+				 * If the column type is a domain with constraints, an
+				 * additional InputFunctionCallSafe may be needed to raise
+				 * errors for domain constraint violations.
+				*/
+				if (!cstate->domain_with_constraint[m] ||
+					InputFunctionCallSafe(&in_functions[m],
+										  NULL,
+										  typioparams[m],
+										  att->atttypmod,
+										  (Node *) cstate->escontext,
+										  &values[m]))
+				{
+					nulls[m] = true;
+					values[m] = (Datum) 0;
+				}
+				else if (string == NULL)
+					ereport(ERROR,
+							errcode(ERRCODE_NOT_NULL_VIOLATION),
+							errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+							errdatatype(typioparams[m]));
+				else
+					ereport(ERROR,
+							errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+							errmsg("invalid input value for domain %s: \"%s\"",
+								   format_type_be(typioparams[m]), string));
+
+				/*
+				 * We count only the number of rows (not individual fields)
+				 * where ON_ERROR SET_NULL was successfully applied.
+				*/
+				if (!current_row_erroneous)
+				{
+					current_row_erroneous = true;
+					cstate->num_errors++;
+				}
+			}
 
 			if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
 			{
@@ -1061,24 +1106,37 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 					char	   *attval;
 
 					attval = CopyLimitPrintoutLength(cstate->cur_attval);
-					ereport(NOTICE,
-							errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
-								   cstate->cur_lineno,
-								   cstate->cur_attname,
-								   attval));
+
+					if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+						ereport(NOTICE,
+								errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+									   cstate->cur_lineno,
+									   cstate->cur_attname,
+									   attval));
+					else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+						ereport(NOTICE,
+								errmsg("setting to null due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+									   cstate->cur_lineno,
+									   cstate->cur_attname,
+									   attval));
 					pfree(attval);
 				}
 				else
-					ereport(NOTICE,
-							errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
-								   cstate->cur_lineno,
-								   cstate->cur_attname));
-
+				{
+					if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+						ereport(NOTICE,
+								errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
+									   cstate->cur_lineno,
+									   cstate->cur_attname));
+				}
 				/* reset relname_only */
 				cstate->relname_only = false;
 			}
 
-			return true;
+			if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+				return true;
+			else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+				continue;
 		}
 
 		cstate->cur_attname = NULL;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index dbc586c5bc3..c88593e4158 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3343,7 +3343,7 @@ match_previous_words(int pattern_id,
 
 	/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", "(", "ON_ERROR"))
-		COMPLETE_WITH("stop", "ignore");
+		COMPLETE_WITH("stop", "ignore", "set_null");
 
 	/* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", "(", "LOG_VERBOSITY"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 541176e1980..da3622028e7 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -35,6 +35,7 @@ typedef enum CopyOnErrorChoice
 {
 	COPY_ON_ERROR_STOP = 0,		/* immediately throw errors, default */
 	COPY_ON_ERROR_IGNORE,		/* ignore errors */
+	COPY_ON_ERROR_SET_NULL,		/* set error field to null */
 } CopyOnErrorChoice;
 
 /*
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..c82bfab4636 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -108,6 +108,13 @@ typedef struct CopyFromStateData
 								 * att */
 	bool	   *defaults;		/* if DEFAULT marker was found for
 								 * corresponding att */
+	/*
+	 * Set to true if the corresponding attribute's data type is a domain with
+	 * constraints.  This field is usually NULL, except when ON_ERROR is set to
+	 * SET_NULL.
+	*/
+	bool	   *domain_with_constraint;
+
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;	/* single element list of RangeTblEntry */
 	List	   *rteperminfos;	/* single element list of RTEPermissionInfo */
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index caa3c44f0d0..919a2296c27 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
                                             ^
+COPY x from stdin (on_error set_null, on_error ignore);
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_null, on_error ignore);
+                                              ^
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
 ERROR:  cannot specify NULL in BINARY mode
 COPY x from stdin (format BINARY, on_error ignore);
 ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_null);
+ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (on_error set_null, reject_limit 2);
+ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
 COPY x from stdin (on_error unsupported);
 ERROR:  COPY ON_ERROR "unsupported" not recognized
 LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
 ERROR:  COPY ON_ERROR cannot be used with COPY TO
 LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
                                          ^
+COPY x to stdout (on_error set_null);
+ERROR:  COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdout (on_error set_null);
+                          ^
 COPY x from stdin (log_verbosity unsupported);
 ERROR:  COPY LOG_VERBOSITY "unsupported" not recognized
 LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -775,6 +787,51 @@ CONTEXT:  COPY check_ign_err
 NOTICE:  skipping row due to data type incompatibility at line 8 for column "k": "a"
 CONTEXT:  COPY check_ign_err
 NOTICE:  6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR:  invalid input value for domain d_int_not_null: "ss"
+CONTEXT:  COPY t_on_error_null, line 1, column a: "ss"
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ERROR:  invalid input value for domain d_int_not_null: "-1"
+CONTEXT:  COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR:  missing data for column "c"
+CONTEXT:  COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR:  extra data after last expected column
+CONTEXT:  COPY t_on_error_null, line 1: "1,2,3,4"
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose);
+NOTICE:  setting to null due to data type incompatibility at line 1 for column "b": "x1"
+CONTEXT:  COPY t_on_error_null
+NOTICE:  setting to null due to data type incompatibility at line 1 for column "c": "yx"
+CONTEXT:  COPY t_on_error_null
+NOTICE:  setting to null due to data type incompatibility at line 2 for column "b": "zx"
+CONTEXT:  COPY t_on_error_null
+NOTICE:  setting to null due to data type incompatibility at line 3 for column "c": "ea"
+CONTEXT:  COPY t_on_error_null
+NOTICE:  invalid values in 3 rows were replaced with null due to data type incompatibility
+-- check inserted content
+select * from t_on_error_null;
+ a  |  b   |  c   
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL |   12
+ 13 |   14 | NULL
+(3 rows)
+
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -834,6 +891,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index cef45868db5..be05ed52def 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
 COPY x from stdin (convert_selectively (a), convert_selectively (b));
 COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
 COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_null, on_error ignore);
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 
 -- incorrect options
 COPY x from stdin (format BINARY, delimiter ',');
 COPY x from stdin (format BINARY, null 'x');
 COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_null);
+COPY x from stdin (on_error set_null, reject_limit 2);
 COPY x from stdin (on_error unsupported);
 COPY x from stdin (format TEXT, force_quote(a));
 COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
 COPY x to stdout (format CSV, force_null(a));
 COPY x to stdout (format CSV, force_null *);
 COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdout (on_error set_null);
 COPY x from stdin (log_verbosity unsupported);
 COPY x from stdin with (reject_limit 1);
 COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -537,6 +541,45 @@ a	{2}	2
 8	{8}	8
 \.
 
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+\N	11	13
+\.
+
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+ss	11	14
+\.
+
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_null);
+-1	11	13
+\.
+
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,1
+\.
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,2,3,4
+\.
+
+--ok
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose);
+10	x1	yx
+11	zx	12
+13	14	ea
+\.
+
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -606,6 +649,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
-- 
2.34.1
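
For readers who want to see the counting rule from the copyfromparse.c hunk in
isolation, here is a minimal standalone sketch. It is not the patch code and
makes several assumptions: every column is a plain int, domain constraints are
not modelled, and OnErrorChoice, LoadState, equals_ignore_case, parse_on_error,
parse_int_field and load_row are hypothetical names invented for this
illustration (strtol stands in for InputFunctionCallSafe).

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for CopyOnErrorChoice; not the PostgreSQL header. */
typedef enum
{
	ON_ERROR_STOP = 0,
	ON_ERROR_IGNORE,
	ON_ERROR_SET_NULL,
	ON_ERROR_INVALID			/* sketch only: unrecognized option value */
} OnErrorChoice;

/* Sketch-only load state; num_errors plays the role of cstate->num_errors. */
typedef struct
{
	uint64_t	num_errors;
} LoadState;

/* Case-insensitive equality, standing in for pg_strcasecmp() == 0. */
bool
equals_ignore_case(const char *a, const char *b)
{
	while (*a != '\0' && *b != '\0')
	{
		if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
			return false;
		a++;
		b++;
	}
	return *a == *b;			/* true only if both strings ended */
}

/* Map an option string to a choice, accepting any letter case. */
OnErrorChoice
parse_on_error(const char *sval)
{
	if (equals_ignore_case(sval, "stop"))
		return ON_ERROR_STOP;
	if (equals_ignore_case(sval, "ignore"))
		return ON_ERROR_IGNORE;
	if (equals_ignore_case(sval, "set_null"))
		return ON_ERROR_SET_NULL;
	return ON_ERROR_INVALID;
}

/* Convert one text field to int; returning false models a soft error. */
bool
parse_int_field(const char *s, int *out)
{
	char	   *end;
	long		v;

	errno = 0;
	v = strtol(s, &end, 10);
	if (end == s || *end != '\0' || errno == ERANGE ||
		v < INT_MIN || v > INT_MAX)
		return false;
	*out = (int) v;
	return true;
}

/*
 * Convert one row.  Returns false if the row must be skipped (IGNORE) and
 * true if it should be inserted, possibly with some fields set to null.
 */
bool
load_row(LoadState *st, OnErrorChoice mode, const char **fields,
		 size_t nfields, int *values, bool *nulls)
{
	bool		row_erroneous = false;

	/* STOP would raise a hard error instead; it is not modelled here. */
	assert(mode == ON_ERROR_IGNORE || mode == ON_ERROR_SET_NULL);

	for (size_t i = 0; i < nfields; i++)
	{
		nulls[i] = false;
		if (parse_int_field(fields[i], &values[i]))
			continue;

		if (mode == ON_ERROR_IGNORE)
		{
			st->num_errors++;	/* one skipped row */
			return false;
		}

		/* SET_NULL: null out this field and keep scanning the row. */
		values[i] = 0;
		nulls[i] = true;
		if (!row_erroneous)
		{
			row_erroneous = true;
			st->num_errors++;	/* count the row once, not each field */
		}
	}
	return true;
}
```

Feeding it the three data rows from the "--ok" regression case ("10 x1 yx",
"11 zx 12", "13 14 ea") in SET_NULL mode nulls four fields but leaves
num_errors at 3, which is consistent with the four per-field NOTICE lines and
the final "3 rows" NOTICE in the expected copy2.out output.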




* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2025-11-10 10:22  jian he <[email protected]>
  parent: jian he <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: jian he @ 2025-11-10 10:22 UTC (permalink / raw)
  To: torikoshia <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; vignesh C <[email protected]>; Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; PostgreSQL Hackers <[email protected]>

hi.

rebase and minor cosmetic changes.

--
jian
https://www.enterprisedb.com/


Attachments:

  [text/x-patch] v20-0001-COPY-on_error-set_null.patch (22.4K, 2-v20-0001-COPY-on_error-set_null.patch)
From 05574920774c9bb2f93f9cb6ecea136aa673cba7 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Mon, 10 Nov 2025 18:17:09 +0800
Subject: [PATCH v20 1/1] COPY (on_error set_null)

With COPY FROM, if ON_ERROR SET_NULL is specified, any data type conversion
error results in the affected column being set to NULL. However, the column's
not-null constraints are still enforced: attempting to set such a column to
NULL raises a constraint violation error.  This also applies when the column's
data type is a domain with a NOT NULL constraint.

Author: Jian He <[email protected]>
Author: Kirill Reshke <[email protected]>

Reviewed-by:
Fujii Masao <[email protected]>
Jim Jones <[email protected]>
"David G. Johnston" <[email protected]>
Yugo NAGATA <[email protected]>
torikoshia <[email protected]>
Masahiko Sawada <[email protected]>
Atsushi Torikoshi <[email protected]>

discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
commitfest entry: https://commitfest.postgresql.org/patch/4810
---
 doc/src/sgml/ref/copy.sgml               | 35 +++++++---
 src/backend/commands/copy.c              |  6 +-
 src/backend/commands/copyfrom.c          | 42 +++++++++---
 src/backend/commands/copyfromparse.c     | 84 ++++++++++++++++++++----
 src/bin/psql/tab-complete.in.c           |  2 +-
 src/include/commands/copy.h              |  1 +
 src/include/commands/copyfrom_internal.h |  7 ++
 src/test/regress/expected/copy2.out      | 55 ++++++++++++++++
 src/test/regress/sql/copy2.sql           | 43 ++++++++++++
 9 files changed, 239 insertions(+), 36 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index fdc24b36bb8..1f42fd0972d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -412,23 +412,37 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies how to behave when encountering an error converting a column's
       input value into its data type.
       An <replaceable class="parameter">error_action</replaceable> value of
-      <literal>stop</literal> means fail the command, while
-      <literal>ignore</literal> means discard the input row and continue with the next one.
+      <literal>stop</literal> means fail the command,
+      <literal>ignore</literal> means discard the input row and continue with the next one,
+      and <literal>set_null</literal> means replace the column containing the
+      invalid input value with <literal>NULL</literal> and move to the next field.
       The default is <literal>stop</literal>.
      </para>
      <para>
-      The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+      The <literal>ignore</literal> and <literal>set_null</literal>
+      options are applicable only for <command>COPY FROM</command>
       when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
      </para>
+    <para>
+      For <literal>ignore</literal> option, a <literal>NOTICE</literal> message
+      containing the ignored row count is emitted at the end of the <command>COPY
+      FROM</command> if at least one row was discarded.
+      For <literal>set_null</literal> option,
+      a <literal>NOTICE</literal> message indicating the number of rows
+      where invalid input values were replaced with null is emitted
+      at the end of the <command>COPY FROM</command> if at least one row was replaced.
+     </para>
      <para>
-      A <literal>NOTICE</literal> message containing the ignored row count is
-      emitted at the end of the <command>COPY FROM</command> if at least one
-      row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
-      <literal>verbose</literal>, a <literal>NOTICE</literal> message
+      When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+      for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
       containing the line of the input file and the column name whose input
-      conversion has failed is emitted for each discarded row.
+      conversion has failed is emitted for each discarded row;
+      for <literal>set_null</literal> option, a <literal>NOTICE</literal>
+      message containing the line of the input file and the name of the column
+      whose value was replaced with <literal>NULL</literal> is emitted for each
+      input conversion failure.
       When it is set to <literal>silent</literal>, no message is emitted
-      regarding ignored rows.
+      regarding rows with input conversion failures.
      </para>
     </listitem>
    </varlistentry>
@@ -476,7 +490,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      </para>
      <para>
       This is currently used in <command>COPY FROM</command> command when
-      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+      or <literal>set_null</literal>.
       </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 28e878c3688..9d0413f4cfa 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -456,12 +456,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 				 parser_errposition(pstate, def->location)));
 
 	/*
-	 * Allow "stop", or "ignore" values.
+	 * Allow "stop", "ignore", "set_null" values.
 	 */
 	if (pg_strcasecmp(sval, "stop") == 0)
 		return COPY_ON_ERROR_STOP;
 	if (pg_strcasecmp(sval, "ignore") == 0)
 		return COPY_ON_ERROR_IGNORE;
+	if (pg_strcasecmp(sval, "set_null") == 0)
+		return COPY_ON_ERROR_SET_NULL;
 
 	ereport(ERROR,
 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -971,7 +973,7 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
 
-	if (opts_out->reject_limit && !opts_out->on_error)
+	if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 12781963b4f..87588b688f2 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate)
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
-	if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
-		cstate->num_errors > 0 &&
+	if (cstate->num_errors > 0 &&
 		cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
-		ereport(NOTICE,
-				errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
-							  "%" PRIu64 " rows were skipped due to data type incompatibility",
-							  cstate->num_errors,
-							  cstate->num_errors));
+	{
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+			ereport(NOTICE,
+					errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
+								  "%" PRIu64 " rows were skipped due to data type incompatibility",
+								  cstate->num_errors,
+								  cstate->num_errors));
+		else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+			ereport(NOTICE,
+					errmsg_plural("invalid values in %" PRIu64 " row was replaced with null due to data type incompatibility",
+								  "invalid values in %" PRIu64 " rows were replaced with null due to data type incompatibility",
+								  cstate->num_errors,
+								  cstate->num_errors));
+	}
 
 	if (bistate != NULL)
 		FreeBulkInsertState(bistate);
@@ -1614,6 +1622,19 @@ BeginCopyFrom(ParseState *pstate,
 		}
 	}
 
+	if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+	{
+		int	attr_count = list_length(cstate->attnumlist);
+
+		cstate->domain_with_constraint = (bool *) palloc0(attr_count * sizeof(bool));
+		foreach_int(attno, cstate->attnumlist)
+		{
+			int			i = foreach_current_index(attno);
+			Form_pg_attribute att = TupleDescAttr(tupDesc, attno - 1);
+			cstate->domain_with_constraint[i] = DomainHasConstraints(att->atttypid);
+		}
+	}
+
 	/* Set up soft error handler for ON_ERROR */
 	if (cstate->opts.on_error != COPY_ON_ERROR_STOP)
 	{
@@ -1622,10 +1643,11 @@ BeginCopyFrom(ParseState *pstate,
 		cstate->escontext->error_occurred = false;
 
 		/*
-		 * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
-		 * options later
+		 * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_SET_NULL.
+		 * We'll add other options later
 		 */
-		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+			cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
 			cstate->escontext->details_wanted = false;
 	}
 	else
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index b1ae97b833d..7c0d13ab38b 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -956,6 +956,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 	int			fldct;
 	int			fieldno;
 	char	   *string;
+	bool		current_row_erroneous = false;
 
 	tupDesc = RelationGetDescr(cstate->rel);
 	attr_count = list_length(cstate->attnumlist);
@@ -1033,7 +1034,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 		}
 
 		/*
-		 * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+		 * If ON_ERROR is specified with IGNORE, skip rows with soft errors.
+		 * If ON_ERROR is specified with SET_NULL, try to replace with null.
 		 */
 		else if (!InputFunctionCallSafe(&in_functions[m],
 										string,
@@ -1044,7 +1046,50 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 		{
 			Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
 
-			cstate->num_errors++;
+			if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+				cstate->num_errors++;
+			else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+			{
+				cstate->escontext->error_occurred = false;
+				Assert(cstate->domain_with_constraint != NULL);
+
+				/*
+				 * If the column type is a domain with constraints, an
+				 * additional InputFunctionCallSafe may be needed to raise
+				 * errors for domain constraint violations.
+				 */
+				if (!cstate->domain_with_constraint[m] ||
+					InputFunctionCallSafe(&in_functions[m],
+										  NULL,
+										  typioparams[m],
+										  att->atttypmod,
+										  (Node *) cstate->escontext,
+										  &values[m]))
+				{
+					nulls[m] = true;
+					values[m] = (Datum) 0;
+				}
+				else if (string == NULL)
+					ereport(ERROR,
+							errcode(ERRCODE_NOT_NULL_VIOLATION),
+							errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+							errdatatype(typioparams[m]));
+				else
+					ereport(ERROR,
+							errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+							errmsg("invalid input value for domain %s: \"%s\"",
+								   format_type_be(typioparams[m]), string));
+
+				/*
+				 * We count only the number of rows (not individual fields)
+				 * where ON_ERROR SET_NULL was successfully applied.
+				 */
+				if (!current_row_erroneous)
+				{
+					current_row_erroneous = true;
+					cstate->num_errors++;
+				}
+			}
 
 			if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
 			{
@@ -1061,24 +1106,37 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
 					char	   *attval;
 
 					attval = CopyLimitPrintoutLength(cstate->cur_attval);
-					ereport(NOTICE,
-							errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
-								   cstate->cur_lineno,
-								   cstate->cur_attname,
-								   attval));
+
+					if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+						ereport(NOTICE,
+								errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+									   cstate->cur_lineno,
+									   cstate->cur_attname,
+									   attval));
+					else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+						ereport(NOTICE,
+								errmsg("setting to null due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+									   cstate->cur_lineno,
+									   cstate->cur_attname,
+									   attval));
 					pfree(attval);
 				}
 				else
-					ereport(NOTICE,
-							errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
-								   cstate->cur_lineno,
-								   cstate->cur_attname));
-
+				{
+					if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+						ereport(NOTICE,
+								errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
+									   cstate->cur_lineno,
+									   cstate->cur_attname));
+				}
 				/* reset relname_only */
 				cstate->relname_only = false;
 			}
 
-			return true;
+			if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+				return true;
+			else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+				continue;
 		}
 
 		cstate->cur_attname = NULL;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 316a2dafbf1..7124f468ba4 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3381,7 +3381,7 @@ match_previous_words(int pattern_id,
 	/* Complete COPY <sth> FROM [PROGRAM] filename WITH (ON_ERROR */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAnyExcept("PROGRAM"), "WITH", "(", "ON_ERROR") ||
 			 Matches("COPY|\\copy", MatchAny, "FROM", "PROGRAM", MatchAny, "WITH", "(", "ON_ERROR"))
-		COMPLETE_WITH("stop", "ignore");
+		COMPLETE_WITH("stop", "ignore", "set_null");
 
 	/* Complete COPY <sth> FROM [PROGRAM] filename WITH (LOG_VERBOSITY */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAnyExcept("PROGRAM"), "WITH", "(", "LOG_VERBOSITY") ||
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 541176e1980..da3622028e7 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -35,6 +35,7 @@ typedef enum CopyOnErrorChoice
 {
 	COPY_ON_ERROR_STOP = 0,		/* immediately throw errors, default */
 	COPY_ON_ERROR_IGNORE,		/* ignore errors */
+	COPY_ON_ERROR_SET_NULL,		/* set error field to null */
 } CopyOnErrorChoice;
 
 /*
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..a606a38cc23 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -108,6 +108,13 @@ typedef struct CopyFromStateData
 								 * att */
 	bool	   *defaults;		/* if DEFAULT marker was found for
 								 * corresponding att */
+	/*
+	 * Per-attribute flags: true if the corresponding attribute's data type is
+	 * a domain with constraints.  The array is allocated only when ON_ERROR is
+	 * SET_NULL; otherwise this pointer is NULL.
+	 */
+	bool	   *domain_with_constraint;
+
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;	/* single element list of RangeTblEntry */
 	List	   *rteperminfos;	/* single element list of RTEPermissionInfo */
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index f3fdce23459..16e70ccb813 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
                                             ^
+COPY x from stdin (on_error set_null, on_error set_null);
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_null, on_error set_null);
+                                              ^
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
 ERROR:  cannot specify NULL in BINARY mode
 COPY x from stdin (format BINARY, on_error ignore);
 ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_null);
+ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (on_error set_null, reject_limit 2);
+ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
 COPY x from stdin (on_error unsupported);
 ERROR:  COPY ON_ERROR "unsupported" not recognized
 LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
 ERROR:  COPY ON_ERROR cannot be used with COPY TO
 LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
                                          ^
+COPY x to stdout (on_error set_null);
+ERROR:  COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdout (on_error set_null);
+                          ^
 COPY x from stdin (log_verbosity unsupported);
 ERROR:  COPY LOG_VERBOSITY "unsupported" not recognized
 LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -776,6 +788,46 @@ CONTEXT:  COPY check_ign_err
 NOTICE:  skipping row due to data type incompatibility at line 8 for column "k": "a"
 CONTEXT:  COPY check_ign_err
 NOTICE:  6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+COPY t_on_error_null FROM STDIN WITH (on_error set_null); --fail
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: null input
+COPY t_on_error_null FROM STDIN WITH (on_error set_null); --fail
+ERROR:  invalid input value for domain d_int_not_null: "ss"
+CONTEXT:  COPY t_on_error_null, line 1, column a: "ss"
+COPY t_on_error_null FROM STDIN WITH (on_error set_null); --fail
+ERROR:  invalid input value for domain d_int_not_null: "-1"
+CONTEXT:  COPY t_on_error_null, line 1, column a: "-1"
+--fail, less data.
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR:  missing data for column "c"
+CONTEXT:  COPY t_on_error_null, line 1: "1,1"
+--fail, extra data.
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+ERROR:  extra data after last expected column
+CONTEXT:  COPY t_on_error_null, line 1: "1,2,3,4"
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose); --ok
+NOTICE:  setting to null due to data type incompatibility at line 1 for column "b": "x1"
+CONTEXT:  COPY t_on_error_null
+NOTICE:  setting to null due to data type incompatibility at line 1 for column "c": "yx"
+CONTEXT:  COPY t_on_error_null
+NOTICE:  setting to null due to data type incompatibility at line 2 for column "b": "zx"
+CONTEXT:  COPY t_on_error_null
+NOTICE:  setting to null due to data type incompatibility at line 3 for column "c": "ea"
+CONTEXT:  COPY t_on_error_null
+NOTICE:  invalid values in 3 rows were replaced with null due to data type incompatibility
+SELECT * FROM t_on_error_null;
+ a  |  b   |  c   
+----+------+------
+ 10 | NULL | NULL
+ 11 | NULL |   12
+ 13 |   14 | NULL
+(3 rows)
+
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -835,6 +887,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index cef45868db5..49200cf064d 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
 COPY x from stdin (convert_selectively (a), convert_selectively (b));
 COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
 COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_null, on_error set_null);
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 
 -- incorrect options
 COPY x from stdin (format BINARY, delimiter ',');
 COPY x from stdin (format BINARY, null 'x');
 COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_null);
+COPY x from stdin (on_error set_null, reject_limit 2);
 COPY x from stdin (on_error unsupported);
 COPY x from stdin (format TEXT, force_quote(a));
 COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
 COPY x to stdout (format CSV, force_null(a));
 COPY x to stdout (format CSV, force_null *);
 COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdout (on_error set_null);
 COPY x from stdin (log_verbosity unsupported);
 COPY x from stdin with (reject_limit 1);
 COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -537,6 +541,42 @@ a	{2}	2
 8	{8}	8
 \.
 
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+
+\pset null NULL
+COPY t_on_error_null FROM STDIN WITH (on_error set_null); --fail
+\N	11	13
+\.
+
+COPY t_on_error_null FROM STDIN WITH (on_error set_null); --fail
+ss	11	14
+\.
+
+COPY t_on_error_null FROM STDIN WITH (on_error set_null); --fail
+-1	11	13
+\.
+
+--fail, less data.
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,1
+\.
+--fail, extra data.
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_null);
+1,2,3,4
+\.
+
+COPY t_on_error_null FROM STDIN WITH (on_error set_null, log_verbosity verbose); --ok
+10	x1	yx
+11	zx	12
+13	14	ea
+\.
+
+SELECT * FROM t_on_error_null;
+
+\pset null ''
+
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -606,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
-- 
2.34.1
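
The row-level counting in the copyfromparse.c hunk above (current_row_erroneous
guarding cstate->num_errors++) can be illustrated outside the backend. What
follows is only a minimal standalone sketch, not the patch's code: the names
parse_int_soft, convert_row_set_null and load_rows are hypothetical, every
column is treated as a plain int, and domain constraints are ignored entirely.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define NCOLS 3

/* Hypothetical stand-in for a soft-error int input function. */
static bool
parse_int_soft(const char *s, int *out)
{
	char	   *end;
	long		v;

	errno = 0;
	v = strtol(s, &end, 10);
	if (errno != 0 || end == s || *end != '\0' || v < INT_MIN || v > INT_MAX)
		return false;
	*out = (int) v;
	return true;
}

/*
 * Convert one row.  A field that fails to parse is set to null and the loop
 * moves on to the next field instead of abandoning the row.  Returns the
 * number of fields that were set to null.
 */
static int
convert_row_set_null(const char *fields[NCOLS], int values[NCOLS],
					 bool nulls[NCOLS])
{
	int			nulled = 0;

	for (int i = 0; i < NCOLS; i++)
	{
		nulls[i] = !parse_int_soft(fields[i], &values[i]);
		if (nulls[i])
		{
			values[i] = 0;
			nulled++;
		}
	}
	return nulled;
}

/*
 * Convert all rows.  Returns the number of rows in which at least one field
 * was replaced (counted once per row, however many fields failed) and stores
 * the total number of replaced fields in *fields_nulled.
 */
static int
load_rows(const char *rows[][NCOLS], int nrows, int *fields_nulled)
{
	int			num_errors = 0;
	int			values[NCOLS];
	bool		nulls[NCOLS];

	assert(nrows >= 0 && fields_nulled != NULL);
	*fields_nulled = 0;
	for (int r = 0; r < nrows; r++)
	{
		int			n = convert_row_set_null(rows[r], values, nulls);

		*fields_nulled += n;
		if (n > 0)
			num_errors++;
	}
	return num_errors;
}
```

Fed the three data rows of the "--ok" regression case (10/x1/yx, 11/zx/12,
13/14/ea), load_rows returns 3 and sets *fields_nulled to 4, which lines up
with the four per-field NOTICE lines and the "3 rows" summary in copy2.out.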




* Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
@ 2026-01-20 19:55  Matheus Alcantara <[email protected]>
  parent: jian he <[email protected]>
  0 siblings, 0 replies; 22+ messages in thread

From: Matheus Alcantara @ 2026-01-20 19:55 UTC (permalink / raw)
  To: jian he <[email protected]>; torikoshia <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; vignesh C <[email protected]>; Jim Jones <[email protected]>; Kirill Reshke <[email protected]>; Fujii Masao <[email protected]>; David G. Johnston <[email protected]>; Yugo NAGATA <[email protected]>; PostgreSQL Hackers <[email protected]>

On 10/11/25 07:22, jian he wrote:
> hi.
> 
> rebase and minor cosmetic changes.
> 
Hi,

The patch needs another rebase. Could you please send a new version?


--
Matheus Alcantara
EDB: https://www.enterprisedb.com









Thread overview: 22+ messages
2025-01-10 06:38 Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row jian he <[email protected]>
2025-01-11 09:53 ` Kirill Reshke <[email protected]>
2025-01-14 05:51   ` jian he <[email protected]>
2025-01-20 14:03     ` Kirill Reshke <[email protected]>
2025-03-07 03:48       ` jian he <[email protected]>
2025-03-11 10:31         ` Jim Jones <[email protected]>
2025-03-12 08:00           ` jian he <[email protected]>
2025-03-12 08:25             ` Jim Jones <[email protected]>
2025-03-18 03:55               ` jian he <[email protected]>
2025-03-21 06:34                 ` vignesh C <[email protected]>
2025-03-24 07:50                   ` jian he <[email protected]>
2025-03-25 06:31                     ` vignesh C <[email protected]>
2025-04-04 11:55                       ` jian he <[email protected]>
2025-04-04 21:32                         ` Masahiko Sawada <[email protected]>
2025-04-05 08:31                           ` jian he <[email protected]>
2025-04-07 22:41                             ` Masahiko Sawada <[email protected]>
2025-04-08 10:53                               ` jian he <[email protected]>
2025-07-01 14:54                                 ` torikoshia <[email protected]>
2025-07-02 09:25                                   ` jian he <[email protected]>
2025-07-30 04:44                                     ` jian he <[email protected]>
2025-11-10 10:22                                       ` jian he <[email protected]>
2026-01-20 19:55                                         ` Matheus Alcantara <[email protected]>
