Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tASKC-00Fpqh-5X for pgsql-hackers@arkaria.postgresql.org; Mon, 11 Nov 2024 11:11:15 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tASK8-00DQz9-7d for pgsql-hackers@arkaria.postgresql.org; Mon, 11 Nov 2024 11:11:12 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tASK7-00DQz0-UZ for pgsql-hackers@lists.postgresql.org; Mon, 11 Nov 2024 11:11:12 +0000 Received: from oss.nttdata.com ([49.212.34.109]) by magus.postgresql.org with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tASK3-001Lef-6y for pgsql-hackers@lists.postgresql.org; Mon, 11 Nov 2024 11:11:11 +0000 Received: from oss.nttdata.com (localhost [127.0.0.1]) by oss.nttdata.com (Postfix) with ESMTPA id AF126613B4; Mon, 11 Nov 2024 20:11:02 +0900 (JST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at oss.nttdata.com MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Date: Mon, 11 Nov 2024 20:11:02 +0900 From: torikoshia To: Kirill Reshke Cc: Fujii Masao , jian he , Jim Jones , "David G. Johnston" , Yugo NAGATA , PostgreSQL Hackers Subject: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row In-Reply-To: References: <04bf425ad1b15a4daefe96c478a5253b@oss.nttdata.com> <20240206191937.72eaf0ccc20cfea37944b422@sraoss.co.jp> <76da9fcc-93c5-4053-872e-12932a95356d@uni-muenster.de> <6eac5b45-7f45-4c7a-aae1-e90db8be2e08@uni-muenster.de> <3d6b5885-16a1-475d-b56f-41701c48d9d4@uni-muenster.de> <63595e8f-a245-4335-aa22-7e449a70e210@oss.nttdata.com> <07587c36-18b3-4ccb-b5fb-579bcb04ed37@oss.nttdata.com> User-Agent: Roundcube Webmail/1.4.11 Message-ID: <501dd655ddb04693c15baeb6485bc601@oss.nttdata.com> X-Sender: torikoshia@oss.nttdata.com List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On 2024-11-09 21:55, Kirill Reshke wrote: Thanks for working on this! > On Thu, 7 Nov 2024 at 23:00, Fujii Masao > wrote: >> >> >> >> On 2024/10/26 6:03, Kirill Reshke wrote: >> > when the REJECT LIMIT is set to some non-zero number and the number of >> > row NULL replacements exceeds the limit, is it OK to fail. Because >> > there WAS errors, and we should not tolerate more than $limit errors . >> > I do find this behavior to be consistent. >> >> +1 >> >> >> > But what if we don't set a REJECT LIMIT, it is sane to do all >> > replacements, as if REJECT LIMIT is inf. >> >> +1 > > After thinking for a while, I'm now more opposed to this approach. I > think we should count rows with erroneous data as errors only if > null substitution for these rows failed, not the total number of rows > which were modified. > Then, to respect the REJECT LIMIT option, we compare this number with > the limit. This is actually simpler approach IMHO. What do You think? IMHO I prefer the previous interpretation. I'm not sure this is what people expect, but I assume that REJECT_LIMIT is used to specify how many malformed rows are acceptable in the "original" data source. >> > But while I was trying to implement that, I realized that I don't >> > understand v4 of this patch. My misunderstanding is about >> > `t_on_error_null` tests. We are allowed to insert a NULL value for the >> > first column of t_on_error_null using COPY ON_ERROR SET_TO_NULL. Why >> > do we do that? My thought is we should try to execute >> > InputFunctionCallSafe with NULL value (i mean, here [1]) for the >> > column after we failed to insert the input value. And, if this second >> > call is successful, we do replacement, otherwise we count the row as >> > erroneous. >> >> Your concern is valid. Allowing NULL to be stored in a column with a >> NOT NULL >> constraint via COPY ON_ERROR=SET_TO_NULL does seem unexpected. As you >> suggested, >> NULL values set by SET_TO_NULL should probably be re-evaluated. > > Thank you. I updated the patch with a NULL re-evaluation. > > > PFA v7. I did not yet update the doc for this patch version, waiting > for feedback about REJECT LIMIT + SET_TO_NULL behaviour. > There were warnings when I applied the patch: $ git apply v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:170: trailing whitespace. /* v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:181: trailing whitespace. v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:189: trailing whitespace. /* If datatype if okay with NULL, replace v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:196: trailing whitespace. v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:212: trailing whitespace. /* > @@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState > *pstate, bool is_from) > parser_errposition(pstate, def->location))); ... > > - if (opts_out->reject_limit && !opts_out->on_error) > + if (opts_out->reject_limit && !(opts_out->on_error == > COPY_ON_ERROR_NULL || opts_out->on_error == COPY_ON_ERROR_IGNORE)) Hmm, is this change necessary? Personally, I feel the previous code is easier to read. "REJECT LIMIT" should be "REJECT_LIMIT"? > 1037 errhint("Consider specifying the > REJECT LIMIT option to skip erroneous rows."))); SET_TO_NULL is one of the value for ON_ERROR, but the patch treats SET_TO_NULL as option for COPY: 221 --- a/src/bin/psql/tab-complete.in.c 222 +++ b/src/bin/psql/tab-complete.in.c 223 @@ -3235,7 +3235,7 @@ match_previous_words(int pattern_id, 224 COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL", 225 "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE", 226 "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT", 227 - "ON_ERROR", "LOG_VERBOSITY"); 228 + "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY"); > Best regards, > Kirill Reshke -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.