public inbox for [email protected]  
help / color / mirror / Atom feed
From: torikoshia <[email protected]>
To: Kirill Reshke <[email protected]>
Cc: Fujii Masao <[email protected]>
Cc: jian he <[email protected]>
Cc: Jim Jones <[email protected]>
Cc: David G. Johnston <[email protected]>
Cc: Yugo NAGATA <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
Date: Mon, 11 Nov 2024 20:11:02 +0900
Message-ID: <[email protected]> (raw)
In-Reply-To: <CALdSSPjYw5g7_sc++bRcxOnC7jW6O2qiSkgdKRUYFXZZv3-Ktw@mail.gmail.com>
References: <CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com>
	<CACJufxFFU92-H7G8tmpxt9oTSfL062OA7n5rPx-YbOAtDUUzGw@mail.gmail.com>
	<[email protected]>
	<CACJufxGnc+=No=Ua6NFT2ADt0vRL=m1QsuCOM=9aKPKWh9_L6Q@mail.gmail.com>
	<[email protected]>
	<CACJufxFT9j8o5kEC8dPCQqLomWjeJm9V9m8eZjj2Gvc_F5ha=g@mail.gmail.com>
	<[email protected]>
	<CAKFQuwaYNw8U-9JkFdyOX4i4Y3J1sp6+dk-sh8YmZGCq8gMeVQ@mail.gmail.com>
	<[email protected]>
	<CACJufxFFdtPKk4B5rSVNEk6yCH2Amvi_8w3Gaz5wg9M_t9c5Rw@mail.gmail.com>
	<[email protected]>
	<CACJufxEgiysa2SMJPGp0aN476Ojm636MfJK88DZC7TVYsXYBBQ@mail.gmail.com>
	<CALdSSPhgjCbyb=ZRgr4LaCFJV2-F9_CxMeX6poHuGCt_f9GYAw@mail.gmail.com>
	<[email protected]>
	<CALdSSPi1JE9xc31W6DPAdk-bQHeo3HNAYB-10Biruu-w4GJN0Q@mail.gmail.com>
	<[email protected]>
	<CALdSSPjYw5g7_sc++bRcxOnC7jW6O2qiSkgdKRUYFXZZv3-Ktw@mail.gmail.com>

On 2024-11-09 21:55, Kirill Reshke wrote:

Thanks for working on this!

> On Thu, 7 Nov 2024 at 23:00, Fujii Masao <[email protected]> 
> wrote:
>> 
>> 
>> 
>> On 2024/10/26 6:03, Kirill Reshke wrote:
>> > when the REJECT LIMIT is set to some non-zero number and the number of
>> > row NULL replacements exceeds the limit, is it OK to fail. Because
>> > there WAS errors, and we should not tolerate more than $limit errors .
>> > I do find this behavior to be consistent.
>> 
>> +1
>> 
>> 
>> > But what if we don't set a REJECT LIMIT, it is sane to do all
>> > replacements, as if REJECT LIMIT is inf.
>> 
>> +1
> 
> After thinking for a while, I'm now more opposed to this approach. I
> think we should count rows with erroneous data as errors only if
> null substitution for these rows failed, not the total number of rows
> which were modified.
> Then, to respect the REJECT LIMIT option, we compare this number with
> the limit. This is actually simpler approach IMHO. What do You think?

IMHO I prefer the previous interpretation.
I'm not sure this is what people expect, but I assume that REJECT_LIMIT 
is used to specify how many malformed rows are acceptable in the 
"original" data source.

>> > But while I was trying to implement that, I realized that I don't
>> > understand v4 of this patch. My misunderstanding is about
>> > `t_on_error_null` tests. We are allowed to insert a NULL value for the
>> > first column of t_on_error_null using COPY ON_ERROR SET_TO_NULL. Why
>> > do we do that? My thought is we should try to execute
>> > InputFunctionCallSafe with NULL value  (i mean, here [1]) for the
>> > column after we failed to insert the input value. And, if this second
>> > call is successful, we do replacement, otherwise we count the row as
>> > erroneous.
>> 
>> Your concern is valid. Allowing NULL to be stored in a column with a 
>> NOT NULL
>> constraint via COPY ON_ERROR=SET_TO_NULL does seem unexpected. As you 
>> suggested,
>> NULL values set by SET_TO_NULL should probably be re-evaluated.
> 
> Thank you. I updated the patch with a NULL re-evaluation.
> 
> 
> PFA v7. I did not yet update the doc for this patch version, waiting
> for feedback about REJECT LIMIT + SET_TO_NULL behaviour.
> 


There were warnings when I applied the patch:

$ git apply 
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:170: 
trailing whitespace.
                    /*
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:181: 
trailing whitespace.

v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:189: 
trailing whitespace.
                                    /* If datatype if okay with NULL, 
replace
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:196: 
trailing whitespace.

v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:212: 
trailing whitespace.
                                 /*

>  @@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState 
> *pstate, bool is_from)
>                   parser_errposition(pstate, def->location)));
...
> 
>  -   if (opts_out->reject_limit && !opts_out->on_error)
>  +   if (opts_out->reject_limit && !(opts_out->on_error == 
> COPY_ON_ERROR_NULL || opts_out->on_error == COPY_ON_ERROR_IGNORE))

Hmm, is this change necessary?
Personally, I feel the previous code is easier to read.


"REJECT LIMIT" should be "REJECT_LIMIT"?
> 1037                             errhint("Consider specifying the 
> REJECT LIMIT option to skip erroneous rows.")));


SET_TO_NULL is one of the value for ON_ERROR, but the patch treats 
SET_TO_NULL as option for COPY:

221 --- a/src/bin/psql/tab-complete.in.c
222 +++ b/src/bin/psql/tab-complete.in.c
223 @@ -3235,7 +3235,7 @@ match_previous_words(int pattern_id,
224         COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
225                       "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
226                       "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", 
"DEFAULT",
227 -                     "ON_ERROR", "LOG_VERBOSITY");
228 +                     "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");


> Best regards,
> Kirill Reshke

-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.






view thread (29+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox