public inbox for [email protected]
help / color / mirror / Atom feedFrom: torikoshia <[email protected]>
To: Kirill Reshke <[email protected]>
Cc: Fujii Masao <[email protected]>
Cc: jian he <[email protected]>
Cc: Jim Jones <[email protected]>
Cc: David G. Johnston <[email protected]>
Cc: Yugo NAGATA <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
Date: Mon, 11 Nov 2024 20:11:02 +0900
Message-ID: <[email protected]> (raw)
In-Reply-To: <CALdSSPjYw5g7_sc++bRcxOnC7jW6O2qiSkgdKRUYFXZZv3-Ktw@mail.gmail.com>
References: <CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com>
<CACJufxFFU92-H7G8tmpxt9oTSfL062OA7n5rPx-YbOAtDUUzGw@mail.gmail.com>
<[email protected]>
<CACJufxGnc+=No=Ua6NFT2ADt0vRL=m1QsuCOM=9aKPKWh9_L6Q@mail.gmail.com>
<[email protected]>
<CACJufxFT9j8o5kEC8dPCQqLomWjeJm9V9m8eZjj2Gvc_F5ha=g@mail.gmail.com>
<[email protected]>
<CAKFQuwaYNw8U-9JkFdyOX4i4Y3J1sp6+dk-sh8YmZGCq8gMeVQ@mail.gmail.com>
<[email protected]>
<CACJufxFFdtPKk4B5rSVNEk6yCH2Amvi_8w3Gaz5wg9M_t9c5Rw@mail.gmail.com>
<[email protected]>
<CACJufxEgiysa2SMJPGp0aN476Ojm636MfJK88DZC7TVYsXYBBQ@mail.gmail.com>
<CALdSSPhgjCbyb=ZRgr4LaCFJV2-F9_CxMeX6poHuGCt_f9GYAw@mail.gmail.com>
<[email protected]>
<CALdSSPi1JE9xc31W6DPAdk-bQHeo3HNAYB-10Biruu-w4GJN0Q@mail.gmail.com>
<[email protected]>
<CALdSSPjYw5g7_sc++bRcxOnC7jW6O2qiSkgdKRUYFXZZv3-Ktw@mail.gmail.com>
On 2024-11-09 21:55, Kirill Reshke wrote:
Thanks for working on this!
> On Thu, 7 Nov 2024 at 23:00, Fujii Masao <[email protected]>
> wrote:
>>
>>
>>
>> On 2024/10/26 6:03, Kirill Reshke wrote:
>> > when the REJECT LIMIT is set to some non-zero number and the number of
>> > row NULL replacements exceeds the limit, is it OK to fail. Because
>> > there WAS errors, and we should not tolerate more than $limit errors .
>> > I do find this behavior to be consistent.
>>
>> +1
>>
>>
>> > But what if we don't set a REJECT LIMIT, it is sane to do all
>> > replacements, as if REJECT LIMIT is inf.
>>
>> +1
>
> After thinking for a while, I'm now more opposed to this approach. I
> think we should count rows with erroneous data as errors only if
> null substitution for these rows failed, not the total number of rows
> which were modified.
> Then, to respect the REJECT LIMIT option, we compare this number with
> the limit. This is actually simpler approach IMHO. What do You think?
IMHO I prefer the previous interpretation.
I'm not sure this is what people expect, but I assume that REJECT_LIMIT
is used to specify how many malformed rows are acceptable in the
"original" data source.
>> > But while I was trying to implement that, I realized that I don't
>> > understand v4 of this patch. My misunderstanding is about
>> > `t_on_error_null` tests. We are allowed to insert a NULL value for the
>> > first column of t_on_error_null using COPY ON_ERROR SET_TO_NULL. Why
>> > do we do that? My thought is we should try to execute
>> > InputFunctionCallSafe with NULL value (i mean, here [1]) for the
>> > column after we failed to insert the input value. And, if this second
>> > call is successful, we do replacement, otherwise we count the row as
>> > erroneous.
>>
>> Your concern is valid. Allowing NULL to be stored in a column with a
>> NOT NULL
>> constraint via COPY ON_ERROR=SET_TO_NULL does seem unexpected. As you
>> suggested,
>> NULL values set by SET_TO_NULL should probably be re-evaluated.
>
> Thank you. I updated the patch with a NULL re-evaluation.
>
>
> PFA v7. I did not yet update the doc for this patch version, waiting
> for feedback about REJECT LIMIT + SET_TO_NULL behaviour.
>
There were warnings when I applied the patch:
$ git apply
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:170:
trailing whitespace.
/*
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:181:
trailing whitespace.
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:189:
trailing whitespace.
/* If datatype if okay with NULL,
replace
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:196:
trailing whitespace.
v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:212:
trailing whitespace.
/*
> @@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState
> *pstate, bool is_from)
> parser_errposition(pstate, def->location)));
...
>
> - if (opts_out->reject_limit && !opts_out->on_error)
> + if (opts_out->reject_limit && !(opts_out->on_error ==
> COPY_ON_ERROR_NULL || opts_out->on_error == COPY_ON_ERROR_IGNORE))
Hmm, is this change necessary?
Personally, I feel the previous code is easier to read.
"REJECT LIMIT" should be "REJECT_LIMIT"?
> 1037 errhint("Consider specifying the
> REJECT LIMIT option to skip erroneous rows.")));
SET_TO_NULL is one of the value for ON_ERROR, but the patch treats
SET_TO_NULL as option for COPY:
221 --- a/src/bin/psql/tab-complete.in.c
222 +++ b/src/bin/psql/tab-complete.in.c
223 @@ -3235,7 +3235,7 @@ match_previous_words(int pattern_id,
224 COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
225 "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
226 "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING",
"DEFAULT",
227 - "ON_ERROR", "LOG_VERBOSITY");
228 + "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY");
> Best regards,
> Kirill Reshke
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
view thread (29+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox