From: Yugo Nagata <[email protected]>
To: Kirill Reshke <[email protected]>
Cc: torikoshia <[email protected]>
Cc: Fujii Masao <[email protected]>
Cc: jian he <[email protected]>
Cc: Jim Jones <[email protected]>
Cc: David G. Johnston <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
Date: Tue, 12 Nov 2024 14:03:50 +0900
Message-ID: <[email protected]> (raw)
In-Reply-To: <CALdSSPhKs69u20a5CfYaYRegmfoz13+cjBngkC=RkTMvqtbH2g@mail.gmail.com>
References: <CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com>
	<CACJufxFT9j8o5kEC8dPCQqLomWjeJm9V9m8eZjj2Gvc_F5ha=g@mail.gmail.com>
	<[email protected]>
	<CAKFQuwaYNw8U-9JkFdyOX4i4Y3J1sp6+dk-sh8YmZGCq8gMeVQ@mail.gmail.com>
	<[email protected]>
	<CACJufxFFdtPKk4B5rSVNEk6yCH2Amvi_8w3Gaz5wg9M_t9c5Rw@mail.gmail.com>
	<[email protected]>
	<CACJufxEgiysa2SMJPGp0aN476Ojm636MfJK88DZC7TVYsXYBBQ@mail.gmail.com>
	<CALdSSPhgjCbyb=ZRgr4LaCFJV2-F9_CxMeX6poHuGCt_f9GYAw@mail.gmail.com>
	<[email protected]>
	<CALdSSPi1JE9xc31W6DPAdk-bQHeo3HNAYB-10Biruu-w4GJN0Q@mail.gmail.com>
	<[email protected]>
	<CALdSSPjYw5g7_sc++bRcxOnC7jW6O2qiSkgdKRUYFXZZv3-Ktw@mail.gmail.com>
	<[email protected]>
	<CALdSSPhKs69u20a5CfYaYRegmfoz13+cjBngkC=RkTMvqtbH2g@mail.gmail.com>

On Tue, 12 Nov 2024 01:27:53 +0500
Kirill Reshke <[email protected]> wrote:

> On Mon, 11 Nov 2024 at 16:11, torikoshia <[email protected]> wrote:
> >
> > On 2024-11-09 21:55, Kirill Reshke wrote:
> >
> > Thanks for working on this!
> 
> Thanks for reviewing the v7 patch series!
> 
> > > On Thu, 7 Nov 2024 at 23:00, Fujii Masao <[email protected]>
> > > wrote:
> > >>
> > >>
> > >>
> > >> On 2024/10/26 6:03, Kirill Reshke wrote:
> > >> > when the REJECT LIMIT is set to some non-zero number and the number of
> > >> > row NULL replacements exceeds the limit, is it OK to fail. Because
> > >> > there WAS errors, and we should not tolerate more than $limit errors .
> > >> > I do find this behavior to be consistent.
> > >>
> > >> +1
> > >>
> > >>
> > >> > But what if we don't set a REJECT LIMIT, it is sane to do all
> > >> > replacements, as if REJECT LIMIT is inf.
> > >>
> > >> +1
> > >
> > > After thinking for a while, I'm now more opposed to this approach. I
> > > think we should count rows with erroneous data as errors only if
> > > null substitution for these rows failed, not the total number of rows
> > > which were modified.
> > > Then, to respect the REJECT LIMIT option, we compare this number with
> > > the limit. This is actually simpler approach IMHO. What do You think?
> >
> > IMHO I prefer the previous interpretation.
> > I'm not sure this is what people expect, but I assume that REJECT_LIMIT
> > is used to specify how many malformed rows are acceptable in the
> > "original" data source.

I also prefer the previous interpretation.
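
To make the difference between the two interpretations concrete, here is a
minimal sketch in Python. It is only a model of the counting rule, not
PostgreSQL code, and it assumes the limit is exceeded when the count becomes
strictly greater than it. The names Row, null_substitution_ok and copy_fails
are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Row:
    # True if at least one column failed input conversion.
    malformed: bool
    # Hypothetical flag: True if replacing the bad value with NULL would work.
    null_substitution_ok: bool = True


def copy_fails(rows: Iterable[Row],
               reject_limit: Optional[int],
               count_all_malformed: bool) -> bool:
    """Return True if the modelled COPY would abort.

    count_all_malformed=True  -> previous interpretation: every malformed
                                 row in the original data counts.
    count_all_malformed=False -> alternative: only rows whose NULL
                                 substitution failed count.
    reject_limit=None models "no limit", i.e. never abort on the count.
    """
    errors = 0
    for row in rows:
        if not row.malformed:
            continue
        if count_all_malformed or not row.null_substitution_ok:
            errors += 1
        # Assumption: the command fails once the count exceeds the limit.
        if reject_limit is not None and errors > reject_limit:
            return True
    return False


# Three malformed rows, one of which cannot be fixed by NULL substitution.
rows = [
    Row(malformed=True),
    Row(malformed=True),
    Row(malformed=True, null_substitution_ok=False),
    Row(malformed=False),
]

print(copy_fails(rows, 2, count_all_malformed=True))   # True: 3 > 2
print(copy_fails(rows, 2, count_all_malformed=False))  # False: 1 <= 2
```

With a limit of 2, the same input aborts under the previous interpretation
and succeeds under the alternative one, which is why the meaning of the
option name matters.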
 
> I do like the first version of interpretation, but I have a struggle
> with it. According to this interpretation, we will fail COPY command
> if the number
> of malformed data rows exceeds the limit, not the number of rejected
> rows (some percentage of malformed rows are accepted with null
> substitution)
> So, a proper name for the limit will be MALFORMED_LIMIT, or something.
> However, we are unable to change the name since the REJECT_LIMIT
> option has already been committed.
> I guess I'll just have to put up with this contradiction. I will send
> an updated patch shortly...

I think we can rename the REJECT_LIMIT option because it is not yet released.

The documentation says that REJECT_LIMIT "Specifies the maximum number of errors",
and the word "reject" does not appear anywhere in the description, so I think it is
unclear what "REJECT" in REJECT_LIMIT means. ERROR_LIMIT may be a more appropriate
name, since the option is supposed to be used with ON_ERROR.

Alternatively, if we want to emphasize that errors are handled by some means other
than terminating the command, MALFORMED_LIMIT as proposed above, or TOLERANCE_LIMIT,
might be good candidates.

Regards,
Yugo Nagata

-- 
Yugo Nagata <[email protected]>





