Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rUFO6-009lTH-CK for pgsql-hackers@arkaria.postgresql.org; Mon, 29 Jan 2024 00:20:35 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rUFO4-00AY1U-Mv for pgsql-hackers@arkaria.postgresql.org; Mon, 29 Jan 2024 00:20:32 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rUFO4-00AY1M-DR for pgsql-hackers@lists.postgresql.org; Mon, 29 Jan 2024 00:20:32 +0000 Received: from mail-oa1-x2b.google.com ([2001:4860:4864:20::2b]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1rUFNz-004EsV-OD for pgsql-hackers@lists.postgresql.org; Mon, 29 Jan 2024 00:20:31 +0000 Received: by mail-oa1-x2b.google.com with SMTP id 586e51a60fabf-210c535a19bso1356290fac.1 for ; Sun, 28 Jan 2024 16:20:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1706487625; x=1707092425; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=+JZe7zbPWJQu21snO4Qhh3DYhsPUI80TcbOF7+pb+tE=; b=hO4CeeSuUvl23Xm2aOOf6QN/XjzJHLBN1BslkrYaYPqWaovfMUFUCREdvDvTliaWXD R251mQ8DypZorYGjJmaqVNbfMZpScOR+HFnEglSAtJ/B7xGzdYD+Y1B+WI8F9QpOznji Gi1AVaq2d6e4B1GF9bukCbpaycyNW6W5blwcs/xHoQbiVwxaLUy9N6kN+5Ga/c0gkPYq coIpbzyM3QxymuA6SSmxz7J7H0J8TCf7TZDKs80yVhUhw3GIel5sQyOm6a4uJ5xpWZBl oZRoGC4nlBEp3gX106Xd5D3E0xVVNf1+cwhNbTUln5r/Ser5a0ddUJyl57BQWQmpYFyk O6nw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1706487625; x=1707092425; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=+JZe7zbPWJQu21snO4Qhh3DYhsPUI80TcbOF7+pb+tE=; b=qXmcUNAGr3ku2tGoboWnn8Lyept9TuOjL9moh3OQFr1E4WrmqBrSMj2NoRQE47AlCB m1zwgDm/2ktYE2NShpguKXiVreNhq+JMs640QUbQER2XIv3aj0gyGj5LC/lciIIeaNX3 3Q6/RZnm9YuzQHQy3Hdruir5v+3CtSRMzCqU5UCP7cFpK4W9pVoQmHHEIFqlDPGTwhSV U6fSIZMyQbNp6MAOSrPMhy3DH3CJQCHWf1jD4ADy521ddEEhWPMhwvi3Y/OwEcm3Bmzm TtYDzHjuI9nhVy5q826kvXfpCKij5TwWq40XOnFjsd+GzdXp26ifzy6F19Anux6J0s19 XSKg== X-Gm-Message-State: AOJu0YzQBazfzYLZWDnc9JFgDa2hpNlyodoD+M/HgFBFWwZlDNbZ8/jA fJEQ71hNuRNikYakSAXsQhfR5ziDwP1uFeHyorbvYde4mWvmCeCn7YYH4zVqrgSLxi3775Gub59 F05eHEnXJQj+qEnKUh4Z51WCdfEw= X-Google-Smtp-Source: AGHT+IG6i7EIzLENxg44Gb1nXlFdlnfCG4xqgNhLOyigZLuhmMpLTb8h1VD02pNEYv9u2jay0F+4zFWt11PYEObtbF0= X-Received: by 2002:a05:6870:63a0:b0:214:9ecd:56e8 with SMTP id t32-20020a05687063a000b002149ecd56e8mr4320936oap.33.1706487625136; Sun, 28 Jan 2024 16:20:25 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: "David G. Johnston" Date: Sun, 28 Jan 2024 17:19:47 -0700 Message-ID: Subject: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row To: jian he Cc: PostgreSQL Hackers Content-Type: multipart/alternative; boundary="000000000000b5bd6706100a9c75" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000b5bd6706100a9c75 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Sun, Jan 28, 2024 at 4:51=E2=80=AFPM jian he wrote: > On Fri, Jan 26, 2024 at 11:09=E2=80=AFPM David G. Johnston > wrote: > > > > Hi, > > > > The option choice of "ignore" in the COPY ON_ERROR clause seems overly > generic. There would seem to be two relevant ways to ignore bad column > input data - drop the entire row or just set the column value to null. I > can see us wanting to provide the set to null option and in any case havi= ng > the option name be explicit that it ignores the row seems like a good ide= a. > > two issue I found out while playing around with it; > create table x1(a int not null, b int not null ); > > another issue: > COPY x1 from stdin (on_error null); > > when we already have `not null` top level constraint for table x1. > Do we need an error immediately? > "on_error null" seems to conflict with `not null` constraint (assume > refers to the same column). > it may fail while doing bulk inserts while on_error is set to null > because of violating a not null constraint. > You should not error immediately since whether or not there is a problem is table and data dependent. I would not check for the case of all columns being defined not null and just let the mismatch happen. That said, maybe with this being a string we can accept something like: 'null, ignore' And so if attempting to place any one null fails, assuming we can make that a soft error too, we would then ignore the entire row. David J. --000000000000b5bd6706100a9c75 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


On Sun, Jan 28, 2024 at 4:51= =E2=80=AFPM jian he <jian= .universality@gmail.com> wrote:
On Fri, Jan 26, 2024 at 11:09=E2=80=AFPM David G. Jo= hnston
<david.g= .johnston@gmail.com> wrote:
>
> Hi,
>
> The option choice of "ignore" in the COPY ON_ERROR clause se= ems overly generic.=C2=A0 There would seem to be two relevant ways to ignor= e bad column input data - drop the entire row or just set the column value = to null.=C2=A0 I can see us wanting to provide the set to null option and i= n any case having the option name be explicit that it ignores the row seems= like a good idea.

two issue I found out while playing around with it;
create table x1(a int not null, b int not null );

another issue:
COPY x1 from stdin (on_error null);

when we already have `not null` top level constraint for table x1.
Do we need an error immediately?
"on_error null" seems to conflict with `not null` constraint (ass= ume
refers to the same column).
it may fail while doing bulk inserts while on_error is set to null
because of violating a not null constraint.

=
You should not error immediately since whether or not there is a problem= is table and data dependent.=C2=A0 I would not check for the case of all c= olumns being defined not null and just let the mismatch happen.
That said, maybe with this being a string we can accept something= like: 'null, ignore'

And so if attempting to = place any one null fails, assuming we can make that a soft error too, we wo= uld then ignore the entire row.

David J.

<= /div>
--000000000000b5bd6706100a9c75--