Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1u1vAO-00AGZD-4G for pgsql-hackers@arkaria.postgresql.org; Mon, 07 Apr 2025 22:42:08 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1u1vAL-004SYS-GD for pgsql-hackers@arkaria.postgresql.org; Mon, 07 Apr 2025 22:42:05 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1u1vAK-004SYK-Vn for pgsql-hackers@lists.postgresql.org; Mon, 07 Apr 2025 22:42:05 +0000 Received: from mail-lf1-x12f.google.com ([2a00:1450:4864:20::12f]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1u1vAI-003YU2-2p for pgsql-hackers@lists.postgresql.org; Mon, 07 Apr 2025 22:42:04 +0000 Received: by mail-lf1-x12f.google.com with SMTP id 2adb3069b0e04-5499c5d9691so5570601e87.2 for ; Mon, 07 Apr 2025 15:42:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1744065721; x=1744670521; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=99oPZjk9zFtB84zjwI2sTOit7zwNdcwqR/XW2q7Lsgk=; b=bV2gzvgIbHFQ9XXC7EI0TMT1SWFQUpyXOb4zr7tLF4LwxtlivPIINo0SGYK6WEH8nD 42GPyDHC1qKXOcTSjbc1OC5n6K/rb9HYrRW086wz5sbSuNdT0OnDqfhj138S5p8wzmOI +FT79viUric4BVWNRKTjinjjDov84jp+/YgFQiBjlpWcocwrA5pKuJDRkmNnLzVD7UZ1 Ihe6S8kb5L/g8ygmoCuR8vFVJdZQgLq8e4YU4CcERkROTcoWFmvm9vnQxQR/XNFubE8y tgNQjqrdSkeOn8uhsHC4UzeqrcGIEEv3p81uW35XftJG+IF9W7Zh2QtmV/qCVP2AbvgQ GNng== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1744065721; x=1744670521; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=99oPZjk9zFtB84zjwI2sTOit7zwNdcwqR/XW2q7Lsgk=; b=cX3KElYGshkrdjyFjcc6Idm/rt+/iGyoMW7OBE8oUaSPY/3clX+GEZHPhscmv9ghnB uQVdJopfzp/jZhXyc86yNUr5SbCVs7BZZumLqgDNh+QykvQgowQGsCszZObGk7wkyk1l 3Qw/S6zSb6FGlm4wmaWSkBLlshFdMJg0nN94xcGd6TMwFIU6yTb3Mi0XDEiVGdsYLfMd xRrBredV+7u26f9U0ojI18o08lGwoEg0mBhaVz2PdqO4I20bnBy11RYJPqEygRFqOc1B efL/CnKXNFGftrR93NWddQAWOO4X7b1CV0uujcNl4XkRgxGMC+HidXLyHqBGFSfNWG9n Ueiw== X-Forwarded-Encrypted: i=1; AJvYcCWzqQXyNfsByRqEUbc5S17MRXqOKZ379EArkEvyeyF/QxrGwy1JAv7Ool9VPP1fyHC9ToUNhWZ7VwlnebGd@lists.postgresql.org X-Gm-Message-State: AOJu0YyNG4/gvCRVMCOQjmfENWsaRwaPamiQ7zPbJxFCH8bFALAtTF0K CQhegtnTZDmBKJlzAhSULAPG3W8T5Ye0eMe8VQam9vMuwYuMIhP2ELcVwYfwbySelQvtqqZ2vS4 tIEnmXYVEAih34nXPMJfuT+CgCGo= X-Gm-Gg: ASbGncsBQjAztXyQqs7K9vZC3V3rqcihZosqcnxKrZNnWY5T2PiAkkVxdrOFwqp3WKa iZSPGAFuoXbfp1yxpEo4InF9xT1tAxCA9Lfp+C7F1PAkNTfM8zpsAnmnp1xXpIqsth+RLtzZ5C6 7bUrIZI5FSRGmgJMU1ApC9BSg9Hpo= X-Google-Smtp-Source: AGHT+IGLI2WVp9iaVGMqjluy+q/0KFxgr8Z4IM1azNJVpmgVYpC+GmvSW5aIQznDb6u1LuNr/GlgoGglWI8UMOIbtbM= X-Received: by 2002:a05:6512:68a:b0:545:2950:5360 with SMTP id 2adb3069b0e04-54c22777b1amr3400381e87.22.1744065721071; Mon, 07 Apr 2025 15:42:01 -0700 (PDT) MIME-Version: 1.0 References: <90dc6e9d-9348-485a-b27c-7b1637f06c2e@uni-muenster.de> In-Reply-To: From: Masahiko Sawada Date: Mon, 7 Apr 2025 15:41:24 -0700 X-Gm-Features: ATxdqUF6F3b2zuEzJeUUWhWQQ-1Lhw_7Jd4WcJBJqS3Vm3cuWpgL10XuXGnR55A Message-ID: Subject: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row To: jian he Cc: vignesh C , Jim Jones , Kirill Reshke , Fujii Masao , "David G. Johnston" , Yugo NAGATA , torikoshia , PostgreSQL Hackers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Sat, Apr 5, 2025 at 1:31=E2=80=AFAM jian he wrote: > > On Sat, Apr 5, 2025 at 5:33=E2=80=AFAM Masahiko Sawada wrote: > > > > On Fri, Apr 4, 2025 at 4:55=E2=80=AFAM jian he wrote: > > > > > > On Tue, Mar 25, 2025 at 2:31=E2=80=AFPM vignesh C wrote: > > > > > > > > 2) Here in error we say column c1 violates not-null constraint and = in > > > > the context we show column c2, should the context also display c2 > > > > column: > > > > postgres=3D# create table t3(c1 int not null, c2 int, check (c1 > 1= 0)); > > > > CREATE TABLE > > > > postgres=3D# COPY t3 FROM STDIN WITH (on_error set_to_null); > > > > Enter data to be copied followed by a newline. > > > > End with a backslash and a period on a line by itself, or an EOF si= gnal. > > > > >> a b > > > > >> \. > > > > ERROR: null value in column "c1" of relation "t3" violates not-nul= l constraint > > > > DETAIL: Failing row contains (null, null). > > > > CONTEXT: COPY t3, line 1, column c2: "b" > > > > > > > > > > It took me a while to figure out why. > > > with the attached, now the error message becomes: > > > > > > ERROR: null value in column "c1" of relation "t3" violates not-null = constraint > > > DETAIL: Failing row contains (null, null). > > > CONTEXT: COPY t3, line 1: "a,b" > > > > > > while at it, > > > (on_error set_to_null, log_verbosity verbose) > > > error message CONTEXT will only emit out relation name, > > > this aligns with (on_error ignore, log_verbosity verbose). > > > > > > one of the message out example: > > > +NOTICE: column "b" was set to null due to data type incompatibility= at line 2 > > > +CONTEXT: COPY t_on_error_null > > > > > > > > > > > > > 3) typo becomen should be become: > > > > null will becomen reserved to non-reserved > > > fixed. > > > > > > > 4) There is a whitespace error while applying patch > > > > Applying: COPY (on_error set_to_null) > > > > .git/rebase-apply/patch:39: trailing whitespace. > > > > a NOTICE message indicating the number of = rows > > > > warning: 1 line adds whitespace errors. > > > fixed. > > > > I've reviewed the v15 patch and here are some comments: > > > > How about renaming the new option value to 'set_null"? The 'to' in the > > value name seems redundant to me. > > > > --- > > + COPY_ON_ERROR_NULL, /* set error field to n= ull */ > > > > I think it's better to rename COPY_ON_ERROR_SET_TO_NULL (or > > COPY_ON_ERROR_SET_NULL if we change the option value name) for > > consistency with the value name. > > > > --- > > + else if (cstate->opts.on_error =3D=3D COPY_ON_ERROR_NU= LL) > > + ereport(NOTICE, > > + errmsg_plural("invalid values > > in %" PRIu64 " row was replaced with null", > > + > > "invalid values in %" PRIu64 " rows were replaced with null", > > + > > cstate->num_errors, > > + > > cstate->num_errors)); > > > > How about adding "due to data type incompatibility" at the end of the m= essage? > > > > --- > > + ereport(NOTICE, > > + errmsg("column > > \"%s\" was set to null due to data type incompatibility at line %" > > PRIu64 "", > > + > > cstate->cur_attname, > > + > > cstate->cur_lineno)); > > > > Similar to the IGNORE case, we can show the data in question in the mes= sage. > > > > --- > > + else > > + ereport(ERROR, > > + > > errcode(ERRCODE_NOT_NULL_VIOLATION), > > + errmsg("domain %s does > > not allow null values", format_type_be(typioparams[m])), > > + errdatatype(typioparams[m]= )); > > > > If domain data type is the sole case where not to accept NULL, can we > > check it beforehand to avoid calling the second > > InputFunctionCallSafe() for non-domain data types? Also, if we want to > > end up with an error when setting NULL to a domain type with NOT NULL, > > I think we don't need to try to handle a soft error by passing > > econtext to InputFunctionCallSafe(). > > > > please check attached, hope i have addressed all the points you've mentio= ned. > > > > If domain data type is the sole case where not to accept NULL, can we > > check it beforehand to avoid calling the second > > InputFunctionCallSafe() for non-domain data types? > > I doubt it. > > we have > InputFunctionCallSafe(FmgrInfo *flinfo, char *str, > Oid typioparam, int32 typmod, > fmNodePtr escontext, > Datum *result) > { > LOCAL_FCINFO(fcinfo, 3); > if (str =3D=3D NULL && flinfo->fn_strict) > { > *result =3D (Datum) 0; /* just return null result */ > return true; > } > } > > Most of the non-domain type input functions are strict. > see query result: > > select proname, pt.typname, proisstrict,pt.typtype > from pg_type pt > join pg_proc pp on pp.oid =3D pt.typinput > where pt.typtype <> 'd' > and pt.typtype <> 'p' > and proisstrict is false; > > so the second InputFunctionCallSafe will be faster for non-domain types. Agreed. BTW have you measured the overheads of calling InputFunctionCallSafe twice? If it's significant, we might want to find other ways to achieve it as it would not be good to incur overhead just for relatively rare cases. Here are some comments: + if (InputFunctionCallSafe(&in_functions[m], + NULL, + typioparams[m], + att->atttypmod, + NULL, + &values[m])) Given that we pass NULL to escontext, does this function return false in an error case? Or can we use InputFunctionCall instead? I think we should mention that SET_NULL still could fail if the data type of the column doesn't accept NULL. How about restructuring the codes around handling data incompatibility errors like: else if (!InputFunctionCallSafe(...)) { if (cstate->opts.on_error =3D=3D IGNORE) { cstate->num_errors++; if (cstate->opts.log_verbosity =3D=3D VERBOSE) write a NOTICE message; return true; // ignore whole row. } else if (cstate->opts.on_error =3D=3D SET_NULL) { current_row_erroneous =3D true; set NULL to the column; if (cstate->opts.log_verbosity =3D=3D VERBOSE) write a NOTICE message; continue; // go to the next column. } That way, we have similar structures for both on_error handling and don't need to reset cstate->cur_attname at the end of SET_NULL handling. --- From the regression tests: --fail, column a is domain with not-null constraint COPY t_on_error_null FROM STDIN WITH (on_error set_null); a 11 14 \. ERROR: domain d_int_not_null does not allow null values CONTEXT: COPY t_on_error_null, line 1, column a: "a" I guess that the log messages could confuse users since while the actual error was caused by setting NULL to the non-NULL domain type column, the context message says the data 'a' was erroneous. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com