Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tAb1A-00GdVN-6v for pgsql-hackers@arkaria.postgresql.org; Mon, 11 Nov 2024 20:28:11 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tAb17-000LvM-8T for pgsql-hackers@arkaria.postgresql.org; Mon, 11 Nov 2024 20:28:09 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tAb16-000LvD-Tm for pgsql-hackers@lists.postgresql.org; Mon, 11 Nov 2024 20:28:09 +0000 Received: from mail-lj1-x22d.google.com ([2a00:1450:4864:20::22d]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1tAb13-001Pzl-Hb for pgsql-hackers@lists.postgresql.org; Mon, 11 Nov 2024 20:28:08 +0000 Received: by mail-lj1-x22d.google.com with SMTP id 38308e7fff4ca-2fb5fa911aaso62095531fa.2 for ; Mon, 11 Nov 2024 12:28:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1731356885; x=1731961685; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=oqM6hn1RA+4iws4epy7wmn0r59uRFA3tfT+gzTlaLmM=; b=gobebgYBkbZeGhRMTsWUc3NoHsWboDHOY3z4ji/OYIYlhrzf6V+Gr+bDctrZGSxBB3 uw8oQNdBN1ISS0vbmahC1Bw1MjMIHgh88shjR7g8dGcyRbuM+ZxqGNaJ63URq623XsHn 4HmD6r+X2SOcqVT16xQhvmMPpsGkC5I+8rnYBXGbFEG6eBlW2Me+K0+xttxWY0c/NQSP qy1oA/vVepI3JNtQAMNBRiR2zP/wxmW5n+7JmUcn+zBV7VOL4Z+J1eHmLt4ejBDlbCBB +7IS0XWed/gkbZXhpoG7b8LhHQrl2UrQHlDRWlsLYmNhudLirSbrEMcoHXNGSnJz/ifH GyHA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731356885; x=1731961685; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=oqM6hn1RA+4iws4epy7wmn0r59uRFA3tfT+gzTlaLmM=; b=aBbTrcV78D58nJlg8hNdwN4e/cq2fojHLnfUUr30cWREpmKdl4S5/5G9dzUuf9I/2C Bffe4JTOpUf75FrtyzwSOCEeX8wZmawo4kj3QDSMK9uLkXXv95QNFJnb9XDcGzz0Ngsw XtNfGC0u4gdEQ6RLoXkWX6mmOXQZDLQD4LuWlH3SboZuXgCSahiKAdd+hpXYVkuDAvaH 4YacQzU+vmBK+m+CWVJZOYBIjuYaWU6TYEU0ctXSpwCsIsmda/vT5hgPQ6mzNJPIWoge qzPDIMCOMu2oB2ce+hoh3GrDPTmPHjWpHGFJ6zYJFXh6W8WWI89YHGqNxq+cIJS9Wa9a DY+A== X-Forwarded-Encrypted: i=1; AJvYcCUuvMtnlUkT6pLru6R1hgC08RuVfuyXNXhbmKM6VK6iRtPzsO/Hzz4+qwN6xZ5yHiDfrXFg1w5XyNq16XZD@lists.postgresql.org X-Gm-Message-State: AOJu0YyleMju0ULLVVwBNO8Hm7+69JqObkaf85cAPRb+TRbCqf38RBmc ZgnurPokoBOnhMF+8VZT3jOWY/koB1rHtOD4B1iDhYQqMBZE1ao4kznsFKa3G8Elq2iA6Pp5JiB rWSNPv3GP5JWbSr0p8INfrxFBGDE= X-Google-Smtp-Source: AGHT+IGsWdwWjisjOpMoKtXkxJDd1sgujAEOOa9Y+9489N3tE+CaYrDRsU58attN5gwByW5/XqCxhOR/vqjZuexETwU= X-Received: by 2002:a05:651c:1146:b0:2fa:de13:5c34 with SMTP id 38308e7fff4ca-2ff201c80b1mr96499811fa.19.1731356885061; Mon, 11 Nov 2024 12:28:05 -0800 (PST) MIME-Version: 1.0 References: <04bf425ad1b15a4daefe96c478a5253b@oss.nttdata.com> <20240206191937.72eaf0ccc20cfea37944b422@sraoss.co.jp> <76da9fcc-93c5-4053-872e-12932a95356d@uni-muenster.de> <6eac5b45-7f45-4c7a-aae1-e90db8be2e08@uni-muenster.de> <3d6b5885-16a1-475d-b56f-41701c48d9d4@uni-muenster.de> <63595e8f-a245-4335-aa22-7e449a70e210@oss.nttdata.com> <07587c36-18b3-4ccb-b5fb-579bcb04ed37@oss.nttdata.com> <501dd655ddb04693c15baeb6485bc601@oss.nttdata.com> In-Reply-To: <501dd655ddb04693c15baeb6485bc601@oss.nttdata.com> From: Kirill Reshke Date: Tue, 12 Nov 2024 01:27:53 +0500 Message-ID: Subject: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row To: torikoshia Cc: Fujii Masao , jian he , Jim Jones , "David G. Johnston" , Yugo NAGATA , PostgreSQL Hackers Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Mon, 11 Nov 2024 at 16:11, torikoshia wrote: > > On 2024-11-09 21:55, Kirill Reshke wrote: > > Thanks for working on this! Thanks for reviewing the v7 patch series! > > On Thu, 7 Nov 2024 at 23:00, Fujii Masao > > wrote: > >> > >> > >> > >> On 2024/10/26 6:03, Kirill Reshke wrote: > >> > when the REJECT LIMIT is set to some non-zero number and the number of > >> > row NULL replacements exceeds the limit, is it OK to fail. Because > >> > there WAS errors, and we should not tolerate more than $limit errors . > >> > I do find this behavior to be consistent. > >> > >> +1 > >> > >> > >> > But what if we don't set a REJECT LIMIT, it is sane to do all > >> > replacements, as if REJECT LIMIT is inf. > >> > >> +1 > > > > After thinking for a while, I'm now more opposed to this approach. I > > think we should count rows with erroneous data as errors only if > > null substitution for these rows failed, not the total number of rows > > which were modified. > > Then, to respect the REJECT LIMIT option, we compare this number with > > the limit. This is actually simpler approach IMHO. What do You think? > > IMHO I prefer the previous interpretation. > I'm not sure this is what people expect, but I assume that REJECT_LIMIT > is used to specify how many malformed rows are acceptable in the > "original" data source. I do like the first version of interpretation, but I have a struggle with it. According to this interpretation, we will fail COPY command if the number of malformed data rows exceeds the limit, not the number of rejected rows (some percentage of malformed rows are accepted with null substitution) So, a proper name for the limit will be MALFORMED_LIMIT, or something. However, we are unable to change the name since the REJECT_LIMIT option has already been committed. I guess I'll just have to put up with this contradiction. I will send an updated patch shortly... > >> > But while I was trying to implement that, I realized that I don't > >> > understand v4 of this patch. My misunderstanding is about > >> > `t_on_error_null` tests. We are allowed to insert a NULL value for the > >> > first column of t_on_error_null using COPY ON_ERROR SET_TO_NULL. Why > >> > do we do that? My thought is we should try to execute > >> > InputFunctionCallSafe with NULL value (i mean, here [1]) for the > >> > column after we failed to insert the input value. And, if this second > >> > call is successful, we do replacement, otherwise we count the row as > >> > erroneous. > >> > >> Your concern is valid. Allowing NULL to be stored in a column with a > >> NOT NULL > >> constraint via COPY ON_ERROR=SET_TO_NULL does seem unexpected. As you > >> suggested, > >> NULL values set by SET_TO_NULL should probably be re-evaluated. > > > > Thank you. I updated the patch with a NULL re-evaluation. > > > > > > PFA v7. I did not yet update the doc for this patch version, waiting > > for feedback about REJECT LIMIT + SET_TO_NULL behaviour. > > > > > There were warnings when I applied the patch: > > $ git apply > v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch > v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:170: > trailing whitespace. > /* > v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:181: > trailing whitespace. > > v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:189: > trailing whitespace. > /* If datatype if okay with NULL, > replace > v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:196: > trailing whitespace. > > v7-0001-Incrtoduce-COPY-option-to-replace-columns-contain.patch:212: > trailing whitespace. > /* > > > @@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState > > *pstate, bool is_from) > > parser_errposition(pstate, def->location))); > ... > > > > - if (opts_out->reject_limit && !opts_out->on_error) > > + if (opts_out->reject_limit && !(opts_out->on_error == > > COPY_ON_ERROR_NULL || opts_out->on_error == COPY_ON_ERROR_IGNORE)) > > Hmm, is this change necessary? > Personally, I feel the previous code is easier to read. > > > "REJECT LIMIT" should be "REJECT_LIMIT"? > > 1037 errhint("Consider specifying the > > REJECT LIMIT option to skip erroneous rows."))); > > > SET_TO_NULL is one of the value for ON_ERROR, but the patch treats > SET_TO_NULL as option for COPY: > > 221 --- a/src/bin/psql/tab-complete.in.c > 222 +++ b/src/bin/psql/tab-complete.in.c > 223 @@ -3235,7 +3235,7 @@ match_previous_words(int pattern_id, > 224 COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL", > 225 "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE", > 226 "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", > "DEFAULT", > 227 - "ON_ERROR", "LOG_VERBOSITY"); > 228 + "ON_ERROR", "SET_TO_NULL", "LOG_VERBOSITY"); > > > > Best regards, > > Kirill Reshke > > -- > Regards, > > -- > Atsushi Torikoshi > Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K. -- Best regards, Kirill Reshke