public inbox for [email protected]
help / color / mirror / Atom feedFrom: Matheus Alcantara <[email protected]>
To: jian he <[email protected]>
Cc: torikoshia <[email protected]>
Cc: Masahiko Sawada <[email protected]>
Cc: vignesh C <[email protected]>
Cc: Jim Jones <[email protected]>
Cc: Kirill Reshke <[email protected]>
Cc: Fujii Masao <[email protected]>
Cc: David G. Johnston <[email protected]>
Cc: Yugo NAGATA <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
Date: Tue, 27 Jan 2026 15:37:07 -0300
Message-ID: <[email protected]> (raw)
In-Reply-To: <CACJufxGYPXQ_Jz1avF5eSh_XJRsxhPSUZ+=RzG3Hz4_XNAc32g@mail.gmail.com>
References: <CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com>
<CALDaNm0FUtPjH9wz5dgwOBNtsAXEZj=0-TYGsyQXchQy-hXGrw@mail.gmail.com>
<CACJufxFpm2Gzx4AD9qKtiQSiiiunX02wNTxu0JoFm7nEKF2KUw@mail.gmail.com>
<CAD21AoBjmEYjZT9A6P-vAuwboiiYtXaMc12pX2ySmh3RXi=v8w@mail.gmail.com>
<CACJufxF6_YwAboiCaVYLRtNpO4kGbXqkXzH_7W=pUvrNXK8WuQ@mail.gmail.com>
<CAD21AoBWwEzpp3Z6tp-O-AQGftK-kuj6vRva2L5daWoWBtbnRg@mail.gmail.com>
<CACJufxGEHmijmP-QqvrmqU6cxmhgpdjY7ewQBQ=E9NmdyEcqmw@mail.gmail.com>
<[email protected]>
<CACJufxG=em0PHZvy1EAZ+vxPZ8UA68MfQ-Hji+h+WgnWNpqmVQ@mail.gmail.com>
<CACJufxF0c3k5O8up9NOY-m02nyJ0f6N1tKxZwjCewTqvvFmbLw@mail.gmail.com>
<CACJufxHr-LBV2pB9m64mA=1pgVMM4LvUn+-MzpeViWV=Ks_cyg@mail.gmail.com>
<[email protected]>
<CACJufxEs_VBV39gYHnpLFOMcUaUD-ADAst_ePTCgmoDR8O=ekg@mail.gmail.com>
<[email protected]>
<CACJufxGYPXQ_Jz1avF5eSh_XJRsxhPSUZ+=RzG3Hz4_XNAc32g@mail.gmail.com>
On 22/01/26 11:45, jian he wrote:
>> Should FORCE_NOT_NULL be allowed to be used with ON_ERROR set_null? It
>> seems to me that ON_ERROR set_null overwrite the FORCE_NOT_NULL
>> behaviour:
>>
>> postgres=# create table t4(a int, b varchar(5));
>> CREATE TABLE
>>
>> postgres=# copy t4 from 'data.csv' with (FORCE_NOT_NULL(b), format csv, delimiter ',', NULL 'NULL', ON_ERROR set_null);
>> NOTICE: invalid values in 2 rows were replaced with null due to data type incompatibility
>> COPY 5
>>
>> postgres=# \pset null 'NULL'
>> Null display is "NULL".
>> postgres=# select * from t4;
>> a | b
>> ---+------
>> 1 | aaaa
>> 2 | bbbb
>> 2 | NULL
>> 2 | NULL
>> 5 | NULL
>> (5 rows)
>>
>> Note that only the ccccc rows on .csv file was inserted with a NULL
>> value on b column. The 5,NULL row was inserted with a "NULL" string as a
>> value:
>>
>> postgres=# select * from t4 where b is null;
>> a | b
>> ---+------
>> 2 | NULL
>> 2 | NULL
>> (2 rows)
>>
>> The contents of data.csv:
>> 1,aaaa
>> 2,bbbb
>> 2,ccccc
>> 2,ccccc
>> 5,NULL
>>
>> Perhaps we should block the usage of FORCE_NOT_NULL with ON_ERROR
>> SET_NULL?
>>
> FORCE_NOT_NULL is related to how we handle NULL string in column value.
>
> We first process cstate->opts.force_notnull_flags, cstate->opts.force_null_flags
> then InputFunctionCallSafe.
> see copyfromparse.c, CopyFromTextLikeOneRow ``if (is_csv)``loop.
>
> I think these two are unrelated things, FORCE_NOT_NULL should be fine with
> ON_ERROR SET_NULL.
> you can see related tests in
> https://git.postgresql.org/cgit/postgresql.git/tree/src/test/regress/sql/copy2.sql#n330
>
> Am I missing something?
Yeah, after some more thinking it seems ok to use both options
together. I just found a bit strange when using integer columns.
Consider this example:
cat data.csv
1,11
2,22
3,
4,44
postgres=# create table t(a int not null, b int);
CREATE TABLE
postgres=# copy t from
'/Users/matheus/dev/pgdev/copy-on-error-set-null/data.csv' with
(FORCE_NOT_NULL(b), format csv, delimiter ',', ON_ERROR set_null);
NOTICE: 1 row was replaced with null due to data type incompatibility
COPY 4
postgres=# select * from t where b is null;
a | b
---+---
3 |
(1 row)
We are requiring a not null value on column b but we are still
generating rows with null values on b.
The reasoning on this is that the row 3 would generate a "invalid
input syntax for type integer" error and the ON_ERROR set_null fix
this by inserting a NULL value. It make sense I think but I'm
wondering if it could cause any confusion?
>> On monitoring.sgml we have the following for pg_stat_progress_copy
>> tuples_skipped:
>> Number of tuples skipped because they contain malformed data.
>> This counter only advances when a value other than
>> <literal>stop</literal> is specified to the <literal>ON_ERROR</literal>
>>
>> IIUC we are not updating this view if we set a column to NULL due to an
>> error, perhaps this documentation should be updated to mention that it
>> will not be updated with ON_ERROR set_null?
>>
>
> IMHO, we don't need to mention ON_ERROR set_null, since we do not support it.
> change to the following should be ok, i think.
>
> <para>
> Number of tuples skipped because they contain malformed data.
> This counter only advances when
> <literal>ignore</literal> is specified to the <literal>ON_ERROR</literal>
> option.
> </para></entry>
It looks good, I was thinking in something like this.
>>
>> I may have missing something, but we are still considering implementing
>> the REJECT_LIMIT + ON_ERROR set_null?
> Possibly as a separate patch later.
Ok, good, thanks.
--
Matheus Alcantara
EDB: https://www.enterprisedb.com
view thread (15+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox