public inbox for [email protected]  
help / color / mirror / Atom feed
Re: Please implement a catch-all error handler per row, for COPY
8+ messages / 7 participants
[nested] [flat]

* Re: Please implement a catch-all error handler per row, for COPY
@ 2025-03-01 16:20 Tom Lane <[email protected]>
  2025-03-01 16:35 ` Re: Please implement a catch-all error handler per row, for COPY David G. Johnston <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: Tom Lane @ 2025-03-01 16:20 UTC (permalink / raw)
  To: me nefcanto <[email protected]>; +Cc: [email protected]

me nefcanto <[email protected]> writes:
> Can you please provide a row-level catch-all handler for the copy command?

Very unlikely to happen.  COPY is not intended as a general purpose
ETL mechanism, and we don't want to load it down with features that
would create substantial performance penalties.  Which that would.

Even ignoring the performance angle, this request seems remarkably
ill-specified.  What is a "row-level handler" for errors that have
to do with identifying row boundaries?

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: Please implement a catch-all error handler per row, for COPY
  2025-03-01 16:20 Re: Please implement a catch-all error handler per row, for COPY Tom Lane <[email protected]>
@ 2025-03-01 16:35 ` David G. Johnston <[email protected]>
  2025-03-01 17:14   ` Re: Please implement a catch-all error handler per row, for COPY Greg Sabino Mullane <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: David G. Johnston @ 2025-03-01 16:35 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: me nefcanto <[email protected]>; [email protected]

On Sat, Mar 1, 2025 at 9:20 AM Tom Lane <[email protected]> wrote:

> me nefcanto <[email protected]> writes:
> > Can you please provide a row-level catch-all handler for the copy
> command?
>
> Very unlikely to happen.  COPY is not intended as a general purpose
> ETL mechanism, and we don't want to load it down with features that
> would create substantial performance penalties.  Which that would.
>

Maybe it isn't a general purpose ETL tool but there is no reasonable way to
do some things unless COPY can be put into a mode that doesn't have the
same performance requirements it needs to serve as our dump/restore tool of
choice.

I have to imagine such a mode, if not enabled, would have little to no
impact on how COPY behaves compared to today.  It's kinda like VACUUM FULL
existing doesn't impact how VACUUM behaves.  Seems more desirable than
inventing a whole new SQL Command to do this and copy-paste all of the COPY
code since it does mostly the same thing.

In short, it's probably worth giving it a try if someone wants to.
Rejecting it without seeing the proposal seems premature.

Now, it's probably challenging enough that if the person requesting the
feature isn't driving its development the odds of it getting worked on is
fairly low.

David J.


^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: Please implement a catch-all error handler per row, for COPY
  2025-03-01 16:20 Re: Please implement a catch-all error handler per row, for COPY Tom Lane <[email protected]>
  2025-03-01 16:35 ` Re: Please implement a catch-all error handler per row, for COPY David G. Johnston <[email protected]>
@ 2025-03-01 17:14   ` Greg Sabino Mullane <[email protected]>
  2025-03-03 03:44     ` Re: Please implement a catch-all error handler per row, for COPY me nefcanto <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: Greg Sabino Mullane @ 2025-03-01 17:14 UTC (permalink / raw)
  To: David G. Johnston <[email protected]>; +Cc: Tom Lane <[email protected]>; me nefcanto <[email protected]>; [email protected]

FYI the -bugs thread in question:

https://www.postgresql.org/message-id/flat/CAEHBEOBCweDWGNHDaUk4%3D10HG0QXXJJAGXbEnFLMB30M%2BQw%2Bdg...

seems to imply the primary blocker was a unique constraint.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: Please implement a catch-all error handler per row, for COPY
  2025-03-01 16:20 Re: Please implement a catch-all error handler per row, for COPY Tom Lane <[email protected]>
  2025-03-01 16:35 ` Re: Please implement a catch-all error handler per row, for COPY David G. Johnston <[email protected]>
  2025-03-01 17:14   ` Re: Please implement a catch-all error handler per row, for COPY Greg Sabino Mullane <[email protected]>
@ 2025-03-03 03:44     ` me nefcanto <[email protected]>
  2025-03-03 03:54       ` Re: Please implement a catch-all error handler per row, for COPY Christophe Pettus <[email protected]>
  2025-03-03 04:07       ` Re: Please implement a catch-all error handler per row, for COPY Tom Lane <[email protected]>
  2025-03-03 05:33       ` Re: Please implement a catch-all error handler per row, for COPY Adrian Klaver <[email protected]>
  2025-03-03 08:55       ` Re: Please implement a catch-all error handler per row, for COPY Laurenz Albe <[email protected]>
  0 siblings, 4 replies; 8+ messages in thread

From: me nefcanto @ 2025-03-03 03:44 UTC (permalink / raw)
  To: Greg Sabino Mullane <[email protected]>; +Cc: David G. Johnston <[email protected]>; Tom Lane <[email protected]>; [email protected]

Thank you all for responding. With respect, I don't understand why COPY
gets related to ETL. All of the requirements I mentioned above have nothing
to do with ETL. We have a table of categories. A hierarchical table. This
is a multitenant app. Hundreds of thousands of records are in it. We want
to calculate the hierarchical properties (nested set models, node depth,
determining leaves, materialized paths, etc.) and then update the entire
table. What does this have to do with ETL? Or as I said we create millions
of records for products, attributes of products, and pricing and media of
products to load test our system. Again, what does that have to do with ETL?

The point is, that there is already an `on_error ignore` clause there. This
means that somewhere there is a try/catch per row. If I'm wrong, please let
me know. But when the `on_error ignore` catches problem x for each row,
then it can catch all problems for each row without any performance problem.

Let me give you an example in C#:

try
{
}
catch (SomeException ex)
{
}

becomes:

try
{
}
catch (Exception ex)
{
    if (ex is SomeException)
    {
    }
}

The last catch clause catches everything. How does it affect performance?
Running a simple if for hundreds of millions of iterations is literally
nothing in time complexity.

As I have specified in the bug thread, from 11 RDBMSs, 7 support this. Thus
it's not an uncommon weird request.

Regards
Saeed

On Sat, Mar 1, 2025 at 8:45 PM Greg Sabino Mullane <[email protected]>
wrote:

> FYI the -bugs thread in question:
>
>
> https://www.postgresql.org/message-id/flat/CAEHBEOBCweDWGNHDaUk4%3D10HG0QXXJJAGXbEnFLMB30M%2BQw%2Bdg...
>
> seems to imply the primary blocker was a unique constraint.
>
> Cheers,
> Greg
>
> --
> Crunchy Data - https://www.crunchydata.com
> Enterprise Postgres Software Products & Tech Support
>
>


^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: Please implement a catch-all error handler per row, for COPY
  2025-03-01 16:20 Re: Please implement a catch-all error handler per row, for COPY Tom Lane <[email protected]>
  2025-03-01 16:35 ` Re: Please implement a catch-all error handler per row, for COPY David G. Johnston <[email protected]>
  2025-03-01 17:14   ` Re: Please implement a catch-all error handler per row, for COPY Greg Sabino Mullane <[email protected]>
  2025-03-03 03:44     ` Re: Please implement a catch-all error handler per row, for COPY me nefcanto <[email protected]>
@ 2025-03-03 03:54       ` Christophe Pettus <[email protected]>
  3 siblings, 0 replies; 8+ messages in thread

From: Christophe Pettus @ 2025-03-03 03:54 UTC (permalink / raw)
  To: me nefcanto <[email protected]>; +Cc: Greg Sabino Mullane <[email protected]>; David G. Johnston <[email protected]>; Tom Lane <[email protected]>; [email protected]



> On Mar 2, 2025, at 19:44, me nefcanto <[email protected]> wrote:
> 
> As I have specified in the bug thread, from 11 RDBMSs, 7 support this. Thus it's not an uncommon weird request.

If your organization is interested in producing a design and a patch, or paying a developer or organization to do so, that would be the best way forward.  Everyone who works on the core PostgreSQL code is either a volunteer or extremely busy.  Those that are paid to work on PostgreSQL usually have their priorities mapped out for months (years?) already.  The best way to convince the community to adopt a feature to come bearing a high-quality patch.  That's no guarantee, of course, but the probability is much higher that way.





^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: Please implement a catch-all error handler per row, for COPY
  2025-03-01 16:20 Re: Please implement a catch-all error handler per row, for COPY Tom Lane <[email protected]>
  2025-03-01 16:35 ` Re: Please implement a catch-all error handler per row, for COPY David G. Johnston <[email protected]>
  2025-03-01 17:14   ` Re: Please implement a catch-all error handler per row, for COPY Greg Sabino Mullane <[email protected]>
  2025-03-03 03:44     ` Re: Please implement a catch-all error handler per row, for COPY me nefcanto <[email protected]>
@ 2025-03-03 04:07       ` Tom Lane <[email protected]>
  3 siblings, 0 replies; 8+ messages in thread

From: Tom Lane @ 2025-03-03 04:07 UTC (permalink / raw)
  To: me nefcanto <[email protected]>; +Cc: Greg Sabino Mullane <[email protected]>; David G. Johnston <[email protected]>; [email protected]

me nefcanto <[email protected]> writes:
> The point is, that there is already an `on_error ignore` clause there. This
> means that somewhere there is a try/catch per row. If I'm wrong, please let
> me know.

You are wrong.  See

https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=d9f7f5d32

https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=9e2d87011

and a ton of related commits.

It's possible that the specific case of unique-index violations
could be handled in a similar style.  But "catch any error whatever"
is simply not going to happen here, because a subtransaction per
row is not practical.

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: Please implement a catch-all error handler per row, for COPY
  2025-03-01 16:20 Re: Please implement a catch-all error handler per row, for COPY Tom Lane <[email protected]>
  2025-03-01 16:35 ` Re: Please implement a catch-all error handler per row, for COPY David G. Johnston <[email protected]>
  2025-03-01 17:14   ` Re: Please implement a catch-all error handler per row, for COPY Greg Sabino Mullane <[email protected]>
  2025-03-03 03:44     ` Re: Please implement a catch-all error handler per row, for COPY me nefcanto <[email protected]>
@ 2025-03-03 05:33       ` Adrian Klaver <[email protected]>
  3 siblings, 0 replies; 8+ messages in thread

From: Adrian Klaver @ 2025-03-03 05:33 UTC (permalink / raw)
  To: me nefcanto <[email protected]>; Greg Sabino Mullane <[email protected]>; +Cc: David G. Johnston <[email protected]>; Tom Lane <[email protected]>; [email protected]

On 3/2/25 19:44, me nefcanto wrote:
> Thank you all for responding. With respect, I don't understand why COPY 
> gets related to ETL. All of the requirements I mentioned above have 
> nothing to do with ETL. We have a table of categories. A hierarchical 
> table. This is a multitenant app. Hundreds of thousands of records are 
> in it. We want to calculate the hierarchical properties (nested set 
> models, node depth, determining leaves, materialized paths, etc.) and 
> then update the entire table. What does this have to do with ETL? Or as

Pretty sure it defines ETL, extract data from a source, transform it to 
meet the properties and load it.

At any rate:

1) ON_ERROR IGNORE only just appeared in Postgres 17.

2) Any further changes to it would constitute a feature change that can 
only happen in major version upgrade. The next major version is due this 
Fall and development is already well under way. The chances of it making 
it in given the time frame is slim, so that pushes a practicable 
deadline until Fall of 2026.

3) As I mentioned in a previous post there are packages already out 
there that may get you want you want now.



-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: Please implement a catch-all error handler per row, for COPY
  2025-03-01 16:20 Re: Please implement a catch-all error handler per row, for COPY Tom Lane <[email protected]>
  2025-03-01 16:35 ` Re: Please implement a catch-all error handler per row, for COPY David G. Johnston <[email protected]>
  2025-03-01 17:14   ` Re: Please implement a catch-all error handler per row, for COPY Greg Sabino Mullane <[email protected]>
  2025-03-03 03:44     ` Re: Please implement a catch-all error handler per row, for COPY me nefcanto <[email protected]>
@ 2025-03-03 08:55       ` Laurenz Albe <[email protected]>
  3 siblings, 0 replies; 8+ messages in thread

From: Laurenz Albe @ 2025-03-03 08:55 UTC (permalink / raw)
  To: me nefcanto <[email protected]>; Greg Sabino Mullane <[email protected]>; +Cc: David G. Johnston <[email protected]>; Tom Lane <[email protected]>; [email protected]

On Mon, 2025-03-03 at 07:14 +0330, me nefcanto wrote:
> The point is, that there is already an `on_error ignore` clause there. This means that
> somewhere there is a try/catch per row. If I'm wrong, please let me know.

The crucial point that Tom referred to is the "per row".  What is a row?

Imagine you have a COPY FROM statement for a table with three columns,
and the data look like this:

1,Smith,John
2,Lewis,Jerry
Lee
3,Prince
4,Albe,Laurenz

We may be able to guess what is meant, but how shall the machine know
where the line boundaries are, which data to ignore and which to process?

Currently, that is no problem, because errors are only identified
after the data have been successfully parsed.

Yours,
Laurenz Albe






^ permalink  raw  reply  [nested|flat] 8+ messages in thread


end of thread, other threads:[~2025-03-03 08:55 UTC | newest]

Thread overview: 8+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-03-01 16:20 Re: Please implement a catch-all error handler per row, for COPY Tom Lane <[email protected]>
2025-03-01 16:35 ` David G. Johnston <[email protected]>
2025-03-01 17:14   ` Greg Sabino Mullane <[email protected]>
2025-03-03 03:44     ` me nefcanto <[email protected]>
2025-03-03 03:54       ` Christophe Pettus <[email protected]>
2025-03-03 04:07       ` Tom Lane <[email protected]>
2025-03-03 05:33       ` Adrian Klaver <[email protected]>
2025-03-03 08:55       ` Laurenz Albe <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox