public inbox for [email protected]  
Multiple COPY statements for one table vs one for ~half a billion records
2+ messages / 2 participants

* Multiple COPY statements for one table vs one for ~half a billion records
@ 2024-04-04 18:03 Carl L <[email protected]>

From: Carl L @ 2024-04-04 18:03 UTC
  To: pgsql-general

Hi there,

I have around half a billion records generated by a back end that splits
the work across 80 threads (one per core). Each thread performs a COPY from
memory (FROM STDIN BINARY) into Postgres, i.e. there are 80 COPY statements
running concurrently against one table. I can see each of the Postgres
processes sitting at around 15% CPU usage.

These are all also in the same transaction - I am the only one connected,
so it's not an issue to hold a big transaction.

I can see that many of the Postgres backend processes show the wait event
"LWLock: BufferContent", which I assume means that they are waiting on each
other before they can write to the table. Would it therefore be more
efficient to combine all of these into a single COPY statement?
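
One way to see how widespread the contention is would be to sample
pg_stat_activity from a separate session while the load runs and tally the
wait events. A minimal sketch in Python, assuming the rows are fetched with
whatever driver is at hand as (wait_event_type, wait_event) tuples; the
query filter and the helper name summarize_waits are illustrative and not
part of the original setup:

```python
from collections import Counter

# Run this from a separate monitoring session while the COPYs are active.
# pg_stat_activity exposes wait_event_type and wait_event per backend.
WAIT_QUERY = """
SELECT wait_event_type, wait_event
FROM pg_stat_activity
WHERE backend_type = 'client backend'
  AND query ILIKE 'COPY%'
"""

def summarize_waits(rows):
    """Count backends per wait event; a NULL wait_event means not waiting."""
    counts = Counter()
    for wait_type, wait_event in rows:
        key = "running" if wait_event is None else f"{wait_type}:{wait_event}"
        counts[key] += 1
    # Most frequent wait event first.
    return counts.most_common()
```

Sampling this a few times during the load gives a rough picture of how many
of the 80 backends are blocked on the same lock at any moment.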

Thanks!



* Re: Multiple COPY statements for one table vs one for ~half a billion records
@ 2024-04-04 18:15 Ron Johnson <[email protected]>

From: Ron Johnson @ 2024-04-04 18:15 UTC
  To: pgsql-general

On Thu, Apr 4, 2024 at 2:04 PM Carl L <[email protected]> wrote:

> Hi there,
>
> I have around half a billion records generated by a back end that splits
> the work across 80 threads (one per core). Each thread performs a COPY from
> memory (FROM STDIN BINARY) into Postgres, i.e. there are 80 COPY statements
> running concurrently against one table. I can see each of the Postgres
> processes sitting at around 15% CPU usage.
>

Is the target table partitioned in the same way that the input data is
split?

That would make things faster...
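
To illustrate what partitioning "in the same way" could look like, here is
a rough sketch that generates DDL for one LIST partition per producer
thread, so that each COPY can target its own partition directly. The table
name, the worker_id key column, and both helper functions are hypothetical;
the thread does not say how the data is actually split, and rows copied
straight into a partition must carry the matching key value.

```python
def partition_ddl(table, columns, n_workers, key="worker_id"):
    """Return DDL for a parent table plus one LIST partition per worker."""
    stmts = [
        f"CREATE TABLE {table} ({key} int NOT NULL, {columns}) "
        f"PARTITION BY LIST ({key});"
    ]
    for i in range(n_workers):
        stmts.append(
            f"CREATE TABLE {table}_p{i} PARTITION OF {table} "
            f"FOR VALUES IN ({i});"
        )
    return stmts

def copy_statement(table, worker):
    """COPY aimed directly at a single worker's partition."""
    return f"COPY {table}_p{worker} FROM STDIN (FORMAT binary)"
```

For example, partition_ddl("records", "payload bytea", 80) yields one
parent table and 80 partitions, and thread 7 would then issue
copy_statement("records", 7).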


> These are all also in the same transaction - I am the only one connected,
> so it's not an issue to hold a big transaction.
>

Unless it fills up your WAL partition.
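
A rough way to gauge that risk would be to read pg_current_wal_lsn() before
and after loading a small slice of the data and extrapolate the WAL volume
to the full load. The sketch below only does the LSN arithmetic on the
client side (the server-side pg_wal_lsn_diff() function gives the same
number); the helper names are made up for illustration.

```python
def lsn_to_int(lsn):
    """Convert a pg_lsn string such as '16/B374D848' to a byte position.

    The text form is two hexadecimal numbers: the high 32 bits and the
    low 32 bits of the 64-bit WAL position, separated by a slash.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def wal_bytes_between(start_lsn, end_lsn):
    """Bytes of WAL written between two sampled LSNs."""
    return lsn_to_int(end_lsn) - lsn_to_int(start_lsn)
```

Multiplying the result for the trial slice by the ratio of total rows to
trial rows gives a ballpark figure to compare against the free space on the
WAL partition.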




