public inbox for [email protected]  
From: Carl L <[email protected]>
To: [email protected]
Subject: Multiple COPY statements for one table vs one for ~half a billion records
Date: Thu, 4 Apr 2024 11:03:56 -0700
Message-ID: <CAPtGvF9i5XunrgFUWYrCLnmnD0akdLKBQLdO1qsz9C5nz0m3ZQ@mail.gmail.com>

Hi there,

I have around half a billion records generated by a back end and split
across 80 threads (one per core). Each thread streams its share from memory
into Postgres with COPY ... FROM STDIN in binary format, so there are 80
COPY statements loading the same table concurrently. Each of the
corresponding Postgres backend processes sits at around 15% CPU usage.
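A minimal sketch (Python, standard library only) of what each thread in
this setup sends: a binary COPY payload built from that thread's slice of
the records. It assumes every field is already encoded in PostgreSQL's
binary wire form (for example an int8 as 8 big-endian bytes, text as UTF-8
bytes); `encode_binary_copy` and `split_evenly` are hypothetical helpers,
not part of any driver's API.

```python
from __future__ import annotations

import struct
from typing import Iterable, Optional, Sequence

# 11-byte signature that opens every binary COPY stream.
PGCOPY_SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def encode_binary_copy(rows: Iterable[Sequence[Optional[bytes]]]) -> bytes:
    """Build one COPY ... FROM STDIN (FORMAT binary) payload.

    Each field must already be in binary wire form; None means NULL.
    """
    out = bytearray(PGCOPY_SIGNATURE)
    out += struct.pack("!ii", 0, 0)  # flags field, header extension length
    for row in rows:
        out += struct.pack("!h", len(row))  # field count for this tuple
        for field in row:
            if field is None:
                out += struct.pack("!i", -1)  # NULL: length -1, no data bytes
            else:
                out += struct.pack("!i", len(field)) + field
    out += struct.pack("!h", -1)  # file trailer
    return bytes(out)


def split_evenly(n_records: int, n_workers: int) -> list[range]:
    """Hypothetical helper: contiguous, near-equal slices, one per worker."""
    base, extra = divmod(n_records, n_workers)
    slices, start = [], 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)
        slices.append(range(start, start + size))
        start += size
    return slices


# Usage: two rows of a hypothetical (id bigint, name text) table.
rows = [
    (struct.pack("!q", 1), "a".encode("utf-8")),
    (struct.pack("!q", 2), None),  # NULL name
]
payload = encode_binary_copy(rows)
```

Each payload carries its own header and trailer, so merging the 80 streams
into a single COPY would mean writing one header, all tuples, then one
trailer, not concatenating the per-thread payloads. The bytes would then be
handed to a client library's COPY support (psycopg 3's `Cursor.copy()` is
one option, assuming that driver is in use).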

They all also run in the same transaction. I am the only user connected,
so holding one large transaction is not a problem.

I can see that many of the Postgres backend processes show the wait event
"LWLock: BufferContent", which I assume means they are waiting on each
other before they can write to the table. Would it therefore be more
efficient to combine all of these into a single COPY statement?
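One way to quantify "many" before and after such a change is to sample
pg_stat_activity repeatedly during the load and tally what the COPY
backends are waiting on. A minimal sketch, assuming the rows returned by
the query below are collected with whatever client is already in use;
`wait_event_share` is a hypothetical helper, and a NULL wait_event is
treated here as "not waiting" (running or runnable on CPU).

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Optional, Tuple

# Run this repeatedly (e.g. once a second) while the COPYs are in progress.
WAIT_EVENT_QUERY = """
SELECT wait_event_type, wait_event
FROM pg_stat_activity
WHERE state = 'active' AND query ILIKE 'COPY%'
"""

WaitRow = Tuple[Optional[str], Optional[str]]


def wait_event_share(samples: Iterable[Iterable[WaitRow]]) -> dict[str, float]:
    """Fraction of sampled backends seen in each wait state, most common first."""
    counts: Counter[str] = Counter()
    for snapshot in samples:  # one snapshot = rows from one query execution
        for wtype, wevent in snapshot:
            key = "not waiting" if wevent is None else f"{wtype}:{wevent}"
            counts[key] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: n / total for key, n in counts.most_common()}
```

Comparing these shares for 80 concurrent COPYs against fewer (or one)
would show whether BufferContent contention actually shrinks, rather than
relying on a single glance at the wait events.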

Thanks!

