public inbox for [email protected]  
From: Peter J. Holzer <[email protected]>
To: [email protected]
Subject: Re: Faster data load
Date: Sun, 8 Sep 2024 19:45:39 +0200
Message-ID: <[email protected]>
In-Reply-To: <CAKna9VaVsDzfOfOGu1+grStp9BBHFMKrH5DCEbbtGcQUWJ74KQ@mail.gmail.com>
References: <CAKna9VaVsDzfOfOGu1+grStp9BBHFMKrH5DCEbbtGcQUWJ74KQ@mail.gmail.com>

On 2024-09-06 01:44:00 +0530, Lok P wrote:
> We have a requirement to create approx. 50 billion rows in a partitioned
> table (~1 billion rows per partition, 200+ GB daily partitions) for a
> performance test. We are currently using the 'insert into <target table_partition>
> select ... from <source_table_partition> or <some transformed query>;' method.
> We dropped all indexes and constraints first and then ran the load.
> Still, it's taking 2-3 hours to populate one partition.

That seems quite slow. Is the table very wide or does it have a large
number of indexes?

> Is there a faster way to achieve this? 
> 
> A few teammates suggest using the COPY command and loading from a file
> instead, claiming it will be faster.

I doubt that.

I benchmarked several strategies for populating tables 5 years ago and
(for my test data and on our hardware at the time - YMMV) a simple
INSERT ... SELECT was more than twice as fast as 8 parallel COPY
operations (and about 8 times as fast as a single COPY).

Details will have changed since then (I should rerun that benchmark on
a current system), but I'd be surprised if COPY became that much faster
relative to INSERT ... SELECT.
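A rough way to try this kind of comparison without a server, sketched in Python against an in-memory SQLite database. This is a toy under stated assumptions, not the benchmark described above: SQLite has no COPY command, so the "CSV round trip" function only stands in for a file-based load (serialize rows to text, parse them back, insert them). The table names, the row count N_ROWS and all helper functions are hypothetical. Timings from it say nothing definite about PostgreSQL; a real test would run INSERT ... SELECT and COPY against a PostgreSQL partition of realistic width.

```python
import csv
import io
import sqlite3
import time

# Hypothetical toy size; kept small so the sketch runs in seconds.
N_ROWS = 200_000


def make_db() -> sqlite3.Connection:
    """Create an in-memory DB with a populated source table and empty target."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source (id INTEGER, payload TEXT)")
    conn.execute("CREATE TABLE target (id INTEGER, payload TEXT)")
    conn.executemany(
        "INSERT INTO source VALUES (?, ?)",
        ((i, f"row-{i}") for i in range(N_ROWS)),
    )
    conn.commit()
    return conn


def load_insert_select(conn: sqlite3.Connection) -> None:
    """Set-based copy that never leaves the database engine."""
    conn.execute("INSERT INTO target SELECT id, payload FROM source")
    conn.commit()


def load_via_text_file(conn: sqlite3.Connection) -> None:
    """Dump rows to CSV text, then parse and re-insert them.

    Assumed stand-in for a file-based load: SQLite has no COPY,
    so this only approximates the extra serialize/parse round trip.
    """
    buf = io.StringIO()
    csv.writer(buf).writerows(conn.execute("SELECT id, payload FROM source"))
    buf.seek(0)
    rows = ((int(i), p) for i, p in csv.reader(buf))
    conn.executemany("INSERT INTO target VALUES (?, ?)", rows)
    conn.commit()


def time_strategy(load) -> tuple[float, int]:
    """Run one strategy on a fresh database; return (seconds, rows loaded)."""
    conn = make_db()
    start = time.perf_counter()
    load(conn)
    elapsed = time.perf_counter() - start
    (count,) = conn.execute("SELECT count(*) FROM target").fetchone()
    conn.close()
    return elapsed, count


if __name__ == "__main__":
    for name, fn in [("INSERT ... SELECT", load_insert_select),
                     ("CSV round trip", load_via_text_file)]:
        secs, count = time_strategy(fn)
        print(f"{name:20s} {count:>8d} rows  {secs:.3f} s")
```

Running it as a script prints one timing line per strategy; absolute numbers depend entirely on the machine, so only the relative difference is of any interest.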

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | [email protected]         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"


