Re: Faster data load - Ron Johnson

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Ron Johnson <[email protected]>
To: pgsql-general <[email protected]>
Subject: Re: Faster data load
Date: Thu, 5 Sep 2024 17:45:12 -0400
Message-ID: <CANzqJaCxizjVfdiDvxeGyrWUisefQ3y9Z3DVdA5rEU4PC5SLJw@mail.gmail.com> (raw)
In-Reply-To: <CAKna9VaVsDzfOfOGu1+grStp9BBHFMKrH5DCEbbtGcQUWJ74KQ@mail.gmail.com>
References: <CAKna9VaVsDzfOfOGu1+grStp9BBHFMKrH5DCEbbtGcQUWJ74KQ@mail.gmail.com>

On Thu, Sep 5, 2024 at 4:14 PM Lok P <[email protected]> wrote:

> Hi,
>
> We are having a requirement to create approx 50 billion rows in a
> partition table(~1 billion rows per partition, 200+gb size daily
> partitions) for a performance test. We are currently using ' insert into
> <target table_partition> select.. From <source_table_partition> or <some
> transformed query>;' method . We have dropped all indexes and constraints
> First and then doing the load. Still it's taking 2-3 hours to populate one
> partition.
>

At three hours, that's 92,593 records/second.  Seems pretty slow.

How much of that time is taken by <some transformed query>?
How big are the records?
How fast is the hardware?

Is there a faster way to achieve this?
>

Testing is the only way to know for sure.

> Few teammate suggesting to use copy command and use file load instead,
> which will be faster. So I wanted to understand, how different things it
> does behind the scenes as compared to insert as select command? As because
> it only deals with sql engine only.
>

COPY is highly optimized for buffered operation.  INSERT... maybe not so
much.

But if the source data is already in a table, that would require piping the
data to stdout and then back into the database.

psql appdb -c "COPY (SELECT ...) TO STDOUT;" | psql appdb -c "COPY
some_table FROM STDOUT;".  Use binary mode, so text conversion isn't
required.

Maybe that's faster, maybe not.

Additionally, when we were trying to create indexes post data load on one
> partition, it took 30+ minutes. Any possible way to make it faster?
>
> Is there any way to drive the above things in parallel by utilizing full
> database resources?
>

Put the destination tables in a different tablespace on a different
controller.

> It's postgres 15.4
>

Why not 15.8?

-- 
Death to America, and butter sauce.
Iraq lobster!

view thread (9+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: Faster data load
  In-Reply-To: <CANzqJaCxizjVfdiDvxeGyrWUisefQ3y9Z3DVdA5rEU4PC5SLJw@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox