Re: How to Copy/Load 1 billions rows into a Partition Tables Fast

public inbox for [email protected]  
help / color / mirror / Atom feed

From: David Rowley <[email protected]>
To: Wong, Kam Fook (TR Technology) <[email protected]>
Cc: pgsql-general <[email protected]>
Subject: Re: How to Copy/Load 1 billions rows into a Partition Tables Fast
Date: Tue, 15 Oct 2024 17:57:46 +1300
Message-ID: <CAApHDvre3nuCLA4DVfYSW_EAiuV9FKaZV6k7_dPMSa9d6_Mg2Q@mail.gmail.com> (raw)
In-Reply-To: <CH0PR03MB6100337240436C3FF72FDE37FE442@CH0PR03MB6100.namprd03.prod.outlook.com>
References: <CAM+6J95roDtn05fcobKJjWhG1YdJYwzoUFhjaj-dVOCw+9m6OQ@mail.gmail.com>
	<CH0PR03MB6100337240436C3FF72FDE37FE442@CH0PR03MB6100.namprd03.prod.outlook.com>

On Tue, 15 Oct 2024 at 06:59, Wong, Kam Fook (TR Technology)
<[email protected]> wrote:
> I am trying to copy a table (Postgres) that is close to 1 billion rows into a Partition table (Postgres) within the same DB.  What is the fastest way to copy the data?   This table has 37 columns where some of which are text data types.

Is the purpose of this question because you're partitioning an
existing table?  If so, you might want to consider if speed is the
biggest consideration to the requirements. It is possible, for
example, to partition a table "online" by using table inheritance as
an intermediate way to partition the table and migrate the rows in
smaller batches into inheritance child tables with CHECK constraints
backing up the partition constraint. You can use a CTE with a DELETE
.. WHERE <clause to some small batch of rows> RETURNING with an INSERT
INTO new_table SELECT * FROM cte;.  Once the inheritance parent table
is empty, you can then consider rearranging the inheritance hierarchy
into a partitioned table and its partitions. The CHECK constraint will
allow the tables to be ATTACHed as partitions to a new partitioned
table without having to scan each partition to ensure no rows violate
the partition constraint. If done correctly, the only blocking
operation done is some DDL which includes renaming a table and
attaching all the partitions. All of that should be metadata-only
operations. You'll want to rehearse the migration a few times away
from production to help ensure it'll run smoothly on the day.

I'm not familiar with pg_bulkload so can't comment on the other
suggestions, however, I'd be surprised if exporting the data out of
and back into PostgreSQL would be faster than having it remain inside
PostgreSQL. Not exporting/importing means you don't need to call
output and input functions for every row and column. If you didn't
want to go down the inheritance table as an intermediate step, then
you might find it's quite fast to start up a series of parallel jobs
to INSERT INTO partition_name SELECT * FROM original_table WHERE <rows
for this partition>;

David

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: How to Copy/Load 1 billions rows into a Partition Tables Fast
  In-Reply-To: <CAApHDvre3nuCLA4DVfYSW_EAiuV9FKaZV6k7_dPMSa9d6_Mg2Q@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox