Re: How to Copy/Load 1 billions rows into a Partition Tables Fast


From: Muhammad Usman Khan <[email protected]>
To: Wong, Kam Fook (TR Technology) <[email protected]>
Cc: pgsql-general <[email protected]>
Subject: Re: How to Copy/Load 1 billions rows into a Partition Tables Fast
Date: Tue, 15 Oct 2024 08:39:19 +0500
Message-ID: <CAPnRvGt12Kf4aBaU8kHKG-0w0BW-o0LnNgw8ZFeY4F+wvzcCWA@mail.gmail.com>
In-Reply-To: <CH0PR03MB6100337240436C3FF72FDE37FE442@CH0PR03MB6100.namprd03.prod.outlook.com>
References: <CAM+6J95roDtn05fcobKJjWhG1YdJYwzoUFhjaj-dVOCw+9m6OQ@mail.gmail.com>
	<CH0PR03MB6100337240436C3FF72FDE37FE442@CH0PR03MB6100.namprd03.prod.outlook.com>

Hi,
There are many ways to do this. One is the pg_bulkload utility, as
described in the previous email, but I have always preferred Python
multiprocessing, which I think is more efficient. Below is code that
you can modify to suit your requirements:

import multiprocessing
import psycopg2

def insert_partition(date_range):
    # Each worker process opens its own connection; a psycopg2
    # connection must not be shared across processes.
    conn = psycopg2.connect("dbname=your_db user=your_user password=your_password")
    try:
        with conn.cursor() as cur:
            # Pass the range bounds as query parameters instead of
            # formatting them into the SQL string.
            cur.execute(
                """
                INSERT INTO partitioned_table (column1, column2, ...)
                SELECT column1, column2, ...
                FROM source_table
                WHERE partition_key BETWEEN %s AND %s;
                """,
                date_range,
            )
        conn.commit()
    finally:
        conn.close()

if __name__ == "__main__":
    # Ranges must not overlap, or rows will be copied twice.
    ranges = [
        ('2024-01-01', '2024-03-31'),
        ('2024-04-01', '2024-06-30'),
        # Add more ranges as needed
    ]
    # Adjust the number of processes based on CPU cores
    with multiprocessing.Pool(processes=4) as pool:
        pool.map(insert_partition, ranges)
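
The list of ranges does not have to be typed by hand. Below is a minimal
sketch that generates one (first day, last day) pair per calendar month.
It assumes monthly ranges suit your partitioning, and the helper name
month_ranges is only illustrative, so adjust both as needed:

```python
from datetime import date, timedelta

def month_ranges(start, end):
    """Yield (first_day, last_day) ISO date strings for every month
    from the month of `start` through the month of `end`, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        first = date(year, month, 1)
        # Advance to the first day of the following month.
        if month == 12:
            year, month = year + 1, 1
        else:
            month += 1
        # The last day of the current month is one day before that.
        last = date(year, month, 1) - timedelta(days=1)
        yield (first.isoformat(), last.isoformat())

# Six monthly ranges covering January through June 2024.
ranges = list(month_ranges(date(2024, 1, 1), date(2024, 6, 30)))
```

The resulting list can be passed to pool.map() in place of the hard-coded
one. If partition_key is a timestamp rather than a date, BETWEEN with
date-only bounds stops at midnight of the last day, so a half-open
comparison (>= first day AND < first day of the next month) would be
safer there.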



On Mon, 14 Oct 2024 at 22:59, Wong, Kam Fook (TR Technology) <
[email protected]> wrote:

> I am trying to copy a table (Postgres) that is close to 1 billion rows
> into a Partition table (Postgres) within the same DB.  What is the fastest
> way to copy the data?   This table has 37 columns where some of which are
> text data types.
>
> Thank you
> Kam Fook Wong
