MIME-Version: 1.0
References: <CAM+6J95roDtn05fcobKJjWhG1YdJYwzoUFhjaj-dVOCw+9m6OQ@mail.gmail.com>
 <CH0PR03MB6100337240436C3FF72FDE37FE442@CH0PR03MB6100.namprd03.prod.outlook.com>
In-Reply-To: <CH0PR03MB6100337240436C3FF72FDE37FE442@CH0PR03MB6100.namprd03.prod.outlook.com>
From: Muhammad Usman Khan <usman.k@bitnine.net>
Date: Tue, 15 Oct 2024 08:39:19 +0500
Message-ID: <CAPnRvGt12Kf4aBaU8kHKG-0w0BW-o0LnNgw8ZFeY4F+wvzcCWA@mail.gmail.com>
Subject: Re: How to Copy/Load 1 billions rows into a Partition Tables Fast
To: "Wong, Kam Fook (TR Technology)" <kamfook.wong@thomsonreuters.com>
Cc: pgsql-general <pgsql-general@postgresql.org>
Content-Type: multipart/alternative; boundary="0000000000007c5a8b06247bb3b3"
Archived-At: <https://www.postgresql.org/message-id/CAPnRvGt12Kf4aBaU8kHKG-0w0BW-o0LnNgw8ZFeY4F%2BwvzcCWA%40mail.gmail.com>
Precedence: bulk

--0000000000007c5a8b06247bb3b3
Content-Type: text/plain; charset="UTF-8"

Hi,
There are several ways to do this. One is the pg_bulkload utility
mentioned in the previous email, but I have always preferred Python
multiprocessing, which I think is more efficient: each worker process
opens its own connection and runs an INSERT ... SELECT for one range of
the partition key, so the ranges are copied in parallel. You can adapt
the code below to your requirements:

import multiprocessing
import psycopg2

def insert_partition(date_range):
    # Each worker process opens its own connection.
    conn = psycopg2.connect("dbname=your_db user=your_user password=your_password")
    try:
        with conn.cursor() as cur:
            # Pass the bounds as query parameters instead of formatting
            # them into the SQL string. BETWEEN includes both endpoints.
            cur.execute(
                """
                INSERT INTO partitioned_table (column1, column2, ...)
                SELECT column1, column2, ...
                FROM source_table
                WHERE partition_key BETWEEN %s AND %s
                """,
                date_range,
            )
        conn.commit()
    finally:
        conn.close()

if __name__ == "__main__":
    ranges = [
        ('2024-01-01', '2024-03-31'),
        ('2024-04-01', '2024-06-30'),
        # Add more ranges as needed
    ]
    # Adjust the process count based on CPU cores
    with multiprocessing.Pool(processes=4) as pool:
        pool.map(insert_partition, ranges)
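
If there are many partitions, the list of ranges does not have to be typed
by hand. Below is a minimal sketch that builds one inclusive
(first day, last day) pair per calendar month, in the same string format as
the ranges above. It assumes the table is partitioned by month on a date
column; month_ranges is a hypothetical helper name, not part of psycopg2 or
multiprocessing.

```python
from datetime import date, timedelta

def month_ranges(start, end):
    """Return inclusive (first_day, last_day) ISO date strings,
    one pair per calendar month from start's month to end's month."""
    ranges = []
    # Snap to the first day of the starting month.
    current = date(start.year, start.month, 1)
    while current <= end:
        # First day of the following month, rolling over the year in December.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        last_day = next_month - timedelta(days=1)
        ranges.append((current.isoformat(), last_day.isoformat()))
        current = next_month
    return ranges
```

For example, ranges = month_ranges(date(2024, 1, 1), date(2024, 12, 31))
gives twelve pairs that can be passed to pool.map in place of the
hand-written list. If the partition key is a timestamp rather than a date,
inclusive end dates would miss most of the last day of each month, so the
bounds would need adjusting.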


On Mon, 14 Oct 2024 at 22:59, Wong, Kam Fook (TR Technology) <
kamfook.wong@thomsonreuters.com> wrote:

> I am trying to copy a table (Postgres) that is close to 1 billion rows
> into a Partition table (Postgres) within the same DB.  What is the fastest
> way to copy the data?   This table has 37 columns where some of which are
> text data types.
>
> Thank you
> Kam Fook Wong


--0000000000007c5a8b06247bb3b3--