From: Muhammad Usman Khan
Date: Tue, 15 Oct 2024 08:39:19 +0500
Subject: Re: How to Copy/Load 1 billions rows into a Partition Tables Fast
To: "Wong, Kam Fook (TR Technology)"
Cc: pgsql-general

Hi,

There are many ways to do this. One is the pg_bulkload utility, as described in the previous email, but I have always preferred Python multiprocessing, which I think is more efficient. Below is code that you can adapt to your requirements:

import multiprocessing
import psycopg2

def insert_partition(date_range):
    # Each worker process opens its own connection.
    conn = psycopg2.connect("dbname=your_db user=your_user password=your_password")
    try:
        cur = conn.cursor()
        # The bounds are passed as query parameters rather than formatted
        # into the SQL string. Note that BETWEEN is inclusive on both ends.
        query = """
            INSERT INTO partitioned_table (column1, column2, ...)
            SELECT column1, column2, ...
            FROM source_table
            WHERE partition_key BETWEEN %s AND %s;
        """
        cur.execute(query, date_range)
        conn.commit()
        cur.close()
    finally:
        # Close the connection even if the insert fails.
        conn.close()

if __name__ == "__main__":
    ranges = [
        ('2024-01-01', '2024-03-31'),
        ('2024-04-01', '2024-06-30'),
        # Add more ranges as needed
    ]
    pool = multiprocessing.Pool(processes=4)  # Adjust based on CPU cores
    pool.map(insert_partition, ranges)
    pool.close()
    pool.join()

On Mon, 14 Oct 2024 at 22:59, Wong, Kam Fook (TR Technology) <kamfook.wong@thomsonreuters.com> wrote:

> I am trying to copy a table (Postgres) that is close to 1 billion rows
> into a Partition table (Postgres) within the same DB. What is the fastest
> way to copy the data? This table has 37 columns where some of which are
> text data types.
>
> Thank you
> Kam Fook Wong
>
> This e-mail is for the sole use of the intended recipient and contains
> information that may be privileged and/or confidential. If you are not an
> intended recipient, please notify the sender by return e-mail and delete
> this e-mail and any attachments. Certain required legal entity disclosures
> can be accessed on our website:
> https://www.thomsonreuters.com/en/resources/disclosures.html
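A note on the ranges list in the script above: writing the date pairs by hand gets tedious once there are many partitions. As a minimal sketch, assuming the target table is partitioned by calendar month (an assumption; the original question does not say how it is partitioned), the pairs could be generated instead. The helper name month_ranges is hypothetical and not part of any library.

```python
from datetime import date, timedelta


def month_ranges(start, end):
    """Yield (first_day, last_day) ISO date strings, one pair per calendar
    month, covering start..end inclusive.

    Hypothetical helper; assumes monthly partitions.
    """
    current = date(start.year, start.month, 1)
    while current <= end:
        # First day of the following month.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # Clip the month to the requested start and end dates.
        first = max(current, start)
        last = min(next_month - timedelta(days=1), end)
        yield (first.isoformat(), last.isoformat())
        current = next_month


# Six inclusive monthly ranges covering the first half of 2024.
ranges = list(month_ranges(date(2024, 1, 1), date(2024, 6, 30)))
```

The pairs are inclusive on both ends so they line up with BETWEEN in the query. If partition_key is a timestamp rather than a date, an upper bound such as '2024-01-31' is read as midnight at the start of that day, so half-open bounds (>= first day, < first day of the next month) would probably be the safer choice.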