MIME-Version: 1.0
References: <CAKna9VaVsDzfOfOGu1+grStp9BBHFMKrH5DCEbbtGcQUWJ74KQ@mail.gmail.com>
In-Reply-To: <CAKna9VaVsDzfOfOGu1+grStp9BBHFMKrH5DCEbbtGcQUWJ74KQ@mail.gmail.com>
From: Muhammad Usman Khan <usman.k@bitnine.net>
Date: Fri, 6 Sep 2024 08:49:51 +0500
Message-ID: <CAPnRvGswoifDeosThKM5zkFZzph5u86ozYkrETpGD=oig14pOA@mail.gmail.com>
Subject: Re: Faster data load
To: Lok P <loknath.73@gmail.com>
Cc: pgsql-general <pgsql-general@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="00000000000047a30506216b4dea"
Archived-At: <https://www.postgresql.org/message-id/CAPnRvGswoifDeosThKM5zkFZzph5u86ozYkrETpGD%3Doig14pOA%40mail.gmail.com>
Precedence: bulk

--00000000000047a30506216b4dea
Content-Type: text/plain; charset="UTF-8"

Hi,

You can use pg_partman to create and maintain the partitions. Since your
table is partitioned, you can also load several partitions concurrently,
with one session per partition, instead of one after another. Alternatively,
you can use Citus. It can be an excellent solution, especially for handling
large data volumes and parallelizing data operations.
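
To illustrate the concurrent-load idea, here is a minimal sketch in Python.
It is only an outline under assumptions, not a tested recipe for your
system: the partition naming scheme, the helper names and the `run_sql`
callable are all hypothetical, and `run_sql` is expected to open its own
database connection for each statement, since one session runs only one
statement at a time.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from typing import Callable, List, Sequence


def daily_partition_names(parent: str, start: date, days: int) -> List[str]:
    """Return assumed daily partition names, e.g. 'tab_p2024_09_01'.

    The '<parent>_pYYYY_MM_DD' pattern is a placeholder; use whatever
    names your partitions really have.
    """
    return [f"{parent}_p{(start + timedelta(days=i)):%Y_%m_%d}" for i in range(days)]


def build_insert(target: str, source: str) -> str:
    """Build one per-partition INSERT ... SELECT statement."""
    return f"INSERT INTO {target} SELECT * FROM {source}"


def load_concurrently(
    statements: Sequence[str],
    run_sql: Callable[[str], object],
    workers: int = 4,
) -> list:
    """Run each statement in its own worker thread.

    run_sql is supplied by the caller (hypothetical here). It should
    connect, execute the statement, commit and close.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the input order; list() re-raises any worker error.
        return list(pool.map(run_sql, statements))
```

The worker count is a knob to tune against the I/O and CPU capacity of the
server; too many concurrent loads can end up slower than a few.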


On Fri, 6 Sept 2024 at 01:14, Lok P <loknath.73@gmail.com> wrote:

> Hi,
>
> We have a requirement to create approximately 50 billion rows in a
> partitioned table (~1 billion rows per partition, 200+ GB daily
> partitions) for a performance test. We are currently using an 'insert into
> <target_table_partition> select ... from <source_table_partition> or <some
> transformed query>;' approach. We dropped all indexes and constraints
> first and then ran the load. Still, it takes 2-3 hours to populate one
> partition. Is there a faster way to achieve this?
>
> A few teammates suggest using the COPY command to load from a file
> instead, saying it will be faster. So I wanted to understand what it does
> differently behind the scenes compared to INSERT ... SELECT, given that
> the latter involves only the SQL engine.
>
> Additionally, when we tried creating indexes after the data load on one
> partition, it took 30+ minutes. Is there any way to make it faster?
>
> Is there any way to run the above steps in parallel, using the full
> resources of the database?
>
> This is PostgreSQL 15.4.
>
> Regards
> Lok
>

--00000000000047a30506216b4dea--