MIME-Version: 1.0
References: <CAKna9VaVsDzfOfOGu1+grStp9BBHFMKrH5DCEbbtGcQUWJ74KQ@mail.gmail.com>
In-Reply-To: <CAKna9VaVsDzfOfOGu1+grStp9BBHFMKrH5DCEbbtGcQUWJ74KQ@mail.gmail.com>
From: Ron Johnson <ronljohnsonjr@gmail.com>
Date: Thu, 5 Sep 2024 17:45:12 -0400
Message-ID: <CANzqJaCxizjVfdiDvxeGyrWUisefQ3y9Z3DVdA5rEU4PC5SLJw@mail.gmail.com>
Subject: Re: Faster data load
To: pgsql-general <pgsql-general@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="0000000000003a9fda062166356b"
Archived-At: <https://www.postgresql.org/message-id/CANzqJaCxizjVfdiDvxeGyrWUisefQ3y9Z3DVdA5rEU4PC5SLJw%40mail.gmail.com>
Precedence: bulk

--0000000000003a9fda062166356b
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Thu, Sep 5, 2024 at 4:14 PM Lok P <loknath.73@gmail.com> wrote:

> Hi,
>
> We are having a requirement to create approx 50 billion rows in a
> partition table(~1 billion rows per partition, 200+gb size daily
> partitions) for a performance test. We are currently using ' insert into
> <target table_partition> select.. From <source_table_partition> or <some
> transformed query>;' method . We have dropped all indexes and constraints
> First and then doing the load. Still it's taking 2-3 hours to populate
> one partition.
>

At three hours, that's 92,593 records/second.  Seems pretty slow.
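
For anyone who wants to check that figure, here is a minimal sketch of the
arithmetic. It assumes exactly 1 billion rows and treats the "200+gb"
partition as 200 decimal gigabytes; both are round numbers taken from the
original post, not measurements, and the function names are only
illustrative.

```python
def rows_per_second(rows: int, hours: float) -> float:
    """Average insert rate for a load that takes `hours` hours."""
    return rows / (hours * 3600)


def megabytes_per_second(gigabytes: float, hours: float) -> float:
    """Average write rate, using decimal units (1 GB = 1000 MB)."""
    return gigabytes * 1000 / (hours * 3600)


if __name__ == "__main__":
    # Assumed round figures: 1 billion rows and 200 GB per partition.
    for hours in (2, 3):
        rate = rows_per_second(1_000_000_000, hours)
        mb = megabytes_per_second(200, hours)
        print(f"{hours} h: {rate:,.0f} rows/s, {mb:.1f} MB/s")
```

Under those assumptions the three-hour case works out to under 20 MB/s of
table data, which is why the hardware and the source query are worth a look.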

How much of that time is taken by <some transformed query>?
How big are the records?
How fast is the hardware?

> Is there a faster way to achieve this?
>

Testing is the only way to know for sure.


> Few teammate suggesting to use copy command and use file load instead,
> which will be faster. So I wanted to understand, how different things it
> does behind the scenes as compared to insert as select command? As
> because it only deals with sql engine only.
>

COPY is highly optimized for buffered operation.  INSERT... maybe not so
much.

But if the source data is already in a table, COPY would require piping the
data to stdout and then back into the database:

psql appdb -c "COPY (SELECT ...) TO STDOUT" \
  | psql appdb -c "COPY some_table FROM STDIN"

Use binary mode in both COPY commands, so text conversion isn't required.

Maybe that's faster, maybe not.
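
If you want to script that pipe so it can be timed against INSERT ... SELECT,
something like the following might do. It is only a sketch: the helper names
are made up, the query and table names are placeholders, and it assumes psql
is on the PATH and can connect without prompting. Binary format also assumes
the column types of the query and the target table match exactly.

```python
import subprocess


def build_copy_commands(db: str, select_sql: str, target_table: str):
    """Return (export_argv, import_argv) for a binary COPY pipe via psql."""
    export_cmd = [
        "psql", db, "-c",
        f"COPY ({select_sql}) TO STDOUT WITH (FORMAT binary)",
    ]
    import_cmd = [
        "psql", db, "-c",
        f"COPY {target_table} FROM STDIN WITH (FORMAT binary)",
    ]
    return export_cmd, import_cmd


def run_copy_pipeline(export_cmd, import_cmd) -> int:
    """Pipe the exporter's stdout into the importer; return a nonzero code on failure."""
    exporter = subprocess.Popen(export_cmd, stdout=subprocess.PIPE)
    importer = subprocess.Popen(import_cmd, stdin=exporter.stdout)
    # Drop our copy of the pipe so the exporter sees a broken pipe
    # if the importer exits early.
    exporter.stdout.close()
    importer_rc = importer.wait()
    exporter_rc = exporter.wait()
    return exporter_rc or importer_rc


if __name__ == "__main__":
    # Placeholder names; substitute the real query and partition.
    exp, imp = build_copy_commands(
        "appdb", "SELECT * FROM source_partition", "some_table"
    )
    print(" ".join(exp))
    print(" ".join(imp))
```

Wrapping run_copy_pipeline in a timer and running it for one partition, next
to the existing INSERT ... SELECT, is the kind of test I mean.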

> Additionally, when we were trying to create indexes post data load on one
> partition, it took 30+ minutes. Any possible way to make it faster?
>
> Is there any way to drive the above things in parallel by utilizing full
> database resources?
>

Put the destination tables in a different tablespace on a different
controller.


> It's postgres 15.4
>

Why not 15.8?

--=20
Death to America, and butter sauce.
Iraq lobster!


--0000000000003a9fda062166356b--