Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1smKIF-004teF-Cs for pgsql-general@arkaria.postgresql.org; Thu, 05 Sep 2024 21:45:31 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1smKIC-000lnH-VU for pgsql-general@arkaria.postgresql.org; Thu, 05 Sep 2024 21:45:29 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1smKIC-000ln7-Gk for pgsql-general@lists.postgresql.org; Thu, 05 Sep 2024 21:45:29 +0000 Received: from mail-oa1-x2e.google.com ([2001:4860:4864:20::2e]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1smKIA-000Lhv-JV for pgsql-general@lists.postgresql.org; Thu, 05 Sep 2024 21:45:28 +0000 Received: by mail-oa1-x2e.google.com with SMTP id 586e51a60fabf-2781cb2800eso828880fac.1 for ; Thu, 05 Sep 2024 14:45:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725572724; x=1726177524; darn=lists.postgresql.org; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=sE9BhGZhQvszELmjstrjnuzc14R1Ss9RmCEyIhO+OdQ=; b=LgGuqvV6YN2Tsf8v9gG8rN3f6yTKTu1BAki1VIyg1hUnB2O9mBxUlRl1bZYunDYcLv Z0GDNQ9ZadC6nQK67A3IPCNX/PV1gvE+eeAMUMKdmuUZuFUukSVViJQNd8hg2/x/96us wguwiCJixcU3GeRBkHqD6ctC5FKS4II1ppWEnd8DDSQnH0UbqCkT8Kvn0/i5N3IKwxCT 0P6pYNF6z56nTq25BQOloIEE3qSl8k6Wlo/hLbY4WX1H8GFtX5XWX6ppSIQng3gpHv3a 6yrgZiZYCDRvc8hHGPBP8WCJwIkpqlOySDqAz8GgVzGmy79VVrlviorfc2xMqa3fwbJh DssA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725572724; x=1726177524; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=sE9BhGZhQvszELmjstrjnuzc14R1Ss9RmCEyIhO+OdQ=; b=V/sJm92VAWBU85BesmXcPEmfqcptUVWFJJedVEuc6IZEMVheRaVYmdyiflNErzez3k SlNGB4osfyMRMSOHDohY6EgBBpgJYmvGv/yfyFXfKdftOsC09NjOeUAktYkHiqCBuaq5 3kpiVomtD0TcuMBMB4WDnp7/s76iqBaSo4jetiHoTHLff8v+WJj6J05xQTHvHeo3307d XY4boN2wA7RV/2M/9vK75OyxwmrAH5i2mSi8fJwJOfvdrr9Ba2YyORHwjS21o2l4iBqm gA4k26702KrRCWO98I4s+dWOPeOjzo8hEccEMKvkm+ebQnJjOx2ZfkQEguKN08V5/xFC PSgg== X-Gm-Message-State: AOJu0YzLXuozZe1puMyN1NAyqFwZuYNPBjvx6S93WWjF8LloOPBdYp2m bdQgZLTlTNamlUlMHUbCtBNZ5CWNwbQgWuz6AYNevem3GujbI8lYW7UCv2ZPbjO2kBOm7lbWBjv iT6y+tItQBNXbbBqkiMl2MdeDqWtvZL+d X-Google-Smtp-Source: AGHT+IFSB3ZTu2JMbIlJX9UzWIpNi5DjuN1GlEfAiU1lY1cZsJKK8+xCc7qjgOYVnJXOKgFEV9hb5adzb62+5DF4l9E= X-Received: by 2002:a05:6870:8a11:b0:25e:14e9:10c9 with SMTP id 586e51a60fabf-27b82db89fcmr947540fac.6.1725572723677; Thu, 05 Sep 2024 14:45:23 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Ron Johnson Date: Thu, 5 Sep 2024 17:45:12 -0400 Message-ID: Subject: Re: Faster data load To: pgsql-general Content-Type: multipart/alternative; boundary="0000000000003a9fda062166356b" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000003a9fda062166356b Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Thu, Sep 5, 2024 at 4:14=E2=80=AFPM Lok P wrote: > Hi, > > We are having a requirement to create approx 50 billion rows in a > partition table(~1 billion rows per partition, 200+gb size daily > partitions) for a performance test. We are currently using ' insert into > select.. From or transformed query>;' method . We have dropped all indexes and constraints > First and then doing the load. Still it's taking 2-3 hours to populate on= e > partition. > At three hours, that's 92,593 records/second. Seems pretty slow. How much of that time is taken by ? How big are the records? How fast is the hardware? Is there a faster way to achieve this? > Testing is the only way to know for sure. > Few teammate suggesting to use copy command and use file load instead, > which will be faster. So I wanted to understand, how different things it > does behind the scenes as compared to insert as select command? As becaus= e > it only deals with sql engine only. > COPY is highly optimized for buffered operation. INSERT... maybe not so much. But if the source data is already in a table, that would require piping the data to stdout and then back into the database. psql appdb -c "COPY (SELECT ...) TO STDOUT;" | psql appdb -c "COPY some_table FROM STDOUT;". Use binary mode, so text conversion isn't required. Maybe that's faster, maybe not. Additionally, when we were trying to create indexes post data load on one > partition, it took 30+ minutes. Any possible way to make it faster? > > Is there any way to drive the above things in parallel by utilizing full > database resources? > Put the destination tables in a different tablespace on a different controller. > It's postgres 15.4 > Why not 15.8? --=20 Death to America, and butter sauce. Iraq lobster! --0000000000003a9fda062166356b Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Thu, Sep 5, 2024 at 4:14=E2=80=AFPM Lo= k P <loknath.73@gmail.com>= ; wrote:
Hi,=C2=A0

We are having a requirement to create approx 50 billion rows in= a partition table(~1 billion rows per partition, 200+gb size daily partiti= ons) for a performance test. We are currently using ' insert into <t= arget table_partition> select.. From <source_table_partition> or &= lt;some transformed query>;' method . We have dropped all indexes an= d constraints First and then doing the load. Still it's taking 2-3 hour= s to populate one partition.

<= div>At three hours, that's 92,593 records/second.=C2=A0 Seems pretty sl= ow.

How much of that time is taken by <some tra= nsformed query>?
How big are the records?
How fast i= s the hardware?

Is th= ere a faster way to achieve this?=C2=A0
=
Testing is the only way to know for sure.
=C2=A0
Few teammate suggesting = to use copy command and use file load instead, which will be faster. So I = wanted to understand, how different things it does behind the scenes as com= pared to insert as select command? As because it only deals with sql engine= only.=C2=A0

CO= PY is highly optimized for buffered operation.=C2=A0 INSERT... maybe not so= much.

But if the source data is already in a tabl= e, that would require piping the data to stdout and then back into the data= base.

psql appdb -c "COPY (SELECT ...) TO= STDOUT;" | psql appdb -c "COPY some_table FROM STDOUT;".=C2= =A0 Use binary mode, so text conversion isn't required.

<= /div>
Maybe that's faster, maybe not.

Additionally,= when we were trying to create indexes post data load on one partition, it = took 30+ minutes. Any possible way to make it faster?=C2=A0

Is there any way to drive the above= things in parallel by utilizing full database resources?
=

Put the destination tables in a diff= erent tablespace on a different controller.
=C2=A0
It's postgres 15.4

Why not 15.8?

--
Death to America, and butt= er sauce.
Iraq lobster!
--0000000000003a9fda062166356b--