Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sqQto-009h7t-4i for pgsql-general@arkaria.postgresql.org; Tue, 17 Sep 2024 05:37:17 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sqQtn-004uyI-DT for pgsql-general@arkaria.postgresql.org; Tue, 17 Sep 2024 05:37:15 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sqQtm-004uwX-Sg for pgsql-general@lists.postgresql.org; Tue, 17 Sep 2024 05:37:15 +0000 Received: from mail-pj1-x1033.google.com ([2607:f8b0:4864:20::1033]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sqQte-001b0x-TL for pgsql-general@lists.postgresql.org; Tue, 17 Sep 2024 05:37:14 +0000 Received: by mail-pj1-x1033.google.com with SMTP id 98e67ed59e1d1-2da4ea59658so3295107a91.0 for ; Mon, 16 Sep 2024 22:37:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bitnine-net.20230601.gappssmtp.com; s=20230601; t=1726551425; x=1727156225; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=6htFqAtptIkdBJq/4TTOW+cLZRFFe+7ieErK80TrHD8=; b=eFt/9BDNCcn4bguw7NGmrRLFJfAfuLkvFA398KaF5ECpONBqNbk4DuFC28pbOD/cvp p/a8aRqO8gv9iE3ovwHJxM1O0ZTL+TScRZ2YKXWWmUuq1V5e8jUIXHBiz1yA0i94XIfW Wy4vPOFPp42YcwzZLeocDcOvi3o4DcDduVnQG127EFDrHm4H+aKvlUpNd8YpkjBikKD3 Eb3SaqVDvjLn6mGI433xiBYdQis5EATNH/ECiHDxV/fBfgpJC9TkhtgoDpuZyDbHGGyq BKkXQzh0bVW3FR4C373ySDPlnWEUkN2TxPbKvBAzb/7d8g/0WJjv7jUCIuHLvnucuBUd a1ow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1726551425; x=1727156225; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=6htFqAtptIkdBJq/4TTOW+cLZRFFe+7ieErK80TrHD8=; b=BIHqvgHPahxG0v4dZh6pElRzz5wZqu2PXnUTAUcbhXfGO984JDgugvjqsfC526wodu +ZRwQV1hTfCCYrnszjuINlaJ3XIP7bCz0A9gDLuPgtOMRy7KVfaAV0XPQpOJjQejF/6D VRSwPoo5aBoGsNUWi0IHUg6MF9oj/T0ec1NktAi2yxSl78P1LvR/40Ht+Vs+cxYWYHpi TBh5EgmS/7Q4E46h9YwaXJn4hpj+27YzhWZu1/4cuTyAdnsuvLqMA+P9YGbLRndOFOkS PiyfM/xKXFfo8Gzs5jMGk+xZG0Wj8LxQFxB4769Ui797l5bkxCYahAQGu1UeHReCQE8v zIWg== X-Gm-Message-State: AOJu0Yyg+X96a3Zryd0mLwBPPTxrPHsToJ87gWk/My6poauWxU09vRIg 4VlEtMF7GmMBT23KKefjHv08CcoiPqo5ku4X31iY84R4kuKV0SAGYHPHxrxi4qAQaRRjI8qCc/M OznEQbdOlIOSZDOqmHDFKjF2NY5yBY2ch1AwJEsHt+nEt1U24KVs= X-Google-Smtp-Source: AGHT+IGKHjsetNROCiLVe3UZDDVl0w11uWu1AeTkEylGjwtr9uN0/t5UvKu2DYa3MPaf+DjqHon2MhCFuxEUigJwkdw= X-Received: by 2002:a17:90b:1c92:b0:2d3:bc5f:715f with SMTP id 98e67ed59e1d1-2dbb9df9afbmr16598178a91.10.1726551425323; Mon, 16 Sep 2024 22:37:05 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Muhammad Usman Khan Date: Tue, 17 Sep 2024 10:36:50 +0500 Message-ID: Subject: Re: update faster way To: yudhi s Cc: pgsql-general Content-Type: multipart/alternative; boundary="00000000000064b6d706224a14c5" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000064b6d706224a14c5 Content-Type: text/plain; charset="UTF-8" Hi, You can solve this problem using Citus in PostgreSQL, which is specifically designed for parallelism SELECT create_distributed_table('tab_part1', 'partition_key'); SELECT create_distributed_table('reference_tab', 'reference_key'); UPDATE tab_part1 SET column1 = reftab.code FROM reference_tab reftab WHERE tab_part1.column1 = reftab.column1; On Sat, 14 Sept 2024 at 08:22, yudhi s wrote: > Hello, > We have to update a column value(from numbers like '123' to codes like > 'abc' by looking into a reference table data) in a partitioned table with > billions of rows in it, with each partition having 100's millions rows. As > we tested for ~30million rows it's taking ~20minutes to update. So if we go > by this calculation, it's going to take days for updating all the values. > So my question is > > 1) If there is any inbuilt way of running the update query in parallel > (e.g. using parallel hints etc) to make it run faster? > 2) should we run each individual partition in a separate session (e.g. > five partitions will have the updates done at same time from 5 different > sessions)? And will it have any locking effect or we can just start the > sessions and let them run without impacting our live transactions? > > UPDATE tab_part1 > SET column1 = reftab.code > FROM reference_tab reftab > WHERE tab_part1.column1 = subquery.column1; > > Regards > Yudhi > --00000000000064b6d706224a14c5 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi,

You can so= lve this problem using Citus in PostgreSQL, which is specifically designed = for parallelism

SELECT create_distributed_table('tab_part1',= 'partition_key');
SELECT create_distributed_table('referenc= e_tab', 'reference_key');

UPDATE tab_part1
SET column= 1 =3D reftab.code
FROM reference_tab reftab
WHERE tab_part1.column1 = =3D reftab.column1;


On Sat, 14 Sept 2024 at = 08:22, yudhi s <learnerda= tabase99@gmail.com> wrote:
Hello,
We have to update a column val= ue(from numbers like '123' to codes like 'abc' by looking i= nto a reference table data) in a partitioned table with billions of rows in= it, with each partition having 100's millions rows. As we tested for ~= 30million rows it's taking ~20minutes to update. So if we go by this ca= lculation, it's going to take days for updating all the values. So my q= uestion is

1) If there is any inbuilt way of running the update quer= y in parallel (e.g. using parallel hints etc) to make it run faster?
2) = should we run each individual partition in a separate session (e.g. five pa= rtitions will have the updates done at same time from 5 different sessions)= ? And will it have any locking effect or we can just=C2=A0start the session= s and let them run without impacting our live transactions?

UPDATE t= ab_part1
SET column1 =3D reftab.code
FROM reference_tab reftab
WHE= RE tab_part1.column1 =3D subquery.column1;

Regards
Yudhi
--00000000000064b6d706224a14c5--