public inbox for [email protected]  
help / color / mirror / Atom feed
update faster way
4+ messages / 4 participants
[nested] [flat]

* update faster way
@ 2024-09-14 03:13  yudhi s <[email protected]>
  0 siblings, 3 replies; 4+ messages in thread

From: yudhi s @ 2024-09-14 03:13 UTC (permalink / raw)
  To: pgsql-general <[email protected]>

Hello,
We have to update a column value(from numbers like '123' to codes like
'abc' by looking into a reference table data) in a partitioned table with
billions of rows in it, with each partition having 100's millions rows. As
we tested for ~30million rows it's taking ~20minutes to update. So if we go
by this calculation, it's going to take days for updating all the values.
So my question is

1) If there is any inbuilt way of running the update query in parallel
(e.g. using parallel hints etc) to make it run faster?
2) should we run each individual partition in a separate session (e.g. five
partitions will have the updates done at same time from 5 different
sessions)? And will it have any locking effect or we can just start the
sessions and let them run without impacting our live transactions?

UPDATE tab_part1
SET column1 = reftab.code
FROM reference_tab reftab
WHERE tab_part1.column1 = subquery.column1;

Regards
Yudhi


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: update faster way
@ 2024-09-14 03:29  Igor Korot <[email protected]>
  parent: yudhi s <[email protected]>
  2 siblings, 0 replies; 4+ messages in thread

From: Igor Korot @ 2024-09-14 03:29 UTC (permalink / raw)
  To: yudhi s <[email protected]>; +Cc: pgsql-general <[email protected]>

Hii,

On Fri, Sep 13, 2024 at 10:22 PM yudhi s <[email protected]> wrote:
>
> Hello,
> We have to update a column value(from numbers like '123' to codes like 'abc' by looking into a reference table data) in a partitioned table with billions of rows in it, with each partition having 100's millions rows. As we tested for ~30million rows it's taking ~20minutes to update. So if we go by this calculation, it's going to take days for updating all the values. So my question is
>
> 1) If there is any inbuilt way of running the update query in parallel (e.g. using parallel hints etc) to make it run faster?
> 2) should we run each individual partition in a separate session (e.g. five partitions will have the updates done at same time from 5 different sessions)? And will it have any locking effect or we can just start the sessions and let them run without impacting our live transactions?

Do you have any indexes?
If not - you should, if yes - what are they?

Thank you.

>
> UPDATE tab_part1
> SET column1 = reftab.code
> FROM reference_tab reftab
> WHERE tab_part1.column1 = subquery.column1;
>
> Regards
> Yudhi






^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: update faster way
@ 2024-09-17 05:36  Muhammad Usman Khan <[email protected]>
  parent: yudhi s <[email protected]>
  2 siblings, 0 replies; 4+ messages in thread

From: Muhammad Usman Khan @ 2024-09-17 05:36 UTC (permalink / raw)
  To: yudhi s <[email protected]>; +Cc: pgsql-general <[email protected]>

Hi,

You can solve this problem using Citus in PostgreSQL, which is specifically
designed for parallelism

SELECT create_distributed_table('tab_part1', 'partition_key');
SELECT create_distributed_table('reference_tab', 'reference_key');

UPDATE tab_part1
SET column1 = reftab.code
FROM reference_tab reftab
WHERE tab_part1.column1 = reftab.column1;


On Sat, 14 Sept 2024 at 08:22, yudhi s <[email protected]> wrote:

> Hello,
> We have to update a column value(from numbers like '123' to codes like
> 'abc' by looking into a reference table data) in a partitioned table with
> billions of rows in it, with each partition having 100's millions rows. As
> we tested for ~30million rows it's taking ~20minutes to update. So if we go
> by this calculation, it's going to take days for updating all the values.
> So my question is
>
> 1) If there is any inbuilt way of running the update query in parallel
> (e.g. using parallel hints etc) to make it run faster?
> 2) should we run each individual partition in a separate session (e.g.
> five partitions will have the updates done at same time from 5 different
> sessions)? And will it have any locking effect or we can just start the
> sessions and let them run without impacting our live transactions?
>
> UPDATE tab_part1
> SET column1 = reftab.code
> FROM reference_tab reftab
> WHERE tab_part1.column1 = subquery.column1;
>
> Regards
> Yudhi
>


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: update faster way
@ 2024-09-17 08:28  Alvaro Herrera <[email protected]>
  parent: yudhi s <[email protected]>
  2 siblings, 0 replies; 4+ messages in thread

From: Alvaro Herrera @ 2024-09-17 08:28 UTC (permalink / raw)
  To: yudhi s <[email protected]>; +Cc: pgsql-general <[email protected]>

On 2024-Sep-14, yudhi s wrote:

> Hello,
> We have to update a column value(from numbers like '123' to codes like
> 'abc' by looking into a reference table data) in a partitioned table with
> billions of rows in it, with each partition having 100's millions rows.

Another option is to not update anything, and instead create a view on
top of the partitioned table (joined to the reference table) that
returns the reference value instead of the original number value from
the column; when the application wants to receive those reference
values, it queries the view instead of the partitioned table directly.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"I think my standards have lowered enough that now I think 'good design'
is when the page doesn't irritate the living f*ck out of me." (JWZ)






^ permalink  raw  reply  [nested|flat] 4+ messages in thread


end of thread, other threads:[~2024-09-17 08:28 UTC | newest]

Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-09-14 03:13 update faster way yudhi s <[email protected]>
2024-09-14 03:29 ` Igor Korot <[email protected]>
2024-09-17 05:36 ` Muhammad Usman Khan <[email protected]>
2024-09-17 08:28 ` Alvaro Herrera <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox