Re: update faster way - Laurenz Albe

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Laurenz Albe <[email protected]>
To: yudhi s <[email protected]>
Cc: pgsql-general <[email protected]>
Subject: Re: update faster way
Date: Sat, 14 Sep 2024 19:23:31 +0200
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAEzWdqcxzKOMMe3LfTjfnOXwhRZNyci-aMO0ko4HYYAs8yYAFA@mail.gmail.com>
References: <CAEzWdqd-22B-bpVdT3yzegiOig9zvJnfJvi=GOMFfHT-Jg8CgQ@mail.gmail.com>
	<[email protected]>
	<CAEzWdqcxzKOMMe3LfTjfnOXwhRZNyci-aMO0ko4HYYAs8yYAFA@mail.gmail.com>

On Sat, 2024-09-14 at 16:10 +0530, yudhi s wrote:
> > However, the strain on your system's resources and particularly the row
> > locks will impair normal database work.
> > 
> > Essentially, you can either take an extended down time or perform the updates
> > in very small chunks with a very low "lock_timeout" over a very long period
> > of time.  If any of the batches fails because of locking conflicts, it has
> > to be retried.
> > 
> > Investigate with EXPLAIN (ANALYZE) why the updates take that long.  It could
> > be a lame disk, tons of (unnecessary?) indexes or triggers, but it might as
> > well be the join with the lookup table, so perhaps there is room for
> > improvement (more "work_mem" for a hash join?)
> 
> We have mostly insert/update happen on current day/live partition. So
> considering that, if we will run batch updates(with batch size of 1000) from
> five different sessions in parallel on different historical partition, at any
> time they will lock 5000 rows and then commit. And also those rows will not
> collide with each other. So do you think that approach can anyway cause locking
> issues?

The updates won't lock with each other.  I thought that other database activity
might modify rows in these partitions.  If that is not the case, you don't need
to worry about locks.

In that case I would also choose a much higher batch size.

You should make sure to back off every now and then and VACUUM the partition,
so that you avoid excessive table bloat.

> We will ensure the update of live partition occurs when we have least activity.
> So in that way we will not need extended down time. Please correct me if wrong.

That sounds right.

> Never used lock_timeout though, but in above case do we need lock_timeout?

It can be useful if your updating process is blocked by a lock from the
application.  Setting the parameter to a low value will keep your update
from hanging for a long time and will throw an error instead.
Erroring out early reduces the danger of a deadlock.

> Regarding batch update with batch size of 1000, do we have any method exists
> in postgres (say like forall statement in Oracle) which will do the batch dml.
> Can you please guide me here, how we can do it in postgres.

I would certainly not perform the update row for row in PL/pgSQL code.
Perhaps something like this:

  DO
  $$DECLARE
     i bigint;
  BEGIN
     FOR i IN 1..1000000 by 100000 LOOP
        UPDATE tab SET ...
        WHERE id >= i AND id < i + 100000;

        COMMIT;
     END LOOP;
  END;$$;

  VACUUM tab;

Then repeat for the next million rows, and so on.

> And yes will need to see what happens in the update using explain analyze.
> And I was trying to see, if we can run explain analyze without doing
> actual update , but seems that is not possible. 

You can do it in a transaction and roll the transaction back.

Yours,
Laurenz Albe

view thread (2+ messages)

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: update faster way
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox