Message-ID: <37bc4217078946546b62c6ca2f2e458b798b1f36.camel@cybertec.at>
Subject: Re: update faster way
From: Laurenz Albe <laurenz.albe@cybertec.at>
To: yudhi s <learnerdatabase99@gmail.com>
Cc: pgsql-general <pgsql-general@lists.postgresql.org>
Date: Sat, 14 Sep 2024 19:23:31 +0200
In-Reply-To: <CAEzWdqcxzKOMMe3LfTjfnOXwhRZNyci-aMO0ko4HYYAs8yYAFA@mail.gmail.com>
References: 
	<CAEzWdqd-22B-bpVdT3yzegiOig9zvJnfJvi=GOMFfHT-Jg8CgQ@mail.gmail.com>
	 <1955d0e6cafd643520d282a74d9956340983074e.camel@cybertec.at>
	 <CAEzWdqcxzKOMMe3LfTjfnOXwhRZNyci-aMO0ko4HYYAs8yYAFA@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
User-Agent: Evolution 3.52.4 (3.52.4-1.fc40) 
MIME-Version: 1.0
Archived-At: <https://www.postgresql.org/message-id/37bc4217078946546b62c6ca2f2e458b798b1f36.camel%40cybertec.at>
Precedence: bulk

On Sat, 2024-09-14 at 16:10 +0530, yudhi s wrote:
> > However, the strain on your system's resources and particularly the row
> > locks will impair normal database work.
> >=20
> > Essentially, you can either take an extended down time or perform the u=
pdates
> > in very small chunks with a very low "lock_timeout" over a very long pe=
riod
> > of time.=C2=A0 If any of the batches fails because of locking conflicts=
, it has
> > to be retried.
> >=20
> > Investigate with EXPLAIN (ANALYZE) why the updates take that long.=C2=
=A0 It could
> > be a lame disk, tons of (unnecessary?) indexes or triggers, but it migh=
t as
> > well be the join with the lookup table, so perhaps there is room for
> > improvement (more "work_mem" for a hash join?)
>=20
> We have mostly insert/update happen on current day/live partition. So
> considering that, if we will run batch updates(with batch size of 1000) f=
rom
> five different sessions in parallel on different historical partition, at=
 any
> time they will lock 5000 rows and then commit. And also those rows will n=
ot
> collide with each other. So do you think that approach can anyway cause l=
ocking
> issues?

The updates won't lock with each other.  I thought that other database acti=
vity
might modify rows in these partitions.  If that is not the case, you don't =
need
to worry about locks.

In that case I would also choose a much higher batch size.

You should make sure to back off every now and then and VACUUM the partitio=
n,
so that you avoid excessive table bloat.

> We will ensure the update of live partition occurs when we have least act=
ivity.
> So in that way we will not need extended down time. Please correct me if =
wrong.

That sounds right.

> Never used lock_timeout though, but in above case do we need lock_timeout=
?

It can be useful if your updating process is blocked by a lock from the
application.  Setting the parameter to a low value will keep your update
from hanging for a long time and will throw an error instead.
Erroring out early reduces the danger of a deadlock.

> Regarding batch update with batch size of 1000, do we have any method exi=
sts
> in postgres (say like forall statement in Oracle) which will do the batch=
 dml.
> Can you please guide me here, how we can do it in postgres.

I would certainly not perform the update row for row in PL/pgSQL code.
Perhaps something like this:

  DO
  $$DECLARE
     i bigint;
  BEGIN
     FOR i IN 1..1000000 by 100000 LOOP
        UPDATE tab SET ...
        WHERE id >=3D i AND id < i + 100000;

        COMMIT;
     END LOOP;
  END;$$;

  VACUUM tab;

Then repeat for the next million rows, and so on.

> And yes will need to see what happens in the update using explain analyze=
.
> And I was trying to see, if we can run explain analyze without doing
> actual update , but seems that is not possible.=C2=A0

You can do it in a transaction and roll the transaction back.

Yours,
Laurenz Albe