public inbox for [email protected]
help / color / mirror / Atom feedFrom: Laurenz Albe <[email protected]>
To: yudhi s <[email protected]>
Cc: pgsql-general <[email protected]>
Subject: Re: update faster way
Date: Sat, 14 Sep 2024 19:23:31 +0200
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAEzWdqcxzKOMMe3LfTjfnOXwhRZNyci-aMO0ko4HYYAs8yYAFA@mail.gmail.com>
References: <CAEzWdqd-22B-bpVdT3yzegiOig9zvJnfJvi=GOMFfHT-Jg8CgQ@mail.gmail.com>
<[email protected]>
<CAEzWdqcxzKOMMe3LfTjfnOXwhRZNyci-aMO0ko4HYYAs8yYAFA@mail.gmail.com>
On Sat, 2024-09-14 at 16:10 +0530, yudhi s wrote:
> > However, the strain on your system's resources and particularly the row
> > locks will impair normal database work.
> >
> > Essentially, you can either take an extended down time or perform the updates
> > in very small chunks with a very low "lock_timeout" over a very long period
> > of time. If any of the batches fails because of locking conflicts, it has
> > to be retried.
> >
> > Investigate with EXPLAIN (ANALYZE) why the updates take that long. It could
> > be a lame disk, tons of (unnecessary?) indexes or triggers, but it might as
> > well be the join with the lookup table, so perhaps there is room for
> > improvement (more "work_mem" for a hash join?)
>
> We have mostly insert/update happen on current day/live partition. So
> considering that, if we will run batch updates(with batch size of 1000) from
> five different sessions in parallel on different historical partition, at any
> time they will lock 5000 rows and then commit. And also those rows will not
> collide with each other. So do you think that approach can anyway cause locking
> issues?
The updates won't lock with each other. I thought that other database activity
might modify rows in these partitions. If that is not the case, you don't need
to worry about locks.
In that case I would also choose a much higher batch size.
You should make sure to back off every now and then and VACUUM the partition,
so that you avoid excessive table bloat.
> We will ensure the update of live partition occurs when we have least activity.
> So in that way we will not need extended down time. Please correct me if wrong.
That sounds right.
> Never used lock_timeout though, but in above case do we need lock_timeout?
It can be useful if your updating process is blocked by a lock from the
application. Setting the parameter to a low value will keep your update
from hanging for a long time and will throw an error instead.
Erroring out early reduces the danger of a deadlock.
> Regarding batch update with batch size of 1000, do we have any method exists
> in postgres (say like forall statement in Oracle) which will do the batch dml.
> Can you please guide me here, how we can do it in postgres.
I would certainly not perform the update row for row in PL/pgSQL code.
Perhaps something like this:
DO
$$DECLARE
i bigint;
BEGIN
FOR i IN 1..1000000 by 100000 LOOP
UPDATE tab SET ...
WHERE id >= i AND id < i + 100000;
COMMIT;
END LOOP;
END;$$;
VACUUM tab;
Then repeat for the next million rows, and so on.
> And yes will need to see what happens in the update using explain analyze.
> And I was trying to see, if we can run explain analyze without doing
> actual update , but seems that is not possible.
You can do it in a transaction and roll the transaction back.
Yours,
Laurenz Albe
view thread (2+ messages)
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected]
Subject: Re: update faster way
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox