MIME-Version: 1.0
References: <CAEzWdqd-22B-bpVdT3yzegiOig9zvJnfJvi=GOMFfHT-Jg8CgQ@mail.gmail.com>
 <1955d0e6cafd643520d282a74d9956340983074e.camel@cybertec.at>
 <CAEzWdqcxzKOMMe3LfTjfnOXwhRZNyci-aMO0ko4HYYAs8yYAFA@mail.gmail.com> <20240914112451.bgxnbjv5b6unoijc@hjp.at>
In-Reply-To: <20240914112451.bgxnbjv5b6unoijc@hjp.at>
From: yudhi s <learnerdatabase99@gmail.com>
Date: Sat, 14 Sep 2024 20:26:32 +0530
Message-ID: <CAEzWdqcgLj_vothSMttqN4N3Ly46O=MyhmUMvyQqfg6RiuYioQ@mail.gmail.com>
Subject: Re: update faster way
To: pgsql-general@lists.postgresql.org
Content-Type: multipart/alternative; boundary="0000000000004581740622158cbb"
Archived-At: <https://www.postgresql.org/message-id/CAEzWdqcgLj_vothSMttqN4N3Ly46O%3DMyhmUMvyQqfg6RiuYioQ%40mail.gmail.com>
Precedence: bulk

--0000000000004581740622158cbb
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sat, Sep 14, 2024 at 4:55 PM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

>
> Which in turn means that you want as little overhead as possible per
> batch which means finding those 5000 rows should be quick. Which brings
> us back to Igor's question: Do you have any indexes in place which speed
> up finding those 5000 rows (the primary key almost certainly won't help
> with that). EXPLAIN (ANALYZE) (as suggested by Laurenz) will certainly
> help answering that question.
>
> > And also those rows will not collide with each other. So do you think
> > that approach can anyway cause locking issues?
>
> No, I don't think so. With a batch size that small I wouldn't expect
> problems even on the live partition. But of course many busy parallel
> sessions will put additional load on the system which may or may not be
> noticeable by users (you might saturate the disks writing WAL entries
> for example, which would slow down other sessions trying to commit).
>
>
> > Regarding batch update with batch size of 1000, do we have any method
> > exists in postgres (say like forall statement in Oracle) which will do
> > the batch dml. Can you please guide me here, how we can do it in
> > postgres.
>
> Postgres offers several server side languages. As an Oracle admin you
> will probably find PL/pgSQL most familiar. But you could also use Perl
> or Python or several others. And of course you could use any
> programming/scripting language you like on the client side.
>
>
When you said *"(the primary key almost certainly won't help with that)"*,
I am trying to understand why that is so.
I was thinking of using that column as an incrementing filter and driving
the eligible rows based on that filter. If it had been a sequence, I think
that would have helped, but in this case it's a UUID, so I may not be able
to do the batch DML using it as the filter criterion. In that case, would
it be fine to drive the update based on ctid, something like the block
below? Each session will get a range of 5 days of data (five partitions)
and will execute a block like the one below, which updates in batches of
10K rows and commits after each batch. Is this fine? Or is there some
better way of doing batch DML in Postgres PL/pgSQL?

DO $$
DECLARE
    l_rowid_array tid[];            -- ctid values have the type tid
    l_ctid        tid;
    l_array_size  INT := 10000;     -- batch size
    l_processed   INT := 0;         -- number of ctids processed so far
BEGIN
    -- One iteration per batch of l_array_size rows in the date range.
    FOR l_cnt IN 0..(SELECT COUNT(*) FROM part_tab
                     WHERE part_date > '1-sep-2024'
                       AND part_date < '5-sep-2024') / l_array_size LOOP
        -- Collect the ctids of the next batch.
        l_rowid_array := ARRAY(
            SELECT ctid
            FROM part_tab
            WHERE part_date > '1-sep-2024' AND part_date < '5-sep-2024'
            LIMIT l_array_size OFFSET l_cnt * l_array_size
        );

        -- Update the batch one row at a time.
        FOREACH l_ctid IN ARRAY l_rowid_array LOOP
            UPDATE part_tab
            SET column1 = reftab.code
            FROM reference_tab reftab
            WHERE part_tab.column1 = reftab.column1
              AND part_tab.ctid = l_ctid;
            l_processed := l_processed + 1;
        END LOOP;

        COMMIT;
    END LOOP;
END $$;
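
For comparison, here is a rough, untested sketch of a set-based variant of
the same idea: each pass picks up to one batch of rows that still need the
change and updates them in a single statement, with no OFFSET. It rests on
several assumptions that are not confirmed anywhere in this thread:
reference_tab.column1 is unique, no reference_tab.code value also appears
in reference_tab.column1 (so an updated row drops out of the next pass),
the server is PostgreSQL 11 or later, and the DO block is not run inside an
explicit transaction block (otherwise COMMIT is not allowed). The names
part_oid, row_tid, l_batch_size and l_rows are made up for illustration.
tableoid is carried along because a ctid only identifies a row within one
partition.

```sql
DO $$
DECLARE
    l_batch_size CONSTANT int := 10000;  -- hypothetical batch size
    l_rows       bigint;                 -- rows updated in the current pass
    l_processed  bigint := 0;            -- rows updated in total
BEGIN
    LOOP
        -- Pick up to l_batch_size rows that still match reference_tab
        -- and update them in one statement.
        WITH batch AS (
            SELECT p.tableoid AS part_oid, p.ctid AS row_tid, r.code
            FROM part_tab p
            JOIN reference_tab r ON r.column1 = p.column1
            WHERE p.part_date > DATE '2024-09-01'
              AND p.part_date < DATE '2024-09-05'
            LIMIT l_batch_size
        )
        UPDATE part_tab p
        SET column1 = b.code
        FROM batch b
        WHERE p.tableoid = b.part_oid
          AND p.ctid = b.row_tid
          -- Same range as above; if part_date is the partition key this
          -- may also let the planner skip the other partitions.
          AND p.part_date > DATE '2024-09-01'
          AND p.part_date < DATE '2024-09-05';

        GET DIAGNOSTICS l_rows = ROW_COUNT;
        EXIT WHEN l_rows = 0;            -- nothing left to convert
        l_processed := l_processed + l_rows;
        COMMIT;                          -- end this batch's transaction
    END LOOP;

    RAISE NOTICE 'updated % rows', l_processed;
END $$;
```

Whether this finds each batch quickly enough depends on the available
indexes, so the plan would still need to be checked with EXPLAIN (ANALYZE)
as suggested earlier in the thread.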

--0000000000004581740622158cbb--