Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1spUCd-001nBt-TR for pgsql-general@arkaria.postgresql.org; Sat, 14 Sep 2024 14:56:49 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1spUCd-00Ewl8-LC for pgsql-general@arkaria.postgresql.org; Sat, 14 Sep 2024 14:56:47 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1spUCd-00EwjB-5W for pgsql-general@lists.postgresql.org; Sat, 14 Sep 2024 14:56:47 +0000 Received: from mail-oo1-xc33.google.com ([2607:f8b0:4864:20::c33]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1spUCa-0018Pd-3c for pgsql-general@lists.postgresql.org; Sat, 14 Sep 2024 14:56:45 +0000 Received: by mail-oo1-xc33.google.com with SMTP id 006d021491bc7-5e1ba05bf73so1541533eaf.3 for ; Sat, 14 Sep 2024 07:56:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1726325803; x=1726930603; darn=lists.postgresql.org; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=AtwpVLrvCDYMl2X01feef4qN+sAHKmQJ8Q8dsiGWqTI=; b=IviUaO7eaEj+AeFr8rpHeQ7Sl3ukLjH4HDCy4gLThtKPlgrtnNCPbXDWOOcGlfngAa bQcWiUgb14icB6aLfWvB83qvoIniI3v/tUWwMCmvS6HzKgTY55fVtr4JOX0xwhnvVR5z MDfhjkOlN+ZkyBSGSN0zOqQujhEPf5rTpzjDF5vkoE72R2AWp1Q0VqyGV1W3EpdVflgk WHF/APhT3kI7FHVC3+uy1zH15eVzGae7UhoeDTD9EQ8OER2hIMuVQnSBUsGSDkEVOaMN 5meLSgryAZ2GJVOayhaWSGVFIV29J5JlOgbyyCuQxATmyqtOAoYyw+A/SG2HRoUnRQ5I EujQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1726325803; x=1726930603; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=AtwpVLrvCDYMl2X01feef4qN+sAHKmQJ8Q8dsiGWqTI=; b=aav5ZKidNW3OpJ4V2PzP3Yacq7GfncC7fxlwvUEBTPomRUula5gTkUCcjE1yV4oC6l G7XZk9EX+qo5qKwnjJS04MpMgot1zVO6kikv24H+NsOXCApAhuMwGZZ+dja4wCxfDlXv NUyGqL33BOf6KKx+13KdV02Ndi230Ovje3MmRnbKTDL4mqpnFcDNyLe7yx3u11gyaDFH 3gsnntAzQ6GZVgs2ggtLZSITFb4Ce20NqXf1qfOJxYb4isSSW1/AwTGAOOTqgzccbJr4 bDInK0m9h54v994WxbEi85jyQkS4G81ljjKHqF3CBFH0a8OW+B2XpaKMLUYOKfOLNXA6 0BxA== X-Gm-Message-State: AOJu0Yw6+DL4gCoiZEeNKhnzvd9rJQ7AvpgcAUcbQlRzjy8kZqxfmWqT 3i5wpmnIcJMf7BH2wQBC9JuXSsHGFeLyYO3Kx/IwfycQrdCsia8E9+6+pJ9KtobKIxuSo7NnsmX qVcYh4bYhjJzaISlD+UL4YkLhS9VCxb7Q X-Google-Smtp-Source: AGHT+IEWvKGgofRup0F1VsrGyFyjIbS3TdUmsrTPNpRzsEXFfDj7OpDQ1Z2AEfslJHkGCrweyoNH1R+nS8RyNxet0eE= X-Received: by 2002:a05:6870:1c7:b0:270:1498:6a36 with SMTP id 586e51a60fabf-27c3f646832mr8143939fac.29.1726325803284; Sat, 14 Sep 2024 07:56:43 -0700 (PDT) MIME-Version: 1.0 References: <1955d0e6cafd643520d282a74d9956340983074e.camel@cybertec.at> <20240914112451.bgxnbjv5b6unoijc@hjp.at> In-Reply-To: <20240914112451.bgxnbjv5b6unoijc@hjp.at> From: yudhi s Date: Sat, 14 Sep 2024 20:26:32 +0530 Message-ID: Subject: Re: update faster way To: pgsql-general@lists.postgresql.org Content-Type: multipart/alternative; boundary="0000000000004581740622158cbb" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000004581740622158cbb Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Sat, Sep 14, 2024 at 4:55=E2=80=AFPM Peter J. Holzer = wrote: > > Which in turn means that you want as little overhead as possible per > batch which means finding those 5000 rows should be quick. Which brings > us back to Igor's question: Do you have any indexes in place which speed > up finding those 5000 rows (the primary key almost certainly won't help > with that). EXPLAIN (ANALYZE) (as suggested by Laurenz) will certainly > help answering that question. > > > And also those rows will not collide with each other. So do you think > > that approach can anyway cause locking issues? > > No, I don't think so. With a batch size that small I wouldn't expect > problems even on the live partition. But of course many busy parallel > sessions will put additional load on the system which may or may not be > noticeable by users (you might saturate the disks writing WAL entries > for example, which would slow down other sessions trying to commit). > > > > Regarding batch update with batch size of 1000, do we have any method > exists in > > postgres (say like forall statement in Oracle) which will do the batch > dml. Can > > you please guide me here, how we can do it in postgres. > > Postgres offers several server side languages. As an Oracle admin you > will probably find PL/pgSQL most familiar. But you could also use Perl > or Python or several others. And of course you could use any > programming/scripting language you like on the client side. > > When you said *"(the primary key almost certainly won't help with that)", = *I am trying to understand why it is so ? I was thinking of using that column as an incrementing filter and driving the eligible rows based on that filter. And if it would have been a sequence. I think it would have helped but in this case it's UUID , so I may not be able to do the batch DML using that as filter criteria. but in that case will it be fine to drive the update based on ctid something as below? Each session will have the range of 5 days of data or five partition data and will execute a query something as below which will update in the batches of 10K and then commit. Is this fine? Or is there some better way of doing the batch DML in postgres plpgsql? DO $$ DECLARE l_rowid_array ctid[]; l_ctid ctid; l_array_size INT :=3D 10000; l_processed INT :=3D 0; BEGIN FOR l_cnt IN 0..(SELECT COUNT(*) FROM part_tab WHERE part_date > '1-sep-2024' and part_date < '5-sep-2024' ) / l_array_size LOOP l_rowid_array :=3D ARRAY( SELECT ctid FROM part_tab WHERE part_date > '1-sep-2024' and part_date < '5-sep-2024' LIMIT l_array_size OFFSET l_cnt * l_array_size ); FOREACH l_ctid IN ARRAY l_rowid_array LOOP update part_tab SET column1 =3D reftab.code FROM reference_tab reftab WHERE tab_part1.column1 =3D reftab.column1 and ctid =3D l_ctid; l_processed :=3D l_processed + 1; END LOOP; COMMIT; END LOOP; END $$; --0000000000004581740622158cbb Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Sat, Sep 14, 2024 at 4:55=E2=80=AF= PM Peter J. Holzer <hjp-pgsql@hjp.at= > wrote:
=
Which in turn means that you want as little overhead as possible per
batch which means finding those 5000 rows should be quick. Which brings
us back to Igor's question: Do you have any indexes in place which spee= d
up finding those 5000 rows (the primary key almost certainly won't help=
with that). EXPLAIN (ANALYZE) (as suggested by Laurenz) will certainly
help answering that question.

> And also those rows will not collide with each other. So do you think<= br> > that approach can anyway cause locking issues?

No, I don't think so. With a batch size that small I wouldn't expec= t
problems even on the live partition. But of course many busy parallel
sessions will put additional load on the system which may or may not be
noticeable by users (you might saturate the disks writing WAL entries
for example, which would slow down other sessions trying to commit).


> Regarding batch update with batch size of 1000, do we have any method = exists in
> postgres (say like forall statement in Oracle) which will do the batch= dml. Can
> you please guide me here, how we can do it in postgres.

Postgres offers several server side languages. As an Oracle admin you
will probably find PL/pgSQL most familiar. But you could also use Perl
or Python or several others. And of course you could use any
programming/scripting language you like on the client side.


=C2=A0When you said "(the primary key almo= st certainly won't help with that)", I am trying to understand= why it is so=C2=A0?=C2=A0
I was thinking of using that co= lumn as an incrementing filter and driving the eligible rows based on that = filter. And if it would have been a sequence. I think it would have helped = but in this case it's UUID , so I may not be able to do the batch DML u= sing that as filter criteria. but in that case will it be fine to drive the= update based on ctid something as below? Each session will have the range = of 5 days of data or five partition data and will execute a query something= as below which will update in the batches of 10K and then commit. Is this = fine? Or is there some better way of doing the batch DML in postgres plpgsq= l?

DO $$
DECLARE
=C2=A0 =C2=A0 l_rowid_array= ctid[];
=C2=A0 =C2=A0 l_ctid ctid;
=C2=A0 =C2=A0 l_array_size INT := =3D 10000;
=C2=A0 =C2=A0 l_processed INT :=3D 0;
BEGIN
=C2=A0
= =C2=A0 =C2=A0 FOR l_cnt IN 0..(SELECT COUNT(*) FROM part_tab WHERE =C2=A0 p= art_date > '1-sep-2024' and part_date < '5-sep-2024' =
) / l_array_size LOOP
=C2=A0 =C2=A0 =C2=A0 =C2=A0 l_rowid_arra= y :=3D ARRAY(
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 SELECT ctid
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 FROM part_tab
=C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 WHERE part_date =C2=A0 > '1-sep-2024'= ; and part_date < '5-sep-2024'
=C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 LIMIT l_array_size OFFSET l_cnt * l_array_size
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 );
=C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 FOREACH l_c= tid IN ARRAY l_rowid_array LOOP
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 update =C2=A0part_tab
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 SET = column1 =3D reftab.code
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0FROM re= ference_tab reftab
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 WHERE tab_p= art1.column1 =3D reftab.column1
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 and ctid =3D l_ctid;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 l_pro= cessed :=3D l_processed + 1;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 END LOOP;
= =C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 COMMIT;
=C2=A0 =C2=A0 END LOOP;=C2=A0
END $$;
--0000000000004581740622158cbb--