Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1srGyx-000F2l-Vc for pgsql-general@arkaria.postgresql.org; Thu, 19 Sep 2024 13:14:04 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1srGxw-000uIP-WB for pgsql-general@arkaria.postgresql.org; Thu, 19 Sep 2024 13:13:02 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1srGxw-000uHx-FC for pgsql-general@lists.postgresql.org; Thu, 19 Sep 2024 13:13:01 +0000 Received: from mail-yb1-xb29.google.com ([2607:f8b0:4864:20::b29]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1srGxu-0000nG-Gc for pgsql-general@lists.postgresql.org; Thu, 19 Sep 2024 13:13:00 +0000 Received: by mail-yb1-xb29.google.com with SMTP id 3f1490d57ef6-e1d74e6c8d8so689444276.1 for ; Thu, 19 Sep 2024 06:12:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1726751578; x=1727356378; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=hO3dK6f0l6S7KwC0kYF0mMiA89bSYgSJXfL/+7SPPcw=; b=a5BTrFhIk9nxV6bIo1ryIlijE2gz7HJ+TGgUnFLbnuD+RvCEFbcUSIdkYUbfHGnm3C DExWMtkAXeZz9wNarfezvffnBlSYSEURayJmT05Fumc2xUcPqpx+EcNEHS79VWeI850s hxRHBKGAbrjk7BH1ljVWlnT0jKnQVKtoJbw7HQNdIssqvMh+R9EXWz4AtaSS5M5SRi46 DTCSZuiaoz1yNWzzcnBnQUX6AQdZJsTAHV45DrPRIG35fB9OOgPqbJ3HjfHYS+ffiFr5 F/ypVaYsQT5JNywbJ3IY6z8mSy2lKHygHW+c39RgDrLKse+suxRFVLwwVLlVVc59vx+L B7dw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1726751578; x=1727356378; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=hO3dK6f0l6S7KwC0kYF0mMiA89bSYgSJXfL/+7SPPcw=; b=cxggGkcDO3mSaOUE8CzwufiHJQgQNRg+WJLzXrE2xJpCFu1NHjmCmHiPAahdy5a8Y9 YWiPCf2jDorrbDOCdGlWemFeuLjuCUDndJwEiGC+pNOppKrRfVAXgNzP7+xK44DgSGjI eBqWzU5roU68nG0rhOOZcHBO5N4rCVxYL5mBseuLUsI1wq3O85c9XeWwG0KEw/ajRZ1h UcU766gIBwEPGaUQkO978WHBWo7VlFyhN+WWyYl04aYKdi6LGgCKwJorHsQQ+0zuD3Jt QAFDeBbVv6FHUSjowByHOvxyf/k2+Q5407TGkbILIxO4ZEpn+UNfVHdhdLp3YNdp83ds vXxw== X-Gm-Message-State: AOJu0YxDPA99ahlXi2e943Pgba+5AsmUb+2iQlmKVSrJX5jXwjjOCb2q bdPdSaRqvIECtwKOnzoc/gIviZ8S6O1nCmQMxRJCyi+xTSw/o99Tibg/lPjh65xa+uqG9MS228n R7v9jJiZhai640flDo+wdSZ7hyMw= X-Google-Smtp-Source: AGHT+IGicWe3wMbdhhKzQn+LrFC50oRTg6MCwvCX3SP2Zd+VQKR8e2Cyg7j6v0iPPAOza8a5jEfXzvNqxn2BpbGK1KE= X-Received: by 2002:a05:6902:120c:b0:e1d:bd69:bf26 with SMTP id 3f1490d57ef6-e1dbd69c200mr14663378276.22.1726751577466; Thu, 19 Sep 2024 06:12:57 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Lok P Date: Thu, 19 Sep 2024 18:42:45 +0530 Message-ID: Subject: Re: How batch processing works To: Ron Johnson Cc: pgsql-general Content-Type: multipart/alternative; boundary="00000000000063f1fd062278aebd" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000063f1fd062278aebd Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Thu, Sep 19, 2024 at 5:40=E2=80=AFPM Ron Johnson wrote: > On Thu, Sep 19, 2024 at 5:24=E2=80=AFAM Lok P wrot= e: > >> >> >> [snip] > >> DO $$ >> DECLARE >> num_inserts INTEGER :=3D 100000; >> batch_size INTEGER :=3D 50; >> start_time TIMESTAMP; >> end_time TIMESTAMP; >> elapsed_time INTERVAL; >> i INTEGER; >> BEGIN >> -- Method 1: Individual Inserts with Commit after every Row >> start_time :=3D clock_timestamp(); >> >> FOR i IN 1..num_inserts LOOP >> INSERT INTO parent_table VALUES (i, 'a'); >> COMMIT; >> END LOOP; >> >> end_time :=3D clock_timestamp(); >> elapsed_time :=3D end_time - start_time; >> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time) >> VALUES ('Method 1: Individual Inserts with Commit after every Row', >> start_time, end_time, elapsed_time); >> >> -- Method 2: Individual Inserts with Commit after 100 Rows >> start_time :=3D clock_timestamp(); >> >> FOR i IN 1..num_inserts LOOP >> INSERT INTO parent_table2 VALUES (i, 'a'); >> -- Commit after every 100 rows >> IF i % batch_size =3D 0 THEN >> COMMIT; >> END IF; >> END LOOP; >> >> -- Final commit if not already committed >> commit; >> >> end_time :=3D clock_timestamp(); >> elapsed_time :=3D end_time - start_time; >> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time) >> VALUES ('Method 2: Individual Inserts with Commit after 100 Rows', >> start_time, end_time, elapsed_time); >> >> -- Method 3: Batch Inserts with Commit after all >> start_time :=3D clock_timestamp(); >> >> FOR i IN 1..(num_inserts / batch_size) LOOP >> INSERT INTO parent_table3 VALUES >> (1 + (i - 1) * batch_size, 'a'), >> > [snip] > >> (49 + (i - 1) * batch_size, 'a'), >> (50 + (i - 1) * batch_size, 'a')); >> COMMIT; >> END LOOP; >> >> COMMIT; -- Final commit for all >> end_time :=3D clock_timestamp(); >> elapsed_time :=3D end_time - start_time; >> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time) >> VALUES ('Method 3: Batch Inserts with Commit after All', start_time, >> end_time, elapsed_time); >> >> END $$; >> > > Reproduce what behavior? > > Anyway, plpgsql functions (including anonymous DO statements) are -- to > Postgresql -- single statements. Thus, they'll be faster than > individual calls.. > > An untrusted language like plpython3u might speed things up even more, if > you have to read a heterogeneous external file and insert all the records > into the db. > Here if you see my script , the method-1 is doing commit after each row insert. And method-2 is doing a batch commit i.e. commit after every "50" row. And method-3 is doing a true batch insert i.e. combining all the 50 values in one insert statement and submitting to the database in oneshot and then COMMIT it, so the context switching will be a lot less. So I was expecting Method-3 to be the fastest way to insert the rows here, but the response time shows the same response time for Method-2 and method-3. Method-1 is the slowest through. --00000000000063f1fd062278aebd Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Thu, Sep 19, 2024 at 5:40=E2=80=AF= PM Ron Johnson <ronljohnsonjr= @gmail.com> wrote:
On Thu, Sep 19, 2024 at 5:24= =E2=80=AFAM Lok P <loknath.73@gmail.com> wrote:


[snip]=C2=A0
DO $$
DECLARE
=C2=A0 =C2=A0 n= um_inserts INTEGER :=3D 100000;
=C2=A0 =C2=A0 batch_size INTEGER :=3D 50= ;
=C2=A0 =C2=A0 start_time TIMESTAMP;
=C2=A0 =C2=A0 end_time TIMESTAM= P;
=C2=A0 =C2=A0 elapsed_time INTERVAL;
=C2=A0 =C2=A0 i INTEGER;
B= EGIN
=C2=A0 =C2=A0 -- Method 1: Individual Inserts with Commit after eve= ry Row
=C2=A0 =C2=A0 start_time :=3D clock_timestamp();

=C2=A0 = =C2=A0 FOR i IN 1..num_inserts LOOP
=C2=A0 =C2=A0 =C2=A0 =C2=A0 INSERT I= NTO parent_table VALUES (i, 'a');
=C2=A0 =C2=A0 =C2=A0 =C2=A0 CO= MMIT; =C2=A0
=C2=A0 =C2=A0 END LOOP;

=C2=A0 =C2=A0 end_time :=3D = clock_timestamp();
=C2=A0 =C2=A0 elapsed_time :=3D end_time - start_time= ;
=C2=A0 =C2=A0 INSERT INTO debug_log (method1, start_time, end_time, el= apsed_time)
=C2=A0 =C2=A0 VALUES ('Method 1: Individual Inserts wit= h Commit after every Row', start_time, end_time, elapsed_time);

= =C2=A0 =C2=A0 -- Method 2: Individual Inserts with Commit after 100 Rows=C2=A0 =C2=A0 start_time :=3D clock_timestamp();

=C2=A0 =C2=A0 FOR = i IN 1..num_inserts LOOP
=C2=A0 =C2=A0 =C2=A0 =C2=A0 INSERT INTO parent_= table2 VALUES (i, 'a');
=C2=A0 =C2=A0 =C2=A0 =C2=A0 -- Commit af= ter every 100 rows
=C2=A0 =C2=A0 =C2=A0 =C2=A0 IF i % batch_size =3D 0 T= HEN
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 COMMIT; =C2=A0
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 END IF;
=C2=A0 =C2=A0 END LOOP;

=C2=A0 =C2= =A0 -- Final commit if not already committed
=C2=A0 =C2=A0commit;
=C2=A0 =C2=A0 end_time :=3D clock_timestamp();
=C2=A0 =C2=A0 elapsed_ti= me :=3D end_time - start_time;
=C2=A0 =C2=A0 INSERT INTO debug_log (meth= od1, start_time, end_time, elapsed_time)
=C2=A0 =C2=A0 VALUES ('Met= hod 2: Individual Inserts with Commit after 100 Rows', start_time, end_= time, elapsed_time);

=C2=A0 =C2=A0 -- Method 3: Batch Inserts with C= ommit after all
=C2=A0 =C2=A0 start_time :=3D clock_timestamp();

= =C2=A0 =C2=A0 FOR i IN 1..(num_inserts / batch_size) LOOP
=C2=A0 =C2=A0 = =C2=A0 =C2=A0 INSERT INTO parent_table3 VALUES
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 (1 + (i - 1) * batch_size, 'a'),
=
[snip]=C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (49 + (i - 1) * batch_= size, 'a'),
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (50 + (i -= 1) * batch_size, 'a'));
COMMIT;
=C2=A0 =C2=A0 END LOOP;
=C2=A0 =C2=A0 COMMIT; =C2=A0-- Final commit for all
=C2=A0 =C2=A0 = end_time :=3D clock_timestamp();
=C2=A0 =C2=A0 elapsed_time :=3D end_tim= e - start_time;
=C2=A0 =C2=A0 INSERT INTO debug_log (method1, start_time= , end_time, elapsed_time)
=C2=A0 =C2=A0 VALUES ('Method 3: Batch In= serts with Commit after All', start_time, end_time, elapsed_time);
<= br>END $$;

Reproduce what behavior?

Anyway, plpgsql functions (including anonymous DO statements) are -- to = Postgresql -- single statements.=C2=A0 Thus, they'll be faster than ind= ividual=C2=A0calls..

An untrusted lang= uage like=C2=A0plpython3u might speed things up even more, if you have to r= ead a heterogeneous external file and insert all the records into the db.

Here if you see my script = , the method-1 is doing commit after each row insert. And method-2 is doing= a batch commit i.e. commit after every "50" row. And method-3 is= doing a true batch insert i.e. combining all the 50 values in one insert s= tatement and submitting to the database in oneshot and then COMMIT it,=C2= =A0so the context switching will be a lot less. So I was expecting Method-3= to be the fastest way to insert the rows here, but the response time shows= the same response time for Method-2 and method-3. Method-1 is the slowest = through.
--00000000000063f1fd062278aebd--