Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1srG0L-0002I3-2y for pgsql-general@arkaria.postgresql.org; Thu, 19 Sep 2024 12:11:25 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1srFzI-0000V5-OE for pgsql-general@arkaria.postgresql.org; Thu, 19 Sep 2024 12:10:22 +0000 Received: from magus.postgresql.org ([87.238.57.229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1srFzI-0000Uu-7B for pgsql-general@lists.postgresql.org; Thu, 19 Sep 2024 12:10:21 +0000 Received: from mail-oo1-xc2f.google.com ([2607:f8b0:4864:20::c2f]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1srFx8-0000iq-6E for pgsql-general@lists.postgresql.org; Thu, 19 Sep 2024 12:08:09 +0000 Received: by mail-oo1-xc2f.google.com with SMTP id 006d021491bc7-5e1baf0f764so229895eaf.1 for ; Thu, 19 Sep 2024 05:08:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1726747685; x=1727352485; darn=lists.postgresql.org; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=vruaG4rgFuERmtHFQrGrIRK9udXPc2UHJvUYZfwccME=; b=juabETZIlJqV3mqUYnGi2eSQ7ZhULtgurP6y3HkzKuAbbA2ejg9fF8AxEarYjvsdOy IVihyZ+t53AAPXu+FLCRTKW6jgfUS/GMDu2LqK2dSWV7tmHPIEbNLrbDySlxDoC1lhtx rc90Nl+Z5cD5ultttlXFgxdsJktg/fAuRa4lugaYN15Fke4VCWe4dAVijjjOG+V4I7wi T8ZJSyC3wKgX7jeYElQrQC4u5Bl3MZsFR47HSn1sVoG5ygZ+SMIdaqbkt926o7Smouhi x4a5BxSOaJUsv2xfM5HvpGb9MFa5GwwA063fytuh7eeQuiZJi3lcfz4Xidpv1216/RMb DPTw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1726747685; x=1727352485; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=vruaG4rgFuERmtHFQrGrIRK9udXPc2UHJvUYZfwccME=; b=IakOW0H+lOjJlJZVeg9sYHzxISFDp8CMzfEjHDf5vh8XdniaY6LhllpKTl1mr+oCdQ oB/D0a5y8/Y7LUUekxSCTuqdG+YaAoMNwwNo8Avp6M1jg4SAe+QZ2c8ZeoayFYUKfBl8 aR6RFYZNChUBax4lX0lpdHLg5RAvkrxjEDfydr6yJy02JcdWLC+cMSL8kJefy08X/Hug TX7g3cJOzC/5htGbosFSOFRVytphtn6tziA8wJSq81mCaAV4j9YLseUU4QbYQ2KQU/Xu 0nILsTMNn/qS2lmOVM64ytKDT+QlmOU7sNemkNawLkUkh+6XGxKnwghWZRqE9aN7VAae NkkQ== X-Gm-Message-State: AOJu0YytxQQEYTLDvTTvFTeisaevkhqubWg2/DT0Y/m3w7zaVH3rW6fN vg+wOZt4AATbXPnHoPLlyYOGKkfxrnJqiaxXcPlA6DliaauAWwtgv1v0rhti3p6r1PW1QAgqxoE D08M4oXUt7ta7dZWXyRRPArTEpvIIluQW X-Google-Smtp-Source: AGHT+IEn/WBb8vjEFiDNWE8hfXbdme8PTqL1QOp57U++pjmWqOeT4KbCUpOVr1UwkajcihfzNKoZMONd3mcuxpwgiLI= X-Received: by 2002:a05:6820:2215:b0:5e1:e748:7d2 with SMTP id 006d021491bc7-5e20c1f9a06mr11097675eaf.1.1726747685476; Thu, 19 Sep 2024 05:08:05 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Ron Johnson Date: Thu, 19 Sep 2024 08:07:54 -0400 Message-ID: Subject: Re: How batch processing works To: pgsql-general Content-Type: multipart/alternative; boundary="00000000000068e393062277c601" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000068e393062277c601 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Thu, Sep 19, 2024 at 5:24=E2=80=AFAM Lok P wrote: > > On Thu, Sep 19, 2024 at 11:31=E2=80=AFAM Ron Johnson > wrote: > >> >> [snip] >> >>> >>> Method-4 >>> >>> INSERT INTO parent_table VALUES (1, 'a'), (2, 'a'); >>> INSERT INTO child_table VALUES (1,1, 'a'), (1,2, 'a'); >>> commit; >>> >> >> If I knew that I had to load a structured input data file (even if it ha= d >> parent and child records), this is how I'd do it (but probably first try >> and see if "in-memory COPY INTO" is such a thing). >> >> > > I was trying to reproduce this behaviour using row by row commit vs just > batch commit vs true batch insert as you mentioned, i am not able to see > any difference between "batch commit" and "true batch insert" response. A= m > I missing anything? > > [snip] > DO $$ > DECLARE > num_inserts INTEGER :=3D 100000; > batch_size INTEGER :=3D 50; > start_time TIMESTAMP; > end_time TIMESTAMP; > elapsed_time INTERVAL; > i INTEGER; > BEGIN > -- Method 1: Individual Inserts with Commit after every Row > start_time :=3D clock_timestamp(); > > FOR i IN 1..num_inserts LOOP > INSERT INTO parent_table VALUES (i, 'a'); > COMMIT; > END LOOP; > > end_time :=3D clock_timestamp(); > elapsed_time :=3D end_time - start_time; > INSERT INTO debug_log (method1, start_time, end_time, elapsed_time) > VALUES ('Method 1: Individual Inserts with Commit after every Row', > start_time, end_time, elapsed_time); > > -- Method 2: Individual Inserts with Commit after 100 Rows > start_time :=3D clock_timestamp(); > > FOR i IN 1..num_inserts LOOP > INSERT INTO parent_table2 VALUES (i, 'a'); > -- Commit after every 100 rows > IF i % batch_size =3D 0 THEN > COMMIT; > END IF; > END LOOP; > > -- Final commit if not already committed > commit; > > end_time :=3D clock_timestamp(); > elapsed_time :=3D end_time - start_time; > INSERT INTO debug_log (method1, start_time, end_time, elapsed_time) > VALUES ('Method 2: Individual Inserts with Commit after 100 Rows', > start_time, end_time, elapsed_time); > > -- Method 3: Batch Inserts with Commit after all > start_time :=3D clock_timestamp(); > > FOR i IN 1..(num_inserts / batch_size) LOOP > INSERT INTO parent_table3 VALUES > (1 + (i - 1) * batch_size, 'a'), > [snip] > (49 + (i - 1) * batch_size, 'a'), > (50 + (i - 1) * batch_size, 'a')); > COMMIT; > END LOOP; > > COMMIT; -- Final commit for all > end_time :=3D clock_timestamp(); > elapsed_time :=3D end_time - start_time; > INSERT INTO debug_log (method1, start_time, end_time, elapsed_time) > VALUES ('Method 3: Batch Inserts with Commit after All', start_time, > end_time, elapsed_time); > > END $$; > Reproduce what behavior? Anyway, plpgsql functions (including anonymous DO statements) are -- to Postgresql -- single statements. Thus, they'll be faster than individual calls.. An untrusted language like plpython3u might speed things up even more, if you have to read a heterogeneous external file and insert all the records into the db. --=20 Death to , and butter sauce. Don't boil me, I'm still alive. crustacean! --00000000000068e393062277c601 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Thu, Sep 19, 2024 at 5:24=E2=80=AFAM L= ok P <loknath.73@gmail.com&g= t; wrote:

On Thu, Sep 19, 2024= at 11:31=E2=80=AFAM Ron Johnson <ronljohnsonjr@gmail.com> wrote:

[snip]=C2=A0

Method-4<= br>
INSERT INTO parent_table VALUES =C2=A0(1, 'a'), (2, 'a&#= 39;);
INSERT INTO child_table VALUES =C2=A0 (1,1, 'a'), (1,2, &#= 39;a');
commit;

If= I knew that I had to load a structured input data file (even if it had par= ent=C2=A0and child records), this is how I'd do it (but probably first = try and see if "in-memory COPY INTO" is such a thing).
=


I was tryi= ng to reproduce this behaviour using row by row commit vs just batch comm= it vs true batch insert as you mentioned, i am not able to see any differe= nce between "batch commit" and "true batch insert" resp= onse. Am I missing anything?
<= br>
[sn= ip]=C2=A0
DO $$
DECL= ARE
=C2=A0 =C2=A0 num_inserts INTEGER :=3D 100000;
=C2=A0 =C2=A0 batc= h_size INTEGER :=3D 50;
=C2=A0 =C2=A0 start_time TIMESTAMP;
=C2=A0 = =C2=A0 end_time TIMESTAMP;
=C2=A0 =C2=A0 elapsed_time INTERVAL;
=C2= =A0 =C2=A0 i INTEGER;
BEGIN
=C2=A0 =C2=A0 -- Method 1: Individual Ins= erts with Commit after every Row
=C2=A0 =C2=A0 start_time :=3D clock_tim= estamp();

=C2=A0 =C2=A0 FOR i IN 1..num_inserts LOOP
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 INSERT INTO parent_table VALUES (i, 'a');
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 COMMIT; =C2=A0
=C2=A0 =C2=A0 END LOOP;

= =C2=A0 =C2=A0 end_time :=3D clock_timestamp();
=C2=A0 =C2=A0 elapsed_tim= e :=3D end_time - start_time;
=C2=A0 =C2=A0 INSERT INTO debug_log (metho= d1, start_time, end_time, elapsed_time)
=C2=A0 =C2=A0 VALUES ('Meth= od 1: Individual Inserts with Commit after every Row', start_time, end_= time, elapsed_time);

=C2=A0 =C2=A0 -- Method 2: Individual Inserts w= ith Commit after 100 Rows
=C2=A0 =C2=A0 start_time :=3D clock_timestamp(= );

=C2=A0 =C2=A0 FOR i IN 1..num_inserts LOOP
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 INSERT INTO parent_table2 VALUES (i, 'a');
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 -- Commit after every 100 rows
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 IF i % batch_size =3D 0 THEN
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 COMMIT; =C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 END IF;
=C2=A0 =C2= =A0 END LOOP;

=C2=A0 =C2=A0 -- Final commit if not already committed=
=C2=A0 =C2=A0commit;

=C2=A0 =C2=A0 end_time :=3D clock_timestamp= ();
=C2=A0 =C2=A0 elapsed_time :=3D end_time - start_time;
=C2=A0 =C2= =A0 INSERT INTO debug_log (method1, start_time, end_time, elapsed_time) =C2=A0 =C2=A0 VALUES ('Method 2: Individual Inserts with Commit after = 100 Rows', start_time, end_time, elapsed_time);

=C2=A0 =C2=A0 --= Method 3: Batch Inserts with Commit after all
=C2=A0 =C2=A0 start_time = :=3D clock_timestamp();

=C2=A0 =C2=A0 FOR i IN 1..(num_inserts / bat= ch_size) LOOP
=C2=A0 =C2=A0 =C2=A0 =C2=A0 INSERT INTO parent_table3 VALU= ES
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (1 + (i - 1) * batch_size,= 'a'),
[snip]=C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 (49 + (i - 1) * batch_size, 'a'),
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 (50 + (i - 1) * batch_size, 'a'));
C= OMMIT;
=C2=A0 =C2=A0 END LOOP;

=C2=A0 =C2=A0 COMMIT; =C2=A0-- Fin= al commit for all
=C2=A0 =C2=A0 end_time :=3D clock_timestamp();
=C2= =A0 =C2=A0 elapsed_time :=3D end_time - start_time;
=C2=A0 =C2=A0 INSERT= INTO debug_log (method1, start_time, end_time, elapsed_time)
=C2=A0 = =C2=A0 VALUES ('Method 3: Batch Inserts with Commit after All', sta= rt_time, end_time, elapsed_time);

END $$;

Reproduce what behavior?

Anyway, plpgsql functions (including anonymous DO statements) are -- to = Postgresql -- single statements.=C2=A0 Thus, they'll be faster than ind= ividual=C2=A0calls..

An untrusted lang= uage like=C2=A0plpython3u might speed things up even more, if you have to r= ead a heterogeneous external file and insert all the records into the db.

--
Death to <Redac= ted>, and butter sauce.
Don't boil me, I'm still alive.
<= div>
<Redacted> crustacean!
--00000000000068e393062277c601--