Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1srVnY-002FkT-UM for pgsql-general@arkaria.postgresql.org; Fri, 20 Sep 2024 05:03:17 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1srVnW-00FIrV-H7 for pgsql-general@arkaria.postgresql.org; Fri, 20 Sep 2024 05:03:15 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1srVnW-00FIrM-2o for pgsql-general@lists.postgresql.org; Fri, 20 Sep 2024 05:03:15 +0000 Received: from mail-yb1-xb30.google.com ([2607:f8b0:4864:20::b30]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1srVnT-00081C-5K for pgsql-general@lists.postgresql.org; Fri, 20 Sep 2024 05:03:14 +0000 Received: by mail-yb1-xb30.google.com with SMTP id 3f1490d57ef6-e1a9dc3efc1so1659521276.2 for ; Thu, 19 Sep 2024 22:03:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1726808591; x=1727413391; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=AX7O2I5UcajoVFoCL3C2d6EYcPsJiOsNc7APAJF/ZEc=; b=MPJDQHIVrn+foXOCP4FP98fLupnloJ1aDGDDCbwi/EP0us7r8sNvbhGy1pYX16Har9 vvKZa3Kb/Ka2EhiOmNBYmC9lQWKChI0HyHfLN5GkItGeVIjbIz7UCjad/5tWzuW3xTks +s+bxnq/ufSRaOL6oZ92m+GrwOdgsqmh017SqGazz9VxLFSuJQ0gWGPnhmD6hRnZW5nX UiRY7BmTMByxuZettP7+nrgWdTicFopZPGeosZ1xjqkU+B5GlJfVdmAIj4GOVB6zr5NH rA8CxgvxHpkY5S3x11RtvuFQqT/OaKxZEqFxzuKLUVP43LxKzs66/vWvi8dYBHbIfCG8 3kXg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1726808591; x=1727413391; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=AX7O2I5UcajoVFoCL3C2d6EYcPsJiOsNc7APAJF/ZEc=; b=K6kQhiv1ZcDmI6rLcZMbnxABr7o7w+lvfKs0yF39PF0Mb/rpVH2PV6xzCL1nbzGxH+ ZvEjvv6OFHFhXF7VbaVYBx9bUzKltfzmYrzitLyAqZCuLUfmYSXtekJjT2XstM31tVIQ KBiOQozPbMYjfDdtSAvIN9156EyoApUtbG0+QXw8fU/HtD1Btle4y2zHn+ByTPqmqF1l Htka5YXa1syLQB7RL5+NAwczX+gpzJFFAN1Tf1GCP22i5+igbgrkNB8LgqQ6YQph5FJ3 5MKcWSV8sj9PlwU0QiQkT2MqOcf5dDPukZKjX0P0EJ8rhzOoZywv/2404ykX5KEU5lPY Zuog== X-Gm-Message-State: AOJu0YzlId4LkrrgEGhxqHE7NA8WsrQIonng8nN6lxRcO998/izzCs1x LQhZBjmRIFhGV5Un4LnK5hKMV5tLP0iVuy6iMY04oW5GpEjTJ39qMKBurXUENCaF0a3Gry6b89L n4s28DmruJqiDUf6Yf7UaBXIK0Gw= X-Google-Smtp-Source: AGHT+IGKscAdb37J9ttwMsUgq3kjvlwCWU2q6yL2aDaxoLAapjCPGoNaRfUcruNd+BNruqtUpWaP1CohIKjV8qG7woM= X-Received: by 2002:a05:690c:89:b0:6b2:d72a:9dfc with SMTP id 00721157ae682-6dfeeec580fmr17634127b3.33.1726808590677; Thu, 19 Sep 2024 22:03:10 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Lok P Date: Fri, 20 Sep 2024 10:32:59 +0530 Message-ID: Subject: Re: How batch processing works To: Ron Johnson Cc: pgsql-general Content-Type: multipart/alternative; boundary="000000000000a48269062285f457" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000a48269062285f457 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Below are the results for the posted methods. Tested it on local and it gave no difference in timing between the method-2 andmethod-3. Failed to run in dbfiddle somehow. Also I was initially worried if adding the trigger to the our target table, will worsen the performance as because , it will make all the execution to "row by row" rather a true batch insert(method-3 as posted) as there will be more number of context switches , but it seems it will still be doing the batch commits(like the way its in method-2). So as per that , we won't lose any performance as such. Is this understanding correct? *Method-1- 00:01:44.48* *Method-2- 00:00:02.67* *Method-3- 00:00:02.39* https://gist.github.com/databasetech0073/8e9106757d751358c0c0c65a2374dbc6 On Thu, Sep 19, 2024 at 6:42=E2=80=AFPM Lok P wrote: > > > On Thu, Sep 19, 2024 at 5:40=E2=80=AFPM Ron Johnson > wrote: > >> On Thu, Sep 19, 2024 at 5:24=E2=80=AFAM Lok P wro= te: >> >>> >>> >>> [snip] >> >>> DO $$ >>> DECLARE >>> num_inserts INTEGER :=3D 100000; >>> batch_size INTEGER :=3D 50; >>> start_time TIMESTAMP; >>> end_time TIMESTAMP; >>> elapsed_time INTERVAL; >>> i INTEGER; >>> BEGIN >>> -- Method 1: Individual Inserts with Commit after every Row >>> start_time :=3D clock_timestamp(); >>> >>> FOR i IN 1..num_inserts LOOP >>> INSERT INTO parent_table VALUES (i, 'a'); >>> COMMIT; >>> END LOOP; >>> >>> end_time :=3D clock_timestamp(); >>> elapsed_time :=3D end_time - start_time; >>> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time) >>> VALUES ('Method 1: Individual Inserts with Commit after every Row', >>> start_time, end_time, elapsed_time); >>> >>> -- Method 2: Individual Inserts with Commit after 100 Rows >>> start_time :=3D clock_timestamp(); >>> >>> FOR i IN 1..num_inserts LOOP >>> INSERT INTO parent_table2 VALUES (i, 'a'); >>> -- Commit after every 100 rows >>> IF i % batch_size =3D 0 THEN >>> COMMIT; >>> END IF; >>> END LOOP; >>> >>> -- Final commit if not already committed >>> commit; >>> >>> end_time :=3D clock_timestamp(); >>> elapsed_time :=3D end_time - start_time; >>> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time) >>> VALUES ('Method 2: Individual Inserts with Commit after 100 Rows', >>> start_time, end_time, elapsed_time); >>> >>> -- Method 3: Batch Inserts with Commit after all >>> start_time :=3D clock_timestamp(); >>> >>> FOR i IN 1..(num_inserts / batch_size) LOOP >>> INSERT INTO parent_table3 VALUES >>> (1 + (i - 1) * batch_size, 'a'), >>> >> [snip] >> >>> (49 + (i - 1) * batch_size, 'a'), >>> (50 + (i - 1) * batch_size, 'a')); >>> COMMIT; >>> END LOOP; >>> >>> COMMIT; -- Final commit for all >>> end_time :=3D clock_timestamp(); >>> elapsed_time :=3D end_time - start_time; >>> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time) >>> VALUES ('Method 3: Batch Inserts with Commit after All', start_time= , >>> end_time, elapsed_time); >>> >>> END $$; >>> >> >> Reproduce what behavior? >> >> Anyway, plpgsql functions (including anonymous DO statements) are -- to >> Postgresql -- single statements. Thus, they'll be faster than >> individual calls.. >> >> An untrusted language like plpython3u might speed things up even more, i= f >> you have to read a heterogeneous external file and insert all the record= s >> into the db. >> > > Here if you see my script , the method-1 is doing commit after each row > insert. And method-2 is doing a batch commit i.e. commit after every "50" > row. And method-3 is doing a true batch insert i.e. combining all the 50 > values in one insert statement and submitting to the database in oneshot > and then COMMIT it, so the context switching will be a lot less. So I was > expecting Method-3 to be the fastest way to insert the rows here, but the > response time shows the same response time for Method-2 and method-3. > Method-1 is the slowest through. > --000000000000a48269062285f457 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

Below are the results for the posted methods. Tested it on local and it ga= ve no difference in timing between the method-2 andmethod-3. Failed to run = in dbfiddle somehow.

Also I = was initially worried if adding the trigger to the our target table, will w= orsen the performance as because , it will make all the execution to "= row by row" rather a true batch insert(method-3 as posted) as there wi= ll be more number of context switches , but it seems it will still be doing= the batch commits(like the way its in method-2). So as per that , we won&#= 39;t lose any performance as such. Is this understanding correct?


Method-1- 00:01:44.48

Method-2- 00:00:02.67

Method-3-= 00:00:02.39

https://gist.github.= com/databasetech0073/8e9106757d751358c0c0c65a2374dbc6


On Thu, Sep 19= , 2024 at 6:42=E2=80=AFPM Lok P <loknath.73@gmail.com> wrote:


On Thu, Sep 19, = 2024 at 5:40=E2=80=AFPM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
<= blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-l= eft:1px solid rgb(204,204,204);padding-left:1ex">
On Thu, Sep 19, 2024 at 5:24=E2=80=AFAM Lok P <loknath.73@gmail.com> wrot= e:


[snip]=C2=A0
DO $$
DECLARE
=C2=A0 =C2=A0 num_inserts INTEGER :=3D 100000;
=C2= =A0 =C2=A0 batch_size INTEGER :=3D 50;
=C2=A0 =C2=A0 start_time TIMESTAM= P;
=C2=A0 =C2=A0 end_time TIMESTAMP;
=C2=A0 =C2=A0 elapsed_time INTER= VAL;
=C2=A0 =C2=A0 i INTEGER;
BEGIN
=C2=A0 =C2=A0 -- Method 1: Ind= ividual Inserts with Commit after every Row
=C2=A0 =C2=A0 start_time := =3D clock_timestamp();

=C2=A0 =C2=A0 FOR i IN 1..num_inserts LOOP=C2=A0 =C2=A0 =C2=A0 =C2=A0 INSERT INTO parent_table VALUES (i, 'a'= ;);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 COMMIT; =C2=A0
=C2=A0 =C2=A0 END LOOP= ;

=C2=A0 =C2=A0 end_time :=3D clock_timestamp();
=C2=A0 =C2=A0 el= apsed_time :=3D end_time - start_time;
=C2=A0 =C2=A0 INSERT INTO debug_l= og (method1, start_time, end_time, elapsed_time)
=C2=A0 =C2=A0 VALUES (= 'Method 1: Individual Inserts with Commit after every Row', start_t= ime, end_time, elapsed_time);

=C2=A0 =C2=A0 -- Method 2: Individual = Inserts with Commit after 100 Rows
=C2=A0 =C2=A0 start_time :=3D clock_t= imestamp();

=C2=A0 =C2=A0 FOR i IN 1..num_inserts LOOP
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 INSERT INTO parent_table2 VALUES (i, 'a');
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 -- Commit after every 100 rows
=C2=A0 =C2=A0 = =C2=A0 =C2=A0 IF i % batch_size =3D 0 THEN
=C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 COMMIT; =C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 END IF;
=C2= =A0 =C2=A0 END LOOP;

=C2=A0 =C2=A0 -- Final commit if not already co= mmitted
=C2=A0 =C2=A0commit;

=C2=A0 =C2=A0 end_time :=3D clock_ti= mestamp();
=C2=A0 =C2=A0 elapsed_time :=3D end_time - start_time;
=C2= =A0 =C2=A0 INSERT INTO debug_log (method1, start_time, end_time, elapsed_ti= me)
=C2=A0 =C2=A0 VALUES ('Method 2: Individual Inserts with Commit= after 100 Rows', start_time, end_time, elapsed_time);

=C2=A0 = =C2=A0 -- Method 3: Batch Inserts with Commit after all
=C2=A0 =C2=A0 st= art_time :=3D clock_timestamp();

=C2=A0 =C2=A0 FOR i IN 1..(num_inse= rts / batch_size) LOOP
=C2=A0 =C2=A0 =C2=A0 =C2=A0 INSERT INTO parent_ta= ble3 VALUES
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (1 + (i - 1) * ba= tch_size, 'a'),
[snip]=C2= =A0
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 (49 + (i - 1) * batch_size, 'a'),
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (50 + (i - 1) * batch_size, 'a')= );
COMMIT;
=C2=A0 =C2=A0 END LOOP;

=C2=A0 =C2=A0 COMMIT; = =C2=A0-- Final commit for all
=C2=A0 =C2=A0 end_time :=3D clock_timestam= p();
=C2=A0 =C2=A0 elapsed_time :=3D end_time - start_time;
=C2=A0 = =C2=A0 INSERT INTO debug_log (method1, start_time, end_time, elapsed_time) =
=C2=A0 =C2=A0 VALUES ('Method 3: Batch Inserts with Commit after Al= l', start_time, end_time, elapsed_time);

END $$;

Reproduce what behavior?

Anyway, plpgsql functions (including anonymous DO statements) are -- to = Postgresql -- single statements.=C2=A0 Thus, they'll be faster than ind= ividual=C2=A0calls..

An untrusted lang= uage like=C2=A0plpython3u might speed things up even more, if you have to r= ead a heterogeneous external file and insert all the records into the db.

Here if you see my script = , the method-1 is doing commit after each row insert. And method-2 is doing= a batch commit i.e. commit after every "50" row. And method-3 is= doing a true batch insert i.e. combining all the 50 values in one insert s= tatement and submitting to the database in oneshot and then COMMIT it,=C2= =A0so the context switching will be a lot less. So I was expecting Method-3= to be the fastest way to insert the rows here, but the response time shows= the same response time for Method-2 and method-3. Method-1 is the slowest = through.
--000000000000a48269062285f457--