public inbox for [email protected]  
help / color / mirror / Atom feed
From: Ron Johnson <[email protected]>
To: pgsql-general <[email protected]>
Subject: Re: How batch processing works
Date: Thu, 19 Sep 2024 08:07:54 -0400
Message-ID: <CANzqJaBahWw3gGhgaatOQnUh3Daij7kjtk7a3+Z=dJNL4d4SKg@mail.gmail.com> (raw)
In-Reply-To: <CAKna9VYNiTpaQkEnRRsN7P2PX2NU6vpYZF376NFvWv8LhzfX3g@mail.gmail.com>
References: <CAKna9VY2v0XsDberzbJXZ4MqEW1RUtD0L_Mis_vrgEQWZgH0gg@mail.gmail.com>
	<CANzqJaDLSfHeWqe5BGS93bLhE4PnUgW4eSZS1E4xUa83kQtBgA@mail.gmail.com>
	<CAKna9VYNiTpaQkEnRRsN7P2PX2NU6vpYZF376NFvWv8LhzfX3g@mail.gmail.com>

On Thu, Sep 19, 2024 at 5:24 AM Lok P <[email protected]> wrote:

>
> On Thu, Sep 19, 2024 at 11:31 AM Ron Johnson <[email protected]>
> wrote:
>
>>
>> [snip]
>>
>>>
>>> Method-4
>>>
>>> INSERT INTO parent_table VALUES  (1, 'a'), (2, 'a');
>>> INSERT INTO child_table VALUES   (1,1, 'a'), (1,2, 'a');
>>> commit;
>>>
>>
>> If I knew that I had to load a structured input data file (even if it had
>> parent and child records), this is how I'd do it (but probably first try
>> and see if "in-memory COPY INTO" is such a thing).
>>
>>
>
> I was trying to reproduce this behaviour using row by row commit vs just
> batch commit vs true batch insert as you mentioned, i am not able to see
> any difference between "batch commit" and "true batch insert" response. Am
> I missing anything?
>
> [snip]

> DO $$
> DECLARE
>     num_inserts INTEGER := 100000;
>     batch_size INTEGER := 50;
>     start_time TIMESTAMP;
>     end_time TIMESTAMP;
>     elapsed_time INTERVAL;
>     i INTEGER;
> BEGIN
>     -- Method 1: Individual Inserts with Commit after every Row
>     start_time := clock_timestamp();
>
>     FOR i IN 1..num_inserts LOOP
>         INSERT INTO parent_table VALUES (i, 'a');
>         COMMIT;
>     END LOOP;
>
>     end_time := clock_timestamp();
>     elapsed_time := end_time - start_time;
>     INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>     VALUES ('Method 1: Individual Inserts with Commit after every Row',
> start_time, end_time, elapsed_time);
>
>     -- Method 2: Individual Inserts with Commit after 100 Rows
>     start_time := clock_timestamp();
>
>     FOR i IN 1..num_inserts LOOP
>         INSERT INTO parent_table2 VALUES (i, 'a');
>         -- Commit after every 100 rows
>         IF i % batch_size = 0 THEN
>             COMMIT;
>         END IF;
>     END LOOP;
>
>     -- Final commit if not already committed
>    commit;
>
>     end_time := clock_timestamp();
>     elapsed_time := end_time - start_time;
>     INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>     VALUES ('Method 2: Individual Inserts with Commit after 100 Rows',
> start_time, end_time, elapsed_time);
>
>     -- Method 3: Batch Inserts with Commit after all
>     start_time := clock_timestamp();
>
>     FOR i IN 1..(num_inserts / batch_size) LOOP
>         INSERT INTO parent_table3 VALUES
>             (1 + (i - 1) * batch_size, 'a'),
>
[snip]

>             (49 + (i - 1) * batch_size, 'a'),
>             (50 + (i - 1) * batch_size, 'a'));
> COMMIT;
>     END LOOP;
>
>     COMMIT;  -- Final commit for all
>     end_time := clock_timestamp();
>     elapsed_time := end_time - start_time;
>     INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>     VALUES ('Method 3: Batch Inserts with Commit after All', start_time,
> end_time, elapsed_time);
>
> END $$;
>

Reproduce what behavior?

Anyway, plpgsql functions (including anonymous DO statements) are -- to
Postgresql -- single statements.  Thus, they'll be faster than
individual calls..

An untrusted language like plpython3u might speed things up even more, if
you have to read a heterogeneous external file and insert all the records
into the db.

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> crustacean!


view thread (5+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: How batch processing works
  In-Reply-To: <CANzqJaBahWw3gGhgaatOQnUh3Daij7kjtk7a3+Z=dJNL4d4SKg@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox