Re: How batch processing works

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Lok P <[email protected]>
To: Ron Johnson <[email protected]>
Cc: pgsql-general <[email protected]>
Subject: Re: How batch processing works
Date: Fri, 20 Sep 2024 10:32:59 +0530
Message-ID: <CAKna9Vb7kg4FjkYRrDJLFGX6cqwYSxEJwo3uEq5uSupH=wp+ew@mail.gmail.com> (raw)
In-Reply-To: <CAKna9VZGR+BkBOocbShxRpcXJMgeT=aFDYut+Va9n5P42mqtcg@mail.gmail.com>
References: <CAKna9VY2v0XsDberzbJXZ4MqEW1RUtD0L_Mis_vrgEQWZgH0gg@mail.gmail.com>
	<CANzqJaDLSfHeWqe5BGS93bLhE4PnUgW4eSZS1E4xUa83kQtBgA@mail.gmail.com>
	<CAKna9VYNiTpaQkEnRRsN7P2PX2NU6vpYZF376NFvWv8LhzfX3g@mail.gmail.com>
	<CANzqJaBahWw3gGhgaatOQnUh3Daij7kjtk7a3+Z=dJNL4d4SKg@mail.gmail.com>
	<CAKna9VZGR+BkBOocbShxRpcXJMgeT=aFDYut+Va9n5P42mqtcg@mail.gmail.com>

Below are the results for the posted methods. Tested it on local and it
gave no difference in timing between the method-2 andmethod-3. Failed to
run in dbfiddle somehow.

Also I was initially worried if adding the trigger to the our target table,
will worsen the performance as because , it will make all the execution to
"row by row" rather a true batch insert(method-3 as posted) as there will
be more number of context switches , but it seems it will still be doing
the batch commits(like the way its in method-2). So as per that , we won't
lose any performance as such. Is this understanding correct?


*Method-1- 00:01:44.48*

*Method-2- 00:00:02.67*

*Method-3- 00:00:02.39*

https://gist.github.com/databasetech0073/8e9106757d751358c0c0c65a2374dbc6

On Thu, Sep 19, 2024 at 6:42 PM Lok P <[email protected]> wrote:

>
>
> On Thu, Sep 19, 2024 at 5:40 PM Ron Johnson <[email protected]>
> wrote:
>
>> On Thu, Sep 19, 2024 at 5:24 AM Lok P <[email protected]> wrote:
>>
>>>
>>>
>>> [snip]
>>
>>> DO $$
>>> DECLARE
>>>     num_inserts INTEGER := 100000;
>>>     batch_size INTEGER := 50;
>>>     start_time TIMESTAMP;
>>>     end_time TIMESTAMP;
>>>     elapsed_time INTERVAL;
>>>     i INTEGER;
>>> BEGIN
>>>     -- Method 1: Individual Inserts with Commit after every Row
>>>     start_time := clock_timestamp();
>>>
>>>     FOR i IN 1..num_inserts LOOP
>>>         INSERT INTO parent_table VALUES (i, 'a');
>>>         COMMIT;
>>>     END LOOP;
>>>
>>>     end_time := clock_timestamp();
>>>     elapsed_time := end_time - start_time;
>>>     INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>>>     VALUES ('Method 1: Individual Inserts with Commit after every Row',
>>> start_time, end_time, elapsed_time);
>>>
>>>     -- Method 2: Individual Inserts with Commit after 100 Rows
>>>     start_time := clock_timestamp();
>>>
>>>     FOR i IN 1..num_inserts LOOP
>>>         INSERT INTO parent_table2 VALUES (i, 'a');
>>>         -- Commit after every 100 rows
>>>         IF i % batch_size = 0 THEN
>>>             COMMIT;
>>>         END IF;
>>>     END LOOP;
>>>
>>>     -- Final commit if not already committed
>>>    commit;
>>>
>>>     end_time := clock_timestamp();
>>>     elapsed_time := end_time - start_time;
>>>     INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>>>     VALUES ('Method 2: Individual Inserts with Commit after 100 Rows',
>>> start_time, end_time, elapsed_time);
>>>
>>>     -- Method 3: Batch Inserts with Commit after all
>>>     start_time := clock_timestamp();
>>>
>>>     FOR i IN 1..(num_inserts / batch_size) LOOP
>>>         INSERT INTO parent_table3 VALUES
>>>             (1 + (i - 1) * batch_size, 'a'),
>>>
>> [snip]
>>
>>>             (49 + (i - 1) * batch_size, 'a'),
>>>             (50 + (i - 1) * batch_size, 'a'));
>>> COMMIT;
>>>     END LOOP;
>>>
>>>     COMMIT;  -- Final commit for all
>>>     end_time := clock_timestamp();
>>>     elapsed_time := end_time - start_time;
>>>     INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>>>     VALUES ('Method 3: Batch Inserts with Commit after All', start_time,
>>> end_time, elapsed_time);
>>>
>>> END $$;
>>>
>>
>> Reproduce what behavior?
>>
>> Anyway, plpgsql functions (including anonymous DO statements) are -- to
>> Postgresql -- single statements.  Thus, they'll be faster than
>> individual calls..
>>
>> An untrusted language like plpython3u might speed things up even more, if
>> you have to read a heterogeneous external file and insert all the records
>> into the db.
>>
>
> Here if you see my script , the method-1 is doing commit after each row
> insert. And method-2 is doing a batch commit i.e. commit after every "50"
> row. And method-3 is doing a true batch insert i.e. combining all the 50
> values in one insert statement and submitting to the database in oneshot
> and then COMMIT it, so the context switching will be a lot less. So I was
> expecting Method-3 to be the fastest way to insert the rows here, but the
> response time shows the same response time for Method-2 and method-3.
> Method-1 is the slowest through.
>

view thread (5+ messages)

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: How batch processing works
  In-Reply-To: <CAKna9Vb7kg4FjkYRrDJLFGX6cqwYSxEJwo3uEq5uSupH=wp+ew@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox