public inbox for [email protected]  
help / color / mirror / Atom feed
Re: How batch processing works
4+ messages / 3 participants
[nested] [flat]

* Re: How batch processing works
@ 2024-09-21 15:25 Lok P <[email protected]>
  2024-09-22 19:23 ` Re: How batch processing works Peter J. Holzer <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Lok P @ 2024-09-21 15:25 UTC (permalink / raw)
  To: [email protected]

On Sat, Sep 21, 2024 at 8:07 PM Peter J. Holzer <[email protected]> wrote:

> On 2024-09-21 16:44:08 +0530, Lok P wrote:
> > But wondering why we don't see any difference in performance between
> method-2
> > and method-3 above.
>
> The code runs completely inside the database. So there isn't much
> difference between a single statement which inserts 50 rows and 50
> statements which insert 1 row each. The work to be done is (almost) the
> same.
>
> This changes once you consider an application which runs outside of the
> database (maybe even on a different host). Such an application has to
> wait for the result of each statement before it can send the next one.
> Now it makes a difference whether you are waiting 50 times for a
> statement which does very little or just once for a statement which does
> more work.
>
> > So does it mean that,I am testing this in a wrong way or
>
> That depends on what you want to test. If you are interested in the
> behaviour of stored procedures, the test is correct. If you want to know
> about the performance of a database client (whether its written in Java,
> Python, Go or whatever), this is the wrong test. You have to write the
> test in your target language and run it on the client system to get
> realistic results (for example, the round-trip times will be a lot
> shorter if the client and database are on the same computer than when
> one is in Europe and the other in America).
>
> For example, here are the three methods as Python scripts:
>
>
> ---------------------------------------------------------------------------------------------------
> #!/usr/bin/python3
>
> import time
> import psycopg2
>
> num_inserts = 10_000
>
> db = psycopg2.connect()
> csr = db.cursor()
>
> csr.execute("drop table if exists parent_table")
> csr.execute("create table parent_table (id int primary key, t text)")
>
> start_time = time.monotonic()
> for i in range(1, num_inserts+1):
>     csr.execute("insert into parent_table values(%s, %s)", (i, 'a'))
>     db.commit()
> end_time = time.monotonic()
> elapsed_time = end_time - start_time
> print(f"Method 1: Individual Inserts with Commit after every Row:
> {elapsed_time:.3} seconds")
>
> # vim: tw=99
>
> ---------------------------------------------------------------------------------------------------
> #!/usr/bin/python3
>
> import time
> import psycopg2
>
> num_inserts = 10_000
> batch_size = 50
>
> db = psycopg2.connect()
> csr = db.cursor()
>
> csr.execute("drop table if exists parent_table")
> csr.execute("create table parent_table (id int primary key, t text)")
> db.commit()
>
> start_time = time.monotonic()
> for i in range(1, num_inserts+1):
>     csr.execute("insert into parent_table values(%s, %s)", (i, 'a'))
>     if i % batch_size == 0:
>         db.commit()
> db.commit()
> end_time = time.monotonic()
> elapsed_time = end_time - start_time
> print(f"Method 2: Individual Inserts with Commit after {batch_size}  Rows:
> {elapsed_time:.3} seconds")
>
> # vim: tw=99
>
> ---------------------------------------------------------------------------------------------------
> #!/usr/bin/python3
>
> import itertools
> import time
> import psycopg2
>
> num_inserts = 10_000
> batch_size = 50
>
> db = psycopg2.connect()
> csr = db.cursor()
>
> csr.execute("drop table if exists parent_table")
> csr.execute("create table parent_table (id int primary key, t text)")
> db.commit()
>
> start_time = time.monotonic()
> batch = []
> for i in range(1, num_inserts+1):
>     batch.append((i, 'a'))
>     if i % batch_size == 0:
>         q = "insert into parent_table values" + ",".join(["(%s, %s)"] *
> len(batch))
>         params = list(itertools.chain.from_iterable(batch))
>         csr.execute(q, params)
>         db.commit()
>         batch = []
> if batch:
>     q = "insert into parent_table values" + ",".join(["(%s, %s)"] *
> len(batch))
>     csr.execute(q, list(itertools.chain(batch)))
>     db.commit()
>     batch = []
>
> end_time = time.monotonic()
> elapsed_time = end_time - start_time
> print(f"Method 3: Batch Inserts ({batch_size})  with Commit after each
> batch: {elapsed_time:.3} seconds")
>
> # vim: tw=99
>
> ---------------------------------------------------------------------------------------------------
>
> On my laptop, method2 is about twice as fast as method3. But if I
> connect to a database on the other side of the city, method2 is now more
> than 16 times faster than method3 . Simply because the delay in
> communication is now large compared to the time it takes to insert those
> rows.
>
>
Thank you so much.
I was expecting method-3(batch insert) to be the fastest or atleast as you
said perform with similar speed as method-2 (row by row insert with batch
commit) if we do it within the procedure inside the database. But because
the context switching will be minimal in method-3 as it will prepare the
insert and submit to the database in one shot in one DB call, so it should
be a bit fast. But from your figures , it appears to be the opposite ,
i.e.method-2 is faster than method-3. Not able to understand the reason
though. So in this case then ,it appears we can follow method-2 as that is
cheaper in regards to less code change , i.e. just shifting the commit
points without any changes for doing the batch insert.

Btw,Do you have any thoughts,  why method-2 is faster as compared to
method-3 in your test?


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: How batch processing works
  2024-09-21 15:25 Re: How batch processing works Lok P <[email protected]>
@ 2024-09-22 19:23 ` Peter J. Holzer <[email protected]>
  2024-10-04 20:05   ` Re: How batch processing works Lok P <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Peter J. Holzer @ 2024-09-22 19:23 UTC (permalink / raw)
  To: [email protected]

On 2024-09-21 20:55:13 +0530, Lok P wrote:
> On Sat, Sep 21, 2024 at 8:07 PM Peter J. Holzer <[email protected]> wrote:
[... lots of code elided. method2 used 1 insert per row, method3 1
insert for 50 rows ...]

>     On my laptop, method2 is about twice as fast as method3. But if I
>     connect to a database on the other side of the city, method2 is now more
>     than 16 times faster than method3 . Simply because the delay in
>     communication is now large compared to the time it takes to insert those
>     rows.
> 
> 
> 
> Thank you so much.
> I was expecting method-3(batch insert) to be the fastest or atleast as you said
> perform with similar speed as method-2 (row by row insert with batch commit)

Oops, sorry! I wrote that the wrong way around. Method 3 is the fastest.
I guess I meant to write "method2 takes about twice as long as method3"
or something like that.

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | [email protected]         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: How batch processing works
  2024-09-21 15:25 Re: How batch processing works Lok P <[email protected]>
  2024-09-22 19:23 ` Re: How batch processing works Peter J. Holzer <[email protected]>
@ 2024-10-04 20:05   ` Lok P <[email protected]>
  2024-10-04 20:49     ` Re: How batch processing works Adrian Klaver <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Lok P @ 2024-10-04 20:05 UTC (permalink / raw)
  To: [email protected]

On Mon, Sep 23, 2024 at 12:53 AM Peter J. Holzer <[email protected]> wrote:

>
> > Thank you so much.
> > I was expecting method-3(batch insert) to be the fastest or atleast as
> you said
> > perform with similar speed as method-2 (row by row insert with batch
> commit)
>
> Oops, sorry! I wrote that the wrong way around. Method 3 is the fastest.
> I guess I meant to write "method2 takes about twice as long as method3"
> or something like that.
>
>
As in case of batch insert below is the fastest one as it inserts
multiple rows in one statement. Similarly I understand, Delete can be
batched as below. However, can you suggest how an Update can be batched in
a simple/generic fashion in JDBC for an input data stream with multiple
input values. As because for an update if we write as below , it will just
do one row update at a time?

Update <table_name> SET column1=?,   column2=?, column3=? where
<PK_Column>=? ;


INSERT INTO <table_name> VALUES  (1, 'a'), (2, 'a'),(3,'a');
Delete from <table_name> where column_name  in (<value1>,
<value2>,<value3>...);


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: How batch processing works
  2024-09-21 15:25 Re: How batch processing works Lok P <[email protected]>
  2024-09-22 19:23 ` Re: How batch processing works Peter J. Holzer <[email protected]>
  2024-10-04 20:05   ` Re: How batch processing works Lok P <[email protected]>
@ 2024-10-04 20:49     ` Adrian Klaver <[email protected]>
  0 siblings, 0 replies; 4+ messages in thread

From: Adrian Klaver @ 2024-10-04 20:49 UTC (permalink / raw)
  To: Lok P <[email protected]>; [email protected]



On 10/4/24 1:05 PM, Lok P wrote:
> 
> 
> On Mon, Sep 23, 2024 at 12:53 AM Peter J. Holzer <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> 
>      > Thank you so much.
>      > I was expecting method-3(batch insert) to be the fastest or
>     atleast as you said
>      > perform with similar speed as method-2 (row by row insert with
>     batch commit)
> 
>     Oops, sorry! I wrote that the wrong way around. Method 3 is the fastest.
>     I guess I meant to write "method2 takes about twice as long as method3"
>     or something like that.
> 
> 
> As in case of batch insert below is the fastest one as it inserts 
> multiple rows in one statement. Similarly I understand, Delete can be 
> batched as below. However, can you suggest how an Update can be batched 
> in a simple/generic fashion in JDBC for an input data stream with 
> multiple input values. As because for an update if we write as below , 
> it will just do one row update at a time?
> 
> Update <table_name> SET column1=?,   column2=?, column3=? where 
> <PK_Column>=? ;

UPDATE table_name SET column1 = vals.text_val, column2=int_val FROM 
(VALUES (1, 'dog', 23),(2, 'cat', 44)) AS vals (id, text_val, int_val) 
where table_name.id = vals.id;

> 
> 
> INSERT INTO <table_name> VALUES  (1, 'a'), (2, 'a'),(3,'a');
> Delete from <table_name> where column_name  in (<value1>, 
> <value2>,<value3>...);
> 
> 

-- 
Adrian Klaver
[email protected]






^ permalink  raw  reply  [nested|flat] 4+ messages in thread


end of thread, other threads:[~2024-10-04 20:49 UTC | newest]

Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-09-21 15:25 Re: How batch processing works Lok P <[email protected]>
2024-09-22 19:23 ` Peter J. Holzer <[email protected]>
2024-10-04 20:05   ` Lok P <[email protected]>
2024-10-04 20:49     ` Adrian Klaver <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox