public inbox for [email protected]help / color / mirror / Atom feed
Re: How batch processing works 4+ messages / 3 participants [nested] [flat]
* Re: How batch processing works @ 2024-09-21 15:25 Lok P <[email protected]> 2024-09-22 19:23 ` Re: How batch processing works Peter J. Holzer <[email protected]> 0 siblings, 1 reply; 4+ messages in thread From: Lok P @ 2024-09-21 15:25 UTC (permalink / raw) To: [email protected] On Sat, Sep 21, 2024 at 8:07 PM Peter J. Holzer <[email protected]> wrote: > On 2024-09-21 16:44:08 +0530, Lok P wrote: > > But wondering why we don't see any difference in performance between > method-2 > > and method-3 above. > > The code runs completely inside the database. So there isn't much > difference between a single statement which inserts 50 rows and 50 > statements which insert 1 row each. The work to be done is (almost) the > same. > > This changes once you consider an application which runs outside of the > database (maybe even on a different host). Such an application has to > wait for the result of each statement before it can send the next one. > Now it makes a difference whether you are waiting 50 times for a > statement which does very little or just once for a statement which does > more work. > > > So does it mean that,I am testing this in a wrong way or > > That depends on what you want to test. If you are interested in the > behaviour of stored procedures, the test is correct. If you want to know > about the performance of a database client (whether its written in Java, > Python, Go or whatever), this is the wrong test. You have to write the > test in your target language and run it on the client system to get > realistic results (for example, the round-trip times will be a lot > shorter if the client and database are on the same computer than when > one is in Europe and the other in America). > > For example, here are the three methods as Python scripts: > > > --------------------------------------------------------------------------------------------------- > #!/usr/bin/python3 > > import time > import psycopg2 > > num_inserts = 10_000 > > db = psycopg2.connect() > csr = db.cursor() > > csr.execute("drop table if exists parent_table") > csr.execute("create table parent_table (id int primary key, t text)") > > start_time = time.monotonic() > for i in range(1, num_inserts+1): > csr.execute("insert into parent_table values(%s, %s)", (i, 'a')) > db.commit() > end_time = time.monotonic() > elapsed_time = end_time - start_time > print(f"Method 1: Individual Inserts with Commit after every Row: > {elapsed_time:.3} seconds") > > # vim: tw=99 > > --------------------------------------------------------------------------------------------------- > #!/usr/bin/python3 > > import time > import psycopg2 > > num_inserts = 10_000 > batch_size = 50 > > db = psycopg2.connect() > csr = db.cursor() > > csr.execute("drop table if exists parent_table") > csr.execute("create table parent_table (id int primary key, t text)") > db.commit() > > start_time = time.monotonic() > for i in range(1, num_inserts+1): > csr.execute("insert into parent_table values(%s, %s)", (i, 'a')) > if i % batch_size == 0: > db.commit() > db.commit() > end_time = time.monotonic() > elapsed_time = end_time - start_time > print(f"Method 2: Individual Inserts with Commit after {batch_size} Rows: > {elapsed_time:.3} seconds") > > # vim: tw=99 > > --------------------------------------------------------------------------------------------------- > #!/usr/bin/python3 > > import itertools > import time > import psycopg2 > > num_inserts = 10_000 > batch_size = 50 > > db = psycopg2.connect() > csr = db.cursor() > > csr.execute("drop table if exists parent_table") > csr.execute("create table parent_table (id int primary key, t text)") > db.commit() > > start_time = time.monotonic() > batch = [] > for i in range(1, num_inserts+1): > batch.append((i, 'a')) > if i % batch_size == 0: > q = "insert into parent_table values" + ",".join(["(%s, %s)"] * > len(batch)) > params = list(itertools.chain.from_iterable(batch)) > csr.execute(q, params) > db.commit() > batch = [] > if batch: > q = "insert into parent_table values" + ",".join(["(%s, %s)"] * > len(batch)) > csr.execute(q, list(itertools.chain(batch))) > db.commit() > batch = [] > > end_time = time.monotonic() > elapsed_time = end_time - start_time > print(f"Method 3: Batch Inserts ({batch_size}) with Commit after each > batch: {elapsed_time:.3} seconds") > > # vim: tw=99 > > --------------------------------------------------------------------------------------------------- > > On my laptop, method2 is about twice as fast as method3. But if I > connect to a database on the other side of the city, method2 is now more > than 16 times faster than method3 . Simply because the delay in > communication is now large compared to the time it takes to insert those > rows. > > Thank you so much. I was expecting method-3(batch insert) to be the fastest or atleast as you said perform with similar speed as method-2 (row by row insert with batch commit) if we do it within the procedure inside the database. But because the context switching will be minimal in method-3 as it will prepare the insert and submit to the database in one shot in one DB call, so it should be a bit fast. But from your figures , it appears to be the opposite , i.e.method-2 is faster than method-3. Not able to understand the reason though. So in this case then ,it appears we can follow method-2 as that is cheaper in regards to less code change , i.e. just shifting the commit points without any changes for doing the batch insert. Btw,Do you have any thoughts, why method-2 is faster as compared to method-3 in your test? ^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: How batch processing works 2024-09-21 15:25 Re: How batch processing works Lok P <[email protected]> @ 2024-09-22 19:23 ` Peter J. Holzer <[email protected]> 2024-10-04 20:05 ` Re: How batch processing works Lok P <[email protected]> 0 siblings, 1 reply; 4+ messages in thread From: Peter J. Holzer @ 2024-09-22 19:23 UTC (permalink / raw) To: [email protected] On 2024-09-21 20:55:13 +0530, Lok P wrote: > On Sat, Sep 21, 2024 at 8:07 PM Peter J. Holzer <[email protected]> wrote: [... lots of code elided. method2 used 1 insert per row, method3 1 insert for 50 rows ...] > On my laptop, method2 is about twice as fast as method3. But if I > connect to a database on the other side of the city, method2 is now more > than 16 times faster than method3 . Simply because the delay in > communication is now large compared to the time it takes to insert those > rows. > > > > Thank you so much. > I was expecting method-3(batch insert) to be the fastest or atleast as you said > perform with similar speed as method-2 (row by row insert with batch commit) Oops, sorry! I wrote that the wrong way around. Method 3 is the fastest. I guess I meant to write "method2 takes about twice as long as method3" or something like that. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | [email protected] | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" Attachments: [application/pgp-signature] signature.asc (833B, 2-signature.asc) download ^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: How batch processing works 2024-09-21 15:25 Re: How batch processing works Lok P <[email protected]> 2024-09-22 19:23 ` Re: How batch processing works Peter J. Holzer <[email protected]> @ 2024-10-04 20:05 ` Lok P <[email protected]> 2024-10-04 20:49 ` Re: How batch processing works Adrian Klaver <[email protected]> 0 siblings, 1 reply; 4+ messages in thread From: Lok P @ 2024-10-04 20:05 UTC (permalink / raw) To: [email protected] On Mon, Sep 23, 2024 at 12:53 AM Peter J. Holzer <[email protected]> wrote: > > > Thank you so much. > > I was expecting method-3(batch insert) to be the fastest or atleast as > you said > > perform with similar speed as method-2 (row by row insert with batch > commit) > > Oops, sorry! I wrote that the wrong way around. Method 3 is the fastest. > I guess I meant to write "method2 takes about twice as long as method3" > or something like that. > > As in case of batch insert below is the fastest one as it inserts multiple rows in one statement. Similarly I understand, Delete can be batched as below. However, can you suggest how an Update can be batched in a simple/generic fashion in JDBC for an input data stream with multiple input values. As because for an update if we write as below , it will just do one row update at a time? Update <table_name> SET column1=?, column2=?, column3=? where <PK_Column>=? ; INSERT INTO <table_name> VALUES (1, 'a'), (2, 'a'),(3,'a'); Delete from <table_name> where column_name in (<value1>, <value2>,<value3>...); ^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: How batch processing works 2024-09-21 15:25 Re: How batch processing works Lok P <[email protected]> 2024-09-22 19:23 ` Re: How batch processing works Peter J. Holzer <[email protected]> 2024-10-04 20:05 ` Re: How batch processing works Lok P <[email protected]> @ 2024-10-04 20:49 ` Adrian Klaver <[email protected]> 0 siblings, 0 replies; 4+ messages in thread From: Adrian Klaver @ 2024-10-04 20:49 UTC (permalink / raw) To: Lok P <[email protected]>; [email protected] On 10/4/24 1:05 PM, Lok P wrote: > > > On Mon, Sep 23, 2024 at 12:53 AM Peter J. Holzer <[email protected] > <mailto:[email protected]>> wrote: > > > > Thank you so much. > > I was expecting method-3(batch insert) to be the fastest or > atleast as you said > > perform with similar speed as method-2 (row by row insert with > batch commit) > > Oops, sorry! I wrote that the wrong way around. Method 3 is the fastest. > I guess I meant to write "method2 takes about twice as long as method3" > or something like that. > > > As in case of batch insert below is the fastest one as it inserts > multiple rows in one statement. Similarly I understand, Delete can be > batched as below. However, can you suggest how an Update can be batched > in a simple/generic fashion in JDBC for an input data stream with > multiple input values. As because for an update if we write as below , > it will just do one row update at a time? > > Update <table_name> SET column1=?, column2=?, column3=? where > <PK_Column>=? ; UPDATE table_name SET column1 = vals.text_val, column2=int_val FROM (VALUES (1, 'dog', 23),(2, 'cat', 44)) AS vals (id, text_val, int_val) where table_name.id = vals.id; > > > INSERT INTO <table_name> VALUES (1, 'a'), (2, 'a'),(3,'a'); > Delete from <table_name> where column_name in (<value1>, > <value2>,<value3>...); > > -- Adrian Klaver [email protected] ^ permalink raw reply [nested|flat] 4+ messages in thread
end of thread, other threads:[~2024-10-04 20:49 UTC | newest] Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2024-09-21 15:25 Re: How batch processing works Lok P <[email protected]> 2024-09-22 19:23 ` Peter J. Holzer <[email protected]> 2024-10-04 20:05 ` Lok P <[email protected]> 2024-10-04 20:49 ` Adrian Klaver <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox