public inbox for [email protected]  
help / color / mirror / Atom feed
From: Adrian Klaver <[email protected]>
To: [email protected]
Subject: Re: How batch processing works
Date: Sat, 21 Sep 2024 12:15:44 -0700
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <CAKna9VY2v0XsDberzbJXZ4MqEW1RUtD0L_Mis_vrgEQWZgH0gg@mail.gmail.com>
	<[email protected]>
	<CAKna9Vbt1VJu7Oa8FTWasgby+-kJn7omOhbfmWzkdpVwBiqNzQ@mail.gmail.com>
	<[email protected]>

On 9/21/24 07:36, Peter J. Holzer wrote:
> On 2024-09-21 16:44:08 +0530, Lok P wrote:

> ---------------------------------------------------------------------------------------------------
> #!/usr/bin/python3
> 
> import time
> import psycopg2
> 
> num_inserts = 10_000
> batch_size = 50
> 
> db = psycopg2.connect()
> csr = db.cursor()
> 
> csr.execute("drop table if exists parent_table")
> csr.execute("create table parent_table (id int primary key, t text)")
> db.commit()
> 
> start_time = time.monotonic()
> for i in range(1, num_inserts+1):
>      csr.execute("insert into parent_table values(%s, %s)", (i, 'a'))
>      if i % batch_size == 0:
>          db.commit()
> db.commit()
> end_time = time.monotonic()
> elapsed_time = end_time - start_time
> print(f"Method 2: Individual Inserts with Commit after {batch_size}  Rows: {elapsed_time:.3} seconds")
> 
> # vim: tw=99
> ---------------------------------------------------------------------------------------------------

FYI, this is less of problem with psycopg(3) and pipeline mode:

import time
import psycopg

num_inserts = 10_000
batch_size = 50

db = psycopg.connect("dbname=test user=postgres host=104.237.158.68")
csr = db.cursor()

csr.execute("drop table if exists parent_table")
csr.execute("create table parent_table (id int primary key, t text)")
db.commit()

start_time = time.monotonic()
with db.pipeline():
     for i in range(1, num_inserts+1):
         csr.execute("insert into parent_table values(%s, %s)", (i, 'a'))
         if i % batch_size == 0:
             db.commit()
db.commit()
end_time = time.monotonic()
elapsed_time = end_time - start_time
print(f"Method 2: Individual Inserts(psycopg3 pipeline mode) with Commit 
after {batch_size}  Rows: {elapsed_time:.3} seconds")


For remote to a database in another state that took the  time from:

Method 2: Individual Inserts with Commit after 50  Rows: 2.42e+02 seconds

to:

Method 2: Individual Inserts(psycopg3 pipeline mode) with Commit after 
50  Rows: 9.83 seconds

> #!/usr/bin/python3
> 
> import itertools
> import time
> import psycopg2
> 
> num_inserts = 10_000
> batch_size = 50
> 
> db = psycopg2.connect()
> csr = db.cursor()
> 
> csr.execute("drop table if exists parent_table")
> csr.execute("create table parent_table (id int primary key, t text)")
> db.commit()
> 
> start_time = time.monotonic()
> batch = []
> for i in range(1, num_inserts+1):
>      batch.append((i, 'a'))
>      if i % batch_size == 0:
>          q = "insert into parent_table values" + ",".join(["(%s, %s)"] * len(batch))
>          params = list(itertools.chain.from_iterable(batch))
>          csr.execute(q, params)
>          db.commit()
>          batch = []
> if batch:
>      q = "insert into parent_table values" + ",".join(["(%s, %s)"] * len(batch))
>      csr.execute(q, list(itertools.chain(batch)))
>      db.commit()
>      batch = []
> 
> end_time = time.monotonic()
> elapsed_time = end_time - start_time
> print(f"Method 3: Batch Inserts ({batch_size})  with Commit after each batch: {elapsed_time:.3} seconds")
> 
> # vim: tw=99
> ---------------------------------------------------------------------------------------------------

The above can also be handled with execute_batch() and execute_values() 
from:

https://www.psycopg.org/docs/extras.html#fast-execution-helpers

> 
> On my laptop, method2 is about twice as fast as method3. But if I
> connect to a database on the other side of the city, method2 is now more
> than 16 times faster than method3 . Simply because the delay in
> communication is now large compared to the time it takes to insert those
> rows.
> 
>          hp
> 

-- 
Adrian Klaver
[email protected]







view thread (7+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: How batch processing works
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox