Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ss2yq-005iUy-NY for pgsql-general@arkaria.postgresql.org; Sat, 21 Sep 2024 16:29:09 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1ss1EN-004MJz-Qb for pgsql-general@arkaria.postgresql.org; Sat, 21 Sep 2024 14:37:05 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ss1EN-004MGc-Fk for pgsql-general@lists.postgresql.org; Sat, 21 Sep 2024 14:37:04 +0000 Received: from mail.hjp.at ([212.17.106.138] helo=rorschach.hjp.at) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ss1EI-000ME1-Sj for pgsql-general@lists.postgresql.org; Sat, 21 Sep 2024 14:37:03 +0000 Received: by rorschach.hjp.at (Postfix, from userid 1000) id 1A04A1DA7B; Sat, 21 Sep 2024 16:36:29 +0200 (CEST) Date: Sat, 21 Sep 2024 16:36:29 +0200 From: "Peter J. Holzer" To: pgsql-general@lists.postgresql.org Subject: Re: How batch processing works Message-ID: <20240921143629.t2x37xfczeeunpnf@hjp.at> Mail-Followup-To: pgsql-general@lists.postgresql.org References: <4178E73A-24F5-4E3C-92F6-1532D8102C3E@kleczek.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="xvouln7ckot6xtwz" Content-Disposition: inline In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --xvouln7ckot6xtwz Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2024-09-21 16:44:08 +0530, Lok P wrote: > But wondering why we don't see any difference in performance=A0between me= thod-2 > and method-3 above. The code runs completely inside the database. So there isn't much difference between a single statement which inserts 50 rows and 50 statements which insert 1 row each. The work to be done is (almost) the same. This changes once you consider an application which runs outside of the database (maybe even on a different host). Such an application has to wait for the result of each statement before it can send the next one. Now it makes a difference whether you are waiting 50 times for a statement which does very little or just once for a statement which does more work. > So does it mean that,I am testing this in a wrong way or That depends on what you want to test. If you are interested in the behaviour of stored procedures, the test is correct. If you want to know about the performance of a database client (whether its written in Java, Python, Go or whatever), this is the wrong test. You have to write the test in your target language and run it on the client system to get realistic results (for example, the round-trip times will be a lot shorter if the client and database are on the same computer than when one is in Europe and the other in America). For example, here are the three methods as Python scripts: ---------------------------------------------------------------------------= ------------------------ #!/usr/bin/python3 import time import psycopg2 num_inserts =3D 10_000 db =3D psycopg2.connect() csr =3D db.cursor() csr.execute("drop table if exists parent_table") csr.execute("create table parent_table (id int primary key, t text)") start_time =3D time.monotonic() for i in range(1, num_inserts+1): csr.execute("insert into parent_table values(%s, %s)", (i, 'a')) db.commit() end_time =3D time.monotonic() elapsed_time =3D end_time - start_time print(f"Method 1: Individual Inserts with Commit after every Row: {elapsed_= time:.3} seconds") # vim: tw=3D99 ---------------------------------------------------------------------------= ------------------------ #!/usr/bin/python3 import time import psycopg2 num_inserts =3D 10_000 batch_size =3D 50 db =3D psycopg2.connect() csr =3D db.cursor() csr.execute("drop table if exists parent_table") csr.execute("create table parent_table (id int primary key, t text)") db.commit() start_time =3D time.monotonic() for i in range(1, num_inserts+1): csr.execute("insert into parent_table values(%s, %s)", (i, 'a')) if i % batch_size =3D=3D 0: db.commit() db.commit() end_time =3D time.monotonic() elapsed_time =3D end_time - start_time print(f"Method 2: Individual Inserts with Commit after {batch_size} Rows: = {elapsed_time:.3} seconds") # vim: tw=3D99 ---------------------------------------------------------------------------= ------------------------ #!/usr/bin/python3 import itertools import time import psycopg2 num_inserts =3D 10_000 batch_size =3D 50 db =3D psycopg2.connect() csr =3D db.cursor() csr.execute("drop table if exists parent_table") csr.execute("create table parent_table (id int primary key, t text)") db.commit() start_time =3D time.monotonic() batch =3D [] for i in range(1, num_inserts+1): batch.append((i, 'a')) if i % batch_size =3D=3D 0: q =3D "insert into parent_table values" + ",".join(["(%s, %s)"] * l= en(batch)) params =3D list(itertools.chain.from_iterable(batch)) csr.execute(q, params) db.commit() batch =3D [] if batch: q =3D "insert into parent_table values" + ",".join(["(%s, %s)"] * len(b= atch)) csr.execute(q, list(itertools.chain(batch))) db.commit() batch =3D [] end_time =3D time.monotonic() elapsed_time =3D end_time - start_time print(f"Method 3: Batch Inserts ({batch_size}) with Commit after each batc= h: {elapsed_time:.3} seconds") # vim: tw=3D99 ---------------------------------------------------------------------------= ------------------------ On my laptop, method2 is about twice as fast as method3. But if I connect to a database on the other side of the city, method2 is now more than 16 times faster than method3 . Simply because the delay in communication is now large compared to the time it takes to insert those rows. hp --=20 _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" --xvouln7ckot6xtwz Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEETtJbRjyPwVTYGJ5k8g5IURL+KF0FAmbu2ecACgkQ8g5IURL+ KF29HxAAlmf4CJrpm57VtjQLfkIeQCwyy6gb2rHBeL92NB7+ie07moxG+IAt8se/ tPLnJ9KNg9Xr1PFxGHreqaC1uyXbLgKzp1ZJjAKpyNsRtNl6GTCBRAG0+SjJWB6x rmqrwP8TG90aNV4CdSYliCBQn8Cu7MZO4sCh5ixGKkIXQAaBf+qWOQhe8V4Vjm9M /6RhBqKAa8NXZfxKFvZ8p8pFMXHEwu+OVzZUm8rcTLDFbTbGQzu2vMatxaB5T2dW HkeWrldWcznVAxYY/hhYTIMbyv/xrb/Q6cSEw0VdzEpGaCAt+EZzYxlbtLIyAuJu 6sYtKYi18lEOl3IxwUmsbqFbbcq1apoPMdNDteY8UpFWsPZYYm4wuYsMm+R9aY8H fjkdMiXZbucPt5pW62tXtHVRXbbcn6PlE6Hw6s0x4+vKiBk22rJemvhonl3qRCYQ 5HbAUSfjghvkZV1DRJ1hv3i4V4ePqo1BYmQ7tZk+EaODg7yBG6k1wyF2P78YHmBM TguBtXpcrFYdNeZSqhZu/nc6krAHPhX4KBM3FTh6XMPRgCcad9/tfQiOgkJp6OV8 intCVV92mnkdRsG9wRWCTueLy0QymLWj+lYYGbfdsPV1rq4EwMzLR7W5H/ZwvMYE o42H0gJQEO8QUYTMImboF8nxcIjy6gM5ESCcHssPuDnFeS41Z2Y= =0lGT -----END PGP SIGNATURE----- --xvouln7ckot6xtwz--