From: Adrian Klaver
To: pgsql-general@lists.postgresql.org
Date: Sat, 21 Sep 2024 12:15:44 -0700
Subject: Re: How batch processing works

On 9/21/24 07:36, Peter J. Holzer wrote:
> On 2024-09-21 16:44:08 +0530, Lok P wrote:
> ---------------------------------------------------------------------------------------------------
> #!/usr/bin/python3
>
> import time
> import psycopg2
>
> num_inserts = 10_000
> batch_size = 50
>
> db = psycopg2.connect()
> csr = db.cursor()
>
> csr.execute("drop table if exists parent_table")
> csr.execute("create table parent_table (id int primary key, t text)")
> db.commit()
>
> start_time = time.monotonic()
> for i in range(1, num_inserts+1):
>     csr.execute("insert into parent_table values(%s, %s)", (i, 'a'))
>     if i % batch_size == 0:
>         db.commit()
> db.commit()
> end_time = time.monotonic()
> elapsed_time = end_time - start_time
> print(f"Method 2: Individual Inserts with Commit after {batch_size} Rows: {elapsed_time:.3} seconds")
>
> # vim: tw=99
> ---------------------------------------------------------------------------------------------------

FYI, this is less of a problem with psycopg(3) and pipeline mode:

import time
import psycopg

num_inserts = 10_000
batch_size = 50

db = psycopg.connect("dbname=test user=postgres host=104.237.158.68")
csr = db.cursor()

csr.execute("drop table if exists parent_table")
csr.execute("create table parent_table (id int primary key, t text)")
db.commit()

start_time = time.monotonic()
# Pipeline mode: send statements without waiting for each result.
with db.pipeline():
    for i in range(1, num_inserts+1):
        csr.execute("insert into parent_table values(%s, %s)", (i, 'a'))
        if i % batch_size == 0:
            db.commit()
db.commit()
end_time = time.monotonic()
elapsed_time = end_time - start_time
print(f"Method 2: Individual Inserts(psycopg3 pipeline mode) with Commit after {batch_size} Rows: {elapsed_time:.3} seconds")

For a remote connection to a database in another state, that took the time from:

Method 2: Individual Inserts with Commit after 50 Rows: 2.42e+02 seconds

to:

Method 2: Individual Inserts(psycopg3 pipeline mode) with Commit after 50 Rows: 9.83 seconds

> #!/usr/bin/python3
>
> import itertools
> import time
> import psycopg2
>
> num_inserts = 10_000
> batch_size = 50
>
> db = psycopg2.connect()
> csr = db.cursor()
>
> csr.execute("drop table if exists parent_table")
> csr.execute("create table parent_table (id int primary key, t text)")
> db.commit()
>
> start_time = time.monotonic()
> batch = []
> for i in range(1, num_inserts+1):
>     batch.append((i, 'a'))
>     if i % batch_size == 0:
>         q = "insert into parent_table values" + ",".join(["(%s, %s)"] * len(batch))
>         params = list(itertools.chain.from_iterable(batch))
>         csr.execute(q, params)
>         db.commit()
>         batch = []
> if batch:
>     q = "insert into parent_table values" + ",".join(["(%s, %s)"] * len(batch))
>     csr.execute(q, list(itertools.chain.from_iterable(batch)))
>     db.commit()
>     batch = []
>
> end_time = time.monotonic()
> elapsed_time = end_time - start_time
> print(f"Method 3: Batch Inserts ({batch_size}) with Commit after each batch: {elapsed_time:.3} seconds")
>
> # vim: tw=99
> ---------------------------------------------------------------------------------------------------

The above can also be handled with execute_batch() and execute_values() from:

https://www.psycopg.org/docs/extras.html#fast-execution-helpers

>
> On my laptop, method3 is about twice as fast as method2. But if I
> connect to a database on the other side of the city, method3 is now more
> than 16 times faster than method2, simply because the delay in
> communication is now large compared to the time it takes to insert those
> rows.
>
> hp
>

-- 
Adrian Klaver
adrian.klaver@aklaver.com
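The multi-row VALUES technique in the Method 3 script quoted above can be factored into a small helper. This is a minimal sketch, not part of psycopg2's API: `build_batches` is a hypothetical name, and it only builds the SQL text and parameter lists, so it runs without a database. It assumes every row has the same number of columns and flattens each batch, including a final partial one, with itertools.chain.from_iterable.

```python
import itertools
from typing import Any, Iterable, Iterator, Sequence


def build_batches(
    table: str, rows: Iterable[Sequence[Any]], batch_size: int
) -> Iterator[tuple[str, list[Any]]]:
    """Yield (sql, params) pairs, one multi-row INSERT per batch.

    Hypothetical helper: `table` is interpolated as-is, so it must be a
    trusted identifier; only row values go through %s placeholders.
    Assumes all rows have the same number of columns.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    while True:
        # Take up to batch_size rows; an empty list means we are done.
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        placeholder = "(" + ", ".join(["%s"] * len(batch[0])) + ")"
        sql = f"insert into {table} values " + ", ".join([placeholder] * len(batch))
        # Flatten [(1, 'a'), (2, 'a')] into [1, 'a', 2, 'a'] for every batch,
        # including the last, shorter one.
        params = list(itertools.chain.from_iterable(batch))
        yield sql, params


# Usage with psycopg2 (not executed here; needs a live connection):
#   for sql, params in build_batches("parent_table", rows, 50):
#       csr.execute(sql, params)
#       db.commit()
```

psycopg2.extras.execute_values(), mentioned above, does this kind of paging itself through its page_size argument, so it is usually the simpler choice when you do not need a commit after every batch.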
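The point about communication delay can be made concrete with a rough model. This sketch assumes each synchronous statement (an INSERT or a COMMIT) costs one network round trip plus a fixed per-row server cost, and it ignores everything else, including any BEGIN the driver may send. The function names and the latency and per-row figures are hypothetical and illustrative, not measurements.

```python
def round_trips(num_inserts: int, batch_size: int, multi_row: bool) -> int:
    """Count synchronous statements under the simplified model."""
    num_batches = -(-num_inserts // batch_size)  # ceiling division
    # Method 2 sends one INSERT per row; Method 3 one INSERT per batch.
    inserts = num_batches if multi_row else num_inserts
    commits = num_batches  # one COMMIT per batch in both methods
    return inserts + commits


def estimated_seconds(
    num_inserts: int, batch_size: int, multi_row: bool,
    rtt_ms: float, per_row_ms: float,
) -> float:
    """Network wait plus server work (both in ms), returned in seconds."""
    network = round_trips(num_inserts, batch_size, multi_row) * rtt_ms
    server = num_inserts * per_row_ms
    return (network + server) / 1000


if __name__ == "__main__":
    # Hypothetical round-trip times: a local socket vs. a remote link.
    for label, rtt_ms in [("local", 0.05), ("remote", 5.0)]:
        m2 = estimated_seconds(10_000, 50, False, rtt_ms, per_row_ms=0.02)
        m3 = estimated_seconds(10_000, 50, True, rtt_ms, per_row_ms=0.02)
        print(f"{label}: method 2 ~{m2:.2f} s, method 3 ~{m3:.2f} s, ratio {m2 / m3:.1f}x")
```

Under this model the ratio approaches 10,200 / 400 = 25.5 as the round-trip time grows, which is why the gap widens so much for a distant server. Pipeline mode, as in the psycopg(3) example, targets the same term: it sends statements without waiting for each result.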