Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vJVMy-009661-2P for pgsql-hackers@arkaria.postgresql.org; Thu, 13 Nov 2025 11:20:03 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vJVMw-000Xy6-1l for pgsql-hackers@arkaria.postgresql.org; Thu, 13 Nov 2025 11:20:02 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vJVMw-000Xxx-0i for pgsql-hackers@lists.postgresql.org; Thu, 13 Nov 2025 11:20:02 +0000 Received: from mail-wr1-x42f.google.com ([2a00:1450:4864:20::42f]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vJUOP-007Vvf-2A for pgsql-hackers@postgresql.org; Thu, 13 Nov 2025 10:17:31 +0000 Received: by mail-wr1-x42f.google.com with SMTP id ffacd0b85a97d-42b312a08a2so507648f8f.1 for ; Thu, 13 Nov 2025 02:17:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1763029048; x=1763633848; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=LWNS0/aYaWlmaznLGInVRWZ/nf3A3YANWoHCVGOhdzo=; b=NwB0mAWmPKGDd9nbQu04Lp0p0TxKNEd7bN3KIP/wv6E1aTxuSA0qyPMl0GJB5m6d3e P5livSpn2rNYVVtZBhx2fNJPWzTP3AAM/O1RH/uscwUQfnkkAT/inwTGksvLCCyktLMr BKI3WCFq7JAaU9h8+tt1nk+QiLHrEABzY9wduPedDhVs/ATL56na3wyk0eRwMAhdJenM Kiapy9RJ6wZRunI0ItpC2tFBs1HJCG7LcLf4hV7WHZbwqTh/p8KVkt8V7G/i3shGKDuQ 2ZZJiddZ3/I0KbkdCjp5vucBlYZiNSRGkk6fGkz2fEonY5TUFlzLz1bVuwZFLo3fMwNZ b5XQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1763029048; x=1763633848; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=LWNS0/aYaWlmaznLGInVRWZ/nf3A3YANWoHCVGOhdzo=; b=ePbtlhgfXpooidTf/8KwVmy6XyeDFkt5sz72eIrm3YYiomZXfBVJszaJ4WSteZYDuI n0OBqnmEQeDZ/WczJ91knO5yoV6LxYqUzqQbkPlbjN3w+rK2UIlK/GKZSlLwyd1dTXnK bcwzgsDHUowBjYTvRcU/mQil0B8tTYFsAWGr05IwU7RrrqiFyHw/M91KcqwLe+/65PTh D0A4NeIlMR6xktPSlmZSvOVm9vS27NtbY6xYQpJcdZ0TXDCscHZjtb+XppU7wtPE7S8Q S/FcY0RDXoKGfbRXO7DcjHtpJ7wgTtesyP+Kdoy6bZWkPyJnueHLPuiY2ZScig/r4ZtM +W3A== X-Gm-Message-State: AOJu0YxDz1hh7aojun7LDAkqy2WGSBCa9tPqGwqlBPlT3h5XBwkIP2s2 gJzrV5F2gPSE5d6mpqpy8GqWmDCR0uETI2tAiuPFjYrjtLLvE9zUtVodBWPRasZkXMYHiDQpkf8 nste1nA6eR91VwmJeJnlWmtP0Stz9FYZUyg== X-Gm-Gg: ASbGncs62kJ0P0yR4/S+w+qtpyeJgve2nWSU2zX3WW5+NY9FSdDn4f7s6y6JEXtfNhR L6dchnp10e7q/kKpaom6K1TuqJg/lO566u+TurNKzu4uzS0/1v2RhinwQEOkMRAvqXidceauV9S boE0W8wFi+GR0PkkBI5zAHViHbfIEscVDuLRh3E5bUcqdtu2KdHbDoUnNqs2MvO1EwnZBZmmAea 0HhZAT+QN1+Q7axfVviroTLStjbrguDcSGEb9IscIneOPyEDl320BdITJd7 X-Google-Smtp-Source: AGHT+IHDbqTlaN3RcrRE54RRogaLKMxZPL88jH1QP7FcLZSU1ERaoCJjb9DE8oo9iaV6ChuouwaTLGXkvc2J6xBB/qI= X-Received: by 2002:a5d:5f87:0:b0:42b:4858:751 with SMTP id ffacd0b85a97d-42b4bdae8f8mr6268036f8f.62.1763029047892; Thu, 13 Nov 2025 02:17:27 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Ashutosh Bapat Date: Thu, 13 Nov 2025 15:47:15 +0530 X-Gm-Features: AWmQ_bkfUKSqGt-yPOUlDCa3lFQtCMQ6BdyhUmngQi5-dtMYN9k9l2rOE2xreT0 Message-ID: Subject: Re: Idea to enhance pgbench by more modes to generate data (multi-TXNs, UNNEST, COPY BINARY) To: Boris Mironov Cc: "pgsql-hackers@postgresql.org" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi Boris, On Wed, Nov 12, 2025 at 3:25=E2=80=AFAM Boris Mironov wrote: > > Hello hackers, > > For some of my specific hardware tests I needed to generate big databases= well beyond RAM size. Hence I turned to pgbench tool and its default 2 mod= es for client- and server-side generation for TPC-B tests. When I use "scal= e" factor in range of few thousands (eg, 3000 - 5000) data generation phase= takes quite some time. I looked at it as opportunity to prove/disprove 2 h= ypothesises: > > will INSERT mode work faster if we commit once every "scale" and turn sin= gle INSERT into "for" loop with commits for 3 tables in the end of each loo= p > will "INSERT .. SELECT FROM unnest" be faster than "INSERT .. SELECT FROM= generate_series" > will BINARY mode work faster than TEXT even though we send much more data > and so on > > As a result of my experiments I produced significant patch for pgbench ut= ility and though that it might be of interest not just for me. Therefore I'= m sending draft version of it in diff format for current development tree o= n GitHub. As of November 11, 2025 I can merge with main branch of the proje= ct on GitHub. > > Spoiler alert: "COPY FROM BINARY" is significantly faster than current "C= OPY FROM TEXT" > > Would be happy to polish it if there is interest to such change. Making pgbench data initialization faster at a higher scale is desirable and the community might be willing to accept such a change. Running very large benchmarks is becoming common these days. However, it's not clear what you are proposing and what's the performance improvement. Answering following question may help: Your patch implements all the above methods? Do all of them provide performance improvement? If each of them performs better under certain conditions, what are those conditions? Is there one method which performs better than all others, which is it and why not implement just that method? What performance numbers are we looking at? Can the methods which use batch commits, also run those batches in parallel? --=20 Best Wishes, Ashutosh Bapat