Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1snMcj-002yDA-Sp for pgsql-general@arkaria.postgresql.org; Sun, 08 Sep 2024 18:26:59 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1snMch-00HNH4-Jk for pgsql-general@arkaria.postgresql.org; Sun, 08 Sep 2024 18:26:55 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1snMcf-00HNGq-Sv for pgsql-general@lists.postgresql.org; Sun, 08 Sep 2024 18:26:55 +0000 Received: from fhigh7-smtp.messagingengine.com ([103.168.172.158]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1snMcW-000B4w-Vi for pgsql-general@lists.postgresql.org; Sun, 08 Sep 2024 18:26:52 +0000 Received: from phl-compute-01.internal (phl-compute-01.phl.internal [10.202.2.41]) by mailfhigh.phl.internal (Postfix) with ESMTP id 961D611400E7 for ; Sun, 8 Sep 2024 14:26:43 -0400 (EDT) Received: from phl-mailfrontend-02 ([10.202.2.163]) by phl-compute-01.internal (MEProxy); Sun, 08 Sep 2024 14:26:43 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=aklaver.com; h= cc:content-transfer-encoding:content-type:content-type:date:date :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm3; t=1725820003; x=1725906403; bh=SJksatAzkv8x/eQ4x6aDPpYqjyDryovaIvYhtEdtQI0=; b= ookyzHnD7j4HpCNKrNlr/rTHw5g4L2xnELa/zgPKi+chzi8TnPEN8NMAwiLRmV3e G72Z6qTlJchRq30TQmzPEBobhj6w07UcAB35EWvq2shYEeJWXDlSPC0rr6ZglmWD oyOCUOp9ZM0HA29/ThGsmz3y73y9deeSBTt6FuqOD7QrhatMeBQ9/ZUJ3tKEL4lM VBySRmpYcNjeamhjH44nvEOpDLvVND5+7h55nx3eJD77H6rynJX37IYnQ4ukob+m QOV1c5wg+qKih4eeu/+ZyMmStgG1yfBGoiIirUGGg5Kf1y3GRi+JHYI1I1p+LeJ3 tVozJQuZGyLYRC0TIeALHA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:subject:subject:to:to:x-me-proxy:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t=1725820003; x= 1725906403; bh=SJksatAzkv8x/eQ4x6aDPpYqjyDryovaIvYhtEdtQI0=; b=X pqC+2QaerBI0/MYqzqFJMmoXJmGTJQLuCpuwWK23rFx9ARyN9sxUsATlAQDWCsiP 1hKm+XBu0KxBl32Deuyi/lI2Rum6EblWC0ufaNrR5j9AcBBz4424JoZR8n6IY/84 78qF6NmmaBAqlIfm5UbsLIsZgWeiNm1xKPA28o0K/ciDEiJxe4tgxD8+VbQXGDe/ 2ncLy+0is0SetoZb07q2hRLloBQKAUrrW/UcLcwRxt4FQqPWIS9HiHUIixe9Ats4 FztfDWWRE04YENLYG9b6/mrwDH00oaGGM8sce7NUmwWoa2LfcatNf1U4885OP6oW vJYE2pQH0+frHatKCbZlg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeeftddrudeihedguddvhecutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpggftfghnshhusghstghrihgsvgdp uffrtefokffrpgfnqfghnecuuegrihhlohhuthemuceftddtnecunecujfgurhepkfffgg gfuffvfhfhjggtgfesthejredttddvjeenucfhrhhomheptegurhhirghnucfmlhgrvhgv rhcuoegrughrihgrnhdrkhhlrghvvghrsegrkhhlrghvvghrrdgtohhmqeenucggtffrrg htthgvrhhnpeeivdfhieehheegueeileejieettdejhedugeefleekvdelkeehtdfgieff feekudenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpe grughrihgrnhdrkhhlrghvvghrsegrkhhlrghvvghrrdgtohhmpdhnsggprhgtphhtthho pedupdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehpghhsqhhlqdhgvghnvghrrg hlsehlihhsthhsrdhpohhsthhgrhgvshhqlhdrohhrgh X-ME-Proxy: Feedback-ID: i76984098:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA for ; Sun, 8 Sep 2024 14:26:42 -0400 (EDT) Message-ID: <1a34eef6-9b30-46fa-bcc7-d7dc1c0602bb@aklaver.com> Date: Sun, 8 Sep 2024 11:26:42 -0700 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: Faster data load To: pgsql-general@lists.postgresql.org References: <20240908174539.teeanjwrthkjm4ti@hjp.at> Content-Language: en-US From: Adrian Klaver In-Reply-To: <20240908174539.teeanjwrthkjm4ti@hjp.at> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On 9/8/24 10:45, Peter J. Holzer wrote: > On 2024-09-06 01:44:00 +0530, Lok P wrote: >> We are having a requirement to create approx 50 billion rows in a partition >> table(~1 billion rows per partition, 200+gb size daily partitions) for a >> performance test. We are currently using ' insert into >> select.. From or ;' method . >> We have dropped all indexes and constraints First and then doing the load. >> Still it's taking 2-3 hours to populate one partition. > > That seems quite slow. Is the table very wide or does it have a large > number of indexes? > >> Is there a faster way to achieve this? >> >> Few teammate suggesting to use copy command and use file load instead, which >> will be faster. > > I doubt that. > > I benchmarked several strategies for populating tables 5 years ago and > (for my test data and on our hardware at the time - YMMV) s simple > INSERT ... SELECT was more than twice as fast as 8 parallel COPY > operations (and about 8 times as fast as a single COPY). > > Details will have changed since then (I should rerun that benchmark on > a current system), but I'd be surprised if COPY became that much faster > relative to INSERT ... SELECT. Yeah they seem to have changed a great deal. Though you are correct in saying COPY is not faster then INSERT..SELECT select version(); version ----------------------------------------------------------------------------------------------------------------------------------- PostgreSQL 16.4 (Ubuntu 16.4-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit select count(*) from nyc_taxi_duckdb ; count --------- 2846722 ll -h nyc_taxi.csv -rw-rw-r-- 1 aklaver aklaver 252M Sep 8 10:54 nyc_taxi.csv insert into nyc_duplicate select * from nyc_taxi_duckdb ; INSERT 0 2846722 Time: 7015.072 ms (00:07.015) truncate nyc_duplicate ; \copy nyc_duplicate from 'nyc_taxi.csv' with csv header COPY 2846722 Time: 8760.197 ms (00:08.760) copy nyc_duplicate from '/tmp/nyc_taxi.csv' with csv header; COPY 2846722 Time: 7904.279 ms (00:07.904) Just to see what the coming attraction offers: select version(); version -------------------------------------------------------------------------------------------------------------------------------------- PostgreSQL 17rc1 (Ubuntu 17~rc1-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit insert into nyc_duplicate select * from nyc_taxi_duckdb ; INSERT 0 2846722 Time: 5315.878 ms (00:05.316) \copy nyc_duplicate from 'nyc_taxi.csv' with csv header COPY 2846722 Time: 10042.129 ms (00:10.042) copy nyc_duplicate from '/tmp/nyc_taxi.csv' with csv header; COPY 2846722 Time: 8422.503 ms (00:08.423) > > hp > -- Adrian Klaver adrian.klaver@aklaver.com