Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1smPz7-005Ya2-95 for pgsql-general@arkaria.postgresql.org; Fri, 06 Sep 2024 03:50:09 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1smPz5-005rcP-Qx for pgsql-general@arkaria.postgresql.org; Fri, 06 Sep 2024 03:50:08 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1smPz5-005rcF-Fm for pgsql-general@lists.postgresql.org; Fri, 06 Sep 2024 03:50:07 +0000 Received: from mail-pj1-x102d.google.com ([2607:f8b0:4864:20::102d]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1smPz2-000OAK-LP for pgsql-general@lists.postgresql.org; Fri, 06 Sep 2024 03:50:06 +0000 Received: by mail-pj1-x102d.google.com with SMTP id 98e67ed59e1d1-2d89dbb60bdso1127679a91.1 for ; Thu, 05 Sep 2024 20:50:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bitnine-net.20230601.gappssmtp.com; s=20230601; t=1725594602; x=1726199402; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=Ik8gDhW5bG6JLh9ZB6a0KlZKb8QlQUKM+32VS1zwwhU=; b=p2JyC9WDgyHSXCi+Av9TVdIOp0M33XpEEkfpa8+viXfDiELrkpNqThWU+bcfC+Kkie Ui3LsugvUZl0wsFfAYK3gZE1N7/15I7RXDKdumuODrQ1CBUVlk17298pqV/YknFKf+hu nQCJhDJwZmDMpvymlSeczDD7mYZQ/Ltk5Ht3kBlj1MffmeufmACgioQyEHNjjEpy9Xe9 SMHCs99K6YGVFJIUHNiRrMGYpspr77jJOOK+luugABtVudG6fcG5oSL5kysH1FtNE86G ifbx2pxNqNQOKHR9CniQDOdNQpC4tSNVtWdbb6D9IgDS0VAqy2kJzJMM1VxHR0/RCE6R IE9g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725594602; x=1726199402; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=Ik8gDhW5bG6JLh9ZB6a0KlZKb8QlQUKM+32VS1zwwhU=; b=je50Qc9RoXdUShyop9tHnG22yzJND6RyWI7mLYtXcwwgYAsjPqU3PZ2ddt6QtlGGiT ly3vF5UmdQwfk6rRJQXpOsy7JTPCRJgz/BdxnclcRBTkCYK+wlXdY+vPw1BkgbtPJ/pZ p8CcQ8fQ/E3pOddWdJkNY4TVxIJp0vZPR1sHDZ+QlvXbk6rqUlxI8YNfOmKM5LquvBeH c3LpDapiJ8ie/G7EYPL/TMX3InmQwfVdltvykQ9DT3PHdtRN+05KncyP8UkqyowSq89z qHql5Y6QYI/NrRSUuPOvn47fOxXNnfMmhf8APhOC3am3O4+0AuktM54TsqK39Tp1/GAT /K3A== X-Gm-Message-State: AOJu0YyIPOAvA2+CDt3CjA8Hnz3y1GQZRasTa2PzwpJd4ejrDXBAKzX1 NtMppaJXGRF61UMS5gywZBBCI3oFNVhl2IzvLO71mEplRw/gYtnOnQ0t43Xxept1sSshF4Ci6FO BQYq+6rWFIlW9vORjVtdHHGenIM7o6lAqMRWf5WrzCrvEy+uYAAgUKg== X-Google-Smtp-Source: AGHT+IG07/ZnhZzebcqOtN1HxVeIvlk1W7CN6ZmvopOBd35BYItW5xUiVQydP8AxVx+HpX0yY5qUX54kabvVe93US6c= X-Received: by 2002:a17:90a:cf0b:b0:2d4:6ef:cb14 with SMTP id 98e67ed59e1d1-2d8563916e5mr26654884a91.28.1725594602014; Thu, 05 Sep 2024 20:50:02 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Muhammad Usman Khan Date: Fri, 6 Sep 2024 08:49:51 +0500 Message-ID: Subject: Re: Faster data load To: Lok P Cc: pgsql-general Content-Type: multipart/alternative; boundary="00000000000047a30506216b4dea" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000047a30506216b4dea Content-Type: text/plain; charset="UTF-8" Hi, You can use pg_partman. If your table is partitioned, you can manage partitions in parallel by distributing the load across partitions concurrently. Or you can use citus. It can be an excellent solution, especially for handling large data volumes and parallelizing data operations On Fri, 6 Sept 2024 at 01:14, Lok P wrote: > Hi, > > We are having a requirement to create approx 50 billion rows in a > partition table(~1 billion rows per partition, 200+gb size daily > partitions) for a performance test. We are currently using ' insert into > select.. From or transformed query>;' method . We have dropped all indexes and constraints > First and then doing the load. Still it's taking 2-3 hours to populate one > partition. Is there a faster way to achieve this? > > Few teammate suggesting to use copy command and use file load instead, > which will be faster. So I wanted to understand, how different things it > does behind the scenes as compared to insert as select command? As because > it only deals with sql engine only. > > Additionally, when we were trying to create indexes post data load on one > partition, it took 30+ minutes. Any possible way to make it faster? > > Is there any way to drive the above things in parallel by utilizing full > database resources? > > It's postgres 15.4 > > Regards > Lok > --00000000000047a30506216b4dea Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi,

You can use pg_pa= rtman. If your table is partitioned, you can manage partitions in parallel = by distributing the load across partitions concurrently. Or you can use cit= us. It can be an excellent solution, especially for handling large data vol= umes and parallelizing data operations


On Fr= i, 6 Sept 2024 at 01:14, Lok P <= loknath.73@gmail.com> wrote:
Hi,=C2=A0

We are having a requirement to create approx 50 billion rows in= a partition table(~1 billion rows per partition, 200+gb size daily partiti= ons) for a performance test. We are currently using ' insert into <t= arget table_partition> select.. From <source_table_partition> or &= lt;some transformed query>;' method . We have dropped all indexes an= d constraints First and then doing the load. Still it's taking 2-3 hour= s to populate one partition. Is there a faster way to achieve this?=C2=A0
Few teammate suggesting to use = copy command and use file load instead, which will be faster. So I wanted = to understand, how different things it does behind the scenes as compared t= o insert as select command? As because it only deals with sql engine only.= =C2=A0

Additionally, whe= n we were trying to create indexes post data load on one partition, it took= 30+ minutes. Any possible way to make it faster?=C2=A0

Is there any way to drive the above things = in parallel by utilizing full database resources?
It's postgres 15.4
Regards
Lok
--00000000000047a30506216b4dea--