Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t0ZdF-00C99K-0x for pgsql-general@arkaria.postgresql.org; Tue, 15 Oct 2024 04:58:05 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1t0ZdD-0039mb-7X for pgsql-general@arkaria.postgresql.org; Tue, 15 Oct 2024 04:58:03 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t0ZdC-0039mS-T2 for pgsql-general@lists.postgresql.org; Tue, 15 Oct 2024 04:58:03 +0000 Received: from mail-lf1-x12f.google.com ([2a00:1450:4864:20::12f]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1t0ZdB-0017ov-59 for pgsql-general@postgresql.org; Tue, 15 Oct 2024 04:58:02 +0000 Received: by mail-lf1-x12f.google.com with SMTP id 2adb3069b0e04-539d9fffea1so3675318e87.2 for ; Mon, 14 Oct 2024 21:58:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1728968279; x=1729573079; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=JI4KPTdlfn7u3fjFq/2dfu/tuqWUF/hREmtrvAbnQ/I=; b=ALBxcB8UHbU3M9sJBfQRFuxzyGZEC2grgu5nyJdbnpmqg8pnCuj0SRvsRDI/gEi/qi NoeRmwd/iidDSIGqeylNBrK+0kP/YOGIRY8mL6sjfRB/ZQ5DCakLVjS9FHU08NINj5IR bZGULjOyTnO5XUqz56BaExOzXQAcU7BcmFJYfwoxu5wLCLCnsdZbwuCZJTrabTcmsTpe EvzfRgIV5lmHJqAw2XOXR0S6ZoCBa6hGsikTBbVEitlcpEY+bqnUEHw6pNM0/XqqRoAZ nRfF+ODQTdIVjwqVW/b7ea+UhRv9i5n5SfNiDdbs4yZQFtf0buZd6XuAeDEdz5ZZjZpt 1S+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1728968279; x=1729573079; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=JI4KPTdlfn7u3fjFq/2dfu/tuqWUF/hREmtrvAbnQ/I=; b=KrXES70nVhMyW8g0qctSRe0f4J8Lb0Jyk4qY7+72TkvjJjUtJ2VesB21pTPFrSvP7X QbbEAYhua6I/RsrzHNfgzGKOasNnUOS3zs0atigxVl+VVjhDiI5NW2LubI2AXaCyhkqm CYV9UV+R1doua/KsNASVmbv5S5qtl2NtyI7ouCh6lBh1MjYtGRiQBgZaGSctv3UUXUQM Q9y36ihjArSGxjpvmpS8IPymCjjzB0Vacfj1VwxYZ7vH2j/qws1HEVSzAbMz2oDBLfou LiLcefrUYkVJ5+nxfTs+e4t74FOR22VELHO/uGsasP3v8PiW0uH/Qqyxb5mC2DiUTnyG 7kVA== X-Gm-Message-State: AOJu0Yz0ZjK9udVvqcuwl92Q11KyAbMXLUP0geIWJ4UIjcduD5eDsTOf qpFzka1zdGLnR8C713I08zUsrU1GmpepzlDY3n5Cf+52O4cY91cL7GeRr9m5mEPRZ387U5xlfWn SIlSisM/xxBGdN2aHLxHbF/3tFIM= X-Google-Smtp-Source: AGHT+IHEyCLtMT2RNYLwBHi1MzKOIMmVUj0ozQJkEvkUnICl8p2uEnWRTwDensbZUhfQEzFP2Yf2hZETKopl+2H6AFg= X-Received: by 2002:a05:6512:3090:b0:534:543e:1895 with SMTP id 2adb3069b0e04-539da565e2bmr6673187e87.39.1728968278466; Mon, 14 Oct 2024 21:57:58 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: David Rowley Date: Tue, 15 Oct 2024 17:57:46 +1300 Message-ID: Subject: Re: How to Copy/Load 1 billions rows into a Partition Tables Fast To: "Wong, Kam Fook (TR Technology)" Cc: pgsql-general Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Tue, 15 Oct 2024 at 06:59, Wong, Kam Fook (TR Technology) wrote: > I am trying to copy a table (Postgres) that is close to 1 billion rows into a Partition table (Postgres) within the same DB. What is the fastest way to copy the data? This table has 37 columns where some of which are text data types. Is the purpose of this question because you're partitioning an existing table? If so, you might want to consider if speed is the biggest consideration to the requirements. It is possible, for example, to partition a table "online" by using table inheritance as an intermediate way to partition the table and migrate the rows in smaller batches into inheritance child tables with CHECK constraints backing up the partition constraint. You can use a CTE with a DELETE .. WHERE RETURNING with an INSERT INTO new_table SELECT * FROM cte;. Once the inheritance parent table is empty, you can then consider rearranging the inheritance hierarchy into a partitioned table and its partitions. The CHECK constraint will allow the tables to be ATTACHed as partitions to a new partitioned table without having to scan each partition to ensure no rows violate the partition constraint. If done correctly, the only blocking operation done is some DDL which includes renaming a table and attaching all the partitions. All of that should be metadata-only operations. You'll want to rehearse the migration a few times away from production to help ensure it'll run smoothly on the day. I'm not familiar with pg_bulkload so can't comment on the other suggestions, however, I'd be surprised if exporting the data out of and back into PostgreSQL would be faster than having it remain inside PostgreSQL. Not exporting/importing means you don't need to call output and input functions for every row and column. If you didn't want to go down the inheritance table as an intermediate step, then you might find it's quite fast to start up a series of parallel jobs to INSERT INTO partition_name SELECT * FROM original_table WHERE ; David