From: Dilip Kumar
Date: Tue, 25 Nov 2025 10:20:01 +0530
Subject: Re: Patch: dumping tables data in multiple chunks in pg_dump
To: Hannu Krosing
Cc: PostgreSQL Hackers, Nathan Bossart

On Tue, Nov 25, 2025 at 2:32 AM Hannu Krosing wrote:
>
> The expectation was that as chunking is useful mainly in case of
> really huge tables the analyze should have been run "recently enough".
>
> Maybe we should use pg_relation_size() in case we have already
> determined that the table is large enough to warrant chunking? Maybe
> at least 1/2 of the requested chunk size?
>
> My reasoning was to not put too much extra load on pg_dump in case
> chunking is not required. But of course we can use the presence of a
> chunking request to decide to run pg_relation_size(), assuming the
> overhead won't be too large in this case.
>
>
> On Mon, Nov 17, 2025 at 5:15 AM Dilip Kumar wrote:
> >
> > On Tue, Nov 11, 2025 at 9:00 PM Hannu Krosing wrote:
> > >
> > > Attached is a patch that adds the ability to dump table data in multiple chunks.
> > >
> > > Looking for feedback at this point:
> > > 1) what have I missed
> > > 2) should I implement something to avoid single-page chunks
> > >
> > > The flag --huge-table-chunk-pages which tells the directory format
> > > dump to dump tables where the main fork has more pages than this in
> > > multiple chunks of given number of pages,
> > >
> > > The main use case is speeding up parallel dumps in case of one or a
> > > small number of HUGE tables so parts of these can be dumped in
> > > parallel.
> > >
> >
> > +1 for the idea, I haven't done the detailed review but I was just
> > going through the patch, I noticed that we use pg_class->relpages to
> > identify whether to chunk the table or not, which should be fine but
> > don't you think if we use direct size calculation function like
> > pg_relation_size() we might get better idea and not dependent upon
> > whether the stats are updated or not? This will make chunking
> > behavior more deterministic.

Yeah that makes sense, we can use relpages for initial identification
and then use pg_relation_size() if relpages says the table is large
enough.

-- 
Regards,
Dilip Kumar
Google
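
To make the two-step check discussed above concrete, here is a rough
sketch of the decision logic. It is not taken from the patch. The
function name plan_chunks, the callback actual_size_bytes (standing in
for a pg_relation_size() call), the 8192-byte block size, and the
"half the chunk size" pre-filter threshold are all assumptions made for
illustration.

```python
from typing import Callable, List, Tuple

BLCKSZ = 8192  # assumed block size (the PostgreSQL default build value)


def plan_chunks(
    relpages: int,
    chunk_pages: int,
    actual_size_bytes: Callable[[], int],
) -> List[Tuple[int, int]]:
    """Return inclusive (first_page, last_page) ranges, or [] for no chunking.

    relpages          -- the possibly stale estimate from pg_class
    chunk_pages       -- the requested chunk size in pages
    actual_size_bytes -- hypothetical callback standing in for
                         pg_relation_size() on the main fork
    """
    if chunk_pages <= 0:
        return []

    # Step 1: cheap pre-filter on the catalog estimate. The threshold of
    # half the requested chunk size is one reading of the suggestion in
    # the thread, not a settled value.
    if relpages < chunk_pages // 2:
        return []

    # Step 2: only for candidate tables, pay for the exact size.
    total_pages = -(-actual_size_bytes() // BLCKSZ)  # ceiling division

    # Chunk only when the table has more pages than one chunk holds.
    if total_pages <= chunk_pages:
        return []

    return [
        (start, min(start + chunk_pages, total_pages) - 1)
        for start in range(0, total_pages, chunk_pages)
    ]
```

With these assumptions, a table estimated at 600 pages but actually
holding 2500 pages is split into three ranges for a chunk size of 1000,
while a small table never triggers the exact size lookup. Note that a
table whose relpages is badly out of date would still be skipped by the
pre-filter in this sketch, so the first step remains dependent on
statistics.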