public inbox for [email protected]  
Patch: dumping tables data in multiple chunks in pg_dump
16+ messages / 4 participants

* Patch: dumping tables data in multiple chunks in pg_dump
@ 2025-11-11 15:29 Hannu Krosing <[email protected]>
  2025-11-12 12:59 ` Re: Patch: dumping tables data in multiple chunks in pg_dump Ashutosh Bapat <[email protected]>
  2025-11-17 04:15 ` Re: Patch: dumping tables data in multiple chunks in pg_dump Dilip Kumar <[email protected]>
  0 siblings, 2 replies; 16+ messages in thread

From: Hannu Krosing @ 2025-11-11 15:29 UTC (permalink / raw)
  To: PostgreSQL Hackers <[email protected]>; +Cc: Nathan Bossart <[email protected]>

Attached is a patch that adds the ability to dump table data in multiple chunks.

Looking for feedback at this point:
 1) what have I missed
 2) should I implement something to avoid single-page chunks

The new flag --huge-table-chunk-pages tells a directory-format dump to
dump any table whose main fork has more pages than this value in
multiple chunks of the given number of pages.
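
As a rough illustration of how such page ranges could be derived, here is a minimal standalone sketch in C. It is not code from the patch: `PageRange`, `split_into_chunks` and `format_ctid_predicate` are hypothetical names, and the open-ended last chunk and the `ctid BETWEEN` form are modelled on what the attached patch and README show.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper type for illustration; not part of the patch. */
typedef struct PageRange
{
	uint32_t	start_page;		/* first heap page of the chunk */
	uint32_t	end_page;		/* last heap page; UINT32_MAX = "all the rest" */
} PageRange;

/*
 * Split a table of 'relpages' pages into chunks of 'chunk_pages' pages.
 * Stores at most 'max_out' ranges in 'out' and returns the total number
 * of chunks.  The last chunk is left open-ended so that pages added after
 * relpages was last updated are still covered.
 */
static size_t
split_into_chunks(uint32_t relpages, uint32_t chunk_pages,
				  PageRange *out, size_t max_out)
{
	size_t		n = 0;
	uint64_t	start = 0;		/* 64-bit so start + chunk_pages cannot wrap */

	assert(chunk_pages > 0);

	while (start < relpages)
	{
		uint64_t	next = start + chunk_pages;

		if (n < max_out)
		{
			out[n].start_page = (uint32_t) start;
			out[n].end_page = (next >= relpages) ? UINT32_MAX
												 : (uint32_t) (next - 1);
		}
		n++;
		start = next;
	}
	return n;
}

/* Format the row filter for one chunk; item pointers start at offset 1. */
static int
format_ctid_predicate(const PageRange *r, char *buf, size_t buflen)
{
	return snprintf(buf, buflen, "ctid BETWEEN '(%u,1)' AND '(%u,32000)'",
					(unsigned) r->start_page, (unsigned) r->end_page);
}
```

For an 8-page table and a chunk size of 2 this yields the ranges 0:1, 2:3, 4:5 and 6:4294967295, which matches the TOC entries listed in the README.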

The main use case is speeding up parallel dumps in case of one or a
small number of HUGE tables so parts of these can be dumped in
parallel.

It will also help in case the target file system has some limitations
on file sizes (4GB for FAT, 5TB for GCS).

Currently the patch includes no tests and no documentation beyond
what pg_dump --help prints. Also, all pg_log_warning lines containing
"CHUNKING" are there for debugging and need to be removed before
committing.

As implemented, no changes are needed in pg_restore, as all chunks are
already associated with the table in .toc and are thus restored into
that table.

The attached README shows how I verified that it works; also attached
is the textual file created from the directory format dump in the last
step there.

--
Hannu


Attachments:

  [application/x-patch] 0001-adds-ability-to-dump-data-for-tables-in-multiple-chu.patch (11.5K, 2-0001-adds-ability-to-dump-data-for-tables-in-multiple-chu.patch)
From 015cc46de277971d97c3b1823a5777fccb56c270 Mon Sep 17 00:00:00 2001
From: Hannu Krosing <[email protected]>
Date: Tue, 11 Nov 2025 16:11:08 +0100
Subject: [PATCH] adds ability to dump data for tables in multiple chunks
 controlled by flag --huge-table-chunk-pages

---
 src/bin/pg_dump/pg_backup.h          |   1 +
 src/bin/pg_dump/pg_backup_archiver.c |   1 +
 src/bin/pg_dump/pg_dump.c            | 157 +++++++++++++++++++++------
 src/bin/pg_dump/pg_dump.h            |   7 ++
 4 files changed, 130 insertions(+), 36 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index d9041dad720..b71caed8b83 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -178,6 +178,7 @@ typedef struct _dumpOptions
 	bool		aclsSkip;
 	const char *lockWaitTimeout;
 	int			dump_inserts;	/* 0 = COPY, otherwise rows per INSERT */
+	int			huge_table_chunk_pages; /* chunk when relpages is above this */
 
 	/* flags for various command-line long options */
 	int			disable_dollar_quoting;
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 59eaecb4ed7..d555e365ea5 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -154,6 +154,7 @@ InitDumpOptions(DumpOptions *opts)
 	opts->dumpSchema = true;
 	opts->dumpData = true;
 	opts->dumpStatistics = false;
+	opts->huge_table_chunk_pages = UINT32_MAX; /* disable chunking by default */
 }
 
 /*
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index a00918bacb4..e9ccc8e43ed 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -538,6 +538,7 @@ main(int argc, char **argv)
 		{"exclude-extension", required_argument, NULL, 17},
 		{"sequence-data", no_argument, &dopt.sequence_data, 1},
 		{"restrict-key", required_argument, NULL, 25},
+		{"huge-table-chunk-pages", required_argument, NULL, 26},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -802,6 +803,13 @@ main(int argc, char **argv)
 				dopt.restrict_key = pg_strdup(optarg);
 				break;
 
+			case 26:			/* huge table chunk pages */
+				if (!option_parse_int(optarg, "--huge-table-chunk-pages", 1, INT32_MAX,
+									  &dopt.huge_table_chunk_pages))
+					exit_nicely(1);
+				pg_log_warning("CHUNKING: set dopt.huge_table_chunk_pages to [%u]",(BlockNumber) dopt.huge_table_chunk_pages);
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -1357,6 +1365,9 @@ help(const char *progname)
 	printf(_("  --extra-float-digits=NUM     override default setting for extra_float_digits\n"));
 	printf(_("  --filter=FILENAME            include or exclude objects and data from dump\n"
 			 "                               based on expressions in FILENAME\n"));
+	printf(_("  --huge-table-chunk-pages=NUMPAGES\n"
+		     "                               Number of main table pages above which data is \n"
+			 "                               copied out in chunks, also determines the chunk size\n"));
 	printf(_("  --if-exists                  use IF EXISTS when dropping objects\n"));
 	printf(_("  --include-foreign-data=PATTERN\n"
 			 "                               include data of foreign tables on foreign\n"
@@ -2397,7 +2408,7 @@ dumpTableData_copy(Archive *fout, const void *dcontext)
 	 * a filter condition was specified.  For other cases a simple COPY
 	 * suffices.
 	 */
-	if (tdinfo->filtercond || tbinfo->relkind == RELKIND_FOREIGN_TABLE)
+	if (tdinfo->filtercond || tdinfo->chunking || tbinfo->relkind == RELKIND_FOREIGN_TABLE)
 	{
 		/* Temporary allows to access to foreign tables to dump data */
 		if (tbinfo->relkind == RELKIND_FOREIGN_TABLE)
@@ -2413,9 +2424,18 @@ dumpTableData_copy(Archive *fout, const void *dcontext)
 		else
 			appendPQExpBufferStr(q, "* ");
 
-		appendPQExpBuffer(q, "FROM %s %s) TO stdout;",
+		appendPQExpBuffer(q, "FROM %s %s",
 						  fmtQualifiedDumpable(tbinfo),
 						  tdinfo->filtercond ? tdinfo->filtercond : "");
+		if (tdinfo->chunking)
+		{
+			appendPQExpBuffer(q, "%s ctid BETWEEN '(%u,1)' AND '(%u,32000)'", /* there is no (*,0) tuple */
+								 tdinfo->filtercond?" AND ":" WHERE ",
+								 tdinfo->startPage, tdinfo->endPage);
+			pg_log_warning("CHUNKING: pages [%u:%u]",tdinfo->startPage, tdinfo->endPage);
+		}
+		
+		appendPQExpBuffer(q, ") TO stdout;");
 	}
 	else
 	{
@@ -2423,6 +2443,9 @@ dumpTableData_copy(Archive *fout, const void *dcontext)
 						  fmtQualifiedDumpable(tbinfo),
 						  column_list);
 	}
+
+	pg_log_warning("CHUNKING: data query: %s", q->data);
+	
 	res = ExecuteSqlQuery(fout, q->data, PGRES_COPY_OUT);
 	PQclear(res);
 	destroyPQExpBuffer(clistBuf);
@@ -2918,42 +2941,101 @@ dumpTableData(Archive *fout, const TableDataInfo *tdinfo)
 	{
 		TocEntry   *te;
 
-		te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
-						  ARCHIVE_OPTS(.tag = tbinfo->dobj.name,
-									   .namespace = tbinfo->dobj.namespace->dobj.name,
-									   .owner = tbinfo->rolname,
-									   .description = "TABLE DATA",
-									   .section = SECTION_DATA,
-									   .createStmt = tdDefn,
-									   .copyStmt = copyStmt,
-									   .deps = &(tbinfo->dobj.dumpId),
-									   .nDeps = 1,
-									   .dumpFn = dumpFn,
-									   .dumpArg = tdinfo));
-
-		/*
-		 * Set the TocEntry's dataLength in case we are doing a parallel dump
-		 * and want to order dump jobs by table size.  We choose to measure
-		 * dataLength in table pages (including TOAST pages) during dump, so
-		 * no scaling is needed.
-		 *
-		 * However, relpages is declared as "integer" in pg_class, and hence
-		 * also in TableInfo, but it's really BlockNumber a/k/a unsigned int.
-		 * Cast so that we get the right interpretation of table sizes
-		 * exceeding INT_MAX pages.
+		/* chunking works off relpages, which may be slightly off
+		 * but is the best we have without doing our own page count
+		 * should be enough for typical use case of huge tables which 
+		 * should have their relpages updated by autovacuum
+		 * 
+		 * We should likely have a slight hysteresis here to avoid
+		 * tiny chunks when relpages is close to the threshold
 		 */
-		te->dataLength = (BlockNumber) tbinfo->relpages;
-		te->dataLength += (BlockNumber) tbinfo->toastpages;
+		if ((BlockNumber) tbinfo->relpages < dopt->huge_table_chunk_pages) /* TODO: add hysteresis here, maybe < 1.1 * huge_table_chunk_pages */
+		{
+			pg_log_warning("CHUNKING: toc for simple relpages [%u]",(BlockNumber) tbinfo->relpages);
+
+			te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
+							ARCHIVE_OPTS(.tag = tbinfo->dobj.name,
+										.namespace = tbinfo->dobj.namespace->dobj.name,
+										.owner = tbinfo->rolname,
+										.description = "TABLE DATA",
+										.section = SECTION_DATA,
+										.createStmt = tdDefn,
+										.copyStmt = copyStmt,
+										.deps = &(tbinfo->dobj.dumpId),
+										.nDeps = 1,
+										.dumpFn = dumpFn,
+										.dumpArg = tdinfo));
 
-		/*
-		 * If pgoff_t is only 32 bits wide, the above refinement is useless,
-		 * and instead we'd better worry about integer overflow.  Clamp to
-		 * INT_MAX if the correct result exceeds that.
-		 */
-		if (sizeof(te->dataLength) == 4 &&
-			(tbinfo->relpages < 0 || tbinfo->toastpages < 0 ||
-			 te->dataLength < 0))
-			te->dataLength = INT_MAX;
+			/*
+			* Set the TocEntry's dataLength in case we are doing a parallel dump
+			* and want to order dump jobs by table size.  We choose to measure
+			* dataLength in table pages (including TOAST pages) during dump, so
+			* no scaling is needed.
+			*
+			* However, relpages is declared as "integer" in pg_class, and hence
+			* also in TableInfo, but it's really BlockNumber a/k/a unsigned int.
+			* Cast so that we get the right interpretation of table sizes
+			* exceeding INT_MAX pages.
+			*/
+			te->dataLength = (BlockNumber) tbinfo->relpages;
+			te->dataLength += (BlockNumber) tbinfo->toastpages;
+
+			/*
+			* If pgoff_t is only 32 bits wide, the above refinement is useless,
+			* and instead we'd better worry about integer overflow.  Clamp to
+			* INT_MAX if the correct result exceeds that.
+			*/
+			if (sizeof(te->dataLength) == 4 &&
+				(tbinfo->relpages < 0 || tbinfo->toastpages < 0 ||
+				te->dataLength < 0))
+				te->dataLength = INT_MAX;
+		}
+		else
+		{
+			BlockNumber current_chunk_start = 0;
+			PQExpBuffer chunk_desc = createPQExpBuffer();
+			
+			pg_log_warning("CHUNKING: toc for chunked relpages [%u]",(BlockNumber) tbinfo->relpages);
+
+			while (current_chunk_start < (BlockNumber) tbinfo->relpages)/* TODO: add hysteresis here, maybe < 1.1 * huge_table_chunk_pages */
+			{
+				TableDataInfo *chunk_tdinfo = (TableDataInfo *) pg_malloc(sizeof(TableDataInfo));
+
+				memcpy(chunk_tdinfo, tdinfo, sizeof(TableDataInfo));
+				AssignDumpId(&chunk_tdinfo->dobj);
+				//addObjectDependency(&chunk_tdinfo->dobj, tbinfo->dobj.dumpId); /* do we need this here */
+				chunk_tdinfo->chunking = true;
+				chunk_tdinfo->startPage = current_chunk_start;
+				chunk_tdinfo->endPage = current_chunk_start + dopt->huge_table_chunk_pages - 1;
+
+				pg_log_warning("CHUNKING: toc for pages [%u:%u]",chunk_tdinfo->startPage, chunk_tdinfo->endPage);
+				
+				current_chunk_start += dopt->huge_table_chunk_pages;
+				if (current_chunk_start >= (BlockNumber) tbinfo->relpages)
+					chunk_tdinfo->endPage = UINT32_MAX; /* last chunk is for "all the rest" */
+
+				printfPQExpBuffer(chunk_desc, "TABLE DATA (pages %u:%u)", chunk_tdinfo->startPage, chunk_tdinfo->endPage);
+
+				te = ArchiveEntry(fout, chunk_tdinfo->dobj.catId, chunk_tdinfo->dobj.dumpId,
+							ARCHIVE_OPTS(.tag = tbinfo->dobj.name,
+										.namespace = tbinfo->dobj.namespace->dobj.name,
+										.owner = tbinfo->rolname,
+										.description = chunk_desc->data,
+										.section = SECTION_DATA,
+										.createStmt = tdDefn,
+										.copyStmt = copyStmt,
+										.deps = &(tbinfo->dobj.dumpId),
+										.nDeps = 1,
+										.dumpFn = dumpFn,
+										.dumpArg = chunk_tdinfo));
+
+				te->dataLength = dopt->huge_table_chunk_pages;
+				/* let's assume toast pages distribute evenly among chunks */
+				te->dataLength += (off_t)dopt->huge_table_chunk_pages * tbinfo->toastpages / tbinfo->relpages;
+			}
+
+			destroyPQExpBuffer(chunk_desc);
+		}
 	}
 
 	destroyPQExpBuffer(copyBuf);
@@ -3077,6 +3159,9 @@ makeTableDataInfo(DumpOptions *dopt, TableInfo *tbinfo)
 	tdinfo->dobj.namespace = tbinfo->dobj.namespace;
 	tdinfo->tdtable = tbinfo;
 	tdinfo->filtercond = NULL;	/* might get set later */
+	tdinfo->chunking = false; /* defaults */
+	tdinfo->startPage = 0;
+	tdinfo->endPage = UINT32_MAX;
 	addObjectDependency(&tdinfo->dobj, tbinfo->dobj.dumpId);
 
 	/* A TableDataInfo contains data, of course */
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 72a00e1bc20..30e8160ea66 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -16,6 +16,7 @@
 
 #include "pg_backup.h"
 #include "catalog/pg_publication_d.h"
+#include "storage/block.h"
 
 
 #define oidcmp(x,y) ( ((x) < (y) ? -1 : ((x) > (y)) ?  1 : 0) )
@@ -413,6 +414,12 @@ typedef struct _tableDataInfo
 	DumpableObject dobj;
 	TableInfo  *tdtable;		/* link to table to dump */
 	char	   *filtercond;		/* WHERE condition to limit rows dumped */
+	bool 		chunking;
+	BlockNumber	startPage;		/* starting table page */
+	BlockNumber	endPage;		/* ending table page for page-range dump,
+	                    		 * usually startPage+huge_table_chunk_pages 
+								 * but we may want to do some small hysteresis to avoid single-page chunks
+								 */
 } TableDataInfo;
 
 typedef struct _indxInfo
-- 
2.51.2.1041.gc1ab5b90ca-goog



  [text/markdown] README.pg_dump.md (3.7K, 3-README.pg_dump.md)
dumptest=# create table t1(id serial primary key, data text);
CREATE TABLE
dumptest=# insert into t1(data) values('a'), ('b'), ('c');
INSERT 0 3
dumptest=# create table t2(id serial primary key, data char(1800));
CREATE TABLE
dumptest=# insert into t2(data) select i from generate_series(1,30) g(i);
INSERT 0 30
dumptest=# select ctid from t2;
 ctid  
-------
 (0,1)
 (0,2)
 (0,3)
 (0,4)
 (1,1)
 (1,2)
 (1,3)
 (1,4)
 (2,1)
 (2,2)
 (2,3)
 (2,4)
 (3,1)
 (3,2)
 (3,3)
 (3,4)
 (4,1)
 (4,2)
 (4,3)
 (4,4)
 (5,1)
 (5,2)
 (5,3)
 (5,4)
 (6,1)
 (6,2)
 (6,3)
 (6,4)
 (7,1)
 (7,2)
(30 rows)


----------------------------
hannuk@hannuk007:~/work5/postgres-git/src/bin/pg_dump$ ./pg_dump -h /var/run/postgresql -p 5432 --format=directory --huge-table-chunk-pages=2 -f ../dumptest dumptest
pg_dump: warning: CHUNKING: set dopt.huge_table_chunk_pages to [2]
pg_dump: warning: CHUNKING: toc for simple relpages [1]
pg_dump: warning: CHUNKING: toc for chunked relpages [8]
pg_dump: warning: CHUNKING: toc for pages [0:1]
pg_dump: warning: CHUNKING: toc for pages [2:3]
pg_dump: warning: CHUNKING: toc for pages [4:5]
pg_dump: warning: CHUNKING: toc for pages [6:7]
pg_dump: warning: CHUNKING: data query: COPY public.t1 (id, data) TO stdout;
pg_dump: warning: CHUNKING: pages [0:1]
pg_dump: warning: CHUNKING: data query: COPY (SELECT id, data FROM public.t2  WHERE  ctid BETWEEN '(0,1)' AND '(1,32000)') TO stdout;
pg_dump: warning: CHUNKING: pages [2:3]
pg_dump: warning: CHUNKING: data query: COPY (SELECT id, data FROM public.t2  WHERE  ctid BETWEEN '(2,1)' AND '(3,32000)') TO stdout;
pg_dump: warning: CHUNKING: pages [4:5]
pg_dump: warning: CHUNKING: data query: COPY (SELECT id, data FROM public.t2  WHERE  ctid BETWEEN '(4,1)' AND '(5,32000)') TO stdout;
pg_dump: warning: CHUNKING: pages [6:4294967295]
pg_dump: warning: CHUNKING: data query: COPY (SELECT id, data FROM public.t2  WHERE  ctid BETWEEN '(6,1)' AND '(4294967295,32000)') TO stdout;

hannuk@hannuk007:~/work5/postgres-git/src/bin/pg_dump$ ls -l ../dumptest/
total 28
-rw-r--r-- 1 hannuk primarygroup   37 Nov 11 15:28 4012.dat.gz
-rw-r--r-- 1 hannuk primarygroup  100 Nov 11 15:28 4021.dat.gz
-rw-r--r-- 1 hannuk primarygroup  108 Nov 11 15:28 4022.dat.gz
-rw-r--r-- 1 hannuk primarygroup  110 Nov 11 15:28 4023.dat.gz
-rw-r--r-- 1 hannuk primarygroup   94 Nov 11 15:28 4024.dat.gz
-rw-r--r-- 1 hannuk primarygroup 4790 Nov 11 15:28 toc.dat

hannuk@hannuk007:~/work5/postgres-git/src/bin/pg_dump$ ./pg_restore --list ../dumptest
;
; Archive created at 2025-11-11 15:28:40 CET
;     dbname: dumptest
;     TOC Entries: 21
;     Compression: gzip
;     Dump Version: 1.16-0
;     Format: DIRECTORY
;     Integer: 4 bytes
;     Offset: 8 bytes
;     Dumped from database version: 16.4 (Debian 16.4-3+build4)
;     Dumped by pg_dump version: 19devel
;
;
; Selected TOC Entries:
;
218; 1259 1255969 TABLE public t1 hannuk
217; 1259 1255968 SEQUENCE public t1_id_seq hannuk
4019; 0 0 SEQUENCE OWNED BY public t1_id_seq hannuk
216; 1259 1255960 TABLE public t2 hannuk
215; 1259 1255959 SEQUENCE public t2_id_seq hannuk
4020; 0 0 SEQUENCE OWNED BY public t2_id_seq hannuk
3861; 2604 1255972 DEFAULT public t1 id hannuk
3860; 2604 1255963 DEFAULT public t2 id hannuk
4012; 0 1255969 TABLE DATA public t1 hannuk
4021; 0 1255960 TABLE DATA (pages 0:1) public t2 hannuk
4022; 0 1255960 TABLE DATA (pages 2:3) public t2 hannuk
4023; 0 1255960 TABLE DATA (pages 4:5) public t2 hannuk
4024; 0 1255960 TABLE DATA (pages 6:4294967295) public t2 hannuk
4025; 0 0 SEQUENCE SET public t1_id_seq hannuk
4026; 0 0 SEQUENCE SET public t2_id_seq hannuk
3865; 2606 1255976 CONSTRAINT public t1 t1_pkey hannuk
3863; 2606 1255967 CONSTRAINT public t2 t2_pkey hannuk

hannuk@hannuk007:~/work5/postgres-git/src/bin/pg_dump$ ./pg_restore ../dumptest -f ../dumptest/dump.sql


  [application/sql] dump.sql (56.2K, 4-dump.sql)


* Re: Patch: dumping tables data in multiple chunks in pg_dump
  2025-11-11 15:29 Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
@ 2025-11-12 12:59 ` Ashutosh Bapat <[email protected]>
  2025-11-13 18:02   ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  1 sibling, 1 reply; 16+ messages in thread

From: Ashutosh Bapat @ 2025-11-12 12:59 UTC (permalink / raw)
  To: Hannu Krosing <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Nathan Bossart <[email protected]>

Hi Hannu,

On Tue, Nov 11, 2025 at 9:00 PM Hannu Krosing <[email protected]> wrote:
>
> Attached is a patch that adds the ability to dump table data in multiple chunks.
>
> Looking for feedback at this point:
>  1) what have I missed
>  2) should I implement something to avoid single-page chunks
>
> The flag --huge-table-chunk-pages which tells the directory format
> dump to dump tables where the main fork has more pages than this in
> multiple chunks of given number of pages,
>
> The main use case is speeding up parallel dumps in case of one or a
> small number of HUGE tables so parts of these can be dumped in
> parallel.

Have you measured speed up? Can you please share the numbers?

-- 
Best Wishes,
Ashutosh Bapat






* Re: Patch: dumping tables data in multiple chunks in pg_dump
  2025-11-11 15:29 Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-12 12:59 ` Re: Patch: dumping tables data in multiple chunks in pg_dump Ashutosh Bapat <[email protected]>
@ 2025-11-13 18:02   ` Hannu Krosing <[email protected]>
  2025-11-13 18:39     ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  0 siblings, 1 reply; 16+ messages in thread

From: Hannu Krosing @ 2025-11-13 18:02 UTC (permalink / raw)
  To: Ashutosh Bapat <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Nathan Bossart <[email protected]>

I just ran a test by generating a 408GB table and then dumping it both ways

$ time pg_dump --format=directory -h 10.58.80.2 -U postgres -f
/tmp/plain.dump largedb

real    39m54.968s
user    37m21.557s
sys     2m32.422s

$ time ./pg_dump --format=directory -h 10.58.80.2 -U postgres
--huge-table-chunk-pages=131072 -j 8 -f /tmp/parallel8.dump largedb

real    5m52.965s
user    40m27.284s
sys     3m53.339s

So parallel dump with 8 workers using 1GB (128k pages) chunks runs
almost 7 times faster than the sequential dump.
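
The arithmetic behind these two figures can be checked with a small C sketch. The helper names are hypothetical, and the 8192-byte block size is an assumption (it is the PostgreSQL default, but the thread does not state the value used on this server).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: default PostgreSQL block size; not stated in the thread. */
#define ASSUMED_BLCKSZ 8192

/* Heap bytes covered by one chunk of 'chunk_pages' pages. */
static uint64_t
chunk_bytes(uint32_t chunk_pages)
{
	return (uint64_t) chunk_pages * ASSUMED_BLCKSZ;
}

/* Convert the "real" line of time(1), e.g. 39m54.968s, to seconds. */
static double
to_seconds(int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* Wall-clock speedup of a parallel run over the baseline run. */
static double
speedup(double baseline_s, double parallel_s)
{
	assert(parallel_s > 0.0);
	return baseline_s / parallel_s;
}
```

With these helpers, `chunk_bytes(131072)` is 1073741824 bytes (1 GiB), and `speedup(to_seconds(39, 54.968), to_seconds(5, 52.965))` comes out at roughly 6.79.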

This was a table that had no TOAST part. I will run some more tests
with TOASTed tables next and expect similar or better improvements.









* Re: Patch: dumping tables data in multiple chunks in pg_dump
  2025-11-11 15:29 Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-12 12:59 ` Re: Patch: dumping tables data in multiple chunks in pg_dump Ashutosh Bapat <[email protected]>
  2025-11-13 18:02   ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
@ 2025-11-13 18:39     ` Hannu Krosing <[email protected]>
  2025-11-13 20:24       ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  0 siblings, 1 reply; 16+ messages in thread

From: Hannu Krosing @ 2025-11-13 18:39 UTC (permalink / raw)
  To: Ashutosh Bapat <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Nathan Bossart <[email protected]>

Going up to 16 workers did not improve performance, but this is
expected, as the disk behind the database can only do about 4TB/hour of
reads, which is now the bottleneck (408/352*3600 = 4172 GB/h).

$ time ./pg_dump --format=directory -h 10.58.80.2 -U postgres
--huge-table-chunk-pages=131072 -j 16 -f /tmp/parallel16.dump largedb
real    5m44.900s
user    53m50.491s
sys     5m47.602s

And 4 workers showed near-linear speedup over a single worker:

hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
--format=directory -h 10.58.80.2 -U postgres
--huge-table-chunk-pages=131072 -j 4 -f /tmp/parallel4.dump largedb
real    10m32.074s
user    38m54.436s
sys     2m58.216s

The database runs on a 64vCPU VM with 128GB RAM, so most of the table
has to be read in from disk.
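
The read-rate estimate and the "near-linear" claim above can be reproduced with the following C sketch. Function names are hypothetical; the inputs are the table size and wall-clock times quoted in this thread.

```c
#include <assert.h>

/* Sustained read rate in GB/hour for 'size_gb' dumped in 'elapsed_s' seconds. */
static double
throughput_gb_per_hour(double size_gb, double elapsed_s)
{
	assert(elapsed_s > 0.0);
	return size_gb / elapsed_s * 3600.0;
}

/*
 * Parallel efficiency: achieved speedup divided by the number of workers.
 * 1.0 means perfectly linear scaling.
 */
static double
parallel_efficiency(double baseline_s, double parallel_s, int workers)
{
	assert(parallel_s > 0.0 && workers > 0);
	return (baseline_s / parallel_s) / workers;
}
```

`throughput_gb_per_hour(408.0, 352.0)` gives about 4172.7 GB/h. Taking the sequential run (2394.968 s) as the baseline, `parallel_efficiency` is about 0.95 for 4 workers (632.074 s), 0.85 for 8 workers (352.965 s) and 0.43 for 16 workers (344.900 s), which is consistent with the disk becoming the limit somewhere around 8 workers.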












* Re: Patch: dumping tables data in multiple chunks in pg_dump
  2025-11-11 15:29 Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-12 12:59 ` Re: Patch: dumping tables data in multiple chunks in pg_dump Ashutosh Bapat <[email protected]>
  2025-11-13 18:02   ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-13 18:39     ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
@ 2025-11-13 20:24       ` Hannu Krosing <[email protected]>
  2025-11-13 20:26         ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  0 siblings, 1 reply; 16+ messages in thread

From: Hannu Krosing @ 2025-11-13 20:24 UTC (permalink / raw)
  To: Ashutosh Bapat <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Nathan Bossart <[email protected]>

Ran another test with a 53GB database where most of the data is in TOAST

CREATE TABLE just_toasted(
  id serial primary key,
  toasted1 char(2200) STORAGE EXTERNAL,
  toasted2 char(2200) STORAGE EXTERNAL,
  toasted3 char(2200) STORAGE EXTERNAL,
  toasted4 char(2200) STORAGE EXTERNAL
);

and the toast fields were added in somewhat randomised order.

Here the results are as follows

Parallelism | chunk size (pages) | time (sec)
          1 |                  - |        240
          2 |               1000 |        129
          4 |               1000 |         64
          8 |               1000 |         36
         16 |               1000 |         30

          4 |               9095 |         78
          8 |               9095 |         42
         16 |               9095 |         42

The reason larger chunk sizes performed worse was that they often had
one or two stragglers left behind.
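
The straggler effect can be illustrated with a toy model in C. This is only a sketch: it assumes chunks are handed out in order to whichever worker becomes free first, the costs are made-up units rather than measurements, and `makespan` is a hypothetical name that does not correspond to anything in pg_dump.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WORKERS 64

/*
 * Toy model: assign each chunk, in order, to the worker that becomes free
 * first, and return the time at which the last worker finishes.
 */
static double
makespan(const double *cost, size_t nchunks, int workers)
{
	double		busy_until[MAX_WORKERS] = {0};
	double		finish = 0.0;

	assert(workers > 0 && workers <= MAX_WORKERS);

	for (size_t i = 0; i < nchunks; i++)
	{
		int			idle = 0;

		/* find the worker with the earliest finish time */
		for (int w = 1; w < workers; w++)
			if (busy_until[w] < busy_until[idle])
				idle = w;

		busy_until[idle] += cost[i];
		if (busy_until[idle] > finish)
			finish = busy_until[idle];
	}
	return finish;
}
```

In this model, four chunks with costs {1, 1, 1, 3} on 2 workers finish at time 4, while the same total work cut into six chunks of cost 1 finishes at time 3: one expensive chunk scheduled late keeps a single worker busy while the other sits idle.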

Detailed run results below:

hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
--format=directory -h 10.58.80.2 -U postgres -f
/tmp/ltoastdb-1-plain.dump largetoastdb
real    3m59.465s
user    3m43.304s
sys     0m15.844s

hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
--format=directory -h 10.58.80.2 -U postgres
--huge-table-chunk-pages=9095 -j 4 -f /tmp/ltoastdb-4.dump
largetoastdb
real    1m18.320s
user    3m49.236s
sys     0m19.422s

hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
--format=directory -h 10.58.80.2 -U postgres
--huge-table-chunk-pages=9095 -j 8 -f /tmp/ltoastdb-8.dump
largetoastdb
real    0m42.028s
user    3m55.299s
sys     0m24.657s

hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
--format=directory -h 10.58.80.2 -U postgres
--huge-table-chunk-pages=9095 -j 16 -f /tmp/ltoastdb-16.dump
largetoastdb
real    0m42.575s
user    4m11.011s
sys     0m26.110s

hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
--format=directory -h 10.58.80.2 -U postgres
--huge-table-chunk-pages=1000 -j 16 -f /tmp/ltoastdb-16-1kpages.dump
largetoastdb
real    0m29.641s
user    6m16.321s
sys     0m49.345s

hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
--format=directory -h 10.58.80.2 -U postgres
--huge-table-chunk-pages=1000 -j 8 -f /tmp/ltoastdb-8-1kpages.dump
largetoastdb
real    0m35.685s
user    3m58.528s
sys     0m26.729s

hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
--format=directory -h 10.58.80.2 -U postgres
--huge-table-chunk-pages=1000 -j 4 -f /tmp/ltoastdb-4-1kpages.dump
largetoastdb
real    1m3.737s
user    3m50.251s
sys     0m18.507s

hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
--format=directory -h 10.58.80.2 -U postgres
--huge-table-chunk-pages=1000 -j 2 -f /tmp/ltoastdb-2-1kpages.dump
largetoastdb
real    2m8.708s
user    3m57.018s
sys     0m18.499s







* Re: Patch: dumping tables data in multiple chunks in pg_dump
  2025-11-11 15:29 Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-12 12:59 ` Re: Patch: dumping tables data in multiple chunks in pg_dump Ashutosh Bapat <[email protected]>
  2025-11-13 18:02   ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-13 18:39     ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-13 20:24       ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
@ 2025-11-13 20:26         ` Hannu Krosing <[email protected]>
  2025-11-13 20:34           ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  0 siblings, 1 reply; 16+ messages in thread

From: Hannu Krosing @ 2025-11-13 20:26 UTC (permalink / raw)
  To: Ashutosh Bapat <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Nathan Bossart <[email protected]>

The reason for small chunk sizes is that they are determined by main
heap table, and that was just over 1GB

largetoastdb=> SELECT format('%I.%I', t.schemaname, t.relname) as table_name,
       pg_table_size(t.relid) AS table_size,
       sum(pg_relation_size(i.indexrelid)) AS total_index_size,
       pg_relation_size(t.relid) AS main_table_size,
       pg_relation_size(c.reltoastrelid) AS toast_table_size,
       pg_relation_size(oi.indexrelid) AS toast_index_size,
       t.n_live_tup AS row_count,
       count(*) AS index_count,
       array_to_json(array_agg(json_build_object(i.indexrelid::regclass,
pg_relation_size(i.indexrelid))), true) AS index_info
  FROM pg_stat_user_tables t
  JOIN pg_stat_user_indexes i ON i.relid = t.relid
  JOIN pg_class c ON c.oid = t.relid
  LEFT JOIN pg_stat_sys_indexes AS oi ON oi.relid = c.reltoastrelid
 GROUP BY 1, 2, 4, 5, 6, 7
 ORDER BY 2 DESC, 7 DESC
 LIMIT 25;
┌─[ RECORD 1 ]─────┬─────────────────────────────────────┐
│ table_name       │ public.just_toasted                 │
│ table_size       │ 56718835712                         │
│ total_index_size │ 230064128                           │
│ main_table_size  │ 1191559168                          │
│ toast_table_size │ 54613336064                         │
│ toast_index_size │ 898465792                           │
│ row_count        │ 5625234                             │
│ index_count      │ 1                                   │
│ index_info       │ [{"just_toasted_pkey" : 230064128}] │
└──────────────────┴─────────────────────────────────────┘
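
As a rough cross-check of why the chunks are small here, the main fork size can be converted to pages and then to a chunk count. This is only a sketch: it assumes the default 8kB block size and that a table is split into ceil(pages / chunk size) chunks; pages_in_relation() and chunks_for() are illustrative names, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 8192         /* assumption: default BLCKSZ of 8kB */

/* Number of pages needed to hold relation_bytes, rounding up. */
uint64_t
pages_in_relation(uint64_t relation_bytes)
{
    return (relation_bytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
}

/* Number of chunks of at most chunk_pages pages that cover rel_pages. */
uint64_t
chunks_for(uint64_t rel_pages, uint64_t chunk_pages)
{
    assert(chunk_pages > 0);
    return (rel_pages + chunk_pages - 1) / chunk_pages;
}
```

With main_table_size = 1191559168 this gives 145454 pages, so under these assumptions --huge-table-chunk-pages=9095 yields 16 chunks and --huge-table-chunk-pages=1000 yields 146 chunks.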

On Thu, Nov 13, 2025 at 9:24 PM Hannu Krosing <[email protected]> wrote:
>
> Ran another test with a 53GB database where most of the data is in TOAST
>
> CREATE TABLE just_toasted(
>   id serial primary key,
>   toasted1 char(2200) STORAGE EXTERNAL,
>   toasted2 char(2200) STORAGE EXTERNAL,
>   toasted3 char(2200) STORAGE EXTERNAL,
>   toasted4 char(2200) STORAGE EXTERNAL
> );
>
> and the toast fields were added in somewhat randomised order.
>
> Here the results are as follows
>
> Parallelism | chunk size (pages) | time (sec)
>           1 |                  - |        240
>           2 |               1000 |        129
>           4 |               1000 |         64
>           8 |               1000 |         36
>          16 |               1000 |         30
>
>           4 |               9095 |         78
>           8 |               9095 |         42
>          16 |               9095 |         42
>
> The reason larger chunk sizes performed worse was that they often had
> one or two stragglers left behind which
>
> Detailed run results below:
>
> hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> --format=directory -h 10.58.80.2 -U postgres -f
> /tmp/ltoastdb-1-plain.dump largetoastdb
> real    3m59.465s
> user    3m43.304s
> sys     0m15.844s
>
> hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> --format=directory -h 10.58.80.2 -U postgres
> --huge-table-chunk-pages=9095 -j 4 -f /tmp/ltoastdb-4.dump
> largetoastdb
> real    1m18.320s
> user    3m49.236s
> sys     0m19.422s
>
> hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> --format=directory -h 10.58.80.2 -U postgres
> --huge-table-chunk-pages=9095 -j 8 -f /tmp/ltoastdb-8.dump
> largetoastdb
> real    0m42.028s
> user    3m55.299s
> sys     0m24.657s
>
> hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> --format=directory -h 10.58.80.2 -U postgres
> --huge-table-chunk-pages=9095 -j 16 -f /tmp/ltoastdb-16.dump
> largetoastdb
> real    0m42.575s
> user    4m11.011s
> sys     0m26.110s
>
> hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> --format=directory -h 10.58.80.2 -U postgres
> --huge-table-chunk-pages=1000 -j 16 -f /tmp/ltoastdb-16-1kpages.dump
> largetoastdb
> real    0m29.641s
> user    6m16.321s
> sys     0m49.345s
>
> hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> --format=directory -h 10.58.80.2 -U postgres
> --huge-table-chunk-pages=1000 -j 8 -f /tmp/ltoastdb-8-1kpages.dump
> largetoastdb
> real    0m35.685s
> user    3m58.528s
> sys     0m26.729s
>
> hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> --format=directory -h 10.58.80.2 -U postgres
> --huge-table-chunk-pages=1000 -j 4 -f /tmp/ltoastdb-4-1kpages.dump
> largetoastdb
> real    1m3.737s
> user    3m50.251s
> sys     0m18.507s
>
> hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> --format=directory -h 10.58.80.2 -U postgres
> --huge-table-chunk-pages=1000 -j 2 -f /tmp/ltoastdb-2-1kpages.dump
> largetoastdb
> real    2m8.708s
> user    3m57.018s
> sys     0m18.499s
>
> On Thu, Nov 13, 2025 at 7:39 PM Hannu Krosing <[email protected]> wrote:
> >
> > Going up to 16 workers did not improve performance, but this is
> > expected, as the disk behind the database can only do 4TB/hour of
> > reads, which is now the bottleneck. (408/352*3600 = 4172 GB/h)
> >
> > $ time ./pg_dump --format=directory -h 10.58.80.2 -U postgres
> > --huge-table-chunk-pages=131072 -j 16 -f /tmp/parallel16.dump largedb
> > real    5m44.900s
> > user    53m50.491s
> > sys     5m47.602s
> >
> > And 4 workers showed near-linear speedup from single worker
> >
> > hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> > --format=directory -h 10.58.80.2 -U postgres
> > --huge-table-chunk-pages=131072 -j 4 -f /tmp/parallel4.dump largedb
> > real    10m32.074s
> > user    38m54.436s
> > sys     2m58.216s
> >
> > The database runs on a 64vCPU VM with 128GB RAM, so most of the table
> > will be read in from the disk
> >
> >
> >
> >
> >
> >
> > On Thu, Nov 13, 2025 at 7:02 PM Hannu Krosing <[email protected]> wrote:
> > >
> > > I just ran a test by generating a 408GB table and then dumping it both ways
> > >
> > > $ time pg_dump --format=directory -h 10.58.80.2 -U postgres -f
> > > /tmp/plain.dump largedb
> > >
> > > real    39m54.968s
> > > user    37m21.557s
> > > sys     2m32.422s
> > >
> > > $ time ./pg_dump --format=directory -h 10.58.80.2 -U postgres
> > > --huge-table-chunk-pages=131072 -j 8 -f /tmp/parallel8.dump largedb
> > >
> > > real    5m52.965s
> > > user    40m27.284s
> > > sys     3m53.339s
> > >
> > > So parallel dump with 8 workers using 1GB (128k pages) chunks runs
> > > almost 7 times faster than the sequential dump.
> > >
> > > this was a table that had no TOAST part. I will run some more tests
> > > with TOASTed tables next and expect similar or better improvements.
> > >
> > >
> > >
> > > On Wed, Nov 12, 2025 at 1:59 PM Ashutosh Bapat
> > > <[email protected]> wrote:
> > > >
> > > > Hi Hannu,
> > > >
> > > > On Tue, Nov 11, 2025 at 9:00 PM Hannu Krosing <[email protected]> wrote:
> > > > >
> > > > > Attached is a patch that adds the ability to dump table data in multiple chunks.
> > > > >
> > > > > Looking for feedback at this point:
> > > > >  1) what have I missed
> > > > >  2) should I implement something to avoid single-page chunks
> > > > >
> > > > > The flag --huge-table-chunk-pages which tells the directory format
> > > > > dump to dump tables where the main fork has more pages than this in
> > > > > multiple chunks of given number of pages,
> > > > >
> > > > > The main use case is speeding up parallel dumps in case of one or a
> > > > > small number of HUGE tables so parts of these can be dumped in
> > > > > parallel.
> > > >
> > > > Have you measured speed up? Can you please share the numbers?
> > > >
> > > > --
> > > > Best Wishes,
> > > > Ashutosh Bapat





^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: Patch: dumping tables data in multiple chunks in pg_dump
  2025-11-11 15:29 Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-12 12:59 ` Re: Patch: dumping tables data in multiple chunks in pg_dump Ashutosh Bapat <[email protected]>
  2025-11-13 18:02   ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-13 18:39     ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-13 20:24       ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-13 20:26         ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
@ 2025-11-13 20:34           ` Hannu Krosing <[email protected]>
  2026-03-28 10:59             ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2026-03-28 15:32             ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  0 siblings, 2 replies; 16+ messages in thread

From: Hannu Krosing @ 2025-11-13 20:34 UTC (permalink / raw)
  To: Ashutosh Bapat <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Nathan Bossart <[email protected]>

Added to https://commitfest.postgresql.org/patch/6219/






^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: Patch: dumping tables data in multiple chunks in pg_dump
  2025-11-11 15:29 Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-12 12:59 ` Re: Patch: dumping tables data in multiple chunks in pg_dump Ashutosh Bapat <[email protected]>
  2025-11-13 18:02   ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-13 18:39     ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-13 20:24       ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-13 20:26         ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-13 20:34           ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
@ 2026-03-28 10:59             ` Hannu Krosing <[email protected]>
  1 sibling, 0 replies; 16+ messages in thread

From: Hannu Krosing @ 2026-03-28 10:59 UTC (permalink / raw)
  To: Zsolt Parragi <[email protected]>; Dilip Kumar <[email protected]>; +Cc: David Rowley <[email protected]>; Ashutosh Bapat <[email protected]>; PostgreSQL Hackers <[email protected]>; Nathan Bossart <[email protected]>

Hi Zsolt and Dilip,

Thanks for the review and the useful comments!

On Tue, Feb 3, 2026 at 10:10 PM Zsolt Parragi <[email protected]> wrote:
>
> Hello!
>
> I did some testing with this patch, and I think there are some issues
> during restoration:
>
> 1. Isn't there a possible race / scheduling mistake during restore
> because of missing dependencies? The code now prints out "TABLE DATA
> (pages %u:%u)", while the restore code checks for the explicit "TABLE
> DATA" string for dependency tracking (pg_backup_archiver.c:2013 and a
> few other places). This causes POST DATA to have no dependency on the
> table data, and can be scheduled before we load all table data.

I have resolved this by adding a second array to the reverse dependencies
mechanism in buildTocEntryArrays() for chunked dump where I collect arrays
of ids in AH->tableDataChunkIds.
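
The distinction between a plain "TABLE DATA" entry and a chunked "TABLE DATA (pages %u:%u)" entry can be sketched as below. This is a standalone illustration, not the patch code: classify_desc() and the DescKind enum are hypothetical names, and parsing the page numbers with sscanf() is an addition for the example (the patch itself only checks the prefix and whether the description ends right after it).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum
{
    DESC_OTHER,                 /* not a table data entry */
    DESC_TABLE_DATA,            /* exactly "TABLE DATA" */
    DESC_TABLE_DATA_CHUNK       /* "TABLE DATA (pages N:M)" */
} DescKind;

/*
 * Classify a TOC description string.  For chunk entries the page range is
 * stored in *start_page and *end_page; otherwise they are left untouched.
 */
DescKind
classify_desc(const char *desc, unsigned *start_page, unsigned *end_page)
{
    static const char prefix[] = "TABLE DATA";
    size_t      plen = sizeof(prefix) - 1;
    unsigned    s,
                e;

    assert(desc != NULL);

    if (strncmp(desc, prefix, plen) != 0)
        return DESC_OTHER;
    if (desc[plen] == '\0')
        return DESC_TABLE_DATA;
    if (sscanf(desc + plen, " (pages %u:%u)", &s, &e) == 2)
    {
        *start_page = s;
        *end_page = e;
        return DESC_TABLE_DATA_CHUNK;
    }
    return DESC_OTHER;
}
```

A prefix match alone would also accept unrelated descriptions that merely start with "TABLE DATA", which is why the sketch requires the page range to parse before treating an entry as a chunk.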

For this I extracted the list management part from DumpableObject:

typedef struct _DependencyList
{
    DumpId     *dependencies;   /* dumpIds of objects this one depends on */
    int         nDeps;          /* number of valid dependencies */
    int         allocDeps;      /* allocated size of dependencies[] */
} DependencyList;

And added addStandaloneDependency(), based on addObjectDependency().

I simplified it to always use realloc, as it can handle the NULL case:

void
addStandaloneDependency(DependencyList *dobj, DumpId refId)
{
    if (dobj->nDeps >= dobj->allocDeps)
    {
        dobj->allocDeps = (dobj->allocDeps <= 0) ? 16 : dobj->allocDeps * 2;
        dobj->dependencies = pg_realloc_array(dobj->dependencies,
                                              DumpId, dobj->allocDeps);
    }
    dobj->dependencies[dobj->nDeps++] = refId;
}
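
The same growth pattern can be tried outside the pg_dump tree with plain realloc(). The following is a minimal sketch under stated assumptions: DepList and dep_list_add() are illustrative stand-ins for DependencyList and addStandaloneDependency(), DumpId is assumed to be an int typedef, and the out-of-memory handling replaces what pg_realloc_array() would do.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef int DumpId;             /* assumption: same underlying type as in pg_dump */

typedef struct
{
    DumpId     *dependencies;   /* NULL until the first add */
    int         nDeps;          /* number of valid entries */
    int         allocDeps;      /* allocated size of dependencies[] */
} DepList;

/* Append refId, doubling the array when full; starts at 16 entries. */
void
dep_list_add(DepList *list, DumpId refId)
{
    assert(list != NULL);

    if (list->nDeps >= list->allocDeps)
    {
        int         newAlloc = (list->allocDeps <= 0) ? 16 : list->allocDeps * 2;
        DumpId     *grown;

        /* realloc(NULL, n) behaves like malloc(n), so no special first case */
        grown = realloc(list->dependencies, (size_t) newAlloc * sizeof(DumpId));
        if (grown == NULL)
        {
            fprintf(stderr, "out of memory\n");
            exit(EXIT_FAILURE);
        }
        list->dependencies = grown;
        list->allocDeps = newAlloc;
    }
    list->dependencies[list->nDeps++] = refId;
}
```

A zero-initialized DepList is a valid empty list, which matches how the per-table array is allocated with zeroed memory in buildTocEntryArrays().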

And then I use AH->tableDataChunkIds in repoint_table_dependencies() to
- replace the dependency on table def with dependency on first chunk
- add the remaining chunks at the end of dependency list.
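
The two steps above can be sketched on a bare int array. This is an illustration only: repoint_to_chunks() is a hypothetical helper, dump ids are plain ints, the array is assumed to be heap-allocated, and only the first occurrence of the table id is rewritten.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Replace tableId in deps[] with the first chunk id and append the remaining
 * chunk ids at the end.  Returns the (possibly moved) array and updates
 * *nDeps.  The array is returned unchanged if tableId is not present or
 * there are no chunks.
 */
int *
repoint_to_chunks(int *deps, int *nDeps, int tableId,
                  const int *chunkIds, int nChunks)
{
    assert(nDeps != NULL && chunkIds != NULL);

    if (nChunks <= 0)
        return deps;

    for (int i = 0; i < *nDeps; i++)
    {
        if (deps[i] != tableId)
            continue;

        /* The first chunk takes over the slot of the table dependency. */
        deps[i] = chunkIds[0];

        if (nChunks > 1)
        {
            int        *grown;

            grown = realloc(deps,
                            (size_t) (*nDeps + nChunks - 1) * sizeof(int));
            if (grown == NULL)
                abort();
            deps = grown;

            /* Remaining chunks go to the end of the dependency list. */
            for (int c = 1; c < nChunks; c++)
                deps[(*nDeps)++] = chunkIds[c];
        }
        break;
    }
    return deps;
}
```

With this shape a dependent item such as an index becomes ready only once every chunk of the table has been restored, which is the scheduling property the review asked for.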

> I was able to verify the scheduling issue with an index: the INDEX
> part is scheduled too early, before all TABLE DATA completes, but then
> locking prevents it from progressing, so everything completed fine in
> the end. Even if that's guaranteed, which I'm not 100% sure of, it's
> still based on luck and not proper logic, and takes up a slot (or
> multiple), reducing parallelism.
>
> 2. Fixing the TABLE DATA strcmp checks solves the scheduling issue,
> but it's not that simple, because then it causes truncation issues
> during restore, which needs additional changes in the restore code. I
> did a quick fix for that by adding an additional condition to the
> created flag, and with that it seems to restore everything properly,
> and with proper ordering, only starting index/constraint/etc after all
> table data is completed. However this was definitely just a quick test
> fix, this needs a proper better solution.
>
> Other issues I see are more minor, but numerous:

I collect the chunk dependencies in a separate array, which
should solve the truncation issue.

Can you advise a good check to add to tap tests for verifying?

> 3. The patch still has lots of debug output (pg_log_WARNING("CHUNKING
> ...")); Is this intended? Shouldn't these be behind some verbose
> check, and maybe use info instead of warning?

These were left in to ease the initial review. I have either removed them
or turned them into pg_log_debug()

> 4. The is_segment macro should have () around the use of tdiptr

Thanks, fixed.

> 5. There's still a 32000 magic constant, shouldn't that have some
> descriptive name / explanatory comment?

I turned this into "ctid < (pagenr+1, 0)" for clarity and
future-proofing, as it is not entirely impossible that we could at
some point have more than 32000 items per page.

> 6. formatting issues at multiple places, mostly missing spaces after
> if/while/for statements

My hope was that the pre-release automatic formatting run would take care
of this.

I will eyeball the patch to see if I can find them, but I don't think I
have a good way to detect them all.

Suggestions very much welcome!

> 7. inconsistent error messages (not in range vs must be in range)

> 8. There's a remaining TODO that seems stale, current_chunk_start is
> already uint64

Removed.

> 9. typo: "the computed from pog_relation_size" -> "then computed from
> pg_relation_size"

Fixed.

On Thu, Feb 12, 2026 at 7:13 AM Dilip Kumar <[email protected]> wrote:
>
> On Wed, Jan 28, 2026 at 11:00 PM Hannu Krosing <[email protected]> wrote:
> >
> > v13 has added a proper test comparing original and restored table data
> >
> I was reviewing v13 and here are some initial comments I have
>
> 1. IMHO the commit message details about the work progress instead of
> a high level idea about what it actually does and how.
> Suggestion:
>
> SUBJECT: Add --max-table-segment-pages option to pg_dump for parallel
> table dumping.
>
> This patch introduces the ability to split large heap tables into segments
> based on a specified number of pages. These segments can then be dumped in
> parallel using the existing jobs infrastructure, significantly reducing
> the time required to dump very large tables.
>
> The implementation uses ctid-based range queries (e.g., WHERE ctid >=
> '(start,1)'
> AND ctid <= '(end,32000)') to extract specific chunks of the relation.
>
> <more architecture details and limitation if any>

SUBJECT: Add --max-table-segment-pages option to pg_dump for parallel
table dumping.

This patch introduces the ability to split large heap tables into segments
based on a specified number of pages. These segments can then be dumped in
parallel using the existing jobs infrastructure, significantly reducing
the time required to dump very large tables.

This --max-table-segment-pages number specifically applies to main table
pages which does not guarantee anything about output size.
The output could be empty if there are no live tuples in the page range.
Or it can be almost 200 GB if the page has just pointers to 1GB TOAST items.

The implementation uses ctid-based range queries (e.g., WHERE ctid >=
'(startPage,1)' AND ctid < '(endPage+1,0)') to extract specific chunks of
the relation.

This is only effectively supported for PostgreSQL version 14+, though it does
work inefficiently on earlier versions.

The patch only supports the "heap" access method, as others may not even have
the ctid column.
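
How a table is cut into page ranges can be sketched as follows. This is a simplified illustration, not the patch code: Segment and split_into_segments() are hypothetical names, and every segment gets a closed end page here, whereas the code quoted from the patch leaves the last segment open-ended (endPage set to InvalidBlockNumber).

```c
#include <assert.h>
#include <stdint.h>

typedef struct
{
    uint32_t    startPage;      /* first page of the segment */
    uint32_t    endPage;        /* last page of the segment, inclusive */
} Segment;

/*
 * Split relpages pages into segments of at most segment_pages pages.
 * Stores up to max_out segments in out[] and returns the number of
 * segments needed, which may be larger than max_out.
 */
int
split_into_segments(uint32_t relpages, uint32_t segment_pages,
                    Segment *out, int max_out)
{
    int         n = 0;

    assert(segment_pages > 0);

    /* 64-bit loop variable so start + segment_pages cannot wrap around */
    for (uint64_t start = 0; start < relpages; start += segment_pages)
    {
        uint64_t    end = start + segment_pages - 1;

        if (end >= relpages)
            end = (uint64_t) relpages - 1;
        if (n < max_out)
        {
            out[n].startPage = (uint32_t) start;
            out[n].endPage = (uint32_t) end;
        }
        n++;
    }
    return n;
}
```

For example, 2500 pages with a segment size of 1000 gives the ranges 0..999, 1000..1999 and 2000..2499.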

> 2.
> + pg_log_warning("CHUNKING: set dopt.max_table_segment_pages to [%u]",
> dopt.max_table_segment_pages);
> + break;
>
> IMHO we don't need to place warning here while processing the input parameters

Either removed or turned to pg_log_debug()

> 3.
> + printf(_("  --max-table-segment-pages=NUMPAGES\n"
> +      "                               Number of main table pages
> above which data is \n"
> + "                               copied out in chunks, also
> determines the chunk size\n"));
>
> Check the comment formatting, all the parameter description starts
> with lower case, so better we start with "number" rather than "Number"

Fixed

> 4.
> + if (is_segment(tdinfo))
> + {
> + appendPQExpBufferStr(q, tdinfo->filtercond?" AND ":" WHERE ");
> + if(tdinfo->startPage == 0)
> + appendPQExpBuffer(q, "ctid <= '(%u,32000)'", tdinfo->endPage);
> + else if(tdinfo->endPage != InvalidBlockNumber)
> + appendPQExpBuffer(q, "ctid BETWEEN '(%u,1)' AND '(%u,32000)'",
> + tdinfo->startPage, tdinfo->endPage);
> + else
> + appendPQExpBuffer(q, "ctid >= '(%u,1)'", tdinfo->startPage);
> + pg_log_warning("CHUNKING: pages [%u:%u]",tdinfo->startPage, tdinfo->endPage);
> + }
>
> IMHO we should explain this chunking logic in the comment above this code block?

I added the comment.
I also changed the chunk end logic to "ctid < '(LastPage+1,0)'" for clarity and
future-proofing.
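
A minimal sketch of the resulting predicate, using snprintf() instead of PQExpBuffer so it compiles on its own. The three cases follow the branches in the code quoted above, but format_segment_cond() and NO_END_PAGE are illustrative names (NO_END_PAGE stands in for InvalidBlockNumber), and the exact strings the patch emits may differ.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NO_END_PAGE UINT32_MAX  /* stand-in for InvalidBlockNumber */

/*
 * Write the ctid range condition for one segment into buf.
 * The upper bound is exclusive: '(endPage+1,0)' sorts after every tuple on
 * endPage and before every tuple on the next page, so no assumption about
 * the maximum number of items per page is needed.
 */
int
format_segment_cond(char *buf, size_t buflen,
                    uint32_t startPage, uint32_t endPage)
{
    assert(buf != NULL && buflen > 0);

    if (endPage == NO_END_PAGE)     /* last segment: no upper bound */
        return snprintf(buf, buflen,
                        "ctid >= '(%" PRIu32 ",1)'", startPage);

    if (startPage == 0)             /* first segment: no lower bound */
        return snprintf(buf, buflen,
                        "ctid < '(%" PRIu32 ",0)'", endPage + 1);

    return snprintf(buf, buflen,
                    "ctid >= '(%" PRIu32 ",1)' AND ctid < '(%" PRIu32 ",0)'",
                    startPage, endPage + 1);
}
```

Leaving the last segment without an upper bound means pages added to the table after the segment boundaries were computed still fall into some segment.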

----
Best Regards

Hannu


Attachments:

  [application/x-patch] v14-0001-SUBJECT-Add-max-table-segment-pages-option-to-pg.patch (27.9K, 2-v14-0001-SUBJECT-Add-max-table-segment-pages-option-to-pg.patch)
  download | inline diff:
From d9442eb6476ba27e0f3dee085e48de2efbb445d6 Mon Sep 17 00:00:00 2001
From: Hannu Krosing <[email protected]>
Date: Sat, 28 Mar 2026 11:53:39 +0100
Subject: [PATCH v14] SUBJECT: Add --max-table-segment-pages option to pg_dump
 for parallel table dumping.

This patch introduces the ability to split large heap tables into segments
based on a specified number of pages. These segments can then be dumped in
parallel using the existing jobs infrastructure, significantly reducing
the time required to dump very large tables.

This --max-table-segment-pages number specifically applies to main table
pages which does not guarantee anything about output size.
The output could be empty if there are no live tuples in the page range.
Or it can be almost 200 GB if the page has just pointers to 1GB TOAST items.

The implementation uses ctid-based range queries (e.g., WHERE ctid >=
'(startPage,1)' AND ctid < '(endPage+1,0)') to extract specific chunks of
the relation.

This is only effectively supported for PostgreSQL version 14+ though it does
work inefficiently on earlier versions

The patch only supports "heap" access method as others may not even have the
ctid column
---
 doc/src/sgml/ref/pg_dump.sgml             |  24 +++
 src/bin/pg_dump/pg_backup.h               |   2 +
 src/bin/pg_dump/pg_backup_archiver.c      |  84 +++++++++-
 src/bin/pg_dump/pg_backup_archiver.h      |  12 +-
 src/bin/pg_dump/pg_dump.c                 | 177 +++++++++++++++++-----
 src/bin/pg_dump/pg_dump.h                 |  22 ++-
 src/bin/pg_dump/t/004_pg_dump_parallel.pl |  31 ++++
 src/fe_utils/option_utils.c               |  55 +++++++
 src/include/fe_utils/option_utils.h       |   3 +
 9 files changed, 364 insertions(+), 46 deletions(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 7f538e90194..5f056bb4af6 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1066,6 +1066,30 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--max-table-segment-pages=<replaceable class="parameter">npages</replaceable></option></term>
+      <listitem>
+       <para>
+        Dump data in segments based on number of pages in the main relation.
+        If the number of data pages in the relation is more than <replaceable class="parameter">npages</replaceable> 
+        the data is split into segments based on that number of pages.
+        Individual segments can be dumped in parallel.
+       </para>
+
+       <note>
+        <para>
+         The option <option>--max-table-segment-pages</option> is applied to only pages
+         in the main heap and if the table has a large TOASTed part this has to be
+         taken into account when deciding on the number of pages to use.
+         In the extreme case a single 8kB heap page can have ~200 toast pointers each 
+         corresponding to 1GB of data. If this data is also non-compressible then a 
+         single-page segment can dump as 200GB file.
+        </para>
+       </note>
+
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>--no-comments</option></term>
       <listitem>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index fda912ba0a9..11863a1915f 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -27,6 +27,7 @@
 #include "common/file_utils.h"
 #include "fe_utils/simple_list.h"
 #include "libpq-fe.h"
+#include "storage/block.h"
 
 
 typedef enum trivalue
@@ -179,6 +180,7 @@ typedef struct _dumpOptions
 	bool		aclsSkip;
 	const char *lockWaitTimeout;
 	int			dump_inserts;	/* 0 = COPY, otherwise rows per INSERT */
+	BlockNumber	max_table_segment_pages; /* chunk when relpages is above this */
 
 	/* flags for various command-line long options */
 	int			disable_dollar_quoting;
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 271a2c3e481..384add0713b 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -44,6 +44,7 @@
 #include "pg_backup_archiver.h"
 #include "pg_backup_db.h"
 #include "pg_backup_utils.h"
+#include "storage/block.h"
 
 #define TEXT_DUMP_HEADER "--\n-- PostgreSQL database dump\n--\n\n"
 #define TEXT_DUMPALL_HEADER "--\n-- PostgreSQL database cluster dump\n--\n\n"
@@ -154,6 +155,7 @@ InitDumpOptions(DumpOptions *opts)
 	opts->dumpSchema = true;
 	opts->dumpData = true;
 	opts->dumpStatistics = false;
+	opts->max_table_segment_pages = InvalidBlockNumber;
 }
 
 /*
@@ -1995,6 +1997,28 @@ _moveBefore(TocEntry *pos, TocEntry *te)
 	pos->prev = te;
 }
 
+/*
+ * Add a dependency id to a DependencyList object
+ * This is currently used for collecting reverse 
+ * dependencies for chunked data dump 
+ *
+ * Note: duplicate dependencies are currently not eliminated
+ */
+void
+addStandaloneDependency(DependencyList *dobj, DumpId refId)
+{
+	pg_log_warning("Adding dep: list %p + dep %u", (void *) dobj->dependencies, refId);
+	if (dobj->nDeps >= dobj->allocDeps)
+	{
+		dobj->allocDeps = (dobj->allocDeps <= 0) ? 16 : dobj->allocDeps * 2;
+		dobj->dependencies = pg_realloc_array(dobj->dependencies,
+											  DumpId, dobj->allocDeps);
+		pg_log_warning("Realloced list %p to size %d", (void *) dobj->dependencies, dobj->allocDeps);
+	}
+	pg_log_warning("Added dep: list %p + dep %u", (void *) dobj->dependencies, refId);
+	dobj->dependencies[dobj->nDeps++] = refId;
+}
+
 /*
  * Build index arrays for the TOC list
  *
@@ -2014,6 +2038,7 @@ buildTocEntryArrays(ArchiveHandle *AH)
 
 	AH->tocsByDumpId = pg_malloc0_array(TocEntry *, (maxDumpId + 1));
 	AH->tableDataId = pg_malloc0_array(DumpId, (maxDumpId + 1));
+	AH->tableDataChunkIds = pg_malloc0_array(DependencyList, (maxDumpId + 1));
 
 	for (te = AH->toc->next; te != AH->toc; te = te->next)
 	{
@@ -2029,8 +2054,12 @@ buildTocEntryArrays(ArchiveHandle *AH)
 		 * TOC entry that has a DATA item.  We compute this by reversing the
 		 * TABLE DATA item's dependency, knowing that a TABLE DATA item has
 		 * just one dependency and it is the TABLE item.
+		 *
+		 * For chunked table data, the TABLE DATA item has a description like
+		 * "TABLE DATA (pages 100:199)", and we collect all such items as
+		 * reverse dependencies for the parent table's entry in tableDataChunkIds.
 		 */
-		if (strcmp(te->desc, "TABLE DATA") == 0 && te->nDeps > 0)
+		if (strncmp(te->desc, "TABLE DATA", 10) == 0 && te->nDeps > 0)
 		{
 			DumpId		tableId = te->dependencies[0];
 
@@ -2042,7 +2071,14 @@ buildTocEntryArrays(ArchiveHandle *AH)
 			if (tableId <= 0 || tableId > maxDumpId)
 				pg_fatal("bad table dumpId for TABLE DATA item");
 
-			AH->tableDataId[tableId] = te->dumpId;
+			if (te->desc[10] == '\0') /* te->desc == "TABLE DATA" */
+				AH->tableDataId[tableId] = te->dumpId;
+			else
+			{
+				/* Chunked table data, the description is "TABLE DATA (pages %u:%u)" */
+				addStandaloneDependency(&(AH->tableDataChunkIds[tableId]), te->dumpId);
+				pg_log_debug("Added chunked table data dependency: tableId %u + chunkId %u",
+							 tableId, te->dumpId);}
 		}
 	}
 }
@@ -5017,6 +5053,12 @@ fix_dependencies(ArchiveHandle *AH)
  * that parallel restore will prioritize larger jobs (index builds, FK
  * constraint checks, etc) over smaller ones, avoiding situations where we
  * end a restore with only one active job working on a large table.
+ *
+ * In case of chunked dumps, we replace the dependency on the table with a
+ * dependency on the first chunk of data and add the remaining chunk ids, if
+ * any, to the end of the dependency list.
+ * We also calculate fullDataLength as the sum of the lengths of the chunk
+ * data items and use that to set the item's dataLength.
  */
 static void
 repoint_table_dependencies(ArchiveHandle *AH)
@@ -5032,8 +5074,9 @@ repoint_table_dependencies(ArchiveHandle *AH)
 		for (i = 0; i < te->nDeps; i++)
 		{
 			olddep = te->dependencies[i];
-			if (olddep <= AH->maxDumpId &&
-				AH->tableDataId[olddep] != 0)
+			if (olddep > AH->maxDumpId)
+				continue;
+			if (AH->tableDataId[olddep] != 0)
 			{
 				DumpId		tabledataid = AH->tableDataId[olddep];
 				TocEntry   *tabledatate = AH->tocsByDumpId[tabledataid];
@@ -5043,6 +5086,39 @@ repoint_table_dependencies(ArchiveHandle *AH)
 				pg_log_debug("transferring dependency %d -> %d to %d",
 							 te->dumpId, olddep, tabledataid);
 			}
+			else if (AH->tableDataChunkIds[olddep].nDeps > 0)
+			{
+				int			j;
+				DumpId		chunkdataid;
+				uint64		fullDataLength;
+				DependencyList *deplist = &AH->tableDataChunkIds[olddep];
+
+				/* first in list replaces the dependency on table */
+				chunkdataid = deplist->dependencies[0];
+				te->dependencies[i] = chunkdataid;
+				fullDataLength = AH->tocsByDumpId[chunkdataid]->dataLength;
+				pg_log_debug("transferring chunk list %d -> %d to %d",
+							 te->dumpId, olddep, chunkdataid);
+
+				if (deplist->nDeps > 1)
+				{
+					/* make space */
+					te->dependencies = pg_realloc_array(te->dependencies,
+												  DumpId,
+												  te->nDeps + deplist->nDeps - 1);
+
+					/* the rest are appended to dependencies */
+					for (j = 1; j < deplist->nDeps; j++)
+					{
+						chunkdataid = deplist->dependencies[j];
+						te->dependencies[te->nDeps + j] = chunkdataid;
+						fullDataLength += AH->tocsByDumpId[chunkdataid]->dataLength;
+						pg_log_debug("adding chunk list %d -> %d to %d",
+									te->dumpId, olddep, chunkdataid);
+					}
+				}
+				te->dataLength = Max(te->dataLength, fullDataLength);
+			}
 		}
 	}
 }
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 365073b3eae..cfa3ea1bbd6 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -179,6 +179,13 @@ typedef enum
 	OUTPUT_OTHERDATA,			/* writing data as INSERT commands */
 } ArchiverOutput;
 
+typedef struct _DependencyList
+{
+	DumpId	   *dependencies;	/* dumpIds of objects this one depends on */
+	int			nDeps;			/* number of valid dependencies */
+	int			allocDeps;		/* allocated size of dependencies[] */
+} DependencyList;
+
 /*
  * For historical reasons, ACL items are interspersed with everything else in
  * a dump file's TOC; typically they're right after the object they're for.
@@ -311,6 +318,7 @@ struct _archiveHandle
 	/* arrays created after the TOC list is complete: */
 	struct _tocEntry **tocsByDumpId;	/* TOCs indexed by dumpId */
 	DumpId	   *tableDataId;	/* TABLE DATA ids, indexed by table dumpId */
+	DependencyList *tableDataChunkIds; /* dependencies indexed by dumpId */
 
 	struct _tocEntry *currToc;	/* Used when dumping data */
 	pg_compress_specification compression_spec; /* Requested specification for
@@ -377,7 +385,7 @@ struct _tocEntry
 	size_t		defnLen;		/* length of dumped definition */
 
 	/* working state while dumping/restoring */
-	pgoff_t		dataLength;		/* item's data size; 0 if none or unknown */
+	uint64		dataLength;		/* item's data size; 0 if none or unknown */
 	int			reqs;			/* do we need schema and/or data of object
 								 * (REQ_* bit mask) */
 	bool		created;		/* set for DATA member if TABLE was created */
@@ -437,6 +445,8 @@ extern int	TocIDRequired(ArchiveHandle *AH, DumpId id);
 TocEntry   *getTocEntryByDumpId(ArchiveHandle *AH, DumpId id);
 extern bool checkSeek(FILE *fp);
 
+extern void addStandaloneDependency(DependencyList *dobj, DumpId refId);
+
 #define appendStringLiteralAHX(buf,str,AH) \
 	appendStringLiteral(buf, str, (AH)->public.encoding, (AH)->public.std_strings)
 
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 5d1f7682f11..1e7d9a3f7f3 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -535,6 +535,7 @@ main(int argc, char **argv)
 		{"exclude-extension", required_argument, NULL, 17},
 		{"sequence-data", no_argument, &dopt.sequence_data, 1},
 		{"restrict-key", required_argument, NULL, 25},
+		{"max-table-segment-pages", required_argument, NULL, 26},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -799,6 +800,12 @@ main(int argc, char **argv)
 				dopt.restrict_key = pg_strdup(optarg);
 				break;
 
+			case 26:
+				if (!option_parse_uint32(optarg, "--max-table-segment-pages", 1, MaxBlockNumber,
+									  &dopt.max_table_segment_pages))
+					exit_nicely(1);
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -1344,6 +1351,9 @@ help(const char *progname)
 	printf(_("  --extra-float-digits=NUM     override default setting for extra_float_digits\n"));
 	printf(_("  --filter=FILENAME            include or exclude objects and data from dump\n"
 			 "                               based on expressions in FILENAME\n"));
+	printf(_("  --max-table-segment-pages=NUMPAGES\n"
+		     "                               number of main table pages above which data is \n"
+			 "                               copied out in chunks, also determines the chunk size\n"));
 	printf(_("  --if-exists                  use IF EXISTS when dropping objects\n"));
 	printf(_("  --include-foreign-data=PATTERN\n"
 			 "                               include data of foreign tables on foreign\n"
@@ -2396,7 +2406,7 @@ dumpTableData_copy(Archive *fout, const void *dcontext)
 	 * dumping an old pg_largeobject_metadata defined WITH OIDS.  For other
 	 * cases a simple COPY suffices.
 	 */
-	if (tdinfo->filtercond || tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
+	if (tdinfo->filtercond || is_segment(tdinfo) || tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
 		(fout->dopt->binary_upgrade && fout->remoteVersion < 120000 &&
 		 tbinfo->dobj.catId.oid == LargeObjectMetadataRelationId))
 	{
@@ -2414,9 +2424,37 @@ dumpTableData_copy(Archive *fout, const void *dcontext)
 		else
 			appendPQExpBufferStr(q, "* ");
 
-		appendPQExpBuffer(q, "FROM %s %s) TO stdout;",
+		appendPQExpBuffer(q, "FROM %s %s",
 						  fmtQualifiedDumpable(tbinfo),
 						  tdinfo->filtercond ? tdinfo->filtercond : "");
+		/* If it's a segment, we need to add a filter condition to select the
+		 * right page range 
+		 * - for first segment we add "ctid < (endPage+1, 0)" 
+		 *   first segment is the one with startPage == 0
+		 * - for last segment we add "ctid >= (startPage, 1)"
+		 *   last segment is the one with endPage == InvalidBlockNumber
+		 *   we leave the upper bound open for the case where more pages
+		 *   were added after we measured
+		 * - for middle segments we add 
+		 *   "ctid >= (startPage, 1) AND ctid < (endPage+1, 0)"
+		 *
+		 * "ctid < (endPage+1, 0)" instead of "ctid <= (endPage, maxtuple)"
+		 * was chosen as range end so that we do not have to estimate the maxtuple
+		 * 
+		 */
+		if (is_segment(tdinfo))
+		{
+			appendPQExpBufferStr(q, tdinfo->filtercond?" AND ":" WHERE ");
+			if(tdinfo->startPage == 0)
+				appendPQExpBuffer(q, "ctid < '(%u,0)'", tdinfo->endPage+1);			
+			else if(tdinfo->endPage != InvalidBlockNumber)
+				appendPQExpBuffer(q, "ctid >= '(%u,1)' AND ctid < '(%u,0)'",
+								 tdinfo->startPage, tdinfo->endPage+1);
+			else
+				appendPQExpBuffer(q, "ctid >= '(%u,1)'", tdinfo->startPage);
+		}
+
+		appendPQExpBuffer(q, ") TO stdout;");
 	}
 	else
 	{
@@ -2424,6 +2462,10 @@ dumpTableData_copy(Archive *fout, const void *dcontext)
 						  fmtQualifiedDumpable(tbinfo),
 						  column_list);
 	}
+
+	if (is_segment(tdinfo))
+		pg_log_debug("CHUNKING: data query: %s", q->data);
+	
 	res = ExecuteSqlQuery(fout, q->data, PGRES_COPY_OUT);
 	PQclear(res);
 	destroyPQExpBuffer(clistBuf);
@@ -2919,42 +2961,89 @@ dumpTableData(Archive *fout, const TableDataInfo *tdinfo)
 	{
 		TocEntry   *te;
 
-		te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
-						  ARCHIVE_OPTS(.tag = tbinfo->dobj.name,
-									   .namespace = tbinfo->dobj.namespace->dobj.name,
-									   .owner = tbinfo->rolname,
-									   .description = "TABLE DATA",
-									   .section = SECTION_DATA,
-									   .createStmt = tdDefn,
-									   .copyStmt = copyStmt,
-									   .deps = &(tbinfo->dobj.dumpId),
-									   .nDeps = 1,
-									   .dumpFn = dumpFn,
-									   .dumpArg = tdinfo));
-
-		/*
-		 * Set the TocEntry's dataLength in case we are doing a parallel dump
-		 * and want to order dump jobs by table size.  We choose to measure
-		 * dataLength in table pages (including TOAST pages) during dump, so
-		 * no scaling is needed.
-		 *
-		 * However, relpages is declared as "integer" in pg_class, and hence
-		 * also in TableInfo, but it's really BlockNumber a/k/a unsigned int.
-		 * Cast so that we get the right interpretation of table sizes
-		 * exceeding INT_MAX pages.
+		/* data chunking works off relpages, which are computed exactly using
+		 * pg_relation_size() when --max-table-segment-pages was set
+		 * 
+		 * We also don't chunk if table access method is not "heap"
+		 * TODO: we may add chunking for other access methods later, maybe
+		 * based on primary key ranges
 		 */
-		te->dataLength = (BlockNumber) tbinfo->relpages;
-		te->dataLength += (BlockNumber) tbinfo->toastpages;
+		if (tbinfo->relpages <= dopt->max_table_segment_pages || 
+			strcmp(tbinfo->amname, "heap") != 0)
+		{
+			te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
+							ARCHIVE_OPTS(.tag = tbinfo->dobj.name,
+										.namespace = tbinfo->dobj.namespace->dobj.name,
+										.owner = tbinfo->rolname,
+										.description = "TABLE DATA",
+										.section = SECTION_DATA,
+										.createStmt = tdDefn,
+										.copyStmt = copyStmt,
+										.deps = &(tbinfo->dobj.dumpId),
+										.nDeps = 1,
+										.dumpFn = dumpFn,
+										.dumpArg = tdinfo));
 
-		/*
-		 * If pgoff_t is only 32 bits wide, the above refinement is useless,
-		 * and instead we'd better worry about integer overflow.  Clamp to
-		 * INT_MAX if the correct result exceeds that.
-		 */
-		if (sizeof(te->dataLength) == 4 &&
-			(tbinfo->relpages < 0 || tbinfo->toastpages < 0 ||
-			 te->dataLength < 0))
-			te->dataLength = INT_MAX;
+			/*
+			 * Set the TocEntry's dataLength in case we are doing a parallel dump
+			 * and want to order dump jobs by table size.  We choose to measure
+			 * dataLength in table pages (including TOAST pages) during dump, so
+			 * no scaling is needed.
+			 *
+			 * Although pg_class.relpages stores a BlockNumber, a/k/a unsigned int,
+			 * it is declared as "integer"; we convert it back and store it as
+			 * BlockNumber in TableInfo.
+			 * dataLength is uint64, so it does not overflow for
+			 * 2 x UINT32_MAX
+			 */
+			te->dataLength = tbinfo->relpages;
+			te->dataLength += tbinfo->toastpages;
+		}
+		else
+		{
+			uint64 current_chunk_start = 0;
+			PQExpBuffer chunk_desc = createPQExpBuffer();
+
+			while (current_chunk_start < tbinfo->relpages)
+			{
+				TableDataInfo *chunk_tdinfo = (TableDataInfo *) pg_malloc(sizeof(TableDataInfo));
+
+				memcpy(chunk_tdinfo, tdinfo, sizeof(TableDataInfo));
+				AssignDumpId(&chunk_tdinfo->dobj);
+				addObjectDependency(&chunk_tdinfo->dobj, tbinfo->dobj.dumpId);
+				chunk_tdinfo->startPage = (BlockNumber) current_chunk_start;
+				chunk_tdinfo->endPage = chunk_tdinfo->startPage + dopt->max_table_segment_pages - 1;
+				
+				current_chunk_start += dopt->max_table_segment_pages;
+				if (current_chunk_start >= tbinfo->relpages)
+					chunk_tdinfo->endPage = InvalidBlockNumber; /* last chunk is for "all the rest" */
+
+				printfPQExpBuffer(chunk_desc, "TABLE DATA (pages %u:%u)", chunk_tdinfo->startPage, chunk_tdinfo->endPage);
+
+				te = ArchiveEntry(fout, chunk_tdinfo->dobj.catId, chunk_tdinfo->dobj.dumpId,
+							ARCHIVE_OPTS(.tag = tbinfo->dobj.name,
+										.namespace = tbinfo->dobj.namespace->dobj.name,
+										.owner = tbinfo->rolname,
+										.description = chunk_desc->data,
+										.section = SECTION_DATA,
+										.createStmt = tdDefn,
+										.copyStmt = copyStmt,
+										.deps = &(tbinfo->dobj.dumpId),
+										.nDeps = 1,
+										.dumpFn = dumpFn,
+										.dumpArg = chunk_tdinfo));
+
+				if(chunk_tdinfo->endPage == InvalidBlockNumber)
+					te->dataLength = tbinfo->relpages - chunk_tdinfo->startPage;
+				else
+					te->dataLength = dopt->max_table_segment_pages;
+				/* let's assume toast pages distribute evenly among chunks */
+				if(tbinfo->relpages)
+					te->dataLength += te->dataLength * tbinfo->toastpages / tbinfo->relpages;
+			}
+
+			destroyPQExpBuffer(chunk_desc);
+		}
 	}
 
 	destroyPQExpBuffer(copyBuf);
@@ -3081,6 +3170,8 @@ makeTableDataInfo(DumpOptions *dopt, TableInfo *tbinfo)
 	tdinfo->dobj.namespace = tbinfo->dobj.namespace;
 	tdinfo->tdtable = tbinfo;
 	tdinfo->filtercond = NULL;	/* might get set later */
+	tdinfo->startPage = InvalidBlockNumber; /* we use this as indication that no chunking is needed */
+	tdinfo->endPage = InvalidBlockNumber;
 	addObjectDependency(&tdinfo->dobj, tbinfo->dobj.dumpId);
 
 	/* A TableDataInfo contains data, of course */
@@ -7347,8 +7438,16 @@ getTables(Archive *fout, int *numTables)
 						 "c.relnamespace, c.relkind, c.reltype, "
 						 "c.relowner, "
 						 "c.relchecks, "
-						 "c.relhasindex, c.relhasrules, c.relpages, "
-						 "c.reltuples, c.relallvisible, ");
+						 "c.relhasindex, c.relhasrules, ");
+
+	/* fetch current relation size if chunking is requested */
+	if(dopt->max_table_segment_pages != InvalidBlockNumber)
+		appendPQExpBufferStr(query, "pg_relation_size(c.oid)/current_setting('block_size')::int AS relpages, ");
+	else
+		/* pg_class.relpages stores BlockNumber (uint32) in an int field, convert to oid to get unsigned int out */
+		appendPQExpBufferStr(query, "c.relpages::oid, ");
+
+	appendPQExpBufferStr(query, "c.reltuples, c.relallvisible, ");
 
 	if (fout->remoteVersion >= 180000)
 		appendPQExpBufferStr(query, "c.relallfrozen, ");
@@ -7589,7 +7688,7 @@ getTables(Archive *fout, int *numTables)
 		tblinfo[i].ncheck = atoi(PQgetvalue(res, i, i_relchecks));
 		tblinfo[i].hasindex = (strcmp(PQgetvalue(res, i, i_relhasindex), "t") == 0);
 		tblinfo[i].hasrules = (strcmp(PQgetvalue(res, i, i_relhasrules), "t") == 0);
-		tblinfo[i].relpages = atoi(PQgetvalue(res, i, i_relpages));
+		tblinfo[i].relpages = strtoul(PQgetvalue(res, i, i_relpages), NULL, 10);
 		if (PQgetisnull(res, i, i_toastpages))
 			tblinfo[i].toastpages = 0;
 		else
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 5a6726d8b12..84e682d585f 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -16,6 +16,7 @@
 
 #include "pg_backup.h"
 #include "catalog/pg_publication_d.h"
+#include "storage/block.h"
 
 
 #define oidcmp(x,y) ( ((x) < (y) ? -1 : ((x) > (y)) ?  1 : 0) )
@@ -335,7 +336,11 @@ typedef struct _tableInfo
 	Oid			owning_tab;		/* OID of table owning sequence */
 	int			owning_col;		/* attr # of column owning sequence */
 	bool		is_identity_sequence;
-	int32		relpages;		/* table's size in pages (from pg_class) */
+	BlockNumber	relpages;		/* table's size in pages (from pg_class),
+								 * converted to unsigned integer;
+								 * when --max-table-segment-pages is set,
+								 * it is computed from pg_relation_size()
+								 */
 	int			toastpages;		/* toast table's size in pages, if any */
 
 	bool		interesting;	/* true if need to collect more data */
@@ -413,8 +418,21 @@ typedef struct _tableDataInfo
 	DumpableObject dobj;
 	TableInfo  *tdtable;		/* link to table to dump */
 	char	   *filtercond;		/* WHERE condition to limit rows dumped */
+	/* startPage and endPage to support segmented dump */
+	BlockNumber	startPage;		/* As we always know the lowest segment page
+								 * number we can use InvalidBlockNumber here
+								 * to recognize no segmenting case.
+								 * When 0 for the first page of first
+								 * segment we can omit in range query */
+	BlockNumber	endPage;		/* last page in segment for page-range dump,
+	                    		 * startPage+max_table_segment_pages-1 for 
+								 * most segments, but InvalidBlockNumber for
+								 * the last one to indicate open range
+								 */
 } TableDataInfo;
 
+#define is_segment(tdiptr) ((tdiptr)->startPage != InvalidBlockNumber)
+
 typedef struct _indxInfo
 {
 	DumpableObject dobj;
@@ -449,7 +467,7 @@ typedef struct _relStatsInfo
 {
 	DumpableObject dobj;
 	Oid			relid;
-	int32		relpages;
+	BlockNumber	relpages;
 	char	   *reltuples;
 	int32		relallvisible;
 	int32		relallfrozen;
diff --git a/src/bin/pg_dump/t/004_pg_dump_parallel.pl b/src/bin/pg_dump/t/004_pg_dump_parallel.pl
index 738f34b1c1b..4f35aeed9b9 100644
--- a/src/bin/pg_dump/t/004_pg_dump_parallel.pl
+++ b/src/bin/pg_dump/t/004_pg_dump_parallel.pl
@@ -11,6 +11,7 @@ use Test::More;
 my $dbname1 = 'regression_src';
 my $dbname2 = 'regression_dest1';
 my $dbname3 = 'regression_dest2';
+my $dbname4 = 'regression_dest3';
 
 my $node = PostgreSQL::Test::Cluster->new('main');
 $node->init;
@@ -21,6 +22,7 @@ my $backupdir = $node->backup_dir;
 $node->run_log([ 'createdb', $dbname1 ]);
 $node->run_log([ 'createdb', $dbname2 ]);
 $node->run_log([ 'createdb', $dbname3 ]);
+$node->run_log([ 'createdb', $dbname4 ]);
 
 $node->safe_psql(
 	$dbname1,
@@ -87,4 +89,33 @@ $node->command_ok(
 	],
 	'parallel restore as inserts');
 
+$node->command_ok(
+	[
+		'pg_dump',
+		'--format' => 'directory',
+		'--max-table-segment-pages' => 2,
+		'--no-sync',
+		'--jobs' => 2,
+		'--file' => "$backupdir/dump3",
+		$node->connstr($dbname1),
+	],
+	'parallel dump with chunks of two heap pages');
+
+$node->command_ok(
+	[
+		'pg_restore', '--verbose',
+		'--dbname' => $node->connstr($dbname4),
+		'--jobs' => 3,
+		"$backupdir/dump3",
+	],
+	'parallel restore with chunks of two heap pages');
+
+my $table = 'tplain';
+my $tablehash_query = "SELECT '$table', sum(hashtext(t::text)), count(*) FROM $table AS t";
+
+my $result_1 = $node->safe_psql($dbname1, $tablehash_query);
+my $result_4 = $node->safe_psql($dbname4, $tablehash_query);
+
+is($result_4, $result_1, "Hash check for $table: restored db ($result_4) vs original db ($result_1)");
+
 done_testing();
diff --git a/src/fe_utils/option_utils.c b/src/fe_utils/option_utils.c
index 8d0659c1164..a516d8c86a9 100644
--- a/src/fe_utils/option_utils.c
+++ b/src/fe_utils/option_utils.c
@@ -83,6 +83,61 @@ option_parse_int(const char *optarg, const char *optname,
 	return true;
 }
 
+/*
+ * option_parse_uint32
+ *
+ * Parse unsigned integer value for an option.  If the parsing is successful,
+ * returns true and stores the result in *result if that's given;
+ * if parsing fails, returns false.
+ */
+bool
+option_parse_uint32(const char *optarg, const char *optname,
+				 uint32 min_range, uint32 max_range,
+				 uint32 *result)
+{
+	char	   		*endptr;
+	unsigned long	val;
+
+	/* Fail if there is a minus sign at the start of value */
+	while(isspace((unsigned char) *optarg))
+		optarg++;
+	if(*optarg == '-')
+	{
+		pg_log_error("value \"%s\" for option %s can not be negative",
+					optarg, optname);
+		return false;
+	}
+
+	errno = 0;
+	val = strtoul(optarg, &endptr, 10);
+
+	/*
+	 * Skip any trailing whitespace; if anything but whitespace remains before
+	 * the terminating character, fail.
+	 */
+	while (*endptr != '\0' && isspace((unsigned char) *endptr))
+		endptr++;
+
+	if (*endptr != '\0')
+	{
+		pg_log_error("invalid value \"%s\" for option %s",
+					 optarg, optname);
+		return false;
+	}
+
+	/* as min_range and max_range are uint32 then the range check will
+	 * catch the case where unsigned long val is outside 32 bit range */
+	if (errno == ERANGE || val < min_range || val > max_range)
+	{
+		pg_log_error("%s not in range %u..%u", optname, min_range, max_range);
+		return false;
+	}
+
+	if (result)
+		*result = (uint32) val;
+	return true;
+}
+
 /*
  * Provide strictly harmonized handling of the --sync-method option.
  */
diff --git a/src/include/fe_utils/option_utils.h b/src/include/fe_utils/option_utils.h
index d975db77af2..67fd3650d7a 100644
--- a/src/include/fe_utils/option_utils.h
+++ b/src/include/fe_utils/option_utils.h
@@ -22,6 +22,9 @@ extern void handle_help_version_opts(int argc, char *argv[],
 extern bool option_parse_int(const char *optarg, const char *optname,
 							 int min_range, int max_range,
 							 int *result);
+extern bool option_parse_uint32(const char *optarg, const char *optname,
+							 uint32 min_range, uint32 max_range,
+							 uint32 *result);
 extern bool parse_sync_method(const char *optarg,
 							  DataDirSyncMethod *sync_method);
 extern void check_mut_excl_opts_internal(int n,...);
-- 
2.43.0
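
The page-range logic in the patch above can be sketched in isolation as follows. This is a minimal standalone illustration, not code from the patch: PageRange, split_into_chunks(), chunk_predicate() and INVALID_BLOCK are hypothetical names, with INVALID_BLOCK standing in for InvalidBlockNumber. It mirrors how dumpTableData() assigns startPage/endPage and how dumpTableData_copy() builds the ctid predicate.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define INVALID_BLOCK 0xFFFFFFFFu	/* stand-in for InvalidBlockNumber */

typedef struct
{
	uint32_t	start_page;
	uint32_t	end_page;	/* INVALID_BLOCK means "open-ended last chunk" */
} PageRange;

/*
 * Fill out[] with chunk ranges for a table of relpages pages.
 * Returns the number of ranges written (at most max_out).
 */
static int
split_into_chunks(uint32_t relpages, uint32_t chunk_pages,
				  PageRange *out, int max_out)
{
	uint64_t	start = 0;	/* 64-bit so that start += chunk_pages cannot wrap */
	int			n = 0;

	assert(chunk_pages > 0);
	while (start < relpages && n < max_out)
	{
		out[n].start_page = (uint32_t) start;
		out[n].end_page = (uint32_t) (start + chunk_pages - 1);
		start += chunk_pages;
		if (start >= relpages)
			out[n].end_page = INVALID_BLOCK;	/* last chunk: "all the rest" */
		n++;
	}
	return n;
}

/* Write the ctid predicate selecting one chunk into buf. */
static void
chunk_predicate(const PageRange *r, char *buf, size_t buflen)
{
	if (r->start_page == 0 && r->end_page == INVALID_BLOCK)
		snprintf(buf, buflen, "true");	/* whole table; the patch never chunks this case */
	else if (r->start_page == 0)
		snprintf(buf, buflen, "ctid < '(%u,0)'", (unsigned) (r->end_page + 1));
	else if (r->end_page != INVALID_BLOCK)
		snprintf(buf, buflen, "ctid >= '(%u,1)' AND ctid < '(%u,0)'",
				 (unsigned) r->start_page, (unsigned) (r->end_page + 1));
	else
		snprintf(buf, buflen, "ctid >= '(%u,1)'", (unsigned) r->start_page);
}
```

For a 10-page table and a chunk size of 4 pages this yields the ranges 0:3, 4:7 and an open-ended chunk starting at page 8. The extra "true" branch exists only in this sketch; the patch itself only chunks tables that are larger than one chunk, so its first chunk always has a real end page.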




* Re: Patch: dumping tables data in multiple chunks in pg_dump

From: Hannu Krosing @ 2026-03-28 15:32 UTC (permalink / raw)
  To: Michael Banck <[email protected]>; +Cc: David Rowley <[email protected]>; Ashutosh Bapat <[email protected]>; PostgreSQL Hackers <[email protected]>; Nathan Bossart <[email protected]>

The issue is that currently the value is given in "main table pages"
and it would be somewhat deceptive, or at least confusing, to try to
express this in any other unit.

As I explained in the commit message:

---------8<-------------------8<-------------------8<----------------
This --max-table-segment-pages number specifically applies to main table
pages which does not guarantee anything about output size.
The output could be empty if there are no live tuples in the page range.
Or it can be almost 200 GB if the page has just pointers to 1GB TOAST items.
---------8<-------------------8<-------------------8<----------------

And I can think of no cheap and reliable way to change that equation.
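
To illustrate why a size unit would be deceptive: converting something like '10GB' into main-table pages is trivial arithmetic, but the result still only bounds heap pages, not the size of the output. A minimal sketch, assuming a hypothetical helper size_to_pages() (not part of the patch) and the default 8kB block size:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BLOCK_NUMBER 0xFFFFFFFEu	/* same value as MaxBlockNumber */

/*
 * Convert a byte threshold into a count of main-fork pages.
 * Rounds down, but never returns less than one page or more than
 * MAX_BLOCK_NUMBER, matching the range the option accepts.
 */
static uint32_t
size_to_pages(uint64_t bytes, uint32_t block_size)
{
	uint64_t	pages;

	assert(block_size > 0);
	pages = bytes / block_size;
	if (pages < 1)
		pages = 1;
	if (pages > MAX_BLOCK_NUMBER)
		pages = MAX_BLOCK_NUMBER;
	return (uint32_t) pages;
}
```

With 8kB blocks, 10GB maps to 1310720 pages. A user passing '10GB' would still get chunks of 1310720 heap pages each, whose dumped size depends entirely on live tuples and TOAST content.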

I'd be very happy to hear any good ideas for improving the flag name,
or a proposal for better estimating the resulting dump file size, so
that we could give the chunk size in better units.

---
Hannu





On Sat, Mar 28, 2026 at 12:26 PM Michael Banck <[email protected]> wrote:
>
> Hi,
>
> On Tue, Jan 13, 2026 at 03:27:25PM +1300, David Rowley wrote:
> > Perhaps --max-table-segment-pages is a better name than
> > --huge-table-chunk-pages as it's quite subjective what the minimum
> > number of pages required to make a table "huge".
>
> I'm not sure that's better - without looking at the documentation,
> people might confuse segment here with the 1GB split of tables into
> segments. As pg_dump is a very common and basic user tool, I don't think
> implementation details like pages/page sizes and blocks should be part
> of its UX.
>
> Can't we just make it a storage size, like '10GB' and then rename it to
> --table-parallel-threshold or something? I agree it's bikeshedding, but
> I personally don't like either --max-table-segment-pages or
> --huge-table-chunk-pages.
>
>
> Michael






* Re: Patch: dumping tables data in multiple chunks in pg_dump

From: Hannu Krosing @ 2026-03-28 15:33 UTC (permalink / raw)
  To: Michael Banck <[email protected]>; +Cc: David Rowley <[email protected]>; Ashutosh Bapat <[email protected]>; PostgreSQL Hackers <[email protected]>; Nathan Bossart <[email protected]>

The above

"Or it can be almost 200 GB if the page has just pointers to 1GB TOAST items."

should read

"Or it can be almost 200 GB *for a single page* if the page has just
pointers to 1GB TOAST items."
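
A rough back-of-the-envelope version of that single-page estimate, as a sketch only. The constants are assumptions for a default build (8kB page, 24-byte page header, 4-byte line pointer, 23-byte heap tuple header, 18-byte external TOAST pointer, 8-byte alignment), and toast_only_tuples_per_page() is a hypothetical helper, not anything in pg_dump:

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of 8 (assumed maximum alignment). */
#define MAXALIGN8(x) (((x) + 7u) & ~7u)

/*
 * Estimate how many tuples fit on one heap page when every tuple
 * consists of a header plus a single external TOAST pointer.
 */
static unsigned
toast_only_tuples_per_page(unsigned page_size, unsigned page_header,
						   unsigned line_pointer, unsigned tuple_header,
						   unsigned toast_pointer)
{
	unsigned	tuple_size;

	assert(page_size > page_header);
	/* the tuple data starts at an aligned offset and the tuple is aligned as a whole */
	tuple_size = MAXALIGN8(MAXALIGN8(tuple_header) + toast_pointer);
	return (page_size - page_header) / (tuple_size + line_pointer);
}
```

Under these assumptions toast_only_tuples_per_page(8192, 24, 4, 23, 18) comes out at 157 tuples, so a single main-table page can reference on the order of 150 to 200 GB if each pointed-to value is close to 1GB. Larger block sizes push the figure higher.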








* Re: Patch: dumping tables data in multiple chunks in pg_dump

From: Hannu Krosing @ 2026-03-29 21:49 UTC (permalink / raw)
  To: Michael Banck <[email protected]>; +Cc: David Rowley <[email protected]>; Ashutosh Bapat <[email protected]>; PostgreSQL Hackers <[email protected]>; Nathan Bossart <[email protected]>

Fixing an off-by-one error in copying over dependencies.
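
For context, the earlier patch grew te->dependencies to nDeps + (number of chunks) - 1 entries but wrote the extra chunk ids at index nDeps + j with j starting at 1, so the last write landed one slot past the end. The sketch below shows one way to do the append correctly. It is not the v15 code: replace_with_chunks() is a hypothetical standalone helper, and plain realloc() stands in for pg_realloc_array().

```c
#include <assert.h>
#include <stdlib.h>

typedef int DumpId;

/*
 * Replace deps[pos] with the first chunk id and append the remaining
 * chunk ids to the end of the array.  Returns the (possibly moved)
 * array and updates *ndeps to the new length.
 */
static DumpId *
replace_with_chunks(DumpId *deps, int *ndeps, int pos,
					const DumpId *chunks, int nchunks)
{
	int			old_n = *ndeps;

	assert(nchunks > 0 && pos >= 0 && pos < old_n);

	/* the first chunk takes over the slot that pointed at the table */
	deps[pos] = chunks[0];

	if (nchunks > 1)
	{
		DumpId	   *grown = realloc(deps, sizeof(DumpId) * (old_n + nchunks - 1));

		if (grown == NULL)
			abort();
		deps = grown;

		/* chunk j goes to slot old_n + j - 1, so chunk 1 lands at old_n */
		for (int j = 1; j < nchunks; j++)
			deps[old_n + j - 1] = chunks[j];

		*ndeps = old_n + nchunks - 1;
	}
	return deps;
}
```

Starting from dependencies {5, 7, 9} and replacing position 1 with chunks {20, 21, 22} gives {5, 20, 9, 21, 22} with a length of 5. The sketch also updates the count, which the earlier patch left unchanged.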




Attachments:

  [application/x-patch] v15-0001-Add-max-table-segment-pages-option-to-pg.patch (27.9K, 2-v15-0001-Add-max-table-segment-pages-option-to-pg.patch)
  download | inline diff:
From d9442eb6476ba27e0f3dee085e48de2efbb445d6 Mon Sep 17 00:00:00 2001
From: Hannu Krosing <[email protected]>
Date: Sat, 28 Mar 2026 11:53:39 +0100
Subject: [PATCH v14] SUBJECT: Add --max-table-segment-pages option to pg_dump
 for parallel table dumping.

This patch introduces the ability to split large heap tables into segments
based on a specified number of pages. These segments can then be dumped in
parallel using the existing jobs infrastructure, significantly reducing
the time required to dump very large tables.
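
As a minimal sketch of how the page ranges come about (it mirrors the loop this patch adds to dumpTableData(), but the names Segment, split_into_segments and INVALID_BLOCK are stand-ins chosen for this example, not identifiers from pg_dump):

```c
#include <stddef.h>
#include <stdint.h>

#define INVALID_BLOCK UINT32_MAX	/* stand-in for InvalidBlockNumber */

typedef struct
{
	uint32_t	start_page;		/* first heap page of the segment */
	uint32_t	end_page;		/* last heap page, or INVALID_BLOCK if open */
} Segment;

/*
 * Split a relation of relpages pages into segments of seg_pages pages.
 * Stores up to max_out segments in out[] and returns how many segments
 * the relation needs in total.
 */
static size_t
split_into_segments(uint32_t relpages, uint32_t seg_pages,
					Segment *out, size_t max_out)
{
	uint64_t	start = 0;		/* 64-bit so start + seg_pages cannot wrap */
	size_t		n = 0;

	if (seg_pages == 0)
		return 0;

	while (start < relpages)
	{
		Segment		s;

		s.start_page = (uint32_t) start;
		s.end_page = s.start_page + seg_pages - 1;

		start += seg_pages;
		if (start >= relpages)
			s.end_page = INVALID_BLOCK; /* last segment: "all the rest" */

		if (n < max_out)
			out[n] = s;
		n++;
	}
	return n;
}
```

The last segment is left open-ended so that pages added to the table after its size was measured still end up in some segment.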

This --max-table-segment-pages number applies specifically to main table
pages, which guarantees nothing about output size.
The output could be empty if there are no live tuples in the page range.
Or it can be almost 200 GB for a single page if the page has just
pointers to 1GB TOAST items.

The implementation uses ctid-based range queries (e.g., WHERE ctid >=
'(startPage,1)' AND ctid < '(endPage+1,0)') to extract specific chunks of
the relation.
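
A rough sketch of the three predicate shapes follows (first, middle and last segment). It follows the branch structure the patch adds to dumpTableData_copy(), but segment_predicate and INVALID_BLOCK are names made up for this illustration, and it writes into a plain char buffer rather than a PQExpBuffer:

```c
#include <stdint.h>
#include <stdio.h>

#define INVALID_BLOCK UINT32_MAX	/* stand-in for InvalidBlockNumber */

/*
 * Write the ctid predicate for one segment into buf.
 * start_page == 0 marks the first segment (no lower bound needed);
 * end_page == INVALID_BLOCK marks the last one (no upper bound).
 * As in the patch, a table is assumed to have at least two segments,
 * so both conditions never hold for the same segment.
 * Returns what snprintf() returns.
 */
static int
segment_predicate(char *buf, size_t len,
				  uint32_t start_page, uint32_t end_page)
{
	if (start_page == 0)
		return snprintf(buf, len, "ctid < '(%u,0)'",
						(unsigned) (end_page + 1));

	if (end_page != INVALID_BLOCK)
		return snprintf(buf, len, "ctid >= '(%u,1)' AND ctid < '(%u,0)'",
						(unsigned) start_page, (unsigned) (end_page + 1));

	return snprintf(buf, len, "ctid >= '(%u,1)'", (unsigned) start_page);
}
```

Using an exclusive upper bound of (endPage+1,0) means there is no need to know the highest tuple offset that can occur on a page.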

This is efficient only on PostgreSQL 14 and later; it does work on earlier
versions, but inefficiently.

The patch only supports the "heap" access method, as others may not even have
the ctid column.
---
 doc/src/sgml/ref/pg_dump.sgml             |  24 +++
 src/bin/pg_dump/pg_backup.h               |   2 +
 src/bin/pg_dump/pg_backup_archiver.c      |  84 +++++++++-
 src/bin/pg_dump/pg_backup_archiver.h      |  12 +-
 src/bin/pg_dump/pg_dump.c                 | 177 +++++++++++++++++-----
 src/bin/pg_dump/pg_dump.h                 |  22 ++-
 src/bin/pg_dump/t/004_pg_dump_parallel.pl |  31 ++++
 src/fe_utils/option_utils.c               |  55 +++++++
 src/include/fe_utils/option_utils.h       |   3 +
 9 files changed, 364 insertions(+), 46 deletions(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 7f538e90194..5f056bb4af6 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1066,6 +1066,30 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--max-table-segment-pages=<replaceable class="parameter">npages</replaceable></option></term>
+      <listitem>
+       <para>
+        Dump data in segments based on number of pages in the main relation.
+        If the number of data pages in the relation is more than <replaceable class="parameter">npages</replaceable>,
+        the data is split into segments based on that number of pages.
+        Individual segments can be dumped in parallel.
+       </para>
+
+       <note>
+        <para>
+         The option <option>--max-table-segment-pages</option> applies only to pages
+         in the main heap. If the table has a large TOASTed part, this has to be
+         taken into account when deciding on the number of pages to use.
+         In the extreme case a single 8kB heap page can have ~200 TOAST pointers, each
+         corresponding to 1GB of data. If this data is also non-compressible, a
+         single-page segment can dump as a 200GB file.
+        </para>
+       </note>
+
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>--no-comments</option></term>
       <listitem>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index fda912ba0a9..11863a1915f 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -27,6 +27,7 @@
 #include "common/file_utils.h"
 #include "fe_utils/simple_list.h"
 #include "libpq-fe.h"
+#include "storage/block.h"
 
 
 typedef enum trivalue
@@ -179,6 +180,7 @@ typedef struct _dumpOptions
 	bool		aclsSkip;
 	const char *lockWaitTimeout;
 	int			dump_inserts;	/* 0 = COPY, otherwise rows per INSERT */
+	BlockNumber	max_table_segment_pages; /* chunk when relpages is above this */
 
 	/* flags for various command-line long options */
 	int			disable_dollar_quoting;
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 271a2c3e481..384add0713b 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -44,6 +44,7 @@
 #include "pg_backup_archiver.h"
 #include "pg_backup_db.h"
 #include "pg_backup_utils.h"
+#include "storage/block.h"
 
 #define TEXT_DUMP_HEADER "--\n-- PostgreSQL database dump\n--\n\n"
 #define TEXT_DUMPALL_HEADER "--\n-- PostgreSQL database cluster dump\n--\n\n"
@@ -154,6 +155,7 @@ InitDumpOptions(DumpOptions *opts)
 	opts->dumpSchema = true;
 	opts->dumpData = true;
 	opts->dumpStatistics = false;
+	opts->max_table_segment_pages = InvalidBlockNumber;
 }
 
 /*
@@ -1995,6 +1997,28 @@ _moveBefore(TocEntry *pos, TocEntry *te)
 	pos->prev = te;
 }
 
+/*
+ * Add a dependency id to a DependencyList object.
+ * This is currently used for collecting reverse
+ * dependencies for chunked data dumps.
+ *
+ * Note: duplicate dependencies are currently not eliminated.
+ */
+void
+addStandaloneDependency(DependencyList *dobj, DumpId refId)
+{
+	pg_log_warning("Adding dep: list %p + dep %u", (void *) dobj->dependencies, refId);
+	if (dobj->nDeps >= dobj->allocDeps)
+	{
+		dobj->allocDeps = (dobj->allocDeps <= 0) ? 16 : dobj->allocDeps * 2;
+		dobj->dependencies = pg_realloc_array(dobj->dependencies,
+											  DumpId, dobj->allocDeps);
+		pg_log_warning("Realloced list %p to size %d", (void *) dobj->dependencies, dobj->allocDeps);
+	}
+	pg_log_warning("Added dep: list %p + dep %u", (void *) dobj->dependencies, refId);
+	dobj->dependencies[dobj->nDeps++] = refId;
+}
+
 /*
  * Build index arrays for the TOC list
  *
@@ -2014,6 +2038,7 @@ buildTocEntryArrays(ArchiveHandle *AH)
 
 	AH->tocsByDumpId = pg_malloc0_array(TocEntry *, (maxDumpId + 1));
 	AH->tableDataId = pg_malloc0_array(DumpId, (maxDumpId + 1));
+	AH->tableDataChunkIds = pg_malloc0_array(DependencyList, (maxDumpId + 1));
 
 	for (te = AH->toc->next; te != AH->toc; te = te->next)
 	{
@@ -2029,8 +2054,12 @@ buildTocEntryArrays(ArchiveHandle *AH)
 		 * TOC entry that has a DATA item.  We compute this by reversing the
 		 * TABLE DATA item's dependency, knowing that a TABLE DATA item has
 		 * just one dependency and it is the TABLE item.
+		 *
+		 * For chunked table data, the TABLE DATA item has a description like
+		 * "TABLE DATA (pages 100:199)", and we collect all such items as
+		 * reverse dependencies for the parent table's entry in tableDataChunkIds.
 		 */
-		if (strcmp(te->desc, "TABLE DATA") == 0 && te->nDeps > 0)
+		if (strncmp(te->desc, "TABLE DATA", 10) == 0 && te->nDeps > 0)
 		{
 			DumpId		tableId = te->dependencies[0];
 
@@ -2042,7 +2071,14 @@ buildTocEntryArrays(ArchiveHandle *AH)
 			if (tableId <= 0 || tableId > maxDumpId)
 				pg_fatal("bad table dumpId for TABLE DATA item");
 
-			AH->tableDataId[tableId] = te->dumpId;
+			if (te->desc[10] == '\0') /* te->desc == "TABLE DATA" */
+				AH->tableDataId[tableId] = te->dumpId;
+			else
+			{
+				/* Chunked table data, the description is "TABLE DATA (pages %u:%u)" */
+				addStandaloneDependency(&(AH->tableDataChunkIds[tableId]), te->dumpId);
+				pg_log_debug("Added chunked table data dependency: tableId %u + chunkId %u",
+							 tableId, te->dumpId);}
 		}
 	}
 }
@@ -5017,6 +5053,12 @@ fix_dependencies(ArchiveHandle *AH)
  * that parallel restore will prioritize larger jobs (index builds, FK
  * constraint checks, etc) over smaller ones, avoiding situations where we
  * end a restore with only one active job working on a large table.
+ *
+ * In case of chunked dumps, we replace the dependency on the table with a
+ * dependency on the first chunk of data and add the remaining chunk ids, if
+ * any, to the end of the dependency list.
+ * We also calculate fullDataLength as the sum of the lengths of the chunk
+ * data items and use that to set the item's dataLength.
  */
 static void
 repoint_table_dependencies(ArchiveHandle *AH)
@@ -5032,8 +5074,9 @@ repoint_table_dependencies(ArchiveHandle *AH)
 		for (i = 0; i < te->nDeps; i++)
 		{
 			olddep = te->dependencies[i];
-			if (olddep <= AH->maxDumpId &&
-				AH->tableDataId[olddep] != 0)
+			if (olddep > AH->maxDumpId)
+				continue;
+			if (AH->tableDataId[olddep] != 0)
 			{
 				DumpId		tabledataid = AH->tableDataId[olddep];
 				TocEntry   *tabledatate = AH->tocsByDumpId[tabledataid];
@@ -5043,6 +5086,39 @@ repoint_table_dependencies(ArchiveHandle *AH)
 				pg_log_debug("transferring dependency %d -> %d to %d",
 							 te->dumpId, olddep, tabledataid);
 			}
+			else if (AH->tableDataChunkIds[olddep].nDeps > 0)
+			{
+				int			j;
+				DumpId		chunkdataid;
+				uint64		fullDataLength;
+				DependencyList *deplist = &AH->tableDataChunkIds[olddep];
+
+				/* first in list replaces the dependency on table */
+				chunkdataid = deplist->dependencies[0];
+				te->dependencies[i] = chunkdataid;
+				fullDataLength = AH->tocsByDumpId[chunkdataid]->dataLength;
+				pg_log_debug("transferring chunk list %d -> %d to %d",
+							 te->dumpId, olddep, chunkdataid);
+
+				if (deplist->nDeps > 1)
+				{
+					/* make space */
+					te->dependencies = pg_realloc_array(te->dependencies,
+												  DumpId,
+												  te->nDeps + deplist->nDeps - 1);
+
+					/* the rest are appended to dependencies */
+					for (j = 1; j < deplist->nDeps; j++)
+					{
+						chunkdataid = deplist->dependencies[j];
+						te->dependencies[te->nDeps++] = chunkdataid;
+						fullDataLength += AH->tocsByDumpId[chunkdataid]->dataLength;
+						pg_log_debug("adding chunk list %d -> %d to %d",
+									te->dumpId, olddep, chunkdataid);
+					}
+				}
+				te->dataLength = Max(te->dataLength, fullDataLength);
+			}
 		}
 	}
 }
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 365073b3eae..cfa3ea1bbd6 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -179,6 +179,13 @@ typedef enum
 	OUTPUT_OTHERDATA,			/* writing data as INSERT commands */
 } ArchiverOutput;
 
+typedef struct _DependencyList
+{
+	DumpId	   *dependencies;	/* dumpIds of objects this one depends on */
+	int			nDeps;			/* number of valid dependencies */
+	int			allocDeps;		/* allocated size of dependencies[] */
+} DependencyList;
+
 /*
  * For historical reasons, ACL items are interspersed with everything else in
  * a dump file's TOC; typically they're right after the object they're for.
@@ -311,6 +318,7 @@ struct _archiveHandle
 	/* arrays created after the TOC list is complete: */
 	struct _tocEntry **tocsByDumpId;	/* TOCs indexed by dumpId */
 	DumpId	   *tableDataId;	/* TABLE DATA ids, indexed by table dumpId */
+	DependencyList *tableDataChunkIds; /* dependencies indexed by dumpId */
 
 	struct _tocEntry *currToc;	/* Used when dumping data */
 	pg_compress_specification compression_spec; /* Requested specification for
@@ -377,7 +385,7 @@ struct _tocEntry
 	size_t		defnLen;		/* length of dumped definition */
 
 	/* working state while dumping/restoring */
-	pgoff_t		dataLength;		/* item's data size; 0 if none or unknown */
+	uint64		dataLength;		/* item's data size; 0 if none or unknown */
 	int			reqs;			/* do we need schema and/or data of object
 								 * (REQ_* bit mask) */
 	bool		created;		/* set for DATA member if TABLE was created */
@@ -437,6 +445,8 @@ extern int	TocIDRequired(ArchiveHandle *AH, DumpId id);
 TocEntry   *getTocEntryByDumpId(ArchiveHandle *AH, DumpId id);
 extern bool checkSeek(FILE *fp);
 
+extern void addStandaloneDependency(DependencyList *dobj, DumpId refId);
+
 #define appendStringLiteralAHX(buf,str,AH) \
 	appendStringLiteral(buf, str, (AH)->public.encoding, (AH)->public.std_strings)
 
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 5d1f7682f11..1e7d9a3f7f3 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -535,6 +535,7 @@ main(int argc, char **argv)
 		{"exclude-extension", required_argument, NULL, 17},
 		{"sequence-data", no_argument, &dopt.sequence_data, 1},
 		{"restrict-key", required_argument, NULL, 25},
+		{"max-table-segment-pages", required_argument, NULL, 26},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -799,6 +800,12 @@ main(int argc, char **argv)
 				dopt.restrict_key = pg_strdup(optarg);
 				break;
 
+			case 26:
+				if (!option_parse_uint32(optarg, "--max-table-segment-pages", 1, MaxBlockNumber,
+									  &dopt.max_table_segment_pages))
+					exit_nicely(1);
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -1344,6 +1351,9 @@ help(const char *progname)
 	printf(_("  --extra-float-digits=NUM     override default setting for extra_float_digits\n"));
 	printf(_("  --filter=FILENAME            include or exclude objects and data from dump\n"
 			 "                               based on expressions in FILENAME\n"));
+	printf(_("  --max-table-segment-pages=NUMPAGES\n"
+		     "                               number of main table pages above which data is \n"
+			 "                               copied out in chunks, also determines the chunk size\n"));
 	printf(_("  --if-exists                  use IF EXISTS when dropping objects\n"));
 	printf(_("  --include-foreign-data=PATTERN\n"
 			 "                               include data of foreign tables on foreign\n"
@@ -2396,7 +2406,7 @@ dumpTableData_copy(Archive *fout, const void *dcontext)
 	 * dumping an old pg_largeobject_metadata defined WITH OIDS.  For other
 	 * cases a simple COPY suffices.
 	 */
-	if (tdinfo->filtercond || tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
+	if (tdinfo->filtercond || is_segment(tdinfo) || tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
 		(fout->dopt->binary_upgrade && fout->remoteVersion < 120000 &&
 		 tbinfo->dobj.catId.oid == LargeObjectMetadataRelationId))
 	{
@@ -2414,9 +2424,37 @@ dumpTableData_copy(Archive *fout, const void *dcontext)
 		else
 			appendPQExpBufferStr(q, "* ");
 
-		appendPQExpBuffer(q, "FROM %s %s) TO stdout;",
+		appendPQExpBuffer(q, "FROM %s %s",
 						  fmtQualifiedDumpable(tbinfo),
 						  tdinfo->filtercond ? tdinfo->filtercond : "");
+		/* If it's a segment, we need to add a filter condition to select the
+		 * right page range 
+		 * - for first segment we add "ctid < (endPage+1, 0)" 
+		 *   first segment is the one with startPage == 0
+		 * - for last segment we add "ctid >= (startPage, 1)"
+		 *   last segment is the one with endPage == InvalidBlockNumber
+		 *   we leave the upper bound open for the case where more pages
+		 *   were added after we measured 
+		 * - for middle segments we add 
+		 *   "ctid >= (startPage, 1) AND ctid < (endPage+1, 0)"
+		 *
+		 * "ctid < (endPage+1, 0)" instead of "ctid <= (endPage, maxtuple)"
+		 * was chosen as range end so that we do not have to estimate the maxtuple
+		 * 
+		 */
+		if (is_segment(tdinfo))
+		{
+			appendPQExpBufferStr(q, tdinfo->filtercond?" AND ":" WHERE ");
+			if(tdinfo->startPage == 0)
+				appendPQExpBuffer(q, "ctid < '(%u,0)'", tdinfo->endPage+1);			
+			else if(tdinfo->endPage != InvalidBlockNumber)
+				appendPQExpBuffer(q, "ctid >= '(%u,1)' AND ctid < '(%u,0)'",
+								 tdinfo->startPage, tdinfo->endPage+1);
+			else
+				appendPQExpBuffer(q, "ctid >= '(%u,1)'", tdinfo->startPage);
+		}
+
+		appendPQExpBuffer(q, ") TO stdout;");
 	}
 	else
 	{
@@ -2424,6 +2462,10 @@ dumpTableData_copy(Archive *fout, const void *dcontext)
 						  fmtQualifiedDumpable(tbinfo),
 						  column_list);
 	}
+
+	if (is_segment(tdinfo))
+		pg_log_debug("CHUNKING: data query: %s", q->data);
+	
 	res = ExecuteSqlQuery(fout, q->data, PGRES_COPY_OUT);
 	PQclear(res);
 	destroyPQExpBuffer(clistBuf);
@@ -2919,42 +2961,89 @@ dumpTableData(Archive *fout, const TableDataInfo *tdinfo)
 	{
 		TocEntry   *te;
 
-		te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
-						  ARCHIVE_OPTS(.tag = tbinfo->dobj.name,
-									   .namespace = tbinfo->dobj.namespace->dobj.name,
-									   .owner = tbinfo->rolname,
-									   .description = "TABLE DATA",
-									   .section = SECTION_DATA,
-									   .createStmt = tdDefn,
-									   .copyStmt = copyStmt,
-									   .deps = &(tbinfo->dobj.dumpId),
-									   .nDeps = 1,
-									   .dumpFn = dumpFn,
-									   .dumpArg = tdinfo));
-
-		/*
-		 * Set the TocEntry's dataLength in case we are doing a parallel dump
-		 * and want to order dump jobs by table size.  We choose to measure
-		 * dataLength in table pages (including TOAST pages) during dump, so
-		 * no scaling is needed.
-		 *
-		 * However, relpages is declared as "integer" in pg_class, and hence
-		 * also in TableInfo, but it's really BlockNumber a/k/a unsigned int.
-		 * Cast so that we get the right interpretation of table sizes
-		 * exceeding INT_MAX pages.
+		/* data chunking works off relpages, which are computed exactly using
+		 * pg_relation_size() when --max-table-segment-pages was set
+		 * 
+		 * We also don't chunk if table access method is not "heap"
+		 * TODO: we may add chunking for other access methods later, maybe 
+		 * based on primary key ranges
 		 */
-		te->dataLength = (BlockNumber) tbinfo->relpages;
-		te->dataLength += (BlockNumber) tbinfo->toastpages;
+		if (tbinfo->relpages <= dopt->max_table_segment_pages || 
+			strcmp(tbinfo->amname, "heap") != 0)
+		{
+			te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
+							ARCHIVE_OPTS(.tag = tbinfo->dobj.name,
+										.namespace = tbinfo->dobj.namespace->dobj.name,
+										.owner = tbinfo->rolname,
+										.description = "TABLE DATA",
+										.section = SECTION_DATA,
+										.createStmt = tdDefn,
+										.copyStmt = copyStmt,
+										.deps = &(tbinfo->dobj.dumpId),
+										.nDeps = 1,
+										.dumpFn = dumpFn,
+										.dumpArg = tdinfo));
 
-		/*
-		 * If pgoff_t is only 32 bits wide, the above refinement is useless,
-		 * and instead we'd better worry about integer overflow.  Clamp to
-		 * INT_MAX if the correct result exceeds that.
-		 */
-		if (sizeof(te->dataLength) == 4 &&
-			(tbinfo->relpages < 0 || tbinfo->toastpages < 0 ||
-			 te->dataLength < 0))
-			te->dataLength = INT_MAX;
+			/*
+			 * Set the TocEntry's dataLength in case we are doing a parallel dump
+			 * and want to order dump jobs by table size.  We choose to measure
+			 * dataLength in table pages (including TOAST pages) during dump, so
+			 * no scaling is needed.
+			 *
+			 * Although pg_class.relpages, which stores a BlockNumber (a/k/a
+			 * unsigned int), is declared as "integer", we convert it back and
+			 * store it as BlockNumber in TableInfo.
+			 * And dataLength is uint64, so it does not overflow for
+			 * 2 x UINT32_MAX.
+			 */
+			te->dataLength = tbinfo->relpages;
+			te->dataLength += tbinfo->toastpages;
+		}
+		else
+		{
+			uint64 current_chunk_start = 0;
+			PQExpBuffer chunk_desc = createPQExpBuffer();
+
+			while (current_chunk_start < tbinfo->relpages)
+			{
+				TableDataInfo *chunk_tdinfo = (TableDataInfo *) pg_malloc(sizeof(TableDataInfo));
+
+				memcpy(chunk_tdinfo, tdinfo, sizeof(TableDataInfo));
+				AssignDumpId(&chunk_tdinfo->dobj);
+				addObjectDependency(&chunk_tdinfo->dobj, tbinfo->dobj.dumpId);
+				chunk_tdinfo->startPage = (BlockNumber) current_chunk_start;
+				chunk_tdinfo->endPage = chunk_tdinfo->startPage + dopt->max_table_segment_pages - 1;
+				
+				current_chunk_start += dopt->max_table_segment_pages;
+				if (current_chunk_start >= tbinfo->relpages)
+					chunk_tdinfo->endPage = InvalidBlockNumber; /* last chunk is for "all the rest" */
+
+				printfPQExpBuffer(chunk_desc, "TABLE DATA (pages %u:%u)", chunk_tdinfo->startPage, chunk_tdinfo->endPage);
+
+				te = ArchiveEntry(fout, chunk_tdinfo->dobj.catId, chunk_tdinfo->dobj.dumpId,
+							ARCHIVE_OPTS(.tag = tbinfo->dobj.name,
+										.namespace = tbinfo->dobj.namespace->dobj.name,
+										.owner = tbinfo->rolname,
+										.description = chunk_desc->data,
+										.section = SECTION_DATA,
+										.createStmt = tdDefn,
+										.copyStmt = copyStmt,
+										.deps = &(tbinfo->dobj.dumpId),
+										.nDeps = 1,
+										.dumpFn = dumpFn,
+										.dumpArg = chunk_tdinfo));
+
+				if(chunk_tdinfo->endPage == InvalidBlockNumber)
+					te->dataLength = tbinfo->relpages - chunk_tdinfo->startPage;
+				else
+					te->dataLength = dopt->max_table_segment_pages;
+				/* let's assume toast pages distribute evenly among chunks */
+				if(tbinfo->relpages)
+					te->dataLength += te->dataLength * tbinfo->toastpages / tbinfo->relpages;
+			}
+
+			destroyPQExpBuffer(chunk_desc);
+		}
 	}
 
 	destroyPQExpBuffer(copyBuf);
@@ -3081,6 +3170,8 @@ makeTableDataInfo(DumpOptions *dopt, TableInfo *tbinfo)
 	tdinfo->dobj.namespace = tbinfo->dobj.namespace;
 	tdinfo->tdtable = tbinfo;
 	tdinfo->filtercond = NULL;	/* might get set later */
+	tdinfo->startPage = InvalidBlockNumber; /* we use this as indication that no chunking is needed */
+	tdinfo->endPage = InvalidBlockNumber;
 	addObjectDependency(&tdinfo->dobj, tbinfo->dobj.dumpId);
 
 	/* A TableDataInfo contains data, of course */
@@ -7347,8 +7438,16 @@ getTables(Archive *fout, int *numTables)
 						 "c.relnamespace, c.relkind, c.reltype, "
 						 "c.relowner, "
 						 "c.relchecks, "
-						 "c.relhasindex, c.relhasrules, c.relpages, "
-						 "c.reltuples, c.relallvisible, ");
+						 "c.relhasindex, c.relhasrules, ");
+
+	/* fetch current relation size if chunking is requested */
+	if(dopt->max_table_segment_pages != InvalidBlockNumber)
+		appendPQExpBufferStr(query, "pg_relation_size(c.oid)/current_setting('block_size')::int AS relpages, ");
+	else
+		/* pg_class.relpages stores BlockNumber (uint32) in an int field, convert to oid to get unsigned int out */
+		appendPQExpBufferStr(query, "c.relpages::oid, ");
+
+	appendPQExpBufferStr(query, "c.reltuples, c.relallvisible, ");
 
 	if (fout->remoteVersion >= 180000)
 		appendPQExpBufferStr(query, "c.relallfrozen, ");
@@ -7589,7 +7688,7 @@ getTables(Archive *fout, int *numTables)
 		tblinfo[i].ncheck = atoi(PQgetvalue(res, i, i_relchecks));
 		tblinfo[i].hasindex = (strcmp(PQgetvalue(res, i, i_relhasindex), "t") == 0);
 		tblinfo[i].hasrules = (strcmp(PQgetvalue(res, i, i_relhasrules), "t") == 0);
-		tblinfo[i].relpages = atoi(PQgetvalue(res, i, i_relpages));
+		tblinfo[i].relpages = strtoul(PQgetvalue(res, i, i_relpages), NULL, 10);
 		if (PQgetisnull(res, i, i_toastpages))
 			tblinfo[i].toastpages = 0;
 		else
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 5a6726d8b12..84e682d585f 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -16,6 +16,7 @@
 
 #include "pg_backup.h"
 #include "catalog/pg_publication_d.h"
+#include "storage/block.h"
 
 
 #define oidcmp(x,y) ( ((x) < (y) ? -1 : ((x) > (y)) ?  1 : 0) )
@@ -335,7 +336,11 @@ typedef struct _tableInfo
 	Oid			owning_tab;		/* OID of table owning sequence */
 	int			owning_col;		/* attr # of column owning sequence */
 	bool		is_identity_sequence;
-	int32		relpages;		/* table's size in pages (from pg_class) */
+	BlockNumber	relpages;		/* table's size in pages (from pg_class) 
+	                             * converted to unsigned integer
+								 * when --max-table-segment-pages is set
+								 * it is computed from pg_relation_size()
+	                             */
 	int			toastpages;		/* toast table's size in pages, if any */
 
 	bool		interesting;	/* true if need to collect more data */
@@ -413,8 +418,21 @@ typedef struct _tableDataInfo
 	DumpableObject dobj;
 	TableInfo  *tdtable;		/* link to table to dump */
 	char	   *filtercond;		/* WHERE condition to limit rows dumped */
+	/* startPage and endPage to support segmented dump */
+	BlockNumber	startPage;		/* As we always know the lowest segment page
+								 * number we can use InvalidBlockNumber here
+								 * to recognize no segmenting case.
+								 * When it is 0 (first segment), the lower
+								 * bound can be omitted from the range query */
+	BlockNumber	endPage;		/* last page in segment for page-range dump,
+	                    		 * startPage+max_table_segment_pages-1 for 
+								 * most segments, but InvalidBlockNumber for
+								 * the last one to indicate open range
+								 */
 } TableDataInfo;
 
+#define is_segment(tdiptr) ((tdiptr)->startPage != InvalidBlockNumber)
+
 typedef struct _indxInfo
 {
 	DumpableObject dobj;
@@ -449,7 +467,7 @@ typedef struct _relStatsInfo
 {
 	DumpableObject dobj;
 	Oid			relid;
-	int32		relpages;
+	BlockNumber	relpages;
 	char	   *reltuples;
 	int32		relallvisible;
 	int32		relallfrozen;
diff --git a/src/bin/pg_dump/t/004_pg_dump_parallel.pl b/src/bin/pg_dump/t/004_pg_dump_parallel.pl
index 738f34b1c1b..4f35aeed9b9 100644
--- a/src/bin/pg_dump/t/004_pg_dump_parallel.pl
+++ b/src/bin/pg_dump/t/004_pg_dump_parallel.pl
@@ -11,6 +11,7 @@ use Test::More;
 my $dbname1 = 'regression_src';
 my $dbname2 = 'regression_dest1';
 my $dbname3 = 'regression_dest2';
+my $dbname4 = 'regression_dest3';
 
 my $node = PostgreSQL::Test::Cluster->new('main');
 $node->init;
@@ -21,6 +22,7 @@ my $backupdir = $node->backup_dir;
 $node->run_log([ 'createdb', $dbname1 ]);
 $node->run_log([ 'createdb', $dbname2 ]);
 $node->run_log([ 'createdb', $dbname3 ]);
+$node->run_log([ 'createdb', $dbname4 ]);
 
 $node->safe_psql(
 	$dbname1,
@@ -87,4 +89,33 @@ $node->command_ok(
 	],
 	'parallel restore as inserts');
 
+$node->command_ok(
+	[
+		'pg_dump',
+		'--format' => 'directory',
+		'--max-table-segment-pages' => 2,
+		'--no-sync',
+		'--jobs' => 2,
+		'--file' => "$backupdir/dump3",
+		$node->connstr($dbname1),
+	],
+	'parallel dump with chunks of two heap pages');
+
+$node->command_ok(
+	[
+		'pg_restore', '--verbose',
+		'--dbname' => $node->connstr($dbname4),
+		'--jobs' => 3,
+		"$backupdir/dump3",
+	],
+	'parallel restore with chunks of two heap pages');
+
+my $table = 'tplain';
+my $tablehash_query = "SELECT '$table', sum(hashtext(t::text)), count(*) FROM $table AS t";
+
+my $result_1 = $node->safe_psql($dbname1, $tablehash_query);
+my $result_4 = $node->safe_psql($dbname4, $tablehash_query);
+
+is($result_4, $result_1, "Hash check for $table: restored db ($result_4) vs original db ($result_1)");
+
 done_testing();
diff --git a/src/fe_utils/option_utils.c b/src/fe_utils/option_utils.c
index 8d0659c1164..a516d8c86a9 100644
--- a/src/fe_utils/option_utils.c
+++ b/src/fe_utils/option_utils.c
@@ -83,6 +83,61 @@ option_parse_int(const char *optarg, const char *optname,
 	return true;
 }
 
+/*
+ * option_parse_uint32
+ *
+ * Parse unsigned integer value for an option.  If the parsing is successful,
+ * returns true and stores the result in *result if that's given;
+ * if parsing fails, returns false.
+ */
+bool
+option_parse_uint32(const char *optarg, const char *optname,
+				 uint32 min_range, uint32 max_range,
+				 uint32 *result)
+{
+	char	   		*endptr;
+	unsigned long	val;
+
+	/* Fail if there is a minus sign at the start of value */
+	while(isspace((unsigned char) *optarg))
+		optarg++;
+	if(*optarg == '-')
+	{
+		pg_log_error("value \"%s\" for option %s can not be negative",
+					optarg, optname);
+		return false;
+	}
+
+	errno = 0;
+	val = strtoul(optarg, &endptr, 10);
+
+	/*
+	 * Skip any trailing whitespace; if anything but whitespace remains before
+	 * the terminating character, fail.
+	 */
+	while (*endptr != '\0' && isspace((unsigned char) *endptr))
+		endptr++;
+
+	if (*endptr != '\0')
+	{
+		pg_log_error("invalid value \"%s\" for option %s",
+					 optarg, optname);
+		return false;
+	}
+
+	/* as min_range and max_range are uint32 then the range check will
+	 * catch the case where unsigned long val is outside 32 bit range */
+	if (errno == ERANGE || val < min_range || val > max_range)
+	{
+		pg_log_error("%s not in range %u..%u", optname, min_range, max_range);
+		return false;
+	}
+
+	if (result)
+		*result = (uint32) val;
+	return true;
+}
+
 /*
  * Provide strictly harmonized handling of the --sync-method option.
  */
diff --git a/src/include/fe_utils/option_utils.h b/src/include/fe_utils/option_utils.h
index d975db77af2..67fd3650d7a 100644
--- a/src/include/fe_utils/option_utils.h
+++ b/src/include/fe_utils/option_utils.h
@@ -22,6 +22,9 @@ extern void handle_help_version_opts(int argc, char *argv[],
 extern bool option_parse_int(const char *optarg, const char *optname,
 							 int min_range, int max_range,
 							 int *result);
+extern bool option_parse_uint32(const char *optarg, const char *optname,
+							 uint32 min_range, uint32 max_range,
+							 uint32 *result);
 extern bool parse_sync_method(const char *optarg,
 							  DataDirSyncMethod *sync_method);
 extern void check_mut_excl_opts_internal(int n,...);
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: Patch: dumping tables data in multiple chunks in pg_dump
  2025-11-11 15:29 Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-12 12:59 ` Re: Patch: dumping tables data in multiple chunks in pg_dump Ashutosh Bapat <[email protected]>
  2025-11-13 18:02   ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-13 18:39     ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-13 20:24       ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-13 20:26         ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-13 20:34           ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2026-03-28 15:32             ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2026-03-28 15:33               ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2026-03-29 21:49                 ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
@ 2026-03-30 17:32                   ` Hannu Krosing <[email protected]>
  2026-03-30 21:32                     ` Re: Patch: dumping tables data in multiple chunks in pg_dump Zsolt Parragi <[email protected]>
  0 siblings, 1 reply; 16+ messages in thread

From: Hannu Krosing @ 2026-03-30 17:32 UTC (permalink / raw)
  To: PostgreSQL Hackers <[email protected]>; +Cc: David Rowley <[email protected]>; Michael Banck <[email protected]>; Ashutosh Bapat <[email protected]>; Nathan Bossart <[email protected]>

Now the dependencies on chunks should also behave correctly


On Sun, Mar 29, 2026 at 11:49 PM Hannu Krosing <[email protected]> wrote:
>
> Fixing an off-by-one error in copying over dependencies
>
>
> On Sat, Mar 28, 2026 at 4:33 PM Hannu Krosing <[email protected]> wrote:
> >
> > The above
> >
> > "Or it can be almost 200 GB if the page has just pointers to 1GB TOAST items."
> >
> > should read
> >
> > "Or it can be almost 200 GB *for a single page* if the page has just
> > pointers to 1GB TOAST items."
> >
> >
> > On Sat, Mar 28, 2026 at 4:32 PM Hannu Krosing <[email protected]> wrote:
> > >
> > > The issue is that currently the value is given in "main table pages"
> > > and it would be somewhat deceptive, or at least confusing, to try to
> > > express this in any other unit.
> > >
> > > As I explained in the commit message:
> > >
> > > ---------8<-------------------8<-------------------8<----------------
> > > This --max-table-segment-pages number specifically applies to main table
> > > pages which does not guarantee anything about output size.
> > > The output could be empty if there are no live tuples in the page range.
> > > Or it can be almost 200 GB if the page has just pointers to 1GB TOAST items.
> > > ---------8<-------------------8<-------------------8<----------------
> > >
> > > And I can think of no cheap and reliable way to change that equation.
> > >
> > > I'll be very happy if you have any good ideas for either improving the
> > > flag name, or even propose a way to better estimate the resulting dump
> > > file size so we could give the chunk size in better units
> > >
> > > ---
> > > Hannu
> > >
> > >
> > >
> > >
> > >
> > > On Sat, Mar 28, 2026 at 12:26 PM Michael Banck <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Tue, Jan 13, 2026 at 03:27:25PM +1300, David Rowley wrote:
> > > > > Perhaps --max-table-segment-pages is a better name than
> > > > > --huge-table-chunk-pages as it's quite subjective what the minimum
> > > > > number of pages required to make a table "huge".
> > > >
> > > > I'm not sure that's better - without looking at the documentation,
> > > > people might confuse segment here with the 1GB split of tables into
> > > > segments. As pg_dump is a very common and basic user tool, I don't think
> > > > implementation details like pages/page sizes and blocks should be part
> > > > of its UX.
> > > >
> > > > Can't we just make it a storage size, like '10GB' and then rename it to
> > > > --table-parallel-threshold or something? I agree it's bikeshedding, but
> > > > I personally don't like either --max-table-segment-pages or
> > > > --huge-table-chunk-pages.
> > > >
> > > >
> > > > Michael
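
For illustration only: if the threshold were given as a storage size, as suggested in the quoted message, the conversion to main-heap pages could look like the sketch below. The function name size_to_segment_pages() is hypothetical, the 8192-byte block size is an assumption (the default BLCKSZ), and the result still bounds only main-heap pages, not the size of the dump output.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: default PostgreSQL block size; a real build may differ. */
#define ASSUMED_BLOCK_SIZE 8192u
/* Same value as PostgreSQL's MaxBlockNumber (0xFFFFFFFE). */
#define MAX_BLOCK 0xFFFFFFFEu

/*
 * Hypothetical helper: convert a byte threshold into a number of heap
 * pages, rounding down, with a floor of 1 page and a cap of MAX_BLOCK.
 */
uint32_t
size_to_segment_pages(uint64_t bytes)
{
	uint64_t	pages = bytes / ASSUMED_BLOCK_SIZE;

	if (pages < 1)
		pages = 1;
	if (pages > MAX_BLOCK)
		pages = MAX_BLOCK;
	return (uint32_t) pages;
}
```

With these assumptions a 10 GiB threshold maps to 1310720 heap pages, whatever amount of TOAST data those pages point to.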


Attachments:

  [application/x-patch] v16-0001-Add-max-table-segment-pages-option-to-pg_dump-fo.patch (29.5K, 2-v16-0001-Add-max-table-segment-pages-option-to-pg_dump-fo.patch)
From b0d27b32c17d1e09f9484a81b3d3c3581d190adb Mon Sep 17 00:00:00 2001
From: Hannu Krosing <[email protected]>
Date: Mon, 30 Mar 2026 19:28:45 +0200
Subject: [PATCH v16] Add --max-table-segment-pages option to pg_dump for
 parallel table dumping.

This patch introduces the ability to split large heap tables into segments
based on a specified number of pages. These segments can then be dumped in
parallel using the existing jobs infrastructure, significantly reducing
the time required to dump very large tables.

This --max-table-segment-pages number specifically applies to main table
pages which does not guarantee anything about output size.
The output could be empty if there are no live tuples in the page range.
Or it can be almost 200 GB for a single page if the page has just pointers
to 1GB TOAST items.

The implementation uses ctid-based range queries (e.g., WHERE ctid >=
'(startPage,1)' AND ctid < '(endPage+1,0)') to extract specific chunks of
the relation.
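
The three predicate shapes (first, middle and open-ended last segment) can be sketched in standalone C as below. segment_predicate() and INVALID_BLOCK are hypothetical names that stand in for the logic in dumpTableData_copy() and for InvalidBlockNumber; this is an illustration, not the patch code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: stand-in for PostgreSQL's InvalidBlockNumber (0xFFFFFFFF). */
#define INVALID_BLOCK UINT32_MAX

/*
 * Hypothetical helper: format the ctid predicate for one segment into buf.
 * end_page == INVALID_BLOCK marks the open-ended last segment.
 * Returns whatever snprintf() returns.
 */
int
segment_predicate(char *buf, size_t buflen,
				  uint32_t start_page, uint32_t end_page)
{
	/* last segment: no upper bound, so pages added later are still read */
	if (end_page == INVALID_BLOCK)
		return snprintf(buf, buflen, "ctid >= '(%u,1)'",
						(unsigned) start_page);

	/* first segment: the lower bound can be omitted */
	if (start_page == 0)
		return snprintf(buf, buflen, "ctid < '(%u,0)'",
						(unsigned) (end_page + 1));

	/* middle segment: half-open range [start_page, end_page + 1) */
	return snprintf(buf, buflen, "ctid >= '(%u,1)' AND ctid < '(%u,0)'",
					(unsigned) start_page, (unsigned) (end_page + 1));
}
```

Using the exclusive upper bound (endPage+1, 0) means the largest tuple offset on a page never has to be known.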

This is only efficient on PostgreSQL 14 and later, which can satisfy ctid
range conditions with a TID range scan. It does work on earlier versions,
but inefficiently.

The patch only supports the "heap" access method, as others may not even
have the ctid column.
---
 doc/src/sgml/ref/pg_dump.sgml             |  24 +++
 src/bin/pg_dump/pg_backup.h               |   2 +
 src/bin/pg_dump/pg_backup_archiver.c      |  92 ++++++++++-
 src/bin/pg_dump/pg_backup_archiver.h      |  12 +-
 src/bin/pg_dump/pg_dump.c                 | 177 +++++++++++++++++-----
 src/bin/pg_dump/pg_dump.h                 |  22 ++-
 src/bin/pg_dump/t/004_pg_dump_parallel.pl |  31 ++++
 src/fe_utils/option_utils.c               |  55 +++++++
 src/include/fe_utils/option_utils.h       |   3 +
 9 files changed, 368 insertions(+), 50 deletions(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 7f538e90194..5f056bb4af6 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1066,6 +1066,30 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--max-table-segment-pages=<replaceable class="parameter">npages</replaceable></option></term>
+      <listitem>
+       <para>
+        Dump table data in segments based on the number of pages in the main relation.
+        If the relation has more than <replaceable class="parameter">npages</replaceable> data pages,
+        the data is split into segments of at most that many pages.
+        Individual segments can be dumped in parallel.
+       </para>
+
+       <note>
+        <para>
+         The option <option>--max-table-segment-pages</option> counts only pages
+         in the main heap. If the table has a large TOASTed part, take this
+         into account when deciding on the number of pages to use.
+         In the extreme case a single 8kB heap page can hold ~200 TOAST pointers, each
+         corresponding to 1GB of data. If this data is also incompressible, a
+         single-page segment can produce a 200GB dump file.
+        </para>
+       </note>
+
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>--no-comments</option></term>
       <listitem>
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index fda912ba0a9..11863a1915f 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -27,6 +27,7 @@
 #include "common/file_utils.h"
 #include "fe_utils/simple_list.h"
 #include "libpq-fe.h"
+#include "storage/block.h"
 
 
 typedef enum trivalue
@@ -179,6 +180,7 @@ typedef struct _dumpOptions
 	bool		aclsSkip;
 	const char *lockWaitTimeout;
 	int			dump_inserts;	/* 0 = COPY, otherwise rows per INSERT */
+	BlockNumber	max_table_segment_pages; /* chunk when relpages is above this */
 
 	/* flags for various command-line long options */
 	int			disable_dollar_quoting;
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 271a2c3e481..e32bd8149cb 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -44,6 +44,7 @@
 #include "pg_backup_archiver.h"
 #include "pg_backup_db.h"
 #include "pg_backup_utils.h"
+#include "storage/block.h"
 
 #define TEXT_DUMP_HEADER "--\n-- PostgreSQL database dump\n--\n\n"
 #define TEXT_DUMPALL_HEADER "--\n-- PostgreSQL database cluster dump\n--\n\n"
@@ -154,6 +155,7 @@ InitDumpOptions(DumpOptions *opts)
 	opts->dumpSchema = true;
 	opts->dumpData = true;
 	opts->dumpStatistics = false;
+	opts->max_table_segment_pages = InvalidBlockNumber;
 }
 
 /*
@@ -1995,6 +1997,28 @@ _moveBefore(TocEntry *pos, TocEntry *te)
 	pos->prev = te;
 }
 
+/*
+ * Add a dependency id to a DependencyList object.
+ * This is currently used for collecting reverse
+ * dependencies for chunked data dumps.
+ *
+ * Note: duplicate dependencies are currently not eliminated.
+ */
+void
+addStandaloneDependency(DependencyList *dobj, DumpId refId)
+{
+	pg_log_debug("adding dep: list %p + dep %d", (void *) dobj->dependencies, refId);
+	if (dobj->nDeps >= dobj->allocDeps)
+	{
+		dobj->allocDeps = (dobj->allocDeps <= 0) ? 16 : dobj->allocDeps * 2;
+		dobj->dependencies = pg_realloc_array(dobj->dependencies,
+											  DumpId, dobj->allocDeps);
+		pg_log_debug("realloced list %p to size %d", (void *) dobj->dependencies, dobj->allocDeps);
+	}
+	/* debug output only; DumpId is a signed int, hence %d */
+	dobj->dependencies[dobj->nDeps++] = refId;
+}
+
 /*
  * Build index arrays for the TOC list
  *
@@ -2014,6 +2038,7 @@ buildTocEntryArrays(ArchiveHandle *AH)
 
 	AH->tocsByDumpId = pg_malloc0_array(TocEntry *, (maxDumpId + 1));
 	AH->tableDataId = pg_malloc0_array(DumpId, (maxDumpId + 1));
+	AH->tableDataChunkIds = pg_malloc0_array(DependencyList, (maxDumpId + 1));
 
 	for (te = AH->toc->next; te != AH->toc; te = te->next)
 	{
@@ -2029,8 +2054,12 @@ buildTocEntryArrays(ArchiveHandle *AH)
 		 * TOC entry that has a DATA item.  We compute this by reversing the
 		 * TABLE DATA item's dependency, knowing that a TABLE DATA item has
 		 * just one dependency and it is the TABLE item.
+		 *
+		 * For chunked table data, the TABLE DATA item has a description like
+		 * "TABLE DATA (pages 100:199)", and we collect all such items as
+		 * reverse dependencies for the parent table's entry in tableDataChunkIds.
 		 */
-		if (strcmp(te->desc, "TABLE DATA") == 0 && te->nDeps > 0)
+		if (strncmp(te->desc, "TABLE DATA", 10) == 0 && te->nDeps > 0)
 		{
 			DumpId		tableId = te->dependencies[0];
 
@@ -2042,7 +2071,14 @@ buildTocEntryArrays(ArchiveHandle *AH)
 			if (tableId <= 0 || tableId > maxDumpId)
 				pg_fatal("bad table dumpId for TABLE DATA item");
 
-			AH->tableDataId[tableId] = te->dumpId;
+			if (te->desc[10] == '\0') /* te->desc == "TABLE DATA" */
+				AH->tableDataId[tableId] = te->dumpId;
+			else				/* chunked: desc is "TABLE DATA (pages %u:%u)" */
+			{
+				addStandaloneDependency(&(AH->tableDataChunkIds[tableId]), te->dumpId);
+				pg_log_debug("Added chunked table data dependency: tableId %d + chunkId %d",
+							 tableId, te->dumpId);
+			}
 		}
 	}
 }
@@ -2785,7 +2821,7 @@ ReadToc(ArchiveHandle *AH)
 				strcmp(te->desc, "ACL") == 0 ||
 				strcmp(te->desc, "ACL LANGUAGE") == 0)
 				te->section = SECTION_NONE;
-			else if (strcmp(te->desc, "TABLE DATA") == 0 ||
+			else if (strncmp(te->desc, "TABLE DATA", 10) == 0 ||
 					 strcmp(te->desc, "BLOBS") == 0 ||
 					 strcmp(te->desc, "BLOB COMMENTS") == 0)
 				te->section = SECTION_DATA;
@@ -3015,7 +3051,7 @@ _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
 	 * associated pg_shdepend rows. This is faster to restore than the
 	 * equivalent set of large object commands.
 	 */
-	if (ropt->binary_upgrade && strcmp(te->desc, "TABLE DATA") == 0 &&
+	if (ropt->binary_upgrade && strncmp(te->desc, "TABLE DATA", 10) == 0 &&
 		(te->catalogId.oid == LargeObjectMetadataRelationId ||
 		 te->catalogId.oid == SharedDependRelationId))
 		return REQ_DATA;
@@ -3246,7 +3282,7 @@ _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
 		if (ropt->selTypes)
 		{
 			if (strcmp(te->desc, "TABLE") == 0 ||
-				strcmp(te->desc, "TABLE DATA") == 0 ||
+				strncmp(te->desc, "TABLE DATA", 10) == 0 ||
 				strcmp(te->desc, "VIEW") == 0 ||
 				strcmp(te->desc, "FOREIGN TABLE") == 0 ||
 				strcmp(te->desc, "MATERIALIZED VIEW") == 0 ||
@@ -5017,6 +5053,12 @@ fix_dependencies(ArchiveHandle *AH)
  * that parallel restore will prioritize larger jobs (index builds, FK
  * constraint checks, etc) over smaller ones, avoiding situations where we
  * end a restore with only one active job working on a large table.
+ *
+ * In case of chunked dumps, we replace the dependency on the table with a
+ * dependency on the first chunk of data and add the remaining chunk ids, if
+ * any, to the end of the dependency list.
+ * We also calculate fullDataLength as the sum of the lengths of the chunk
+ * data items and use that to set the item's dataLength.
  */
 static void
 repoint_table_dependencies(ArchiveHandle *AH)
@@ -5032,8 +5074,9 @@ repoint_table_dependencies(ArchiveHandle *AH)
 		for (i = 0; i < te->nDeps; i++)
 		{
 			olddep = te->dependencies[i];
-			if (olddep <= AH->maxDumpId &&
-				AH->tableDataId[olddep] != 0)
+			if (olddep > AH->maxDumpId)
+				continue;
+			if (AH->tableDataId[olddep] != 0)
 			{
 				DumpId		tabledataid = AH->tableDataId[olddep];
 				TocEntry   *tabledatate = AH->tocsByDumpId[tabledataid];
@@ -5043,6 +5086,39 @@ repoint_table_dependencies(ArchiveHandle *AH)
 				pg_log_debug("transferring dependency %d -> %d to %d",
 							 te->dumpId, olddep, tabledataid);
 			}
+			else if (AH->tableDataChunkIds[olddep].nDeps > 0)
+			{
+				int			j;
+				DumpId		chunkdataid;
+				uint64		fullDataLength;
+				DependencyList *deplist = &AH->tableDataChunkIds[olddep];
+
+				/* first in list replaces the dependency on table */
+				chunkdataid = deplist->dependencies[0];
+				te->dependencies[i] = chunkdataid;
+				fullDataLength = AH->tocsByDumpId[chunkdataid]->dataLength;
+				pg_log_debug("transferring chunk list %d -> %d to %d",
+							 te->dumpId, olddep, chunkdataid);
+
+				if (deplist->nDeps > 1)
+				{
+					/* make space */
+					te->dependencies = pg_realloc_array(te->dependencies,
+												  DumpId,
+												  te->nDeps + deplist->nDeps - 1);
+
+					/* the rest are appended to dependencies */
+					for (j = 1; j < deplist->nDeps; j++)
+					{
+						chunkdataid = deplist->dependencies[j];
+						te->dependencies[te->nDeps++] = chunkdataid;
+						fullDataLength += AH->tocsByDumpId[chunkdataid]->dataLength;
+						pg_log_debug("adding chunk list %d -> %d to %d",
+									te->dumpId, olddep, chunkdataid);
+					}
+				}
+				te->dataLength = Max(te->dataLength, fullDataLength);
+			}
 		}
 	}
 }
@@ -5096,7 +5172,7 @@ identify_locking_dependencies(ArchiveHandle *AH, TocEntry *te)
 		DumpId		depid = te->dependencies[i];
 
 		if (depid <= AH->maxDumpId && AH->tocsByDumpId[depid] != NULL &&
-			((strcmp(AH->tocsByDumpId[depid]->desc, "TABLE DATA") == 0) ||
+			((strncmp(AH->tocsByDumpId[depid]->desc, "TABLE DATA", 10) == 0) ||
 			 strcmp(AH->tocsByDumpId[depid]->desc, "TABLE") == 0))
 			lockids[nlockids++] = depid;
 	}
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 365073b3eae..cfa3ea1bbd6 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -179,6 +179,13 @@ typedef enum
 	OUTPUT_OTHERDATA,			/* writing data as INSERT commands */
 } ArchiverOutput;
 
+typedef struct _DependencyList
+{
+	DumpId	   *dependencies;	/* dumpIds of objects this one depends on */
+	int			nDeps;			/* number of valid dependencies */
+	int			allocDeps;		/* allocated size of dependencies[] */
+} DependencyList;
+
 /*
  * For historical reasons, ACL items are interspersed with everything else in
  * a dump file's TOC; typically they're right after the object they're for.
@@ -311,6 +318,7 @@ struct _archiveHandle
 	/* arrays created after the TOC list is complete: */
 	struct _tocEntry **tocsByDumpId;	/* TOCs indexed by dumpId */
 	DumpId	   *tableDataId;	/* TABLE DATA ids, indexed by table dumpId */
+	DependencyList *tableDataChunkIds; /* dependencies indexed by dumpId */
 
 	struct _tocEntry *currToc;	/* Used when dumping data */
 	pg_compress_specification compression_spec; /* Requested specification for
@@ -377,7 +385,7 @@ struct _tocEntry
 	size_t		defnLen;		/* length of dumped definition */
 
 	/* working state while dumping/restoring */
-	pgoff_t		dataLength;		/* item's data size; 0 if none or unknown */
+	uint64		dataLength;		/* item's data size; 0 if none or unknown */
 	int			reqs;			/* do we need schema and/or data of object
 								 * (REQ_* bit mask) */
 	bool		created;		/* set for DATA member if TABLE was created */
@@ -437,6 +445,8 @@ extern int	TocIDRequired(ArchiveHandle *AH, DumpId id);
 TocEntry   *getTocEntryByDumpId(ArchiveHandle *AH, DumpId id);
 extern bool checkSeek(FILE *fp);
 
+extern void addStandaloneDependency(DependencyList *dobj, DumpId refId);
+
 #define appendStringLiteralAHX(buf,str,AH) \
 	appendStringLiteral(buf, str, (AH)->public.encoding, (AH)->public.std_strings)
 
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 5d1f7682f11..1e7d9a3f7f3 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -535,6 +535,7 @@ main(int argc, char **argv)
 		{"exclude-extension", required_argument, NULL, 17},
 		{"sequence-data", no_argument, &dopt.sequence_data, 1},
 		{"restrict-key", required_argument, NULL, 25},
+		{"max-table-segment-pages", required_argument, NULL, 26},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -799,6 +800,12 @@ main(int argc, char **argv)
 				dopt.restrict_key = pg_strdup(optarg);
 				break;
 
+			case 26:
+				if (!option_parse_uint32(optarg, "--max-table-segment-pages", 1, MaxBlockNumber,
+									  &dopt.max_table_segment_pages))
+					exit_nicely(1);
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -1344,6 +1351,9 @@ help(const char *progname)
 	printf(_("  --extra-float-digits=NUM     override default setting for extra_float_digits\n"));
 	printf(_("  --filter=FILENAME            include or exclude objects and data from dump\n"
 			 "                               based on expressions in FILENAME\n"));
+	printf(_("  --max-table-segment-pages=NUMPAGES\n"
+		     "                               number of main table pages above which data is \n"
+			 "                               copied out in chunks, also determines the chunk size\n"));
 	printf(_("  --if-exists                  use IF EXISTS when dropping objects\n"));
 	printf(_("  --include-foreign-data=PATTERN\n"
 			 "                               include data of foreign tables on foreign\n"
@@ -2396,7 +2406,7 @@ dumpTableData_copy(Archive *fout, const void *dcontext)
 	 * dumping an old pg_largeobject_metadata defined WITH OIDS.  For other
 	 * cases a simple COPY suffices.
 	 */
-	if (tdinfo->filtercond || tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
+	if (tdinfo->filtercond || is_segment(tdinfo) || tbinfo->relkind == RELKIND_FOREIGN_TABLE ||
 		(fout->dopt->binary_upgrade && fout->remoteVersion < 120000 &&
 		 tbinfo->dobj.catId.oid == LargeObjectMetadataRelationId))
 	{
@@ -2414,9 +2424,37 @@ dumpTableData_copy(Archive *fout, const void *dcontext)
 		else
 			appendPQExpBufferStr(q, "* ");
 
-		appendPQExpBuffer(q, "FROM %s %s) TO stdout;",
+		appendPQExpBuffer(q, "FROM %s %s",
 						  fmtQualifiedDumpable(tbinfo),
 						  tdinfo->filtercond ? tdinfo->filtercond : "");
+		/* If it's a segment, we need to add a filter condition to select the
+		 * right page range:
+		 * - for the first segment we add "ctid < (endPage+1, 0)";
+		 *   the first segment is the one with startPage == 0
+		 * - for the last segment we add "ctid >= (startPage, 1)";
+		 *   the last segment is the one with endPage == InvalidBlockNumber,
+		 *   and we leave the upper bound open for the case where more pages
+		 *   were added after we measured
+		 * - for middle segments we add
+		 *   "ctid >= (startPage, 1) AND ctid < (endPage+1, 0)"
+		 *
+		 * "ctid < (endPage+1, 0)" instead of "ctid <= (endPage, maxtuple)"
+		 * was chosen as the range end so that we do not have to estimate maxtuple.
+		 *
+		 */
+		if (is_segment(tdinfo))
+		{
+			appendPQExpBufferStr(q, tdinfo->filtercond ? " AND " : " WHERE ");
+			if (tdinfo->startPage == 0)
+				appendPQExpBuffer(q, "ctid < '(%u,0)'", tdinfo->endPage + 1);
+			else if (tdinfo->endPage != InvalidBlockNumber)
+				appendPQExpBuffer(q, "ctid >= '(%u,1)' AND ctid < '(%u,0)'",
+								 tdinfo->startPage, tdinfo->endPage + 1);
+			else
+				appendPQExpBuffer(q, "ctid >= '(%u,1)'", tdinfo->startPage);
+		}
+
+		appendPQExpBuffer(q, ") TO stdout;");
 	}
 	else
 	{
@@ -2424,6 +2462,10 @@ dumpTableData_copy(Archive *fout, const void *dcontext)
 						  fmtQualifiedDumpable(tbinfo),
 						  column_list);
 	}
+
+	if (is_segment(tdinfo))
+		pg_log_debug("CHUNKING: data query: %s", q->data);
+	
 	res = ExecuteSqlQuery(fout, q->data, PGRES_COPY_OUT);
 	PQclear(res);
 	destroyPQExpBuffer(clistBuf);
@@ -2919,42 +2961,89 @@ dumpTableData(Archive *fout, const TableDataInfo *tdinfo)
 	{
 		TocEntry   *te;
 
-		te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
-						  ARCHIVE_OPTS(.tag = tbinfo->dobj.name,
-									   .namespace = tbinfo->dobj.namespace->dobj.name,
-									   .owner = tbinfo->rolname,
-									   .description = "TABLE DATA",
-									   .section = SECTION_DATA,
-									   .createStmt = tdDefn,
-									   .copyStmt = copyStmt,
-									   .deps = &(tbinfo->dobj.dumpId),
-									   .nDeps = 1,
-									   .dumpFn = dumpFn,
-									   .dumpArg = tdinfo));
-
-		/*
-		 * Set the TocEntry's dataLength in case we are doing a parallel dump
-		 * and want to order dump jobs by table size.  We choose to measure
-		 * dataLength in table pages (including TOAST pages) during dump, so
-		 * no scaling is needed.
-		 *
-		 * However, relpages is declared as "integer" in pg_class, and hence
-		 * also in TableInfo, but it's really BlockNumber a/k/a unsigned int.
-		 * Cast so that we get the right interpretation of table sizes
-		 * exceeding INT_MAX pages.
+		/* Data chunking works off relpages, which is computed exactly using
+		 * pg_relation_size() when --max-table-segment-pages is set.
+		 *
+		 * We also don't chunk if the table access method is not "heap".
+		 * TODO: we may add chunking for other access methods later, maybe
+		 * based on primary key ranges.
+		 */
-		te->dataLength = (BlockNumber) tbinfo->relpages;
-		te->dataLength += (BlockNumber) tbinfo->toastpages;
+		if (tbinfo->relpages <= dopt->max_table_segment_pages || 
+			strcmp(tbinfo->amname, "heap") != 0)
+		{
+			te = ArchiveEntry(fout, tdinfo->dobj.catId, tdinfo->dobj.dumpId,
+							ARCHIVE_OPTS(.tag = tbinfo->dobj.name,
+										.namespace = tbinfo->dobj.namespace->dobj.name,
+										.owner = tbinfo->rolname,
+										.description = "TABLE DATA",
+										.section = SECTION_DATA,
+										.createStmt = tdDefn,
+										.copyStmt = copyStmt,
+										.deps = &(tbinfo->dobj.dumpId),
+										.nDeps = 1,
+										.dumpFn = dumpFn,
+										.dumpArg = tdinfo));
 
-		/*
-		 * If pgoff_t is only 32 bits wide, the above refinement is useless,
-		 * and instead we'd better worry about integer overflow.  Clamp to
-		 * INT_MAX if the correct result exceeds that.
-		 */
-		if (sizeof(te->dataLength) == 4 &&
-			(tbinfo->relpages < 0 || tbinfo->toastpages < 0 ||
-			 te->dataLength < 0))
-			te->dataLength = INT_MAX;
+			/*
+			 * Set the TocEntry's dataLength in case we are doing a parallel dump
+			 * and want to order dump jobs by table size.  We choose to measure
+			 * dataLength in table pages (including TOAST pages) during dump, so
+			 * no scaling is needed.
+			 *
+			 * pg_class.relpages stores a BlockNumber, a/k/a unsigned int, but is
+			 * declared as "integer"; we convert it back and store it as
+			 * BlockNumber in TableInfo.
+			 * dataLength is uint64, so it does not overflow even for
+			 * 2 x UINT32_MAX.
+			 */
+			te->dataLength = tbinfo->relpages;
+			te->dataLength += tbinfo->toastpages;
+		}
+		else
+		{
+			uint64 current_chunk_start = 0;
+			PQExpBuffer chunk_desc = createPQExpBuffer();
+
+			while (current_chunk_start < tbinfo->relpages)
+			{
+				TableDataInfo *chunk_tdinfo = (TableDataInfo *) pg_malloc(sizeof(TableDataInfo));
+
+				memcpy(chunk_tdinfo, tdinfo, sizeof(TableDataInfo));
+				AssignDumpId(&chunk_tdinfo->dobj);
+				addObjectDependency(&chunk_tdinfo->dobj, tbinfo->dobj.dumpId);
+				chunk_tdinfo->startPage = (BlockNumber) current_chunk_start;
+				chunk_tdinfo->endPage = chunk_tdinfo->startPage + dopt->max_table_segment_pages - 1;
+				
+				current_chunk_start += dopt->max_table_segment_pages;
+				if (current_chunk_start >= tbinfo->relpages)
+					chunk_tdinfo->endPage = InvalidBlockNumber; /* last chunk is for "all the rest" */
+
+				printfPQExpBuffer(chunk_desc, "TABLE DATA (pages %u:%u)", chunk_tdinfo->startPage, chunk_tdinfo->endPage);
+
+				te = ArchiveEntry(fout, chunk_tdinfo->dobj.catId, chunk_tdinfo->dobj.dumpId,
+							ARCHIVE_OPTS(.tag = tbinfo->dobj.name,
+										.namespace = tbinfo->dobj.namespace->dobj.name,
+										.owner = tbinfo->rolname,
+										.description = chunk_desc->data,
+										.section = SECTION_DATA,
+										.createStmt = tdDefn,
+										.copyStmt = copyStmt,
+										.deps = &(tbinfo->dobj.dumpId),
+										.nDeps = 1,
+										.dumpFn = dumpFn,
+										.dumpArg = chunk_tdinfo));
+
+				if(chunk_tdinfo->endPage == InvalidBlockNumber)
+					te->dataLength = tbinfo->relpages - chunk_tdinfo->startPage;
+				else
+					te->dataLength = dopt->max_table_segment_pages;
+				/* let's assume toast pages distribute evenly among chunks */
+				if(tbinfo->relpages)
+					te->dataLength += te->dataLength * tbinfo->toastpages / tbinfo->relpages;
+			}
+
+			destroyPQExpBuffer(chunk_desc);
+		}
 	}
 
 	destroyPQExpBuffer(copyBuf);
@@ -3081,6 +3170,8 @@ makeTableDataInfo(DumpOptions *dopt, TableInfo *tbinfo)
 	tdinfo->dobj.namespace = tbinfo->dobj.namespace;
 	tdinfo->tdtable = tbinfo;
 	tdinfo->filtercond = NULL;	/* might get set later */
+	tdinfo->startPage = InvalidBlockNumber; /* we use this as indication that no chunking is needed */
+	tdinfo->endPage = InvalidBlockNumber;
 	addObjectDependency(&tdinfo->dobj, tbinfo->dobj.dumpId);
 
 	/* A TableDataInfo contains data, of course */
@@ -7347,8 +7438,16 @@ getTables(Archive *fout, int *numTables)
 						 "c.relnamespace, c.relkind, c.reltype, "
 						 "c.relowner, "
 						 "c.relchecks, "
-						 "c.relhasindex, c.relhasrules, c.relpages, "
-						 "c.reltuples, c.relallvisible, ");
+						 "c.relhasindex, c.relhasrules, ");
+
+	/* fetch current relation size if chunking is requested */
+	if(dopt->max_table_segment_pages != InvalidBlockNumber)
+		appendPQExpBufferStr(query, "pg_relation_size(c.oid)/current_setting('block_size')::int AS relpages, ");
+	else
+		/* pg_class.relpages stores BlockNumber (uint32) in an int field, convert to oid to get unsigned int out */
+		appendPQExpBufferStr(query, "c.relpages::oid, ");
+
+	appendPQExpBufferStr(query, "c.reltuples, c.relallvisible, ");
 
 	if (fout->remoteVersion >= 180000)
 		appendPQExpBufferStr(query, "c.relallfrozen, ");
@@ -7589,7 +7688,7 @@ getTables(Archive *fout, int *numTables)
 		tblinfo[i].ncheck = atoi(PQgetvalue(res, i, i_relchecks));
 		tblinfo[i].hasindex = (strcmp(PQgetvalue(res, i, i_relhasindex), "t") == 0);
 		tblinfo[i].hasrules = (strcmp(PQgetvalue(res, i, i_relhasrules), "t") == 0);
-		tblinfo[i].relpages = atoi(PQgetvalue(res, i, i_relpages));
+		tblinfo[i].relpages = strtoul(PQgetvalue(res, i, i_relpages), NULL, 10);
 		if (PQgetisnull(res, i, i_toastpages))
 			tblinfo[i].toastpages = 0;
 		else
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 5a6726d8b12..84e682d585f 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -16,6 +16,7 @@
 
 #include "pg_backup.h"
 #include "catalog/pg_publication_d.h"
+#include "storage/block.h"
 
 
 #define oidcmp(x,y) ( ((x) < (y) ? -1 : ((x) > (y)) ?  1 : 0) )
@@ -335,7 +336,11 @@ typedef struct _tableInfo
 	Oid			owning_tab;		/* OID of table owning sequence */
 	int			owning_col;		/* attr # of column owning sequence */
 	bool		is_identity_sequence;
-	int32		relpages;		/* table's size in pages (from pg_class) */
+	BlockNumber	relpages;		/* table's size in pages (from pg_class),
+								 * converted to unsigned integer;
+								 * when --max-table-segment-pages is set,
+								 * computed from pg_relation_size()
+								 */
 	int			toastpages;		/* toast table's size in pages, if any */
 
 	bool		interesting;	/* true if need to collect more data */
@@ -413,8 +418,21 @@ typedef struct _tableDataInfo
 	DumpableObject dobj;
 	TableInfo  *tdtable;		/* link to table to dump */
 	char	   *filtercond;		/* WHERE condition to limit rows dumped */
+	/* startPage and endPage to support segmented dump */
+	BlockNumber	startPage;		/* first page of the segment; as the lowest
+								 * segment page number is always known, we
+								 * use InvalidBlockNumber here to mark the
+								 * no-segmenting case.  When 0 (first
+								 * segment) the lower bound can be omitted */
+	BlockNumber	endPage;		/* last page in segment for page-range dump,
+	                    		 * startPage+max_table_segment_pages-1 for 
+								 * most segments, but InvalidBlockNumber for
+								 * the last one to indicate open range
+								 */
 } TableDataInfo;
 
+#define is_segment(tdiptr) ((tdiptr)->startPage != InvalidBlockNumber)
+
 typedef struct _indxInfo
 {
 	DumpableObject dobj;
@@ -449,7 +467,7 @@ typedef struct _relStatsInfo
 {
 	DumpableObject dobj;
 	Oid			relid;
-	int32		relpages;
+	BlockNumber	relpages;
 	char	   *reltuples;
 	int32		relallvisible;
 	int32		relallfrozen;
diff --git a/src/bin/pg_dump/t/004_pg_dump_parallel.pl b/src/bin/pg_dump/t/004_pg_dump_parallel.pl
index 738f34b1c1b..4f35aeed9b9 100644
--- a/src/bin/pg_dump/t/004_pg_dump_parallel.pl
+++ b/src/bin/pg_dump/t/004_pg_dump_parallel.pl
@@ -11,6 +11,7 @@ use Test::More;
 my $dbname1 = 'regression_src';
 my $dbname2 = 'regression_dest1';
 my $dbname3 = 'regression_dest2';
+my $dbname4 = 'regression_dest3';
 
 my $node = PostgreSQL::Test::Cluster->new('main');
 $node->init;
@@ -21,6 +22,7 @@ my $backupdir = $node->backup_dir;
 $node->run_log([ 'createdb', $dbname1 ]);
 $node->run_log([ 'createdb', $dbname2 ]);
 $node->run_log([ 'createdb', $dbname3 ]);
+$node->run_log([ 'createdb', $dbname4 ]);
 
 $node->safe_psql(
 	$dbname1,
@@ -87,4 +89,33 @@ $node->command_ok(
 	],
 	'parallel restore as inserts');
 
+$node->command_ok(
+	[
+		'pg_dump',
+		'--format' => 'directory',
+		'--max-table-segment-pages' => 2,
+		'--no-sync',
+		'--jobs' => 2,
+		'--file' => "$backupdir/dump3",
+		$node->connstr($dbname1),
+	],
+	'parallel dump with chunks of two heap pages');
+
+$node->command_ok(
+	[
+		'pg_restore', '--verbose',
+		'--dbname' => $node->connstr($dbname4),
+		'--jobs' => 3,
+		"$backupdir/dump3",
+	],
+	'parallel restore with chunks of two heap pages');
+
+my $table = 'tplain';
+my $tablehash_query = "SELECT '$table', sum(hashtext(t::text)), count(*) FROM $table AS t";
+
+my $result_1 = $node->safe_psql($dbname1, $tablehash_query);
+my $result_4 = $node->safe_psql($dbname4, $tablehash_query);
+
+is($result_4, $result_1, "Hash check for $table: restored db ($result_4) vs original db ($result_1)");
+
 done_testing();
diff --git a/src/fe_utils/option_utils.c b/src/fe_utils/option_utils.c
index 8d0659c1164..a516d8c86a9 100644
--- a/src/fe_utils/option_utils.c
+++ b/src/fe_utils/option_utils.c
@@ -83,6 +83,61 @@ option_parse_int(const char *optarg, const char *optname,
 	return true;
 }
 
+/*
+ * option_parse_uint32
+ *
+ * Parse unsigned integer value for an option.  If the parsing is successful,
+ * returns true and stores the result in *result if that's given;
+ * if parsing fails, returns false.
+ */
+bool
+option_parse_uint32(const char *optarg, const char *optname,
+				 uint32 min_range, uint32 max_range,
+				 uint32 *result)
+{
+	char	   		*endptr;
+	unsigned long	val;
+
+	/* Fail if there is a minus sign at the start of value */
+	while(isspace((unsigned char) *optarg))
+		optarg++;
+	if(*optarg == '-')
+	{
+		pg_log_error("value \"%s\" for option %s can not be negative",
+					optarg, optname);
+		return false;
+	}
+
+	errno = 0;
+	val = strtoul(optarg, &endptr, 10);
+
+	/*
+	 * Skip any trailing whitespace; if anything but whitespace remains before
+	 * the terminating character, fail.
+	 */
+	while (*endptr != '\0' && isspace((unsigned char) *endptr))
+		endptr++;
+
+	if (*endptr != '\0')
+	{
+		pg_log_error("invalid value \"%s\" for option %s",
+					 optarg, optname);
+		return false;
+	}
+
+	/* min_range and max_range are uint32, so this range check also catches
+	 * an unsigned long val that does not fit in 32 bits. */
+	if (errno == ERANGE || val < min_range || val > max_range)
+	{
+		pg_log_error("%s not in range %u..%u", optname, min_range, max_range);
+		return false;
+	}
+
+	if (result)
+		*result = (uint32) val;
+	return true;
+}
+
 /*
  * Provide strictly harmonized handling of the --sync-method option.
  */
diff --git a/src/include/fe_utils/option_utils.h b/src/include/fe_utils/option_utils.h
index d975db77af2..67fd3650d7a 100644
--- a/src/include/fe_utils/option_utils.h
+++ b/src/include/fe_utils/option_utils.h
@@ -22,6 +22,9 @@ extern void handle_help_version_opts(int argc, char *argv[],
 extern bool option_parse_int(const char *optarg, const char *optname,
 							 int min_range, int max_range,
 							 int *result);
+extern bool option_parse_uint32(const char *optarg, const char *optname,
+							 uint32 min_range, uint32 max_range,
+							 uint32 *result);
 extern bool parse_sync_method(const char *optarg,
 							  DataDirSyncMethod *sync_method);
 extern void check_mut_excl_opts_internal(int n,...);
-- 
2.53.0.1018.g2bb0e51243-goog
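
The explicit minus-sign check in option_parse_uint32() matters because
strtoul() itself accepts a leading '-' and returns the negated value in
the unsigned type without setting errno. Below is a minimal standalone
sketch of that behavior. parse_u32_naive() and parse_u32_checked() are
hypothetical names used only for illustration; they are not part of the
patch and do not reproduce its error reporting or range arguments.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: relies on strtoul() alone, no minus-sign check. */
bool
parse_u32_naive(const char *s, uint32_t *result)
{
    char       *endptr;
    unsigned long val;

    errno = 0;
    val = strtoul(s, &endptr, 10);

    /* reject empty input, trailing garbage, overflow, and > 32 bits */
    if (endptr == s || *endptr != '\0' || errno == ERANGE || val > UINT32_MAX)
        return false;

    *result = (uint32_t) val;
    return true;
}

/* Hypothetical helper: same parsing, but a leading '-' is rejected first. */
bool
parse_u32_checked(const char *s, uint32_t *result)
{
    const char *p = s;

    /* strtoul() skips leading whitespace, so look past it as well */
    while (isspace((unsigned char) *p))
        p++;
    if (*p == '-')
        return false;

    return parse_u32_naive(s, result);
}
```

Where unsigned long is 32 bits wide (64-bit Windows, for example),
parse_u32_naive("-1", ...) would succeed and store 4294967295, because
strtoul("-1") returns ULONG_MAX. parse_u32_checked() rejects the same
input on every platform.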




* Re: Patch: dumping tables data in multiple chunks in pg_dump
  2025-11-11 15:29 Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-12 12:59 ` Re: Patch: dumping tables data in multiple chunks in pg_dump Ashutosh Bapat <[email protected]>
  2025-11-13 18:02   ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-13 18:39     ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-13 20:24       ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-13 20:26         ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-13 20:34           ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2026-03-28 15:32             ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2026-03-28 15:33               ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2026-03-29 21:49                 ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2026-03-30 17:32                   ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
@ 2026-03-30 21:32                     ` Zsolt Parragi <[email protected]>
  0 siblings, 0 replies; 16+ messages in thread

From: Zsolt Parragi @ 2026-03-30 21:32 UTC (permalink / raw)
  To: Hannu Krosing <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; David Rowley <[email protected]>; Michael Banck <[email protected]>; Ashutosh Bapat <[email protected]>; Nathan Bossart <[email protected]>

Hello!

A simple test causes an assertion failure in my testing; dependency
counting still doesn't seem to work correctly:

pg_restore: >...>/pg_backup_archiver.c:5207: reduce_dependencies:
Assertion `otherte->depCount > 0' failed.

Without assertions it results in data loss.

004_pg_dump_parallel also showcases the issue in my testing.

But simple manual testing also confirms it:

1. create some data

CREATE TABLE tplain (id int UNIQUE);
INSERT INTO tplain SELECT x FROM generate_series(1,1000) x;

2. create a dump

dump with --max-table-segment-pages=2

3. try to restore

restore with --jobs=3
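
For readers unfamiliar with the assertion: the failed condition
(otherte->depCount > 0) guards a per-entry counter of dependencies that
have not yet been restored. The toy model below sketches one way such a
counter can underflow, namely when a dependent entry is registered with
fewer dependencies than the number of data chunks that later report
completion. This is an assumption for illustration only. Whether it is
what happens in the patch is not established here, and all names
(ToyEntry, toy_reduce_dependency, toy_restore_chunks) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a dependent TOC entry; not the pg_restore struct. */
typedef struct ToyEntry
{
    int         depCount;   /* dependencies not yet restored */
    bool        ready;      /* set once depCount reaches zero */
} ToyEntry;

typedef struct ToyResult
{
    int         chunks_done_when_ready; /* -1 if never became ready */
    bool        underflow;              /* counter would have gone negative */
} ToyResult;

/*
 * One dependency of "other" finished.  Returns false where the real code
 * would trip the depCount > 0 assertion.
 */
bool
toy_reduce_dependency(ToyEntry *other)
{
    if (other->depCount <= 0)
        return false;
    other->depCount--;
    if (other->depCount == 0)
        other->ready = true;
    return true;
}

/*
 * Finish nchunks data chunks in turn for a dependent entry that was
 * registered with counted_deps dependencies.
 */
ToyResult
toy_restore_chunks(int counted_deps, int nchunks)
{
    ToyEntry    dependent = {counted_deps, false};
    ToyResult   res = {-1, false};

    assert(counted_deps > 0 && nchunks > 0);

    for (int done = 1; done <= nchunks; done++)
    {
        if (!toy_reduce_dependency(&dependent))
        {
            res.underflow = true;
            break;
        }
        if (dependent.ready && res.chunks_done_when_ready < 0)
            res.chunks_done_when_ready = done;
    }
    return res;
}
```

In this model, toy_restore_chunks(1, 3) marks the dependent entry ready
after the first chunk and underflows on the second, while
toy_restore_chunks(3, 3) becomes ready only after the third chunk and
never underflows.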






* Re: Patch: dumping tables data in multiple chunks in pg_dump
  2025-11-11 15:29 Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
@ 2025-11-17 04:15 ` Dilip Kumar <[email protected]>
  2025-11-24 21:02   ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  1 sibling, 1 reply; 16+ messages in thread

From: Dilip Kumar @ 2025-11-17 04:15 UTC (permalink / raw)
  To: Hannu Krosing <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Nathan Bossart <[email protected]>

On Tue, Nov 11, 2025 at 9:00 PM Hannu Krosing <[email protected]> wrote:
>
> Attached is a patch that adds the ability to dump table data in multiple chunks.
>
> Looking for feedback at this point:
>  1) what have I missed
>  2) should I implement something to avoid single-page chunks
>
> The flag --huge-table-chunk-pages which tells the directory format
> dump to dump tables where the main fork has more pages than this in
> multiple chunks of given number of pages,
>
> The main use case is speeding up parallel dumps in case of one or a
> small number of HUGE tables so parts of these can be dumped in
> parallel.
>

+1 for the idea. I haven't done a detailed review, but while going
through the patch I noticed that we use pg_class->relpages to decide
whether to chunk a table. That should be fine, but don't you think a
direct size calculation function like pg_relation_size() would give a
better idea and would not depend on whether the stats are up to date?
That would make the chunking behavior more deterministic.
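
To make the difference concrete, here is a small sketch of how a byte
size (as returned by pg_relation_size() for the main fork) maps to a page
count and then to a number of chunks. It assumes the default 8192-byte
block size and the rule stated in the original mail (chunk only when the
table has more pages than the limit). pages_from_bytes() and
chunk_count() are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BLCKSZ 8192     /* assumption: default PostgreSQL block size */

/* Number of pages covered by a main-fork size in bytes, rounding up. */
uint32_t
pages_from_bytes(uint64_t relation_bytes)
{
    return (uint32_t) ((relation_bytes + TOY_BLCKSZ - 1) / TOY_BLCKSZ);
}

/* Number of chunks for npages pages with seg_pages pages per chunk. */
uint32_t
chunk_count(uint32_t npages, uint32_t seg_pages)
{
    assert(seg_pages > 0);

    if (npages <= seg_pages)
        return 1;           /* not larger than the limit: single chunk */

    /* widen before adding so the rounding cannot overflow 32 bits */
    return (uint32_t) (((uint64_t) npages + seg_pages - 1) / seg_pages);
}
```

With made-up numbers: if relpages still says 1 but the table has grown to
40960 bytes, pages_from_bytes(40960) gives 5, so a limit of 2 pages
yields chunk_count(5, 2) = 3 chunks instead of the single chunk that the
stale estimate would suggest.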

-- 
Regards,
Dilip Kumar
Google






* Re: Patch: dumping tables data in multiple chunks in pg_dump
  2025-11-11 15:29 Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-17 04:15 ` Re: Patch: dumping tables data in multiple chunks in pg_dump Dilip Kumar <[email protected]>
@ 2025-11-24 21:02   ` Hannu Krosing <[email protected]>
  2025-11-25 04:50     ` Re: Patch: dumping tables data in multiple chunks in pg_dump Dilip Kumar <[email protected]>
  0 siblings, 1 reply; 16+ messages in thread

From: Hannu Krosing @ 2025-11-24 21:02 UTC (permalink / raw)
  To: Dilip Kumar <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Nathan Bossart <[email protected]>

The expectation was that, since chunking is mainly useful for really
huge tables, ANALYZE should have been run "recently enough".

Maybe we should use pg_relation_size() only once relpages has already
indicated that the table is large enough to warrant chunking? Maybe
when relpages is at least 1/2 of the requested chunk size?

My reasoning was to not put too much extra load on pg_dump when
chunking is not required. But of course we can use the presence of a
chunking request to decide to run pg_relation_size(), assuming the
overhead won't be too large in this case.
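
The two-stage idea can be sketched as follows. This is only one reading
of the proposal above, with the "1/2 of the requested chunk size"
threshold taken literally. needs_exact_size() and should_chunk() are
hypothetical names, and the exact page count that would come from
pg_relation_size() is modelled as a plain argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Stage 1: cheap filter on the possibly stale relpages estimate.
 * Only tables passing this would get the extra size lookup.
 */
bool
needs_exact_size(uint32_t relpages, uint32_t seg_pages)
{
    assert(seg_pages > 0);
    return relpages >= seg_pages / 2;
}

/*
 * Stage 2: decide on chunking from the exact page count.  exact_pages is
 * ignored when stage 1 filters the table out, mirroring the fact that the
 * size lookup would not be issued in that case.
 */
bool
should_chunk(uint32_t relpages, uint32_t exact_pages, uint32_t seg_pages)
{
    if (!needs_exact_size(relpages, seg_pages))
        return false;
    return exact_pages > seg_pages;
}
```

Note the trade-off this makes visible: should_chunk(100, 5000, 1000)
is false, so a table whose relpages is badly out of date is still missed.
The threshold only limits how stale the estimate may be before that
happens.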


On Mon, Nov 17, 2025 at 5:15 AM Dilip Kumar <[email protected]> wrote:
>
> On Tue, Nov 11, 2025 at 9:00 PM Hannu Krosing <[email protected]> wrote:
> >
> > Attached is a patch that adds the ability to dump table data in multiple chunks.
> >
> > Looking for feedback at this point:
> >  1) what have I missed
> >  2) should I implement something to avoid single-page chunks
> >
> > The flag --huge-table-chunk-pages which tells the directory format
> > dump to dump tables where the main fork has more pages than this in
> > multiple chunks of given number of pages,
> >
> > The main use case is speeding up parallel dumps in case of one or a
> > small number of HUGE tables so parts of these can be dumped in
> > parallel.
> >
>
> +1 for the idea, I haven't done the detailed review but I was just
> going through the patch, I noticed that we use pg_class->relpages to
> identify whether to chunk the table or not, which should be fine but
> don't you think if we use direct size calculation function like
> pg_relation_size() we might get better idea and not dependent upon
> whether the stats are updated or not?  This will make chunking
> behavior more deterministic.
>
> --
> Regards,
> Dilip Kumar
> Google






* Re: Patch: dumping tables data in multiple chunks in pg_dump
  2025-11-11 15:29 Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
  2025-11-17 04:15 ` Re: Patch: dumping tables data in multiple chunks in pg_dump Dilip Kumar <[email protected]>
  2025-11-24 21:02   ` Re: Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
@ 2025-11-25 04:50     ` Dilip Kumar <[email protected]>
  0 siblings, 0 replies; 16+ messages in thread

From: Dilip Kumar @ 2025-11-25 04:50 UTC (permalink / raw)
  To: Hannu Krosing <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Nathan Bossart <[email protected]>

On Tue, Nov 25, 2025 at 2:32 AM Hannu Krosing <[email protected]> wrote:
>
> The expectation was that as chunking is useful mainly in case of
> really huge tables the analyze should have been run "recently enough".
>
> Maybe we should use pg_relation_size() in case we have already
> determined that the table is large enough to warrant chunking? Maybe
> at least 1/2 of the requested chunk size?
>
> My reasoning was to not put too much extra load on pg_dump in case
> chunking is not required. But of course we can use the presence of a
> chunking request to decide to run pg_relation_size(), assuming the
> overhead won't be too large in this case.
>
>
> On Mon, Nov 17, 2025 at 5:15 AM Dilip Kumar <[email protected]> wrote:
> >
> > On Tue, Nov 11, 2025 at 9:00 PM Hannu Krosing <[email protected]> wrote:
> > >
> > > Attached is a patch that adds the ability to dump table data in multiple chunks.
> > >
> > > Looking for feedback at this point:
> > >  1) what have I missed
> > >  2) should I implement something to avoid single-page chunks
> > >
> > > The flag --huge-table-chunk-pages which tells the directory format
> > > dump to dump tables where the main fork has more pages than this in
> > > multiple chunks of given number of pages,
> > >
> > > The main use case is speeding up parallel dumps in case of one or a
> > > small number of HUGE tables so parts of these can be dumped in
> > > parallel.
> > >
> >
> > +1 for the idea, I haven't done the detailed review but I was just
> > going through the patch, I noticed that we use pg_class->relpages to
> > identify whether to chunk the table or not, which should be fine but
> > don't you think if we use direct size calculation function like
> > pg_relation_size() we might get better idea and not dependent upon
> > whether the stats are updated or not?  This will make chunking
> > behavior more deterministic.

Yeah, that makes sense. We can use relpages for the initial
identification and then call pg_relation_size() if relpages says the
table is large enough.

-- 
Regards,
Dilip Kumar
Google







end of thread

Thread overview: 16+ messages
2025-11-11 15:29 Patch: dumping tables data in multiple chunks in pg_dump Hannu Krosing <[email protected]>
2025-11-12 12:59 ` Ashutosh Bapat <[email protected]>
2025-11-13 18:02   ` Hannu Krosing <[email protected]>
2025-11-13 18:39     ` Hannu Krosing <[email protected]>
2025-11-13 20:24       ` Hannu Krosing <[email protected]>
2025-11-13 20:26         ` Hannu Krosing <[email protected]>
2025-11-13 20:34           ` Hannu Krosing <[email protected]>
2026-03-28 10:59             ` Hannu Krosing <[email protected]>
2026-03-28 15:32             ` Hannu Krosing <[email protected]>
2026-03-28 15:33               ` Hannu Krosing <[email protected]>
2026-03-29 21:49                 ` Hannu Krosing <[email protected]>
2026-03-30 17:32                   ` Hannu Krosing <[email protected]>
2026-03-30 21:32                     ` Zsolt Parragi <[email protected]>
2025-11-17 04:15 ` Dilip Kumar <[email protected]>
2025-11-24 21:02   ` Hannu Krosing <[email protected]>
2025-11-25 04:50     ` Dilip Kumar <[email protected]>
