Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vfU8E-002CEQ-2T for pgsql-hackers@arkaria.postgresql.org; Tue, 13 Jan 2026 02:27:43 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vfU8C-002K5r-33 for pgsql-hackers@arkaria.postgresql.org; Tue, 13 Jan 2026 02:27:41 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vfU8C-002K5h-27 for pgsql-hackers@lists.postgresql.org; Tue, 13 Jan 2026 02:27:40 +0000 Received: from mail-wm1-x32b.google.com ([2a00:1450:4864:20::32b]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vfU8A-0008Wq-0y for pgsql-hackers@lists.postgresql.org; Tue, 13 Jan 2026 02:27:40 +0000 Received: by mail-wm1-x32b.google.com with SMTP id 5b1f17b1804b1-477a2ab455fso64201085e9.3 for ; Mon, 12 Jan 2026 18:27:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1768271257; x=1768876057; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=4dq/oi8ctHs1FLRZ2Q3In7ZQf55wZQQ90NWCkwTwae8=; b=RRSIP/d8axnviJN/eIGsm77p4+UX61AftqPQCXqmgkEIGDoWKYePCGy6yGclAjIdYQ B27CVUUlwbm16QOidYzd9RlP1n3v3bk2naM2fxHaF7UfzFvf7Owy2JAFw74JRuMSlAkL GWkEo/SZn2NWqdznmlgCT8w2zsYpfe0kihoXIDO6gr8EhkJkkM30YYwIHSkvO79FBhgE 4DZ2nrT7T1GOyPZZETopTQD326hwPFdOdteX3+CkjwfffhQFAV3fOA8Ijl54XxTXm1Vn u0Nbhv/3hVhoPO7DXXtPhvgQlIVQU2gsQefnMvelc4iYbBI9E0bacBSoHyYOzQPNJNFr NXqQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1768271257; x=1768876057; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=4dq/oi8ctHs1FLRZ2Q3In7ZQf55wZQQ90NWCkwTwae8=; b=Ox9hOjgP7hJ+zXzSiI+iQnRTi/m74fOSV609LDd7dzxl4gG+P4zS3FxTefeklz4E+d 1QP+30UHDDR6x6OF8pv5GR7y4g+9CDR17HlBmQ4FNkVvWgjLpJ6NpSg5uDpyQ/L+l87V S65fMEVjs84dDzNywQfduVYugSvQhQGZicSUA9LGk8lDieGTkhAsyOMlZJ636cF6O8zK aa/bQxiJyPeg5QzsiaI0oSD62TUaDiE9LxB+51UDyutROKAMjk7mvSENAgKSqlzZmVnk HAwseBQMOe8Yjo96IZm9V4e56iX/DcwkK+qirq8IVl6V/oFA8FonQ1FjpZN0bvCIek+G V17Q== X-Forwarded-Encrypted: i=1; AJvYcCXLOg2HrlY/TJc0uGL75WjPGKQj9E5cz9H7kMFU1RQ67MQ5odHgWHYH7HpyXyJc+CGX84pWhS0LDctwKE6r@lists.postgresql.org X-Gm-Message-State: AOJu0YwuHDKYYx1KuQkTSVe7nXKcsox0wmjJr2BGDY8N7x8qabdhlgJp Go2aKLKmhabE+7jnpM2jSpGuGd5FqMrZ7kxJTxR2l3y2Gzk/SPkjH5Y3t0rsIzoYZVxmIcKIDid ebaBet+flWhq3EYD5d1PW2orZ+S/OEQY= X-Gm-Gg: AY/fxX72Xq4riTXK8cOpS9xgGLUgvvtJDZruwClLq8qaGlVtUUhZ/mFUYNiE+4j2bkG cP4BSR3t55gkasfVyOGFOS1rkZCi1Xv6x+mFbI3Asl2fhWzRGxpn5CIx2OzvTJAFUPh03x6COty GMMiO7d1S9QXbVnnjtenS2eteOv5XqOqE2G2eiTGFgbuvR2kERuEqcKFrZ6NJDrCKV4vbhyxJQo x2qUoARr3e5oElUD2neb0ma2WM4FjXUAN511Xf9koM0R8yjyzLk2yB8xQfdXrZXwnT+Ob/W2Mq+ oMzgbqKoyXVtaCQhJL8ePDCAii49tD18UnP85EF5KPhkC5tWGIywmq3VMo3ltw== X-Google-Smtp-Source: AGHT+IG1xXZ98d0xgvCbHbhky7FsDjxwDD1/M0lfRuWL2fyHwJTSPTcI1Xog8HPCzFiSORVoFEKPQPLknbmV52M1Zlk= X-Received: by 2002:a05:600c:1e1c:b0:477:58af:a91d with SMTP id 5b1f17b1804b1-47d84b0aa4bmr200720805e9.5.1768271257191; Mon, 12 Jan 2026 18:27:37 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: David Rowley Date: Tue, 13 Jan 2026 15:27:25 +1300 X-Gm-Features: AZwV_QjeVjxj_iDJtdFyAYSWPzxNYcE2YhvgrC0gEe-rM4-6s_byHue0nKP2YAs Message-ID: Subject: Re: Patch: dumping tables data in multiple chunks in pg_dump To: Hannu Krosing Cc: Ashutosh Bapat , PostgreSQL Hackers , Nathan Bossart Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Fri, 14 Nov 2025 at 09:34, Hannu Krosing wrote: > Added to https://commitfest.postgresql.org/patch/6219/ I think this could be useful, but I think you'll need to find a way to not do this for non-heap tables. Per the comments in TableAmRoutine, both scan_set_tidrange and scan_getnextslot_tidrange are optional callback functions and the planner won't produce TIDRangePaths if either of those don't exist. Maybe that means you need to consult pg_class.relam to ensure the amname is 'heap' or at least the relam = 2. On testing Citus's columnar AM, I get: postgres=# select * from t where ctid between '(0,1)' and '(10,0)'; ERROR: UPDATE and CTID scans not supported for ColumnarScan 1. For the patch, I think you should tighten the new option up to mean the maximum segment size that a table will be dumped in. I see you have comments like: /* TODO: add hysteresis here, maybe < 1.1 * huge_table_chunk_pages */ You *have* to put the cutoff *somewhere*, so I think it very much should be exactly the specified threshold. If anyone is unhappy that some segments consist of a single page, then that's on them to adjust the parameter accordingly. Otherwise, someone complaints that they got a 1-page segment when the table was 10.0001% bigger than the cutoff and then we're tempted to add a new setting to control the 1.1 factor, which is just silly. If there's a 1-page segment, so what? It's not a big deal. Perhaps --max-table-segment-pages is a better name than --huge-table-chunk-pages as it's quite subjective what the minimum number of pages required to make a table "huge". 2. I'm not sure if you're going to get away with using relpages for this. Is it really that bad to query pg_relation_size() when this option is set? If it really is a problem, then maybe let the user choose with another option. I understand we're using relpages for sorting table sizes so we prefer dumping larger tables first, but that just seems way less important if it's not perfectly accurate. 3. You should be able to simplify the code in dumpTableData() so you're not adding any extra cases. You could use InvalidBlockNumber to indicate an unbounded ctid range and only add ctid qual to the WHERE clause when you have a bounded range (i.e not InvalidBlockNumber). That way the first segment will need WHERE ctid <= '...' and the final one will need WHERE ctid >= '...'. Everything in between will have an upper and lower bound. That results in no ctid quals being added when both ranges are set to InvalidBlockNumber, which you should use for all tables not large enough to be segmented, thus no special case. TID Range scans are perfectly capable of working when only bounded at one side. 4. I think using "int" here is a future complaint waiting to happen. + if (!option_parse_int(optarg, "--huge-table-chunk-pages", 1, INT32_MAX, + &dopt.huge_table_chunk_pages)) I bet we'll eventually see a complaint that someone can't make the segment size larger than 16TB. I think option_parse_uint32() might be called for. David