Efficient batched iteration over hash/list partitioned tables

public inbox for [email protected]  
help / color / mirror / Atom feed

From: QUINCEROT Emmanuel <[email protected]>
To: [email protected] <[email protected]>
Subject: Efficient batched iteration over hash/list partitioned tables
Date: Wed, 28 Jan 2026 09:48:49 +0000 (UTC)
Message-ID: <[email protected]> (raw)
References: <[email protected]>

Hello dear community,
Hash partitioning is useful for very large datasets when the main access patterns are on the partition key. However, we sometimes need to backfill this data in an online fashion, which presents a challenge.
When backfilling a non-partitioned table, we can iterate over the primary key in batches until all rows are processed. This works well because the primary key is unique and ordered.
The query looks like this:
    SELECT *    FROM table    WHERE pk_col > :last_pk_value    ORDER BY pk_col    LIMIT batch_size;

However, when working with hash-partitioned tables, this strategy is inefficient because the primary key is not ordered across partitions. The query planner must retrieve the first N rows from each partition, sort them globally, and then return only enough rows to fill the batch size.
A workaround is to process each partition independently, but this has drawbacks:- It requires additional logic to track progress across multiple partitions- The logic differs between partitioned and non-partitioned tables, making the client partitioning-aware
**Proposed solution:**
Could we make ordering by `tableoid, [primary key columns]` work efficiently for partitioned tables?
In other words, something like this:
    SELECT tableoid, *    FROM table    WHERE (tableoid, pk_col) > (:last_tableoid, :last_pk_value)    ORDER BY tableoid, pk_col    LIMIT batch_size;
Currently, from PG 15 to PG 18, the planner doesn't handle ordering by tableoid efficiently: !ALL! rows are fetched from each partition, then appended, sorted, and limited.
Could we optimize the planner to handle `ORDER BY tableoid` efficiently in this context?
Note: This problem primarily concerns hash and list partitioning, as range partitioning can be batched efficiently by ordering on the partition key itself.
Many thanks,
Emmanuel

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: Efficient batched iteration over hash/list partitioned tables
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox