Date: Wed, 28 Jan 2026 09:48:49 +0000 (UTC)
From: QUINCEROT Emmanuel <equincerot@yahoo.fr>
To: 
	"pgsql-general@lists.postgresql.org" <pgsql-general@lists.postgresql.org>
Message-ID: <316192095.8713932.1769593729038@mail.yahoo.com>
Subject: Efficient batched iteration over hash/list partitioned tables
MIME-Version: 1.0
Content-Type: multipart/alternative; 
	boundary="----=_Part_8713931_1961726620.1769593729037"
References: <316192095.8713932.1769593729038.ref@mail.yahoo.com>
Content-Length: 5254
Archived-At: <https://www.postgresql.org/message-id/316192095.8713932.1769593729038%40mail.yahoo.com>
Precedence: bulk

------=_Part_8713931_1961726620.1769593729037
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Hello dear community,

Hash partitioning is useful for very large datasets when the main access
patterns are on the partition key. However, we sometimes need to backfill
this data in an online fashion, which presents a challenge.

When backfilling a non-partitioned table, we can iterate over the primary
key in batches until all rows are processed. This works well because the
primary key is unique and ordered.

The query looks like this:

    SELECT *
    FROM my_table
    WHERE pk_col > :last_pk_value
    ORDER BY pk_col
    LIMIT :batch_size;
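
As a minimal sketch of the loop around that query (Python, with an
in-memory SQLite database standing in for PostgreSQL so that it runs on
its own; the table `items`, the `payload` column and the helper
`iter_batches` are made up for the example, and keys are assumed to be
non-negative integers):

```python
import sqlite3


def iter_batches(conn, batch_size):
    """Yield lists of (pk_col, payload) rows in primary-key order."""
    last_pk = -1  # assumption: keys are non-negative integers
    while True:
        rows = conn.execute(
            "SELECT pk_col, payload FROM items "
            "WHERE pk_col > ? ORDER BY pk_col LIMIT ?",
            (last_pk, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_pk = rows[-1][0]  # resume strictly after the last key seen


# Hypothetical data: ten rows with keys 1..10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (pk_col INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 11)],
)

batches = list(iter_batches(conn, batch_size=4))
```

The only state the client keeps between batches is the last key it saw,
which is what makes this pattern attractive.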

However, on hash-partitioned tables this strategy is inefficient because
primary key values are not ordered across partitions. The query planner
must retrieve the first N rows from each partition (N being the batch
size), sort them globally, and then return only the first N.

A workaround is to process each partition independently, but this has
drawbacks:
- It requires additional logic to track progress across multiple
  partitions
- The logic differs between partitioned and non-partitioned tables,
  making the client partitioning-aware
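
To illustrate the extra bookkeeping, here is a rough sketch of that
workaround (Python again, with SQLite tables standing in for partitions;
the names `items_p0` to `items_p2` and the helper `next_batch` are
hypothetical, and `k % 3` is only a stand-in for PostgreSQL's hash
function). The client now has to carry a two-part cursor, the partition
index plus the last key seen in that partition:

```python
import sqlite3

PARTITIONS = ["items_p0", "items_p1", "items_p2"]  # hypothetical names


def next_batch(conn, cursor, batch_size):
    """Return (rows, new_cursor); cursor is (partition index, last pk)."""
    part_idx, last_pk = cursor
    while part_idx < len(PARTITIONS):
        rows = conn.execute(
            f"SELECT pk_col FROM {PARTITIONS[part_idx]} "
            "WHERE pk_col > ? ORDER BY pk_col LIMIT ?",
            (last_pk, batch_size),
        ).fetchall()
        if rows:
            return rows, (part_idx, rows[-1][0])
        # Current partition is exhausted: move on and reset the key.
        part_idx, last_pk = part_idx + 1, -1
    return [], (part_idx, last_pk)


conn = sqlite3.connect(":memory:")
for i, name in enumerate(PARTITIONS):
    conn.execute(f"CREATE TABLE {name} (pk_col INTEGER PRIMARY KEY)")
    conn.executemany(
        f"INSERT INTO {name} VALUES (?)",
        [(k,) for k in range(1, 11) if k % 3 == i],
    )

seen, cursor = [], (0, -1)
while True:
    rows, cursor = next_batch(conn, cursor, batch_size=2)
    if not rows:
        break
    seen.extend(r[0] for r in rows)
```

Every row is visited exactly once, but the partition list and the
two-part cursor live in the client, which is the partitioning awareness
we would like to avoid.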

**Proposed solution:**

Could we make ordering by `tableoid, [primary key columns]` work
efficiently for partitioned tables?

In other words, something like this:

    SELECT tableoid, *
    FROM my_table
    WHERE (tableoid, pk_col) > (:last_tableoid, :last_pk_value)
    ORDER BY tableoid, pk_col
    LIMIT :batch_size;
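
To make the intended semantics concrete, here is a small in-memory model
(Python; the OIDs, keys and helper names are invented, and this says
nothing about how the planner is or should be implemented). It only shows
that walking partitions in `tableoid` order, each one in primary-key
order, already produces rows sorted by `(tableoid, pk_col)`, so a batch
can stop after `batch_size` rows without a global sort:

```python
from bisect import bisect_right
from itertools import islice

# Hypothetical partitions keyed by a stand-in for tableoid; each list is
# already sorted by primary key, as a per-partition index scan would be.
partitions = {16401: [3, 6, 9], 16402: [1, 4, 7, 10], 16403: [2, 5, 8]}


def scan(after):
    """Lazily yield (tableoid, pk) pairs greater than `after`, in order."""
    last_oid, last_pk = after
    for oid in sorted(partitions):
        if oid < last_oid:
            continue  # partition fully consumed by earlier batches
        keys = partitions[oid]
        # Only the partition we stopped in needs a key-based restart.
        start = bisect_right(keys, last_pk) if oid == last_oid else 0
        for i in range(start, len(keys)):
            yield (oid, keys[i])


def fetch_batch(after, batch_size):
    """Return at most batch_size pairs, reading no more rows than that."""
    return list(islice(scan(after), batch_size))


cursor, batches = (0, 0), []
while True:
    batch = fetch_batch(cursor, 4)
    if not batch:
        break
    batches.append(batch)
    cursor = batch[-1]  # (tableoid, pk) of the last row returned
```

From the client's point of view the cursor is just the last row's
`(tableoid, pk_col)` pair, so the loop has the same shape as for a
non-partitioned table.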

Currently, on PG 15 through PG 18, the planner does not handle ordering
by tableoid efficiently: ALL rows are fetched from each partition, then
appended, sorted, and limited.

Could we optimize the planner to handle `ORDER BY tableoid` efficiently
in this context?

Note: This problem primarily concerns hash and list partitioning, as
range partitioning can be batched efficiently by ordering on the
partition key itself.

Many thanks,
Emmanuel
------=_Part_8713931_1961726620.1769593729037--