public inbox for [email protected]  
help / color / mirror / Atom feed
POC: Comparison of partitioning key values
4+ messages / 3 participants
[nested] [flat]

* POC: Comparison of partitioning key values
@ 2026-04-13 21:11  John Mikk <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: John Mikk @ 2026-04-13 21:11 UTC (permalink / raw)
  To: [email protected]

Hackers, Hi!

When executing the SQL script, we discovered strange behavior — one of
the partitions cannot be created.


```sql
drop table if exists grid2 cascade;

create table grid2(x bigint, y bigint) partition by range (x,y);

create table g2_part1 partition of grid2 for values from (1,3) to (7,11);
create table g2_part2 partition of grid2 for values from (7,11) to (13,15);
create table g2_part3 partition of grid2 for values from (13,15) to (15,17);
create table g2_part4 partition of grid2 for values from (15,17) to (19,21);
--
create table g2_part5 partition of grid2 for values from (5,15) to (13,17);
-- [42P17] ERROR: partition "g2_part5" would overlap partition "g2_part1"
```

Why is that?

According to the documentation, the row comparison rule for tables applies
(subsection 9.25.5).
However, in the case of row comparison, it is possible to override
comparison operators, which is not the case when defining partitions.
There is no way to override the comparison operator for partition ranges.
And is this general approach always correct in the case of partitioning?

For example, the following approach seems quite valid:

If we look at the relative positions of the ranges for each partitioning
key (of a single table) on a plane, assuming that the from and to values of
the partitioning key in for values are points on the main diagonal of a
rectangle representing possible key values for the partition, then there
appear to be no obstacles to creating the last partition in the provided
SQL script.

Illustrative Python script:

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches

fig, ax = plt.subplots()

def key(point, radius=0.1, color='red', ax=ax):
circle = patches.Circle(point, radius=radius, color=color, alpha=1)
ax.add_patch(circle)
return

def range(lower, upper, label='', facecolor='blue', edgecolor='red',
alpha=0.5, ax=ax):
x1, y1 = lower
x2, y2 = upper

x_min, y_min = (min(x1, x2), min(y1, y2))
width, height = (abs(x2 - x1), abs(y2 - y1))

rectangle = patches.Rectangle(
(x_min, y_min), width, height,
facecolor=facecolor, edgecolor=edgecolor, alpha=alpha
)

ax.add_patch(rectangle)

if label:
text_x, text_y = (x_min, y_min)
ax.text(text_x, text_y, label, fontsize=8, color='white', ha='left',
va='bottom')

return rectangle

# Range, Key. -------------------------
range( (1, 3, ), (7, 11, ), 'g2_part1')
range( (7, 11, ), (13, 15, ), 'g2_part2')
range( (13, 15, ), (15, 17, ), 'g2_part3')
range( (15, 17, ), (19, 21, ), 'g2_part4')
range( (5, 15, ), (13, 17, ), 'g2_part5', 'black')

# Show. -------------------------------
plt.xlim(0, 25)
plt.ylim(0, 25)
plt.gca().set_aspect('equal')
plt.grid(True, alpha=0.3)
plt.show()
```

Let us provide a generalized reasoning:

Suppose we have a range-partitioned table with a partitioning key
consisting of n attributes.
Let us consider the from and to values of the partitioning key from the for
values clause as points on the main diagonal of an n-dimensional
parallelepiped of permissible key values for a specific partition.
Of course, our set is finite, but this perspective on the partitioning key
allows us to use a fact from multidimensional geometry:

Let the first parallelepiped be defined by the main diagonal points
A = (a_1, a_2, ..., a_n) and A' = (a_1', a_2', ..., a_n'), where a_i < a_i'
for all i,
and the second parallelepiped be defined by the main diagonal points
B = (b_1, b_2, ..., b_n) and B' = (b_1', b_2', ..., b_n'), where b_i < b_i'
for all i.

In this case, the following theorem (statement) holds true:

Two parallelepipeds do not intersect if and only if there exists at least
one coordinate k (from 1 to n) for which the projection intervals on this
axis do not intersect; that is, the intersection of [a_k, a_k'] and [b_k,
b_k'] is an empty set, or equivalently, the condition (a_k' < b_k) OR (b_k'
< a_k) holds for one of the k values.

In other words, this fact establishes: first, when two figures do not
intersect and are separated by a hyperplane x_k = const; and second, the
necessary and sufficient condition for the figures to intersect, given that
all projections intersect. The latter remains valid for a parallelepiped
reduced to a point and can determine whether a new key belongs to a
particular partition.

The experimental patch I am proposing (v1-0001-partition-by-range.patch)
introduces changes to the source code based on the described approach, so
that the partition from the example can be created.

Moreover, the algorithm for determining the partition of a new key during
an insert operation is even more mysterious in the current implementation;
my patch corrects this.

In the proposed patch, writing to output directly via fprintf is hardcoded.
This allows obtaining a plain text list of existing and added ranges after
each operation for the illustrative Python script.

A segmentation fault will occur during update and delete operations.

Or would adding an override for the range comparison operator be a more
flexible and correct approach?

I would like to hear your opinion, dear hackers!


Attachments:

  [text/x-patch] v1-0001-poc-partition-by-range.patch (14.3K, 3-v1-0001-poc-partition-by-range.patch)
  download | inline diff:
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d96d4f9947b..1093cb5044c 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1680,7 +1680,7 @@ get_partition_for_tuple(PartitionDispatch pd, const Datum *values, const bool *i
 				if (range_partkey_has_null)
 					break;
 
-				if (partdesc->last_found_count >= PARTITION_CACHED_FIND_THRESHOLD)
+				if (partdesc->last_found_count >= PARTITION_CACHED_FIND_THRESHOLD && false /* temporary disable */)
 				{
 					int			last_datum_offset = partdesc->last_found_datum_index;
 					Datum	   *lastDatums = boundinfo->datums[last_datum_offset];
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index f867d1b75a5..5b6099c7b3b 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -224,7 +224,8 @@ static int32 partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc,
 static int	partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
 									Oid *partcollation,
 									PartitionBoundInfo boundinfo,
-									PartitionRangeBound *probe, int32 *cmpval);
+									PartitionRangeBound *lower,
+									PartitionRangeBound *upper);
 static Expr *make_partition_op_expr(PartitionKey key, int keynum,
 									uint16 strategy, Expr *arg1, Expr *arg2);
 static Oid	get_partition_operator(PartitionKey key, int col,
@@ -677,8 +678,7 @@ create_range_bounds(PartitionBoundSpec **boundspecs, int nparts,
 {
 	PartitionBoundInfo boundinfo;
 	PartitionRangeBound **rbounds = NULL;
-	PartitionRangeBound **all_bounds,
-			   *prev;
+	PartitionRangeBound **all_bounds;
 	int			i,
 				k,
 				partnatts;
@@ -728,61 +728,12 @@ create_range_bounds(PartitionBoundSpec **boundspecs, int nparts,
 	Assert(ndatums == nparts * 2 ||
 		   (default_index != -1 && ndatums == (nparts - 1) * 2));
 
-	/* Sort all the bounds in ascending order */
-	qsort_arg(all_bounds, ndatums,
-			  sizeof(PartitionRangeBound *),
-			  qsort_partition_rbound_cmp,
-			  key);
-
-	/* Save distinct bounds from all_bounds into rbounds. */
 	rbounds = (PartitionRangeBound **)
 		palloc(ndatums * sizeof(PartitionRangeBound *));
 	k = 0;
-	prev = NULL;
 	for (i = 0; i < ndatums; i++)
 	{
-		PartitionRangeBound *cur = all_bounds[i];
-		bool		is_distinct = false;
-		int			j;
-
-		/* Is the current bound distinct from the previous one? */
-		for (j = 0; j < key->partnatts; j++)
-		{
-			Datum		cmpval;
-
-			if (prev == NULL || cur->kind[j] != prev->kind[j])
-			{
-				is_distinct = true;
-				break;
-			}
-
-			/*
-			 * If the bounds are both MINVALUE or MAXVALUE, stop now and treat
-			 * them as equal, since any values after this point must be
-			 * ignored.
-			 */
-			if (cur->kind[j] != PARTITION_RANGE_DATUM_VALUE)
-				break;
-
-			cmpval = FunctionCall2Coll(&key->partsupfunc[j],
-									   key->partcollation[j],
-									   cur->datums[j],
-									   prev->datums[j]);
-			if (DatumGetInt32(cmpval) != 0)
-			{
-				is_distinct = true;
-				break;
-			}
-		}
-
-		/*
-		 * Only if the bound is distinct save it into a temporary array, i.e,
-		 * rbounds which is later copied into boundinfo datums array.
-		 */
-		if (is_distinct)
 			rbounds[k++] = all_bounds[i];
-
-		prev = cur;
 	}
 
 	pfree(all_bounds);
@@ -3127,6 +3078,7 @@ check_new_partition_bound(char *relname, Relation parent,
 
 				if (partdesc->nparts > 0)
 				{
+					/* if key->partnatts == 1 or ? boundinfo->ndatums */
 					int			offset;
 
 					Assert(boundinfo &&
@@ -3152,68 +3104,10 @@ check_new_partition_bound(char *relname, Relation parent,
 					offset = partition_range_bsearch(key->partnatts,
 													 key->partsupfunc,
 													 key->partcollation,
-													 boundinfo, lower,
-													 &cmpval);
+													 boundinfo, lower, upper);
 
-					if (boundinfo->indexes[offset + 1] < 0)
-					{
-						/*
-						 * Check that the new partition will fit in the gap.
-						 * For it to fit, the new upper bound must be less
-						 * than or equal to the lower bound of the next
-						 * partition, if there is one.
-						 */
-						if (offset + 1 < boundinfo->ndatums)
-						{
-							Datum	   *datums;
-							PartitionRangeDatumKind *kind;
-							bool		is_lower;
-
-							datums = boundinfo->datums[offset + 1];
-							kind = boundinfo->kind[offset + 1];
-							is_lower = (boundinfo->indexes[offset + 1] == -1);
-
-							cmpval = partition_rbound_cmp(key->partnatts,
-														  key->partsupfunc,
-														  key->partcollation,
-														  datums, kind,
-														  is_lower, upper);
-							if (cmpval < 0)
-							{
-								/*
-								 * Point to problematic key in the upper
-								 * datums list.
-								 */
-								PartitionRangeDatum *datum =
-									list_nth(spec->upperdatums, abs(cmpval) - 1);
-
-								/*
-								 * The new partition overlaps with the
-								 * existing partition between offset + 1 and
-								 * offset + 2.
-								 */
-								overlap = true;
-								overlap_location = datum->location;
-								with = boundinfo->indexes[offset + 2];
-							}
-						}
-					}
-					else
-					{
-						/*
-						 * The new partition overlaps with the existing
-						 * partition between offset and offset + 1.
-						 */
-						PartitionRangeDatum *datum;
-
-						/*
-						 * Point to problematic key in the lower datums list;
-						 * if we have equality, point to the first one.
-						 */
-						datum = cmpval == 0 ? linitial(spec->lowerdatums) :
-							list_nth(spec->lowerdatums, abs(cmpval) - 1);
+					if (offset > -1) {
 						overlap = true;
-						overlap_location = datum->location;
 						with = boundinfo->indexes[offset + 1];
 					}
 				}
@@ -3646,34 +3540,133 @@ static int
 partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
 						Oid *partcollation,
 						PartitionBoundInfo boundinfo,
-						PartitionRangeBound *probe, int32 *cmpval)
+						PartitionRangeBound *lower,
+						PartitionRangeBound *upper)
 {
-	int			lo,
-				hi,
-				mid;
+	int offset = -1;
+	int d, partattr;
+	int32 cmpval1, cmpval2;
 
-	lo = -1;
-	hi = boundinfo->ndatums - 1;
-	while (lo < hi)
+	Datum *lower_datums, *upper_datums;
+
+	fprintf(stdout,"Func : partition_range_bsearch : \n");
+	fprintf(stdout,"\n# Partition by range (relevant for 2D partition key, for Python script) : \n");
+
+	/* New the partition by range */
+	partattr = 0;
+	fprintf(stdout,"range((");
+	while (partattr < partnatts) {
+		fprintf(stdout, "%lu, ", lower->datums[partattr]);
+		partattr++;
+	}
+	fprintf(stdout,"), ");
+
+	partattr = 0;
+	fprintf(stdout,"(");
+	while (partattr < partnatts) {
+		fprintf(stdout, "%lu, ", upper->datums[partattr]);
+		partattr++;
+	}
+	fprintf(stdout,"), 'new', facecolor='black', edgecolor='yellow')\n");
+
+	/* Existing the partition by range */
+	d = 0;
+	while ( d < boundinfo->ndatums - 1 )
 	{
-		mid = (lo + hi + 1) / 2;
-		*cmpval = partition_rbound_cmp(partnatts, partsupfunc,
-									   partcollation,
-									   boundinfo->datums[mid],
-									   boundinfo->kind[mid],
-									   (boundinfo->indexes[mid] == -1),
-									   probe);
-		if (*cmpval <= 0)
+		if ( boundinfo->indexes[d] == -1 && boundinfo->indexes[d+1] > -1 )
 		{
-			lo = mid;
-			if (*cmpval == 0)
-				break;
+			lower_datums = boundinfo->datums[d];
+			upper_datums = boundinfo->datums[d+1];
+
+			partattr = 0;
+			fprintf(stdout,"range((");
+			while (partattr < partnatts) {
+				fprintf(stdout, "%lu, ", lower_datums[partattr]);
+				partattr++;
+			}
+			fprintf(stdout,"), ");
+
+			partattr = 0;
+			fprintf(stdout,"(");
+			while (partattr < partnatts) {
+				fprintf(stdout, "%lu, ", upper_datums[partattr]);
+				partattr++;
+			}
+			fprintf(stdout,"),'p-%d')\n", boundinfo->indexes[d+1]);
+
+			d+=2;
 		}
 		else
-			hi = mid - 1;
+			d++;
 	}
+	fprintf(stdout,"\n");
+	fflush(stdout);
 
-	return lo;
+	/* Search algorithm */
+	d = 0;
+	while ( d < boundinfo->ndatums )
+	{
+		if ( (boundinfo->indexes[d] == -1) && (boundinfo->indexes[d+1] > -1) )
+		{
+			lower_datums = boundinfo->datums[d];
+			upper_datums = boundinfo->datums[d+1];
+
+			partattr = 0;
+			while (partattr < partnatts)
+			{
+				cmpval1 = DatumGetInt32(FunctionCall2Coll(&partsupfunc[partattr], partcollation[partattr],
+														 upper_datums[partattr], lower->datums[partattr]));
+
+				cmpval2 = DatumGetInt32(FunctionCall2Coll(&partsupfunc[partattr], partcollation[partattr],
+														 upper->datums[partattr], lower_datums[partattr]));
+
+				/* Step of the search algorithm */
+				/* datums_lower[partattr], datums_upper[partattr] ;
+				 * probe->datums[partattr], upper->datums[partattr];
+				 * a,a'; b,b' -> a_k' < b_k OR b_k' < a_k
+				 */
+				fprintf(stdout, "Range : Exists( %lu, %lu), New( %lu, %lu)\n",
+					lower_datums[partattr], upper_datums[partattr],
+					lower->datums[partattr], upper->datums[partattr]);
+				fprintf(stdout, "Condition: 1) %lu < %lu : result = %d or 2) %lu < %lu : result = %d; attr = %d; ",
+					upper_datums[partattr], lower->datums[partattr], cmpval1,
+					upper->datums[partattr], lower_datums[partattr], cmpval2,
+					partattr);
+
+				if ( cmpval1 < 1 || cmpval2 < 1 )
+				{
+					/* The partition range passed the check - there are no intersections
+					 * with the new partition range.
+					 * Then exit the loop with the value partnatts++
+					 */
+					partattr = partnatts;
+					fprintf(stdout, "[CHECK OK].");
+				}
+
+				partattr++;
+				fprintf(stdout,"\n");
+				fflush(stdout);
+			}
+
+			if (partattr == partnatts)
+			{
+				/* This condition means that in the while loop for all <partattr>,
+				 * there is an intersection of ranges across all partition key value.
+				 * Therefore, we can skip further partition range checks.
+				 */
+				offset = d;
+				d = boundinfo->ndatums + 3;
+			}
+			d+=2;
+		}
+		else
+			d++;
+	}
+
+	fprintf(stdout,"New the partition by range : %s.\n", (d == boundinfo->ndatums + 3) ? "ERROR" : "OK" );
+	fflush(stdout);
+
+	return offset;
 }
 
 /*
@@ -3689,36 +3682,103 @@ partition_range_datum_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
 							  PartitionBoundInfo boundinfo,
 							  int nvalues, const Datum *values, bool *is_equal)
 {
-	int			lo,
-				hi,
-				mid;
+	int offset = -1;
+	int d, partattr;
+	int32 cmpval1, cmpval2;
+
+	Datum *lower_datums, *upper_datums;
+	fprintf(stdout,"Func : partition_range_datum_bsearch : nvalues = %d\n", nvalues);
+	fprintf(stdout,"\n# Partition by range (relevant for 2D partition key, for Python script) : \n");
+
+	/* New the partition key */
+	partattr = 0;
+	fprintf(stdout,"key((");
+	while (partattr < nvalues) {
+		fprintf(stdout, "%lu, ", values[partattr]);
+		partattr++;
+	}
+	fprintf(stdout,"), 0.3)\n");
 
-	lo = -1;
-	hi = boundinfo->ndatums - 1;
-	while (lo < hi)
+	/* Existing the partition by range */
+	d = 0;
+	while ( d < boundinfo->ndatums - 1 )
 	{
-		int32		cmpval;
+		if ( boundinfo->indexes[d] == -1 && boundinfo->indexes[d+1] > -1 )
+		{
+			lower_datums = boundinfo->datums[d];
+			upper_datums = boundinfo->datums[d+1];
+
+			partattr = 0;
+			fprintf(stdout,"range((");
+			while (partattr < nvalues) {
+				fprintf(stdout, "%lu, ", lower_datums[partattr]);
+				partattr++;
+			}
+			fprintf(stdout,"), ");
 
-		mid = (lo + hi + 1) / 2;
-		cmpval = partition_rbound_datum_cmp(partsupfunc,
-											partcollation,
-											boundinfo->datums[mid],
-											boundinfo->kind[mid],
-											values,
-											nvalues);
-		if (cmpval <= 0)
+			partattr = 0;
+			fprintf(stdout,"(");
+			while (partattr < nvalues) {
+				fprintf(stdout, "%lu, ", upper_datums[partattr]);
+				partattr++;
+			}
+			fprintf(stdout,"),'p-%d')\n", boundinfo->indexes[d+1]);
+
+			d+=2;
+		}
+		else
+			d++;
+	}
+	fprintf(stdout,"\n");
+	fflush(stdout);
+
+	/* Search algorithm */
+	d = 0;
+	while ( d < boundinfo->ndatums )
+	{
+		if ( (boundinfo->indexes[d] == -1) && (boundinfo->indexes[d+1] > -1) )
 		{
-			lo = mid;
-			*is_equal = (cmpval == 0);
+			lower_datums = boundinfo->datums[d];
+			upper_datums = boundinfo->datums[d+1];
 
-			if (*is_equal)
-				break;
+			partattr = 0;
+			while (partattr < nvalues)
+			{
+				cmpval1 = DatumGetInt32(FunctionCall2Coll(&partsupfunc[partattr], partcollation[partattr],
+														 lower_datums[partattr], values[partattr]));
+
+				cmpval2 = DatumGetInt32(FunctionCall2Coll(&partsupfunc[partattr], partcollation[partattr],
+														 values[partattr], upper_datums[partattr]));
+
+				/* Step of the search algorithm */
+				fprintf(stdout, "Range : Exists(%lu, %lu), Key(%lu)\n",  lower_datums[partattr], upper_datums[partattr], values[partattr]);
+
+				if ( !(cmpval1 < 1 && cmpval2 < 1) )
+				{
+					partattr = nvalues;
+					fprintf(stdout, "[KEY OUT RANGE].");
+				}
+
+				partattr++;
+				fprintf(stdout,"\n");
+				fflush(stdout);
+			}
+
+			if (partattr == nvalues)
+			{
+				offset = d;
+				d = boundinfo->ndatums + 3;
+			}
+			d+=2;
 		}
 		else
-			hi = mid - 1;
+			d++;
 	}
 
-	return lo;
+	fprintf(stdout,"New partition key : %s.\n", (d == boundinfo->ndatums + 3) ? "OK" : "ERROR" );
+	fflush(stdout);
+
+	return offset;
 }
 
 /*
diff --git a/src/backend/partitioning/partdesc.c b/src/backend/partitioning/partdesc.c
index c3d275f8726..77ef116172c 100644
--- a/src/backend/partitioning/partdesc.c
+++ b/src/backend/partitioning/partdesc.c
@@ -71,7 +71,7 @@ PartitionDesc
 RelationGetPartitionDesc(Relation rel, bool omit_detached)
 {
 	Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
-
+	return RelationBuildPartitionDesc(rel, omit_detached);
 	/*
 	 * If relcache has a partition descriptor, use that.  However, we can only
 	 * do so when we are asked to include all partitions including detached;
@@ -307,9 +307,30 @@ retry:
 	 * Create PartitionBoundInfo and mapping, working in the caller's context.
 	 * This could fail, but we haven't done any damage if so.
 	 */
-	if (nparts > 0)
+	if (nparts > 0) {
 		boundinfo = partition_bounds_create(boundspecs, nparts, key, &mapping);
 
+		int d = 0, partattr;
+
+		fprintf(stdout,"BoundInfo : \n");
+		while ( d < boundinfo->ndatums )
+		{
+			/*cur_datum = boundinfo->datums[d];*/
+			fprintf(stdout,"Index = %d, Range (",boundinfo->indexes[d]);
+
+			partattr = 0;
+			while (partattr < key->partnatts)
+			{
+				fprintf(stdout," %lu, ", boundinfo->datums[d][partattr]);
+				partattr++;
+			}
+			fprintf(stdout,");\n");
+			fflush(stdout);
+
+			d++;
+		}
+	}
+
 	/*
 	 * Now build the actual relcache partition descriptor, copying all the
 	 * data into a new, small context.  As per above comment, we don't make


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: POC: Comparison of partitioning key values
@ 2026-04-14 05:19  David Rowley <[email protected]>
  parent: John Mikk <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: David Rowley @ 2026-04-14 05:19 UTC (permalink / raw)
  To: John Mikk <[email protected]>; +Cc: [email protected]

On Tue, 14 Apr 2026 at 09:11, John Mikk <[email protected]> wrote:
> ```sql
> drop table if exists grid2 cascade;
>
> create table grid2(x bigint, y bigint) partition by range (x,y);
>
> create table g2_part1 partition of grid2 for values from (1,3) to (7,11);
> create table g2_part2 partition of grid2 for values from (7,11) to (13,15);
> create table g2_part3 partition of grid2 for values from (13,15) to (15,17);
> create table g2_part4 partition of grid2 for values from (15,17) to (19,21);
> --
> create table g2_part5 partition of grid2 for values from (5,15) to (13,17);
> -- [42P17] ERROR: partition "g2_part5" would overlap partition "g2_part1"
> ```
>
> Why is that?

Because (5,15) is between (1,3) (inclusive) and (7,11) (non-inclusive)

> According to the documentation, the row comparison rule for tables applies (subsection 9.25.5).
> However, in the case of row comparison, it is possible to override comparison operators, which is not the case when defining partitions.
> There is no way to override the comparison operator for partition ranges. And is this general approach always correct in the case of partitioning?

You're free to create your own type and own btree opfamily, but I
don't see how you're going to tell it that (5,15) isn't above or equal
to (1,3), and separately, isn't below (7,11). This all works with the
notion of sorting on a single dimention, where "y" is the tiebreaker
for equal "x" values. There's just no way to map that into
2-dimentional space with table partitioning.

Note the part of the documentation in [1]:

"For example, given PARTITION BY RANGE (x,y), a partition bound FROM
(1, 2) TO (3, 4) allows x=1 with any y>=2, x=2 with any non-null y,
and x=3 with any y<4."

> Let us provide a generalized reasoning:
>
> Suppose we have a range-partitioned table with a partitioning key consisting of n attributes.
> Let us consider the from and to values of the partitioning key from the for values clause as points on the main diagonal of an n-dimensional parallelepiped of permissible key values for a specific partition.
> Of course, our set is finite, but this perspective on the partitioning key allows us to use a fact from multidimensional geometry:
>
> Let the first parallelepiped be defined by the main diagonal points
> A = (a_1, a_2, ..., a_n) and A' = (a_1', a_2', ..., a_n'), where a_i < a_i' for all i,
> and the second parallelepiped be defined by the main diagonal points
> B = (b_1, b_2, ..., b_n) and B' = (b_1', b_2', ..., b_n'), where b_i < b_i' for all i.
>
> In this case, the following theorem (statement) holds true:

So I think you must be thinking that another RANGE column adds another
dimention. That's not the case. It's still 1 dimention, you just have
more tiebreaker columns in the notion sorting by the partition bound.

> Two parallelepipeds do not intersect if and only if there exists at least one coordinate k (from 1 to n) for which the projection intervals on this axis do not intersect; that is, the intersection of [a_k, a_k'] and [b_k, b_k'] is an empty set, or equivalently, the condition (a_k' < b_k) OR (b_k' < a_k) holds for one of the k values.
>
> In other words, this fact establishes: first, when two figures do not intersect and are separated by a hyperplane x_k = const; and second, the necessary and sufficient condition for the figures to intersect, given that all projections intersect. The latter remains valid for a parallelepiped reduced to a point and can determine whether a new key belongs to a particular partition.
>
> The experimental patch I am proposing (v1-0001-partition-by-range.patch) introduces changes to the source code based on the described approach, so that the partition from the example can be created.
>
> Moreover, the algorithm for determining the partition of a new key during an insert operation is even more mysterious in the current implementation; my patch corrects this.

It uses a binary search to determine if the partition key columns in
the inserted tuple falls within a partition's bound. It's not clear to
me what exactly isn't correct about that.

> In the proposed patch, writing to output directly via fprintf is hardcoded. This allows obtaining a plain text list of existing and added ranges after each operation for the illustrative Python script.
>
> A segmentation fault will occur during update and delete operations.
>
> Or would adding an override for the range comparison operator be a more flexible and correct approach?
>
> I would like to hear your opinion, dear hackers!

You can't change how RANGE partitioning works and not break things for
everyone using RANGE partitioning when they upgrade. If your patch is
proposing that, then it's going to fail. If you're proposing a new
partitioning method, then that's different. It's still a hefty amount
of work. If you're proposing that then do a detailed proposal here
before doing too much work. Remember that with declarative
partitioning, there can be only (at most) a single partition for any
given tuple. The tuple routing done during INSERT and UPDATE requires
that. Finding the correct partition must also be fast as INSERT/UPDATE
performance needs to run that code for every affected tuple.

David

[1] https://www.postgresql.org/docs/current/sql-createtable.html





^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: POC: Comparison of partitioning key values
@ 2026-04-16 20:35  John Mikk <[email protected]>
  parent: David Rowley <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: John Mikk @ 2026-04-16 20:35 UTC (permalink / raw)
  To: David Rowley <[email protected]>; +Cc: [email protected]

Dear David, thank you for the detailed response.

I understand your concerns, so I have rethought my approach a bit and
would like to discuss,
in general terms, the concept that I would try to implement.

The ability to define a B-tree operator class (opclass) in the clause:
`PARTITION BY RANGE ( { column_name | ( expression ) } [ opclass ] [, ...] )`
allows partitioning over a fairly broad class of sets with a defined
order relation.
For example, one can obtain an elegant example using an extension
for ordinary fractions (p/q) represented as `row(p,q)` with a natural
B-tree operator class for the order relation of ordinary fractions:

```sql
drop table if exists axis cascade;

create table axis (
    id serial,
    key fraction,
    label text
) partition by range (key fraction_ops);

create table segment_1 partition of axis for values from
((0,1)::fraction) to ((1,3)::fraction);
create table segment_2 partition of axis for values from
((2,5)::fraction) to ((4,5)::fraction);
create table segment_3 partition of axis for values from
((15,45)::fraction) to ((2,5)::fraction);
-- segment_1,2,3 : [0, 1/3], [2/5, 4/5], [1/3, 2/5], where 15/45 == 1/3

insert into axis(key,label) select (1,5)::fraction, '1/5';
-- insert to segment_1
insert into axis(key,label) select (1,2)::fraction, '1/2';
-- insert to segment_2
insert into axis(key,label) select (1,3)::fraction, '1/3';
-- insert to segment_3
```

However, for multidimensional data structures where one desires
a multidimensional partitioning key using a B-tree, the necessary
ordering cannot be established.
It is easy to prove that when attempting to introduce
the concept of "to the left" (less than) / "to the right" (greater
than) for rectangles on a plane,
the transitivity of such a relation is violated.

To achieve the intended goal,
it would likely be necessary to use the GiST access method.
According to the documentation, however,
only B-tree is applicable when defining an operator class for a range
partitioning key.

**Proposal:** Make GiST available for partitioning in the `opclass`
clause of `PARTITION BY RANGE`.

John.


On Tue, Apr 14, 2026 at 8:19 AM David Rowley <[email protected]> wrote:

> On Tue, 14 Apr 2026 at 09:11, John Mikk <[email protected]> wrote:
> ...
> You can't change how RANGE partitioning works and not break things for
> everyone using RANGE partitioning when they upgrade. If your patch is
> proposing that, then it's going to fail. If you're proposing a new
> partitioning method, then that's different. It's still a hefty amount
> of work. If you're proposing that then do a detailed proposal here
> before doing too much work. Remember that with declarative
> partitioning, there can be only (at most) a single partition for any
> given tuple. The tuple routing done during INSERT and UPDATE requires
> that. Finding the correct partition must also be fast as INSERT/UPDATE
> performance needs to run that code for every affected tuple.
>
> David
>
> [1] https://www.postgresql.org/docs/current/sql-createtable.html
>


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: POC: Comparison of partitioning key values
@ 2026-04-20 04:49  Amit Langote <[email protected]>
  parent: John Mikk <[email protected]>
  0 siblings, 0 replies; 4+ messages in thread

From: Amit Langote @ 2026-04-20 04:49 UTC (permalink / raw)
  To: John Mikk <[email protected]>; +Cc: David Rowley <[email protected]>; [email protected]

Hi John,

On Fri, Apr 17, 2026 at 5:35 AM John Mikk <[email protected]> wrote:
>
> Dear David, thank you for the detailed response.
>
> I understand your concerns, so I have rethought my approach a bit and would like to discuss,
> in general terms, the concept that I would try to implement.
>
> The ability to define a B-tree operator class (opclass) in the clause:
> `PARTITION BY RANGE ( { column_name | ( expression ) } [ opclass ] [, ...] )`
> allows partitioning over a fairly broad class of sets with a defined order relation.
> For example, one can obtain an elegant example using an extension
> for ordinary fractions (p/q) represented as `row(p,q)` with a natural
> B-tree operator class for the order relation of ordinary fractions:
>
> ```sql
> drop table if exists axis cascade;
>
> create table axis (
>     id serial,
>     key fraction,
>     label text
> ) partition by range (key fraction_ops);
>
> create table segment_1 partition of axis for values from ((0,1)::fraction) to ((1,3)::fraction);
> create table segment_2 partition of axis for values from ((2,5)::fraction) to ((4,5)::fraction);
> create table segment_3 partition of axis for values from ((15,45)::fraction) to ((2,5)::fraction);
> -- segment_1,2,3 : [0, 1/3], [2/5, 4/5], [1/3, 2/5], where 15/45 == 1/3
>
> insert into axis(key,label) select (1,5)::fraction, '1/5';
> -- insert to segment_1
> insert into axis(key,label) select (1,2)::fraction, '1/2';
> -- insert to segment_2
> insert into axis(key,label) select (1,3)::fraction, '1/3';
> -- insert to segment_3
> ```
>
> However, for multidimensional data structures where one desires
> a multidimensional partitioning key using a B-tree, the necessary ordering cannot be established.
> It is easy to prove that when attempting to introduce
> the concept of "to the left" (less than) / "to the right" (greater than) for rectangles on a plane,
> the transitivity of such a relation is violated.
>
> To achieve the intended goal,
> it would likely be necessary to use the GiST access method.
> According to the documentation, however,
> only B-tree is applicable when defining an operator class for a range partitioning key.
>
> **Proposal:** Make GiST available for partitioning in the `opclass` clause of `PARTITION BY RANGE`.

Thanks for the follow-up and the fraction example. The fraction case
already works under RANGE today, as you show, and nothing needs to
change there, or at least that's how I read that part.

I agree with David that the grid case wants a new strategy rather than
a reinterpretation of RANGE, and I'd push back on making GiST
available in RANGE's opclass slot specifically.  The GiST opclasses
you'd actually want for this use case describe overlap and
containment, not order, so applying the RANGE machinery here seems
wrong to me. PARTITION BY RANGE has been around for about ~9 years and
most users understand what it means, so making it accept GiST
opclasses would be more confusing than helpful. But the fact that you
reached for GiST tells me the grid case wants something spatial, and
I've sometimes wondered whether something along the lines of PARTITION
BY BOX for rectangular partitioning could work as a new strategy. The
per-dimension non-overlap check you describe would cover partition
creation. The same logic applies to tuple routing and pruning, though
the algorithms and data structures would require careful design to
scale with many partitions.

Part of what got me thinking about BOX partitioning is the growth of
pgvector, where smaller indexes per partition could help when index
design hits scaling limits. Whether the decompositions there would
actually look like boxes is another question that I haven't studied
very deeply, but it's one more reason to think about spatial
partitioning strategies.

-- 
Thanks, Amit Langote





^ permalink  raw  reply  [nested|flat] 4+ messages in thread


end of thread, other threads:[~2026-04-20 04:49 UTC | newest]

Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-04-13 21:11 POC: Comparison of partitioning key values John Mikk <[email protected]>
2026-04-14 05:19 ` David Rowley <[email protected]>
2026-04-16 20:35   ` John Mikk <[email protected]>
2026-04-20 04:49     ` Amit Langote <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox