Implement missing join selectivity estimation for range types

* Implement missing join selectivity estimation for range types
@ 2022-06-30 14:31 Mahmoud Sakr <[email protected]>
  2024-01-05 10:37 ` Re: Implement missing join selectivity estimation for range types vignesh C <[email protected]>
  0 siblings, 1 reply; 16+ messages in thread

From: Mahmoud Sakr @ 2022-06-30 14:31 UTC (permalink / raw)
  To: PostgreSQL Hackers <[email protected]>; +Cc: SCHOEMANS Maxime <[email protected]>; Diogo Repas <[email protected]>; Luo Zhicheng <[email protected]>; Andrey Lepikhov <[email protected]>

Hi,
Given a query:
SELECT * FROM t1, t2 WHERE t1.r << t2.r
where t1.r and t2.r are of a range type,
PostgreSQL currently estimates a constant selectivity of 0.005 for the <<
predicate and does not use the statistics that the optimizer collects for
range attributes.
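
To illustrate what a fixed selectivity implies, here is a minimal sketch. It assumes the planner's inner-join row estimate is roughly the selectivity times the product of the two input row counts. The helper name and the table sizes are hypothetical; 0.005 is the constant quoted above.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of the patch):
 * estimated join rows = selectivity * rows1 * rows2.
 */
static double
estimated_join_rows(double selectivity, double rows1, double rows2)
{
    assert(selectivity >= 0.0 && selectivity <= 1.0);
    return selectivity * rows1 * rows2;
}
```

With two 1000-row tables and a selectivity of 0.005, this gives about 5000 rows, whatever the ranges actually contain.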

We have worked out a theory for inequality join selectivity estimation
(http://arxiv.org/abs/2206.07396), and implemented it for range
types in this patch.
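
For reference, the estimator can be summarized as follows. This restates the derivation given in the patch's comment on calc_hist_join_selectivity. The notation b_0 < b_1 < ... for the merged bounds of the two histograms is introduced here for illustration only, and the last equality holds under the histogram model, where F_X is linear and f_Y is constant between consecutive bounds.

```latex
\begin{align*}
P(X < Y) &= \int_{-\infty}^{\infty} \int_{-\infty}^{y} f_X(x)\, f_Y(y)\, dx\, dy
          = \int_{-\infty}^{\infty} F_X(y)\, f_Y(y)\, dy \\
         &= \frac{1}{2} \sum_{k \ge 1}
            \bigl( F_X(b_{k-1}) + F_X(b_k) \bigr)
            \bigl( F_Y(b_k) - F_Y(b_{k-1}) \bigr)
\end{align*}
```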

The algorithm in this patch reuses the statistics currently collected for
range types, namely the bounds histogram. It is fairly accurate for the
operators <<, >>, &&, &<, &>, <=, >=, with an estimation error of about 0.5%.
The patch also implements selectivity estimation for the operators @> and <@
(contains and is contained in), but their accuracy is not stable, since the
bounds histograms assume independence between the range bounds. A point to
discuss is whether or not to keep these last two operators.
The patch also includes selectivity estimation for multirange types,
treating a multirange as a single range, namely its bounding box.
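
The idea can be sketched on plain numbers. This is a simplified illustration, not the patch's code. It assumes equi-depth histograms stored as sorted arrays of doubles rather than RangeBound values. hist_cdf is a hypothetical stand-in for calc_hist_selectivity_scalar, and the merge walks all bounds of both histograms instead of fast-forwarding as the patch does.

```c
#include <assert.h>

/*
 * Fraction of the distribution described by the equi-depth histogram
 * hist[0..n-1] that lies below v, interpolating linearly inside a bin.
 * Hypothetical stand-in for calc_hist_selectivity_scalar().
 */
static double
hist_cdf(const double *hist, int n, double v)
{
    int i;

    assert(n >= 2);
    if (v <= hist[0])
        return 0.0;
    if (v >= hist[n - 1])
        return 1.0;
    for (i = 0; hist[i + 1] < v; i++)
        ;
    /* here hist[i] < v <= hist[i + 1], so the bin width is positive */
    return (i + (v - hist[i]) / (hist[i + 1] - hist[i])) / (n - 1);
}

/* Estimate P(X < Y) by the trapezoid rule over the merged bounds. */
static double
join_sel_less(const double *h1, int n1, const double *h2, int n2)
{
    int i = 0, j = 0;
    double sum = 0.0, prev1 = 0.0, prev2 = 0.0;

    while (i < n1 || j < n2)
    {
        double sync, cur1, cur2;

        /* take the next bound in the merged order of both histograms */
        if (j >= n2 || (i < n1 && h1[i] <= h2[j]))
            sync = h1[i++];
        else
            sync = h2[j++];

        cur1 = hist_cdf(h1, n1, sync);
        cur2 = hist_cdf(h2, n2, sync);
        sum += (prev1 + cur1) * (cur2 - prev2);
        prev1 = cur1;
        prev2 = cur2;
    }
    return sum / 2.0;
}

/* P(A && B) = 1 - P(A.upper < B.lower) - P(B.upper < A.lower) */
static double
overlap_sel(const double *lo1, const double *up1, int n1,
            const double *lo2, const double *up2, int n2)
{
    return 1.0 - join_sel_less(up1, n1, lo2, n2)
               - join_sel_less(up2, n2, lo1, n1);
}
```

For X described by the histogram {0, 1} and Y by {0, 2}, join_sel_less returns 0.75: half of Y lies above every X, and the other half is below X half of the time. The overlap estimate follows the same decomposition that the patch uses for &&.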

The algorithm in this patch is also applicable to inequality joins of scalar
types. However, we do not implement it for scalars, since more work is needed
to make use of the other statistics available for scalars, such as the MCV
list. This is left as future work.

--
Mahmoud SAKR - Université Libre de Bruxelles
This work is done by Diogo Repas, Zhicheng Luo, Maxime Schoemans, and myself


Attachments:

  [text/x-patch] v1-0001-Join-Selectivity-Estimation-for-Range-types.patch (74.1K, 2-v1-0001-Join-Selectivity-Estimation-for-Range-types.patch)
diff --git a/src/backend/utils/adt/multirangetypes_selfuncs.c b/src/backend/utils/adt/multirangetypes_selfuncs.c
index 919c8889d4..7ba4aa8b04 100644
--- a/src/backend/utils/adt/multirangetypes_selfuncs.c
+++ b/src/backend/utils/adt/multirangetypes_selfuncs.c
@@ -1335,3 +1335,511 @@ calc_hist_selectivity_contains(TypeCacheEntry *typcache,
 
 	return sum_frac;
 }
+
+/*
+ * This function is a copy of the function with the same name in
+ * rangetypes_selfuncs.c, with the only difference that the types are
+ * multiranges
+ *
+ */
+static double
+calc_hist_join_selectivity(TypeCacheEntry *typcache,
+						   const RangeBound *hist1, int nhist1,
+						   const RangeBound *hist2, int nhist2)
+{
+	int			i,
+				j;
+	double		selectivity,
+				cur_sel1,
+				cur_sel2,
+				prev_sel1,
+				prev_sel2;
+	RangeBound	cur_sync;
+
+	/*
+	 * Histograms will never be empty. In fact, a histogram will never have
+	 * less than 2 values (1 bin)
+	 */
+	Assert(nhist1 > 1);
+	Assert(nhist2 > 1);
+
+	/* Fast-forward i and j to the start of the iteration, staying in bounds */
+	for (i = 0; i < nhist1 - 1 && range_cmp_bound_values(typcache, &hist1[i], &hist2[0]) < 0; i++);
+	for (j = 0; j < nhist2 - 1 && range_cmp_bound_values(typcache, &hist2[j], &hist1[0]) < 0; j++);
+
+	if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) < 0)
+		cur_sync = hist1[i++];
+	else if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) > 0)
+		cur_sync = hist2[j++];
+	else
+	{
+		/* If equal, skip one */
+		cur_sync = hist1[i];
+		i++;
+		j++;
+	}
+	prev_sel1 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+											 hist1, nhist1, false);
+	prev_sel2 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+											 hist2, nhist2, false);
+
+	/*
+	 * Do the estimation on overlapping region
+	 */
+	selectivity = 0.0;
+	while (i < nhist1 && j < nhist2)
+	{
+		if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) < 0)
+			cur_sync = hist1[i++];
+		else if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) > 0)
+			cur_sync = hist2[j++];
+		else
+		{
+			/* If equal, skip one */
+			cur_sync = hist1[i];
+			i++;
+			j++;
+		}
+		cur_sel1 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist1, nhist1, false);
+		cur_sel2 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist2, nhist2, false);
+
+		selectivity += (prev_sel1 + cur_sel1) * (cur_sel2 - prev_sel2);
+
+		/* Prepare for the next iteration */
+		prev_sel1 = cur_sel1;
+		prev_sel2 = cur_sel2;
+	}
+
+	/* Remainder of hist2, if any: F_X = 1 there; doubled as we halve below */
+	if (j < nhist2)
+		selectivity += 2 * (1 - prev_sel2);
+
+	return selectivity / 2;
+}
+
+/*
+ * multirangejoinsel -- join selectivity for multirange operators
+ */
+Datum
+multirangejoinsel(PG_FUNCTION_ARGS)
+{
+	PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+	Oid			operator = PG_GETARG_OID(1);
+	List	   *args = (List *) PG_GETARG_POINTER(2);
+	SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) PG_GETARG_POINTER(4);
+	VariableStatData vardata1,
+				vardata2;
+	AttStatsSlot hist1,
+				hist2,
+				sslot;
+	bool		reversed;
+	Selectivity selec;
+	TypeCacheEntry *typcache = NULL,
+			   *rng_typcache = NULL;
+	Form_pg_statistic stats1,
+				stats2;
+	double		empty_frac1,
+				empty_frac2,
+				null_frac1,
+				null_frac2;
+	int			nhist1,
+				nhist2;
+	RangeBound *hist1_lower,
+			   *hist1_upper,
+			   *hist2_lower,
+			   *hist2_upper;
+	bool		empty;
+	int			i;
+
+	get_join_variables(root, args, sjinfo, &vardata1, &vardata2, &reversed);
+
+	selec = default_multirange_selectivity(operator);
+
+	/* get multirange type cache */
+	if (type_is_multirange(vardata1.vartype))
+		typcache = multirange_get_typcache(fcinfo, vardata1.vartype);
+	else if (type_is_multirange(vardata2.vartype))
+		typcache = multirange_get_typcache(fcinfo, vardata2.vartype);
+
+	if (HeapTupleIsValid(vardata1.statsTuple) &&
+		get_attstatsslot(&hist1, vardata1.statsTuple,
+						 STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						 ATTSTATSSLOT_VALUES) &&
+		HeapTupleIsValid(vardata2.statsTuple) &&
+		get_attstatsslot(&hist2, vardata2.statsTuple,
+						 STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						 ATTSTATSSLOT_VALUES) &&
+		typcache)
+	{
+
+		/* Initialize underlying range type cache */
+		rng_typcache = typcache->rngtype;
+
+		/*
+		 * First look up the fraction of NULLs and empty ranges from
+		 * pg_statistic.
+		 */
+		stats1 = (Form_pg_statistic) GETSTRUCT(vardata1.statsTuple);
+		stats2 = (Form_pg_statistic) GETSTRUCT(vardata2.statsTuple);
+
+		null_frac1 = stats1->stanullfrac;
+		null_frac2 = stats2->stanullfrac;
+
+		/* Try to get fraction of empty ranges for the first variable */
+		if (get_attstatsslot(&sslot, vardata1.statsTuple,
+							 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+							 InvalidOid,
+							 ATTSTATSSLOT_NUMBERS))
+		{
+			if (sslot.nnumbers != 1)	/* shouldn't happen */
+				elog(ERROR, "invalid empty fraction statistic");
+			empty_frac1 = sslot.numbers[0];
+			free_attstatsslot(&sslot);
+		}
+		else
+		{
+			/* No empty fraction statistic. Assume no empty ranges. */
+			empty_frac1 = 0.0;
+		}
+
+		/* Try to get fraction of empty ranges for the second variable */
+		if (get_attstatsslot(&sslot, vardata2.statsTuple,
+							 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+							 InvalidOid,
+							 ATTSTATSSLOT_NUMBERS))
+		{
+			if (sslot.nnumbers != 1)	/* shouldn't happen */
+				elog(ERROR, "invalid empty fraction statistic");
+			empty_frac2 = sslot.numbers[0];
+			free_attstatsslot(&sslot);
+		}
+		else
+		{
+			/* No empty fraction statistic. Assume no empty ranges. */
+			empty_frac2 = 0.0;
+		}
+
+		/*
+		 * Convert histograms of ranges into histograms of their lower and
+		 * upper bounds for the first variable.
+		 */
+		nhist1 = hist1.nvalues;
+		hist1_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+		hist1_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+		for (i = 0; i < nhist1; i++)
+		{
+			range_deserialize(rng_typcache, DatumGetRangeTypeP(hist1.values[i]),
+							  &hist1_lower[i], &hist1_upper[i], &empty);
+			/* The histogram should not contain any empty ranges */
+			if (empty)
+				elog(ERROR, "bounds histogram contains an empty range");
+		}
+
+		/*
+		 * Convert histograms of ranges into histograms of their lower and
+		 * upper bounds for the second variable.
+		 */
+		nhist2 = hist2.nvalues;
+		hist2_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+		hist2_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+		for (i = 0; i < nhist2; i++)
+		{
+			range_deserialize(rng_typcache, DatumGetRangeTypeP(hist2.values[i]),
+							  &hist2_lower[i], &hist2_upper[i], &empty);
+			/* The histogram should not contain any empty ranges */
+			if (empty)
+				elog(ERROR, "bounds histogram contains an empty range");
+		}
+
+		switch (operator)
+		{
+			case OID_MULTIRANGE_OVERLAPS_MULTIRANGE_OP:
+			case OID_MULTIRANGE_OVERLAPS_RANGE_OP:
+			case OID_RANGE_OVERLAPS_MULTIRANGE_OP:
+
+				/*
+				 * Selectivity of A && B = Selectivity of NOT( A << B || A >>
+				 * B ) = 1 - Selectivity of (A.upper < B.lower) - Selectivity
+				 * of (B.upper < A.lower)
+				 */
+				selec = 1;
+				selec -= calc_hist_join_selectivity(rng_typcache,
+													hist1_upper, nhist1,
+													hist2_lower, nhist2);
+				selec -= calc_hist_join_selectivity(rng_typcache,
+													hist2_upper, nhist2,
+													hist1_lower, nhist1);
+				break;
+
+			case OID_MULTIRANGE_LESS_EQUAL_OP:
+
+				/*
+				 * A <= B
+				 *
+				 * Start by comparing lower bounds and if they are equal
+				 * compare upper bounds
+				 *
+				 * Negation of OID_MULTIRANGE_GREATER_OP.
+				 *
+				 * Overestimate by comparing only the lower bounds. Higher
+				 * accuracy would require us to subtract P(lower1 = lower2) *
+				 * P(upper1 > upper2)
+				 */
+				selec = 1 - calc_hist_join_selectivity(rng_typcache,
+													   hist2_lower, nhist2,
+													   hist1_lower, nhist1);
+				break;
+
+			case OID_MULTIRANGE_LESS_OP:
+
+				/*
+				 * A < B
+				 *
+				 * Start by comparing lower bounds and if they are equal
+				 * compare upper bounds
+				 *
+				 * Underestimate by comparing only the lower bounds. Higher
+				 * accuracy would require us to add P(lower1 = lower2) *
+				 * P(upper1 < upper2)
+				 */
+				selec = calc_hist_join_selectivity(rng_typcache,
+												   hist1_lower, nhist1,
+												   hist2_lower, nhist2);
+				break;
+
+			case OID_MULTIRANGE_GREATER_EQUAL_OP:
+
+				/*
+				 * A >= B
+				 *
+				 * Start by comparing lower bounds and if they are equal
+				 * compare upper bounds
+				 *
+				 * Negation of OID_MULTIRANGE_LESS_OP.
+				 *
+				 * Overestimate by comparing only the lower bounds. Higher
+				 * accuracy would require us to subtract P(lower1 = lower2) *
+				 * P(upper1 < upper2)
+				 */
+				selec = 1 - calc_hist_join_selectivity(rng_typcache,
+													   hist1_lower, nhist1,
+													   hist2_lower, nhist2);
+				break;
+
+			case OID_MULTIRANGE_GREATER_OP:
+
+				/*
+				 * A > B == B < A
+				 *
+				 * Start by comparing lower bounds and if they are equal
+				 * compare upper bounds
+				 *
+				 * Underestimate by comparing only the lower bounds. Higher
+				 * accuracy would require us to add P(lower1 = lower2) *
+				 * P(upper1 > upper2)
+				 */
+				selec = calc_hist_join_selectivity(rng_typcache,
+												   hist2_lower, nhist2,
+												   hist1_lower, nhist1);
+				break;
+
+			case OID_MULTIRANGE_LEFT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_LEFT_RANGE_OP:
+			case OID_RANGE_LEFT_MULTIRANGE_OP:
+				/* var1 << var2 when upper(var1) < lower(var2) */
+				selec = calc_hist_join_selectivity(rng_typcache,
+												   hist1_upper, nhist1,
+												   hist2_lower, nhist2);
+				break;
+
+			case OID_MULTIRANGE_RIGHT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_RIGHT_RANGE_OP:
+			case OID_RANGE_RIGHT_MULTIRANGE_OP:
+				/* var1 >> var2 when upper(var2) < lower(var1) */
+				selec = calc_hist_join_selectivity(rng_typcache,
+												   hist2_upper, nhist2,
+												   hist1_lower, nhist1);
+				break;
+
+			case OID_MULTIRANGE_OVERLAPS_LEFT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_OVERLAPS_LEFT_RANGE_OP:
+			case OID_RANGE_OVERLAPS_LEFT_MULTIRANGE_OP:
+				/* var1 &< var2 when upper(var1) < upper(var2) */
+				selec = calc_hist_join_selectivity(rng_typcache,
+												   hist1_upper, nhist1,
+												   hist2_upper, nhist2);
+				break;
+
+			case OID_MULTIRANGE_OVERLAPS_RIGHT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_OVERLAPS_RIGHT_RANGE_OP:
+			case OID_RANGE_OVERLAPS_RIGHT_MULTIRANGE_OP:
+				/* var1 &> var2 when lower(var2) < lower(var1) */
+				selec = calc_hist_join_selectivity(rng_typcache,
+												   hist2_lower, nhist2,
+												   hist1_lower, nhist1);
+				break;
+
+			case OID_MULTIRANGE_MULTIRANGE_CONTAINED_OP:
+			case OID_MULTIRANGE_RANGE_CONTAINED_OP:
+			case OID_RANGE_MULTIRANGE_CONTAINED_OP:
+
+				/*
+				 * var1 <@ var2 is equivalent to lower(var2) <= lower(var1)
+				 * and upper(var1) <= upper(var2)
+				 *
+				 * After negating both sides we get not( lower(var1) <
+				 * lower(var2) ) and not( upper(var2) < upper(var1) ),
+				 * respectively. Assuming independence, multiply both
+				 * selectivities.
+				 */
+				selec = 1 - calc_hist_join_selectivity(rng_typcache,
+													   hist1_lower, nhist1,
+													   hist2_lower, nhist2);
+				selec *= 1 - calc_hist_join_selectivity(rng_typcache,
+														hist2_upper, nhist2,
+														hist1_upper, nhist1);
+				break;
+
+			case OID_MULTIRANGE_CONTAINS_MULTIRANGE_OP:
+			case OID_MULTIRANGE_CONTAINS_RANGE_OP:
+			case OID_RANGE_CONTAINS_MULTIRANGE_OP:
+
+				/*
+				 * var1 @> var2 is equivalent to lower(var1) <= lower(var2)
+				 * and upper(var2) <= upper(var1)
+				 *
+				 * After negating both sides we get not( lower(var2) <
+				 * lower(var1) ) and not( upper(var1) < upper(var2) ),
+				 * respectively. Assuming independence, multiply both
+				 * selectivities.
+				 */
+				selec = 1 - calc_hist_join_selectivity(rng_typcache,
+													   hist2_lower, nhist2,
+													   hist1_lower, nhist1);
+				selec *= 1 - calc_hist_join_selectivity(rng_typcache,
+														hist1_upper, nhist1,
+														hist2_upper, nhist2);
+				break;
+
+			case OID_MULTIRANGE_ADJACENT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_ADJACENT_RANGE_OP:
+			case OID_RANGE_ADJACENT_MULTIRANGE_OP:
+
+				/*
+				 * just punt for now, estimation would require equality
+				 * selectivity for bounds
+				 */
+			case OID_MULTIRANGE_CONTAINS_ELEM_OP:
+			case OID_MULTIRANGE_ELEM_CONTAINED_OP:
+
+				/*
+				 * just punt for now, estimation would require extraction of
+				 * histograms for the anyelement
+				 */
+			default:
+				break;
+		}
+
+
+		/* the calculated selectivity only applies to non-empty (multi)ranges */
+		selec *= (1 - empty_frac1) * (1 - empty_frac2);
+
+		/*
+		 * Depending on the operator, empty (multi)ranges might match
+		 * different fractions of the result.
+		 */
+		switch (operator)
+		{
+			case OID_MULTIRANGE_LESS_OP:
+
+				/*
+				 * empty (multi)range < non-empty (multi)range
+				 */
+				selec += empty_frac1 * (1 - empty_frac2);
+				break;
+
+			case OID_MULTIRANGE_GREATER_OP:
+
+				/*
+				 * non-empty (multi)range > empty (multi)range
+				 */
+				selec += (1 - empty_frac1) * empty_frac2;
+				break;
+
+			case OID_MULTIRANGE_MULTIRANGE_CONTAINED_OP:
+			case OID_MULTIRANGE_RANGE_CONTAINED_OP:
+			case OID_RANGE_MULTIRANGE_CONTAINED_OP:
+
+				/*
+				 * empty (multi)range <@ any (multi)range
+				 */
+			case OID_MULTIRANGE_LESS_EQUAL_OP:
+
+				/*
+				 * empty (multi)range <= any (multi)range
+				 */
+				selec += empty_frac1;
+				break;
+
+			case OID_MULTIRANGE_CONTAINS_MULTIRANGE_OP:
+			case OID_MULTIRANGE_CONTAINS_RANGE_OP:
+			case OID_RANGE_CONTAINS_MULTIRANGE_OP:
+
+				/*
+				 * any (multi)range @> empty (multi)range
+				 */
+			case OID_MULTIRANGE_GREATER_EQUAL_OP:
+
+				/*
+				 * any (multi)range >= empty (multi)range
+				 */
+				selec += empty_frac2;
+				break;
+
+			case OID_MULTIRANGE_CONTAINS_ELEM_OP:
+			case OID_MULTIRANGE_ELEM_CONTAINED_OP:
+			case OID_MULTIRANGE_OVERLAPS_MULTIRANGE_OP:
+			case OID_MULTIRANGE_OVERLAPS_RANGE_OP:
+			case OID_RANGE_OVERLAPS_MULTIRANGE_OP:
+			case OID_MULTIRANGE_OVERLAPS_LEFT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_OVERLAPS_LEFT_RANGE_OP:
+			case OID_RANGE_OVERLAPS_LEFT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_OVERLAPS_RIGHT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_OVERLAPS_RIGHT_RANGE_OP:
+			case OID_RANGE_OVERLAPS_RIGHT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_LEFT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_LEFT_RANGE_OP:
+			case OID_RANGE_LEFT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_RIGHT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_RIGHT_RANGE_OP:
+			case OID_RANGE_RIGHT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_ADJACENT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_ADJACENT_RANGE_OP:
+			case OID_RANGE_ADJACENT_MULTIRANGE_OP:
+			default:
+
+				/*
+				 * these operators always return false when an empty
+				 * (multi)range is involved
+				 */
+				break;
+
+		}
+
+		/* all range operators are strict */
+		selec *= (1 - null_frac1) * (1 - null_frac2);
+
+		free_attstatsslot(&hist1);
+		free_attstatsslot(&hist2);
+	}
+
+	ReleaseVariableStats(vardata1);
+	ReleaseVariableStats(vardata2);
+
+	CLAMP_PROBABILITY(selec);
+
+	PG_RETURN_FLOAT8((float8) selec);
+
+}
diff --git a/src/backend/utils/adt/rangetypes_selfuncs.c b/src/backend/utils/adt/rangetypes_selfuncs.c
index c2795f4593..007e14bcf6 100644
--- a/src/backend/utils/adt/rangetypes_selfuncs.c
+++ b/src/backend/utils/adt/rangetypes_selfuncs.c
@@ -1221,3 +1221,509 @@ calc_hist_selectivity_contains(TypeCacheEntry *typcache,
 
 	return sum_frac;
 }
+
+/*
+ * This is a utility function used to estimate the join selectivity of
+ * range attributes using rangebound histogram statistics as described
+ * in this paper:
+ *
+ * Diogo Repas, Zhicheng Luo, Maxime Schoemans and Mahmoud Sakr, 2022
+ * Selectivity Estimation of Inequality Joins In Databases
+ * https://doi.org/10.48550/arXiv.2206.07396
+ *
+ * The attributes being joined will be treated as random variables
+ * that follow a distribution modeled by a Probability Density Function (PDF).
+ * Let the two attributes be denoted X, Y.
+ * This function finds the probability P(X < Y).
+ * Note that the PDFs of the two variables can easily be obtained
+ * from their bounds histograms, respectively hist1 and hist2.
+ *
+ * Let the PDF of X, Y be denoted as f_X, f_Y.
+ * The probability P(X < Y) can be formalized as follows:
+ * P(X < Y) = integral_-inf^inf( integral_-inf^y ( f_X(x) * f_Y(y) dx dy ) )
+ *          = integral_-inf^inf( F_X(y) * f_Y(y) dy )
+ * where F_X(y) denotes the Cumulative Distribution Function of X at y.
+ * Note that F_X is the selectivity estimation (non-join),
+ * which is implemented using the function calc_hist_selectivity_scalar.
+ *
+ * Now given the histograms of the two attributes X, Y, we note the following:
+ * - The PDF of Y is a step function
+ *	(constant piece-wise, where each piece is defined in a bin of Y's histogram)
+ * - The CDF of X is linear piece-wise
+ * 	(each piece is defined in a bin of X's histogram)
+ * This leads to the conclusion that their product
+ * (used to calculate the equation above) is also linear piece-wise.
+ * A new piece starts whenever either the bin of X or the bin of Y changes.
+ * By parallel scanning the two rangebound histograms of X and Y,
+ * we evaluate one piece of the result between every two consecutive rangebounds
+ * in the union of the two histograms.
+ *
+ * Given that the product F_X * f_Y is linear in the interval
+ * between every two consecutive rangebounds, let them be denoted prev, cur,
+ * it can be shown that the above formula can be discretized into the following:
+ * P(X < Y) =
+ *   0.5 * sum_0^{n+m-1} ( ( F_X(prev) + F_X(cur) ) * ( F_Y(cur) - F_Y(prev) ) )
+ * where n, m are the lengths of the two histograms.
+ *
+ * As such, it is possible to fully compute the join selectivity
+ * as a summation of CDFs, iterating over the bounds of the two histograms.
+ * This maximizes the code reuse, since the CDF is computed using
+ * the calc_hist_selectivity_scalar function, which is the function used
+ * for selectivity estimation (non-joins).
+ *
+ */
+static double
+calc_hist_join_selectivity(TypeCacheEntry *typcache,
+						   const RangeBound *hist1, int nhist1,
+						   const RangeBound *hist2, int nhist2)
+{
+	int			i,
+				j;
+	double		selectivity,
+				cur_sel1,
+				cur_sel2,
+				prev_sel1,
+				prev_sel2;
+	RangeBound	cur_sync;
+
+	/*
+	 * Histograms will never be empty. In fact, a histogram will never have
+	 * less than 2 values (1 bin)
+	 */
+	Assert(nhist1 > 1);
+	Assert(nhist2 > 1);
+
+	/* Fast-forward i and j to the start of the iteration, staying in bounds */
+	for (i = 0; i < nhist1 - 1 && range_cmp_bound_values(typcache, &hist1[i], &hist2[0]) < 0; i++);
+	for (j = 0; j < nhist2 - 1 && range_cmp_bound_values(typcache, &hist2[j], &hist1[0]) < 0; j++);
+
+	if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) < 0)
+		cur_sync = hist1[i++];
+	else if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) > 0)
+		cur_sync = hist2[j++];
+	else
+	{
+		/* If equal, skip one */
+		cur_sync = hist1[i];
+		i++;
+		j++;
+	}
+	prev_sel1 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+											 hist1, nhist1, false);
+	prev_sel2 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+											 hist2, nhist2, false);
+
+	/*
+	 * Do the estimation on overlapping region
+	 */
+	selectivity = 0.0;
+	while (i < nhist1 && j < nhist2)
+	{
+		if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) < 0)
+			cur_sync = hist1[i++];
+		else if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) > 0)
+			cur_sync = hist2[j++];
+		else
+		{
+			/* If equal, skip one */
+			cur_sync = hist1[i];
+			i++;
+			j++;
+		}
+		cur_sel1 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist1, nhist1, false);
+		cur_sel2 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist2, nhist2, false);
+
+		selectivity += (prev_sel1 + cur_sel1) * (cur_sel2 - prev_sel2);
+
+		/* Prepare for the next iteration */
+		prev_sel1 = cur_sel1;
+		prev_sel2 = cur_sel2;
+	}
+
+	/* Remainder of hist2, if any: F_X = 1 there; doubled as we halve below */
+	if (j < nhist2)
+		selectivity += 2 * (1 - prev_sel2);
+
+	return selectivity / 2;
+}
+
+/*
+ * rangejoinsel -- join selectivity for range operators
+ */
+Datum
+rangejoinsel(PG_FUNCTION_ARGS)
+{
+	PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+	Oid			operator = PG_GETARG_OID(1);
+	List	   *args = (List *) PG_GETARG_POINTER(2);
+	SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) PG_GETARG_POINTER(4);
+	VariableStatData vardata1,
+				vardata2;
+	AttStatsSlot hist1,
+				hist2,
+				sslot;
+	bool		reversed;
+	Selectivity selec;
+	TypeCacheEntry *typcache = NULL;
+	Form_pg_statistic stats1,
+				stats2;
+	double		empty_frac1,
+				empty_frac2,
+				null_frac1,
+				null_frac2;
+	int			nhist1,
+				nhist2;
+	RangeBound *hist1_lower,
+			   *hist1_upper,
+			   *hist2_lower,
+			   *hist2_upper;
+	bool		empty;
+	int			i;
+
+	get_join_variables(root, args, sjinfo, &vardata1, &vardata2, &reversed);
+
+	selec = default_range_selectivity(operator);
+
+	if (HeapTupleIsValid(vardata1.statsTuple) &&
+		get_attstatsslot(&hist1, vardata1.statsTuple,
+						 STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						 ATTSTATSSLOT_VALUES) &&
+		HeapTupleIsValid(vardata2.statsTuple) &&
+		get_attstatsslot(&hist2, vardata2.statsTuple,
+						 STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						 ATTSTATSSLOT_VALUES) &&
+		vardata1.vartype == vardata2.vartype)
+	{
+
+		/* Initialize type cache */
+		typcache = range_get_typcache(fcinfo, vardata1.vartype);
+
+		/*
+		 * First look up the fraction of NULLs and empty ranges from
+		 * pg_statistic.
+		 */
+		stats1 = (Form_pg_statistic) GETSTRUCT(vardata1.statsTuple);
+		stats2 = (Form_pg_statistic) GETSTRUCT(vardata2.statsTuple);
+
+		null_frac1 = stats1->stanullfrac;
+		null_frac2 = stats2->stanullfrac;
+
+		/* Try to get fraction of empty ranges for the first variable */
+		if (get_attstatsslot(&sslot, vardata1.statsTuple,
+							 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+							 InvalidOid,
+							 ATTSTATSSLOT_NUMBERS))
+		{
+			if (sslot.nnumbers != 1)	/* shouldn't happen */
+				elog(ERROR, "invalid empty fraction statistic");
+			empty_frac1 = sslot.numbers[0];
+			free_attstatsslot(&sslot);
+		}
+		else
+		{
+			/* No empty fraction statistic. Assume no empty ranges. */
+			empty_frac1 = 0.0;
+		}
+
+		/* Try to get fraction of empty ranges for the second variable */
+		if (get_attstatsslot(&sslot, vardata2.statsTuple,
+							 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+							 InvalidOid,
+							 ATTSTATSSLOT_NUMBERS))
+		{
+			if (sslot.nnumbers != 1)	/* shouldn't happen */
+				elog(ERROR, "invalid empty fraction statistic");
+			empty_frac2 = sslot.numbers[0];
+			free_attstatsslot(&sslot);
+		}
+		else
+		{
+			/* No empty fraction statistic. Assume no empty ranges. */
+			empty_frac2 = 0.0;
+		}
+
+		/*
+		 * Convert histograms of ranges into histograms of their lower and
+		 * upper bounds for the first variable.
+		 */
+		nhist1 = hist1.nvalues;
+		hist1_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+		hist1_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+		for (i = 0; i < nhist1; i++)
+		{
+			range_deserialize(typcache, DatumGetRangeTypeP(hist1.values[i]),
+							  &hist1_lower[i], &hist1_upper[i], &empty);
+			/* The histogram should not contain any empty ranges */
+			if (empty)
+				elog(ERROR, "bounds histogram contains an empty range");
+		}
+
+		/*
+		 * Convert histograms of ranges into histograms of their lower and
+		 * upper bounds for the second variable.
+		 */
+		nhist2 = hist2.nvalues;
+		hist2_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+		hist2_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+		for (i = 0; i < nhist2; i++)
+		{
+			range_deserialize(typcache, DatumGetRangeTypeP(hist2.values[i]),
+							  &hist2_lower[i], &hist2_upper[i], &empty);
+			/* The histogram should not contain any empty ranges */
+			if (empty)
+				elog(ERROR, "bounds histogram contains an empty range");
+		}
+
+		switch (operator)
+		{
+			case OID_RANGE_OVERLAP_OP:
+
+				/*
+				 * Selectivity of A && B = Selectivity of NOT( A << B || A >>
+				 * B ) = 1 - Selectivity of (A.upper < B.lower) - Selectivity
+				 * of (B.upper < A.lower)
+				 */
+				selec = 1;
+				selec -= calc_hist_join_selectivity(typcache,
+													hist1_upper, nhist1,
+													hist2_lower, nhist2);
+				selec -= calc_hist_join_selectivity(typcache,
+													hist2_upper, nhist2,
+													hist1_lower, nhist1);
+				break;
+
+			case OID_RANGE_LESS_EQUAL_OP:
+
+				/*
+				 * A <= B
+				 *
+				 * Start by comparing lower bounds and if they are equal
+				 * compare upper bounds
+				 *
+				 * Negation of OID_RANGE_GREATER_OP.
+				 *
+				 * Overestimate by comparing only the lower bounds. Higher
+				 * accuracy would require us to subtract P(lower1 = lower2) *
+				 * P(upper1 > upper2)
+				 */
+				selec = 1 - calc_hist_join_selectivity(typcache,
+													   hist2_lower, nhist2,
+													   hist1_lower, nhist1);
+				break;
+
+			case OID_RANGE_LESS_OP:
+
+				/*
+				 * A < B
+				 *
+				 * Start by comparing lower bounds and if they are equal
+				 * compare upper bounds
+				 *
+				 * Underestimate by comparing only the lower bounds. Higher
+				 * accuracy would require us to add P(lower1 = lower2) *
+				 * P(upper1 < upper2)
+				 */
+				selec = calc_hist_join_selectivity(typcache,
+												   hist1_lower, nhist1,
+												   hist2_lower, nhist2);
+				break;
+
+			case OID_RANGE_GREATER_EQUAL_OP:
+
+				/*
+				 * A >= B
+				 *
+				 * Start by comparing lower bounds and if they are equal
+				 * compare upper bounds
+				 *
+				 * Negation of OID_RANGE_LESS_OP.
+				 *
+				 * Overestimate by comparing only the lower bounds. Higher
+				 * accuracy would require us to subtract P(lower1 = lower2) *
+				 * P(upper1 < upper2)
+				 */
+				selec = 1 - calc_hist_join_selectivity(typcache,
+													   hist1_lower, nhist1,
+													   hist2_lower, nhist2);
+				break;
+
+			case OID_RANGE_GREATER_OP:
+
+				/*
+				 * A > B == B < A
+				 *
+				 * Start by comparing lower bounds and if they are equal
+				 * compare upper bounds
+				 *
+				 * Underestimate by comparing only the lower bounds. Higher
+				 * accuracy would require us to add P(lower1 = lower2) *
+				 * P(upper1 > upper2)
+				 */
+				selec = calc_hist_join_selectivity(typcache,
+												   hist2_lower, nhist2,
+												   hist1_lower, nhist1);
+				break;
+
+			case OID_RANGE_LEFT_OP:
+				/* var1 << var2 when upper(var1) < lower(var2) */
+				selec = calc_hist_join_selectivity(typcache,
+												   hist1_upper, nhist1,
+												   hist2_lower, nhist2);
+				break;
+
+			case OID_RANGE_RIGHT_OP:
+				/* var1 >> var2 when upper(var2) < lower(var1) */
+				selec = calc_hist_join_selectivity(typcache,
+												   hist2_upper, nhist2,
+												   hist1_lower, nhist1);
+				break;
+
+			case OID_RANGE_OVERLAPS_LEFT_OP:
+				/* var1 &< var2 when upper(var1) < upper(var2) */
+				selec = calc_hist_join_selectivity(typcache,
+												   hist1_upper, nhist1,
+												   hist2_upper, nhist2);
+				break;
+
+			case OID_RANGE_OVERLAPS_RIGHT_OP:
+				/* var1 &> var2 when lower(var2) < lower(var1) */
+				selec = calc_hist_join_selectivity(typcache,
+												   hist2_lower, nhist2,
+												   hist1_lower, nhist1);
+				break;
+
+			case OID_RANGE_CONTAINED_OP:
+
+				/*
+				 * var1 <@ var2 is equivalent to lower(var2) <= lower(var1)
+				 * and upper(var1) <= upper(var2)
+				 *
+				 * After negating both sides we get not( lower(var1) <
+				 * lower(var2) ) and not( upper(var2) < upper(var1) ),
+				 * respectively. Assuming independence, multiply both
+				 * selectivities.
+				 */
+				selec = 1 - calc_hist_join_selectivity(typcache,
+													   hist1_lower, nhist1,
+													   hist2_lower, nhist2);
+				selec *= 1 - calc_hist_join_selectivity(typcache,
+														hist2_upper, nhist2,
+														hist1_upper, nhist1);
+				break;
+
+			case OID_RANGE_CONTAINS_OP:
+
+				/*
+				 * var1 @> var2 is equivalent to lower(var1) <= lower(var2)
+				 * and upper(var2) <= upper(var1)
+				 *
+				 * After negating both sides we get not( lower(var2) <
+				 * lower(var1) ) and not( upper(var1) < upper(var2) ),
+				 * respectively. Assuming independence, multiply both
+				 * selectivities.
+				 */
+				selec = 1 - calc_hist_join_selectivity(typcache,
+													   hist2_lower, nhist2,
+													   hist1_lower, nhist1);
+				selec *= 1 - calc_hist_join_selectivity(typcache,
+														hist1_upper, nhist1,
+														hist2_upper, nhist2);
+				break;
+
+			case OID_RANGE_CONTAINS_ELEM_OP:
+			case OID_RANGE_ELEM_CONTAINED_OP:
+
+				/*
+				 * just punt for now, estimation would require extraction of
+				 * histograms for the anyelement
+				 */
+			default:
+				break;
+		}
+
+
+		/* the calculated selectivity only applies to non-empty ranges */
+		selec *= (1 - empty_frac1) * (1 - empty_frac2);
+
+		/*
+		 * Depending on the operator, empty ranges might match different
+		 * fractions of the result.
+		 */
+		switch (operator)
+		{
+			case OID_RANGE_LESS_OP:
+
+				/*
+				 * empty range < non-empty range
+				 */
+				selec += empty_frac1 * (1 - empty_frac2);
+				break;
+
+			case OID_RANGE_GREATER_OP:
+
+				/*
+				 * non-empty range > empty range
+				 */
+				selec += (1 - empty_frac1) * empty_frac2;
+				break;
+
+			case OID_RANGE_CONTAINED_OP:
+
+				/*
+				 * empty range <@ any range
+				 */
+			case OID_RANGE_LESS_EQUAL_OP:
+
+				/*
+				 * empty range <= any range
+				 */
+				selec += empty_frac1;
+				break;
+
+			case OID_RANGE_CONTAINS_OP:
+
+				/*
+				 * any range @> empty range
+				 */
+			case OID_RANGE_GREATER_EQUAL_OP:
+
+				/*
+				 * any range >= empty range
+				 */
+				selec += empty_frac2;
+				break;
+
+			case OID_RANGE_CONTAINS_ELEM_OP:
+			case OID_RANGE_ELEM_CONTAINED_OP:
+			case OID_RANGE_OVERLAP_OP:
+			case OID_RANGE_OVERLAPS_LEFT_OP:
+			case OID_RANGE_OVERLAPS_RIGHT_OP:
+			case OID_RANGE_LEFT_OP:
+			case OID_RANGE_RIGHT_OP:
+			default:
+
+				/*
+				 * these operators always return false when an empty range is
+				 * involved
+				 */
+				break;
+
+		}
+
+		/* all range operators are strict */
+		selec *= (1 - null_frac1) * (1 - null_frac2);
+
+		free_attstatsslot(&hist1);
+		free_attstatsslot(&hist2);
+	}
+
+	ReleaseVariableStats(vardata1);
+	ReleaseVariableStats(vardata2);
+
+	CLAMP_PROBABILITY(selec);
+
+	PG_RETURN_FLOAT8((float8) selec);
+
+}
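
Note for readers of the hunk above: the last steps of rangejoinsel (containment under the independence assumption, then the empty-range and NULL corrections) can be sketched as a standalone C fragment. This is only an illustration under stated assumptions, not code from the patch: RangeOp, contained_selec and adjust_selec are hypothetical names, and the probability arguments stand in for the values the patch obtains from calc_hist_join_selectivity and from the column statistics.

```c
/* Hypothetical stand-in for the operator OIDs the patch switches on. */
typedef enum
{
    OP_LESS, OP_LESS_EQUAL, OP_GREATER, OP_GREATER_EQUAL,
    OP_CONTAINED, OP_CONTAINS, OP_OTHER
} RangeOp;

/*
 * var1 <@ var2 for non-empty ranges, assuming independent bounds:
 * not(lower(var1) < lower(var2)) and not(upper(var2) < upper(var1)).
 */
static double
contained_selec(double p_lower1_lt_lower2, double p_upper2_lt_upper1)
{
    return (1.0 - p_lower1_lt_lower2) * (1.0 - p_upper2_lt_upper1);
}

/* Fold empty-range and NULL fractions into a histogram-based estimate. */
static double
adjust_selec(RangeOp op, double selec,
             double empty1, double empty2,
             double null1, double null2)
{
    /* the histogram estimate only covers non-empty ranges */
    selec *= (1.0 - empty1) * (1.0 - empty2);

    switch (op)
    {
        case OP_LESS:           /* empty < non-empty */
            selec += empty1 * (1.0 - empty2);
            break;
        case OP_GREATER:        /* non-empty > empty */
            selec += (1.0 - empty1) * empty2;
            break;
        case OP_CONTAINED:      /* empty <@ any, empty <= any */
        case OP_LESS_EQUAL:
            selec += empty1;
            break;
        case OP_CONTAINS:       /* any @> empty, any >= empty */
        case OP_GREATER_EQUAL:
            selec += empty2;
            break;
        default:                /* false whenever an empty range is involved */
            break;
    }

    /* all range operators are strict, so NULL inputs never match */
    selec *= (1.0 - null1) * (1.0 - null2);

    /* same effect as CLAMP_PROBABILITY */
    if (selec < 0.0)
        selec = 0.0;
    if (selec > 1.0)
        selec = 1.0;
    return selec;
}
```

For example, if both bound comparisons have probability 0.5, contained_selec gives 0.25. If 20% of the var1 values are empty and there are no NULLs, adjust_selec(OP_CONTAINED, 0.25, 0.2, 0, 0, 0) then gives 0.25 * 0.8 + 0.2 = 0.4.
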
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index bc5f8213f3..b63a7e15af 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -3071,78 +3071,78 @@
   oprname => '<', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '>(anyrange,anyrange)',
   oprnegate => '>=(anyrange,anyrange)', oprcode => 'range_lt',
-  oprrest => 'rangesel', oprjoin => 'scalarltjoinsel' },
+  oprrest => 'rangesel', oprjoin => 'rangejoinsel' },
 { oid => '3885', oid_symbol => 'OID_RANGE_LESS_EQUAL_OP',
   descr => 'less than or equal',
   oprname => '<=', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '>=(anyrange,anyrange)',
   oprnegate => '>(anyrange,anyrange)', oprcode => 'range_le',
-  oprrest => 'rangesel', oprjoin => 'scalarlejoinsel' },
+  oprrest => 'rangesel', oprjoin => 'rangejoinsel' },
 { oid => '3886', oid_symbol => 'OID_RANGE_GREATER_EQUAL_OP',
   descr => 'greater than or equal',
   oprname => '>=', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '<=(anyrange,anyrange)',
   oprnegate => '<(anyrange,anyrange)', oprcode => 'range_ge',
-  oprrest => 'rangesel', oprjoin => 'scalargejoinsel' },
+  oprrest => 'rangesel', oprjoin => 'rangejoinsel' },
 { oid => '3887', oid_symbol => 'OID_RANGE_GREATER_OP',
   descr => 'greater than',
   oprname => '>', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '<(anyrange,anyrange)',
   oprnegate => '<=(anyrange,anyrange)', oprcode => 'range_gt',
-  oprrest => 'rangesel', oprjoin => 'scalargtjoinsel' },
+  oprrest => 'rangesel', oprjoin => 'rangejoinsel' },
 { oid => '3888', oid_symbol => 'OID_RANGE_OVERLAP_OP', descr => 'overlaps',
   oprname => '&&', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '&&(anyrange,anyrange)',
   oprcode => 'range_overlaps', oprrest => 'rangesel',
-  oprjoin => 'areajoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3889', oid_symbol => 'OID_RANGE_CONTAINS_ELEM_OP',
   descr => 'contains',
   oprname => '@>', oprleft => 'anyrange', oprright => 'anyelement',
   oprresult => 'bool', oprcom => '<@(anyelement,anyrange)',
   oprcode => 'range_contains_elem', oprrest => 'rangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3890', oid_symbol => 'OID_RANGE_CONTAINS_OP', descr => 'contains',
   oprname => '@>', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '<@(anyrange,anyrange)',
   oprcode => 'range_contains', oprrest => 'rangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3891', oid_symbol => 'OID_RANGE_ELEM_CONTAINED_OP',
   descr => 'is contained by',
   oprname => '<@', oprleft => 'anyelement', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '@>(anyrange,anyelement)',
   oprcode => 'elem_contained_by_range', oprrest => 'rangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3892', oid_symbol => 'OID_RANGE_CONTAINED_OP',
   descr => 'is contained by',
   oprname => '<@', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '@>(anyrange,anyrange)',
   oprcode => 'range_contained_by', oprrest => 'rangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3893', oid_symbol => 'OID_RANGE_LEFT_OP', descr => 'is left of',
   oprname => '<<', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '>>(anyrange,anyrange)',
   oprcode => 'range_before', oprrest => 'rangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3894', oid_symbol => 'OID_RANGE_RIGHT_OP', descr => 'is right of',
   oprname => '>>', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '<<(anyrange,anyrange)',
   oprcode => 'range_after', oprrest => 'rangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3895', oid_symbol => 'OID_RANGE_OVERLAPS_LEFT_OP',
   descr => 'overlaps or is left of',
   oprname => '&<', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcode => 'range_overleft', oprrest => 'rangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3896', oid_symbol => 'OID_RANGE_OVERLAPS_RIGHT_OP',
   descr => 'overlaps or is right of',
   oprname => '&>', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcode => 'range_overright', oprrest => 'rangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3897', descr => 'is adjacent to',
   oprname => '-|-', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '-|-(anyrange,anyrange)',
   oprcode => 'range_adjacent', oprrest => 'matchingsel',
-  oprjoin => 'matchingjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3898', descr => 'range union',
   oprname => '+', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'anyrange', oprcom => '+(anyrange,anyrange)',
@@ -3277,139 +3277,139 @@
   oprname => '<', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '>(anymultirange,anymultirange)',
   oprnegate => '>=(anymultirange,anymultirange)', oprcode => 'multirange_lt',
-  oprrest => 'multirangesel', oprjoin => 'scalarltjoinsel' },
+  oprrest => 'multirangesel', oprjoin => 'multirangejoinsel' },
 { oid => '2863', oid_symbol => 'OID_MULTIRANGE_LESS_EQUAL_OP',
   descr => 'less than or equal',
   oprname => '<=', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '>=(anymultirange,anymultirange)',
   oprnegate => '>(anymultirange,anymultirange)', oprcode => 'multirange_le',
-  oprrest => 'multirangesel', oprjoin => 'scalarlejoinsel' },
+  oprrest => 'multirangesel', oprjoin => 'multirangejoinsel' },
 { oid => '2864', oid_symbol => 'OID_MULTIRANGE_GREATER_EQUAL_OP',
   descr => 'greater than or equal',
   oprname => '>=', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '<=(anymultirange,anymultirange)',
   oprnegate => '<(anymultirange,anymultirange)', oprcode => 'multirange_ge',
-  oprrest => 'multirangesel', oprjoin => 'scalargejoinsel' },
+  oprrest => 'multirangesel', oprjoin => 'multirangejoinsel' },
 { oid => '2865', oid_symbol => 'OID_MULTIRANGE_GREATER_OP',
   descr => 'greater than',
   oprname => '>', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '<(anymultirange,anymultirange)',
   oprnegate => '<=(anymultirange,anymultirange)', oprcode => 'multirange_gt',
-  oprrest => 'multirangesel', oprjoin => 'scalargtjoinsel' },
+  oprrest => 'multirangesel', oprjoin => 'multirangejoinsel' },
 { oid => '2866', oid_symbol => 'OID_RANGE_OVERLAPS_MULTIRANGE_OP',
   descr => 'overlaps',
   oprname => '&&', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '&&(anymultirange,anyrange)',
   oprcode => 'range_overlaps_multirange', oprrest => 'multirangesel',
-  oprjoin => 'areajoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2867', oid_symbol => 'OID_MULTIRANGE_OVERLAPS_RANGE_OP',
   descr => 'overlaps',
   oprname => '&&', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '&&(anyrange,anymultirange)',
   oprcode => 'multirange_overlaps_range', oprrest => 'multirangesel',
-  oprjoin => 'areajoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2868', oid_symbol => 'OID_MULTIRANGE_OVERLAPS_MULTIRANGE_OP',
   descr => 'overlaps',
   oprname => '&&', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '&&(anymultirange,anymultirange)',
   oprcode => 'multirange_overlaps_multirange', oprrest => 'multirangesel',
-  oprjoin => 'areajoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2869', oid_symbol => 'OID_MULTIRANGE_CONTAINS_ELEM_OP',
   descr => 'contains',
   oprname => '@>', oprleft => 'anymultirange', oprright => 'anyelement',
   oprresult => 'bool', oprcom => '<@(anyelement,anymultirange)',
   oprcode => 'multirange_contains_elem', oprrest => 'multirangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2870', oid_symbol => 'OID_MULTIRANGE_CONTAINS_RANGE_OP',
   descr => 'contains',
   oprname => '@>', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '<@(anyrange,anymultirange)',
   oprcode => 'multirange_contains_range', oprrest => 'multirangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2871', oid_symbol => 'OID_MULTIRANGE_CONTAINS_MULTIRANGE_OP',
   descr => 'contains',
   oprname => '@>', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '<@(anymultirange,anymultirange)',
   oprcode => 'multirange_contains_multirange', oprrest => 'multirangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2872', oid_symbol => 'OID_MULTIRANGE_ELEM_CONTAINED_OP',
   descr => 'is contained by',
   oprname => '<@', oprleft => 'anyelement', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '@>(anymultirange,anyelement)',
   oprcode => 'elem_contained_by_multirange', oprrest => 'multirangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2873', oid_symbol => 'OID_MULTIRANGE_RANGE_CONTAINED_OP',
   descr => 'is contained by',
   oprname => '<@', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '@>(anymultirange,anyrange)',
   oprcode => 'range_contained_by_multirange', oprrest => 'multirangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2874', oid_symbol => 'OID_MULTIRANGE_MULTIRANGE_CONTAINED_OP',
   descr => 'is contained by',
   oprname => '<@', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '@>(anymultirange,anymultirange)',
   oprcode => 'multirange_contained_by_multirange', oprrest => 'multirangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4539', oid_symbol => 'OID_RANGE_CONTAINS_MULTIRANGE_OP',
   descr => 'contains',
   oprname => '@>', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '<@(anymultirange,anyrange)',
   oprcode => 'range_contains_multirange', oprrest => 'multirangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4540', oid_symbol => 'OID_RANGE_MULTIRANGE_CONTAINED_OP',
   descr => 'is contained by',
   oprname => '<@', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '@>(anyrange,anymultirange)',
   oprcode => 'multirange_contained_by_range', oprrest => 'multirangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2875', oid_symbol => 'OID_RANGE_OVERLAPS_LEFT_MULTIRANGE_OP',
   descr => 'overlaps or is left of',
   oprname => '&<', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcode => 'range_overleft_multirange',
-  oprrest => 'multirangesel', oprjoin => 'scalarltjoinsel' },
+  oprrest => 'multirangesel', oprjoin => 'multirangejoinsel' },
 { oid => '2876', oid_symbol => 'OID_MULTIRANGE_OVERLAPS_LEFT_RANGE_OP',
   descr => 'overlaps or is left of',
   oprname => '&<', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcode => 'multirange_overleft_range',
-  oprrest => 'multirangesel', oprjoin => 'scalarltjoinsel' },
+  oprrest => 'multirangesel', oprjoin => 'multirangejoinsel' },
 { oid => '2877', oid_symbol => 'OID_MULTIRANGE_OVERLAPS_LEFT_MULTIRANGE_OP',
   descr => 'overlaps or is left of',
   oprname => '&<', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcode => 'multirange_overleft_multirange',
-  oprrest => 'multirangesel', oprjoin => 'scalarltjoinsel' },
+  oprrest => 'multirangesel', oprjoin => 'multirangejoinsel' },
 { oid => '3585', oid_symbol => 'OID_RANGE_OVERLAPS_RIGHT_MULTIRANGE_OP',
   descr => 'overlaps or is right of',
   oprname => '&>', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcode => 'range_overright_multirange',
-  oprrest => 'multirangesel', oprjoin => 'scalargtjoinsel' },
+  oprrest => 'multirangesel', oprjoin => 'multirangejoinsel' },
 { oid => '4035', oid_symbol => 'OID_MULTIRANGE_OVERLAPS_RIGHT_RANGE_OP',
   descr => 'overlaps or is right of',
   oprname => '&>', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcode => 'multirange_overright_range',
-  oprrest => 'multirangesel', oprjoin => 'scalargtjoinsel' },
+  oprrest => 'multirangesel', oprjoin => 'multirangejoinsel' },
 { oid => '4142', oid_symbol => 'OID_MULTIRANGE_OVERLAPS_RIGHT_MULTIRANGE_OP',
   descr => 'overlaps or is right of',
   oprname => '&>', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcode => 'multirange_overright_multirange',
-  oprrest => 'multirangesel', oprjoin => 'scalargtjoinsel' },
+  oprrest => 'multirangesel', oprjoin => 'multirangejoinsel' },
 { oid => '4179', oid_symbol => 'OID_RANGE_ADJACENT_MULTIRANGE_OP',
   descr => 'is adjacent to',
   oprname => '-|-', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '-|-(anymultirange,anyrange)',
   oprcode => 'range_adjacent_multirange', oprrest => 'matchingsel',
-  oprjoin => 'matchingjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4180', oid_symbol => 'OID_MULTIRANGE_ADJACENT_RANGE_OP',
   descr => 'is adjacent to',
   oprname => '-|-', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '-|-(anyrange,anymultirange)',
   oprcode => 'multirange_adjacent_range', oprrest => 'matchingsel',
-  oprjoin => 'matchingjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4198', oid_symbol => 'OID_MULTIRANGE_ADJACENT_MULTIRANGE_OP',
   descr => 'is adjacent to',
   oprname => '-|-', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '-|-(anymultirange,anymultirange)',
   oprcode => 'multirange_adjacent_multirange', oprrest => 'matchingsel',
-  oprjoin => 'matchingjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4392', descr => 'multirange union',
   oprname => '+', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'anymultirange', oprcom => '+(anymultirange,anymultirange)',
@@ -3426,36 +3426,36 @@
   oprname => '<<', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '>>(anymultirange,anyrange)',
   oprcode => 'range_before_multirange', oprrest => 'multirangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4396', oid_symbol => 'OID_MULTIRANGE_LEFT_RANGE_OP',
   descr => 'is left of',
   oprname => '<<', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '>>(anyrange,anymultirange)',
   oprcode => 'multirange_before_range', oprrest => 'multirangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4397', oid_symbol => 'OID_MULTIRANGE_LEFT_MULTIRANGE_OP',
   descr => 'is left of',
   oprname => '<<', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '>>(anymultirange,anymultirange)',
   oprcode => 'multirange_before_multirange', oprrest => 'multirangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4398', oid_symbol => 'OID_RANGE_RIGHT_MULTIRANGE_OP',
   descr => 'is right of',
   oprname => '>>', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '<<(anymultirange,anyrange)',
   oprcode => 'range_after_multirange', oprrest => 'multirangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4399', oid_symbol => 'OID_MULTIRANGE_RIGHT_RANGE_OP',
   descr => 'is right of',
   oprname => '>>', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '<<(anyrange,anymultirange)',
   oprcode => 'multirange_after_range', oprrest => 'multirangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4400', oid_symbol => 'OID_MULTIRANGE_RIGHT_MULTIRANGE_OP',
   descr => 'is right of',
   oprname => '>>', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '<<(anymultirange,anymultirange)',
   oprcode => 'multirange_after_multirange', oprrest => 'multirangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 
 ]
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 87aa571a33..c1d4119684 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11885,4 +11885,12 @@
   prorettype => 'bytea', proargtypes => 'pg_brin_minmax_multi_summary',
   prosrc => 'brin_minmax_multi_summary_send' },
 
+{ oid => '8355', descr => 'join selectivity for range operators',
+  proname => 'rangejoinsel', provolatile => 's', prorettype => 'float8',
+  proargtypes => 'internal oid internal int2 internal',
+  prosrc => 'rangejoinsel' },
+{ oid => '8356', descr => 'join selectivity for multirange operators',
+  proname => 'multirangejoinsel', provolatile => 's', prorettype => 'float8',
+  proargtypes => 'internal oid internal int2 internal',
+  prosrc => 'multirangejoinsel' },
 ]
diff --git a/src/test/regress/expected/multirangetypes.out b/src/test/regress/expected/multirangetypes.out
index ac2eb84c3a..b0eeb672f0 100644
--- a/src/test/regress/expected/multirangetypes.out
+++ b/src/test/regress/expected/multirangetypes.out
@@ -3330,3 +3330,275 @@ create function mr_table_fail(i anyelement) returns table(i anyelement, r anymul
   as $$ select $1, '[1,10]' $$ language sql;
 ERROR:  cannot determine result data type
 DETAIL:  A result of type anymultirange requires at least one input of type anyrange or anymultirange.
+-- test multirange join operators
+create table test_multirange_join_1(mr1 int4multirange);
+create table test_multirange_join_2(mr2 int4multirange);
+create table test_range_join(ir int4range);
+create table test_elem_join(elem int4);
+insert into test_multirange_join_1 select int4multirange(int4range(g, g+10),int4range(g+20, g+30),int4range(g+40, g+50)) from generate_series(1,200) g;
+insert into test_multirange_join_1 select '{}'::int4multirange from generate_series(1,50) g;
+insert into test_multirange_join_1 select int4multirange(int4range(g, g+10000)) from generate_series(1,100) g;
+insert into test_multirange_join_1 select int4multirange(int4range(NULL, g*10, '(]'), int4range(g*10, g*20, '(]')) from generate_series(1,10) g;
+insert into test_multirange_join_1 select int4multirange(int4range(g*10, g*20, '(]'), int4range(g*20, NULL, '[)')) from generate_series(1,10) g;
+insert into test_multirange_join_2 select int4multirange(int4range(g, g+10),int4range(g+20, g+30),int4range(g+40, g+50)) from generate_series(1,20) g;
+insert into test_multirange_join_2 select '{}'::int4multirange from generate_series(1,5) g;
+insert into test_multirange_join_2 select int4multirange(int4range(g, g+10000)) from generate_series(1,10) g;
+insert into test_multirange_join_2 select int4multirange(int4range(NULL, g*10, '(]'), int4range(g*10, g*20, '(]')) from generate_series(1,10) g;
+insert into test_multirange_join_2 select int4multirange(int4range(g*10, g*20, '(]'), int4range(g*20, NULL, '[)')) from generate_series(1,10) g;
+insert into test_range_join select int4range(g, g+10) from generate_series(1,20) g;
+insert into test_range_join select int4range(g, g+10000) from generate_series(1,10) g;
+insert into test_range_join select int4range(NULL,g*10,'(]') from generate_series(1,10) g;
+insert into test_range_join select int4range(g*10,NULL,'[)') from generate_series(1,10) g;
+insert into test_range_join select int4range(g, g+10) from generate_series(1,20) g;
+insert into test_range_join select 'empty'::int4range from generate_series(1,20) g;
+insert into test_range_join select NULL from generate_series(1,5) g;
+insert into test_elem_join select g from generate_series(1,20) g;
+insert into test_elem_join select g+10000 from generate_series(1,10) g;
+insert into test_elem_join select g*10 from generate_series(1,10) g;
+insert into test_elem_join select g from generate_series(1,20) g;
+insert into test_elem_join select NULL from generate_series(1,5) g;
+analyze test_multirange_join_1;
+analyze test_multirange_join_2;
+analyze test_range_join;
+analyze test_elem_join;
+create function check_estimated_rows(text) returns table (estimated int, actual int)
+language plpgsql as
+$$
+declare
+    ln text;
+    tmp text[];
+    first_row bool := true;
+begin
+    for ln in
+        execute format('explain analyze %s', $1)
+    loop
+        if first_row then
+            first_row := false;
+            tmp := regexp_match(ln, 'rows=(\d*) .* rows=(\d*)');
+            return query select tmp[1]::int, tmp[2]::int;
+        end if;
+    end loop;
+end;
+$$;
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 = mr2');
+ estimated | actual 
+-----------+--------
+        55 |    300
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 < mr2');
+ estimated | actual 
+-----------+--------
+      4579 |   4598
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 <= mr2');
+ estimated | actual 
+-----------+--------
+      7309 |   4898
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 > mr2');
+ estimated | actual 
+-----------+--------
+     13041 |  15452
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 >= mr2');
+ estimated | actual 
+-----------+--------
+     15771 |  15752
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 && mr2');
+ estimated | actual 
+-----------+--------
+     11098 |  10932
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_range_join where mr1 && ir');
+ estimated | actual 
+-----------+--------
+      9611 |   9471
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join, test_multirange_join_2 where ir && mr2');
+ estimated | actual 
+-----------+--------
+      2924 |   2851
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 <@ mr2');
+ estimated | actual 
+-----------+--------
+      8491 |   7393
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_range_join where mr1 <@ ir');
+ estimated | actual 
+-----------+--------
+      9754 |   8621
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join, test_multirange_join_2 where ir <@ mr2');
+ estimated | actual 
+-----------+--------
+      2663 |   1987
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 @> mr2');
+ estimated | actual 
+-----------+--------
+      5022 |   2361
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_range_join where mr1 @> ir');
+ estimated | actual 
+-----------+--------
+     12473 |   8397
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join, test_multirange_join_2 where ir @> mr2');
+ estimated | actual 
+-----------+--------
+      1177 |    800
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 << mr2');
+ estimated | actual 
+-----------+--------
+       152 |    181
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_range_join where mr1 << ir');
+ estimated | actual 
+-----------+--------
+       145 |    170
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join, test_multirange_join_2 where ir << mr2');
+ estimated | actual 
+-----------+--------
+       478 |    519
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 >> mr2');
+ estimated | actual 
+-----------+--------
+      4750 |   4837
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_range_join where mr1 >> ir');
+ estimated | actual 
+-----------+--------
+     12644 |  12739
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join, test_multirange_join_2 where ir >> mr2');
+ estimated | actual 
+-----------+--------
+        98 |    110
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 &< mr2');
+ estimated | actual 
+-----------+--------
+      4868 |   6318
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_range_join where mr1 &< ir');
+ estimated | actual 
+-----------+--------
+      4120 |   5556
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join, test_multirange_join_2 where ir &< mr2');
+ estimated | actual 
+-----------+--------
+      1986 |   2627
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 &> mr2');
+ estimated | actual 
+-----------+--------
+     11441 |  13976
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_range_join where mr1 &> ir');
+ estimated | actual 
+-----------+--------
+     16184 |  19807
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join, test_multirange_join_2 where ir &> mr2');
+ estimated | actual 
+-----------+--------
+      1819 |   1895
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 -|- mr2');
+ estimated | actual 
+-----------+--------
+       160 |     71
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_range_join where mr1 -|- ir');
+ estimated | actual 
+-----------+--------
+       224 |    118
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join, test_multirange_join_2 where ir -|- mr2');
+ estimated | actual 
+-----------+--------
+        35 |     37
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_elem_join, test_multirange_join_1 where elem <@ mr1');
+ estimated | actual 
+-----------+--------
+       120 |   3110
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_elem_join where mr1 @> elem');
+ estimated | actual 
+-----------+--------
+       120 |   3110
+(1 row)
+
+drop function check_estimated_rows;
+drop table test_multirange_join_1;
+drop table test_multirange_join_2;
+drop table test_range_join;
+drop table test_elem_join;
diff --git a/src/test/regress/expected/rangetypes.out b/src/test/regress/expected/rangetypes.out
index 04ccd5d451..10a76dec7a 100644
--- a/src/test/regress/expected/rangetypes.out
+++ b/src/test/regress/expected/rangetypes.out
@@ -1767,3 +1767,157 @@ create function table_fail(i anyelement) returns table(i anyelement, r anyrange)
   as $$ select $1, '[1,10]' $$ language sql;
 ERROR:  cannot determine result data type
 DETAIL:  A result of type anyrange requires at least one input of type anyrange or anymultirange.
+-- test range join operators
+create table test_range_join_1(ir1 int4range);
+create table test_range_join_2(ir2 int4range);
+create table test_elem_join(elem int4);
+insert into test_range_join_1 select int4range(g, g+10) from generate_series(1,200) g;
+insert into test_range_join_1 select int4range(g, g+10000) from generate_series(1,100) g;
+insert into test_range_join_1 select int4range(NULL,g*10,'(]') from generate_series(1,10) g;
+insert into test_range_join_1 select int4range(g*10,NULL,'[)') from generate_series(1,10) g;
+insert into test_range_join_1 select int4range(g, g+10) from generate_series(1,200) g;
+insert into test_range_join_1 select 'empty'::int4range from generate_series(1,20) g;
+insert into test_range_join_1 select NULL from generate_series(1,50) g;
+insert into test_range_join_2 select int4range(g+10, g+20) from generate_series(1,20) g;
+insert into test_range_join_2 select int4range(g+5000, g+15000) from generate_series(1,10) g;
+insert into test_range_join_2 select int4range(NULL,g*5,'(]') from generate_series(1,10) g;
+insert into test_range_join_2 select int4range(g*5,NULL,'[)') from generate_series(1,10) g;
+insert into test_range_join_2 select int4range(g, g+10) from generate_series(1,20) g;
+insert into test_range_join_2 select 'empty'::int4range from generate_series(1,5) g;
+insert into test_range_join_2 select NULL from generate_series(1,5) g;
+insert into test_elem_join select g from generate_series(1,20) g;
+insert into test_elem_join select g+10000 from generate_series(1,10) g;
+insert into test_elem_join select g*10 from generate_series(1,10) g;
+insert into test_elem_join select g from generate_series(1,20) g;
+insert into test_elem_join select NULL from generate_series(1,5) g;
+analyze test_range_join_1;
+analyze test_range_join_2;
+analyze test_elem_join;
+create function check_estimated_rows(text) returns table (estimated int, actual int)
+language plpgsql as
+$$
+declare
+    ln text;
+    tmp text[];
+    first_row bool := true;
+begin
+    for ln in
+        execute format('explain analyze %s', $1)
+    loop
+        if first_row then
+            first_row := false;
+            tmp := regexp_match(ln, 'rows=(\d*) .* rows=(\d*)');
+            return query select tmp[1]::int, tmp[2]::int;
+        end if;
+    end loop;
+end;
+$$;
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 = ir2');
+ estimated | actual 
+-----------+--------
+        75 |    190
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 < ir2');
+ estimated | actual 
+-----------+--------
+      7256 |   9745
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 <= ir2');
+ estimated | actual 
+-----------+--------
+      9986 |   9935
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 > ir2');
+ estimated | actual 
+-----------+--------
+     30514 |  30565
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 >= ir2');
+ estimated | actual 
+-----------+--------
+     33244 |  30755
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 && ir2');
+ estimated | actual 
+-----------+--------
+      9966 |   9720
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 <@ ir2');
+ estimated | actual 
+-----------+--------
+     11868 |   6268
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 @> ir2');
+ estimated | actual 
+-----------+--------
+      8933 |   3973
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 << ir2');
+ estimated | actual 
+-----------+--------
+      5034 |   5050
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 >> ir2');
+ estimated | actual 
+-----------+--------
+     21400 |  21630
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 &< ir2');
+ estimated | actual 
+-----------+--------
+      9665 |  12023
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 &> ir2');
+ estimated | actual 
+-----------+--------
+     27914 |  28105
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 -|- ir2');
+ estimated | actual 
+-----------+--------
+       364 |    233
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_elem_join, test_range_join_1 where elem <@ ir1');
+ estimated | actual 
+-----------+--------
+       192 |   3349
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_elem_join where ir1 @> elem');
+ estimated | actual 
+-----------+--------
+       192 |   3349
+(1 row)
+
+drop function check_estimated_rows;
+drop table test_range_join_1;
+drop table test_range_join_2;
+drop table test_elem_join;
diff --git a/src/test/regress/sql/multirangetypes.sql b/src/test/regress/sql/multirangetypes.sql
index 1abcaeddb5..cb53f90aba 100644
--- a/src/test/regress/sql/multirangetypes.sql
+++ b/src/test/regress/sql/multirangetypes.sql
@@ -854,3 +854,160 @@ create function mr_inoutparam_fail(inout i anyelement, out r anymultirange)
 --should fail
 create function mr_table_fail(i anyelement) returns table(i anyelement, r anymultirange)
   as $$ select $1, '[1,10]' $$ language sql;
+
+-- test multirange join operators
+create table test_multirange_join_1(mr1 int4multirange);
+create table test_multirange_join_2(mr2 int4multirange);
+create table test_range_join(ir int4range);
+create table test_elem_join(elem int4);
+
+insert into test_multirange_join_1 select int4multirange(int4range(g, g+10),int4range(g+20, g+30),int4range(g+40, g+50)) from generate_series(1,200) g;
+insert into test_multirange_join_1 select '{}'::int4multirange from generate_series(1,50) g;
+insert into test_multirange_join_1 select int4multirange(int4range(g, g+10000)) from generate_series(1,100) g;
+insert into test_multirange_join_1 select int4multirange(int4range(NULL, g*10, '(]'), int4range(g*10, g*20, '(]')) from generate_series(1,10) g;
+insert into test_multirange_join_1 select int4multirange(int4range(g*10, g*20, '(]'), int4range(g*20, NULL, '[)')) from generate_series(1,10) g;
+
+insert into test_multirange_join_2 select int4multirange(int4range(g, g+10),int4range(g+20, g+30),int4range(g+40, g+50)) from generate_series(1,20) g;
+insert into test_multirange_join_2 select '{}'::int4multirange from generate_series(1,5) g;
+insert into test_multirange_join_2 select int4multirange(int4range(g, g+10000)) from generate_series(1,10) g;
+insert into test_multirange_join_2 select int4multirange(int4range(NULL, g*10, '(]'), int4range(g*10, g*20, '(]')) from generate_series(1,10) g;
+insert into test_multirange_join_2 select int4multirange(int4range(g*10, g*20, '(]'), int4range(g*20, NULL, '[)')) from generate_series(1,10) g;
+
+insert into test_range_join select int4range(g, g+10) from generate_series(1,20) g;
+insert into test_range_join select int4range(g, g+10000) from generate_series(1,10) g;
+insert into test_range_join select int4range(NULL,g*10,'(]') from generate_series(1,10) g;
+insert into test_range_join select int4range(g*10,NULL,'[)') from generate_series(1,10) g;
+insert into test_range_join select int4range(g, g+10) from generate_series(1,20) g;
+insert into test_range_join select 'empty'::int4range from generate_series(1,20) g;
+insert into test_range_join select NULL from generate_series(1,5) g;
+
+insert into test_elem_join select g from generate_series(1,20) g;
+insert into test_elem_join select g+10000 from generate_series(1,10) g;
+insert into test_elem_join select g*10 from generate_series(1,10) g;
+insert into test_elem_join select g from generate_series(1,20) g;
+insert into test_elem_join select NULL from generate_series(1,5) g;
+
+analyze test_multirange_join_1;
+analyze test_multirange_join_2;
+analyze test_range_join;
+analyze test_elem_join;
+
+create function check_estimated_rows(text) returns table (estimated int, actual int)
+language plpgsql as
+$$
+declare
+    ln text;
+    tmp text[];
+    first_row bool := true;
+begin
+    for ln in
+        execute format('explain analyze %s', $1)
+    loop
+        if first_row then
+            first_row := false;
+            tmp := regexp_match(ln, 'rows=(\d*) .* rows=(\d*)');
+            return query select tmp[1]::int, tmp[2]::int;
+        end if;
+    end loop;
+end;
+$$;
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 = mr2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 < mr2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 <= mr2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 > mr2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 >= mr2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 && mr2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_range_join where mr1 && ir');
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join, test_multirange_join_2 where ir && mr2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 <@ mr2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_range_join where mr1 <@ ir');
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join, test_multirange_join_2 where ir <@ mr2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 @> mr2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_range_join where mr1 @> ir');
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join, test_multirange_join_2 where ir @> mr2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 << mr2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_range_join where mr1 << ir');
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join, test_multirange_join_2 where ir << mr2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 >> mr2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_range_join where mr1 >> ir');
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join, test_multirange_join_2 where ir >> mr2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 &< mr2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_range_join where mr1 &< ir');
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join, test_multirange_join_2 where ir &< mr2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 &> mr2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_range_join where mr1 &> ir');
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join, test_multirange_join_2 where ir &> mr2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_multirange_join_2 where mr1 -|- mr2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_range_join where mr1 -|- ir');
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join, test_multirange_join_2 where ir -|- mr2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_elem_join, test_multirange_join_1 where elem <@ mr1');
+
+SELECT * FROM check_estimated_rows('
+select * from test_multirange_join_1, test_elem_join where mr1 @> elem');
+
+drop function check_estimated_rows;
+
+drop table test_multirange_join_1;
+drop table test_multirange_join_2;
+drop table test_range_join;
+drop table test_elem_join;
diff --git a/src/test/regress/sql/rangetypes.sql b/src/test/regress/sql/rangetypes.sql
index 1a10f67f19..6031cd695a 100644
--- a/src/test/regress/sql/rangetypes.sql
+++ b/src/test/regress/sql/rangetypes.sql
@@ -616,3 +616,105 @@ create function inoutparam_fail(inout i anyelement, out r anyrange)
 --should fail
 create function table_fail(i anyelement) returns table(i anyelement, r anyrange)
   as $$ select $1, '[1,10]' $$ language sql;
+
+-- test range join operators
+create table test_range_join_1(ir1 int4range);
+create table test_range_join_2(ir2 int4range);
+create table test_elem_join(elem int4);
+
+insert into test_range_join_1 select int4range(g, g+10) from generate_series(1,200) g;
+insert into test_range_join_1 select int4range(g, g+10000) from generate_series(1,100) g;
+insert into test_range_join_1 select int4range(NULL,g*10,'(]') from generate_series(1,10) g;
+insert into test_range_join_1 select int4range(g*10,NULL,'[)') from generate_series(1,10) g;
+insert into test_range_join_1 select int4range(g, g+10) from generate_series(1,200) g;
+insert into test_range_join_1 select 'empty'::int4range from generate_series(1,20) g;
+insert into test_range_join_1 select NULL from generate_series(1,50) g;
+
+insert into test_range_join_2 select int4range(g+10, g+20) from generate_series(1,20) g;
+insert into test_range_join_2 select int4range(g+5000, g+15000) from generate_series(1,10) g;
+insert into test_range_join_2 select int4range(NULL,g*5,'(]') from generate_series(1,10) g;
+insert into test_range_join_2 select int4range(g*5,NULL,'[)') from generate_series(1,10) g;
+insert into test_range_join_2 select int4range(g, g+10) from generate_series(1,20) g;
+insert into test_range_join_2 select 'empty'::int4range from generate_series(1,5) g;
+insert into test_range_join_2 select NULL from generate_series(1,5) g;
+
+insert into test_elem_join select g from generate_series(1,20) g;
+insert into test_elem_join select g+10000 from generate_series(1,10) g;
+insert into test_elem_join select g*10 from generate_series(1,10) g;
+insert into test_elem_join select g from generate_series(1,20) g;
+insert into test_elem_join select NULL from generate_series(1,5) g;
+
+analyze test_range_join_1;
+analyze test_range_join_2;
+analyze test_elem_join;
+
+create function check_estimated_rows(text) returns table (estimated int, actual int)
+language plpgsql as
+$$
+declare
+    ln text;
+    tmp text[];
+    first_row bool := true;
+begin
+    for ln in
+        execute format('explain analyze %s', $1)
+    loop
+        if first_row then
+            first_row := false;
+            tmp := regexp_match(ln, 'rows=(\d*) .* rows=(\d*)');
+            return query select tmp[1]::int, tmp[2]::int;
+        end if;
+    end loop;
+end;
+$$;
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 = ir2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 < ir2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 <= ir2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 > ir2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 >= ir2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 && ir2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 <@ ir2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 @> ir2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 << ir2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 >> ir2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 &< ir2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 &> ir2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_range_join_2 where ir1 -|- ir2');
+
+SELECT * FROM check_estimated_rows('
+select * from test_elem_join, test_range_join_1 where elem <@ ir1');
+
+SELECT * FROM check_estimated_rows('
+select * from test_range_join_1, test_elem_join where ir1 @> elem');
+
+drop function check_estimated_rows;
+
+drop table test_range_join_1;
+drop table test_range_join_2;
+drop table test_elem_join;


^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: Implement missing join selectivity estimation for range types
  2022-06-30 14:31 Implement missing join selectivity estimation for range types Mahmoud Sakr <[email protected]>
@ 2024-01-05 10:37 ` vignesh C <[email protected]>
  2024-01-05 17:39   ` Re: Implement missing join selectivity estimation for range types Schoemans Maxime <[email protected]>
  0 siblings, 1 reply; 16+ messages in thread

From: vignesh C @ 2024-01-05 10:37 UTC (permalink / raw)
  To: Schoemans Maxime <[email protected]>; +Cc: Tom Lane <[email protected]>; Damir Belyalov <[email protected]>; PostgreSQL Hackers <[email protected]>; SAKR Mahmoud <[email protected]>; Diogo Repas <[email protected]>; LUO Zhicheng <[email protected]>; Tomas Vondra <[email protected]>; Andrey Lepikhov <[email protected]>

On Tue, 21 Nov 2023 at 01:47, Schoemans Maxime <[email protected]> wrote:
>
> On 14/11/2023 20:46, Tom Lane wrote:
> > I took a brief look through this very interesting work.  I concur
> > with Tomas that it feels a little odd that range join selectivity
> > would become smarter than scalar inequality join selectivity, and
> > that we really ought to prioritize applying these methods to that
> > case.  Still, that's a poor reason to not take the patch.
>
> Indeed, we started with ranges as this was the simpler case (no MCV) and
> was the topic of a course project.
> The idea is to later write a second patch that applies these ideas to
> scalar inequality while also handling MCV's correctly.
>
> > I also agree with the upthread criticism that having two identical
> > functions in different source files will be a maintenance nightmare.
> > Don't do it.  When and if there's a reason for the behavior to
> > diverge between the range and multirange cases, it'd likely be
> > better to handle that by passing in a flag to say what to do.
>
> The duplication is indeed not ideal. However, there are already 8 other
> duplicate functions between the two files.
> I would thus suggest to leave the duplication in this patch and create a
> second one that removes all duplication from the two files, instead of
> just removing the duplication for our new function.
> What are your thoughts on this? If we do this, should the function
> definitions go in rangetypes.h or should we create a new
> rangetypes_selfuncs.h header?
>
> > But my real unhappiness with the patch as-submitted is the test cases,
> > which require rowcount estimates to be reproduced exactly.
>
> > We need a more forgiving test method. Usually the
> > approach is to set up a test case where the improved accuracy of
> > the estimate changes the planner's choice of plan compared to what
> > you got before, since that will normally not be too prone to change
> > from variations of a percent or two in the estimates.
>
> I have changed the test method to produce query plans for a 3-way range
> join.
> The plans for the different operators differ due to the computed
> selectivity estimation, which was not the case before this patch.

One of the tests was aborted at [1], kindly post an updated patch for the same:
[04:55:42.797] src/tools/ci/cores_backtrace.sh linux /tmp/cores
[04:56:03.640] dumping /tmp/cores/postgres-6-24094.core for
/tmp/cirrus-ci-build/tmp_install/usr/local/pgsql/bin/postgres

[04:57:24.199] Core was generated by `postgres: old_node: postgres
regression [local] EXPLAIN '.
[04:57:24.199] Program terminated with signal SIGABRT, Aborted.
[04:57:24.199] #0 __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:50
[04:57:24.199] Download failed: Invalid argument. Continuing without
source file ./signal/../sysdeps/unix/sysv/linux/raise.c.
[04:57:26.803]
[04:57:26.803] Thread 1 (Thread 0x7f121d9ec380 (LWP 24094)):
[04:57:26.803] #0 __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:50
[04:57:26.803] set = {__val = {4194304, 0, 4636737291354636288,
4636737291354636288, 0, 0, 64, 64, 128, 128, 192, 192, 256, 256, 0,
0}}
[04:57:26.803] pid = <optimized out>
[04:57:26.803] tid = <optimized out>
[04:57:26.803] ret = <optimized out>
[04:57:26.803] #1 0x00007f122003d537 in __GI_abort () at abort.c:79
...
...
[04:57:38.774] #6 0x00007f357ad95788 in __asan::__asan_report_load1
(addr=addr@entry=107477261711120) at
../../../../src/libsanitizer/asan/asan_rtl.cpp:117
[04:57:38.774] bp = 140731433585840
[04:57:38.774] pc = <optimized out>
[04:57:38.774] local_stack = 139867680821632
[04:57:38.774] sp = 140731433585832
[04:57:38.774] #7 0x000055d5b155c37c in range_cmp_bound_values
(typcache=typcache@entry=0x629000030b60, b1=b1@entry=0x61c000017708,
b2=b2@entry=0x61c0000188b8) at rangetypes.c:2090
[04:57:38.774] No locals.
[04:57:38.774] #8 0x000055d5b1567bb2 in calc_hist_join_selectivity
(typcache=typcache@entry=0x629000030b60,
hist1=hist1@entry=0x61c0000188b8, nhist1=nhist1@entry=101,
hist2=hist2@entry=0x61c0000170b8, nhist2=nhist2@entry=101) at
rangetypes_selfuncs.c:1298
[04:57:38.774] i = 0
[04:57:38.774] j = 101
[04:57:38.774] selectivity = <optimized out>
[04:57:38.774] cur_sel1 = <optimized out>
[04:57:38.774] cur_sel2 = <optimized out>
[04:57:38.774] prev_sel1 = <optimized out>
[04:57:38.774] prev_sel2 = <optimized out>
[04:57:38.774] cur_sync = {val = <optimized out>, infinite =
<optimized out>, inclusive = <optimized out>, lower = <optimized out>}
[04:57:38.774] #9 0x000055d5b1569190 in rangejoinsel
(fcinfo=<optimized out>) at rangetypes_selfuncs.c:1495

[1] - https://cirrus-ci.com/task/5507789477380096
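
Frame #8 above shows j = 101 with nhist2 = 101 in calc_hist_join_selectivity, just before ASan reports an invalid 1-byte load in range_cmp_bound_values. One reading, which is an assumption and not something the log confirms, is that a scan over a histogram array stepped past its last entry. A minimal sketch of a bounded scan follows. It uses plain doubles in place of RangeBound, and first_not_less is a hypothetical name that is not taken from any version of the patch.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from the patch): return the index of the first
 * entry in hist[0..nhist) that is not less than key, or nhist if every
 * entry is less than key.
 */
int
first_not_less(const double *hist, int nhist, double key)
{
	int			i = 0;

	assert(nhist > 0);

	/* Check the index before dereferencing so hist[nhist] is never read. */
	while (i < nhist && hist[i] < key)
		i++;

	return i;
}
```

Because the index test is evaluated first, the comparison never touches hist[nhist], and a return value of nhist tells the caller that the whole histogram lies below key.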

Regards,
Vignesh

^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: Implement missing join selectivity estimation for range types
  2022-06-30 14:31 Implement missing join selectivity estimation for range types Mahmoud Sakr <[email protected]>
  2024-01-05 10:37 ` Re: Implement missing join selectivity estimation for range types vignesh C <[email protected]>
@ 2024-01-05 17:39   ` Schoemans Maxime <[email protected]>
  2024-01-17 10:48     ` Re: Implement missing join selectivity estimation for range types vignesh C <[email protected]>
  0 siblings, 1 reply; 16+ messages in thread

From: Schoemans Maxime @ 2024-01-05 17:39 UTC (permalink / raw)
  To: vignesh C <[email protected]>; +Cc: Tom Lane <[email protected]>; Damir Belyalov <[email protected]>; PostgreSQL Hackers <[email protected]>; SAKR Mahmoud <[email protected]>; Diogo Repas <[email protected]>; LUO Zhicheng <[email protected]>; Tomas Vondra <[email protected]>; Andrey Lepikhov <[email protected]>

On 05/01/2024 11:37, vignesh C wrote:
> One of the tests was aborted at [1], kindly post an updated patch for
> the same:

Thank you for notifying us.
I believe I fixed the issue, but it is hard to be certain as it
did not arise when running the regression tests locally.

Regards,
Maxime
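
For readers of the attached patch: the comment above calc_hist_join_selectivity reduces P(X < Y) to 0.5 * sum((F_X(prev) + F_X(cur)) * (F_Y(cur) - F_Y(prev))) over the merged bounds of the two histograms. The sketch below works through that sum on plain double arrays. The names hist_cdf and join_sel_less are hypothetical, hist_cdf is a simplified stand-in for calc_hist_selectivity_scalar, and bounds are assumed to be finite and sorted. It is an illustration under those assumptions, not the patch's code.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for calc_hist_selectivity_scalar: CDF of an
 * equi-depth histogram with n sorted bounds (n - 1 bins), interpolating
 * linearly inside a bin.
 */
double
hist_cdf(const double *hist, int n, double v)
{
	int			i = 0;

	assert(n > 1);
	if (v <= hist[0])
		return 0.0;
	if (v >= hist[n - 1])
		return 1.0;

	/* Find the bin with hist[i] <= v < hist[i + 1]. */
	while (hist[i + 1] <= v)
		i++;

	return (i + (v - hist[i]) / (hist[i + 1] - hist[i])) / (n - 1);
}

/*
 * Hypothetical sketch of the discretized formula: walk the merged bounds
 * of both histograms and accumulate one trapezoid per consecutive pair.
 */
double
join_sel_less(const double *h1, int n1, const double *h2, int n2)
{
	int			i = 0,
				j = 0;
	double		sum = 0.0,
				prev_f1 = 0.0,
				prev_f2 = 0.0;

	assert(n1 > 1 && n2 > 1);

	while (i < n1 || j < n2)
	{
		double		cur,
					f1,
					f2;

		/* Take the smaller of the two next bounds; ties take h1 first. */
		if (j >= n2 || (i < n1 && h1[i] <= h2[j]))
			cur = h1[i++];
		else
			cur = h2[j++];

		f1 = hist_cdf(h1, n1, cur);
		f2 = hist_cdf(h2, n2, cur);

		/* (F_X(prev) + F_X(cur)) * (F_Y(cur) - F_Y(prev)) */
		sum += (prev_f1 + f1) * (f2 - prev_f2);

		prev_f1 = f1;
		prev_f2 = f2;
	}

	/* P(X < Y) = 0.5 * Sum(...) */
	return 0.5 * sum;
}
```

Unlike the patch, this sketch visits every bound of both histograms instead of fast-forwarding to the overlapping region, so it needs no separate handling for the remainder of the second histogram. Both CDFs are 0 at the smallest merged bound, so the first step contributes nothing to the sum.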

Attachments:

  [text/x-patch] v4-0001-Join-Selectivity-Estimation-for-Range-types.patch (60.7K, 2-v4-0001-Join-Selectivity-Estimation-for-Range-types.patch)
  download | inline diff:
diff --git a/src/backend/utils/adt/multirangetypes_selfuncs.c b/src/backend/utils/adt/multirangetypes_selfuncs.c
index 981c1fd298..6abc43f149 100644
--- a/src/backend/utils/adt/multirangetypes_selfuncs.c
+++ b/src/backend/utils/adt/multirangetypes_selfuncs.c
@@ -1335,3 +1335,542 @@ calc_hist_selectivity_contains(TypeCacheEntry *typcache,
 
 	return sum_frac;
 }
+
+/*
+ * This is a utility function used to estimate the join selectivity of
+ * range attributes using rangebound histogram statistics as described
+ * in this paper:
+ *
+ * Diogo Repas, Zhicheng Luo, Maxime Schoemans and Mahmoud Sakr, 2022
+ * Selectivity Estimation of Inequality Joins In Databases
+ * https://doi.org/10.48550/arXiv.2206.07396
+ *
+ * The attributes being joined will be treated as random variables
+ * that follow a distribution modeled by a Probability Density Function (PDF).
+ * Let the two attributes be denoted X, Y.
+ * This function finds the probability P(X < Y).
+ * Note that the PDFs of the two variables can easily be obtained
+ * from their bounds histograms, hist1 and hist2 respectively.
+ *
+ * Let the PDF of X, Y be denoted as f_X, f_Y.
+ * The probability P(X < Y) can be formalized as follows:
+ * P(X < Y) = integral_-inf^inf( integral_-inf^y ( f_X(x) * f_Y(y) dx dy ) )
+ *          = integral_-inf^inf( F_X(y) * f_Y(y) dy )
+ * where F_X(y) denotes the Cumulative Distribution Function of X at y.
+ * Note that F_X is the selectivity estimation (non-join),
+ * which is implemented using the function calc_hist_selectivity_scalar.
+ *
+ * Now given the histograms of the two attributes X, Y, we note the following:
+ * - The PDF of Y is a step function
+ *   (constant piece-wise, where each piece is defined in a bin of Y's histogram)
+ * - The CDF of X is linear piece-wise
+ *   (each piece is defined in a bin of X's histogram)
+ * This leads to the conclusion that their product
+ * (used to calculate the equation above) is also linear piece-wise.
+ * A new piece starts whenever either the bin of X or the bin of Y changes.
+ * By parallel scanning the two rangebound histograms of X and Y,
+ * we evaluate one piece of the result between every two consecutive rangebounds
+ * in the union of the two histograms.
+ *
+ * Given that the product F_X * f_Y is linear in the interval
+ * between every two consecutive rangebounds, let them be denoted prev, cur,
+ * it can be shown that the above formula can be discretized into the following:
+ * P(X < Y) =
+ *   0.5 * sum_0^{n+m-1} ( ( F_X(prev) + F_X(cur) ) * ( F_Y(cur) - F_Y(prev) ) )
+ * where n, m are the lengths of the two histograms.
+ *
+ * As such, it is possible to fully compute the join selectivity
+ * as a summation of CDFs, iterating over the bounds of the two histograms.
+ * This maximizes the code reuse, since the CDF is computed using
+ * the calc_hist_selectivity_scalar function, which is the function used
+ * for selectivity estimation (non-joins).
+ *
+ */
+static double
+calc_hist_join_selectivity(TypeCacheEntry *typcache,
+						   const RangeBound *hist1, int nhist1,
+						   const RangeBound *hist2, int nhist2)
+{
+	int			i,
+				j;
+	double		selectivity = 0.0,	/* initialization */
+				prev_sel1 = -1.0,	/* to skip the first iteration */
+				prev_sel2 = 0.0;	/* initialization */
+
+	/*
+	 * Histograms will never be empty. In fact, a histogram will never have
+	 * less than 2 values (1 bin)
+	 */
+	Assert(nhist1 > 1);
+	Assert(nhist2 > 1);
+
+	/* Fast-forwards i and j to start of iteration */
+	for (i = 0; i < nhist1 && range_cmp_bound_values(typcache, &hist1[i], &hist2[0]) < 0; i++);
+	for (j = 0; j < nhist2 && range_cmp_bound_values(typcache, &hist2[j], &hist1[0]) < 0; j++);
+
+	/* Do the estimation on overlapping regions */
+	while (i < nhist1 && j < nhist2)
+	{
+		double		cur_sel1,
+					cur_sel2;
+		RangeBound	cur_sync;
+
+		if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) < 0)
+			cur_sync = hist1[i++];
+		else if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) > 0)
+			cur_sync = hist2[j++];
+		else
+		{
+			/* If equal, skip one */
+			cur_sync = hist1[i];
+			i++;
+			j++;
+		}
+		cur_sel1 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist1, nhist1, false);
+		cur_sel2 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist2, nhist2, false);
+
+		/* Skip the first iteration */
+		if (prev_sel1 >= 0)
+			selectivity += (prev_sel1 + cur_sel1) * (cur_sel2 - prev_sel2);
+
+		/* Prepare for the next iteration */
+		prev_sel1 = cur_sel1;
+		prev_sel2 = cur_sel2;
+	}
+
+	/* P(X < Y) = 0.5 * Sum(...) */
+	selectivity /= 2;
+
+	/* Include remainder of hist2 if any */
+	if (j < nhist2)
+		selectivity += 1 - prev_sel2;
+
+	return selectivity;
+}
+
+/*
+ * multirangejoinsel -- join cardinality for multirange operators
+ */
+Datum
+multirangejoinsel(PG_FUNCTION_ARGS)
+{
+	PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+	Oid			operator = PG_GETARG_OID(1);
+	List	   *args = (List *) PG_GETARG_POINTER(2);
+	SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) PG_GETARG_POINTER(4);
+	VariableStatData vardata1,
+				vardata2;
+	AttStatsSlot hist1,
+				hist2,
+				sslot;
+	bool		reversed;
+	Selectivity selec;
+	TypeCacheEntry *typcache = NULL,
+			   *rng_typcache = NULL;
+	Form_pg_statistic stats1,
+				stats2;
+	double		empty_frac1,
+				empty_frac2,
+				null_frac1,
+				null_frac2;
+	int			nhist1,
+				nhist2;
+	RangeBound *hist1_lower,
+			   *hist1_upper,
+			   *hist2_lower,
+			   *hist2_upper;
+	bool		empty;
+	int			i;
+
+	get_join_variables(root, args, sjinfo, &vardata1, &vardata2, &reversed);
+
+	selec = default_multirange_selectivity(operator);
+
+	/* get multirange type cache */
+	if (type_is_multirange(vardata1.vartype))
+		typcache = multirange_get_typcache(fcinfo, vardata1.vartype);
+	else if (type_is_multirange(vardata2.vartype))
+		typcache = multirange_get_typcache(fcinfo, vardata2.vartype);
+
+	if (HeapTupleIsValid(vardata1.statsTuple) &&
+		get_attstatsslot(&hist1, vardata1.statsTuple,
+						 STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						 ATTSTATSSLOT_VALUES) &&
+		HeapTupleIsValid(vardata2.statsTuple) &&
+		get_attstatsslot(&hist2, vardata2.statsTuple,
+						 STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						 ATTSTATSSLOT_VALUES) &&
+		typcache)
+	{
+
+		/* Initialize underlying range type cache */
+		rng_typcache = typcache->rngtype;
+
+		/*
+		 * First look up the fraction of NULLs and empty ranges from
+		 * pg_statistic.
+		 */
+		stats1 = (Form_pg_statistic) GETSTRUCT(vardata1.statsTuple);
+		stats2 = (Form_pg_statistic) GETSTRUCT(vardata2.statsTuple);
+
+		null_frac1 = stats1->stanullfrac;
+		null_frac2 = stats2->stanullfrac;
+
+		/* Try to get fraction of empty ranges for the first variable */
+		if (get_attstatsslot(&sslot, vardata1.statsTuple,
+							 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+							 InvalidOid,
+							 ATTSTATSSLOT_NUMBERS))
+		{
+			if (sslot.nnumbers != 1)	/* shouldn't happen */
+				elog(ERROR, "invalid empty fraction statistic");
+			empty_frac1 = sslot.numbers[0];
+			free_attstatsslot(&sslot);
+		}
+		else
+		{
+			/* No empty fraction statistic. Assume no empty ranges. */
+			empty_frac1 = 0.0;
+		}
+
+		/* Try to get fraction of empty ranges for the second variable */
+		if (get_attstatsslot(&sslot, vardata2.statsTuple,
+							 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+							 InvalidOid,
+							 ATTSTATSSLOT_NUMBERS))
+		{
+			if (sslot.nnumbers != 1)	/* shouldn't happen */
+				elog(ERROR, "invalid empty fraction statistic");
+			empty_frac2 = sslot.numbers[0];
+			free_attstatsslot(&sslot);
+		}
+		else
+		{
+			/* No empty fraction statistic. Assume no empty ranges. */
+			empty_frac2 = 0.0;
+		}
+
+		/*
+		 * Convert histograms of ranges into histograms of their lower and
+		 * upper bounds for the first variable.
+		 */
+		nhist1 = hist1.nvalues;
+		hist1_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+		hist1_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+		for (i = 0; i < nhist1; i++)
+		{
+			range_deserialize(rng_typcache, DatumGetRangeTypeP(hist1.values[i]),
+							  &hist1_lower[i], &hist1_upper[i], &empty);
+			/* The histogram should not contain any empty ranges */
+			if (empty)
+				elog(ERROR, "bounds histogram contains an empty range");
+		}
+
+		/*
+		 * Convert histograms of ranges into histograms of their lower and
+		 * upper bounds for the second variable.
+		 */
+		nhist2 = hist2.nvalues;
+		hist2_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+		hist2_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+		for (i = 0; i < nhist2; i++)
+		{
+			range_deserialize(rng_typcache, DatumGetRangeTypeP(hist2.values[i]),
+							  &hist2_lower[i], &hist2_upper[i], &empty);
+			/* The histogram should not contain any empty ranges */
+			if (empty)
+				elog(ERROR, "bounds histogram contains an empty range");
+		}
+
+		switch (operator)
+		{
+			case OID_MULTIRANGE_OVERLAPS_MULTIRANGE_OP:
+			case OID_MULTIRANGE_OVERLAPS_RANGE_OP:
+			case OID_RANGE_OVERLAPS_MULTIRANGE_OP:
+
+				/*
+				 * Selectivity of A && B = Selectivity of NOT( A << B || A >>
+				 * B ) = 1 - Selectivity of (A.upper < B.lower) - Selectivity
+				 * of (B.upper < A.lower)
+				 */
+				selec = 1;
+				selec -= calc_hist_join_selectivity(rng_typcache,
+													hist1_upper, nhist1,
+													hist2_lower, nhist2);
+				selec -= calc_hist_join_selectivity(rng_typcache,
+													hist2_upper, nhist2,
+													hist1_lower, nhist1);
+				break;
+
+			case OID_MULTIRANGE_LESS_EQUAL_OP:
+
+				/*
+				 * A <= B
+				 *
+				 * Start by comparing lower bounds and if they are equal
+				 * compare upper bounds
+				 *
+				 * Negation of OID_MULTIRANGE_GREATER_OP.
+				 *
+				 * Overestimate by comparing only the lower bounds. Higher
+				 * accuracy would require us to subtract P(lower1 = lower2) *
+				 * P(upper1 > upper2)
+				 */
+				selec = 1 - calc_hist_join_selectivity(rng_typcache,
+													   hist2_lower, nhist2,
+													   hist1_lower, nhist1);
+				break;
+
+			case OID_MULTIRANGE_LESS_OP:
+
+				/*
+				 * A < B
+				 *
+				 * Start by comparing lower bounds and if they are equal
+				 * compare upper bounds
+				 *
+				 * Underestimate by comparing only the lower bounds. Higher
+				 * accuracy would require us to add P(lower1 = lower2) *
+				 * P(upper1 < upper2)
+				 */
+				selec = calc_hist_join_selectivity(rng_typcache,
+												   hist1_lower, nhist1,
+												   hist2_lower, nhist2);
+				break;
+
+			case OID_MULTIRANGE_GREATER_EQUAL_OP:
+
+				/*
+				 * A >= B
+				 *
+				 * Start by comparing lower bounds and if they are equal
+				 * compare upper bounds
+				 *
+				 * Negation of OID_RANGE_LESS_OP.
+				 *
+				 * Overestimate by comparing only the lower bounds. Higher
+				 * accuracy would require us to subtract P(lower1 = lower2) *
+				 * P(upper1 < upper2)
+				 */
+				selec = 1 - calc_hist_join_selectivity(rng_typcache,
+													   hist1_lower, nhist1,
+													   hist2_lower, nhist2);
+				break;
+
+			case OID_MULTIRANGE_GREATER_OP:
+
+				/*
+				 * A > B == B < A
+				 *
+				 * Start by comparing lower bounds and if they are equal
+				 * compare upper bounds
+				 *
+				 * Underestimate by comparing only the lower bounds. Higher
+				 * accuracy would require us to add P(lower1 = lower2) *
+				 * P(upper1 > upper2)
+				 */
+				selec = calc_hist_join_selectivity(rng_typcache,
+												   hist2_lower, nhist2,
+												   hist1_lower, nhist1);
+				break;
+
+			case OID_MULTIRANGE_LEFT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_LEFT_RANGE_OP:
+			case OID_RANGE_LEFT_MULTIRANGE_OP:
+				/* var1 << var2 when upper(var1) < lower(var2) */
+				selec = calc_hist_join_selectivity(rng_typcache,
+												   hist1_upper, nhist1,
+												   hist2_lower, nhist2);
+				break;
+
+			case OID_MULTIRANGE_RIGHT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_RIGHT_RANGE_OP:
+			case OID_RANGE_RIGHT_MULTIRANGE_OP:
+				/* var1 >> var2 when upper(var2) < lower(var1) */
+				selec = calc_hist_join_selectivity(rng_typcache,
+												   hist2_upper, nhist2,
+												   hist1_lower, nhist1);
+				break;
+
+			case OID_MULTIRANGE_OVERLAPS_LEFT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_OVERLAPS_LEFT_RANGE_OP:
+			case OID_RANGE_OVERLAPS_LEFT_MULTIRANGE_OP:
+				/* var1 &< var2 when upper(var1) < upper(var2) */
+				selec = calc_hist_join_selectivity(rng_typcache,
+												   hist1_upper, nhist1,
+												   hist2_upper, nhist2);
+				break;
+
+			case OID_MULTIRANGE_OVERLAPS_RIGHT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_OVERLAPS_RIGHT_RANGE_OP:
+			case OID_RANGE_OVERLAPS_RIGHT_MULTIRANGE_OP:
+				/* var1 &> var2 when lower(var2) < lower(var1) */
+				selec = calc_hist_join_selectivity(rng_typcache,
+												   hist2_lower, nhist2,
+												   hist1_lower, nhist1);
+				break;
+
+			case OID_MULTIRANGE_MULTIRANGE_CONTAINED_OP:
+			case OID_MULTIRANGE_RANGE_CONTAINED_OP:
+			case OID_RANGE_MULTIRANGE_CONTAINED_OP:
+
+				/*
+				 * var1 <@ var2 is equivalent to lower(var2) <= lower(var1)
+				 * and upper(var1) <= upper(var2)
+				 *
+				 * Rewriting each condition as a negated strict inequality
+				 * gives not( lower(var1) < lower(var2) ) and
+				 * not( upper(var2) < upper(var1) ). Assuming the two are
+				 * independent, multiply their selectivities.
+				 */
+				selec = 1 - calc_hist_join_selectivity(rng_typcache,
+													   hist1_lower, nhist1,
+													   hist2_lower, nhist2);
+				selec *= 1 - calc_hist_join_selectivity(rng_typcache,
+														hist2_upper, nhist2,
+														hist1_upper, nhist1);
+				break;
+
+			case OID_MULTIRANGE_CONTAINS_MULTIRANGE_OP:
+			case OID_MULTIRANGE_CONTAINS_RANGE_OP:
+			case OID_RANGE_CONTAINS_MULTIRANGE_OP:
+
+				/*
+				 * var1 @> var2 is equivalent to lower(var1) <= lower(var2)
+				 * and upper(var2) <= upper(var1)
+				 *
+				 * Rewriting each condition as a negated strict inequality
+				 * gives not( lower(var2) < lower(var1) ) and
+				 * not( upper(var1) < upper(var2) ). Assuming the two are
+				 * independent, multiply their selectivities.
+				 */
+				selec = 1 - calc_hist_join_selectivity(rng_typcache,
+													   hist2_lower, nhist2,
+													   hist1_lower, nhist1);
+				selec *= 1 - calc_hist_join_selectivity(rng_typcache,
+														hist1_upper, nhist1,
+														hist2_upper, nhist2);
+				break;
+
+			case OID_MULTIRANGE_ADJACENT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_ADJACENT_RANGE_OP:
+			case OID_RANGE_ADJACENT_MULTIRANGE_OP:
+
+				/*
+				 * just punt for now, estimation would require equality
+				 * selectivity for bounds
+				 */
+			case OID_MULTIRANGE_CONTAINS_ELEM_OP:
+			case OID_MULTIRANGE_ELEM_CONTAINED_OP:
+
+				/*
+				 * just punt for now, estimation would require extraction of
+				 * histograms for the anyelement
+				 */
+			default:
+				break;
+		}
+
+
+		/* the calculated selectivity only applies to non-empty (multi)ranges */
+		selec *= (1 - empty_frac1) * (1 - empty_frac2);
+
+		/*
+		 * Depending on the operator, empty (multi)ranges might match
+		 * different fractions of the result.
+		 */
+		switch (operator)
+		{
+			case OID_MULTIRANGE_LESS_OP:
+
+				/*
+				 * empty (multi)range < non-empty (multi)range
+				 */
+				selec += empty_frac1 * (1 - empty_frac2);
+				break;
+
+			case OID_MULTIRANGE_GREATER_OP:
+
+				/*
+				 * non-empty (multi)range > empty (multi)range
+				 */
+				selec += (1 - empty_frac1) * empty_frac2;
+				break;
+
+			case OID_MULTIRANGE_MULTIRANGE_CONTAINED_OP:
+			case OID_MULTIRANGE_RANGE_CONTAINED_OP:
+			case OID_RANGE_MULTIRANGE_CONTAINED_OP:
+
+				/*
+				 * empty (multi)range <@ any (multi)range
+				 */
+			case OID_MULTIRANGE_LESS_EQUAL_OP:
+
+				/*
+				 * empty (multi)range <= any (multi)range
+				 */
+				selec += empty_frac1;
+				break;
+
+			case OID_MULTIRANGE_CONTAINS_MULTIRANGE_OP:
+			case OID_MULTIRANGE_CONTAINS_RANGE_OP:
+			case OID_RANGE_CONTAINS_MULTIRANGE_OP:
+
+				/*
+				 * any (multi)range @> empty (multi)range
+				 */
+			case OID_MULTIRANGE_GREATER_EQUAL_OP:
+
+				/*
+				 * any (multi)range >= empty (multi)range
+				 */
+				selec += empty_frac2;
+				break;
+
+			case OID_MULTIRANGE_CONTAINS_ELEM_OP:
+			case OID_MULTIRANGE_ELEM_CONTAINED_OP:
+			case OID_MULTIRANGE_OVERLAPS_MULTIRANGE_OP:
+			case OID_MULTIRANGE_OVERLAPS_RANGE_OP:
+			case OID_RANGE_OVERLAPS_MULTIRANGE_OP:
+			case OID_MULTIRANGE_OVERLAPS_LEFT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_OVERLAPS_LEFT_RANGE_OP:
+			case OID_RANGE_OVERLAPS_LEFT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_OVERLAPS_RIGHT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_OVERLAPS_RIGHT_RANGE_OP:
+			case OID_RANGE_OVERLAPS_RIGHT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_LEFT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_LEFT_RANGE_OP:
+			case OID_RANGE_LEFT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_RIGHT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_RIGHT_RANGE_OP:
+			case OID_RANGE_RIGHT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_ADJACENT_MULTIRANGE_OP:
+			case OID_MULTIRANGE_ADJACENT_RANGE_OP:
+			case OID_RANGE_ADJACENT_MULTIRANGE_OP:
+			default:
+
+				/*
+				 * these operators always return false when an empty
+				 * (multi)range is involved
+				 */
+				break;
+
+		}
+
+		/* all range operators are strict */
+		selec *= (1 - null_frac1) * (1 - null_frac2);
+
+		free_attstatsslot(&hist1);
+		free_attstatsslot(&hist2);
+	}
+
+	ReleaseVariableStats(vardata1);
+	ReleaseVariableStats(vardata2);
+
+	CLAMP_PROBABILITY(selec);
+
+	PG_RETURN_FLOAT8((float8) selec);
+
+}
diff --git a/src/backend/utils/adt/rangetypes_selfuncs.c b/src/backend/utils/adt/rangetypes_selfuncs.c
index c260012bd0..be3479cb0b 100644
--- a/src/backend/utils/adt/rangetypes_selfuncs.c
+++ b/src/backend/utils/adt/rangetypes_selfuncs.c
@@ -1221,3 +1221,496 @@ calc_hist_selectivity_contains(TypeCacheEntry *typcache,
 
 	return sum_frac;
 }
+
+/*
+ * This is a utility function used to estimate the join selectivity of
+ * range attributes using rangebound histogram statistics as described
+ * in this paper:
+ *
+ * Diogo Repas, Zhicheng Luo, Maxime Schoemans and Mahmoud Sakr, 2022
+ * Selectivity Estimation of Inequality Joins In Databases
+ * https://doi.org/10.48550/arXiv.2206.07396
+ *
+ * The attributes being joined will be treated as random variables
+ * that follow a distribution modeled by a Probability Density Function (PDF).
+ * Let the two attributes be denoted X, Y.
+ * This function finds the probability P(X < Y).
+ * Note that the PDFs of the two variables can easily be obtained
+ * from their bounds histograms, hist1 and hist2 respectively.
+ *
+ * Let the PDFs of X and Y be denoted f_X and f_Y.
+ * The probability P(X < Y) can be formalized as follows:
+ * P(X < Y) = integral_-inf^inf( integral_-inf^y( f_X(x) * f_Y(y) dx ) dy )
+ *          = integral_-inf^inf( F_X(y) * f_Y(y) dy )
+ * where F_X(y) denotes the Cumulative Distribution Function of X at y.
+ * Note that F_X(y) is the (non-join) selectivity of X < y,
+ * which is computed by the function calc_hist_selectivity_scalar.
+ *
+ * Now given the histograms of the two attributes X, Y, we note the following:
+ * - The PDF of Y is a step function
+ *   (piecewise constant, where each piece is defined in a bin of Y's histogram)
+ * - The CDF of X is piecewise linear
+ *   (each piece is defined in a bin of X's histogram)
+ * This leads to the conclusion that their product
+ * (the integrand in the equation above) is also piecewise linear.
+ * A new piece starts whenever either the bin of X or the bin of Y changes.
+ * By scanning the two rangebound histograms of X and Y in parallel,
+ * we evaluate one piece of the result between every two consecutive rangebounds
+ * in the union of the two histograms.
+ *
+ * Given that the product F_X * f_Y is linear in the interval
+ * between every two consecutive rangebounds, denoted prev and cur,
+ * it can be shown that the above formula can be discretized into the following:
+ * P(X < Y) =
+ *   0.5 * sum_0^{n+m-1} ( ( F_X(prev) + F_X(cur) ) * ( F_Y(cur) - F_Y(prev) ) )
+ * where n, m are the lengths of the two histograms.
+ *
+ * As such, it is possible to fully compute the join selectivity
+ * as a summation of CDFs, iterating over the bounds of the two histograms.
+ * This maximizes code reuse, since the CDF is computed using
+ * the calc_hist_selectivity_scalar function, which is the function used
+ * for selectivity estimation (non-joins).
+ *
+ */
+static double
+calc_hist_join_selectivity(TypeCacheEntry *typcache,
+						   const RangeBound *hist1, int nhist1,
+						   const RangeBound *hist2, int nhist2)
+{
+	int			i,
+				j;
+	double		selectivity = 0.0,	/* initialization */
+				prev_sel1 = -1.0,	/* to skip the first iteration */
+				prev_sel2 = 0.0;	/* initialization */
+
+	/*
+	 * Histograms will never be empty. In fact, a histogram will never have
+	 * fewer than 2 values (1 bin)
+	 */
+	Assert(nhist1 > 1);
+	Assert(nhist2 > 1);
+
+	/* Fast-forward i and j to the start of the overlap, staying in bounds */
+	for (i = 0; i < nhist1 && range_cmp_bound_values(typcache, &hist1[i], &hist2[0]) < 0; i++);
+	for (j = 0; j < nhist2 && range_cmp_bound_values(typcache, &hist2[j], &hist1[0]) < 0; j++);
+
+	/* Do the estimation on overlapping regions */
+	while (i < nhist1 && j < nhist2)
+	{
+		double		cur_sel1,
+					cur_sel2;
+		RangeBound	cur_sync;
+
+		if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) < 0)
+			cur_sync = hist1[i++];
+		else if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) > 0)
+			cur_sync = hist2[j++];
+		else
+		{
+			/* If equal, use the bound once and advance both histograms */
+			cur_sync = hist1[i];
+			i++;
+			j++;
+		}
+		cur_sel1 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist1, nhist1, false);
+		cur_sel2 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist2, nhist2, false);
+
+		/* Skip the first iteration */
+		if (prev_sel1 >= 0)
+			selectivity += (prev_sel1 + cur_sel1) * (cur_sel2 - prev_sel2);
+
+		/* Prepare for the next iteration */
+		prev_sel1 = cur_sel1;
+		prev_sel2 = cur_sel2;
+	}
+
+	/* P(X < Y) = 0.5 * Sum(...) */
+	selectivity /= 2;
+
+	/* Include remainder of hist2 if any */
+	if (j < nhist2)
+		selectivity += 1 - prev_sel2;
+
+	return selectivity;
+}
+
+/*
+ * rangejoinsel -- join selectivity for range operators
+ */
+Datum
+rangejoinsel(PG_FUNCTION_ARGS)
+{
+	PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+	Oid			operator = PG_GETARG_OID(1);
+	List	   *args = (List *) PG_GETARG_POINTER(2);
+	SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) PG_GETARG_POINTER(4);
+	VariableStatData vardata1,
+				vardata2;
+	AttStatsSlot hist1,
+				hist2,
+				sslot;
+	bool		reversed;
+	Selectivity selec;
+	TypeCacheEntry *typcache = NULL;
+	Form_pg_statistic stats1,
+				stats2;
+	double		empty_frac1,
+				empty_frac2,
+				null_frac1,
+				null_frac2;
+	int			nhist1,
+				nhist2;
+	RangeBound *hist1_lower,
+			   *hist1_upper,
+			   *hist2_lower,
+			   *hist2_upper;
+	bool		empty;
+	int			i;
+
+	get_join_variables(root, args, sjinfo, &vardata1, &vardata2, &reversed);
+
+	selec = default_range_selectivity(operator);
+
+	if (HeapTupleIsValid(vardata1.statsTuple) &&
+		get_attstatsslot(&hist1, vardata1.statsTuple,
+						 STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						 ATTSTATSSLOT_VALUES) &&
+		HeapTupleIsValid(vardata2.statsTuple) &&
+		get_attstatsslot(&hist2, vardata2.statsTuple,
+						 STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						 ATTSTATSSLOT_VALUES) &&
+		vardata1.vartype == vardata2.vartype)
+	{
+
+		/* Initialize type cache */
+		typcache = range_get_typcache(fcinfo, vardata1.vartype);
+
+		/*
+		 * First look up the fraction of NULLs and empty ranges from
+		 * pg_statistic.
+		 */
+		stats1 = (Form_pg_statistic) GETSTRUCT(vardata1.statsTuple);
+		stats2 = (Form_pg_statistic) GETSTRUCT(vardata2.statsTuple);
+
+		null_frac1 = stats1->stanullfrac;
+		null_frac2 = stats2->stanullfrac;
+
+		/* Try to get fraction of empty ranges for the first variable */
+		if (get_attstatsslot(&sslot, vardata1.statsTuple,
+							 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+							 InvalidOid,
+							 ATTSTATSSLOT_NUMBERS))
+		{
+			if (sslot.nnumbers != 1)	/* shouldn't happen */
+				elog(ERROR, "invalid empty fraction statistic");
+			empty_frac1 = sslot.numbers[0];
+			free_attstatsslot(&sslot);
+		}
+		else
+		{
+			/* No empty fraction statistic. Assume no empty ranges. */
+			empty_frac1 = 0.0;
+		}
+
+		/* Try to get fraction of empty ranges for the second variable */
+		if (get_attstatsslot(&sslot, vardata2.statsTuple,
+							 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+							 InvalidOid,
+							 ATTSTATSSLOT_NUMBERS))
+		{
+			if (sslot.nnumbers != 1)	/* shouldn't happen */
+				elog(ERROR, "invalid empty fraction statistic");
+			empty_frac2 = sslot.numbers[0];
+			free_attstatsslot(&sslot);
+		}
+		else
+		{
+			/* No empty fraction statistic. Assume no empty ranges. */
+			empty_frac2 = 0.0;
+		}
+
+		/*
+		 * Convert histograms of ranges into histograms of their lower and
+		 * upper bounds for the first variable.
+		 */
+		nhist1 = hist1.nvalues;
+		hist1_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+		hist1_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+		for (i = 0; i < nhist1; i++)
+		{
+			range_deserialize(typcache, DatumGetRangeTypeP(hist1.values[i]),
+							  &hist1_lower[i], &hist1_upper[i], &empty);
+			/* The histogram should not contain any empty ranges */
+			if (empty)
+				elog(ERROR, "bounds histogram contains an empty range");
+		}
+
+		/*
+		 * Convert histograms of ranges into histograms of their lower and
+		 * upper bounds for the second variable.
+		 */
+		nhist2 = hist2.nvalues;
+		hist2_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+		hist2_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+		for (i = 0; i < nhist2; i++)
+		{
+			range_deserialize(typcache, DatumGetRangeTypeP(hist2.values[i]),
+							  &hist2_lower[i], &hist2_upper[i], &empty);
+			/* The histogram should not contain any empty ranges */
+			if (empty)
+				elog(ERROR, "bounds histogram contains an empty range");
+		}
+
+		switch (operator)
+		{
+			case OID_RANGE_OVERLAP_OP:
+
+				/*
+				 * Selectivity of A && B = Selectivity of NOT( A << B || A >>
+				 * B ) = 1 - Selectivity of (A.upper < B.lower) - Selectivity
+				 * of (B.upper < A.lower)
+				 */
+				selec = 1;
+				selec -= calc_hist_join_selectivity(typcache,
+													hist1_upper, nhist1,
+													hist2_lower, nhist2);
+				selec -= calc_hist_join_selectivity(typcache,
+													hist2_upper, nhist2,
+													hist1_lower, nhist1);
+				break;
+
+			case OID_RANGE_LESS_EQUAL_OP:
+
+				/*
+				 * A <= B
+				 *
+				 * Start by comparing lower bounds and if they are equal
+				 * compare upper bounds
+				 *
+				 * Negation of OID_RANGE_GREATER_OP.
+				 *
+				 * Overestimate by comparing only the lower bounds. Higher
+				 * accuracy would require us to subtract P(lower1 = lower2) *
+				 * P(upper1 > upper2)
+				 */
+				selec = 1 - calc_hist_join_selectivity(typcache,
+													   hist2_lower, nhist2,
+													   hist1_lower, nhist1);
+				break;
+
+			case OID_RANGE_LESS_OP:
+
+				/*
+				 * A < B
+				 *
+				 * Start by comparing lower bounds and if they are equal
+				 * compare upper bounds
+				 *
+				 * Underestimate by comparing only the lower bounds. Higher
+				 * accuracy would require us to add P(lower1 = lower2) *
+				 * P(upper1 < upper2)
+				 */
+				selec = calc_hist_join_selectivity(typcache,
+												   hist1_lower, nhist1,
+												   hist2_lower, nhist2);
+				break;
+
+			case OID_RANGE_GREATER_EQUAL_OP:
+
+				/*
+				 * A >= B
+				 *
+				 * Start by comparing lower bounds and if they are equal
+				 * compare upper bounds
+				 *
+				 * Negation of OID_RANGE_LESS_OP.
+				 *
+				 * Overestimate by comparing only the lower bounds. Higher
+				 * accuracy would require us to subtract P(lower1 = lower2) *
+				 * P(upper1 < upper2)
+				 */
+				selec = 1 - calc_hist_join_selectivity(typcache,
+													   hist1_lower, nhist1,
+													   hist2_lower, nhist2);
+				break;
+
+			case OID_RANGE_GREATER_OP:
+
+				/*
+				 * A > B == B < A
+				 *
+				 * Start by comparing lower bounds and if they are equal
+				 * compare upper bounds
+				 *
+				 * Underestimate by comparing only the lower bounds. Higher
+				 * accuracy would require us to add P(lower1 = lower2) *
+				 * P(upper1 > upper2)
+				 */
+				selec = calc_hist_join_selectivity(typcache,
+												   hist2_lower, nhist2,
+												   hist1_lower, nhist1);
+				break;
+
+			case OID_RANGE_LEFT_OP:
+				/* var1 << var2 when upper(var1) < lower(var2) */
+				selec = calc_hist_join_selectivity(typcache,
+												   hist1_upper, nhist1,
+												   hist2_lower, nhist2);
+				break;
+
+			case OID_RANGE_RIGHT_OP:
+				/* var1 >> var2 when upper(var2) < lower(var1) */
+				selec = calc_hist_join_selectivity(typcache,
+												   hist2_upper, nhist2,
+												   hist1_lower, nhist1);
+				break;
+
+			case OID_RANGE_OVERLAPS_LEFT_OP:
+				/* var1 &< var2 when upper(var1) < upper(var2) */
+				selec = calc_hist_join_selectivity(typcache,
+												   hist1_upper, nhist1,
+												   hist2_upper, nhist2);
+				break;
+
+			case OID_RANGE_OVERLAPS_RIGHT_OP:
+				/* var1 &> var2 when lower(var2) < lower(var1) */
+				selec = calc_hist_join_selectivity(typcache,
+												   hist2_lower, nhist2,
+												   hist1_lower, nhist1);
+				break;
+
+			case OID_RANGE_CONTAINED_OP:
+
+				/*
+				 * var1 <@ var2 is equivalent to lower(var2) <= lower(var1)
+				 * and upper(var1) <= upper(var2)
+				 *
+				 * Rewriting each condition as a negated strict inequality
+				 * gives not( lower(var1) < lower(var2) ) and
+				 * not( upper(var2) < upper(var1) ). Assuming the two are
+				 * independent, multiply their selectivities.
+				 */
+				selec = 1 - calc_hist_join_selectivity(typcache,
+													   hist1_lower, nhist1,
+													   hist2_lower, nhist2);
+				selec *= 1 - calc_hist_join_selectivity(typcache,
+														hist2_upper, nhist2,
+														hist1_upper, nhist1);
+				break;
+
+			case OID_RANGE_CONTAINS_OP:
+
+				/*
+				 * var1 @> var2 is equivalent to lower(var1) <= lower(var2)
+				 * and upper(var2) <= upper(var1)
+				 *
+				 * Rewriting each condition as a negated strict inequality
+				 * gives not( lower(var2) < lower(var1) ) and
+				 * not( upper(var1) < upper(var2) ). Assuming the two are
+				 * independent, multiply their selectivities.
+				 */
+				selec = 1 - calc_hist_join_selectivity(typcache,
+													   hist2_lower, nhist2,
+													   hist1_lower, nhist1);
+				selec *= 1 - calc_hist_join_selectivity(typcache,
+														hist1_upper, nhist1,
+														hist2_upper, nhist2);
+				break;
+
+			case OID_RANGE_CONTAINS_ELEM_OP:
+			case OID_RANGE_ELEM_CONTAINED_OP:
+
+				/*
+				 * just punt for now, estimation would require extraction of
+				 * histograms for the anyelement
+				 */
+			default:
+				break;
+		}
+
+
+		/* the calculated selectivity only applies to non-empty ranges */
+		selec *= (1 - empty_frac1) * (1 - empty_frac2);
+
+		/*
+		 * Depending on the operator, empty ranges might match different
+		 * fractions of the result.
+		 */
+		switch (operator)
+		{
+			case OID_RANGE_LESS_OP:
+
+				/*
+				 * empty range < non-empty range
+				 */
+				selec += empty_frac1 * (1 - empty_frac2);
+				break;
+
+			case OID_RANGE_GREATER_OP:
+
+				/*
+				 * non-empty range > empty range
+				 */
+				selec += (1 - empty_frac1) * empty_frac2;
+				break;
+
+			case OID_RANGE_CONTAINED_OP:
+
+				/*
+				 * empty range <@ any range
+				 */
+			case OID_RANGE_LESS_EQUAL_OP:
+
+				/*
+				 * empty range <= any range
+				 */
+				selec += empty_frac1;
+				break;
+
+			case OID_RANGE_CONTAINS_OP:
+
+				/*
+				 * any range @> empty range
+				 */
+			case OID_RANGE_GREATER_EQUAL_OP:
+
+				/*
+				 * any range >= empty range
+				 */
+				selec += empty_frac2;
+				break;
+
+			case OID_RANGE_CONTAINS_ELEM_OP:
+			case OID_RANGE_ELEM_CONTAINED_OP:
+			case OID_RANGE_OVERLAP_OP:
+			case OID_RANGE_OVERLAPS_LEFT_OP:
+			case OID_RANGE_OVERLAPS_RIGHT_OP:
+			case OID_RANGE_LEFT_OP:
+			case OID_RANGE_RIGHT_OP:
+			default:
+
+				/*
+				 * these operators always return false when an empty range is
+				 * involved
+				 */
+				break;
+
+		}
+
+		/* all range operators are strict */
+		selec *= (1 - null_frac1) * (1 - null_frac2);
+
+		free_attstatsslot(&hist1);
+		free_attstatsslot(&hist2);
+	}
+
+	ReleaseVariableStats(vardata1);
+	ReleaseVariableStats(vardata2);
+
+	CLAMP_PROBABILITY(selec);
+
+	PG_RETURN_FLOAT8((float8) selec);
+
+}
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 0e7511dde1..cbee3c2293 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -3071,78 +3071,78 @@
   oprname => '<', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '>(anyrange,anyrange)',
   oprnegate => '>=(anyrange,anyrange)', oprcode => 'range_lt',
-  oprrest => 'rangesel', oprjoin => 'scalarltjoinsel' },
+  oprrest => 'rangesel', oprjoin => 'rangejoinsel' },
 { oid => '3885', oid_symbol => 'OID_RANGE_LESS_EQUAL_OP',
   descr => 'less than or equal',
   oprname => '<=', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '>=(anyrange,anyrange)',
   oprnegate => '>(anyrange,anyrange)', oprcode => 'range_le',
-  oprrest => 'rangesel', oprjoin => 'scalarlejoinsel' },
+  oprrest => 'rangesel', oprjoin => 'rangejoinsel' },
 { oid => '3886', oid_symbol => 'OID_RANGE_GREATER_EQUAL_OP',
   descr => 'greater than or equal',
   oprname => '>=', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '<=(anyrange,anyrange)',
   oprnegate => '<(anyrange,anyrange)', oprcode => 'range_ge',
-  oprrest => 'rangesel', oprjoin => 'scalargejoinsel' },
+  oprrest => 'rangesel', oprjoin => 'rangejoinsel' },
 { oid => '3887', oid_symbol => 'OID_RANGE_GREATER_OP',
   descr => 'greater than',
   oprname => '>', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '<(anyrange,anyrange)',
   oprnegate => '<=(anyrange,anyrange)', oprcode => 'range_gt',
-  oprrest => 'rangesel', oprjoin => 'scalargtjoinsel' },
+  oprrest => 'rangesel', oprjoin => 'rangejoinsel' },
 { oid => '3888', oid_symbol => 'OID_RANGE_OVERLAP_OP', descr => 'overlaps',
   oprname => '&&', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '&&(anyrange,anyrange)',
   oprcode => 'range_overlaps', oprrest => 'rangesel',
-  oprjoin => 'areajoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3889', oid_symbol => 'OID_RANGE_CONTAINS_ELEM_OP',
   descr => 'contains',
   oprname => '@>', oprleft => 'anyrange', oprright => 'anyelement',
   oprresult => 'bool', oprcom => '<@(anyelement,anyrange)',
   oprcode => 'range_contains_elem', oprrest => 'rangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3890', oid_symbol => 'OID_RANGE_CONTAINS_OP', descr => 'contains',
   oprname => '@>', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '<@(anyrange,anyrange)',
   oprcode => 'range_contains', oprrest => 'rangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3891', oid_symbol => 'OID_RANGE_ELEM_CONTAINED_OP',
   descr => 'is contained by',
   oprname => '<@', oprleft => 'anyelement', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '@>(anyrange,anyelement)',
   oprcode => 'elem_contained_by_range', oprrest => 'rangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3892', oid_symbol => 'OID_RANGE_CONTAINED_OP',
   descr => 'is contained by',
   oprname => '<@', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '@>(anyrange,anyrange)',
   oprcode => 'range_contained_by', oprrest => 'rangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3893', oid_symbol => 'OID_RANGE_LEFT_OP', descr => 'is left of',
   oprname => '<<', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '>>(anyrange,anyrange)',
   oprcode => 'range_before', oprrest => 'rangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3894', oid_symbol => 'OID_RANGE_RIGHT_OP', descr => 'is right of',
   oprname => '>>', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '<<(anyrange,anyrange)',
   oprcode => 'range_after', oprrest => 'rangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3895', oid_symbol => 'OID_RANGE_OVERLAPS_LEFT_OP',
   descr => 'overlaps or is left of',
   oprname => '&<', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcode => 'range_overleft', oprrest => 'rangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3896', oid_symbol => 'OID_RANGE_OVERLAPS_RIGHT_OP',
   descr => 'overlaps or is right of',
   oprname => '&>', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcode => 'range_overright', oprrest => 'rangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3897', descr => 'is adjacent to',
   oprname => '-|-', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '-|-(anyrange,anyrange)',
   oprcode => 'range_adjacent', oprrest => 'matchingsel',
-  oprjoin => 'matchingjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3898', descr => 'range union',
   oprname => '+', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'anyrange', oprcom => '+(anyrange,anyrange)',
@@ -3277,139 +3277,139 @@
   oprname => '<', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '>(anymultirange,anymultirange)',
   oprnegate => '>=(anymultirange,anymultirange)', oprcode => 'multirange_lt',
-  oprrest => 'multirangesel', oprjoin => 'scalarltjoinsel' },
+  oprrest => 'multirangesel', oprjoin => 'multirangejoinsel' },
 { oid => '2863', oid_symbol => 'OID_MULTIRANGE_LESS_EQUAL_OP',
   descr => 'less than or equal',
   oprname => '<=', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '>=(anymultirange,anymultirange)',
   oprnegate => '>(anymultirange,anymultirange)', oprcode => 'multirange_le',
-  oprrest => 'multirangesel', oprjoin => 'scalarlejoinsel' },
+  oprrest => 'multirangesel', oprjoin => 'multirangejoinsel' },
 { oid => '2864', oid_symbol => 'OID_MULTIRANGE_GREATER_EQUAL_OP',
   descr => 'greater than or equal',
   oprname => '>=', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '<=(anymultirange,anymultirange)',
   oprnegate => '<(anymultirange,anymultirange)', oprcode => 'multirange_ge',
-  oprrest => 'multirangesel', oprjoin => 'scalargejoinsel' },
+  oprrest => 'multirangesel', oprjoin => 'multirangejoinsel' },
 { oid => '2865', oid_symbol => 'OID_MULTIRANGE_GREATER_OP',
   descr => 'greater than',
   oprname => '>', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '<(anymultirange,anymultirange)',
   oprnegate => '<=(anymultirange,anymultirange)', oprcode => 'multirange_gt',
-  oprrest => 'multirangesel', oprjoin => 'scalargtjoinsel' },
+  oprrest => 'multirangesel', oprjoin => 'multirangejoinsel' },
 { oid => '2866', oid_symbol => 'OID_RANGE_OVERLAPS_MULTIRANGE_OP',
   descr => 'overlaps',
   oprname => '&&', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '&&(anymultirange,anyrange)',
   oprcode => 'range_overlaps_multirange', oprrest => 'multirangesel',
-  oprjoin => 'areajoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2867', oid_symbol => 'OID_MULTIRANGE_OVERLAPS_RANGE_OP',
   descr => 'overlaps',
   oprname => '&&', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '&&(anyrange,anymultirange)',
   oprcode => 'multirange_overlaps_range', oprrest => 'multirangesel',
-  oprjoin => 'areajoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2868', oid_symbol => 'OID_MULTIRANGE_OVERLAPS_MULTIRANGE_OP',
   descr => 'overlaps',
   oprname => '&&', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '&&(anymultirange,anymultirange)',
   oprcode => 'multirange_overlaps_multirange', oprrest => 'multirangesel',
-  oprjoin => 'areajoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2869', oid_symbol => 'OID_MULTIRANGE_CONTAINS_ELEM_OP',
   descr => 'contains',
   oprname => '@>', oprleft => 'anymultirange', oprright => 'anyelement',
   oprresult => 'bool', oprcom => '<@(anyelement,anymultirange)',
   oprcode => 'multirange_contains_elem', oprrest => 'multirangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2870', oid_symbol => 'OID_MULTIRANGE_CONTAINS_RANGE_OP',
   descr => 'contains',
   oprname => '@>', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '<@(anyrange,anymultirange)',
   oprcode => 'multirange_contains_range', oprrest => 'multirangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2871', oid_symbol => 'OID_MULTIRANGE_CONTAINS_MULTIRANGE_OP',
   descr => 'contains',
   oprname => '@>', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '<@(anymultirange,anymultirange)',
   oprcode => 'multirange_contains_multirange', oprrest => 'multirangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2872', oid_symbol => 'OID_MULTIRANGE_ELEM_CONTAINED_OP',
   descr => 'is contained by',
   oprname => '<@', oprleft => 'anyelement', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '@>(anymultirange,anyelement)',
   oprcode => 'elem_contained_by_multirange', oprrest => 'multirangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2873', oid_symbol => 'OID_MULTIRANGE_RANGE_CONTAINED_OP',
   descr => 'is contained by',
   oprname => '<@', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '@>(anymultirange,anyrange)',
   oprcode => 'range_contained_by_multirange', oprrest => 'multirangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2874', oid_symbol => 'OID_MULTIRANGE_MULTIRANGE_CONTAINED_OP',
   descr => 'is contained by',
   oprname => '<@', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '@>(anymultirange,anymultirange)',
   oprcode => 'multirange_contained_by_multirange', oprrest => 'multirangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4539', oid_symbol => 'OID_RANGE_CONTAINS_MULTIRANGE_OP',
   descr => 'contains',
   oprname => '@>', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '<@(anymultirange,anyrange)',
   oprcode => 'range_contains_multirange', oprrest => 'multirangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4540', oid_symbol => 'OID_RANGE_MULTIRANGE_CONTAINED_OP',
   descr => 'is contained by',
   oprname => '<@', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '@>(anyrange,anymultirange)',
   oprcode => 'multirange_contained_by_range', oprrest => 'multirangesel',
-  oprjoin => 'contjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2875', oid_symbol => 'OID_RANGE_OVERLAPS_LEFT_MULTIRANGE_OP',
   descr => 'overlaps or is left of',
   oprname => '&<', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcode => 'range_overleft_multirange',
-  oprrest => 'multirangesel', oprjoin => 'scalarltjoinsel' },
+  oprrest => 'multirangesel', oprjoin => 'multirangejoinsel' },
 { oid => '2876', oid_symbol => 'OID_MULTIRANGE_OVERLAPS_LEFT_RANGE_OP',
   descr => 'overlaps or is left of',
   oprname => '&<', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcode => 'multirange_overleft_range',
-  oprrest => 'multirangesel', oprjoin => 'scalarltjoinsel' },
+  oprrest => 'multirangesel', oprjoin => 'multirangejoinsel' },
 { oid => '2877', oid_symbol => 'OID_MULTIRANGE_OVERLAPS_LEFT_MULTIRANGE_OP',
   descr => 'overlaps or is left of',
   oprname => '&<', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcode => 'multirange_overleft_multirange',
-  oprrest => 'multirangesel', oprjoin => 'scalarltjoinsel' },
+  oprrest => 'multirangesel', oprjoin => 'multirangejoinsel' },
 { oid => '3585', oid_symbol => 'OID_RANGE_OVERLAPS_RIGHT_MULTIRANGE_OP',
   descr => 'overlaps or is right of',
   oprname => '&>', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcode => 'range_overright_multirange',
-  oprrest => 'multirangesel', oprjoin => 'scalargtjoinsel' },
+  oprrest => 'multirangesel', oprjoin => 'multirangejoinsel' },
 { oid => '4035', oid_symbol => 'OID_MULTIRANGE_OVERLAPS_RIGHT_RANGE_OP',
   descr => 'overlaps or is right of',
   oprname => '&>', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcode => 'multirange_overright_range',
-  oprrest => 'multirangesel', oprjoin => 'scalargtjoinsel' },
+  oprrest => 'multirangesel', oprjoin => 'multirangejoinsel' },
 { oid => '4142', oid_symbol => 'OID_MULTIRANGE_OVERLAPS_RIGHT_MULTIRANGE_OP',
   descr => 'overlaps or is right of',
   oprname => '&>', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcode => 'multirange_overright_multirange',
-  oprrest => 'multirangesel', oprjoin => 'scalargtjoinsel' },
+  oprrest => 'multirangesel', oprjoin => 'multirangejoinsel' },
 { oid => '4179', oid_symbol => 'OID_RANGE_ADJACENT_MULTIRANGE_OP',
   descr => 'is adjacent to',
   oprname => '-|-', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '-|-(anymultirange,anyrange)',
   oprcode => 'range_adjacent_multirange', oprrest => 'matchingsel',
-  oprjoin => 'matchingjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4180', oid_symbol => 'OID_MULTIRANGE_ADJACENT_RANGE_OP',
   descr => 'is adjacent to',
   oprname => '-|-', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '-|-(anyrange,anymultirange)',
   oprcode => 'multirange_adjacent_range', oprrest => 'matchingsel',
-  oprjoin => 'matchingjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4198', oid_symbol => 'OID_MULTIRANGE_ADJACENT_MULTIRANGE_OP',
   descr => 'is adjacent to',
   oprname => '-|-', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '-|-(anymultirange,anymultirange)',
   oprcode => 'multirange_adjacent_multirange', oprrest => 'matchingsel',
-  oprjoin => 'matchingjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4392', descr => 'multirange union',
   oprname => '+', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'anymultirange', oprcom => '+(anymultirange,anymultirange)',
@@ -3426,36 +3426,36 @@
   oprname => '<<', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '>>(anymultirange,anyrange)',
   oprcode => 'range_before_multirange', oprrest => 'multirangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4396', oid_symbol => 'OID_MULTIRANGE_LEFT_RANGE_OP',
   descr => 'is left of',
   oprname => '<<', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '>>(anyrange,anymultirange)',
   oprcode => 'multirange_before_range', oprrest => 'multirangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4397', oid_symbol => 'OID_MULTIRANGE_LEFT_MULTIRANGE_OP',
   descr => 'is left of',
   oprname => '<<', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '>>(anymultirange,anymultirange)',
   oprcode => 'multirange_before_multirange', oprrest => 'multirangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4398', oid_symbol => 'OID_RANGE_RIGHT_MULTIRANGE_OP',
   descr => 'is right of',
   oprname => '>>', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '<<(anymultirange,anyrange)',
   oprcode => 'range_after_multirange', oprrest => 'multirangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4399', oid_symbol => 'OID_MULTIRANGE_RIGHT_RANGE_OP',
   descr => 'is right of',
   oprname => '>>', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '<<(anyrange,anymultirange)',
   oprcode => 'multirange_after_range', oprrest => 'multirangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4400', oid_symbol => 'OID_MULTIRANGE_RIGHT_MULTIRANGE_OP',
   descr => 'is right of',
   oprname => '>>', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '<<(anymultirange,anymultirange)',
   oprcode => 'multirange_after_multirange', oprrest => 'multirangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 
 ]
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7979392776..4eaaad9fe3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12124,6 +12124,15 @@
   proname => 'any_value_transfn', prorettype => 'anyelement',
   proargtypes => 'anyelement anyelement', prosrc => 'any_value_transfn' },
 
+{ oid => '8355', descr => 'join selectivity for range operators',
+  proname => 'rangejoinsel', provolatile => 's', prorettype => 'float8',
+  proargtypes => 'internal oid internal int2 internal',
+  prosrc => 'rangejoinsel' },
+{ oid => '8356', descr => 'join selectivity for multirange operators',
+  proname => 'multirangejoinsel', provolatile => 's', prorettype => 'float8',
+  proargtypes => 'internal oid internal int2 internal',
+  prosrc => 'multirangejoinsel' },
+
 { oid => '8436',
   descr => 'list of available WAL summary files',
   proname => 'pg_available_wal_summaries', prorows => '100',
diff --git a/src/test/regress/expected/multirangetypes.out b/src/test/regress/expected/multirangetypes.out
index a0cb875492..21d63d9bda 100644
--- a/src/test/regress/expected/multirangetypes.out
+++ b/src/test/regress/expected/multirangetypes.out
@@ -3361,3 +3361,66 @@ create function mr_table_fail(i anyelement) returns table(i anyelement, r anymul
   as $$ select $1, '[1,10]' $$ language sql;
 ERROR:  cannot determine result data type
 DETAIL:  A result of type anymultirange requires at least one input of type anyrange or anymultirange.
+--
+-- test selectivity of multirange join operators
+--
+create table test_multirange_join_1 (imr1 int4multirange);
+create table test_multirange_join_2 (imr2 int4multirange);
+create table test_multirange_join_3 (imr3 int4multirange);
+insert into test_multirange_join_1 select int4multirange(int4range(g, g+10)) from generate_series(1, 1000) g;
+insert into test_multirange_join_1 select int4multirange(int4range(g, g+100)) from generate_series(1, 1000, 10) g;
+insert into test_multirange_join_2 select int4multirange(int4range(g, g+10)) from generate_series(1, 500) g;
+insert into test_multirange_join_2 select int4multirange(int4range(g, g+100)) from generate_series(1, 500, 10) g;
+insert into test_multirange_join_3 select int4multirange(int4range(g, g+10)) from generate_series(501, 1000) g;
+insert into test_multirange_join_3 select int4multirange(int4range(g, g+100)) from generate_series(501, 1000, 10) g;
+analyze test_multirange_join_1;
+analyze test_multirange_join_2;
+analyze test_multirange_join_3;
+--reorder joins based on computed selectivity
+explain (costs off) select count(*) from test_multirange_join_1, test_multirange_join_2, test_multirange_join_3 where imr1 && imr2 and imr2 && imr3;
+                                       QUERY PLAN                                        
+-----------------------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_multirange_join_1.imr1 && test_multirange_join_2.imr2)
+         ->  Nested Loop
+               Join Filter: (test_multirange_join_2.imr2 && test_multirange_join_3.imr3)
+               ->  Seq Scan on test_multirange_join_2
+               ->  Materialize
+                     ->  Seq Scan on test_multirange_join_3
+         ->  Materialize
+               ->  Seq Scan on test_multirange_join_1
+(10 rows)
+
+explain (costs off) select count(*) from test_multirange_join_1, test_multirange_join_2, test_multirange_join_3 where imr1 << imr2 and imr2 << imr3;
+                                       QUERY PLAN                                        
+-----------------------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_multirange_join_2.imr2 << test_multirange_join_3.imr3)
+         ->  Nested Loop
+               Join Filter: (test_multirange_join_1.imr1 << test_multirange_join_2.imr2)
+               ->  Seq Scan on test_multirange_join_1
+               ->  Materialize
+                     ->  Seq Scan on test_multirange_join_2
+         ->  Materialize
+               ->  Seq Scan on test_multirange_join_3
+(10 rows)
+
+explain (costs off) select count(*) from test_multirange_join_1, test_multirange_join_2, test_multirange_join_3 where imr1 >> imr2 and imr2 >> imr3;
+                                       QUERY PLAN                                        
+-----------------------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_multirange_join_1.imr1 >> test_multirange_join_2.imr2)
+         ->  Nested Loop
+               Join Filter: (test_multirange_join_2.imr2 >> test_multirange_join_3.imr3)
+               ->  Seq Scan on test_multirange_join_2
+               ->  Materialize
+                     ->  Seq Scan on test_multirange_join_3
+         ->  Seq Scan on test_multirange_join_1
+(9 rows)
+
+drop table test_multirange_join_1;
+drop table test_multirange_join_2;
+drop table test_multirange_join_3;
diff --git a/src/test/regress/expected/rangetypes.out b/src/test/regress/expected/rangetypes.out
index ee02ff0163..357bb3154b 100644
--- a/src/test/regress/expected/rangetypes.out
+++ b/src/test/regress/expected/rangetypes.out
@@ -1834,3 +1834,66 @@ create function table_fail(i anyelement) returns table(i anyelement, r anyrange)
   as $$ select $1, '[1,10]' $$ language sql;
 ERROR:  cannot determine result data type
 DETAIL:  A result of type anyrange requires at least one input of type anyrange or anymultirange.
+--
+-- test selectivity of range join operators
+--
+create table test_range_join_1 (ir1 int4range);
+create table test_range_join_2 (ir2 int4range);
+create table test_range_join_3 (ir3 int4range);
+insert into test_range_join_1 select int4range(g, g+10) from generate_series(1, 1000) g;
+insert into test_range_join_1 select int4range(g, g+100) from generate_series(1, 1000, 10) g;
+insert into test_range_join_2 select int4range(g, g+10) from generate_series(1, 500) g;
+insert into test_range_join_2 select int4range(g, g+100) from generate_series(1, 500, 10) g;
+insert into test_range_join_3 select int4range(g, g+10) from generate_series(501, 1000) g;
+insert into test_range_join_3 select int4range(g, g+100) from generate_series(501, 1000, 10) g;
+analyze test_range_join_1;
+analyze test_range_join_2;
+analyze test_range_join_3;
+--reorder joins based on computed selectivity
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 && ir2 and ir2 && ir3;
+                                    QUERY PLAN                                     
+-----------------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_range_join_1.ir1 && test_range_join_2.ir2)
+         ->  Seq Scan on test_range_join_1
+         ->  Materialize
+               ->  Nested Loop
+                     Join Filter: (test_range_join_2.ir2 && test_range_join_3.ir3)
+                     ->  Seq Scan on test_range_join_2
+                     ->  Materialize
+                           ->  Seq Scan on test_range_join_3
+(10 rows)
+
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 << ir2 and ir2 << ir3;
+                                 QUERY PLAN                                  
+-----------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_range_join_2.ir2 << test_range_join_3.ir3)
+         ->  Nested Loop
+               Join Filter: (test_range_join_1.ir1 << test_range_join_2.ir2)
+               ->  Seq Scan on test_range_join_1
+               ->  Materialize
+                     ->  Seq Scan on test_range_join_2
+         ->  Materialize
+               ->  Seq Scan on test_range_join_3
+(10 rows)
+
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 >> ir2 and ir2 >> ir3;
+                                 QUERY PLAN                                  
+-----------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_range_join_1.ir1 >> test_range_join_2.ir2)
+         ->  Nested Loop
+               Join Filter: (test_range_join_2.ir2 >> test_range_join_3.ir3)
+               ->  Seq Scan on test_range_join_2
+               ->  Materialize
+                     ->  Seq Scan on test_range_join_3
+         ->  Seq Scan on test_range_join_1
+(9 rows)
+
+drop table test_range_join_1;
+drop table test_range_join_2;
+drop table test_range_join_3;
diff --git a/src/test/regress/sql/multirangetypes.sql b/src/test/regress/sql/multirangetypes.sql
index fefb4b4d42..4c62c31166 100644
--- a/src/test/regress/sql/multirangetypes.sql
+++ b/src/test/regress/sql/multirangetypes.sql
@@ -861,3 +861,30 @@ create function mr_inoutparam_fail(inout i anyelement, out r anymultirange)
 --should fail
 create function mr_table_fail(i anyelement) returns table(i anyelement, r anymultirange)
   as $$ select $1, '[1,10]' $$ language sql;
+
+--
+-- test selectivity of multirange join operators
+--
+create table test_multirange_join_1 (imr1 int4multirange);
+create table test_multirange_join_2 (imr2 int4multirange);
+create table test_multirange_join_3 (imr3 int4multirange);
+
+insert into test_multirange_join_1 select int4multirange(int4range(g, g+10)) from generate_series(1, 1000) g;
+insert into test_multirange_join_1 select int4multirange(int4range(g, g+100)) from generate_series(1, 1000, 10) g;
+insert into test_multirange_join_2 select int4multirange(int4range(g, g+10)) from generate_series(1, 500) g;
+insert into test_multirange_join_2 select int4multirange(int4range(g, g+100)) from generate_series(1, 500, 10) g;
+insert into test_multirange_join_3 select int4multirange(int4range(g, g+10)) from generate_series(501, 1000) g;
+insert into test_multirange_join_3 select int4multirange(int4range(g, g+100)) from generate_series(501, 1000, 10) g;
+
+analyze test_multirange_join_1;
+analyze test_multirange_join_2;
+analyze test_multirange_join_3;
+
+--reorder joins based on computed selectivity
+explain (costs off) select count(*) from test_multirange_join_1, test_multirange_join_2, test_multirange_join_3 where imr1 && imr2 and imr2 && imr3;
+explain (costs off) select count(*) from test_multirange_join_1, test_multirange_join_2, test_multirange_join_3 where imr1 << imr2 and imr2 << imr3;
+explain (costs off) select count(*) from test_multirange_join_1, test_multirange_join_2, test_multirange_join_3 where imr1 >> imr2 and imr2 >> imr3;
+
+drop table test_multirange_join_1;
+drop table test_multirange_join_2;
+drop table test_multirange_join_3;
diff --git a/src/test/regress/sql/rangetypes.sql b/src/test/regress/sql/rangetypes.sql
index c23be928c3..1018a234a5 100644
--- a/src/test/regress/sql/rangetypes.sql
+++ b/src/test/regress/sql/rangetypes.sql
@@ -629,3 +629,30 @@ create function inoutparam_fail(inout i anyelement, out r anyrange)
 --should fail
 create function table_fail(i anyelement) returns table(i anyelement, r anyrange)
   as $$ select $1, '[1,10]' $$ language sql;
+
+--
+-- test selectivity of range join operators
+--
+create table test_range_join_1 (ir1 int4range);
+create table test_range_join_2 (ir2 int4range);
+create table test_range_join_3 (ir3 int4range);
+
+insert into test_range_join_1 select int4range(g, g+10) from generate_series(1, 1000) g;
+insert into test_range_join_1 select int4range(g, g+100) from generate_series(1, 1000, 10) g;
+insert into test_range_join_2 select int4range(g, g+10) from generate_series(1, 500) g;
+insert into test_range_join_2 select int4range(g, g+100) from generate_series(1, 500, 10) g;
+insert into test_range_join_3 select int4range(g, g+10) from generate_series(501, 1000) g;
+insert into test_range_join_3 select int4range(g, g+100) from generate_series(501, 1000, 10) g;
+
+analyze test_range_join_1;
+analyze test_range_join_2;
+analyze test_range_join_3;
+
+--reorder joins based on computed selectivity
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 && ir2 and ir2 && ir3;
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 << ir2 and ir2 << ir3;
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 >> ir2 and ir2 >> ir3;
+
+drop table test_range_join_1;
+drop table test_range_join_2;
+drop table test_range_join_3;



* Re: Implement missing join selectivity estimation for range types
  2022-06-30 14:31 Implement missing join selectivity estimation for range types Mahmoud Sakr <[email protected]>
  2024-01-05 10:37 ` Re: Implement missing join selectivity estimation for range types vignesh C <[email protected]>
  2024-01-05 17:39   ` Re: Implement missing join selectivity estimation for range types Schoemans Maxime <[email protected]>
@ 2024-01-17 10:48     ` vignesh C <[email protected]>
  2024-01-22 08:10       ` Re: Implement missing join selectivity estimation for range types jian he <[email protected]>
  0 siblings, 1 reply; 16+ messages in thread

From: vignesh C @ 2024-01-17 10:48 UTC (permalink / raw)
  To: Schoemans Maxime <[email protected]>; +Cc: Tom Lane <[email protected]>; Damir Belyalov <[email protected]>; PostgreSQL Hackers <[email protected]>; SAKR Mahmoud <[email protected]>; Diogo Repas <[email protected]>; LUO Zhicheng <[email protected]>; Tomas Vondra <[email protected]>; Andrey Lepikhov <[email protected]>

On Fri, 5 Jan 2024 at 23:09, Schoemans Maxime <[email protected]> wrote:
>
> On 05/01/2024 11:37, vignesh C wrote:
>  > One of the tests was aborted at [1], kindly post an updated patch for
> the same:
>
> Thank you for notifying us.
> I believe I fixed the issue but it is hard to be certain as the issue
> did not arise when running the regression tests locally.

I notice that this issue is not yet resolved; the CFBot is still
failing at [1] with:
#7 0x000055cddc25cd93 in range_cmp_bound_values
(typcache=typcache@entry=0x629000030b60, b1=b1@entry=0x61c000016f08,
b2=b2@entry=0x61c0000180b8) at rangetypes.c:2090
[19:55:02.591] No locals.
[19:55:02.591] #8 0x000055cddc2685c1 in calc_hist_join_selectivity
(typcache=typcache@entry=0x629000030b60,
hist1=hist1@entry=0x61c0000180b8, nhist1=nhist1@entry=101,
hist2=hist2@entry=0x61c0000168b8, nhist2=nhist2@entry=101) at
rangetypes_selfuncs.c:1295
[19:55:02.591] i = 0
[19:55:02.591] j = 101
[19:55:02.591] selectivity = 0
[19:55:02.591] prev_sel1 = -1
[19:55:02.591] prev_sel2 = 0
[19:55:02.591] #9 0x000055cddc269aaa in rangejoinsel
(fcinfo=<optimized out>) at rangetypes_selfuncs.c:1479
[19:55:02.591] root = <optimized out>
[19:55:02.591] operator = <optimized out>
[19:55:02.591] args = <optimized out>
[19:55:02.591] sjinfo = <optimized out>
[19:55:02.591] vardata1 = {var = <optimized out>, rel = <optimized
out>, statsTuple = <optimized out>, freefunc = <optimized out>,
vartype = <optimized out>, atttype = <optimized out>, atttypmod =
<optimized out>, isunique = <optimized out>, acl_ok = <optimized out>}
[19:55:02.591] vardata2 = {var = <optimized out>, rel = <optimized
out>, statsTuple = <optimized out>, freefunc = <optimized out>,
vartype = <optimized out>, atttype = <optimized out>, atttypmod =
<optimized out>, isunique = <optimized out>, acl_ok = <optimized out>}
[19:55:02.591] hist1 = {staop = <optimized out>, stacoll = <optimized
out>, valuetype = <optimized out>, values = <optimized out>, nvalues =
<optimized out>, numbers = <optimized out>, nnumbers = <optimized
out>, values_arr = <optimized out>, numbers_arr = <optimized out>}
[19:55:02.591] hist2 = {staop = <optimized out>, stacoll = <optimized
out>, valuetype = <optimized out>, values = <optimized out>, nvalues =
<optimized out>, numbers = <optimized out>, nnumbers = <optimized
out>, values_arr = <optimized out>, numbers_arr = <optimized out>}
[19:55:02.591] sslot = {staop = <optimized out>, stacoll = <optimized
out>, valuetype = <optimized out>, values = <optimized out>, nvalues =
<optimized out>, numbers = <optimized out>, nnumbers = <optimized
out>, values_arr = <optimized out>, numbers_arr = <optimized out>}
[19:55:02.591] reversed = <optimized out>
[19:55:02.591] selec = 0.001709375000000013
[19:55:02.591] typcache = 0x629000030b60
[19:55:02.591] stats1 = <optimized out>
[19:55:02.591] stats2 = <optimized out>
[19:55:02.591] empty_frac1 = 0
[19:55:02.591] empty_frac2 = 0
[19:55:02.591] null_frac1 = 0
[19:55:02.591] null_frac2 = 0
[19:55:02.591] nhist1 = 101
[19:55:02.591] nhist2 = 101
[19:55:02.591] hist1_lower = 0x61c0000168b8
[19:55:02.591] hist1_upper = 0x61c0000170b8
[19:55:02.591] hist2_lower = 0x61c0000178b8
[19:55:02.591] hist2_upper = 0x61c0000180b8
[19:55:02.591] empty = <optimized out>
[19:55:02.591] i = <optimized out>
[19:55:02.591] __func__ = "rangejoinsel"
[19:55:02.591] #10 0x000055cddc3b761f in FunctionCall5Coll
(flinfo=flinfo@entry=0x7ffc1628d710, collation=collation@entry=0,
arg1=arg1@entry=107545982648856, arg2=arg2@entry=3888,
arg3=arg3@entry=107820862916056, arg4=arg4@entry=0, arg5=<optimized
out>) at fmgr.c:1242
[19:55:02.591] fcinfodata = {fcinfo = {flinfo = <optimized out>,
context = <optimized out>, resultinfo = <optimized out>, fncollation =
<optimized out>, isnull = <optimized out>, nargs = <optimized out>,
args = 0x0}, fcinfo_data = {<optimized out> <repeats 112 times>}}
[19:55:02.591] fcinfo = 0x7ffc1628d5e0
[19:55:02.591] result = <optimized out>
[19:55:02.591] __func__ = "FunctionCall5Coll"
[19:55:02.591] #11 0x000055cddc3b92ee in OidFunctionCall5Coll
(functionId=8355, collation=collation@entry=0,
arg1=arg1@entry=107545982648856, arg2=arg2@entry=3888,
arg3=arg3@entry=107820862916056, arg4=arg4@entry=0, arg5=<optimized
out>) at fmgr.c:1460
[19:55:02.591] flinfo = {fn_addr = <optimized out>, fn_oid =
<optimized out>, fn_nargs = <optimized out>, fn_strict = <optimized
out>, fn_retset = <optimized out>, fn_stats = <optimized out>,
fn_extra = <optimized out>, fn_mcxt = <optimized out>, fn_expr =
<optimized out>}
[19:55:02.591] #12 0x000055cddbe834ae in join_selectivity
(root=root@entry=0x61d00017c218, operatorid=operatorid@entry=3888,
args=0x6210003bc5d8, inputcollid=0,
jointype=jointype@entry=JOIN_INNER,
sjinfo=sjinfo@entry=0x7ffc1628db30) at
../../../../src/include/postgres.h:324
[19:55:02.591] oprjoin = <optimized out>
[19:55:02.591] result = <optimized out>
[19:55:02.591] __func__ = "join_selectivity"
[19:55:02.591] #13 0x000055cddbd8c18c in clause_selectivity_ext
(root=root@entry=0x61d00017c218, clause=0x6210003bc678,
varRelid=varRelid@entry=0, jointype=jointype@entry=JOIN_INNER,
sjinfo=sjinfo@entry=0x7ffc1628db30,
use_extended_stats=use_extended_stats@entry=true) at clausesel.c:841

I have changed the status to "Waiting on Author". Feel free to post an
updated version, check the CFBot, and update the Commitfest entry
accordingly.

[1] - https://cirrus-ci.com/task/5698117824151552

Regards,
Vignesh






* Re: Implement missing join selectivity estimation for range types
  2022-06-30 14:31 Implement missing join selectivity estimation for range types Mahmoud Sakr <[email protected]>
  2024-01-05 10:37 ` Re: Implement missing join selectivity estimation for range types vignesh C <[email protected]>
  2024-01-05 17:39   ` Re: Implement missing join selectivity estimation for range types Schoemans Maxime <[email protected]>
  2024-01-17 10:48     ` Re: Implement missing join selectivity estimation for range types vignesh C <[email protected]>
@ 2024-01-22 08:10       ` jian he <[email protected]>
  2026-04-06 23:32         ` Re: Implement missing join selectivity estimation for range types Haibo Yan <[email protected]>
  0 siblings, 1 reply; 16+ messages in thread

From: jian he @ 2024-01-22 08:10 UTC (permalink / raw)
  To: vignesh C <[email protected]>; +Cc: Schoemans Maxime <[email protected]>; Tom Lane <[email protected]>; Damir Belyalov <[email protected]>; PostgreSQL Hackers <[email protected]>; SAKR Mahmoud <[email protected]>; Diogo Repas <[email protected]>; LUO Zhicheng <[email protected]>; Tomas Vondra <[email protected]>; Andrey Lepikhov <[email protected]>

I cannot figure out why it aborts.

As Tom mentioned upthread regarding the test cases, we could test this with
something similar to the check_estimated_rows function in
src/test/regress/sql/stats_ext.sql:

create or replace function check_estimated_rows(text) returns table (ok bool)
language plpgsql as
$$
declare
    ln text;
    tmp text[];
    first_row bool := true;
begin
    for ln in
        execute format('explain analyze %s', $1)
    loop
        if first_row then
            first_row := false;
            tmp := regexp_match(ln, 'rows=(\d*) .* rows=(\d*)');
            return query select 0.2 < tmp[1]::float8 / tmp[2]::float8
and tmp[1]::float8 / tmp[2]::float8 < 5;
        end if;
    end loop;
end;
$$;

select * from check_estimated_rows($$select * from test_range_join_1,
test_range_join_2 where ir1 && ir2$$);
select * from check_estimated_rows($$select * from test_range_join_1,
test_range_join_2 where ir1 << ir2$$);
select * from check_estimated_rows($$select * from test_range_join_1,
test_range_join_2 where ir1 >> ir2$$);
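
To make the acceptance rule explicit, here is a minimal C sketch of the same
check, offered as an illustration only. It pulls the first two "rows=" numbers
out of one plan line (planner estimate first, actual count second) and accepts
the estimate when the ratio lies strictly between 1/5 and 5. The helper names
next_rows and estimate_within_factor are hypothetical. Returning false when
the actual count is zero is an assumption of this sketch; the plpgsql version
above would raise a division-by-zero error instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find the next "rows=" at or after *cursor and parse the integer digits
 * that follow it.  On success, store the number in *value, move *cursor
 * past the digits, and return true.
 */
static bool
next_rows(const char **cursor, double *value)
{
    const char *p = strstr(*cursor, "rows=");
    char       *end;

    if (p == NULL)
        return false;
    p += strlen("rows=");
    *value = (double) strtol(p, &end, 10);
    if (end == p)
        return false;           /* "rows=" was not followed by a number */
    *cursor = end;
    return true;
}

/*
 * Return true when estimated / actual lies strictly between 1 / factor
 * and factor.  The first "rows=" is taken as the estimate and the second
 * as the actual row count.
 */
static bool
estimate_within_factor(const char *plan_line, double factor)
{
    const char *cursor = plan_line;
    double      est;
    double      actual;
    double      ratio;

    assert(plan_line != NULL && factor > 1.0);

    if (!next_rows(&cursor, &est) || !next_rows(&cursor, &actual))
        return false;
    if (actual == 0.0)
        return false;           /* assumption: zero actual rows counts as a miss */

    ratio = est / actual;
    return 1.0 / factor < ratio && ratio < factor;
}
```

With factor = 5, an estimate of 1000 rows against 950 actual rows passes,
while an estimate of 10 rows against 950 actual rows does not.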

Do you need 3 tables for the test? With this approach we have to actually
run the query and then compare the estimated row count with the number of
rows actually returned.
Executing a three-table join would take a lot of time,
so a two-table join with a WHERE qual should be fine?

/* Fast-forwards i and j to start of iteration */
+ for (i = 0; range_cmp_bound_values(typcache, &hist1[i], &hist2[0]) < 0; i++);
+ for (j = 0; range_cmp_bound_values(typcache, &hist2[j], &hist1[0]) < 0; j++);
+
+ /* Do the estimation on overlapping regions */
+ while (i < nhist1 && j < nhist2)
+ {
+ double cur_sel1,
+ cur_sel2;
+ RangeBound cur_sync;
+
+ if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) < 0)
+ cur_sync = hist1[i++];
+ else if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) > 0)
+ cur_sync = hist2[j++];
+ else
+ {
+ /* If equal, skip one */
+ cur_sync = hist1[i];
+

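In this part, range_cmp_bound_values is evaluated twice in the "if / else if"
chain. You can call it once and branch on the result:
`
int cmp = range_cmp_bound_values(typcache, &hist1[i], &hist2[j]);

if (cmp < 0)
    cur_sync = hist1[i++];
else if (cmp > 0)
    cur_sync = hist2[j++];
else { /* equal: keep the existing else branch as is */ }
`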

Also, I think you can fold the following into the main while loop:
+ for (i = 0; range_cmp_bound_values(typcache, &hist1[i], &hist2[0]) < 0; i++);
+ for (j = 0; range_cmp_bound_values(typcache, &hist2[j], &hist1[0]) < 0; j++);
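
A side note on these two loops, offered as a reading of the backtrace upthread
rather than a confirmed diagnosis: neither loop bounds its index. In frame #8
the locals show i = 0 and j = 101 while nhist2 = 101, and in frame #7 the
pointer b1 lies 0x650 (1616) bytes past the array passed as hist2. That is 101
entries if sizeof(RangeBound) is 16, which is an assumption about that build.
This is consistent with the second loop reading hist2[101], one past the end,
when every bound in hist2 sorts before hist1[0].

The sketch below shows one possible shape for a bounded fast-forward followed
by a merge loop that compares once per iteration. It uses plain int arrays in
place of RangeBound histograms. fast_forward and merge_sync_points are
hypothetical names, and the tie rule (advance both indexes) is assumed because
the quoted else branch is cut off.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Skip every entry of h[] that sorts before target, never reading past
 * h[n - 1].  Returns n when all entries sort before target.
 */
static int
fast_forward(const int *h, int n, int target)
{
    int         i = 0;

    assert(h != NULL && n >= 0);
    while (i < n && h[i] < target)
        i++;
    return i;
}

/*
 * Walk the overlapping region of two sorted arrays and store the visited
 * sync points in out[] (capacity n1 + n2).  Returns the number stored.
 */
static int
merge_sync_points(const int *h1, int n1, const int *h2, int n2, int *out)
{
    int         i;
    int         j;
    int         nout = 0;

    if (n1 == 0 || n2 == 0)
        return 0;

    /* Bounded fast-forward: each index stops at its array length. */
    i = fast_forward(h1, n1, h2[0]);
    j = fast_forward(h2, n2, h1[0]);

    while (i < n1 && j < n2)
    {
        /* Compare once, then branch on the sign of the result. */
        int         cmp = (h1[i] > h2[j]) - (h1[i] < h2[j]);

        if (cmp < 0)
            out[nout++] = h1[i++];
        else if (cmp > 0)
            out[nout++] = h2[j++];
        else
        {
            /* Assumed tie rule: emit the value once, advance both sides. */
            out[nout++] = h1[i];
            i++;
            j++;
        }
    }
    return nout;
}
```

When the two arrays do not overlap at all, one of the fast-forward calls
returns the array length, so the main loop is skipped and no entry beyond the
end is read.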

Splitting range and multirange into two patches might be a good idea.
It also seems that the same function (calc_hist_join_selectivity), with the
same signature, is defined in both
src/backend/utils/adt/multirangetypes_selfuncs.c
and src/backend/utils/adt/rangetypes_selfuncs.c,
so the complaints from previous mails have not been resolved.






* Re: Implement missing join selectivity estimation for range types
  2022-06-30 14:31 Implement missing join selectivity estimation for range types Mahmoud Sakr <[email protected]>
  2024-01-05 10:37 ` Re: Implement missing join selectivity estimation for range types vignesh C <[email protected]>
  2024-01-05 17:39   ` Re: Implement missing join selectivity estimation for range types Schoemans Maxime <[email protected]>
  2024-01-17 10:48     ` Re: Implement missing join selectivity estimation for range types vignesh C <[email protected]>
  2024-01-22 08:10       ` Re: Implement missing join selectivity estimation for range types jian he <[email protected]>
@ 2026-04-06 23:32         ` Haibo Yan <[email protected]>
  2026-04-14 14:03           ` Re: Implement missing join selectivity estimation for range types SCHOEMANS Maxime <[email protected]>
  0 siblings, 1 reply; 16+ messages in thread

From: Haibo Yan @ 2026-04-06 23:32 UTC (permalink / raw)
  To: jian he <[email protected]>; +Cc: vignesh C <[email protected]>; Schoemans Maxime <[email protected]>; Tom Lane <[email protected]>; Damir Belyalov <[email protected]>; PostgreSQL Hackers <[email protected]>; SAKR Mahmoud <[email protected]>; Diogo Repas <[email protected]>; LUO Zhicheng <[email protected]>; Tomas Vondra <[email protected]>; Andrey Lepikhov <[email protected]>

On Mon, Apr 6, 2026 at 3:12 PM jian he <[email protected]> wrote:

> I cannot figure out why it aborts.
>
> as Tom mentioned in upthread about the test cases.
> similar to src/test/regress/sql/stats_ext.sql check_estimated_rows
> function.
> we can test it by something:
>
> create or replace function check_estimated_rows(text) returns table (ok
> bool)
> language plpgsql as
> $$
> declare
>     ln text;
>     tmp text[];
>     first_row bool := true;
> begin
>     for ln in
>         execute format('explain analyze %s', $1)
>     loop
>         if first_row then
>             first_row := false;
>             tmp := regexp_match(ln, 'rows=(\d*) .* rows=(\d*)');
>             return query select 0.2 < tmp[1]::float8 / tmp[2]::float8
> and tmp[1]::float8 / tmp[2]::float8 < 5;
>         end if;
>     end loop;
> end;
> $$;
>
> select * from check_estimated_rows($$select * from test_range_join_1,
> test_range_join_2 where ir1 && ir2$$);
> select * from check_estimated_rows($$select * from test_range_join_1,
> test_range_join_2 where ir1 << ir2$$);
> select * from check_estimated_rows($$select * from test_range_join_1,
> test_range_join_2 where ir1 >> ir2$$);
>
> Do you need 3 tables for the test? We need to actually run the query and
> then compare the estimated rows with the rows actually returned.
> If you really execute the query with a 3-table join, it will take a lot of
> time.
> So a two-table join with a WHERE qual should be fine?
>
> /* Fast-forwards i and j to start of iteration */
> + for (i = 0; range_cmp_bound_values(typcache, &hist1[i], &hist2[0]) < 0;
> i++);
> + for (j = 0; range_cmp_bound_values(typcache, &hist2[j], &hist1[0]) < 0;
> j++);
> +
> + /* Do the estimation on overlapping regions */
> + while (i < nhist1 && j < nhist2)
> + {
> + double cur_sel1,
> + cur_sel2;
> + RangeBound cur_sync;
> +
> + if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) < 0)
> + cur_sync = hist1[i++];
> + else if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) > 0)
> + cur_sync = hist2[j++];
> + else
> + {
> + /* If equal, skip one */
> + cur_sync = hist1[i];
> +
>
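> In this part, range_cmp_bound_values is computed twice in the "if / else if"
> chain; you can just do
> `
> int cmp;
> cmp = range_cmp_bound_values(typcache, &hist1[i], &hist2[j]);
> if (cmp < 0)
>     cur_sync = hist1[i++];
> else if (cmp > 0)
>     cur_sync = hist2[j++];
> else
>     cur_sync = hist1[i];    /* equal bounds, rest as before */
> `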
>
> Also, I think you can put the following into the main while loop:
> + for (i = 0; range_cmp_bound_values(typcache, &hist1[i], &hist2[0]) < 0;
> i++);
> + for (j = 0; range_cmp_bound_values(typcache, &hist2[j], &hist1[0]) < 0;
> j++);
>
> Splitting range and multirange into 2 patches might be a good idea.
> It seems the same function (calc_hist_join_selectivity), with the same
> signature, appears in both src/backend/utils/adt/multirangetypes_selfuncs.c
> and src/backend/utils/adt/rangetypes_selfuncs.c;
> the complaints from the previous mail are not resolved.
>
>
>
Hi all,

I'd like to revive the discussion on improving selectivity estimation for
range/range joins.

Attached is the v5 patch, which teaches rangejoinsel to use range bound
histograms for estimating the <<, >>, and && operators. Currently, the
planner often falls back to a hardcoded default (like 0.005), which can
lead to poor join ordering in complex queries.

In this version, I have intentionally excluded &< and &>:
  a &< b essentially maps to upper(a) <= upper(b).
  a &> b essentially maps to lower(a) >= lower(b).
Since these operators include equality (<= / >=) rather than strict
inequality (< / >), their estimation is slightly more nuanced. I believe
focusing on the strict inequality and overlap operators first allows us to
deliver a clean, converged, and significantly beneficial improvement. We
can discuss the best approach for the remaining operators once this
foundation is in place.
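
To make the strict-inequality case concrete, here is a minimal standalone
sketch of a P(X < Y) estimate. It is not the patch code. It assumes plain
double bounds in place of RangeBound and models each equi-depth histogram as
a piecewise-linear CDF; hist_cdf and est_prob_less are hypothetical names.
Unlike the patch, it walks the full union of both bound arrays instead of
fast-forwarding to the overlapping region, since under this simplified model
the non-overlapping parts contribute either nothing or the full F_Y increment.

```c
#include <assert.h>

/*
 * Piecewise-linear CDF of an equi-depth histogram (hypothetical helper).
 * h holds n sorted bounds, i.e. n - 1 bins of equal weight.
 */
static double
hist_cdf(const double *h, int n, double v)
{
	int			k;

	if (v <= h[0])
		return 0.0;
	if (v >= h[n - 1])
		return 1.0;

	/* Find the bin [h[k], h[k+1]) that contains v */
	for (k = 0; k < n - 2 && v >= h[k + 1]; k++)
		;

	return (k + (v - h[k]) / (h[k + 1] - h[k])) / (n - 1);
}

/*
 * Trapezoidal estimate of P(X < Y):
 *   0.5 * sum( (F_X(prev) + F_X(cur)) * (F_Y(cur) - F_Y(prev)) )
 * taken over the merged bounds of both histograms.
 */
double
est_prob_less(const double *hx, int nx, const double *hy, int ny)
{
	int			i = 0,
				j = 0;
	double		sum = 0.0;
	double		prev_fx = 0.0;
	double		prev_fy = 0.0;

	assert(nx > 1 && ny > 1);

	while (i < nx || j < ny)
	{
		double		sync,
					fx,
					fy;

		/* Pick the next smallest bound; on a tie, consume both */
		if (j >= ny || (i < nx && hx[i] < hy[j]))
			sync = hx[i++];
		else if (i >= nx || hy[j] < hx[i])
			sync = hy[j++];
		else
		{
			sync = hx[i];
			i++;
			j++;
		}

		fx = hist_cdf(hx, nx, sync);
		fy = hist_cdf(hy, ny, sync);
		sum += (prev_fx + fx) * (fy - prev_fy);
		prev_fx = fx;
		prev_fy = fy;
	}

	return 0.5 * sum;
}
```

With this shape, a << b would correspond to calling est_prob_less on the
upper bounds of a and the lower bounds of b. Two fully separated histograms
come out as 1.0 or 0.0 without a special case, and two identical histograms
come out as 0.5.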

Test Results

My local tests show that the planner now correctly identifies cases with
zero or full selectivity, which were previously misestimated.
----------------------------------------------------------------------
CREATE TABLE t1 (id int, r int4range);
CREATE TABLE t2 (id int, r int4range);
INSERT INTO t1 SELECT g, int4range(g * 2 - 1, g * 2) FROM
generate_series(1, 1000) g;
INSERT INTO t2 SELECT g, int4range(10000 + g * 200, 10000 + g * 200 + 100)
FROM generate_series(1, 1000) g;
ANALYZE t1, t2;
----------------------------------------------------------------------

Real Selectivity:
----------------------------------------------------------------------
SELECT
    avg((t1.r << t2.r)::int) AS p_ll, -- Expected: 1.0
    avg((t1.r >> t2.r)::int) AS p_rr, -- Expected: 0.0
    avg((t1.r && t2.r)::int) AS p_ov  -- Expected: 0.0
FROM t1, t2;
-- Result: p_ll = 1.0, p_rr = 0.0, p_ov = 0.0
----------------------------------------------------------------------
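
The same three fractions can be cross-checked outside the database with a
brute-force count over all pairs. The sketch below is only an illustration:
IntRange, fill_example and pair_fractions are hypothetical names, and ranges
are assumed to be non-empty and half-open [lo, hi), as int4range
canonicalizes them.

```c
#include <assert.h>

/* Non-empty half-open integer range [lo, hi) (hypothetical type) */
typedef struct
{
	int			lo;
	int			hi;
} IntRange;

/* a << b: a ends at or before the point where b starts */
static int
range_before(IntRange a, IntRange b)
{
	return a.hi <= b.lo;
}

/* a && b: the two ranges share at least one point */
static int
range_overlaps(IntRange a, IntRange b)
{
	return a.lo < b.hi && b.lo < a.hi;
}

/* Same data as the INSERT statements above, for g = 1 .. n */
void
fill_example(IntRange *t1, IntRange *t2, int n)
{
	int			g;

	for (g = 1; g <= n; g++)
	{
		t1[g - 1] = (IntRange) {2 * g - 1, 2 * g};
		t2[g - 1] = (IntRange) {10000 + 200 * g, 10000 + 200 * g + 100};
	}
}

/* out[0] = fraction with a << b, out[1] = a >> b, out[2] = a && b */
void
pair_fractions(const IntRange *t1, int n1,
			   const IntRange *t2, int n2, double out[3])
{
	long		ll = 0,
				rr = 0,
				ov = 0;
	long		total = (long) n1 * n2;
	int			i,
				j;

	for (i = 0; i < n1; i++)
	{
		for (j = 0; j < n2; j++)
		{
			ll += range_before(t1[i], t2[j]);
			rr += range_before(t2[j], t1[i]);
			ov += range_overlaps(t1[i], t2[j]);
		}
	}

	/* For non-empty ranges exactly one of <<, >>, && holds per pair */
	assert(ll + rr + ov == total);

	out[0] = (double) ll / total;
	out[1] = (double) rr / total;
	out[2] = (double) ov / total;
}
```

The internal assertion is the identity the && estimate relies on: a pair
overlaps exactly when it is neither strictly left nor strictly right.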

Planner Improvements:
With the patch, the EXPLAIN output reflects these probabilities accurately:
For << (High Selectivity):
The planner correctly estimates rows=1000000 (1000 * 1000 * 1.0).
----------------------------------------------------------------------
Nested Loop (cost=29.50..15080.75 rows=1000000 width=61)
  Join Filter: (t1.r << t2.r)
----------------------------------------------------------------------
For && and >> (Near-Zero Selectivity):
The planner now correctly predicts rows=1 instead of using the default
multiplier.
----------------------------------------------------------------------
Nested Loop (cost=0.00..15036.50 rows=1 width=36)
  Join Filter: (t1.r && t2.r)
----------------------------------------------------------------------
This improved estimation allows the optimizer to make much better decisions
regarding join order and nesting when range columns are involved.
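
For reference, the row counts quoted above follow from the usual join
arithmetic: outer rows times inner rows times selectivity, rounded, and never
reported as less than one row. A rough sketch of that arithmetic follows;
join_rows is a hypothetical helper that only loosely mirrors the planner's
clamping and is not the planner code itself.

```c
/*
 * Estimated join output rows for a given selectivity (hypothetical helper).
 * The selectivity is clamped to [0, 1], and the result is rounded to a
 * whole number of rows with a floor of one row.
 */
double
join_rows(double outer_rows, double inner_rows, double selec)
{
	double		rows;

	if (selec < 0.0)
		selec = 0.0;
	if (selec > 1.0)
		selec = 1.0;

	rows = outer_rows * inner_rows * selec;
	if (rows <= 1.0)
		return 1.0;

	/* Round to nearest without depending on libm */
	return (double) (long long) (rows + 0.5);
}
```

With 1000 rows on each side, a selectivity of 1.0 gives 1000000 rows and a
selectivity of 0.0 gives 1 row, while a fixed 0.005 would give 5000 rows
regardless of how the two columns are actually distributed.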
I look forward to your feedback.
Regards,
Haibo


Attachments:

  [application/octet-stream] v5-0001-Improve-range-range-join-selectivity-estimation.patch (20.7K, 3-v5-0001-Improve-range-range-join-selectivity-estimation.patch)
From ac1a0f23f7deb3a9a17bb8e150601ee9aaaafa46 Mon Sep 17 00:00:00 2001
From: Haibo Yan <[email protected]>
Date: Mon, 6 Apr 2026 09:30:10 -0700
Subject: [PATCH v5] Improve range/range join selectivity estimation

Teach rangejoinsel to estimate selected range/range join operators using
range histogram statistics instead of falling back to fixed defaults.

This improves planner row estimates for operators such as <<, >>, &&,
especially when the two range columns have clearly separated or strongly
overlapping distributions.

Regression tests cover plan changes for representative range join cases.
---
 src/backend/utils/adt/rangetypes_selfuncs.c | 301 ++++++++++++++++++++
 src/include/catalog/pg_operator.dat         |   6 +-
 src/include/catalog/pg_proc.dat             |   5 +
 src/test/regress/expected/rangetypes.out    | 114 ++++++++
 src/test/regress/sql/rangetypes.sql         |  53 ++++
 5 files changed, 476 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/rangetypes_selfuncs.c b/src/backend/utils/adt/rangetypes_selfuncs.c
index 75f1e7567d5..97ae19fbcd2 100644
--- a/src/backend/utils/adt/rangetypes_selfuncs.c
+++ b/src/backend/utils/adt/rangetypes_selfuncs.c
@@ -1221,3 +1221,304 @@ calc_hist_selectivity_contains(TypeCacheEntry *typcache,
 
 	return sum_frac;
 }
+
+/*
+ * Estimate join selectivity P(X < Y) using rangebound histograms.
+ *
+ * Based on: Diogo Repas, Zhicheng Luo, Maxime Schoemans, Mahmoud Sakr, 2022
+ * "Selectivity Estimation of Inequality Joins In Databases"
+ * https://doi.org/10.48550/arXiv.2206.07396
+ *
+ * hist1 and hist2 are arrays of RangeBound entries from the bounds histograms
+ * of two range-typed attributes X and Y, respectively.  Each array has at
+ * least 2 entries (one histogram bin).  The entries carry full bound metadata
+ * (lower/upper flag, inclusive/exclusive), and all comparisons use
+ * range_cmp_bounds() so that bound semantics are preserved.
+ *
+ * The algorithm models each attribute's distribution as a piecewise function
+ * derived from its histogram, then computes:
+ *   P(X < Y) = 0.5 * sum( (F_X(prev) + F_X(cur)) * (F_Y(cur) - F_Y(prev)) )
+ * by parallel-scanning both histograms.
+ *
+ * The initial fast-forward loops skip histogram entries that fall entirely
+ * before the other histogram's range, so the main loop only processes the
+ * overlapping region.  Bounds checks are required because the histograms may
+ * be completely disjoint (e.g., all of X is below all of Y).
+ */
+static double
+calc_hist_join_selectivity(TypeCacheEntry *typcache,
+						   const RangeBound *hist1, int nhist1,
+						   const RangeBound *hist2, int nhist2)
+{
+	int			i,
+				j;
+	double		selectivity = 0.0;
+	double		prev_sel1 = -1.0;	/* negative sentinel skips first iter */
+	double		prev_sel2 = 0.0;
+
+	Assert(nhist1 > 1);
+	Assert(nhist2 > 1);
+
+	/*
+	 * Fast-forward past hist1 entries that are entirely below hist2[0], and
+	 * vice versa.  Bounds checks prevent out-of-bounds access when the
+	 * histograms are fully disjoint.
+	 */
+	for (i = 0; i < nhist1 &&
+		 range_cmp_bounds(typcache, &hist1[i], &hist2[0]) < 0; i++)
+		;
+	for (j = 0; j < nhist2 &&
+		 range_cmp_bounds(typcache, &hist2[j], &hist1[0]) < 0; j++)
+		;
+
+	/*
+	 * Handle fully-separated histograms.  When all bounds in hist1 are below
+	 * all bounds in hist2, P(X < Y) is ~1.0.  When all of hist2 is below
+	 * hist1, P(X < Y) is ~0.0.  We return immediately rather than falling
+	 * into the overlap walk with invalid indices.
+	 */
+	if (i >= nhist1)
+		return 1.0;
+	if (j >= nhist2)
+		return 0.0;
+
+	/* Walk the overlapping region of both histograms */
+	while (i < nhist1 && j < nhist2)
+	{
+		double		cur_sel1,
+					cur_sel2;
+		RangeBound	cur_sync;
+
+		if (range_cmp_bounds(typcache, &hist1[i], &hist2[j]) < 0)
+			cur_sync = hist1[i++];
+		else if (range_cmp_bounds(typcache, &hist1[i], &hist2[j]) > 0)
+			cur_sync = hist2[j++];
+		else
+		{
+			/* Equal bounds: advance both */
+			cur_sync = hist1[i];
+			i++;
+			j++;
+		}
+		cur_sel1 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist1, nhist1, false);
+		cur_sel2 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist2, nhist2, false);
+
+		/* Skip the first iteration (no previous point yet) */
+		if (prev_sel1 >= 0)
+			selectivity += (prev_sel1 + cur_sel1) * (cur_sel2 - prev_sel2);
+
+		prev_sel1 = cur_sel1;
+		prev_sel2 = cur_sel2;
+	}
+
+	/* P(X < Y) = 0.5 * Sum(...) */
+	selectivity /= 2;
+
+	/* Include remainder of hist2 if hist1 was exhausted first */
+	if (j < nhist2)
+		selectivity += 1 - prev_sel2;
+
+	return selectivity;
+}
+
+/*
+ * rangejoinsel -- join selectivity for range-vs-range operators
+ *
+ * Supports: <<, >>, &&
+ * These operators map directly to strict bound comparisons P(X < Y),
+ * which calc_hist_join_selectivity() estimates from bound histograms.
+ * Other range operators are left to their existing generic estimators.
+ */
+Datum
+rangejoinsel(PG_FUNCTION_ARGS)
+{
+	PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+	Oid			operator = PG_GETARG_OID(1);
+	List	   *args = (List *) PG_GETARG_POINTER(2);
+	SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) PG_GETARG_POINTER(4);
+	VariableStatData vardata1;
+	VariableStatData vardata2;
+	Selectivity selec;
+	AttStatsSlot hist1;
+	AttStatsSlot hist2;
+	AttStatsSlot sslot;
+	bool		have_hist1 = false;
+	bool		have_hist2 = false;
+	TypeCacheEntry *typcache;
+	Form_pg_statistic stats1;
+	Form_pg_statistic stats2;
+	double		empty_frac1;
+	double		empty_frac2;
+	double		null_frac1;
+	double		null_frac2;
+	int			nhist1;
+	int			nhist2;
+	RangeBound *hist1_lower;
+	RangeBound *hist1_upper;
+	RangeBound *hist2_lower;
+	RangeBound *hist2_upper;
+	bool		empty;
+	int			i;
+
+	{
+		bool	join_is_reversed;
+
+		get_join_variables(root, args, sjinfo, &vardata1, &vardata2,
+						   &join_is_reversed);
+	}
+
+	selec = default_range_selectivity(operator);
+
+	/*
+	 * Acquire histogram stats for both sides.  Each slot is tracked
+	 * independently so we can release exactly what was acquired on any
+	 * failure path.
+	 */
+	if (!HeapTupleIsValid(vardata1.statsTuple) ||
+		!HeapTupleIsValid(vardata2.statsTuple))
+		goto cleanup;
+
+	if (vardata1.vartype != vardata2.vartype)
+		goto cleanup;
+
+	memset(&hist1, 0, sizeof(hist1));
+	memset(&hist2, 0, sizeof(hist2));
+
+	if (!get_attstatsslot(&hist1, vardata1.statsTuple,
+						  STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						  ATTSTATSSLOT_VALUES))
+		goto cleanup;
+	have_hist1 = true;
+
+	if (!get_attstatsslot(&hist2, vardata2.statsTuple,
+						  STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						  ATTSTATSSLOT_VALUES))
+		goto cleanup;
+	have_hist2 = true;
+
+	/* Initialize type cache */
+	typcache = range_get_typcache(fcinfo, vardata1.vartype);
+
+	/* Look up NULL and empty-range fractions */
+	stats1 = (Form_pg_statistic) GETSTRUCT(vardata1.statsTuple);
+	stats2 = (Form_pg_statistic) GETSTRUCT(vardata2.statsTuple);
+
+	null_frac1 = stats1->stanullfrac;
+	null_frac2 = stats2->stanullfrac;
+
+	/* Try to get fraction of empty ranges for the first variable */
+	if (get_attstatsslot(&sslot, vardata1.statsTuple,
+						 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+						 InvalidOid, ATTSTATSSLOT_NUMBERS))
+	{
+		if (sslot.nnumbers != 1)
+			elog(ERROR, "invalid empty fraction statistic");
+		empty_frac1 = sslot.numbers[0];
+		free_attstatsslot(&sslot);
+	}
+	else
+	{
+		empty_frac1 = 0.0;
+	}
+
+	/* Try to get fraction of empty ranges for the second variable */
+	if (get_attstatsslot(&sslot, vardata2.statsTuple,
+						 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+						 InvalidOid, ATTSTATSSLOT_NUMBERS))
+	{
+		if (sslot.nnumbers != 1)
+			elog(ERROR, "invalid empty fraction statistic");
+		empty_frac2 = sslot.numbers[0];
+		free_attstatsslot(&sslot);
+	}
+	else
+	{
+		empty_frac2 = 0.0;
+	}
+
+	/* Convert range histograms to separate lower/upper bound arrays */
+	nhist1 = hist1.nvalues;
+	hist1_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+	hist1_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+	for (i = 0; i < nhist1; i++)
+	{
+		range_deserialize(typcache, DatumGetRangeTypeP(hist1.values[i]),
+						  &hist1_lower[i], &hist1_upper[i], &empty);
+		if (empty)
+			elog(ERROR, "bounds histogram contains an empty range");
+	}
+
+	nhist2 = hist2.nvalues;
+	hist2_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+	hist2_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+	for (i = 0; i < nhist2; i++)
+	{
+		range_deserialize(typcache, DatumGetRangeTypeP(hist2.values[i]),
+						  &hist2_lower[i], &hist2_upper[i], &empty);
+		if (empty)
+			elog(ERROR, "bounds histogram contains an empty range");
+	}
+
+	/* Estimate selectivity based on the operator */
+	switch (operator)
+	{
+		case OID_RANGE_OVERLAP_OP:
+
+			/*
+			 * A && B iff NOT(A << B) AND NOT(A >> B)
+			 * = 1 - P(A.upper < B.lower) - P(B.upper < A.lower)
+			 */
+			selec = 1;
+			selec -= calc_hist_join_selectivity(typcache,
+												hist1_upper, nhist1,
+												hist2_lower, nhist2);
+			selec -= calc_hist_join_selectivity(typcache,
+												hist2_upper, nhist2,
+												hist1_lower, nhist1);
+			break;
+
+		case OID_RANGE_LEFT_OP:
+			/* A << B iff upper(A) < lower(B) */
+			selec = calc_hist_join_selectivity(typcache,
+											   hist1_upper, nhist1,
+											   hist2_lower, nhist2);
+			break;
+
+		case OID_RANGE_RIGHT_OP:
+			/* A >> B iff upper(B) < lower(A) */
+			selec = calc_hist_join_selectivity(typcache,
+											   hist2_upper, nhist2,
+											   hist1_lower, nhist1);
+			break;
+
+		default:
+			/* Unsupported operator; keep the default selectivity */
+			goto cleanup;
+	}
+
+	/* The histogram-based selectivity applies to non-empty ranges only */
+	selec *= (1 - empty_frac1) * (1 - empty_frac2);
+
+	/*
+	 * For the supported operators (<<, >>, &&), empty ranges always produce
+	 * false, so no empty-fraction adjustment is needed.
+	 */
+
+	/* All range operators are strict */
+	selec *= (1 - null_frac1) * (1 - null_frac2);
+
+cleanup:
+	if (have_hist2)
+		free_attstatsslot(&hist2);
+	if (have_hist1)
+		free_attstatsslot(&hist1);
+
+	ReleaseVariableStats(vardata1);
+	ReleaseVariableStats(vardata2);
+
+	CLAMP_PROBABILITY(selec);
+
+	PG_RETURN_FLOAT8((float8) selec);
+}
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 1465f13120a..5ea4434f9fa 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -3094,7 +3094,7 @@
   oprname => '&&', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '&&(anyrange,anyrange)',
   oprcode => 'range_overlaps', oprrest => 'rangesel',
-  oprjoin => 'areajoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3889', oid_symbol => 'OID_RANGE_CONTAINS_ELEM_OP',
   descr => 'contains',
   oprname => '@>', oprleft => 'anyrange', oprright => 'anyelement',
@@ -3122,12 +3122,12 @@
   oprname => '<<', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '>>(anyrange,anyrange)',
   oprcode => 'range_before', oprrest => 'rangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3894', oid_symbol => 'OID_RANGE_RIGHT_OP', descr => 'is right of',
   oprname => '>>', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '<<(anyrange,anyrange)',
   oprcode => 'range_after', oprrest => 'rangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3895', oid_symbol => 'OID_RANGE_OVERLAPS_LEFT_OP',
   descr => 'overlaps or is left of',
   oprname => '&<', oprleft => 'anyrange', oprright => 'anyrange',
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 3ea17fc5629..c16aa8cec84 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12787,6 +12787,11 @@
   proname => 'error_on_null', proisstrict => 'f', prorettype => 'anyelement',
   proargtypes => 'anyelement', prosrc => 'pg_error_on_null' },
 
+{ oid => '8355', descr => 'join selectivity for range operators',
+  proname => 'rangejoinsel', provolatile => 's', prorettype => 'float8',
+  proargtypes => 'internal oid internal int2 internal',
+  prosrc => 'rangejoinsel' },
+
 { oid => '6321', descr => 'list of available WAL summary files',
   proname => 'pg_available_wal_summaries', prorows => '100', proretset => 't',
   provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/test/regress/expected/rangetypes.out b/src/test/regress/expected/rangetypes.out
index e062a4e5c2c..2fc5b770f90 100644
--- a/src/test/regress/expected/rangetypes.out
+++ b/src/test/regress/expected/rangetypes.out
@@ -2033,3 +2033,117 @@ select * from text_support_test where t <@ textrange_supp('a', 'd');
 
 drop table text_support_test;
 drop type textrange_supp;
+--
+-- test selectivity of range join operators
+--
+create table test_range_join_1 (ir1 int4range);
+create table test_range_join_2 (ir2 int4range);
+create table test_range_join_3 (ir3 int4range);
+insert into test_range_join_1 select int4range(g, g+10) from generate_series(1, 1000) g;
+insert into test_range_join_1 select int4range(g, g+100) from generate_series(1, 1000, 10) g;
+insert into test_range_join_2 select int4range(g, g+10) from generate_series(1, 500) g;
+insert into test_range_join_2 select int4range(g, g+100) from generate_series(1, 500, 10) g;
+insert into test_range_join_3 select int4range(g, g+10) from generate_series(501, 1000) g;
+insert into test_range_join_3 select int4range(g, g+100) from generate_series(501, 1000, 10) g;
+analyze test_range_join_1;
+analyze test_range_join_2;
+analyze test_range_join_3;
+-- reorder joins based on computed selectivity
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 && ir2 and ir2 && ir3;
+                                    QUERY PLAN                                     
+-----------------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_range_join_1.ir1 && test_range_join_2.ir2)
+         ->  Seq Scan on test_range_join_1
+         ->  Materialize
+               ->  Nested Loop
+                     Join Filter: (test_range_join_2.ir2 && test_range_join_3.ir3)
+                     ->  Seq Scan on test_range_join_2
+                     ->  Materialize
+                           ->  Seq Scan on test_range_join_3
+(10 rows)
+
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 << ir2 and ir2 << ir3;
+                                 QUERY PLAN                                  
+-----------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_range_join_2.ir2 << test_range_join_3.ir3)
+         ->  Nested Loop
+               Join Filter: (test_range_join_1.ir1 << test_range_join_2.ir2)
+               ->  Seq Scan on test_range_join_1
+               ->  Materialize
+                     ->  Seq Scan on test_range_join_2
+         ->  Materialize
+               ->  Seq Scan on test_range_join_3
+(10 rows)
+
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 >> ir2 and ir2 >> ir3;
+                                 QUERY PLAN                                  
+-----------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_range_join_1.ir1 >> test_range_join_2.ir2)
+         ->  Nested Loop
+               Join Filter: (test_range_join_2.ir2 >> test_range_join_3.ir3)
+               ->  Seq Scan on test_range_join_2
+               ->  Materialize
+                     ->  Seq Scan on test_range_join_3
+         ->  Seq Scan on test_range_join_1
+(9 rows)
+
+drop table test_range_join_1;
+drop table test_range_join_2;
+drop table test_range_join_3;
+--
+-- test range join selectivity with fully disjoint histograms
+-- (exercises the bounds-check logic when histograms do not overlap)
+--
+create table test_range_join_lo (r int4range);
+create table test_range_join_hi (r int4range);
+-- low ranges: [1,11), [2,12), ... [500,510)
+insert into test_range_join_lo select int4range(g, g+10) from generate_series(1, 500) g;
+-- high ranges: [10001,10011), [10002,10012), ... [10500,10510)
+insert into test_range_join_hi select int4range(g, g+10) from generate_series(10001, 10500) g;
+analyze test_range_join_lo;
+analyze test_range_join_hi;
+-- lo << hi should produce a large selectivity (most pairs match)
+-- lo >> hi should produce a near-zero selectivity
+-- lo && hi should produce a near-zero selectivity (no overlap)
+-- These should not crash and should produce stable plans.
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r << b.r;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r << b.r)
+         ->  Seq Scan on test_range_join_lo a
+         ->  Materialize
+               ->  Seq Scan on test_range_join_hi b
+(6 rows)
+
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r >> b.r;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r >> b.r)
+         ->  Seq Scan on test_range_join_lo a
+         ->  Materialize
+               ->  Seq Scan on test_range_join_hi b
+(6 rows)
+
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r && b.r;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r && b.r)
+         ->  Seq Scan on test_range_join_lo a
+         ->  Materialize
+               ->  Seq Scan on test_range_join_hi b
+(6 rows)
+
+drop table test_range_join_lo;
+drop table test_range_join_hi;
diff --git a/src/test/regress/sql/rangetypes.sql b/src/test/regress/sql/rangetypes.sql
index 5c4b0337b7a..f69109da334 100644
--- a/src/test/regress/sql/rangetypes.sql
+++ b/src/test/regress/sql/rangetypes.sql
@@ -708,3 +708,56 @@ select * from text_support_test where t <@ textrange_supp('a', 'd');
 drop table text_support_test;
 
 drop type textrange_supp;
+
+--
+-- test selectivity of range join operators
+--
+create table test_range_join_1 (ir1 int4range);
+create table test_range_join_2 (ir2 int4range);
+create table test_range_join_3 (ir3 int4range);
+
+insert into test_range_join_1 select int4range(g, g+10) from generate_series(1, 1000) g;
+insert into test_range_join_1 select int4range(g, g+100) from generate_series(1, 1000, 10) g;
+insert into test_range_join_2 select int4range(g, g+10) from generate_series(1, 500) g;
+insert into test_range_join_2 select int4range(g, g+100) from generate_series(1, 500, 10) g;
+insert into test_range_join_3 select int4range(g, g+10) from generate_series(501, 1000) g;
+insert into test_range_join_3 select int4range(g, g+100) from generate_series(501, 1000, 10) g;
+
+analyze test_range_join_1;
+analyze test_range_join_2;
+analyze test_range_join_3;
+
+-- reorder joins based on computed selectivity
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 && ir2 and ir2 && ir3;
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 << ir2 and ir2 << ir3;
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 >> ir2 and ir2 >> ir3;
+
+drop table test_range_join_1;
+drop table test_range_join_2;
+drop table test_range_join_3;
+
+--
+-- test range join selectivity with fully disjoint histograms
+-- (exercises the bounds-check logic when histograms do not overlap)
+--
+create table test_range_join_lo (r int4range);
+create table test_range_join_hi (r int4range);
+
+-- low ranges: [1,11), [2,12), ... [500,510)
+insert into test_range_join_lo select int4range(g, g+10) from generate_series(1, 500) g;
+-- high ranges: [10001,10011), [10002,10012), ... [10500,10510)
+insert into test_range_join_hi select int4range(g, g+10) from generate_series(10001, 10500) g;
+
+analyze test_range_join_lo;
+analyze test_range_join_hi;
+
+-- lo << hi should produce a large selectivity (most pairs match)
+-- lo >> hi should produce a near-zero selectivity
+-- lo && hi should produce a near-zero selectivity (no overlap)
+-- These should not crash and should produce stable plans.
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r << b.r;
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r >> b.r;
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r && b.r;
+
+drop table test_range_join_lo;
+drop table test_range_join_hi;
-- 
2.52.0




* Re: Implement missing join selectivity estimation for range types
  2022-06-30 14:31 Implement missing join selectivity estimation for range types Mahmoud Sakr <[email protected]>
  2024-01-05 10:37 ` Re: Implement missing join selectivity estimation for range types vignesh C <[email protected]>
  2024-01-05 17:39   ` Re: Implement missing join selectivity estimation for range types Schoemans Maxime <[email protected]>
  2024-01-17 10:48     ` Re: Implement missing join selectivity estimation for range types vignesh C <[email protected]>
  2024-01-22 08:10       ` Re: Implement missing join selectivity estimation for range types jian he <[email protected]>
  2026-04-06 23:32         ` Re: Implement missing join selectivity estimation for range types Haibo Yan <[email protected]>
@ 2026-04-14 14:03           ` SCHOEMANS Maxime <[email protected]>
  2026-04-15 00:53             ` Re: Implement missing join selectivity estimation for range types Haibo Yan <[email protected]>
  0 siblings, 1 reply; 16+ messages in thread

From: SCHOEMANS Maxime @ 2026-04-14 14:03 UTC (permalink / raw)
  To: Haibo Yan <[email protected]>; +Cc: vignesh C <[email protected]>; Tom Lane <[email protected]>; Damir Belyalov <[email protected]>; jian he <[email protected]>; PostgreSQL Hackers <[email protected]>; SAKR Mahmoud <[email protected]>; Diogo Repas <[email protected]>; LUO Zhicheng <[email protected]>; Tomas Vondra <[email protected]>; Andrey Lepikhov <[email protected]>

Hi Haibo,

Thank you for picking this up again. I agree with the changes you made
in v5, in particular scoping the patch to the three strict operators and
reworking the tests to check plan structure rather than exact row counts.

Attached is v6 as a 3-patch series building on your v5.

Patch 1 is your range join selectivity patch with one small change: the
range_cmp_bounds result in the merge walk is stored in a local cmp
variable to avoid calling it twice per iteration, as jian he suggested.
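
As a standalone illustration of that change (not the patch code), the sketch
below merge-walks two sorted integer arrays and evaluates the three-way
comparison once per step. cmp_int, cmp_calls and merge_sync_points are
hypothetical names, and plain ints stand in for RangeBound entries.

```c
/* Number of comparator calls, kept only to make the cost visible */
static int	cmp_calls = 0;

/* Three-way comparison standing in for range_cmp_bounds() */
static int
cmp_int(int a, int b)
{
	cmp_calls++;
	return (a > b) - (a < b);
}

/*
 * Walk h1 and h2 in parallel until either is exhausted, writing each
 * synchronization point to out once.  Returns the number of points written.
 * out must have room for n1 + n2 entries.
 */
int
merge_sync_points(const int *h1, int n1, const int *h2, int n2, int *out)
{
	int			i = 0,
				j = 0,
				n = 0;

	while (i < n1 && j < n2)
	{
		int			cmp = cmp_int(h1[i], h2[j]);	/* once per step */

		if (cmp < 0)
			out[n++] = h1[i++];
		else if (cmp > 0)
			out[n++] = h2[j++];
		else
		{
			/* Equal bounds: emit one point and advance both sides */
			out[n++] = h1[i];
			i++;
			j++;
		}
	}

	return n;
}
```

The number of comparator calls then equals the number of loop iterations,
where calling the comparator in both the "if" and the "else if" condition
can cost up to two calls per iteration.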

Patch 2 adds the same estimation for multirange types, covering all type
combinations (multirange x multirange, multirange x range, range x
multirange). Since both range and multirange types use the same bound
histogram format and the same RangeBound representation, the core
algorithm is identical.

Patch 3 removes the duplication between rangetypes_selfuncs.c and
multirangetypes_selfuncs.c that Tom raised as a concern. It makes the
10 shared helper functions non-static, exports them via selfuncs.h,
and deletes the copies from the multirange file. This covers all the
pre-existing duplication between the two files, not just the functions
added in this patch set.

Regards,
Maxime


Attachments:

  [application/octet-stream] v6-0001-Improve-range-join-selectivity-estimation-for.patch (20.7K, 3-v6-0001-Improve-range-join-selectivity-estimation-for.patch)
From d69c18011b5633172f7eefc2185838e1a06062ab Mon Sep 17 00:00:00 2001
From: Maxime Schoemans <[email protected]>
Date: Mon, 13 Apr 2026 16:04:23 +0200
Subject: [PATCH v6 1/3] Improve range join selectivity estimation for <<, >>,
 &&

Teach rangejoinsel to estimate join selectivity for range operators
using bound histogram statistics instead of falling back to fixed
defaults. The estimation is based on a trapezoidal approximation of
P(X < Y) by parallel-scanning the bound histograms of both sides.

This improves planner row estimates especially when the two range
columns have clearly separated or strongly overlapping distributions.

Regression tests cover plan changes for representative range join cases.

Based on: Repas, Luo, Schoemans, Sakr (2022) "Selectivity Estimation
of Inequality Joins In Databases"
https://doi.org/10.48550/arXiv.2206.07396
---
 src/backend/utils/adt/rangetypes_selfuncs.c | 300 ++++++++++++++++++++
 src/include/catalog/pg_operator.dat         |   6 +-
 src/include/catalog/pg_proc.dat             |   4 +
 src/test/regress/expected/rangetypes.out    | 114 ++++++++
 src/test/regress/sql/rangetypes.sql         |  53 ++++
 5 files changed, 474 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/rangetypes_selfuncs.c b/src/backend/utils/adt/rangetypes_selfuncs.c
index 75f1e7567d5..9f212e9d178 100644
--- a/src/backend/utils/adt/rangetypes_selfuncs.c
+++ b/src/backend/utils/adt/rangetypes_selfuncs.c
@@ -1221,3 +1221,303 @@ calc_hist_selectivity_contains(TypeCacheEntry *typcache,
 
 	return sum_frac;
 }
+
+/*
+ * Estimate join selectivity P(X < Y) using rangebound histograms.
+ *
+ * Based on: Diogo Repas, Zhicheng Luo, Maxime Schoemans, Mahmoud Sakr, 2022
+ * "Selectivity Estimation of Inequality Joins In Databases"
+ * https://doi.org/10.48550/arXiv.2206.07396
+ *
+ * hist1 and hist2 are arrays of RangeBound entries from the bounds histograms
+ * of two range-typed attributes X and Y, respectively.  Each array has at
+ * least 2 entries (one histogram bin).  The entries carry full bound metadata
+ * (lower/upper flag, inclusive/exclusive), and all comparisons use
+ * range_cmp_bounds() so that bound semantics are preserved.
+ *
+ * The algorithm models each attribute's distribution as a piecewise function
+ * derived from its histogram, then computes:
+ *   P(X < Y) = 0.5 * sum( (F_X(prev) + F_X(cur)) * (F_Y(cur) - F_Y(prev)) )
+ * by parallel-scanning both histograms.
+ *
+ * The initial fast-forward loops skip histogram entries that fall entirely
+ * before the other histogram's range, so the main loop only processes the
+ * overlapping region.  Bounds checks are required because the histograms may
+ * be completely disjoint (e.g., all of X is below all of Y).
+ */
+static double
+calc_hist_join_selectivity(TypeCacheEntry *typcache,
+						   const RangeBound *hist1, int nhist1,
+						   const RangeBound *hist2, int nhist2)
+{
+	int			i,
+				j;
+	double		selectivity = 0.0;
+	double		prev_sel1 = -1.0;	/* negative sentinel skips first iter */
+	double		prev_sel2 = 0.0;
+
+	Assert(nhist1 > 1);
+	Assert(nhist2 > 1);
+
+	/*
+	 * Fast-forward past hist1 entries that are entirely below hist2[0], and
+	 * vice versa.  Bounds checks prevent out-of-bounds access when the
+	 * histograms are fully disjoint.
+	 */
+	for (i = 0; i < nhist1 &&
+		 range_cmp_bounds(typcache, &hist1[i], &hist2[0]) < 0; i++)
+		;
+	for (j = 0; j < nhist2 &&
+		 range_cmp_bounds(typcache, &hist2[j], &hist1[0]) < 0; j++)
+		;
+
+	/*
+	 * Handle fully-separated histograms.  When all bounds in hist1 are below
+	 * all bounds in hist2, P(X < Y) is ~1.0.  When all of hist2 is below
+	 * hist1, P(X < Y) is ~0.0.  We return immediately rather than falling
+	 * into the overlap walk with invalid indices.
+	 */
+	if (i >= nhist1)
+		return 1.0;
+	if (j >= nhist2)
+		return 0.0;
+
+	/* Walk the overlapping region of both histograms */
+	while (i < nhist1 && j < nhist2)
+	{
+		double		cur_sel1,
+					cur_sel2;
+		RangeBound	cur_sync;
+		int			cmp;
+
+		cmp = range_cmp_bounds(typcache, &hist1[i], &hist2[j]);
+		if (cmp < 0)
+			cur_sync = hist1[i++];
+		else if (cmp > 0)
+			cur_sync = hist2[j++];
+		else
+		{
+			/* Equal bounds: advance both */
+			cur_sync = hist1[i];
+			i++;
+			j++;
+		}
+		cur_sel1 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist1, nhist1, false);
+		cur_sel2 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist2, nhist2, false);
+
+		/* Skip the first iteration (no previous point yet) */
+		if (prev_sel1 >= 0)
+			selectivity += (prev_sel1 + cur_sel1) * (cur_sel2 - prev_sel2);
+
+		prev_sel1 = cur_sel1;
+		prev_sel2 = cur_sel2;
+	}
+
+	/* P(X < Y) = 0.5 * Sum(...) */
+	selectivity /= 2;
+
+	/* Include remainder of hist2 if hist1 was exhausted first */
+	if (j < nhist2)
+		selectivity += 1 - prev_sel2;
+
+	return selectivity;
+}
+
+/*
+ * rangejoinsel -- join selectivity for range-vs-range operators
+ *
+ * Supports: <<, >>, &&
+ * These operators map directly to strict bound comparisons P(X < Y),
+ * which calc_hist_join_selectivity() estimates from bound histograms.
+ * Other range operators are left to their existing generic estimators.
+ */
+Datum
+rangejoinsel(PG_FUNCTION_ARGS)
+{
+	PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+	Oid			operator = PG_GETARG_OID(1);
+	List	   *args = (List *) PG_GETARG_POINTER(2);
+	SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) PG_GETARG_POINTER(4);
+	VariableStatData vardata1;
+	VariableStatData vardata2;
+	Selectivity selec;
+	AttStatsSlot hist1;
+	AttStatsSlot hist2;
+	AttStatsSlot sslot;
+	bool		have_hist1 = false;
+	bool		have_hist2 = false;
+	TypeCacheEntry *typcache;
+	Form_pg_statistic stats1;
+	Form_pg_statistic stats2;
+	double		empty_frac1;
+	double		empty_frac2;
+	double		null_frac1;
+	double		null_frac2;
+	int			nhist1;
+	int			nhist2;
+	RangeBound *hist1_lower;
+	RangeBound *hist1_upper;
+	RangeBound *hist2_lower;
+	RangeBound *hist2_upper;
+	bool		join_is_reversed;
+	bool		empty;
+	int			i;
+
+	get_join_variables(root, args, sjinfo, &vardata1, &vardata2,
+					   &join_is_reversed);
+
+	selec = default_range_selectivity(operator);
+
+	/*
+	 * Acquire histogram stats for both sides.  Each slot is tracked
+	 * independently so we can release exactly what was acquired on any
+	 * failure path.
+	 */
+	if (!HeapTupleIsValid(vardata1.statsTuple) ||
+		!HeapTupleIsValid(vardata2.statsTuple))
+		goto cleanup;
+
+	if (vardata1.vartype != vardata2.vartype)
+		goto cleanup;
+
+	memset(&hist1, 0, sizeof(hist1));
+	memset(&hist2, 0, sizeof(hist2));
+
+	if (!get_attstatsslot(&hist1, vardata1.statsTuple,
+						  STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						  ATTSTATSSLOT_VALUES))
+		goto cleanup;
+	have_hist1 = true;
+
+	if (!get_attstatsslot(&hist2, vardata2.statsTuple,
+						  STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						  ATTSTATSSLOT_VALUES))
+		goto cleanup;
+	have_hist2 = true;
+
+	/* Initialize type cache */
+	typcache = range_get_typcache(fcinfo, vardata1.vartype);
+
+	/* Look up NULL and empty-range fractions */
+	stats1 = (Form_pg_statistic) GETSTRUCT(vardata1.statsTuple);
+	stats2 = (Form_pg_statistic) GETSTRUCT(vardata2.statsTuple);
+
+	null_frac1 = stats1->stanullfrac;
+	null_frac2 = stats2->stanullfrac;
+
+	/* Try to get fraction of empty ranges for the first variable */
+	if (get_attstatsslot(&sslot, vardata1.statsTuple,
+						 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+						 InvalidOid, ATTSTATSSLOT_NUMBERS))
+	{
+		if (sslot.nnumbers != 1)
+			elog(ERROR, "invalid empty fraction statistic");
+		empty_frac1 = sslot.numbers[0];
+		free_attstatsslot(&sslot);
+	}
+	else
+	{
+		empty_frac1 = 0.0;
+	}
+
+	/* Try to get fraction of empty ranges for the second variable */
+	if (get_attstatsslot(&sslot, vardata2.statsTuple,
+						 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+						 InvalidOid, ATTSTATSSLOT_NUMBERS))
+	{
+		if (sslot.nnumbers != 1)
+			elog(ERROR, "invalid empty fraction statistic");
+		empty_frac2 = sslot.numbers[0];
+		free_attstatsslot(&sslot);
+	}
+	else
+	{
+		empty_frac2 = 0.0;
+	}
+
+	/* Convert range histograms to separate lower/upper bound arrays */
+	nhist1 = hist1.nvalues;
+	hist1_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+	hist1_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+	for (i = 0; i < nhist1; i++)
+	{
+		range_deserialize(typcache, DatumGetRangeTypeP(hist1.values[i]),
+						  &hist1_lower[i], &hist1_upper[i], &empty);
+		if (empty)
+			elog(ERROR, "bounds histogram contains an empty range");
+	}
+
+	nhist2 = hist2.nvalues;
+	hist2_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+	hist2_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+	for (i = 0; i < nhist2; i++)
+	{
+		range_deserialize(typcache, DatumGetRangeTypeP(hist2.values[i]),
+						  &hist2_lower[i], &hist2_upper[i], &empty);
+		if (empty)
+			elog(ERROR, "bounds histogram contains an empty range");
+	}
+
+	/* Estimate selectivity based on the operator */
+	switch (operator)
+	{
+		case OID_RANGE_OVERLAP_OP:
+
+			/*
+			 * A && B iff NOT(A << B) AND NOT(A >> B), so
+			 * P(A && B) = 1 - P(A.upper < B.lower) - P(B.upper < A.lower)
+			 */
+			selec = 1;
+			selec -= calc_hist_join_selectivity(typcache,
+												hist1_upper, nhist1,
+												hist2_lower, nhist2);
+			selec -= calc_hist_join_selectivity(typcache,
+												hist2_upper, nhist2,
+												hist1_lower, nhist1);
+			break;
+
+		case OID_RANGE_LEFT_OP:
+			/* A << B iff upper(A) < lower(B) */
+			selec = calc_hist_join_selectivity(typcache,
+											   hist1_upper, nhist1,
+											   hist2_lower, nhist2);
+			break;
+
+		case OID_RANGE_RIGHT_OP:
+			/* A >> B iff upper(B) < lower(A) */
+			selec = calc_hist_join_selectivity(typcache,
+											   hist2_upper, nhist2,
+											   hist1_lower, nhist1);
+			break;
+
+		default:
+			/* Unsupported operator; keep the default selectivity */
+			goto cleanup;
+	}
+
+	/* The histogram-based selectivity applies to non-empty ranges only */
+	selec *= (1 - empty_frac1) * (1 - empty_frac2);
+
+	/*
+	 * For the supported operators (<<, >>, &&), a pair involving an empty
+	 * range never matches, so no additive term for empty ranges is needed.
+	 */
+
+	/* All range operators are strict */
+	selec *= (1 - null_frac1) * (1 - null_frac2);
+
+cleanup:
+	if (have_hist2)
+		free_attstatsslot(&hist2);
+	if (have_hist1)
+		free_attstatsslot(&hist1);
+
+	ReleaseVariableStats(vardata1);
+	ReleaseVariableStats(vardata2);
+
+	CLAMP_PROBABILITY(selec);
+
+	PG_RETURN_FLOAT8((float8) selec);
+}
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 1465f13120a..5ea4434f9fa 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -3094,7 +3094,7 @@
   oprname => '&&', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '&&(anyrange,anyrange)',
   oprcode => 'range_overlaps', oprrest => 'rangesel',
-  oprjoin => 'areajoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3889', oid_symbol => 'OID_RANGE_CONTAINS_ELEM_OP',
   descr => 'contains',
   oprname => '@>', oprleft => 'anyrange', oprright => 'anyelement',
@@ -3122,12 +3122,12 @@
   oprname => '<<', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '>>(anyrange,anyrange)',
   oprcode => 'range_before', oprrest => 'rangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3894', oid_symbol => 'OID_RANGE_RIGHT_OP', descr => 'is right of',
   oprname => '>>', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '<<(anyrange,anyrange)',
   oprcode => 'range_after', oprrest => 'rangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3895', oid_symbol => 'OID_RANGE_OVERLAPS_LEFT_OP',
   descr => 'overlaps or is left of',
   oprname => '&<', oprleft => 'anyrange', oprright => 'anyrange',
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 99fa9a6ede2..c6a707acae4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12919,4 +12919,8 @@
   proname => 'hashoid8extended', prorettype => 'int8',
   proargtypes => 'oid8 int8', prosrc => 'hashoid8extended' },
 
+{ oid => '8355', descr => 'join selectivity for range operators',
+  proname => 'rangejoinsel', provolatile => 's', prorettype => 'float8',
+  proargtypes => 'internal oid internal int2 internal',
+  prosrc => 'rangejoinsel' },
 ]
diff --git a/src/test/regress/expected/rangetypes.out b/src/test/regress/expected/rangetypes.out
index e062a4e5c2c..2fc5b770f90 100644
--- a/src/test/regress/expected/rangetypes.out
+++ b/src/test/regress/expected/rangetypes.out
@@ -2033,3 +2033,117 @@ select * from text_support_test where t <@ textrange_supp('a', 'd');
 
 drop table text_support_test;
 drop type textrange_supp;
+--
+-- test selectivity of range join operators
+--
+create table test_range_join_1 (ir1 int4range);
+create table test_range_join_2 (ir2 int4range);
+create table test_range_join_3 (ir3 int4range);
+insert into test_range_join_1 select int4range(g, g+10) from generate_series(1, 1000) g;
+insert into test_range_join_1 select int4range(g, g+100) from generate_series(1, 1000, 10) g;
+insert into test_range_join_2 select int4range(g, g+10) from generate_series(1, 500) g;
+insert into test_range_join_2 select int4range(g, g+100) from generate_series(1, 500, 10) g;
+insert into test_range_join_3 select int4range(g, g+10) from generate_series(501, 1000) g;
+insert into test_range_join_3 select int4range(g, g+100) from generate_series(501, 1000, 10) g;
+analyze test_range_join_1;
+analyze test_range_join_2;
+analyze test_range_join_3;
+-- reorder joins based on computed selectivity
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 && ir2 and ir2 && ir3;
+                                    QUERY PLAN                                     
+-----------------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_range_join_1.ir1 && test_range_join_2.ir2)
+         ->  Seq Scan on test_range_join_1
+         ->  Materialize
+               ->  Nested Loop
+                     Join Filter: (test_range_join_2.ir2 && test_range_join_3.ir3)
+                     ->  Seq Scan on test_range_join_2
+                     ->  Materialize
+                           ->  Seq Scan on test_range_join_3
+(10 rows)
+
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 << ir2 and ir2 << ir3;
+                                 QUERY PLAN                                  
+-----------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_range_join_2.ir2 << test_range_join_3.ir3)
+         ->  Nested Loop
+               Join Filter: (test_range_join_1.ir1 << test_range_join_2.ir2)
+               ->  Seq Scan on test_range_join_1
+               ->  Materialize
+                     ->  Seq Scan on test_range_join_2
+         ->  Materialize
+               ->  Seq Scan on test_range_join_3
+(10 rows)
+
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 >> ir2 and ir2 >> ir3;
+                                 QUERY PLAN                                  
+-----------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_range_join_1.ir1 >> test_range_join_2.ir2)
+         ->  Nested Loop
+               Join Filter: (test_range_join_2.ir2 >> test_range_join_3.ir3)
+               ->  Seq Scan on test_range_join_2
+               ->  Materialize
+                     ->  Seq Scan on test_range_join_3
+         ->  Seq Scan on test_range_join_1
+(9 rows)
+
+drop table test_range_join_1;
+drop table test_range_join_2;
+drop table test_range_join_3;
+--
+-- test range join selectivity with fully disjoint histograms
+-- (exercises the bounds-check logic when histograms do not overlap)
+--
+create table test_range_join_lo (r int4range);
+create table test_range_join_hi (r int4range);
+-- low ranges: [1,11), [2,12), ... [500,510)
+insert into test_range_join_lo select int4range(g, g+10) from generate_series(1, 500) g;
+-- high ranges: [10001,10011), [10002,10012), ... [10500,10510)
+insert into test_range_join_hi select int4range(g, g+10) from generate_series(10001, 10500) g;
+analyze test_range_join_lo;
+analyze test_range_join_hi;
+-- lo << hi should produce a large selectivity (most pairs match)
+-- lo >> hi should produce a near-zero selectivity
+-- lo && hi should produce a near-zero selectivity (no overlap)
+-- These should not crash and should produce stable plans.
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r << b.r;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r << b.r)
+         ->  Seq Scan on test_range_join_lo a
+         ->  Materialize
+               ->  Seq Scan on test_range_join_hi b
+(6 rows)
+
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r >> b.r;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r >> b.r)
+         ->  Seq Scan on test_range_join_lo a
+         ->  Materialize
+               ->  Seq Scan on test_range_join_hi b
+(6 rows)
+
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r && b.r;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r && b.r)
+         ->  Seq Scan on test_range_join_lo a
+         ->  Materialize
+               ->  Seq Scan on test_range_join_hi b
+(6 rows)
+
+drop table test_range_join_lo;
+drop table test_range_join_hi;
diff --git a/src/test/regress/sql/rangetypes.sql b/src/test/regress/sql/rangetypes.sql
index 5c4b0337b7a..f69109da334 100644
--- a/src/test/regress/sql/rangetypes.sql
+++ b/src/test/regress/sql/rangetypes.sql
@@ -708,3 +708,56 @@ select * from text_support_test where t <@ textrange_supp('a', 'd');
 drop table text_support_test;
 
 drop type textrange_supp;
+
+--
+-- test selectivity of range join operators
+--
+create table test_range_join_1 (ir1 int4range);
+create table test_range_join_2 (ir2 int4range);
+create table test_range_join_3 (ir3 int4range);
+
+insert into test_range_join_1 select int4range(g, g+10) from generate_series(1, 1000) g;
+insert into test_range_join_1 select int4range(g, g+100) from generate_series(1, 1000, 10) g;
+insert into test_range_join_2 select int4range(g, g+10) from generate_series(1, 500) g;
+insert into test_range_join_2 select int4range(g, g+100) from generate_series(1, 500, 10) g;
+insert into test_range_join_3 select int4range(g, g+10) from generate_series(501, 1000) g;
+insert into test_range_join_3 select int4range(g, g+100) from generate_series(501, 1000, 10) g;
+
+analyze test_range_join_1;
+analyze test_range_join_2;
+analyze test_range_join_3;
+
+-- reorder joins based on computed selectivity
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 && ir2 and ir2 && ir3;
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 << ir2 and ir2 << ir3;
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 >> ir2 and ir2 >> ir3;
+
+drop table test_range_join_1;
+drop table test_range_join_2;
+drop table test_range_join_3;
+
+--
+-- test range join selectivity with fully disjoint histograms
+-- (exercises the bounds-check logic when histograms do not overlap)
+--
+create table test_range_join_lo (r int4range);
+create table test_range_join_hi (r int4range);
+
+-- low ranges: [1,11), [2,12), ... [500,510)
+insert into test_range_join_lo select int4range(g, g+10) from generate_series(1, 500) g;
+-- high ranges: [10001,10011), [10002,10012), ... [10500,10510)
+insert into test_range_join_hi select int4range(g, g+10) from generate_series(10001, 10500) g;
+
+analyze test_range_join_lo;
+analyze test_range_join_hi;
+
+-- lo << hi should produce a large selectivity (most pairs match)
+-- lo >> hi should produce a near-zero selectivity
+-- lo && hi should produce a near-zero selectivity (no overlap)
+-- These should not crash and should produce stable plans.
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r << b.r;
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r >> b.r;
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r && b.r;
+
+drop table test_range_join_lo;
+drop table test_range_join_hi;
-- 
2.50.1 (Apple Git-155)



  [application/octet-stream] v6-0002-Improve-multirange-join-selectivity-estimation-fo.patch (25.9K, 4-v6-0002-Improve-multirange-join-selectivity-estimation-fo.patch)
From 3b5e90520b13c74f824890138fd4bc239f350461 Mon Sep 17 00:00:00 2001
From: Maxime Schoemans <[email protected]>
Date: Mon, 13 Apr 2026 16:06:03 +0200
Subject: [PATCH v6 2/3] Improve multirange join selectivity estimation for <<,
 >>, &&

Add multirangejoinsel, the multirange equivalent of rangejoinsel,
supporting all type combinations: multirange vs multirange, range vs
multirange, and multirange vs range.

The code is intentionally duplicated from rangetypes_selfuncs.c for
reviewability. A follow-up commit will remove the duplication.
---
 .../utils/adt/multirangetypes_selfuncs.c      | 318 ++++++++++++++++++
 src/include/catalog/pg_operator.dat           |  18 +-
 src/include/catalog/pg_proc.dat               |   4 +
 src/test/regress/expected/multirangetypes.out | 157 +++++++++
 src/test/regress/sql/multirangetypes.sql      |  72 ++++
 5 files changed, 560 insertions(+), 9 deletions(-)
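
Note (illustration only, not part of the patch): the last steps of
multirangejoinsel(), like those of rangejoinsel() in 1/3, combine two
P(upper < lower) estimates into an operator selectivity and then scale the
result by the non-empty and non-NULL fractions of both inputs. A minimal
sketch of that arithmetic follows. SketchOp and sketch_combine_join_sel() are
hypothetical names, and the two probabilities are taken as inputs here rather
than computed from bounds histograms.

```c
#include <assert.h>
#include <math.h>

typedef enum
{
	SKETCH_OP_LEFT,				/* A << B */
	SKETCH_OP_RIGHT,			/* A >> B */
	SKETCH_OP_OVERLAP			/* A && B */
} SketchOp;

/*
 * p_a_left_of_b is an estimate of P(upper(A) < lower(B)) and p_b_left_of_a
 * an estimate of P(upper(B) < lower(A)), both over non-empty, non-NULL
 * values only.
 */
static double
sketch_combine_join_sel(SketchOp op,
						double p_a_left_of_b, double p_b_left_of_a,
						double empty_frac1, double empty_frac2,
						double null_frac1, double null_frac2)
{
	double		selec;

	switch (op)
	{
		case SKETCH_OP_LEFT:
			selec = p_a_left_of_b;
			break;
		case SKETCH_OP_RIGHT:
			selec = p_b_left_of_a;
			break;
		default:
			/* A && B holds exactly when neither side lies before the other */
			selec = 1.0 - p_a_left_of_b - p_b_left_of_a;
			break;
	}

	/* Pairs involving an empty or NULL value never match */
	selec *= (1.0 - empty_frac1) * (1.0 - empty_frac2);
	selec *= (1.0 - null_frac1) * (1.0 - null_frac2);

	/* Clamp to [0, 1], since the two inputs are independent estimates */
	if (selec < 0.0)
		selec = 0.0;
	else if (selec > 1.0)
		selec = 1.0;

	return selec;
}
```

The clamp matters for &&: the two subtracted estimates come from separate
histogram walks, so their sum can slightly exceed 1.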

diff --git a/src/backend/utils/adt/multirangetypes_selfuncs.c b/src/backend/utils/adt/multirangetypes_selfuncs.c
index 533111445e7..87f3db162a6 100644
--- a/src/backend/utils/adt/multirangetypes_selfuncs.c
+++ b/src/backend/utils/adt/multirangetypes_selfuncs.c
@@ -1334,3 +1334,321 @@ calc_hist_selectivity_contains(TypeCacheEntry *typcache,
 
 	return sum_frac;
 }
+
+/*
+ * Estimate join selectivity P(X < Y) using rangebound histograms.
+ *
+ * Based on: Diogo Repas, Zhicheng Luo, Maxime Schoemans, Mahmoud Sakr, 2022
+ * "Selectivity Estimation of Inequality Joins In Databases"
+ * https://doi.org/10.48550/arXiv.2206.07396
+ *
+ * hist1 and hist2 are arrays of RangeBound entries from the bounds histograms
+ * of two range-typed or multirange-typed attributes X and Y, respectively.
+ * Each array has at least 2 entries (one histogram bin).  The entries carry
+ * full bound metadata (lower/upper flag, inclusive/exclusive), and all
+ * comparisons use range_cmp_bounds() so that bound semantics are preserved.
+ *
+ * The algorithm models each attribute's cumulative distribution (F_X, F_Y)
+ * as a piecewise-linear function derived from its histogram, then computes:
+ *   P(X < Y) = 0.5 * sum( (F_X(prev) + F_X(cur)) * (F_Y(cur) - F_Y(prev)) )
+ * by parallel-scanning both histograms.
+ *
+ * The initial fast-forward loops skip histogram entries that fall entirely
+ * before the other histogram's range, so the main loop only processes the
+ * overlapping region.  Bounds checks are required because the histograms may
+ * be completely disjoint (e.g., all of X is below all of Y).
+ */
+static double
+calc_hist_join_selectivity(TypeCacheEntry *typcache,
+						   const RangeBound *hist1, int nhist1,
+						   const RangeBound *hist2, int nhist2)
+{
+	int			i,
+				j;
+	double		selectivity = 0.0;
+	double		prev_sel1 = -1.0;	/* negative sentinel skips first iter */
+	double		prev_sel2 = 0.0;
+
+	Assert(nhist1 > 1);
+	Assert(nhist2 > 1);
+
+	/*
+	 * Fast-forward past hist1 entries that are entirely below hist2[0], and
+	 * vice versa.  Bounds checks prevent out-of-bounds access when the
+	 * histograms are fully disjoint.
+	 */
+	for (i = 0; i < nhist1 &&
+		 range_cmp_bounds(typcache, &hist1[i], &hist2[0]) < 0; i++)
+		;
+	for (j = 0; j < nhist2 &&
+		 range_cmp_bounds(typcache, &hist2[j], &hist1[0]) < 0; j++)
+		;
+
+	/*
+	 * Handle fully-separated histograms.  When all bounds in hist1 are below
+	 * all bounds in hist2, P(X < Y) is ~1.0.  When all of hist2 is below
+	 * hist1, P(X < Y) is ~0.0.  We return immediately rather than falling
+	 * into the overlap walk with invalid indices.
+	 */
+	if (i >= nhist1)
+		return 1.0;
+	if (j >= nhist2)
+		return 0.0;
+
+	/* Walk the overlapping region of both histograms */
+	while (i < nhist1 && j < nhist2)
+	{
+		double		cur_sel1,
+					cur_sel2;
+		RangeBound	cur_sync;
+		int			cmp;
+
+		cmp = range_cmp_bounds(typcache, &hist1[i], &hist2[j]);
+		if (cmp < 0)
+			cur_sync = hist1[i++];
+		else if (cmp > 0)
+			cur_sync = hist2[j++];
+		else
+		{
+			/* Equal bounds: advance both */
+			cur_sync = hist1[i];
+			i++;
+			j++;
+		}
+		cur_sel1 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist1, nhist1, false);
+		cur_sel2 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist2, nhist2, false);
+
+		/* Skip the first iteration (no previous point yet) */
+		if (prev_sel1 >= 0)
+			selectivity += (prev_sel1 + cur_sel1) * (cur_sel2 - prev_sel2);
+
+		prev_sel1 = cur_sel1;
+		prev_sel2 = cur_sel2;
+	}
+
+	/* P(X < Y) = 0.5 * Sum(...) */
+	selectivity /= 2;
+
+	/* Include remainder of hist2 if hist1 was exhausted first */
+	if (j < nhist2)
+		selectivity += 1 - prev_sel2;
+
+	return selectivity;
+}
+
+/*
+ * multirangejoinsel -- join selectivity for multirange operators
+ *
+ * Supports: <<, >>, && for all type combinations:
+ *   multirange vs multirange, multirange vs range, range vs multirange
+ *
+ * These operators map directly to strict bound comparisons P(X < Y),
+ * which calc_hist_join_selectivity() estimates from bound histograms.
+ * Both range and multirange types store bound histograms in the same
+ * format, so the estimation is identical regardless of type combination.
+ */
+Datum
+multirangejoinsel(PG_FUNCTION_ARGS)
+{
+	PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+	Oid			operator = PG_GETARG_OID(1);
+	List	   *args = (List *) PG_GETARG_POINTER(2);
+	SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) PG_GETARG_POINTER(4);
+	VariableStatData vardata1;
+	VariableStatData vardata2;
+	Selectivity selec;
+	AttStatsSlot hist1;
+	AttStatsSlot hist2;
+	AttStatsSlot sslot;
+	bool		have_hist1 = false;
+	bool		have_hist2 = false;
+	TypeCacheEntry *typcache;
+	TypeCacheEntry *rng_typcache;
+	Form_pg_statistic stats1;
+	Form_pg_statistic stats2;
+	double		empty_frac1;
+	double		empty_frac2;
+	double		null_frac1;
+	double		null_frac2;
+	int			nhist1;
+	int			nhist2;
+	RangeBound *hist1_lower;
+	RangeBound *hist1_upper;
+	RangeBound *hist2_lower;
+	RangeBound *hist2_upper;
+	bool		join_is_reversed;
+	bool		empty;
+	int			i;
+
+	get_join_variables(root, args, sjinfo, &vardata1, &vardata2,
+					   &join_is_reversed);
+
+	selec = default_multirange_selectivity(operator);
+
+	/*
+	 * Acquire histogram stats for both sides.  Each slot is tracked
+	 * independently so we can release exactly what was acquired on any
+	 * failure path.
+	 */
+	if (!HeapTupleIsValid(vardata1.statsTuple) ||
+		!HeapTupleIsValid(vardata2.statsTuple))
+		goto cleanup;
+
+	memset(&hist1, 0, sizeof(hist1));
+	memset(&hist2, 0, sizeof(hist2));
+
+	if (!get_attstatsslot(&hist1, vardata1.statsTuple,
+						  STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						  ATTSTATSSLOT_VALUES))
+		goto cleanup;
+	have_hist1 = true;
+
+	if (!get_attstatsslot(&hist2, vardata2.statsTuple,
+						  STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						  ATTSTATSSLOT_VALUES))
+		goto cleanup;
+	have_hist2 = true;
+
+	/*
+	 * Determine the range type cache for bound comparisons.  At least one
+	 * side is a multirange type; try vardata1 first, then vardata2.
+	 */
+	typcache = lookup_type_cache(vardata1.vartype, TYPECACHE_MULTIRANGE_INFO);
+	if (typcache->rngtype != NULL)
+		rng_typcache = typcache->rngtype;
+	else
+	{
+		typcache = lookup_type_cache(vardata2.vartype,
+									 TYPECACHE_MULTIRANGE_INFO);
+		rng_typcache = typcache->rngtype;
+	}
+
+	/* Look up NULL and empty-range fractions */
+	stats1 = (Form_pg_statistic) GETSTRUCT(vardata1.statsTuple);
+	stats2 = (Form_pg_statistic) GETSTRUCT(vardata2.statsTuple);
+
+	null_frac1 = stats1->stanullfrac;
+	null_frac2 = stats2->stanullfrac;
+
+	/* Try to get fraction of empty ranges for the first variable */
+	if (get_attstatsslot(&sslot, vardata1.statsTuple,
+						 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+						 InvalidOid, ATTSTATSSLOT_NUMBERS))
+	{
+		if (sslot.nnumbers != 1)
+			elog(ERROR, "invalid empty fraction statistic");
+		empty_frac1 = sslot.numbers[0];
+		free_attstatsslot(&sslot);
+	}
+	else
+	{
+		empty_frac1 = 0.0;
+	}
+
+	/* Try to get fraction of empty ranges for the second variable */
+	if (get_attstatsslot(&sslot, vardata2.statsTuple,
+						 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+						 InvalidOid, ATTSTATSSLOT_NUMBERS))
+	{
+		if (sslot.nnumbers != 1)
+			elog(ERROR, "invalid empty fraction statistic");
+		empty_frac2 = sslot.numbers[0];
+		free_attstatsslot(&sslot);
+	}
+	else
+	{
+		empty_frac2 = 0.0;
+	}
+
+	/* Convert range histograms to separate lower/upper bound arrays */
+	nhist1 = hist1.nvalues;
+	hist1_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+	hist1_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+	for (i = 0; i < nhist1; i++)
+	{
+		range_deserialize(rng_typcache, DatumGetRangeTypeP(hist1.values[i]),
+						  &hist1_lower[i], &hist1_upper[i], &empty);
+		if (empty)
+			elog(ERROR, "bounds histogram contains an empty range");
+	}
+
+	nhist2 = hist2.nvalues;
+	hist2_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+	hist2_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+	for (i = 0; i < nhist2; i++)
+	{
+		range_deserialize(rng_typcache, DatumGetRangeTypeP(hist2.values[i]),
+						  &hist2_lower[i], &hist2_upper[i], &empty);
+		if (empty)
+			elog(ERROR, "bounds histogram contains an empty range");
+	}
+
+	/* Estimate selectivity based on the operator */
+	switch (operator)
+	{
+		case OID_RANGE_OVERLAPS_MULTIRANGE_OP:
+		case OID_MULTIRANGE_OVERLAPS_RANGE_OP:
+		case OID_MULTIRANGE_OVERLAPS_MULTIRANGE_OP:
+
+			/*
+			 * A && B iff NOT(A << B) AND NOT(A >> B), so
+			 * P(A && B) = 1 - P(A.upper < B.lower) - P(B.upper < A.lower)
+			 */
+			selec = 1;
+			selec -= calc_hist_join_selectivity(rng_typcache,
+												hist1_upper, nhist1,
+												hist2_lower, nhist2);
+			selec -= calc_hist_join_selectivity(rng_typcache,
+												hist2_upper, nhist2,
+												hist1_lower, nhist1);
+			break;
+
+		case OID_RANGE_LEFT_MULTIRANGE_OP:
+		case OID_MULTIRANGE_LEFT_RANGE_OP:
+		case OID_MULTIRANGE_LEFT_MULTIRANGE_OP:
+			/* A << B iff upper(A) < lower(B) */
+			selec = calc_hist_join_selectivity(rng_typcache,
+											   hist1_upper, nhist1,
+											   hist2_lower, nhist2);
+			break;
+
+		case OID_RANGE_RIGHT_MULTIRANGE_OP:
+		case OID_MULTIRANGE_RIGHT_RANGE_OP:
+		case OID_MULTIRANGE_RIGHT_MULTIRANGE_OP:
+			/* A >> B iff upper(B) < lower(A) */
+			selec = calc_hist_join_selectivity(rng_typcache,
+											   hist2_upper, nhist2,
+											   hist1_lower, nhist1);
+			break;
+
+		default:
+			/* Unsupported operator; keep the default selectivity */
+			goto cleanup;
+	}
+
+	/* The histogram-based selectivity applies to non-empty ranges only */
+	selec *= (1 - empty_frac1) * (1 - empty_frac2);
+
+	/*
+	 * For the supported operators (<<, >>, &&), a pair involving an empty
+	 * value never matches, so no additive term for empty values is needed.
+	 */
+
+	/* All multirange operators are strict */
+	selec *= (1 - null_frac1) * (1 - null_frac2);
+
+cleanup:
+	if (have_hist2)
+		free_attstatsslot(&hist2);
+	if (have_hist1)
+		free_attstatsslot(&hist1);
+
+	ReleaseVariableStats(vardata1);
+	ReleaseVariableStats(vardata2);
+
+	CLAMP_PROBABILITY(selec);
+
+	PG_RETURN_FLOAT8((float8) selec);
+}
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 5ea4434f9fa..28f696a9f41 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -3302,19 +3302,19 @@
   oprname => '&&', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '&&(anymultirange,anyrange)',
   oprcode => 'range_overlaps_multirange', oprrest => 'multirangesel',
-  oprjoin => 'areajoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2867', oid_symbol => 'OID_MULTIRANGE_OVERLAPS_RANGE_OP',
   descr => 'overlaps',
   oprname => '&&', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '&&(anyrange,anymultirange)',
   oprcode => 'multirange_overlaps_range', oprrest => 'multirangesel',
-  oprjoin => 'areajoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2868', oid_symbol => 'OID_MULTIRANGE_OVERLAPS_MULTIRANGE_OP',
   descr => 'overlaps',
   oprname => '&&', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '&&(anymultirange,anymultirange)',
   oprcode => 'multirange_overlaps_multirange', oprrest => 'multirangesel',
-  oprjoin => 'areajoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2869', oid_symbol => 'OID_MULTIRANGE_CONTAINS_ELEM_OP',
   descr => 'contains',
   oprname => '@>', oprleft => 'anymultirange', oprright => 'anyelement',
@@ -3428,37 +3428,37 @@
   oprname => '<<', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '>>(anymultirange,anyrange)',
   oprcode => 'range_before_multirange', oprrest => 'multirangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4396', oid_symbol => 'OID_MULTIRANGE_LEFT_RANGE_OP',
   descr => 'is left of',
   oprname => '<<', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '>>(anyrange,anymultirange)',
   oprcode => 'multirange_before_range', oprrest => 'multirangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4397', oid_symbol => 'OID_MULTIRANGE_LEFT_MULTIRANGE_OP',
   descr => 'is left of',
   oprname => '<<', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '>>(anymultirange,anymultirange)',
   oprcode => 'multirange_before_multirange', oprrest => 'multirangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4398', oid_symbol => 'OID_RANGE_RIGHT_MULTIRANGE_OP',
   descr => 'is right of',
   oprname => '>>', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '<<(anymultirange,anyrange)',
   oprcode => 'range_after_multirange', oprrest => 'multirangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4399', oid_symbol => 'OID_MULTIRANGE_RIGHT_RANGE_OP',
   descr => 'is right of',
   oprname => '>>', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '<<(anyrange,anymultirange)',
   oprcode => 'multirange_after_range', oprrest => 'multirangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4400', oid_symbol => 'OID_MULTIRANGE_RIGHT_MULTIRANGE_OP',
   descr => 'is right of',
   oprname => '>>', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '<<(anymultirange,anymultirange)',
   oprcode => 'multirange_after_multirange', oprrest => 'multirangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 
 { oid => '8262', descr => 'equal',
   oprname => '=', oprcanmerge => 't', oprcanhash => 't', oprleft => 'oid8',
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c6a707acae4..10fbc22c4a6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12923,4 +12923,8 @@
   proname => 'rangejoinsel', provolatile => 's', prorettype => 'float8',
   proargtypes => 'internal oid internal int2 internal',
   prosrc => 'rangejoinsel' },
+{ oid => '8356', descr => 'join selectivity for multirange operators',
+  proname => 'multirangejoinsel', provolatile => 's', prorettype => 'float8',
+  proargtypes => 'internal oid internal int2 internal',
+  prosrc => 'multirangejoinsel' },
 ]
diff --git a/src/test/regress/expected/multirangetypes.out b/src/test/regress/expected/multirangetypes.out
index f5e7df8df43..aab9c5e2604 100644
--- a/src/test/regress/expected/multirangetypes.out
+++ b/src/test/regress/expected/multirangetypes.out
@@ -3512,3 +3512,160 @@ create function mr_table_fail(i anyelement) returns table(i anyelement, r anymul
   as $$ select $1, '[1,10]' $$ language sql;
 ERROR:  cannot determine result data type
 DETAIL:  A result of type anymultirange requires at least one input of type anyrange or anymultirange.
+-- Restore GUCs changed by earlier index tests
+RESET enable_seqscan;
+RESET enable_indexscan;
+RESET enable_bitmapscan;
+--
+-- test selectivity of multirange join operators
+--
+create table test_mr_join_1 (mr1 int4multirange);
+create table test_mr_join_2 (mr2 int4multirange);
+create table test_mr_join_3 (mr3 int4multirange);
+insert into test_mr_join_1 select int4multirange(int4range(g, g+10)) from generate_series(1, 1000) g;
+insert into test_mr_join_1 select int4multirange(int4range(g, g+100)) from generate_series(1, 1000, 10) g;
+insert into test_mr_join_2 select int4multirange(int4range(g, g+10)) from generate_series(1, 500) g;
+insert into test_mr_join_2 select int4multirange(int4range(g, g+100)) from generate_series(1, 500, 10) g;
+insert into test_mr_join_3 select int4multirange(int4range(g, g+10)) from generate_series(501, 1000) g;
+insert into test_mr_join_3 select int4multirange(int4range(g, g+100)) from generate_series(501, 1000, 10) g;
+analyze test_mr_join_1;
+analyze test_mr_join_2;
+analyze test_mr_join_3;
+-- multirange vs multirange: reorder joins based on computed selectivity
+explain (costs off) select count(*) from test_mr_join_1, test_mr_join_2, test_mr_join_3 where mr1 && mr2 and mr2 && mr3;
+                                 QUERY PLAN                                  
+-----------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_mr_join_1.mr1 && test_mr_join_2.mr2)
+         ->  Seq Scan on test_mr_join_1
+         ->  Materialize
+               ->  Nested Loop
+                     Join Filter: (test_mr_join_2.mr2 && test_mr_join_3.mr3)
+                     ->  Seq Scan on test_mr_join_2
+                     ->  Materialize
+                           ->  Seq Scan on test_mr_join_3
+(10 rows)
+
+explain (costs off) select count(*) from test_mr_join_1, test_mr_join_2, test_mr_join_3 where mr1 << mr2 and mr2 << mr3;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_mr_join_2.mr2 << test_mr_join_3.mr3)
+         ->  Nested Loop
+               Join Filter: (test_mr_join_1.mr1 << test_mr_join_2.mr2)
+               ->  Seq Scan on test_mr_join_1
+               ->  Materialize
+                     ->  Seq Scan on test_mr_join_2
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_3
+(10 rows)
+
+explain (costs off) select count(*) from test_mr_join_1, test_mr_join_2, test_mr_join_3 where mr1 >> mr2 and mr2 >> mr3;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_mr_join_1.mr1 >> test_mr_join_2.mr2)
+         ->  Nested Loop
+               Join Filter: (test_mr_join_2.mr2 >> test_mr_join_3.mr3)
+               ->  Seq Scan on test_mr_join_2
+               ->  Materialize
+                     ->  Seq Scan on test_mr_join_3
+         ->  Seq Scan on test_mr_join_1
+(9 rows)
+
+drop table test_mr_join_1;
+drop table test_mr_join_2;
+drop table test_mr_join_3;
+--
+-- test multirange join selectivity with fully disjoint histograms
+--
+create table test_mr_join_lo (r int4multirange);
+create table test_mr_join_hi (r int4multirange);
+insert into test_mr_join_lo select int4multirange(int4range(g, g+10)) from generate_series(1, 500) g;
+insert into test_mr_join_hi select int4multirange(int4range(g, g+10)) from generate_series(10001, 10500) g;
+analyze test_mr_join_lo;
+analyze test_mr_join_hi;
+-- These should not crash and should produce stable plans.
+explain (costs off) select count(*) from test_mr_join_lo a, test_mr_join_hi b where a.r << b.r;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r << b.r)
+         ->  Seq Scan on test_mr_join_lo a
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_hi b
+(6 rows)
+
+explain (costs off) select count(*) from test_mr_join_lo a, test_mr_join_hi b where a.r >> b.r;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r >> b.r)
+         ->  Seq Scan on test_mr_join_lo a
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_hi b
+(6 rows)
+
+explain (costs off) select count(*) from test_mr_join_lo a, test_mr_join_hi b where a.r && b.r;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r && b.r)
+         ->  Seq Scan on test_mr_join_lo a
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_hi b
+(6 rows)
+
+drop table test_mr_join_lo;
+drop table test_mr_join_hi;
+--
+-- test range vs multirange join selectivity
+--
+create table test_mr_join_r (r int4range);
+create table test_mr_join_mr (mr int4multirange);
+insert into test_mr_join_r select int4range(g, g+10) from generate_series(1, 500) g;
+insert into test_mr_join_mr select int4multirange(int4range(g, g+10)) from generate_series(10001, 10500) g;
+analyze test_mr_join_r;
+analyze test_mr_join_mr;
+-- range vs multirange operators should use multirangejoinsel
+explain (costs off) select count(*) from test_mr_join_r a, test_mr_join_mr b where a.r << b.mr;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r << b.mr)
+         ->  Seq Scan on test_mr_join_r a
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_mr b
+(6 rows)
+
+explain (costs off) select count(*) from test_mr_join_r a, test_mr_join_mr b where a.r >> b.mr;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r >> b.mr)
+         ->  Seq Scan on test_mr_join_r a
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_mr b
+(6 rows)
+
+explain (costs off) select count(*) from test_mr_join_r a, test_mr_join_mr b where a.r && b.mr;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r && b.mr)
+         ->  Seq Scan on test_mr_join_r a
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_mr b
+(6 rows)
+
+drop table test_mr_join_r;
+drop table test_mr_join_mr;
diff --git a/src/test/regress/sql/multirangetypes.sql b/src/test/regress/sql/multirangetypes.sql
index 112334b03eb..e3f8cd6f4e3 100644
--- a/src/test/regress/sql/multirangetypes.sql
+++ b/src/test/regress/sql/multirangetypes.sql
@@ -904,3 +904,75 @@ create function mr_inoutparam_fail(inout i anyelement, out r anymultirange)
 --should fail
 create function mr_table_fail(i anyelement) returns table(i anyelement, r anymultirange)
   as $$ select $1, '[1,10]' $$ language sql;
+
+-- Restore GUCs changed by earlier index tests
+RESET enable_seqscan;
+RESET enable_indexscan;
+RESET enable_bitmapscan;
+
+--
+-- test selectivity of multirange join operators
+--
+create table test_mr_join_1 (mr1 int4multirange);
+create table test_mr_join_2 (mr2 int4multirange);
+create table test_mr_join_3 (mr3 int4multirange);
+
+insert into test_mr_join_1 select int4multirange(int4range(g, g+10)) from generate_series(1, 1000) g;
+insert into test_mr_join_1 select int4multirange(int4range(g, g+100)) from generate_series(1, 1000, 10) g;
+insert into test_mr_join_2 select int4multirange(int4range(g, g+10)) from generate_series(1, 500) g;
+insert into test_mr_join_2 select int4multirange(int4range(g, g+100)) from generate_series(1, 500, 10) g;
+insert into test_mr_join_3 select int4multirange(int4range(g, g+10)) from generate_series(501, 1000) g;
+insert into test_mr_join_3 select int4multirange(int4range(g, g+100)) from generate_series(501, 1000, 10) g;
+
+analyze test_mr_join_1;
+analyze test_mr_join_2;
+analyze test_mr_join_3;
+
+-- multirange vs multirange: reorder joins based on computed selectivity
+explain (costs off) select count(*) from test_mr_join_1, test_mr_join_2, test_mr_join_3 where mr1 && mr2 and mr2 && mr3;
+explain (costs off) select count(*) from test_mr_join_1, test_mr_join_2, test_mr_join_3 where mr1 << mr2 and mr2 << mr3;
+explain (costs off) select count(*) from test_mr_join_1, test_mr_join_2, test_mr_join_3 where mr1 >> mr2 and mr2 >> mr3;
+
+drop table test_mr_join_1;
+drop table test_mr_join_2;
+drop table test_mr_join_3;
+
+--
+-- test multirange join selectivity with fully disjoint histograms
+--
+create table test_mr_join_lo (r int4multirange);
+create table test_mr_join_hi (r int4multirange);
+
+insert into test_mr_join_lo select int4multirange(int4range(g, g+10)) from generate_series(1, 500) g;
+insert into test_mr_join_hi select int4multirange(int4range(g, g+10)) from generate_series(10001, 10500) g;
+
+analyze test_mr_join_lo;
+analyze test_mr_join_hi;
+
+-- These should not crash and should produce stable plans.
+explain (costs off) select count(*) from test_mr_join_lo a, test_mr_join_hi b where a.r << b.r;
+explain (costs off) select count(*) from test_mr_join_lo a, test_mr_join_hi b where a.r >> b.r;
+explain (costs off) select count(*) from test_mr_join_lo a, test_mr_join_hi b where a.r && b.r;
+
+drop table test_mr_join_lo;
+drop table test_mr_join_hi;
+
+--
+-- test range vs multirange join selectivity
+--
+create table test_mr_join_r (r int4range);
+create table test_mr_join_mr (mr int4multirange);
+
+insert into test_mr_join_r select int4range(g, g+10) from generate_series(1, 500) g;
+insert into test_mr_join_mr select int4multirange(int4range(g, g+10)) from generate_series(10001, 10500) g;
+
+analyze test_mr_join_r;
+analyze test_mr_join_mr;
+
+-- range vs multirange operators should use multirangejoinsel
+explain (costs off) select count(*) from test_mr_join_r a, test_mr_join_mr b where a.r << b.mr;
+explain (costs off) select count(*) from test_mr_join_r a, test_mr_join_mr b where a.r >> b.mr;
+explain (costs off) select count(*) from test_mr_join_r a, test_mr_join_mr b where a.r && b.mr;
+
+drop table test_mr_join_r;
+drop table test_mr_join_mr;
-- 
2.50.1 (Apple Git-155)
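
A reading aid for the disjoint-histogram tests in the patch above, not part of the patch itself. The following is a minimal sketch of how a "<<"-style join estimate could be derived from two equi-depth bound histograms. It assumes plain double boundaries in place of RangeBound, and the names hist_cdf and join_lt_estimate are hypothetical; multirangejoinsel may compute its estimate differently. The sketch averages the cumulative fraction of X over each bin of Y using the trapezoid rule.

```c
#include <assert.h>

/*
 * Fraction of values below v, given an equi-depth histogram with n
 * boundaries (n - 1 bins), using linear interpolation inside the bin.
 * Illustrative helper only; not a PostgreSQL function.
 */
double
hist_cdf(const double *hist, int n, double v)
{
	int			lower = -1;
	int			upper = n - 1;

	assert(n >= 2);

	/* Find the greatest index whose boundary is < v, or -1 if none. */
	while (lower < upper)
	{
		int			middle = (lower + upper + 1) / 2;

		if (hist[middle] < v)
			lower = middle;
		else
			upper = middle - 1;
	}

	if (lower < 0)
		return 0.0;				/* v is at or below the first boundary */
	if (lower >= n - 1)
		return 1.0;				/* v is above the last boundary */

	/* hist[lower] < v <= hist[lower + 1], so the bin width is positive. */
	return (lower + (v - hist[lower]) / (hist[lower + 1] - hist[lower]))
		/ (double) (n - 1);
}

/*
 * Estimate P(X < Y) for X described by histogram hx and Y by histogram hy.
 * Each bin of hy holds an equal share of Y; within a bin, the average of
 * hist_cdf is approximated by the mean of its two endpoint values.
 */
double
join_lt_estimate(const double *hx, int nx, const double *hy, int ny)
{
	double		sum = 0.0;

	for (int j = 0; j < ny - 1; j++)
		sum += 0.5 * (hist_cdf(hx, nx, hy[j]) + hist_cdf(hx, nx, hy[j + 1]));

	return sum / (double) (ny - 1);
}
```

With fully disjoint inputs such as {11, 136, 261, 386, 510} for X and {10001, 10126, 10251, 10376, 10500} for Y, this sketch yields 1.0 for X < Y and 0.0 for the reverse direction. Those are the extreme values that the test_mr_join_lo / test_mr_join_hi cases are meant to exercise.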



  [application/octet-stream] v6-0003-Remove-duplicate-selectivity-functions-between-ra.patch (33.7K, 5-v6-0003-Remove-duplicate-selectivity-functions-between-ra.patch)
From 9993f06b9ecf288f06954d6af86532585d269991 Mon Sep 17 00:00:00 2001
From: Maxime Schoemans <[email protected]>
Date: Mon, 13 Apr 2026 18:25:23 +0200
Subject: [PATCH v6 3/3] Remove duplicate selectivity functions between range
 and multirange

The multirange selectivity code duplicated 10 helper functions from
rangetypes_selfuncs.c. Since both range and multirange types use the
same histogram format (STATISTIC_KIND_BOUNDS_HISTOGRAM) and the same
RangeBound representation, the functions are identical.

Make the 10 shared functions non-static in rangetypes_selfuncs.c,
export them via selfuncs.h, and remove the copies from
multirangetypes_selfuncs.c.
---
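
Note for readers of the diff below: the two search helpers among the shared functions (rbound_bsearch and length_hist_bsearch) follow the same convention, which is why one copy can serve both files. A minimal sketch of that convention on plain doubles follows. The name hist_bsearch is hypothetical, and the array of doubles stands in for the RangeBound and Datum arrays used by the real functions.

```c
#include <stdbool.h>

/*
 * Return the greatest index i such that hist[i] < value, or
 * hist[i] <= value when 'equal' is true.  Return -1 if no element
 * qualifies.  'hist' must be sorted in ascending order.
 */
int
hist_bsearch(const double *hist, int n, double value, bool equal)
{
	int			lower = -1;
	int			upper = n - 1;

	while (lower < upper)
	{
		/* Round up so that the loop makes progress when upper == lower + 1. */
		int			middle = (lower + upper + 1) / 2;

		if (hist[middle] < value || (equal && hist[middle] <= value))
			lower = middle;
		else
			upper = middle - 1;
	}
	return lower;
}
```

For the sorted array {1, 2, 2, 3} and value 2, the sketch returns 0 with equal = false and 2 with equal = true; a value below every element gives -1.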
 .../utils/adt/multirangetypes_selfuncs.c      | 771 ------------------
 src/backend/utils/adt/rangetypes_selfuncs.c   |  45 +-
 src/include/utils/selfuncs.h                  |  36 +
 3 files changed, 47 insertions(+), 805 deletions(-)

diff --git a/src/backend/utils/adt/multirangetypes_selfuncs.c b/src/backend/utils/adt/multirangetypes_selfuncs.c
index 87f3db162a6..b558e6912e7 100644
--- a/src/backend/utils/adt/multirangetypes_selfuncs.c
+++ b/src/backend/utils/adt/multirangetypes_selfuncs.c
@@ -38,37 +38,6 @@ static double calc_hist_selectivity(TypeCacheEntry *typcache,
 									VariableStatData *vardata,
 									const MultirangeType *constval,
 									Oid operator);
-static double calc_hist_selectivity_scalar(TypeCacheEntry *typcache,
-										   const RangeBound *constbound,
-										   const RangeBound *hist,
-										   int hist_nvalues, bool equal);
-static int	rbound_bsearch(TypeCacheEntry *typcache, const RangeBound *value,
-						   const RangeBound *hist, int hist_length, bool equal);
-static float8 get_position(TypeCacheEntry *typcache, const RangeBound *value,
-						   const RangeBound *hist1, const RangeBound *hist2);
-static float8 get_len_position(double value, double hist1, double hist2);
-static float8 get_distance(TypeCacheEntry *typcache, const RangeBound *bound1,
-						   const RangeBound *bound2);
-static int	length_hist_bsearch(const Datum *length_hist_values,
-								int length_hist_nvalues, double value,
-								bool equal);
-static double calc_length_hist_frac(const Datum *length_hist_values,
-									int length_hist_nvalues, double length1,
-									double length2, bool equal);
-static double calc_hist_selectivity_contained(TypeCacheEntry *typcache,
-											  const RangeBound *lower,
-											  RangeBound *upper,
-											  const RangeBound *hist_lower,
-											  int hist_nvalues,
-											  const Datum *length_hist_values,
-											  int length_hist_nvalues);
-static double calc_hist_selectivity_contains(TypeCacheEntry *typcache,
-											 const RangeBound *lower,
-											 const RangeBound *upper,
-											 const RangeBound *hist_lower,
-											 int hist_nvalues,
-											 const Datum *length_hist_values,
-											 int length_hist_nvalues);
 
 /*
  * Returns a default selectivity estimate for given operator, when we don't
@@ -698,746 +667,6 @@ calc_hist_selectivity(TypeCacheEntry *typcache, VariableStatData *vardata,
 	return hist_selec;
 }
 
-
-/*
- * Look up the fraction of values less than (or equal, if 'equal' argument
- * is true) a given const in a histogram of range bounds.
- */
-static double
-calc_hist_selectivity_scalar(TypeCacheEntry *typcache, const RangeBound *constbound,
-							 const RangeBound *hist, int hist_nvalues, bool equal)
-{
-	Selectivity selec;
-	int			index;
-
-	/*
-	 * Find the histogram bin the given constant falls into. Estimate
-	 * selectivity as the number of preceding whole bins.
-	 */
-	index = rbound_bsearch(typcache, constbound, hist, hist_nvalues, equal);
-	selec = (Selectivity) (Max(index, 0)) / (Selectivity) (hist_nvalues - 1);
-
-	/* Adjust using linear interpolation within the bin */
-	if (index >= 0 && index < hist_nvalues - 1)
-		selec += get_position(typcache, constbound, &hist[index],
-							  &hist[index + 1]) / (Selectivity) (hist_nvalues - 1);
-
-	return selec;
-}
-
-/*
- * Binary search on an array of range bounds. Returns greatest index of range
- * bound in array which is less(less or equal) than given range bound. If all
- * range bounds in array are greater or equal(greater) than given range bound,
- * return -1. When "equal" flag is set conditions in brackets are used.
- *
- * This function is used in scalar operator selectivity estimation. Another
- * goal of this function is to find a histogram bin where to stop
- * interpolation of portion of bounds which are less than or equal to given bound.
- */
-static int
-rbound_bsearch(TypeCacheEntry *typcache, const RangeBound *value, const RangeBound *hist,
-			   int hist_length, bool equal)
-{
-	int			lower = -1,
-				upper = hist_length - 1,
-				cmp,
-				middle;
-
-	while (lower < upper)
-	{
-		middle = (lower + upper + 1) / 2;
-		cmp = range_cmp_bounds(typcache, &hist[middle], value);
-
-		if (cmp < 0 || (equal && cmp == 0))
-			lower = middle;
-		else
-			upper = middle - 1;
-	}
-	return lower;
-}
-
-
-/*
- * Binary search on length histogram. Returns greatest index of range length in
- * histogram which is less than (less than or equal) the given length value. If
- * all lengths in the histogram are greater than (greater than or equal) the
- * given length, returns -1.
- */
-static int
-length_hist_bsearch(const Datum *length_hist_values, int length_hist_nvalues,
-					double value, bool equal)
-{
-	int			lower = -1,
-				upper = length_hist_nvalues - 1,
-				middle;
-
-	while (lower < upper)
-	{
-		double		middleval;
-
-		middle = (lower + upper + 1) / 2;
-
-		middleval = DatumGetFloat8(length_hist_values[middle]);
-		if (middleval < value || (equal && middleval <= value))
-			lower = middle;
-		else
-			upper = middle - 1;
-	}
-	return lower;
-}
-
-/*
- * Get relative position of value in histogram bin in [0,1] range.
- */
-static float8
-get_position(TypeCacheEntry *typcache, const RangeBound *value, const RangeBound *hist1,
-			 const RangeBound *hist2)
-{
-	bool		has_subdiff = OidIsValid(typcache->rng_subdiff_finfo.fn_oid);
-	float8		position;
-
-	if (!hist1->infinite && !hist2->infinite)
-	{
-		float8		bin_width;
-
-		/*
-		 * Both bounds are finite. Assuming the subtype's comparison function
-		 * works sanely, the value must be finite, too, because it lies
-		 * somewhere between the bounds.  If it doesn't, arbitrarily return
-		 * 0.5.
-		 */
-		if (value->infinite)
-			return 0.5;
-
-		/* Can't interpolate without subdiff function */
-		if (!has_subdiff)
-			return 0.5;
-
-		/* Calculate relative position using subdiff function. */
-		bin_width = DatumGetFloat8(FunctionCall2Coll(&typcache->rng_subdiff_finfo,
-													 typcache->rng_collation,
-													 hist2->val,
-													 hist1->val));
-		if (isnan(bin_width) || bin_width <= 0.0)
-			return 0.5;			/* punt for NaN or zero-width bin */
-
-		position = DatumGetFloat8(FunctionCall2Coll(&typcache->rng_subdiff_finfo,
-													typcache->rng_collation,
-													value->val,
-													hist1->val))
-			/ bin_width;
-
-		if (isnan(position))
-			return 0.5;			/* punt for NaN from subdiff, Inf/Inf, etc */
-
-		/* Relative position must be in [0,1] range */
-		position = Max(position, 0.0);
-		position = Min(position, 1.0);
-		return position;
-	}
-	else if (hist1->infinite && !hist2->infinite)
-	{
-		/*
-		 * Lower bin boundary is -infinite, upper is finite. If the value is
-		 * -infinite, return 0.0 to indicate it's equal to the lower bound.
-		 * Otherwise return 1.0 to indicate it's infinitely far from the lower
-		 * bound.
-		 */
-		return ((value->infinite && value->lower) ? 0.0 : 1.0);
-	}
-	else if (!hist1->infinite && hist2->infinite)
-	{
-		/* same as above, but in reverse */
-		return ((value->infinite && !value->lower) ? 1.0 : 0.0);
-	}
-	else
-	{
-		/*
-		 * If both bin boundaries are infinite, they should be equal to each
-		 * other, and the value should also be infinite and equal to both
-		 * bounds. (But don't Assert that, to avoid crashing if a user creates
-		 * a datatype with a broken comparison function).
-		 *
-		 * Assume the value to lie in the middle of the infinite bounds.
-		 */
-		return 0.5;
-	}
-}
-
-
-/*
- * Get relative position of value in a length histogram bin in [0,1] range.
- */
-static double
-get_len_position(double value, double hist1, double hist2)
-{
-	if (!isinf(hist1) && !isinf(hist2))
-	{
-		/*
-		 * Both bounds are finite. The value should be finite too, because it
-		 * lies somewhere between the bounds. If it doesn't, just return
-		 * something.
-		 */
-		if (isinf(value))
-			return 0.5;
-
-		return 1.0 - (hist2 - value) / (hist2 - hist1);
-	}
-	else if (isinf(hist1) && !isinf(hist2))
-	{
-		/*
-		 * Lower bin boundary is -infinite, upper is finite. Return 1.0 to
-		 * indicate the value is infinitely far from the lower bound.
-		 */
-		return 1.0;
-	}
-	else if (isinf(hist1) && isinf(hist2))
-	{
-		/* same as above, but in reverse */
-		return 0.0;
-	}
-	else
-	{
-		/*
-		 * If both bin boundaries are infinite, they should be equal to each
-		 * other, and the value should also be infinite and equal to both
-		 * bounds. (But don't Assert that, to avoid crashing unnecessarily if
-		 * the caller messes up)
-		 *
-		 * Assume the value to lie in the middle of the infinite bounds.
-		 */
-		return 0.5;
-	}
-}
-
-/*
- * Measure distance between two range bounds.
- */
-static float8
-get_distance(TypeCacheEntry *typcache, const RangeBound *bound1, const RangeBound *bound2)
-{
-	bool		has_subdiff = OidIsValid(typcache->rng_subdiff_finfo.fn_oid);
-
-	if (!bound1->infinite && !bound2->infinite)
-	{
-		/*
-		 * Neither bound is infinite, use subdiff function or return default
-		 * value of 1.0 if no subdiff is available.
-		 */
-		if (has_subdiff)
-		{
-			float8		res;
-
-			res = DatumGetFloat8(FunctionCall2Coll(&typcache->rng_subdiff_finfo,
-												   typcache->rng_collation,
-												   bound2->val,
-												   bound1->val));
-			/* Reject possible NaN result, also negative result */
-			if (isnan(res) || res < 0.0)
-				return 1.0;
-			else
-				return res;
-		}
-		else
-			return 1.0;
-	}
-	else if (bound1->infinite && bound2->infinite)
-	{
-		/* Both bounds are infinite */
-		if (bound1->lower == bound2->lower)
-			return 0.0;
-		else
-			return get_float8_infinity();
-	}
-	else
-	{
-		/* One bound is infinite, the other is not */
-		return get_float8_infinity();
-	}
-}
-
-/*
- * Calculate the average of function P(x), in the interval [length1, length2],
- * where P(x) is the fraction of tuples with length < x (or length <= x if
- * 'equal' is true).
- */
-static double
-calc_length_hist_frac(const Datum *length_hist_values, int length_hist_nvalues,
-					  double length1, double length2, bool equal)
-{
-	double		frac;
-	double		A,
-				B,
-				PA,
-				PB;
-	double		pos;
-	int			i;
-	double		area;
-
-	Assert(length2 >= length1);
-
-	if (length2 < 0.0)
-		return 0.0;				/* shouldn't happen, but doesn't hurt to check */
-
-	/* All lengths in the table are <= infinite. */
-	if (isinf(length2) && equal)
-		return 1.0;
-
-	/*----------
-	 * The average of a function between A and B can be calculated by the
-	 * formula:
-	 *
-	 *			B
-	 *	  1		/
-	 * -------	| P(x)dx
-	 *	B - A	/
-	 *			A
-	 *
-	 * The geometrical interpretation of the integral is the area under the
-	 * graph of P(x). P(x) is defined by the length histogram. We calculate
-	 * the area in a piecewise fashion, iterating through the length histogram
-	 * bins. Each bin is a trapezoid:
-	 *
-	 *		 P(x2)
-	 *		  /|
-	 *		 / |
-	 * P(x1)/  |
-	 *	   |   |
-	 *	   |   |
-	 *	---+---+--
-	 *	   x1  x2
-	 *
-	 * where x1 and x2 are the boundaries of the current histogram, and P(x1)
-	 * and P(x1) are the cumulative fraction of tuples at the boundaries.
-	 *
-	 * The area of each trapezoid is 1/2 * (P(x2) + P(x1)) * (x2 - x1)
-	 *
-	 * The first bin contains the lower bound passed by the caller, so we
-	 * use linear interpolation between the previous and next histogram bin
-	 * boundary to calculate P(x1). Likewise for the last bin: we use linear
-	 * interpolation to calculate P(x2). For the bins in between, x1 and x2
-	 * lie on histogram bin boundaries, so P(x1) and P(x2) are simply:
-	 * P(x1) =	  (bin index) / (number of bins)
-	 * P(x2) = (bin index + 1 / (number of bins)
-	 */
-
-	/* First bin, the one that contains lower bound */
-	i = length_hist_bsearch(length_hist_values, length_hist_nvalues, length1, equal);
-	if (i >= length_hist_nvalues - 1)
-		return 1.0;
-
-	if (i < 0)
-	{
-		i = 0;
-		pos = 0.0;
-	}
-	else
-	{
-		/* interpolate length1's position in the bin */
-		pos = get_len_position(length1,
-							   DatumGetFloat8(length_hist_values[i]),
-							   DatumGetFloat8(length_hist_values[i + 1]));
-	}
-	PB = (((double) i) + pos) / (double) (length_hist_nvalues - 1);
-	B = length1;
-
-	/*
-	 * In the degenerate case that length1 == length2, simply return
-	 * P(length1). This is not merely an optimization: if length1 == length2,
-	 * we'd divide by zero later on.
-	 */
-	if (length2 == length1)
-		return PB;
-
-	/*
-	 * Loop through all the bins, until we hit the last bin, the one that
-	 * contains the upper bound. (if lower and upper bounds are in the same
-	 * bin, this falls out immediately)
-	 */
-	area = 0.0;
-	for (; i < length_hist_nvalues - 1; i++)
-	{
-		double		bin_upper = DatumGetFloat8(length_hist_values[i + 1]);
-
-		/* check if we've reached the last bin */
-		if (!(bin_upper < length2 || (equal && bin_upper <= length2)))
-			break;
-
-		/* the upper bound of previous bin is the lower bound of this bin */
-		A = B;
-		PA = PB;
-
-		B = bin_upper;
-		PB = (double) i / (double) (length_hist_nvalues - 1);
-
-		/*
-		 * Add the area of this trapezoid to the total. The point of the
-		 * if-check is to avoid NaN, in the corner case that PA == PB == 0,
-		 * and B - A == Inf. The area of a zero-height trapezoid (PA == PB ==
-		 * 0) is zero, regardless of the width (B - A).
-		 */
-		if (PA > 0 || PB > 0)
-			area += 0.5 * (PB + PA) * (B - A);
-	}
-
-	/* Last bin */
-	A = B;
-	PA = PB;
-
-	B = length2;				/* last bin ends at the query upper bound */
-	if (i >= length_hist_nvalues - 1)
-		pos = 0.0;
-	else
-	{
-		if (DatumGetFloat8(length_hist_values[i]) == DatumGetFloat8(length_hist_values[i + 1]))
-			pos = 0.0;
-		else
-			pos = get_len_position(length2,
-								   DatumGetFloat8(length_hist_values[i]),
-								   DatumGetFloat8(length_hist_values[i + 1]));
-	}
-	PB = (((double) i) + pos) / (double) (length_hist_nvalues - 1);
-
-	if (PA > 0 || PB > 0)
-		area += 0.5 * (PB + PA) * (B - A);
-
-	/*
-	 * Ok, we have calculated the area, ie. the integral. Divide by width to
-	 * get the requested average.
-	 *
-	 * Avoid NaN arising from infinite / infinite. This happens at least if
-	 * length2 is infinite. It's not clear what the correct value would be in
-	 * that case, so 0.5 seems as good as any value.
-	 */
-	if (isinf(area) && isinf(length2))
-		frac = 0.5;
-	else
-		frac = area / (length2 - length1);
-
-	return frac;
-}
-
-/*
- * Calculate selectivity of "var <@ const" operator, ie. estimate the fraction
- * of multiranges that fall within the constant lower and upper bounds. This uses
- * the histograms of range lower bounds and range lengths, on the assumption
- * that the range lengths are independent of the lower bounds.
- *
- * The caller has already checked that constant lower and upper bounds are
- * finite.
- */
-static double
-calc_hist_selectivity_contained(TypeCacheEntry *typcache,
-								const RangeBound *lower, RangeBound *upper,
-								const RangeBound *hist_lower, int hist_nvalues,
-								const Datum *length_hist_values, int length_hist_nvalues)
-{
-	int			i,
-				upper_index;
-	float8		prev_dist;
-	double		bin_width;
-	double		upper_bin_width;
-	double		sum_frac;
-
-	/*
-	 * Begin by finding the bin containing the upper bound, in the lower bound
-	 * histogram. Any range with a lower bound > constant upper bound can't
-	 * match, ie. there are no matches in bins greater than upper_index.
-	 */
-	upper->inclusive = !upper->inclusive;
-	upper->lower = true;
-	upper_index = rbound_bsearch(typcache, upper, hist_lower, hist_nvalues,
-								 false);
-
-	/*
-	 * If the upper bound value is below the histogram's lower limit, there
-	 * are no matches.
-	 */
-	if (upper_index < 0)
-		return 0.0;
-
-	/*
-	 * If the upper bound value is at or beyond the histogram's upper limit,
-	 * start our loop at the last actual bin, as though the upper bound were
-	 * within that bin; get_position will clamp its result to 1.0 anyway.
-	 * (This corresponds to assuming that the data population above the
-	 * histogram's upper limit is empty, exactly like what we just assumed for
-	 * the lower limit.)
-	 */
-	upper_index = Min(upper_index, hist_nvalues - 2);
-
-	/*
-	 * Calculate upper_bin_width, ie. the fraction of the (upper_index,
-	 * upper_index + 1) bin which is greater than upper bound of query range
-	 * using linear interpolation of subdiff function.
-	 */
-	upper_bin_width = get_position(typcache, upper,
-								   &hist_lower[upper_index],
-								   &hist_lower[upper_index + 1]);
-
-	/*
-	 * In the loop, dist and prev_dist are the distance of the "current" bin's
-	 * lower and upper bounds from the constant upper bound.
-	 *
-	 * bin_width represents the width of the current bin. Normally it is 1.0,
-	 * meaning a full width bin, but can be less in the corner cases: start
-	 * and end of the loop. We start with bin_width = upper_bin_width, because
-	 * we begin at the bin containing the upper bound.
-	 */
-	prev_dist = 0.0;
-	bin_width = upper_bin_width;
-
-	sum_frac = 0.0;
-	for (i = upper_index; i >= 0; i--)
-	{
-		double		dist;
-		double		length_hist_frac;
-		bool		final_bin = false;
-
-		/*
-		 * dist -- distance from upper bound of query range to lower bound of
-		 * the current bin in the lower bound histogram. Or to the lower bound
-		 * of the constant range, if this is the final bin, containing the
-		 * constant lower bound.
-		 */
-		if (range_cmp_bounds(typcache, &hist_lower[i], lower) < 0)
-		{
-			dist = get_distance(typcache, lower, upper);
-
-			/*
-			 * Subtract from bin_width the portion of this bin that we want to
-			 * ignore.
-			 */
-			bin_width -= get_position(typcache, lower, &hist_lower[i],
-									  &hist_lower[i + 1]);
-			if (bin_width < 0.0)
-				bin_width = 0.0;
-			final_bin = true;
-		}
-		else
-			dist = get_distance(typcache, &hist_lower[i], upper);
-
-		/*
-		 * Estimate the fraction of tuples in this bin that are narrow enough
-		 * to not exceed the distance to the upper bound of the query range.
-		 */
-		length_hist_frac = calc_length_hist_frac(length_hist_values,
-												 length_hist_nvalues,
-												 prev_dist, dist, true);
-
-		/*
-		 * Add the fraction of tuples in this bin, with a suitable length, to
-		 * the total.
-		 */
-		sum_frac += length_hist_frac * bin_width / (double) (hist_nvalues - 1);
-
-		if (final_bin)
-			break;
-
-		bin_width = 1.0;
-		prev_dist = dist;
-	}
-
-	return sum_frac;
-}
-
-/*
- * Calculate selectivity of "var @> const" operator, ie. estimate the fraction
- * of multiranges that contain the constant lower and upper bounds. This uses
- * the histograms of range lower bounds and range lengths, on the assumption
- * that the range lengths are independent of the lower bounds.
- */
-static double
-calc_hist_selectivity_contains(TypeCacheEntry *typcache,
-							   const RangeBound *lower, const RangeBound *upper,
-							   const RangeBound *hist_lower, int hist_nvalues,
-							   const Datum *length_hist_values, int length_hist_nvalues)
-{
-	int			i,
-				lower_index;
-	double		bin_width,
-				lower_bin_width;
-	double		sum_frac;
-	float8		prev_dist;
-
-	/* Find the bin containing the lower bound of query range. */
-	lower_index = rbound_bsearch(typcache, lower, hist_lower, hist_nvalues,
-								 true);
-
-	/*
-	 * If the lower bound value is below the histogram's lower limit, there
-	 * are no matches.
-	 */
-	if (lower_index < 0)
-		return 0.0;
-
-	/*
-	 * If the lower bound value is at or beyond the histogram's upper limit,
-	 * start our loop at the last actual bin, as though the upper bound were
-	 * within that bin; get_position will clamp its result to 1.0 anyway.
-	 * (This corresponds to assuming that the data population above the
-	 * histogram's upper limit is empty, exactly like what we just assumed for
-	 * the lower limit.)
-	 */
-	lower_index = Min(lower_index, hist_nvalues - 2);
-
-	/*
-	 * Calculate lower_bin_width, ie. the fraction of the of (lower_index,
-	 * lower_index + 1) bin which is greater than lower bound of query range
-	 * using linear interpolation of subdiff function.
-	 */
-	lower_bin_width = get_position(typcache, lower, &hist_lower[lower_index],
-								   &hist_lower[lower_index + 1]);
-
-	/*
-	 * Loop through all the lower bound bins, smaller than the query lower
-	 * bound. In the loop, dist and prev_dist are the distance of the
-	 * "current" bin's lower and upper bounds from the constant upper bound.
-	 * We begin from query lower bound, and walk backwards, so the first bin's
-	 * upper bound is the query lower bound, and its distance to the query
-	 * upper bound is the length of the query range.
-	 *
-	 * bin_width represents the width of the current bin. Normally it is 1.0,
-	 * meaning a full width bin, except for the first bin, which is only
-	 * counted up to the constant lower bound.
-	 */
-	prev_dist = get_distance(typcache, lower, upper);
-	sum_frac = 0.0;
-	bin_width = lower_bin_width;
-	for (i = lower_index; i >= 0; i--)
-	{
-		float8		dist;
-		double		length_hist_frac;
-
-		/*
-		 * dist -- distance from upper bound of query range to current value
-		 * of lower bound histogram or lower bound of query range (if we've
-		 * reach it).
-		 */
-		dist = get_distance(typcache, &hist_lower[i], upper);
-
-		/*
-		 * Get average fraction of length histogram which covers intervals
-		 * longer than (or equal to) distance to upper bound of query range.
-		 */
-		length_hist_frac =
-			1.0 - calc_length_hist_frac(length_hist_values,
-										length_hist_nvalues,
-										prev_dist, dist, false);
-
-		sum_frac += length_hist_frac * bin_width / (double) (hist_nvalues - 1);
-
-		bin_width = 1.0;
-		prev_dist = dist;
-	}
-
-	return sum_frac;
-}
-
-/*
- * Estimate join selectivity P(X < Y) using rangebound histograms.
- *
- * Based on: Diogo Repas, Zhicheng Luo, Maxime Schoemans, Mahmoud Sakr, 2022
- * "Selectivity Estimation of Inequality Joins In Databases"
- * https://doi.org/10.48550/arXiv.2206.07396
- *
- * hist1 and hist2 are arrays of RangeBound entries from the bounds histograms
- * of two range-typed or multirange-typed attributes X and Y, respectively.
- * Each array has at least 2 entries (one histogram bin).  The entries carry
- * full bound metadata (lower/upper flag, inclusive/exclusive), and all
- * comparisons use range_cmp_bounds() so that bound semantics are preserved.
- *
- * The algorithm models each attribute's distribution as a piecewise function
- * derived from its histogram, then computes:
- *   P(X < Y) = 0.5 * sum( (F_X(prev) + F_X(cur)) * (F_Y(cur) - F_Y(prev)) )
- * by parallel-scanning both histograms.
- *
- * The initial fast-forward loops skip histogram entries that fall entirely
- * before the other histogram's range, so the main loop only processes the
- * overlapping region.  Bounds checks are required because the histograms may
- * be completely disjoint (e.g., all of X is below all of Y).
- */
-static double
-calc_hist_join_selectivity(TypeCacheEntry *typcache,
-						   const RangeBound *hist1, int nhist1,
-						   const RangeBound *hist2, int nhist2)
-{
-	int			i,
-				j;
-	double		selectivity = 0.0;
-	double		prev_sel1 = -1.0;	/* negative sentinel skips first iter */
-	double		prev_sel2 = 0.0;
-
-	Assert(nhist1 > 1);
-	Assert(nhist2 > 1);
-
-	/*
-	 * Fast-forward past hist1 entries that are entirely below hist2[0], and
-	 * vice versa.  Bounds checks prevent out-of-bounds access when the
-	 * histograms are fully disjoint.
-	 */
-	for (i = 0; i < nhist1 &&
-		 range_cmp_bounds(typcache, &hist1[i], &hist2[0]) < 0; i++)
-		;
-	for (j = 0; j < nhist2 &&
-		 range_cmp_bounds(typcache, &hist2[j], &hist1[0]) < 0; j++)
-		;
-
-	/*
-	 * Handle fully-separated histograms.  When all bounds in hist1 are below
-	 * all bounds in hist2, P(X < Y) is ~1.0.  When all of hist2 is below
-	 * hist1, P(X < Y) is ~0.0.  We return immediately rather than falling
-	 * into the overlap walk with invalid indices.
-	 */
-	if (i >= nhist1)
-		return 1.0;
-	if (j >= nhist2)
-		return 0.0;
-
-	/* Walk the overlapping region of both histograms */
-	while (i < nhist1 && j < nhist2)
-	{
-		double		cur_sel1,
-					cur_sel2;
-		RangeBound	cur_sync;
-		int			cmp;
-
-		cmp = range_cmp_bounds(typcache, &hist1[i], &hist2[j]);
-		if (cmp < 0)
-			cur_sync = hist1[i++];
-		else if (cmp > 0)
-			cur_sync = hist2[j++];
-		else
-		{
-			/* Equal bounds: advance both */
-			cur_sync = hist1[i];
-			i++;
-			j++;
-		}
-		cur_sel1 = calc_hist_selectivity_scalar(typcache, &cur_sync,
-												hist1, nhist1, false);
-		cur_sel2 = calc_hist_selectivity_scalar(typcache, &cur_sync,
-												hist2, nhist2, false);
-
-		/* Skip the first iteration (no previous point yet) */
-		if (prev_sel1 >= 0)
-			selectivity += (prev_sel1 + cur_sel1) * (cur_sel2 - prev_sel2);
-
-		prev_sel1 = cur_sel1;
-		prev_sel2 = cur_sel2;
-	}
-
-	/* P(X < Y) = 0.5 * Sum(...) */
-	selectivity /= 2;
-
-	/* Include remainder of hist2 if hist1 was exhausted first */
-	if (j < nhist2)
-		selectivity += 1 - prev_sel2;
-
-	return selectivity;
-}
-
 /*
  * multirangejoinsel -- join selectivity for multirange operators
  *
diff --git a/src/backend/utils/adt/rangetypes_selfuncs.c b/src/backend/utils/adt/rangetypes_selfuncs.c
index 9f212e9d178..8cde336b55c 100644
--- a/src/backend/utils/adt/rangetypes_selfuncs.c
+++ b/src/backend/utils/adt/rangetypes_selfuncs.c
@@ -35,29 +35,6 @@ static double default_range_selectivity(Oid operator);
 static double calc_hist_selectivity(TypeCacheEntry *typcache,
 									VariableStatData *vardata, const RangeType *constval,
 									Oid operator);
-static double calc_hist_selectivity_scalar(TypeCacheEntry *typcache,
-										   const RangeBound *constbound,
-										   const RangeBound *hist, int hist_nvalues,
-										   bool equal);
-static int	rbound_bsearch(TypeCacheEntry *typcache, const RangeBound *value,
-						   const RangeBound *hist, int hist_length, bool equal);
-static float8 get_position(TypeCacheEntry *typcache, const RangeBound *value,
-						   const RangeBound *hist1, const RangeBound *hist2);
-static float8 get_len_position(double value, double hist1, double hist2);
-static float8 get_distance(TypeCacheEntry *typcache, const RangeBound *bound1,
-						   const RangeBound *bound2);
-static int	length_hist_bsearch(const Datum *length_hist_values,
-								int length_hist_nvalues, double value, bool equal);
-static double calc_length_hist_frac(const Datum *length_hist_values,
-									int length_hist_nvalues, double length1, double length2, bool equal);
-static double calc_hist_selectivity_contained(TypeCacheEntry *typcache,
-											  const RangeBound *lower, RangeBound *upper,
-											  const RangeBound *hist_lower, int hist_nvalues,
-											  const Datum *length_hist_values, int length_hist_nvalues);
-static double calc_hist_selectivity_contains(TypeCacheEntry *typcache,
-											 const RangeBound *lower, const RangeBound *upper,
-											 const RangeBound *hist_lower, int hist_nvalues,
-											 const Datum *length_hist_values, int length_hist_nvalues);
 
 /*
  * Returns a default selectivity estimate for given operator, when we don't
@@ -592,7 +569,7 @@ calc_hist_selectivity(TypeCacheEntry *typcache, VariableStatData *vardata,
  * Look up the fraction of values less than (or equal, if 'equal' argument
  * is true) a given const in a histogram of range bounds.
  */
-static double
+double
 calc_hist_selectivity_scalar(TypeCacheEntry *typcache, const RangeBound *constbound,
 							 const RangeBound *hist, int hist_nvalues, bool equal)
 {
@@ -624,7 +601,7 @@ calc_hist_selectivity_scalar(TypeCacheEntry *typcache, const RangeBound *constbo
  * goal of this function is to find a histogram bin where to stop
  * interpolation of portion of bounds which are less than or equal to given bound.
  */
-static int
+int
 rbound_bsearch(TypeCacheEntry *typcache, const RangeBound *value, const RangeBound *hist,
 			   int hist_length, bool equal)
 {
@@ -653,7 +630,7 @@ rbound_bsearch(TypeCacheEntry *typcache, const RangeBound *value, const RangeBou
  * all lengths in the histogram are greater than (greater than or equal) the
  * given length, returns -1.
  */
-static int
+int
 length_hist_bsearch(const Datum *length_hist_values, int length_hist_nvalues,
 					double value, bool equal)
 {
@@ -679,7 +656,7 @@ length_hist_bsearch(const Datum *length_hist_values, int length_hist_nvalues,
 /*
  * Get relative position of value in histogram bin in [0,1] range.
  */
-static float8
+float8
 get_position(TypeCacheEntry *typcache, const RangeBound *value, const RangeBound *hist1,
 			 const RangeBound *hist2)
 {
@@ -758,7 +735,7 @@ get_position(TypeCacheEntry *typcache, const RangeBound *value, const RangeBound
 /*
  * Get relative position of value in a length histogram bin in [0,1] range.
  */
-static double
+double
 get_len_position(double value, double hist1, double hist2)
 {
 	if (!isinf(hist1) && !isinf(hist2))
@@ -803,7 +780,7 @@ get_len_position(double value, double hist1, double hist2)
 /*
  * Measure distance between two range bounds.
  */
-static float8
+float8
 get_distance(TypeCacheEntry *typcache, const RangeBound *bound1, const RangeBound *bound2)
 {
 	bool		has_subdiff = OidIsValid(typcache->rng_subdiff_finfo.fn_oid);
@@ -851,7 +828,7 @@ get_distance(TypeCacheEntry *typcache, const RangeBound *bound1, const RangeBoun
  * where P(x) is the fraction of tuples with length < x (or length <= x if
  * 'equal' is true).
  */
-static double
+double
 calc_length_hist_frac(const Datum *length_hist_values, int length_hist_nvalues,
 					  double length1, double length2, bool equal)
 {
@@ -1014,7 +991,7 @@ calc_length_hist_frac(const Datum *length_hist_values, int length_hist_nvalues,
  * The caller has already checked that constant lower and upper bounds are
  * finite.
  */
-static double
+double
 calc_hist_selectivity_contained(TypeCacheEntry *typcache,
 								const RangeBound *lower, RangeBound *upper,
 								const RangeBound *hist_lower, int hist_nvalues,
@@ -1135,7 +1112,7 @@ calc_hist_selectivity_contained(TypeCacheEntry *typcache,
  * the histograms of range lower bounds and range lengths, on the assumption
  * that the range lengths are independent of the lower bounds.
  */
-static double
+double
 calc_hist_selectivity_contains(TypeCacheEntry *typcache,
 							   const RangeBound *lower, const RangeBound *upper,
 							   const RangeBound *hist_lower, int hist_nvalues,
@@ -1230,7 +1207,7 @@ calc_hist_selectivity_contains(TypeCacheEntry *typcache,
  * https://doi.org/10.48550/arXiv.2206.07396
  *
  * hist1 and hist2 are arrays of RangeBound entries from the bounds histograms
- * of two range-typed attributes X and Y, respectively.  Each array has at
+ * of two range- or multirange-typed attributes X and Y, respectively.  Each array has at
  * least 2 entries (one histogram bin).  The entries carry full bound metadata
  * (lower/upper flag, inclusive/exclusive), and all comparisons use
  * range_cmp_bounds() so that bound semantics are preserved.
@@ -1245,7 +1222,7 @@ calc_hist_selectivity_contains(TypeCacheEntry *typcache,
  * overlapping region.  Bounds checks are required because the histograms may
  * be completely disjoint (e.g., all of X is below all of Y).
  */
-static double
+double
 calc_hist_join_selectivity(TypeCacheEntry *typcache,
 						   const RangeBound *hist1, int nhist1,
 						   const RangeBound *hist2, int nhist2)
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 8d9fff95a19..1efab370efb 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -18,6 +18,7 @@
 #include "access/htup.h"
 #include "fmgr.h"
 #include "nodes/pathnodes.h"
+#include "utils/rangetypes.h"
 
 
 /*
@@ -248,6 +249,41 @@ extern void genericcostestimate(PlannerInfo *root, IndexPath *path,
 								double loop_count,
 								GenericCosts *costs);
 
+/* Functions in rangetypes_selfuncs.c */
+
+extern double calc_hist_selectivity_scalar(TypeCacheEntry *typcache,
+										   const RangeBound *constbound,
+										   const RangeBound *hist, int hist_nvalues,
+										   bool equal);
+extern int	rbound_bsearch(TypeCacheEntry *typcache,
+						   const RangeBound *value, const RangeBound *hist,
+						   int hist_length, bool equal);
+extern int	length_hist_bsearch(const Datum *length_hist_values,
+								int length_hist_nvalues,
+								double value, bool equal);
+extern float8 get_position(TypeCacheEntry *typcache,
+						   const RangeBound *value,
+						   const RangeBound *hist1, const RangeBound *hist2);
+extern double get_len_position(double value, double hist1, double hist2);
+extern float8 get_distance(TypeCacheEntry *typcache,
+						   const RangeBound *bound1, const RangeBound *bound2);
+extern double calc_length_hist_frac(const Datum *length_hist_values,
+									int length_hist_nvalues,
+									double length1, double length2, bool equal);
+extern double calc_hist_selectivity_contained(TypeCacheEntry *typcache,
+											  const RangeBound *lower, RangeBound *upper,
+											  const RangeBound *hist_lower, int hist_nvalues,
+											  const Datum *length_hist_values,
+											  int length_hist_nvalues);
+extern double calc_hist_selectivity_contains(TypeCacheEntry *typcache,
+											 const RangeBound *lower, const RangeBound *upper,
+											 const RangeBound *hist_lower, int hist_nvalues,
+											 const Datum *length_hist_values,
+											 int length_hist_nvalues);
+extern double calc_hist_join_selectivity(TypeCacheEntry *typcache,
+										 const RangeBound *hist1, int nhist1,
+										 const RangeBound *hist2, int nhist2);
+
 /* Functions in array_selfuncs.c */
 
 extern Selectivity scalararraysel_containment(PlannerInfo *root,
-- 
2.50.1 (Apple Git-155)
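
The comment on calc_hist_join_selectivity() in the patch above gives the
formula P(X < Y) = 0.5 * sum((F_X(prev) + F_X(cur)) * (F_Y(cur) - F_Y(prev))).
As a rough illustration only, the same trapezoidal sum can be sketched on
plain double-valued histograms. This is not the patch code: hist_cdf() and
join_sel_less() are hypothetical names, the histograms are assumed to be
equi-depth arrays of sorted bin boundaries, and none of the RangeBound
inclusive/exclusive semantics handled by range_cmp_bounds() are modelled.

```c
#include <assert.h>

/*
 * Fraction of values below v, read from an equi-depth histogram given as n
 * sorted bin boundaries (n >= 2), with linear interpolation inside a bin.
 */
double
hist_cdf(const double *h, int n, double v)
{
	int			i;

	assert(n >= 2);
	if (v <= h[0])
		return 0.0;
	if (v >= h[n - 1])
		return 1.0;
	for (i = 0; i < n - 1; i++)
	{
		/* Here h[i] <= v < h[i + 1], so the bin has non-zero width. */
		if (v < h[i + 1])
			return (i + (v - h[i]) / (h[i + 1] - h[i])) / (n - 1);
	}
	return 1.0;
}

/*
 * Trapezoidal estimate of P(X < Y): merge-walk the boundaries of both
 * histograms and accumulate (F_X(prev) + F_X(cur)) * (F_Y(cur) - F_Y(prev)).
 */
double
join_sel_less(const double *h1, int n1, const double *h2, int n2)
{
	int			i = 0,
				j = 0;
	double		sum = 0.0;
	double		prev1 = 0.0;
	double		prev2 = 0.0;

	while (i < n1 || j < n2)
	{
		double		v;
		double		cur1;
		double		cur2;

		/* Take the smaller of the two next boundaries as the sync point. */
		if (j >= n2 || (i < n1 && h1[i] <= h2[j]))
			v = h1[i++];
		else
			v = h2[j++];

		cur1 = hist_cdf(h1, n1, v);
		cur2 = hist_cdf(h2, n2, v);
		sum += (prev1 + cur1) * (cur2 - prev2);
		prev1 = cur1;
		prev2 = cur2;
	}

	return 0.5 * sum;
}
```

With boundaries {0, 1, 2} for X and {1, 2, 3} for Y, the sketch yields 0.875:
X exceeds Y only when both fall in [1, 2], which contributes
0.5 * 0.5 * 0.5 = 0.125 to P(X > Y).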




* Re: Implement missing join selectivity estimation for range types
  2022-06-30 14:31 Implement missing join selectivity estimation for range types Mahmoud Sakr <[email protected]>
  2024-01-05 10:37 ` Re: Implement missing join selectivity estimation for range types vignesh C <[email protected]>
  2024-01-05 17:39   ` Re: Implement missing join selectivity estimation for range types Schoemans Maxime <[email protected]>
  2024-01-17 10:48     ` Re: Implement missing join selectivity estimation for range types vignesh C <[email protected]>
  2024-01-22 08:10       ` Re: Implement missing join selectivity estimation for range types jian he <[email protected]>
  2026-04-06 23:32         ` Re: Implement missing join selectivity estimation for range types Haibo Yan <[email protected]>
  2026-04-14 14:03           ` Re: Implement missing join selectivity estimation for range types SCHOEMANS Maxime <[email protected]>
@ 2026-04-15 00:53             ` Haibo Yan <[email protected]>
  2026-04-15 15:13               ` Re: Implement missing join selectivity estimation for range types SCHOEMANS Maxime <[email protected]>
  0 siblings, 1 reply; 16+ messages in thread

From: Haibo Yan @ 2026-04-15 00:53 UTC (permalink / raw)
  To: SCHOEMANS Maxime <[email protected]>; +Cc: vignesh C <[email protected]>; Tom Lane <[email protected]>; Damir Belyalov <[email protected]>; jian he <[email protected]>; PostgreSQL Hackers <[email protected]>; SAKR Mahmoud <[email protected]>; Diogo Repas <[email protected]>; LUO Zhicheng <[email protected]>; Tomas Vondra <[email protected]>; Andrey Lepikhov <[email protected]>

On Tue, Apr 14, 2026 at 7:03 AM SCHOEMANS Maxime <[email protected]>
wrote:

> Hi Haibo,
>
> Thank you for picking this up again. I agree with the changes you made
> in v5, in particular scoping the patch to the three strict operators and
> reworking the tests to check plan structure rather than exact row counts.
>
> Attached is v6 as a 3-patch series building on your v5.
>
> Patch 1 is your range join selectivity patch with one small change: the
> range_cmp_bounds result in the merge walk is stored in a local cmp
> variable to avoid calling it twice per iteration, as jian he suggested.
>

Hi Maxime,
Thank you for working on this and for building on top of v5.

I think patch 1 looks good to me. I do not see major issues there. One
small note: the localized bool join_is_reversed; in my version was
intentional. I left it that way because get_join_variables() wants a
storage location, and I preferred to keep that use local and explicit
rather than trying to reshape things around it.

>
> Patch 2 adds the same estimation for multirange types, covering all type
> combinations (multirange x multirange, multirange x range, range x
> multirange). Since both range and multirange types use the same bound
> histogram format and the same RangeBound representation, the core
> algorithm is identical.
>
For patch 2, I am less convinced, especially for &&.

My concern is not so much whether the code runs, but whether the semantic
argument is strong enough for the mixed range/multirange cases. The patch
assumes that because range and multirange use the same bounds histogram
format and the same RangeBound representation, the same estimator can be
applied directly. I think that argument is much easier to make for << and >>
than for &&.

For single ranges, && works nicely with the usual decomposition because if
two single ranges do not overlap, then one must be entirely to the left or
entirely to the right of the other. But for multiranges there is a third
possibility: neither side is entirely left nor entirely right, and yet they
still do not overlap because of an internal gap.
For example:

A = {[1,2), [100,101)}

B = [50,60)

Here:

A << B is false
A >> B is false
A && B is also false

So for multiranges, “not left and not right” does not imply overlap in the
same way it does for single ranges. That makes me worry that reusing the
same estimator logic for &&, especially in mixed range/multirange cases,
may overestimate overlap because the overall lower/upper bounds do not
capture the internal holes.

So I think patch 2 still needs a stronger justification there, and probably
more targeted tests around sparse multiranges / hole cases, especially for
&&.
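
To make the example above concrete, here is a minimal standalone sketch,
assuming integer half-open spans and a multirange stored as a sorted array
of them. The Span type and the mr_* helpers are hypothetical and are not
PostgreSQL functions; they only mirror the semantics of <<, >> and && for a
multirange on the left and a single range on the right.

```c
#include <assert.h>
#include <stdbool.h>

/* Half-open integer span [lo, hi); a multirange is a sorted array of these. */
typedef struct Span
{
	int			lo;
	int			hi;
} Span;

/* a << b: the last span of a ends at or before b starts. */
bool
mr_strictly_left(const Span *a, int na, Span b)
{
	assert(na > 0);
	return a[na - 1].hi <= b.lo;
}

/* a >> b: the first span of a starts at or after b ends. */
bool
mr_strictly_right(const Span *a, int na, Span b)
{
	assert(na > 0);
	return a[0].lo >= b.hi;
}

/* a && b: at least one span of a shares a point with b. */
bool
mr_overlaps(const Span *a, int na, Span b)
{
	for (int i = 0; i < na; i++)
	{
		if (a[i].lo < b.hi && b.lo < a[i].hi)
			return true;
	}
	return false;
}
```

For A = {[1,2), [100,101)} and B = [50,60), all three helpers return false,
which is the third possibility that single ranges never have.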

>
> Patch 3 removes the duplication between rangetypes_selfuncs.c and
> multirangetypes_selfuncs.c that Tom raised as a concern. It makes the
> 10 shared helper functions non-static, exports them via selfuncs.h,
> and deletes the copies from the multirange file. This covers all the
> pre-existing duplication between the two files, not just the functions
> added in this patch set.
>
> Regards,
> Maxime
>
For patch 3, I agree with the motivation: the duplication between
rangetypes_selfuncs.c and multirangetypes_selfuncs.c is not ideal. But I am
not convinced that exporting those helpers via selfuncs.h is the right
boundary.

My preference would be something tighter:

   - keep the shared helper implementations in one place
   - add a backend-private internal header just for the range-family selfuncs
     code
   - include that internal header from rangetypes_selfuncs.c and
     multirangetypes_selfuncs.c
   - avoid widening visibility by turning a group of file-local helpers into
     broader extern declarations in selfuncs.h

In other words, I agree that the duplication should be removed, but I think
a backend-private internal header should be enough for that goal. I do not
think we need to expand visibility more than necessary by moving these
helpers out of the file-private space into a broader interface.

Thanks again for working on this.

Regards,

Haibo


* Re: Implement missing join selectivity estimation for range types
  2022-06-30 14:31 Implement missing join selectivity estimation for range types Mahmoud Sakr <[email protected]>
  2024-01-05 10:37 ` Re: Implement missing join selectivity estimation for range types vignesh C <[email protected]>
  2024-01-05 17:39   ` Re: Implement missing join selectivity estimation for range types Schoemans Maxime <[email protected]>
  2024-01-17 10:48     ` Re: Implement missing join selectivity estimation for range types vignesh C <[email protected]>
  2024-01-22 08:10       ` Re: Implement missing join selectivity estimation for range types jian he <[email protected]>
  2026-04-06 23:32         ` Re: Implement missing join selectivity estimation for range types Haibo Yan <[email protected]>
  2026-04-14 14:03           ` Re: Implement missing join selectivity estimation for range types SCHOEMANS Maxime <[email protected]>
  2026-04-15 00:53             ` Re: Implement missing join selectivity estimation for range types Haibo Yan <[email protected]>
@ 2026-04-15 15:13               ` SCHOEMANS Maxime <[email protected]>
  2026-04-16 04:12                 ` Re: Implement missing join selectivity estimation for range types Haibo Yan <[email protected]>
  0 siblings, 1 reply; 16+ messages in thread

From: SCHOEMANS Maxime @ 2026-04-15 15:13 UTC (permalink / raw)
  To: Haibo Yan <[email protected]>; +Cc: vignesh C <[email protected]>; Tom Lane <[email protected]>; Damir Belyalov <[email protected]>; jian he <[email protected]>; PostgreSQL Hackers <[email protected]>; SAKR Mahmoud <[email protected]>; Diogo Repas <[email protected]>; Andrey Lepikhov <[email protected]>

Hi Haibo,

Thank you for the review.

> One small note: the localized bool join_is_reversed; in my version was
> intentional. I left it that way because get_join_variables() wants a
> storage location, and I preferred to keep that use local and explicit
> rather than trying to reshape things around it.

Fair point. I moved it out of the bare block because it looked unusual,
but I can change it back if you prefer.

> For patch 2, I am less convinced, especially for &&.
> [...]
> for multiranges there is a third possibility: neither side is entirely
> left nor entirely right, and yet they still do not overlap because of
> an internal gap.

This is a valid concern, but it is an existing limitation of multirange
statistics, not something we are introducing. The existing restriction
selectivity code in multirangetypes_selfuncs.c already uses the same
NOT(<<) AND NOT(>>) decomposition for && on multiranges. And
multirange_typanalyze explicitly says:

    /* Treat multiranges like a big range without gaps. */

The statistics only store the outermost bounds, so the gap information
is already lost before our estimator sees it. The multirange GiST
opclass does the same (stores the bounding range). Our join estimator
is just consistent with how multiranges are handled elsewhere.

The alternative is falling back to the 0.005 default, which will almost
certainly be worse. Would a comment explaining the limitation be enough?
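
As a small illustration of that approximation (a sketch only, with a
hypothetical Span type and helper names, not the code in
multirangetypes_selfuncs.c): once a multirange is reduced to its outer
bounds, evaluating && as NOT(<<) AND NOT(>>) reports an overlap for any
range that falls inside an internal gap.

```c
#include <assert.h>
#include <stdbool.h>

/* Half-open integer span [lo, hi). */
typedef struct Span
{
	int			lo;
	int			hi;
} Span;

/* Outer bounds of a sorted multirange: the only shape the stats retain. */
Span
mr_bounding_span(const Span *a, int na)
{
	Span		s;

	assert(na > 0);
	s.lo = a[0].lo;
	s.hi = a[na - 1].hi;
	return s;
}

/* Overlap decided as NOT(<<) AND NOT(>>), using outer bounds only. */
bool
overlaps_by_outer_bounds(Span a, Span b)
{
	bool		strictly_left = (a.hi <= b.lo);
	bool		strictly_right = (a.lo >= b.hi);

	return !strictly_left && !strictly_right;
}
```

For A = {[1,2), [100,101)} the bounding span is [1,101), so
overlaps_by_outer_bounds() returns true against B = [50,60) even though the
real A && B is false. That is the overestimate a comment would need to
describe.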

> For patch 3, I agree with the motivation [...] But I am not convinced
> that exporting those helpers via selfuncs.h is the right boundary.
> My preference would be something tighter: [...] a backend-private
> internal header just for the range-family selfuncs code

Good point about the visibility. I'll move the declarations to a
separate backend-private header in the next version.

Regards,
Maxime



* Re: Implement missing join selectivity estimation for range types
  2022-06-30 14:31 Implement missing join selectivity estimation for range types Mahmoud Sakr <[email protected]>
  2024-01-05 10:37 ` Re: Implement missing join selectivity estimation for range types vignesh C <[email protected]>
  2024-01-05 17:39   ` Re: Implement missing join selectivity estimation for range types Schoemans Maxime <[email protected]>
  2024-01-17 10:48     ` Re: Implement missing join selectivity estimation for range types vignesh C <[email protected]>
  2024-01-22 08:10       ` Re: Implement missing join selectivity estimation for range types jian he <[email protected]>
  2026-04-06 23:32         ` Re: Implement missing join selectivity estimation for range types Haibo Yan <[email protected]>
  2026-04-14 14:03           ` Re: Implement missing join selectivity estimation for range types SCHOEMANS Maxime <[email protected]>
  2026-04-15 00:53             ` Re: Implement missing join selectivity estimation for range types Haibo Yan <[email protected]>
  2026-04-15 15:13               ` Re: Implement missing join selectivity estimation for range types SCHOEMANS Maxime <[email protected]>
@ 2026-04-16 04:12                 ` Haibo Yan <[email protected]>
  2026-04-16 15:12                   ` Re: Implement missing join selectivity estimation for range types SCHOEMANS Maxime <[email protected]>
  0 siblings, 1 reply; 16+ messages in thread

From: Haibo Yan @ 2026-04-16 04:12 UTC (permalink / raw)
  To: SCHOEMANS Maxime <[email protected]>; +Cc: vignesh C <[email protected]>; Tom Lane <[email protected]>; Damir Belyalov <[email protected]>; jian he <[email protected]>; PostgreSQL Hackers <[email protected]>; SAKR Mahmoud <[email protected]>; Diogo Repas <[email protected]>; Andrey Lepikhov <[email protected]>

On Wed, Apr 15, 2026 at 8:13 AM SCHOEMANS Maxime <[email protected]>
wrote:

> Hi Haibo,
>
> Thank you for the review.
>
> > One small note: the localized bool join_is_reversed; in my version was
> > intentional. I left it that way because get_join_variables() wants a
> > storage location, and I preferred to keep that use local and explicit
> > rather than trying to reshape things around it.
>
> Fair point. I moved it out of the bare block because it looked unusual,
> but I can change it back if you prefer.
>
> > For patch 2, I am less convinced, especially for &&.
> > [...]
> > for multiranges there is a third possibility: neither side is entirely
> > left nor entirely right, and yet they still do not overlap because of
> > an internal gap.
>
> This is a valid concern, but it is an existing limitation of multirange
> statistics, not something we are introducing. The existing restriction
> selectivity code in multirangetypes_selfuncs.c already uses the same
> NOT(<<) AND NOT(>>) decomposition for && on multiranges. And
> multirange_typanalyze explicitly says:
>
>     /* Treat multiranges like a big range without gaps. */
>
> The statistics only store the outermost bounds, so the gap information
> is already lost before our estimator sees it. The multirange GiST
> opclass does the same (stores the bounding range). Our join estimator
> is just consistent with how multiranges are handled elsewhere.
>
> The alternative is falling back to the 0.005 default, which will almost
> certainly be worse. Would a comment explaining the limitation be enough?
>
Thanks, that is a fair point.

I agree that this is not something patch 2 is uniquely introducing. If the
existing multirange statistics and restriction selectivity already treat a
multirange essentially as its outer bounds, then it makes sense that the
join estimator can only work within that same approximation.

So I am less worried about this as a correctness objection than I was at
first. My main concern is really about making that limitation explicit,
especially for &&, where internal gaps can matter a lot for the real
overlap semantics.

I think it would help if patch 2 said this a bit more directly, both in
the code comments and in the patch description. Something along the lines
of:

   - this reuses the same outer-bounds approximation already used by
     existing multirange statistics / restriction selectivity
   - internal gaps are not represented in the available stats
   - so && for sparse multiranges may still be overestimated in some cases
   - but this is still expected to be better than falling back to a fixed
     default selectivity

> > For patch 3, I agree with the motivation [...] But I am not convinced
> > that exporting those helpers via selfuncs.h is the right boundary.
> > My preference would be something tighter: [...] a backend-private
> > internal header just for the range-family selfuncs code
>
> Good point about the visibility. I'll move the declarations to a
> separate backend-private header in the next version.
>
> Regards,
> Maxime
>

If you are willing to add that clarification, I think that would address
most of my concern here.

Regards,
Haibo



* Re: Implement missing join selectivity estimation for range types
  2022-06-30 14:31 Implement missing join selectivity estimation for range types Mahmoud Sakr <[email protected]>
  2024-01-05 10:37 ` Re: Implement missing join selectivity estimation for range types vignesh C <[email protected]>
  2024-01-05 17:39   ` Re: Implement missing join selectivity estimation for range types Schoemans Maxime <[email protected]>
  2024-01-17 10:48     ` Re: Implement missing join selectivity estimation for range types vignesh C <[email protected]>
  2024-01-22 08:10       ` Re: Implement missing join selectivity estimation for range types jian he <[email protected]>
  2026-04-06 23:32         ` Re: Implement missing join selectivity estimation for range types Haibo Yan <[email protected]>
  2026-04-14 14:03           ` Re: Implement missing join selectivity estimation for range types SCHOEMANS Maxime <[email protected]>
  2026-04-15 00:53             ` Re: Implement missing join selectivity estimation for range types Haibo Yan <[email protected]>
  2026-04-15 15:13               ` Re: Implement missing join selectivity estimation for range types SCHOEMANS Maxime <[email protected]>
  2026-04-16 04:12                 ` Re: Implement missing join selectivity estimation for range types Haibo Yan <[email protected]>
@ 2026-04-16 15:12                   ` SCHOEMANS Maxime <[email protected]>
  2026-04-18 04:02                     ` Re: Implement missing join selectivity estimation for range types Haibo Yan <[email protected]>
  0 siblings, 1 reply; 16+ messages in thread

From: SCHOEMANS Maxime @ 2026-04-16 15:12 UTC (permalink / raw)
  To: Haibo Yan <[email protected]>; +Cc: vignesh C <[email protected]>; Tom Lane <[email protected]>; Damir Belyalov <[email protected]>; jian he <[email protected]>; PostgreSQL Hackers <[email protected]>; SAKR Mahmoud <[email protected]>; Diogo Repas <[email protected]>; Andrey Lepikhov <[email protected]>

Hi Haibo,

Attached is v7 with the changes we discussed.

Patch 2 now has an inline comment on the && case explaining the
outer-bounds approximation and its consistency with existing restriction
selectivity. The commit message mentions it as well.
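
To illustrate the outer-bounds approximation described above, here is a
minimal standalone sketch. It is not code from the patch: Span,
outer_bounds() and multi_overlaps() are hypothetical names, and half-open
integer intervals stand in for real range bounds. It shows that testing
only the outermost bounds can report an overlap where the exact test
finds none.

```c
#include <assert.h>
#include <stdbool.h>

/* Half-open integer interval [lo, hi); a stand-in for a non-empty range. */
typedef struct
{
	int			lo;
	int			hi;
} Span;

/* A << B iff upper(A) < lower(B); with half-open ints this is a.hi <= b.lo. */
static bool
span_left_of(Span a, Span b)
{
	return a.hi <= b.lo;
}

/* A && B iff NOT(A << B) AND NOT(A >> B). */
static bool
span_overlaps(Span a, Span b)
{
	return !span_left_of(a, b) && !span_left_of(b, a);
}

/* Outermost bounds of a sorted, non-empty list of disjoint spans. */
static Span
outer_bounds(const Span *parts, int n)
{
	Span		box;

	assert(n > 0);
	box.lo = parts[0].lo;
	box.hi = parts[n - 1].hi;
	return box;
}

/* Exact overlap test: true if any pair of member spans overlaps. */
static bool
multi_overlaps(const Span *a, int na, const Span *b, int nb)
{
	for (int i = 0; i < na; i++)
		for (int j = 0; j < nb; j++)
			if (span_overlaps(a[i], b[j]))
				return true;
	return false;
}
```

With A = {[1,3), [10,12)} and B = {[5,7)}, the outer bounds of A are
[1,12), which overlap [5,7), while multi_overlaps() returns false because
B falls into the gap of A. Statistics built from outermost bounds can
only see the first answer.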

Patch 3 uses a separate backend-private header (rangetypes_selfuncs.h)
instead of selfuncs.h.

Regards,
Maxime


Attachments:

  [application/octet-stream] v7-0001-Improve-range-join-selectivity-estimation-for.patch (20.7K, 3-v7-0001-Improve-range-join-selectivity-estimation-for.patch)
From d19997bb7587076cc1e3fc0c81a1a06655f93bb5 Mon Sep 17 00:00:00 2001
From: Maxime Schoemans <[email protected]>
Date: Mon, 13 Apr 2026 16:04:23 +0200
Subject: [PATCH v7 1/3] Improve range join selectivity estimation for <<, >>,
 &&

Teach rangejoinsel to estimate join selectivity for range operators
using bound histogram statistics instead of falling back to fixed
defaults. The estimation is based on a trapezoidal approximation of
P(X < Y) by parallel-scanning the bound histograms of both sides.

This improves planner row estimates especially when the two range
columns have clearly separated or strongly overlapping distributions.

Regression tests cover plan changes for representative range join cases.

Based on: Repas, Luo, Schoemans, Sakr (2022) "Selectivity Estimation
of Inequality Joins In Databases"
https://doi.org/10.48550/arXiv.2206.07396
---
 src/backend/utils/adt/rangetypes_selfuncs.c | 300 ++++++++++++++++++++
 src/include/catalog/pg_operator.dat         |   6 +-
 src/include/catalog/pg_proc.dat             |   4 +
 src/test/regress/expected/rangetypes.out    | 114 ++++++++
 src/test/regress/sql/rangetypes.sql         |  53 ++++
 5 files changed, 474 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/rangetypes_selfuncs.c b/src/backend/utils/adt/rangetypes_selfuncs.c
index 75f1e7567d5..cc702f28610 100644
--- a/src/backend/utils/adt/rangetypes_selfuncs.c
+++ b/src/backend/utils/adt/rangetypes_selfuncs.c
@@ -1221,3 +1221,303 @@ calc_hist_selectivity_contains(TypeCacheEntry *typcache,
 
 	return sum_frac;
 }
+
+/*
+ * Estimate join selectivity P(X < Y) using rangebound histograms.
+ *
+ * Based on: Diogo Repas, Zhicheng Luo, Maxime Schoemans, Mahmoud Sakr, 2022
+ * "Selectivity Estimation of Inequality Joins In Databases"
+ * https://doi.org/10.48550/arXiv.2206.07396
+ *
+ * hist1 and hist2 are arrays of RangeBound entries from the bounds histograms
+ * of two range-typed attributes X and Y, respectively.  Each array has at
+ * least 2 entries (one histogram bin).  The entries carry full bound metadata
+ * (lower/upper flag, inclusive/exclusive), and all comparisons use
+ * range_cmp_bounds() so that bound semantics are preserved.
+ *
+ * The algorithm models each attribute's distribution as a piecewise function
+ * derived from its histogram, then computes:
+ *   P(X < Y) = 0.5 * sum( (F_X(prev) + F_X(cur)) * (F_Y(cur) - F_Y(prev)) )
+ * by parallel-scanning both histograms.
+ *
+ * The initial fast-forward loops skip histogram entries that fall entirely
+ * before the other histogram's range, so the main loop only processes the
+ * overlapping region.  Bounds checks are required because the histograms may
+ * be completely disjoint (e.g., all of X is below all of Y).
+ */
+static double
+calc_hist_join_selectivity(TypeCacheEntry *typcache,
+						   const RangeBound *hist1, int nhist1,
+						   const RangeBound *hist2, int nhist2)
+{
+	int			i,
+				j;
+	double		selectivity = 0.0;
+	double		prev_sel1 = -1.0;	/* negative sentinel skips first iter */
+	double		prev_sel2 = 0.0;
+
+	Assert(nhist1 > 1);
+	Assert(nhist2 > 1);
+
+	/*
+	 * Fast-forward past hist1 entries that are entirely below hist2[0], and
+	 * vice versa.  Bounds checks prevent out-of-bounds access when the
+	 * histograms are fully disjoint.
+	 */
+	for (i = 0; i < nhist1 &&
+		 range_cmp_bounds(typcache, &hist1[i], &hist2[0]) < 0; i++)
+		;
+	for (j = 0; j < nhist2 &&
+		 range_cmp_bounds(typcache, &hist2[j], &hist1[0]) < 0; j++)
+		;
+
+	/*
+	 * Handle fully-separated histograms.  When all bounds in hist1 are below
+	 * all bounds in hist2, P(X < Y) is ~1.0.  When all of hist2 is below
+	 * hist1, P(X < Y) is ~0.0.  We return immediately rather than falling
+	 * into the overlap walk with invalid indices.
+	 */
+	if (i >= nhist1)
+		return 1.0;
+	if (j >= nhist2)
+		return 0.0;
+
+	/* Walk the overlapping region of both histograms */
+	while (i < nhist1 && j < nhist2)
+	{
+		double		cur_sel1,
+					cur_sel2;
+		RangeBound	cur_sync;
+		int			cmp;
+
+		cmp = range_cmp_bounds(typcache, &hist1[i], &hist2[j]);
+		if (cmp < 0)
+			cur_sync = hist1[i++];
+		else if (cmp > 0)
+			cur_sync = hist2[j++];
+		else
+		{
+			/* Equal bounds: advance both */
+			cur_sync = hist1[i];
+			i++;
+			j++;
+		}
+		cur_sel1 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist1, nhist1, false);
+		cur_sel2 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist2, nhist2, false);
+
+		/* Skip the first iteration (no previous point yet) */
+		if (prev_sel1 >= 0)
+			selectivity += (prev_sel1 + cur_sel1) * (cur_sel2 - prev_sel2);
+
+		prev_sel1 = cur_sel1;
+		prev_sel2 = cur_sel2;
+	}
+
+	/* P(X < Y) = 0.5 * Sum(...) */
+	selectivity /= 2;
+
+	/* Include remainder of hist2 if hist1 was exhausted first */
+	if (j < nhist2)
+		selectivity += 1 - prev_sel2;
+
+	return selectivity;
+}
+
+/*
+ * rangejoinsel -- join selectivity for range-vs-range operators
+ *
+ * Supports: <<, >>, &&
+ * These operators map directly to strict bound comparisons P(X < Y),
+ * which calc_hist_join_selectivity() estimates from bound histograms.
+ * Other range operators are left to their existing generic estimators.
+ */
+Datum
+rangejoinsel(PG_FUNCTION_ARGS)
+{
+	PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+	Oid			operator = PG_GETARG_OID(1);
+	List	   *args = (List *) PG_GETARG_POINTER(2);
+	SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) PG_GETARG_POINTER(4);
+	VariableStatData vardata1;
+	VariableStatData vardata2;
+	Selectivity selec;
+	AttStatsSlot hist1;
+	AttStatsSlot hist2;
+	AttStatsSlot sslot;
+	bool		have_hist1 = false;
+	bool		have_hist2 = false;
+	TypeCacheEntry *typcache;
+	Form_pg_statistic stats1;
+	Form_pg_statistic stats2;
+	double		empty_frac1;
+	double		empty_frac2;
+	double		null_frac1;
+	double		null_frac2;
+	int			nhist1;
+	int			nhist2;
+	RangeBound *hist1_lower;
+	RangeBound *hist1_upper;
+	RangeBound *hist2_lower;
+	RangeBound *hist2_upper;
+	bool		join_is_reversed;
+	bool		empty;
+	int			i;
+
+	get_join_variables(root, args, sjinfo, &vardata1, &vardata2,
+					   &join_is_reversed);
+
+	selec = default_range_selectivity(operator);
+
+	/*
+	 * Acquire histogram stats for both sides.  Each slot is tracked
+	 * independently so we can release exactly what was acquired on any
+	 * failure path.
+	 */
+	if (!HeapTupleIsValid(vardata1.statsTuple) ||
+		!HeapTupleIsValid(vardata2.statsTuple))
+		goto cleanup;
+
+	if (vardata1.vartype != vardata2.vartype)
+		goto cleanup;
+
+	memset(&hist1, 0, sizeof(hist1));
+	memset(&hist2, 0, sizeof(hist2));
+
+	if (!get_attstatsslot(&hist1, vardata1.statsTuple,
+						  STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						  ATTSTATSSLOT_VALUES))
+		goto cleanup;
+	have_hist1 = true;
+
+	if (!get_attstatsslot(&hist2, vardata2.statsTuple,
+						  STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						  ATTSTATSSLOT_VALUES))
+		goto cleanup;
+	have_hist2 = true;
+
+	/* Initialize type cache */
+	typcache = range_get_typcache(fcinfo, vardata1.vartype);
+
+	/* Look up NULL and empty-range fractions */
+	stats1 = (Form_pg_statistic) GETSTRUCT(vardata1.statsTuple);
+	stats2 = (Form_pg_statistic) GETSTRUCT(vardata2.statsTuple);
+
+	null_frac1 = stats1->stanullfrac;
+	null_frac2 = stats2->stanullfrac;
+
+	/* Try to get fraction of empty ranges for the first variable */
+	if (get_attstatsslot(&sslot, vardata1.statsTuple,
+						 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+						 InvalidOid, ATTSTATSSLOT_NUMBERS))
+	{
+		if (sslot.nnumbers != 1)
+			elog(ERROR, "invalid empty fraction statistic");
+		empty_frac1 = sslot.numbers[0];
+		free_attstatsslot(&sslot);
+	}
+	else
+	{
+		empty_frac1 = 0.0;
+	}
+
+	/* Try to get fraction of empty ranges for the second variable */
+	if (get_attstatsslot(&sslot, vardata2.statsTuple,
+						 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+						 InvalidOid, ATTSTATSSLOT_NUMBERS))
+	{
+		if (sslot.nnumbers != 1)
+			elog(ERROR, "invalid empty fraction statistic");
+		empty_frac2 = sslot.numbers[0];
+		free_attstatsslot(&sslot);
+	}
+	else
+	{
+		empty_frac2 = 0.0;
+	}
+
+	/* Convert range histograms to separate lower/upper bound arrays */
+	nhist1 = hist1.nvalues;
+	hist1_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+	hist1_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+	for (i = 0; i < nhist1; i++)
+	{
+		range_deserialize(typcache, DatumGetRangeTypeP(hist1.values[i]),
+						  &hist1_lower[i], &hist1_upper[i], &empty);
+		if (empty)
+			elog(ERROR, "bounds histogram contains an empty range");
+	}
+
+	nhist2 = hist2.nvalues;
+	hist2_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+	hist2_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+	for (i = 0; i < nhist2; i++)
+	{
+		range_deserialize(typcache, DatumGetRangeTypeP(hist2.values[i]),
+						  &hist2_lower[i], &hist2_upper[i], &empty);
+		if (empty)
+			elog(ERROR, "bounds histogram contains an empty range");
+	}
+
+	/* Estimate selectivity based on the operator */
+	switch (operator)
+	{
+		case OID_RANGE_OVERLAP_OP:
+
+			/*
+			 * A && B iff NOT(A << B) AND NOT(A >> B), so P(A && B) = 1 -
+			 * P(A.upper < B.lower) - P(B.upper < A.lower)
+			 */
+			selec = 1;
+			selec -= calc_hist_join_selectivity(typcache,
+												hist1_upper, nhist1,
+												hist2_lower, nhist2);
+			selec -= calc_hist_join_selectivity(typcache,
+												hist2_upper, nhist2,
+												hist1_lower, nhist1);
+			break;
+
+		case OID_RANGE_LEFT_OP:
+			/* A << B iff upper(A) < lower(B) */
+			selec = calc_hist_join_selectivity(typcache,
+											   hist1_upper, nhist1,
+											   hist2_lower, nhist2);
+			break;
+
+		case OID_RANGE_RIGHT_OP:
+			/* A >> B iff upper(B) < lower(A) */
+			selec = calc_hist_join_selectivity(typcache,
+											   hist2_upper, nhist2,
+											   hist1_lower, nhist1);
+			break;
+
+		default:
+			/* Unsupported operator; keep the default selectivity */
+			goto cleanup;
+	}
+
+	/* The histogram-based selectivity applies to non-empty ranges only */
+	selec *= (1 - empty_frac1) * (1 - empty_frac2);
+
+	/*
+	 * For the supported operators (<<, >>, &&), empty ranges always produce
+	 * false, so nothing has to be added back for the empty fractions.
+	 */
+
+	/* All range operators are strict */
+	selec *= (1 - null_frac1) * (1 - null_frac2);
+
+cleanup:
+	if (have_hist2)
+		free_attstatsslot(&hist2);
+	if (have_hist1)
+		free_attstatsslot(&hist1);
+
+	ReleaseVariableStats(vardata1);
+	ReleaseVariableStats(vardata2);
+
+	CLAMP_PROBABILITY(selec);
+
+	PG_RETURN_FLOAT8((float8) selec);
+}
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 1465f13120a..5ea4434f9fa 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -3094,7 +3094,7 @@
   oprname => '&&', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '&&(anyrange,anyrange)',
   oprcode => 'range_overlaps', oprrest => 'rangesel',
-  oprjoin => 'areajoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3889', oid_symbol => 'OID_RANGE_CONTAINS_ELEM_OP',
   descr => 'contains',
   oprname => '@>', oprleft => 'anyrange', oprright => 'anyelement',
@@ -3122,12 +3122,12 @@
   oprname => '<<', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '>>(anyrange,anyrange)',
   oprcode => 'range_before', oprrest => 'rangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3894', oid_symbol => 'OID_RANGE_RIGHT_OP', descr => 'is right of',
   oprname => '>>', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '<<(anyrange,anyrange)',
   oprcode => 'range_after', oprrest => 'rangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3895', oid_symbol => 'OID_RANGE_OVERLAPS_LEFT_OP',
   descr => 'overlaps or is left of',
   oprname => '&<', oprleft => 'anyrange', oprright => 'anyrange',
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 99fa9a6ede2..c6a707acae4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12919,4 +12919,8 @@
   proname => 'hashoid8extended', prorettype => 'int8',
   proargtypes => 'oid8 int8', prosrc => 'hashoid8extended' },
 
+{ oid => '8355', descr => 'join selectivity for range operators',
+  proname => 'rangejoinsel', provolatile => 's', prorettype => 'float8',
+  proargtypes => 'internal oid internal int2 internal',
+  prosrc => 'rangejoinsel' },
 ]
diff --git a/src/test/regress/expected/rangetypes.out b/src/test/regress/expected/rangetypes.out
index e062a4e5c2c..2fc5b770f90 100644
--- a/src/test/regress/expected/rangetypes.out
+++ b/src/test/regress/expected/rangetypes.out
@@ -2033,3 +2033,117 @@ select * from text_support_test where t <@ textrange_supp('a', 'd');
 
 drop table text_support_test;
 drop type textrange_supp;
+--
+-- test selectivity of range join operators
+--
+create table test_range_join_1 (ir1 int4range);
+create table test_range_join_2 (ir2 int4range);
+create table test_range_join_3 (ir3 int4range);
+insert into test_range_join_1 select int4range(g, g+10) from generate_series(1, 1000) g;
+insert into test_range_join_1 select int4range(g, g+100) from generate_series(1, 1000, 10) g;
+insert into test_range_join_2 select int4range(g, g+10) from generate_series(1, 500) g;
+insert into test_range_join_2 select int4range(g, g+100) from generate_series(1, 500, 10) g;
+insert into test_range_join_3 select int4range(g, g+10) from generate_series(501, 1000) g;
+insert into test_range_join_3 select int4range(g, g+100) from generate_series(501, 1000, 10) g;
+analyze test_range_join_1;
+analyze test_range_join_2;
+analyze test_range_join_3;
+-- reorder joins based on computed selectivity
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 && ir2 and ir2 && ir3;
+                                    QUERY PLAN                                     
+-----------------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_range_join_1.ir1 && test_range_join_2.ir2)
+         ->  Seq Scan on test_range_join_1
+         ->  Materialize
+               ->  Nested Loop
+                     Join Filter: (test_range_join_2.ir2 && test_range_join_3.ir3)
+                     ->  Seq Scan on test_range_join_2
+                     ->  Materialize
+                           ->  Seq Scan on test_range_join_3
+(10 rows)
+
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 << ir2 and ir2 << ir3;
+                                 QUERY PLAN                                  
+-----------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_range_join_2.ir2 << test_range_join_3.ir3)
+         ->  Nested Loop
+               Join Filter: (test_range_join_1.ir1 << test_range_join_2.ir2)
+               ->  Seq Scan on test_range_join_1
+               ->  Materialize
+                     ->  Seq Scan on test_range_join_2
+         ->  Materialize
+               ->  Seq Scan on test_range_join_3
+(10 rows)
+
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 >> ir2 and ir2 >> ir3;
+                                 QUERY PLAN                                  
+-----------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_range_join_1.ir1 >> test_range_join_2.ir2)
+         ->  Nested Loop
+               Join Filter: (test_range_join_2.ir2 >> test_range_join_3.ir3)
+               ->  Seq Scan on test_range_join_2
+               ->  Materialize
+                     ->  Seq Scan on test_range_join_3
+         ->  Seq Scan on test_range_join_1
+(9 rows)
+
+drop table test_range_join_1;
+drop table test_range_join_2;
+drop table test_range_join_3;
+--
+-- test range join selectivity with fully disjoint histograms
+-- (exercises the bounds-check logic when histograms do not overlap)
+--
+create table test_range_join_lo (r int4range);
+create table test_range_join_hi (r int4range);
+-- low ranges: [1,11), [2,12), ... [500,510)
+insert into test_range_join_lo select int4range(g, g+10) from generate_series(1, 500) g;
+-- high ranges: [10001,10011), [10002,10012), ... [10500,10510)
+insert into test_range_join_hi select int4range(g, g+10) from generate_series(10001, 10500) g;
+analyze test_range_join_lo;
+analyze test_range_join_hi;
+-- lo << hi should produce a large selectivity (most pairs match)
+-- lo >> hi should produce a near-zero selectivity
+-- lo && hi should produce a near-zero selectivity (no overlap)
+-- These should not crash and should produce stable plans.
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r << b.r;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r << b.r)
+         ->  Seq Scan on test_range_join_lo a
+         ->  Materialize
+               ->  Seq Scan on test_range_join_hi b
+(6 rows)
+
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r >> b.r;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r >> b.r)
+         ->  Seq Scan on test_range_join_lo a
+         ->  Materialize
+               ->  Seq Scan on test_range_join_hi b
+(6 rows)
+
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r && b.r;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r && b.r)
+         ->  Seq Scan on test_range_join_lo a
+         ->  Materialize
+               ->  Seq Scan on test_range_join_hi b
+(6 rows)
+
+drop table test_range_join_lo;
+drop table test_range_join_hi;
diff --git a/src/test/regress/sql/rangetypes.sql b/src/test/regress/sql/rangetypes.sql
index 5c4b0337b7a..f69109da334 100644
--- a/src/test/regress/sql/rangetypes.sql
+++ b/src/test/regress/sql/rangetypes.sql
@@ -708,3 +708,56 @@ select * from text_support_test where t <@ textrange_supp('a', 'd');
 drop table text_support_test;
 
 drop type textrange_supp;
+
+--
+-- test selectivity of range join operators
+--
+create table test_range_join_1 (ir1 int4range);
+create table test_range_join_2 (ir2 int4range);
+create table test_range_join_3 (ir3 int4range);
+
+insert into test_range_join_1 select int4range(g, g+10) from generate_series(1, 1000) g;
+insert into test_range_join_1 select int4range(g, g+100) from generate_series(1, 1000, 10) g;
+insert into test_range_join_2 select int4range(g, g+10) from generate_series(1, 500) g;
+insert into test_range_join_2 select int4range(g, g+100) from generate_series(1, 500, 10) g;
+insert into test_range_join_3 select int4range(g, g+10) from generate_series(501, 1000) g;
+insert into test_range_join_3 select int4range(g, g+100) from generate_series(501, 1000, 10) g;
+
+analyze test_range_join_1;
+analyze test_range_join_2;
+analyze test_range_join_3;
+
+-- reorder joins based on computed selectivity
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 && ir2 and ir2 && ir3;
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 << ir2 and ir2 << ir3;
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 >> ir2 and ir2 >> ir3;
+
+drop table test_range_join_1;
+drop table test_range_join_2;
+drop table test_range_join_3;
+
+--
+-- test range join selectivity with fully disjoint histograms
+-- (exercises the bounds-check logic when histograms do not overlap)
+--
+create table test_range_join_lo (r int4range);
+create table test_range_join_hi (r int4range);
+
+-- low ranges: [1,11), [2,12), ... [500,510)
+insert into test_range_join_lo select int4range(g, g+10) from generate_series(1, 500) g;
+-- high ranges: [10001,10011), [10002,10012), ... [10500,10510)
+insert into test_range_join_hi select int4range(g, g+10) from generate_series(10001, 10500) g;
+
+analyze test_range_join_lo;
+analyze test_range_join_hi;
+
+-- lo << hi should produce a large selectivity (most pairs match)
+-- lo >> hi should produce a near-zero selectivity
+-- lo && hi should produce a near-zero selectivity (no overlap)
+-- These should not crash and should produce stable plans.
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r << b.r;
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r >> b.r;
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r && b.r;
+
+drop table test_range_join_lo;
+drop table test_range_join_hi;
-- 
2.50.1 (Apple Git-155)



  [application/octet-stream] v7-0002-Improve-multirange-join-selectivity-estimation-fo.patch (26.6K, 4-v7-0002-Improve-multirange-join-selectivity-estimation-fo.patch)
From deb986653ba56f0580b03484e3f4a22716bac9f7 Mon Sep 17 00:00:00 2001
From: Maxime Schoemans <[email protected]>
Date: Mon, 13 Apr 2026 16:06:03 +0200
Subject: [PATCH v7 2/3] Improve multirange join selectivity estimation for <<,
 >>, &&

Add multirangejoinsel() to estimate join selectivity for multirange
operators using bound histograms, covering all type combinations:
multirange vs multirange, multirange vs range, range vs multirange.

Note that multirange statistics only represent the outermost bounds
(see multirange_typanalyze), so && may overestimate overlap for sparse
multiranges. This is consistent with how existing restriction
selectivity handles multirange &&.

The shared helper functions (calc_hist_join_selectivity and others) are
intentionally duplicated from rangetypes_selfuncs.c for reviewability.
A follow-up commit will remove the duplication.
---
 .../utils/adt/multirangetypes_selfuncs.c      | 325 ++++++++++++++++++
 src/include/catalog/pg_operator.dat           |  18 +-
 src/include/catalog/pg_proc.dat               |   4 +
 src/test/regress/expected/multirangetypes.out | 157 +++++++++
 src/test/regress/sql/multirangetypes.sql      |  72 ++++
 5 files changed, 567 insertions(+), 9 deletions(-)

diff --git a/src/backend/utils/adt/multirangetypes_selfuncs.c b/src/backend/utils/adt/multirangetypes_selfuncs.c
index 533111445e7..241f8c6dbe0 100644
--- a/src/backend/utils/adt/multirangetypes_selfuncs.c
+++ b/src/backend/utils/adt/multirangetypes_selfuncs.c
@@ -1334,3 +1334,328 @@ calc_hist_selectivity_contains(TypeCacheEntry *typcache,
 
 	return sum_frac;
 }
+
+/*
+ * Estimate join selectivity P(X < Y) using rangebound histograms.
+ *
+ * Based on: Diogo Repas, Zhicheng Luo, Maxime Schoemans, Mahmoud Sakr, 2022
+ * "Selectivity Estimation of Inequality Joins In Databases"
+ * https://doi.org/10.48550/arXiv.2206.07396
+ *
+ * hist1 and hist2 are arrays of RangeBound entries from the bounds histograms
+ * of two range-typed or multirange-typed attributes X and Y, respectively.
+ * Each array has at least 2 entries (one histogram bin).  The entries carry
+ * full bound metadata (lower/upper flag, inclusive/exclusive), and all
+ * comparisons use range_cmp_bounds() so that bound semantics are preserved.
+ *
+ * The algorithm models each attribute's distribution as a piecewise function
+ * derived from its histogram, then computes:
+ *   P(X < Y) = 0.5 * sum( (F_X(prev) + F_X(cur)) * (F_Y(cur) - F_Y(prev)) )
+ * by parallel-scanning both histograms.
+ *
+ * The initial fast-forward loops skip histogram entries that fall entirely
+ * before the other histogram's range, so the main loop only processes the
+ * overlapping region.  Bounds checks are required because the histograms may
+ * be completely disjoint (e.g., all of X is below all of Y).
+ */
+static double
+calc_hist_join_selectivity(TypeCacheEntry *typcache,
+						   const RangeBound *hist1, int nhist1,
+						   const RangeBound *hist2, int nhist2)
+{
+	int			i,
+				j;
+	double		selectivity = 0.0;
+	double		prev_sel1 = -1.0;	/* negative sentinel skips first iter */
+	double		prev_sel2 = 0.0;
+
+	Assert(nhist1 > 1);
+	Assert(nhist2 > 1);
+
+	/*
+	 * Fast-forward past hist1 entries that are entirely below hist2[0], and
+	 * vice versa.  Bounds checks prevent out-of-bounds access when the
+	 * histograms are fully disjoint.
+	 */
+	for (i = 0; i < nhist1 &&
+		 range_cmp_bounds(typcache, &hist1[i], &hist2[0]) < 0; i++)
+		;
+	for (j = 0; j < nhist2 &&
+		 range_cmp_bounds(typcache, &hist2[j], &hist1[0]) < 0; j++)
+		;
+
+	/*
+	 * Handle fully-separated histograms.  When all bounds in hist1 are below
+	 * all bounds in hist2, P(X < Y) is ~1.0.  When all of hist2 is below
+	 * hist1, P(X < Y) is ~0.0.  We return immediately rather than falling
+	 * into the overlap walk with invalid indices.
+	 */
+	if (i >= nhist1)
+		return 1.0;
+	if (j >= nhist2)
+		return 0.0;
+
+	/* Walk the overlapping region of both histograms */
+	while (i < nhist1 && j < nhist2)
+	{
+		double		cur_sel1,
+					cur_sel2;
+		RangeBound	cur_sync;
+		int			cmp;
+
+		cmp = range_cmp_bounds(typcache, &hist1[i], &hist2[j]);
+		if (cmp < 0)
+			cur_sync = hist1[i++];
+		else if (cmp > 0)
+			cur_sync = hist2[j++];
+		else
+		{
+			/* Equal bounds: advance both */
+			cur_sync = hist1[i];
+			i++;
+			j++;
+		}
+		cur_sel1 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist1, nhist1, false);
+		cur_sel2 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist2, nhist2, false);
+
+		/* Skip the first iteration (no previous point yet) */
+		if (prev_sel1 >= 0)
+			selectivity += (prev_sel1 + cur_sel1) * (cur_sel2 - prev_sel2);
+
+		prev_sel1 = cur_sel1;
+		prev_sel2 = cur_sel2;
+	}
+
+	/* P(X < Y) = 0.5 * Sum(...) */
+	selectivity /= 2;
+
+	/* Include remainder of hist2 if hist1 was exhausted first */
+	if (j < nhist2)
+		selectivity += 1 - prev_sel2;
+
+	return selectivity;
+}
+
+/*
+ * multirangejoinsel -- join selectivity for multirange operators
+ *
+ * Supports: <<, >>, && for all type combinations:
+ *   multirange vs multirange, multirange vs range, range vs multirange
+ *
+ * These operators map directly to strict bound comparisons P(X < Y),
+ * which calc_hist_join_selectivity() estimates from bound histograms.
+ * Both range and multirange types store bound histograms in the same
+ * format, so the estimation is identical regardless of type combination.
+ */
+Datum
+multirangejoinsel(PG_FUNCTION_ARGS)
+{
+	PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+	Oid			operator = PG_GETARG_OID(1);
+	List	   *args = (List *) PG_GETARG_POINTER(2);
+	SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) PG_GETARG_POINTER(4);
+	VariableStatData vardata1;
+	VariableStatData vardata2;
+	Selectivity selec;
+	AttStatsSlot hist1;
+	AttStatsSlot hist2;
+	AttStatsSlot sslot;
+	bool		have_hist1 = false;
+	bool		have_hist2 = false;
+	TypeCacheEntry *typcache;
+	TypeCacheEntry *rng_typcache;
+	Form_pg_statistic stats1;
+	Form_pg_statistic stats2;
+	double		empty_frac1;
+	double		empty_frac2;
+	double		null_frac1;
+	double		null_frac2;
+	int			nhist1;
+	int			nhist2;
+	RangeBound *hist1_lower;
+	RangeBound *hist1_upper;
+	RangeBound *hist2_lower;
+	RangeBound *hist2_upper;
+	bool		join_is_reversed;
+	bool		empty;
+	int			i;
+
+	get_join_variables(root, args, sjinfo, &vardata1, &vardata2,
+					   &join_is_reversed);
+
+	selec = default_multirange_selectivity(operator);
+
+	/*
+	 * Acquire histogram stats for both sides.  Each slot is tracked
+	 * independently so we can release exactly what was acquired on any
+	 * failure path.
+	 */
+	if (!HeapTupleIsValid(vardata1.statsTuple) ||
+		!HeapTupleIsValid(vardata2.statsTuple))
+		goto cleanup;
+
+	memset(&hist1, 0, sizeof(hist1));
+	memset(&hist2, 0, sizeof(hist2));
+
+	if (!get_attstatsslot(&hist1, vardata1.statsTuple,
+						  STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						  ATTSTATSSLOT_VALUES))
+		goto cleanup;
+	have_hist1 = true;
+
+	if (!get_attstatsslot(&hist2, vardata2.statsTuple,
+						  STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						  ATTSTATSSLOT_VALUES))
+		goto cleanup;
+	have_hist2 = true;
+
+	/*
+	 * Determine the range type cache for bound comparisons.  At least one
+	 * side is a multirange type; try vardata1 first, then vardata2.
+	 */
+	typcache = lookup_type_cache(vardata1.vartype, TYPECACHE_MULTIRANGE_INFO);
+	if (typcache->rngtype != NULL)
+		rng_typcache = typcache->rngtype;
+	else
+	{
+		typcache = lookup_type_cache(vardata2.vartype,
+									 TYPECACHE_MULTIRANGE_INFO);
+		rng_typcache = typcache->rngtype;
+	}
+
+	/* Look up NULL and empty-range fractions */
+	stats1 = (Form_pg_statistic) GETSTRUCT(vardata1.statsTuple);
+	stats2 = (Form_pg_statistic) GETSTRUCT(vardata2.statsTuple);
+
+	null_frac1 = stats1->stanullfrac;
+	null_frac2 = stats2->stanullfrac;
+
+	/* Try to get fraction of empty ranges for the first variable */
+	if (get_attstatsslot(&sslot, vardata1.statsTuple,
+						 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+						 InvalidOid, ATTSTATSSLOT_NUMBERS))
+	{
+		if (sslot.nnumbers != 1)
+			elog(ERROR, "invalid empty fraction statistic");
+		empty_frac1 = sslot.numbers[0];
+		free_attstatsslot(&sslot);
+	}
+	else
+	{
+		empty_frac1 = 0.0;
+	}
+
+	/* Try to get fraction of empty ranges for the second variable */
+	if (get_attstatsslot(&sslot, vardata2.statsTuple,
+						 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+						 InvalidOid, ATTSTATSSLOT_NUMBERS))
+	{
+		if (sslot.nnumbers != 1)
+			elog(ERROR, "invalid empty fraction statistic");
+		empty_frac2 = sslot.numbers[0];
+		free_attstatsslot(&sslot);
+	}
+	else
+	{
+		empty_frac2 = 0.0;
+	}
+
+	/* Convert range histograms to separate lower/upper bound arrays */
+	nhist1 = hist1.nvalues;
+	hist1_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+	hist1_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+	for (i = 0; i < nhist1; i++)
+	{
+		range_deserialize(rng_typcache, DatumGetRangeTypeP(hist1.values[i]),
+						  &hist1_lower[i], &hist1_upper[i], &empty);
+		if (empty)
+			elog(ERROR, "bounds histogram contains an empty range");
+	}
+
+	nhist2 = hist2.nvalues;
+	hist2_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+	hist2_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+	for (i = 0; i < nhist2; i++)
+	{
+		range_deserialize(rng_typcache, DatumGetRangeTypeP(hist2.values[i]),
+						  &hist2_lower[i], &hist2_upper[i], &empty);
+		if (empty)
+			elog(ERROR, "bounds histogram contains an empty range");
+	}
+
+	/* Estimate selectivity based on the operator */
+	switch (operator)
+	{
+		case OID_RANGE_OVERLAPS_MULTIRANGE_OP:
+		case OID_MULTIRANGE_OVERLAPS_RANGE_OP:
+		case OID_MULTIRANGE_OVERLAPS_MULTIRANGE_OP:
+
+			/*
+			 * A && B iff NOT(A << B) AND NOT(A >> B), so P(A && B) = 1 -
+			 * P(A.upper < B.lower) - P(B.upper < A.lower)
+			 *
+			 * This decomposition is exact for single ranges.  For
+			 * multiranges, the bound histograms only represent the outermost
+			 * lower and upper bounds (see multirange_typanalyze), so internal
+			 * gaps are not captured. This can overestimate overlap for sparse
+			 * multiranges, but is consistent with how existing restriction
+			 * selectivity handles multirange &&.
+			 */
+			selec = 1;
+			selec -= calc_hist_join_selectivity(rng_typcache,
+												hist1_upper, nhist1,
+												hist2_lower, nhist2);
+			selec -= calc_hist_join_selectivity(rng_typcache,
+												hist2_upper, nhist2,
+												hist1_lower, nhist1);
+			break;
+
+		case OID_RANGE_LEFT_MULTIRANGE_OP:
+		case OID_MULTIRANGE_LEFT_RANGE_OP:
+		case OID_MULTIRANGE_LEFT_MULTIRANGE_OP:
+			/* A << B iff upper(A) < lower(B) */
+			selec = calc_hist_join_selectivity(rng_typcache,
+											   hist1_upper, nhist1,
+											   hist2_lower, nhist2);
+			break;
+
+		case OID_RANGE_RIGHT_MULTIRANGE_OP:
+		case OID_MULTIRANGE_RIGHT_RANGE_OP:
+		case OID_MULTIRANGE_RIGHT_MULTIRANGE_OP:
+			/* A >> B iff upper(B) < lower(A) */
+			selec = calc_hist_join_selectivity(rng_typcache,
+											   hist2_upper, nhist2,
+											   hist1_lower, nhist1);
+			break;
+
+		default:
+			/* Unsupported operator; keep the default selectivity */
+			goto cleanup;
+	}
+
+	/* The histogram-based selectivity applies to non-empty ranges only */
+	selec *= (1 - empty_frac1) * (1 - empty_frac2);
+
+	/*
+	 * For the supported operators (<<, >>, &&), empty ranges always produce
+	 * false, so nothing is added back for the empty fraction.
+	 */
+
+	/* All multirange operators are strict, so NULLs never match */
+	selec *= (1 - null_frac1) * (1 - null_frac2);
+
+cleanup:
+	if (have_hist2)
+		free_attstatsslot(&hist2);
+	if (have_hist1)
+		free_attstatsslot(&hist1);
+
+	ReleaseVariableStats(vardata1);
+	ReleaseVariableStats(vardata2);
+
+	CLAMP_PROBABILITY(selec);
+
+	PG_RETURN_FLOAT8((float8) selec);
+}
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 5ea4434f9fa..28f696a9f41 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -3302,19 +3302,19 @@
   oprname => '&&', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '&&(anymultirange,anyrange)',
   oprcode => 'range_overlaps_multirange', oprrest => 'multirangesel',
-  oprjoin => 'areajoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2867', oid_symbol => 'OID_MULTIRANGE_OVERLAPS_RANGE_OP',
   descr => 'overlaps',
   oprname => '&&', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '&&(anyrange,anymultirange)',
   oprcode => 'multirange_overlaps_range', oprrest => 'multirangesel',
-  oprjoin => 'areajoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2868', oid_symbol => 'OID_MULTIRANGE_OVERLAPS_MULTIRANGE_OP',
   descr => 'overlaps',
   oprname => '&&', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '&&(anymultirange,anymultirange)',
   oprcode => 'multirange_overlaps_multirange', oprrest => 'multirangesel',
-  oprjoin => 'areajoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2869', oid_symbol => 'OID_MULTIRANGE_CONTAINS_ELEM_OP',
   descr => 'contains',
   oprname => '@>', oprleft => 'anymultirange', oprright => 'anyelement',
@@ -3428,37 +3428,37 @@
   oprname => '<<', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '>>(anymultirange,anyrange)',
   oprcode => 'range_before_multirange', oprrest => 'multirangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4396', oid_symbol => 'OID_MULTIRANGE_LEFT_RANGE_OP',
   descr => 'is left of',
   oprname => '<<', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '>>(anyrange,anymultirange)',
   oprcode => 'multirange_before_range', oprrest => 'multirangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4397', oid_symbol => 'OID_MULTIRANGE_LEFT_MULTIRANGE_OP',
   descr => 'is left of',
   oprname => '<<', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '>>(anymultirange,anymultirange)',
   oprcode => 'multirange_before_multirange', oprrest => 'multirangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4398', oid_symbol => 'OID_RANGE_RIGHT_MULTIRANGE_OP',
   descr => 'is right of',
   oprname => '>>', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '<<(anymultirange,anyrange)',
   oprcode => 'range_after_multirange', oprrest => 'multirangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4399', oid_symbol => 'OID_MULTIRANGE_RIGHT_RANGE_OP',
   descr => 'is right of',
   oprname => '>>', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '<<(anyrange,anymultirange)',
   oprcode => 'multirange_after_range', oprrest => 'multirangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4400', oid_symbol => 'OID_MULTIRANGE_RIGHT_MULTIRANGE_OP',
   descr => 'is right of',
   oprname => '>>', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '<<(anymultirange,anymultirange)',
   oprcode => 'multirange_after_multirange', oprrest => 'multirangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 
 { oid => '8262', descr => 'equal',
   oprname => '=', oprcanmerge => 't', oprcanhash => 't', oprleft => 'oid8',
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c6a707acae4..10fbc22c4a6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12923,4 +12923,8 @@
   proname => 'rangejoinsel', provolatile => 's', prorettype => 'float8',
   proargtypes => 'internal oid internal int2 internal',
   prosrc => 'rangejoinsel' },
+{ oid => '8356', descr => 'join selectivity for multirange operators',
+  proname => 'multirangejoinsel', provolatile => 's', prorettype => 'float8',
+  proargtypes => 'internal oid internal int2 internal',
+  prosrc => 'multirangejoinsel' },
 ]
diff --git a/src/test/regress/expected/multirangetypes.out b/src/test/regress/expected/multirangetypes.out
index f5e7df8df43..aab9c5e2604 100644
--- a/src/test/regress/expected/multirangetypes.out
+++ b/src/test/regress/expected/multirangetypes.out
@@ -3512,3 +3512,160 @@ create function mr_table_fail(i anyelement) returns table(i anyelement, r anymul
   as $$ select $1, '[1,10]' $$ language sql;
 ERROR:  cannot determine result data type
 DETAIL:  A result of type anymultirange requires at least one input of type anyrange or anymultirange.
+-- Restore GUCs changed by earlier index tests
+RESET enable_seqscan;
+RESET enable_indexscan;
+RESET enable_bitmapscan;
+--
+-- test selectivity of multirange join operators
+--
+create table test_mr_join_1 (mr1 int4multirange);
+create table test_mr_join_2 (mr2 int4multirange);
+create table test_mr_join_3 (mr3 int4multirange);
+insert into test_mr_join_1 select int4multirange(int4range(g, g+10)) from generate_series(1, 1000) g;
+insert into test_mr_join_1 select int4multirange(int4range(g, g+100)) from generate_series(1, 1000, 10) g;
+insert into test_mr_join_2 select int4multirange(int4range(g, g+10)) from generate_series(1, 500) g;
+insert into test_mr_join_2 select int4multirange(int4range(g, g+100)) from generate_series(1, 500, 10) g;
+insert into test_mr_join_3 select int4multirange(int4range(g, g+10)) from generate_series(501, 1000) g;
+insert into test_mr_join_3 select int4multirange(int4range(g, g+100)) from generate_series(501, 1000, 10) g;
+analyze test_mr_join_1;
+analyze test_mr_join_2;
+analyze test_mr_join_3;
+-- multirange vs multirange: reorder joins based on computed selectivity
+explain (costs off) select count(*) from test_mr_join_1, test_mr_join_2, test_mr_join_3 where mr1 && mr2 and mr2 && mr3;
+                                 QUERY PLAN                                  
+-----------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_mr_join_1.mr1 && test_mr_join_2.mr2)
+         ->  Seq Scan on test_mr_join_1
+         ->  Materialize
+               ->  Nested Loop
+                     Join Filter: (test_mr_join_2.mr2 && test_mr_join_3.mr3)
+                     ->  Seq Scan on test_mr_join_2
+                     ->  Materialize
+                           ->  Seq Scan on test_mr_join_3
+(10 rows)
+
+explain (costs off) select count(*) from test_mr_join_1, test_mr_join_2, test_mr_join_3 where mr1 << mr2 and mr2 << mr3;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_mr_join_2.mr2 << test_mr_join_3.mr3)
+         ->  Nested Loop
+               Join Filter: (test_mr_join_1.mr1 << test_mr_join_2.mr2)
+               ->  Seq Scan on test_mr_join_1
+               ->  Materialize
+                     ->  Seq Scan on test_mr_join_2
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_3
+(10 rows)
+
+explain (costs off) select count(*) from test_mr_join_1, test_mr_join_2, test_mr_join_3 where mr1 >> mr2 and mr2 >> mr3;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_mr_join_1.mr1 >> test_mr_join_2.mr2)
+         ->  Nested Loop
+               Join Filter: (test_mr_join_2.mr2 >> test_mr_join_3.mr3)
+               ->  Seq Scan on test_mr_join_2
+               ->  Materialize
+                     ->  Seq Scan on test_mr_join_3
+         ->  Seq Scan on test_mr_join_1
+(9 rows)
+
+drop table test_mr_join_1;
+drop table test_mr_join_2;
+drop table test_mr_join_3;
+--
+-- test multirange join selectivity with fully disjoint histograms
+--
+create table test_mr_join_lo (r int4multirange);
+create table test_mr_join_hi (r int4multirange);
+insert into test_mr_join_lo select int4multirange(int4range(g, g+10)) from generate_series(1, 500) g;
+insert into test_mr_join_hi select int4multirange(int4range(g, g+10)) from generate_series(10001, 10500) g;
+analyze test_mr_join_lo;
+analyze test_mr_join_hi;
+-- These should not crash and should produce stable plans.
+explain (costs off) select count(*) from test_mr_join_lo a, test_mr_join_hi b where a.r << b.r;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r << b.r)
+         ->  Seq Scan on test_mr_join_lo a
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_hi b
+(6 rows)
+
+explain (costs off) select count(*) from test_mr_join_lo a, test_mr_join_hi b where a.r >> b.r;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r >> b.r)
+         ->  Seq Scan on test_mr_join_lo a
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_hi b
+(6 rows)
+
+explain (costs off) select count(*) from test_mr_join_lo a, test_mr_join_hi b where a.r && b.r;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r && b.r)
+         ->  Seq Scan on test_mr_join_lo a
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_hi b
+(6 rows)
+
+drop table test_mr_join_lo;
+drop table test_mr_join_hi;
+--
+-- test range vs multirange join selectivity
+--
+create table test_mr_join_r (r int4range);
+create table test_mr_join_mr (mr int4multirange);
+insert into test_mr_join_r select int4range(g, g+10) from generate_series(1, 500) g;
+insert into test_mr_join_mr select int4multirange(int4range(g, g+10)) from generate_series(10001, 10500) g;
+analyze test_mr_join_r;
+analyze test_mr_join_mr;
+-- range vs multirange operators should use multirangejoinsel
+explain (costs off) select count(*) from test_mr_join_r a, test_mr_join_mr b where a.r << b.mr;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r << b.mr)
+         ->  Seq Scan on test_mr_join_r a
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_mr b
+(6 rows)
+
+explain (costs off) select count(*) from test_mr_join_r a, test_mr_join_mr b where a.r >> b.mr;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r >> b.mr)
+         ->  Seq Scan on test_mr_join_r a
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_mr b
+(6 rows)
+
+explain (costs off) select count(*) from test_mr_join_r a, test_mr_join_mr b where a.r && b.mr;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r && b.mr)
+         ->  Seq Scan on test_mr_join_r a
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_mr b
+(6 rows)
+
+drop table test_mr_join_r;
+drop table test_mr_join_mr;
diff --git a/src/test/regress/sql/multirangetypes.sql b/src/test/regress/sql/multirangetypes.sql
index 112334b03eb..e3f8cd6f4e3 100644
--- a/src/test/regress/sql/multirangetypes.sql
+++ b/src/test/regress/sql/multirangetypes.sql
@@ -904,3 +904,75 @@ create function mr_inoutparam_fail(inout i anyelement, out r anymultirange)
 --should fail
 create function mr_table_fail(i anyelement) returns table(i anyelement, r anymultirange)
   as $$ select $1, '[1,10]' $$ language sql;
+
+-- Restore GUCs changed by earlier index tests
+RESET enable_seqscan;
+RESET enable_indexscan;
+RESET enable_bitmapscan;
+
+--
+-- test selectivity of multirange join operators
+--
+create table test_mr_join_1 (mr1 int4multirange);
+create table test_mr_join_2 (mr2 int4multirange);
+create table test_mr_join_3 (mr3 int4multirange);
+
+insert into test_mr_join_1 select int4multirange(int4range(g, g+10)) from generate_series(1, 1000) g;
+insert into test_mr_join_1 select int4multirange(int4range(g, g+100)) from generate_series(1, 1000, 10) g;
+insert into test_mr_join_2 select int4multirange(int4range(g, g+10)) from generate_series(1, 500) g;
+insert into test_mr_join_2 select int4multirange(int4range(g, g+100)) from generate_series(1, 500, 10) g;
+insert into test_mr_join_3 select int4multirange(int4range(g, g+10)) from generate_series(501, 1000) g;
+insert into test_mr_join_3 select int4multirange(int4range(g, g+100)) from generate_series(501, 1000, 10) g;
+
+analyze test_mr_join_1;
+analyze test_mr_join_2;
+analyze test_mr_join_3;
+
+-- multirange vs multirange: reorder joins based on computed selectivity
+explain (costs off) select count(*) from test_mr_join_1, test_mr_join_2, test_mr_join_3 where mr1 && mr2 and mr2 && mr3;
+explain (costs off) select count(*) from test_mr_join_1, test_mr_join_2, test_mr_join_3 where mr1 << mr2 and mr2 << mr3;
+explain (costs off) select count(*) from test_mr_join_1, test_mr_join_2, test_mr_join_3 where mr1 >> mr2 and mr2 >> mr3;
+
+drop table test_mr_join_1;
+drop table test_mr_join_2;
+drop table test_mr_join_3;
+
+--
+-- test multirange join selectivity with fully disjoint histograms
+--
+create table test_mr_join_lo (r int4multirange);
+create table test_mr_join_hi (r int4multirange);
+
+insert into test_mr_join_lo select int4multirange(int4range(g, g+10)) from generate_series(1, 500) g;
+insert into test_mr_join_hi select int4multirange(int4range(g, g+10)) from generate_series(10001, 10500) g;
+
+analyze test_mr_join_lo;
+analyze test_mr_join_hi;
+
+-- These should not crash and should produce stable plans.
+explain (costs off) select count(*) from test_mr_join_lo a, test_mr_join_hi b where a.r << b.r;
+explain (costs off) select count(*) from test_mr_join_lo a, test_mr_join_hi b where a.r >> b.r;
+explain (costs off) select count(*) from test_mr_join_lo a, test_mr_join_hi b where a.r && b.r;
+
+drop table test_mr_join_lo;
+drop table test_mr_join_hi;
+
+--
+-- test range vs multirange join selectivity
+--
+create table test_mr_join_r (r int4range);
+create table test_mr_join_mr (mr int4multirange);
+
+insert into test_mr_join_r select int4range(g, g+10) from generate_series(1, 500) g;
+insert into test_mr_join_mr select int4multirange(int4range(g, g+10)) from generate_series(10001, 10500) g;
+
+analyze test_mr_join_r;
+analyze test_mr_join_mr;
+
+-- range vs multirange operators should use multirangejoinsel
+explain (costs off) select count(*) from test_mr_join_r a, test_mr_join_mr b where a.r << b.mr;
+explain (costs off) select count(*) from test_mr_join_r a, test_mr_join_mr b where a.r >> b.mr;
+explain (costs off) select count(*) from test_mr_join_r a, test_mr_join_mr b where a.r && b.mr;
+
+drop table test_mr_join_r;
+drop table test_mr_join_mr;
-- 
2.50.1 (Apple Git-155)
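
As a reading aid for the operator switch in multirangejoinsel above, here is a
minimal standalone sketch of how the final && estimate is assembled. It is not
code from the patch: overlap_join_selec and clamp_probability are hypothetical
names, and p_left / p_right stand in for the two calc_hist_join_selectivity()
results, which are assumed to be already computed.

```c
#include <assert.h>

/* Clamp to [0, 1], in the spirit of CLAMP_PROBABILITY (hypothetical helper). */
double
clamp_probability(double p)
{
	if (p < 0.0)
		return 0.0;
	if (p > 1.0)
		return 1.0;
	return p;
}

/*
 * Hypothetical helper: assemble the && join selectivity from
 *   p_left  = P(upper(A) < lower(B)), i.e. P(A << B)
 *   p_right = P(upper(B) < lower(A)), i.e. P(A >> B)
 * both taken over non-empty, non-NULL inputs.
 */
double
overlap_join_selec(double p_left, double p_right,
				   double empty_frac1, double empty_frac2,
				   double null_frac1, double null_frac2)
{
	double		selec;

	assert(p_left >= 0.0 && p_left <= 1.0);
	assert(p_right >= 0.0 && p_right <= 1.0);

	/* A << B and A >> B cannot both hold, so their probabilities add up */
	selec = 1.0 - p_left - p_right;

	/* empty inputs never satisfy &&, so only the non-empty share counts */
	selec *= (1.0 - empty_frac1) * (1.0 - empty_frac2);

	/* the operator is strict, so NULL inputs never match either */
	selec *= (1.0 - null_frac1) * (1.0 - null_frac2);

	return clamp_probability(selec);
}
```

For example, with p_left = p_right = 0.25 and half of the first column empty,
the sketch gives (1 - 0.25 - 0.25) * 0.5 = 0.25.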



  [application/octet-stream] v7-0003-Remove-duplicate-selectivity-functions-between-ra.patch (34.5K, 5-v7-0003-Remove-duplicate-selectivity-functions-between-ra.patch)
From 12665ca19f802a2ace50525cf5df8a6f95e860df Mon Sep 17 00:00:00 2001
From: Maxime Schoemans <[email protected]>
Date: Thu, 16 Apr 2026 16:28:17 +0200
Subject: [PATCH v7 3/3] Remove duplicate selectivity functions between range
 and multirange

The multirange selectivity code duplicated 10 helper functions from
rangetypes_selfuncs.c. Since both range and multirange types use the
same histogram format (STATISTIC_KIND_BOUNDS_HISTOGRAM) and the same
RangeBound representation, the functions are identical.

Make the 10 shared functions non-static in rangetypes_selfuncs.c,
export them via a new rangetypes_selfuncs.h header, and remove the
copies from multirangetypes_selfuncs.c.
---
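
Note for readers unfamiliar with the shared helpers: the sketch below mirrors
the shape of rbound_bsearch() and calc_hist_selectivity_scalar(), but on plain
sorted doubles instead of RangeBound values. It is an illustration only:
hist_bsearch and hist_fraction_below are hypothetical names, infinite bounds
are not handled, and simple subtraction stands in for the subtype's subdiff
function.

```c
#include <assert.h>

/*
 * Greatest index i with hist[i] < value (or <= value when 'equal' is set);
 * -1 if no such entry exists.  hist must be sorted in ascending order.
 */
int
hist_bsearch(const double *hist, int nhist, double value, int equal)
{
	int			lower = -1;
	int			upper = nhist - 1;

	while (lower < upper)
	{
		int			middle = (lower + upper + 1) / 2;

		if (hist[middle] < value || (equal && hist[middle] <= value))
			lower = middle;
		else
			upper = middle - 1;
	}
	return lower;
}

/*
 * Estimated fraction of values below 'value' in an equi-depth histogram:
 * count the whole bins that precede it, then interpolate linearly inside
 * the bin that contains it.
 */
double
hist_fraction_below(const double *hist, int nhist, double value, int equal)
{
	int			index;
	double		selec;

	assert(nhist >= 2);

	index = hist_bsearch(hist, nhist, value, equal);
	selec = (double) (index > 0 ? index : 0) / (double) (nhist - 1);

	if (index >= 0 && index < nhist - 1)
	{
		double		width = hist[index + 1] - hist[index];

		/* punt to the middle of the bin when it has zero width */
		double		pos = (width > 0.0) ? (value - hist[index]) / width : 0.5;

		selec += pos / (double) (nhist - 1);
	}
	return selec;
}
```

With boundaries {0, 10, 20, 30, 40}, the value 15 falls halfway into the second
of four bins, so the estimate is 1/4 + 0.5/4 = 0.375.
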
 .../utils/adt/multirangetypes_selfuncs.c      | 772 +-----------------
 src/backend/utils/adt/rangetypes_selfuncs.c   |  46 +-
 src/include/utils/rangetypes_selfuncs.h       |  54 ++
 3 files changed, 67 insertions(+), 805 deletions(-)
 create mode 100644 src/include/utils/rangetypes_selfuncs.h
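
A second reading aid, for the length-histogram interpolation helper that this
patch removes from the multirange file: the sketch below follows the comments
of get_len_position() rather than its literal conditions, so the third branch
tests for a finite lower and an infinite upper boundary. That reading is an
assumption on my part, and len_position is a hypothetical name.

```c
#include <assert.h>
#include <math.h>

/*
 * Relative position of 'value' inside the length-histogram bin
 * [hist1, hist2], as a number in [0, 1] when value lies inside the bin.
 */
double
len_position(double value, double hist1, double hist2)
{
	assert(hist1 <= hist2);

	if (!isinf(hist1) && !isinf(hist2))
	{
		/* finite bin: an infinite value should not occur, so just punt */
		if (isinf(value))
			return 0.5;

		/* assumes hist2 > hist1; callers must guard zero-width bins */
		return 1.0 - (hist2 - value) / (hist2 - hist1);
	}
	else if (isinf(hist1) && !isinf(hist2))
	{
		/* infinitely far from the infinite lower boundary */
		return 1.0;
	}
	else if (!isinf(hist1) && isinf(hist2))
	{
		/* the reverse case: infinitely far from the upper boundary */
		return 0.0;
	}
	else
	{
		/* both boundaries infinite: assume the middle of the bin */
		return 0.5;
	}
}
```

For a finite bin [10, 20], the value 15 maps to 0.5. A bin with a finite lower
and an infinite upper boundary maps any value to 0.0.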

diff --git a/src/backend/utils/adt/multirangetypes_selfuncs.c b/src/backend/utils/adt/multirangetypes_selfuncs.c
index 241f8c6dbe0..fa5f23d09a9 100644
--- a/src/backend/utils/adt/multirangetypes_selfuncs.c
+++ b/src/backend/utils/adt/multirangetypes_selfuncs.c
@@ -27,6 +27,7 @@
 #include "utils/lsyscache.h"
 #include "utils/multirangetypes.h"
 #include "utils/rangetypes.h"
+#include "utils/rangetypes_selfuncs.h"
 #include "utils/selfuncs.h"
 #include "utils/typcache.h"
 
@@ -38,37 +39,6 @@ static double calc_hist_selectivity(TypeCacheEntry *typcache,
 									VariableStatData *vardata,
 									const MultirangeType *constval,
 									Oid operator);
-static double calc_hist_selectivity_scalar(TypeCacheEntry *typcache,
-										   const RangeBound *constbound,
-										   const RangeBound *hist,
-										   int hist_nvalues, bool equal);
-static int	rbound_bsearch(TypeCacheEntry *typcache, const RangeBound *value,
-						   const RangeBound *hist, int hist_length, bool equal);
-static float8 get_position(TypeCacheEntry *typcache, const RangeBound *value,
-						   const RangeBound *hist1, const RangeBound *hist2);
-static float8 get_len_position(double value, double hist1, double hist2);
-static float8 get_distance(TypeCacheEntry *typcache, const RangeBound *bound1,
-						   const RangeBound *bound2);
-static int	length_hist_bsearch(const Datum *length_hist_values,
-								int length_hist_nvalues, double value,
-								bool equal);
-static double calc_length_hist_frac(const Datum *length_hist_values,
-									int length_hist_nvalues, double length1,
-									double length2, bool equal);
-static double calc_hist_selectivity_contained(TypeCacheEntry *typcache,
-											  const RangeBound *lower,
-											  RangeBound *upper,
-											  const RangeBound *hist_lower,
-											  int hist_nvalues,
-											  const Datum *length_hist_values,
-											  int length_hist_nvalues);
-static double calc_hist_selectivity_contains(TypeCacheEntry *typcache,
-											 const RangeBound *lower,
-											 const RangeBound *upper,
-											 const RangeBound *hist_lower,
-											 int hist_nvalues,
-											 const Datum *length_hist_values,
-											 int length_hist_nvalues);
 
 /*
  * Returns a default selectivity estimate for given operator, when we don't
@@ -698,746 +668,6 @@ calc_hist_selectivity(TypeCacheEntry *typcache, VariableStatData *vardata,
 	return hist_selec;
 }
 
-
-/*
- * Look up the fraction of values less than (or equal, if 'equal' argument
- * is true) a given const in a histogram of range bounds.
- */
-static double
-calc_hist_selectivity_scalar(TypeCacheEntry *typcache, const RangeBound *constbound,
-							 const RangeBound *hist, int hist_nvalues, bool equal)
-{
-	Selectivity selec;
-	int			index;
-
-	/*
-	 * Find the histogram bin the given constant falls into. Estimate
-	 * selectivity as the number of preceding whole bins.
-	 */
-	index = rbound_bsearch(typcache, constbound, hist, hist_nvalues, equal);
-	selec = (Selectivity) (Max(index, 0)) / (Selectivity) (hist_nvalues - 1);
-
-	/* Adjust using linear interpolation within the bin */
-	if (index >= 0 && index < hist_nvalues - 1)
-		selec += get_position(typcache, constbound, &hist[index],
-							  &hist[index + 1]) / (Selectivity) (hist_nvalues - 1);
-
-	return selec;
-}
-
-/*
- * Binary search on an array of range bounds. Returns greatest index of range
- * bound in array which is less(less or equal) than given range bound. If all
- * range bounds in array are greater or equal(greater) than given range bound,
- * return -1. When "equal" flag is set conditions in brackets are used.
- *
- * This function is used in scalar operator selectivity estimation. Another
- * goal of this function is to find a histogram bin where to stop
- * interpolation of portion of bounds which are less than or equal to given bound.
- */
-static int
-rbound_bsearch(TypeCacheEntry *typcache, const RangeBound *value, const RangeBound *hist,
-			   int hist_length, bool equal)
-{
-	int			lower = -1,
-				upper = hist_length - 1,
-				cmp,
-				middle;
-
-	while (lower < upper)
-	{
-		middle = (lower + upper + 1) / 2;
-		cmp = range_cmp_bounds(typcache, &hist[middle], value);
-
-		if (cmp < 0 || (equal && cmp == 0))
-			lower = middle;
-		else
-			upper = middle - 1;
-	}
-	return lower;
-}
-
-
-/*
- * Binary search on length histogram. Returns greatest index of range length in
- * histogram which is less than (less than or equal) the given length value. If
- * all lengths in the histogram are greater than (greater than or equal) the
- * given length, returns -1.
- */
-static int
-length_hist_bsearch(const Datum *length_hist_values, int length_hist_nvalues,
-					double value, bool equal)
-{
-	int			lower = -1,
-				upper = length_hist_nvalues - 1,
-				middle;
-
-	while (lower < upper)
-	{
-		double		middleval;
-
-		middle = (lower + upper + 1) / 2;
-
-		middleval = DatumGetFloat8(length_hist_values[middle]);
-		if (middleval < value || (equal && middleval <= value))
-			lower = middle;
-		else
-			upper = middle - 1;
-	}
-	return lower;
-}
-
-/*
- * Get relative position of value in histogram bin in [0,1] range.
- */
-static float8
-get_position(TypeCacheEntry *typcache, const RangeBound *value, const RangeBound *hist1,
-			 const RangeBound *hist2)
-{
-	bool		has_subdiff = OidIsValid(typcache->rng_subdiff_finfo.fn_oid);
-	float8		position;
-
-	if (!hist1->infinite && !hist2->infinite)
-	{
-		float8		bin_width;
-
-		/*
-		 * Both bounds are finite. Assuming the subtype's comparison function
-		 * works sanely, the value must be finite, too, because it lies
-		 * somewhere between the bounds.  If it doesn't, arbitrarily return
-		 * 0.5.
-		 */
-		if (value->infinite)
-			return 0.5;
-
-		/* Can't interpolate without subdiff function */
-		if (!has_subdiff)
-			return 0.5;
-
-		/* Calculate relative position using subdiff function. */
-		bin_width = DatumGetFloat8(FunctionCall2Coll(&typcache->rng_subdiff_finfo,
-													 typcache->rng_collation,
-													 hist2->val,
-													 hist1->val));
-		if (isnan(bin_width) || bin_width <= 0.0)
-			return 0.5;			/* punt for NaN or zero-width bin */
-
-		position = DatumGetFloat8(FunctionCall2Coll(&typcache->rng_subdiff_finfo,
-													typcache->rng_collation,
-													value->val,
-													hist1->val))
-			/ bin_width;
-
-		if (isnan(position))
-			return 0.5;			/* punt for NaN from subdiff, Inf/Inf, etc */
-
-		/* Relative position must be in [0,1] range */
-		position = Max(position, 0.0);
-		position = Min(position, 1.0);
-		return position;
-	}
-	else if (hist1->infinite && !hist2->infinite)
-	{
-		/*
-		 * Lower bin boundary is -infinite, upper is finite. If the value is
-		 * -infinite, return 0.0 to indicate it's equal to the lower bound.
-		 * Otherwise return 1.0 to indicate it's infinitely far from the lower
-		 * bound.
-		 */
-		return ((value->infinite && value->lower) ? 0.0 : 1.0);
-	}
-	else if (!hist1->infinite && hist2->infinite)
-	{
-		/* same as above, but in reverse */
-		return ((value->infinite && !value->lower) ? 1.0 : 0.0);
-	}
-	else
-	{
-		/*
-		 * If both bin boundaries are infinite, they should be equal to each
-		 * other, and the value should also be infinite and equal to both
-		 * bounds. (But don't Assert that, to avoid crashing if a user creates
-		 * a datatype with a broken comparison function).
-		 *
-		 * Assume the value to lie in the middle of the infinite bounds.
-		 */
-		return 0.5;
-	}
-}
-
-
-/*
- * Get relative position of value in a length histogram bin in [0,1] range.
- */
-static double
-get_len_position(double value, double hist1, double hist2)
-{
-	if (!isinf(hist1) && !isinf(hist2))
-	{
-		/*
-		 * Both bounds are finite. The value should be finite too, because it
-		 * lies somewhere between the bounds. If it doesn't, just return
-		 * something.
-		 */
-		if (isinf(value))
-			return 0.5;
-
-		return 1.0 - (hist2 - value) / (hist2 - hist1);
-	}
-	else if (isinf(hist1) && !isinf(hist2))
-	{
-		/*
-		 * Lower bin boundary is -infinite, upper is finite. Return 1.0 to
-		 * indicate the value is infinitely far from the lower bound.
-		 */
-		return 1.0;
-	}
-	else if (isinf(hist1) && isinf(hist2))
-	{
-		/* same as above, but in reverse */
-		return 0.0;
-	}
-	else
-	{
-		/*
-		 * If both bin boundaries are infinite, they should be equal to each
-		 * other, and the value should also be infinite and equal to both
-		 * bounds. (But don't Assert that, to avoid crashing unnecessarily if
-		 * the caller messes up)
-		 *
-		 * Assume the value to lie in the middle of the infinite bounds.
-		 */
-		return 0.5;
-	}
-}
-
-/*
- * Measure distance between two range bounds.
- */
-static float8
-get_distance(TypeCacheEntry *typcache, const RangeBound *bound1, const RangeBound *bound2)
-{
-	bool		has_subdiff = OidIsValid(typcache->rng_subdiff_finfo.fn_oid);
-
-	if (!bound1->infinite && !bound2->infinite)
-	{
-		/*
-		 * Neither bound is infinite, use subdiff function or return default
-		 * value of 1.0 if no subdiff is available.
-		 */
-		if (has_subdiff)
-		{
-			float8		res;
-
-			res = DatumGetFloat8(FunctionCall2Coll(&typcache->rng_subdiff_finfo,
-												   typcache->rng_collation,
-												   bound2->val,
-												   bound1->val));
-			/* Reject possible NaN result, also negative result */
-			if (isnan(res) || res < 0.0)
-				return 1.0;
-			else
-				return res;
-		}
-		else
-			return 1.0;
-	}
-	else if (bound1->infinite && bound2->infinite)
-	{
-		/* Both bounds are infinite */
-		if (bound1->lower == bound2->lower)
-			return 0.0;
-		else
-			return get_float8_infinity();
-	}
-	else
-	{
-		/* One bound is infinite, the other is not */
-		return get_float8_infinity();
-	}
-}
-
-/*
- * Calculate the average of function P(x), in the interval [length1, length2],
- * where P(x) is the fraction of tuples with length < x (or length <= x if
- * 'equal' is true).
- */
-static double
-calc_length_hist_frac(const Datum *length_hist_values, int length_hist_nvalues,
-					  double length1, double length2, bool equal)
-{
-	double		frac;
-	double		A,
-				B,
-				PA,
-				PB;
-	double		pos;
-	int			i;
-	double		area;
-
-	Assert(length2 >= length1);
-
-	if (length2 < 0.0)
-		return 0.0;				/* shouldn't happen, but doesn't hurt to check */
-
-	/* All lengths in the table are <= infinite. */
-	if (isinf(length2) && equal)
-		return 1.0;
-
-	/*----------
-	 * The average of a function between A and B can be calculated by the
-	 * formula:
-	 *
-	 *			B
-	 *	  1		/
-	 * -------	| P(x)dx
-	 *	B - A	/
-	 *			A
-	 *
-	 * The geometrical interpretation of the integral is the area under the
-	 * graph of P(x). P(x) is defined by the length histogram. We calculate
-	 * the area in a piecewise fashion, iterating through the length histogram
-	 * bins. Each bin is a trapezoid:
-	 *
-	 *		 P(x2)
-	 *		  /|
-	 *		 / |
-	 * P(x1)/  |
-	 *	   |   |
-	 *	   |   |
-	 *	---+---+--
-	 *	   x1  x2
-	 *
-	 * where x1 and x2 are the boundaries of the current histogram, and P(x1)
-	 * and P(x1) are the cumulative fraction of tuples at the boundaries.
-	 *
-	 * The area of each trapezoid is 1/2 * (P(x2) + P(x1)) * (x2 - x1)
-	 *
-	 * The first bin contains the lower bound passed by the caller, so we
-	 * use linear interpolation between the previous and next histogram bin
-	 * boundary to calculate P(x1). Likewise for the last bin: we use linear
-	 * interpolation to calculate P(x2). For the bins in between, x1 and x2
-	 * lie on histogram bin boundaries, so P(x1) and P(x2) are simply:
-	 * P(x1) =	  (bin index) / (number of bins)
-	 * P(x2) = (bin index + 1 / (number of bins)
-	 */
-
-	/* First bin, the one that contains lower bound */
-	i = length_hist_bsearch(length_hist_values, length_hist_nvalues, length1, equal);
-	if (i >= length_hist_nvalues - 1)
-		return 1.0;
-
-	if (i < 0)
-	{
-		i = 0;
-		pos = 0.0;
-	}
-	else
-	{
-		/* interpolate length1's position in the bin */
-		pos = get_len_position(length1,
-							   DatumGetFloat8(length_hist_values[i]),
-							   DatumGetFloat8(length_hist_values[i + 1]));
-	}
-	PB = (((double) i) + pos) / (double) (length_hist_nvalues - 1);
-	B = length1;
-
-	/*
-	 * In the degenerate case that length1 == length2, simply return
-	 * P(length1). This is not merely an optimization: if length1 == length2,
-	 * we'd divide by zero later on.
-	 */
-	if (length2 == length1)
-		return PB;
-
-	/*
-	 * Loop through all the bins, until we hit the last bin, the one that
-	 * contains the upper bound. (if lower and upper bounds are in the same
-	 * bin, this falls out immediately)
-	 */
-	area = 0.0;
-	for (; i < length_hist_nvalues - 1; i++)
-	{
-		double		bin_upper = DatumGetFloat8(length_hist_values[i + 1]);
-
-		/* check if we've reached the last bin */
-		if (!(bin_upper < length2 || (equal && bin_upper <= length2)))
-			break;
-
-		/* the upper bound of previous bin is the lower bound of this bin */
-		A = B;
-		PA = PB;
-
-		B = bin_upper;
-		PB = (double) i / (double) (length_hist_nvalues - 1);
-
-		/*
-		 * Add the area of this trapezoid to the total. The point of the
-		 * if-check is to avoid NaN, in the corner case that PA == PB == 0,
-		 * and B - A == Inf. The area of a zero-height trapezoid (PA == PB ==
-		 * 0) is zero, regardless of the width (B - A).
-		 */
-		if (PA > 0 || PB > 0)
-			area += 0.5 * (PB + PA) * (B - A);
-	}
-
-	/* Last bin */
-	A = B;
-	PA = PB;
-
-	B = length2;				/* last bin ends at the query upper bound */
-	if (i >= length_hist_nvalues - 1)
-		pos = 0.0;
-	else
-	{
-		if (DatumGetFloat8(length_hist_values[i]) == DatumGetFloat8(length_hist_values[i + 1]))
-			pos = 0.0;
-		else
-			pos = get_len_position(length2,
-								   DatumGetFloat8(length_hist_values[i]),
-								   DatumGetFloat8(length_hist_values[i + 1]));
-	}
-	PB = (((double) i) + pos) / (double) (length_hist_nvalues - 1);
-
-	if (PA > 0 || PB > 0)
-		area += 0.5 * (PB + PA) * (B - A);
-
-	/*
-	 * Ok, we have calculated the area, ie. the integral. Divide by width to
-	 * get the requested average.
-	 *
-	 * Avoid NaN arising from infinite / infinite. This happens at least if
-	 * length2 is infinite. It's not clear what the correct value would be in
-	 * that case, so 0.5 seems as good as any value.
-	 */
-	if (isinf(area) && isinf(length2))
-		frac = 0.5;
-	else
-		frac = area / (length2 - length1);
-
-	return frac;
-}
-
-/*
- * Calculate selectivity of "var <@ const" operator, ie. estimate the fraction
- * of multiranges that fall within the constant lower and upper bounds. This uses
- * the histograms of range lower bounds and range lengths, on the assumption
- * that the range lengths are independent of the lower bounds.
- *
- * The caller has already checked that constant lower and upper bounds are
- * finite.
- */
-static double
-calc_hist_selectivity_contained(TypeCacheEntry *typcache,
-								const RangeBound *lower, RangeBound *upper,
-								const RangeBound *hist_lower, int hist_nvalues,
-								const Datum *length_hist_values, int length_hist_nvalues)
-{
-	int			i,
-				upper_index;
-	float8		prev_dist;
-	double		bin_width;
-	double		upper_bin_width;
-	double		sum_frac;
-
-	/*
-	 * Begin by finding the bin containing the upper bound, in the lower bound
-	 * histogram. Any range with a lower bound > constant upper bound can't
-	 * match, ie. there are no matches in bins greater than upper_index.
-	 */
-	upper->inclusive = !upper->inclusive;
-	upper->lower = true;
-	upper_index = rbound_bsearch(typcache, upper, hist_lower, hist_nvalues,
-								 false);
-
-	/*
-	 * If the upper bound value is below the histogram's lower limit, there
-	 * are no matches.
-	 */
-	if (upper_index < 0)
-		return 0.0;
-
-	/*
-	 * If the upper bound value is at or beyond the histogram's upper limit,
-	 * start our loop at the last actual bin, as though the upper bound were
-	 * within that bin; get_position will clamp its result to 1.0 anyway.
-	 * (This corresponds to assuming that the data population above the
-	 * histogram's upper limit is empty, exactly like what we just assumed for
-	 * the lower limit.)
-	 */
-	upper_index = Min(upper_index, hist_nvalues - 2);
-
-	/*
-	 * Calculate upper_bin_width, ie. the fraction of the (upper_index,
-	 * upper_index + 1) bin which is greater than upper bound of query range
-	 * using linear interpolation of subdiff function.
-	 */
-	upper_bin_width = get_position(typcache, upper,
-								   &hist_lower[upper_index],
-								   &hist_lower[upper_index + 1]);
-
-	/*
-	 * In the loop, dist and prev_dist are the distance of the "current" bin's
-	 * lower and upper bounds from the constant upper bound.
-	 *
-	 * bin_width represents the width of the current bin. Normally it is 1.0,
-	 * meaning a full width bin, but can be less in the corner cases: start
-	 * and end of the loop. We start with bin_width = upper_bin_width, because
-	 * we begin at the bin containing the upper bound.
-	 */
-	prev_dist = 0.0;
-	bin_width = upper_bin_width;
-
-	sum_frac = 0.0;
-	for (i = upper_index; i >= 0; i--)
-	{
-		double		dist;
-		double		length_hist_frac;
-		bool		final_bin = false;
-
-		/*
-		 * dist -- distance from upper bound of query range to lower bound of
-		 * the current bin in the lower bound histogram. Or to the lower bound
-		 * of the constant range, if this is the final bin, containing the
-		 * constant lower bound.
-		 */
-		if (range_cmp_bounds(typcache, &hist_lower[i], lower) < 0)
-		{
-			dist = get_distance(typcache, lower, upper);
-
-			/*
-			 * Subtract from bin_width the portion of this bin that we want to
-			 * ignore.
-			 */
-			bin_width -= get_position(typcache, lower, &hist_lower[i],
-									  &hist_lower[i + 1]);
-			if (bin_width < 0.0)
-				bin_width = 0.0;
-			final_bin = true;
-		}
-		else
-			dist = get_distance(typcache, &hist_lower[i], upper);
-
-		/*
-		 * Estimate the fraction of tuples in this bin that are narrow enough
-		 * to not exceed the distance to the upper bound of the query range.
-		 */
-		length_hist_frac = calc_length_hist_frac(length_hist_values,
-												 length_hist_nvalues,
-												 prev_dist, dist, true);
-
-		/*
-		 * Add the fraction of tuples in this bin, with a suitable length, to
-		 * the total.
-		 */
-		sum_frac += length_hist_frac * bin_width / (double) (hist_nvalues - 1);
-
-		if (final_bin)
-			break;
-
-		bin_width = 1.0;
-		prev_dist = dist;
-	}
-
-	return sum_frac;
-}
-
-/*
- * Calculate selectivity of "var @> const" operator, ie. estimate the fraction
- * of multiranges that contain the constant lower and upper bounds. This uses
- * the histograms of range lower bounds and range lengths, on the assumption
- * that the range lengths are independent of the lower bounds.
- */
-static double
-calc_hist_selectivity_contains(TypeCacheEntry *typcache,
-							   const RangeBound *lower, const RangeBound *upper,
-							   const RangeBound *hist_lower, int hist_nvalues,
-							   const Datum *length_hist_values, int length_hist_nvalues)
-{
-	int			i,
-				lower_index;
-	double		bin_width,
-				lower_bin_width;
-	double		sum_frac;
-	float8		prev_dist;
-
-	/* Find the bin containing the lower bound of query range. */
-	lower_index = rbound_bsearch(typcache, lower, hist_lower, hist_nvalues,
-								 true);
-
-	/*
-	 * If the lower bound value is below the histogram's lower limit, there
-	 * are no matches.
-	 */
-	if (lower_index < 0)
-		return 0.0;
-
-	/*
-	 * If the lower bound value is at or beyond the histogram's upper limit,
-	 * start our loop at the last actual bin, as though the upper bound were
-	 * within that bin; get_position will clamp its result to 1.0 anyway.
-	 * (This corresponds to assuming that the data population above the
-	 * histogram's upper limit is empty, exactly like what we just assumed for
-	 * the lower limit.)
-	 */
-	lower_index = Min(lower_index, hist_nvalues - 2);
-
-	/*
-	 * Calculate lower_bin_width, ie. the fraction of the of (lower_index,
-	 * lower_index + 1) bin which is greater than lower bound of query range
-	 * using linear interpolation of subdiff function.
-	 */
-	lower_bin_width = get_position(typcache, lower, &hist_lower[lower_index],
-								   &hist_lower[lower_index + 1]);
-
-	/*
-	 * Loop through all the lower bound bins, smaller than the query lower
-	 * bound. In the loop, dist and prev_dist are the distance of the
-	 * "current" bin's lower and upper bounds from the constant upper bound.
-	 * We begin from query lower bound, and walk backwards, so the first bin's
-	 * upper bound is the query lower bound, and its distance to the query
-	 * upper bound is the length of the query range.
-	 *
-	 * bin_width represents the width of the current bin. Normally it is 1.0,
-	 * meaning a full width bin, except for the first bin, which is only
-	 * counted up to the constant lower bound.
-	 */
-	prev_dist = get_distance(typcache, lower, upper);
-	sum_frac = 0.0;
-	bin_width = lower_bin_width;
-	for (i = lower_index; i >= 0; i--)
-	{
-		float8		dist;
-		double		length_hist_frac;
-
-		/*
-		 * dist -- distance from upper bound of query range to current value
-		 * of lower bound histogram or lower bound of query range (if we've
-		 * reach it).
-		 */
-		dist = get_distance(typcache, &hist_lower[i], upper);
-
-		/*
-		 * Get average fraction of length histogram which covers intervals
-		 * longer than (or equal to) distance to upper bound of query range.
-		 */
-		length_hist_frac =
-			1.0 - calc_length_hist_frac(length_hist_values,
-										length_hist_nvalues,
-										prev_dist, dist, false);
-
-		sum_frac += length_hist_frac * bin_width / (double) (hist_nvalues - 1);
-
-		bin_width = 1.0;
-		prev_dist = dist;
-	}
-
-	return sum_frac;
-}
-
-/*
- * Estimate join selectivity P(X < Y) using rangebound histograms.
- *
- * Based on: Diogo Repas, Zhicheng Luo, Maxime Schoemans, Mahmoud Sakr, 2022
- * "Selectivity Estimation of Inequality Joins In Databases"
- * https://doi.org/10.48550/arXiv.2206.07396
- *
- * hist1 and hist2 are arrays of RangeBound entries from the bounds histograms
- * of two range-typed or multirange-typed attributes X and Y, respectively.
- * Each array has at least 2 entries (one histogram bin).  The entries carry
- * full bound metadata (lower/upper flag, inclusive/exclusive), and all
- * comparisons use range_cmp_bounds() so that bound semantics are preserved.
- *
- * The algorithm models each attribute's distribution as a piecewise function
- * derived from its histogram, then computes:
- *   P(X < Y) = 0.5 * sum( (F_X(prev) + F_X(cur)) * (F_Y(cur) - F_Y(prev)) )
- * by parallel-scanning both histograms.
- *
- * The initial fast-forward loops skip histogram entries that fall entirely
- * before the other histogram's range, so the main loop only processes the
- * overlapping region.  Bounds checks are required because the histograms may
- * be completely disjoint (e.g., all of X is below all of Y).
- */
-static double
-calc_hist_join_selectivity(TypeCacheEntry *typcache,
-						   const RangeBound *hist1, int nhist1,
-						   const RangeBound *hist2, int nhist2)
-{
-	int			i,
-				j;
-	double		selectivity = 0.0;
-	double		prev_sel1 = -1.0;	/* negative sentinel skips first iter */
-	double		prev_sel2 = 0.0;
-
-	Assert(nhist1 > 1);
-	Assert(nhist2 > 1);
-
-	/*
-	 * Fast-forward past hist1 entries that are entirely below hist2[0], and
-	 * vice versa.  Bounds checks prevent out-of-bounds access when the
-	 * histograms are fully disjoint.
-	 */
-	for (i = 0; i < nhist1 &&
-		 range_cmp_bounds(typcache, &hist1[i], &hist2[0]) < 0; i++)
-		;
-	for (j = 0; j < nhist2 &&
-		 range_cmp_bounds(typcache, &hist2[j], &hist1[0]) < 0; j++)
-		;
-
-	/*
-	 * Handle fully-separated histograms.  When all bounds in hist1 are below
-	 * all bounds in hist2, P(X < Y) is ~1.0.  When all of hist2 is below
-	 * hist1, P(X < Y) is ~0.0.  We return immediately rather than falling
-	 * into the overlap walk with invalid indices.
-	 */
-	if (i >= nhist1)
-		return 1.0;
-	if (j >= nhist2)
-		return 0.0;
-
-	/* Walk the overlapping region of both histograms */
-	while (i < nhist1 && j < nhist2)
-	{
-		double		cur_sel1,
-					cur_sel2;
-		RangeBound	cur_sync;
-		int			cmp;
-
-		cmp = range_cmp_bounds(typcache, &hist1[i], &hist2[j]);
-		if (cmp < 0)
-			cur_sync = hist1[i++];
-		else if (cmp > 0)
-			cur_sync = hist2[j++];
-		else
-		{
-			/* Equal bounds: advance both */
-			cur_sync = hist1[i];
-			i++;
-			j++;
-		}
-		cur_sel1 = calc_hist_selectivity_scalar(typcache, &cur_sync,
-												hist1, nhist1, false);
-		cur_sel2 = calc_hist_selectivity_scalar(typcache, &cur_sync,
-												hist2, nhist2, false);
-
-		/* Skip the first iteration (no previous point yet) */
-		if (prev_sel1 >= 0)
-			selectivity += (prev_sel1 + cur_sel1) * (cur_sel2 - prev_sel2);
-
-		prev_sel1 = cur_sel1;
-		prev_sel2 = cur_sel2;
-	}
-
-	/* P(X < Y) = 0.5 * Sum(...) */
-	selectivity /= 2;
-
-	/* Include remainder of hist2 if hist1 was exhausted first */
-	if (j < nhist2)
-		selectivity += 1 - prev_sel2;
-
-	return selectivity;
-}
-
 /*
  * multirangejoinsel -- join selectivity for multirange operators
  *
diff --git a/src/backend/utils/adt/rangetypes_selfuncs.c b/src/backend/utils/adt/rangetypes_selfuncs.c
index cc702f28610..4f4baa7dc1a 100644
--- a/src/backend/utils/adt/rangetypes_selfuncs.c
+++ b/src/backend/utils/adt/rangetypes_selfuncs.c
@@ -26,6 +26,7 @@
 #include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
 #include "utils/rangetypes.h"
+#include "utils/rangetypes_selfuncs.h"
 #include "utils/selfuncs.h"
 #include "utils/typcache.h"
 
@@ -35,29 +36,6 @@ static double default_range_selectivity(Oid operator);
 static double calc_hist_selectivity(TypeCacheEntry *typcache,
 									VariableStatData *vardata, const RangeType *constval,
 									Oid operator);
-static double calc_hist_selectivity_scalar(TypeCacheEntry *typcache,
-										   const RangeBound *constbound,
-										   const RangeBound *hist, int hist_nvalues,
-										   bool equal);
-static int	rbound_bsearch(TypeCacheEntry *typcache, const RangeBound *value,
-						   const RangeBound *hist, int hist_length, bool equal);
-static float8 get_position(TypeCacheEntry *typcache, const RangeBound *value,
-						   const RangeBound *hist1, const RangeBound *hist2);
-static float8 get_len_position(double value, double hist1, double hist2);
-static float8 get_distance(TypeCacheEntry *typcache, const RangeBound *bound1,
-						   const RangeBound *bound2);
-static int	length_hist_bsearch(const Datum *length_hist_values,
-								int length_hist_nvalues, double value, bool equal);
-static double calc_length_hist_frac(const Datum *length_hist_values,
-									int length_hist_nvalues, double length1, double length2, bool equal);
-static double calc_hist_selectivity_contained(TypeCacheEntry *typcache,
-											  const RangeBound *lower, RangeBound *upper,
-											  const RangeBound *hist_lower, int hist_nvalues,
-											  const Datum *length_hist_values, int length_hist_nvalues);
-static double calc_hist_selectivity_contains(TypeCacheEntry *typcache,
-											 const RangeBound *lower, const RangeBound *upper,
-											 const RangeBound *hist_lower, int hist_nvalues,
-											 const Datum *length_hist_values, int length_hist_nvalues);
 
 /*
  * Returns a default selectivity estimate for given operator, when we don't
@@ -592,7 +570,7 @@ calc_hist_selectivity(TypeCacheEntry *typcache, VariableStatData *vardata,
  * Look up the fraction of values less than (or equal, if 'equal' argument
  * is true) a given const in a histogram of range bounds.
  */
-static double
+double
 calc_hist_selectivity_scalar(TypeCacheEntry *typcache, const RangeBound *constbound,
 							 const RangeBound *hist, int hist_nvalues, bool equal)
 {
@@ -624,7 +602,7 @@ calc_hist_selectivity_scalar(TypeCacheEntry *typcache, const RangeBound *constbo
  * goal of this function is to find a histogram bin where to stop
  * interpolation of portion of bounds which are less than or equal to given bound.
  */
-static int
+int
 rbound_bsearch(TypeCacheEntry *typcache, const RangeBound *value, const RangeBound *hist,
 			   int hist_length, bool equal)
 {
@@ -653,7 +631,7 @@ rbound_bsearch(TypeCacheEntry *typcache, const RangeBound *value, const RangeBou
  * all lengths in the histogram are greater than (greater than or equal) the
  * given length, returns -1.
  */
-static int
+int
 length_hist_bsearch(const Datum *length_hist_values, int length_hist_nvalues,
 					double value, bool equal)
 {
@@ -679,7 +657,7 @@ length_hist_bsearch(const Datum *length_hist_values, int length_hist_nvalues,
 /*
  * Get relative position of value in histogram bin in [0,1] range.
  */
-static float8
+float8
 get_position(TypeCacheEntry *typcache, const RangeBound *value, const RangeBound *hist1,
 			 const RangeBound *hist2)
 {
@@ -758,7 +736,7 @@ get_position(TypeCacheEntry *typcache, const RangeBound *value, const RangeBound
 /*
  * Get relative position of value in a length histogram bin in [0,1] range.
  */
-static double
+double
 get_len_position(double value, double hist1, double hist2)
 {
 	if (!isinf(hist1) && !isinf(hist2))
@@ -803,7 +781,7 @@ get_len_position(double value, double hist1, double hist2)
 /*
  * Measure distance between two range bounds.
  */
-static float8
+float8
 get_distance(TypeCacheEntry *typcache, const RangeBound *bound1, const RangeBound *bound2)
 {
 	bool		has_subdiff = OidIsValid(typcache->rng_subdiff_finfo.fn_oid);
@@ -851,7 +829,7 @@ get_distance(TypeCacheEntry *typcache, const RangeBound *bound1, const RangeBoun
  * where P(x) is the fraction of tuples with length < x (or length <= x if
  * 'equal' is true).
  */
-static double
+double
 calc_length_hist_frac(const Datum *length_hist_values, int length_hist_nvalues,
 					  double length1, double length2, bool equal)
 {
@@ -1014,7 +992,7 @@ calc_length_hist_frac(const Datum *length_hist_values, int length_hist_nvalues,
  * The caller has already checked that constant lower and upper bounds are
  * finite.
  */
-static double
+double
 calc_hist_selectivity_contained(TypeCacheEntry *typcache,
 								const RangeBound *lower, RangeBound *upper,
 								const RangeBound *hist_lower, int hist_nvalues,
@@ -1135,7 +1113,7 @@ calc_hist_selectivity_contained(TypeCacheEntry *typcache,
  * the histograms of range lower bounds and range lengths, on the assumption
  * that the range lengths are independent of the lower bounds.
  */
-static double
+double
 calc_hist_selectivity_contains(TypeCacheEntry *typcache,
 							   const RangeBound *lower, const RangeBound *upper,
 							   const RangeBound *hist_lower, int hist_nvalues,
@@ -1230,7 +1208,7 @@ calc_hist_selectivity_contains(TypeCacheEntry *typcache,
  * https://doi.org/10.48550/arXiv.2206.07396
  *
  * hist1 and hist2 are arrays of RangeBound entries from the bounds histograms
- * of two range-typed attributes X and Y, respectively.  Each array has at
+ * of two range- or multirange-typed attributes X and Y, respectively.  Each array has at
  * least 2 entries (one histogram bin).  The entries carry full bound metadata
  * (lower/upper flag, inclusive/exclusive), and all comparisons use
  * range_cmp_bounds() so that bound semantics are preserved.
@@ -1245,7 +1223,7 @@ calc_hist_selectivity_contains(TypeCacheEntry *typcache,
  * overlapping region.  Bounds checks are required because the histograms may
  * be completely disjoint (e.g., all of X is below all of Y).
  */
-static double
+double
 calc_hist_join_selectivity(TypeCacheEntry *typcache,
 						   const RangeBound *hist1, int nhist1,
 						   const RangeBound *hist2, int nhist2)
diff --git a/src/include/utils/rangetypes_selfuncs.h b/src/include/utils/rangetypes_selfuncs.h
new file mode 100644
index 00000000000..be6bda9ab11
--- /dev/null
+++ b/src/include/utils/rangetypes_selfuncs.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * rangetypes_selfuncs.h
+ *	  Shared helper functions for range and multirange selectivity estimation.
+ *
+ * These functions are defined in rangetypes_selfuncs.c and used by both
+ * rangetypes_selfuncs.c and multirangetypes_selfuncs.c.
+ *
+ * Portions Copyright (c) 1996-2026, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/rangetypes_selfuncs.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef RANGETYPES_SELFUNCS_H
+#define RANGETYPES_SELFUNCS_H
+
+#include "utils/rangetypes.h"
+
+extern double calc_hist_selectivity_scalar(TypeCacheEntry *typcache,
+										   const RangeBound *constbound,
+										   const RangeBound *hist, int hist_nvalues,
+										   bool equal);
+extern int	rbound_bsearch(TypeCacheEntry *typcache,
+						   const RangeBound *value, const RangeBound *hist,
+						   int hist_length, bool equal);
+extern int	length_hist_bsearch(const Datum *length_hist_values,
+								int length_hist_nvalues,
+								double value, bool equal);
+extern float8 get_position(TypeCacheEntry *typcache,
+						   const RangeBound *value,
+						   const RangeBound *hist1, const RangeBound *hist2);
+extern double get_len_position(double value, double hist1, double hist2);
+extern float8 get_distance(TypeCacheEntry *typcache,
+						   const RangeBound *bound1, const RangeBound *bound2);
+extern double calc_length_hist_frac(const Datum *length_hist_values,
+									int length_hist_nvalues,
+									double length1, double length2, bool equal);
+extern double calc_hist_selectivity_contained(TypeCacheEntry *typcache,
+											  const RangeBound *lower, RangeBound *upper,
+											  const RangeBound *hist_lower, int hist_nvalues,
+											  const Datum *length_hist_values,
+											  int length_hist_nvalues);
+extern double calc_hist_selectivity_contains(TypeCacheEntry *typcache,
+											 const RangeBound *lower, const RangeBound *upper,
+											 const RangeBound *hist_lower, int hist_nvalues,
+											 const Datum *length_hist_values,
+											 int length_hist_nvalues);
+extern double calc_hist_join_selectivity(TypeCacheEntry *typcache,
+										 const RangeBound *hist1, int nhist1,
+										 const RangeBound *hist2, int nhist2);
+
+#endif							/* RANGETYPES_SELFUNCS_H */
-- 
2.50.1 (Apple Git-155)
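
Editor's sketch (not part of the patch): the trapezoidal estimate described in the calc_hist_join_selectivity() header comment can be tried on plain double histograms. The code below assumes equi-depth histograms with linear interpolation inside each bin. hist_cdf() and join_sel_less() are hypothetical stand-ins for calc_hist_selectivity_scalar() and calc_hist_join_selectivity(). Unlike the patch, they work on doubles rather than RangeBound, walk the full merged list of bounds, and omit the fast-forward loops.

```c
#include <assert.h>

/*
 * Fraction of values below x in an equi-depth histogram h[0..n-1]
 * (sorted ascending, n >= 2), interpolating linearly inside a bin.
 * Hypothetical stand-in for calc_hist_selectivity_scalar().
 */
static double
hist_cdf(const double *h, int n, double x)
{
    int i;

    assert(n >= 2);
    if (x <= h[0])
        return 0.0;
    if (x >= h[n - 1])
        return 1.0;

    /* advance to the bin with h[i] <= x < h[i + 1] */
    for (i = 0; i < n - 2 && x >= h[i + 1]; i++)
        ;
    return (i + (x - h[i]) / (h[i + 1] - h[i])) / (n - 1);
}

/*
 * P(X < Y) = 0.5 * sum((F_X(prev) + F_X(cur)) * (F_Y(cur) - F_Y(prev))),
 * evaluated at every bound of either histogram in ascending order.
 * Both CDFs are linear between consecutive merged bounds, so each
 * trapezoid is exact for this piecewise-linear model.
 */
static double
join_sel_less(const double *hx, int nx, const double *hy, int ny)
{
    int    i = 0, j = 0;
    double sum = 0.0;
    double prev_fx = 0.0, prev_fy = 0.0;  /* both CDFs are 0 at the minimum */

    while (i < nx || j < ny)
    {
        double p, fx, fy;

        /* take the smaller of the two next bounds (hx first on ties) */
        if (j >= ny || (i < nx && hx[i] <= hy[j]))
            p = hx[i++];
        else
            p = hy[j++];

        fx = hist_cdf(hx, nx, p);
        fy = hist_cdf(hy, ny, p);
        sum += (prev_fx + fx) * (fy - prev_fy);
        prev_fx = fx;
        prev_fy = fy;
    }
    return 0.5 * sum;
}
```

For X uniform on [0, 10] and Y uniform on [5, 15], calling join_sel_less() on the histograms {0, 10} and {5, 15} gives 0.875, which agrees with the closed form 1 - 0.125. Identical histograms give 0.5, and fully separated ones give 1.0 or 0.0, in line with the early returns in the patch.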




* Re: Implement missing join selectivity estimation for range types

From: Haibo Yan @ 2026-04-18 04:02 UTC (permalink / raw)
  To: SCHOEMANS Maxime <[email protected]>; +Cc: vignesh C <[email protected]>; Tom Lane <[email protected]>; Damir Belyalov <[email protected]>; jian he <[email protected]>; PostgreSQL Hackers <[email protected]>; SAKR Mahmoud <[email protected]>; Diogo Repas <[email protected]>; Andrey Lepikhov <[email protected]>

On Thu, Apr 16, 2026 at 8:13 AM SCHOEMANS Maxime <[email protected]> wrote:

> Hi Haibo,
>
> Attached is v7 with the changes we discussed.
>
> Patch 2 now has an inline comment on the && case explaining the
> outer-bounds approximation and its consistency with existing restriction
> selectivity. The commit message mentions it as well.
>
> Patch 3 uses a separate backend-private header (rangetypes_selfuncs.h)
> instead of selfuncs.h.
>
> Regards,
> Maxime
>

Hi Maxime,

Thanks for the updated series. Overall I do not have major objections to
the direction here.

A few small nits on patch 2:

   1. In the commit message, I wonder if “the core algorithm is identical”
   is a bit stronger than necessary. Since the main point is that we are
   reusing the same approximation based on outer bounds, something like “the
   same outer-bounds-based estimator can be reused” might be a bit more
   precise.
   2. In a few comments, the wording still says just “range”, but in patch
   2 we are really dealing with range/multirange combinations. I think it
   would be a bit clearer to make that explicit where appropriate, and reserve
   “range” for the underlying range-type/bound-comparison level.
   3. I think it would be good to add the reverse mixed-direction test as
   well, since patch 2 covers multirange × range in addition to range ×
   multirange. Something like:

--------------------------------------------------------
explain (costs off)
select count(*)
from test_mr_join_mr a, test_mr_join_r b
where a.mr << b.r;

explain (costs off)
select count(*)
from test_mr_join_mr a, test_mr_join_r b
where a.mr >> b.r;

explain (costs off)
select count(*)
from test_mr_join_mr a, test_mr_join_r b
where a.mr && b.r;
--------------------------------------------------------

I think that would make the mixed-case coverage feel more complete.

Regards,
Haibo
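
Editor's sketch (not from the thread) of why the mixed multirange/range cases deserve their own tests. It assumes the estimator reduces a multirange to its outer bounds, i.e. the lowest lower bound and the highest upper bound. Under that assumption << and >> are decided exactly by the outer bounds, while && can report an overlap that actually falls in a gap between the component ranges. All type and function names here are hypothetical, and bounds are simplified to closed intervals of doubles.

```c
#include <assert.h>
#include <stdbool.h>

/* closed interval [lo, hi] with lo <= hi (hypothetical, simplified) */
typedef struct
{
    double lo;
    double hi;
} Range;

/* n >= 1 disjoint ranges, sorted by lower bound (hypothetical) */
typedef struct
{
    const Range *parts;
    int          n;
} Multirange;

/* smallest single range that covers the whole multirange */
static Range
outer_bounds(Multirange m)
{
    Range r;

    assert(m.n >= 1);
    r.lo = m.parts[0].lo;
    r.hi = m.parts[m.n - 1].hi;
    return r;
}

/* a << b: a lies strictly to the left of b */
static bool
range_left(Range a, Range b)
{
    return a.hi < b.lo;
}

/* a && b: two non-empty ranges overlap iff neither is left of the other */
static bool
range_overlaps(Range a, Range b)
{
    return !range_left(a, b) && !range_left(b, a);
}

/* exact mr && r: some component range overlaps r */
static bool
multirange_overlaps_range(Multirange m, Range r)
{
    for (int i = 0; i < m.n; i++)
        if (range_overlaps(m.parts[i], r))
            return true;
    return false;
}

/* approximate mr && r: only the outer bounds are consulted */
static bool
outer_overlaps_range(Multirange m, Range r)
{
    return range_overlaps(outer_bounds(m), r);
}
```

With the multirange {[1,2], [8,9]} and the range [4,5], outer_overlaps_range() returns true while multirange_overlaps_range() returns false, so an outer-bounds estimate for && can only overcount. For the range [10,11], range_left() on the outer bounds gives the exact answer for <<.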



* Re: Implement missing join selectivity estimation for range types

From: SCHOEMANS Maxime @ 2026-04-21 13:54 UTC (permalink / raw)
  To: Haibo Yan <[email protected]>; +Cc: vignesh C <[email protected]>; Tom Lane <[email protected]>; Damir Belyalov <[email protected]>; jian he <[email protected]>; PostgreSQL Hackers <[email protected]>; SAKR Mahmoud <[email protected]>; Diogo Repas <[email protected]>; Andrey Lepikhov <[email protected]>

Hi Haibo,

Thanks for the continued feedback. Attached is v8 addressing your nits
on patch 2:

- Reworded the commit message to say "the same outer-bounds-based
  estimator can be reused" instead of implying the code is just
  duplicated.
- Made comments in multirangejoinsel type-neutral where they
  unnecessarily said "range" (e.g. "bound histograms" instead of
  "range histograms", "empty values" instead of "empty ranges").
- Added the reverse mixed-direction tests (multirange x range).

Regards,
Maxime


Attachments:

  [application/octet-stream] v8-0001-Improve-range-join-selectivity-estimation-for.patch (20.7K, 3-v8-0001-Improve-range-join-selectivity-estimation-for.patch)
From d19997bb7587076cc1e3fc0c81a1a06655f93bb5 Mon Sep 17 00:00:00 2001
From: Maxime Schoemans <[email protected]>
Date: Mon, 13 Apr 2026 16:04:23 +0200
Subject: [PATCH v8 1/3] Improve range join selectivity estimation for <<, >>,
 &&

Teach rangejoinsel to estimate join selectivity for range operators
using bound histogram statistics instead of falling back to fixed
defaults. The estimation is based on a trapezoidal approximation of
P(X < Y) by parallel-scanning the bound histograms of both sides.

This improves planner row estimates especially when the two range
columns have clearly separated or strongly overlapping distributions.

Regression tests cover plan changes for representative range join cases.

Based on: Repas, Luo, Schoemans, Sakr (2022) "Selectivity Estimation
of Inequality Joins In Databases"
https://doi.org/10.48550/arXiv.2206.07396
---
 src/backend/utils/adt/rangetypes_selfuncs.c | 300 ++++++++++++++++++++
 src/include/catalog/pg_operator.dat         |   6 +-
 src/include/catalog/pg_proc.dat             |   4 +
 src/test/regress/expected/rangetypes.out    | 114 ++++++++
 src/test/regress/sql/rangetypes.sql         |  53 ++++
 5 files changed, 474 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/rangetypes_selfuncs.c b/src/backend/utils/adt/rangetypes_selfuncs.c
index 75f1e7567d5..cc702f28610 100644
--- a/src/backend/utils/adt/rangetypes_selfuncs.c
+++ b/src/backend/utils/adt/rangetypes_selfuncs.c
@@ -1221,3 +1221,303 @@ calc_hist_selectivity_contains(TypeCacheEntry *typcache,
 
 	return sum_frac;
 }
+
+/*
+ * Estimate join selectivity P(X < Y) using rangebound histograms.
+ *
+ * Based on: Diogo Repas, Zhicheng Luo, Maxime Schoemans, Mahmoud Sakr, 2022
+ * "Selectivity Estimation of Inequality Joins In Databases"
+ * https://doi.org/10.48550/arXiv.2206.07396
+ *
+ * hist1 and hist2 are arrays of RangeBound entries from the bounds histograms
+ * of two range-typed attributes X and Y, respectively.  Each array has at
+ * least 2 entries (one histogram bin).  The entries carry full bound metadata
+ * (lower/upper flag, inclusive/exclusive), and all comparisons use
+ * range_cmp_bounds() so that bound semantics are preserved.
+ *
+ * The algorithm models each attribute's distribution as a piecewise function
+ * derived from its histogram, then computes:
+ *   P(X < Y) = 0.5 * sum( (F_X(prev) + F_X(cur)) * (F_Y(cur) - F_Y(prev)) )
+ * by parallel-scanning both histograms.
+ *
+ * The initial fast-forward loops skip histogram entries that fall entirely
+ * before the other histogram's range, so the main loop only processes the
+ * overlapping region.  Bounds checks are required because the histograms may
+ * be completely disjoint (e.g., all of X is below all of Y).
+ */
+static double
+calc_hist_join_selectivity(TypeCacheEntry *typcache,
+						   const RangeBound *hist1, int nhist1,
+						   const RangeBound *hist2, int nhist2)
+{
+	int			i,
+				j;
+	double		selectivity = 0.0;
+	double		prev_sel1 = -1.0;	/* negative sentinel skips first iter */
+	double		prev_sel2 = 0.0;
+
+	Assert(nhist1 > 1);
+	Assert(nhist2 > 1);
+
+	/*
+	 * Fast-forward past hist1 entries that are entirely below hist2[0], and
+	 * vice versa.  Bounds checks prevent out-of-bounds access when the
+	 * histograms are fully disjoint.
+	 */
+	for (i = 0; i < nhist1 &&
+		 range_cmp_bounds(typcache, &hist1[i], &hist2[0]) < 0; i++)
+		;
+	for (j = 0; j < nhist2 &&
+		 range_cmp_bounds(typcache, &hist2[j], &hist1[0]) < 0; j++)
+		;
+
+	/*
+	 * Handle fully-separated histograms.  When all bounds in hist1 are below
+	 * all bounds in hist2, P(X < Y) is ~1.0.  When all of hist2 is below
+	 * hist1, P(X < Y) is ~0.0.  We return immediately rather than falling
+	 * into the overlap walk with invalid indices.
+	 */
+	if (i >= nhist1)
+		return 1.0;
+	if (j >= nhist2)
+		return 0.0;
+
+	/* Walk the overlapping region of both histograms */
+	while (i < nhist1 && j < nhist2)
+	{
+		double		cur_sel1,
+					cur_sel2;
+		RangeBound	cur_sync;
+		int			cmp;
+
+		cmp = range_cmp_bounds(typcache, &hist1[i], &hist2[j]);
+		if (cmp < 0)
+			cur_sync = hist1[i++];
+		else if (cmp > 0)
+			cur_sync = hist2[j++];
+		else
+		{
+			/* Equal bounds: advance both */
+			cur_sync = hist1[i];
+			i++;
+			j++;
+		}
+		cur_sel1 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist1, nhist1, false);
+		cur_sel2 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist2, nhist2, false);
+
+		/* Skip the first iteration (no previous point yet) */
+		if (prev_sel1 >= 0)
+			selectivity += (prev_sel1 + cur_sel1) * (cur_sel2 - prev_sel2);
+
+		prev_sel1 = cur_sel1;
+		prev_sel2 = cur_sel2;
+	}
+
+	/* P(X < Y) = 0.5 * Sum(...) */
+	selectivity /= 2;
+
+	/* If hist1 was exhausted first, the rest of Y lies above all of X */
+	if (j < nhist2)
+		selectivity += 1 - prev_sel2;
+
+	return selectivity;
+}
+
+/*
+ * rangejoinsel -- join selectivity for range-vs-range operators
+ *
+ * Supports: <<, >>, &&
+ * These operators map directly to strict bound comparisons P(X < Y),
+ * which calc_hist_join_selectivity() estimates from bound histograms.
+ * Other range operators are left to their existing generic estimators.
+ */
+Datum
+rangejoinsel(PG_FUNCTION_ARGS)
+{
+	PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+	Oid			operator = PG_GETARG_OID(1);
+	List	   *args = (List *) PG_GETARG_POINTER(2);
+	SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) PG_GETARG_POINTER(4);
+	VariableStatData vardata1;
+	VariableStatData vardata2;
+	Selectivity selec;
+	AttStatsSlot hist1;
+	AttStatsSlot hist2;
+	AttStatsSlot sslot;
+	bool		have_hist1 = false;
+	bool		have_hist2 = false;
+	TypeCacheEntry *typcache;
+	Form_pg_statistic stats1;
+	Form_pg_statistic stats2;
+	double		empty_frac1;
+	double		empty_frac2;
+	double		null_frac1;
+	double		null_frac2;
+	int			nhist1;
+	int			nhist2;
+	RangeBound *hist1_lower;
+	RangeBound *hist1_upper;
+	RangeBound *hist2_lower;
+	RangeBound *hist2_upper;
+	bool		join_is_reversed;
+	bool		empty;
+	int			i;
+
+	get_join_variables(root, args, sjinfo, &vardata1, &vardata2,
+					   &join_is_reversed);
+
+	selec = default_range_selectivity(operator);
+
+	/*
+	 * Acquire histogram stats for both sides.  Each slot is tracked
+	 * independently so we can release exactly what was acquired on any
+	 * failure path.
+	 */
+	if (!HeapTupleIsValid(vardata1.statsTuple) ||
+		!HeapTupleIsValid(vardata2.statsTuple))
+		goto cleanup;
+
+	if (vardata1.vartype != vardata2.vartype)
+		goto cleanup;
+
+	memset(&hist1, 0, sizeof(hist1));
+	memset(&hist2, 0, sizeof(hist2));
+
+	if (!get_attstatsslot(&hist1, vardata1.statsTuple,
+						  STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						  ATTSTATSSLOT_VALUES))
+		goto cleanup;
+	have_hist1 = true;
+
+	if (!get_attstatsslot(&hist2, vardata2.statsTuple,
+						  STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						  ATTSTATSSLOT_VALUES))
+		goto cleanup;
+	have_hist2 = true;
+
+	/* Initialize type cache */
+	typcache = range_get_typcache(fcinfo, vardata1.vartype);
+
+	/* Look up NULL and empty-range fractions */
+	stats1 = (Form_pg_statistic) GETSTRUCT(vardata1.statsTuple);
+	stats2 = (Form_pg_statistic) GETSTRUCT(vardata2.statsTuple);
+
+	null_frac1 = stats1->stanullfrac;
+	null_frac2 = stats2->stanullfrac;
+
+	/* Try to get fraction of empty ranges for the first variable */
+	if (get_attstatsslot(&sslot, vardata1.statsTuple,
+						 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+						 InvalidOid, ATTSTATSSLOT_NUMBERS))
+	{
+		if (sslot.nnumbers != 1)
+			elog(ERROR, "invalid empty fraction statistic");
+		empty_frac1 = sslot.numbers[0];
+		free_attstatsslot(&sslot);
+	}
+	else
+	{
+		empty_frac1 = 0.0;
+	}
+
+	/* Try to get fraction of empty ranges for the second variable */
+	if (get_attstatsslot(&sslot, vardata2.statsTuple,
+						 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+						 InvalidOid, ATTSTATSSLOT_NUMBERS))
+	{
+		if (sslot.nnumbers != 1)
+			elog(ERROR, "invalid empty fraction statistic");
+		empty_frac2 = sslot.numbers[0];
+		free_attstatsslot(&sslot);
+	}
+	else
+	{
+		empty_frac2 = 0.0;
+	}
+
+	/* Convert range histograms to separate lower/upper bound arrays */
+	nhist1 = hist1.nvalues;
+	hist1_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+	hist1_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+	for (i = 0; i < nhist1; i++)
+	{
+		range_deserialize(typcache, DatumGetRangeTypeP(hist1.values[i]),
+						  &hist1_lower[i], &hist1_upper[i], &empty);
+		if (empty)
+			elog(ERROR, "bounds histogram contains an empty range");
+	}
+
+	nhist2 = hist2.nvalues;
+	hist2_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+	hist2_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+	for (i = 0; i < nhist2; i++)
+	{
+		range_deserialize(typcache, DatumGetRangeTypeP(hist2.values[i]),
+						  &hist2_lower[i], &hist2_upper[i], &empty);
+		if (empty)
+			elog(ERROR, "bounds histogram contains an empty range");
+	}
+
+	/* Estimate selectivity based on the operator */
+	switch (operator)
+	{
+		case OID_RANGE_OVERLAP_OP:
+
+			/*
+			 * A && B iff NOT (A << B) AND NOT (A >> B), so P(A && B) =
+			 * 1 - P(A.upper < B.lower) - P(B.upper < A.lower).
+			 */
+			selec = 1;
+			selec -= calc_hist_join_selectivity(typcache,
+												hist1_upper, nhist1,
+												hist2_lower, nhist2);
+			selec -= calc_hist_join_selectivity(typcache,
+												hist2_upper, nhist2,
+												hist1_lower, nhist1);
+			break;
+
+		case OID_RANGE_LEFT_OP:
+			/* A << B iff upper(A) < lower(B) */
+			selec = calc_hist_join_selectivity(typcache,
+											   hist1_upper, nhist1,
+											   hist2_lower, nhist2);
+			break;
+
+		case OID_RANGE_RIGHT_OP:
+			/* A >> B iff upper(B) < lower(A) */
+			selec = calc_hist_join_selectivity(typcache,
+											   hist2_upper, nhist2,
+											   hist1_lower, nhist1);
+			break;
+
+		default:
+			/* Unsupported operator; keep the default selectivity */
+			goto cleanup;
+	}
+
+	/* The histogram-based selectivity applies to non-empty ranges only */
+	selec *= (1 - empty_frac1) * (1 - empty_frac2);
+
+	/*
+	 * For the supported operators (<<, >>, &&), empty ranges always produce
+	 * false, so no additive term for empty ranges is needed.
+	 */
+
+	/* All range operators are strict */
+	selec *= (1 - null_frac1) * (1 - null_frac2);
+
+cleanup:
+	if (have_hist2)
+		free_attstatsslot(&hist2);
+	if (have_hist1)
+		free_attstatsslot(&hist1);
+
+	ReleaseVariableStats(vardata1);
+	ReleaseVariableStats(vardata2);
+
+	CLAMP_PROBABILITY(selec);
+
+	PG_RETURN_FLOAT8((float8) selec);
+}
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 1465f13120a..5ea4434f9fa 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -3094,7 +3094,7 @@
   oprname => '&&', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '&&(anyrange,anyrange)',
   oprcode => 'range_overlaps', oprrest => 'rangesel',
-  oprjoin => 'areajoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3889', oid_symbol => 'OID_RANGE_CONTAINS_ELEM_OP',
   descr => 'contains',
   oprname => '@>', oprleft => 'anyrange', oprright => 'anyelement',
@@ -3122,12 +3122,12 @@
   oprname => '<<', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '>>(anyrange,anyrange)',
   oprcode => 'range_before', oprrest => 'rangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3894', oid_symbol => 'OID_RANGE_RIGHT_OP', descr => 'is right of',
   oprname => '>>', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '<<(anyrange,anyrange)',
   oprcode => 'range_after', oprrest => 'rangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'rangejoinsel' },
 { oid => '3895', oid_symbol => 'OID_RANGE_OVERLAPS_LEFT_OP',
   descr => 'overlaps or is left of',
   oprname => '&<', oprleft => 'anyrange', oprright => 'anyrange',
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 99fa9a6ede2..c6a707acae4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12919,4 +12919,8 @@
   proname => 'hashoid8extended', prorettype => 'int8',
   proargtypes => 'oid8 int8', prosrc => 'hashoid8extended' },
 
+{ oid => '8355', descr => 'join selectivity for range operators',
+  proname => 'rangejoinsel', provolatile => 's', prorettype => 'float8',
+  proargtypes => 'internal oid internal int2 internal',
+  prosrc => 'rangejoinsel' },
 ]
diff --git a/src/test/regress/expected/rangetypes.out b/src/test/regress/expected/rangetypes.out
index e062a4e5c2c..2fc5b770f90 100644
--- a/src/test/regress/expected/rangetypes.out
+++ b/src/test/regress/expected/rangetypes.out
@@ -2033,3 +2033,117 @@ select * from text_support_test where t <@ textrange_supp('a', 'd');
 
 drop table text_support_test;
 drop type textrange_supp;
+--
+-- test selectivity of range join operators
+--
+create table test_range_join_1 (ir1 int4range);
+create table test_range_join_2 (ir2 int4range);
+create table test_range_join_3 (ir3 int4range);
+insert into test_range_join_1 select int4range(g, g+10) from generate_series(1, 1000) g;
+insert into test_range_join_1 select int4range(g, g+100) from generate_series(1, 1000, 10) g;
+insert into test_range_join_2 select int4range(g, g+10) from generate_series(1, 500) g;
+insert into test_range_join_2 select int4range(g, g+100) from generate_series(1, 500, 10) g;
+insert into test_range_join_3 select int4range(g, g+10) from generate_series(501, 1000) g;
+insert into test_range_join_3 select int4range(g, g+100) from generate_series(501, 1000, 10) g;
+analyze test_range_join_1;
+analyze test_range_join_2;
+analyze test_range_join_3;
+-- reorder joins based on computed selectivity
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 && ir2 and ir2 && ir3;
+                                    QUERY PLAN                                     
+-----------------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_range_join_1.ir1 && test_range_join_2.ir2)
+         ->  Seq Scan on test_range_join_1
+         ->  Materialize
+               ->  Nested Loop
+                     Join Filter: (test_range_join_2.ir2 && test_range_join_3.ir3)
+                     ->  Seq Scan on test_range_join_2
+                     ->  Materialize
+                           ->  Seq Scan on test_range_join_3
+(10 rows)
+
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 << ir2 and ir2 << ir3;
+                                 QUERY PLAN                                  
+-----------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_range_join_2.ir2 << test_range_join_3.ir3)
+         ->  Nested Loop
+               Join Filter: (test_range_join_1.ir1 << test_range_join_2.ir2)
+               ->  Seq Scan on test_range_join_1
+               ->  Materialize
+                     ->  Seq Scan on test_range_join_2
+         ->  Materialize
+               ->  Seq Scan on test_range_join_3
+(10 rows)
+
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 >> ir2 and ir2 >> ir3;
+                                 QUERY PLAN                                  
+-----------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_range_join_1.ir1 >> test_range_join_2.ir2)
+         ->  Nested Loop
+               Join Filter: (test_range_join_2.ir2 >> test_range_join_3.ir3)
+               ->  Seq Scan on test_range_join_2
+               ->  Materialize
+                     ->  Seq Scan on test_range_join_3
+         ->  Seq Scan on test_range_join_1
+(9 rows)
+
+drop table test_range_join_1;
+drop table test_range_join_2;
+drop table test_range_join_3;
+--
+-- test range join selectivity with fully disjoint histograms
+-- (exercises the bounds-check logic when histograms do not overlap)
+--
+create table test_range_join_lo (r int4range);
+create table test_range_join_hi (r int4range);
+-- low ranges: [1,11), [2,12), ... [500,510)
+insert into test_range_join_lo select int4range(g, g+10) from generate_series(1, 500) g;
+-- high ranges: [10001,10011), [10002,10012), ... [10500,10510)
+insert into test_range_join_hi select int4range(g, g+10) from generate_series(10001, 10500) g;
+analyze test_range_join_lo;
+analyze test_range_join_hi;
+-- lo << hi should produce a large selectivity (most pairs match)
+-- lo >> hi should produce a near-zero selectivity
+-- lo && hi should produce a near-zero selectivity (no overlap)
+-- These should not crash and should produce stable plans.
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r << b.r;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r << b.r)
+         ->  Seq Scan on test_range_join_lo a
+         ->  Materialize
+               ->  Seq Scan on test_range_join_hi b
+(6 rows)
+
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r >> b.r;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r >> b.r)
+         ->  Seq Scan on test_range_join_lo a
+         ->  Materialize
+               ->  Seq Scan on test_range_join_hi b
+(6 rows)
+
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r && b.r;
+                     QUERY PLAN                     
+----------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r && b.r)
+         ->  Seq Scan on test_range_join_lo a
+         ->  Materialize
+               ->  Seq Scan on test_range_join_hi b
+(6 rows)
+
+drop table test_range_join_lo;
+drop table test_range_join_hi;
diff --git a/src/test/regress/sql/rangetypes.sql b/src/test/regress/sql/rangetypes.sql
index 5c4b0337b7a..f69109da334 100644
--- a/src/test/regress/sql/rangetypes.sql
+++ b/src/test/regress/sql/rangetypes.sql
@@ -708,3 +708,56 @@ select * from text_support_test where t <@ textrange_supp('a', 'd');
 drop table text_support_test;
 
 drop type textrange_supp;
+
+--
+-- test selectivity of range join operators
+--
+create table test_range_join_1 (ir1 int4range);
+create table test_range_join_2 (ir2 int4range);
+create table test_range_join_3 (ir3 int4range);
+
+insert into test_range_join_1 select int4range(g, g+10) from generate_series(1, 1000) g;
+insert into test_range_join_1 select int4range(g, g+100) from generate_series(1, 1000, 10) g;
+insert into test_range_join_2 select int4range(g, g+10) from generate_series(1, 500) g;
+insert into test_range_join_2 select int4range(g, g+100) from generate_series(1, 500, 10) g;
+insert into test_range_join_3 select int4range(g, g+10) from generate_series(501, 1000) g;
+insert into test_range_join_3 select int4range(g, g+100) from generate_series(501, 1000, 10) g;
+
+analyze test_range_join_1;
+analyze test_range_join_2;
+analyze test_range_join_3;
+
+-- reorder joins based on computed selectivity
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 && ir2 and ir2 && ir3;
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 << ir2 and ir2 << ir3;
+explain (costs off) select count(*) from test_range_join_1, test_range_join_2, test_range_join_3 where ir1 >> ir2 and ir2 >> ir3;
+
+drop table test_range_join_1;
+drop table test_range_join_2;
+drop table test_range_join_3;
+
+--
+-- test range join selectivity with fully disjoint histograms
+-- (exercises the bounds-check logic when histograms do not overlap)
+--
+create table test_range_join_lo (r int4range);
+create table test_range_join_hi (r int4range);
+
+-- low ranges: [1,11), [2,12), ... [500,510)
+insert into test_range_join_lo select int4range(g, g+10) from generate_series(1, 500) g;
+-- high ranges: [10001,10011), [10002,10012), ... [10500,10510)
+insert into test_range_join_hi select int4range(g, g+10) from generate_series(10001, 10500) g;
+
+analyze test_range_join_lo;
+analyze test_range_join_hi;
+
+-- lo << hi should produce a large selectivity (most pairs match)
+-- lo >> hi should produce a near-zero selectivity
+-- lo && hi should produce a near-zero selectivity (no overlap)
+-- These should not crash and should produce stable plans.
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r << b.r;
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r >> b.r;
+explain (costs off) select count(*) from test_range_join_lo a, test_range_join_hi b where a.r && b.r;
+
+drop table test_range_join_lo;
+drop table test_range_join_hi;
-- 
2.50.1 (Apple Git-155)
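
The sum that calc_hist_join_selectivity() evaluates in the patch above can be
illustrated outside the backend.  What follows is only a minimal sketch, not
the patch's code: plain doubles stand in for RangeBound, the names hist_cdf()
and join_sel_lt() are hypothetical, and the fast-forward optimization is
omitted, so every bound of both histograms is visited.  It assumes equi-depth
histograms with linear interpolation inside each bin.

```c
#include <assert.h>

/*
 * CDF at value v of an equi-depth histogram with n sorted bounds (n - 1
 * bins), interpolating linearly inside a bin.  Hypothetical helper, standing
 * in for calc_hist_selectivity_scalar().
 */
static double
hist_cdf(const double *hist, int n, double v)
{
	int			k;

	assert(n > 1);
	if (v <= hist[0])
		return 0.0;
	if (v >= hist[n - 1])
		return 1.0;

	/* Find the bin with hist[k - 1] <= v < hist[k] */
	for (k = 1; k < n; k++)
	{
		if (v < hist[k])
			break;
	}
	return ((k - 1) + (v - hist[k - 1]) / (hist[k] - hist[k - 1])) / (n - 1);
}

/*
 * P(X < Y) = 0.5 * sum((F_X(prev) + F_X(cur)) * (F_Y(cur) - F_Y(prev))),
 * taken over the merged, sorted bounds of both histograms.
 */
double
join_sel_lt(const double *hx, int nx, const double *hy, int ny)
{
	int			i = 0;
	int			j = 0;
	double		sum = 0.0;
	double		prev_fx = 0.0;
	double		prev_fy = 0.0;

	while (i < nx || j < ny)
	{
		double		v;
		double		fx;
		double		fy;

		/* Take the smaller of the two next bounds as the sync point */
		if (j >= ny || (i < nx && hx[i] <= hy[j]))
			v = hx[i++];
		else
			v = hy[j++];

		fx = hist_cdf(hx, nx, v);
		fy = hist_cdf(hy, ny, v);
		sum += (prev_fx + fx) * (fy - prev_fy);
		prev_fx = fx;
		prev_fy = fy;
	}
	return 0.5 * sum;
}
```

Under these assumptions, two identical histograms give 0.5 (the sum
telescopes to 1), and a histogram lying entirely below the other gives 1.0,
which matches the early returns for fully separated histograms in the patch.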



  [application/octet-stream] v8-0002-Improve-multirange-join-selectivity-estimation-fo.patch (28.2K, 4-v8-0002-Improve-multirange-join-selectivity-estimation-fo.patch)
From ec3953fc66211b5c74f4b5704ccb934d0b981abc Mon Sep 17 00:00:00 2001
From: Maxime Schoemans <[email protected]>
Date: Mon, 13 Apr 2026 16:06:03 +0200
Subject: [PATCH v8 2/3] Improve multirange join selectivity estimation for <<,
 >>, &&

Add multirangejoinsel() to estimate join selectivity for multirange
operators using bound histograms, covering all type combinations:
multirange vs multirange, multirange vs range, range vs multirange.

Note that multirange statistics only represent the outermost bounds
(see multirange_typanalyze), so && may overestimate overlap for sparse
multiranges. This is consistent with how existing restriction
selectivity handles multirange &&.

Since multirange bound histograms have the same structure as range
bound histograms, the same outer-bounds-based estimator can be reused.
The helper functions are intentionally duplicated from
rangetypes_selfuncs.c for reviewability; a follow-up commit will
remove the duplication.
---
 .../utils/adt/multirangetypes_selfuncs.c      | 325 ++++++++++++++++++
 src/include/catalog/pg_operator.dat           |  18 +-
 src/include/catalog/pg_proc.dat               |   4 +
 src/test/regress/expected/multirangetypes.out | 191 ++++++++++
 src/test/regress/sql/multirangetypes.sql      |  77 +++++
 5 files changed, 606 insertions(+), 9 deletions(-)
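
To make the note about sparse multiranges concrete, here is a small sketch of
why an estimator that sees only the outermost bounds can report an overlap
that the actual multirange does not have.  It is an illustration under stated
assumptions, not code from the patch: the Span type and the functions
bounding_span() and multi_overlaps() are hypothetical, spans are non-empty
half-open integer intervals [lo, hi), and a multirange is modeled as a sorted
array of disjoint spans.

```c
#include <assert.h>
#include <stdbool.h>

/* Non-empty half-open interval [lo, hi); hypothetical stand-in for a range */
typedef struct Span
{
	int			lo;
	int			hi;
} Span;

/* a << b: a lies strictly to the left of b */
static bool
span_before(Span a, Span b)
{
	return a.hi <= b.lo;
}

/* a && b iff NOT (a << b) AND NOT (b << a) */
static bool
span_overlaps(Span a, Span b)
{
	return !span_before(a, b) && !span_before(b, a);
}

/* Outermost bounds of a sorted array of disjoint spans */
Span
bounding_span(const Span *parts, int n)
{
	Span		box;

	assert(n > 0);
	box.lo = parts[0].lo;
	box.hi = parts[n - 1].hi;
	return box;
}

/* True overlap test: does any member span overlap r? */
bool
multi_overlaps(const Span *parts, int n, Span r)
{
	int			k;

	for (k = 0; k < n; k++)
	{
		if (span_overlaps(parts[k], r))
			return true;
	}
	return false;
}
```

For the multirange {[1,2), [100,101)} and the range [50,51), the bounding span
[1,101) overlaps the range while no member span does.  Statistics built from
outer bounds alone cannot distinguish the two cases, hence the possible
overestimate for && mentioned above.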

diff --git a/src/backend/utils/adt/multirangetypes_selfuncs.c b/src/backend/utils/adt/multirangetypes_selfuncs.c
index 533111445e7..e3d5c527e03 100644
--- a/src/backend/utils/adt/multirangetypes_selfuncs.c
+++ b/src/backend/utils/adt/multirangetypes_selfuncs.c
@@ -1334,3 +1334,328 @@ calc_hist_selectivity_contains(TypeCacheEntry *typcache,
 
 	return sum_frac;
 }
+
+/*
+ * Estimate join selectivity P(X < Y) using range bound histograms.
+ *
+ * Based on: Diogo Repas, Zhicheng Luo, Maxime Schoemans, Mahmoud Sakr, 2022
+ * "Selectivity Estimation of Inequality Joins In Databases"
+ * https://doi.org/10.48550/arXiv.2206.07396
+ *
+ * hist1 and hist2 are arrays of RangeBound entries from the bounds histograms
+ * of two range-typed or multirange-typed attributes X and Y, respectively.
+ * Each array has at least 2 entries (one histogram bin).  The entries carry
+ * full bound metadata (lower/upper flag, inclusive/exclusive), and all
+ * comparisons use range_cmp_bounds() so that bound semantics are preserved.
+ *
+ * The algorithm models each attribute's distribution as a piecewise function
+ * derived from its histogram, then computes:
+ *   P(X < Y) = 0.5 * sum( (F_X(prev) + F_X(cur)) * (F_Y(cur) - F_Y(prev)) )
+ * by parallel-scanning both histograms.
+ *
+ * The initial fast-forward loops skip histogram entries that fall entirely
+ * before the other histogram's range, so the main loop only processes the
+ * overlapping region.  Bounds checks are required because the histograms may
+ * be completely disjoint (e.g., all of X is below all of Y).
+ */
+static double
+calc_hist_join_selectivity(TypeCacheEntry *typcache,
+						   const RangeBound *hist1, int nhist1,
+						   const RangeBound *hist2, int nhist2)
+{
+	int			i,
+				j;
+	double		selectivity = 0.0;
+	double		prev_sel1 = -1.0;	/* negative sentinel skips first iter */
+	double		prev_sel2 = 0.0;
+
+	Assert(nhist1 > 1);
+	Assert(nhist2 > 1);
+
+	/*
+	 * Fast-forward past hist1 entries that are entirely below hist2[0], and
+	 * vice versa.  Bounds checks prevent out-of-bounds access when the
+	 * histograms are fully disjoint.
+	 */
+	for (i = 0; i < nhist1 &&
+		 range_cmp_bounds(typcache, &hist1[i], &hist2[0]) < 0; i++)
+		;
+	for (j = 0; j < nhist2 &&
+		 range_cmp_bounds(typcache, &hist2[j], &hist1[0]) < 0; j++)
+		;
+
+	/*
+	 * Handle fully-separated histograms.  When all bounds in hist1 are below
+	 * all bounds in hist2, P(X < Y) is ~1.0.  When all of hist2 is below
+	 * hist1, P(X < Y) is ~0.0.  We return immediately rather than falling
+	 * into the overlap walk with invalid indices.
+	 */
+	if (i >= nhist1)
+		return 1.0;
+	if (j >= nhist2)
+		return 0.0;
+
+	/* Walk the overlapping region of both histograms */
+	while (i < nhist1 && j < nhist2)
+	{
+		double		cur_sel1,
+					cur_sel2;
+		RangeBound	cur_sync;
+		int			cmp;
+
+		cmp = range_cmp_bounds(typcache, &hist1[i], &hist2[j]);
+		if (cmp < 0)
+			cur_sync = hist1[i++];
+		else if (cmp > 0)
+			cur_sync = hist2[j++];
+		else
+		{
+			/* Equal bounds: advance both */
+			cur_sync = hist1[i];
+			i++;
+			j++;
+		}
+		cur_sel1 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist1, nhist1, false);
+		cur_sel2 = calc_hist_selectivity_scalar(typcache, &cur_sync,
+												hist2, nhist2, false);
+
+		/* Skip the first iteration (no previous point yet) */
+		if (prev_sel1 >= 0)
+			selectivity += (prev_sel1 + cur_sel1) * (cur_sel2 - prev_sel2);
+
+		prev_sel1 = cur_sel1;
+		prev_sel2 = cur_sel2;
+	}
+
+	/* P(X < Y) = 0.5 * Sum(...) */
+	selectivity /= 2;
+
+	/* If hist1 was exhausted first, the rest of Y lies above all of X */
+	if (j < nhist2)
+		selectivity += 1 - prev_sel2;
+
+	return selectivity;
+}
+
+/*
+ * multirangejoinsel -- join selectivity for multirange operators
+ *
+ * Supports: <<, >>, && for all type combinations:
+ *   multirange vs multirange, multirange vs range, range vs multirange
+ *
+ * These operators map directly to strict bound comparisons P(X < Y),
+ * which calc_hist_join_selectivity() estimates from bound histograms.
+ * Both range and multirange types store bound histograms in the same
+ * format, so the estimation is identical regardless of type combination.
+ */
+Datum
+multirangejoinsel(PG_FUNCTION_ARGS)
+{
+	PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+	Oid			operator = PG_GETARG_OID(1);
+	List	   *args = (List *) PG_GETARG_POINTER(2);
+	SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) PG_GETARG_POINTER(4);
+	VariableStatData vardata1;
+	VariableStatData vardata2;
+	Selectivity selec;
+	AttStatsSlot hist1;
+	AttStatsSlot hist2;
+	AttStatsSlot sslot;
+	bool		have_hist1 = false;
+	bool		have_hist2 = false;
+	TypeCacheEntry *typcache;
+	TypeCacheEntry *rng_typcache;
+	Form_pg_statistic stats1;
+	Form_pg_statistic stats2;
+	double		empty_frac1;
+	double		empty_frac2;
+	double		null_frac1;
+	double		null_frac2;
+	int			nhist1;
+	int			nhist2;
+	RangeBound *hist1_lower;
+	RangeBound *hist1_upper;
+	RangeBound *hist2_lower;
+	RangeBound *hist2_upper;
+	bool		join_is_reversed;
+	bool		empty;
+	int			i;
+
+	get_join_variables(root, args, sjinfo, &vardata1, &vardata2,
+					   &join_is_reversed);
+
+	selec = default_multirange_selectivity(operator);
+
+	/*
+	 * Acquire histogram stats for both sides.  Each slot is tracked
+	 * independently so we can release exactly what was acquired on any
+	 * failure path.
+	 */
+	if (!HeapTupleIsValid(vardata1.statsTuple) ||
+		!HeapTupleIsValid(vardata2.statsTuple))
+		goto cleanup;
+
+	memset(&hist1, 0, sizeof(hist1));
+	memset(&hist2, 0, sizeof(hist2));
+
+	if (!get_attstatsslot(&hist1, vardata1.statsTuple,
+						  STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						  ATTSTATSSLOT_VALUES))
+		goto cleanup;
+	have_hist1 = true;
+
+	if (!get_attstatsslot(&hist2, vardata2.statsTuple,
+						  STATISTIC_KIND_BOUNDS_HISTOGRAM, InvalidOid,
+						  ATTSTATSSLOT_VALUES))
+		goto cleanup;
+	have_hist2 = true;
+
+	/*
+	 * Determine the range type cache for bound comparisons.  At least one
+	 * side is a multirange type; try vardata1 first, then vardata2.
+	 */
+	typcache = lookup_type_cache(vardata1.vartype, TYPECACHE_MULTIRANGE_INFO);
+	if (typcache->rngtype != NULL)
+		rng_typcache = typcache->rngtype;
+	else
+	{
+		typcache = lookup_type_cache(vardata2.vartype,
+									 TYPECACHE_MULTIRANGE_INFO);
+		rng_typcache = typcache->rngtype;
+	}
+
+	/* Look up NULL and empty fractions */
+	stats1 = (Form_pg_statistic) GETSTRUCT(vardata1.statsTuple);
+	stats2 = (Form_pg_statistic) GETSTRUCT(vardata2.statsTuple);
+
+	null_frac1 = stats1->stanullfrac;
+	null_frac2 = stats2->stanullfrac;
+
+	/* Try to get empty fraction for the first variable */
+	if (get_attstatsslot(&sslot, vardata1.statsTuple,
+						 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+						 InvalidOid, ATTSTATSSLOT_NUMBERS))
+	{
+		if (sslot.nnumbers != 1)
+			elog(ERROR, "invalid empty fraction statistic");
+		empty_frac1 = sslot.numbers[0];
+		free_attstatsslot(&sslot);
+	}
+	else
+	{
+		empty_frac1 = 0.0;
+	}
+
+	/* Try to get empty fraction for the second variable */
+	if (get_attstatsslot(&sslot, vardata2.statsTuple,
+						 STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM,
+						 InvalidOid, ATTSTATSSLOT_NUMBERS))
+	{
+		if (sslot.nnumbers != 1)
+			elog(ERROR, "invalid empty fraction statistic");
+		empty_frac2 = sslot.numbers[0];
+		free_attstatsslot(&sslot);
+	}
+	else
+	{
+		empty_frac2 = 0.0;
+	}
+
+	/* Convert bound histograms to separate lower/upper bound arrays */
+	nhist1 = hist1.nvalues;
+	hist1_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+	hist1_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist1);
+	for (i = 0; i < nhist1; i++)
+	{
+		range_deserialize(rng_typcache, DatumGetRangeTypeP(hist1.values[i]),
+						  &hist1_lower[i], &hist1_upper[i], &empty);
+		if (empty)
+			elog(ERROR, "bounds histogram contains an empty range");
+	}
+
+	nhist2 = hist2.nvalues;
+	hist2_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+	hist2_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist2);
+	for (i = 0; i < nhist2; i++)
+	{
+		range_deserialize(rng_typcache, DatumGetRangeTypeP(hist2.values[i]),
+						  &hist2_lower[i], &hist2_upper[i], &empty);
+		if (empty)
+			elog(ERROR, "bounds histogram contains an empty range");
+	}
+
+	/* Estimate selectivity based on the operator */
+	switch (operator)
+	{
+		case OID_RANGE_OVERLAPS_MULTIRANGE_OP:
+		case OID_MULTIRANGE_OVERLAPS_RANGE_OP:
+		case OID_MULTIRANGE_OVERLAPS_MULTIRANGE_OP:
+
+			/*
+			 * A && B iff NOT (A << B) AND NOT (A >> B), so P(A && B) =
+			 * 1 - P(A.upper < B.lower) - P(B.upper < A.lower).
+			 *
+			 * This decomposition is exact for single ranges.  For
+			 * multiranges, the bound histograms only represent the outermost
+			 * lower and upper bounds (see multirange_typanalyze), so internal
+			 * gaps are not captured. This can overestimate overlap for sparse
+			 * multiranges, but is consistent with how existing restriction
+			 * selectivity handles multirange &&.
+			 */
+			selec = 1;
+			selec -= calc_hist_join_selectivity(rng_typcache,
+												hist1_upper, nhist1,
+												hist2_lower, nhist2);
+			selec -= calc_hist_join_selectivity(rng_typcache,
+												hist2_upper, nhist2,
+												hist1_lower, nhist1);
+			break;
+
+		case OID_RANGE_LEFT_MULTIRANGE_OP:
+		case OID_MULTIRANGE_LEFT_RANGE_OP:
+		case OID_MULTIRANGE_LEFT_MULTIRANGE_OP:
+			/* A << B iff upper(A) < lower(B) */
+			selec = calc_hist_join_selectivity(rng_typcache,
+											   hist1_upper, nhist1,
+											   hist2_lower, nhist2);
+			break;
+
+		case OID_RANGE_RIGHT_MULTIRANGE_OP:
+		case OID_MULTIRANGE_RIGHT_RANGE_OP:
+		case OID_MULTIRANGE_RIGHT_MULTIRANGE_OP:
+			/* A >> B iff upper(B) < lower(A) */
+			selec = calc_hist_join_selectivity(rng_typcache,
+											   hist2_upper, nhist2,
+											   hist1_lower, nhist1);
+			break;
+
+		default:
+			/* Unsupported operator; keep the default selectivity */
+			goto cleanup;
+	}
+
+	/* The histogram-based selectivity applies to non-empty values only */
+	selec *= (1 - empty_frac1) * (1 - empty_frac2);
+
+	/*
+	 * For the supported operators (<<, >>, &&), empty values always produce
+	 * false, so no additive term for empty values is needed.
+	 */
+
+	/* All multirange operators are strict */
+	selec *= (1 - null_frac1) * (1 - null_frac2);
+
+cleanup:
+	if (have_hist2)
+		free_attstatsslot(&hist2);
+	if (have_hist1)
+		free_attstatsslot(&hist1);
+
+	ReleaseVariableStats(vardata1);
+	ReleaseVariableStats(vardata2);
+
+	CLAMP_PROBABILITY(selec);
+
+	PG_RETURN_FLOAT8((float8) selec);
+}
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 5ea4434f9fa..28f696a9f41 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -3302,19 +3302,19 @@
   oprname => '&&', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '&&(anymultirange,anyrange)',
   oprcode => 'range_overlaps_multirange', oprrest => 'multirangesel',
-  oprjoin => 'areajoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2867', oid_symbol => 'OID_MULTIRANGE_OVERLAPS_RANGE_OP',
   descr => 'overlaps',
   oprname => '&&', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '&&(anyrange,anymultirange)',
   oprcode => 'multirange_overlaps_range', oprrest => 'multirangesel',
-  oprjoin => 'areajoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2868', oid_symbol => 'OID_MULTIRANGE_OVERLAPS_MULTIRANGE_OP',
   descr => 'overlaps',
   oprname => '&&', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '&&(anymultirange,anymultirange)',
   oprcode => 'multirange_overlaps_multirange', oprrest => 'multirangesel',
-  oprjoin => 'areajoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '2869', oid_symbol => 'OID_MULTIRANGE_CONTAINS_ELEM_OP',
   descr => 'contains',
   oprname => '@>', oprleft => 'anymultirange', oprright => 'anyelement',
@@ -3428,37 +3428,37 @@
   oprname => '<<', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '>>(anymultirange,anyrange)',
   oprcode => 'range_before_multirange', oprrest => 'multirangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4396', oid_symbol => 'OID_MULTIRANGE_LEFT_RANGE_OP',
   descr => 'is left of',
   oprname => '<<', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '>>(anyrange,anymultirange)',
   oprcode => 'multirange_before_range', oprrest => 'multirangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4397', oid_symbol => 'OID_MULTIRANGE_LEFT_MULTIRANGE_OP',
   descr => 'is left of',
   oprname => '<<', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '>>(anymultirange,anymultirange)',
   oprcode => 'multirange_before_multirange', oprrest => 'multirangesel',
-  oprjoin => 'scalarltjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4398', oid_symbol => 'OID_RANGE_RIGHT_MULTIRANGE_OP',
   descr => 'is right of',
   oprname => '>>', oprleft => 'anyrange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '<<(anymultirange,anyrange)',
   oprcode => 'range_after_multirange', oprrest => 'multirangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4399', oid_symbol => 'OID_MULTIRANGE_RIGHT_RANGE_OP',
   descr => 'is right of',
   oprname => '>>', oprleft => 'anymultirange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '<<(anyrange,anymultirange)',
   oprcode => 'multirange_after_range', oprrest => 'multirangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 { oid => '4400', oid_symbol => 'OID_MULTIRANGE_RIGHT_MULTIRANGE_OP',
   descr => 'is right of',
   oprname => '>>', oprleft => 'anymultirange', oprright => 'anymultirange',
   oprresult => 'bool', oprcom => '<<(anymultirange,anymultirange)',
   oprcode => 'multirange_after_multirange', oprrest => 'multirangesel',
-  oprjoin => 'scalargtjoinsel' },
+  oprjoin => 'multirangejoinsel' },
 
 { oid => '8262', descr => 'equal',
   oprname => '=', oprcanmerge => 't', oprcanhash => 't', oprleft => 'oid8',
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c6a707acae4..10fbc22c4a6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12923,4 +12923,8 @@
   proname => 'rangejoinsel', provolatile => 's', prorettype => 'float8',
   proargtypes => 'internal oid internal int2 internal',
   prosrc => 'rangejoinsel' },
+{ oid => '8356', descr => 'join selectivity for multirange operators',
+  proname => 'multirangejoinsel', provolatile => 's', prorettype => 'float8',
+  proargtypes => 'internal oid internal int2 internal',
+  prosrc => 'multirangejoinsel' },
 ]
diff --git a/src/test/regress/expected/multirangetypes.out b/src/test/regress/expected/multirangetypes.out
index f5e7df8df43..b348cced3c1 100644
--- a/src/test/regress/expected/multirangetypes.out
+++ b/src/test/regress/expected/multirangetypes.out
@@ -3512,3 +3512,194 @@ create function mr_table_fail(i anyelement) returns table(i anyelement, r anymul
   as $$ select $1, '[1,10]' $$ language sql;
 ERROR:  cannot determine result data type
 DETAIL:  A result of type anymultirange requires at least one input of type anyrange or anymultirange.
+-- Restore GUCs changed by earlier index tests
+RESET enable_seqscan;
+RESET enable_indexscan;
+RESET enable_bitmapscan;
+--
+-- test selectivity of multirange join operators
+--
+create table test_mr_join_1 (mr1 int4multirange);
+create table test_mr_join_2 (mr2 int4multirange);
+create table test_mr_join_3 (mr3 int4multirange);
+insert into test_mr_join_1 select int4multirange(int4range(g, g+10)) from generate_series(1, 1000) g;
+insert into test_mr_join_1 select int4multirange(int4range(g, g+100)) from generate_series(1, 1000, 10) g;
+insert into test_mr_join_2 select int4multirange(int4range(g, g+10)) from generate_series(1, 500) g;
+insert into test_mr_join_2 select int4multirange(int4range(g, g+100)) from generate_series(1, 500, 10) g;
+insert into test_mr_join_3 select int4multirange(int4range(g, g+10)) from generate_series(501, 1000) g;
+insert into test_mr_join_3 select int4multirange(int4range(g, g+100)) from generate_series(501, 1000, 10) g;
+analyze test_mr_join_1;
+analyze test_mr_join_2;
+analyze test_mr_join_3;
+-- multirange vs multirange: reorder joins based on computed selectivity
+explain (costs off) select count(*) from test_mr_join_1, test_mr_join_2, test_mr_join_3 where mr1 && mr2 and mr2 && mr3;
+                                 QUERY PLAN                                  
+-----------------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_mr_join_1.mr1 && test_mr_join_2.mr2)
+         ->  Seq Scan on test_mr_join_1
+         ->  Materialize
+               ->  Nested Loop
+                     Join Filter: (test_mr_join_2.mr2 && test_mr_join_3.mr3)
+                     ->  Seq Scan on test_mr_join_2
+                     ->  Materialize
+                           ->  Seq Scan on test_mr_join_3
+(10 rows)
+
+explain (costs off) select count(*) from test_mr_join_1, test_mr_join_2, test_mr_join_3 where mr1 << mr2 and mr2 << mr3;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_mr_join_2.mr2 << test_mr_join_3.mr3)
+         ->  Nested Loop
+               Join Filter: (test_mr_join_1.mr1 << test_mr_join_2.mr2)
+               ->  Seq Scan on test_mr_join_1
+               ->  Materialize
+                     ->  Seq Scan on test_mr_join_2
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_3
+(10 rows)
+
+explain (costs off) select count(*) from test_mr_join_1, test_mr_join_2, test_mr_join_3 where mr1 >> mr2 and mr2 >> mr3;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (test_mr_join_1.mr1 >> test_mr_join_2.mr2)
+         ->  Nested Loop
+               Join Filter: (test_mr_join_2.mr2 >> test_mr_join_3.mr3)
+               ->  Seq Scan on test_mr_join_2
+               ->  Materialize
+                     ->  Seq Scan on test_mr_join_3
+         ->  Seq Scan on test_mr_join_1
+(9 rows)
+
+drop table test_mr_join_1;
+drop table test_mr_join_2;
+drop table test_mr_join_3;
+--
+-- test multirange join selectivity with fully disjoint histograms
+--
+create table test_mr_join_lo (r int4multirange);
+create table test_mr_join_hi (r int4multirange);
+insert into test_mr_join_lo select int4multirange(int4range(g, g+10)) from generate_series(1, 500) g;
+insert into test_mr_join_hi select int4multirange(int4range(g, g+10)) from generate_series(10001, 10500) g;
+analyze test_mr_join_lo;
+analyze test_mr_join_hi;
+-- These should not crash and should produce stable plans.
+explain (costs off) select count(*) from test_mr_join_lo a, test_mr_join_hi b where a.r << b.r;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r << b.r)
+         ->  Seq Scan on test_mr_join_lo a
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_hi b
+(6 rows)
+
+explain (costs off) select count(*) from test_mr_join_lo a, test_mr_join_hi b where a.r >> b.r;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r >> b.r)
+         ->  Seq Scan on test_mr_join_lo a
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_hi b
+(6 rows)
+
+explain (costs off) select count(*) from test_mr_join_lo a, test_mr_join_hi b where a.r && b.r;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r && b.r)
+         ->  Seq Scan on test_mr_join_lo a
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_hi b
+(6 rows)
+
+drop table test_mr_join_lo;
+drop table test_mr_join_hi;
+--
+-- test range vs multirange join selectivity
+--
+create table test_mr_join_r (r int4range);
+create table test_mr_join_mr (mr int4multirange);
+insert into test_mr_join_r select int4range(g, g+10) from generate_series(1, 500) g;
+insert into test_mr_join_mr select int4multirange(int4range(g, g+10)) from generate_series(10001, 10500) g;
+analyze test_mr_join_r;
+analyze test_mr_join_mr;
+-- range vs multirange operators should use multirangejoinsel
+explain (costs off) select count(*) from test_mr_join_r a, test_mr_join_mr b where a.r << b.mr;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r << b.mr)
+         ->  Seq Scan on test_mr_join_r a
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_mr b
+(6 rows)
+
+explain (costs off) select count(*) from test_mr_join_r a, test_mr_join_mr b where a.r >> b.mr;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r >> b.mr)
+         ->  Seq Scan on test_mr_join_r a
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_mr b
+(6 rows)
+
+explain (costs off) select count(*) from test_mr_join_r a, test_mr_join_mr b where a.r && b.mr;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.r && b.mr)
+         ->  Seq Scan on test_mr_join_r a
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_mr b
+(6 rows)
+
+-- multirange vs range (reverse direction)
+explain (costs off) select count(*) from test_mr_join_mr a, test_mr_join_r b where a.mr << b.r;
+                   QUERY PLAN                   
+------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.mr << b.r)
+         ->  Seq Scan on test_mr_join_mr a
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_r b
+(6 rows)
+
+explain (costs off) select count(*) from test_mr_join_mr a, test_mr_join_r b where a.mr >> b.r;
+                   QUERY PLAN                   
+------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.mr >> b.r)
+         ->  Seq Scan on test_mr_join_mr a
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_r b
+(6 rows)
+
+explain (costs off) select count(*) from test_mr_join_mr a, test_mr_join_r b where a.mr && b.r;
+                   QUERY PLAN                   
+------------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (a.mr && b.r)
+         ->  Seq Scan on test_mr_join_mr a
+         ->  Materialize
+               ->  Seq Scan on test_mr_join_r b
+(6 rows)
+
+drop table test_mr_join_r;
+drop table test_mr_join_mr;
diff --git a/src/test/regress/sql/multirangetypes.sql b/src/test/regress/sql/multirangetypes.sql
index 112334b03eb..fc182e2645a 100644
--- a/src/test/regress/sql/multirangetypes.sql
+++ b/src/test/regress/sql/multirangetypes.sql
@@ -904,3 +904,80 @@ create function mr_inoutparam_fail(inout i anyelement, out r anymultirange)
 --should fail
 create function mr_table_fail(i anyelement) returns table(i anyelement, r anymultirange)
   as $$ select $1, '[1,10]' $$ language sql;
+
+-- Restore GUCs changed by earlier index tests
+RESET enable_seqscan;
+RESET enable_indexscan;
+RESET enable_bitmapscan;
+
+--
+-- test selectivity of multirange join operators
+--
+create table test_mr_join_1 (mr1 int4multirange);
+create table test_mr_join_2 (mr2 int4multirange);
+create table test_mr_join_3 (mr3 int4multirange);
+
+insert into test_mr_join_1 select int4multirange(int4range(g, g+10)) from generate_series(1, 1000) g;
+insert into test_mr_join_1 select int4multirange(int4range(g, g+100)) from generate_series(1, 1000, 10) g;
+insert into test_mr_join_2 select int4multirange(int4range(g, g+10)) from generate_series(1, 500) g;
+insert into test_mr_join_2 select int4multirange(int4range(g, g+100)) from generate_series(1, 500, 10) g;
+insert into test_mr_join_3 select int4multirange(int4range(g, g+10)) from generate_series(501, 1000) g;
+insert into test_mr_join_3 select int4multirange(int4range(g, g+100)) from generate_series(501, 1000, 10) g;
+
+analyze test_mr_join_1;
+analyze test_mr_join_2;
+analyze test_mr_join_3;
+
+-- multirange vs multirange: reorder joins based on computed selectivity
+explain (costs off) select count(*) from test_mr_join_1, test_mr_join_2, test_mr_join_3 where mr1 && mr2 and mr2 && mr3;
+explain (costs off) select count(*) from test_mr_join_1, test_mr_join_2, test_mr_join_3 where mr1 << mr2 and mr2 << mr3;
+explain (costs off) select count(*) from test_mr_join_1, test_mr_join_2, test_mr_join_3 where mr1 >> mr2 and mr2 >> mr3;
+
+drop table test_mr_join_1;
+drop table test_mr_join_2;
+drop table test_mr_join_3;
+
+--
+-- test multirange join selectivity with fully disjoint histograms
+--
+create table test_mr_join_lo (r int4multirange);
+create table test_mr_join_hi (r int4multirange);
+
+insert into test_mr_join_lo select int4multirange(int4range(g, g+10)) from generate_series(1, 500) g;
+insert into test_mr_join_hi select int4multirange(int4range(g, g+10)) from generate_series(10001, 10500) g;
+
+analyze test_mr_join_lo;
+analyze test_mr_join_hi;
+
+-- These should not crash and should produce stable plans.
+explain (costs off) select count(*) from test_mr_join_lo a, test_mr_join_hi b where a.r << b.r;
+explain (costs off) select count(*) from test_mr_join_lo a, test_mr_join_hi b where a.r >> b.r;
+explain (costs off) select count(*) from test_mr_join_lo a, test_mr_join_hi b where a.r && b.r;
+
+drop table test_mr_join_lo;
+drop table test_mr_join_hi;
+
+--
+-- test range vs multirange join selectivity
+--
+create table test_mr_join_r (r int4range);
+create table test_mr_join_mr (mr int4multirange);
+
+insert into test_mr_join_r select int4range(g, g+10) from generate_series(1, 500) g;
+insert into test_mr_join_mr select int4multirange(int4range(g, g+10)) from generate_series(10001, 10500) g;
+
+analyze test_mr_join_r;
+analyze test_mr_join_mr;
+
+-- range vs multirange operators should use multirangejoinsel
+explain (costs off) select count(*) from test_mr_join_r a, test_mr_join_mr b where a.r << b.mr;
+explain (costs off) select count(*) from test_mr_join_r a, test_mr_join_mr b where a.r >> b.mr;
+explain (costs off) select count(*) from test_mr_join_r a, test_mr_join_mr b where a.r && b.mr;
+
+-- multirange vs range (reverse direction)
+explain (costs off) select count(*) from test_mr_join_mr a, test_mr_join_r b where a.mr << b.r;
+explain (costs off) select count(*) from test_mr_join_mr a, test_mr_join_r b where a.mr >> b.r;
+explain (costs off) select count(*) from test_mr_join_mr a, test_mr_join_r b where a.mr && b.r;
+
+drop table test_mr_join_r;
+drop table test_mr_join_mr;
-- 
2.50.1 (Apple Git-155)
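
For reference, the expected outcome of the "fully disjoint histograms" tests above can be checked by brute force. The sketch below is not part of the patch: it models each int4range as a half-open pair [lo, hi), and the names IntRange, range_before, range_after, range_overlaps and true_selectivity are hypothetical helpers introduced only for illustration. It counts the fraction of row pairs that satisfy an operator, which is the quantity a join selectivity estimator tries to approximate.

```c
#include <assert.h>

/* Hypothetical stand-in for an int4range in canonical half-open form [lo, hi). */
typedef struct
{
	int			lo;
	int			hi;
} IntRange;

/* a << b: a lies strictly to the left of b */
static int
range_before(IntRange a, IntRange b)
{
	return a.hi <= b.lo;
}

/* a >> b: a lies strictly to the right of b */
static int
range_after(IntRange a, IntRange b)
{
	return range_before(b, a);
}

/* a && b: the two ranges share at least one point */
static int
range_overlaps(IntRange a, IntRange b)
{
	return a.lo < b.hi && b.lo < a.hi;
}

/*
 * Exact join selectivity by enumeration: table A holds the ranges
 * [a_start + i, a_start + i + width) and table B holds the ranges
 * [b_start + j, b_start + j + width), for i, j in 0 .. n-1.
 */
static double
true_selectivity(int a_start, int b_start, int n, int width,
				 int (*op) (IntRange, IntRange))
{
	long		matches = 0;

	for (int i = 0; i < n; i++)
	{
		IntRange	a = {a_start + i, a_start + i + width};

		for (int j = 0; j < n; j++)
		{
			IntRange	b = {b_start + j, b_start + j + width};

			if (op(a, b))
				matches++;
		}
	}
	return (double) matches / ((double) n * (double) n);
}
```

Calling true_selectivity(1, 10001, 500, 10, range_before) models test_mr_join_lo << test_mr_join_hi. Every pair matches for <<, and no pair matches for >> or &&, so these tests exercise the extreme ends of the selectivity range.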



  [application/octet-stream] v8-0003-Remove-duplicate-selectivity-functions-between-ra.patch (34.5K, 5-v8-0003-Remove-duplicate-selectivity-functions-between-ra.patch)
From 11cbdf7f78d229336e6a75d908a6625a421565e9 Mon Sep 17 00:00:00 2001
From: Maxime Schoemans <[email protected]>
Date: Thu, 16 Apr 2026 16:28:17 +0200
Subject: [PATCH v8 3/3] Remove duplicate selectivity functions between range
 and multirange

The multirange selectivity code duplicated 10 helper functions from
rangetypes_selfuncs.c. Since both range and multirange types use the
same histogram format (STATISTIC_KIND_BOUNDS_HISTOGRAM) and the same
RangeBound representation, the functions are identical.

Make the 10 shared functions non-static in rangetypes_selfuncs.c,
export them via a new rangetypes_selfuncs.h header, and remove the
copies from multirangetypes_selfuncs.c.
---
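Note for reviewers: as a rough illustration of what the shared helpers compute, here is a simplified standalone sketch of the bsearch-plus-interpolation logic behind calc_hist_selectivity_scalar and rbound_bsearch. It is an assumption-laden model, not the patch code: plain doubles replace RangeBound, ordinary subtraction replaces the subtype's subdiff function, infinite bounds are ignored, and the names hist_bsearch, hist_position and hist_fraction_below are hypothetical.

```c
#include <assert.h>

/*
 * Return the greatest index i such that hist[i] < value (or hist[i] <= value
 * when 'equal' is nonzero), or -1 if no such entry exists.
 */
static int
hist_bsearch(const double *hist, int n, double value, int equal)
{
	int			lower = -1;
	int			upper = n - 1;

	while (lower < upper)
	{
		int			middle = (lower + upper + 1) / 2;

		if (hist[middle] < value || (equal && hist[middle] <= value))
			lower = middle;
		else
			upper = middle - 1;
	}
	return lower;
}

/* Relative position of value inside the bin [lo, hi], clamped to [0, 1]. */
static double
hist_position(double value, double lo, double hi)
{
	double		width = hi - lo;
	double		pos;

	if (!(width > 0.0))
		return 0.5;				/* punt for a zero-width or NaN bin */

	pos = (value - lo) / width;
	if (pos < 0.0)
		pos = 0.0;
	if (pos > 1.0)
		pos = 1.0;
	return pos;
}

/*
 * Fraction of histogram mass below 'value': whole bins preceding the bin
 * that contains it, plus a linear interpolation inside that bin.
 * Requires n >= 2 (at least one bin).
 */
static double
hist_fraction_below(const double *hist, int n, double value, int equal)
{
	int			index = hist_bsearch(hist, n, value, equal);
	double		selec = (double) (index > 0 ? index : 0) / (double) (n - 1);

	if (index >= 0 && index < n - 1)
		selec += hist_position(value, hist[index], hist[index + 1])
			/ (double) (n - 1);
	return selec;
}
```

With the five-entry histogram {0, 10, 20, 30, 40}, the value 25 falls halfway into the third of four bins, so the sketch returns 2/4 + 0.5/4 = 0.625. Values below the first entry give 0 and values above the last entry give 1.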
 .../utils/adt/multirangetypes_selfuncs.c      | 772 +-----------------
 src/backend/utils/adt/rangetypes_selfuncs.c   |  46 +-
 src/include/utils/rangetypes_selfuncs.h       |  54 ++
 3 files changed, 67 insertions(+), 805 deletions(-)
 create mode 100644 src/include/utils/rangetypes_selfuncs.h

diff --git a/src/backend/utils/adt/multirangetypes_selfuncs.c b/src/backend/utils/adt/multirangetypes_selfuncs.c
index e3d5c527e03..73e6d490295 100644
--- a/src/backend/utils/adt/multirangetypes_selfuncs.c
+++ b/src/backend/utils/adt/multirangetypes_selfuncs.c
@@ -27,6 +27,7 @@
 #include "utils/lsyscache.h"
 #include "utils/multirangetypes.h"
 #include "utils/rangetypes.h"
+#include "utils/rangetypes_selfuncs.h"
 #include "utils/selfuncs.h"
 #include "utils/typcache.h"
 
@@ -38,37 +39,6 @@ static double calc_hist_selectivity(TypeCacheEntry *typcache,
 									VariableStatData *vardata,
 									const MultirangeType *constval,
 									Oid operator);
-static double calc_hist_selectivity_scalar(TypeCacheEntry *typcache,
-										   const RangeBound *constbound,
-										   const RangeBound *hist,
-										   int hist_nvalues, bool equal);
-static int	rbound_bsearch(TypeCacheEntry *typcache, const RangeBound *value,
-						   const RangeBound *hist, int hist_length, bool equal);
-static float8 get_position(TypeCacheEntry *typcache, const RangeBound *value,
-						   const RangeBound *hist1, const RangeBound *hist2);
-static float8 get_len_position(double value, double hist1, double hist2);
-static float8 get_distance(TypeCacheEntry *typcache, const RangeBound *bound1,
-						   const RangeBound *bound2);
-static int	length_hist_bsearch(const Datum *length_hist_values,
-								int length_hist_nvalues, double value,
-								bool equal);
-static double calc_length_hist_frac(const Datum *length_hist_values,
-									int length_hist_nvalues, double length1,
-									double length2, bool equal);
-static double calc_hist_selectivity_contained(TypeCacheEntry *typcache,
-											  const RangeBound *lower,
-											  RangeBound *upper,
-											  const RangeBound *hist_lower,
-											  int hist_nvalues,
-											  const Datum *length_hist_values,
-											  int length_hist_nvalues);
-static double calc_hist_selectivity_contains(TypeCacheEntry *typcache,
-											 const RangeBound *lower,
-											 const RangeBound *upper,
-											 const RangeBound *hist_lower,
-											 int hist_nvalues,
-											 const Datum *length_hist_values,
-											 int length_hist_nvalues);
 
 /*
  * Returns a default selectivity estimate for given operator, when we don't
@@ -698,746 +668,6 @@ calc_hist_selectivity(TypeCacheEntry *typcache, VariableStatData *vardata,
 	return hist_selec;
 }
 
-
-/*
- * Look up the fraction of values less than (or equal, if 'equal' argument
- * is true) a given const in a histogram of range bounds.
- */
-static double
-calc_hist_selectivity_scalar(TypeCacheEntry *typcache, const RangeBound *constbound,
-							 const RangeBound *hist, int hist_nvalues, bool equal)
-{
-	Selectivity selec;
-	int			index;
-
-	/*
-	 * Find the histogram bin the given constant falls into. Estimate
-	 * selectivity as the number of preceding whole bins.
-	 */
-	index = rbound_bsearch(typcache, constbound, hist, hist_nvalues, equal);
-	selec = (Selectivity) (Max(index, 0)) / (Selectivity) (hist_nvalues - 1);
-
-	/* Adjust using linear interpolation within the bin */
-	if (index >= 0 && index < hist_nvalues - 1)
-		selec += get_position(typcache, constbound, &hist[index],
-							  &hist[index + 1]) / (Selectivity) (hist_nvalues - 1);
-
-	return selec;
-}
-
-/*
- * Binary search on an array of range bounds. Returns greatest index of range
- * bound in array which is less(less or equal) than given range bound. If all
- * range bounds in array are greater or equal(greater) than given range bound,
- * return -1. When "equal" flag is set conditions in brackets are used.
- *
- * This function is used in scalar operator selectivity estimation. Another
- * goal of this function is to find a histogram bin where to stop
- * interpolation of portion of bounds which are less than or equal to given bound.
- */
-static int
-rbound_bsearch(TypeCacheEntry *typcache, const RangeBound *value, const RangeBound *hist,
-			   int hist_length, bool equal)
-{
-	int			lower = -1,
-				upper = hist_length - 1,
-				cmp,
-				middle;
-
-	while (lower < upper)
-	{
-		middle = (lower + upper + 1) / 2;
-		cmp = range_cmp_bounds(typcache, &hist[middle], value);
-
-		if (cmp < 0 || (equal && cmp == 0))
-			lower = middle;
-		else
-			upper = middle - 1;
-	}
-	return lower;
-}
-
-
-/*
- * Binary search on length histogram. Returns greatest index of range length in
- * histogram which is less than (less than or equal) the given length value. If
- * all lengths in the histogram are greater than (greater than or equal) the
- * given length, returns -1.
- */
-static int
-length_hist_bsearch(const Datum *length_hist_values, int length_hist_nvalues,
-					double value, bool equal)
-{
-	int			lower = -1,
-				upper = length_hist_nvalues - 1,
-				middle;
-
-	while (lower < upper)
-	{
-		double		middleval;
-
-		middle = (lower + upper + 1) / 2;
-
-		middleval = DatumGetFloat8(length_hist_values[middle]);
-		if (middleval < value || (equal && middleval <= value))
-			lower = middle;
-		else
-			upper = middle - 1;
-	}
-	return lower;
-}
-
-/*
- * Get relative position of value in histogram bin in [0,1] range.
- */
-static float8
-get_position(TypeCacheEntry *typcache, const RangeBound *value, const RangeBound *hist1,
-			 const RangeBound *hist2)
-{
-	bool		has_subdiff = OidIsValid(typcache->rng_subdiff_finfo.fn_oid);
-	float8		position;
-
-	if (!hist1->infinite && !hist2->infinite)
-	{
-		float8		bin_width;
-
-		/*
-		 * Both bounds are finite. Assuming the subtype's comparison function
-		 * works sanely, the value must be finite, too, because it lies
-		 * somewhere between the bounds.  If it doesn't, arbitrarily return
-		 * 0.5.
-		 */
-		if (value->infinite)
-			return 0.5;
-
-		/* Can't interpolate without subdiff function */
-		if (!has_subdiff)
-			return 0.5;
-
-		/* Calculate relative position using subdiff function. */
-		bin_width = DatumGetFloat8(FunctionCall2Coll(&typcache->rng_subdiff_finfo,
-													 typcache->rng_collation,
-													 hist2->val,
-													 hist1->val));
-		if (isnan(bin_width) || bin_width <= 0.0)
-			return 0.5;			/* punt for NaN or zero-width bin */
-
-		position = DatumGetFloat8(FunctionCall2Coll(&typcache->rng_subdiff_finfo,
-													typcache->rng_collation,
-													value->val,
-													hist1->val))
-			/ bin_width;
-
-		if (isnan(position))
-			return 0.5;			/* punt for NaN from subdiff, Inf/Inf, etc */
-
-		/* Relative position must be in [0,1] range */
-		position = Max(position, 0.0);
-		position = Min(position, 1.0);
-		return position;
-	}
-	else if (hist1->infinite && !hist2->infinite)
-	{
-		/*
-		 * Lower bin boundary is -infinite, upper is finite. If the value is
-		 * -infinite, return 0.0 to indicate it's equal to the lower bound.
-		 * Otherwise return 1.0 to indicate it's infinitely far from the lower
-		 * bound.
-		 */
-		return ((value->infinite && value->lower) ? 0.0 : 1.0);
-	}
-	else if (!hist1->infinite && hist2->infinite)
-	{
-		/* same as above, but in reverse */
-		return ((value->infinite && !value->lower) ? 1.0 : 0.0);
-	}
-	else
-	{
-		/*
-		 * If both bin boundaries are infinite, they should be equal to each
-		 * other, and the value should also be infinite and equal to both
-		 * bounds. (But don't Assert that, to avoid crashing if a user creates
-		 * a datatype with a broken comparison function).
-		 *
-		 * Assume the value to lie in the middle of the infinite bounds.
-		 */
-		return 0.5;
-	}
-}
-
-
-/*
- * Get relative position of value in a length histogram bin in [0,1] range.
- */
-static double
-get_len_position(double value, double hist1, double hist2)
-{
-	if (!isinf(hist1) && !isinf(hist2))
-	{
-		/*
-		 * Both bounds are finite. The value should be finite too, because it
-		 * lies somewhere between the bounds. If it doesn't, just return
-		 * something.
-		 */
-		if (isinf(value))
-			return 0.5;
-
-		return 1.0 - (hist2 - value) / (hist2 - hist1);
-	}
-	else if (isinf(hist1) && !isinf(hist2))
-	{
-		/*
-		 * Lower bin boundary is -infinite, upper is finite. Return 1.0 to
-		 * indicate the value is infinitely far from the lower bound.
-		 */
-		return 1.0;
-	}
-	else if (isinf(hist1) && isinf(hist2))
-	{
-		/* same as above, but in reverse */
-		return 0.0;
-	}
-	else
-	{
-		/*
-		 * If both bin boundaries are infinite, they should be equal to each
-		 * other, and the value should also be infinite and equal to both
-		 * bounds. (But don't Assert that, to avoid crashing unnecessarily if
-		 * the caller messes up)
-		 *
-		 * Assume the value to lie in the middle of the infinite bounds.
-		 */
-		return 0.5;
-	}
-}
-
-/*
- * Measure distance between two range bounds.
- */
-static float8
-get_distance(TypeCacheEntry *typcache, const RangeBound *bound1, const RangeBound *bound2)
-{
-	bool		has_subdiff = OidIsValid(typcache->rng_subdiff_finfo.fn_oid);
-
-	if (!bound1->infinite && !bound2->infinite)
-	{
-		/*
-		 * Neither bound is infinite, use subdiff function or return default
-		 * value of 1.0 if no subdiff is available.
-		 */
-		if (has_subdiff)
-		{
-			float8		res;
-
-			res = DatumGetFloat8(FunctionCall2Coll(&typcache->rng_subdiff_finfo,
-												   typcache->rng_collation,
-												   bound2->val,
-												   bound1->val));
-			/* Reject possible NaN result, also negative result */
-			if (isnan(res) || res < 0.0)
-				return 1.0;
-			else
-				return res;
-		}
-		else
-			return 1.0;
-	}
-	else if (bound1->infinite && bound2->infinite)
-	{
-		/* Both bounds are infinite */
-		if (bound1->lower == bound2->lower)
-			return 0.0;
-		else
-			return get_float8_infinity();
-	}
-	else
-	{
-		/* One bound is infinite, the other is not */
-		return get_float8_infinity();
-	}
-}
-
-/*
- * Calculate the average of function P(x), in the interval [length1, length2],
- * where P(x) is the fraction of tuples with length < x (or length <= x if
- * 'equal' is true).
- */
-static double
-calc_length_hist_frac(const Datum *length_hist_values, int length_hist_nvalues,
-					  double length1, double length2, bool equal)
-{
-	double		frac;
-	double		A,
-				B,
-				PA,
-				PB;
-	double		pos;
-	int			i;
-	double		area;
-
-	Assert(length2 >= length1);
-
-	if (length2 < 0.0)
-		return 0.0;				/* shouldn't happen, but doesn't hurt to check */
-
-	/* All lengths in the table are <= infinite. */
-	if (isinf(length2) && equal)
-		return 1.0;
-
-	/*----------
-	 * The average of a function between A and B can be calculated by the
-	 * formula:
-	 *
-	 *			B
-	 *	  1		/
-	 * -------	| P(x)dx
-	 *	B - A	/
-	 *			A
-	 *
-	 * The geometrical interpretation of the integral is the area under the
-	 * graph of P(x). P(x) is defined by the length histogram. We calculate
-	 * the area in a piecewise fashion, iterating through the length histogram
-	 * bins. Each bin is a trapezoid:
-	 *
-	 *		 P(x2)
-	 *		  /|
-	 *		 / |
-	 * P(x1)/  |
-	 *	   |   |
-	 *	   |   |
-	 *	---+---+--
-	 *	   x1  x2
-	 *
-	 * where x1 and x2 are the boundaries of the current histogram, and P(x1)
-	 * and P(x1) are the cumulative fraction of tuples at the boundaries.
-	 *
-	 * The area of each trapezoid is 1/2 * (P(x2) + P(x1)) * (x2 - x1)
-	 *
-	 * The first bin contains the lower bound passed by the caller, so we
-	 * use linear interpolation between the previous and next histogram bin
-	 * boundary to calculate P(x1). Likewise for the last bin: we use linear
-	 * interpolation to calculate P(x2). For the bins in between, x1 and x2
-	 * lie on histogram bin boundaries, so P(x1) and P(x2) are simply:
-	 * P(x1) =	  (bin index) / (number of bins)
-	 * P(x2) = (bin index + 1 / (number of bins)
-	 */
-
-	/* First bin, the one that contains lower bound */
-	i = length_hist_bsearch(length_hist_values, length_hist_nvalues, length1, equal);
-	if (i >= length_hist_nvalues - 1)
-		return 1.0;
-
-	if (i < 0)
-	{
-		i = 0;
-		pos = 0.0;
-	}
-	else
-	{
-		/* interpolate length1's position in the bin */
-		pos = get_len_position(length1,
-							   DatumGetFloat8(length_hist_values[i]),
-							   DatumGetFloat8(length_hist_values[i + 1]));
-	}
-	PB = (((double) i) + pos) / (double) (length_hist_nvalues - 1);
-	B = length1;
-
-	/*
-	 * In the degenerate case that length1 == length2, simply return
-	 * P(length1). This is not merely an optimization: if length1 == length2,
-	 * we'd divide by zero later on.
-	 */
-	if (length2 == length1)
-		return PB;
-
-	/*
-	 * Loop through all the bins, until we hit the last bin, the one that
-	 * contains the upper bound. (if lower and upper bounds are in the same
-	 * bin, this falls out immediately)
-	 */
-	area = 0.0;
-	for (; i < length_hist_nvalues - 1; i++)
-	{
-		double		bin_upper = DatumGetFloat8(length_hist_values[i + 1]);
-
-		/* check if we've reached the last bin */
-		if (!(bin_upper < length2 || (equal && bin_upper <= length2)))
-			break;
-
-		/* the upper bound of previous bin is the lower bound of this bin */
-		A = B;
-		PA = PB;
-
-		B = bin_upper;
-		PB = (double) i / (double) (length_hist_nvalues - 1);
-
-		/*
-		 * Add the area of this trapezoid to the total. The point of the
-		 * if-check is to avoid NaN, in the corner case that PA == PB == 0,
-		 * and B - A == Inf. The area of a zero-height trapezoid (PA == PB ==
-		 * 0) is zero, regardless of the width (B - A).
-		 */
-		if (PA > 0 || PB > 0)
-			area += 0.5 * (PB + PA) * (B - A);
-	}
-
-	/* Last bin */
-	A = B;
-	PA = PB;
-
-	B = length2;				/* last bin ends at the query upper bound */
-	if (i >= length_hist_nvalues - 1)
-		pos = 0.0;
-	else
-	{
-		if (DatumGetFloat8(length_hist_values[i]) == DatumGetFloat8(length_hist_values[i + 1]))
-			pos = 0.0;
-		else
-			pos = get_len_position(length2,
-								   DatumGetFloat8(length_hist_values[i]),
-								   DatumGetFloat8(length_hist_values[i + 1]));
-	}
-	PB = (((double) i) + pos) / (double) (length_hist_nvalues - 1);
-
-	if (PA > 0 || PB > 0)
-		area += 0.5 * (PB + PA) * (B - A);
-
-	/*
-	 * Ok, we have calculated the area, ie. the integral. Divide by width to
-	 * get the requested average.
-	 *
-	 * Avoid NaN arising from infinite / infinite. This happens at least if
-	 * length2 is infinite. It's not clear what the correct value would be in
-	 * that case, so 0.5 seems as good as any value.
-	 */
-	if (isinf(area) && isinf(length2))
-		frac = 0.5;
-	else
-		frac = area / (length2 - length1);
-
-	return frac;
-}
-
-/*
- * Calculate selectivity of "var <@ const" operator, ie. estimate the fraction
- * of multiranges that fall within the constant lower and upper bounds. This uses
- * the histograms of range lower bounds and range lengths, on the assumption
- * that the range lengths are independent of the lower bounds.
- *
- * The caller has already checked that constant lower and upper bounds are
- * finite.
- */
-static double
-calc_hist_selectivity_contained(TypeCacheEntry *typcache,
-								const RangeBound *lower, RangeBound *upper,
-								const RangeBound *hist_lower, int hist_nvalues,
-								const Datum *length_hist_values, int length_hist_nvalues)
-{
-	int			i,
-				upper_index;
-	float8		prev_dist;
-	double		bin_width;
-	double		upper_bin_width;
-	double		sum_frac;
-
-	/*
-	 * Begin by finding the bin containing the upper bound, in the lower bound
-	 * histogram. Any range with a lower bound > constant upper bound can't
-	 * match, ie. there are no matches in bins greater than upper_index.
-	 */
-	upper->inclusive = !upper->inclusive;
-	upper->lower = true;
-	upper_index = rbound_bsearch(typcache, upper, hist_lower, hist_nvalues,
-								 false);
-
-	/*
-	 * If the upper bound value is below the histogram's lower limit, there
-	 * are no matches.
-	 */
-	if (upper_index < 0)
-		return 0.0;
-
-	/*
-	 * If the upper bound value is at or beyond the histogram's upper limit,
-	 * start our loop at the last actual bin, as though the upper bound were
-	 * within that bin; get_position will clamp its result to 1.0 anyway.
-	 * (This corresponds to assuming that the data population above the
-	 * histogram's upper limit is empty, exactly like what we just assumed for
-	 * the lower limit.)
-	 */
-	upper_index = Min(upper_index, hist_nvalues - 2);
-
-	/*
-	 * Calculate upper_bin_width, ie. the fraction of the (upper_index,
-	 * upper_index + 1) bin which is greater than upper bound of query range
-	 * using linear interpolation of subdiff function.
-	 */
-	upper_bin_width = get_position(typcache, upper,
-								   &hist_lower[upper_index],
-								   &hist_lower[upper_index + 1]);
-
-	/*
-	 * In the loop, dist and prev_dist are the distance of the "current" bin's
-	 * lower and upper bounds from the constant upper bound.
-	 *
-	 * bin_width represents the width of the current bin. Normally it is 1.0,
-	 * meaning a full width bin, but can be less in the corner cases: start
-	 * and end of the loop. We start with bin_width = upper_bin_width, because
-	 * we begin at the bin containing the upper bound.
-	 */
-	prev_dist = 0.0;
-	bin_width = upper_bin_width;
-
-	sum_frac = 0.0;
-	for (i = upper_index; i >= 0; i--)
-	{
-		double		dist;
-		double		length_hist_frac;
-		bool		final_bin = false;
-
-		/*
-		 * dist -- distance from upper bound of query range to lower bound of
-		 * the current bin in the lower bound histogram. Or to the lower bound
-		 * of the constant range, if this is the final bin, containing the
-		 * constant lower bound.
-		 */
-		if (range_cmp_bounds(typcache, &hist_lower[i], lower) < 0)
-		{
-			dist = get_distance(typcache, lower, upper);
-
-			/*
-			 * Subtract from bin_width the portion of this bin that we want to
-			 * ignore.
-			 */
-			bin_width -= get_position(typcache, lower, &hist_lower[i],
-									  &hist_lower[i + 1]);
-			if (bin_width < 0.0)
-				bin_width = 0.0;
-			final_bin = true;
-		}
-		else
-			dist = get_distance(typcache, &hist_lower[i], upper);
-
-		/*
-		 * Estimate the fraction of tuples in this bin that are narrow enough
-		 * to not exceed the distance to the upper bound of the query range.
-		 */
-		length_hist_frac = calc_length_hist_frac(length_hist_values,
-												 length_hist_nvalues,
-												 prev_dist, dist, true);
-
-		/*
-		 * Add the fraction of tuples in this bin, with a suitable length, to
-		 * the total.
-		 */
-		sum_frac += length_hist_frac * bin_width / (double) (hist_nvalues - 1);
-
-		if (final_bin)
-			break;
-
-		bin_width = 1.0;
-		prev_dist = dist;
-	}
-
-	return sum_frac;
-}
-
-/*
- * Calculate selectivity of "var @> const" operator, ie. estimate the fraction
- * of multiranges that contain the constant lower and upper bounds. This uses
- * the histograms of range lower bounds and range lengths, on the assumption
- * that the range lengths are independent of the lower bounds.
- */
-static double
-calc_hist_selectivity_contains(TypeCacheEntry *typcache,
-							   const RangeBound *lower, const RangeBound *upper,
-							   const RangeBound *hist_lower, int hist_nvalues,
-							   const Datum *length_hist_values, int length_hist_nvalues)
-{
-	int			i,
-				lower_index;
-	double		bin_width,
-				lower_bin_width;
-	double		sum_frac;
-	float8		prev_dist;
-
-	/* Find the bin containing the lower bound of query range. */
-	lower_index = rbound_bsearch(typcache, lower, hist_lower, hist_nvalues,
-								 true);
-
-	/*
-	 * If the lower bound value is below the histogram's lower limit, there
-	 * are no matches.
-	 */
-	if (lower_index < 0)
-		return 0.0;
-
-	/*
-	 * If the lower bound value is at or beyond the histogram's upper limit,
-	 * start our loop at the last actual bin, as though the upper bound were
-	 * within that bin; get_position will clamp its result to 1.0 anyway.
-	 * (This corresponds to assuming that the data population above the
-	 * histogram's upper limit is empty, exactly like what we just assumed for
-	 * the lower limit.)
-	 */
-	lower_index = Min(lower_index, hist_nvalues - 2);
-
-	/*
-	 * Calculate lower_bin_width, ie. the fraction of the of (lower_index,
-	 * lower_index + 1) bin which is greater than lower bound of query range
-	 * using linear interpolation of subdiff function.
-	 */
-	lower_bin_width = get_position(typcache, lower, &hist_lower[lower_index],
-								   &hist_lower[lower_index + 1]);
-
-	/*
-	 * Loop through all the lower bound bins, smaller than the query lower
-	 * bound. In the loop, dist and prev_dist are the distance of the
-	 * "current" bin's lower and upper bounds from the constant upper bound.
-	 * We begin from query lower bound, and walk backwards, so the first bin's
-	 * upper bound is the query lower bound, and its distance to the query
-	 * upper bound is the length of the query range.
-	 *
-	 * bin_width represents the width of the current bin. Normally it is 1.0,
-	 * meaning a full width bin, except for the first bin, which is only
-	 * counted up to the constant lower bound.
-	 */
-	prev_dist = get_distance(typcache, lower, upper);
-	sum_frac = 0.0;
-	bin_width = lower_bin_width;
-	for (i = lower_index; i >= 0; i--)
-	{
-		float8		dist;
-		double		length_hist_frac;
-
-		/*
-		 * dist -- distance from upper bound of query range to current value
-		 * of lower bound histogram or lower bound of query range (if we've
-		 * reach it).
-		 */
-		dist = get_distance(typcache, &hist_lower[i], upper);
-
-		/*
-		 * Get average fraction of length histogram which covers intervals
-		 * longer than (or equal to) distance to upper bound of query range.
-		 */
-		length_hist_frac =
-			1.0 - calc_length_hist_frac(length_hist_values,
-										length_hist_nvalues,
-										prev_dist, dist, false);
-
-		sum_frac += length_hist_frac * bin_width / (double) (hist_nvalues - 1);
-
-		bin_width = 1.0;
-		prev_dist = dist;
-	}
-
-	return sum_frac;
-}
-
-/*
- * Estimate join selectivity P(X < Y) using rangebound histograms.
- *
- * Based on: Diogo Repas, Zhicheng Luo, Maxime Schoemans, Mahmoud Sakr, 2022
- * "Selectivity Estimation of Inequality Joins In Databases"
- * https://doi.org/10.48550/arXiv.2206.07396
- *
- * hist1 and hist2 are arrays of RangeBound entries from the bounds histograms
- * of two range-typed or multirange-typed attributes X and Y, respectively.
- * Each array has at least 2 entries (one histogram bin).  The entries carry
- * full bound metadata (lower/upper flag, inclusive/exclusive), and all
- * comparisons use range_cmp_bounds() so that bound semantics are preserved.
- *
- * The algorithm models each attribute's distribution as a piecewise function
- * derived from its histogram, then computes:
- *   P(X < Y) = 0.5 * sum( (F_X(prev) + F_X(cur)) * (F_Y(cur) - F_Y(prev)) )
- * by parallel-scanning both histograms.
- *
- * The initial fast-forward loops skip histogram entries that fall entirely
- * before the other histogram's range, so the main loop only processes the
- * overlapping region.  Bounds checks are required because the histograms may
- * be completely disjoint (e.g., all of X is below all of Y).
- */
-static double
-calc_hist_join_selectivity(TypeCacheEntry *typcache,
-						   const RangeBound *hist1, int nhist1,
-						   const RangeBound *hist2, int nhist2)
-{
-	int			i,
-				j;
-	double		selectivity = 0.0;
-	double		prev_sel1 = -1.0;	/* negative sentinel skips first iter */
-	double		prev_sel2 = 0.0;
-
-	Assert(nhist1 > 1);
-	Assert(nhist2 > 1);
-
-	/*
-	 * Fast-forward past hist1 entries that are entirely below hist2[0], and
-	 * vice versa.  Bounds checks prevent out-of-bounds access when the
-	 * histograms are fully disjoint.
-	 */
-	for (i = 0; i < nhist1 &&
-		 range_cmp_bounds(typcache, &hist1[i], &hist2[0]) < 0; i++)
-		;
-	for (j = 0; j < nhist2 &&
-		 range_cmp_bounds(typcache, &hist2[j], &hist1[0]) < 0; j++)
-		;
-
-	/*
-	 * Handle fully-separated histograms.  When all bounds in hist1 are below
-	 * all bounds in hist2, P(X < Y) is ~1.0.  When all of hist2 is below
-	 * hist1, P(X < Y) is ~0.0.  We return immediately rather than falling
-	 * into the overlap walk with invalid indices.
-	 */
-	if (i >= nhist1)
-		return 1.0;
-	if (j >= nhist2)
-		return 0.0;
-
-	/* Walk the overlapping region of both histograms */
-	while (i < nhist1 && j < nhist2)
-	{
-		double		cur_sel1,
-					cur_sel2;
-		RangeBound	cur_sync;
-		int			cmp;
-
-		cmp = range_cmp_bounds(typcache, &hist1[i], &hist2[j]);
-		if (cmp < 0)
-			cur_sync = hist1[i++];
-		else if (cmp > 0)
-			cur_sync = hist2[j++];
-		else
-		{
-			/* Equal bounds: advance both */
-			cur_sync = hist1[i];
-			i++;
-			j++;
-		}
-		cur_sel1 = calc_hist_selectivity_scalar(typcache, &cur_sync,
-												hist1, nhist1, false);
-		cur_sel2 = calc_hist_selectivity_scalar(typcache, &cur_sync,
-												hist2, nhist2, false);
-
-		/* Skip the first iteration (no previous point yet) */
-		if (prev_sel1 >= 0)
-			selectivity += (prev_sel1 + cur_sel1) * (cur_sel2 - prev_sel2);
-
-		prev_sel1 = cur_sel1;
-		prev_sel2 = cur_sel2;
-	}
-
-	/* P(X < Y) = 0.5 * Sum(...) */
-	selectivity /= 2;
-
-	/* Include remainder of hist2 if hist1 was exhausted first */
-	if (j < nhist2)
-		selectivity += 1 - prev_sel2;
-
-	return selectivity;
-}
-
 /*
  * multirangejoinsel -- join selectivity for multirange operators
  *
diff --git a/src/backend/utils/adt/rangetypes_selfuncs.c b/src/backend/utils/adt/rangetypes_selfuncs.c
index cc702f28610..4f4baa7dc1a 100644
--- a/src/backend/utils/adt/rangetypes_selfuncs.c
+++ b/src/backend/utils/adt/rangetypes_selfuncs.c
@@ -26,6 +26,7 @@
 #include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
 #include "utils/rangetypes.h"
+#include "utils/rangetypes_selfuncs.h"
 #include "utils/selfuncs.h"
 #include "utils/typcache.h"
 
@@ -35,29 +36,6 @@ static double default_range_selectivity(Oid operator);
 static double calc_hist_selectivity(TypeCacheEntry *typcache,
 									VariableStatData *vardata, const RangeType *constval,
 									Oid operator);
-static double calc_hist_selectivity_scalar(TypeCacheEntry *typcache,
-										   const RangeBound *constbound,
-										   const RangeBound *hist, int hist_nvalues,
-										   bool equal);
-static int	rbound_bsearch(TypeCacheEntry *typcache, const RangeBound *value,
-						   const RangeBound *hist, int hist_length, bool equal);
-static float8 get_position(TypeCacheEntry *typcache, const RangeBound *value,
-						   const RangeBound *hist1, const RangeBound *hist2);
-static float8 get_len_position(double value, double hist1, double hist2);
-static float8 get_distance(TypeCacheEntry *typcache, const RangeBound *bound1,
-						   const RangeBound *bound2);
-static int	length_hist_bsearch(const Datum *length_hist_values,
-								int length_hist_nvalues, double value, bool equal);
-static double calc_length_hist_frac(const Datum *length_hist_values,
-									int length_hist_nvalues, double length1, double length2, bool equal);
-static double calc_hist_selectivity_contained(TypeCacheEntry *typcache,
-											  const RangeBound *lower, RangeBound *upper,
-											  const RangeBound *hist_lower, int hist_nvalues,
-											  const Datum *length_hist_values, int length_hist_nvalues);
-static double calc_hist_selectivity_contains(TypeCacheEntry *typcache,
-											 const RangeBound *lower, const RangeBound *upper,
-											 const RangeBound *hist_lower, int hist_nvalues,
-											 const Datum *length_hist_values, int length_hist_nvalues);
 
 /*
  * Returns a default selectivity estimate for given operator, when we don't
@@ -592,7 +570,7 @@ calc_hist_selectivity(TypeCacheEntry *typcache, VariableStatData *vardata,
  * Look up the fraction of values less than (or equal, if 'equal' argument
  * is true) a given const in a histogram of range bounds.
  */
-static double
+double
 calc_hist_selectivity_scalar(TypeCacheEntry *typcache, const RangeBound *constbound,
 							 const RangeBound *hist, int hist_nvalues, bool equal)
 {
@@ -624,7 +602,7 @@ calc_hist_selectivity_scalar(TypeCacheEntry *typcache, const RangeBound *constbo
  * goal of this function is to find a histogram bin where to stop
  * interpolation of portion of bounds which are less than or equal to given bound.
  */
-static int
+int
 rbound_bsearch(TypeCacheEntry *typcache, const RangeBound *value, const RangeBound *hist,
 			   int hist_length, bool equal)
 {
@@ -653,7 +631,7 @@ rbound_bsearch(TypeCacheEntry *typcache, const RangeBound *value, const RangeBou
  * all lengths in the histogram are greater than (greater than or equal) the
  * given length, returns -1.
  */
-static int
+int
 length_hist_bsearch(const Datum *length_hist_values, int length_hist_nvalues,
 					double value, bool equal)
 {
@@ -679,7 +657,7 @@ length_hist_bsearch(const Datum *length_hist_values, int length_hist_nvalues,
 /*
  * Get relative position of value in histogram bin in [0,1] range.
  */
-static float8
+float8
 get_position(TypeCacheEntry *typcache, const RangeBound *value, const RangeBound *hist1,
 			 const RangeBound *hist2)
 {
@@ -758,7 +736,7 @@ get_position(TypeCacheEntry *typcache, const RangeBound *value, const RangeBound
 /*
  * Get relative position of value in a length histogram bin in [0,1] range.
  */
-static double
+double
 get_len_position(double value, double hist1, double hist2)
 {
 	if (!isinf(hist1) && !isinf(hist2))
@@ -803,7 +781,7 @@ get_len_position(double value, double hist1, double hist2)
 /*
  * Measure distance between two range bounds.
  */
-static float8
+float8
 get_distance(TypeCacheEntry *typcache, const RangeBound *bound1, const RangeBound *bound2)
 {
 	bool		has_subdiff = OidIsValid(typcache->rng_subdiff_finfo.fn_oid);
@@ -851,7 +829,7 @@ get_distance(TypeCacheEntry *typcache, const RangeBound *bound1, const RangeBoun
  * where P(x) is the fraction of tuples with length < x (or length <= x if
  * 'equal' is true).
  */
-static double
+double
 calc_length_hist_frac(const Datum *length_hist_values, int length_hist_nvalues,
 					  double length1, double length2, bool equal)
 {
@@ -1014,7 +992,7 @@ calc_length_hist_frac(const Datum *length_hist_values, int length_hist_nvalues,
  * The caller has already checked that constant lower and upper bounds are
  * finite.
  */
-static double
+double
 calc_hist_selectivity_contained(TypeCacheEntry *typcache,
 								const RangeBound *lower, RangeBound *upper,
 								const RangeBound *hist_lower, int hist_nvalues,
@@ -1135,7 +1113,7 @@ calc_hist_selectivity_contained(TypeCacheEntry *typcache,
  * the histograms of range lower bounds and range lengths, on the assumption
  * that the range lengths are independent of the lower bounds.
  */
-static double
+double
 calc_hist_selectivity_contains(TypeCacheEntry *typcache,
 							   const RangeBound *lower, const RangeBound *upper,
 							   const RangeBound *hist_lower, int hist_nvalues,
@@ -1230,7 +1208,7 @@ calc_hist_selectivity_contains(TypeCacheEntry *typcache,
  * https://doi.org/10.48550/arXiv.2206.07396
  *
  * hist1 and hist2 are arrays of RangeBound entries from the bounds histograms
- * of two range-typed attributes X and Y, respectively.  Each array has at
+ * of two range- or multirange-typed attributes X and Y, respectively.  Each array has at
  * least 2 entries (one histogram bin).  The entries carry full bound metadata
  * (lower/upper flag, inclusive/exclusive), and all comparisons use
  * range_cmp_bounds() so that bound semantics are preserved.
@@ -1245,7 +1223,7 @@ calc_hist_selectivity_contains(TypeCacheEntry *typcache,
  * overlapping region.  Bounds checks are required because the histograms may
  * be completely disjoint (e.g., all of X is below all of Y).
  */
-static double
+double
 calc_hist_join_selectivity(TypeCacheEntry *typcache,
 						   const RangeBound *hist1, int nhist1,
 						   const RangeBound *hist2, int nhist2)
diff --git a/src/include/utils/rangetypes_selfuncs.h b/src/include/utils/rangetypes_selfuncs.h
new file mode 100644
index 00000000000..be6bda9ab11
--- /dev/null
+++ b/src/include/utils/rangetypes_selfuncs.h
@@ -0,0 +1,54 @@
+/*-------------------------------------------------------------------------
+ *
+ * rangetypes_selfuncs.h
+ *	  Shared helper functions for range and multirange selectivity estimation.
+ *
+ * These functions are defined in rangetypes_selfuncs.c and used by both
+ * rangetypes_selfuncs.c and multirangetypes_selfuncs.c.
+ *
+ * Portions Copyright (c) 1996-2026, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/rangetypes_selfuncs.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef RANGETYPES_SELFUNCS_H
+#define RANGETYPES_SELFUNCS_H
+
+#include "utils/rangetypes.h"
+
+extern double calc_hist_selectivity_scalar(TypeCacheEntry *typcache,
+										   const RangeBound *constbound,
+										   const RangeBound *hist, int hist_nvalues,
+										   bool equal);
+extern int	rbound_bsearch(TypeCacheEntry *typcache,
+						   const RangeBound *value, const RangeBound *hist,
+						   int hist_length, bool equal);
+extern int	length_hist_bsearch(const Datum *length_hist_values,
+								int length_hist_nvalues,
+								double value, bool equal);
+extern float8 get_position(TypeCacheEntry *typcache,
+						   const RangeBound *value,
+						   const RangeBound *hist1, const RangeBound *hist2);
+extern double get_len_position(double value, double hist1, double hist2);
+extern float8 get_distance(TypeCacheEntry *typcache,
+						   const RangeBound *bound1, const RangeBound *bound2);
+extern double calc_length_hist_frac(const Datum *length_hist_values,
+									int length_hist_nvalues,
+									double length1, double length2, bool equal);
+extern double calc_hist_selectivity_contained(TypeCacheEntry *typcache,
+											  const RangeBound *lower, RangeBound *upper,
+											  const RangeBound *hist_lower, int hist_nvalues,
+											  const Datum *length_hist_values,
+											  int length_hist_nvalues);
+extern double calc_hist_selectivity_contains(TypeCacheEntry *typcache,
+											 const RangeBound *lower, const RangeBound *upper,
+											 const RangeBound *hist_lower, int hist_nvalues,
+											 const Datum *length_hist_values,
+											 int length_hist_nvalues);
+extern double calc_hist_join_selectivity(TypeCacheEntry *typcache,
+										 const RangeBound *hist1, int nhist1,
+										 const RangeBound *hist2, int nhist2);
+
+#endif							/* RANGETYPES_SELFUNCS_H */
-- 
2.50.1 (Apple Git-155)
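
For readers following the thread, the P(X < Y) formula documented above
calc_hist_join_selectivity() can be illustrated on plain doubles. This is
only a sketch under simplifying assumptions, not the code from the patch.
Each histogram is taken to be a sorted array of equi-depth bin boundaries
with strictly increasing values, and RangeBound inclusivity and
range_cmp_bounds() are ignored. The names hist_fraction_below() and
hist_join_fraction_less() are hypothetical. Unlike the patch, the sketch
merges every boundary of both histograms instead of fast-forwarding to the
overlapping region, so it needs no special cases for disjoint inputs.

```c
#include <assert.h>

/*
 * Fraction of values below v, for an equi-depth histogram whose n sorted,
 * strictly increasing boundaries are h[0..n-1].  Values are assumed to be
 * uniformly spread inside each bin (linear interpolation).
 */
static double
hist_fraction_below(const double *h, int n, double v)
{
	int			lo = 0;
	int			hi = n - 1;

	assert(n >= 2);
	if (v <= h[0])
		return 0.0;
	if (v >= h[n - 1])
		return 1.0;

	/* Binary search, keeping the invariant h[lo] <= v < h[hi]. */
	while (hi - lo > 1)
	{
		int			mid = lo + (hi - lo) / 2;

		if (h[mid] <= v)
			lo = mid;
		else
			hi = mid;
	}

	/* lo full bins, plus the interpolated part of bin (lo, lo + 1). */
	return (lo + (v - h[lo]) / (h[hi] - h[lo])) / (double) (n - 1);
}

/*
 * Estimate P(X < Y) as
 *   0.5 * sum((F_X(prev) + F_X(cur)) * (F_Y(cur) - F_Y(prev)))
 * over the merged boundaries of both histograms.  F_X and F_Y are linear
 * between consecutive merged boundaries, so each term is the exact
 * integral of F_X dF_Y over that segment.
 */
static double
hist_join_fraction_less(const double *hx, int nx,
						const double *hy, int ny)
{
	int			i = 0;
	int			j = 0;
	double		sum = 0.0;
	double		prev_fx = 0.0;
	double		prev_fy = 0.0;

	while (i < nx || j < ny)
	{
		double		v;
		double		fx;
		double		fy;

		/* Take the smaller of the two next boundaries. */
		if (j >= ny || (i < nx && hx[i] <= hy[j]))
			v = hx[i++];
		else
			v = hy[j++];

		fx = hist_fraction_below(hx, nx, v);
		fy = hist_fraction_below(hy, ny, v);
		sum += (prev_fx + fx) * (fy - prev_fy);
		prev_fx = fx;
		prev_fy = fy;
	}

	return 0.5 * sum;
}
```

With hx = {0, 1} and hy = {0, 2}, the merged boundaries are 0, 0, 1, 2 and
the sum is (0 + 1) * 0.5 + (1 + 1) * 0.5 = 1.5, giving an estimate of 0.75.
That matches the direct calculation for X uniform on [0, 1] and Y uniform
on [0, 2]: P(Y > 1) + P(Y <= 1) * 0.5 = 0.5 + 0.25.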



* Re: Implement missing join selectivity estimation for range types

From: Haibo Yan @ 2026-04-23 02:25 UTC (permalink / raw)
  To: SCHOEMANS Maxime <[email protected]>; +Cc: vignesh C <[email protected]>; Tom Lane <[email protected]>; Damir Belyalov <[email protected]>; jian he <[email protected]>; PostgreSQL Hackers <[email protected]>; SAKR Mahmoud <[email protected]>; Diogo Repas <[email protected]>; Andrey Lepikhov <[email protected]>

On Tue, Apr 21, 2026 at 6:54 AM SCHOEMANS Maxime <[email protected]>
wrote:

> Hi Haibo,
>
> Thanks for the continued feedback. Attached is v8 addressing your nits
> on patch 2:
>
> - Reworded the commit message to say "the same outer-bounds-based
>   estimator can be reused" instead of implying the code is just
>   duplicated.
> - Made comments in multirangejoinsel type-neutral where they
>   unnecessarily said "range" (e.g. "bound histograms" instead of
>   "range histograms", "empty values" instead of "empty ranges").
> - Added the reverse mixed-direction tests (multirange x range).
>
> Regards,
> Maxime
>

Hi Maxime,

Thanks for addressing those points — this looks good to me now.

If you don’t mind, I’ve created a CommitFest entry for this:
https://commitfest.postgresql.org/patch/6668/

I’ve listed both of us as authors.

I’m not a committer yet, so we’ll need someone else to review and
(hopefully) pick this up for commit.

Thanks again for the work on this.

Best regards,
Haibo


* Re: Implement missing join selectivity estimation for range types

From: SCHOEMANS Maxime @ 2026-04-23 12:32 UTC (permalink / raw)
  To: Haibo Yan <[email protected]>; +Cc: vignesh C <[email protected]>; Tom Lane <[email protected]>; Damir Belyalov <[email protected]>; jian he <[email protected]>; PostgreSQL Hackers <[email protected]>; SAKR Mahmoud <[email protected]>; Diogo Repas <[email protected]>

Hi Haibo,

Thanks for creating the CommitFest entry. Could you add Diogo Repas,
Zhicheng Luo, and Mahmoud Sakr as authors as well? They wrote the
original patch and the underlying algorithm. The earlier CF entry is
at https://commitfest.postgresql.org/patch/3821/ for reference.

Regards,
Maxime


* Re: Implement missing join selectivity estimation for range types

From: Haibo Yan @ 2026-04-24 18:44 UTC (permalink / raw)
  To: SCHOEMANS Maxime <[email protected]>; +Cc: vignesh C <[email protected]>; Tom Lane <[email protected]>; Damir Belyalov <[email protected]>; jian he <[email protected]>; PostgreSQL Hackers <[email protected]>; SAKR Mahmoud <[email protected]>; Diogo Repas <[email protected]>

On Thu, Apr 23, 2026 at 5:32 AM SCHOEMANS Maxime <[email protected]>
wrote:

> Hi Haibo,
>
> Thanks for creating the CommitFest entry. Could you add Diogo Repas,
> Zhicheng Luo, and Mahmoud Sakr as authors as well? They wrote the
> original patch and the underlying algorithm. The earlier CF entry is
> at https://commitfest.postgresql.org/patch/3821/ for reference.
>
> Regards,
> Maxime
>

Of course. I’ve added Diogo Repas, Zhicheng Luo, and Mahmoud Sakr as
authors as well.

Thanks for the reference.

Best regards,
Haibo


Thread overview: 16+ messages
2022-06-30 14:31 Implement missing join selectivity estimation for range types Mahmoud Sakr <[email protected]>
2024-01-05 10:37 ` vignesh C <[email protected]>
2024-01-05 17:39   ` Schoemans Maxime <[email protected]>
2024-01-17 10:48     ` vignesh C <[email protected]>
2024-01-22 08:10       ` jian he <[email protected]>
2026-04-06 23:32         ` Haibo Yan <[email protected]>
2026-04-14 14:03           ` SCHOEMANS Maxime <[email protected]>
2026-04-15 00:53             ` Haibo Yan <[email protected]>
2026-04-15 15:13               ` SCHOEMANS Maxime <[email protected]>
2026-04-16 04:12                 ` Haibo Yan <[email protected]>
2026-04-16 15:12                   ` SCHOEMANS Maxime <[email protected]>
2026-04-18 04:02                     ` Haibo Yan <[email protected]>
2026-04-21 13:54                       ` SCHOEMANS Maxime <[email protected]>
2026-04-23 02:25                         ` Haibo Yan <[email protected]>
2026-04-23 12:32                           ` SCHOEMANS Maxime <[email protected]>
2026-04-24 18:44                             ` Haibo Yan <[email protected]>