public inbox for [email protected]  
help / color / mirror / Atom feed
Re: Implement missing join selectivity estimation for range types
2+ messages / 1 participants
[nested] [flat]

* Re: Implement missing join selectivity estimation for range types
@ 2023-06-19 09:49 Schoemans Maxime <[email protected]>
  2023-06-19 16:26 ` Re: Implement missing join selectivity estimation for range types Schoemans Maxime <[email protected]>
  0 siblings, 1 reply; 2+ messages in thread

From: Schoemans Maxime @ 2023-06-19 09:49 UTC (permalink / raw)
  To: PostgreSQL Hackers <[email protected]>; +Cc: SAKR Mahmoud <[email protected]>; Diogo Repas <[email protected]>; LUO Zhicheng <[email protected]>; Tomas Vondra <[email protected]>; Andrey Lepikhov <[email protected]>

Hi,

In the selectivity algorithm, the division was applied after adding the remaining histogram buckets of histogram2 that don't overlap with histogram1.
This could lead to reducing selectivity by half, e.g., in the case that histogram2 is completely right of histogram1.
The correct calculation is to divide by two before adding the remainder.
This patch-fix does the needed.

Best regards,
Maxime Schoemans

On 20/03/2023 16:34, maxime wrote:
Hi Tomas,

As a quick update, the paper related to this work has finally been published in Mathematics (https://www.mdpi.com/2227-7390/11/6/1383).
During revision we also added a figure showing a comparison of our algorithm vs the existing algorithms in Oracle, SQL Server, MySQL and PostgreSQL, which can be found in the experiments section of the paper.
As can be seen, our algorithm outperforms even Oracle and SQL Server.

During this revision we also found a small bug, so we are working on a revision of the patch, which fixes this.


Also, calc_hist_selectivity_contains in multirangetypes_selfuncs.c needs
a proper comment, not just "this is a copy from rangetypes".


Right, the comment should elaborate more that the collected statistics are
currently that same as rangetypes but may potentially deviate.



However, it seems the two functions are exactly the same. Would the
functions diverge in the future? If not, maybe there should be just a
single shared function?


Indeed, it is possible that the two functions will deviate if that statistics
of multirange types will be refined.



Right, but are there any such plans? Also, what's the likelihood we'll
add new statistics to only one of the places (e.g. for multiranges but
not plain ranges)?

I'd keep a single function until we actually need two. That's also
easier for maintenance - with two it's easy to fix a bug in one place
and forget about the other, etc.

Regarding our previous discussion about the duplication of calc_hist_join_selectivity in rangetypes_selfuncs.c and multirangetypes_selfuncs.c, we can also remove this duplication in the revision if needed.
Note that currently, there are no external functions shared between rangetypes_selfuncs.c and multirangetypes_selfuncs.c.
Any function that was used in both files was duplicated as a static function.
The functions calc_hist_selectivity_scalar, calc_length_hist_frac, calc_hist_selectivity_contained and calc_hist_selectivity_contains are examples of this, where the function is identical but has been declared static in both files.
That said, we can remove the duplication of calc_hist_join_selectivity if it still needed.
We would, however, require some guidance as to where to put the external definition of this function, as there does not appear to be a rangetypes_selfuncs.h header.
Should it simply go into utils/selfuncs.h or should we create a new header file?

Best regards,
Maxime Schoemans



Attachments:

  [text/x-patch] v1-0002-apply-division-before-adding-remainder.patch (1.5K, 3-v1-0002-apply-division-before-adding-remainder.patch)
  download | inline diff:
From 53291919f536f6e7b04ca87f408e5b95e730ddb0 Mon Sep 17 00:00:00 2001
From: Maxime Schoemans <[email protected]>
Date: Mon, 20 Mar 2023 11:48:05 -0400
Subject: [PATCH] Apply division before adding remainder

---
 src/backend/utils/adt/multirangetypes_selfuncs.c | 5 ++++-
 src/backend/utils/adt/rangetypes_selfuncs.c      | 5 ++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/multirangetypes_selfuncs.c b/src/backend/utils/adt/multirangetypes_selfuncs.c
index 7ba4aa8b04..ad14b789f4 100644
--- a/src/backend/utils/adt/multirangetypes_selfuncs.c
+++ b/src/backend/utils/adt/multirangetypes_selfuncs.c
@@ -1412,11 +1412,14 @@ calc_hist_join_selectivity(TypeCacheEntry *typcache,
 		prev_sel2 = cur_sel2;
 	}
 
+	/* P(X < Y) = 0.5 * Sum(...) */
+	selectivity /= 2;
+
 	/* Include remainder of hist2 if any */
 	if (j < nhist2)
 		selectivity += 1 - prev_sel2;
 
-	return selectivity / 2;
+	return selectivity;
 }
 
 /*
diff --git a/src/backend/utils/adt/rangetypes_selfuncs.c b/src/backend/utils/adt/rangetypes_selfuncs.c
index 007e14bcf6..129ef9648f 100644
--- a/src/backend/utils/adt/rangetypes_selfuncs.c
+++ b/src/backend/utils/adt/rangetypes_selfuncs.c
@@ -1342,11 +1342,14 @@ calc_hist_join_selectivity(TypeCacheEntry *typcache,
 		prev_sel2 = cur_sel2;
 	}
 
+	/* P(X < Y) = 0.5 * Sum(...) */
+	selectivity /= 2
+
 	/* Include remainder of hist2 if any */
 	if (j < nhist2)
 		selectivity += 1 - prev_sel2;
 
-	return selectivity / 2;
+	return selectivity;
 }
 
 /*
-- 
2.17.1



^ permalink  raw  reply  [nested|flat] 2+ messages in thread

* Re: Implement missing join selectivity estimation for range types
  2023-06-19 09:49 Re: Implement missing join selectivity estimation for range types Schoemans Maxime <[email protected]>
@ 2023-06-19 16:26 ` Schoemans Maxime <[email protected]>
  0 siblings, 0 replies; 2+ messages in thread

From: Schoemans Maxime @ 2023-06-19 16:26 UTC (permalink / raw)
  To: PostgreSQL Hackers <[email protected]>; +Cc: SAKR Mahmoud <[email protected]>; Diogo Repas <[email protected]>; LUO Zhicheng <[email protected]>; Tomas Vondra <[email protected]>; Andrey Lepikhov <[email protected]>

This is a quick correction as the last patch contained a missing semicolon.

Regards,
Maxime Schoemans

Attachments:

  [text/x-patch] v2-0002-apply-division-before-adding-remainder.patch (1.5K, 2-v2-0002-apply-division-before-adding-remainder.patch)
  download | inline diff:
From ebd62356210eff2f38772a9c46a0a8792c0e9ce3 Mon Sep 17 00:00:00 2001
From: Maxime Schoemans <[email protected]>
Date: Mon, 20 Mar 2023 11:48:05 -0400
Subject: [PATCH v2] Apply division before adding remainder

---
 src/backend/utils/adt/multirangetypes_selfuncs.c | 5 ++++-
 src/backend/utils/adt/rangetypes_selfuncs.c      | 5 ++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/multirangetypes_selfuncs.c b/src/backend/utils/adt/multirangetypes_selfuncs.c
index 7ba4aa8b04..ad14b789f4 100644
--- a/src/backend/utils/adt/multirangetypes_selfuncs.c
+++ b/src/backend/utils/adt/multirangetypes_selfuncs.c
@@ -1412,11 +1412,14 @@ calc_hist_join_selectivity(TypeCacheEntry *typcache,
 		prev_sel2 = cur_sel2;
 	}
 
+	/* P(X < Y) = 0.5 * Sum(...) */
+	selectivity /= 2;
+
 	/* Include remainder of hist2 if any */
 	if (j < nhist2)
 		selectivity += 1 - prev_sel2;
 
-	return selectivity / 2;
+	return selectivity;
 }
 
 /*
diff --git a/src/backend/utils/adt/rangetypes_selfuncs.c b/src/backend/utils/adt/rangetypes_selfuncs.c
index 007e14bcf6..90970943b3 100644
--- a/src/backend/utils/adt/rangetypes_selfuncs.c
+++ b/src/backend/utils/adt/rangetypes_selfuncs.c
@@ -1342,11 +1342,14 @@ calc_hist_join_selectivity(TypeCacheEntry *typcache,
 		prev_sel2 = cur_sel2;
 	}
 
+	/* P(X < Y) = 0.5 * Sum(...) */
+	selectivity /= 2;
+
 	/* Include remainder of hist2 if any */
 	if (j < nhist2)
 		selectivity += 1 - prev_sel2;
 
-	return selectivity / 2;
+	return selectivity;
 }
 
 /*
-- 
2.17.1



^ permalink  raw  reply  [nested|flat] 2+ messages in thread


end of thread, other threads:[~2023-06-19 16:26 UTC | newest]

Thread overview: 2+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2023-06-19 09:49 Re: Implement missing join selectivity estimation for range types Schoemans Maxime <[email protected]>
2023-06-19 16:26 ` Schoemans Maxime <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox