public inbox for [email protected]
help / color / mirror / Atom feedincorrect information in documentation
6+ messages / 3 participants
[nested] [flat]
* incorrect information in documentation
@ 2021-08-09 12:53 PG Doc comments form <[email protected]>
2021-08-09 16:06 ` Re: incorrect information in documentation David G. Johnston <[email protected]>
0 siblings, 1 reply; 6+ messages in thread
From: PG Doc comments form @ 2021-08-09 12:53 UTC (permalink / raw)
To: [email protected]; +Cc: [email protected]
The following documentation comment has been logged on the website:
Page: https://www.postgresql.org/docs/13/row-estimation-examples.html
Description:
Hello, on page
https://www.postgresql.org/docs/current/row-estimation-examples.html - there
is a example:
selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/num_distinct1,
1/num_distinct2)
= (1 - 0) * (1 - 0) / max(10000, 10000)
= 0.0001
in the first string " * min" and in the second " / max"
as I understand it isn't correct.
^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: incorrect information in documentation
2021-08-09 12:53 incorrect information in documentation PG Doc comments form <[email protected]>
@ 2021-08-09 16:06 ` David G. Johnston <[email protected]>
2021-08-09 16:20 ` Re: incorrect information in documentation David G. Johnston <[email protected]>
0 siblings, 1 reply; 6+ messages in thread
From: David G. Johnston @ 2021-08-09 16:06 UTC (permalink / raw)
To: [email protected]; Pg Docs <[email protected]>
On Mon, Aug 9, 2021 at 8:02 AM PG Doc comments form <[email protected]>
wrote:
> Hello, on page
> https://www.postgresql.org/docs/current/row-estimation-examples.html -
> there
> is a example:
> selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/num_distinct1,
> 1/num_distinct2)
> = (1 - 0) * (1 - 0) / max(10000, 10000)
> = 0.0001
> in the first string " * min" and in the second " / max"
> as I understand it isn't correct.
>
Division is just multiplication by the reciprocal so while the presentation
here is inconsistent it is correct. Likewise, the larger a number the
smaller its reciprocal, so the change from min to max also works.
David J.
^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: incorrect information in documentation
2021-08-09 12:53 incorrect information in documentation PG Doc comments form <[email protected]>
2021-08-09 16:06 ` Re: incorrect information in documentation David G. Johnston <[email protected]>
@ 2021-08-09 16:20 ` David G. Johnston <[email protected]>
2021-08-09 18:05 ` Re: incorrect information in documentation Bruce Momjian <[email protected]>
0 siblings, 1 reply; 6+ messages in thread
From: David G. Johnston @ 2021-08-09 16:20 UTC (permalink / raw)
To: [email protected]; Pg Docs <[email protected]>; Tom Lane <[email protected]>
On Mon, Aug 9, 2021 at 9:06 AM David G. Johnston <[email protected]>
wrote:
> On Mon, Aug 9, 2021 at 8:02 AM PG Doc comments form <
> [email protected]> wrote:
>
>> Hello, on page
>> https://www.postgresql.org/docs/current/row-estimation-examples.html -
>> there
>> is a example:
>> selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/num_distinct1,
>> 1/num_distinct2)
>> = (1 - 0) * (1 - 0) / max(10000, 10000)
>> = 0.0001
>> in the first string " * min" and in the second " / max"
>> as I understand it isn't correct.
>>
>
> Division is just multiplication by the reciprocal so while the
> presentation here is inconsistent it is correct. Likewise, the larger a
> number the smaller its reciprocal, so the change from min to max also
> works.
>
>
FWIW this used to be presented with the calculation and formula in sync,
but the original had a simple typo in the calculation. When the typo got
fixed back in December of 2007 [1] the author of the patch simplified the
calculation at the same time. I suggest we update the formula line to
match the calculation presented.
David J.
1.
https://github.com/postgres/postgres/commit/f5678e8e07563e34ae4dc832546977d13edcd665
^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: incorrect information in documentation
2021-08-09 12:53 incorrect information in documentation PG Doc comments form <[email protected]>
2021-08-09 16:06 ` Re: incorrect information in documentation David G. Johnston <[email protected]>
2021-08-09 16:20 ` Re: incorrect information in documentation David G. Johnston <[email protected]>
@ 2021-08-09 18:05 ` Bruce Momjian <[email protected]>
2021-08-10 03:40 ` Re: incorrect information in documentation David G. Johnston <[email protected]>
0 siblings, 1 reply; 6+ messages in thread
From: Bruce Momjian @ 2021-08-09 18:05 UTC (permalink / raw)
To: David G. Johnston <[email protected]>; +Cc: [email protected]; Pg Docs <[email protected]>; Tom Lane <[email protected]>
On Mon, Aug 9, 2021 at 09:20:53AM -0700, David G. Johnston wrote:
> On Mon, Aug 9, 2021 at 9:06 AM David G. Johnston <[email protected]>
> wrote:
>
> On Mon, Aug 9, 2021 at 8:02 AM PG Doc comments form <[email protected]
> > wrote:
>
> Hello, on page
> https://www.postgresql.org/docs/current/row-estimation-examples.html -
> there
> is a example:
> selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/
> num_distinct1,
> 1/num_distinct2)
> = (1 - 0) * (1 - 0) / max(10000, 10000)
> = 0.0001
> in the first string " * min" and in the second " / max"
> as I understand it isn't correct.
>
>
> Division is just multiplication by the reciprocal so while the presentation
> here is inconsistent it is correct. Likewise, the larger a number the
> smaller its reciprocal, so the change from min to max also works.
>
>
>
> FWIW this used to be presented with the calculation and formula in sync, but
> the original had a simple typo in the calculation. When the typo got fixed
> back in December of 2007 [1] the author of the patch simplified the calculation
> at the same time. I suggest we update the formula line to match the
> calculation presented.
Nice, can you provide a patch please?
--
Bruce Momjian <[email protected]> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: incorrect information in documentation
2021-08-09 12:53 incorrect information in documentation PG Doc comments form <[email protected]>
2021-08-09 16:06 ` Re: incorrect information in documentation David G. Johnston <[email protected]>
2021-08-09 16:20 ` Re: incorrect information in documentation David G. Johnston <[email protected]>
2021-08-09 18:05 ` Re: incorrect information in documentation Bruce Momjian <[email protected]>
@ 2021-08-10 03:40 ` David G. Johnston <[email protected]>
2022-04-12 21:36 ` Re: incorrect information in documentation David G. Johnston <[email protected]>
0 siblings, 1 reply; 6+ messages in thread
From: David G. Johnston @ 2021-08-10 03:40 UTC (permalink / raw)
To: Bruce Momjian <[email protected]>; +Cc: [email protected]; Pg Docs <[email protected]>; Tom Lane <[email protected]>
On Mon, Aug 9, 2021 at 11:05 AM Bruce Momjian <[email protected]> wrote:
>
> > selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/
> > num_distinct1,
> > 1/num_distinct2)
> > = (1 - 0) * (1 - 0) / max(10000, 10000)
> > = 0.0001
>
> Nice, can you provide a patch please?
>
>
Change the line:
selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/num_distinct1,
1/num_distinct2)
to be:
selectivity = (1 - null_frac1) * (1 - null_frac2) / max(num_distinct1,
num_distinct2)
The wording already talks about "divide by max".
Though:
"so we use an algorithm that relies only on the number of distinct values
for both relations together with their null fractions:"
maybe adds a parenthetical note:
"so we use an algorithm that relies only on the number of distinct values
(the row count estimate for the whole table, not the -1 in the column
statistics) for both relations together with their null fractions:"
Just note I haven't tried to absorb that whole page, let alone the
implementation, and am not all that familiar with this part of PostgreSQL.
Its seems right, though, in isolation.
David J.
^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: incorrect information in documentation
2021-08-09 12:53 incorrect information in documentation PG Doc comments form <[email protected]>
2021-08-09 16:06 ` Re: incorrect information in documentation David G. Johnston <[email protected]>
2021-08-09 16:20 ` Re: incorrect information in documentation David G. Johnston <[email protected]>
2021-08-09 18:05 ` Re: incorrect information in documentation Bruce Momjian <[email protected]>
2021-08-10 03:40 ` Re: incorrect information in documentation David G. Johnston <[email protected]>
@ 2022-04-12 21:36 ` David G. Johnston <[email protected]>
0 siblings, 0 replies; 6+ messages in thread
From: David G. Johnston @ 2022-04-12 21:36 UTC (permalink / raw)
To: Bruce Momjian <[email protected]>; +Cc: [email protected]; Pg Docs <[email protected]>; Tom Lane <[email protected]>
On Mon, Aug 9, 2021 at 8:40 PM David G. Johnston <[email protected]>
wrote:
> On Mon, Aug 9, 2021 at 11:05 AM Bruce Momjian <[email protected]> wrote:
>
>>
>> > selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/
>> > num_distinct1,
>> > 1/num_distinct2)
>> > = (1 - 0) * (1 - 0) / max(10000, 10000)
>> > = 0.0001
>>
>> Nice, can you provide a patch please?
>>
>>
> Change the line:
>
>
Concretely, as attached and inline.
David J.
commit 73fa486a855d75d74a1a695bb350bfbfe27c7751
Author: David G. Johnston <[email protected]>
Date: Tue Apr 12 21:23:53 2022 +0000
doc: make unique non-null join selectivity example match the prose
The description of the computation for the unique, non-null,
join selectivity describes a division by the maximum of two values,
while the example shows a multiplication by their reciprocal. While
equivalent the max phrasing is easier to understand; which seems
more important here than precisely adhering to the formula use
in the code (for which either variant is still an approximation).
While both num_distinct and num_rows are equal for a unique column
both the concept and formula use row count (10,000) and the
field num_distinct has already been set to mean the specific value
present in the pg_stats table (i.e, -1), so use num_rows here.
diff --git a/doc/src/sgml/planstats.sgml b/doc/src/sgml/planstats.sgml
index 78053d7c49..f72bc4b274 100644
--- a/doc/src/sgml/planstats.sgml
+++ b/doc/src/sgml/planstats.sgml
@@ -391,18 +391,20 @@ tablename | null_frac | n_distinct | most_common_vals
</programlisting>
In this case there is no <acronym>MCV</acronym> information for
- <structfield>unique2</structfield> because all the values appear to be
- unique, so we use an algorithm that relies only on the number of
- distinct values for both relations together with their null fractions:
+ <structname>unique2</structname> and all the values appear to be
+ unique (n_distinct = -1), so we use an algorithm that relies on the row
+ count estimates for both relations (num_rows, not shown, but "tenk")
+ together with the column null fractions (zero for both):
<programlisting>
-selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/num_distinct1,
1/num_distinct2)
+selectivity = (1 - null_frac1) * (1 - null_frac2) / max(num_rows1,
num_rows2)
= (1 - 0) * (1 - 0) / max(10000, 10000)
= 0.0001
</programlisting>
This is, subtract the null fraction from one for each of the relations,
- and divide by the maximum of the numbers of distinct values.
+ and divide by the row count of the larger relation (this value does get
+ scaled in the non-unique case).
The number of rows
that the join is likely to emit is calculated as the cardinality of the
Cartesian product of the two inputs, multiplied by the
Attachments:
[application/octet-stream] v0001-doc-make-row-estimation-example-match-prose.patch (2.4K, 3-v0001-doc-make-row-estimation-example-match-prose.patch)
download | inline diff:
commit 73fa486a855d75d74a1a695bb350bfbfe27c7751
Author: David G. Johnston <[email protected]>
Date: Tue Apr 12 21:23:53 2022 +0000
doc: make unique non-null join selectivity example match the prose
The description of the computation for the unique, non-null,
join selectivity describes a division by the maximum of two values,
while the example shows a multiplication by their reciprocal. While
equivalent the max phrasing is easier to understand; which seems
more important here than precisely adhering to the formula use
in the code (for which either variant is still an approximation).
While both num_distinct and num_rows are equal for a unique column
both the concept and formula use row count (10,000) and the
field num_distinct has already been set to mean the specific value
present in the pg_stats table (i.e, -1), so use num_rows here.
diff --git a/doc/src/sgml/planstats.sgml b/doc/src/sgml/planstats.sgml
index 78053d7c49..f72bc4b274 100644
--- a/doc/src/sgml/planstats.sgml
+++ b/doc/src/sgml/planstats.sgml
@@ -391,18 +391,20 @@ tablename | null_frac | n_distinct | most_common_vals
</programlisting>
In this case there is no <acronym>MCV</acronym> information for
- <structfield>unique2</structfield> because all the values appear to be
- unique, so we use an algorithm that relies only on the number of
- distinct values for both relations together with their null fractions:
+ <structname>unique2</structname> and all the values appear to be
+ unique (n_distinct = -1), so we use an algorithm that relies on the row
+ count estimates for both relations (num_rows, not shown, but "tenk")
+ together with the column null fractions (zero for both):
<programlisting>
-selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/num_distinct1, 1/num_distinct2)
+selectivity = (1 - null_frac1) * (1 - null_frac2) / max(num_rows1, num_rows2)
= (1 - 0) * (1 - 0) / max(10000, 10000)
= 0.0001
</programlisting>
This is, subtract the null fraction from one for each of the relations,
- and divide by the maximum of the numbers of distinct values.
+ and divide by the row count of the larger relation (this value does get
+ scaled in the non-unique case).
The number of rows
that the join is likely to emit is calculated as the cardinality of the
Cartesian product of the two inputs, multiplied by the
^ permalink raw reply [nested|flat] 6+ messages in thread
end of thread, other threads:[~2022-04-12 21:36 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2021-08-09 12:53 incorrect information in documentation PG Doc comments form <[email protected]>
2021-08-09 16:06 ` David G. Johnston <[email protected]>
2021-08-09 16:20 ` David G. Johnston <[email protected]>
2021-08-09 18:05 ` Bruce Momjian <[email protected]>
2021-08-10 03:40 ` David G. Johnston <[email protected]>
2022-04-12 21:36 ` David G. Johnston <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox