Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1neOCH-0005bZ-SF for pgsql-docs@arkaria.postgresql.org; Tue, 12 Apr 2022 21:37:14 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1neOCG-0003cX-Js for pgsql-docs@arkaria.postgresql.org; Tue, 12 Apr 2022 21:37:12 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1neOCG-0003c5-0S for pgsql-docs@lists.postgresql.org; Tue, 12 Apr 2022 21:37:12 +0000 Received: from mail-ej1-x633.google.com ([2a00:1450:4864:20::633]) by makus.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1neOCB-0000se-S3 for pgsql-docs@lists.postgresql.org; Tue, 12 Apr 2022 21:37:11 +0000 Received: by mail-ej1-x633.google.com with SMTP id t11so126250eju.13 for ; Tue, 12 Apr 2022 14:37:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=KkQAYXiDLR3hiKZNY4WgZ2vyipo5ng6IU6uy0MhYEik=; b=lgJI8iwyNdzo2+Eo0VeambGeASTO0dITJwVZebhdEP9VbB3iOlwmCTKXALBf8sopWh 6jTR3pr+1hsgv6Yak/4bvciWTStcsA/+zfnyQYT9ZCDNjf1ED3VIlvF7u55Bl6YrRs4z 9pS5KfpCIIj896LhgFhS5vKPwCb4IwEnPGWT3X4EbmnwsfYBa/0vjpHBEoIKCzB1vc7x cI/8JPIPpkHECKXBK011a5Unm6j+zgnFjPgWw7uUOgrEzQvzMHpaStxpxaYNZ4NX9G9Z nXB9Tylq2E7XuImuLSn14rCziNl9hlz39FDT3ALJPSS5NbNlR31RhDP0mznB4aF5bkjZ e5pg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=KkQAYXiDLR3hiKZNY4WgZ2vyipo5ng6IU6uy0MhYEik=; b=JOy5YhMDIcYL3xwX1JeavUVFIB69atJqB+OysG/y9viHur4A3Vneog3cRngfoIRYhN kUic5gMF/Zue14aowc9+Hq3lEecG10eS6A2JU3v3NM9xtkwLtAxKzjvGIbXdj9Y8Ots8 JmwKknB9SSxR6I2EV1sCR4jYmmUIkskq9OW5uURZ5W4W1p+MeauS97YlnhNjD+6o5UOl R/K/9A4zIIJTLAKjADqcEyRW8G5XZEG9GcZ6v7CIZUIflJnvyQjqWDa4vh1ILLincjCr 5ZjBGjGLCXuBleo4+OVZV0+hhu9hLLyGDe6ibxQ3esLs3FWGP5Q8/WmHOAP9IXRdQERn 08fA== X-Gm-Message-State: AOAM533GmyeqKzjzjn0gHOvPtay6N7ml0kMNK997tdq2ug6L5OurRMzJ rdG42fFfhjmFRQT0MoOZ9IF1Ecqf0qWnZgm08HQ= X-Google-Smtp-Source: ABdhPJz0lCNFm5My/Vna/Ew8YkOgwFwsn3xE7pWY6L8tb9gTSkE2s2py9AvLTgFwDaVkAz/3AZ2ZuV8QQlz38VMp3RI= X-Received: by 2002:a17:906:d54b:b0:6e7:f185:18d5 with SMTP id cr11-20020a170906d54b00b006e7f18518d5mr38634356ejc.155.1649799426126; Tue, 12 Apr 2022 14:37:06 -0700 (PDT) MIME-Version: 1.0 References: <162851358008.709.11763395628268716122@wrigleys.postgresql.org> <20210809180550.GC14370@momjian.us> In-Reply-To: From: "David G. Johnston" Date: Tue, 12 Apr 2022 14:36:47 -0700 Message-ID: Subject: Re: incorrect information in documentation To: Bruce Momjian Cc: ivanmulhin@gmail.com, Pg Docs , Tom Lane Content-Type: multipart/mixed; boundary="000000000000bee91e05dc7bdc41" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000bee91e05dc7bdc41 Content-Type: multipart/alternative; boundary="000000000000bee91c05dc7bdc3f" --000000000000bee91c05dc7bdc3f Content-Type: text/plain; charset="UTF-8" On Mon, Aug 9, 2021 at 8:40 PM David G. Johnston wrote: > On Mon, Aug 9, 2021 at 11:05 AM Bruce Momjian wrote: > >> >> > selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/ >> > num_distinct1, >> > 1/num_distinct2) >> > = (1 - 0) * (1 - 0) / max(10000, 10000) >> > = 0.0001 >> >> Nice, can you provide a patch please? >> >> > Change the line: > > Concretely, as attached and inline. David J. commit 73fa486a855d75d74a1a695bb350bfbfe27c7751 Author: David G. Johnston Date: Tue Apr 12 21:23:53 2022 +0000 doc: make unique non-null join selectivity example match the prose The description of the computation for the unique, non-null, join selectivity describes a division by the maximum of two values, while the example shows a multiplication by their reciprocal. While equivalent the max phrasing is easier to understand; which seems more important here than precisely adhering to the formula use in the code (for which either variant is still an approximation). While both num_distinct and num_rows are equal for a unique column both the concept and formula use row count (10,000) and the field num_distinct has already been set to mean the specific value present in the pg_stats table (i.e, -1), so use num_rows here. diff --git a/doc/src/sgml/planstats.sgml b/doc/src/sgml/planstats.sgml index 78053d7c49..f72bc4b274 100644 --- a/doc/src/sgml/planstats.sgml +++ b/doc/src/sgml/planstats.sgml @@ -391,18 +391,20 @@ tablename | null_frac | n_distinct | most_common_vals In this case there is no MCV information for - unique2 because all the values appear to be - unique, so we use an algorithm that relies only on the number of - distinct values for both relations together with their null fractions: + unique2 and all the values appear to be + unique (n_distinct = -1), so we use an algorithm that relies on the row + count estimates for both relations (num_rows, not shown, but "tenk") + together with the column null fractions (zero for both): -selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/num_distinct1, 1/num_distinct2) +selectivity = (1 - null_frac1) * (1 - null_frac2) / max(num_rows1, num_rows2) = (1 - 0) * (1 - 0) / max(10000, 10000) = 0.0001 This is, subtract the null fraction from one for each of the relations, - and divide by the maximum of the numbers of distinct values. + and divide by the row count of the larger relation (this value does get + scaled in the non-unique case). The number of rows that the join is likely to emit is calculated as the cardinality of the Cartesian product of the two inputs, multiplied by the --000000000000bee91c05dc7bdc3f Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Mon, Aug 9, 2021 at 8:40 PM David G. Johnston <david.g.johnston@gmail.com>= wrote:
On Mon, Aug 9, 2021 at 11:05 AM Bruce Momjian <= ;bruce@momjian.us= > wrote:

>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0selectivity =3D (1 - null_frac1) * (1= - null_frac2) * min(1/
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0num_distinct1,
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01/num_distinct2)
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =3D (1 - 0) * (1 - 0) / max(10000, 10000)
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =3D 0.0001

Nice, can you provide a patch please?


Change the line:
<= div style=3D"font-family:arial,helvetica,sans-serif">
=

Concretely, as attached and inline.
=
David J.


commit 73fa486a855d75d74a1a695bb350bfb= fe27c7751
Author: David G. Johnston <david.g.johnston@gmail.com>
Date: =C2=A0 T= ue Apr 12 21:23:53 2022 +0000

=C2=A0 =C2=A0 doc: make unique non-nul= l join selectivity example match the prose
=C2=A0 =C2=A0
=C2=A0 =C2= =A0 The description of the computation for the unique, non-null,
=C2=A0 = =C2=A0 join selectivity describes a division by the maximum of two values,<= br>=C2=A0 =C2=A0 while the example shows a multiplication by their reciproc= al.=C2=A0 While
=C2=A0 =C2=A0 equivalent the max phrasing is easier to u= nderstand; which seems
=C2=A0 =C2=A0 more important here than precisely = adhering to the formula use
=C2=A0 =C2=A0 in the code (for which either = variant is still an approximation).
=C2=A0 =C2=A0
=C2=A0 =C2=A0 Whil= e both num_distinct and num_rows are equal for a unique column
=C2=A0 = =C2=A0 both the concept and formula use row count (10,000) and the
=C2= =A0 =C2=A0 field num_distinct has already been set to mean the specific val= ue
=C2=A0 =C2=A0 present in the pg_stats table (i.e, -1), so use num_row= s here.

diff --git a/doc/src/sgml/planstats.sgml b/doc/src/sgml/plan= stats.sgml
index 78053d7c49..f72bc4b274 100644
--- a/doc/src/sgml/pla= nstats.sgml
+++ b/doc/src/sgml/planstats.sgml
@@ -391,18 +391,20 @@ t= ablename =C2=A0| null_frac | n_distinct | most_common_vals
=C2=A0</pr= ogramlisting>
=C2=A0
=C2=A0 =C2=A0 In this case there is no <ac= ronym>MCV</acronym> information for
- =C2=A0 <structfield>= ;unique2</structfield> because all the values appear to be
- =C2= =A0 unique, so we use an algorithm that relies only on the number of
- = =C2=A0 distinct values for both relations together with their null fraction= s:
+ =C2=A0 <structname>unique2</structname> and all the val= ues appear to be
+ =C2=A0 unique (n_distinct =3D -1), so we use an algor= ithm that relies on the row
+ =C2=A0 count estimates for both relations = (num_rows, not shown, but "tenk")
+ =C2=A0 together with the c= olumn null fractions (zero for both):
=C2=A0
=C2=A0<programlisting= >
-selectivity =3D (1 - null_frac1) * (1 - null_frac2) * min(1/num_di= stinct1, 1/num_distinct2)
+selectivity =3D (1 - null_frac1) * (1 - null_= frac2) / max(num_rows1, num_rows2)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0=3D (1 - 0) * (1 - 0) / max(10000, 10000)
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=3D 0.0001
=C2=A0</programlisting><= br>=C2=A0
=C2=A0 =C2=A0 This is, subtract the null fraction from one for= each of the relations,
- =C2=A0 and divide by the maximum of the number= s of distinct values.
+ =C2=A0 and divide by the row count of the larger= relation (this value does get
+ =C2=A0 scaled in the non-unique case).<= br>=C2=A0 =C2=A0 The number of rows
=C2=A0 =C2=A0 that the join is likel= y to emit is calculated as the cardinality of the
=C2=A0 =C2=A0 Cartesia= n product of the two inputs, multiplied by the
--000000000000bee91c05dc7bdc3f-- --000000000000bee91e05dc7bdc41 Content-Type: application/octet-stream; name="v0001-doc-make-row-estimation-example-match-prose.patch" Content-Disposition: attachment; filename="v0001-doc-make-row-estimation-example-match-prose.patch" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_l1wny4wk0 Y29tbWl0IDczZmE0ODZhODU1ZDc1ZDc0YTFhNjk1YmIzNTBiZmJmZTI3Yzc3NTEKQXV0aG9yOiBE YXZpZCBHLiBKb2huc3RvbiA8ZGF2aWQuZy5qb2huc3RvbkBnbWFpbC5jb20+CkRhdGU6ICAgVHVl IEFwciAxMiAyMToyMzo1MyAyMDIyICswMDAwCgogICAgZG9jOiBtYWtlIHVuaXF1ZSBub24tbnVs bCBqb2luIHNlbGVjdGl2aXR5IGV4YW1wbGUgbWF0Y2ggdGhlIHByb3NlCiAgICAKICAgIFRoZSBk ZXNjcmlwdGlvbiBvZiB0aGUgY29tcHV0YXRpb24gZm9yIHRoZSB1bmlxdWUsIG5vbi1udWxsLAog ICAgam9pbiBzZWxlY3Rpdml0eSBkZXNjcmliZXMgYSBkaXZpc2lvbiBieSB0aGUgbWF4aW11bSBv ZiB0d28gdmFsdWVzLAogICAgd2hpbGUgdGhlIGV4YW1wbGUgc2hvd3MgYSBtdWx0aXBsaWNhdGlv biBieSB0aGVpciByZWNpcHJvY2FsLiAgV2hpbGUKICAgIGVxdWl2YWxlbnQgdGhlIG1heCBwaHJh c2luZyBpcyBlYXNpZXIgdG8gdW5kZXJzdGFuZDsgd2hpY2ggc2VlbXMKICAgIG1vcmUgaW1wb3J0 YW50IGhlcmUgdGhhbiBwcmVjaXNlbHkgYWRoZXJpbmcgdG8gdGhlIGZvcm11bGEgdXNlCiAgICBp biB0aGUgY29kZSAoZm9yIHdoaWNoIGVpdGhlciB2YXJpYW50IGlzIHN0aWxsIGFuIGFwcHJveGlt YXRpb24pLgogICAgCiAgICBXaGlsZSBib3RoIG51bV9kaXN0aW5jdCBhbmQgbnVtX3Jvd3MgYXJl IGVxdWFsIGZvciBhIHVuaXF1ZSBjb2x1bW4KICAgIGJvdGggdGhlIGNvbmNlcHQgYW5kIGZvcm11 bGEgdXNlIHJvdyBjb3VudCAoMTAsMDAwKSBhbmQgdGhlCiAgICBmaWVsZCBudW1fZGlzdGluY3Qg aGFzIGFscmVhZHkgYmVlbiBzZXQgdG8gbWVhbiB0aGUgc3BlY2lmaWMgdmFsdWUKICAgIHByZXNl bnQgaW4gdGhlIHBnX3N0YXRzIHRhYmxlIChpLmUsIC0xKSwgc28gdXNlIG51bV9yb3dzIGhlcmUu CgpkaWZmIC0tZ2l0IGEvZG9jL3NyYy9zZ21sL3BsYW5zdGF0cy5zZ21sIGIvZG9jL3NyYy9zZ21s L3BsYW5zdGF0cy5zZ21sCmluZGV4IDc4MDUzZDdjNDkuLmY3MmJjNGIyNzQgMTAwNjQ0Ci0tLSBh L2RvYy9zcmMvc2dtbC9wbGFuc3RhdHMuc2dtbAorKysgYi9kb2Mvc3JjL3NnbWwvcGxhbnN0YXRz LnNnbWwKQEAgLTM5MSwxOCArMzkxLDIwIEBAIHRhYmxlbmFtZSAgfCBudWxsX2ZyYWMgfCBuX2Rp c3RpbmN0IHwgbW9zdF9jb21tb25fdmFscwogPC9wcm9ncmFtbGlzdGluZz4KIAogICAgSW4gdGhp cyBjYXNlIHRoZXJlIGlzIG5vIDxhY3JvbnltPk1DVjwvYWNyb255bT4gaW5mb3JtYXRpb24gZm9y Ci0gICA8c3RydWN0ZmllbGQ+dW5pcXVlMjwvc3RydWN0ZmllbGQ+IGJlY2F1c2UgYWxsIHRoZSB2 YWx1ZXMgYXBwZWFyIHRvIGJlCi0gICB1bmlxdWUsIHNvIHdlIHVzZSBhbiBhbGdvcml0aG0gdGhh dCByZWxpZXMgb25seSBvbiB0aGUgbnVtYmVyIG9mCi0gICBkaXN0aW5jdCB2YWx1ZXMgZm9yIGJv dGggcmVsYXRpb25zIHRvZ2V0aGVyIHdpdGggdGhlaXIgbnVsbCBmcmFjdGlvbnM6CisgICA8c3Ry dWN0bmFtZT51bmlxdWUyPC9zdHJ1Y3RuYW1lPiBhbmQgYWxsIHRoZSB2YWx1ZXMgYXBwZWFyIHRv IGJlCisgICB1bmlxdWUgKG5fZGlzdGluY3QgPSAtMSksIHNvIHdlIHVzZSBhbiBhbGdvcml0aG0g dGhhdCByZWxpZXMgb24gdGhlIHJvdworICAgY291bnQgZXN0aW1hdGVzIGZvciBib3RoIHJlbGF0 aW9ucyAobnVtX3Jvd3MsIG5vdCBzaG93biwgYnV0ICJ0ZW5rIikKKyAgIHRvZ2V0aGVyIHdpdGgg dGhlIGNvbHVtbiBudWxsIGZyYWN0aW9ucyAoemVybyBmb3IgYm90aCk6CiAKIDxwcm9ncmFtbGlz dGluZz4KLXNlbGVjdGl2aXR5ID0gKDEgLSBudWxsX2ZyYWMxKSAqICgxIC0gbnVsbF9mcmFjMikg KiBtaW4oMS9udW1fZGlzdGluY3QxLCAxL251bV9kaXN0aW5jdDIpCitzZWxlY3Rpdml0eSA9ICgx IC0gbnVsbF9mcmFjMSkgKiAoMSAtIG51bGxfZnJhYzIpIC8gbWF4KG51bV9yb3dzMSwgbnVtX3Jv d3MyKQogICAgICAgICAgICAgPSAoMSAtIDApICogKDEgLSAwKSAvIG1heCgxMDAwMCwgMTAwMDAp CiAgICAgICAgICAgICA9IDAuMDAwMQogPC9wcm9ncmFtbGlzdGluZz4KIAogICAgVGhpcyBpcywg c3VidHJhY3QgdGhlIG51bGwgZnJhY3Rpb24gZnJvbSBvbmUgZm9yIGVhY2ggb2YgdGhlIHJlbGF0 aW9ucywKLSAgIGFuZCBkaXZpZGUgYnkgdGhlIG1heGltdW0gb2YgdGhlIG51bWJlcnMgb2YgZGlz dGluY3QgdmFsdWVzLgorICAgYW5kIGRpdmlkZSBieSB0aGUgcm93IGNvdW50IG9mIHRoZSBsYXJn ZXIgcmVsYXRpb24gKHRoaXMgdmFsdWUgZG9lcyBnZXQKKyAgIHNjYWxlZCBpbiB0aGUgbm9uLXVu aXF1ZSBjYXNlKS4KICAgIFRoZSBudW1iZXIgb2Ygcm93cwogICAgdGhhdCB0aGUgam9pbiBpcyBs aWtlbHkgdG8gZW1pdCBpcyBjYWxjdWxhdGVkIGFzIHRoZSBjYXJkaW5hbGl0eSBvZiB0aGUKICAg IENhcnRlc2lhbiBwcm9kdWN0IG9mIHRoZSB0d28gaW5wdXRzLCBtdWx0aXBsaWVkIGJ5IHRoZQo= --000000000000bee91e05dc7bdc41--