Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1mDId8-0005io-7v for pgsql-docs@arkaria.postgresql.org; Tue, 10 Aug 2021 03:40:42 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1mDId7-0001q3-6K for pgsql-docs@arkaria.postgresql.org; Tue, 10 Aug 2021 03:40:41 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1mDId6-0001pt-S6 for pgsql-docs@lists.postgresql.org; Tue, 10 Aug 2021 03:40:41 +0000 Received: from mail-pl1-x634.google.com ([2607:f8b0:4864:20::634]) by makus.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1mDId4-0006oI-RF for pgsql-docs@lists.postgresql.org; Tue, 10 Aug 2021 03:40:39 +0000 Received: by mail-pl1-x634.google.com with SMTP id q2so19108625plr.11 for ; Mon, 09 Aug 2021 20:40:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=3t1jau0vZWnzwSfN484zvgwXK1hz9rMp3dWT45hqWTQ=; b=PuckUIf/+vowByZ5zRqLtuBDaKFqF+38m5Xg0/FrQmfAUW9tSqfUN+PPs9r+3mP4jd Z9WnrRFy+/pUIKZbUQELSsitYqVmE+oYyC+j1kkFIKeLN8oOcUkNxnPN2i8mtQc5XVSR 9jduycxDscA4VZvJOCc5WSk5YXDfgUNLoNXjgn4zP4sm0eUvPBr26OSAqfuIT5m1EtEO G/9SaDYXHu+jDfudy+lNqPF8f+X/Iv1XyXYlbJKmHP1sRFtScb4HvhywkrI6hVSjLp1d wx2U8b/spuaJrGXxD89HKCIhkOx5wDINla49LcIAifgC9j8lLopJcUsclxuLQ3MjuZDK nylw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=3t1jau0vZWnzwSfN484zvgwXK1hz9rMp3dWT45hqWTQ=; b=aqxYIGQ5NpS52+B45xIsuYXAfszJTF+lQ0CHlcmTW3ytFAAg5f7o82kBjHk3EpBXum CMJl3YgDrbkkhk+vNrykIR4sYtueMWQtCDEJICJowiS+AgEHLDdD+JX/Q0ygk0NIhIXD dfPrFs5K6POYnGOhdSl8bVzhQFiPmfjM2psNu9pmNMJPaVd3sq6p6gCUgWvsgQWuL6a/ zUDi6kugBuUy/B/2lATpQi07Gp2fvLpcGPs9ud7buBVlFmUxIrdp9rKnLbtxw91UE2ZP OXCE5FE7hg1lVkKTauvPbqw8vkmF2jjZzKesFZZvPRXYIxZeApXBkOVNEF4qwXvYKLkP TL+A== X-Gm-Message-State: AOAM532p+IQjFoLZUUnWHBvI3FS8g1l3s06DKz4Y9WXGk3MhXMHYUFfO vyGASpVrM4pR+ScEyYGkhY9UeC1zNnPO4+nHt8k= X-Google-Smtp-Source: ABdhPJzx45Vl9hia+znjvBOmqTgmUbjngA4vfMdtG7GT4u3qjluLN99UGBmdPr2IYNCsLtWuerpz+LgDsTe40fvMTek= X-Received: by 2002:a63:cd4c:: with SMTP id a12mr608705pgj.449.1628566837666; Mon, 09 Aug 2021 20:40:37 -0700 (PDT) MIME-Version: 1.0 References: <162851358008.709.11763395628268716122@wrigleys.postgresql.org> <20210809180550.GC14370@momjian.us> In-Reply-To: <20210809180550.GC14370@momjian.us> From: "David G. Johnston" Date: Mon, 9 Aug 2021 20:40:20 -0700 Message-ID: Subject: Re: incorrect information in documentation To: Bruce Momjian Cc: ivanmulhin@gmail.com, Pg Docs , Tom Lane Content-Type: multipart/alternative; boundary="000000000000da27de05c92c434b" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000da27de05c92c434b Content-Type: text/plain; charset="UTF-8" On Mon, Aug 9, 2021 at 11:05 AM Bruce Momjian wrote: > > > selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/ > > num_distinct1, > > 1/num_distinct2) > > = (1 - 0) * (1 - 0) / max(10000, 10000) > > = 0.0001 > > Nice, can you provide a patch please? > > Change the line: selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/num_distinct1, 1/num_distinct2) to be: selectivity = (1 - null_frac1) * (1 - null_frac2) / max(num_distinct1, num_distinct2) The wording already talks about "divide by max". Though: "so we use an algorithm that relies only on the number of distinct values for both relations together with their null fractions:" maybe adds a parenthetical note: "so we use an algorithm that relies only on the number of distinct values (the row count estimate for the whole table, not the -1 in the column statistics) for both relations together with their null fractions:" Just note I haven't tried to absorb that whole page, let alone the implementation, and am not all that familiar with this part of PostgreSQL. Its seems right, though, in isolation. David J. --000000000000da27de05c92c434b Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Mon, Aug 9, 2021 at 11:05 AM Bruce Momjian <bruce@momjian.us> wrote:

>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0selectivity =3D (1 - null_frac1) * (1= - null_frac2) * min(1/
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0num_distinct1,
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01/num_distinct2)
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =3D (1 - 0) * (1 - 0) / max(10000, 10000)
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =3D 0.0001

Nice, can you provide a patch please?


Change the line:

selectivity =3D (1 - null_frac1) * (1 - null_frac2) * mi= n(1/num_distinct1,=C2=A01/num_distinct2)
to be:

selectivity =3D (1 - null_f= rac1) * (1 - null_frac2) / max(num_distinct1,=C2=A0num_distinct2)

The wording already talks a= bout "divide by max".

Though:

"so we use an algori= thm that relies only on the number of distinct values for both relations to= gether with their null fractions:"

maybe adds a p= arenthetical note:

"so we use an algorithm that r= elies only on the number of distinct values (the row count estimate for the= whole table, not the -1 in the column statistics) for both relations toget= her with their null fractions:"

Just note I haven= 't tried to absorb that whole page, let alone the implementation, and a= m not all that familiar with this part of PostgreSQL.=C2=A0 Its seems right= , though, in isolation.

David J.

--000000000000da27de05c92c434b--