Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wCYft-0026gv-1t for pgsql-hackers@arkaria.postgresql.org; Tue, 14 Apr 2026 07:59:10 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wCYfr-00Aaog-2q for pgsql-hackers@arkaria.postgresql.org; Tue, 14 Apr 2026 07:59:08 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wCYfr-00AaoX-1U for pgsql-hackers@lists.postgresql.org; Tue, 14 Apr 2026 07:59:08 +0000 Received: from mail-ej1-x62a.google.com ([2a00:1450:4864:20::62a]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wCYfp-00000000wEG-0Cg5 for pgsql-hackers@lists.postgresql.org; Tue, 14 Apr 2026 07:59:07 +0000 Received: by mail-ej1-x62a.google.com with SMTP id a640c23a62f3a-b9c11eba219so650859966b.2 for ; Tue, 14 Apr 2026 00:59:05 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1776153544; cv=none; d=google.com; s=arc-20240605; b=d/eVdBrRlvzJqgywih59eHbCN77SJRt9FtRaP++IAND2/JzBfkHJou4Ui9PiwKPo9L szkjRcTNbusbHb62w6ADPIrzGsvNiUFuzsn1/Z9HUsDJldjPYwKgUgsSthJZ0q6XepsC /17eWoWFSCm7QYHuMHjDNoQtbe5z9Dop7G/y4fqEK/eire6bJS7efye0uRfmE3JEegs1 rFNTS3KREZxvCT/RQPAfOtjGnAh5/dCwLhR2YqWEMPTE/3cXmWfwLCW37T8++NHwvfnl X04yJVdRXa2i6PabY7O6GGrWJwuh/6sjy/xml/5ag/yexEF8fRKKOkX9YCh+TrC8AUVv tIlw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=XtiIbSCEUzZzKLOlk0WAUP+s3gOAQ5bRth6msX9lRUY=; fh=NOo30AAV3JZ2gV+RE/nn4WjuTPJNrVMgF4c+LPp6+pw=; b=ci0yQWu4Vk33JOD1UzoPYrOtiRs7fMlKIS8+ltPYi9GhzoQ5huFlGKVEDmhaRbnLvM j4Kbb5NpRPpwKmDggFt1mBOb3F2vtyB8JIQt0IOJlcwrK+OHBPb0asBye3Bd8D4yLXeO AAo4/TpHOeZKzTpLtoHOywSZoq/bdHhzTUcVE1kh4UXgdReZfoGBQsZCjEHoLbXIrw+B ELOD9C3rWKiCUOj+EBtn1uQw2hfLx8HoNlPy3VKzZ0SDWsS2IeSDAmSXkLC/VhX9Vlis 9ckVa31axQeHO8PS6lWsoactpL1yHqJne4JNvnGxRI2pDD6NXKtvLUwXE7N77LKcH3Xe fP9g==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1776153544; x=1776758344; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=XtiIbSCEUzZzKLOlk0WAUP+s3gOAQ5bRth6msX9lRUY=; b=iiKLXsAd/vZdZos9fknJdWsYH9+hZceRSWV3HfCyviLODIYDocwdgpGK+A2lSNIfBJ UPwuAov4UR5jWJmmZhdmPoj8JdJyx/fiZVzCq645dnSvxpfhUVtSX1adjvqRpxiIOv6L j54VTT4aiZd35XJ6FKMkTPQTkDB0oqaKSP3CjhvG/oh9hJn+qIYCPwgeINJXl9EfP0wF IQ4/G3uD2nlUGUAi95XtPyHGso8zxnRxaQgFBTsp/O5J6N92/jFKV5UZnvwMGTcPkWkm T0Sqdvwexl0aY1xYm8b0U6Ea29TJ29QdTHiz/JRumbCUda3YKKEp7q8/9QV9Dl61JRuu Kzjg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1776153544; x=1776758344; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=XtiIbSCEUzZzKLOlk0WAUP+s3gOAQ5bRth6msX9lRUY=; b=HK2c89Ua4qlIj0jdYY2NbMC1rD7cAn5ptrcz8hGzAUiTh+bUg74Y6DYvsOk5tWJvVf ALcS28HMOmTgwjhRA/WdA4UMvXna40QSBuiFlIsTDzsAm822YTIBBQNc7zaaVu5jU/Qe obfRH/fh2WB2o0k7mLQQnJKJEqhg95WlFx4rHSj+hlSJRhWqr3upNfMOPR6Ry2ayn9V3 Bb6Id5FcL2T3gB5iY3B64dGVM+mIvW3eLYm7AtDnj/CmPZkzArU3eTo1A+7UJ2PsbQiJ 9ViixHDcZZMXfzsYmDJaKksZO427xNtuir9ceF5DQDVapiJSzVj1ZTEPabrgY349ac3g B7vA== X-Gm-Message-State: AOJu0Yw8d4TkwPwaLbWnRoCdpA758kUYT7YyTNjcK1YS0OVPIEMCMhPT JlMjXXfK64QbMWDGcqhQ8fvOlzeu0YlHNtrRugPD8M/TaFxj+oOmcXbrOLWylO6kHjLDzL8B188 ITlgBSHYESNKTwL0aTYpEVUEWjR2wQMU= X-Gm-Gg: AeBDievPIJ3GMVaGIJOIuidimi+dHPk7KjJ0bSfxYcGLQeUadWhriGZFK2d2jPR4kFU 1vKCm0L4nSmwECqMk8mQ7uIYCeZElVNF1lLJJn9PbxXXwJjYHO5ypqYsirmhRMrhMD3VBGF6ZdL FuD72aFB/+AdKKJ5cpA2vxI/MHhgJITm5CMFCLDwAn1YHjacYgMHCbgwwTAvHYARF1FK86lyQ7L 4+AlB6Pbl2UpUFYLA95/GM/x5cnQS5/uHZdLHrpvhbdri5GFObGTak4/pJfE4GvCLHvCFdBgYjv 0zkHUGbASDiX3nRnEA4ul2Xi9MFE/HzDpzICYg== X-Received: by 2002:a17:906:9587:b0:b9b:1c7d:6bb8 with SMTP id a640c23a62f3a-b9d729bfd8emr730396566b.43.1776153543683; Tue, 14 Apr 2026 00:59:03 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: wenhui qiu Date: Tue, 14 Apr 2026 15:58:51 +0800 X-Gm-Features: AQROBzBzIV6YEzpGkVNtFnR58WxAsjVHqlzHrtubZTPRRM7vzdi43ZFcTRr1yNY Message-ID: Subject: Re: Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs To: Richard Guo Cc: Pg Hackers Content-Type: multipart/alternative; boundary="00000000000009b2d8064f66f99a" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000009b2d8064f66f99a Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi Richard > I don't think your proposed change would work. vardata->rel is the > CTE/subquery scan rel in the outer query, and its tuples count is the > CTE's output row count, not the base table's. Using it would be > equivalent to not converting at all, since get_variable_numdistinct() > already computes -stadistinct * vardata->rel->tuples. What we need > here is the base table's rel in the subroot, which gives us the > correct rowcount for interpreting the negative fraction. Thank you for your explanation. The path LGTM Thanks On Tue, Apr 14, 2026 at 2:11=E2=80=AFPM Richard Guo wrote: > On Mon, Apr 13, 2026 at 12:27=E2=80=AFPM wenhui qiu wrote: > > > Thanks so much for working on this! While looking at the negative > stadistinct conversion, I was wondering if we might run into a potential > edge case with multi-level nested subqueries. What do you think? > > > > /* Convert negative stadistinct to absolute count */ > > > > if (stats->stadistinct < 0) > > { > > - RelOptInfo *baserel =3D find_base_rel(subroot, var->varno); > > + RelOptInfo *baserel =3D vardata->rel; > > > > - if (baserel->tuples > 0) > > + if (baserel && baserel->tuples > 0) > > { > > stats->stadistinct =3D (float4) > > clamp_row_est(-stats->stadistinct * baserel->tuples); > > } > > } > > I don't think your proposed change would work. vardata->rel is the > CTE/subquery scan rel in the outer query, and its tuples count is the > CTE's output row count, not the base table's. Using it would be > equivalent to not converting at all, since get_variable_numdistinct() > already computes -stadistinct * vardata->rel->tuples. What we need > here is the base table's rel in the subroot, which gives us the > correct rowcount for interpreting the negative fraction. > > - Richard > --00000000000009b2d8064f66f99a Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi Richard
> I don't think your proposed change= would work.=C2=A0 vardata->rel is the
> CTE/subquery scan rel in = the outer query, and its tuples count is the
> CTE's output row c= ount, not the base table's.=C2=A0 Using it would be
> equivalent = to not converting at all, since get_variable_numdistinct()
> already = computes -stadistinct * vardata->rel->tuples.=C2=A0 What we need
&= gt; here is the base table's rel in the subroot, which gives us the
= > correct rowcount for interpreting the negative fraction.
Thank you for your explanation. The path LGTM

Thanks

On Tue, Apr 14, 2026 at 2:11= =E2=80=AFPM Richard Guo <guofe= nglinux@gmail.com> wrote:
On Mon, Apr 13, 2026 at 12:27=E2=80=AFPM wenhui qiu <qiuwenhuifx@gmail.= com> wrote:

> Thanks so much for working on this! While looking at the negative stad= istinct conversion, I was wondering if we might run into a potential edge c= ase with multi-level nested subqueries. What do you think?
>
> /* Convert negative stadistinct to absolute count */
>
>=C2=A0 =C2=A0 =C2=A0if (stats->stadistinct < 0)
>=C2=A0 =C2=A0 =C2=A0{
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0RelOptInfo *baserel =3D find_base_rel(subr= oot, var->varno);
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0RelOptInfo *baserel =3D vardata->rel; >
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0if (baserel->tuples > 0)
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0if (baserel && baserel->tuples = > 0)
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0{
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0stats->stadistinct = =3D (float4)
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0clamp_row= _est(-stats->stadistinct * baserel->tuples);
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}
>=C2=A0 =C2=A0 =C2=A0}

I don't think your proposed change would work.=C2=A0 vardata->rel is= the
CTE/subquery scan rel in the outer query, and its tuples count is the
CTE's output row count, not the base table's.=C2=A0 Using it would = be
equivalent to not converting at all, since get_variable_numdistinct()
already computes -stadistinct * vardata->rel->tuples.=C2=A0 What we n= eed
here is the base table's rel in the subroot, which gives us the
correct rowcount for interpreting the negative fraction.

- Richard
--00000000000009b2d8064f66f99a--