Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wDE5y-002ko0-24 for pgsql-hackers@arkaria.postgresql.org; Thu, 16 Apr 2026 04:12:51 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wDE5x-003iKB-39 for pgsql-hackers@arkaria.postgresql.org; Thu, 16 Apr 2026 04:12:49 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wDE5x-003iK1-21 for pgsql-hackers@lists.postgresql.org; Thu, 16 Apr 2026 04:12:49 +0000 Received: from mail-yx1-xb129.google.com ([2607:f8b0:4864:20::b129]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wDE5v-00000001LUH-2F2N for pgsql-hackers@lists.postgresql.org; Thu, 16 Apr 2026 04:12:49 +0000 Received: by mail-yx1-xb129.google.com with SMTP id 956f58d0204a3-651ce2484d5so4479273d50.1 for ; Wed, 15 Apr 2026 21:12:47 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1776312766; cv=none; d=google.com; s=arc-20240605; b=WVxPiLKT292vKxckyGCXGGrmcfVmnCQdYMYBZh81HFG+IeInivHCT5LybKzRJLxHKG ceBLGX9IRTUIs63UTyZjPCsQXj5SEMc1yIyfClbcLtIwmWOfYYHuLaMd4nDsyhkX0wEq DymGsoCQozKvE+KUVUb4QnlvkKaYHCMJiBO14me93yKhe+Jk5lG/mvIDtN2XihoYFVUD Co67UZYB4g1TgpJZF3UmSNbn/fw3ZTM2Wh89/KsqsqW0yjk4vUrnqTuGQhRriolQtEQ1 CIE5Wn9sL4cdyJf6crORY9TqKe29DFcn0dzA92xkBzYW175zc7Jpr3VHbd5LaJ4WhLdL emIQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=4vGnNv2DtGaG1ZrePOYVYrr4bg6x8RGbudLoMpaJYOk=; fh=p7Wb6JeiVdlPJjd44RutJNr5LxKRXY4k2LUZO9WydNU=; b=WQ5PbwgiVNqcXKJGNG1tDXLSudQ8PnfUzIqTpD8TcX2c1UQL2xY36gQ7zwchRCSawf N7cuWKqzPylyF69Rc8IXKVHYrr9bhq34Uvfd2AgPaZolWueF5C/IyGi2kq2YNmkfi42P tkwWaX8tnBrn7ibmySbO2kDGi+Vkne8hWWrS9a/AtYTcYbZ2Mc856zCpg7lIpw5hXSak QD1j/FaJ2Eyb5I/94WSYY/Hgx5t7Qqs33Qyg1oY9b3a4zY+fqZyKwXy3pdVpfj0IZVrY Ju2QKK0u626yqPWmR2iokNj77m95Bi/GB3NEXcrMVU0WscKlDlU575s6W14xXVMB7U7G W+lQ==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1776312766; x=1776917566; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=4vGnNv2DtGaG1ZrePOYVYrr4bg6x8RGbudLoMpaJYOk=; b=UhOjgdxyLXMzy7AQmPQEiCqOVr4qijWxqkrm8YwtBc+Sv09nmNw/+rZP16mZbW2ljx ZeO/2heaRLAgNGV2flc7knvXQ6p9XsDmLoyAyIKD6ImD4HjRFvFzhISGd8MxslwMGNJU 8Mh2Ms9HKBvRwCJxOe8ChzI43W9+xrtee1bERUdgGkT3zXHw77ei1d7SqGnkHhvWpV1S 8kG+g1U8x7PHv9IETMxMHf9BrA0tAjY8jU2YK96cbA3BGWgzZPdQ0eoTceUIOfh2IMmk P8qc9VQMm24rp/V+JieWwDXejlpRLRuZdMLOlGthXWeuohaoio2J5AgHMzMnd1ukT4Sa Tgew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1776312766; x=1776917566; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=4vGnNv2DtGaG1ZrePOYVYrr4bg6x8RGbudLoMpaJYOk=; b=i5v2gMjFBI+qXnj7FPFOFxtkyZbKxX9x2JHywez52zoGdIfScVVUVt68ycuMyB41ld 1hE8HzDmm2Ub4vhvS5qEI/oTFDI1fi9yHdiKM6/t8mUfPecbK4PN2aPiQ7ORc6lquWA/ 9wtdlJgs3MH/5iB5eUauZPpmbvaO5KtwkvAq1CdyQ5blvqngJIfo9YmqDa6ieQ1E6siO yhVneelIFkZ2v5uqwfwc3xJCtOhepxExmruVBWRSFmS7xhAJTNobbr653Eg4gHTgUBeq Czz2BlpZ7L3NDoGsFsRQVH+mqTF3Hip26PD7dMWgPs+WjVP4bJgSoI2KTUiuC93NtYqx dvMQ== X-Forwarded-Encrypted: i=1; AFNElJ/oLIB96JMVZS1/beCf0l0+IEuyMSPrT3T9OfSEvzfAkztIbGtiWSwBSk+SyB1yNxUITmX2lVA6olABqDe0@lists.postgresql.org X-Gm-Message-State: AOJu0YzArPCQmanxUKqEhw5Lg9f0yC+SVUrYh6M/8jC83ndPnNVKYAAn KRo1DNQREHTd/HvJhtVv3RmvwQ7Ofq/Qdce7GQXEe0oEika1wLem6v25AmEuchIGMQuzyDKrGfC rvkaG4IFcU8ApKe/pmPF2M5Hr2G8L7co= X-Gm-Gg: AeBDiesg8+wxb675lP3m2okgTzJLpxhR/JNYxA1rlubBPh2b2shx5X2qDtf7AZm9U6l o4vIcdnTS1hPytkUEYiGya2oiDRM1kTUEoiObfHT4eDgcdOF3z3jaO/88xrUqovSTkyHXPVasdU E/hlwONEYv2sw80/wony2BxJkn0UxrMghoNjNz6Erp5386RQkV2E3ePbIu+ROKRzpRZtZDBqQ81 5r/SHlcPzlo9qrfh+VMnAcotTzux1+5IhCdVAlVpbK3FZyx3tcCfv1qQFiiChDbJexF3UCfUUk0 LVk8YlU3VOTVRHa9Rq6P/eYU0hEBUxXM1QX0nHh1dImTIfRqEw== X-Received: by 2002:a05:690e:438b:b0:650:18fc:f557 with SMTP id 956f58d0204a3-65198c2cbf9mr15460911d50.56.1776312765863; Wed, 15 Apr 2026 21:12:45 -0700 (PDT) MIME-Version: 1.0 References: <8afecd87-d1e5-241c-5e3e-75e1c62c279b@enterprisedb.com> <1bc5e1e4-380b-213d-6e6a-d5e8f7f0c7f5@enterprisedb.com> <84ffb566-8038-ab35-c841-7a5e5728a247@enterprisedb.com> <525ea957-88ff-e2c6-2975-474c849aa234@ulb.be> <48a78946-f380-a279-e75f-711846e509f8@ulb.be> <4c443cf6-96af-0953-fd5b-bd2b5c0c6f00@ulb.be> <2516071.1699991181@sss.pgh.pa.us> <08b2dea8-6dca-06e4-a53d-3e6334e7645b@ulb.be> In-Reply-To: From: Haibo Yan Date: Wed, 15 Apr 2026 21:12:35 -0700 X-Gm-Features: AQROBzDw6ivxXy40y3dMB5-fkpTEP54xBZTAd4_r3-Z9nJKglL_YaDlBdwKPrAQ Message-ID: Subject: Re: Implement missing join selectivity estimation for range types To: SCHOEMANS Maxime Cc: vignesh C , Tom Lane , Damir Belyalov , jian he , PostgreSQL Hackers , SAKR Mahmoud , Diogo Repas , Andrey Lepikhov Content-Type: multipart/alternative; boundary="0000000000006b5c7f064f8c0b3c" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000006b5c7f064f8c0b3c Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, Apr 15, 2026 at 8:13=E2=80=AFAM SCHOEMANS Maxime wrote: > Hi Haibo, > > Thank you for the review. > > > One small note: the localized bool join_is_reversed; in my version was > > intentional. I left it that way because get_join_variables() wants a > > storage location, and I preferred to keep that use local and explicit > > rather than trying to reshape things around it. > > Fair point. I moved it out of the bare block because it looked unusual, > but I can change it back if you prefer. > > > For patch 2, I am less convinced, especially for &&. > > [...] > > for multiranges there is a third possibility: neither side is entirely > > left nor entirely right, and yet they still do not overlap because of > > an internal gap. > > This is a valid concern, but it is an existing limitation of multirange > statistics, not something we are introducing. The existing restriction > selectivity code in multirangetypes_selfuncs.c already uses the same > NOT(<<) AND NOT(>>) decomposition for && on multiranges. And > multirange_typanalyze explicitly says: > > /* Treat multiranges like a big range without gaps. */ > > The statistics only store the outermost bounds, so the gap information > is already lost before our estimator sees it. The multirange GiST > opclass does the same (stores the bounding range). Our join estimator > is just consistent with how multiranges are handled elsewhere. > > The alternative is falling back to the 0.005 default, which will almost > certainly be worse. Would a comment explaining the limitation be enough? > Thanks, that is a fair point. I agree that this is not something patch 2 is uniquely introducing. If the existing multirange statistics and restriction selectivity already treat a multirange essentially as its outer bounds, then it makes sense that the join estimator can only work within that same approximation. So I am less worried about this as a correctness objection than I was at first. My main concern is really about making that limitation explicit, especially for &&, where internal gaps can matter a lot for the real overlap semantics. I think it would help if patch 2 said this a bit more directly, both in the code comments and in the patch description. Something along the lines of: - this reuses the same outer-bounds approximation already used by existing multirange statistics / restriction selectivity - internal gaps are not represented in the available stats - so && for sparse multiranges may still be overestimated in some cases - but this is still expected to be better than falling back to a fixed default selectivity > For patch 3, I agree with the motivation [...] But I am not convinced > > that exporting those helpers via selfuncs.h is the right boundary. > > My preference would be something tighter: [...] a backend-private > > internal header just for the range-family selfuncs code > > Good point about the visibility. I'll move the declarations to a > separate backend-private header in the next version. > > Regards, > Maxime > If you are willing to add that clarification, I think that would address most of my concern here. Regards, Haibo --0000000000006b5c7f064f8c0b3c Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Wed, Apr 15, 2026 at 8:13=E2=80=AFAM S= CHOEMANS Maxime <maxime.schoe= mans@ulb.be> wrote:
Hi Haibo,

Thank you for the review.

> One small note: the localized bool join_is_reversed; in my version was=
> intentional. I left it that way because get_join_variables() wants a
> storage location, and I preferred to keep that use local and explicit<= /div>
> rather than trying to reshape things around it.

Fair point. I moved it out of the bare block because it looked unusual,
but I can change it back if you prefer.

> For patch 2, I am less convinced, especially for &&.
> [...]
> for multiranges there is a third possibility: neither side is entirely=
> left nor entirely right, and yet they still do not overlap because of<= /div>
> an internal gap.

This is a valid concern, but it is an existing limitation of multirange
statistics, not something we are introducing. The existing restriction
selectivity code in multirangetypes_selfuncs.c already uses the same
NOT(<<) AND NOT(>>) decomposition for && on multiranges= . And
multirange_typanalyze explicitly says:

=C2=A0 =C2=A0 /* Treat multiranges like a big range without gaps. */

The statistics only store the outermost bounds, so the gap information
is already lost before our estimator sees it. The multirange GiST
opclass does the same (stores the bounding range). Our join estimator
is just consistent with how multiranges are handled elsewhere.

The alternative is falling back to the 0.005 default, which will almost
certainly be worse. Would a comment explaining the limitation be enough?

Thanks, that is a fair point.

I agree that this is not something patch 2 is uniquel= y introducing. If the=C2=A0

existing multirange statistics and restriction select= ivity already treat a

multirange essentially as its outer bounds, then it m= akes sense that the

join estimator can only work within that same approxi= mation.

So I am less worried about this as a correctness obje= ction than I was at

first. My main concern is really about making that li= mitation explicit,

especially for &&, where internal gaps can matter a lot for the real

overlap semantics.

I think it would help if patch 2 said this a bit more= directly, both in

the code comments and in the patch description. Somet= hing along the lines

of:

  • this reus= es the same outer-bounds approximation already used by existing
multirange statistics / restricti= on selectivity
  • internal gaps are not represented in the available stats
  • so && for sparse multiranges m= ay still be overestimated in some cases
  • but this is still expected = to be better than falling back to a fixed default selectivity
> For patch 3, I agree with the motivation [...] But I am not convinced<= /div>
> that exporting those helpers via selfuncs.h is the right boundary.
> My preference would be something tighter: [...] a backend-private
> internal header just for the range-family selfuncs code

Good point about the visibility. I'll move the declarations to a
separate backend-private header in the next version.

Regards,
Maxime

I= f you are willing to add that clarification, I think that would address

=

most of my concern here.

Reg= ards,

Haibo=C2=A0=C2=A0
--0000000000006b5c7f064f8c0b3c--