Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wCoVo-002Kg4-2w for pgsql-hackers@arkaria.postgresql.org; Wed, 15 Apr 2026 00:53:49 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wCoVn-00Dhlq-08 for pgsql-hackers@arkaria.postgresql.org; Wed, 15 Apr 2026 00:53:47 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wCoVm-00Dhlg-2D for pgsql-hackers@lists.postgresql.org; Wed, 15 Apr 2026 00:53:47 +0000 Received: from mail-yx1-xb12e.google.com ([2607:f8b0:4864:20::b12e]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wCoVk-000000017nf-49EA for pgsql-hackers@lists.postgresql.org; Wed, 15 Apr 2026 00:53:47 +0000 Received: by mail-yx1-xb12e.google.com with SMTP id 956f58d0204a3-65006c99d38so6398500d50.3 for ; Tue, 14 Apr 2026 17:53:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1776214423; cv=none; d=google.com; s=arc-20240605; b=BL8Z03m4kTeAYMbuGs1k18jFMXU4zmAlP3YCZ0N6vVRxu7kAHqp1l28AAIH5gxAV1X IWrid3h9+RPIgwaswqkc4j+FXzcG4VMJ1Z+ZDu20T61Ieqr7S7h2inUghtRakvOi3Gi0 8ZpEuZ2ZdTQhCQaAKkc9sQTBV2+19Cg5l2CjDj3zOPFj1jfkH0o6R3/KhFXs8gAR/GrB 2OOMz89HwI4zqE/fxtOh3JtmLDa6uP2muaAt/mM/fZYWj0fHJQdmHOV6ks9M2SJMSWA6 IPy+3RWRO10A4Y4p+R5C7tBX/fPqfMZiJ4jISSkCJqL819HuxDSZIysTzq6F8goH1mSf 6bVw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=bSPcGu5+pgPKd/y/D1LM19zlx7J/mgYKdE6R/MB8ikY=; fh=/8BFO/dOmmEBX0/FwH/LbQcLv9lI/qPvbNyfCMcc8c8=; b=NtL5vf8ViQ4KEDZZ8FyqGQzLFBH84Km+bQISoZV3wpX0BEAp+Jxk4tavwnDVrOmPiQ J4hbUqaRSVK96VKyhDHxa4ZsDHUfasTOHpOev1QKuo2SvnpU5HOE4oXC+uUWqjBOkDVe FrnJ5sfe16f42Odu2uG35XFzXvIzAQTr7qmeO6pbtIQJRWuEDsN7z6vdv3Dl1NCG9E1G LY+Xw6LC+4C4/Quwh6mixwp+KIUWTxYU8sW0yFeYfRDV/gsw89xdIyjHD7PQPhdSk36m H16J2ellAyS6ZYWfy4HBaEKmLXCtStUllA5W8ezaSOSXk7M0jSfwBBmEIhNZ/dxLBFXU BhDw==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1776214423; x=1776819223; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=bSPcGu5+pgPKd/y/D1LM19zlx7J/mgYKdE6R/MB8ikY=; b=PFJnyc/1iyG738xINYqUafQq8pfvjln5vsVqAmHkuLerh5qpSIeXQTj7HzAhU0HHcg bxtws75jvTRHYaWEFxueoNi9oF1sMEytmY/7IZeS7z54XrbISZC408+RWZSSzEFdAfS9 /OxTY0QeF+vRbPqnwaxeWrygmnwIzAJo56fiSKbpViN3cQRLXZ8NM8lP/LNVhd/Gf5kg 5VC+HaakUpy1RefC+p1gOjowU1gu82ra5UmTVoEzSr1re96aLE6sf1IK+Jpwi6nxB8S9 oL0xWHps0ENEAziGM+HQX1op2DL92KyTI8A0zuiTU5haZAvfIGxnDxChdpWORpsc/XYS sJzg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1776214423; x=1776819223; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=bSPcGu5+pgPKd/y/D1LM19zlx7J/mgYKdE6R/MB8ikY=; b=WwQWBhcNUyDLmMdC/MWlYFAlkLg0GWPkE0bn9XxbU6K2PpenLCaSXx3bB1I6DnC0Xx fRVbmmJGZolmRVkbHADlHwcA0RTfRXvywrdSOFXknQiyQ5kqvKDk08r6zRWw8OE07GGU Xs3+ARggDk3OkIIgG2zrXVqFQCFyKgy+XFbF/vGG6QE9OWpD9chFNIsjQtv/O1XrZ5/E vTSwVgThBQJSNUByrnHZk75cHva0whcQxi5dvFMn9M/AqsnWtNM6R3n5QEPrbaQ/EODr FV3ojN98ebYntz0R8SDpRo7pABswW2LsbXDTJv1+z3KK+cbGVen+QYvcS8XYhBXqSo2a 2uyw== X-Forwarded-Encrypted: i=1; AFNElJ8+syCZOc51diJKSsDPpIBcPt60Gp0FXOJxryuJB7RHdUJN41Uw/cVLk9EVsVT+KsKdtaIy5K9CErUSBlx/@lists.postgresql.org X-Gm-Message-State: AOJu0YxvuTVS7bjPsw2I8Lx55dQbXxQVcT1eVMMMxMYcfJ+y3GLU8wG1 m2dwV7N/28b4hOdZUl0pwdKmOkH/4GYd0ZMLp/KQOMmW7K9HzW+BfPfYn5Ns7r61exLoHQF9tQ3 uf7mGGqo1MREzdXNJHUHD4YA7r64JSZI= X-Gm-Gg: AeBDiesg0z6O58s3YF9Of+Qt3JoED1PM/o2KCB2HLm9VipBkcjssSUSrwvlc5Z8nlkx nIgW3Nhh2zZVP7tDiTZq9AK/lbUkm19gVe9E8MBYqMHFeZC7l9LpZCWDaUmV4p5wpq/BstwK4Yh N8n4iOcIWHX10oOk+cF0Z/vwxf0EhUfymsVMK2doeK2nW036yLtJgT94KOdrD+TytsC629lQ09N 75CG/LUNHW2KHQ67k2A4jg1hwQSjWRQUHfJyQKkAraapjIJFBmsxgwhzBNYg/qAyMH3CqiUccUZ np3+yHnnFyL2gnDA0SdhYwWJnJjHrn+T/t4FgX1a9/23mKuB9jrYAx5Pxxh+OA== X-Received: by 2002:a05:690e:b4e:b0:651:cf29:430b with SMTP id 956f58d0204a3-651cf294640mr9504712d50.12.1776214422809; Tue, 14 Apr 2026 17:53:42 -0700 (PDT) MIME-Version: 1.0 References: <8afecd87-d1e5-241c-5e3e-75e1c62c279b@enterprisedb.com> <1bc5e1e4-380b-213d-6e6a-d5e8f7f0c7f5@enterprisedb.com> <84ffb566-8038-ab35-c841-7a5e5728a247@enterprisedb.com> <525ea957-88ff-e2c6-2975-474c849aa234@ulb.be> <48a78946-f380-a279-e75f-711846e509f8@ulb.be> <4c443cf6-96af-0953-fd5b-bd2b5c0c6f00@ulb.be> <2516071.1699991181@sss.pgh.pa.us> <08b2dea8-6dca-06e4-a53d-3e6334e7645b@ulb.be> In-Reply-To: From: Haibo Yan Date: Tue, 14 Apr 2026 17:53:31 -0700 X-Gm-Features: AQROBzC2uF9RHWTV0t2Mqd-nDxVegW5zxwISaw79tfSbxY7ujKww6ExiVh3fZxg Message-ID: Subject: Re: Implement missing join selectivity estimation for range types To: SCHOEMANS Maxime Cc: vignesh C , Tom Lane , Damir Belyalov , jian he , PostgreSQL Hackers , SAKR Mahmoud , Diogo Repas , LUO Zhicheng , Tomas Vondra , Andrey Lepikhov Content-Type: multipart/alternative; boundary="000000000000b771dc064f75250e" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000b771dc064f75250e Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Tue, Apr 14, 2026 at 7:03=E2=80=AFAM SCHOEMANS Maxime wrote: > Hi Haibo, > > Thank you for picking this up again. I agree with the changes you made > in v5, in particular scoping the patch to the three strict operators and > reworking the tests to check plan structure rather than exact row counts. > > Attached is v6 as a 3-patch series building on your v5. > > Patch 1 is your range join selectivity patch with one small change: the > range_cmp_bounds result in the merge walk is stored in a local cmp > variable to avoid calling it twice per iteration, as jian he suggested. > Hi Maxime, Thank you for working on this and for building on top of v5. I think patch 1 looks good to me. I do not see major issues there. One small note: the localized bool join_is_reversed; in my version was intentional. I left it that way because get_join_variables() wants a storage location, and I preferred to keep that use local and explicit rather than trying to reshape things around it. > > Patch 2 adds the same estimation for multirange types, covering all type > combinations (multirange x multirange, multirange x range, range x > multirange). Since both range and multirange types use the same bound > histogram format and the same RangeBound representation, the core > algorithm is identical. > For patch 2, I am less convinced, especially for &&. My concern is not so much whether the code runs, but whether the semantic argument is strong enough for the mixed range/multirange cases. The patch assumes that because range and multirange use the same bounds histogram format and the same RangeBound representation, the same estimator can be applied directly. I think that argument is much easier to make for << and >= > than for &&. For single ranges, && works nicely with the usual decomposition because if two single ranges do not overlap, then one must be entirely to the left or entirely to the right of the other. But for multiranges there is a third possibility: neither side is entirely left nor entirely right, and yet they still do not overlap because of an internal gap. For example: A =3D {[1,2), [100,101)} B =3D [50,60) Here: A << B is false A >> B is false A && B is also false So for multiranges, =E2=80=9Cnot left and not right=E2=80=9D does not imply= overlap in the same way it does for single ranges. That makes me worry that reusing the same estimator logic for &&, especially in mixed range/multirange cases, may overestimate overlap because the overall lower/upper bounds do not capture the internal holes. So I think patch 2 still needs a stronger justification there, and probably more targeted tests around sparse multiranges / hole cases, especially for &&. > > Patch 3 removes the duplication between rangetypes_selfuncs.c and > multirangetypes_selfuncs.c that Tom raised as a concern. It makes the > 10 shared helper functions non-static, exports them via selfuncs.h, > and deletes the copies from the multirange file. This covers all the > pre-existing duplication between the two files, not just the functions > added in this patch set. > > Regards, > Maxime > For patch 3, I agree with the motivation: the duplication between rangetypes_selfuncs.c and multirangetypes_selfuncs.c is not ideal. But I am not convinced that exporting those helpers via selfuncs.h is the right boundary. My preference would be something tighter: - keep the shared helper implementations in one place - add a backend-private internal header just for the range-family selfuncs code - include that internal header from rangetypes_selfuncs.c and multirangetypes_selfuncs.c - avoid widening visibility by turning a group of file-local helpers into broader extern declarations in selfuncs.h In other words, I agree that the duplication should be removed, but I think a backend-private internal header should be enough for that goal. I do not think we need to expand visibility more than necessary by moving these helpers out of the file-private space into a broader interface. Thanks again for working on this. Regards, Haibo --000000000000b771dc064f75250e Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


On Tue, Apr 14,= 2026 at 7:03=E2=80=AFAM SCHOEMANS Maxime <maxime.schoemans@ulb.be> wrote:
Hi Haibo,

Thank you for picking this up again. I agree with the changes you made
in v5, in particular scoping the patch to the three strict operators and
reworking the tests to check plan structure rather than exact row counts.

Attached is v6 as a 3-patch series building on your v5.

Patch 1 is your range join selectivity patch with one small change: the
range_cmp_bounds result in the merge walk is stored in a local cmp
variable to avoid calling it twice per iteration, as jian he suggested.

Hi Maxime,

Thank you for working on this and for building on top of v5.=C2=A0

I think patch 1 looks good to me. I do not see major = issues there. One small note: the localized bool j= oin_is_reversed; in my version was intentional. I left it that way b= ecause get_join_variables() wants a storage= location, and I preferred to keep that use local and explicit rather than = trying to reshape things around it.


Patch 2 adds the same estimation for multirange types, covering all type
combinations (multirange x multirange, multirange x range, range x
multirange). Since both range and multirange types use the same bound
histogram format and the same RangeBound representation, the core
algorithm is identical.

For patch 2, I am less convinced, especially for &&.

My concern is not so much whether the code runs, but = whether the semantic argument is strong enough for the mixed range/multiran= ge cases. The patch assumes that because range and multirange use the same = bounds histogram format and the same RangeBound representation, the same estimator can be applied directly. I think th= at argument is much easier to make for << and >> than for &&.

For single ranges, &&= ; works nicely with the usual decomposition because if two single ra= nges do not overlap, then one must be entirely to the left or entirely to t= he right of the other. But for multiranges there is a third possibility: ne= ither side is entirely left nor entirely right, and yet they still do not o= verlap because of an internal gap.

For example:=C2=A0

A =3D {[1,2), [100,101)}

B =3D [50,60)

Here:

A << B is false
= A >> B is false
A && B is also false

So for multiranges, =E2=80=9Cnot left and not right= =E2=80=9D does not imply overlap in the same way it does for single ranges.= That makes me worry that reusing the same estimator logic for &&, especially in mixed range/multirange cases= , may overestimate overlap because the overall lower/upper bounds do not ca= pture the internal holes.

So I think patch 2 still needs a stronger justificati= on there, and probably more targeted tests around sparse multiranges / hole= cases, especially for &&.


Patch 3 removes the duplication between rangetypes_selfuncs.c and
multirangetypes_selfuncs.c that Tom raised as a concern. It makes the
10 shared helper functions non-static, exports them via selfuncs.h,
and deletes the copies from the multirange file. This covers all the
pre-existing duplication between the two files, not just the functions
added in this patch set.

Regards,
Maxime

For patch 3, I agree with the motivation: the duplica= tion between rangetypes_selfuncs.c and multirangetypes_selfuncs.c is not ideal. But I = am not convinced that exporting those helpers via = selfuncs.h is the right boundary.

My preference would be something tighter:

  • keep the shared helper implementations in one place

  • add a backend-private internal header just for the ra= nge-family selfuncs code

  • include that internal header= from rangetypes_selfuncs.c and mult= irangetypes_selfuncs.c

  • avoid widening visibility by turning a group of file-= local helpers into broader extern declarations in = selfuncs.h

In other words, I agree that the duplication should be removed, but I = think a backend-private internal header should be enough for that goal. I d= o not think we need to expand visibility more than necessary by moving thes= e helpers out of the file-private space into a broader interface.=C2=A0

Thanks again for working on this.

Regards,

Haibo

--000000000000b771dc064f75250e--