Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rLha7-009bbV-Co for pgsql-hackers@arkaria.postgresql.org; Fri, 05 Jan 2024 10:37:39 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rLha5-002MoO-2h for pgsql-hackers@arkaria.postgresql.org; Fri, 05 Jan 2024 10:37:37 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rLha4-002MoG-Lc for pgsql-hackers@lists.postgresql.org; Fri, 05 Jan 2024 10:37:36 +0000 Received: from mail-yb1-xb33.google.com ([2607:f8b0:4864:20::b33]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1rLhZx-00Druj-MP for pgsql-hackers@lists.postgresql.org; Fri, 05 Jan 2024 10:37:35 +0000 Received: by mail-yb1-xb33.google.com with SMTP id 3f1490d57ef6-dbdb8e032f7so317033276.1 for ; Fri, 05 Jan 2024 02:37:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1704451049; x=1705055849; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=RP/Z0/Wdxm1TVTbYRwka09Qotax1hWC97V8PYD0QIOA=; b=WQeHTQheOPMZR9AI5gZ9gCHu4FmIw8iBf17BIvAwY7PUtzVoU9h3+3okuJZr+lFM0d 8z0EUPex/ReLHMRn3pN3wP2065f8FiSbJqlDlwV1U9GK3/GgRZg5KMWNMvCmHWcPfAf4 Jc2bWJfoyU/ufbTTtqWALQ0/njyyGnOLWDyRst35GFni8pzh99i3dzZvxhf7cPzWLSTh 5B7tWCmof5mgYdUHSZtS4T/FYeNU7gQSwba62bOVf2noNSWWRrU4NnV6E4N7ulncBvHe KIaLsrqsXH3XruCXQ+JW6BC3ASmTN4DTa44+lRW41B9OWizqp40fD+ycjyJEIj28otP6 fxCw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1704451049; x=1705055849; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=RP/Z0/Wdxm1TVTbYRwka09Qotax1hWC97V8PYD0QIOA=; b=TO6GOHK7dGOhvO0bPIHxZxo9/Gvp5PkJ+bcNcsv4MHO//IeQo59ENqSdyneQQwHKAG DVhMpQg6s7QksmcRGjDpyPtESn7vPikfYhF3euk6xZ5fSDaGkuHO3Oyfmo+jhG7t+If7 maVZCIGV+asZGIOd7ZAK3GfyeEo3b8lnIrSi/EaMcef/iqSolr7YNlnShOoKVfKmstn/ sjO+tY9Wcu2CgSCtgZpSVLXSC7fATAlTkltMmj8sBLrj5VV8vMj+pOxEOQARy/UmG4ya ivjRdajwrJBX7YSSQgw/zHt9c9VEu5Alr/9tQx0JUI31JsP26K4MfSZACwOm4DhL36VZ YDgQ== X-Gm-Message-State: AOJu0YxE+m3uGjvZazLNDdeMbEPM+3ZLgfI6nTY6wNI1oGQNgNTu/Xn8 00mlawE5FEXVld+cHuV6s6xDKlB5wSXUYOWYa0E= X-Google-Smtp-Source: AGHT+IHnFh3tbD2+ZuySKKZPgZYz5xBu5Inf0kM5VZRMbVCFGpXwkTe6pd3pJKSTFqoTkQxdAZNd8wsODCbk5wUkMp0= X-Received: by 2002:a25:938b:0:b0:dbe:aa27:ad73 with SMTP id a11-20020a25938b000000b00dbeaa27ad73mr1325653ybm.19.1704451048823; Fri, 05 Jan 2024 02:37:28 -0800 (PST) MIME-Version: 1.0 References: <8afecd87-d1e5-241c-5e3e-75e1c62c279b@enterprisedb.com> <1bc5e1e4-380b-213d-6e6a-d5e8f7f0c7f5@enterprisedb.com> <84ffb566-8038-ab35-c841-7a5e5728a247@enterprisedb.com> <525ea957-88ff-e2c6-2975-474c849aa234@ulb.be> <48a78946-f380-a279-e75f-711846e509f8@ulb.be> <4c443cf6-96af-0953-fd5b-bd2b5c0c6f00@ulb.be> <2516071.1699991181@sss.pgh.pa.us> In-Reply-To: From: vignesh C Date: Fri, 5 Jan 2024 16:07:17 +0530 Message-ID: Subject: Re: Implement missing join selectivity estimation for range types To: Schoemans Maxime Cc: Tom Lane , Damir Belyalov , PostgreSQL Hackers , SAKR Mahmoud , Diogo Repas , LUO Zhicheng , Tomas Vondra , Andrey Lepikhov Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Tue, 21 Nov 2023 at 01:47, Schoemans Maxime wrote: > > On 14/11/2023 20:46, Tom Lane wrote: > > I took a brief look through this very interesting work. I concur > > with Tomas that it feels a little odd that range join selectivity > > would become smarter than scalar inequality join selectivity, and > > that we really ought to prioritize applying these methods to that > > case. Still, that's a poor reason to not take the patch. > > Indeed, we started with ranges as this was the simpler case (no MCV) and > was the topic of a course project. > The idea is to later write a second patch that applies these ideas to > scalar inequality while also handling MCV's correctly. > > > I also agree with the upthread criticism that having two identical > > functions in different source files will be a maintenance nightmare. > > Don't do it. When and if there's a reason for the behavior to > > diverge between the range and multirange cases, it'd likely be > > better to handle that by passing in a flag to say what to do. > > The duplication is indeed not ideal. However, there are already 8 other > duplicate functions between the two files. > I would thus suggest to leave the duplication in this patch and create a > second one that removes all duplication from the two files, instead of > just removing the duplication for our new function. > What are your thoughts on this? If we do this, should the function > definitions go in rangetypes.h or should we create a new > rangetypes_selfuncs.h header? > > > But my real unhappiness with the patch as-submitted is the test cases, > > which require rowcount estimates to be reproduced exactly. > > > We need a more forgiving test method. Usually the > > approach is to set up a test case where the improved accuracy of > > the estimate changes the planner's choice of plan compared to what > > you got before, since that will normally not be too prone to change > > from variations of a percent or two in the estimates. > > I have changed the test method to produce query plans for a 3-way range > join. > The plans for the different operators differ due to the computed > selectivity estimation, which was not the case before this patch. One of the tests was aborted at [1], kindly post an updated patch for the same: [04:55:42.797] src/tools/ci/cores_backtrace.sh linux /tmp/cores [04:56:03.640] dumping /tmp/cores/postgres-6-24094.core for /tmp/cirrus-ci-build/tmp_install/usr/local/pgsql/bin/postgres [04:57:24.199] Core was generated by `postgres: old_node: postgres regression [local] EXPLAIN '. [04:57:24.199] Program terminated with signal SIGABRT, Aborted. [04:57:24.199] #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 [04:57:24.199] Download failed: Invalid argument. Continuing without source file ./signal/../sysdeps/unix/sysv/linux/raise.c. [04:57:26.803] [04:57:26.803] Thread 1 (Thread 0x7f121d9ec380 (LWP 24094)): [04:57:26.803] #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 [04:57:26.803] set = {__val = {4194304, 0, 4636737291354636288, 4636737291354636288, 0, 0, 64, 64, 128, 128, 192, 192, 256, 256, 0, 0}} [04:57:26.803] pid = [04:57:26.803] tid = [04:57:26.803] ret = [04:57:26.803] #1 0x00007f122003d537 in __GI_abort () at abort.c:79 ... ... [04:57:38.774] #6 0x00007f357ad95788 in __asan::__asan_report_load1 (addr=addr@entry=107477261711120) at ../../../../src/libsanitizer/asan/asan_rtl.cpp:117 [04:57:38.774] bp = 140731433585840 [04:57:38.774] pc = [04:57:38.774] local_stack = 139867680821632 [04:57:38.774] sp = 140731433585832 [04:57:38.774] #7 0x000055d5b155c37c in range_cmp_bound_values (typcache=typcache@entry=0x629000030b60, b1=b1@entry=0x61c000017708, b2=b2@entry=0x61c0000188b8) at rangetypes.c:2090 [04:57:38.774] No locals. [04:57:38.774] #8 0x000055d5b1567bb2 in calc_hist_join_selectivity (typcache=typcache@entry=0x629000030b60, hist1=hist1@entry=0x61c0000188b8, nhist1=nhist1@entry=101, hist2=hist2@entry=0x61c0000170b8, nhist2=nhist2@entry=101) at rangetypes_selfuncs.c:1298 [04:57:38.774] i = 0 [04:57:38.774] j = 101 [04:57:38.774] selectivity = [04:57:38.774] cur_sel1 = [04:57:38.774] cur_sel2 = [04:57:38.774] prev_sel1 = [04:57:38.774] prev_sel2 = [04:57:38.774] cur_sync = {val = , infinite = , inclusive = , lower = } [04:57:38.774] #9 0x000055d5b1569190 in rangejoinsel (fcinfo=) at rangetypes_selfuncs.c:1495 [1] - https://cirrus-ci.com/task/5507789477380096 Regards, Vignesh