Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rRpOP-009aUB-F3 for pgsql-hackers@arkaria.postgresql.org; Mon, 22 Jan 2024 08:10:53 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rRpOO-002UZp-Io for pgsql-hackers@arkaria.postgresql.org; Mon, 22 Jan 2024 08:10:52 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rRpOO-002UZh-9j for pgsql-hackers@lists.postgresql.org; Mon, 22 Jan 2024 08:10:52 +0000 Received: from mail-ua1-x934.google.com ([2607:f8b0:4864:20::934]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1rRpOM-0035Yw-1i for pgsql-hackers@lists.postgresql.org; Mon, 22 Jan 2024 08:10:51 +0000 Received: by mail-ua1-x934.google.com with SMTP id a1e0cc1a2514c-7d2dfa80009so825652241.0 for ; Mon, 22 Jan 2024 00:10:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1705911048; x=1706515848; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=sgdalBwta7OB0DOW3E+BpKl+gFICL/Qn/c70R8QyM44=; b=J8c7E0U+mihpix4npRhxY6h0/wqg0mmgvv8JZVmI83agQl3Xa7PxLzmgsXYU9CAK1U 3PQnxVf97Dej5na5ah8ZCj8lz5RDAV6sBjd1Gd3Yx3GTJMqNGq8KZaZLEyBtKlOX8LdG jeEIQYYqpauMcTFhV/lQfpVvBfbea5ExOmbWELdeVU4rMTJoRBaBqYzFTxqJ38WsaLxD t9XwMaenOedcaKvQmh0aReY0fUxLXfwCu6guJpas+Gl2MdzNBd0++lhl/bmkK2tvVfdf ncMnHI9tj6p1pPfyzG41bt/H3xRccGrVc2gbhb8BKyk7cpbaayTz3Hnr+w4y3CIfnDFr VsfA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705911048; x=1706515848; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=sgdalBwta7OB0DOW3E+BpKl+gFICL/Qn/c70R8QyM44=; b=goM9X9b4Rfcg578Xas+sOe21XWCv8CYi79t8d+7QFmCAqgO9w2lTrQP/uF/ga1N7zc jap/kX6iVmVtQBS1Q/W6i0mhbb5IvKaXiL7+3istmjmbqTzuDjlhGtd3DIzRQNVracUn JtIzffSHBvGJcW3aNoh9waKdiiWPphu3O4t9u6LH93qfyCBliCOAyoVwI5M5I27H0QUN vXnOXph7nDMB48hLutOg72LXHlYUGUM1myh/cM/7wGSlk443TE/DpmY9FVKfF+k4Dv++ jFnTdD7atOQbu4oX7TzKkWlCxaBMO+wS19CXzn60w46kHQS6+P4+8dxuJzhXHDEn3ORZ Zddw== X-Gm-Message-State: AOJu0Yz0GMI5nvwxksnFMEW2l/oKl09w19Ph0HS8vuBkB6mMnX9cYO4V vdiwG61qxmIGT3aCaL9HLe5dgObFK1VJPICPxmcyrOPeter1OoueRssxDV3d2teiChnHrdVCqJf xaaR5lmb3j367peoo7GBKvD9fiNo= X-Google-Smtp-Source: AGHT+IGKMH1pMas+s2E+ElYV0VGEV7JyWyWjiBFkqGL6VAm299TxMVQWTEAWHqX5Moh3SRUOVg1Pa2M2esh+9tQHmPk= X-Received: by 2002:a67:c902:0:b0:469:ada6:7f4b with SMTP id w2-20020a67c902000000b00469ada67f4bmr1392414vsk.26.1705911048046; Mon, 22 Jan 2024 00:10:48 -0800 (PST) MIME-Version: 1.0 References: <8afecd87-d1e5-241c-5e3e-75e1c62c279b@enterprisedb.com> <1bc5e1e4-380b-213d-6e6a-d5e8f7f0c7f5@enterprisedb.com> <84ffb566-8038-ab35-c841-7a5e5728a247@enterprisedb.com> <525ea957-88ff-e2c6-2975-474c849aa234@ulb.be> <48a78946-f380-a279-e75f-711846e509f8@ulb.be> <4c443cf6-96af-0953-fd5b-bd2b5c0c6f00@ulb.be> <2516071.1699991181@sss.pgh.pa.us> <08b2dea8-6dca-06e4-a53d-3e6334e7645b@ulb.be> In-Reply-To: From: jian he Date: Mon, 22 Jan 2024 16:10:37 +0800 Message-ID: Subject: Re: Implement missing join selectivity estimation for range types To: vignesh C Cc: Schoemans Maxime , Tom Lane , Damir Belyalov , PostgreSQL Hackers , SAKR Mahmoud , Diogo Repas , LUO Zhicheng , Tomas Vondra , Andrey Lepikhov Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk I cannot figure out why it aborts. as Tom mentioned in upthread about the test cases. similar to src/test/regress/sql/stats_ext.sql check_estimated_rows function. we can test it by something: create or replace function check_estimated_rows(text) returns table (ok bool) language plpgsql as $$ declare ln text; tmp text[]; first_row bool := true; begin for ln in execute format('explain analyze %s', $1) loop if first_row then first_row := false; tmp := regexp_match(ln, 'rows=(\d*) .* rows=(\d*)'); return query select 0.2 < tmp[1]::float8 / tmp[2]::float8 and tmp[1]::float8 / tmp[2]::float8 < 5; end if; end loop; end; $$; select * from check_estimated_rows($$select * from test_range_join_1, test_range_join_2 where ir1 && ir2$$); select * from check_estimated_rows($$select * from test_range_join_1, test_range_join_2 where ir1 << ir2$$); select * from check_estimated_rows($$select * from test_range_join_1, test_range_join_2 where ir1 >> ir2$$); Do you need 3 tables to do the test? because we need to actually run the query then compare the estimated row and actually returned rows. If you really execute the query with 3 table joins, it will take a lot of time. So two tables join with where quql should be fine? /* Fast-forwards i and j to start of iteration */ + for (i = 0; range_cmp_bound_values(typcache, &hist1[i], &hist2[0]) < 0; i++); + for (j = 0; range_cmp_bound_values(typcache, &hist2[j], &hist1[0]) < 0; j++); + + /* Do the estimation on overlapping regions */ + while (i < nhist1 && j < nhist2) + { + double cur_sel1, + cur_sel2; + RangeBound cur_sync; + + if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) < 0) + cur_sync = hist1[i++]; + else if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) > 0) + cur_sync = hist2[j++]; + else + { + /* If equal, skip one */ + cur_sync = hist1[i]; + this part range_cmp_bound_values "if else if" part computed twice, you can just do ` int cmp; cmp = range_cmp_bound_values(typcache, &hist1[i], &hist2[j]); if cmp <0 then else if cmp > 0 then else then ` also. I think you can put the following into main while loop. + for (i = 0; range_cmp_bound_values(typcache, &hist1[i], &hist2[0]) < 0; i++); + for (j = 0; range_cmp_bound_values(typcache, &hist2[j], &hist1[0]) < 0; j++); split range and multirange into 2 patches might be a good idea. seems: same function (calc_hist_join_selectivity) with same function signature in src/backend/utils/adt/multirangetypes_selfuncs.c and src/backend/utils/adt/rangetypes_selfuncs.c, previously mail complaints not resolved.