Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vmqFI-00GuLG-19 for pgsql-hackers@arkaria.postgresql.org; Mon, 02 Feb 2026 09:29:24 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vmqFG-00DCIO-0h for pgsql-hackers@arkaria.postgresql.org; Mon, 02 Feb 2026 09:29:23 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vmqFF-00DCIF-2y for pgsql-hackers@lists.postgresql.org; Mon, 02 Feb 2026 09:29:22 +0000 Received: from mail-wm1-x331.google.com ([2a00:1450:4864:20::331]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vmqFE-00000000eRn-25Cm for pgsql-hackers@lists.postgresql.org; Mon, 02 Feb 2026 09:29:22 +0000 Received: by mail-wm1-x331.google.com with SMTP id 5b1f17b1804b1-4801c2fae63so31069995e9.2 for ; Mon, 02 Feb 2026 01:29:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1770024559; x=1770629359; darn=lists.postgresql.org; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=qnrru9RU2wb3ThT9SaPzYyfZLal1H4YypqvEEM2BuSA=; b=Tdzhc5VxH77XAP1mATsRRJVUdx+mFcYu4xizvhwLlbt0UHsnJ6vWsx/e607xG0E/sw r3sLNowN7xMb6d/nmodDrasBdnPm6P5uTVYBCMqmIykwOqAsSC1RBYpJJM2uBNkkGa2l QtC7D12RfuLbg84axl8PHylNzoOWNkwPtLE0qIKQJKeh7/gPFrlAH6k+QyaDuwiy5sKQ A0t0Cu4ZZVm6SQJuqmaGCbfTHJyIbn21d+aUyiDHq9uqUyAkA6+DpUdX7ffHgtt1ZFeU lm6azJji5S4vGzeZtPL3cMY3waREpKBBUbiUlLv/Lb1HvIho98RJnzfs5kgQ6cxd/qzI DkDg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1770024559; x=1770629359; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :x-gm-gg:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=qnrru9RU2wb3ThT9SaPzYyfZLal1H4YypqvEEM2BuSA=; b=Ber0nXBLHUR+F1X5ZwMPJ873XTucqn0lzVGKLDKzGLRr6Qv0rZ1gssUNwwW+YVtbdG ICmqrSUub6gehss8TgNvAn5+BSzuVSzbvlatRvN93zfFOC9O0hWK4Fb9zpr13Q+1KrNK eQZTSMkr8onNLb4VU83TtHuMaqP3ofqVc/zzWUjkFmc6PKwu4GBOC3quLvJ2g6yRMzVP iERO0E5XHbIV3iRp9e48b0Mlw+uTZaPzYyx5co3Y8WrAhRjT6inZSSwmdUUp6FzW5YN0 K369GhvfavvtHRNbXcB4tIJKtyWDqDsQ50dYYaRWopw1ZDAPRvUoxOOGEvRSN+K6Dh3w gQSg== X-Gm-Message-State: AOJu0YxNVxP1AscixbKaU000Dam63u6Y/G3ZsMHrvQAHRD0CbgjKYR1O UXG74TvJxuy0eAqYX/AmrKyybAm+SrB6SyV9gcmP+H5/ZvCGFejjQq0o X-Gm-Gg: AZuq6aJuCbrFvdpiuoGUR2ahFAWbdn8FA9IrSYQVd07GNsV3FfdXP2usx+XQI7WL4Qk 9vCR8IMFGDr/r18ZdxMT9s5lRwmWLc19tys/keHKNvcadXYePK41+ZinrbEeG2v8CA46wE0ZfMF PwgU90uc/L3NtWrTRfJjMw/NWndAEuPhSqm1/lGDTsP2KY7UnqbU/9xNvLV2YK5lLX32v9QS8xk B05n7kV4amA4DkwmzeFokkssuVa5iciRv6gNOl2IvYytYhvQkWFAW4NvrQA3I3aokSyfbwI4fbf BZuVNx859GksBi0Sw37k2tz2z6u2JZacIYHNd2eDwNnMAf0wlLlgURi84HX9105UyL7LQEv9mNu 7ifU6arAA/MgiWNBorspp+cY/wyWTYVx7lc0tPmVXyMGNQ1djj1No/0u3Bv+pNDsRHQKAIWNG5a vl862AUbE/kT8= X-Received: by 2002:a05:600c:820e:b0:480:5678:1fdd with SMTP id 5b1f17b1804b1-482db451f37mr150726745e9.12.1770024558959; Mon, 02 Feb 2026 01:29:18 -0800 (PST) Received: from [192.168.2.32] ([165.225.27.6]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-482dbd21f5dsm105262185e9.8.2026.02.02.01.29.18 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 02 Feb 2026 01:29:18 -0800 (PST) Message-ID: Date: Mon, 2 Feb 2026 10:29:16 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: Hash-based MCV matching for large IN-lists To: Ilia Evdokimov , Chengpeng Yan Cc: "pgsql-hackers@lists.postgresql.org" References: <7db341e0-fbc6-4ec5-922c-11fdafe7be12@tantorlabs.com> <988e3168-6096-488a-bb42-787e1e8c21a4@tantorlabs.com> <21FF6A0B-886F-4132-B6A6-0F4E934B4CEE@Outlook.com> <8ee9d903-bb13-434c-8145-b3769bf2b3cb@tantorlabs.com> Content-Language: en-US From: David Geier In-Reply-To: <8ee9d903-bb13-434c-8145-b3769bf2b3cb@tantorlabs.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi! > Attached v4 patch with above fixes. Good progress! I did another pass over the code, focusing on structure: - MCVHasContext and MCVInHashContext are identical. MCVHashEntry and MCVInHashEntry only differ by the count member. I would, as said before, merge them and simply not use the count member for the join case. - hash_mcv_in() and mcvs_in_equal() are identical to hash_mcv() and mcvs_equal(). Let's remove the new functions and use the existing ones instead, in the spirit of the previous point. - The threshold constants are also identical. I would merge them into a single, e.g. MCV_HASH_THRESHOLD, in the spirit of the previous two points. - MCVHashTable_hash will then be interchangable with MCVInHashTable_hash. So let's remove MCVInHashTable_hash, in the spirit of the previous three points. - Use palloc_array() instead of palloc() when allocating arrays. - We can avoid allocating the all-true elem_const array by passing NULL for elem_const to scalararray_mcv_hash_match(), and considering a NULL pointer to mean "all elements are constant". - The following comment got copy&pasted from eqsel_internal() twice. It reads a little strange now because we're not punting here by immediately returning like in eqsel_internal() but instead fallback to the original code path. Maybe say instead "... falling back to default code path to compute default selectivity" or something like that. /* * If expression is not variable = something or something = * variable, then punt and return a default estimate. */ - The call to fmgr_info(opfuncoid, &eqproc) is currently under have_mcvs but can be moved into the next if. - elem_nulls and elem_const does have to be 0-initialized via palloc0(). All elements are set in the subsequent for-loop. I believe elem_values also doesn't have to be 0-initialized via palloc0(). - Have you checked there there's test coverage for the special cases (nvalues_non_mcv > 0, nvalues_nonconst > 0, IN contains NULL, isEnequality==true, etc.)? If not let's add tests for these. I'll do a 2nd iteration, focusing on correctness, once these comments are addressed and I've got the SQL from you so that I can test the corner cases manually. -- David Geier