From: Tom Lane
To: David Rowley
Cc: Manikandan Swaminathan, pgsql-general@lists.postgresql.org
Subject: Re: Postgres Query Plan using wrong index
Date: Thu, 03 Apr 2025 01:07:22 -0400
Message-ID: <1233327.1743656842@sss.pgh.pa.us>

David Rowley writes:
> On Thu, 3 Apr 2025 at 16:24, Manikandan Swaminathan wrote:
>> why doesn’t making a multivariate statistic make a difference?
> Extended statistics won't help you here. "dependencies" just estimates
> functional dependencies between the columns mentioned in the ON
> clause. What we'd need to store to do better in your example query is
> positional information of where certain values are within indexes
> according to an ordered scan of the index. I don't quite know how we'd
> represent that exactly, but if we knew that a row matching
> col_a > 4996 wasn't until somewhere near the end of the
> idx_col_a_btree index, then we'd likely not want to use that index
> for this query.

A simple-minded approach would be to just be pessimistic: when we
notice that the columns have significant correlation, increase our
estimate of how many rows would need to be scanned. The shape of that
penalty function would be mostly guesswork, though, I fear.

(Even with a clear idea of what to do, making this happen seems a
little complex. It's just a SMOP, but I'm not very sure how to wire
it up.)

			regards, tom lane
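The "be pessimistic" idea can be illustrated with a toy calculation. This is a minimal sketch, not PostgreSQL planner code: the function names, the linear blend, and the single cross-column `correlation` input are all hypothetical assumptions (the planner does not currently supply such a value in this form). It compares the rows an ordered index scan reads before finding `limit` matches when matching rows are spread evenly along the index (the independence assumption) against the worst case where all matches sit at the end of the index, then interpolates between the two by the strength of the correlation.

```python
def rows_scanned_uniform(total_rows: int, matching_rows: int, limit: int) -> float:
    """Expected rows read by an ordered index scan before `limit` matches are
    found, assuming matching rows are spread evenly along the index order
    (the independence assumption)."""
    if matching_rows == 0:
        # Nothing matches: the scan reads every row and still comes up empty.
        return float(total_rows)
    # On average one match every total_rows / matching_rows rows; cap at the table size.
    return float(min(total_rows, limit * total_rows / matching_rows))


def rows_scanned_worst_case(total_rows: int, matching_rows: int, limit: int) -> float:
    """Rows read if every matching row sits at the very end of the index order."""
    return float(min(total_rows, total_rows - matching_rows + limit))


def pessimistic_rows_scanned(total_rows: int, matching_rows: int, limit: int,
                             correlation: float) -> float:
    """Hypothetical penalty: blend the two estimates, weighting the worst case
    by |correlation| clamped to [0, 1]. The sign is ignored on purpose, so the
    result never drops below the uniform (independence) estimate."""
    weight = min(1.0, abs(correlation))
    optimistic = rows_scanned_uniform(total_rows, matching_rows, limit)
    worst = rows_scanned_worst_case(total_rows, matching_rows, limit)
    return (1.0 - weight) * optimistic + weight * worst


if __name__ == "__main__":
    # Made-up numbers: 1M rows, 1,000 of which pass the filter, LIMIT 1.
    for r in (0.0, 0.5, 1.0):
        print(r, pessimistic_rows_scanned(1_000_000, 1_000, 1, r))
```

With these made-up numbers the estimate moves from 1,000 rows at a correlation of 0 to 999,001 rows at a correlation of 1. A linear blend is only one arbitrary choice; where a real estimate should fall between those extremes is exactly the guesswork mentioned above.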