Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1u0BBr-00DTMP-2b for pgsql-general@arkaria.postgresql.org; Thu, 03 Apr 2025 03:24:27 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1u0BBp-002JWe-Kq for pgsql-general@arkaria.postgresql.org; Thu, 03 Apr 2025 03:24:25 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1u0BBp-002JWV-7h for pgsql-general@lists.postgresql.org; Thu, 03 Apr 2025 03:24:25 +0000 Received: from mail-pf1-x433.google.com ([2607:f8b0:4864:20::433]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1u0BBm-0035GP-2Q for pgsql-general@lists.postgresql.org; Thu, 03 Apr 2025 03:24:24 +0000 Received: by mail-pf1-x433.google.com with SMTP id d2e1a72fcca58-736dd9c4b40so685701b3a.0 for ; Wed, 02 Apr 2025 20:24:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1743650660; x=1744255460; darn=lists.postgresql.org; h=to:in-reply-to:cc:references:message-id:date:subject:mime-version :from:content-transfer-encoding:from:to:cc:subject:date:message-id :reply-to; bh=6A5W6MU6PP8JlJ6oYcKNNss8w1KminOqAj9zsXDB+Ak=; b=lJ1yUM6a1MC91+RlTgebfKy3BC3tPL3iEolmUIztRCzzrEHcEIaxRN/uvZ7NpuHSps CJ/GNjRH5M0rnu2M7hCsNxZ3oGry5JesGwNFpSlDLkE9DlWFe+NLSbtRRPcGaGW1GQy+ ciEAcaIhnZmopu9eTnLvcDmwaWS2xHHGiV5EKZxGKBF8uu5uFaT0XjPwAvLc3R5bgDuk kdPzcc+wUwEKZNF2DdJY7bNwi+Ox00YKTctVJXOKacWK76f55lDYO9PUiYLJ0+nhg3y1 PqErhml2+d9VLYoq4Gf/TxkrWYOES+65h0uGXDLOn8Jnx1R0OW1ITy/7sRdp2XCjuBPc lwIg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1743650660; x=1744255460; h=to:in-reply-to:cc:references:message-id:date:subject:mime-version :from:content-transfer-encoding:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=6A5W6MU6PP8JlJ6oYcKNNss8w1KminOqAj9zsXDB+Ak=; b=aeUE51e6cvjHzoOGO8NdPjDscnnD+v60F6w7U2G2OMOZRdeM1xQ50egnL7mo7KNwNL UWUR39dNRoWtLOYYNxNqWnveKzmErnuOZQoi8wsXHAdVS94znkkPb1soBmtK4dt8lNsh oRlG0z0A2UJPSh6hpiKXaekH7+mbXeIdrJt7Hig5NELQCoBL13Mn7IoXbloNHu3hvuGW NCiD1m0ySj+TWgmSvYWYpsFohh6u4V0Dlxzi2wreeQGezEXI4KCigA+TzU+z6HY1ha0K RAXGeZm1Dn3qnnsx97VTFDlpLWKdl07kzt219uCude7YuNzZeZ923gX+KZy79wKqZMPn vbaA== X-Gm-Message-State: AOJu0YxJiiaLPGRoq+Ng4i8cmxZC09DQtGZWAPS2E6Vd+7qjUa0yLgpH Xr3euKb8im9elCEOvvyVXp5HgT5pnOl6yzCWaxpeXlgdERja6cYzeWWsAg== X-Gm-Gg: ASbGncuyII0fu5BdsbPHCkN2/2Sbp4exVPrObRFj1kGL3uAHU72eDnxRm0MB2Qr9zGj mPGQqz0U4FvvyfOdYhmIPcAcaUnqsyfq9o2X8yQUbZ9/vrKQX09cXfUmPlKBUPua/d6lVC8v/8J DRT9fzfg3vq1hL32wOhSSGmnlKzaUYBOKN1mg6f366S2ZxbvEvmsxqCn6qB1wgt3hE9GLnu4bmE 4DMu49VZbnzW4jACYK57cYS1OkRVCJDBCAu+TY/SLN32WK3QoWxd0M48kkk2kjuL6R+EnoBjC6Y 9W4cEvgSAe9wQ+nb0WYgzjuf6oiOao8uiry4TCtn80Vo1O6l2rJt3DohGeVZw74BxCL1l7Xqbg= = X-Google-Smtp-Source: AGHT+IHAGKodQ5NYzO3AeT3+Y7bEkrquLt3yRSC6K37gA5H3M6OaCj4AvraXXQ9S3dF83Kurjty0nw== X-Received: by 2002:a05:6a20:d046:b0:1ee:ba29:d8fa with SMTP id adf61e73a8af0-200f55251f9mr2627500637.6.1743650660411; Wed, 02 Apr 2025 20:24:20 -0700 (PDT) Received: from smtpclient.apple ([2601:644:4003:ea90:1cb2:d1f9:2823:ddd5]) by smtp.gmail.com with ESMTPSA id 41be03b00d2f7-af9bc3fd3b7sm225811a12.51.2025.04.02.20.24.19 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 02 Apr 2025 20:24:19 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable From: Manikandan Swaminathan Mime-Version: 1.0 (1.0) Subject: Re: Postgres Query Plan using wrong index Date: Wed, 2 Apr 2025 20:24:08 -0700 Message-Id: <2BC8AB39-16D7-4423-BE0A-F0F4EA432E2E@gmail.com> References: <1203098.1743640224@sss.pgh.pa.us> Cc: pgsql-general@lists.postgresql.org In-Reply-To: <1203098.1743640224@sss.pgh.pa.us> To: Tom Lane X-Mailer: iPhone Mail (20G81) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Thanks Tom. Since you mentioned the planner not knowing about the correlation between th= e columns, I=E2=80=99m curious, why doesn=E2=80=99t making a multivariate st= atistic make a difference? CREATE STATISTICS col_a_col_b_stats (dependencies) ON col_a, col_b FROM test= _table; ANALYZE test_table; And the resulting query plan which uses just the index on col_b: postgres=3D# explain analyze select min(col_b) from test_table where col_a >= 4996; =20 Result (cost=3D62.13..62.14 rows=3D1 width=3D4) (actual time=3D536.648..53= 6.649 rows=3D1 loops=3D1) InitPlan 1 -> Limit (cost=3D0.43..62.13 rows=3D1 width=3D4) (actual time=3D536.6= 41..536.641 rows=3D1 loops=3D1) -> Index Scan using idx_col_a_btree on test_table (cost=3D0.43.= .254987.43 rows=3D4133 width=3D4) (actual time=3D536.640..536.640 rows=3D1 l= oops=3D1) Filter: (col_a > 4996) Rows Removed by Filter: 9992000 Planning Time: 0.285 ms Execution Time: 536.681 ms >=20 > On Apr 2, 2025, at 5:30 PM, Tom Lane wrote: >=20 > =EF=BB=BFManikandan Swaminathan writes: >> 1) Why is the query currently picking the poorly performing index? >=20 > Because the planner thinks that one will be cheaper, as you can see by > comparing the cost estimates in EXPLAIN. It's wrong, but this is a > hard problem to estimate well. Especially when the behavior depends > on a correlation between columns that the planner knows nothing about. >=20 >> 2) Why would the index you suggested, (col_b, col_a), perform better than= (col_a, col_b)? I would=E2=80=99ve expected the filter on col_a to come fir= st, followed by the aggregate on col_b. In my mind, it needs to find rows ma= tching the col_a condition before calculating the MIN(col_b), and I assumed i= t would traverse the B-tree accordingly. >=20 > The idea the planner is using is "scan the index in order (that is, > in col_b order) until you find the first row satisfying the other > constraints (that is, the col_a condition). Then that row's col_b > value is the correct MIN(), and you can stop." Since it knows nothing > of the cross-column correlation, its estimate of how many rows it'll > have to scan through is overly optimistic. But it knows that the > other way involves scanning a whole lot of the index --- there's no > chance of stopping early --- so that's estimated as higher-cost. >=20 > The index I suggested on (col_b, col_a) is amenable to this same > plan shape, since col_b is still the major sort column. The > reason it wins is that the col_a condition can be checked in the > index without having to visit the heap, thus eliminating a lot of > random access to the heap. >=20 >> 3) Why does the planner choose the better-performing (col_a, col_b) index= when the filter is col_a > 5000, but switch to the slower (col_b) index whe= n the filter is not at the edge of the range, like col_a > 4996? >=20 > At some point, as less and less of the col_a-major index would need to > be scanned, there's a crossover in the cost estimates for the two ways > of doing this. I would not have cared to predict where the crossover > is, but you evidently found it empirically. >=20 >> For reference, here=E2=80=99s the query plan when filtering for col_a > 5= 000. It uses the correct index on (col_a, col_b). >=20 > You would do a lot better to approach this without rigid notions of > which is the "correct" index. All of the ones we've discussed are > potentially usable for this query, and they all have different cost > curves depending on how selective the col_a condition is. Even the > index on col_b alone could potentially be the best, because it'll be > smaller than the two-column indexes. So if the col_a condition is > very unselective then it's (at least in theory) possible that that > would be the best choice. >=20 > regards, tom lane