Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1u0Hnn-00FpuJ-KG for pgsql-general@arkaria.postgresql.org; Thu, 03 Apr 2025 10:28:03 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1u0Hnl-005eKl-Vg for pgsql-general@arkaria.postgresql.org; Thu, 03 Apr 2025 10:28:01 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1u0Hnl-005eKc-KT for pgsql-general@lists.postgresql.org; Thu, 03 Apr 2025 10:28:01 +0000 Received: from mail-lf1-x12c.google.com ([2a00:1450:4864:20::12c]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1u0Hnj-002iSm-2g for pgsql-general@lists.postgresql.org; Thu, 03 Apr 2025 10:28:00 +0000 Received: by mail-lf1-x12c.google.com with SMTP id 2adb3069b0e04-54998f865b8so733669e87.3 for ; Thu, 03 Apr 2025 03:27:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1743676078; x=1744280878; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=4QRuM/vm6qdVbMT6rc5Nz6rfuz0jjwnz3r0kxgX+u54=; b=NeJ4WsaXvnveX+eTwdiCYhvUARwy75UhAyZiB0n9peY6r+5sbpGY6Dj8wokelI45dj 0BMo3VoRaUMJ40qOpQMVRe5tUxrPhMWhXB26USSD3qHl/nsWxy58g6b/kEkR6PBF1bOR MMkhcypyOKfdIPZNJJ5Zj+piWFeHROtaDpS6lWYEsVXcDsmsm1gx2ib4R1A08614uSQ+ 9wicdyITSsTAZCWxWTMx0ar8+0Lg0IYGvfaAJQwhZm6v+h5qfKnOLH54QdtcLULpysmK rYyTPJfN6I6KMNcgVsd/CWcMJA0RDZJTm7N3ucePz5zMmIuaKexdicSaqIBIjEQcRT1j wEbQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1743676078; x=1744280878; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=4QRuM/vm6qdVbMT6rc5Nz6rfuz0jjwnz3r0kxgX+u54=; b=NfhLRkBvMoYajU6qjVdePvDZZNY/1u9IT+gJokV8CkG/c9vL/w24hMFjkAjQNaJUpy YgYeoed0KTaFtpF/rySyJtYrRbV2lppvnkofhpUPVLQpJCJSxQCimKDi15Ulddu7V/vF le0eMkAWAvHp1pf+Z9qk/kun4jG88WRRaBNgFwGfaXffICrr6ysHw1pCM+3c1fht61xN Koeazg4Opjn4mFP+qdatNaN5uDDIr7E41JVRDDgnwiTW6zlhYqhn0XEb0cWLHuT3WHI9 r1qolnlpkt8t7TT+tDSKzt8lnaTld7o6iA3xPCdqMbf6YN4kTjQ00DzujHeFXlbHrXg9 fBQQ== X-Forwarded-Encrypted: i=1; AJvYcCWNlJWBjrO5a3pnSwkrLPPu2TISNtZEjg4JRUTH+a24YrqdhrqRiEsb2QWJipVMkCcuLR0bFZ9N56lzCw7j@lists.postgresql.org X-Gm-Message-State: AOJu0YzIIjStkgjY16ZTsg1NjhUZY+6y5eLYcL3NKDbBHadqtDBaruoC iytni9/rXXrLhk3S22FYe23618dR95s5dx+aHiYPDXWtmM5XM49uzEX4OzR6ZjhehjKfhXZ6wMc 2MwEDrcClX1u62ZNsszQ0YuV+9js= X-Gm-Gg: ASbGncv5VAZDuAxsE3mWU7FldN2JbZU25FoOGpp7U2pcNO281mv1ZnrkuzJ0p8WDB8J T0Re2FFZgrqrQGq1VP4hi+li8E8A5znqLn6Bf2i0hEGp8SsEg6xOrInLVhsflIZR7g1eJ+2Znbx 4oslHiogFn0lk+AieY6DgvfToyVovhNfdy9Ux5JtmJGK6bZThtAXRq/VjKa89/ X-Google-Smtp-Source: AGHT+IFg23/kpt8XddE14H5TkhA/gAArkFPbWJVwiYAmvzriX0bhdMT6oRv/LAA8IOcPp7PjJNUuMGyobE7vJiHH738= X-Received: by 2002:a05:6512:33d6:b0:545:2b24:c711 with SMTP id 2adb3069b0e04-54c19c5c62dmr1813912e87.21.1743676077924; Thu, 03 Apr 2025 03:27:57 -0700 (PDT) MIME-Version: 1.0 References: <1203098.1743640224@sss.pgh.pa.us> <2BC8AB39-16D7-4423-BE0A-F0F4EA432E2E@gmail.com> <1233327.1743656842@sss.pgh.pa.us> In-Reply-To: <1233327.1743656842@sss.pgh.pa.us> From: David Rowley Date: Thu, 3 Apr 2025 23:27:45 +1300 X-Gm-Features: ATxdqUEgav2725CPOX2CtTiWdcoWAlFKBJWjwoTdHx0OxGvOnpufLegsLls7lW8 Message-ID: Subject: Re: Postgres Query Plan using wrong index To: Tom Lane Cc: Manikandan Swaminathan , pgsql-general@lists.postgresql.org Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, 3 Apr 2025 at 18:07, Tom Lane wrote: > A simple-minded approach could be to just be pessimistic, and > increase our estimate of how many rows would need to be scanned as a > consequence of noticing that the columns have significant correlation. > The shape of that penalty function would be mostly guesswork though, > I fear. (Even with a clear idea of what to do, making this happen > seems a little complex --- just a SMOP, but I'm not very sure how to > wire it up.) The problem with being pessimistic is that if the distribution was even or the col_a > 4996 were right at the start of an ordered scan of the idx_col_b_a index, it would have been a good plan. There might be just as many people getting good plans as there are bad and adding pessimism here might just make one set of people happy and the others sad. I don't have a clear idea, but from a few minutes thinking about it, maybe when we build the statistics on a table, once we're about done with what happens today, if we then took the sample rows set and sorted them according to the sort order of each amcanorder index then go through the MCVs lists and histograms for each column and record a min and max value for the percentage of the way through the sorted sample rows that we find each MCV value and value within each histogram bucket and store those in some new statistics kind against the index (maybe indexes could opt into this). This might sound like it'd need lots of storage, but even 1 byte each for the min and max would be better than today. 0 could mean near the start, 127 in the middle and 255 at the end. I imagine this could be done fairly efficiently by sorting the MCVs lists by value and bsearching that array as we walk through the indexed-ordered sample rows. The histograms should be sorted already. When planning the query in question, to figure out the weighting of how many rows we expect to have to read, we look at the stats against the index to find the minimum position for values in the col_a histogram where col_a > 4996. Instead of assuming an even distribution, you use that minimum value to tell you what percentage of the index must be read before a match is found. The stored maximum position value would do the same job for backward index scans. David