Received: from localhost (maia-4.hub.org [200.46.204.183]) by postgresql.org (Postfix) with ESMTP id 6C92F9FB84E for ; Fri, 25 May 2007 05:06:57 -0300 (ADT) Received: from postgresql.org ([200.46.204.71]) by localhost (mx1.hub.org [200.46.204.183]) (amavisd-maia, port 10024) with ESMTP id 39966-01 for ; Fri, 25 May 2007 05:06:53 -0300 (ADT) X-Greylist: from auto-whitelisted by SQLgrey-1.7.4 Received: from h07.llord.com (unknown [195.140.142.194]) by postgresql.org (Postfix) with ESMTP id 9045D9FB641 for ; Fri, 25 May 2007 05:06:53 -0300 (ADT) Received: from par69-8-88-161-102-87.fbx.proxad.net ([88.161.102.87] helo=apollo13) by h07.llord.com with esmtpa (Exim 4.63) (envelope-from ) id 1HrUor-00024Q-4t; Fri, 25 May 2007 10:06:41 +0200 Date: Fri, 25 May 2007 10:09:17 +0200 To: "Steve Atkins" , "PostgreSQL General" Subject: Re: index vs. seq scan choice? From: PFC Content-Type: text/plain; format=flowed; delsp=yes; charset=utf-8 MIME-Version: 1.0 References: <8C5B026B51B6854CBE88121DBF097A86C3A30D@ehost010-33.exch010.intermedia.net> <27828.1180055291@sss.pgh.pa.us> <20070525023922.GV4320@alvh.no-ip.org> <29662.1180061117@sss.pgh.pa.us> <46565086.2040705@commandprompt.com> <115.1180063568@sss.pgh.pa.us> <23385219-5252-468A-BBC9-69516DA81C2A@blighty.com> Content-Transfer-Encoding: 7bit Message-ID: In-Reply-To: <23385219-5252-468A-BBC9-69516DA81C2A@blighty.com> User-Agent: Opera Mail/9.10 (Linux) X-PopBeforeSMTPSenders: junk@peufeu.com, lists@peufeu.com, peufeu@peufeu.com, pfcaillaud@peufeu.com X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - h07.llord.com X-AntiAbuse: Original Domain - postgresql.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - peufeu.com X-Source: X-Source-Args: X-Source-Dir: X-Virus-Scanned: Maia Mailguard 1.0.1 X-Spam-Status: No, hits=2.16 tagged_above=0 required=5 tests=AWL=-0.430, BAYES_40=-0.185, RCVD_IN_NJABL_SPAM=2.775 X-Spam-Level: ** X-Archive-Number: 200705/1239 X-Sequence-Number: 114433 > Would it be possible to look at a much larger number of samples during > analyze, > then look at the variation in those to generate a reasonable number of > pg_statistic "samples" to represent our estimate of the actual > distribution? > More datapoints for tables where the planner might benefit from it, fewer > where it wouldn't. Maybe it would be possible to take note somewhere of the percentage of occurence of the most common value (in the OP's case, about 3%), in which case a quick decision can be taken to use the index without even looking at the value, if we know the most common one is below the index use threshold...