Date: Fri, 25 May 2007 10:09:17 +0200
To: "Steve Atkins" <steve@blighty.com>,
	"PostgreSQL General" <pgsql-general@postgresql.org>
Subject: Re: index vs. seq scan choice?
From: PFC <lists@peufeu.com>
Content-Type: text/plain; format=flowed; delsp=yes; charset=utf-8
MIME-Version: 1.0
References: 
 <8C5B026B51B6854CBE88121DBF097A86C3A30D@ehost010-33.exch010.intermedia.net>
	<27828.1180055291@sss.pgh.pa.us>
	<20070525023922.GV4320@alvh.no-ip.org>
	<29662.1180061117@sss.pgh.pa.us>
	<46565086.2040705@commandprompt.com> <115.1180063568@sss.pgh.pa.us>
	<23385219-5252-468A-BBC9-69516DA81C2A@blighty.com>
Content-Transfer-Encoding: 7bit
Message-ID: <op.tsvh9ra1cigqcu@apollo13>
In-Reply-To: <23385219-5252-468A-BBC9-69516DA81C2A@blighty.com>
User-Agent: Opera Mail/9.10 (Linux)


> Would it be possible to look at a much larger number of samples during
> analyze, then look at the variation in those to generate a reasonable
> number of pg_statistic "samples" to represent our estimate of the actual
> distribution? More datapoints for tables where the planner might benefit
> from it, fewer where it wouldn't.

	Maybe it would be possible to record somewhere the percentage of
occurrence of the most common value (in the OP's case, about 3%). Then, if
we know that even the most common value is below the index use threshold,
a quick decision can be taken to use the index without even looking at the
value being searched for...
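
	To make the idea concrete, here is a rough sketch in Python (not
planner code). The names choose_scan, estimate_freq and INDEX_USE_THRESHOLD
are made up for illustration, and the fixed 5% cutoff is only an assumption:
the real planner compares estimated costs, it does not use a fixed fraction.

```python
from typing import Callable

# Hypothetical cutoff: fraction of rows above which a seq scan is assumed
# to be cheaper. Purely illustrative; the real planner compares costs.
INDEX_USE_THRESHOLD = 0.05


def choose_scan(value: object,
                most_common_freq: float,
                estimate_freq: Callable[[object], float],
                threshold: float = INDEX_USE_THRESHOLD) -> str:
    """Pick a scan type for the predicate `column = value`."""
    # Fast path: no value can be more frequent than the most common one,
    # so if that one is below the threshold, every value is.
    if most_common_freq < threshold:
        return "index scan"
    # Slow path: look up the estimated frequency of this particular value.
    if estimate_freq(value) < threshold:
        return "index scan"
    return "seq scan"
```

	With the OP's numbers (most common value at about 3%), the fast
path applies and the per-value lookup is never called. For a column where
the most common value is far above the cutoff, the decision still depends
on the value, as it does today.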