From: Steve Atkins
Subject: Re: index vs. seq scan choice?
Date: Thu, 24 May 2007 21:25:23 -0700
To: PostgreSQL General

On May 24, 2007, at 8:26 PM, Tom Lane wrote:

> "Joshua D. Drake" writes:
>> Tom Lane wrote:
>>> I'm not sure I want to vote for another 10x increase by
>>> default, though.
>
>> Outside of longer analyze times, and slightly more space taken up
>> by the statistics, what is the downside?
>
> Longer plan times --- several of the selfuncs.c routines grovel
> over all the entries in the pg_statistic row.  AFAIK no one's
> measured the real impact of that, but it could easily be
> counterproductive for simple queries.

The lateness of the hour is suppressing my supposed statistics
savvy, so this may not make sense, but...

Would it be possible to look at a much larger number of samples
during analyze, then look at the variation in those to generate a
reasonable number of pg_statistic "samples" to represent our
estimate of the actual distribution? More datapoints for tables
where the planner might benefit from it, fewer where it wouldn't.

Cheers,
  Steve
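
A minimal sketch of the idea above, in Python rather than the C that
ANALYZE is written in. Everything here is hypothetical: the function
name choose_stats_entries, the skew_factor threshold, and the
min/max bounds are illustrative assumptions, not anything that exists
in PostgreSQL. It counts values in a large sample and keeps one entry
for each value that is noticeably more common than the average
distinct value, so a skewed column ends up with more entries and a
roughly uniform column falls back to the floor.

```python
from collections import Counter


def choose_stats_entries(sample, min_entries=10, max_entries=1000,
                         skew_factor=1.25):
    """Suggest how many most-common-value entries to keep for a column.

    Hypothetical heuristic, not PostgreSQL's algorithm: a value earns
    an entry only if its count in the sample exceeds skew_factor times
    the average count per distinct value.
    """
    counts = Counter(sample)
    if not counts:
        return 0

    # Average number of occurrences per distinct value in the sample.
    avg_count = len(sample) / len(counts)
    threshold = skew_factor * avg_count

    # Values that stand out from a uniform distribution.
    interesting = sum(1 for c in counts.values() if c > threshold)

    # Clamp to [min_entries, max_entries], but never ask for more
    # entries than there are distinct values.
    wanted = max(min_entries, min(interesting, max_entries))
    return min(wanted, len(counts))


if __name__ == "__main__":
    # Uniform column: every value appears equally often.
    uniform = list(range(1000)) * 30
    # Skewed column: 200 hot values plus a long tail of one-off values.
    skewed = [v for v in range(200) for _ in range(100)]
    skewed += list(range(1000, 11000))

    print("uniform:", choose_stats_entries(uniform))
    print("skewed: ", choose_stats_entries(skewed))
```

The fixed max_entries cap is there because of the plan-time cost Tom
mentions: the planner would still have to walk whatever number of
entries gets stored, so the per-column count should stay bounded.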