Received: from localhost (maia-2.hub.org [200.46.204.187]) by postgresql.org (Postfix) with ESMTP id 5E2B29FB931 for ; Fri, 25 May 2007 09:52:39 -0300 (ADT) Received: from postgresql.org ([200.46.204.71]) by localhost (mx1.hub.org [200.46.204.187]) (amavisd-maia, port 10024) with ESMTP id 93913-08 for ; Fri, 25 May 2007 09:52:30 -0300 (ADT) X-Greylist: from auto-whitelisted by SQLgrey-1.7.4 Received: from smtp-mclean.mitre.org (smtpproxy2.mitre.org [192.80.55.71]) by postgresql.org (Postfix) with ESMTP id A93C59FA224 for ; Fri, 25 May 2007 09:52:35 -0300 (ADT) Received: from smtp-mclean.mitre.org (localhost.localdomain [127.0.0.1]) by smtp-mclean.mitre.org (8.12.11.20060308/8.12.11) with SMTP id l4PCQosq014539 for ; Fri, 25 May 2007 08:26:50 -0400 Received: from smtp-mclean.mitre.org (localhost.localdomain [127.0.0.1]) by smtp-mclean.mitre.org (Postfix) with ESMTP id D468B4F8D7 for ; Fri, 25 May 2007 08:26:49 -0400 (EDT) Received: from imcfe2.MITRE.ORG (imcfe2.mitre.org [129.83.29.4]) by smtp-mclean.mitre.org (8.12.11.20060308/8.12.11) with ESMTP id l4PCQn26014515 for ; Fri, 25 May 2007 08:26:49 -0400 Received: from [129.83.51.92] ([129.83.51.92]) by imcfe2.MITRE.ORG with Microsoft SMTPSVC(6.0.3790.1830); Fri, 25 May 2007 08:52:33 -0400 In-Reply-To: <23385219-5252-468A-BBC9-69516DA81C2A@blighty.com> References: <8C5B026B51B6854CBE88121DBF097A86C3A30D@ehost010-33.exch010.intermedia.net> <27828.1180055291@sss.pgh.pa.us> <20070525023922.GV4320@alvh.no-ip.org> <29662.1180061117@sss.pgh.pa.us> <46565086.2040705@commandprompt.com> <115.1180063568@sss.pgh.pa.us> <23385219-5252-468A-BBC9-69516DA81C2A@blighty.com> Mime-Version: 1.0 (Apple Message framework v752.2) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: Content-Transfer-Encoding: 7bit From: "John D. Burger" Subject: Re: index vs. seq scan choice? Date: Fri, 25 May 2007 08:55:24 -0400 To: PostgreSQL General X-Mailer: Apple Mail (2.752.2) X-OriginalArrivalTime: 25 May 2007 12:52:33.0898 (UTC) FILETIME=[903F84A0:01C79ECB] X-Virus-Scanned: Maia Mailguard 1.0.1 X-Archive-Number: 200705/1245 X-Sequence-Number: 114439 Steve Atkins wrote: > Would it be possible to look at a much larger number of samples > during analyze, > then look at the variation in those to generate a reasonable number of > pg_statistic "samples" to represent our estimate of the actual > distribution? > More datapoints for tables where the planner might benefit from it, > fewer > where it wouldn't. You could definitely try to measure the variance of the statistics (using, say, bootstrap resampling), and change the target 'til you got a "good" tradeoff between small sample size and adequate representation of the distribution. Unfortunately, I think the definition of "good" depends strongly on the kinds of queries that get run. Basically, you want the statistics target to be just big enough that more stats wouldn't change the plans for common queries. Remember, too, that this is not just one number, it'd be different for each column (perhaps zero for most). I could imagine hillclimbing the stats targets by storing common queries and then replaying them, while varying the sample size. There was a discussion last year related to all of this, see: http://archives.postgresql.org/pgsql-general/2006-10/msg00526.php - John D. Burger MITRE