From: John D. Burger <[email protected]>
To: PostgreSQL General <[email protected]>
Subject: Re: index vs. seq scan choice?
Date: Fri, 25 May 2007 08:55:24 -0400
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <8C5B026B51B6854CBE88121DBF097A86C3A30D@ehost010-33.exch010.intermedia.net>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>

Steve Atkins wrote:

> Would it be possible to look at a much larger number of samples during
> analyze, then look at the variation in those to generate a reasonable
> number of pg_statistic "samples" to represent our estimate of the actual
> distribution? More datapoints for tables where the planner might benefit
> from it, fewer where it wouldn't.

You could definitely try to measure the variance of the statistics
(using, say, bootstrap resampling), and adjust the target until you
get a "good" tradeoff between small sample size and adequate
representation of the distribution.  Unfortunately, I think the
definition of "good" depends strongly on the kinds of queries that
get run.  Basically, you want the statistics target to be just big
enough that more stats wouldn't change the plans for common queries.
Remember, too, that this is not just one number; it'd be different
for each column (perhaps zero for most).
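
To make the bootstrap idea concrete, here is a rough sketch in Python, not anything ANALYZE actually does.  It works on an in-memory list standing in for a column, and every name in it (bootstrap_stddev, top_value_fraction, smallest_stable_sample, the tolerance) is made up for illustration.  The statistic being checked is just the fraction of rows held by the most common value; a real version would look at whatever the planner consumes.

```python
import random
import statistics
from collections import Counter


def bootstrap_stddev(sample, stat, n_boot=200, rng=None):
    """Estimate the standard deviation of stat(sample) by bootstrap resampling."""
    rng = rng or random.Random(0)
    n = len(sample)
    estimates = []
    for _ in range(n_boot):
        # Draw n values from the sample, with replacement.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        estimates.append(stat(resample))
    return statistics.pstdev(estimates)


def top_value_fraction(values):
    """Illustrative statistic: fraction of rows taken by the most common value."""
    counts = Counter(values)
    return counts.most_common(1)[0][1] / len(values)


def smallest_stable_sample(column, stat, sizes, tolerance, rng=None):
    """Return the first sample size whose bootstrap spread is within tolerance.

    `sizes` should be in increasing order; returns None if none qualifies.
    """
    rng = rng or random.Random(0)
    for size in sizes:
        sample = rng.sample(column, min(size, len(column)))
        if bootstrap_stddev(sample, stat, rng=rng) <= tolerance:
            return size
    return None
```

Something like smallest_stable_sample(column, top_value_fraction, [300, 3000, 30000], 0.01) would then give a per-column sample size, but picking the tolerance is exactly the "good" problem described above.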

I could imagine hillclimbing the stats targets by storing common  
queries and then replaying them, while varying the sample size.   
There was a discussion last year related to all of this, see:

   http://archives.postgresql.org/pgsql-general/2006-10/msg00526.php
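
The hill-climbing idea might be sketched as below.  This is only an illustration under stated assumptions: plan_for is a hypothetical callback that would set the per-column targets (ALTER TABLE ... ALTER COLUMN ... SET STATISTICS), re-run ANALYZE, and return some comparable summary of the EXPLAIN output for a stored query.  The starting target and step size are arbitrary.  It greedily lowers each column's target for as long as every stored query keeps the plan it had at the highest targets.

```python
def hillclimb_targets(columns, queries, plan_for, max_target=1000, step=10):
    """Greedily lower per-column stats targets while query plans stay unchanged.

    plan_for(query, targets) is a hypothetical callback returning a
    comparable description of the plan chosen under the given targets.
    """
    targets = {col: max_target for col in columns}
    # Plans obtained with the most statistics serve as the reference.
    reference = [plan_for(q, targets) for q in queries]

    improved = True
    while improved:
        improved = False
        for col in columns:
            if targets[col] == 0:
                continue  # cannot go lower for this column
            trial = dict(targets)
            trial[col] = max(0, targets[col] - step)
            # Replay the stored queries with the reduced target.
            if [plan_for(q, trial) for q in queries] == reference:
                targets = trial
                improved = True
    return targets
```

Columns that never affect a plan end up at zero, which matches the "perhaps zero for most" expectation.  Each trial implies a full ANALYZE plus a replay of the query set, so in practice you would want a much coarser search than this.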

- John D. Burger
   MITRE





