To: Greg Smith <gsmith@gregsmith.com>
cc: pgsql-performance@postgresql.org
Subject: Re: Volunteer to build a configuration tool 
In-reply-to: <Pine.GSO.4.64.0706200224430.16657@westnet.com> 
References: <A3AC4FA47DC0B1458C3E5396E685E63302395E53@SAB-DC1.sab.uiuc.edu>
	<1182298719.4482.133.camel@ipso.snappymail.ca>
	<Pine.GSO.4.64.0706200040001.7594@westnet.com>
	<7621.1182318230@sss.pgh.pa.us>
	<Pine.GSO.4.64.0706200224430.16657@westnet.com>
Comments: In-reply-to Greg Smith <gsmith@gregsmith.com>
	message dated "Wed, 20 Jun 2007 02:45:27 -0400"
Date: Wed, 20 Jun 2007 03:06:27 -0400
Message-ID: <9001.1182323187@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>

Greg Smith <gsmith@gregsmith.com> writes:
> On Wed, 20 Jun 2007, Tom Lane wrote:
>> I think what would be much more useful in the long run is some serious 
>> study of the parameters themselves.  For instance, random_page_cost is a 
>> self-admitted oversimplification of reality.

> If I could figure out who would sponsor such a study that's what I'd be 
> doing right now.

Hmm ... Sun?  EDB?  Greenplum?  [I'm afraid Red Hat is not likely to
step up to the plate right now, they have other priorities]

> Many of the tuning knobs on the query optimizer 
> seem very opaque to me so far,

At least some of them are demonstrably broken.  The issue here is to
develop a mental model that is both simple enough to work with, and
rich enough to predict real-world behavior.

> Here's an example of one of the simplest questions in this area to 
> demonstate things I wonder about.  Let's say I have a properly indexed 
> database of some moderate size such that you're in big trouble if you do a 
> sequential scan.  How can I tell if effective_cache_size is in the right 
> ballpark so it will do what I want to effectively navigate that?

As the guy who put in effective_cache_size, I'd say it's on the broken
side of the fence.  Think about how to replace it with a more useful
parameter, not how to determine a good value for it.  "Useful" means
both "easy to determine a value for" and "strong for estimating query
costs", which are contradictory to some extent, but that's the problem
to be solved --- and effective_cache_size doesn't really win on either
metric.

To me, the worst catch-22 we face in this area is that we'd like the
optimizer's choices of plan to be stable and understandable, but the
real-world costs of queries depend enormously on short-term conditions
such as how much of the table has been sucked into RAM recently by
other queries.  I have no good answer to that one.

			regards, tom lane