Received: from localhost (maia-5.hub.org [200.46.204.182]) by postgresql.org (Postfix) with ESMTP id 18C4F9FB400 for ; Wed, 20 Jun 2007 04:06:33 -0300 (ADT) Received: from postgresql.org ([200.46.204.71]) by localhost (mx1.hub.org [200.46.204.182]) (amavisd-maia, port 10024) with ESMTP id 42933-04 for ; Wed, 20 Jun 2007 04:06:27 -0300 (ADT) X-Greylist: from auto-whitelisted by SQLgrey-1.7.5 Received: from sss.pgh.pa.us (sss.pgh.pa.us [66.207.139.130]) by postgresql.org (Postfix) with ESMTP id C549D9FB888 for ; Wed, 20 Jun 2007 04:06:29 -0300 (ADT) Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) by sss.pgh.pa.us (8.14.1/8.14.1) with ESMTP id l5K76R3Y009002; Wed, 20 Jun 2007 03:06:27 -0400 (EDT) To: Greg Smith cc: pgsql-performance@postgresql.org Subject: Re: Volunteer to build a configuration tool In-reply-to: References: <1182298719.4482.133.camel@ipso.snappymail.ca> <7621.1182318230@sss.pgh.pa.us> Comments: In-reply-to Greg Smith message dated "Wed, 20 Jun 2007 02:45:27 -0400" Date: Wed, 20 Jun 2007 03:06:27 -0400 Message-ID: <9001.1182323187@sss.pgh.pa.us> From: Tom Lane X-Virus-Scanned: Maia Mailguard 1.0.1 X-Archive-Number: 200706/405 X-Sequence-Number: 25438 Greg Smith writes: > On Wed, 20 Jun 2007, Tom Lane wrote: >> I think what would be much more useful in the long run is some serious >> study of the parameters themselves. For instance, random_page_cost is a >> self-admitted oversimplification of reality. > If I could figure out who would sponsor such a study that's what I'd be > doing right now. Hmm ... Sun? EDB? Greenplum? [I'm afraid Red Hat is not likely to step up to the plate right now, they have other priorities] > Many of the tuning knobs on the query optimizer > seem very opaque to me so far, At least some of them are demonstrably broken. The issue here is to develop a mental model that is both simple enough to work with, and rich enough to predict real-world behavior. > Here's an example of one of the simplest questions in this area to > demonstate things I wonder about. Let's say I have a properly indexed > database of some moderate size such that you're in big trouble if you do a > sequential scan. How can I tell if effective_cache_size is in the right > ballpark so it will do what I want to effectively navigate that? As the guy who put in effective_cache_size, I'd say it's on the broken side of the fence. Think about how to replace it with a more useful parameter, not how to determine a good value for it. "Useful" means both "easy to determine a value for" and "strong for estimating query costs", which are contradictory to some extent, but that's the problem to be solved --- and effective_cache_size doesn't really win on either metric. To me, the worst catch-22 we face in this area is that we'd like the optimizer's choices of plan to be stable and understandable, but the real-world costs of queries depend enormously on short-term conditions such as how much of the table has been sucked into RAM recently by other queries. I have no good answer to that one. regards, tom lane