DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=Lx9NSWKJoektoMkVQ5QBx0gnZPhaUG3r10aB3gVO0AgqVy1h86X7RNZpPfZ8V3B+CP
	d7mMWDGRdJ2FMjjMVWAVD8cnqVM1HqVpqjTuisej6IMDFImV99MCYNmR6BaH2vu1Mnjs
	VkPnXo18zQoZZIL4fvri+P0lJiY+OPFIPAso0=
MIME-Version: 1.0
In-Reply-To: <4DE89072.7070305@usit.uio.no>
References: <4DE89072.7070305@usit.uio.no>
Date: Mon, 13 Jun 2011 13:24:02 -0400
Message-ID: <BANLkTikCxvg9J4ZbGPXt8PrSxiptjkEdeQ@mail.gmail.com>
Subject: Re: Information about WAL Configuration needs an update
From: Robert Haas <robertmhaas@gmail.com>
To: Rafael Martinez <r.m.guerrero@usit.uio.no>
Cc: pgsql-docs@postgresql.org
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Jun 3, 2011 at 3:42 AM, Rafael Martinez
<r.m.guerrero@usit.uio.no> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello
>
> I am sending this email because I think the section
> "28.4. WAL Configuration" [1] of the manual needs to be improved to
> avoid some potential configuration problems.
>
> I have experienced the problem I am going to describe myself and it is
> not the first time other postgres users have asked about this in other
> forums.
>
> This section says among other things (pg-8.3):
>
> " ..... There will always be at least one WAL segment file, and will
> normally not be more than (2 + checkpoint_completion_target) *
> checkpoint_segments + 1 files. Each segment file is normally 16 MB
> (though this size can be altered when building the server). You can use
> this to estimate space requirements for WAL. Ordinarily, when old log
> segment files are no longer needed, they are recycled (renamed to become
> the next segments in the numbered sequence). If, due to a short-term
> peak of log output rate, there are more than 3 * checkpoint_segments + 1
> segment files, the unneeded segment files will be deleted instead of
> recycled until the system gets back under this limit....."
>
> For 9.0 it is almost the same but with some additional information about
> wal_keep_segments.
>
> The part I think should be improved by a note or an extra paragraph is
> this one "... If, due to a short-term peak of log output rate ..."
>
> What is the meaning of a "short-term peak" and how many WAL files over
> the (3 * checkpoint_segments + 1 segment files) limit can we expect
> during a short-term peak?
>
> I sent some days ago an email to pgsql-general about this, REF:
> http://archives.postgresql.org/pgsql-general/2011-05/msg00764.php
>
> But we did not get to any conclusion about how much disk for WAL files
> is really necessary.
>
> I've run some tests to try to get some numbers that can explain what
> happens in my case.
>
> What we have seen is that when creating a GIN index in a tsvector column
> the number of WAL files grow almost proportionally with the size of the
> index we are creating.
>
> The GIN index we are creating on a ~7GB table in one our system is
> around 17GB.
>
> The amount of WAL files in this system will grow to 1353 WAL files while
> this GIN index is being created (checkpoint_segments=128,
> checkpoint_completion_target=0.5 and checkpoint_timeout=5min)
>
> Normally, the amount of WAL files according to the documentation should
> be between 321 to 385 in our case. But it doesn't say anything about how
> many WAL files you can expect during a "short-term peak" and what can
> provoke this.
>
> In our case we got over 1000 "extra" WAL files that it is almost the
> equivalent to the 17GB of our GIN index. The amount of WAL files got
> back to a normal level after this GIN index was generated.
>
> You can see the graph with the generation of WAL files + some extra
> information for this test here: http://folk.uio.no/rafael/total_wal/
>
> What do you think? Shouldn't we update the documentation with some
> information about this?

Perhaps, but we'd have to think of something intelligent to say about
it first.  We can't remove the old WAL files until we successfully
checkpoint, and so I think if checkpoints are taking a very long to
complete or failing altogether, there's actually no upper bound.  I
don't think we have any kind of "hard stop" where, if no log space is
available, we just refuse to process write transactions - such a thing
would seem to be rather dangerous.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company