Message-ID: <4B757D5D.3070506@enterprisedb.com>
Date: Fri, 12 Feb 2010 18:10:05 +0200
From: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
Organization: EnterpriseDB
User-Agent: Mozilla-Thunderbird 2.0.0.22 (X11/20090706)
MIME-Version: 1.0
To: Fujii Masao <masao.fujii@gmail.com>
CC: Simon Riggs <simon@2ndquadrant.com>,
	Dimitri Fontaine <dfontaine@hi-media.com>,
	PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: Re: [COMMITTERS] pgsql: Make standby server 	continuously
	retry restoring the next WAL
References: <20100127152751.3B2047541B9@cvs.postgresql.org>	
	<1265893599.7341.1454.camel@ebony>	
	<877hqjc2kk.fsf@hi-media-techno.com>	
	<1265896250.7341.1627.camel@ebony>
	<4B740C6C.3010607@enterprisedb.com>	
	<1265897834.7341.1714.camel@ebony>
	<4B7412BE.5030605@enterprisedb.com>	
	<3f0b79eb1002112138n61a3258fg9986e50751d44ea0@mail.gmail.com>	
	<1265979080.7341.3679.camel@ebony>
	<4B75533D.2000703@enterprisedb.com>
	<3f0b79eb1002120747q3203bed6ue1bd07558ec2e38b@mail.gmail.com>
In-Reply-To: <3f0b79eb1002120747q3203bed6ue1bd07558ec2e38b@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Fujii Masao wrote:
> On Fri, Feb 12, 2010 at 10:10 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>>> So I suggest that you have a new action that gets called after every
>>> checkpoint to clear down the archive. It will remove all files from the
>>> archive prior to %r. We can implement that as a sequence of unlink()s
>>> from within the server, or we can just call a script to do it. I prefer
>>> the latter approach. However we do it, we need something initiated by
>>> the server to maintain the archive and stop it from overflowing.
>> +1
> 
> If we leave executing the remove_command to the bgwriter, the restartpoint
> might not happen unfortunately for a long time. 

Are you thinking of a scenario where remove_command gets stuck, and
prevents bgwriter from performing restartpoints while it's stuck? You
have trouble if restore_command gets stuck like that as well, so I think
we can require that the remove_command returns in a reasonable period of
time, i.e. within a few minutes.
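
To illustrate what such a command might look like, here is a minimal
sketch in Python. Everything in it is an assumption for the sake of
discussion: remove_command is only a proposal at this point, and the
function names and the argument order (archive directory, then the %r
value) are made up. It assumes %r expands to a plain 24-hex-digit WAL
segment name and, like the comparison discussed for archive cleanup,
ignores the timeline part of the name.

```python
import os
import re

# A WAL segment file name is 24 hex digits: 8 for the timeline,
# 16 for the position in the WAL stream.
WAL_SEGMENT = re.compile(r"^[0-9A-F]{24}$")


def removable_segments(names, oldest_kept):
    """Return the segment names that sort before oldest_kept.

    The timeline prefix (first 8 characters) is ignored, so segments
    from older timelines are removed as well. Anything that is not a
    plain segment name (history files, backup labels) is left alone.
    """
    if not WAL_SEGMENT.match(oldest_kept):
        raise ValueError("not a WAL segment name: %r" % oldest_kept)
    cutoff = oldest_kept[8:]
    return sorted(
        n for n in names if WAL_SEGMENT.match(n) and n[8:] < cutoff
    )


def clean_archive(archive_dir, oldest_kept):
    """Unlink every removable segment in archive_dir; return their names."""
    removed = []
    for name in removable_segments(os.listdir(archive_dir), oldest_kept):
        os.unlink(os.path.join(archive_dir, name))
        removed.append(name)
    return removed


def main(argv):
    """Hypothetical entry point: argv = [prog, archive_dir, value_of_%r]."""
    if len(argv) != 3:
        return 2
    clean_archive(argv[1], argv[2])
    return 0
```

Since this only does a directory listing and local unlink() calls, it
should return well within the time limit suggested above. A command that
talks to a remote archive would need its own timeout.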

> To prevent that situation, the
> archiver should execute the command, I think. Thought?

The archiver isn't running in standby mode, so that's not going to work.
And it's not connected to shared memory either, so it doesn't know what
the latest restartpoint is.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com