Message-ID: <4BA37FD5.9000404@enterprisedb.com>
Date: Fri, 19 Mar 2010 15:44:53 +0200
From: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
Organization: EnterpriseDB
User-Agent: Mozilla-Thunderbird 2.0.0.22 (X11/20090706)
MIME-Version: 1.0
To: Alvaro Herrera <alvherre@commandprompt.com>
CC: Simon Riggs <simon@2ndQuadrant.com>, Fujii Masao <masao.fujii@gmail.com>,
	Aidan Van Dyk <aidan@highrise.ca>,
	PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: Re: [COMMITTERS] pgsql: Make standby server continuously
	retry restoring the next WAL
References: <4B740613.5090004@enterprisedb.com>
	<20100211140118.GB14128@oak.highrise.ca>
	<4B74118C.30704@enterprisedb.com>
	<20100211144204.GC14128@oak.highrise.ca>
	<4B743E7D.5070603@enterprisedb.com>
	<3f0b79eb1002180337t1fab1395ve3491256672af15f@mail.gmail.com>
	<4BA0B079.3050301@enterprisedb.com>
	<3f0b79eb1003180727g7877743eq81274e014fe70a49@mail.gmail.com>
	<1268988724.3556.3.camel@ebony> <4BA361E4.7020309@enterprisedb.com>
	<20100319132848.GA3301@alvh.no-ip.org>
In-Reply-To: <20100319132848.GA3301@alvh.no-ip.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit

Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
> 
>> When recovery reaches an invalid WAL record, typically caused by a
>> half-written WAL file, it closes the file and moves to the next source.
>> If an error is found in a file restored from archive or in a portion
>> just streamed from master, however, a PANIC is thrown, because it's not
>> expected to have errors in the archive or in the master.
> 
> Hmm, I think I've heard that tools like walmgr do incremental copies of
> the current WAL segment to the archive.  Doesn't this change break that?

Hmm, you could have a restore_command that checks the file size before
restoring it, so that it still works. I note that pg_standby does that,
but of course you can't use pg_standby with the built-in standby mode.
Alternatively, we could modify the built-in standby mode to handle
partial files coming from restore_command: instead of throwing an
error, recover to the end of the partial file, then retry
restore_command with the same filename until the whole file has been
recovered (or the missing WAL arrives through other means, i.e.
streaming replication).
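For illustration, here is a minimal sketch of such a size-checking restore_command, written in Python. The script name, its ARCHIVE_DIR argument and the archive layout are hypothetical. It assumes the default 16 MB WAL segment size and an archiving tool that grows a segment file in place while copying it, so a segment shorter than a full segment is treated as still in progress. Non-WAL files such as timeline history files have no fixed size and are passed through unchanged.

```python
#!/usr/bin/env python3
"""Hypothetical restore_command helper that refuses partially archived WAL segments."""
import os
import re
import shutil
import sys

# Assumption: the server uses the default 16 MB WAL segment size.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024

# WAL segment file names are 24 upper-case hex digits (timeline, log, seg).
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def segment_is_complete(path, seg_size=WAL_SEGMENT_SIZE):
    """Return True if path exists and, for a WAL segment, is full-sized."""
    name = os.path.basename(path)
    if not WAL_SEGMENT_RE.match(name):
        # History and backup-history files have no fixed size.
        return os.path.exists(path)
    try:
        # Assumption: a shorter file is still being copied by the archiver.
        return os.path.getsize(path) == seg_size
    except OSError:
        return False  # not archived yet


def restore(archive_dir, fname, dest, seg_size=WAL_SEGMENT_SIZE):
    """Copy archive_dir/fname to dest; return a restore_command exit code."""
    src = os.path.join(archive_dir, fname)
    if not segment_is_complete(src, seg_size):
        return 1  # non-zero exit: file not (fully) available yet
    shutil.copyfile(src, dest)
    return 0


# Only act when invoked with ARCHIVE_DIR, %f and %p.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(restore(sys.argv[1], sys.argv[2], sys.argv[3]))
```

It would be wired in with something like restore_command = 'python3 /path/to/restore_full.py /mnt/archive %f %p' (paths hypothetical). A non-zero exit tells the server the file isn't available; in standby mode the server then keeps retrying, and can meanwhile receive the WAL through streaming replication.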

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com