Subject: Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL
From: Simon Riggs
To: Heikki Linnakangas
Cc: Fujii Masao, PostgreSQL-development
Date: Thu, 11 Feb 2010 10:37:37 +0000
Message-Id: <1265884657.7341.1192.camel@ebony>
In-Reply-To: <4B726120.80007@enterprisedb.com>
References: <20100127152751.3B2047541B9@cvs.postgresql.org> <3f0b79eb1002092105r21e009d3v468496058ba04392@mail.gmail.com> <4B726120.80007@enterprisedb.com>

On Wed, 2010-02-10 at 09:32 +0200, Heikki Linnakangas wrote:
> Fujii Masao wrote:
> > As I pointed out previously, the standby might restore a partially-filled
> > WAL file that is being archived by the primary, and cause a FATAL error.
> > And this happened in my box when I was testing the SR.
> >
> >   sby [20088] FATAL:  archive file "000000010000000000000087" has wrong size: 14139392 instead of 16777216
> >   sby [20076] LOG:  startup process (PID 20088) exited with exit code 1
> >   sby [20076] LOG:  terminating any other active server processes
> >   act [18164] LOG:  received immediate shutdown request
> >
> > If the startup process is in standby mode, I think that it should retry
> > starting replication instead of emitting an error when it finds a
> > partially-filled file in the archive. Then if the replication has been
> > terminated, it has only to restore the archived file again. Thought?
>
> Hmm, so after running restore_command, check the file size and if it's
> too short, treat it the same as if restore_command returned non-zero?
> And it will be retried on the next iteration. Works for me, though OTOH
> it will then fail to complain about a genuinely WAL file that's
> truncated for some reason. I guess there's no way around that, even if
> you have a script as restore_command that does the file size check, it
> will have the same problem.

Are we trying to re-invent pg_standby here?

-- 
 Simon Riggs           www.2ndQuadrant.com
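
For illustration only, the check Heikki describes above (run restore_command, then treat a wrong-sized file the same as a non-zero exit so that it is retried on the next iteration) could be sketched roughly as below. This is Python rather than the server's C, and it is not the actual patch. The names restore_segment, restore_with_retry, fetch and WAL_SEGMENT_SIZE are hypothetical, not PostgreSQL identifiers; the value 16777216 is the expected size from the FATAL message quoted above.

```python
import os
import time
from typing import Callable

# Expected segment size, taken from the FATAL message quoted above.
WAL_SEGMENT_SIZE = 16777216


def restore_segment(fetch: Callable[[str], int], dest_path: str,
                    expected_size: int = WAL_SEGMENT_SIZE) -> bool:
    """Make one restore attempt; return True only for a full-size file.

    `fetch` is a hypothetical stand-in for restore_command: it copies the
    archived segment to dest_path and returns an exit status (0 = success).
    """
    if fetch(dest_path) != 0:
        return False
    try:
        size = os.path.getsize(dest_path)
    except OSError:
        # The command claimed success but left no readable file behind.
        return False
    if size != expected_size:
        # Wrong size (too short, in the case discussed): discard the file
        # and report failure, as if the command had returned non-zero.
        os.remove(dest_path)
        return False
    return True


def restore_with_retry(fetch: Callable[[str], int], dest_path: str,
                       expected_size: int = WAL_SEGMENT_SIZE,
                       max_attempts: int = 5, delay: float = 0.0) -> bool:
    """Retry restore_segment up to max_attempts times, pausing in between."""
    for _ in range(max_attempts):
        if restore_segment(fetch, dest_path, expected_size):
            return True
        time.sleep(delay)
    return False
```

As noted in the quoted text, a check like this cannot tell a segment that is still being archived from one that is genuinely truncated; both simply look like a failed restore and are retried.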