Message-ID: <4BA37FD5.9000404@enterprisedb.com>
Date: Fri, 19 Mar 2010 15:44:53 +0200
From: Heikki Linnakangas
Organization: EnterpriseDB
To: Alvaro Herrera
CC: Simon Riggs, Fujii Masao, Aidan Van Dyk, PostgreSQL-development
Subject: Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL
In-Reply-To: <20100319132848.GA3301@alvh.no-ip.org>

Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
>
>> When recovery reaches an invalid WAL record, typically caused by a
>> half-written WAL file, it closes the file and moves to the next source.
>> If an error is found in a file restored from archive or in a portion
>> just streamed from master, however, a PANIC is thrown, because it's not
>> expected to have errors in the archive or in the master.
>
> Hmm, I think I've heard that tools like walmgr do incremental copies of
> the current WAL segment to the archive. Doesn't this change break that?

Hmm, you could have a restore_command that checks the size before
restoring to make it still work. I note that pg_standby does that, but
of course you can't use pg_standby with the built-in standby mode.

Or maybe we should modify the built-in standby mode to handle partial
files coming from restore_command by not throwing an error but
recovering to the end of the partial file, and then retrying
restore_command with the same filename until the whole file is
recovered (or the missing WAL is received through other means, i.e.
streaming replication).

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
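
A minimal sketch of the "check the size before restoring" idea, not taken from pg_standby or any existing tool. It assumes the default 16 MB WAL segment size, an archive that is a plain directory, and that an incrementally copied segment stays shorter than a full segment until it is complete. The names `restore_wal`, `main`, and `WAL_SEGMENT_SIZE` are hypothetical.

```python
import os
import re
import shutil

# Assumption: the server uses the default WAL segment size of 16 MB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024

# WAL segment file names are 24 hexadecimal digits. Timeline history and
# backup label files carry a suffix and are small, so they are not size-checked.
SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def restore_wal(archive_dir, filename, dest_path, segment_size=WAL_SEGMENT_SIZE):
    """Copy filename from archive_dir to dest_path.

    Returns 0 on success and 1 if the file is missing or is a WAL segment
    that has not yet reached its full size (a partial copy in progress).
    """
    src = os.path.join(archive_dir, filename)
    if not os.path.isfile(src):
        return 1
    if SEGMENT_NAME.match(filename) and os.path.getsize(src) != segment_size:
        # Partial segment: report "not available" so the caller can retry later.
        return 1
    shutil.copyfile(src, dest_path)
    return 0


def main(argv):
    """Entry point for use as a script: argv = [prog, archive_dir, %f, %p]."""
    if len(argv) != 4:
        return 2
    return restore_wal(argv[1], argv[2], argv[3])
```

To use this as a restore_command, the script would end with `sys.exit(main(sys.argv))` (after `import sys`) and be invoked with the archive directory, `%f`, and `%p` as arguments. A nonzero exit status tells the server the file is not available yet, so a partial segment is never handed to recovery.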