Message-ID: <4BAA060A.2020000@enterprisedb.com>
Date: Wed, 24 Mar 2010 14:31:06 +0200
From: Heikki Linnakangas
Organization: EnterpriseDB
To: Fujii Masao
CC: Simon Riggs, Aidan Van Dyk, PostgreSQL-development
Subject: Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL
In-Reply-To: <3f0b79eb1003230017v16f4ecbeyc20e75beeffe8f1c@mail.gmail.com>

Fujii Masao wrote:
> But in the current (v8.4 or before) behavior, recovery ends normally
> when an invalid record is found in an archived WAL file. Otherwise,
> the server would never be able to start normal processing when there
> is a corrupted archived file for some reasons. So, that invalid record
> should not be treated as a PANIC if the server is not in standby mode
> or the trigger file has been created. Thought?

Hmm, true, this changes behavior over previous releases. I tend to think
that it's always an error if there's a corrupt file in the archive,
though, and PANIC is appropriate. If the administrator wants to start up
the database anyway, he can remove the corrupt file from the archive and
place it directly in pg_xlog instead.

> When I tested the patch, the following PANIC error was thrown in the
> normal archive recovery. This seems to derive from the above change.
> The detail error sequence:
> 1. In ReadRecord(), emode was set to PANIC after 00000001000000000000000B
>    was read.
> 2. 00000001000000000000000C including the contrecord tried to be read
>    by using the emode (= PANIC). But since 00000001000000000000000C did
>    not exist, PANIC error was thrown.
>
> -----------------
> LOG: restored log file "00000001000000000000000B" from archive
> cp: cannot stat `../data.arh/00000001000000000000000C': No such file
> or directory
> PANIC: could not open file "pg_xlog/00000001000000000000000C" (log
> file 0, segment 12): No such file or directory
> LOG: startup process (PID 17204) was terminated by signal 6: Aborted
> LOG: terminating any other active server processes
> -----------------

Thanks. That's easily fixable (applies over the previous patch):

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3773,7 +3773,7 @@ retry:
 				pagelsn.xrecoff = 0;
 			}
 			/* Wait for the next page to become available */
-			if (!XLogPageRead(&pagelsn, emode, false, false))
+			if (!XLogPageRead(&pagelsn, emode_arg, false, false))
 				return NULL;
 
 			/* Check that the continuation record looks valid */

Perhaps the emode/emode_arg convention is a bit hard to read. I'll go
through the patch myself once more, and commit later today or tomorrow
if no new issues crop up.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
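
For readers following the emode/emode_arg distinction, here is a toy model of the failure sequence reported above and of the one-line fix. It is only a sketch and is not the code in xlog.c. Every name in it (SketchLevel, sketch_page_read, sketch_read_record, use_caller_level) is hypothetical, segment availability is reduced to a simple count, and the point at which the local level is escalated is assumed from the description in this thread rather than taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy severity levels. The names echo PostgreSQL's LOG and PANIC,
 * but this enum is hypothetical and unrelated to elog.h. */
typedef enum { SK_NONE = 0, SK_LOG = 1, SK_PANIC = 2 } SketchLevel;

/* Severity of the most recent reported failure. The sketch records
 * it instead of aborting so that the outcome can be inspected. */
static SketchLevel sketch_last_report = SK_NONE;

/* Hypothetical stand-in for XLogPageRead(): segments 0..available-1
 * exist. A missing segment is reported at severity `emode`. */
static bool
sketch_page_read(int segno, int available, SketchLevel emode)
{
    assert(available >= 0);
    if (segno < available)
        return true;
    sketch_last_report = emode;
    return false;
}

/* Hypothetical stand-in for ReadRecord(), reading a record that
 * starts in `segno` and continues into `segno + 1`.
 * emode_arg is the level the caller asked for; emode is the local
 * level, escalated once the first segment has been read (assumed
 * from the thread). use_caller_level selects which of the two the
 * continuation read uses: false models the behaviour before the
 * fix, true models the behaviour after it. */
static bool
sketch_read_record(int segno, int available,
                   SketchLevel emode_arg, bool use_caller_level)
{
    SketchLevel emode = emode_arg;

    sketch_last_report = SK_NONE;
    if (!sketch_page_read(segno, available, emode))
        return false;

    /* The first segment was restored, so later problems inside it
     * are treated as fatal in this model. */
    emode = SK_PANIC;

    /* The continuation lives in the next segment, which may simply
     * not have been archived yet. */
    return sketch_page_read(segno + 1, available,
                            use_caller_level ? emode_arg : emode);
}
```

With segments up to 0x0B (11) available and 0x0C (12) missing, calling sketch_read_record(11, 12, SK_LOG, false) records a PANIC-level report, which matches the log quoted above. The same call with use_caller_level set to true records only a LOG-level report, so in this model a missing next segment ends the read without a PANIC.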