Message-ID: <4B74485A.3050804@enterprisedb.com>
Date: Thu, 11 Feb 2010 20:11:38 +0200
From: Heikki Linnakangas
Organization: EnterpriseDB
To: Aidan Van Dyk
CC: Simon Riggs, Fujii Masao, PostgreSQL-development
Subject: Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL
In-Reply-To: <20100211173154.GD14128@oak.highrise.ca>

Aidan Van Dyk wrote:
> * Heikki Linnakangas [100211 12:04]:
>
>>> But it can be a problem - without the last WAL (or at least enough of
>>> it) the master switched and archived, you have no guarantee of having
>>> being consistent again (I'm thinking specifically of recovering from a
>>> fresh backup)
>> You have to wait for the last WAL file required by the backup to be
>> archived before starting recovery. Otherwise there's no guarantee anyway.
>
> Right, but now define "wait for".

As in don't start postmaster until the last WAL file needed by backup
has been fully copied to the archive.

> (because you've accepted that you *can* have short WAL files in the
> archive)

Only momentarily, while the copy is in progress.

> I've always made my PITR such that "in the archive" (i.e. the first
> moment a recovery can see it) implies that it's bit-for-bit identical to
> the original (or at least as bit-for-bit as I can assume by checking
> various hashes I can afford to). I just assumed that was kind of common
> practice.

It's certainly good practice, agreed, but hasn't been absolutely required.

> I'm amazed that "partial WAL" files are ever available in anyone's
> archive, for anyone's restore command to actually pull. I find that
> scary, and sure, probably won't regularly be noticed... But man, I'd
> hate the time I need that emergency PITR restore to be the one time when
> it needs that WAL, pulls it slightly before the copy has finished (i.e.
> the master is pushing the WAL over a WAN to a 2nd site), and have my
> restore complete consistently...

It's not as dramatic as you make it sound. We're only talking about the
last WAL file, and only when it's just being copied to the archive. If
you have an archive_command like 'cp', and you look at the archive at the
same millisecond that 'cp' runs, then yes, you will see that the latest
WAL file in the archive is only partially copied. It's not a problem for
robustness; if you had looked one millisecond earlier you would not have
seen the file there at all.

The Windows 'copy' command preallocates the whole file, which poses a
different problem: if you look at the file while it's being copied, the
file has the right length, but isn't in fact fully copied yet. I think
'rsync' has the same problem. To avoid that issue, you have to use
something like copy+rename to make it atomic. There isn't much we can do
in the server (or in pg_standby) to work around that, because there's no
way to distinguish a file that's being copied from a fully-copied
corrupt file.

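To make the copy+rename idea concrete, here is a minimal sketch in
Python. It is only an illustration, not anything the server or
pg_standby does: the function name archive_atomically and the ".tmp-"
prefix are made up for this example, and it assumes the temporary file
and the final file live on the same filesystem so that the rename is
atomic.

```python
import os
import shutil
import tempfile


def archive_atomically(src, archive_dir):
    """Copy src into archive_dir so that the final file name only ever
    refers to a completely written file (copy to a temp name, then rename).

    Hypothetical helper for illustration; not part of PostgreSQL.
    """
    final_path = os.path.join(archive_dir, os.path.basename(src))
    # Create the temp file inside the archive directory so the rename
    # below does not cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=archive_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as dst, open(src, "rb") as inp:
            shutil.copyfileobj(inp, dst)
            dst.flush()
            os.fsync(dst.fileno())  # push the data to disk before renaming
        # A reader sees either no file at final_path or the complete one.
        # Note that os.replace silently overwrites an existing file.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

A wrapper script around this could be called from archive_command with
%p as the source path. A restore command that asks for WAL files by
their exact name never picks up the temporary file, because it is
written under a different name.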
We do advise setting up an archive_command that doesn't overwrite
existing files. That together with a partial WAL segment can cause a
problem: if archive_command crashes while it's writing the file, leaving
a partial file in the archive, the subsequent run of archive_command
won't overwrite it and will get stuck trying. However, there's a small
window for that even if you put the file into the archive atomically: if
you crash just after fully copying the file, but before the .done file
is created, upon restart the server will again try to copy the file to
the archive, find that it already exists, and fail.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
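
P.S. One way an archive script could cope with the restart window
described above, sketched in Python under stated assumptions: the
function name archive_no_overwrite and the ".tmp-" prefix are
hypothetical, and the archive filesystem is assumed to support hard
links. The idea is to never overwrite, but to treat an existing file
that is byte-for-byte identical to the source as already archived.

```python
import filecmp
import os
import shutil
import tempfile


def archive_no_overwrite(src, archive_dir):
    """Return True if the archive holds a complete copy of src afterwards,
    False if a different file with the same name is in the way.

    Hypothetical helper for illustration; not part of PostgreSQL.
    """
    final_path = os.path.join(archive_dir, os.path.basename(src))
    if os.path.exists(final_path):
        # Archived before the crash? Identical content counts as success;
        # a partial or different file is reported as failure, untouched.
        return filecmp.cmp(src, final_path, shallow=False)
    fd, tmp_path = tempfile.mkstemp(dir=archive_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as dst, open(src, "rb") as inp:
            shutil.copyfileobj(inp, dst)
            dst.flush()
            os.fsync(dst.fileno())
        # Unlike a rename, os.link raises FileExistsError instead of
        # overwriting if final_path appeared in the meantime.
        os.link(tmp_path, final_path)
    finally:
        os.unlink(tmp_path)
    return True
```

A wrapper would exit with status zero only when this returns True, so
that a conflicting file still makes the archive attempt fail and get
retried rather than being silently replaced.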