From: Heikki Linnakangas
Reply-To: hlinnaka@iki.fi
Date: Wed, 13 May 2015 15:53:04 +0300
To: Robert Haas
CC: Michael Paquier, Venkata Balaji N, Andres Freund, Fujii Masao, Borodin Vladimir, PostgreSQL-development
Subject: Re: Streaming replication and WAL archive interactions
Message-ID: <55534930.4040905@iki.fi>

On 05/13/2015 03:36 PM, Robert Haas wrote:
> On Mon, May 11, 2015 at 12:00 PM, Heikki Linnakangas wrote:
>> And here is a new version of the patch. I kept the approach of using pgstat,
>> but it now only polls pgstat every 10 seconds, and doesn't block to wait for
>> updated stats.
>
> It's not entirely a new problem, but this error message has gotten pretty crazy:
>
> + (errmsg("WAL archival (archive_mode=on/always/shared) requires wal_level \"archive\", \"hot_standby\", or \"logical\"")));
>
> Maybe: WAL archival cannot be enabled when wal_level is "minimal"
>
> I think the documentation should be explicit about what happens if the
> primary archives a file and dies before the standby gets notified that
> the archiving happened.

Yes, good point.

> The standby, running in shared mode, is then
> promoted. My first guess would be that the standby will end up with
> files that thinks it needs to archive but, being unable to do so
> because they're already there, they'll live forever in pg_xlog. I
> hope that's not the case.

Hmm. That is exactly what happens. The standby will attempt to archive
them, which will fail, so the archiver will get stuck retrying.

That's not actually a new problem, though. Even with a single server doing
archiving, it's possible to crash just after archive_command has archived
a file, but before the server has created the .done file. After restart,
the server will try to archive the file again, which will fail. But yeah,
with this patch, that's much more likely to happen after a promotion.

Our manual says that archive_command should refuse to overwrite an
existing file. But to work around the double-archival problem, where the
same file is archived twice, it would be even better if it simply returned
success when the file already exists *and has identical contents*. I don't
know how to code that logic in a simple one-liner, though.

- Heikki
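
The "return success if the file exists and has identical contents" behaviour described above does not fit in a one-liner, but it can be sketched as a small helper script. What follows is only an illustration, not part of the patch or of PostgreSQL: the script name archive_wal.py, the function archive_wal_file, and the paths in the usage line are made up for the example, and it assumes the archive directory lives on a filesystem that supports hard links.

```python
#!/usr/bin/env python3
"""Sketch of an idempotent archive helper (hypothetical, not shipped with PostgreSQL)."""
import filecmp
import os
import shutil
import sys
import tempfile


def archive_wal_file(src: str, dest: str) -> int:
    """Return 0 if dest ends up as a byte-identical copy of src, else 1."""
    if os.path.exists(dest):
        # Already archived: succeed only if the contents match byte for byte.
        return 0 if filecmp.cmp(src, dest, shallow=False) else 1

    # Copy under a temporary name in the destination directory first, so a
    # crash mid-copy never leaves a partial file under the final name.
    dest_dir = os.path.dirname(dest) or "."
    fd, tmp = tempfile.mkstemp(dir=dest_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
            out.flush()
            os.fsync(out.fileno())
        try:
            # link() fails if dest already exists, so nothing is overwritten.
            os.link(tmp, dest)
        except FileExistsError:
            # Another process archived the file in the meantime; compare instead.
            return 0 if filecmp.cmp(src, dest, shallow=False) else 1
        return 0
    finally:
        # The temporary name is no longer needed in either case.
        os.unlink(tmp)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: archive_wal.py SOURCE DEST")
    sys.exit(archive_wal_file(sys.argv[1], sys.argv[2]))
```

With the script installed somewhere (the locations here are placeholders), it could be wired up as archive_command = 'python3 /usr/local/bin/archive_wal.py %p /mnt/server/archivedir/%f'. A file that exists with different contents still yields a non-zero exit status, so the manual's "refuse to overwrite" rule is preserved. The sketch does not fsync the archive directory, which a production script would probably want to do.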