Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.80) (envelope-from ) id 1YkSlV-0006iV-1N for pgsql-hackers@arkaria.postgresql.org; Tue, 21 Apr 2015 07:38:41 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.80) (envelope-from ) id 1YkSlU-00058V-9W for pgsql-hackers@arkaria.postgresql.org; Tue, 21 Apr 2015 07:38:40 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1YkSlS-00058L-Re for pgsql-hackers@postgresql.org; Tue, 21 Apr 2015 07:38:38 +0000 Received: from mail-wi0-x229.google.com ([2a00:1450:400c:c05::229]) by magus.postgresql.org with esmtps (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.80) (envelope-from ) id 1YkSlK-0002oi-Kq for pgsql-hackers@postgresql.org; Tue, 21 Apr 2015 07:38:37 +0000 Received: by widdi4 with SMTP id di4so11099098wid.0 for ; Tue, 21 Apr 2015 00:38:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:reply-to:user-agent:mime-version:to:cc :subject:references:in-reply-to:content-type :content-transfer-encoding; bh=+EXL4NRlcEstPWpVg60RZvpXInIBGFhlLEvNWW5WLFA=; b=dKq2AV7u9m7c9UgEDJpNBB60Zgleh1AadXjArGGDSt4LjyBBfdGulWEx+oZVjJVdeK 2u5aLpVflH3L8G7WhGUE5oLZdtTrjE3CHYz9kdVDWkiuIJvR6wKSOn4md7knCQZUUcwH oJYnQL08iUN3i4elzVX+Q/JBDypc0XB8F6hKpUpeCiBgFbCptuEJFazGefhIC2QyoyCk x3zZZu+AScuAJPxn9SK7RCR3LInxGJnKRjSJp2ah3NmiAIx7JKzSoFGqCxLTPfzrJoxm msexAOJdylmN3qgLuTbmJSr/5kyavpDiJCbb1ZeZ835ACY1KhzZXCF6K5M+XXdiO1y4T jwyg== X-Received: by 10.181.11.129 with SMTP id ei1mr32388169wid.57.1429601908195; Tue, 21 Apr 2015 00:38:28 -0700 (PDT) Received: from [192.168.1.99] (dsl-hkibrasgw1-58c38f-82.dhcp.inet.fi. [88.195.143.82]) by mx.google.com with ESMTPSA id r9sm1360839wjo.26.2015.04.21.00.38.26 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 21 Apr 2015 00:38:27 -0700 (PDT) Message-ID: <5535FE71.1010905@iki.fi> Date: Tue, 21 Apr 2015 10:38:25 +0300 From: Heikki Linnakangas Reply-To: hlinnaka@iki.fi User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.6.0 MIME-Version: 1.0 To: Michael Paquier CC: Venkata Balaji N , Andres Freund , Fujii Masao , Borodin Vladimir , PostgreSQL-development Subject: Re: Streaming replication and WAL archive interactions References: <548AF1CB.80702@vmware.com> <689EB259-44C2-4820-B901-4F6B1C55A1E4@simply.name> <549083D6.1000301@vmware.com> <54949108.3030109@vmware.com> <552FA38F.9060005@iki.fi> In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Pg-Spam-Score: -2.4 (--) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-hackers Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org On 04/21/2015 09:53 AM, Michael Paquier wrote: > On Thu, Apr 16, 2015 at 8:57 PM, Heikki Linnakangas wrote: >> Oh, hang on, that's not necessarily true. On promotion, the standby > archives >> the last, partial WAL segment from the old timeline. That's just wrong >> (http://www.postgresql.org/message-id/52FCD37C.3070806@vmware.com), and in >> fact I somehow thought I changed that already, but apparently not. So > let's >> stop doing that. > > Er. Are you planning to prevent the standby from archiving the last partial > segment from the old timeline at promotion? Yes. > I thought from previous discussions that we should do it as master > (be it crashed, burned, burried or dead) may not have the occasion to > do it. By preventing its archiving you close the door to the case > where master did not have the occasion to archive it. The current situation is a mess: 1. Even though we archive the last segment in the standby, there is no guarantee that the master had archived all the previous segments already. 2. If the master is not totally dead, it might try to archive the same file with more WAL in it, at the same time or just afterwards, or even just before the standby has completed promotion. Which copy do you keep in the archive? Having to deal with that makes the archive_command more complicated. Note that even though we don't archive the partial last segment on the previous timeline, the same WAL is copied to the first segment on the new timeline. So the WAL isn't lost. > People may be surprised that a base backup taken from a node that has > archive_mode = on set (that's the case in a very large number of cases) > will not be able to work as-is as node startup will fail as follows: > FATAL: archive_mode='on' cannot be used in archive recovery > HINT: Use 'shared' or 'always' mode instead. Hmm, good point. > One idea would be to simply ignore the fact that archive_mode = on on nodes > in recovery instead of dropping an error. Note that I like the fact that it > drops an error as that's clear, I just point the fact that people may be > surprised that base backups are not working anymore now in this case. By "ignore", what behaviour do you mean? Would "on" be equivalent to "shared", "always", or something else? Or we could keep the current behaviour with archive_mode=on (except for the last segment thing, which is just wrong), where the standby only archives the new timeline, and nothing from the previous timelines. Are the use cases where you'd want that, rather than the new "shared" mode? I wanted to keep the 'on' mode for backwards-compatibility, but if that causes more problems, it might be better to just remove it and force the admin to choose what kind of a setup he has, with "shared" or "always". > Creating a dependency between the pgstat machinery and the WAL sender looks > weak to me. For example with this patch a master cannot stop, as it waits > indefinitely: > LOG: using stale statistics instead of current ones because stats > collector is not responding > LOG: sending archival report: Hmm, yeah, having walsender to wait for the stats file to appear is not good. > You could scan archive_status/ but that would be costly if there are many > entries to scan and I think that walsender should be highly responsive. Or > you could directly store the name of the lastly archived WAL segment marked > as .done in let's say archive_status/last_archived. An entry for that in > the control file does not seem the right place as a node may not have > archive_mode enabled that's why I am not mentioning it. The ways that the archiver process can communicate with the rest of the system are limited, for the sake of robustness. Writing to the control file is definitely not OK. I think using the stats collector is OK for this, but we'll have to arrange it so that the walsender doesn't block on it, and should probably not force new stat file so often. A 5-10 seconds old stats file would be perfectly fine for this purpose. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers