Date: Tue, 21 Apr 2015 18:04:59 +0900
From: Michael Paquier
To: hlinnaka@iki.fi
Cc: Venkata Balaji N, Andres Freund, Fujii Masao, Borodin Vladimir,
    PostgreSQL-development
Subject: Re: Streaming replication and WAL archive interactions
In-Reply-To: <5535FE71.1010905@iki.fi>
References: <548AF1CB.80702@vmware.com>
    <689EB259-44C2-4820-B901-4F6B1C55A1E4@simply.name>
    <549083D6.1000301@vmware.com> <54949108.3030109@vmware.com>
    <552FA38F.9060005@iki.fi> <5535FE71.1010905@iki.fi>
X-Mailing-List: pgsql-hackers

On Tue, Apr 21, 2015 at 4:38 PM, Heikki Linnakangas wrote:
> On 04/21/2015 09:53 AM, Michael Paquier wrote:
>> On Thu, Apr 16, 2015 at 8:57 PM, Heikki Linnakangas wrote:
>>> Oh, hang on, that's not necessarily true. On promotion, the standby
>>> archives the last, partial WAL segment from the old timeline. That's
>>> just wrong
>>> (http://www.postgresql.org/message-id/52FCD37C.3070806@vmware.com),
>>> and in fact I somehow thought I changed that already, but apparently
>>> not. So let's stop doing that.
>>
>> Er. Are you planning to prevent the standby from archiving the last
>> partial segment from the old timeline at promotion?
>
> Yes.
>
>> I thought from previous discussions that we should do it as the master
>> (be it crashed, burned, buried or dead) may not have the occasion to
>> do it. By preventing its archiving you close the door to the case
>> where the master did not have the occasion to archive it.
>
> The current situation is a mess:
>
> 1. Even though we archive the last segment in the standby, there is no
>    guarantee that the master had archived all the previous segments
>    already.
> 2. If the master is not totally dead, it might try to archive the same
>    file with more WAL in it, at the same time or just afterwards, or
>    even just before the standby has completed promotion. Which copy do
>    you keep in the archive? Having to deal with that makes the
>    archive_command more complicated.
>
> Note that even though we don't archive the partial last segment on the
> previous timeline, the same WAL is copied to the first segment on the
> new timeline. So the WAL isn't lost.

But if the failed master has archived those segments safely, we may need
them, no? I am not sure we can ignore a user who would want to do a PITR
with recovery_target_timeline pointing to that of the failed master.

>> People may be surprised that a base backup taken from a node that has
>> archive_mode = on set (that's the case in a very large number of
>> cases) will not be able to work as-is, as node startup will fail as
>> follows:
>> FATAL:  archive_mode='on' cannot be used in archive recovery
>> HINT:  Use 'shared' or 'always' mode instead.
>
> Hmm, good point.
>
>> One idea would be to simply ignore the fact that archive_mode = on on
>> nodes in recovery instead of raising an error. Note that I like the
>> fact that it raises an error as that's clear, I just point out the
>> fact that people may be surprised that base backups are not working
>> anymore now in this case.
>
> By "ignore", what behaviour do you mean? Would "on" be equivalent to
> "shared", "always", or something else?

I meant something backward-compatible, with files marked as .done when
they are finished replaying... But now my words *are* weird as
on != off ;)

> Or we could keep the current behaviour with archive_mode=on (except for
> the last segment thing, which is just wrong), where the standby only
> archives the new timeline, and nothing from the previous timelines.

I guess this would solve the issue here then, which is not a bad thing
in itself:
http://www.postgresql.org/message-id/20140918180734.361021e1@erg
We would need to check if the situation improves with the 'always' mode
btw.

> Are there use cases where you'd want that, rather than the new "shared"
> mode? I wanted to keep the 'on' mode for backwards-compatibility, but
> if that causes more problems, it might be better to just remove it and
> force the admin to choose what kind of a setup he has, with "shared" or
> "always".

The 'on' mode is still useful IMO to get a behavior as close as possible
to what previous releases did.

Regards,
--
Michael
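
A minimal sketch, not taken from the thread, of the complication raised in point 2 of the quoted text above: when two nodes may both try to archive a segment under the same file name, the archive command has to decide what to do with the second copy. The version below assumes one possible policy (treat an identical copy as success, refuse a differing one and leave the archive untouched). The function name archive_segment, the .tmp suffix and the flat archive directory are assumptions made for illustration only.

```python
import filecmp
import os
import shutil
import sys


def archive_segment(src_path: str, archive_dir: str) -> int:
    """Copy one WAL segment into archive_dir; return a shell-style exit code."""
    dst_path = os.path.join(archive_dir, os.path.basename(src_path))

    if os.path.exists(dst_path):
        # Same name, same bytes: the segment is already archived, report success.
        if filecmp.cmp(src_path, dst_path, shallow=False):
            return 0
        # Same name, different bytes: do not pick a winner, keep the archive as-is.
        return 1

    # Copy under a temporary name, then rename, so that a reader of the
    # archive never sees a half-written segment under its final name.
    tmp_path = dst_path + ".tmp"
    shutil.copyfile(src_path, tmp_path)
    os.replace(tmp_path, dst_path)
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(archive_segment(sys.argv[1], sys.argv[2]))
```

Such a script could be wired in with something like archive_command = 'python3 /path/to/archive_segment.py %p /mnt/archive', where both paths are hypothetical. Note that it does not close the race between the existence check and the rename when two nodes archive at exactly the same time, which is part of why the quoted text calls the situation a mess.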