MIME-Version: 1.0
In-Reply-To: <5535FE71.1010905@iki.fi>
References: <548AF1CB.80702@vmware.com>
 <689EB259-44C2-4820-B901-4F6B1C55A1E4@simply.name>
 <549083D6.1000301@vmware.com>
 <CAHGQGwGJzp-QS7BODiv1uc291gAKtjzzCPb_nzUTxYKJhLsUCA@mail.gmail.com>
 <54949108.3030109@vmware.com>
 <CAEyp7J9Hy8Q__FbGeR5skjk7d0dvLC+KLXB3JUuWrXXdJ5O+Wg@mail.gmail.com>
 <552FA38F.9060005@iki.fi>
 <CAB7nPqQE179yogtg+nKvdwt9KROxTyt-EjumKOMbuXQtea5r3w@mail.gmail.com>
 <5535FE71.1010905@iki.fi>
Date: Tue, 21 Apr 2015 18:04:59 +0900
Message-ID: 
 <CAB7nPqS=V=LF-JZsv9eGF84joQ3jxugt4gkOK6kD1CiB98_vWg@mail.gmail.com>
Subject: Re: Streaming replication and WAL archive interactions
From: Michael Paquier <michael.paquier@gmail.com>
To: hlinnaka@iki.fi
Cc: Venkata Balaji N <nag1010@gmail.com>, Andres Freund <andres@anarazel.de>,
 Fujii Masao <masao.fujii@gmail.com>, Borodin Vladimir <root@simply.name>,
 PostgreSQL-development <pgsql-hackers@postgresql.org>
Content-Type: multipart/alternative; boundary=001a113aa7d24f0f73051438579c
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org

--001a113aa7d24f0f73051438579c
Content-Type: text/plain; charset=ISO-8859-1

On Tue, Apr 21, 2015 at 4:38 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> On 04/21/2015 09:53 AM, Michael Paquier wrote:
>
>> On Thu, Apr 16, 2015 at 8:57 PM, Heikki Linnakangas wrote:
>>
>>> Oh, hang on, that's not necessarily true. On promotion, the standby
>>> archives the last, partial WAL segment from the old timeline. That's
>>> just wrong
>>> (http://www.postgresql.org/message-id/52FCD37C.3070806@vmware.com), and
>>> in fact I somehow thought I changed that already, but apparently not.
>>> So let's stop doing that.
>>>
>>
>> Er. Are you planning to prevent the standby from archiving the last
>> partial segment from the old timeline at promotion?
>>
>
> Yes.
>
>> I thought from previous discussions that we should do it, as the master
>> (be it crashed, burned, buried or dead) may not have the occasion to
>> do it. By preventing its archiving you close the door to the case
>> where the master did not have the occasion to archive it.
>>
>
> The current situation is a mess:
>
> 1. Even though we archive the last segment in the standby, there is no
> guarantee that the master had archived all the previous segments already.
>
> 2. If the master is not totally dead, it might try to archive the same file
> with more WAL in it, at the same time or just afterwards, or even just
> before the standby has completed promotion. Which copy do you keep in the
> archive? Having to deal with that makes the archive_command more
> complicated.
>
> Note that even though we don't archive the partial last segment on the
> previous timeline, the same WAL is copied to the first segment on the new
> timeline. So the WAL isn't lost.
>

But if the failed master has archived those segments safely, we may need
them, no? I am not sure we can ignore a user who wants to do a PITR with
recovery_target_timeline pointing to the timeline of the failed master.
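
To make point 2 above a bit more concrete, here is a rough sketch of what an
archive_command helper has to deal with once two nodes may push a segment
with the same name. The function name, the ".tmp" suffix and the policy
(accept an identical copy, refuse a different one) are my assumptions for
illustration, not something PostgreSQL ships:

```python
import filecmp
import shutil
from pathlib import Path


def archive_segment(src: Path, archive_dir: Path) -> int:
    """Copy one WAL segment into the archive; return a shell-style exit code."""
    dest = archive_dir / src.name
    if dest.exists():
        # Another node already archived a file with this name.
        if filecmp.cmp(src, dest, shallow=False):
            return 0  # identical contents: treat it as already archived
        return 1      # contents differ: refuse to overwrite, report failure
    tmp = dest.with_name(dest.name + ".tmp")  # hypothetical temporary name
    shutil.copyfile(src, tmp)
    tmp.replace(dest)  # rename so a reader never sees a half-written segment
    return 0
```

Note that this does not close the race where both nodes copy at the same
moment, and it cannot tell which of two differing copies is the one worth
keeping, which is the complication described in point 2.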


>
>> People may be surprised that a base backup taken from a node that has
>> archive_mode = on set (that's the case in a very large number of cases)
>> will not work as-is, because node startup will fail as follows:
>> FATAL:  archive_mode='on' cannot be used in archive recovery
>> HINT:  Use 'shared' or 'always' mode instead.
>>
>
> Hmm, good point.
>
>> One idea would be to simply ignore the fact that archive_mode = on on
>> nodes in recovery instead of raising an error. Note that I like the fact
>> that it raises an error as that's clear, I am just pointing out that
>> people may be surprised that base backups no longer work in this case.
>>
>
> By "ignore", what behaviour do you mean? Would "on" be equivalent to
> "shared", "always", or something else?
>

I meant something backward-compatible, with files marked as .done when they
are finished replaying... But now my words *are* weird as on != off ;)
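
As a rough model of what I mean by "marked as .done" (plain Python, not the
server code, which does this in C): it assumes the usual archive_status
directory holding <segment>.ready and <segment>.done marker files, and the
function name is made up for the example:

```python
from pathlib import Path


def mark_segment_done(status_dir: Path, segment: str) -> Path:
    """Flag a replayed segment as already archived so the archiver skips it."""
    ready = status_dir / (segment + ".ready")
    done = status_dir / (segment + ".done")
    if ready.exists():
        ready.replace(done)  # a pending notification becomes "done"
    else:
        done.touch()         # no pending notification: just create the marker
    return done
```

With that, a standby would never try to archive segments it only replayed.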

> Or we could keep the current behaviour with archive_mode=on (except for the
> last segment thing, which is just wrong), where the standby only archives
> the new timeline, and nothing from the previous timelines.
>

I guess this would solve the issue here then, which is not a bad thing in
itself:
http://www.postgresql.org/message-id/20140918180734.361021e1@erg
We would need to check if the situation improves with the 'always' mode btw.


> Are there use cases where you'd want that, rather than the new "shared"
> mode? I wanted to keep the 'on' mode for backwards-compatibility, but if
> that causes more problems, it might be better to just remove it and force
> the admin to choose what kind of a setup he has, with "shared" or "always".
>

The 'on' mode is still useful IMO to get behavior as close as possible to
what previous releases did.
Regards,
-- 
Michael

--001a113aa7d24f0f73051438579c--