MIME-Version: 1.0
From: Nikolay Samokhvalov <samokhvalov@gmail.com>
Date: Thu, 25 Oct 2018 02:57:18 -0400
Message-ID: 
 <CANNMO+KYuH3Gh7BZp=UGXpoos4tBR0AFgoONkqWBrokuJthEug@mail.gmail.com>
Subject: Using old master as new replica after clean switchover
To: pgsql-docs@lists.postgresql.org, pgsql-hackers@postgresql.org
Content-Type: multipart/mixed; boundary="000000000000cefd490579081d29"
Precedence: bulk

--000000000000cefd490579081d29
Content-Type: multipart/alternative; boundary="000000000000cefd450579081d27"

--000000000000cefd450579081d27
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Currently, the documentation [0] explicitly states that after failover the
old master must be recreated from scratch, or pg_rewind should be used
(which requires either data checksums or wal_log_hints; the latter is off
by default):

> The former standby is now the primary, but the former primary is down and
> might stay down. To return to normal operation, a standby server must be
> recreated, either on the former primary system when it comes up, or on a
> third, possibly new, system. The pg_rewind utility can be used to speed up
> this process on large clusters.

My research shows that some people already rely on the following planned
failover (aka switchover) procedure in production:

 1) shut down the current master
 2) ensure that the "master candidate" replica has received all WAL data,
including the shutdown checkpoint, from the old master
 3) promote the master candidate to make it the new master
 4) configure recovery.conf on the old master node while it is inactive
 5) start the old master node as a new replica following the new master.
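
Step 2 is the one most easily skipped, so here is a minimal sketch of one
way to check it. It assumes the check is done by comparing the "Latest
checkpoint location" reported by pg_controldata on the stopped old master
with the value of pg_last_wal_receive_lsn() on the candidate
(pg_last_xlog_receive_location() before version 10). The helper names are
hypothetical, and the comparison is a necessary condition, not a proof
that the switchover is safe.

```python
import re


def parse_lsn(text: str) -> int:
    """Convert an LSN such as '0/3000028' to a 64-bit integer."""
    high, low = text.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def shutdown_checkpoint_lsn(controldata: str) -> int:
    """Return the checkpoint LSN from pg_controldata output.

    Raises ValueError unless the cluster state is exactly 'shut down',
    i.e. the old master was stopped cleanly.
    """
    state = re.search(r"^Database cluster state:\s+(.+)$", controldata, re.M)
    if state is None or state.group(1).strip() != "shut down":
        raise ValueError("old master was not cleanly shut down")
    ckpt = re.search(r"^Latest checkpoint location:\s+(\S+)$", controldata, re.M)
    if ckpt is None:
        raise ValueError("checkpoint location not found")
    return parse_lsn(ckpt.group(1))


def candidate_has_shutdown_checkpoint(controldata: str, receive_lsn: str) -> bool:
    """True if the candidate received WAL past the start of the checkpoint record.

    The checkpoint location is where the record starts, so the candidate's
    receive LSN must be strictly greater than it.
    """
    return parse_lsn(receive_lsn) > shutdown_checkpoint_lsn(controldata)


if __name__ == "__main__":
    # Illustrative values only, not taken from a real cluster.
    sample = (
        "Database cluster state:               shut down\n"
        "Latest checkpoint location:           0/3000028\n"
    )
    print(candidate_has_shutdown_checkpoint(sample, "0/30000A0"))
```

If the function returns False, or raises because the old master was not
shut down cleanly, the conservative choice is to fall back to pg_rewind or
a rebuild rather than continue with steps 3 to 5.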

It now looks to me that, if no steps are missed, this approach is valid
for Postgres 9.3 and later (though perhaps not always for the oldest of
these, such as 9.3 itself; people who know the details better may correct
me here). Am I right, or am I missing some risks?

Two changes made in 9.3 enabled this approach in general [1] [2]. Also, I
see from the code [3] that during shutdown the walsenders are the last
processes to be stopped, which allows replicas to receive the shutdown
checkpoint record.
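
To illustrate how change [2] relates to step 4, below is a sketch of a
helper that renders a recovery.conf for the old master. It assumes the
pre-12 recovery.conf parameters standby_mode, primary_conninfo and
recovery_target_timeline, and it assumes that 'latest' is wanted so that
the old master follows the new timeline created by the promotion. The
function name and the connection string are hypothetical.

```python
def render_recovery_conf(primary_conninfo: str) -> str:
    """Render a minimal recovery.conf for a node rejoining as a replica."""

    def quote(value: str) -> str:
        # Single quotes inside a value are written as two single quotes.
        return "'" + value.replace("'", "''") + "'"

    lines = [
        "standby_mode = 'on'",
        "primary_conninfo = " + quote(primary_conninfo),
        # Assumption: follow the timeline switch made by the promotion.
        "recovery_target_timeline = 'latest'",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical host and user, for illustration only.
    print(render_recovery_conf("host=new-master port=5432 user=replicator"))
```

The result would be written to recovery.conf in the old master's data
directory before it is started in step 5.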

Is this approach considered safe now?

If so, let's add it to the documentation, making it official. A patch is
attached.

Links:
[0] 26.3 Failover
https://www.postgresql.org/docs/current/static/warm-standby-failover.html
[1] Support clean switchover
https://git.postgresql.org/gitweb/?p=3Dpostgresql.git;a=3Dcommit;h=3D985bd7=
d49726c9f178558491d31a570d47340459
[2] Allow a streaming replication standby to follow a timeline switch
https://git.postgresql.org/gitweb/?p=3Dpostgresql.git;a=3Dcommit;h=3Dabfd19=
2b1b5ba5216ac4b1f31dcd553106304b19
[3]
https://git.postgresql.org/gitweb/?p=3Dpostgresql.git;a=3Dblob;f=3Dsrc/back=
end/replication/walsender.c;hb=3DHEAD#l276


Regards,
Nik

--000000000000cefd450579081d27--

--000000000000cefd490579081d29
Content-Type: application/octet-stream; name="failover_doc.patch"
Content-Disposition: attachment; filename="failover_doc.patch"
Content-Transfer-Encoding: base64
Content-ID: <f_jno89vre0>
X-Attachment-Id: f_jno89vre0

ZGlmZiAtLWdpdCBhL2RvYy9zcmMvc2dtbC9oaWdoLWF2YWlsYWJpbGl0eS5zZ21sIGIvZG9jL3Ny
Yy9zZ21sL2hpZ2gtYXZhaWxhYmlsaXR5LnNnbWwKaW5kZXggZmFmOGU3MTg1NC4uMDg4YzUxYzE0
NCAxMDA2NDQKLS0tIGEvZG9jL3NyYy9zZ21sL2hpZ2gtYXZhaWxhYmlsaXR5LnNnbWwKKysrIGIv
ZG9jL3NyYy9zZ21sL2hpZ2gtYXZhaWxhYmlsaXR5LnNnbWwKQEAgLTE0NTIsNyArMTQ1MiwxMiBA
QCBzeW5jaHJvbm91c19zdGFuZGJ5X25hbWVzID0gJ0FOWSAyIChzMSwgczIsIHMzKScKICAgICBt
dXN0IGJlIHJlY3JlYXRlZCwKICAgICBlaXRoZXIgb24gdGhlIGZvcm1lciBwcmltYXJ5IHN5c3Rl
bSB3aGVuIGl0IGNvbWVzIHVwLCBvciBvbiBhIHRoaXJkLAogICAgIHBvc3NpYmx5IG5ldywgc3lz
dGVtLiBUaGUgPHhyZWYgbGlua2VuZD0iYXBwLXBncmV3aW5kIi8+IHV0aWxpdHkgY2FuIGJlCi0g
ICAgdXNlZCB0byBzcGVlZCB1cCB0aGlzIHByb2Nlc3Mgb24gbGFyZ2UgY2x1c3RlcnMuCisgICAg
dXNlZCB0byBzcGVlZCB1cCB0aGlzIHByb2Nlc3Mgb24gbGFyZ2UgY2x1c3RlcnMuIEF0IHRoZSBz
YW1lIHRpbWUsCisgICAgaWYgYmVmb3JlIGZhaWxvdmVyLCB0aGUgb2xkIG1hc3RlciB3YXMgY2xl
YW5seSBzaHV0IGRvd24sIGFuZAorICAgIGFsbCBXQUwgZGF0YSBpbmNsdWRpbmcgc28tY2FsbGVk
IHNodXRkb3duIGNoZWNrcG9pbnQgd2FzIHJlY2VpdmVkCisgICAgYnkgdGhlIHJlcGxpY2EgYmVm
b3JlIGl0IHdhcyBwcm9tb3RlZCwgdGhlIG9sZCBtYXN0ZXIgY2FuIGJlIHN0YXJ0ZWQKKyAgICBh
cyBhIG5ldyByZXBsaWNhIGF0dGFjaGluZyB0byB0aGUgbmV3IG1hc3RlciB3aXRob3V0IHJlYnVp
bGRpbmcgb3IgdXNpbmcKKyAgICBwZ19yZXdpbmQuIEluIHRoaXMgY2FzZSwgb25seSBjb25maWd1
cmF0aW9uIG9mIHJlY292ZXJ5LmNvbmYgaXMgbmVlZGVkLgogICAgIE9uY2UgY29tcGxldGUsIHRo
ZSBwcmltYXJ5IGFuZCBzdGFuZGJ5IGNhbiBiZQogICAgIGNvbnNpZGVyZWQgdG8gaGF2ZSBzd2l0
Y2hlZCByb2xlcy4gU29tZSBwZW9wbGUgY2hvb3NlIHRvIHVzZSBhIHRoaXJkCiAgICAgc2VydmVy
IHRvIHByb3ZpZGUgYmFja3VwIGZvciB0aGUgbmV3IHByaW1hcnkgdW50aWwgdGhlIG5ldyBzdGFu
ZGJ5Cg==
--000000000000cefd490579081d29--