Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.89) (envelope-from ) id 1gFZaO-0001aj-D3 for pgsql-hackers@arkaria.postgresql.org; Thu, 25 Oct 2018 06:57:40 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.89) (envelope-from ) id 1gFZaL-0001h9-5v for pgsql-hackers@arkaria.postgresql.org; Thu, 25 Oct 2018 06:57:37 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.89) (envelope-from ) id 1gFZaK-0001h2-TH for pgsql-hackers@lists.postgresql.org; Thu, 25 Oct 2018 06:57:37 +0000 Received: from mail-wm1-x342.google.com ([2a00:1450:4864:20::342]) by magus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.89) (envelope-from ) id 1gFZaH-0005WK-9M for pgsql-hackers@postgresql.org; Thu, 25 Oct 2018 06:57:36 +0000 Received: by mail-wm1-x342.google.com with SMTP id w186-v6so317361wmf.0 for ; Wed, 24 Oct 2018 23:57:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=OoC2GnPEQHne5zFmX6Ue5OchObc1Q9vk67+U7GET16Y=; b=sdB0Hw/WKFf4tJYs3rJMZ5ODik0oW72ntv2QSfCoSzNwVIaBWAhwmED25O0JawIlon IVccY8iJnr4WkiJeNkq86HrWBZxKFI5tpgVRPQzz08YT8qcEHcGzph72CWaquN/hMd9e UG5kNSzIoh64s3tBQplTxZFBaN/bsVrU+ocMJBvZxPemn2xnno7UwqzTti2lTbn07rPp aaUVHWfCSaOPOZuMNFcZxEThrkB9kUZnvtu4ucPMUKrZdO780I9SJUq9X446Nbx+ScA/ eNw93PGjo47HSpkLELMJ7hZc1KugPRq24r5o5HHaQ80uqSH9nIkImV57qfK/+k+cdGFl y8PQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=OoC2GnPEQHne5zFmX6Ue5OchObc1Q9vk67+U7GET16Y=; b=XoFOLQC9xrEvCcCWd07zhlKQC1mMO+KUHMS+nf/msu08+B8k0Q+M2E0i2eo4UhuNXo cb4QyZmxKn4cj3eV6A8ifCSrVWpcgrgpPpaevvHlSix/+SRQxy/JaMmnAIvGAhuaCpdf CgeghlGE/OicWRuOtdWs4mqhWTVaIvblgrB+Ma4qAGCFerEtSkwG0kqDBYY9cunEBQV2 AMMUOIrFT5K5O77MPgz7WLlNgETscV6zxX1UB/LelSooFPLrj4A0f9ly4z00OpIB92Ij 3K2OYBPYajOMjW6A+fasrFMKnP/ueqz7b0ujAFrYQlzqe9mm4LxfacDTFZG8hwetRIsx M25g== X-Gm-Message-State: AGRZ1gKzlDmag4SWqTczC9sTjfmpKT+6sDUJ8VE35whfCMWPmC9X8U4G JZ0n9I2Pqs6zlKg7fUVzpoiJPKmSMm62uoR7DMFh3l/1 X-Google-Smtp-Source: AJdET5fLwMRZYjFeJNV8V2UOJBByGUBagokBu1zQDMTuMQ8DUUyO++NnIt2PPAlA2OpjR9ymadLPUD9gzfwM4dZ1i4k= X-Received: by 2002:a1c:5e49:: with SMTP id s70-v6mr462648wmb.131.1540450650346; Wed, 24 Oct 2018 23:57:30 -0700 (PDT) MIME-Version: 1.0 From: Nikolay Samokhvalov Date: Thu, 25 Oct 2018 02:57:18 -0400 Message-ID: Subject: Using old master as new replica after clean switchover To: pgsql-docs@lists.postgresql.org, pgsql-hackers@postgresql.org Content-Type: multipart/mixed; boundary="000000000000cefd490579081d29" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Precedence: bulk --000000000000cefd490579081d29 Content-Type: multipart/alternative; boundary="000000000000cefd450579081d27" --000000000000cefd450579081d27 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Currently, the documentation explicitly states, that after failover, the old master must be recreated from scratch, or pg_rewind should be used (requiring wal_log_hints to be on, which is off by default): > The former standby is now the primary, but the former primary is down and might stay down. To return to normal operation, a standby server must be recreated, either on the former primary system when it comes up, or on a third, possibly new, system. The pg_rewind utility can be used to speed up this process on large clusters. My research shows that some people already rely on the following when planned failover (aka switchover) procedure, doing it in production: 1) shutdown the current master 2) ensure that the "master candidate" replica has received all WAL data including shutdown checkpoint from the old master 3) promote the master candidate to make it new master 4) configure recovery.conf on the old master node, while it's inactive 5) start the old master node as a new replica following the new master. It looks to me now, that if no steps missed in the procedure, this approach is eligible for Postgres versions 9.3+ (for older versions like 9.3 maybe not really always =E2=80=93 people who know details better will correct me = here maybe). Am I right? Or I'm missing some risks here? Two changes were made in 9.3 which allowed this approach in general [1] [2]. Also, I see from the code [3] that during shutdown process, the walsenders are the last who are stopped, so allow replicas to get the shutdown checkpoint information. Is this approach considered as safe now? if so, let's add it to the documentation, making it official. The patch is attached. Links: [0] 26.3 Failover https://www.postgresql.org/docs/current/static/warm-standby-failover.html [1] Support clean switchover https://git.postgresql.org/gitweb/?p=3Dpostgresql.git;a=3Dcommit;h=3D985bd7= d49726c9f178558491d31a570d47340459 [2] Allow a streaming replication standby to follow a timeline switch https://git.postgresql.org/gitweb/?p=3Dpostgresql.git;a=3Dcommit;h=3Dabfd19= 2b1b5ba5216ac4b1f31dcd553106304b19 [3] https://git.postgresql.org/gitweb/?p=3Dpostgresql.git;a=3Dblob;f=3Dsrc/back= end/replication/walsender.c;hb=3DHEAD#l276 Regards, Nik --000000000000cefd450579081d27 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Currently, the documentation explicitl= y states, that after failover, the old master must be recreated from scratc= h, or pg_rewind should be used (requiring wal_log_hints to be on, which is = off by default):

>=C2= =A0The former standby is now the primary, but the former primary is down an= d might stay down. To return to normal operation, a standby server must be = recreated, either on the former primary system when it comes up, or on a th= ird, possibly new, system. The pg_rewind utility can be used to speed up th= is process on large clusters.

My research shows that= some people already rely on the following when planned failover (aka switc= hover) procedure, doing it in production:

=C2=A01)= shutdown the current master
=C2=A02) ensure that the "maste= r candidate" replica has received all WAL data including shutdown chec= kpoint from the old master
=C2=A03) promote the master candidate = to make it new master
=C2=A04) configure recovery.conf on the old= master node, while it's inactive
=C2=A05) start the old mast= er node as a new replica following the new master.

It looks to me now, that if no steps missed in the procedure, this approac= h is eligible for Postgres versions 9.3+ (for older versions like 9.3 maybe= not really always =E2=80=93 people who know details better will correct me= here maybe). Am I right? Or I'm missing some risks here?
Two changes were made in 9.3 which allowed this approach in gen= eral [1] [2]. Also, I see from the code [3] that during shutdown process, t= he walsenders are the last who are stopped, so allow replicas to get the sh= utdown checkpoint information.

Is this approach co= nsidered as safe now?

if so, let's add it to t= he documentation, making it official. The patch is attached.

=
Links:
[2]=C2=A0Allow = a streaming replication standby to follow a timeline switch=C2=A0https://git.postgresql.org/gitweb/?p= =3Dpostgresql.git;a=3Dcommit;h=3Dabfd192b1b5ba5216ac4b1f31dcd553106304b19

<= div>
Regards,
Nik
--000000000000cefd450579081d27-- --000000000000cefd490579081d29 Content-Type: application/octet-stream; name="failover_doc.patch" Content-Disposition: attachment; filename="failover_doc.patch" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_jno89vre0 ZGlmZiAtLWdpdCBhL2RvYy9zcmMvc2dtbC9oaWdoLWF2YWlsYWJpbGl0eS5zZ21sIGIvZG9jL3Ny Yy9zZ21sL2hpZ2gtYXZhaWxhYmlsaXR5LnNnbWwKaW5kZXggZmFmOGU3MTg1NC4uMDg4YzUxYzE0 NCAxMDA2NDQKLS0tIGEvZG9jL3NyYy9zZ21sL2hpZ2gtYXZhaWxhYmlsaXR5LnNnbWwKKysrIGIv ZG9jL3NyYy9zZ21sL2hpZ2gtYXZhaWxhYmlsaXR5LnNnbWwKQEAgLTE0NTIsNyArMTQ1MiwxMiBA QCBzeW5jaHJvbm91c19zdGFuZGJ5X25hbWVzID0gJ0FOWSAyIChzMSwgczIsIHMzKScKICAgICBt dXN0IGJlIHJlY3JlYXRlZCwKICAgICBlaXRoZXIgb24gdGhlIGZvcm1lciBwcmltYXJ5IHN5c3Rl bSB3aGVuIGl0IGNvbWVzIHVwLCBvciBvbiBhIHRoaXJkLAogICAgIHBvc3NpYmx5IG5ldywgc3lz dGVtLiBUaGUgPHhyZWYgbGlua2VuZD0iYXBwLXBncmV3aW5kIi8+IHV0aWxpdHkgY2FuIGJlCi0g ICAgdXNlZCB0byBzcGVlZCB1cCB0aGlzIHByb2Nlc3Mgb24gbGFyZ2UgY2x1c3RlcnMuCisgICAg dXNlZCB0byBzcGVlZCB1cCB0aGlzIHByb2Nlc3Mgb24gbGFyZ2UgY2x1c3RlcnMuIEF0IHRoZSBz YW1lIHRpbWUsCisgICAgaWYgYmVmb3JlIGZhaWxvdmVyLCB0aGUgb2xkIG1hc3RlciB3YXMgY2xl YW5seSBzaHV0IGRvd24sIGFuZAorICAgIGFsbCBXQUwgZGF0YSBpbmNsdWRpbmcgc28tY2FsbGVk IHNodXRkb3duIGNoZWNrcG9pbnQgd2FzIHJlY2VpdmVkCisgICAgYnkgdGhlIHJlcGxpY2EgYmVm b3JlIGl0IHdhcyBwcm9tb3RlZCwgdGhlIG9sZCBtYXN0ZXIgY2FuIGJlIHN0YXJ0ZWQKKyAgICBh cyBhIG5ldyByZXBsaWNhIGF0dGFjaGluZyB0byB0aGUgbmV3IG1hc3RlciB3aXRob3V0IHJlYnVp bGRpbmcgb3IgdXNpbmcKKyAgICBwZ19yZXdpbmQuIEluIHRoaXMgY2FzZSwgb25seSBjb25maWd1 cmF0aW9uIG9mIHJlY292ZXJ5LmNvbmYgaXMgbmVlZGVkLgogICAgIE9uY2UgY29tcGxldGUsIHRo ZSBwcmltYXJ5IGFuZCBzdGFuZGJ5IGNhbiBiZQogICAgIGNvbnNpZGVyZWQgdG8gaGF2ZSBzd2l0 Y2hlZCByb2xlcy4gU29tZSBwZW9wbGUgY2hvb3NlIHRvIHVzZSBhIHRoaXJkCiAgICAgc2VydmVy IHRvIHByb3ZpZGUgYmFja3VwIGZvciB0aGUgbmV3IHByaW1hcnkgdW50aWwgdGhlIG5ldyBzdGFu ZGJ5Cg== --000000000000cefd490579081d29--