Date: Thu, 25 Oct 2018 11:15:51 +0200
From: Jehan-Guillaume de Rorthais
To: Nikolay Samokhvalov
Cc: pgsql-docs@lists.postgresql.org, pgsql-hackers@postgresql.org
Subject: Re: Using old master as new replica after clean switchover
Message-ID: <20181025111551.620c6460@firost>
Organization: Dalibo

On Thu, 25 Oct 2018 02:57:18 -0400
Nikolay Samokhvalov wrote:

...
> My research shows that some people already rely on the following when
> planned failover (aka switchover) procedure, doing it in production:
>
> 1) shutdown the current master
> 2) ensure that the "master candidate" replica has received all WAL data
> including shutdown checkpoint from the old master
> 3) promote the master candidate to make it new master
> 4) configure recovery.conf on the old master node, while it's inactive
> 5) start the old master node as a new replica following the new master.

Indeed.

> It looks to me now, that if no steps missed in the procedure, this approach
> is eligible for Postgres versions 9.3+ (for older versions like 9.3 maybe
> not really always – people who know details better will correct me here
> maybe). Am I right? Or I'm missing some risks here?

As far as I know, this is correct.

> Two changes were made in 9.3 which allowed this approach in general [1]
> [2]. Also, I see from the code [3] that during shutdown process, the
> walsenders are the last who are stopped, so allow replicas to get the
> shutdown checkpoint information.

I reached the same conclusions when I was studying controlled failover some
years ago to implement it in the PAF project (allowing controlled switchover
in one command). Here is a discussion about switchover that took place three
years ago on the Pacemaker mailing list:

  https://lists.clusterlabs.org/pipermail/users/2016-October/011568.html

> Is this approach considered as safe now?

Considering the above points, I do think so.

The only additional nice step would be to be able to run some more safety
tests AFTER the switchover process on the old master. The only way I can
think of would be to run pg_rewind, even if it doesn't do much.

> if so, let's add it to the documentation, making it official. The patch is
> attached.
I suppose we should add the technical steps in a sample procedure?
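As a rough illustration of what such a sample procedure could check, here is a minimal Python sketch covering steps 1, 2 and 4 of the switchover above, plus the pg_rewind idea. It rests on assumptions not stated in the thread: the function names are hypothetical, nothing here connects to a server, and the caller is assumed to collect the old master's `pg_controldata` output and the standby's `pg_last_wal_replay_lsn()` value (`pg_last_xlog_replay_location()` before PostgreSQL 10). Treating "replay position strictly past the start of the shutdown checkpoint record" as proof that the standby has the checkpoint is a heuristic, not something established in this discussion.

```python
# Hypothetical helpers for a planned switchover (illustration only).
# Nothing here connects to PostgreSQL: the caller is assumed to collect the
# old master's pg_controldata output and the standby's replay LSN.
import re


def parse_lsn(text: str) -> int:
    """Turn an LSN in PostgreSQL's 'X/Y' text form into one 64-bit integer."""
    high, low = text.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def shutdown_checkpoint_lsn(controldata_output: str) -> int:
    """Steps 1-2: check that pg_controldata on the stopped old master reports
    a clean shutdown, and return where its shutdown checkpoint record starts."""
    state = re.search(r"^Database cluster state:[ \t]*(.+)$", controldata_output, re.M)
    if state is None or state.group(1).strip() != "shut down":
        raise RuntimeError("old master is not cleanly shut down")
    location = re.search(r"^Latest checkpoint location:[ \t]*(\S+)", controldata_output, re.M)
    if location is None:
        raise RuntimeError("no 'Latest checkpoint location' line found")
    return parse_lsn(location.group(1))


def standby_has_checkpoint(checkpoint_lsn: int, standby_replay_lsn: str) -> bool:
    """Step 2 (heuristic): the replay position marks the end of the last record
    replayed, so it must lie strictly past the start of the checkpoint record."""
    return parse_lsn(standby_replay_lsn) > checkpoint_lsn


def recovery_conf(new_master_conninfo: str) -> str:
    """Step 4: recovery.conf for the old master (PostgreSQL 9.3 to 11).
    recovery_target_timeline = 'latest' lets it follow the new timeline
    created when the candidate was promoted (step 3)."""
    quoted = new_master_conninfo.replace("'", "''")  # double embedded quotes
    return (
        "standby_mode = 'on'\n"
        f"primary_conninfo = '{quoted}'\n"
        "recovery_target_timeline = 'latest'\n"
    )


def rewind_dry_run_command(old_master_pgdata: str, new_master_conninfo: str) -> list:
    """Optional extra check: pg_rewind in dry-run mode against the new master,
    to be run while the old master is still stopped (before step 5)."""
    return [
        "pg_rewind",
        f"--target-pgdata={old_master_pgdata}",
        f"--source-server={new_master_conninfo}",
        "--dry-run",
    ]
```

A driver script would only run `pg_ctl promote` on the candidate once `standby_has_checkpoint()` holds, then write the generated recovery.conf into the old master's data directory. Note that pg_rewind (in core since 9.5) has its own prerequisites, such as `wal_log_hints = on` or data checksums on the old master, so its documentation should be checked before relying on it as a safety test.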