Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uveRt-00Gmgv-E3 for pgsql-general@arkaria.postgresql.org; Mon, 08 Sep 2025 16:10:34 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uveRs-005ugo-Cm for pgsql-general@arkaria.postgresql.org; Mon, 08 Sep 2025 16:10:32 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uveRr-005ugb-UN for pgsql-general@lists.postgresql.org; Mon, 08 Sep 2025 16:10:32 +0000 Received: from mail-oa1-x34.google.com ([2001:4860:4864:20::34]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uveRq-001BSW-1i for pgsql-general@lists.postgresql.org; Mon, 08 Sep 2025 16:10:31 +0000 Received: by mail-oa1-x34.google.com with SMTP id 586e51a60fabf-30ccea8f43cso4136410fac.1 for ; Mon, 08 Sep 2025 09:10:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1757347829; x=1757952629; darn=lists.postgresql.org; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=3aqFdfSNeayKCO2EnmDxcrjfBLAYOp9JmfN95mqUIho=; b=b4Cg9BRkhFpOqk4fQEzuOja/Zg3AB8T+tt4uM8OW3NgCyd1fTsRfK5A7JS+OzfFWLs OcfASjWjab3hmMTI/3Ku4So4D2wC+fo9wUaNN19LPqiWkZMaJ4V3N938kAieEF//xZB/ u8za36wFcihgnYslbeZCU8FzgzYTrRUxlbxahQUMhccHxjgq4rp8zCmnK7+hUNJK5Hpn jH7sCYd1lGPC2oCiBxNP4Fb4+v9/EaMcKYBB7zQOSLT2jHVGptnzk0F+sW6mwbjbq6Pk rYIJWwst1MBPdiRoIoN1OMx8So95BqquSFWWpR2KwOt9NgO1YKX/NxbKWVGQjS+d8E3C WF9A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1757347829; x=1757952629; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=3aqFdfSNeayKCO2EnmDxcrjfBLAYOp9JmfN95mqUIho=; b=vSxgutCd3DulTxhd3AvcQdR3HJD6q2B3MIEyfAZIoedFSu4t1ts7DwItVYHBVjeMeV BTZCOiAj15wUdrcuhGos8mh+DkO4YdQClQqM2JMa9oB/87vmDf3vhg61IWQ5oK2nRowQ RGL/gDw1DIXxHWNKGhddn7W5DXGwN43gcSPSLKsjWlRuJD2rWiXi2lcdEnRKG5Pz3NU0 h10QDdlsEpsxH+DfgzRZ6y0N2n6dgB6qrxMo9kYk0EQ/cyepMxTkTFkseixHhjvLwvOa KKtnc6d64v98M6mV/pHjtLjLD+2NdrwcbjQXVXeKAgzQB3ZID5KLLpDreOgVEu9XzhcI /Jug== X-Gm-Message-State: AOJu0Yy52bsm0AWhVmcVO914oPXU1FuQmgtorzBJUl/HlTEHKYcttiZu g7gjC/lyJCwR0+VVUXusbZeYOVXvFa+WJAyIkQ/fL0F97Rv2QTS5WH3gQan+9tSbG3D/VXpeWv6 VKVHqjq3zwJmZZaedCKqWdb63lhEuuPbPSl+z X-Gm-Gg: ASbGnctL9VLzaYh3dxVOODnel7G2vKhqVNHBLpX757s/vCvNMy6PmuAgWLXCsXFArdn 4I+UJ32TiXp5IpX0Iiv+qxrY39VGbdaBncMQmyPtkB3Mgz4FZNMpeRlKDxOsbCCdhvgOVCgvh54 1aQG6hlq0Bu0F752N6mTwug6Ra7FmRxqLOHFbCEjz4wD/a/g8xlfC0O4HlvOxoJKchyfqA5UIxz jKByXQFyoaJqq0Nwy8= X-Google-Smtp-Source: AGHT+IFLUAfpGlbrvcwHR7FFUIUeNFLFXNUPmyAQGqgDTNqYaXrudzMIKWwkOj/DKfHIRq1vDaUBpiu45qPSJxktR30= X-Received: by 2002:a05:6870:1588:b0:318:70e5:3ce with SMTP id 586e51a60fabf-32266e84e00mr4241589fac.21.1757347829461; Mon, 08 Sep 2025 09:10:29 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Ron Johnson Date: Mon, 8 Sep 2025 12:10:18 -0400 X-Gm-Features: Ac12FXx5iGujvS6SwfL1NWB5zq3Y1w-1Bw32wVYZn5GuuCMG80HoUsJ86GxzCmY Message-ID: Subject: Re: Fast switchover To: "pgsql-general@lists.postgresql.org" Content-Type: multipart/alternative; boundary="0000000000001f39d6063e4c6d64" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000001f39d6063e4c6d64 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Mon, Sep 8, 2025 at 11:03=E2=80=AFAM legrand legrand wrote: > Hello all the readers, > > For some projects we need a fast *manual* switchover to address Near Zero > downtime maintenance > (not speaking here about automated failover like those provided by HA > tools, but just planned, controlled operations) > > > Database Physical replication switchover itself: > - initial replication (before switchover) should be synchronous or > replication LAG should be controlled to prevent data loss. > - Switchover duration seems not "compressible" under a few seconds > (because of primary shutdown, promotion, new standby catch up, ...) > - Application retry strategy (after disconnection) should be tuned using > proper retry delay. Pooler or specific driver may help. > There will always be a few seconds delay while the applications reconnect. Do the applications connect via a VIP? That's simpler for the application. This is what I do from the not-yet-new-primary: 1. psql -h $CurrentPrimary -c "ALTER SYSTEM SET synchronous_standby_names TO '*';" 2. Wait a few seconds. 3. ssh $CurrentPrimary sudo ip del $VIP # cmd is more complicated, but you get the idea 4. ssh $CurrentPrimary pg_ctl stop -mfast # to kill connections, has to happen, no matter the solution. 5. pg_ctl promote 6. sudo ip add $VIP 7. Replicate from new-primary to new-replica "at leisure". No retry delay, since the application directly goes to the new server. Steps 3-6 are in a script, and what pgpool does, except I do it. #4 is by far the slowest. ssh authentication delay in #3 and #4 are nonexistent if you have "pre-created" an ssh socket. --=20 Death to , and butter sauce. Don't boil me, I'm still alive. lobster! --0000000000001f39d6063e4c6d64 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Mon, Sep 8, 2025 at 11:03=E2=80=AFAM l= egrand legrand <legrand_l= egrand@hotmail.com> wrote:
Hello all the readers,

For some projects we need a fast manual=C2=A0switchover to address N= ear Zero downtime maintenance
(not speaking here about automated failover like those provided by HA tools= , but just planned, controlled operations)


Database Physical replication switchover itself:
- initial replication (before switchover) should be synchronous or replicat= ion LAG should be controlled to prevent data loss.
- Switchover duration seems not "compressible" under a few second= s (because of primary shutdown, promotion, new standby catch up, ...)
- Application retry strategy (after disconnection) should be tuned using pr= oper retry delay. Pooler or specific driver may help.

There will always be a few= seconds delay while the applications=C2=A0reconnect.

<= div>Do the applications connect via a VIP?=C2=A0 That's simpler for the= application.

This is what I do from the not-yet-n= ew-primary:
  1. psql -h=C2=A0 $CurrentPrimary -c "ALTER SYSTEM SET synchronous_standby_names TO '= ;*';"
  2. Wait a few seconds.
  3. ssh $CurrentPrimary sudo= ip del $VIP # cmd is more complicated, but you get the idea
  4. ssh $C= urrentPrimary pg_ctl stop -mfast # to kill connections, has to happen, no m= atter the solution.
  5. pg_ctl promote
  6. sudo ip add $VIP
  7. Replicate from new-primary to new-replica "at leisure".
No retry delay, since the application directly goes to the new = server.
Steps 3-6 are in a script, and what pgpool does, except I= do it.=C2=A0 #4 is by far the slowest.=C2=A0 ssh authentication delay in #= 3 and #4 are=C2=A0nonexistent=C2=A0if you have "pre-created" an s= sh socket.

--
Death = to <Redacted>, and butter sauce.
Don't boil me, I'm still= alive.
<Redacted> lobster!
--0000000000001f39d6063e4c6d64--