MIME-Version: 1.0
References: <AS1P190MB17014CF63D3EDDB0F4F7B66B900CA@AS1P190MB1701.EURP190.PROD.OUTLOOK.COM>
In-Reply-To: <AS1P190MB17014CF63D3EDDB0F4F7B66B900CA@AS1P190MB1701.EURP190.PROD.OUTLOOK.COM>
From: Ron Johnson <ronljohnsonjr@gmail.com>
Date: Mon, 8 Sep 2025 12:10:18 -0400
Message-ID: <CANzqJaDPo-tLSZuOLxTic1yx-=X7EuXS8X9f22aD92-D4RDrYw@mail.gmail.com>
Subject: Re: Fast switchover
To: "pgsql-general@lists.postgresql.org" <pgsql-general@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="0000000000001f39d6063e4c6d64"
Archived-At: <https://www.postgresql.org/message-id/CANzqJaDPo-tLSZuOLxTic1yx-%3DX7EuXS8X9f22aD92-D4RDrYw%40mail.gmail.com>
Precedence: bulk

--0000000000001f39d6063e4c6d64
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Mon, Sep 8, 2025 at 11:03 AM legrand legrand
<legrand_legrand@hotmail.com> wrote:

> Hello all the readers,
>
> For some projects we need a fast *manual* switchover for near-zero-downtime
> maintenance
> (not automated failover like that provided by HA tools, just planned,
> controlled operations)
>
>
> Database physical replication switchover itself:
> - Initial replication (before switchover) should be synchronous, or
> replication lag should be kept under control, to prevent data loss.
> - Switchover duration does not seem "compressible" below a few seconds
> (because of primary shutdown, promotion, new standby catch-up, ...)
> - The application retry strategy (after disconnection) should be tuned with
> a suitable retry delay. A pooler or a specific driver may help.
>

There will always be a few seconds' delay while the applications reconnect.

Do the applications connect via a VIP (virtual IP)?  That's simpler for the
applications.

This is what I do from the not-yet-new-primary:

   1. psql -h $CurrentPrimary \
        -c "ALTER SYSTEM SET synchronous_standby_names TO '*';" \
        -c "SELECT pg_reload_conf();"  # reload so the change takes effect
   2. Wait a few seconds.
   3. ssh $CurrentPrimary sudo ip addr del $VIP # cmd is more complicated,
   but you get the idea
   4. ssh $CurrentPrimary pg_ctl stop -m fast # to kill connections; has to
   happen, no matter the solution.
   5. pg_ctl promote
   6. sudo ip addr add $VIP
   7. Set up replication from new-primary to new-replica "at leisure".
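
As an alternative to a blind wait in step 2, one could poll the current
primary until a standby actually shows up as synchronous. This is only a
sketch, not something I run as-is: the function name, the 30-second default
and the PSQL override are made up for illustration, and it assumes the new
synchronous_standby_names value has already been reloaded on the primary.

```shell
#!/bin/sh
# Sketch: poll the current primary until a standby is reported as
# synchronous, instead of sleeping for a fixed time. Names are hypothetical.

# PSQL can be overridden (e.g. with a stub) to try this without a server.
PSQL=${PSQL:-psql}

# wait_for_sync_standby HOST [TIMEOUT_SECONDS]
# Returns 0 once pg_stat_replication shows a standby with sync_state = 'sync',
# 1 if that has not happened within the timeout (default 30, an assumption).
wait_for_sync_standby() {
    wfs_host=$1
    wfs_timeout=${2:-30}
    wfs_elapsed=0
    while [ "$wfs_elapsed" -lt "$wfs_timeout" ]; do
        # -X: skip psqlrc, -A: unaligned, -t: tuples only, -c: run one command
        wfs_count=$($PSQL -h "$wfs_host" -XAtc \
            "SELECT count(*) FROM pg_stat_replication WHERE sync_state = 'sync';")
        if [ "${wfs_count:-0}" -ge 1 ]; then
            return 0
        fi
        sleep 1
        wfs_elapsed=$((wfs_elapsed + 1))
    done
    return 1
}
```

Usage would be something like `wait_for_sync_standby $CurrentPrimary 10 ||
exit 1` before touching the VIP, so the switchover stops if the standby never
becomes synchronous.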

No retry delay is needed, since the application goes directly to the new
server.
Steps 3-6 are in a script; it is what pgpool does, except that I run it
myself.  #4 is by far the slowest.  The ssh authentication delay in #3 and
#4 is nonexistent if you have "pre-created" an ssh socket.
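
Roughly, a script for steps 3-6 with a pre-created ssh socket could look like
the sketch below. It is not my actual script. The host name, VIP, interface,
socket path and the DRY_RUN switch are all placeholders I made up, it assumes
the "ssh socket" is an OpenSSH control-master connection, and it assumes
pg_ctl can find the data directory (e.g. via PGDATA) on both machines. The
real VIP handling is more involved than a plain `ip addr del`/`add`.

```shell
#!/bin/sh
# Sketch of steps 3-6, run from the standby that is about to be promoted,
# reusing one pre-opened ssh connection. All names/defaults are assumptions.

CURRENT_PRIMARY=${CURRENT_PRIMARY:-old-primary.example}  # hypothetical host
VIP_CIDR=${VIP_CIDR:-192.0.2.10/24}                      # hypothetical VIP/prefix
IFACE=${IFACE:-eth0}                                     # hypothetical interface
SSH_SOCKET=${SSH_SOCKET:-$HOME/.ssh/switchover.sock}     # control socket path
DRY_RUN=${DRY_RUN:-1}                                    # 1 = only print commands

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Open the master connection ahead of time, so the later ssh calls reuse it
# and skip the authentication handshake.
open_ssh_socket() {
    run ssh -M -S "$SSH_SOCKET" -o ControlPersist=10m -fN "$CURRENT_PRIMARY"
}

# Steps 3-6; the chain stops at the first command that fails.
switchover() {
    run ssh -S "$SSH_SOCKET" "$CURRENT_PRIMARY" \
        sudo ip addr del "$VIP_CIDR" dev "$IFACE" &&
    run ssh -S "$SSH_SOCKET" "$CURRENT_PRIMARY" pg_ctl stop -m fast &&
    run pg_ctl promote &&
    run sudo ip addr add "$VIP_CIDR" dev "$IFACE"
}

# Tear down the master connection once the switchover is done.
close_ssh_socket() {
    run ssh -S "$SSH_SOCKET" -O exit "$CURRENT_PRIMARY"
}
```

With the default DRY_RUN=1, calling open_ssh_socket, switchover and
close_ssh_socket only prints the commands, which is a cheap way to review
them; setting DRY_RUN=0 runs them for real.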

--=20
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

--0000000000001f39d6063e4c6d64--