Re: Fast switchover - Ron Johnson

public inbox for [email protected]  

From: Ron Johnson <[email protected]>
To: [email protected] <[email protected]>
Subject: Re: Fast switchover
Date: Mon, 8 Sep 2025 12:48:12 -0400
Message-ID: <CANzqJaAbj3-LOnTc0YaxjYROEN0h_cHqdPLn3hSPGiqkxm6j1Q@mail.gmail.com>
In-Reply-To: <DBAPR03MB63585EC025A03D343FDFF5D5F10CA@DBAPR03MB6358.eurprd03.prod.outlook.com>
References: <AS1P190MB17014CF63D3EDDB0F4F7B66B900CA@AS1P190MB1701.EURP190.PROD.OUTLOOK.COM>
	<CANzqJaDPo-tLSZuOLxTic1yx-=X7EuXS8X9f22aD92-D4RDrYw@mail.gmail.com>
	<DBAPR03MB63585EC025A03D343FDFF5D5F10CA@DBAPR03MB6358.eurprd03.prod.outlook.com>

On Mon, Sep 8, 2025 at 12:37 PM Klaus Darilion <[email protected]>
wrote:

>
> *From:* Ron Johnson <[email protected]>
> *Sent:* Monday, September 8, 2025 6:10 PM
> *To:* [email protected]
> *Subject:* Re: Fast switchover
>
> On Mon, Sep 8, 2025 at 11:03 AM legrand legrand <
> [email protected]> wrote:
>
> Hello all readers,
>
> For some projects we need a fast *manual* switchover to allow
> near-zero-downtime maintenance.
>
> (I am not talking here about automated failover like that provided by HA
> tools, just planned, controlled operations.)
>
> Database physical replication switchover itself:
>
> - Replication before the switchover should be synchronous, or replication
> lag should be controlled, to prevent data loss.
>
> - The switchover duration does not seem "compressible" below a few seconds
> (because of primary shutdown, promotion, new standby catch-up, ...).
>
> - The application retry strategy (after disconnection) should be tuned
> with a proper retry delay. A pooler or a specific driver may help.
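
The "replication lag should be controlled" point can be made concrete by polling pg_stat_replication on the current primary before shutting it down. A minimal bash sketch, assuming PostgreSQL 10 or later (for pg_current_wal_lsn() and replay_lsn), non-interactive psql access, and the $CurrentPrimary variable used in the steps further down; MAX_LAG_BYTES and TIMEOUT_S are hypothetical knobs, not PostgreSQL settings:

```bash
#!/usr/bin/env bash
# Sketch: before stopping the current primary, poll it until every connected
# standby has replayed WAL to within MAX_LAG_BYTES of the primary's position.
# Assumptions: PostgreSQL 10+ names (pg_current_wal_lsn, replay_lsn); psql can
# connect to $CurrentPrimary without a password prompt; MAX_LAG_BYTES and
# TIMEOUT_S are hypothetical knobs, not PostgreSQL settings.

# Prints the largest replay lag in bytes across connected standbys,
# or an empty line if no standby is connected (max() over no rows is NULL).
max_replay_lag_bytes() {
  psql -h "${CurrentPrimary:?}" -At -c \
    "SELECT max(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn))::bigint
       FROM pg_stat_replication"
}

# Returns 0 once lag <= MAX_LAG_BYTES, 1 on timeout, 2 if psql fails.
wait_for_standby_catchup() {
  local max_lag="${MAX_LAG_BYTES:-0}" timeout="${TIMEOUT_S:-30}" lag=""
  local deadline=$(( SECONDS + timeout ))
  while (( SECONDS < deadline )); do
    lag="$(max_replay_lag_bytes)" || return 2
    if [[ -n "$lag" ]] && (( lag <= max_lag )); then
      echo "standby caught up (lag=${lag} bytes)"
      return 0
    fi
    sleep 1
  done
  echo "timed out waiting for standby (last lag=${lag:-none})" >&2
  return 1
}
```

For example, MAX_LAG_BYTES=0 TIMEOUT_S=10 wait_for_standby_catchup before stopping the primary. On a busy primary the lag may never be exactly 0, which is why the threshold is a parameter.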
>
> There will always be a few seconds delay while the applications reconnect.
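
On the client side, that reconnect delay can be bounded with a retry loop. A sketch using pg_isready (shipped with PostgreSQL) against the VIP, with capped exponential backoff; RETRY_MAX and the 1/2/4/8-second schedule are assumptions to tune, and a real application would do the equivalent inside its driver or pooler:

```bash
#!/usr/bin/env bash
# Sketch of a client-side reconnect loop: retry with capped exponential
# backoff until the server behind the given host (e.g. the VIP) accepts
# connections again. Assumptions: pg_isready is on PATH; RETRY_MAX and the
# backoff schedule are hypothetical values.
wait_for_server() {
  local host="${1:?host required}" max="${RETRY_MAX:-10}"
  local attempt=1 delay=1
  while (( attempt <= max )); do
    # -q: no output; -t 2: give up on this single probe after 2 seconds
    if pg_isready -q -h "$host" -t 2; then
      echo "server at $host is accepting connections (attempt $attempt)"
      return 0
    fi
    sleep "$delay"
    # Double the delay each time, capped at 8 seconds
    delay=$(( delay * 2 > 8 ? 8 : delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
  echo "gave up after $max attempts" >&2
  return 1
}
```

pg_isready only reports that a server accepts connections; it does not distinguish a primary from a standby (a query such as SELECT pg_is_in_recovery() would).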
>
> Do the applications connect via a VIP?  That's simpler for the application.
>
> This is what I do from the not-yet-new-primary:
>
>    1. psql -h  $CurrentPrimary -c "ALTER SYSTEM SET
>    synchronous_standby_names TO '*';"
>    2. Wait a few seconds.
>    3. ssh $CurrentPrimary sudo ip del $VIP # cmd is more complicated, but
>    you get the idea
>    4. ssh $CurrentPrimary pg_ctl stop -mfast # to kill connections, has
>    to happen, no matter the solution.
>
> If you remove the VIP in step 3, the TCP connections on the client side
> are broken (they may hang around) and will not be properly terminated when
> you stop PostgreSQL in step 4. That may delay the client in detecting the
> broken TCP connection and reconnecting to the server (depending on the
> network/firewall configuration on the servers). A faster reconnect may be
> achieved if you first stop PostgreSQL and then remove the VIP.
>

Interesting.  Thanks.
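
The switchover steps, reordered as suggested (stop PostgreSQL first, then remove the VIP), could be scripted roughly as below. This is a sketch under assumptions, not the exact commands used: IFACE, SYNC_WAIT_S and DRY_RUN are hypothetical, VIP is assumed to carry a prefix length (e.g. 10.0.0.10/24), PGDATA is assumed set for pg_ctl on both hosts, and step 5 (promotion and VIP takeover) is an assumed continuation. Note that ALTER SYSTEM only writes postgresql.auto.conf, so the sketch adds a pg_reload_conf() call for synchronous_standby_names to take effect.

```bash
#!/usr/bin/env bash
# Sketch of the switchover, run from the standby that will become the new
# primary, stopping PostgreSQL BEFORE removing the VIP.
# Assumptions: IFACE, SYNC_WAIT_S, DRY_RUN are hypothetical; VIP includes a
# prefix length; PGDATA is set for pg_ctl on both hosts.

# Execute a command, or only print it when DRY_RUN is non-empty.
run() {
  if [[ -n "${DRY_RUN:-}" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

switchover() {
  local old="${CurrentPrimary:?}" vip="${VIP:?}" iface="${IFACE:?}"

  # 1. Require a synchronous standby. ALTER SYSTEM cannot run in a
  #    transaction block, so the reload is sent as a separate command.
  run psql -h "$old" -c "ALTER SYSTEM SET synchronous_standby_names TO '*'" || return 1
  run psql -h "$old" -c "SELECT pg_reload_conf()" || return 1

  # 2. Wait a few seconds (a replication lag check could replace this).
  run sleep "${SYNC_WAIT_S:-3}"

  # 3. Stop PostgreSQL first, so the server closes client TCP connections ...
  run ssh "$old" pg_ctl stop -m fast || return 1

  # 4. ... and only then remove the VIP from the old primary.
  run ssh "$old" sudo ip addr del "$vip" dev "$iface" || return 1

  # 5. Assumed continuation: promote this standby and take over the VIP.
  run pg_ctl promote || return 1
  run sudo ip addr add "$vip" dev "$iface"
}
```

With illustrative values, CurrentPrimary=db1 VIP=10.0.0.10/24 IFACE=eth0 DRY_RUN=1 switchover prints the commands in order without executing them.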

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!