RE: Fast switchover

* RE: Fast switchover
@ 2025-09-08 16:37 Klaus Darilion <[email protected]>
  2025-09-08 16:48 ` Re: Fast switchover Ron Johnson <[email protected]>
  0 siblings, 1 reply; 2+ messages in thread

From: Klaus Darilion @ 2025-09-08 16:37 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; [email protected] <[email protected]>

From: Ron Johnson <[email protected]>
Sent: Monday, September 8, 2025 6:10 PM
To: [email protected]
Subject: Re: Fast switchover

On Mon, Sep 8, 2025 at 11:03 AM legrand legrand <[email protected]> wrote:
Hello all the readers,

For some projects we need a fast manual switchover to address Near Zero downtime maintenance
(not speaking here about automated failover like those provided by HA tools, but just planned, controlled operations)

Database Physical replication switchover itself:
- initial replication (before switchover) should be synchronous or replication LAG should be controlled to prevent data loss.
- Switchover duration seems not "compressible" under a few seconds (because of primary shutdown, promotion, new standby catch up, ...)
- Application retry strategy (after disconnection) should be tuned using proper retry delay. Pooler or specific driver may help.
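
One way to check the first point before a planned switchover is to query pg_stat_replication on the primary. The following is a minimal sketch, not anyone's actual procedure from this thread: the helper names standby_lag_bytes and lag_ok, the overridable PSQL variable, and the byte threshold are illustrative assumptions, and the query assumes PostgreSQL 10 or later.

```shell
#!/bin/sh
# PSQL can be overridden; it defaults to the psql client found in PATH.
PSQL=${PSQL:-psql}

# Run against the primary: prints "application_name|sync_state|lag_bytes"
# for every connected standby.
standby_lag_bytes() {
    "$PSQL" -h "$1" -At -c "SELECT application_name, sync_state, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) FROM pg_stat_replication;"
}

# Reads the lines above on stdin. Succeeds only if at least one standby is
# listed and every standby has replayed to within max_bytes of the primary.
lag_ok() {
    max_bytes=$1
    awk -F'|' -v max="$max_bytes" '
        { n++; if ($3 == "" || $3 + 0 > max) bad = 1 }
        END { exit (n == 0 || bad) ? 1 : 0 }'
}
```

Possible usage, with $CurrentPrimary as in the steps further down: standby_lag_bytes "$CurrentPrimary" | lag_ok 0 && echo "standby caught up". With synchronous replication the check is mostly a sanity test; with asynchronous replication it is the only guard against losing the unreplayed tail.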

There will always be a few seconds delay while the applications reconnect.
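
As an illustration of a tuned retry strategy, here is a rough sketch of a client-side retry loop with a fixed short delay. The function names retry and is_primary, the attempt count, and the delay are assumptions made for the example; PGCONNECT_TIMEOUT is the standard libpq environment variable, and pg_is_in_recovery() returns false only on a primary.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, sleeping DELAY_SECONDS between attempts.
# Returns 1 if COMMAND still fails on the last allowed attempt.
retry() {
    max_attempts=$1; delay=$2; shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        sleep "$delay"
        attempt=$((attempt + 1))
    done
    return 0
}

# Hypothetical probe: succeeds only when the host answers within the connect
# timeout and reports that it is not in recovery, i.e. it is a primary.
is_primary() {
    [ "$(PGCONNECT_TIMEOUT=2 psql -h "$1" -At -c 'SELECT pg_is_in_recovery();' 2>/dev/null)" = "f" ]
}
```

Possible usage: retry 20 1 is_primary "$VIP". The short connect timeout matters as much as the retry delay, since a connection attempt to an address that has just moved may otherwise hang far longer than the switchover itself.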

Do the applications connect via a VIP?  That's simpler for the application.

This is what I do from the not-yet-new-primary:

  1.  psql -h  $CurrentPrimary -c "ALTER SYSTEM SET synchronous_standby_names TO '*';"
  2.  Wait a few seconds.
  3.  ssh $CurrentPrimary sudo ip del $VIP # cmd is more complicated, but you get the idea
  4.  ssh $CurrentPrimary pg_ctl stop -mfast # to kill connections, has to happen, no matter the solution.

If you remove the VIP in step 3, the TCP connections on the client side are broken (and may hang around), and they will not be properly terminated when you stop PostgreSQL in step 4. That may delay the clients in detecting the broken TCP connection and reconnecting to the server (depending on the network/firewall configuration on the servers). A faster reconnect may be possible if you first stop PostgreSQL and then remove the VIP.
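
The reordered steps 3 and 4 could be sketched as follows. This is only an outline under assumptions: the function names run and switchover, the DRY_RUN switch, the network device, and the data directory are hypothetical, and the ip command is a simplified stand-in for the "more complicated" real one (the VIP is assumed to be given with its prefix length).

```shell
#!/bin/sh
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# switchover PRIMARY VIP/PREFIX DEVICE PGDATA
switchover() {
    primary=$1; vip=$2; dev=$3; pgdata=$4
    # Stop PostgreSQL first: the VIP is still up, so the server can close
    # the client connections and the clients notice immediately.
    run ssh "$primary" "pg_ctl stop -m fast -D $pgdata" || return 1
    # Only then remove the VIP from the old primary.
    run ssh "$primary" "sudo ip addr del $vip dev $dev" || return 1
}
```

Possible dry run: DRY_RUN=1 switchover "$CurrentPrimary" 10.0.0.5/24 eth0 /var/lib/pgsql/data prints the two ssh commands in the order they would be executed.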

Regards
Klaus



From: Ron Johnson @ 2025-09-08 16:48 UTC (permalink / raw)
  To: [email protected] <[email protected]>

On Mon, Sep 8, 2025 at 12:37 PM Klaus Darilion <[email protected]>
wrote:

>
>
> *From:* Ron Johnson <[email protected]>
> *Sent:* Monday, September 8, 2025 6:10 PM
> *To:* [email protected]
> *Subject:* Re: Fast switchover
>
>
>
> On Mon, Sep 8, 2025 at 11:03 AM legrand legrand <
> [email protected]> wrote:
>
> Hello all the readers,
>
>
>
> For some projects we need a fast *manual* switchover to address Near Zero
> downtime maintenance
>
> (not speaking here about automated failover like those provided by HA
> tools, but just planned, controlled operations)
>
>
>
>
>
> Database Physical replication switchover itself:
>
> - initial replication (before switchover) should be synchronous or
> replication LAG should be controlled to prevent data loss.
>
> - Switchover duration seems not "compressible" under a few seconds
> (because of primary shutdown, promotion, new standby catch up, ...)
>
> - Application retry strategy (after disconnection) should be tuned using
> proper retry delay. Pooler or specific driver may help.
>
>
> There will always be a few seconds delay while the applications reconnect.
>
>
>
> Do the applications connect via a VIP?  That's simpler for the application.
>
>
>
> This is what I do from the not-yet-new-primary:
>
>    1. psql -h  $CurrentPrimary -c "ALTER SYSTEM SET
>    synchronous_standby_names TO '*';"
>    2. Wait a few seconds.
>    3. ssh $CurrentPrimary sudo ip del $VIP # cmd is more complicated, but
>    you get the idea
>    4. ssh $CurrentPrimary pg_ctl stop -mfast # to kill connections, has
>    to happen, no matter the solution.
>
> If you remove the VIP in step 3, the TCP connections on the client side
> are broken (and may hang around), and they will not be properly terminated
> when you stop PostgreSQL in step 4. That may delay the clients in detecting
> the broken TCP connection and reconnecting to the server (depending on the
> network/firewall configuration on the servers). A faster reconnect may be
> possible if you first stop PostgreSQL and then remove the VIP.
>

Interesting.  Thanks.

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

