public inbox for [email protected]
* Fast switchover
@ 2025-09-08 15:03 legrand legrand <[email protected]>
From: legrand legrand @ 2025-09-08 15:03 UTC
To: [email protected] <[email protected]>
Hello to all readers,
For some projects we need a fast manual switchover to support near-zero-downtime maintenance
(I am not talking about automated failover such as that provided by HA tools, only about planned, controlled operations).
For a switchover based on physical replication:
- Replication before the switchover should be synchronous, or replication lag should be kept under control, to prevent data loss.
- Switchover duration does not seem "compressible" below a few seconds (because of primary shutdown, promotion, new standby catch-up, ...).
- The application retry strategy (after disconnection) should be tuned with an appropriate retry delay. A pooler or a specific driver may help.
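The first point (keeping replication lag under control before a planned switchover) could be checked along these lines. This is only a sketch: `PRIMARY_HOST`, `MAX_LAG_BYTES`, `lag_query` and `lag_is_acceptable` are hypothetical names, and the threshold is a placeholder. The query relies on the `pg_stat_replication` view and `pg_wal_lsn_diff()`, which are available in PostgreSQL 10 and later.

```shell
#!/bin/sh
# Sketch: decide whether a standby is close enough to the primary for a
# planned switchover. All names below are illustrative assumptions.

MAX_LAG_BYTES="${MAX_LAG_BYTES:-0}"   # placeholder threshold, in bytes

# SQL to run on the primary: one row per connected standby, with the
# replay lag expressed in bytes of WAL.
lag_query() {
  cat <<'SQL'
SELECT application_name, sync_state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
FROM pg_stat_replication;
SQL
}

# Succeeds when the reported lag (an integer number of bytes) is within
# the threshold. An empty or non-numeric value (for example a NULL
# replay_lsn) is treated as "not acceptable".
lag_is_acceptable() {
  lag_bytes="$1"
  max="${2:-$MAX_LAG_BYTES}"
  case "$lag_bytes" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$lag_bytes" -le "$max" ]
}

# Example call, left commented out because it needs a reachable primary:
# psql -h "$PRIMARY_HOST" -At -F ' ' -c "$(lag_query)"
```

With synchronous replication the check is mostly a confirmation that `sync_state` reports `sync`; with asynchronous replication the byte threshold is what bounds the possible data loss.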
Could logical replication (bidirectional, with one instance read-write and the other read-only) be a better solution?
This solution is more complex because of sequences, DDL, large objects, and conflict resolution (if any),
but the switchover should be faster ...
What downtime can we expect in each case?
Are there any logical replication managers or admin tools available (preferably open source)?
Any feedback is welcome.
Thanks in advance.
Regards,
Pascal
* Re: Fast switchover
@ 2025-09-08 16:10 Ron Johnson <[email protected]>
parent: legrand legrand <[email protected]>
From: Ron Johnson @ 2025-09-08 16:10 UTC
To: [email protected] <[email protected]>
On Mon, Sep 8, 2025 at 11:03 AM legrand legrand <[email protected]> wrote:
> Hello all the readers,
>
> For some projects we need a fast *manual* switchover to address Near Zero
> downtime maintenance
> (not speaking here about automated failover like those provided by HA
> tools, but just planned, controlled operations)
>
>
> Database Physical replication switchover itself:
> - initial replication (before switchover) should be synchronous or
> replication LAG should be controlled to prevent data loss.
> - Switchover duration seems not "compressible" under a few seconds
> (because of primary shutdown, promotion, new standby catch up, ...)
> - Application retry strategy (after disconnection) should be tuned using
> proper retry delay. Pooler or specific driver may help.
>
There will always be a few seconds' delay while the applications reconnect.
Do the applications connect via a VIP? That's simpler for the application.
This is what I do from the not-yet-new-primary:
1. psql -h $CurrentPrimary -c "ALTER SYSTEM SET synchronous_standby_names TO '*';"
2. Wait a few seconds.
3. ssh $CurrentPrimary sudo ip del $VIP   # cmd is more complicated, but you get the idea
4. ssh $CurrentPrimary pg_ctl stop -mfast   # to kill connections; has to happen, no matter the solution
5. pg_ctl promote
6. sudo ip add $VIP
7. Replicate from new-primary to new-replica "at leisure".
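Steps 1-6 could be wrapped in a script roughly like the following. This is a sketch under stated assumptions, not the script referred to in this message: `CURRENT_PRIMARY`, `VIP_CIDR`, `VIP_DEV`, `run` and `switchover` are hypothetical names, the full `ip addr del ... dev ...` form stands in for the abbreviated `ip del` above, and the 5-second wait is arbitrary. One addition is deliberate: `ALTER SYSTEM` only writes `postgresql.auto.conf`, so the sketch calls `pg_reload_conf()` for the new `synchronous_standby_names` to take effect. `pg_ctl` is assumed to find the data directory through `PGDATA` on both hosts. By default the script only prints the commands (`DRY_RUN=1`).

```shell
#!/bin/sh
# Sketch of a planned switchover, run on the standby that is about to
# become the new primary. Host, VIP and interface values are placeholders.

CURRENT_PRIMARY="${CURRENT_PRIMARY:-old-primary.example}"
VIP_CIDR="${VIP_CIDR:-192.0.2.10/24}"
VIP_DEV="${VIP_DEV:-eth0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, anything else = execute

# Print the command in dry-run mode (quoting is not preserved in the
# printed form), otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

switchover() {
  # 1. Require synchronous replication, then reload so it takes effect.
  run psql -h "$CURRENT_PRIMARY" -c "ALTER SYSTEM SET synchronous_standby_names TO '*';" || return 1
  run psql -h "$CURRENT_PRIMARY" -c "SELECT pg_reload_conf();" || return 1
  # 2. Wait a few seconds (arbitrary value).
  run sleep 5 || return 1
  # 3. Remove the VIP from the current primary.
  run ssh "$CURRENT_PRIMARY" sudo ip addr del "$VIP_CIDR" dev "$VIP_DEV" || return 1
  # 4. Stop the current primary, disconnecting its clients.
  run ssh "$CURRENT_PRIMARY" pg_ctl stop -m fast || return 1
  # 5. Promote the local standby.
  run pg_ctl promote || return 1
  # 6. Bring the VIP up on the new primary.
  run sudo ip addr add "$VIP_CIDR" dev "$VIP_DEV" || return 1
}

switchover
```

Each step aborts the sequence on failure, so a failed VIP removal or shutdown on the old primary never leads to a promotion. Running it once in dry-run mode before the maintenance window is a cheap way to review the exact commands.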
No retry delay, since the application goes directly to the new server.
Steps 3-6 are in a script; they are what pgpool does, except that I run them myself. Step 4 is by
far the slowest. The ssh authentication delay in steps 3 and 4 is nonexistent if
you have "pre-created" an ssh socket.
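One way to "pre-create" an ssh socket is OpenSSH connection multiplexing. The sketch below is an assumption about how this could be set up, not a description of the setup used here: `CURRENT_PRIMARY`, `SSH_SOCKET` and `ssh_mux_opts` are hypothetical names, and the 10-minute persistence is a placeholder. `ControlMaster`, `ControlPath`, `ControlPersist` and `ssh -O check|exit` are standard OpenSSH features.

```shell
#!/bin/sh
# Sketch: open a multiplexed ssh connection ahead of time so that later
# ssh calls to the same host reuse it and skip authentication.
# Host name and socket path are placeholders.

CURRENT_PRIMARY="${CURRENT_PRIMARY:-old-primary.example}"
# %r, %h and %p expand to remote user, host and port. The path must not
# contain spaces, because the options are expanded unquoted below.
SSH_SOCKET="${SSH_SOCKET:-$HOME/.ssh/switchover-%r@%h-%p}"

# Options shared by every ssh call so they all use the same control socket.
ssh_mux_opts() {
  printf '%s\n' "-o ControlMaster=auto -o ControlPath=$SSH_SOCKET -o ControlPersist=10m"
}

# The calls below need a real host, so they are left commented out.
# Before the maintenance window, open the master connection:
#   ssh $(ssh_mux_opts) "$CURRENT_PRIMARY" true
# During the switchover, commands reuse the existing socket:
#   ssh $(ssh_mux_opts) "$CURRENT_PRIMARY" pg_ctl stop -m fast
# Check that the master is alive, or close it afterwards:
#   ssh $(ssh_mux_opts) -O check "$CURRENT_PRIMARY"
#   ssh $(ssh_mux_opts) -O exit "$CURRENT_PRIMARY"
```

The same options can instead be placed in a `Host` block of `~/.ssh/config`, which keeps the switchover script itself unchanged.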