From: Asad Ali
Date: Thu, 19 Sep 2024 11:00:26 +0500
Subject: Re: Automatic failback
To: Wasim Devale
Cc: pgsql-admin

Hi Wasim,

To achieve automatic failback with minimal or zero downtime during disaster recovery (DR) using Barman and PostgreSQL in your Azure setup, here’s a high-level architecture and strategy you can follow:

1. Set up Barman in the Azure West region to back up the PostgreSQL database from the Azure East region. Use streaming replication to keep the DR database up to date with the primary database.
  • Primary Database: Configure continuous WAL archiving to Barman and WAL streaming to the standby in the West region
                (archive_mode = on, archive_command = 'barman-wal-archive <barman_host> <server_name> %p').
  • Standby Database: Configure this as a hot standby (read-only), ready to be promoted in case of failover. Configure it to receive WAL data via streaming replication.
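
The settings from step 1 can be sketched as a small Python helper that renders the relevant postgresql.conf lines. This is only an illustration under assumptions: the host names `barman-west` and `pg-east`, the Barman server name `pg-east`, and the `replicator` role are hypothetical placeholders, not values from your environment.

```python
def primary_settings(barman_host: str, barman_server: str) -> dict:
    """postgresql.conf settings for the primary (east).

    barman-wal-archive takes the Barman host, the server name as
    configured in Barman, and the WAL file path (%p).
    """
    return {
        "wal_level": "replica",
        "archive_mode": "on",
        "archive_command": f"'barman-wal-archive {barman_host} {barman_server} %p'",
    }


def standby_settings(primary_host: str, repl_user: str) -> dict:
    """postgresql.conf settings for the hot standby (west).

    On PostgreSQL 12 and later the data directory also needs an empty
    standby.signal file for the server to start as a standby.
    """
    return {
        "hot_standby": "on",
        "primary_conninfo": f"'host={primary_host} user={repl_user}'",
    }


def render(settings: dict) -> str:
    """Render settings as 'name = value' lines."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(render(primary_settings("barman-west", "pg-east")))
    print(render(standby_settings("pg-east", "replicator")))
```

Generating the lines from one place is just a convenience for keeping both regions consistent; editing postgresql.conf by hand works equally well.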
2. Implement an automatic failover mechanism using a tool like Patroni or pg_auto_failover. These tools monitor the primary database and, in case of failure, automatically promote the standby database to the primary role.
  • Patroni: A cluster manager for PostgreSQL high availability that automatically promotes a standby to primary when a failure is detected.
  • pg_auto_failover: Another option that provides automatic failover between primary and standby PostgreSQL databases, making sure the standby can take over.
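
To make the "monitor and promote" idea in step 2 concrete, here is a minimal Python sketch. It is not how Patroni or pg_auto_failover are implemented (Patroni relies on a leader lock in a distributed configuration store, and pg_auto_failover on a separate monitor node). The class name and the threshold of three failed checks are my own assumptions, chosen only to show why a single failed health check should not trigger a promotion.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Hypothetical detector: promote after N consecutive failed checks."""

    max_failures: int = 3
    failures: int = 0
    promoted: bool = False

    def observe(self, primary_healthy: bool) -> bool:
        """Record one health check.

        Returns True exactly once, at the moment the standby should be
        promoted. A healthy check resets the failure counter, so a brief
        network blip does not cause a failover.
        """
        if self.promoted:
            return False
        if primary_healthy:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.promoted = True
            return True
        return False


if __name__ == "__main__":
    monitor = FailoverMonitor(max_failures=3)
    for healthy in [True, False, False, True, False, False, False]:
        if monitor.observe(healthy):
            print("promote the standby")
```

In a real deployment the existing tools are the better choice, since they also guard against two nodes acting as primary at the same time.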
3. After recovery, once the primary database in the east region becomes available again, you need to set up automatic failback. Here’s how you can handle failback:
  • Step 1: Re-establish Streaming Replication: After promoting the DR database in the west region, reconfigure the old primary in the east region as a standby by setting up streaming replication from the promoted DR database (west) back to the original primary (east).

    • Reconfigure the old primary to become a replica of the new primary (which is the DR site in the west).
    • Barman can assist with this by restoring the latest backup and setting up WAL streaming to the original region.
  • Step 2: Reverse the Failover (Failback): Once the original region is stable and its replica has caught up, you can reverse the failover with only a brief pause in writes:

    • Stop write operations on the current primary (west).
    • Perform a controlled switchover back to the original primary in the east, making it the new primary.
    • Reconfigure the DR site in the west region to again become a standby replica.

    This can be automated using Patroni or pg_auto_failover, so that transitions between primary and standby need no user intervention.
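
The two failback steps above can be sketched in Python as follows. This is a rough illustration, not a finished procedure. For Step 1 it builds a `pg_rewind` command line, which is an alternative to restoring from Barman and assumes PostgreSQL 13 or later (for `--write-recovery-conf`), a cleanly stopped east server, and `wal_log_hints = on` or data checksums. For Step 2 it compares LSNs, which you would read with `pg_current_wal_lsn()` on the west primary and `pg_last_wal_replay_lsn()` on the east standby. The paths, host names, and function names are hypothetical.

```python
def rewind_command(target_pgdata: str, source_conninfo: str) -> list:
    """Step 1: command to resynchronize the old primary (east) from the
    new primary (west) and write standby settings for it."""
    return [
        "pg_rewind",
        f"--target-pgdata={target_pgdata}",
        f"--source-server={source_conninfo}",
        "--write-recovery-conf",
        "--progress",
    ]


def lsn_to_int(lsn: str) -> int:
    """Convert a pg_lsn string such as '16/B374D848' to a byte position.

    The text form is two hexadecimal numbers: the high and the low
    32 bits of the 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_lag_bytes(primary_lsn: str, standby_replay_lsn: str) -> int:
    """WAL bytes the standby still has to replay."""
    return lsn_to_int(primary_lsn) - lsn_to_int(standby_replay_lsn)


def safe_to_switch_over(primary_lsn: str, standby_replay_lsn: str,
                        writes_stopped: bool) -> bool:
    """Step 2: switch over only after writes have stopped on the current
    primary and the standby has replayed everything."""
    return writes_stopped and replay_lag_bytes(primary_lsn, standby_replay_lsn) == 0


if __name__ == "__main__":
    # Hypothetical path and connection string.
    print(" ".join(rewind_command("/var/lib/pgsql/data", "host=pg-west user=postgres")))
    print(safe_to_switch_over("0/3000060", "0/3000060", writes_stopped=True))
```

Checking for zero lag after writes have stopped is what keeps the switchover from losing transactions; the length of the write pause is then roughly the time the standby needs to replay the last WAL and be promoted.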

4. To further minimize downtime during failback, you can use logical replication:

  • After failover, set up logical replication from the new primary (west) to the original primary (east). For this the east server has to run as a separate writable instance, because a logical replication subscriber cannot be a read-only physical standby.
  • Once logical replication has caught up, you can switch applications over to the original primary (east) with virtually no downtime.
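
As a rough sketch of step 4, the Python helper below only builds the two SQL statements involved: `CREATE PUBLICATION` on the west server and `CREATE SUBSCRIPTION` on the east server. The object names and the connection string are hypothetical, and the helper assumes the names are simple lowercase identifiers. Keep in mind that the publisher needs `wal_level = logical`, the subscriber must be able to accept writes, and the table definitions must already exist on the subscriber, since logical replication does not copy the schema.

```python
def quote_literal(value: str) -> str:
    """Quote a string as an SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def publication_sql(name: str) -> str:
    """Statement to run on the publisher (new primary, west)."""
    return f"CREATE PUBLICATION {name} FOR ALL TABLES;"


def subscription_sql(name: str, publication: str, conninfo: str) -> str:
    """Statement to run on the subscriber (original primary, east)."""
    return (
        f"CREATE SUBSCRIPTION {name} "
        f"CONNECTION {quote_literal(conninfo)} "
        f"PUBLICATION {publication};"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(publication_sql("failback_pub"))
    print(subscription_sql("failback_sub", "failback_pub",
                           "host=pg-west dbname=appdb user=replicator"))
```

Sequence values and DDL changes are not carried over by logical replication, so they need separate handling before applications are pointed back at the east region.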
This setup keeps your database available and limits downtime during failover and failback to the time needed to promote a standby or switch over.

Let me know if you have any other questions.

Best regards,
Asad Ali

On Wed, Sep 18, 2024 at 5:17 PM Wasim Devale <wasimd60@gmail.com> wrote:
> Hi All
>
> I have the Barman tool in place. Can anyone suggest how to achieve automatic failback with zero downtime?
>
> My PG database is hosted on Red Hat Linux 9. All our Azure resources are in the east region. We are planning to set up disaster recovery (DR) in the west region.
>
> Thanks,
> Wasim