From: Pavan Kumar
Date: Fri, 3 Oct 2025 09:32:49 -0500
Subject: Re: repmgr cannot bring up the standby database after switchover manaully
To: Tayyab Fayyaz
Cc: Fernando Hevia, Chris Lee, Imran Khan, pgsql-admin

Hello Tayyab Fayyaz,

> As I understand, automatically re-adding the old primary as a standby is not an out-of-the-box feature and needs to be handled manually. Is that correct?

Yes, that is correct. By default, repmgr does not take the failed primary, clean it up, rewind it, and reattach it as a standby after a failover.

On all nodes (primary & standbys):
===================================
wal_level = replica
max_wal_senders = 10 (depends on the number of nodes in the cluster)
max_replication_slots = 10 (depends on the number of nodes in the cluster)
hot_standby = on (standbys)
wal_keep_size = 512MB (or sized for your network/WAL shipping risk)
archive_mode = on (recommended)
archive_command = 'test ! -f /pgarchive/%f && cp %p /pgarchive/%f' (example; adapt)
hot_standby_feedback = on (optional; helps reduce vacuum conflicts)
shared_preload_libraries: not required by the repmgr command-line tool (repmgrd needs 'repmgr' in this list)
wal_log_hints = on (needed by pg_rewind unless data checksums are enabled)
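
The list above can be compared against an existing configuration file with a small script. This is only a sketch under stated assumptions: the helper names `pg_conf_value` and `check_replication_settings` are made up for illustration, the parser handles plain `name = value` lines only, and it ignores include directives and `postgresql.auto.conf`, so `SHOW <name>` in psql remains the authoritative check.

```shell
#!/bin/sh
# Hypothetical helper: print the last uncommented value assigned to a
# parameter in a postgresql.conf-style file (empty output if never set).
pg_conf_value() {
    # $1 = config file, $2 = parameter name
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" |
        sed -e "s/[[:space:]]*#.*$//" -e "s/^'//" -e "s/'[[:space:]]*$//" |
        tail -n 1
}

# Hypothetical helper: report the on/off style settings from the list.
# Returns non-zero if any of them is unset in the file or differs.
check_replication_settings() {
    # $1 = config file
    status=0
    for pair in wal_level=replica wal_log_hints=on hot_standby=on archive_mode=on; do
        name=${pair%%=*}
        want=${pair#*=}
        got=$(pg_conf_value "$1" "$name")
        if [ "$got" = "$want" ]; then
            echo "ok      $name = $got"
        elif [ -z "$got" ]; then
            # Unset in the file: the server default applies, check with SHOW.
            echo "unset   $name (list suggests $want)"
            status=1
        else
            echo "differs $name = $got (list suggests $want)"
            status=1
        fi
    done
    return "$status"
}
```

Usage would be along the lines of `check_replication_settings /pgdata/15/postgresql.conf`, run on each node.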

repmgr configuration file
==========================
primary node

node_id=1                           # unique per node
node_name='node_a'
conninfo='host=node_a dbname=repmgr user=repmgr port=5432'
data_directory='/pgdata/15'
use_replication_slots=yes           # if you want slots managed
failover=automatic                  # if using repmgrd for auto-failover
promote_command='repmgr standby promote -f /etc/repmgr.conf'   # you can use a shell script for it
follow_command='repmgr standby follow -f /etc/repmgr.conf'
log_file='/var/log/repmgr/repmgr.log'

standby node

node_id=2                           # unique per node
node_name='node_b'
conninfo='host=node_b dbname=repmgr user=repmgr port=5432'
data_directory='/pgdata/15'
use_replication_slots=yes           # if you want slots managed
failover=automatic                  # if using repmgrd for auto-failover
promote_command='repmgr standby promote -f /etc/repmgr.conf'   # you can use a shell script for it
follow_command='repmgr standby follow -f /etc/repmgr.conf'
log_file='/var/log/repmgr/repmgr.log'
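
Only node_id, node_name and the host in conninfo differ between the two files above, so they can be generated from one template. A minimal sketch, assuming the node name doubles as the host name as in the example; `render_repmgr_conf` and `node_ids_unique` are hypothetical helper names, not part of repmgr.

```shell
#!/bin/sh
# Hypothetical helper: print a repmgr.conf for one node, using the
# values from the example above.
render_repmgr_conf() {
    # $1 = node_id, $2 = node_name (also used as host)
    cat <<EOF
node_id=$1
node_name='$2'
conninfo='host=$2 dbname=repmgr user=repmgr port=5432'
data_directory='/pgdata/15'
use_replication_slots=yes
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr.conf'
follow_command='repmgr standby follow -f /etc/repmgr.conf'
log_file='/var/log/repmgr/repmgr.log'
EOF
}

# Hypothetical helper: succeed only when no node_id value repeats
# across the given repmgr.conf files.
node_ids_unique() {
    dups=$(sed -n 's/^[[:space:]]*node_id[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$@" |
        sort | uniq -d)
    [ -z "$dups" ]
}
```

For example, `render_repmgr_conf 2 node_b > repmgr.conf` on the standby, then `node_ids_unique` over copies of both files to catch a duplicated node_id.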


during switchover
====================
Make sure the repmgr daemons are running and not in a paused state:
repmgr -f repmgr.conf daemon status
Make sure there is no replication lag.
Run a checkpoint on the primary.
Run the switchover command.

This promotes your standby to primary and demotes the old primary to standby.
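
The steps above could be wrapped in a script that stops at the first failing command. This is a sketch, not a tested runbook: `REPMGR_CONF`, `PLAN_ONLY`, `run` and `switchover_runbook` are illustrative names, the `repmgr node check` and `--dry-run` steps are additions to the list, and the CHECKPOINT assumes the repmgr database user is allowed to issue it.

```shell
#!/bin/sh
# Illustrative variables, not repmgr settings.
REPMGR_CONF=${REPMGR_CONF:-/etc/repmgr.conf}

run() {
    # Print the command; execute it only when PLAN_ONLY is not 1.
    echo "+ $*"
    [ "${PLAN_ONLY:-0}" = "1" ] || "$@"
}

switchover_runbook() {
    # $1 = host of the current primary; run this on the standby to be promoted.
    # 1. repmgrd state on all nodes
    run repmgr -f "$REPMGR_CONF" daemon status || return 1
    # 2. node health checks, including replication lag
    run repmgr -f "$REPMGR_CONF" node check || return 1
    # 3. checkpoint on the current primary (needs sufficient privileges)
    run psql -h "$1" -U repmgr -d repmgr -c "CHECKPOINT" || return 1
    # 4. rehearse the switchover, then perform it
    run repmgr -f "$REPMGR_CONF" standby switchover --dry-run || return 1
    run repmgr -f "$REPMGR_CONF" standby switchover || return 1
}
```

Setting `PLAN_ONLY=1` before calling `switchover_runbook node_a` only prints the five commands, which is a cheap way to review them first.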

during failover
================
A standby is promoted to primary, and if you have any other standbys, they will follow the new primary once the follow command is executed.

To bring the old primary back as a standby, you need to run the node rejoin command.
Example syntax:
repmgr -f /etc/repmgr.conf node rejoin -d "host=node_b dbname=repmgr user=repmgr port=5432" --force-rewind (you can use --dry-run as well)
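
One way to combine the dry run with the real rejoin is sketched below. The function name `rejoin_old_primary` and the `REPMGR_CONF` variable are assumptions for illustration; the script is meant to run on the old primary, with PostgreSQL stopped there, and the conninfo must point at the current primary.

```shell
#!/bin/sh
# Hypothetical wrapper: rehearse the rejoin first and only run it for
# real when the dry run succeeds.
rejoin_old_primary() {
    # $1 = conninfo of the CURRENT primary (node_b in the example above)
    conf=${REPMGR_CONF:-/etc/repmgr.conf}
    repmgr -f "$conf" node rejoin -d "$1" --force-rewind --dry-run --verbose || {
        echo "dry run failed; not attempting the real rejoin" >&2
        return 1
    }
    repmgr -f "$conf" node rejoin -d "$1" --force-rewind --verbose
}
```

Called as `rejoin_old_primary "host=node_b dbname=repmgr user=repmgr port=5432"`, it returns repmgr's own exit code, so the caller can still check it.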
cases where --force-rewind will fail
==================================
Prerequisites missing: You didn’t enable wal_log_hints=on or initialize the cluster with data checksums.
If the cluster crashed hard and the data directory is corrupted, rewind can’t make sense of it.
If critical control files (like pg_control) are missing or inconsistent.
pg_rewind works by comparing timelines between the new primary and the old primary.
If the old primary has WAL records that don’t exist in the new primary’s timeline, rewind will refuse.
Example: The old primary accepted transactions after a network partition, then you promoted a standby. Those “lost” transactions make divergence irreversible.
Required WAL not available: The new primary must still have WAL history needed to reconcile the divergence.
If those WAL segments were already removed (due to low wal_keep_size, no archive, or aggressive retention), rewind cannot proceed.
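
The first case (missing prerequisites) can be checked before attempting a rewind by reading the control file with pg_controldata. A minimal sketch, assuming the English field labels `wal_log_hints setting` and `Data page checksum version` in the pg_controldata output; `rewind_prereqs_ok` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read pg_controldata output on stdin and succeed
# when either wal_log_hints is on or data checksums are enabled.
rewind_prereqs_ok() {
    awk -F': *' '
        /^wal_log_hints setting/      { hints = $2 }
        /^Data page checksum version/ { sums  = $2 }
        END { exit !(hints == "on" || sums + 0 > 0) }
    '
}
```

Usage would look like `pg_controldata /pgdata/15 | rewind_prereqs_ok || echo "pg_rewind prerequisites missing"`, run on the old primary.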


On Fri, Oct 3, 2025 at 8:51 AM Tayyab Fayyaz <tayyab.humayl@gmail.com> wrote:
Hello Pavan,

Please share the required parameters for PostgreSQL; I will compare them with my existing configuration.

As I understand, automatically re-adding the old primary as a standby is not an out-of-the-box feature and needs to be handled manually. Is that correct?

Tayyab


On Fri, 3 Oct 2025, 6:20 pm Pavan Kumar, <pavan.dba27@gmail.com> wrote:
Hello Chris,

I hope you configured the required parameters in PostgreSQL. I have noticed the same issue when the primary is idle (no activity).
Before doing a switchover, please perform a checkpoint on the primary and then run the switchover command.
Review the output of repmgr -f repmgr.conf cluster event; it will provide more information on what happened during the switchover.

Note: Make sure the repmgr daemons are running and not in pause mode before the switchover.




On Wed, Oct 1, 2025 at 3:03 PM Fernando Hevia <fhevia@gmail.com> wrote:

> In my recent experience, there was no issue starting the old primary; it came up normally. However, it resulted in a split-brain situation where the old primary continued to accept both read and write operations while still assuming the other two nodes were replicas.

Hi Tayyab,

A split-brain is definitely unexpected behavior. After issuing a failover or switchover command, always check the exit code to ensure it was successful. If it was not, you should find an indication of what went wrong in the command output or in the PostgreSQL logs.

It seems that either the previous primary couldn't be shut down or repmgr somehow failed to change it to a standby. Repmgr sets the node's role by creating the standby.signal file in the data directory. Upon startup, if Postgres finds the signal file, it will assume the standby role (provided the postgresql.conf file has the correct configuration too). I can only theorize here, but maybe repmgr failed to write the signal file in $PGDATA, either due to lack of permissions or a network failure.
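
Whether the signal file is in place, and whether it could have been written at all, can be checked directly on the old primary. A small sketch with hypothetical helper names (`configured_role`, `signal_file_writable`); it only inspects the data directory, so on a running server `SELECT pg_is_in_recovery();` is the more direct check of the actual role.

```shell
#!/bin/sh
# Hypothetical helper: report the role PostgreSQL (12 or later) would
# assume at startup, based only on the presence of standby.signal.
configured_role() {
    # $1 = data directory
    if [ -f "$1/standby.signal" ]; then
        echo standby
    else
        echo primary
    fi
}

# Hypothetical helper: succeed when the current user could create the
# signal file in the data directory.
signal_file_writable() {
    [ -d "$1" ] && [ -w "$1" ]
}
```

Both should be run as the operating system user that repmgr uses on that node, for example `configured_role "$PGDATA"`.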

The exact output would help in figuring out what went wrong.

Regards,
Fernando



On Wed, Oct 1, 2025 at 4:04 PM, Tayyab Fayyaz (tayyab.humayl@gmail.com) wrote:
Hello Fernando,

In my recent experience, there was no issue starting the old primary; it came up normally. However, it resulted in a split-brain situation where the old primary continued to accept both read and write operations while still assuming the other two nodes were replicas.

This issue occurred with the following environment:

  • OS version: RHEL 8.10

  • Postgres DB version: 14.9

  • repmgr version: 5.5.0

Tayyab

On Wed, Oct 1, 2025 at 11:52 AM Fernando Hevia <fhevia@gmail.com> wrote:

> I have 2 PostgreSQL servers. One is the primary and the other is the standby. I am trying to set up repmgr to do the switchover manually. Passwordless ssh has been set up for the postgres ID on both servers.
>
> I use this command "repmgr standby switchover --log-level=DEBUG --verbose". The standby database is promoted to primary. The previous primary database was shut down, but repmgr was not able to bring it up as a standby.

In a switchover, the primary server is shut down and restarted as a standby server after the newly promoted primary (former secondary) node has been started.
If the primary did not start, there must have been an issue, since this is not the standard behavior for a switchover command.

Have you checked the Postgres log file for the previous primary? You should find the startup failure cause in the log.
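
A quick way to pull the likely startup failure out of the log is to filter for the most severe messages. This is a sketch that assumes English log messages with the usual `FATAL:` and `PANIC:` severity prefixes; `startup_errors` is a hypothetical helper name and the log location depends on the installation.

```shell
#!/bin/sh
# Hypothetical helper: print the last 20 FATAL or PANIC lines of a
# PostgreSQL log file.
startup_errors() {
    # $1 = path to the PostgreSQL log file
    grep -E '(FATAL|PANIC):' "$1" | tail -n 20
}
```

It would be run against the most recent log file of the previous primary.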

Regards,
Fernando

On Wed, Oct 1, 2025 at 7:30 AM, Chris Lee (clee.hk@gmail.com) wrote:
Hi Tayyab,

Thanks for your information. I also want to find out whether that is the default behavior, or whether I am not configuring repmgr correctly.

Regards,
Chris

On Wed, 1 Oct 2025, 18:12 Imran Khan, <imran.k.23@gmail.com> wrote:
Hi Tayyab,

Is this the default behavior? We have a 4-node cluster but have never had an issue with switchovers.

Thanks,
Imran

On Wed, Oct 1, 2025, 1:10 PM Tayyab Fayyaz <tayyab.humayl@gmail.com> wrote:
Hello Chris,

I faced this issue: the old primary will not be added automatically as a standby; you have to add it manually.

But I wrote a script that adds the old primary as a standby once it's back online.

Tayyab


On Wed, 1 Oct 2025, 3:02 pm Chris Lee, <clee.hk@gmail.com> wrote:
Hi all,

I have 2 PostgreSQL servers. One is the primary and the other is the standby. I am trying to set up repmgr to do the switchover manually. Passwordless ssh has been set up for the postgres ID on both servers.

I use this command "repmgr standby switchover --log-level=DEBUG --verbose". The standby database is promoted to primary. The previous primary database was shut down, but repmgr was not able to bring it up as a standby.

Has anyone encountered this issue before? Thanks a lot for any suggestions.

Here are my OS and DB versions:

OS version: CentOS Stream release 8
Postgres DB version: 15.12
repmgr version: 5.5.0

Here are the repmgr conf files:
>>>>>
node_id=1  # Use 2 on standby
node_name='primary'
conninfo='host=centos804 user=repmgr dbname=repmgr password=xxx connect_timeout=15'
use_primary_conninfo_password=true
data_directory='/var/lib/pgsql/15/data'  # Adjust for your setup
pg_bindir='/usr/pgsql-15/bin'
service_start_command = 'sudo systemctl start postgresql-15'
service_stop_command  = 'sudo systemctl stop postgresql-15'
<<<<<

>>>>>
node_id=2  # Use 2 on standby
node_name='standby'
conninfo='host=centos803 user=repmgr dbname=repmgr password=xxx connect_timeout=15'
use_primary_conninfo_password=true
data_directory='/var/lib/pgsql/15/data'  # Adjust for your setup
pg_bindir='/usr/pgsql-15/bin'
service_start_command = 'sudo systemctl start postgresql-15'
service_stop_command  = 'sudo systemctl stop postgresql-15'
<<<<<

Regards,
Chris


--
Regards,

#!  Pavan Kumar
----------------------------------------------
Sr. Database Administrator..!

NEXT GENERATION PROFESSIONALS, LLC
Cell    #  267-799-3182 #  pavan.dba27 (Gtalk)
India   # 9000459083

Take Risks; if you win, you will be very happy. If you lose you will be Wise

