MIME-Version: 1.0
References: 
 <CAP0GqH7hYsFGmLg3Ge-_Lf_VEeES8yeHpOhSvn071=XRnmVxTw@mail.gmail.com>
 <CAFVRaQ3No2rEYjkuKsxNNMCP16_J_kjDm_pv0+CYwTLXsjE8OQ@mail.gmail.com>
 <CAC4eXDgNZM12t8OgKYhjbu_7NFcsu0qG5aTpdXmbN=fmmp-5Tw@mail.gmail.com>
 <CAP0GqH7oRXztf1Ti13usH9CEgBmMJ4mPMsOn+qacD8Adi+Dxew@mail.gmail.com>
 <CAGYT1XRLUstysEBwWKNVVWrUJCfahCFgmzEkFwpjQmABOaMocg@mail.gmail.com>
 <CAFVRaQ0zz-81jReYtOEo2pHXD0GzZ5sr6=mL7nOQ_EBZkRh8MA@mail.gmail.com>
 <CAGYT1XTZOMQ88zfN7T76RJTam47XyX-U1V_CqTTVhk2f5LVObA@mail.gmail.com>
 <CA+M0sHFfKRd1VTp8PZz0fD9CB=SXXsvc7HAY6Py4A+aUWSnQFQ@mail.gmail.com>
 <CAFVRaQ3wLJQj6dpT7zm7fDsUBMJHwVeQ3GQJJB_fUXL43EkdSg@mail.gmail.com>
In-Reply-To: 
 <CAFVRaQ3wLJQj6dpT7zm7fDsUBMJHwVeQ3GQJJB_fUXL43EkdSg@mail.gmail.com>
From: Pavan Kumar <pavan.dba27@gmail.com>
Date: Fri, 3 Oct 2025 09:32:49 -0500
Message-ID: 
 <CA+M0sHE6-bvgr=pMHtWHvLm711OxCeno9pAC1CchZ+=MSehWrw@mail.gmail.com>
Subject: Re: repmgr cannot bring up the standby database after switchover
 manaully
To: Tayyab Fayyaz <tayyab.humayl@gmail.com>
Cc: Fernando Hevia <fhevia@gmail.com>, Chris Lee <clee.hk@gmail.com>,
	Imran Khan <imran.k.23@gmail.com>, pgsql-admin <pgsql-admin@postgresql.org>
Content-Type: multipart/alternative; boundary="0000000000003dff65064041fc86"
Archived-At: 
 <https://www.postgresql.org/message-id/CA%2BM0sHE6-bvgr%3DpMHtWHvLm711OxCeno9pAC1CchZ%2B%3DMSehWrw%40mail.gmail.com>
Precedence: bulk

--0000000000003dff65064041fc86
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hello Tayyab,

> As I understand, automatically re-adding the old primary as a standby
> is not an out-of-the-box feature and needs to be handled manually. Is that
> correct?

Yes, that is correct. By default, repmgr *does not* take the failed
primary, clean it up, rewind it, and reattach it as a standby after a
failover.

On all nodes (primary & standbys):
===================================
wal_level = replica
max_wal_senders = 10          # depends on the number of nodes in the cluster
max_replication_slots = 10    # depends on the number of nodes in the cluster
hot_standby = on              # takes effect on standbys
wal_keep_size = 512MB         # or sized for your network/WAL shipping risk
archive_mode = on             # recommended
archive_command = 'test ! -f /pgarchive/%f && cp %p /pgarchive/%f'    # example; adapt
hot_standby_feedback = on     # optional; helps reduce vacuum conflicts
wal_log_hints = on            # needed for pg_rewind unless data checksums are enabled
shared_preload_libraries = 'repmgr'    # only needed when running repmgrd (as with failover=automatic below); otherwise leave as is
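
To compare a running node against the values above, something along these lines could work. It is only a sketch: check_settings is a name I made up, and it assumes you feed it the default unaligned psql output ("name|setting") from pg_settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of repmgr or PostgreSQL).
# Reads "name|setting" lines on stdin and flags values that differ
# from the settings suggested in this thread.
check_settings() {
    local rc=0 name actual
    while IFS='|' read -r name actual; do
        case "$name=$actual" in
            wal_level=replica|wal_level=logical)
                echo "OK   $name=$actual" ;;
            wal_log_hints=on|hot_standby=on|archive_mode=on|archive_mode=always)
                echo "OK   $name=$actual" ;;
            wal_level=*|wal_log_hints=*|hot_standby=*|archive_mode=*)
                echo "DIFF $name=$actual"
                rc=1 ;;
        esac
    done
    return "$rc"
}

# Example use on each node (assumes psql can connect locally):
#   psql -At -c "SELECT name, setting FROM pg_settings
#                WHERE name IN ('wal_level','wal_log_hints','hot_standby','archive_mode')" \
#     | check_settings
```

The function returns non-zero if any of the four settings differs, so it can be used as a gate in a larger script. Settings it does not know about are ignored.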

repmgr configuration file
==========================
primary node

node_id=1                 # unique per node
node_name='node_a'
conninfo='host=node_a dbname=repmgr user=repmgr port=5432'
data_directory='/pgdata/15'
use_replication_slots=yes           # if you want slots managed
failover=automatic                  # if using repmgrd for auto-failover
promote_command='repmgr standby promote -f /etc/repmgr.conf'    # you can use a shell script for it
follow_command='repmgr standby follow -f /etc/repmgr.conf'
log_file='/var/log/repmgr/repmgr.log'

standby node

node_id=2                 # unique per node
node_name='node_b'
conninfo='host=node_b dbname=repmgr user=repmgr port=5432'
data_directory='/pgdata/15'
use_replication_slots=yes           # if you want slots managed
failover=automatic                  # if using repmgrd for auto-failover
promote_command='repmgr standby promote -f /etc/repmgr.conf'    # you can use a shell script for it
follow_command='repmgr standby follow -f /etc/repmgr.conf'
log_file='/var/log/repmgr/repmgr.log'


During switchover
====================
1. Make sure the repmgr daemon (repmgrd) is running on every node and is not paused:
   repmgr -f repmgr.conf daemon status
2. Make sure the standby has no replication lag.
3. Run a CHECKPOINT on the primary.
4. Run the switchover command on the standby that is to become the primary.

This promotes your standby to primary and demotes the old primary to a standby.
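
The same sequence as a script, purely as a sketch. The run and switchover_checklist names are mine, the paths and conninfo are placeholders, and I am assuming the repmgr user is allowed to issue CHECKPOINT. By default it only prints the commands; set EXECUTE=1 to really run them.

```shell
#!/usr/bin/env bash
# Prints each command unless EXECUTE=1 is set, so the sequence can be reviewed first.
run() {
    if [ "${EXECUTE:-0}" = "1" ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

# Hypothetical wrapper; run it on the standby that should become primary.
#   $1 = path to repmgr.conf, $2 = conninfo of the current primary
switchover_checklist() {
    local conf=$1 primary=$2
    # Inspect this output by eye: it lists paused/stopped daemons.
    run repmgr -f "$conf" daemon status                 || return 1
    # Reports replication lag for the local standby.
    run repmgr -f "$conf" node check --replication-lag  || return 1
    # Force a checkpoint on the primary before switching.
    run psql "$primary" -c 'CHECKPOINT'                 || return 1
    # Rehearse first, then do the real switchover.
    run repmgr -f "$conf" standby switchover --dry-run  || return 1
    run repmgr -f "$conf" standby switchover
}

switchover_checklist /etc/repmgr.conf 'host=node_a dbname=repmgr user=repmgr port=5432'
```

Keeping the --dry-run step in front of the real switchover means a prerequisite problem (ssh, lag, a paused daemon) stops the script before anything is shut down.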

During failover
================
The standby is promoted to primary. If you have any other standbys,
they will follow the new primary once the follow command
is executed.

To bring the old primary back as a standby, you need to run the node rejoin
command on the old primary, pointing -d at the current primary.
Example syntax:
repmgr -f /etc/repmgr.conf node rejoin -d "host=node_b dbname=repmgr user=repmgr port=5432" --force-rewind
(You can add --dry-run to check it first.)
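
If you want to script the rejoin, a sketch could look like this. The run and rejoin_old_primary names are my own, and the conninfo is the example one from above. PostgreSQL on the old primary has to be stopped before the rejoin. By default the script only prints the commands; set EXECUTE=1 to really run them.

```shell
#!/usr/bin/env bash
# Prints each command unless EXECUTE=1 is set.
run() {
    if [ "${EXECUTE:-0}" = "1" ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

# Hypothetical wrapper; run it on the old primary with PostgreSQL stopped.
#   $1 = path to repmgr.conf, $2 = conninfo of the current primary
rejoin_old_primary() {
    local conf=$1 upstream=$2
    # Rehearsal: checks the prerequisites without changing anything.
    run repmgr -f "$conf" node rejoin -d "$upstream" --force-rewind --dry-run || return 1
    # Real rejoin, only reached if the rehearsal succeeded.
    run repmgr -f "$conf" node rejoin -d "$upstream" --force-rewind --verbose
}

rejoin_old_primary /etc/repmgr.conf 'host=node_b dbname=repmgr user=repmgr port=5432'
```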

Cases where --force-rewind will fail
====================================
Prerequisites missing: you neither enabled wal_log_hints = on nor initialized
the cluster with data checksums.
Corrupted data directory: if the cluster crashed hard and the data directory
is corrupted, rewind can't make sense of it.
Control files: critical control files (like pg_control) are missing or inconsistent.
Divergence: pg_rewind works by comparing timelines between the new primary
and the old primary, and rewinds the old primary to the point where the two
diverged.
Example: the old primary accepted transactions after a network partition,
then you promoted a standby. Those "lost" transactions exist only on the old
primary and are discarded by the rewind. If the two nodes do not share a
common timeline history at all, rewind will refuse.
Required WAL not available: The new primary must still have WAL history
needed to reconcile the divergence.
If those WAL segments were already removed (due to low wal_keep_size, no
archive, or aggressive retention), rewind cannot proceed.
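
The first case can be checked up front on the old primary from the pg_controldata output. A small sketch follows; rewind_prereq_ok is a name I made up, and the data directory in the usage comment is the example one from above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads pg_controldata output on stdin and succeeds
# if either pg_rewind prerequisite is met (wal_log_hints on, or data
# checksums enabled, i.e. a non-zero checksum version).
rewind_prereq_ok() {
    awk -F': *' '
        $1 == "wal_log_hints setting"      && $2 == "on" { ok = 1 }
        $1 == "Data page checksum version" && $2 + 0 > 0 { ok = 1 }
        END { exit !ok }
    '
}

# Example use on the old primary:
#   pg_controldata /pgdata/15 | rewind_prereq_ok && echo "pg_rewind prerequisites look fine"
```

This only covers the prerequisite check. Missing WAL or a damaged data directory still shows up only when the rejoin (or its dry run) is attempted.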


On Fri, Oct 3, 2025 at 8:51 AM Tayyab Fayyaz <tayyab.humayl@gmail.com> wrote:

> Hello Pavan,
>
> Please share required parameters for PostgreSQL, I will compare with my
> existing configuration.
>
> As I understand, automatically re-adding the old primary as a standby is
> not an out-of-the-box feature and needs to be handled manually. Is that
> correct?
>
> Tayyab
>
>
> On Fri, 3 Oct 2025, 6:20 pm Pavan Kumar, <pavan.dba27@gmail.com> wrote:
>
>> Hello Chris,
>>
>> I hope you have configured the required parameters in PostgreSQL. I have
>> noticed the same issue when the primary is idle (no activity).
>> Before doing the switchover, please perform a checkpoint on the primary and
>> then run the switchover command.
>> Review the output of repmgr -f repmgr.conf cluster event; it will provide
>> more information on what happened during the switchover.
>>
>> Note: Make sure the repmgr daemon is running and not paused before the
>> switchover.
>>
>>
>>
>>
>> On Wed, Oct 1, 2025 at 3:03 PM Fernando Hevia <fhevia@gmail.com> wrote:
>>
>>>
>>>
>>>> In my recent experience, there was no issue starting the old primary; it
>>>> came up normally. However, it resulted in a split-brain situation where the
>>>> old primary continued to accept both read and write operations while still
>>>> assuming the other two nodes were replicas.
>>>
>>>
>>> Hi Tayyab,
>>>
>>> A split-brain is definitely unexpected behavior. After issuing a
>>> failover or switchover command, always check the exit code to ensure it was
>>> successful. If not, you should find in the command output or in the
>>> PostgreSQL logs an indication of what went wrong.
>>>
>>> It seems that either the previous primary couldn't be shut down or repmgr
>>> somehow failed to change it to a standby. Repmgr sets the node's role by
>>> creating the standby.signal file in the data directory. Upon startup, if
>>> Postgres finds the signal file, it will assume the standby role (provided
>>> the postgresql.conf file has the correct configuration too). I can only
>>> theorize here, but maybe repmgr failed to write the signal file in $PGDATA
>>> either due to lack of permissions or a network failure.
>>>
>>> The exact output would help in figuring out what went wrong.
>>>
>>> Regards,
>>> Fernando
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Oct 1, 2025 at 4:04 PM, Tayyab Fayyaz (tayyab.humayl@gmail.com) wrote:
>>>
>>>> Hello Fernando,
>>>>
>>>> In my recent experience, there was no issue starting the old primary; it
>>>> came up normally. However, it resulted in a split-brain situation where the
>>>> old primary continued to accept both read and write operations while still
>>>> assuming the other two nodes were replicas.
>>>>
>>>> This issue occurred with the following environment:
>>>>
>>>>    - OS version: RHEL 8.10
>>>>    - Postgres DB version: 14.9
>>>>    - repmgr version: 5.5.0
>>>>
>>>> Tayyab
>>>>
>>>> On Wed, Oct 1, 2025 at 11:52 AM Fernando Hevia <fhevia@gmail.com> wrote:
>>>>
>>>>>
>>>>>> I have 2 PostgreSQL servers. One is the primary and the other is the
>>>>>> standby. I am trying to set up repmgr to do the switchover manually.
>>>>>> Passwordless ssh has been set up for the postgres ID on both servers.
>>>>>>
>>>>>> I use this command: "repmgr standby switchover --log-level=DEBUG
>>>>>> --verbose". The standby database was promoted to primary. The
>>>>>> previous primary database was shut down, but repmgr was not able to
>>>>>> bring it up as a standby.
>>>>>
>>>>>
>>>>> In a switchover the primary server is shut down and restarted as a
>>>>> standby server after the newly promoted primary (former secondary) node has
>>>>> been started.
>>>>> If the primary did not start, there must have been an issue, since this
>>>>> is not the standard behavior for a switchover command.
>>>>>
>>>>> Have you checked the Postgres log file for the previous primary? You
>>>>> should find the startup failure cause in the log.
>>>>>
>>>>> Regards,
>>>>> Fernando
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 1, 2025 at 7:30 AM, Chris Lee (clee.hk@gmail.com) wrote:
>>>>>
>>>>>> Hi Tayyab,
>>>>>>
>>>>>> Thanks for your information. I also want to find out whether that is
>>>>>> the default behavior, or I am not configuring repmgr correctly.
>>>>>>
>>>>>> Regards,
>>>>>> Chris
>>>>>>
>>>>>> On Wed, 1 Oct 2025, 18:12 Imran Khan, <imran.k.23@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Tayyab,
>>>>>>>
>>>>>>>  Is this a default behavior? We have 4 nodes cluster but never had
>>>>>>> issue in switchovers.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Imran
>>>>>>>
>>>>>>> On Wed, Oct 1, 2025, 1:10 PM Tayyab Fayyaz <tayyab.humayl@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello Chris,
>>>>>>>>
>>>>>>>> I faced this issue. The old primary is not added back automatically as
>>>>>>>> a standby; you have to add it manually.
>>>>>>>>
>>>>>>>> But I wrote a script that adds the old primary as a standby
>>>>>>>> once it's back online.
>>>>>>>>
>>>>>>>> Tayyab
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, 1 Oct 2025, 3:02 pm Chris Lee, <clee.hk@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I have 2 PostgreSQL servers. One is the primary and the other is
>>>>>>>>> the standby. I am trying to set up repmgr to do the switchover manually.
>>>>>>>>> Passwordless ssh has been set up for the postgres ID on both servers.
>>>>>>>>>
>>>>>>>>> I use this command: "repmgr standby switchover --log-level=DEBUG
>>>>>>>>> --verbose". The standby database was promoted to primary. The
>>>>>>>>> previous primary database was shut down, but repmgr was not able to
>>>>>>>>> bring it up as a standby.
>>>>>>>>>
>>>>>>>>> Has anyone encountered this issue before? Thanks a lot for any
>>>>>>>>> suggestions.
>>>>>>>>>
>>>>>>>>> Here are my OS and DB versions:
>>>>>>>>>
>>>>>>>>> OS version: CentOS Stream release 8
>>>>>>>>> Postgres DB version: 15.12
>>>>>>>>> repmgr version: 5.5.0
>>>>>>>>>
>>>>>>>>> Here is the repmgr conf files:
>>>>>>>>> >>>>>
>>>>>>>>> node_id=1  # Use 2 on standby
>>>>>>>>> node_name='primary'
>>>>>>>>> conninfo='host=centos804 user=repmgr dbname=repmgr password=xxx connect_timeout=15'
>>>>>>>>> use_primary_conninfo_password=true
>>>>>>>>> data_directory='/var/lib/pgsql/15/data'  # Adjust for your setup
>>>>>>>>> pg_bindir='/usr/pgsql-15/bin'
>>>>>>>>> service_start_command = 'sudo systemctl start postgresql-15'
>>>>>>>>> service_stop_command  = 'sudo systemctl stop postgresql-15'
>>>>>>>>> <<<<<
>>>>>>>>>
>>>>>>>>> >>>>>
>>>>>>>>> node_id=2  # Use 2 on standby
>>>>>>>>> node_name='standby'
>>>>>>>>> conninfo='host=centos803 user=repmgr dbname=repmgr password=xxx connect_timeout=15'
>>>>>>>>> use_primary_conninfo_password=true
>>>>>>>>> data_directory='/var/lib/pgsql/15/data'  # Adjust for your setup
>>>>>>>>> pg_bindir='/usr/pgsql-15/bin'
>>>>>>>>> service_start_command = 'sudo systemctl start postgresql-15'
>>>>>>>>> service_stop_command  = 'sudo systemctl stop postgresql-15'
>>>>>>>>> <<<<<
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>
>>

-- 
Regards,

#!  Pavan Kumar
----------------------------------------------
Sr. Database Administrator..!
NEXT GENERATION PROFESSIONALS, LLC
Cell    #  267-799-3182 #  pavan.dba27 (Gtalk)
India   # 9000459083

Take Risks; if you win, you will be very happy. If you lose you will be Wise


--0000000000003dff65064041fc86--