From: Fernando Hevia
Date: Wed, 1 Oct 2025 17:03:16 -0300
Subject: Re: repmgr cannot bring up the standby database after switchover manaully
To: Tayyab Fayyaz
Cc: Chris Lee, Imran Khan, pgsql-admin
> In my recent experience, there was no issue starting the old primary: it came up normally. However, it resulted in a split-brain situation where the old primary continued to accept both read and write operations while still assuming the other two nodes were replicas.

Hi Tayyab,

A split-brain is definitely unexpected behavior. After issuing a failover or switchover command, always check the exit code to ensure it was successful. If it was not, the command output or the PostgreSQL logs should indicate what went wrong.
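
As a minimal sketch of the exit-code check described above, a small wrapper can run any command and report its status. The function name run_and_check is hypothetical; the repmgr invocation in the closing comment is the one quoted in this thread and is not executed here.

```shell
#!/bin/sh
# run_and_check (hypothetical helper): run a command and report its exit code.
run_and_check() {
    "$@"
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "OK: $* (exit code 0)"
    else
        # A non-zero status means the command reported a failure; inspect its
        # output and the PostgreSQL log before trusting the cluster state.
        echo "FAILED: $* (exit code $rc)" >&2
    fi
    return "$rc"
}

# Intended use on the standby being promoted (not executed here):
#   run_and_check repmgr standby switchover --log-level=DEBUG --verbose
```

The wrapper returns the wrapped command's own exit code, so it can be used directly in an `if` or chained with `&&`.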

It seems that either the previous primary couldn't be shut down or repmgr somehow failed to change it to a standby. Repmgr sets the node's role by creating the standby.signal file in the data directory. Upon startup, if Postgres finds the signal file, it will assume the standby role (provided the postgresql.conf file has the correct configuration too). I can only theorize here, but maybe repmgr failed to write the signal file in $PGDATA, either due to lack of permissions or a network failure.
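
A quick way to test this theory is to look for the signal file before starting the old primary. The sketch below assumes PostgreSQL 12 or later (where standby.signal is used); the function name startup_role is hypothetical, and it deliberately ignores recovery.signal and the contents of postgresql.conf.

```shell
#!/bin/sh
# startup_role (hypothetical helper): report which role PostgreSQL would take
# on its next start, judging only by the presence of standby.signal.
startup_role() {
    datadir=$1
    if [ ! -d "$datadir" ]; then
        echo "unknown: $datadir is not a directory" >&2
        return 2
    fi
    if [ -f "$datadir/standby.signal" ]; then
        echo "standby"
    else
        # No signal file: the server would start as a normal read-write primary.
        echo "primary"
    fi
}

# Intended use, with the data directory from the repmgr.conf in this thread:
#   startup_role /var/lib/pgsql/15/data
```

On a server that is already running, `SELECT pg_is_in_recovery();` shows the actual state rather than the state implied by the file.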

The exact output would help in figuring out what went wrong.

Regards,
Fernando





On Wed, Oct 1, 2025 at 4:04 PM, Tayyab Fayyaz (tayyab.humayl@gmail.com) wrote:
Hello Fernando,

In my recent experience, there was no issue starting the old primary: it came up normally. However, it resulted in a split-brain situation where the old primary continued to accept both read and write operations while still assuming the other two nodes were replicas.
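
One way to spot the situation described here is to ask every node whether it is in recovery and count how many answer "no". The sketch below is only an illustration: count_primaries is a hypothetical helper that takes one pg_is_in_recovery() result per node as its arguments, so that the counting logic can be shown without assuming any host names.

```shell
#!/bin/sh
# count_primaries (hypothetical helper): given one pg_is_in_recovery() result
# per node ("t" = standby, "f" = primary), print how many nodes act as primary.
count_primaries() {
    n=0
    for state in "$@"; do
        if [ "$state" = "f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Each value could be collected with something like (host is a placeholder):
#   psql -h <host> -U repmgr -d repmgr -Atc "SELECT pg_is_in_recovery();"
# A result greater than 1 means more than one node accepts writes.
```

`repmgr cluster show`, run on each node, gives a similar picture from repmgr's own metadata.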

This issue occurred with the following environment:

  • OS version: RHEL 8.10

  • Postgres DB version: 14.9

  • repmgr version: 5.5.0

Tayyab

On Wed, Oct 1, 2025 at 11:52 AM Fernando Hevia <fhevia@gmail.com> wrote:
> I have 2 PostgreSQL servers. One is the primary and the other is the standby. I am trying to set up repmgr to do the switchover manually. Passwordless SSH has been set up for the postgres user on both servers.
>
> I use this command: "repmgr standby switchover --log-level=DEBUG --verbose". The standby database is promoted to primary successfully. The previous primary database was shut down, but repmgr was not able to bring it up as a standby.

In a switchover the primary server is shut down and restarted as a standby server after the newly promoted primary (former secondary) node has been started.
If the primary did not start, there must have been an issue, since this is not the standard behavior for a switchover command.

Have you checked the Postgres log file for the previous primary? You should find the cause of the startup failure in the log.
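
As a rough sketch of that log check, the helper below prints the most recent FATAL or PANIC lines from a log file. The name recent_log_errors is hypothetical, and the log location is an assumption: it depends on log_directory and log_filename, and is often a "log" directory under the data directory.

```shell
#!/bin/sh
# recent_log_errors (hypothetical helper): print the last FATAL or PANIC lines
# from a PostgreSQL log file. $1 = log file, $2 = max lines (default 20).
recent_log_errors() {
    logfile=$1
    count=${2:-20}
    grep -E 'FATAL|PANIC' "$logfile" | tail -n "$count"
}

# Intended use (the file name is a placeholder, not taken from this thread):
#   recent_log_errors /var/lib/pgsql/15/data/log/<logfile>
```

Startup failures normally show up as FATAL or PANIC entries just before the server exits, so the last few matches are usually the relevant ones.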

Regards,
Fernando


On Wed, Oct 1, 2025 at 7:30 AM, Chris Lee (clee.hk@gmail.com) wrote:
Hi Tayyab,

Thanks for your information. I also want to find out whether that is the default behavior, or whether I am not configuring repmgr correctly.

Regards,
Chris

On Wed, 1 Oct 2025, 18:12 Imran Khan, <imran.k.23@gmail.com> wrote:
Hi Tayyab,

Is this the default behavior? We have a 4-node cluster but have never had an issue with switchovers.

Thanks,
Imran

On Wed, Oct 1, 2025, 1:10 PM Tayyab Fayyaz <tayyab.humayl@gmail.com> wrote:
Hello Chris,

I faced this issue. The old primary is not added back automatically as a standby; you have to add it manually.

However, I wrote a script that adds the old primary back as a standby once it's back online.
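
The script mentioned here is not shown in the thread. As a rough illustration only, a manual re-add is commonly done with `repmgr node rejoin`, run on the old primary while its PostgreSQL instance is stopped. The sketch below just prints the command for review instead of running it; the helper name rejoin_command and the repmgr.conf path in the usage comment are assumptions.

```shell
#!/bin/sh
# rejoin_command (hypothetical helper): print the repmgr command that would
# re-attach a stopped former primary to the current primary.
# $1 = repmgr.conf path on the old primary, $2 = conninfo of the NEW primary.
rejoin_command() {
    conf=$1
    primary_conninfo=$2
    printf "repmgr -f %s node rejoin -d '%s' --force-rewind --verbose\n" \
        "$conf" "$primary_conninfo"
}

# Intended use (the conf path is assumed; centos803 is the former standby in
# this thread, that is, the new primary after the switchover):
#   rejoin_command /etc/repmgr/15/repmgr.conf 'host=centos803 user=repmgr dbname=repmgr'
```

Appending `--dry-run` to the printed command lets repmgr check the prerequisites first. Note that `--force-rewind` relies on pg_rewind, which needs wal_log_hints enabled or data checksums on the cluster.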

Tayyab


On Wed, 1 Oct 2025, 3:02 pm Chris Lee, <clee.hk@gmail.com> wrote:
Hi all,

I have 2 PostgreSQL servers. One is the primary and the other is the standby. I am trying to set up repmgr to do the switchover manually. Passwordless SSH has been set up for the postgres user on both servers.

I use this command: "repmgr standby switchover --log-level=DEBUG --verbose". The standby database is promoted to primary successfully. The previous primary database was shut down, but repmgr was not able to bring it up as a standby.

Has anyone encountered this issue before? Thanks a lot for any suggestions.

Here are my OS and DB versions:

OS version: CentOS Stream release 8
Postgres DB version: 15.12
repmgr version: 5.5.0

Here are the repmgr conf files:
>>>>>
node_id=1  # Use 2 on standby
node_name='primary'
conninfo='host=centos804 user=repmgr dbname=repmgr password=xxx connect_timeout=15'
use_primary_conninfo_password=true
data_directory='/var/lib/pgsql/15/data'  # Adjust for your setup
pg_bindir='/usr/pgsql-15/bin'
service_start_command = 'sudo systemctl start postgresql-15'
service_stop_command  = 'sudo systemctl stop postgresql-15'
<<<<<

>>>>>
node_id=2  # Use 2 on standby
node_name='standby'
conninfo='host=centos803 user=repmgr dbname=repmgr password=xxx connect_timeout=15'
use_primary_conninfo_password=true
data_directory='/var/lib/pgsql/15/data'  # Adjust for your setup
pg_bindir='/usr/pgsql-15/bin'
service_start_command = 'sudo systemctl start postgresql-15'
service_stop_command  = 'sudo systemctl stop postgresql-15'
<<<<<

Regards,
Chris