Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vESrr-00Bx1J-Eq for pgsql-admin@arkaria.postgresql.org; Thu, 30 Oct 2025 13:39:06 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1vESrq-008WDO-Di for pgsql-admin@arkaria.postgresql.org; Thu, 30 Oct 2025 13:39:05 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vESrp-008WDF-QW for pgsql-admin@lists.postgresql.org; Thu, 30 Oct 2025 13:39:05 +0000 Received: from mail-oi1-x22e.google.com ([2607:f8b0:4864:20::22e]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vESrm-004ZRH-31 for pgsql-admin@lists.postgresql.org; Thu, 30 Oct 2025 13:39:03 +0000 Received: by mail-oi1-x22e.google.com with SMTP id 5614622812f47-44f6d7b586cso646418b6e.3 for ; Thu, 30 Oct 2025 06:39:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1761831542; x=1762436342; darn=lists.postgresql.org; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=GuwIar9CYzxs7D0x6WuEsMA2cRgzsEnBKUdbRUifMUA=; b=I8qVXTLxnkK2ORQA2jtI6PEs08fk82azivuH+zf9FVIjLPnt16/Uh63bJrfYR84uOw +UFhGEI6luCtt8om5vbIDzmE2RWgRkEda9DcN8VniM352FbnPZ58zUaLtXyPPIlepL5I /bfha5QBKa9DIlJC9dUca4XGjLIj7vwkgpIvBzFhSrhmQg8W2f7cPNfsq/nW+36Xyvov 7hBeKlOYcHnuUmk917IbPEEdBDxomMVbQJT7QY7O11tKN+idYXNOFEUAGdvid6jvRhi2 7RXIW7TT3H7M2NTdPlsM3ivOI1V5qHwTIJ5skzKeog+fMBBrgNNz0izhoD7hwPm+TKnb eKGw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1761831542; x=1762436342; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=GuwIar9CYzxs7D0x6WuEsMA2cRgzsEnBKUdbRUifMUA=; b=fJGoJTBPgOMv0uEjVEmv6iMz9E+6SZjFknnOnk4CSuWSZpAvLDBVlU+e6mlo1s5CgI aYvUxPA6crylxkh0tNH1dpyy/pI4fBhL3OVlf2yL+IGn/WeJu6BUZ7WnX3z+Co2nQnHM 8I6YwOgoghXgAhn6a6EaHoLA37Lb1+9l1q4zuBj2bw8CJhlNrJFDeQM/V5XC225H7SXW vDy2napFob+WheKndfWG3zaoPUkZ2/hR99e+iut5V11cCTmUfXhEB8w5JCa5uTMXWSNY UcP3tSoLBS0VyqMj3bouvQJgqtvkhPVXGwlaSMheYFjq8P+Ltu5jfeYh5ih7Ww0bO5BR Vzsw== X-Gm-Message-State: AOJu0Yys0AtCQaZBP5URQWecfooaXLlifGnEbj6ZBsXR8kibYcvVYPvB cFfnFALQdXzIqwbPRmo3x3TsO/ToNAyLJry6Gg7+MIvpvG0+bFdVUB1iRoh+DJxIrgfLnPgZU/R iU3+1nyN45Zw3UIQvhclFuFktQVJ/zpuAcQ== X-Gm-Gg: ASbGncte1QaHooW177CIFSO8pRQ17ImatY2pj8Y8Gper6S0HW6kIDO56PWo1zHZP9rp RW/QWqb/GqXtrDix30vy12jlkq6FfDQ7iLuGQVX2CaEjd1H8YwC5f6DVgduAIp64gHEqIF0t0xe Nlrxh1g0DqEamgT7TSh1CAdNBZxWTzsXTDDsn5ZEFzfZe6xwN4vxZW4i/2jrUc/ezZ3yT7KM5Oz Wi86jGC6WyiAHrtAbP2cy7wNlPMmGFtVeeiGrBg/KJ79tLvie4l8c6XtUwRvg== X-Google-Smtp-Source: AGHT+IEWF2FR/EhIfKg+V3Q7xgYJfIlAQpRndD4DNfoub++N0xNs/zjrUWo1pSfQJrj58uP5OX0h90MB21hJa2eVHjs= X-Received: by 2002:a05:6808:c142:b0:44d:a6a8:1b5f with SMTP id 5614622812f47-44f7a3fcd6fmr3563105b6e.24.1761831541966; Thu, 30 Oct 2025 06:39:01 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Ron Johnson Date: Thu, 30 Oct 2025 09:38:50 -0400 X-Gm-Features: AWmQ_bl194fd-i0InNBcy3-cEvSfopIdMIdwZwgeT9FiLM-WeNhFYFBEHZo_LrQ Message-ID: Subject: Re: Replication Question / Issue - PRIMARY with SYNC and ASYNC Replication To: Pgsql-admin Content-Type: multipart/alternative; boundary="000000000000368c180642605f64" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000368c180642605f64 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable I'd log into Server G and tail -f the Postgresql log file, then run "pg_ctl status". Maybe then "pg_ctl start -wt9999", depending on what the log file says, On Thu, Oct 30, 2025 at 5:30=E2=80=AFAM Edwin UY wrote= : > Hi, > > Apologies for a long email. I suppose as much information as possible wil= l > help with troubleshooting > PostgreSQL is Version 11. I know, it's old, I don't have a choice due to > the application. > > There is a PRIMARY and 2 replicas, SYNC and ASYNC. > We had a network outage that rendered the application unusable for some > reason even though we still have a PRIMARY and a replication server in > place. > This is now resolved since the network is restored so I am just wanting t= o > get some guidance for a quick resolution in the future. > > Not really sure how to confirm which one is SYNC or ASYNC. > select * from pg_stat_replication from the PRIMARY shows nothing > So, I am left with no choice but to trust the documentation where it says > > SERVER -E =3D PRIMARY > SERVER -F =3D ASYNC > SERVER -G =3D SYNC > > When we have the network issue. > SERVER-E and SERVER-F are accessible and they can communicate to each > other. SERVER-G is not accessible. However the application connection is > intermittently dropping. > > The primary is showing several errors like below: > STATEMENT: ROLLBACK PREPARED 'gid' > ERROR: prepared transaction with identifier "gid" is busy > > SERVER-F is showing > FATAL: could not connect to the primary server: could not connect to > server: No route to host > Is the server running on host "SERVER-G" and accepting > TCP/IP connections on port 5432? > > Can't check SERVER-G as it is not accessible. > > I assume the prepared transactions are from the replication, not from the > application. > The error from SERVER-F is as expected since SERVER-G is not accessible. > Under this scenario, the application is intermittently having issues > connecting to the database. Not sure why. > We have re-started both databases SERVER-E and SERVER-F and clear up the > prepared transaction as well using > https://www.cybertec-postgresql.com/en/prepared-transactions. > After startup we can see the prepared transaction gone, pg_prepared_xacts > is emptty and then will show one one prepare transaction that is active > based on pg_stat_activity. > select * from pg_stat_replication still shows nothing. > To resolve the SERVER-F error, we change the recovery.conf and > change primary_conninfo to use SERVER-E. > This still did not resolve the application issue and the primary log stil= l > shows the following every so often. > > STATEMENT: ROLLBACK PREPARED 'gid' > ERROR: prepared transaction with identifier "gid" is busy > > At this stage, I thought maybe the PRIMARY and the replicas are configure= d > in such a way that the PRIMARY must receive confirmation from both that i= t > has committed too otherwise it will just continue waiting. > Under this scenario, it is not able too since SERVER-G is not accessible. > Does that make sense? > > Anyway, maybe someone will be interested to read this email and can shed > some light on this and can advise whether there's some configuration > setting somewhere that we should have modified as a temporary workaround. > Could it be because of synchronous_commit=3D on? Maybe we should have > changed this when SERVER-G is not accessible? > > Everything is back to normal once SERVER-G has become accessible again. > That is about 6 hours though :( and doesn't explain why things will stop > working normally when a replica is down and the PRIMARY is still accessib= le. > Does that mean, if both replicas are down and only the PRIMARY is > accessible, we have to totally turn off / disable replication? > If we do need to break the replica, when the PRIMARY is UP and both > replicas are inaccessible, do we just unset synchronous_standby_names? > > Any reply is much appreciated. Thanks in advance. > > Regards, > Ed > > --=20 Death to , and butter sauce. Don't boil me, I'm still alive. lobster! --000000000000368c180642605f64 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
I'd log into Server G and tail -f the Postgresql = log file, then run "pg_ctl status".=C2=A0 Maybe then "pg_ctl= start -wt9999", depending on what the log file says,

On Thu, Oct 30, 2025 at 5:30=E2=80=AFAM Edwin UY <edwin.uy@gmail.com> wrote:
Hi,

Apologi= es for a long email. I suppose as much information as possible will help wi= th troubleshooting
P= ostgreSQL is Version 11. I know, it's old, I don't have a choice du= e to the application.

There is a PR= IMARY and 2 replicas, SYNC and ASYNC.
We had a network outage that rendered the application unu= sable=C2=A0for some reason even though we still have a PRIMARY and a replic= ation server in place.
This is now resolved since the network is restored so I am just wanting = to get some guidance for a quick resolution in the future.

Not really sure how to confirm which one is SYNC or= ASYNC.
select * fro= m pg_stat_replication from the PRIMARY shows nothing
So, I am left with no choice but to trust = the documentation where it says

SERVER -E =3D PRIMARY
SERVER -F=C2=A0 =3D ASYNC
SERVER -G=C2=A0=3D SYNC

When we have the networ= k issue.
SERVER-E an= d SERVER-F are accessible and they can communicate to each other. SERVER-G = is not accessible. However the application connection is intermittently=C2= =A0dropping.

The primary is showing= several errors like below:
STATEMENT: =C2=A0ROLLBACK PREPARED 'gid'
ERROR: =C2=A0pr= epared transaction with identifier "gid" is busy

SERVER-F is showing
FATAL: =C2=A0could not connect to the primary serve= r: could not connect to server: No route to host
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Is the server running on host "SERV= ER-G" and accepting
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 TCP/IP connections on port 5432?

Can't check SERVER-G as it is not accessible.

I assume the prepared transactions are from the rep= lication, not from the application.
The error from SERVER-F is as expected since SERVER-G is no= t accessible.
Under = this scenario, the application is intermittently having issues connecting t= o the database. Not sure why.
We have re-started both databases SERVER-E and SERVER-F and clear= up the prepared transaction as well using=C2=A0https://www= .cybertec-postgresql.com/en/prepared-transactions.
After startup we can see the prepared tr= ansaction gone,=C2=A0pg_prepared_xacts is emptty and then will show one one= prepare transaction that is active based on pg_stat_activity.
=
select * from pg_stat_replication still shows nothing.
To resolve the SERVER-F error, we change= the recovery.conf and change=C2=A0primary_conninfo to use SERVER-E.=
This still did not resolve= the application issue and the primary log still shows the following every = so often.

STATEMENT: =C2=A0ROLLBACK PREPAR= ED 'gid'
ERROR: =C2=A0prepared transaction with identifier "= ;gid" is busy

At this stage, I thought maybe the PRIMARY and the r= eplicas are configured in such a way that the PRIMARY must receive confirma= tion from both that it has committed too otherwise it will just continue wa= iting.
Under this sc= enario, it is not able too since SERVER-G is not accessible. Does that make= sense?

<= /div>
Anyway, maybe someone will = be interested to read this email and can shed some light on this and can ad= vise whether there's some configuration setting somewhere that we shoul= d have modified as a temporary workaround.
Could it be because of=C2=A0synchronous_commit=3D on= ? Maybe we should have changed this when SERVER-G is not accessible?=

Everything is back to normal onc= e SERVER-G has become accessible again.
That is about 6 hours though :( and doesn't explain why things will stop=20 working normally when a replica is down and the PRIMARY is still=20 accessible.
Does tha= t mean, if both replicas are down and only the PRIMARY is accessible, we ha= ve to totally turn off / disable replication?
If we do need to break the replica, when th= e PRIMARY is UP and both replicas are inaccessible, do we just unset=C2=A0s= ynchronous_standby_names?

Any reply= is much appreciated. Thanks in advance.

Regards,
Ed



--
Death to <Redacted>, and butter sauce.Don't boil me, I'm still alive.
<Redacted> lobs= ter!
--000000000000368c180642605f64--