public inbox for [email protected]  
help / color / mirror / Atom feed
Replication Question / Issue - PRIMARY with SYNC and ASYNC Replication
3+ messages / 3 participants
[nested] [flat]

* Replication Question / Issue - PRIMARY with SYNC and ASYNC Replication
@ 2025-10-30 09:30 Edwin UY <[email protected]>
  2025-10-30 13:38 ` Re: Replication Question / Issue - PRIMARY with SYNC and ASYNC Replication Ron Johnson <[email protected]>
  0 siblings, 1 reply; 3+ messages in thread

From: Edwin UY @ 2025-10-30 09:30 UTC (permalink / raw)
  To: Pgsql-admin <[email protected]>

Hi,

Apologies for a long email. I suppose as much information as possible will
help with troubleshooting
PostgreSQL is Version 11. I know, it's old, I don't have a choice due to
the application.

There is a PRIMARY and 2 replicas, SYNC and ASYNC.
We had a network outage that rendered the application unusable for some
reason even though we still have a PRIMARY and a replication server in
place.
This is now resolved since the network is restored so I am just wanting to
get some guidance for a quick resolution in the future.

Not really sure how to confirm which one is SYNC or ASYNC.
select * from pg_stat_replication from the PRIMARY shows nothing
So, I am left with no choice but to trust the documentation where it says

SERVER -E = PRIMARY
SERVER -F  = ASYNC
SERVER -G = SYNC

When we have the network issue.
SERVER-E and SERVER-F are accessible and they can communicate to each
other. SERVER-G is not accessible. However the application connection is
intermittently dropping.

The primary is showing several errors like below:
STATEMENT:  ROLLBACK PREPARED 'gid'
ERROR:  prepared transaction with identifier "gid" is busy

SERVER-F is showing
FATAL:  could not connect to the primary server: could not connect to
server: No route to host
                Is the server running on host "SERVER-G" and accepting
                TCP/IP connections on port 5432?

Can't check SERVER-G as it is not accessible.

I assume the prepared transactions are from the replication, not from the
application.
The error from SERVER-F is as expected since SERVER-G is not accessible.
Under this scenario, the application is intermittently having issues
connecting to the database. Not sure why.
We have re-started both databases SERVER-E and SERVER-F and clear up the
prepared transaction as well using
https://www.cybertec-postgresql.com/en/prepared-transactions.
After startup we can see the prepared transaction gone, pg_prepared_xacts
is emptty and then will show one one prepare transaction that is active
based on pg_stat_activity.
select * from pg_stat_replication still shows nothing.
To resolve the SERVER-F error, we change the recovery.conf and
change primary_conninfo to use SERVER-E.
This still did not resolve the application issue and the primary log still
shows the following every so often.

STATEMENT:  ROLLBACK PREPARED 'gid'
ERROR:  prepared transaction with identifier "gid" is busy

At this stage, I thought maybe the PRIMARY and the replicas are configured
in such a way that the PRIMARY must receive confirmation from both that it
has committed too otherwise it will just continue waiting.
Under this scenario, it is not able too since SERVER-G is not accessible.
Does that make sense?

Anyway, maybe someone will be interested to read this email and can shed
some light on this and can advise whether there's some configuration
setting somewhere that we should have modified as a temporary workaround.
Could it be because of synchronous_commit= on? Maybe we should have changed
this when SERVER-G is not accessible?

Everything is back to normal once SERVER-G has become accessible again.
That is about 6 hours though :( and doesn't explain why things will stop
working normally when a replica is down and the PRIMARY is still accessible.
Does that mean, if both replicas are down and only the PRIMARY is
accessible, we have to totally turn off / disable replication?
If we do need to break the replica, when the PRIMARY is UP and both
replicas are inaccessible, do we just unset synchronous_standby_names?

Any reply is much appreciated. Thanks in advance.

Regards,
Ed


^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: Replication Question / Issue - PRIMARY with SYNC and ASYNC Replication
  2025-10-30 09:30 Replication Question / Issue - PRIMARY with SYNC and ASYNC Replication Edwin UY <[email protected]>
@ 2025-10-30 13:38 ` Ron Johnson <[email protected]>
  2025-10-30 13:52   ` Re: Replication Question / Issue - PRIMARY with SYNC and ASYNC Replication Scott Ribe <[email protected]>
  0 siblings, 1 reply; 3+ messages in thread

From: Ron Johnson @ 2025-10-30 13:38 UTC (permalink / raw)
  To: Pgsql-admin <[email protected]>

I'd log into Server G and tail -f the Postgresql log file, then run "pg_ctl
status".  Maybe then "pg_ctl start -wt9999", depending on what the log file
says,

On Thu, Oct 30, 2025 at 5:30 AM Edwin UY <[email protected]> wrote:

> Hi,
>
> Apologies for a long email. I suppose as much information as possible will
> help with troubleshooting
> PostgreSQL is Version 11. I know, it's old, I don't have a choice due to
> the application.
>
> There is a PRIMARY and 2 replicas, SYNC and ASYNC.
> We had a network outage that rendered the application unusable for some
> reason even though we still have a PRIMARY and a replication server in
> place.
> This is now resolved since the network is restored so I am just wanting to
> get some guidance for a quick resolution in the future.
>
> Not really sure how to confirm which one is SYNC or ASYNC.
> select * from pg_stat_replication from the PRIMARY shows nothing
> So, I am left with no choice but to trust the documentation where it says
>
> SERVER -E = PRIMARY
> SERVER -F  = ASYNC
> SERVER -G = SYNC
>
> When we have the network issue.
> SERVER-E and SERVER-F are accessible and they can communicate to each
> other. SERVER-G is not accessible. However the application connection is
> intermittently dropping.
>
> The primary is showing several errors like below:
> STATEMENT:  ROLLBACK PREPARED 'gid'
> ERROR:  prepared transaction with identifier "gid" is busy
>
> SERVER-F is showing
> FATAL:  could not connect to the primary server: could not connect to
> server: No route to host
>                 Is the server running on host "SERVER-G" and accepting
>                 TCP/IP connections on port 5432?
>
> Can't check SERVER-G as it is not accessible.
>
> I assume the prepared transactions are from the replication, not from the
> application.
> The error from SERVER-F is as expected since SERVER-G is not accessible.
> Under this scenario, the application is intermittently having issues
> connecting to the database. Not sure why.
> We have re-started both databases SERVER-E and SERVER-F and clear up the
> prepared transaction as well using
> https://www.cybertec-postgresql.com/en/prepared-transactions.
> After startup we can see the prepared transaction gone, pg_prepared_xacts
> is emptty and then will show one one prepare transaction that is active
> based on pg_stat_activity.
> select * from pg_stat_replication still shows nothing.
> To resolve the SERVER-F error, we change the recovery.conf and
> change primary_conninfo to use SERVER-E.
> This still did not resolve the application issue and the primary log still
> shows the following every so often.
>
> STATEMENT:  ROLLBACK PREPARED 'gid'
> ERROR:  prepared transaction with identifier "gid" is busy
>
> At this stage, I thought maybe the PRIMARY and the replicas are configured
> in such a way that the PRIMARY must receive confirmation from both that it
> has committed too otherwise it will just continue waiting.
> Under this scenario, it is not able too since SERVER-G is not accessible.
> Does that make sense?
>
> Anyway, maybe someone will be interested to read this email and can shed
> some light on this and can advise whether there's some configuration
> setting somewhere that we should have modified as a temporary workaround.
> Could it be because of synchronous_commit= on? Maybe we should have
> changed this when SERVER-G is not accessible?
>
> Everything is back to normal once SERVER-G has become accessible again.
> That is about 6 hours though :( and doesn't explain why things will stop
> working normally when a replica is down and the PRIMARY is still accessible.
> Does that mean, if both replicas are down and only the PRIMARY is
> accessible, we have to totally turn off / disable replication?
> If we do need to break the replica, when the PRIMARY is UP and both
> replicas are inaccessible, do we just unset synchronous_standby_names?
>
> Any reply is much appreciated. Thanks in advance.
>
> Regards,
> Ed
>
>

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: Replication Question / Issue - PRIMARY with SYNC and ASYNC Replication
  2025-10-30 09:30 Replication Question / Issue - PRIMARY with SYNC and ASYNC Replication Edwin UY <[email protected]>
  2025-10-30 13:38 ` Re: Replication Question / Issue - PRIMARY with SYNC and ASYNC Replication Ron Johnson <[email protected]>
@ 2025-10-30 13:52   ` Scott Ribe <[email protected]>
  0 siblings, 0 replies; 3+ messages in thread

From: Scott Ribe @ 2025-10-30 13:52 UTC (permalink / raw)
  To: [email protected]; +Cc: Pgsql-admin <[email protected]>

> On Oct 30, 2025, at 7:38 AM, Ron Johnson <[email protected]> wrote:
> 
> Under this scenario, it is not able too since SERVER-G is not accessible. Does that make sense?

Synchronous replication means that the transaction is not complete until it finishes on both servers. That's literally what "synchronous" is.







^ permalink  raw  reply  [nested|flat] 3+ messages in thread


end of thread, other threads:[~2025-10-30 13:52 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-10-30 09:30 Replication Question / Issue - PRIMARY with SYNC and ASYNC Replication Edwin UY <[email protected]>
2025-10-30 13:38 ` Ron Johnson <[email protected]>
2025-10-30 13:52   ` Scott Ribe <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox