MIME-Version: 1.0
From: Siraj G <tosiraj.g@gmail.com>
Date: Thu, 28 Nov 2024 08:34:57 -0500
Message-ID: 
 <CAC5iy61uGHGfLpR2Wded8ZniyKqREeDBdeJ4ryXZ5jwU6-oKyg@mail.gmail.com>
Subject: Out of Memory error triggering replica to transition into recovery
 mode
To: Pgsql-admin <pgsql-admin@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="000000000000a8dd370627f9260c"
Archived-At: 
 <https://www.postgresql.org/message-id/CAC5iy61uGHGfLpR2Wded8ZniyKqREeDBdeJ4ryXZ5jwU6-oKyg%40mail.gmail.com>
Precedence: bulk

--000000000000a8dd370627f9260c
Content-Type: text/plain; charset="UTF-8"

Hello Experts!

As the subject says, our replica DB has been going into recovery mode very
frequently today, causing an outage on the application side.

Here are the server details:
Server type: Compute engine
OS: Ubuntu 20
PostgreSQL: 12.2
CPUs: 64
Memory: 128GB
shared_buffers: 32GB
work_mem: 256MB
maintenance_work_mem: 3GB
max_connections: 4000
Total size of the DBs: 3TB
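
As a rough back-of-the-envelope check of these settings against the 128GB of
RAM, here is a small sketch. It assumes every allowed connection is open and
uses exactly one work_mem allocation at the same time; that is a
simplification (a single query can use work_mem more than once, and most
sessions use far less), so treat the number as an illustration rather than a
measurement. The function name is made up for this example.

```python
# Rough memory estimate from the settings listed above.
# Assumption: one work_mem allocation per connection, all connections active.
# Real usage differs: a query may use work_mem several times, or not at all.

MB_PER_GB = 1024


def worst_case_memory_gb(shared_buffers_gb, work_mem_mb, max_connections):
    """Return shared_buffers plus one work_mem per connection, in GB."""
    return shared_buffers_gb + (work_mem_mb * max_connections) / MB_PER_GB


estimate = worst_case_memory_gb(
    shared_buffers_gb=32, work_mem_mb=256, max_connections=4000
)
print(f"{estimate:.0f} GB estimated vs 128 GB of RAM")
# prints "1032 GB estimated vs 128 GB of RAM"
```

Under that assumption the configured limits allow far more memory than the
server has, which would be consistent with the out-of-memory errors in the
subject line.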

The application is designed to read data primarily from the SECONDARY
(the replica), and there are several applications of this type.
I can see tons of messages like this being written to the postgres log:
"IP, 2024-11-28 ,<db name>, <user>,1, FATAL: the database system is in
recovery mode"

This indicates that the app services are constantly retrying their
connections to the DB, and there are a lot of them.
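
To illustrate what a less aggressive client could look like, here is a minimal
sketch of retrying with capped exponential backoff. It is not what our
services do today. `connect` is a hypothetical zero-argument callable that the
application would supply (for example a wrapper around its database driver),
and the delay values are arbitrary assumptions.

```python
import time


def backoff_delays(base=0.5, factor=2.0, cap=30.0, attempts=6):
    """Yield capped exponential delays in seconds: base, base*factor, ..."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def connect_with_backoff(connect, sleep=time.sleep, **backoff_kwargs):
    """Call connect() until it succeeds, waiting longer after each failure.

    `connect` is a hypothetical callable provided by the application.
    Catching Exception is deliberately broad for this sketch; real code
    should catch only the driver's connection errors.
    """
    last_error = None
    for delay in backoff_delays(**backoff_kwargs):
        try:
            return connect()
        except Exception as exc:
            last_error = exc
            sleep(delay)  # wait before the next attempt
    if last_error is None:
        raise ValueError("attempts must be at least 1")
    raise last_error
```

With the defaults, a client would wait 0.5s, 1s, 2s, 4s, 8s and 16s between
attempts instead of reconnecting immediately while the replica is in recovery.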

Any advice on how we can improve the situation?

Regards
Siraj

--000000000000a8dd370627f9260c--