From: Siraj G
Date: Thu, 28 Nov 2024 08:34:57 -0500
To: Pgsql-admin
Subject: Out of Memory error triggering replica to transition into recovery mode
Hello Experts!

As the subject says, today our replica DB is very frequently going into recovery mode, causing an outage on the application side.

Here are the server details:
Server type: Compute Engine
OS: Ubuntu 20
PostgreSQL: 12.2
CPUs: 64
Memory: 128GB
shared_buffers = 32GB
work_mem = 256MB
maintenance_work_mem = 3GB
max_connections = 4000
Total size of the DBs: 3TB
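
For a sense of scale, here is a rough worst-case memory budget built from the settings above. It is only a sketch under a simplifying assumption: every one of the max_connections sessions runs exactly one sort or hash operation that uses its full work_mem. A single query can run several such operations at once, and the sketch ignores OS and per-backend overhead, so real peak usage can be higher. The function and constant names are made up for this illustration.

```python
# Rough worst-case memory estimate for the replica, using the settings
# reported above. ASSUMPTION: each session runs exactly ONE sort/hash
# operation that uses its full work_mem. Real queries can run several such
# operations (and each parallel worker has its own work_mem), and OS and
# per-backend overhead are ignored, so actual peak usage can be higher.

MB = 1            # all figures are in megabytes
GB = 1024 * MB

RAM = 128 * GB
SHARED_BUFFERS = 32 * GB
WORK_MEM = 256 * MB
MAX_CONNECTIONS = 4000


def worst_case_mb(connections: int, work_mem_mb: int, shared_mb: int) -> int:
    """Shared buffers plus one full work_mem allocation per session, in MB."""
    return shared_mb + connections * work_mem_mb


def sessions_that_fit(ram_mb: int, shared_mb: int, work_mem_mb: int) -> int:
    """Sessions that could each use one full work_mem before RAM runs out."""
    return (ram_mb - shared_mb) // work_mem_mb


if __name__ == "__main__":
    total = worst_case_mb(MAX_CONNECTIONS, WORK_MEM, SHARED_BUFFERS)
    print(f"worst case: {total / GB:.0f} GB vs {RAM / GB:.0f} GB RAM")
    print("sessions that fit:", sessions_that_fit(RAM, SHARED_BUFFERS, WORK_MEM))
```

With these numbers it reports about 1032 GB of potential demand against 128 GB of RAM, and only 384 sessions could each hold one full work_mem allocation before memory runs out, so a burst of concurrent heavy queries on the replica could plausibly exhaust memory.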

The application is designed to read data primarily from the SECONDARY (the replica), and there are several applications like this. I can see tons of messages like the following being written to the postgres log:

"IP, 2024-11-28 ,<db name>, <user>,1, FATAL: the database system is in recovery mode"

This indicates that the application services are constantly trying to connect to the DB, and there are a great many of them.
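
To put a number on "a great many", the rejected connection attempts can be tallied per database and user. This is a minimal sketch that assumes log_line_prefix yields comma-separated fields in the same order as the sample line above (client IP, timestamp, database, user, then one more field before the message); the field positions, the function name and the sample lines are hypothetical and would need adjusting to the actual prefix.

```python
# Sketch: tally "in recovery mode" connection rejections per (database, user).
# ASSUMPTION: log_line_prefix yields comma-separated fields in the same order
# as the sample line in this post: client IP, timestamp, database, user, one
# more field, then the message. Adjust the field positions for your prefix.
from collections import Counter
from typing import Iterable

RECOVERY_MSG = "the database system is in recovery mode"


def count_recovery_rejections(lines: Iterable[str]) -> Counter:
    """Return a Counter keyed by (database, user) for recovery-mode FATALs."""
    counts: Counter = Counter()
    for line in lines:
        if "FATAL:" not in line or RECOVERY_MSG not in line:
            continue
        # maxsplit=5 keeps any commas inside the message in the last field
        fields = [f.strip() for f in line.split(",", 5)]
        if len(fields) < 6:
            continue  # line does not match the assumed prefix layout
        _ip, _ts, db, user = fields[:4]
        counts[(db, user)] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines in the assumed format.
    sample = [
        "10.0.0.5, 2024-11-28 ,appdb, app_user,1, FATAL: the database system is in recovery mode",
        "10.0.0.6, 2024-11-28 ,appdb, app_user,1, FATAL: the database system is in recovery mode",
        "10.0.0.7, 2024-11-28 ,reports, bi_user,1, LOG: connection received",
    ]
    print(count_recovery_rejections(sample).most_common())
    # [(('appdb', 'app_user'), 2)]
```

Because the function accepts any iterable of lines, an open log file can be passed to it directly instead of the sample list.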

Any advice on how we can improve the situation?

Regards
Siraj