From: Siraj G
Date: Thu, 17 Oct 2024 10:42:46 +0530
Subject: Re: Help in dealing with OOM
To: Joe Conway
Cc: pgsql-general@lists.postgresql.org
In-Reply-To: <37308daa-3860-455f-b9f0-567a982c4554@joeconway.com>

Thanks Joe, I will set these kernel parameters.

I would also like to highlight that the issue happened on the SECONDARY. The PRIMARY has less memory and compute than the SECONDARY, so I am not sure whether something is wrong in PostgreSQL itself.

PRIMARY: 48 vCPUs & 48 GB memory
SECONDARY: 64 vCPUs & 64 GB memory

I noticed a few things that do not look right:
1. The total number of DBs is 1860. (The DB environment serves a multi-tenant product with around 1100 tenants, which means that many DBs are active.)
   Is there any guideline for the optimal number of DBs per instance? I would assume not (it should depend purely on the overall workload), but I am asking out of curiosity.
2. max_connections is set to 10000.
   I tried to reduce it to 4000 on the SECONDARY but was unable to do so (I tried this after reducing max_connections on the PRIMARY to 4000). This is the error:
   FATAL:  hot standby is not possible because max_connections = 4000 is a lower setting than on the master server (its value was 10000)
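
The FATAL message reflects a documented hot-standby rule: certain settings on a standby must be equal to or greater than the values on the primary. A minimal sketch of that comparison follows, assuming the settings have already been collected into plain dictionaries (for example from `SHOW` output). The function name `standby_violations` and the dictionary layout are illustrative, not part of PostgreSQL. Note that the standby compares against the primary value it has learned through replayed WAL, and that `max_connections` only changes on the primary after a restart, which may be why the message still reported 10000.

```python
# Settings that a hot standby needs to have set to a value equal to or
# greater than the primary's value (per the PostgreSQL hot standby docs).
STANDBY_MIN_PARAMS = (
    "max_connections",
    "max_prepared_transactions",
    "max_locks_per_transaction",
    "max_wal_senders",
    "max_worker_processes",
)


def standby_violations(primary, standby):
    """Return (name, standby_value, primary_value) for each setting that is
    lower on the standby than on the primary.

    Both arguments are hypothetical dicts of setting name -> integer value;
    settings missing from either dict are skipped.
    """
    problems = []
    for name in STANDBY_MIN_PARAMS:
        if name in primary and name in standby and standby[name] < primary[name]:
            problems.append((name, standby[name], primary[name]))
    return problems


# Values from this thread: the standby was started with 4000 while the value
# it had recorded for the primary was still 10000.
print(standby_violations({"max_connections": 10000}, {"max_connections": 4000}))
# prints [('max_connections', 4000, 10000)]
```

An empty list means the standby values are at least as high as the primary values that were passed in. It does not prove that the primary's new value has already reached the standby.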

If I am clubbing multiple things together, sorry for the clutter.

Regards,
Siraj

On Tue, Oct 15, 2024 at 12:39 AM Joe Conway <mail@joeconway.com> wrote:
> On 10/14/24 14:37, Siraj G wrote:
> > This is from the OS log (/var/log/kern.log):
> >
> > oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/system-postgresql.slice/postgresql@12-main.service,task=postgres,pid=2334587,uid=114
> > 494 Oct 14 09:58:10 gce-k12-prod-as1-erp-pg-secondary kernel: [6905020.514569] Out of memory: Killed process 2334587 (postgres) total-vm:26349584kB, anon-rss:3464kB, file-rss:0kB, shmem-rss:21813032kB, UID:114 pgtables:49024kB oom_score_adj:0
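
To make the numbers in the kernel message easier to read, here is a small sketch that pulls the per-process memory fields out of an "Out of memory: Killed process" line and converts them to GiB. The regular expressions and the helper names `parse_oom_kill` and `kb_to_gib` are illustrative assumptions written against the single line quoted above, not a general kernel-log parser.

```python
import re

# Sample line copied from the kernel log quoted above
# (the leading "494" prefix is left out).
OOM_LINE = (
    "Oct 14 09:58:10 gce-k12-prod-as1-erp-pg-secondary kernel: "
    "[6905020.514569] Out of memory: Killed process 2334587 (postgres) "
    "total-vm:26349584kB, anon-rss:3464kB, file-rss:0kB, "
    "shmem-rss:21813032kB, UID:114 pgtables:49024kB oom_score_adj:0"
)

KILL_RE = re.compile(r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\)")
KB_FIELD_RE = re.compile(r"(?P<name>[a-z-]+):(?P<kb>\d+)kB")


def parse_oom_kill(line):
    """Return pid, command name, and all '<name>:<n>kB' fields, or None."""
    match = KILL_RE.search(line)
    if match is None:
        return None
    fields = {f.group("name"): int(f.group("kb")) for f in KB_FIELD_RE.finditer(line)}
    return {"pid": int(match.group("pid")), "comm": match.group("comm"), "kb": fields}


def kb_to_gib(kb):
    """Convert kB (1024-byte units, as the kernel reports them) to GiB."""
    return kb / (1024 * 1024)


report = parse_oom_kill(OOM_LINE)
for name, kb in report["kb"].items():
    print(f"{name:>10}: {kb_to_gib(kb):8.2f} GiB")
```

For this line, `shmem-rss` comes to roughly 20.8 GiB while `anon-rss` is only a few MiB. That suggests the killed backend's resident set was almost entirely shared memory (most likely pages of the shared buffer pool it had touched) rather than private allocations, though this is a reading of the numbers and not something the log states directly.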


> 1. Do you happen to have swap disabled? If so, don't do that.
>
> 2. Does the postgres cgroup have memory.limit (cgroup v1) or memory.max
>    (cgroup v2) set?
>
> 3. If #2 answer is no, have you followed the documented guidance here
>    (in particular vm.overcommit_memory=2):
>
> https://www.postgresql.org/docs/12/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
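
The three questions above can be checked on the host with a short read-only script. This is a sketch under stated assumptions: the cgroup path is taken from the `task_memcg` value in the kernel log, the cgroup v1 file is assumed to be `memory.limit_in_bytes`, and a v1 value of 2**60 or more is treated as "no limit" as a heuristic. The helper names are hypothetical.

```python
from pathlib import Path

# Assumption: cgroup path copied from task_memcg in the quoted kernel log.
MEMCG = "system.slice/system-postgresql.slice/postgresql@12-main.service"


def read_text(path):
    """Return the file contents, or None if the file cannot be read."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


def swap_total_kb(meminfo_text):
    """Return SwapTotal in kB from /proc/meminfo text, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    return None


def cgroup_limit_bytes(raw):
    """Return the memory limit in bytes, or None when no limit is set.

    cgroup v2 writes 'max' for no limit; cgroup v1 writes a very large
    number, approximated here as anything >= 2**60 (heuristic).
    """
    value = raw.strip()
    if value == "max":
        return None
    limit = int(value)
    return None if limit >= 2**60 else limit


def overcommit_is_strict(raw):
    """True when vm.overcommit_memory is 2."""
    return raw.strip() == "2"


def collect():
    """Gather the three answers; None means 'not set' or 'could not read'."""
    meminfo = read_text("/proc/meminfo")
    overcommit = read_text("/proc/sys/vm/overcommit_memory")
    limit_raw = (read_text(f"/sys/fs/cgroup/{MEMCG}/memory.max")
                 or read_text(f"/sys/fs/cgroup/memory/{MEMCG}/memory.limit_in_bytes"))
    return {
        "swap_total_kb": swap_total_kb(meminfo) if meminfo else None,
        "cgroup_limit_bytes": cgroup_limit_bytes(limit_raw) if limit_raw else None,
        "overcommit_strict": overcommit_is_strict(overcommit) if overcommit else None,
    }


if __name__ == "__main__":
    for key, value in collect().items():
        print(f"{key}: {value}")
```

A `swap_total_kb` of 0 corresponds to question 1 (swap disabled), a `cgroup_limit_bytes` of None to "no" on question 2 (or an unreadable file), and `overcommit_strict` answers question 3.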


> --
> Joe Conway
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com