Message-ID: <03a450a1ce93e65f160350628b3d2b125750e5b4.camel@cybertec.at>
Subject: Re: Kubernetes, cgroups v2 and OOM killer - how to avoid?
From: Laurenz Albe
To: Ancoron Luciferis, pgsql-general@lists.postgresql.org
Date: Mon, 07 Apr 2025 10:10:07 +0200

On Sat, 2025-04-05 at 13:53 +0200, Ancoron Luciferis wrote:
> I've been investigating this topic every now and then but to this day
> have not come to a setup that consistently leads to a PostgreSQL backend
> process receiving an allocation error instead of being killed externally
> by the OOM killer.
>
> Why this is a problem for me? Because while applications are accessing
> their DBs (multiple services having their own DBs, some high-frequency),
> the whole server goes into recovery and kills all backends/connections.

You don't have to explain why that is a problem. It clearly is!

> Ideally, I'd find a configuration that only terminates one backend but
> leaves the others working.

There isn't, but what you really want is:

> I am wondering whether there is any way to receive a real ENOMEM inside
> a cgroup as soon as I try to allocate beyond its memory.max, instead of
> relying on the OOM killer.
>
> I know the recommendation is to have vm.overcommit_memory set to 2, but
> then that affects all workloads on the host, including critical infra
> like the kubelet, CNI, CSI, monitoring, ...

I cannot answer your question, but I'd like to make two suggestions:

1. set the PostgreSQL configuration parameters "work_mem", "shared_buffers",
   "maintenance_work_mem" and "max_connections" low enough that you don't
   go out of memory.

   A crash is bad, but a query failing with an "out of memory" error
   isn't nice either.

2. If you want to run PostgreSQL seriously in Kubernetes, put all PostgreSQL
   pods on a dedicated host machine where you can disable memory overcommit.

Yours,
Laurenz Albe
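
To make suggestion 1 a bit more concrete, here is a rough sketch of the arithmetic for checking whether a set of parameter values fits under a container memory limit. The helper names, the example values and the two multipliers are assumptions for illustration only. The estimate ignores per-backend overhead and other allocations, so treat the result as a crude lower bound on the worst case, not as a guarantee.

```python
# Rough, illustrative memory budget for one PostgreSQL instance.
# The function names and all example values are hypothetical.

MIB = 1024 * 1024


def estimate_peak_bytes(shared_buffers, work_mem, maintenance_work_mem,
                        max_connections, work_mem_uses_per_backend=1,
                        maintenance_workers=1):
    """Return a crude estimate of peak memory use in bytes.

    A single query can allocate work_mem more than once (for example
    once per sort or hash node); work_mem_uses_per_backend is an
    assumed multiplier that models this.
    """
    backends = max_connections * work_mem_uses_per_backend * work_mem
    maintenance = maintenance_workers * maintenance_work_mem
    return shared_buffers + backends + maintenance


def fits(limit_bytes, **settings):
    """True if the estimate stays at or below the given memory limit."""
    return estimate_peak_bytes(**settings) <= limit_bytes


# Example values, chosen only to show the calculation.
example = dict(
    shared_buffers=1024 * MIB,
    work_mem=4 * MIB,
    maintenance_work_mem=64 * MIB,
    max_connections=100,
    work_mem_uses_per_backend=2,
    maintenance_workers=3,
)

peak = estimate_peak_bytes(**example)
print(f"estimated peak: {peak // MIB} MiB")
print("fits in a 2 GiB limit:", fits(2048 * MIB, **example))
```

If the estimate does not fit, lowering "max_connections" or "work_mem" usually moves it the most, because the two are multiplied together.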