Date: Sat, 5 Apr 2025 13:53:00 +0200
To: pgsql-general@lists.postgresql.org
From: Ancoron Luciferis
Subject: Kubernetes, cgroups v2 and OOM killer - how to avoid?

Hi,

I've been investigating this topic on and off, but to this day I have not found a setup in which a PostgreSQL backend process consistently receives an allocation error instead of being killed externally by the OOM killer.

Why is this a problem for me? Because when a backend is killed while applications are accessing their DBs (multiple services, each with its own DB, some of them high-frequency), the whole server goes into recovery and terminates all backends/connections.

While my applications are written to tolerate that, it also means that during that time, especially for the high-frequency apps, events pile up, which leads to a burst as soon as connectivity is restored. This in turn causes peaks in resource usage in other places (event store, in-memory buffers in the apps, ...), which sometimes triggers a whole series of OOM killer events, just because one analytics query went overboard.

Ideally, I'd find a configuration that terminates only the one backend and leaves the others working.

I am wondering whether there is any way to receive a real ENOMEM inside a cgroup as soon as a process tries to allocate beyond the cgroup's memory.max, instead of relying on the OOM killer.

I know the recommendation is to set vm.overcommit_memory to 2, but that affects all workloads on the host, including critical infrastructure such as the kubelet, CNI, CSI, monitoring, ...

I have already gone through and tested the obvious:
https://www.postgresql.org/docs/current/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT

And yes, I know that Linux cgroups v2 memory.max is not an actual hard limit:
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-interface-files

Any help is greatly appreciated!

Cheers,

Ancoron