Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w8IP5-000OSL-07 for pgsql-hackers@arkaria.postgresql.org; Thu, 02 Apr 2026 13:48:11 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w8IP3-006DA1-2l for pgsql-hackers@arkaria.postgresql.org; Thu, 02 Apr 2026 13:48:10 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w8IP3-006D9s-1i for pgsql-hackers@lists.postgresql.org; Thu, 02 Apr 2026 13:48:09 +0000 Received: from mail-lf1-x135.google.com ([2a00:1450:4864:20::135]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w8IP1-00000000Cqy-2q6j for pgsql-hackers@postgresql.org; Thu, 02 Apr 2026 13:48:09 +0000 Received: by mail-lf1-x135.google.com with SMTP id 2adb3069b0e04-5a12cd0bcd8so1123791e87.3 for ; Thu, 02 Apr 2026 06:48:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1775137687; cv=none; d=google.com; s=arc-20240605; b=ITT/79f2Kdn3Z1guC9MByr7jlzZTvBfIi+uYRlsQ43f8taUKe9ZTRRW12ZNrO/VxYK TxLGjjfZ5hU9Ext1NYppQ3yU0cWbhvuwvFLpo/TcOEaG4ZSJFWnTP6kuvkUgJDWin0mD gPa3x6CjMtZQtIXcAaDjOzin2JCTQp8c6sFv88CBrAC14YirKjH/6YpGpeQApVqtBDfB Kr2W/Ycfl0Yb58Z9+0lMaaqtkTl6QpShUhkSWhT/dzIuD4jnV12as6Tar/58rSkqHHp/ E8eO9BvfqieXeAIda9PHxHri8zKAPk1llpb9d4qlq/iqgkhr/wlGHI1hGk7bnaXGCPPh nhog== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=X8jOv3m06YHpWJrgTLa/hHgEO05gwbCvFbqsJrqOKyc=; fh=ZwOfFCzzD1PfrnIEMKnv+q0YaasGqSUXiBr0nStqtHY=; b=GuIBMt9rrR45UFS0HPm2Xr9ZTn6AHyNam4tsJNgbaplFavRS0+6cShncRObUWgBYTN l1riOQZO+LO/jZ3Uv28xjqIgoKw9tmjL7GquuBBz5E1G5p0h/PpVdzGk+cTbduZa6CIW I8Afo0FdoZnFJ8DPmu6nd+y8PMYlsF8ln7jPmowyOJmf+yyceyqM+63u+KnkLJ6kdBfJ voPQWqJh7MEXrJ0PXiTPSqLueMActGA3A4QUiJ8Ov6o5j7uYQlWESNjmdpqhMUm4oBw7 l9rAUWkHDy7KO9/6R5W6864k64rb/y3/Bc77zZIIqN2jdfzKThNO9tOPINAhs0qQ5Zp0 g01A==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1775137687; x=1775742487; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=X8jOv3m06YHpWJrgTLa/hHgEO05gwbCvFbqsJrqOKyc=; b=mZpZd0T1LU8FprBuUoEQUBIu8y21EksYFSXMeTDI7XSv/HWSlMJPTTmv6BeE1D3KrC Ka4alVmupkOskpmrxkFGIDYbBbriptsRpiBQ0pMPw2JE/vWY/1WJy/JhmjR2cYVeEyjX 6iatuFVKNiyOanczwM3K4RNlD1zmJmCzp+1vN9DPHPeQm2AMPVa+m+YyQHJq9MEc6Sa1 NpOORY5dLr9df2zIytB11dTOCWco566fXzH2bgNwBdquOkbg0VfCHcb9b8uEM6vIjU+7 5IQK69nN937ANdA7J+xH43I1FZLk8qVgVNxtd0pM/AfA3jXbWoaTPzZVlHLKR9bR1Czg BeVQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1775137687; x=1775742487; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=X8jOv3m06YHpWJrgTLa/hHgEO05gwbCvFbqsJrqOKyc=; b=Jb0WQdWNA4khv582AzHgD5/M/zK74kqCd2aUqOOvMeMpg+WQRXJVQGQ8GQps06IhEN QlwlZt6yyY8jvARvOtxOaqAie/51ICu2nw5S8ybE/Hu4yUVsXTtDOQt9eulBFTilfXjQ 9Bm9a4IufcpoFlgeGuHFFWCJm0OsIDg5FecFBLygoedmxH2fpgzZdBxAljWw6EzIb/MC AdWJjoFlW6SbmPP5bn1d5NBeHLroIC2j8ELjTDYWK8F1j1GUdnZbSuU1tER7exq9yghA CnEGmWcyIDuWnY6qila0JlNxrQpPb5SuDj10po6/rnx/4fGNSaZpTS+CUd1UoaEUW1IL hnlQ== X-Forwarded-Encrypted: i=1; AJvYcCW3rK9iogNyn0ZS3EbJoQzjucNig3PVg0coeoGF4FKel5eKJeDwvRZ6Es3eZcBO2uFwKun4I46uMijZU4et@postgresql.org X-Gm-Message-State: AOJu0Yz6YY47gwf0eydTwBn+1Y5Au9PffF7g2veV380X88oBJTfLVeP3 XORZIcsZfff8LCErz75e4MH/hXmNKB4Z8DCJWpEuU4H41BBx5yCkNlhaPwCsSdMTTEw8d/NN31B a3ucUJk0FDNF284umDlQGFWIqSCvKphw= X-Gm-Gg: ATEYQzyGF8ZLUM4aFAtgtYizP1QUGqeIhWXKcUlu2SVzw9uomFFa8CAgGJfpIfYQMpt iIxn5sCNjetDjPFnVJqiCYyNyzFWZva09WmrUHVtT9xPRnIpRqhKyiVWWLkXoT71Gh3t5nuktGY gbhW78hsvcSmlkGEU9qtYOpzgm5T4DviuD4pVRZgXn009IQM8YeURB0fX52wSjdlQ9vvIkaiSA7 hamPjZJA0YFsy1n/Ja/kR93NEjrFgAoZDXBTKhRBiP+D0vsyIMrSllWV09Xq78r32bO8m1NKpKh bG3YVpzEN3qKa+sV7t4eniT8oReJ5UR4FssD5K18MSmIckNojCV9oRqmqNL3p54g0gdZ1CUv9LA Cunid X-Received: by 2002:a05:6512:39d3:b0:5a2:c094:a8a6 with SMTP id 2adb3069b0e04-5a2c764364dmr1100893e87.1.1775137686818; Thu, 02 Apr 2026 06:48:06 -0700 (PDT) MIME-Version: 1.0 References: <01ab1d41-3eda-4705-8bbd-af898f5007f1@iki.fi> <2981bb36-6bbe-4bdc-9a94-29b1114c79bd@vondra.me> <3026ec05-f664-4ebe-8bf6-0a1218b234ec@iki.fi> <19945803-6bcc-40fe-a14a-7dc5c462ed80@iki.fi> In-Reply-To: From: Matthias van de Meent Date: Thu, 2 Apr 2026 15:47:54 +0200 X-Gm-Features: AQROBzB3ltv4PpPAre36EVQKYTNMNZbPSCUMsZC5VJzuIomWqp36L1u1QFz7GHU Message-ID: Subject: Re: Shared hash table allocations To: Heikki Linnakangas Cc: Tomas Vondra , "pgsql-hackers@postgresql.org" , Robert Haas , Rahila Syed Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, 2 Apr 2026 at 13:52, Heikki Linnakangas wrote: > > On 02/04/2026 13:24, Matthias van de Meent wrote: > > On Tue, 31 Mar 2026 at 23:25, Heikki Linnakangas wrote: > >> > >> 0003: In patch 0003 I removed that flexibility by marking them both with > >> HASH_FIXED_SIZE, and making init_size equal to max_size. That also stops > >> the hash tables from using any of the other remaining wiggle room, > >> making them truly fixed-size. > > > > I think this patch finally gave me a good reason why PROCLOCK would've > > needed to be allocated with double the sizes of LOCK: > > > > LOCK is (was) initialized with only 50% of its max capacity. If > > PROCLOCK was initialized with the same parameters and all spare shmem > > is then allocated to other processes, then backends wouldn't be able > > to safely use max_locks_per_transaction. To guarantee no OOMs when all > > backends use max_locks_per_transaction, PROCLOCK's size must be > > doubled to make sure PROCLOCK has sufficient space. (The same isn't > > usually an issue for LOCK, because it's very likely backends will > > operate on the same tables, and thus will be able to share most of the > > LOCK structs.) > > Hmm, I don't know if that makes sense. Code and mailing history indicate it's not the reason, but there is no other sane reason why PROCLOCK would *not* be sized to max_locks_per_transaction * MaxBackends. At least with this reasoning the minimum size is exactly that. > It can happen that you have a lot > of backends acquiring the same, smaller set of locks, growing PROCLOCK > so that it uses up all the available wiggle room, and LOCK can never > grow from its initial size, 1/2 * max_locks_per_transactions * > MaxBackends. If the workload then changes so that every backend tries to > acquire exactly max_locks_per_transactions locks, but this time each > lock is on a different object, you will run out of shared memory at 1/2 > the size of what you expected. > > The opposite can't happen, because PROCLOCK is always at least as large > as LOCK. It doesn't matter what you set PROCLOCK's initial size to, it > will grow together with LOCK, and you will not run out of shared memory > before PROCLOCK has grown up to max_locks_per_transactions * MaxBackends > anyway. > > > Now that LOCK is fully allocated, I think the size doubling can be > > removed, or possibly parameterized for those that need it. > > I don't think that follows. The 2x factor is pretty arbitrary, but it's > still a fair assumption that many backends will be acquiring locks on > the same objects so you need more space in PROCLOCK than in LOCK. I agree that we'll *probably* have more PROCLOCKs in use than LOCKs. But max_locks_per_transaction (MLPT) to me indicates that it is an indicator of the maximum number of locks taken by a transaction, and transaction locks have a 1:1 correspondence with PROCLOCKs (as long as we ignore fast-path locking). Adjusting that value by an arbitrary factor does not many any sense. The user configured a value X, so we should use that value X. Possibly there could be adjustments we need to make to give ourself some breathing room (it's not uncommon to overallocate by a constant factor to allow evict-after-insert patterns in caches), but I can't explain a blanket doubling of usage "because we have a hunch LOCK usage will be lower than PROCLOCK usage" when the user specified a value that would/should map 1:1 against PROCLOCKs scaling as anything other than plainly wasting memory. > I don't know how true that assumption is. It feels right for OLTP > applications. But the situation where I've hit max_locks_per_transaction > is when I've tried to create one table with thousands or partitions. Or > rather, when I try to *drop* that table. In that situation, there's just > one transaction acquiring all the locks, so the PROCLOCK / LOCK ratio is 1. > We could parameterize it, but I feel that's probably overkill and > exposing too much detail to users. At the end of the day, if you hit the > limit, you just bump up max_locks_per_transactions. Or, if it's for DROP, you could use a phased dropping scheme, where you spread the operation across many transactions by dropping a subset of the partitions in each transaction. It takes more careful execution and more time, but it allows you to avoid hitting the limits and starving other backends of lock slots, and avoids requiring postmaster restarts. > If there are two > settings, it's more complicated; which one do you change? You probably > don't mind wasting the few MB of memory that you could gain by carefully > tuning the LOCK / PROCLOCK factor. Yes, that would be more complicated, but we have similar factors elsewhere (hash_mem_multiplier, various costs, weights). We wouldn't even have to use a factor, we could just as well use a new, more direct `max_unique_locks_per_transaction`, which we'd use to scale the LOCK hash. Note that with our current default settings we're spending 11kiB (= 64 * (64+24)) per backend on what I would consider oversized PROCLOCK allocations. With MLPT=128, that doubles to 22kiB per backend. Every 50 max_backends, that'd be ~1.1MB of shared memory allocated in excess of user's requested configuration. Kind regards, Matthias van de Meent