Message-ID: <83e37829-0d94-49b2-ad48-5feb7b5d5e44@iki.fi>
Date: Thu, 2 Apr 2026 17:14:46 +0300
From: Heikki Linnakangas
To: Ashutosh Bapat
Cc: Tomas Vondra, "pgsql-hackers@postgresql.org", Robert Haas, Rahila Syed
Subject: Re: Shared hash table allocations
References: <01ab1d41-3eda-4705-8bbd-af898f5007f1@iki.fi> <2981bb36-6bbe-4bdc-9a94-29b1114c79bd@vondra.me> <3026ec05-f664-4ebe-8bf6-0a1218b234ec@iki.fi> <19945803-6bcc-40fe-a14a-7dc5c462ed80@iki.fi>

On 02/04/2026 15:55, Ashutosh Bapat wrote:
> When we "allocate" shared memory, we are just allocating space on
> systems which use mmap. The memory gets allocated only when it is
> touched. The wiggle room as a whole is never touched during
> initialization. Those pages get allocated when wiggle room is used -
> i.e. when the entries beyond initial number are allocated. By
> allocating maximal hash tables, I was worried that we will allocate
> more memory than required. But that's not true since a 4K memory page
> fits only 50-60 entries - far less than the default configuration
> permits. Most of the memory for the hash table will be allocated as
> the entries as used.

Hmm, that's a good point about untouched memory not being allocated. I
think it's fine, though.

With small changes on top of the earlier refactorings from this thread,
we could stop pre-allocating all the elements when a shared memory hash
table is created, and have ShmemHashAlloc() allocate them on the fly.
Instead of doing them as anonymous allocations like we do with
ShmemAlloc() today, the allocations could come from the pre-allocated
region dedicated to the hash table. You'd still get the same
determinism and visibility in pg_shmem_allocations, but you could avoid
actually touching the pages until they're needed. Not sure it's worth
the trouble.

> The second hazard of increasing hash table size is the hash table
> access becomes slower as it becomes sparse [1]. I don't think it shows
> up in performance but maybe worth trying a trivial pgbench run, just
> to make sure that default performance doesn't regress.

Interesting, but yeah I don't think that's going to be measurable.
I did some quick testing with a test function that just locks and
unlocks relations:

#include "postgres.h"
#include "fmgr.h"
#include "storage/lmgr.h"

PG_FUNCTION_INFO_V1(test_lock_bench);
Datum
test_lock_bench(PG_FUNCTION_ARGS)
{
    int32       num_distinct_locks = PG_GETARG_INT32(0);
    int32       num_acquires = PG_GETARG_INT32(1);
    LOCKMODE    lockmode = AccessExclusiveLock;

    /* Start from OIDs that are unlikely to belong to real relations */
#define FIRST_RELID 1000000000

    for (int32 i = 0; i < num_acquires; i++)
    {
        /* Cycle through num_distinct_locks different relation OIDs */
        Oid         relid = FIRST_RELID + i % num_distinct_locks;

        /* Release the lock taken on this OID in the previous cycle */
        if (i >= num_distinct_locks)
            UnlockRelationOid(relid, lockmode);
        if (!ConditionalLockRelationOid(relid, lockmode))
        {
            elog(LOG, "could not acquire lock, iteration %d", i);
            break;
        }
    }

    PG_RETURN_VOID();
}

With test_lock_bench(1, 5000000), I don't see any meaningful
difference, i.e. it's within 1-2 %, with anything from
max_locks_per_transaction=10 to max_locks_per_transaction=128. With
more distinct locks involved, the caching effects might be bigger, and
maybe you'd see a difference because of more or fewer collisions. Spot
testing some values on my laptop, I don't see anything that would worry
me though.

> The increase in memory usage is 3MB, which is fine usually. I mean, we
> didn't hear any complaints when we increased the default size of the
> shared buffer pool - this is much less than that. But why do you want
> to double the max_locks_per_transaction? I first thought it's because
> the hash table size is anyway a power of 2. But then the size of the
> hash table is actually max_locks_per_transaction * (number of backends
> + number of prepared transactions). What we want is the default
> max_locks_per_transaction such that 14927 locks are allowed. Playing
> with max_locks_per_transaction using your script 109 seems to be the
> number which will give us 14951 locks. It looks (and is) an odd
> number. If we are worried about memory increase, that's the number we
> should use as default and then write a long paragraph about why we
> chose such an odd-looking number :D.

My first thought was actually to set max_locks_per_transaction=100,
making it a nice round number :-). But then the neighboring default of
max_pred_locks_per_transaction=64 looks weird. We could reduce it to
max_pred_locks_per_transaction=50 to make it fit in, but it feels a
little arbitrary to change it just for aesthetic reasons.

> I think we should highlight the change in default in the release notes
> though. The users which use default configuration will notice an
> increase in the memory. If they are using a custom value, they will
> think of bumping it up. Can we give them some ballpark % by which they
> should increase their max_locks_per_transaction? E.g. double the
> number or something?

I don't think people who are using the defaults will notice. I'm
worried about the people who have set max_locks_per_transaction
manually, and now effectively get less lock space for the same setting.
Yeah, doubling the previous value is a good rule of thumb.

- Heikki
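
For anyone following along, the sizing relationship quoted earlier in
this message (max_locks_per_transaction * (number of backends + number
of prepared transactions)) can be sketched in a few lines of C. This is
only an illustration under stated assumptions: lock_table_slots() is a
made-up helper, not a PostgreSQL function, and the backend counts used
with it are invented. It also ignores any rounding or extra room in the
real hash table, so it is not expected to reproduce the 14927 and 14951
figures discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of PostgreSQL): nominal number of lock
 * slots for a given setting, using the relationship quoted in the thread:
 *
 *   max_locks_per_transaction * (backends + prepared transactions)
 */
size_t
lock_table_slots(size_t max_locks_per_transaction,
                 size_t num_backends,
                 size_t num_prepared_xacts)
{
    size_t      holders = num_backends + num_prepared_xacts;

    /* Guard against overflow in the multiplication below */
    assert(holders == 0 || max_locks_per_transaction <= SIZE_MAX / holders);

    return max_locks_per_transaction * holders;
}
```

With 125 backends and no prepared transactions (assumed numbers),
lock_table_slots(64, 125, 0) is 8000 and lock_table_slots(128, 125, 0)
is 16000, so doubling the setting doubles the nominal slot count.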