Re: Shared hash table allocations

12+ messages / 3 participants

* Re: Shared hash table allocations
@ 2026-03-31 21:25  Heikki Linnakangas <[email protected]>
  0 siblings, 2 replies; 12+ messages in thread

From: Heikki Linnakangas @ 2026-03-31 21:25 UTC (permalink / raw)
  To: Tomas Vondra <[email protected]>; pgsql-hackers; Robert Haas <[email protected]>; Rahila Syed <[email protected]>

On 31/03/2026 01:02, Heikki Linnakangas wrote:
> I wonder if we should change the defaults somehow. In usual 
> configurations, people are currently getting much more lock space than 
> you'd expect based on max_connections and max_locks_per_transaction, and 
> after these patches, they'll get much fewer locks. It might be prudent
> to bump up the default max_locks_per_transaction setting so that you'd get
> roughly the same amount of locks in the default configuration.

I did some testing of the memory usage and how removing the wiggle room 
affects the number of locks you can acquire. Attached are the test 
procedures I used, and proposed patches. The patches are new, designed 
to just change the parameters of the hash tables and shmem calculations 
with no other changes. They don't include the refactorings we've 
discussed so far in this thread. My plan is to commit these new patches 
first, and those other refactorings after that. Once these new patches 
are committed, the refactorings won't materially change the overall
memory usage or how it's divided between different hash tables; all
those effects are in these new patches.


master: With the default configuration on master, the attached test 
procedure can take 14927 locks before hitting an "out of shared memory"
error. At that point, all the "wiggle room" is assigned to the LOCK
hash table. A different scenario could make the PROCLOCK hash table 
consume all the wiggle room instead, but I believe running out of LOCK 
space is more common, and I don't think it changes the big picture 
anyway if you hit the ceiling with PROCLOCK instead.

0001: While looking at this, I noticed that we add 10% "safety margin" 
to the shmem calculations in predicate.c, but we had already marked the 
predicate.c hash tables as HASH_FIXED_SIZE so they were never able to 
make use of the safety margin. Oops. The extra memory was available for 
the lock.c hash tables, though. After removing that bogus 10% safety 
margin from predicate.c, memory usage was reduced by 200 kB, but the 
number of locks you could take went down from 14927 to 14159.

0002: As the next step, I also removed the 10% safety margin from 
lock.c. That reduced memory usage by another 320 kB, and the number of 
locks went down from 14159 to 12815.

0003: After those changes, there's only a little extra memory sloshing
around that's not accounted for by any data structure. ipci.c reserves a
constant 100 kB, but that's pretty much it. However, there's still 
flexibility between the LOCK and the PROCLOCK hash tables. The PROCLOCK 
hash table is estimated to be 2x the size of the LOCK table, but when 
it's not, the space can get assigned to the LOCK table instead. In patch 
0003 I removed that flexibility by marking them both with 
HASH_FIXED_SIZE, and making init_size equal to max_size. That also stops 
the hash tables from using any of the other remaining wiggle room, 
making them truly fixed-size. This doesn't change the overall shared 
memory allocated, but the number of locks that the test procedure could 
acquire went down from 12815 to 8767, mostly because it cannot "steal" 
space from PROCLOCK anymore.

0004: To buy back that lock manager space in common out-of-the-box
situations, I propose to bump up the default for
max_locks_per_transaction from 64 to 128. That increases memory usage
again by 3216 kB, making it 2696 kB higher than on master (remember that
the previous changes reduced memory usage). The number of locks you can
take after that is 17535, which is more than on master (14927).

Increasing the default won't affect users who have already set 
max_locks_per_transaction to a non-default value. They will see that the 
number of locks they can acquire with their existing configuration will 
be reduced, again because of the lost wiggle room and flexibility 
between LOCK and PROCLOCK. Not sure if we could or should do something 
about that. Probably best to just document in the release notes that if 
you had raised max_locks_per_transaction, you might need to
raise it further to be able to accommodate the same amount of locks as 
before.

Here's all that in table form:

| Patch                                 | Shmem (kB) | Locks |
|---------------------------------------|------------|-------|
| master                                |     153560 | 14927 |
| 0001: remove 10% from predicate.c     |     153360 | 14159 |
| 0002: remove 10% from lock.c          |     153040 | 12815 |
| 0003: Make lock.c tables fixed size   |     153040 |  8767 |
| 0004: max_locks_per_transaction=128   |     156256 | 17535 |

This increase in memory usage is not great, but it's not that big in the 
grand scheme of things. I think it's well worth it, and better than the
sloppy scheme we have today.
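
For anyone who wants to double-check the deltas quoted above, here is a
minimal C sketch of the arithmetic. The array simply restates the
"Shmem (kB)" column of the table; the function names are made up for this
illustration and do not exist in PostgreSQL.

```c
#include <stddef.h>

/* Shmem (kB) column from the table: master, 0001, 0002, 0003, 0004. */
static const long shmem_kb[] = {153560, 153360, 153040, 153040, 156256};

/*
 * Change in shared memory (kB) introduced by step i relative to the
 * previous step.  Valid for i = 1..4 (hypothetical helper).
 */
long
shmem_step_delta_kb(size_t i)
{
	return shmem_kb[i] - shmem_kb[i - 1];
}

/*
 * Change in shared memory (kB) at step i relative to master (step 0).
 * Valid for i = 0..4 (hypothetical helper).
 */
long
shmem_delta_vs_master_kb(size_t i)
{
	return shmem_kb[i] - shmem_kb[0];
}
```

Steps 1 and 2 come out at -200 kB and -320 kB, step 3 at 0, step 4 at
+3216 kB, and step 4 against master at +2696 kB, matching the prose.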

Any thoughts, objections?

- Heikki


Attachments:

  [application/sql] shmem-test.sql (2.1K, 2-shmem-test.sql)

  [text/x-patch] v3-0001-Remove-bogus-safety-margin-from-predicate.c-shmem.patch (1.6K, 3-v3-0001-Remove-bogus-safety-margin-from-predicate.c-shmem.patch)
  download | inline diff:
From 8f55a325a4bfeeeb87ec2582c7d30d8a20d0abe9 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <[email protected]>
Date: Tue, 31 Mar 2026 12:06:44 +0300
Subject: [PATCH v3 1/4] Remove bogus "safety margin" from predicate.c shmem
 estimates

The 10% safety margin was copy-pasted from lock.c when the predicate
locking code was originally added. However, we later (commit
7c797e7194) added the HASH_FIXED_SIZE flag to the hash tables, which
means that they cannot actually use the safety margin that we're
calculating for them.

The extra memory was mainly used by the main lock manager, which is
the only shmem hash table of non-trivial size that does not use the
HASH_FIXED_SIZE flag. If we wanted to have more space for the lock
manager, we should reserve it directly in lock.c. As this patch
stands, the lock manager will just have less memory available than
before.
---
 src/backend/storage/lmgr/predicate.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index ae0e96aee5f..efa47ec1684 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -1383,12 +1383,6 @@ PredicateLockShmemSize(void)
 	size = add_size(size, hash_estimate_size(max_predicate_locks,
 											 sizeof(PREDICATELOCK)));
 
-	/*
-	 * Since NPREDICATELOCKTARGETENTS is only an estimate, add 10% safety
-	 * margin.
-	 */
-	size = add_size(size, size / 10);
-
 	/* transaction list */
 	max_serializable_xacts = (MaxBackends + max_prepared_xacts) * 10;
 	size = add_size(size, PredXactListDataSize);
-- 
2.47.3



  [text/x-patch] v3-0002-Remove-10-safety-marging-from-lock-hash-table-est.patch (818B, 4-v3-0002-Remove-10-safety-marging-from-lock-hash-table-est.patch)
  download | inline diff:
From 53e0a614fdfaa42003841b4776cc4aab72f11982 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <[email protected]>
Date: Tue, 31 Mar 2026 23:14:36 +0300
Subject: [PATCH v3 2/4] Remove 10% safety marging from lock hash table
 estimates

---
 src/backend/storage/lmgr/lock.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 234643e4dd7..2159de9015a 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -3778,11 +3778,6 @@ LockManagerShmemSize(void)
 	max_table_size *= 2;
 	size = add_size(size, hash_estimate_size(max_table_size, sizeof(PROCLOCK)));
 
-	/*
-	 * Since NLOCKENTS is only an estimate, add 10% safety margin.
-	 */
-	size = add_size(size, size / 10);
-
 	return size;
 }
 
-- 
2.47.3



  [text/x-patch] v3-0003-Make-the-lock-hash-tables-fixed-sized.patch (2.1K, 5-v3-0003-Make-the-lock-hash-tables-fixed-sized.patch)
  download | inline diff:
From edc25ff76b77244880120803ca0f495fa734a02c Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <[email protected]>
Date: Tue, 31 Mar 2026 18:39:03 +0300
Subject: [PATCH v3 3/4] Make the lock hash tables fixed-sized

---
 src/backend/storage/lmgr/lock.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 2159de9015a..ac7c5ed0604 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -445,16 +445,14 @@ void
 LockManagerShmemInit(void)
 {
 	HASHCTL		info;
-	int64		init_table_size,
-				max_table_size;
+	int64		max_table_size;
 	bool		found;
 
 	/*
-	 * Compute init/max size to request for lock hashtables.  Note these
-	 * calculations must agree with LockManagerShmemSize!
+	 * Compute sizes for lock hashtables.  Note that these calculations must
+	 * agree with LockManagerShmemSize!
 	 */
 	max_table_size = NLOCKENTS();
-	init_table_size = max_table_size / 2;
 
 	/*
 	 * Allocate hash table for LOCK structs.  This stores per-locked-object
@@ -465,14 +463,14 @@ LockManagerShmemInit(void)
 	info.num_partitions = NUM_LOCK_PARTITIONS;
 
 	LockMethodLockHash = ShmemInitHash("LOCK hash",
-									   init_table_size,
+									   max_table_size,
 									   max_table_size,
 									   &info,
-									   HASH_ELEM | HASH_BLOBS | HASH_PARTITION);
+									   HASH_ELEM | HASH_BLOBS |
+									   HASH_PARTITION  | HASH_FIXED_SIZE);
 
 	/* Assume an average of 2 holders per lock */
 	max_table_size *= 2;
-	init_table_size *= 2;
 
 	/*
 	 * Allocate hash table for PROCLOCK structs.  This stores
@@ -484,10 +482,11 @@ LockManagerShmemInit(void)
 	info.num_partitions = NUM_LOCK_PARTITIONS;
 
 	LockMethodProcLockHash = ShmemInitHash("PROCLOCK hash",
-										   init_table_size,
+										   max_table_size,
 										   max_table_size,
 										   &info,
-										   HASH_ELEM | HASH_FUNCTION | HASH_PARTITION);
+										   HASH_ELEM | HASH_FUNCTION |
+										   HASH_FIXED_SIZE | HASH_PARTITION);
 
 	/*
 	 * Allocate fast-path structures.
-- 
2.47.3



  [text/x-patch] v3-0004-Change-default-max_locks_per_transactions-128.patch (4.1K, 6-v3-0004-Change-default-max_locks_per_transactions-128.patch)
  download | inline diff:
From 692c0727d84d534a2575e0442c96e4df3752a7b4 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <[email protected]>
Date: Wed, 1 Apr 2026 00:09:35 +0300
Subject: [PATCH v3 4/4] Change default max_locks_per_transactions=128

---
 doc/src/sgml/config.sgml                      | 2 +-
 src/backend/utils/init/postinit.c             | 2 +-
 src/backend/utils/misc/guc_parameters.dat     | 2 +-
 src/backend/utils/misc/postgresql.conf.sample | 2 +-
 src/bin/pg_resetwal/pg_resetwal.c             | 4 ++--
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 229f41353eb..20706b56158 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -11470,7 +11470,7 @@ dynamic_library_path = '/usr/local/lib/postgresql:$libdir'
         can lock more objects as long as the locks of all transactions
         fit in the lock table.  This is <emphasis>not</emphasis> the number of
         rows that can be locked; that value is unlimited.  The default,
-        64, has historically proven sufficient, but you might need to
+        128, has historically proven sufficient, but you might need to
         raise this value if you have queries that touch many different
         tables in a single transaction, e.g., query of a parent table with
         many children.  This parameter can only be set at server start.
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 577ef5effbb..783a7400464 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -593,7 +593,7 @@ InitializeFastPathLocks(void)
 	 * value at FP_LOCK_GROUPS_PER_BACKEND_MAX and insist the value is at
 	 * least 1.
 	 *
-	 * The default max_locks_per_transaction = 64 means 4 groups by default.
+	 * The default max_locks_per_transaction = 128 means 8 groups by default.
 	 */
 	FastPathLockGroupsPerBackend =
 		Max(Min(pg_nextpower2_32(max_locks_per_xact) / FP_LOCK_SLOTS_PER_GROUP,
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 0a862693fcd..a3e0dda34af 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -1979,7 +1979,7 @@
   short_desc => 'Sets the maximum number of locks per transaction.',
   long_desc => 'The shared lock table is sized on the assumption that at most "max_locks_per_transaction" objects per server process or prepared transaction will need to be locked at any one time.',
   variable => 'max_locks_per_xact',
-  boot_val => '64',
+  boot_val => '128',
   min => '10',
   max => 'INT_MAX',
 },
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index cf15597385b..6e376d85d61 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -856,7 +856,7 @@
 #------------------------------------------------------------------------------
 
 #deadlock_timeout = 1s
-#max_locks_per_transaction = 64         # min 10
+#max_locks_per_transaction = 128        # min 10
                                         # (change requires restart)
 #max_pred_locks_per_transaction = 64    # min 10
                                         # (change requires restart)
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index ab766c34d4b..44f2b446e5d 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -722,7 +722,7 @@ GuessControlValues(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
-	ControlFile.max_locks_per_xact = 64;
+	ControlFile.max_locks_per_xact = 128;
 
 	ControlFile.maxAlign = MAXIMUM_ALIGNOF;
 	ControlFile.floatFormat = FLOATFORMAT_VALUE;
@@ -931,7 +931,7 @@ RewriteControlFile(void)
 	ControlFile.max_wal_senders = 10;
 	ControlFile.max_worker_processes = 8;
 	ControlFile.max_prepared_xacts = 0;
-	ControlFile.max_locks_per_xact = 64;
+	ControlFile.max_locks_per_xact = 128;
 
 	/* The control file gets flushed here. */
 	update_controlfile(".", &ControlFile, true);
-- 
2.47.3




* Re: Shared hash table allocations
@ 2026-04-02 10:24  Matthias van de Meent <[email protected]>
  parent: Heikki Linnakangas <[email protected]>
  1 sibling, 1 reply; 12+ messages in thread

From: Matthias van de Meent @ 2026-04-02 10:24 UTC (permalink / raw)
  To: Heikki Linnakangas <[email protected]>; +Cc: Tomas Vondra <[email protected]>; pgsql-hackers; Robert Haas <[email protected]>; Rahila Syed <[email protected]>

On Tue, 31 Mar 2026 at 23:25, Heikki Linnakangas <[email protected]> wrote:
>
> On 31/03/2026 01:02, Heikki Linnakangas wrote:
> > I wonder if we should change the defaults somehow. In usual
> > configurations, people are currently getting much more lock space than
> > you'd expect based on max_connections and max_locks_per_transaction, and
> > after these patches, they'll get much fewer locks. It might be prudent
> > to bump up the default max_locks_per_transaction setting so that you'd get
> > roughly the same amount of locks in the default configuration.
>
> master: With the default configuration on master, the attached test
> procedure can take 14927 locks before hitting an "out of shared memory"
> error. At that point, all the "wiggle room" is assigned to the LOCK
> hash table. A different scenario could make the PROCLOCK hash table
> consume all the wiggle room instead, but I believe running out of LOCK
> space is more common, and I don't think it changes the big picture
> anyway if you hit the ceiling with PROCLOCK instead.
>
> 0001: [...]

LGTM

> 0002: As the next step, I also removed the 10% safety margin from
> lock.c. That reduced memory usage by another 320 kB, and the number of
> locks went down from 14159 to 12815.

LGTM

> 0003: In patch 0003 I removed that flexibility by marking them both with
> HASH_FIXED_SIZE, and making init_size equal to max_size. That also stops
> the hash tables from using any of the other remaining wiggle room,
> making them truly fixed-size.

I think this patch finally gave me a good reason why PROCLOCK would've
needed to be allocated with double the sizes of LOCK:

LOCK is (was) initialized with only 50% of its max capacity. If
PROCLOCK was initialized with the same parameters and all spare shmem
is then allocated to other processes, then backends wouldn't be able
to safely use max_locks_per_transaction. To guarantee no OOMs when all
backends use max_locks_per_transaction, PROCLOCK's size must be
doubled to make sure PROCLOCK has sufficient space. (The same isn't
usually an issue for LOCK, because it's very likely backends will
operate on the same tables, and thus will be able to share most of the
LOCK structs.)

Now that LOCK is fully allocated, I think the size doubling can be
removed, or possibly parameterized for those that need it.

> 0004: To buy back that lock manager space in common out-of-the-box
> situations, I propose to bump up the default for
> max_locks_per_transaction from 64 to 128. [...]
> The number of locks you can
> take after that is 17535, which is more than on master (14927).

Note that this is for one backend; with current sizing you could lock
the same 17535 locks in at least one more backend.

Patch LGTM.

> Any thoughts, objections?

Overall, I'm +1 on this change. I do have some general comments
though, at least in part based on discussions in the hackers Discord
last year[0]:

1.) We'll need to clearly advertise the changed, more strict behaviour
of the heavy-weight locking system in the release notes.
2.) (Related) We probably should make it easier for DBAs to monitor
lock counts now that we enforce the limit more strictly. This could
take the form of (optional) logging that alerts when a session exceeds
some threshold number of locks in a transaction (e.g. 100% and 200% of
max_locks_per_transaction), or as a metric in
pg_stat_{activity,database} as total locks taken/max number of locks
taken in a transaction.
3.) (Related) We should probably parameterize the LOCK-to-PROCLOCK
ratio. LOCK is large, and especially on systems with high values of
max_connections (where the additional LOCKs will go unused) the
overhead of carrying all those additional LOCKs would go up to 50% of
the added memory usage (LOCK at 152+24=176B, PROCLOCK at
2*(64+24)=176B).  It'd be nice if we could avoid allocating that
memory.
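
To make the arithmetic in point 3 concrete, here is a small C sketch. The
byte counts (152 and 64 for the structs, plus 24 per entry, presumably
hash-entry overhead) are the figures quoted above and are not read from
the PostgreSQL headers; the macro and function names are hypothetical.

```c
/* Figures quoted in this message; not taken from the PostgreSQL headers. */
#define LOCK_STRUCT_BYTES      152
#define PROCLOCK_STRUCT_BYTES   64
#define HASH_ENTRY_OVERHEAD     24	/* assumed to be per-entry overhead */

/* Bytes of LOCK hash table space added per lock-table slot. */
long
lock_bytes_per_slot(void)
{
	return LOCK_STRUCT_BYTES + HASH_ENTRY_OVERHEAD;
}

/* Bytes of PROCLOCK space per slot, given PROCLOCK entries per LOCK entry. */
long
proclock_bytes_per_slot(int proclocks_per_lock)
{
	return (long) proclocks_per_lock *
		(PROCLOCK_STRUCT_BYTES + HASH_ENTRY_OVERHEAD);
}

/* Percentage of the per-slot total that goes to the LOCK table. */
long
lock_share_percent(int proclocks_per_lock)
{
	long		lock = lock_bytes_per_slot();
	long		proclock = proclock_bytes_per_slot(proclocks_per_lock);

	return 100 * lock / (lock + proclock);
}
```

With the current 2x factor, each side costs 176 bytes per slot, so LOCK
accounts for 50% of the added memory.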

Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)

[0] starting at
https://discord.com/channels/1258108670710124574/1266090488415654032/1442879718285119518


* Re: Shared hash table allocations
@ 2026-04-02 11:52  Heikki Linnakangas <[email protected]>
  parent: Matthias van de Meent <[email protected]>
  0 siblings, 1 reply; 12+ messages in thread

From: Heikki Linnakangas @ 2026-04-02 11:52 UTC (permalink / raw)
  To: Matthias van de Meent <[email protected]>; +Cc: Tomas Vondra <[email protected]>; pgsql-hackers; Robert Haas <[email protected]>; Rahila Syed <[email protected]>

On 02/04/2026 13:24, Matthias van de Meent wrote:
> On Tue, 31 Mar 2026 at 23:25, Heikki Linnakangas <[email protected]> wrote:
>>
>> 0003: In patch 0003 I removed that flexibility by marking them both with
>> HASH_FIXED_SIZE, and making init_size equal to max_size. That also stops
>> the hash tables from using any of the other remaining wiggle room,
>> making them truly fixed-size.
> 
> I think this patch finally gave me a good reason why PROCLOCK would've
> needed to be allocated with double the sizes of LOCK:
> 
> LOCK is (was) initialized with only 50% of its max capacity. If
> PROCLOCK was initialized with the same parameters and all spare shmem
> is then allocated to other processes, then backends wouldn't be able
> to safely use max_locks_per_transaction. To guarantee no OOMs when all
> backends use max_locks_per_transaction, PROCLOCK's size must be
> doubled to make sure PROCLOCK has sufficient space. (The same isn't
> usually an issue for LOCK, because it's very likely backends will
> operate on the same tables, and thus will be able to share most of the
> LOCK structs.)

Hmm, I don't know if that makes sense. It can happen that you have a lot 
of backends acquiring the same, smaller set of locks, growing PROCLOCK 
so that it uses up all the available wiggle room, and LOCK can never 
grow from its initial size, 1/2 * max_locks_per_transaction *
MaxBackends. If the workload then changes so that every backend tries to
acquire exactly max_locks_per_transaction locks, but this time each
lock is on a different object, you will run out of shared memory at 1/2 
the size of what you expected.

The opposite can't happen, because PROCLOCK is always at least as large 
as LOCK. It doesn't matter what you set PROCLOCK's initial size to; it
will grow together with LOCK, and you will not run out of shared memory
before PROCLOCK has grown up to max_locks_per_transaction * MaxBackends
anyway.

> Now that LOCK is fully allocated, I think the size doubling can be
> removed, or possibly parameterized for those that need it.

I don't think that follows. The 2x factor is pretty arbitrary, but it's 
still a fair assumption that many backends will be acquiring locks on 
the same objects so you need more space in PROCLOCK than in LOCK.

I don't know how true that assumption is. It feels right for OLTP 
applications. But the situation where I've hit max_locks_per_transaction 
is when I've tried to create one table with thousands of partitions. Or
rather, when I try to *drop* that table. In that situation, there's just 
one transaction acquiring all the locks, so the PROCLOCK / LOCK ratio is 1.

We could parameterize it, but I feel that's probably overkill and 
exposing too much detail to users. At the end of the day, if you hit the 
limit, you just bump up max_locks_per_transaction. If there are two
settings, it's more complicated; which one do you change? You probably
don't mind wasting the few MB of memory that you could gain by carefully
tuning the LOCK / PROCLOCK factor.

- Heikki


* Re: Shared hash table allocations
@ 2026-04-02 12:55  Ashutosh Bapat <[email protected]>
  parent: Heikki Linnakangas <[email protected]>
  1 sibling, 1 reply; 12+ messages in thread

From: Ashutosh Bapat @ 2026-04-02 12:55 UTC (permalink / raw)
  To: Heikki Linnakangas <[email protected]>; +Cc: Tomas Vondra <[email protected]>; pgsql-hackers; Robert Haas <[email protected]>; Rahila Syed <[email protected]>

On Wed, Apr 1, 2026 at 2:55 AM Heikki Linnakangas <[email protected]> wrote:
>
> On 31/03/2026 01:02, Heikki Linnakangas wrote:
> > I wonder if we should change the defaults somehow. In usual
> > configurations, people are currently getting much more lock space than
> > you'd expect based on max_connections and max_locks_per_transaction, and
> > after these patches, they'll get much fewer locks. It might be prudent
> > to bump up the default max_locks_per_transaction setting so that you'd get
> > roughly the same amount of locks in the default configuration.
>
> I did some testing of the memory usage and how removing the wiggle room
> affects the number of locks you can acquire. Attached are the test
> procedures I used, and proposed patches. The patches are new, designed
> to just change the parameters of the hash tables and shmem calculations
> with no other changes. They don't include the refactorings we've
> discussed so far in this thread. My plan is to commit these new patches
> first, and those other refactorings after that. Once these new patches
> are committed, the refactorings won't materially change the overall
> memory usage or how it's divided between different hash tables; all
> those effects are in these new patches.
>
>
> master: With the default configuration on master, the attached test
> procedure can take 14927 locks before hitting an "out of shared memory"
> error. At that point, all the "wiggle room" is assigned to the LOCK
> hash table. A different scenario could make the PROCLOCK hash table
> consume all the wiggle room instead, but I believe running out of LOCK
> space is more common, and I don't think it changes the big picture
> anyway if you hit the ceiling with PROCLOCK instead.
>
> 0001: While looking at this, I noticed that we add 10% "safety margin"
> to the shmem calculations in predicate.c, but we had already marked the
> predicate.c hash tables as HASH_FIXED_SIZE so they were never able to
> make use of the safety margin. Oops. The extra memory was available for
> the lock.c hash tables, though. After removing that bogus 10% safety
> margin from predicate.c, memory usage was reduced by 200 kB, but the
> number of locks you could take went down from 14927 to 14159.
>
> 0002: As the next step, I also removed the 10% safety margin from
> lock.c. That reduced memory usage by another 320 kB, and the number of
> locks went down from 14159 to 12815.
>
> 0003: After those changes, there's only a little extra memory sloshing
> around that's not accounted for by any data structure. ipci.c reserves a
> constant 100 kB, but that's pretty much it. However, there's still
> flexibility between the LOCK and the PROCLOCK hash tables. The PROCLOCK
> hash table is estimated to be 2x the size of the LOCK table, but when
> it's not, the space can get assigned to the LOCK table instead. In patch
> 0003 I removed that flexibility by marking them both with
> HASH_FIXED_SIZE, and making init_size equal to max_size. That also stops
> the hash tables from using any of the other remaining wiggle room,
> making them truly fixed-size. This doesn't change the overall shared
> memory allocated, but the number of locks that the test procedure could
> acquire went down from 12815 to 8767, mostly because it cannot "steal"
> space from PROCLOCK anymore.
>
> 0004: To buy back that lock manager space in common out-of-the-box
> situations, I propose to bump up the default for
> max_locks_per_transaction from 64 to 128. That increases memory usage
> again by 3216 kB, making it 2696 kB higher than on master (remember that
> the previous changes reduced memory usage). The number of locks you can
> take after that is 17535, which is more than on master (14927).
>
> Increasing the default won't affect users who have already set
> max_locks_per_transaction to a non-default value. They will see that the
> number of locks they can acquire with their existing configuration will
> be reduced, again because of the lost wiggle room and flexibility
> between LOCK and PROCLOCK. Not sure if we could or should do something
> about that. Probably best to just document in the release notes that if
> you had raised max_locks_per_transaction, you might need to
> raise it further to be able to accommodate the same amount of locks as
> before.
>
> Here's all that in table form:
>
> | Patch                                 | Shmem (kB) | Locks |
> |---------------------------------------|------------|-------|
> | master                                |     153560 | 14927 |
> | 0001: remove 10% from predicate.c     |     153360 | 14159 |
> | 0002: remove 10% from lock.c          |     153040 | 12815 |
> | 0003: Make lock.c tables fixed size   |     153040 |  8767 |
> | 0004: max_locks_per_transaction=128   |     156256 | 17535 |
>
> This increase in memory usage is not great, but it's not that big in the
> grand scheme of things. I think it's well worth it, and better than the
> sloppy scheme we have today.
>
> Any thoughts, objections?

When we "allocate" shared memory, we are just allocating space on
systems which use mmap. The memory gets allocated only when it is
touched. The wiggle room as a whole is never touched during
initialization. Those pages get allocated when wiggle room is used -
i.e. when the entries beyond initial number are allocated. By
allocating maximal hash tables, I was worried that we will allocate
more memory than required. But that's not true since a 4K memory page
fits only 50-60 entries - far less than the default configuration
permits. Most of the memory for the hash table will be allocated as
the entries are used.

The second hazard of increasing hash table size is the hash table
access becomes slower as it becomes sparse [1]. I don't think it shows
up in performance but maybe worth trying a trivial pgbench run, just
to make sure that default performance doesn't regress.

The increase in memory usage is 3MB, which is fine usually. I mean, we
didn't hear any complaints when we increased the default size of the
shared buffer pool - this is much less than that. But why do you want
to double the max_locks_per_transaction? I first thought it's because
the hash table size is anyway a power of 2. But then the size of the
hash table is actually max_locks_per_transaction * (number of backends
+ number of prepared transactions). What we want is the default
max_locks_per_transaction such that 14927 locks are allowed. Playing
with max_locks_per_transaction using your script, 109 seems to be the
number which will give us 14951 locks. It looks (and is) an odd
number. If we are worried about memory increase, that's the number we
should use as default and then write a long paragraph about why we
chose such an odd-looking number :D.
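
As a rough illustration of the sizing formula mentioned above, here is a
minimal C sketch. It only models the nominal capacity,
max_locks_per_transaction * (backends + prepared transactions); the
function names are hypothetical, and the backend count used in the usage
note is a placeholder, since the real MaxBackends covers more than
max_connections. The measured lock counts in this thread will not be
reproduced exactly by this formula.

```c
/*
 * Nominal LOCK hash table capacity, following the formula stated above:
 * max_locks_per_transaction * (backends + prepared transactions).
 * Hypothetical helper, not a PostgreSQL function.
 */
long
nominal_lock_entries(long max_locks_per_xact, long backends,
					 long prepared_xacts)
{
	return max_locks_per_xact * (backends + prepared_xacts);
}

/*
 * Smallest max_locks_per_transaction whose nominal capacity reaches
 * target_locks, i.e. ceil(target_locks / (backends + prepared_xacts)).
 * Assumes backends + prepared_xacts > 0.
 */
long
min_locks_per_xact(long target_locks, long backends, long prepared_xacts)
{
	long		slots = backends + prepared_xacts;

	return (target_locks + slots - 1) / slots;
}
```

For example, with a placeholder of 100 backends and no prepared
transactions, a setting of 64 gives a nominal capacity of 6400 entries and
128 gives 12800.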

I think we should highlight the change in default in the release notes
though. Users who run the default configuration will notice an
increase in the memory. If they are using a custom value, they will
think of bumping it up. Can we give them some ballpark % by which they
should increase their max_locks_per_transaction? E.g. double the
number or something?

I looked at the places where max_locks_per_transaction is used. I
don't see any place that needs code updates other than the ones in the
patch.

LGTM, overall.

[1] https://ashutoshpg.blogspot.com/2025/07/efficiency-of-sparse-hash-table.html

--
Best Wishes,
Ashutosh Bapat






* Re: Shared hash table allocations
@ 2026-04-02 13:47  Matthias van de Meent <[email protected]>
  parent: Heikki Linnakangas <[email protected]>
  0 siblings, 0 replies; 12+ messages in thread

From: Matthias van de Meent @ 2026-04-02 13:47 UTC (permalink / raw)
  To: Heikki Linnakangas <[email protected]>; +Cc: Tomas Vondra <[email protected]>; pgsql-hackers; Robert Haas <[email protected]>; Rahila Syed <[email protected]>

On Thu, 2 Apr 2026 at 13:52, Heikki Linnakangas <[email protected]> wrote:
>
> On 02/04/2026 13:24, Matthias van de Meent wrote:
> > On Tue, 31 Mar 2026 at 23:25, Heikki Linnakangas <[email protected]> wrote:
> >>
> >> 0003: In patch 0003 I removed that flexibility by marking them both with
> >> HASH_FIXED_SIZE, and making init_size equal to max_size. That also stops
> >> the hash tables from using any of the other remaining wiggle room,
> >> making them truly fixed-size.
> >
> > I think this patch finally gave me a good reason why PROCLOCK would've
> > needed to be allocated with double the sizes of LOCK:
> >
> > LOCK is (was) initialized with only 50% of its max capacity. If
> > PROCLOCK was initialized with the same parameters and all spare shmem
> > is then allocated to other processes, then backends wouldn't be able
> > to safely use max_locks_per_transaction. To guarantee no OOMs when all
> > backends use max_locks_per_transaction, PROCLOCK's size must be
> > doubled to make sure PROCLOCK has sufficient space. (The same isn't
> > usually an issue for LOCK, because it's very likely backends will
> > operate on the same tables, and thus will be able to share most of the
> > LOCK structs.)
>
> Hmm, I don't know if that makes sense.

Code and mailing history indicate it's not the reason, but there is no
other sane reason why PROCLOCK would *not* be sized to
max_locks_per_transaction * MaxBackends. At least with this reasoning
the minimum size is exactly that.

> It can happen that you have a lot
> of backends acquiring the same, smaller set of locks, growing PROCLOCK
> so that it uses up all the available wiggle room, and LOCK can never
> grow from its initial size, 1/2 * max_locks_per_transactions *
> MaxBackends. If the workload then changes so that every backend tries to
> acquire exactly max_locks_per_transactions locks, but this time each
> lock is on a different object, you will run out of shared memory at 1/2
> the size of what you expected.
>
> The opposite can't happen, because PROCLOCK is always at least as large
> as LOCK. It doesn't matter what you set PROCLOCK's initial size to, it
> will grow together with LOCK, and you will not run out of shared memory
> before PROCLOCK has grown up to max_locks_per_transactions * MaxBackends
> anyway.
>
> > Now that LOCK is fully allocated, I think the size doubling can be
> > removed, or possibly parameterized for those that need it.
>
> I don't think that follows. The 2x factor is pretty arbitrary, but it's
> still a fair assumption that many backends will be acquiring locks on
> the same objects so you need more space in PROCLOCK than in LOCK.

I agree that we'll *probably* have more PROCLOCKs in use than LOCKs.
But to me, max_locks_per_transaction (MLPT) indicates the maximum
number of locks taken by a transaction, and transaction locks have a
1:1 correspondence with PROCLOCKs (as long as we ignore fast-path
locking).

Adjusting that value by an arbitrary factor does not make any sense.
The user configured a value X, so we should use that value X.
Possibly there could be adjustments we need to make to give ourselves
some breathing room (it's not uncommon to overallocate by a constant
factor to allow evict-after-insert patterns in caches), but I can't
explain a blanket doubling of usage "because we have a hunch LOCK
usage will be lower than PROCLOCK usage" when the user specified a
value that would/should map 1:1 against PROCLOCKs scaling as anything
other than plainly wasting memory.

> I don't know how true that assumption is. It feels right for OLTP
> applications. But the situation where I've hit max_locks_per_transaction
> is when I've tried to create one table with thousands of partitions. Or
> rather, when I try to *drop* that table. In that situation, there's just
> one transaction acquiring all the locks, so the PROCLOCK / LOCK ratio is 1.

> We could parameterize it, but I feel that's probably overkill and
> exposing too much detail to users. At the end of the day, if you hit the
> limit, you just bump up max_locks_per_transactions.

Or, if it's for DROP, you could use a phased dropping scheme, where
you spread the operation across many transactions by dropping a subset
of the partitions in each transaction. It takes more careful execution
and more time, but it allows you to avoid hitting the limits and
starving other backends of lock slots, and avoids requiring postmaster
restarts.
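
To make the phased approach concrete, here is a small planning sketch.
Given a per-transaction lock budget and an estimate of how many locks
dropping one partition takes, it computes a batch size and the number of
transactions needed. Both inputs are assumptions the caller has to supply
(the per-partition lock count depends on, for example, indexes and TOAST
tables), and the function names are hypothetical.

```c
#include <stddef.h>

/*
 * Largest number of partitions to drop in one transaction so that it
 * stays within lock_budget locks, assuming each partition costs
 * locks_per_partition locks.  Both values are the caller's estimates.
 * Returns 0 if not even one partition fits.
 */
size_t
partitions_per_batch(size_t lock_budget, size_t locks_per_partition)
{
	if (locks_per_partition == 0)
		return 0;
	return lock_budget / locks_per_partition;
}

/*
 * Number of transactions needed to drop num_partitions partitions in
 * batches of batch_size (rounding up).  Returns 0 when batch_size is 0,
 * meaning there is no valid plan.
 */
size_t
drop_transactions_needed(size_t num_partitions, size_t batch_size)
{
	if (batch_size == 0)
		return 0;
	return (num_partitions + batch_size - 1) / batch_size;
}
```

For instance, with a budget of 64 locks and an assumed 3 locks per
partition, each transaction drops 21 partitions, and 5000 partitions take
239 transactions.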

> If there are two
> settings, it's more complicated; which one do you change? You probably
> don't mind wasting the few MB of memory that you could gain by carefully
> tuning the LOCK / PROCLOCK factor.

Yes, that would be more complicated, but we have similar factors
elsewhere (hash_mem_multiplier, various costs, weights). We wouldn't
even have to use a factor, we could just as well use a new, more
direct `max_unique_locks_per_transaction`, which we'd use to scale the
LOCK hash.

Note that with our current default settings we're spending 11kiB (= 64
* (64+24)) per backend on what I would consider oversized PROCLOCK
allocations. With MLPT=128, that doubles to 22kiB per backend. Every
50 max_backends, that'd be ~1.1MB of shared memory allocated in excess
of user's requested configuration.
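
As a rough check of that arithmetic, here is a minimal standalone sketch.
It takes the 64+24 bytes per PROCLOCK entry from the figures above as
given (not measured), and treats the sizing factor as a parameter. The
function names are made up for illustration and are not PostgreSQL APIs.

```c
#include <stddef.h>

/* Per-entry cost taken from the message above: 64 + 24 bytes (assumed). */
#define PROCLOCK_ENTRY_BYTES (64 + 24)

/*
 * Bytes of PROCLOCK space per backend when the table holds
 * factor * mlpt entries per backend.
 */
size_t
proclock_bytes_per_backend(size_t mlpt, size_t factor)
{
	return mlpt * factor * PROCLOCK_ENTRY_BYTES;
}

/* Bytes per backend beyond a plain 1:1 sizing (factor = 1). */
size_t
proclock_excess_per_backend(size_t mlpt, size_t factor)
{
	return proclock_bytes_per_backend(mlpt, factor)
		- proclock_bytes_per_backend(mlpt, 1);
}
```

Under those assumptions, proclock_bytes_per_backend(64, 2) is 11264 bytes
(11 KiB) and proclock_excess_per_backend(64, 2) is 5632 bytes. With
MLPT=128 the 2x total is 22528 bytes per backend, roughly 1.1 MB for every
50 backends.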

Kind regards,

Matthias van de Meent


* Re: Shared hash table allocations
@ 2026-04-02 14:14  Heikki Linnakangas <[email protected]>
  parent: Ashutosh Bapat <[email protected]>
  0 siblings, 2 replies; 12+ messages in thread

From: Heikki Linnakangas @ 2026-04-02 14:14 UTC (permalink / raw)
  To: Ashutosh Bapat <[email protected]>; +Cc: Tomas Vondra <[email protected]>; pgsql-hackers; Robert Haas <[email protected]>; Rahila Syed <[email protected]>

On 02/04/2026 15:55, Ashutosh Bapat wrote:
> When we "allocate" shared memory, we are just allocating space on
> systems which use mmap. The memory gets allocated only when it is
> touched. The wiggle room as a whole is never touched during
> initialization. Those pages get allocated when wiggle room is used -
> i.e. when the entries beyond initial number are allocated. By
> allocating maximal hash tables, I was worried that we will allocate
> more memory than required. But that's not true since a 4K memory page
> fits only 50-60 entries - far less than the default configuration
> permits. Most of the memory for the hash table will be allocated as
> the entries are used.

Hmm, that's a good point about untouched memory not being allocated. I 
think it's fine, though.

With small changes on top of the earlier refactorings from this
thread, we could stop pre-allocating all the elements when a shared 
memory hash table is created, and have ShmemHashAlloc() allocate them on 
the fly, but instead of doing them as anonymous allocations like we do 
with ShmemAlloc() today, the allocations could come from the 
pre-allocated region dedicated to the hash table. You'd still get the 
same determinism and visibility in pg_shmem_allocations, but you could 
avoid actually touching the pages until they're needed. Not sure it's 
worth the trouble.
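
A minimal sketch of that idea, outside PostgreSQL: elements are handed out
one at a time from a region that was reserved up front, so the accounting
stays deterministic while memory past the high-water mark is never
touched. The ElemRegion type and the elem_region_* functions are
hypothetical names, not the patches from this thread or existing
PostgreSQL APIs, and locking is left out.

```c
#include <stddef.h>

/* Hypothetical descriptor for a region reserved for one hash table. */
typedef struct ElemRegion
{
	char	   *base;		/* start of the reserved region */
	size_t		size;		/* total bytes reserved */
	size_t		used;		/* bytes handed out so far */
	size_t		elem_size;	/* size of one element, already aligned */
} ElemRegion;

/* Set up the descriptor; does not read or write the region itself. */
void
elem_region_init(ElemRegion *r, void *base, size_t size, size_t elem_size)
{
	r->base = (char *) base;
	r->size = size;
	r->used = 0;
	r->elem_size = elem_size;
}

/*
 * Hand out the next element, or NULL when the region is exhausted.
 * Nothing past "used" is accessed here, so pages beyond the high-water
 * mark stay untouched until an element in them is actually requested.
 */
void *
elem_region_alloc(ElemRegion *r)
{
	void	   *p;

	if (r->elem_size == 0 || r->size - r->used < r->elem_size)
		return NULL;

	p = r->base + r->used;
	r->used += r->elem_size;
	return p;
}
```

A real version would need the same locking as the hash table's freelist,
and a way to return elements, both of which this sketch ignores.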

> The second hazard of increasing hash table size is the hash table
> access becomes slower as it becomes sparse [1]. I don't think it shows
> up in performance but maybe worth trying a trivial pgbench run, just
> to make sure that default performance doesn't regress.

Interesting, but yeah I don't think that's going to be measurable. I did 
some quick testing with a test function that just locks and unlocks 
relations:

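#include "postgres.h"
#include "fmgr.h"
#include "storage/lmgr.h"

/*
 * test_lock_bench(num_distinct_locks, num_acquires)
 *
 * Acquire AccessExclusiveLock num_acquires times, cycling through
 * num_distinct_locks distinct relation OIDs.
 */
PG_FUNCTION_INFO_V1(test_lock_bench);
Datum
test_lock_bench(PG_FUNCTION_ARGS)
{
	int32		num_distinct_locks = PG_GETARG_INT32(0);
	int32		num_acquires = PG_GETARG_INT32(1);

	LOCKMODE	lockmode = AccessExclusiveLock;

	/* arbitrary base OID used to build the lock tags */
#define FIRST_RELID 1000000000

	/* guard against division by zero in the modulo below */
	if (num_distinct_locks <= 0)
		elog(ERROR, "num_distinct_locks must be positive");

	for (int32 i = 0; i < num_acquires; i++)
	{
		Oid			relid = FIRST_RELID + i % num_distinct_locks;

		/* after the first pass, release the lock before re-acquiring it */
		if (i >= num_distinct_locks)
			UnlockRelationOid(relid, lockmode);

		/* non-blocking acquire; stop at the first failure */
		if (!ConditionalLockRelationOid(relid, lockmode))
		{
			elog(LOG, "could not acquire lock, iteration %d", i);
			break;
		}
	}

	PG_RETURN_VOID();
}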

With test_lock_bench(1, 5000000), I don't see any meaningful difference,
i.e. it's within 1-2 %, with anything from max_locks_per_transaction=10
to max_locks_per_transaction=128.

With more distinct locks involved, the caching effects might be bigger,
and maybe you'd see a difference because of more or fewer collisions.
Spot testing some values on my laptop, I don't see anything that would
worry me though.

> The increase in memory usage is 3MB, which is fine usually. I mean, we
> didn't hear any complaints when we increased the default size of the
> shared buffer pool - this is much less than that. But why do you want
> to double the max_locks_per_transaction? I first thought it's because
> the hash table size is anyway a power of 2. But then the size of the
> hash table is actually max_locks_per_transaction * (number of backends
> + number of prepared transactions). What we want is the default
> max_locks_per_transaction such that 14927 locks are allowed. Playing
> with max_locks_per_transaction using your script 109 seems to be the
> number which will give us 14951 locks. It looks (and is) an odd
> number. If we are worried about memory increase, that's the number we
> should use as default and then write a long paragraph about why we
> chose such an odd-looking number :D.

My first thought was actually to set max_locks_per_transaction=100,
making it a nice round number :-). But then the neighboring default of
max_pred_locks_per_transaction=64 looks weird. We could reduce it to
max_pred_locks_per_transaction=50 to make it fit in. But it feels a
little arbitrary to change it just for aesthetic reasons.

> I think we should highlight the change in default in the release notes
> though. The users which use default configuration will notice an
> increase in the memory. If they are using a custom value, they will
> think of bumping it up. Can we give them some ballpark % by which they
> should increase their max_locks_per_transaction? E.g. double the
> number or something?

I don't think people who are using the defaults will notice. I'm worried
about the people who have set max_locks_per_transaction manually, and
now effectively get less lock space for the same setting. Yeah, doubling
the previous value is a good rule of thumb.

- Heikki


* Re: Shared hash table allocations
@ 2026-04-02 14:52  Ashutosh Bapat <[email protected]>
  parent: Heikki Linnakangas <[email protected]>
  1 sibling, 0 replies; 12+ messages in thread

From: Ashutosh Bapat @ 2026-04-02 14:52 UTC (permalink / raw)
  To: Heikki Linnakangas <[email protected]>; +Cc: Tomas Vondra <[email protected]>; pgsql-hackers; Robert Haas <[email protected]>; Rahila Syed <[email protected]>

On Thu, Apr 2, 2026 at 7:44 PM Heikki Linnakangas <[email protected]> wrote:
>
> On 02/04/2026 15:55, Ashutosh Bapat wrote:
> > When we "allocate" shared memory, we are just allocating space on
> > systems which use mmap. The memory gets allocated only when it is
> > touched. The wiggle room as a whole is never touched during
> > initialization. Those pages get allocated when wiggle room is used -
> > i.e. when the entries beyond initial number are allocated. By
> > allocating maximal hash tables, I was worried that we will allocate
> > more memory than required. But that's not true since a 4K memory page
> > fits only 50-60 entries - far less than the default configuration
> > permits. Most of the memory for the hash table will be allocated as
> > the entries are used.
>
> Hmm, that's a good point about untouched memory not being allocated. I
> think it's fine, though.
>
> With small changes on top of the earlier refactorings from this
> thread, we could stop pre-allocating all the elements when a shared
> memory hash table is created, and have ShmemHashAlloc() allocate them on
> the fly, but instead of doing them as anonymous allocations like we do
> with ShmemAlloc() today, the allocations could come from the
> pre-allocated region dedicated to the hash table. You'd still get the
> same determinism and visibility in pg_shmem_allocations, but you could
> avoid actually touching the pages until they're needed. Not sure it's
> worth the trouble.

With the shared hash table refactoring + shared memory structure
refactoring + resizable structures, we should be able to get resizable
shared hash tables as well. But that's not required immediately. I feel
large hash tables like the buffer hash table and the lock hash tables can
benefit from this kind of thing.

>
> > The second hazard of increasing hash table size is the hash table
> > access becomes slower as it becomes sparse [1]. I don't think it shows
> > up in performance but maybe worth trying a trivial pgbench run, just
> > to make sure that default performance doesn't regress.
>
> Interesting, but yeah I don't think that's going to be measurable. I did
> some quick testing with a test function that just locks and unlocks
> relations:
>
> PG_FUNCTION_INFO_V1(test_lock_bench);
> Datum
> test_lock_bench(PG_FUNCTION_ARGS)
> {
>         int32           num_distinct_locks = PG_GETARG_INT32(0);
>         int32           num_acquires = PG_GETARG_INT32(1);
>
>         LOCKMODE        lockmode = AccessExclusiveLock;
>
> #define FIRST_RELID 1000000000
>
>         for (int32 i = 0; i < num_acquires; i++)
>         {
>                 Oid                     relid = FIRST_RELID + i % num_distinct_locks;
>
>                 if (i >= num_distinct_locks)
>                         UnlockRelationOid(relid, lockmode);
>
>                 if (!ConditionalLockRelationOid(relid, lockmode))
>                 {
>                         elog(LOG, "could not acquire lock, iteration %d", i);
>                         break;
>                 }
>         }
>
>         PG_RETURN_VOID();
> }
>
> With test_lock_bench(1, 5000000), I don't see any meaningful difference,
> i.e. it's within 1-2 %, with anything from max_locks_per_transactions=10
> to max_locks_per_transactions=128.
>
> With more distinct locks involved, the caching effects might be bigger,
> and maybe you'd see a difference because of more or less collisions.
> Spot testing some values on my laptop, I don't see anything that would
> worry me though.

Great. This agrees with my experiments with sparse buffer lookup table.

>
> > The increase in memory usage is 3MB, which is fine usually. I mean, we
> > didn't hear any complaints when we increased the default size of the
> > shared buffer pool - this is much less than that. But why do you want
> > to double the max_locks_per_transaction? I first thought it's because
> > the hash table size is anyway a power of 2. But then the size of the
> > hash table is actually max_locks_per_transaction * (number of backends
> > + number of prepared transactions). What we want is the default
> > max_locks_per_transaction such that 14927 locks are allowed. Playing
> > with max_locks_per_transaction using your script 109 seems to be the
> > number which will give us 14951 locks. It looks (and is) an odd
> > number. If we are worried about memory increase, that's the number we
> > should use as default and then write a long paragraph about why we
> > chose such an odd-looking number :D.
>
> My first thought was actually to set max_locks_per_transaction=100,
> making it a nice round number :-). But then the neighboring default of
> max_pred_locks_per_transaction=64 looks weird. We could reduce it to
> max_pred_locks_per_transaction=50 to make it fit in. But it feels a
> little arbitrary to change it just for aesthetic reasons.

+1. Let's keep it 128 and see if there are complaints. We can set it
to 100 or 109 if the complaints look serious.

-- 
Best Wishes,
Ashutosh Bapat






* Re: Shared hash table allocations
@ 2026-04-02 16:13  Ashutosh Bapat <[email protected]>
  parent: Heikki Linnakangas <[email protected]>
  1 sibling, 1 reply; 12+ messages in thread

From: Ashutosh Bapat @ 2026-04-02 16:13 UTC (permalink / raw)
  To: Heikki Linnakangas <[email protected]>; +Cc: Tomas Vondra <[email protected]>; pgsql-hackers; Robert Haas <[email protected]>; Rahila Syed <[email protected]>

On Thu, Apr 2, 2026 at 7:44 PM Heikki Linnakangas <[email protected]> wrote:
>
> > I think we should highlight the change in default in the release notes
> > though. The users which use default configuration will notice an
> > increase in the memory. If they are using a custom value, they will
> > think of bumping it up. Can we give them some ballpark % by which they
> > should increase their max_locks_per_transaction? E.g. double the
> > number or something?
>
> I don't think people who are using the defaults will notice. I'm worried
> about the people who have set max_locks_per_transactions manually, and
> now effectively get less lock space for the same setting. Yeah, doubling
> the previous value is a good rule of thumb.

Users who have set max_locks_per_transaction to a non-default value
but have tuned their application to respect those limits are safe even
after this change, since their lock hash table never used wiggle room.
Those users who weren't careful to respect those limits will need to
bump their setting. I think the release notes should "nudge" all the
users who use a non-default max_locks_per_transaction to increase it if
they see "out of memory" errors. I don't think it should provide
blanket advice to double their locks.

-- 
Best Wishes,
Ashutosh Bapat






* Re: Shared hash table allocations
@ 2026-04-02 16:44  Heikki Linnakangas <[email protected]>
  parent: Ashutosh Bapat <[email protected]>
  0 siblings, 1 reply; 12+ messages in thread

From: Heikki Linnakangas @ 2026-04-02 16:44 UTC (permalink / raw)
  To: Ashutosh Bapat <[email protected]>; +Cc: Tomas Vondra <[email protected]>; pgsql-hackers; Robert Haas <[email protected]>; Rahila Syed <[email protected]>

On 02/04/2026 19:13, Ashutosh Bapat wrote:
> On Thu, Apr 2, 2026 at 7:44 PM Heikki Linnakangas <[email protected]> wrote:
>>
>>> I think we should highlight the change in default in the release notes
>>> though. The users which use default configuration will notice an
>>> increase in the memory. If they are using a custom value, they will
>>> think of bumping it up. Can we give them some ballpark % by which they
>>> should increase their max_locks_per_transaction? E.g. double the
>>> number or something?
>>
>> I don't think people who are using the defaults will notice. I'm worried
>> about the people who have set max_locks_per_transactions manually, and
>> now effectively get less lock space for the same setting. Yeah, doubling
>> the previous value is a good rule of thumb.
> 
> Users who have set max_locks_per_transaction to a non-default value
> but have tuned their application to respect those limits are safe even
> after this change, since their lock hash table never used wiggle room.
> Those users who weren't careful to respect those limits will need to
> bump their setting.

That's technically true, but in practice it's very hard for someone to
carefully tune the setting. It's difficult to know how many locks a
particular set of queries takes. In practice what people do is they bump
up the setting if they get the "out of shared memory" error, until the
error goes away. If you do the tuning that way, it's quite possible that
you are relying on the "wiggle room" without realizing it.

> I think the release notes should "nudge" all the
> users who use non-default max_locks_per_transaction to increase it if
> they see "out of memory" errors. I don't think it should provide
> blanket advice to double their locks.

How about:

"If you had previously set max_locks_per_transaction, you might need to 
set it to a higher value in v19 to avoid "out of shared memory" errors. 
If you are unsure what to set it to and don't mind the increased memory 
usage, you can double the value to ensure that you can acquire at least 
as many locks as before"

TODO: do some more calculations and testing of how exactly the doubling 
rule works with different values. Is it guaranteed to be enough in all 
cases?

- Heikki


* Re: Shared hash table allocations
@ 2026-04-03 13:03  Ashutosh Bapat <[email protected]>
  parent: Heikki Linnakangas <[email protected]>
  0 siblings, 1 reply; 12+ messages in thread

From: Ashutosh Bapat @ 2026-04-03 13:03 UTC (permalink / raw)
  To: Heikki Linnakangas <[email protected]>; +Cc: Tomas Vondra <[email protected]>; pgsql-hackers; Robert Haas <[email protected]>; Rahila Syed <[email protected]>

On Thu, Apr 2, 2026 at 10:15 PM Heikki Linnakangas <[email protected]> wrote:
>
> On 02/04/2026 19:13, Ashutosh Bapat wrote:
> > On Thu, Apr 2, 2026 at 7:44 PM Heikki Linnakangas <[email protected]> wrote:
> >>
> >>> I think we should highlight the change in default in the release notes
> >>> though. The users which use default configuration will notice an
> >>> increase in the memory. If they are using a custom value, they will
> >>> think of bumping it up. Can we give them some ballpark % by which they
> >>> should increase their max_locks_per_transaction? E.g. double the
> >>> number or something?
> >>
> >> I don't think people who are using the defaults will notice. I'm worried
> >> about the people who have set max_locks_per_transactions manually, and
> >> now effectively get less lock space for the same setting. Yeah, doubling
> >> the previous value is a good rule of thumb.
> >
> > Users who have set max_locks_per_transaction to a non-default value
> > but have tuned their application to respect those limits are safe even
> > after this change, since their lock hash table never used wiggle room.
> > Those users who weren't careful to respect those limits will need to
> > bump their setting.
>
> That's technically true, but in practice it's very hard for someone to
> carefully tune the setting. It's difficult to know how many locks a
> particular set of queries takes. In practice what people do is they bump
> up the setting if they get the "out of shared memory" error, until the
> error goes away. If you do the tuning that way, it's quite possible that
> you are relying on the "wiggle room" without realizing it.
>

That's true.

> > I think the release notes should "nudge" all the
> > users who use non-default max_locks_per_transaction to increase it if
> > they see "out of memory" errors. I don't think it should provide
> > blanket advice to double their locks.
>
> How about:
>
> "If you had previously set max_locks_per_transaction, you might need to
> set it to a higher value in v19 to avoid "out of shared memory" errors.
> If you are unsure what to set it to and don't mind the increased memory
> usage, you can double the value to ensure that you can acquire at least
> as many locks as before"

The wiggle room is 100KB fixed + 10% of the other two structures, so the
value by which it should be increased is partly fixed and partly a
multiple of the current value. "Double the value" is the simplest advice
we can give. +1.

-- 
Best Wishes,
Ashutosh Bapat






* Re: Shared hash table allocations
@ 2026-04-03 17:32  Heikki Linnakangas <[email protected]>
  parent: Ashutosh Bapat <[email protected]>
  0 siblings, 1 reply; 12+ messages in thread

From: Heikki Linnakangas @ 2026-04-03 17:32 UTC (permalink / raw)
  To: Ashutosh Bapat <[email protected]>; +Cc: Tomas Vondra <[email protected]>; pgsql-hackers; Robert Haas <[email protected]>; Rahila Syed <[email protected]>; Matthias van de Meent <[email protected]>

On 03/04/2026 16:03, Ashutosh Bapat wrote:
> On Thu, Apr 2, 2026 at 10:15 PM Heikki Linnakangas <[email protected]> wrote:
>>> I think the release notes should "nudge" all the
>>> users who use non-default max_locks_per_transaction to increase it if
>>> they see "out of memory" errors. I don't think it should provide
>>> blanket advice to double their locks.
>>
>> How about:
>>
>> "If you had previously set max_locks_per_transaction, you might need to
>> set it to a higher value in v19 to avoid "out of shared memory" errors.
>> If you are unsure what to set it to and don't mind the increased memory
>> usage, you can double the value to ensure that you can acquire at least
>> as many locks as before"
> 
> The wiggle room is 100KB fixed + 10% of other two structures, so value
> by which it should be increased is partly fixed and partly a multiple
> of current value. "double the value" is simplest advice we can give.
> +1.

Ok, committed these patches to remove the safety margins, make LOCK and 
PROCLOCK fixed-size, and change the default to 
max_locks_per_transaction=128. I will do one final self-review of the 
remaining earlier patches from this thread next; I believe they're ready 
to be committed too.

Thanks for the review!

- Heikki







* Re: Shared hash table allocations
@ 2026-04-03 23:58  Heikki Linnakangas <[email protected]>
  parent: Heikki Linnakangas <[email protected]>
  0 siblings, 0 replies; 12+ messages in thread

From: Heikki Linnakangas @ 2026-04-03 23:58 UTC (permalink / raw)
  To: Ashutosh Bapat <[email protected]>; +Cc: Tomas Vondra <[email protected]>; pgsql-hackers; Robert Haas <[email protected]>; Rahila Syed <[email protected]>; Matthias van de Meent <[email protected]>

On 03/04/2026 20:32, Heikki Linnakangas wrote:
> On 03/04/2026 16:03, Ashutosh Bapat wrote:
>> On Thu, Apr 2, 2026 at 10:15 PM Heikki Linnakangas <[email protected]> 
>> wrote:
>>>> I think the release notes should "nudge" all the
>>>> users who use non-default max_locks_per_transaction to increase it if
>>>> they see "out of memory" errors. I don't think it should provide
>>>> blanket advice to double their locks.
>>>
>>> How about:
>>>
>>> "If you had previously set max_locks_per_transaction, you might need to
>>> set it to a higher value in v19 to avoid "out of shared memory" errors.
>>> If you are unsure what to set it to and don't mind the increased memory
>>> usage, you can double the value to ensure that you can acquire at least
>>> as many locks as before"
>>
>> The wiggle room is 100KB fixed + 10% of other two structures, so value
>> by which it should be increased is partly fixed and partly a multiple
>> of current value. "double the value" is simplest advice we can give.
>> +1.
> 
> Ok, committed these patches to remove the safety margins, make LOCK and 
> PROCLOCK fixed-size, and change the default to 
> max_locks_per_transaction=128. I will do one final self-review of the 
> remaining earlier patches from this thread next; I believe they're ready 
> to be committed too.
> 
> Thanks for the review!

And committed the rest of the patches from this thread now too, after 
some small fixes and cleanups. Thanks again!

- Heikki







Thread overview: 12+ messages
2026-03-31 21:25 Re: Shared hash table allocations Heikki Linnakangas <[email protected]>
2026-04-02 10:24 ` Matthias van de Meent <[email protected]>
2026-04-02 11:52   ` Heikki Linnakangas <[email protected]>
2026-04-02 13:47     ` Matthias van de Meent <[email protected]>
2026-04-02 12:55 ` Ashutosh Bapat <[email protected]>
2026-04-02 14:14   ` Heikki Linnakangas <[email protected]>
2026-04-02 14:52     ` Ashutosh Bapat <[email protected]>
2026-04-02 16:13     ` Ashutosh Bapat <[email protected]>
2026-04-02 16:44       ` Heikki Linnakangas <[email protected]>
2026-04-03 13:03         ` Ashutosh Bapat <[email protected]>
2026-04-03 17:32           ` Heikki Linnakangas <[email protected]>
2026-04-03 23:58             ` Heikki Linnakangas <[email protected]>
