MIME-Version: 1.0
References: <01ab1d41-3eda-4705-8bbd-af898f5007f1@iki.fi>
 <2981bb36-6bbe-4bdc-9a94-29b1114c79bd@vondra.me>
 <3026ec05-f664-4ebe-8bf6-0a1218b234ec@iki.fi>
 <19945803-6bcc-40fe-a14a-7dc5c462ed80@iki.fi>
 <e07be2ba-856b-4ff5-8313-8b58b6b4e4d0@iki.fi>
 <CAExHW5uWdU1iEM_eVFVVmaHqfjLpq0QrdFUeZjtBDYpNwfuRBg@mail.gmail.com>
 <83e37829-0d94-49b2-ad48-5feb7b5d5e44@iki.fi>
In-Reply-To: <83e37829-0d94-49b2-ad48-5feb7b5d5e44@iki.fi>
From: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Date: Thu, 2 Apr 2026 20:22:32 +0530
Message-ID: 
 <CAExHW5uNtqoSwZ0r+JXgxBSi4V98KfQCWuBxRheYTB40pf7FEg@mail.gmail.com>
Subject: Re: Shared hash table allocations
To: Heikki Linnakangas <hlinnaka@iki.fi>
Cc: Tomas Vondra <tomas@vondra.me>,
	"pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>,
 Robert Haas <robertmhaas@gmail.com>,
	Rahila Syed <rahilasyed90@gmail.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: 
 <https://www.postgresql.org/message-id/CAExHW5uNtqoSwZ0r%2BJXgxBSi4V98KfQCWuBxRheYTB40pf7FEg%40mail.gmail.com>
Precedence: bulk

On Thu, Apr 2, 2026 at 7:44 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 02/04/2026 15:55, Ashutosh Bapat wrote:
> > When we "allocate" shared memory, we are just allocating space on
> > systems which use mmap. The memory gets allocated only when it is
> > touched. The wiggle room as a whole is never touched during
> > initialization. Those pages get allocated when wiggle room is used -
> > i.e. when the entries beyond initial number are allocated. By
> > allocating maximal hash tables, I was worried that we will allocate
> > more memory than required. But that's not true since a 4K memory page
> > fits only 50-60 entries - far less than the default configuration
> > permits. Most of the memory for the hash table will be allocated as
> the entries are used.
>
> Hmm, that's a good point about untouched memory not being allocated. I
> think it's fine, though.
>
> With small changes on top of the earlier refactorings from this
> thread, we could stop pre-allocating all the elements when a shared
> memory hash table is created, and have ShmemHashAlloc() allocate them on
> the fly, but instead of doing them as anonymous allocations like we do
> with ShmemAlloc() today, the allocations could come from the
> pre-allocated region dedicated to the hash table. You'd still get the
> same determinism and visibility in pg_shmem_allocations, but you could
> avoid actually touching the pages until they're needed. Not sure it's
> worth the trouble.

With the shared hash table refactoring + shared memory structure
refactoring + resizable structures, we should be able to get resizable
shared hash tables as well. But that's not required immediately. I feel
large hash tables like the buffer hash table and the lock hash tables
can benefit from this kind of thing.
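
A rough sketch of the on-demand scheme described above, in case it
helps the discussion. All names here (ElemRegion, elem_region_init,
elem_region_alloc) are hypothetical and are not part of the shmem API;
the region is assumed to be reserved by the caller, and locking is
left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical on-demand element allocator. The region is reserved up
 * front by the caller, but elements are handed out one at a time, so
 * the unused tail of the region is never written by the allocator.
 */
typedef struct ElemRegion
{
	char	   *base;			/* start of the reserved region */
	size_t		elem_size;		/* size of one element, in bytes */
	size_t		max_elems;		/* capacity, fixed at initialization */
	size_t		next;			/* index of the next element to hand out */
} ElemRegion;

/* Set up the allocator over an already reserved region; 0 on success. */
static int
elem_region_init(ElemRegion *r, void *base, size_t region_size,
				 size_t elem_size)
{
	if (base == NULL || elem_size == 0 || region_size < elem_size)
		return -1;

	r->base = (char *) base;
	r->elem_size = elem_size;
	r->max_elems = region_size / elem_size;
	r->next = 0;
	return 0;
}

/* Hand out the next element, or NULL once the region is exhausted. */
static void *
elem_region_alloc(ElemRegion *r)
{
	if (r->next >= r->max_elems)
		return NULL;

	return r->base + r->elem_size * r->next++;
}
```

The capacity is still fixed at startup, so the size of the region stays
deterministic, but only the part that has actually been handed out
needs to be touched.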

>
> > The second hazard of increasing hash table size is the hash table
> > access becomes slower as it becomes sparse [1]. I don't think it shows
> > up in performance but maybe worth trying a trivial pgbench run, just
> > to make sure that default performance doesn't regress.
>
> Interesting, but yeah I don't think that's going to be measurable. I did
> some quick testing with a test function that just locks and unlocks
> relations:
>
> PG_FUNCTION_INFO_V1(test_lock_bench);
> Datum
> test_lock_bench(PG_FUNCTION_ARGS)
> {
>         int32           num_distinct_locks = PG_GETARG_INT32(0);
>         int32           num_acquires = PG_GETARG_INT32(1);
>
>         LOCKMODE        lockmode =3D AccessExclusiveLock;
>
> #define FIRST_RELID 1000000000
>
>         for (int32 i = 0; i < num_acquires; i++)
>         {
>                 Oid                     relid = FIRST_RELID + i % num_distinct_locks;
>
>                 if (i >= num_distinct_locks)
>                         UnlockRelationOid(relid, lockmode);
>
>                 if (!ConditionalLockRelationOid(relid, lockmode))
>                 {
>                         elog(LOG, "could not acquire lock, iteration %d", i);
>                         break;
>                 }
>         }
>
>         PG_RETURN_VOID();
> }
>
> With test_lock_bench(1, 5000000), I don't see any meaningful difference,
> i.e. it's within 1-2 %, with anything from max_locks_per_transaction=10
> to max_locks_per_transaction=128.
>
> With more distinct locks involved, the caching effects might be bigger,
> and maybe you'd see a difference because of more or fewer collisions.
> Spot testing some values on my laptop, I don't see anything that would
> worry me though.

Great. This agrees with my experiments with a sparse buffer lookup table.

>
> > The increase in memory usage is 3MB, which is fine usually. I mean, we
> > didn't hear any complaints when we increased the default size of the
> > shared buffer pool - this is much less than that. But why do you want
> > to double the max_locks_per_transaction? I first thought it's because
> > the hash table size is anyway a power of 2. But then the size of the
> > hash table is actually max_locks_per_transaction * (number of backends
> > + number of prepared transactions). What we want is the default
> > max_locks_per_transaction such that 14927 locks are allowed. Playing
> with max_locks_per_transaction using your script, 109 seems to be the
> > number which will give us 14951 locks. It looks (and is) an odd
> > number. If we are worried about memory increase, that's the number we
> > should use as default and then write a long paragraph about why we
> > chose such an odd-looking number :D.
>
> My first thought was actually to set max_locks_per_transaction=100,
> making it a nice round number :-). But then the neighboring default of
> max_pred_locks_per_transaction=64 looks weird. We could reduce it to
> max_pred_locks_per_transaction=50 to make it fit in. But it feels a
> little arbitrary to change just for aesthetic reasons.

+1. Let's keep it at 128 and see if there are complaints. We can set it
to 100 or 109 if the complaints look serious.
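
For reference, the sizing formula quoted above (max_locks_per_transaction
times the number of backends plus prepared transactions) can be sketched
as below. The function names are hypothetical, and the figures in this
thread came from running the script against a real server, so this
arithmetic should not be expected to reproduce them exactly.

```c
/*
 * Hypothetical helpers for the sizing formula discussed in this thread:
 *   capacity = locks_per_xact * (backends + prepared_xacts)
 */

/* Number of lock slots for a given per-transaction limit. */
static long
lock_table_capacity(long locks_per_xact, long backends, long prepared_xacts)
{
	return locks_per_xact * (backends + prepared_xacts);
}

/*
 * Smallest per-transaction limit whose capacity is at least target_locks,
 * or -1 if the inputs make no sense.
 */
static long
min_locks_per_xact(long target_locks, long backends, long prepared_xacts)
{
	long		slots = backends + prepared_xacts;

	if (slots <= 0 || target_locks <= 0)
		return -1;

	/* Round up so that the target is always covered. */
	return (target_locks + slots - 1) / slots;
}
```

With made-up inputs of 100 backends and no prepared transactions, a
limit of 128 gives 12800 slots, and asking for 12801 slots needs a
limit of 129.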

-- 
Best Wishes,
Ashutosh Bapat