public inbox for [email protected]  
help / color / mirror / Atom feed
From: Heikki Linnakangas <[email protected]>
To: Ashutosh Bapat <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: [email protected]
Subject: Re: Better shared data structure management and resizable shared data structures
Date: Sat, 28 Mar 2026 01:17:13 +0200
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAExHW5v5FVZbsO9sLzztMZ11C3hgGStE=HkkV2bQkCyncess4w@mail.gmail.com>
References: <CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com>
	<CA+TgmoaoSha9KnSdrA7DDMYD9+Uor4OPEaQ88bbEC3QuVE5Mcg@mail.gmail.com>
	<[email protected]>
	<CA+TgmoboxAaS6MCRf__LRhf8Ji3B0O9_=d7KzRS12QhOuocnTQ@mail.gmail.com>
	<[email protected]>
	<CA+Tgmob7E3CGDoEBB3mwfk+v1CgcOG8yL0VHje2gy9Ts=e1Zxw@mail.gmail.com>
	<[email protected]>
	<CA+TgmobqwCHp1w6xkTJmZiq4C8Pr0O3tNUvSFZcAgYLkvF45bg@mail.gmail.com>
	<[email protected]>
	<CA+Tgmob8ot7qam7a650nx_4jyEoFLs8s_kPMc1+EB7-dr56=WA@mail.gmail.com>
	<[email protected]>
	<CA+TgmoauqYXm8iA3FGRAVKxYShUxWBiS_MSLmQpTrmO7wNHamw@mail.gmail.com>
	<[email protected]>
	<CAExHW5uTNWOSxJDWQAUnS0tZawob2_J3dRAtc67NHNZ98X4_xA@mail.gmail.com>
	<CAExHW5t439y61YD9bc7d5wZWHp6J=M43Qu3eEZOBPguZML7o2A@mail.gmail.com>
	<CAExHW5v5FVZbsO9sLzztMZ11C3hgGStE=HkkV2bQkCyncess4w@mail.gmail.com>

Thanks, will incorporate your comments in next version. Replying to just 
a few of them here:

On 27/03/2026 09:01, Ashutosh Bapat wrote:
> /* Restore basic shared memory pointers */
> if (UsedShmemSegAddr != NULL)
> + {
> InitShmemAllocator(UsedShmemSegAddr);
> + ShmemCallRequestCallbacks();
> 
> It's not clear how we keep the list of registered callbacks across the
> backends and also after restart in-sync. How do we make sure that the
> callbacks registered at this time are the same callbacks registered
> before creating the shared memory? How do we make sure that the
> callbacks registered after the startup are also registered after
> restart?

On Unix systems, the registered callbacks are inherited by fork(), and 
also survive over crash restart. With EXEC_BACKEND, the assumption is 
that calling a library's _PG_init() function will register the same 
callbacks every time. We make the same assumption today with the 
shmem_startup hook.

> +void
> +RegisterShmemCallbacks(const ShmemCallbacks *callbacks)
> ... snip ...
> + foreach(lc, requested_shmem_areas)
> 
> Doesn't this list contain all the areas, not just registered in this
> instance of the call. Does that mean that we need to have all the
> attach functions idempotent? Why can't we deal with the newly
> registered areas only?

registered_shmem_areas is supposed to be empty when the function is 
entered. There's an assertion for that too before the foreach().

However, it's missing this, after processing the list:

         list_free_deep(requested_shmem_areas);
         requested_shmem_areas = NIL;

Because of that, this will fail if you load multiple extensions that 
call RegisterShmemCallbacks() in the same session. Will fix that.

> + /*
> + * Extra space to reserve in the shared memory segment, but it's not part
> + * of the struct itself. This is used for shared memory hash tables that
> + * can grow beyond the initial size when more buckets are allocated.
> + */
> + size_t extra_size;
> 
> When we introduce resizable structures (where even the hash table
> directly itself could be resizable), we will introduce a new field
> max_size which is easy to get confused with extra_size. Maybe we can
> rename extra_size to something like "auxilliary_size" to mean size of
> the auxiliary parts of the structure which are not part of the main
> struct itself.
> 
> + /*
> + * max_size is the estimated maximum number of hashtable entries. This is
> + * not a hard limit, but the access efficiency will degrade if it is
> + * exceeded substantially (since it's used to compute directory size and
> + * the hash table buckets will get overfull).
> + */
> + size_t max_size;
> +
> + /*
> + * init_size is the number of hashtable entries to preallocate. For a
> + * table whose maximum size is certain, this should be equal to max_size;
> + * that ensures that no run-time out-of-shared-memory failures can occur.
> + */
> + size_t init_size;
> 
> Everytime I look at these two fields, I question whether those are the
> number of entries (i.e. size of the hash table) or number of bytes
> (size of the memory). I know it's the former, but it indicates that
> something needs to be changed here, like changing the names to have
> _entries instead of _size, or changing the type to int64 or some such.
> Renaming to _entries would conflict with dynahash APIs since they use
> _size, so maybe the latter?

Agreed.

- Heikki





view thread (75+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Better shared data structure management and resizable shared data structures
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox