Re: Safer hash table initialization macro

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Bertrand Drouvot <[email protected]>
To: Jelte Fennema-Nio <[email protected]>
Cc: [email protected]
Subject: Re: Safer hash table initialization macro
Date: Wed, 3 Dec 2025 11:17:43 +0000
Message-ID: <aTAcV/[email protected]> (raw)
In-Reply-To: <CAGECzQSHb6FxuSdYNY7tt8SKFw2dianACvkExrWRmTBfy-CkjQ@mail.gmail.com>
References: <aS2b3LoUypW1/[email protected]>
	<CAGECzQSHb6FxuSdYNY7tt8SKFw2dianACvkExrWRmTBfy-CkjQ@mail.gmail.com>

Hi,

On Mon, Dec 01, 2025 at 03:44:41PM +0100, Jelte Fennema-Nio wrote:
> On Mon, 1 Dec 2025 at 14:45, Bertrand Drouvot
> <[email protected]> wrote:
> > Thoughts?
> 
> I think the hashtable creation API in postgres is so terrible that it
> actively discourages usage.

Thanks for sharing your thoughts!

> So I'm definitely in favor of improving this API (probably by adding a
> few new functions). I have a few main thoughts on what could be
> improved:
> 
> 1. Automatically determine keysize and entrysize given a keymember and
> entrytype (like you suggested).

PFA a patch introducing and using the new macro. Note that it also introduces
HASH_ELEM_INIT_FULL for the rare cases where the whole struct is the
key.

Also one case remains untouched:

$ git grep "entrysize = sizeof" "*.c"
src/backend/replication/logical/relation.c:     ctl.entrysize = sizeof(LogicalRepRelMapEntry);

That's because the key is a member of a nested struct so that the new
macro can not be used. As there is only one occurrence of it, I think we can keep
it as it is. But we could create a dedicated macro for those cases if we feel
the need. Now that I'm writing this, that might be a better idea: that way we'd
avoid any "entrysize/keysize = " in the .c files.

Also a nice side effect of using the macros:

138 insertions(+), 203 deletions(-)

> 2. Autodect most of the flags.
>     a. HASH_PARTITION, HASH_SEGMENT, HASH_FUNCTION, HASH_DIRSIZE,
> HASH_COMPARE, HASH_KEYCOPY, HASH_ALLOC can all be simply be detected
> based on the relevant fields from HASHCTL. Passing them in explicitly
> is just duplication that causes code noise and is easy to forget
> accidentally.
>     b. HASH_ELEM is useless noise because it is required
>     c. HASH_BLOBS could be the default (and maybe use HASH_STRINGS by
> default if the keytype is char*)
> 3. Default to CurrentMemoryContext instead of TopMemoryContext. Like
> all of the other functions that allocate, because right now it's way
> too easy to accidentally use TopMemoryContext when you did not intend
> to.
> 4. Have a creation function that doesn't require HASHCTL at all (just
> takes entrytype and keymember and maybe a version of this that takes a
> memorycontext).

Thanks for the above suggestions! I did not think so deep as you did during
your Citus time, but will think about those too. I suggest we move forward one
step at a time, first step being the new macros. Does that make sense to you?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

view thread (17+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Safer hash table initialization macro
  In-Reply-To: <aTAcV/[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox