public inbox for [email protected]  
help / color / mirror / Atom feed
From: Bertrand Drouvot <[email protected]>
To: Jelte Fennema-Nio <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: [email protected]
Subject: Re: Safer hash table initialization macro
Date: Wed, 14 Jan 2026 07:57:29 +0000
Message-ID: <aWdMaa/[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <aTAcV/[email protected]>
	<[email protected]>
	<CA+hUKG+TuQPXtfkaam6trfLrk+OXf89zUt7Jx27zEY-y8i1swA@mail.gmail.com>
	<CAGECzQSnhom3wFnY6a7xepGfxzzVmwyNCcJQE=y8UYS-g2G=RQ@mail.gmail.com>
	<CA+hUKGLZPZ4DgJs46-dkkkF5pi0gm23SerJY0HbOUP5r7siPsg@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>

Hi,

On Tue, Jan 13, 2026 at 10:31:18AM +0100, Jelte Fennema-Nio wrote:
> On Tue Jan 13, 2026 at 8:27 AM CET, Bertrand Drouvot wrote:
> > I've probably a too paranoid concern: what if someone use char[N] to store
> > binary stuff with embedded null? That would detect it as string and then
> > make use of strncmp() and then would provide incorrect result.
> > 
> > While the risk is probably very low, I think it is there.
> 
> Added a warning in the comment for these macros. For non of our
> usages this was the case (i.e. the char arrays were all storing null
> terminated strings).

Agreed, I did check that too before doing the initial comment.

> So I'm not too worried about this being a problem
> in practice.

I agree, it's very low risk that one adds a new "bad" one in the future. Adding
a comment looks enough then.

+ * WARNING: If you use char[N] to store binary data that is not null-terminated,
+ * the automatic detection will incorrectly treat it as a string and use string
+ * comparison. In such cases, use hash_make_ext() with .force_blobs = true to
+ * override the automatic detection

maybe s/is not null-terminated/may contain null bytes/?

Also, nit, "Note or NOTE" looks more commonly used that "WARNING". We might want
to use that instead.

> Especially because in most cases there will be no null byte
> in the key, and instead you'll start reading out of bounds, which wil
> cause problems pretty much immediately during development.

Agreed.

> Especially, because to make this macro nice to
> use in the two cases that it would apply to we'd have to make it treat 0
> as a special value.

Not necessary, we could also just add a foreach_hash_with_hash_value() to make
things more consistent?

> Finally, I converted the last couple of hash_seq_init stragglers (some
> I had missed/were added) and others needed the now newly added
> foreach_hash_reset macro to be converted.

I see that you added foreach_hash_restart(), I think that makes sense
(even if it's used only in 3 places).

Two more comments:

=== 1

-static void
-cfunc_hashtable_init(void)
-{
-       /* don't allow double-initialization */
-       Assert(cfunc_hashtable == NULL);

Most of the hash_make_fn_cxt() callers check that the destination is already
NULL so that removing the Assert() is not that worrying for them. There are 2
places where it's not the case: InitializeAttoptCache() and build_guc_variables()
, should we add an Assert (or an if check) in them? I think that an Assert is
more appropriate for those 2.

=== 2

"    At the very least we should choose a few places where we use the new
    macros to make sure they have coverage."

I do agree that the refactoring is quite large. I do think there is no rush
to do all the changes at once. We could update a subset of files at a time per
month so that, that would:

- keep changes consistent within each file
- ease the review(s)
- avoid "large" rebases for patches waiting in the commitfest

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com






view thread (6+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Safer hash table initialization macro
  In-Reply-To: <aWdMaa/[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox