public inbox for [email protected]  
From: Bertrand Drouvot <[email protected]>
To: [email protected]
Subject: Safer hash table initialization macro
Date: Mon, 1 Dec 2025 13:45:00 +0000
Message-ID: <aS2b3LoUypW1/[email protected]>

Hi hackers,

Currently to create a hash table we do things like:

A) create a struct, say:

 typedef struct SeenRelsEntry
 {
    Oid   rel_id;
    int   list_index;
 } SeenRelsEntry;

where the first member is the hash key, and then later:

B)

 ctl.keysize = sizeof(Oid);
 ctl.entrysize = sizeof(SeenRelsEntry);
 ctl.hcxt = CurrentMemoryContext;

 seen_rels = hash_create("find_all_inheritors temporary table",
                         32, /* start small and extend */
                         &ctl,

I can see 2 possible issues:

1)

We manually specify the type for keysize, which could be incorrect from the
start, or become incorrect later if the key member's type changes.
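To illustrate 1), here is a minimal standalone sketch, not PostgreSQL code.
It assumes a hypothetical key that grew from a single Oid to a two-field
RelKey while the caller kept passing sizeof(Oid) as the key size. Oid is
redefined locally, and RelKey and keys_equal are made-up names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int Oid;       /* local stand-in for PostgreSQL's Oid */

/* Hypothetical key that used to be a single Oid and later gained a field. */
typedef struct RelKey
{
    Oid db_id;
    Oid rel_id;
} RelKey;

/* Byte-wise key comparison, the way a blob-keyed hash table compares keys. */
static int
keys_equal(const void *a, const void *b, size_t keysize)
{
    return memcmp(a, b, keysize) == 0;
}
```

With a stale keysize of sizeof(Oid), the keys {1, 10} and {1, 20} compare
equal because only db_id is examined. With sizeof(RelKey) they differ.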

2) 

It may be possible to remove the key member without the compiler noticing it.

Take this example and remove:

diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 929bb53b620..eb11976afef 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -36,7 +36,6 @@
  */
 typedef struct SeenRelsEntry
 {
-       Oid                     rel_id;                 /* relation oid */
        int                     list_index;             /* its position in output list(s) */
 } SeenRelsEntry;

That would compile without any issues because this rel_id member is not
referenced in the code (for this particular example). That's rare but possible.

But then, on my machine, during make check:

TRAP: failed Assert("!found"), File: "nodeModifyTable.c", Line: 5157, PID: 140430

The reason is that the key member is only ever accessed through byte-level
operations (within the hash-related macros), never by name. So it's easy to
think that this member is unused (because it is not referenced in the code).
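A minimal sketch of that access pattern, assuming nothing about dynahash
internals: demo_lookup is a hypothetical linear scan standing in for a hash
lookup, and DemoEntry mirrors SeenRelsEntry. The point is only that the
lookup reads keysize bytes at offset 0 and never mentions rel_id:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int Oid;       /* local stand-in for PostgreSQL's Oid */

typedef struct DemoEntry
{
    Oid rel_id;                 /* key: only ever read as raw bytes below */
    int list_index;
} DemoEntry;

/*
 * Hypothetical lookup: a linear scan standing in for a hash probe.
 * It compares the first keysize bytes of each entry against the key.
 */
static void *
demo_lookup(void *entries, size_t nentries, size_t entrysize,
            const void *key, size_t keysize)
{
    char *p = entries;

    for (size_t i = 0; i < nentries; i++, p += entrysize)
    {
        if (memcmp(p, key, keysize) == 0)
            return p;
    }
    return NULL;
}
```

If rel_id were removed from DemoEntry, this function would still compile and
would silently compare the bytes of list_index instead.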

I'm thinking about what kind of safety we could put in place to better deal with
1) and 2).

What about adding a macro that:

- requests the key member name
- ensures that it is at offset 0
- computes the key size based on the member

Something like:

 #define HASH_ELEM_INIT(ctl, entrytype, keymember) \
     do { \
         StaticAssertStmt(offsetof(entrytype, keymember) == 0, \
                          #keymember " must be first member in " #entrytype); \
         (ctl).keysize = sizeof(((entrytype *) 0)->keymember); \
         (ctl).entrysize = sizeof(entrytype); \
     } while (0)

That way:

- The key member is explicitly referenced in the code (preventing "unused"
false positives)
- The key size is automatically computed from the actual member type (preventing
type mismatches)
- We enforce that the key is at offset 0
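
As a rough check of these three properties outside the PostgreSQL tree, the
macro can be tried in a standalone file. This is only a sketch under stated
assumptions: C11 _Static_assert stands in for StaticAssertStmt, DemoHashCtl
is a hypothetical two-field stand-in for HASHCTL, Oid is redefined locally,
and demo_seen_rels_ctl is a made-up helper:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;       /* local stand-in for PostgreSQL's Oid */

/* Hypothetical stand-in for HASHCTL with only the two fields used here. */
typedef struct DemoHashCtl
{
    size_t keysize;
    size_t entrysize;
} DemoHashCtl;

typedef struct SeenRelsEntry
{
    Oid rel_id;                 /* hash key: must stay the first member */
    int list_index;
} SeenRelsEntry;

/* Assumption: C11 _Static_assert replaces StaticAssertStmt in this sketch. */
#define HASH_ELEM_INIT(ctl, entrytype, keymember) \
    do { \
        _Static_assert(offsetof(entrytype, keymember) == 0, \
                       #keymember " must be first member in " #entrytype); \
        (ctl).keysize = sizeof(((entrytype *) 0)->keymember); \
        (ctl).entrysize = sizeof(entrytype); \
    } while (0)

/* Fill a control struct for SeenRelsEntry keyed on rel_id. */
static DemoHashCtl
demo_seen_rels_ctl(void)
{
    DemoHashCtl ctl = {0, 0};

    HASH_ELEM_INIT(ctl, SeenRelsEntry, rel_id);
    return ctl;
}
```

In this sketch, removing rel_id from the struct makes the macro call fail to
compile, and moving it after list_index trips the static assertion.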

An additional benefit: it avoids repeating "keysize =" followed by "entrysize ="
in many places in the code (currently about 100).

If that sounds like a good idea, I could work on a patch doing so.

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




