[PATCH] Use cached hash to skip unnecessary key comparisons in dshash

public inbox for [email protected]  
help / color / mirror / Atom feed

From: CharSyam <[email protected]>
To: [email protected]
Subject: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash
Date: Sat, 11 Apr 2026 01:09:33 +0900
Message-ID: <CAMrLSE4JrhPB2XYMOMzbPSsgE+6kmDaKHKiF34NWdnbQts=Cpw@mail.gmail.com> (raw)

Dear PostgreSQL Hackers,

  This email proposes a patch to optimize hash table lookups and
  deletions in dshash.  The patch file,
  0001-Use-cached-hash-to-skip-unnecessary-key-comparisons-.patch,
  is attached.

  Problem:

  Each dshash_table_item already stores the hash value in its 'hash'
  field, but find_in_bucket() and delete_key_from_bucket() never use
  it.  They unconditionally call the compare function for every item
  in the bucket chain, incurring unnecessary overhead from DSM address
  translation and key comparison on non-matching items.

  Solution:

  This patch adds a hash pre-check (item->hash == hash) before calling
  equal_keys() in both find_in_bucket() and delete_key_from_bucket().
  Items with non-matching hash values are skipped with a single integer
  comparison, avoiding the expensive key comparison entirely.

  All callers -- dshash_find(), dshash_find_or_insert_extended(), and
  dshash_delete_key() -- already have the hash value computed, so it
  is simply passed through as a new parameter.  Since both functions
  are static, there is no API change.

  Benchmark:

  I ran a benchmark using the test_dsm_registry extension with 10,000
  string-keyed entries (key = 'key_1' .. 'key_10000') on an Apple
  Silicon machine:

    Test                     | Before    | After     | Improvement
    -------------------------+-----------+-----------+------------
    INSERT 10000 entries     | 14.99 ms  |  9.46 ms  | ~37%
    LOOKUP 10000 hits        | 10.40 ms  |  5.52 ms  | ~47%
    LOOKUP 10000 misses      |  8.41 ms  |  4.95 ms  | ~41%
    LOOKUP 50000 hits (x5)   | 33.48 ms  | 26.44 ms  | ~21%

  The improvement is most significant when bucket chains are longer
  and key comparison is expensive (e.g., string keys).

  Testing & Compatibility:

  The patch compiles cleanly and passes all core regression tests
  (make check) on macOS (arm64).  The changes are limited to internal
  static functions in dshash.c and do not affect any public API, so
  full backward compatibility is maintained.

  I believe this patch is ready for review.  I look forward to any
  feedback and am happy to make revisions.

  Thank you,
  charsyam

Attachments:

  [application/octet-stream] 0001-Use-cached-hash-to-skip-unnecessary-key-comparisons-.patch (4.4K, 3-0001-Use-cached-hash-to-skip-unnecessary-key-comparisons-.patch)
  download | inline diff:
From cf48b0a775cfeadd29288630b7c1fe707e3c7ab5 Mon Sep 17 00:00:00 2001
From: DaeMyung Kang <[email protected]>
Date: Sat, 11 Apr 2026 00:52:56 +0900
Subject: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash

Each dshash_table_item already stores the hash value, but
find_in_bucket() and delete_key_from_bucket() were not using it,
always calling the expensive compare function for every item in the
bucket chain.

Add a hash pre-check before calling equal_keys() so that items with
non-matching hash values are skipped with a simple integer comparison,
avoiding the overhead of DSM address translation and key comparison.

Benchmark with 10000 string-keyed entries using test_dsm_registry
shows 21-47% improvement across insert, lookup-hit, and lookup-miss
operations.

Signed-off-by: DaeMyung Kang <[email protected]>
---
 src/backend/lib/dshash.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/src/backend/lib/dshash.c b/src/backend/lib/dshash.c
index 1999989c14f..55a3235e08b 100644
--- a/src/backend/lib/dshash.c
+++ b/src/backend/lib/dshash.c
@@ -172,6 +172,7 @@ static bool resize(dshash_table *hash_table, size_t new_size_log2,
 static inline void ensure_valid_bucket_pointers(dshash_table *hash_table);
 static inline dshash_table_item *find_in_bucket(dshash_table *hash_table,
 												const void *key,
+												dshash_hash hash,
 												dsa_pointer item_pointer);
 static void insert_item_into_bucket(dshash_table *hash_table,
 									dsa_pointer item_pointer,
@@ -183,6 +184,7 @@ static dshash_table_item *insert_into_bucket(dshash_table *hash_table,
 											 int flags);
 static bool delete_key_from_bucket(dshash_table *hash_table,
 								   const void *key,
+								   dshash_hash hash,
 								   dsa_pointer *bucket_head);
 static bool delete_item_from_bucket(dshash_table *hash_table,
 									dshash_table_item *item,
@@ -408,7 +410,7 @@ dshash_find(dshash_table *hash_table, const void *key, bool exclusive)
 	ensure_valid_bucket_pointers(hash_table);

 	/* Search the active bucket. */
-	item = find_in_bucket(hash_table, key, BUCKET_FOR_HASH(hash_table, hash));
+	item = find_in_bucket(hash_table, key, hash, BUCKET_FOR_HASH(hash_table, hash));

 	if (!item)
 	{
@@ -462,7 +464,7 @@ restart:
 	ensure_valid_bucket_pointers(hash_table);

 	/* Search the active bucket. */
-	item = find_in_bucket(hash_table, key, BUCKET_FOR_HASH(hash_table, hash));
+	item = find_in_bucket(hash_table, key, hash, BUCKET_FOR_HASH(hash_table, hash));

 	if (item)
 		*found = true;
@@ -536,7 +538,7 @@ dshash_delete_key(dshash_table *hash_table, const void *key)
 	LWLockAcquire(PARTITION_LOCK(hash_table, partition), LW_EXCLUSIVE);
 	ensure_valid_bucket_pointers(hash_table);

-	if (delete_key_from_bucket(hash_table, key,
+	if (delete_key_from_bucket(hash_table, key, hash,
 							   &BUCKET_FOR_HASH(hash_table, hash)))
 	{
 		Assert(hash_table->control->partitions[partition].count > 0);
@@ -985,17 +987,20 @@ ensure_valid_bucket_pointers(dshash_table *hash_table)

 /*
  * Scan a locked bucket for a match, using the provided compare function.
+ * Skips items whose cached hash values don't match, avoiding expensive
+ * key comparisons.
  */
 static inline dshash_table_item *
 find_in_bucket(dshash_table *hash_table, const void *key,
-			   dsa_pointer item_pointer)
+			   dshash_hash hash, dsa_pointer item_pointer)
 {
 	while (DsaPointerIsValid(item_pointer))
 	{
 		dshash_table_item *item;

 		item = dsa_get_address(hash_table->area, item_pointer);
-		if (equal_keys(hash_table, key, ENTRY_FROM_ITEM(item)))
+		if (item->hash == hash &&
+			equal_keys(hash_table, key, ENTRY_FROM_ITEM(item)))
 			return item;
 		item_pointer = item->next;
 	}
@@ -1047,10 +1052,13 @@ insert_into_bucket(dshash_table *hash_table,

 /*
  * Search a bucket for a matching key and delete it.
+ * Skips items whose cached hash values don't match, avoiding expensive
+ * key comparisons.
  */
 static bool
 delete_key_from_bucket(dshash_table *hash_table,
 					   const void *key,
+					   dshash_hash hash,
 					   dsa_pointer *bucket_head)
 {
 	while (DsaPointerIsValid(*bucket_head))
@@ -1059,7 +1067,8 @@ delete_key_from_bucket(dshash_table *hash_table,

 		item = dsa_get_address(hash_table->area, *bucket_head);

-		if (equal_keys(hash_table, key, ENTRY_FROM_ITEM(item)))
+		if (item->hash == hash &&
+			equal_keys(hash_table, key, ENTRY_FROM_ITEM(item)))
 		{
 			dsa_pointer next;

-- 
2.50.1 (Apple Git-155)

view thread (4+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash
  In-Reply-To: <CAMrLSE4JrhPB2XYMOMzbPSsgE+6kmDaKHKiF34NWdnbQts=Cpw@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox