Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash

public inbox for [email protected]  
help / color / mirror / Atom feed

From: CharSyam <[email protected]>
To: Nathan Bossart <[email protected]>
Cc: [email protected]
Subject: Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash
Date: Sat, 11 Apr 2026 10:17:07 +0900
Message-ID: <CAMrLSE5DHxxBxhSMCnimZoMdsUjKagDF4PzgMoFMsZcp_=24cg@mail.gmail.com> (raw)
In-Reply-To: <adk9IiZYGam6OltO@nathan>
References: <CAMrLSE4JrhPB2XYMOMzbPSsgE+6kmDaKHKiF34NWdnbQts=Cpw@mail.gmail.com>
	<adk8Z0HGDaXSRIgF@nathan>
	<adk9IiZYGam6OltO@nathan>

Thank you for the thoughtful review. I've attached an updated patch (v2)
  addressing your feedback.

  1. Comment added

  I've added a comment in find_in_bucket() explaining the hash pre-check
  assumption:

    /*
     * If the hash values don't match, the keys certainly don't match
     * either, so we can skip the expensive key comparison.  Matching
     * hashes don't guarantee matching keys, so equal_keys() is still
     * needed for confirmation.
     */

  A short "See comment in find_in_bucket()" reference is also added in
  delete_key_from_bucket().

  2. No regressions with short keys / cheap comparisons

  I ran an additional benchmark with short keys ("1" ~ "10000", 1-5 chars)
  where strcmp cost is minimal:

    Test                     | Before    | After     | Change
    -------------------------+-----------+-----------+------------
    INSERT 10000 entries     | 13.26 ms  |  6.44 ms  | ~51% faster
    LOOKUP 10000 hits        |  7.94 ms  |  5.03 ms  | ~37% faster
    LOOKUP 10000 misses      |  8.08 ms  |  4.80 ms  | ~41% faster
    LOOKUP 50000 hits (x5)   | 33.05 ms  | 24.06 ms  | ~27% faster

  For reference, here are the original string key results ("key_1" ~
  "key_10000"):

    Test                     | Before    | After     | Change
    -------------------------+-----------+-----------+------------
    INSERT 10000 entries     | 14.99 ms  |  9.46 ms  | ~37% faster
    LOOKUP 10000 hits        | 10.40 ms  |  5.52 ms  | ~47% faster
    LOOKUP 10000 misses      |  8.41 ms  |  4.95 ms  | ~41% faster
    LOOKUP 50000 hits (x5)   | 33.48 ms  | 26.44 ms  | ~21% faster

  No regressions in either scenario.  Even with short keys, the integer
  hash pre-check is always cheaper than the DSM address translation plus
  compare function call it avoids.

  Both tests ran on macOS (arm64) using the test_dsm_registry extension
  with 10,000 entries.

  3. Keeping the pre-check outside equal_keys()

  I considered moving the hash comparison into equal_keys(), but I think
  keeping it at the call site is a better fit for the following reasons:

    - equal_keys() is a pure key comparison function.  Mixing in hash
      comparison would change its semantics and make the name misleading.

    - To pass hashes into equal_keys(), we would need to add two hash
      parameters (one for the search key, one for the item).  The caller
      still needs to extract item->hash before the call, so the call site
      doesn't actually get simpler.

    - There are only two call sites, both already modified in this patch,
      so there is no real maintenance burden from having the check at each
      call site.

    - This pattern is consistent with other hash table implementations in
      PostgreSQL (e.g., dynahash.c), which also compare hashes outside
      the key equality function.

  I'm happy to reconsider if you feel differently.

  Regards,
  charsyam

2026년 4월 11일 (토) 오전 3:10, Nathan Bossart <[email protected]>님이 작성:

> On Sat, Apr 11, 2026 at 01:09:33AM +0900, CharSyam wrote:
> > I believe this patch is ready for review.  I look forward to any feedback
> > and am happy to make revisions.
>
> Sorry, I forgot to ask whether we could move the "pre-check" to within the
> equal_keys() function so that it needn't be added to every one of its
> callers.
>
> --
> nathan
>


Attachments:

  [application/octet-stream] 0001-Use-cached-hash-to-skip-unnecessary-key-comparisons-v2.patch (4.7K, 3-0001-Use-cached-hash-to-skip-unnecessary-key-comparisons-v2.patch)
  download | inline diff:
From 0d8f0f44a0e9aa8b6a957ead980185bfad223706 Mon Sep 17 00:00:00 2001
From: DaeMyung Kang <[email protected]>
Date: Sat, 11 Apr 2026 00:52:56 +0900
Subject: [PATCH v2] Use cached hash to skip unnecessary key comparisons in
 dshash

Each dshash_table_item already stores the hash value, but
find_in_bucket() and delete_key_from_bucket() were not using it,
always calling the expensive compare function for every item in the
bucket chain.

Add a hash pre-check before calling equal_keys() so that items with
non-matching hash values are skipped with a simple integer comparison,
avoiding the overhead of DSM address translation and key comparison.

Benchmark with 10000 string-keyed entries using test_dsm_registry
shows 21-47% improvement across insert, lookup-hit, and lookup-miss
operations.

Signed-off-by: DaeMyung Kang <[email protected]>
---
 src/backend/lib/dshash.c | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/src/backend/lib/dshash.c b/src/backend/lib/dshash.c
index 1999989c14f..91bbb8e9b6e 100644
--- a/src/backend/lib/dshash.c
+++ b/src/backend/lib/dshash.c
@@ -172,6 +172,7 @@ static bool resize(dshash_table *hash_table, size_t new_size_log2,
 static inline void ensure_valid_bucket_pointers(dshash_table *hash_table);
 static inline dshash_table_item *find_in_bucket(dshash_table *hash_table,
 												const void *key,
+												dshash_hash hash,
 												dsa_pointer item_pointer);
 static void insert_item_into_bucket(dshash_table *hash_table,
 									dsa_pointer item_pointer,
@@ -183,6 +184,7 @@ static dshash_table_item *insert_into_bucket(dshash_table *hash_table,
 											 int flags);
 static bool delete_key_from_bucket(dshash_table *hash_table,
 								   const void *key,
+								   dshash_hash hash,
 								   dsa_pointer *bucket_head);
 static bool delete_item_from_bucket(dshash_table *hash_table,
 									dshash_table_item *item,
@@ -408,7 +410,7 @@ dshash_find(dshash_table *hash_table, const void *key, bool exclusive)
 	ensure_valid_bucket_pointers(hash_table);
 
 	/* Search the active bucket. */
-	item = find_in_bucket(hash_table, key, BUCKET_FOR_HASH(hash_table, hash));
+	item = find_in_bucket(hash_table, key, hash, BUCKET_FOR_HASH(hash_table, hash));
 
 	if (!item)
 	{
@@ -462,7 +464,7 @@ restart:
 	ensure_valid_bucket_pointers(hash_table);
 
 	/* Search the active bucket. */
-	item = find_in_bucket(hash_table, key, BUCKET_FOR_HASH(hash_table, hash));
+	item = find_in_bucket(hash_table, key, hash, BUCKET_FOR_HASH(hash_table, hash));
 
 	if (item)
 		*found = true;
@@ -536,7 +538,7 @@ dshash_delete_key(dshash_table *hash_table, const void *key)
 	LWLockAcquire(PARTITION_LOCK(hash_table, partition), LW_EXCLUSIVE);
 	ensure_valid_bucket_pointers(hash_table);
 
-	if (delete_key_from_bucket(hash_table, key,
+	if (delete_key_from_bucket(hash_table, key, hash,
 							   &BUCKET_FOR_HASH(hash_table, hash)))
 	{
 		Assert(hash_table->control->partitions[partition].count > 0);
@@ -985,17 +987,27 @@ ensure_valid_bucket_pointers(dshash_table *hash_table)
 
 /*
  * Scan a locked bucket for a match, using the provided compare function.
+ * Skips items whose cached hash values don't match, avoiding expensive
+ * key comparisons.
  */
 static inline dshash_table_item *
 find_in_bucket(dshash_table *hash_table, const void *key,
-			   dsa_pointer item_pointer)
+			   dshash_hash hash, dsa_pointer item_pointer)
 {
 	while (DsaPointerIsValid(item_pointer))
 	{
 		dshash_table_item *item;
 
 		item = dsa_get_address(hash_table->area, item_pointer);
-		if (equal_keys(hash_table, key, ENTRY_FROM_ITEM(item)))
+
+		/*
+		 * If the hash values don't match, the keys certainly don't match
+		 * either, so we can skip the expensive key comparison.  Matching
+		 * hashes don't guarantee matching keys, so equal_keys() is still
+		 * needed for confirmation.
+		 */
+		if (item->hash == hash &&
+			equal_keys(hash_table, key, ENTRY_FROM_ITEM(item)))
 			return item;
 		item_pointer = item->next;
 	}
@@ -1047,10 +1059,13 @@ insert_into_bucket(dshash_table *hash_table,
 
 /*
  * Search a bucket for a matching key and delete it.
+ * Skips items whose cached hash values don't match, avoiding expensive
+ * key comparisons.
  */
 static bool
 delete_key_from_bucket(dshash_table *hash_table,
 					   const void *key,
+					   dshash_hash hash,
 					   dsa_pointer *bucket_head)
 {
 	while (DsaPointerIsValid(*bucket_head))
@@ -1059,7 +1074,9 @@ delete_key_from_bucket(dshash_table *hash_table,
 
 		item = dsa_get_address(hash_table->area, *bucket_head);
 
-		if (equal_keys(hash_table, key, ENTRY_FROM_ITEM(item)))
+		/* See comment in find_in_bucket() */
+		if (item->hash == hash &&
+			equal_keys(hash_table, key, ENTRY_FROM_ITEM(item)))
 		{
 			dsa_pointer next;
 
-- 
2.50.1 (Apple Git-155)

view thread (4+ messages)

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash
  In-Reply-To: <CAMrLSE5DHxxBxhSMCnimZoMdsUjKagDF4PzgMoFMsZcp_=24cg@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox