[PATCH] Use cached hash to skip unnecessary key comparisons in dshash

public inbox for [email protected]  
help / color / mirror / Atom feed

[PATCH] Use cached hash to skip unnecessary key comparisons in dshash
4+ messages / 2 participants
[nested] [flat]

* [PATCH] Use cached hash to skip unnecessary key comparisons in dshash
@ 2026-04-10 16:09 CharSyam <[email protected]>
  2026-04-10 18:07 ` Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: CharSyam @ 2026-04-10 16:09 UTC (permalink / raw)
  To: [email protected]

Dear PostgreSQL Hackers,

  This email proposes a patch to optimize hash table lookups and
  deletions in dshash.  The patch file,
  0001-Use-cached-hash-to-skip-unnecessary-key-comparisons-.patch,
  is attached.

  Problem:

  Each dshash_table_item already stores the hash value in its 'hash'
  field, but find_in_bucket() and delete_key_from_bucket() never use
  it.  They unconditionally call the compare function for every item
  in the bucket chain, incurring unnecessary overhead from DSM address
  translation and key comparison on non-matching items.

  Solution:

  This patch adds a hash pre-check (item->hash == hash) before calling
  equal_keys() in both find_in_bucket() and delete_key_from_bucket().
  Items with non-matching hash values are skipped with a single integer
  comparison, avoiding the expensive key comparison entirely.

  All callers -- dshash_find(), dshash_find_or_insert_extended(), and
  dshash_delete_key() -- already have the hash value computed, so it
  is simply passed through as a new parameter.  Since both functions
  are static, there is no API change.

  Benchmark:

  I ran a benchmark using the test_dsm_registry extension with 10,000
  string-keyed entries (key = 'key_1' .. 'key_10000') on an Apple
  Silicon machine:

    Test                     | Before    | After     | Improvement
    -------------------------+-----------+-----------+------------
    INSERT 10000 entries     | 14.99 ms  |  9.46 ms  | ~37%
    LOOKUP 10000 hits        | 10.40 ms  |  5.52 ms  | ~47%
    LOOKUP 10000 misses      |  8.41 ms  |  4.95 ms  | ~41%
    LOOKUP 50000 hits (x5)   | 33.48 ms  | 26.44 ms  | ~21%

  The improvement is most significant when bucket chains are longer
  and key comparison is expensive (e.g., string keys).

  Testing & Compatibility:

  The patch compiles cleanly and passes all core regression tests
  (make check) on macOS (arm64).  The changes are limited to internal
  static functions in dshash.c and do not affect any public API, so
  full backward compatibility is maintained.

  I believe this patch is ready for review.  I look forward to any
  feedback and am happy to make revisions.

  Thank you,
  charsyam

Attachments:

  [application/octet-stream] 0001-Use-cached-hash-to-skip-unnecessary-key-comparisons-.patch (4.4K, 3-0001-Use-cached-hash-to-skip-unnecessary-key-comparisons-.patch)
  download | inline diff:
From cf48b0a775cfeadd29288630b7c1fe707e3c7ab5 Mon Sep 17 00:00:00 2001
From: DaeMyung Kang <[email protected]>
Date: Sat, 11 Apr 2026 00:52:56 +0900
Subject: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash

Each dshash_table_item already stores the hash value, but
find_in_bucket() and delete_key_from_bucket() were not using it,
always calling the expensive compare function for every item in the
bucket chain.

Add a hash pre-check before calling equal_keys() so that items with
non-matching hash values are skipped with a simple integer comparison,
avoiding the overhead of DSM address translation and key comparison.

Benchmark with 10000 string-keyed entries using test_dsm_registry
shows 21-47% improvement across insert, lookup-hit, and lookup-miss
operations.

Signed-off-by: DaeMyung Kang <[email protected]>
---
 src/backend/lib/dshash.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/src/backend/lib/dshash.c b/src/backend/lib/dshash.c
index 1999989c14f..55a3235e08b 100644
--- a/src/backend/lib/dshash.c
+++ b/src/backend/lib/dshash.c
@@ -172,6 +172,7 @@ static bool resize(dshash_table *hash_table, size_t new_size_log2,
 static inline void ensure_valid_bucket_pointers(dshash_table *hash_table);
 static inline dshash_table_item *find_in_bucket(dshash_table *hash_table,
 												const void *key,
+												dshash_hash hash,
 												dsa_pointer item_pointer);
 static void insert_item_into_bucket(dshash_table *hash_table,
 									dsa_pointer item_pointer,
@@ -183,6 +184,7 @@ static dshash_table_item *insert_into_bucket(dshash_table *hash_table,
 											 int flags);
 static bool delete_key_from_bucket(dshash_table *hash_table,
 								   const void *key,
+								   dshash_hash hash,
 								   dsa_pointer *bucket_head);
 static bool delete_item_from_bucket(dshash_table *hash_table,
 									dshash_table_item *item,
@@ -408,7 +410,7 @@ dshash_find(dshash_table *hash_table, const void *key, bool exclusive)
 	ensure_valid_bucket_pointers(hash_table);

 	/* Search the active bucket. */
-	item = find_in_bucket(hash_table, key, BUCKET_FOR_HASH(hash_table, hash));
+	item = find_in_bucket(hash_table, key, hash, BUCKET_FOR_HASH(hash_table, hash));

 	if (!item)
 	{
@@ -462,7 +464,7 @@ restart:
 	ensure_valid_bucket_pointers(hash_table);

 	/* Search the active bucket. */
-	item = find_in_bucket(hash_table, key, BUCKET_FOR_HASH(hash_table, hash));
+	item = find_in_bucket(hash_table, key, hash, BUCKET_FOR_HASH(hash_table, hash));

 	if (item)
 		*found = true;
@@ -536,7 +538,7 @@ dshash_delete_key(dshash_table *hash_table, const void *key)
 	LWLockAcquire(PARTITION_LOCK(hash_table, partition), LW_EXCLUSIVE);
 	ensure_valid_bucket_pointers(hash_table);

-	if (delete_key_from_bucket(hash_table, key,
+	if (delete_key_from_bucket(hash_table, key, hash,
 							   &BUCKET_FOR_HASH(hash_table, hash)))
 	{
 		Assert(hash_table->control->partitions[partition].count > 0);
@@ -985,17 +987,20 @@ ensure_valid_bucket_pointers(dshash_table *hash_table)

 /*
  * Scan a locked bucket for a match, using the provided compare function.
+ * Skips items whose cached hash values don't match, avoiding expensive
+ * key comparisons.
  */
 static inline dshash_table_item *
 find_in_bucket(dshash_table *hash_table, const void *key,
-			   dsa_pointer item_pointer)
+			   dshash_hash hash, dsa_pointer item_pointer)
 {
 	while (DsaPointerIsValid(item_pointer))
 	{
 		dshash_table_item *item;

 		item = dsa_get_address(hash_table->area, item_pointer);
-		if (equal_keys(hash_table, key, ENTRY_FROM_ITEM(item)))
+		if (item->hash == hash &&
+			equal_keys(hash_table, key, ENTRY_FROM_ITEM(item)))
 			return item;
 		item_pointer = item->next;
 	}
@@ -1047,10 +1052,13 @@ insert_into_bucket(dshash_table *hash_table,

 /*
  * Search a bucket for a matching key and delete it.
+ * Skips items whose cached hash values don't match, avoiding expensive
+ * key comparisons.
  */
 static bool
 delete_key_from_bucket(dshash_table *hash_table,
 					   const void *key,
+					   dshash_hash hash,
 					   dsa_pointer *bucket_head)
 {
 	while (DsaPointerIsValid(*bucket_head))
@@ -1059,7 +1067,8 @@ delete_key_from_bucket(dshash_table *hash_table,

 		item = dsa_get_address(hash_table->area, *bucket_head);

-		if (equal_keys(hash_table, key, ENTRY_FROM_ITEM(item)))
+		if (item->hash == hash &&
+			equal_keys(hash_table, key, ENTRY_FROM_ITEM(item)))
 		{
 			dsa_pointer next;

-- 
2.50.1 (Apple Git-155)

^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash
  2026-04-10 16:09 [PATCH] Use cached hash to skip unnecessary key comparisons in dshash CharSyam <[email protected]>
@ 2026-04-10 18:07 ` Nathan Bossart <[email protected]>
  2026-04-10 18:10   ` Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash Nathan Bossart <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Nathan Bossart @ 2026-04-10 18:07 UTC (permalink / raw)
  To: CharSyam <[email protected]>; +Cc: [email protected]

On Sat, Apr 11, 2026 at 01:09:33AM +0900, CharSyam wrote:
> This patch adds a hash pre-check (item->hash == hash) before calling
> equal_keys() in both find_in_bucket() and delete_key_from_bucket().
> Items with non-matching hash values are skipped with a single integer
> comparison, avoiding the expensive key comparison entirely.

This relies on the fact that matching keys will have matching hashes, but
matching hashes does not necessarily imply matching keys.  IIUC this is a
safe assumption, although a short comment to this effect might be a nice
addition.

>   Test                     | Before    | After     | Improvement
>   -------------------------+-----------+-----------+------------
>   INSERT 10000 entries     | 14.99 ms  |  9.46 ms  | ~37%
>   LOOKUP 10000 hits        | 10.40 ms  |  5.52 ms  | ~47%
>   LOOKUP 10000 misses      |  8.41 ms  |  4.95 ms  | ~41%
>   LOOKUP 50000 hits (x5)   | 33.48 ms  | 26.44 ms  | ~21%
> 
> The improvement is most significant when bucket chains are longer and key
> comparison is expensive (e.g., string keys).

Nice results.  Are there any regressions when the bucket chains are short
or when key comparisons are inexpensive?

> I believe this patch is ready for review.  I look forward to any feedback
> and am happy to make revisions.

We are in feature-freeze for PostgreSQL v19 now, so this will unfortunately
need to wait until July when v20 development begins.  Please add an entry
to the commitfest site to ensure this doesn't get lost:

	https://commitfest.postgresql.org/

-- 
nathan





^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash
  2026-04-10 16:09 [PATCH] Use cached hash to skip unnecessary key comparisons in dshash CharSyam <[email protected]>
  2026-04-10 18:07 ` Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash Nathan Bossart <[email protected]>
@ 2026-04-10 18:10   ` Nathan Bossart <[email protected]>
  2026-04-11 01:17     ` Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash CharSyam <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Nathan Bossart @ 2026-04-10 18:10 UTC (permalink / raw)
  To: CharSyam <[email protected]>; +Cc: [email protected]

On Sat, Apr 11, 2026 at 01:09:33AM +0900, CharSyam wrote:
> I believe this patch is ready for review.  I look forward to any feedback
> and am happy to make revisions.

Sorry, I forgot to ask whether we could move the "pre-check" to within the
equal_keys() function so that it needn't be added to every one of its
callers.

-- 
nathan





^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash
  2026-04-10 16:09 [PATCH] Use cached hash to skip unnecessary key comparisons in dshash CharSyam <[email protected]>
  2026-04-10 18:07 ` Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash Nathan Bossart <[email protected]>
  2026-04-10 18:10   ` Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash Nathan Bossart <[email protected]>
@ 2026-04-11 01:17     ` CharSyam <[email protected]>
  0 siblings, 0 replies; 4+ messages in thread

From: CharSyam @ 2026-04-11 01:17 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: [email protected]

Thank you for the thoughtful review. I've attached an updated patch (v2)
  addressing your feedback.

  1. Comment added

  I've added a comment in find_in_bucket() explaining the hash pre-check
  assumption:

    /*
     * If the hash values don't match, the keys certainly don't match
     * either, so we can skip the expensive key comparison.  Matching
     * hashes don't guarantee matching keys, so equal_keys() is still
     * needed for confirmation.
     */

  A short "See comment in find_in_bucket()" reference is also added in
  delete_key_from_bucket().

  2. No regressions with short keys / cheap comparisons

  I ran an additional benchmark with short keys ("1" ~ "10000", 1-5 chars)
  where strcmp cost is minimal:

    Test                     | Before    | After     | Change
    -------------------------+-----------+-----------+------------
    INSERT 10000 entries     | 13.26 ms  |  6.44 ms  | ~51% faster
    LOOKUP 10000 hits        |  7.94 ms  |  5.03 ms  | ~37% faster
    LOOKUP 10000 misses      |  8.08 ms  |  4.80 ms  | ~41% faster
    LOOKUP 50000 hits (x5)   | 33.05 ms  | 24.06 ms  | ~27% faster

  For reference, here are the original string key results ("key_1" ~
  "key_10000"):

    Test                     | Before    | After     | Change
    -------------------------+-----------+-----------+------------
    INSERT 10000 entries     | 14.99 ms  |  9.46 ms  | ~37% faster
    LOOKUP 10000 hits        | 10.40 ms  |  5.52 ms  | ~47% faster
    LOOKUP 10000 misses      |  8.41 ms  |  4.95 ms  | ~41% faster
    LOOKUP 50000 hits (x5)   | 33.48 ms  | 26.44 ms  | ~21% faster

  No regressions in either scenario.  Even with short keys, the integer
  hash pre-check is always cheaper than the DSM address translation plus
  compare function call it avoids.

  Both tests ran on macOS (arm64) using the test_dsm_registry extension
  with 10,000 entries.

  3. Keeping the pre-check outside equal_keys()

  I considered moving the hash comparison into equal_keys(), but I think
  keeping it at the call site is a better fit for the following reasons:

    - equal_keys() is a pure key comparison function.  Mixing in hash
      comparison would change its semantics and make the name misleading.

    - To pass hashes into equal_keys(), we would need to add two hash
      parameters (one for the search key, one for the item).  The caller
      still needs to extract item->hash before the call, so the call site
      doesn't actually get simpler.

    - There are only two call sites, both already modified in this patch,
      so there is no real maintenance burden from having the check at each
      call site.

    - This pattern is consistent with other hash table implementations in
      PostgreSQL (e.g., dynahash.c), which also compare hashes outside
      the key equality function.

  I'm happy to reconsider if you feel differently.

  Regards,
  charsyam

2026년 4월 11일 (토) 오전 3:10, Nathan Bossart <[email protected]>님이 작성:

> On Sat, Apr 11, 2026 at 01:09:33AM +0900, CharSyam wrote:
> > I believe this patch is ready for review.  I look forward to any feedback
> > and am happy to make revisions.
>
> Sorry, I forgot to ask whether we could move the "pre-check" to within the
> equal_keys() function so that it needn't be added to every one of its
> callers.
>
> --
> nathan
>


Attachments:

  [application/octet-stream] 0001-Use-cached-hash-to-skip-unnecessary-key-comparisons-v2.patch (4.7K, 3-0001-Use-cached-hash-to-skip-unnecessary-key-comparisons-v2.patch)
  download | inline diff:
From 0d8f0f44a0e9aa8b6a957ead980185bfad223706 Mon Sep 17 00:00:00 2001
From: DaeMyung Kang <[email protected]>
Date: Sat, 11 Apr 2026 00:52:56 +0900
Subject: [PATCH v2] Use cached hash to skip unnecessary key comparisons in
 dshash

Each dshash_table_item already stores the hash value, but
find_in_bucket() and delete_key_from_bucket() were not using it,
always calling the expensive compare function for every item in the
bucket chain.

Add a hash pre-check before calling equal_keys() so that items with
non-matching hash values are skipped with a simple integer comparison,
avoiding the overhead of DSM address translation and key comparison.

Benchmark with 10000 string-keyed entries using test_dsm_registry
shows 21-47% improvement across insert, lookup-hit, and lookup-miss
operations.

Signed-off-by: DaeMyung Kang <[email protected]>
---
 src/backend/lib/dshash.c | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/src/backend/lib/dshash.c b/src/backend/lib/dshash.c
index 1999989c14f..91bbb8e9b6e 100644
--- a/src/backend/lib/dshash.c
+++ b/src/backend/lib/dshash.c
@@ -172,6 +172,7 @@ static bool resize(dshash_table *hash_table, size_t new_size_log2,
 static inline void ensure_valid_bucket_pointers(dshash_table *hash_table);
 static inline dshash_table_item *find_in_bucket(dshash_table *hash_table,
 												const void *key,
+												dshash_hash hash,
 												dsa_pointer item_pointer);
 static void insert_item_into_bucket(dshash_table *hash_table,
 									dsa_pointer item_pointer,
@@ -183,6 +184,7 @@ static dshash_table_item *insert_into_bucket(dshash_table *hash_table,
 											 int flags);
 static bool delete_key_from_bucket(dshash_table *hash_table,
 								   const void *key,
+								   dshash_hash hash,
 								   dsa_pointer *bucket_head);
 static bool delete_item_from_bucket(dshash_table *hash_table,
 									dshash_table_item *item,
@@ -408,7 +410,7 @@ dshash_find(dshash_table *hash_table, const void *key, bool exclusive)
 	ensure_valid_bucket_pointers(hash_table);
 
 	/* Search the active bucket. */
-	item = find_in_bucket(hash_table, key, BUCKET_FOR_HASH(hash_table, hash));
+	item = find_in_bucket(hash_table, key, hash, BUCKET_FOR_HASH(hash_table, hash));
 
 	if (!item)
 	{
@@ -462,7 +464,7 @@ restart:
 	ensure_valid_bucket_pointers(hash_table);
 
 	/* Search the active bucket. */
-	item = find_in_bucket(hash_table, key, BUCKET_FOR_HASH(hash_table, hash));
+	item = find_in_bucket(hash_table, key, hash, BUCKET_FOR_HASH(hash_table, hash));
 
 	if (item)
 		*found = true;
@@ -536,7 +538,7 @@ dshash_delete_key(dshash_table *hash_table, const void *key)
 	LWLockAcquire(PARTITION_LOCK(hash_table, partition), LW_EXCLUSIVE);
 	ensure_valid_bucket_pointers(hash_table);
 
-	if (delete_key_from_bucket(hash_table, key,
+	if (delete_key_from_bucket(hash_table, key, hash,
 							   &BUCKET_FOR_HASH(hash_table, hash)))
 	{
 		Assert(hash_table->control->partitions[partition].count > 0);
@@ -985,17 +987,27 @@ ensure_valid_bucket_pointers(dshash_table *hash_table)
 
 /*
  * Scan a locked bucket for a match, using the provided compare function.
+ * Skips items whose cached hash values don't match, avoiding expensive
+ * key comparisons.
  */
 static inline dshash_table_item *
 find_in_bucket(dshash_table *hash_table, const void *key,
-			   dsa_pointer item_pointer)
+			   dshash_hash hash, dsa_pointer item_pointer)
 {
 	while (DsaPointerIsValid(item_pointer))
 	{
 		dshash_table_item *item;
 
 		item = dsa_get_address(hash_table->area, item_pointer);
-		if (equal_keys(hash_table, key, ENTRY_FROM_ITEM(item)))
+
+		/*
+		 * If the hash values don't match, the keys certainly don't match
+		 * either, so we can skip the expensive key comparison.  Matching
+		 * hashes don't guarantee matching keys, so equal_keys() is still
+		 * needed for confirmation.
+		 */
+		if (item->hash == hash &&
+			equal_keys(hash_table, key, ENTRY_FROM_ITEM(item)))
 			return item;
 		item_pointer = item->next;
 	}
@@ -1047,10 +1059,13 @@ insert_into_bucket(dshash_table *hash_table,
 
 /*
  * Search a bucket for a matching key and delete it.
+ * Skips items whose cached hash values don't match, avoiding expensive
+ * key comparisons.
  */
 static bool
 delete_key_from_bucket(dshash_table *hash_table,
 					   const void *key,
+					   dshash_hash hash,
 					   dsa_pointer *bucket_head)
 {
 	while (DsaPointerIsValid(*bucket_head))
@@ -1059,7 +1074,9 @@ delete_key_from_bucket(dshash_table *hash_table,
 
 		item = dsa_get_address(hash_table->area, *bucket_head);
 
-		if (equal_keys(hash_table, key, ENTRY_FROM_ITEM(item)))
+		/* See comment in find_in_bucket() */
+		if (item->hash == hash &&
+			equal_keys(hash_table, key, ENTRY_FROM_ITEM(item)))
 		{
 			dsa_pointer next;
 
-- 
2.50.1 (Apple Git-155)



^ permalink  raw  reply  [nested|flat] 4+ messages in thread

end of thread, other threads:[~2026-04-11 01:17 UTC | newest]

Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-04-10 16:09 [PATCH] Use cached hash to skip unnecessary key comparisons in dshash CharSyam <[email protected]>
2026-04-10 18:07 ` Nathan Bossart <[email protected]>
2026-04-10 18:10   ` Nathan Bossart <[email protected]>
2026-04-11 01:17     ` CharSyam <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox