public inbox for [email protected]  
help / color / mirror / Atom feed
From: Tom Lane <[email protected]>
To: Andrei Lepikhov <[email protected]>
Cc: [email protected]
Cc: Peter Eisentraut <[email protected]>
Subject: Re: Hashed SAOP on composite type with non-hashable column errors at runtime
Date: Fri, 05 Jun 2026 14:12:02 -0400
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>

Andrei Lepikhov <[email protected]> writes:
> There is an issue when we use a record-based array operation in SQL:

> EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, BUFFERS OFF, SUMMARY OFF)
> SELECT count(*) FROM test
> WHERE (a,b) = ANY (ARRAY[
>   (1, 'w1'::tsvector), (2, 'w2'::tsvector), (3, 'w3'::tsvector),
>   (4, 'w4'::tsvector), (5, 'w5'::tsvector), (6, 'w6'::tsvector),
>   (7, 'w7'::tsvector), (8, 'w8'::tsvector), (9, 'w9'::tsvector)
>   ]);
> ERROR:  could not identify a hash function for type tsvector

Yeah, this is a bug, but I don't think you've identified the
full scope of the problem.  In the first place, there's another
get_op_hash_functions call in selfuncs.c that's also at risk.
In the second place, the same hazard exists for range and
multirange types, which can have non-hashable subtypes.
AFAICT noplace at all is defending against that.

So I'm unexcited about putting the fix for this into
convert_saop_to_hashed_saop_walker as you've done here.
I think it needs to be addressed at the level of the relevant
lsyscache.c lookup functions, so that there's some chance that
future code additions will get this right.  Draft fix attached.

I can't get excited about the test case you suggest;
it's rather expensive and it will do nothing whatever
to guard against future mistakes of the same kind.

I'm also unexcited about your 0002 and 0003.  I don't really
care about optimizing the anonymous-record case; by and large,
it's coincidental that complicated operations work at all on
anonymous record types.

			regards, tom lane



Attachments:

  [text/x-diff] v1-0001-Fix-missed-checks-for-hashability-of-container-ty.patch (13.1K, 2-v1-0001-Fix-missed-checks-for-hashability-of-container-ty.patch)
  download | inline diff:
From 8e1096d3c9b116243cf06964b2ccd8e5115ae9eb Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Fri, 5 Jun 2026 14:06:17 -0400
Subject: [PATCH v1] Fix missed checks for hashability of container-type
 equality.

The operators for array_eq, record_eq, range_eq, and multirange_eq
are all marked oprcanhash, but there's a pitfall: their hash functions
can fail at runtime if the contained type(s) are not hashable.
Therefore, the planner has to check hashability of the contained types
before deciding it can use hashing in these cases.  Not every place
had gotten this memo, and noplace at all had considered the issue
for ranges or multiranges.

For the most part we should fix this in the lookup functions provided
by lsyscache.c, to wit get_op_hash_functions and op_hashjoinable.
But there's a problem: get_op_hash_functions is not passed the input
data type it would need to check.  We mustn't change the API of that
exported function in a back-patched fix, and even if we wanted to,
its call sites in the executor mostly don't have easy access to the
required data type OID.  Fortunately, the executor call sites don't
actually need fixing, because it's expected that the planner verified
hashability before building a plan that requires it.  Therefore,
leave get_op_hash_functions as-is and invent a wrapper function
get_op_hash_functions_ext that does the additional checking needed
in the planner's uses.

We also need to fix hash_ok_operator (extending the fix in 647889667).
While at it, neaten up a couple of places in lookup_type_cache where
relevant code for multirange cases was written differently from the
code for other container types.

Reported-by: Andrei Lepikhov <[email protected]>
Author: Andrei Lepikhov <[email protected]>
Co-authored-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 14
---
 src/backend/optimizer/plan/subselect.c |  4 +-
 src/backend/optimizer/util/clauses.c   |  9 ++-
 src/backend/utils/adt/selfuncs.c       |  4 +-
 src/backend/utils/cache/lsyscache.c    | 78 ++++++++++++++++++++++++--
 src/backend/utils/cache/typcache.c     | 25 +++------
 src/include/catalog/pg_operator.dat    |  4 +-
 src/include/utils/lsyscache.h          |  2 +
 7 files changed, 98 insertions(+), 28 deletions(-)

diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index ccec1eaa7fe..6aa8971c95d 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -841,7 +841,9 @@ hash_ok_operator(OpExpr *expr)
 	if (list_length(expr->args) != 2)
 		return false;
 	if (opid == ARRAY_EQ_OP ||
-		opid == RECORD_EQ_OP)
+		opid == RECORD_EQ_OP ||
+		opid == RANGE_EQ_OP ||
+		opid == MULTIRANGE_EQ_OP)
 	{
 		/* these are strict, but must check input type to ensure hashable */
 		Node	   *leftarg = linitial(expr->args);
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index cd86311bb0b..07738894d1a 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2544,7 +2544,8 @@ convert_saop_to_hashed_saop_walker(Node *node, void *context)
 	if (IsA(node, ScalarArrayOpExpr))
 	{
 		ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) node;
-		Expr	   *arrayarg = (Expr *) lsecond(saop->args);
+		Node	   *leftarg = (Node *) linitial(saop->args);
+		Node	   *arrayarg = (Node *) lsecond(saop->args);
 		Oid			lefthashfunc;
 		Oid			righthashfunc;
 
@@ -2553,7 +2554,8 @@ convert_saop_to_hashed_saop_walker(Node *node, void *context)
 		{
 			if (saop->useOr)
 			{
-				if (get_op_hash_functions(saop->opno, &lefthashfunc, &righthashfunc) &&
+				if (get_op_hash_functions_ext(saop->opno, exprType(leftarg),
+											  &lefthashfunc, &righthashfunc) &&
 					lefthashfunc == righthashfunc)
 				{
 					Datum		arrdatum = ((Const *) arrayarg)->constvalue;
@@ -2585,7 +2587,8 @@ convert_saop_to_hashed_saop_walker(Node *node, void *context)
 				 * just ensure the lookup items are not in the hash table.
 				 */
 				if (OidIsValid(negator) &&
-					get_op_hash_functions(negator, &lefthashfunc, &righthashfunc) &&
+					get_op_hash_functions_ext(negator, exprType(leftarg),
+											  &lefthashfunc, &righthashfunc) &&
 					lefthashfunc == righthashfunc)
 				{
 					Datum		arrdatum = ((Const *) arrayarg)->constvalue;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index b9449b4574a..d6efd07073a 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2476,7 +2476,9 @@ eqjoinsel(PG_FUNCTION_ARGS)
 		 * hash functions for the join operator.
 		 */
 		if ((sslot1.nvalues + sslot2.nvalues) >= EQJOINSEL_MCV_HASH_THRESHOLD)
-			(void) get_op_hash_functions(operator, &hashLeft, &hashRight);
+			(void) get_op_hash_functions_ext(operator,
+											 exprType((Node *) linitial(args)),
+											 &hashLeft, &hashRight);
 	}
 	else
 		memset(&eqproc, 0, sizeof(eqproc)); /* silence uninit-var warnings */
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 036086057d7..ae5fa8154a0 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -572,6 +572,11 @@ get_compatible_hash_operators(Oid opno,
  *
  * Returns true if able to find the requested function(s), false if not.
  * (This indicates that the operator should not have been marked oprcanhash.)
+ *
+ * Callers must beware that for container types (arrays, records, ranges)
+ * this function will succeed for array_eq etc, but the hash function could
+ * fail at runtime if the contained type(s) are not hashable.  If it is
+ * possible that the operator is one of these, use get_op_hash_functions_ext.
  */
 bool
 get_op_hash_functions(Oid opno,
@@ -654,6 +659,55 @@ get_op_hash_functions(Oid opno,
 	return result;
 }
 
+/*
+ * get_op_hash_functions_ext
+ *		As above, but verify hashability in container-type cases.
+ *
+ * As with op_hashjoinable, assume the left input type is sufficient
+ * to disambiguate container-type cases.
+ */
+bool
+get_op_hash_functions_ext(Oid opno, Oid inputtype,
+						  RegProcedure *lhs_procno, RegProcedure *rhs_procno)
+{
+	TypeCacheEntry *typentry;
+
+	/* Ensure output args are initialized on failure */
+	if (lhs_procno)
+		*lhs_procno = InvalidOid;
+	if (rhs_procno)
+		*rhs_procno = InvalidOid;
+
+	/* As in op_hashjoinable, let the typcache handle the hard cases */
+	if (opno == ARRAY_EQ_OP)
+	{
+		typentry = lookup_type_cache(inputtype, TYPECACHE_HASH_PROC);
+		if (typentry->hash_proc != F_HASH_ARRAY)
+			return false;
+	}
+	else if (opno == RECORD_EQ_OP)
+	{
+		typentry = lookup_type_cache(inputtype, TYPECACHE_HASH_PROC);
+		if (typentry->hash_proc != F_HASH_RECORD)
+			return false;
+	}
+	else if (opno == RANGE_EQ_OP)
+	{
+		typentry = lookup_type_cache(inputtype, TYPECACHE_HASH_PROC);
+		if (typentry->hash_proc != F_HASH_RANGE)
+			return false;
+	}
+	else if (opno == MULTIRANGE_EQ_OP)
+	{
+		typentry = lookup_type_cache(inputtype, TYPECACHE_HASH_PROC);
+		if (typentry->hash_proc != F_HASH_MULTIRANGE)
+			return false;
+	}
+
+	/* OK, do the normal lookup */
+	return get_op_hash_functions(opno, lhs_procno, rhs_procno);
+}
+
 /*
  * get_op_index_interpretation
  *		Given an operator's OID, find out which amcanorder opfamilies it belongs to,
@@ -1624,7 +1678,8 @@ op_mergejoinable(Oid opno, Oid inputtype)
 	 * For array_eq or record_eq, we can sort if the element or field types
 	 * are all sortable.  We could implement all the checks for that here, but
 	 * the typcache already does that and caches the results too, so let's
-	 * rely on the typcache.
+	 * rely on the typcache.  We do not need similar special cases for ranges
+	 * or multiranges, because their subtypes are required to be sortable.
 	 */
 	if (opno == ARRAY_EQ_OP)
 	{
@@ -1659,10 +1714,11 @@ op_mergejoinable(Oid opno, Oid inputtype)
  * Returns true if the operator is hashjoinable.  (There must be a suitable
  * hash opfamily entry for this operator if it is so marked.)
  *
- * In some cases (currently only array_eq), hashjoinability depends on the
- * specific input data type the operator is invoked for, so that must be
- * passed as well.  We currently assume that only one input's type is needed
- * to check this --- by convention, pass the left input's data type.
+ * In some cases (currently array_eq, record_eq, range_eq, multirange_eq),
+ * hashjoinability depends on the specific input data type the operator is
+ * invoked for, so that must be passed as well.  We currently assume that only
+ * one input's type is needed to check this --- by convention, pass the left
+ * input's data type.
  */
 bool
 op_hashjoinable(Oid opno, Oid inputtype)
@@ -1684,6 +1740,18 @@ op_hashjoinable(Oid opno, Oid inputtype)
 		if (typentry->hash_proc == F_HASH_RECORD)
 			result = true;
 	}
+	else if (opno == RANGE_EQ_OP)
+	{
+		typentry = lookup_type_cache(inputtype, TYPECACHE_HASH_PROC);
+		if (typentry->hash_proc == F_HASH_RANGE)
+			result = true;
+	}
+	else if (opno == MULTIRANGE_EQ_OP)
+	{
+		typentry = lookup_type_cache(inputtype, TYPECACHE_HASH_PROC);
+		if (typentry->hash_proc == F_HASH_MULTIRANGE)
+			result = true;
+	}
 	else
 	{
 		/* For all other operators, rely on pg_operator.oprcanhash */
diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
index cebe7a916fb..da91a2ff1dd 100644
--- a/src/backend/utils/cache/typcache.c
+++ b/src/backend/utils/cache/typcache.c
@@ -779,8 +779,9 @@ lookup_type_cache(Oid type_id, int flags)
 										  HASHSTANDARD_PROC);
 
 		/*
-		 * As above, make sure hash_array, hash_record, or hash_range will
-		 * succeed.
+		 * As above, make sure hash_array, hash_record, hash_range, or
+		 * hash_multirange will succeed.  Here we do need to check the range
+		 * cases.
 		 */
 		if (hash_proc == F_HASH_ARRAY &&
 			!array_element_has_hashing(typentry))
@@ -791,12 +792,8 @@ lookup_type_cache(Oid type_id, int flags)
 		else if (hash_proc == F_HASH_RANGE &&
 				 !range_element_has_hashing(typentry))
 			hash_proc = InvalidOid;
-
-		/*
-		 * Likewise for hash_multirange.
-		 */
-		if (hash_proc == F_HASH_MULTIRANGE &&
-			!multirange_element_has_hashing(typentry))
+		else if (hash_proc == F_HASH_MULTIRANGE &&
+				 !multirange_element_has_hashing(typentry))
 			hash_proc = InvalidOid;
 
 		/* Force update of hash_proc_finfo only if we're changing state */
@@ -828,8 +825,8 @@ lookup_type_cache(Oid type_id, int flags)
 												   HASHEXTENDED_PROC);
 
 		/*
-		 * As above, make sure hash_array_extended, hash_record_extended, or
-		 * hash_range_extended will succeed.
+		 * As above, make sure hash_array_extended, hash_record_extended,
+		 * hash_range_extended, or hash_multirange_extended will succeed.
 		 */
 		if (hash_extended_proc == F_HASH_ARRAY_EXTENDED &&
 			!array_element_has_extended_hashing(typentry))
@@ -840,12 +837,8 @@ lookup_type_cache(Oid type_id, int flags)
 		else if (hash_extended_proc == F_HASH_RANGE_EXTENDED &&
 				 !range_element_has_extended_hashing(typentry))
 			hash_extended_proc = InvalidOid;
-
-		/*
-		 * Likewise for hash_multirange_extended.
-		 */
-		if (hash_extended_proc == F_HASH_MULTIRANGE_EXTENDED &&
-			!multirange_element_has_extended_hashing(typentry))
+		else if (hash_extended_proc == F_HASH_MULTIRANGE_EXTENDED &&
+				 !multirange_element_has_extended_hashing(typentry))
 			hash_extended_proc = InvalidOid;
 
 		/* Force update of proc finfo only if we're changing state */
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 1a8fd8b8645..c7f860c442b 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -3057,7 +3057,7 @@
   oprrest => 'scalargesel', oprjoin => 'scalargejoinsel' },
 
 # generic range type operators
-{ oid => '3882', descr => 'equal',
+{ oid => '3882', oid_symbol => 'RANGE_EQ_OP', descr => 'equal',
   oprname => '=', oprcanmerge => 't', oprcanhash => 't', oprleft => 'anyrange',
   oprright => 'anyrange', oprresult => 'bool', oprcom => '=(anyrange,anyrange)',
   oprnegate => '<>(anyrange,anyrange)', oprcode => 'range_eq',
@@ -3263,7 +3263,7 @@
   oprname => '@@', oprleft => 'jsonb', oprright => 'jsonpath',
   oprresult => 'bool', oprcode => 'jsonb_path_match_opr(jsonb,jsonpath)',
   oprrest => 'matchingsel', oprjoin => 'matchingjoinsel' },
-{ oid => '2860', descr => 'equal',
+{ oid => '2860', oid_symbol => 'MULTIRANGE_EQ_OP', descr => 'equal',
   oprname => '=', oprcanmerge => 't', oprcanhash => 't',
   oprleft => 'anymultirange', oprright => 'anymultirange', oprresult => 'bool',
   oprcom => '=(anymultirange,anymultirange)',
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 8545e67a632..865980cb0f1 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -86,6 +86,8 @@ extern bool get_compatible_hash_operators(Oid opno,
 										  Oid *lhs_opno, Oid *rhs_opno);
 extern bool get_op_hash_functions(Oid opno,
 								  RegProcedure *lhs_procno, RegProcedure *rhs_procno);
+extern bool get_op_hash_functions_ext(Oid opno, Oid inputtype,
+									  RegProcedure *lhs_procno, RegProcedure *rhs_procno);
 extern List *get_op_index_interpretation(Oid opno);
 extern bool equality_ops_are_compatible(Oid opno1, Oid opno2);
 extern bool comparison_ops_are_compatible(Oid opno1, Oid opno2);
-- 
2.52.0



view thread (5+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Hashed SAOP on composite type with non-hashable column errors at runtime
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox