public inbox for [email protected]
help / color / mirror / Atom feedHashed SAOP on composite type with non-hashable column errors at runtime
5+ messages / 2 participants
[nested] [flat]
* Hashed SAOP on composite type with non-hashable column errors at runtime
@ 2026-06-05 14:27 Andrei Lepikhov <[email protected]>
0 siblings, 1 reply; 5+ messages in thread
From: Andrei Lepikhov @ 2026-06-05 14:27 UTC (permalink / raw)
To: [email protected]; +Cc: Peter Eisentraut <[email protected]>; Tom Lane <[email protected]>
Hi,
There is an issue when we use a record-based array operation in SQL:
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, BUFFERS OFF, SUMMARY OFF)
SELECT count(*) FROM test
WHERE (a,b) = ANY (ARRAY[
(1, 'w1'::tsvector), (2, 'w2'::tsvector), (3, 'w3'::tsvector),
(4, 'w4'::tsvector), (5, 'w5'::tsvector), (6, 'w6'::tsvector),
(7, 'w7'::tsvector), (8, 'w8'::tsvector), (9, 'w9'::tsvector)
]);
ERROR: could not identify a hash function for type tsvector
See the attachment for the full reproduction script.
This happens because the hashability check for the record and array types misses
the op_hashjoinable() test. With fewer than 9 elements the query executes
successfully.
Patch 0001 (attached) fixes this bug. It is a natural follow-up to 17da9d4c282,
the hashing of record types itself was introduced by 01e658fa74c. It deserves a
back-patch down to v14.
More interesting is that EXPLAIN doesn't expose whether the executor used the
hashed or the plain search strategy. That might be acceptable, since we know
hashing is always used from nine elements on. But it forces the user first to
read the source code, and then to inspect the catalog, to find out whether the
clause has a hash function. For a SubPlan we do have this information — so let's
take a look at v0-0002, which introduces a 'hashed' flag.
It would be too prosaic a bug fix if there weren't a nice corner case with the
anonymous record type. Consider the following:
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT count(*) FROM (SELECT g x, -g y FROM generate_series(1,300000) g) t
WHERE (x, y) = ANY
(array[(1,-1),(2,-2),(1,-1),(2,-2),(1,-1),(2,-2),(1,-1),(2,-2),(64,-64)]);
/*
-- Before the fix:
Aggregate (actual rows=1.00 loops=1)
Buffers: shared hit=63 read=5, temp read=513 written=513
-> Function Scan on generate_series g (actual rows=3.00 loops=1)
Filter: (ROW(g, (- g)) = ANY
('{"(1,-1)","(2,-2)","(1,-1)","(2,-2)","(1,-1)","(2,-2)","(1,-1)","(2,-2)","(64,-64)"}'::record[]))
Rows Removed by Filter: 299997
Buffers: shared hit=63 read=5, temp read=513 written=513
Planning:
Buffers: shared hit=45 read=16
Planning Time: 2.923 ms
Execution Time: 62.969 ms
(10 rows)
-- After the fix:
Aggregate (actual rows=1.00 loops=1)
Buffers: shared hit=42, temp read=513 written=513
-> Function Scan on generate_series g (actual rows=3.00 loops=1)
Filter: (ROW(g, (- g)) = ANY
('{"(1,-1)","(2,-2)","(1,-1)","(2,-2)","(1,-1)","(2,-2)","(1,-1)","(2,-2)","(64,-64)"}'::record[]))
Rows Removed by Filter: 299997
Buffers: shared hit=42, temp read=513 written=513
Planning:
Buffers: shared hit=88
Planning Time: 0.837 ms
Execution Time: 745.897 ms
(10 rows)
*/
You can see a regression here: a legitimate hashed SAOP is no longer hashed. The
fix for that is not so simple — we have to check every element of the array
before deciding whether the hashing strategy is possible. This is quite an
expensive operation, so I sketched a solution in patch 0003, but I'm not sure it
is worth developing: checking an anonymous type might simply be too expensive.
Should it be done only once, conditionally, with a size limit and result caching?
--
regards, Andrei Lepikhov,
pgEdge
From d81d95a0fc08df30a71a33bcc992066d8f95a1ca Mon Sep 17 00:00:00 2001
From: Andrei Lepikhov <[email protected]>
Date: Fri, 5 Jun 2026 07:11:54 +0000
Subject: [PATCH v0 1/3] Don't hash a record/array SAOP whose input type isn't
hashable
convert_saop_to_hashed_saop_walker() decided to evaluate a
ScalarArrayOpExpr with a hash table whenever get_op_hash_functions()
reported the operator as hashable. That function only performs a
pg_amop membership test, and since 01e658fa74c added hash/record_ops,
it returns true for record_eq unconditionally. But record_eq (like
array_eq) is only actually hashable when every column/element type has
a hash function. For a composite type with a non-hashable column type
(e.g. tsvector) and an array of at least MIN_ARRAY_SIZE_FOR_HASHED_SAOP
constant elements, the planner therefore enabled hashing and the executor
failed at runtime.
This is the same trap that hash_ok_operator() fell into and that
17da9d4c282 fixed for subplan hashing. Give the SAOP planner gate the same
treatment.
---
src/backend/optimizer/util/clauses.c | 26 +++++++++++++++++++++--
src/backend/utils/cache/lsyscache.c | 6 +++---
src/test/regress/expected/expressions.out | 25 ++++++++++++++++++++++
src/test/regress/sql/expressions.sql | 20 +++++++++++++++++
4 files changed, 72 insertions(+), 5 deletions(-)
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index cd86311bb0b..d44f674dc76 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2535,6 +2535,24 @@ convert_saop_to_hashed_saop(Node *node)
(void) convert_saop_to_hashed_saop_walker(node, NULL);
}
+/*
+ * saop_hashable_for_type
+ * Can a hashed ScalarArrayOpExpr safely use equality operator 'eqop'
+ * for left-hand input type 'lefttype'?
+ *
+ * get_op_hash_functions() reports record_eq and array_eq as hashable
+ * unconditionally. But hashability actually depends on the specific input
+ * type: every column/element type must itself be hashable. Re-check such
+ * operators through op_hashjoinable().
+ */
+static bool
+saop_hashable_for_type(Oid eqop, Oid lefttype)
+{
+ if (eqop == RECORD_EQ_OP || eqop == ARRAY_EQ_OP)
+ return op_hashjoinable(eqop, lefttype);
+ return true;
+}
+
static bool
convert_saop_to_hashed_saop_walker(Node *node, void *context)
{
@@ -2554,7 +2572,9 @@ convert_saop_to_hashed_saop_walker(Node *node, void *context)
if (saop->useOr)
{
if (get_op_hash_functions(saop->opno, &lefthashfunc, &righthashfunc) &&
- lefthashfunc == righthashfunc)
+ lefthashfunc == righthashfunc &&
+ saop_hashable_for_type(saop->opno,
+ exprType(linitial(saop->args))))
{
Datum arrdatum = ((Const *) arrayarg)->constvalue;
ArrayType *arr = (ArrayType *) DatumGetPointer(arrdatum);
@@ -2586,7 +2606,9 @@ convert_saop_to_hashed_saop_walker(Node *node, void *context)
*/
if (OidIsValid(negator) &&
get_op_hash_functions(negator, &lefthashfunc, &righthashfunc) &&
- lefthashfunc == righthashfunc)
+ lefthashfunc == righthashfunc &&
+ saop_hashable_for_type(negator,
+ exprType(linitial(saop->args))))
{
Datum arrdatum = ((Const *) arrayarg)->constvalue;
ArrayType *arr = (ArrayType *) DatumGetPointer(arrdatum);
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 036086057d7..3c7fecf526d 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -1659,9 +1659,9 @@ op_mergejoinable(Oid opno, Oid inputtype)
* Returns true if the operator is hashjoinable. (There must be a suitable
* hash opfamily entry for this operator if it is so marked.)
*
- * In some cases (currently only array_eq), hashjoinability depends on the
- * specific input data type the operator is invoked for, so that must be
- * passed as well. We currently assume that only one input's type is needed
+ * In some cases (currently array_eq and record_eq), hashjoinability depends
+ * on the specific input data type the operator is invoked for, so that must
+ * be passed as well. We currently assume that only one input's type is needed
* to check this --- by convention, pass the left input's data type.
*/
bool
diff --git a/src/test/regress/expected/expressions.out b/src/test/regress/expected/expressions.out
index 730f7bc7eba..c35583cb2ea 100644
--- a/src/test/regress/expected/expressions.out
+++ b/src/test/regress/expected/expressions.out
@@ -561,3 +561,28 @@ from inttest;
(3 rows)
rollback;
+--
+-- Hashed SAOP must not be selected for a composite type that is not actually
+-- hashable.
+--
+CREATE TABLE hashed_saop_tbl (a int, b tsvector);
+INSERT INTO hashed_saop_tbl
+ SELECT g, ('w' || g)::tsvector FROM generate_series(1, 1000) g;
+ANALYZE hashed_saop_tbl;
+-- Throws an ERROR if the hashed strategy has been chosen
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, BUFFERS OFF, SUMMARY OFF)
+SELECT count(*) FROM hashed_saop_tbl
+WHERE (a,b) = ANY (ARRAY[
+ (1, 'w1'::tsvector), (2, 'w2'::tsvector), (3, 'w3'::tsvector),
+ (4, 'w4'::tsvector), (5, 'w5'::tsvector), (6, 'w6'::tsvector),
+ (7, 'w7'::tsvector), (8, 'w8'::tsvector), (9, 'w9'::tsvector)
+ ]);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1.00 loops=1)
+ -> Seq Scan on hashed_saop_tbl (actual rows=9.00 loops=1)
+ Filter: (ROW(a, b) = ANY ('{"(1,''w1'')","(2,''w2'')","(3,''w3'')","(4,''w4'')","(5,''w5'')","(6,''w6'')","(7,''w7'')","(8,''w8'')","(9,''w9'')"}'::record[]))
+ Rows Removed by Filter: 991
+(4 rows)
+
+DROP TABLE hashed_saop_tbl;
diff --git a/src/test/regress/sql/expressions.sql b/src/test/regress/sql/expressions.sql
index 3b3048f9731..08691877902 100644
--- a/src/test/regress/sql/expressions.sql
+++ b/src/test/regress/sql/expressions.sql
@@ -301,3 +301,23 @@ select
from inttest;
rollback;
+
+--
+-- Hashed SAOP must not be selected for a composite type that is not actually
+-- hashable.
+--
+
+CREATE TABLE hashed_saop_tbl (a int, b tsvector);
+INSERT INTO hashed_saop_tbl
+ SELECT g, ('w' || g)::tsvector FROM generate_series(1, 1000) g;
+ANALYZE hashed_saop_tbl;
+
+-- Throws an ERROR if the hashed strategy has been chosen
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, BUFFERS OFF, SUMMARY OFF)
+SELECT count(*) FROM hashed_saop_tbl
+WHERE (a,b) = ANY (ARRAY[
+ (1, 'w1'::tsvector), (2, 'w2'::tsvector), (3, 'w3'::tsvector),
+ (4, 'w4'::tsvector), (5, 'w5'::tsvector), (6, 'w6'::tsvector),
+ (7, 'w7'::tsvector), (8, 'w8'::tsvector), (9, 'w9'::tsvector)
+ ]);
+DROP TABLE hashed_saop_tbl;
--
2.54.0
From 241dee85f1094ce797861cd1d082691fd7ec41ab Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <[email protected]>
Date: Fri, 5 Jun 2026 11:07:21 +0200
Subject: [PATCH v0 2/3] Show hashed ScalarArrayOpExpr decision in EXPLAIN
When the planner converts a ScalarArrayOpExpr to hash-table evaluation
(convert_saop_to_hashed_saop), the resulting node deparses identically
to a linear one, so EXPLAIN gives no indication of which strategy the
executor will use. That made the hashed path invisible and awkward to
test.
---
src/backend/utils/adt/ruleutils.c | 10 +++++++++-
src/test/regress/expected/expressions.out | 9 +++++++++
src/test/regress/sql/expressions.sql | 3 +++
3 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 88de5c0481c..0567ba0886a 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -9989,10 +9989,18 @@ get_rule_expr(Node *node, deparse_context *context,
if (!PRETTY_PAREN(context))
appendStringInfoChar(buf, '(');
get_rule_expr_paren(arg1, context, true, node);
- appendStringInfo(buf, " %s %s (",
+
+ /*
+ * Surface hashed decision in EXPLAIN.
+ * hashfuncid is only ever set in a finished plan tree, so this
+ * never appears in deparsed views, rules, or other stored
+ * expressions.
+ */
+ appendStringInfo(buf, " %s %s%s (",
generate_operator_name(expr->opno,
exprType(arg1),
get_base_element_type(exprType(arg2))),
+ OidIsValid(expr->hashfuncid) ? "hashed " : "",
expr->useOr ? "ANY" : "ALL");
get_rule_expr_paren(arg2, context, true, node);
diff --git a/src/test/regress/expected/expressions.out b/src/test/regress/expected/expressions.out
index c35583cb2ea..9d4db9c3b2a 100644
--- a/src/test/regress/expected/expressions.out
+++ b/src/test/regress/expected/expressions.out
@@ -327,6 +327,15 @@ select return_text_input('a') not in ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i
f
(1 row)
+-- Check tha explain marks the hashed decision.
+explain (verbose, costs off)
+select return_int_input(1) in (10, 9, 2, 8, 3, 7, 4, 6, 5, 1);
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Result
+ Output: (return_int_input(1) = hashed ANY ('{10,9,2,8,3,7,4,6,5,1}'::integer[]))
+(2 rows)
+
rollback;
-- Test with non-strict equality function.
-- We need to create our own type for this.
diff --git a/src/test/regress/sql/expressions.sql b/src/test/regress/sql/expressions.sql
index 08691877902..fe9c330361b 100644
--- a/src/test/regress/sql/expressions.sql
+++ b/src/test/regress/sql/expressions.sql
@@ -131,6 +131,9 @@ select return_int_input(1) not in (null, null, null, null, null, null, null, nul
select return_int_input(null::int) not in (10, 9, 2, 8, 3, 7, 4, 6, 5, 1);
select return_int_input(null::int) not in (10, 9, 2, 8, 3, 7, 4, 6, 5, null);
select return_text_input('a') not in ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j');
+-- Check tha explain marks the hashed decision.
+explain (verbose, costs off)
+select return_int_input(1) in (10, 9, 2, 8, 3, 7, 4, 6, 5, 1);
rollback;
--
2.54.0
From 3d67d0b2114b0c0aa271606767ad6e8b33c48d10 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <[email protected]>
Date: Fri, 5 Jun 2026 14:14:15 +0000
Subject: [PATCH v0 3/3] Recover hashed SAOP for anonymous records with
hashable columns
The previous commit routed record_eq through op_hashjoinable(), which
fixed the runtime failure on composites with non-hashable columns but
also disabled hashing for all anonymous-RECORD SAOPs: the typcache
refuses to vouch for bare RECORD, so op_hashjoinable() returns false
even when every column is hashable. The common "(a, b) = ANY (...)"
idiom thus regressed to a linear search.
A hashed SAOP always has the array as a Const, so unlike the sub-SELECT
case in hash_ok_operator() we can examine the actual data: resolve the
concrete rowtype of every array element and re-enable hashing when all
their column types are hashable. The rowtype is looked up with the
no-error variant so an unregistered typmod degrades to a linear search
rather than an ERROR.
---
src/backend/optimizer/util/clauses.c | 128 +++++++++++++++++++++-
src/test/regress/expected/expressions.out | 10 ++
src/test/regress/sql/expressions.sql | 5 +
3 files changed, 138 insertions(+), 5 deletions(-)
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index d44f674dc76..bbb8b7c3b59 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -52,6 +52,7 @@
#include "rewrite/rewriteManip.h"
#include "tcop/tcopprot.h"
#include "utils/acl.h"
+#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/datum.h"
#include "utils/fmgroids.h"
@@ -2535,21 +2536,136 @@ convert_saop_to_hashed_saop(Node *node)
(void) convert_saop_to_hashed_saop_walker(node, NULL);
}
+/*
+ * record_const_array_is_hashable
+ * Are all column types of every element of a constant record[] array
+ * hashable?
+ *
+ * Used to recover hashing for an anonymous-RECORD ScalarArrayOpExpr. The
+ * typcache deliberately refuses to report bare RECORD as field-hashable
+ * because it cannot know the columns of an arbitrary anonymous record (see
+ * cache_record_field_properties()). But a hashed SAOP always has the array
+ * as a Const, so here we do have the concrete records in hand: resolve each
+ * element's actual rowtype and check that every (non-dropped) column type has
+ * a hash function. If so, hash_record() cannot fail on this array.
+ *
+ * We examine every element, not just the first: an array of RECORD is uniform
+ * only in element type, and individual elements may carry different blessed
+ * rowtypes (different typmods). A later element with a non-hashable column
+ * would otherwise trip the very failure this guards against.
+ *
+ * The left-hand input need not be examined separately. record_eq() compares
+ * by physical columns and errors out on dissimilar column types, so at runtime
+ * the LHS record either shares the array elements' column layout (hence is
+ * equally hashable) or the comparison errors regardless of hashing.
+ */
+static bool
+record_const_array_is_hashable(Const *arrayConst)
+{
+ ArrayType *arr;
+ int16 elmlen;
+ bool elmbyval;
+ char elmalign;
+ Datum *elems;
+ bool *nulls;
+ int nelems;
+ bool result = true;
+ int32 lastTypmod = -1;
+ Oid lastType = InvalidOid;
+
+ Assert(arrayConst != NULL && !arrayConst->constisnull);
+
+ arr = DatumGetArrayTypeP(arrayConst->constvalue);
+ if (ARR_ELEMTYPE(arr) != RECORDOID)
+ return false;
+
+ get_typlenbyvalalign(RECORDOID, &elmlen, &elmbyval, &elmalign);
+ deconstruct_array(arr, RECORDOID, elmlen, elmbyval, elmalign,
+ &elems, &nulls, &nelems);
+
+ for (int i = 0; i < nelems && result; i++)
+ {
+ HeapTupleHeader rec;
+ Oid tupType;
+ int32 tupTypmod;
+ TupleDesc tupdesc;
+
+ if (nulls[i])
+ continue;
+
+ rec = DatumGetHeapTupleHeader(elems[i]);
+ tupType = HeapTupleHeaderGetTypeId(rec);
+ tupTypmod = HeapTupleHeaderGetTypMod(rec);
+
+ /* Skip the rowtype lookup when this element matches the previous one */
+ if (tupType == lastType && tupTypmod == lastTypmod)
+ continue;
+ lastType = tupType;
+ lastTypmod = tupTypmod;
+
+ /*
+ * Use the no-error variant: an unregistered blessed typmod must never
+ * turn this planner-time optimization decision into an ERROR. Treat a
+ * missing tupdesc as "not hashable" and fall back to a linear search.
+ */
+ tupdesc = lookup_rowtype_tupdesc_noerror(tupType, tupTypmod, true);
+ if (tupdesc == NULL)
+ {
+ result = false;
+ break;
+ }
+ for (int j = 0; j < tupdesc->natts; j++)
+ {
+ Form_pg_attribute att = TupleDescAttr(tupdesc, j);
+ TypeCacheEntry *fieldentry;
+
+ if (att->attisdropped)
+ continue;
+ fieldentry = lookup_type_cache(att->atttypid, TYPECACHE_HASH_PROC);
+ if (!OidIsValid(fieldentry->hash_proc))
+ {
+ result = false;
+ break;
+ }
+ }
+ ReleaseTupleDesc(tupdesc);
+ }
+
+ pfree(elems);
+ pfree(nulls);
+ return result;
+}
+
/*
* saop_hashable_for_type
* Can a hashed ScalarArrayOpExpr safely use equality operator 'eqop'
- * for left-hand input type 'lefttype'?
+ * for left-hand input type 'lefttype' over constant array 'arrayConst'?
*
* get_op_hash_functions() reports record_eq and array_eq as hashable
* unconditionally. But hashability actually depends on the specific input
* type: every column/element type must itself be hashable. Re-check such
* operators through op_hashjoinable().
+ *
+ * op_hashjoinable() conservatively returns false for anonymous RECORD, since
+ * the typcache cannot inspect an arbitrary record's columns. In that one case
+ * we have more information than the typcache -- the constant array itself --
+ * so we examine the actual element rowtypes and allow hashing when they are
+ * all hashable.
*/
static bool
-saop_hashable_for_type(Oid eqop, Oid lefttype)
+saop_hashable_for_type(Oid eqop, Oid lefttype, Const *arrayConst)
{
- if (eqop == RECORD_EQ_OP || eqop == ARRAY_EQ_OP)
+ if (eqop == ARRAY_EQ_OP)
return op_hashjoinable(eqop, lefttype);
+ if (eqop == RECORD_EQ_OP)
+ {
+ if (op_hashjoinable(eqop, lefttype))
+ return true;
+ /* Recover hashing for anonymous RECORD with hashable columns. */
+ if (lefttype == RECORDOID)
+ return record_const_array_is_hashable(arrayConst);
+ return false;
+ }
return true;
}
@@ -2574,7 +2690,8 @@ convert_saop_to_hashed_saop_walker(Node *node, void *context)
if (get_op_hash_functions(saop->opno, &lefthashfunc, &righthashfunc) &&
lefthashfunc == righthashfunc &&
saop_hashable_for_type(saop->opno,
- exprType(linitial(saop->args))))
+ exprType(linitial(saop->args)),
+ (Const *) arrayarg))
{
Datum arrdatum = ((Const *) arrayarg)->constvalue;
ArrayType *arr = (ArrayType *) DatumGetPointer(arrdatum);
@@ -2608,7 +2725,8 @@ convert_saop_to_hashed_saop_walker(Node *node, void *context)
get_op_hash_functions(negator, &lefthashfunc, &righthashfunc) &&
lefthashfunc == righthashfunc &&
saop_hashable_for_type(negator,
- exprType(linitial(saop->args))))
+ exprType(linitial(saop->args)),
+ (Const *) arrayarg))
{
Datum arrdatum = ((Const *) arrayarg)->constvalue;
ArrayType *arr = (ArrayType *) DatumGetPointer(arrdatum);
diff --git a/src/test/regress/expected/expressions.out b/src/test/regress/expected/expressions.out
index 9d4db9c3b2a..958d92e0956 100644
--- a/src/test/regress/expected/expressions.out
+++ b/src/test/regress/expected/expressions.out
@@ -335,6 +335,16 @@ select return_int_input(1) in (10, 9, 2, 8, 3, 7, 4, 6, 5, 1);
Result
Output: (return_int_input(1) = hashed ANY ('{10,9,2,8,3,7,4,6,5,1}'::integer[]))
(2 rows)
+-- An anonymous-record SAOP whose columns are all hashable also hashes: the
+-- planner cannot rely on the typcache for bare RECORD, so it inspects the
+-- constant array's actual rowtype and recovers the hashed plan.
+explain (verbose, costs off)
+select (return_int_input(1), return_int_input(2)) = any (array[(1,2),(3,4),(5,6),(7,8),(9,10),(11,12),(13,14),(15,16),(17,18)]);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Result
+ Output: (ROW(return_int_input(1), return_int_input(2)) = hashed ANY ('{"(1,2)","(3,4)","(5,6)","(7,8)","(9,10)","(11,12)","(13,14)","(15,16)","(17,18)"}'::record[]))
+(2 rows)
rollback;
-- Test with non-strict equality function.
diff --git a/src/test/regress/sql/expressions.sql b/src/test/regress/sql/expressions.sql
index fe9c330361b..a8daf14ffc4 100644
--- a/src/test/regress/sql/expressions.sql
+++ b/src/test/regress/sql/expressions.sql
@@ -134,6 +134,11 @@ select return_text_input('a') not in ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i
-- Check tha explain marks the hashed decision.
explain (verbose, costs off)
select return_int_input(1) in (10, 9, 2, 8, 3, 7, 4, 6, 5, 1);
+-- An anonymous-record SAOP whose columns are all hashable also hashes: the
+-- planner cannot rely on the typcache for bare RECORD, so it inspects the
+-- constant array's actual rowtype and recovers the hashed plan.
+explain (verbose, costs off)
+select (return_int_input(1), return_int_input(2)) = any (array[(1,2),(3,4),(5,6),(7,8),(9,10),(11,12),(13,14),(15,16),(17,18)]);
rollback;
--
2.54.0
CREATE TABLE test (a int, b tsvector);
INSERT INTO test SELECT g, ('w' || g)::tsvector FROM generate_series(1, 1000) g;
ANALYZE test;
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, BUFFERS OFF, SUMMARY OFF)
SELECT count(*) FROM test
WHERE (a,b) = ANY (ARRAY[(1, 'w1'::tsvector), (2, 'w2'::tsvector)]);
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, BUFFERS OFF, SUMMARY OFF)
SELECT count(*) FROM test
WHERE (a,b) = ANY (ARRAY[
(1, 'w1'::tsvector), (2, 'w2'::tsvector), (3, 'w3'::tsvector),
(4, 'w4'::tsvector), (5, 'w5'::tsvector), (6, 'w6'::tsvector),
(7, 'w7'::tsvector), (8, 'w8'::tsvector), (9, 'w9'::tsvector)
]);
/*
QUERY PLAN
-----------------------------------------------------------------------------
Aggregate (actual rows=1.00 loops=1)
-> Seq Scan on test (actual rows=2.00 loops=1)
Filter: (ROW(a, b) = ANY ('{"(1,''w1'')","(2,''w2'')"}'::record[]))
Rows Removed by Filter: 998
Planning Time: 0.119 ms
Execution Time: 0.854 ms
(6 rows)
ERROR: could not identify a hash function for type tsvector
*/
Attachments:
[text/plain] v0-0001-Don-t-hash-a-record-array-SAOP-whose-input-type-i.patch (6.8K, 2-v0-0001-Don-t-hash-a-record-array-SAOP-whose-input-type-i.patch)
download | inline diff:
From d81d95a0fc08df30a71a33bcc992066d8f95a1ca Mon Sep 17 00:00:00 2001
From: Andrei Lepikhov <[email protected]>
Date: Fri, 5 Jun 2026 07:11:54 +0000
Subject: [PATCH v0 1/3] Don't hash a record/array SAOP whose input type isn't
hashable
convert_saop_to_hashed_saop_walker() decided to evaluate a
ScalarArrayOpExpr with a hash table whenever get_op_hash_functions()
reported the operator as hashable. That function only performs a
pg_amop membership test, and since 01e658fa74c added hash/record_ops,
it returns true for record_eq unconditionally. But record_eq (like
array_eq) is only actually hashable when every column/element type has
a hash function. For a composite type with a non-hashable column type
(e.g. tsvector) and an array of at least MIN_ARRAY_SIZE_FOR_HASHED_SAOP
constant elements, the planner therefore enabled hashing and the executor
failed at runtime.
This is the same trap that hash_ok_operator() fell into and that
17da9d4c282 fixed for subplan hashing. Give the SAOP planner gate the same
treatment.
---
src/backend/optimizer/util/clauses.c | 26 +++++++++++++++++++++--
src/backend/utils/cache/lsyscache.c | 6 +++---
src/test/regress/expected/expressions.out | 25 ++++++++++++++++++++++
src/test/regress/sql/expressions.sql | 20 +++++++++++++++++
4 files changed, 72 insertions(+), 5 deletions(-)
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index cd86311bb0b..d44f674dc76 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2535,6 +2535,24 @@ convert_saop_to_hashed_saop(Node *node)
(void) convert_saop_to_hashed_saop_walker(node, NULL);
}
+/*
+ * saop_hashable_for_type
+ * Can a hashed ScalarArrayOpExpr safely use equality operator 'eqop'
+ * for left-hand input type 'lefttype'?
+ *
+ * get_op_hash_functions() reports record_eq and array_eq as hashable
+ * unconditionally. But hashability actually depends on the specific input
+ * type: every column/element type must itself be hashable. Re-check such
+ * operators through op_hashjoinable().
+ */
+static bool
+saop_hashable_for_type(Oid eqop, Oid lefttype)
+{
+ if (eqop == RECORD_EQ_OP || eqop == ARRAY_EQ_OP)
+ return op_hashjoinable(eqop, lefttype);
+ return true;
+}
+
static bool
convert_saop_to_hashed_saop_walker(Node *node, void *context)
{
@@ -2554,7 +2572,9 @@ convert_saop_to_hashed_saop_walker(Node *node, void *context)
if (saop->useOr)
{
if (get_op_hash_functions(saop->opno, &lefthashfunc, &righthashfunc) &&
- lefthashfunc == righthashfunc)
+ lefthashfunc == righthashfunc &&
+ saop_hashable_for_type(saop->opno,
+ exprType(linitial(saop->args))))
{
Datum arrdatum = ((Const *) arrayarg)->constvalue;
ArrayType *arr = (ArrayType *) DatumGetPointer(arrdatum);
@@ -2586,7 +2606,9 @@ convert_saop_to_hashed_saop_walker(Node *node, void *context)
*/
if (OidIsValid(negator) &&
get_op_hash_functions(negator, &lefthashfunc, &righthashfunc) &&
- lefthashfunc == righthashfunc)
+ lefthashfunc == righthashfunc &&
+ saop_hashable_for_type(negator,
+ exprType(linitial(saop->args))))
{
Datum arrdatum = ((Const *) arrayarg)->constvalue;
ArrayType *arr = (ArrayType *) DatumGetPointer(arrdatum);
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 036086057d7..3c7fecf526d 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -1659,9 +1659,9 @@ op_mergejoinable(Oid opno, Oid inputtype)
* Returns true if the operator is hashjoinable. (There must be a suitable
* hash opfamily entry for this operator if it is so marked.)
*
- * In some cases (currently only array_eq), hashjoinability depends on the
- * specific input data type the operator is invoked for, so that must be
- * passed as well. We currently assume that only one input's type is needed
+ * In some cases (currently array_eq and record_eq), hashjoinability depends
+ * on the specific input data type the operator is invoked for, so that must
+ * be passed as well. We currently assume that only one input's type is needed
* to check this --- by convention, pass the left input's data type.
*/
bool
diff --git a/src/test/regress/expected/expressions.out b/src/test/regress/expected/expressions.out
index 730f7bc7eba..c35583cb2ea 100644
--- a/src/test/regress/expected/expressions.out
+++ b/src/test/regress/expected/expressions.out
@@ -561,3 +561,28 @@ from inttest;
(3 rows)
rollback;
+--
+-- Hashed SAOP must not be selected for a composite type that is not actually
+-- hashable.
+--
+CREATE TABLE hashed_saop_tbl (a int, b tsvector);
+INSERT INTO hashed_saop_tbl
+ SELECT g, ('w' || g)::tsvector FROM generate_series(1, 1000) g;
+ANALYZE hashed_saop_tbl;
+-- Throws an ERROR if the hashed strategy has been chosen
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, BUFFERS OFF, SUMMARY OFF)
+SELECT count(*) FROM hashed_saop_tbl
+WHERE (a,b) = ANY (ARRAY[
+ (1, 'w1'::tsvector), (2, 'w2'::tsvector), (3, 'w3'::tsvector),
+ (4, 'w4'::tsvector), (5, 'w5'::tsvector), (6, 'w6'::tsvector),
+ (7, 'w7'::tsvector), (8, 'w8'::tsvector), (9, 'w9'::tsvector)
+ ]);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Aggregate (actual rows=1.00 loops=1)
+ -> Seq Scan on hashed_saop_tbl (actual rows=9.00 loops=1)
+ Filter: (ROW(a, b) = ANY ('{"(1,''w1'')","(2,''w2'')","(3,''w3'')","(4,''w4'')","(5,''w5'')","(6,''w6'')","(7,''w7'')","(8,''w8'')","(9,''w9'')"}'::record[]))
+ Rows Removed by Filter: 991
+(4 rows)
+
+DROP TABLE hashed_saop_tbl;
diff --git a/src/test/regress/sql/expressions.sql b/src/test/regress/sql/expressions.sql
index 3b3048f9731..08691877902 100644
--- a/src/test/regress/sql/expressions.sql
+++ b/src/test/regress/sql/expressions.sql
@@ -301,3 +301,23 @@ select
from inttest;
rollback;
+
+--
+-- Hashed SAOP must not be selected for a composite type that is not actually
+-- hashable.
+--
+
+CREATE TABLE hashed_saop_tbl (a int, b tsvector);
+INSERT INTO hashed_saop_tbl
+ SELECT g, ('w' || g)::tsvector FROM generate_series(1, 1000) g;
+ANALYZE hashed_saop_tbl;
+
+-- Throws an ERROR if the hashed strategy has been chosen
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, BUFFERS OFF, SUMMARY OFF)
+SELECT count(*) FROM hashed_saop_tbl
+WHERE (a,b) = ANY (ARRAY[
+ (1, 'w1'::tsvector), (2, 'w2'::tsvector), (3, 'w3'::tsvector),
+ (4, 'w4'::tsvector), (5, 'w5'::tsvector), (6, 'w6'::tsvector),
+ (7, 'w7'::tsvector), (8, 'w8'::tsvector), (9, 'w9'::tsvector)
+ ]);
+DROP TABLE hashed_saop_tbl;
--
2.54.0
[text/plain] v0-0002-Show-hashed-ScalarArrayOpExpr-decision-in-EXPLAIN.patch (3.2K, 3-v0-0002-Show-hashed-ScalarArrayOpExpr-decision-in-EXPLAIN.patch)
download | inline diff:
From 241dee85f1094ce797861cd1d082691fd7ec41ab Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <[email protected]>
Date: Fri, 5 Jun 2026 11:07:21 +0200
Subject: [PATCH v0 2/3] Show hashed ScalarArrayOpExpr decision in EXPLAIN
When the planner converts a ScalarArrayOpExpr to hash-table evaluation
(convert_saop_to_hashed_saop), the resulting node deparses identically
to a linear one, so EXPLAIN gives no indication of which strategy the
executor will use. That made the hashed path invisible and awkward to
test.
---
src/backend/utils/adt/ruleutils.c | 10 +++++++++-
src/test/regress/expected/expressions.out | 9 +++++++++
src/test/regress/sql/expressions.sql | 3 +++
3 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 88de5c0481c..0567ba0886a 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -9989,10 +9989,18 @@ get_rule_expr(Node *node, deparse_context *context,
if (!PRETTY_PAREN(context))
appendStringInfoChar(buf, '(');
get_rule_expr_paren(arg1, context, true, node);
- appendStringInfo(buf, " %s %s (",
+
+ /*
+ * Surface hashed decision in EXPLAIN.
+ * hashfuncid is only ever set in a finished plan tree, so this
+ * never appears in deparsed views, rules, or other stored
+ * expressions.
+ */
+ appendStringInfo(buf, " %s %s%s (",
generate_operator_name(expr->opno,
exprType(arg1),
get_base_element_type(exprType(arg2))),
+ OidIsValid(expr->hashfuncid) ? "hashed " : "",
expr->useOr ? "ANY" : "ALL");
get_rule_expr_paren(arg2, context, true, node);
diff --git a/src/test/regress/expected/expressions.out b/src/test/regress/expected/expressions.out
index c35583cb2ea..9d4db9c3b2a 100644
--- a/src/test/regress/expected/expressions.out
+++ b/src/test/regress/expected/expressions.out
@@ -327,6 +327,15 @@ select return_text_input('a') not in ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i
f
(1 row)
+-- Check tha explain marks the hashed decision.
+explain (verbose, costs off)
+select return_int_input(1) in (10, 9, 2, 8, 3, 7, 4, 6, 5, 1);
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Result
+ Output: (return_int_input(1) = hashed ANY ('{10,9,2,8,3,7,4,6,5,1}'::integer[]))
+(2 rows)
+
rollback;
-- Test with non-strict equality function.
-- We need to create our own type for this.
diff --git a/src/test/regress/sql/expressions.sql b/src/test/regress/sql/expressions.sql
index 08691877902..fe9c330361b 100644
--- a/src/test/regress/sql/expressions.sql
+++ b/src/test/regress/sql/expressions.sql
@@ -131,6 +131,9 @@ select return_int_input(1) not in (null, null, null, null, null, null, null, nul
select return_int_input(null::int) not in (10, 9, 2, 8, 3, 7, 4, 6, 5, 1);
select return_int_input(null::int) not in (10, 9, 2, 8, 3, 7, 4, 6, 5, null);
select return_text_input('a') not in ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j');
+-- Check tha explain marks the hashed decision.
+explain (verbose, costs off)
+select return_int_input(1) in (10, 9, 2, 8, 3, 7, 4, 6, 5, 1);
rollback;
--
2.54.0
[text/plain] v0-0003-Recover-hashed-SAOP-for-anonymous-records-with-ha.patch (9.4K, 4-v0-0003-Recover-hashed-SAOP-for-anonymous-records-with-ha.patch)
download | inline diff:
From 3d67d0b2114b0c0aa271606767ad6e8b33c48d10 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <[email protected]>
Date: Fri, 5 Jun 2026 14:14:15 +0000
Subject: [PATCH v0 3/3] Recover hashed SAOP for anonymous records with
hashable columns
The previous commit routed record_eq through op_hashjoinable(), which
fixed the runtime failure on composites with non-hashable columns but
also disabled hashing for all anonymous-RECORD SAOPs: the typcache
refuses to vouch for bare RECORD, so op_hashjoinable() returns false
even when every column is hashable. The common "(a, b) = ANY (...)"
idiom thus regressed to a linear search.
A hashed SAOP always has the array as a Const, so unlike the sub-SELECT
case in hash_ok_operator() we can examine the actual data: resolve the
concrete rowtype of every array element and re-enable hashing when all
their column types are hashable. The rowtype is looked up with the
no-error variant so an unregistered typmod degrades to a linear search
rather than an ERROR.
---
src/backend/optimizer/util/clauses.c | 128 +++++++++++++++++++++-
src/test/regress/expected/expressions.out | 10 ++
src/test/regress/sql/expressions.sql | 5 +
3 files changed, 138 insertions(+), 5 deletions(-)
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index d44f674dc76..bbb8b7c3b59 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -52,6 +52,7 @@
#include "rewrite/rewriteManip.h"
#include "tcop/tcopprot.h"
#include "utils/acl.h"
+#include "utils/array.h"
#include "utils/builtins.h"
#include "utils/datum.h"
#include "utils/fmgroids.h"
@@ -2535,21 +2536,136 @@ convert_saop_to_hashed_saop(Node *node)
(void) convert_saop_to_hashed_saop_walker(node, NULL);
}
+/*
+ * record_const_array_is_hashable
+ * Are all column types of every element of a constant record[] array
+ * hashable?
+ *
+ * Used to recover hashing for an anonymous-RECORD ScalarArrayOpExpr. The
+ * typcache deliberately refuses to report bare RECORD as field-hashable
+ * because it cannot know the columns of an arbitrary anonymous record (see
+ * cache_record_field_properties()). But a hashed SAOP always has the array
+ * as a Const, so here we do have the concrete records in hand: resolve each
+ * element's actual rowtype and check that every (non-dropped) column type has
+ * a hash function. If so, hash_record() cannot fail on this array.
+ *
+ * We examine every element, not just the first: an array of RECORD is uniform
+ * only in element type, and individual elements may carry different blessed
+ * rowtypes (different typmods). A later element with a non-hashable column
+ * would otherwise trip the very failure this guards against.
+ *
+ * The left-hand input need not be examined separately. record_eq() compares
+ * by physical columns and errors out on dissimilar column types, so at runtime
+ * the LHS record either shares the array elements' column layout (hence is
+ * equally hashable) or the comparison errors regardless of hashing.
+ */
+static bool
+record_const_array_is_hashable(Const *arrayConst)
+{
+ ArrayType *arr;
+ int16 elmlen;
+ bool elmbyval;
+ char elmalign;
+ Datum *elems;
+ bool *nulls;
+ int nelems;
+ bool result = true;
+ int32 lastTypmod = -1;
+ Oid lastType = InvalidOid;
+
+ Assert(arrayConst != NULL && !arrayConst->constisnull);
+
+ arr = DatumGetArrayTypeP(arrayConst->constvalue);
+ if (ARR_ELEMTYPE(arr) != RECORDOID)
+ return false;
+
+ get_typlenbyvalalign(RECORDOID, &elmlen, &elmbyval, &elmalign);
+ deconstruct_array(arr, RECORDOID, elmlen, elmbyval, elmalign,
+ &elems, &nulls, &nelems);
+
+ for (int i = 0; i < nelems && result; i++)
+ {
+ HeapTupleHeader rec;
+ Oid tupType;
+ int32 tupTypmod;
+ TupleDesc tupdesc;
+
+ if (nulls[i])
+ continue;
+
+ rec = DatumGetHeapTupleHeader(elems[i]);
+ tupType = HeapTupleHeaderGetTypeId(rec);
+ tupTypmod = HeapTupleHeaderGetTypMod(rec);
+
+ /* Skip the rowtype lookup when this element matches the previous one */
+ if (tupType == lastType && tupTypmod == lastTypmod)
+ continue;
+ lastType = tupType;
+ lastTypmod = tupTypmod;
+
+ /*
+ * Use the no-error variant: an unregistered blessed typmod must never
+ * turn this planner-time optimization decision into an ERROR. Treat a
+ * missing tupdesc as "not hashable" and fall back to a linear search.
+ */
+ tupdesc = lookup_rowtype_tupdesc_noerror(tupType, tupTypmod, true);
+ if (tupdesc == NULL)
+ {
+ result = false;
+ break;
+ }
+ for (int j = 0; j < tupdesc->natts; j++)
+ {
+ Form_pg_attribute att = TupleDescAttr(tupdesc, j);
+ TypeCacheEntry *fieldentry;
+
+ if (att->attisdropped)
+ continue;
+ fieldentry = lookup_type_cache(att->atttypid, TYPECACHE_HASH_PROC);
+ if (!OidIsValid(fieldentry->hash_proc))
+ {
+ result = false;
+ break;
+ }
+ }
+ ReleaseTupleDesc(tupdesc);
+ }
+
+ pfree(elems);
+ pfree(nulls);
+ return result;
+}
+
/*
* saop_hashable_for_type
* Can a hashed ScalarArrayOpExpr safely use equality operator 'eqop'
- * for left-hand input type 'lefttype'?
+ * for left-hand input type 'lefttype' over constant array 'arrayConst'?
*
* get_op_hash_functions() reports record_eq and array_eq as hashable
* unconditionally. But hashability actually depends on the specific input
* type: every column/element type must itself be hashable. Re-check such
* operators through op_hashjoinable().
+ *
+ * op_hashjoinable() conservatively returns false for anonymous RECORD, since
+ * the typcache cannot inspect an arbitrary record's columns. In that one case
+ * we have more information than the typcache -- the constant array itself --
+ * so we examine the actual element rowtypes and allow hashing when they are
+ * all hashable.
*/
static bool
-saop_hashable_for_type(Oid eqop, Oid lefttype)
+saop_hashable_for_type(Oid eqop, Oid lefttype, Const *arrayConst)
{
- if (eqop == RECORD_EQ_OP || eqop == ARRAY_EQ_OP)
+ if (eqop == ARRAY_EQ_OP)
return op_hashjoinable(eqop, lefttype);
+ if (eqop == RECORD_EQ_OP)
+ {
+ if (op_hashjoinable(eqop, lefttype))
+ return true;
+ /* Recover hashing for anonymous RECORD with hashable columns. */
+ if (lefttype == RECORDOID)
+ return record_const_array_is_hashable(arrayConst);
+ return false;
+ }
return true;
}
@@ -2574,7 +2690,8 @@ convert_saop_to_hashed_saop_walker(Node *node, void *context)
if (get_op_hash_functions(saop->opno, &lefthashfunc, &righthashfunc) &&
lefthashfunc == righthashfunc &&
saop_hashable_for_type(saop->opno,
- exprType(linitial(saop->args))))
+ exprType(linitial(saop->args)),
+ (Const *) arrayarg))
{
Datum arrdatum = ((Const *) arrayarg)->constvalue;
ArrayType *arr = (ArrayType *) DatumGetPointer(arrdatum);
@@ -2608,7 +2725,8 @@ convert_saop_to_hashed_saop_walker(Node *node, void *context)
get_op_hash_functions(negator, &lefthashfunc, &righthashfunc) &&
lefthashfunc == righthashfunc &&
saop_hashable_for_type(negator,
- exprType(linitial(saop->args))))
+ exprType(linitial(saop->args)),
+ (Const *) arrayarg))
{
Datum arrdatum = ((Const *) arrayarg)->constvalue;
ArrayType *arr = (ArrayType *) DatumGetPointer(arrdatum);
diff --git a/src/test/regress/expected/expressions.out b/src/test/regress/expected/expressions.out
index 9d4db9c3b2a..958d92e0956 100644
--- a/src/test/regress/expected/expressions.out
+++ b/src/test/regress/expected/expressions.out
@@ -335,6 +335,16 @@ select return_int_input(1) in (10, 9, 2, 8, 3, 7, 4, 6, 5, 1);
Result
Output: (return_int_input(1) = hashed ANY ('{10,9,2,8,3,7,4,6,5,1}'::integer[]))
(2 rows)
+-- An anonymous-record SAOP whose columns are all hashable also hashes: the
+-- planner cannot rely on the typcache for bare RECORD, so it inspects the
+-- constant array's actual rowtype and recovers the hashed plan.
+explain (verbose, costs off)
+select (return_int_input(1), return_int_input(2)) = any (array[(1,2),(3,4),(5,6),(7,8),(9,10),(11,12),(13,14),(15,16),(17,18)]);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Result
+ Output: (ROW(return_int_input(1), return_int_input(2)) = hashed ANY ('{"(1,2)","(3,4)","(5,6)","(7,8)","(9,10)","(11,12)","(13,14)","(15,16)","(17,18)"}'::record[]))
+(2 rows)
rollback;
-- Test with non-strict equality function.
diff --git a/src/test/regress/sql/expressions.sql b/src/test/regress/sql/expressions.sql
index fe9c330361b..a8daf14ffc4 100644
--- a/src/test/regress/sql/expressions.sql
+++ b/src/test/regress/sql/expressions.sql
@@ -134,6 +134,11 @@ select return_text_input('a') not in ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i
-- Check tha explain marks the hashed decision.
explain (verbose, costs off)
select return_int_input(1) in (10, 9, 2, 8, 3, 7, 4, 6, 5, 1);
+-- An anonymous-record SAOP whose columns are all hashable also hashes: the
+-- planner cannot rely on the typcache for bare RECORD, so it inspects the
+-- constant array's actual rowtype and recovers the hashed plan.
+explain (verbose, costs off)
+select (return_int_input(1), return_int_input(2)) = any (array[(1,2),(3,4),(5,6),(7,8),(9,10),(11,12),(13,14),(15,16),(17,18)]);
rollback;
--
2.54.0
[text/plain] bug-hashed-saop.sql (1.1K, 5-bug-hashed-saop.sql)
download | inline:
CREATE TABLE test (a int, b tsvector);
INSERT INTO test SELECT g, ('w' || g)::tsvector FROM generate_series(1, 1000) g;
ANALYZE test;
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, BUFFERS OFF, SUMMARY OFF)
SELECT count(*) FROM test
WHERE (a,b) = ANY (ARRAY[(1, 'w1'::tsvector), (2, 'w2'::tsvector)]);
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, BUFFERS OFF, SUMMARY OFF)
SELECT count(*) FROM test
WHERE (a,b) = ANY (ARRAY[
(1, 'w1'::tsvector), (2, 'w2'::tsvector), (3, 'w3'::tsvector),
(4, 'w4'::tsvector), (5, 'w5'::tsvector), (6, 'w6'::tsvector),
(7, 'w7'::tsvector), (8, 'w8'::tsvector), (9, 'w9'::tsvector)
]);
/*
QUERY PLAN
-----------------------------------------------------------------------------
Aggregate (actual rows=1.00 loops=1)
-> Seq Scan on test (actual rows=2.00 loops=1)
Filter: (ROW(a, b) = ANY ('{"(1,''w1'')","(2,''w2'')"}'::record[]))
Rows Removed by Filter: 998
Planning Time: 0.119 ms
Execution Time: 0.854 ms
(6 rows)
ERROR: could not identify a hash function for type tsvector
*/
^ permalink raw reply [nested|flat] 5+ messages in thread
* Re: Hashed SAOP on composite type with non-hashable column errors at runtime
@ 2026-06-05 18:12 Tom Lane <[email protected]>
parent: Andrei Lepikhov <[email protected]>
0 siblings, 2 replies; 5+ messages in thread
From: Tom Lane @ 2026-06-05 18:12 UTC (permalink / raw)
To: Andrei Lepikhov <[email protected]>; +Cc: [email protected]; Peter Eisentraut <[email protected]>
Andrei Lepikhov <[email protected]> writes:
> There is an issue when we use a record-based array operation in SQL:
> EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, BUFFERS OFF, SUMMARY OFF)
> SELECT count(*) FROM test
> WHERE (a,b) = ANY (ARRAY[
> (1, 'w1'::tsvector), (2, 'w2'::tsvector), (3, 'w3'::tsvector),
> (4, 'w4'::tsvector), (5, 'w5'::tsvector), (6, 'w6'::tsvector),
> (7, 'w7'::tsvector), (8, 'w8'::tsvector), (9, 'w9'::tsvector)
> ]);
> ERROR: could not identify a hash function for type tsvector
Yeah, this is a bug, but I don't think you've identified the
full scope of the problem. In the first place, there's another
get_op_hash_functions call in selfuncs.c that's also at risk.
In the second place, the same hazard exists for range and
multirange types, which can have non-hashable subtypes.
AFAICT noplace at all is defending against that.
So I'm unexcited about putting the fix for this into
convert_saop_to_hashed_saop_walker as you've done here.
I think it needs to be addressed at the level of the relevant
lsyscache.c lookup functions, so that there's some chance that
future code additions will get this right. Draft fix attached.
I can't get excited about the test case you suggest;
it's rather expensive and it will do nothing whatever
to guard against future mistakes of the same kind.
I'm also unexcited about your 0002 and 0003. I don't really
care about optimizing the anonymous-record case; by and large,
it's coincidental that complicated operations work at all on
anonymous record types.
regards, tom lane
Attachments:
[text/x-diff] v1-0001-Fix-missed-checks-for-hashability-of-container-ty.patch (13.1K, 2-v1-0001-Fix-missed-checks-for-hashability-of-container-ty.patch)
download | inline diff:
From 8e1096d3c9b116243cf06964b2ccd8e5115ae9eb Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Fri, 5 Jun 2026 14:06:17 -0400
Subject: [PATCH v1] Fix missed checks for hashability of container-type
equality.
The operators for array_eq, record_eq, range_eq, and multirange_eq
are all marked oprcanhash, but there's a pitfall: their hash functions
can fail at runtime if the contained type(s) are not hashable.
Therefore, the planner has to check hashability of the contained types
before deciding it can use hashing in these cases. Not every place
had gotten this memo, and noplace at all had considered the issue
for ranges or multiranges.
For the most part we should fix this in the lookup functions provided
by lsyscache.c, to wit get_op_hash_functions and op_hashjoinable.
But there's a problem: get_op_hash_functions is not passed the input
data type it would need to check. We mustn't change the API of that
exported function in a back-patched fix, and even if we wanted to,
its call sites in the executor mostly don't have easy access to the
required data type OID. Fortunately, the executor call sites don't
actually need fixing, because it's expected that the planner verified
hashability before building a plan that requires it. Therefore,
leave get_op_hash_functions as-is and invent a wrapper function
get_op_hash_functions_ext that does the additional checking needed
in the planner's uses.
We also need to fix hash_ok_operator (extending the fix in 647889667).
While at it, neaten up a couple of places in lookup_type_cache where
relevant code for multirange cases was written differently from the
code for other container types.
Reported-by: Andrei Lepikhov <[email protected]>
Author: Andrei Lepikhov <[email protected]>
Co-authored-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 14
---
src/backend/optimizer/plan/subselect.c | 4 +-
src/backend/optimizer/util/clauses.c | 9 ++-
src/backend/utils/adt/selfuncs.c | 4 +-
src/backend/utils/cache/lsyscache.c | 78 ++++++++++++++++++++++++--
src/backend/utils/cache/typcache.c | 25 +++------
src/include/catalog/pg_operator.dat | 4 +-
src/include/utils/lsyscache.h | 2 +
7 files changed, 98 insertions(+), 28 deletions(-)
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index ccec1eaa7fe..6aa8971c95d 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -841,7 +841,9 @@ hash_ok_operator(OpExpr *expr)
if (list_length(expr->args) != 2)
return false;
if (opid == ARRAY_EQ_OP ||
- opid == RECORD_EQ_OP)
+ opid == RECORD_EQ_OP ||
+ opid == RANGE_EQ_OP ||
+ opid == MULTIRANGE_EQ_OP)
{
/* these are strict, but must check input type to ensure hashable */
Node *leftarg = linitial(expr->args);
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index cd86311bb0b..07738894d1a 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2544,7 +2544,8 @@ convert_saop_to_hashed_saop_walker(Node *node, void *context)
if (IsA(node, ScalarArrayOpExpr))
{
ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) node;
- Expr *arrayarg = (Expr *) lsecond(saop->args);
+ Node *leftarg = (Node *) linitial(saop->args);
+ Node *arrayarg = (Node *) lsecond(saop->args);
Oid lefthashfunc;
Oid righthashfunc;
@@ -2553,7 +2554,8 @@ convert_saop_to_hashed_saop_walker(Node *node, void *context)
{
if (saop->useOr)
{
- if (get_op_hash_functions(saop->opno, &lefthashfunc, &righthashfunc) &&
+ if (get_op_hash_functions_ext(saop->opno, exprType(leftarg),
+ &lefthashfunc, &righthashfunc) &&
lefthashfunc == righthashfunc)
{
Datum arrdatum = ((Const *) arrayarg)->constvalue;
@@ -2585,7 +2587,8 @@ convert_saop_to_hashed_saop_walker(Node *node, void *context)
* just ensure the lookup items are not in the hash table.
*/
if (OidIsValid(negator) &&
- get_op_hash_functions(negator, &lefthashfunc, &righthashfunc) &&
+ get_op_hash_functions_ext(negator, exprType(leftarg),
+ &lefthashfunc, &righthashfunc) &&
lefthashfunc == righthashfunc)
{
Datum arrdatum = ((Const *) arrayarg)->constvalue;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index b9449b4574a..d6efd07073a 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2476,7 +2476,9 @@ eqjoinsel(PG_FUNCTION_ARGS)
* hash functions for the join operator.
*/
if ((sslot1.nvalues + sslot2.nvalues) >= EQJOINSEL_MCV_HASH_THRESHOLD)
- (void) get_op_hash_functions(operator, &hashLeft, &hashRight);
+ (void) get_op_hash_functions_ext(operator,
+ exprType((Node *) linitial(args)),
+ &hashLeft, &hashRight);
}
else
memset(&eqproc, 0, sizeof(eqproc)); /* silence uninit-var warnings */
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 036086057d7..ae5fa8154a0 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -572,6 +572,11 @@ get_compatible_hash_operators(Oid opno,
*
* Returns true if able to find the requested function(s), false if not.
* (This indicates that the operator should not have been marked oprcanhash.)
+ *
+ * Callers must beware that for container types (arrays, records, ranges)
+ * this function will succeed for array_eq etc, but the hash function could
+ * fail at runtime if the contained type(s) are not hashable. If it is
+ * possible that the operator is one of these, use get_op_hash_functions_ext.
*/
bool
get_op_hash_functions(Oid opno,
@@ -654,6 +659,55 @@ get_op_hash_functions(Oid opno,
return result;
}
+/*
+ * get_op_hash_functions_ext
+ * As above, but verify hashability in container-type cases.
+ *
+ * As with op_hashjoinable, assume the left input type is sufficient
+ * to disambiguate container-type cases.
+ */
+bool
+get_op_hash_functions_ext(Oid opno, Oid inputtype,
+ RegProcedure *lhs_procno, RegProcedure *rhs_procno)
+{
+ TypeCacheEntry *typentry;
+
+ /* Ensure output args are initialized on failure */
+ if (lhs_procno)
+ *lhs_procno = InvalidOid;
+ if (rhs_procno)
+ *rhs_procno = InvalidOid;
+
+ /* As in op_hashjoinable, let the typcache handle the hard cases */
+ if (opno == ARRAY_EQ_OP)
+ {
+ typentry = lookup_type_cache(inputtype, TYPECACHE_HASH_PROC);
+ if (typentry->hash_proc != F_HASH_ARRAY)
+ return false;
+ }
+ else if (opno == RECORD_EQ_OP)
+ {
+ typentry = lookup_type_cache(inputtype, TYPECACHE_HASH_PROC);
+ if (typentry->hash_proc != F_HASH_RECORD)
+ return false;
+ }
+ else if (opno == RANGE_EQ_OP)
+ {
+ typentry = lookup_type_cache(inputtype, TYPECACHE_HASH_PROC);
+ if (typentry->hash_proc != F_HASH_RANGE)
+ return false;
+ }
+ else if (opno == MULTIRANGE_EQ_OP)
+ {
+ typentry = lookup_type_cache(inputtype, TYPECACHE_HASH_PROC);
+ if (typentry->hash_proc != F_HASH_MULTIRANGE)
+ return false;
+ }
+
+ /* OK, do the normal lookup */
+ return get_op_hash_functions(opno, lhs_procno, rhs_procno);
+}
+
/*
* get_op_index_interpretation
* Given an operator's OID, find out which amcanorder opfamilies it belongs to,
@@ -1624,7 +1678,8 @@ op_mergejoinable(Oid opno, Oid inputtype)
* For array_eq or record_eq, we can sort if the element or field types
* are all sortable. We could implement all the checks for that here, but
* the typcache already does that and caches the results too, so let's
- * rely on the typcache.
+ * rely on the typcache. We do not need similar special cases for ranges
+ * or multiranges, because their subtypes are required to be sortable.
*/
if (opno == ARRAY_EQ_OP)
{
@@ -1659,10 +1714,11 @@ op_mergejoinable(Oid opno, Oid inputtype)
* Returns true if the operator is hashjoinable. (There must be a suitable
* hash opfamily entry for this operator if it is so marked.)
*
- * In some cases (currently only array_eq), hashjoinability depends on the
- * specific input data type the operator is invoked for, so that must be
- * passed as well. We currently assume that only one input's type is needed
- * to check this --- by convention, pass the left input's data type.
+ * In some cases (currently array_eq, record_eq, range_eq, multirange_eq),
+ * hashjoinability depends on the specific input data type the operator is
+ * invoked for, so that must be passed as well. We currently assume that only
+ * one input's type is needed to check this --- by convention, pass the left
+ * input's data type.
*/
bool
op_hashjoinable(Oid opno, Oid inputtype)
@@ -1684,6 +1740,18 @@ op_hashjoinable(Oid opno, Oid inputtype)
if (typentry->hash_proc == F_HASH_RECORD)
result = true;
}
+ else if (opno == RANGE_EQ_OP)
+ {
+ typentry = lookup_type_cache(inputtype, TYPECACHE_HASH_PROC);
+ if (typentry->hash_proc == F_HASH_RANGE)
+ result = true;
+ }
+ else if (opno == MULTIRANGE_EQ_OP)
+ {
+ typentry = lookup_type_cache(inputtype, TYPECACHE_HASH_PROC);
+ if (typentry->hash_proc == F_HASH_MULTIRANGE)
+ result = true;
+ }
else
{
/* For all other operators, rely on pg_operator.oprcanhash */
diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
index cebe7a916fb..da91a2ff1dd 100644
--- a/src/backend/utils/cache/typcache.c
+++ b/src/backend/utils/cache/typcache.c
@@ -779,8 +779,9 @@ lookup_type_cache(Oid type_id, int flags)
HASHSTANDARD_PROC);
/*
- * As above, make sure hash_array, hash_record, or hash_range will
- * succeed.
+ * As above, make sure hash_array, hash_record, hash_range, or
+ * hash_multirange will succeed. Here we do need to check the range
+ * cases.
*/
if (hash_proc == F_HASH_ARRAY &&
!array_element_has_hashing(typentry))
@@ -791,12 +792,8 @@ lookup_type_cache(Oid type_id, int flags)
else if (hash_proc == F_HASH_RANGE &&
!range_element_has_hashing(typentry))
hash_proc = InvalidOid;
-
- /*
- * Likewise for hash_multirange.
- */
- if (hash_proc == F_HASH_MULTIRANGE &&
- !multirange_element_has_hashing(typentry))
+ else if (hash_proc == F_HASH_MULTIRANGE &&
+ !multirange_element_has_hashing(typentry))
hash_proc = InvalidOid;
/* Force update of hash_proc_finfo only if we're changing state */
@@ -828,8 +825,8 @@ lookup_type_cache(Oid type_id, int flags)
HASHEXTENDED_PROC);
/*
- * As above, make sure hash_array_extended, hash_record_extended, or
- * hash_range_extended will succeed.
+ * As above, make sure hash_array_extended, hash_record_extended,
+ * hash_range_extended, or hash_multirange_extended will succeed.
*/
if (hash_extended_proc == F_HASH_ARRAY_EXTENDED &&
!array_element_has_extended_hashing(typentry))
@@ -840,12 +837,8 @@ lookup_type_cache(Oid type_id, int flags)
else if (hash_extended_proc == F_HASH_RANGE_EXTENDED &&
!range_element_has_extended_hashing(typentry))
hash_extended_proc = InvalidOid;
-
- /*
- * Likewise for hash_multirange_extended.
- */
- if (hash_extended_proc == F_HASH_MULTIRANGE_EXTENDED &&
- !multirange_element_has_extended_hashing(typentry))
+ else if (hash_extended_proc == F_HASH_MULTIRANGE_EXTENDED &&
+ !multirange_element_has_extended_hashing(typentry))
hash_extended_proc = InvalidOid;
/* Force update of proc finfo only if we're changing state */
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 1a8fd8b8645..c7f860c442b 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -3057,7 +3057,7 @@
oprrest => 'scalargesel', oprjoin => 'scalargejoinsel' },
# generic range type operators
-{ oid => '3882', descr => 'equal',
+{ oid => '3882', oid_symbol => 'RANGE_EQ_OP', descr => 'equal',
oprname => '=', oprcanmerge => 't', oprcanhash => 't', oprleft => 'anyrange',
oprright => 'anyrange', oprresult => 'bool', oprcom => '=(anyrange,anyrange)',
oprnegate => '<>(anyrange,anyrange)', oprcode => 'range_eq',
@@ -3263,7 +3263,7 @@
oprname => '@@', oprleft => 'jsonb', oprright => 'jsonpath',
oprresult => 'bool', oprcode => 'jsonb_path_match_opr(jsonb,jsonpath)',
oprrest => 'matchingsel', oprjoin => 'matchingjoinsel' },
-{ oid => '2860', descr => 'equal',
+{ oid => '2860', oid_symbol => 'MULTIRANGE_EQ_OP', descr => 'equal',
oprname => '=', oprcanmerge => 't', oprcanhash => 't',
oprleft => 'anymultirange', oprright => 'anymultirange', oprresult => 'bool',
oprcom => '=(anymultirange,anymultirange)',
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 8545e67a632..865980cb0f1 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -86,6 +86,8 @@ extern bool get_compatible_hash_operators(Oid opno,
Oid *lhs_opno, Oid *rhs_opno);
extern bool get_op_hash_functions(Oid opno,
RegProcedure *lhs_procno, RegProcedure *rhs_procno);
+extern bool get_op_hash_functions_ext(Oid opno, Oid inputtype,
+ RegProcedure *lhs_procno, RegProcedure *rhs_procno);
extern List *get_op_index_interpretation(Oid opno);
extern bool equality_ops_are_compatible(Oid opno1, Oid opno2);
extern bool comparison_ops_are_compatible(Oid opno1, Oid opno2);
--
2.52.0
^ permalink raw reply [nested|flat] 5+ messages in thread
* Re: Hashed SAOP on composite type with non-hashable column errors at runtime
@ 2026-06-05 18:17 Tom Lane <[email protected]>
parent: Tom Lane <[email protected]>
1 sibling, 0 replies; 5+ messages in thread
From: Tom Lane @ 2026-06-05 18:17 UTC (permalink / raw)
To: Andrei Lepikhov <[email protected]>; +Cc: [email protected]; Peter Eisentraut <[email protected]>
I wrote:
> In the second place, the same hazard exists for range and
> multirange types, which can have non-hashable subtypes.
> AFAICT noplace at all is defending against that.
I meant to attach the test case I'd made to demonstrate
that this is indeed broken. It's a small variant of
your bug-hashed-saop.sql.
regards, tom lane
create type cashrange as range (subtype=money);
CREATE TABLE ctest (a int, b cashrange);
INSERT INTO ctest SELECT g, cashrange(g::money, (g+1)::money)::cashrange FROM generate_series(1, 1000) g;
ANALYZE ctest;
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, BUFFERS OFF, SUMMARY OFF)
SELECT count(*) FROM ctest
WHERE (a,b) = ANY (ARRAY[(1, '(1,2)'::cashrange), (2, '(3,4)'::cashrange)]);
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, BUFFERS OFF, SUMMARY OFF)
SELECT count(*) FROM ctest
WHERE (a,b) = ANY (ARRAY[
(1, '(1,2)'::cashrange), (2, '(2,3)'::cashrange), (3, '(3,4)'::cashrange),
(4, '(4,5)'::cashrange), (5, '(5,6)'::cashrange), (6, '(6,7)'::cashrange),
(7, '(7,8)'::cashrange), (8, '(8,9)'::cashrange), (9, '(9,10)'::cashrange)
]);
Attachments:
[text/plain] bug-with-range.sql (741B, 2-bug-with-range.sql)
download | inline:
create type cashrange as range (subtype=money);
CREATE TABLE ctest (a int, b cashrange);
INSERT INTO ctest SELECT g, cashrange(g::money, (g+1)::money)::cashrange FROM generate_series(1, 1000) g;
ANALYZE ctest;
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, BUFFERS OFF, SUMMARY OFF)
SELECT count(*) FROM ctest
WHERE (a,b) = ANY (ARRAY[(1, '(1,2)'::cashrange), (2, '(3,4)'::cashrange)]);
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, BUFFERS OFF, SUMMARY OFF)
SELECT count(*) FROM ctest
WHERE (a,b) = ANY (ARRAY[
(1, '(1,2)'::cashrange), (2, '(2,3)'::cashrange), (3, '(3,4)'::cashrange),
(4, '(4,5)'::cashrange), (5, '(5,6)'::cashrange), (6, '(6,7)'::cashrange),
(7, '(7,8)'::cashrange), (8, '(8,9)'::cashrange), (9, '(9,10)'::cashrange)
]);
^ permalink raw reply [nested|flat] 5+ messages in thread
* Re: Hashed SAOP on composite type with non-hashable column errors at runtime
@ 2026-06-07 17:56 Andrei Lepikhov <[email protected]>
parent: Tom Lane <[email protected]>
1 sibling, 1 reply; 5+ messages in thread
From: Andrei Lepikhov @ 2026-06-07 17:56 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: [email protected]; Peter Eisentraut <[email protected]>
On 05/06/2026 20:12, Tom Lane wrote:
> So I'm unexcited about putting the fix for this into
> convert_saop_to_hashed_saop_walker as you've done here.
> I think it needs to be addressed at the level of the relevant
> lsyscache.c lookup functions, so that there's some chance that
> future code additions will get this right. Draft fix attached.
Thanks for your efforts!
Now, hash_ok_operator and op_hashjoinable handle all four container-type
equality operators. Side way is a C extension that lets you create a custom type
that groups other types marked as HASHES. I started this research because I had
trouble redesigning my ‘statistics’ type [1], but here, using HASHES seems just
not to work for my custom type.
Fixes in the lookup_type_cache related to the multirange type are also correct
for me. As well as pg_operator.dat changes.
>
> I can't get excited about the test case you suggest;
> it's rather expensive and it will do nothing whatever
> to guard against future mistakes of the same kind.
Ok, let me think about that a little more.
>
> I'm also unexcited about your 0002 and 0003.
I understand about 0003, but what is the problem with 0002? In practice, people
use massive arrays (I’ve seen thousands of elements). You might remember my
complaint about planner’s memory consumption on array selectivity estimation a
couple of years ago - that time you proposed local planning memory context. So,
it’d be nice to see (as with Subplans) whether the SAOP is not hashed for a reason.
> I don't really care about optimizing the anonymous-record case; by and large,
> it's coincidental that complicated operations work at all on
> anonymous record types.
Got it. My actual care here is to provide a way (if possible) for extension
developers to fix this problem in ORM systems where they can't change the
complex application, but have an access pattern and will see regressions, as
they struggle with regressions each time after the introduction of a brand-new
query tree rewriting rule ;).
Note on the ‘lefthashfunc == righthashfunc’ condition. It is correct, because we
can compare RECORDs with only identical types in corresponding positions on the
left and right side of the comparison operator:
if (att1->atttypid != att2->atttypid)
ereport(ERROR, "cannot compare dissimilar column types %s and %s ...");
So, if someday typecache is extended to compare, let’s say, (int4, int8) and
(int4, numeric), this code should also be revised, right?
[1] https://github.com/danolivo/pg_track_optimizer/blob/main/rstats.h
--
regards, Andrei Lepikhov,
pgEdge
^ permalink raw reply [nested|flat] 5+ messages in thread
* Re: Hashed SAOP on composite type with non-hashable column errors at runtime
@ 2026-06-08 16:00 Tom Lane <[email protected]>
parent: Andrei Lepikhov <[email protected]>
0 siblings, 0 replies; 5+ messages in thread
From: Tom Lane @ 2026-06-08 16:00 UTC (permalink / raw)
To: Andrei Lepikhov <[email protected]>; +Cc: [email protected]; Peter Eisentraut <[email protected]>
Andrei Lepikhov <[email protected]> writes:
> Now, hash_ok_operator and op_hashjoinable handle all four container-type
> equality operators. Side way is a C extension that lets you create a custom type
> that groups other types marked as HASHES.
Yeah, it's interesting to speculate about what we'd have to do to
allow extensions to invent new kinds of container types. Right now,
the knowledge of what kinds of containers there are is wired into
a bunch of places. This fix isn't adding any new places, just fixing
some places whose knowledge was incomplete. So I'm content with this
for today.
> Fixes in the lookup_type_cache related to the multirange type are also correct
> for me. As well as pg_operator.dat changes.
Thanks for reviewing; I pushed v1-0001 after a bit more
comment-smithing.
>> I'm also unexcited about your 0002 and 0003.
> I understand about 0003, but what is the problem with 0002?
Let me rephrase that: 0002 is a new feature and hence out of scope
at this point in the development cycle. If you want to start a
new thread proposing that for v20, go right ahead.
regards, tom lane
^ permalink raw reply [nested|flat] 5+ messages in thread
end of thread, other threads:[~2026-06-08 16:00 UTC | newest]
Thread overview: 5+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-06-05 14:27 Hashed SAOP on composite type with non-hashable column errors at runtime Andrei Lepikhov <[email protected]>
2026-06-05 18:12 ` Tom Lane <[email protected]>
2026-06-05 18:17 ` Tom Lane <[email protected]>
2026-06-07 17:56 ` Andrei Lepikhov <[email protected]>
2026-06-08 16:00 ` Tom Lane <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox