public inbox for [email protected]help / color / mirror / Atom feed
SQL-level pg_datum_image_equal 11+ messages / 4 participants [nested] [flat]
* SQL-level pg_datum_image_equal @ 2025-12-10 17:46 Matthias van de Meent <[email protected]> 0 siblings, 2 replies; 11+ messages in thread From: Matthias van de Meent @ 2025-12-10 17:46 UTC (permalink / raw) To: PostgreSQL Hackers <[email protected]> Hi, One of our customers has this workload where every so often they update the whole table to make sure it's up-to-date. In general, you'd probably want to use MERGE for such a workload and ignore all rows that already have only matching data, but there's a catch: PostgreSQL doesn't have an efficient way to check if the provided data is actually equal in all senses of the word, so we can't easily and cheaply determine whether an update is needed; which is one reason why the full table was updated every time. A naive approach to determining whether each value needs to be updated would use `old IS NOT DISTINCT FROM new`, but a.) this relies on `=` operators to exist for that type, and b.) the = operator of some types don't always distinguish between values that are different for human readers; with as famous example '1.0' and '1.00' in numeric; they have an equal value but are clearly distinct to readers (and certain functions). One could get around this in this case by 'simply' casting to text and comparing the outputs (using the C collation for performance and determinism), or by wrapping it in a row (which then uses record_image_eq, which does use binary compare functions internally), but both imply additional parsing, wrapping, and overhead compared to a direct datum_image_eq call. So, attached is a simple and to-the-point patch that adds the function mentioned in $subject, which will tell the user whether two values of the same type have an exactly equal binary representation, using datum_image_eq. Kind regards, Matthias van de Meent Attachments: [application/octet-stream] v1-0001-Add-SQL-level-datum-equality-tests.patch (3.4K, 2-v1-0001-Add-SQL-level-datum-equality-tests.patch) download | inline diff: From f5264368a05bb49b4ce0e4ff586fb5da6b1f5cc4 Mon Sep 17 00:00:00 2001 From: Matthias van de Meent <[email protected]> Date: Wed, 10 Dec 2025 18:45:19 +0100 Subject: [PATCH v1] Add SQL-level datum equality tests This enables improved performance for users that need to test for exact bytewise differences in SQL; e.g. to see if an UPDATE is really necessary for externally provided values. A workaround around this limitation was possible through various methods, but a generic method was not available for all types. --- src/backend/utils/adt/pseudotypes.c | 51 ++++++++++++++++++++++++ src/include/catalog/pg_proc.dat | 7 ++++ src/test/regress/expected/opr_sanity.out | 1 + 3 files changed, 59 insertions(+) diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c index 317a1f2b282..216d895d2f7 100644 --- a/src/backend/utils/adt/pseudotypes.c +++ b/src/backend/utils/adt/pseudotypes.c @@ -23,7 +23,9 @@ #include "postgres.h" #include "libpq/pqformat.h" +#include "utils/datum.h" #include "utils/fmgrprotos.h" +#include "utils/lsyscache.h" /* @@ -375,3 +377,52 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement); PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray); PSEUDOTYPE_DUMMY_IO_FUNCS(anycompatible); PSEUDOTYPE_DUMMY_IO_FUNCS(anycompatiblenonarray); + +/* + * Compares two datums of the same (any) type, and returns whether they have + * the same binary representation. + */ +Datum +pg_datum_image_equal(PG_FUNCTION_ARGS) +{ + bool eq; + + if (PG_ARGISNULL(0) != PG_ARGISNULL(1)) + { + eq = false; + } + else if (PG_ARGISNULL(0)) + { + /* both NULL */ + eq = true; + } + else + { + Oid typ; + Datum arg0; + Datum arg1; + bool typbyval; + char typalign; + int16 typlen; + + typ = get_fn_expr_argtype(fcinfo->flinfo, 0); + + if (!OidIsValid(typ)) + { + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("could not determine type"))); + } + + Assert(typ == get_fn_expr_argtype(fcinfo->flinfo, 1)); + + arg0 = PG_GETARG_DATUM(0); + arg1 = PG_GETARG_DATUM(1); + + get_typlenbyvalalign(typ, &typlen, &typbyval, &typalign); + + eq = datum_image_eq(arg0, arg1, typbyval, typlen); + } + + PG_RETURN_BOOL(eq); +} diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index fd9448ec7b9..2f1b1ba8370 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -12612,4 +12612,11 @@ proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}', prosrc => 'pg_get_aios' }, +{ oid => '9200', + descr => 'test if two values have the same binary representation', + proname => 'pg_datum_image_equal', proisstrict => 'f', + proleakproof => 't', prorettype => 'bool', + proargtypes => 'anyelement anyelement', + prosrc => 'pg_datum_image_equal' }, + ] diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out index a357e1d0c0e..57fe8d6ede8 100644 --- a/src/test/regress/expected/opr_sanity.out +++ b/src/test/regress/expected/opr_sanity.out @@ -880,6 +880,7 @@ bytea(integer) bytea(bigint) bytea_larger(bytea,bytea) bytea_smaller(bytea,bytea) +pg_datum_image_equal(anyelement,anyelement) -- Check that functions without argument are not marked as leakproof. SELECT p1.oid::regprocedure FROM pg_proc p1 JOIN pg_namespace pn -- 2.50.1 (Apple Git-155) ^ permalink raw reply [nested|flat] 11+ messages in thread
* Re: SQL-level pg_datum_image_equal @ 2025-12-20 13:14 jian he <[email protected]> parent: Matthias van de Meent <[email protected]> 1 sibling, 1 reply; 11+ messages in thread From: jian he @ 2025-12-20 13:14 UTC (permalink / raw) To: Matthias van de Meent <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]> On Thu, Dec 11, 2025 at 1:46 AM Matthias van de Meent <[email protected]> wrote: > > Hi, > > > So, attached is a simple and to-the-point patch that adds the function > mentioned in $subject, which will tell the user whether two values of > the same type have an exactly equal binary representation, using > datum_image_eq. > hi. maybe Table 9.76 (https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INFO-CATALOG) is the right place for this function. corner case confused me, I think this is related to null handling, maybe not related to this. create type t1 as (a int, b text); select pg_datum_image_equal('(,)'::t1, $$(,)$$::t1); select pg_datum_image_equal('(,)'::t1, NULL::t1); select '(,)'::t1 is null, NULL::t1 is null; enforce_generic_type_consistency already resolved generic type. see select pg_datum_image_equal('1','1'); ERROR: could not determine polymorphic type because input has type unknown so + if (!OidIsValid(typ)) + { + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("could not determine type"))); + } this part should be elog(ERROR.....) ? -- jian https://www.enterprisedb.com/ ^ permalink raw reply [nested|flat] 11+ messages in thread
* Re: SQL-level pg_datum_image_equal @ 2025-12-20 16:07 Corey Huinker <[email protected]> parent: Matthias van de Meent <[email protected]> 1 sibling, 1 reply; 11+ messages in thread From: Corey Huinker @ 2025-12-20 16:07 UTC (permalink / raw) To: Matthias van de Meent <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]> On Wed, Dec 10, 2025 at 12:46 PM Matthias van de Meent < [email protected]> wrote: > Hi, > > One of our customers has this workload where every so often they > update the whole table to make sure it's up-to-date. In general, you'd > probably want to use MERGE for such a workload and ignore all rows > that already have only matching data, but there's a catch: PostgreSQL > doesn't have an efficient way to check if the provided data is > actually equal in all senses of the word, so we can't easily and > cheaply determine whether an update is needed; which is one reason why > the full table was updated every time. > Have you ruled out the suppress_redundant_updates_trigger? https://www.postgresql.org/docs/current/functions-trigger.html ^ permalink raw reply [nested|flat] 11+ messages in thread
* Re: SQL-level pg_datum_image_equal @ 2025-12-21 19:26 Matthias van de Meent <[email protected]> parent: Corey Huinker <[email protected]> 0 siblings, 0 replies; 11+ messages in thread From: Matthias van de Meent @ 2025-12-21 19:26 UTC (permalink / raw) To: Corey Huinker <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]> On Sat, 20 Dec 2025, 17:07 Corey Huinker, <[email protected]> wrote: > > On Wed, Dec 10, 2025 at 12:46 PM Matthias van de Meent <[email protected]> wrote: >> >> Hi, >> >> One of our customers has this workload where every so often they >> update the whole table to make sure it's up-to-date. In general, you'd >> probably want to use MERGE for such a workload and ignore all rows >> that already have only matching data, but there's a catch: PostgreSQL >> doesn't have an efficient way to check if the provided data is >> actually equal in all senses of the word, so we can't easily and >> cheaply determine whether an update is needed; which is one reason why >> the full table was updated every time. > > Have you ruled out the suppress_redundant_updates_trigger? Thank you for the reference, I wasn't aware of this trigger. Sadly, it does not work for our use case, as that only suppresses an update if the heap-formatted rows are binary identical, which is not guaranteed even if when all values are equivalent; as it doesn't take detoasting into account. It also doesn't minimize the pressure on the TOAST table, which is something else we're trying to do with the new function. The issue is that when you SET a column with a user-provided value, during trigger handling, HOT checking, and TOASTing, the binary representation of that user-provided value is the untoasted version (as it has not yet been inserted into any toast table and isn't represented as varatt_external), while the original row's value may be a toast pointer (represented as varatt_external). The checks in trigger handling, TOASTing, and HOT checking, the old tuple's value for that column (in its varatt_external representation) is compared against the new value (as normal varattrib_4b.va_4byte or varattrib_1b), and those will never be binary equal - their first byte is guaranteed to be different. Only if the value is pulled directly from the original column will the original column's TOAST pointer be used, and can a new toast table insertion be skipped (after which suppress_redundant_updates_trigger with its in-heap-row compare option might become useful). But, lacking a system that checks checks whether toasted values actually changed (and thus whether HOT applies, and whether an update has to happen), that trigger isn't up to the task at hand. Kind regards, Matthias van de Meent Databricks (https://www.databricks.com) ^ permalink raw reply [nested|flat] 11+ messages in thread
* Re: SQL-level pg_datum_image_equal @ 2025-12-22 15:25 Matthias van de Meent <[email protected]> parent: jian he <[email protected]> 0 siblings, 1 reply; 11+ messages in thread From: Matthias van de Meent @ 2025-12-22 15:25 UTC (permalink / raw) To: jian he <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]> On Sat, 20 Dec 2025 at 14:15, jian he <[email protected]> wrote: > > On Thu, Dec 11, 2025 at 1:46 AM Matthias van de Meent > <[email protected]> wrote: > > > > Hi, > > > > > > > > So, attached is a simple and to-the-point patch that adds the function > > mentioned in $subject, which will tell the user whether two values of > > the same type have an exactly equal binary representation, using > > datum_image_eq. > > > > hi. > > maybe Table 9.76 > (https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INFO-CATALOG) > is the right place for this function. I think table 9.3 (https://www.postgresql.org/docs/18/functions-comparison.html#FUNCTIONS-COMPARISON-FUNC-TABLE) makes more sense, as this is more a compare function than one that exposes catalog information about the input. > corner case confused me, I think this is related to null handling, > maybe not related to this. > create type t1 as (a int, b text); > select pg_datum_image_equal('(,)'::t1, $$(,)$$::t1); > select pg_datum_image_equal('(,)'::t1, NULL::t1); > select '(,)'::t1 is null, NULL::t1 is null; Yes, that's row-type NULL handling for you. '(,)' is a composite value with only NULL values in the attributes, and SQL defines that rows with only NULL columns must return True when `IS NULL` evaluates their NULL-ness. On disk, however, it is still stored as a "composite type; attributes 'a' and 'b' are NULL"; so that a user that casts the value to text will get a different result between (NULL::t1::text) and ('(,)'::t1::text), allowing safe round-trip conversions. Also note that `('(,)'::t1 IS DISTINCT FROM NULL::t1) = TRUE, another curious consequence of this SQL rule. So, that output is expected; some methods already expose these differences between the values, so pg_datum_image_equal() *must* also indicate they are different. And now we also have one more reason to have a function that can notice distinctions that go deeper than surface-level SQL. Aside: This new function doesn't actually fully cover the spectrum of possible inequalities detectable through SQL, as there are some very low level datum introspection tools like pg_column_size() whose output depends on the type of toasting applied. My function cover that, because that data should be completely irrelevant to normal data usage, and the user can combine this manually if they really need it. > enforce_generic_type_consistency already resolved generic type. While you are correct to point out that the type system would prevent this from getting called from SQL without a proper type, I'd like to keep the check to make sure that callers from outside the type system don't accidentally fail to provide the function with a correct type. > so > + if (!OidIsValid(typ)) > + { > + ereport(ERROR, > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("could not determine type"))); > + } > this part should be elog(ERROR.....) ? Is there a policy on what should _not_ use ereport? I know we don't require ereport for internal errors, but is considered forbidden? Kind regards, Matthias van de Meent Databricks (https://www.databricks.com) ^ permalink raw reply [nested|flat] 11+ messages in thread
* Re: SQL-level pg_datum_image_equal @ 2026-03-25 14:46 Matthias van de Meent <[email protected]> parent: Matthias van de Meent <[email protected]> 0 siblings, 1 reply; 11+ messages in thread From: Matthias van de Meent @ 2026-03-25 14:46 UTC (permalink / raw) To: ; +Cc: jian he <[email protected]>; PostgreSQL Hackers <[email protected]> On Mon, 22 Dec 2025 at 16:25, Matthias van de Meent <[email protected]> wrote: > > On Sat, 20 Dec 2025 at 14:15, jian he <[email protected]> wrote: > > maybe Table 9.76 > > (https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INFO-CATALOG) > > is the right place for this function. > > I think table 9.3 > (https://www.postgresql.org/docs/18/functions-comparison.html#FUNCTIONS-COMPARISON-FUNC-TABLE) > makes more sense, as this is more a compare function than one that > exposes catalog information about the input. Attached is v2, which adds the new function to the docs, in addition to rebasing the patch onto master. Kind regards, Matthias van de Meent Databricks (https://www.databricks.com) Attachments: [application/octet-stream] v2-0001-Add-SQL-level-datum-equality-tests.patch (4.6K, 2-v2-0001-Add-SQL-level-datum-equality-tests.patch) download | inline diff: From febf8d64154bdd40997398fb0ff79c27b726fc72 Mon Sep 17 00:00:00 2001 From: Matthias van de Meent <[email protected]> Date: Wed, 10 Dec 2025 18:45:19 +0100 Subject: [PATCH v2] Add SQL-level datum equality tests This enables improved performance for users that need to test for exact bytewise differences in SQL; e.g. to see if an UPDATE is really necessary with a set of externally provided values. A workaround around this limitation was often possible, but no generic method was available that was this performant, or worked for all types. --- doc/src/sgml/func/func-comparison.sgml | 21 ++++++++++ src/backend/utils/adt/pseudotypes.c | 51 ++++++++++++++++++++++++ src/include/catalog/pg_proc.dat | 7 ++++ src/test/regress/expected/opr_sanity.out | 1 + 4 files changed, 80 insertions(+) diff --git a/doc/src/sgml/func/func-comparison.sgml b/doc/src/sgml/func/func-comparison.sgml index ecb1d89463a..2f970fecda3 100644 --- a/doc/src/sgml/func/func-comparison.sgml +++ b/doc/src/sgml/func/func-comparison.sgml @@ -653,6 +653,27 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in <returnvalue>1</returnvalue> </para></entry> </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>pg_datum_image_equal</primary> + </indexterm> + <function>pg_datum_image_equal</function> ( <type>anyelement</type>, <type>anyelement</type> ) + <returnvalue>boolean</returnvalue> + </para> + <para> + Returns whether the values have the same exact binary representation. + </para> + <para> + <literal>pg_datum_image_equal('1.0'::numeric, '1.0'::numeric)</literal> + <returnvalue>true</returnvalue> + </para> + <para> + <literal>pg_datum_image_equal('1.0'::numeric, '1.00'::numeric)</literal> + <returnvalue>false</returnvalue> + </para></entry> + </row> </tbody> </tgroup> </table> diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c index 4581c4b1697..93a67e53026 100644 --- a/src/backend/utils/adt/pseudotypes.c +++ b/src/backend/utils/adt/pseudotypes.c @@ -23,7 +23,9 @@ #include "postgres.h" #include "libpq/pqformat.h" +#include "utils/datum.h" #include "utils/fmgrprotos.h" +#include "utils/lsyscache.h" /* @@ -375,3 +377,52 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement); PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray); PSEUDOTYPE_DUMMY_IO_FUNCS(anycompatible); PSEUDOTYPE_DUMMY_IO_FUNCS(anycompatiblenonarray); + +/* + * Compares two datums of the same (any) type, and returns whether they have + * the same binary representation. + */ +Datum +pg_datum_image_equal(PG_FUNCTION_ARGS) +{ + bool eq; + + if (PG_ARGISNULL(0) != PG_ARGISNULL(1)) + { + eq = false; + } + else if (PG_ARGISNULL(0)) + { + /* both NULL */ + eq = true; + } + else + { + Oid typ; + Datum arg0; + Datum arg1; + bool typbyval; + char typalign; + int16 typlen; + + typ = get_fn_expr_argtype(fcinfo->flinfo, 0); + + if (!OidIsValid(typ)) + { + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("could not determine type"))); + } + + Assert(typ == get_fn_expr_argtype(fcinfo->flinfo, 1)); + + arg0 = PG_GETARG_DATUM(0); + arg1 = PG_GETARG_DATUM(1); + + get_typlenbyvalalign(typ, &typlen, &typbyval, &typalign); + + eq = datum_image_eq(arg0, arg1, typbyval, typlen); + } + + PG_RETURN_BOOL(eq); +} diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 0118e970dda..ea59ede9660 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -12851,4 +12851,11 @@ proname => 'hashoid8extended', prorettype => 'int8', proargtypes => 'oid8 int8', prosrc => 'hashoid8extended' }, +{ oid => '9200', + descr => 'test if two values have the same binary representation', + proname => 'pg_datum_image_equal', proisstrict => 'f', + proleakproof => 't', prorettype => 'bool', + proargtypes => 'anyelement anyelement', + prosrc => 'pg_datum_image_equal' }, + ] diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out index 6ff4d7ee901..df25551c48d 100644 --- a/src/test/regress/expected/opr_sanity.out +++ b/src/test/regress/expected/opr_sanity.out @@ -887,6 +887,7 @@ oid8le(oid8,oid8) oid8gt(oid8,oid8) oid8ge(oid8,oid8) btoid8cmp(oid8,oid8) +pg_datum_image_equal(anyelement,anyelement) -- Check that functions without argument are not marked as leakproof. SELECT p1.oid::regprocedure FROM pg_proc p1 JOIN pg_namespace pn -- 2.50.1 (Apple Git-155) ^ permalink raw reply [nested|flat] 11+ messages in thread
* Re: SQL-level pg_datum_image_equal @ 2026-03-25 20:40 David Rowley <[email protected]> parent: Matthias van de Meent <[email protected]> 0 siblings, 1 reply; 11+ messages in thread From: David Rowley @ 2026-03-25 20:40 UTC (permalink / raw) To: Matthias van de Meent <[email protected]>; +Cc: jian he <[email protected]>; PostgreSQL Hackers <[email protected]> On Thu, 26 Mar 2026 at 03:46, Matthias van de Meent <[email protected]> wrote: > Attached is v2, which adds the new function to the docs, in addition > to rebasing the patch onto master. In [1] I was looking into reported bugs with the datum_image code not behaving correctly. It turns out this is due to hash_numeric() using the wrong PG_RETURN_* macro, resulting in it not correctly sign-extending negative 32-bit ints. I did some analysis to see if there were any user-visible changes from fixing that in the back branches. I didn't find any. What you're adding here would add one. I think we might need to become more strict about using the correct return macro before we expose this stuff, else it's going to be a rather scary process to fix bugs when it could affect the datum_image_equal result. The thing about Memoize slowly finding all the bugs in the datum_image_* code is that we're free to fix these bugs, as the use case (the Memoize hash table) only lasts as long as the query. If we expose all of this to users, then they could persist things in tables. 6911f8037 fixed another problem that resulted in the datum_image code not doing the right thing. I also proposed a hack in [2] to fix the sign-extension problem. I had hoped that might be temporary, but if we were to expose this function at the SQL level, then I doubt we'd be brave enough to remove it. Consider the results of the following 2 queries, which only differ in that one uses a materialized CTE and the other does not. with cte(hash) as materialized (select hash_numeric(n) from (values('1234.124'::numeric),('1234.124'::numeric)) n(n)) select * from cte where pg_datum_image_equal(hash, hash_numeric('1234.124'::numeric)); -- wrong results with cte(hash) as (select hash_numeric(n) from (values('1234.124'::numeric),('1234.124'::numeric)) n(n)) select * from cte where pg_datum_image_equal(hash, hash_numeric('1234.124'::numeric)); hash ------ (0 rows) hash ------------ -612512148 -612512148 (2 rows) This would work correctly if I pushed the patch in [2]. But as of yet, I'm not sure about it. I at least think we should delay doing this until we've come up with a fix that we're confident in. David [1] https://postgr.es/m/CAApHDvrRR1VOk4i4CpNWeL48veFshfRAvDTuWxsiUhUqo0akwQ%40mail.gmail.com [2] https://postgr.es/m/CAApHDvreF-UiqBaHtRTQWQ6z1X9snstJW%2Bdfb2DU5GOb-uPEBg%40mail.gmail.com ^ permalink raw reply [nested|flat] 11+ messages in thread
* Re: SQL-level pg_datum_image_equal @ 2026-03-25 21:25 Matthias van de Meent <[email protected]> parent: David Rowley <[email protected]> 0 siblings, 1 reply; 11+ messages in thread From: Matthias van de Meent @ 2026-03-25 21:25 UTC (permalink / raw) To: David Rowley <[email protected]>; +Cc: jian he <[email protected]>; PostgreSQL Hackers <[email protected]> On Wed, 25 Mar 2026 at 21:40, David Rowley <[email protected]> wrote: > > On Thu, 26 Mar 2026 at 03:46, Matthias van de Meent > <[email protected]> wrote: > > Attached is v2, which adds the new function to the docs, in addition > > to rebasing the patch onto master. > > In [1] I was looking into reported bugs with the datum_image code not > behaving correctly. It turns out this is due to hash_numeric() using > the wrong PG_RETURN_* macro, resulting in it not correctly > sign-extending negative 32-bit ints. Thank you for refering me to this. > I did some analysis to see if > there were any user-visible changes from fixing that in the back > branches. I didn't find any. What you're adding here would add one. I > think we might need to become more strict about using the correct > return macro before we expose this stuff, else it's going to be a > rather scary process to fix bugs when it could affect the > datum_image_equal result. I'm not sure we can reliably enforce that sign extension is always going to be consistent for the same type; specifically looking at unsigned integers as example. PG's tuple deserialization would see 4 byte alignment and sign-extend it as if it were a signed integer, but returned values would prefer non- sign-extended values for more efficient casts from larger unsigned types. > The thing about Memoize slowly finding all the bugs in the > datum_image_* code is that we're free to fix these bugs, as the use > case (the Memoize hash table) only lasts as long as the query. If we > expose all of this to users, then they could persist things in tables. > 6911f8037 fixed another problem that resulted in the datum_image code > not doing the right thing. I'm happy to mark this function as STABLE for now (to prevent its inclusion in permanent storage), and/or to adjust the code to adjust for subnormal inputs (values with incorrect/inconsistent/unexpected sign extensions). > I also proposed a hack in [2] to fix the sign-extension problem. I had > hoped that might be temporary, but if we were to expose this function > at the SQL level, then I doubt we'd be brave enough to remove it. pg_datum_image_equal is supposed to only care about the bytes of the datum that contain the actual value, so I can surely update it to do that. I'll be happy to only apply that part to the code inside pg_datum_image_equal; that should remove the part that exposes the issue directly to users. > This would work correctly if I pushed the patch in [2]. But as of yet, > I'm not sure about it. I at least think we should delay doing this > until we've come up with a fix that we're confident in. I'm happy with any of the options: Having your patch at [2] get committed, me applying [2]'s changes in pg_datum_image_equal, or marking the function STABLE (and so, disallowing using its output in permanent storage). Note that false negatives are unlikely to be a problem, whilst false positives might have issues. Sign extension issues here would create false negatives, so likely wouldn't be a huge problem. Kind regards, Matthias van de Meent Databricks (https://www.databricks.com) PS. isn't this an issue with OIDs getting read from disk vs copied around in memory, too? Oids are unsigned, whilst the deserialization path assumes signed, so I'd expect that to have different outputs when the from-disk OID you're checking has its not-sign bit 31 set? > [2] https://postgr.es/m/CAApHDvreF-UiqBaHtRTQWQ6z1X9snstJW%2Bdfb2DU5GOb-uPEBg%40mail.gmail.com ^ permalink raw reply [nested|flat] 11+ messages in thread
* Re: SQL-level pg_datum_image_equal @ 2026-03-25 21:51 David Rowley <[email protected]> parent: Matthias van de Meent <[email protected]> 0 siblings, 1 reply; 11+ messages in thread From: David Rowley @ 2026-03-25 21:51 UTC (permalink / raw) To: Matthias van de Meent <[email protected]>; +Cc: jian he <[email protected]>; PostgreSQL Hackers <[email protected]> On Thu, 26 Mar 2026 at 10:25, Matthias van de Meent <[email protected]> wrote: > I'm happy to mark this function as STABLE for now (to prevent its > inclusion in permanent storage), and/or to adjust the code to adjust > for subnormal inputs (values with incorrect/inconsistent/unexpected > sign extensions). You lost me at this part. How does marking the function as STABLE prevent users from persisting things on disk based on the return value of the function? I expected the primary use case for this would be in trigger functions that make decisions about data that goes into tables. David ^ permalink raw reply [nested|flat] 11+ messages in thread
* Re: SQL-level pg_datum_image_equal @ 2026-03-25 22:42 Matthias van de Meent <[email protected]> parent: David Rowley <[email protected]> 0 siblings, 1 reply; 11+ messages in thread From: Matthias van de Meent @ 2026-03-25 22:42 UTC (permalink / raw) To: David Rowley <[email protected]>; +Cc: jian he <[email protected]>; PostgreSQL Hackers <[email protected]> On Wed, 25 Mar 2026 at 22:51, David Rowley <[email protected]> wrote: > > On Thu, 26 Mar 2026 at 10:25, Matthias van de Meent > <[email protected]> wrote: > > I'm happy to mark this function as STABLE for now (to prevent its > > inclusion in permanent storage), and/or to adjust the code to adjust > > for subnormal inputs (values with incorrect/inconsistent/unexpected > > sign extensions). > > You lost me at this part. How does marking the function as STABLE > prevent users from persisting things on disk based on the return value > of the function? I expected the primary use case for this would be in > trigger functions that make decisions about data that goes into > tables. Indexes and stored generated columns' expression may only contain IMMUTABLE functions, so that they don't change output when the inputs values are unchanged. As the current datum_image_equal depends on the volatile contents/definition of sign-extended bytes (which we clearly don't have a defined/expected value for) that makes the output of this function not immutable for the "same" input values. Marking it as STABLE would therefore prevent its values from being stored inside PostgreSQL's definitions and storing the wrong value (or making a decision based on the wrong value, in case of partial indexes). That wouldn't solve the issue when used in a trigger function, indeed; which is why the second part shows that pg_datum_image_equal could itself check and adjust byval types to have consistent sign-extended bytes before passing them on to datum_image_equal, so that it won't be sensitive to the issue discussed in the memoization thread. Kind regards, Matthias van de Meent ^ permalink raw reply [nested|flat] 11+ messages in thread
* Re: SQL-level pg_datum_image_equal @ 2026-03-27 14:51 Matthias van de Meent <[email protected]> parent: Matthias van de Meent <[email protected]> 0 siblings, 0 replies; 11+ messages in thread From: Matthias van de Meent @ 2026-03-27 14:51 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: David Rowley <[email protected]>; jian he <[email protected]>; PostgreSQL Hackers <[email protected]> On Thu, 26 Mar 2026 at 18:17, Tom Lane <[email protected]> wrote: > > Matthias van de Meent <[email protected]> writes: > > On Wed, 25 Mar 2026 at 22:51, David Rowley <[email protected]> wrote: > >> You lost me at this part. How does marking the function as STABLE > >> prevent users from persisting things on disk based on the return value > >> of the function? I expected the primary use case for this would be in > >> trigger functions that make decisions about data that goes into > >> tables. > > > Indexes and stored generated columns' expression may only contain > > IMMUTABLE functions, so that they don't change output when the inputs > > values are unchanged. As the current datum_image_equal depends on the > > volatile contents/definition of sign-extended bytes (which we clearly > > don't have a defined/expected value for) that makes the output of this > > function not immutable for the "same" input values. > > This seems to me to be a rather creative misinterpretation of what > STABLE and IMMUTABLE mean. If you want to claim that IMMUTABLE means > that, then the function isn't STABLE either, since it could give > different results for the "same" input values within one query. > Moreover, switching from IMMUTABLE to STABLE wouldn't fix the > problem of users assuming more than they should. Yeah, that's fair. > The actual problem here is that datum_image_eq is assuming more > than it should about the contents of a pass-by-value Datum. > That was okay for its original use-cases because a false not-equal > report would just end in not applying some optimization. But > Memoize thinks that the answers are exact, and users would too > if we expose the function at SQL level. > > I think what David proposed at > <CAApHDvreF-UiqBaHtRTQWQ6z1X9snstJW+dfb2DU5GOb-uPEBg@mail.gmail.com> > is not a hack, but in fact correcting datum_image_eq/datum_image_hash > to not assume that unspecified bits are reliably the same. Agreed. Kind regards, Matthias van de Meent Databricks (https://www.databricks.com) ^ permalink raw reply [nested|flat] 11+ messages in thread
end of thread, other threads:[~2026-03-27 14:51 UTC | newest] Thread overview: 11+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2025-12-10 17:46 SQL-level pg_datum_image_equal Matthias van de Meent <[email protected]> 2025-12-20 13:14 ` jian he <[email protected]> 2025-12-22 15:25 ` Matthias van de Meent <[email protected]> 2026-03-25 14:46 ` Matthias van de Meent <[email protected]> 2026-03-25 20:40 ` David Rowley <[email protected]> 2026-03-25 21:25 ` Matthias van de Meent <[email protected]> 2026-03-25 21:51 ` David Rowley <[email protected]> 2026-03-25 22:42 ` Matthias van de Meent <[email protected]> 2026-03-27 14:51 ` Matthias van de Meent <[email protected]> 2025-12-20 16:07 ` Corey Huinker <[email protected]> 2025-12-21 19:26 ` Matthias van de Meent <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox