Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w5VjZ-003K62-2m for pgsql-hackers@arkaria.postgresql.org; Wed, 25 Mar 2026 21:25:49 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w5VjY-00GdPJ-1F for pgsql-hackers@arkaria.postgresql.org; Wed, 25 Mar 2026 21:25:48 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w5VjY-00GdPA-0C for pgsql-hackers@lists.postgresql.org; Wed, 25 Mar 2026 21:25:48 +0000 Received: from mail-lj1-x22b.google.com ([2a00:1450:4864:20::22b]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w5VjV-000000018d4-3Lr1 for pgsql-hackers@lists.postgresql.org; Wed, 25 Mar 2026 21:25:48 +0000 Received: by mail-lj1-x22b.google.com with SMTP id 38308e7fff4ca-38a76ba6a78so2443851fa.0 for ; Wed, 25 Mar 2026 14:25:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1774473945; cv=none; d=google.com; s=arc-20240605; b=EPypbb5Ry6aeN4R+mosljaCg6XKpcACyZWV4c5XReuLyzG5hpsEULo8kxMjv4IBkm/ 28rERg6nRbyfzZth0qDB6nW0kb2HTNk7Mx2bEj45QvzZXwbwVHxawgICVE1JMR0Gu8cY 6YE9+qeSuWTaEfUkqvDPveuKVUCbdISdJQSqs+MpimlHwOWg08IM1dl9bYAx1pg9yPhD 3ihR7fOBANnPHuyowb12cvZpd/fz02sUWJWcyVqdGhcEwtM324EyVIERkMh/G95NdwUc cmyGCFyueJWrppSOhRNfLQklXRnb+lVTJzv/mcR2y7xdhvNBr+Fm2dxMn8kjuFFQaqA6 WhzA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=RnxyXsYMp9EpDZQtf+wZ1eK9fXtw+PG3iG4+FYSbvuI=; fh=rXH7eeurBNeKkq5SGs87Tcb+PhzOXh7amozxR0/Oy9w=; b=fbrRYtOqsJBwkiPpPMFs9QyuRf1BHmf2o8sHxKj0PVuhOG/XKkRMdncB37V0jURHxL m/dQ46TaMauX2rmJGJLTG74/Bvcv+lv7zWtinbh9qQBGCf3OLAqXUxyLPH7rznLCqhz0 qH51byX79l7QFrkka4m6BS0fPIRZkjwJwWBzcRgQ4mX5Pyc5RLRKk13GcVGv3R8hLQr1 nOci2MxORYuNM1KZKOhDIqmD66Kzj+XtEp/s0yaAU9sAYvPw0r3mtJTUfW3lKPwcYG+t nlI1PTGwf3jFywAXYESt2YzNUcI4T3AR5uZie2PjZLKQlSvBuPDqYuK7kjx3TT4x51nW XW5Q==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1774473945; x=1775078745; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=RnxyXsYMp9EpDZQtf+wZ1eK9fXtw+PG3iG4+FYSbvuI=; b=mPRwk6w2/8rXIPfpAygjjkdYSSlHnE9UHxAur8zFPT4cYXX+nZ6eKw2tU837Om6nD/ u7sD7+5OhQZVNjKNCW2kf3QJp6CGRV7/5cKkRtiRU7nfqBDMl/T4Hkuz4mSwe8z1Jl9U V12UubNYpIr+6Q1oExYlDwtOpJ9fefiZcNQamzKxvrwkPLUT1VGMo+W17GaXY6B8/nYW N2I/moHJtzebZBydLLYnp2M/dQ5rfCWWPocYQLWlicVSwI2e7Vox+OwE6WY0MaCdVSYb W6TxOSnU7QVS6HvY61TXOrPvltdC1ILwYoZl9FqM5oc14nXls3s+LdO0tE3U4PDa7u+D frWg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1774473945; x=1775078745; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=RnxyXsYMp9EpDZQtf+wZ1eK9fXtw+PG3iG4+FYSbvuI=; b=V2YosuJh4HtM6JQyvQzaO74WmtslEfinUsacygfP2QI3QR5uPFVCJTFJ+vt6rSbqyh OvTs/svYjn7y60m88e+aXkTOOt8vOE2v6fccX4sMzRrGvGdDNmdFKiqap0E+nQcm5orl p/CoB0m+pOV76U/1zIUL/q+FwcZgSRlybVD+CJFhzB/DWdRVWYyeRvJUUquhprk+J8Ff eL4j8a0//ZdAZahXT+xPLP3r81kLsTOTqetyso96h99EBEtLmtuQy3QFegwl86XgvAJ3 o23Nexy2GxMdtWwOcf5BbG6oErPRA/d/LRxHWUMs1pEC41NiPCsxPngh7FI8fmhNThru sshA== X-Forwarded-Encrypted: i=1; AJvYcCUAHnd7NC3cEqaYDenia2j40V5J8GNDBsmagaqfDX1O8hl2TfuoT2ZMijT3+44m6Nz5ygo38mVsn9TfHkEE@lists.postgresql.org X-Gm-Message-State: AOJu0Yzkq9I5okfJU2j801cINn1wyjhhMfQn0vC7+zCjjGdQ+lnf4qGX 5yjo1UNr9wbuYuf8Y6ZJ8L+e7wjCZhpmi1+zUqEmUFfnmaXbvwDLLYHmyQ9LIkXP33EnU+5Qqw7 byeW4gthPsIxJ2nWFB8uLJMtrwiuUxks= X-Gm-Gg: ATEYQzyQ6CjxPXyZjzq0e2/Ybh4vxa9ST4uOkl9PzyU+mfjSj1PppkMDLvEr27Lf5+O pDspyuQK5rfy1dHhPzM/4zQZrVqxkQltVWYTuV3v0P9yAf7JEYLCwQmPAAjkWl+BlVgQtLWJK1x 4O3VfrivBLk9NOJhVbPLAcigYjfTM+hUdFlmoz/SmFwqxSymMGtcTYhQOJxpPBBiGzeVQTanjqy i4jdCPdKPSjWmmhNVwoqOUjpFIu1YZqmuwFdqKkS0EyuKNy1yucazYi/GZcPrpPEQ9Yo2NLy16b WmBcuglxOVYrIe++w654CtY3QGhLpKTcuWLTbTTXa3Xhao+3MweHsNNHqL4eW9WuSFKBoRalGDa ExZN9 X-Received: by 2002:a05:651c:b14:b0:38c:45f1:1eb7 with SMTP id 38308e7fff4ca-38c45f120b2mr15216751fa.8.1774473944933; Wed, 25 Mar 2026 14:25:44 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Matthias van de Meent Date: Wed, 25 Mar 2026 22:25:33 +0100 X-Gm-Features: AQROBzA6SPGNi0U5f7Q8X2l6Atc63WpOl1YYAneQ8u_zPd1uFMUt-0e4RduTCgw Message-ID: Subject: Re: SQL-level pg_datum_image_equal To: David Rowley Cc: jian he , PostgreSQL Hackers Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Wed, 25 Mar 2026 at 21:40, David Rowley wrote: > > On Thu, 26 Mar 2026 at 03:46, Matthias van de Meent > wrote: > > Attached is v2, which adds the new function to the docs, in addition > > to rebasing the patch onto master. > > In [1] I was looking into reported bugs with the datum_image code not > behaving correctly. It turns out this is due to hash_numeric() using > the wrong PG_RETURN_* macro, resulting in it not correctly > sign-extending negative 32-bit ints. Thank you for refering me to this. > I did some analysis to see if > there were any user-visible changes from fixing that in the back > branches. I didn't find any. What you're adding here would add one. I > think we might need to become more strict about using the correct > return macro before we expose this stuff, else it's going to be a > rather scary process to fix bugs when it could affect the > datum_image_equal result. I'm not sure we can reliably enforce that sign extension is always going to be consistent for the same type; specifically looking at unsigned integers as example. PG's tuple deserialization would see 4 byte alignment and sign-extend it as if it were a signed integer, but returned values would prefer non- sign-extended values for more efficient casts from larger unsigned types. > The thing about Memoize slowly finding all the bugs in the > datum_image_* code is that we're free to fix these bugs, as the use > case (the Memoize hash table) only lasts as long as the query. If we > expose all of this to users, then they could persist things in tables. > 6911f8037 fixed another problem that resulted in the datum_image code > not doing the right thing. I'm happy to mark this function as STABLE for now (to prevent its inclusion in permanent storage), and/or to adjust the code to adjust for subnormal inputs (values with incorrect/inconsistent/unexpected sign extensions). > I also proposed a hack in [2] to fix the sign-extension problem. I had > hoped that might be temporary, but if we were to expose this function > at the SQL level, then I doubt we'd be brave enough to remove it. pg_datum_image_equal is supposed to only care about the bytes of the datum that contain the actual value, so I can surely update it to do that. I'll be happy to only apply that part to the code inside pg_datum_image_equal; that should remove the part that exposes the issue directly to users. > This would work correctly if I pushed the patch in [2]. But as of yet, > I'm not sure about it. I at least think we should delay doing this > until we've come up with a fix that we're confident in. I'm happy with any of the options: Having your patch at [2] get committed, me applying [2]'s changes in pg_datum_image_equal, or marking the function STABLE (and so, disallowing using its output in permanent storage). Note that false negatives are unlikely to be a problem, whilst false positives might have issues. Sign extension issues here would create false negatives, so likely wouldn't be a huge problem. Kind regards, Matthias van de Meent Databricks (https://www.databricks.com) PS. isn't this an issue with OIDs getting read from disk vs copied around in memory, too? Oids are unsigned, whilst the deserialization path assumes signed, so I'd expect that to have different outputs when the from-disk OID you're checking has its not-sign bit 31 set? > [2] https://postgr.es/m/CAApHDvreF-UiqBaHtRTQWQ6z1X9snstJW%2Bdfb2DU5GOb-uPEBg%40mail.gmail.com