Subject: Re: Can we get sha* function over text, that could be used in index?
From: "Daniel Verite"
To: "Linus Heckemann"
Cc: "pgsql-general mailing list"
Date: Fri, 20 Feb 2026 22:40:37 +0100
Message-Id: <63a9ec31-8a5f-47c7-8a70-f1c57207ffd3@manitou-mail.org>

Linus Heckemann wrote:

> - there is a byte-array representation of text columns, which appears to
>   be independent of database encoding

Not sure what you're referring to. Both the on-disk and in-memory
representations of text/varchar are encoding-dependent.

> The obvious (to a naive user, like I was) approach, casting to bytea,
> has exceptionally surprising behaviour: for many text strings, it does
> exactly what the naive user might hope for, giving back the UTF-8
> representation. But multiple distinct text strings, like '\033' and
> '\x1b', convert to the same byte string! And text strings containing a
> backslash that doesn't fit the bytea hex format or the bytea escape
> format will fail to convert completely!

Yes. Forgetting or ignoring that backslashes are special in the input
text representation of bytea seems to be a common mistake.
It might not be obvious from reading the doc at [1], but we just need
to quote backslashes by doubling them.

AFAIK a working solution for the OP would be:

  sha256(replace(colname, '\', '\\')::bytea)

The result is encoding-dependent, but that does not matter in the
context of an expression index. If the database ever needs to change
its encoding, it will have to be recreated entirely anyway.

[1]
https://www.postgresql.org/docs/current/datatype-binary.html#DATATYPE-BINARY-BYTEA-ESCAPE-FORMAT

Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
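
For illustration, the expression above could be used in an index roughly
as follows. This is only a sketch: the table "docs", the column "body" and
the index name are made up for the example, and it assumes
standard_conforming_strings is on (the default), so that '\' inside a
string literal is an ordinary character.

```sql
-- Hypothetical table and column, for illustration only.
CREATE TABLE docs (body text);

-- Expression index on the SHA-256 of the text. Doubling the backslashes
-- makes the bytea input parser take every character literally.
CREATE INDEX docs_body_sha256_idx
    ON docs (sha256(replace(body, '\', '\\')::bytea));

-- Without the doubling, these two distinct strings give the same byte:
SELECT '\033'::text::bytea = '\x1b'::text::bytea;        -- true

-- With the doubling, they stay distinct:
SELECT replace('\033', '\', '\\')::bytea
     = replace('\x1b', '\', '\\')::bytea;                -- false
```

A query would need to repeat the same expression on the column, e.g.
WHERE sha256(replace(body, '\', '\\')::bytea) = ..., for the planner to
consider the index.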