From: Vladimir Valikaev <[email protected]>
To: [email protected]
Cc: Victor Sudakov <[email protected]>
Cc: Eugene Nasonkin <[email protected]>
Subject: BugReport: PostgreSQL 17.8. Processing UTF8 encoded strings
Date: Wed, 25 Feb 2026 11:50:44 +0500
Message-ID: <[email protected]>

Greetings,

After updating PostgreSQL from version 17.7 to 17.8, we encountered a 
problem when extracting a substring from a UTF8 encoded string:
ERROR:  invalid byte sequence for encoding "UTF8": 0xe2

_Server:_
Linux i-db-sandbox1.4vrs.com 6.1.0-43-cloud-amd64 #1 SMP PREEMPT_DYNAMIC 
Debian 6.1.162-1 (2026-02-08) x86_64 GNU/Linux

$ cat /etc/apt/sources.list.d/pgdg.list
deb http://apt.postgresql.org/pub/repos/apt/ bookworm-pgdg main
deb-src http://apt.postgresql.org/pub/repos/apt/ bookworm-pgdg main
deb https://apt-archive.postgresql.org/pub/repos/apt bookworm-pgdg-archive main

PostgreSQL 17.8 (Debian 17.8-1.pgdg12+1) on x86_64-pc-linux-gnu, 
compiled by gcc (Debian 12.2.0-14+deb12u1) 12.2.0, 64-bit

_Steps to reproduce (psql):_

db_fev:vladimir@i-db-sandbox1 => create table test123(id integer, m text);
CREATE TABLE

db_fev:vladimir@i-db-sandbox1 => insert into test123 (id,m) values (1, 
repeat('a', 1027)||E'\xe2\x80\x8d'||repeat('a', 1027));
INSERT 0 1

db_fev:vladimir@i-db-sandbox1 => select length(SUBSTRING(m from 1 for 
256)) from test123;
ERROR:  invalid byte sequence for encoding "UTF8": 0xe2

db_fev:vladimir@i-db-sandbox1 => select length(SUBSTRING(substring(m 
from 1 for length(m)) from 1 for 256)) from test123;
  length
--------
     256
(1 row)


_Database db_feb:_
    Name    | Encoding | Locale Provider | LC_COLLATE | LC_CTYPE | Locale | ICU Rules |
------------+----------+-----------------+------------+----------+--------+-----------+
 db_feb     | UTF8     | libc            | C          | C        | [NULL] | [NULL]    |

The problem does not appear on PostgreSQL 17.7. On 17.8 it also does 
not occur if the string is first fully loaded into memory, here by 
wrapping the column in an inner substring(m from 1 for length(m)) call:

db_feb:vladimir@i-db-sandbox1 =# select length(SUBSTRING(substring(m 
from 1 for length(m)) from 1 for 256)) from test123;
  length
--------
     256
(1 row)
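
For reference, the three bytes in the test value, 0xe2 0x80 0x8d, are the 
UTF-8 encoding of U+200D (ZERO WIDTH JOINER), so 0xe2 on its own is an 
incomplete sequence. The Python sketch below only illustrates that point. 
It is not PostgreSQL code, and the idea that some buffer ends right after 
the lead byte is an assumption made for illustration, not a diagnosis of 
the server's behaviour. The helper name is_valid_utf8 is hypothetical.

```python
# The three bytes inserted by the reproduction: E'\xe2\x80\x8d'.
zwj_bytes = b"\xe2\x80\x8d"

# Together they decode to one code point, U+200D (ZERO WIDTH JOINER).
zwj = zwj_bytes.decode("utf-8")

# Same shape as the test row: 1027 'a', the joiner, 1027 'a'.
value = "a" * 1027 + zwj + "a" * 1027
encoded = value.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Assumption for illustration: a buffer that stops right after the lead
# byte 0xe2, so the two continuation bytes are missing.
truncated = encoded[:1028]

print(len(value), len(encoded))      # characters vs. bytes
print(is_valid_utf8(encoded))        # complete value
print(is_valid_utf8(truncated))      # cut after 0xe2
print(len(value[:256]))              # first 256 characters are all 'a'
```

The first 256 characters lie entirely within the leading run of 'a', so 
the requested substring itself contains no multibyte character.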

This bug report has also been sent to [email protected]


-- 
Best Regards,
Vladimir Valikaev
Streamline - Property Management Software

