Message-ID: <9e005eef-a5dc-4ca3-8589-d7836c459e4d@4vrs.com>
Date: Wed, 25 Feb 2026 11:50:44 +0500
From: Vladimir Valikaev
Subject: BugReport: PostgreSQL 17.8. Processing UTF8 encoded strings
To: pgsql-bugs@lists.postgresql.org
Cc: Victor Sudakov, Eugene Nasonkin

Greetings,

After updating PostgreSQL from version 17.7 to 17.8, we get the following error when extracting a substring from a UTF8-encoded string:
ERROR:  invalid byte sequence for encoding "UTF8": 0xe2


Server:
Linux i-db-sandbox1.4vrs.com 6.1.0-43-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.162-1 (2026-02-08) x86_64 GNU/Linux

$ cat /etc/apt/sources.list.d/pgdg.list 
deb http://apt.postgresql.org/pub/repos/apt/ bookworm-pgdg main
deb-src http://apt.postgresql.org/pub/repos/apt/ bookworm-pgdg main
deb https://apt-archive.postgresql.org/pub/repos/apt bookworm-pgdg-archive main

PostgreSQL 17.8 (Debian 17.8-1.pgdg12+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14+deb12u1) 12.2.0, 64-bit

Steps to reproduce (psql):

db_fev:vladimir@i-db-sandbox1 => create table test123(id integer, m text);
CREATE TABLE

db_fev:vladimir@i-db-sandbox1 => insert into test123 (id,m) values (1, repeat('a', 1027)||E'\xe2\x80\x8d'||repeat('a', 1027));
INSERT 0 1

db_fev:vladimir@i-db-sandbox1 => select length(SUBSTRING(m from 1 for 256)) from test123;
ERROR:  invalid byte sequence for encoding "UTF8": 0xe2

db_fev:vladimir@i-db-sandbox1 => select length(SUBSTRING(substring(m from 1 for length(m)) from 1 for 256)) from test123;
 length 
--------
    256
(1 row)


Database db_feb:
    Name    | Encoding  | Locale Provider | LC_COLLATE | LC_CTYPE | Locale | ICU Rules |
------------+-----------+-----------------+------------+----------+--------+-----------+
 db_feb     | UTF8      | libc            | C          | C        | [NULL] | [NULL]    |

The problem does not appear on PostgreSQL 17.7. On 17.8 it also does not occur if the string is first fully loaded into memory, e.g. by wrapping the column in substring(m from 1 for length(m)):

db_feb:vladimir@i-db-sandbox1 =# select length(SUBSTRING(substring(m from 1 for length(m)) from 1 for 256)) from test123;
 length 
--------
    256
(1 row)
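
A side note on the test data, as a minimal sketch that is not part of the original session and makes no claim about what the server does internally: the bytes \xe2\x80\x8d in the INSERT are the UTF-8 encoding of U+200D (ZERO WIDTH JOINER), so the 0xe2 in the error message is the lead byte of that three-byte character. The Python snippet below rebuilds the same value and shows that a byte slice ending right after 0xe2 cannot be decoded, although the first 256 characters are plain ASCII. The cut positions 1028 and 1030 and the helper try_decode are chosen purely for illustration.

```python
# Rebuild the test value from the INSERT statement as raw bytes.
ZWJ = b"\xe2\x80\x8d"  # UTF-8 encoding of U+200D ZERO WIDTH JOINER
value = b"a" * 1027 + ZWJ + b"a" * 1027

text = value.decode("utf-8")
byte_len = len(value)  # total size in bytes
char_len = len(text)   # total size in characters

# The first 256 characters are plain ASCII, far from the multibyte character.
first_256 = text[:256]


def try_decode(chunk: bytes) -> str:
    """Return 'ok', or describe the first undecodable byte in the chunk."""
    try:
        chunk.decode("utf-8")
        return "ok"
    except UnicodeDecodeError as exc:
        return f"invalid byte 0x{chunk[exc.start]:02x} at offset {exc.start}"


# Hypothetical cut: 1028 bytes keeps the lead byte 0xe2 but drops 0x80 0x8d.
truncated = try_decode(value[:1028])
# Cutting after the complete three-byte sequence decodes without error.
complete = try_decode(value[:1030])

print(byte_len, char_len, len(first_256))
print(truncated)
print(complete)
```

The value is 2057 bytes but 2055 characters long, and the only byte prefix that fails here is one that splits the three-byte sequence.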


This bug report has also been sent to bugs@postgrespro.ru


-- 
Best Regards,
Vladimir Valikaev
Streamline - Property Management Software