public inbox for [email protected]  
help / color / mirror / Atom feed
From: Noah Misch <[email protected]>
To: [email protected]
Subject: pgsql: Fix SUBSTRING() for toasted multibyte characters.
Date: Sat, 14 Feb 2026 20:17:29 +0000
Message-ID: <[email protected]> (raw)

Fix SUBSTRING() for toasted multibyte characters.

Commit 1e7fe06c10c0a8da9dd6261a6be8d405dc17c728 changed
pg_mbstrlen_with_len() to ereport(ERROR) if the input ends in an
incomplete character.  Most callers want that.  text_substring() does
not.  It detoasts the most bytes it could possibly need to get the
requested number of characters.  For example, to extract up to 2 chars
from UTF8, it needs to detoast 8 bytes.  In a string of 3-byte UTF8
chars, 8 bytes spans 2 complete chars and 1 partial char.

Fix this by replacing this pg_mbstrlen_with_len() call with a string
traversal that differs by stopping upon finding as many chars as the
substring could need.  This also makes SUBSTRING() stop raising an
encoding error if the incomplete char is past the end of the substring.
This is consistent with the general philosophy of the above commit,
which was to raise errors on a just-in-time basis.  Before the above
commit, SUBSTRING() never raised an encoding error.

SUBSTRING() has long been detoasting enough for one more char than
needed, because it did not distinguish exclusive and inclusive end
position.  For avoidance of doubt, stop detoasting extra.

Back-patch to v14, like the above commit.  For applications using
SUBSTRING() on non-ASCII column values, consider applying this to your
copy of any of the February 12, 2026 releases.

Reported-by: SATŌ Kentarō <[email protected]>
Reviewed-by: Thomas Munro <[email protected]>
Bug: #19406
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 14

Branch
------
REL_14_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/14b1fd6176cb9353846deff607e5dfad09eb5b23

Modified Files
--------------
src/backend/utils/adt/varlena.c         | 62 ++++++++++++++++++++++++++-------
src/test/regress/input/encoding.source  | 19 +++++++++-
src/test/regress/output/encoding.source | 44 ++++++++++++++++++++++-
3 files changed, 111 insertions(+), 14 deletions(-)



view thread (6+ messages)

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: pgsql: Fix SUBSTRING() for toasted multibyte characters.
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox