Date: Fri, 13 Feb 2026 09:27:02 -0800
From: Noah Misch
To: ranvis@gmail.com, pgsql-bugs@lists.postgresql.org
Cc: thomas.munro@gmail.com
Subject: Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16
Message-ID: <20260213172702.71@rfd.leadboat.com>
References: <19406-9867fddddd724fca@postgresql.org>
In-Reply-To: <19406-9867fddddd724fca@postgresql.org>

On Fri, Feb 13, 2026 at 07:46:22AM +0000, PG Bug reporting form wrote:
> After upgrading from PostgreSQL 15.15 to 15.16, substring(text) raises:
>
> ERROR: invalid byte sequence for encoding "UTF8": 0xe6 0x97
> on valid UTF-8 text stored in a TOAST-compressed column.

> user=> select substring(data from 1 for 1) from toast_repro;
> ERROR: 22021: invalid byte sequence for encoding "UTF8": 0xe6 0x97

Thanks for the report.  That is a bug and a regression; I regret missing it
during review.  The substring operation works by taking a 4-byte slice from
the toasted value (4 bytes being the max length of a UTF8 char in
PostgreSQL), then finding the actual first character within those bytes.
However, it incorrectly requires those four bytes to be a valid UTF8 string,
which fails whenever the slice ends partway through a later multibyte
character.  I'll start on a fix.