From: "David G. Johnston"
Date: Fri, 1 May 2026 20:28:31 -0700
Subject: Re: Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8
To: Zhongpu Chen
Cc: "pgsql-hackers@lists.postgresql.org"

On Friday, May 1, 2026, Zhongpu Chen wrote:

> The issue is not specific to E'\\x..' literals. A normal COPY FROM data
> file with ENCODING 'EUC_CN' can create text rows that later cannot be
> retrieved with SELECT.
>
> This suggests that input validation for EUC_CN is only structural, while
> the EUC_CN-to-UTF8 conversion table is stricter.
>

I suspect the cause is a lack of desire to maintain per-value verification, or to accept the runtime cost of doing so. The validation is indeed only structural, and this point should probably be documented better. But it’s hard to feel too bad when the input claims to be verifiable EUC_CN data and then supplies byte sequences that have no meaning in reality. We are happy to just store your data and return it to you as-is, but it’s unreasonable to then ask for it to be converted.

It would be nice for the database to provide an extra layer of protection, so I’m not against the idea, whether that happens automatically or at least through a function that could, say, be called in a trigger for opt-in. But it definitely feels like a problematic benefit-to-cost proposition.

David J.
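
To make the structural-versus-convertible distinction concrete, here is a rough Python sketch. It is not PostgreSQL's verifier: the structural rule (a byte with the high bit set must be followed by a trail byte in 0xA1..0xFE) is a simplified assumption, both function names are hypothetical, and Python's "gb2312" codec merely stands in for a conversion table, so its mappings may differ from PostgreSQL's EUC_CN-to-UTF8 table.

```python
def is_structurally_valid_euc_cn(data: bytes) -> bool:
    """Shape-only check (simplified assumption, not PostgreSQL's code).

    ASCII bytes stand alone; a byte with the high bit set must be
    followed by a trail byte in the range 0xA1..0xFE.
    """
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            i += 1
            continue
        if i + 1 >= len(data) or not (0xA1 <= data[i + 1] <= 0xFE):
            return False
        i += 2
    return True


def is_convertible_to_utf8(data: bytes) -> bool:
    """Stricter check: every character must exist in the mapping table.

    Python's "gb2312" codec is used as a stand-in for a conversion table.
    """
    try:
        data.decode("gb2312")
    except UnicodeDecodeError:
        return False
    return True


# Two assigned GB 2312 characters, encoded by the codec itself.
assigned = "\u4e2d\u6587".encode("gb2312")
# Row 10 of GB 2312 has no assigned characters, yet the bytes have a valid shape.
unassigned = b"\xaa\xa1"

for label, raw in (("assigned", assigned), ("unassigned", unassigned)):
    print(label, is_structurally_valid_euc_cn(raw), is_convertible_to_utf8(raw))
```

An opt-in check of the kind suggested above would amount to running the second, stricter test at input time, with the extra table lookup per character being the runtime cost in question.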