From: Zhongpu Chen
Date: Sat, 2 May 2026 10:39:26 +0800
Subject: Re: Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8
To: pgsql-hackers@lists.postgresql.org

The issue is not specific to E'\\x..' literals. A normal COPY FROM data file
with ENCODING 'EUC_CN' can create text rows that later cannot be retrieved
with SELECT.

This suggests that input validation for EUC_CN is only structural, while the
EUC_CN-to-UTF8 conversion table is stricter.

On Sat, May 2, 2026 at 10:31 AM Zhongpu Chen <chenloveit@gmail.com> wrote:

> See the related bug report
> https://www.postgresql.org/message-id/CA%2B1gyqL7uiQhfLcYWpHNUKQgHjQc7sOPthSTiaxLDZzcrGFYSg%40mail.gmail.com
>
> Currently PostgreSQL accepts structurally well-formed EUC_CN byte
> sequences such as 0xA2A3 into text columns. The value round-trips when
> client_encoding is EUC_CN, but fails when client_encoding is UTF8 because
> euc_cn_to_utf8 has no mapping.
>
> If this behavior is intentional for compatibility, the documentation
> should explicitly say that validation for some legacy encodings is
> byte-structure validation, not mapping-table validation.
> If it is not intentional, stricter validation could reject unassigned byte
> positions at input time.
>
> --
> Zhongpu Chen
>

-- 
Zhongpu Chen
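To make the distinction between byte-structure validation and mapping-table validation concrete, here is a minimal sketch in Python. It is not PostgreSQL code. The function names are hypothetical, the structural rule is an assumption (ASCII bytes, or two-byte sequences with both bytes in 0xA1..0xFE), and Python's gb2312 codec is only a stand-in for the euc_cn_to_utf8 conversion table, so its set of assigned positions may not match PostgreSQL's exactly.

```python
def is_structurally_valid_euc_cn(data: bytes) -> bool:
    """Byte-structure check only (simplified, assumed rule).

    Accepts ASCII bytes and two-byte sequences whose bytes are both in
    0xA1..0xFE. It does not ask whether the position is assigned.
    """
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            # Single-byte ASCII character.
            i += 1
            continue
        if i + 1 >= len(data):
            # Truncated two-byte sequence.
            return False
        trail = data[i + 1]
        if not (0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE):
            return False
        i += 2
    return True


def is_convertible_to_utf8(data: bytes) -> bool:
    """Mapping-table check, using Python's gb2312 codec as a stand-in table."""
    try:
        data.decode("gb2312")
    except UnicodeDecodeError:
        return False
    return True


def classify(data: bytes) -> str:
    """Report which of the two checks a byte string passes."""
    if not is_structurally_valid_euc_cn(data):
        return "rejected"
    if not is_convertible_to_utf8(data):
        return "accepted but unconvertible"
    return "accepted and convertible"


if __name__ == "__main__":
    for sample in (b"\xd6\xd0\xce\xc4", b"\xa2\xa3", b"\xa2"):
        print(sample.hex(), "->", classify(sample))
```

Under these assumptions, a value such as 0xA2A3 passes the structural check, and whether it is convertible depends entirely on the mapping table consulted. Stricter input validation would amount to running the second check at input time instead of at retrieval time.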