From: Zhongpu Chen
Date: Sat, 2 May 2026 10:31:12 +0800
Subject: Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8
To: pgsql-hackers@lists.postgresql.org

See the related bug report:
https://www.postgresql.org/message-id/CA%2B1gyqL7uiQhfLcYWpHNUKQgHjQc7sOPthSTiaxLDZzcrGFYSg%40mail.gmail.com

Currently PostgreSQL accepts structurally well-formed EUC_CN byte sequences such as 0xA2A3 into text columns. The value round-trips when client_encoding is EUC_CN, but retrieving it fails when client_encoding is UTF8 because euc_cn_to_utf8 has no mapping for it.

If this behavior is intentional for compatibility, the documentation should explicitly say that validation for some legacy encodings is byte-structure validation, not mapping-table validation.

If it is not intentional, stricter validation could reject unassigned byte positions at input time.
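To make the distinction between the two kinds of validation concrete, here is a minimal Python sketch. It is not PostgreSQL code: the structural rule below (ASCII bytes stand alone, every other character is a pair of bytes in 0xA1..0xFE) is a simplified assumption about what "structurally well-formed" means for EUC_CN, and Python's "gb2312" codec is used only as a stand-in for a conversion mapping table. Both function names are hypothetical.

```python
# Illustrative sketch only; this is not PostgreSQL's implementation.


def is_structurally_valid_euc_cn(data: bytes) -> bool:
    """Byte-structure check (simplified assumption): ASCII bytes stand
    alone; any other character must be two bytes, both in 0xA1..0xFE."""
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            i += 1
            continue
        if i + 1 >= len(data):
            return False  # truncated two-byte character
        lead, trail = data[i], data[i + 1]
        if not (0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE):
            return False
        i += 2
    return True


def has_unicode_mapping(data: bytes) -> bool:
    """Mapping-table check: does every character convert to Unicode?
    Python's "gb2312" codec stands in for a conversion table here."""
    try:
        data.decode("gb2312")
    except UnicodeDecodeError:
        return False
    return True


# 0xB0A1 is an assigned GB 2312 character; 0xA2A3 is the sequence
# from the report above.
for sample in (b"\xb0\xa1", b"\xa2\xa3"):
    print(
        sample.hex(),
        "structure ok:", is_structurally_valid_euc_cn(sample),
        "mapping ok:", has_unicode_mapping(sample),
    )
```

A value that passes the first check but not the second is exactly the case described above: it can be stored and returned unchanged in its own encoding, but cannot be converted to UTF8.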
--
Zhongpu Chen