From: Zhongpu Chen <[email protected]>
To: [email protected]
Subject: Re: Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8
Date: Sat, 2 May 2026 10:39:26 +0800
Message-ID: <CA+1gyq+LF_91g_i0WXeKK6JGF8viaqaF213S-9Arq=SG=4GAaA@mail.gmail.com> (raw)
In-Reply-To: <CA+1gyqJJJDhq=cc_D0ad59WH_OD2G_mN54xTru0KYoNaLkF48Q@mail.gmail.com>
References: <CA+1gyqJJJDhq=cc_D0ad59WH_OD2G_mN54xTru0KYoNaLkF48Q@mail.gmail.com>

The issue is not specific to E'\\x..' literals. A normal COPY FROM data
file with ENCODING 'EUC_CN' can create text rows that later cannot be
retrieved with SELECT.

This suggests that input validation for EUC_CN is only structural, while
the EUC_CN-to-UTF8 conversion table is stricter.


On Sat, May 2, 2026 at 10:31 AM Zhongpu Chen <[email protected]> wrote:

> See the related bug report
> https://www.postgresql.org/message-id/CA%2B1gyqL7uiQhfLcYWpHNUKQgHjQc7sOPthSTiaxLDZzcrGFYSg%40mail.g...
>
> Currently PostgreSQL accepts structurally well-formed EUC_CN byte
> sequences such as 0xA2A3 into text columns. The value round-trips when
> client_encoding is EUC_CN, but fails when client_encoding is UTF8 because
> euc_cn_to_utf8 has no mapping.
>
> If this behavior is intentional for compatibility, the documentation
> should explicitly say that validation for some legacy encodings is
> byte-structure validation, not mapping-table validation.
> If it is not intentional, stricter validation could reject unassigned byte
> positions at input time.
>
> --
> Zhongpu Chen
>


-- 
Zhongpu Chen

