public inbox for [email protected]  
From: David G. Johnston <[email protected]>
To: Zhongpu Chen <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8
Date: Fri, 1 May 2026 20:28:31 -0700
Message-ID: <CAKFQuwZuEZFYK9Arp_qFsoJ5o2EDDDCfsTwBYvoxzhBiXRJHQg@mail.gmail.com>
In-Reply-To: <CA+1gyq+LF_91g_i0WXeKK6JGF8viaqaF213S-9Arq=SG=4GAaA@mail.gmail.com>
References: <CA+1gyqJJJDhq=cc_D0ad59WH_OD2G_mN54xTru0KYoNaLkF48Q@mail.gmail.com>
	<CA+1gyq+LF_91g_i0WXeKK6JGF8viaqaF213S-9Arq=SG=4GAaA@mail.gmail.com>

On Friday, May 1, 2026, Zhongpu Chen <[email protected]> wrote:

> The issue is not specific to E'\\x..' literals. A normal COPY FROM data
> file with ENCODING 'EUC_CN' can create text rows that later cannot be
> retrieved with SELECT.
>
> This suggests that input validation for EUC_CN is only structural, while
> the EUC_CN-to-UTF8 conversion table is stricter.
>

I suspect there was little desire to write and maintain code that verifies
specific values, or to accept the runtime cost of doing so.  It is indeed
structural.  This point should probably be documented better.  But it's
hard to feel too bad when the input claims to provide valid EUC_CN data and
then supplies byte sequences that have no meaning in that encoding.  We are
happy to just store and return your data to you; but it's unreasonable to
ask for it to be converted.  It would be nice for the database to provide
an extra layer of protection, so I'm not against the idea.  Either
automatically, or at least by providing a function that could, say, be
called from a trigger as an opt-in check.  But it definitely feels like a
problematic benefit-to-cost proposition.

David J.
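A minimal Python sketch of the gap between a structural check and a table-based conversion, under stated assumptions. This is not PostgreSQL code: `structurally_valid_euc_cn` is a hypothetical, simplified byte-range check (ASCII, or two bytes each in 0xA1..0xFE). Python's built-in `gb2312` codec (EUC-CN form) stands in for the EUC_CN-to-UTF8 conversion table and may not agree with PostgreSQL's table on every code point. `check_euc_cn` is a hypothetical stand-in for the kind of opt-in function the reply suggests calling from a trigger.

```python
"""Sketch: structural EUC_CN validation vs. table-based conversion.

Assumptions (hypothetical, not PostgreSQL's implementation):
- structurally_valid_euc_cn() only checks byte ranges: ASCII passes, and
  any high-bit byte must start a two-byte sequence with both bytes in
  0xA1..0xFE.
- Python's "gb2312" codec stands in for the EUC_CN-to-UTF8 mapping table.
"""


def structurally_valid_euc_cn(data: bytes) -> bool:
    """Return True if every character has a plausible EUC_CN byte shape."""
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:  # single-byte ASCII
            i += 1
            continue
        if i + 1 >= len(data):  # truncated two-byte sequence
            return False
        b2 = data[i + 1]
        if not (0xA1 <= b <= 0xFE and 0xA1 <= b2 <= 0xFE):
            return False
        i += 2
    return True


def convertible_to_utf8(data: bytes) -> bool:
    """Return True only if every character maps to a Unicode code point."""
    try:
        data.decode("gb2312")  # table lookup; unmapped pairs raise
    except UnicodeDecodeError:
        return False
    return True


def check_euc_cn(data: bytes) -> None:
    """Hypothetical opt-in check combining both tests."""
    if not structurally_valid_euc_cn(data):
        raise ValueError("invalid byte sequence for EUC_CN")
    if not convertible_to_utf8(data):
        raise ValueError("EUC_CN bytes have no UTF8 equivalent")


if __name__ == "__main__":
    samples = {
        "assigned (U+4E2D)": b"\xd6\xd0",
        "unassigned row 10": b"\xaa\xa1",
    }
    for label, raw in samples.items():
        print(label, structurally_valid_euc_cn(raw), convertible_to_utf8(raw))
```

Under these assumptions, 0xD6D0 (U+4E2D) passes both checks, while 0xAAA1 falls in GB2312 row 10, which has no assigned characters: it passes the byte-range check but fails conversion, which is the same kind of gap the thread describes.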
