From: Zhongpu Chen <[email protected]>
To: [email protected]
Subject: Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8
Date: Sat, 2 May 2026 10:31:12 +0800
Message-ID: <CA+1gyqJJJDhq=cc_D0ad59WH_OD2G_mN54xTru0KYoNaLkF48Q@mail.gmail.com> (raw)
See the related bug report:
https://www.postgresql.org/message-id/CA%2B1gyqL7uiQhfLcYWpHNUKQgHjQc7sOPthSTiaxLDZzcrGFYSg%40mail.g...
Currently, PostgreSQL accepts structurally well-formed EUC_CN byte sequences,
such as 0xA2A3, into text columns. Such a value round-trips when
client_encoding is EUC_CN, but converting it fails when client_encoding is
UTF8, because euc_cn_to_utf8 has no mapping for that byte sequence.
If this behavior is intentional for compatibility, the documentation should
explicitly say that validation for some legacy encodings is byte-structure
validation, not mapping-table validation.
If it is not intentional, stricter validation could reject unassigned byte
positions at input time.
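
To make the two notions of validation concrete, here is a minimal Python
sketch. It is not PostgreSQL source code: the function names, the assumed
byte ranges (0xA1..0xFE for both bytes of a two-byte character), and the
tiny ASSIGNED table are hypothetical and only illustrate the difference
between byte-structure validation and mapping-table validation.

```python
# Hypothetical sketch; names and ranges are assumptions, not PostgreSQL code.

# Tiny illustrative subset of assigned EUC_CN code points. A real check
# would consult the full conversion table instead.
ASSIGNED = {
    0xA1A1: "\u3000",  # ideographic space
    0xB0A1: "\u554a",  # first hanzi in GB 2312 row 16
}


def is_structurally_valid_euc_cn(data: bytes) -> bool:
    """Byte-structure check: ASCII bytes, or pairs with both bytes in 0xA1..0xFE."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            i += 1  # single-byte ASCII character
            continue
        if i + 1 >= len(data):
            return False  # truncated two-byte character
        trail = data[i + 1]
        if not (0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE):
            return False
        i += 2
    return True


def is_mapped_euc_cn(data: bytes, table=ASSIGNED) -> bool:
    """Mapping-table check: structurally valid and every pair is in the table."""
    if not is_structurally_valid_euc_cn(data):
        return False
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            i += 1
            continue
        code = (data[i] << 8) | data[i + 1]
        if code not in table:
            return False  # well-formed, but no known mapping
        i += 2
    return True


sample = b"\xa2\xa3"
print(is_structurally_valid_euc_cn(sample))  # structure-only view
print(is_mapped_euc_cn(sample))              # mapping-aware view
```

Under these assumptions, 0xA2A3 passes the first check and fails the
second, which is the gap described above. Stricter input validation would
amount to applying something like the second check at input time.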
--
Zhongpu Chen