public inbox for [email protected]  
From: Tom Lane <[email protected]>
To: Thomas Munro <[email protected]>
Cc: [email protected]
Cc: Heikki Linnakangas <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Jeroen Vermeulen <[email protected]>
Cc: VASUKI M <[email protected]>
Cc: [email protected]
Subject: Re: BUG #19354: JOHAB rejects valid byte sequences
Date: Tue, 14 Apr 2026 22:06:18 -0400
Message-ID: <[email protected]>
In-Reply-To: <CA+hUKGJMrcS=hBkqVk=5pjM4w8edG=_ArASC82RqB6HQro-v-g@mail.gmail.com>
References: <[email protected]>
	<CA+TgmoaRGSezRaA7x00X495Qho8WGTzggbDSUt-JsruXceZWug@mail.gmail.com>
	<CA+zULE4L4rA2DLAcfy=eQL7w_ZexV4P5zpQRbP=_qrhJBEOzjg@mail.gmail.com>
	<[email protected]>
	<CAE2r8H5vaSyaC_t1FcpHBo-BB_=SrFj7GFnOC-SxC6WDf5c9VA@mail.gmail.com>
	<CA+zULE47EXZOp7qKYODd+mjSgDiR-WX5ZNBkwdKnj-Zc0FT58w@mail.gmail.com>
	<CA+TgmoZaoc37ohnhF5inoPxWzfoznV483xQw8Fmw+ELFScv47g@mail.gmail.com>
	<[email protected]>
	<CA+TgmoaoW4F2rRzYcQQim9ddT4-6H3oi0UYV9Ucw-rRQ5MdHsg@mail.gmail.com>
	<CA+hUKGKy-ViGBXdOjcPownBM=OdWiULO8H1RyH1r_8qNp=U4CA@mail.gmail.com>
	<[email protected]>
	<CAAAe_zCLVunjt1u+2E86shwc3hk1x4bzUyU86nY1fq-nAVYN0Q@mail.gmail.com>
	<CA+hUKGJMrcS=hBkqVk=5pjM4w8edG=_ArASC82RqB6HQro-v-g@mail.gmail.com>

Thomas Munro <[email protected]> writes:
> On Wed, Apr 15, 2026 at 1:20 PM Henson Choi <[email protected]> wrote:
>> I understand the appeal of simply deleting a dead-looking encoding,
>> and Thomas' removal patch is clean work.  However, Korean archival
>> data from the 1990s (government records, academic repositories, early
>> online corpora) does exist as JOHAB bytes; as a client encoding, JOHAB
>> in PostgreSQL provides a straightforward ingest path
>> (client_encoding=JOHAB, convert_from, then store as UTF-8).  Once
>> removed, that path closes with no obvious alternative short of
>> preprocessing outside PostgreSQL.  Fixing the verifier preserves the
>> capability at the cost of a ~30-line correction plus tests.

> The counterargument would be that you could use iconv
> --from-code=JOHAB ..., or libiconv, or the codecs available in Python,
> Java, etc for dealing with historical archived data, something that
> data archivists must be very aware of.

Sure.  But it's not comfortable to remove a user-visible feature
we've had for decades.  My own primary concern about it was that a
correct fix could require non-backwards-compatible behavior changes.
Henson's analysis says that that's not a problem.  So assuming this
patch withstands review, I'd be much happier to see it applied than
to remove JOHAB.

No opinion at the moment about whether to back-patch.

			regards, tom lane