From: Thomas Munro <[email protected]>
To: Robert Haas <[email protected]>
Cc: Tom Lane <[email protected]>
Cc: Jeroen Vermeulen <[email protected]>
Cc: VASUKI M <[email protected]>
Cc: [email protected]
Subject: Re: BUG #19354: JOHAB rejects valid byte sequences
Date: Tue, 14 Apr 2026 18:30:08 +1200
Message-ID: <CA+hUKGKy-ViGBXdOjcPownBM=OdWiULO8H1RyH1r_8qNp=U4CA@mail.gmail.com>
In-Reply-To: <CA+TgmoaoW4F2rRzYcQQim9ddT4-6H3oi0UYV9Ucw-rRQ5MdHsg@mail.gmail.com>
References: <[email protected]>
	<CA+TgmoaRGSezRaA7x00X495Qho8WGTzggbDSUt-JsruXceZWug@mail.gmail.com>
	<CA+zULE4L4rA2DLAcfy=eQL7w_ZexV4P5zpQRbP=_qrhJBEOzjg@mail.gmail.com>
	<[email protected]>
	<CAE2r8H5vaSyaC_t1FcpHBo-BB_=SrFj7GFnOC-SxC6WDf5c9VA@mail.gmail.com>
	<CA+zULE47EXZOp7qKYODd+mjSgDiR-WX5ZNBkwdKnj-Zc0FT58w@mail.gmail.com>
	<CA+TgmoZaoc37ohnhF5inoPxWzfoznV483xQw8Fmw+ELFScv47g@mail.gmail.com>
	<[email protected]>
	<CA+TgmoaoW4F2rRzYcQQim9ddT4-6H3oi0UYV9Ucw-rRQ5MdHsg@mail.gmail.com>

On Wed, Dec 17, 2025 at 7:43 AM Robert Haas <[email protected]> wrote:
> I think there is a good chance that the right going-forward fix is to
> deprecate the encoding, because according to
> https://www.unicode.org/Public/MAPPINGS/EASTASIA/ReadMe.txt this and
> everything else that's now under
> https://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/ were
> deprecated in 2001. By the time v19 is released, the deprecation will
> be a quarter-century old, and the fact that it doesn't work is good
> evidence that few people will miss it, though perhaps the original
> poster will want to put forward an argument for why we should still
> care about this.

Right, that stuff was withdrawn, along with the BIG5 and JIS X 0212
mappings (here's some interesting discussion about their normative
status[1]).  From what I can figure out, JOHAB was an MS-DOS codepage
(1361), obsoleted by UHC (949) some time around MS-DOS 6.22 or MS-DOS
7 and Windows 95.

So +1 from me, set the phasers to git rm.  Based on the comments for
enum pg_enc, we don't need to worry about numerical stability of
client-only encodings, so I just deleted it (unlike PG_MULE_INTERNAL
which became PG_UNUSED_1).  I didn't mention it in
doc/src/sgml/appendix-obsolete.sgml: the decision criterion for that
seems to be that there was an SGML id that appeared in a URL, which is
not the case here.  The release notes seem like enough of a tombstone
for something that we strongly suspect has 0 users.  Wait until v20, or
just do it now?
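
To illustrate the numbering point, here is a minimal sketch with a
made-up, heavily simplified encoding list.  It is not the real enum
pg_enc: the demo_ and OLD_/NEW_ names, the ordering, and the helper
functions are all hypothetical.  It only shows that retiring a
server-side entry via a placeholder keeps later server encodings'
numbers unchanged, while deleting a client-only entry outright shifts
only the client-only entries after it.

```c
#include <assert.h>
#include <stdbool.h>

/* Before: hypothetical, simplified encoding list (not the real pg_enc). */
enum demo_enc_before
{
	OLD_SQL_ASCII = 0,		/* server encodings start here */
	OLD_MULE_INTERNAL,		/* server encoding being retired */
	OLD_UTF8,				/* last server encoding in this sketch */
	OLD_SJIS,				/* client-only encodings start here */
	OLD_JOHAB,				/* client-only encoding being retired */
	OLD_GBK
};

/* After: placeholder for the server entry, plain deletion for the client one. */
enum demo_enc_after
{
	NEW_SQL_ASCII = 0,
	NEW_UNUSED_1,			/* placeholder, so NEW_UTF8 keeps its old number */
	NEW_UTF8,
	NEW_SJIS,
	/* JOHAB deleted outright; NEW_GBK moves down by one */
	NEW_GBK
};

/* Assumption for this sketch: server encodings are the leading entries. */
#define NEW_LAST_SERVER_ENCODING NEW_UTF8

/* True if enc falls in the server-encoding range of the "after" list. */
static bool
demo_is_server_encoding(int enc)
{
	return enc >= 0 && enc <= NEW_LAST_SERVER_ENCODING;
}

/* True if every surviving server encoding kept its numeric value. */
static bool
demo_server_ids_stable(void)
{
	return (int) OLD_SQL_ASCII == (int) NEW_SQL_ASCII &&
		(int) OLD_UTF8 == (int) NEW_UTF8;
}
```

In this sketch demo_server_ids_stable() returns true, while GBK moves
from 5 to 4, which is harmless only if nothing depends on the numeric
values of client-only encodings.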

I don't have an opinion yet whether the code in the back-branches
might be dangerous, or "fixing" it might be more dangerous, but it's
an interesting question...

[1] https://unicode.org/mail-arch/unicode-ml/y2002-m03/0691.html


Attachments:

  [application/gzip] 0001-Remove-JOHAB-encoding.patch.gz (126.5K, 2-0001-Remove-JOHAB-encoding.patch.gz)