From: Tom Lane
To: Thomas Munro
Cc: assam258@gmail.com, Heikki Linnakangas, Robert Haas, Jeroen Vermeulen, VASUKI M, pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #19354: JOHAB rejects valid byte sequences
Date: Tue, 14 Apr 2026 22:06:18 -0400
Message-ID: <1910469.1776218778@sss.pgh.pa.us>

Thomas Munro writes:
> On Wed, Apr 15, 2026 at 1:20 PM Henson Choi wrote:
>> I understand the appeal of simply deleting a dead-looking encoding,
>> and Thomas' removal patch is clean work. However, Korean archival
>> data from the 1990s (government records, academic repositories, early
>> online corpora) does exist as JOHAB bytes; as a client encoding, JOHAB
>> in PostgreSQL provides a straightforward ingest path
>> (client_encoding=JOHAB, convert_from, then store as UTF-8). Once
>> removed, that path closes with no obvious alternative short of
>> preprocessing outside PostgreSQL. Fixing the verifier preserves the
>> capability at the cost of a ~30-line correction plus tests.

> The counter argument would be that you could use iconv
> --from-code=JOHAB ..., or libiconv, or the codecs available in Python,
> Java, etc for dealing with historical archived data, something that
> data archivists must be very aware of.

Sure.  But it's not comfortable to remove a user-visible feature
we've had for decades.  My own primary concern about it was that
a correct fix could require non-backwards-compatible behavior
changes.  Henson's analysis says that that's not a problem.  So
assuming this patch withstands review, I'd be much happier to see
it applied than to remove JOHAB.

No opinion at the moment about whether to back-patch.

			regards, tom lane
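
A minimal sketch of the outside-PostgreSQL preprocessing route mentioned above, assuming Python's standard-library "johab" codec. The helper name johab_to_utf8 and the sample text are hypothetical, chosen only for illustration; nothing here reflects the patch under discussion.

```python
def johab_to_utf8(raw: bytes) -> bytes:
    """Decode JOHAB-encoded bytes and re-encode them as UTF-8.

    Raises UnicodeDecodeError if the input is not valid JOHAB
    according to Python's codec.
    """
    return raw.decode("johab").encode("utf-8")


# Hypothetical sample: build JOHAB bytes from a known string, then
# convert them the way archived data would be converted before loading.
sample = "한글".encode("johab")
converted = johab_to_utf8(sample)
print(converted.decode("utf-8"))
```

Python's codec and PostgreSQL's verifier may disagree about which byte sequences are valid, so a conversion done this way is not guaranteed to match what client_encoding=JOHAB would accept.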