From: Bruce Momjian
Message-Id: <200503120628.j2C6S5M07431@candle.pha.pa.us>
Subject: Re: Suggestion for Encodings table
To: Preston Landers
Date: Sat, 12 Mar 2005 01:28:05 -0500 (EST)
Cc: pgsql-docs@postgresql.org
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=ELM1110608885-17976-1_
Content-Transfer-Encoding: 7bit

--ELM1110608885-17976-1_
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII


Thanks for the ideas.  I have applied the following patch, which
documents all our encodings.  Also, the URL I added points to a very
extensive collection of documents.

---------------------------------------------------------------------------

Preston Landers wrote:
>
> http://www.postgresql.org/docs/8.0/interactive/multibyte.html#CHARSET-TABLE
>
> I would humbly suggest a few improvements to that Encodings table to
> improve the clarity.
>
> Many of the entries clearly indicate the language or writing system, such
> as WIN1256 = "Windows CP1256 (Arabic)"
>
> I would suggest that every single entry should be described that way with
> the common language or writing system name.  Even Unicode could say "All
> languages".
>
> In particular, the "WIN" encoding just says "CP1251" -- this is Cyrillic
> (Russian) but some people might just see the WIN and assume it's the
> character set that Western/US Windows uses (CP 1252).
>
> It's an easy mistake to make and one I see repeated frequently on other
> web pages (calling Windows "Western" CP 1251.)  Someone reading English
> language docs and seeing a "WIN" character set might naturally assume that
> it is the English Windows character set.  (Which BTW is not currently
> supported by PG for conversions.)
>
> Some more examples that might improve clarity:
>
> LATIN5 should say "Turkish"
>
> LATIN6 should say "Nordic"
>
> ALT and KOI8 should say "Cyrillic" (or Russian)

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

--ELM1110608885-17976-1_
Content-Transfer-Encoding: 7bit
Content-Type: text/plain
Content-Disposition: inline; filename="/bjm/diff"

Index: doc/src/sgml/charset.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v
retrieving revision 2.49
diff -c -c -r2.49 charset.sgml
*** doc/src/sgml/charset.sgml   7 Mar 2005 04:30:48 -0000   2.49
--- doc/src/sgml/charset.sgml   12 Mar 2005 06:24:51 -0000
***************
*** 344,390 ****
  MULE_INTERNAL
! Mule internal code
  LATIN1
! ISO 8859-1/ECMA 94 (Latin alphabet no.1)
  LATIN2
! ISO 8859-2/ECMA 94 (Latin alphabet no.2)
  LATIN3
! ISO 8859-3/ECMA 94 (Latin alphabet no.3)
  LATIN4
! ISO 8859-4/ECMA 94 (Latin alphabet no.4)
  LATIN5
! ISO 8859-9/ECMA 128 (Latin alphabet no.5)
  LATIN6
! ISO 8859-10/ECMA 144 (Latin alphabet no.6)
  LATIN7
! ISO 8859-13 (Latin alphabet no.7)
  LATIN8
! ISO 8859-14 (Latin alphabet no.8)
  LATIN9
! ISO 8859-15 (Latin alphabet no.9)
  LATIN10
! ISO 8859-16/ASRO SR 14111 (Latin alphabet no.10)
  ISO_8859_5
--- 344,390 ----
  MULE_INTERNAL
! Mule internal code (Multi-lingual Emacs)
  LATIN1
! ISO 8859-1/ECMA 94 (Western European)
  LATIN2
! ISO 8859-2/ECMA 94 (Central European)
  LATIN3
! ISO 8859-3/ECMA 94 (South European)
  LATIN4
! ISO 8859-4/ECMA 94 (North European)
  LATIN5
! ISO 8859-9/ECMA 128 (Turkish)
  LATIN6
! ISO 8859-10/ECMA 144 (Nordic)
  LATIN7
! ISO 8859-13 (Baltic)
  LATIN8
! ISO 8859-14 (Celtic)
  LATIN9
! ISO 8859-15 (LATIN1 with Euro and accents)
  LATIN10
! ISO 8859-16/ASRO SR 14111 (Romanian)
  ISO_8859_5
***************
*** 404,414 ****
  KOI8
! KOI8-R(U)
  WIN866
! Windows CP866
  WIN874
--- 404,414 ----
  KOI8
! KOI8-R(U) (Cyrillic)
  WIN866
! Windows CP866 (Cyrillic)
  WIN874
***************
*** 416,426 ****
  WIN1250
! Windows CP1250
  WIN1251
! Windows CP1251
  WIN1256
--- 416,426 ----
  WIN1250
! Windows CP1250 (Central European)
  WIN1251
! Windows CP1251 (Cyrillic)
  WIN1256
***************
*** 883,888 ****
--- 883,900 ----
+
+
+
+
+ An extensive collection of documents about character sets, encodings,
+ and code pages.
+
+
+
+
+
+

--ELM1110608885-17976-1_--