Subject: non-ASCII characters in SGML documentation (and elsewhere)
From: Peter Eisentraut
To: pgsql-docs@postgresql.org
Date: Fri, 20 May 2011 00:49:00 +0300
Message-ID: <1305841740.3952.32.camel@vanquo.pezone.net>

There are a few literal non-ASCII characters in the SGML documentation, namely in

    isn.sgml
    release-7.4.sgml
    release-8.4.sgml

Also, there are some encoded (&foo;) non-ASCII characters in

    release-8.0.sgml
    release-8.1.sgml
    release-8.2.sgml
    unaccent.sgml

These all work fine, because all of these characters are in LATIN1, and DocBook SGML uses LATIN1.
But I notice that the contributor names in the 9.1 release notes have been carefully ASCII-fied, presumably from the UTF-8 Git commit messages.

For additional amusement, when creating the HISTORY file, lynx recodes the HTML into the encoding specified by your LC_CTYPE environment setting.

Also, the following source files contain non-ASCII characters in comments:

    src/backend/port/dynloader/darwin.c (LATIN1)
    src/backend/storage/lmgr/predicate.c (UTF8)
    src/backend/storage/lmgr/README-SSI (UTF8)

The last two are new in 9.1.

So, some questions:

* Should we consistently use entities for encoding non-ASCII characters in SGML? Or use LATIN1 freely?
* Should we allow/use non-ASCII characters in the release notes?
* What encoding should the HISTORY file have?
* Should we allow non-ASCII characters in general source files?
* If so, what should the encoding be?
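Whatever policy is chosen, checking files against it could be automated. Below is a minimal Python sketch that flags files containing non-ASCII bytes and guesses whether they are UTF-8 or LATIN1, which is the same kind of classification applied to the files listed above. The function names are hypothetical and are not part of the PostgreSQL tree. The guess is only a heuristic. Any byte sequence is valid LATIN1, so LATIN1 is inferred only when the non-ASCII bytes fail to decode as UTF-8. A short LATIN1 sequence can occasionally also be valid UTF-8 by accident.

```python
import sys
from pathlib import Path


def classify_bytes(data: bytes) -> str:
    """Return 'ascii', 'utf8', or 'latin1?' for a file's raw bytes.

    Heuristic only: every byte sequence is valid LATIN1, so 'latin1?'
    just means "non-ASCII and not valid UTF-8".
    """
    if data.isascii():
        return "ascii"
    try:
        data.decode("utf-8")
        return "utf8"
    except UnicodeDecodeError:
        return "latin1?"


def non_ascii_lines(data: bytes):
    """Yield (line_number, line) for each line containing a byte >= 0x80."""
    for lineno, line in enumerate(data.splitlines(), start=1):
        if not line.isascii():
            yield lineno, line


def main(paths):
    # Report only files that contain non-ASCII bytes.
    for p in paths:
        data = Path(p).read_bytes()
        kind = classify_bytes(data)
        if kind != "ascii":
            lines = [n for n, _ in non_ascii_lines(data)]
            print(f"{p}: {kind} (lines {lines})")


if __name__ == "__main__":
    main(sys.argv[1:])
```

When run with a list of paths as arguments, the script prints only the files that contain non-ASCII bytes, along with the line numbers involved. Pure-ASCII files produce no output.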