From: Bruce Momjian
Message-Id: <200609221717.k8MHHT102489@momjian.us>
Subject: Re: This approach to non-ASCII names does not work
In-Reply-To: <200609202347.11626.peter_e@gmx.net>
To: Peter Eisentraut
Date: Fri, 22 Sep 2006 13:17:29 -0400 (EDT)
CC: pgsql-docs@postgresql.org, Tom Lane, yazicivo@ttnet.net.tr

That makes a lot of sense. The encoding mentioned in the HTML is how
high-bit characters are treated in the HTML, and doesn't control what
entities it supports. However, I am confused how non-Latin users can
use SGML if it does not support UTF8 entities. I see this flag in
openjade:

    -b, --encoding=NAME   Use encoding NAME for output.

but I assume it is only for how to treat the high bits in the file, not
for entity recognition.
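The difference between the byte encoding and a numeric character reference can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample character U+0131 is a hypothetical choice of a character outside Latin-1, the helper name `can_encode` is made up for the example, and nothing here reflects how openjade or the docs build actually processes entities.

```python
import html

# Hypothetical sample: U+0131 (LATIN SMALL LETTER DOTLESS I), which is
# not part of Latin-1. Chosen only for illustration.
ch = "\u0131"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text fits in the byte encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Encoding question: can the bytes of the file carry this character directly?
latin1_ok = can_encode(ch, "iso-8859-1")   # not representable in Latin-1
utf8_bytes = ch.encode("utf-8")            # representable in UTF-8

# Character-reference question: spell the character as a numeric reference.
# The reference itself is plain ASCII, so a Latin-1 file can carry it.
ref = ch.encode("ascii", errors="xmlcharrefreplace")

# A consumer whose document character set is UCS (as in HTML 4) resolves
# the number against UCS, whatever encoding the bytes arrived in.
resolved = html.unescape(ref.decode("iso-8859-1"))

print(latin1_ok, utf8_bytes, ref, resolved == ch)
```

This matches the behaviour seen in the browser: a page served as ISO-8859-1 can still show a character above 255 through a numeric reference, because the number is looked up in the document character set and not in the byte encoding.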

I IM'ed with Peter and he said SGML DocBook just doesn't support UTF8
easily, so I am reverting Volkan YAZICI's name to be ASCII (he requested
an all-uppercase last name if we can't use the proper symbol), and
documented we can only use HTML4 entities, and updated the URLs we
should use for reference. I have the official URL and URLs that show
the actual symbols too, which is helpful. If people have names that
contain HTML4 symbols, please let me know so I can add the symbols:

    http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html

---------------------------------------------------------------------------

Peter Eisentraut wrote:
> Bruce Momjian wrote:
> > The unusual thing is that though our docs web pages use a stated
> > encoding as ISO-8859-1, the UTF8 number does generate the proper
> > symbol in my browser (Mozilla), so I wonder if >255 codes are assumed
> > to be UTF8.
>
> These are two different things.
>
> A numeric character reference picks the numbered character from the
> document character set.  The document character set is declared in the
> document type declaration (and is therefore fixed by the standards
> committee for all users).  The document character sets for commonly
> used SGML applications are:
>
>     HTML 3.2        Latin 1 (ISO 646 + ECMA 94)
>     HTML 4+         UCS (ISO 10646)
>     XML             UCS (ISO 10646)
>     DocBook SGML    Latin 1 (ISO 646 + ECMA 94)
>
> If a font is available, an HTML application (browser) should be able to
> process (display) any character from the document character set,
> whether it arrives in plain or as a character entity.
>
> Conversely, a character not in the document character set, such as a
> non-Latin-1 character in DocBook SGML, cannot be processed, strictly
> speaking.
>
> The other thing you are talking about is the character *encoding* which
> specifies how the sequence of bytes that makes up the document is to be
> interpreted.
> Note that this happens before the document character set
> is taken into consideration and is pretty much independent of it.  For
> example, knowledge of the character encoding is necessary to find
> the "&" that starts entities.  Not all character encodings are capable
> of encoding all characters in the document character set, which is why
> you need to use character entities to access characters outside the
> encoding.
>
> --
> Peter Eisentraut
> http://developer.postgresql.org/~petere/

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +