public inbox for [email protected]
From: Bruce Momjian <[email protected]>
To: Peter Eisentraut <[email protected]>
Cc: [email protected]
Cc: Tom Lane <[email protected]>
Cc: [email protected]
Subject: Re: This approach to non-ASCII names does not work
Date: Fri, 22 Sep 2006 13:17:29 -0400 (EDT)
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
That makes a lot of sense. The encoding mentioned in the HTML is how
high-bit characters are treated in the HTML, and doesn't control what
entities it supports.
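Here is a minimal Python sketch of that two-step model. It uses the standard `html` module as a stand-in for a browser, which is an assumption, since real browsers do more. The declared encoding only turns bytes into characters. Character references are resolved afterwards, so `&#304;` still yields U+0130 even though that character cannot be encoded in ISO-8859-1.

```python
import html

# Bytes as they might sit in an ISO-8859-1 page: 0xE9 is a raw high-bit byte,
# while "&#304;" is plain ASCII text that names U+0130 by number.
raw = b"Caf\xe9 &#304;stanbul"

# Step 1: the declared encoding only maps bytes to characters.
decoded = raw.decode("iso-8859-1")
print(decoded)   # Café &#304;stanbul  (reference not yet resolved)

# Step 2: character references are resolved independently of the encoding.
resolved = html.unescape(decoded)
print(resolved)  # Café İstanbul
```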
However, I am confused about how non-Latin users can use SGML if it does
not support UTF8 entities. I see this flag in openjade:
-b, --encoding=NAME Use encoding NAME for output.
but I assume it is only for how to treat the high bits in the file, not
for entity recognition.
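The following sketch is not openjade. It uses Python's codecs to illustrate the general point behind an output-encoding option like `-b`. The chosen encoding only decides which characters can be written as raw bytes. Anything outside it has to be emitted as a numeric character reference instead, which is what Python's `xmlcharrefreplace` error handler does.

```python
# Text to write out; U+0130 is not representable in ISO-8859-1.
text = "Caf\u00e9 \u0130stanbul"

# Characters the target encoding can represent become raw bytes; the rest
# are replaced with decimal numeric character references.
latin1 = text.encode("iso-8859-1", errors="xmlcharrefreplace")
utf8 = text.encode("utf-8", errors="xmlcharrefreplace")

print(latin1)  # b'Caf\xe9 &#304;stanbul'
print(utf8)    # b'Caf\xc3\xa9 \xc4\xb0stanbul'
```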
I IM'ed with Peter, and he said SGML DocBook just doesn't support UTF8
easily, so I am reverting Volkan YAZICI's name to ASCII (he requested
an all-uppercase last name if we can't use the proper symbol). I have
also documented that we can only use HTML4 entities and updated the URLs
we should use for reference. I included the official URL as well as URLs
that show the actual symbols, which is helpful.
If people have names that contain HTML4 symbols, please let me know so I
can add the symbols:
http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
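To check whether a name can be written with HTML4 named entities alone, a small Python helper could look like the one below. It assumes that Python's `html.entities.codepoint2name` table, which mirrors the HTML 4 entity names, is a fair proxy for what the docs toolchain accepts. `html4_entity_form` is a hypothetical name and is not part of any existing docs tooling.

```python
from html.entities import codepoint2name


def html4_entity_form(name: str) -> tuple[str, list[str]]:
    """Hypothetical helper: rewrite `name` using HTML4 named entities.

    Returns the rewritten string plus any characters that have no HTML4
    entity name; those are left as-is and need some other spelling.
    """
    parts: list[str] = []
    missing: list[str] = []
    for ch in name:
        code = ord(ch)
        if code in codepoint2name:
            parts.append(f"&{codepoint2name[code]};")  # e.g. ü -> &uuml;
        elif code < 128:
            parts.append(ch)  # plain ASCII needs no entity
        else:
            parts.append(ch)
            missing.append(ch)  # no HTML4 entity name exists for it
    return "".join(parts), missing


print(html4_entity_form("M\u00fcller"))  # ('M&uuml;ller', [])
print(html4_entity_form("K\u0131l\u0131\u00e7"))
```

In the second call, the dotless i (U+0131) has no HTML4 name and is reported as missing, while the c-cedilla becomes `&ccedil;`.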
---------------------------------------------------------------------------
Peter Eisentraut wrote:
> Bruce Momjian wrote:
> > The unusual thing is that although our docs web pages declare their
> > encoding as ISO-8859-1, the UTF8 number does generate the proper
> > symbol in my browser (Mozilla), so I wonder if >255 codes are assumed
> > to be UTF8.
>
> These are two different things.
>
> A numeric character reference picks the numbered character from the
> document character set. The document character set is declared in the
> document type declaration (and is therefore fixed by the standards
> committee for all users). The document character sets for commonly
> used SGML applications are:
>
> HTML 3.2       Latin 1 (ISO 646 + ECMA 94)
> HTML 4+        UCS (ISO 10646)
> XML            UCS (ISO 10646)
> DocBook SGML   Latin 1 (ISO 646 + ECMA 94)
>
> If a font is available, an HTML application (browser) should be able to
> process (display) any character from the document character set,
> whether it arrives as a plain character or as a character entity.
>
> Conversely, a character not in the document character set, such as a
> non-Latin-1 character in DocBook SGML, cannot be processed, strictly
> speaking.
>
> The other thing you are talking about is the character *encoding* which
> specifies how the sequence of bytes that makes up the document is to be
> interpreted. Note that this happens before the document character set
> is taken into consideration and is pretty much independent of it. For
> example, knowledge of the character encoding is necessary to find
> the "&" that starts entities. Not all character encodings are capable
> of encoding all characters in the document character set, which is why
> you need to use character entities to access characters outside the
> encoding.
>
> --
> Peter Eisentraut
> http://developer.postgresql.org/~petere/
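Peter's table of document character sets can be turned into a quick check for numeric character references that fall outside a given application's set. This is a hypothetical Python sketch, not part of any SGML toolchain. It treats Latin 1 as code points 0 through 255 and bounds UCS at U+10FFFF, which is an assumption based on the modern Unicode range. It only scans decimal references of the form `&#NNN;`.

```python
import re

# Exclusive upper bound of each document character set from the table.
# The UCS bound of U+10FFFF is an assumption (the modern Unicode range).
DOC_CHARSET_LIMIT = {
    "HTML 3.2": 0x100,      # Latin 1 (ISO 646 + ECMA 94)
    "HTML 4+": 0x110000,    # UCS (ISO 10646)
    "XML": 0x110000,        # UCS (ISO 10646)
    "DocBook SGML": 0x100,  # Latin 1 (ISO 646 + ECMA 94)
}

# Decimal numeric character references only, e.g. "&#233;".
DECIMAL_REF = re.compile(r"&#([0-9]+);")


def refs_outside_charset(text: str, application: str) -> list[str]:
    """Return the numeric references in `text` that name characters
    outside the document character set of `application`."""
    limit = DOC_CHARSET_LIMIT[application]
    return [m.group(0) for m in DECIMAL_REF.finditer(text)
            if int(m.group(1)) >= limit]


sample = "Caf&#233; &#304;stanbul"
print(refs_outside_charset(sample, "DocBook SGML"))  # ['&#304;']
print(refs_outside_charset(sample, "HTML 4+"))       # []
```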
--
Bruce Momjian [email protected]
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +