Re: This approach to non-ASCII names does not work

public inbox for [email protected]  

From: Peter Eisentraut <[email protected]>
To: Bruce Momjian <[email protected]>
Cc: [email protected]
Cc: Tom Lane <[email protected]>
Subject: Re: This approach to non-ASCII names does not work
Date: Wed, 20 Sep 2006 23:47:10 +0200
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
References: <[email protected]>

Bruce Momjian wrote:
> The unusual thing is that though our docs web pages use a stated
> encoding as ISO-8859-1, the UTF8 number does generate the proper
> symbol in my browser (Mozilla), so I wonder if >255 codes are assumed
> to be UTF8.

These are two different things.

A numeric character reference picks the numbered character from the 
document character set.  The document character set is declared in the 
SGML declaration associated with the document type (for XML, in the 
specification itself), and is therefore fixed by the standards committee 
for all users.  The document character sets for commonly used SGML 
applications are:

HTML 3.2        Latin 1 (ISO 646 + ECMA 94)
HTML 4+         UCS (ISO 10646)
XML             UCS (ISO 10646)
DocBook SGML    Latin 1 (ISO 646 + ECMA 94)

If a font is available, an HTML application (browser) should be able to 
process (display) any character from the document character set, 
whether it appears literally or as a character reference or entity.

Conversely, a character not in the document character set, such as a 
non-Latin-1 character in DocBook SGML, cannot be processed, strictly 
speaking.
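The table and the rule that a character outside the document character set cannot be processed can be sketched as a check on numeric character references.  This is a minimal Python illustration, not part of any SGML or HTML tool; the helper `unprocessable_references` and the `DOCUMENT_CHARSETS` mapping are hypothetical, and the ranges are simplifications: Latin 1 is taken as U+0000..U+00FF and UCS as the Unicode range U+0000..U+10FFFF, ignoring code points that the real declarations mark as unused or that XML disallows.

```python
import re

# Document character sets from the table, as code-point ranges.
# Assumption: simplified ranges; control characters that the actual
# SGML declarations (or the XML spec) exclude are not modelled.
DOCUMENT_CHARSETS = {
    "HTML 3.2": range(0x0, 0x100),          # Latin 1
    "HTML 4+": range(0x0, 0x110000),        # UCS, approximated by Unicode
    "XML": range(0x0, 0x110000),            # UCS, approximated by Unicode
    "DocBook SGML": range(0x0, 0x100),      # Latin 1
}

# Matches decimal (&#8364;) and hexadecimal (&#x20AC;) numeric references.
NCR = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def unprocessable_references(doc_type: str, text: str) -> list:
    """Return the code points referenced in `text` that fall outside the
    document character set of `doc_type` (hypothetical helper)."""
    charset = DOCUMENT_CHARSETS[doc_type]
    bad = []
    for m in NCR.finditer(text):
        # group(1) holds hex digits, group(2) holds decimal digits
        code_point = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        if code_point not in charset:
            bad.append(code_point)
    return bad


print(unprocessable_references("HTML 4+", "price: &#8364;5"))          # []
print(unprocessable_references("DocBook SGML", "&#233; and &#x20AC;"))  # [8364]
```

The same reference, &#x20AC;, is fine in HTML 4 but falls outside the Latin 1 document character set of DocBook SGML.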

The other thing you are talking about is the character *encoding*, which 
specifies how the sequence of bytes that makes up the document is to be 
interpreted.  Note that this happens before the document character set 
is taken into consideration and is pretty much independent of it.  For 
example, knowledge of the character encoding is necessary even to find 
the "&" that starts a character reference.  Not all character encodings 
can represent every character in the document character set, which is 
why you need character references (or entities) to reach characters 
outside the encoding.  So a reference above 255 in an HTML 4 page names 
a UCS code point, not a sequence of UTF-8 bytes, whatever the encoding.
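A small Python sketch of the two stages, assuming a made-up document that declares ISO-8859-1: first the bytes are decoded with the encoding, then a numeric reference is resolved against a UCS document character set.  The standard-library `html.unescape` (which follows HTML5 rules) stands in for an HTML 4 parser here, and the sample text and variable names are only for illustration.

```python
import html

# Hypothetical sample: a document declared as ISO-8859-1 that contains a
# literal "é" (encodable in ISO-8859-1) and a euro sign written as a
# numeric character reference.
raw_bytes = "caf\u00e9 costs &#8364;3".encode("iso-8859-1")

# Stage 1: the character *encoding* maps bytes to characters. Only after
# this step can a parser reliably find the "&" that starts a reference.
text = raw_bytes.decode("iso-8859-1")

# Stage 2: the reference is resolved against the *document character set*
# (UCS for HTML 4): 8364 is the code point U+20AC.
resolved = html.unescape(text)
print(resolved)  # café costs €3

# The UTF-8 bytes of U+20AC are something else entirely.
utf8_bytes = list("\u20ac".encode("utf-8"))
print(utf8_bytes)  # [226, 130, 172]

# U+20AC is outside the ISO-8859-1 repertoire, so it cannot be written
# as a raw byte in that encoding...
try:
    "\u20ac".encode("iso-8859-1")
except UnicodeEncodeError:
    print("U+20AC cannot be encoded in ISO-8859-1")

# ...so an encoder has to fall back to a character reference instead.
reencoded = resolved.encode("iso-8859-1", errors="xmlcharrefreplace")
print(reencoded)  # b'caf\xe9 costs &#8364;3'
```

The `xmlcharrefreplace` error handler emits a decimal character reference exactly when the encoding cannot represent a character, while "é" stays a single raw byte.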

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
