public inbox for [email protected]
10+ messages / 5 participants
* non-ASCII characters in SGML documentation (and elsewhere)
@ 2011-05-19 21:49 Peter Eisentraut <[email protected]>
0 siblings, 2 replies; 10+ messages in thread
From: Peter Eisentraut @ 2011-05-19 21:49 UTC (permalink / raw)
To: pgsql-docs
There are a few literal non-ASCII characters in the SGML documentation,
namely in
isn.sgml
release-7.4.sgml
release-8.4.sgml
Also, there are some encoded (&foo;) non-ASCII characters in
release-8.0.sgml
release-8.1.sgml
release-8.2.sgml
unaccent.sgml
These all work fine, because they are all LATIN1, and DocBook SGML uses
LATIN1.
But I notice that the contributor names in the 9.1 release notes have
been carefully ASCII-fied, presumably from the Git UTF-8 commit
messages.
For additional amusement, when creating the HISTORY file, lynx recodes
the HTML into the encoding specified by your LC_CTYPE environment
setting.
Also, the following source files contain non-ASCII characters in
comments:
src/backend/port/dynloader/darwin.c (LATIN1)
src/backend/storage/lmgr/predicate.c (UTF8)
src/backend/storage/lmgr/README-SSI (UTF8)
The last two are new in 9.1.
So, some questions:
* Should we consistently use entities for encoding non-ASCII
characters in SGML? Or use LATIN1 freely?
* Should we allow/use non-ASCII characters in the release notes?
* What encoding should the HISTORY file have?
* Should we allow non-ASCII characters in general source files?
* If so, what should the encoding be?
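A survey like the file lists above can be reproduced with a short script. This is only a sketch, not the method used to compile the lists in this message: the function names are made up for illustration, and the "LATIN1" label is a guess, since any byte sequence that is not valid UTF-8 is simply assumed to be some 8-bit encoding.

```python
import sys
from pathlib import Path


def classify_encoding(data: bytes) -> str:
    """Return a rough encoding label for the raw bytes of a file."""
    if data.isascii():
        return "ASCII"
    try:
        data.decode("utf-8")
        return "UTF8"
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is treated as an
        # 8-bit encoding such as LATIN1; this cannot be proven from bytes.
        return "LATIN1 (or other 8-bit)"


def non_ascii_lines(data: bytes):
    """Yield (line_number, raw_line) for every line with a byte >= 0x80."""
    for number, line in enumerate(data.splitlines(), start=1):
        if not line.isascii():
            yield number, line


def survey(paths):
    """Print each non-ASCII file with its guessed encoding and line numbers."""
    for path in paths:
        data = Path(path).read_bytes()
        kind = classify_encoding(data)
        if kind != "ASCII":
            numbers = [str(n) for n, _ in non_ascii_lines(data)]
            print(f"{path} ({kind}): lines {', '.join(numbers)}")


if __name__ == "__main__":
    survey(sys.argv[1:])
```

Run it with the files to check as arguments, for example the .sgml files under doc/src/sgml.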
* Re: non-ASCII characters in SGML documentation (and elsewhere)
@ 2011-05-20 08:44 Susanne Ebrecht <[email protected]>
parent: Peter Eisentraut <[email protected]>
1 sibling, 0 replies; 10+ messages in thread
From: Susanne Ebrecht @ 2011-05-20 08:44 UTC (permalink / raw)
To: Peter Eisentraut <[email protected]>; +Cc: pgsql-docs
Hello Peter,
On 19.05.2011 23:49, Peter Eisentraut wrote:
> So, some questions:
>
> * Should we consistently use entities for encoding non-ASCII
> characters in SGML? Or use LATIN1 freely?
> * Should we allow/use non-ASCII characters in the release notes?
> * What encoding should the HISTORY file have?
> * Should we allow non-ASCII characters in general source files?
> * If so, what should the encoding be?
One more argument for switching to XML? :)
I guess we will get more non-ASCII characters in the documentation.
How do you want to document the collation stuff?
Collations are for everything that isn't ASCII.
Our docs usually have small examples.
I can imagine that you will want to put German or Russian letters, or
whatever else, into the docs as examples.
Do you have any idea other than using utf8?
What do you expect would not fit into utf8?
I would expect words like déjà vu, that is, words that English copied
from French and that still use the French accents.
Or personal names with, e.g., umlauts, accents, and other special
characters from particular languages.
Also consider that editors (vi, emacs) usually use utf8 today.
Btw.
For the German docs I use utf8.
The HTML output works well with both 'ö' and '&ouml;'.
I have not yet tested other outputs.
I just changed to utf8 in the stylesheets and use export SP_ENCODING=XML
before compiling.
Unfortunately index sorting works with neither 'ö' nor '&ouml;' yet.
We are still fighting with it and trying to figure out how to force it
to sort correctly.
Just changing the makefile didn't help.
But in the English docs I doubt that you have to deal with index entries
for words containing non-ASCII characters.
That means very small, low-effort changes might already help.
Susanne
--
Susanne Ebrecht - 2ndQuadrant
PostgreSQL Development, 24x7 Support, Training and Services
www.2ndQuadrant.com
* Re: non-ASCII characters in SGML documentation (and elsewhere)
@ 2011-05-20 11:56 Tom Lane <[email protected]>
parent: Peter Eisentraut <[email protected]>
1 sibling, 3 replies; 10+ messages in thread
From: Tom Lane @ 2011-05-20 11:56 UTC (permalink / raw)
To: Peter Eisentraut <[email protected]>; +Cc: pgsql-docs
Peter Eisentraut <[email protected]> writes:
> * Should we consistently use entities for encoding non-ASCII
> characters in SGML? Or use LATIN1 freely?
I think we previously discussed this and agreed that all non-ASCII in
the SGML docs should be written as entities. The existence of
violations of that rule is just, well, a violation that ought to be
fixed.
> * Should we allow/use non-ASCII characters in the release notes?
> * What encoding should the HISTORY file have?
Ideally "sure, if entity-ified", but I don't know what to do about
HISTORY.
> * Should we allow non-ASCII characters in general source files?
Prefer "no" here.
regards, tom lane
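To illustrate the "written as entities" rule, here is a minimal sketch of such a conversion. It is not an existing project tool. It relies on Python's `html.entities.codepoint2name` table, which holds HTML 4 entity names; for the LATIN1 range these should coincide with the ISO entity names that DocBook SGML understands, but that is an assumption to verify against the DTD. The function name `entityify` is made up for this example.

```python
from html.entities import codepoint2name


def entityify(text: str) -> str:
    """Replace every non-ASCII character with a named or numeric entity."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # Plain ASCII passes through untouched, including & and <.
            out.append(ch)
        elif cp in codepoint2name:
            # Named entity from the HTML 4 table, e.g. 0xC1 -> &Aacute;
            out.append(f"&{codepoint2name[cp]};")
        else:
            # No known name: fall back to a decimal character reference.
            out.append(f"&#{cp};")
    return "".join(out)


if __name__ == "__main__":
    print(entityify("Álvaro Herrera"))
    print(entityify("déjà vu"))
```

Existing entities in the input are left alone because only characters above code point 127 are rewritten.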
* Re: non-ASCII characters in SGML documentation (and elsewhere)
@ 2011-05-20 12:16 Alvaro Herrera <[email protected]>
parent: Tom Lane <[email protected]>
2 siblings, 1 reply; 10+ messages in thread
From: Alvaro Herrera @ 2011-05-20 12:16 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Peter Eisentraut <[email protected]>; pgsql-docs
Excerpts from Tom Lane's message of Fri May 20 07:56:58 -0400 2011:
> Peter Eisentraut <[email protected]> writes:
> > * Should we consistently use entities for encoding non-ASCII
> > characters in SGML? Or use LATIN1 freely?
>
> I think we previously discussed this and agreed that all non-ASCII in
> the SGML docs should be written as entities. The existence of
> violations of that rule is just, well, a violation that ought to be
> fixed.
+1
> > * Should we allow/use non-ASCII characters in the release notes?
> > * What encoding should the HISTORY file have?
>
> Ideally "sure, if entity-ified", but I don't know what to do about
> HISTORY.
Can we recode that to plain ascii? I think iconv has a //TRANSLIT flag
or something like that.
--
Álvaro Herrera <[email protected]>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
* Re: non-ASCII characters in SGML documentation (and elsewhere)
@ 2011-05-20 13:04 Susanne Ebrecht <[email protected]>
parent: Tom Lane <[email protected]>
2 siblings, 1 reply; 10+ messages in thread
From: Susanne Ebrecht @ 2011-05-20 13:04 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Peter Eisentraut <[email protected]>; pgsql-docs
On 20.05.2011 13:56, Tom Lane wrote:
>> * Should we allow non-ASCII characters in general source files?
> Prefer "no" here.
I only see two reasons for non-ASCII signs in English.
Either it is a foreign name of e.g. a person
or it is a word that English took from French like in déjà vu.
For the second I am sure you will find synonyms that are ASCII only.
The only other reason that I can see for non-ASCII signs in our docs is
for demonstrating collations.
Susanne
--
Susanne Ebrecht - 2ndQuadrant
PostgreSQL Development, 24x7 Support, Training and Services
www.2ndQuadrant.com
* Re: non-ASCII characters in SGML documentation (and elsewhere)
@ 2011-05-20 17:31 Alvaro Herrera <[email protected]>
parent: Susanne Ebrecht <[email protected]>
0 siblings, 1 reply; 10+ messages in thread
From: Alvaro Herrera @ 2011-05-20 17:31 UTC (permalink / raw)
To: Susanne Ebrecht <[email protected]>; +Cc: Tom Lane <[email protected]>; Peter Eisentraut <[email protected]>; pgsql-docs
Excerpts from Susanne Ebrecht's message of Fri May 20 09:04:26 -0400 2011:
> On 20.05.2011 13:56, Tom Lane wrote:
> >> * Should we allow non-ASCII characters in general source files?
> > Prefer "no" here.
>
> I only see two reasons for non-ASCII signs in English.
> Either it is a foreign name of e.g. a person
> or it is a word that English took from French like in déjà vu.
I'd like my name accented in the release notes, thanks.
--
Álvaro Herrera <[email protected]>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
* Re: non-ASCII characters in SGML documentation (and elsewhere)
@ 2011-05-31 20:25 Peter Eisentraut <[email protected]>
parent: Tom Lane <[email protected]>
2 siblings, 0 replies; 10+ messages in thread
From: Peter Eisentraut @ 2011-05-31 20:25 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: pgsql-docs
On Fri, 2011-05-20 at 07:56 -0400, Tom Lane wrote:
> > * Should we allow non-ASCII characters in general source files?
>
> Prefer "no" here.
Going through this I felt a little bad butchering up people's names that
hadn't bothered anyone before now. So as a compromise, I made
contributor names UTF-8 consistently, but removed other uses of
non-ASCII characters.
* Re: non-ASCII characters in SGML documentation (and elsewhere)
@ 2011-06-01 19:28 Peter Eisentraut <[email protected]>
parent: Alvaro Herrera <[email protected]>
0 siblings, 0 replies; 10+ messages in thread
From: Peter Eisentraut @ 2011-06-01 19:28 UTC (permalink / raw)
To: Alvaro Herrera <[email protected]>; +Cc: Tom Lane <[email protected]>; pgsql-docs
On Fri, 2011-05-20 at 08:16 -0400, Alvaro Herrera wrote:
> > > * Should we allow/use non-ASCII characters in the release notes?
> > > * What encoding should the HISTORY file have?
> >
> > Ideally "sure, if entity-ified", but I don't know what to do about
> > HISTORY.
>
> Can we recode that to plain ascii? I think iconv has a //TRANSLIT
> flag or something like that.
To make this work on FreeBSD, where we build the releases, we need to
use the following command:
"/usr/bin/perl" -p -e 's/<H(1|2)$/<H\1 align=center/g' HISTORY.html | LC_ALL=en_US.ISO8859-1 lynx -force_html -dump -nolist -stdin | iconv -f latin1 -t us-ascii//TRANSLIT > HISTORY
This also works on Linux/glibc, but FreeBSD is a bit stricter/more
limited. Not sure about other platforms, but I'd guess if they don't
have the required locales, they'd be no worse off than now anyway.
The results are reasonable. It actually depends on the platform
what //TRANSLIT does, e.g. on FreeBSD ö -> "o, on Linux ö -> o.
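Because the //TRANSLIT result differs between platforms, a platform-independent transliteration can be sketched with Unicode decomposition instead of iconv. This is a swapped-in technique for illustration only, not what the release build command above does. The function name `to_ascii` and the "?" placeholder are assumptions of this example.

```python
import unicodedata


def to_ascii(text: str, placeholder: str = "?") -> str:
    """Transliterate text to ASCII by stripping combining marks.

    Each non-ASCII character is decomposed with NFKD; the ASCII parts of
    the decomposition are kept and the accents are dropped. Characters
    with no ASCII part at all (for example the German sharp s) become
    the placeholder.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFKD", ch)
        kept = "".join(c for c in decomposed if ord(c) < 128)
        out.append(kept if kept else placeholder)
    return "".join(out)


if __name__ == "__main__":
    print(to_ascii("Álvaro Herrera"))
    print(to_ascii("déjà vu"))
```

With this approach ö always becomes o, which matches the Linux behaviour reported above rather than the FreeBSD one.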
* Re: non-ASCII characters in SGML documentation (and elsewhere)
@ 2011-10-12 21:21 Bruce Momjian <[email protected]>
parent: Alvaro Herrera <[email protected]>
0 siblings, 1 reply; 10+ messages in thread
From: Bruce Momjian @ 2011-10-12 21:21 UTC (permalink / raw)
To: Alvaro Herrera <[email protected]>; +Cc: Susanne Ebrecht <[email protected]>; Tom Lane <[email protected]>; Peter Eisentraut <[email protected]>; pgsql-docs
Alvaro Herrera wrote:
> Excerpts from Susanne Ebrecht's message of Fri May 20 09:04:26 -0400 2011:
> > On 20.05.2011 13:56, Tom Lane wrote:
> > >> * Should we allow non-ASCII characters in general source files?
> > > Prefer "no" here.
> >
> > I only see two reasons for non-ASCII signs in English.
> > Either it is a foreign name of e.g. a person
> > or it is a word that English took from French like in déjà vu.
>
> I'd like my name accented in the release notes, thanks.
Sure, you want the first "A" in Alvaro with an accent. I would love to
backpatch that but it would be a royal pain. I am afraid it can only
easily be done in future release notes.
I have added the proper markup to our release note checklist; patch
attached. Does anyone else want special handling for their name?
--
Bruce Momjian <[email protected]> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Attachments:
[text/x-diff] /rtmp/accent (502B, 2-%2Frtmp%2Faccent)
diff --git a/doc/src/sgml/release.sgml b/doc/src/sgml/release.sgml
new file mode 100644
index 15f273c..c860b90
*** a/doc/src/sgml/release.sgml
--- b/doc/src/sgml/release.sgml
*************** non-ASCII characters convert
*** 27,32 ****
--- 27,34 ----
does not support it
http://www.pemberley.com/janeinfo/latin1.html#latexta
+ Alvaro Herrera is &Aacute;lvaro Herrera
+
wrap long lines
For new features, add links to the documentation sections. Use </link>
* Re: non-ASCII characters in SGML documentation (and elsewhere)
@ 2011-10-24 15:28 Alvaro Herrera <[email protected]>
parent: Bruce Momjian <[email protected]>
0 siblings, 0 replies; 10+ messages in thread
From: Alvaro Herrera @ 2011-10-24 15:28 UTC (permalink / raw)
To: Bruce Momjian <[email protected]>; +Cc: Susanne Ebrecht <[email protected]>; Tom Lane <[email protected]>; Peter Eisentraut <[email protected]>; pgsql-docs
Excerpts from Bruce Momjian's message of Wed Oct 12 18:21:19 -0300 2011:
> Alvaro Herrera wrote:
> > Excerpts from Susanne Ebrecht's message of Fri May 20 09:04:26 -0400 2011:
> > > On 20.05.2011 13:56, Tom Lane wrote:
> > > >> * Should we allow non-ASCII characters in general source files?
> > > > Prefer "no" here.
> > >
> > > I only see two reasons for non-ASCII signs in English.
> > > Either it is a foreign name of e.g. a person
> > > or it is a word that English took from French like in déjà vu.
> >
> > I'd like my name accented in the release notes, thanks.
>
> Sure, you want the first "A" in Alvaro with an accent. I would love to
> backpatch that but it would be royal pain. I am afraid it can only
> easily be done in future release notes.
Many thanks, Bruce.
--
Álvaro Herrera <[email protected]>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
end of thread (last message 2011-10-24 15:28 UTC)
Thread overview: 10+ messages
2011-05-19 21:49 non-ASCII characters in SGML documentation (and elsewhere) Peter Eisentraut <[email protected]>
2011-05-20 08:44 ` Susanne Ebrecht <[email protected]>
2011-05-20 11:56 ` Tom Lane <[email protected]>
2011-05-20 12:16   ` Alvaro Herrera <[email protected]>
2011-06-01 19:28     ` Peter Eisentraut <[email protected]>
2011-05-20 13:04   ` Susanne Ebrecht <[email protected]>
2011-05-20 17:31     ` Alvaro Herrera <[email protected]>
2011-10-12 21:21       ` Bruce Momjian <[email protected]>
2011-10-24 15:28         ` Alvaro Herrera <[email protected]>
2011-05-31 20:25   ` Peter Eisentraut <[email protected]>