public inbox for [email protected]  
requiring all .po files be UTF8-encoded
12+ messages / 5 participants

* requiring all .po files be UTF8-encoded
@ 2025-12-10 16:47  Álvaro Herrera <[email protected]>
  0 siblings, 2 replies; 12+ messages in thread

From: Álvaro Herrera @ 2025-12-10 16:47 UTC (permalink / raw)
  To: [email protected]

Hello,

There's an ongoing project to add a regression test to ensure all
platforms are correctly handling translations. [1]

[1] https://postgr.es/m/[email protected]

The conversation there is leading to requiring all translation files use
UTF-8 encoding.  In practice all live files already are UTF8 [2], so
there's no new requirement; but I think we should add some enforcing
mechanism (maybe a git hook and/or the website-building script refusing
to use a nonconformant file) to ensure we don't break things going
forward.

We can trivially convert all existing files with this one-liner:

for i in $(git grep 'Content-Type:' | grep -v UTF-8 | cut -d: -f1); do msgconv -t UTF-8 $i | sponge $i; done
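
A minimal sketch of the kind of enforcing check mentioned above, written as plain shell functions.  The names po_charset and check_po_utf8 are hypothetical, and the sketch assumes each .po file declares its encoding in the usual "Content-Type: ...; charset=..." header line:

```shell
# Hypothetical helper: print the charset declared in a .po file's header.
po_charset() {
    # The header entry carries a line such as
    #   "Content-Type: text/plain; charset=UTF-8\n"
    sed -n 's/^"Content-Type:.*charset=\([A-Za-z0-9_.:-]*\).*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: report every file whose declared charset is not
# UTF-8 and return non-zero if at least one such file was found.
check_po_utf8() {
    rc=0
    for f in "$@"; do
        # Compare case-insensitively by upper-casing the declared name.
        cs=$(po_charset "$f" | tr '[:lower:]' '[:upper:]')
        case "$cs" in
            UTF-8) ;;
            *) echo "$f: charset is '${cs:-unknown}', expected UTF-8" >&2
               rc=1 ;;
        esac
    done
    return "$rc"
}
```

In a pre-commit hook this could be called as
check_po_utf8 $(git diff --cached --name-only --diff-filter=ACM -- '*.po').
Note that this looks at the working-tree copy of each file rather than the staged content, and that it only trusts the declared charset without validating the bytes themselves.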


[2] Actually there is one exception -- nb/pg_config.po.  However, this
file is under the 80% translation requirement, so it should be removed;
moreover the Norwegian translation seems abandoned, having been done for
7.4, with a single update for 8.1 (pg_config.po) and never again touched.
Maybe we should remove all these files.

We have a few more dead languages: af fa fe fr fy id nl sk sl ta ro.
I guess we display those in our translation table out of a stubborn
expectation that translators will magically show up.  I think I would
rather hide those and not take up screen space -- maybe write two pages,
the normal main one without those languages, and a separate one that
includes them.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/






* Re: requiring all .po files be UTF8-encoded
@ 2025-12-10 16:53  Tom Lane <[email protected]>
  parent: Álvaro Herrera <[email protected]>
  1 sibling, 1 reply; 12+ messages in thread

From: Tom Lane @ 2025-12-10 16:53 UTC (permalink / raw)
  To: Álvaro Herrera <[email protected]>; +Cc: [email protected]

Álvaro Herrera <[email protected]> writes:
> We have a few more dead languages: af fa fe fr fy id nl sk sl ta ro.

Surely 'fr' is not dead?  It looks like its main backend translation
list has fallen a couple notches below 80%, but I cannot imagine that
we have no francophones willing to maintain it anymore.

			regards, tom lane






* Re: requiring all .po files be UTF8-encoded
@ 2025-12-10 17:06  Tom Lane <[email protected]>
  parent: Álvaro Herrera <[email protected]>
  1 sibling, 0 replies; 12+ messages in thread

From: Tom Lane @ 2025-12-10 17:06 UTC (permalink / raw)
  To: Álvaro Herrera <[email protected]>; +Cc: [email protected]

Álvaro Herrera <[email protected]> writes:
> There's an ongoing project to add a regression test to ensure all
> platforms are correctly handling translations. [1]
> [1] https://postgr.es/m/[email protected]

For the list archives' sake: that link seems wrong.  A more
relevant discussion for translation purposes is

https://www.postgresql.org/message-id/flat/292844.1765315339%40sss.pgh.pa.us

in which we discovered that Solaris' gettext implementation
doesn't handle transcoding of .mo files, and that we have to
know at build time which encoding they're in so as to create
appropriate symlinks.  Rather than write code to extract that
information, I proposed that we just institute a policy that
all our .po files should be in UTF-8.  We're apparently nearly
there already so far as actively-maintained .po files go.

			regards, tom lane






* Re: requiring all .po files be UTF8-encoded
@ 2025-12-10 17:46  Álvaro Herrera <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 12+ messages in thread

From: Álvaro Herrera @ 2025-12-10 17:46 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: [email protected]

On 2025-Dec-10, Tom Lane wrote:

> Álvaro Herrera <[email protected]> writes:
> > We have a few more dead languages: af fa fe fr fy id nl sk sl ta ro.
> 
> Surely 'fr' is not dead?  It looks like its main backend translation
> list has fallen a couple notches below 80%, but I cannot imagine that
> we have no francophones willing to maintain it anymore.

Weird, I mistyped "he hr hu" as "fe fr fy".  The right list of dead
translations is
 af fa he hr hu id nb nl ro sk sl ta

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Ninguna manada de bestias tiene una voz tan horrible como la humana" (Orual)






* Re: requiring all .po files be UTF8-encoded
@ 2025-12-10 21:19  Guillaume Lelarge <[email protected]>
  parent: Álvaro Herrera <[email protected]>
  0 siblings, 1 reply; 12+ messages in thread

From: Guillaume Lelarge @ 2025-12-10 21:19 UTC (permalink / raw)
  To: [email protected]

On 10/12/2025 18:46, Álvaro Herrera wrote:
> On 2025-Dec-10, Tom Lane wrote:
> 
>> Álvaro Herrera <[email protected]> writes:
>>> We have a few more dead languages: af fa fe fr fy id nl sk sl ta ro.
>>
>> Surely 'fr' is not dead?  It looks like its main backend translation
>> list has fallen a couple notches below 80%, but I cannot imagine that
>> we have no francophones willing to maintain it anymore.
> 
> Weird, I mistyped "he hr hu" as "fe fr fy".  The right list of dead
> translations is
>   af fa he hr hu id nb nl ro sk sl ta
> 

FWIW, I still work on the French translation of the .po files, though I
don't work anymore on the main backend translation (postgres.po). It's 
way too complicated to do a good translation of server messages, and it 
looks to me more important to do translation on the tools (psql, 
pg_basebackup, etc). So I work only on those translations.

If someone else wants to work on the main backend translation, that's 
fine by me, but I won't do it (though I can commit this translation).


-- 
Guillaume Lelarge
Consultant
https://dalibo.com






* Re: requiring all .po files be UTF8-encoded
@ 2025-12-11 00:15  Michael Paquier <[email protected]>
  parent: Guillaume Lelarge <[email protected]>
  0 siblings, 2 replies; 12+ messages in thread

From: Michael Paquier @ 2025-12-11 00:15 UTC (permalink / raw)
  To: Guillaume Lelarge <[email protected]>; +Cc: [email protected]

On Wed, Dec 10, 2025 at 10:19:15PM +0100, Guillaume Lelarge wrote:
> FWIW, I still work on the French translation of the .po files, though I
> don't work anymore on the main backend translation (postgres.po). It's way
> too complicated to do a good translation of server messages, and it looks to
> me more important to do translation on the tools (psql, pg_basebackup, etc).
> So I work only on those translations.
> 
> If someone else wants to work on the main backend translation, that's fine
> by me, but I won't do it (though I can commit this translation).

About how many messages are we talking about here?  It seems like I'm
one of these guys where the changes may not be that complicated to
translate based on how I'm dealing with the backend area on a daily
basis while being a native French speaker.  I am not much into the
latest evolutions of French in the last 10 years or so and I am seeing
a lot of English-ism these days, but the past messages should give a
reference good enough to provide consistent translations for the new
ones.
--
Michael



* Re: requiring all .po files be UTF8-encoded
@ 2025-12-11 06:51  Álvaro Herrera <[email protected]>
  parent: Michael Paquier <[email protected]>
  1 sibling, 1 reply; 12+ messages in thread

From: Álvaro Herrera @ 2025-12-11 06:51 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Guillaume Lelarge <[email protected]>; [email protected]

On 2025-Dec-11, Michael Paquier wrote:

> On Wed, Dec 10, 2025 at 10:19:15PM +0100, Guillaume Lelarge wrote:

> > If someone else wants to work on the main backend translation, that's fine
> > by me, but I won't do it (though I can commit this translation).
> 
> About how many messages are we talking about here?

postgres.po has 6411 strings.  The French translation is currently at
78%, and msgfmt says:

5096 translated messages, 988 fuzzy translations, 327 untranslated messages.

My impression is that the postgres.po catalog drops about 8%-10% for
each major release.
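
The percentage can be reproduced from the msgfmt summary line.  The following is a rough sketch, assuming the summary has the three-part form shown above; po_percent is a hypothetical helper name, and it uses truncating integer arithmetic, so the result may differ by a point from figures reported elsewhere:

```shell
# Hypothetical helper: turn a "msgfmt --statistics" summary into a
# whole-number percentage of translated messages.
po_percent() {
    printf '%s\n' "$1" | awk '/translated message/ {
        t = f = u = 0
        # Each count is the word just before its category name; a
        # category that is absent from the line simply stays at 0.
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^translated/)   t = $(i - 1)
            if ($i ~ /^fuzzy/)        f = $(i - 1)
            if ($i ~ /^untranslated/) u = $(i - 1)
        }
        total = t + f + u
        # Fuzzy and untranslated entries both count as not translated.
        if (total > 0) print int(t * 100 / total)
        exit
    }'
}
```

Since msgfmt writes its statistics to standard error, one way to feed the helper is
po_percent "$(msgfmt --statistics -o /dev/null postgres.po 2>&1)"
where postgres.po stands for whichever catalog is being checked.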

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Debido a que la velocidad de la luz es mucho mayor que la del sonido,
 algunas personas nos parecen brillantes un minuto antes
 de escuchar las pelotudeces que dicen." (Roberto Fontanarrosa)






* Re[2]: requiring all .po files be UTF8-encoded
@ 2025-12-11 11:45  Pavlo Golub <[email protected]>
  parent: Michael Paquier <[email protected]>
  1 sibling, 0 replies; 12+ messages in thread

From: Pavlo Golub @ 2025-12-11 11:45 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; Guillaume Lelarge <[email protected]>; +Cc: [email protected]

Hi

>About how many messages are we talking about here?  It seems like I'm
>one of these guys where the changes may not be that complicated to
>translate based on how I'm dealing with the backend area on a daily
>basis while being a native French speaker.  I am not much into the
>latest evolutions of French in the last 10 years or so and I am seeing
>a lot of English-ism these days, but the past messages should give a
>reference good enough to provide consistent translations for the new
>ones.

When I get stuck on a term, I often use Microsoft language resources
[1].

They are in quite good shape for popular languages, including FR.
There is an online tool to search source terms and translated terms [2].
TBX files are available as well [3]. We use these glossaries as part
of our translation workflow on Crowdin [4].

[1] 
https://learn.microsoft.com/en-us/globalization/reference/microsoft-language-resources
[2] 
https://msit.powerbi.com/view?r=eyJrIjoiODJmYjU4Y2YtM2M0ZC00YzYxLWE1YTktNzFjYmYxNTAxNjQ0IiwidCI6Ijcy...
[3] 
https://download.microsoft.com/download/b/2/d/b2db7a7c-8d33-47f3-b2c1-ee5e6445cf45/MicrosoftTermColl...
[4] https://crowdin.com/project/postgresql

Best regards,
Pavlo






* Re: requiring all .po files be UTF8-encoded
@ 2025-12-12 01:14  Michael Paquier <[email protected]>
  parent: Álvaro Herrera <[email protected]>
  0 siblings, 1 reply; 12+ messages in thread

From: Michael Paquier @ 2025-12-12 01:14 UTC (permalink / raw)
  To: Álvaro Herrera <[email protected]>; +Cc: Guillaume Lelarge <[email protected]>; [email protected]

On Thu, Dec 11, 2025 at 07:51:33AM +0100, Alvaro Herrera wrote:
> postgres.po has 6411 strings.  The French translation is currently at
> 78%, and msgfmt says:
> 
> 5096 translated messages, 988 fuzzy translations, 327 untranslated messages.
> 
> My impression is that the postgres.po catalog drops about 8%-10% for
> each major release.

Okay, I am new to this business, still I can see what Guillaume has
been doing in the repo for these things in the fr translation, which
is pgtranslation/messages.git, with master pointing to upstream
REL_18_STABLE, I guess.

I am also guessing that most folks just rely on something like po-mode
on emacs, which is available here:
https://manpage.me/docs/sharedocs/gettext/gettext_8.html

I have quickly tested it and that feels natural, I am not used to the
shortcuts yet but the docs are pretty clear.

I have a couple of stupid questions.  What's the flow of a translation
refresh?  Are the files first generated from the top of a stable
branch in the main Postgres repository using `make update-po` on a
periodic basis, then copied back to the translation repository for
further edits (line number updates, etc.)?  Is it more common to edit
the existing files in place, without pulling them from the main
repository?

Guillaume, I am likely not going to get that right in the first shot.
Would you mind reviewing some of the stuff?  Would it be OK to just
send a patch on this list?  If you have a po file that could serve as
a good first example, feel free to offer a suggestion or I would just
pick up one.  Say only for a couple of entries to get the full idea of
how things work.
--
Michael



* Re: requiring all .po files be UTF8-encoded
@ 2025-12-12 11:05  Guillaume Lelarge <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 1 reply; 12+ messages in thread

From: Guillaume Lelarge @ 2025-12-12 11:05 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; Álvaro Herrera <[email protected]>; +Cc: [email protected]

On 12/12/2025 02:14, Michael Paquier wrote:
> On Thu, Dec 11, 2025 at 07:51:33AM +0100, Alvaro Herrera wrote:
>> postgres.po has 6411 strings.  The French translation is currently at
>> 78%, and msgfmt says:
>>
>> 5096 translated messages, 988 fuzzy translations, 327 untranslated messages.
>>
>> My impression is that the postgres.po catalog drops about 8%-10% for
>> each major release.
> 
> Okay, I am new to this business, still I can see what Guillaume has
> been doing in the repo for these things in the fr translation, which
> is pgtranslation/messages.git, with master pointing to upstream
> REL_18_STABLE, I guess.
> 

That's right.

> I am also guessing that most folks just rely on something like po-mode
> on emacs, which is available here:
> https://manpage.me/docs/sharedocs/gettext/gettext_8.html
> 

I don't know about that. I'm using poedit.

> I have quickly tested it and that feels natural, I am not used to the
> shortcuts yet but the docs are pretty clear.
> 
> I have a couple of stupid questions.  What's the flow of a translation
> refresh?  Are the files first generated from the top of a stable
> branch in the main Postgres repository using `make update-po` on a
> periodic basis, then copied back to the translation repository for
> further edits (line number updates, etc.)?  Is it more common to edit
> the existing files in place, without pulling them from the main
> repository?
> 

Sounds more like a question for Peter or Alvaro :)

> Guillaume, I am likely not going to get that right in the first shot.
> Would you mind reviewing some of the stuff?  Would it be OK to just
> send a patch on this list?  If you have a po file that could serve as
> a good first example, feel free to offer a suggestion or I would just
> pick up one.  Say only for a couple of entries to get the full idea of
> how things work.

I don't mind reviewing, though you should not send a patch. You send the 
whole file. Otherwise, it's a complete nightmare.

There's not really a "good first example". Just pick the first small 
file you can find, and go with it.

Thanks.


-- 
Guillaume Lelarge
Consultant
https://dalibo.com






* Re: requiring all .po files be UTF8-encoded
@ 2025-12-12 13:04  Álvaro Herrera <[email protected]>
  parent: Guillaume Lelarge <[email protected]>
  0 siblings, 1 reply; 12+ messages in thread

From: Álvaro Herrera @ 2025-12-12 13:04 UTC (permalink / raw)
  To: Guillaume Lelarge <[email protected]>; +Cc: Michael Paquier <[email protected]>; [email protected]

On 2025-Dec-12, Guillaume Lelarge wrote:

> On 12/12/2025 02:14, Michael Paquier wrote:

> > Guillaume, I am likely not going to get that right in the first shot.
> > Would you mind reviewing some of the stuff?  Would it be OK to just
> > send a patch on this list?  If you have a po file that could serve as
> > a good first example, feel free to offer a suggestion or I would just
> > pick up one.  Say only for a couple of entries to get the full idea of
> > how things work.
> 
> I don't mind reviewing, though you should not send a patch. You send the
> whole file. Otherwise, it's a complete nightmare.

This is why I do separate commits with mechanical updates, then further
commits with translation updates.  By separating the two, it's actually
very easy to review the new translations.

I have a bunch of scripts with which I've formed what I find is a
practical workflow for people using old-school text editors.  (I use
Vim, but I imagine they would apply equally well to Emacs).  If anybody
is interested, I can share them.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/






* Re: requiring all .po files be UTF8-encoded
@ 2025-12-14 23:42  Michael Paquier <[email protected]>
  parent: Álvaro Herrera <[email protected]>
  0 siblings, 0 replies; 12+ messages in thread

From: Michael Paquier @ 2025-12-14 23:42 UTC (permalink / raw)
  To: Álvaro Herrera <[email protected]>; +Cc: Guillaume Lelarge <[email protected]>; [email protected]

On Fri, Dec 12, 2025 at 02:04:45PM +0100, Alvaro Herrera wrote:
> This is why I do separate commits with mechanical updates, then further
> commits with translation updates.  By separating the two, it's actually
> very easy to review the new translations.

So do you pull into the translation tree the .po files generated in
the main Postgres repository on a regular basis?  As far as I can see
from the commit history, that's what is happening.

> I have a bunch of scripts with which I've formed what I find is a
> practical workflow for people using old-school text editors.  (I use
> Vim, but I imagine they would apply equally well to Emacs).  If anybody
> is interested, I can share them.

Yes, I'd be interested to look at what you have, for inspiration.  At
least it would be good to not have to duplicate the work that's
required to copy the files from the main tree after an update-po back
to the translation repo.  The names of the .po files in the
translation repo are based on the paths to the .po files in the main
tree, as far as I can see.
--
Michael



