From: Dave Page
Date: Mon, 22 Jul 2024 13:19:33 +0100
Subject: Re: Windows installation problem at post-install step
To: Sandeep Thakkar
Cc: Thomas Munro, Ertan Küçükoglu, Adrian Klaver, pgsql-general@lists.postgresql.org

Hi

On Mon, Jul 22, 2024 at 1:02 PM Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> wrote:


On Mon, Jul 22, 2024 at 5:21 PM Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> wrote:
Hi,

EDB's Windows installer gets the locales on the system using https://github.com/EnterpriseDB/edb-installers/blob/REL-16/server/scripts/windows/getlocales/getlocales.cpp and then substitutes some patterns (https://github.com/EnterpriseDB/edb-installers/blob/REL-16/server/pgserver.xml.in#L2850). I'm not sure why we do that, but that is old code and @Dave Page probably knows. I'm also not sure whether that piece of code is responsible for the change in encoding in this case.
It was to work around limitations in the way we could return data from an external program to BitRock InstallBuilder. I forget the precise details as it was something like 15 years ago, but essentially BitRock couldn't read output that contained (certain?) non-alphanumeric characters, so I had to do that crazy encode/decode dance.
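
For illustration only, here is a minimal sketch of that kind of encode/decode round trip. The escape scheme (the letter Z followed by six hex digits) and the function names are assumptions made up for this example; it is not the substitution that getlocales.cpp and pgserver.xml.in actually perform.

```python
import string

# Hypothetical scheme: every character outside SAFE (and the escape
# letter itself) is written as ESCAPE followed by six hex digits of its
# code point, so the encoded text is purely ASCII alphanumeric.
ESCAPE = "Z"
SAFE = set(string.ascii_letters + string.digits) - {ESCAPE}


def encode_name(name: str) -> str:
    """Return an ASCII-alphanumeric encoding of name."""
    return "".join(
        ch if ch in SAFE else f"{ESCAPE}{ord(ch):06x}" for ch in name
    )


def decode_name(encoded: str) -> str:
    """Invert encode_name()."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == ESCAPE:
            # The six characters after ESCAPE are the hex code point.
            out.append(chr(int(encoded[i + 1:i + 7], 16)))
            i += 7
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)
```

With this scheme a name such as "Turkish_Türkiye.1254" is passed along using only ASCII letters and digits, and decode_name() recovers the original string on the other side.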

When I checked the installation log shared by Ertan, I do see that the locale passed to the initcluster script is the same as the one returned by the getlocales executable.

Executing C:\Windows\System32\cscript //NoLogo "C:\Program Files\PostgreSQL\16/installer/server/initcluster.vbs" "NT AUTHORITY\NetworkService" "postgres" "****" "C:\Users\User1\AppData\Local\Temp/postgresql_installer_cd79fad8b7" "C:\Program Files\PostgreSQL\16" "C:\DATA_PG16" 5432 "Turkish,Türkiye" 0

Apologies for the top posting. Please ignore this thread; I've replied to another thread.
On Mon, Jul 22, 2024 at 6:43 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Mon, Jul 22, 2024 at 11:58 AM Ertan Küçükoglu <ertan.kucukoglu@gmail.com> wrote:
> On Sun, 21 Jul 2024 at 23:27, Thomas Munro <thomas.munro@gmail.com> wrote:
>> 2.  Some existing database clusters which had been installed with the
>> name "Turkish_Turkey.1254" became unstartable when the OS upgrade
>> renamed that locale to "Turkish_Türkiye.1254".  I'm trying to provide
>> a pathway[2] to fix such systems in core PostgreSQL in the next minor
>> release.  Everyone affected probably already found another way but at
>> least next time a country is renamed this might help with the next
>> point too.
>
> I was also hit by that OS update.
> There is a Microsoft tool for creating a locale installer
> https://www.microsoft.com/en-us/download/details.aspx?id=41158
> Using that tool and adding a second locale Turkish_Turkey.1254 (name before Microsoft update) in the OS can fix your broken PostgreSQL.
> I believe most people simply choose this path.
> There are also several blogs/articles written in Turkish about the problem.

If that's easy and good enough then maybe I should abandon that
on-the-fly renaming patch and we should just do a little documentation
note...

>> 3.  I'd also like to teach initdb to use BCP47 names like "tr-TR"
>> instead of those names by default (ie if you don't specify a locale
>> name explicitly), and have proposed that before[3] but it hasn't gone
>> in due to lack of testing/reviews from Windows users.  It seems like
>> that doesn't matter much in practice to all the people using the
>> popular EDB installer, since it apparently takes control of picking
>> the locale and explicitly passes it in (and screws up the encoding as
>> we have now learned).
>
> If I am not mistaken BCP47 names are already used in Linux systems.
> Using them would make PostgreSQL use the same locale names across Linux and Windows systems.

Not exactly.  POSIX systems use
[language[_territory][.codeset][@modifier]], but POSIX doesn't say
what any of those components are[1] (are they ISO country codes?
English words?  Hieroglyphs?), so, curiously, those Windows names like
"English_United States.1252" are probably POSIX-conforming.  Every
real POSIX system of course uses ISO language and country codes these
days (though I still recall other names being used years ago), so they
look similar to the simpler kinds of BCP47 tags, which are just
language-country with the same ISO codes but a different separator.
They diverge further once you get into the finer points with more
components.  Incidentally that lack of standardisation is the reason
you can't say that the glibc ".utf8" ending is "wrong", even though it
is obviously stupid :-p (all systems I know accept .UTF-8, 'cause
that's what Ken Thompson, Rob Pike and the Unicode standard called
it).  I suspect that Windows accepts the POSIX style en_US too, but
it's not what the manual tells you to use.
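
As a rough illustration of that naming scheme, the sketch below splits a name into the four POSIX components and, for the simplest case only, rewrites it as a BCP47-style tag. The regular expression, the function names and the "simple case" rule (2-3 lowercase letters plus a 2-letter uppercase territory) are assumptions for this example, not anything POSIX, Windows or PostgreSQL defines.

```python
import re
from typing import NamedTuple, Optional

# language[_territory][.codeset][@modifier]; POSIX does not constrain what
# the components contain, so only the separators are treated as special here.
_LOCALE_RE = re.compile(
    r"^(?P<language>[^_.@]+)"
    r"(?:_(?P<territory>[^.@]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


class LocaleName(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


def parse_locale_name(name: str) -> Optional[LocaleName]:
    """Split a POSIX-style locale name; return None if it does not fit."""
    m = _LOCALE_RE.match(name)
    return LocaleName(**m.groupdict()) if m else None


def to_simple_bcp47(name: str) -> Optional[str]:
    """Return 'll-CC' for names like 'll_CC[.codeset]', else None."""
    parsed = parse_locale_name(name)
    if parsed is None or parsed.territory is None or parsed.modifier:
        return None
    if re.fullmatch(r"[a-z]{2,3}", parsed.language) and re.fullmatch(
        r"[A-Z]{2}", parsed.territory
    ):
        return f"{parsed.language}-{parsed.territory}"
    return None
```

Under these assumptions "English_United States.1252" parses cleanly into language, territory and codeset, which is the sense in which it fits the POSIX pattern, while only names built from ISO codes (for example "tr_TR.UTF-8") map to a tag like "tr-TR".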

But really we shouldn't have to know or care how locales are named; we
should get the names from the OS in the first place, and then we
should remember them and give them back to the OS at the right times.
The problem here is that Windows has two kinds of names, one unstable over
time and with illegal (for us) characters in the name, and one stable;
we need to find all the places where the old unstable ones can get
into our system, and block them off.  I'm aware of two places now: the
EDB installer, and initdb's default for people who run it on the
command line without giving an explicit name.
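
One way to picture "blocking them off" is a check applied wherever a locale name enters the system. This is only a sketch under assumptions: it takes "illegal (for us)" to mean non-ASCII characters, and check_locale_name() is a hypothetical helper, not a PostgreSQL or installer function.

```python
def check_locale_name(name: str) -> str:
    """Return name unchanged if it is non-empty pure ASCII, else raise."""
    if not name or not name.isascii():
        raise ValueError(
            f"locale name {name!r} is empty or contains non-ASCII characters"
        )
    return name
```

This only catches the character problem. A name such as "Turkish_Turkey.1254" is pure ASCII yet was still renamed by the OS, which is the argument for preferring the stable kind of name, such as "tr-TR".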

> I can help with the testing part. Let me know the details, please.

Thanks!  I will rebase that patch, and CC you on the thread.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html


--
Sandeep Thakkar






--
Dave Page
VP, Chief Architect, Database Infrastructure
EDB: https://www.enterprisedb.com
