From: Thomas Munro
Date: Wed, 7 Aug 2024 09:08:44 +1200
Subject: Re: Windows installation problem at post-install step
To: pgsql-general@lists.postgresql.org

On Tue, Aug 6, 2024 at 11:44 PM Peter J. Holzer wrote:
> I assume that "1254" here is the code page.
> But you specified --encoding=UTF-8 above, so your default locale uses a
> different encoding than the template databases. I would expect that to
> cause problems if the template databases contain any characters where
> the encodings differ (such as "ü" in the locale name).

It's weird, but on Windows, PostgreSQL allows UTF-8 encoding with any
locale, and thus apparent contradictions:

    /* See notes in createdb() to understand these tests */
    if (!(locale_enc == user_enc ||
          locale_enc == PG_SQL_ASCII ||
          locale_enc == -1 ||
    #ifdef WIN32
          user_enc == PG_UTF8 ||
    #endif
          user_enc == PG_SQL_ASCII))
    {
        pg_log_error("encoding mismatch");

... and createdb's comments say that is acceptable because:

     * 3. selected encoding is UTF8 and platform is win32. This is because
     * UTF8 is a pseudo codepage that is supported in all locales since it's
     * converted to UTF16 before being used.

At the time PostgreSQL was ported to Windows, UTF-8 was not a supported
encoding in "char"-based system interfaces like strcoll_l(), and the
port had to convert to "wchar_t" interfaces and call (in that example)
wcscoll_l(). On modern Windows UTF-8 is supported there, and there are
two locale names, with and without the ".UTF-8" suffix (cf. glibc
systems that have "en_US" and "en_US.UTF-8", where the suffix-less
version uses whatever traditional encoding was used for that language
before UTF-8 ate the world). If we were doing the Windows port today,
we'd probably not have that special case for Windows, and we wouldn't
have the wchar_t conversions. Then I think we'd allow only:

--locale=tr-TR (defaults to --encoding=WIN1254)
--locale=tr-TR --encoding=WIN1254
--locale=tr-TR.UTF-8
--locale=tr-TR.UTF-8 --encoding=UTF-8

If we come up with an automated (or even manual but documented) way to
perform the "Turkish_Türkiye.1254" -> "tr-TR" upgrade as Dave was
suggesting upthread, we'll probably want to be careful to tidy up these
contradictory settings.
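
To make the effect of the quoted check concrete, here is a minimal, standalone sketch of the same condition. It is not PostgreSQL's code: the enum values and the function name encoding_accepted() are hypothetical stand-ins, and the #ifdef WIN32 is replaced by a runtime flag so both behaviours can be compared side by side.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for PostgreSQL's encoding IDs. The real
 * constants (PG_UTF8, PG_SQL_ASCII, ...) have different values.
 */
enum sketch_enc
{
    SK_UNKNOWN = -1,            /* locale's encoding could not be determined */
    SK_SQL_ASCII,
    SK_UTF8,
    SK_WIN1252,
    SK_WIN1254
};

/*
 * Same shape as the quoted test, with "on_windows" standing in for
 * #ifdef WIN32. Returns true when the combination would be accepted.
 */
bool
encoding_accepted(int locale_enc, int user_enc, bool on_windows)
{
    /* Assumption of this sketch: the user-selected encoding is concrete. */
    assert(user_enc != SK_UNKNOWN);

    return locale_enc == user_enc ||
           locale_enc == SK_SQL_ASCII ||
           locale_enc == SK_UNKNOWN ||
           (on_windows && user_enc == SK_UTF8) ||
           user_enc == SK_SQL_ASCII;
}
```

With these stand-ins, a locale whose encoding is SK_WIN1254 combined with SK_UTF8 is accepted when on_windows is true and rejected when it is false, which is the apparent contradiction described above.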

For example I guess that American databases initialised by EDB's
installer must be using --locale="English_United States.1252" and
--encoding=UTF-8, and should be changed to "en-US.UTF-8", while those
initialised by letting initdb.exe pick the encoding must be using
--locale="English_United States.1252" and --encoding=WIN1252 (implicit)
and should be changed to "en-US" to match the WIN1252 encoding. Only if
we did that update would we be able to consider removing the extra
UTF-16 conversions that are happening very frequently inside PostgreSQL
code, which is a waste of CPU cycles and programmer sanity.

(But that's all just speculation from studying the locale code -- I've
never really used Windows.)
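
The renaming rule guessed at above could be sketched roughly as follows. Everything here is hypothetical: no such function exists in PostgreSQL, the table holds only the two legacy names mentioned in this thread, and the rule "append .UTF-8 only when the database encoding is UTF-8" is just the guess from this message written out, with every other encoding mapped to the suffix-less name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table: legacy Windows locale name -> short locale name. */
struct locale_map
{
    const char *legacy;
    const char *short_name;
};

static const struct locale_map sketch_map[] = {
    {"English_United States.1252", "en-US"},
    {"Turkish_Türkiye.1254", "tr-TR"},
};

/*
 * Write the suggested replacement locale name into buf.
 * Returns 0 on success, -1 if the legacy name is not in the table or
 * buf is too small. A UTF-8 database gets the ".UTF-8" suffix; any
 * other encoding is assumed to match the suffix-less name.
 */
int
suggest_locale(const char *legacy, const char *encoding,
               char *buf, size_t buflen)
{
    assert(legacy != NULL && encoding != NULL && buf != NULL);

    for (size_t i = 0; i < sizeof(sketch_map) / sizeof(sketch_map[0]); i++)
    {
        if (strcmp(legacy, sketch_map[i].legacy) == 0)
        {
            int is_utf8 = strcmp(encoding, "UTF8") == 0 ||
                          strcmp(encoding, "UTF-8") == 0;
            int n = snprintf(buf, buflen, "%s%s",
                             sketch_map[i].short_name,
                             is_utf8 ? ".UTF-8" : "");

            /* snprintf reports the length it needed; reject truncation. */
            return (n >= 0 && (size_t) n < buflen) ? 0 : -1;
        }
    }
    return -1;
}
```

Called with "English_United States.1252" and "UTF-8", this yields "en-US.UTF-8"; with the same locale and "WIN1252" it yields "en-US". A real upgrade path would also have to verify that the non-UTF-8 encoding really is the one the suffix-less locale implies, which this sketch does not attempt.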