Subject: Re: LOCALE C.UTF-8 on EDB Windows v17 server
From: "Daniel Verite"
To: "Dominique Devienne"
Cc: "Laurenz Albe", pgsql-general@postgresql.org
Date: Thu, 05 Jun 2025 22:57:24 +0200

Dominique Devienne wrote:

> So you're saying datcollate and datctype from pg_database are
> irrelevant to PostgreSQL itself, and only extensions might be affected?

Almost. An exception that still exists in v18, as far as I can see [1],
is the default full text search parser, which still uses libc functions
like iswdigit(), iswpunct(), iswspace()...
that depend on LC_CTYPE.

So you could see differences between OSes in tsvector contents
in a database with the builtin provider, unless you use LC_CTYPE=C.
But then the parsing is suboptimal, since the parser does not
recognize fancy Unicode punctuation signs or spaces as such.

Personally I would still care to set LC_CTYPE to a reasonable UTF-8
locale with v17 or v18.

[1]
https://doxygen.postgresql.org/wparser__def_8c.html#a420ea398a8a11db92412a2af7bf45e40

Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
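
To illustrate the LC_CTYPE=C point above, here is a rough Python model of the difference. It is not the parser's code and does not call libc: as an assumption, an ASCII-only table stands in for what iswspace()/iswpunct() see with LC_CTYPE=C, and Unicode general categories stand in for a UTF-8 locale. The helper names are hypothetical, and real libc implementations differ in details (for example in how they treat no-break spaces).

```python
import string
import unicodedata


def is_space_ascii(ch):
    """ASCII-only whitespace test (stand-in for LC_CTYPE=C)."""
    return ch in string.whitespace


def is_punct_ascii(ch):
    """ASCII-only punctuation test (stand-in for LC_CTYPE=C)."""
    return ch in string.punctuation


def is_space_unicode(ch):
    """Unicode-aware whitespace test (stand-in for a UTF-8 locale)."""
    # Z* covers space, line and paragraph separators; ASCII controls
    # such as tab and newline are category Cc, so check them separately.
    return unicodedata.category(ch).startswith("Z") or ch in string.whitespace


def is_punct_unicode(ch):
    """Unicode-aware punctuation test (stand-in for a UTF-8 locale)."""
    # P* is punctuation, S* is symbols; ASCII punctuation falls in both.
    return unicodedata.category(ch)[0] in ("P", "S")


if __name__ == "__main__":
    samples = [" ", "!", "\u2003", "\u2014", "\u00ab"]
    for ch in samples:
        print(
            f"U+{ord(ch):04X} {unicodedata.name(ch):<45}"
            f" ascii: space={is_space_ascii(ch)!s:<5} punct={is_punct_ascii(ch)!s:<5}"
            f" unicode: space={is_space_unicode(ch)!s:<5} punct={is_punct_unicode(ch)!s:<5}"
        )
```

Under this model, characters such as EM SPACE (U+2003) or EM DASH (U+2014) are neither space nor punctuation in the ASCII-only case, which is the kind of input where a parser relying on LC_CTYPE=C would tokenize differently from one running under a UTF-8 locale.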