Subject: Re: Change initdb default to the builtin collation provider
From: Jeff Davis
To: Peter Eisentraut, pgsql-hackers@postgresql.org
Date: Fri, 24 Oct 2025 09:54:45 -0700

On Fri, 2025-10-17 at 15:02 -0700, Jeff Davis wrote:
> On Fri, 2025-10-17 at 17:23 +0200, Peter Eisentraut wrote:
> > I remain violently opposed to this idea.  I don't understand how it
> > could be acceptable to just not provide a good display order by
> > default and have everyone rewrite their queries.
>
> I assume that you favor alternative 3 listed here[1], which is to use
> ICU "und" as the default. Is that correct? Or do you prefer to get
> the locale from the environment at initdb time?

Right now we're still stuck with the worst possible default: libc. Can
you make a more concrete counter-proposal here that sorts through some
of the details?

* Should we base the ICU locale on the environment, or just default
  everyone to the "und" locale?
* If ICU support is disabled, how does that affect the defaults?
* If using the environment, what happens if the locale is not
  supported by ICU (in particular "C" or "C.UTF-8")?
* What would be the default encoding, or should that come from the
  environment?
* The ICU provider has some weaknesses around non-UTF8 encodings
  because of casts from wchar_t and the use of tolower() in
  downcase_identifier(). Are those potential blockers, and if so, are
  they fixable?
* Can we try harder to find an acceptable way to use memcmp() for the
  indexes by default, at least primary keys, even if the database
  collation is ICU? I know that I've argued for this in the past and
  it's been soundly rejected[1], but some variation on this idea could
  be worthy of consideration.

Regards,
	Jeff Davis

[1]
https://www.postgresql.org/message-id/b7a9f32eee8d24518f791168bc6fb653d1f95f4d.camel@j-davis.com