From: Robert Haas
Date: Wed, 11 Mar 2026 08:47:58 -0400
Subject: Re: Change initdb default to the builtin collation provider
To: Jeff Davis
Cc: pgsql-hackers@postgresql.org

On Tue, Mar 10, 2026 at 3:04 PM Jeff Davis wrote:
> If their environment's LC_CTYPE is UTF8-based, they already get UTF-8.
> If it isn't, we can either:
>
> (a) Fall back to LC_CTYPE=C, which is the only UTF8-compatible locale
> available everywhere. C is actually not a terrible fallback: it doesn't
> actually affect many things, because I have moved almost everything to
> use the database default locale.
>
> (b) Warn or error unless they explicitly specify the encoding with -E.
> But the former is likely to be ignored and the latter is not what I'd
> call "gentle".
>
> Which of these do you think is the right approach?

I'm a little confused as to how this relates to what you were asking
before. I thought you were proposing to pick UTF-8 rather than
SQL_ASCII when LC_CTYPE=C, but that's not on this list of options.
To be honest, I'd probably be ready to support making the default
encoding UTF8 regardless of the environment, and you have to use -E if
you want anything else. I think there are still people using other
encodings, but I believe it to be a small minority at this point.

> There's narrower question about what we do with LC_CTYPE=C. Currently
> we use SQL_ASCII encoding, which doesn't seem like a great default, and
> we could change that to default to UTF8. And another question about
> whether we change the meaning of --no-locale.

I think SQL_ASCII is a terrible default. Nobody actually wants that
unless they're trying to get out of a sticky situation. Making it
opt-in must be right. I do not know what the question about
--no-locale is.

> We sweat over single-digit performance regressions in fairly specific
> cases all the time, but here we're 3X slower for index builds:
>
> https://www.depesz.com/2024/06/11/how-much-speed-youre-leaving-at-the-table-if-you-use-default-locale/
>
> and 2-5X slower for Sort:
>
> https://www.postgresql.org/message-id/64039a2dbcba6f42ed2f32bb5f0371870a70afda.camel@j-davis.com
>
> and others don't seem very concerned, so I feel like I'm missing
> something.

At the end of the day, we're all just guessing. My experience working
for EDB is that we have a number of customers who care about sort
order quite a lot, and we've had to sweat blood to make them happy.
And, on a personal level, I have a hard time understanding why anyone
would be OK with a sort order that puts Álvaro after Zebra instead of
between Alvaro and Beatriz, because that seems extremely frustrating.
However, these are just personal biases.
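
The ordering difference described here is easy to reproduce outside the
database. The following is a minimal Python sketch, not PostgreSQL's
implementation: plain sorted() compares strings by code point, which
gives the same order a C-style collation produces for UTF-8 text, while
the accent-stripping key is only a crude, assumed stand-in for a
linguistic collation. The function names are hypothetical.

```python
import unicodedata

names = ["Zebra", "Álvaro", "Beatriz", "Alvaro"]


def code_point_order(values):
    """Sort by raw code point, as a C-style collation effectively does for UTF-8."""
    return sorted(values)


def accent_insensitive_key(value):
    """Approximate a linguistic sort key (assumption: not a real collation algorithm).

    Decompose the string, drop combining marks, fold case, and use the
    original string as a tie-breaker so the result is deterministic.
    """
    decomposed = unicodedata.normalize("NFD", value)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), value)


def linguistic_like_order(values):
    """Sort using the approximate linguistic key above."""
    return sorted(values, key=accent_insensitive_key)


if __name__ == "__main__":
    print(code_point_order(names))       # Álvaro lands after Zebra
    print(linguistic_like_order(names))  # Álvaro lands between Alvaro and Beatriz
```

A real linguistic collation (libc or ICU) applies far more elaborate
rules than stripping accents; the sketch only shows why the two orders
differ for this particular list.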
I'm much more likely to hear from the customers who care a lot about
the details of how something works than I am to hear from the
customers who are perfectly happy to take the defaults, because people
who are happy don't contact support at all and people who are unhappy
about relatively normal things get handled by support; I get the weird
cases. And everybody is going to have different experiences.

Presumably, your experience is that the indexing and sorting
performance is a big concern for the users you support, and that's why
you favor prioritizing that part of the experience. That's perfectly
legitimate, but it's different from my experience. My experience is
that when I tell people they can use collate "C" to speed up sorting,
they tell me that's a stupid workaround that doesn't give them the
answers that they want, which obviously colors my viewpoint on this
question in the same way that your experiences color yours.

-- 
Robert Haas
EDB: http://www.enterprisedb.com