Subject: Re: Change initdb default to the builtin collation provider
From: Jeff Davis
To: Robert Haas
Cc: pgsql-hackers@postgresql.org
Date: Wed, 11 Mar 2026 11:20:05 -0700
References: <47e1b4f72fe732c5ae85c6cf2c085b4e99a10120.camel@j-davis.com> <4309879ac305b1cf6b4d7b5fb85bc7b62c6ab768.camel@j-davis.com>

On Wed, 2026-03-11 at 08:47 -0400, Robert Haas wrote:
> At the end of the day, we're all just guessing.

Part of the reason for that is that changing collation is so difficult
that we have very few examples of users moving real workloads from one
collation to another.

> My experience working
> for EDB is that we have a number of customers who care about sort
> order quite a lot, and we've had to sweat blood to make them happy.

Thank you. I have one burning question: for these users who care deeply
about sort order, which scenario best describes their needs?

(a) they mostly work in a single locale (if so, does it match their
UNIX environment?); or

(b) one locale (which one?) is good enough for a variety of locales
because even if it's not perfect, it's still better than ASCII; or

(c) they somehow partition their data by locale and use multiple
locales; or

(d) they have a variety of indexes on the same column using different
collations to satisfy queries from users in different locales

I have found it very difficult to get an answer to that question.
When I press users for details (in the sample of users I've been able
to reach), usually they back off on the need for sort order, and
instead focus on case insensitivity (in which case I suggest the
builtin C.UTF-8).

> And, on a personal level, I have a hard time understanding why anyone
> would be OK with a sort order that puts Álvaro after Zebra instead of
> between Alvaro and Beatriz, because that seems extremely frustrating.

I tend to agree, and I wish we had a way to handle this at a
"presentation" layer rather than pushing the whole thing down into
indexes (storage layer).

In theory, pushing collation down to indexes could offer performance
advantages, but in practice humans don't read a lot of data, so a
post-processing step would be efficient in most cases.

> That's perfectly legitimate, but it's different from my
> experience. My experience is that when I tell people they can use
> collate "C" to speed up sorting, they tell me that's a stupid
> workaround that doesn't give them the answers that they want, which
> obviously colors my viewpoint on this question in the same way that
> your experiences color yours.

"C" is especially unappealing because it doesn't even get basic case
transformations right outside of ASCII.

Regards,
	Jeff Davis