From: Peter Geoghegan
Date: Tue, 28 Jun 2016 15:43:18 -0700
Subject: Re: Pg_upgrade and collation
To: Bruce Momjian
Cc: Alvaro Herrera, PostgreSQL-documentation

On Tue, Jun 28, 2016 at 3:20 PM, Bruce Momjian wrote:
>> I have long advocated adopting ICU as our defacto standard "collation
>> provider", primarily so that we can directly control collations and
>> collation versioning. I think that doing this would solve many
>> problems. Besides, even SQLite has optional ICU support. PostgreSQL is
>> the only major database system that I'm aware of that relies on
>> operating system collations exclusively.
>
> I am hopeful ICU has improved enough since we last researched that
> support for it will soon be added.

There is a patch available that is not ready to be submitted, and doesn't have a real advocate, but is at least enough to convince me that it's very doable.
Performance is certainly no impediment to adopting ICU, even without considering that it effectively re-introduces abbreviated keys for text when the C collation is not used.

The best argument for ICU is the evidently lax attitude that the glibc people have towards the correctness and consistency of their collations:

https://bugzilla.redhat.com/show_bug.cgi?id=1320356#c3

Here, Carlos O'Donell, a glibc committer, says "Regarding (b), the collations in glibc may change from build to build depending on changes in the algorithms or locales. You cannot rely on the collation stay the same once the process exits (nor can you rely upon it via a shared memory mapping to another process sorting strings in memory)". Frankly, we have no excuse for not heeding his warning.

I'm not annoyed at the glibc people for taking this position. There is, quite simply, a misalignment of incentives. For the glibc people, the assumption is that any problem with collations leads only to slight annoyance from end users, as when a GUI produces subtly wrong ordering. For us, any inconsistency is an extremely serious problem. Here we have the maintainers of glibc telling us that they consider it okay for such changes to happen at any time. Surely that isn't good enough.

ICU as a project has every incentive to see things the same way as we do. The library explicitly decouples collation rule versions from algorithm versions. All of this is carefully considered, for the benefit of the numerous major database systems that use ICU.

--
Peter Geoghegan