Date: Sat, 29 Mar 2025 14:50:03 -0400
From: Andres Freund
To: vignesh C
Cc: Andreas Karlsson,
 pgsql-hackers, Jeff Davis
Subject: Re: Speed up ICU case conversion by using ucasemap_utf8To*()
References: <167986ff-afcf-4542-94c6-61ee8474e138@proxel.se>

On 2025-03-17 12:16:11 +0530, vignesh C wrote:
> On Fri, 20 Dec 2024 at 10:50, Andreas Karlsson wrote:
> >
> > Hi,
> >
> > Jeff pointed out to me that the case conversion functions in ICU have
> > UTF-8 specific versions which means we can call those directly if the
> > database encoding is UTF-8 and skip having to convert to and from UChar.
> >
> > Since most people today run their databases in UTF-8 I think this
> > optimization is worth it and when measuring on short to medium length
> > strings I got a 15-20% speed up. It is still slower than glibc in my
> > benchmarks but the gap is smaller now.
> >
> > SELECT count(upper) FROM (SELECT upper(('Kålhuvud ' || i) COLLATE
> > "sv-SE-x-icu") FROM generate_series(1, 1000000) i);
> >
> > master: ~540 ms
> > Patched: ~460 ms
> > glibc: ~410 ms
> >
> > I have also attached a clean up patch for the non-UTF-8 code paths. I
> > thought about doing the same for the new UTF-8 code paths but it turned
> > out to be a bit messy due to different function signatures for
> > ucasemap_utf8ToUpper() and ucasemap_utf8ToLower() vs ucasemap_utf8ToTitle().
>
> I noticed that Jeff's comments from [1] have not yet been addressed, I
> have changed the commitfest entry status to "Waiting on Author",
> please address them and update it to "Needs Review".
> [1] - https://www.postgresql.org/message-id/72c7c2b5848da44caddfe0f20f6c7ebc7c0c6e60.camel@j-davis.com

It's also worth noting that this patch hasn't been building for quite a while
(at least not since 2025-01-29):

https://cirrus-ci.com/task/5621435164524544?logs=build#L1228
[17:17:51.214] ld: error: undefined symbol: icu_convert_case
[17:17:51.214] >>> referenced by pg_locale_icu.c:484 (../src/backend/utils/adt/pg_locale_icu.c:484)
[17:17:51.214] >>> src/backend/postgres_lib.a.p/utils_adt_pg_locale_icu.c.o:(strfold_icu)
[17:17:51.214] cc: error: linker command failed with exit code 1 (use -v to see invocation)

I think we can mark this as returned-with-feedback for now?

Greetings,

Andres Freund