From: vignesh C <[email protected]>
To: Andreas Karlsson <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: Jeff Davis <[email protected]>
Subject: Re: Speed up ICU case conversion by using ucasemap_utf8To*()
Date: Mon, 17 Mar 2025 12:16:11 +0530
Message-ID: <CALDaNm1yY_Jth4TkfLJr88hKEgtC6vPfomNnfPnYebe0QtQECQ@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <[email protected]>
On Fri, 20 Dec 2024 at 10:50, Andreas Karlsson <[email protected]> wrote:
>
> Hi,
>
> Jeff pointed out to me that the case conversion functions in ICU have
> UTF-8 specific versions which means we can call those directly if the
> database encoding is UTF-8 and skip having to convert to and from UChar.
>
> Since most people today run their databases in UTF-8 I think this
> optimization is worth it and when measuring on short to medium length
> strings I got a 15-20% speed up. It is still slower than glibc in my
> benchmarks but the gap is smaller now.
>
> SELECT count(upper) FROM (SELECT upper(('Kålhuvud ' || i) COLLATE
> "sv-SE-x-icu") FROM generate_series(1, 1000000) i);
>
> master: ~540 ms
> Patched: ~460 ms
> glibc: ~410 ms
>
> I have also attached a clean up patch for the non-UTF-8 code paths. I
> thought about doing the same for the new UTF-8 code paths but it turned
> out to be a bit messy due to different function signatures for
> ucasemap_utf8ToUpper() and ucasemap_utf8ToLower() vs ucasemap_utf8ToTitle().
I noticed that Jeff's comments from [1] have not yet been addressed, so
I have changed the commitfest entry status to "Waiting on Author".
Please address them and then update the status to "Needs Review".

[1] - https://www.postgresql.org/message-id/[email protected]

Regards,
Vignesh