From: vignesh C
Date: Mon, 17 Mar 2025 12:16:11 +0530
Subject: Re: Speed up ICU case conversion by using ucasemap_utf8To*()
To: Andreas Karlsson
Cc: pgsql-hackers, Jeff Davis
In-Reply-To: <167986ff-afcf-4542-94c6-61ee8474e138@proxel.se>

On Fri, 20 Dec 2024 at 10:50, Andreas Karlsson wrote:
>
> Hi,
>
> Jeff pointed out to me that the case conversion functions in ICU have
> UTF-8 specific versions which means we can call those directly if the
> database encoding is UTF-8 and skip having to convert to and from UChar.
>
> Since most people today run their databases in UTF-8 I think this
> optimization is worth it and when measuring on short to medium length
> strings I got a 15-20% speed up. It is still slower than glibc in my
> benchmarks but the gap is smaller now.
>
> SELECT count(upper) FROM (SELECT upper(('Kålhuvud ' || i) COLLATE
> "sv-SE-x-icu") FROM generate_series(1, 1000000) i);
>
> master: ~540 ms
> Patched: ~460 ms
> glibc: ~410 ms
>
> I have also attached a clean up patch for the non-UTF-8 code paths. I
> thought about doing the same for the new UTF-8 code paths but it turned
> out to be a bit messy due to different function signatures for
> ucasemap_utf8ToUpper() and ucasemap_utf8ToLower() vs ucasemap_utf8ToTitle().

I noticed that Jeff's comments from [1] have not yet been addressed, so I
have changed the commitfest entry status to "Waiting on Author". Please
address them and then update the status to "Needs Review".

[1] - https://www.postgresql.org/message-id/72c7c2b5848da44caddfe0f20f6c7ebc7c0c6e60.camel@j-davis.com

Regards,
Vignesh