From: vignesh C
Date: Sun, 30 Mar 2025 06:48:46 +0530
Subject: Re: Speed up ICU case conversion by using ucasemap_utf8To*()
To: Andres Freund
Cc: Andreas Karlsson, pgsql-hackers, Jeff Davis
References: <167986ff-afcf-4542-94c6-61ee8474e138@proxel.se>

On Sun, 30 Mar 2025 at 00:20, Andres Freund wrote:
>
> On 2025-03-17 12:16:11 +0530, vignesh C wrote:
> > On Fri, 20 Dec 2024 at 10:50, Andreas Karlsson wrote:
> > >
> > > Hi,
> > >
> > > Jeff pointed out to me that the case conversion functions in ICU have
> > > UTF-8 specific versions which means we can call those directly if the
> > > database encoding is UTF-8 and skip having to convert to and from UChar.
> > >
> > > Since most people today run their databases in UTF-8 I think this
> > > optimization is worth it and when measuring on short to medium length
> > > strings I got a 15-20% speed up. It is still slower than glibc in my
> > > benchmarks but the gap is smaller now.
> > >
> > > SELECT count(upper) FROM (SELECT upper(('Kålhuvud ' || i) COLLATE
> > > "sv-SE-x-icu") FROM generate_series(1, 1000000) i);
> > >
> > > master: ~540 ms
> > > Patched: ~460 ms
> > > glibc: ~410 ms
> > >
> > > I have also attached a clean up patch for the non-UTF-8 code paths. I
> > > thought about doing the same for the new UTF-8 code paths but it turned
> > > out to be a bit messy due to different function signatures for
> > > ucasemap_utf8ToUpper() and ucasemap_utf8ToLower() vs ucasemap_utf8ToTitle().
> >
> > I noticed that Jeff's comments from [1] have not yet been addressed, I
> > have changed the commitfest entry status to "Waiting on Author",
> > please address them and update it to "Needs Review".
> >
> > [1] - https://www.postgresql.org/message-id/72c7c2b5848da44caddfe0f20f6c7ebc7c0c6e60.camel@j-davis.com
>
> It's also worth noting that this patch hasn't been building for quite a while
> (at least not since 2025-01-29):
>
> https://cirrus-ci.com/task/5621435164524544?logs=build#L1228
> [17:17:51.214] ld: error: undefined symbol: icu_convert_case
> [17:17:51.214] >>> referenced by pg_locale_icu.c:484 (../src/backend/utils/adt/pg_locale_icu.c:484)
> [17:17:51.214] >>> src/backend/postgres_lib.a.p/utils_adt_pg_locale_icu.c.o:(strfold_icu)
> [17:17:51.214] cc: error: linker command failed with exit code 1 (use -v to see invocation)
>
> I think we can mark this as returned-with-feedback for now?

Thanks, the commitfest entry is marked as returned with feedback.
@Andreas Karlsson Feel free to add a new commitfest entry when you have
addressed the feedback.

Regards,
Vignesh