Subject: Re: Speed up ICU case conversion by using ucasemap_utf8To*()
From: Jeff Davis
To: Andreas Karlsson, pgsql-hackers
Date: Fri, 20 Dec 2024 11:24:04 -0800
Message-ID: <72c7c2b5848da44caddfe0f20f6c7ebc7c0c6e60.camel@j-davis.com>
In-Reply-To: <167986ff-afcf-4542-94c6-61ee8474e138@proxel.se>

On Fri, 2024-12-20 at 06:20 +0100, Andreas Karlsson wrote:
> SELECT count(upper) FROM (SELECT upper(('Kålhuvud ' || i) COLLATE
> "sv-SE-x-icu") FROM generate_series(1, 1000000) i);
>
> master:  ~540 ms
> Patched: ~460 ms
> glibc:   ~410 ms

It looks like you are opening and closing the UCaseMap object each
time. Why not save it in pg_locale_t? That should speed it up even more
and hopefully beat libc.

Also, to support older ICU versions consistently, we need to fix up the
locale name to support "und"; cf. pg_ucol_open(). Perhaps factor out
that logic?

Regards,
	Jeff Davis
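
To make the "factor out that logic" suggestion concrete, here is a rough sketch of what a shared helper might look like. The function names (fix_und_locale_name, is_subtag_end) are hypothetical and not from the patch or from pg_locale.c. The sketch uses plain string handling and malloc() so it stands alone; the real code would presumably use uloc_getLanguage() to extract the language, palloc() for memory, and apply the fix-up only when building against an old ICU (as I recall, pg_ucol_open() does this for ICU 54 and earlier, rewriting "und" to "root"). It also only handles the lowercase spelling, which is a simplification.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: true if c can follow a complete language subtag,
 * i.e. end of string or a locale-name separator.
 */
static int
is_subtag_end(char c)
{
	return c == '\0' || c == '-' || c == '_' || c == '@';
}

/*
 * Hypothetical helper: return a malloc'd copy of loc_str in which a leading
 * "und" language subtag is replaced by "root".  Any other locale name is
 * copied unchanged.  Returns NULL on out-of-memory; the caller frees the
 * result.
 *
 * Assumption: the language subtag is found by a simple prefix check rather
 * than by asking ICU, and only lowercase "und" is recognized.
 */
char *
fix_und_locale_name(const char *loc_str)
{
	static const char und[] = "und";
	static const char root[] = "root";
	const size_t und_len = sizeof(und) - 1;
	char	   *result;

	assert(loc_str != NULL);

	if (strncmp(loc_str, und, und_len) == 0 &&
		is_subtag_end(loc_str[und_len]))
	{
		/* Keep everything after the language subtag as-is. */
		const char *remainder = loc_str + und_len;

		result = malloc(strlen(root) + strlen(remainder) + 1);
		if (result == NULL)
			return NULL;
		strcpy(result, root);
		strcat(result, remainder);
		return result;
	}

	/* Not an "und" locale: return a plain copy. */
	result = malloc(strlen(loc_str) + 1);
	if (result == NULL)
		return NULL;
	strcpy(result, loc_str);
	return result;
}
```

Both pg_ucol_open() and whatever opens the UCaseMap could then pass the fixed-up name to ICU, so the two code paths would treat "und" the same way on old ICU versions.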