From: Jeff Davis <[email protected]>
To: [email protected]
Subject: pgsql: Optimization for lower(), upper(), casefold() functions.
Date: Sat, 15 Mar 2025 20:04:41 +0000
Message-ID: <[email protected]> (raw)

Optimization for lower(), upper(), casefold() functions.

Improve performance and reduce table sizes for case mapping.

The main case mapping table stores only 16-bit offsets, which can be
used to look up the mapped code point in any of the case tables (fold,
lower, upper, or title case). Simple case pairs point to the same
offsets.
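
A minimal sketch of the layout described above, for illustration only.
It is not the code from unicode_case_table.h: the names (toy_main_table,
toy_case_tables, map_case), the table contents, and the dense indexing
by code point are all assumptions made to keep the example small. Only
the idea is taken from the commit message: one table of 16-bit offsets,
shared by every case kind, with both members of a simple case pair
storing the same offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the PostgreSQL API. */
enum case_kind { CASE_FOLD, CASE_LOWER, CASE_UPPER, CASE_TITLE, CASE_KINDS };

#define TOY_BASE    0x41u   /* 'A' */
#define TOY_LAST    0x62u   /* 'b' */
#define TOY_ENTRIES 3       /* entry 0 is reserved for "no mapping" */

/*
 * Main table: one 16-bit offset per code point in the toy range.
 * 'A' and 'a' form a simple case pair, so both store offset 1;
 * 'B' and 'b' both store offset 2. Unlisted entries default to 0.
 */
static const uint16_t toy_main_table[TOY_LAST - TOY_BASE + 1] = {
    [0x41 - TOY_BASE] = 1,
    [0x42 - TOY_BASE] = 2,
    [0x61 - TOY_BASE] = 1,
    [0x62 - TOY_BASE] = 2,
};

/* One array per case kind, all indexed by the same offset. */
static const uint32_t toy_case_tables[CASE_KINDS][TOY_ENTRIES] = {
    [CASE_FOLD]  = {0, 0x61, 0x62},
    [CASE_LOWER] = {0, 0x61, 0x62},
    [CASE_UPPER] = {0, 0x41, 0x42},
    [CASE_TITLE] = {0, 0x41, 0x42},
};

/* Map cp to the requested case; code points without a mapping map to themselves. */
static uint32_t
map_case(uint32_t cp, enum case_kind kind)
{
    uint16_t offset;

    if (cp < TOY_BASE || cp > TOY_LAST)
        return cp;

    offset = toy_main_table[cp - TOY_BASE];
    assert(offset < TOY_ENTRIES);

    return offset == 0 ? cp : toy_case_tables[kind][offset];
}
```

Because the offset does not depend on the case kind, the per-code-point
cost is a single 16-bit value rather than one full code point per kind.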

Generate a function in generate-unicode_case_table.pl that consists of
nested branches testing for specific code point ranges, which determine
the offset in the main table.
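
The shape of such a generated function might look roughly like the
sketch below. The function name toy_case_index, the three ranges, and
the index numbering are hypothetical and chosen only for illustration;
the real function is emitted by the Perl script from the Unicode data
and covers far more ranges. Index 0 stands for "no entry" here, which is
also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the index into a (hypothetical) main table for cp, or 0 if
 * cp falls outside every known range. The outer branches split the
 * code space coarsely; the inner branches test individual ranges.
 */
static uint16_t
toy_case_index(uint32_t cp)
{
    assert(cp <= 0x10FFFF);     /* valid Unicode code point */

    if (cp < 0x0080)
    {
        if (cp >= 0x0041 && cp < 0x005B)        /* A..Z: indexes 1..26 */
            return (uint16_t) (cp - 0x0041 + 1);
        if (cp >= 0x0061 && cp < 0x007B)        /* a..z: indexes 27..52 */
            return (uint16_t) (cp - 0x0061 + 27);
    }
    else if (cp < 0x0500)
    {
        if (cp >= 0x0391 && cp < 0x03A2)        /* Greek Alpha..Rho: 53..69 */
            return (uint16_t) (cp - 0x0391 + 53);
    }

    return 0;
}
```

Each contiguous range consumes only as many main-table slots as it has
code points, so gaps between ranges cost a comparison instead of table
space.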

Other approaches were considered: representing these ranges as a data
structure (rather than as branches in a generated function), a radix
tree, or perfect hashing. The author implemented and tested these
alternatives and settled on the generated branches.

Author: Alexander Borisov <[email protected]>
Reviewed-by: Heikki Linnakangas <[email protected]>
Discussion: https://postgr.es/m/7cac7e66-9a3b-4e3f-a997-42aa0c401f80%40gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/27bdec06841d1bb004ca7627eac97808b08a7ac7

Modified Files
--------------
src/common/unicode/generate-unicode_case_table.pl |   388 +-
src/common/unicode_case.c                         |    80 +-
src/include/common/unicode_case_table.h           | 16418 ++++++++++++++++----
3 files changed, 13671 insertions(+), 3215 deletions(-)


