Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1viaOO-00ARwU-1S for pgsql-hackers@arkaria.postgresql.org; Wed, 21 Jan 2026 15:45:12 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1viaON-007wbn-1x for pgsql-hackers@arkaria.postgresql.org; Wed, 21 Jan 2026 15:45:11 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1viaON-007wbf-12 for pgsql-hackers@lists.postgresql.org; Wed, 21 Jan 2026 15:45:11 +0000 Received: from mail-wr1-x429.google.com ([2a00:1450:4864:20::429]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1viaOL-001jy3-1P for pgsql-hackers@postgresql.org; Wed, 21 Jan 2026 15:45:11 +0000 Received: by mail-wr1-x429.google.com with SMTP id ffacd0b85a97d-42fb0fc5aa9so3878191f8f.1 for ; Wed, 21 Jan 2026 07:45:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1769010308; x=1769615108; darn=postgresql.org; h=content-transfer-encoding:in-reply-to:from:content-language :references:to:subject:user-agent:mime-version:date:message-id:from :to:cc:subject:date:message-id:reply-to; bh=wRKndQp5AZJIDeZmk32om+mWJfeXqkhhS7PHUFwXmLM=; b=NbliVbSM3znkr8T10yUR0TWLmcVEWqyDsXlsTzpHbforrfZzCOYxXAIHmWxo670Eer UsMWAHc+CgtUnvM58tuTuc/IYsg9DLi3zvqw1RQJ3lBDB3fbdyLx7nSD8VBaRJFPgyyp azCvcJjh3KPTTBPN+79psh9SdsIt3xVUT6xcqOavEA7iMyeSJB/AR0BA2BjMoaaxyGsg fAzdmyJbPjafGHc/0/laQxQlHY6A0vLQBAFTwSKZx3Ezt+0NSUbd3i8gIABSmgjsaA6W JAXSfNQG/ElaLkT0IvFiyBMqpN7r4T3YdbO311Fz2VPKPmWKTbu8qFtgBUdUEzNprp6d hHnw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1769010308; x=1769615108; h=content-transfer-encoding:in-reply-to:from:content-language :references:to:subject:user-agent:mime-version:date:message-id :x-gm-gg:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=wRKndQp5AZJIDeZmk32om+mWJfeXqkhhS7PHUFwXmLM=; b=ePwI3TeQJptbTA06yUxC7gXDbvKg33KIo5nZ9HzDCk2Kys0bElcsCQuvkHjuVVRqUe nGk4v2kWKH8Ppc0g64CnHFfAGjLW6MVoIaotwdpeLftQyKzeevEbVGuFVEbLisMZBa0P V9MHNhZTH2FRqDUG2isBkX45pSadAXmijKyVKBYCfJb7JiU+f4jL2OFdJWaISMQl/b1J /rqeG9QTjXKeEja3eJcLdCW40npgOVnl/4h/Ok6wotiPzf8wU3G+A5rGWQSy2dCz127K hHAY4FpAGTZBEaLbwpwrjsdVpLC44pyrwBXFBiVx7tdYhVFUQobjJarilBj6xmF0hcLS Z6Jw== X-Forwarded-Encrypted: i=1; AJvYcCWamYmlxrzMoe/dPZ7evr2YC3uW3GEXzl3LJXl2Mw02FWnVee1ts11unuiuQFx8j43NhrZZwwoP5TF/s0rf@postgresql.org X-Gm-Message-State: AOJu0Yxc0KXRyffvRS84bBNNXhuXj6VhF73ipwSogYdPbhRAvfrGuwvX 4I3jhMVTuwY7cdhsxnrMxgWbZXK+Qz+5M2zdHoIMrC8OFR1wzu4ErovB X-Gm-Gg: AZuq6aI+Cv7U+gtlJfsGdC7s4g8cJpTRW+HYAklVH6FzBnaH5+z+unzDhvsQnZZhQ/g QeU7xRiFIdAZAQeVkMXUvNi0pUurLcRvI5Vefe+mS/cYZhih7phdtkkReMCcr5O/vwq3yGsmABW 4JBhDAr0beZzzBhkDM1rHxmolv9wMPsFI/zUCLJ7sOkklu5PYwopoLbKM5XaI/qHEcupOeonwJG 2Nh772Ml7Ar+qGvM/kjD627IzG+sIn2p5KmEyW6LgLmZ+Me0a5kazNHutY3box/YIVJAhwfZqCk TDAFCeeJgKvZGrHlbeWc0tmJd42DhyYEFWjNoL0kFXw8rtAeU3x9dNf94hAx7wrjxIp210A9Nnn kJ8LQkhYk9WPbYP5OJNmUfR3BiDMmiurRam7MAZNMYQWFBQEVFsglA5+8ZAaS0NRfDj21Hg3V0h uMUYKEvUlTBalcpJx58Q== X-Received: by 2002:a05:6000:25c7:b0:432:5bf9:cf15 with SMTP id ffacd0b85a97d-4356a024907mr29051585f8f.5.1769010307726; Wed, 21 Jan 2026 07:45:07 -0800 (PST) Received: from [192.168.2.32] ([147.161.235.4]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-4358e24cef3sm12461496f8f.0.2026.01.21.07.45.06 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 21 Jan 2026 07:45:07 -0800 (PST) Message-ID: <66620ec7-0f81-4813-9cf1-b901a56efcc3@gmail.com> Date: Wed, 21 Jan 2026 16:45:06 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: Reduce build times of pg_trgm GIN indexes To: Heikki Linnakangas , pgsql-hackers References: <5d366878-2007-4d31-861e-19294b7a583b@gmail.com> <9ac3931a-180e-4283-a7a8-05eb66099206@iki.fi> <2e11134f-02c3-43da-8c39-fb520a1a251d@iki.fi> Content-Language: en-US From: David Geier In-Reply-To: <2e11134f-02c3-43da-8c39-fb520a1a251d@iki.fi> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk >> Oh, that's evil. I had tested that specifically. But it only worked >> because the code in master uses str_tolower() with >> DEFAULT_COLLATION_OID. So using a different locale like in the following >> example does something different than when creating a database with the >> same locale. >> >> postgres=# select lower('III' COLLATE "tr_TR"); >>   lower >> ------- >>   ııı >> >> postgres=# select show_trgm('III' COLLATE "tr_TR"); >>          show_trgm >> ------------------------- >>   {"  i"," ii","ii ",iii} >> (1 row) >> >> But when using tr_TR as default locale of the database the following >> happens: >> >> postgres=# select lower('III' COLLATE "tr_TR"); >>   lower >> ------- >>   ııı >> >> postgres=# select show_trgm('III');sü >>                 show_trgm >> --------------------------------------- >>   {0xbbd8dd,0xf26fab,0xf31e1a,0x2af4f1} >> >> I'm wondering if that's intentional to begin with. Shouldn't the code >> instead pass PG_GET_COLLATION() to str_tolower()? Might require some >> research to see how other index types handle locales. >> >> Coming back to the original problem: the lengthy comment at the top of >> pg_locale_libc.c, suggests that in some cases ASCII characters are >> handled the pg_ascii_tolower() way for the default locale. See for >> example tolower_libc_mb(). So a character by character conversion using >> that function will yield a different result than strlower_libc_mb(). I'm >> wondering why that is. > > Hmm, yeah, that feels funny. The trigram code predates per-column > collation support, so I guess we never really thought through how it > should interact with COLLATE clauses. I've written a patch to fix that. See [1]. >> Anyways, we could limit the optimization to only kick in when the used >> locale follows the same rules as pg_ascii_tolower(). We could test that >> when creating the locale and store that info in pg_locale_struct. > > I think that's only possible for libc locales, which operate one > character at a time. In ICU locales, lower-casing a character can depend > on the surrounding characters, so you cannot just test the conversion of > every ascii character individually. It would make sense for libc locales > though, and I hope the ICU functions are a little faster anyway. > > Although, we probably should be using case-folding rather than lower- > casing with ICU locales anyway. Case-folding is designed for string > matching. It'd be a backwards-compatibility breaking change, though. Oh, I wasn't ware of that. Doing it only for libc locales seems still useful. Good point with the casefolding. I'll look into that. How do we usually go about such backwards-compatibility breaking changes? Could we have pg_upgrade reindex all GIN indexes? Would that be acceptable? [1] https://www.postgresql.org/message-id/flat/db087c3e-230e-4119-8a03-8b5d74956bc2%40gmail.com -- David Geier