Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w64rz-003vee-06 for pgsql-hackers@arkaria.postgresql.org; Fri, 27 Mar 2026 10:56:51 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w64rx-009Cbt-11 for pgsql-hackers@arkaria.postgresql.org; Fri, 27 Mar 2026 10:56:49 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w64rx-009Cbl-04 for pgsql-hackers@lists.postgresql.org; Fri, 27 Mar 2026 10:56:49 +0000 Received: from mail-wm1-x329.google.com ([2a00:1450:4864:20::329]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w64rv-00000001ItE-1SWz for pgsql-hackers@postgresql.org; Fri, 27 Mar 2026 10:56:48 +0000 Received: by mail-wm1-x329.google.com with SMTP id 5b1f17b1804b1-4852b81c73aso17465725e9.3 for ; Fri, 27 Mar 2026 03:56:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1774609006; x=1775213806; darn=postgresql.org; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=1Oq3AEgfQW6c4d1Ct5TJXq99Jx2bycqIL46q3lXGxKg=; b=VT+7p++gz20LHmI7UBDa5MMXo7hjrGt6TnNgvOc/jydaoPL4o5TNbn6ALvF2wrmZlP JM2OGZEP64N6aqvOUxmRYGjaFlAncEAhlvePYbqKmpDyGuc1EvHwalB2FZ2Q5DP0rAWe ou6vTvJZpSPEjdrn/xN4ndLwUe+R1qslwyzj/Ask+oS8jhR7Vn9g0XhHsW4RGrk1GT6L JC9auS8s0aLkUhSMTLiUgyA2ZYKZMsHnyB9IyeR2cXOjqWdOdR8vqVl2LSut9/ThqKUj HQCS+5TvPMzHOGaUUla+1nJwBhKL0k6YBtwIXBoOrerUWgv5tcSCuqNvGk/n2Isdk+M7 4ygQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1774609006; x=1775213806; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :x-gm-gg:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=1Oq3AEgfQW6c4d1Ct5TJXq99Jx2bycqIL46q3lXGxKg=; b=alVHc4vsBRnMROFy9qtOo2dxSKdPSEsjQA7JJiG3rtq2rg1Ibo8ozJfM8ocqm7F0B+ C+/O6k0l5RbNEJPbKaYepRsNSQ4IQ2kcBl6KBJWenQK45DJMknS3cDE+y35Ous8xbnHd 29Q+Iv3mRrFJZl1OqdmUdCB1sjJg5sflNt7p5fNVEQJ8gspZXD82AMCD0WydA1bDaU48 w/keK/Ss3A0fvw5UAevjHJOB68SAjliXA2T09p7aUFd7PtLfjLEYLaG8Jrb1Em9uxjMr oWQfB6MfANweltKKmNYza7+g5Wq14gsG8nDVjmokiWjxo8ORRH3UHDQiyG8ax3YXrWyw ZTQw== X-Gm-Message-State: AOJu0Yy+VBxb2/EBG35EvcfpthwOcjNjjuhGUlt2GY9KMRO5100dMVJU IUgBqlVTo+8bDjUcc7lNKT1okWA66zmLzDVnLa1J4uzO2lFQ08W8MWGN X-Gm-Gg: ATEYQzwl1TS10apB9tqunOto5CTCay1HMIvDGVyGHu74p8afN/XtyPkdx6ZbBtFwydt OIASNVRcKDIU+tFCdcXpemH6poHAmONOLCKXd5sBtS05Q9u3ej7Aq3PmLTC1SLyCMfpG+XvZCEp /3LqraqDFtoPTDGipwqkJ/STBTGxxMq/ZiIpoT0dk9rFODqhSFOHD7JpxEjSB69vPo2VaOsBTv8 9kjbclfYHlPbtW9Zi/VKwTLnfY/yWMyvlg5AXnTX6coJlOa2kEjyNAcu8/fY0duQu5zy5g8hu46 /EJgdxd7zOQuPEdeTjjfdP0ZnwcwDGOHHnEpir6j31Z/gqcA6Ska87KFjxHWnVa3aYGIIMUMJfV 523cKHYLVMPEQMuRg7vLvh6QFrUNnRBZdB/1rzvai4VCvUR6COZSp2yAkUVrIxliwk+XCGJ7z6d J1ZrvtPW+RbRdPDD4FAPh+ X-Received: by 2002:a05:600c:a10a:b0:486:fbc8:8dfd with SMTP id 5b1f17b1804b1-48727f80089mr27241255e9.23.1774609005695; Fri, 27 Mar 2026 03:56:45 -0700 (PDT) Received: from [172.31.5.233] ([147.161.235.32]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-48722c6b105sm160048965e9.1.2026.03.27.03.56.44 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 27 Mar 2026 03:56:45 -0700 (PDT) Message-ID: Date: Fri, 27 Mar 2026 11:56:43 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: Use correct collation in pg_trgm To: Tom Lane , Jeff Davis Cc: pgsql-hackers , Heikki Linnakangas , Peter Eisentraut References: <2c15502fd399128ee27fbe1a305e006780159f66.camel@j-davis.com> <7e11acde-9d5b-49a1-9c41-23096d51d2e2@gmail.com> <677b2db1582d9eb9e099ea83856e610a5072dcd3.camel@j-davis.com> <1017962.1774549603@sss.pgh.pa.us> Content-Language: en-US From: David Geier In-Reply-To: <1017962.1774549603@sss.pgh.pa.us> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On 26.03.2026 19:26, Tom Lane wrote: > Jeff Davis writes: >> On Thu, 2026-03-26 at 09:50 +0100, David Geier wrote: >>> I agree. That is inconsistent. But if anything, shouldn't we change >>> tsvector/tsquery to as well adhere to the inferred collation? > >> I am not sure either way. >> It's easy to specify a COLLATE clause to affect the interpretation of >> the input. But once you parse the inputs into a stored value, you can't >> later reinterpret those values by specifying a COLLATE clause. The >> parsing already happened and the original input string was lost. >> You can end up with a table full of values, some of which were parsed >> with one set of semantics, and others parsed with a different set of >> semantics. That may make sense or it may just cause confusion. It's >> tough for me to say. > > The rule that text search goes by is that it's okay to be a bit > fuzzy about this because people are usually looking for approximate > matches, so that even if you have sets of lexemes that were extracted > under slightly different parsing rules you can probably still find tsquery allows to do starts-with queries equivalent to LIKE 'foo%' via to_tsquery('foo:*'). These two would then also behave differently. Can you see any good reason that speaks against using the inferred collation in tsquery / tsvector? > what you want. While that argument still works for pg_trgm's original > "similarity" functions, it falls flat for the LIKE/ILIKE/regex index > support functionality: people will be justifiably unhappy if the index > doesn't find the exact same matches that a seqscan-and-filter would. Agreed. That was also one of the motivations to change it. > I've not experimented, but I rather imagine that things are already > buggy as heck, in that optimizing a LIKE or regex expression that's > got collation A applied to it into an indexscan on a pg_trgm index > made with collation B will not work if different trigrams get > extracted. I think we have to insist that the index collation match > the query. Once we've done that, the concern about making a change > like this seems less: you will not get wrong answers, rather the > planner will refuse to use an incompatible index. I thought that happens already. In the following example no index scan is used, even though sequential scan is disabled. FWICS, IndexCollMatchesExprColl() takes care of that. CREATE EXTENSION pg_trgm; CREATE TABLE test(col TEXT COLLATE "tr-x-icu"); CREATE INDEX ON test USING GIN(col gin_trgm_ops); SET enable_seqscan = FALSE; EXPLAIN SELECT * FROM test WHERE col LIKE '%test%' COLLATE "C"; QUERY PLAN ------------------------------------------------------------------------- Seq Scan on test (cost=10000000000.00..10000000001.01 rows=1 width=32) Filter: (col = 'test'::text COLLATE "C") If you have other cases in mind, pointers are appreciated. -- David Geier