Message-ID: <22d5245c9c5a9aa05a0510bdd52458812140a870.camel@cybertec.at>
Subject: Re: Mailing list search engine: surprising missing results?
From: Laurenz Albe
To: Oleg Bartunov, Tom Lane
Cc: Bruce Momjian, James Addison, PostgreSQL WWW
Date: Tue, 25 Jan 2022 13:43:48 +0100
References: <2150096.1643057249@sss.pgh.pa.us>

On Tue, 2022-01-25 at 14:04 +0300, Oleg Bartunov wrote:
> On Mon, Jan 24, 2022 at 11:47 PM Tom Lane wrote:
> > Bruce Momjian writes:
> > > On Mon, Jan 24, 2022 at 08:27:41AM +0100, Laurenz Albe wrote:
> > > > The reason is that the 'moore' in 'boyer-moore' is stemmed, since it
> > > > is at the end of the word, while the 'moore' in 'Boyer-Moore-Horspool'
> > > > isn't:
> > >
> > > Wow, he showed me this problem earlier but I never suspected it was
> > > a stemming issue because I never considered proper nouns could be
> > > stem-adjusted, but it is obvious they can.
> >
> > I wonder if we should change that so that components of a compound
> > word are consistently stemmed the same way.
>
> Something like this
>
> SELECT to_tsvector('english', 'Boyer-Moore-Horspool');
>                        to_tsvector
> ----------------------------------------------------------
>  'boyer':2 'boyer-moore-horspool':1 'boyer-moore':1  'moore-horspool':1  'horspool':4 'moor':3
> (1 row)

Not quite. The problem in question is the "'boyer-moore':1".
If that were "'boyer-moor':1" instead, the problem would disappear.

Yours,
Laurenz Albe
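
To make the difference concrete, here is a small Python toy model of the matching logic discussed above. It is only a sketch under stated assumptions: `toy_stem`, `lexemes`, `matches` and the `sub_compounds` switch are hypothetical names invented for illustration, the "stemmer" merely drops one trailing "e", and none of this is PostgreSQL's actual parser or its Snowball stemmer. It assumes that a hyphenated word yields the whole compound (stemmed as one unit) plus each stemmed part, and that a query matches only if all of its lexemes are present in the document.

```python
def toy_stem(word):
    """Hypothetical stand-in for the English stemmer: drop one trailing 'e'."""
    return word[:-1] if word.endswith("e") else word


def lexemes(text, sub_compounds=None):
    """Return the toy lexeme set for one hyphenated word.

    sub_compounds controls the extra entries for inner compounds such as
    'boyer-moore' inside 'Boyer-Moore-Horspool':
      None      -> no extra entries (assumed current behaviour)
      "raw"     -> added unstemmed, as in the quoted proposal
      "stemmed" -> added after stemming, as suggested in the reply
    """
    parts = text.lower().split("-")
    result = {toy_stem("-".join(parts))}       # whole compound, stemmed as a unit
    result.update(toy_stem(p) for p in parts)  # each component, stemmed
    if sub_compounds is not None:
        for size in range(2, len(parts)):      # proper sub-compounds only
            for start in range(len(parts) - size + 1):
                sub = "-".join(parts[start:start + size])
                result.add(toy_stem(sub) if sub_compounds == "stemmed" else sub)
    return result


def matches(document, query, sub_compounds=None):
    """A query matches if every query lexeme occurs in the document."""
    return lexemes(query) <= lexemes(document, sub_compounds)


doc, query = "Boyer-Moore-Horspool", "boyer-moore"
print(sorted(lexemes(query)))           # ['boyer', 'boyer-moor', 'moor']
print(matches(doc, query))              # False
print(matches(doc, query, "raw"))       # False: 'boyer-moore' != 'boyer-moor'
print(matches(doc, query, "stemmed"))   # True
```

In this model the query side always produces 'boyer-moor', so adding an unstemmed 'boyer-moore' to the document side does not help; only the stemmed form closes the gap.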