Date: Mon, 24 Jan 2022 14:28:00 -0500
From: Bruce Momjian
To: Laurenz Albe
Cc: James Addison, pgsql-www@postgresql.org
Subject: Re: Mailing list search engine: surprising missing results?

On Mon, Jan 24, 2022 at 08:27:41AM +0100, Laurenz Albe wrote:
> On Sun, 2022-01-23 at 12:49 +0000, James Addison wrote:
> > Specializing the query further and searching for
> > "boyer-moore-horspool"[4] *does* again return results -- two documents
> > -- with the terms "boyer" and "horspool" highlighted.
>
> This is caused by the peculiarities of PostgreSQL full text search:
>
> SELECT to_tsvector('english', 'Boyer-Moore-Horspool')
>        @@ websearch_to_tsquery('english', 'boyer-moore');
>
>  ?column?
> ══════════
>  f
> (1 row)
>
> The reason is that the 'moore' in 'boyer-moore' is stemmed, since it
> is at the end of the word, while the 'moore' in 'Boyer-Moore-Horspool'
> isn't:

Wow, he showed me this problem earlier, but I never suspected it was a
stemming issue because I never considered that proper nouns could be
stem-adjusted, but it is obvious they can.

-- 
  Bruce Momjian        https://momjian.us
  EDB                  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.
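
For anyone following along, the stemming difference described in the quoted
message can be inspected directly. This is only a minimal sketch: it assumes a
PostgreSQL server recent enough to have websearch_to_tsquery() (version 11 or
later) and the built-in 'english' text search configuration. Output is left out
on purpose, since the exact lexemes may differ between versions.

```sql
-- Lexemes stored for the document text: the parser emits the whole
-- hyphenated compound as well as each of its parts.
SELECT to_tsvector('english', 'Boyer-Moore-Horspool');

-- Lexemes the search query asks for. Compare the compound lexeme here
-- with the compound lexeme in the tsvector above.
SELECT websearch_to_tsquery('english', 'boyer-moore');

-- Token-by-token view of the document text: which token types the parser
-- produces and what the dictionary returns for each token.
SELECT alias, token, lexemes
FROM ts_debug('english', 'Boyer-Moore-Horspool');

-- The same view for the query text.
SELECT alias, token, lexemes
FROM ts_debug('english', 'boyer-moore');
```

If the compound lexeme produced for 'boyer-moore' does not appear among the
lexemes of 'Boyer-Moore-Horspool', the match fails even though the individual
parts stem to the same values.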