From: Bruce Momjian <[email protected]>
To: Laurenz Albe <[email protected]>
Cc: James Addison <[email protected]>
Cc: [email protected]
Subject: Re: Mailing list search engine: surprising missing results?
Date: Mon, 24 Jan 2022 14:28:00 -0500
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
References: <CALDQ5NxzgeXHRCD4dS_6qz+nn01ivi3i1ZEtD2DmC779i0=iSQ@mail.gmail.com>
	<[email protected]>

On Mon, Jan 24, 2022 at 08:27:41AM +0100, Laurenz Albe wrote:
> On Sun, 2022-01-23 at 12:49 +0000, James Addison wrote:
> > Specializing the query further and searching for
> > "boyer-moore-horspool"[4] *does* again return results -- two documents
> > -- with the terms "boyer" and "horspool" highlighted.
> 
> This is caused by the peculiarities of PostgreSQL full text search:
> 
> SELECT to_tsvector('english', 'Boyer-Moore-Horspool')
>        @@ websearch_to_tsquery('english', 'boyer-moore');
> 
>  ?column?
> ══════════
>  f
> (1 row)
> 
> The reason is that the 'moore' in 'boyer-moore' is stemmed, since it
> is at the end of the word, while the 'moore' in 'Boyer-Moore-Horspool'
> isn't:

Wow, he showed me this problem earlier, but I never suspected it was a
stemming issue because I never considered that proper nouns could be
stem-adjusted, though it is obvious they can be.

-- 
  Bruce Momjian  <[email protected]>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





