From: Laurenz Albe <[email protected]>
To: James Addison <[email protected]>
To: [email protected]
Subject: Re: Mailing list search engine: surprising missing results?
Date: Mon, 24 Jan 2022 08:27:41 +0100
Message-ID: <[email protected]>
In-Reply-To: <CALDQ5NxzgeXHRCD4dS_6qz+nn01ivi3i1ZEtD2DmC779i0=iSQ@mail.gmail.com>
References: <CALDQ5NxzgeXHRCD4dS_6qz+nn01ivi3i1ZEtD2DmC779i0=iSQ@mail.gmail.com>

On Sun, 2022-01-23 at 12:49 +0000, James Addison wrote:
> Hello,
> 
> I noticed that the mailing list search engine[1] seems to unexpectedly
> miss results for some queries.
> 
> For example:
> 
> A search for "boyer"[2] returns five results, including result
> snippets that contain the text "Boyer-More-Horspool" [sic] and
> "Boyer-Moore-Horspool".
> 
> However, a more specific search for "boyer-moore"[3] does not return
> any results -- that seems surprising.
> 
> Specializing the query further and searching for
> "boyer-moore-horspool"[4] *does* again return results -- two documents
> -- with the terms "boyer" and "horspool" highlighted.

This is caused by the peculiarities of PostgreSQL full text search:

SELECT to_tsvector('english', 'Boyer-Moore-Horspool')
       @@ websearch_to_tsquery('english', 'boyer-moore');

 ?column?
══════════
 f
(1 row)

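The reason is that a hyphenated word is also indexed as a single
compound token, and that token is stemmed as a whole.  'boyer-moore'
ends in 'moore', so it becomes 'boyer-moor', while
'Boyer-Moore-Horspool' ends in 'horspool', so the 'moore' inside its
compound token is left unstemmed: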

SELECT to_tsvector('english', 'Boyer-Moore-Horspool');

                       to_tsvector
══════════════════════════════════════════════════════════
 'boyer':2 'boyer-moore-horspool':1 'horspool':4 'moor':3
(1 row)

SELECT websearch_to_tsquery('english', 'boyer-moore');

         websearch_to_tsquery
═════════════════════════════════════
 'boyer-moor' <-> 'boyer' <-> 'moor'
(1 row)

'boyer-moor' is not present in the first result (the tsvector), so
the query cannot match.
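
If you want to see where the compound lexeme comes from, ts_debug()
shows what the parser and the dictionaries do with each token.  This
is only a sketch for exploration; I am assuming the stock 'english'
configuration, and the exact token types may look different with a
customized one:

```sql
-- Show the tokens the parser emits for each input and the lexemes the
-- 'english' configuration turns them into.
SELECT doc, alias, token, lexemes
FROM (VALUES ('boyer-moore'),
             ('Boyer-Moore-Horspool')) AS t(doc),
     LATERAL ts_debug('english', doc);
```

The row for the full hyphenated token should carry the lexeme
'boyer-moor' for the first input and 'boyer-moore-horspool' for the
second, in line with the outputs above.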

As a workaround, I suggest that you search for 'boyer moore'
or (even better) '"boyer moore"' (with the double quotes):

SELECT websearch_to_tsquery('english', 'boyer moore');

 websearch_to_tsquery
══════════════════════
 'boyer' & 'moor'
(1 row)

SELECT websearch_to_tsquery('english', '"boyer moore"');

 websearch_to_tsquery
══════════════════════
 'boyer' <-> 'moor'
(1 row)
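
To check the workarounds against the document from above, something
like the following should do.  The second sample text, 'Moore wrote to
Boyer', is made up for illustration; it is there to show why the
quoted form is more selective:

```sql
-- Compare the three search inputs against two documents: the real
-- term and a made-up sentence with both names in the wrong order.
SELECT doc,
       q AS search_input,
       to_tsvector('english', doc)
           @@ websearch_to_tsquery('english', q) AS matches
FROM (VALUES ('Boyer-Moore-Horspool'),
             ('Moore wrote to Boyer')) AS d(doc),
     (VALUES ('boyer-moore'),
             ('boyer moore'),
             ('"boyer moore"')) AS s(q)
ORDER BY doc, q;
```

Going by the tsvector shown earlier ('boyer' at position 2, 'moor' at
position 3), I would expect both workarounds to match
'Boyer-Moore-Horspool'.  The unquoted form should also match the
made-up sentence, since it only requires both lexemes to be present,
while the quoted form requires them to be adjacent and in order.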

Yours,
Laurenz Albe