public inbox for [email protected]
From: James Addison <[email protected]>
To: [email protected]
Subject: Mailing list search engine: surprising missing results?
Date: Sun, 23 Jan 2022 12:49:07 +0000
Message-ID: <CALDQ5NxzgeXHRCD4dS_6qz+nn01ivi3i1ZEtD2DmC779i0=iSQ@mail.gmail.com>
Hello,
I noticed that the mailing list search engine[1] seems to unexpectedly
miss results for some queries.
For example:
A search for "boyer"[2] returns five results, including result
snippets that contain the text "Boyer-More-Horspool" [sic] and
"Boyer-Moore-Horspool".
However, a more specific search for "boyer-moore"[3] does not return
any results -- that seems surprising.
Specializing the query further and searching for
"boyer-moore-horspool"[4] *does* again return results -- two documents
-- with the terms "boyer" and "horspool" highlighted.
Although it's not a significant problem, I do have a theory that could
explain the behaviour (offered in case it may save time on
investigation):
It seems possible that the term "more" -- and near variants of it,
like "moore" -- may be filtered out as stopwords (meaning: they're not
present in the search index), and that the search engine is configured
to require a minimum percentage match rate for query terms.
Under those conditions: searches for "boyer" would produce a 100%
match rate, "boyer-moore" would produce 50% (since "moore" would not
be found in the term index), and "boyer-moore-horspool" would match at
roughly 66.7% (two of its three terms).
Given a required match rate of around two thirds, that could explain
the behaviour (it might not be the true reason, but it seems like one
possibility).
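To make the arithmetic concrete, here is a rough Python sketch of that
theory. Everything in it is an assumption made for illustration: the
stopword set, the two-thirds threshold, the splitting of queries on
hyphens, and the function names are hypothetical, and none of it is
taken from the search engine's actual code or configuration. It also
assumes that every non-stopword query term is present in the index.

```python
# Hypothetical model of the theory above; NOT the real search engine's logic.
STOPWORDS = {"more", "moore"}  # assumption: both terms are dropped from the index
MIN_MATCH = 2 / 3              # assumption: required fraction of matching query terms


def match_rate(query: str, stopwords=STOPWORDS) -> float:
    """Fraction of the query's terms that would survive stopword filtering."""
    # Assumption: hyphenated queries are split into separate terms.
    terms = query.lower().replace("-", " ").split()
    if not terms:
        return 0.0
    indexed = [t for t in terms if t not in stopwords]
    return len(indexed) / len(terms)


def returns_results(query: str, min_match: float = MIN_MATCH) -> bool:
    """True if the query would clear the assumed minimum match rate."""
    # Small tolerance so that a rate of exactly 2/3 is not lost to float rounding.
    return match_rate(query) + 1e-9 >= min_match


for q in ("boyer", "boyer-moore", "boyer-moore-horspool"):
    print(f"{q}: {match_rate(q):.1%} -> {returns_results(q)}")
```

With those assumed values, only "boyer-moore" falls below the
threshold, which is consistent with the behaviour described above.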
Thanks,
James
[1] https://www.postgresql.org/search/
[2] https://www.postgresql.org/search/?m=1&q=boyer&l=1&d=365&s=r
[3] https://www.postgresql.org/search/?m=1&q=boyer-moore&l=1&d=365&s=r
[4] https://www.postgresql.org/search/?m=1&q=boyer-moore-horspool&l=1&d=365&s=r