From: James Addison <[email protected]>
To: Ivan Panchenko <[email protected]>
Cc: Tom Lane <[email protected]>
Cc: [email protected]
Subject: Re: Mailing list search engine: surprising missing results?
Date: Wed, 26 Jan 2022 08:28:43 +0000
Message-ID: <CALDQ5NwjHE6jjmxVPSq00FbTiVVKcb9+fX7nMnrRXtHNZGt+2g@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <CALDQ5NxzgeXHRCD4dS_6qz+nn01ivi3i1ZEtD2DmC779i0=iSQ@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CAF4Au4yttKJ1KAP-cO+HMLQ2_66vmx0dLTBUbE4W8Aa64foafg@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CALDQ5NzFfKCDvmbr6otF+ePH=oijN3xBeqjMen4boitUppTMBA@mail.gmail.com>
	<[email protected]>

On Tue, 25 Jan 2022 at 21:23, Ivan Panchenko <[email protected]> wrote:
>
> On 25.01.2022 23:48, James Addison wrote:
> > I'm uncertain why parsing hyphenated query text produces compound tokens?
>
> Because in some cases user wants to search the full hyphenated words,
> not parts of them.

That makes sense.  Referring back to a previous suggestion of yours,
though, we could still allow matching on the full hyphenated word by
emitting an 'OR' condition from the parsed query instead of 'AND'
(perhaps controlled by an argument?).

In other words:

# expected query to achieve a match (from your previous post in this thread)
'boyers-moore' | ('boyers' & 'moore')

# actual query that does not result in a match today
# (plainto_tsquery for 'boyer-moore')
'boyer-moore' & 'boyer' & 'moore'
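
To make the difference between the two query shapes concrete, here is a
minimal sketch in Python.  It is a toy model only, not how PostgreSQL
evaluates a tsquery: the matches() helper and the sample document are
hypothetical, and it assumes a document that was indexed with the
individual parts but without the compound token.  It also uses the
'boyer' spelling in both queries so that they can be compared directly.

```python
# Toy model of boolean query matching: a document is a set of lexemes and a
# query is either a bare lexeme or a tuple of ("and" | "or", operand, ...).
# Illustration only; this is not PostgreSQL's tsquery implementation.

def matches(query, lexemes):
    """Return True if the set of lexemes satisfies the query."""
    if isinstance(query, str):
        return query in lexemes
    op, *operands = query
    results = (matches(operand, lexemes) for operand in operands)
    return all(results) if op == "and" else any(results)

# Hypothetical document, assumed to be indexed WITHOUT the compound token.
doc = {"boyer", "moore", "search"}

# Shape of the query generated today: compound AND both parts.
and_query = ("and", "boyer-moore", "boyer", "moore")

# Shape of the suggested query: compound OR (both parts).
or_query = ("or", "boyer-moore", ("and", "boyer", "moore"))

print(matches(and_query, doc))  # False
print(matches(or_query, doc))   # True
```

Under that assumption the 'AND' form fails because the compound token is
absent, while the 'OR' form still matches on the parts.  If the compound
token is present in the document, both forms match.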

> >> It seems to me that in both cases we'd be better off generating
> >> "'boyers' <-> 'moore'", without the compound token at all.
> >> Maybe there's a case for the weaker 'boyers' & 'moore' translation,
> >> but I think if people wanted that they'd just enter separate words.
>
> Matching the compound token might be significant for ranking. (?)

Yes, that does seem likely.  Knowing that a result contains an
exact-match token could be important for various use cases
(including relevance scoring).

> Probably, there is no universal *to_tsquery function and no universal
> parser to fit all users.

That seems possible too, yep.





