From: Tom Lane <[email protected]>
To: Ivan Panchenko <[email protected]>
Cc: [email protected]
Subject: Re: Mailing list search engine: surprising missing results?
Date: Tue, 25 Jan 2022 12:54:28 -0500
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <CALDQ5NxzgeXHRCD4dS_6qz+nn01ivi3i1ZEtD2DmC779i0=iSQ@mail.gmail.com>
<[email protected]>
<[email protected]>
<[email protected]>
<CAF4Au4yttKJ1KAP-cO+HMLQ2_66vmx0dLTBUbE4W8Aa64foafg@mail.gmail.com>
<[email protected]>
<[email protected]>
<[email protected]>
Ivan Panchenko <[email protected]> writes:
> The actual explanation can be seen from comparing a tsvector with a tsquery.
> To avoid stemming effects, we use the simple configuration below.
> # select plainto_tsquery('simple','boyers-moore');
> plainto_tsquery
> -------------------------------------
> 'boyers-moore' & 'boyers' & 'moore'
> # select to_tsvector('simple','boyers-moore-horspool');
> to_tsvector
> -------------------------------------------------------------
> 'boyers':2 'boyers-moore-horspool':1 'horspool':4 'moore':3
> Obviously, such a tsvector does not match the above tsquery. I think a better tsquery for this query would be
> 'boyers-moore' | ('boyers' & 'moore')
> Maybe it is worth changing to_tsquery() behavior for such cases.
Changing the behavior of to_tsquery is certainly a lot less scary
than changing to_tsvector --- it wouldn't call the validity of
existing tsvector indexes into question.
I see that to_tsquery is even sillier than plainto_tsquery:
regression=# select to_tsquery('simple','boyers-moore');
to_tsquery
-----------------------------------------
'boyers-moore' <-> 'boyers' <-> 'moore'
(1 row)
which is absolutely not a sane translation.
It seems to me that in both cases we'd be better off generating
"'boyers' <-> 'moore'", without the compound token at all.
Maybe there's a case for the weaker 'boyers' & 'moore' translation,
but I think if people wanted that they'd just enter separate words.
regards, tom lane
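
The comparison discussed above can be checked directly with a small query. This is only a sketch, not part of the original message. It assumes PostgreSQL 9.6 or later (for the <-> phrase operator). The first tsvector literal and the first two tsquery literals are copied from the output quoted in the thread. The second row ('moore and boyers') and the column aliases are hypothetical, added only to show how the two candidate translations differ.

```sql
-- Compare the tsquery forms discussed in the thread against fixed tsvectors.
-- Literals are cast directly, so no text search configuration is involved.
WITH docs(label, tsv) AS (
  VALUES
    -- positions as reported by to_tsvector('simple', ...) in the thread
    ('boyers-moore-horspool',
     $$'boyers':2 'boyers-moore-horspool':1 'horspool':4 'moore':3$$::tsvector),
    -- hypothetical document: both words present, but not adjacent in order
    ('moore and boyers',
     $$'and':2 'boyers':3 'moore':1$$::tsvector)
)
SELECT label,
       -- what plainto_tsquery('simple','boyers-moore') produced in the thread
       tsv @@ $$'boyers-moore' & 'boyers' & 'moore'$$::tsquery     AS plainto_today,
       -- what to_tsquery('simple','boyers-moore') produced in the thread
       tsv @@ $$'boyers-moore' <-> 'boyers' <-> 'moore'$$::tsquery AS to_tsquery_today,
       -- the translation suggested in the reply
       tsv @@ $$'boyers' <-> 'moore'$$::tsquery                    AS proposed_phrase,
       -- the weaker alternative mentioned in the reply
       tsv @@ $$'boyers' & 'moore'$$::tsquery                      AS weaker_and
FROM docs;
```

For the 'boyers-moore-horspool' row, the first two columns should come out false, because the lexeme 'boyers-moore' is absent from that tsvector, while both of the last two should be true. For the hypothetical second row, only weaker_and should be true, which is the practical difference between the <-> and & translations.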