From: James Addison
Date: Wed, 26 Jan 2022 08:28:43 +0000
Subject: Re: Mailing list search engine: surprising missing results?
To: Ivan Panchenko
Cc: Tom Lane, pgsql-www@lists.postgresql.org

On Tue, 25 Jan 2022 at 21:23, Ivan Panchenko wrote:
>
> On 25.01.2022 23:48, James Addison wrote:
> > I'm uncertain why parsing hyphenated query text produces compound tokens?
>
> Because in some cases the user wants to search the full hyphenated words,
> not parts of them.

That makes sense, although, to refer back to a previous suggestion of yours, we could allow matching on the full hyphenated words by emitting an 'OR' condition from the parsed query instead of 'AND' (perhaps controlled by a function argument?).
In other words:

# expected query to achieve a match (from your previous post in this thread)
'boyers-moore' | ('boyers' & 'moore')

# actual query that does not result in a match today (plainto_tsquery for 'boyer-moore')
'boyer-moore' & 'boyer' & 'moore'

> >> It seems to me that in both cases we'd be better off generating
> >> "'boyers' <-> 'moore'", without the compound token at all.
> >> Maybe there's a case for the weaker 'boyers' & 'moore' translation,
> >> but I think if people wanted that they'd just enter separate words.
>
> Matching the compound token might be significant for ranking. (?)

Yes, that does seem likely. Knowing that a result contains an exact-match compound token could be important for various use cases (including relevance scoring).

> Probably, there is no universal *to_tsquery function and no universal
> parser to fit all users.

That seems possible too, yep.
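A rough Python sketch of the three query shapes discussed in this thread, for a single hyphenated term: the compound token ANDed with every part (the shape reported above for plainto_tsquery), the proposed OR form, and the phrase form with no compound token. `hyphenated_tsquery` and `matches` are hypothetical helpers, not PostgreSQL APIs. Lowercasing stands in for dictionary normalisation (no stemming is modelled), and `matches` only checks lexeme presence, so it cannot evaluate the positional `<->` operator.

```python
from typing import AbstractSet


def _lexeme(word: str) -> str:
    # Quote a word as a tsquery lexeme; lowercasing stands in for real
    # dictionary normalisation (assumption: no stemming is applied here).
    return "'" + word.lower().replace("'", "''") + "'"


def hyphenated_tsquery(term: str, mode: str = "and") -> str:
    """Hypothetical helper: build a tsquery string for one hyphenated term."""
    parts = [p for p in term.split("-") if p]
    if len(parts) < 2:
        return _lexeme(term)
    whole = _lexeme(term)
    pieces = [_lexeme(p) for p in parts]
    if mode == "and":
        # Compound token AND every part (the shape reported for plainto_tsquery).
        return " & ".join([whole] + pieces)
    if mode == "or":
        # Proposed alternative: compound token OR all of the parts together.
        return f"{whole} | ({' & '.join(pieces)})"
    if mode == "phrase":
        # Adjacent parts only, with no compound token at all.
        return " <-> ".join(pieces)
    raise ValueError(f"unknown mode: {mode!r}")


def matches(doc_lexemes: AbstractSet[str], term: str, mode: str = "and") -> bool:
    """Hypothetical presence check for the 'and'/'or' shapes (ignores positions)."""
    whole = term.lower()
    all_parts = all(p.lower() in doc_lexemes for p in term.split("-") if p)
    if mode == "and":
        return whole in doc_lexemes and all_parts
    if mode == "or":
        return whole in doc_lexemes or all_parts
    raise ValueError(f"mode {mode!r} needs positional data; not modelled here")


print(hyphenated_tsquery("boyer-moore"))             # 'boyer-moore' & 'boyer' & 'moore'
print(hyphenated_tsquery("boyers-moore", "or"))      # 'boyers-moore' | ('boyers' & 'moore')
print(hyphenated_tsquery("boyers-moore", "phrase"))  # 'boyers' <-> 'moore'
```

For a document whose only relevant lexemes are 'boyer' and 'moore', `matches({"boyer", "moore"}, "boyer-moore", "and")` is False because the compound token is absent, while the "or" mode still matches; the compound token can then serve as a ranking signal rather than a hard requirement.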