Subject: Re: Mailing list search engine: surprising missing results?
From: Laurenz Albe
To: James Addison, pgsql-www@postgresql.org
Date: Mon, 24 Jan 2022 08:27:41 +0100

On Sun, 2022-01-23 at 12:49 +0000, James Addison wrote:
> Hello,
>
> I noticed that the mailing list search engine[1] seems to unexpectedly
> miss results for some queries.
>
> For example:
>
> A search for "boyer"[2] returns five results, including result
> snippets that contain the text "Boyer-More-Horspool" [sic] and
> "Boyer-Moore-Horspool".
>
> However, a more specific search for "boyer-moore"[3] does not return
> any results -- that seems surprising.
>
> Specializing the query further and searching for
> "boyer-moore-horspool"[4] *does* again return results -- two documents
> -- with the terms "boyer" and "horspool" highlighted.

This is caused by the peculiarities of PostgreSQL full text search:

SELECT to_tsvector('english', 'Boyer-Moore-Horspool') @@ websearch_to_tsquery('english', 'boyer-moore');

 ?column?
══════════
 f
(1 row)

The reason is that the 'moore' in 'boyer-moore' is stemmed, since it is
at the end of the word, while the 'moore' in 'Boyer-Moore-Horspool' isn't:

SELECT to_tsvector('english', 'Boyer-Moore-Horspool');

                       to_tsvector
══════════════════════════════════════════════════════════
 'boyer':2 'boyer-moore-horspool':1 'horspool':4 'moor':3
(1 row)

SELECT websearch_to_tsquery('english', 'boyer-moore');

        websearch_to_tsquery
═════════════════════════════════════
 'boyer-moor' <-> 'boyer' <-> 'moor'
(1 row)

'boyer-moor' is not present in the first result.

As a workaround, I suggest that you search for 'boyer moore' or
(even better) '"boyer moore"' (with the double quotes):

SELECT websearch_to_tsquery('english', 'boyer moore');

 websearch_to_tsquery
══════════════════════
 'boyer' & 'moor'
(1 row)

SELECT websearch_to_tsquery('english', '"boyer moore"');

 websearch_to_tsquery
══════════════════════
 'boyer' <-> 'moor'
(1 row)

Yours,
Laurenz Albe
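
To confirm that the two workaround queries really match the document from the example, they can be run against the same tsvector. This is only a sketch that reuses the functions already shown above; the column aliases are made up for readability. Given the lexemes and positions listed earlier ('boyer':2 and 'moor':3 are adjacent), both columns would be expected to come back true, but it is worth running on your own installation.

```sql
-- Check both workaround spellings against the same document.
-- "plain_terms" and "quoted_phrase" are illustrative aliases only.
SELECT to_tsvector('english', 'Boyer-Moore-Horspool')
         @@ websearch_to_tsquery('english', 'boyer moore')   AS plain_terms,
       to_tsvector('english', 'Boyer-Moore-Horspool')
         @@ websearch_to_tsquery('english', '"boyer moore"') AS quoted_phrase;
```

The quoted form is the stricter of the two, since <-> requires the lexemes to be adjacent, while & only requires both to occur somewhere in the document.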