public inbox for [email protected]  
help / color / mirror / Atom feed
From: Chao Li <[email protected]>
To: Florents Tselai <[email protected]>
Cc: pgsql-hackers <[email protected]>
Subject: Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search
Date: Thu, 26 Feb 2026 14:48:06 +0800
Message-ID: <[email protected]> (raw)
In-Reply-To: <CA+v5N41Rr-18cjSsc3a6YvGTmnaXmfNiQM3e_gvdw76Yj7y6zA@mail.gmail.com>
References: <CA+v5N42aGz5sCa3HoURUK82b-PN6N6LtUj45vAcL7KEBd5i5fg@mail.gmail.com>
	<CA+v5N41Rr-18cjSsc3a6YvGTmnaXmfNiQM3e_gvdw76Yj7y6zA@mail.gmail.com>



> On Feb 1, 2026, at 19:02, Florents Tselai <[email protected]> wrote:
> 
> 
> 
> 
> On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai <[email protected]> wrote:
> Hi,
> 
> in real-life I work a lot with json & fts search, here's a feature I've always wished I had,
> but never tackle it. Until yesterday that is.
> 
> SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
> 
> This patch introduces a tsmatch boolean operator to the JSONPath engine.
> By integrating FTS natively into path expressions,
> this operator allows for high-precision filtering of nested JSONB structures—
> solving issues with structural ambiguity and query complexity.
> 
> Currently, users must choose between two suboptimal paths for FTS-ing nested JSON:
> - Imprecise Global Indexing
> jsonb_to_tsvector aggregates text into a flat vector.
> This ignores JSON boundaries, leading to false positives when the same key (e.g., "body")
> appears in different contexts (e.g., a "Product Description" vs. a "Customer Review").
> 
> - Complex SQL Workarounds
> Achieving 100% precision requires unnesting the document via jsonb_array_elements and LATERAL joins.
> This leads to verbose SQL and high memory overhead from generating intermediate heap tuples.
> 
> One of the most significant advantages of tsmatch is its ability to participate in multi-condition predicates
> within the same JSON object - something jsonb_to_tsvector cannot do.
> 
> SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
> 
> In a flat vector, the association between "Alice" and "performance" is lost.
> tsmatch preserves this link by evaluating the FTS predicate in-place during path traversal.
> 
> While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an FTS operator,
> tsmatch is architecturally modeled after the standard-defined like_regex.
> 
> The implementation follows the like_regex precedent:
> it is a non-indexable predicate that relies on GIN path-matching for pruning and heap re-checks for precision.
> Caching is scoped to the JsonPathExecContext,
> ensuring 'compile-once' efficiency per execution without violating the stability requirements of prepared statements.
> 
> This initial implementation uses plainto_tsquery.
> However, the grammar is designed to support a "mode" flag (similar to like_regex flags)
> in future iterations to toggle between to_tsquery, websearch_to_tsquery, and phraseto_tsquery.
> 
> Here's a v2, that implements the tsqparser clause 
> 
> So this should now work too 
> 
> select jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w") <v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch>

Hi Florents,

Grant pinged me about this. I can review it in coming days. Can you please rebase it? I failed to apply to current master. Also, the CF reported a failure test case, please take a look.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/










reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox