public inbox for [email protected]  
help / color / mirror / Atom feed
From: Chao Li <[email protected]>
To: Florents Tselai <[email protected]>
Cc: pgsql-hackers <[email protected]>
Subject: Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search
Date: Mon, 2 Mar 2026 11:44:12 +0800
Message-ID: <[email protected]> (raw)
In-Reply-To: <CA+v5N43E0eC82QiLKrUBN7FDO8OmBDYN6Jrs1mGfh=t4Zf5pdg@mail.gmail.com>
References: <CA+v5N42aGz5sCa3HoURUK82b-PN6N6LtUj45vAcL7KEBd5i5fg@mail.gmail.com>
	<CA+v5N41Rr-18cjSsc3a6YvGTmnaXmfNiQM3e_gvdw76Yj7y6zA@mail.gmail.com>
	<[email protected]>
	<CA+v5N43E0eC82QiLKrUBN7FDO8OmBDYN6Jrs1mGfh=t4Zf5pdg@mail.gmail.com>



> On Feb 27, 2026, at 13:59, Florents Tselai <[email protected]> wrote:
> 
> 
> 
> On Thu, Feb 26, 2026 at 8:48 AM Chao Li <[email protected]> wrote:
> 
> 
> > On Feb 1, 2026, at 19:02, Florents Tselai <[email protected]> wrote:
> > 
> > 
> > 
> > 
> > On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai <[email protected]> wrote:
> > Hi,
> > 
> > in real-life I work a lot with json & fts search, here's a feature I've always wished I had,
> > but never tackle it. Until yesterday that is.
> > 
> > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
> > 
> > This patch introduces a tsmatch boolean operator to the JSONPath engine.
> > By integrating FTS natively into path expressions,
> > this operator allows for high-precision filtering of nested JSONB structures—
> > solving issues with structural ambiguity and query complexity.
> > 
> > Currently, users must choose between two suboptimal paths for FTS-ing nested JSON:
> > - Imprecise Global Indexing
> > jsonb_to_tsvector aggregates text into a flat vector.
> > This ignores JSON boundaries, leading to false positives when the same key (e.g., "body")
> > appears in different contexts (e.g., a "Product Description" vs. a "Customer Review").
> > 
> > - Complex SQL Workarounds
> > Achieving 100% precision requires unnesting the document via jsonb_array_elements and LATERAL joins.
> > This leads to verbose SQL and high memory overhead from generating intermediate heap tuples.
> > 
> > One of the most significant advantages of tsmatch is its ability to participate in multi-condition predicates
> > within the same JSON object - something jsonb_to_tsvector cannot do.
> > 
> > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
> > 
> > In a flat vector, the association between "Alice" and "performance" is lost.
> > tsmatch preserves this link by evaluating the FTS predicate in-place during path traversal.
> > 
> > While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an FTS operator,
> > tsmatch is architecturally modeled after the standard-defined like_regex.
> > 
> > The implementation follows the like_regex precedent:
> > it is a non-indexable predicate that relies on GIN path-matching for pruning and heap re-checks for precision.
> > Caching is scoped to the JsonPathExecContext,
> > ensuring 'compile-once' efficiency per execution without violating the stability requirements of prepared statements.
> > 
> > This initial implementation uses plainto_tsquery.
> > However, the grammar is designed to support a "mode" flag (similar to like_regex flags)
> > in future iterations to toggle between to_tsquery, websearch_to_tsquery, and phraseto_tsquery.
> > 
> > Here's a v2, that implements the tsqparser clause 
> > 
> > So this should now work too 
> > 
> > select jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w") <v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch>
> 
> Hi Florents,
> 
> Grant pinged me about this. I can review it in coming days. Can you please rebase it? I failed to apply to current master. Also, the CF reported a failure test case, please take a look.
> 
>  Hi Evan, 
> thanks for having a look. The conflict was due to the intro of pg_fallthrough. Not related to this patch .
> 
> I noticed the failure too, but I'm having a hard time reproducing it tbh.
> This fails for Debian Trixie with Meson. The same with Autoconf passes...
> 
> https://github.com/Florents-Tselai/postgres/runs/65098077968
> 
> 
> 
> 
> <v3-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch>

I have reviewed v3 and traced a few test cases. Here comes my review comments:

1 
```
+        <replaceable>string</replaceable> <literal>tsmatch</literal> <replaceable>string</replaceable>
+        <optional> <literal>tsconfig</literal> <replaceable>string</replaceable> </optional>
+        <optional> <literal>tsqparser</literal> <replaceable>string</replaceable> </optional>
```

For all “replaceable”, instead of “string”, would it be better to use something more descriptive? For example:
```
<replaceable>json_string</replaceable> <literal>tsmatch</literal> <replaceable>query</replaceable>
<optional> <literal>tsconfig</literal> <replaceable>config_name</replaceable> </optional>
<optional> <literal>tsqparser</literal> <replaceable>parser_mode</replaceable> </optional>
```

2 - jsonpath_gram.y
```
+static bool makeItemTsMatch(JsonPathParseItem *doc,
+							  JsonPathString *tsquery,
+							  JsonPathString *tsconfig,
+							  JsonPathString *tsquery_parser,
+							  JsonPathParseItem ** result,
+							  struct Node *escontext);
```

Format Nit: Looking at the existing code, the J in the second and following lines, should be placed in the same column as the J in the first line.

3 - jsonpath_gram.y
```
+	| expr TSMATCH_P STRING_P
+	{
+		JsonPathParseItem *jppitem;
+		/* Pass NULL for tsconfig (3rd) and NULL for tsquery_parser (4th) */
+		if (! makeItemTsMatch($1, &$3, NULL, NULL, &jppitem, escontext))
+		   YYABORT;
+		$$ = jppitem;
+	}
+	| expr TSMATCH_P STRING_P TSCONFIG_P STRING_P
+	{
+		JsonPathParseItem *jppitem;
+		/* Pass NULL for tsquery_parser (4th) */
+		if (! makeItemTsMatch($1, &$3, &$5, NULL, &jppitem, escontext))
+		   YYABORT;
+		$$ = jppitem;
+	}
+	| expr TSMATCH_P STRING_P TSQUERYPARSER_P STRING_P
+	{
+		JsonPathParseItem *jppitem;
+		/* Pass NULL for tsconfig (3rd) */
+		if (! makeItemTsMatch($1, &$3, NULL, &$5, &jppitem, escontext))
+		   YYABORT;
+		$$ = jppitem;
+	}
+	| expr TSMATCH_P STRING_P TSCONFIG_P STRING_P TSQUERYPARSER_P STRING_P
+	{
+		JsonPathParseItem *jppitem;
+		if (! makeItemTsMatch($1, &$3, &$5, &$7, &jppitem, escontext))
+		   YYABORT;
+		$$ = jppitem;
+	}
```

Feels a little redundant, repeatedly calls makeItemTsMatch. See the attached diff for a simplification. But my version is a bit longer in terms of number of lines. So, up to you.

4 - jsonpath_gram.y
```
+static bool
+makeItemTsMatch(JsonPathParseItem *doc,
+			 JsonPathString *tsquery,
+			 JsonPathString *tsconfig,
+			 JsonPathString *tsquery_parser,
+			 JsonPathParseItem **result,
+			 struct Node *escontext)
```

makeItemTsMatch doesn’t need to return a bool. Actually, now it never returns false, instead, it just ereport(ERROR).

5 - jsonpath.h
```
+		struct
+		{
+			int32		doc;
+			char	   *tsquery;
+			uint32		tsquerylen;
+			int32		tsconfig;
+			char	   *tsqparser;
+			uint32		tsqparser_len;
+		}			tsmatch;

+		struct
+		{
+			JsonPathParseItem *doc;
+			char	   *tsquery;
+			uint32		tsquerylen;
+			JsonPathParseItem *tsconfig;
+			char	   *tsqparser;
+			uint32		tsqparser_len;
+		}			tsmatch;
 	}			value;
```

tsquerylen doesn’t have _ before len, and tsqparser_len, would it be better to make naming conventions consistent in the same structure?

6 - jsonpath_exec.c
```
#include "tsearch/ts_utils.h"
#include "tsearch/ts_cache.h"
#include "utils/regproc.h"
#include "catalog/namespace.h"

static JsonPathBool
executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg,
void *param)
```

Why don’t put these includes to the header section together with other includes?

7 - jsonpath_exec.c
```
+			else
+			{
+				/*
+				 * Fallback or Error for unknown flags (should be caught by
+				 * parser)
+				 */
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("unrecognized tsqparser flag")));
+			}
```

This “else” should never be entered as the same check has been done by makeItemTsMatch. So, maybe just use an Assert here, or pg_unreachable().

8 -  jsonpath_exec.c
```
+	/* Setup Context (Run ONLY once per predicate) */
+	if (!cxt->initialized)
```

While tracing this SQL:
```
evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb
evantest-#   @@ '$.tags[*] ? (@ tsmatch "run" tsconfig "english")';
 ?column?
----------

(1 row)
```

I noticed that, when process “jogging”, cxt->initialized is still false, meaning that, the cxt is not reused across array items. Given the same tsconfig should apply to all array items, I think cxt should be reused.

9 -  jsonpath_exec.c
```
+		/* Select Parser and Compile Query */
+		parser_mode = jsp->content.tsmatch.tsqparser;
+		parser_len = jsp->content.tsmatch.tsqparser_len;
+
+		if (parser_len > 0)
+		{
+			/* Dispatch based on flag */
+			if (pg_strncasecmp(parser_mode, "pl", parser_len) == 0)
```

Nit: parser_mode is only used inside if (parser_len > 0), it can be defined inside the “if”.

10 - jsonpath_gram.y
```
+			 ereport(ERROR,
+					 (errcode(ERRCODE_SYNTAX_ERROR),
+					  errmsg("invalid tsquery_parser value: \"%s\"", tsquery_parser->val),
+					  errhint("Valid values are \"pl\", \"ph\", and \"w\".")));
```

When tested a case with an invalid parser, I got:
```
evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb                                                                                                      @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqparser "pss")';
ERROR:  invalid tsquery_parser value: "pss@"
LINE 2:   @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqpar...
             ^
HINT:  Valid values are "pl", "ph", and "w".
```

You can see the it shows a bad looking invalid value. I think that’s because tsquery_parser->val is not NULL terminated. I fixed this problem with:
```
errmsg("invalid tsquery_parser value: \"%.*s\"", (int) tsquery_parser->len, tsquery_parser->val),
```

This change is also included in the attached diff file.

11 - jsonpath.c
```
+			if (printBracketes)
+				appendStringInfoChar(buf, ')');
+			break;
+
 			if (printBracketes)
 				appendStringInfoChar(buf, ')');
```

Duplicate code. Looks like a copy-pasto.

12 - jsonpath.c
```
+				/* Write the Main Query String */
+				appendBinaryStringInfo(buf,
+									   &item->value.tsmatch.tsquerylen,
+									   sizeof(item->value.tsmatch.tsquerylen));
+				appendBinaryStringInfo(buf,
+									   item->value.tsmatch.tsquery,
+									   item->value.tsmatch.tsquerylen);
+				appendStringInfoChar(buf, '\0');
```

I don’t think we need to manually append ‘\0’ after appendBinaryStringInfo. Looking at the header comment of appendBinaryStringInfo, it says that a trailing null will be added.
```
/*
* appendBinaryStringInfo
*
* Append arbitrary binary data to a StringInfo, allocating more space
* if necessary. Ensures that a trailing null byte is present.
*/
void
appendBinaryStringInfo(StringInfo str, const void *data, int datalen)
```

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/






Attachments:

  [application/octet-stream] nocfbot_jsonpath_gram_y.diff (2.8K, 2-nocfbot_jsonpath_gram_y.diff)
  download | inline diff:
diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y
index a2acf2660f6..4d5837e2459 100644
--- a/src/backend/utils/adt/jsonpath_gram.y
+++ b/src/backend/utils/adt/jsonpath_gram.y
@@ -78,6 +78,13 @@ static bool makeItemTsMatch(JsonPathParseItem *doc,
 	JsonPathParseItem  *value;
 	JsonPathParseResult *result;
 	JsonPathItemType	optype;
+	struct
+	{
+		bool			has_tsconfig;
+		JsonPathString	tsconfig;
+		bool			has_tsqparser;
+		JsonPathString	tsqparser;
+	}				tsmatch_opts;
 	bool				boolean;
 	int					integer;
 }
@@ -100,6 +107,8 @@ static bool makeItemTsMatch(JsonPathParseItem *doc,
 					datetime_template opt_datetime_template csv_elem
 					datetime_precision opt_datetime_precision
 
+%type	<tsmatch_opts>	tsmatch_opts
+
 %type	<elems>		accessor_expr csv_list opt_csv_list
 
 %type	<indexs>	index_list
@@ -192,36 +201,44 @@ predicate:
 			YYABORT;
 		$$ = jppitem;
 	}
-	| expr TSMATCH_P STRING_P
+	| expr TSMATCH_P STRING_P tsmatch_opts
 	{
 		JsonPathParseItem *jppitem;
-		/* Pass NULL for tsconfig (3rd) and NULL for tsquery_parser (4th) */
-		if (! makeItemTsMatch($1, &$3, NULL, NULL, &jppitem, escontext))
-		   YYABORT;
+
+		if (! makeItemTsMatch($1, &$3,
+							  $4.has_tsconfig ? &$4.tsconfig : NULL,
+							  $4.has_tsqparser ? &$4.tsqparser : NULL,
+							  &jppitem, escontext))
+			YYABORT;
+
 		$$ = jppitem;
 	}
-	| expr TSMATCH_P STRING_P TSCONFIG_P STRING_P
+	;
+
+tsmatch_opts:
+	/* EMPTY */
 	{
-		JsonPathParseItem *jppitem;
-		/* Pass NULL for tsquery_parser (4th) */
-		if (! makeItemTsMatch($1, &$3, &$5, NULL, &jppitem, escontext))
-		   YYABORT;
-		$$ = jppitem;
+		$$.has_tsconfig = false;
+		$$.has_tsqparser = false;
 	}
-	| expr TSMATCH_P STRING_P TSQUERYPARSER_P STRING_P
+	| TSCONFIG_P STRING_P
 	{
-		JsonPathParseItem *jppitem;
-		/* Pass NULL for tsconfig (3rd) */
-		if (! makeItemTsMatch($1, &$3, NULL, &$5, &jppitem, escontext))
-		   YYABORT;
-		$$ = jppitem;
+		$$.has_tsconfig = true;
+		$$.tsconfig = $2;
+		$$.has_tsqparser = false;
 	}
-	| expr TSMATCH_P STRING_P TSCONFIG_P STRING_P TSQUERYPARSER_P STRING_P
+	| TSQUERYPARSER_P STRING_P
 	{
-		JsonPathParseItem *jppitem;
-		if (! makeItemTsMatch($1, &$3, &$5, &$7, &jppitem, escontext))
-		   YYABORT;
-		$$ = jppitem;
+		$$.has_tsconfig = false;
+		$$.has_tsqparser = true;
+		$$.tsqparser = $2;
+	}
+	| TSCONFIG_P STRING_P TSQUERYPARSER_P STRING_P
+	{
+		$$.has_tsconfig = true;
+		$$.tsconfig = $2;
+		$$.has_tsqparser = true;
+		$$.tsqparser = $4;
 	}
 	;
 
@@ -762,7 +779,7 @@ makeItemTsMatch(JsonPathParseItem *doc,
 		{
 			 ereport(ERROR,
 					 (errcode(ERRCODE_SYNTAX_ERROR),
-					  errmsg("invalid tsquery_parser value: \"%s\"", tsquery_parser->val),
+					  errmsg("invalid tsquery_parser value: \"%.*s\"", (int) tsquery_parser->len, tsquery_parser->val),
 					  errhint("Valid values are \"pl\", \"ph\", and \"w\".")));
 		}
 


view thread (5+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox