public inbox for [email protected]
help / color / mirror / Atom feedPatch: Add tsmatch JSONPath operator for granular Full Text Search
2+ messages / 1 participants
[nested] [flat]
* Patch: Add tsmatch JSONPath operator for granular Full Text Search
@ 2026-01-26 17:22 Florents Tselai <[email protected]>
2026-02-01 11:02 ` Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search Florents Tselai <[email protected]>
0 siblings, 1 reply; 2+ messages in thread
From: Florents Tselai @ 2026-01-26 17:22 UTC (permalink / raw)
To: pgsql-hackers <[email protected]>
Hi,
in real-life I work a lot with json & fts search, here's a feature I've
always wished I had,
but never tackle it. Until yesterday that is.
SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body
tsmatch "performance")');
This patch introduces a tsmatch boolean operator to the JSONPath engine.
By integrating FTS natively into path expressions,
this operator allows for high-precision filtering of nested JSONB
structures—
solving issues with structural ambiguity and query complexity.
Currently, users must choose between two suboptimal paths for FTS-ing
nested JSON:
- Imprecise Global Indexing
jsonb_to_tsvector aggregates text into a flat vector.
This ignores JSON boundaries, leading to false positives when the same key
(e.g., "body")
appears in different contexts (e.g., a "Product Description" vs. a
"Customer Review").
- Complex SQL Workarounds
Achieving 100% precision requires unnesting the document via
jsonb_array_elements and LATERAL joins.
This leads to verbose SQL and high memory overhead from generating
intermediate heap tuples.
One of the most significant advantages of tsmatch is its ability to
participate in multi-condition predicates
within the same JSON object - something jsonb_to_tsvector cannot do.
SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body
tsmatch "performance")');
In a flat vector, the association between "Alice" and "performance" is lost.
tsmatch preserves this link by evaluating the FTS predicate in-place during
path traversal.
While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an
FTS operator,
tsmatch is architecturally modeled after the standard-defined like_regex.
The implementation follows the like_regex precedent:
it is a non-indexable predicate that relies on GIN path-matching for
pruning and heap re-checks for precision.
Caching is scoped to the JsonPathExecContext,
ensuring 'compile-once' efficiency per execution without violating the
stability requirements of prepared statements.
This initial implementation uses plainto_tsquery.
However, the grammar is designed to support a "mode" flag (similar to
like_regex flags)
in future iterations to toggle between to_tsquery, websearch_to_tsquery,
and phraseto_tsquery.
Attachments:
[application/octet-stream] v1-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch (22.2K, 3-v1-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch)
download | inline diff:
From 79a09ba63178db192937f689aaebefc1360ca0bf Mon Sep 17 00:00:00 2001
From: Florents Tselai <[email protected]>
Date: Mon, 26 Jan 2026 19:17:41 +0200
Subject: [PATCH v1] Add tsmatch JSONPath operator for granular Full Text
Search
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This patch introduces the tsmatch boolean operator to the JSONPath engine.
By integrating FTS natively into path expressions,
this operator allows for high-precision filtering of nested JSONB structures—
solving issues with structural ambiguity and query complexity.
Currently, users must choose between two suboptimal paths for searching nested JSON:
- Imprecise Global Indexing
jsonb_to_tsvector aggregates text into a flat vector.
This ignores JSON boundaries, leading to false positives when the same key (e.g., "body")
appears in different contexts (e.g., a "Product Description" vs. a "Customer Review").
- Complex SQL Workarounds
Achieving 100% precision requires "exploding" the document via jsonb_array_elementsand LATERAL joins.
This leads to verbose SQL and high memory overhead from generating intermediate heap tuples.
One of the most significant advantages of tsmatch is its ability to participate in multi-condition predicates
within the same JSON object - something jsonb_to_tsvector cannot do.
SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
In a flat vector, the association between "Alice" and "performance" is lost.
tsmatch preserves this link by evaluating the FTS predicate in-place during path traversal.
While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an FTS operator,
tsmatch is architecturally modeled after the standard-defined like_regex.
The implementation follows the like_regex precedent:
it is a non-indexable predicate that relies on GIN path-matching for pruning and heap re-checks for precision.
Caching is scoped to the JsonPathExecContext,
ensuring 'compile-once' efficiency per execution without violating the stability requirements of prepared statements.
This initial implementation uses plainto_tsquery.
However, the grammar is designed to support a "mode" flag (similar to like_regex flags)
in future iterations to toggle between to_tsquery, websearch_to_tsquery, and phraseto_tsquery.
---
src/backend/utils/adt/jsonpath.c | 90 ++++++++++++++++++++
src/backend/utils/adt/jsonpath_exec.c | 88 ++++++++++++++++++-
src/backend/utils/adt/jsonpath_gram.y | 66 +++++++++++++-
src/backend/utils/adt/jsonpath_scan.l | 2 +
src/include/utils/jsonpath.h | 15 ++++
src/test/regress/expected/jsonb_jsonpath.out | 46 ++++++++++
src/test/regress/expected/jsonpath.out | 42 +++++++++
src/test/regress/sql/jsonb_jsonpath.sql | 8 ++
src/test/regress/sql/jsonpath.sql | 14 +++
9 files changed, 369 insertions(+), 2 deletions(-)
diff --git a/src/backend/utils/adt/jsonpath.c b/src/backend/utils/adt/jsonpath.c
index 18a8046d6cf..87c26f7adcf 100644
--- a/src/backend/utils/adt/jsonpath.c
+++ b/src/backend/utils/adt/jsonpath.c
@@ -349,6 +349,50 @@ flattenJsonPathParseItem(StringInfo buf, int *result, struct Node *escontext,
*(int32 *) (buf->data + offs) = chld - pos;
}
break;
+ case jpiTsMatch:
+ {
+ int32 expr_off;
+ int32 config_off;
+
+ expr_off = reserveSpaceForItemPointer(buf); /* Slot for '@' */
+ config_off = reserveSpaceForItemPointer(buf); /* Slot for 'tsconfig' */
+
+ /* Write the tsquery String */
+ appendBinaryStringInfo(buf,
+ &item->value.tsmatch.tsquerylen,
+ sizeof(item->value.tsmatch.tsquerylen));
+ appendBinaryStringInfo(buf,
+ item->value.tsmatch.tsquery,
+ item->value.tsmatch.tsquerylen);
+ appendStringInfoChar(buf, '\0'); /* Safety Null Terminator */
+
+ /* Flatten Child 1: Expression (@) */
+ if (!flattenJsonPathParseItem(buf, &chld, escontext,
+ item->value.tsmatch.doc,
+ nestingLevel,
+ insideArraySubscript))
+ return false;
+ /* Patch the first slot */
+ *(int32 *) (buf->data + expr_off) = chld - pos;
+
+ /* Flatten Child 2: TSConfig (Optional) */
+ if (item->value.tsmatch.tsconfig)
+ {
+ if (!flattenJsonPathParseItem(buf, &chld, escontext,
+ item->value.tsmatch.tsconfig,
+ nestingLevel,
+ insideArraySubscript))
+ return false;
+ /* Patch the second slot */
+ *(int32 *) (buf->data + config_off) = chld - pos;
+ }
+ else
+ {
+ /* If no config, write 0 to the slot (Null ptr) */
+ *(int32 *) (buf->data + config_off) = 0;
+ }
+ }
+ break;
case jpiFilter:
argNestingLevel++;
/* FALLTHROUGH */
@@ -759,6 +803,38 @@ printJsonPathItem(StringInfo buf, JsonPathItem *v, bool inKey,
appendStringInfoChar(buf, '"');
}
+ if (printBracketes)
+ appendStringInfoChar(buf, ')');
+ break;
+ case jpiTsMatch:
+ if (printBracketes)
+ appendStringInfoChar(buf, '(');
+
+ jspInitByBuffer(&elem, v->base, v->content.tsmatch.doc);
+ printJsonPathItem(buf, &elem, false,
+ operationPriority(elem.type) <=
+ operationPriority(v->type));
+
+ appendStringInfoString(buf, " tsmatch ");
+
+ escape_json_with_len(buf,
+ v->content.tsmatch.tsquery,
+ v->content.tsmatch.tsquerylen);
+
+ if (v->content.tsmatch.tsconfig)
+ {
+ JsonPathItem config_item;
+ int32 config_len;
+ char *config_str;
+
+ appendStringInfoString(buf, " tsconfig ");
+ jspInitByBuffer(&config_item, v->base, v->content.tsmatch.tsconfig);
+ config_str = jspGetString(&config_item, &config_len);
+ appendStringInfoChar(buf, '"');
+ appendBinaryStringInfo(buf, config_str, config_len);
+ appendStringInfoChar(buf, '"');
+ }
+
if (printBracketes)
appendStringInfoChar(buf, ')');
break;
@@ -914,6 +990,8 @@ jspOperationName(JsonPathItemType type)
return "timestamp";
case jpiTimestampTz:
return "timestamp_tz";
+ case jpiTsMatch:
+ return "tsmatch";
default:
elog(ERROR, "unrecognized jsonpath item type: %d", type);
return NULL;
@@ -1072,6 +1150,12 @@ jspInitByBuffer(JsonPathItem *v, char *base, int32 pos)
read_int32(v->content.like_regex.patternlen, base, pos);
v->content.like_regex.pattern = base + pos;
break;
+ case jpiTsMatch:
+ read_int32(v->content.tsmatch.doc, base, pos);
+ read_int32(v->content.tsmatch.tsconfig, base, pos);
+ read_int32(v->content.tsmatch.tsquerylen, base, pos);
+ v->content.tsmatch.tsquery = base + pos;
+ break;
default:
elog(ERROR, "unrecognized jsonpath item type: %d", v->type);
}
@@ -1142,6 +1226,7 @@ jspGetNext(JsonPathItem *v, JsonPathItem *a)
v->type == jpiLast ||
v->type == jpiStartsWith ||
v->type == jpiLikeRegex ||
+ v->type == jpiTsMatch ||
v->type == jpiBigint ||
v->type == jpiBoolean ||
v->type == jpiDate ||
@@ -1474,6 +1559,11 @@ jspIsMutableWalker(JsonPathItem *jpi, struct JsonPathMutableContext *cxt)
jspInitByBuffer(&arg, jpi->base, jpi->content.like_regex.expr);
jspIsMutableWalker(&arg, cxt);
break;
+ case jpiTsMatch:
+ Assert(status == jpdsNonDateTime);
+ jspInitByBuffer(&arg, jpi->base, jpi->content.tsmatch.doc);
+ jspIsMutableWalker(&arg, cxt);
+ break;
/* literals */
case jpiNull:
diff --git a/src/backend/utils/adt/jsonpath_exec.c b/src/backend/utils/adt/jsonpath_exec.c
index 52ae0ba4cf7..35b29148316 100644
--- a/src/backend/utils/adt/jsonpath_exec.c
+++ b/src/backend/utils/adt/jsonpath_exec.c
@@ -123,6 +123,12 @@ typedef struct JsonLikeRegexContext
int cflags;
} JsonLikeRegexContext;
+typedef struct JsonTsMatchContext
+{
+ text *vec;
+ Oid tsCfg_id;
+} JsonTsMatchContext;
+
/* Result of jsonpath predicate evaluation */
typedef enum JsonPathBool
{
@@ -306,6 +312,7 @@ static JsonPathExecResult executeKeyValueMethod(JsonPathExecContext *cxt,
JsonPathItem *jsp, JsonbValue *jb, JsonValueList *found);
static JsonPathExecResult appendBoolResult(JsonPathExecContext *cxt,
JsonPathItem *jsp, JsonValueList *found, JsonPathBool res);
+static JsonPathBool executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg, void *param);
static void getJsonPathItem(JsonPathExecContext *cxt, JsonPathItem *item,
JsonbValue *value);
static JsonbValue *GetJsonPathVar(void *cxt, char *varName, int varNameLen,
@@ -800,6 +807,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp,
case jpiExists:
case jpiStartsWith:
case jpiLikeRegex:
+ case jpiTsMatch:
{
JsonPathBool st = executeBoolItem(cxt, jsp, jb, true);
@@ -1868,6 +1876,16 @@ executeBoolItem(JsonPathExecContext *cxt, JsonPathItem *jsp,
return executePredicate(cxt, jsp, &larg, NULL, jb, false,
executeLikeRegex, &lrcxt);
}
+ case jpiTsMatch:
+ {
+ JsonTsMatchContext lrcxt = {0};
+
+ jspInitByBuffer(&larg, jsp->base,
+ jsp->content.tsmatch.doc);
+
+ return executePredicate(cxt, jsp, &larg, NULL, jb, false,
+ executeTsMatch, &lrcxt);
+ }
case jpiExists:
jspGetArg(jsp, &larg);
@@ -1899,7 +1917,6 @@ executeBoolItem(JsonPathExecContext *cxt, JsonPathItem *jsp,
return res == jperOk ? jpbTrue : jpbFalse;
}
-
default:
elog(ERROR, "invalid boolean jsonpath item type: %d", jsp->type);
return jpbUnknown;
@@ -2922,6 +2939,75 @@ executeKeyValueMethod(JsonPathExecContext *cxt, JsonPathItem *jsp,
return res;
}
+#include "tsearch/ts_utils.h"
+#include "tsearch/ts_cache.h"
+#include "utils/regproc.h"
+#include "catalog/namespace.h"
+
+static JsonPathBool
+executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg,
+ void *param)
+{
+ JsonTsMatchContext *cxt = param;
+ text *doc;
+ Datum tsquery;
+ Datum tsvector;
+ bool match;
+
+ if (!(str = getScalar(str, jbvString)))
+ return jpbUnknown;
+
+ /* Setup the Context (Run only once per predicate) */
+ if (!cxt->vec)
+ {
+ /* Cache the tsquery */
+ cxt->vec = cstring_to_text_with_len(jsp->content.tsmatch.tsquery,
+ jsp->content.tsmatch.tsquerylen);
+
+ /* Resolve the tsconfig OID from the offset */
+ if (jsp->content.tsmatch.tsconfig != 0)
+ {
+ JsonPathItem config_item;
+ int32 config_len;
+ char *config_str;
+
+ jspInitByBuffer(&config_item, jsp->base, jsp->content.tsmatch.tsconfig);
+ config_str = jspGetString(&config_item, &config_len);;
+
+ cxt->tsCfg_id = get_ts_config_oid(stringToQualifiedNameList(config_str, NULL), true);
+ }
+ else
+ {
+ cxt->tsCfg_id = getTSCurrentConfig(true);
+ }
+ }
+
+ /*
+ * elog(NOTICE, "ts_config=[%s] cfgId=[%u] vector=[%.*s] query=[%s]",
+ * config_name, cxt->cfgId, str->val.string.len, str->val.string.val,
+ * jsp->content.tsmatch.query);
+ */
+
+
+ doc = cstring_to_text_with_len(str->val.string.val,
+ str->val.string.len);
+
+ tsvector = DirectFunctionCall2(to_tsvector_byid,
+ ObjectIdGetDatum(cxt->tsCfg_id),
+ PointerGetDatum(doc));
+
+ tsquery = DirectFunctionCall2(plainto_tsquery_byid,
+ ObjectIdGetDatum(cxt->tsCfg_id),
+ PointerGetDatum(cxt->vec));
+
+
+ match = DatumGetBool(DirectFunctionCall2(ts_match_vq,
+ tsvector,
+ tsquery));
+
+ return match ? jpbTrue : jpbFalse;
+}
+
/*
* Convert boolean execution status 'res' to a boolean JSON item and execute
diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y
index 4543626ffc8..894007ad992 100644
--- a/src/backend/utils/adt/jsonpath_gram.y
+++ b/src/backend/utils/adt/jsonpath_gram.y
@@ -43,6 +43,11 @@ static bool makeItemLikeRegex(JsonPathParseItem *expr,
JsonPathString *flags,
JsonPathParseItem ** result,
struct Node *escontext);
+static bool makeItemTsMatch(JsonPathParseItem *doc,
+ JsonPathString *tsquery,
+ JsonPathString *tsconfig,
+ JsonPathParseItem ** result,
+ struct Node *escontext);
/*
* Bison doesn't allocate anything that needs to live across parser calls,
@@ -81,7 +86,7 @@ static bool makeItemLikeRegex(JsonPathParseItem *expr,
%token <str> IDENT_P STRING_P NUMERIC_P INT_P VARIABLE_P
%token <str> OR_P AND_P NOT_P
%token <str> LESS_P LESSEQUAL_P EQUAL_P NOTEQUAL_P GREATEREQUAL_P GREATER_P
-%token <str> ANY_P STRICT_P LAX_P LAST_P STARTS_P WITH_P LIKE_REGEX_P FLAG_P
+%token <str> ANY_P STRICT_P LAX_P LAST_P STARTS_P WITH_P LIKE_REGEX_P FLAG_P TSMATCH_P TSCONFIG_P
%token <str> ABS_P SIZE_P TYPE_P FLOOR_P DOUBLE_P CEILING_P KEYVALUE_P
%token <str> DATETIME_P
%token <str> BIGINT_P BOOLEAN_P DATE_P DECIMAL_P INTEGER_P NUMBER_P
@@ -187,6 +192,20 @@ predicate:
YYABORT;
$$ = jppitem;
}
+ | expr TSMATCH_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ if (! makeItemTsMatch($1, &$3, NULL, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ if (! makeItemTsMatch($1, &$3, &$5, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
;
starts_with_initial:
@@ -357,6 +376,8 @@ key_name:
| TIME_TZ_P
| TIMESTAMP_P
| TIMESTAMP_TZ_P
+ | TSCONFIG_P
+ | TSMATCH_P
;
method:
@@ -683,3 +704,46 @@ jspConvertRegexFlags(uint32 xflags, int *result, struct Node *escontext)
return true;
}
+
+
+static bool
+makeItemTsMatch(JsonPathParseItem *doc,
+ JsonPathString *tsquery,
+ JsonPathString *tsconfig,
+ JsonPathParseItem **result,
+ struct Node *escontext)
+{
+ /* Allocate the parent node */
+ JsonPathParseItem *v = makeItemType(jpiTsMatch);
+
+ /* Attach the child expression (@) */
+ v->value.tsmatch.doc = doc;
+
+ /* Attach the Pattern (Stored as raw char* because it's always a leaf) */
+ v->value.tsmatch.tsquery = tsquery->val;
+ v->value.tsmatch.tsquerylen = tsquery->len;
+
+ /* Handle the Configuration (Stored as a Node) */
+ if (tsconfig)
+ {
+ /*
+ * The flattener expects tsconfig to be a JsonPathParseItem*.
+ * So we wrap the raw string in a jpiString node.
+ */
+ JsonPathParseItem *conf = makeItemType(jpiString);
+
+ conf->value.string.val = tsconfig->val;
+ conf->value.string.len = tsconfig->len;
+
+ /* Assign the pointer */
+ v->value.tsmatch.tsconfig = conf;
+ }
+ else
+ {
+ v->value.tsmatch.tsconfig = NULL;
+ }
+
+ /* Success */
+ *result = v;
+ return true;
+}
diff --git a/src/backend/utils/adt/jsonpath_scan.l b/src/backend/utils/adt/jsonpath_scan.l
index 38c5841e879..86a9c03436c 100644
--- a/src/backend/utils/adt/jsonpath_scan.l
+++ b/src/backend/utils/adt/jsonpath_scan.l
@@ -427,9 +427,11 @@ static const JsonPathKeyword keywords[] = {
{7, false, DECIMAL_P, "decimal"},
{7, false, INTEGER_P, "integer"},
{7, false, TIME_TZ_P, "time_tz"},
+ {7, false, TSMATCH_P, "tsmatch"},
{7, false, UNKNOWN_P, "unknown"},
{8, false, DATETIME_P, "datetime"},
{8, false, KEYVALUE_P, "keyvalue"},
+ {8, false, TSCONFIG_P, "tsconfig"},
{9, false, TIMESTAMP_P, "timestamp"},
{10, false, LIKE_REGEX_P, "like_regex"},
{12, false, TIMESTAMP_TZ_P, "timestamp_tz"},
diff --git a/src/include/utils/jsonpath.h b/src/include/utils/jsonpath.h
index 6f529d74dcd..12899e0c496 100644
--- a/src/include/utils/jsonpath.h
+++ b/src/include/utils/jsonpath.h
@@ -104,6 +104,7 @@ typedef enum JsonPathItemType
jpiLast, /* LAST array subscript */
jpiStartsWith, /* STARTS WITH predicate */
jpiLikeRegex, /* LIKE_REGEX predicate */
+ jpiTsMatch, /* TSMATCH predicate */
jpiBigint, /* .bigint() item method */
jpiBoolean, /* .boolean() item method */
jpiDate, /* .date() item method */
@@ -188,6 +189,13 @@ typedef struct JsonPathItem
int32 patternlen;
uint32 flags;
} like_regex;
+ struct
+ {
+ int32 doc;
+ char *tsquery;
+ uint32 tsquerylen;
+ int32 tsconfig;
+ } tsmatch;
} content;
} JsonPathItem;
@@ -266,6 +274,13 @@ struct JsonPathParseItem
uint32 len;
char *val; /* could not be not null-terminated */
} string;
+ struct
+ {
+ JsonPathParseItem *doc;
+ char *tsquery;
+ uint32 tsquerylen;
+ JsonPathParseItem *tsconfig;
+ } tsmatch;
} value;
};
diff --git a/src/test/regress/expected/jsonb_jsonpath.out b/src/test/regress/expected/jsonb_jsonpath.out
index 4bcd4e91a29..edc812b9a32 100644
--- a/src/test/regress/expected/jsonb_jsonpath.out
+++ b/src/test/regress/expected/jsonb_jsonpath.out
@@ -4510,3 +4510,49 @@ ORDER BY s1.num, s2.num;
{"s": "B"} | {"s": "B"} | false | true | true | true | false
(144 rows)
+select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "fly" tsconfig "english")');
+ jsonb_path_query
+------------------
+(0 rows)
+
+select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "english")');
+ jsonb_path_query
+------------------
+ "running"
+ "runs"
+ "run"
+(3 rows)
+
+select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "simple")');
+ jsonb_path_query
+------------------
+ "run"
+(1 row)
+
+select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "english")');
+ jsonb_path_query
+------------------
+ "postgres"
+ "POSTGRES"
+(2 rows)
+
+select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "simple")');
+ jsonb_path_query
+------------------
+ "postgres"
+ "POSTGRES"
+(2 rows)
+
+select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english")');
+ jsonb_path_query
+------------------
+ "fast car"
+ "super fast car"
+(2 rows)
+
+select jsonb_path_query('["fat cat", "cat fat", "fat rats"]', 'lax $[*] ? (@ tsmatch "fat & rat" tsconfig "english")');
+ jsonb_path_query
+------------------
+ "fat rats"
+(1 row)
+
diff --git a/src/test/regress/expected/jsonpath.out b/src/test/regress/expected/jsonpath.out
index fd9bd755f52..84a11a3ebeb 100644
--- a/src/test/regress/expected/jsonpath.out
+++ b/src/test/regress/expected/jsonpath.out
@@ -1294,3 +1294,45 @@ FROM unnest(ARRAY['$ ? (@ like_regex "pattern" flag "smixq")'::text,
1a | f | 42601 | trailing junk after numeric literal at or near "1a" of jsonpath input | |
(5 rows)
+-- tsmatch (Full Text Search)
+-- basic success
+select '$ ? (@ tsmatch "simple")'::jsonpath;
+ jsonpath
+------------------------
+ $?(@ tsmatch "simple")
+(1 row)
+
+select '$ ? (@ tsmatch "running" tsconfig "english")'::jsonpath;
+ jsonpath
+--------------------------------------------
+ $?(@ tsmatch "running" tsconfig "english")
+(1 row)
+
+select '$ ? (@ tsmatch "fast & furious" tsconfig "simple")'::jsonpath;
+ jsonpath
+--------------------------------------------------
+ $?(@ tsmatch "fast & furious" tsconfig "simple")
+(1 row)
+
+select '$[*] ? (@.title tsmatch "god" && @.rating > 5)'::jsonpath;
+ jsonpath
+--------------------------------------------------
+ $[*]?(@."title" tsmatch "god" && @."rating" > 5)
+(1 row)
+
+select '$ ? (@ tsmatch $pattern)'::jsonpath;
+ERROR: syntax error at or near "$pattern" of jsonpath input
+LINE 1: select '$ ? (@ tsmatch $pattern)'::jsonpath;
+ ^
+-- only string literals (no variables) are allowed for tsquery
+select '$ ? (@ tsmatch $var tsconfig "english")'::jsonpath;
+ERROR: syntax error at or near "$var" of jsonpath input
+LINE 1: select '$ ? (@ tsmatch $var tsconfig "english")'::jsonpath;
+ ^
+-- if a tsconfig doesn't exist it should parse nonetheless (executor will fail it)
+select '$ ? (@ tsmatch "running" tsconfig "wrongconfig")'::jsonpath;
+ jsonpath
+------------------------------------------------
+ $?(@ tsmatch "running" tsconfig "wrongconfig")
+(1 row)
+
diff --git a/src/test/regress/sql/jsonb_jsonpath.sql b/src/test/regress/sql/jsonb_jsonpath.sql
index 3e8929a5269..f9ae5889ee2 100644
--- a/src/test/regress/sql/jsonb_jsonpath.sql
+++ b/src/test/regress/sql/jsonb_jsonpath.sql
@@ -1147,3 +1147,11 @@ SELECT
jsonb_path_query_first(s1.j, '$.s > $s', vars => s2.j) gt
FROM str s1, str s2
ORDER BY s1.num, s2.num;
+
+select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "fly" tsconfig "english")');
+select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "english")');
+select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "simple")');
+select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "english")');
+select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "simple")');
+select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english")');
+select jsonb_path_query('["fat cat", "cat fat", "fat rats"]', 'lax $[*] ? (@ tsmatch "fat & rat" tsconfig "english")');
diff --git a/src/test/regress/sql/jsonpath.sql b/src/test/regress/sql/jsonpath.sql
index 61a5270d4e8..edb515204ba 100644
--- a/src/test/regress/sql/jsonpath.sql
+++ b/src/test/regress/sql/jsonpath.sql
@@ -265,3 +265,17 @@ FROM unnest(ARRAY['$ ? (@ like_regex "pattern" flag "smixq")'::text,
'00',
'1a']) str,
LATERAL pg_input_error_info(str, 'jsonpath') as errinfo;
+
+-- tsmatch (Full Text Search)
+
+-- basic success
+select '$ ? (@ tsmatch "simple")'::jsonpath;
+select '$ ? (@ tsmatch "running" tsconfig "english")'::jsonpath;
+select '$ ? (@ tsmatch "fast & furious" tsconfig "simple")'::jsonpath;
+select '$[*] ? (@.title tsmatch "god" && @.rating > 5)'::jsonpath;
+select '$ ? (@ tsmatch $pattern)'::jsonpath;
+
+-- only string literals (no variables) are allowed for tsquery
+select '$ ? (@ tsmatch $var tsconfig "english")'::jsonpath;
+-- if a tsconfig doesn't exist it should parse nonetheless (executor will fail it)
+select '$ ? (@ tsmatch "running" tsconfig "wrongconfig")'::jsonpath;
--
2.52.0
[application/octet-stream] tsmatch_bench.out (6.7K, 4-tsmatch_bench.out)
download
[application/octet-stream] tsmatch_bench.sql (2.4K, 5-tsmatch_bench.sql)
download
^ permalink raw reply [nested|flat] 2+ messages in thread
* Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search
2026-01-26 17:22 Patch: Add tsmatch JSONPath operator for granular Full Text Search Florents Tselai <[email protected]>
@ 2026-02-01 11:02 ` Florents Tselai <[email protected]>
0 siblings, 0 replies; 2+ messages in thread
From: Florents Tselai @ 2026-02-01 11:02 UTC (permalink / raw)
To: pgsql-hackers <[email protected]>
On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai <[email protected]>
wrote:
> Hi,
>
> in real-life I work a lot with json & fts search, here's a feature I've
> always wished I had,
> but never tackle it. Until yesterday that is.
>
> SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body
> tsmatch "performance")');
>
> This patch introduces a tsmatch boolean operator to the JSONPath engine.
> By integrating FTS natively into path expressions,
> this operator allows for high-precision filtering of nested JSONB
> structures—
> solving issues with structural ambiguity and query complexity.
>
> Currently, users must choose between two suboptimal paths for FTS-ing
> nested JSON:
> - Imprecise Global Indexing
> jsonb_to_tsvector aggregates text into a flat vector.
> This ignores JSON boundaries, leading to false positives when the same key
> (e.g., "body")
> appears in different contexts (e.g., a "Product Description" vs. a
> "Customer Review").
>
> - Complex SQL Workarounds
> Achieving 100% precision requires unnesting the document via
> jsonb_array_elements and LATERAL joins.
> This leads to verbose SQL and high memory overhead from generating
> intermediate heap tuples.
>
> One of the most significant advantages of tsmatch is its ability to
> participate in multi-condition predicates
> within the same JSON object - something jsonb_to_tsvector cannot do.
>
> SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body
> tsmatch "performance")');
>
> In a flat vector, the association between "Alice" and "performance" is
> lost.
> tsmatch preserves this link by evaluating the FTS predicate in-place
> during path traversal.
>
> While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an
> FTS operator,
> tsmatch is architecturally modeled after the standard-defined like_regex.
>
> The implementation follows the like_regex precedent:
> it is a non-indexable predicate that relies on GIN path-matching for
> pruning and heap re-checks for precision.
> Caching is scoped to the JsonPathExecContext,
> ensuring 'compile-once' efficiency per execution without violating the
> stability requirements of prepared statements.
>
> This initial implementation uses plainto_tsquery.
> However, the grammar is designed to support a "mode" flag (similar to
> like_regex flags)
> in future iterations to toggle between to_tsquery, websearch_to_tsquery,
> and phraseto_tsquery.
>
Here's a v2, that implements the tsqparser clause
So this should now work too
select jsonb_path_query_array('["fast car", "slow car", "fast and
furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w")
Attachments:
[application/octet-stream] v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch (31.2K, 3-v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch)
download | inline diff:
From 68e16b4515fa31873a2764d2f48909e65aed92bc Mon Sep 17 00:00:00 2001
From: Florents Tselai <[email protected]>
Date: Sun, 1 Feb 2026 11:42:58 +0200
Subject: [PATCH v2] Add tsmatch JSONPath operator for granular Full Text
Search
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This patch introduces the tsmatch boolean operator to the JSONPath engine.
By integrating FTS natively into path expressions, this operator allows
for high-precision filtering of nested JSONB structures—solving issues
with structural ambiguity and query complexity.
Currently, users must choose between two suboptimal paths for searching
nested JSON:
1. Imprecise Global Indexing
jsonb_to_tsvector aggregates text into a flat vector. This ignores
JSON boundaries, leading to false positives when the same key
(e.g., "body") appears in different contexts (e.g., a "Product
Description" vs. a "Customer Review").
2. Complex SQL Workarounds
Achieving 100% precision requires "exploding" the document via
jsonb_array_elements and LATERAL joins. This leads to verbose SQL
and high memory overhead from generating intermediate heap tuples.
One of the most significant advantages of tsmatch is its ability to
participate in multi-condition predicates within the same JSON object—
something jsonb_to_tsvector cannot do.
SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
In a flat vector, the association between "Alice" and "performance" is
lost. tsmatch preserves this link by evaluating the FTS predicate
in-place during path traversal.
While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define
an FTS operator, tsmatch is architecturally modeled after the
standard-defined like_regex.
The operator supports optional configuration for both the dictionary and
the query parser:
@ tsmatch "query" [ tsconfig "regconfig" ] [ tsqparser "mode" ]
Supported parser modes are:
- "pl": plainto_tsquery (no operators required)
- "ph": phraseto_tsquery
- "w": websearch_to_tsquery
- Omitted: Defaults to to_tsquery (strict mode)
The implementation relies on GIN path-matching for index pruning and
heap re-checks for precision. Caching is scoped to the
JsonPathExecContext, ensuring 'compile-once' efficiency for the
tsquery and OID lookup per execution, respecting the stability
requirements of prepared statements.
---
doc/src/sgml/func/func-json.sgml | 34 +++++
src/backend/utils/adt/jsonpath.c | 136 ++++++++++++++++++-
src/backend/utils/adt/jsonpath_exec.c | 130 +++++++++++++++++-
src/backend/utils/adt/jsonpath_gram.y | 99 +++++++++++++-
src/backend/utils/adt/jsonpath_scan.l | 3 +
src/include/utils/jsonpath.h | 19 +++
src/test/regress/expected/jsonb_jsonpath.out | 58 ++++++++
src/test/regress/expected/jsonpath.out | 60 ++++++++
src/test/regress/sql/jsonb_jsonpath.sql | 13 ++
src/test/regress/sql/jsonpath.sql | 19 +++
10 files changed, 567 insertions(+), 4 deletions(-)
diff --git a/doc/src/sgml/func/func-json.sgml b/doc/src/sgml/func/func-json.sgml
index 1ec73cff464..f5d6da3824d 100644
--- a/doc/src/sgml/func/func-json.sgml
+++ b/doc/src/sgml/func/func-json.sgml
@@ -3107,6 +3107,40 @@ ERROR: jsonpath member accessor can only be applied to an object
<returnvalue>[]</returnvalue>
</para></entry>
</row>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <replaceable>string</replaceable> <literal>tsmatch</literal> <replaceable>string</replaceable>
+ <optional> <literal>tsconfig</literal> <replaceable>string</replaceable> </optional>
+ <optional> <literal>tsqparser</literal> <replaceable>string</replaceable> </optional>
+ <returnvalue>boolean</returnvalue>
+ </para>
+ <para>
+ Tests whether the JSON string (first operand) matches the full-text search query
+ (second operand). The matching behavior can be customized using the optional
+ <literal>tsconfig</literal> and <literal>tsqparser</literal> clauses.
+ If <literal>tsconfig</literal> is omitted, the current session's default text search configuration
+ is used (see <xref linkend="guc-default-text-search-config"/>).
+ The <literal>tsqparser</literal> clause determines how the query string is parsed
+ (see <xref linkend="textsearch-parsing-queries"/>).
+ Valid options are <literal>"pl"</literal> (<function>plainto_tsquery</function>),
+ <literal>"ph"</literal> (<function>phraseto_tsquery</function>), and
+ <literal>"w"</literal> (<function>websearch_to_tsquery</function>).
+ If <literal>tsqparser</literal> is omitted, the query is parsed using <function>to_tsquery</function>.
+ </para>
+ <para>
+ <literal>jsonb_path_query_array('["running", "runs", "ran", "jogging"]', '$[*] ? (@ tsmatch "run" tsconfig "english")')</literal>
+ <returnvalue>["running", "runs"]</returnvalue>
+ </para>
+ <para>
+ <literal>jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast & car")')</literal>
+ <returnvalue>["fast car"]</returnvalue>
+ </para>
+ <para>
+ <literal>jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w")')</literal>
+ <returnvalue>["fast car"]</returnvalue>
+ </para>
+ </entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/utils/adt/jsonpath.c b/src/backend/utils/adt/jsonpath.c
index 18a8046d6cf..df4ee2a26f1 100644
--- a/src/backend/utils/adt/jsonpath.c
+++ b/src/backend/utils/adt/jsonpath.c
@@ -349,6 +349,66 @@ flattenJsonPathParseItem(StringInfo buf, int *result, struct Node *escontext,
*(int32 *) (buf->data + offs) = chld - pos;
}
break;
+ case jpiTsMatch:
+ {
+ int32 expr_off;
+ int32 tsconfig_off;
+
+ /* Reserve slots for child node offsets */
+ expr_off = reserveSpaceForItemPointer(buf);
+ tsconfig_off = reserveSpaceForItemPointer(buf);
+
+ /* Write the Main Query String */
+ appendBinaryStringInfo(buf,
+ &item->value.tsmatch.tsquerylen,
+ sizeof(item->value.tsmatch.tsquerylen));
+ appendBinaryStringInfo(buf,
+ item->value.tsmatch.tsquery,
+ item->value.tsmatch.tsquerylen);
+ appendStringInfoChar(buf, '\0');
+
+ /* Write the Parser Flag */
+ if (item->value.tsmatch.tsqparser)
+ {
+ appendBinaryStringInfo(buf,
+ &item->value.tsmatch.tsqparser_len,
+ sizeof(item->value.tsmatch.tsqparser_len));
+ appendBinaryStringInfo(buf,
+ item->value.tsmatch.tsqparser,
+ item->value.tsmatch.tsqparser_len);
+ appendStringInfoChar(buf, '\0');
+ }
+ else
+ {
+ uint32 zero = 0;
+
+ appendBinaryStringInfo(buf, &zero, sizeof(uint32));
+ appendStringInfoChar(buf, '\0');
+ }
+
+ if (!flattenJsonPathParseItem(buf, &chld, escontext,
+ item->value.tsmatch.doc,
+ nestingLevel,
+ insideArraySubscript))
+ return false;
+ *(int32 *) (buf->data + expr_off) = chld - pos;
+
+ /* TSConfig (Optional) */
+ if (item->value.tsmatch.tsconfig)
+ {
+ if (!flattenJsonPathParseItem(buf, &chld, escontext,
+ item->value.tsmatch.tsconfig,
+ nestingLevel,
+ insideArraySubscript))
+ return false;
+ *(int32 *) (buf->data + tsconfig_off) = chld - pos;
+ }
+ else
+ {
+ *(int32 *) (buf->data + tsconfig_off) = 0;
+ }
+ }
+ break;
case jpiFilter:
argNestingLevel++;
/* FALLTHROUGH */
@@ -759,6 +819,58 @@ printJsonPathItem(StringInfo buf, JsonPathItem *v, bool inKey,
appendStringInfoChar(buf, '"');
}
+ if (printBracketes)
+ appendStringInfoChar(buf, ')');
+ break;
+ case jpiTsMatch:
+ if (printBracketes)
+ appendStringInfoChar(buf, '(');
+
+ jspInitByBuffer(&elem, v->base, v->content.tsmatch.doc);
+ printJsonPathItem(buf, &elem, false,
+ operationPriority(elem.type) <=
+ operationPriority(v->type));
+
+ appendStringInfoString(buf, " tsmatch ");
+
+ escape_json_with_len(buf,
+ v->content.tsmatch.tsquery,
+ v->content.tsmatch.tsquerylen);
+
+ /* Print TSConfig if present */
+ if (v->content.tsmatch.tsconfig)
+ {
+ JsonPathItem config_item;
+ int32 config_len;
+ char *config_str;
+
+ appendStringInfoString(buf, " tsconfig ");
+ jspInitByBuffer(&config_item, v->base, v->content.tsmatch.tsconfig);
+ config_str = jspGetString(&config_item, &config_len);
+ appendStringInfoChar(buf, '"');
+ appendBinaryStringInfo(buf, config_str, config_len);
+ appendStringInfoChar(buf, '"');
+ }
+
+ if (v->content.tsmatch.tsqparser_len > 0)
+ {
+ appendStringInfoString(buf, " tsqparser ");
+ appendStringInfoChar(buf, '"');
+
+ /*
+ * Use simple binary append since flags like "pl" don't need
+ * JSON escaping
+ */
+ appendBinaryStringInfo(buf,
+ v->content.tsmatch.tsqparser,
+ v->content.tsmatch.tsqparser_len);
+ appendStringInfoChar(buf, '"');
+ }
+
+ if (printBracketes)
+ appendStringInfoChar(buf, ')');
+ break;
+
if (printBracketes)
appendStringInfoChar(buf, ')');
break;
@@ -914,6 +1026,8 @@ jspOperationName(JsonPathItemType type)
return "timestamp";
case jpiTimestampTz:
return "timestamp_tz";
+ case jpiTsMatch:
+ return "tsmatch";
default:
elog(ERROR, "unrecognized jsonpath item type: %d", type);
return NULL;
@@ -1072,6 +1186,21 @@ jspInitByBuffer(JsonPathItem *v, char *base, int32 pos)
read_int32(v->content.like_regex.patternlen, base, pos);
v->content.like_regex.pattern = base + pos;
break;
+ case jpiTsMatch:
+ read_int32(v->content.tsmatch.doc, base, pos);
+ read_int32(v->content.tsmatch.tsconfig, base, pos);
+
+ /* Read Query String */
+ read_int32(v->content.tsmatch.tsquerylen, base, pos);
+ v->content.tsmatch.tsquery = base + pos;
+ /* Skip past query string + null terminator */
+ pos += v->content.tsmatch.tsquerylen + 1;
+
+ /* Read Parser Flag */
+ read_int32(v->content.tsmatch.tsqparser_len, base, pos);
+ v->content.tsmatch.tsqparser = base + pos;
+ pos += v->content.tsmatch.tsqparser_len + 1;
+ break;
default:
elog(ERROR, "unrecognized jsonpath item type: %d", v->type);
}
@@ -1142,6 +1271,7 @@ jspGetNext(JsonPathItem *v, JsonPathItem *a)
v->type == jpiLast ||
v->type == jpiStartsWith ||
v->type == jpiLikeRegex ||
+ v->type == jpiTsMatch ||
v->type == jpiBigint ||
v->type == jpiBoolean ||
v->type == jpiDate ||
@@ -1474,7 +1604,11 @@ jspIsMutableWalker(JsonPathItem *jpi, struct JsonPathMutableContext *cxt)
jspInitByBuffer(&arg, jpi->base, jpi->content.like_regex.expr);
jspIsMutableWalker(&arg, cxt);
break;
-
+ case jpiTsMatch:
+ Assert(status == jpdsNonDateTime);
+ jspInitByBuffer(&arg, jpi->base, jpi->content.tsmatch.doc);
+ jspIsMutableWalker(&arg, cxt);
+ break;
/* literals */
case jpiNull:
case jpiString:
diff --git a/src/backend/utils/adt/jsonpath_exec.c b/src/backend/utils/adt/jsonpath_exec.c
index 52ae0ba4cf7..8b932518a05 100644
--- a/src/backend/utils/adt/jsonpath_exec.c
+++ b/src/backend/utils/adt/jsonpath_exec.c
@@ -123,6 +123,13 @@ typedef struct JsonLikeRegexContext
int cflags;
} JsonLikeRegexContext;
+typedef struct JsonTsMatchContext
+{
+ Datum queryDatum; /* Cache the compiled binary TSQuery */
+ Oid tsconfigId; /* Cache the dictionary OID */
+ bool initialized; /* Flag to run setup only once */
+} JsonTsMatchContext;
+
/* Result of jsonpath predicate evaluation */
typedef enum JsonPathBool
{
@@ -306,6 +313,7 @@ static JsonPathExecResult executeKeyValueMethod(JsonPathExecContext *cxt,
JsonPathItem *jsp, JsonbValue *jb, JsonValueList *found);
static JsonPathExecResult appendBoolResult(JsonPathExecContext *cxt,
JsonPathItem *jsp, JsonValueList *found, JsonPathBool res);
+static JsonPathBool executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg, void *param);
static void getJsonPathItem(JsonPathExecContext *cxt, JsonPathItem *item,
JsonbValue *value);
static JsonbValue *GetJsonPathVar(void *cxt, char *varName, int varNameLen,
@@ -800,6 +808,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp,
case jpiExists:
case jpiStartsWith:
case jpiLikeRegex:
+ case jpiTsMatch:
{
JsonPathBool st = executeBoolItem(cxt, jsp, jb, true);
@@ -1868,6 +1877,16 @@ executeBoolItem(JsonPathExecContext *cxt, JsonPathItem *jsp,
return executePredicate(cxt, jsp, &larg, NULL, jb, false,
executeLikeRegex, &lrcxt);
}
+ case jpiTsMatch:
+ {
+ JsonTsMatchContext lrcxt = {0};
+
+ jspInitByBuffer(&larg, jsp->base,
+ jsp->content.tsmatch.doc);
+
+ return executePredicate(cxt, jsp, &larg, NULL, jb, false,
+ executeTsMatch, &lrcxt);
+ }
case jpiExists:
jspGetArg(jsp, &larg);
@@ -1899,7 +1918,6 @@ executeBoolItem(JsonPathExecContext *cxt, JsonPathItem *jsp,
return res == jperOk ? jpbTrue : jpbFalse;
}
-
default:
elog(ERROR, "invalid boolean jsonpath item type: %d", jsp->type);
return jpbUnknown;
@@ -2922,6 +2940,116 @@ executeKeyValueMethod(JsonPathExecContext *cxt, JsonPathItem *jsp,
return res;
}
+#include "tsearch/ts_utils.h"
+#include "tsearch/ts_cache.h"
+#include "utils/regproc.h"
+#include "catalog/namespace.h"
+
+static JsonPathBool
+executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg,
+ void *param)
+{
+ JsonTsMatchContext *cxt = param;
+ text *doc_text;
+ Datum tsvector_datum;
+ bool match;
+
+ if (!(str = getScalar(str, jbvString)))
+ return jpbUnknown;
+
+ /* Setup Context (Run ONLY once per predicate) */
+ if (!cxt->initialized)
+ {
+ text *query_text;
+ char *parser_mode;
+ uint32 parser_len;
+
+ if (jsp->content.tsmatch.tsconfig != 0)
+ {
+ JsonPathItem config_item;
+ int32 config_len;
+ char *config_str;
+
+ jspInitByBuffer(&config_item, jsp->base, jsp->content.tsmatch.tsconfig);
+ config_str = jspGetString(&config_item, &config_len);
+
+ cxt->tsconfigId = get_ts_config_oid(stringToQualifiedNameList(config_str, NULL), true);
+ }
+ else
+ {
+ cxt->tsconfigId = getTSCurrentConfig(true);
+ }
+
+ /* Prepare Query Text */
+ query_text = cstring_to_text_with_len(jsp->content.tsmatch.tsquery,
+ jsp->content.tsmatch.tsquerylen);
+
+ /* Select Parser and Compile Query */
+ parser_mode = jsp->content.tsmatch.tsqparser;
+ parser_len = jsp->content.tsmatch.tsqparser_len;
+
+ if (parser_len > 0)
+ {
+ /* Dispatch based on flag */
+ if (pg_strncasecmp(parser_mode, "pl", parser_len) == 0)
+ {
+ cxt->queryDatum = DirectFunctionCall2(plainto_tsquery_byid,
+ ObjectIdGetDatum(cxt->tsconfigId),
+ PointerGetDatum(query_text));
+ }
+ else if (pg_strncasecmp(parser_mode, "ph", parser_len) == 0)
+ {
+ cxt->queryDatum = DirectFunctionCall2(phraseto_tsquery_byid,
+ ObjectIdGetDatum(cxt->tsconfigId),
+ PointerGetDatum(query_text));
+ }
+ else if (pg_strncasecmp(parser_mode, "w", parser_len) == 0)
+ {
+ cxt->queryDatum = DirectFunctionCall2(websearch_to_tsquery_byid,
+ ObjectIdGetDatum(cxt->tsconfigId),
+ PointerGetDatum(query_text));
+ }
+ else
+ {
+ /*
+ * Fallback or Error for unknown flags (should be caught by
+ * parser)
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized tsqparser flag")));
+ }
+ }
+ else
+ {
+ /*
+ * Default: to_tsquery (Standard Mode) Note: This expects
+ * operators like '&' or '|' in the query string
+ */
+ cxt->queryDatum = DirectFunctionCall2(to_tsquery_byid,
+ ObjectIdGetDatum(cxt->tsconfigId),
+ PointerGetDatum(query_text));
+ }
+
+ cxt->initialized = true;
+ }
+
+ /* Runtime: Convert Doc to Vector and Match */
+
+ doc_text = cstring_to_text_with_len(str->val.string.val,
+ str->val.string.len);
+
+ tsvector_datum = DirectFunctionCall2(to_tsvector_byid,
+ ObjectIdGetDatum(cxt->tsconfigId),
+ PointerGetDatum(doc_text));
+
+ match = DatumGetBool(DirectFunctionCall2(ts_match_vq,
+ tsvector_datum,
+ cxt->queryDatum));
+
+ return match ? jpbTrue : jpbFalse;
+}
+
/*
* Convert boolean execution status 'res' to a boolean JSON item and execute
diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y
index 4543626ffc8..b0e19b8b567 100644
--- a/src/backend/utils/adt/jsonpath_gram.y
+++ b/src/backend/utils/adt/jsonpath_gram.y
@@ -43,7 +43,12 @@ static bool makeItemLikeRegex(JsonPathParseItem *expr,
JsonPathString *flags,
JsonPathParseItem ** result,
struct Node *escontext);
-
+static bool makeItemTsMatch(JsonPathParseItem *doc,
+ JsonPathString *tsquery,
+ JsonPathString *tsconfig,
+ JsonPathString *tsquery_parser,
+ JsonPathParseItem ** result,
+ struct Node *escontext);
/*
* Bison doesn't allocate anything that needs to live across parser calls,
* so we can easily have it use palloc instead of malloc. This prevents
@@ -81,7 +86,7 @@ static bool makeItemLikeRegex(JsonPathParseItem *expr,
%token <str> IDENT_P STRING_P NUMERIC_P INT_P VARIABLE_P
%token <str> OR_P AND_P NOT_P
%token <str> LESS_P LESSEQUAL_P EQUAL_P NOTEQUAL_P GREATEREQUAL_P GREATER_P
-%token <str> ANY_P STRICT_P LAX_P LAST_P STARTS_P WITH_P LIKE_REGEX_P FLAG_P
+%token <str> ANY_P STRICT_P LAX_P LAST_P STARTS_P WITH_P LIKE_REGEX_P FLAG_P TSMATCH_P TSCONFIG_P TSQUERYPARSER_P
%token <str> ABS_P SIZE_P TYPE_P FLOOR_P DOUBLE_P CEILING_P KEYVALUE_P
%token <str> DATETIME_P
%token <str> BIGINT_P BOOLEAN_P DATE_P DECIMAL_P INTEGER_P NUMBER_P
@@ -187,6 +192,37 @@ predicate:
YYABORT;
$$ = jppitem;
}
+ | expr TSMATCH_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ /* Pass NULL for tsconfig (3rd) and NULL for tsquery_parser (4th) */
+ if (! makeItemTsMatch($1, &$3, NULL, NULL, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ /* Pass NULL for tsquery_parser (4th) */
+ if (! makeItemTsMatch($1, &$3, &$5, NULL, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSQUERYPARSER_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ /* Pass NULL for tsconfig (3rd) */
+ if (! makeItemTsMatch($1, &$3, NULL, &$5, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P TSQUERYPARSER_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ if (! makeItemTsMatch($1, &$3, &$5, &$7, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
;
starts_with_initial:
@@ -357,6 +393,9 @@ key_name:
| TIME_TZ_P
| TIMESTAMP_P
| TIMESTAMP_TZ_P
+ | TSCONFIG_P
+ | TSMATCH_P
+ | TSQUERYPARSER_P
;
method:
@@ -683,3 +722,59 @@ jspConvertRegexFlags(uint32 xflags, int *result, struct Node *escontext)
return true;
}
+
+static bool
+makeItemTsMatch(JsonPathParseItem *doc,
+ JsonPathString *tsquery,
+ JsonPathString *tsconfig,
+ JsonPathString *tsquery_parser,
+ JsonPathParseItem **result,
+ struct Node *escontext)
+{
+ JsonPathParseItem *v = makeItemType(jpiTsMatch);
+
+ v->value.tsmatch.doc = doc;
+
+ v->value.tsmatch.tsquery = tsquery->val;
+ v->value.tsmatch.tsquerylen = tsquery->len;
+
+ /* Handle the Configuration (Stored as a Node) */
+ if (tsconfig)
+ {
+ JsonPathParseItem *conf = makeItemType(jpiString);
+ conf->value.string.val = tsconfig->val;
+ conf->value.string.len = tsconfig->len;
+ v->value.tsmatch.tsconfig = conf;
+ }
+ else
+ {
+ v->value.tsmatch.tsconfig = NULL;
+ }
+
+ /* Handle the TSQuery Parser Flag */
+ if (tsquery_parser)
+ {
+ /* Check for "pl", "ph", "w" here to throw a syntax error immediately */
+ if (pg_strncasecmp(tsquery_parser->val, "pl", tsquery_parser->len) != 0 &&
+ pg_strncasecmp(tsquery_parser->val, "ph", tsquery_parser->len) != 0 &&
+ pg_strncasecmp(tsquery_parser->val, "w", tsquery_parser->len) != 0)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid tsquery_parser value: \"%s\"", tsquery_parser->val),
+ errhint("Valid values are \"pl\", \"ph\", and \"w\".")));
+ }
+
+ v->value.tsmatch.tsqparser = tsquery_parser->val;
+ v->value.tsmatch.tsqparser_len = tsquery_parser->len;
+ }
+ else
+ {
+ v->value.tsmatch.tsqparser = NULL;
+ v->value.tsmatch.tsqparser_len = 0;
+ }
+
+ /* Success */
+ *result = v;
+ return true;
+}
diff --git a/src/backend/utils/adt/jsonpath_scan.l b/src/backend/utils/adt/jsonpath_scan.l
index 38c5841e879..809c66ec085 100644
--- a/src/backend/utils/adt/jsonpath_scan.l
+++ b/src/backend/utils/adt/jsonpath_scan.l
@@ -427,10 +427,13 @@ static const JsonPathKeyword keywords[] = {
{7, false, DECIMAL_P, "decimal"},
{7, false, INTEGER_P, "integer"},
{7, false, TIME_TZ_P, "time_tz"},
+ {7, false, TSMATCH_P, "tsmatch"},
{7, false, UNKNOWN_P, "unknown"},
{8, false, DATETIME_P, "datetime"},
{8, false, KEYVALUE_P, "keyvalue"},
+ {8, false, TSCONFIG_P, "tsconfig"},
{9, false, TIMESTAMP_P, "timestamp"},
+ {9, false, TSQUERYPARSER_P, "tsqparser"},
{10, false, LIKE_REGEX_P, "like_regex"},
{12, false, TIMESTAMP_TZ_P, "timestamp_tz"},
};
diff --git a/src/include/utils/jsonpath.h b/src/include/utils/jsonpath.h
index 6f529d74dcd..00752321ff6 100644
--- a/src/include/utils/jsonpath.h
+++ b/src/include/utils/jsonpath.h
@@ -104,6 +104,7 @@ typedef enum JsonPathItemType
jpiLast, /* LAST array subscript */
jpiStartsWith, /* STARTS WITH predicate */
jpiLikeRegex, /* LIKE_REGEX predicate */
+ jpiTsMatch, /* TSMATCH predicate */
jpiBigint, /* .bigint() item method */
jpiBoolean, /* .boolean() item method */
jpiDate, /* .date() item method */
@@ -188,6 +189,15 @@ typedef struct JsonPathItem
int32 patternlen;
uint32 flags;
} like_regex;
+ struct
+ {
+ int32 doc;
+ char *tsquery;
+ uint32 tsquerylen;
+ int32 tsconfig;
+ char *tsqparser;
+ uint32 tsqparser_len;
+ } tsmatch;
} content;
} JsonPathItem;
@@ -266,6 +276,15 @@ struct JsonPathParseItem
uint32 len;
char *val; /* could not be not null-terminated */
} string;
+ struct
+ {
+ JsonPathParseItem *doc;
+ char *tsquery;
+ uint32 tsquerylen;
+ JsonPathParseItem *tsconfig;
+ char *tsqparser;
+ uint32 tsqparser_len;
+ } tsmatch;
} value;
};
diff --git a/src/test/regress/expected/jsonb_jsonpath.out b/src/test/regress/expected/jsonb_jsonpath.out
index 4bcd4e91a29..35558af04e5 100644
--- a/src/test/regress/expected/jsonb_jsonpath.out
+++ b/src/test/regress/expected/jsonb_jsonpath.out
@@ -4510,3 +4510,61 @@ ORDER BY s1.num, s2.num;
{"s": "B"} | {"s": "B"} | false | true | true | true | false
(144 rows)
+select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "fly" tsconfig "english")');
+ jsonb_path_query
+------------------
+(0 rows)
+
+select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "english")');
+ jsonb_path_query
+------------------
+ "running"
+ "runs"
+ "run"
+(3 rows)
+
+select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "simple")');
+ jsonb_path_query
+------------------
+ "run"
+(1 row)
+
+select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "english")');
+ jsonb_path_query
+------------------
+ "postgres"
+ "POSTGRES"
+(2 rows)
+
+select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "simple")');
+ jsonb_path_query
+------------------
+ "postgres"
+ "POSTGRES"
+(2 rows)
+
+-- in the default tsqparser (to_tsquery) spaces are not allowed, so this should fail for syntax
+select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english")');
+ERROR: syntax error in tsquery: "fast car"
+-- if we specify "w" however it should be ok
+select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english" tsqparser "w")');
+ jsonb_path_query
+------------------
+ "fast car"
+ "super fast car"
+(2 rows)
+
+-- it should also be ok if we change to a valid to_tsquery
+select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast & car" tsconfig "english")');
+ jsonb_path_query
+------------------
+ "fast car"
+ "super fast car"
+(2 rows)
+
+select jsonb_path_query('["fat cat", "cat fat", "fat rats"]', 'lax $[*] ? (@ tsmatch "fat & rat" tsconfig "english")');
+ jsonb_path_query
+------------------
+ "fat rats"
+(1 row)
+
diff --git a/src/test/regress/expected/jsonpath.out b/src/test/regress/expected/jsonpath.out
index fd9bd755f52..02fececb2a9 100644
--- a/src/test/regress/expected/jsonpath.out
+++ b/src/test/regress/expected/jsonpath.out
@@ -1294,3 +1294,63 @@ FROM unnest(ARRAY['$ ? (@ like_regex "pattern" flag "smixq")'::text,
1a | f | 42601 | trailing junk after numeric literal at or near "1a" of jsonpath input | |
(5 rows)
+-- tsmatch (Full Text Search)
+-- basic success
+select '$ ? (@ tsmatch "simple")'::jsonpath;
+ jsonpath
+------------------------
+ $?(@ tsmatch "simple")
+(1 row)
+
+select '$ ? (@ tsmatch "running" tsconfig "english")'::jsonpath;
+ jsonpath
+--------------------------------------------
+ $?(@ tsmatch "running" tsconfig "english")
+(1 row)
+
+-- w/out tsconfig and tsqparser
+select '$ ? (@ tsmatch "fast & furious" tsconfig "simple")'::jsonpath;
+ jsonpath
+--------------------------------------------------
+ $?(@ tsmatch "fast & furious" tsconfig "simple")
+(1 row)
+
+select '$ ? (@ tsmatch "fast & furious" tsconfig "simple" tsqparser "w")'::jsonpath;
+ jsonpath
+----------------------------------------------------------------
+ $?(@ tsmatch "fast & furious" tsconfig "simple" tsqparser "w")
+(1 row)
+
+-- tsconfig must be specified first and then tsqparser
+select '$ ? (@ tsmatch "fast & furious" tsqparser "w" tsconfig "simple" )'::jsonpath;
+ERROR: syntax error at or near " " of jsonpath input
+LINE 1: select '$ ? (@ tsmatch "fast & furious" tsqparser "w" tsconf...
+ ^
+select '$ ? (@ tsmatch "fast & furious" tsqparser "w")'::jsonpath;
+ jsonpath
+----------------------------------------------
+ $?(@ tsmatch "fast & furious" tsqparser "w")
+(1 row)
+
+select '$[*] ? (@.title tsmatch "god" && @.rating > 5)'::jsonpath;
+ jsonpath
+--------------------------------------------------
+ $[*]?(@."title" tsmatch "god" && @."rating" > 5)
+(1 row)
+
+select '$ ? (@ tsmatch $pattern)'::jsonpath;
+ERROR: syntax error at or near "$pattern" of jsonpath input
+LINE 1: select '$ ? (@ tsmatch $pattern)'::jsonpath;
+ ^
+-- only string literals (no variables) are allowed for tsquery
+select '$ ? (@ tsmatch $var tsconfig "english")'::jsonpath;
+ERROR: syntax error at or near "$var" of jsonpath input
+LINE 1: select '$ ? (@ tsmatch $var tsconfig "english")'::jsonpath;
+ ^
+-- if a tsconfig doesn't exist it should parse nonetheless (executor will fail it)
+select '$ ? (@ tsmatch "running" tsconfig "wrongconfig")'::jsonpath;
+ jsonpath
+------------------------------------------------
+ $?(@ tsmatch "running" tsconfig "wrongconfig")
+(1 row)
+
diff --git a/src/test/regress/sql/jsonb_jsonpath.sql b/src/test/regress/sql/jsonb_jsonpath.sql
index 3e8929a5269..9c7229eb382 100644
--- a/src/test/regress/sql/jsonb_jsonpath.sql
+++ b/src/test/regress/sql/jsonb_jsonpath.sql
@@ -1147,3 +1147,16 @@ SELECT
jsonb_path_query_first(s1.j, '$.s > $s', vars => s2.j) gt
FROM str s1, str s2
ORDER BY s1.num, s2.num;
+
+select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "fly" tsconfig "english")');
+select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "english")');
+select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "simple")');
+select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "english")');
+select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "simple")');
+-- in the default tsqparser (to_tsquery) spaces are not allowed, so this should fail for syntax
+select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english")');
+-- if we specify "w" however it should be ok
+select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english" tsqparser "w")');
+-- it should also be ok if we change to a valid to_tsquery
+select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast & car" tsconfig "english")');
+select jsonb_path_query('["fat cat", "cat fat", "fat rats"]', 'lax $[*] ? (@ tsmatch "fat & rat" tsconfig "english")');
diff --git a/src/test/regress/sql/jsonpath.sql b/src/test/regress/sql/jsonpath.sql
index 61a5270d4e8..9f58f466626 100644
--- a/src/test/regress/sql/jsonpath.sql
+++ b/src/test/regress/sql/jsonpath.sql
@@ -265,3 +265,22 @@ FROM unnest(ARRAY['$ ? (@ like_regex "pattern" flag "smixq")'::text,
'00',
'1a']) str,
LATERAL pg_input_error_info(str, 'jsonpath') as errinfo;
+
+-- tsmatch (Full Text Search)
+
+-- basic success
+select '$ ? (@ tsmatch "simple")'::jsonpath;
+select '$ ? (@ tsmatch "running" tsconfig "english")'::jsonpath;
+-- w/out tsconfig and tsqparser
+select '$ ? (@ tsmatch "fast & furious" tsconfig "simple")'::jsonpath;
+select '$ ? (@ tsmatch "fast & furious" tsconfig "simple" tsqparser "w")'::jsonpath;
+-- tsconfig must be specified first and then tsqparser
+select '$ ? (@ tsmatch "fast & furious" tsqparser "w" tsconfig "simple" )'::jsonpath;
+select '$ ? (@ tsmatch "fast & furious" tsqparser "w")'::jsonpath;
+select '$[*] ? (@.title tsmatch "god" && @.rating > 5)'::jsonpath;
+select '$ ? (@ tsmatch $pattern)'::jsonpath;
+
+-- only string literals (no variables) are allowed for tsquery
+select '$ ? (@ tsmatch $var tsconfig "english")'::jsonpath;
+-- if a tsconfig doesn't exist it should parse nonetheless (executor will fail it)
+select '$ ? (@ tsmatch "running" tsconfig "wrongconfig")'::jsonpath;
--
2.52.0
^ permalink raw reply [nested|flat] 2+ messages in thread
end of thread, other threads:[~2026-02-01 11:02 UTC | newest]
Thread overview: 2+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-01-26 17:22 Patch: Add tsmatch JSONPath operator for granular Full Text Search Florents Tselai <[email protected]>
2026-02-01 11:02 ` Florents Tselai <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox