public inbox for [email protected]help / color / mirror / Atom feed
Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search 5+ messages / 2 participants [nested] [flat]
* Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search @ 2026-02-27 05:59 Florents Tselai <[email protected]> 2026-03-02 03:44 ` Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search Chao Li <[email protected]> 0 siblings, 1 reply; 5+ messages in thread From: Florents Tselai @ 2026-02-27 05:59 UTC (permalink / raw) To: Chao Li <[email protected]>; +Cc: pgsql-hackers <[email protected]> On Thu, Feb 26, 2026 at 8:48 AM Chao Li <[email protected]> wrote: > > > > On Feb 1, 2026, at 19:02, Florents Tselai <[email protected]> > wrote: > > > > > > > > > > On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai < > [email protected]> wrote: > > Hi, > > > > in real-life I work a lot with json & fts search, here's a feature I've > always wished I had, > > but never tackle it. Until yesterday that is. > > > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && > @.body tsmatch "performance")'); > > > > This patch introduces a tsmatch boolean operator to the JSONPath engine. > > By integrating FTS natively into path expressions, > > this operator allows for high-precision filtering of nested JSONB > structures— > > solving issues with structural ambiguity and query complexity. > > > > Currently, users must choose between two suboptimal paths for FTS-ing > nested JSON: > > - Imprecise Global Indexing > > jsonb_to_tsvector aggregates text into a flat vector. > > This ignores JSON boundaries, leading to false positives when the same > key (e.g., "body") > > appears in different contexts (e.g., a "Product Description" vs. a > "Customer Review"). > > > > - Complex SQL Workarounds > > Achieving 100% precision requires unnesting the document via > jsonb_array_elements and LATERAL joins. > > This leads to verbose SQL and high memory overhead from generating > intermediate heap tuples. > > > > One of the most significant advantages of tsmatch is its ability to > participate in multi-condition predicates > > within the same JSON object - something jsonb_to_tsvector cannot do. > > > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && > @.body tsmatch "performance")'); > > > > In a flat vector, the association between "Alice" and "performance" is > lost. > > tsmatch preserves this link by evaluating the FTS predicate in-place > during path traversal. > > > > While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define > an FTS operator, > > tsmatch is architecturally modeled after the standard-defined like_regex. > > > > The implementation follows the like_regex precedent: > > it is a non-indexable predicate that relies on GIN path-matching for > pruning and heap re-checks for precision. > > Caching is scoped to the JsonPathExecContext, > > ensuring 'compile-once' efficiency per execution without violating the > stability requirements of prepared statements. > > > > This initial implementation uses plainto_tsquery. > > However, the grammar is designed to support a "mode" flag (similar to > like_regex flags) > > in future iterations to toggle between to_tsquery, websearch_to_tsquery, > and phraseto_tsquery. > > > > Here's a v2, that implements the tsqparser clause > > > > So this should now work too > > > > select jsonb_path_query_array('["fast car", "slow car", "fast and > furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w") > <v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch> > > Hi Florents, > > Grant pinged me about this. I can review it in coming days. Can you please > rebase it? I failed to apply to current master. Also, the CF reported a > failure test case, please take a look. Hi Evan, thanks for having a look. The conflict was due to the intro of pg_fallthrough. Not related to this patch . I noticed the failure too, but I'm having a hard time reproducing it tbh. This fails for Debian Trixie with Meson. The same with Autoconf passes... https://github.com/Florents-Tselai/postgres/runs/65098077968 Attachments: [application/octet-stream] v3-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch (31.0K, 3-v3-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch) download | inline diff: From 56447340feddcbcef2a84a76a9be3e70e7f964f5 Mon Sep 17 00:00:00 2001 From: Florents Tselai <[email protected]> Date: Fri, 27 Feb 2026 07:37:49 +0200 Subject: [PATCH v3] Add tsmatch JSONPath operator for granular Full Text Search MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This patch introduces the tsmatch boolean operator to the JSONPath engine. By integrating FTS natively into path expressions, this operator allows for high-precision filtering of nested JSONB structures—solving issues with structural ambiguity and query complexity. Currently, users must choose between two suboptimal paths for searching nested JSON: 1. Imprecise Global Indexing jsonb_to_tsvector aggregates text into a flat vector. This ignores JSON boundaries, leading to false positives when the same key (e.g., "body") appears in different contexts (e.g., a "Product Description" vs. a "Customer Review"). 2. Complex SQL Workarounds Achieving 100% precision requires "exploding" the document via jsonb_array_elements and LATERAL joins. This leads to verbose SQL and high memory overhead from generating intermediate heap tuples. One of the most significant advantages of tsmatch is its ability to participate in multi-condition predicates within the same JSON object— something jsonb_to_tsvector cannot do. SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")'); In a flat vector, the association between "Alice" and "performance" is lost. tsmatch preserves this link by evaluating the FTS predicate in-place during path traversal. While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an FTS operator, tsmatch is architecturally modeled after the standard-defined like_regex. The operator supports optional configuration for both the dictionary and the query parser: @ tsmatch "query" [ tsconfig "regconfig" ] [ tsqparser "mode" ] Supported parser modes are: - "pl": plainto_tsquery (no operators required) - "ph": phraseto_tsquery - "w": websearch_to_tsquery - Omitted: Defaults to to_tsquery (strict mode) The implementation relies on GIN path-matching for index pruning and heap re-checks for precision. Caching is scoped to the JsonPathExecContext, ensuring 'compile-once' efficiency for the tsquery and OID lookup per execution, respecting the stability requirements of prepared statements. --- doc/src/sgml/func/func-json.sgml | 34 +++++ src/backend/utils/adt/jsonpath.c | 136 ++++++++++++++++++- src/backend/utils/adt/jsonpath_exec.c | 130 +++++++++++++++++- src/backend/utils/adt/jsonpath_gram.y | 99 +++++++++++++- src/backend/utils/adt/jsonpath_scan.l | 3 + src/include/utils/jsonpath.h | 19 +++ src/test/regress/expected/jsonb_jsonpath.out | 58 ++++++++ src/test/regress/expected/jsonpath.out | 60 ++++++++ src/test/regress/sql/jsonb_jsonpath.sql | 13 ++ src/test/regress/sql/jsonpath.sql | 19 +++ 10 files changed, 567 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/func/func-json.sgml b/doc/src/sgml/func/func-json.sgml index 1ec73cff464..f5d6da3824d 100644 --- a/doc/src/sgml/func/func-json.sgml +++ b/doc/src/sgml/func/func-json.sgml @@ -3107,6 +3107,40 @@ ERROR: jsonpath member accessor can only be applied to an object <returnvalue>[]</returnvalue> </para></entry> </row> + <row> + <entry role="func_table_entry"><para role="func_signature"> + <replaceable>string</replaceable> <literal>tsmatch</literal> <replaceable>string</replaceable> + <optional> <literal>tsconfig</literal> <replaceable>string</replaceable> </optional> + <optional> <literal>tsqparser</literal> <replaceable>string</replaceable> </optional> + <returnvalue>boolean</returnvalue> + </para> + <para> + Tests whether the JSON string (first operand) matches the full-text search query + (second operand). The matching behavior can be customized using the optional + <literal>tsconfig</literal> and <literal>tsqparser</literal> clauses. + If <literal>tsconfig</literal> is omitted, the current session's default text search configuration + is used (see <xref linkend="guc-default-text-search-config"/>). + The <literal>tsqparser</literal> clause determines how the query string is parsed + (see <xref linkend="textsearch-parsing-queries"/>). + Valid options are <literal>"pl"</literal> (<function>plainto_tsquery</function>), + <literal>"ph"</literal> (<function>phraseto_tsquery</function>), and + <literal>"w"</literal> (<function>websearch_to_tsquery</function>). + If <literal>tsqparser</literal> is omitted, the query is parsed using <function>to_tsquery</function>. + </para> + <para> + <literal>jsonb_path_query_array('["running", "runs", "ran", "jogging"]', '$[*] ? (@ tsmatch "run" tsconfig "english")')</literal> + <returnvalue>["running", "runs"]</returnvalue> + </para> + <para> + <literal>jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast & car")')</literal> + <returnvalue>["fast car"]</returnvalue> + </para> + <para> + <literal>jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w")')</literal> + <returnvalue>["fast car"]</returnvalue> + </para> + </entry> + </row> </tbody> </tgroup> </table> diff --git a/src/backend/utils/adt/jsonpath.c b/src/backend/utils/adt/jsonpath.c index d70ff1eaa54..61f299c18ff 100644 --- a/src/backend/utils/adt/jsonpath.c +++ b/src/backend/utils/adt/jsonpath.c @@ -349,6 +349,66 @@ flattenJsonPathParseItem(StringInfo buf, int *result, struct Node *escontext, *(int32 *) (buf->data + offs) = chld - pos; } break; + case jpiTsMatch: + { + int32 expr_off; + int32 tsconfig_off; + + /* Reserve slots for child node offsets */ + expr_off = reserveSpaceForItemPointer(buf); + tsconfig_off = reserveSpaceForItemPointer(buf); + + /* Write the Main Query String */ + appendBinaryStringInfo(buf, + &item->value.tsmatch.tsquerylen, + sizeof(item->value.tsmatch.tsquerylen)); + appendBinaryStringInfo(buf, + item->value.tsmatch.tsquery, + item->value.tsmatch.tsquerylen); + appendStringInfoChar(buf, '\0'); + + /* Write the Parser Flag */ + if (item->value.tsmatch.tsqparser) + { + appendBinaryStringInfo(buf, + &item->value.tsmatch.tsqparser_len, + sizeof(item->value.tsmatch.tsqparser_len)); + appendBinaryStringInfo(buf, + item->value.tsmatch.tsqparser, + item->value.tsmatch.tsqparser_len); + appendStringInfoChar(buf, '\0'); + } + else + { + uint32 zero = 0; + + appendBinaryStringInfo(buf, &zero, sizeof(uint32)); + appendStringInfoChar(buf, '\0'); + } + + if (!flattenJsonPathParseItem(buf, &chld, escontext, + item->value.tsmatch.doc, + nestingLevel, + insideArraySubscript)) + return false; + *(int32 *) (buf->data + expr_off) = chld - pos; + + /* TSConfig (Optional) */ + if (item->value.tsmatch.tsconfig) + { + if (!flattenJsonPathParseItem(buf, &chld, escontext, + item->value.tsmatch.tsconfig, + nestingLevel, + insideArraySubscript)) + return false; + *(int32 *) (buf->data + tsconfig_off) = chld - pos; + } + else + { + *(int32 *) (buf->data + tsconfig_off) = 0; + } + } + break; case jpiFilter: argNestingLevel++; pg_fallthrough; @@ -759,6 +819,58 @@ printJsonPathItem(StringInfo buf, JsonPathItem *v, bool inKey, appendStringInfoChar(buf, '"'); } + if (printBracketes) + appendStringInfoChar(buf, ')'); + break; + case jpiTsMatch: + if (printBracketes) + appendStringInfoChar(buf, '('); + + jspInitByBuffer(&elem, v->base, v->content.tsmatch.doc); + printJsonPathItem(buf, &elem, false, + operationPriority(elem.type) <= + operationPriority(v->type)); + + appendStringInfoString(buf, " tsmatch "); + + escape_json_with_len(buf, + v->content.tsmatch.tsquery, + v->content.tsmatch.tsquerylen); + + /* Print TSConfig if present */ + if (v->content.tsmatch.tsconfig) + { + JsonPathItem config_item; + int32 config_len; + char *config_str; + + appendStringInfoString(buf, " tsconfig "); + jspInitByBuffer(&config_item, v->base, v->content.tsmatch.tsconfig); + config_str = jspGetString(&config_item, &config_len); + appendStringInfoChar(buf, '"'); + appendBinaryStringInfo(buf, config_str, config_len); + appendStringInfoChar(buf, '"'); + } + + if (v->content.tsmatch.tsqparser_len > 0) + { + appendStringInfoString(buf, " tsqparser "); + appendStringInfoChar(buf, '"'); + + /* + * Use simple binary append since flags like "pl" don't need + * JSON escaping + */ + appendBinaryStringInfo(buf, + v->content.tsmatch.tsqparser, + v->content.tsmatch.tsqparser_len); + appendStringInfoChar(buf, '"'); + } + + if (printBracketes) + appendStringInfoChar(buf, ')'); + break; + if (printBracketes) appendStringInfoChar(buf, ')'); break; @@ -914,6 +1026,8 @@ jspOperationName(JsonPathItemType type) return "timestamp"; case jpiTimestampTz: return "timestamp_tz"; + case jpiTsMatch: + return "tsmatch"; default: elog(ERROR, "unrecognized jsonpath item type: %d", type); return NULL; @@ -1072,6 +1186,21 @@ jspInitByBuffer(JsonPathItem *v, char *base, int32 pos) read_int32(v->content.like_regex.patternlen, base, pos); v->content.like_regex.pattern = base + pos; break; + case jpiTsMatch: + read_int32(v->content.tsmatch.doc, base, pos); + read_int32(v->content.tsmatch.tsconfig, base, pos); + + /* Read Query String */ + read_int32(v->content.tsmatch.tsquerylen, base, pos); + v->content.tsmatch.tsquery = base + pos; + /* Skip past query string + null terminator */ + pos += v->content.tsmatch.tsquerylen + 1; + + /* Read Parser Flag */ + read_int32(v->content.tsmatch.tsqparser_len, base, pos); + v->content.tsmatch.tsqparser = base + pos; + pos += v->content.tsmatch.tsqparser_len + 1; + break; default: elog(ERROR, "unrecognized jsonpath item type: %d", v->type); } @@ -1142,6 +1271,7 @@ jspGetNext(JsonPathItem *v, JsonPathItem *a) v->type == jpiLast || v->type == jpiStartsWith || v->type == jpiLikeRegex || + v->type == jpiTsMatch || v->type == jpiBigint || v->type == jpiBoolean || v->type == jpiDate || @@ -1474,7 +1604,11 @@ jspIsMutableWalker(JsonPathItem *jpi, struct JsonPathMutableContext *cxt) jspInitByBuffer(&arg, jpi->base, jpi->content.like_regex.expr); jspIsMutableWalker(&arg, cxt); break; - + case jpiTsMatch: + Assert(status == jpdsNonDateTime); + jspInitByBuffer(&arg, jpi->base, jpi->content.tsmatch.doc); + jspIsMutableWalker(&arg, cxt); + break; /* literals */ case jpiNull: case jpiString: diff --git a/src/backend/utils/adt/jsonpath_exec.c b/src/backend/utils/adt/jsonpath_exec.c index 52ae0ba4cf7..8b932518a05 100644 --- a/src/backend/utils/adt/jsonpath_exec.c +++ b/src/backend/utils/adt/jsonpath_exec.c @@ -123,6 +123,13 @@ typedef struct JsonLikeRegexContext int cflags; } JsonLikeRegexContext; +typedef struct JsonTsMatchContext +{ + Datum queryDatum; /* Cache the compiled binary TSQuery */ + Oid tsconfigId; /* Cache the dictionary OID */ + bool initialized; /* Flag to run setup only once */ +} JsonTsMatchContext; + /* Result of jsonpath predicate evaluation */ typedef enum JsonPathBool { @@ -306,6 +313,7 @@ static JsonPathExecResult executeKeyValueMethod(JsonPathExecContext *cxt, JsonPathItem *jsp, JsonbValue *jb, JsonValueList *found); static JsonPathExecResult appendBoolResult(JsonPathExecContext *cxt, JsonPathItem *jsp, JsonValueList *found, JsonPathBool res); +static JsonPathBool executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg, void *param); static void getJsonPathItem(JsonPathExecContext *cxt, JsonPathItem *item, JsonbValue *value); static JsonbValue *GetJsonPathVar(void *cxt, char *varName, int varNameLen, @@ -800,6 +808,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp, case jpiExists: case jpiStartsWith: case jpiLikeRegex: + case jpiTsMatch: { JsonPathBool st = executeBoolItem(cxt, jsp, jb, true); @@ -1868,6 +1877,16 @@ executeBoolItem(JsonPathExecContext *cxt, JsonPathItem *jsp, return executePredicate(cxt, jsp, &larg, NULL, jb, false, executeLikeRegex, &lrcxt); } + case jpiTsMatch: + { + JsonTsMatchContext lrcxt = {0}; + + jspInitByBuffer(&larg, jsp->base, + jsp->content.tsmatch.doc); + + return executePredicate(cxt, jsp, &larg, NULL, jb, false, + executeTsMatch, &lrcxt); + } case jpiExists: jspGetArg(jsp, &larg); @@ -1899,7 +1918,6 @@ executeBoolItem(JsonPathExecContext *cxt, JsonPathItem *jsp, return res == jperOk ? jpbTrue : jpbFalse; } - default: elog(ERROR, "invalid boolean jsonpath item type: %d", jsp->type); return jpbUnknown; @@ -2922,6 +2940,116 @@ executeKeyValueMethod(JsonPathExecContext *cxt, JsonPathItem *jsp, return res; } +#include "tsearch/ts_utils.h" +#include "tsearch/ts_cache.h" +#include "utils/regproc.h" +#include "catalog/namespace.h" + +static JsonPathBool +executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg, + void *param) +{ + JsonTsMatchContext *cxt = param; + text *doc_text; + Datum tsvector_datum; + bool match; + + if (!(str = getScalar(str, jbvString))) + return jpbUnknown; + + /* Setup Context (Run ONLY once per predicate) */ + if (!cxt->initialized) + { + text *query_text; + char *parser_mode; + uint32 parser_len; + + if (jsp->content.tsmatch.tsconfig != 0) + { + JsonPathItem config_item; + int32 config_len; + char *config_str; + + jspInitByBuffer(&config_item, jsp->base, jsp->content.tsmatch.tsconfig); + config_str = jspGetString(&config_item, &config_len); + + cxt->tsconfigId = get_ts_config_oid(stringToQualifiedNameList(config_str, NULL), true); + } + else + { + cxt->tsconfigId = getTSCurrentConfig(true); + } + + /* Prepare Query Text */ + query_text = cstring_to_text_with_len(jsp->content.tsmatch.tsquery, + jsp->content.tsmatch.tsquerylen); + + /* Select Parser and Compile Query */ + parser_mode = jsp->content.tsmatch.tsqparser; + parser_len = jsp->content.tsmatch.tsqparser_len; + + if (parser_len > 0) + { + /* Dispatch based on flag */ + if (pg_strncasecmp(parser_mode, "pl", parser_len) == 0) + { + cxt->queryDatum = DirectFunctionCall2(plainto_tsquery_byid, + ObjectIdGetDatum(cxt->tsconfigId), + PointerGetDatum(query_text)); + } + else if (pg_strncasecmp(parser_mode, "ph", parser_len) == 0) + { + cxt->queryDatum = DirectFunctionCall2(phraseto_tsquery_byid, + ObjectIdGetDatum(cxt->tsconfigId), + PointerGetDatum(query_text)); + } + else if (pg_strncasecmp(parser_mode, "w", parser_len) == 0) + { + cxt->queryDatum = DirectFunctionCall2(websearch_to_tsquery_byid, + ObjectIdGetDatum(cxt->tsconfigId), + PointerGetDatum(query_text)); + } + else + { + /* + * Fallback or Error for unknown flags (should be caught by + * parser) + */ + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("unrecognized tsqparser flag"))); + } + } + else + { + /* + * Default: to_tsquery (Standard Mode) Note: This expects + * operators like '&' or '|' in the query string + */ + cxt->queryDatum = DirectFunctionCall2(to_tsquery_byid, + ObjectIdGetDatum(cxt->tsconfigId), + PointerGetDatum(query_text)); + } + + cxt->initialized = true; + } + + /* Runtime: Convert Doc to Vector and Match */ + + doc_text = cstring_to_text_with_len(str->val.string.val, + str->val.string.len); + + tsvector_datum = DirectFunctionCall2(to_tsvector_byid, + ObjectIdGetDatum(cxt->tsconfigId), + PointerGetDatum(doc_text)); + + match = DatumGetBool(DirectFunctionCall2(ts_match_vq, + tsvector_datum, + cxt->queryDatum)); + + return match ? jpbTrue : jpbFalse; +} + /* * Convert boolean execution status 'res' to a boolean JSON item and execute diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y index 87070235d11..a2acf2660f6 100644 --- a/src/backend/utils/adt/jsonpath_gram.y +++ b/src/backend/utils/adt/jsonpath_gram.y @@ -43,7 +43,12 @@ static bool makeItemLikeRegex(JsonPathParseItem *expr, JsonPathString *flags, JsonPathParseItem ** result, struct Node *escontext); - +static bool makeItemTsMatch(JsonPathParseItem *doc, + JsonPathString *tsquery, + JsonPathString *tsconfig, + JsonPathString *tsquery_parser, + JsonPathParseItem ** result, + struct Node *escontext); /* * Bison doesn't allocate anything that needs to live across parser calls, * so we can easily have it use palloc instead of malloc. This prevents @@ -81,7 +86,7 @@ static bool makeItemLikeRegex(JsonPathParseItem *expr, %token <str> IDENT_P STRING_P NUMERIC_P INT_P VARIABLE_P %token <str> OR_P AND_P NOT_P %token <str> LESS_P LESSEQUAL_P EQUAL_P NOTEQUAL_P GREATEREQUAL_P GREATER_P -%token <str> ANY_P STRICT_P LAX_P LAST_P STARTS_P WITH_P LIKE_REGEX_P FLAG_P +%token <str> ANY_P STRICT_P LAX_P LAST_P STARTS_P WITH_P LIKE_REGEX_P FLAG_P TSMATCH_P TSCONFIG_P TSQUERYPARSER_P %token <str> ABS_P SIZE_P TYPE_P FLOOR_P DOUBLE_P CEILING_P KEYVALUE_P %token <str> DATETIME_P %token <str> BIGINT_P BOOLEAN_P DATE_P DECIMAL_P INTEGER_P NUMBER_P @@ -187,6 +192,37 @@ predicate: YYABORT; $$ = jppitem; } + | expr TSMATCH_P STRING_P + { + JsonPathParseItem *jppitem; + /* Pass NULL for tsconfig (3rd) and NULL for tsquery_parser (4th) */ + if (! makeItemTsMatch($1, &$3, NULL, NULL, &jppitem, escontext)) + YYABORT; + $$ = jppitem; + } + | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P + { + JsonPathParseItem *jppitem; + /* Pass NULL for tsquery_parser (4th) */ + if (! makeItemTsMatch($1, &$3, &$5, NULL, &jppitem, escontext)) + YYABORT; + $$ = jppitem; + } + | expr TSMATCH_P STRING_P TSQUERYPARSER_P STRING_P + { + JsonPathParseItem *jppitem; + /* Pass NULL for tsconfig (3rd) */ + if (! makeItemTsMatch($1, &$3, NULL, &$5, &jppitem, escontext)) + YYABORT; + $$ = jppitem; + } + | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P TSQUERYPARSER_P STRING_P + { + JsonPathParseItem *jppitem; + if (! makeItemTsMatch($1, &$3, &$5, &$7, &jppitem, escontext)) + YYABORT; + $$ = jppitem; + } ; starts_with_initial: @@ -357,6 +393,9 @@ key_name: | TIME_TZ_P | TIMESTAMP_P | TIMESTAMP_TZ_P + | TSCONFIG_P + | TSMATCH_P + | TSQUERYPARSER_P ; method: @@ -684,3 +723,59 @@ jspConvertRegexFlags(uint32 xflags, int *result, struct Node *escontext) return true; } + +static bool +makeItemTsMatch(JsonPathParseItem *doc, + JsonPathString *tsquery, + JsonPathString *tsconfig, + JsonPathString *tsquery_parser, + JsonPathParseItem **result, + struct Node *escontext) +{ + JsonPathParseItem *v = makeItemType(jpiTsMatch); + + v->value.tsmatch.doc = doc; + + v->value.tsmatch.tsquery = tsquery->val; + v->value.tsmatch.tsquerylen = tsquery->len; + + /* Handle the Configuration (Stored as a Node) */ + if (tsconfig) + { + JsonPathParseItem *conf = makeItemType(jpiString); + conf->value.string.val = tsconfig->val; + conf->value.string.len = tsconfig->len; + v->value.tsmatch.tsconfig = conf; + } + else + { + v->value.tsmatch.tsconfig = NULL; + } + + /* Handle the TSQuery Parser Flag */ + if (tsquery_parser) + { + /* Check for "pl", "ph", "w" here to throw a syntax error immediately */ + if (pg_strncasecmp(tsquery_parser->val, "pl", tsquery_parser->len) != 0 && + pg_strncasecmp(tsquery_parser->val, "ph", tsquery_parser->len) != 0 && + pg_strncasecmp(tsquery_parser->val, "w", tsquery_parser->len) != 0) + { + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("invalid tsquery_parser value: \"%s\"", tsquery_parser->val), + errhint("Valid values are \"pl\", \"ph\", and \"w\"."))); + } + + v->value.tsmatch.tsqparser = tsquery_parser->val; + v->value.tsmatch.tsqparser_len = tsquery_parser->len; + } + else + { + v->value.tsmatch.tsqparser = NULL; + v->value.tsmatch.tsqparser_len = 0; + } + + /* Success */ + *result = v; + return true; +} diff --git a/src/backend/utils/adt/jsonpath_scan.l b/src/backend/utils/adt/jsonpath_scan.l index 38c5841e879..809c66ec085 100644 --- a/src/backend/utils/adt/jsonpath_scan.l +++ b/src/backend/utils/adt/jsonpath_scan.l @@ -427,10 +427,13 @@ static const JsonPathKeyword keywords[] = { {7, false, DECIMAL_P, "decimal"}, {7, false, INTEGER_P, "integer"}, {7, false, TIME_TZ_P, "time_tz"}, + {7, false, TSMATCH_P, "tsmatch"}, {7, false, UNKNOWN_P, "unknown"}, {8, false, DATETIME_P, "datetime"}, {8, false, KEYVALUE_P, "keyvalue"}, + {8, false, TSCONFIG_P, "tsconfig"}, {9, false, TIMESTAMP_P, "timestamp"}, + {9, false, TSQUERYPARSER_P, "tsqparser"}, {10, false, LIKE_REGEX_P, "like_regex"}, {12, false, TIMESTAMP_TZ_P, "timestamp_tz"}, }; diff --git a/src/include/utils/jsonpath.h b/src/include/utils/jsonpath.h index 6f529d74dcd..00752321ff6 100644 --- a/src/include/utils/jsonpath.h +++ b/src/include/utils/jsonpath.h @@ -104,6 +104,7 @@ typedef enum JsonPathItemType jpiLast, /* LAST array subscript */ jpiStartsWith, /* STARTS WITH predicate */ jpiLikeRegex, /* LIKE_REGEX predicate */ + jpiTsMatch, /* TSMATCH predicate */ jpiBigint, /* .bigint() item method */ jpiBoolean, /* .boolean() item method */ jpiDate, /* .date() item method */ @@ -188,6 +189,15 @@ typedef struct JsonPathItem int32 patternlen; uint32 flags; } like_regex; + struct + { + int32 doc; + char *tsquery; + uint32 tsquerylen; + int32 tsconfig; + char *tsqparser; + uint32 tsqparser_len; + } tsmatch; } content; } JsonPathItem; @@ -266,6 +276,15 @@ struct JsonPathParseItem uint32 len; char *val; /* could not be not null-terminated */ } string; + struct + { + JsonPathParseItem *doc; + char *tsquery; + uint32 tsquerylen; + JsonPathParseItem *tsconfig; + char *tsqparser; + uint32 tsqparser_len; + } tsmatch; } value; }; diff --git a/src/test/regress/expected/jsonb_jsonpath.out b/src/test/regress/expected/jsonb_jsonpath.out index 4bcd4e91a29..35558af04e5 100644 --- a/src/test/regress/expected/jsonb_jsonpath.out +++ b/src/test/regress/expected/jsonb_jsonpath.out @@ -4510,3 +4510,61 @@ ORDER BY s1.num, s2.num; {"s": "B"} | {"s": "B"} | false | true | true | true | false (144 rows) +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "fly" tsconfig "english")'); + jsonb_path_query +------------------ +(0 rows) + +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "english")'); + jsonb_path_query +------------------ + "running" + "runs" + "run" +(3 rows) + +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "simple")'); + jsonb_path_query +------------------ + "run" +(1 row) + +select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "english")'); + jsonb_path_query +------------------ + "postgres" + "POSTGRES" +(2 rows) + +select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "simple")'); + jsonb_path_query +------------------ + "postgres" + "POSTGRES" +(2 rows) + +-- in the default tsqparser (to_tsquery) spaces are not allowed, so this should fail for syntax +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english")'); +ERROR: syntax error in tsquery: "fast car" +-- if we specify "w" however it should be ok +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english" tsqparser "w")'); + jsonb_path_query +------------------ + "fast car" + "super fast car" +(2 rows) + +-- it should also be ok if we change to a valid to_tsquery +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast & car" tsconfig "english")'); + jsonb_path_query +------------------ + "fast car" + "super fast car" +(2 rows) + +select jsonb_path_query('["fat cat", "cat fat", "fat rats"]', 'lax $[*] ? (@ tsmatch "fat & rat" tsconfig "english")'); + jsonb_path_query +------------------ + "fat rats" +(1 row) + diff --git a/src/test/regress/expected/jsonpath.out b/src/test/regress/expected/jsonpath.out index fd9bd755f52..02fececb2a9 100644 --- a/src/test/regress/expected/jsonpath.out +++ b/src/test/regress/expected/jsonpath.out @@ -1294,3 +1294,63 @@ FROM unnest(ARRAY['$ ? (@ like_regex "pattern" flag "smixq")'::text, 1a | f | 42601 | trailing junk after numeric literal at or near "1a" of jsonpath input | | (5 rows) +-- tsmatch (Full Text Search) +-- basic success +select '$ ? (@ tsmatch "simple")'::jsonpath; + jsonpath +------------------------ + $?(@ tsmatch "simple") +(1 row) + +select '$ ? (@ tsmatch "running" tsconfig "english")'::jsonpath; + jsonpath +-------------------------------------------- + $?(@ tsmatch "running" tsconfig "english") +(1 row) + +-- w/out tsconfig and tsqparser +select '$ ? (@ tsmatch "fast & furious" tsconfig "simple")'::jsonpath; + jsonpath +-------------------------------------------------- + $?(@ tsmatch "fast & furious" tsconfig "simple") +(1 row) + +select '$ ? (@ tsmatch "fast & furious" tsconfig "simple" tsqparser "w")'::jsonpath; + jsonpath +---------------------------------------------------------------- + $?(@ tsmatch "fast & furious" tsconfig "simple" tsqparser "w") +(1 row) + +-- tsconfig must be specified first and then tsqparser +select '$ ? (@ tsmatch "fast & furious" tsqparser "w" tsconfig "simple" )'::jsonpath; +ERROR: syntax error at or near " " of jsonpath input +LINE 1: select '$ ? (@ tsmatch "fast & furious" tsqparser "w" tsconf... + ^ +select '$ ? (@ tsmatch "fast & furious" tsqparser "w")'::jsonpath; + jsonpath +---------------------------------------------- + $?(@ tsmatch "fast & furious" tsqparser "w") +(1 row) + +select '$[*] ? (@.title tsmatch "god" && @.rating > 5)'::jsonpath; + jsonpath +-------------------------------------------------- + $[*]?(@."title" tsmatch "god" && @."rating" > 5) +(1 row) + +select '$ ? (@ tsmatch $pattern)'::jsonpath; +ERROR: syntax error at or near "$pattern" of jsonpath input +LINE 1: select '$ ? (@ tsmatch $pattern)'::jsonpath; + ^ +-- only string literals (no variables) are allowed for tsquery +select '$ ? (@ tsmatch $var tsconfig "english")'::jsonpath; +ERROR: syntax error at or near "$var" of jsonpath input +LINE 1: select '$ ? (@ tsmatch $var tsconfig "english")'::jsonpath; + ^ +-- if a tsconfig doesn't exist it should parse nonetheless (executor will fail it) +select '$ ? (@ tsmatch "running" tsconfig "wrongconfig")'::jsonpath; + jsonpath +------------------------------------------------ + $?(@ tsmatch "running" tsconfig "wrongconfig") +(1 row) + diff --git a/src/test/regress/sql/jsonb_jsonpath.sql b/src/test/regress/sql/jsonb_jsonpath.sql index 3e8929a5269..9c7229eb382 100644 --- a/src/test/regress/sql/jsonb_jsonpath.sql +++ b/src/test/regress/sql/jsonb_jsonpath.sql @@ -1147,3 +1147,16 @@ SELECT jsonb_path_query_first(s1.j, '$.s > $s', vars => s2.j) gt FROM str s1, str s2 ORDER BY s1.num, s2.num; + +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "fly" tsconfig "english")'); +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "english")'); +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "simple")'); +select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "english")'); +select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "simple")'); +-- in the default tsqparser (to_tsquery) spaces are not allowed, so this should fail for syntax +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english")'); +-- if we specify "w" however it should be ok +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english" tsqparser "w")'); +-- it should also be ok if we change to a valid to_tsquery +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast & car" tsconfig "english")'); +select jsonb_path_query('["fat cat", "cat fat", "fat rats"]', 'lax $[*] ? (@ tsmatch "fat & rat" tsconfig "english")'); diff --git a/src/test/regress/sql/jsonpath.sql b/src/test/regress/sql/jsonpath.sql index 61a5270d4e8..9f58f466626 100644 --- a/src/test/regress/sql/jsonpath.sql +++ b/src/test/regress/sql/jsonpath.sql @@ -265,3 +265,22 @@ FROM unnest(ARRAY['$ ? (@ like_regex "pattern" flag "smixq")'::text, '00', '1a']) str, LATERAL pg_input_error_info(str, 'jsonpath') as errinfo; + +-- tsmatch (Full Text Search) + +-- basic success +select '$ ? (@ tsmatch "simple")'::jsonpath; +select '$ ? (@ tsmatch "running" tsconfig "english")'::jsonpath; +-- w/out tsconfig and tsqparser +select '$ ? (@ tsmatch "fast & furious" tsconfig "simple")'::jsonpath; +select '$ ? (@ tsmatch "fast & furious" tsconfig "simple" tsqparser "w")'::jsonpath; +-- tsconfig must be specified first and then tsqparser +select '$ ? (@ tsmatch "fast & furious" tsqparser "w" tsconfig "simple" )'::jsonpath; +select '$ ? (@ tsmatch "fast & furious" tsqparser "w")'::jsonpath; +select '$[*] ? (@.title tsmatch "god" && @.rating > 5)'::jsonpath; +select '$ ? (@ tsmatch $pattern)'::jsonpath; + +-- only string literals (no variables) are allowed for tsquery +select '$ ? (@ tsmatch $var tsconfig "english")'::jsonpath; +-- if a tsconfig doesn't exist it should parse nonetheless (executor will fail it) +select '$ ? (@ tsmatch "running" tsconfig "wrongconfig")'::jsonpath; -- 2.53.0 ^ permalink raw reply [nested|flat] 5+ messages in thread
* Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search 2026-02-27 05:59 Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search Florents Tselai <[email protected]> @ 2026-03-02 03:44 ` Chao Li <[email protected]> 2026-04-03 09:56 ` Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search Florents Tselai <[email protected]> 0 siblings, 1 reply; 5+ messages in thread From: Chao Li @ 2026-03-02 03:44 UTC (permalink / raw) To: Florents Tselai <[email protected]>; +Cc: pgsql-hackers <[email protected]> > On Feb 27, 2026, at 13:59, Florents Tselai <[email protected]> wrote: > > > > On Thu, Feb 26, 2026 at 8:48 AM Chao Li <[email protected]> wrote: > > > > On Feb 1, 2026, at 19:02, Florents Tselai <[email protected]> wrote: > > > > > > > > > > On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai <[email protected]> wrote: > > Hi, > > > > in real-life I work a lot with json & fts search, here's a feature I've always wished I had, > > but never tackle it. Until yesterday that is. > > > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")'); > > > > This patch introduces a tsmatch boolean operator to the JSONPath engine. > > By integrating FTS natively into path expressions, > > this operator allows for high-precision filtering of nested JSONB structures— > > solving issues with structural ambiguity and query complexity. > > > > Currently, users must choose between two suboptimal paths for FTS-ing nested JSON: > > - Imprecise Global Indexing > > jsonb_to_tsvector aggregates text into a flat vector. > > This ignores JSON boundaries, leading to false positives when the same key (e.g., "body") > > appears in different contexts (e.g., a "Product Description" vs. a "Customer Review"). > > > > - Complex SQL Workarounds > > Achieving 100% precision requires unnesting the document via jsonb_array_elements and LATERAL joins. > > This leads to verbose SQL and high memory overhead from generating intermediate heap tuples. > > > > One of the most significant advantages of tsmatch is its ability to participate in multi-condition predicates > > within the same JSON object - something jsonb_to_tsvector cannot do. > > > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")'); > > > > In a flat vector, the association between "Alice" and "performance" is lost. > > tsmatch preserves this link by evaluating the FTS predicate in-place during path traversal. > > > > While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an FTS operator, > > tsmatch is architecturally modeled after the standard-defined like_regex. > > > > The implementation follows the like_regex precedent: > > it is a non-indexable predicate that relies on GIN path-matching for pruning and heap re-checks for precision. > > Caching is scoped to the JsonPathExecContext, > > ensuring 'compile-once' efficiency per execution without violating the stability requirements of prepared statements. > > > > This initial implementation uses plainto_tsquery. > > However, the grammar is designed to support a "mode" flag (similar to like_regex flags) > > in future iterations to toggle between to_tsquery, websearch_to_tsquery, and phraseto_tsquery. > > > > Here's a v2, that implements the tsqparser clause > > > > So this should now work too > > > > select jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w") <v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch> > > Hi Florents, > > Grant pinged me about this. I can review it in coming days. Can you please rebase it? I failed to apply to current master. Also, the CF reported a failure test case, please take a look. > > Hi Evan, > thanks for having a look. The conflict was due to the intro of pg_fallthrough. Not related to this patch . > > I noticed the failure too, but I'm having a hard time reproducing it tbh. > This fails for Debian Trixie with Meson. The same with Autoconf passes... > > https://github.com/Florents-Tselai/postgres/runs/65098077968 > > > > > <v3-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch> I have reviewed v3 and traced a few test cases. Here comes my review comments: 1 ``` + <replaceable>string</replaceable> <literal>tsmatch</literal> <replaceable>string</replaceable> + <optional> <literal>tsconfig</literal> <replaceable>string</replaceable> </optional> + <optional> <literal>tsqparser</literal> <replaceable>string</replaceable> </optional> ``` For all “replaceable”, instead of “string”, would it be better to use something more descriptive? For example: ``` <replaceable>json_string</replaceable> <literal>tsmatch</literal> <replaceable>query</replaceable> <optional> <literal>tsconfig</literal> <replaceable>config_name</replaceable> </optional> <optional> <literal>tsqparser</literal> <replaceable>parser_mode</replaceable> </optional> ``` 2 - jsonpath_gram.y ``` +static bool makeItemTsMatch(JsonPathParseItem *doc, + JsonPathString *tsquery, + JsonPathString *tsconfig, + JsonPathString *tsquery_parser, + JsonPathParseItem ** result, + struct Node *escontext); ``` Format Nit: Looking at the existing code, the J in the second and following lines, should be placed in the same column as the J in the first line. 3 - jsonpath_gram.y ``` + | expr TSMATCH_P STRING_P + { + JsonPathParseItem *jppitem; + /* Pass NULL for tsconfig (3rd) and NULL for tsquery_parser (4th) */ + if (! makeItemTsMatch($1, &$3, NULL, NULL, &jppitem, escontext)) + YYABORT; + $$ = jppitem; + } + | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P + { + JsonPathParseItem *jppitem; + /* Pass NULL for tsquery_parser (4th) */ + if (! makeItemTsMatch($1, &$3, &$5, NULL, &jppitem, escontext)) + YYABORT; + $$ = jppitem; + } + | expr TSMATCH_P STRING_P TSQUERYPARSER_P STRING_P + { + JsonPathParseItem *jppitem; + /* Pass NULL for tsconfig (3rd) */ + if (! makeItemTsMatch($1, &$3, NULL, &$5, &jppitem, escontext)) + YYABORT; + $$ = jppitem; + } + | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P TSQUERYPARSER_P STRING_P + { + JsonPathParseItem *jppitem; + if (! makeItemTsMatch($1, &$3, &$5, &$7, &jppitem, escontext)) + YYABORT; + $$ = jppitem; + } ``` Feels a little redundant, repeatedly calls makeItemTsMatch. See the attached diff for a simplification. But my version is a bit longer in terms of number of lines. So, up to you. 4 - jsonpath_gram.y ``` +static bool +makeItemTsMatch(JsonPathParseItem *doc, + JsonPathString *tsquery, + JsonPathString *tsconfig, + JsonPathString *tsquery_parser, + JsonPathParseItem **result, + struct Node *escontext) ``` makeItemTsMatch doesn’t need to return a bool. Actually, now it never returns false, instead, it just ereport(ERROR). 5 - jsonpath.h ``` + struct + { + int32 doc; + char *tsquery; + uint32 tsquerylen; + int32 tsconfig; + char *tsqparser; + uint32 tsqparser_len; + } tsmatch; + struct + { + JsonPathParseItem *doc; + char *tsquery; + uint32 tsquerylen; + JsonPathParseItem *tsconfig; + char *tsqparser; + uint32 tsqparser_len; + } tsmatch; } value; ``` tsquerylen doesn’t have _ before len, and tsqparser_len, would it be better to make naming conventions consistent in the same structure? 6 - jsonpath_exec.c ``` #include "tsearch/ts_utils.h" #include "tsearch/ts_cache.h" #include "utils/regproc.h" #include "catalog/namespace.h" static JsonPathBool executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg, void *param) ``` Why don’t put these includes to the header section together with other includes? 7 - jsonpath_exec.c ``` + else + { + /* + * Fallback or Error for unknown flags (should be caught by + * parser) + */ + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("unrecognized tsqparser flag"))); + } ``` This “else” should never be entered as the same check has been done by makeItemTsMatch. So, maybe just use an Assert here, or pg_unreachable(). 8 - jsonpath_exec.c ``` + /* Setup Context (Run ONLY once per predicate) */ + if (!cxt->initialized) ``` While tracing this SQL: ``` evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb evantest-# @@ '$.tags[*] ? (@ tsmatch "run" tsconfig "english")'; ?column? ---------- (1 row) ``` I noticed that, when process “jogging”, cxt->initialized is still false, meaning that, the cxt is not reused across array items. Given the same tsconfig should apply to all array items, I think cxt should be reused. 9 - jsonpath_exec.c ``` + /* Select Parser and Compile Query */ + parser_mode = jsp->content.tsmatch.tsqparser; + parser_len = jsp->content.tsmatch.tsqparser_len; + + if (parser_len > 0) + { + /* Dispatch based on flag */ + if (pg_strncasecmp(parser_mode, "pl", parser_len) == 0) ``` Nit: parser_mode is only used inside if (parser_len > 0), it can be defined inside the “if”. 10 - jsonpath_gram.y ``` + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("invalid tsquery_parser value: \"%s\"", tsquery_parser->val), + errhint("Valid values are \"pl\", \"ph\", and \"w\"."))); ``` When tested a case with an invalid parser, I got: ``` evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqparser "pss")'; ERROR: invalid tsquery_parser value: "pss@" LINE 2: @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqpar... ^ HINT: Valid values are "pl", "ph", and "w". ``` You can see the it shows a bad looking invalid value. I think that’s because tsquery_parser->val is not NULL terminated. I fixed this problem with: ``` errmsg("invalid tsquery_parser value: \"%.*s\"", (int) tsquery_parser->len, tsquery_parser->val), ``` This change is also included in the attached diff file. 11 - jsonpath.c ``` + if (printBracketes) + appendStringInfoChar(buf, ')'); + break; + if (printBracketes) appendStringInfoChar(buf, ')'); ``` Duplicate code. Looks like a copy-pasto. 12 - jsonpath.c ``` + /* Write the Main Query String */ + appendBinaryStringInfo(buf, + &item->value.tsmatch.tsquerylen, + sizeof(item->value.tsmatch.tsquerylen)); + appendBinaryStringInfo(buf, + item->value.tsmatch.tsquery, + item->value.tsmatch.tsquerylen); + appendStringInfoChar(buf, '\0'); ``` I don’t think we need to manually append ‘\0’ after appendBinaryStringInfo. Looking at the header comment of appendBinaryStringInfo, it says that a trailing null will be added. ``` /* * appendBinaryStringInfo * * Append arbitrary binary data to a StringInfo, allocating more space * if necessary. Ensures that a trailing null byte is present. */ void appendBinaryStringInfo(StringInfo str, const void *data, int datalen) ``` Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/ Attachments: [application/octet-stream] nocfbot_jsonpath_gram_y.diff (2.8K, 2-nocfbot_jsonpath_gram_y.diff) download | inline diff: diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y index a2acf2660f6..4d5837e2459 100644 --- a/src/backend/utils/adt/jsonpath_gram.y +++ b/src/backend/utils/adt/jsonpath_gram.y @@ -78,6 +78,13 @@ static bool makeItemTsMatch(JsonPathParseItem *doc, JsonPathParseItem *value; JsonPathParseResult *result; JsonPathItemType optype; + struct + { + bool has_tsconfig; + JsonPathString tsconfig; + bool has_tsqparser; + JsonPathString tsqparser; + } tsmatch_opts; bool boolean; int integer; } @@ -100,6 +107,8 @@ static bool makeItemTsMatch(JsonPathParseItem *doc, datetime_template opt_datetime_template csv_elem datetime_precision opt_datetime_precision +%type <tsmatch_opts> tsmatch_opts + %type <elems> accessor_expr csv_list opt_csv_list %type <indexs> index_list @@ -192,36 +201,44 @@ predicate: YYABORT; $$ = jppitem; } - | expr TSMATCH_P STRING_P + | expr TSMATCH_P STRING_P tsmatch_opts { JsonPathParseItem *jppitem; - /* Pass NULL for tsconfig (3rd) and NULL for tsquery_parser (4th) */ - if (! makeItemTsMatch($1, &$3, NULL, NULL, &jppitem, escontext)) - YYABORT; + + if (! makeItemTsMatch($1, &$3, + $4.has_tsconfig ? &$4.tsconfig : NULL, + $4.has_tsqparser ? &$4.tsqparser : NULL, + &jppitem, escontext)) + YYABORT; + $$ = jppitem; } - | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P + ; + +tsmatch_opts: + /* EMPTY */ { - JsonPathParseItem *jppitem; - /* Pass NULL for tsquery_parser (4th) */ - if (! makeItemTsMatch($1, &$3, &$5, NULL, &jppitem, escontext)) - YYABORT; - $$ = jppitem; + $$.has_tsconfig = false; + $$.has_tsqparser = false; } - | expr TSMATCH_P STRING_P TSQUERYPARSER_P STRING_P + | TSCONFIG_P STRING_P { - JsonPathParseItem *jppitem; - /* Pass NULL for tsconfig (3rd) */ - if (! makeItemTsMatch($1, &$3, NULL, &$5, &jppitem, escontext)) - YYABORT; - $$ = jppitem; + $$.has_tsconfig = true; + $$.tsconfig = $2; + $$.has_tsqparser = false; } - | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P TSQUERYPARSER_P STRING_P + | TSQUERYPARSER_P STRING_P { - JsonPathParseItem *jppitem; - if (! makeItemTsMatch($1, &$3, &$5, &$7, &jppitem, escontext)) - YYABORT; - $$ = jppitem; + $$.has_tsconfig = false; + $$.has_tsqparser = true; + $$.tsqparser = $2; + } + | TSCONFIG_P STRING_P TSQUERYPARSER_P STRING_P + { + $$.has_tsconfig = true; + $$.tsconfig = $2; + $$.has_tsqparser = true; + $$.tsqparser = $4; } ; @@ -762,7 +779,7 @@ makeItemTsMatch(JsonPathParseItem *doc, { ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("invalid tsquery_parser value: \"%s\"", tsquery_parser->val), + errmsg("invalid tsquery_parser value: \"%.*s\"", (int) tsquery_parser->len, tsquery_parser->val), errhint("Valid values are \"pl\", \"ph\", and \"w\"."))); } ^ permalink raw reply [nested|flat] 5+ messages in thread
* Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search 2026-02-27 05:59 Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search Florents Tselai <[email protected]> 2026-03-02 03:44 ` Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search Chao Li <[email protected]> @ 2026-04-03 09:56 ` Florents Tselai <[email protected]> 2026-04-04 09:38 ` Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search Florents Tselai <[email protected]> 0 siblings, 1 reply; 5+ messages in thread From: Florents Tselai @ 2026-04-03 09:56 UTC (permalink / raw) To: Chao Li <[email protected]>; +Cc: pgsql-hackers <[email protected]> On Mon, Mar 2, 2026 at 5:44 AM Chao Li <[email protected]> wrote: > > > > On Feb 27, 2026, at 13:59, Florents Tselai <[email protected]> > wrote: > > > > > > > > On Thu, Feb 26, 2026 at 8:48 AM Chao Li <[email protected]> wrote: > > > > > > > On Feb 1, 2026, at 19:02, Florents Tselai <[email protected]> > wrote: > > > > > > > > > > > > > > > On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai < > [email protected]> wrote: > > > Hi, > > > > > > in real-life I work a lot with json & fts search, here's a feature > I've always wished I had, > > > but never tackle it. Until yesterday that is. > > > > > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && > @.body tsmatch "performance")'); > > > > > > This patch introduces a tsmatch boolean operator to the JSONPath > engine. > > > By integrating FTS natively into path expressions, > > > this operator allows for high-precision filtering of nested JSONB > structures— > > > solving issues with structural ambiguity and query complexity. > > > > > > Currently, users must choose between two suboptimal paths for FTS-ing > nested JSON: > > > - Imprecise Global Indexing > > > jsonb_to_tsvector aggregates text into a flat vector. > > > This ignores JSON boundaries, leading to false positives when the same > key (e.g., "body") > > > appears in different contexts (e.g., a "Product Description" vs. a > "Customer Review"). > > > > > > - Complex SQL Workarounds > > > Achieving 100% precision requires unnesting the document via > jsonb_array_elements and LATERAL joins. > > > This leads to verbose SQL and high memory overhead from generating > intermediate heap tuples. > > > > > > One of the most significant advantages of tsmatch is its ability to > participate in multi-condition predicates > > > within the same JSON object - something jsonb_to_tsvector cannot do. > > > > > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && > @.body tsmatch "performance")'); > > > > > > In a flat vector, the association between "Alice" and "performance" is > lost. > > > tsmatch preserves this link by evaluating the FTS predicate in-place > during path traversal. > > > > > > While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly > define an FTS operator, > > > tsmatch is architecturally modeled after the standard-defined > like_regex. > > > > > > The implementation follows the like_regex precedent: > > > it is a non-indexable predicate that relies on GIN path-matching for > pruning and heap re-checks for precision. > > > Caching is scoped to the JsonPathExecContext, > > > ensuring 'compile-once' efficiency per execution without violating the > stability requirements of prepared statements. > > > > > > This initial implementation uses plainto_tsquery. > > > However, the grammar is designed to support a "mode" flag (similar to > like_regex flags) > > > in future iterations to toggle between to_tsquery, > websearch_to_tsquery, and phraseto_tsquery. > > > > > > Here's a v2, that implements the tsqparser clause > > > > > > So this should now work too > > > > > > select jsonb_path_query_array('["fast car", "slow car", "fast and > furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w") > <v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch> > > > > Hi Florents, > > > > Grant pinged me about this. I can review it in coming days. Can you > please rebase it? I failed to apply to current master. Also, the CF > reported a failure test case, please take a look. > > > > Hi Evan, > > thanks for having a look. The conflict was due to the intro of > pg_fallthrough. Not related to this patch . > > > > I noticed the failure too, but I'm having a hard time reproducing it tbh. > > This fails for Debian Trixie with Meson. The same with Autoconf passes... > > > > https://github.com/Florents-Tselai/postgres/runs/65098077968 > > > > > > > > > > <v3-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch> > > I have reviewed v3 and traced a few test cases. Here comes my review > comments: > > 1 > ``` > + <replaceable>string</replaceable> <literal>tsmatch</literal> > <replaceable>string</replaceable> > + <optional> <literal>tsconfig</literal> > <replaceable>string</replaceable> </optional> > + <optional> <literal>tsqparser</literal> > <replaceable>string</replaceable> </optional> > ``` > > For all “replaceable”, instead of “string”, would it be better to use > something more descriptive? For example: > ``` > <replaceable>json_string</replaceable> <literal>tsmatch</literal> > <replaceable>query</replaceable> > <optional> <literal>tsconfig</literal> > <replaceable>config_name</replaceable> </optional> > <optional> <literal>tsqparser</literal> > <replaceable>parser_mode</replaceable> </optional> > ``` > > 2 - jsonpath_gram.y > ``` > +static bool makeItemTsMatch(JsonPathParseItem *doc, > + JsonPathString > *tsquery, > + JsonPathString > *tsconfig, > + JsonPathString > *tsquery_parser, > + > JsonPathParseItem ** result, > + struct Node > *escontext); > ``` > > Format Nit: Looking at the existing code, the J in the second and > following lines, should be placed in the same column as the J in the first > line. > > 3 - jsonpath_gram.y > ``` > + | expr TSMATCH_P STRING_P > + { > + JsonPathParseItem *jppitem; > + /* Pass NULL for tsconfig (3rd) and NULL for > tsquery_parser (4th) */ > + if (! makeItemTsMatch($1, &$3, NULL, NULL, &jppitem, > escontext)) > + YYABORT; > + $$ = jppitem; > + } > + | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P > + { > + JsonPathParseItem *jppitem; > + /* Pass NULL for tsquery_parser (4th) */ > + if (! makeItemTsMatch($1, &$3, &$5, NULL, &jppitem, > escontext)) > + YYABORT; > + $$ = jppitem; > + } > + | expr TSMATCH_P STRING_P TSQUERYPARSER_P STRING_P > + { > + JsonPathParseItem *jppitem; > + /* Pass NULL for tsconfig (3rd) */ > + if (! makeItemTsMatch($1, &$3, NULL, &$5, &jppitem, > escontext)) > + YYABORT; > + $$ = jppitem; > + } > + | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P TSQUERYPARSER_P > STRING_P > + { > + JsonPathParseItem *jppitem; > + if (! makeItemTsMatch($1, &$3, &$5, &$7, &jppitem, > escontext)) > + YYABORT; > + $$ = jppitem; > + } > ``` > > Feels a little redundant, repeatedly calls makeItemTsMatch. See the > attached diff for a simplification. But my version is a bit longer in terms > of number of lines. So, up to you. > > 4 - jsonpath_gram.y > ``` > +static bool > +makeItemTsMatch(JsonPathParseItem *doc, > + JsonPathString *tsquery, > + JsonPathString *tsconfig, > + JsonPathString *tsquery_parser, > + JsonPathParseItem **result, > + struct Node *escontext) > ``` > > makeItemTsMatch doesn’t need to return a bool. Actually, now it never > returns false, instead, it just ereport(ERROR). > > 5 - jsonpath.h > ``` > + struct > + { > + int32 doc; > + char *tsquery; > + uint32 tsquerylen; > + int32 tsconfig; > + char *tsqparser; > + uint32 tsqparser_len; > + } tsmatch; > > + struct > + { > + JsonPathParseItem *doc; > + char *tsquery; > + uint32 tsquerylen; > + JsonPathParseItem *tsconfig; > + char *tsqparser; > + uint32 tsqparser_len; > + } tsmatch; > } value; > ``` > > tsquerylen doesn’t have _ before len, and tsqparser_len, would it be > better to make naming conventions consistent in the same structure? > > 6 - jsonpath_exec.c > ``` > #include "tsearch/ts_utils.h" > #include "tsearch/ts_cache.h" > #include "utils/regproc.h" > #include "catalog/namespace.h" > > static JsonPathBool > executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg, > void *param) > ``` > > Why don’t put these includes to the header section together with other > includes? > > 7 - jsonpath_exec.c > ``` > + else > + { > + /* > + * Fallback or Error for unknown flags > (should be caught by > + * parser) > + */ > + ereport(ERROR, > + > (errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("unrecognized > tsqparser flag"))); > + } > ``` > > This “else” should never be entered as the same check has been done by > makeItemTsMatch. So, maybe just use an Assert here, or pg_unreachable(). > > 8 - jsonpath_exec.c > ``` > + /* Setup Context (Run ONLY once per predicate) */ > + if (!cxt->initialized) > ``` > > While tracing this SQL: > ``` > evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb > evantest-# @@ '$.tags[*] ? (@ tsmatch "run" tsconfig "english")'; > ?column? > ---------- > > (1 row) > ``` > > I noticed that, when process “jogging”, cxt->initialized is still false, > meaning that, the cxt is not reused across array items. Given the same > tsconfig should apply to all array items, I think cxt should be reused. > > 9 - jsonpath_exec.c > ``` > + /* Select Parser and Compile Query */ > + parser_mode = jsp->content.tsmatch.tsqparser; > + parser_len = jsp->content.tsmatch.tsqparser_len; > + > + if (parser_len > 0) > + { > + /* Dispatch based on flag */ > + if (pg_strncasecmp(parser_mode, "pl", parser_len) > == 0) > ``` > > Nit: parser_mode is only used inside if (parser_len > 0), it can be > defined inside the “if”. > > 10 - jsonpath_gram.y > ``` > + ereport(ERROR, > + (errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("invalid tsquery_parser > value: \"%s\"", tsquery_parser->val), > + errhint("Valid values are > \"pl\", \"ph\", and \"w\"."))); > ``` > > When tested a case with an invalid parser, I got: > ``` > evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb > > @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqparser > "pss")'; > ERROR: invalid tsquery_parser value: "pss @" > LINE 2: @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqpar... > ^ > HINT: Valid values are "pl", "ph", and "w". > ``` > > You can see the it shows a bad looking invalid value. I think that’s > because tsquery_parser->val is not NULL terminated. I fixed this problem > with: > ``` > errmsg("invalid tsquery_parser value: \"%.*s\"", (int) > tsquery_parser->len, tsquery_parser->val), > ``` > > This change is also included in the attached diff file. > > 11 - jsonpath.c > ``` > + if (printBracketes) > + appendStringInfoChar(buf, ')'); > + break; > + > if (printBracketes) > appendStringInfoChar(buf, ')'); > ``` > > Duplicate code. Looks like a copy-pasto. > > 12 - jsonpath.c > ``` > + /* Write the Main Query String */ > + appendBinaryStringInfo(buf, > + > &item->value.tsmatch.tsquerylen, > + > sizeof(item->value.tsmatch.tsquerylen)); > + appendBinaryStringInfo(buf, > + > item->value.tsmatch.tsquery, > + > item->value.tsmatch.tsquerylen); > + appendStringInfoChar(buf, '\0'); > ``` > > I don’t think we need to manually append ‘\0’ after > appendBinaryStringInfo. Looking at the header comment of > appendBinaryStringInfo, it says that a trailing null will be added. > ``` > /* > * appendBinaryStringInfo > * > * Append arbitrary binary data to a StringInfo, allocating more space > * if necessary. Ensures that a trailing null byte is present. > */ > void > appendBinaryStringInfo(StringInfo str, const void *data, int datalen) > ``` > Here's a v4 which incorporates most of Evan's comments & feedback - shifted tsquery compilation logic to use persistent cache within JsonPathExecContext - Fixed a binary serialization alignment issue which caused Dixie to fail earlier - I've refactor and simplified the grammar per Evan's input by adding a tsmatch_opts rule. - Also updated the docs per Evan's comments Attachments: [application/octet-stream] v4-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch (33.2K, 3-v4-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch) download | inline diff: From 1219ffeefb507a75a5fc4b7fae7820fdb89d29bd Mon Sep 17 00:00:00 2001 From: Florents Tselai <[email protected]> Date: Fri, 3 Apr 2026 12:50:40 +0300 Subject: [PATCH v4] Add tsmatch JSONPath operator for granular Full Text Search MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This patch introduces the tsmatch boolean operator to the JSONPath engine. By integrating FTS natively into path expressions, this operator allows for high-precision filtering of nested JSONB structures—solving issues with structural ambiguity and query complexity. Currently, users must choose between two suboptimal paths for searching nested JSON: 1. Imprecise Global Indexing jsonb_to_tsvector aggregates text into a flat vector. This ignores JSON boundaries, leading to false positives when the same key (e.g., "body") appears in different contexts (e.g., a "Product Description" vs. a "Customer Review"). 2. Complex SQL Workarounds Achieving 100% precision requires "exploding" the document via jsonb_array_elements and LATERAL joins. This leads to verbose SQL and high memory overhead from generating intermediate heap tuples. One of the most significant advantages of tsmatch is its ability to participate in multi-condition predicates within the same JSON object— something jsonb_to_tsvector cannot do. SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")'); In a flat vector, the association between "Alice" and "performance" is lost. tsmatch preserves this link by evaluating the FTS predicate in-place during path traversal. While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an FTS operator, tsmatch is architecturally modeled after the standard-defined like_regex. The operator supports optional configuration for both the dictionary and the query parser: @ tsmatch "query" [ tsconfig "regconfig" ] [ tsqparser "mode" ] Supported parser modes are: - "pl": plainto_tsquery (no operators required) - "ph": phraseto_tsquery - "w": websearch_to_tsquery - Omitted: Defaults to to_tsquery (strict mode) The implementation relies on GIN path-matching for index pruning and heap re-checks for precision. Caching is scoped to the JsonPathExecContext, ensuring 'compile-once' efficiency for the tsquery and OID lookup per execution, respecting the stability requirements of prepared statements. --- doc/src/sgml/func/func-json.sgml | 34 +++++ src/backend/utils/adt/jsonpath.c | 131 +++++++++++++++- src/backend/utils/adt/jsonpath_exec.c | 153 ++++++++++++++++++- src/backend/utils/adt/jsonpath_gram.y | 115 +++++++++++++- src/backend/utils/adt/jsonpath_scan.l | 3 + src/include/utils/jsonpath.h | 19 +++ src/test/regress/expected/jsonb_jsonpath.out | 58 +++++++ src/test/regress/expected/jsonpath.out | 60 ++++++++ src/test/regress/sql/jsonb_jsonpath.sql | 13 ++ src/test/regress/sql/jsonpath.sql | 19 +++ 10 files changed, 601 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/func/func-json.sgml b/doc/src/sgml/func/func-json.sgml index 4cd338fe6e3..edf67fc694e 100644 --- a/doc/src/sgml/func/func-json.sgml +++ b/doc/src/sgml/func/func-json.sgml @@ -3247,6 +3247,40 @@ ERROR: jsonpath member accessor can only be applied to an object <returnvalue>[]</returnvalue> </para></entry> </row> + <row> + <entry role="func_table_entry"><para role="func_signature"> + <replaceable>json_string</replaceable> <literal>tsmatch</literal> <replaceable>query</replaceable> + <optional> <literal>tsconfig</literal> <replaceable>config_name</replaceable> </optional> + <optional> <literal>tsqparser</literal> <replaceable>parser_mode</replaceable> </optional> + <returnvalue>boolean</returnvalue> + </para> + <para> + Tests whether the JSON string (first operand) matches the full-text search query + (second operand). The matching behavior can be customized using the optional + <literal>tsconfig</literal> and <literal>tsqparser</literal> clauses. + If <literal>tsconfig</literal> is omitted, the current session's default text search configuration + is used (see <xref linkend="guc-default-text-search-config"/>). + The <literal>tsqparser</literal> clause determines how the query string is parsed + (see <xref linkend="textsearch-parsing-queries"/>). + Valid options are <literal>"pl"</literal> (<function>plainto_tsquery</function>), + <literal>"ph"</literal> (<function>phraseto_tsquery</function>), and + <literal>"w"</literal> (<function>websearch_to_tsquery</function>). + If <literal>tsqparser</literal> is omitted, the query is parsed using <function>to_tsquery</function>. + </para> + <para> + <literal>jsonb_path_query_array('["running", "runs", "ran", "jogging"]', '$[*] ? (@ tsmatch "run" tsconfig "english")')</literal> + <returnvalue>["running", "runs"]</returnvalue> + </para> + <para> + <literal>jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast & car")')</literal> + <returnvalue>["fast car"]</returnvalue> + </para> + <para> + <literal>jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w")')</literal> + <returnvalue>["fast car"]</returnvalue> + </para> + </entry> + </row> </tbody> </tgroup> </table> diff --git a/src/backend/utils/adt/jsonpath.c b/src/backend/utils/adt/jsonpath.c index 7bfc18c9888..5abb6f433f7 100644 --- a/src/backend/utils/adt/jsonpath.c +++ b/src/backend/utils/adt/jsonpath.c @@ -351,6 +351,55 @@ flattenJsonPathParseItem(StringInfo buf, int *result, struct Node *escontext, *(int32 *) (buf->data + offs) = chld - pos; } break; + case jpiTsMatch: + { + int32 expr_off; + int32 tsconfig_off; + uint32 tsqparser_len_val = item->value.tsmatch.tsqparser ? item->value.tsmatch.tsqparser_len : 0; + + expr_off = reserveSpaceForItemPointer(buf); + tsconfig_off = reserveSpaceForItemPointer(buf); + + /* + * Write all integers FIRST so they are naturally 4-byte + * aligned + */ + appendBinaryStringInfo(buf, &item->value.tsmatch.tsquery_len, sizeof(uint32)); + appendBinaryStringInfo(buf, &tsqparser_len_val, sizeof(uint32)); + + /* Now append the strings at the end */ + appendBinaryStringInfo(buf, item->value.tsmatch.tsquery, item->value.tsmatch.tsquery_len); + appendStringInfoChar(buf, '\0'); + + if (item->value.tsmatch.tsqparser) + { + appendBinaryStringInfo(buf, item->value.tsmatch.tsqparser, tsqparser_len_val); + appendStringInfoChar(buf, '\0'); + } + + if (!flattenJsonPathParseItem(buf, &chld, escontext, + item->value.tsmatch.doc, + nestingLevel, + insideArraySubscript)) + return false; + *(int32 *) (buf->data + expr_off) = chld - pos; + + /* TSConfig (Optional) */ + if (item->value.tsmatch.tsconfig) + { + if (!flattenJsonPathParseItem(buf, &chld, escontext, + item->value.tsmatch.tsconfig, + nestingLevel, + insideArraySubscript)) + return false; + *(int32 *) (buf->data + tsconfig_off) = chld - pos; + } + else + { + *(int32 *) (buf->data + tsconfig_off) = 0; + } + } + break; case jpiFilter: argNestingLevel++; pg_fallthrough; @@ -767,6 +816,58 @@ printJsonPathItem(StringInfo buf, JsonPathItem *v, bool inKey, appendStringInfoChar(buf, '"'); } + if (printBracketes) + appendStringInfoChar(buf, ')'); + break; + case jpiTsMatch: + if (printBracketes) + appendStringInfoChar(buf, '('); + + jspInitByBuffer(&elem, v->base, v->content.tsmatch.doc); + printJsonPathItem(buf, &elem, false, + operationPriority(elem.type) <= + operationPriority(v->type)); + + appendStringInfoString(buf, " tsmatch "); + + escape_json_with_len(buf, + v->content.tsmatch.tsquery, + v->content.tsmatch.tsquery_len); + + /* Print TSConfig if present */ + if (v->content.tsmatch.tsconfig) + { + JsonPathItem config_item; + int32 config_len; + char *config_str; + + appendStringInfoString(buf, " tsconfig "); + jspInitByBuffer(&config_item, v->base, v->content.tsmatch.tsconfig); + config_str = jspGetString(&config_item, &config_len); + appendStringInfoChar(buf, '"'); + appendBinaryStringInfo(buf, config_str, config_len); + appendStringInfoChar(buf, '"'); + } + + if (v->content.tsmatch.tsqparser_len > 0) + { + appendStringInfoString(buf, " tsqparser "); + appendStringInfoChar(buf, '"'); + + /* + * Use simple binary append since flags like "pl" don't need + * JSON escaping + */ + appendBinaryStringInfo(buf, + v->content.tsmatch.tsqparser, + v->content.tsmatch.tsqparser_len); + appendStringInfoChar(buf, '"'); + } + + if (printBracketes) + appendStringInfoChar(buf, ')'); + break; + if (printBracketes) appendStringInfoChar(buf, ')'); break; @@ -976,6 +1077,8 @@ jspOperationName(JsonPathItemType type) return "timestamp"; case jpiTimestampTz: return "timestamp_tz"; + case jpiTsMatch: + return "tsmatch"; case jpiStrReplace: return "replace"; case jpiStrLower: @@ -1158,6 +1261,27 @@ jspInitByBuffer(JsonPathItem *v, char *base, int32 pos) read_int32(v->content.like_regex.patternlen, base, pos); v->content.like_regex.pattern = base + pos; break; + case jpiTsMatch: + /* FIX: Read all integers first */ + read_int32(v->content.tsmatch.doc, base, pos); + read_int32(v->content.tsmatch.tsconfig, base, pos); + read_int32(v->content.tsmatch.tsquery_len, base, pos); + read_int32(v->content.tsmatch.tsqparser_len, base, pos); + + /* Set pointers to the strings in the buffer */ + v->content.tsmatch.tsquery = base + pos; + pos += v->content.tsmatch.tsquery_len + 1; + + if (v->content.tsmatch.tsqparser_len > 0) + { + v->content.tsmatch.tsqparser = base + pos; + pos += v->content.tsmatch.tsqparser_len + 1; + } + else + { + v->content.tsmatch.tsqparser = NULL; + } + break; default: elog(ERROR, "unrecognized jsonpath item type: %d", v->type); } @@ -1231,6 +1355,7 @@ jspGetNext(JsonPathItem *v, JsonPathItem *a) v->type == jpiLast || v->type == jpiStartsWith || v->type == jpiLikeRegex || + v->type == jpiTsMatch || v->type == jpiBigint || v->type == jpiBoolean || v->type == jpiDate || @@ -1575,7 +1700,11 @@ jspIsMutableWalker(JsonPathItem *jpi, struct JsonPathMutableContext *cxt) jspInitByBuffer(&arg, jpi->base, jpi->content.like_regex.expr); jspIsMutableWalker(&arg, cxt); break; - + case jpiTsMatch: + Assert(status == jpdsNonDateTime); + jspInitByBuffer(&arg, jpi->base, jpi->content.tsmatch.doc); + jspIsMutableWalker(&arg, cxt); + break; /* literals */ case jpiNull: case jpiString: diff --git a/src/backend/utils/adt/jsonpath_exec.c b/src/backend/utils/adt/jsonpath_exec.c index 770840a0611..b95ee630f2a 100644 --- a/src/backend/utils/adt/jsonpath_exec.c +++ b/src/backend/utils/adt/jsonpath_exec.c @@ -65,6 +65,7 @@ #include "miscadmin.h" #include "nodes/miscnodes.h" #include "nodes/nodeFuncs.h" +#include "nodes/pg_list.h" #include "regex/regex.h" #include "utils/builtins.h" #include "utils/date.h" @@ -114,6 +115,7 @@ typedef struct JsonPathExecContext bool throwErrors; /* with "false" all suppressible errors are * suppressed */ bool useTz; + List *tsmatch_cache; /* Persists compiled FTS queries */ } JsonPathExecContext; /* Context for LIKE_REGEX execution. */ @@ -123,6 +125,20 @@ typedef struct JsonLikeRegexContext int cflags; } JsonLikeRegexContext; +typedef struct JsonTsMatchContext +{ + Datum queryDatum; /* Cache the compiled binary TSQuery */ + Oid tsconfigId; /* Cache the dictionary OID */ + bool initialized; /* Flag to run setup only once */ +} JsonTsMatchContext; + +typedef struct JsonTsMatchCacheEntry +{ + void *jsp_ptr; /* Cache key: pointer to the AST node */ + Datum queryDatum; /* Cached compiled query */ + Oid tsconfigId; /* Cached dictionary OID */ +} JsonTsMatchCacheEntry; + /* Result of jsonpath predicate evaluation */ typedef enum JsonPathBool { @@ -335,6 +351,7 @@ static JsonPathExecResult executeKeyValueMethod(JsonPathExecContext *cxt, JsonPathItem *jsp, JsonbValue *jb, JsonValueList *found); static JsonPathExecResult appendBoolResult(JsonPathExecContext *cxt, JsonPathItem *jsp, JsonValueList *found, JsonPathBool res); +static JsonPathBool executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg, void *param); static void getJsonPathItem(JsonPathExecContext *cxt, JsonPathItem *item, JsonbValue *value); static JsonbValue *GetJsonPathVar(void *cxt, char *varName, int varNameLen, @@ -740,6 +757,7 @@ executeJsonPath(JsonPath *path, void *vars, JsonPathGetVarCallback getVar, cxt.innermostArraySize = -1; cxt.throwErrors = throwErrors; cxt.useTz = useTz; + cxt.tsmatch_cache = NIL; if (jspStrictAbsenceOfErrors(&cxt) && !result) { @@ -840,6 +858,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp, case jpiExists: case jpiStartsWith: case jpiLikeRegex: + case jpiTsMatch: { JsonPathBool st = executeBoolItem(cxt, jsp, jb, true); @@ -1914,6 +1933,18 @@ executeBoolItem(JsonPathExecContext *cxt, JsonPathItem *jsp, return executePredicate(cxt, jsp, &larg, NULL, jb, false, executeLikeRegex, &lrcxt); } + case jpiTsMatch: + { + jspInitByBuffer(&larg, jsp->base, + jsp->content.tsmatch.doc); + + /* + * Pass 'cxt' as the param so executeTsMatch can access the + * cache list + */ + return executePredicate(cxt, jsp, &larg, NULL, jb, false, + executeTsMatch, cxt); + } case jpiExists: jspGetArg(jsp, &larg); @@ -1952,7 +1983,6 @@ executeBoolItem(JsonPathExecContext *cxt, JsonPathItem *jsp, return res == jperOk ? jpbTrue : jpbFalse; } - default: elog(ERROR, "invalid boolean jsonpath item type: %d", jsp->type); return jpbUnknown; @@ -3185,6 +3215,127 @@ executeKeyValueMethod(JsonPathExecContext *cxt, JsonPathItem *jsp, return res; } +#include "tsearch/ts_utils.h" +#include "tsearch/ts_cache.h" +#include "utils/regproc.h" +#include "catalog/namespace.h" + +static JsonPathBool +executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg, + void *param) +{ + JsonPathExecContext *cxt = (JsonPathExecContext *) param; + JsonTsMatchCacheEntry *cache = NULL; + ListCell *lc; + text *doc_text; + Datum tsvector_datum; + bool match; + + if (!(str = getScalar(str, jbvString))) + return jpbUnknown; + + /* Check if we already compiled the query for this specific AST node */ + foreach(lc, cxt->tsmatch_cache) + { + JsonTsMatchCacheEntry *entry = (JsonTsMatchCacheEntry *) lfirst(lc); + + if (entry->jsp_ptr == (void *) jsp) + { + cache = entry; + break; + } + } + + /* If not found, parse, compile, and cache it */ + if (cache == NULL) + { + text *query_text; + char *parser_mode; + uint32 parser_len; + + /* Allocate new cache entry in the execution memory context */ + cache = palloc0(sizeof(JsonTsMatchCacheEntry)); + cache->jsp_ptr = (void *) jsp; + + if (jsp->content.tsmatch.tsconfig != 0) + { + JsonPathItem config_item; + int32 config_len; + char *config_str; + char *safe_config_str; + + jspInitByBuffer(&config_item, jsp->base, jsp->content.tsmatch.tsconfig); + config_str = jspGetString(&config_item, &config_len); + + /* Null-terminate for safe catalog lookup */ + safe_config_str = pnstrdup(config_str, config_len); + cache->tsconfigId = get_ts_config_oid(stringToQualifiedNameList(safe_config_str, NULL), true); + pfree(safe_config_str); + } + else + { + cache->tsconfigId = getTSCurrentConfig(true); + } + + /* Prepare Query Text */ + query_text = cstring_to_text_with_len(jsp->content.tsmatch.tsquery, + jsp->content.tsmatch.tsquery_len); + + /* Select Parser and Compile Query */ + parser_mode = jsp->content.tsmatch.tsqparser; + parser_len = jsp->content.tsmatch.tsqparser_len; + + if (parser_len > 0) + { + if (pg_strncasecmp(parser_mode, "pl", parser_len) == 0) + { + cache->queryDatum = DirectFunctionCall2(plainto_tsquery_byid, + ObjectIdGetDatum(cache->tsconfigId), + PointerGetDatum(query_text)); + } + else if (pg_strncasecmp(parser_mode, "ph", parser_len) == 0) + { + cache->queryDatum = DirectFunctionCall2(phraseto_tsquery_byid, + ObjectIdGetDatum(cache->tsconfigId), + PointerGetDatum(query_text)); + } + else if (pg_strncasecmp(parser_mode, "w", parser_len) == 0) + { + cache->queryDatum = DirectFunctionCall2(websearch_to_tsquery_byid, + ObjectIdGetDatum(cache->tsconfigId), + PointerGetDatum(query_text)); + } + else + { + pg_unreachable(); + } + } + else + { + cache->queryDatum = DirectFunctionCall2(to_tsquery_byid, + ObjectIdGetDatum(cache->tsconfigId), + PointerGetDatum(query_text)); + } + + /* Append the newly compiled cache entry to the context's list */ + cxt->tsmatch_cache = lappend(cxt->tsmatch_cache, cache); + } + + /* Runtime Execution (Using the cached query) */ + doc_text = cstring_to_text_with_len(str->val.string.val, + str->val.string.len); + + tsvector_datum = DirectFunctionCall2(to_tsvector_byid, + ObjectIdGetDatum(cache->tsconfigId), + PointerGetDatum(doc_text)); + + match = DatumGetBool(DirectFunctionCall2(ts_match_vq, + tsvector_datum, + cache->queryDatum)); + + return match ? jpbTrue : jpbFalse; +} + /* * Convert boolean execution status 'res' to a boolean JSON item and execute diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y index f826697d098..47f6208d933 100644 --- a/src/backend/utils/adt/jsonpath_gram.y +++ b/src/backend/utils/adt/jsonpath_gram.y @@ -43,7 +43,12 @@ static bool makeItemLikeRegex(JsonPathParseItem *expr, JsonPathString *flags, JsonPathParseItem ** result, struct Node *escontext); - +static bool makeItemTsMatch(JsonPathParseItem *doc, + JsonPathString *tsquery, + JsonPathString *tsconfig, + JsonPathString *tsquery_parser, + JsonPathParseItem ** result, + struct Node *escontext); /* * Bison doesn't allocate anything that needs to live across parser calls, * so we can easily have it use palloc instead of malloc. This prevents @@ -73,6 +78,13 @@ static bool makeItemLikeRegex(JsonPathParseItem *expr, JsonPathParseItem *value; JsonPathParseResult *result; JsonPathItemType optype; + struct + { + bool has_tsconfig; + JsonPathString tsconfig; + bool has_tsqparser; + JsonPathString tsqparser; + } tsmatch_opts; bool boolean; int integer; } @@ -81,7 +93,7 @@ static bool makeItemLikeRegex(JsonPathParseItem *expr, %token <str> IDENT_P STRING_P NUMERIC_P INT_P VARIABLE_P %token <str> OR_P AND_P NOT_P %token <str> LESS_P LESSEQUAL_P EQUAL_P NOTEQUAL_P GREATEREQUAL_P GREATER_P -%token <str> ANY_P STRICT_P LAX_P LAST_P STARTS_P WITH_P LIKE_REGEX_P FLAG_P +%token <str> ANY_P STRICT_P LAX_P LAST_P STARTS_P WITH_P LIKE_REGEX_P FLAG_P TSMATCH_P TSCONFIG_P TSQUERYPARSER_P %token <str> ABS_P SIZE_P TYPE_P FLOOR_P DOUBLE_P CEILING_P KEYVALUE_P %token <str> DATETIME_P %token <str> BIGINT_P BOOLEAN_P DATE_P DECIMAL_P INTEGER_P NUMBER_P @@ -109,6 +121,8 @@ static bool makeItemLikeRegex(JsonPathParseItem *expr, %type <integer> any_level +%type <tsmatch_opts> tsmatch_opts + %left OR_P %left AND_P %right NOT_P @@ -189,7 +203,44 @@ predicate: YYABORT; $$ = jppitem; } + | expr TSMATCH_P STRING_P tsmatch_opts + { + JsonPathParseItem *jppitem; + + makeItemTsMatch($1, &$3, + $4.has_tsconfig ? &$4.tsconfig : NULL, + $4.has_tsqparser ? &$4.tsqparser : NULL, + &jppitem, escontext); + + $$ = jppitem; + } ; + tsmatch_opts: + /* EMPTY */ + { + $$.has_tsconfig = false; + $$.has_tsqparser = false; + } + | TSCONFIG_P STRING_P + { + $$.has_tsconfig = true; + $$.tsconfig = $2; + $$.has_tsqparser = false; + } + | TSQUERYPARSER_P STRING_P + { + $$.has_tsconfig = false; + $$.has_tsqparser = true; + $$.tsqparser = $2; + } + | TSCONFIG_P STRING_P TSQUERYPARSER_P STRING_P + { + $$.has_tsconfig = true; + $$.tsconfig = $2; + $$.has_tsqparser = true; + $$.tsqparser = $4; + } + ; starts_with_initial: STRING_P { $$ = makeItemString(&$1); } @@ -377,6 +428,9 @@ key_name: | TIME_TZ_P | TIMESTAMP_P | TIMESTAMP_TZ_P + | TSCONFIG_P + | TSMATCH_P + | TSQUERYPARSER_P | STR_LOWER_P | STR_UPPER_P | STR_INITCAP_P @@ -715,3 +769,60 @@ jspConvertRegexFlags(uint32 xflags, int *result, struct Node *escontext) return true; } + +static bool +makeItemTsMatch(JsonPathParseItem *doc, + JsonPathString *tsquery, + JsonPathString *tsconfig, + JsonPathString *tsquery_parser, + JsonPathParseItem **result, + struct Node *escontext) +{ + JsonPathParseItem *v = makeItemType(jpiTsMatch); + + v->value.tsmatch.doc = doc; + + v->value.tsmatch.tsquery = tsquery->val; + v->value.tsmatch.tsquery_len = tsquery->len; + + /* Handle the Configuration (Stored as a Node) */ + if (tsconfig) + { + JsonPathParseItem *conf = makeItemType(jpiString); + conf->value.string.val = tsconfig->val; + conf->value.string.len = tsconfig->len; + v->value.tsmatch.tsconfig = conf; + } + else + { + v->value.tsmatch.tsconfig = NULL; + } + + /* Handle the TSQuery Parser Flag */ + if (tsquery_parser) + { + /* Check for "pl", "ph", "w" here to throw a syntax error immediately */ + if (pg_strncasecmp(tsquery_parser->val, "pl", tsquery_parser->len) != 0 && + pg_strncasecmp(tsquery_parser->val, "ph", tsquery_parser->len) != 0 && + pg_strncasecmp(tsquery_parser->val, "w", tsquery_parser->len) != 0) + { + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("invalid tsquery_parser value: \"%.*s\"", + (int) tsquery_parser->len, tsquery_parser->val), + errhint("Valid values are \"pl\", \"ph\", and \"w\"."))); + } + + v->value.tsmatch.tsqparser = tsquery_parser->val; + v->value.tsmatch.tsqparser_len = tsquery_parser->len; + } + else + { + v->value.tsmatch.tsqparser = NULL; + v->value.tsmatch.tsqparser_len = 0; + } + + /* Success */ + *result = v; + return true; +} diff --git a/src/backend/utils/adt/jsonpath_scan.l b/src/backend/utils/adt/jsonpath_scan.l index e4fadcc2e69..92b3e9be926 100644 --- a/src/backend/utils/adt/jsonpath_scan.l +++ b/src/backend/utils/adt/jsonpath_scan.l @@ -434,10 +434,13 @@ static const JsonPathKeyword keywords[] = { {7, false, INTEGER_P, "integer"}, {7, false, STR_REPLACE_P, "replace"}, {7, false, TIME_TZ_P, "time_tz"}, + {7, false, TSMATCH_P, "tsmatch"}, {7, false, UNKNOWN_P, "unknown"}, {8, false, DATETIME_P, "datetime"}, {8, false, KEYVALUE_P, "keyvalue"}, + {8, false, TSCONFIG_P, "tsconfig"}, {9, false, TIMESTAMP_P, "timestamp"}, + {9, false, TSQUERYPARSER_P, "tsqparser"}, {10, false, LIKE_REGEX_P, "like_regex"}, {10, false, STR_SPLIT_PART_P, "split_part"}, {12, false, TIMESTAMP_TZ_P, "timestamp_tz"}, diff --git a/src/include/utils/jsonpath.h b/src/include/utils/jsonpath.h index 8d27206e242..89753bf8bc7 100644 --- a/src/include/utils/jsonpath.h +++ b/src/include/utils/jsonpath.h @@ -104,6 +104,7 @@ typedef enum JsonPathItemType jpiLast, /* LAST array subscript */ jpiStartsWith, /* STARTS WITH predicate */ jpiLikeRegex, /* LIKE_REGEX predicate */ + jpiTsMatch, /* TSMATCH predicate */ jpiBigint, /* .bigint() item method */ jpiBoolean, /* .boolean() item method */ jpiDate, /* .date() item method */ @@ -196,6 +197,15 @@ typedef struct JsonPathItem int32 patternlen; uint32 flags; } like_regex; + struct + { + int32 doc; + char *tsquery; + uint32 tsquery_len; + int32 tsconfig; + char *tsqparser; + uint32 tsqparser_len; + } tsmatch; } content; } JsonPathItem; @@ -274,6 +284,15 @@ struct JsonPathParseItem uint32 len; char *val; /* could not be not null-terminated */ } string; + struct + { + JsonPathParseItem *doc; + char *tsquery; + uint32 tsquery_len; + JsonPathParseItem *tsconfig; + char *tsqparser; + uint32 tsqparser_len; + } tsmatch; } value; }; diff --git a/src/test/regress/expected/jsonb_jsonpath.out b/src/test/regress/expected/jsonb_jsonpath.out index afa6c4cb529..ac78becc305 100644 --- a/src/test/regress/expected/jsonb_jsonpath.out +++ b/src/test/regress/expected/jsonb_jsonpath.out @@ -4891,3 +4891,61 @@ ORDER BY s1.num, s2.num; {"s": "B"} | {"s": "B"} | false | true | true | true | false (144 rows) +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "fly" tsconfig "english")'); + jsonb_path_query +------------------ +(0 rows) + +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "english")'); + jsonb_path_query +------------------ + "running" + "runs" + "run" +(3 rows) + +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "simple")'); + jsonb_path_query +------------------ + "run" +(1 row) + +select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "english")'); + jsonb_path_query +------------------ + "postgres" + "POSTGRES" +(2 rows) + +select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "simple")'); + jsonb_path_query +------------------ + "postgres" + "POSTGRES" +(2 rows) + +-- in the default tsqparser (to_tsquery) spaces are not allowed, so this should fail for syntax +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english")'); +ERROR: syntax error in tsquery: "fast car" +-- if we specify "w" however it should be ok +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english" tsqparser "w")'); + jsonb_path_query +------------------ + "fast car" + "super fast car" +(2 rows) + +-- it should also be ok if we change to a valid to_tsquery +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast & car" tsconfig "english")'); + jsonb_path_query +------------------ + "fast car" + "super fast car" +(2 rows) + +select jsonb_path_query('["fat cat", "cat fat", "fat rats"]', 'lax $[*] ? (@ tsmatch "fat & rat" tsconfig "english")'); + jsonb_path_query +------------------ + "fat rats" +(1 row) + diff --git a/src/test/regress/expected/jsonpath.out b/src/test/regress/expected/jsonpath.out index ea971e79854..600cf572f08 100644 --- a/src/test/regress/expected/jsonpath.out +++ b/src/test/regress/expected/jsonpath.out @@ -1480,3 +1480,63 @@ FROM unnest(ARRAY['$ ? (@ like_regex "pattern" flag "smixq")'::text, 1a | f | 42601 | trailing junk after numeric literal at or near "1a" of jsonpath input | | (5 rows) +-- tsmatch (Full Text Search) +-- basic success +select '$ ? (@ tsmatch "simple")'::jsonpath; + jsonpath +------------------------ + $?(@ tsmatch "simple") +(1 row) + +select '$ ? (@ tsmatch "running" tsconfig "english")'::jsonpath; + jsonpath +-------------------------------------------- + $?(@ tsmatch "running" tsconfig "english") +(1 row) + +-- w/out tsconfig and tsqparser +select '$ ? (@ tsmatch "fast & furious" tsconfig "simple")'::jsonpath; + jsonpath +-------------------------------------------------- + $?(@ tsmatch "fast & furious" tsconfig "simple") +(1 row) + +select '$ ? (@ tsmatch "fast & furious" tsconfig "simple" tsqparser "w")'::jsonpath; + jsonpath +---------------------------------------------------------------- + $?(@ tsmatch "fast & furious" tsconfig "simple" tsqparser "w") +(1 row) + +-- tsconfig must be specified first and then tsqparser +select '$ ? (@ tsmatch "fast & furious" tsqparser "w" tsconfig "simple" )'::jsonpath; +ERROR: syntax error at or near " " of jsonpath input +LINE 1: select '$ ? (@ tsmatch "fast & furious" tsqparser "w" tsconf... + ^ +select '$ ? (@ tsmatch "fast & furious" tsqparser "w")'::jsonpath; + jsonpath +---------------------------------------------- + $?(@ tsmatch "fast & furious" tsqparser "w") +(1 row) + +select '$[*] ? (@.title tsmatch "god" && @.rating > 5)'::jsonpath; + jsonpath +-------------------------------------------------- + $[*]?(@."title" tsmatch "god" && @."rating" > 5) +(1 row) + +select '$ ? (@ tsmatch $pattern)'::jsonpath; +ERROR: syntax error at or near "$pattern" of jsonpath input +LINE 1: select '$ ? (@ tsmatch $pattern)'::jsonpath; + ^ +-- only string literals (no variables) are allowed for tsquery +select '$ ? (@ tsmatch $var tsconfig "english")'::jsonpath; +ERROR: syntax error at or near "$var" of jsonpath input +LINE 1: select '$ ? (@ tsmatch $var tsconfig "english")'::jsonpath; + ^ +-- if a tsconfig doesn't exist it should parse nonetheless (executor will fail it) +select '$ ? (@ tsmatch "running" tsconfig "wrongconfig")'::jsonpath; + jsonpath +------------------------------------------------ + $?(@ tsmatch "running" tsconfig "wrongconfig") +(1 row) + diff --git a/src/test/regress/sql/jsonb_jsonpath.sql b/src/test/regress/sql/jsonb_jsonpath.sql index d3a38c57791..8698021ba2b 100644 --- a/src/test/regress/sql/jsonb_jsonpath.sql +++ b/src/test/regress/sql/jsonb_jsonpath.sql @@ -1253,3 +1253,16 @@ SELECT jsonb_path_query_first(s1.j, '$.s > $s', vars => s2.j) gt FROM str s1, str s2 ORDER BY s1.num, s2.num; + +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "fly" tsconfig "english")'); +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "english")'); +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "simple")'); +select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "english")'); +select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "simple")'); +-- in the default tsqparser (to_tsquery) spaces are not allowed, so this should fail for syntax +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english")'); +-- if we specify "w" however it should be ok +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english" tsqparser "w")'); +-- it should also be ok if we change to a valid to_tsquery +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast & car" tsconfig "english")'); +select jsonb_path_query('["fat cat", "cat fat", "fat rats"]', 'lax $[*] ? (@ tsmatch "fat & rat" tsconfig "english")'); diff --git a/src/test/regress/sql/jsonpath.sql b/src/test/regress/sql/jsonpath.sql index 44178d8b45a..dc428ffe59b 100644 --- a/src/test/regress/sql/jsonpath.sql +++ b/src/test/regress/sql/jsonpath.sql @@ -306,3 +306,22 @@ FROM unnest(ARRAY['$ ? (@ like_regex "pattern" flag "smixq")'::text, '00', '1a']) str, LATERAL pg_input_error_info(str, 'jsonpath') as errinfo; + +-- tsmatch (Full Text Search) + +-- basic success +select '$ ? (@ tsmatch "simple")'::jsonpath; +select '$ ? (@ tsmatch "running" tsconfig "english")'::jsonpath; +-- w/out tsconfig and tsqparser +select '$ ? (@ tsmatch "fast & furious" tsconfig "simple")'::jsonpath; +select '$ ? (@ tsmatch "fast & furious" tsconfig "simple" tsqparser "w")'::jsonpath; +-- tsconfig must be specified first and then tsqparser +select '$ ? (@ tsmatch "fast & furious" tsqparser "w" tsconfig "simple" )'::jsonpath; +select '$ ? (@ tsmatch "fast & furious" tsqparser "w")'::jsonpath; +select '$[*] ? (@.title tsmatch "god" && @.rating > 5)'::jsonpath; +select '$ ? (@ tsmatch $pattern)'::jsonpath; + +-- only string literals (no variables) are allowed for tsquery +select '$ ? (@ tsmatch $var tsconfig "english")'::jsonpath; +-- if a tsconfig doesn't exist it should parse nonetheless (executor will fail it) +select '$ ? (@ tsmatch "running" tsconfig "wrongconfig")'::jsonpath; -- 2.53.0 ^ permalink raw reply [nested|flat] 5+ messages in thread
* Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search 2026-02-27 05:59 Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search Florents Tselai <[email protected]> 2026-03-02 03:44 ` Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search Chao Li <[email protected]> 2026-04-03 09:56 ` Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search Florents Tselai <[email protected]> @ 2026-04-04 09:38 ` Florents Tselai <[email protected]> 2026-04-05 09:15 ` Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search Florents Tselai <[email protected]> 0 siblings, 1 reply; 5+ messages in thread From: Florents Tselai @ 2026-04-04 09:38 UTC (permalink / raw) To: Chao Li <[email protected]>; +Cc: pgsql-hackers <[email protected]> Here's a v5 which is v4 + pgindent On Fri, Apr 3, 2026 at 12:56 PM Florents Tselai <[email protected]> wrote: > > > > On Mon, Mar 2, 2026 at 5:44 AM Chao Li <[email protected]> wrote: > >> >> >> > On Feb 27, 2026, at 13:59, Florents Tselai <[email protected]> >> wrote: >> > >> > >> > >> > On Thu, Feb 26, 2026 at 8:48 AM Chao Li <[email protected]> wrote: >> > >> > >> > > On Feb 1, 2026, at 19:02, Florents Tselai <[email protected]> >> wrote: >> > > >> > > >> > > >> > > >> > > On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai < >> [email protected]> wrote: >> > > Hi, >> > > >> > > in real-life I work a lot with json & fts search, here's a feature >> I've always wished I had, >> > > but never tackle it. Until yesterday that is. >> > > >> > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && >> @.body tsmatch "performance")'); >> > > >> > > This patch introduces a tsmatch boolean operator to the JSONPath >> engine. >> > > By integrating FTS natively into path expressions, >> > > this operator allows for high-precision filtering of nested JSONB >> structures— >> > > solving issues with structural ambiguity and query complexity. >> > > >> > > Currently, users must choose between two suboptimal paths for FTS-ing >> nested JSON: >> > > - Imprecise Global Indexing >> > > jsonb_to_tsvector aggregates text into a flat vector. >> > > This ignores JSON boundaries, leading to false positives when the >> same key (e.g., "body") >> > > appears in different contexts (e.g., a "Product Description" vs. a >> "Customer Review"). >> > > >> > > - Complex SQL Workarounds >> > > Achieving 100% precision requires unnesting the document via >> jsonb_array_elements and LATERAL joins. >> > > This leads to verbose SQL and high memory overhead from generating >> intermediate heap tuples. >> > > >> > > One of the most significant advantages of tsmatch is its ability to >> participate in multi-condition predicates >> > > within the same JSON object - something jsonb_to_tsvector cannot do. >> > > >> > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && >> @.body tsmatch "performance")'); >> > > >> > > In a flat vector, the association between "Alice" and "performance" >> is lost. >> > > tsmatch preserves this link by evaluating the FTS predicate in-place >> during path traversal. >> > > >> > > While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly >> define an FTS operator, >> > > tsmatch is architecturally modeled after the standard-defined >> like_regex. >> > > >> > > The implementation follows the like_regex precedent: >> > > it is a non-indexable predicate that relies on GIN path-matching for >> pruning and heap re-checks for precision. >> > > Caching is scoped to the JsonPathExecContext, >> > > ensuring 'compile-once' efficiency per execution without violating >> the stability requirements of prepared statements. >> > > >> > > This initial implementation uses plainto_tsquery. >> > > However, the grammar is designed to support a "mode" flag (similar to >> like_regex flags) >> > > in future iterations to toggle between to_tsquery, >> websearch_to_tsquery, and phraseto_tsquery. >> > > >> > > Here's a v2, that implements the tsqparser clause >> > > >> > > So this should now work too >> > > >> > > select jsonb_path_query_array('["fast car", "slow car", "fast and >> furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w") >> <v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch> >> > >> > Hi Florents, >> > >> > Grant pinged me about this. I can review it in coming days. Can you >> please rebase it? I failed to apply to current master. Also, the CF >> reported a failure test case, please take a look. >> > >> > Hi Evan, >> > thanks for having a look. The conflict was due to the intro of >> pg_fallthrough. Not related to this patch . >> > >> > I noticed the failure too, but I'm having a hard time reproducing it >> tbh. >> > This fails for Debian Trixie with Meson. The same with Autoconf >> passes... >> > >> > https://github.com/Florents-Tselai/postgres/runs/65098077968 >> > >> > >> > >> > >> > <v3-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch> >> >> I have reviewed v3 and traced a few test cases. Here comes my review >> comments: >> >> 1 >> ``` >> + <replaceable>string</replaceable> <literal>tsmatch</literal> >> <replaceable>string</replaceable> >> + <optional> <literal>tsconfig</literal> >> <replaceable>string</replaceable> </optional> >> + <optional> <literal>tsqparser</literal> >> <replaceable>string</replaceable> </optional> >> ``` >> >> For all “replaceable”, instead of “string”, would it be better to use >> something more descriptive? For example: >> ``` >> <replaceable>json_string</replaceable> <literal>tsmatch</literal> >> <replaceable>query</replaceable> >> <optional> <literal>tsconfig</literal> >> <replaceable>config_name</replaceable> </optional> >> <optional> <literal>tsqparser</literal> >> <replaceable>parser_mode</replaceable> </optional> >> ``` >> >> 2 - jsonpath_gram.y >> ``` >> +static bool makeItemTsMatch(JsonPathParseItem *doc, >> + JsonPathString >> *tsquery, >> + JsonPathString >> *tsconfig, >> + JsonPathString >> *tsquery_parser, >> + >> JsonPathParseItem ** result, >> + struct Node >> *escontext); >> ``` >> >> Format Nit: Looking at the existing code, the J in the second and >> following lines, should be placed in the same column as the J in the first >> line. >> >> 3 - jsonpath_gram.y >> ``` >> + | expr TSMATCH_P STRING_P >> + { >> + JsonPathParseItem *jppitem; >> + /* Pass NULL for tsconfig (3rd) and NULL for >> tsquery_parser (4th) */ >> + if (! makeItemTsMatch($1, &$3, NULL, NULL, &jppitem, >> escontext)) >> + YYABORT; >> + $$ = jppitem; >> + } >> + | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P >> + { >> + JsonPathParseItem *jppitem; >> + /* Pass NULL for tsquery_parser (4th) */ >> + if (! makeItemTsMatch($1, &$3, &$5, NULL, &jppitem, >> escontext)) >> + YYABORT; >> + $$ = jppitem; >> + } >> + | expr TSMATCH_P STRING_P TSQUERYPARSER_P STRING_P >> + { >> + JsonPathParseItem *jppitem; >> + /* Pass NULL for tsconfig (3rd) */ >> + if (! makeItemTsMatch($1, &$3, NULL, &$5, &jppitem, >> escontext)) >> + YYABORT; >> + $$ = jppitem; >> + } >> + | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P TSQUERYPARSER_P >> STRING_P >> + { >> + JsonPathParseItem *jppitem; >> + if (! makeItemTsMatch($1, &$3, &$5, &$7, &jppitem, >> escontext)) >> + YYABORT; >> + $$ = jppitem; >> + } >> ``` >> >> Feels a little redundant, repeatedly calls makeItemTsMatch. See the >> attached diff for a simplification. But my version is a bit longer in terms >> of number of lines. So, up to you. >> >> 4 - jsonpath_gram.y >> ``` >> +static bool >> +makeItemTsMatch(JsonPathParseItem *doc, >> + JsonPathString *tsquery, >> + JsonPathString *tsconfig, >> + JsonPathString *tsquery_parser, >> + JsonPathParseItem **result, >> + struct Node *escontext) >> ``` >> >> makeItemTsMatch doesn’t need to return a bool. Actually, now it never >> returns false, instead, it just ereport(ERROR). >> >> 5 - jsonpath.h >> ``` >> + struct >> + { >> + int32 doc; >> + char *tsquery; >> + uint32 tsquerylen; >> + int32 tsconfig; >> + char *tsqparser; >> + uint32 tsqparser_len; >> + } tsmatch; >> >> + struct >> + { >> + JsonPathParseItem *doc; >> + char *tsquery; >> + uint32 tsquerylen; >> + JsonPathParseItem *tsconfig; >> + char *tsqparser; >> + uint32 tsqparser_len; >> + } tsmatch; >> } value; >> ``` >> >> tsquerylen doesn’t have _ before len, and tsqparser_len, would it be >> better to make naming conventions consistent in the same structure? >> >> 6 - jsonpath_exec.c >> ``` >> #include "tsearch/ts_utils.h" >> #include "tsearch/ts_cache.h" >> #include "utils/regproc.h" >> #include "catalog/namespace.h" >> >> static JsonPathBool >> executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg, >> void *param) >> ``` >> >> Why don’t put these includes to the header section together with other >> includes? >> >> 7 - jsonpath_exec.c >> ``` >> + else >> + { >> + /* >> + * Fallback or Error for unknown flags >> (should be caught by >> + * parser) >> + */ >> + ereport(ERROR, >> + >> (errcode(ERRCODE_SYNTAX_ERROR), >> + errmsg("unrecognized >> tsqparser flag"))); >> + } >> ``` >> >> This “else” should never be entered as the same check has been done by >> makeItemTsMatch. So, maybe just use an Assert here, or pg_unreachable(). >> >> 8 - jsonpath_exec.c >> ``` >> + /* Setup Context (Run ONLY once per predicate) */ >> + if (!cxt->initialized) >> ``` >> >> While tracing this SQL: >> ``` >> evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb >> evantest-# @@ '$.tags[*] ? (@ tsmatch "run" tsconfig "english")'; >> ?column? >> ---------- >> >> (1 row) >> ``` >> >> I noticed that, when process “jogging”, cxt->initialized is still false, >> meaning that, the cxt is not reused across array items. Given the same >> tsconfig should apply to all array items, I think cxt should be reused. >> >> 9 - jsonpath_exec.c >> ``` >> + /* Select Parser and Compile Query */ >> + parser_mode = jsp->content.tsmatch.tsqparser; >> + parser_len = jsp->content.tsmatch.tsqparser_len; >> + >> + if (parser_len > 0) >> + { >> + /* Dispatch based on flag */ >> + if (pg_strncasecmp(parser_mode, "pl", parser_len) >> == 0) >> ``` >> >> Nit: parser_mode is only used inside if (parser_len > 0), it can be >> defined inside the “if”. >> >> 10 - jsonpath_gram.y >> ``` >> + ereport(ERROR, >> + (errcode(ERRCODE_SYNTAX_ERROR), >> + errmsg("invalid tsquery_parser >> value: \"%s\"", tsquery_parser->val), >> + errhint("Valid values are >> \"pl\", \"ph\", and \"w\"."))); >> ``` >> >> When tested a case with an invalid parser, I got: >> ``` >> evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb >> >> @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqparser >> "pss")'; >> ERROR: invalid tsquery_parser value: "pss @" >> LINE 2: @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqpar... >> ^ >> HINT: Valid values are "pl", "ph", and "w". >> ``` >> >> You can see the it shows a bad looking invalid value. I think that’s >> because tsquery_parser->val is not NULL terminated. I fixed this problem >> with: >> ``` >> errmsg("invalid tsquery_parser value: \"%.*s\"", (int) >> tsquery_parser->len, tsquery_parser->val), >> ``` >> >> This change is also included in the attached diff file. >> >> 11 - jsonpath.c >> ``` >> + if (printBracketes) >> + appendStringInfoChar(buf, ')'); >> + break; >> + >> if (printBracketes) >> appendStringInfoChar(buf, ')'); >> ``` >> >> Duplicate code. Looks like a copy-pasto. >> >> 12 - jsonpath.c >> ``` >> + /* Write the Main Query String */ >> + appendBinaryStringInfo(buf, >> + >> &item->value.tsmatch.tsquerylen, >> + >> sizeof(item->value.tsmatch.tsquerylen)); >> + appendBinaryStringInfo(buf, >> + >> item->value.tsmatch.tsquery, >> + >> item->value.tsmatch.tsquerylen); >> + appendStringInfoChar(buf, '\0'); >> ``` >> >> I don’t think we need to manually append ‘\0’ after >> appendBinaryStringInfo. Looking at the header comment of >> appendBinaryStringInfo, it says that a trailing null will be added. >> ``` >> /* >> * appendBinaryStringInfo >> * >> * Append arbitrary binary data to a StringInfo, allocating more space >> * if necessary. Ensures that a trailing null byte is present. >> */ >> void >> appendBinaryStringInfo(StringInfo str, const void *data, int datalen) >> ``` >> > > Here's a v4 which incorporates most of Evan's comments & feedback > - shifted tsquery compilation logic to use persistent cache within > JsonPathExecContext > - Fixed a binary serialization alignment issue which caused Dixie to fail > earlier > - I've refactor and simplified the grammar per Evan's input by adding a > tsmatch_opts rule. > - Also updated the docs per Evan's comments > > > Attachments: [application/octet-stream] v5-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch (33.0K, 3-v5-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch) download | inline diff: From 4b90199612d058753bd7dcd39fac9716f1a4bc44 Mon Sep 17 00:00:00 2001 From: Florents Tselai <[email protected]> Date: Sat, 4 Apr 2026 12:04:25 +0300 Subject: [PATCH v5] Add tsmatch JSONPath operator for granular Full Text Search MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This patch introduces the tsmatch boolean operator to the JSONPath engine. By integrating FTS natively into path expressions, this operator allows for high-precision filtering of nested JSONB structures—solving issues with structural ambiguity and query complexity. Currently, users must choose between two suboptimal paths for searching nested JSON: 1. Imprecise Global Indexing jsonb_to_tsvector aggregates text into a flat vector. This ignores JSON boundaries, leading to false positives when the same key (e.g., "body") appears in different contexts (e.g., a "Product Description" vs. a "Customer Review"). 2. Complex SQL Workarounds Achieving 100% precision requires "exploding" the document via jsonb_array_elements and LATERAL joins. This leads to verbose SQL and high memory overhead from generating intermediate heap tuples. One of the most significant advantages of tsmatch is its ability to participate in multi-condition predicates within the same JSON object— something jsonb_to_tsvector cannot do. SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")'); In a flat vector, the association between "Alice" and "performance" is lost. tsmatch preserves this link by evaluating the FTS predicate in-place during path traversal. While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an FTS operator, tsmatch is architecturally modeled after the standard-defined like_regex. The operator supports optional configuration for both the dictionary and the query parser: @ tsmatch "query" [ tsconfig "regconfig" ] [ tsqparser "mode" ] Supported parser modes are: - "pl": plainto_tsquery (no operators required) - "ph": phraseto_tsquery - "w": websearch_to_tsquery - Omitted: Defaults to to_tsquery (strict mode) The implementation relies on GIN path-matching for index pruning and heap re-checks for precision. Caching is scoped to the JsonPathExecContext, ensuring 'compile-once' efficiency for the tsquery and OID lookup per execution, respecting the stability requirements of prepared statements. --- doc/src/sgml/func/func-json.sgml | 34 +++++ src/backend/utils/adt/jsonpath.c | 131 +++++++++++++++- src/backend/utils/adt/jsonpath_exec.c | 153 ++++++++++++++++++- src/backend/utils/adt/jsonpath_gram.y | 115 +++++++++++++- src/backend/utils/adt/jsonpath_scan.l | 3 + src/include/utils/jsonpath.h | 19 +++ src/test/regress/expected/jsonb_jsonpath.out | 58 +++++++ src/test/regress/expected/jsonpath.out | 60 ++++++++ src/test/regress/sql/jsonb_jsonpath.sql | 13 ++ src/test/regress/sql/jsonpath.sql | 19 +++ 10 files changed, 601 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/func/func-json.sgml b/doc/src/sgml/func/func-json.sgml index 4cd338fe6e3..edf67fc694e 100644 --- a/doc/src/sgml/func/func-json.sgml +++ b/doc/src/sgml/func/func-json.sgml @@ -3247,6 +3247,40 @@ ERROR: jsonpath member accessor can only be applied to an object <returnvalue>[]</returnvalue> </para></entry> </row> + <row> + <entry role="func_table_entry"><para role="func_signature"> + <replaceable>json_string</replaceable> <literal>tsmatch</literal> <replaceable>query</replaceable> + <optional> <literal>tsconfig</literal> <replaceable>config_name</replaceable> </optional> + <optional> <literal>tsqparser</literal> <replaceable>parser_mode</replaceable> </optional> + <returnvalue>boolean</returnvalue> + </para> + <para> + Tests whether the JSON string (first operand) matches the full-text search query + (second operand). The matching behavior can be customized using the optional + <literal>tsconfig</literal> and <literal>tsqparser</literal> clauses. + If <literal>tsconfig</literal> is omitted, the current session's default text search configuration + is used (see <xref linkend="guc-default-text-search-config"/>). + The <literal>tsqparser</literal> clause determines how the query string is parsed + (see <xref linkend="textsearch-parsing-queries"/>). + Valid options are <literal>"pl"</literal> (<function>plainto_tsquery</function>), + <literal>"ph"</literal> (<function>phraseto_tsquery</function>), and + <literal>"w"</literal> (<function>websearch_to_tsquery</function>). + If <literal>tsqparser</literal> is omitted, the query is parsed using <function>to_tsquery</function>. + </para> + <para> + <literal>jsonb_path_query_array('["running", "runs", "ran", "jogging"]', '$[*] ? (@ tsmatch "run" tsconfig "english")')</literal> + <returnvalue>["running", "runs"]</returnvalue> + </para> + <para> + <literal>jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast & car")')</literal> + <returnvalue>["fast car"]</returnvalue> + </para> + <para> + <literal>jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w")')</literal> + <returnvalue>["fast car"]</returnvalue> + </para> + </entry> + </row> </tbody> </tgroup> </table> diff --git a/src/backend/utils/adt/jsonpath.c b/src/backend/utils/adt/jsonpath.c index 7bfc18c9888..5abb6f433f7 100644 --- a/src/backend/utils/adt/jsonpath.c +++ b/src/backend/utils/adt/jsonpath.c @@ -351,6 +351,55 @@ flattenJsonPathParseItem(StringInfo buf, int *result, struct Node *escontext, *(int32 *) (buf->data + offs) = chld - pos; } break; + case jpiTsMatch: + { + int32 expr_off; + int32 tsconfig_off; + uint32 tsqparser_len_val = item->value.tsmatch.tsqparser ? item->value.tsmatch.tsqparser_len : 0; + + expr_off = reserveSpaceForItemPointer(buf); + tsconfig_off = reserveSpaceForItemPointer(buf); + + /* + * Write all integers FIRST so they are naturally 4-byte + * aligned + */ + appendBinaryStringInfo(buf, &item->value.tsmatch.tsquery_len, sizeof(uint32)); + appendBinaryStringInfo(buf, &tsqparser_len_val, sizeof(uint32)); + + /* Now append the strings at the end */ + appendBinaryStringInfo(buf, item->value.tsmatch.tsquery, item->value.tsmatch.tsquery_len); + appendStringInfoChar(buf, '\0'); + + if (item->value.tsmatch.tsqparser) + { + appendBinaryStringInfo(buf, item->value.tsmatch.tsqparser, tsqparser_len_val); + appendStringInfoChar(buf, '\0'); + } + + if (!flattenJsonPathParseItem(buf, &chld, escontext, + item->value.tsmatch.doc, + nestingLevel, + insideArraySubscript)) + return false; + *(int32 *) (buf->data + expr_off) = chld - pos; + + /* TSConfig (Optional) */ + if (item->value.tsmatch.tsconfig) + { + if (!flattenJsonPathParseItem(buf, &chld, escontext, + item->value.tsmatch.tsconfig, + nestingLevel, + insideArraySubscript)) + return false; + *(int32 *) (buf->data + tsconfig_off) = chld - pos; + } + else + { + *(int32 *) (buf->data + tsconfig_off) = 0; + } + } + break; case jpiFilter: argNestingLevel++; pg_fallthrough; @@ -767,6 +816,58 @@ printJsonPathItem(StringInfo buf, JsonPathItem *v, bool inKey, appendStringInfoChar(buf, '"'); } + if (printBracketes) + appendStringInfoChar(buf, ')'); + break; + case jpiTsMatch: + if (printBracketes) + appendStringInfoChar(buf, '('); + + jspInitByBuffer(&elem, v->base, v->content.tsmatch.doc); + printJsonPathItem(buf, &elem, false, + operationPriority(elem.type) <= + operationPriority(v->type)); + + appendStringInfoString(buf, " tsmatch "); + + escape_json_with_len(buf, + v->content.tsmatch.tsquery, + v->content.tsmatch.tsquery_len); + + /* Print TSConfig if present */ + if (v->content.tsmatch.tsconfig) + { + JsonPathItem config_item; + int32 config_len; + char *config_str; + + appendStringInfoString(buf, " tsconfig "); + jspInitByBuffer(&config_item, v->base, v->content.tsmatch.tsconfig); + config_str = jspGetString(&config_item, &config_len); + appendStringInfoChar(buf, '"'); + appendBinaryStringInfo(buf, config_str, config_len); + appendStringInfoChar(buf, '"'); + } + + if (v->content.tsmatch.tsqparser_len > 0) + { + appendStringInfoString(buf, " tsqparser "); + appendStringInfoChar(buf, '"'); + + /* + * Use simple binary append since flags like "pl" don't need + * JSON escaping + */ + appendBinaryStringInfo(buf, + v->content.tsmatch.tsqparser, + v->content.tsmatch.tsqparser_len); + appendStringInfoChar(buf, '"'); + } + + if (printBracketes) + appendStringInfoChar(buf, ')'); + break; + if (printBracketes) appendStringInfoChar(buf, ')'); break; @@ -976,6 +1077,8 @@ jspOperationName(JsonPathItemType type) return "timestamp"; case jpiTimestampTz: return "timestamp_tz"; + case jpiTsMatch: + return "tsmatch"; case jpiStrReplace: return "replace"; case jpiStrLower: @@ -1158,6 +1261,27 @@ jspInitByBuffer(JsonPathItem *v, char *base, int32 pos) read_int32(v->content.like_regex.patternlen, base, pos); v->content.like_regex.pattern = base + pos; break; + case jpiTsMatch: + /* FIX: Read all integers first */ + read_int32(v->content.tsmatch.doc, base, pos); + read_int32(v->content.tsmatch.tsconfig, base, pos); + read_int32(v->content.tsmatch.tsquery_len, base, pos); + read_int32(v->content.tsmatch.tsqparser_len, base, pos); + + /* Set pointers to the strings in the buffer */ + v->content.tsmatch.tsquery = base + pos; + pos += v->content.tsmatch.tsquery_len + 1; + + if (v->content.tsmatch.tsqparser_len > 0) + { + v->content.tsmatch.tsqparser = base + pos; + pos += v->content.tsmatch.tsqparser_len + 1; + } + else + { + v->content.tsmatch.tsqparser = NULL; + } + break; default: elog(ERROR, "unrecognized jsonpath item type: %d", v->type); } @@ -1231,6 +1355,7 @@ jspGetNext(JsonPathItem *v, JsonPathItem *a) v->type == jpiLast || v->type == jpiStartsWith || v->type == jpiLikeRegex || + v->type == jpiTsMatch || v->type == jpiBigint || v->type == jpiBoolean || v->type == jpiDate || @@ -1575,7 +1700,11 @@ jspIsMutableWalker(JsonPathItem *jpi, struct JsonPathMutableContext *cxt) jspInitByBuffer(&arg, jpi->base, jpi->content.like_regex.expr); jspIsMutableWalker(&arg, cxt); break; - + case jpiTsMatch: + Assert(status == jpdsNonDateTime); + jspInitByBuffer(&arg, jpi->base, jpi->content.tsmatch.doc); + jspIsMutableWalker(&arg, cxt); + break; /* literals */ case jpiNull: case jpiString: diff --git a/src/backend/utils/adt/jsonpath_exec.c b/src/backend/utils/adt/jsonpath_exec.c index 770840a0611..b95ee630f2a 100644 --- a/src/backend/utils/adt/jsonpath_exec.c +++ b/src/backend/utils/adt/jsonpath_exec.c @@ -65,6 +65,7 @@ #include "miscadmin.h" #include "nodes/miscnodes.h" #include "nodes/nodeFuncs.h" +#include "nodes/pg_list.h" #include "regex/regex.h" #include "utils/builtins.h" #include "utils/date.h" @@ -114,6 +115,7 @@ typedef struct JsonPathExecContext bool throwErrors; /* with "false" all suppressible errors are * suppressed */ bool useTz; + List *tsmatch_cache; /* Persists compiled FTS queries */ } JsonPathExecContext; /* Context for LIKE_REGEX execution. */ @@ -123,6 +125,20 @@ typedef struct JsonLikeRegexContext int cflags; } JsonLikeRegexContext; +typedef struct JsonTsMatchContext +{ + Datum queryDatum; /* Cache the compiled binary TSQuery */ + Oid tsconfigId; /* Cache the dictionary OID */ + bool initialized; /* Flag to run setup only once */ +} JsonTsMatchContext; + +typedef struct JsonTsMatchCacheEntry +{ + void *jsp_ptr; /* Cache key: pointer to the AST node */ + Datum queryDatum; /* Cached compiled query */ + Oid tsconfigId; /* Cached dictionary OID */ +} JsonTsMatchCacheEntry; + /* Result of jsonpath predicate evaluation */ typedef enum JsonPathBool { @@ -335,6 +351,7 @@ static JsonPathExecResult executeKeyValueMethod(JsonPathExecContext *cxt, JsonPathItem *jsp, JsonbValue *jb, JsonValueList *found); static JsonPathExecResult appendBoolResult(JsonPathExecContext *cxt, JsonPathItem *jsp, JsonValueList *found, JsonPathBool res); +static JsonPathBool executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg, void *param); static void getJsonPathItem(JsonPathExecContext *cxt, JsonPathItem *item, JsonbValue *value); static JsonbValue *GetJsonPathVar(void *cxt, char *varName, int varNameLen, @@ -740,6 +757,7 @@ executeJsonPath(JsonPath *path, void *vars, JsonPathGetVarCallback getVar, cxt.innermostArraySize = -1; cxt.throwErrors = throwErrors; cxt.useTz = useTz; + cxt.tsmatch_cache = NIL; if (jspStrictAbsenceOfErrors(&cxt) && !result) { @@ -840,6 +858,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp, case jpiExists: case jpiStartsWith: case jpiLikeRegex: + case jpiTsMatch: { JsonPathBool st = executeBoolItem(cxt, jsp, jb, true); @@ -1914,6 +1933,18 @@ executeBoolItem(JsonPathExecContext *cxt, JsonPathItem *jsp, return executePredicate(cxt, jsp, &larg, NULL, jb, false, executeLikeRegex, &lrcxt); } + case jpiTsMatch: + { + jspInitByBuffer(&larg, jsp->base, + jsp->content.tsmatch.doc); + + /* + * Pass 'cxt' as the param so executeTsMatch can access the + * cache list + */ + return executePredicate(cxt, jsp, &larg, NULL, jb, false, + executeTsMatch, cxt); + } case jpiExists: jspGetArg(jsp, &larg); @@ -1952,7 +1983,6 @@ executeBoolItem(JsonPathExecContext *cxt, JsonPathItem *jsp, return res == jperOk ? jpbTrue : jpbFalse; } - default: elog(ERROR, "invalid boolean jsonpath item type: %d", jsp->type); return jpbUnknown; @@ -3185,6 +3215,127 @@ executeKeyValueMethod(JsonPathExecContext *cxt, JsonPathItem *jsp, return res; } +#include "tsearch/ts_utils.h" +#include "tsearch/ts_cache.h" +#include "utils/regproc.h" +#include "catalog/namespace.h" + +static JsonPathBool +executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg, + void *param) +{ + JsonPathExecContext *cxt = (JsonPathExecContext *) param; + JsonTsMatchCacheEntry *cache = NULL; + ListCell *lc; + text *doc_text; + Datum tsvector_datum; + bool match; + + if (!(str = getScalar(str, jbvString))) + return jpbUnknown; + + /* Check if we already compiled the query for this specific AST node */ + foreach(lc, cxt->tsmatch_cache) + { + JsonTsMatchCacheEntry *entry = (JsonTsMatchCacheEntry *) lfirst(lc); + + if (entry->jsp_ptr == (void *) jsp) + { + cache = entry; + break; + } + } + + /* If not found, parse, compile, and cache it */ + if (cache == NULL) + { + text *query_text; + char *parser_mode; + uint32 parser_len; + + /* Allocate new cache entry in the execution memory context */ + cache = palloc0(sizeof(JsonTsMatchCacheEntry)); + cache->jsp_ptr = (void *) jsp; + + if (jsp->content.tsmatch.tsconfig != 0) + { + JsonPathItem config_item; + int32 config_len; + char *config_str; + char *safe_config_str; + + jspInitByBuffer(&config_item, jsp->base, jsp->content.tsmatch.tsconfig); + config_str = jspGetString(&config_item, &config_len); + + /* Null-terminate for safe catalog lookup */ + safe_config_str = pnstrdup(config_str, config_len); + cache->tsconfigId = get_ts_config_oid(stringToQualifiedNameList(safe_config_str, NULL), true); + pfree(safe_config_str); + } + else + { + cache->tsconfigId = getTSCurrentConfig(true); + } + + /* Prepare Query Text */ + query_text = cstring_to_text_with_len(jsp->content.tsmatch.tsquery, + jsp->content.tsmatch.tsquery_len); + + /* Select Parser and Compile Query */ + parser_mode = jsp->content.tsmatch.tsqparser; + parser_len = jsp->content.tsmatch.tsqparser_len; + + if (parser_len > 0) + { + if (pg_strncasecmp(parser_mode, "pl", parser_len) == 0) + { + cache->queryDatum = DirectFunctionCall2(plainto_tsquery_byid, + ObjectIdGetDatum(cache->tsconfigId), + PointerGetDatum(query_text)); + } + else if (pg_strncasecmp(parser_mode, "ph", parser_len) == 0) + { + cache->queryDatum = DirectFunctionCall2(phraseto_tsquery_byid, + ObjectIdGetDatum(cache->tsconfigId), + PointerGetDatum(query_text)); + } + else if (pg_strncasecmp(parser_mode, "w", parser_len) == 0) + { + cache->queryDatum = DirectFunctionCall2(websearch_to_tsquery_byid, + ObjectIdGetDatum(cache->tsconfigId), + PointerGetDatum(query_text)); + } + else + { + pg_unreachable(); + } + } + else + { + cache->queryDatum = DirectFunctionCall2(to_tsquery_byid, + ObjectIdGetDatum(cache->tsconfigId), + PointerGetDatum(query_text)); + } + + /* Append the newly compiled cache entry to the context's list */ + cxt->tsmatch_cache = lappend(cxt->tsmatch_cache, cache); + } + + /* Runtime Execution (Using the cached query) */ + doc_text = cstring_to_text_with_len(str->val.string.val, + str->val.string.len); + + tsvector_datum = DirectFunctionCall2(to_tsvector_byid, + ObjectIdGetDatum(cache->tsconfigId), + PointerGetDatum(doc_text)); + + match = DatumGetBool(DirectFunctionCall2(ts_match_vq, + tsvector_datum, + cache->queryDatum)); + + return match ? jpbTrue : jpbFalse; +} + /* * Convert boolean execution status 'res' to a boolean JSON item and execute diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y index f826697d098..00ed72b1d84 100644 --- a/src/backend/utils/adt/jsonpath_gram.y +++ b/src/backend/utils/adt/jsonpath_gram.y @@ -43,7 +43,12 @@ static bool makeItemLikeRegex(JsonPathParseItem *expr, JsonPathString *flags, JsonPathParseItem ** result, struct Node *escontext); - +static bool makeItemTsMatch(JsonPathParseItem *doc, + JsonPathString *tsquery, + JsonPathString *tsconfig, + JsonPathString *tsquery_parser, + JsonPathParseItem ** result, + struct Node *escontext); /* * Bison doesn't allocate anything that needs to live across parser calls, * so we can easily have it use palloc instead of malloc. This prevents @@ -73,6 +78,13 @@ static bool makeItemLikeRegex(JsonPathParseItem *expr, JsonPathParseItem *value; JsonPathParseResult *result; JsonPathItemType optype; + struct + { + bool has_tsconfig; + JsonPathString tsconfig; + bool has_tsqparser; + JsonPathString tsqparser; + } tsmatch_opts; bool boolean; int integer; } @@ -81,7 +93,7 @@ static bool makeItemLikeRegex(JsonPathParseItem *expr, %token <str> IDENT_P STRING_P NUMERIC_P INT_P VARIABLE_P %token <str> OR_P AND_P NOT_P %token <str> LESS_P LESSEQUAL_P EQUAL_P NOTEQUAL_P GREATEREQUAL_P GREATER_P -%token <str> ANY_P STRICT_P LAX_P LAST_P STARTS_P WITH_P LIKE_REGEX_P FLAG_P +%token <str> ANY_P STRICT_P LAX_P LAST_P STARTS_P WITH_P LIKE_REGEX_P FLAG_P TSMATCH_P TSCONFIG_P TSQUERYPARSER_P %token <str> ABS_P SIZE_P TYPE_P FLOOR_P DOUBLE_P CEILING_P KEYVALUE_P %token <str> DATETIME_P %token <str> BIGINT_P BOOLEAN_P DATE_P DECIMAL_P INTEGER_P NUMBER_P @@ -109,6 +121,8 @@ static bool makeItemLikeRegex(JsonPathParseItem *expr, %type <integer> any_level +%type <tsmatch_opts> tsmatch_opts + %left OR_P %left AND_P %right NOT_P @@ -189,7 +203,44 @@ predicate: YYABORT; $$ = jppitem; } + | expr TSMATCH_P STRING_P tsmatch_opts + { + JsonPathParseItem *jppitem; + + makeItemTsMatch($1, &$3, + $4.has_tsconfig ? &$4.tsconfig : NULL, + $4.has_tsqparser ? &$4.tsqparser : NULL, + &jppitem, escontext); + + $$ = jppitem; + } ; + tsmatch_opts: + /* EMPTY */ + { + $$.has_tsconfig = false; + $$.has_tsqparser = false; + } + | TSCONFIG_P STRING_P + { + $$.has_tsconfig = true; + $$.tsconfig = $2; + $$.has_tsqparser = false; + } + | TSQUERYPARSER_P STRING_P + { + $$.has_tsconfig = false; + $$.has_tsqparser = true; + $$.tsqparser = $2; + } + | TSCONFIG_P STRING_P TSQUERYPARSER_P STRING_P + { + $$.has_tsconfig = true; + $$.tsconfig = $2; + $$.has_tsqparser = true; + $$.tsqparser = $4; + } + ; starts_with_initial: STRING_P { $$ = makeItemString(&$1); } @@ -377,6 +428,9 @@ key_name: | TIME_TZ_P | TIMESTAMP_P | TIMESTAMP_TZ_P + | TSCONFIG_P + | TSMATCH_P + | TSQUERYPARSER_P | STR_LOWER_P | STR_UPPER_P | STR_INITCAP_P @@ -715,3 +769,60 @@ jspConvertRegexFlags(uint32 xflags, int *result, struct Node *escontext) return true; } + +static bool +makeItemTsMatch(JsonPathParseItem *doc, + JsonPathString *tsquery, + JsonPathString *tsconfig, + JsonPathString *tsquery_parser, + JsonPathParseItem **result, + struct Node *escontext) +{ + JsonPathParseItem *v = makeItemType(jpiTsMatch); + + v->value.tsmatch.doc = doc; + + v->value.tsmatch.tsquery = tsquery->val; + v->value.tsmatch.tsquery_len = tsquery->len; + + /* Handle the Configuration (Stored as a Node) */ + if (tsconfig) + { + JsonPathParseItem *conf = makeItemType(jpiString); + conf->value.string.val = tsconfig->val; + conf->value.string.len = tsconfig->len; + v->value.tsmatch.tsconfig = conf; + } + else + { + v->value.tsmatch.tsconfig = NULL; + } + + /* Handle the TSQuery Parser Flag */ + if (tsquery_parser) + { + /* Check for "pl", "ph", "w" here to throw a syntax error immediately */ + if (pg_strncasecmp(tsquery_parser->val, "pl", tsquery_parser->len) != 0 && + pg_strncasecmp(tsquery_parser->val, "ph", tsquery_parser->len) != 0 && + pg_strncasecmp(tsquery_parser->val, "w", tsquery_parser->len) != 0) + { + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("invalid tsquery_parser value: \"%.*s\"", + (int) tsquery_parser->len, tsquery_parser->val), + errhint("Valid values are \"pl\", \"ph\", and \"w\"."))); + } + + v->value.tsmatch.tsqparser = tsquery_parser->val; + v->value.tsmatch.tsqparser_len = tsquery_parser->len; + } + else + { + v->value.tsmatch.tsqparser = NULL; + v->value.tsmatch.tsqparser_len = 0; + } + + /* Success */ + *result = v; + return true; +} diff --git a/src/backend/utils/adt/jsonpath_scan.l b/src/backend/utils/adt/jsonpath_scan.l index e4fadcc2e69..92b3e9be926 100644 --- a/src/backend/utils/adt/jsonpath_scan.l +++ b/src/backend/utils/adt/jsonpath_scan.l @@ -434,10 +434,13 @@ static const JsonPathKeyword keywords[] = { {7, false, INTEGER_P, "integer"}, {7, false, STR_REPLACE_P, "replace"}, {7, false, TIME_TZ_P, "time_tz"}, + {7, false, TSMATCH_P, "tsmatch"}, {7, false, UNKNOWN_P, "unknown"}, {8, false, DATETIME_P, "datetime"}, {8, false, KEYVALUE_P, "keyvalue"}, + {8, false, TSCONFIG_P, "tsconfig"}, {9, false, TIMESTAMP_P, "timestamp"}, + {9, false, TSQUERYPARSER_P, "tsqparser"}, {10, false, LIKE_REGEX_P, "like_regex"}, {10, false, STR_SPLIT_PART_P, "split_part"}, {12, false, TIMESTAMP_TZ_P, "timestamp_tz"}, diff --git a/src/include/utils/jsonpath.h b/src/include/utils/jsonpath.h index 8d27206e242..89753bf8bc7 100644 --- a/src/include/utils/jsonpath.h +++ b/src/include/utils/jsonpath.h @@ -104,6 +104,7 @@ typedef enum JsonPathItemType jpiLast, /* LAST array subscript */ jpiStartsWith, /* STARTS WITH predicate */ jpiLikeRegex, /* LIKE_REGEX predicate */ + jpiTsMatch, /* TSMATCH predicate */ jpiBigint, /* .bigint() item method */ jpiBoolean, /* .boolean() item method */ jpiDate, /* .date() item method */ @@ -196,6 +197,15 @@ typedef struct JsonPathItem int32 patternlen; uint32 flags; } like_regex; + struct + { + int32 doc; + char *tsquery; + uint32 tsquery_len; + int32 tsconfig; + char *tsqparser; + uint32 tsqparser_len; + } tsmatch; } content; } JsonPathItem; @@ -274,6 +284,15 @@ struct JsonPathParseItem uint32 len; char *val; /* could not be not null-terminated */ } string; + struct + { + JsonPathParseItem *doc; + char *tsquery; + uint32 tsquery_len; + JsonPathParseItem *tsconfig; + char *tsqparser; + uint32 tsqparser_len; + } tsmatch; } value; }; diff --git a/src/test/regress/expected/jsonb_jsonpath.out b/src/test/regress/expected/jsonb_jsonpath.out index afa6c4cb529..ac78becc305 100644 --- a/src/test/regress/expected/jsonb_jsonpath.out +++ b/src/test/regress/expected/jsonb_jsonpath.out @@ -4891,3 +4891,61 @@ ORDER BY s1.num, s2.num; {"s": "B"} | {"s": "B"} | false | true | true | true | false (144 rows) +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "fly" tsconfig "english")'); + jsonb_path_query +------------------ +(0 rows) + +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "english")'); + jsonb_path_query +------------------ + "running" + "runs" + "run" +(3 rows) + +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "simple")'); + jsonb_path_query +------------------ + "run" +(1 row) + +select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "english")'); + jsonb_path_query +------------------ + "postgres" + "POSTGRES" +(2 rows) + +select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "simple")'); + jsonb_path_query +------------------ + "postgres" + "POSTGRES" +(2 rows) + +-- in the default tsqparser (to_tsquery) spaces are not allowed, so this should fail for syntax +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english")'); +ERROR: syntax error in tsquery: "fast car" +-- if we specify "w" however it should be ok +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english" tsqparser "w")'); + jsonb_path_query +------------------ + "fast car" + "super fast car" +(2 rows) + +-- it should also be ok if we change to a valid to_tsquery +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast & car" tsconfig "english")'); + jsonb_path_query +------------------ + "fast car" + "super fast car" +(2 rows) + +select jsonb_path_query('["fat cat", "cat fat", "fat rats"]', 'lax $[*] ? (@ tsmatch "fat & rat" tsconfig "english")'); + jsonb_path_query +------------------ + "fat rats" +(1 row) + diff --git a/src/test/regress/expected/jsonpath.out b/src/test/regress/expected/jsonpath.out index ea971e79854..600cf572f08 100644 --- a/src/test/regress/expected/jsonpath.out +++ b/src/test/regress/expected/jsonpath.out @@ -1480,3 +1480,63 @@ FROM unnest(ARRAY['$ ? (@ like_regex "pattern" flag "smixq")'::text, 1a | f | 42601 | trailing junk after numeric literal at or near "1a" of jsonpath input | | (5 rows) +-- tsmatch (Full Text Search) +-- basic success +select '$ ? (@ tsmatch "simple")'::jsonpath; + jsonpath +------------------------ + $?(@ tsmatch "simple") +(1 row) + +select '$ ? (@ tsmatch "running" tsconfig "english")'::jsonpath; + jsonpath +-------------------------------------------- + $?(@ tsmatch "running" tsconfig "english") +(1 row) + +-- w/out tsconfig and tsqparser +select '$ ? (@ tsmatch "fast & furious" tsconfig "simple")'::jsonpath; + jsonpath +-------------------------------------------------- + $?(@ tsmatch "fast & furious" tsconfig "simple") +(1 row) + +select '$ ? (@ tsmatch "fast & furious" tsconfig "simple" tsqparser "w")'::jsonpath; + jsonpath +---------------------------------------------------------------- + $?(@ tsmatch "fast & furious" tsconfig "simple" tsqparser "w") +(1 row) + +-- tsconfig must be specified first and then tsqparser +select '$ ? (@ tsmatch "fast & furious" tsqparser "w" tsconfig "simple" )'::jsonpath; +ERROR: syntax error at or near " " of jsonpath input +LINE 1: select '$ ? (@ tsmatch "fast & furious" tsqparser "w" tsconf... + ^ +select '$ ? (@ tsmatch "fast & furious" tsqparser "w")'::jsonpath; + jsonpath +---------------------------------------------- + $?(@ tsmatch "fast & furious" tsqparser "w") +(1 row) + +select '$[*] ? (@.title tsmatch "god" && @.rating > 5)'::jsonpath; + jsonpath +-------------------------------------------------- + $[*]?(@."title" tsmatch "god" && @."rating" > 5) +(1 row) + +select '$ ? (@ tsmatch $pattern)'::jsonpath; +ERROR: syntax error at or near "$pattern" of jsonpath input +LINE 1: select '$ ? (@ tsmatch $pattern)'::jsonpath; + ^ +-- only string literals (no variables) are allowed for tsquery +select '$ ? (@ tsmatch $var tsconfig "english")'::jsonpath; +ERROR: syntax error at or near "$var" of jsonpath input +LINE 1: select '$ ? (@ tsmatch $var tsconfig "english")'::jsonpath; + ^ +-- if a tsconfig doesn't exist it should parse nonetheless (executor will fail it) +select '$ ? (@ tsmatch "running" tsconfig "wrongconfig")'::jsonpath; + jsonpath +------------------------------------------------ + $?(@ tsmatch "running" tsconfig "wrongconfig") +(1 row) + diff --git a/src/test/regress/sql/jsonb_jsonpath.sql b/src/test/regress/sql/jsonb_jsonpath.sql index d3a38c57791..8698021ba2b 100644 --- a/src/test/regress/sql/jsonb_jsonpath.sql +++ b/src/test/regress/sql/jsonb_jsonpath.sql @@ -1253,3 +1253,16 @@ SELECT jsonb_path_query_first(s1.j, '$.s > $s', vars => s2.j) gt FROM str s1, str s2 ORDER BY s1.num, s2.num; + +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "fly" tsconfig "english")'); +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "english")'); +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "simple")'); +select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "english")'); +select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "simple")'); +-- in the default tsqparser (to_tsquery) spaces are not allowed, so this should fail for syntax +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english")'); +-- if we specify "w" however it should be ok +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english" tsqparser "w")'); +-- it should also be ok if we change to a valid to_tsquery +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast & car" tsconfig "english")'); +select jsonb_path_query('["fat cat", "cat fat", "fat rats"]', 'lax $[*] ? (@ tsmatch "fat & rat" tsconfig "english")'); diff --git a/src/test/regress/sql/jsonpath.sql b/src/test/regress/sql/jsonpath.sql index 44178d8b45a..dc428ffe59b 100644 --- a/src/test/regress/sql/jsonpath.sql +++ b/src/test/regress/sql/jsonpath.sql @@ -306,3 +306,22 @@ FROM unnest(ARRAY['$ ? (@ like_regex "pattern" flag "smixq")'::text, '00', '1a']) str, LATERAL pg_input_error_info(str, 'jsonpath') as errinfo; + +-- tsmatch (Full Text Search) + +-- basic success +select '$ ? (@ tsmatch "simple")'::jsonpath; +select '$ ? (@ tsmatch "running" tsconfig "english")'::jsonpath; +-- w/out tsconfig and tsqparser +select '$ ? (@ tsmatch "fast & furious" tsconfig "simple")'::jsonpath; +select '$ ? (@ tsmatch "fast & furious" tsconfig "simple" tsqparser "w")'::jsonpath; +-- tsconfig must be specified first and then tsqparser +select '$ ? (@ tsmatch "fast & furious" tsqparser "w" tsconfig "simple" )'::jsonpath; +select '$ ? (@ tsmatch "fast & furious" tsqparser "w")'::jsonpath; +select '$[*] ? (@.title tsmatch "god" && @.rating > 5)'::jsonpath; +select '$ ? (@ tsmatch $pattern)'::jsonpath; + +-- only string literals (no variables) are allowed for tsquery +select '$ ? (@ tsmatch $var tsconfig "english")'::jsonpath; +-- if a tsconfig doesn't exist it should parse nonetheless (executor will fail it) +select '$ ? (@ tsmatch "running" tsconfig "wrongconfig")'::jsonpath; -- 2.53.0 ^ permalink raw reply [nested|flat] 5+ messages in thread
* Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search 2026-02-27 05:59 Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search Florents Tselai <[email protected]> 2026-03-02 03:44 ` Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search Chao Li <[email protected]> 2026-04-03 09:56 ` Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search Florents Tselai <[email protected]> 2026-04-04 09:38 ` Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search Florents Tselai <[email protected]> @ 2026-04-05 09:15 ` Florents Tselai <[email protected]> 0 siblings, 0 replies; 5+ messages in thread From: Florents Tselai @ 2026-04-05 09:15 UTC (permalink / raw) To: Chao Li <[email protected]>; +Cc: pgsql-hackers <[email protected]>; Andrew Dunstan <[email protected]> On Sat, Apr 4, 2026 at 12:38 PM Florents Tselai <[email protected]> wrote: > Here's a v5 which is v4 + pgindent > > > > > On Fri, Apr 3, 2026 at 12:56 PM Florents Tselai <[email protected]> > wrote: > >> >> >> >> On Mon, Mar 2, 2026 at 5:44 AM Chao Li <[email protected]> wrote: >> >>> >>> >>> > On Feb 27, 2026, at 13:59, Florents Tselai <[email protected]> >>> wrote: >>> > >>> > >>> > >>> > On Thu, Feb 26, 2026 at 8:48 AM Chao Li <[email protected]> >>> wrote: >>> > >>> > >>> > > On Feb 1, 2026, at 19:02, Florents Tselai <[email protected]> >>> wrote: >>> > > >>> > > >>> > > >>> > > >>> > > On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai < >>> [email protected]> wrote: >>> > > Hi, >>> > > >>> > > in real-life I work a lot with json & fts search, here's a feature >>> I've always wished I had, >>> > > but never tackle it. Until yesterday that is. >>> > > >>> > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && >>> @.body tsmatch "performance")'); >>> > > >>> > > This patch introduces a tsmatch boolean operator to the JSONPath >>> engine. >>> > > By integrating FTS natively into path expressions, >>> > > this operator allows for high-precision filtering of nested JSONB >>> structures— >>> > > solving issues with structural ambiguity and query complexity. >>> > > >>> > > Currently, users must choose between two suboptimal paths for >>> FTS-ing nested JSON: >>> > > - Imprecise Global Indexing >>> > > jsonb_to_tsvector aggregates text into a flat vector. >>> > > This ignores JSON boundaries, leading to false positives when the >>> same key (e.g., "body") >>> > > appears in different contexts (e.g., a "Product Description" vs. a >>> "Customer Review"). >>> > > >>> > > - Complex SQL Workarounds >>> > > Achieving 100% precision requires unnesting the document via >>> jsonb_array_elements and LATERAL joins. >>> > > This leads to verbose SQL and high memory overhead from generating >>> intermediate heap tuples. >>> > > >>> > > One of the most significant advantages of tsmatch is its ability to >>> participate in multi-condition predicates >>> > > within the same JSON object - something jsonb_to_tsvector cannot do. >>> > > >>> > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && >>> @.body tsmatch "performance")'); >>> > > >>> > > In a flat vector, the association between "Alice" and "performance" >>> is lost. >>> > > tsmatch preserves this link by evaluating the FTS predicate in-place >>> during path traversal. >>> > > >>> > > While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly >>> define an FTS operator, >>> > > tsmatch is architecturally modeled after the standard-defined >>> like_regex. >>> > > >>> > > The implementation follows the like_regex precedent: >>> > > it is a non-indexable predicate that relies on GIN path-matching for >>> pruning and heap re-checks for precision. >>> > > Caching is scoped to the JsonPathExecContext, >>> > > ensuring 'compile-once' efficiency per execution without violating >>> the stability requirements of prepared statements. >>> > > >>> > > This initial implementation uses plainto_tsquery. >>> > > However, the grammar is designed to support a "mode" flag (similar >>> to like_regex flags) >>> > > in future iterations to toggle between to_tsquery, >>> websearch_to_tsquery, and phraseto_tsquery. >>> > > >>> > > Here's a v2, that implements the tsqparser clause >>> > > >>> > > So this should now work too >>> > > >>> > > select jsonb_path_query_array('["fast car", "slow car", "fast and >>> furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w") >>> <v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch> >>> > >>> > Hi Florents, >>> > >>> > Grant pinged me about this. I can review it in coming days. Can you >>> please rebase it? I failed to apply to current master. Also, the CF >>> reported a failure test case, please take a look. >>> > >>> > Hi Evan, >>> > thanks for having a look. The conflict was due to the intro of >>> pg_fallthrough. Not related to this patch . >>> > >>> > I noticed the failure too, but I'm having a hard time reproducing it >>> tbh. >>> > This fails for Debian Trixie with Meson. The same with Autoconf >>> passes... >>> > >>> > https://github.com/Florents-Tselai/postgres/runs/65098077968 >>> > >>> > >>> > >>> > >>> > <v3-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch> >>> >>> I have reviewed v3 and traced a few test cases. Here comes my review >>> comments: >>> >>> 1 >>> ``` >>> + <replaceable>string</replaceable> <literal>tsmatch</literal> >>> <replaceable>string</replaceable> >>> + <optional> <literal>tsconfig</literal> >>> <replaceable>string</replaceable> </optional> >>> + <optional> <literal>tsqparser</literal> >>> <replaceable>string</replaceable> </optional> >>> ``` >>> >>> For all “replaceable”, instead of “string”, would it be better to use >>> something more descriptive? For example: >>> ``` >>> <replaceable>json_string</replaceable> <literal>tsmatch</literal> >>> <replaceable>query</replaceable> >>> <optional> <literal>tsconfig</literal> >>> <replaceable>config_name</replaceable> </optional> >>> <optional> <literal>tsqparser</literal> >>> <replaceable>parser_mode</replaceable> </optional> >>> ``` >>> >>> 2 - jsonpath_gram.y >>> ``` >>> +static bool makeItemTsMatch(JsonPathParseItem *doc, >>> + JsonPathString >>> *tsquery, >>> + JsonPathString >>> *tsconfig, >>> + JsonPathString >>> *tsquery_parser, >>> + >>> JsonPathParseItem ** result, >>> + struct Node >>> *escontext); >>> ``` >>> >>> Format Nit: Looking at the existing code, the J in the second and >>> following lines, should be placed in the same column as the J in the first >>> line. >>> >>> 3 - jsonpath_gram.y >>> ``` >>> + | expr TSMATCH_P STRING_P >>> + { >>> + JsonPathParseItem *jppitem; >>> + /* Pass NULL for tsconfig (3rd) and NULL for >>> tsquery_parser (4th) */ >>> + if (! makeItemTsMatch($1, &$3, NULL, NULL, &jppitem, >>> escontext)) >>> + YYABORT; >>> + $$ = jppitem; >>> + } >>> + | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P >>> + { >>> + JsonPathParseItem *jppitem; >>> + /* Pass NULL for tsquery_parser (4th) */ >>> + if (! makeItemTsMatch($1, &$3, &$5, NULL, &jppitem, >>> escontext)) >>> + YYABORT; >>> + $$ = jppitem; >>> + } >>> + | expr TSMATCH_P STRING_P TSQUERYPARSER_P STRING_P >>> + { >>> + JsonPathParseItem *jppitem; >>> + /* Pass NULL for tsconfig (3rd) */ >>> + if (! makeItemTsMatch($1, &$3, NULL, &$5, &jppitem, >>> escontext)) >>> + YYABORT; >>> + $$ = jppitem; >>> + } >>> + | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P TSQUERYPARSER_P >>> STRING_P >>> + { >>> + JsonPathParseItem *jppitem; >>> + if (! makeItemTsMatch($1, &$3, &$5, &$7, &jppitem, >>> escontext)) >>> + YYABORT; >>> + $$ = jppitem; >>> + } >>> ``` >>> >>> Feels a little redundant, repeatedly calls makeItemTsMatch. See the >>> attached diff for a simplification. But my version is a bit longer in terms >>> of number of lines. So, up to you. >>> >>> 4 - jsonpath_gram.y >>> ``` >>> +static bool >>> +makeItemTsMatch(JsonPathParseItem *doc, >>> + JsonPathString *tsquery, >>> + JsonPathString *tsconfig, >>> + JsonPathString *tsquery_parser, >>> + JsonPathParseItem **result, >>> + struct Node *escontext) >>> ``` >>> >>> makeItemTsMatch doesn’t need to return a bool. Actually, now it never >>> returns false, instead, it just ereport(ERROR). >>> >>> 5 - jsonpath.h >>> ``` >>> + struct >>> + { >>> + int32 doc; >>> + char *tsquery; >>> + uint32 tsquerylen; >>> + int32 tsconfig; >>> + char *tsqparser; >>> + uint32 tsqparser_len; >>> + } tsmatch; >>> >>> + struct >>> + { >>> + JsonPathParseItem *doc; >>> + char *tsquery; >>> + uint32 tsquerylen; >>> + JsonPathParseItem *tsconfig; >>> + char *tsqparser; >>> + uint32 tsqparser_len; >>> + } tsmatch; >>> } value; >>> ``` >>> >>> tsquerylen doesn’t have _ before len, and tsqparser_len, would it be >>> better to make naming conventions consistent in the same structure? >>> >>> 6 - jsonpath_exec.c >>> ``` >>> #include "tsearch/ts_utils.h" >>> #include "tsearch/ts_cache.h" >>> #include "utils/regproc.h" >>> #include "catalog/namespace.h" >>> >>> static JsonPathBool >>> executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg, >>> void *param) >>> ``` >>> >>> Why don’t put these includes to the header section together with other >>> includes? >>> >>> 7 - jsonpath_exec.c >>> ``` >>> + else >>> + { >>> + /* >>> + * Fallback or Error for unknown flags >>> (should be caught by >>> + * parser) >>> + */ >>> + ereport(ERROR, >>> + >>> (errcode(ERRCODE_SYNTAX_ERROR), >>> + errmsg("unrecognized >>> tsqparser flag"))); >>> + } >>> ``` >>> >>> This “else” should never be entered as the same check has been done by >>> makeItemTsMatch. So, maybe just use an Assert here, or pg_unreachable(). >>> >>> 8 - jsonpath_exec.c >>> ``` >>> + /* Setup Context (Run ONLY once per predicate) */ >>> + if (!cxt->initialized) >>> ``` >>> >>> While tracing this SQL: >>> ``` >>> evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb >>> evantest-# @@ '$.tags[*] ? (@ tsmatch "run" tsconfig "english")'; >>> ?column? >>> ---------- >>> >>> (1 row) >>> ``` >>> >>> I noticed that, when process “jogging”, cxt->initialized is still false, >>> meaning that, the cxt is not reused across array items. Given the same >>> tsconfig should apply to all array items, I think cxt should be reused. >>> >>> 9 - jsonpath_exec.c >>> ``` >>> + /* Select Parser and Compile Query */ >>> + parser_mode = jsp->content.tsmatch.tsqparser; >>> + parser_len = jsp->content.tsmatch.tsqparser_len; >>> + >>> + if (parser_len > 0) >>> + { >>> + /* Dispatch based on flag */ >>> + if (pg_strncasecmp(parser_mode, "pl", >>> parser_len) == 0) >>> ``` >>> >>> Nit: parser_mode is only used inside if (parser_len > 0), it can be >>> defined inside the “if”. >>> >>> 10 - jsonpath_gram.y >>> ``` >>> + ereport(ERROR, >>> + (errcode(ERRCODE_SYNTAX_ERROR), >>> + errmsg("invalid tsquery_parser >>> value: \"%s\"", tsquery_parser->val), >>> + errhint("Valid values are >>> \"pl\", \"ph\", and \"w\"."))); >>> ``` >>> >>> When tested a case with an invalid parser, I got: >>> ``` >>> evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb >>> >>> @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqparser >>> "pss")'; >>> ERROR: invalid tsquery_parser value: "pss @" >>> LINE 2: @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqpar... >>> ^ >>> HINT: Valid values are "pl", "ph", and "w". >>> ``` >>> >>> You can see the it shows a bad looking invalid value. I think that’s >>> because tsquery_parser->val is not NULL terminated. I fixed this problem >>> with: >>> ``` >>> errmsg("invalid tsquery_parser value: \"%.*s\"", (int) >>> tsquery_parser->len, tsquery_parser->val), >>> ``` >>> >>> This change is also included in the attached diff file. >>> >>> 11 - jsonpath.c >>> ``` >>> + if (printBracketes) >>> + appendStringInfoChar(buf, ')'); >>> + break; >>> + >>> if (printBracketes) >>> appendStringInfoChar(buf, ')'); >>> ``` >>> >>> Duplicate code. Looks like a copy-pasto. >>> >>> 12 - jsonpath.c >>> ``` >>> + /* Write the Main Query String */ >>> + appendBinaryStringInfo(buf, >>> + >>> &item->value.tsmatch.tsquerylen, >>> + >>> sizeof(item->value.tsmatch.tsquerylen)); >>> + appendBinaryStringInfo(buf, >>> + >>> item->value.tsmatch.tsquery, >>> + >>> item->value.tsmatch.tsquerylen); >>> + appendStringInfoChar(buf, '\0'); >>> ``` >>> >>> I don’t think we need to manually append ‘\0’ after >>> appendBinaryStringInfo. Looking at the header comment of >>> appendBinaryStringInfo, it says that a trailing null will be added. >>> ``` >>> /* >>> * appendBinaryStringInfo >>> * >>> * Append arbitrary binary data to a StringInfo, allocating more space >>> * if necessary. Ensures that a trailing null byte is present. >>> */ >>> void >>> appendBinaryStringInfo(StringInfo str, const void *data, int datalen) >>> ``` >>> >> >> Here's a v4 which incorporates most of Evan's comments & feedback >> - shifted tsquery compilation logic to use persistent cache within >> JsonPathExecContext >> - Fixed a binary serialization alignment issue which caused Dixie to fail >> earlier >> - I've refactor and simplified the grammar per Evan's input by adding a >> tsmatch_opts rule. >> - Also updated the docs per Evan's comments >> >> >> Andrew was kind enough to give some comments off-list. Here's a v6 - Fixed soft-error path in makeItemTsMatch - ereport replaced with errsave, return value now checked at the grammar call site with YYABORT - Fixed pg_strncasecmp prefix match bug in parser-mode validation - "p" was being accepted as "pl" - Fixed get_ts_config_oid to throw on unknown tsconfig instead of passing InvalidOid downstream - Grammar restructured to make tsconfig and tsqparser order-independent - Removed unused JsonTsMatchContext struct and some misplaced #includes Attachments: [application/octet-stream] v6-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch (33.4K, 3-v6-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch) download | inline diff: From 40947834d58131c0b928d4bbe79fe885d482fccc Mon Sep 17 00:00:00 2001 From: Florents Tselai <[email protected]> Date: Sun, 5 Apr 2026 11:52:03 +0300 Subject: [PATCH v6] Add tsmatch JSONPath operator for granular Full Text Search MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This patch introduces the tsmatch boolean operator to the JSONPath engine. By integrating FTS natively into path expressions, this operator allows for high-precision filtering of nested JSONB structures—solving issues with structural ambiguity and query complexity. Currently, users must choose between two suboptimal paths for searching nested JSON: 1. Imprecise Global Indexing jsonb_to_tsvector aggregates text into a flat vector. This ignores JSON boundaries, leading to false positives when the same key (e.g., "body") appears in different contexts (e.g., a "Product Description" vs. a "Customer Review"). 2. Complex SQL Workarounds Achieving 100% precision requires "exploding" the document via jsonb_array_elements and LATERAL joins. This leads to verbose SQL and high memory overhead from generating intermediate heap tuples. One of the most significant advantages of tsmatch is its ability to participate in multi-condition predicates within the same JSON object— something jsonb_to_tsvector cannot do. SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")'); In a flat vector, the association between "Alice" and "performance" is lost. tsmatch preserves this link by evaluating the FTS predicate in-place during path traversal. While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an FTS operator, tsmatch is architecturally modeled after the standard-defined like_regex. The operator supports optional configuration for both the dictionary and the query parser: @ tsmatch "query" [ tsconfig "regconfig" ] [ tsqparser "mode" ] Supported parser modes are: - "pl": plainto_tsquery (no operators required) - "ph": phraseto_tsquery - "w": websearch_to_tsquery - Omitted: Defaults to to_tsquery (strict mode) The implementation relies on GIN path-matching for index pruning and heap re-checks for precision. Caching is scoped to the JsonPathExecContext, ensuring 'compile-once' efficiency for the tsquery and OID lookup per execution, respecting the stability requirements of prepared statements. --- doc/src/sgml/func/func-json.sgml | 35 +++++ src/backend/utils/adt/jsonpath.c | 128 +++++++++++++++- src/backend/utils/adt/jsonpath_exec.c | 147 ++++++++++++++++++- src/backend/utils/adt/jsonpath_gram.y | 117 ++++++++++++++- src/backend/utils/adt/jsonpath_scan.l | 3 + src/include/utils/jsonpath.h | 19 +++ src/test/regress/expected/jsonb_jsonpath.out | 58 ++++++++ src/test/regress/expected/jsonpath.out | 62 ++++++++ src/test/regress/sql/jsonb_jsonpath.sql | 13 ++ src/test/regress/sql/jsonpath.sql | 19 +++ 10 files changed, 597 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/func/func-json.sgml b/doc/src/sgml/func/func-json.sgml index 4cd338fe6e3..9c8905668ac 100644 --- a/doc/src/sgml/func/func-json.sgml +++ b/doc/src/sgml/func/func-json.sgml @@ -3247,6 +3247,41 @@ ERROR: jsonpath member accessor can only be applied to an object <returnvalue>[]</returnvalue> </para></entry> </row> + <row> + <entry role="func_table_entry"><para role="func_signature"> + <replaceable>json_string</replaceable> <literal>tsmatch</literal> <replaceable>query</replaceable> + <optional> <literal>tsconfig</literal> <replaceable>config_name</replaceable> </optional> + <optional> <literal>tsqparser</literal> <replaceable>parser_mode</replaceable> </optional> + <returnvalue>boolean</returnvalue> + </para> + <para> + Tests whether the JSON string (first operand) matches the full-text search query + (second operand). The matching behavior can be customized using the optional + <literal>tsconfig</literal> and <literal>tsqparser</literal> clauses. + If <literal>tsconfig</literal> is omitted, the current session's default text search configuration + is used (see <xref linkend="guc-default-text-search-config"/>). + The <literal>tsqparser</literal> clause determines how the query string is parsed + (see <xref linkend="textsearch-parsing-queries"/>). + Valid options are <literal>"pl"</literal> (<function>plainto_tsquery</function>), + <literal>"ph"</literal> (<function>phraseto_tsquery</function>), and + <literal>"w"</literal> (<function>websearch_to_tsquery</function>). + If <literal>tsqparser</literal> is omitted, the query is parsed using <function>to_tsquery</function>. + <literal>tsconfig</literal> and <literal>tsqparser</literal> can appear in any order. + </para> + <para> + <literal>jsonb_path_query_array('["running", "runs", "ran", "jogging"]', '$[*] ? (@ tsmatch "run" tsconfig "english")')</literal> + <returnvalue>["running", "runs"]</returnvalue> + </para> + <para> + <literal>jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast & car")')</literal> + <returnvalue>["fast car"]</returnvalue> + </para> + <para> + <literal>jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w")')</literal> + <returnvalue>["fast car"]</returnvalue> + </para> + </entry> + </row> </tbody> </tgroup> </table> diff --git a/src/backend/utils/adt/jsonpath.c b/src/backend/utils/adt/jsonpath.c index 7bfc18c9888..e551d65dc25 100644 --- a/src/backend/utils/adt/jsonpath.c +++ b/src/backend/utils/adt/jsonpath.c @@ -351,6 +351,55 @@ flattenJsonPathParseItem(StringInfo buf, int *result, struct Node *escontext, *(int32 *) (buf->data + offs) = chld - pos; } break; + case jpiTsMatch: + { + int32 expr_off; + int32 tsconfig_off; + uint32 tsqparser_len_val = item->value.tsmatch.tsqparser ? item->value.tsmatch.tsqparser_len : 0; + + expr_off = reserveSpaceForItemPointer(buf); + tsconfig_off = reserveSpaceForItemPointer(buf); + + /* + * Write all integers FIRST so they are naturally 4-byte + * aligned + */ + appendBinaryStringInfo(buf, &item->value.tsmatch.tsquery_len, sizeof(uint32)); + appendBinaryStringInfo(buf, &tsqparser_len_val, sizeof(uint32)); + + /* Now append the strings at the end */ + appendBinaryStringInfo(buf, item->value.tsmatch.tsquery, item->value.tsmatch.tsquery_len); + appendStringInfoChar(buf, '\0'); + + if (item->value.tsmatch.tsqparser) + { + appendBinaryStringInfo(buf, item->value.tsmatch.tsqparser, tsqparser_len_val); + appendStringInfoChar(buf, '\0'); + } + + if (!flattenJsonPathParseItem(buf, &chld, escontext, + item->value.tsmatch.doc, + nestingLevel, + insideArraySubscript)) + return false; + *(int32 *) (buf->data + expr_off) = chld - pos; + + /* TSConfig (Optional) */ + if (item->value.tsmatch.tsconfig) + { + if (!flattenJsonPathParseItem(buf, &chld, escontext, + item->value.tsmatch.tsconfig, + nestingLevel, + insideArraySubscript)) + return false; + *(int32 *) (buf->data + tsconfig_off) = chld - pos; + } + else + { + *(int32 *) (buf->data + tsconfig_off) = 0; + } + } + break; case jpiFilter: argNestingLevel++; pg_fallthrough; @@ -766,6 +815,55 @@ printJsonPathItem(StringInfo buf, JsonPathItem *v, bool inKey, appendStringInfoChar(buf, '"'); } + break; + case jpiTsMatch: + if (printBracketes) + appendStringInfoChar(buf, '('); + + jspInitByBuffer(&elem, v->base, v->content.tsmatch.doc); + printJsonPathItem(buf, &elem, false, + operationPriority(elem.type) <= + operationPriority(v->type)); + + appendStringInfoString(buf, " tsmatch "); + + escape_json_with_len(buf, + v->content.tsmatch.tsquery, + v->content.tsmatch.tsquery_len); + + /* Print TSConfig if present */ + if (v->content.tsmatch.tsconfig) + { + JsonPathItem config_item; + int32 config_len; + char *config_str; + + appendStringInfoString(buf, " tsconfig "); + jspInitByBuffer(&config_item, v->base, v->content.tsmatch.tsconfig); + config_str = jspGetString(&config_item, &config_len); + appendStringInfoChar(buf, '"'); + appendBinaryStringInfo(buf, config_str, config_len); + appendStringInfoChar(buf, '"'); + } + + if (v->content.tsmatch.tsqparser_len > 0) + { + appendStringInfoString(buf, " tsqparser "); + appendStringInfoChar(buf, '"'); + + /* + * Use simple binary append since flags like "pl" don't need + * JSON escaping + */ + appendBinaryStringInfo(buf, + v->content.tsmatch.tsqparser, + v->content.tsmatch.tsqparser_len); + appendStringInfoChar(buf, '"'); + } + + if (printBracketes) + appendStringInfoChar(buf, ')'); + break; if (printBracketes) appendStringInfoChar(buf, ')'); @@ -976,6 +1074,8 @@ jspOperationName(JsonPathItemType type) return "timestamp"; case jpiTimestampTz: return "timestamp_tz"; + case jpiTsMatch: + return "tsmatch"; case jpiStrReplace: return "replace"; case jpiStrLower: @@ -1158,6 +1258,27 @@ jspInitByBuffer(JsonPathItem *v, char *base, int32 pos) read_int32(v->content.like_regex.patternlen, base, pos); v->content.like_regex.pattern = base + pos; break; + case jpiTsMatch: + /* FIX: Read all integers first */ + read_int32(v->content.tsmatch.doc, base, pos); + read_int32(v->content.tsmatch.tsconfig, base, pos); + read_int32(v->content.tsmatch.tsquery_len, base, pos); + read_int32(v->content.tsmatch.tsqparser_len, base, pos); + + /* Set pointers to the strings in the buffer */ + v->content.tsmatch.tsquery = base + pos; + pos += v->content.tsmatch.tsquery_len + 1; + + if (v->content.tsmatch.tsqparser_len > 0) + { + v->content.tsmatch.tsqparser = base + pos; + pos += v->content.tsmatch.tsqparser_len + 1; + } + else + { + v->content.tsmatch.tsqparser = NULL; + } + break; default: elog(ERROR, "unrecognized jsonpath item type: %d", v->type); } @@ -1231,6 +1352,7 @@ jspGetNext(JsonPathItem *v, JsonPathItem *a) v->type == jpiLast || v->type == jpiStartsWith || v->type == jpiLikeRegex || + v->type == jpiTsMatch || v->type == jpiBigint || v->type == jpiBoolean || v->type == jpiDate || @@ -1575,7 +1697,11 @@ jspIsMutableWalker(JsonPathItem *jpi, struct JsonPathMutableContext *cxt) jspInitByBuffer(&arg, jpi->base, jpi->content.like_regex.expr); jspIsMutableWalker(&arg, cxt); break; - + case jpiTsMatch: + Assert(status == jpdsNonDateTime); + jspInitByBuffer(&arg, jpi->base, jpi->content.tsmatch.doc); + jspIsMutableWalker(&arg, cxt); + break; /* literals */ case jpiNull: case jpiString: diff --git a/src/backend/utils/adt/jsonpath_exec.c b/src/backend/utils/adt/jsonpath_exec.c index 770840a0611..de8937fdb08 100644 --- a/src/backend/utils/adt/jsonpath_exec.c +++ b/src/backend/utils/adt/jsonpath_exec.c @@ -59,13 +59,17 @@ #include "postgres.h" +#include "catalog/namespace.h" #include "catalog/pg_collation.h" #include "catalog/pg_type.h" #include "funcapi.h" #include "miscadmin.h" #include "nodes/miscnodes.h" #include "nodes/nodeFuncs.h" +#include "nodes/pg_list.h" #include "regex/regex.h" +#include "tsearch/ts_cache.h" +#include "tsearch/ts_utils.h" #include "utils/builtins.h" #include "utils/date.h" #include "utils/datetime.h" @@ -74,6 +78,7 @@ #include "utils/json.h" #include "utils/jsonpath.h" #include "utils/memutils.h" +#include "utils/regproc.h" #include "utils/timestamp.h" /* @@ -114,6 +119,7 @@ typedef struct JsonPathExecContext bool throwErrors; /* with "false" all suppressible errors are * suppressed */ bool useTz; + List *tsmatch_cache; /* Persists compiled FTS queries */ } JsonPathExecContext; /* Context for LIKE_REGEX execution. */ @@ -123,6 +129,14 @@ typedef struct JsonLikeRegexContext int cflags; } JsonLikeRegexContext; +/* Context for tsmatch execution. */ +typedef struct JsonTsMatchCacheEntry +{ + void *jsp_ptr; /* Cache key: pointer to the AST node */ + Datum queryDatum; /* Cached compiled query */ + Oid tsconfigId; /* Cached dictionary OID */ +} JsonTsMatchCacheEntry; + /* Result of jsonpath predicate evaluation */ typedef enum JsonPathBool { @@ -335,6 +349,7 @@ static JsonPathExecResult executeKeyValueMethod(JsonPathExecContext *cxt, JsonPathItem *jsp, JsonbValue *jb, JsonValueList *found); static JsonPathExecResult appendBoolResult(JsonPathExecContext *cxt, JsonPathItem *jsp, JsonValueList *found, JsonPathBool res); +static JsonPathBool executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg, void *param); static void getJsonPathItem(JsonPathExecContext *cxt, JsonPathItem *item, JsonbValue *value); static JsonbValue *GetJsonPathVar(void *cxt, char *varName, int varNameLen, @@ -740,6 +755,7 @@ executeJsonPath(JsonPath *path, void *vars, JsonPathGetVarCallback getVar, cxt.innermostArraySize = -1; cxt.throwErrors = throwErrors; cxt.useTz = useTz; + cxt.tsmatch_cache = NIL; if (jspStrictAbsenceOfErrors(&cxt) && !result) { @@ -840,6 +856,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp, case jpiExists: case jpiStartsWith: case jpiLikeRegex: + case jpiTsMatch: { JsonPathBool st = executeBoolItem(cxt, jsp, jb, true); @@ -1914,6 +1931,18 @@ executeBoolItem(JsonPathExecContext *cxt, JsonPathItem *jsp, return executePredicate(cxt, jsp, &larg, NULL, jb, false, executeLikeRegex, &lrcxt); } + case jpiTsMatch: + { + jspInitByBuffer(&larg, jsp->base, + jsp->content.tsmatch.doc); + + /* + * Pass 'cxt' as the param so executeTsMatch can access the + * cache list + */ + return executePredicate(cxt, jsp, &larg, NULL, jb, false, + executeTsMatch, cxt); + } case jpiExists: jspGetArg(jsp, &larg); @@ -1952,7 +1981,6 @@ executeBoolItem(JsonPathExecContext *cxt, JsonPathItem *jsp, return res == jperOk ? jpbTrue : jpbFalse; } - default: elog(ERROR, "invalid boolean jsonpath item type: %d", jsp->type); return jpbUnknown; @@ -3186,6 +3214,123 @@ executeKeyValueMethod(JsonPathExecContext *cxt, JsonPathItem *jsp, return res; } +static JsonPathBool +executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg, + void *param) +{ + JsonPathExecContext *cxt = (JsonPathExecContext *) param; + JsonTsMatchCacheEntry *cache = NULL; + ListCell *lc; + text *doc_text; + Datum tsvector_datum; + bool match; + + if (!(str = getScalar(str, jbvString))) + return jpbUnknown; + + /* Check if we already compiled the query for this specific AST node */ + foreach(lc, cxt->tsmatch_cache) + { + JsonTsMatchCacheEntry *entry = (JsonTsMatchCacheEntry *) lfirst(lc); + + if (entry->jsp_ptr == (void *) jsp) + { + cache = entry; + break; + } + } + + /* If not found, parse, compile, and cache it */ + if (cache == NULL) + { + text *query_text; + char *parser_mode; + uint32 parser_len; + + /* Allocate new cache entry in the execution memory context */ + cache = palloc0(sizeof(JsonTsMatchCacheEntry)); + cache->jsp_ptr = (void *) jsp; + + if (jsp->content.tsmatch.tsconfig != 0) + { + JsonPathItem config_item; + int32 config_len; + char *config_str; + char *safe_config_str; + + jspInitByBuffer(&config_item, jsp->base, jsp->content.tsmatch.tsconfig); + config_str = jspGetString(&config_item, &config_len); + + /* Null-terminate for safe catalog lookup */ + safe_config_str = pnstrdup(config_str, config_len); + cache->tsconfigId = get_ts_config_oid(stringToQualifiedNameList(safe_config_str, NULL), false); + pfree(safe_config_str); + } + else + { + cache->tsconfigId = getTSCurrentConfig(true); + } + + /* Prepare Query Text */ + query_text = cstring_to_text_with_len(jsp->content.tsmatch.tsquery, + jsp->content.tsmatch.tsquery_len); + + /* Select Parser and Compile Query */ + parser_mode = jsp->content.tsmatch.tsqparser; + parser_len = jsp->content.tsmatch.tsqparser_len; + + if (parser_len > 0) + { + if (parser_len == 2 && pg_strncasecmp(parser_mode, "pl", 2) == 0) + { + cache->queryDatum = DirectFunctionCall2(plainto_tsquery_byid, + ObjectIdGetDatum(cache->tsconfigId), + PointerGetDatum(query_text)); + } + else if (parser_len == 2 && pg_strncasecmp(parser_mode, "ph", 2) == 0) + { + cache->queryDatum = DirectFunctionCall2(phraseto_tsquery_byid, + ObjectIdGetDatum(cache->tsconfigId), + PointerGetDatum(query_text)); + } + else if (parser_len == 1 && pg_strncasecmp(parser_mode, "w", 1) == 0) + { + cache->queryDatum = DirectFunctionCall2(websearch_to_tsquery_byid, + ObjectIdGetDatum(cache->tsconfigId), + PointerGetDatum(query_text)); + } + else + { + pg_unreachable(); + } + } + else + { + cache->queryDatum = DirectFunctionCall2(to_tsquery_byid, + ObjectIdGetDatum(cache->tsconfigId), + PointerGetDatum(query_text)); + } + + /* Append the newly compiled cache entry to the context's list */ + cxt->tsmatch_cache = lappend(cxt->tsmatch_cache, cache); + } + + /* Runtime Execution (Using the cached query) */ + doc_text = cstring_to_text_with_len(str->val.string.val, + str->val.string.len); + + tsvector_datum = DirectFunctionCall2(to_tsvector_byid, + ObjectIdGetDatum(cache->tsconfigId), + PointerGetDatum(doc_text)); + + match = DatumGetBool(DirectFunctionCall2(ts_match_vq, + tsvector_datum, + cache->queryDatum)); + + return match ? jpbTrue : jpbFalse; +} + + /* * Convert boolean execution status 'res' to a boolean JSON item and execute * next jsonpath. diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y index f826697d098..9f3aa0cbe14 100644 --- a/src/backend/utils/adt/jsonpath_gram.y +++ b/src/backend/utils/adt/jsonpath_gram.y @@ -43,7 +43,12 @@ static bool makeItemLikeRegex(JsonPathParseItem *expr, JsonPathString *flags, JsonPathParseItem ** result, struct Node *escontext); - +static bool makeItemTsMatch(JsonPathParseItem *doc, + JsonPathString *tsquery, + JsonPathString *tsconfig, + JsonPathString *tsquery_parser, + JsonPathParseItem ** result, + struct Node *escontext); /* * Bison doesn't allocate anything that needs to live across parser calls, * so we can easily have it use palloc instead of malloc. This prevents @@ -73,6 +78,13 @@ static bool makeItemLikeRegex(JsonPathParseItem *expr, JsonPathParseItem *value; JsonPathParseResult *result; JsonPathItemType optype; + struct + { + bool has_tsconfig; + JsonPathString tsconfig; + bool has_tsqparser; + JsonPathString tsqparser; + } tsmatch_opts; bool boolean; int integer; } @@ -81,7 +93,7 @@ static bool makeItemLikeRegex(JsonPathParseItem *expr, %token <str> IDENT_P STRING_P NUMERIC_P INT_P VARIABLE_P %token <str> OR_P AND_P NOT_P %token <str> LESS_P LESSEQUAL_P EQUAL_P NOTEQUAL_P GREATEREQUAL_P GREATER_P -%token <str> ANY_P STRICT_P LAX_P LAST_P STARTS_P WITH_P LIKE_REGEX_P FLAG_P +%token <str> ANY_P STRICT_P LAX_P LAST_P STARTS_P WITH_P LIKE_REGEX_P FLAG_P TSMATCH_P TSCONFIG_P TSQUERYPARSER_P %token <str> ABS_P SIZE_P TYPE_P FLOOR_P DOUBLE_P CEILING_P KEYVALUE_P %token <str> DATETIME_P %token <str> BIGINT_P BOOLEAN_P DATE_P DECIMAL_P INTEGER_P NUMBER_P @@ -109,6 +121,8 @@ static bool makeItemLikeRegex(JsonPathParseItem *expr, %type <integer> any_level +%type <tsmatch_opts> tsmatch_opts + %left OR_P %left AND_P %right NOT_P @@ -189,6 +203,17 @@ predicate: YYABORT; $$ = jppitem; } + | expr TSMATCH_P STRING_P tsmatch_opts + { + JsonPathParseItem *jppitem; + + if (!makeItemTsMatch($1, &$3, + $4.has_tsconfig ? &$4.tsconfig : NULL, + $4.has_tsqparser ? &$4.tsqparser : NULL, + &jppitem, escontext)) + YYABORT; + $$ = jppitem; + } ; starts_with_initial: @@ -337,6 +362,33 @@ str_str_args: str_elem ',' str_elem { $$ = list_make2($1, $3); } ; +tsmatch_opts: + /* EMPTY */ + { + $$.has_tsconfig = false; + $$.has_tsqparser = false; + } + | tsmatch_opts TSCONFIG_P STRING_P + { + $$ = $1; + if ($$.has_tsconfig) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("duplicate tsconfig option in tsmatch"))); + $$.has_tsconfig = true; + $$.tsconfig = $3; + } + | tsmatch_opts TSQUERYPARSER_P STRING_P + { + $$ = $1; + if ($$.has_tsqparser) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("duplicate tsqparser option in tsmatch"))); + $$.has_tsqparser = true; + $$.tsqparser = $3; + } + ; key: key_name { $$ = makeItemKey(&$1); } ; @@ -377,6 +429,9 @@ key_name: | TIME_TZ_P | TIMESTAMP_P | TIMESTAMP_TZ_P + | TSCONFIG_P + | TSMATCH_P + | TSQUERYPARSER_P | STR_LOWER_P | STR_UPPER_P | STR_INITCAP_P @@ -715,3 +770,61 @@ jspConvertRegexFlags(uint32 xflags, int *result, struct Node *escontext) return true; } + +static bool +makeItemTsMatch(JsonPathParseItem *doc, + JsonPathString *tsquery, + JsonPathString *tsconfig, + JsonPathString *tsquery_parser, + JsonPathParseItem **result, + struct Node *escontext) +{ + JsonPathParseItem *v = makeItemType(jpiTsMatch); + + v->value.tsmatch.doc = doc; + + v->value.tsmatch.tsquery = tsquery->val; + v->value.tsmatch.tsquery_len = tsquery->len; + + /* Handle the Configuration (Stored as a Node) */ + if (tsconfig) + { + JsonPathParseItem *conf = makeItemType(jpiString); + conf->value.string.val = tsconfig->val; + conf->value.string.len = tsconfig->len; + v->value.tsmatch.tsconfig = conf; + } + else + { + v->value.tsmatch.tsconfig = NULL; + } + + /* Handle the TSQuery Parser Flag */ + if (tsquery_parser) + { + /* Check for "pl", "ph", "w" here to throw a syntax error immediately */ + if (!(tsquery_parser->len == 2 && pg_strncasecmp(tsquery_parser->val, "pl", 2) == 0) && + !(tsquery_parser->len == 2 && pg_strncasecmp(tsquery_parser->val, "ph", 2) == 0) && + !(tsquery_parser->len == 1 && pg_strncasecmp(tsquery_parser->val, "w", 1) == 0)) + { + errsave(escontext, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("invalid tsquery_parser value: \"%.*s\"", + (int) tsquery_parser->len, tsquery_parser->val), + errhint("Valid values are \"pl\", \"ph\", and \"w\"."))); + return false; + } + + v->value.tsmatch.tsqparser = tsquery_parser->val; + v->value.tsmatch.tsqparser_len = tsquery_parser->len; + } + else + { + v->value.tsmatch.tsqparser = NULL; + v->value.tsmatch.tsqparser_len = 0; + } + + /* Success */ + *result = v; + return true; +} diff --git a/src/backend/utils/adt/jsonpath_scan.l b/src/backend/utils/adt/jsonpath_scan.l index e4fadcc2e69..92b3e9be926 100644 --- a/src/backend/utils/adt/jsonpath_scan.l +++ b/src/backend/utils/adt/jsonpath_scan.l @@ -434,10 +434,13 @@ static const JsonPathKeyword keywords[] = { {7, false, INTEGER_P, "integer"}, {7, false, STR_REPLACE_P, "replace"}, {7, false, TIME_TZ_P, "time_tz"}, + {7, false, TSMATCH_P, "tsmatch"}, {7, false, UNKNOWN_P, "unknown"}, {8, false, DATETIME_P, "datetime"}, {8, false, KEYVALUE_P, "keyvalue"}, + {8, false, TSCONFIG_P, "tsconfig"}, {9, false, TIMESTAMP_P, "timestamp"}, + {9, false, TSQUERYPARSER_P, "tsqparser"}, {10, false, LIKE_REGEX_P, "like_regex"}, {10, false, STR_SPLIT_PART_P, "split_part"}, {12, false, TIMESTAMP_TZ_P, "timestamp_tz"}, diff --git a/src/include/utils/jsonpath.h b/src/include/utils/jsonpath.h index 8d27206e242..89753bf8bc7 100644 --- a/src/include/utils/jsonpath.h +++ b/src/include/utils/jsonpath.h @@ -104,6 +104,7 @@ typedef enum JsonPathItemType jpiLast, /* LAST array subscript */ jpiStartsWith, /* STARTS WITH predicate */ jpiLikeRegex, /* LIKE_REGEX predicate */ + jpiTsMatch, /* TSMATCH predicate */ jpiBigint, /* .bigint() item method */ jpiBoolean, /* .boolean() item method */ jpiDate, /* .date() item method */ @@ -196,6 +197,15 @@ typedef struct JsonPathItem int32 patternlen; uint32 flags; } like_regex; + struct + { + int32 doc; + char *tsquery; + uint32 tsquery_len; + int32 tsconfig; + char *tsqparser; + uint32 tsqparser_len; + } tsmatch; } content; } JsonPathItem; @@ -274,6 +284,15 @@ struct JsonPathParseItem uint32 len; char *val; /* could not be not null-terminated */ } string; + struct + { + JsonPathParseItem *doc; + char *tsquery; + uint32 tsquery_len; + JsonPathParseItem *tsconfig; + char *tsqparser; + uint32 tsqparser_len; + } tsmatch; } value; }; diff --git a/src/test/regress/expected/jsonb_jsonpath.out b/src/test/regress/expected/jsonb_jsonpath.out index afa6c4cb529..ac78becc305 100644 --- a/src/test/regress/expected/jsonb_jsonpath.out +++ b/src/test/regress/expected/jsonb_jsonpath.out @@ -4891,3 +4891,61 @@ ORDER BY s1.num, s2.num; {"s": "B"} | {"s": "B"} | false | true | true | true | false (144 rows) +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "fly" tsconfig "english")'); + jsonb_path_query +------------------ +(0 rows) + +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "english")'); + jsonb_path_query +------------------ + "running" + "runs" + "run" +(3 rows) + +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "simple")'); + jsonb_path_query +------------------ + "run" +(1 row) + +select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "english")'); + jsonb_path_query +------------------ + "postgres" + "POSTGRES" +(2 rows) + +select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "simple")'); + jsonb_path_query +------------------ + "postgres" + "POSTGRES" +(2 rows) + +-- in the default tsqparser (to_tsquery) spaces are not allowed, so this should fail for syntax +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english")'); +ERROR: syntax error in tsquery: "fast car" +-- if we specify "w" however it should be ok +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english" tsqparser "w")'); + jsonb_path_query +------------------ + "fast car" + "super fast car" +(2 rows) + +-- it should also be ok if we change to a valid to_tsquery +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast & car" tsconfig "english")'); + jsonb_path_query +------------------ + "fast car" + "super fast car" +(2 rows) + +select jsonb_path_query('["fat cat", "cat fat", "fat rats"]', 'lax $[*] ? (@ tsmatch "fat & rat" tsconfig "english")'); + jsonb_path_query +------------------ + "fat rats" +(1 row) + diff --git a/src/test/regress/expected/jsonpath.out b/src/test/regress/expected/jsonpath.out index ea971e79854..ba437085405 100644 --- a/src/test/regress/expected/jsonpath.out +++ b/src/test/regress/expected/jsonpath.out @@ -1480,3 +1480,65 @@ FROM unnest(ARRAY['$ ? (@ like_regex "pattern" flag "smixq")'::text, 1a | f | 42601 | trailing junk after numeric literal at or near "1a" of jsonpath input | | (5 rows) +-- tsmatch (Full Text Search) +-- basic success +select '$ ? (@ tsmatch "simple")'::jsonpath; + jsonpath +------------------------ + $?(@ tsmatch "simple") +(1 row) + +select '$ ? (@ tsmatch "running" tsconfig "english")'::jsonpath; + jsonpath +-------------------------------------------- + $?(@ tsmatch "running" tsconfig "english") +(1 row) + +-- w/out tsconfig and tsqparser +select '$ ? (@ tsmatch "fast & furious" tsconfig "simple")'::jsonpath; + jsonpath +-------------------------------------------------- + $?(@ tsmatch "fast & furious" tsconfig "simple") +(1 row) + +select '$ ? (@ tsmatch "fast & furious" tsconfig "simple" tsqparser "w")'::jsonpath; + jsonpath +---------------------------------------------------------------- + $?(@ tsmatch "fast & furious" tsconfig "simple" tsqparser "w") +(1 row) + +-- tsconfig and tsqparser can appear in any order +select '$ ? (@ tsmatch "fast & furious" tsqparser "w" tsconfig "simple" )'::jsonpath; + jsonpath +---------------------------------------------------------------- + $?(@ tsmatch "fast & furious" tsconfig "simple" tsqparser "w") +(1 row) + +select '$ ? (@ tsmatch "fast & furious" tsqparser "w")'::jsonpath; + jsonpath +---------------------------------------------- + $?(@ tsmatch "fast & furious" tsqparser "w") +(1 row) + +select '$[*] ? (@.title tsmatch "god" && @.rating > 5)'::jsonpath; + jsonpath +-------------------------------------------------- + $[*]?(@."title" tsmatch "god" && @."rating" > 5) +(1 row) + +select '$ ? (@ tsmatch $pattern)'::jsonpath; +ERROR: syntax error at or near "$pattern" of jsonpath input +LINE 1: select '$ ? (@ tsmatch $pattern)'::jsonpath; + ^ +-- only string literals (no variables) are allowed for tsquery +select '$ ? (@ tsmatch $var tsconfig "english")'::jsonpath; +ERROR: syntax error at or near "$var" of jsonpath input +LINE 1: select '$ ? (@ tsmatch $var tsconfig "english")'::jsonpath; + ^ +-- if a tsconfig doesn't exist it should parse nonetheless (executor will fail it) +select '$ ? (@ tsmatch "running" tsconfig "wrongconfig")'::jsonpath; + jsonpath +------------------------------------------------ + $?(@ tsmatch "running" tsconfig "wrongconfig") +(1 row) + diff --git a/src/test/regress/sql/jsonb_jsonpath.sql b/src/test/regress/sql/jsonb_jsonpath.sql index d3a38c57791..8698021ba2b 100644 --- a/src/test/regress/sql/jsonb_jsonpath.sql +++ b/src/test/regress/sql/jsonb_jsonpath.sql @@ -1253,3 +1253,16 @@ SELECT jsonb_path_query_first(s1.j, '$.s > $s', vars => s2.j) gt FROM str s1, str s2 ORDER BY s1.num, s2.num; + +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "fly" tsconfig "english")'); +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "english")'); +select jsonb_path_query('[null, 1, "running", "runs", "ran", "run", "runner", "jogging"]', 'lax $[*] ? (@ tsmatch "run" tsconfig "simple")'); +select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "english")'); +select jsonb_path_query('[null, 1, "PostgreSQL", "postgres", "POSTGRES", "database"]', 'lax $[*] ? (@ tsmatch "Postgres" tsconfig "simple")'); +-- in the default tsqparser (to_tsquery) spaces are not allowed, so this should fail for syntax +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english")'); +-- if we specify "w" however it should be ok +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast car" tsconfig "english" tsqparser "w")'); +-- it should also be ok if we change to a valid to_tsquery +select jsonb_path_query('["fast car", "super fast car", "fast and furious", "slow car"]', 'lax $[*] ? (@ tsmatch "fast & car" tsconfig "english")'); +select jsonb_path_query('["fat cat", "cat fat", "fat rats"]', 'lax $[*] ? (@ tsmatch "fat & rat" tsconfig "english")'); diff --git a/src/test/regress/sql/jsonpath.sql b/src/test/regress/sql/jsonpath.sql index 44178d8b45a..d111cee6264 100644 --- a/src/test/regress/sql/jsonpath.sql +++ b/src/test/regress/sql/jsonpath.sql @@ -306,3 +306,22 @@ FROM unnest(ARRAY['$ ? (@ like_regex "pattern" flag "smixq")'::text, '00', '1a']) str, LATERAL pg_input_error_info(str, 'jsonpath') as errinfo; + +-- tsmatch (Full Text Search) + +-- basic success +select '$ ? (@ tsmatch "simple")'::jsonpath; +select '$ ? (@ tsmatch "running" tsconfig "english")'::jsonpath; +-- w/out tsconfig and tsqparser +select '$ ? (@ tsmatch "fast & furious" tsconfig "simple")'::jsonpath; +select '$ ? (@ tsmatch "fast & furious" tsconfig "simple" tsqparser "w")'::jsonpath; +-- tsconfig and tsqparser can appear in any order +select '$ ? (@ tsmatch "fast & furious" tsqparser "w" tsconfig "simple" )'::jsonpath; +select '$ ? (@ tsmatch "fast & furious" tsqparser "w")'::jsonpath; +select '$[*] ? (@.title tsmatch "god" && @.rating > 5)'::jsonpath; +select '$ ? (@ tsmatch $pattern)'::jsonpath; + +-- only string literals (no variables) are allowed for tsquery +select '$ ? (@ tsmatch $var tsconfig "english")'::jsonpath; +-- if a tsconfig doesn't exist it should parse nonetheless (executor will fail it) +select '$ ? (@ tsmatch "running" tsconfig "wrongconfig")'::jsonpath; -- 2.53.0 ^ permalink raw reply [nested|flat] 5+ messages in thread
end of thread, other threads:[~2026-04-05 09:15 UTC | newest] Thread overview: 5+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2026-02-27 05:59 Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search Florents Tselai <[email protected]> 2026-03-02 03:44 ` Chao Li <[email protected]> 2026-04-03 09:56 ` Florents Tselai <[email protected]> 2026-04-04 09:38 ` Florents Tselai <[email protected]> 2026-04-05 09:15 ` Florents Tselai <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox