Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vwuDE-00381a-2l for pgsql-hackers@arkaria.postgresql.org; Mon, 02 Mar 2026 03:44:53 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vwuDD-00FE8O-2a for pgsql-hackers@arkaria.postgresql.org; Mon, 02 Mar 2026 03:44:51 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vwuDD-00FE8F-1I for pgsql-hackers@lists.postgresql.org; Mon, 02 Mar 2026 03:44:51 +0000 Received: from mail-pf1-x42a.google.com ([2607:f8b0:4864:20::42a]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vwuDA-00000001xsa-0OxW for pgsql-hackers@lists.postgresql.org; Mon, 02 Mar 2026 03:44:51 +0000 Received: by mail-pf1-x42a.google.com with SMTP id d2e1a72fcca58-824b32875e7so1821349b3a.3 for ; Sun, 01 Mar 2026 19:44:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1772423086; x=1773027886; darn=lists.postgresql.org; h=references:to:cc:in-reply-to:date:subject:mime-version:message-id :from:from:to:cc:subject:date:message-id:reply-to; bh=anxinTq45q0sdSvNXTRxM9+BJ3Ur/of4iHoexCbOVRs=; b=TbYBm073cNv7PJbiyMuw0+HrNKvsjZqOjtGYXrs9mli1toxxrcpjmurICBbS5nz8ka z6dvT1EhMve/hBgHtYQGRiLZ7/Yzz3dLDK5Dr7FOZLeDWlLmZgD7OdYVTKtU4ZbDyWfV UAl2lE0kLUvaacHV2u77d70JYKr07w+tORiELKnWBaUH935BpRuAai3V0rAIupjL1VTp +72pxF4vJFTSOHTBEDPRztjP5eNV8OhK2KwWfxoyjjyTjBGqxVAJx7JW6QsFiU5PF2x5 bX7dL91TiAXEblx21N6kDtJRjDRRmrgzyhiX8tO2qNr2yp71G/UsF246DjE+zjYzqHH/ IHfA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1772423086; x=1773027886; h=references:to:cc:in-reply-to:date:subject:mime-version:message-id :from:x-gm-gg:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=anxinTq45q0sdSvNXTRxM9+BJ3Ur/of4iHoexCbOVRs=; b=qp91X/fDpN/yuSJt5h4o9bRY5/ZJxkEHn3eu0Qq1InjLRVHzShCY528bUivfqtBvxe 5BjxzldyCcvCfrOprw6F6XWdctmXhq8i5e2YKt/mV/eaYVOfDAjanF943tejqN3WFgaf HfzwUyF8w+0rJ50tUzzgk2pIDsPU6BxbKcOQ2xF/yyhmMBxy2SogW+0zeum4oDIejoHH jiTnAmUh6uZE62EwHNrneZvHkWcWAPqzANpyDtYAZV0iWmdCtyEB9hMHbfiuzefnMB8j LtZl+qG7msQRb3wW3OLkOvit8sPiOLTkIuFEDmALsAr8Lgu6JwzbMyUrv9qcy3JnX8zf 2jeA== X-Gm-Message-State: AOJu0YxIKbUBuIKlgOoGczm1EFFIPd+F5siEHTLBCDXiWXN6SZWc5Bar b66eagOlEM+5IxWEPIGmRiIBL1UO4rePsQuZ+n34NZX60DLMvTKusalmGramRw6P9hs= X-Gm-Gg: ATEYQzxJZuek/b4D4wYiy57YRX5dWe/xJ2bqLNKKU2xWa5R7V0AHnDAuKx3flZlCISo zdgXEP3ma6VMnZKArwh62/wDoju1pD7lVZAX9jCNNkGA0Cz0yFcflAKK5Teb8+qQwtzqHxjCelR SY7QvoL+1lzPTxZDtlN1/FW13E8Dk4ABahXXBLgg+9VdSIQEDIg3VbxQS9tczjS+5v8xLwSK4Z2 fOMuP+bR/p4tOZ5pnkOyGTwo5MwLCzKtHP6VLBZbI7vI4eBGJOQM5xM4skcWMnt1HMR2BWES/sD 8EWdznOoLOFUYUZiz/QoIqAe50BsD6rFlY2UDpIlVvWzKekcwcTYsqoNPIIXOMj3x9Emrcx7JZz N3ShOS1PpyThkRjLrQ/F5fsH0TWpLZ+o35sOpufBuvZm5MGF1APz4Lvq9O5v3RoYP4zc+k3H/gy y7hZdlG4S19U9yHDZuG7K2GLvZaaowX5M= X-Received: by 2002:a05:6a00:a28:b0:823:f96:63bb with SMTP id d2e1a72fcca58-8274da03227mr9562801b3a.52.1772423086229; Sun, 01 Mar 2026 19:44:46 -0800 (PST) Received: from smtpclient.apple ([203.76.245.26]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-82739db4955sm12056380b3a.29.2026.03.01.19.44.44 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sun, 01 Mar 2026 19:44:45 -0800 (PST) From: Chao Li Message-Id: Content-Type: multipart/mixed; boundary="Apple-Mail=_A4920CB7-CA63-4626-B702-126AF5829B9E" Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3864.400.21\)) Subject: Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search Date: Mon, 2 Mar 2026 11:44:12 +0800 In-Reply-To: Cc: pgsql-hackers To: Florents Tselai References: X-Mailer: Apple Mail (2.3864.400.21) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --Apple-Mail=_A4920CB7-CA63-4626-B702-126AF5829B9E Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 > On Feb 27, 2026, at 13:59, Florents Tselai = wrote: >=20 >=20 >=20 > On Thu, Feb 26, 2026 at 8:48=E2=80=AFAM Chao Li = wrote: >=20 >=20 > > On Feb 1, 2026, at 19:02, Florents Tselai = wrote: > >=20 > >=20 > >=20 > >=20 > > On Mon, Jan 26, 2026 at 7:22=E2=80=AFPM Florents Tselai = wrote: > > Hi, > >=20 > > in real-life I work a lot with json & fts search, here's a feature = I've always wished I had, > > but never tackle it. Until yesterday that is. > >=20 > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user =3D=3D "Alice" = && @.body tsmatch "performance")'); > >=20 > > This patch introduces a tsmatch boolean operator to the JSONPath = engine. > > By integrating FTS natively into path expressions, > > this operator allows for high-precision filtering of nested JSONB = structures=E2=80=94 > > solving issues with structural ambiguity and query complexity. > >=20 > > Currently, users must choose between two suboptimal paths for = FTS-ing nested JSON: > > - Imprecise Global Indexing > > jsonb_to_tsvector aggregates text into a flat vector. > > This ignores JSON boundaries, leading to false positives when the = same key (e.g., "body") > > appears in different contexts (e.g., a "Product Description" vs. a = "Customer Review"). > >=20 > > - Complex SQL Workarounds > > Achieving 100% precision requires unnesting the document via = jsonb_array_elements and LATERAL joins. > > This leads to verbose SQL and high memory overhead from generating = intermediate heap tuples. > >=20 > > One of the most significant advantages of tsmatch is its ability to = participate in multi-condition predicates > > within the same JSON object - something jsonb_to_tsvector cannot do. > >=20 > > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user =3D=3D "Alice" = && @.body tsmatch "performance")'); > >=20 > > In a flat vector, the association between "Alice" and "performance" = is lost. > > tsmatch preserves this link by evaluating the FTS predicate in-place = during path traversal. > >=20 > > While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly = define an FTS operator, > > tsmatch is architecturally modeled after the standard-defined = like_regex. > >=20 > > The implementation follows the like_regex precedent: > > it is a non-indexable predicate that relies on GIN path-matching for = pruning and heap re-checks for precision. > > Caching is scoped to the JsonPathExecContext, > > ensuring 'compile-once' efficiency per execution without violating = the stability requirements of prepared statements. > >=20 > > This initial implementation uses plainto_tsquery. > > However, the grammar is designed to support a "mode" flag (similar = to like_regex flags) > > in future iterations to toggle between to_tsquery, = websearch_to_tsquery, and phraseto_tsquery. > >=20 > > Here's a v2, that implements the tsqparser clause=20 > >=20 > > So this should now work too=20 > >=20 > > select jsonb_path_query_array('["fast car", "slow car", "fast and = furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w") = >=20 > Hi Florents, >=20 > Grant pinged me about this. I can review it in coming days. Can you = please rebase it? I failed to apply to current master. Also, the CF = reported a failure test case, please take a look. >=20 > Hi Evan,=20 > thanks for having a look. The conflict was due to the intro of = pg_fallthrough. Not related to this patch . >=20 > I noticed the failure too, but I'm having a hard time reproducing it = tbh. > This fails for Debian Trixie with Meson. The same with Autoconf = passes... >=20 > https://github.com/Florents-Tselai/postgres/runs/65098077968 >=20 >=20 >=20 >=20 > I have reviewed v3 and traced a few test cases. Here comes my review = comments: 1=20 ``` + string tsmatch = string + tsconfig = string + tsqparser = string ``` For all =E2=80=9Creplaceable=E2=80=9D, instead of =E2=80=9Cstring=E2=80=9D= , would it be better to use something more descriptive? For example: ``` json_string tsmatch = query tsconfig = config_name tsqparser = parser_mode ``` 2 - jsonpath_gram.y ``` +static bool makeItemTsMatch(JsonPathParseItem *doc, + JsonPathString = *tsquery, + JsonPathString = *tsconfig, + JsonPathString = *tsquery_parser, + = JsonPathParseItem ** result, + struct Node = *escontext); ``` Format Nit: Looking at the existing code, the J in the second and = following lines, should be placed in the same column as the J in the = first line. 3 - jsonpath_gram.y ``` + | expr TSMATCH_P STRING_P + { + JsonPathParseItem *jppitem; + /* Pass NULL for tsconfig (3rd) and NULL for = tsquery_parser (4th) */ + if (! makeItemTsMatch($1, &$3, NULL, NULL, &jppitem, = escontext)) + YYABORT; + $$ =3D jppitem; + } + | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P + { + JsonPathParseItem *jppitem; + /* Pass NULL for tsquery_parser (4th) */ + if (! makeItemTsMatch($1, &$3, &$5, NULL, &jppitem, = escontext)) + YYABORT; + $$ =3D jppitem; + } + | expr TSMATCH_P STRING_P TSQUERYPARSER_P STRING_P + { + JsonPathParseItem *jppitem; + /* Pass NULL for tsconfig (3rd) */ + if (! makeItemTsMatch($1, &$3, NULL, &$5, &jppitem, = escontext)) + YYABORT; + $$ =3D jppitem; + } + | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P TSQUERYPARSER_P = STRING_P + { + JsonPathParseItem *jppitem; + if (! makeItemTsMatch($1, &$3, &$5, &$7, &jppitem, = escontext)) + YYABORT; + $$ =3D jppitem; + } ``` Feels a little redundant, repeatedly calls makeItemTsMatch. See the = attached diff for a simplification. But my version is a bit longer in = terms of number of lines. So, up to you. 4 - jsonpath_gram.y ``` +static bool +makeItemTsMatch(JsonPathParseItem *doc, + JsonPathString *tsquery, + JsonPathString *tsconfig, + JsonPathString *tsquery_parser, + JsonPathParseItem **result, + struct Node *escontext) ``` makeItemTsMatch doesn=E2=80=99t need to return a bool. Actually, now it = never returns false, instead, it just ereport(ERROR). 5 - jsonpath.h ``` + struct + { + int32 doc; + char *tsquery; + uint32 tsquerylen; + int32 tsconfig; + char *tsqparser; + uint32 tsqparser_len; + } tsmatch; + struct + { + JsonPathParseItem *doc; + char *tsquery; + uint32 tsquerylen; + JsonPathParseItem *tsconfig; + char *tsqparser; + uint32 tsqparser_len; + } tsmatch; } value; ``` tsquerylen doesn=E2=80=99t have _ before len, and tsqparser_len, would = it be better to make naming conventions consistent in the same = structure? 6 - jsonpath_exec.c ``` #include "tsearch/ts_utils.h" #include "tsearch/ts_cache.h" #include "utils/regproc.h" #include "catalog/namespace.h" static JsonPathBool executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg, void *param) ``` Why don=E2=80=99t put these includes to the header section together with = other includes? 7 - jsonpath_exec.c ``` + else + { + /* + * Fallback or Error for unknown flags = (should be caught by + * parser) + */ + ereport(ERROR, + = (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("unrecognized = tsqparser flag"))); + } ``` This =E2=80=9Celse=E2=80=9D should never be entered as the same check = has been done by makeItemTsMatch. So, maybe just use an Assert here, or = pg_unreachable(). 8 - jsonpath_exec.c ``` + /* Setup Context (Run ONLY once per predicate) */ + if (!cxt->initialized) ``` While tracing this SQL: ``` evantest=3D# SELECT '{"tags": ["running", "jogging"]}'::jsonb evantest-# @@ '$.tags[*] ? (@ tsmatch "run" tsconfig "english")'; ?column? ---------- (1 row) ``` I noticed that, when process =E2=80=9Cjogging=E2=80=9D, cxt->initialized = is still false, meaning that, the cxt is not reused across array items. = Given the same tsconfig should apply to all array items, I think cxt = should be reused. 9 - jsonpath_exec.c ``` + /* Select Parser and Compile Query */ + parser_mode =3D jsp->content.tsmatch.tsqparser; + parser_len =3D jsp->content.tsmatch.tsqparser_len; + + if (parser_len > 0) + { + /* Dispatch based on flag */ + if (pg_strncasecmp(parser_mode, "pl", = parser_len) =3D=3D 0) ``` Nit: parser_mode is only used inside if (parser_len > 0), it can be = defined inside the =E2=80=9Cif=E2=80=9D. 10 - jsonpath_gram.y ``` + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("invalid tsquery_parser = value: \"%s\"", tsquery_parser->val), + errhint("Valid values are = \"pl\", \"ph\", and \"w\"."))); ``` When tested a case with an invalid parser, I got: ``` evantest=3D# SELECT '{"tags": ["running", "jogging"]}'::jsonb = = @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" = tsqparser "pss")'; ERROR: invalid tsquery_parser value: = "pss=7F=7F=7F=7F=7F=7F=7F=7F=7F=7F=7F=7F=7F=7F=7F=7F=7F=7F=7F=7F=7F=7F=7F=7F= =7F=7F=7F=7F=7F@" LINE 2: @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqpar... ^ HINT: Valid values are "pl", "ph", and "w". ``` You can see the it shows a bad looking invalid value. I think that=E2=80=99= s because tsquery_parser->val is not NULL terminated. I fixed this = problem with: ``` errmsg("invalid tsquery_parser value: \"%.*s\"", (int) = tsquery_parser->len, tsquery_parser->val), ``` This change is also included in the attached diff file. 11 - jsonpath.c ``` + if (printBracketes) + appendStringInfoChar(buf, ')'); + break; + if (printBracketes) appendStringInfoChar(buf, ')'); ``` Duplicate code. Looks like a copy-pasto. 12 - jsonpath.c ``` + /* Write the Main Query String */ + appendBinaryStringInfo(buf, + = &item->value.tsmatch.tsquerylen, + = sizeof(item->value.tsmatch.tsquerylen)); + appendBinaryStringInfo(buf, + = item->value.tsmatch.tsquery, + = item->value.tsmatch.tsquerylen); + appendStringInfoChar(buf, '\0'); ``` I don=E2=80=99t think we need to manually append =E2=80=98\0=E2=80=99 = after appendBinaryStringInfo. Looking at the header comment of = appendBinaryStringInfo, it says that a trailing null will be added. ``` /* * appendBinaryStringInfo * * Append arbitrary binary data to a StringInfo, allocating more space * if necessary. Ensures that a trailing null byte is present. */ void appendBinaryStringInfo(StringInfo str, const void *data, int datalen) ``` Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/ --Apple-Mail=_A4920CB7-CA63-4626-B702-126AF5829B9E Content-Disposition: attachment; filename=nocfbot_jsonpath_gram_y.diff Content-Type: application/octet-stream; x-unix-mode=0644; name="nocfbot_jsonpath_gram_y.diff" Content-Transfer-Encoding: 7bit diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y index a2acf2660f6..4d5837e2459 100644 --- a/src/backend/utils/adt/jsonpath_gram.y +++ b/src/backend/utils/adt/jsonpath_gram.y @@ -78,6 +78,13 @@ static bool makeItemTsMatch(JsonPathParseItem *doc, JsonPathParseItem *value; JsonPathParseResult *result; JsonPathItemType optype; + struct + { + bool has_tsconfig; + JsonPathString tsconfig; + bool has_tsqparser; + JsonPathString tsqparser; + } tsmatch_opts; bool boolean; int integer; } @@ -100,6 +107,8 @@ static bool makeItemTsMatch(JsonPathParseItem *doc, datetime_template opt_datetime_template csv_elem datetime_precision opt_datetime_precision +%type tsmatch_opts + %type accessor_expr csv_list opt_csv_list %type index_list @@ -192,36 +201,44 @@ predicate: YYABORT; $$ = jppitem; } - | expr TSMATCH_P STRING_P + | expr TSMATCH_P STRING_P tsmatch_opts { JsonPathParseItem *jppitem; - /* Pass NULL for tsconfig (3rd) and NULL for tsquery_parser (4th) */ - if (! makeItemTsMatch($1, &$3, NULL, NULL, &jppitem, escontext)) - YYABORT; + + if (! makeItemTsMatch($1, &$3, + $4.has_tsconfig ? &$4.tsconfig : NULL, + $4.has_tsqparser ? &$4.tsqparser : NULL, + &jppitem, escontext)) + YYABORT; + $$ = jppitem; } - | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P + ; + +tsmatch_opts: + /* EMPTY */ { - JsonPathParseItem *jppitem; - /* Pass NULL for tsquery_parser (4th) */ - if (! makeItemTsMatch($1, &$3, &$5, NULL, &jppitem, escontext)) - YYABORT; - $$ = jppitem; + $$.has_tsconfig = false; + $$.has_tsqparser = false; } - | expr TSMATCH_P STRING_P TSQUERYPARSER_P STRING_P + | TSCONFIG_P STRING_P { - JsonPathParseItem *jppitem; - /* Pass NULL for tsconfig (3rd) */ - if (! makeItemTsMatch($1, &$3, NULL, &$5, &jppitem, escontext)) - YYABORT; - $$ = jppitem; + $$.has_tsconfig = true; + $$.tsconfig = $2; + $$.has_tsqparser = false; } - | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P TSQUERYPARSER_P STRING_P + | TSQUERYPARSER_P STRING_P { - JsonPathParseItem *jppitem; - if (! makeItemTsMatch($1, &$3, &$5, &$7, &jppitem, escontext)) - YYABORT; - $$ = jppitem; + $$.has_tsconfig = false; + $$.has_tsqparser = true; + $$.tsqparser = $2; + } + | TSCONFIG_P STRING_P TSQUERYPARSER_P STRING_P + { + $$.has_tsconfig = true; + $$.tsconfig = $2; + $$.has_tsqparser = true; + $$.tsqparser = $4; } ; @@ -762,7 +779,7 @@ makeItemTsMatch(JsonPathParseItem *doc, { ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("invalid tsquery_parser value: \"%s\"", tsquery_parser->val), + errmsg("invalid tsquery_parser value: \"%.*s\"", (int) tsquery_parser->len, tsquery_parser->val), errhint("Valid values are \"pl\", \"ph\", and \"w\"."))); } --Apple-Mail=_A4920CB7-CA63-4626-B702-126AF5829B9E--