Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vBlJ0-004sNA-K2 for pgsql-hackers@arkaria.postgresql.org; Thu, 23 Oct 2025 02:43:58 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1vBlIz-002rf5-6X for pgsql-hackers@arkaria.postgresql.org; Thu, 23 Oct 2025 02:43:56 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vBlIy-002rej-RP for pgsql-hackers@lists.postgresql.org; Thu, 23 Oct 2025 02:43:55 +0000 Received: from mail-pf1-x435.google.com ([2607:f8b0:4864:20::435]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vBlIv-003ikk-1H for pgsql-hackers@lists.postgresql.org; Thu, 23 Oct 2025 02:43:55 +0000 Received: by mail-pf1-x435.google.com with SMTP id d2e1a72fcca58-77f5d497692so394158b3a.1 for ; Wed, 22 Oct 2025 19:43:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1761187431; x=1761792231; darn=lists.postgresql.org; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=1uG34MmtmtZrNaDxbEtDNDd63WOg14HgwLqFn1QuJ5k=; b=CJY4FqOFLHftCuCXm+0DyuEAYlfAfxbbfydpsyIuK4Y97AftR5MK8ruGtsiZgTluHA 6v6rSjpgD6vWO1sjgb9xzS5I37gchG4xNDJmF62NYUrp+9YItBO5vrf9qGOZa7Bq/Np0 7pIs6RVIzuOv70xo64GeJlEj96lExa0JSzxyyqkannNbVtsAzaLqvCtdwinvdl0Saj5B 9O3kAfPNUhfa+hLUV0M7bOJb3Tytu4XpLg2Q2mPmNHsLsN0F3sxdEKG6QWxz73u1OKwv LDydaGKiiDj9A0Xnv2u98DcZBb+2EjWE7YKoeT7lG79C/xpZiBqvIaQJy36/ogcDOb55 vQJA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1761187431; x=1761792231; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=1uG34MmtmtZrNaDxbEtDNDd63WOg14HgwLqFn1QuJ5k=; b=i4GfTX2XwbEHZnNFGdtS58VkDaJWdQcIT5h8Uj8Pp0IobAFYauAbTu6drqqABsqd3D 4GooEnEf81kVCagTF6yg+ACmBLgMW7m4KtZqP2PpGvxBUK0sC/tDapR2uGCMizUotTYX q1k2UjG2XbiZfWz5vKHefhVUNn0InV/2MEnWK6i4E6/KVNzBaUIzNsqMi4E2Ddf1nova Nv9nB1ZrgEEAdEJt04UEC50z4eYd2lMZfgZdRsl2OHzjfap411kyyfFvhd/mFdluYXTa ie0qI7i7tR4rAz87lr4FtpuNbfH0DSfOelDDXx2f2g44oXmdteYfnHgzooULwwxn27aE fLHA== X-Forwarded-Encrypted: i=1; AJvYcCUKv0g6v/SHitSGO8T2bXWkSa2fl/Tu+xDx/hDBttO2NKJUwbq/ljO29ebl6dfnG2FFM+j5rb58AgbcRpzu@lists.postgresql.org X-Gm-Message-State: AOJu0YwNkmP0MRp5A9ZxpcZvSAbTJmtVpNwCj1LJlLs6wsSZjl1BT+4f hohToiHEYk8NJ8tiBUnEeIvelQCIIwgz9y1dFcgn8dxkL5P47oDr0qeU0wkrQuIiOqs= X-Gm-Gg: ASbGncuWD7hTCF9Nkg7XdcDbH+YLlllCS2TewF3DjYPNDt11b30lHNgcJOm82CeeiEP VRrz9nGxnSyThg2ItqAmi6lbeQDh2ZmTwnMt4U2E//yfMYPKxy5nHNtJcStJ6PNR6DuOjkdi7b2 U6fhEew8G0LuTdgm+3aaB2CrbNHACLXs55iArA4oFeqcnwcjLQQWsRPeF5sqr31yPgJ0eebm6eu nvF3LtH9TX2C9t6G1pORxVMo9PYGR5zqwbqHPRq5Aia6NlokI6prtySp3T5gUFABkFb2z9pvFqL 5wMpg0RGhzstfZZOkmJXDUSQzHuQBj/giWrrhZJ04KDinW+7afN/65f45R+lNmE8c1HJjZIyo/B Y7BSuLcO5x8uugSFgS21Fg4maQjALvd8/8L1+8Qxah7WEdtPyCWgdOxSmxBiZCisrlZzCGjBszW AFjI60huvc4UY= X-Google-Smtp-Source: AGHT+IGhD25MW1FVcMEYBHUdBXuKOY/MMKszAo/Q7Mhh0/XB5FC9XdTMFOisKU/e+18y/AxglYaPpQ== X-Received: by 2002:a05:6a21:4d17:b0:334:a9f2:bc8e with SMTP id adf61e73a8af0-334a9f2bd11mr27449057637.6.1761187430787; Wed, 22 Oct 2025 19:43:50 -0700 (PDT) Received: from smtpclient.apple ([170.178.170.211]) by smtp.gmail.com with ESMTPSA id 41be03b00d2f7-b6cf4e2d787sm527877a12.29.2025.10.22.19.43.47 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 22 Oct 2025 19:43:50 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3826.700.81\)) Subject: Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part From: Chao Li In-Reply-To: <202510210803.3b3vfvygxtjl@alvherre.pgsql> Date: Thu, 23 Oct 2025 10:43:13 +0800 Cc: "David E. Wheeler" , Florents Tselai , Tom Lane , Peter Eisentraut , Robert Haas , Alexander Korotkov , pgsql-hackers , Andrew Dunstan Content-Transfer-Encoding: quoted-printable Message-Id: <87530674-E6B6-4C97-A704-78C7E07CF01F@gmail.com> References: <202510210803.3b3vfvygxtjl@alvherre.pgsql> To: =?utf-8?Q?=C3=81lvaro_Herrera?= X-Mailer: Apple Mail (2.3826.700.81) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi Alvora, I have reviewed and tested the patch, and got a few comments. > On Oct 21, 2025, at 16:05, =C3=81lvaro Herrera = wrote: >=20 > This was lacking rebase after the func.sgml changes. Here it is = again. > I have not reviewed it. >=20 > --=20 > =C3=81lvaro Herrera 48=C2=B001'N 7=C2=B057'E =E2=80=94 = https://www.EnterpriseDB.com/ > = 1 - jsonpath.c ``` case jpiStringFunc: return "string"; + case jpiStrReplace: + return "replace"; + case jpiStrLower: + return "lower"; + case jpiStrUpper: + return "upper"; case jpiTime: return "time"; case jpiTimeTz: @@ -914,6 +982,16 @@ jspOperationName(JsonPathItemType type) return "timestamp"; case jpiTimestampTz: return "timestamp_tz"; + case jpiStrLtrim: + return "ltrim"; + case jpiStrRtrim: + return "rtrim"; + case jpiStrBtrim: + return "btrim"; + case jpiStrInitcap: + return "initcap"; + case jpiStrSplitPart: + return "split_part"; default: elog(ERROR, "unrecognized jsonpath item type: = %d", type); return NULL; ``` I wonder if there is some consideration for the order? Feels that = jpiSttLtrim and the following jpiStrXXX should be placed above = jpiTimeXXX. 2 - jsonpath.c ``` + if (v->content.arg) + { + jspGetArg(v, &elem); + printJsonPathItem(buf, &elem, false, = false); + } ``` As there is jspGetArg(), for v->content.arg, does it make sense to add a = macro or inline function of jspHasArg(v)? 3 - jsonpath.c ``` + case jpiStrLtrim: + return "ltrim"; + case jpiStrRtrim: + return "rtrim"; + case jpiStrBtrim: + return "btrim"; ``` I know =E2=80=9Cb=E2=80=9D in =E2=80=9Cbtrim=E2=80=9D stands for = =E2=80=9Cboth=E2=80=9D, just curious why trim both side function is = named =E2=80=9Cbtrim()=E2=80=9D? In most of programming languages I am = aware of, trim() is the choice.=20 4 - jsonpath_exec.c ``` default: ; /* cant' happen */ ``` Typo: cant=E2=80=99 -> can=E2=80=99t=20 5 - jsonpath_exec.c ``` + /* Create the appropriate jb value to return */ + switch (jsp->type) + { + /* Cases for functions that return text */ + case jpiStrLower: + case jpiStrUpper: + case jpiStrReplace: + case jpiStrLtrim: + case jpiStrRtrim: + case jpiStrBtrim: + case jpiStrInitcap: + case jpiStrSplitPart: + jb->type =3D jbvString; + jb->val.string.val =3D resStr; + jb->val.string.len =3D = strlen(jb->val.string.val); + default: + ; + /* cant' happen */ + } ```=20 As =E2=80=9Cdefault=E2=80=9D clause has a comment =E2=80=9Ccan=E2=80=99t = happen=E2=80=9D, I believe =E2=80=9Cbreak=E2=80=9D is missing in the = case clause. Also, do we want to add an assert in default to report a message in case = it happens? 6 - jsonpath_exec.c ``` + resStr =3D = TextDatumGetCString(DirectFunctionCall3Coll(replace_text, + = = C_COLLATION_OID, + = = CStringGetTextDatum(tmp), + = = CStringGetTextDatum(from_str), + = = CStringGetTextDatum(to_str))); ``` For trim functions, DEFAULT_COLLATION_OID used. Why C_COLLATION_OID is = used for replace and split_part? I don=E2=80=99t see anything mentioned = in your changes to the doc (func-json.sgml). 7 - jsonpath_exec.c ``` + if (!(jb =3D getScalar(jb, jbvString))) + RETURN_ERROR(ereport(ERROR, + = (errcode(ERRCODE_INVALID_ARGUMENT_FOR_SQL_JSON_DATETIME_FUNCTION), + = errmsg("jsonpath item method .%s() can only be applied to a string", + = jspOperationName(jsp->type))))); ``` ERRCODE_INVALID_ARGUMENT_FOR_SQL_JSON_DATETIME_FUNCTION seems wrong, = this is a string function, not a date time function. 8 - jsonpath_exec.c ``` + switch (jsp->type) + { + case jpiStrLtrim: + case jpiStrRtrim: + case jpiStrBtrim: + { + char *characters_str; + int characters_len; + PGFunction func =3D NULL; + + if (jsp->content.arg) + { + switch (jsp->type) + { + case jpiStrLtrim: + func =3D ltrim; + break; + case jpiStrRtrim: + func =3D rtrim; + break; + case jpiStrBtrim: + func =3D btrim; + break; + default:; + } + jspGetArg(jsp, &elem); + if (elem.type !=3D jpiString) + elog(ERROR, "invalid = jsonpath item type for .%s() argument", jspOperationName(jsp->type)); + + characters_str =3D = jspGetString(&elem, &characters_len); + resStr =3D = TextDatumGetCString(DirectFunctionCall2Coll(func, + = = DEFAULT_COLLATION_OID, str, + = = CStringGetTextDatum(characters_str))); + break; + } + + switch (jsp->type) + { + case jpiStrLtrim: + func =3D ltrim1; + break; + case jpiStrRtrim: + func =3D rtrim1; + break; + case jpiStrBtrim: + func =3D btrim1; + break; + default:; + } + resStr =3D = TextDatumGetCString(DirectFunctionCall1Coll(func, + = = DEFAULT_COLLATION_OID, str)); + break; + } ``` The two nested =E2=80=9Cswitch (jsp->type)=E2=80=9D are quit redundant. = We can pull up the second one, and simplify the first one, something = like: ``` Switch (jsp->) { Case: .. } If (jsp->content.arg) { jspGetArg(jsp, &elem); ... } ``` 9 - jsonpath_exec.c ``` + if (elem.type !=3D jpiString) + elog(ERROR, "invalid jsonpath = item type for .replace() from"); + + from_str =3D jspGetString(&elem, = &from_len); + + jspGetRightArg(jsp, &elem); + if (elem.type !=3D jpiString) + elog(ERROR, "invalid jsonpath = item type for .replace() to"); ``` In these two elog(), do we want to log the invalid type? As I see in the = =E2=80=9Cdefault=E2=80=9D clause, jsp->type is logged: ``` + default: + elog(ERROR, "unsupported jsonpath item type: = %d", jsp->type); ``` Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/