Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1qgg34-00GlOb-9d for pgsql-hackers@arkaria.postgresql.org; Thu, 14 Sep 2023 06:41:58 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1qgg32-006Emp-8z for pgsql-hackers@arkaria.postgresql.org; Thu, 14 Sep 2023 06:41:56 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1qgg31-006Emf-Rn for pgsql-hackers@lists.postgresql.org; Thu, 14 Sep 2023 06:41:55 +0000 Received: from mail-ot1-x334.google.com ([2607:f8b0:4864:20::334]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1qgg2w-004z80-QK for pgsql-hackers@postgresql.org; Thu, 14 Sep 2023 06:41:54 +0000 Received: by mail-ot1-x334.google.com with SMTP id 46e09a7af769-6c232d36da8so78760a34.1 for ; Wed, 13 Sep 2023 23:41:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1694673708; x=1695278508; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=9TIyvg9f0ChJEMfz8FDTpw1vMzeLEts3cR982AMjHSc=; b=HCDiKsGN0PKXa+GZ2fudMp6qjJTQEtCAKGMGUja1amt5qnx8hxg3z5yXqWB+/c2+Cp 6kIUbJjjcK5pk3b0CIyuNA847fFIMoSgHv8H8AFQDsLhU5B6jHu4oNA4cTr5hnaBEx+U +wAmVwADsw8Sf77E+v0DkOI8L044IhkC14B4FEDsdCxjmYJAFqwRKtrEe+vS/KV4ctdf 7DYexpT2/U5VC58Oe4WKO2oHe6Bp2RTXqyXsQTMYOoQUYzacDK8wLhFOU8hVL+3qmmHb zJ36IYzF1gMtlbXyOW6yaVBF0l0A13ESxsHRWVEvT4/B95Ddgga9N38lWxbRJyXLLn9S STSw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694673708; x=1695278508; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=9TIyvg9f0ChJEMfz8FDTpw1vMzeLEts3cR982AMjHSc=; b=YV1thEhov9gb2A17CajsMWAK9Jt5dU7QZpb+DUTIMSroIncxCw5DTQKlPKgvvJ6KGG vfxzLGOFC74OK/J0LkIk2T5zuxbYrVBpXXID+agD3evU2wcXMSCvDVl3QWmhhbA5Rukv dwWGdtYGhFFqXZyIG8E86Qpd95oRv6knTtGVrHCY3pyHDzvPZ5vqEWz04rWZXnn5lB5E awaifUUPrFJWvFHnClaAUQE4JrOrW4usQk0EnydwAwEtABvSdyKHW5YhMHXp+h79tbL0 OfkUN+9LPnocQ3a/yDdbKo7mpjp4M7lN6olAFCmop6uU7ZIljN34N3Sl2YpehfobDRNU DU/A== X-Gm-Message-State: AOJu0YyiN7ullyINvipAz9fIyYjoZYSOFJlHOtylL0yRy/qo00eMyb62 QYgh4rzkRiIgXMPbBDXC69fRClF0twMuQk2ec2w= X-Google-Smtp-Source: AGHT+IFy9lDdpgdaIyZmpqk1hAZ2EYs2vgXnhcoSx2Kyf7sDnLn9/SbbcWsb7GvlV1MyBb4cAWZfWZpY3msCg1dxU2w= X-Received: by 2002:a05:6870:b487:b0:1b0:b13:c16 with SMTP id y7-20020a056870b48700b001b00b130c16mr5807773oap.2.1694673708539; Wed, 13 Sep 2023 23:41:48 -0700 (PDT) MIME-Version: 1.0 References: <3507485.1691090027@sss.pgh.pa.us> <900892914fdc3f477b101d699efb40e0@anastigmatix.net> <903341.1692022214@sss.pgh.pa.us> <4b97f1a1dd9b6e45443d24870d3be698@anastigmatix.net> <111272f2dc112c7becdd35ad89f6b935@anastigmatix.net> <5138c6b5fd239e7ce4e1a4e63826ac27@anastigmatix.net> <369543439e988ae43f0a6307500b27c4@anastigmatix.net> <5955e93347a7e3b1612cf7e129ae6d04@anastigmatix.net> <43a988594ac91a63dc4bb49a94303a42@anastigmatix.net> In-Reply-To: <43a988594ac91a63dc4bb49a94303a42@anastigmatix.net> From: Andy Fan Date: Thu, 14 Sep 2023 14:41:36 +0800 Message-ID: Subject: Re: Extract numeric filed in JSONB more effectively To: Chapman Flack Cc: jian he , Pavel Stehule , Tom Lane , pgsql-hackers Content-Type: multipart/alternative; boundary="00000000000068644906054bf874" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000068644906054bf874 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Thu, Sep 14, 2023 at 5:18=E2=80=AFAM Chapman Flack wrote: > On 2023-09-04 10:35, Andy Fan wrote: > > v13 attached. Changes includes: > > > > 1. fix the bug Jian provides. > > 2. reduce more code duplication without DirectFunctionCall. > > 3. add the overlooked jsonb_path_query and jsonb_path_query_first as > > candidates > > Apologies for the delay. I like the way this is shaping up. This is a great signal:) > > My first comment will be one that may be too large for this patch > (and too large to rest on my opinion alone); that's why I'm making > it first. > > It seems at first a minor point: to me it feels like a wart to have > to pass jsonb_finish_numeric (and *only* jsonb_finish_numeric) a type > oid reflecting the target type of a cast that's going to be applied > *after jsonb_finish_numeric has done its work*, and only for the > purpose of generating a message if the jsonb type *isn't numeric*, > but saying "cannot cast to" (that later target type) instead. > > I understand this is done to exactly match the existing behavior, > so what makes this a larger issue is I'm not convinced the existing > behavior is good. Therefore I'm not convinced that bending over > backward to preserve it is good. > I hesitated to do so, but I'm thinking if any postgresql user uses some code like if (errMsg.equals('old-error-message')), and if we change the error message, the application will be broken. I agree with the place for the error message, IIUC, you intend to choose not compatible with the old error message? What's not good: the places a message from cannotCastJsonbValue > are produced, there has been no attempt yet to cast anything. > The message purely tells you about whether you have the kind > of jsonb node you think you have (and array, bool, null, numeric, > object, string are the only kinds of those). If you're wrong > about what kind of jsonb node it is, you get this message. > If you're right about the kind of node, you don't get this > message, regardless of whether its value can be cast to the > later target type. For example, '32768'::jsonb::int2 produces > ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE "smallint out of range" > but that message comes from the actual int2 cast. > > IMV, what the "cannot cast jsonb foo to type %s" message really > means is "jsonb foo where jsonb bar is required" and that's what > it should say, and that message depends on nothing about any > future plans for what will be done to the jsonb bar, so it can > be produced without needing any extra information to be passed. > > I'm also not convinced ERRCODE_INVALID_PARAMETER_VALUE is a > good errcode for that message (whatever the wording). I do not > see much precedent elsewhere in the code for using > INVALID_PARAMETER_VALUE to signal this kind of "data value > isn't what you think it is" condition. Mostly it is used > when checking the kinds of parameters passed to a function to > indicate what it should do. > > There seem to be several more likely choices for an errcode > there in the 2203x range. > > But I understand that issue is not with this patch so much > as with preexisting behavior, and because it's preexisting, > there can be sound arguments against changing it. > But if that preexisting message could be changed, it would > eliminate the need for an unpleasing wart here. > > Other notes are more minor: > > + else > + /* not the desired pattern. */ > + PG_RETURN_POINTER(fexpr); > ... > + > + if (!OidIsValid(new_func_id)) > + PG_RETURN_POINTER(fexpr); > ... > + default: > + PG_RETURN_POINTER(fexpr); > > If I am reading supportnodes.h right, returning NULL is > sufficient to say no transformation is needed. > I double confirmed you are right here. Changed it to PG_RETURN_POINT(null); here in the next version. > > + FuncExpr *fexpr =3D palloc0(sizeof(FuncExpr)); > ... > + memcpy(fexpr, req->fcall, sizeof(FuncExpr)); > > Is the shallow copy necessary? If so, a comment explaining why > might be apropos. Because the copy is shallow, if there is any > concern about the lifespan of req->fcall, would there not be a > concern about its children? > All the interesting parts in the input FuncExpr are heap based, but the FuncExpr itself is stack based (I'm not sure why the fact works like this), Also based on my previously understanding, I need to return a FuncExpr original even if the function can't be simplified, so I made a shallow copy. There will be no copy at all in the following version since I returned NULL in such a case. > Is there a reason not to transform the _tz flavors of > jsonb_path_query and jsonb_path-query_first? > I misunderstood the _tz flavors return timestamp, after some deep reading of these functions, they just work at the comparisons part. so I will add them in the following version. > > - JsonbValue *v; > - JsonbValue vbuf; > + JsonbValue *v; > ... > + int i; > JsonbValue *jbvp =3D NULL; > - int i; > > Sometimes it's worth looking over a patch for changes like these, > and reverting such whitespace or position changes, if they aren't > things you want a reviewer to be squinting at. :) > Yes, I look over my patch carefully before sending it out usually, but this is an oversight. Lastly, do you have any idea about the function naming as Jian & I discussed at [1]? [1] https://www.postgresql.org/message-id/CAKU4AWqQ0hed%3DZmpT%2B7Vxnp4H9ZxQqFz= 30%3Dk%3DvvrmNe57X4dKQ%40mail.gmail.com --=20 Best Regards Andy Fan --00000000000068644906054bf874 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Thu, Sep 14, 2023 at 5:18=E2=80=AF= AM Chapman Flack <chap@anastigm= atix.net> wrote:
On 2023-09-04 10:35, Andy Fan wrote:
>=C2=A0 =C2=A0v13 attached.=C2=A0 Changes includes:
>
> 1.=C2=A0 fix the bug Jian provides.
> 2.=C2=A0 reduce more code duplication without DirectFunctionCall.
> 3.=C2=A0 add the overlooked=C2=A0 jsonb_path_query and jsonb_path_quer= y_first as
> candidates

Apologies for the delay. I like the way this is shaping up.

This is a great signal:)=C2=A0
=C2=A0
=C2=A0
My first comment will be one that may be too large for this patch
(and too large to rest on my opinion alone); that's why I'm making<= br> it first.

It seems at first a minor point: to me it feels like a wart to have
to pass jsonb_finish_numeric (and *only* jsonb_finish_numeric) a type
oid reflecting the target type of a cast that's going to be applied
*after jsonb_finish_numeric has done its work*, and only for the
purpose of generating a message if the jsonb type *isn't numeric*,
but saying "cannot cast to" (that later target type) instead.

I understand this is done to exactly match the existing behavior,
so what makes this a larger issue is I'm not convinced the existing
behavior is good. Therefore I'm not convinced that bending over
backward to preserve it is good.

I hesi= tated to do so, but I'm thinking if any postgresql user uses
= some code like=C2=A0 =C2=A0if (errMsg.equals('old-error-message')),= =C2=A0 and if we
change the error message, the application will b= e broken. I agree=C2=A0
with the place for the error message,=C2= =A0 IIUC,=C2=A0 you intend to choose
not compatible with the old = error message?=C2=A0

What's not good: the places a message from cannotCastJsonbValue
are produced, there has been no attempt yet to cast anything.
The message purely tells you about whether you have the kind
of jsonb node you think you have (and array, bool, null, numeric,
object, string are the only kinds of those). If you're wrong
about what kind of jsonb node it is, you get this message.
If you're right about the kind of node, you don't get this
message, regardless of whether its value can be cast to the
later target type. For example, '32768'::jsonb::int2 produces
ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE "smallint out of range"
but that message comes from the actual int2 cast.

IMV, what the "cannot cast jsonb foo to type %s" message really means is "jsonb foo where jsonb bar is required" and that's w= hat
it should say, and that message depends on nothing about any
future plans for what will be done to the jsonb bar, so it can
be produced without needing any extra information to be passed.

I'm also not convinced ERRCODE_INVALID_PARAMETER_VALUE is a
good errcode for that message (whatever the wording). I do not
see much precedent elsewhere in the code for using
INVALID_PARAMETER_VALUE to signal this kind of "data value
isn't what you think it is" condition. Mostly it is used
when checking the kinds of parameters passed to a function to
indicate what it should do.

There seem to be several more likely choices for an errcode
there in the 2203x range.

But I understand that issue is not with this patch so much
as with preexisting behavior, and because it's preexisting,
there can be sound arguments against changing it.=C2=A0=C2=A0
<= blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-l= eft:1px solid rgb(204,204,204);padding-left:1ex">
But if that preexisting message could be changed, it would
eliminate the need for an unpleasing wart here.

Other notes are more minor:

+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0else
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0/* not the desired pattern. */
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0PG_RETURN_POINTER(fexpr);
...
+
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (!OidIsValid(new= _func_id))
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0PG_RETURN_POINTER(fexpr);
...
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0default:
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0PG_RETURN_POINTER(fexpr);

If I am reading supportnodes.h right, returning NULL is
sufficient to say no transformation is needed.

I double confirmed you are right here.=C2=A0=C2=A0
Change= d it to PG_RETURN_POINT(null);=C2=A0 =C2=A0here in the next version.=C2=A0<= /div>

+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0FuncExpr=C2=A0 =C2= =A0 =C2=A0 =C2=A0 *fexpr =3D palloc0(sizeof(FuncExpr));
...
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0memcpy(fexpr, req-&= gt;fcall, sizeof(FuncExpr));

Is the shallow copy necessary? If so, a comment explaining why
might be apropos. Because the copy is shallow, if there is any
concern about the lifespan of req->fcall, would there not be a
concern about its children?

All the int= eresting parts in the input FuncExpr are heap based,=C2=A0
but th= e FuncExpr itself is stack based (I'm not sure why the fact
w= orks like this),=C2=A0 Also based on my previously understanding, I
need to return a FuncExpr original even if the function can't be=C2= =A0
simplified, so I made a shallow copy.=C2=A0 There will be no = copy at=C2=A0
all in the following version since I returned NULL = in such a case.=C2=A0
=C2=A0
Is there a reason not to transform the _tz flavors of
jsonb_path_query and jsonb_path-query_first?

I misunderstood the _tz flavors return timestamp,=C2=A0 after some de= ep
reading of these functions, they just work at the=C2=A0compari= sons part.
so I will add them in the following version.=C2=A0=C2= =A0
=C2=A0

-=C2=A0 =C2=A0 =C2=A0 =C2=A0JsonbValue *v;
-=C2=A0 =C2=A0 =C2=A0 =C2=A0JsonbValue=C2=A0 =C2=A0 =C2=A0 vbuf;
+=C2=A0 =C2=A0 =C2=A0 =C2=A0JsonbValue=C2=A0 =C2=A0 =C2=A0 *v;
...
+=C2=A0 =C2=A0 =C2=A0 =C2=A0int i;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 JsonbValue *jbvp =3D NULL;
-=C2=A0 =C2=A0 =C2=A0 =C2=A0int=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0i;

Sometimes it's worth looking over a patch for changes like these,
and reverting such whitespace or position changes, if they aren't
things you want a reviewer to be squinting at. :)

=
Yes, I=C2=A0 look over my patch carefully before sending it out = usually,
but this is an oversight.=C2=A0

Lastly,=C2=A0 do you have any idea about the function naming as Jian &= I
discussed at [1]?


= --
B= est Regards
Andy Fan
--00000000000068644906054bf874--