Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1qRUwM-001ynm-JP for pgsql-hackers@arkaria.postgresql.org; Thu, 03 Aug 2023 09:48:19 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1qRUwK-00C62e-IE for pgsql-hackers@arkaria.postgresql.org; Thu, 03 Aug 2023 09:48:16 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1qRUwK-00C62W-0t for pgsql-hackers@lists.postgresql.org; Thu, 03 Aug 2023 09:48:16 +0000 Received: from mail-yw1-x1132.google.com ([2607:f8b0:4864:20::1132]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1qRUwH-000I9W-DV for pgsql-hackers@postgresql.org; Thu, 03 Aug 2023 09:48:15 +0000 Received: by mail-yw1-x1132.google.com with SMTP id 00721157ae682-583f99641adso7725317b3.2 for ; Thu, 03 Aug 2023 02:48:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1691056092; x=1691660892; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=snq/sIBzwVKYLWTQG38pnEsAsvy792HMSvPVDQ+Dmxw=; b=dapgzJUHYLuhs9ud3ZAgKifhAhE8/0/Sk4G4ZRNod4J8FPLhmT7Qc2zlojYxzdqTxk raRRmCQlcEWZcO/WLEhm5bEgA1E9osXke9AC2CeGALIMg6eB05YN6WNecIXVj70JLLlY JV1J2hTy75M1E8Az44yGa8Du/Zdg131O84x4hriQGq3kI9P2POL0ne/h4a6bsamReNKn 8uLoQ4hLN1gBmxAbjNIspd/eL2GTc3ZCgVEm3q5UpCzspt5Z6Qr/4CDI4H4sY2kPwvWQ AW8v71Z60zYMF0T6qGA4MhIBu8cG3gWemFpYy6TQDpMw5cdPnmrIy9oFFTYrePtuUoRB C/Rw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691056092; x=1691660892; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=snq/sIBzwVKYLWTQG38pnEsAsvy792HMSvPVDQ+Dmxw=; b=WX4+xMhrGFS0DhSUM7O2gy5pgDGFXBf8zAdxOJnd8BSy89uvMPQtcYkh+/caQO4a8V KsscurLIns/gNFP5rpEsy0OgqWomTkui2TNiutmaP6rVCVzHEHRTeNKAcLJSOKGpzu6h VYmfghDBgJbDaguB3A/K2E06eM2ud3qfqkPKyvPgbCsXRMStOGClvnKhae3Npqsle9Ge WI04izPmtoQPM3gID04SP+CjAVVAbFCv7u8TO323ZZSG9HtTiadnJTKW7ZOmngSiB4qx 8+u8SMalYjW79eQFQnrxHt8yQDJwD8W9tjsUcHpG5jyiofMX5qoY/h2QzkaAn6x8A+qG adHg== X-Gm-Message-State: ABy/qLbO/fDsRaV+YbvhSOyKNv0MymJ5AH1b9O8aDp8DoKkZooZGaVBU 8SB4ognyFIg8qMNK1Si2ltlJwaRYT3tCzUW9IK8= X-Google-Smtp-Source: APBJJlHu80EPkz2TP+dcgY2swVJ99awsCxIOUMvoy6iAO6QxqdTEibfgD2fi61z0u/zDcjaWvXbm5b4fURiIYB+b/VA= X-Received: by 2002:a0d:e691:0:b0:576:9d2e:ef2b with SMTP id p139-20020a0de691000000b005769d2eef2bmr15265693ywe.10.1691056090945; Thu, 03 Aug 2023 02:48:10 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Pavel Stehule Date: Thu, 3 Aug 2023 11:47:33 +0200 Message-ID: Subject: Re: Extract numeric filed in JSONB more effectively To: Andy Fan Cc: jian he , pgsql-hackers Content-Type: multipart/alternative; boundary="000000000000989f97060201adfe" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000989f97060201adfe Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi =C4=8Dt 3. 8. 2023 v 9:53 odes=C3=ADlatel Andy Fan napsal: > Hi Pavel: > > Thanks for the feedback. > > I don't like this solution because it is bloating operators and it is no= t >> extra readable. >> > > If we support it with cast, could we say we are bloating CAST? It is tru= e > that it is not extra readable, if so how about a->>'a' return text? > Actually > I can't guess any meaning of the existing jsonb operations without > documentation. > yes, it can bloat CAST, but for usage we have already used syntax, and these casts are cooked already: (2023-08-03 11:04:51) postgres=3D# select castfunc::regprocedure from pg_ca= st where castsource =3D 'jsonb'::regtype; =E2=94=8C=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2= =94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94= =80=E2=94=80=E2=94=80=E2=94=90 =E2=94=82 castfunc =E2=94=82 =E2=95=9E=E2=95=90=E2=95=90=E2=95=90=E2=95=90=E2=95=90=E2=95=90=E2=95=90=E2= =95=90=E2=95=90=E2=95=90=E2=95=90=E2=95=90=E2=95=90=E2=95=90=E2=95=90=E2=95= =90=E2=95=90=E2=95=90=E2=95=A1 =E2=94=82 - =E2=94=82 =E2=94=82 bool(jsonb) =E2=94=82 =E2=94=82 "numeric"(jsonb) =E2=94=82 =E2=94=82 int2(jsonb) =E2=94=82 =E2=94=82 int4(jsonb) =E2=94=82 =E2=94=82 int8(jsonb) =E2=94=82 =E2=94=82 float4(jsonb) =E2=94=82 =E2=94=82 float8(jsonb) =E2=94=82 =E2=94=94=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2= =94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94= =80=E2=94=80=E2=94=80=E2=94=98 (8 rows) the operator ->> was a special case, the text type is special in postgres as the most convertible type. And when you want to visualise a value or display the value, you should convert value to text. I can live with that because it is just one, but with your proposal opening the doors for implementing tens of similar operators, I think it is bad. Using ::target_type is common syntax and doesn't require reading documentation. More, I believe so lot of people uses more common syntax, and then this syntax should to have good performance - for jsonb - (val->'op')::numeric works, and then there should not be performance penalty, because this syntax will be used in 99%. Usage of cast is self documented. > For completeness you should implement cast for date, int, boolean too. >> Next, the same problem is with XML or hstore type (probably with any typ= es >> that are containers). >> > > I am not sure completeness is a gold rule we should obey anytime, > like we have some function like int24le to avoid the unnecessary > cast, but we just avoid casting for special types for performance > reason, but not for all. At the same time, `int2/int4/int8` doesn't > have a binary compatibility type in jsonb. and the serialization > /deserialization for boolean is pretty cheap. > > I didn't realize timetime types are binary compatible with SQL, > so maybe we can have some similar optimization as well. > (It is a pity that timestamp(tz) are not binary, or else we may > just need one operator). > > >> >> I don't like the idea so using a special operator is 2x faster than >> common syntax for casting. It is a signal, so there is a space for >> optimization. Black magic with special operators is not user friendly fo= r >> relatively common problems. >> > > I don't think "Black magic" is a proper word here, since it is not much > different from ->> return a text. If you argue text can be cast to > most-of-types, that would be a reason, but I doubt this difference > should generate a "black magic". > I used the term black magic, because nobody without reading documentation can find this operator. It is used just for this special case, and the functionality is the same as using cast (only with different performance). The operator ->> is more widely used. But if we have some possibility to work without it, then the usage for a lot of users will be more simple. More if the target types can be based on context Can be nice to use some like `EXTRACT(YEAR FROM val->'field')` instead `EXTRACT(YEAR FROM (val->>'field')::date)` > >> >> Maybe we can introduce some *internal operator* "extract to type", and i= n >> rewrite stage we can the pattern (x->'field')::type transform to OP(x, >> 'field', typid) >> > > Not sure what the OP should be? If it is a function, what is the > return value? It looks to me like it is hard to do in c language? > It should be internal structure - it can be similar like COALESCE or IS operator > > After all, if we really care about the number of operators, I'm OK > with just let users use the function directly, like > > jsonb_field_as_numeric(jsonb, 'filedname') > jsonb_field_as_timestamp(jsonb, 'filedname'); > jsonb_field_as_timestamptz(jsonb, 'filedname'); > jsonb_field_as_date(jsonb, 'filedname'); > > it can save an operator and sloves the readable issue. > I don't like it too much, but it is better than introduction new operator We already have the jsonb_extract_path and jsonb_extract_path_text function. I can imagine to usage "anyelement" type too. some like `jsonb_extract_path_type(jsonb, anyelement, variadic text[] )` > -- > Best Regards > Andy Fan > --000000000000989f97060201adfe Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi

=C4=8Dt 3. 8. 2023 v=C2=A09:53 odes=C3=ADlate= l Andy Fan <zhihui.fan1213@g= mail.com> napsal:
Hi Pavel:

Thanks = for the feedback.=C2=A0

I don't like this solution because it is bloating= =C2=A0 operators and it is not extra readable.
=C2=A0
If we support it with cast, could we say we a= re bloating=C2=A0CAST?=C2=A0 It is true
that it is not extra read= able, if so how about=C2=A0 a->>'a'=C2=A0 return text?=C2=A0 = Actually
I can't guess any meaning of the existing json= b operations without
documentation.

yes, it can bloat CAST, but for usage we have already = used syntax, and these casts are cooked already:

<= span style=3D"font-family:monospace">(2023-08-03 11:04:51) postgres=3D# sel= ect castfunc::regprocedure from pg_cast where castsource =3D 'jsonb'= ;::regtype;
=E2=94=8C=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94= =80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80= =E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=90
=E2=94=82 =C2=A0 =C2=A0 ca= stfunc =C2=A0 =C2=A0 =E2=94=82
=E2=95=9E=E2=95=90=E2=95=90=E2=95=90=E2= =95=90=E2=95=90=E2=95=90=E2=95=90=E2=95=90=E2=95=90=E2=95=90=E2=95=90=E2=95= =90=E2=95=90=E2=95=90=E2=95=90=E2=95=90=E2=95=90=E2=95=90=E2=95=A1
=E2= =94=82 - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=E2=94=82=E2=94=82 bool(jsonb) =C2=A0 =C2=A0 =C2=A0=E2=94=82
=E2=94=82 "nu= meric"(jsonb) =E2=94=82
=E2=94=82 int2(jsonb) =C2=A0 =C2=A0 =C2=A0= =E2=94=82
=E2=94=82 int4(jsonb) =C2=A0 =C2=A0 =C2=A0=E2=94=82
=E2=94= =82 int8(jsonb) =C2=A0 =C2=A0 =C2=A0=E2=94=82
=E2=94=82 float4(jsonb) = =C2=A0 =C2=A0=E2=94=82
=E2=94=82 float8(jsonb) =C2=A0 =C2=A0=E2=94=82=E2=94=94=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80= =E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2=94=80=E2= =94=80=E2=94=80=E2=94=80=E2=94=98
(8 rows)
=C2=A0
the operator ->> was a special case, the text type is specia= l in postgres as the most convertible type. And when you want to visualise = a value or display the value, you should convert value to text.
=

I can live = with that because it is just one, but with your proposal opening the doors = for implementing tens of similar operators, I think it is bad. Using ::targ= et_type is common syntax and doesn't require reading documentation.

More, I b= elieve so lot of people uses more common syntax, and then this syntax shoul= d to have good performance - for jsonb - (val->'op')::numeric wo= rks, and then there should not be performance penalty, because this syntax = will be used in 99%.

Usage of cast is self documented.


For completeness you should implement cast for date, int, bo= olean too. Next, the same problem is with XML or hstore type (probably with= any types that are containers).

I am not sure completeness is a gold rule we should obey anytime,
like we have some function like int24le to avoid the unnecessary
cast, but we just avoid casting for special types for performance<= /div>
reason, but not for all. At the same time,=C2=A0 `int2/int4/int8`= doesn't
have a binary compatibility type in jsonb. and the s= erialization
/deserialization for boolean is pretty cheap.
<= div>
I didn't realize timetime=C2=A0types are binary comp= atible with SQL,=C2=A0=C2=A0
so maybe we can have some similar op= timization as well.=C2=A0
(It is a pity that timestamp(tz) are no= t binary, or else we may=C2=A0
just need one operator).=C2=A0
=C2=A0

I don't like = the idea so using a special operator is 2x faster than common syntax for ca= sting. It is a signal, so there is a space for optimization. Black magic wi= th special operators is not user friendly for relatively common problems.

I don't think &quo= t;Black magic" is a proper word here, since it is not much
d= ifferent from ->> return a text.=C2=A0 If you argue text can be cast = to=C2=A0
most-of-types,=C2=A0 that would be a reason, but I doubt= this difference
should generate a "black magic".=C2=A0=

I used the term black ma= gic, because nobody without reading documentation can find this operator. I= t is used just for this special case, and the functionality is the same as = using cast (only with different performance).

The = operator ->> is more widely used. But if we have some possibility to = work without it, then the usage for a lot of users will be more simple. Mor= e if the target types can be based on context

= Can be nice to use some like `EXTRACT(YEAR FROM val->'field')` i= nstead `EXTRACT(YEAR FROM (val->>'field')::date)`

=C2=A0
Maybe we can introduce some *internal operator* "extract = to type", and in rewrite stage we can the pattern (x->'field= 9;)::type transform to OP(x, 'field', typid)

Not sure what the OP should be?=C2=A0 If it is = a function, what is the
return value?=C2=A0 It looks to me like i= t is hard to do in c language?

It should be internal structure - it can be similar like COALESCE or = IS operator
=C2=A0

After all,=C2=A0 if we really care about the number of operators, I'm= OK
with just let users use the function directly, like

jsonb_field_as_numeric(jsonb, 'filedname')=C2=A0
jsonb_field_as_timestamp(jsonb, 'filedname');=C2=A0
jsonb_field_as_timestamptz(jsonb, 'filedname');=C2=A0
<= div>jsonb_field_as_date(jsonb, 'filedname');=C2=A0
it can save an operator and sloves the readable issue.=C2=A0

I don't like it too mu= ch, but it is better than introduction new operator=C2=A0

We already have the jsonb_extract_path and jsonb_extract_path_text = function.

I can imagine to usage "anyele= ment" type too. some like `jsonb_extract_path_type(jsonb, anyelement, = variadic text[] )`



<= br>

<= div dir=3D"ltr">

--
Best Regards
Andy Fan
--000000000000989f97060201adfe--