Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1qRYlW-002BBF-L4 for pgsql-hackers@arkaria.postgresql.org; Thu, 03 Aug 2023 13:53:23 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1qRYlV-00Dnd1-26 for pgsql-hackers@arkaria.postgresql.org; Thu, 03 Aug 2023 13:53:21 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1qRYlU-00Dnct-GV for pgsql-hackers@lists.postgresql.org; Thu, 03 Aug 2023 13:53:21 +0000 Received: from mail-yb1-xb36.google.com ([2607:f8b0:4864:20::b36]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1qRYlS-000K8l-3x for pgsql-hackers@postgresql.org; Thu, 03 Aug 2023 13:53:19 +0000 Received: by mail-yb1-xb36.google.com with SMTP id 3f1490d57ef6-d3563cb41e9so1058968276.0 for ; Thu, 03 Aug 2023 06:53:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1691070797; x=1691675597; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=k1Abzuy1qW8ghwgSXBpOBu76Eeotj/aJShvcWGsZheE=; b=FDE+rT/I68ogAKa4dlawW+2ciCVJlglUWlgg/jYklt62VgMePURMu0d8joMLkyZXyC M51YxCsdGHo3aVrE2a6bmbg7/Qii9hlbqHYnXHKoE3QfgHuTvFfvr8Y98edIvd49u2Go u3qvLErLuoJfTk/+u90q87YSwIJZLwvNWQ5m1+7R1eJp5GG0xPX+VMIv238RCPzX4JOy pA0pcYwnG1vRzXMsno53+xB5gSSBfIsUYhxcNASHnPWiIZu6jSLH0NYVTEEtGqYZLLzU 92lxVNMjrw3q9r4N6vYz5I1O8zMDIXxuQW8m64kRMlECFKwub3iBpbqsHyR+ko53UBQ3 fsTw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691070797; x=1691675597; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=k1Abzuy1qW8ghwgSXBpOBu76Eeotj/aJShvcWGsZheE=; b=CjzKFb/xWSe+RJi0gdll5/CyC3HKbZd8+qlSwWTi2sszaMWxeD3VceYgRqjwydtoC6 oJV0K3iNyMbovFhopGMZKn5cCat/QCtctveGnpV3McHyF+MBNN8cYxiR8bmrYHiXTA/m ycDfkc+DHUKwFLIXI6iwBLDuyQWfrVtHqgVfx435XEQNUnrFrXHQ0ilzOnlM4xm0N6TZ PMoF+wO7dq47BXKrgtNqWb0oNbMbSV2F8vTjzJZb2B+8jSiWzSwHxj7f0UMTh3YmC+gJ FV/kd9oco+43fDJpMidZR+13dZ+8INvYtpILCKuryYbeY4ya6QrpxKM/QrDai6dFpz22 erSA== X-Gm-Message-State: ABy/qLas9yAnvgfqSIsP9xt1QeGuzgncNwByWdIw2Bu+JaDAYYhgxCtN Gif4sT8GrqREkhkH1/CIJwm7bKnOQv8UEKCsJM0Vm/Y7Kt8= X-Google-Smtp-Source: APBJJlFw+DFqA5rpyHuCMGdxMkVPeQy6v1YS8A6u78kUjx0mmWhmtUTlzGKppUebtU6rqeTyI8ekjmKgxFftRuhVRpo= X-Received: by 2002:a25:230e:0:b0:cad:c870:a108 with SMTP id j14-20020a25230e000000b00cadc870a108mr16886643ybj.10.1691070797112; Thu, 03 Aug 2023 06:53:17 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Pavel Stehule Date: Thu, 3 Aug 2023 15:52:40 +0200 Message-ID: Subject: Re: Extract numeric filed in JSONB more effectively To: Andy Fan Cc: jian he , pgsql-hackers Content-Type: multipart/alternative; boundary="00000000000026eba10602051aa6" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000026eba10602051aa6 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi =C4=8Dt 3. 8. 2023 v 15:23 odes=C3=ADlatel Andy Fan napsal: > Hi: > > >> More, I believe so lot of people uses more common syntax, and then this >> syntax should to have good performance - for jsonb - (val->'op')::numeri= c >> works, and then there should not be performance penalty, because this >> syntax will be used in 99%. >> > > This looks like a valid opinion IMO, but to rescue it, we have to do > something like "internal structure" and remove the existing cast. > But even we pay the effort, it still breaks some common knowledge, > since xx:numeric is not a cast. It is an "internal structure"! > I didn't study jsonb function, but there is an xml function that extracts value and next casts it to some target type. It does what is expected - for known types use hard coded casts, for other ask system catalog for cast function or does IO cast. This code is used for the XMLTABLE function. The JSON_TABLE function is not implemented yet, but there should be similar code. If you use explicit cast, then the code should not be hard, in the rewrite stage all information should be known. > > I don't think "Black magic" is a proper word here, since it is not much >>> different from ->> return a text. If you argue text can be cast to >>> most-of-types, that would be a reason, but I doubt this difference >>> should generate a "black magic". >>> >> >> I used the term black magic, because nobody without reading documentatio= n >> can find this operator. >> > > I think this is what document is used for.. > > >> It is used just for this special case, and the functionality is the same >> as using cast (only with different performance). >> > > This is not good, but I didn't see a better choice so far, see my first > graph. > > >> >> The operator ->> is more widely used. But if we have some possibility to >> work without it, then the usage for a lot of users will be more simple. >> More if the target types can be based on context >> > > It would be cool but still I didn't see a way to do that without making > something else complex. > sure - it is significantly more work, but it should be usable for all types and just use common syntax. The custom @-> operator you can implement in your own custom extension. Builtin solutions should be generic as it is possible. The things should be as simple as possible - mainly for users, that missing knowledge, and any other possibility of how to do some task just increases their confusion. Can be nice if users find one solution on stack overflow and this solution should be great for performance too. It is worse if users find more solutions, but it is not too bad, if these solutions have similar performance. It is too bad if any solution has great performance and others not too much. Users has not internal knowledge, and then don't understand why sometimes should to use special operator and not common syntax. > > >>>> Maybe we can introduce some *internal operator* "extract to type", and >>>> in rewrite stage we can the pattern (x->'field')::type transform to OP= (x, >>>> 'field', typid) >>>> >>> >>> Not sure what the OP should be? If it is a function, what is the >>> return value? It looks to me like it is hard to do in c language? >>> >> >> It should be internal structure - it can be similar like COALESCE or IS >> operator >> > > It may work, but see my answer in the first graph. > > >> >> >>> >>> After all, if we really care about the number of operators, I'm OK >>> with just let users use the function directly, like >>> >>> jsonb_field_as_numeric(jsonb, 'filedname') >>> jsonb_field_as_timestamp(jsonb, 'filedname'); >>> jsonb_field_as_timestamptz(jsonb, 'filedname'); >>> jsonb_field_as_date(jsonb, 'filedname'); >>> >>> it can save an operator and sloves the readable issue. >>> >> >> I don't like it too much, but it is better than introduction new operato= r >> > > Good to know it. Naming operators is a complex task if we add four. > > >> We already have the jsonb_extract_path and jsonb_extract_path_text >> function. >> > > I can't follow this. jsonb_extract_path returns a jsonb, which is far > away from > our goal: return a numeric effectively? > I proposed `jsonb_extract_path_type` that is of anyelement type. Regards Pavel > I can imagine to usage "anyelement" type too. some like >> `jsonb_extract_path_type(jsonb, anyelement, variadic text[] )` >> > > Can you elaborate this please? > > -- > Best Regards > Andy Fan > --00000000000026eba10602051aa6 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi

=C4=8Dt 3. 8. 2023 v=C2=A015:23 odes=C3=ADlat= el Andy Fan <zhihui.fan1213@= gmail.com> napsal:
Hi:
= =C2=A0
More, I bel= ieve so lot of people uses more common syntax, and then this syntax should = to have good performance - for jsonb - (val->'op')::numeric work= s, and then there should not be performance penalty, because this syntax wi= ll be used in 99%.

This looks l= ike a valid opinion IMO,=C2=A0 but to rescue=C2=A0it, we have to do
something like "internal structure" and remove the existing ca= st.=C2=A0
But even we pay the effort, it still breaks some common= knowledge,
since xx:numeric is not a cast.=C2=A0 It is an "= internal structure"!=C2=A0=C2=A0
I didn't study jsonb function, but there is an xml functio= n that extracts value and next casts it to some target type. It does what i= s expected - for known types use hard coded casts, for other ask system cat= alog for cast function or does IO cast. This code is used for the XMLTABLE = function. The JSON_TABLE function is not implemented yet, but there should = be similar code. If you use explicit cast, then the code should not be hard= , in the rewrite stage all information should be known.

=



I don't think "Black magic&q= uot; is a proper word here, since it is not much
different from -= >> return a text.=C2=A0 If you argue text can be cast to=C2=A0
<= div>most-of-types,=C2=A0 that would be a reason, but I doubt this differenc= e
should generate a "black magic".=C2=A0

I used the term black magic, because no= body without reading documentation can find this operator.

I think this is what document is used fo= r..=C2=A0 =C2=A0
=C2=A0
It is used j= ust for this special case, and the functionality is the same as using cast = (only with different performance).

<= /div>
This is not good, but I didn't see a better choice so far,=C2= =A0 see my first graph.
=C2=A0

<= /div>
The operator ->> is more widely used. But if we have some p= ossibility to work without it, then the usage for a lot of users will be mo= re simple. More if the target types can be based on context
=

It would be cool but still I didn= 9;t see a way to do that without making
something else complex.= =C2=A0

sure - it is signi= ficantly more work, but it should be usable for all=20 types and just use common syntax. The custom @-> operator you can=20 implement in your own custom extension. Builtin solutions should be=20 generic as it is possible.

The th= ings should be as simple as possible - mainly for users, that missing knowl= edge, and any other possibility of how to do some task just increases their= confusion. Can be nice if users find one solution on stack overflow and th= is solution should be great for performance too. It is worse if users find = more solutions, but it is not too bad, if these solutions have similar perf= ormance. It is too bad if any solution has great performance and others not= too much. Users has not internal knowledge, and then don't understand = why sometimes should to use special operator and not common syntax.
=C2=A0


Maybe we can introd= uce some *internal operator* "extract to type", and in rewrite st= age we can the pattern (x->'field')::type transform to OP(x, = 9;field', typid)

Not = sure what the OP should be?=C2=A0 If it is a function, what is the
return value?=C2=A0 It looks to me like it is hard to do in c language?

It should be internal stru= cture - it can be similar like COALESCE or IS operator

It may work, but see my answer in the first g= raph.=C2=A0
=C2=A0
=C2=A0

After all,=C2=A0 if we really care about the number of operators, I&= #39;m OK
with just let users use the function directly, like

jsonb_field_as_numeric(jsonb, 'filedname')=C2= =A0
jsonb_field_as_timestamp(jsonb, 'filedname');=C2=A0
jsonb_field_as_timestamptz(jsonb, 'filedname');=C2=A0
<= /div>
jsonb_field_as_date(jsonb, 'filedname');=C2=A0
<= div>
it can save an operator and sloves the readable issue.= =C2=A0

I don't like i= t too much, but it is better than introduction new operator=C2=A0

Good to know it.=C2=A0 Naming oper= ators is a complex task=C2=A0 if we add four.=C2=A0


We already have the jsonb_extract_path= and jsonb_extract_path_text function.
<= br>
I can't follow this.=C2=A0 jsonb_extract_path returns a j= sonb, which is=C2=A0=C2=A0far away from
our goal: return a numeri= c effectively?

I proposed= `jsonb_extract_path_type` that is of anyelement type.

<= /div>
Regards

Pavel



I c= an imagine to usage "anyelement" type too. some like `jsonb_extra= ct_path_type(jsonb, anyelement, variadic text[] )`
=C2=A0
Can you=C2=A0elaborate this please?=C2=A0 =C2= =A0

--
Best Re= gards
Andy Fan
--00000000000026eba10602051aa6--