Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1qbD7W-00BPvL-JA for pgsql-hackers@arkaria.postgresql.org; Wed, 30 Aug 2023 04:47:58 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1qbD7U-00EZqG-5w for pgsql-hackers@arkaria.postgresql.org; Wed, 30 Aug 2023 04:47:55 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1qbD7T-00EZpy-H2 for pgsql-hackers@lists.postgresql.org; Wed, 30 Aug 2023 04:47:55 +0000 Received: from mail-oo1-xc35.google.com ([2607:f8b0:4864:20::c35]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1qbD7L-001izU-KB for pgsql-hackers@postgresql.org; Wed, 30 Aug 2023 04:47:54 +0000 Received: by mail-oo1-xc35.google.com with SMTP id 006d021491bc7-57354433a7dso2279731eaf.1 for ; Tue, 29 Aug 2023 21:47:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1693370866; x=1693975666; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=KxVKXr1RHQciZ8ZwQK6Gu/Qfubb5/qgimY3LFcvZ3Cg=; b=Cwz4oJAGNm2BVbbFa39lTDoB3Ig2OFGEYqTVhSWPzy1RfVYMV55nR0+eyelgqOgILY BKjclRd9K7vmetfXzk8Bkf8Q6dOYBAjaNBP3UIM1Kh7GOvzFKZZDNE9ngFS5dxSzQdIB OUDamXrZ5ILOLX6LZIPm4hPhiljxxIqSMH8Ix1KHnN85UWzPhmArgNwV0vZ4HBbwn778 c9zlSMZpTjA2zRCnUFUahSq3enLNcPUvuFcg8NIvYjBeyDvTXfsWGc8MtdtC0+IpMeYI w/3jzRZAox5VPJXy0eEtGUA5tnqV5zHH2wo8MbX67uIrsSER+6rvBqTwz3I4SoBOSQZk DN0Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693370866; x=1693975666; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=KxVKXr1RHQciZ8ZwQK6Gu/Qfubb5/qgimY3LFcvZ3Cg=; b=gr+LBUOpDFaskOx1XvbvPSCFvXrgACU/REktG1TFA1YjuASpOFeG8EVmy5UAH6hj1N hsUmD9OhBhbiEAH9wc4/NiWccsOQdobfhXIRUDXZ5zkggyQViWnEHavMZRpJ1p5vyG56 C7HeyNfXDtBYcV3WefuHB6ZA36ZNvGYf5ewlmc5HkmdPxFlYKhilAc8+s4W+nZ0PVW2u p6DGlfeOP4ZrmBtyFbiEHP24DXNjsC2k2uYMlaVfo5P5BsPcx/zG+wr8OPZoBrZpBgJ0 /P3FZf0njyTRxVUxRX+wQnaPcCsQI7uDRFGC0+cpzWQgpIKqkh9KtvB8065/2Z7xDEZf MLAw== X-Gm-Message-State: AOJu0Yw6R9L4E2zdvhW6N55GT1Mz5XRP8GMlmAubjBBpxbgLAkYew+TE 04+4E8/YUfJKW+cDOrdD4QBJ/v2Vi/Bl/E1raOQ= X-Google-Smtp-Source: AGHT+IHNPlviJ9gblwyByNGxLczC9aMPIqSJ6V6+YMWFo1HAgI0BjLR9pZpEzxawA99xJWLk6uPmTG4RreVixCJh8Fo= X-Received: by 2002:a4a:301e:0:b0:571:23ce:a4af with SMTP id q30-20020a4a301e000000b0057123cea4afmr900656oof.3.1693370866546; Tue, 29 Aug 2023 21:47:46 -0700 (PDT) MIME-Version: 1.0 References: <3507485.1691090027@sss.pgh.pa.us> <900892914fdc3f477b101d699efb40e0@anastigmatix.net> <903341.1692022214@sss.pgh.pa.us> <4b97f1a1dd9b6e45443d24870d3be698@anastigmatix.net> <111272f2dc112c7becdd35ad89f6b935@anastigmatix.net> <5138c6b5fd239e7ce4e1a4e63826ac27@anastigmatix.net> <369543439e988ae43f0a6307500b27c4@anastigmatix.net> <5955e93347a7e3b1612cf7e129ae6d04@anastigmatix.net> In-Reply-To: <5955e93347a7e3b1612cf7e129ae6d04@anastigmatix.net> From: Andy Fan Date: Wed, 30 Aug 2023 12:47:35 +0800 Message-ID: Subject: Re: Extract numeric filed in JSONB more effectively To: Chapman Flack Cc: jian he , Pavel Stehule , Tom Lane , pgsql-hackers Content-Type: multipart/alternative; boundary="000000000000f93d5c06041ca08f" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000f93d5c06041ca08f Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable (Sorry for leaving this discussion for such a long time, how times fly!) On Sun, Aug 27, 2023 at 6:28=E2=80=AFAM Chapman Flack wrote: > On 2023-08-22 08:16, Chapman Flack wrote: > > On 2023-08-22 01:54, Andy Fan wrote: > >> After we label it, we will get error like this: > >> > >> select (a->'a')::int4 from m; > >> ERROR: cannot display a value of type internal > > > > Without looking in depth right now, I would double-check > > what relabel node is being applied at the result. The idea, > > of course, was to relabel the result as the expected result > > as I suspected, looking at v10-0003, there's this: > > + fexpr =3D (FuncExpr *)makeRelabelType((Expr *) fexpr, INTERNALOID, > + 0, InvalidOid, > COERCE_IMPLICIT_CAST); > > compared to the example I had sent earlier: > > On 2023-08-18 17:02, Chapman Flack wrote: > > expr =3D (Expr *)makeRelabelType((Expr *)fexpr, > > targetOid, -1, InvalidOid, COERCE_IMPLICIT_CAST); > > The key difference: this is the label going on the result type > of the function we are swapping in. I'm feeling we have some understanding gap in this area, let's see what it is. Suppose the original query is: numeric(jsonb_object_field(v_jsonb, text)) -> numeric. without the patch 003, the rewritten query is: jsonb_object_field_type(NUMERICOID, v_jsonb, text) -> NUMERIC. However the declared type of jsonb_object_field_type is: jsonb_object_field_type(internal, jsonb, text) -> internal. So the situation is: a). We input a CONST(type=3DOIDOID, ..) for an internal argument. b). We return a NUMERIC type which matches the original query c). result type NUMERIC doesn't match the declared type 'internal' d). it doesn't match the run-time type of internal argument which is OID. case a) is fixed by RelableType. case b) shouldn't be treat as an issue. I thought you wanted to address the case c), and patch 003 tries to fix it, then the ERROR above. At last I realized case c) isn't the one you want to fix. case d) shouldn't be requirement at the first place IIUC. So your new method is: 1. jsonb_{op}_start() -> internal (internal actually JsonbValue). 2. jsonb_finish_{type}(internal, ..) -> type. (internal is JsonbValue ). This avoids the case a) at the very beginning. I'd like to provides patches for both solutions for comparison. Any other benefits of this method I am missing? > Two more minor differences: (1) the node you get from > makeRelabelType is an Expr, but not really a FuncExpr. Casting > it to FuncExpr is a bit bogus. Also, the third argument to > makeRelabelType is a typmod, and I believe the "not-modified" > typmod is -1, not 0. > My fault, you are right. > > On 2023-08-21 06:50, Andy Fan wrote: > > I'm not very excited with this manner, reasons are: a). It will have > > to emit more steps in ExprState->steps which will be harmful for > > execution. The overhead is something I'm not willing to afford. > > I would be open to a performance comparison, but offhand I am not > sure whether the overhead of another step or two in an ExprState > is appreciably more than some of the overhead in the present patch, > such as the every-time-through fcinfo initialization buried in > DirectFunctionCall1 where you don't necessarily see it. I bet the fcinfo in an ExprState step gets set up once, and just has > new argument values slammed into it each time through. > fcinfo initialization in DirectFunctionCall1 is an interesting point! so I am persuaded the extra steps in ExprState may not be worse than the current way due to the "every-time-through fcinfo initialization" (in which case the memory is allocated once in heap rather than every time in stack). I can do a comparison at last to see if we can find some other interesting findings. > I would not underestimate the benefit of reducing the code > duplication and keeping the patch as clear as possible. > The key contributions of the patch are getting a numeric or > boolean efficiently out of the JSON operation. Getting from > numeric to int or float are things the system already does > well. True, reusing the casting system should be better than hard-code the casting function manually. I'd apply this on both methods. > A patch that focuses on what it contributes, and avoids > redoing things the system already can do--unless the duplication > can be shown to have a strong performance benefit--is easier to > review and probably to get integrated. > Agreed. At last, thanks for the great insights and patience! --=20 Best Regards Andy Fan --000000000000f93d5c06041ca08f Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
(Sorry for leaving this discussion for such a long ti= me,=C2=A0 how times fly!)=C2=A0

On Sun, Aug 27, 2023 at 6:28=E2=80=AFAM Chap= man Flack <chap@anastigmatix.ne= t> wrote:
On 2023-08-22 08:16, Chapman Flack wrote:
> On 2023-08-22 01:54, Andy Fan wrote:
>> After we label it, we will get error like this:
>>
>> select (a->'a')::int4 from m;
>> ERROR:=C2=A0 cannot display a value of type internal
>
> Without looking in depth right now, I would double-check
> what relabel node is being applied at the result. The idea,
> of course, was to relabel the result as the expected result

as I suspected, looking at v10-0003, there's this:

+=C2=A0 =C2=A0fexpr =3D (FuncExpr *)makeRelabelType((Expr *) fexpr, INTERNA= LOID,
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A00, Invali= dOid,
COERCE_IMPLICIT_CAST);

compared to the example I had sent earlier:

On 2023-08-18 17:02, Chapman Flack wrote:
>=C2=A0 =C2=A0 =C2=A0expr =3D (Expr *)makeRelabelType((Expr *)fexpr,
>=C2=A0 =C2=A0 =C2=A0 =C2=A0targetOid, -1, InvalidOid, COERCE_IMPLICIT_C= AST);

The key difference: this is the label going on the result type
of the function we are swapping in.

I'= m feeling we have some understanding gap in this area, let's
= see what it is.=C2=A0 Suppose the original query is:

numeric(jsonb_object_field(v_jsonb, text)) -> numeric.
=
without the patch 003,=C2=A0 the rewritten query is:
jsonb_object_field_type(NUMERICOID,=C2=A0 v_jsonb, text) -> = NUMERIC.=C2=A0

However the declared type of js= onb_object_field_type is:

jsonb_object_field_type(= internal, jsonb, text) -> internal.=C2=A0

So th= e situation is:=C2=A0 a).=C2=A0 We input a CONST(type=3DOIDOID, ..) for an<= /div>
internal argument.=C2=A0 b).=C2=A0 We return a NUMERIC type which= matches
the original query c).=C2=A0 result type NUMERIC doesn&#= 39;t match the declared
type=C2=A0 'internal'=C2=A0 d).= =C2=A0 it doesn't match the=C2=A0=C2=A0run-time=C2=A0type of internal= =C2=A0
argument which is OID.=C2=A0

case= a) is fixed by RelableType.=C2=A0 case b) shouldn't be treat as an
issue.=C2=A0 I thought you wanted to address the case c), and patch= =C2=A0
003 tries to fix it, then the ERROR above.=C2=A0 At last I= realized case=C2=A0
c) isn't the one you want to fix.=C2=A0 = case d) shouldn't be requirement
at the first place IIUC.=C2= =A0

So your new method is:
1. jsonb= _{op}_start() ->=C2=A0 internal=C2=A0 (internal actually JsonbValue).=C2= =A0
2. jsonb_finish_{type}(internal, ..) -> type.=C2=A0 =C2=A0= (internal is JsonbValue ).=C2=A0

This avoids= the case a) at the very beginning.=C2=A0 I'd like to provides
patches for both solutions for comparison.=C2=A0 Any other benefits of
this method I am missing?=C2=A0
=C2=A0
Two more minor differences: (1) the node you get from
makeRelabelType is an Expr, but not really a FuncExpr. Casting
it to FuncExpr is a bit bogus. Also, the third argument to
makeRelabelType is a typmod, and I believe the "not-modified"
typmod is -1, not 0.

My fault, you are = right.=C2=A0
=C2=A0

On 2023-08-21 06:50, Andy Fan wrote:
> I'm not very excited with this manner, reasons are: a).=C2=A0 It w= ill have
> to emit more steps in ExprState->steps which will be harmful for > execution. The overhead=C2=A0 is something I'm not willing to affo= rd.

I would be open to a performance comparison, but offhand I am not
sure whether the overhead of another step or two in an ExprState
is appreciably more than some of the overhead in the present patch,
such as the every-time-through fcinfo initialization buried in
DirectFunctionCall1 where you don't necessarily see it. I bet
the fcinfo in an ExprState step gets set up once, and just has
new argument values slammed into it each time through.

fcinfo initialization in DirectFunctionCall1 is an interest= ing point!
so=C2=A0 I am persuaded the extra steps in=C2=A0 ExprS= tate may not be=C2=A0
worse than the current way due to the "= ;every-time-through=C2=A0
fcinfo initialization" (in which c= ase the memory is allocated=C2=A0
once in heap rather than every = time in stack).=C2=A0 =C2=A0I can do a=C2=A0
comparison at last t= o see if we can find some other interesting
findings.=C2=A0=C2=A0=

=C2=A0
I would not underestimate the benefit of reducing the code
duplication and keeping the patch as clear as possible.
The key contributions of the patch are getting a numeric or
boolean efficiently out of the JSON operation. Getting from
numeric to int or float are things the system already does
well.

True, reusing the casting system sho= uld be better than hard-code
the casting function manually.=C2=A0= I'd apply this on both methods.=C2=A0
=C2=A0
A patch that focuses on what it con= tributes, and avoids
redoing things the system already can do--unless the duplication
can be shown to have a strong performance benefit--is easier to
review and probably to get integrated.

= Agreed.=C2=A0=C2=A0

At last, thanks for the great = insights and=C2=A0patience!

--
=
Best Regards
Andy Fan
--000000000000f93d5c06041ca08f--