Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rhBL0-00A3zr-Jo for pgsql-hackers@arkaria.postgresql.org; Mon, 04 Mar 2024 16:38:51 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rhBKz-009Q6J-7h for pgsql-hackers@arkaria.postgresql.org; Mon, 04 Mar 2024 16:38:49 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rhBKy-009Q6A-A0 for pgsql-hackers@lists.postgresql.org; Mon, 04 Mar 2024 16:38:49 +0000 Received: from m16.mail.163.com ([220.197.31.4]) by makus.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rhBKs-002pMG-AA for pgsql-hackers@lists.postgresql.org; Mon, 04 Mar 2024 16:38:47 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=163.com; s=s110527; h=From:Subject:Date:Message-ID:MIME-Version: Content-Type; bh=MS+SdTQVN2YvawHeW13dpnHq/s0DyAA4qNTHrxbWRBI=; b=IBAds8Ng9lJxiTcJ3OOZtHhFwO3WIE7axtgo6EsSqPxdXgJwWeGYxr6Y1P5KwO q7aytr906FIeF5B5AS5MNULevCTa3aVNl8WgIznGpUKK/MFYIg5oXkZzDsee1HYN jGR3geyRZeOVR6A3qWntAHdvcApLQFjeXJQBJPoUe1T1I= Received: from 8235eee8a2a0 (unknown [140.205.118.113]) by gzga-smtp-mta-g2-0 (Coremail) with SMTP id _____wD3nzfS+OVl7QE2AQ--.544S3; Tue, 05 Mar 2024 00:37:39 +0800 (CST) References: <169880504467.94392.3769687331705514588.pgcf@coridan.postgresql.org> <87a5rry0bz.fsf@163.com> <87h6jpob9x.fsf@163.com> <87r0hmvuvr.fsf@163.com> <8102ff5b-b156-409e-a48f-e53e63a39b36@eisentraut.org> User-agent: mu4e 1.10.7; emacs 29.1 From: Andy Fan To: Peter Eisentraut ,Amit Langote ,Alvaro Herrera Cc: jian he , Chapman Flack , pgsql-hackers@lists.postgresql.org Subject: Re: Extract numeric filed in JSONB more effectively Date: Tue, 05 Mar 2024 00:14:43 +0800 In-reply-to: <8102ff5b-b156-409e-a48f-e53e63a39b36@eisentraut.org> Message-ID: <8734t6c5rh.fsf@163.com> MIME-Version: 1.0 Content-Type: text/plain X-CM-TRANSID:_____wD3nzfS+OVl7QE2AQ--.544S3 X-Coremail-Antispam: 1Uf129KBjvJXoW3Gr1rAw4DXF1rJr1DXw43KFg_yoW7XFyUpF 4fKr1akr4DJFyvkrs7ZayrXry8Ar1rAr17Kw1qqw18C345C34UXr4Yqr40krW7Gry7Gw4Y qry5ur15CF4vyFJanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDUYxBIdaVFxhVjvjDU0xZFpf9x07UTKZAUUUUU= X-Originating-IP: [140.205.118.113] X-CM-SenderInfo: x2klx3xlid0iqsrtqiywtou0bp/1tbiVBiXU2VOBvCtAwAAs7 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Peter Eisentraut writes: > On 09.02.24 10:05, Andy Fan wrote: >> 2. Where is the current feature blocked for the past few months? >> It's error message compatible issue! Continue with above setup: >> master: >> select * from tb where (a->'b')::numeric > 3::numeric; >> ERROR: cannot cast jsonb string to type numeric >> select * from tb where (a->'b')::int4 > 3::numeric; >> ERROR: cannot cast jsonb string to type integer >> You can see the error message is different (numeric vs integer). >> Patched: >> We still can get the same error message as master BUT the code >> looks odd. >> select * from tb where (a->'b')::int4 > 3; >> QUERY PLAN >> ------------------------------------------------------------------------------------------------------------------- >> Seq Scan on public.tb >> Output: a >> Filter: ((jsonb_finish_numeric(jsonb_object_field_start((tb.a)::internal, 'b'::text), '23'::oid))::integer > 3) >> (3 rows) >> You can see "jsonb_finish_numeric(.., '23::oid)" the '23::oid' is >> just >> for the *"integer"* output in error message: >> "cannot cast jsonb string to type*integer*" >> Now the sistuation is either we use the odd argument (23::oid) in >> jsonb_finish_numeric, or we use a incompatible error message with the >> previous version. I'm not sure which way is better, but this is the >> place the current feature is blocked. > > I'm not bothered by that. It also happens on occasion in the backend C > code that we pass around extra information to be able to construct > better error messages. The functions here are not backend C code, but > they are internal functions, so similar considerations can apply. Thanks for speaking on this! > > But I have a different question about this patch set. This has some > overlap with the JSON_VALUE function that is being discussed at > [0][1]. For example, if I apply the patch > v39-0001-Add-SQL-JSON-query-functions.patch from that thread, I can run > > select count(*) from tb where json_value(a, '$.a' returning numeric) = 2; > > and I get a noticeable performance boost over > > select count(*) from tb where cast (a->'a' as numeric) = 2; Here is my test and profile about the above 2 queries. create table tb(a jsonb); insert into tb select jsonb_build_object('a', i) from generate_series(1, 10000)i; cat a.sql select count(*) from tb where json_value(a, '$.a' returning numeric) = 2; cat b.sql select count(*) from tb where cast (a->'a' as numeric) = 2; pgbench -n -f xxx.sql postgres -T100 | grep lat Then here is the result: | query | master | patched (patch here and jsonb_value) | |-------+---------+-------------------------------------| | a.sql | / | 2.59 (ms) | | b.sql | 3.34 ms | 1.75 (ms) | As we can see the patch here has the best performance (this result looks be different from yours?). After I check the code, I am sure both patches *don't* have the problem in master where it get a jsonbvalue first and convert it to jsonb and then cast to numeric. Then I perf the result, and find the below stuff: JSOB_VALUE ------------ - 32.02% 4.30% postgres postgres [.] JsonPathValue - 27.72% JsonPathValue - 22.63% executeJsonPath (inlined) - 19.97% executeItem (inlined) - executeItemOptUnwrapTarget - 17.79% executeNextItem - 15.49% executeItem (inlined) - executeItemOptUnwrapTarget + 8.50% getKeyJsonValueFromContainer (note here..) + 2.30% executeNextItem 0.79% findJsonbValueFromContainer + 0.68% check_stack_depth + 1.51% jspGetNext + 0.73% check_stack_depth 1.27% jspInitByBuffer 0.85% JsonbExtractScalar + 4.91% DatumGetJsonbP (inlined) Patch here for b.sql: --------------------- - 19.98% 2.10% postgres postgres [.] jsonb_object_field_start - 17.88% jsonb_object_field_start - 17.70% jsonb_object_field_internal (inlined) + 11.03% getKeyJsonValueFromContainer - 6.26% DatumGetJsonbP (inlined) + 5.78% detoast_attr + 1.21% _start + 0.54% 0x55ddb44552a JSONB_VALUE has a much longer way to get getKeyJsonValueFromContainer, then I think JSON_VALUE probably is designed for some more complex path which need to pay extra effort which bring the above performance difference. I added Amit and Alvaro to this thread in case they can have more insight on this. > So some questions to think about: > > 1. Compare performance of base case vs. this patch vs. json_value. Done, as above. > > 2. Can json_value be optimized further? hmm, I have some troubles to understand A's performance boost over B, then who is better. But in my test above, looks the patch here is better on the given case and the differece may comes from JSON_VALUE is designed to handle more generic purpose. > 3. Is this patch still needed? I think yes. One reason is the patch here have better performance, the other reason is the patch here prevent user from changing their existing queries. > > 3a. If yes, should the internal rewriting make use of json_value or > share code with it? As for now, looks json_value is designed for more generic case, not sure if we could share some code. My patch actually doesn't add much code on the json function part. -- Best Regards Andy Fan