MIME-Version: 1.0
References: <CAB+=1TU2bi8UkdD94kMizsTrTBUgqbdWKtM4Lq67+BMi6Vt1qA@mail.gmail.com>
 <8efa554c-ba95-43d5-953c-def0d53dca9e@aklaver.com> <CAKFQuwarbmz7=yg14h6u-UhGPfbT56TsX8S6w9wVdMHGiWxbYg@mail.gmail.com>
 <CAHyXU0xQGXFBZ10GtqTkXL3_b8FbB79qP+XS2XCfxp+6WuH1Cg@mail.gmail.com>
In-Reply-To: <CAHyXU0xQGXFBZ10GtqTkXL3_b8FbB79qP+XS2XCfxp+6WuH1Cg@mail.gmail.com>
From: veem v <veema0000@gmail.com>
Date: Sun, 20 Jul 2025 02:09:56 +0530
Message-ID: <CAB+=1TWwerQMVQvOP0oAVunTUz8tnX425Jn-84DBYbOD+us9Eg@mail.gmail.com>
Subject: Re: Performance of JSON type in postgres
To: Merlin Moncure <mmoncure@gmail.com>, Adrian Klaver <adrian.klaver@aklaver.com>, 
	"David G. Johnston" <david.g.johnston@gmail.com>
Cc: pgsql-general <pgsql-general@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="00000000000086b7c1063a4e3ffc"
Archived-At: <https://www.postgresql.org/message-id/CAB%2B%3D1TWwerQMVQvOP0oAVunTUz8tnX425Jn-84DBYbOD%2Bus9Eg%40mail.gmail.com>
Precedence: bulk

--00000000000086b7c1063a4e3ffc
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

On Tue, 15 Jul 2025 at 23:02, Merlin Moncure <mmoncure@gmail.com> wrote:

> On Mon, Jul 14, 2025 at 2:01 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
>
>> On Mon, Jul 14, 2025 at 12:54 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>
>>> On 7/14/25 12:51, veem v wrote:
>>> > So I want to
>>> > understand the experts' opinion on this which I believe will  be
>>> > crucial during design itself.
>>>
>>> It is spelled out here:
>>>
>>> https://www.postgresql.org/docs/current/datatype-json.html
>>>
>>>
>> I've taken to heart the main takeaway from that page:
>>
>> "In general, most applications should prefer to store JSON data as jsonb,
>> unless there are quite specialized needs, such as legacy assumptions about
>> ordering of object keys."
>>
>
> I don't think the documentation is accurate at all, unless one of those
> specialized needs is to 'be faster'.   json serialization is more than 2x
> faster based on simple testing (see below).   This is absolutely not a
> trivial difference.
>
> I would say, use json for serialization, use jsonb for data storage,
> unless the precise structure of the input document is important.
>
> merlin
>
> leaselock_iam@leaselock_prod=> explain analyze select json_agg(l) from ( select l from llcore.lease l limit 10000) q;
>                                                           QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=405.52..405.53 rows=1 width=32) (actual time=69.043..69.048 rows=1 loops=1)
>    ->  Limit  (cost=0.00..380.52 rows=10000 width=247) (actual time=0.017..9.764 rows=10000 loops=1)
>          ->  Seq Scan on lease l  (cost=0.00..100383.89 rows=2638089 width=247) (actual time=0.016..8.831 rows=10000 loops=1)
>  Planning Time: 0.109 ms
>  Execution Time: 69.088 ms
> (5 rows)
>
> Time: 160.560 ms
> leaselock_iam@leaselock_prod=> explain analyze select jsonb_agg(l) from ( select l from llcore.lease l limit 10000) q;
>                                                           QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=405.52..405.53 rows=1 width=32) (actual time=146.139..146.141 rows=1 loops=1)
>    ->  Limit  (cost=0.00..380.52 rows=10000 width=247) (actual time=0.017..20.837 rows=10000 loops=1)
>          ->  Seq Scan on lease l  (cost=0.00..100383.89 rows=2638089 width=247) (actual time=0.016..19.975 rows=10000 loops=1)
>  Planning Time: 0.108 ms
>  Execution Time: 152.277 ms
>
>


Thank you.

I ran a test with some sample data (linked below). Loading/serialization
appears to be roughly twice as slow with JSONB as with JSON, whereas
storage is more efficient with JSONB. Reading nested fields, on the other
hand, is 7-8 times slower with JSON than with JSONB (and, of course, index
support makes JSONB the better choice there). I hope I am testing this
correctly.

https://dbfiddle.uk/6P7sjL22
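
For readers who cannot open the fiddle, a comparison along the same lines
can be run in psql. This is a minimal sketch only: the table names, the
document shape and the 100,000-row count are hypothetical and not taken
from the fiddle, and absolute timings will vary with hardware and data.

```sql
-- Hypothetical benchmark: names, document shape and row count are
-- made up for illustration and are not taken from the fiddle.
\timing on

-- Raw document text, generated once so both loads parse identical input.
CREATE TABLE raw_docs AS
SELECT g AS id,
       json_build_object(
           'id', g,
           'customer', json_build_object('name', 'c' || g,
                                         'tier', g % 5),
           'items', json_build_array(g, g * 2, g * 3)
       )::text AS doc
FROM generate_series(1, 100000) AS g;

CREATE TABLE t_json  (id int, doc json);
CREATE TABLE t_jsonb (id int, doc jsonb);

-- Load: json only validates the input text, while jsonb also converts
-- it to a decomposed binary format (compare the two Time: lines).
INSERT INTO t_json  SELECT id, doc::json  FROM raw_docs;
INSERT INTO t_jsonb SELECT id, doc::jsonb FROM raw_docs;

-- Storage: average size in bytes of one stored value.
SELECT avg(pg_column_size(doc)) FROM t_json;
SELECT avg(pg_column_size(doc)) FROM t_jsonb;

-- Nested read: json is reparsed on every access, jsonb is not.
SELECT count(doc #>> '{customer,name}') FROM t_json;
SELECT count(doc #>> '{customer,name}') FROM t_jsonb;

-- Indexing: a GIN index (jsonb only) can serve containment queries.
CREATE INDEX t_jsonb_doc_idx ON t_jsonb USING gin (doc jsonb_path_ops);
ANALYZE t_jsonb;
EXPLAIN ANALYZE SELECT id FROM t_jsonb WHERE doc @> '{"id": 42}';

-- Serialization of whole rows, mirroring the quoted json_agg test.
EXPLAIN ANALYZE SELECT json_agg(r)  FROM raw_docs r;
EXPLAIN ANALYZE SELECT jsonb_agg(r) FROM raw_docs r;
```

Loading the same text into both tables keeps the parse cost as the only
difference between the two INSERT timings.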

So I am a bit confused here. Also, one of our use cases is that, in
addition to persisting and querying this data in Postgres, we are going
to move it from Postgres (the upstream OLTP system) to a downstream OLAP
system in Snowflake, which has data types such as VARIANT and VARCHAR.
Will it make a significant difference whether we store the data as JSON
or JSONB in Postgres, i.e. the source/upstream database?
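
One difference that is easy to check is the text that would actually be
exported. Per the datatype-json page linked above, json returns the input
exactly as written, whereas jsonb does not keep insignificant white space
or object-key order, and keeps only the last value of a duplicated key.
A quick illustration with an arbitrary literal (whether this matters
downstream depends on how the Snowflake side parses the text):

```sql
-- The same (arbitrary) literal cast both ways.
SELECT '{"b": 1, "a": 2,   "a": 3}'::json  AS as_json,
       '{"b": 1, "a": 2,   "a": 3}'::jsonb AS as_jsonb;

-- as_json  -> {"b": 1, "a": 2,   "a": 3}   (input text kept verbatim)
-- as_jsonb -> {"a": 3, "b": 1}             (keys reordered, last "a" wins)
```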

--00000000000086b7c1063a4e3ffc--