Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1udELw-004tTf-3f for pgsql-general@arkaria.postgresql.org; Sat, 19 Jul 2025 20:40:16 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1udELt-007zhE-Fq for pgsql-general@arkaria.postgresql.org; Sat, 19 Jul 2025 20:40:13 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1udELs-007zgj-SO for pgsql-general@lists.postgresql.org; Sat, 19 Jul 2025 20:40:13 +0000 Received: from mail-ej1-x635.google.com ([2a00:1450:4864:20::635]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1udELq-008Auy-2t for pgsql-general@lists.postgresql.org; Sat, 19 Jul 2025 20:40:11 +0000 Received: by mail-ej1-x635.google.com with SMTP id a640c23a62f3a-ae0dd7ac1f5so562587166b.2 for ; Sat, 19 Jul 2025 13:40:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1752957608; x=1753562408; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=Rfl4gql8gfvEziSJnvPqvGX4HZB6LtY7T747KLZxGiQ=; b=aCSua1CBrsVI7QJzwKn+MoDqb2piZ567F0SXVwh1gNd4MMvTpLUbTQfjrQMyOaHH7y MMhl952HyoCtGXsUX9lW7OPqiLk2gOZ8zg4bTrC4f2QpoClzQdT+qN1wQQ+/sJCXHqjl aQS72M/uziHEEX+FsrKp6/GYmWPRENua/B/uhuFgCIy7b26htkLMCVCv38dQLwGx3R79 n8OkdJpvo3650HCrOBYAQsgne3P2D1tJNgk2si1y4zrH7W1X8PnvyTvHoVUPi6s2BViB rEIU/MUWnsEqjANecSFXmhWnn8gGRwd08Nol2CMunrBoHo8RkSLtWqHAT6df6UaIsNCy K9oQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752957608; x=1753562408; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=Rfl4gql8gfvEziSJnvPqvGX4HZB6LtY7T747KLZxGiQ=; b=rhOr13AAs16cyoZ3O7IkaLRy2shRV6iqkCOJlu6S26ZmC5YvONv3RgSOv/m81ImXqt Lv+onhoogAqN4ygFHB1DNqoeEi4GMG23xJXa+014InrtHIjfHEeLOp7E6gFIUJ6n3630 oqtsYe0kOYd3KuzzFEiYFWo5f8/l1yNO2KneGJrvot4g3/8sBN3cfGDGSsTlW2UAV/Yi GgcKqc/3w0jZly86b4QXAaBy8NtdULS3ndaiVUW/czFQlVKQRehcuwz5mDMDQM4InIuz 6jIEpxo/cSTew8wMLmS98UpTzxJFmgc7BFa2tqg4C33s/EwYrpj0eTUHXkb/oRWQCHtc L98w== X-Gm-Message-State: AOJu0YwRHesGp9t4w8OKkjElsnUhCszERZkdlXRA/8pRkQQRVXofaL2Y fFNSsJB6ysgLI26jE9VP4nWqL/z58u3n0vegi5202aD0PVjTzH5sxpqgdohgbrg5y/1QAOKW7Or XeiXhysxij9yLj2Uy1NhYjSAu2g51Q60= X-Gm-Gg: ASbGnctcdY7PKIYyVtrttLdV8ktbMZPU3TwGr591csdaRXSKtRhIHBAOviS1Jk5AyRB zImNUS1ahJnFJcwJnK5/p9EFPTOFSDWhCRbKqqPHBcucicdT+FT6QqQkfvlKXJOqOqqu771Ixb7 YL5prIuNDW5E37eOz2zQtsSbibvjNFKxGLqaVz6NKFZDrOC3bDfuF6+rHIqaXPz00ifZFmB96tW iHypOg2MFMT0HvaLTVShETmeScVxya3mlMFgA== X-Google-Smtp-Source: AGHT+IG72zwLTMm5V34454xYSlwIyupXU9i3cXJpK9gPTJvUtOGgI81qWr9z3HEFWoetgxViJQBWa0CWUND6fdZ1Cug= X-Received: by 2002:a17:907:1c0e:b0:ae0:d4f2:dff3 with SMTP id a640c23a62f3a-aec6a672617mr698940766b.58.1752957607916; Sat, 19 Jul 2025 13:40:07 -0700 (PDT) MIME-Version: 1.0 References: <8efa554c-ba95-43d5-953c-def0d53dca9e@aklaver.com> In-Reply-To: From: veem v Date: Sun, 20 Jul 2025 02:09:56 +0530 X-Gm-Features: Ac12FXxa7XsH_qLEXNtv9SUH9TXlqvwJ7oiOiBG9xoKzT149PrC1t3QkhYVGbt8 Message-ID: Subject: Re: Performance of JSON type in postgres To: Merlin Moncure , Adrian Klaver , "David G. Johnston" Cc: pgsql-general Content-Type: multipart/alternative; boundary="00000000000086b7c1063a4e3ffc" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000086b7c1063a4e3ffc Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Tue, 15 Jul 2025 at 23:02, Merlin Moncure wrote: > On Mon, Jul 14, 2025 at 2:01=E2=80=AFPM David G. Johnston < > david.g.johnston@gmail.com> wrote: > >> On Mon, Jul 14, 2025 at 12:54=E2=80=AFPM Adrian Klaver >> wrote: >> >>> On 7/14/25 12:51, veem v wrote: >>> > So I want to >>> > understand the experts' opinion on this which I believe will be >>> > crucial during design itself. >>> >>> It is spelled out here: >>> >>> https://www.postgresql.org/docs/current/datatype-json.html >>> >>> >> I've taken to heart the main takeaway from that page: >> >> "In general, most applications should prefer to store JSON data as jsonb= , >> unless there are quite specialized needs, such as legacy assumptions abo= ut >> ordering of object keys." >> > > I don't think the documentation is accurate at all, unless one of those > specialized needs is to 'be faster'. json serialization is more than 2x > faster based on simple testing (see below). This is absolutely not a > trivial difference. > > I would say, use json for serialization, use jsonb for data storage, > unless the precise structure of the input document is important. > > merlin > > leaselock_iam@leaselock_prod=3D> explain analyze select json_agg(l) from = ( > select l from llcore.lease l limit 10000) q; > QUERY PLAN > > -------------------------------------------------------------------------= ----------------------------------------------------- > Aggregate (cost=3D405.52..405.53 rows=3D1 width=3D32) (actual > time=3D69.043..69.048 rows=3D1 loops=3D1) > -> Limit (cost=3D0.00..380.52 rows=3D10000 width=3D247) (actual > time=3D0.017..9.764 rows=3D10000 loops=3D1) > -> Seq Scan on lease l (cost=3D0.00..100383.89 rows=3D2638089 > width=3D247) (actual time=3D0.016..8.831 rows=3D10000 loops=3D1) > Planning Time: 0.109 ms > Execution Time: 69.088 ms > (5 rows) > > Time: 160.560 ms > leaselock_iam@leaselock_prod=3D> explain analyze select jsonb_agg(l) from= ( > select l from llcore.lease l limit 10000) q; > QUERY PLAN > > -------------------------------------------------------------------------= ------------------------------------------------------ > Aggregate (cost=3D405.52..405.53 rows=3D1 width=3D32) (actual > time=3D146.139..146.141 rows=3D1 loops=3D1) > -> Limit (cost=3D0.00..380.52 rows=3D10000 width=3D247) (actual > time=3D0.017..20.837 rows=3D10000 loops=3D1) > -> Seq Scan on lease l (cost=3D0.00..100383.89 rows=3D2638089 > width=3D247) (actual time=3D0.016..19.975 rows=3D10000 loops=3D1) > Planning Time: 0.108 ms > Execution Time: 152.277 ms > > Thank you. I tested below for sample data. I see loading or serialization seems a lot slower(twice as slower) in JSONB as compared to JSON. Whereas storage looks efficient in JSONB. and reading performance of nested fields are 7-8 times slower in JSON as compared to JSONB(and ofcourse index support makes it a better choice here). Hope i am testing it correctly here. https://dbfiddle.uk/6P7sjL22 So I am a bit confused here . Also one of our use case is, along with persisting this data and querying it in postgres database, We are also going to move this data from postgres (which is a upstream OLTP system) to a downstream OLAP system ,which is in Snowflake database which is having data types like Variant or Varchar types. So, will it create a significant difference if we store it in JSON vs JSONB in our postgres i.e the source/upstream database? --00000000000086b7c1063a4e3ffc Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

On Tue, = 15 Jul 2025 at 23:02, Merlin Moncure <mmoncure@gmail.com> wrote:
On Mon, Jul 14, 202= 5 at 2:01=E2=80=AFPM David G. Johnston <david.g.johnston@gmail.com> wrote:
O= n Mon, Jul 14, 2025 at 12:54=E2=80=AFPM Adrian Klaver <adrian.klaver@aklaver.com= > wrote:
On 7/14/25 12:51, veem v wrote:
>= ;=C2=A0So I want to
> understand the experts' opinion on this which=C2=A0I believe=C2=A0= will=C2=A0 be
> crucial=C2=A0during design itself.

It is spelled out here:

https://www.postgresql.org/docs/current/d= atatype-json.html


I've taken to heart the main takeaw= ay from that page:

"In= general, most applications should prefer to store JSON data as jsonb, unle= ss there are quite specialized needs, such as legacy assumptions about orde= ring of object keys."

I don't think the=C2=A0documentation is accurate at all, unless one o= f those specialized needs is to 'be faster'.=C2=A0 =C2=A0json seria= lization is more than 2x faster based on simple testing (see below).=C2=A0 = =C2=A0This is absolutely not a trivial difference.

I would say, use json for serialization, use jsonb for data storage, unles= s the precise structure of the input document is important.

<= /div>
merlin

leaselock_iam@leaselock_prod=3D&g= t; explain analyze select json_agg(l) from ( select l from llcore.lease l l= imit 10000) q;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Q= UERY PLAN
--------------------------------------------------------------= ----------------------------------------------------------------
=C2=A0A= ggregate =C2=A0(cost=3D405.52..405.53 rows=3D1 width=3D32) (actual time=3D6= 9.043..69.048 rows=3D1 loops=3D1)
=C2=A0 =C2=A0-> =C2=A0Limit =C2=A0(= cost=3D0.00..380.52 rows=3D10000 width=3D247) (actual time=3D0.017..9.764 r= ows=3D10000 loops=3D1)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0Seq= Scan on lease l =C2=A0(cost=3D0.00..100383.89 rows=3D2638089 width=3D247) = (actual time=3D0.016..8.831 rows=3D10000 loops=3D1)
=C2=A0Planning Time:= 0.109 ms
=C2=A0Execution Time: 69.088 ms
(5 rows)

Time: 160.5= 60 ms
leaselock_iam@leaselock_prod=3D> explain analyze select jsonb_a= gg(l) from ( select l from llcore.lease l limit 10000) q;
=C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 QUERY PLAN
-------------------= ---------------------------------------------------------------------------= ---------------------------------
=C2=A0Aggregate =C2=A0(cost=3D405.52..= 405.53 rows=3D1 width=3D32) (actual time=3D146.139..146.141 rows=3D1 loops= =3D1)
=C2=A0 =C2=A0-> =C2=A0Limit =C2=A0(cost=3D0.00..380.52 rows=3D1= 0000 width=3D247) (actual time=3D0.017..20.837 rows=3D10000 loops=3D1)
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0Seq Scan on lease l =C2=A0(co= st=3D0.00..100383.89 rows=3D2638089 width=3D247) (actual time=3D0.016..19.9= 75 rows=3D10000 loops=3D1)
=C2=A0Planning Time: 0.108 ms
=C2=A0Execut= ion Time: 152.277 ms
=C2=A0

Thank you.

I tested below for sam= ple data. I see loading or serialization seems a lot slower(twice as slower= ) in JSONB as compared to JSON. Whereas storage looks efficient in JSONB. a= nd reading performance of nested fields are 7-8 times slower in JSON as com= pared to JSONB(and ofcourse index support makes it a better choice here). H= ope i am testing it correctly here.
So I am = a bit confused here . Also one of our use case=C2=A0is, along with persisti= ng this data and querying it in postgres database, We are also going to mov= e this data from postgres (which is a upstream OLTP system) to a downstream= OLAP system ,which is in Snowflake database which is having data types lik= e Variant or Varchar types. So, will it create a significant difference if = we store it in JSON vs JSONB in our postgres i.e the source/upstream databa= se?=C2=A0
--00000000000086b7c1063a4e3ffc--