Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s4RzT-0043RB-Ba for pgsql-general@arkaria.postgresql.org; Tue, 07 May 2024 21:04:47 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1s4RzQ-005Aa0-QW for pgsql-general@arkaria.postgresql.org; Tue, 07 May 2024 21:04:45 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s4RzQ-005AZs-73 for pgsql-general@lists.postgresql.org; Tue, 07 May 2024 21:04:45 +0000 Received: from mail-ej1-x629.google.com ([2a00:1450:4864:20::629]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1s4RzM-001vZk-KK for pgsql-general@lists.postgresql.org; Tue, 07 May 2024 21:04:43 +0000 Received: by mail-ej1-x629.google.com with SMTP id a640c23a62f3a-a4702457ccbso946987966b.3 for ; Tue, 07 May 2024 14:04:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=novo-cloud.20230601.gappssmtp.com; s=20230601; t=1715115877; x=1715720677; darn=lists.postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=D0dl58ncQHjFfcfCmuqER4roh4dAyrOSP3iiKFQ2r2g=; b=GV20WnMSCw3WzlS11FjOOgHMaq+bsjE5TB8+nf6I3/TrkvrpnNSunqUbGr2twp4pFd s0bgk4EIjhbAd9uR3dILYz10ic7w+onDD/cwFsKrdw0dcfta87MX2S7GjQAQ5YiDTTCq AehFBgyI7axs/DO6GoQnRj4cSsmz+OUGS4kk+dtZrd9uDuH5gC0i2tSWs3IhUppbptii A8zrv1toEUyMThGu5Sfar7Vbd/7sfDqI8Rlprs7W9/61N5JTreAmpi6eAfDeg/cimAzO bgvcpf+qdVSUqTdmHixofLFcghq+yk2X+hb8YuKyTgx2FBShlfAkuPM6R1eSH+iywPCj T9UQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1715115877; x=1715720677; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=D0dl58ncQHjFfcfCmuqER4roh4dAyrOSP3iiKFQ2r2g=; b=KqtfyyLLUMzXSj+xxXXPgxKRxQB5o638r2V4QLCj30daYiQseL8RCvGUFUV/2jRzZ3 8ZOk7GgEUmWjQCUAFhJOH/2ILqbb8sliO1onD6SD3JchjSiQ1LKmPQjvEuprNDr2G+if Rs5xtA9z3BymRPwT4fDFaK1FovP20yuRiFfNKCzS77LNXRWKfG3KuxpXDo3HZxiXY3dJ XT6GuklggQOU6jFQ3xCrBeJ4FMSrISzQpiftnhovOyop1HB2EuB9o8rQd2gmfBA+S+Wz IxOacOtax3eNAH8bl/6//fQPLZsm2yRqV+HCTAr3tDCt/qNH3vKptRbj95yiJKQRjwh9 NGLg== X-Gm-Message-State: AOJu0Yymr6koxRTmZF4agMyrtMjh1XUm4sXQ3q0QMn84Qi5BjeBdaYBj 9lVj4dRiAoyrtF+/SyeLydQFMQEa8eIRZqJceda/Bl3SGSQocMEyLPojI0pvjNqsl5K/j6Gh6uZ vUV8uxZthXEGFe7kJPfkh9KbYFmoscWsdED6scSMpyCC0XnsP6uw= X-Google-Smtp-Source: AGHT+IFi1R3bZfXmJjZNO1ahnuMdw486HasNiSluIzC/tLgoAfvJDvCBvokNT9+LoagjxVQp8OgpAr5QZC14F9rZeX8= X-Received: by 2002:a17:906:80c2:b0:a59:d243:945 with SMTP id a640c23a62f3a-a59fb9ce376mr34813366b.57.1715115877356; Tue, 07 May 2024 14:04:37 -0700 (PDT) MIME-Version: 1.0 From: Mike Jarmy Date: Tue, 7 May 2024 17:04:27 -0400 Message-ID: Subject: Using GIN Index to see if a nested key exists in JSONB To: pgsql-general@lists.postgresql.org Content-Type: multipart/alternative; boundary="0000000000009e75a40617e388be" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000009e75a40617e388be Content-Type: text/plain; charset="UTF-8" I have a table of semi-structured json that I am storing in a JSONB column with a GIN index: create table foo ( id text primary key, obj jsonb ); create index foo_obj on foo using gin (obj); I populated the table with 10,000 rows of randomly generated JSON objects, with random amounts of nesting. Here are the first few rows: {"b": 0, "c": 3, "d": 2, "id": "0"} {"b": 4, "c": {"a": 3, "b": 4, "c": 4, "d": 1}, "id": "1"} {"a": 3, "b": 1, "c": 0, "d": 0, "id": "2"} {"a": 3, "b": 2, "c": 4, "d": 1, "id": "3"} {"a": 2, "b": {"b": 3, "c": 2, "d": 0}, "c": 3, "id": "4"} {"c": 2, "id": "5"} {"b": {"a": {"d": 3}, "b": 2, "c": 3, "d": 3}, "id": "6"} {"a": 0, "b": 1, "id": "7"} {"a": 1, "c": 3, "d": 3, "id": "8"} {"a": 4, "b": 3, "c": 1, "d": 1, "id": "9"} {"a": 0, "b": {"a": 3, "b": 0, "d": 3}, "d": 0, "id": "10"} {"a": {"b": 2}, "b": 4, "c": 4, "d": 4, "id": "11"} {"a": {"a": 2, "b": 3, "c": 1, "d": {"c": 4}}, "c": 1, "d": 0, "id": "12"} ... There are two queries that I want to issue using the GIN index. The first query, which seems to work, returns the 2 records in the test data set that happen to have a nested key with a certain value. If I use EXPLAIN on the query, it shows me that the index is being used. postgres=# explain analyze select obj from foo where (obj @@ '$.a.a.a.a == 1'::jsonpath); QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on foo (cost=44.01..48.02 rows=1 width=102) (actual time=3.531..5.402 rows=2 loops=1) Recheck Cond: (obj @@ '($."a"."a"."a"."a" == 1)'::jsonpath) Rows Removed by Index Recheck: 4211 Heap Blocks: exact=173 -> Bitmap Index Scan on foo_obj (cost=0.00..44.01 rows=1 width=0) (actual time=2.330..2.331 rows=4213 loops=1) Index Cond: (obj @@ '($."a"."a"."a"."a" == 1)'::jsonpath) Planning Time: 0.276 ms Execution Time: 5.464 ms (8 rows) The second query is the one I'm having trouble with. For this one, I just want to know if a nested key is present in the row -- I don't care what its value is. This query works in the sense that it does return the 13 rows out of 10,000 that happen to have the given nested key. However, EXPLAIN shows that the index isn't being used. postgres=# explain analyze select obj from foo where (obj @? '$.a.a.a.a'::jsonpath); QUERY PLAN -------------------------------------------------------------------------------------------------- Seq Scan on foo (cost=0.00..298.00 rows=1 width=102) (actual time=0.549..4.360 rows=13 loops=1) Filter: (obj @? '$."a"."a"."a"."a"'::jsonpath) Rows Removed by Filter: 9987 Planning Time: 0.164 ms Execution Time: 4.384 ms How can I fix the second query so it uses the GIN index? Maybe the first query is inefficient too and I just don't understand what EXPLAIN is telling me? Is there anything else I'm doing wrong or have muddled up? I'm new to JSONB. P.S. For what its worth, I've come up with a variant of the first query that also works, and that also uses the index: select obj from foo where (obj @> '{"a": {"a": {"a": {"a": 1}}}}'::jsonb); --0000000000009e75a40617e388be Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
I have a table of semi-structured json that I am storing i= n a JSONB column with
a GIN index:

=C2=A0 =C2=A0 create table foo= (
=C2=A0 =C2=A0 =C2=A0 id text primary key,
=C2=A0 =C2=A0 =C2=A0 obj= jsonb
=C2=A0 =C2=A0 );
=C2=A0 =C2=A0 create index foo_obj on foo usi= ng gin (obj);

I populated the table with 10,000 rows of randomly gen= erated JSON objects, with
random amounts of nesting.=C2=A0 Here are the = first few rows:

=C2=A0 =C2=A0 =C2=A0{"b": 0, "c"= : 3, "d": 2, "id": "0"}
=C2=A0 =C2=A0 =C2= =A0{"b": 4, "c": {"a": 3, "b": 4, &= quot;c": 4, "d": 1}, "id": "1"}
=C2= =A0 =C2=A0 =C2=A0{"a": 3, "b": 1, "c": 0, &qu= ot;d": 0, "id": "2"}
=C2=A0 =C2=A0 =C2=A0{"= ;a": 3, "b": 2, "c": 4, "d": 1, "id= ": "3"}
=C2=A0 =C2=A0 =C2=A0{"a": 2, "b&qu= ot;: {"b": 3, "c": 2, "d": 0}, "c":= 3, "id": "4"}
=C2=A0 =C2=A0 =C2=A0{"c": 2= , "id": "5"}
=C2=A0 =C2=A0 =C2=A0{"b": {&q= uot;a": {"d": 3}, "b": 2, "c": 3, "= d": 3}, "id": "6"}
=C2=A0 =C2=A0 =C2=A0{"a= ": 0, "b": 1, "id": "7"}
=C2=A0 =C2= =A0 =C2=A0{"a": 1, "c": 3, "d": 3, "id&q= uot;: "8"}
=C2=A0 =C2=A0 =C2=A0{"a": 4, "b"= ;: 3, "c": 1, "d": 1, "id": "9"}=C2=A0 =C2=A0 =C2=A0{"a": 0, "b": {"a": 3, &= quot;b": 0, "d": 3}, "d": 0, "id": "= ;10"}
=C2=A0 =C2=A0 =C2=A0{"a": {"b": 2}, "= ;b": 4, "c": 4, "d": 4, "id": "11&q= uot;}
=C2=A0 =C2=A0 =C2=A0{"a": {"a": 2, "b&quo= t;: 3, "c": 1, "d": {"c": 4}}, "c":= 1, "d": 0, "id": "12"}
=C2=A0 =C2=A0 =C2= =A0...

There are two queries that I want to issue using the GIN inde= x.=C2=A0 The first
query, which seems to work, returns the 2 records in = the test data set that
happen to have a nested key with a certain value.= If I use EXPLAIN on the query,
it shows me that the index is being used= .

=C2=A0 =C2=A0 postgres=3D# explain analyze select obj from foo whe= re (obj @@ '$.a.a.a.a =3D=3D 1'::jsonpath);

=C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0QUERY PLAN
=C2=A0 =C2=A0 ------= ---------------------------------------------------------------------------= -----------------------------------
=C2=A0 =C2=A0 =C2=A0Bitmap Heap Scan= on foo =C2=A0(cost=3D44.01..48.02 rows=3D1 width=3D102) (actual time=3D3.5= 31..5.402 rows=3D2 loops=3D1)
=C2=A0 =C2=A0 =C2=A0 =C2=A0Recheck Cond: (= obj @@ '($."a"."a"."a"."a" =3D= =3D 1)'::jsonpath)
=C2=A0 =C2=A0 =C2=A0 =C2=A0Rows Removed by Index = Recheck: 4211
=C2=A0 =C2=A0 =C2=A0 =C2=A0Heap Blocks: exact=3D173
=C2= =A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0Bitmap Index Scan on foo_obj =C2=A0(cos= t=3D0.00..44.01 rows=3D1 width=3D0) (actual time=3D2.330..2.331 rows=3D4213= loops=3D1)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Index Cond: = (obj @@ '($."a"."a"."a"."a" =3D= =3D 1)'::jsonpath)
=C2=A0 =C2=A0 =C2=A0Planning Time: 0.276 ms
= =C2=A0 =C2=A0 =C2=A0Execution Time: 5.464 ms
=C2=A0 =C2=A0 (8 rows)
<= br>The second query is the one I'm having trouble with.=C2=A0 For this = one, I just want
to know if a nested key is present in the row -- I don&= #39;t care what its value is.
This query works in the sense that it does= return the 13 rows out of 10,000 that
happen to have the given nested k= ey.=C2=A0 However, EXPLAIN shows that the index
isn't being used.
=C2=A0 =C2=A0 postgres=3D# explain analyze select obj from foo where (= obj @? '$.a.a.a.a'::jsonpath);

=C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 QUERY PL= AN
=C2=A0 =C2=A0 -------------------------------------------------------= -------------------------------------------
=C2=A0 =C2=A0 =C2=A0Seq Scan= on foo =C2=A0(cost=3D0.00..298.00 rows=3D1 width=3D102) (actual time=3D0.5= 49..4.360 rows=3D13 loops=3D1)
=C2=A0 =C2=A0 =C2=A0 =C2=A0Filter: (obj @= ? '$."a"."a"."a"."a"'::json= path)
=C2=A0 =C2=A0 =C2=A0 =C2=A0Rows Removed by Filter: 9987
=C2=A0 = =C2=A0 =C2=A0Planning Time: 0.164 ms
=C2=A0 =C2=A0 =C2=A0Execution Time:= 4.384 ms

How can I fix the second query so it uses the GIN index? M= aybe the first query
is inefficient too and I just don't understand = what EXPLAIN is telling me? Is
there anything else I'm doing wrong o= r have muddled up?=C2=A0 I'm new to JSONB.

P.S. For what its wor= th, I've come up with a variant of the first query that also
works, = and that also uses the index:

=C2=A0 =C2=A0 select obj from foo wher= e (obj @> '{"a": {"a": {"a": {"a&= quot;: 1}}}}'::jsonb);
--0000000000009e75a40617e388be--