Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vgrLm-005Zv3-26 for pgsql-hackers@arkaria.postgresql.org; Fri, 16 Jan 2026 21:27:23 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vgrLl-00580U-2S for pgsql-hackers@arkaria.postgresql.org; Fri, 16 Jan 2026 21:27:22 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vgrLl-00580J-1E for pgsql-hackers@lists.postgresql.org; Fri, 16 Jan 2026 21:27:21 +0000 Received: from mail-ot1-x344.google.com ([2607:f8b0:4864:20::344]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vgrLj-000swc-0l for pgsql-hackers@lists.postgresql.org; Fri, 16 Jan 2026 21:27:21 +0000 Received: by mail-ot1-x344.google.com with SMTP id 46e09a7af769-7cfcb46ffc9so2103892a34.0 for ; Fri, 16 Jan 2026 13:27:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1768598836; x=1769203636; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=nnzSVoBDl0C8l726MOXK9XH3h9lxbqwdDGHslrxDWyQ=; b=gjqQCY+UWivTTNM2MJ7D414ncPQTV0aOxVBjTLe2vV52UQkqNakukG0NVrjSssXsIo eEaLZAqYQozxRGc4eL6SSgbRZJd51yP19m5V7lSNsuCvlKARuEVt6sS3SOg673bkT0Mn rxs1e4eU/KdMrSVdjX9EsnHDGsDpre0Td5WAj685+kt4XlXnypD5saKLAG2XNKpSmrFz CQgms1qWJfZBmtSP1kitYliDmrqruXNrGXf1f04HvqcY6YabKXb7USbxkvId9gOEUPqm kAhHrdXfSQ0VTVmfBShRnPe7VUz7JCG9JEXJYbN9rIuF0PcoT7InWf7QM7m93425q0BN iFRg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1768598836; x=1769203636; h=cc:to:subject:message-id:date:from:mime-version:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=nnzSVoBDl0C8l726MOXK9XH3h9lxbqwdDGHslrxDWyQ=; b=gZkBgJhfTxKVDMBIKpdrBBto6iu15qFOdStcQ1c6bxxcYoPVKcBRfgx/n6volfaFfK ZP1a/q0lHOPFf2AvYGIrtXMsSLgZu0EPvSW/c7LA9/InVt82TL6BqSCHSDhoiTEBYg44 VeeeYUzrhM0CqT4f+fX94dywP2qsmfylBYhCQ3JuxgBIIzrg5Y8c5vj370Dlvs1dxAI4 0eDhq45ZDNL1xepsDmhaziMJHJYyidYGSgCASFcYkSfuMCOuzEOox3s6Acu6DrYYnqJw KkPS19BcW2v8jmABbBisrmXEonCaDZ2Leb1nVlyE9en4QSmSNzZEfxyzm4H12GCm7z3d 2Aog== X-Gm-Message-State: AOJu0Yya+w0mmPJ7XxM2DJYuYKPalP10SF5QJEiKFcLfa8LeJt6v4gOo sVdJzGah5hsNxte+Kz7Y9MTUJRDfMlMXuFMQzFOA/L5/ETBG8kFJM1+fOQvfgap8oRB2USEPAam dIS4WvAI8s7CC6qVfW+t4YKj76T1PiMZIEobV7kQ= X-Gm-Gg: AY/fxX5CSOyodDjtnN5YsgamQH6sQBb7zAsi5gTbMaADGrSwX1q8Vag8z/qT0rM84TE ZpOhzZari9c39NsFmMxc+OHcjdQgwcmTTR10juOhjRExCNvO46RYNrUwynPed5V2F6JiUhWBNpg Dz5t5GtFGpNVyqhxvnrn8JUXECrmxkWRa43FkZ1Jxv/R/FZ4q9E30OyOYFCz72D/YWk3Q0zun2x +aT7PcFqOrvIvnqCv2NKrV2/t1Bm1tAh/Hr8tmVJZlOzjBlA6xDsb9iBmX+y6BRDeE5tvkgMfH+ YulePFil X-Received: by 2002:a05:6808:f05:b0:453:ea2:36d9 with SMTP id 5614622812f47-45c9c9186bfmr1672313b6e.32.1768598835762; Fri, 16 Jan 2026 13:27:15 -0800 (PST) MIME-Version: 1.0 From: Alexandra Wang Date: Fri, 16 Jan 2026 13:26:37 -0800 X-Gm-Features: AZwV_QiYO27UxK17b2lRy1BnNbspLyNF4uyqOqMlbatEWdh-qRojP4JQa_aDAyI Message-ID: Subject: jsonb subscripting vs SQL/JSON array accessor semantics (SQL:2023) To: PostgreSQL Hackers Cc: Peter Eisentraut Content-Type: multipart/alternative; boundary="0000000000005b1bf40648880185" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000005b1bf40648880185 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi hackers, While working on a patch [1] to add SQL/JSON simplified accessor support (dot notation, wildcards, etc.) to jsonb, I ran into a semantic mismatch between PostgreSQL=E2=80=99s existing jsonb subscripting = [2] syntax (value[...]), which predates the SQL/JSON standard, and the SQL:2023 specification of the JSON array accessor (also value[...]) in the SQL/JSON path language. Because both use bracket notation for array access, this creates several edge cases where the existing behavior in PostgreSQL differs from what the standard specifies, and from what json_query() already implemented today. I=E2=80=99d like to summarize the differences, discuss whether we should reconcile them, and get guidance on how we want to proceed. --- ## Context PostgreSQL currently allows jsonb subscripts that are coercible to int4 or text: -- object member access SELECT ('{"a": 1}'::jsonb)['a']; -- returns 1 (a jsonb number) -- array element access SELECT ('[1, "2", null]'::jsonb)[1]; -- returns "2" (a jsonb string) In contrast, SQL:2023 defines array access as part of the SQL/JSON path language: ::=3D ::=3D [ { }... ] where each subscript expression must evaluate to a singleton numeric value (except for the special *last* keyword). Object member access is performed using dot notation, not ['key']. The standard further specifies that a simplified accessor expression is semantically equivalent to: JSON_QUERY (VEP, 'lax $JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) where: VEP =3D JC =3D For example: select ('[1, "2", null]'::jsonb)[1]; by the standard's specification is semantically equivalent to: select json_query('[1, "2", null]'::jsonb, 'lax $[1]' with conditional array wrapper); However, as Peter Eisentraut pointed out in [3], these two forms currently diverge in several cases. --- ## Observed differences Below are the main cases where jsonb subscripting and SQL/JSON path semantics differ. ### Case 1a: Array access on a jsonb primitive value select ('123'::jsonb)[0]; jsonb ------- (1 row) select json_query('123'::jsonb, 'lax $[0]' with conditional array wrapper); json_query ------------ 123 (1 row) The json_query() behavior is correct and comply with the SQL standard: > In lax mode: > If an operation requires an SQL/JSON array but the operand is not an > SQL/JSON array, then the operand is first "wrapped" in an SQL/JSON > array prior to performing the operation. So, whenever there is a 'lax $[0]' jsonpath expression applying on a non-json array, the return value should always be the same as the original json value expression, in this case, the same number 123. The jsonb subscripting result is not only different, but also contradicts its current documentation [2], which suggests raising error: > However, if any val itself or one of the intermediary values is > defined as a non-object such as a string, number, or jsonb null, > traversal cannot proceed so an error is raised and the transaction > aborted. --- ### Case 1b: Array access on a jsonb object select ('{"a": 1}'::jsonb)[0]; jsonb ------- (1 row) select json_query('{"a": 1}'::jsonb, 'lax $[0]' with conditional array wrapper); json_query ------------ {"a": 1} (1 row) Same issue as Case 1a, but with an jsonb object instead of a scalar. json_query() returns the same input json object as expected, whereas the jsonb subscripting returns null. The jsonb subscripting behavior here is undocumented. ### Case 1c: Chained array access on a one-dimensional json array select ('[1, "2", null]'::jsonb)[1][0]; jsonb ------- (1 row) select json_query('[1, "2", null]'::jsonb, 'lax $[1][0]' with conditional array wrapper); json_query ------------ "2" (1 row) This reduces to Case 1a: the intermediate result of [1] is a scalar ("2"), which SQL/JSON treats as a single-element array in 'lax' mode. --- ### Case 2: Non-integer array index select ('[1, "2", null]'::jsonb)[1.0]; ERROR: subscript type numeric is not supported LINE 1: select ('[1, "2", null]'::jsonb)[1.0]; ^ HINT: jsonb subscript must be coercible to either integer or text. select json_query('[1, "2", null]'::jsonb, 'lax $[1.0]' with conditional array wrapper); json_query ------------ "2" (1 row) select json_query('[1, "2", null]'::jsonb, 'lax $[1.9]' with conditional array wrapper); json_query ------------ "2" (1 row) The standard allows non-integer numeric subscripts, with implementation-defined rounding or truncation (IA073). PostgreSQL=E2=80=99s json_query() uses truncation, which is compliant. jsonb subscripting does not support this at all, which I think is reasonable, but again diverges from the standard behavior. For reference, since DuckDB and Oracle both support JSON simplified accessor, I tried similar queries on them as well: -- DuckDB duckdb> select ('[1, "2", null]'::json)[0.4]; -- returns 1 duckdb> select ('[1, "2", null]'::json)[0.5]; -- returns 2 -- Oracle WITH data AS ( SELECT JSON('[1, "2", null]') AS jcol ) SELECT d.jcol[1.5] FROM data d; -- error So DuckDB does rounding, and Orcale errors due to "an invalid array index". --- ### Case 3: Negative array index select ('[1, "2", null]'::jsonb)[-2]; jsonb ------- "2" (1 row) select json_query('[1, "2", null]'::jsonb, 'lax $[-2]' with conditional array wrapper); json_query ------------ (1 row) select json_query('[1, "2", null]'::jsonb, 'lax $[-0.9]' with conditional array wrapper); json_query ------------ 1 (1 row) The json_query() behavior is correct according to the standard for out of range index in 'lax' mode; and the jsonb subscripting is a custom extension (-1 means last element, etc.). Interestingly, DuckDB's JSON array accessor made the same custom choice as PostgreSQL for negative index. --- ## Summary In all cases above, json_query() follows the SQL/JSON standard specification. jsonb subscripting, which predates the standard, differs in several ways: 1. Array access on non-arrays (scalars or objects) does not use 'lax' mode wrapping. As a result, "[0]" does not return the original value. 2. Non-integer subscripts are not supported. 3. Negative subscripts use a PostgreSQL-specific extension. These are all edge cases and likely low-impact in practice. However, if we were to add broader support for SQL/JSON simplified accessors (dot notation, wildcards, item methods, etc.), which can be chained together and include array access using the existing bracket syntax, it seems useful to clarify what semantics we would want for array access in those situations. For example, with expressions such as select (jb)[0].a select (jb).a[0].b select (jb).a[-1].b (where jb is a jsonb value), it would be potentially confusing if their behavior differed from the equivalent json_query() calls using the same JSON path. Given this, it is unclear to me whether the expectation should be to move closer to the SQL/JSON path semantics, or to preserve the existing jsonb subscripting behavior and document it as PostgreSQL-specific. This question comes up in the context of the dot-notation work I mentioned at the beginning, but more generally it seems like something we may want to be explicit about before extending the syntax further. I would very much appreciate any thoughts or guidance on this. Best, Alex [1] https://commitfest.postgresql.org/patch/5214/ [2] https://www.postgresql.org/docs/current/datatype-json.html#JSONB-SUBSCRIPTI= NG [3] https://www.postgresql.org/message-id/a8843d0a-8adb-4fdc-9ac8-8efd22f7d29c%= 40eisentraut.org [4] https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJS= ON-STRICT-AND-LAX-MODES --=20 Alexandra Wang EDB: https://www.enterprisedb.com --0000000000005b1bf40648880185 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi hackers,

While working on a patch [1] to ad= d SQL/JSON simplified accessor
support (dot notation, wildcards, etc.) t= o jsonb, I ran into a
semantic mismatch between PostgreSQL=E2=80=99s exi= sting jsonb subscripting [2]
syntax (value[...]), which predates the SQL= /JSON standard, and the
SQL:2023 specification of the JSON array accesso= r (also value[...]) in
the SQL/JSON path language.

Because both u= se bracket notation for array access, this creates
several edge cases wh= ere the existing behavior in PostgreSQL differs
from what the standard s= pecifies, and from what json_query() already
implemented today.

I= =E2=80=99d like to summarize the differences, discuss whether we should
= reconcile them, and get guidance on how we want to proceed.

---
<= br>## Context

PostgreSQL currently allows jsonb subscripts that are = coercible to
int4 or text:

-- object member access
SELECT (= 9;{"a": 1}'::jsonb)['a']; -- returns 1 (a jsonb numbe= r)

-- array element access
SELECT ('[1, "2", null]&= #39;::jsonb)[1]; -- returns "2" (a jsonb string)

In contra= st, SQL:2023 defines array access as part of the SQL/JSON
path language:=

<JSON array accessor> ::=3D
=C2=A0 <left bracket> &l= t;JSON subscript list> <right bracket>
<JSON subscript list&= gt; ::=3D
=C2=A0 <JSON subscript> [ { <comma> <JSON subsc= ript> }... ]

where each subscript expression must evaluate to a s= ingleton numeric
value (except for the special *last* keyword). Object m= ember access is
performed using dot notation, not ['key'].
The standard further specifies that a simplified accessor expression
i= s semantically equivalent to:

=C2=A0 =C2=A0 JSON_QUERY (VEP, 'la= x $JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR)
<= br>=C2=A0 =C2=A0 where:
=C2=A0 =C2=A0 =C2=A0 VEP =3D <value expressio= n primary>
=C2=A0 =C2=A0 =C2=A0 JC =3D <JSON simplified accessor o= p chain>

For example:

select ('[1, "2", null= ]'::jsonb)[1];
=C2=A0 =C2=A0 =C2=A0
by the standard's specif= ication is semantically equivalent to:

select json_query('[1, &q= uot;2", null]'::jsonb, 'lax $[1]' with conditional array w= rapper);

However, as Peter Eisentraut pointed out in [3], these two = forms
currently diverge in several cases.

---

## Observed = differences
=C2=A0 =C2=A0 =C2=A0
Below are the main cases where json= b subscripting and SQL/JSON path
semantics differ.
=C2=A0 =C2=A0 =C2= =A0
### Case 1a: Array access on a jsonb primitive value
select (= 9;123'::jsonb)[0];
=C2=A0 jsonb
-------

(1 row)

sel= ect json_query('123'::jsonb, 'lax $[0]' with conditional ar= ray wrapper);
=C2=A0 json_query
------------
=C2=A0 123
(1 row)=
=C2=A0 =C2=A0 =C2=A0
The json_query() behavior is correct and compl= y with the SQL standard:
=C2=A0 =C2=A0 =C2=A0
> In lax mode:
= > If an operation requires an SQL/JSON array but the operand is not an> SQL/JSON array, then the operand is first "wrapped" in an = SQL/JSON
> array prior to performing the operation.

So, whenev= er there is a 'lax $[0]' jsonpath expression applying on a
non-j= son array, the return value should always be the same as the
original js= on value expression, in this case, the same number 123.

The jsonb su= bscripting result is not only different, but also
contradicts its curren= t documentation [2], which suggests raising
error:
> However, if a= ny val itself or one of the intermediary values is
> defined as a non= -object such as a string, number, or jsonb null,
> traversal cannot p= roceed so an error is raised and the transaction
> aborted.

--= -

### Case 1b: Array access on a jsonb object
select ('{"= ;a": 1}'::jsonb)[0];
=C2=A0 jsonb
-------

(1 row)
=
select json_query('{"a": 1}'::jsonb, 'lax $[0]= 9; with conditional array
wrapper);
=C2=A0 json_query
------------=
=C2=A0 {"a": 1}
(1 row)

Same issue as Case 1a, but = with an jsonb object instead of a scalar.
json_query() returns the same = input json object as expected, whereas
the jsonb subscripting returns nu= ll. The jsonb subscripting behavior
here is undocumented.

### Cas= e 1c: Chained array access on a one-dimensional json array

select (&= #39;[1, "2", null]'::jsonb)[1][0];
=C2=A0 jsonb
-------=

(1 row)

select json_query('[1, "2", null]'= ::jsonb, 'lax $[1][0]' with
conditional array wrapper);
=C2= =A0 json_query
------------
=C2=A0 "2"
(1 row)

Th= is reduces to Case 1a: the intermediate result of [1] is a scalar
("= ;2"), which SQL/JSON treats as a single-element array in 'lax'= mode.

---

### Case 2: Non-integer array index

select = ('[1, "2", null]'::jsonb)[1.0];
ERROR: =C2=A0subscript= type numeric is not supported
LINE 1: select ('[1, "2", n= ull]'::jsonb)[1.0];
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0^
HINT: =C2=A0jsonb subscript must be coercible = to either integer or text.

select json_query('[1, "2",= null]'::jsonb, 'lax $[1.0]' with conditional
array wrapper)= ;
=C2=A0 json_query
------------
=C2=A0 "2"
(1 row)
select json_query('[1, "2", null]'::jsonb, 'lax= $[1.9]' with conditional
array wrapper);
=C2=A0 json_query
--= ----------
=C2=A0 "2"
(1 row)

The standard allows no= n-integer numeric subscripts, with
implementation-defined rounding or tr= uncation (IA073). PostgreSQL=E2=80=99s
json_query() uses truncation, whi= ch is compliant.
=C2=A0 =C2=A0 =C2=A0
jsonb subscripting does not su= pport this at all, which I think is
reasonable, but again diverges from = the standard behavior.
=C2=A0 =C2=A0 =C2=A0
For reference, since Duc= kDB and Oracle both support JSON simplified
accessor, I tried similar qu= eries on them as well:
=C2=A0 =C2=A0 =C2=A0
-- DuckDB
duckdb> = select ('[1, "2", null]'::json)[0.4]; -- returns 1
duc= kdb> select ('[1, "2", null]'::json)[0.5]; -- returns = 2

-- Oracle
WITH data AS (
=C2=A0 SELECT JSON('[1, "2= ", null]') AS jcol
)
SELECT d.jcol[1.5] FROM data d; -- erro= r
=C2=A0 =C2=A0 =C2=A0
So DuckDB does rounding, and Orcale errors du= e to "an invalid array
index".

---

### Case 3: = Negative array index
select ('[1, "2", null]'::jsonb)[= -2];
=C2=A0 jsonb
-------
=C2=A0 "2"
(1 row)

s= elect json_query('[1, "2", null]'::jsonb, 'lax $[-2]&= #39; with conditional
array wrapper);
=C2=A0 json_query
----------= --

(1 row)
=C2=A0 =C2=A0 =C2=A0
select json_query('[1, &q= uot;2", null]'::jsonb, 'lax $[-0.9]' with conditional =C2= =A0 =C2=A0 =C2=A0 =C2=A0 array wrapper);
=C2=A0json_query
-----------= -
=C2=A01
(1 row)

The json_query() behavior is correct accordi= ng to the standard for out
of range index in 'lax' mode; and the= jsonb subscripting is a custom
extension (-1 means last element, etc.).=

Interestingly, DuckDB's JSON array accessor made the same custo= m
choice as PostgreSQL for negative index.

---

## Summary<= br>=C2=A0 =C2=A0 =C2=A0
In all cases above, json_query() follows the SQ= L/JSON standard
specification. jsonb subscripting, which predates the st= andard,
differs in several ways:

1. Array access on non-arrays (s= calars or objects) does not use 'lax'
mode wrapping. As a result= , "[0]" does not return the original value.
2. Non-integer sub= scripts are not supported.
3. Negative subscripts use a PostgreSQL-spec= ific extension.
=C2=A0 =C2=A0 =C2=A0
These are all edge cases and li= kely low-impact in practice. However,
if we were to add broader support = for SQL/JSON simplified accessors
(dot notation, wildcards, item methods= , etc.), which can be chained
together and include array access using th= e existing bracket syntax,
it seems useful to clarify what semantics we = would want for array
access in those situations.

For example, wit= h expressions such as

=C2=A0 =C2=A0 select (jb)[0].a
=C2=A0 =C2= =A0 select (jb).a[0].b
=C2=A0 =C2=A0 select (jb).a[-1].b

(where j= b is a jsonb value), it would be potentially confusing if
their behavior= differed from the equivalent json_query() calls using
the same JSON pat= h.

Given this, it is unclear to me whether the expectation should be= to
move closer to the SQL/JSON path semantics, or to preserve the
ex= isting jsonb subscripting behavior and document it as
PostgreSQL-specifi= c.

This question comes up in the context of the dot-notation work I<= br>mentioned at the beginning, but more generally it seems like somethingwe may want to be explicit about before extending the syntax further.
=
I would very much appreciate any thoughts or guidance on this.
=C2= =A0 =C2=A0 =C2=A0
Best,
Alex
=C2=A0 =C2=A0 =C2=A0
[1] https://commitfest.postg= resql.org/patch/5214/
[2] https://www.postgresql.org/= docs/current/datatype-json.html#JSONB-SUBSCRIPTING
[3] https://www.postgresql.org/message-id/a8843d0a-8adb-4fdc-9a= c8-8efd22f7d29c%40eisentraut.org
[4]https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-S= QLJSON-STRICT-AND-LAX-MODES

--
= Alexandra Wang
--0000000000005b1bf40648880185--