MIME-Version: 1.0
References: <174238647361.682.12732328104350596711@wrigleys.postgresql.org>
From: "David G. Johnston"
Date: Thu, 20 Mar 2025 19:04:18 -0700
Subject: Re: Ambiguity in IS JSON description and logic
To: Kirk Parker
Cc: vavankaru@gmail.com, pgsql-docs@lists.postgresql.org
Content-Type: multipart/alternative; boundary="0000000000003e19510630d0aee7"

--0000000000003e19510630d0aee7
Content-Type: text/plain; charset="UTF-8"

On Thu, Mar 20, 2025 at 7:22 AM Kirk Parker wrote:

> On Thu, Mar 20, 2025 at 7:08 AM Kirk Parker wrote:
>
>>
>> On Thu, Mar 20, 2025 at 2:46 AM PG Doc comments form <
>> noreply@postgresql.org> wrote:
>>
>>> The following documentation comment has been logged on the website:
>>>
>>> Page: https://www.postgresql.org/docs/17/functions-json.html
>>> Description:
>>>
>>> On the manual page
>>> https://www.postgresql.org/docs/current/functions-json.html, Table 9.48
>>> "SQL/JSON Testing Functions" contains a description of IS JSON. It
>>> includes the following sentence: "If WITH UNIQUE KEYS is specified, then
>>> any object in the expression is also tested to see if it has duplicate
>>> keys." This text is ambiguous, because the term "object" has a specific
>>> meaning in the JSON format. In practice, the WITH UNIQUE KEYS option
>>> checks array elements, not objects, for duplicate keys. For objects, both
>>> WITH UNIQUE KEYS and WITHOUT UNIQUE KEYS return false, and both IS JSON
>>> ARRAY WITH UNIQUE KEYS and IS JSON ARRAY WITHOUT UNIQUE KEYS return true
>>> (how is it possible for a value to be both with and without unique keys
>>> at the same time?); i.e., it behaves the same as plain IS JSON ARRAY.
>>> Example code that confirms my reasoning:
>>>
>>> SELECT
>>>     js.vl AS "tested str",
>>>     js.vl IS JSON OBJECT WITH UNIQUE KEYS AS ".. object w. UQ keys",
>>>     js.vl IS JSON OBJECT WITHOUT UNIQUE KEYS AS ".. object w/o UQ keys",
>>>     js.vl IS JSON ARRAY WITH UNIQUE KEYS AS ".. array w. UQ keys",
>>>     js.vl IS JSON ARRAY WITHOUT UNIQUE KEYS AS ".. array w/o UQ keys",
>>>     js.vl IS JSON ARRAY ".. array"
>>> FROM (VALUES ('{{"a": "a1"}, {"a": "a2"}}'), ('[{"a": "a1"}, {"a": "a2"}]'),
>>>     ('["a", "a"]')) AS js(vl);
>>>
>>> I'm not sure what the right logic for this option should be; to me it
>>> now looks the same as a plain IS JSON ARRAY without any UNIQUE KEYS
>>> option. But if we use an option, the result should be true for either
>>> WITH UNIQUE KEYS or WITHOUT UNIQUE KEYS, not for both at the same time.
>>> In any case, the sentence I quoted above should say "array" instead of
>>> "object", because for objects it returns false regardless of the option
>>> applied. I tested it on "PostgreSQL 17.0 on x86_64-windows, compiled by
>>> msvc-19.41.34120, 64-bit".
>>>
>>
>> First, WITHOUT UNIQUE KEYS does not mean "confirm that there are
>> duplicate keys"; it's just a way of stating the default explicitly. In
>> other words, it means "without testing for duplicate keys". Thus IS JSON
>> OBJECT and IS JSON OBJECT WITHOUT UNIQUE KEYS will always return
>> identical results on the same JSON expression.
>>
>> Second, the UNIQUE test is recursive. For objects the meaning may be
>> intuitive, but for JSON arrays (which have no concept of keys; JSON
>> arrays are just ordered lists) it means "does this array contain any
>> embedded objects with duplicate keys?".
>>
>> See:
>>
>> SELECT js,
>>   js IS JSON "json?",
>>   js IS JSON OBJECT "object?",
>>   js IS JSON OBJECT WITH UNIQUE KEYS "object w. UK?",
>>   js IS JSON OBJECT WITHOUT UNIQUE KEYS "object w/o UK?",
>>   js IS JSON ARRAY "array?",
>>   js IS JSON ARRAY WITH UNIQUE KEYS "array w. UK?",
>>   js IS JSON ARRAY WITHOUT UNIQUE KEYS "array w/o UK?"
>> FROM (VALUES
>>   ('[{"a":1},{"b":2,"b":3}]'),                   -- expect t for array, array w/o UK
>>   ('[{"a":1},{"b":2,"c":3}]'),                   -- expect t for ALL array tests
>>   ('{"b":2,"b":3}'),                             -- expect t for object, object w/o UK
>>   ('{"c":2,"d":3}'),                             -- expect t for ALL object tests
>>   ('{"c":2,"d":{ "e": 0, "e": 1}}'),             -- WITH UNIQUE is recursive for nested objects
>>   ('{"c":2,"d":{ "e": 0, "f": {"g":1,"g":2}}}'), -- no matter how deep
>>   ('[{"a":1},{"b":2,"c":{"d":1, "d":2}}]')       -- and also tests arrays recursively for embedded objects
>> ) foo(js);
>>
>>
>> A couple of side notes:
>>
>> 1. Your first data example is not JSON at all. It's helpful for this kind
>> of test to include a plain IS JSON column, since any of the IS JSON X
>> tests can fail for two reasons: (a) it's not JSON, or (b) it is JSON but
>> it's not an X.
>>
>> 2. Curiously, the JSON spec itself is completely silent on the meaning of
>> objects with duplicate keys. PostgreSQL is more helpful in this regard:
>> the docs explicitly state that the last value is the one that is retained
>> by JSONB and used in processing functions.
>>
>>
> To improve the documentation here, I would suggest simply adding the word
> "recursively" after "tested":
>
>     If WITH UNIQUE KEYS is specified, then any object in the *expression*
>     is also tested recursively to see if it has duplicate keys.
>
>

I think the existing word "any" sufficiently implies "recursively". It also
doesn't really address the complaint here. I'm thinking of something more
like:

(this is changed intentionally, see below)

expression IS [ NOT ] JSON [ { SCALAR | ARRAY | OBJECT } ] [ WITH UNIQUE ] -> boolean

This predicate tests whether expression can be parsed as JSON. Two additional
properties can be tested at the same time: the type of the JSON value, and
whether it passes the unique object keys constraint.

Enable the first test by specifying one of SCALAR, ARRAY, or OBJECT.
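
A minimal sketch of the first (type) test on its own, assuming PostgreSQL 16 or later, where the IS JSON predicate is available. The sample values are made up for illustration: the last one is deliberately not JSON, so every column is false for it, and the object with a repeated key still passes the OBJECT test because no unique-keys test is enabled.

```sql
-- Type test only: no duplicate-key checking is enabled here.
-- Sample rows are illustrative, not taken from the thread.
SELECT js,
  js IS JSON        AS "parses ok",
  js IS JSON SCALAR AS "scalar?",
  js IS JSON ARRAY  AS "array?",
  js IS JSON OBJECT AS "object?"
FROM (VALUES
  ('42'),             -- expect t, t, f, f
  ('[1, 2]'),         -- expect t, f, t, f
  ('{"a":1,"a":2}'),  -- expect t, f, f, t (duplicate keys are not checked)
  ('{{"a":1}}')       -- expect f, f, f, f (not JSON at all)
) foo(js);
```

Adding WITH UNIQUE to the OBJECT column would turn the third row's result to f, since the unique-keys test is then enabled as well.
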

Enable the second test by specifying WITH UNIQUE: this test is applied to all
objects contained within the JSON value.

The return value is true only if expression can be parsed as JSON and all
enabled tests pass. The return value is inverted if NOT is specified.

The test label "array w/o UK?" has to go. Coupled with the "additional tests"
idea introduced above and the recommended syntax, we should do something like:

SELECT js,
  js IS JSON "parses ok, no tests",
  js IS JSON OBJECT "object test only",
  js IS JSON ARRAY "array test only",
  js IS JSON WITH UNIQUE "unique test only",
  js IS JSON ARRAY WITH UNIQUE "array and unique tests"
FROM (VALUES ('[{"a":1},{"b":2,"b":3}]'), ('{"c":2,"d":3}')) foo(js);

Then, to keep the technical reference thorough, re-add the full syntax at the
end:

This is the full syntax accepted for this predicate. VALUE and WITHOUT (the
explicit keywords for the defaults, which disable the two additional tests),
as well as KEYS, are omitted above for clarity.

expression IS [ NOT ] JSON [ { VALUE | SCALAR | ARRAY | OBJECT } ] [ { WITH | WITHOUT } UNIQUE [ KEYS ] ]

David J.

--0000000000003e19510630d0aee7
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--0000000000003e19510630d0aee7--