From: Jonathan S. Katz <[email protected]>
To: [email protected]
To: [email protected]
To: PG Doc comments form <[email protected]>
Subject: Re: legacy assumptions
Date: Mon, 25 Nov 2019 19:28:43 -0500
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
Hi,
On 11/25/19 12:47 PM, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/12/datatype-json.html
> Description:
>
> I'm wondering if this one line of section 8.14 JSON Types
> (https://www.postgresql.org/docs/current/datatype-json.html) can be edited
> to remove the word "legacy":
>
> "In general, most applications should prefer to store JSON data as jsonb,
> unless there are quite specialized needs, such as legacy assumptions about
> ordering of object keys."
>
> I'm concerned that with the word "legacy" there, someone might come along
> eventually and decide the json column type isn't needed anymore because it's
> "legacy", where in fact there are modern and legitimate uses for a field
> that allows you to retrieve the data exactly as it was stored and allows
> JSON queries on that data (even if they are slower).
While I'm certainly sensitive to this need, as once upon a time I had a
similar, slightly less strict requirement, I made sure not to rely on
the PostgreSQL JSON type itself to ensure ordering was preserved (in my
case I was able to rely on a solution external to PostgreSQL).
The JSON RFC states that objects should be considered "unordered", and
mentions that while different parsing libraries may preserve key
ordering, "implementations whose behavior does not depend on member
ordering will be interoperable in the sense that they will not be
affected by these differences."[1]
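To make the ordering point concrete, here is a minimal sketch (not part
of the original test run; the literal is made up for illustration) of
how the two types treat the same input. json stores the text verbatim,
while jsonb keeps only the last value for a duplicate key and does not
preserve key order:

```sql
-- Illustrative literal only: the same input cast to both types.
SELECT '{"b": 1, "a": 2, "a": 3}'::json  AS as_json,
       '{"b": 1, "a": 2, "a": 3}'::jsonb AS as_jsonb;
-- as_json  keeps the input text, including key order and the duplicate "a"
-- as_jsonb comes back as {"a": 3, "b": 1}
```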
> An alternative would be to store the
> plaintext as binary data for the integrity check and have a separate jsonb
> column with a second copy of the same data. Since different applications
> have different time/space tradeoffs, it's good to have the choice.
Another approach is to leverage PostgreSQL's expression index
capabilities, which would allow you to limit the data duplication. For
example:
    CREATE TABLE docs (doc bytea);

    -- populating some test data
    INSERT INTO docs
    SELECT ('{"id": ' || x || ', "data": [1,2,3] }')::bytea
    FROM generate_series(1, 100000) x;

    -- create an expression index that maps to the operators supported by GIN
    CREATE INDEX docs_doc_json_idx ON docs
        USING gin(jsonb(encode(doc, 'escape')));
and in one test run:
    EXPLAIN
    SELECT doc
    FROM docs WHERE encode(doc, 'escape')::jsonb @> '{"id": 567}';
I got a plan similar to:
                                         QUERY PLAN
    ------------------------------------------------------------------------------------
     Bitmap Heap Scan on docs  (cost=28.77..306.00 rows=100 width=31)
       Recheck Cond: ((encode(doc, 'escape'::text))::jsonb @> '{"id": 567}'::jsonb)
       ->  Bitmap Index Scan on docs_doc_json_idx  (cost=0.00..28.75 rows=100 width=0)
             Index Cond: ((encode(doc, 'escape'::text))::jsonb @> '{"id": 567}'::jsonb)
In this way, you can:
- Keep the key ordering preserved and perform any integrity checks, etc.
that your application requires
- Limit your data duplication to that of the index
- Still get the benefits of the JSONB lookup functions that work with
the indexing
- Still perform JSON validation, since the index expression is evaluated
for every inserted row:
    INSERT INTO docs VALUES ('{]'::bytea);
    ERROR: invalid input syntax for type json
    DETAIL: Expected string or "}", but found "]".
    CONTEXT: JSON data, line 1: {]
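The integrity-check side of this can be sketched in the same way.
Assuming PostgreSQL 11 or later (where the built-in sha256(bytea)
function is available) and the docs table from the example above, a
digest can be computed over the exact stored bytes while the lookup
still goes through the jsonb expression. The choice of SHA-256 is an
assumption for illustration, not something specified in the thread:

```sql
-- Digest of the exact stored bytes; the WHERE clause repeats the indexed
-- expression so the planner can consider docs_doc_json_idx.
SELECT encode(sha256(doc), 'hex') AS doc_sha256,
       doc
FROM docs
WHERE encode(doc, 'escape')::jsonb @> '{"id": 567}';
```

Because the digest is taken over the bytea column rather than over a
jsonb value, it reflects key order and whitespace exactly as they were
inserted.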
> My suggestion for that sentence:
>
> "In general, most applications should prefer to store JSON data as jsonb,
> unless there are quite specialized needs, such as assumptions about ordering
> of object keys or the need to retrieve the data exactly as it was stored."
My preference would be for the documentation to give guidance on what to
do if one has an application that is sensitive to ordering. I'm not
opposed to the wording, but I'd prefer we encourage people to leverage
JSONB for storage & retrieval.
Thanks!
Jonathan
[1] https://tools.ietf.org/html/rfc7159#section-4