Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.89) (envelope-from ) id 1iZeq6-0000Wg-J6 for pgsql-docs@arkaria.postgresql.org; Tue, 26 Nov 2019 17:41:26 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.89) (envelope-from ) id 1iZeq5-0005Ma-5z for pgsql-docs@arkaria.postgresql.org; Tue, 26 Nov 2019 17:41:25 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.89) (envelope-from ) id 1iZeq4-0005MS-Uf for pgsql-docs@lists.postgresql.org; Tue, 26 Nov 2019 17:41:24 +0000 Received: from mout.perfora.net ([74.208.4.197]) by magus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.89) (envelope-from ) id 1iZeq1-0000lt-Tb for pgsql-docs@lists.postgresql.org; Tue, 26 Nov 2019 17:41:24 +0000 Received: from [192.168.254.10] ([50.39.98.38]) by mrelay.perfora.net (mreueus001 [74.208.5.2]) with ESMTPSA (Nemesis) id 0LvjHG-1hk1bf38ay-017YqA for ; Tue, 26 Nov 2019 18:41:19 +0100 Subject: Re: legacy assumptions To: pgsql-docs@lists.postgresql.org References: <157470402952.7229.3931536721568065202@wrigleys.postgresql.org> From: Jonathan Buhacoff Openpgp: preference=signencrypt Autocrypt: addr=jonathan@buhacoff.net; keydata= mQENBFtFxiABCACkb7/EQ9yqctEaJTdMFyqbzASf6ccTqriHYZ5b5fYCEDN9QUrA3LFzsSQ3 UcnnBID3Z2diAuY+vMQ8eIXWOt49Tu7qd+Oi7sdBX0b4HH0QqfQZzwOjF++wAONQQkOwpkzv UTDeeaaYQv/JbSWjPG45Y7BEcpWwQMa/clTvAgHpV2l64FfnuUmtCdF02f9W6dAascuU0lX6 BZ4Mi+PUp/Flo8wMnwjHSQ5DIZIVcfDBf1BpcTTxa7F+so8B3el0kWyzbHbY5pAxodWVgkSi qD7oHxmkDKQkFdsnTjcgv8ekO73v/Cx1Q2sNOWA610iDYRJLt0vdcIPrWyREPNQETkVNABEB AAG0KUpvbmF0aGFuIEJ1aGFjb2ZmIDxqb25hdGhhbkBidWhhY29mZi5uZXQ+iQFOBBMBCAA4 FiEESkQyCf7u1VZG0GYiWwMNNTOaAvAFAludJpkCGwMFCwkIBwIGFQoJCAsCBBYCAwECHgEC F4AACgkQWwMNNTOaAvAAgQf+OCBLAhdQDc6dhXjkhPngWr6nGO4Wksh87GOlYt3W8hYaYDt4 UXujTsCdNudS0YcqFwBum0A/kyEvzsdukL+66Tx0rGkSAG2aU5i8S8rvMLKlTAEsQN+fC9yt pUM5VLxKHmyzdRv7JhlJ73Gz2hVTPdB7HL/iaLQQs2EptzXHL1qXdV6EW2ZlUH3bz3sBGubE V/Cl+nTB50YtoUKaRDNKmJ1KCcv5mA7GefPz4m2g4h/ne92gyJkqWx1858jpG2zloUSXjMDH JR8EWTU28f/+0HmYfsOcVHLQpgMVJMgaGaxXZwUUofShz4Hf0h2urMreJKLrw/0PbEUE04qb yaW5KLkBDQRbRcYgAQgA5AeMvBeBz/XuqK3c6VbWMG4XVahZ3fTwUlrkyKJXzgCFAFwd0ixf fGb9M/IvAtJvIXJH60NXCr3cuQ5EPQBegaYqE0L+jUdS82hkpYgEoQhlUVAuVZi3mo1ETyFx 46NKL0TCyDpGDpOAv+VHw30mZr58uOtEkPeAj2x8O+9dRqSKu6MwuMmy05CBcGPHi5Suysiz S/jZIsH3nSZCT/LrZqDHFitVcUOTRDA4DYygTJQfhNY4YbJ7rLnxZkpY5Jtw/rniDpEFAd0z HnEuGruH/2sM1yz3XLtBp7i+X+8OxFJ2/wEpo9Cbj0KqSQHnX1Q487RP9oM1Y4GJ113EfkSz FQARAQABiQE2BBgBCAAgFiEESkQyCf7u1VZG0GYiWwMNNTOaAvAFAltFxiACGwwACgkQWwMN NTOaAvBrxwf9F3UXRtMFG8he9749/JMzbbxDtTDzazOaqYs4Eoqor31cDpX4LobbfHnoETRl fYxybbI75IwiDnJ1WvBQJHeMMmfK/Lxr1facBe0jRVFAY2VFUX0ec1O24AhztI/4Py3xpx7k Ndl/wgqrOjhqfRKumamTvMeL6MPRLv2x1c2JuAjTpLKR6MaFbvM9YagUUe6SNfSk8af43LKO kvsDenqwrqtrUMR545G9niJAEFSJR7/Y9qAzeB2GqqnIloiZPbO1GmshVH+4ZXOMZbz94zFK CiXdp5aJjApkzfGkn+Uzg9JmUGcgjNEM35Mwkoh0WivqA7TSRpMJ2Rxes8T+2fZPuA== Message-ID: <62f15e85-4bc7-ac8b-b0b5-05412e81e2b0@buhacoff.net> Date: Tue, 26 Nov 2019 09:41:17 -0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.9.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Content-Language: en-US X-Provags-ID: V03:K1:e0Gol3ZZIDjA8dpcHuJ8r4a/+NDLZ2uSNlsnyLR3Jq8NCQL3t4w +rpFnv0RXzgObiQiLOK8haKTtNOf44uh/E84g/7xW63mg+Wj2n50OglB96oOflEBOLHysps FQuxO4ed+FYotFqkdz3Lt8zhUXTmmWEOw9AP1cAFdVJ7p+aX5jhwMIQ6fk4NOROCblvUh4x cI1RjFAYlqpvnBsMosmrQ== X-Spam-Flag: NO X-UI-Out-Filterresults: notjunk:1;V03:K0:n2tVAEGLrXg=:mV8Oq7TGxbhOJ+tnlC8A7N YUNvEigQJEdAiEfQBtD/cYwzkmDz5P+dBQa1v8XxNzhbeARCQiGwX4ODD9SsMmUA2HsOzFHqP 4ZtR4D+dfvQbvOz0IVwIIWIIYa8OksHUtxWb0RKiiEa9VFhgb7DFbn/rqLifSDR1XPvmBVoY9 jwLmW8Jda9to1MQeNK8KKoOPITbzQAjde4eNIl82xIWtxqmmSuOw8RCu/yrrBVtDIExkXm0lO M3ATH/BPr01LiwByWwVJcJo5Topg2e3Ynv5+uYMqIb5FPof7OcK3+7AO6RnBM4igJvregWaj9 hbYNOthYrickykaYrWMZ5BA+9zkkFSrYG7q7Ls+avhX7E0jxMPw95tw5utGf53F3W+v8hHXx9 Qs1f3D+7M4RY//vd9Ed5bWei35mQtjT/6ok2KpMq/FHCS0msq/oXuT8+fSLmyqCpYkbMNu7Uo UmYanzwZCv7Z/jWGJey7ku8u9YWenPqL/BT9u/ulBxNNrQ79zs85eh8D49UlBUv8I06ycmapg thqOOacKXdHJkxbxX+WSMGe4FT37JPUhNg1pC40hHrf+SpSTHLYvgZF/n4+2sW+Bo1dRXVUt/ K2LcXyTy7GQTKthaWjEgoRtAJYAdn283JE7a+A3EnOacfqqv76/LRK5roleZPMNbgpcjp9F33 jH4T/m/PXuXqfPeCsFh1Jc5P4EepB+e+3COGeThtl1HyXMbuiHLWA8wf1haTRXBRh99b3nAVz 27CMhyetDYJlBC73vX5ExEhc4NVv+es2VESGtqeveGAwi/rzHaV2HeoID22MxmtSZaOqWVb3f 6e9zgCCzt75/OPh0EoIpo/PkbzjvslT+h6WKCEgp5i/evp5uN6Ff/BeJjQhnP7N93wbXAj0QJ LrjCownylhi6kS/4IR4eI5xaSXeb3VjpG7nDf+9rU= List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Precedence: bulk Thank you for sharing that alternative approach. "In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as assumptions about ordering of object keys or the need to retrieve the data exactly as it was stored." I don't know how to distill your example into a succinct addition to that paragraph. A brief Internet search I just tried didn't yield anything quite like it in the first few pages. I think including the full example on that page, or linking to it, would help a lot for anyone with these specialized requirements who is reading that page. Since you mentioned the RFC, I'll just point out that in my application, the integrity check is on the encoded JSON data (before parsing), not on the parsed JSON object in memory -- so there is no assumption about the ordering of keys. The JSON is not even parsed by the application if the integrity check fails. When the application writes data, it encodes the JSON first, then computes the integrity on the encoded bytes. The data could be stored in any format and the procedure would be the same. The reason for using the json type (or bytea with expression index, jsonb, and gin as you suggested) is to be able to select records based on the un-trusted JSON data before validating and re-parsing it in the application. Jonathan B On 11/25/2019 4:28 PM, Jonathan S. Katz wrote: > Hi, > > On 11/25/19 12:47 PM, PG Doc comments form wrote: >> The following documentation comment has been logged on the website: >> >> Page: https://www.postgresql.org/docs/12/datatype-json.html >> Description: >> >> I'm wondering if this one line of section 8.14 JSON Types >> (https://www.postgresql.org/docs/current/datatype-json.html) can be edited >> to remove the word "legacy": >> >> "In general, most applications should prefer to store JSON data as jsonb, >> unless there are quite specialized needs, such as legacy assumptions about >> ordering of object keys." >> >> I'm concerned that with the word "legacy" there, someone might come along >> eventually and decide the json column type isn't needed anymore because it's >> "legacy", where in fact there are modern and legitimate uses for a field >> that allows you to retrieve the data exactly as it was stored and allows >> JSON queries on that data (even if they are slower). > While I'm certainly sensitive to this need as once upon a time I had a > similar requirement, slightly less strict requirement, I made sure to > not rely on the PostgreSQL JSON type itself to ensure ordering was > preserved (and in my case I was able to rely on a solution external to > PostgreSQL). > > The JSON RFC states that objects should be considered "unordered", and > mentions that while different parsing libraries may preserve key > ordering, "implementations whose behavior does not depend on member > ordering will be interoperable in the sense that they will not be > affected by these differences."[1] > >> An alternative would be to store the >> plaintext as binary data for the integrity check and have a separate jsonb >> column with a second copy of the same data. Since different applications >> have different time/space tradeoffs, it's good to have the choice. > Another approach is to leverage PostgreSQL's expression index > capabilities, which would allow you to limit the data duplication. For > example: > > CREATE TABLE docs (doc bytea); > > -- populating some test data > INSERT INTO docs > SELECT ('{"id": ' || x || ', "data": [1,2,3] }')::bytea > FROM generate_series(1, 100000) x; > > -- create an expression index that maps to the operators supported by GIN > CREATE INDEX docs_doc_json_idx ON docs > USING gin(jsonb(encode(doc, 'escape'))); > > and in one test run: > > EXPLAIN > SELECT doc > FROM docs WHERE encode(doc, 'escape')::jsonb @> '{"id": 567}'; > > I got a plan similar to: > > QUERY PLAN > > ------------------------------------------------------------------------------------ > Bitmap Heap Scan on docs (cost=28.77..306.00 rows=100 width=31) > Recheck Cond: ((encode(doc, 'escape'::text))::jsonb @> '{"id": > 567}'::jsonb) > -> Bitmap Index Scan on docs_doc_json_idx (cost=0.00..28.75 > rows=100 width=0) > Index Cond: ((encode(doc, 'escape'::text))::jsonb @> '{"id": > 567}'::jsonb) > > In this way, you can: > > - Keep the key ordering preserved and perform any integrity checks, etc. > that your application requires > - Limit your data duplication to that of the index > - Still get the benefits of the JSONB lookup functions that work with > the indexing > - Still perform JSON validation: > > INSERT INTO docs VALUES ('{]'::bytea); > > ERROR: invalid input syntax for type json > DETAIL: Expected string or "}", but found "]". > CONTEXT: JSON data, line 1: {] > >> My suggestion for that sentence: >> >> "In general, most applications should prefer to store JSON data as jsonb, >> unless there are quite specialized needs, such as assumptions about ordering >> of object keys or the need to retrieve the data exactly as it was stored." > My preference would be that we guide in the documentation on what to do > if one has an application sensitive to ordering. I'm not opposed to the > wording, but I'd prefer we encourage people to leverage JSONB for > storage & retrieval. > > Thanks! > > Jonathan > > [1] https://tools.ietf.org/html/rfc7159#section-4 >