public inbox for [email protected]
help / color / mirror / Atom feedFrom: Tom Lane <[email protected]>
To: David E. Wheeler <[email protected]>
Cc: Bruce Momjian <[email protected]>
Cc: Greg Stark <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Heikki Linnakangas <[email protected]>
Cc: Andrew Dunstan <[email protected]>
Cc: Peter Geoghegan <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: Tue, 06 May 2014 18:20:53 -0400
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <CAM3SWZQzOxmryVe_bdu9WhqbJTBoiqM7knvKm0oRYxvfsFnz7Q@mail.gmail.com>
<[email protected]>
<[email protected]>
<CA+TgmoZucoe-w5P_16A-gdfRBFCCu7PD17YO-KqrYPBhb5TxAA@mail.gmail.com>
<[email protected]>
<CAM-w4HNawF2sPDSUrZsng7zgGnste-Vy1+42AufyVq9UkVeCZw@mail.gmail.com>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
List-Unsubscribe: <mailto:[email protected]?body=unsub%20pgsql-hackers>
"David E. Wheeler" <[email protected]> writes:
> On May 6, 2014, at 2:20 PM, Bruce Momjian <[email protected]> wrote:
>> Well, then, we only have a few days to come up with a name.
> What are the options?
We have no proposals as yet.
I've been looking at the source code to try to understand the difference
between the two opclasses (and BTW I concur with the opinions expressed
recently about the poor state of the internal documentation for jsonb).
If I've got it straight:
jsonb_ops indexes keys and values separately, so for instance "{xyz: 2}"
would give rise to GIN entries that are effectively the strings "Kxyz"
and "V2". If you're looking for tuples containing "{xyz: 2}" then you
would be looking for the AND of those independent index entries, which
fortunately GIN is pretty good at computing. But you could also look
for just keys or just values.
jsonb_hash_ops creates an index entry only for values, but what it
stores is a hash of both the value and the key it's stored under.
So in this example you'd get a hash combining "xyz" and "2". This
means the only type of query you can perform is like "find JSON tuples
containing {xyz: 2}".
Because jsonb_ops stores the *whole* value, you can do lossless index
searches (no recheck needed on the heap tuple), but you also run the
risk of long strings failing to fit into an index entry. Since jsonb_ops
reduces everything to a hash, there's no possibility of index failure,
but all queries are lossy and require recheck.
TBH, at this point I'm sort of agreeing with the thought expressed
upthread that maybe neither of these should be the default as-is.
They seem like rather arbitrary combinations of choices. In particular
I wonder why there's not an option to store keys and values separately,
but as hashes not as the original strings, so that indexability of
everything could be guaranteed. Or a variant of that might be to hash
only strings that are too large to fit in an index entry, and force
recheck only when searching for a string that needed hashing.
I wonder whether the most effective use of time at this point
wouldn't be to fix jsonb_ops to do that, rather than arguing about
what to rename it to. If it didn't have the failure-for-long-strings
problem I doubt anybody would be unhappy about making it the default.
regards, tom lane
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
view thread (65+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox