Date: Thu, 8 May 2014 09:47:01 -0400
From: Bruce Momjian
To: Tom Lane
Cc: "David E. Wheeler", Greg Stark, Robert Haas, Heikki Linnakangas, Andrew Dunstan, Peter Geoghegan, "pgsql-hackers@postgresql.org"
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Message-ID: <20140508134701.GO30817@momjian.us>
In-Reply-To: <28554.1399414853@sss.pgh.pa.us>

On Tue, May 6, 2014 at 06:20:53PM -0400, Tom Lane wrote:
> "David E. Wheeler" writes:
> > On May 6, 2014, at 2:20 PM, Bruce Momjian wrote:
> >> Well, then, we only have a few days to come up with a name.
>
> > What are the options?
>
> We have no proposals as yet.
>
> I've been looking at the source code to try to understand the difference
> between the two opclasses (and BTW I concur with the opinions expressed
> recently about the poor state of the internal documentation for jsonb).
> If I've got it straight:
>
> jsonb_ops indexes keys and values separately, so for instance "{xyz: 2}"
> would give rise to GIN entries that are effectively the strings "Kxyz"
> and "V2".  If you're looking for tuples containing "{xyz: 2}" then you
> would be looking for the AND of those independent index entries, which
> fortunately GIN is pretty good at computing.  But you could also look
> for just keys or just values.
>
> jsonb_hash_ops creates an index entry only for values, but what it
> stores is a hash of both the value and the key it's stored under.
> So in this example you'd get a hash combining "xyz" and "2".  This
> means the only type of query you can perform is like "find JSON tuples
> containing {xyz: 2}".

Good summary, thanks.  This is the information I was hoping we had in
our docs.  How does hstore deal with these issues?

> Because jsonb_ops stores the *whole* value, you can do lossless index
> searches (no recheck needed on the heap tuple), but you also run the
> risk of long strings failing to fit into an index entry.  Since
> jsonb_hash_ops reduces everything to a hash, there's no possibility of
> index failure, but all queries are lossy and require recheck.
>
> TBH, at this point I'm sort of agreeing with the thought expressed
> upthread that maybe neither of these should be the default as-is.
> They seem like rather arbitrary combinations of choices.  In particular
> I wonder why there's not an option to store keys and values separately,
> but as hashes not as the original strings, so that indexability of
> everything could be guaranteed.  Or a variant of that might be to hash
> only strings that are too large to fit in an index entry, and force
> recheck only when searching for a string that needed hashing.
>
> I wonder whether the most effective use of time at this point
> wouldn't be to fix jsonb_ops to do that, rather than arguing about
> what to rename it to.  If it didn't have the failure-for-long-strings
> problem I doubt anybody would be unhappy about making it the default.

Can we hash just the values, not the keys, in jsonb_ops, and hash the
combo in jsonb_hash_ops?  That would give us key-only lookups without a
recheck.

How do we index long strings now?  Is it the combination of GIN and long
strings that is the problem?

-- 
  Bruce Momjian        http://momjian.us
  EnterpriseDB         http://enterprisedb.com

  + Everyone has their own god. +
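
For readers following the thread, the index-entry schemes discussed above
can be modeled in a few lines of Python.  This is only an illustrative
sketch, not the PostgreSQL implementation: the function names, the
truncated SHA-1 stand-in hash, the "H" prefix, and the 16-character limit
are all assumptions made for the example, and it handles only flat objects
with scalar values.

```python
import hashlib

MAX_ENTRY_LEN = 16  # hypothetical size limit, chosen only for this example


def _hash(text):
    """Stand-in 32-bit hash (truncated SHA-1); not PostgreSQL's hash function."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest()[:4], "big")


def jsonb_ops_entries(doc):
    """Model of jsonb_ops: keys and values become separate tagged strings."""
    entries = set()
    for key, value in doc.items():
        entries.add("K" + key)         # key entry, e.g. "Kxyz"
        entries.add("V" + str(value))  # value entry, e.g. "V2"
    return entries


def jsonb_hash_ops_entries(doc):
    """Model of jsonb_hash_ops: one hash per value, combined with its key."""
    return {_hash(key + "\0" + str(value)) for key, value in doc.items()}


def hybrid_entry(tag, text):
    """Model of the proposed variant: hash only strings that are too long.

    Returns (entry, needs_recheck).  Short strings are stored as-is and are
    lossless; long strings are replaced by a hash and must be rechecked.
    """
    if len(text) > MAX_ENTRY_LEN:
        return "H" + tag + format(_hash(text), "08x"), True
    return tag + text, False


def contains(index_entries, query_entries):
    """A containment search is the AND of all entries derived from the query."""
    return query_entries <= index_entries
```

In this model a key-only search is just contains(jsonb_ops_entries(doc),
{"Kxyz"}), which has no counterpart in the hash-only scheme because no
entry there is derived from the key alone.  The hybrid function shows why
the recheck could be limited to searches for strings that needed hashing.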