Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.80) (envelope-from ) id 1WjFCT-0005Ri-Ly for pgsql-hackers@arkaria.postgresql.org; Sat, 10 May 2014 21:52:57 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.80) (envelope-from ) id 1WjFCT-0001pB-2f for pgsql-hackers@arkaria.postgresql.org; Sat, 10 May 2014 21:52:57 +0000 Received: from makus.postgresql.org ([2001:4800:7903:4::125]) by malur.postgresql.org with esmtp (Exim 4.80) (envelope-from ) id 1WjFCQ-0001mj-LH for pgsql-hackers@postgresql.org; Sat, 10 May 2014 21:52:54 +0000 Received: from mail-oa0-f45.google.com ([209.85.219.45]) by makus.postgresql.org with esmtp (Exim 4.80) (envelope-from ) id 1WjFCN-0003h4-Ma for pgsql-hackers@postgresql.org; Sat, 10 May 2014 21:52:53 +0000 Received: by mail-oa0-f45.google.com with SMTP id l6so6649776oag.32 for ; Sat, 10 May 2014 14:52:50 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=3GXVp7kqwrJj+fSrEZDmXjwymz20ClrDUpX7fTe1LzM=; b=U4n+7CcH+aoaVM7fmuz/MOnYPzdkE2mqaEVs/cX02I0tVNd0ghsL8NbHLv7ZttLDSG 5VObiSszMymXW3KYblup8EujcmEkbTl7RqeU2/R2pnTDW5CIGpPuFyWtn1ip8qYP8XWG p7S5eBDFcQX8LredECf7B8yvB91MuW/aHqelQfJ+MCZnRb4zT6CkBxykWoR5AB2xQmaC kXMPZbBvMKfnt0P6BEoYoiBRnQVlGBFXcaNFT5phZZiyg1EJN6tSwbYPbbKu6AxpkOQm JJbe0qh69vFJ7enf+xeJfBOGELBVV9srGbTHiMo97+EgQyziiXoRYV6Hr9IR0EFrF/ew 0M8A== X-Gm-Message-State: ALoCoQlu+VZcw27HvEZ3GQ96Rarr+u1V4UoMwiSkboi9qISvEF3jkuJ69QrRHpqAVLsauJWSdz3X MIME-Version: 1.0 X-Received: by 10.182.118.169 with SMTP id kn9mr23935051obb.46.1399758770762; Sat, 10 May 2014 14:52:50 -0700 (PDT) Received: by 10.182.176.67 with HTTP; Sat, 10 May 2014 14:52:50 -0700 (PDT) In-Reply-To: <536E8F3A.40706@vmware.com> References: <16769.1399407530@sss.pgh.pa.us> <20140506212020.GK30817@momjian.us> <57E8AA44-F816-45F2-BB61-5A854FFB0A97@justatheory.com> <28554.1399414853@sss.pgh.pa.us> <20140508134701.GO30817@momjian.us> <5819.1399558614@sss.pgh.pa.us> <1888.1399588751@sss.pgh.pa.us> <20140509033405.GA23254@momjian.us> <536C550F.50108@archidevsys.co.nz> <18360.1399633457@sss.pgh.pa.us> <20140509135336.GC23254@momjian.us> <28961.1399668272@sss.pgh.pa.us> <536E8F3A.40706@vmware.com> Date: Sat, 10 May 2014 14:52:50 -0700 Message-ID: Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation) From: Peter Geoghegan To: Heikki Linnakangas Cc: Tom Lane , Greg Stark , Bruce Momjian , Gavin Flower , "David E. Wheeler" , Robert Haas , Andrew Dunstan , "pgsql-hackers@postgresql.org" Content-Type: text/plain; charset=UTF-8 X-Pg-Spam-Score: -2.6 (--) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-hackers Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org On Sat, May 10, 2014 at 1:42 PM, Heikki Linnakangas wrote: > For example, if you have an object like '{"foo": {"bar": 123 } }', one will > index "foo", "foo->bar", and "foo->bar->123" while the other will index > "foo", "bar" and "123". That isn't quite right, if we're talking about the user's perspective, and concerns about what is made indexable. You cannot actually query "foo->bar" there, because containment semantics don't support it. Basically, you'd have to write that like this: select * from docs where jdoc @> '{"foo":{"bar":*}'::jsonb; I've added a wildcard to the rhs jsonb here, which of course won't work, the proximate cause being that that simply isn't valid jsonb. It's also something inherently impossible to support with the current jsonb_hash_op's indexing strategy. That only hashes elements and values, mixing in keys from all outer nesting levels (so it's possible for there to be 0 *nentries). In the strict physical sense, it only indexes "123" (with all keys mixed in to the hash) from your example, because that's the only element or value that appears. Your description of only indexing the "leaf level" is not ideal, because it makes me think of B-Trees. Unlike B-Trees, jsonb can and frequently will have jsonb_hash_ops-indexable primitive values/elements at all nesting levels (that is, both "inner" containers and "leaf" containers). > Whether the opclasses use hashing to shorten the key is an orthogonal > property, and IMHO not as important. To reflect that, I suggest that we name > the opclasses: > > json_path_ops > json_value_ops > > or something along those lines. I would like to once again emphasize the user-level distinction between the two: one (the text storage opclass) is better for hstore-style indexing, where jsonb data is heterogeneous in structure, most probably a hodge-podge of key/value pairs. The other (the hash storage opclass) is better for document-database style use cases, where keys alone have low selectivity, and there is an unenforced/implicit schema that is mostly adhered to by client applications. That's why I don't think it's much of a problem that the example query above won't work (you could do something with expressions, or expression indexes to make something work, but it won't work with the text-storage opclass because there is no appropriate operator). In general, for those document-database style use cases, indexing keys alone isn't useful. Wherever that turns out to be untrue, ad-hoc expression indexes of the text storage opclass are probably the best solution. Anyway, I agree with your general assessment; hashing is nothing more than an implementation detail. I'm not sure which of your proposed names is intended for which opclass, though. Do you mean "jsonb_path_ops" and "jsonb_key_ops"? That makes perfect sense to me, because indexing keys alone is the main user-level distinction. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers