Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bB6gn-0008Ha-TH for pgsql-performance@arkaria.postgresql.org; Thu, 09 Jun 2016 20:36:30 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1bB6gn-000540-G5 for pgsql-performance@arkaria.postgresql.org; Thu, 09 Jun 2016 20:36:29 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1bB6gm-000539-A6 for pgsql-performance@postgresql.org; Thu, 09 Jun 2016 20:36:28 +0000 Received: from mail-vk0-x236.google.com ([2607:f8b0:400c:c05::236]) by magus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1bB6gi-00012a-8j for pgsql-performance@postgresql.org; Thu, 09 Jun 2016 20:36:27 +0000 Received: by mail-vk0-x236.google.com with SMTP id d127so71440256vkh.2 for ; Thu, 09 Jun 2016 13:36:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=tDfrnMPZ2Dfsq2ucCPfHjUK8TT8UhA/hR/uqFL3T3nM=; b=plG3PCVJachO8MHp6e5eS3HWF7AMm89l0fJ6wYKpOnvPyGqIt0NSLwbntxLdI0IDDz qL2/j+1fuc6gzoqYIh5rubFmlPbdAjWUqp5MIGkWF7W2ico5uFJuDqcUQQKfVfWd2ian EWEj4QdkQsPIceDNWdAoxVa0BjcSkVgBYduma2cldgwwL6S3XC/7+U31zk3iWzguThSF UuuIu2/uKaAnmJ3beYAJRgLIJBMH+LJZxQozZnjDnD8M1cczbboPsUeBXGH45vVN/0Os AIXvPRo2bWqoXGAR5TXV8Cbk8/98GBPcpgPRH+PK3I24751Kl9Mq+Zm/TtRAzobF12CB sgqw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=tDfrnMPZ2Dfsq2ucCPfHjUK8TT8UhA/hR/uqFL3T3nM=; b=TNdWAdo6rlc31hBlJ7s/CX6SuH94en0VmxQqoGD8p1LyOACWrPcm/5gkL5TwRjai5G 1IZ3i7kijEUBcgzFQHPsIITZfGg6+LEoFEjZ1sQnVmm8SI9PNa4T5l9EJiWJRzETf+yp 1MQH9swN+LOzoGXF/dhjl4yOUmc2LQ0SmsvxOoVk2/emTrWQ4QZYtEwVOxRNgh7ph0Ut 7vXN11st1EKEunFlCbJqkg/9ildyDdffPwMacc5q4XgnIC1wXY7FODHw5UzAO0CsoXze dz5a2r/AbXjIvzYkH6fONRPjQf+lhFBdYrjLAJihVIFQEHewyac7qTSzwT++twhRimBP DgDg== X-Gm-Message-State: ALyK8tKRzilNEkGkm9gCN3exR4ljUEsR70PkAqVHfze68MIC36L8gcQ/kwBtjl4PAw/AfqiNh8yDwiUrGEjzQA== X-Received: by 10.176.6.36 with SMTP id f33mr5512280uaf.145.1465504582362; Thu, 09 Jun 2016 13:36:22 -0700 (PDT) MIME-Version: 1.0 Received: by 10.176.68.103 with HTTP; Thu, 9 Jun 2016 13:36:21 -0700 (PDT) In-Reply-To: References: <5756C618.1070802@agliodbs.com> <717.1465365859@sss.pgh.pa.us> From: Merlin Moncure Date: Thu, 9 Jun 2016 15:36:21 -0500 Message-ID: Subject: Re: array size exceeds the maximum allowed (1073741823) when building a json To: Nicolas Paris Cc: Tom Lane , Michael Paquier , Josh Berkus , "David G. Johnston" , pgsql-performance Content-Type: text/plain; charset=UTF-8 X-Pg-Spam-Score: -2.7 (--) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org On Thu, Jun 9, 2016 at 8:43 AM, Nicolas Paris wrote: > > > 2016-06-09 15:31 GMT+02:00 Merlin Moncure : >> >> On Wed, Jun 8, 2016 at 1:04 AM, Tom Lane wrote: >> > Michael Paquier writes: >> >> On Tue, Jun 7, 2016 at 10:03 PM, Josh Berkus wrote: >> >>> On 06/07/2016 08:42 AM, Nicolas Paris wrote: >> >>>> Will this 1GO restriction is supposed to increase in a near future ? >> > >> >>> Not planned, no. Thing is, that's the limit for a field in general, >> >>> not >> >>> just JSON; changing it would be a fairly large patch. It's >> >>> desireable, >> >>> but AFAIK nobody is working on it. >> > >> >> And there are other things to consider on top of that, like the >> >> maximum allocation size for palloc, the maximum query string size, >> >> COPY, etc. This is no small project, and the potential side-effects >> >> should not be underestimated. >> > >> > It's also fair to doubt that client-side code would "just work" with >> > no functionality or performance problems for such large values. >> > >> > I await with interest the OP's results on other JSON processors that >> > have no issues with GB-sized JSON strings. >> >> Yup. Most json libraries and tools are going to be disgusting memory >> hogs or have exponential behaviors especially when you consider you >> are doing the transformation as well. Just prettifying json documents >> over 1GB can be a real challenge. >> >> Fortunately the workaround here is pretty easy. Keep your query >> exactly as is but remove the final aggregation step so that it returns >> a set. Next, make a small application that runs this query and does >> the array bits around each row (basically prepending the final result >> with [ appending the final result with ] and putting , between rows). > > > The point is when prepending/appending leads to deal with strings. > Transforming each value of the resultset to a string implies to escape the > double quote. > then: > row1 contains {"hello":"world"} > step 1 = prepend -> "[{\"hello\":\"world\"}" > step 2 = append -> "[{\"hello\":\"world\"}," right 3 rows contain {"hello":"world"} before iteration: emit '[' before every row except the first, prepend ',' after iteration: emit ']' you end up with: [{"hello":"world"} ,{"hello":"world"} ,{"hello":"world"}] ...which is 100% valid json as long as each row of the set is a json object. in SQL, the technique is like this: select ('[' || string_agg(j::text, ',') || ']')::json from (select json_build_object('hello', 'world') j from generate_series(1,3)) q; the difference is, instead of having the database do the string_agg step, it's handled on the client during iteration over the output of generate_series. merlin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance