Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bB0GY-0006Ss-MN for pgsql-performance@arkaria.postgresql.org; Thu, 09 Jun 2016 13:44:58 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1bB0GY-0007Ua-4S for pgsql-performance@arkaria.postgresql.org; Thu, 09 Jun 2016 13:44:58 +0000 Received: from makus.postgresql.org ([2001:4800:1501:1::229]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1bB0Es-0005e4-DM for pgsql-performance@postgresql.org; Thu, 09 Jun 2016 13:43:14 +0000 Received: from mail-wm0-x235.google.com ([2a00:1450:400c:c09::235]) by makus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1bB0Eo-0006pH-At for pgsql-performance@postgresql.org; Thu, 09 Jun 2016 13:43:12 +0000 Received: by mail-wm0-x235.google.com with SMTP id k204so61230908wmk.0 for ; Thu, 09 Jun 2016 06:43:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=SCaDILiNGVPg11UY+LmrfihrUf0NQUKtfUmJm5hz5PY=; b=U2ASZl2ZAxlG9ylNbpDAVWZ08cmfuDN2m6+jE3/xYq5kZuI9wYoAgBuiT6zBf7miK3 TIBC4UCOUoc1Cw687laoDXI66Gy4lh8qhFdQ+pHLdfHJv8Wh5b+tixCT9SL1lP8KwiiP bKu1zmYQT1lNiTvP1QySlvAbvDyxyAnmFnKoR4VqsV07sqtktn0PQBwxvGgyrAhsMhF+ VzPmNfGN+Z3ZKTc3eU7Gtgd+aZXMkHvNnSYyvfNbuN9j4EMmuGfjcJUlhHOrov6Bsa6n Goeq21Pq6bxYn+39kacsOXHJkWTl4ySIAtx8vbJUQh1SpMEUzigYiBxnzbWv4S+xsciA IJ2w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=SCaDILiNGVPg11UY+LmrfihrUf0NQUKtfUmJm5hz5PY=; b=It1yV/4x0EXCP+s1XDnEvT0er3toEzk/XhUKK6RSuA3C2MTu1PCVfogl8bI6qJQPXO eh0jqSLCMgzcmpEsStydwZWF/dbw0b6xXQMTnQmJLbZMLigAFQv4Q+1PL6AMjsafK3cF Horuxi7qUcZnto4ej8ihUwDBj4biKrVoO+OUFOj+Cq3cSDfHJ2l1G5n0/SwVcwCxZdDE ET5RhwArUAraTwZzFb9MJCMVPsaSkb08cH0dr1dnxBqp53mBCtVL2ar6S8tFbQPsr09L Wy3mpJ2s73Cb4rya/YgcfL0zXracAY/jzi6vYZ+IJQ9ze1mo4cjRbd/dzR3bmbh/YSn7 Cq0A== X-Gm-Message-State: ALyK8tJw9QEPhDDNPYJzXPsZD+SiEBXgV1OwQrDSbcdHVTm3Tn7GHs/do/1TqFF3xb80TiqBp/R/mBk6sAvWiA== X-Received: by 10.28.147.19 with SMTP id v19mr13218966wmd.13.1465479788514; Thu, 09 Jun 2016 06:43:08 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.133.196 with HTTP; Thu, 9 Jun 2016 06:43:07 -0700 (PDT) In-Reply-To: References: <5756C618.1070802@agliodbs.com> <717.1465365859@sss.pgh.pa.us> From: Nicolas Paris Date: Thu, 9 Jun 2016 15:43:07 +0200 Message-ID: Subject: Re: array size exceeds the maximum allowed (1073741823) when building a json To: Merlin Moncure Cc: Tom Lane , Michael Paquier , Josh Berkus , "David G. Johnston" , pgsql-performance Content-Type: multipart/alternative; boundary=001a1145b1943854450534d89a83 X-Pg-Spam-Score: -2.7 (--) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org --001a1145b1943854450534d89a83 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable 2016-06-09 15:31 GMT+02:00 Merlin Moncure : > On Wed, Jun 8, 2016 at 1:04 AM, Tom Lane wrote: > > Michael Paquier writes: > >> On Tue, Jun 7, 2016 at 10:03 PM, Josh Berkus wrote= : > >>> On 06/07/2016 08:42 AM, Nicolas Paris wrote: > >>>> Will this 1GO restriction is supposed to increase in a near future ? > > > >>> Not planned, no. Thing is, that's the limit for a field in general, > not > >>> just JSON; changing it would be a fairly large patch. It's desireabl= e, > >>> but AFAIK nobody is working on it. > > > >> And there are other things to consider on top of that, like the > >> maximum allocation size for palloc, the maximum query string size, > >> COPY, etc. This is no small project, and the potential side-effects > >> should not be underestimated. > > > > It's also fair to doubt that client-side code would "just work" with > > no functionality or performance problems for such large values. > > > > I await with interest the OP's results on other JSON processors that > > have no issues with GB-sized JSON strings. > > Yup. Most json libraries and tools are going to be disgusting memory > hogs or have exponential behaviors especially when you consider you > are doing the transformation as well. Just prettifying json documents > over 1GB can be a real challenge. > > Fortunately the workaround here is pretty easy. Keep your query > exactly as is but remove the final aggregation step so that it returns > a set. Next, make a small application that runs this query and does > the array bits around each row (basically prepending the final result > with [ appending the final result with ] and putting , between rows). > =E2=80=8BThe point is when prepending/appending leads to deal with strings. Transforming each value of the resultset to a string implies to escape the double quote. then: row1 contains {"hello":"world"} step 1 =3D prepend -> "[{\"hello\":\"world\"}" step 2 =3D append -> "[{\"hello\":\"world\"}," and so on the json is corrupted. Hopelly I am sure I am on a wrong way about that. =E2=80=8B > It's essential that you use a client library that does not buffer the > entire result in memory before emitting results. This can be done in > psql (FETCH mode), java, libpq (single row mode), etc. I suspect > node.js pg module can do this as well, and there certainty will be > others. > > The basic objective is you want the rows to be streamed out of the > database without being buffered. If you do that, you should be able > to stream arbitrarily large datasets out of the database to a json > document assuming the server can produce the query. > > merlin > --001a1145b1943854450534d89a83 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable


2016-06-09 15:31 GMT+02:00 Merlin Moncure <mmoncu= re@gmail.com>:
On Wed, Jun 8, 2016 at 1:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <micha= el.paquier@gmail.com> writes:
>> On Tue, Jun 7, 2016 at 10:03 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> On 06/07/2016 08:42 AM, Nicolas Paris wrote:
>>>> Will this 1GO restriction is supposed to increase in a nea= r future ?
>
>>> Not planned, no.=C2=A0 Thing is, that's the limit for a fi= eld in general, not
>>> just JSON; changing it would be a fairly large patch.=C2=A0 It= 's desireable,
>>> but AFAIK nobody is working on it.
>
>> And there are other things to consider on top of that, like the >> maximum allocation size for palloc, the maximum query string size,=
>> COPY, etc. This is no small project, and the potential side-effect= s
>> should not be underestimated.
>
> It's also fair to doubt that client-side code would "just wor= k" with
> no functionality or performance problems for such large values.
>
> I await with interest the OP's results on other JSON processors th= at
> have no issues with GB-sized JSON strings.

Yup.=C2=A0 Most json libraries and tools are going to be disgusting = memory
hogs or have exponential behaviors especially when you consider you
are doing the transformation as well.=C2=A0 Just prettifying json documents=
over 1GB can be a real challenge.

Fortunately the workaround here is pretty easy.=C2=A0 Keep your query
exactly as is but remove the final aggregation step so that it returns
a set. Next, make a small application that runs this query and does
the array bits around each row (basically prepending the final result
with [ appending the final result with ] and putting , between rows).

=E2=80=8BThe point is when = prepending/appending leads to deal with strings.
Transforming each value of the resultset to a string implies to = escape the double quote.
then:
row1= contains {"hello":"world"}
step 1 =3D prepend -> "[{\"hello\":\"world= \"}"
step 2 =3D append ->= "[{\"hello\":\"world\"},"
and so on
the json = is corrupted. Hopelly I am sure I am on a wrong way about that.

=E2=80=8B=C2=A0
It's essential that you use a client library that does not buffer the entire result in memory before emitting results.=C2=A0 =C2=A0This can be do= ne in
psql (FETCH mode), java, libpq (single row mode), etc.=C2=A0 =C2=A0I suspec= t
node.js pg module can do this as well, and there certainty will be
others.

The basic objective is you want the rows to be streamed out of the
database without being buffered.=C2=A0 If you do that, you should be able to stream arbitrarily large datasets out of the database to a json
document assuming the server can produce the query.

merlin

--001a1145b1943854450534d89a83--