MIME-Version: 1.0
In-Reply-To: 
 <CAHyXU0yNbarj8ibwAzkKjWmF=M0x2HFiK0BSc84ZwLiaLyMGAg@mail.gmail.com>
References: 
 <CA+ssMORBYVFxedSXVVxL5VPeo=9AbYRQ-zbpuKJm6vEjmJ1CFA@mail.gmail.com>
 <CAKFQuwbjwyaaFUOEzxoYCBDm25ujOayCniwiWimPML_A_VaGYA@mail.gmail.com>
 <CA+ssMOT04Wg+OGfSc5AUk8nVQOxkVS6=CD-h7bH-pAH-ix50cw@mail.gmail.com>
 <CAKFQuwZEkX6o2mv2Ekqp8DpAs9F0bMsBnvQxzJGUr+SS6t2yTg@mail.gmail.com>
 <CA+ssMOSdHUTMWqswZ4njAgtf4vpAkuZ8duH7znB0G5pEswa38w@mail.gmail.com>
 <WM!d318efe352c6e124baed8053dba915699725885de80c4fdd70cff903faba6afc6e6233ba053fb428df2d56162a988029!@mailstronghold-1.zmailcloud.com>
 <5756C618.1070802@agliodbs.com>
 <CAB7nPqRdN=7xU14Q=2=fho_apR0TEoKtWyHZip0HSMkYsOUp-Q@mail.gmail.com>
 <717.1465365859@sss.pgh.pa.us>
 <CAHyXU0yNbarj8ibwAzkKjWmF=M0x2HFiK0BSc84ZwLiaLyMGAg@mail.gmail.com>
From: Nicolas Paris <niparisco@gmail.com>
Date: Thu, 9 Jun 2016 15:43:07 +0200
Message-ID: 
 <CA+ssMOSuBmUFWvWn_=6vow63Wp7seEhUF_TDCO7Duip3j2GUUw@mail.gmail.com>
Subject: Re: array size exceeds the maximum allowed (1073741823)
 when building a json
To: Merlin Moncure <mmoncure@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>, Michael Paquier <michael.paquier@gmail.com>,
	Josh Berkus <josh@agliodbs.com>,
 "David G. Johnston" <david.g.johnston@gmail.com>,
	pgsql-performance <pgsql-performance@postgresql.org>
Content-Type: multipart/alternative; boundary=001a1145b1943854450534d89a83
Precedence: bulk
Sender: pgsql-performance-owner@postgresql.org

--001a1145b1943854450534d89a83
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

2016-06-09 15:31 GMT+02:00 Merlin Moncure <mmoncure@gmail.com>:

> On Wed, Jun 8, 2016 at 1:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Michael Paquier <michael.paquier@gmail.com> writes:
> >> On Tue, Jun 7, 2016 at 10:03 PM, Josh Berkus <josh@agliodbs.com> wrote:
> >>> On 06/07/2016 08:42 AM, Nicolas Paris wrote:
> >>>> Is this 1 GB restriction expected to be raised in the near future?
> >
> >>> Not planned, no.  Thing is, that's the limit for a field in general, not
> >>> just JSON; changing it would be a fairly large patch.  It's desirable,
> >>> but AFAIK nobody is working on it.
> >
> >> And there are other things to consider on top of that, like the
> >> maximum allocation size for palloc, the maximum query string size,
> >> COPY, etc. This is no small project, and the potential side-effects
> >> should not be underestimated.
> >
> > It's also fair to doubt that client-side code would "just work" with
> > no functionality or performance problems for such large values.
> >
> > I await with interest the OP's results on other JSON processors that
> > have no issues with GB-sized JSON strings.
>
> Yup.  Most json libraries and tools are going to be disgusting memory
> hogs or have exponential behaviors especially when you consider you
> are doing the transformation as well.  Just prettifying json documents
> over 1GB can be a real challenge.
>
> Fortunately the workaround here is pretty easy.  Keep your query
> exactly as is but remove the final aggregation step so that it returns
> a set. Next, make a small application that runs this query and does
> the array bits around each row (basically prepending the final result
> with [ appending the final result with ] and putting , between rows).
>

My concern is that prepending and appending means handling the rows as strings.
Converting each value of the result set to a string implies escaping its
double quotes. For example:
row1 contains {"hello":"world"}
step 1 = prepend -> "[{\"hello\":\"world\"}"
step 2 = append -> "[{\"hello\":\"world\"},"
and so on, so the JSON ends up corrupted. Hopefully I am on the wrong track
about that.
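
A quick way to check this worry is the small Python sketch below. It assumes
the driver hands back each row as plain text; the `row` value is only the
made-up example from this message, not real data. It compares plain
concatenation with encoding the row text a second time as a JSON string.

```python
import json

# Text of one row as the client would receive it (assumed example value).
row = '{"hello":"world"}'

# Plain concatenation leaves the row text untouched: nothing gets escaped.
plain = "[" + row + "]"
assert json.loads(plain) == [{"hello": "world"}]

# The backslashes only appear if the row text is itself encoded as a JSON
# string, which turns the object into a quoted string value.
double_encoded = "[" + json.dumps(row) + "]"
assert json.loads(double_encoded) == ['{"hello":"world"}']
```

If this holds, the escaped form would come from encoding the row a second
time, not from the prepend/append itself.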



> It's essential that you use a client library that does not buffer the
> entire result in memory before emitting results.   This can be done in
> psql (FETCH mode), java, libpq (single row mode), etc.   I suspect
> node.js pg module can do this as well, and there certainly will be
> others.
>
> The basic objective is you want the rows to be streamed out of the
> database without being buffered.  If you do that, you should be able
> to stream arbitrarily large datasets out of the database to a json
> document assuming the server can produce the query.
>
> merlin
>
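
If I read the suggestion correctly, the small application could look like the
Python sketch below. The names `write_json_array` and `fake_rows` are made up
for illustration; in a real run the rows would come from a cursor that fetches
lazily instead of `fake_rows`, and I assume each row arrives as valid JSON
text.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def write_json_array(rows: Iterable[str], out: TextIO) -> int:
    """Write rows (each already valid JSON text) to `out` as one JSON array.

    Rows are written as they arrive, so only one row is held at a time.
    Returns the number of rows written.
    """
    count = 0
    out.write("[")
    for row in rows:
        if count:
            out.write(",")  # separator between rows, none before the first
        out.write(row)
        count += 1
    out.write("]")
    return count


def fake_rows(n: int) -> Iterator[str]:
    # Hypothetical stand-in for a result set that is streamed row by row.
    for i in range(n):
        yield json.dumps({"id": i, "hello": "world"})


buf = io.StringIO()  # a real application would write to a file instead
written = write_json_array(fake_rows(3), buf)
document = buf.getvalue()
```

The output is only as unbuffered as its input: if the client library loads the
whole result set before iterating, the memory problem stays.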

--001a1145b1943854450534d89a83--