public inbox for [email protected]  
help / color / mirror / Atom feed
array size exceeds the maximum allowed (1073741823) when building a json
13+ messages / 6 participants
[nested] [flat]

* array size exceeds the maximum allowed (1073741823) when building a json
@ 2016-06-07 11:44  Nicolas Paris <[email protected]>
  0 siblings, 1 reply; 13+ messages in thread

From: Nicolas Paris @ 2016-06-07 11:44 UTC (permalink / raw)
  To: pgsql-performance

Hello,

I run a query transforming huge tables to a json document based on a period.
It works great for a modest period (little dataset).
However, when increasing the period (huge dataset) I get this error:

SQL ERROR[54000]
ERROR: array size exceeds the maximum allowed (1073741823)


Thanks by advance,

Informations:

postgresql 9.4
shared_buffers = 55GB
64bit Red Hat Enterprise Linux Server release 6.7

the query:
 WITH sel AS
  (SELECT ids_pat,
          ids_nda
   FROM eds.nda
   WHERE (dt_deb_nda >= '20150101'
          AND dt_deb_nda <= '20150401')),
      diag AS
  ( SELECT ids_nda_rum,
           json_agg(diago) AS diago,
           count(1) AS total
   FROM
     (SELECT ids_nda_rum,
             json_build_object( 'cd_cim', cd_cim,
'lib_cim',lib_typ_diag_tr, 'dt_cim',dt_exec) AS diago
      FROM eds.fait_diag_tr
      WHERE ids_nda IN
          (SELECT ids_nda
           FROM sel)
      ORDER BY dt_exec) AS diago2
   GROUP BY ids_nda_rum),
      act AS
  ( SELECT ids_nda_rum,
           json_agg(acto) AS acto,
           count(1) AS total
   FROM
     ( SELECT ids_nda_rum,
              json_build_object( 'cd_act',cd_ccam, 'dt_act',dt_exec) AS acto
      FROM eds.fait_act_tr
      WHERE ids_nda IN
          (SELECT ids_nda
           FROM sel)
      ORDER BY dt_exec) AS acto2
   GROUP BY ids_nda_rum ),
      ghm AS
  ( SELECT ids_nda_rum,
           json_agg(ghmo) AS ghmo,
           count(1) AS total
   FROM
     ( SELECT ids_nda_rum,
              json_build_object( 'cd_ghm',cd_ghm, 'cd_ghs',cd_ghs,
'status',lib_statut_tr, 'dt_maj_rum_ghm',dt_maj_rum_ghm) AS ghmo
      FROM eds.nda_rum_ghm_tr
      LEFT JOIN eds.nda_rum_tr rum USING (ids_nda_rum)
      WHERE nda_rum_ghm_tr.ids_nda IN
          (SELECT ids_nda
           FROM sel)
        AND rum.cd_rum = 'RSS'
      ORDER BY dt_maj_rum_ghm) AS ghmo
   GROUP BY ids_nda_rum ),
      lab AS
  (SELECT ids_nda,
          json_agg(lab) AS labo,
          count(1) AS total
   FROM
     (SELECT ids_nda,
             json_build_object( 'valeur_type_tr',valeur_type_tr,
'dt_fait', dt_fait, 'unite',unite, 'cd_test_lab',cd_test_lab,
'valeur_sign_tr',valeur_sign_tr, 'valeur_num_tr',valeur_num_tr,
'valeur_text_tr',valeur_text_tr,
'valeur_abnormal_tr',valeur_abnormal_tr) AS lab
      FROM eds.fait_lab_tr
      WHERE ids_nda IN
          (SELECT ids_nda
           FROM sel)
      ORDER BY dt_fait) AS labo
   GROUP BY ids_nda),
      rum AS
  ( SELECT ids_nda,
           json_agg(rum) AS rumo,
           count(1) AS total
   FROM
     ( SELECT ids_nda,
              json_build_object( 'cd_rum',cd_rum, 'dt_deb_rum',
dt_deb_rum, 'dt_fin_rum', dt_fin_rum, 'diag',
json_build_object('total',diag.total,'diag',diag.diago), 'act',
json_build_object('total',act.total,'act',act.acto) ) AS rum
      FROM eds.nda_rum_tr
      LEFT JOIN diag USING (ids_nda_rum)
      LEFT JOIN act USING (ids_nda_rum)
      WHERE ids_nda IN
          (SELECT ids_nda
           FROM sel)
        AND cd_rum = 'RUM' ) AS rumo
   GROUP BY ids_nda),
      rss AS
  ( SELECT ids_nda,
           json_agg(rss) AS rsso,
           count(1) AS total
   FROM
     ( SELECT ids_nda,
              json_build_object( 'cd_rum',cd_rum, 'dt_deb_rss',
dt_deb_rum, 'dt_fin_rss', dt_fin_rum, 'ghm',
json_build_object('total',ghm.total,'ghm',ghm.ghmo), 'rum',
json_build_object('total',rum.total, 'rum',rum.rumo) ) AS rss
      FROM eds.nda_rum_tr
      LEFT JOIN ghm USING (ids_nda_rum)
      LEFT JOIN rum USING (ids_nda)
      WHERE ids_nda IN
          (SELECT ids_nda
           FROM sel)
        AND cd_rum = 'RSS' ) AS rss
   GROUP BY ids_nda),
      enc AS
  (SELECT 'Encounter' AS "resourceType",
          cd_nda AS "identifier",
          duree_hospit AS "length",
          lib_statut_nda_tr AS "status",
          lib_type_nda_tr AS "type",
          ids_pat,
          json_build_object('start', dt_deb_nda,'end', dt_fin_nda) AS
"appointment",
          json_build_object('total',lab.total, 'lab',lab.labo) AS lab,
          json_build_object('total',rss.total, 'rss',rss.rsso) AS rss
   FROM eds.nda_tr
   LEFT JOIN lab USING (ids_nda)
   LEFT JOIN rss USING (ids_nda)
   WHERE ids_nda IN
       (SELECT ids_nda
        FROM sel)
   ORDER BY dt_deb_nda ASC)
SELECT 'Bundle' AS "resourceType",
       count(1) AS total,
       array_to_json(array_agg(ROW)) AS encounter
FROM
  (SELECT 'Patient' AS "resourceType",
          ipp AS "identifier",
          nom AS "name",
          cd_sex_tr AS "gender",
          dt_nais AS "birthDate",
          json_build_array(enc.*) AS encounters
   FROM eds.patient_tr
   INNER JOIN enc USING (ids_pat) ) ROW;


^ permalink  raw  reply  [nested|flat] 13+ messages in thread

* Re: array size exceeds the maximum allowed (1073741823) when building a json
@ 2016-06-07 12:31  David G. Johnston <[email protected]>
  parent: Nicolas Paris <[email protected]>
  0 siblings, 1 reply; 13+ messages in thread

From: David G. Johnston @ 2016-06-07 12:31 UTC (permalink / raw)
  To: Nicolas Paris <[email protected]>; +Cc: pgsql-performance

On Tue, Jun 7, 2016 at 7:44 AM, Nicolas Paris <[email protected]> wrote:

> Hello,
>
> I run a query transforming huge tables to a json document based on a period.
> It works great for a modest period (little dataset).
> However, when increasing the period (huge dataset) I get this error:
>
> SQL ERROR[54000]
> ERROR: array size exceeds the maximum allowed (1073741823)
>
> ​https://www.postgresql.org/about/​

​Maximum Field Size: 1 GB​

​It doesn't matter that the data never actually is placed into a physical
table.

David J.


^ permalink  raw  reply  [nested|flat] 13+ messages in thread

* Re: array size exceeds the maximum allowed (1073741823) when building a json
@ 2016-06-07 12:36  Nicolas Paris <[email protected]>
  parent: David G. Johnston <[email protected]>
  0 siblings, 1 reply; 13+ messages in thread

From: Nicolas Paris @ 2016-06-07 12:36 UTC (permalink / raw)
  To: David G. Johnston <[email protected]>; +Cc: pgsql-performance

2016-06-07 14:31 GMT+02:00 David G. Johnston <[email protected]>:

> On Tue, Jun 7, 2016 at 7:44 AM, Nicolas Paris <[email protected]> wrote:
>
>> Hello,
>>
>> I run a query transforming huge tables to a json document based on a period.
>> It works great for a modest period (little dataset).
>> However, when increasing the period (huge dataset) I get this error:
>>
>> SQL ERROR[54000]
>> ERROR: array size exceeds the maximum allowed (1073741823)
>>
>> ​https://www.postgresql.org/about/​
>
> ​Maximum Field Size: 1 GB​
>

It means a json cannot exceed 1GB in postgresql, right ?
Then I must build it with an external tool ?
​


>
> ​It doesn't matter that the data never actually is placed into a physical
> table.
>
> David J.
>
>


^ permalink  raw  reply  [nested|flat] 13+ messages in thread

* Re: array size exceeds the maximum allowed (1073741823) when building a json
@ 2016-06-07 12:39  David G. Johnston <[email protected]>
  parent: Nicolas Paris <[email protected]>
  0 siblings, 1 reply; 13+ messages in thread

From: David G. Johnston @ 2016-06-07 12:39 UTC (permalink / raw)
  To: Nicolas Paris <[email protected]>; +Cc: pgsql-performance

On Tue, Jun 7, 2016 at 8:36 AM, Nicolas Paris <[email protected]> wrote:

> 2016-06-07 14:31 GMT+02:00 David G. Johnston <[email protected]>:
>
>> On Tue, Jun 7, 2016 at 7:44 AM, Nicolas Paris <[email protected]>
>> wrote:
>>
>>> Hello,
>>>
>>> I run a query transforming huge tables to a json document based on a period.
>>> It works great for a modest period (little dataset).
>>> However, when increasing the period (huge dataset) I get this error:
>>>
>>> SQL ERROR[54000]
>>> ERROR: array size exceeds the maximum allowed (1073741823)
>>>
>>> ​https://www.postgresql.org/about/​
>>
>> ​Maximum Field Size: 1 GB​
>>
>
> It means a json cannot exceed 1GB in postgresql, right ?
>

​Yes​


> Then I must build it with an external tool ?
> ​
>
>

​​You have to do something different.  Using multiple columns and/or
multiple rows might we workable.

David J.


^ permalink  raw  reply  [nested|flat] 13+ messages in thread

* Re: array size exceeds the maximum allowed (1073741823) when building a json
@ 2016-06-07 12:42  Nicolas Paris <[email protected]>
  parent: David G. Johnston <[email protected]>
  0 siblings, 2 replies; 13+ messages in thread

From: Nicolas Paris @ 2016-06-07 12:42 UTC (permalink / raw)
  To: David G. Johnston <[email protected]>; +Cc: pgsql-performance

2016-06-07 14:39 GMT+02:00 David G. Johnston <[email protected]>:

> On Tue, Jun 7, 2016 at 8:36 AM, Nicolas Paris <[email protected]> wrote:
>
>> 2016-06-07 14:31 GMT+02:00 David G. Johnston <[email protected]>
>> :
>>
>>> On Tue, Jun 7, 2016 at 7:44 AM, Nicolas Paris <[email protected]>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I run a query transforming huge tables to a json document based on a period.
>>>> It works great for a modest period (little dataset).
>>>> However, when increasing the period (huge dataset) I get this error:
>>>>
>>>> SQL ERROR[54000]
>>>> ERROR: array size exceeds the maximum allowed (1073741823)
>>>>
>>>> ​https://www.postgresql.org/about/​
>>>
>>> ​Maximum Field Size: 1 GB​
>>>
>>
>> It means a json cannot exceed 1GB in postgresql, right ?
>>
>
> ​Yes​
>
>
>> Then I must build it with an external tool ?
>> ​
>>
>>
>
> ​​You have to do something different.  Using multiple columns and/or
> multiple rows might we workable.
>

​Certainly. Kind of disappointing, because I won't find any json builder as
performant as postgresql.​

​

Will this 1GO restriction is supposed to increase in a near future ?​


> David J.
>
>


^ permalink  raw  reply  [nested|flat] 13+ messages in thread

* Re: array size exceeds the maximum allowed (1073741823) when building a json
@ 2016-06-07 12:58  David G. Johnston <[email protected]>
  parent: Nicolas Paris <[email protected]>
  1 sibling, 0 replies; 13+ messages in thread

From: David G. Johnston @ 2016-06-07 12:58 UTC (permalink / raw)
  To: Nicolas Paris <[email protected]>; +Cc: pgsql-performance

On Tue, Jun 7, 2016 at 8:42 AM, Nicolas Paris <[email protected]> wrote:

>
>
> 2016-06-07 14:39 GMT+02:00 David G. Johnston <[email protected]>:
>
>> On Tue, Jun 7, 2016 at 8:36 AM, Nicolas Paris <[email protected]>
>> wrote:
>>
>>> 2016-06-07 14:31 GMT+02:00 David G. Johnston <[email protected]
>>> >:
>>>
>>>> On Tue, Jun 7, 2016 at 7:44 AM, Nicolas Paris <[email protected]>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I run a query transforming huge tables to a json document based on a period.
>>>>> It works great for a modest period (little dataset).
>>>>> However, when increasing the period (huge dataset) I get this error:
>>>>>
>>>>> SQL ERROR[54000]
>>>>> ERROR: array size exceeds the maximum allowed (1073741823)
>>>>>
>>>>> ​https://www.postgresql.org/about/​
>>>>
>>>> ​Maximum Field Size: 1 GB​
>>>>
>>>
>>> It means a json cannot exceed 1GB in postgresql, right ?
>>>
>>
>> ​Yes​
>>
>>
>>> Then I must build it with an external tool ?
>>> ​
>>>
>>>
>>
>> ​​You have to do something different.  Using multiple columns and/or
>> multiple rows might we workable.
>>
>
> ​Certainly. Kind of disappointing, because I won't find any json builder
> as performant as postgresql.​
>
> ​
>
> Will this 1GO restriction is supposed to increase in a near future ?​
>
>
There has been zero chatter on the public lists about increasing any of the
limits on that page I linked to.

David J.
​


^ permalink  raw  reply  [nested|flat] 13+ messages in thread

* Re: array size exceeds the maximum allowed (1073741823) when building a json
@ 2016-06-07 13:03  Josh Berkus <[email protected]>
  parent: Nicolas Paris <[email protected]>
  1 sibling, 2 replies; 13+ messages in thread

From: Josh Berkus @ 2016-06-07 13:03 UTC (permalink / raw)
  To: Nicolas Paris <[email protected]>; David G. Johnston <[email protected]>; +Cc: pgsql-performance

On 06/07/2016 08:42 AM, Nicolas Paris wrote:
>     ​​You have to do something different.  Using multiple columns and/or
>     multiple rows might we workable.
> 
> 
> ​Certainly. Kind of disappointing, because I won't find any json builder
> as performant as postgresql.​

That's nice to hear.

> Will this 1GO restriction is supposed to increase in a near future ?​

Not planned, no.  Thing is, that's the limit for a field in general, not
just JSON; changing it would be a fairly large patch.  It's desireable,
but AFAIK nobody is working on it.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)


-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



^ permalink  raw  reply  [nested|flat] 13+ messages in thread

* Re: array size exceeds the maximum allowed (1073741823) when building a json
@ 2016-06-07 19:23  Nicolas Paris <[email protected]>
  parent: Josh Berkus <[email protected]>
  1 sibling, 0 replies; 13+ messages in thread

From: Nicolas Paris @ 2016-06-07 19:23 UTC (permalink / raw)
  To: Josh Berkus <[email protected]>; +Cc: David G. Johnston <[email protected]>; pgsql-performance

2016-06-07 15:03 GMT+02:00 Josh Berkus <[email protected]>:

> On 06/07/2016 08:42 AM, Nicolas Paris wrote:
> >     ​​You have to do something different.  Using multiple columns and/or
> >     multiple rows might we workable.
>

​Getting a unique document from multiple rows coming from postgresql is not
that easy... The external tools considers each postgresql JSON fields as
strings or have to parse it again. Parsing them would add an overhead on
the external tool, and I d'say this would be better to build the entire
JSON in the external tool. This leads not to use postgresql JSON builder at
all, and delegate this job to a tool that is able to deal with > 1GO
documents.



> >
> >
> > ​Certainly. Kind of disappointing, because I won't find any json builder
> > as performant as postgresql.​
>
> That's nice to hear.
>
> > Will this 1GO restriction is supposed to increase in a near future ?​
>
> Not planned, no.  Thing is, that's the limit for a field in general, not
> just JSON; changing it would be a fairly large patch.  It's desireable,
> but AFAIK nobody is working on it.
>

Comparing to mongoDB 16MO document limitation 1GO is great (
http://tech.tulentsev.com/2014/02/limitations-of-mongodb/)​. But for my use
case this is not sufficient.



> --
> --
> Josh Berkus
> Red Hat OSAS
> (any opinions are my own)
>


^ permalink  raw  reply  [nested|flat] 13+ messages in thread

* Re: array size exceeds the maximum allowed (1073741823) when building a json
@ 2016-06-08 05:56  Michael Paquier <[email protected]>
  parent: Josh Berkus <[email protected]>
  1 sibling, 1 reply; 13+ messages in thread

From: Michael Paquier @ 2016-06-08 05:56 UTC (permalink / raw)
  To: Josh Berkus <[email protected]>; +Cc: Nicolas Paris <[email protected]>; David G. Johnston <[email protected]>; pgsql-performance

On Tue, Jun 7, 2016 at 10:03 PM, Josh Berkus <[email protected]> wrote:
> On 06/07/2016 08:42 AM, Nicolas Paris wrote:
>>     You have to do something different.  Using multiple columns and/or
>>     multiple rows might we workable.
>>
>>
>> Certainly. Kind of disappointing, because I won't find any json builder
>> as performant as postgresql.
>
> That's nice to hear.
>
>> Will this 1GO restriction is supposed to increase in a near future ?
>
> Not planned, no.  Thing is, that's the limit for a field in general, not
> just JSON; changing it would be a fairly large patch.  It's desireable,
> but AFAIK nobody is working on it.

And there are other things to consider on top of that, like the
maximum allocation size for palloc, the maximum query string size,
COPY, etc. This is no small project, and the potential side-effects
should not be underestimated.
-- 
Michael


-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



^ permalink  raw  reply  [nested|flat] 13+ messages in thread

* Re: array size exceeds the maximum allowed (1073741823) when building a json
@ 2016-06-08 06:04  Tom Lane <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 1 reply; 13+ messages in thread

From: Tom Lane @ 2016-06-08 06:04 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Josh Berkus <[email protected]>; Nicolas Paris <[email protected]>; David G. Johnston <[email protected]>; pgsql-performance

Michael Paquier <[email protected]> writes:
> On Tue, Jun 7, 2016 at 10:03 PM, Josh Berkus <[email protected]> wrote:
>> On 06/07/2016 08:42 AM, Nicolas Paris wrote:
>>> Will this 1GO restriction is supposed to increase in a near future ?

>> Not planned, no.  Thing is, that's the limit for a field in general, not
>> just JSON; changing it would be a fairly large patch.  It's desireable,
>> but AFAIK nobody is working on it.

> And there are other things to consider on top of that, like the
> maximum allocation size for palloc, the maximum query string size,
> COPY, etc. This is no small project, and the potential side-effects
> should not be underestimated.

It's also fair to doubt that client-side code would "just work" with
no functionality or performance problems for such large values.

I await with interest the OP's results on other JSON processors that
have no issues with GB-sized JSON strings.

			regards, tom lane


-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



^ permalink  raw  reply  [nested|flat] 13+ messages in thread

* Re: array size exceeds the maximum allowed (1073741823) when building a json
@ 2016-06-09 13:31  Merlin Moncure <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 13+ messages in thread

From: Merlin Moncure @ 2016-06-09 13:31 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Michael Paquier <[email protected]>; Josh Berkus <[email protected]>; Nicolas Paris <[email protected]>; David G. Johnston <[email protected]>; pgsql-performance

On Wed, Jun 8, 2016 at 1:04 AM, Tom Lane <[email protected]> wrote:
> Michael Paquier <[email protected]> writes:
>> On Tue, Jun 7, 2016 at 10:03 PM, Josh Berkus <[email protected]> wrote:
>>> On 06/07/2016 08:42 AM, Nicolas Paris wrote:
>>>> Will this 1GO restriction is supposed to increase in a near future ?
>
>>> Not planned, no.  Thing is, that's the limit for a field in general, not
>>> just JSON; changing it would be a fairly large patch.  It's desireable,
>>> but AFAIK nobody is working on it.
>
>> And there are other things to consider on top of that, like the
>> maximum allocation size for palloc, the maximum query string size,
>> COPY, etc. This is no small project, and the potential side-effects
>> should not be underestimated.
>
> It's also fair to doubt that client-side code would "just work" with
> no functionality or performance problems for such large values.
>
> I await with interest the OP's results on other JSON processors that
> have no issues with GB-sized JSON strings.

Yup.  Most json libraries and tools are going to be disgusting memory
hogs or have exponential behaviors especially when you consider you
are doing the transformation as well.  Just prettifying json documents
over 1GB can be a real challenge.

Fortunately the workaround here is pretty easy.  Keep your query
exactly as is but remove the final aggregation step so that it returns
a set. Next, make a small application that runs this query and does
the array bits around each row (basically prepending the final result
with [ appending the final result with ] and putting , between rows).
It's essential that you use a client library that does not buffer the
entire result in memory before emitting results.   This can be done in
psql (FETCH mode), java, libpq (single row mode), etc.   I suspect
node.js pg module can do this as well, and there certainty will be
others.

The basic objective is you want the rows to be streamed out of the
database without being buffered.  If you do that, you should be able
to stream arbitrarily large datasets out of the database to a json
document assuming the server can produce the query.

merlin


-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



^ permalink  raw  reply  [nested|flat] 13+ messages in thread

* Re: array size exceeds the maximum allowed (1073741823) when building a json
@ 2016-06-09 13:43  Nicolas Paris <[email protected]>
  parent: Merlin Moncure <[email protected]>
  0 siblings, 1 reply; 13+ messages in thread

From: Nicolas Paris @ 2016-06-09 13:43 UTC (permalink / raw)
  To: Merlin Moncure <[email protected]>; +Cc: Tom Lane <[email protected]>; Michael Paquier <[email protected]>; Josh Berkus <[email protected]>; David G. Johnston <[email protected]>; pgsql-performance

2016-06-09 15:31 GMT+02:00 Merlin Moncure <[email protected]>:

> On Wed, Jun 8, 2016 at 1:04 AM, Tom Lane <[email protected]> wrote:
> > Michael Paquier <[email protected]> writes:
> >> On Tue, Jun 7, 2016 at 10:03 PM, Josh Berkus <[email protected]> wrote:
> >>> On 06/07/2016 08:42 AM, Nicolas Paris wrote:
> >>>> Will this 1GO restriction is supposed to increase in a near future ?
> >
> >>> Not planned, no.  Thing is, that's the limit for a field in general,
> not
> >>> just JSON; changing it would be a fairly large patch.  It's desireable,
> >>> but AFAIK nobody is working on it.
> >
> >> And there are other things to consider on top of that, like the
> >> maximum allocation size for palloc, the maximum query string size,
> >> COPY, etc. This is no small project, and the potential side-effects
> >> should not be underestimated.
> >
> > It's also fair to doubt that client-side code would "just work" with
> > no functionality or performance problems for such large values.
> >
> > I await with interest the OP's results on other JSON processors that
> > have no issues with GB-sized JSON strings.
>
> Yup.  Most json libraries and tools are going to be disgusting memory
> hogs or have exponential behaviors especially when you consider you
> are doing the transformation as well.  Just prettifying json documents
> over 1GB can be a real challenge.
>
> Fortunately the workaround here is pretty easy.  Keep your query
> exactly as is but remove the final aggregation step so that it returns
> a set. Next, make a small application that runs this query and does
> the array bits around each row (basically prepending the final result
> with [ appending the final result with ] and putting , between rows).
>

​The point is when prepending/appending leads to deal with strings.
Transforming each value of the resultset to a string implies to escape the
double quote.
then:
row1 contains {"hello":"world"}
step 1 = prepend -> "[{\"hello\":\"world\"}"
step 2 = append -> "[{\"hello\":\"world\"},"
and so on
the json is corrupted. Hopelly I am sure I am on a wrong way about that.

​


> It's essential that you use a client library that does not buffer the
> entire result in memory before emitting results.   This can be done in
> psql (FETCH mode), java, libpq (single row mode), etc.   I suspect
> node.js pg module can do this as well, and there certainty will be
> others.
>
> The basic objective is you want the rows to be streamed out of the
> database without being buffered.  If you do that, you should be able
> to stream arbitrarily large datasets out of the database to a json
> document assuming the server can produce the query.
>
> merlin
>


^ permalink  raw  reply  [nested|flat] 13+ messages in thread

* Re: array size exceeds the maximum allowed (1073741823) when building a json
@ 2016-06-09 20:36  Merlin Moncure <[email protected]>
  parent: Nicolas Paris <[email protected]>
  0 siblings, 0 replies; 13+ messages in thread

From: Merlin Moncure @ 2016-06-09 20:36 UTC (permalink / raw)
  To: Nicolas Paris <[email protected]>; +Cc: Tom Lane <[email protected]>; Michael Paquier <[email protected]>; Josh Berkus <[email protected]>; David G. Johnston <[email protected]>; pgsql-performance

On Thu, Jun 9, 2016 at 8:43 AM, Nicolas Paris <[email protected]> wrote:
>
>
> 2016-06-09 15:31 GMT+02:00 Merlin Moncure <[email protected]>:
>>
>> On Wed, Jun 8, 2016 at 1:04 AM, Tom Lane <[email protected]> wrote:
>> > Michael Paquier <[email protected]> writes:
>> >> On Tue, Jun 7, 2016 at 10:03 PM, Josh Berkus <[email protected]> wrote:
>> >>> On 06/07/2016 08:42 AM, Nicolas Paris wrote:
>> >>>> Will this 1GO restriction is supposed to increase in a near future ?
>> >
>> >>> Not planned, no.  Thing is, that's the limit for a field in general,
>> >>> not
>> >>> just JSON; changing it would be a fairly large patch.  It's
>> >>> desireable,
>> >>> but AFAIK nobody is working on it.
>> >
>> >> And there are other things to consider on top of that, like the
>> >> maximum allocation size for palloc, the maximum query string size,
>> >> COPY, etc. This is no small project, and the potential side-effects
>> >> should not be underestimated.
>> >
>> > It's also fair to doubt that client-side code would "just work" with
>> > no functionality or performance problems for such large values.
>> >
>> > I await with interest the OP's results on other JSON processors that
>> > have no issues with GB-sized JSON strings.
>>
>> Yup.  Most json libraries and tools are going to be disgusting memory
>> hogs or have exponential behaviors especially when you consider you
>> are doing the transformation as well.  Just prettifying json documents
>> over 1GB can be a real challenge.
>>
>> Fortunately the workaround here is pretty easy.  Keep your query
>> exactly as is but remove the final aggregation step so that it returns
>> a set. Next, make a small application that runs this query and does
>> the array bits around each row (basically prepending the final result
>> with [ appending the final result with ] and putting , between rows).
>
>
> The point is when prepending/appending leads to deal with strings.
> Transforming each value of the resultset to a string implies to escape the
> double quote.
> then:
> row1 contains {"hello":"world"}
> step 1 = prepend -> "[{\"hello\":\"world\"}"
> step 2 = append -> "[{\"hello\":\"world\"},"

right 3 rows contain {"hello":"world"}

before iteration: emit '['
before every row except the first, prepend ','
after iteration: emit ']'

you end up with:
[{"hello":"world"}
,{"hello":"world"}
,{"hello":"world"}]

...which is 100% valid json as long as each row of the set is a json object.

in SQL, the technique is like this:
select ('[' || string_agg(j::text, ',') || ']')::json from (select
json_build_object('hello', 'world') j from generate_series(1,3)) q;

the difference is, instead of having the database do the string_agg
step, it's handled on the client during iteration over the output of
generate_series.

merlin


-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




^ permalink  raw  reply  [nested|flat] 13+ messages in thread


end of thread, other threads:[~2016-06-09 20:36 UTC | newest]

Thread overview: 13+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2016-06-07 11:44 array size exceeds the maximum allowed (1073741823) when building a json Nicolas Paris <[email protected]>
2016-06-07 12:31 ` David G. Johnston <[email protected]>
2016-06-07 12:36   ` Nicolas Paris <[email protected]>
2016-06-07 12:39     ` David G. Johnston <[email protected]>
2016-06-07 12:42       ` Nicolas Paris <[email protected]>
2016-06-07 12:58         ` David G. Johnston <[email protected]>
2016-06-07 13:03         ` Josh Berkus <[email protected]>
2016-06-07 19:23           ` Nicolas Paris <[email protected]>
2016-06-08 05:56           ` Michael Paquier <[email protected]>
2016-06-08 06:04             ` Tom Lane <[email protected]>
2016-06-09 13:31               ` Merlin Moncure <[email protected]>
2016-06-09 13:43                 ` Nicolas Paris <[email protected]>
2016-06-09 20:36                   ` Merlin Moncure <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox