public inbox for [email protected]
help / color / mirror / Atom feedFrom: Sehrope Sarkuni <[email protected]>
To: Andrew Dunstan <[email protected]>
Cc: Tom Lane <[email protected]>
Cc: Joe Conway <[email protected]>
Cc: Nathan Bossart <[email protected]>
Cc: Davin Shearer <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Subject: Re: Emitting JSON to file using COPY TO
Date: Wed, 6 Dec 2023 16:36:02 -0500
Message-ID: <CAH7T-arYm4FF2pPWDwz_6Fnp8Jj2vMNOfEd2B54HkFOPNX_kGw@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<20231206163349.GA2873889@nathanxps13>
<20231206164439.GC2873889@nathanxps13>
<[email protected]>
<[email protected]>
<[email protected]>
On Wed, Dec 6, 2023 at 4:03 PM Andrew Dunstan <[email protected]> wrote:
> > The output size difference does say that maybe we should pay some
> > attention to the nearby request to not always label every field.
> > Perhaps there should be an option for each row to transform to
> > a JSON array rather than an object?
>
> I doubt it. People who want this are likely to want pretty much what
> this patch is providing, not something they would have to transform in
> order to get it. If they want space-efficient data they won't really be
> wanting JSON. Maybe they want Protocol Buffers or something in that vein.
>
For arrays v.s. objects, it's not just about data size. There are plenty of
situations where a JSON array is superior to an object (e.g. duplicate
column names). Lines of JSON arrays of strings is pretty much CSV with JSON
escaping rules and a pair of wrapping brackets. It's common for tabular
data in node.js environments as you don't need a separate CSV parser.
Each one has its place and a default of the row_to_json(...) representation
of the row still makes sense. But if the user has the option of outputting
a single json/jsonb field for each row without an object or array wrapper,
then it's possible to support all of these use cases as the user can
explicitly pick whatever envelope makes sense:
-- Lines of JSON arrays:
COPY (SELECT json_build_array('test-' || a, b) FROM generate_series(1, 3)
a, generate_series(5,6) b) TO STDOUT WITH (FORMAT JSON,
SOME_OPTION_TO_DISABLE_ENVELOPE);
["test-1", 5]
["test-2", 5]
["test-3", 5]
["test-1", 6]
["test-2", 6]
["test-3", 6]
-- Lines of JSON strings:
COPY (SELECT to_json('test-' || x) FROM generate_series(1, 5) x) TO STDOUT
WITH (FORMAT JSON, SOME_OPTION_TO_DISABLE_ENVELOPE);
"test-1"
"test-2"
"test-3"
"test-4"
"test-5"
I'm not sure how I feel about the behavior being automatic if it's a single
top level json / jsonb field rather than requiring the explicit option.
It's probably what a user would want but it also feels odd to change the
output wrapper automatically based on the fields in the response. If it is
automatic and the user wants the additional envelope, the option always
exists to wrap it further in another: json_build_object('some_field",
my_field_i_want_wrapped)
The duplicate field names would be a good test case too. I haven't gone
through this patch but I'm guessing it doesn't filter out duplicates so the
behavior would match up with row_to_json(...), i.e. duplicates are
preserved:
=> SELECT row_to_json(t.*) FROM (SELECT 1 AS a, 2 AS a) t;
row_to_json
---------------
{"a":1,"a":2}
If so, that's a good test case to add as however that's handled should be
deterministic.
Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
view thread (37+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Emitting JSON to file using COPY TO
In-Reply-To: <CAH7T-arYm4FF2pPWDwz_6Fnp8Jj2vMNOfEd2B54HkFOPNX_kGw@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox