public inbox for [email protected]
4+ messages / 2 participants
* Re: Emitting JSON to file using COPY TO
@ 2023-12-04 18:37 Davin Shearer <[email protected]>
2023-12-04 20:06 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
0 siblings, 1 reply; 4+ messages in thread
From: Davin Shearer @ 2023-12-04 18:37 UTC (permalink / raw)
To: Joe Conway <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; pgsql-hackers
Looking great!
For testing, in addition to the quotes, include DOS and Unix EOL, \ and /,
Byte Order Marks, and multibyte characters (for example, UTF-8 encoded text).
Essentially anything considered textual is fair game to be a value.
On Mon, Dec 4, 2023, 10:46 Joe Conway <[email protected]> wrote:
> On 12/4/23 09:25, Andrew Dunstan wrote:
> >
> > On 2023-12-04 Mo 08:37, Joe Conway wrote:
> >> On 12/4/23 07:41, Andrew Dunstan wrote:
> >>>
> >>> On 2023-12-03 Su 20:14, Joe Conway wrote:
> >>>> (please don't top quote on the Postgres lists)
> >>>>
> >>>> On 12/3/23 17:38, Davin Shearer wrote:
> >>>>> " being quoted as \\" breaks the JSON. It needs to be \". This has
> >>>>> been my whole problem with COPY TO for JSON.
> >>>>>
> >>>>> Please validate that the output is in proper format with correct
> >>>>> quoting for special characters. I use `jq` on the command line to
> >>>>> validate and format the output.
> >>>>
> >>>> I just hooked existing "row-to-json machinery" up to the "COPY TO"
> >>>> statement. If the output is wrong (just for this use case?),
> >>>> that would be a missing feature (or possibly a bug?).
> >>>>
> >>>> Davin -- how did you work around the issue with the way the built in
> >>>> functions output JSON?
> >>>>
> >>>> Andrew -- comments/thoughts?
> >>>
> >>> I meant to mention this when I was making comments yesterday.
> >>>
> >>> The patch should not be using CopyAttributeOutText - it will try to
> >>> escape characters such as \, which produces the effect complained of
> >>> here, or else we need to change its setup so we have a way to inhibit
> >>> that escaping.
> >>
> >>
> >> Interesting.
> >>
> >> I am surprised this has never been raised as a problem with COPY TO
> >> before.
> >>
> >> Should the JSON output, as produced by composite_to_json(), be sent
> >> as-is with no escaping at all? If yes, is JSON somehow unique in this
> >> regard?
> >
> >
> > Text mode output is in such a form that it can be read back in using
> > text mode input. There's nothing special about JSON in this respect -
> > any text field will be escaped too. But output suitable for text mode
> > input is not what you're trying to produce here; you're trying to
> > produce valid JSON.
> >
> > So, yes, the result of composite_to_json, which is already suitably
> > escaped, should not be further escaped in this case.
>
> Gotcha.
>
> This patch version uses CopySendData() instead and includes
> documentation changes. Still lacks regression tests.
>
> Hopefully this looks better. Any other particular strings I ought to
> test with?
>
> 8<------------------
> test=# copy (select * from foo limit 4) to stdout (format json,
> force_array true);
> [
> {"id":1,"f1":"line with \" in it:
> 1","f2":"2023-12-03T12:26:41.596053-05:00"}
> ,{"id":2,"f1":"line with ' in it:
> 2","f2":"2023-12-03T12:26:41.596173-05:00"}
> ,{"id":3,"f1":"line with \" in it:
> 3","f2":"2023-12-03T12:26:41.596179-05:00"}
> ,{"id":4,"f1":"line with ' in it:
> 4","f2":"2023-12-03T12:26:41.596182-05:00"}
> ]
> 8<------------------
>
> --
> Joe Conway
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>
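The double escaping discussed in the quoted exchange can be illustrated with a minimal Python sketch. The helper copy_text_escape is hypothetical: it only approximates what COPY's text format does to backslashes and a few control characters, and it is not PostgreSQL's CopyAttributeOutText. The point is simply that JSON which is already escaped stops being valid JSON once its backslashes are escaped a second time.

```python
import json


def copy_text_escape(field: str) -> str:
    """Hypothetical stand-in for COPY text-format escaping.

    Only backslash, tab, newline and carriage return are handled here;
    the real text format covers more cases.
    """
    return (
        field.replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


row = {"id": 1, "f1": 'line with " in it: 1'}

# Already valid JSON: the double quote is emitted as \"
as_json = json.dumps(row, separators=(",", ":"))

# Escaping the backslash again turns \" into \\" and breaks the document.
escaped_again = copy_text_escape(as_json)

print(as_json, is_valid_json(as_json))
print(escaped_again, is_valid_json(escaped_again))
```

Sending the already-escaped JSON through unchanged, as the patch version above does with CopySendData(), avoids the second pass.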
* Re: Emitting JSON to file using COPY TO
2023-12-04 18:37 Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
@ 2023-12-04 20:06 ` Andrew Dunstan <[email protected]>
2023-12-04 22:55 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
0 siblings, 1 reply; 4+ messages in thread
From: Andrew Dunstan @ 2023-12-04 20:06 UTC (permalink / raw)
To: Davin Shearer <[email protected]>; Joe Conway <[email protected]>; +Cc: pgsql-hackers
On 2023-12-04 Mo 13:37, Davin Shearer wrote:
> Looking great!
>
> For testing, in addition to the quotes, include DOS and Unix EOL, \
> and /, Byte Order Marks, and multibyte characters (for example,
> UTF-8 encoded text).
>
> Essentially anything considered textual is fair game to be a value.
Joe already asked you to avoid top-posting on PostgreSQL lists. See
<http://idallen.com/topposting.html> for an explanation.
We don't process BOMs elsewhere, and probably should not here either.
They are in fact neither required nor recommended for use with UTF8
data, AIUI. See a recent discussion on this list on that topic:
<https://www.postgresql.org/message-id/flat/81ca2b25-6b3a-499a-9a09-2dd21253c2cb%40unitrunker.net>
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
* Re: Emitting JSON to file using COPY TO
2023-12-04 18:37 Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-04 20:06 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
@ 2023-12-04 22:55 ` Davin Shearer <[email protected]>
2023-12-05 14:56 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
0 siblings, 1 reply; 4+ messages in thread
From: Davin Shearer @ 2023-12-04 22:55 UTC (permalink / raw)
To: Andrew Dunstan <[email protected]>; pgsql-hackers; +Cc: Joe Conway <[email protected]>
Sorry about the top posting / top quoting... the link you sent me gives me
a 404. I'm not exactly sure what top quoting / posting means and Googling
those terms wasn't helpful for me, but I've removed the quoting that my
mail client is automatically "helpfully" adding to my emails. I mean no
offense.
Okay, digging in more...
If the value contains text that has BOMs [footnote 1] in it, they must be
preserved (the database doesn't need to interpret them or do anything
special with them - just store and fetch them). There are, however, a few
characters that need to be escaped (per
https://www.w3docs.com/snippets/java/how-should-i-escape-strings-in-json.html)
so that the JSON format isn't broken. They are:
1. " (double quote)
2. \ (backslash)
3. / (forward slash)
4. \b (backspace)
5. \f (form feed)
6. \n (new line)
7. \r (carriage return)
8. \t (horizontal tab)
These characters should be represented in the test cases to see how the
escaping behaves and to ensure that the escaping is done properly per JSON
requirements. Forward slash comes as a bit of a surprise to me, but `jq`
handles it either way:
➜ echo '{"key": "this / is a forward slash"}' | jq .
{
  "key": "this / is a forward slash"
}
➜ echo '{"key": "this \/ is a forward slash"}' | jq .
{
  "key": "this / is a forward slash"
}
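For comparison, here is a small Python sketch that runs the characters listed above through the standard json module and prints how each one is encoded. The names in the samples dictionary are only illustrative labels. As far as I understand RFC 8259, the double quote, the backslash and the control characters must be escaped, while the forward slash may be escaped but does not have to be, which matches what `jq` shows above.

```python
import json

# Characters from the list above, keyed by a readable label.
samples = {
    "double quote": '"',
    "backslash": "\\",
    "forward slash": "/",
    "backspace": "\b",
    "form feed": "\f",
    "new line": "\n",
    "carriage return": "\r",
    "horizontal tab": "\t",
}

# Encode each character as a JSON string literal.
encoded = {name: json.dumps(ch) for name, ch in samples.items()}

for name, text in encoded.items():
    # Every encoded form must decode back to the original character.
    assert json.loads(text) == samples[name]
    print(f"{name:16} {text}")

# An unescaped and an escaped forward slash decode to the same value.
slash_plain = json.loads('"this / is a forward slash"')
slash_escaped = json.loads('"this \\/ is a forward slash"')
print(slash_plain == slash_escaped)
```

Test strings for the patch could be built from the same set of characters, with the expectation that the COPY output parses back to the stored value.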
Hope it helps, and thank you!
1. I don't disagree that BOMs shouldn't be used for UTF-8, but I'm also
processing UTF-16{BE,LE} and UTF-32{BE,LE} (as well as other textual
formats that are neither ASCII nor Unicode). I don't have the luxury of
changing the data that is given.
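To make the "just store and fetch" expectation concrete, here is a minimal Python sketch, assuming the BOM is the code point U+FEFF sitting inside a text value rather than at the start of the file. The variable names are illustrative only, and this says nothing about what the patch itself does; it only shows that a JSON encoder can carry the character through unchanged.

```python
import json

BOM = "\ufeff"  # U+FEFF, the code point used as a byte order mark
value = BOM + "payload"

# Default encoding writes the BOM as a \ufeff escape sequence.
escaped_doc = json.dumps({"v": value})

# With ensure_ascii=False the character is written literally.
literal_doc = json.dumps({"v": value}, ensure_ascii=False)

# Both forms decode back to the original value, BOM included.
from_escaped = json.loads(escaped_doc)["v"]
from_literal = json.loads(literal_doc)["v"]

# Encoded as UTF-8, the literal form contains the bytes EF BB BF.
utf8_bytes = literal_doc.encode("utf-8")

print(escaped_doc)
print(from_escaped == value, from_literal == value)
print(b"\xef\xbb\xbf" in utf8_bytes)
```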
* Re: Emitting JSON to file using COPY TO
2023-12-04 18:37 Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-04 20:06 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-04 22:55 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
@ 2023-12-05 14:56 ` Andrew Dunstan <[email protected]>
0 siblings, 0 replies; 4+ messages in thread
From: Andrew Dunstan @ 2023-12-05 14:56 UTC (permalink / raw)
To: Davin Shearer <[email protected]>; pgsql-hackers; +Cc: Joe Conway <[email protected]>
On 2023-12-04 Mo 17:55, Davin Shearer wrote:
> Sorry about the top posting / top quoting... the link you sent me
> gives me a 404. I'm not exactly sure what top quoting / posting means
> and Googling those terms wasn't helpful for me, but I've removed the
> quoting that my mail client is automatically "helpfully" adding to my
> emails. I mean no offense.
Hmm. Luckily the Wayback Machine has a copy:
<http://web.archive.org/web/20230608210806/idallen.com/topposting.html>
Maybe I'll put a copy in the developer wiki.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com