From: Davin Shearer
Date: Mon, 4 Dec 2023 13:37:06 -0500
Subject: Re: Emitting JSON to file using COPY TO
To: Joe Conway
Cc: Andrew Dunstan, PostgreSQL-development

Looking great!

For testing, in addition to the quotes, include DOS and Unix EOL, \ and /, byte order marks (BOMs), and multibyte characters such as those in UTF-8.

Essentially anything considered textual is fair game to be a value.
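
One way to build such a test set, as a minimal Python sketch rather than anything taken from the patch. The sample values and the helper name `check_round_trip` are hypothetical, and Python's `json` module stands in for `jq` as the validator:

```python
import json

# Hypothetical sample values covering the cases listed above.
TEST_VALUES = {
    "double_quote": 'line with " in it',
    "single_quote": "line with ' in it",
    "dos_eol": "first\r\nsecond",
    "unix_eol": "first\nsecond",
    "backslash": "C:\\temp\\file",
    "slash": "a/b/c",
    "bom": "\ufeffstarts with a BOM",
    "multibyte": "naïve café 日本語 🐘",
}


def check_round_trip(values):
    """Serialize each value as a one-column JSON row and parse it back."""
    for name, value in values.items():
        encoded = json.dumps({"f1": value}, ensure_ascii=False)
        # A valid JSON string never contains a raw line break,
        # so every row must fit on a single line.
        assert "\n" not in encoded and "\r" not in encoded, name
        # The decoded value must match the original exactly.
        assert json.loads(encoded)["f1"] == value, name
    return len(values)
```

The same values could be loaded into a table and exported, with the exported rows then checked by whatever JSON parser is at hand.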

On Mon, Dec 4, 2023, 10:46 Joe Conway <mail@joeconway.com> wrote:
On 12/4/23 09:25, Andrew Dunstan wrote:
>
> On 2023-12-04 Mo 08:37, Joe Conway wrote:
>> On 12/4/23 07:41, Andrew Dunstan wrote:
>>>
>>> On 2023-12-03 Su 20:14, Joe Conway wrote:
>>>> (please don't top quote on the Postgres lists)
>>>>
>>>> On 12/3/23 17:38, Davin Shearer wrote:
>>>>> " being quoted as \\" breaks the JSON. It needs to be \". This has
>>>>> been my whole problem with COPY TO for JSON.
>>>>>
>>>>> Please validate that the output is in proper format with correct
>>>>> quoting for special characters. I use `jq` on the command line to
>>>>> validate and format the output.
>>>>
>>>> I just hooked existing "row-to-json machinery" up to the "COPY TO"
>>>> statement. If the output is wrong (just for this use case?),
>>>> that would be a missing feature (or possibly a bug?).
>>>>
>>>> Davin -- how did you work around the issue with the way the built in
>>>> functions output JSON?
>>>>
>>>> Andrew -- comments/thoughts?
>>>
>>> I meant to mention this when I was making comments yesterday.
>>>
>>> The patch should not be using CopyAttributeOutText - it will try to
>>> escape characters such as \, which produces the effect complained of
>>> here, or else we need to change its setup so we have a way to inhibit
>>> that escaping.
>>
>>
>> Interesting.
>>
>> I am surprised this has never been raised as a problem with COPY TO
>> before.
>>
>> Should the JSON output, as produced by composite_to_json(), be sent
>> as-is with no escaping at all? If yes, is JSON somehow unique in this
>> regard?
>
>
> Text mode output is in such a form that it can be read back in using
> text mode input. There's nothing special about JSON in this respect -
> any text field will be escaped too. But output suitable for text mode
> input is not what you're trying to produce here; you're trying to
> produce valid JSON.
>
> So, yes, the result of composite_to_json, which is already suitably
> escaped, should not be further escaped in this case.
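
A small illustration of the double escaping described above, as a Python sketch rather than PostgreSQL code. `copy_text_escape` is a hypothetical, simplified stand-in for what COPY text format does to backslashes and a few control characters; it is not the actual CopyAttributeOutText implementation, and `json.dumps` merely plays the role of composite_to_json:

```python
import json


def copy_text_escape(s: str) -> str:
    """Simplified imitation of COPY text-format escaping (assumption, not the C code)."""
    replacements = {"\\": "\\\\", "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    return "".join(replacements.get(ch, ch) for ch in s)


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Already-escaped JSON row: the embedded quote is written as \"
row_json = json.dumps({"id": 1, "f1": 'line with " in it: 1'})

# Escaping it a second time turns \" into \\", which ends the string early.
double_escaped = copy_text_escape(row_json)

print(row_json, is_valid_json(row_json))
print(double_escaped, is_valid_json(double_escaped))
```

Sending the already-escaped row through unchanged keeps it parseable, while the second escaping pass is what turns \" into \\".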

Gotcha.

This patch version uses CopySendData() instead and includes
documentation changes. Still lacks regression tests.

Hopefully this looks better. Any other particular strings I ought to
test with?

8<------------------
test=# copy (select * from foo limit 4) to stdout (format json,
force_array true);
[
 {"id":1,"f1":"line with \" in it: 1","f2":"2023-12-03T12:26:41.596053-05:00"}
,{"id":2,"f1":"line with ' in it: 2","f2":"2023-12-03T12:26:41.596173-05:00"}
,{"id":3,"f1":"line with \" in it: 3","f2":"2023-12-03T12:26:41.596179-05:00"}
,{"id":4,"f1":"line with ' in it: 4","f2":"2023-12-03T12:26:41.596182-05:00"}
]
8<------------------

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com