MIME-Version: 1.0
References: <b3a73eca-ec3b-4b64-b6c2-82418bb16701@manitou-mail.org>
 <b7d8c88b-d175-44ca-a323-ec2f5c3cd37d@joeconway.com>
 <8620df11-96e4-4ca3-8f3c-33a479260961@joeconway.com>
 <4162f7f7-6fd6-4720-98e2-89f80e3de2ed@joeconway.com>
In-Reply-To: <4162f7f7-6fd6-4720-98e2-89f80e3de2ed@joeconway.com>
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 19 Jan 2024 17:09:47 +0900
Message-ID: 
 <CAD21AoCb02zhZM3vXb8HSw8fwOsL+iRdEFb--Kmunv8PjPAWjw@mail.gmail.com>
Subject: Re: Emitting JSON to file using COPY TO
To: Joe Conway <mail@joeconway.com>
Cc: Daniel Verite <daniel@manitou-mail.org>,
 Andrew Dunstan <andrew@dunslane.net>,
	Davin Shearer <davin@apache.org>,
 PostgreSQL-development <pgsql-hackers@postgresql.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: 
 <https://www.postgresql.org/message-id/CAD21AoCb02zhZM3vXb8HSw8fwOsL%2BiRdEFb--Kmunv8PjPAWjw%40mail.gmail.com>
Precedence: bulk

On Thu, Dec 7, 2023 at 10:10=E2=80=AFAM Joe Conway <mail@joeconway.com> wro=
te:
>
> On 12/6/23 18:09, Joe Conway wrote:
> > On 12/6/23 14:47, Joe Conway wrote:
> >> On 12/6/23 13:59, Daniel Verite wrote:
> >>>     Andrew Dunstan wrote:
> >>>
> >>>> IMNSHO, we should produce either a single JSON
> >>>> document (the ARRAY case) or a series of JSON documents, one per row
> >>>> (the LINES case).
> >>>
> >>> "COPY Operations" in the doc says:
> >>>
> >>> " The backend sends a CopyOutResponse message to the frontend, follow=
ed
> >>>     by zero or more CopyData messages (always one per row), followed =
by
> >>>     CopyDone".
> >>>
> >>> In the ARRAY case, the first messages with the copyjsontest
> >>> regression test look like this (tshark output):
> >>>
> >>> PostgreSQL
> >>>      Type: CopyOut response
> >>>      Length: 13
> >>>      Format: Text (0)
> >>>      Columns: 3
> >>>     Format: Text (0)
> >>> PostgreSQL
> >>>      Type: Copy data
> >>>      Length: 6
> >>>      Copy data: 5b0a
> >>> PostgreSQL
> >>>      Type: Copy data
> >>>      Length: 76
> >>>      Copy data:
> >>> 207b226964223a312c226631223a226c696e652077697468205c2220696e2069743a2=
031=E2=80=A6
> >>>
> >>> The first Copy data message with contents "5b0a" does not qualify
> >>> as a row of data with 3 columns as advertised in the CopyOut
> >>> message. Isn't that a problem?
> >>
> >>
> >> Is it a real problem, or just a bit of documentation change that I mis=
sed?
> >>
> >> Anything receiving this and looking for a json array should know how t=
o
> >> assemble the data correctly despite the extra CopyData messages.
> >
> > Hmm, maybe the real problem here is that Columns do not equal "3" for
> > the json mode case -- that should really say "1" I think, because the
> > row is not represented as 3 columns but rather 1 json object.
> >
> > Does that sound correct?
> >
> > Assuming yes, there is still maybe an issue that there are two more
> > "rows" that actual output rows (the "[" and the "]"), but maybe those
> > are less likely to cause some hazard?
>
>
> The attached should fix the CopyOut response to say one column. I.e. it
> ought to look something like:
>
> PostgreSQL
>       Type: CopyOut response
>       Length: 13
>       Format: Text (0)
>       Columns: 1
>       Format: Text (0)
> PostgreSQL
>       Type: Copy data
>       Length: 6
>       Copy data: 5b0a
> PostgreSQL
>       Type: Copy data
>       Length: 76
>       Copy data: [...]
>

If I'm not missing, copyto_json.007.diff is the latest patch but it
needs to be rebased to the current HEAD. Here are random comments:

---
 if (opts_out->json_mode)
+   {
+       if (is_from)
+           ereport(ERROR,
+                   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                    errmsg("cannot use JSON mode in COPY FROM")));
+   }
+   else if (opts_out->force_array)
+       ereport(ERROR,
+               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                errmsg("COPY FORCE_ARRAY requires JSON mode")));

I think that flatting these two condition make the code more readable:

if (opts_out->json_mode && is_from)
ereport(ERROR, ...);

if (!opts_out->json_mode && opts_out->force_array)
ereport(ERROR, ...);

Also these checks can be moved close to other checks at the end of
ProcessCopyOptions().

---
@@ -3395,6 +3395,10 @@ copy_opt_item:
                {
                    $$ =3D makeDefElem("format", (Node *) makeString("csv")=
, @1);
                }
+           | JSON
+               {
+                   $$ =3D makeDefElem("format", (Node *) makeString("json"=
), @1);
+               }
            | HEADER_P
                {
                    $$ =3D makeDefElem("header", (Node *) makeBoolean(true)=
, @1);
@@ -3427,6 +3431,10 @@ copy_opt_item:
                {
                    $$ =3D makeDefElem("encoding", (Node *) makeString($2),=
 @1);
                }
+           | FORCE ARRAY
+               {
+                   $$ =3D makeDefElem("force_array", (Node *)
makeBoolean(true), @1);
+               }
        ;

I believe we don't need to support new options in old-style syntax.

---
@@ -3469,6 +3477,10 @@ copy_generic_opt_elem:
                {
                    $$ =3D makeDefElem($1, $2, @1);
                }
+           | FORMAT_LA copy_generic_opt_arg
+               {
+                   $$ =3D makeDefElem("format", $2, @1);
+               }
        ;

I think it's not necessary. "format" option is already handled in
copy_generic_opt_elem.

---
+/* need delimiter to start next json array element */
+static bool json_row_delim_needed =3D false;

I think it's cleaner to include json_row_delim_needed into CopyToStateData.

---
Splitting the patch into two patches: add json format and add
force_array option would make reviews easy.

Regards,

--=20
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com