Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1r7dVl-0065Xs-Ml for pgsql-general@arkaria.postgresql.org; Mon, 27 Nov 2023 15:27:01 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1r7dVk-00EWFQ-Bv for pgsql-general@arkaria.postgresql.org; Mon, 27 Nov 2023 15:27:00 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1r7dVk-00EWFG-0z for pgsql-general@lists.postgresql.org; Mon, 27 Nov 2023 15:27:00 +0000 Received: from mail-ot1-x32e.google.com ([2607:f8b0:4864:20::32e]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1r7dVg-008oPN-ST for pgsql-general@lists.postgresql.org; Mon, 27 Nov 2023 15:26:59 +0000 Received: by mail-ot1-x32e.google.com with SMTP id 46e09a7af769-6ce2ea3a944so2903398a34.1 for ; Mon, 27 Nov 2023 07:26:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1701098815; x=1701703615; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=K3gUznNKqb2ThHrOuiskOuUgNYAag4Sk2ISQTAAC864=; b=g2+65YPSSOZtwGQL1RywPB6S5Kdg7Cc4nPCDc9nnT2RtFH1+5brt6v1sXoXRdRdXGB Tez66wHQdj0r/vSjhIYPwAL1m9B9Y3WCpD5VUuU+cU3R9tYUojcUmSw6eZ/p+P0qB6tY eZRbaVjbTQOkeBegpbPZUB0gDpTsx4voEs1TLTCPuMjAYU1yr6g0HMR+kJoV5J7hzNsQ 2Ps4iNFUBGRj8CpElz0O5fQfGVHFhoqwY4Ar+OZ+gTuutp5N8O4YEdBDkx1QG6u3GovF IWs7XPvdCOEINuOq3lgmoyMS9nW+kddM9HUfcCucGWzK3jla1dRFdYyakrDBe+84BfjE QN8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701098815; x=1701703615; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=K3gUznNKqb2ThHrOuiskOuUgNYAag4Sk2ISQTAAC864=; b=ujRz/LQSSpFmDVPafIQIUBcwGd5oUbVz1XzSx5KwNPszl9KfONjXwUYPzmfBR7XaxY HhqNFXOZw7tvVXO8VGAQftf5Hsf7RQALmiWW1tVuuH4Y3UquxWML/X2fVABEqxwr7aD0 4TcWXSTxOyJwb3Ig0PackaNd0t6LxFMh+VhrfwuMdHY2UzOGJCGDu2Xq0ewQVzOsrvQa kCX2cGngDuuP3nkfSdDMFtwYf3PWEPlXGtcKRBR7U8CiCsffSI1Llc7wBSfHl8sGkiDT ZfaVyFioVZKCzqVywfKIzeVyUV+RTE9I3ZB2yEgnef36WNi8dsYQ6OrBtnYEBBxqsw8i +Pvg== X-Gm-Message-State: AOJu0YxRk3SZESpVlFclUQJEjmGHd9schYdpyJD/PtXmQZgYDyQ+wf55 mOlYUZu9BcNqYi7P0fYYYWybUxkggHj7rKkcCmM= X-Google-Smtp-Source: AGHT+IFYQL5dX/jatsRX2qwyIddZfOKKjqBlQ6yscZJy+wK1Oy4GUUPv8vYIeu0Mlf1TtaBS5oG0ukH2Z7eGj6UndOw= X-Received: by 2002:a05:6870:9f8b:b0:1f0:d96:8d9c with SMTP id xm11-20020a0568709f8b00b001f00d968d9cmr16594566oab.9.1701098815176; Mon, 27 Nov 2023 07:26:55 -0800 (PST) MIME-Version: 1.0 References: <3853387.1701096982@sss.pgh.pa.us> In-Reply-To: <3853387.1701096982@sss.pgh.pa.us> From: Dominique Devienne Date: Mon, 27 Nov 2023 16:26:43 +0100 Message-ID: Subject: Re: Emitting JSON to file using COPY TO To: Tom Lane Cc: "David G. Johnston" , Adrian Klaver , Davin Shearer , "pgsql-general@lists.postgresql.org" Content-Type: multipart/alternative; boundary="0000000000009b45e0060b23eee4" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000009b45e0060b23eee4 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Mon, Nov 27, 2023 at 3:56=E2=80=AFPM Tom Lane wrote: > "David G. Johnston" writes: > > I agree there should be a copy option for =E2=80=9Cnot formatted=E2=80= =9D so if you dump > a > > single column result in that format you get the raw unescaped contents = of > > the column. > > I'm not sure I even buy that. JSON data in particular is typically > multi-line, so how will you know where the row boundaries are? > That is, is a newline a row separator or part of the data? > > You can debate the intelligence of any particular quoting/escaping > scheme, but imagining that you can get away without having one at > all will just create its own problems. > What I was suggesting is not about a "not formatted" option. But rather than JSON values (i.e. typed `json` or `jsonb`) in a JSON-formatted COPY operator, the JSON values should not be serialized to text that is simply output as a JSON-text-value by COPY, but "inlined" as a "real" JSON value without the JSON document output by COPY. This is a special case, where the inner and outer "values" (for lack of a better terminology) are *both* JSON documents, and given that JSON is hierarchical, the inner JSON value can either by 1) serializing to text first, which must thus be escaped using the JSON escaping rules, 2) NOT serialized, but "inline" or "spliced-in" the outer COPY JSON document. I guess COPY in JSON mode supports only #1 now? While #2 makes more sense to me. But both options are valid. Is that clearer? BTW, JSON is not multi-line, except for insignificant whitespace. So even COPY in JSON mode is not supposed to be line based I guess? Unless COPY in JSON mode is more like NDJSON (https://ndjson.org/)? --DD --0000000000009b45e0060b23eee4 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Mon, Nov 27, 2023 at 3:56=E2=80=AFPM T= om Lane <tgl@sss.pgh.pa.us> = wrote:
"David G. Johnston" <david.g.johnston@gmail.com>= writes:
> I agree there should be a copy option for =E2=80=9Cnot formatted=E2=80= =9D so if you dump a
> single column result in that format you get the raw unescaped contents= of
> the column.

I'm not sure I even buy that.=C2=A0 JSON data in particular is typicall= y
multi-line, so how will you know where the row boundaries are?
That is, is a newline a row separator or part of the data?

You can debate the intelligence of any particular quoting/escaping
scheme, but imagining that you can get away without having one at
all will just create its own problems.

= What I was suggesting is not about a "not formatted" option.
But rather than JSON values (i.e. typed `json` or `jsonb`) in a
=
JSON-formatted COPY operator, the JSON values should not be
= serialized to text that is simply output as a JSON-text-value by COPY,
but "inlined" as a "real" JSON value without the = JSON document output by COPY.

This is a special ca= se, where the inner and outer "values" (for lack of a better term= inology)
are *both* JSON documents, and given that JSON is hierar= chical, the inner JSON value can
either by 1) serializing to text= first, which must thus be escaped using the JSON escaping rules,
2) NOT serialized, but "inline" or "spliced-in" the ou= ter COPY JSON document.

I guess COPY in JSON mode = supports only #1 now? While #2 makes more sense to me.
But both o= ptions are valid. Is that clearer?

BTW, JSON is no= t multi-line, except for insignificant whitespace.
So even COPY i= n JSON mode is not supposed to be line based I guess?
Unless COPY= in JSON mode is more like NDJSON (https://= ndjson.org/)? --DD
--0000000000009b45e0060b23eee4--