Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1r9GFR-00FgxI-OW for pgsql-hackers@arkaria.postgresql.org; Sat, 02 Dec 2023 03:00:53 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1r9GFQ-00AEiu-AB for pgsql-hackers@arkaria.postgresql.org; Sat, 02 Dec 2023 03:00:52 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1r9GFP-00AEim-Vs for pgsql-hackers@lists.postgresql.org; Sat, 02 Dec 2023 03:00:52 +0000 Received: from mxout1-he-de.apache.org ([95.216.194.37]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1r9GFM-009ejn-MK for pgsql-hackers@postgresql.org; Sat, 02 Dec 2023 03:00:50 +0000 Received: from mail.apache.org (mailgw-he-de.apache.org [IPv6:2a01:4f8:c2c:d4aa::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mxout1-he-de.apache.org (ASF Mail Server at mxout1-he-de.apache.org) with ESMTPS id CB48960BED for ; Sat, 2 Dec 2023 03:00:45 +0000 (UTC) Received: (qmail 2648492 invoked by uid 116); 2 Dec 2023 03:00:45 -0000 Received: from ec2-52-204-25-47.compute-1.amazonaws.com (HELO mailrelay1-ec2-va.apache.org) (52.204.25.47) by apache.org (qpsmtpd/0.94) with ESMTP; Sat, 02 Dec 2023 03:00:45 +0000 Authentication-Results: apache.org; auth=none Received: from mail-oi1-f182.google.com (mail-oi1-f182.google.com [209.85.167.182]) by mailrelay1-ec2-va.apache.org (ASF Mail Server at mailrelay1-ec2-va.apache.org) with ESMTPSA id DEFA33E985 for ; Sat, 2 Dec 2023 03:00:44 +0000 (UTC) Received: by mail-oi1-f182.google.com with SMTP id 5614622812f47-3b8929269a0so962713b6e.1 for ; Fri, 01 Dec 2023 19:00:44 -0800 (PST) X-Gm-Message-State: AOJu0YxxMh5sFhusbcsmzUqfCZo1+MsgW7AsWYYpS0fKWpnE/6YkBh1A Ani9n4SOBrDzlzfk/F1464nl5etiLtma5kYZK9I= X-Google-Smtp-Source: AGHT+IE0BC9HYUl33zFxgDB7LLH3txUxje6vTP7hNSvVeevwi/5E5S0ccxZC/3Ae1ZubF0uOw5SWFZ9UcuUL4iTORgs= X-Received: by 2002:a05:6359:650b:b0:170:17ea:f4e6 with SMTP id sk11-20020a056359650b00b0017017eaf4e6mr362238rwb.51.1701486044305; Fri, 01 Dec 2023 19:00:44 -0800 (PST) MIME-Version: 1.0 References: <3853387.1701096982@sss.pgh.pa.us> <3a98decf-3fe3-4b49-9b68-fda01338872c@sedlakovi.org> <24e3ee88-ec1e-421b-89ae-8a47ee0d2df1@joeconway.com> <20231201230958.GA1786735@nathanxps13> In-Reply-To: <20231201230958.GA1786735@nathanxps13> From: Davin Shearer Date: Fri, 1 Dec 2023 22:00:29 -0500 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: Emitting JSON to file using COPY TO To: PostgreSQL-development Cc: Joe Conway , Nathan Bossart Content-Type: multipart/alternative; boundary="00000000000042df92060b7e17c0" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000042df92060b7e17c0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable I'm really glad to see this taken up as a possible new feature and will definitely use it if it gets released. I'm impressed with how clean, understandable, and approachable the postgres codebase is in general and how easy it is to read and understand this patch. I reviewed the patch (though I didn't build and test the code) and have a concern with adding the '[' at the beginning and ']' at the end of the json output. Those are already added by `json_agg` ( https://www.postgresql.org/docs/current/functions-aggregate.html) as you can see in my initial email. Adding them in the COPY TO may be redundant (e.g., [[{"key":"value"...}....]]). I think COPY TO makes good sense to support, though COPY FROM maybe not so much as JSON isn't necessarily flat and rectangular like CSV. For my use-case, I'm emitting JSON files to Apache NiFi for processing, and NiFi has superior handling of JSON (via JOLT parsers) versus CSV where parsing is generally done with regex. I want to be able to emit JSON using a postgres function and thus COPY TO. Definitely +1 for COPY TO. I don't think COPY FROM will work out well unless the JSON is required to be flat and rectangular. I would vote -1 to leave it out due to the necessary restrictions making it not generally useful. Hope it helps, Davin On Fri, Dec 1, 2023 at 6:10=E2=80=AFPM Nathan Bossart wrote: > On Fri, Dec 01, 2023 at 02:28:55PM -0500, Joe Conway wrote: > > I did a quick PoC patch (attached) -- if there interest and no hard > > objections I would like to get it up to speed for the January commitfes= t. > > Cool. I would expect there to be interest, given all the other JSON > support that has been added thus far. > > I noticed that, with the PoC patch, "json" is the only format that must b= e > quoted. Without quotes, I see a syntax error. I'm assuming there's a > conflict with another json-related rule somewhere in gram.y, but I haven'= t > tracked down exactly which one is causing it. > > > 1. Is supporting JSON array format sufficient, or does it need to suppo= rt > > some other options? How flexible does the support scheme need to be? > > I don't presently have a strong opinion on this one. My instinct would b= e > start with something simple, though. I don't think we offer any special > options for log_destination... > > > 2. This only supports COPY TO and we would undoubtedly want to support > COPY > > FROM for JSON as well, but is that required from the start? > > I would vote for including COPY FROM support from the start. > > > ! if (!cstate->opts.json_mode) > > I think it's unfortunate that this further complicates the branching in > CopyOneRowTo(), but after some quick glances at the code, I'm not sure it= 's > worth refactoring a bunch of stuff to make this nicer. > > -- > Nathan Bossart > Amazon Web Services: https://aws.amazon.com > --00000000000042df92060b7e17c0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
I'm really glad to see this taken up as a possible new= feature and will definitely use=C2=A0it if it gets=C2=A0released.=C2=A0 I&= #39;m impressed with how clean, understandable, and approachable the postgr= es codebase is in general and how easy it=C2=A0is to=C2=A0read and understa= nd this patch.

I reviewed the patch (though I didn't= build and test the code) and have a concern with adding the '[' at= the beginning and ']' at the end of the json output.=C2=A0 Those a= re already added by `json_agg` (https://www.postgresql.org/docs/current/f= unctions-aggregate.html) as you can see in my initial email.=C2=A0 Addi= ng them in the COPY TO may be redundant (e.g., [[{"key":"val= ue"...}....]]).

I think COPY TO makes good se= nse to support, though COPY FROM maybe not so much as JSON isn't necess= arily flat and rectangular like CSV.

For my use-ca= se, I'm emitting JSON files to Apache NiFi for processing, and NiFi has= superior handling of JSON (via JOLT parsers) versus CSV where parsing is g= enerally done with regex.=C2=A0 I want to be able to emit JSON using a post= gres function and thus COPY TO.

Definitely=C2=A0+1= for COPY TO.

I don't think COPY FROM will wor= k out well unless the JSON is required to be flat and rectangular.=C2=A0 I = would vote -1 to leave it out due to the necessary restrictions making it n= ot generally useful.

Hope it helps,
Davi= n

On Fri, Dec 1, 2023 at 6:10=E2=80=AFPM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Fri, Dec 01, 2023 at 02:28:55PM -0500, Joe Conw= ay wrote:
> I did a quick PoC patch (attached) -- if there interest and no hard > objections I would like to get it up to speed for the January commitfe= st.

Cool.=C2=A0 I would expect there to be interest, given all the other JSON support that has been added thus far.

I noticed that, with the PoC patch, "json" is the only format tha= t must be
quoted.=C2=A0 Without quotes, I see a syntax error.=C2=A0 I'm assuming = there's a
conflict with another json-related rule somewhere in gram.y, but I haven= 9;t
tracked down exactly which one is causing it.

> 1. Is supporting JSON array format sufficient, or does it need to supp= ort
> some other options? How flexible does the support scheme need to be?
I don't presently have a strong opinion on this one.=C2=A0 My instinct = would be
start with something simple, though.=C2=A0 I don't think we offer any s= pecial
options for log_destination...

> 2. This only supports COPY TO and we would undoubtedly want to support= COPY
> FROM for JSON as well, but is that required from the start?

I would vote for including COPY FROM support from the start.

> !=C2=A0 =C2=A0 =C2=A0if (!cstate->opts.json_mode)

I think it's unfortunate that this further complicates the branching in=
CopyOneRowTo(), but after some quick glances at the code, I'm not sure = it's
worth refactoring a bunch of stuff to make this nicer.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
--00000000000042df92060b7e17c0--