Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rQjxO-003Gih-OL for pgsql-hackers@arkaria.postgresql.org; Fri, 19 Jan 2024 08:10:31 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rQjxM-007uT7-Ss for pgsql-hackers@arkaria.postgresql.org; Fri, 19 Jan 2024 08:10:28 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rQjxM-007uSz-G9 for pgsql-hackers@lists.postgresql.org; Fri, 19 Jan 2024 08:10:28 +0000 Received: from mail-lf1-x134.google.com ([2a00:1450:4864:20::134]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1rQjxJ-002bQz-EE for pgsql-hackers@postgresql.org; Fri, 19 Jan 2024 08:10:27 +0000 Received: by mail-lf1-x134.google.com with SMTP id 2adb3069b0e04-50e7b03fbbeso146285e87.0 for ; Fri, 19 Jan 2024 00:10:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1705651824; x=1706256624; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=vpCXoLY7WZCDOscMgqZLhEmG+TAeNXUpV6SEAJ20mAc=; b=L8XFTF/AhySC0doJMyg16/KywHr3juU04PYD8eA/VlLxdVdGs1R99QkcarV/0uKD69 bkOGYo6eRsSXpGpQSHsXCDlXOoDISgzN7XERvSqEiU69d8wg44CGR0zTVcMsuk7MSgoB VgvNQja+UR25XoaKEDhsDC96Gg06UYZ5AlNtEN9kFLwRudbXsqipnbG/+GtDhYjUO2Wm R6+hOJRxpCk6EDGHzIOzM/Ffax2xBL8+9Dg84rHH231zNn1hJ6iOJ5arhi8A6Gjr8u5g a074oDGsDYw3OaSLob9MjGHlYORv9wabBiZVFMqXv6DvbnrvzDEu0AH3i+5vX5OI6Wts ARMA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705651824; x=1706256624; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=vpCXoLY7WZCDOscMgqZLhEmG+TAeNXUpV6SEAJ20mAc=; b=BIMSQXEFqjKDcfLoC4Jl3CzmAacC1uEzjw6S814JtFPPnra310PAnL9ESw3erSbgFi xzhossZn9dlRLJvcWIctHNOLi8vGqMB5kGBDMHIvLiB5lEyxQkjTu/xPTmkr0IYGdiD7 65Uia7ANXz0y6OS762Koy3C3E1chFR4PBemrvz9CdKsuAa4nFiqlR49OJWVNXxt16xh5 7oMR+EUAAmgA4YIqYpjD/ZfLyb5P0deqsc9tnexRd1FQFZbX3Y/6wANDaWQdZokGgei9 sURpYJ6g/JNgx6vIZKaXxLR3FT7SSDt1a41tuI9dba/nxS0HOhoDsUSxzjlddiHOFOWv Hc2Q== X-Gm-Message-State: AOJu0YzrZVfTouO+MjFjzKnLuqfFscK384qK5zhGo+ZzSMRvgJS/kZpl qXz4v4qJvl0x1CLiarf+D9yBvYzYFBYY7DiUvJWRQ717OGJpCLvB8mWVKl91UsmdSRlxSb32PYe u68zRKuybnvbgEVdtEG/RHw1H96Y= X-Google-Smtp-Source: AGHT+IE6GwLU5Pcn0QXXGW7EsBy8co6/uWIMyfnBuJ0qmEl5UCKWGZp/RXNg5gREU3/nZjmkgxBDS8V43uPQqdkkhIs= X-Received: by 2002:a19:4f44:0:b0:50e:7e55:be92 with SMTP id a4-20020a194f44000000b0050e7e55be92mr2449451lfk.2.1705651823917; Fri, 19 Jan 2024 00:10:23 -0800 (PST) MIME-Version: 1.0 References: <8620df11-96e4-4ca3-8f3c-33a479260961@joeconway.com> <4162f7f7-6fd6-4720-98e2-89f80e3de2ed@joeconway.com> In-Reply-To: <4162f7f7-6fd6-4720-98e2-89f80e3de2ed@joeconway.com> From: Masahiko Sawada Date: Fri, 19 Jan 2024 17:09:47 +0900 Message-ID: Subject: Re: Emitting JSON to file using COPY TO To: Joe Conway Cc: Daniel Verite , Andrew Dunstan , Davin Shearer , PostgreSQL-development Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, Dec 7, 2023 at 10:10=E2=80=AFAM Joe Conway wro= te: > > On 12/6/23 18:09, Joe Conway wrote: > > On 12/6/23 14:47, Joe Conway wrote: > >> On 12/6/23 13:59, Daniel Verite wrote: > >>> Andrew Dunstan wrote: > >>> > >>>> IMNSHO, we should produce either a single JSON > >>>> document (the ARRAY case) or a series of JSON documents, one per row > >>>> (the LINES case). > >>> > >>> "COPY Operations" in the doc says: > >>> > >>> " The backend sends a CopyOutResponse message to the frontend, follow= ed > >>> by zero or more CopyData messages (always one per row), followed = by > >>> CopyDone". > >>> > >>> In the ARRAY case, the first messages with the copyjsontest > >>> regression test look like this (tshark output): > >>> > >>> PostgreSQL > >>> Type: CopyOut response > >>> Length: 13 > >>> Format: Text (0) > >>> Columns: 3 > >>> Format: Text (0) > >>> PostgreSQL > >>> Type: Copy data > >>> Length: 6 > >>> Copy data: 5b0a > >>> PostgreSQL > >>> Type: Copy data > >>> Length: 76 > >>> Copy data: > >>> 207b226964223a312c226631223a226c696e652077697468205c2220696e2069743a2= 031=E2=80=A6 > >>> > >>> The first Copy data message with contents "5b0a" does not qualify > >>> as a row of data with 3 columns as advertised in the CopyOut > >>> message. Isn't that a problem? > >> > >> > >> Is it a real problem, or just a bit of documentation change that I mis= sed? > >> > >> Anything receiving this and looking for a json array should know how t= o > >> assemble the data correctly despite the extra CopyData messages. > > > > Hmm, maybe the real problem here is that Columns do not equal "3" for > > the json mode case -- that should really say "1" I think, because the > > row is not represented as 3 columns but rather 1 json object. > > > > Does that sound correct? > > > > Assuming yes, there is still maybe an issue that there are two more > > "rows" that actual output rows (the "[" and the "]"), but maybe those > > are less likely to cause some hazard? > > > The attached should fix the CopyOut response to say one column. I.e. it > ought to look something like: > > PostgreSQL > Type: CopyOut response > Length: 13 > Format: Text (0) > Columns: 1 > Format: Text (0) > PostgreSQL > Type: Copy data > Length: 6 > Copy data: 5b0a > PostgreSQL > Type: Copy data > Length: 76 > Copy data: [...] > If I'm not missing, copyto_json.007.diff is the latest patch but it needs to be rebased to the current HEAD. Here are random comments: --- if (opts_out->json_mode) + { + if (is_from) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("cannot use JSON mode in COPY FROM"))); + } + else if (opts_out->force_array) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("COPY FORCE_ARRAY requires JSON mode"))); I think that flatting these two condition make the code more readable: if (opts_out->json_mode && is_from) ereport(ERROR, ...); if (!opts_out->json_mode && opts_out->force_array) ereport(ERROR, ...); Also these checks can be moved close to other checks at the end of ProcessCopyOptions(). --- @@ -3395,6 +3395,10 @@ copy_opt_item: { $$ =3D makeDefElem("format", (Node *) makeString("csv")= , @1); } + | JSON + { + $$ =3D makeDefElem("format", (Node *) makeString("json"= ), @1); + } | HEADER_P { $$ =3D makeDefElem("header", (Node *) makeBoolean(true)= , @1); @@ -3427,6 +3431,10 @@ copy_opt_item: { $$ =3D makeDefElem("encoding", (Node *) makeString($2),= @1); } + | FORCE ARRAY + { + $$ =3D makeDefElem("force_array", (Node *) makeBoolean(true), @1); + } ; I believe we don't need to support new options in old-style syntax. --- @@ -3469,6 +3477,10 @@ copy_generic_opt_elem: { $$ =3D makeDefElem($1, $2, @1); } + | FORMAT_LA copy_generic_opt_arg + { + $$ =3D makeDefElem("format", $2, @1); + } ; I think it's not necessary. "format" option is already handled in copy_generic_opt_elem. --- +/* need delimiter to start next json array element */ +static bool json_row_delim_needed =3D false; I think it's cleaner to include json_row_delim_needed into CopyToStateData. --- Splitting the patch into two patches: add json format and add force_array option would make reviews easy. Regards, --=20 Masahiko Sawada Amazon Web Services: https://aws.amazon.com