public inbox for [email protected]  
help / color / mirror / Atom feed
From: Andrew Dunstan <[email protected]>
To: jian he <[email protected]>
To: Junwang Zhao <[email protected]>
Cc: Florents Tselai <[email protected]>
Cc: Joe Conway <[email protected]>
Cc: Andrey M. Borodin <[email protected]>
Cc: Dean Rasheed <[email protected]>
Cc: Daniel Verite <[email protected]>
Cc: Davin Shearer <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Subject: Re: Emitting JSON to file using COPY TO
Date: Wed, 4 Mar 2026 10:51:14 -0500
Message-ID: <[email protected]> (raw)
In-Reply-To: <CACJufxFLpMsbW41T65xJsw925MSE1bvOr6X+h_7sw8_qmDpTpA@mail.gmail.com>
References: <[email protected]>
	<CACJufxG_VY-Hv6ss8pqDxP8SaBmBPYUdv=LB3QFpsqtTZu9FYQ@mail.gmail.com>
	<CAEG8a3J=Wb0dv4uwhhMXQBNPfZ63_ZaSdko8-chK-tc-wSAwog@mail.gmail.com>
	<CACJufxGmSw5GUOquAT+q7B0k+xweU3NwcNc53fLjHQjsMeanaw@mail.gmail.com>
	<CACJufxGK1WiwN48Srr6Si_XY4kz7SmDKBGxgyR3EE2WoeXJgFA@mail.gmail.com>
	<CACJufxGF0F8ooTCCaY=PLhb-K84NOz=SYGjYF8ZrbgO8C0iFqQ@mail.gmail.com>
	<CACJufxGbiNtsHnn=ZGw2J4VcjD5za5znbcxsaEvVd7vMdxQQ+g@mail.gmail.com>
	<CACJufxG7T2_LF=UuJ4iFYMJZsK37L6d3XJvBrcR-3Pp3z+BiGw@mail.gmail.com>
	<CACJufxG0=CoG64Ng7DfMP4zy0MORhTW7rRioQnNVCw-6GxRaXg@mail.gmail.com>
	<CACJufxGq6aJszBhfrUQKHeKmp0r+ka7=SwHSY7Qs_6LQMwiEbw@mail.gmail.com>
	<CA+v5N43-F9-Wiktg_-+aqKJz+YiCNJMAh5ootDeTKkOJ=kiaHA@mail.gmail.com>
	<CACJufxEjZwrocCpt29xtmJTwhWZUu1Nt0GfHFarNGXUS5AAecw@mail.gmail.com>
	<CAEG8a3LB7q1eQ0AzFTEBDFDxs3kJ=5iJA+HTmiYGya6Wb5jsRA@mail.gmail.com>
	<CACJufxFLpMsbW41T65xJsw925MSE1bvOr6X+h_7sw8_qmDpTpA@mail.gmail.com>


On 2026-02-08 Su 10:48 PM, jian he wrote:
> On Sat, Feb 7, 2026 at 9:27 PM Junwang Zhao<[email protected]> wrote:
>> Here are some comments on v23:
>>
>> 0001: The refactor looks straightforward to me. Introducing a format
>> field should make future extensions easier. One suggestion is that we
>> could add some helper macros around format, for example:
>>
>> #define IS_FORMAT_CSV(format) (format == COPY_FORMAT_CSV)
>> #define IS_FORMAT_TEXT_LIKE(format) \
>>      (format == COPY_FORMAT_TEXT || format == COPY_FORMAT_CSV)
>>
>> I think this would improve readability.
> Personally, I don't like marcos....
>
>> 0002: Since you have moved the `CopyFormat enum` into 0001, the
>> following commit msg should be rephrased.
>>
>> The CopyFormat enum was originally contributed by Joel Jacobson
>> [email protected], later refactored by Jian He to address various issues, and
>> further adapted by Junwang Zhao to support the newly introduced CopyToRoutine
>> struct (commit 2e4127b6d2).
>>
>> - if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
>> - ereport(ERROR,
>> - (errcode(ERRCODE_SYNTAX_ERROR),
>> - /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
>> - errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
>> + if (opts_out->delim)
>> + {
>> + if (opts_out->format == COPY_FORMAT_BINARY)
>> + ereport(ERROR,
>> + errcode(ERRCODE_SYNTAX_ERROR),
>> + /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
>> + errmsg("cannot specify %s in BINARY mode", "DELIMITER"));
>> + else if (opts_out->format == COPY_FORMAT_JSON)
>> + ereport(ERROR,
>> + errcode(ERRCODE_SYNTAX_ERROR),
>> + errmsg("cannot specify %s in JSON mode", "DELIMITER"));
>> + }
>>
>> Can we add a function that converts CopyFormat to a string? Treating
>> CopyFormat as %s in error messages would make the code shorter.
>> However, I'm not sure whether this aligns with translation
>> conventions, correct me if I'm wrong.
>>
> I don’t think this is worth the added complexity.
> That said, I tried to simplify the code and changed it to:
>
>      if (opts_out->delim &&
>          (opts_out->format == COPY_FORMAT_BINARY ||
>           opts_out->format == COPY_FORMAT_JSON))
>          ereport(ERROR,
>                  errcode(ERRCODE_SYNTAX_ERROR),
>                  opts_out->format == COPY_FORMAT_BINARY
>                  ? errmsg("cannot specify %s in BINARY mode", "DELIMITER")
>                  : errmsg("cannot specify %s in JSON mode", "DELIMITER"));
>
>> - * CSV and text formats share the same TextLike routines except for the
>> + * CSV and text, json formats share the same TextLike routines except for the
>>
>> I'd suggest rewording to `CSV, text and json ...`. The same applied to
>> other parts in this patch.
>>
> sure.
>
>> 0003: The commit message includes some changes(adapt the newly
>> introduced CopyToRoutine) that actually belong to 0002; it would be
>> better to remove them from this commit.
>>
> 0002 commit message:
> """
> This introduces the JSON format option for the COPY TO command, allowing users
> to export query results or table data directly as a single JSON object or a
> stream of JSON objects.
>
> The JSON format is currently supported only for COPY TO operations; it
> is not available for COPY FROM.
>
> JSON format is incompatible with some standard text/CSV parsing or
> formatting options,
> including:
> - HEADER
> - DEFAULT
> - NULL
> - DELIMITER
> - FORCE QUOTE / FORCE NOT NULL
>
> Regression tests covering valid JSON exports and error handling for
> incompatible options have been added to src/test/regress/sql/copy.sql.
> """
>
> 0003 commit message:
> """
> Add option force_array for COPY JSON FORMAT
> This adds the force_array option, which is available exclusively
> when using COPY TO with the JSON format.
>
> When enabled, this option wraps the output in a top-level JSON array
> (enclosed in square brackets with comma-separated elements), making the
> entire result a valid single JSON value. Without this option, the default
> behavior is to output a stream of independent JSON objects.
>
> Attempting to use this option with COPY FROM or with formats other than
> JSON will raise an error.
> """
>
>> + if (cstate->json_row_delim_needed && cstate->opts.force_array)
>> + CopySendChar(cstate, ',');
>> + else if (cstate->opts.force_array)
>> + {
>> + /* first row needs no delimiter */
>> + CopySendChar(cstate, ' ');
>> + cstate->json_row_delim_needed = true;
>> + }
>>
>> can we do this:
>>
>> if (cstate->opts.force_array)
>> {
>>      if (cstate->json_row_delim_needed)
>>          CopySendChar(cstate, ',');
>>      else
>>      {
>>          /* first row needs no delimiter */
>>          CopySendChar(cstate, ' ');
>>          cstate->json_row_delim_needed = true;
>>      }
>> }
>>
> good suggestion.
>
> If more people think WRAP_ARRAY is better than FORCE_ARRAY, we can
> switch to it accordingly.
> The change itself is quite straightforward.


I have reworked these. First I cleaned up a number of things in patches 
2 and 3 (Thanks, Claude for the summary):

Patch 2: json format for COPY TO

copy.c:
   - "json" → "JSON" in the COPY FROM rejection error message.

   copyto.c:
   1. TupleDesc setup runs once, not every row — Added 
json_tupledesc_ready flag; the 
memcpy/populate_compact_attribute/BlessTupleDesc block is now guarded by 
if (!cstate->json_tupledesc_ready).
   2. Comment rewritten — Old: "the slot's TupleDesc may change during 
query execution". New: explains BlessTupleDesc registers the RECORDOID 
descriptor so lookup_rowtype_tupdesc inside composite_to_json can
   find it.
   3. Eliminated per-row makeStringInfo() — Added StringInfoData 
json_buf to the struct, initialized once in CopyToTextLikeStart in 
copycontext. Each row does resetStringInfo instead of allocating a new
   StringInfo.
   4. Column list rejection added — Error: "column selection is not 
supported in JSON mode" when attnamelist != NIL.
   5. Improved SendCopyBegin comment — Old: "JSON format is always one 
non-binary column". New: explains each CopyData message contains one 
complete JSON object.

   Tests:
   6. Added copy copytest (style) to stdout (format json) to the 
error-case block.
   7. Added copyjsontype table test with json, jsonb columns — verifies 
values are embedded directly, not double-encoded. Covers json objects, 
scalars, arrays, nested objects, and nulls.

   Patch 3: Add option force_array

   copy.c:
   1. "json" → "JSON" in COPY FROM error (carried from patch 2).
   2. "can only used" → "can only be used" — grammar fix in FORCE_ARRAY 
error.

   copyto.c:
   3. Struct fields reorganized — JSON fields grouped under /* JSON 
format state */ comment with inline descriptions, instead of a 
standalone json_row_delim_needed with a vague comment.
   4. Block comment updated — Was "CSV, text and json formats share the 
same TextLike routines except for the one-row callback". Now correctly 
notes JSON has its own one-row and end callbacks.
   5. CopyToTextLikeEnd comment fixed — Was "text, CSV, and json", now 
"text and CSV" (JSON uses CopyToJsonEnd).
   6. Cleaner start callback — FORCE_ARRAY bracket emission moved inside 
the if (format == JSON) block after json_buf init, instead of a separate 
top-level conditional.
   7. Inherits all patch 2 fixes — TupleDesc guard, reusable json_buf, 
improved comments, column list rejection.

   Tests:
   8. --Error → -- should fail: force_array requires json format; --ok → 
-- force_array variants.
   9. copyjsontype test carried through from patch 2.

Then I reworked the way this works. In order to support column lists 
with JSON output, we need to deal with individual columns instead of 
whole records. This involved quite a number of changes, as can be seen 
in patch 4. This involved exporting a new small function from json.c.

The result is a lot cleaner, I believe, and in my benchmarking is faster 
by a factor of almost 2.


cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com


reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Emitting JSON to file using COPY TO
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox