public inbox for [email protected]
help / color / mirror / Atom feedFrom: Andrew Dunstan <[email protected]>
To: jian he <[email protected]>
To: Junwang Zhao <[email protected]>
Cc: Florents Tselai <[email protected]>
Cc: Joe Conway <[email protected]>
Cc: Andrey M. Borodin <[email protected]>
Cc: Dean Rasheed <[email protected]>
Cc: Daniel Verite <[email protected]>
Cc: Davin Shearer <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Subject: Re: Emitting JSON to file using COPY TO
Date: Wed, 4 Mar 2026 10:51:14 -0500
Message-ID: <[email protected]> (raw)
In-Reply-To: <CACJufxFLpMsbW41T65xJsw925MSE1bvOr6X+h_7sw8_qmDpTpA@mail.gmail.com>
References: <[email protected]>
<CACJufxG_VY-Hv6ss8pqDxP8SaBmBPYUdv=LB3QFpsqtTZu9FYQ@mail.gmail.com>
<CAEG8a3J=Wb0dv4uwhhMXQBNPfZ63_ZaSdko8-chK-tc-wSAwog@mail.gmail.com>
<CACJufxGmSw5GUOquAT+q7B0k+xweU3NwcNc53fLjHQjsMeanaw@mail.gmail.com>
<CACJufxGK1WiwN48Srr6Si_XY4kz7SmDKBGxgyR3EE2WoeXJgFA@mail.gmail.com>
<CACJufxGF0F8ooTCCaY=PLhb-K84NOz=SYGjYF8ZrbgO8C0iFqQ@mail.gmail.com>
<CACJufxGbiNtsHnn=ZGw2J4VcjD5za5znbcxsaEvVd7vMdxQQ+g@mail.gmail.com>
<CACJufxG7T2_LF=UuJ4iFYMJZsK37L6d3XJvBrcR-3Pp3z+BiGw@mail.gmail.com>
<CACJufxG0=CoG64Ng7DfMP4zy0MORhTW7rRioQnNVCw-6GxRaXg@mail.gmail.com>
<CACJufxGq6aJszBhfrUQKHeKmp0r+ka7=SwHSY7Qs_6LQMwiEbw@mail.gmail.com>
<CA+v5N43-F9-Wiktg_-+aqKJz+YiCNJMAh5ootDeTKkOJ=kiaHA@mail.gmail.com>
<CACJufxEjZwrocCpt29xtmJTwhWZUu1Nt0GfHFarNGXUS5AAecw@mail.gmail.com>
<CAEG8a3LB7q1eQ0AzFTEBDFDxs3kJ=5iJA+HTmiYGya6Wb5jsRA@mail.gmail.com>
<CACJufxFLpMsbW41T65xJsw925MSE1bvOr6X+h_7sw8_qmDpTpA@mail.gmail.com>
On 2026-02-08 Su 10:48 PM, jian he wrote:
> On Sat, Feb 7, 2026 at 9:27 PM Junwang Zhao<[email protected]> wrote:
>> Here are some comments on v23:
>>
>> 0001: The refactor looks straightforward to me. Introducing a format
>> field should make future extensions easier. One suggestion is that we
>> could add some helper macros around format, for example:
>>
>> #define IS_FORMAT_CSV(format) (format == COPY_FORMAT_CSV)
>> #define IS_FORMAT_TEXT_LIKE(format) \
>> (format == COPY_FORMAT_TEXT || format == COPY_FORMAT_CSV)
>>
>> I think this would improve readability.
> Personally, I don't like marcos....
>
>> 0002: Since you have moved the `CopyFormat enum` into 0001, the
>> following commit msg should be rephrased.
>>
>> The CopyFormat enum was originally contributed by Joel Jacobson
>> [email protected], later refactored by Jian He to address various issues, and
>> further adapted by Junwang Zhao to support the newly introduced CopyToRoutine
>> struct (commit 2e4127b6d2).
>>
>> - if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
>> - ereport(ERROR,
>> - (errcode(ERRCODE_SYNTAX_ERROR),
>> - /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
>> - errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
>> + if (opts_out->delim)
>> + {
>> + if (opts_out->format == COPY_FORMAT_BINARY)
>> + ereport(ERROR,
>> + errcode(ERRCODE_SYNTAX_ERROR),
>> + /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
>> + errmsg("cannot specify %s in BINARY mode", "DELIMITER"));
>> + else if (opts_out->format == COPY_FORMAT_JSON)
>> + ereport(ERROR,
>> + errcode(ERRCODE_SYNTAX_ERROR),
>> + errmsg("cannot specify %s in JSON mode", "DELIMITER"));
>> + }
>>
>> Can we add a function that converts CopyFormat to a string? Treating
>> CopyFormat as %s in error messages would make the code shorter.
>> However, I'm not sure whether this aligns with translation
>> conventions, correct me if I'm wrong.
>>
> I don’t think this is worth the added complexity.
> That said, I tried to simplify the code and changed it to:
>
> if (opts_out->delim &&
> (opts_out->format == COPY_FORMAT_BINARY ||
> opts_out->format == COPY_FORMAT_JSON))
> ereport(ERROR,
> errcode(ERRCODE_SYNTAX_ERROR),
> opts_out->format == COPY_FORMAT_BINARY
> ? errmsg("cannot specify %s in BINARY mode", "DELIMITER")
> : errmsg("cannot specify %s in JSON mode", "DELIMITER"));
>
>> - * CSV and text formats share the same TextLike routines except for the
>> + * CSV and text, json formats share the same TextLike routines except for the
>>
>> I'd suggest rewording to `CSV, text and json ...`. The same applied to
>> other parts in this patch.
>>
> sure.
>
>> 0003: The commit message includes some changes(adapt the newly
>> introduced CopyToRoutine) that actually belong to 0002; it would be
>> better to remove them from this commit.
>>
> 0002 commit message:
> """
> This introduces the JSON format option for the COPY TO command, allowing users
> to export query results or table data directly as a single JSON object or a
> stream of JSON objects.
>
> The JSON format is currently supported only for COPY TO operations; it
> is not available for COPY FROM.
>
> JSON format is incompatible with some standard text/CSV parsing or
> formatting options,
> including:
> - HEADER
> - DEFAULT
> - NULL
> - DELIMITER
> - FORCE QUOTE / FORCE NOT NULL
>
> Regression tests covering valid JSON exports and error handling for
> incompatible options have been added to src/test/regress/sql/copy.sql.
> """
>
> 0003 commit message:
> """
> Add option force_array for COPY JSON FORMAT
> This adds the force_array option, which is available exclusively
> when using COPY TO with the JSON format.
>
> When enabled, this option wraps the output in a top-level JSON array
> (enclosed in square brackets with comma-separated elements), making the
> entire result a valid single JSON value. Without this option, the default
> behavior is to output a stream of independent JSON objects.
>
> Attempting to use this option with COPY FROM or with formats other than
> JSON will raise an error.
> """
>
>> + if (cstate->json_row_delim_needed && cstate->opts.force_array)
>> + CopySendChar(cstate, ',');
>> + else if (cstate->opts.force_array)
>> + {
>> + /* first row needs no delimiter */
>> + CopySendChar(cstate, ' ');
>> + cstate->json_row_delim_needed = true;
>> + }
>>
>> can we do this:
>>
>> if (cstate->opts.force_array)
>> {
>> if (cstate->json_row_delim_needed)
>> CopySendChar(cstate, ',');
>> else
>> {
>> /* first row needs no delimiter */
>> CopySendChar(cstate, ' ');
>> cstate->json_row_delim_needed = true;
>> }
>> }
>>
> good suggestion.
>
> If more people think WRAP_ARRAY is better than FORCE_ARRAY, we can
> switch to it accordingly.
> The change itself is quite straightforward.
I have reworked these. First I cleaned up a number of things in patches
2 and 3 (Thanks, Claude for the summary):
Patch 2: json format for COPY TO
copy.c:
- "json" → "JSON" in the COPY FROM rejection error message.
copyto.c:
1. TupleDesc setup runs once, not every row — Added
json_tupledesc_ready flag; the
memcpy/populate_compact_attribute/BlessTupleDesc block is now guarded by
if (!cstate->json_tupledesc_ready).
2. Comment rewritten — Old: "the slot's TupleDesc may change during
query execution". New: explains BlessTupleDesc registers the RECORDOID
descriptor so lookup_rowtype_tupdesc inside composite_to_json can
find it.
3. Eliminated per-row makeStringInfo() — Added StringInfoData
json_buf to the struct, initialized once in CopyToTextLikeStart in
copycontext. Each row does resetStringInfo instead of allocating a new
StringInfo.
4. Column list rejection added — Error: "column selection is not
supported in JSON mode" when attnamelist != NIL.
5. Improved SendCopyBegin comment — Old: "JSON format is always one
non-binary column". New: explains each CopyData message contains one
complete JSON object.
Tests:
6. Added copy copytest (style) to stdout (format json) to the
error-case block.
7. Added copyjsontype table test with json, jsonb columns — verifies
values are embedded directly, not double-encoded. Covers json objects,
scalars, arrays, nested objects, and nulls.
Patch 3: Add option force_array
copy.c:
1. "json" → "JSON" in COPY FROM error (carried from patch 2).
2. "can only used" → "can only be used" — grammar fix in FORCE_ARRAY
error.
copyto.c:
3. Struct fields reorganized — JSON fields grouped under /* JSON
format state */ comment with inline descriptions, instead of a
standalone json_row_delim_needed with a vague comment.
4. Block comment updated — Was "CSV, text and json formats share the
same TextLike routines except for the one-row callback". Now correctly
notes JSON has its own one-row and end callbacks.
5. CopyToTextLikeEnd comment fixed — Was "text, CSV, and json", now
"text and CSV" (JSON uses CopyToJsonEnd).
6. Cleaner start callback — FORCE_ARRAY bracket emission moved inside
the if (format == JSON) block after json_buf init, instead of a separate
top-level conditional.
7. Inherits all patch 2 fixes — TupleDesc guard, reusable json_buf,
improved comments, column list rejection.
Tests:
8. --Error → -- should fail: force_array requires json format; --ok →
-- force_array variants.
9. copyjsontype test carried through from patch 2.
Then I reworked the way this works. In order to support column lists
with JSON output, we need to deal with individual columns instead of
whole records. This involved quite a number of changes, as can be seen
in patch 4. This involved exporting a new small function from json.c.
The result is a lot cleaner, I believe, and in my benchmarking is faster
by a factor of almost 2.
cheers
andrew
--
Andrew Dunstan
EDB:https://www.enterprisedb.com
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Emitting JSON to file using COPY TO
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox