public inbox for [email protected]
Re: Emitting JSON to file using COPY TO
28+ messages / 8 participants
* Re: Emitting JSON to file using COPY TO
@ 2023-12-06 19:47 Joe Conway <[email protected]>
0 siblings, 1 reply; 28+ messages in thread
From: Joe Conway @ 2023-12-06 19:47 UTC (permalink / raw)
To: Daniel Verite <[email protected]>; Andrew Dunstan <[email protected]>; +Cc: Davin Shearer <[email protected]>; pgsql-hackers
On 12/6/23 13:59, Daniel Verite wrote:
> Andrew Dunstan wrote:
>
>> IMNSHO, we should produce either a single JSON
>> document (the ARRAY case) or a series of JSON documents, one per row
>> (the LINES case).
>
> "COPY Operations" in the doc says:
>
> " The backend sends a CopyOutResponse message to the frontend, followed
> by zero or more CopyData messages (always one per row), followed by
> CopyDone".
>
> In the ARRAY case, the first messages with the copyjsontest
> regression test look like this (tshark output):
>
> PostgreSQL
> Type: CopyOut response
> Length: 13
> Format: Text (0)
> Columns: 3
> Format: Text (0)
> PostgreSQL
> Type: Copy data
> Length: 6
> Copy data: 5b0a
> PostgreSQL
> Type: Copy data
> Length: 76
> Copy data:
> 207b226964223a312c226631223a226c696e652077697468205c2220696e2069743a2031…
>
> The first Copy data message with contents "5b0a" does not qualify
> as a row of data with 3 columns as advertised in the CopyOut
> message. Isn't that a problem?
Is it a real problem, or just a bit of documentation change that I missed?
Anything receiving this and looking for a json array should know how to
assemble the data correctly despite the extra CopyData messages.
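To make that concrete, here is a minimal client-side sketch in Python. It is
not part of the patch. It assumes the client has already collected the raw
CopyData payloads as bytes and that the client encoding is UTF-8; the helper
name assemble_json_array and the sample payloads are hypothetical. The first
payload is the 5b0a ("[" plus newline) message from the trace, and the others
follow the row framing that the array case uses in the regression test.

```python
import json

def assemble_json_array(payloads):
    """Concatenate CopyData payloads and parse the result as one JSON document.

    The receiver does not need to treat each CopyData message as a row:
    the bracket-only messages are just more bytes of the same document.
    """
    return json.loads(b"".join(payloads).decode("utf-8"))

# Hypothetical payloads modeled on the trace: "[\n", one line per row, "]\n".
payloads = [
    bytes.fromhex("5b0a"),
    b' {"id":1,"f1":"line with \\" in it: 1"}\n',
    b',{"id":2,"f1":"line with \' in it: 2"}\n',
    b"]\n",
]
rows = assemble_json_array(payloads)
```

A consumer written this way never looks at the advertised column count, which
is why the extra bracket messages are harmless to it. A client that does
interpret each CopyData message as one row would be the case to worry about.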
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
* Re: Emitting JSON to file using COPY TO
@ 2023-12-06 23:09 Joe Conway <[email protected]>
parent: Joe Conway <[email protected]>
0 siblings, 1 reply; 28+ messages in thread
From: Joe Conway @ 2023-12-06 23:09 UTC (permalink / raw)
To: Daniel Verite <[email protected]>; Andrew Dunstan <[email protected]>; +Cc: Davin Shearer <[email protected]>; pgsql-hackers
On 12/6/23 14:47, Joe Conway wrote:
> On 12/6/23 13:59, Daniel Verite wrote:
>> Andrew Dunstan wrote:
>>
>>> IMNSHO, we should produce either a single JSON
>>> document (the ARRAY case) or a series of JSON documents, one per row
>>> (the LINES case).
>>
>> "COPY Operations" in the doc says:
>>
>> " The backend sends a CopyOutResponse message to the frontend, followed
>> by zero or more CopyData messages (always one per row), followed by
>> CopyDone".
>>
>> In the ARRAY case, the first messages with the copyjsontest
>> regression test look like this (tshark output):
>>
>> PostgreSQL
>> Type: CopyOut response
>> Length: 13
>> Format: Text (0)
>> Columns: 3
>> Format: Text (0)
>> PostgreSQL
>> Type: Copy data
>> Length: 6
>> Copy data: 5b0a
>> PostgreSQL
>> Type: Copy data
>> Length: 76
>> Copy data:
>> 207b226964223a312c226631223a226c696e652077697468205c2220696e2069743a2031…
>>
>> The first Copy data message with contents "5b0a" does not qualify
>> as a row of data with 3 columns as advertised in the CopyOut
>> message. Isn't that a problem?
>
>
> Is it a real problem, or just a bit of documentation change that I missed?
>
> Anything receiving this and looking for a json array should know how to
> assemble the data correctly despite the extra CopyData messages.
Hmm, maybe the real problem here is that Columns do not equal "3" for
the json mode case -- that should really say "1" I think, because the
row is not represented as 3 columns but rather 1 json object.
Does that sound correct?
Assuming yes, there is still maybe an issue that there are two more
"rows" than actual output rows (the "[" and the "]"), but maybe those
are less likely to cause some hazard?
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
* Re: Emitting JSON to file using COPY TO
@ 2023-12-07 01:10 Joe Conway <[email protected]>
parent: Joe Conway <[email protected]>
0 siblings, 2 replies; 28+ messages in thread
From: Joe Conway @ 2023-12-07 01:10 UTC (permalink / raw)
To: Daniel Verite <[email protected]>; Andrew Dunstan <[email protected]>; +Cc: Davin Shearer <[email protected]>; pgsql-hackers
On 12/6/23 18:09, Joe Conway wrote:
> On 12/6/23 14:47, Joe Conway wrote:
>> On 12/6/23 13:59, Daniel Verite wrote:
>>> Andrew Dunstan wrote:
>>>
>>>> IMNSHO, we should produce either a single JSON
>>>> document (the ARRAY case) or a series of JSON documents, one per row
>>>> (the LINES case).
>>>
>>> "COPY Operations" in the doc says:
>>>
>>> " The backend sends a CopyOutResponse message to the frontend, followed
>>> by zero or more CopyData messages (always one per row), followed by
>>> CopyDone".
>>>
>>> In the ARRAY case, the first messages with the copyjsontest
>>> regression test look like this (tshark output):
>>>
>>> PostgreSQL
>>> Type: CopyOut response
>>> Length: 13
>>> Format: Text (0)
>>> Columns: 3
>>> Format: Text (0)
>>> PostgreSQL
>>> Type: Copy data
>>> Length: 6
>>> Copy data: 5b0a
>>> PostgreSQL
>>> Type: Copy data
>>> Length: 76
>>> Copy data:
>>> 207b226964223a312c226631223a226c696e652077697468205c2220696e2069743a2031…
>>>
>>> The first Copy data message with contents "5b0a" does not qualify
>>> as a row of data with 3 columns as advertised in the CopyOut
>>> message. Isn't that a problem?
>>
>>
>> Is it a real problem, or just a bit of documentation change that I missed?
>>
>> Anything receiving this and looking for a json array should know how to
>> assemble the data correctly despite the extra CopyData messages.
>
> Hmm, maybe the real problem here is that Columns do not equal "3" for
> the json mode case -- that should really say "1" I think, because the
> row is not represented as 3 columns but rather 1 json object.
>
> Does that sound correct?
>
> Assuming yes, there is still maybe an issue that there are two more
> "rows" than actual output rows (the "[" and the "]"), but maybe those
> are less likely to cause some hazard?
The attached should fix the CopyOut response to say one column. I.e. it
ought to look something like:
PostgreSQL
Type: CopyOut response
Length: 13
Format: Text (0)
Columns: 1
Format: Text (0)
PostgreSQL
Type: Copy data
Length: 6
Copy data: 5b0a
PostgreSQL
Type: Copy data
Length: 76
Copy data: [...]
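As a cross-check on the numbers in a trace like this, the following Python
sketch (not part of the patch) builds the messages the way the
frontend/backend protocol lays them out. CopyOutResponse is a type byte 'H',
an Int32 length that counts itself but not the type byte, an Int8 overall
format, an Int16 column count, and one Int16 format code per column. CopyData
is a type byte 'd', the same kind of Int32 length, and the payload. The
function names are hypothetical. Under that layout three text columns give
Length 13, as in the original capture, while a single column gives Length 9,
so the Length field in the hand-edited trace above would presumably change
along with the column count.

```python
import struct

def copy_out_response(num_columns, overall_format=0):
    """Encode a CopyOutResponse with the same format code for every column."""
    body = struct.pack("!bh", overall_format, num_columns)
    body += struct.pack("!%dh" % num_columns, *([overall_format] * num_columns))
    # The Int32 length counts itself (4 bytes) plus the body, not the type byte.
    return b"H" + struct.pack("!i", 4 + len(body)) + body

def copy_data(payload):
    """Encode a CopyData message carrying the given payload bytes."""
    return b"d" + struct.pack("!i", 4 + len(payload)) + payload

def message_length(message):
    """Return the Int32 length field of an encoded message."""
    return struct.unpack("!i", message[1:5])[0]

three_columns = copy_out_response(3)          # what the original capture advertised
one_column = copy_out_response(1)             # one text column, as in JSON mode
open_bracket = copy_data(bytes.fromhex("5b0a"))  # the "[" plus newline message
```

The CopyData length of 6 for the 5b0a payload matches both traces, which
suggests only the CopyOutResponse line was edited by hand.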
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
[text/x-patch] copyto_json.007.diff (19.4K, 2-copyto_json.007.diff)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69..8915fb3 100644
*** a/doc/src/sgml/ref/copy.sgml
--- b/doc/src/sgml/ref/copy.sgml
*************** COPY { <replaceable class="parameter">ta
*** 43,48 ****
--- 43,49 ----
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+ FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
</synopsis>
</refsynopsisdiv>
*************** COPY { <replaceable class="parameter">ta
*** 206,214 ****
--- 207,220 ----
Selects the data format to be read or written:
<literal>text</literal>,
<literal>csv</literal> (Comma Separated Values),
+ <literal>json</literal> (JavaScript Object Notation),
or <literal>binary</literal>.
The default is <literal>text</literal>.
</para>
+ <para>
+ The <literal>json</literal> option is allowed only in
+ <command>COPY TO</command>.
+ </para>
</listitem>
</varlistentry>
*************** COPY { <replaceable class="parameter">ta
*** 372,377 ****
--- 378,396 ----
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>FORCE_ARRAY</literal></term>
+ <listitem>
+ <para>
+ Force output of square brackets as array decorations at the beginning
+ and end of output, and commas between the rows. It is allowed only in
+ <command>COPY TO</command>, and only when using
+ <literal>JSON</literal> format. The default is
+ <literal>false</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
<varlistentry>
<term><literal>ENCODING</literal></term>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b..23b570f 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** ProcessCopyOptions(ParseState *pstate,
*** 419,424 ****
--- 419,425 ----
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool force_array_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
*************** ProcessCopyOptions(ParseState *pstate,
*** 443,448 ****
--- 444,451 ----
/* default format */ ;
else if (strcmp(fmt, "csv") == 0)
opts_out->csv_mode = true;
+ else if (strcmp(fmt, "json") == 0)
+ opts_out->json_mode = true;
else if (strcmp(fmt, "binary") == 0)
opts_out->binary = true;
else
*************** ProcessCopyOptions(ParseState *pstate,
*** 540,545 ****
--- 543,555 ----
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "force_array") == 0)
+ {
+ if (force_array_specified)
+ errorConflictingDefElem(defel, pstate);
+ force_array_specified = true;
+ opts_out->force_array = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "convert_selectively") == 0)
{
/*
*************** ProcessCopyOptions(ParseState *pstate,
*** 598,603 ****
--- 608,625 ----
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->json_mode)
+ {
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use JSON mode in COPY FROM")));
+ }
+ else if (opts_out->force_array)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY FORCE_ARRAY requires JSON mode")));
+
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
*************** ProcessCopyOptions(ParseState *pstate,
*** 667,672 ****
--- 689,699 ----
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot specify HEADER in BINARY mode")));
+ if (opts_out->json_mode && opts_out->header_line)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot specify HEADER in JSON mode")));
+
/* Check quote */
if (!opts_out->csv_mode && opts_out->quote != NULL)
ereport(ERROR,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c66a047..e068229 100644
*** a/src/backend/commands/copyto.c
--- b/src/backend/commands/copyto.c
***************
*** 28,33 ****
--- 28,34 ----
#include "executor/execdesc.h"
#include "executor/executor.h"
#include "executor/tuptable.h"
+ #include "funcapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
***************
*** 37,42 ****
--- 38,44 ----
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
#include "tcop/tcopprot.h"
+ #include "utils/json.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/partcache.h"
*************** typedef struct
*** 112,117 ****
--- 114,121 ----
/* NOTE: there's a copy of this in copyfromparse.c */
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
+ /* need delimiter to start next json array element */
+ static bool json_row_delim_needed = false;
/* non-export function prototypes */
static void EndCopy(CopyToState cstate);
*************** SendCopyBegin(CopyToState cstate)
*** 146,154 ****
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
pq_sendbyte(&buf, format); /* overall format */
! pq_sendint16(&buf, natts);
! for (i = 0; i < natts; i++)
! pq_sendint16(&buf, format); /* per-column formats */
pq_endmessage(&buf);
cstate->copy_dest = COPY_FRONTEND;
}
--- 150,169 ----
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
pq_sendbyte(&buf, format); /* overall format */
! if (!cstate->opts.json_mode)
! {
! pq_sendint16(&buf, natts);
! for (i = 0; i < natts; i++)
! pq_sendint16(&buf, format); /* per-column formats */
! }
! else
! {
! /*
! * JSON mode is always one non-binary column
! */
! pq_sendint16(&buf, 1);
! pq_sendint16(&buf, 0);
! }
pq_endmessage(&buf);
cstate->copy_dest = COPY_FRONTEND;
}
*************** DoCopyTo(CopyToState cstate)
*** 759,764 ****
--- 774,781 ----
tupDesc = RelationGetDescr(cstate->rel);
else
tupDesc = cstate->queryDesc->tupDesc;
+ BlessTupleDesc(tupDesc);
+
num_phys_attrs = tupDesc->natts;
cstate->opts.null_print_client = cstate->opts.null_print; /* default */
*************** DoCopyTo(CopyToState cstate)
*** 845,850 ****
--- 862,881 ----
CopySendEndOfRow(cstate);
}
+
+ /*
+ * If JSON has been requested, and FORCE_ARRAY has been specified send
+ * the opening bracket.
+ */
+ if (cstate->opts.json_mode)
+ {
+ if (cstate->opts.force_array)
+ {
+ CopySendChar(cstate, '[');
+ CopySendEndOfRow(cstate);
+ }
+ json_row_delim_needed = false;
+ }
}
if (cstate->rel)
*************** DoCopyTo(CopyToState cstate)
*** 892,897 ****
--- 923,939 ----
CopySendEndOfRow(cstate);
}
+ /*
+ * If JSON has been requested, and FORCE_ARRAY has been specified send the
+ * closing bracket.
+ */
+ if (cstate->opts.json_mode &&
+ cstate->opts.force_array)
+ {
+ CopySendChar(cstate, ']');
+ CopySendEndOfRow(cstate);
+ }
+
MemoryContextDelete(cstate->rowcontext);
if (fe_copy)
*************** DoCopyTo(CopyToState cstate)
*** 906,916 ****
static void
CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
- bool need_delim = false;
- FmgrInfo *out_functions = cstate->out_functions;
MemoryContext oldcontext;
- ListCell *cur;
- char *string;
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
--- 948,954 ----
*************** CopyOneRowTo(CopyToState cstate, TupleTa
*** 921,974 ****
CopySendInt16(cstate, list_length(cstate->attnumlist));
}
! /* Make sure the tuple is fully deconstructed */
! slot_getallattrs(slot);
!
! foreach(cur, cstate->attnumlist)
{
! int attnum = lfirst_int(cur);
! Datum value = slot->tts_values[attnum - 1];
! bool isnull = slot->tts_isnull[attnum - 1];
! if (!cstate->opts.binary)
! {
! if (need_delim)
! CopySendChar(cstate, cstate->opts.delim[0]);
! need_delim = true;
! }
! if (isnull)
! {
! if (!cstate->opts.binary)
! CopySendString(cstate, cstate->opts.null_print_client);
! else
! CopySendInt32(cstate, -1);
! }
! else
{
if (!cstate->opts.binary)
{
! string = OutputFunctionCall(&out_functions[attnum - 1],
! value);
! if (cstate->opts.csv_mode)
! CopyAttributeOutCSV(cstate, string,
! cstate->opts.force_quote_flags[attnum - 1],
! list_length(cstate->attnumlist) == 1);
else
! CopyAttributeOutText(cstate, string);
}
else
{
! bytea *outputbytes;
! outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! value);
! CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! CopySendData(cstate, VARDATA(outputbytes),
! VARSIZE(outputbytes) - VARHDRSZ);
}
}
}
CopySendEndOfRow(cstate);
--- 959,1042 ----
CopySendInt16(cstate, list_length(cstate->attnumlist));
}
! if (!cstate->opts.json_mode)
{
! bool need_delim = false;
! FmgrInfo *out_functions = cstate->out_functions;
! ListCell *cur;
! char *string;
! /* Make sure the tuple is fully deconstructed */
! slot_getallattrs(slot);
! foreach(cur, cstate->attnumlist)
{
+ int attnum = lfirst_int(cur);
+ Datum value = slot->tts_values[attnum - 1];
+ bool isnull = slot->tts_isnull[attnum - 1];
+
if (!cstate->opts.binary)
{
! if (need_delim)
! CopySendChar(cstate, cstate->opts.delim[0]);
! need_delim = true;
! }
!
! if (isnull)
! {
! if (!cstate->opts.binary)
! CopySendString(cstate, cstate->opts.null_print_client);
else
! CopySendInt32(cstate, -1);
}
else
{
! if (!cstate->opts.binary)
! {
! string = OutputFunctionCall(&out_functions[attnum - 1],
! value);
! if (cstate->opts.csv_mode)
! CopyAttributeOutCSV(cstate, string,
! cstate->opts.force_quote_flags[attnum - 1],
! list_length(cstate->attnumlist) == 1);
! else
! CopyAttributeOutText(cstate, string);
! }
! else
! {
! bytea *outputbytes;
! outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! value);
! CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! CopySendData(cstate, VARDATA(outputbytes),
! VARSIZE(outputbytes) - VARHDRSZ);
! }
}
}
}
+ else
+ {
+ Datum rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ StringInfo result;
+
+ result = makeStringInfo();
+ composite_to_json(rowdata, result, false);
+
+ if (json_row_delim_needed &&
+ cstate->opts.force_array)
+ {
+ CopySendChar(cstate, ',');
+ }
+ else if (cstate->opts.force_array)
+ {
+ /* first row needs no delimiter */
+ CopySendChar(cstate, ' ');
+ json_row_delim_needed = true;
+ }
+
+ CopySendData(cstate, result->data, result->len);
+ }
CopySendEndOfRow(cstate);
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac8..e6789d7 100644
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*************** copy_opt_item:
*** 3408,3413 ****
--- 3408,3417 ----
{
$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
}
+ | JSON
+ {
+ $$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ }
| HEADER_P
{
$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
*************** copy_opt_item:
*** 3448,3453 ****
--- 3452,3461 ----
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+ | FORCE ARRAY
+ {
+ $$ = makeDefElem("force_array", (Node *) makeBoolean(true), @1);
+ }
;
/* The following exist for backward compatibility with very old versions */
*************** copy_generic_opt_elem:
*** 3490,3495 ****
--- 3498,3507 ----
{
$$ = makeDefElem($1, $2, @1);
}
+ | FORMAT_LA copy_generic_opt_arg
+ {
+ $$ = makeDefElem("format", $2, @1);
+ }
;
copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 71ae53f..cb4311e 100644
*** a/src/backend/utils/adt/json.c
--- b/src/backend/utils/adt/json.c
*************** typedef struct JsonAggState
*** 83,90 ****
JsonUniqueBuilderState unique_check;
} JsonAggState;
- static void composite_to_json(Datum composite, StringInfo result,
- bool use_line_feeds);
static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
Datum *vals, bool *nulls, int *valcount,
JsonTypeCategory tcategory, Oid outfuncoid,
--- 83,88 ----
*************** array_to_json_internal(Datum array, Stri
*** 490,497 ****
/*
* Turn a composite / record into JSON.
*/
! static void
composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
{
HeapTupleHeader td;
--- 488,496 ----
/*
* Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
*/
! void
composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
{
HeapTupleHeader td;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b..97899b6 100644
*** a/src/include/commands/copy.h
--- b/src/include/commands/copy.h
*************** typedef struct CopyFormatOptions
*** 43,48 ****
--- 43,49 ----
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
bool csv_mode; /* Comma Separated Value format? */
+ bool json_mode; /* JSON format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
*************** typedef struct CopyFormatOptions
*** 61,66 ****
--- 62,68 ----
List *force_null; /* list of column names */
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
+ bool force_array; /* add JSON array decorations */
bool convert_selectively; /* do selective binary conversion? */
List *convert_select; /* list of column names (can be NIL) */
} CopyFormatOptions;
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index f07e82c..badc5a6 100644
*** a/src/include/utils/json.h
--- b/src/include/utils/json.h
***************
*** 17,22 ****
--- 17,24 ----
#include "lib/stringinfo.h"
/* functions in json.c */
+ extern void composite_to_json(Datum composite, StringInfo result,
+ bool use_line_feeds);
extern void escape_json(StringInfo buf, const char *str);
extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
const int *tzp);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index b48365e..31913f6 100644
*** a/src/test/regress/expected/copy.out
--- b/src/test/regress/expected/copy.out
*************** copy copytest3 to stdout csv header;
*** 42,47 ****
--- 42,117 ----
c1,"col with , comma","col with "" quote"
1,a,1
2,b,2
+ --- test copying in JSON mode with various styles
+ copy copytest to stdout json;
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ {"style":"Unix","test":"abc\ndef","filler":2}
+ {"style":"Mac","test":"abc\rdef","filler":3}
+ {"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ copy copytest to stdout (format json);
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ {"style":"Unix","test":"abc\ndef","filler":2}
+ {"style":"Mac","test":"abc\rdef","filler":3}
+ {"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ copy copytest to stdout (format json, force_array);
+ [
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ ,{"style":"Unix","test":"abc\ndef","filler":2}
+ ,{"style":"Mac","test":"abc\rdef","filler":3}
+ ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ ]
+ copy copytest to stdout (format json, force_array true);
+ [
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ ,{"style":"Unix","test":"abc\ndef","filler":2}
+ ,{"style":"Mac","test":"abc\rdef","filler":3}
+ ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ ]
+ copy copytest to stdout (format json, force_array false);
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ {"style":"Unix","test":"abc\ndef","filler":2}
+ {"style":"Mac","test":"abc\rdef","filler":3}
+ {"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ -- Error
+ copy copytest to stdout (format json, header);
+ ERROR: cannot specify HEADER in JSON mode
+ -- embedded escaped characters
+ create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+ insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+ insert into copyjsontest (f1) values
+ (E'aaa\"bbb'::text),
+ (E'aaa\\bbb'::text),
+ (E'aaa\/bbb'::text),
+ (E'aaa\bbbb'::text),
+ (E'aaa\fbbb'::text),
+ (E'aaa\nbbb'::text),
+ (E'aaa\rbbb'::text),
+ (E'aaa\tbbb'::text);
+ copy copyjsontest to stdout json;
+ {"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":1,"f1":"aaa\"bbb","f2":null}
+ {"id":2,"f1":"aaa\\bbb","f2":null}
+ {"id":3,"f1":"aaa/bbb","f2":null}
+ {"id":4,"f1":"aaa\bbbb","f2":null}
+ {"id":5,"f1":"aaa\fbbb","f2":null}
+ {"id":6,"f1":"aaa\nbbb","f2":null}
+ {"id":7,"f1":"aaa\rbbb","f2":null}
+ {"id":8,"f1":"aaa\tbbb","f2":null}
create temp table copytest4 (
c1 int,
"colname with tab: " text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 43d2e90..4b76541 100644
*** a/src/test/regress/sql/copy.sql
--- b/src/test/regress/sql/copy.sql
*************** this is just a line full of junk that wo
*** 54,59 ****
--- 54,101 ----
copy copytest3 to stdout csv header;
+ --- test copying in JSON mode with various styles
+ copy copytest to stdout json;
+
+ copy copytest to stdout (format json);
+
+ copy copytest to stdout (format json, force_array);
+
+ copy copytest to stdout (format json, force_array true);
+
+ copy copytest to stdout (format json, force_array false);
+
+ -- Error
+ copy copytest to stdout (format json, header);
+
+ -- embedded escaped characters
+ create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+
+ insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+
+ insert into copyjsontest (f1) values
+ (E'aaa\"bbb'::text),
+ (E'aaa\\bbb'::text),
+ (E'aaa\/bbb'::text),
+ (E'aaa\bbbb'::text),
+ (E'aaa\fbbb'::text),
+ (E'aaa\nbbb'::text),
+ (E'aaa\rbbb'::text),
+ (E'aaa\tbbb'::text);
+
+ copy copyjsontest to stdout json;
+
create temp table copytest4 (
c1 int,
"colname with tab: " text);
* Re: Emitting JSON to file using COPY TO
@ 2024-01-19 08:09 Masahiko Sawada <[email protected]>
parent: Joe Conway <[email protected]>
1 sibling, 1 reply; 28+ messages in thread
From: Masahiko Sawada @ 2024-01-19 08:09 UTC (permalink / raw)
To: Joe Conway <[email protected]>; +Cc: Daniel Verite <[email protected]>; Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On Thu, Dec 7, 2023 at 10:10 AM Joe Conway <[email protected]> wrote:
>
> On 12/6/23 18:09, Joe Conway wrote:
> > On 12/6/23 14:47, Joe Conway wrote:
> >> On 12/6/23 13:59, Daniel Verite wrote:
> >>> Andrew Dunstan wrote:
> >>>
> >>>> IMNSHO, we should produce either a single JSON
> >>>> document (the ARRAY case) or a series of JSON documents, one per row
> >>>> (the LINES case).
> >>>
> >>> "COPY Operations" in the doc says:
> >>>
> >>> " The backend sends a CopyOutResponse message to the frontend, followed
> >>> by zero or more CopyData messages (always one per row), followed by
> >>> CopyDone".
> >>>
> >>> In the ARRAY case, the first messages with the copyjsontest
> >>> regression test look like this (tshark output):
> >>>
> >>> PostgreSQL
> >>> Type: CopyOut response
> >>> Length: 13
> >>> Format: Text (0)
> >>> Columns: 3
> >>> Format: Text (0)
> >>> PostgreSQL
> >>> Type: Copy data
> >>> Length: 6
> >>> Copy data: 5b0a
> >>> PostgreSQL
> >>> Type: Copy data
> >>> Length: 76
> >>> Copy data:
> >>> 207b226964223a312c226631223a226c696e652077697468205c2220696e2069743a2031…
> >>>
> >>> The first Copy data message with contents "5b0a" does not qualify
> >>> as a row of data with 3 columns as advertised in the CopyOut
> >>> message. Isn't that a problem?
> >>
> >>
> >> Is it a real problem, or just a bit of documentation change that I missed?
> >>
> >> Anything receiving this and looking for a json array should know how to
> >> assemble the data correctly despite the extra CopyData messages.
> >
> > Hmm, maybe the real problem here is that Columns do not equal "3" for
> > the json mode case -- that should really say "1" I think, because the
> > row is not represented as 3 columns but rather 1 json object.
> >
> > Does that sound correct?
> >
> > Assuming yes, there is still maybe an issue that there are two more
> > "rows" than actual output rows (the "[" and the "]"), but maybe those
> > are less likely to cause some hazard?
>
>
> The attached should fix the CopyOut response to say one column. I.e. it
> ought to look something like:
>
> PostgreSQL
> Type: CopyOut response
> Length: 13
> Format: Text (0)
> Columns: 1
> Format: Text (0)
> PostgreSQL
> Type: Copy data
> Length: 6
> Copy data: 5b0a
> PostgreSQL
> Type: Copy data
> Length: 76
> Copy data: [...]
>
If I'm not missing anything, copyto_json.007.diff is the latest patch, but it
needs to be rebased onto the current HEAD. Here are some random comments:
---
if (opts_out->json_mode)
+ {
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use JSON mode in COPY FROM")));
+ }
+ else if (opts_out->force_array)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY FORCE_ARRAY requires JSON mode")));
I think that flattening these two conditions would make the code more readable:
if (opts_out->json_mode && is_from)
ereport(ERROR, ...);
if (!opts_out->json_mode && opts_out->force_array)
ereport(ERROR, ...);
Also these checks can be moved close to other checks at the end of
ProcessCopyOptions().
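As a sanity check that the flattened form rejects exactly the same option
combinations as the nested one, here is a small Python model of both (an
illustration only, not the C code; the function names are hypothetical and
the error strings are the ones from the 007 patch). It compares the two forms
over every combination of the three flags.

```python
from itertools import product

def check_nested(json_mode, is_from, force_array):
    """Mirror of the nested checks in the 007 patch; returns an error or None."""
    if json_mode:
        if is_from:
            return "cannot use JSON mode in COPY FROM"
    elif force_array:
        return "COPY FORCE_ARRAY requires JSON mode"
    return None

def check_flat(json_mode, is_from, force_array):
    """Mirror of the suggested flattened checks."""
    if json_mode and is_from:
        return "cannot use JSON mode in COPY FROM"
    if not json_mode and force_array:
        return "COPY FORCE_ARRAY requires JSON mode"
    return None

# Compare both forms over every combination of the three flags.
combos = list(product([False, True], repeat=3))
agree = all(check_nested(*c) == check_flat(*c) for c in combos)
```

Since the two conditions are mutually exclusive on json_mode, the order of the
flattened checks does not matter, which also makes them easy to move.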
---
@@ -3395,6 +3395,10 @@ copy_opt_item:
{
$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
}
+ | JSON
+ {
+ $$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ }
| HEADER_P
{
$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3427,6 +3431,10 @@ copy_opt_item:
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+ | FORCE ARRAY
+ {
+ $$ = makeDefElem("force_array", (Node *)
makeBoolean(true), @1);
+ }
;
I believe we don't need to support new options in old-style syntax.
---
@@ -3469,6 +3477,10 @@ copy_generic_opt_elem:
{
$$ = makeDefElem($1, $2, @1);
}
+ | FORMAT_LA copy_generic_opt_arg
+ {
+ $$ = makeDefElem("format", $2, @1);
+ }
;
I don't think this is necessary. The "format" option is already handled in
copy_generic_opt_elem.
---
+/* need delimiter to start next json array element */
+static bool json_row_delim_needed = false;
I think it's cleaner to include json_row_delim_needed in CopyToStateData.
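A small Python sketch of why the flag fits better in per-operation state (an
illustration only; the class name JsonArrayWriter is hypothetical and this is
not the patch's C code). It mirrors the array framing the patch emits: an
opening bracket line, a space before the first row, a comma before each later
row, and a closing bracket line. Because the "delimiter needed" flag lives on
the writer object, each COPY starts with a fresh flag and nothing has to
remember to reset a file-level variable.

```python
import json

class JsonArrayWriter:
    """Per-operation state for FORCE_ARRAY-style output (illustrative only)."""

    def __init__(self):
        self.lines = ["["]
        # Analogue of json_row_delim_needed, held per writer, not globally.
        self.row_delim_needed = False

    def add_row(self, row):
        # First row gets a space, every later row gets a leading comma.
        prefix = "," if self.row_delim_needed else " "
        self.lines.append(prefix + json.dumps(row, separators=(",", ":")))
        self.row_delim_needed = True

    def finish(self):
        # Builds a new list, so calling finish() twice gives the same text.
        return "\n".join(self.lines + ["]"]) + "\n"

first = JsonArrayWriter()
first.add_row({"id": 1})
first.add_row({"id": 2})

# A second, independent writer starts with its own fresh flag.
second = JsonArrayWriter()
second.add_row({"id": 3})
```

With a static variable, the second operation would only start correctly
because of the explicit reset in DoCopyTo(); keeping the flag in the state
struct removes that dependency.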
---
Splitting the patch into two patches, one adding the json format and one adding
the force_array option, would make review easier.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
* Re: Emitting JSON to file using COPY TO
@ 2024-01-23 05:31 jian he <[email protected]>
parent: Masahiko Sawada <[email protected]>
0 siblings, 2 replies; 28+ messages in thread
From: jian he @ 2024-01-23 05:31 UTC (permalink / raw)
To: Masahiko Sawada <[email protected]>; +Cc: Joe Conway <[email protected]>; Daniel Verite <[email protected]>; Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On Fri, Jan 19, 2024 at 4:10 PM Masahiko Sawada <[email protected]> wrote:
>
> If I'm not missing anything, copyto_json.007.diff is the latest patch, but it
> needs to be rebased onto the current HEAD. Here are some random comments:
>
Please check the latest version.
> if (opts_out->json_mode)
> + {
> + if (is_from)
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("cannot use JSON mode in COPY FROM")));
> + }
> + else if (opts_out->force_array)
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("COPY FORCE_ARRAY requires JSON mode")));
>
> I think that flattening these two conditions would make the code more readable:
I made it two separate condition checks.
> if (opts_out->json_mode && is_from)
> ereport(ERROR, ...);
>
> if (!opts_out->json_mode && opts_out->force_array)
> ereport(ERROR, ...);
>
> Also these checks can be moved close to other checks at the end of
> ProcessCopyOptions().
>
Yes, done. Please check it.
> @@ -3395,6 +3395,10 @@ copy_opt_item:
> {
> $$ = makeDefElem("format", (Node *) makeString("csv"), @1);
> }
> + | JSON
> + {
> + $$ = makeDefElem("format", (Node *) makeString("json"), @1);
> + }
> | HEADER_P
> {
> $$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
> @@ -3427,6 +3431,10 @@ copy_opt_item:
> {
> $$ = makeDefElem("encoding", (Node *) makeString($2), @1);
> }
> + | FORCE ARRAY
> + {
> + $$ = makeDefElem("force_array", (Node *)
> makeBoolean(true), @1);
> + }
> ;
>
> I believe we don't need to support new options in old-style syntax.
>
> ---
> @@ -3469,6 +3477,10 @@ copy_generic_opt_elem:
> {
> $$ = makeDefElem($1, $2, @1);
> }
> + | FORMAT_LA copy_generic_opt_arg
> + {
> + $$ = makeDefElem("format", $2, @1);
> + }
> ;
>
> I don't think this is necessary. The "format" option is already handled in
> copy_generic_opt_elem.
>
I tested it and found that this part is necessary, because without it a
query using WITH, such as `copy (select 1) to stdout with
(format json, force_array false);`, will fail.
> ---
> +/* need delimiter to start next json array element */
> +static bool json_row_delim_needed = false;
>
> I think it's cleaner to include json_row_delim_needed into CopyToStateData.
Yes, I agree, so I did it.
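To illustrate why keeping the flag in the per-command state is cleaner, here is a minimal standalone model of the array decoration logic. It is a sketch under assumptions, not the backend code: MiniCopyState and the mini_* functions are hypothetical, and a fixed buffer replaces the real output path. Since the flag lives in the state that is initialized for each command, it starts out false every time, whereas a file-level static would need an explicit reset between COPY commands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of the COPY TO state; only the JSON-related bits. */
typedef struct MiniCopyState
{
    bool        force_array;            /* wrap the rows in [ ... ] */
    bool        json_row_delim_needed;  /* at least one row was emitted */
    char        out[512];               /* accumulated output */
    size_t      len;                    /* bytes used in out */
} MiniCopyState;

/* Append a string to the output buffer, keeping it NUL-terminated. */
static void
mini_send(MiniCopyState *st, const char *s)
{
    size_t      n = strlen(s);

    assert(st->len + n < sizeof(st->out));
    memcpy(st->out + st->len, s, n + 1);
    st->len += n;
}

/* Start a command: reset all state, then emit the opening bracket. */
static void
mini_begin(MiniCopyState *st, bool force_array)
{
    memset(st, 0, sizeof(*st));
    st->force_array = force_array;
    if (st->force_array)
        mini_send(st, "[\n");
}

/* Emit one already-serialized JSON row. */
static void
mini_row(MiniCopyState *st, const char *json_row)
{
    if (st->force_array)
    {
        /* a comma precedes every row but the first; a space pads the first */
        mini_send(st, st->json_row_delim_needed ? "," : " ");
        st->json_row_delim_needed = true;
    }
    mini_send(st, json_row);
    mini_send(st, "\n");
}

/* Finish a command: emit the closing bracket if one was opened. */
static void
mini_end(MiniCopyState *st)
{
    if (st->force_array)
        mini_send(st, "]\n");
}
```

Calling mini_begin(), mini_row() once per row, and mini_end() reproduces the layout shown in the regression test output: an opening bracket on its own line, a space-prefixed first row, comma-prefixed later rows, and a closing bracket.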
> ---
> Splitting the patch into two patches: add json format and add
> force_array option would make reviews easy.
>
Done: one patch for the json format, another for the force_array option.
I also made the following case fail:
copy copytest to stdout (format csv, force_array false);
ERROR: specify COPY FORCE_ARRAY is only allowed in JSON mode
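The case above fails even though force_array is false because the check looks at whether the option was specified, not at its value. A standalone sketch of that idea follows; MiniOption, ForceArrayCheck and check_force_array are hypothetical names used only for illustration, and the returned codes stand in for the errors the backend would raise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed option: a name and its boolean value. */
typedef struct MiniOption
{
    const char *name;
    bool        value;
} MiniOption;

/* Hypothetical result codes standing in for backend errors. */
typedef enum ForceArrayCheck
{
    FA_OK,
    FA_CONFLICTING,             /* option given more than once */
    FA_NEEDS_JSON               /* option given outside JSON mode */
} ForceArrayCheck;

/*
 * Scan the option list once, remembering whether force_array appeared at
 * all.  The final check uses "specified", so "force_array false" with a
 * non-JSON format is rejected just like "force_array true".
 */
static ForceArrayCheck
check_force_array(const MiniOption *options, int noptions,
                  bool json_mode, bool *force_array)
{
    bool        specified = false;

    assert(force_array != NULL);
    *force_array = false;       /* default when the option is absent */

    for (int i = 0; i < noptions; i++)
    {
        if (strcmp(options[i].name, "force_array") != 0)
            continue;
        if (specified)
            return FA_CONFLICTING;
        specified = true;
        *force_array = options[i].value;
    }

    if (!json_mode && specified)
        return FA_NEEDS_JSON;

    return FA_OK;
}
```

One consequence of this design is that a check on the value alone becomes redundant: whenever the value is true, the option was necessarily specified.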
When copying a table, we call table_scan_getnextslot, so there is no need
to worry about the TupleDesc.
However, when copying a query's output in json format, we may need to consider it.
cstate->queryDesc->tupDesc is the TupleDesc of the query output, so we can rely on it.
When copying a query result to json, I memcpy the attributes of
cstate->queryDesc->tupDesc into the slot's slot->tts_tupleDescriptor
so that composite_to_json can use cstate->queryDesc->tupDesc to do the work.
I guess this will make it more bullet-proof.
Attachments:
[text/x-patch] v8-0002-Add-force_array-for-COPY-TO-json-fomrat.patch (8.5K, 2-v8-0002-Add-force_array-for-COPY-TO-json-fomrat.patch)
From 214ad534d13730cba13008798c3d70f8b363436f Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Tue, 23 Jan 2024 12:26:43 +0800
Subject: [PATCH v8 2/2] Add force_array for COPY TO json format.
Add opening and closing square brackets around the whole output, and
separate each JSON row with a comma after the first row.
---
doc/src/sgml/ref/copy.sgml | 14 ++++++++++++++
src/backend/commands/copy.c | 17 +++++++++++++++++
src/backend/commands/copyto.c | 30 ++++++++++++++++++++++++++++++
src/backend/parser/gram.y | 4 ++++
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy.out | 24 ++++++++++++++++++++++++
src/test/regress/sql/copy.sql | 10 ++++++++++
7 files changed, 100 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index ccd90b61..d19332ac 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+ FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
ON_ERROR '<replaceable class="parameter">error_action</replaceable>'
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
</synopsis>
@@ -379,6 +380,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>FORCE_ARRAY</literal></term>
+ <listitem>
+ <para>
+ Force output of square brackets as array decorations at the beginning
+ and end of output, and commas between the rows. It is allowed only in
+ <command>COPY TO</command>, and only when using
+ <literal>JSON</literal> format. The default is
+ <literal>false</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>ON_ERROR</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 5d5b733d..e15056e1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -456,6 +456,7 @@ ProcessCopyOptions(ParseState *pstate,
bool freeze_specified = false;
bool header_specified = false;
bool on_error_specified = false;
+ bool force_array_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -610,6 +611,13 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "force_array") == 0)
+ {
+ if (force_array_specified)
+ errorConflictingDefElem(defel, pstate);
+ force_array_specified = true;
+ opts_out->force_array = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "on_error") == 0)
{
if (on_error_specified)
@@ -806,6 +814,15 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot use JSON mode in COPY FROM")));
+ if (!opts_out->json_mode && opts_out->force_array)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY FORCE_ARRAY requires JSON mode")));
+ if (!opts_out->json_mode && force_array_specified)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("specify COPY FORCE_ARRAY is only allowed in JSON mode")));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 4f55d6d5..d9245df0 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -88,6 +88,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ bool json_row_delim_needed; /* need delimiter to start next json array element */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -858,6 +859,16 @@ DoCopyTo(CopyToState cstate)
CopySendEndOfRow(cstate);
}
+
+ /*
+ * If JSON has been requested, and FORCE_ARRAY has been specified send
+ * the opening bracket.
+ */
+ if (cstate->opts.json_mode && cstate->opts.force_array)
+ {
+ CopySendChar(cstate, '[');
+ CopySendEndOfRow(cstate);
+ }
}
if (cstate->rel)
@@ -905,6 +916,15 @@ DoCopyTo(CopyToState cstate)
CopySendEndOfRow(cstate);
}
+ /*
+ * If JSON has been requested, and FORCE_ARRAY has been specified send the
+ * closing bracket.
+ */
+ if (cstate->opts.json_mode && cstate->opts.force_array)
+ {
+ CopySendChar(cstate, ']');
+ CopySendEndOfRow(cstate);
+ }
MemoryContextDelete(cstate->rowcontext);
if (fe_copy)
@@ -1006,6 +1026,16 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
result = makeStringInfo();
composite_to_json(rowdata, result, false);
+ if (cstate->json_row_delim_needed && cstate->opts.force_array)
+ {
+ CopySendChar(cstate, ',');
+ }
+ else if (cstate->opts.force_array)
+ {
+ /* first row needs no delimiter */
+ CopySendChar(cstate, ' ');
+ cstate->json_row_delim_needed = true;
+ }
CopySendData(cstate, result->data, result->len);
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 702f04c3..4e13a0ab 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3468,6 +3468,10 @@ copy_opt_item:
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+ | FORCE ARRAY
+ {
+ $$ = makeDefElem("force_array", (Node *) makeBoolean(true), @1);
+ }
;
/* The following exist for backward compatibility with very old versions */
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f591b613..51656eec 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -72,6 +72,7 @@ typedef struct CopyFormatOptions
List *force_null; /* list of column names */
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
+ bool force_array; /* add JSON array decorations */
bool convert_selectively; /* do selective binary conversion? */
CopyOnErrorChoice on_error; /* what to do when error happened */
List *convert_select; /* list of column names (can be NIL) */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 0c5ade47..1b200b0d 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -59,6 +59,30 @@ ERROR: cannot specify HEADER in JSON mode
-- Error
copy copytest from stdout (format json);
ERROR: cannot use JSON mode in COPY FROM
+--Error
+copy copytest to stdout (format csv, force_array false);
+ERROR: specify COPY FORCE_ARRAY is only allowed in JSON mode
+copy copytest from stdin (format json, force_array true);
+ERROR: cannot use JSON mode in COPY FROM
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 96e4f0b6..a07d27af 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -64,6 +64,16 @@ copy copytest to stdout (format json, header);
-- Error
copy copytest from stdout (format json);
+--Error
+copy copytest to stdout (format csv, force_array false);
+copy copytest from stdin (format json, force_array true);
+
+copy copytest to stdout (format json, force_array);
+
+copy copytest to stdout (format json, force_array true);
+
+copy copytest to stdout (format json, force_array false);
+
-- embedded escaped characters
create temp table copyjsontest (
--
2.34.1
[text/x-patch] v8-0001-Add-another-COPY-fomrat-json.patch (13.7K, 3-v8-0001-Add-another-COPY-fomrat-json.patch)
From 0cd43bfbaeacecaffcd8167d1aab0115aa229847 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Mon, 22 Jan 2024 22:58:37 +0800
Subject: [PATCH v8 1/2] Add another COPY format: json.
This format is only allowed in COPY TO operations.
---
doc/src/sgml/ref/copy.sgml | 5 ++
src/backend/commands/copy.c | 13 ++++
src/backend/commands/copyto.c | 121 +++++++++++++++++++----------
src/backend/parser/gram.y | 8 ++
src/backend/utils/adt/json.c | 5 +-
src/include/commands/copy.h | 1 +
src/include/utils/json.h | 2 +
src/test/regress/expected/copy.out | 54 +++++++++++++
src/test/regress/sql/copy.sql | 39 ++++++++++
9 files changed, 204 insertions(+), 44 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 21a5c4a0..ccd90b61 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -207,9 +207,14 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Selects the data format to be read or written:
<literal>text</literal>,
<literal>csv</literal> (Comma Separated Values),
+ <literal>json</literal> (JavaScript Object Notation),
or <literal>binary</literal>.
The default is <literal>text</literal>.
</para>
+ <para>
+ The <literal>json</literal> option is allowed only in
+ <command>COPY TO</command>.
+ </para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cc0786c6..5d5b733d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -480,6 +480,8 @@ ProcessCopyOptions(ParseState *pstate,
/* default format */ ;
else if (strcmp(fmt, "csv") == 0)
opts_out->csv_mode = true;
+ else if (strcmp(fmt, "json") == 0)
+ opts_out->json_mode = true;
else if (strcmp(fmt, "binary") == 0)
opts_out->binary = true;
else
@@ -716,6 +718,11 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot specify HEADER in BINARY mode")));
+ if (opts_out->json_mode && opts_out->header_line)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot specify HEADER in JSON mode")));
+
/* Check quote */
if (!opts_out->csv_mode && opts_out->quote != NULL)
ereport(ERROR,
@@ -793,6 +800,12 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY FREEZE cannot be used with COPY TO")));
+ /* Check json format */
+ if (opts_out->json_mode && is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use JSON mode in COPY FROM")));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d3dc3fc8..4f55d6d5 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -28,6 +28,7 @@
#include "executor/execdesc.h"
#include "executor/executor.h"
#include "executor/tuptable.h"
+#include "funcapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
@@ -37,6 +38,7 @@
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
#include "tcop/tcopprot.h"
+#include "utils/json.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/partcache.h"
@@ -146,9 +148,20 @@ SendCopyBegin(CopyToState cstate)
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
pq_sendbyte(&buf, format); /* overall format */
- pq_sendint16(&buf, natts);
- for (i = 0; i < natts; i++)
- pq_sendint16(&buf, format); /* per-column formats */
+ if (!cstate->opts.json_mode)
+ {
+ pq_sendint16(&buf, natts);
+ for (i = 0; i < natts; i++)
+ pq_sendint16(&buf, format); /* per-column formats */
+ }
+ else
+ {
+ /*
+ * JSON mode is always one non-binary column
+ */
+ pq_sendint16(&buf, 1);
+ pq_sendint16(&buf, 0);
+ }
pq_endmessage(&buf);
cstate->copy_dest = COPY_FRONTEND;
}
@@ -906,11 +919,7 @@ DoCopyTo(CopyToState cstate)
static void
CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
- bool need_delim = false;
- FmgrInfo *out_functions = cstate->out_functions;
MemoryContext oldcontext;
- ListCell *cur;
- char *string;
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
@@ -921,54 +930,84 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
CopySendInt16(cstate, list_length(cstate->attnumlist));
}
- /* Make sure the tuple is fully deconstructed */
- slot_getallattrs(slot);
-
- foreach(cur, cstate->attnumlist)
+ if (!cstate->opts.json_mode)
{
- int attnum = lfirst_int(cur);
- Datum value = slot->tts_values[attnum - 1];
- bool isnull = slot->tts_isnull[attnum - 1];
+ bool need_delim = false;
+ FmgrInfo *out_functions = cstate->out_functions;
+ ListCell *cur;
+ char *string;
- if (!cstate->opts.binary)
- {
- if (need_delim)
- CopySendChar(cstate, cstate->opts.delim[0]);
- need_delim = true;
- }
+ /* Make sure the tuple is fully deconstructed */
+ slot_getallattrs(slot);
- if (isnull)
- {
- if (!cstate->opts.binary)
- CopySendString(cstate, cstate->opts.null_print_client);
- else
- CopySendInt32(cstate, -1);
- }
- else
+ foreach(cur, cstate->attnumlist)
{
+ int attnum = lfirst_int(cur);
+ Datum value = slot->tts_values[attnum - 1];
+ bool isnull = slot->tts_isnull[attnum - 1];
+
if (!cstate->opts.binary)
{
- string = OutputFunctionCall(&out_functions[attnum - 1],
- value);
- if (cstate->opts.csv_mode)
- CopyAttributeOutCSV(cstate, string,
- cstate->opts.force_quote_flags[attnum - 1],
- list_length(cstate->attnumlist) == 1);
+ if (need_delim)
+ CopySendChar(cstate, cstate->opts.delim[0]);
+ need_delim = true;
+ }
+
+ if (isnull)
+ {
+ if (!cstate->opts.binary)
+ CopySendString(cstate, cstate->opts.null_print_client);
else
- CopyAttributeOutText(cstate, string);
+ CopySendInt32(cstate, -1);
}
else
{
- bytea *outputbytes;
+ if (!cstate->opts.binary)
+ {
+ string = OutputFunctionCall(&out_functions[attnum - 1],
+ value);
+ if (cstate->opts.csv_mode)
+ CopyAttributeOutCSV(cstate, string,
+ cstate->opts.force_quote_flags[attnum - 1],
+ list_length(cstate->attnumlist) == 1);
+ else
+ CopyAttributeOutText(cstate, string);
+ }
+ else
+ {
+ bytea *outputbytes;
- outputbytes = SendFunctionCall(&out_functions[attnum - 1],
- value);
- CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
- CopySendData(cstate, VARDATA(outputbytes),
- VARSIZE(outputbytes) - VARHDRSZ);
+ outputbytes = SendFunctionCall(&out_functions[attnum - 1],
+ value);
+ CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+ CopySendData(cstate, VARDATA(outputbytes),
+ VARSIZE(outputbytes) - VARHDRSZ);
+ }
}
}
}
+ else
+ {
+ Datum rowdata;
+ StringInfo result;
+
+ if(!cstate->rel)
+ {
+ for (int i = 0; i < slot->tts_tupleDescriptor->natts; i++)
+ {
+ /* Flat-copy the attribute array */
+ memcpy(TupleDescAttr(slot->tts_tupleDescriptor, i),
+ TupleDescAttr(cstate->queryDesc->tupDesc, i),
+ 1 * sizeof(FormData_pg_attribute));
+ }
+ BlessTupleDesc(slot->tts_tupleDescriptor);
+ }
+ rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ result = makeStringInfo();
+ composite_to_json(rowdata, result, false);
+
+ CopySendData(cstate, result->data, result->len);
+ }
CopySendEndOfRow(cstate);
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 3460fea5..702f04c3 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3424,6 +3424,10 @@ copy_opt_item:
{
$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
}
+ | JSON
+ {
+ $$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ }
| HEADER_P
{
$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3506,6 +3510,10 @@ copy_generic_opt_elem:
{
$$ = makeDefElem($1, $2, @1);
}
+ | FORMAT_LA copy_generic_opt_arg
+ {
+ $$ = makeDefElem("format", $2, @1);
+ }
;
copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index d719a61f..fabd4e61 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -83,8 +83,6 @@ typedef struct JsonAggState
JsonUniqueBuilderState unique_check;
} JsonAggState;
-static void composite_to_json(Datum composite, StringInfo result,
- bool use_line_feeds);
static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
Datum *vals, bool *nulls, int *valcount,
JsonTypeCategory tcategory, Oid outfuncoid,
@@ -507,8 +505,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
/*
* Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
*/
-static void
+void
composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
{
HeapTupleHeader td;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index b3da3cb0..f591b613 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -53,6 +53,7 @@ typedef struct CopyFormatOptions
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
bool csv_mode; /* Comma Separated Value format? */
+ bool json_mode; /* JSON format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 6d7f1b38..d5631171 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
#include "lib/stringinfo.h"
/* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+ bool use_line_feeds);
extern void escape_json(StringInfo buf, const char *str);
extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
const int *tzp);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index b48365ec..0c5ade47 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -42,6 +42,60 @@ copy copytest3 to stdout csv header;
c1,"col with , comma","col with "" quote"
1,a,1
2,b,2
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- Error
+copy copytest to stdout (format json, header);
+ERROR: cannot specify HEADER in JSON mode
+-- Error
+copy copytest from stdout (format json);
+ERROR: cannot use JSON mode in COPY FROM
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
create temp table copytest4 (
c1 int,
"colname with tab: " text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 43d2e906..96e4f0b6 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -54,6 +54,45 @@ this is just a line full of junk that would error out if parsed
copy copytest3 to stdout csv header;
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+
+copy copytest to stdout (format json);
+
+-- Error
+copy copytest to stdout (format json, header);
+-- Error
+copy copytest from stdout (format json);
+
+
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
create temp table copytest4 (
c1 int,
"colname with tab: " text);
--
2.34.1
* Re: Emitting JSON to file using COPY TO
@ 2024-01-27 05:55 Junwang Zhao <[email protected]>
parent: jian he <[email protected]>
1 sibling, 1 reply; 28+ messages in thread
From: Junwang Zhao @ 2024-01-27 05:55 UTC (permalink / raw)
To: jian he <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; Joe Conway <[email protected]>; Daniel Verite <[email protected]>; Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers; Sutou Kouhei <[email protected]>
Hi hackers,
Kou-san (CCed) has been working on *Make COPY format extendable[1]*, so
I think basing *copy to json* on that work might be the right direction.
I wrote an extension for that purpose; here is the patch set, together
with Kou-san's *extendable copy format* implementation:
0001-0009 are the implementation of the extendable copy format
0010 is the pg_copy_json extension
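As a rough picture of what a format implemented on top of an extendable COPY TO could look like, here is a self-contained sketch. Everything in it is an assumption made for illustration: SketchCopyState, SketchCopyToRoutine and the sketch_* functions are hypothetical and do not reproduce the callback table defined by the patch set. Only the general ideas, a per-format routine plus an opaque pointer for private handler data, come from the patch descriptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified state: an output buffer plus private space. */
typedef struct SketchCopyState
{
    char        buf[256];       /* accumulated output */
    size_t      len;            /* bytes used in buf */
    void       *opaque;         /* private data owned by the format handler */
} SketchCopyState;

/* Hypothetical callback table, one instance per output format. */
typedef struct SketchCopyToRoutine
{
    void        (*start) (SketchCopyState *st);
    void        (*one_row) (SketchCopyState *st, const char *row);
    void        (*end) (SketchCopyState *st);
} SketchCopyToRoutine;

/* Append a string to the buffer, keeping it NUL-terminated. */
static void
sketch_append(SketchCopyState *st, const char *s)
{
    size_t      n = strlen(s);

    assert(st->len + n < sizeof(st->buf));
    memcpy(st->buf + st->len, s, n + 1);
    st->len += n;
}

/* Private state of the hypothetical JSON array handler. */
typedef struct JsonPrivate
{
    int         rows_sent;
} JsonPrivate;

static JsonPrivate json_private;

static void
json_start(SketchCopyState *st)
{
    json_private.rows_sent = 0;
    st->opaque = &json_private; /* stash handler data in the opaque slot */
    sketch_append(st, "[\n");
}

static void
json_one_row(SketchCopyState *st, const char *row)
{
    JsonPrivate *priv = (JsonPrivate *) st->opaque;

    /* a comma precedes every row but the first */
    sketch_append(st, priv->rows_sent++ > 0 ? "," : " ");
    sketch_append(st, row);
    sketch_append(st, "\n");
}

static void
json_end(SketchCopyState *st)
{
    sketch_append(st, "]\n");
}

static const SketchCopyToRoutine sketch_json_routine = {
    json_start, json_one_row, json_end
};

/* Format-agnostic driver: it only knows the callback table. */
static void
sketch_copy_to(const SketchCopyToRoutine *routine, SketchCopyState *st,
               const char **rows, int nrows)
{
    st->len = 0;
    st->buf[0] = '\0';
    st->opaque = NULL;

    routine->start(st);
    for (int i = 0; i < nrows; i++)
        routine->one_row(st, rows[i]);
    routine->end(st);
}
```

The point of the shape is that the driver contains no JSON-specific branches; format-specific state such as the row counter stays behind the opaque pointer.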
I also created a PR[2] in case anybody prefers the GitHub review style.
The *extendable copy format* feature is still being developed; I am posting
this email so that the patch set in this thread is not committed without
awareness of the *extendable copy format* feature.
I'd like to hear your opinions.
[1]: https://www.postgresql.org/message-id/20240124.144936.67229716500876806.kou%40clear-code.com
[2]: https://github.com/zhjwpku/postgres/pull/2/files
--
Regards
Junwang Zhao
Attachments:
[application/octet-stream] v8-0004-Add-support-for-implementing-custom-COPY-TO-forma.patch (3.3K, 2-v8-0004-Add-support-for-implementing-custom-COPY-TO-forma.patch)
From 4b177469f3fb8f14f0cd6bff3c7878dcafd9b760 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <[email protected]>
Date: Tue, 23 Jan 2024 15:12:43 +0900
Subject: [PATCH v8 04/10] Add support for implementing custom COPY TO format
as extension
* Add CopyToStateData::opaque that can be used to keep data for custom
COPY TO format implementation
* Export CopySendEndOfRow() to flush data in CopyToStateData::fe_msgbuf
* Rename CopySendEndOfRow() to CopyToStateFlush() because it's a
method for CopyToState and it's used for flushing. End-of-row related
code was moved to CopyToTextSendEndOfRow().
---
src/backend/commands/copyto.c | 15 +++++++--------
src/include/commands/copyapi.h | 5 +++++
2 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cfc74ee7b1..b5d8678394 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -69,7 +69,6 @@ static void SendCopyEnd(CopyToState cstate);
static void CopySendData(CopyToState cstate, const void *databuf, int datasize);
static void CopySendString(CopyToState cstate, const char *str);
static void CopySendChar(CopyToState cstate, char c);
-static void CopySendEndOfRow(CopyToState cstate);
static void CopySendInt32(CopyToState cstate, int32 val);
static void CopySendInt16(CopyToState cstate, int16 val);
@@ -117,7 +116,7 @@ CopyToTextSendEndOfRow(CopyToState cstate)
default:
break;
}
- CopySendEndOfRow(cstate);
+ CopyToStateFlush(cstate);
}
static void
@@ -302,7 +301,7 @@ CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
}
}
- CopySendEndOfRow(cstate);
+ CopyToStateFlush(cstate);
}
static void
@@ -311,7 +310,7 @@ CopyToBinaryEnd(CopyToState cstate)
/* Generate trailer for a binary copy */
CopySendInt16(cstate, -1);
/* Need to flush out the trailer */
- CopySendEndOfRow(cstate);
+ CopyToStateFlush(cstate);
}
CopyToRoutine CopyToRoutineText = {
@@ -377,8 +376,8 @@ SendCopyEnd(CopyToState cstate)
* CopySendData sends output data to the destination (file or frontend)
* CopySendString does the same for null-terminated strings
* CopySendChar does the same for single characters
- * CopySendEndOfRow does the appropriate thing at end of each data row
- * (data is not actually flushed except by CopySendEndOfRow)
+ * CopyToStateFlush flushes the buffered data
+ * (data is not actually flushed except by CopyToStateFlush)
*
* NB: no data conversion is applied by these functions
*----------
@@ -401,8 +400,8 @@ CopySendChar(CopyToState cstate, char c)
appendStringInfoCharMacro(cstate->fe_msgbuf, c);
}
-static void
-CopySendEndOfRow(CopyToState cstate)
+void
+CopyToStateFlush(CopyToState cstate)
{
StringInfo fe_msgbuf = cstate->fe_msgbuf;
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index a869d78d72..ffad433a21 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -174,6 +174,11 @@ typedef struct CopyToStateData
FmgrInfo *out_functions; /* lookup info for output functions */
MemoryContext rowcontext; /* per-row evaluation context */
uint64 bytes_processed; /* number of bytes processed so far */
+
+ /* For custom format implementation */
+ void *opaque; /* private space */
} CopyToStateData;
+extern void CopyToStateFlush(CopyToState cstate);
+
#endif /* COPYAPI_H */
--
2.41.0
[application/octet-stream] v8-0003-Export-CopyToStateData.patch (13.8K, 3-v8-0003-Export-CopyToStateData.patch)
From c3a59753b1157dc8e47e719263f58677acc33178 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <[email protected]>
Date: Tue, 23 Jan 2024 14:54:10 +0900
Subject: [PATCH v8 03/10] Export CopyToStateData
This is for custom COPY TO format handlers implemented as extensions.
This only moves code; nothing changes except the CopyDest enum
values. CopyDest enum values such as COPY_FILE conflict with the
CopySource enum values defined in copyfrom_internal.h, so the COPY_DEST_
prefix is used instead of the COPY_ prefix. For example, COPY_FILE is
renamed to COPY_DEST_FILE.
Note that this change alone isn't enough to implement a custom COPY TO
format handler as an extension. We'll do the following in a subsequent
commit:
1. Add an opaque space for custom COPY TO format handlers
2. Export CopySendEndOfRow() to flush the buffer
---
src/backend/commands/copyto.c | 74 +++-----------------
src/include/commands/copy.h | 59 ----------------
src/include/commands/copyapi.h | 120 ++++++++++++++++++++++++++++++++-
3 files changed, 127 insertions(+), 126 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 6547b7c654..cfc74ee7b1 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -43,64 +43,6 @@
#include "utils/rel.h"
#include "utils/snapmgr.h"
-/*
- * Represents the different dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
- COPY_FILE, /* to file (or a piped program) */
- COPY_FRONTEND, /* to frontend */
- COPY_CALLBACK, /* to callback function */
-} CopyDest;
-
-/*
- * This struct contains all the state variables used throughout a COPY TO
- * operation.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyToStateData
-{
- /* low-level state data */
- CopyDest copy_dest; /* type of copy source/destination */
- FILE *copy_file; /* used if copy_dest == COPY_FILE */
- StringInfo fe_msgbuf; /* used for all dests during COPY TO */
-
- int file_encoding; /* file or remote side's character encoding */
- bool need_transcoding; /* file encoding diff from server? */
- bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
-
- /* parameters from the COPY command */
- Relation rel; /* relation to copy to */
- QueryDesc *queryDesc; /* executable query to copy from */
- List *attnumlist; /* integer list of attnums to copy */
- char *filename; /* filename, or NULL for STDOUT */
- bool is_program; /* is 'filename' a program to popen? */
- copy_data_dest_cb data_dest_cb; /* function for writing data */
-
- CopyFormatOptions opts;
- Node *whereClause; /* WHERE condition (or NULL) */
-
- /*
- * Working state
- */
- MemoryContext copycontext; /* per-copy execution context */
-
- FmgrInfo *out_functions; /* lookup info for output functions */
- MemoryContext rowcontext; /* per-row evaluation context */
- uint64 bytes_processed; /* number of bytes processed so far */
-} CopyToStateData;
-
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -160,7 +102,7 @@ CopyToTextSendEndOfRow(CopyToState cstate)
{
switch (cstate->copy_dest)
{
- case COPY_FILE:
+ case COPY_DEST_FILE:
/* Default line termination depends on platform */
#ifndef WIN32
CopySendChar(cstate, '\n');
@@ -168,7 +110,7 @@ CopyToTextSendEndOfRow(CopyToState cstate)
CopySendString(cstate, "\r\n");
#endif
break;
- case COPY_FRONTEND:
+ case COPY_DEST_FRONTEND:
/* The FE/BE protocol uses \n as newline for all platforms */
CopySendChar(cstate, '\n');
break;
@@ -419,7 +361,7 @@ SendCopyBegin(CopyToState cstate)
for (i = 0; i < natts; i++)
pq_sendint16(&buf, format); /* per-column formats */
pq_endmessage(&buf);
- cstate->copy_dest = COPY_FRONTEND;
+ cstate->copy_dest = COPY_DEST_FRONTEND;
}
static void
@@ -466,7 +408,7 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
- case COPY_FILE:
+ case COPY_DEST_FILE:
if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
cstate->copy_file) != 1 ||
ferror(cstate->copy_file))
@@ -500,11 +442,11 @@ CopySendEndOfRow(CopyToState cstate)
errmsg("could not write to COPY file: %m")));
}
break;
- case COPY_FRONTEND:
+ case COPY_DEST_FRONTEND:
/* Dump the accumulated row as one CopyData message */
(void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
break;
- case COPY_CALLBACK:
+ case COPY_DEST_CALLBACK:
cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
break;
}
@@ -877,12 +819,12 @@ BeginCopyTo(ParseState *pstate,
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
- cstate->copy_dest = COPY_FILE; /* default */
+ cstate->copy_dest = COPY_DEST_FILE; /* default */
if (data_dest_cb)
{
progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
- cstate->copy_dest = COPY_CALLBACK;
+ cstate->copy_dest = COPY_DEST_CALLBACK;
cstate->data_dest_cb = data_dest_cb;
}
else if (pipe)
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 34bea880ca..b3f4682f95 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -20,69 +20,10 @@
#include "parser/parse_node.h"
#include "tcop/dest.h"
-/*
- * Represents whether a header line should be present, and whether it must
- * match the actual names (which implies "true").
- */
-typedef enum CopyHeaderChoice
-{
- COPY_HEADER_FALSE = 0,
- COPY_HEADER_TRUE,
- COPY_HEADER_MATCH,
-} CopyHeaderChoice;
-
-/*
- * Represents where to save input processing errors. More values to be added
- * in the future.
- */
-typedef enum CopyOnErrorChoice
-{
- COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
- COPY_ON_ERROR_IGNORE, /* ignore errors */
-} CopyOnErrorChoice;
-
-/*
- * A struct to hold COPY options, in a parsed form. All of these are related
- * to formatting, except for 'freeze', which doesn't really belong here, but
- * it's expedient to parse it along with all the other options.
- */
-typedef struct CopyFormatOptions
-{
- /* parameters from the COPY command */
- int file_encoding; /* file or remote side's character encoding,
- * -1 if not specified */
- bool binary; /* binary format? */
- bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
- CopyHeaderChoice header_line; /* header line? */
- char *null_print; /* NULL marker string (server encoding!) */
- int null_print_len; /* length of same */
- char *null_print_client; /* same converted to file encoding */
- char *default_print; /* DEFAULT marker string */
- int default_print_len; /* length of same */
- char *delim; /* column delimiter (must be 1 byte) */
- char *quote; /* CSV quote char (must be 1 byte) */
- char *escape; /* CSV escape char (must be 1 byte) */
- List *force_quote; /* list of column names */
- bool force_quote_all; /* FORCE_QUOTE *? */
- bool *force_quote_flags; /* per-column CSV FQ flags */
- List *force_notnull; /* list of column names */
- bool force_notnull_all; /* FORCE_NOT_NULL *? */
- bool *force_notnull_flags; /* per-column CSV FNN flags */
- List *force_null; /* list of column names */
- bool force_null_all; /* FORCE_NULL *? */
- bool *force_null_flags; /* per-column CSV FN flags */
- bool convert_selectively; /* do selective binary conversion? */
- CopyOnErrorChoice on_error; /* what to do when error happened */
- List *convert_select; /* list of column names (can be NIL) */
- CopyToRoutine *to_routine; /* callback routines for COPY TO */
-} CopyFormatOptions;
-
/* This is private in commands/copyfrom.c */
typedef struct CopyFromStateData *CopyFromState;
typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
-typedef void (*copy_data_dest_cb) (void *data, int len);
extern void DoCopy(ParseState *pstate, const CopyStmt *stmt,
int stmt_location, int stmt_len,
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 9c25e1c415..a869d78d72 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -14,10 +14,10 @@
#ifndef COPYAPI_H
#define COPYAPI_H
+#include "executor/execdesc.h"
#include "executor/tuptable.h"
#include "nodes/parsenodes.h"
-/* This is private in commands/copyto.c */
typedef struct CopyToStateData *CopyToState;
typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel);
@@ -58,4 +58,122 @@ extern CopyToRoutine CopyToRoutineText;
extern CopyToRoutine CopyToRoutineCSV;
extern CopyToRoutine CopyToRoutineBinary;
+/*
+ * Represents whether a header line should be present, and whether it must
+ * match the actual names (which implies "true").
+ */
+typedef enum CopyHeaderChoice
+{
+ COPY_HEADER_FALSE = 0,
+ COPY_HEADER_TRUE,
+ COPY_HEADER_MATCH,
+} CopyHeaderChoice;
+
+/*
+ * Represents where to save input processing errors. More values to be added
+ * in the future.
+ */
+typedef enum CopyOnErrorChoice
+{
+ COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */
+ COPY_ON_ERROR_IGNORE, /* ignore errors */
+} CopyOnErrorChoice;
+
+/*
+ * A struct to hold COPY options, in a parsed form. All of these are related
+ * to formatting, except for 'freeze', which doesn't really belong here, but
+ * it's expedient to parse it along with all the other options.
+ */
+typedef struct CopyFormatOptions
+{
+ /* parameters from the COPY command */
+ int file_encoding; /* file or remote side's character encoding,
+ * -1 if not specified */
+ bool binary; /* binary format? */
+ bool freeze; /* freeze rows on loading? */
+ bool csv_mode; /* Comma Separated Value format? */
+ CopyHeaderChoice header_line; /* header line? */
+ char *null_print; /* NULL marker string (server encoding!) */
+ int null_print_len; /* length of same */
+ char *null_print_client; /* same converted to file encoding */
+ char *default_print; /* DEFAULT marker string */
+ int default_print_len; /* length of same */
+ char *delim; /* column delimiter (must be 1 byte) */
+ char *quote; /* CSV quote char (must be 1 byte) */
+ char *escape; /* CSV escape char (must be 1 byte) */
+ List *force_quote; /* list of column names */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool *force_quote_flags; /* per-column CSV FQ flags */
+ List *force_notnull; /* list of column names */
+ bool force_notnull_all; /* FORCE_NOT_NULL *? */
+ bool *force_notnull_flags; /* per-column CSV FNN flags */
+ List *force_null; /* list of column names */
+ bool force_null_all; /* FORCE_NULL *? */
+ bool *force_null_flags; /* per-column CSV FN flags */
+ bool convert_selectively; /* do selective binary conversion? */
+ CopyOnErrorChoice on_error; /* what to do when error happened */
+ List *convert_select; /* list of column names (can be NIL) */
+ CopyToRoutine *to_routine; /* callback routines for COPY TO */
+} CopyFormatOptions;
+
+/*
+ * Represents the different dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+ COPY_DEST_FILE, /* to file (or a piped program) */
+ COPY_DEST_FRONTEND, /* to frontend */
+ COPY_DEST_CALLBACK, /* to callback function */
+} CopyDest;
+
+typedef void (*copy_data_dest_cb) (void *data, int len);
+
+/*
+ * This struct contains all the state variables used throughout a COPY TO
+ * operation.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
+ * when we have to do it the hard way.
+ */
+typedef struct CopyToStateData
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ FILE *copy_file; /* used if copy_dest == COPY_DEST_FILE */
+ StringInfo fe_msgbuf; /* used for all dests during COPY TO */
+
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ Relation rel; /* relation to copy to */
+ QueryDesc *queryDesc; /* executable query to copy from */
+ List *attnumlist; /* integer list of attnums to copy */
+ char *filename; /* filename, or NULL for STDOUT */
+ bool is_program; /* is 'filename' a program to popen? */
+ copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+ CopyFormatOptions opts;
+ Node *whereClause; /* WHERE condition (or NULL) */
+
+ /*
+ * Working state
+ */
+ MemoryContext copycontext; /* per-copy execution context */
+
+ FmgrInfo *out_functions; /* lookup info for output functions */
+ MemoryContext rowcontext; /* per-row evaluation context */
+ uint64 bytes_processed; /* number of bytes processed so far */
+} CopyToStateData;
+
#endif /* COPYAPI_H */
--
2.41.0
[application/octet-stream] v8-0001-Extract-COPY-TO-format-implementations.patch (23.8K, 4-v8-0001-Extract-COPY-TO-format-implementations.patch)
From 6e68ba6380dc825a242e7f0d0a53442bba3a4a61 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <[email protected]>
Date: Mon, 4 Dec 2023 12:32:54 +0900
Subject: [PATCH v8 01/10] Extract COPY TO format implementations
This is part of making the COPY format extendable. See also these past
discussions:
* New Copy Formats - avro/orc/parquet:
https://www.postgresql.org/message-id/flat/20180210151304.fonjztsynewldfba%40gmail.com
* Make COPY extendable in order to support Parquet and other formats:
https://www.postgresql.org/message-id/flat/CAJ7c6TM6Bz1c3F04Cy6%2BSzuWfKmr0kU8c_3Stnvh_8BR0D6k8Q%40mail.gmail.com
This doesn't change the current behavior. It just introduces
CopyToRoutine, a struct that holds the function pointers of a format
implementation (like TupleTableSlotOps), and uses it for the existing
"text", "csv" and "binary" format implementations.
Note that CopyToRoutine can't be used from extensions yet because
CopySend*() aren't exported yet. Extensions can't send formatted data
to a destination without CopySend*(). They will be exported by
subsequent patches.
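The callback-table idea described above can be sketched outside PostgreSQL as follows. This is only an illustration under stated assumptions: DemoState, DemoRoutine, demo_append() and the "tab" and "framed" formats are hypothetical stand-ins invented for this sketch, not names from the patch, and the real CopyToRoutine also has option-processing and format-code callbacks.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for CopyToState: just an output buffer. */
typedef struct DemoState
{
    char   out[256];
    size_t len;
} DemoState;

/* Hypothetical analogue of CopyToRoutine: one callback per phase. */
typedef struct DemoRoutine
{
    void (*start) (DemoState *st);
    void (*one_row) (DemoState *st, const int *values, int nvalues);
    void (*end) (DemoState *st);
} DemoRoutine;

/* Append a NUL-terminated string to the output buffer. */
static void
demo_append(DemoState *st, const char *s)
{
    size_t n = strlen(s);

    assert(st->len + n < sizeof(st->out));
    memcpy(st->out + st->len, s, n + 1);
    st->len += n;
}

/* Shared row writer: values joined by "sep", terminated by a newline. */
static void
demo_row(DemoState *st, const int *values, int nvalues, const char *sep)
{
    char cell[32];

    for (int i = 0; i < nvalues; i++)
    {
        snprintf(cell, sizeof(cell), "%s%d", i > 0 ? sep : "", values[i]);
        demo_append(st, cell);
    }
    demo_append(st, "\n");
}

/* "tab" format: no header, no trailer. */
static void tab_start(DemoState *st) { (void) st; }
static void tab_one_row(DemoState *st, const int *v, int n) { demo_row(st, v, n, "\t"); }
static void tab_end(DemoState *st) { (void) st; }

/* "framed" format: header and trailer lines around comma-separated rows. */
static void framed_start(DemoState *st) { demo_append(st, "BEGIN\n"); }
static void framed_one_row(DemoState *st, const int *v, int n) { demo_row(st, v, n, ","); }
static void framed_end(DemoState *st) { demo_append(st, "END\n"); }

static const DemoRoutine demo_routine_tab = {tab_start, tab_one_row, tab_end};
static const DemoRoutine demo_routine_framed = {framed_start, framed_one_row, framed_end};

/* The driver never branches on the format; it only calls the callbacks. */
static void
demo_copy_to(const DemoRoutine *routine, DemoState *st,
             const int rows[][2], int nrows)
{
    st->len = 0;
    st->out[0] = '\0';
    routine->start(st);
    for (int i = 0; i < nrows; i++)
        routine->one_row(st, rows[i], 2);
    routine->end(st);
}
```

The point of the design is that the driver has the same shape as DoCopyTo() after this patch: start, one call per row, end, with no "if binary" branches left in the loop.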
Here is a benchmark result with and without this change, because it was
pointed out in a past discussion that we should care about performance
regression:
https://www.postgresql.org/message-id/3741749.1655952719%40sss.pgh.pa.us
> I think that step 1 ought to be to convert the existing formats into
> plug-ins, and demonstrate that there's no significant loss of
> performance.
You can see that there is no significant loss of performance:
Data: Random 32 bit integers:
CREATE TABLE data (int32 integer);
INSERT INTO data
SELECT random() * 10000
FROM generate_series(1, ${n_records});
The number of records: 100K, 1M and 10M
100K without this change:
format,elapsed time (ms)
text,11.002
csv,11.696
binary,11.352
100K with this change:
format,elapsed time (ms)
text,11.562
csv,11.889
binary,10.825
1M without this change:
format,elapsed time (ms)
text,108.359
csv,114.233
binary,111.251
1M with this change:
format,elapsed time (ms)
text,111.269
csv,116.277
binary,104.765
10M without this change:
format,elapsed time (ms)
text,1090.763
csv,1136.103
binary,1137.141
10M with this change:
format,elapsed time (ms)
text,1082.654
csv,1196.991
binary,1069.697
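To make the comparison easier to read, the relative change can be computed from the numbers above. The following is a small sketch, not part of the patch; BenchRow, bench_10m, relative_change_pct() and print_changes() are hypothetical names used only here. For the 10M case it gives roughly -0.7% for text, +5.4% for csv and -5.9% for binary. Each figure appears to come from a single timing, so run-to-run noise is unknown.

```c
#include <assert.h>
#include <stdio.h>

/* One benchmark line: elapsed time without and with the change. */
typedef struct BenchRow
{
    const char *format;
    double      before_ms;
    double      after_ms;
} BenchRow;

/* 10M-record timings copied from the benchmark result above. */
static const BenchRow bench_10m[] = {
    {"text", 1090.763, 1082.654},
    {"csv", 1136.103, 1196.991},
    {"binary", 1137.141, 1069.697},
};

/* Relative change of "after" versus "before", in percent. */
static double
relative_change_pct(double before_ms, double after_ms)
{
    assert(before_ms > 0.0);
    return (after_ms - before_ms) / before_ms * 100.0;
}

/* Print one line per format with the signed percentage change. */
static void
print_changes(const BenchRow *rows, int nrows)
{
    for (int i = 0; i < nrows; i++)
        printf("%-6s %+.2f%%\n", rows[i].format,
               relative_change_pct(rows[i].before_ms, rows[i].after_ms));
}
```

Calling print_changes(bench_10m, 3) prints the three percentages; a positive value means the patched build was slower.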
---
contrib/file_fdw/file_fdw.c | 2 +-
src/backend/commands/copy.c | 43 +++-
src/backend/commands/copyfrom.c | 2 +-
src/backend/commands/copyto.c | 428 ++++++++++++++++++++------------
src/include/commands/copy.h | 7 +-
src/include/commands/copyapi.h | 59 +++++
6 files changed, 376 insertions(+), 165 deletions(-)
create mode 100644 src/include/commands/copyapi.h
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 249d82d3a0..9e4e819858 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -329,7 +329,7 @@ file_fdw_validator(PG_FUNCTION_ARGS)
/*
* Now apply the core COPY code's validation logic for more checks.
*/
- ProcessCopyOptions(NULL, NULL, true, other_options);
+ ProcessCopyOptions(NULL, NULL, true, NULL, other_options);
/*
* Either filename or program option is required for file_fdw foreign
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cc0786c6f4..5f3697a5f9 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -442,6 +442,9 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
* a list of options. In that usage, 'opts_out' can be passed as NULL and
* the collected data is just leaked until CurrentMemoryContext is reset.
*
+ * 'cstate' is a CopyToState for !is_from and a CopyFromState for is_from.
+ * 'cstate' may be NULL; for example, file_fdw passes NULL.
+ *
* Note that additional checking, such as whether column names listed in FORCE
* QUOTE actually exist, has to be applied later. This just checks for
* self-consistency of the options list.
@@ -450,6 +453,7 @@ void
ProcessCopyOptions(ParseState *pstate,
CopyFormatOptions *opts_out,
bool is_from,
+ void *cstate,
List *options)
{
bool format_specified = false;
@@ -464,7 +468,13 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->file_encoding = -1;
- /* Extract options from the statement node tree */
+ /* Text is the default format. */
+ opts_out->to_routine = &CopyToRoutineText;
+
+ /*
+ * Extract only the "format" option to detect target routine as the first
+ * step
+ */
foreach(option, options)
{
DefElem *defel = lfirst_node(DefElem, option);
@@ -479,15 +489,29 @@ ProcessCopyOptions(ParseState *pstate,
if (strcmp(fmt, "text") == 0)
/* default format */ ;
else if (strcmp(fmt, "csv") == 0)
+ {
opts_out->csv_mode = true;
+ opts_out->to_routine = &CopyToRoutineCSV;
+ }
else if (strcmp(fmt, "binary") == 0)
+ {
opts_out->binary = true;
+ opts_out->to_routine = &CopyToRoutineBinary;
+ }
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY format \"%s\" not recognized", fmt),
parser_errposition(pstate, defel->location)));
}
+ }
+ /* Extract options except "format" from the statement node tree */
+ foreach(option, options)
+ {
+ DefElem *defel = lfirst_node(DefElem, option);
+
+ if (strcmp(defel->defname, "format") == 0)
+ continue;
else if (strcmp(defel->defname, "freeze") == 0)
{
if (freeze_specified)
@@ -616,11 +640,18 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->on_error = defGetCopyOnErrorChoice(defel, pstate, is_from);
}
else
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("option \"%s\" not recognized",
- defel->defname),
- parser_errposition(pstate, defel->location)));
+ {
+ bool processed = false;
+
+ if (!is_from)
+ processed = opts_out->to_routine->CopyToProcessOption(cstate, defel);
+ if (!processed)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("option \"%s\" not recognized",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+ }
}
/*
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 1fe70b9133..fb3d4d9296 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1416,7 +1416,7 @@ BeginCopyFrom(ParseState *pstate,
oldcontext = MemoryContextSwitchTo(cstate->copycontext);
/* Extract options from the statement node tree */
- ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
+ ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , cstate, options);
/* Process the target relation */
cstate->rel = rel;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d3dc3fc854..6547b7c654 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -131,6 +131,275 @@ static void CopySendEndOfRow(CopyToState cstate);
static void CopySendInt32(CopyToState cstate, int32 val);
static void CopySendInt16(CopyToState cstate, int16 val);
+/*
+ * CopyToRoutine implementations.
+ */
+
+/*
+ * CopyToRoutine implementation for "text" and "csv". CopyToText*()
+ * check cstate->opts.csv_mode and change their behavior. We can split this
+ * implementation and stop checking cstate->opts.csv_mode later.
+ */
+
+/* All "text" and "csv" options are parsed in ProcessCopyOptions(). We may
+ * move the code to here later. */
+static bool
+CopyToTextProcessOption(CopyToState cstate, DefElem *defel)
+{
+ return false;
+}
+
+static int16
+CopyToTextGetFormat(CopyToState cstate)
+{
+ return 0;
+}
+
+static void
+CopyToTextSendEndOfRow(CopyToState cstate)
+{
+ switch (cstate->copy_dest)
+ {
+ case COPY_FILE:
+ /* Default line termination depends on platform */
+#ifndef WIN32
+ CopySendChar(cstate, '\n');
+#else
+ CopySendString(cstate, "\r\n");
+#endif
+ break;
+ case COPY_FRONTEND:
+ /* The FE/BE protocol uses \n as newline for all platforms */
+ CopySendChar(cstate, '\n');
+ break;
+ default:
+ break;
+ }
+ CopySendEndOfRow(cstate);
+}
+
+static void
+CopyToTextStart(CopyToState cstate, TupleDesc tupDesc)
+{
+ int num_phys_attrs;
+ ListCell *cur;
+
+ num_phys_attrs = tupDesc->natts;
+ /* Get info about the columns we need to process. */
+ cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ Oid out_func_oid;
+ bool isvarlena;
+ Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+ getTypeOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+ fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+ }
+
+ /*
+ * For non-binary copy, we need to convert null_print to file encoding,
+ * because it will be sent directly with CopySendString.
+ */
+ if (cstate->need_transcoding)
+ cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
+ cstate->opts.null_print_len,
+ cstate->file_encoding);
+
+ /* if a header has been requested send the line */
+ if (cstate->opts.header_line)
+ {
+ bool hdr_delim = false;
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ char *colname;
+
+ if (hdr_delim)
+ CopySendChar(cstate, cstate->opts.delim[0]);
+ hdr_delim = true;
+
+ colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+ if (cstate->opts.csv_mode)
+ CopyAttributeOutCSV(cstate, colname, false,
+ list_length(cstate->attnumlist) == 1);
+ else
+ CopyAttributeOutText(cstate, colname);
+ }
+
+ CopyToTextSendEndOfRow(cstate);
+ }
+}
+
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+ bool need_delim = false;
+ FmgrInfo *out_functions = cstate->out_functions;
+ ListCell *cur;
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ Datum value = slot->tts_values[attnum - 1];
+ bool isnull = slot->tts_isnull[attnum - 1];
+
+ if (need_delim)
+ CopySendChar(cstate, cstate->opts.delim[0]);
+ need_delim = true;
+
+ if (isnull)
+ {
+ CopySendString(cstate, cstate->opts.null_print_client);
+ }
+ else
+ {
+ char *string;
+
+ string = OutputFunctionCall(&out_functions[attnum - 1], value);
+ if (cstate->opts.csv_mode)
+ CopyAttributeOutCSV(cstate, string,
+ cstate->opts.force_quote_flags[attnum - 1],
+ list_length(cstate->attnumlist) == 1);
+ else
+ CopyAttributeOutText(cstate, string);
+ }
+ }
+
+ CopyToTextSendEndOfRow(cstate);
+}
+
+static void
+CopyToTextEnd(CopyToState cstate)
+{
+}
+
+/*
+ * CopyToRoutine implementation for "binary".
+ */
+
+/* All "binary" options are parsed in ProcessCopyOptions(). We may move the
+ * code to here later. */
+static bool
+CopyToBinaryProcessOption(CopyToState cstate, DefElem *defel)
+{
+ return false;
+}
+
+static int16
+CopyToBinaryGetFormat(CopyToState cstate)
+{
+ return 1;
+}
+
+static void
+CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc)
+{
+ int num_phys_attrs;
+ ListCell *cur;
+
+ num_phys_attrs = tupDesc->natts;
+ /* Get info about the columns we need to process. */
+ cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ Oid out_func_oid;
+ bool isvarlena;
+ Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+ getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+ fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+ }
+
+ {
+ /* Generate header for a binary copy */
+ int32 tmp;
+
+ /* Signature */
+ CopySendData(cstate, BinarySignature, 11);
+ /* Flags field */
+ tmp = 0;
+ CopySendInt32(cstate, tmp);
+ /* No header extension */
+ tmp = 0;
+ CopySendInt32(cstate, tmp);
+ }
+}
+
+static void
+CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+ FmgrInfo *out_functions = cstate->out_functions;
+ ListCell *cur;
+
+ /* Binary per-tuple header */
+ CopySendInt16(cstate, list_length(cstate->attnumlist));
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ Datum value = slot->tts_values[attnum - 1];
+ bool isnull = slot->tts_isnull[attnum - 1];
+
+ if (isnull)
+ {
+ CopySendInt32(cstate, -1);
+ }
+ else
+ {
+ bytea *outputbytes;
+
+ outputbytes = SendFunctionCall(&out_functions[attnum - 1], value);
+ CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+ CopySendData(cstate, VARDATA(outputbytes),
+ VARSIZE(outputbytes) - VARHDRSZ);
+ }
+ }
+
+ CopySendEndOfRow(cstate);
+}
+
+static void
+CopyToBinaryEnd(CopyToState cstate)
+{
+ /* Generate trailer for a binary copy */
+ CopySendInt16(cstate, -1);
+ /* Need to flush out the trailer */
+ CopySendEndOfRow(cstate);
+}
+
+CopyToRoutine CopyToRoutineText = {
+ .CopyToProcessOption = CopyToTextProcessOption,
+ .CopyToGetFormat = CopyToTextGetFormat,
+ .CopyToStart = CopyToTextStart,
+ .CopyToOneRow = CopyToTextOneRow,
+ .CopyToEnd = CopyToTextEnd,
+};
+
+/*
+ * We can use the same CopyToRoutine for both "text" and "csv" because
+ * CopyToText*() check cstate->opts.csv_mode and change their behavior. We can
+ * split the implementations and stop checking cstate->opts.csv_mode later.
+ */
+CopyToRoutine CopyToRoutineCSV = {
+ .CopyToProcessOption = CopyToTextProcessOption,
+ .CopyToGetFormat = CopyToTextGetFormat,
+ .CopyToStart = CopyToTextStart,
+ .CopyToOneRow = CopyToTextOneRow,
+ .CopyToEnd = CopyToTextEnd,
+};
+
+CopyToRoutine CopyToRoutineBinary = {
+ .CopyToProcessOption = CopyToBinaryProcessOption,
+ .CopyToGetFormat = CopyToBinaryGetFormat,
+ .CopyToStart = CopyToBinaryStart,
+ .CopyToOneRow = CopyToBinaryOneRow,
+ .CopyToEnd = CopyToBinaryEnd,
+};
/*
* Send copy start/stop messages for frontend copies. These have changed
@@ -141,7 +410,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = cstate->opts.to_routine->CopyToGetFormat(cstate);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -198,16 +467,6 @@ CopySendEndOfRow(CopyToState cstate)
switch (cstate->copy_dest)
{
case COPY_FILE:
- if (!cstate->opts.binary)
- {
- /* Default line termination depends on platform */
-#ifndef WIN32
- CopySendChar(cstate, '\n');
-#else
- CopySendString(cstate, "\r\n");
-#endif
- }
-
if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
cstate->copy_file) != 1 ||
ferror(cstate->copy_file))
@@ -242,10 +501,6 @@ CopySendEndOfRow(CopyToState cstate)
}
break;
case COPY_FRONTEND:
- /* The FE/BE protocol uses \n as newline for all platforms */
- if (!cstate->opts.binary)
- CopySendChar(cstate, '\n');
-
/* Dump the accumulated row as one CopyData message */
(void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
break;
@@ -431,7 +686,7 @@ BeginCopyTo(ParseState *pstate,
oldcontext = MemoryContextSwitchTo(cstate->copycontext);
/* Extract options from the statement node tree */
- ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
+ ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , cstate, options);
/* Process the source/target relation or query */
if (rel)
@@ -748,8 +1003,6 @@ DoCopyTo(CopyToState cstate)
bool pipe = (cstate->filename == NULL && cstate->data_dest_cb == NULL);
bool fe_copy = (pipe && whereToSendOutput == DestRemote);
TupleDesc tupDesc;
- int num_phys_attrs;
- ListCell *cur;
uint64 processed;
if (fe_copy)
@@ -759,32 +1012,11 @@ DoCopyTo(CopyToState cstate)
tupDesc = RelationGetDescr(cstate->rel);
else
tupDesc = cstate->queryDesc->tupDesc;
- num_phys_attrs = tupDesc->natts;
cstate->opts.null_print_client = cstate->opts.null_print; /* default */
/* We use fe_msgbuf as a per-row buffer regardless of copy_dest */
cstate->fe_msgbuf = makeStringInfo();
- /* Get info about the columns we need to process. */
- cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
- foreach(cur, cstate->attnumlist)
- {
- int attnum = lfirst_int(cur);
- Oid out_func_oid;
- bool isvarlena;
- Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
-
- if (cstate->opts.binary)
- getTypeBinaryOutputInfo(attr->atttypid,
- &out_func_oid,
- &isvarlena);
- else
- getTypeOutputInfo(attr->atttypid,
- &out_func_oid,
- &isvarlena);
- fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
- }
-
/*
* Create a temporary memory context that we can reset once per row to
* recover palloc'd memory. This avoids any problems with leaks inside
@@ -795,57 +1027,7 @@ DoCopyTo(CopyToState cstate)
"COPY TO",
ALLOCSET_DEFAULT_SIZES);
- if (cstate->opts.binary)
- {
- /* Generate header for a binary copy */
- int32 tmp;
-
- /* Signature */
- CopySendData(cstate, BinarySignature, 11);
- /* Flags field */
- tmp = 0;
- CopySendInt32(cstate, tmp);
- /* No header extension */
- tmp = 0;
- CopySendInt32(cstate, tmp);
- }
- else
- {
- /*
- * For non-binary copy, we need to convert null_print to file
- * encoding, because it will be sent directly with CopySendString.
- */
- if (cstate->need_transcoding)
- cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
- cstate->opts.null_print_len,
- cstate->file_encoding);
-
- /* if a header has been requested send the line */
- if (cstate->opts.header_line)
- {
- bool hdr_delim = false;
-
- foreach(cur, cstate->attnumlist)
- {
- int attnum = lfirst_int(cur);
- char *colname;
-
- if (hdr_delim)
- CopySendChar(cstate, cstate->opts.delim[0]);
- hdr_delim = true;
-
- colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
-
- if (cstate->opts.csv_mode)
- CopyAttributeOutCSV(cstate, colname, false,
- list_length(cstate->attnumlist) == 1);
- else
- CopyAttributeOutText(cstate, colname);
- }
-
- CopySendEndOfRow(cstate);
- }
- }
+ cstate->opts.to_routine->CopyToStart(cstate, tupDesc);
if (cstate->rel)
{
@@ -884,13 +1066,7 @@ DoCopyTo(CopyToState cstate)
processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
}
- if (cstate->opts.binary)
- {
- /* Generate trailer for a binary copy */
- CopySendInt16(cstate, -1);
- /* Need to flush out the trailer */
- CopySendEndOfRow(cstate);
- }
+ cstate->opts.to_routine->CopyToEnd(cstate);
MemoryContextDelete(cstate->rowcontext);
@@ -906,71 +1082,15 @@ DoCopyTo(CopyToState cstate)
static void
CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
- bool need_delim = false;
- FmgrInfo *out_functions = cstate->out_functions;
MemoryContext oldcontext;
- ListCell *cur;
- char *string;
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
- if (cstate->opts.binary)
- {
- /* Binary per-tuple header */
- CopySendInt16(cstate, list_length(cstate->attnumlist));
- }
-
/* Make sure the tuple is fully deconstructed */
slot_getallattrs(slot);
- foreach(cur, cstate->attnumlist)
- {
- int attnum = lfirst_int(cur);
- Datum value = slot->tts_values[attnum - 1];
- bool isnull = slot->tts_isnull[attnum - 1];
-
- if (!cstate->opts.binary)
- {
- if (need_delim)
- CopySendChar(cstate, cstate->opts.delim[0]);
- need_delim = true;
- }
-
- if (isnull)
- {
- if (!cstate->opts.binary)
- CopySendString(cstate, cstate->opts.null_print_client);
- else
- CopySendInt32(cstate, -1);
- }
- else
- {
- if (!cstate->opts.binary)
- {
- string = OutputFunctionCall(&out_functions[attnum - 1],
- value);
- if (cstate->opts.csv_mode)
- CopyAttributeOutCSV(cstate, string,
- cstate->opts.force_quote_flags[attnum - 1],
- list_length(cstate->attnumlist) == 1);
- else
- CopyAttributeOutText(cstate, string);
- }
- else
- {
- bytea *outputbytes;
-
- outputbytes = SendFunctionCall(&out_functions[attnum - 1],
- value);
- CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
- CopySendData(cstate, VARDATA(outputbytes),
- VARSIZE(outputbytes) - VARHDRSZ);
- }
- }
- }
-
- CopySendEndOfRow(cstate);
+ cstate->opts.to_routine->CopyToOneRow(cstate, slot);
MemoryContextSwitchTo(oldcontext);
}
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index b3da3cb0be..34bea880ca 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
#ifndef COPY_H
#define COPY_H
+#include "commands/copyapi.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
@@ -74,11 +75,11 @@ typedef struct CopyFormatOptions
bool convert_selectively; /* do selective binary conversion? */
CopyOnErrorChoice on_error; /* what to do when error happened */
List *convert_select; /* list of column names (can be NIL) */
+ CopyToRoutine *to_routine; /* callback routines for COPY TO */
} CopyFormatOptions;
-/* These are private in commands/copy[from|to].c */
+/* This is private in commands/copyfrom.c */
typedef struct CopyFromStateData *CopyFromState;
-typedef struct CopyToStateData *CopyToState;
typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
typedef void (*copy_data_dest_cb) (void *data, int len);
@@ -87,7 +88,7 @@ extern void DoCopy(ParseState *pstate, const CopyStmt *stmt,
int stmt_location, int stmt_len,
uint64 *processed);
-extern void ProcessCopyOptions(ParseState *pstate, CopyFormatOptions *opts_out, bool is_from, List *options);
+extern void ProcessCopyOptions(ParseState *pstate, CopyFormatOptions *opts_out, bool is_from, void *cstate, List *options);
extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *whereClause,
const char *filename,
bool is_program, copy_data_source_cb data_source_cb, List *attnamelist, List *options);
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 0000000000..eb68f2fb7b
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,59 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyapi.h
+ * API for COPY TO/FROM handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "executor/tuptable.h"
+#include "nodes/parsenodes.h"
+
+/* This is private in commands/copyto.c */
+typedef struct CopyToStateData *CopyToState;
+
+typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel);
+typedef int16 (*CopyToGetFormat_function) (CopyToState cstate);
+typedef void (*CopyToStart_function) (CopyToState cstate, TupleDesc tupDesc);
+typedef void (*CopyToOneRow_function) (CopyToState cstate, TupleTableSlot *slot);
+typedef void (*CopyToEnd_function) (CopyToState cstate);
+
+/* Routines for a COPY TO format implementation. */
+typedef struct CopyToRoutine
+{
+ /*
+ * Called for processing one COPY TO option. This will return false when
+ * the given option is invalid.
+ */
+ CopyToProcessOption_function CopyToProcessOption;
+
+ /*
+ * Called when COPY TO is started. This will return a format as int16
+ * value. It's used for the CopyOutResponse message.
+ */
+ CopyToGetFormat_function CopyToGetFormat;
+
+ /* Called when COPY TO is started. This will send a header. */
+ CopyToStart_function CopyToStart;
+
+ /* Copy one row for COPY TO. */
+ CopyToOneRow_function CopyToOneRow;
+
+ /* Called when COPY TO is ended. This will send a trailer. */
+ CopyToEnd_function CopyToEnd;
+} CopyToRoutine;
+
+/* Built-in CopyToRoutine for "text", "csv" and "binary". */
+extern CopyToRoutine CopyToRoutineText;
+extern CopyToRoutine CopyToRoutineCSV;
+extern CopyToRoutine CopyToRoutineBinary;
+
+#endif /* COPYAPI_H */
--
2.41.0
[application/octet-stream] v8-0002-Add-support-for-adding-custom-COPY-TO-format.patch (17.7K, 5-v8-0002-Add-support-for-adding-custom-COPY-TO-format.patch)
From a597f8a2beec12971d77419f08b5722f531774f3 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <[email protected]>
Date: Tue, 23 Jan 2024 13:58:38 +0900
Subject: [PATCH v8 02/10] Add support for adding custom COPY TO format
This uses the handler approach like tablesample. The approach creates
an internal function that returns an internal struct. In this case,
a COPY TO handler returns a CopyToRoutine.
We will add support for custom COPY FROM formats later. We'll use the
same handler for COPY TO and COPY FROM. PostgreSQL calls a COPY
TO/FROM handler with an "is_from" argument. It's true for COPY FROM and
false for COPY TO:
copy_handler(true) returns CopyFromRoutine (doesn't exist yet)
copy_handler(false) returns CopyToRoutine
We discussed introducing a wrapper struct for it:
typedef struct CopyRoutine
{
NodeTag type;
/* either CopyToRoutine or CopyFromRoutine */
Node *routine;
} CopyRoutine;
copy_handler(true) returns CopyRoutine with CopyFromRoutine
copy_handler(false) returns CopyRoutine with CopyToRoutine
See also: https://www.postgresql.org/message-id/flat/CAD21AoCunywHird3GaPzWe6s9JG1wzxj3Cr6vGN36DDheGjOjA%40mail.gmail.com
But I noticed that we don't need the wrapper struct. We can just
return CopyToRoutine or CopyFromRoutine directly, because we can
distinguish the returned struct by checking its NodeTag. So I don't use
the wrapper struct approach.
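As a rough illustration of the "check the tag instead of wrapping" idea,
here is a minimal standalone sketch. It does not use PostgreSQL headers:
DemoTag, DemoCopyToRoutine, DemoCopyFromRoutine, demo_copy_handler() and
the other demo_* names are hypothetical stand-ins for NodeTag,
CopyToRoutine, the not-yet-existing CopyFromRoutine and a handler
function. It assumes, as the test module in this patch does, that
is_from = false yields the COPY TO routine.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PostgreSQL's NodeTag values. */
typedef enum DemoTag
{
    DEMO_T_Invalid = 0,
    DEMO_T_CopyToRoutine,
    DEMO_T_CopyFromRoutine
} DemoTag;

/* Each routine struct starts with its tag, like a PostgreSQL Node. */
typedef struct DemoCopyToRoutine
{
    DemoTag     type;
    int         (*get_format) (void);
} DemoCopyToRoutine;

typedef struct DemoCopyFromRoutine
{
    DemoTag     type;
    int         (*get_format) (void);
} DemoCopyFromRoutine;

/* Text format code, as used in CopyOutResponse/CopyInResponse. */
int
demo_text_format(void)
{
    return 0;
}

static const DemoCopyToRoutine demo_to_routine = {
    .type = DEMO_T_CopyToRoutine,
    .get_format = demo_text_format,
};

static const DemoCopyFromRoutine demo_from_routine = {
    .type = DEMO_T_CopyFromRoutine,
    .get_format = demo_text_format,
};

/* One handler serves both directions; no wrapper struct is needed. */
const void *
demo_copy_handler(bool is_from)
{
    if (is_from)
        return &demo_from_routine;
    return &demo_to_routine;
}

/* Read the leading tag; valid because the tag is the first member. */
DemoTag
demo_tag_of(const void *routine)
{
    if (routine == NULL)
        return DEMO_T_Invalid;
    return *(const DemoTag *) routine;
}

/* Plays the role of IsA(routine, CopyToRoutine) in the caller. */
bool
demo_is_copy_to_routine(const void *routine)
{
    return demo_tag_of(routine) == DEMO_T_CopyToRoutine;
}
```

A caller on the COPY TO path would invoke demo_copy_handler(false) and
reject the result unless demo_is_copy_to_routine() returns true, which
mirrors the IsA() check in ProcessCopyOptionCustomFormat().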
---
src/backend/commands/copy.c | 84 ++++++++++++++-----
src/backend/nodes/Makefile | 1 +
src/backend/nodes/gen_node_support.pl | 2 +
src/backend/utils/adt/pseudotypes.c | 1 +
src/include/catalog/pg_proc.dat | 6 ++
src/include/catalog/pg_type.dat | 6 ++
src/include/commands/copyapi.h | 2 +
src/include/nodes/meson.build | 1 +
src/test/modules/Makefile | 1 +
src/test/modules/meson.build | 1 +
src/test/modules/test_copy_format/.gitignore | 4 +
src/test/modules/test_copy_format/Makefile | 23 +++++
.../expected/test_copy_format.out | 17 ++++
src/test/modules/test_copy_format/meson.build | 33 ++++++++
.../test_copy_format/sql/test_copy_format.sql | 8 ++
.../test_copy_format--1.0.sql | 8 ++
.../test_copy_format/test_copy_format.c | 77 +++++++++++++++++
.../test_copy_format/test_copy_format.control | 4 +
18 files changed, 260 insertions(+), 19 deletions(-)
mode change 100644 => 100755 src/backend/nodes/gen_node_support.pl
create mode 100644 src/test/modules/test_copy_format/.gitignore
create mode 100644 src/test/modules/test_copy_format/Makefile
create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out
create mode 100644 src/test/modules/test_copy_format/meson.build
create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql
create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql
create mode 100644 src/test/modules/test_copy_format/test_copy_format.c
create mode 100644 src/test/modules/test_copy_format/test_copy_format.control
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 5f3697a5f9..6f0db0ae7c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -32,6 +32,7 @@
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
+#include "parser/parse_func.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteHandler.h"
#include "utils/acl.h"
@@ -430,6 +431,69 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
return COPY_ON_ERROR_STOP; /* keep compiler quiet */
}
+/*
+ * Process the "format" option.
+ *
+ * This function checks whether the option value is a built-in format such as
+ * "text" and "csv" or not. If the option value isn't a built-in format, this
+ * function finds a COPY format handler that returns a CopyToRoutine. If no
+ * COPY format handler is found, this function reports an error.
+ */
+static void
+ProcessCopyOptionCustomFormat(ParseState *pstate,
+ CopyFormatOptions *opts_out,
+ bool is_from,
+ DefElem *defel)
+{
+ char *format;
+ Oid funcargtypes[1];
+ Oid handlerOid = InvalidOid;
+ Datum datum;
+ void *routine;
+
+ format = defGetString(defel);
+
+ /* built-in formats */
+ if (strcmp(format, "text") == 0)
+ /* default format */ return;
+ else if (strcmp(format, "csv") == 0)
+ {
+ opts_out->csv_mode = true;
+ opts_out->to_routine = &CopyToRoutineCSV;
+ return;
+ }
+ else if (strcmp(format, "binary") == 0)
+ {
+ opts_out->binary = true;
+ opts_out->to_routine = &CopyToRoutineBinary;
+ return;
+ }
+
+ /* custom format */
+ if (!is_from)
+ {
+ funcargtypes[0] = INTERNALOID;
+ handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+ funcargtypes, true);
+ }
+ if (!OidIsValid(handlerOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY format \"%s\" not recognized", format),
+ parser_errposition(pstate, defel->location)));
+
+ datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from));
+ routine = DatumGetPointer(datum);
+ if (routine == NULL || !IsA(routine, CopyToRoutine))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY handler function %s(%u) did not return a CopyToRoutine struct",
+ format, handlerOid),
+ parser_errposition(pstate, defel->location)));
+
+ opts_out->to_routine = routine;
+}
+
/*
* Process the statement option list for COPY.
*
@@ -481,28 +545,10 @@ ProcessCopyOptions(ParseState *pstate,
if (strcmp(defel->defname, "format") == 0)
{
- char *fmt = defGetString(defel);
-
if (format_specified)
errorConflictingDefElem(defel, pstate);
format_specified = true;
- if (strcmp(fmt, "text") == 0)
- /* default format */ ;
- else if (strcmp(fmt, "csv") == 0)
- {
- opts_out->csv_mode = true;
- opts_out->to_routine = &CopyToRoutineCSV;
- }
- else if (strcmp(fmt, "binary") == 0)
- {
- opts_out->binary = true;
- opts_out->to_routine = &CopyToRoutineBinary;
- }
- else
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY format \"%s\" not recognized", fmt),
- parser_errposition(pstate, defel->location)));
+ ProcessCopyOptionCustomFormat(pstate, opts_out, is_from, defel);
}
}
/* Extract options except "format" from the statement node tree */
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 66bbad8e6e..173ee11811 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -49,6 +49,7 @@ node_headers = \
access/sdir.h \
access/tableam.h \
access/tsmapi.h \
+ commands/copyapi.h \
commands/event_trigger.h \
commands/trigger.h \
executor/tuptable.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
old mode 100644
new mode 100755
index 2f0a59bc87..bd397f45ac
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -61,6 +61,7 @@ my @all_input_files = qw(
access/sdir.h
access/tableam.h
access/tsmapi.h
+ commands/copyapi.h
commands/event_trigger.h
commands/trigger.h
executor/tuptable.h
@@ -85,6 +86,7 @@ my @nodetag_only_files = qw(
access/sdir.h
access/tableam.h
access/tsmapi.h
+ commands/copyapi.h
commands/event_trigger.h
commands/trigger.h
executor/tuptable.h
diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index a3a991f634..d308780c43 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -373,6 +373,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler);
PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler);
PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler);
PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler);
+PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler);
PSEUDOTYPE_DUMMY_IO_FUNCS(internal);
PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement);
PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 29af4ce65d..d4e426687c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7617,6 +7617,12 @@
{ oid => '3312', descr => 'I/O',
proname => 'tsm_handler_out', prorettype => 'cstring',
proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' },
+{ oid => '8753', descr => 'I/O',
+ proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler',
+ proargtypes => 'cstring', prosrc => 'copy_handler_in' },
+{ oid => '8754', descr => 'I/O',
+ proname => 'copy_handler_out', prorettype => 'cstring',
+ proargtypes => 'copy_handler', prosrc => 'copy_handler_out' },
{ oid => '267', descr => 'I/O',
proname => 'table_am_handler_in', proisstrict => 'f',
prorettype => 'table_am_handler', proargtypes => 'cstring',
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index d29194da31..2040d5da83 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -632,6 +632,12 @@
typcategory => 'P', typinput => 'tsm_handler_in',
typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
typalign => 'i' },
+{ oid => '8752',
+ descr => 'pseudo-type for the result of a copy to/from method function',
+ typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
+ typcategory => 'P', typinput => 'copy_handler_in',
+ typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
+ typalign => 'i' },
{ oid => '269',
typname => 'table_am_handler',
descr => 'pseudo-type for the result of a table AM handler function',
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index eb68f2fb7b..9c25e1c415 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -29,6 +29,8 @@ typedef void (*CopyToEnd_function) (CopyToState cstate);
/* Routines for a COPY TO format implementation. */
typedef struct CopyToRoutine
{
+ NodeTag type;
+
/*
* Called for processing one COPY TO option. This will return false when
* the given option is invalid.
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index b665e55b65..103df1a787 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -11,6 +11,7 @@ node_support_input_i = [
'access/sdir.h',
'access/tableam.h',
'access/tsmapi.h',
+ 'commands/copyapi.h',
'commands/event_trigger.h',
'commands/trigger.h',
'executor/tuptable.h',
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index e32c8925f6..9d57b868d5 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
spgist_name_ops \
test_bloomfilter \
test_copy_callbacks \
+ test_copy_format \
test_custom_rmgrs \
test_ddl_deparse \
test_dsa \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 397e0906e6..d76f2a6003 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -13,6 +13,7 @@ subdir('spgist_name_ops')
subdir('ssl_passphrase_callback')
subdir('test_bloomfilter')
subdir('test_copy_callbacks')
+subdir('test_copy_format')
subdir('test_custom_rmgrs')
subdir('test_ddl_deparse')
subdir('test_dsa')
diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_copy_format/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile
new file mode 100644
index 0000000000..8497f91624
--- /dev/null
+++ b/src/test/modules/test_copy_format/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_copy_format/Makefile
+
+MODULE_big = test_copy_format
+OBJS = \
+ $(WIN32RES) \
+ test_copy_format.o
+PGFILEDESC = "test_copy_format - test custom COPY FORMAT"
+
+EXTENSION = test_copy_format
+DATA = test_copy_format--1.0.sql
+
+REGRESS = test_copy_format
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_copy_format
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
new file mode 100644
index 0000000000..3a24ae7b97
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -0,0 +1,17 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a INT, b INT, c INT);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test TO stdout WITH (
+ option_before 'before',
+ format 'test_copy_format',
+ option_after 'after'
+);
+NOTICE: test_copy_format: is_from=false
+NOTICE: CopyToProcessOption: "option_before"="before"
+NOTICE: CopyToProcessOption: "option_after"="after"
+NOTICE: CopyToGetFormat
+NOTICE: CopyToStart: natts=3
+NOTICE: CopyToOneRow: tts_nvalid=3
+NOTICE: CopyToOneRow: tts_nvalid=3
+NOTICE: CopyToOneRow: tts_nvalid=3
+NOTICE: CopyToEnd
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
new file mode 100644
index 0000000000..4cefe7b709
--- /dev/null
+++ b/src/test/modules/test_copy_format/meson.build
@@ -0,0 +1,33 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+test_copy_format_sources = files(
+ 'test_copy_format.c',
+)
+
+if host_system == 'windows'
+ test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+ '--NAME', 'test_copy_format',
+ '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',])
+endif
+
+test_copy_format = shared_module('test_copy_format',
+ test_copy_format_sources,
+ kwargs: pg_test_mod_args,
+)
+test_install_libs += test_copy_format
+
+test_install_data += files(
+ 'test_copy_format.control',
+ 'test_copy_format--1.0.sql',
+)
+
+tests += {
+ 'name': 'test_copy_format',
+ 'sd': meson.current_source_dir(),
+ 'bd': meson.current_build_dir(),
+ 'regress': {
+ 'sql': [
+ 'test_copy_format',
+ ],
+ },
+}
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
new file mode 100644
index 0000000000..0eb7ed2e11
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -0,0 +1,8 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a INT, b INT, c INT);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test TO stdout WITH (
+ option_before 'before',
+ format 'test_copy_format',
+ option_after 'after'
+);
diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
new file mode 100644
index 0000000000..d24ea03ce9
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
@@ -0,0 +1,8 @@
+/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit
+
+CREATE FUNCTION test_copy_format(internal)
+ RETURNS copy_handler
+ AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
new file mode 100644
index 0000000000..a2219afcde
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -0,0 +1,77 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_copy_format.c
+ * Code for testing custom COPY format.
+ *
+ * Portions Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/test/modules/test_copy_format/test_copy_format.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copy.h"
+#include "commands/defrem.h"
+
+PG_MODULE_MAGIC;
+
+static bool
+CopyToProcessOption(CopyToState cstate, DefElem *defel)
+{
+ ereport(NOTICE,
+ (errmsg("CopyToProcessOption: \"%s\"=\"%s\"",
+ defel->defname, defGetString(defel))));
+ return true;
+}
+
+static int16
+CopyToGetFormat(CopyToState cstate)
+{
+ ereport(NOTICE, (errmsg("CopyToGetFormat")));
+ return 0;
+}
+
+static void
+CopyToStart(CopyToState cstate, TupleDesc tupDesc)
+{
+ ereport(NOTICE, (errmsg("CopyToStart: natts=%d", tupDesc->natts)));
+}
+
+static void
+CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+ ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%d", slot->tts_nvalid)));
+}
+
+static void
+CopyToEnd(CopyToState cstate)
+{
+ ereport(NOTICE, (errmsg("CopyToEnd")));
+}
+
+static const CopyToRoutine CopyToRoutineTestCopyFormat = {
+ .type = T_CopyToRoutine,
+ .CopyToProcessOption = CopyToProcessOption,
+ .CopyToGetFormat = CopyToGetFormat,
+ .CopyToStart = CopyToStart,
+ .CopyToOneRow = CopyToOneRow,
+ .CopyToEnd = CopyToEnd,
+};
+
+PG_FUNCTION_INFO_V1(test_copy_format);
+Datum
+test_copy_format(PG_FUNCTION_ARGS)
+{
+ bool is_from = PG_GETARG_BOOL(0);
+
+ ereport(NOTICE,
+ (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
+
+ if (is_from)
+ elog(ERROR, "COPY FROM isn't supported yet");
+
+ PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control
new file mode 100644
index 0000000000..f05a636235
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.control
@@ -0,0 +1,4 @@
+comment = 'Test code for custom COPY format'
+default_version = '1.0'
+module_pathname = '$libdir/test_copy_format'
+relocatable = true
--
2.41.0
[application/octet-stream] v8-0005-Extract-COPY-FROM-format-implementations.patch (24.8K, 6-v8-0005-Extract-COPY-FROM-format-implementations.patch)
From 781955f19ad27cdd66748be539bf45cf1b925856 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <[email protected]>
Date: Tue, 23 Jan 2024 17:21:23 +0900
Subject: [PATCH v8 05/10] Extract COPY FROM format implementations
This doesn't change the current behavior. It just introduces
CopyFromRoutine, which holds the function pointers of a format
implementation (like TupleTableSlotOps), and uses it for the existing
"text", "csv" and "binary" format implementations.
Note that CopyFromRoutine can't be used from extensions yet because
CopyRead*() aren't exported yet. Extensions can't read data from a
source without CopyRead*(). They will be exported by subsequent
patches.
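To make the callback-table idea concrete, here is a minimal standalone
sketch of how such a routine could be driven. It is not the patch's
code: DemoFromState, DemoFromRoutine and the demo_* functions are
hypothetical stand-ins for CopyFromState, CopyFromRoutine and the
CopyFrom*() callbacks, and the "input" is just an array of ints. In the
patch itself the start and end callbacks are invoked from
BeginCopyFrom() and EndCopyFrom() rather than from a single driver
function; the sketch only assumes the order start, one row at a time,
end. The format codes 0 (text) and 1 (binary) are the ones used in the
patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state; stands in for CopyFromState. */
typedef struct DemoFromState
{
    const int  *rows;           /* pretend input source */
    size_t      nrows;
    size_t      next;
    bool        started;
    bool        ended;
} DemoFromState;

/* Table of callbacks for one format, in the spirit of CopyFromRoutine. */
typedef struct DemoFromRoutine
{
    int16_t     (*get_format) (DemoFromState *state);
    void        (*start) (DemoFromState *state);
    bool        (*one_row) (DemoFromState *state, int *value);
    void        (*end) (DemoFromState *state);
} DemoFromRoutine;

int16_t
demo_text_get_format(DemoFromState *state)
{
    (void) state;
    return 0;                   /* text */
}

int16_t
demo_binary_get_format(DemoFromState *state)
{
    (void) state;
    return 1;                   /* binary */
}

void
demo_start(DemoFromState *state)
{
    state->next = 0;
    state->started = true;
}

/* Returns false when there are no more rows, like the OneRow callback. */
bool
demo_one_row(DemoFromState *state, int *value)
{
    if (state->next >= state->nrows)
        return false;
    *value = state->rows[state->next++];
    return true;
}

void
demo_end(DemoFromState *state)
{
    state->ended = true;
}

const DemoFromRoutine demo_text_routine = {
    .get_format = demo_text_get_format,
    .start = demo_start,
    .one_row = demo_one_row,
    .end = demo_end,
};

const DemoFromRoutine demo_binary_routine = {
    .get_format = demo_binary_get_format,
    .start = demo_start,
    .one_row = demo_one_row,
    .end = demo_end,
};

/* Driver: start once, read rows until exhausted, end once. */
long
demo_copy_from(const DemoFromRoutine *routine, DemoFromState *state)
{
    long        sum = 0;
    int         value;

    routine->start(state);
    while (routine->one_row(state, &value))
        sum += value;
    routine->end(state);
    return sum;
}
```

The driver never branches on the format; everything format-specific
sits behind the function pointers, which is the property this patch
introduces for the built-in formats.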
---
src/backend/commands/copy.c | 3 +
src/backend/commands/copyfrom.c | 216 +++++++++++----
src/backend/commands/copyfromparse.c | 326 ++++++++++++-----------
src/include/commands/copy.h | 3 -
src/include/commands/copyapi.h | 44 +++
src/include/commands/copyfrom_internal.h | 4 +
6 files changed, 391 insertions(+), 205 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6f0db0ae7c..ec6dfff8ab 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -459,12 +459,14 @@ ProcessCopyOptionCustomFormat(ParseState *pstate,
else if (strcmp(format, "csv") == 0)
{
opts_out->csv_mode = true;
+ opts_out->from_routine = &CopyFromRoutineCSV;
opts_out->to_routine = &CopyToRoutineCSV;
return;
}
else if (strcmp(format, "binary") == 0)
{
opts_out->binary = true;
+ opts_out->from_routine = &CopyFromRoutineBinary;
opts_out->to_routine = &CopyToRoutineBinary;
return;
}
@@ -533,6 +535,7 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->file_encoding = -1;
/* Text is the default format. */
+ opts_out->from_routine = &CopyFromRoutineText;
opts_out->to_routine = &CopyToRoutineText;
/*
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fb3d4d9296..d556ebb5d6 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -108,6 +108,170 @@ static char *limit_printout_length(const char *str);
static void ClosePipeFromProgram(CopyFromState cstate);
+
+/*
+ * CopyFromRoutine implementations.
+ */
+
+/*
+ * CopyFromRoutine implementation for "text" and "csv". CopyFromText*()
+ * refer to cstate->opts.csv_mode and change their behavior. We can split this
+ * implementation and stop referring to cstate->opts.csv_mode later.
+ */
+
+/* All "text" and "csv" options are parsed in ProcessCopyOptions(). We may
+ * move the code to here later. */
+static bool
+CopyFromTextProcessOption(CopyFromState cstate, DefElem *defel)
+{
+ return false;
+}
+
+static int16
+CopyFromTextGetFormat(CopyFromState cstate)
+{
+ return 0;
+}
+
+static void
+CopyFromTextStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+ AttrNumber num_phys_attrs = tupDesc->natts;
+ AttrNumber attr_count;
+
+ /*
+ * If encoding conversion is needed, we need another buffer to hold the
+ * converted input data. Otherwise, we can just point input_buf to the
+ * same buffer as raw_buf.
+ */
+ if (cstate->need_transcoding)
+ {
+ cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
+ cstate->input_buf_index = cstate->input_buf_len = 0;
+ }
+ else
+ cstate->input_buf = cstate->raw_buf;
+ cstate->input_reached_eof = false;
+
+ initStringInfo(&cstate->line_buf);
+
+ /*
+ * Pick up the required catalog information for each attribute in the
+ * relation, including the input function, the element type (to pass to
+ * the input function).
+ */
+ cstate->in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+ cstate->typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));
+ for (int attnum = 1; attnum <= num_phys_attrs; attnum++)
+ {
+ Form_pg_attribute att = TupleDescAttr(tupDesc, attnum - 1);
+ Oid in_func_oid;
+
+ /* We don't need info for dropped attributes */
+ if (att->attisdropped)
+ continue;
+
+ /* Fetch the input function and typioparam info */
+ getTypeInputInfo(att->atttypid,
+ &in_func_oid, &cstate->typioparams[attnum - 1]);
+ fmgr_info(in_func_oid, &cstate->in_functions[attnum - 1]);
+ }
+
+ /* create workspace for CopyReadAttributes results */
+ attr_count = list_length(cstate->attnumlist);
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
+}
+
+static void
+CopyFromTextEnd(CopyFromState cstate)
+{
+}
+
+/*
+ * CopyFromRoutine implementation for "binary".
+ */
+
+/* All "binary" options are parsed in ProcessCopyOptions(). We may move the
+ * code to here later. */
+static bool
+CopyFromBinaryProcessOption(CopyFromState cstate, DefElem *defel)
+{
+ return false;
+}
+
+static int16
+CopyFromBinaryGetFormat(CopyFromState cstate)
+{
+ return 1;
+}
+
+static void
+CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+ AttrNumber num_phys_attrs = tupDesc->natts;
+
+ /*
+ * Pick up the required catalog information for each attribute in the
+ * relation, including the input function, the element type (to pass to
+ * the input function).
+ */
+ cstate->in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+ cstate->typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));
+ for (int attnum = 1; attnum <= num_phys_attrs; attnum++)
+ {
+ Form_pg_attribute att = TupleDescAttr(tupDesc, attnum - 1);
+ Oid in_func_oid;
+
+ /* We don't need info for dropped attributes */
+ if (att->attisdropped)
+ continue;
+
+ /* Fetch the input function and typioparam info */
+ getTypeBinaryInputInfo(att->atttypid,
+ &in_func_oid, &cstate->typioparams[attnum - 1]);
+ fmgr_info(in_func_oid, &cstate->in_functions[attnum - 1]);
+ }
+
+ /* Read and verify binary header */
+ ReceiveCopyBinaryHeader(cstate);
+}
+
+static void
+CopyFromBinaryEnd(CopyFromState cstate)
+{
+}
+
+CopyFromRoutine CopyFromRoutineText = {
+ .CopyFromProcessOption = CopyFromTextProcessOption,
+ .CopyFromGetFormat = CopyFromTextGetFormat,
+ .CopyFromStart = CopyFromTextStart,
+ .CopyFromOneRow = CopyFromTextOneRow,
+ .CopyFromEnd = CopyFromTextEnd,
+};
+
+/*
+ * We can use the same CopyFromRoutine for both "text" and "csv" because
+ * CopyFromText*() refer to cstate->opts.csv_mode and change their behavior. We
+ * can split the implementations and stop referring to cstate->opts.csv_mode later.
+ */
+CopyFromRoutine CopyFromRoutineCSV = {
+ .CopyFromProcessOption = CopyFromTextProcessOption,
+ .CopyFromGetFormat = CopyFromTextGetFormat,
+ .CopyFromStart = CopyFromTextStart,
+ .CopyFromOneRow = CopyFromTextOneRow,
+ .CopyFromEnd = CopyFromTextEnd,
+};
+
+CopyFromRoutine CopyFromRoutineBinary = {
+ .CopyFromProcessOption = CopyFromBinaryProcessOption,
+ .CopyFromGetFormat = CopyFromBinaryGetFormat,
+ .CopyFromStart = CopyFromBinaryStart,
+ .CopyFromOneRow = CopyFromBinaryOneRow,
+ .CopyFromEnd = CopyFromBinaryEnd,
+};
+
+
/*
* error context callback for COPY FROM
*
@@ -1384,9 +1548,6 @@ BeginCopyFrom(ParseState *pstate,
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
- FmgrInfo *in_functions;
- Oid *typioparams;
- Oid in_func_oid;
int *defmap;
ExprState **defexprs;
MemoryContext oldcontext;
@@ -1571,25 +1732,6 @@ BeginCopyFrom(ParseState *pstate,
cstate->raw_buf_index = cstate->raw_buf_len = 0;
cstate->raw_reached_eof = false;
- if (!cstate->opts.binary)
- {
- /*
- * If encoding conversion is needed, we need another buffer to hold
- * the converted input data. Otherwise, we can just point input_buf
- * to the same buffer as raw_buf.
- */
- if (cstate->need_transcoding)
- {
- cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
- cstate->input_buf_index = cstate->input_buf_len = 0;
- }
- else
- cstate->input_buf = cstate->raw_buf;
- cstate->input_reached_eof = false;
-
- initStringInfo(&cstate->line_buf);
- }
-
initStringInfo(&cstate->attribute_buf);
/* Assign range table and rteperminfos, we'll need them in CopyFrom. */
@@ -1608,8 +1750,6 @@ BeginCopyFrom(ParseState *pstate,
* the input function), and info about defaults and constraints. (Which
* input function we use depends on text/binary format choice.)
*/
- in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
- typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));
defmap = (int *) palloc(num_phys_attrs * sizeof(int));
defexprs = (ExprState **) palloc(num_phys_attrs * sizeof(ExprState *));
@@ -1621,15 +1761,6 @@ BeginCopyFrom(ParseState *pstate,
if (att->attisdropped)
continue;
- /* Fetch the input function and typioparam info */
- if (cstate->opts.binary)
- getTypeBinaryInputInfo(att->atttypid,
- &in_func_oid, &typioparams[attnum - 1]);
- else
- getTypeInputInfo(att->atttypid,
- &in_func_oid, &typioparams[attnum - 1]);
- fmgr_info(in_func_oid, &in_functions[attnum - 1]);
-
/* Get default info if available */
defexprs[attnum - 1] = NULL;
@@ -1689,8 +1820,6 @@ BeginCopyFrom(ParseState *pstate,
cstate->bytes_processed = 0;
/* We keep those variables in cstate. */
- cstate->in_functions = in_functions;
- cstate->typioparams = typioparams;
cstate->defmap = defmap;
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
@@ -1763,20 +1892,7 @@ BeginCopyFrom(ParseState *pstate,
pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
- if (cstate->opts.binary)
- {
- /* Read and verify binary header */
- ReceiveCopyBinaryHeader(cstate);
- }
-
- /* create workspace for CopyReadAttributes results */
- if (!cstate->opts.binary)
- {
- AttrNumber attr_count = list_length(cstate->attnumlist);
-
- cstate->max_fields = attr_count;
- cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
- }
+ cstate->opts.from_routine->CopyFromStart(cstate, tupDesc);
MemoryContextSwitchTo(oldcontext);
@@ -1789,6 +1905,8 @@ BeginCopyFrom(ParseState *pstate,
void
EndCopyFrom(CopyFromState cstate)
{
+ cstate->opts.from_routine->CopyFromEnd(cstate);
+
/* No COPY FROM related resources except memory. */
if (cstate->is_program)
{
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 7cacd0b752..49632f75e4 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -172,7 +172,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = cstate->opts.from_routine->CopyFromGetFormat(cstate);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -840,199 +840,219 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
return true;
}
-/*
- * Read next tuple from file for COPY FROM. Return false if no more tuples.
- *
- * 'econtext' is used to evaluate default expression for each column that is
- * either not read from the file or is using the DEFAULT option of COPY FROM.
- * It can be NULL when no default values are used, i.e. when all columns are
- * read from the file, and DEFAULT option is unset.
- *
- * 'values' and 'nulls' arrays must be the same length as columns of the
- * relation passed to BeginCopyFrom. This function fills the arrays.
- */
bool
-NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
- Datum *values, bool *nulls)
+CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
{
TupleDesc tupDesc;
- AttrNumber num_phys_attrs,
- attr_count,
- num_defaults = cstate->num_defaults;
+ AttrNumber attr_count;
FmgrInfo *in_functions = cstate->in_functions;
Oid *typioparams = cstate->typioparams;
- int i;
- int *defmap = cstate->defmap;
ExprState **defexprs = cstate->defexprs;
+ char **field_strings;
+ ListCell *cur;
+ int fldct;
+ int fieldno;
+ char *string;
tupDesc = RelationGetDescr(cstate->rel);
- num_phys_attrs = tupDesc->natts;
attr_count = list_length(cstate->attnumlist);
- /* Initialize all values for row to NULL */
- MemSet(values, 0, num_phys_attrs * sizeof(Datum));
- MemSet(nulls, true, num_phys_attrs * sizeof(bool));
- MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
+ /* read raw fields in the next line */
+ if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
+ return false;
- if (!cstate->opts.binary)
- {
- char **field_strings;
- ListCell *cur;
- int fldct;
- int fieldno;
- char *string;
+ /* check for overflowing fields */
+ if (attr_count > 0 && fldct > attr_count)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("extra data after last expected column")));
- /* read raw fields in the next line */
- if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
- return false;
+ fieldno = 0;
- /* check for overflowing fields */
- if (attr_count > 0 && fldct > attr_count)
+ /* Loop to read the user attributes on the line. */
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ if (fieldno >= fldct)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("extra data after last expected column")));
+ errmsg("missing data for column \"%s\"",
+ NameStr(att->attname))));
+ string = field_strings[fieldno++];
- fieldno = 0;
-
- /* Loop to read the user attributes on the line. */
- foreach(cur, cstate->attnumlist)
+ if (cstate->convert_select_flags &&
+ !cstate->convert_select_flags[m])
{
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
- if (fieldno >= fldct)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("missing data for column \"%s\"",
- NameStr(att->attname))));
- string = field_strings[fieldno++];
-
- if (cstate->convert_select_flags &&
- !cstate->convert_select_flags[m])
- {
- /* ignore input field, leaving column as NULL */
- continue;
- }
+ /* ignore input field, leaving column as NULL */
+ continue;
+ }
- if (cstate->opts.csv_mode)
+ if (cstate->opts.csv_mode)
+ {
+ if (string == NULL &&
+ cstate->opts.force_notnull_flags[m])
{
- if (string == NULL &&
- cstate->opts.force_notnull_flags[m])
- {
- /*
- * FORCE_NOT_NULL option is set and column is NULL -
- * convert it to the NULL string.
- */
- string = cstate->opts.null_print;
- }
- else if (string != NULL && cstate->opts.force_null_flags[m]
- && strcmp(string, cstate->opts.null_print) == 0)
- {
- /*
- * FORCE_NULL option is set and column matches the NULL
- * string. It must have been quoted, or otherwise the
- * string would already have been set to NULL. Convert it
- * to NULL as specified.
- */
- string = NULL;
- }
+ /*
+ * FORCE_NOT_NULL option is set and column is NULL - convert
+ * it to the NULL string.
+ */
+ string = cstate->opts.null_print;
}
-
- cstate->cur_attname = NameStr(att->attname);
- cstate->cur_attval = string;
-
- if (string != NULL)
- nulls[m] = false;
-
- if (cstate->defaults[m])
+ else if (string != NULL && cstate->opts.force_null_flags[m]
+ && strcmp(string, cstate->opts.null_print) == 0)
{
/*
- * The caller must supply econtext and have switched into the
- * per-tuple memory context in it.
+ * FORCE_NULL option is set and column matches the NULL
+ * string. It must have been quoted, or otherwise the string
+ * would already have been set to NULL. Convert it to NULL as
+ * specified.
*/
- Assert(econtext != NULL);
- Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
-
- values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
+ string = NULL;
}
+ }
+
+ cstate->cur_attname = NameStr(att->attname);
+ cstate->cur_attval = string;
+
+ if (string != NULL)
+ nulls[m] = false;
+ if (cstate->defaults[m])
+ {
/*
- * If ON_ERROR is specified with IGNORE, skip rows with soft
- * errors
+ * The caller must supply econtext and have switched into the
+ * per-tuple memory context in it.
*/
- else if (!InputFunctionCallSafe(&in_functions[m],
- string,
- typioparams[m],
- att->atttypmod,
- (Node *) cstate->escontext,
- &values[m]))
- {
- cstate->num_errors++;
- return true;
- }
+ Assert(econtext != NULL);
+ Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
+ values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
- Assert(fieldno == attr_count);
+ /*
+ * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+ */
+ else if (!InputFunctionCallSafe(&in_functions[m],
+ string,
+ typioparams[m],
+ att->atttypmod,
+ (Node *) cstate->escontext,
+ &values[m]))
+ {
+ cstate->num_errors++;
+ return true;
+ }
+
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
}
- else
- {
- /* binary */
- int16 fld_count;
- ListCell *cur;
- cstate->cur_lineno++;
+ Assert(fieldno == attr_count);
- if (!CopyGetInt16(cstate, &fld_count))
- {
- /* EOF detected (end of file, or protocol-level EOF) */
- return false;
- }
+ return true;
+}
- if (fld_count == -1)
- {
- /*
- * Received EOF marker. Wait for the protocol-level EOF, and
- * complain if it doesn't come immediately. In COPY FROM STDIN,
- * this ensures that we correctly handle CopyFail, if client
- * chooses to send that now. When copying from file, we could
- * ignore the rest of the file like in text mode, but we choose to
- * be consistent with the COPY FROM STDIN case.
- */
- char dummy;
+bool
+CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+ TupleDesc tupDesc;
+ AttrNumber attr_count;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ int16 fld_count;
+ ListCell *cur;
- if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("received copy data after EOF marker")));
- return false;
- }
+ tupDesc = RelationGetDescr(cstate->rel);
+ attr_count = list_length(cstate->attnumlist);
- if (fld_count != attr_count)
+ cstate->cur_lineno++;
+
+ if (!CopyGetInt16(cstate, &fld_count))
+ {
+ /* EOF detected (end of file, or protocol-level EOF) */
+ return false;
+ }
+
+ if (fld_count == -1)
+ {
+ /*
+ * Received EOF marker. Wait for the protocol-level EOF, and complain
+ * if it doesn't come immediately. In COPY FROM STDIN, this ensures
+ * that we correctly handle CopyFail, if client chooses to send that
+ * now. When copying from file, we could ignore the rest of the file
+ * like in text mode, but we choose to be consistent with the COPY
+ * FROM STDIN case.
+ */
+ char dummy;
+
+ if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("row field count is %d, expected %d",
- (int) fld_count, attr_count)));
+ errmsg("received copy data after EOF marker")));
+ return false;
+ }
- foreach(cur, cstate->attnumlist)
- {
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
- cstate->cur_attname = NameStr(att->attname);
- values[m] = CopyReadBinaryAttribute(cstate,
- &in_functions[m],
- typioparams[m],
- att->atttypmod,
- &nulls[m]);
- cstate->cur_attname = NULL;
- }
+ if (fld_count != attr_count)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("row field count is %d, expected %d",
+ (int) fld_count, attr_count)));
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+ values[m] = CopyReadBinaryAttribute(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
}
+ return true;
+}
+
+/*
+ * Read next tuple from file for COPY FROM. Return false if no more tuples.
+ *
+ * 'econtext' is used to evaluate default expression for each column that is
+ * either not read from the file or is using the DEFAULT option of COPY FROM.
+ * It can be NULL when no default values are used, i.e. when all columns are
+ * read from the file, and DEFAULT option is unset.
+ *
+ * 'values' and 'nulls' arrays must be the same length as columns of the
+ * relation passed to BeginCopyFrom. This function fills the arrays.
+ */
+bool
+NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
+ Datum *values, bool *nulls)
+{
+ TupleDesc tupDesc;
+ AttrNumber num_phys_attrs,
+ num_defaults = cstate->num_defaults;
+ int i;
+ int *defmap = cstate->defmap;
+ ExprState **defexprs = cstate->defexprs;
+
+ tupDesc = RelationGetDescr(cstate->rel);
+ num_phys_attrs = tupDesc->natts;
+
+ /* Initialize all values for row to NULL */
+ MemSet(values, 0, num_phys_attrs * sizeof(Datum));
+ MemSet(nulls, true, num_phys_attrs * sizeof(bool));
+ MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
+
+ if (!cstate->opts.from_routine->CopyFromOneRow(cstate, econtext, values,
+ nulls))
+ return false;
+
/*
* Now compute and insert any defaults available for the columns not
* provided by the input data. Anything not processed here or above will
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index b3f4682f95..df29d42555 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -20,9 +20,6 @@
#include "parser/parse_node.h"
#include "tcop/dest.h"
-/* This is private in commands/copyfrom.c */
-typedef struct CopyFromStateData *CopyFromState;
-
typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
extern void DoCopy(ParseState *pstate, const CopyStmt *stmt,
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index ffad433a21..323e4705d2 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -18,6 +18,49 @@
#include "executor/tuptable.h"
#include "nodes/parsenodes.h"
+/* This is private in commands/copyfrom.c */
+typedef struct CopyFromStateData *CopyFromState;
+
+typedef bool (*CopyFromProcessOption_function) (CopyFromState cstate, DefElem *defel);
+typedef int16 (*CopyFromGetFormat_function) (CopyFromState cstate);
+typedef void (*CopyFromStart_function) (CopyFromState cstate, TupleDesc tupDesc);
+typedef bool (*CopyFromOneRow_function) (CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls);
+typedef void (*CopyFromEnd_function) (CopyFromState cstate);
+
+/* Routines for a COPY FROM format implementation. */
+typedef struct CopyFromRoutine
+{
+ /*
+ * Called for processing one COPY FROM option. This will return false when
+ * the given option is invalid.
+ */
+ CopyFromProcessOption_function CopyFromProcessOption;
+
+ /*
+ * Called when COPY FROM is started. This will return a format as int16
+ * value. It's used for the CopyInResponse message.
+ */
+ CopyFromGetFormat_function CopyFromGetFormat;
+
+ /*
+ * Called when COPY FROM is started. This will initialize something and
+ * receive a header.
+ */
+ CopyFromStart_function CopyFromStart;
+
+ /* Copy one row. It returns false if no more tuples. */
+ CopyFromOneRow_function CopyFromOneRow;
+
+ /* Called when COPY FROM is ended. This will finalize something. */
+ CopyFromEnd_function CopyFromEnd;
+} CopyFromRoutine;
+
+/* Built-in CopyFromRoutine for "text", "csv" and "binary". */
+extern CopyFromRoutine CopyFromRoutineText;
+extern CopyFromRoutine CopyFromRoutineCSV;
+extern CopyFromRoutine CopyFromRoutineBinary;
+
+
typedef struct CopyToStateData *CopyToState;
typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel);
@@ -113,6 +156,7 @@ typedef struct CopyFormatOptions
bool convert_selectively; /* do selective binary conversion? */
CopyOnErrorChoice on_error; /* what to do when error happened */
List *convert_select; /* list of column names (can be NIL) */
+ CopyFromRoutine *from_routine; /* callback routines for COPY FROM */
CopyToRoutine *to_routine; /* callback routines for COPY TO */
} CopyFormatOptions;
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc78..921c1513f7 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -183,4 +183,8 @@ typedef struct CopyFromStateData
extern void ReceiveCopyBegin(CopyFromState cstate);
extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls);
+
+
#endif /* COPYFROM_INTERNAL_H */
--
2.41.0
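
For readers following the refactoring above: NextCopyFrom() now resets the per-row arrays and then hands row parsing to whichever CopyFromOneRow callback the selected format installed in opts.from_routine. The dispatch pattern can be sketched in a self-contained way roughly as follows. This is only an illustration with simplified stand-in types: DemoState, DemoRoutine and every demo_* name are hypothetical and not part of the patch, and plain int values stand in for Datum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NATTS 2			/* hypothetical number of columns */

typedef struct DemoState DemoState;

/* Stand-in for CopyFromOneRow_function: returns false if no more rows. */
typedef bool (*DemoOneRow_function) (DemoState *state, int *values, bool *nulls);

/* Stand-in for CopyFromRoutine, reduced to the one callback used here. */
typedef struct DemoRoutine
{
	DemoOneRow_function one_row;
} DemoRoutine;

struct DemoState
{
	const DemoRoutine *routine; /* plays the role of opts.from_routine */
	int			rows_left;		/* fake input: number of rows still available */
};

/* A toy "format": every row sets column 0 to 42 and leaves column 1 NULL. */
static bool
demo_text_one_row(DemoState *state, int *values, bool *nulls)
{
	if (state->rows_left == 0)
		return false;
	state->rows_left--;
	values[0] = 42;
	nulls[0] = false;
	return true;
}

static const DemoRoutine demo_routine_text = {demo_text_one_row};

/* Mirrors the shape of NextCopyFrom(): reset the row, then dispatch. */
static bool
demo_next_row(DemoState *state, int *values, bool *nulls)
{
	assert(state->routine != NULL && state->routine->one_row != NULL);

	/* Initialize all values for the row to NULL */
	memset(values, 0, DEMO_NATTS * sizeof(int));
	memset(nulls, true, DEMO_NATTS * sizeof(bool));

	return state->routine->one_row(state, values, nulls);
}
```

Because the caller resets nulls[] before every dispatch, a format callback only has to fill in the columns it actually read; everything else stays NULL for the later default-value step.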
[application/octet-stream] v8-0008-Add-support-for-implementing-custom-COPY-FROM-for.patch (4.5K, 7-v8-0008-Add-support-for-implementing-custom-COPY-FROM-for.patch)
From 3e847de1acb2fd6966ef01192204448711ca3d5e Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <[email protected]>
Date: Wed, 24 Jan 2024 14:19:08 +0900
Subject: [PATCH v8 08/10] Add support for implementing custom COPY FROM format
as extension
* Add CopyFromStateData::opaque, which a custom COPY FROM format
implementation can use to keep its private data
* Export CopyReadBinaryData() so that extensions can read the next
chunk of input data
* Rename CopyReadBinaryData() to CopyFromStateRead() because it's a
method of CopyFromState and "BinaryData" is redundant
---
src/backend/commands/copyfromparse.c | 21 ++++++++++-----------
src/include/commands/copyapi.h | 5 +++++
2 files changed, 15 insertions(+), 11 deletions(-)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index a78a790060..f8a194635d 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -165,7 +165,6 @@ static int CopyGetData(CopyFromState cstate, void *databuf,
static inline bool CopyGetInt32(CopyFromState cstate, int32 *val);
static inline bool CopyGetInt16(CopyFromState cstate, int16 *val);
static void CopyLoadInputBuf(CopyFromState cstate);
-static int CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes);
void
ReceiveCopyBegin(CopyFromState cstate)
@@ -194,7 +193,7 @@ ReceiveCopyBinaryHeader(CopyFromState cstate)
int32 tmp;
/* Signature */
- if (CopyReadBinaryData(cstate, readSig, 11) != 11 ||
+ if (CopyFromStateRead(cstate, readSig, 11) != 11 ||
memcmp(readSig, BinarySignature, 11) != 0)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
@@ -222,7 +221,7 @@ ReceiveCopyBinaryHeader(CopyFromState cstate)
/* Skip extension header, if present */
while (tmp-- > 0)
{
- if (CopyReadBinaryData(cstate, readSig, 1) != 1)
+ if (CopyFromStateRead(cstate, readSig, 1) != 1)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
errmsg("invalid COPY file header (wrong length)")));
@@ -364,7 +363,7 @@ CopyGetInt32(CopyFromState cstate, int32 *val)
{
uint32 buf;
- if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
+ if (CopyFromStateRead(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
{
*val = 0; /* suppress compiler warning */
return false;
@@ -381,7 +380,7 @@ CopyGetInt16(CopyFromState cstate, int16 *val)
{
uint16 buf;
- if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
+ if (CopyFromStateRead(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
{
*val = 0; /* suppress compiler warning */
return false;
@@ -692,14 +691,14 @@ CopyLoadInputBuf(CopyFromState cstate)
}
/*
- * CopyReadBinaryData
+ * CopyFromStateRead
*
* Reads up to 'nbytes' bytes from cstate->copy_file via cstate->raw_buf
* and writes them to 'dest'. Returns the number of bytes read (which
* would be less than 'nbytes' only if we reach EOF).
*/
-static int
-CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
+int
+CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes)
{
int copied_bytes = 0;
@@ -988,7 +987,7 @@ CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
*/
char dummy;
- if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
+ if (CopyFromStateRead(cstate, &dummy, 1) > 0)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
errmsg("received copy data after EOF marker")));
@@ -1997,8 +1996,8 @@ CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
resetStringInfo(&cstate->attribute_buf);
enlargeStringInfo(&cstate->attribute_buf, fld_size);
- if (CopyReadBinaryData(cstate, cstate->attribute_buf.data,
- fld_size) != fld_size)
+ if (CopyFromStateRead(cstate, cstate->attribute_buf.data,
+ fld_size) != fld_size)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
errmsg("unexpected EOF in COPY data")));
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index b7e8f627bf..22accc83ab 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -314,8 +314,13 @@ typedef struct CopyFromStateData
#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
uint64 bytes_processed; /* number of bytes processed so far */
+
+ /* For custom format implementation */
+ void *opaque; /* private space */
} CopyFromStateData;
+extern int CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes);
+
/*
* Represents the different dest cases we need to worry about at
* the bottom level
--
2.41.0
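
To illustrate how an extension might combine the two additions in this patch, here is a minimal self-contained sketch. It does not use the real PostgreSQL headers: DemoFromState, DemoFormatData and the demo_* functions are hypothetical stand-ins. demo_state_read() imitates only the documented contract of CopyFromStateRead() (it returns fewer than nbytes bytes only at EOF), and the row header layout (a big-endian int16 field count, with -1 as the end marker) is borrowed from the built-in binary format purely as an example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for CopyFromStateData, reduced to what the sketch needs. */
typedef struct DemoFromState
{
	const unsigned char *src;	/* fake data source */
	size_t		src_len;
	size_t		src_pos;
	void	   *opaque;			/* private space for the format handler */
} DemoFromState;

/* Hypothetical per-format state that the handler hangs off 'opaque'. */
typedef struct DemoFormatData
{
	uint64_t	rows_seen;
} DemoFormatData;

/* Reads up to nbytes; the result is less than nbytes only at EOF. */
static int
demo_state_read(DemoFromState *state, char *dest, int nbytes)
{
	size_t		avail = state->src_len - state->src_pos;
	size_t		n;

	assert(nbytes >= 0);
	n = ((size_t) nbytes < avail) ? (size_t) nbytes : avail;
	memcpy(dest, state->src + state->src_pos, n);
	state->src_pos += n;
	return (int) n;
}

/*
 * Reads the per-row field count.  Returns false on EOF or when the
 * end marker (-1) is found; otherwise counts the row in the private state.
 */
static bool
demo_read_field_count(DemoFromState *state, int16_t *count)
{
	unsigned char buf[2];
	DemoFormatData *priv = (DemoFormatData *) state->opaque;

	if (demo_state_read(state, (char *) buf, (int) sizeof(buf)) != (int) sizeof(buf))
		return false;			/* short read means EOF */

	/* network byte order to host order */
	*count = (int16_t) (((uint16_t) buf[0] << 8) | buf[1]);
	if (*count == -1)
		return false;			/* end marker */

	priv->rows_seen++;
	return true;
}
```

Keeping the format's bookkeeping behind the opaque pointer means the core state struct does not need a new field for every custom format.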
[application/octet-stream] v8-0007-Export-CopyFromStateData.patch (17.0K, 8-v8-0007-Export-CopyFromStateData.patch)
From 1ed575fda7f196ea411e9e53dd9c0739f160fb78 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <[email protected]>
Date: Wed, 24 Jan 2024 14:16:29 +0900
Subject: [PATCH v8 07/10] Export CopyFromStateData
This is for custom COPY FROM format handlers implemented as extensions.
This only moves code. It doesn't change any code except the CopySource
enum values. Changing the CopySource enum values isn't required, but I
did it for consistency with the CopyDest enum values: the COPY_ prefix
becomes COPY_SOURCE_, for example COPY_FILE becomes COPY_SOURCE_FILE.
Note that this change alone isn't enough to implement a custom COPY FROM
format handler as an extension. A subsequent commit will do the
following:
1. Add an opaque space for the custom COPY FROM format handler
2. Export CopyReadBinaryData() to read the next data
---
src/backend/commands/copyfrom.c | 4 +-
src/backend/commands/copyfromparse.c | 10 +-
src/include/commands/copy.h | 2 -
src/include/commands/copyapi.h | 156 ++++++++++++++++++++++-
src/include/commands/copyfrom_internal.h | 150 ----------------------
5 files changed, 162 insertions(+), 160 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index d556ebb5d6..b4ac7cbd2c 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1710,7 +1710,7 @@ BeginCopyFrom(ParseState *pstate,
pg_encoding_to_char(GetDatabaseEncoding()))));
}
- cstate->copy_src = COPY_FILE; /* default */
+ cstate->copy_src = COPY_SOURCE_FILE; /* default */
cstate->whereClause = whereClause;
@@ -1829,7 +1829,7 @@ BeginCopyFrom(ParseState *pstate,
if (data_source_cb)
{
progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
- cstate->copy_src = COPY_CALLBACK;
+ cstate->copy_src = COPY_SOURCE_CALLBACK;
cstate->data_source_cb = data_source_cb;
}
else if (pipe)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 49632f75e4..a78a790060 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -181,7 +181,7 @@ ReceiveCopyBegin(CopyFromState cstate)
for (i = 0; i < natts; i++)
pq_sendint16(&buf, format); /* per-column formats */
pq_endmessage(&buf);
- cstate->copy_src = COPY_FRONTEND;
+ cstate->copy_src = COPY_SOURCE_FRONTEND;
cstate->fe_msgbuf = makeStringInfo();
/* We *must* flush here to ensure FE knows it can send. */
pq_flush();
@@ -249,7 +249,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
switch (cstate->copy_src)
{
- case COPY_FILE:
+ case COPY_SOURCE_FILE:
bytesread = fread(databuf, 1, maxread, cstate->copy_file);
if (ferror(cstate->copy_file))
ereport(ERROR,
@@ -258,7 +258,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
if (bytesread == 0)
cstate->raw_reached_eof = true;
break;
- case COPY_FRONTEND:
+ case COPY_SOURCE_FRONTEND:
while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof)
{
int avail;
@@ -341,7 +341,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
bytesread += avail;
}
break;
- case COPY_CALLBACK:
+ case COPY_SOURCE_CALLBACK:
bytesread = cstate->data_source_cb(databuf, minread, maxread);
break;
}
@@ -1099,7 +1099,7 @@ CopyReadLine(CopyFromState cstate)
* after \. up to the protocol end of copy data. (XXX maybe better
* not to treat \. as special?)
*/
- if (cstate->copy_src == COPY_FRONTEND)
+ if (cstate->copy_src == COPY_SOURCE_FRONTEND)
{
int inbytes;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index df29d42555..cd41d32074 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -20,8 +20,6 @@
#include "parser/parse_node.h"
#include "tcop/dest.h"
-typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
-
extern void DoCopy(ParseState *pstate, const CopyStmt *stmt,
int stmt_location, int stmt_len,
uint64 *processed);
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index ef1bb201c2..b7e8f627bf 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -14,11 +14,12 @@
#ifndef COPYAPI_H
#define COPYAPI_H
+#include "commands/trigger.h"
#include "executor/execdesc.h"
#include "executor/tuptable.h"
+#include "nodes/miscnodes.h"
#include "nodes/parsenodes.h"
-/* This is private in commands/copyfrom.c */
typedef struct CopyFromStateData *CopyFromState;
typedef bool (*CopyFromProcessOption_function) (CopyFromState cstate, DefElem *defel);
@@ -162,6 +163,159 @@ typedef struct CopyFormatOptions
CopyToRoutine *to_routine; /* callback routines for COPY TO */
} CopyFormatOptions;
+
+/*
+ * Represents the different source cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopySource
+{
+ COPY_SOURCE_FILE, /* from file (or a piped program) */
+ COPY_SOURCE_FRONTEND, /* from frontend */
+ COPY_SOURCE_CALLBACK, /* from callback function */
+} CopySource;
+
+/*
+ * Represents the end-of-line terminator type of the input
+ */
+typedef enum EolType
+{
+ EOL_UNKNOWN,
+ EOL_NL,
+ EOL_CR,
+ EOL_CRNL,
+} EolType;
+
+typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+
+/*
+ * This struct contains all the state variables used throughout a COPY FROM
+ * operation.
+ */
+typedef struct CopyFromStateData
+{
+ /* low-level state data */
+ CopySource copy_src; /* type of copy source */
+ FILE *copy_file; /* used if copy_src == COPY_SOURCE_FILE */
+ StringInfo fe_msgbuf; /* used if copy_src == COPY_SOURCE_FRONTEND */
+
+ EolType eol_type; /* EOL type of input */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ Oid conversion_proc; /* encoding conversion function */
+
+ /* parameters from the COPY command */
+ Relation rel; /* relation to copy from */
+ List *attnumlist; /* integer list of attnums to copy */
+ char *filename; /* filename, or NULL for STDIN */
+ bool is_program; /* is 'filename' a program to popen? */
+ copy_data_source_cb data_source_cb; /* function for reading data */
+
+ CopyFormatOptions opts;
+ bool *convert_select_flags; /* per-column CSV/TEXT CS flags */
+ Node *whereClause; /* WHERE condition (or NULL) */
+
+ /* these are just for error messages, see CopyFromErrorCallback */
+ const char *cur_relname; /* table name for error messages */
+ uint64 cur_lineno; /* line number for error messages */
+ const char *cur_attname; /* current att for error messages */
+ const char *cur_attval; /* current att value for error messages */
+ bool relname_only; /* don't output line number, att, etc. */
+
+ /*
+ * Working state
+ */
+ MemoryContext copycontext; /* per-copy execution context */
+
+ AttrNumber num_defaults; /* count of att that are missing and have
+ * default value */
+ FmgrInfo *in_functions; /* array of input functions for each attrs */
+ Oid *typioparams; /* array of element types for in_functions */
+ ErrorSaveContext *escontext; /* soft error trapper during in_functions
+ * execution */
+ uint64 num_errors; /* total number of rows which contained soft
+ * errors */
+ int *defmap; /* array of default att numbers related to
+ * missing att */
+ ExprState **defexprs; /* array of default att expressions for all
+ * att */
+ bool *defaults; /* if DEFAULT marker was found for
+ * corresponding att */
+ bool volatile_defexprs; /* is any of defexprs volatile? */
+ List *range_table; /* single element list of RangeTblEntry */
+ List *rteperminfos; /* single element list of RTEPermissionInfo */
+ ExprState *qualexpr;
+
+ TransitionCaptureState *transition_capture;
+
+ /*
+ * These variables are used to reduce overhead in COPY FROM.
+ *
+ * attribute_buf holds the separated, de-escaped text for each field of
+ * the current line. The CopyReadAttributes functions return arrays of
+ * pointers into this buffer. We avoid palloc/pfree overhead by re-using
+ * the buffer on each cycle.
+ *
+ * In binary COPY FROM, attribute_buf holds the binary data for the
+ * current field, but the usage is otherwise similar.
+ */
+ StringInfoData attribute_buf;
+
+ /* field raw data pointers found by COPY FROM */
+
+ int max_fields;
+ char **raw_fields;
+
+ /*
+ * Similarly, line_buf holds the whole input line being processed. The
+ * input cycle is first to read the whole line into line_buf, and then
+ * extract the individual attribute fields into attribute_buf. line_buf
+ * is preserved unmodified so that we can display it in error messages if
+ * appropriate. (In binary mode, line_buf is not used.)
+ */
+ StringInfoData line_buf;
+ bool line_buf_valid; /* contains the row being processed? */
+
+ /*
+ * input_buf holds input data, already converted to database encoding.
+ *
+ * In text mode, CopyReadLine parses this data sufficiently to locate line
+ * boundaries, then transfers the data to line_buf. We guarantee that
+ * there is a \0 at input_buf[input_buf_len] at all times. (In binary
+ * mode, input_buf is not used.)
+ *
+ * If encoding conversion is not required, input_buf is not a separate
+ * buffer but points directly to raw_buf. In that case, input_buf_len
+ * tracks the number of bytes that have been verified as valid in the
+ * database encoding, and raw_buf_len is the total number of bytes stored
+ * in the buffer.
+ */
+#define INPUT_BUF_SIZE 65536 /* we palloc INPUT_BUF_SIZE+1 bytes */
+ char *input_buf;
+ int input_buf_index; /* next byte to process */
+ int input_buf_len; /* total # of bytes stored */
+ bool input_reached_eof; /* true if we reached EOF */
+ bool input_reached_error; /* true if a conversion error happened */
+ /* Shorthand for number of unconsumed bytes available in input_buf */
+#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index)
+
+ /*
+ * raw_buf holds raw input data read from the data source (file or client
+ * connection), not yet converted to the database encoding. Like with
+ * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len].
+ */
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+ char *raw_buf;
+ int raw_buf_index; /* next byte to process */
+ int raw_buf_len; /* total # of bytes stored */
+ bool raw_reached_eof; /* true if we reached EOF */
+
+ /* Shorthand for number of unconsumed bytes available in raw_buf */
+#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
+
+ uint64 bytes_processed; /* number of bytes processed so far */
+} CopyFromStateData;
+
/*
* Represents the different dest cases we need to worry about at
* the bottom level
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 921c1513f7..f8f6120255 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -18,28 +18,6 @@
#include "commands/trigger.h"
#include "nodes/miscnodes.h"
-/*
- * Represents the different source cases we need to worry about at
- * the bottom level
- */
-typedef enum CopySource
-{
- COPY_FILE, /* from file (or a piped program) */
- COPY_FRONTEND, /* from frontend */
- COPY_CALLBACK, /* from callback function */
-} CopySource;
-
-/*
- * Represents the end-of-line terminator type of the input
- */
-typedef enum EolType
-{
- EOL_UNKNOWN,
- EOL_NL,
- EOL_CR,
- EOL_CRNL,
-} EolType;
-
/*
* Represents the insert method to be used during COPY FROM.
*/
@@ -52,134 +30,6 @@ typedef enum CopyInsertMethod
* ExecForeignBatchInsert only if valid */
} CopyInsertMethod;
-/*
- * This struct contains all the state variables used throughout a COPY FROM
- * operation.
- */
-typedef struct CopyFromStateData
-{
- /* low-level state data */
- CopySource copy_src; /* type of copy source */
- FILE *copy_file; /* used if copy_src == COPY_FILE */
- StringInfo fe_msgbuf; /* used if copy_src == COPY_FRONTEND */
-
- EolType eol_type; /* EOL type of input */
- int file_encoding; /* file or remote side's character encoding */
- bool need_transcoding; /* file encoding diff from server? */
- Oid conversion_proc; /* encoding conversion function */
-
- /* parameters from the COPY command */
- Relation rel; /* relation to copy from */
- List *attnumlist; /* integer list of attnums to copy */
- char *filename; /* filename, or NULL for STDIN */
- bool is_program; /* is 'filename' a program to popen? */
- copy_data_source_cb data_source_cb; /* function for reading data */
-
- CopyFormatOptions opts;
- bool *convert_select_flags; /* per-column CSV/TEXT CS flags */
- Node *whereClause; /* WHERE condition (or NULL) */
-
- /* these are just for error messages, see CopyFromErrorCallback */
- const char *cur_relname; /* table name for error messages */
- uint64 cur_lineno; /* line number for error messages */
- const char *cur_attname; /* current att for error messages */
- const char *cur_attval; /* current att value for error messages */
- bool relname_only; /* don't output line number, att, etc. */
-
- /*
- * Working state
- */
- MemoryContext copycontext; /* per-copy execution context */
-
- AttrNumber num_defaults; /* count of att that are missing and have
- * default value */
- FmgrInfo *in_functions; /* array of input functions for each attrs */
- Oid *typioparams; /* array of element types for in_functions */
- ErrorSaveContext *escontext; /* soft error trapper during in_functions
- * execution */
- uint64 num_errors; /* total number of rows which contained soft
- * errors */
- int *defmap; /* array of default att numbers related to
- * missing att */
- ExprState **defexprs; /* array of default att expressions for all
- * att */
- bool *defaults; /* if DEFAULT marker was found for
- * corresponding att */
- bool volatile_defexprs; /* is any of defexprs volatile? */
- List *range_table; /* single element list of RangeTblEntry */
- List *rteperminfos; /* single element list of RTEPermissionInfo */
- ExprState *qualexpr;
-
- TransitionCaptureState *transition_capture;
-
- /*
- * These variables are used to reduce overhead in COPY FROM.
- *
- * attribute_buf holds the separated, de-escaped text for each field of
- * the current line. The CopyReadAttributes functions return arrays of
- * pointers into this buffer. We avoid palloc/pfree overhead by re-using
- * the buffer on each cycle.
- *
- * In binary COPY FROM, attribute_buf holds the binary data for the
- * current field, but the usage is otherwise similar.
- */
- StringInfoData attribute_buf;
-
- /* field raw data pointers found by COPY FROM */
-
- int max_fields;
- char **raw_fields;
-
- /*
- * Similarly, line_buf holds the whole input line being processed. The
- * input cycle is first to read the whole line into line_buf, and then
- * extract the individual attribute fields into attribute_buf. line_buf
- * is preserved unmodified so that we can display it in error messages if
- * appropriate. (In binary mode, line_buf is not used.)
- */
- StringInfoData line_buf;
- bool line_buf_valid; /* contains the row being processed? */
-
- /*
- * input_buf holds input data, already converted to database encoding.
- *
- * In text mode, CopyReadLine parses this data sufficiently to locate line
- * boundaries, then transfers the data to line_buf. We guarantee that
- * there is a \0 at input_buf[input_buf_len] at all times. (In binary
- * mode, input_buf is not used.)
- *
- * If encoding conversion is not required, input_buf is not a separate
- * buffer but points directly to raw_buf. In that case, input_buf_len
- * tracks the number of bytes that have been verified as valid in the
- * database encoding, and raw_buf_len is the total number of bytes stored
- * in the buffer.
- */
-#define INPUT_BUF_SIZE 65536 /* we palloc INPUT_BUF_SIZE+1 bytes */
- char *input_buf;
- int input_buf_index; /* next byte to process */
- int input_buf_len; /* total # of bytes stored */
- bool input_reached_eof; /* true if we reached EOF */
- bool input_reached_error; /* true if a conversion error happened */
- /* Shorthand for number of unconsumed bytes available in input_buf */
-#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index)
-
- /*
- * raw_buf holds raw input data read from the data source (file or client
- * connection), not yet converted to the database encoding. Like with
- * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len].
- */
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
- char *raw_buf;
- int raw_buf_index; /* next byte to process */
- int raw_buf_len; /* total # of bytes stored */
- bool raw_reached_eof; /* true if we reached EOF */
-
- /* Shorthand for number of unconsumed bytes available in raw_buf */
-#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
-
- uint64 bytes_processed; /* number of bytes processed so far */
-} CopyFromStateData;
-
extern void ReceiveCopyBegin(CopyFromState cstate);
extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
--
2.41.0
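
The raw_buf comments in the struct exported above describe three invariants: the buffer is allocated with one extra byte, raw_buf[raw_buf_len] always holds a \0, and RAW_BUF_BYTES() is the number of unconsumed bytes. The following self-contained sketch shows one plausible way a refill step could keep those invariants. It is an assumption for illustration, not a copy of the PostgreSQL loader: DemoRawBuf, DEMO_RAW_BUF_SIZE, DEMO_RAW_BUF_BYTES and demo_raw_buf_load() are hypothetical names, and the buffer is deliberately tiny.

```c
#include <assert.h>
#include <string.h>

#define DEMO_RAW_BUF_SIZE 8		/* tiny on purpose; the real one is 65536 */

typedef struct DemoRawBuf
{
	char		data[DEMO_RAW_BUF_SIZE + 1];	/* +1 for the \0 sentinel */
	int			index;			/* next byte to process */
	int			len;			/* total # of bytes stored */
} DemoRawBuf;

/* Shorthand for number of unconsumed bytes, like RAW_BUF_BYTES() */
#define DEMO_RAW_BUF_BYTES(buf) ((buf)->len - (buf)->index)

/*
 * Moves any unconsumed bytes to the front of the buffer, appends as many
 * of the nbytes source bytes as fit, and restores the \0 sentinel.
 * Returns the number of bytes accepted.
 */
static int
demo_raw_buf_load(DemoRawBuf *buf, const char *src, int nbytes)
{
	int			unread = DEMO_RAW_BUF_BYTES(buf);
	int			room;

	assert(nbytes >= 0 && unread >= 0);

	/* Copy down the unprocessed data, if any */
	if (unread > 0 && buf->index > 0)
		memmove(buf->data, buf->data + buf->index, (size_t) unread);
	buf->index = 0;
	buf->len = unread;

	room = DEMO_RAW_BUF_SIZE - buf->len;
	if (nbytes > room)
		nbytes = room;
	memcpy(buf->data + buf->len, src, (size_t) nbytes);
	buf->len += nbytes;

	/* Keep the guaranteed terminator at data[len] */
	buf->data[buf->len] = '\0';
	return nbytes;
}
```

The extra byte is what allows the sentinel to be written even when the buffer is completely full.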
[application/octet-stream] v8-0006-Add-support-for-adding-custom-COPY-FROM-format.patch (7.0K, 9-v8-0006-Add-support-for-adding-custom-COPY-FROM-format.patch)
From f48e7b629a8d15fc70cd4cc4737dd2ad61910cc9 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <[email protected]>
Date: Wed, 24 Jan 2024 11:07:14 +0900
Subject: [PATCH v8 06/10] Add support for adding custom COPY FROM format
We use the same approach as we used for the custom COPY TO format. Now,
a custom COPY format handler can return COPY TO format routines or COPY
FROM format routines based on the "is_from" argument:
copy_handler(true) returns CopyFromRoutine
copy_handler(false) returns CopyToRoutine
---
src/backend/commands/copy.c | 53 +++++++++++++------
src/include/commands/copyapi.h | 2 +
.../expected/test_copy_format.out | 12 +++++
.../test_copy_format/sql/test_copy_format.sql | 6 +++
.../test_copy_format/test_copy_format.c | 50 +++++++++++++++--
5 files changed, 105 insertions(+), 18 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index ec6dfff8ab..479f36868c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -472,12 +472,9 @@ ProcessCopyOptionCustomFormat(ParseState *pstate,
}
/* custom format */
- if (!is_from)
- {
- funcargtypes[0] = INTERNALOID;
- handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
- funcargtypes, true);
- }
+ funcargtypes[0] = INTERNALOID;
+ handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+ funcargtypes, true);
if (!OidIsValid(handlerOid))
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -486,14 +483,36 @@ ProcessCopyOptionCustomFormat(ParseState *pstate,
datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from));
routine = DatumGetPointer(datum);
- if (routine == NULL || !IsA(routine, CopyToRoutine))
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY handler function %s(%u) did not return a CopyToRoutine struct",
- format, handlerOid),
- parser_errposition(pstate, defel->location)));
-
- opts_out->to_routine = routine;
+ if (is_from)
+ {
+ if (routine == NULL || !IsA(routine, CopyFromRoutine))
+ ereport(
+ ERROR,
+ (errcode(
+ ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY handler function "
+ "%s(%u) did not return a "
+ "CopyFromRoutine struct",
+ format, handlerOid),
+ parser_errposition(
+ pstate, defel->location)));
+ opts_out->from_routine = routine;
+ }
+ else
+ {
+ if (routine == NULL || !IsA(routine, CopyToRoutine))
+ ereport(
+ ERROR,
+ (errcode(
+ ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY handler function "
+ "%s(%u) did not return a "
+ "CopyToRoutine struct",
+ format, handlerOid),
+ parser_errposition(
+ pstate, defel->location)));
+ opts_out->to_routine = routine;
+ }
}
/*
@@ -692,7 +711,11 @@ ProcessCopyOptions(ParseState *pstate,
{
bool processed = false;
- if (!is_from)
+ if (is_from)
+ processed =
+ opts_out->from_routine->CopyFromProcessOption(
+ cstate, defel);
+ else
processed = opts_out->to_routine->CopyToProcessOption(cstate, defel);
if (!processed)
ereport(ERROR,
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 323e4705d2..ef1bb201c2 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -30,6 +30,8 @@ typedef void (*CopyFromEnd_function) (CopyFromState cstate);
/* Routines for a COPY FROM format implementation. */
typedef struct CopyFromRoutine
{
+ NodeTag type;
+
/*
* Called for processing one COPY FROM option. This will return false when
* the given option is invalid.
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
index 3a24ae7b97..6af69f0eb7 100644
--- a/src/test/modules/test_copy_format/expected/test_copy_format.out
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -1,6 +1,18 @@
CREATE EXTENSION test_copy_format;
CREATE TABLE public.test (a INT, b INT, c INT);
INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (
+ option_before 'before',
+ format 'test_copy_format',
+ option_after 'after'
+);
+NOTICE: test_copy_format: is_from=true
+NOTICE: CopyFromProcessOption: "option_before"="before"
+NOTICE: CopyFromProcessOption: "option_after"="after"
+NOTICE: CopyFromGetFormat
+NOTICE: CopyFromStart: natts=3
+NOTICE: CopyFromOneRow
+NOTICE: CopyFromEnd
COPY public.test TO stdout WITH (
option_before 'before',
format 'test_copy_format',
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
index 0eb7ed2e11..94d3c789a0 100644
--- a/src/test/modules/test_copy_format/sql/test_copy_format.sql
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -1,6 +1,12 @@
CREATE EXTENSION test_copy_format;
CREATE TABLE public.test (a INT, b INT, c INT);
INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (
+ option_before 'before',
+ format 'test_copy_format',
+ option_after 'after'
+);
+\.
COPY public.test TO stdout WITH (
option_before 'before',
format 'test_copy_format',
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index a2219afcde..5e1b40e881 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -18,6 +18,50 @@
PG_MODULE_MAGIC;
+static bool
+CopyFromProcessOption(CopyFromState cstate, DefElem *defel)
+{
+ ereport(NOTICE,
+ (errmsg("CopyFromProcessOption: \"%s\"=\"%s\"",
+ defel->defname, defGetString(defel))));
+ return true;
+}
+
+static int16
+CopyFromGetFormat(CopyFromState cstate)
+{
+ ereport(NOTICE, (errmsg("CopyFromGetFormat")));
+ return 0;
+}
+
+static void
+CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+ ereport(NOTICE, (errmsg("CopyFromStart: natts=%d", tupDesc->natts)));
+}
+
+static bool
+CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+ ereport(NOTICE, (errmsg("CopyFromOneRow")));
+ return false;
+}
+
+static void
+CopyFromEnd(CopyFromState cstate)
+{
+ ereport(NOTICE, (errmsg("CopyFromEnd")));
+}
+
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+ .type = T_CopyFromRoutine,
+ .CopyFromProcessOption = CopyFromProcessOption,
+ .CopyFromGetFormat = CopyFromGetFormat,
+ .CopyFromStart = CopyFromStart,
+ .CopyFromOneRow = CopyFromOneRow,
+ .CopyFromEnd = CopyFromEnd,
+};
+
static bool
CopyToProcessOption(CopyToState cstate, DefElem *defel)
{
@@ -71,7 +115,7 @@ test_copy_format(PG_FUNCTION_ARGS)
(errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
if (is_from)
- elog(ERROR, "COPY FROM isn't supported yet");
-
- PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+ PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat);
+ else
+ PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
}
--
2.41.0
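For readers skimming patch 06, here is a small standalone sketch of the dispatch it introduces: one handler entry point picks the routine table from "is_from", and the caller checks the node tag before using the result. This is not backend code. The NodeTag enum, the one-member structs, and routine_is_valid() are simplified stand-ins invented for illustration; the real patch uses IsA() on the backend's node types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the backend's NodeTag values. */
typedef enum
{
    T_Invalid = 0,
    T_CopyToRoutine,
    T_CopyFromRoutine
} NodeTag;

/* Real routine structs carry callbacks too; only the tag matters here. */
typedef struct CopyToRoutine
{
    NodeTag type;
} CopyToRoutine;

typedef struct CopyFromRoutine
{
    NodeTag type;
} CopyFromRoutine;

static const CopyToRoutine to_routine = {.type = T_CopyToRoutine};
static const CopyFromRoutine from_routine = {.type = T_CopyFromRoutine};

/* One handler, two directions: is_from selects the routine table. */
static const void *
copy_handler(bool is_from)
{
    if (is_from)
        return &from_routine;
    return &to_routine;
}

/*
 * Caller-side check, in the spirit of IsA(): the tag is the first member
 * of either struct, so it can be read through a NodeTag pointer.
 */
static bool
routine_is_valid(const void *routine, bool is_from)
{
    NodeTag expected = is_from ? T_CopyFromRoutine : T_CopyToRoutine;

    return routine != NULL && *(const NodeTag *) routine == expected;
}
```

The tag check is the reason the patch adds a NodeTag member to CopyFromRoutine: without it the caller could not tell a COPY FROM table from a COPY TO table returned by mistake.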
[application/octet-stream] v8-0009-change-CopyToGetFormat-to-CopyToSendCopyBegin-and.patch (7.6K, 10-v8-0009-change-CopyToGetFormat-to-CopyToSendCopyBegin-and.patch)
From f0a8151feff44823881c3c4e1e7aca4f9bd690d5 Mon Sep 17 00:00:00 2001
From: Zhao Junwang <[email protected]>
Date: Sat, 27 Jan 2024 09:53:31 +0800
Subject: [PATCH v8 09/10] change CopyToGetFormat to CopyToSendCopyBegin and
export more api
Signed-off-by: Zhao Junwang <[email protected]>
---
src/backend/commands/copyto.c | 65 ++++++++++---------
src/include/commands/copyapi.h | 12 ++--
.../test_copy_format/test_copy_format.c | 7 +-
3 files changed, 46 insertions(+), 38 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index b5d8678394..e2a4964015 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -66,11 +66,6 @@ static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
/* Low-level communications functions */
static void SendCopyBegin(CopyToState cstate);
static void SendCopyEnd(CopyToState cstate);
-static void CopySendData(CopyToState cstate, const void *databuf, int datasize);
-static void CopySendString(CopyToState cstate, const char *str);
-static void CopySendChar(CopyToState cstate, char c);
-static void CopySendInt32(CopyToState cstate, int32 val);
-static void CopySendInt16(CopyToState cstate, int16 val);
/*
* CopyToRoutine implementations.
@@ -90,10 +85,20 @@ CopyToTextProcessOption(CopyToState cstate, DefElem *defel)
return false;
}
-static int16
-CopyToTextGetFormat(CopyToState cstate)
+static void
+CopyToTextSendCopyBegin(CopyToState cstate)
{
- return 0;
+ StringInfoData buf;
+ int natts = list_length(cstate->attnumlist);
+ int16 format = 0;
+ int i;
+
+ pq_beginmessage(&buf, PqMsg_CopyOutResponse);
+ pq_sendbyte(&buf, format); /* overall format */
+ pq_sendint16(&buf, natts);
+ for (i = 0; i < natts; i++)
+ pq_sendint16(&buf, format); /* per-column formats */
+ pq_endmessage(&buf);
}
static void
@@ -230,10 +235,20 @@ CopyToBinaryProcessOption(CopyToState cstate, DefElem *defel)
return false;
}
-static int16
-CopyToBinaryGetFormat(CopyToState cstate)
+static void
+CopyToBinarySendCopyBegin(CopyToState cstate)
{
- return 1;
+ StringInfoData buf;
+ int natts = list_length(cstate->attnumlist);
+ int16 format = 1;
+ int i;
+
+ pq_beginmessage(&buf, PqMsg_CopyOutResponse);
+ pq_sendbyte(&buf, format); /* overall format */
+ pq_sendint16(&buf, natts);
+ for (i = 0; i < natts; i++)
+ pq_sendint16(&buf, format); /* per-column formats */
+ pq_endmessage(&buf);
}
static void
@@ -315,7 +330,7 @@ CopyToBinaryEnd(CopyToState cstate)
CopyToRoutine CopyToRoutineText = {
.CopyToProcessOption = CopyToTextProcessOption,
- .CopyToGetFormat = CopyToTextGetFormat,
+ .CopyToSendCopyBegin = CopyToTextSendCopyBegin,
.CopyToStart = CopyToTextStart,
.CopyToOneRow = CopyToTextOneRow,
.CopyToEnd = CopyToTextEnd,
@@ -328,7 +343,7 @@ CopyToRoutine CopyToRoutineText = {
*/
CopyToRoutine CopyToRoutineCSV = {
.CopyToProcessOption = CopyToTextProcessOption,
- .CopyToGetFormat = CopyToTextGetFormat,
+ .CopyToSendCopyBegin = CopyToTextSendCopyBegin,
.CopyToStart = CopyToTextStart,
.CopyToOneRow = CopyToTextOneRow,
.CopyToEnd = CopyToTextEnd,
@@ -336,7 +351,7 @@ CopyToRoutine CopyToRoutineCSV = {
CopyToRoutine CopyToRoutineBinary = {
.CopyToProcessOption = CopyToBinaryProcessOption,
- .CopyToGetFormat = CopyToBinaryGetFormat,
+ .CopyToSendCopyBegin = CopyToBinarySendCopyBegin,
.CopyToStart = CopyToBinaryStart,
.CopyToOneRow = CopyToBinaryOneRow,
.CopyToEnd = CopyToBinaryEnd,
@@ -349,17 +364,7 @@ CopyToRoutine CopyToRoutineBinary = {
static void
SendCopyBegin(CopyToState cstate)
{
- StringInfoData buf;
- int natts = list_length(cstate->attnumlist);
- int16 format = cstate->opts.to_routine->CopyToGetFormat(cstate);
- int i;
-
- pq_beginmessage(&buf, PqMsg_CopyOutResponse);
- pq_sendbyte(&buf, format); /* overall format */
- pq_sendint16(&buf, natts);
- for (i = 0; i < natts; i++)
- pq_sendint16(&buf, format); /* per-column formats */
- pq_endmessage(&buf);
+ cstate->opts.to_routine->CopyToSendCopyBegin(cstate);
cstate->copy_dest = COPY_DEST_FRONTEND;
}
@@ -382,19 +387,19 @@ SendCopyEnd(CopyToState cstate)
* NB: no data conversion is applied by these functions
*----------
*/
-static void
+void
CopySendData(CopyToState cstate, const void *databuf, int datasize)
{
appendBinaryStringInfo(cstate->fe_msgbuf, databuf, datasize);
}
-static void
+void
CopySendString(CopyToState cstate, const char *str)
{
appendBinaryStringInfo(cstate->fe_msgbuf, str, strlen(str));
}
-static void
+void
CopySendChar(CopyToState cstate, char c)
{
appendStringInfoCharMacro(cstate->fe_msgbuf, c);
@@ -464,7 +469,7 @@ CopyToStateFlush(CopyToState cstate)
/*
* CopySendInt32 sends an int32 in network byte order
*/
-static inline void
+inline void
CopySendInt32(CopyToState cstate, int32 val)
{
uint32 buf;
@@ -476,7 +481,7 @@ CopySendInt32(CopyToState cstate, int32 val)
/*
* CopySendInt16 sends an int16 in network byte order
*/
-static inline void
+inline void
CopySendInt16(CopyToState cstate, int16 val)
{
uint16 buf;
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 22accc83ab..0a05b24c54 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -67,7 +67,7 @@ extern CopyFromRoutine CopyFromRoutineBinary;
typedef struct CopyToStateData *CopyToState;
typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel);
-typedef int16 (*CopyToGetFormat_function) (CopyToState cstate);
+typedef void (*CopyToSendCopyBegin_function) (CopyToState cstate);
typedef void (*CopyToStart_function) (CopyToState cstate, TupleDesc tupDesc);
typedef void (*CopyToOneRow_function) (CopyToState cstate, TupleTableSlot *slot);
typedef void (*CopyToEnd_function) (CopyToState cstate);
@@ -84,10 +84,9 @@ typedef struct CopyToRoutine
CopyToProcessOption_function CopyToProcessOption;
/*
- * Called when COPY TO is started. This will return a format as int16
- * value. It's used for the CopyOutResponse message.
+ * Called when COPY TO is started.
*/
- CopyToGetFormat_function CopyToGetFormat;
+ CopyToSendCopyBegin_function CopyToSendCopyBegin;
/* Called when COPY TO is started. This will send a header. */
CopyToStart_function CopyToStart;
@@ -384,6 +383,11 @@ typedef struct CopyToStateData
void *opaque; /* private space */
} CopyToStateData;
+extern void CopySendData(CopyToState cstate, const void *databuf, int datasize);
+extern void CopySendString(CopyToState cstate, const char *str);
+extern void CopySendChar(CopyToState cstate, char c);
+extern void CopySendInt32(CopyToState cstate, int32 val);
+extern void CopySendInt16(CopyToState cstate, int16 val);
extern void CopyToStateFlush(CopyToState cstate);
#endif /* COPYAPI_H */
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index 5e1b40e881..d833f22bbf 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -71,11 +71,10 @@ CopyToProcessOption(CopyToState cstate, DefElem *defel)
return true;
}
-static int16
-CopyToGetFormat(CopyToState cstate)
+static void
+CopyToSendCopyBegin(CopyToState cstate)
{
ereport(NOTICE, (errmsg("CopyToGetFormat")));
- return 0;
}
static void
@@ -99,7 +98,7 @@ CopyToEnd(CopyToState cstate)
static const CopyToRoutine CopyToRoutineTestCopyFormat = {
.type = T_CopyToRoutine,
.CopyToProcessOption = CopyToProcessOption,
- .CopyToGetFormat = CopyToGetFormat,
+ .CopyToSendCopyBegin = CopyToSendCopyBegin,
.CopyToStart = CopyToStart,
.CopyToOneRow = CopyToOneRow,
.CopyToEnd = CopyToEnd,
--
2.41.0
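Patch 09 hands the whole CopyOutResponse message to the format routine, which is what later lets the JSON format advertise a single text column. As a rough illustration of the wire layout involved, the sketch below fills a plain byte buffer instead of using pq_beginmessage() and friends. build_copy_out_response() is a hypothetical helper written for this note, not an existing backend function. The layout assumed is the documented one: type byte 'H', an int32 length that counts itself but not the type byte, an int8 overall format, an int16 column count, then one int16 format code per column.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: write a CopyOutResponse message into buf, using the
 * same format code for the overall format and for every column.
 * Returns the total number of bytes written, or 0 if buf is too small.
 */
static size_t
build_copy_out_response(uint8_t *buf, size_t buflen,
                        uint8_t format, uint16_t natts)
{
    /* length field covers itself (4), format (1), count (2), columns (2 each) */
    uint32_t len = 4 + 1 + 2 + 2 * (uint32_t) natts;
    size_t pos = 0;
    uint16_t i;

    if (buflen < (size_t) len + 1)
        return 0;

    buf[pos++] = 'H';                       /* message type */
    buf[pos++] = (uint8_t) (len >> 24);     /* int32 length, network order */
    buf[pos++] = (uint8_t) (len >> 16);
    buf[pos++] = (uint8_t) (len >> 8);
    buf[pos++] = (uint8_t) len;
    buf[pos++] = format;                    /* overall format */
    buf[pos++] = (uint8_t) (natts >> 8);    /* int16 column count */
    buf[pos++] = (uint8_t) natts;
    for (i = 0; i < natts; i++)
    {
        buf[pos++] = 0;                     /* per-column format, high byte */
        buf[pos++] = format;                /* per-column format, low byte */
    }
    return pos;
}
```

With three text columns the length field comes out as 13, which agrees with the "Length: 13" in the tshark capture quoted earlier in the thread. With the single column that pg_copy_json sends, it would be 9.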
[application/octet-stream] v8-0010-introduce-contrib-pg_copy_json.patch (16.3K, 11-v8-0010-introduce-contrib-pg_copy_json.patch)
From 7dc6c1c798178f31728d048d4d528181626b3695 Mon Sep 17 00:00:00 2001
From: Zhao Junwang <[email protected]>
Date: Sat, 27 Jan 2024 13:34:38 +0800
Subject: [PATCH v8 10/10] introduce contrib/pg_copy_json
Signed-off-by: Zhao Junwang <[email protected]>
---
contrib/Makefile | 1 +
contrib/meson.build | 1 +
contrib/pg_copy_json/.gitignore | 4 +
contrib/pg_copy_json/Makefile | 23 ++
.../pg_copy_json/expected/pg_copy_json.out | 80 +++++++
contrib/pg_copy_json/meson.build | 34 +++
contrib/pg_copy_json/pg_copy_json--1.0.sql | 9 +
contrib/pg_copy_json/pg_copy_json.c | 218 ++++++++++++++++++
contrib/pg_copy_json/pg_copy_json.control | 5 +
contrib/pg_copy_json/sql/pg_copy_json.sql | 59 +++++
src/backend/utils/adt/json.c | 5 +-
src/include/utils/json.h | 2 +
12 files changed, 438 insertions(+), 3 deletions(-)
create mode 100644 contrib/pg_copy_json/.gitignore
create mode 100644 contrib/pg_copy_json/Makefile
create mode 100644 contrib/pg_copy_json/expected/pg_copy_json.out
create mode 100644 contrib/pg_copy_json/meson.build
create mode 100644 contrib/pg_copy_json/pg_copy_json--1.0.sql
create mode 100644 contrib/pg_copy_json/pg_copy_json.c
create mode 100644 contrib/pg_copy_json/pg_copy_json.control
create mode 100644 contrib/pg_copy_json/sql/pg_copy_json.sql
diff --git a/contrib/Makefile b/contrib/Makefile
index da4e2316a3..82cc496aa2 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -32,6 +32,7 @@ SUBDIRS = \
pageinspect \
passwordcheck \
pg_buffercache \
+ pg_copy_json \
pg_freespacemap \
pg_prewarm \
pg_stat_statements \
diff --git a/contrib/meson.build b/contrib/meson.build
index c12dc906ca..38933d15d1 100644
--- a/contrib/meson.build
+++ b/contrib/meson.build
@@ -45,6 +45,7 @@ subdir('oid2name')
subdir('pageinspect')
subdir('passwordcheck')
subdir('pg_buffercache')
+subdir('pg_copy_json')
subdir('pgcrypto')
subdir('pg_freespacemap')
subdir('pg_prewarm')
diff --git a/contrib/pg_copy_json/.gitignore b/contrib/pg_copy_json/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/contrib/pg_copy_json/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/pg_copy_json/Makefile b/contrib/pg_copy_json/Makefile
new file mode 100644
index 0000000000..b0a348d618
--- /dev/null
+++ b/contrib/pg_copy_json/Makefile
@@ -0,0 +1,23 @@
+# contrib/pg_copy_json/Makefile
+
+MODULE_big = pg_copy_json
+OBJS = \
+ $(WIN32RES) \
+ pg_copy_json.o
+PGFILEDESC = "pg_copy_json - COPY TO JSON (JavaScript Object Notation) format"
+
+EXTENSION = pg_copy_json
+DATA = pg_copy_json--1.0.sql
+
+REGRESS = test_copy_format
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_copy_json
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_copy_json/expected/pg_copy_json.out b/contrib/pg_copy_json/expected/pg_copy_json.out
new file mode 100644
index 0000000000..73633c2303
--- /dev/null
+++ b/contrib/pg_copy_json/expected/pg_copy_json.out
@@ -0,0 +1,80 @@
+--
+-- COPY TO JSON
+--
+CREATE EXTENSION pg_copy_json;
+-- test copying in JSON format with various styles
+-- of embedded line ending characters
+create temp table copytest (
+ style text,
+ test text,
+ filler int);
+insert into copytest values('DOS',E'abc\r\ndef',1);
+insert into copytest values('Unix',E'abc\ndef',2);
+insert into copytest values('Mac',E'abc\rdef',3);
+insert into copytest values(E'esc\\ape',E'a\\r\\\r\\\n\\nb',4);
+copy copytest to stdout with (format 'json');
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- pg_copy_json do not support COPY FROM
+copy copytest from stdout with (format 'json');
+ERROR: cannot use JSON mode in COPY FROM
+-- test copying in JSON format with various styles
+-- of embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout with (format 'json');
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
+-- test force array
+copy copytest to stdout (format 'json', force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format 'json', force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format 'json', force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
diff --git a/contrib/pg_copy_json/meson.build b/contrib/pg_copy_json/meson.build
new file mode 100644
index 0000000000..71f9338267
--- /dev/null
+++ b/contrib/pg_copy_json/meson.build
@@ -0,0 +1,34 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+pg_copy_json_sources = files(
+ 'pg_copy_json.c',
+)
+
+if host_system == 'windows'
+ pg_copy_json_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+ '--NAME', 'pg_copy_json',
+ '--FILEDESC', 'pg_copy_json - COPY TO JSON format',])
+endif
+
+pg_copy_json = shared_module('pg_copy_json',
+ pg_copy_json_sources,
+ kwargs: contrib_mod_args,
+)
+contrib_targets += pg_copy_json
+
+install_data(
+ 'pg_copy_json--1.0.sql',
+ 'pg_copy_json.control',
+ kwargs: contrib_data_args,
+)
+
+tests += {
+ 'name': 'pg_copy_json',
+ 'sd': meson.current_source_dir(),
+ 'bd': meson.current_build_dir(),
+ 'regress': {
+ 'sql': [
+ 'pg_copy_json',
+ ],
+ },
+}
diff --git a/contrib/pg_copy_json/pg_copy_json--1.0.sql b/contrib/pg_copy_json/pg_copy_json--1.0.sql
new file mode 100644
index 0000000000..d738a1e7e9
--- /dev/null
+++ b/contrib/pg_copy_json/pg_copy_json--1.0.sql
@@ -0,0 +1,9 @@
+/* contrib/pg_copy_json/pg_copy_json--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pg_copy_json" to load this file. \quit
+
+CREATE FUNCTION pg_catalog.json(internal)
+ RETURNS copy_handler
+ AS 'MODULE_PATHNAME', 'copy_json'
+ LANGUAGE C;
diff --git a/contrib/pg_copy_json/pg_copy_json.c b/contrib/pg_copy_json/pg_copy_json.c
new file mode 100644
index 0000000000..cbfdee8e8b
--- /dev/null
+++ b/contrib/pg_copy_json/pg_copy_json.c
@@ -0,0 +1,218 @@
+/*--------------------------------------------------------------------------
+ *
+ * pg_copy_json.c
+ * COPY TO JSON (JavaScript Object Notation) format.
+ *
+ * Portions Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_copy_json/pg_copy_json.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copy.h"
+#include "commands/defrem.h"
+#include "funcapi.h"
+#include "libpq/libpq.h"
+#include "libpq/pqformat.h"
+#include "utils/json.h"
+
+PG_MODULE_MAGIC;
+
+typedef struct
+{
+ /*
+ * Force output of square brackets as array decorations at the beginning
+ * and end of output, with commas between the rows.
+ */
+ bool force_array;
+ bool force_array_specified;
+
+ /* need delimiter to start next json array element */
+ bool json_row_delim_needed;
+} CopyJsonData;
+
+static inline void
+InitCopyJsonData(CopyJsonData *p)
+{
+ Assert(p);
+ p->force_array = false;
+ p->force_array_specified = false;
+ p->json_row_delim_needed = false;
+}
+
+static void
+CopyToJsonSendEndOfRow(CopyToState cstate)
+{
+ switch (cstate->copy_dest)
+ {
+ case COPY_DEST_FILE:
+ /* Default line termination depends on platform */
+#ifndef WIN32
+ CopySendChar(cstate, '\n');
+#else
+ CopySendString(cstate, "\r\n");
+#endif
+ break;
+ case COPY_DEST_FRONTEND:
+ /* The FE/BE protocol uses \n as newline for all platforms */
+ CopySendChar(cstate, '\n');
+ break;
+ default:
+ break;
+ }
+ CopyToStateFlush(cstate);
+}
+
+static bool
+CopyToJsonProcessOption(CopyToState cstate, DefElem *defel)
+{
+ CopyJsonData *p;
+
+ if (cstate->opaque == NULL)
+ {
+ MemoryContext oldcontext;
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+ cstate->opaque = palloc0(sizeof(CopyJsonData));
+ MemoryContextSwitchTo(oldcontext);
+ InitCopyJsonData(cstate->opaque);
+ }
+
+ p = (CopyJsonData *)cstate->opaque;
+
+ if (strcmp(defel->defname, "force_array") == 0)
+ {
+ if (p->force_array_specified)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("CopyToJsonProcessOption: redundant options \"%s\"=\"%s\"",
+ defel->defname, defGetString(defel)));
+ p->force_array_specified = true;
+ p->force_array = defGetBoolean(defel);
+
+ return true;
+ }
+
+ return false;
+}
+
+static void
+CopyToJsonSendCopyBegin(CopyToState cstate)
+{
+ StringInfoData buf;
+ int16 format = 0;
+
+ pq_beginmessage(&buf, PqMsg_CopyOutResponse);
+ pq_sendbyte(&buf, format); /* overall format */
+ /*
+ * JSON mode is always one non-binary column
+ */
+ pq_sendint16(&buf, 1);
+ pq_sendint16(&buf, 0);
+ pq_endmessage(&buf);
+}
+
+static void
+CopyToJsonStart(CopyToState cstate, TupleDesc tupDesc)
+{
+ CopyJsonData *p;
+
+ if (cstate->opaque == NULL)
+ {
+ MemoryContext oldcontext;
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+ cstate->opaque = palloc0(sizeof(CopyJsonData));
+ MemoryContextSwitchTo(oldcontext);
+ InitCopyJsonData(cstate->opaque);
+ }
+
+ /* No need to alloc cstate->out_functions */
+
+ p = (CopyJsonData *)cstate->opaque;
+
+ /* If FORCE_ARRAY has been specified send the open bracket. */
+ if (p->force_array)
+ {
+ CopySendChar(cstate, '[');
+ CopyToJsonSendEndOfRow(cstate);
+ }
+}
+
+static void
+CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+ Datum rowdata;
+ StringInfo result;
+ CopyJsonData *p;
+
+ Assert(cstate->opaque);
+ p = (CopyJsonData *)cstate->opaque;
+
+ if(!cstate->rel)
+ {
+ for (int i = 0; i < slot->tts_tupleDescriptor->natts; i++)
+ {
+ /* Flat-copy the attribute array */
+ memcpy(TupleDescAttr(slot->tts_tupleDescriptor, i),
+ TupleDescAttr(cstate->queryDesc->tupDesc, i),
+ 1 * sizeof(FormData_pg_attribute));
+ }
+ BlessTupleDesc(slot->tts_tupleDescriptor);
+ }
+ rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ result = makeStringInfo();
+ composite_to_json(rowdata, result, false);
+
+ if (p->json_row_delim_needed)
+ CopySendChar(cstate, ',');
+ else if (p->force_array)
+ {
+ /* first row needs no delimiter */
+ CopySendChar(cstate, ' ');
+ p->json_row_delim_needed = true;
+ }
+ CopySendData(cstate, result->data, result->len);
+ CopyToJsonSendEndOfRow(cstate);
+}
+
+static void
+CopyToJsonEnd(CopyToState cstate)
+{
+ CopyJsonData *p;
+
+ Assert(cstate->opaque);
+ p = (CopyJsonData *)cstate->opaque;
+
+ /* If FORCE_ARRAY has been specified send the close bracket. */
+ if (p->force_array)
+ {
+ CopySendChar(cstate, ']');
+ CopyToJsonSendEndOfRow(cstate);
+ }
+}
+
+static const CopyToRoutine CopyToRoutineJson = {
+ .type = T_CopyToRoutine,
+ .CopyToProcessOption = CopyToJsonProcessOption,
+ .CopyToSendCopyBegin = CopyToJsonSendCopyBegin,
+ .CopyToStart = CopyToJsonStart,
+ .CopyToOneRow = CopyToJsonOneRow,
+ .CopyToEnd = CopyToJsonEnd,
+};
+
+PG_FUNCTION_INFO_V1(copy_json);
+Datum
+copy_json(PG_FUNCTION_ARGS)
+{
+ bool is_from = PG_GETARG_BOOL(0);
+
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use JSON mode in COPY FROM")));
+
+ PG_RETURN_POINTER(&CopyToRoutineJson);
+}
diff --git a/contrib/pg_copy_json/pg_copy_json.control b/contrib/pg_copy_json/pg_copy_json.control
new file mode 100644
index 0000000000..90b0a74603
--- /dev/null
+++ b/contrib/pg_copy_json/pg_copy_json.control
@@ -0,0 +1,5 @@
+# pg_copy_json extension
+comment = 'COPY TO JSON format'
+default_version = '1.0'
+module_pathname = '$libdir/pg_copy_json'
+relocatable = true
diff --git a/contrib/pg_copy_json/sql/pg_copy_json.sql b/contrib/pg_copy_json/sql/pg_copy_json.sql
new file mode 100644
index 0000000000..73e7e514ac
--- /dev/null
+++ b/contrib/pg_copy_json/sql/pg_copy_json.sql
@@ -0,0 +1,59 @@
+--
+-- COPY TO JSON
+--
+
+CREATE EXTENSION pg_copy_json;
+
+-- test copying in JSON format with various styles
+-- of embedded line ending characters
+
+create temp table copytest (
+ style text,
+ test text,
+ filler int);
+
+insert into copytest values('DOS',E'abc\r\ndef',1);
+insert into copytest values('Unix',E'abc\ndef',2);
+insert into copytest values('Mac',E'abc\rdef',3);
+insert into copytest values(E'esc\\ape',E'a\\r\\\r\\\n\\nb',4);
+
+copy copytest to stdout with (format 'json');
+
+-- pg_copy_json do not support COPY FROM
+copy copytest from stdout with (format 'json');
+
+-- test copying in JSON format with various styles
+-- of embedded escaped characters
+
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout with (format 'json');
+
+-- test force array
+
+copy copytest to stdout (format 'json', force_array);
+copy copytest to stdout (format 'json', force_array true);
+copy copytest to stdout (format 'json', force_array false);
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index d719a61f16..fabd4e611e 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -83,8 +83,6 @@ typedef struct JsonAggState
JsonUniqueBuilderState unique_check;
} JsonAggState;
-static void composite_to_json(Datum composite, StringInfo result,
- bool use_line_feeds);
static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
Datum *vals, bool *nulls, int *valcount,
JsonTypeCategory tcategory, Oid outfuncoid,
@@ -507,8 +505,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
/*
* Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
*/
-static void
+void
composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
{
HeapTupleHeader td;
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 6d7f1b387d..d5631171ad 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
#include "lib/stringinfo.h"
/* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+ bool use_line_feeds);
extern void escape_json(StringInfo buf, const char *str);
extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
const int *tzp);
--
2.41.0
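The force_array framing in patch 10 is spread over CopyToJsonStart, CopyToJsonOneRow and CopyToJsonEnd, so here is a compact standalone sketch of the same delimiter logic for anyone checking it against the expected output. It assumes "\n" line endings only (the frontend case) and writes into a caller-supplied buffer. frame_json_rows() and append_str() are hypothetical names used only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Append s to out (NUL-terminated); returns false if it does not fit. */
static bool
append_str(char *out, size_t outlen, size_t *used, const char *s)
{
    size_t n = strlen(s);

    if (*used + n + 1 > outlen)
        return false;
    memcpy(out + *used, s, n + 1);
    *used += n;
    return true;
}

/*
 * Hypothetical helper: frame already-serialized JSON rows the way the
 * extension does. Without force_array each row is its own line; with it,
 * the rows are wrapped in "[" / "]" lines, the first row is prefixed by a
 * space and every later row by a comma.
 */
static bool
frame_json_rows(char *out, size_t outlen, const char *const *rows,
                size_t nrows, bool force_array)
{
    bool delim_needed = false;
    size_t used = 0;
    size_t i;

    if (outlen == 0)
        return false;
    out[0] = '\0';

    if (force_array && !append_str(out, outlen, &used, "[\n"))
        return false;

    for (i = 0; i < nrows; i++)
    {
        const char *prefix = "";

        if (delim_needed)
            prefix = ",";
        else if (force_array)
        {
            /* first row needs no delimiter, only the alignment space */
            prefix = " ";
            delim_needed = true;
        }
        if (!append_str(out, outlen, &used, prefix) ||
            !append_str(out, outlen, &used, rows[i]) ||
            !append_str(out, outlen, &used, "\n"))
            return false;
    }

    if (force_array && !append_str(out, outlen, &used, "]\n"))
        return false;
    return true;
}
```

Note that with force_array the "[" and "]" lines are sent as separate end-of-row flushes, so they reach the client as CopyData messages that are not rows. That is the behaviour questioned earlier in the thread.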
* Re: Emitting JSON to file using COPY TO
@ 2024-01-31 09:49 vignesh C <[email protected]>
parent: Junwang Zhao <[email protected]>
0 siblings, 1 reply; 28+ messages in thread
From: vignesh C @ 2024-01-31 09:49 UTC (permalink / raw)
To: Junwang Zhao <[email protected]>; +Cc: jian he <[email protected]>; Masahiko Sawada <[email protected]>; Joe Conway <[email protected]>; Daniel Verite <[email protected]>; Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers; Sutou Kouhei <[email protected]>
On Sat, 27 Jan 2024 at 11:25, Junwang Zhao <[email protected]> wrote:
>
> Hi hackers,
>
> Kou-san(CCed) has been working on *Make COPY format extendable[1]*, so
> I think making *copy to json* based on that work might be the right direction.
>
> I wrote an extension for that purpose, and here is the patch set together
> with Kou-san's *extendable copy format* implementation:
>
> 0001-0009 is the implementation of extendable copy format
> 0010 is the pg_copy_json extension
>
> I also created a PR[2] if anybody likes the github review style.
>
> The *extendable copy format* feature is still being developed, I post this
> email in case the patch set in this thread is committed without knowing
> the *extendable copy format* feature.
>
> I'd like to hear your opinions.
CFBot shows that one of the tests is failing, as shown in [1]:
[05:46:41.678] /bin/sh: 1: cannot open
/tmp/cirrus-ci-build/contrib/pg_copy_json/sql/test_copy_format.sql: No
such file
[05:46:41.678] diff:
/tmp/cirrus-ci-build/contrib/pg_copy_json/expected/test_copy_format.out:
No such file or directory
[05:46:41.678] diff:
/tmp/cirrus-ci-build/contrib/pg_copy_json/results/test_copy_format.out:
No such file or directory
[05:46:41.678] # diff command failed with status 512: diff
"/tmp/cirrus-ci-build/contrib/pg_copy_json/expected/test_copy_format.out"
"/tmp/cirrus-ci-build/contrib/pg_copy_json/results/test_copy_format.out"
> "/tmp/cirrus-ci-build/contrib/pg_copy_json/results/test_copy_format.out.diff"
[05:46:41.678] Bail out!make[2]: *** [../../src/makefiles/pgxs.mk:454:
check] Error 2
[05:46:41.679] make[1]: *** [Makefile:96: check-pg_copy_json-recurse] Error 2
[05:46:41.679] make: *** [GNUmakefile:71: check-world-contrib-recurse] Error 2
Please post an updated version that fixes this.
[1] - https://cirrus-ci.com/task/5322439115145216
Regards,
Vignesh
* Re: Emitting JSON to file using COPY TO
@ 2024-01-31 09:58 Junwang Zhao <[email protected]>
parent: vignesh C <[email protected]>
0 siblings, 0 replies; 28+ messages in thread
From: Junwang Zhao @ 2024-01-31 09:58 UTC (permalink / raw)
To: vignesh C <[email protected]>; +Cc: jian he <[email protected]>; Masahiko Sawada <[email protected]>; Joe Conway <[email protected]>; Daniel Verite <[email protected]>; Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers; Sutou Kouhei <[email protected]>
Hi Vignesh,
On Wed, Jan 31, 2024 at 5:50 PM vignesh C <[email protected]> wrote:
>
> On Sat, 27 Jan 2024 at 11:25, Junwang Zhao <[email protected]> wrote:
> >
> > Hi hackers,
> >
> > Kou-san(CCed) has been working on *Make COPY format extendable[1]*, so
> > I think making *copy to json* based on that work might be the right direction.
> >
> > I write an extension for that purpose, and here is the patch set together
> > with Kou-san's *extendable copy format* implementation:
> >
> > 0001-0009 is the implementation of extendable copy format
> > 00010 is the pg_copy_json extension
> >
> > I also created a PR[2] if anybody likes the github review style.
> >
> > The *extendable copy format* feature is still being developed, I post this
> > email in case the patch set in this thread is committed without knowing
> > the *extendable copy format* feature.
> >
> > I'd like to hear your opinions.
>
> CFBot shows that one of the test is failing as in [1]:
> [05:46:41.678] /bin/sh: 1: cannot open
> /tmp/cirrus-ci-build/contrib/pg_copy_json/sql/test_copy_format.sql: No
> such file
> [05:46:41.678] diff:
> /tmp/cirrus-ci-build/contrib/pg_copy_json/expected/test_copy_format.out:
> No such file or directory
> [05:46:41.678] diff:
> /tmp/cirrus-ci-build/contrib/pg_copy_json/results/test_copy_format.out:
> No such file or directory
> [05:46:41.678] # diff command failed with status 512: diff
> "/tmp/cirrus-ci-build/contrib/pg_copy_json/expected/test_copy_format.out"
> "/tmp/cirrus-ci-build/contrib/pg_copy_json/results/test_copy_format.out"
> > "/tmp/cirrus-ci-build/contrib/pg_copy_json/results/test_copy_format.out.diff"
> [05:46:41.678] Bail out!make[2]: *** [../../src/makefiles/pgxs.mk:454:
> check] Error 2
> [05:46:41.679] make[1]: *** [Makefile:96: check-pg_copy_json-recurse] Error 2
> [05:46:41.679] make: *** [GNUmakefile:71: check-world-contrib-recurse] Error 2
>
> Please post an updated version for the same.
Thanks for the reminder. The patch set I posted is not meant for commit
but for further discussion.
I will post more information about the *extendable copy* feature
when it's about to be committed.
>
> [1] - https://cirrus-ci.com/task/5322439115145216
>
> Regards,
> Vignesh
--
Regards
Junwang Zhao
* Re: Emitting JSON to file using COPY TO
@ 2024-01-31 13:26 Alvaro Herrera <[email protected]>
parent: jian he <[email protected]>
1 sibling, 0 replies; 28+ messages in thread
From: Alvaro Herrera @ 2024-01-31 13:26 UTC (permalink / raw)
To: jian he <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; Joe Conway <[email protected]>; Daniel Verite <[email protected]>; Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 2024-Jan-23, jian he wrote:
> > + | FORMAT_LA copy_generic_opt_arg
> > + {
> > + $$ = makeDefElem("format", $2, @1);
> > + }
> > ;
> >
> > I think it's not necessary. "format" option is already handled in
> > copy_generic_opt_elem.
>
> test it, I found out this part is necessary.
> because a query with WITH like `copy (select 1) to stdout with
> (format json, force_array false); ` will fail.
Right, because "FORMAT JSON" is turned into FORMAT_LA JSON by parser.c
(see base_yylex there). I'm not really sure but I think it might be
better to make it "| FORMAT_LA JSON" instead of invoking the whole
copy_generic_opt_arg syntax. Not because of performance, but just
because it's much clearer what's going on.
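
To illustrate the lookahead step described above, here is a minimal
standalone sketch. It is not the actual base_yylex code: the Token enum
and filter_tokens() are hypothetical names made up for this illustration,
and the real token codes are generated from gram.y. It only shows the idea
of rewriting FORMAT to FORMAT_LA when the next token is JSON.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; the real ones are generated from gram.y. */
typedef enum
{
	TOK_FORMAT,
	TOK_FORMAT_LA,
	TOK_JSON,
	TOK_OTHER
} Token;

/*
 * Copy n raw tokens from in[] to out[], replacing FORMAT with FORMAT_LA
 * whenever the token that immediately follows it is JSON.  All other
 * tokens pass through unchanged.
 */
void
filter_tokens(const Token *in, size_t n, Token *out)
{
	assert(in != NULL && out != NULL);

	for (size_t i = 0; i < n; i++)
	{
		/* One token of lookahead decides how FORMAT is reported. */
		if (in[i] == TOK_FORMAT && i + 1 < n && in[i + 1] == TOK_JSON)
			out[i] = TOK_FORMAT_LA;
		else
			out[i] = in[i];
	}
}
```

With a filter of this shape, the grammar never sees a plain FORMAT token
directly before JSON, so a rule spelled "FORMAT_LA JSON" names exactly the
token pair that arrives.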
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
* Re: Emitting JSON to file using COPY TO
@ 2025-03-11 08:23 jian he <[email protected]>
parent: Joe Conway <[email protected]>
1 sibling, 1 reply; 28+ messages in thread
From: jian he @ 2025-03-11 08:23 UTC (permalink / raw)
To: Junwang Zhao <[email protected]>; +Cc: Joe Conway <[email protected]>; Andrey M. Borodin <[email protected]>; Dean Rasheed <[email protected]>; Daniel Verite <[email protected]>; Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On Sun, Mar 2, 2025 at 1:28 PM Junwang Zhao <[email protected]> wrote:
>
>
> I've refactored the patch to adapt the newly introduced CopyToRoutine struct,
> see 2e4127b6d2.
>
> v15-0001 is the merged one of v14-0001 and v14-0002
>
> There are some other ongoing *copy to/from* refactors[1] which we can benefit
> to make the code cleaner, especially the checks done in ProcessCopyOptions.
>
> [1]: https://www.postgresql.org/message-id/20250301.115009.424844407736647598.kou%40clear-code.com
>
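
For readers unfamiliar with the CopyToRoutine refactor mentioned in the
quoted text, the following is a rough standalone sketch of the pattern: one
table of callbacks per format, selected from a format enum. The names
Format, Routine, get_routine() and emit_row() are hypothetical and the
callbacks just return labels; the real struct carries CopyToStart,
CopyToOutFunc, CopyToOneRow and CopyToEnd callbacks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the CopyFormat enum. */
typedef enum
{
	FMT_TEXT,
	FMT_CSV,
	FMT_BINARY,
	FMT_JSON
} Format;

/* Hypothetical, heavily reduced callback table: one per-row callback. */
typedef struct Routine
{
	const char *(*one_row) (void);
} Routine;

static const char *text_row(void)   { return "text row"; }
static const char *csv_row(void)    { return "csv row"; }
static const char *binary_row(void) { return "binary row"; }
static const char *json_row(void)   { return "json row"; }

static const Routine routine_text = {text_row};
static const Routine routine_csv = {csv_row};
static const Routine routine_binary = {binary_row};
static const Routine routine_json = {json_row};

/* Pick the callback table for a format; text is the default. */
const Routine *
get_routine(Format format)
{
	switch (format)
	{
		case FMT_CSV:
			return &routine_csv;
		case FMT_BINARY:
			return &routine_binary;
		case FMT_JSON:
			return &routine_json;
		case FMT_TEXT:
			break;
	}
	return &routine_text;
}

/* Callers go through the table and never test the format again. */
const char *
emit_row(const Routine *routine)
{
	assert(routine != NULL && routine->one_row != NULL);
	return routine->one_row();
}
```

Adding a format then means adding one table and one branch in the
selector, which is roughly what the json patch does in CopyToGetRoutine.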
hi.
git apply --check $PATCHES/v15-0001-Introduce-json-format-for-COPY-TO.patch
error: patch failed: src/backend/commands/copyfrom.c:155
error: src/backend/commands/copyfrom.c: patch does not apply
error: patch failed: src/backend/commands/copyto.c:176
error: src/backend/commands/copyto.c: patch does not apply
The patch seems to need a rebase.
The attachment is the rebased version, with minor comment and commit
message tweaks.
Another issue: the status of this patch's entry in the commitfest [1] is
"Not processed", which means cfbot runs no CI tests on it. That does not
seem great, and I am not sure how to resolve it.
[1] https://commitfest.postgresql.org/patch/4716/
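
As a reading aid for the force_array patch attached below, here is a
minimal standalone sketch of the framing it produces. This is an
illustration under assumptions, not the patch code: frame_rows() and
append() are hypothetical helpers that write into a caller-supplied
buffer, whereas the patch sends data through CopySendChar and
CopySendData. The first row gets a leading space, later rows a leading
comma, and the brackets sit on their own lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Append s to buf, keeping it NUL-terminated; false if it does not fit. */
static bool
append(char *buf, size_t bufsize, size_t *used, const char *s)
{
	size_t		len = strlen(s);

	if (len >= bufsize - *used)
		return false;
	memcpy(buf + *used, s, len + 1);
	*used += len;
	return true;
}

/*
 * Write nrows already-serialized JSON objects into buf, one per line.
 * With force_array, wrap them in [ ] and separate them with commas.
 * Returns the number of bytes written, or -1 if buf is too small.
 */
int
frame_rows(char *buf, size_t bufsize, const char *const *rows,
		   size_t nrows, bool force_array)
{
	size_t		used = 0;
	bool		delim_needed = false;

	assert(buf != NULL && bufsize > 0);
	buf[0] = '\0';

	if (force_array && !append(buf, bufsize, &used, "[\n"))
		return -1;

	for (size_t i = 0; i < nrows; i++)
	{
		if (force_array)
		{
			/* first row needs no delimiter, only padding */
			if (!append(buf, bufsize, &used, delim_needed ? "," : " "))
				return -1;
			delim_needed = true;
		}
		if (!append(buf, bufsize, &used, rows[i]) ||
			!append(buf, bufsize, &used, "\n"))
			return -1;
	}

	if (force_array && !append(buf, bufsize, &used, "]\n"))
		return -1;

	return (int) used;
}
```

Without force_array the same rows come out one object per line with no
decorations, matching the "force_array false" case in the regression test.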
Attachments:
[text/x-patch] v16-0002-Add-option-force_array-for-COPY-JSON-FORMAT.patch (9.7K, 2-v16-0002-Add-option-force_array-for-COPY-JSON-FORMAT.patch)
From 24e1858722dbb25c4842d0ec2dee5b1047edcc23 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Tue, 11 Mar 2025 16:19:55 +0800
Subject: [PATCH v16 2/2] Add option force_array for COPY JSON FORMAT
The force_array option can only be used in COPY TO with JSON format.
It makes the output behave like a JSON array.
Refactored by Junwang Zhao to adapt to the newly introduced CopyToRoutine struct (2e4127b6d2).
Author: Joe Conway <[email protected]>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
doc/src/sgml/ref/copy.sgml | 14 ++++++++++++
src/backend/commands/copy.c | 13 ++++++++++++
src/backend/commands/copyto.c | 34 +++++++++++++++++++++++++++++-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy.out | 23 ++++++++++++++++++++
src/test/regress/sql/copy.sql | 8 +++++++
7 files changed, 93 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 9c519d8a9e2..7b3c913d4ee 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+ FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
ON_ERROR <replaceable class="parameter">error_action</replaceable>
REJECT_LIMIT <replaceable class="parameter">maxerror</replaceable>
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
@@ -392,6 +393,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>FORCE_ARRAY</literal></term>
+ <listitem>
+ <para>
+ Force output of square brackets as array decorations at the beginning
+ and end of output, and commas between the rows. It is allowed only in
+ <command>COPY TO</command>, and only when using
+ <literal>JSON</literal> format. The default is
+ <literal>false</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>ON_ERROR</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b6f74c798d0..7b4c64ea97e 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -504,6 +504,7 @@ ProcessCopyOptions(ParseState *pstate,
bool on_error_specified = false;
bool log_verbosity_specified = false;
bool reject_limit_specified = false;
+ bool force_array_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -658,6 +659,13 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "force_array") == 0)
+ {
+ if (force_array_specified)
+ errorConflictingDefElem(defel, pstate);
+ force_array_specified = true;
+ opts_out->force_array = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "on_error") == 0)
{
if (on_error_specified)
@@ -906,6 +914,11 @@ ProcessCopyOptions(ParseState *pstate,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY json mode cannot be used with %s", "COPY FROM"));
+ if (opts_out->format != COPY_FORMAT_JSON && opts_out->force_array)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY %s can only used with JSON mode", "FORCE_ARRAY"));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c1d4cbeedea..393c0440ad7 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -84,6 +84,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ bool json_row_delim_needed; /* need delimiter to start next json array element */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -128,6 +129,7 @@ static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToJsonEnd(CopyToState cstate);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -172,7 +174,7 @@ static const CopyToRoutine CopyToRoutineJson = {
.CopyToStart = CopyToTextLikeStart,
.CopyToOutFunc = CopyToTextLikeOutFunc,
.CopyToOneRow = CopyToJsonOneRow,
- .CopyToEnd = CopyToTextLikeEnd,
+ .CopyToEnd = CopyToJsonEnd,
};
/* binary format */
@@ -238,6 +240,16 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
CopySendTextLikeEndOfRow(cstate);
}
+
+ /*
+ * If JSON has been requested, and FORCE_ARRAY has been specified send
+ * the opening bracket.
+ */
+ if (cstate->opts.format == COPY_FORMAT_JSON && cstate->opts.force_array)
+ {
+ CopySendChar(cstate, '[');
+ CopySendTextLikeEndOfRow(cstate);
+ }
}
/*
@@ -349,11 +361,31 @@ CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
result = makeStringInfo();
composite_to_json(rowdata, result, false);
+ if (cstate->json_row_delim_needed && cstate->opts.force_array)
+ CopySendChar(cstate, ',');
+ else if (cstate->opts.force_array)
+ {
+ /* first row needs no delimiter */
+ CopySendChar(cstate, ' ');
+ cstate->json_row_delim_needed = true;
+ }
+
CopySendData(cstate, result->data, result->len);
CopySendTextLikeEndOfRow(cstate);
}
+/* Implementation of the end callback for json format */
+static void
+CopyToJsonEnd(CopyToState cstate)
+{
+ if (cstate->opts.force_array)
+ {
+ CopySendChar(cstate, ']');
+ CopySendTextLikeEndOfRow(cstate);
+ }
+}
+
/*
* Implementation of the start callback for binary format. Send a header
* for a binary copy.
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 6069b118834..e0c7412ec0a 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3278,7 +3278,7 @@ match_previous_words(int pattern_id,
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+ "FORCE_NOT_NULL", "FORCE_NULL", "FORCE_ARRAY", "ENCODING", "DEFAULT",
"ON_ERROR", "LOG_VERBOSITY");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 07fcc2bc9ac..fa8a8ab7e31 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -89,6 +89,7 @@ typedef struct CopyFormatOptions
List *force_notnull; /* list of column names */
bool force_notnull_all; /* FORCE_NOT_NULL *? */
bool *force_notnull_flags; /* per-column CSV FNN flags */
+ bool force_array; /* add JSON array decorations */
List *force_null; /* list of column names */
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 0d4cfc0b60a..3d0781700b3 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -110,6 +110,29 @@ LINE 1: copy copytest to stdout (format json, on_error ignore);
copy copytest from stdin(format json);
ERROR: COPY json mode cannot be used with COPY FROM
-- all of the above should yield error
+--Error
+copy copytest to stdout (format csv, force_array true);
+ERROR: COPY FORCE_ARRAY can only used with JSON mode
+--ok
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 6ee96f5aa51..2781c24bd84 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -101,6 +101,14 @@ copy copytest to stdout (format json, on_error ignore);
copy copytest from stdin(format json);
-- all of the above should yield error
+--Error
+copy copytest to stdout (format csv, force_array true);
+
+--ok
+copy copytest to stdout (format json, force_array);
+copy copytest to stdout (format json, force_array true);
+copy copytest to stdout (format json, force_array false);
+
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
--
2.34.1
[text/x-patch] v16-0001-Introduce-json-format-for-COPY-TO.patch (30.2K, 3-v16-0001-Introduce-json-format-for-COPY-TO.patch)
From 71cf17c9d1aefecc89cc388b979bbfc9952898c8 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Tue, 11 Mar 2025 15:53:37 +0800
Subject: [PATCH v16 1/2] Introduce json format for COPY TO
The json format is only allowed in COPY TO operations.
It also cannot be used with the {header, default, null, delimiter} options, among others.
Fully tested in src/test/regress/sql/copy.sql.
The CopyFormat enum part was copied from Joel Jacobson <[email protected]>.
Refactored by Jian He to fix some miscellaneous issues.
Refactored by Junwang Zhao to adapt to the newly introduced CopyToRoutine struct (2e4127b6d2).
Author: Joe Conway <[email protected]>
Reviewed-by: "Andrey M. Borodin" <[email protected]>,
Reviewed-by: Dean Rasheed <[email protected]>,
Reviewed-by: Daniel Verite <[email protected]>,
Reviewed-by: Andrew Dunstan <[email protected]>,
Reviewed-by: Davin Shearer <[email protected]>,
Reviewed-by: Masahiko Sawada <[email protected]>,
Reviewed-by: Alvaro Herrera <[email protected]>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
doc/src/sgml/ref/copy.sgml | 13 +++--
src/backend/commands/copy.c | 77 ++++++++++++++++++--------
src/backend/commands/copyfrom.c | 6 +-
src/backend/commands/copyfromparse.c | 6 +-
src/backend/commands/copyto.c | 83 ++++++++++++++++++++++++----
src/backend/parser/gram.y | 8 +++
src/backend/utils/adt/json.c | 5 +-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 14 ++++-
src/include/utils/json.h | 2 +
src/test/regress/expected/copy.out | 74 +++++++++++++++++++++++++
src/test/regress/sql/copy.sql | 47 ++++++++++++++++
src/tools/pgindent/typedefs.list | 1 +
13 files changed, 286 insertions(+), 52 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..9c519d8a9e2 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -219,10 +219,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Selects the data format to be read or written:
<literal>text</literal>,
<literal>csv</literal> (Comma Separated Values),
+ <literal>json</literal> (JavaScript Object Notation),
or <literal>binary</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
+ <para>
+ The <literal>json</literal> option is allowed only in
+ <command>COPY TO</command>.
+ </para>
</listitem>
</varlistentry>
@@ -257,7 +262,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
(line) of the file. The default is a tab character in text format,
a comma in <literal>CSV</literal> format.
This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -271,7 +276,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
<note>
@@ -294,7 +299,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ not using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -310,7 +315,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
If this option is set to <literal>MATCH</literal>, the number and names
of the columns in the header line must match the actual column names of
the table, in order; otherwise an error is raised.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
The <literal>MATCH</literal> option is only valid for <command>COPY
FROM</command> commands.
</para>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc29..b6f74c798d0 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -525,11 +525,13 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "json") == 0)
+ opts_out->format = COPY_FORMAT_JSON;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -689,31 +691,47 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_JSON && opts_out->delim)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in JSON mode", "DELIMITER"));
+
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_JSON && opts_out->null_print)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in JSON mode", "NULL"));
+
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ if (opts_out->format == COPY_FORMAT_JSON && opts_out->default_print)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in JSON mode", "DEFAULT"));
+
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -761,7 +779,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -769,43 +787,48 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ if (opts_out->format == COPY_FORMAT_JSON && opts_out->header_line)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("cannot specify %s in JSON mode", "HEADER"));
+
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -819,8 +842,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_notnull != NIL ||
+ opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -835,8 +858,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+ opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -860,7 +883,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -877,6 +900,12 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY %s cannot be used with %s", "FREEZE",
"COPY TO")));
+ /* Check json format */
+ if (opts_out->format == COPY_FORMAT_JSON && is_from)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY json mode cannot be used with %s", "COPY FROM"));
+
if (opts_out->default_print)
{
if (!is_from)
@@ -896,7 +925,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -913,7 +942,7 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index bcf66f0adf8..bfe1937539b 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -155,9 +155,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
static const CopyFromRoutine *
CopyFromGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyFromRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyFromRoutineBinary;
/* default is text */
@@ -261,7 +261,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index e8128f85e6b..02263f1b1f5 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -171,7 +171,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -747,7 +747,7 @@ bool
NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
return NextCopyFromRawFieldsInternal(cstate, fields, nfields,
- cstate->opts.csv_mode);
+ cstate->opts.format == COPY_FORMAT_CSV);
}
/*
@@ -774,7 +774,7 @@ NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields
bool done;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(!(cstate->opts.format == COPY_FORMAT_BINARY));
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 84a3f3879a8..c1d4cbeedea 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -24,6 +24,7 @@
#include "executor/execdesc.h"
#include "executor/executor.h"
#include "executor/tuptable.h"
+#include "funcapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
@@ -31,6 +32,7 @@
#include "pgstat.h"
#include "storage/fd.h"
#include "tcop/tcopprot.h"
+#include "utils/json.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
@@ -125,6 +127,7 @@ static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -144,7 +147,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
/*
* COPY TO routines for built-in formats.
*
- * CSV and text formats share the same TextLike routines except for the
+ * CSV, text, and json formats share the same TextLike routines except for the
* one-row callback.
*/
@@ -164,6 +167,14 @@ static const CopyToRoutine CopyToRoutineCSV = {
.CopyToEnd = CopyToTextLikeEnd,
};
+/* json format */
+static const CopyToRoutine CopyToRoutineJson = {
+ .CopyToStart = CopyToTextLikeStart,
+ .CopyToOutFunc = CopyToTextLikeOutFunc,
+ .CopyToOneRow = CopyToJsonOneRow,
+ .CopyToEnd = CopyToTextLikeEnd,
+};
+
/* binary format */
static const CopyToRoutine CopyToRoutineBinary = {
.CopyToStart = CopyToBinaryStart,
@@ -176,16 +187,18 @@ static const CopyToRoutine CopyToRoutineBinary = {
static const CopyToRoutine *
CopyToGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyToRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyToRoutineBinary;
+ else if (opts->format == COPY_FORMAT_JSON)
+ return &CopyToRoutineJson;
/* default is text */
return &CopyToRoutineText;
}
-/* Implementation of the start callback for text and CSV formats */
+/* Implementation of the start callback for text, CSV, and json formats */
static void
CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
{
@@ -204,6 +217,8 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
ListCell *cur;
bool hdr_delim = false;
+ Assert(cstate->opts.format != COPY_FORMAT_JSON);
+
foreach(cur, cstate->attnumlist)
{
int attnum = lfirst_int(cur);
@@ -215,7 +230,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -226,7 +241,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
}
/*
- * Implementation of the outfunc callback for text and CSV formats. Assign
+ * Implementation of the outfunc callback for text, CSV, and json formats. Assign
* the output function data to the given *finfo.
*/
static void
@@ -299,13 +314,46 @@ CopyToTextLikeOneRow(CopyToState cstate,
CopySendTextLikeEndOfRow(cstate);
}
-/* Implementation of the end callback for text and CSV formats */
+/* Implementation of the end callback for text, CSV, and json formats */
static void
CopyToTextLikeEnd(CopyToState cstate)
{
/* Nothing to do here */
}
+/* Implementation of per-row callback for json format */
+static void
+CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+ Datum rowdata;
+ StringInfo result;
+
+ /*
+ * If the COPY TO source data comes from a query rather than a plain table, we
+ * need to copy CopyToState->queryDesc->tupDesc to slot->tts_tupleDescriptor.
+ * This is necessary because the slot's TupleDesc may change during query execution,
+ * and we depend on it when calling composite_to_json.
+ */
+ if (!cstate->rel)
+ {
+ memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
+ TupleDescAttr(cstate->queryDesc->tupDesc, 0),
+ cstate->queryDesc->tupDesc->natts * sizeof(FormData_pg_attribute));
+
+ for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
+ populate_compact_attribute(slot->tts_tupleDescriptor, i);
+
+ BlessTupleDesc(slot->tts_tupleDescriptor);
+ }
+ rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ result = makeStringInfo();
+ composite_to_json(rowdata, result, false);
+
+ CopySendData(cstate, result->data, result->len);
+
+ CopySendTextLikeEndOfRow(cstate);
+}
+
/*
* Implementation of the start callback for binary format. Send a header
* for a binary copy.
@@ -392,14 +440,25 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
pq_sendbyte(&buf, format); /* overall format */
- pq_sendint16(&buf, natts);
- for (i = 0; i < natts; i++)
- pq_sendint16(&buf, format); /* per-column formats */
+ if (cstate->opts.format != COPY_FORMAT_JSON)
+ {
+ pq_sendint16(&buf, natts);
+ for (i = 0; i < natts; i++)
+ pq_sendint16(&buf, format); /* per-column formats */
+ }
+ else
+ {
+ /*
+ * JSON format is always one non-binary column
+ */
+ pq_sendint16(&buf, 1);
+ pq_sendint16(&buf, 0);
+ }
pq_endmessage(&buf);
cstate->copy_dest = COPY_FRONTEND;
}
@@ -499,7 +558,7 @@ CopySendEndOfRow(CopyToState cstate)
}
/*
- * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
+ * Wrapper function of CopySendEndOfRow for text, CSV, and json formats. Sends the
* line termination and do common appropriate things for the end of row.
*/
static inline void
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 271ae26cbaf..e26881bb13f 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3493,6 +3493,10 @@ copy_opt_item:
{
$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
}
+ | JSON
+ {
+ $$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ }
| HEADER_P
{
$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3575,6 +3579,10 @@ copy_generic_opt_elem:
{
$$ = makeDefElem($1, $2, @1);
}
+ | FORMAT_LA copy_generic_opt_arg
+ {
+ $$ = makeDefElem("format", $2, @1);
+ }
;
copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 51452755f58..bf69347fa94 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -85,8 +85,6 @@ typedef struct JsonAggState
JsonUniqueBuilderState unique_check;
} JsonAggState;
-static void composite_to_json(Datum composite, StringInfo result,
- bool use_line_feeds);
static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
Datum *vals, bool *nulls, int *valcount,
JsonTypeCategory tcategory, Oid outfuncoid,
@@ -516,8 +514,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
/*
* Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
*/
-static void
+void
composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
{
HeapTupleHeader td;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 8432be641ac..6069b118834 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3283,7 +3283,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
- COMPLETE_WITH("binary", "csv", "text");
+ COMPLETE_WITH("binary", "csv", "text", "json");
/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..07fcc2bc9ac 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,17 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT = 0,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+ COPY_FORMAT_JSON,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -61,9 +72,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 49bbda7ac06..1fa8e2ce8e2 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
#include "lib/stringinfo.h"
/* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+ bool use_line_feeds);
extern void escape_json(StringInfo buf, const char *str);
extern void escape_json_with_len(StringInfo buf, const char *str, int len);
extern void escape_json_text(StringInfo buf, const text *txt);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 06bae8c61ae..0d4cfc0b60a 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -73,6 +73,80 @@ copy copytest3 to stdout csv header;
c1,"col with , comma","col with "" quote"
1,a,1
2,b,2
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+ERROR: cannot specify DELIMITER in JSON mode
+copy copytest to stdout (format json, null '\N');
+ERROR: cannot specify NULL in JSON mode
+copy copytest to stdout (format json, default '|');
+ERROR: cannot specify DEFAULT in JSON mode
+copy copytest to stdout (format json, header);
+ERROR: cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, quote '"');
+ERROR: COPY QUOTE requires CSV mode
+copy copytest to stdout (format json, escape '"');
+ERROR: COPY ESCAPE requires CSV mode
+copy copytest to stdout (format json, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
+copy copytest to stdout (format json, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+copy copytest to stdout (format json, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
+copy copytest to stdout (format json, on_error ignore);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: copy copytest to stdout (format json, on_error ignore);
+ ^
+copy copytest from stdin(format json);
+ERROR: COPY json mode cannot be used with COPY FROM
+-- all of the above should yield error
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
create temp table copytest4 (
c1 int,
"colname with tab: " text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 3009bdfdf89..6ee96f5aa51 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -82,6 +82,53 @@ this is just a line full of junk that would error out if parsed
copy copytest3 to stdout csv header;
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+
+copy copytest to stdout (format json);
+
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+copy copytest to stdout (format json, null '\N');
+copy copytest to stdout (format json, default '|');
+copy copytest to stdout (format json, header);
+copy copytest to stdout (format json, quote '"');
+copy copytest to stdout (format json, escape '"');
+copy copytest to stdout (format json, force_quote *);
+copy copytest to stdout (format json, force_not_null *);
+copy copytest to stdout (format json, force_null *);
+copy copytest to stdout (format json, on_error ignore);
+copy copytest from stdin(format json);
+-- all of the above should yield error
+
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
create temp table copytest4 (
c1 int,
"colname with tab: " text);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9840060997f..a2e4bfee0e2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -500,6 +500,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromRoutine
CopyFromState
--
2.34.1
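
For readers following the SendCopyBegin() change above, here is a minimal standalone sketch of the CopyOutResponse body that the patch would produce. It assumes the documented message layout (Int8 overall format, Int16 column count, then one Int16 format code per column). SketchFormat and build_copy_out_body() are hypothetical names used only for illustration; they are not part of the patch or of PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's CopyFormat enum. */
typedef enum
{
	FMT_TEXT = 0,
	FMT_BINARY,
	FMT_CSV,
	FMT_JSON
} SketchFormat;

/* Store a 16-bit value in network byte order; returns the new offset. */
static size_t
put_int16(uint8_t *buf, size_t off, uint16_t v)
{
	buf[off] = (uint8_t) (v >> 8);
	buf[off + 1] = (uint8_t) (v & 0xff);
	return off + 2;
}

/*
 * Build the CopyOutResponse body (everything after the type byte and the
 * 4-byte length word).  Returns the number of bytes written.
 *
 * Assumption mirrored from the patch: JSON always advertises exactly one
 * text-format column, regardless of how many attributes are copied.
 */
size_t
build_copy_out_body(uint8_t *buf, size_t cap, SketchFormat fmt, int natts)
{
	uint8_t		format = (fmt == FMT_BINARY) ? 1 : 0;
	int			ncols = (fmt == FMT_JSON) ? 1 : natts;
	size_t		off = 0;

	assert(natts >= 0 && natts <= INT16_MAX);
	assert(cap >= 3 + 2 * (size_t) ncols);

	buf[off++] = format;		/* overall format */
	off = put_int16(buf, off, (uint16_t) ncols);
	for (int i = 0; i < ncols; i++)
		off = put_int16(buf, off, format);	/* per-column formats */
	return off;
}
```

With three attributes, text format gives a 9-byte body (which matches the "Length: 13" in the tshark output quoted earlier in the thread, since the length word counts its own 4 bytes), while JSON gives a 5-byte body advertising a single text column.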
^ permalink raw reply [nested|flat] 28+ messages in thread
* Re: Emitting JSON to file using COPY TO
@ 2025-07-04 05:27 jian he <[email protected]>
parent: jian he <[email protected]>
0 siblings, 1 reply; 28+ messages in thread
From: jian he @ 2025-07-04 05:27 UTC (permalink / raw)
To: Junwang Zhao <[email protected]>; +Cc: Joe Conway <[email protected]>; Andrey M. Borodin <[email protected]>; Dean Rasheed <[email protected]>; Daniel Verite <[email protected]>; Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On Tue, Mar 11, 2025 at 4:23 PM jian he <[email protected]> wrote:
>
hi.
rebase and minor tweaks.
Attachments:
[text/x-patch] v17-0002-Add-option-force_array-for-COPY-JSON-FORMAT.patch (9.8K, 2-v17-0002-Add-option-force_array-for-COPY-JSON-FORMAT.patch)
download | inline diff:
From 44c494fd8d7fdb9d8fd5d2d2a48f49b779d1bcb9 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Fri, 4 Jul 2025 13:25:00 +0800
Subject: [PATCH v17 2/2] Add option force_array for COPY JSON FORMAT
The force_array option can only be used in COPY TO with JSON format. It makes
the JSON output behave like a JSON array. Refactored by Junwang Zhao to adapt
to the newly introduced CopyToRoutine struct (2e4127b6d2).
Author: Joe Conway <[email protected]>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
doc/src/sgml/ref/copy.sgml | 14 +++++++++++
src/backend/commands/copy.c | 13 +++++++++++
src/backend/commands/copyto.c | 37 +++++++++++++++++++++++++++++-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy.out | 23 +++++++++++++++++++
src/test/regress/sql/copy.sql | 8 +++++++
7 files changed, 96 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 219604ad306..c01927864bd 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -40,6 +40,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
HEADER [ <replaceable class="parameter">boolean</replaceable> | <replaceable class="parameter">integer</replaceable> | MATCH ]
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
+ FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -366,6 +367,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>FORCE_ARRAY</literal></term>
+ <listitem>
+ <para>
+ Force output of square brackets as array decorations at the beginning
+ and end of output, and commas between the rows. It is allowed only in
+ <command>COPY TO</command>, and only when using
+ <literal>json</literal> format. The default is
+ <literal>false</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>FORCE_QUOTE</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 213e59cc435..d23ef99e395 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -514,6 +514,7 @@ ProcessCopyOptions(ParseState *pstate,
bool on_error_specified = false;
bool log_verbosity_specified = false;
bool reject_limit_specified = false;
+ bool force_array_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -670,6 +671,13 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "force_array") == 0)
+ {
+ if (force_array_specified)
+ errorConflictingDefElem(defel, pstate);
+ force_array_specified = true;
+ opts_out->force_array = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "on_error") == 0)
{
if (on_error_specified)
@@ -918,6 +926,11 @@ ProcessCopyOptions(ParseState *pstate,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY %s mode cannot be used with %s", "json", "COPY FROM"));
+ if (opts_out->format != COPY_FORMAT_JSON && opts_out->force_array)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY %s can only used with JSON mode", "FORCE_ARRAY"));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 34b72936bca..18061661fb1 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -84,6 +84,10 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+
+ /* need delimiter to start next json array element */
+ bool json_row_delim_needed;
+
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -128,6 +132,7 @@ static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToJsonEnd(CopyToState cstate);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -172,7 +177,7 @@ static const CopyToRoutine CopyToRoutineJson = {
.CopyToStart = CopyToTextLikeStart,
.CopyToOutFunc = CopyToTextLikeOutFunc,
.CopyToOneRow = CopyToJsonOneRow,
- .CopyToEnd = CopyToTextLikeEnd,
+ .CopyToEnd = CopyToJsonEnd,
};
/* binary format */
@@ -238,6 +243,16 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
CopySendTextLikeEndOfRow(cstate);
}
+
+ /*
+ * If JSON has been requested and FORCE_ARRAY has been specified, send the
+ * opening bracket.
+ */
+ if (cstate->opts.format == COPY_FORMAT_JSON && cstate->opts.force_array)
+ {
+ CopySendChar(cstate, '[');
+ CopySendTextLikeEndOfRow(cstate);
+ }
}
/*
@@ -349,11 +364,31 @@ CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
result = makeStringInfo();
composite_to_json(rowdata, result, false);
+ if (cstate->json_row_delim_needed && cstate->opts.force_array)
+ CopySendChar(cstate, ',');
+ else if (cstate->opts.force_array)
+ {
+ /* first row needs no delimiter */
+ CopySendChar(cstate, ' ');
+ cstate->json_row_delim_needed = true;
+ }
+
CopySendData(cstate, result->data, result->len);
CopySendTextLikeEndOfRow(cstate);
}
+/* Implementation of the end callback for json format */
+static void
+CopyToJsonEnd(CopyToState cstate)
+{
+ if (cstate->opts.force_array)
+ {
+ CopySendChar(cstate, ']');
+ CopySendTextLikeEndOfRow(cstate);
+ }
+}
+
/*
* Implementation of the start callback for binary format. Send a header
* for a binary copy.
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index bd4c03be050..55313612398 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3303,7 +3303,7 @@ match_previous_words(int pattern_id,
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
"HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
- "FORCE_NOT_NULL", "FORCE_NULL", "ENCODING", "DEFAULT",
+ "FORCE_NOT_NULL", "FORCE_NULL", "FORCE_ARRAY", "ENCODING", "DEFAULT",
"ON_ERROR", "LOG_VERBOSITY", "REJECT_LIMIT");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 85aedc267d6..7274b0d3ca5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,6 +87,7 @@ typedef struct CopyFormatOptions
List *force_notnull; /* list of column names */
bool force_notnull_all; /* FORCE_NOT_NULL *? */
bool *force_notnull_flags; /* per-column CSV FNN flags */
+ bool force_array; /* add JSON array decorations */
List *force_null; /* list of column names */
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index fcb8823e101..f3196fd5609 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -110,6 +110,29 @@ LINE 1: copy copytest to stdout (format json, on_error ignore);
copy copytest from stdin(format json);
ERROR: COPY json mode cannot be used with COPY FROM
-- all of the above should yield error
+--Error
+copy copytest to stdout (format csv, force_array true);
+ERROR: COPY FORCE_ARRAY can only used with JSON mode
+--ok
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 80bd4a59239..c55aa08e99d 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -100,6 +100,14 @@ copy copytest to stdout (format json, on_error ignore);
copy copytest from stdin(format json);
-- all of the above should yield error
+--Error
+copy copytest to stdout (format csv, force_array true);
+
+--ok
+copy copytest to stdout (format json, force_array);
+copy copytest to stdout (format json, force_array true);
+copy copytest to stdout (format json, force_array false);
+
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
--
2.34.1
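
The FORCE_ARRAY framing in the patch above can be illustrated outside the backend. The following is a minimal sketch, assuming a plain "\n" line terminator and an in-memory output buffer. JsonSketch, sketch_start(), sketch_row() and sketch_end() are hypothetical names that loosely mirror CopyToTextLikeStart(), CopyToJsonOneRow() and CopyToJsonEnd(); none of them exist in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the pieces of CopyToState the patch touches. */
typedef struct
{
	bool		force_array;		/* FORCE_ARRAY option */
	bool		row_delim_needed;	/* true once the first row is out */
	char	   *out;				/* caller-supplied output buffer */
	size_t		cap;				/* buffer capacity in bytes */
	size_t		len;				/* bytes used, excluding the NUL */
} JsonSketch;

/* Append a NUL-terminated string to the output buffer. */
static void
emit(JsonSketch *s, const char *text)
{
	size_t		n = strlen(text);

	assert(s->len + n < s->cap);
	memcpy(s->out + s->len, text, n + 1);
	s->len += n;
}

/* Start callback: the opening bracket goes on its own line. */
void
sketch_start(JsonSketch *s)
{
	if (s->force_array)
		emit(s, "[\n");
}

/* Per-row callback: a space before the first row, a comma before the rest. */
void
sketch_row(JsonSketch *s, const char *row_json)
{
	if (s->force_array)
	{
		emit(s, s->row_delim_needed ? "," : " ");
		s->row_delim_needed = true;
	}
	emit(s, row_json);
	emit(s, "\n");
}

/* End callback: the closing bracket goes on its own line. */
void
sketch_end(JsonSketch *s)
{
	if (s->force_array)
		emit(s, "]\n");
}
```

Calling sketch_start(), then sketch_row() once per row, then sketch_end() with force_array set reproduces the shape seen in the expected regression output: "[" on the first line, " {...}" for the first row, ",{...}" for each later row, and "]" on the last line. With force_array unset, only the row lines are emitted.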
[text/x-patch] v17-0001-json-format-for-COPY-TO.patch (30.5K, 3-v17-0001-json-format-for-COPY-TO.patch)
download | inline diff:
From 984846c42dee4da6bd718cc24031bc0272d62f12 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Fri, 4 Jul 2025 13:23:30 +0800
Subject: [PATCH v17 1/2] json format for COPY TO
JSON format is only supported with the COPY TO operation. It is incompatible
with options such as HEADER, DEFAULT, NULL, DELIMITER, and several others. This
has been thoroughly tested in src/test/regress/sql/copy.sql.
The CopyFormat enum was originally contributed by Joel Jacobson
[email protected], later refactored by Jian He to address various issues, and
further adapted by Junwang Zhao to support the newly introduced CopyToRoutine
struct (commit 2e4127b6d2).
Author: Joe Conway <[email protected]>
Reviewed-by: "Andrey M. Borodin" <[email protected]>
Reviewed-by: Dean Rasheed <[email protected]>
Reviewed-by: Daniel Verite <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Davin Shearer <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Alvaro Herrera <[email protected]>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
doc/src/sgml/ref/copy.sgml | 13 +++--
src/backend/commands/copy.c | 79 ++++++++++++++++++--------
src/backend/commands/copyfrom.c | 6 +-
src/backend/commands/copyfromparse.c | 7 ++-
src/backend/commands/copyto.c | 83 ++++++++++++++++++++++++----
src/backend/parser/gram.y | 8 +++
src/backend/utils/adt/json.c | 5 +-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 14 ++++-
src/include/utils/json.h | 2 +
src/test/regress/expected/copy.out | 74 +++++++++++++++++++++++++
src/test/regress/sql/copy.sql | 46 +++++++++++++++
src/tools/pgindent/typedefs.list | 1 +
13 files changed, 288 insertions(+), 52 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c2d1fbc1fbe..219604ad306 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -228,10 +228,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Selects the data format to be read or written:
<literal>text</literal>,
<literal>csv</literal> (Comma Separated Values),
+ <literal>json</literal> (JavaScript Object Notation),
or <literal>binary</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
+ <para>
+ The <literal>json</literal> option is allowed only in
+ <command>COPY TO</command>.
+ </para>
</listitem>
</varlistentry>
@@ -266,7 +271,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
(line) of the file. The default is a tab character in text format,
a comma in <literal>CSV</literal> format.
This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -280,7 +285,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
<note>
@@ -303,7 +308,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ not using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -330,7 +335,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<command>COPY FROM</command> commands.
</para>
<para>
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fae9c41db65..213e59cc435 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -521,6 +521,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out = (CopyFormatOptions *) palloc0(sizeof(CopyFormatOptions));
opts_out->file_encoding = -1;
+ /* default format */
+ opts_out->format = COPY_FORMAT_TEXT;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -535,11 +537,13 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "json") == 0)
+ opts_out->format = COPY_FORMAT_JSON;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -699,31 +703,47 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_JSON && opts_out->delim)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in JSON mode", "DELIMITER"));
+
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_JSON && opts_out->null_print)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in JSON mode", "NULL"));
+
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ if (opts_out->format == COPY_FORMAT_JSON && opts_out->default_print)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in JSON mode", "DEFAULT"));
+
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = opts_out->format == COPY_FORMAT_CSV ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = opts_out->format == COPY_FORMAT_CSV ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -771,7 +791,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -779,43 +799,48 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line != COPY_HEADER_FALSE)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ if (opts_out->format == COPY_FORMAT_JSON && opts_out->header_line)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("cannot specify %s in JSON mode", "HEADER"));
+
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -829,8 +854,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_notnull != NIL ||
+ opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -845,8 +870,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+ opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -870,7 +895,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -887,6 +912,12 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY %s cannot be used with %s", "FREEZE",
"COPY TO")));
+ /* Check json format */
+ if (opts_out->format == COPY_FORMAT_JSON && is_from)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY %s mode cannot be used with %s", "json", "COPY FROM"));
+
if (opts_out->default_print)
{
if (!is_from)
@@ -906,7 +937,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -923,7 +954,7 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fbbbc09a97b..6c4bd303841 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -155,9 +155,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
static const CopyFromRoutine *
CopyFromGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyFromRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyFromRoutineBinary;
/* default is text */
@@ -261,7 +261,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index b1ae97b833d..578e6c0c9a2 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -171,7 +171,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -747,7 +747,7 @@ bool
NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
return NextCopyFromRawFieldsInternal(cstate, fields, nfields,
- cstate->opts.csv_mode);
+ cstate->opts.format == COPY_FORMAT_CSV);
}
/*
@@ -774,7 +774,8 @@ NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields
bool done = false;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_FALSE)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 67b94b91cae..34b72936bca 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -24,6 +24,7 @@
#include "executor/execdesc.h"
#include "executor/executor.h"
#include "executor/tuptable.h"
+#include "funcapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
@@ -31,6 +32,7 @@
#include "pgstat.h"
#include "storage/fd.h"
#include "tcop/tcopprot.h"
+#include "utils/json.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
@@ -125,6 +127,7 @@ static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -144,7 +147,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
/*
* COPY TO routines for built-in formats.
*
- * CSV and text formats share the same TextLike routines except for the
+ * CSV, text, and json formats share the same TextLike routines except for the
* one-row callback.
*/
@@ -164,6 +167,14 @@ static const CopyToRoutine CopyToRoutineCSV = {
.CopyToEnd = CopyToTextLikeEnd,
};
+/* json format */
+static const CopyToRoutine CopyToRoutineJson = {
+ .CopyToStart = CopyToTextLikeStart,
+ .CopyToOutFunc = CopyToTextLikeOutFunc,
+ .CopyToOneRow = CopyToJsonOneRow,
+ .CopyToEnd = CopyToTextLikeEnd,
+};
+
/* binary format */
static const CopyToRoutine CopyToRoutineBinary = {
.CopyToStart = CopyToBinaryStart,
@@ -176,16 +187,18 @@ static const CopyToRoutine CopyToRoutineBinary = {
static const CopyToRoutine *
CopyToGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyToRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyToRoutineBinary;
+ else if (opts->format == COPY_FORMAT_JSON)
+ return &CopyToRoutineJson;
/* default is text */
return &CopyToRoutineText;
}
-/* Implementation of the start callback for text and CSV formats */
+/* Implementation of the start callback for text, CSV, and json formats */
static void
CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
{
@@ -204,6 +217,8 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
ListCell *cur;
bool hdr_delim = false;
+ Assert(cstate->opts.format != COPY_FORMAT_JSON);
+
foreach(cur, cstate->attnumlist)
{
int attnum = lfirst_int(cur);
@@ -215,7 +230,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -226,7 +241,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
}
/*
- * Implementation of the outfunc callback for text and CSV formats. Assign
+ * Implementation of the outfunc callback for text, CSV, and json formats. Assign
* the output function data to the given *finfo.
*/
static void
@@ -299,13 +314,46 @@ CopyToTextLikeOneRow(CopyToState cstate,
CopySendTextLikeEndOfRow(cstate);
}
-/* Implementation of the end callback for text and CSV formats */
+/* Implementation of the end callback for text, CSV, and json formats */
static void
CopyToTextLikeEnd(CopyToState cstate)
{
/* Nothing to do here */
}
+/* Implementation of per-row callback for json format */
+static void
+CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+ Datum rowdata;
+ StringInfo result;
+
+ /*
+ * If the COPY TO source is a query rather than a plain table, we need to
+ * copy cstate->queryDesc->tupDesc into slot->tts_tupleDescriptor.
+ * This is necessary because the slot's TupleDesc may change during query
+ * execution, and we depend on it when calling composite_to_json.
+ */
+ if (!cstate->rel)
+ {
+ memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
+ TupleDescAttr(cstate->queryDesc->tupDesc, 0),
+ cstate->queryDesc->tupDesc->natts * sizeof(FormData_pg_attribute));
+
+ for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
+ populate_compact_attribute(slot->tts_tupleDescriptor, i);
+
+ BlessTupleDesc(slot->tts_tupleDescriptor);
+ }
+ rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ result = makeStringInfo();
+ composite_to_json(rowdata, result, false);
+
+ CopySendData(cstate, result->data, result->len);
+
+ CopySendTextLikeEndOfRow(cstate);
+}
+
/*
* Implementation of the start callback for binary format. Send a header
* for a binary copy.
@@ -392,14 +440,25 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
pq_sendbyte(&buf, format); /* overall format */
- pq_sendint16(&buf, natts);
- for (i = 0; i < natts; i++)
- pq_sendint16(&buf, format); /* per-column formats */
+ if (cstate->opts.format != COPY_FORMAT_JSON)
+ {
+ pq_sendint16(&buf, natts);
+ for (i = 0; i < natts; i++)
+ pq_sendint16(&buf, format); /* per-column formats */
+ }
+ else
+ {
+ /*
+ * JSON format is always one non-binary column
+ */
+ pq_sendint16(&buf, 1);
+ pq_sendint16(&buf, 0);
+ }
pq_endmessage(&buf);
cstate->copy_dest = COPY_FRONTEND;
}
@@ -499,7 +558,7 @@ CopySendEndOfRow(CopyToState cstate)
}
/*
- * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
+ * Wrapper function of CopySendEndOfRow for text, CSV, and json formats. Sends the
* line termination and do common appropriate things for the end of row.
*/
static inline void
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 70a0d832a11..de0f5eb4118 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3492,6 +3492,10 @@ copy_opt_item:
{
$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
}
+ | JSON
+ {
+ $$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ }
| HEADER_P
{
$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3574,6 +3578,10 @@ copy_generic_opt_elem:
{
$$ = makeDefElem($1, $2, @1);
}
+ | FORMAT_LA copy_generic_opt_arg
+ {
+ $$ = makeDefElem("format", $2, @1);
+ }
;
copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 51452755f58..bf69347fa94 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -85,8 +85,6 @@ typedef struct JsonAggState
JsonUniqueBuilderState unique_check;
} JsonAggState;
-static void composite_to_json(Datum composite, StringInfo result,
- bool use_line_feeds);
static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
Datum *vals, bool *nulls, int *valcount,
JsonTypeCategory tcategory, Oid outfuncoid,
@@ -516,8 +514,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
/*
* Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
*/
-static void
+void
composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
{
HeapTupleHeader td;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 53e7d35fe98..bd4c03be050 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3308,7 +3308,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
- COMPLETE_WITH("binary", "csv", "text");
+ COMPLETE_WITH("binary", "csv", "text", "json");
/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 541176e1980..85aedc267d6 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -48,6 +48,17 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT = 0,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+ COPY_FORMAT_JSON,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -58,9 +69,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
int header_line; /* number of lines to skip or COPY_HEADER_XXX
* value (see the above) */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 49bbda7ac06..1fa8e2ce8e2 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
#include "lib/stringinfo.h"
/* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+ bool use_line_feeds);
extern void escape_json(StringInfo buf, const char *str);
extern void escape_json_with_len(StringInfo buf, const char *str, int len);
extern void escape_json_text(StringInfo buf, const text *txt);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index ac66eb55aee..fcb8823e101 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -73,6 +73,80 @@ copy copytest3 to stdout csv header;
c1,"col with , comma","col with "" quote"
1,a,1
2,b,2
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+ERROR: cannot specify DELIMITER in JSON mode
+copy copytest to stdout (format json, null '\N');
+ERROR: cannot specify NULL in JSON mode
+copy copytest to stdout (format json, default '|');
+ERROR: cannot specify DEFAULT in JSON mode
+copy copytest to stdout (format json, header);
+ERROR: cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, quote '"');
+ERROR: COPY QUOTE requires CSV mode
+copy copytest to stdout (format json, escape '"');
+ERROR: COPY ESCAPE requires CSV mode
+copy copytest to stdout (format json, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
+copy copytest to stdout (format json, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+copy copytest to stdout (format json, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
+copy copytest to stdout (format json, on_error ignore);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: copy copytest to stdout (format json, on_error ignore);
+ ^
+copy copytest from stdin(format json);
+ERROR: COPY json mode cannot be used with COPY FROM
+-- all of the above should yield error
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
create temp table copytest4 (
c1 int,
"colname with tab: " text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index a1316c73bac..80bd4a59239 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -82,6 +82,52 @@ this is just a line full of junk that would error out if parsed
copy copytest3 to stdout csv header;
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+copy copytest to stdout (format json);
+
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+copy copytest to stdout (format json, null '\N');
+copy copytest to stdout (format json, default '|');
+copy copytest to stdout (format json, header);
+copy copytest to stdout (format json, quote '"');
+copy copytest to stdout (format json, escape '"');
+copy copytest to stdout (format json, force_quote *);
+copy copytest to stdout (format json, force_not_null *);
+copy copytest to stdout (format json, force_null *);
+copy copytest to stdout (format json, on_error ignore);
+copy copytest from stdin(format json);
+-- all of the above should yield error
+
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
create temp table copytest4 (
c1 int,
"colname with tab: " text);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 114bdafafdf..fa7d7b9244a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -518,6 +518,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromRoutine
CopyFromState
--
2.34.1
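For anyone checking the SendCopyBegin() change against a packet trace, here is a rough sketch of the CopyOutResponse layout: type byte 'H', an Int32 length that counts itself but not the type byte, an Int8 overall format, an Int16 column count, then one Int16 format code per column. The helper names (put_int16, put_int32, build_copy_out_response) are made up for illustration and are not part of the patch or of libpq. Three text columns give a length of 13, which matches the tshark output quoted earlier in the thread; the JSON case advertises one text column, so the length would be 9.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: store integers in network (big-endian) byte order. */
static void
put_int16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t) (v >> 8);
    p[1] = (uint8_t) (v & 0xff);
}

static void
put_int32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t) (v >> 24);
    p[1] = (uint8_t) (v >> 16);
    p[2] = (uint8_t) (v >> 8);
    p[3] = (uint8_t) v;
}

/*
 * Hypothetical builder for a CopyOutResponse message.  "format" is 0 for
 * text and 1 for binary and is used both as the overall format and as the
 * per-column format code, as SendCopyBegin() does.  Returns the number of
 * bytes written, or 0 if the buffer is too small.
 */
static size_t
build_copy_out_response(uint8_t *out, size_t cap, uint8_t format,
                        uint16_t ncols)
{
    /* length field: itself (4) + format (1) + ncols (2) + 2 per column */
    uint32_t    len = 4 + 1 + 2 + 2 * (uint32_t) ncols;
    size_t      pos = 0;

    if (cap < (size_t) len + 1)
        return 0;

    out[pos++] = 'H';           /* message type, not counted in len */
    put_int32(out + pos, len);
    pos += 4;
    out[pos++] = format;        /* overall format */
    put_int16(out + pos, ncols);
    pos += 2;
    for (uint16_t i = 0; i < ncols; i++)
    {
        put_int16(out + pos, format);   /* per-column format */
        pos += 2;
    }
    return pos;
}
```

Under these assumptions, build_copy_out_response(buf, sizeof(buf), 0, 1) models what the JSON branch sends: a single non-binary column regardless of how many attributes the relation has.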
* Re: Emitting JSON to file using COPY TO
@ 2025-08-10 15:20 jian he <[email protected]>
parent: jian he <[email protected]>
0 siblings, 1 reply; 28+ messages in thread
From: jian he @ 2025-08-10 15:20 UTC (permalink / raw)
To: Junwang Zhao <[email protected]>; +Cc: Joe Conway <[email protected]>; Andrey M. Borodin <[email protected]>; Dean Rasheed <[email protected]>; Daniel Verite <[email protected]>; Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
hi.
Rebased and split into 3 patches.
v18-0001
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT = 0,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
It removes the two boolean fields of CopyFormatOptions
(binary, csv_mode) and replaces them with a single format field.
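As a small illustration of why a single enum is tidier than the two booleans, here is a standalone sketch. The enum matches the one quoted above; OldFlags, old_flags_to_format() and default_delim() are hypothetical names used only for this example and do not exist in the tree. The default-delimiter rule is the one ProcessCopyOptions() applies for omitted options.

```c
#include <stdbool.h>
#include <string.h>

/* Same values as the enum introduced in v18-0001. */
typedef enum CopyFormat
{
    COPY_FORMAT_TEXT = 0,
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV,
} CopyFormat;

/* Hypothetical stand-in for the old pair of flags in CopyFormatOptions. */
typedef struct OldFlags
{
    bool        binary;
    bool        csv_mode;
} OldFlags;

/*
 * Hypothetical helper: map the old flags onto the enum.  Returns false for
 * the one combination the booleans can express but that has no meaning
 * (binary and CSV at the same time); the enum cannot represent it at all.
 */
static bool
old_flags_to_format(OldFlags f, CopyFormat *out)
{
    if (f.binary && f.csv_mode)
        return false;
    if (f.binary)
        *out = COPY_FORMAT_BINARY;
    else if (f.csv_mode)
        *out = COPY_FORMAT_CSV;
    else
        *out = COPY_FORMAT_TEXT;
    return true;
}

/* Hypothetical helper: default delimiter when DELIMITER is omitted. */
static const char *
default_delim(CopyFormat format)
{
    return format == COPY_FORMAT_CSV ? "," : "\t";
}
```

With the enum, adding a further format is one more enumerator plus explicit comparisons, rather than a third boolean whose interactions with the other two would have to be checked everywhere.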
v18-0002 and v18-0003 are refactored on top of that patch.
Attachments:
[text/x-patch] v18-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch (9.7K, 2-v18-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch)
From e9104bcc0a6c4ca96df5ff3fdd3ae659885dd664 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Wed, 30 Jul 2025 19:50:41 +0800
Subject: [PATCH v18 3/3] Add option force_array for COPY JSON FORMAT
The force_array option can only be used in COPY TO with JSON format. It makes
the output a single JSON array instead of one object per line. Refactored by
Junwang Zhao to adapt to the newly introduced CopyToRoutine struct (2e4127b6d2).
Author: Joe Conway <[email protected]>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
doc/src/sgml/ref/copy.sgml | 14 +++++++++++
src/backend/commands/copy.c | 13 +++++++++++
src/backend/commands/copyto.c | 37 +++++++++++++++++++++++++++++-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy.out | 23 +++++++++++++++++++
src/test/regress/sql/copy.sql | 8 +++++++
7 files changed, 96 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 219604ad306..c01927864bd 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -40,6 +40,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
HEADER [ <replaceable class="parameter">boolean</replaceable> | <replaceable class="parameter">integer</replaceable> | MATCH ]
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
+ FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -366,6 +367,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>FORCE_ARRAY</literal></term>
+ <listitem>
+ <para>
+ Force output of square brackets as array decorations at the beginning
+ and end of output, and commas between the rows. It is allowed only in
+ <command>COPY TO</command>, and only when using
+ <literal>json</literal> format. The default is
+ <literal>false</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>FORCE_QUOTE</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 7fd41dba250..0a22272f3fe 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -514,6 +514,7 @@ ProcessCopyOptions(ParseState *pstate,
bool on_error_specified = false;
bool log_verbosity_specified = false;
bool reject_limit_specified = false;
+ bool force_array_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -670,6 +671,13 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "force_array") == 0)
+ {
+ if (force_array_specified)
+ errorConflictingDefElem(defel, pstate);
+ force_array_specified = true;
+ opts_out->force_array = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "on_error") == 0)
{
if (on_error_specified)
@@ -925,6 +933,11 @@ ProcessCopyOptions(ParseState *pstate,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY %s mode cannot be used with %s", "json", "COPY FROM"));
+ if (opts_out->format != COPY_FORMAT_JSON && opts_out->force_array)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY %s can only used with JSON mode", "FORCE_ARRAY"));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 13eb14debbd..6fc1d3e9fee 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -84,6 +84,10 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+
+ /* need delimiter to start next json array element */
+ bool json_row_delim_needed;
+
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -128,6 +132,7 @@ static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToJsonEnd(CopyToState cstate);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -172,7 +177,7 @@ static const CopyToRoutine CopyToRoutineJson = {
.CopyToStart = CopyToTextLikeStart,
.CopyToOutFunc = CopyToTextLikeOutFunc,
.CopyToOneRow = CopyToJsonOneRow,
- .CopyToEnd = CopyToTextLikeEnd,
+ .CopyToEnd = CopyToJsonEnd,
};
/* binary format */
@@ -238,6 +243,16 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
CopySendTextLikeEndOfRow(cstate);
}
+
+ /*
+ * If JSON has been requested and FORCE_ARRAY has been specified, send the
+ * opening bracket.
+ */
+ if (cstate->opts.format == COPY_FORMAT_JSON && cstate->opts.force_array)
+ {
+ CopySendChar(cstate, '[');
+ CopySendTextLikeEndOfRow(cstate);
+ }
}
/*
@@ -349,11 +364,31 @@ CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
result = makeStringInfo();
composite_to_json(rowdata, result, false);
+ if (cstate->json_row_delim_needed && cstate->opts.force_array)
+ CopySendChar(cstate, ',');
+ else if (cstate->opts.force_array)
+ {
+ /* first row needs no delimiter */
+ CopySendChar(cstate, ' ');
+ cstate->json_row_delim_needed = true;
+ }
+
CopySendData(cstate, result->data, result->len);
CopySendTextLikeEndOfRow(cstate);
}
+/* Implementation of the end callback for json format */
+static void
+CopyToJsonEnd(CopyToState cstate)
+{
+ if (cstate->opts.force_array)
+ {
+ CopySendChar(cstate, ']');
+ CopySendTextLikeEndOfRow(cstate);
+ }
+}
+
/*
* Implementation of the start callback for binary format. Send a header
* for a binary copy.
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 16db5373c5f..9c65376fc7e 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1212,7 +1212,7 @@ Copy_common_options, "DEFAULT", "FORCE_NOT_NULL", "FORCE_NULL", "FREEZE", \
/* COPY TO options */
#define Copy_to_options \
-Copy_common_options, "FORCE_QUOTE"
+Copy_common_options, "FORCE_QUOTE", "FORCE_ARRAY"
/*
* These object types were introduced later than our support cutoff of
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 85aedc267d6..7274b0d3ca5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,6 +87,7 @@ typedef struct CopyFormatOptions
List *force_notnull; /* list of column names */
bool force_notnull_all; /* FORCE_NOT_NULL *? */
bool *force_notnull_flags; /* per-column CSV FNN flags */
+ bool force_array; /* add JSON array decorations */
List *force_null; /* list of column names */
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 0fc6e84352c..22626a13ba5 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -112,6 +112,29 @@ LINE 1: copy copytest to stdout (format json, on_error ignore);
copy copytest from stdin(format json);
ERROR: COPY json mode cannot be used with COPY FROM
-- all of the above should yield error
+--Error
+copy copytest to stdout (format csv, force_array true);
+ERROR: COPY FORCE_ARRAY can only used with JSON mode
+--ok
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 071986d427a..0f121b48f71 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -101,6 +101,14 @@ copy copytest to stdout (format json, on_error ignore);
copy copytest from stdin(format json);
-- all of the above should yield error
+--Error
+copy copytest to stdout (format csv, force_array true);
+
+--ok
+copy copytest to stdout (format json, force_array);
+copy copytest to stdout (format json, force_array true);
+copy copytest to stdout (format json, force_array false);
+
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
--
2.34.1
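For readers comparing the FORCE_ARRAY expected output above with the plain one-object-per-line output, the framing can be sketched as follows. This is only an illustration under stated assumptions: the rows are taken to be already-serialized JSON objects, and the names sketch_append and sketch_emit_array are hypothetical and do not appear in the patch, which emits the decorations through the CopyTo callbacks instead.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append s to the NUL-terminated buffer buf,
 * truncating silently if the buffer is full.
 */
static void
sketch_append(char *buf, size_t bufsize, const char *s)
{
    size_t      used = strlen(buf);

    snprintf(buf + used, bufsize - used, "%s", s);
}

/*
 * Frame nrows serialized JSON objects the way the expected output shows:
 * "[" on its own line, a leading space before the first row, a leading
 * comma before every later row, and "]" on its own line.
 */
static void
sketch_emit_array(char *buf, size_t bufsize,
                  const char *const *rows, int nrows)
{
    buf[0] = '\0';
    sketch_append(buf, bufsize, "[\n");
    for (int i = 0; i < nrows; i++)
    {
        sketch_append(buf, bufsize, (i == 0) ? " " : ",");
        sketch_append(buf, bufsize, rows[i]);
        sketch_append(buf, bufsize, "\n");
    }
    sketch_append(buf, bufsize, "]\n");
}
```

Putting the comma at the start of each later row, rather than at the end of the previous one, means the producer never needs to know whether another row follows.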
[text/x-patch] v18-0002-json-format-for-COPY-TO.patch (22.6K, 3-v18-0002-json-format-for-COPY-TO.patch)
From e6bafe5eab9463b253e9697e0fbd82246c316a12 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Sun, 10 Aug 2025 23:13:38 +0800
Subject: [PATCH v18 2/3] json format for COPY TO
JSON format is only supported with the COPY TO operation. It is incompatible
with options such as HEADER, DEFAULT, NULL, DELIMITER, and several others. These
cases are covered by regression tests in src/test/regress/sql/copy.sql.
The CopyFormat enum was originally contributed by Joel Jacobson
[email protected], later refactored by Jian He to address various issues, and
further adapted by Junwang Zhao to support the newly introduced CopyToRoutine
struct (commit 2e4127b6d2).
Author: Joe Conway <[email protected]>
Reviewed-by: "Andrey M. Borodin" <[email protected]>
Reviewed-by: Dean Rasheed <[email protected]>
Reviewed-by: Daniel Verite <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Davin Shearer <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Alvaro Herrera <[email protected]>
Discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
Discussion: https://postgr.es/m/[email protected]
---
doc/src/sgml/ref/copy.sgml | 13 +++--
src/backend/commands/copy.c | 72 +++++++++++++++++++++-------
src/backend/commands/copyto.c | 76 ++++++++++++++++++++++++++----
src/backend/parser/gram.y | 8 ++++
src/backend/utils/adt/json.c | 5 +-
src/bin/psql/tab-complete.in.c | 4 +-
src/include/commands/copy.h | 1 +
src/include/utils/json.h | 2 +
src/test/regress/expected/copy.out | 76 ++++++++++++++++++++++++++++++
src/test/regress/sql/copy.sql | 47 ++++++++++++++++++
src/tools/pgindent/typedefs.list | 1 +
11 files changed, 271 insertions(+), 34 deletions(-)
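A note on the wire-level effect of the SendCopyBegin change in this patch: in JSON mode the CopyOutResponse now advertises a single text column regardless of how many attributes the relation has. The sketch below is a standalone illustration, not backend code. The function names are hypothetical, and it assumes the usual v3 protocol layout (type byte 'H', a 4-byte length that counts itself but not the type byte, one overall-format byte, a 16-bit column count, then one 16-bit format code per column).

```c
#include <stddef.h>
#include <stdint.h>

/* Write a 16-bit value in network byte order; returns bytes written. */
static size_t
sketch_put_int16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t) (v >> 8);
    p[1] = (uint8_t) (v & 0xff);
    return 2;
}

/*
 * Hypothetical stand-in for the message built by SendCopyBegin.
 * Returns the total number of bytes written, or 0 if out is too small.
 */
static size_t
sketch_copy_out_response(uint8_t *out, size_t outsize,
                         uint16_t natts, int is_binary, int is_json)
{
    /* JSON mode: always one non-binary column, as in the patch */
    uint16_t    ncols = is_json ? 1 : natts;
    uint16_t    format = (is_binary && !is_json) ? 1 : 0;
    size_t      need = 1 + 4 + 1 + 2 + 2 * (size_t) ncols;
    uint32_t    len = (uint32_t) (need - 1);    /* excludes the type byte */
    size_t      n = 0;

    if (outsize < need)
        return 0;

    out[n++] = 'H';                             /* CopyOutResponse */
    out[n++] = (uint8_t) (len >> 24);
    out[n++] = (uint8_t) (len >> 16);
    out[n++] = (uint8_t) (len >> 8);
    out[n++] = (uint8_t) (len & 0xff);
    out[n++] = (uint8_t) format;                /* overall format */
    n += sketch_put_int16(out + n, ncols);
    for (uint16_t i = 0; i < ncols; i++)
        n += sketch_put_int16(out + n, format); /* per-column formats */
    return n;
}
```

With three text columns and is_json = 0 this yields a length field of 13, matching the tshark capture quoted earlier in the thread. With is_json = 1 the length field becomes 9 and the column count becomes 1.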
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c2d1fbc1fbe..219604ad306 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -228,10 +228,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Selects the data format to be read or written:
<literal>text</literal>,
<literal>csv</literal> (Comma Separated Values),
+ <literal>json</literal> (JavaScript Object Notation),
or <literal>binary</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
+ <para>
+ The <literal>json</literal> option is allowed only in
+ <command>COPY TO</command>.
+ </para>
</listitem>
</varlistentry>
@@ -266,7 +271,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
(line) of the file. The default is a tab character in text format,
a comma in <literal>CSV</literal> format.
This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -280,7 +285,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
<note>
@@ -303,7 +308,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ not using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -330,7 +335,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<command>COPY FROM</command> commands.
</para>
<para>
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 68f69cfb9df..7fd41dba250 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -542,6 +542,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "json") == 0)
+ opts_out->format = COPY_FORMAT_JSON;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -701,21 +703,42 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+ if (opts_out->delim)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER"));
+ else if (opts_out->format == COPY_FORMAT_JSON)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in JSON mode", "DELIMITER"));
+ }
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
+ if (opts_out->null_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "NULL"));
+ else if (opts_out->format == COPY_FORMAT_JSON)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in JSON mode", "NULL"));
+ }
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ if (opts_out->default_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "DEFAULT"));
+ else if (opts_out->format == COPY_FORMAT_JSON)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in JSON mode", "DEFAULT"));
+ }
/* Set defaults for omitted options */
if (!opts_out->delim)
@@ -781,11 +804,18 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ if (opts_out->header_line != COPY_HEADER_FALSE)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "HEADER"));
+ else if (opts_out->format == COPY_FORMAT_JSON)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot specify %s in JSON mode", "HEADER"));
+ }
/* Check quote */
if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
@@ -889,6 +919,12 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY %s cannot be used with %s", "FREEZE",
"COPY TO")));
+ /* Check json format */
+ if (opts_out->format == COPY_FORMAT_JSON && is_from)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY %s mode cannot be used with %s", "json", "COPY FROM"));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e990343bab0..13eb14debbd 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -24,6 +24,7 @@
#include "executor/execdesc.h"
#include "executor/executor.h"
#include "executor/tuptable.h"
+#include "funcapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
@@ -31,6 +32,7 @@
#include "pgstat.h"
#include "storage/fd.h"
#include "tcop/tcopprot.h"
+#include "utils/json.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
@@ -125,6 +127,7 @@ static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -144,7 +147,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
/*
* COPY TO routines for built-in formats.
*
- * CSV and text formats share the same TextLike routines except for the
+ * CSV, text, and json formats share the same TextLike routines except for the
* one-row callback.
*/
@@ -164,6 +167,14 @@ static const CopyToRoutine CopyToRoutineCSV = {
.CopyToEnd = CopyToTextLikeEnd,
};
+/* json format */
+static const CopyToRoutine CopyToRoutineJson = {
+ .CopyToStart = CopyToTextLikeStart,
+ .CopyToOutFunc = CopyToTextLikeOutFunc,
+ .CopyToOneRow = CopyToJsonOneRow,
+ .CopyToEnd = CopyToTextLikeEnd,
+};
+
/* binary format */
static const CopyToRoutine CopyToRoutineBinary = {
.CopyToStart = CopyToBinaryStart,
@@ -180,12 +191,14 @@ CopyToGetRoutine(const CopyFormatOptions *opts)
return &CopyToRoutineCSV;
else if (opts->format == COPY_FORMAT_BINARY)
return &CopyToRoutineBinary;
+ else if (opts->format == COPY_FORMAT_JSON)
+ return &CopyToRoutineJson;
/* default is text */
return &CopyToRoutineText;
}
-/* Implementation of the start callback for text and CSV formats */
+/* Implementation of the start callback for text, CSV, and json formats */
static void
CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
{
@@ -204,6 +217,8 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
ListCell *cur;
bool hdr_delim = false;
+ Assert(cstate->opts.format != COPY_FORMAT_JSON);
+
foreach(cur, cstate->attnumlist)
{
int attnum = lfirst_int(cur);
@@ -226,7 +241,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
}
/*
- * Implementation of the outfunc callback for text and CSV formats. Assign
+ * Implementation of the outfunc callback for text, CSV, and json formats. Assign
* the output function data to the given *finfo.
*/
static void
@@ -299,13 +314,46 @@ CopyToTextLikeOneRow(CopyToState cstate,
CopySendTextLikeEndOfRow(cstate);
}
-/* Implementation of the end callback for text and CSV formats */
+/* Implementation of the end callback for text, CSV, and json formats */
static void
CopyToTextLikeEnd(CopyToState cstate)
{
/* Nothing to do here */
}
+/* Implementation of per-row callback for json format */
+static void
+CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+ Datum rowdata;
+ StringInfo result;
+
+ /*
+ * If the COPY TO source is a query rather than a plain table, we need to
+ * copy cstate->queryDesc->tupDesc into slot->tts_tupleDescriptor.
+ * This is necessary because the slot's TupleDesc may change during query
+ * execution, and we depend on it when calling composite_to_json.
+ */
+ if (!cstate->rel)
+ {
+ memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
+ TupleDescAttr(cstate->queryDesc->tupDesc, 0),
+ cstate->queryDesc->tupDesc->natts * sizeof(FormData_pg_attribute));
+
+ for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
+ populate_compact_attribute(slot->tts_tupleDescriptor, i);
+
+ BlessTupleDesc(slot->tts_tupleDescriptor);
+ }
+ rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ result = makeStringInfo();
+ composite_to_json(rowdata, result, false);
+
+ CopySendData(cstate, result->data, result->len);
+
+ CopySendTextLikeEndOfRow(cstate);
+}
+
/*
* Implementation of the start callback for binary format. Send a header
* for a binary copy.
@@ -397,9 +445,21 @@ SendCopyBegin(CopyToState cstate)
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
pq_sendbyte(&buf, format); /* overall format */
- pq_sendint16(&buf, natts);
- for (i = 0; i < natts; i++)
- pq_sendint16(&buf, format); /* per-column formats */
+ if (cstate->opts.format != COPY_FORMAT_JSON)
+ {
+ pq_sendint16(&buf, natts);
+ for (i = 0; i < natts; i++)
+ pq_sendint16(&buf, format); /* per-column formats */
+ }
+ else
+ {
+ /*
+ * JSON format is always one non-binary column
+ */
+ pq_sendint16(&buf, 1);
+ pq_sendint16(&buf, 0);
+ }
+
pq_endmessage(&buf);
cstate->copy_dest = COPY_FRONTEND;
}
@@ -499,7 +559,7 @@ CopySendEndOfRow(CopyToState cstate)
}
/*
- * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
+ * Wrapper function of CopySendEndOfRow for text, CSV, and json formats. Sends the
* line termination and do common appropriate things for the end of row.
*/
static inline void
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index db43034b9db..48e2242327e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3528,6 +3528,10 @@ copy_opt_item:
{
$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
}
+ | JSON
+ {
+ $$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ }
| HEADER_P
{
$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3610,6 +3614,10 @@ copy_generic_opt_elem:
{
$$ = makeDefElem($1, $2, @1);
}
+ | FORMAT_LA copy_generic_opt_arg
+ {
+ $$ = makeDefElem("format", $2, @1);
+ }
;
copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index e9d370cb3da..e517470bbc7 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -85,8 +85,6 @@ typedef struct JsonAggState
JsonUniqueBuilderState unique_check;
} JsonAggState;
-static void composite_to_json(Datum composite, StringInfo result,
- bool use_line_feeds);
static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
Datum *vals, bool *nulls, int *valcount,
JsonTypeCategory tcategory, Oid outfuncoid,
@@ -516,8 +514,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
/*
* Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
*/
-static void
+void
composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
{
HeapTupleHeader td;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 1f2ca946fc5..16db5373c5f 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3344,8 +3344,10 @@ match_previous_words(int pattern_id,
COMPLETE_WITH(Copy_to_options);
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
- else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
+ else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", "(", "FORMAT"))
COMPLETE_WITH("binary", "csv", "text");
+ else if (Matches("COPY|\\copy", MatchAny, "TO", MatchAny, "WITH", "(", "FORMAT"))
+ COMPLETE_WITH("binary", "csv", "text", "json");
/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", "(", "ON_ERROR"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 686653233b2..85aedc267d6 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -56,6 +56,7 @@ typedef enum CopyFormat
COPY_FORMAT_TEXT = 0,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
+ COPY_FORMAT_JSON,
} CopyFormat;
/*
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 49bbda7ac06..1fa8e2ce8e2 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
#include "lib/stringinfo.h"
/* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+ bool use_line_feeds);
extern void escape_json(StringInfo buf, const char *str);
extern void escape_json_with_len(StringInfo buf, const char *str, int len);
extern void escape_json_text(StringInfo buf, const text *txt);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index ac66eb55aee..0fc6e84352c 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -73,6 +73,82 @@ copy copytest3 to stdout csv header;
c1,"col with , comma","col with "" quote"
1,a,1
2,b,2
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+ERROR: cannot specify DELIMITER in JSON mode
+copy copytest to stdout (format json, null '\N');
+ERROR: cannot specify NULL in JSON mode
+copy copytest to stdout (format json, default '|');
+ERROR: cannot specify DEFAULT in JSON mode
+copy copytest to stdout (format json, header);
+ERROR: cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, header 1);
+ERROR: cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, quote '"');
+ERROR: COPY QUOTE requires CSV mode
+copy copytest to stdout (format json, escape '"');
+ERROR: COPY ESCAPE requires CSV mode
+copy copytest to stdout (format json, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
+copy copytest to stdout (format json, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+copy copytest to stdout (format json, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
+copy copytest to stdout (format json, on_error ignore);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: copy copytest to stdout (format json, on_error ignore);
+ ^
+copy copytest from stdin(format json);
+ERROR: COPY json mode cannot be used with COPY FROM
+-- all of the above should yield error
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
create temp table copytest4 (
c1 int,
"colname with tab: " text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index a1316c73bac..071986d427a 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -82,6 +82,53 @@ this is just a line full of junk that would error out if parsed
copy copytest3 to stdout csv header;
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+copy copytest to stdout (format json);
+
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+copy copytest to stdout (format json, null '\N');
+copy copytest to stdout (format json, default '|');
+copy copytest to stdout (format json, header);
+copy copytest to stdout (format json, header 1);
+copy copytest to stdout (format json, quote '"');
+copy copytest to stdout (format json, escape '"');
+copy copytest to stdout (format json, force_quote *);
+copy copytest to stdout (format json, force_not_null *);
+copy copytest to stdout (format json, force_null *);
+copy copytest to stdout (format json, on_error ignore);
+copy copytest from stdin(format json);
+-- all of the above should yield error
+
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
create temp table copytest4 (
c1 int,
"colname with tab: " text);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e6f2e93b2d6..374b40d14de 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -515,6 +515,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromRoutine
CopyFromState
--
2.34.1
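The "embedded escaped characters" test in the patch above exercises JSON string escaping: the double quote, backslash, and the \b, \f, \n, \r, \t control characters come out escaped, while the forward slash is emitted as-is. The following sketch reproduces that behaviour for illustration only. It is not the backend's escape_json, the function names are hypothetical, and the \u00XX fallback for other control characters is an assumption taken from the JSON grammar rather than from the test output.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append s at offset n of dst if it fits, and
 * return the new offset. Input that does not fit is dropped.
 */
static size_t
sketch_put(char *dst, size_t dstsize, size_t n, const char *s)
{
    size_t      len = strlen(s);

    if (n + len + 1 > dstsize)
        return n;
    memcpy(dst + n, s, len + 1);
    return n + len;
}

/* Write src to dst as a double-quoted JSON string literal. */
static void
sketch_escape_json(char *dst, size_t dstsize, const char *src)
{
    size_t      n = 0;
    char        tmp[8];

    dst[0] = '\0';
    n = sketch_put(dst, dstsize, n, "\"");
    for (; *src != '\0'; src++)
    {
        unsigned char c = (unsigned char) *src;

        switch (c)
        {
            case '"':  n = sketch_put(dst, dstsize, n, "\\\""); break;
            case '\\': n = sketch_put(dst, dstsize, n, "\\\\"); break;
            case '\b': n = sketch_put(dst, dstsize, n, "\\b"); break;
            case '\f': n = sketch_put(dst, dstsize, n, "\\f"); break;
            case '\n': n = sketch_put(dst, dstsize, n, "\\n"); break;
            case '\r': n = sketch_put(dst, dstsize, n, "\\r"); break;
            case '\t': n = sketch_put(dst, dstsize, n, "\\t"); break;
            default:
                if (c < 0x20)       /* assumed fallback, see note above */
                    snprintf(tmp, sizeof(tmp), "\\u%04x", (unsigned int) c);
                else
                {
                    tmp[0] = (char) c;
                    tmp[1] = '\0';
                }
                n = sketch_put(dst, dstsize, n, tmp);
                break;
        }
    }
    sketch_put(dst, dstsize, n, "\"");
}
```

Because every raw newline inside a value is escaped, each row still occupies exactly one output line, which is what lets a consumer split the non-array output on line boundaries.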
[text/x-patch] v18-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch (12.7K, 4-v18-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch)
From 1b30cb9b34b770cebc0bb20af867baec2f72aedb Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Wed, 30 Jul 2025 16:48:56 +0800
Subject: [PATCH v18 1/3] introduce CopyFormat, refactor CopyFormatOptions
Currently, COPY command format is determined by two booleans, binary and
csv_mode, within CopyFormatOptions. This approach, while functional, isn't ideal
for future expansion.
To simplify adding new formats, we've introduced an enum CopyFormat. This makes
the code cleaner and more maintainable, allowing for easier integration of
additional formats down the line.
The CopyFormat enum was originally contributed by Joel Jacobson
[email protected], later refactored by Jian He to address various issues.
Discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
Discussion: https://postgr.es/m/[email protected]
---
src/backend/commands/copy.c | 50 +++++++++++++++-------------
src/backend/commands/copyfrom.c | 6 ++--
src/backend/commands/copyfromparse.c | 7 ++--
src/backend/commands/copyto.c | 8 ++---
src/include/commands/copy.h | 13 ++++++--
5 files changed, 48 insertions(+), 36 deletions(-)
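To make the motivation in the commit message concrete, here is a minimal standalone sketch of the pattern: one enum value replaces the pair of booleans, so the "binary and csv both set" state can no longer be represented, and supporting another format means adding one enumerator and one branch. All names prefixed with Sketch or sketch_ are hypothetical. The real definitions are the CopyFormat enum in src/include/commands/copy.h and the strcmp chain in ProcessCopyOptions.

```c
#include <string.h>

/* Mirrors the shape of the CopyFormat enum; the INVALID member is an addition for this sketch. */
typedef enum SketchCopyFormat
{
    SKETCH_FORMAT_TEXT = 0,
    SKETCH_FORMAT_BINARY,
    SKETCH_FORMAT_CSV,
    SKETCH_FORMAT_INVALID
} SketchCopyFormat;

/* Map a FORMAT option string to an enum value. */
static SketchCopyFormat
sketch_parse_format(const char *fmt)
{
    if (strcmp(fmt, "text") == 0)
        return SKETCH_FORMAT_TEXT;
    else if (strcmp(fmt, "csv") == 0)
        return SKETCH_FORMAT_CSV;
    else if (strcmp(fmt, "binary") == 0)
        return SKETCH_FORMAT_BINARY;
    return SKETCH_FORMAT_INVALID;   /* the backend raises an error here */
}

/* Wire-level format code: 1 for binary, 0 for every text-like format. */
static int
sketch_wire_format(SketchCopyFormat format)
{
    return (format == SKETCH_FORMAT_BINARY) ? 1 : 0;
}
```

The second function corresponds to the `format == COPY_FORMAT_BINARY ? 1 : 0` expressions that replace `opts.binary ? 1 : 0` in ReceiveCopyBegin and SendCopyBegin.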
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fae9c41db65..68f69cfb9df 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -521,6 +521,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out = (CopyFormatOptions *) palloc0(sizeof(CopyFormatOptions));
opts_out->file_encoding = -1;
+ /* default format */
+ opts_out->format = COPY_FORMAT_TEXT;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -535,11 +537,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -699,31 +701,31 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = (opts_out->format == COPY_FORMAT_CSV) ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = (opts_out->format == COPY_FORMAT_CSV) ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -771,7 +773,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -779,43 +781,43 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line != COPY_HEADER_FALSE)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -829,8 +831,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_notnull != NIL ||
+ opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -845,8 +847,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+ opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -870,7 +872,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -906,7 +908,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -923,7 +925,7 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fbbbc09a97b..6c4bd303841 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -155,9 +155,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
static const CopyFromRoutine *
CopyFromGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyFromRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyFromRoutineBinary;
/* default is text */
@@ -261,7 +261,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index b1ae97b833d..578e6c0c9a2 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -171,7 +171,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -747,7 +747,7 @@ bool
NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
return NextCopyFromRawFieldsInternal(cstate, fields, nfields,
- cstate->opts.csv_mode);
+ cstate->opts.format == COPY_FORMAT_CSV);
}
/*
@@ -774,7 +774,8 @@ NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields
bool done = false;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_FALSE)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 67b94b91cae..e990343bab0 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -176,9 +176,9 @@ static const CopyToRoutine CopyToRoutineBinary = {
static const CopyToRoutine *
CopyToGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyToRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyToRoutineBinary;
/* default is text */
@@ -215,7 +215,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -392,7 +392,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 541176e1980..686653233b2 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -48,6 +48,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT = 0,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -58,9 +68,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
int header_line; /* number of lines to skip or COPY_HEADER_XXX
* value (see the above) */
char *null_print; /* NULL marker string (server encoding!) */
--
2.34.1
* Re: Emitting JSON to file using COPY TO
@ 2025-10-01 06:16 jian he <[email protected]>
parent: jian he <[email protected]>
From: jian he @ 2025-10-01 06:16 UTC (permalink / raw)
To: Junwang Zhao <[email protected]>; +Cc: Joe Conway <[email protected]>; Andrey M. Borodin <[email protected]>; Dean Rasheed <[email protected]>; Daniel Verite <[email protected]>; Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On Sun, Aug 10, 2025 at 11:20 PM jian he <[email protected]> wrote:
>
> v18-0001
> +typedef enum CopyFormat
> +{
> + COPY_FORMAT_TEXT = 0,
> + COPY_FORMAT_BINARY,
> + COPY_FORMAT_CSV,
> +} CopyFormat;
> removes the two boolean fields (binary, csv_mode) from CopyFormatOptions.
>
> v18-0002 and v18-0003 are refactorings based on the prior patch.
Hi.
v19 is attached; it is the same as v18.
I am reposting it so that CFbot can pick up the latest patch set.
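For anyone skimming the thread, here is a rough standalone sketch of what the 0001 refactor amounts to. It is not code from the patch: only the CopyFormat enum is copied from it, while demo_format_from_flags and demo_wire_format are hypothetical names made up for illustration. The point is that two independent booleans could in principle both be true, whereas a single enum cannot represent that state, and that the wire-level format code stays 1 for binary and 0 for everything else.

```c
#include <assert.h>
#include <stdbool.h>

/* Copied from the 0001 patch; everything below it is illustrative only. */
typedef enum CopyFormat
{
    COPY_FORMAT_TEXT = 0,
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV,
} CopyFormat;

/*
 * Hypothetical helper: map the old (binary, csv_mode) pair onto the enum.
 * The old flags were never meant to be set together, so that combination
 * is rejected here.
 */
CopyFormat
demo_format_from_flags(bool binary, bool csv_mode)
{
    assert(!(binary && csv_mode));

    if (csv_mode)
        return COPY_FORMAT_CSV;
    if (binary)
        return COPY_FORMAT_BINARY;
    return COPY_FORMAT_TEXT;    /* default is text */
}

/*
 * Hypothetical helper: the format code placed in CopyInResponse and
 * CopyOutResponse, mirroring the "== COPY_FORMAT_BINARY ? 1 : 0" expression
 * in the patch.
 */
int
demo_wire_format(CopyFormat format)
{
    return (format == COPY_FORMAT_BINARY) ? 1 : 0;
}
```

Adding a new format under this scheme is one more enum member plus one more branch in the routine lookup, which is what 0002 does for json.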
Attachments:
[text/x-patch] v19-0002-json-format-for-COPY-TO.patch (22.6K, 2-v19-0002-json-format-for-COPY-TO.patch)
From 3869884bd47aaf681cc3c2bf96fe59d11069155c Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Sun, 10 Aug 2025 23:13:38 +0800
Subject: [PATCH v19 2/3] json format for COPY TO
JSON format is only supported with the COPY TO operation. It is incompatible
with options such as HEADER, DEFAULT, NULL, DELIMITER, and several others. This
has been thoroughly tested in src/test/regress/sql/copy.sql.
The CopyFormat enum was originally contributed by Joel Jacobson
[email protected], later refactored by Jian He to address various issues, and
further adapted by Junwang Zhao to support the newly introduced CopyToRoutine
struct (commit 2e4127b6d2).
Author: Joe Conway <[email protected]>
Reviewed-by: "Andrey M. Borodin" <[email protected]>
Reviewed-by: Dean Rasheed <[email protected]>
Reviewed-by: Daniel Verite <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Davin Shearer <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Alvaro Herrera <[email protected]>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
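Note for reviewers (not part of the commit): the "embedded escaped characters" cases in copy.out depend on how text values are escaped inside each JSON object. The following is a minimal standalone sketch of that escaping, assuming the usual JSON rules: backslash escapes for the quote, backslash, and the common control characters, \uXXXX for other control bytes, and no escaping of '/'. It is not the backend's escape_json(); demo_escape_json and its buffer-size contract are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not the backend's escape_json(): write src into dst
 * as a double-quoted JSON string and return the number of bytes written,
 * excluding the terminating NUL.  The caller must supply at least
 * 6 * strlen(src) + 3 bytes, the worst case when every byte becomes \uXXXX.
 */
size_t
demo_escape_json(char *dst, size_t dstlen, const char *src)
{
    char       *out = dst;

    assert(dstlen >= 6 * strlen(src) + 3);

    *out++ = '"';
    for (const unsigned char *p = (const unsigned char *) src; *p != '\0'; p++)
    {
        char        esc = 0;    /* character following the backslash, if any */

        switch (*p)
        {
            case '"':  esc = '"';  break;
            case '\\': esc = '\\'; break;
            case '\b': esc = 'b';  break;
            case '\f': esc = 'f';  break;
            case '\n': esc = 'n';  break;
            case '\r': esc = 'r';  break;
            case '\t': esc = 't';  break;
            default:   break;
        }

        if (esc != 0)
        {
            *out++ = '\\';
            *out++ = esc;
        }
        else if (*p < 0x20)
            /* remaining control bytes: six characters plus a NUL */
            out += snprintf(out, 7, "\\u%04x", (unsigned int) *p);
        else
            *out++ = (char) *p; /* includes '/', which is left alone */
    }
    *out++ = '"';
    *out = '\0';

    return (size_t) (out - dst);
}
```

With a 64-byte buffer, passing the C string aaa"bbb yields "aaa\"bbb" including the surrounding quotes, which is the shape the f1 values take in the expected output.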
doc/src/sgml/ref/copy.sgml | 13 +++--
src/backend/commands/copy.c | 72 +++++++++++++++++++++-------
src/backend/commands/copyto.c | 76 ++++++++++++++++++++++++++----
src/backend/parser/gram.y | 8 ++++
src/backend/utils/adt/json.c | 5 +-
src/bin/psql/tab-complete.in.c | 4 +-
src/include/commands/copy.h | 1 +
src/include/utils/json.h | 2 +
src/test/regress/expected/copy.out | 76 ++++++++++++++++++++++++++++++
src/test/regress/sql/copy.sql | 47 ++++++++++++++++++
src/tools/pgindent/typedefs.list | 1 +
11 files changed, 271 insertions(+), 34 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c2d1fbc1fbe..219604ad306 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -228,10 +228,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Selects the data format to be read or written:
<literal>text</literal>,
<literal>csv</literal> (Comma Separated Values),
+ <literal>json</literal> (JavaScript Object Notation),
or <literal>binary</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
+ <para>
+ The <literal>json</literal> option is allowed only in
+ <command>COPY TO</command>.
+ </para>
</listitem>
</varlistentry>
@@ -266,7 +271,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
(line) of the file. The default is a tab character in text format,
a comma in <literal>CSV</literal> format.
This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -280,7 +285,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
<note>
@@ -303,7 +308,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ not using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -330,7 +335,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<command>COPY FROM</command> commands.
</para>
<para>
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 68f69cfb9df..7fd41dba250 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -542,6 +542,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "json") == 0)
+ opts_out->format = COPY_FORMAT_JSON;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -701,21 +703,42 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+ if (opts_out->delim)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER"));
+ else if (opts_out->format == COPY_FORMAT_JSON)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in JSON mode", "DELIMITER"));
+ }
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
+ if (opts_out->null_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "NULL"));
+ else if (opts_out->format == COPY_FORMAT_JSON)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in JSON mode", "NULL"));
+ }
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ if (opts_out->default_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "DEFAULT"));
+ else if (opts_out->format == COPY_FORMAT_JSON)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in JSON mode", "DEFAULT"));
+ }
/* Set defaults for omitted options */
if (!opts_out->delim)
@@ -781,11 +804,18 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ if (opts_out->header_line != COPY_HEADER_FALSE)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "HEADER"));
+ else if (opts_out->format == COPY_FORMAT_JSON)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot specify %s in JSON mode", "HEADER"));
+ }
/* Check quote */
if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
@@ -889,6 +919,12 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY %s cannot be used with %s", "FREEZE",
"COPY TO")));
+ /* Check json format */
+ if (opts_out->format == COPY_FORMAT_JSON && is_from)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY %s mode cannot be used with %s", "json", "COPY FROM"));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e990343bab0..13eb14debbd 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -24,6 +24,7 @@
#include "executor/execdesc.h"
#include "executor/executor.h"
#include "executor/tuptable.h"
+#include "funcapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
@@ -31,6 +32,7 @@
#include "pgstat.h"
#include "storage/fd.h"
#include "tcop/tcopprot.h"
+#include "utils/json.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
@@ -125,6 +127,7 @@ static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -144,7 +147,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
/*
* COPY TO routines for built-in formats.
*
- * CSV and text formats share the same TextLike routines except for the
+ * CSV, text, and json formats share the same TextLike routines except for the
* one-row callback.
*/
@@ -164,6 +167,14 @@ static const CopyToRoutine CopyToRoutineCSV = {
.CopyToEnd = CopyToTextLikeEnd,
};
+/* json format */
+static const CopyToRoutine CopyToRoutineJson = {
+ .CopyToStart = CopyToTextLikeStart,
+ .CopyToOutFunc = CopyToTextLikeOutFunc,
+ .CopyToOneRow = CopyToJsonOneRow,
+ .CopyToEnd = CopyToTextLikeEnd,
+};
+
/* binary format */
static const CopyToRoutine CopyToRoutineBinary = {
.CopyToStart = CopyToBinaryStart,
@@ -180,12 +191,14 @@ CopyToGetRoutine(const CopyFormatOptions *opts)
return &CopyToRoutineCSV;
else if (opts->format == COPY_FORMAT_BINARY)
return &CopyToRoutineBinary;
+ else if (opts->format == COPY_FORMAT_JSON)
+ return &CopyToRoutineJson;
/* default is text */
return &CopyToRoutineText;
}
-/* Implementation of the start callback for text and CSV formats */
+/* Implementation of the start callback for text, CSV, and json formats */
static void
CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
{
@@ -204,6 +217,8 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
ListCell *cur;
bool hdr_delim = false;
+ Assert(cstate->opts.format != COPY_FORMAT_JSON);
+
foreach(cur, cstate->attnumlist)
{
int attnum = lfirst_int(cur);
@@ -226,7 +241,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
}
/*
- * Implementation of the outfunc callback for text and CSV formats. Assign
+ * Implementation of the outfunc callback for text, CSV, and json formats. Assign
* the output function data to the given *finfo.
*/
static void
@@ -299,13 +314,46 @@ CopyToTextLikeOneRow(CopyToState cstate,
CopySendTextLikeEndOfRow(cstate);
}
-/* Implementation of the end callback for text and CSV formats */
+/* Implementation of the end callback for text, CSV, and json formats */
static void
CopyToTextLikeEnd(CopyToState cstate)
{
/* Nothing to do here */
}
+/* Implementation of per-row callback for json format */
+static void
+CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+ Datum rowdata;
+ StringInfo result;
+
+ /*
+ * If the COPY TO source is a query rather than a plain table, we need to
+ * copy cstate->queryDesc->tupDesc to slot->tts_tupleDescriptor.
+ * This is necessary because the slot's TupleDesc may change during query
+ * execution, and we depend on it when calling composite_to_json.
+ */
+ if (!cstate->rel)
+ {
+ memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
+ TupleDescAttr(cstate->queryDesc->tupDesc, 0),
+ cstate->queryDesc->tupDesc->natts * sizeof(FormData_pg_attribute));
+
+ for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
+ populate_compact_attribute(slot->tts_tupleDescriptor, i);
+
+ BlessTupleDesc(slot->tts_tupleDescriptor);
+ }
+ rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ result = makeStringInfo();
+ composite_to_json(rowdata, result, false);
+
+ CopySendData(cstate, result->data, result->len);
+
+ CopySendTextLikeEndOfRow(cstate);
+}
+
/*
* Implementation of the start callback for binary format. Send a header
* for a binary copy.
@@ -397,9 +445,21 @@ SendCopyBegin(CopyToState cstate)
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
pq_sendbyte(&buf, format); /* overall format */
- pq_sendint16(&buf, natts);
- for (i = 0; i < natts; i++)
- pq_sendint16(&buf, format); /* per-column formats */
+ if (cstate->opts.format != COPY_FORMAT_JSON)
+ {
+ pq_sendint16(&buf, natts);
+ for (i = 0; i < natts; i++)
+ pq_sendint16(&buf, format); /* per-column formats */
+ }
+ else
+ {
+ /*
+ * JSON format is always one non-binary column
+ */
+ pq_sendint16(&buf, 1);
+ pq_sendint16(&buf, 0);
+ }
+
pq_endmessage(&buf);
cstate->copy_dest = COPY_FRONTEND;
}
@@ -499,7 +559,7 @@ CopySendEndOfRow(CopyToState cstate)
}
/*
- * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
+ * Wrapper function of CopySendEndOfRow for text, CSV, and json formats. Sends the
* line termination and do common appropriate things for the end of row.
*/
static inline void
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index f1def67ac7c..664b0483dbd 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3530,6 +3530,10 @@ copy_opt_item:
{
$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
}
+ | JSON
+ {
+ $$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ }
| HEADER_P
{
$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3612,6 +3616,10 @@ copy_generic_opt_elem:
{
$$ = makeDefElem($1, $2, @1);
}
+ | FORMAT_LA copy_generic_opt_arg
+ {
+ $$ = makeDefElem("format", $2, @1);
+ }
;
copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index e9d370cb3da..e517470bbc7 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -85,8 +85,6 @@ typedef struct JsonAggState
JsonUniqueBuilderState unique_check;
} JsonAggState;
-static void composite_to_json(Datum composite, StringInfo result,
- bool use_line_feeds);
static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
Datum *vals, bool *nulls, int *valcount,
JsonTypeCategory tcategory, Oid outfuncoid,
@@ -516,8 +514,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
/*
* Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
*/
-static void
+void
composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
{
HeapTupleHeader td;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 6176741d20b..15fe7b37b0e 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3347,8 +3347,10 @@ match_previous_words(int pattern_id,
COMPLETE_WITH(Copy_to_options);
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
- else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
+ else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", "(", "FORMAT"))
COMPLETE_WITH("binary", "csv", "text");
+ else if (Matches("COPY|\\copy", MatchAny, "TO", MatchAny, "WITH", "(", "FORMAT"))
+ COMPLETE_WITH("binary", "csv", "text", "json");
/* Complete COPY <sth> FROM filename WITH (ON_ERROR */
else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", "(", "ON_ERROR"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 686653233b2..85aedc267d6 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -56,6 +56,7 @@ typedef enum CopyFormat
COPY_FORMAT_TEXT = 0,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
+ COPY_FORMAT_JSON,
} CopyFormat;
/*
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 49bbda7ac06..1fa8e2ce8e2 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
#include "lib/stringinfo.h"
/* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+ bool use_line_feeds);
extern void escape_json(StringInfo buf, const char *str);
extern void escape_json_with_len(StringInfo buf, const char *str, int len);
extern void escape_json_text(StringInfo buf, const text *txt);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index ac66eb55aee..0fc6e84352c 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -73,6 +73,82 @@ copy copytest3 to stdout csv header;
c1,"col with , comma","col with "" quote"
1,a,1
2,b,2
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+ERROR: cannot specify DELIMITER in JSON mode
+copy copytest to stdout (format json, null '\N');
+ERROR: cannot specify NULL in JSON mode
+copy copytest to stdout (format json, default '|');
+ERROR: cannot specify DEFAULT in JSON mode
+copy copytest to stdout (format json, header);
+ERROR: cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, header 1);
+ERROR: cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, quote '"');
+ERROR: COPY QUOTE requires CSV mode
+copy copytest to stdout (format json, escape '"');
+ERROR: COPY ESCAPE requires CSV mode
+copy copytest to stdout (format json, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
+copy copytest to stdout (format json, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+copy copytest to stdout (format json, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
+copy copytest to stdout (format json, on_error ignore);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: copy copytest to stdout (format json, on_error ignore);
+ ^
+copy copytest from stdin(format json);
+ERROR: COPY json mode cannot be used with COPY FROM
+-- all of the above should yield error
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
create temp table copytest4 (
c1 int,
"colname with tab: " text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index a1316c73bac..071986d427a 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -82,6 +82,53 @@ this is just a line full of junk that would error out if parsed
copy copytest3 to stdout csv header;
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+copy copytest to stdout (format json);
+
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+copy copytest to stdout (format json, null '\N');
+copy copytest to stdout (format json, default '|');
+copy copytest to stdout (format json, header);
+copy copytest to stdout (format json, header 1);
+copy copytest to stdout (format json, quote '"');
+copy copytest to stdout (format json, escape '"');
+copy copytest to stdout (format json, force_quote *);
+copy copytest to stdout (format json, force_not_null *);
+copy copytest to stdout (format json, force_null *);
+copy copytest to stdout (format json, on_error ignore);
+copy copytest from stdin(format json);
+-- all of the above should yield error
+
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
create temp table copytest4 (
c1 int,
"colname with tab: " text);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 37f26f6c6b7..d85c2ec3c70 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -515,6 +515,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromRoutine
CopyFromState
--
2.34.1
[text/x-patch] v19-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch (9.7K, 3-v19-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch)
From 2abacfaebc0fd5a37a3b797df92c3b3a85761afa Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Wed, 30 Jul 2025 19:50:41 +0800
Subject: [PATCH v19 3/3] Add option force_array for COPY JSON FORMAT
The force_array option can only be used in COPY TO with JSON format. It makes
the output behave like a JSON array. Refactored by Junwang Zhao to adapt to
the newly introduced CopyToRoutine struct (commit 2e4127b6d2).
Author: Joe Conway <[email protected]>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
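Note for reviewers (not part of the commit): a small standalone sketch of the framing that FORCE_ARRAY adds around the per-row objects, following the start, one-row, and end callbacks in this patch. DemoBuf, demo_append, and demo_copy_json are hypothetical stand-ins for the COPY output machinery, and each row is assumed to be already rendered as a JSON object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the COPY output buffer. */
typedef struct DemoBuf
{
    char        data[256];
    size_t      len;
} DemoBuf;

/* Append a NUL-terminated string; the buffer stays NUL-terminated. */
void
demo_append(DemoBuf *buf, const char *s)
{
    size_t      n = strlen(s);

    assert(buf->len + n < sizeof(buf->data));
    memcpy(buf->data + buf->len, s, n + 1);
    buf->len += n;
}

/*
 * Emit nrows pre-rendered JSON objects, one per line.  With force_array the
 * output is wrapped in '[' and ']' lines; the first row is prefixed with a
 * space and every later row with a comma, as in the patch.
 */
void
demo_copy_json(DemoBuf *buf, const char *const *rows, int nrows,
               int force_array)
{
    int         row_delim_needed = 0;

    buf->len = 0;
    buf->data[0] = '\0';

    if (force_array)
        demo_append(buf, "[\n");

    for (int i = 0; i < nrows; i++)
    {
        if (force_array)
        {
            demo_append(buf, row_delim_needed ? "," : " ");
            row_delim_needed = 1;
        }
        demo_append(buf, rows[i]);
        demo_append(buf, "\n");
    }

    if (force_array)
        demo_append(buf, "]\n");
}
```

Putting the comma at the start of the following row, rather than at the end of the current one, means no row has to know whether it is the last, so a single flag is all the state needed.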
doc/src/sgml/ref/copy.sgml | 14 +++++++++++
src/backend/commands/copy.c | 13 +++++++++++
src/backend/commands/copyto.c | 37 +++++++++++++++++++++++++++++-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy.out | 23 +++++++++++++++++++
src/test/regress/sql/copy.sql | 8 +++++++
7 files changed, 96 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 219604ad306..c01927864bd 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -40,6 +40,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
HEADER [ <replaceable class="parameter">boolean</replaceable> | <replaceable class="parameter">integer</replaceable> | MATCH ]
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
+ FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -366,6 +367,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>FORCE_ARRAY</literal></term>
+ <listitem>
+ <para>
+ Force output of square brackets as array decorations at the beginning
+ and end of output, and commas between the rows. It is allowed only in
+ <command>COPY TO</command>, and only when using
+ <literal>json</literal> format. The default is
+ <literal>false</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>FORCE_QUOTE</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 7fd41dba250..0a22272f3fe 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -514,6 +514,7 @@ ProcessCopyOptions(ParseState *pstate,
bool on_error_specified = false;
bool log_verbosity_specified = false;
bool reject_limit_specified = false;
+ bool force_array_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -670,6 +671,13 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "force_array") == 0)
+ {
+ if (force_array_specified)
+ errorConflictingDefElem(defel, pstate);
+ force_array_specified = true;
+ opts_out->force_array = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "on_error") == 0)
{
if (on_error_specified)
@@ -925,6 +933,11 @@ ProcessCopyOptions(ParseState *pstate,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY %s mode cannot be used with %s", "json", "COPY FROM"));
+ if (opts_out->format != COPY_FORMAT_JSON && opts_out->force_array)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY %s can only used with JSON mode", "FORCE_ARRAY"));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 13eb14debbd..6fc1d3e9fee 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -84,6 +84,10 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+
+ /* need delimiter to start next json array element */
+ bool json_row_delim_needed;
+
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -128,6 +132,7 @@ static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToJsonEnd(CopyToState cstate);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -172,7 +177,7 @@ static const CopyToRoutine CopyToRoutineJson = {
.CopyToStart = CopyToTextLikeStart,
.CopyToOutFunc = CopyToTextLikeOutFunc,
.CopyToOneRow = CopyToJsonOneRow,
- .CopyToEnd = CopyToTextLikeEnd,
+ .CopyToEnd = CopyToJsonEnd,
};
/* binary format */
@@ -238,6 +243,16 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
CopySendTextLikeEndOfRow(cstate);
}
+
+ /*
+ * If JSON has been requested and FORCE_ARRAY has been specified, send the
+ * opening bracket.
+ */
+ if (cstate->opts.format == COPY_FORMAT_JSON && cstate->opts.force_array)
+ {
+ CopySendChar(cstate, '[');
+ CopySendTextLikeEndOfRow(cstate);
+ }
}
/*
@@ -349,11 +364,31 @@ CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
result = makeStringInfo();
composite_to_json(rowdata, result, false);
+ if (cstate->json_row_delim_needed && cstate->opts.force_array)
+ CopySendChar(cstate, ',');
+ else if (cstate->opts.force_array)
+ {
+ /* first row needs no delimiter */
+ CopySendChar(cstate, ' ');
+ cstate->json_row_delim_needed = true;
+ }
+
CopySendData(cstate, result->data, result->len);
CopySendTextLikeEndOfRow(cstate);
}
+/* Implementation of the end callback for json format */
+static void
+CopyToJsonEnd(CopyToState cstate)
+{
+ if (cstate->opts.force_array)
+ {
+ CopySendChar(cstate, ']');
+ CopySendTextLikeEndOfRow(cstate);
+ }
+}
+
/*
* Implementation of the start callback for binary format. Send a header
* for a binary copy.
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 15fe7b37b0e..a42fb6c0740 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1222,7 +1222,7 @@ Copy_common_options, "DEFAULT", "FORCE_NOT_NULL", "FORCE_NULL", "FREEZE", \
/* COPY TO options */
#define Copy_to_options \
-Copy_common_options, "FORCE_QUOTE"
+Copy_common_options, "FORCE_QUOTE", "FORCE_ARRAY"
/*
* These object types were introduced later than our support cutoff of
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 85aedc267d6..7274b0d3ca5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,6 +87,7 @@ typedef struct CopyFormatOptions
List *force_notnull; /* list of column names */
bool force_notnull_all; /* FORCE_NOT_NULL *? */
bool *force_notnull_flags; /* per-column CSV FNN flags */
+ bool force_array; /* add JSON array decorations */
List *force_null; /* list of column names */
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 0fc6e84352c..22626a13ba5 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -112,6 +112,29 @@ LINE 1: copy copytest to stdout (format json, on_error ignore);
copy copytest from stdin(format json);
ERROR: COPY json mode cannot be used with COPY FROM
-- all of the above should yield error
+--Error
+copy copytest to stdout (format csv, force_array true);
+ERROR: COPY FORCE_ARRAY can only used with JSON mode
+--ok
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 071986d427a..0f121b48f71 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -101,6 +101,14 @@ copy copytest to stdout (format json, on_error ignore);
copy copytest from stdin(format json);
-- all of the above should yield error
+--Error
+copy copytest to stdout (format csv, force_array true);
+
+--ok
+copy copytest to stdout (format json, force_array);
+copy copytest to stdout (format json, force_array true);
+copy copytest to stdout (format json, force_array false);
+
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
--
2.34.1
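
The array framing added by the patch above can be sketched outside the backend as follows. This is a minimal standalone model, not the patch's code: `JsonArrayWriter` and the `writer_*` functions are hypothetical names, and a fixed buffer stands in for the COPY output stream. Only the framing rules come from the diff and the expected regression output: an opening `[` line, a space before the first row, a comma before each later row, and a closing `]` line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two pieces of state the patch consults. */
typedef struct JsonArrayWriter
{
	bool		force_array;		/* models opts.force_array */
	bool		row_delim_needed;	/* models json_row_delim_needed */
	char		buf[512];			/* collected output, illustration only */
	size_t		len;
} JsonArrayWriter;

/* Append a NUL-terminated string to the buffer. */
static void
writer_append(JsonArrayWriter *w, const char *s)
{
	size_t		n = strlen(s);

	assert(w->len + n < sizeof(w->buf));
	memcpy(w->buf + w->len, s, n + 1);	/* also copies the terminating NUL */
	w->len += n;
}

/* Start callback: emit the opening bracket on its own line if requested. */
static void
writer_start(JsonArrayWriter *w, bool force_array)
{
	w->force_array = force_array;
	w->row_delim_needed = false;
	w->len = 0;
	w->buf[0] = '\0';
	if (force_array)
		writer_append(w, "[\n");
}

/* Per-row callback: a space before the first row, a comma before the rest. */
static void
writer_row(JsonArrayWriter *w, const char *row_json)
{
	if (w->force_array)
	{
		writer_append(w, w->row_delim_needed ? "," : " ");
		w->row_delim_needed = true;
	}
	writer_append(w, row_json);
	writer_append(w, "\n");
}

/* End callback: emit the closing bracket on its own line if requested. */
static void
writer_end(JsonArrayWriter *w)
{
	if (w->force_array)
		writer_append(w, "]\n");
}
```

With `force_array` off, the same calls produce one JSON object per line and no decorations, which corresponds to the `force_array false` case in the regression test.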
[text/x-patch] v19-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch (12.7K, 4-v19-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch)
From 6c3e515ce84bdba4e0c96f46d8d5543481d3e1ca Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Wed, 30 Jul 2025 16:48:56 +0800
Subject: [PATCH v19 1/3] introduce CopyFormat refactor CopyFormatOptions
Currently, COPY command format is determined by two booleans, binary and
csv_mode, within CopyFormatOptions. This approach, while functional, isn't ideal
for future expansion.
To simplify adding new formats, we've introduced an enum CopyFormat. This makes
the code cleaner and more maintainable, allowing for easier integration of
additional formats down the line.
The CopyFormat enum was originally contributed by Joel Jacobson
[email protected], later refactored by Jian He to address various issues.
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
src/backend/commands/copy.c | 50 +++++++++++++++-------------
src/backend/commands/copyfrom.c | 6 ++--
src/backend/commands/copyfromparse.c | 7 ++--
src/backend/commands/copyto.c | 8 ++---
src/include/commands/copy.h | 13 ++++++--
5 files changed, 48 insertions(+), 36 deletions(-)
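
For readers skimming the diff, the refactoring described in the commit message can be sketched as below. The enum and its member names are taken from the patch; `parse_copy_format` and `copy_routine_name` are hypothetical helpers written for this illustration and do not exist in the tree. The point is that a single `CopyFormat` value replaces the `binary`/`csv_mode` pair, so the two flags can no longer both be set and format dispatch becomes one switch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same members as the enum the patch adds to copy.h. */
typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
} CopyFormat;

/*
 * Hypothetical helper: map the FORMAT option string to the enum.
 * Returns 0 on success, -1 for an unrecognized format name.
 */
static int
parse_copy_format(const char *fmt, CopyFormat *out)
{
	assert(fmt != NULL && out != NULL);

	if (strcmp(fmt, "text") == 0)
		*out = COPY_FORMAT_TEXT;
	else if (strcmp(fmt, "csv") == 0)
		*out = COPY_FORMAT_CSV;
	else if (strcmp(fmt, "binary") == 0)
		*out = COPY_FORMAT_BINARY;
	else
		return -1;
	return 0;
}

/*
 * Hypothetical helper: pick a routine by format. In the patch the
 * equivalent functions return pointers to routine structs; a name is
 * returned here to keep the sketch self-contained.
 */
static const char *
copy_routine_name(CopyFormat format)
{
	switch (format)
	{
		case COPY_FORMAT_CSV:
			return "csv";
		case COPY_FORMAT_BINARY:
			return "binary";
		case COPY_FORMAT_TEXT:
			break;
	}
	return "text";				/* default is text */
}
```

Adding another format under this scheme means adding one enum member and one case, rather than a third boolean that every existing check has to account for.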
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fae9c41db65..68f69cfb9df 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -521,6 +521,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out = (CopyFormatOptions *) palloc0(sizeof(CopyFormatOptions));
opts_out->file_encoding = -1;
+ /* default format */
+ opts_out->format = COPY_FORMAT_TEXT;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -535,11 +537,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -699,31 +701,31 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = (opts_out->format == COPY_FORMAT_CSV) ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = (opts_out->format == COPY_FORMAT_CSV) ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -771,7 +773,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -779,43 +781,43 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line != COPY_HEADER_FALSE)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -829,8 +831,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_notnull != NIL ||
+ opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -845,8 +847,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+ opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -870,7 +872,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -906,7 +908,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -923,7 +925,7 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 12781963b4f..ba31b227d5f 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -155,9 +155,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
static const CopyFromRoutine *
CopyFromGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyFromRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyFromRoutineBinary;
/* default is text */
@@ -261,7 +261,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index b1ae97b833d..578e6c0c9a2 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -171,7 +171,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -747,7 +747,7 @@ bool
NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
return NextCopyFromRawFieldsInternal(cstate, fields, nfields,
- cstate->opts.csv_mode);
+ cstate->opts.format == COPY_FORMAT_CSV);
}
/*
@@ -774,7 +774,8 @@ NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields
bool done = false;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_FALSE)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 67b94b91cae..e990343bab0 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -176,9 +176,9 @@ static const CopyToRoutine CopyToRoutineBinary = {
static const CopyToRoutine *
CopyToGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyToRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyToRoutineBinary;
/* default is text */
@@ -215,7 +215,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -392,7 +392,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 541176e1980..686653233b2 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -48,6 +48,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT = 0,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -58,9 +68,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
int header_line; /* number of lines to skip or COPY_HEADER_XXX
* value (see the above) */
char *null_print; /* NULL marker string (server encoding!) */
--
2.34.1
^ permalink raw reply [nested|flat] 28+ messages in thread
* Re: Emitting JSON to file using COPY TO
@ 2025-11-10 00:53 jian he <[email protected]>
parent: jian he <[email protected]>
0 siblings, 1 reply; 28+ messages in thread
From: jian he @ 2025-11-10 00:53 UTC (permalink / raw)
To: Junwang Zhao <[email protected]>; +Cc: Joe Conway <[email protected]>; Andrey M. Borodin <[email protected]>; Dean Rasheed <[email protected]>; Daniel Verite <[email protected]>; Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On Wed, Oct 1, 2025 at 2:16 PM jian he <[email protected]> wrote:
>
> hi.
> v19 attached, same as v18.
> repost it so that CFbot can pick up the latest patchset.
hi.
new patch attached, rebase only.
Attachments:
[text/x-patch] v20-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch (13.1K, 2-v20-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch)
From 97e63d2b7de1fef820305b279d9e5602c82dab53 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Mon, 10 Nov 2025 08:39:36 +0800
Subject: [PATCH v20 1/3] introduce CopyFormat refactor CopyFormatOptions
Currently, COPY command format is determined by two booleans, binary and
csv_mode, within CopyFormatOptions. This approach, while functional, isn't ideal
for future expansion.
To simplify adding new formats, we've introduced an enum CopyFormat. This makes
the code cleaner and more maintainable, allowing for easier integration of
additional formats down the line.
The CopyFormat enum was originally contributed by Joel Jacobson
[email protected], later refactored by Jian He to address various issues.
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
src/backend/commands/copy.c | 50 +++++++++++++++-------------
src/backend/commands/copyfrom.c | 6 ++--
src/backend/commands/copyfromparse.c | 7 ++--
src/backend/commands/copyto.c | 8 ++---
src/include/commands/copy.h | 13 ++++++--
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 49 insertions(+), 36 deletions(-)
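
The format-dependent defaults that this patch rewrites can be summarized in a small standalone sketch. The three ternaries mirror the ones in the diff (default delimiter, default NULL marker, and the wire-format code sent in the CopyIn/CopyOut response); the `CopyFormatDefaults` struct and `copy_format_defaults` function are hypothetical and exist only to make the example compile on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Same members as the enum the patch adds to copy.h. */
typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
} CopyFormat;

/* Hypothetical bundle of the values that depend on the format. */
typedef struct CopyFormatDefaults
{
	const char *delim;			/* default column delimiter */
	const char *null_print;		/* default NULL marker */
	int			wire_format;	/* 0 = text, 1 = binary */
} CopyFormatDefaults;

static CopyFormatDefaults
copy_format_defaults(CopyFormat format)
{
	CopyFormatDefaults d;

	assert(format == COPY_FORMAT_TEXT ||
		   format == COPY_FORMAT_BINARY ||
		   format == COPY_FORMAT_CSV);

	/* CSV uses a comma and an empty NULL marker; otherwise tab and \N. */
	d.delim = (format == COPY_FORMAT_CSV) ? "," : "\t";
	d.null_print = (format == COPY_FORMAT_CSV) ? "" : "\\N";

	/* Only binary is advertised as format code 1 on the wire. */
	d.wire_format = (format == COPY_FORMAT_BINARY) ? 1 : 0;
	return d;
}
```

As in the patch, the binary case falls through to the text defaults here; in the real code an explicit DELIMITER or NULL option in binary mode is rejected before the defaults are filled in.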
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 28e878c3688..d674ada98e4 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -564,6 +564,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out = (CopyFormatOptions *) palloc0(sizeof(CopyFormatOptions));
opts_out->file_encoding = -1;
+ /* default format */
+ opts_out->format = COPY_FORMAT_TEXT;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -578,11 +580,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -742,31 +744,31 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = (opts_out->format == COPY_FORMAT_CSV) ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = (opts_out->format == COPY_FORMAT_CSV) ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -814,7 +816,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -822,43 +824,43 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line != COPY_HEADER_FALSE)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -872,8 +874,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_notnull != NIL ||
+ opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -888,8 +890,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+ opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -913,7 +915,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -949,7 +951,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -966,7 +968,7 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 12781963b4f..ba31b227d5f 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -155,9 +155,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
static const CopyFromRoutine *
CopyFromGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyFromRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyFromRoutineBinary;
/* default is text */
@@ -261,7 +261,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index b1ae97b833d..578e6c0c9a2 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -171,7 +171,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -747,7 +747,7 @@ bool
NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
return NextCopyFromRawFieldsInternal(cstate, fields, nfields,
- cstate->opts.csv_mode);
+ cstate->opts.format == COPY_FORMAT_CSV);
}
/*
@@ -774,7 +774,8 @@ NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields
bool done = false;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_FALSE)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..c97f0460b3e 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -181,9 +181,9 @@ static const CopyToRoutine CopyToRoutineBinary = {
static const CopyToRoutine *
CopyToGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyToRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyToRoutineBinary;
/* default is text */
@@ -220,7 +220,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -397,7 +397,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 541176e1980..686653233b2 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -48,6 +48,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT = 0,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -58,9 +68,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
int header_line; /* number of lines to skip or COPY_HEADER_XXX
* value (see the above) */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 432509277c9..256b5000af4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -516,6 +516,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromRoutine
CopyFromState
--
2.34.1
[text/x-patch] v20-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch (9.7K, 3-v20-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch)
From e42e36118f73fa1a98f698031e3f1f7cbb9150cf Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Wed, 30 Jul 2025 19:50:41 +0800
Subject: [PATCH v20 3/3] Add option force_array for COPY JSON FORMAT
The force_array option can only be used in COPY TO with JSON format. It makes
the output a single JSON array: square brackets enclose the output and commas
separate the rows. Refactored by Junwang Zhao to adapt to the newly introduced
CopyToRoutine struct (2e4127b6d2).
Author: Joe Conway <[email protected]>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
doc/src/sgml/ref/copy.sgml | 14 +++++++++++
src/backend/commands/copy.c | 13 +++++++++++
src/backend/commands/copyto.c | 37 +++++++++++++++++++++++++++++-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy.out | 23 +++++++++++++++++++
src/test/regress/sql/copy.sql | 8 +++++++
7 files changed, 96 insertions(+), 2 deletions(-)
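
The option handling in this patch comes down to two rules, sketched below under stated assumptions. `parse_bool_option` is a simplified, hypothetical model of boolean option parsing (a bare `FORCE_ARRAY` with no argument counts as true, matching the regression test); it is not the backend's actual parser and accepts only a few lowercase spellings. `force_array_allowed` models the new compatibility check. `COPY_FORMAT_JSON` is used by the patch but not defined in this excerpt, so its position in the enum here is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
	COPY_FORMAT_JSON,			/* assumed position; defined outside this excerpt */
} CopyFormat;

/*
 * Hypothetical, simplified boolean option parser.
 * A missing argument means true. Returns false if the value is not
 * recognized, in which case *value is left untouched.
 */
static bool
parse_bool_option(const char *arg, bool *value)
{
	assert(value != NULL);

	if (arg == NULL || strcmp(arg, "true") == 0 || strcmp(arg, "on") == 0)
	{
		*value = true;
		return true;
	}
	if (strcmp(arg, "false") == 0 || strcmp(arg, "off") == 0)
	{
		*value = false;
		return true;
	}
	return false;
}

/* The new check: FORCE_ARRAY is only acceptable together with JSON format. */
static bool
force_array_allowed(CopyFormat format, bool force_array)
{
	return !force_array || format == COPY_FORMAT_JSON;
}
```

Note that `force_array false` passes the check for every format, because the test looks at the parsed value rather than at whether the option was written.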
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 2d6d6802cbd..d8d9fb173b4 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -40,6 +40,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
HEADER [ <replaceable class="parameter">boolean</replaceable> | <replaceable class="parameter">integer</replaceable> | MATCH ]
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
+ FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -366,6 +367,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>FORCE_ARRAY</literal></term>
+ <listitem>
+ <para>
+ Force output of square brackets as array decorations at the beginning
+ and end of output, and commas between the rows. It is allowed only in
+ <command>COPY TO</command>, and only when using
+ <literal>json</literal> format. The default is
+ <literal>false</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>FORCE_QUOTE</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 0ec9b22d20f..6f9ae3fbfd7 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -557,6 +557,7 @@ ProcessCopyOptions(ParseState *pstate,
bool on_error_specified = false;
bool log_verbosity_specified = false;
bool reject_limit_specified = false;
+ bool force_array_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -713,6 +714,13 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "force_array") == 0)
+ {
+ if (force_array_specified)
+ errorConflictingDefElem(defel, pstate);
+ force_array_specified = true;
+ opts_out->force_array = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "on_error") == 0)
{
if (on_error_specified)
@@ -968,6 +976,11 @@ ProcessCopyOptions(ParseState *pstate,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY %s mode cannot be used with %s", "json", "COPY FROM"));
+ if (opts_out->format != COPY_FORMAT_JSON && opts_out->force_array)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY %s can only used with JSON mode", "FORCE_ARRAY"));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index accf34e1a60..b58c5bdf987 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -86,6 +86,10 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+
+ /* need delimiter to start next json array element */
+ bool json_row_delim_needed;
+
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -133,6 +137,7 @@ static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToJsonEnd(CopyToState cstate);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -177,7 +182,7 @@ static const CopyToRoutine CopyToRoutineJson = {
.CopyToStart = CopyToTextLikeStart,
.CopyToOutFunc = CopyToTextLikeOutFunc,
.CopyToOneRow = CopyToJsonOneRow,
- .CopyToEnd = CopyToTextLikeEnd,
+ .CopyToEnd = CopyToJsonEnd,
};
/* binary format */
@@ -243,6 +248,16 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
CopySendTextLikeEndOfRow(cstate);
}
+
+ /*
+ * If JSON has been requested and FORCE_ARRAY has been specified, send the
+ * opening bracket.
+ */
+ if (cstate->opts.format == COPY_FORMAT_JSON && cstate->opts.force_array)
+ {
+ CopySendChar(cstate, '[');
+ CopySendTextLikeEndOfRow(cstate);
+ }
}
/*
@@ -354,11 +369,31 @@ CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
result = makeStringInfo();
composite_to_json(rowdata, result, false);
+ if (cstate->json_row_delim_needed && cstate->opts.force_array)
+ CopySendChar(cstate, ',');
+ else if (cstate->opts.force_array)
+ {
+ /* first row needs no delimiter */
+ CopySendChar(cstate, ' ');
+ cstate->json_row_delim_needed = true;
+ }
+
CopySendData(cstate, result->data, result->len);
CopySendTextLikeEndOfRow(cstate);
}
+/* Implementation of the end callback for json format */
+static void
+CopyToJsonEnd(CopyToState cstate)
+{
+ if (cstate->opts.force_array)
+ {
+ CopySendChar(cstate, ']');
+ CopySendTextLikeEndOfRow(cstate);
+ }
+}
+
/*
* Implementation of the start callback for binary format. Send a header
* for a binary copy.
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 0fd06a31201..e550aa38a25 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1232,7 +1232,7 @@ Copy_common_options, "DEFAULT", "FORCE_NOT_NULL", "FORCE_NULL", "FREEZE", \
/* COPY TO options */
#define Copy_to_options \
-Copy_common_options, "FORCE_QUOTE"
+Copy_common_options, "FORCE_QUOTE", "FORCE_ARRAY"
/*
* These object types were introduced later than our support cutoff of
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 85aedc267d6..7274b0d3ca5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,6 +87,7 @@ typedef struct CopyFormatOptions
List *force_notnull; /* list of column names */
bool force_notnull_all; /* FORCE_NOT_NULL *? */
bool *force_notnull_flags; /* per-column CSV FNN flags */
+ bool force_array; /* add JSON array decorations */
List *force_null; /* list of column names */
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 10333357d68..8becc70ee7a 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -112,6 +112,29 @@ LINE 1: copy copytest to stdout (format json, on_error ignore);
copy copytest from stdin(format json);
ERROR: COPY json mode cannot be used with COPY FROM
-- all of the above should yield error
+--Error
+copy copytest to stdout (format csv, force_array true);
+ERROR: COPY FORCE_ARRAY can only used with JSON mode
+--ok
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 80799e2ead9..6a14cfc6e68 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -101,6 +101,14 @@ copy copytest to stdout (format json, on_error ignore);
copy copytest from stdin(format json);
-- all of the above should yield error
+--Error
+copy copytest to stdout (format csv, force_array true);
+
+--ok
+copy copytest to stdout (format json, force_array);
+copy copytest to stdout (format json, force_array true);
+copy copytest to stdout (format json, force_array false);
+
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
--
2.34.1
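
The array decoration in the force_array patch above can be sketched as a small standalone C program. This is only an illustration under stated assumptions: JsonCopySketch, sketch_send, sketch_start, sketch_one_row and sketch_end are hypothetical names that do not exist in PostgreSQL, and the fixed-size buffer stands in for the real COPY output path. It mirrors the patch's state handling: "[" from the start callback, " " before the first row and "," before each later row, and "]" from the end callback.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical stand-in for the COPY TO state. It keeps only the two flags
 * the array decoration needs; it is not the real CopyToStateData.
 */
typedef struct JsonCopySketch
{
    bool force_array;           /* mirrors opts.force_array */
    bool json_row_delim_needed; /* true once the first row has been sent */
    char out[1024];             /* accumulated output, for illustration only */
} JsonCopySketch;

/* Append s to the output buffer; strncat never writes past the buffer. */
static void
sketch_send(JsonCopySketch *st, const char *s)
{
    strncat(st->out, s, sizeof(st->out) - strlen(st->out) - 1);
}

/* Start callback: emit "[" on its own line when force_array is set. */
static void
sketch_start(JsonCopySketch *st, bool force_array)
{
    st->force_array = force_array;
    st->json_row_delim_needed = false;
    st->out[0] = '\0';
    if (st->force_array)
        sketch_send(st, "[\n");
}

/* Per-row callback: "," before every row except the first, which gets " ". */
static void
sketch_one_row(JsonCopySketch *st, const char *row_json)
{
    if (st->force_array)
    {
        sketch_send(st, st->json_row_delim_needed ? "," : " ");
        st->json_row_delim_needed = true;
    }
    sketch_send(st, row_json);
    sketch_send(st, "\n");
}

/* End callback: close the array when force_array is set. */
static void
sketch_end(JsonCopySketch *st)
{
    if (st->force_array)
        sketch_send(st, "]\n");
}
```

Calling sketch_start(&st, true), then sketch_one_row twice, then sketch_end leaves the same line layout in st.out as the expected regression output: an opening bracket line, one row per line with a leading space or comma, and a closing bracket line. With no rows at all, the sketch emits just the two bracket lines.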
[text/x-patch] v20-0002-json-format-for-COPY-TO.patch (22.2K, 4-v20-0002-json-format-for-COPY-TO.patch)
download | inline diff:
From 089630f0f4706b31c4c886d033e15ad20761dd6d Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Mon, 10 Nov 2025 08:40:26 +0800
Subject: [PATCH v20 2/3] json format for COPY TO
JSON format is only supported with the COPY TO operation. It is incompatible
with options such as HEADER, DEFAULT, NULL, DELIMITER, and several others. This
has been thoroughly tested in src/test/regress/sql/copy.sql.
The CopyFormat enum was originally contributed by Joel Jacobson
[email protected], later refactored by Jian He to address various issues, and
further adapted by Junwang Zhao to support the newly introduced CopyToRoutine
struct (commit 2e4127b6d2).
Author: Joe Conway <[email protected]>
Reviewed-by: "Andrey M. Borodin" <[email protected]>,
Reviewed-by: Dean Rasheed <[email protected]>,
Reviewed-by: Daniel Verite <[email protected]>,
Reviewed-by: Andrew Dunstan <[email protected]>,
Reviewed-by: Davin Shearer <[email protected]>,
Reviewed-by: Masahiko Sawada <[email protected]>,
Reviewed-by: Alvaro Herrera <[email protected]>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
doc/src/sgml/ref/copy.sgml | 13 +++--
src/backend/commands/copy.c | 72 +++++++++++++++++++++-------
src/backend/commands/copyto.c | 76 ++++++++++++++++++++++++++----
src/backend/parser/gram.y | 8 ++++
src/backend/utils/adt/json.c | 5 +-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/include/utils/json.h | 2 +
src/test/regress/expected/copy.out | 76 ++++++++++++++++++++++++++++++
src/test/regress/sql/copy.sql | 47 ++++++++++++++++++
10 files changed, 268 insertions(+), 34 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index fdc24b36bb8..2d6d6802cbd 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -228,10 +228,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Selects the data format to be read or written:
<literal>text</literal>,
<literal>csv</literal> (Comma Separated Values),
+ <literal>json</literal> (JavaScript Object Notation),
or <literal>binary</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
+ <para>
+ The <literal>json</literal> option is allowed only in
+ <command>COPY TO</command>.
+ </para>
</listitem>
</varlistentry>
@@ -266,7 +271,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
(line) of the file. The default is a tab character in text format,
a comma in <literal>CSV</literal> format.
This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -280,7 +285,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
<note>
@@ -303,7 +308,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ not using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -330,7 +335,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<command>COPY FROM</command> commands.
</para>
<para>
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index d674ada98e4..0ec9b22d20f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -585,6 +585,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "json") == 0)
+ opts_out->format = COPY_FORMAT_JSON;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -744,21 +746,42 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+ if (opts_out->delim)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER"));
+ else if (opts_out->format == COPY_FORMAT_JSON)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in JSON mode", "DELIMITER"));
+ }
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
+ if (opts_out->null_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "NULL"));
+ else if (opts_out->format == COPY_FORMAT_JSON)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in JSON mode", "NULL"));
+ }
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ if (opts_out->default_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "DEFAULT"));
+ else if (opts_out->format == COPY_FORMAT_JSON)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in JSON mode", "DEFAULT"));
+ }
/* Set defaults for omitted options */
if (!opts_out->delim)
@@ -824,11 +847,18 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ if (opts_out->header_line != COPY_HEADER_FALSE)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "HEADER"));
+ else if (opts_out->format == COPY_FORMAT_JSON)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot specify %s in JSON mode", "HEADER"));
+ }
/* Check quote */
if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
@@ -932,6 +962,12 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY %s cannot be used with %s", "FREEZE",
"COPY TO")));
+ /* Check json format */
+ if (opts_out->format == COPY_FORMAT_JSON && is_from)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY %s mode cannot be used with %s", "json", "COPY FROM"));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c97f0460b3e..accf34e1a60 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -26,6 +26,7 @@
#include "executor/execdesc.h"
#include "executor/executor.h"
#include "executor/tuptable.h"
+#include "funcapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
@@ -33,6 +34,7 @@
#include "pgstat.h"
#include "storage/fd.h"
#include "tcop/tcopprot.h"
+#include "utils/json.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
@@ -130,6 +132,7 @@ static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -149,7 +152,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
/*
* COPY TO routines for built-in formats.
*
- * CSV and text formats share the same TextLike routines except for the
+ * CSV, text, and json formats share the same TextLike routines except for the
* one-row callback.
*/
@@ -169,6 +172,14 @@ static const CopyToRoutine CopyToRoutineCSV = {
.CopyToEnd = CopyToTextLikeEnd,
};
+/* json format */
+static const CopyToRoutine CopyToRoutineJson = {
+ .CopyToStart = CopyToTextLikeStart,
+ .CopyToOutFunc = CopyToTextLikeOutFunc,
+ .CopyToOneRow = CopyToJsonOneRow,
+ .CopyToEnd = CopyToTextLikeEnd,
+};
+
/* binary format */
static const CopyToRoutine CopyToRoutineBinary = {
.CopyToStart = CopyToBinaryStart,
@@ -185,12 +196,14 @@ CopyToGetRoutine(const CopyFormatOptions *opts)
return &CopyToRoutineCSV;
else if (opts->format == COPY_FORMAT_BINARY)
return &CopyToRoutineBinary;
+ else if (opts->format == COPY_FORMAT_JSON)
+ return &CopyToRoutineJson;
/* default is text */
return &CopyToRoutineText;
}
-/* Implementation of the start callback for text and CSV formats */
+/* Implementation of the start callback for text, CSV, and json formats */
static void
CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
{
@@ -209,6 +222,8 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
ListCell *cur;
bool hdr_delim = false;
+ Assert(cstate->opts.format != COPY_FORMAT_JSON);
+
foreach(cur, cstate->attnumlist)
{
int attnum = lfirst_int(cur);
@@ -231,7 +246,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
}
/*
- * Implementation of the outfunc callback for text and CSV formats. Assign
+ * Implementation of the outfunc callback for text, CSV, and json formats. Assign
* the output function data to the given *finfo.
*/
static void
@@ -304,13 +319,46 @@ CopyToTextLikeOneRow(CopyToState cstate,
CopySendTextLikeEndOfRow(cstate);
}
-/* Implementation of the end callback for text and CSV formats */
+/* Implementation of the end callback for text, CSV, and json formats */
static void
CopyToTextLikeEnd(CopyToState cstate)
{
/* Nothing to do here */
}
+/* Implementation of per-row callback for json format */
+static void
+CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+ Datum rowdata;
+ StringInfo result;
+
+ /*
+ * If the COPY TO source data comes from a query rather than a plain table, we
+ * need to copy cstate->queryDesc->tupDesc to slot->tts_tupleDescriptor.
+ * This is necessary because the slot's TupleDesc may change during query
+ * execution, and we depend on it when calling composite_to_json.
+ */
+ if (!cstate->rel)
+ {
+ memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
+ TupleDescAttr(cstate->queryDesc->tupDesc, 0),
+ cstate->queryDesc->tupDesc->natts * sizeof(FormData_pg_attribute));
+
+ for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
+ populate_compact_attribute(slot->tts_tupleDescriptor, i);
+
+ BlessTupleDesc(slot->tts_tupleDescriptor);
+ }
+ rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ result = makeStringInfo();
+ composite_to_json(rowdata, result, false);
+
+ CopySendData(cstate, result->data, result->len);
+
+ CopySendTextLikeEndOfRow(cstate);
+}
+
/*
* Implementation of the start callback for binary format. Send a header
* for a binary copy.
@@ -402,9 +450,21 @@ SendCopyBegin(CopyToState cstate)
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
pq_sendbyte(&buf, format); /* overall format */
- pq_sendint16(&buf, natts);
- for (i = 0; i < natts; i++)
- pq_sendint16(&buf, format); /* per-column formats */
+ if (cstate->opts.format != COPY_FORMAT_JSON)
+ {
+ pq_sendint16(&buf, natts);
+ for (i = 0; i < natts; i++)
+ pq_sendint16(&buf, format); /* per-column formats */
+ }
+ else
+ {
+ /*
+ * JSON format is always one non-binary column
+ */
+ pq_sendint16(&buf, 1);
+ pq_sendint16(&buf, 0);
+ }
+
pq_endmessage(&buf);
cstate->copy_dest = COPY_FRONTEND;
}
@@ -504,7 +564,7 @@ CopySendEndOfRow(CopyToState cstate)
}
/*
- * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
+ * Wrapper function of CopySendEndOfRow for text, CSV, and json formats. Sends the
* line termination and do common appropriate things for the end of row.
*/
static inline void
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 57fe0186547..7cbdadc98ac 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3557,6 +3557,10 @@ copy_opt_item:
{
$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
}
+ | JSON
+ {
+ $$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ }
| HEADER_P
{
$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3639,6 +3643,10 @@ copy_generic_opt_elem:
{
$$ = makeDefElem($1, $2, @1);
}
+ | FORMAT_LA copy_generic_opt_arg
+ {
+ $$ = makeDefElem("format", $2, @1);
+ }
;
copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 06dd62f0008..647adafd227 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -86,8 +86,6 @@ typedef struct JsonAggState
JsonUniqueBuilderState unique_check;
} JsonAggState;
-static void composite_to_json(Datum composite, StringInfo result,
- bool use_line_feeds);
static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
const Datum *vals, const bool *nulls, int *valcount,
JsonTypeCategory tcategory, Oid outfuncoid,
@@ -517,8 +515,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
/*
* Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
*/
-static void
+void
composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
{
HeapTupleHeader td;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 316a2dafbf1..0fd06a31201 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3376,7 +3376,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM|TO [PROGRAM] <sth> WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAnyExcept("PROGRAM"), "WITH", "(", "FORMAT") ||
Matches("COPY|\\copy", MatchAny, "FROM|TO", "PROGRAM", MatchAny, "WITH", "(", "FORMAT"))
- COMPLETE_WITH("binary", "csv", "text");
+ COMPLETE_WITH("binary", "csv", "text", "json");
/* Complete COPY <sth> FROM [PROGRAM] filename WITH (ON_ERROR */
else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAnyExcept("PROGRAM"), "WITH", "(", "ON_ERROR") ||
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 686653233b2..85aedc267d6 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -56,6 +56,7 @@ typedef enum CopyFormat
COPY_FORMAT_TEXT = 0,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
+ COPY_FORMAT_JSON,
} CopyFormat;
/*
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 49bbda7ac06..1fa8e2ce8e2 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
#include "lib/stringinfo.h"
/* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+ bool use_line_feeds);
extern void escape_json(StringInfo buf, const char *str);
extern void escape_json_with_len(StringInfo buf, const char *str, int len);
extern void escape_json_text(StringInfo buf, const text *txt);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 24e0f472f14..10333357d68 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -73,6 +73,82 @@ copy copytest3 to stdout csv header;
c1,"col with , comma","col with "" quote"
1,a,1
2,b,2
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+ERROR: cannot specify DELIMITER in JSON mode
+copy copytest to stdout (format json, null '\N');
+ERROR: cannot specify NULL in JSON mode
+copy copytest to stdout (format json, default '|');
+ERROR: cannot specify DEFAULT in JSON mode
+copy copytest to stdout (format json, header);
+ERROR: cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, header 1);
+ERROR: cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, quote '"');
+ERROR: COPY QUOTE requires CSV mode
+copy copytest to stdout (format json, escape '"');
+ERROR: COPY ESCAPE requires CSV mode
+copy copytest to stdout (format json, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
+copy copytest to stdout (format json, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+copy copytest to stdout (format json, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
+copy copytest to stdout (format json, on_error ignore);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: copy copytest to stdout (format json, on_error ignore);
+ ^
+copy copytest from stdin(format json);
+ERROR: COPY json mode cannot be used with COPY FROM
+-- all of the above should yield error
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
create temp table copytest4 (
c1 int,
"colname with tab: " text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 676a8b342b5..80799e2ead9 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -82,6 +82,53 @@ this is just a line full of junk that would error out if parsed
copy copytest3 to stdout csv header;
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+copy copytest to stdout (format json);
+
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+copy copytest to stdout (format json, null '\N');
+copy copytest to stdout (format json, default '|');
+copy copytest to stdout (format json, header);
+copy copytest to stdout (format json, header 1);
+copy copytest to stdout (format json, quote '"');
+copy copytest to stdout (format json, escape '"');
+copy copytest to stdout (format json, force_quote *);
+copy copytest to stdout (format json, force_not_null *);
+copy copytest to stdout (format json, force_null *);
+copy copytest to stdout (format json, on_error ignore);
+copy copytest from stdin(format json);
+-- all of the above should yield error
+
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
create temp table copytest4 (
c1 int,
"colname with tab: " text);
--
2.34.1
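
The SendCopyBegin change in the patch above advertises a single text column for JSON output instead of one column per attribute. A minimal sketch of the resulting CopyOutResponse wire layout follows. It assumes the message layout from the frontend/backend protocol documentation (type byte 'H', a 4-byte length that counts itself, a 1-byte overall format, a 2-byte column count, and one 2-byte format code per column) and models only text-format output. build_copy_out_response is a hypothetical helper, not a PostgreSQL function.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: write a text-format CopyOutResponse into buf and
 * return the number of bytes written, or 0 if the arguments are unusable.
 * When is_json is nonzero, a single text column is advertised regardless
 * of natts, as in the patch.
 */
static size_t
build_copy_out_response(uint8_t *buf, size_t buflen, int natts, int is_json)
{
    uint16_t ncols;
    uint32_t len;
    size_t   pos = 0;

    if (natts < 0 || natts > INT16_MAX)
        return 0;

    ncols = is_json ? 1 : (uint16_t) natts;

    /* length word (4) + overall format (1) + column count (2) + formats */
    len = 4 + 1 + 2 + 2 * (uint32_t) ncols;
    if (buflen < (size_t) len + 1)
        return 0;

    buf[pos++] = 'H';                   /* CopyOutResponse message type */
    buf[pos++] = (uint8_t) (len >> 24); /* length, network byte order */
    buf[pos++] = (uint8_t) (len >> 16);
    buf[pos++] = (uint8_t) (len >> 8);
    buf[pos++] = (uint8_t) len;
    buf[pos++] = 0;                     /* overall format: 0 = text */
    buf[pos++] = (uint8_t) (ncols >> 8);
    buf[pos++] = (uint8_t) ncols;
    for (uint16_t i = 0; i < ncols; i++)
    {
        buf[pos++] = 0;                 /* per-column format: 0 = text */
        buf[pos++] = 0;
    }
    return pos;
}
```

Under these assumptions a three-column text COPY produces a length field of 13, while the JSON case produces 9 with a column count of 1. A client that inspects the column count therefore sees each JSON line as one text column rather than as the table's attribute list.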
^ permalink raw reply [nested|flat] 28+ messages in thread
* Re: Emitting JSON to file using COPY TO
@ 2025-11-29 02:46 jian he <[email protected]>
parent: jian he <[email protected]>
0 siblings, 1 reply; 28+ messages in thread
From: jian he @ 2025-11-29 02:46 UTC (permalink / raw)
To: Junwang Zhao <[email protected]>; +Cc: Joe Conway <[email protected]>; Andrey M. Borodin <[email protected]>; Dean Rasheed <[email protected]>; Daniel Verite <[email protected]>; Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
Hi.
https://commitfest.postgresql.org/patch/4716/
says the patch needs a rebase.
It turns out that only copy.sgml has a conflict.
It conflicts with
https://git.postgresql.org/cgit/postgresql.git/commit/?id=e4018f891dec09ff95ac97e5b1a2307349aeeffa
So the rebase only needs to update copy.sgml.
--
jian
https://www.enterprisedb.com/
Attachments:
[text/x-patch] v21-0002-json-format-for-COPY-TO.patch (22.2K, 2-v21-0002-json-format-for-COPY-TO.patch)
download | inline diff:
From 3ff9058b29e5410a976c1013484f55df27f132a5 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Mon, 10 Nov 2025 08:40:26 +0800
Subject: [PATCH v21 2/3] json format for COPY TO
JSON format is only supported with the COPY TO operation. It is incompatible
with options such as HEADER, DEFAULT, NULL, DELIMITER, and several others. This
has been thoroughly tested in src/test/regress/sql/copy.sql.
The CopyFormat enum was originally contributed by Joel Jacobson
[email protected], later refactored by Jian He to address various issues, and
further adapted by Junwang Zhao to support the newly introduced CopyToRoutine
struct (commit 2e4127b6d2).
Author: Joe Conway <[email protected]>
Reviewed-by: "Andrey M. Borodin" <[email protected]>,
Reviewed-by: Dean Rasheed <[email protected]>,
Reviewed-by: Daniel Verite <[email protected]>,
Reviewed-by: Andrew Dunstan <[email protected]>,
Reviewed-by: Davin Shearer <[email protected]>,
Reviewed-by: Masahiko Sawada <[email protected]>,
Reviewed-by: Alvaro Herrera <[email protected]>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
doc/src/sgml/ref/copy.sgml | 13 +++--
src/backend/commands/copy.c | 72 +++++++++++++++++++++-------
src/backend/commands/copyto.c | 76 ++++++++++++++++++++++++++----
src/backend/parser/gram.y | 8 ++++
src/backend/utils/adt/json.c | 5 +-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/include/utils/json.h | 2 +
src/test/regress/expected/copy.out | 76 ++++++++++++++++++++++++++++++
src/test/regress/sql/copy.sql | 47 ++++++++++++++++++
10 files changed, 268 insertions(+), 34 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 53b0ea8f573..320f5f1edd4 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -228,10 +228,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Selects the data format to be read or written:
<literal>text</literal>,
<literal>csv</literal> (Comma Separated Values),
+ <literal>json</literal> (JavaScript Object Notation),
or <literal>binary</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
+ <para>
+ The <literal>json</literal> option is allowed only in
+ <command>COPY TO</command>.
+ </para>
</listitem>
</varlistentry>
@@ -266,7 +271,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
(line) of the file. The default is a tab character in text format,
a comma in <literal>CSV</literal> format.
This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -280,7 +285,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
<note>
@@ -303,7 +308,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ not using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -330,7 +335,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<command>COPY FROM</command> commands.
</para>
<para>
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index d674ada98e4..0ec9b22d20f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -585,6 +585,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "json") == 0)
+ opts_out->format = COPY_FORMAT_JSON;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -744,21 +746,42 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+ if (opts_out->delim)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER"));
+ else if (opts_out->format == COPY_FORMAT_JSON)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in JSON mode", "DELIMITER"));
+ }
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
+ if (opts_out->null_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "NULL"));
+ else if (opts_out->format == COPY_FORMAT_JSON)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in JSON mode", "NULL"));
+ }
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ if (opts_out->default_print)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in BINARY mode", "DEFAULT"));
+ else if (opts_out->format == COPY_FORMAT_JSON)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in JSON mode", "DEFAULT"));
+ }
/* Set defaults for omitted options */
if (!opts_out->delim)
@@ -824,11 +847,18 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ if (opts_out->header_line != COPY_HEADER_FALSE)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "HEADER"));
+ else if (opts_out->format == COPY_FORMAT_JSON)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot specify %s in JSON mode", "HEADER"));
+ }
/* Check quote */
if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
@@ -932,6 +962,12 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY %s cannot be used with %s", "FREEZE",
"COPY TO")));
+ /* Check json format */
+ if (opts_out->format == COPY_FORMAT_JSON && is_from)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY %s mode cannot be used with %s", "json", "COPY FROM"));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c97f0460b3e..accf34e1a60 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -26,6 +26,7 @@
#include "executor/execdesc.h"
#include "executor/executor.h"
#include "executor/tuptable.h"
+#include "funcapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
@@ -33,6 +34,7 @@
#include "pgstat.h"
#include "storage/fd.h"
#include "tcop/tcopprot.h"
+#include "utils/json.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
@@ -130,6 +132,7 @@ static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -149,7 +152,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
/*
* COPY TO routines for built-in formats.
*
- * CSV and text formats share the same TextLike routines except for the
+ * Text, CSV, and json formats share the same TextLike routines except for the
* one-row callback.
*/
@@ -169,6 +172,14 @@ static const CopyToRoutine CopyToRoutineCSV = {
.CopyToEnd = CopyToTextLikeEnd,
};
+/* json format */
+static const CopyToRoutine CopyToRoutineJson = {
+ .CopyToStart = CopyToTextLikeStart,
+ .CopyToOutFunc = CopyToTextLikeOutFunc,
+ .CopyToOneRow = CopyToJsonOneRow,
+ .CopyToEnd = CopyToTextLikeEnd,
+};
+
/* binary format */
static const CopyToRoutine CopyToRoutineBinary = {
.CopyToStart = CopyToBinaryStart,
@@ -185,12 +196,14 @@ CopyToGetRoutine(const CopyFormatOptions *opts)
return &CopyToRoutineCSV;
else if (opts->format == COPY_FORMAT_BINARY)
return &CopyToRoutineBinary;
+ else if (opts->format == COPY_FORMAT_JSON)
+ return &CopyToRoutineJson;
/* default is text */
return &CopyToRoutineText;
}
-/* Implementation of the start callback for text and CSV formats */
+/* Implementation of the start callback for text, CSV, and json formats */
static void
CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
{
@@ -209,6 +222,8 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
ListCell *cur;
bool hdr_delim = false;
+ Assert(cstate->opts.format != COPY_FORMAT_JSON);
+
foreach(cur, cstate->attnumlist)
{
int attnum = lfirst_int(cur);
@@ -231,7 +246,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
}
/*
- * Implementation of the outfunc callback for text and CSV formats. Assign
+ * Implementation of the outfunc callback for text, CSV, and json formats. Assign
* the output function data to the given *finfo.
*/
static void
@@ -304,13 +319,46 @@ CopyToTextLikeOneRow(CopyToState cstate,
CopySendTextLikeEndOfRow(cstate);
}
-/* Implementation of the end callback for text and CSV formats */
+/* Implementation of the end callback for text, CSV, and json formats */
static void
CopyToTextLikeEnd(CopyToState cstate)
{
/* Nothing to do here */
}
+/* Implementation of per-row callback for json format */
+static void
+CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+ Datum rowdata;
+ StringInfo result;
+
+ /*
+ * If the COPY TO source data comes from a query rather than a plain table,
+ * we need to copy cstate->queryDesc->tupDesc to slot->tts_tupleDescriptor.
+ * This is necessary because the slot's TupleDesc may change during query
+ * execution, and we depend on it when calling composite_to_json.
+ */
+ if (!cstate->rel)
+ {
+ memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
+ TupleDescAttr(cstate->queryDesc->tupDesc, 0),
+ cstate->queryDesc->tupDesc->natts * sizeof(FormData_pg_attribute));
+
+ for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
+ populate_compact_attribute(slot->tts_tupleDescriptor, i);
+
+ BlessTupleDesc(slot->tts_tupleDescriptor);
+ }
+ rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ result = makeStringInfo();
+ composite_to_json(rowdata, result, false);
+
+ CopySendData(cstate, result->data, result->len);
+
+ CopySendTextLikeEndOfRow(cstate);
+}
+
/*
* Implementation of the start callback for binary format. Send a header
* for a binary copy.
@@ -402,9 +450,21 @@ SendCopyBegin(CopyToState cstate)
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
pq_sendbyte(&buf, format); /* overall format */
- pq_sendint16(&buf, natts);
- for (i = 0; i < natts; i++)
- pq_sendint16(&buf, format); /* per-column formats */
+ if (cstate->opts.format != COPY_FORMAT_JSON)
+ {
+ pq_sendint16(&buf, natts);
+ for (i = 0; i < natts; i++)
+ pq_sendint16(&buf, format); /* per-column formats */
+ }
+ else
+ {
+ /*
+ * JSON format is always one non-binary column
+ */
+ pq_sendint16(&buf, 1);
+ pq_sendint16(&buf, 0);
+ }
+
pq_endmessage(&buf);
cstate->copy_dest = COPY_FRONTEND;
}
@@ -504,7 +564,7 @@ CopySendEndOfRow(CopyToState cstate)
}
/*
- * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
+ * Wrapper function of CopySendEndOfRow for text, CSV, and json formats. Sends the
* line termination and do common appropriate things for the end of row.
*/
static inline void
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index c3a0a354a9c..dd23c075710 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3557,6 +3557,10 @@ copy_opt_item:
{
$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
}
+ | JSON
+ {
+ $$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ }
| HEADER_P
{
$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3639,6 +3643,10 @@ copy_generic_opt_elem:
{
$$ = makeDefElem($1, $2, @1);
}
+ | FORMAT_LA copy_generic_opt_arg
+ {
+ $$ = makeDefElem("format", $2, @1);
+ }
;
copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 06dd62f0008..647adafd227 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -86,8 +86,6 @@ typedef struct JsonAggState
JsonUniqueBuilderState unique_check;
} JsonAggState;
-static void composite_to_json(Datum composite, StringInfo result,
- bool use_line_feeds);
static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
const Datum *vals, const bool *nulls, int *valcount,
JsonTypeCategory tcategory, Oid outfuncoid,
@@ -517,8 +515,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
/*
* Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
*/
-static void
+void
composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
{
HeapTupleHeader td;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 20d7a65c614..2029b1930c8 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3377,7 +3377,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM|TO [PROGRAM] <sth> WITH (FORMAT */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAnyExcept("PROGRAM"), "WITH", "(", "FORMAT") ||
Matches("COPY|\\copy", MatchAny, "FROM|TO", "PROGRAM", MatchAny, "WITH", "(", "FORMAT"))
- COMPLETE_WITH("binary", "csv", "text");
+ COMPLETE_WITH("binary", "csv", "text", "json");
/* Complete COPY <sth> FROM [PROGRAM] filename WITH (ON_ERROR */
else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAnyExcept("PROGRAM"), "WITH", "(", "ON_ERROR") ||
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 686653233b2..85aedc267d6 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -56,6 +56,7 @@ typedef enum CopyFormat
COPY_FORMAT_TEXT = 0,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
+ COPY_FORMAT_JSON,
} CopyFormat;
/*
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 49bbda7ac06..1fa8e2ce8e2 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
#include "lib/stringinfo.h"
/* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+ bool use_line_feeds);
extern void escape_json(StringInfo buf, const char *str);
extern void escape_json_with_len(StringInfo buf, const char *str, int len);
extern void escape_json_text(StringInfo buf, const text *txt);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 24e0f472f14..10333357d68 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -73,6 +73,82 @@ copy copytest3 to stdout csv header;
c1,"col with , comma","col with "" quote"
1,a,1
2,b,2
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+ERROR: cannot specify DELIMITER in JSON mode
+copy copytest to stdout (format json, null '\N');
+ERROR: cannot specify NULL in JSON mode
+copy copytest to stdout (format json, default '|');
+ERROR: cannot specify DEFAULT in JSON mode
+copy copytest to stdout (format json, header);
+ERROR: cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, header 1);
+ERROR: cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, quote '"');
+ERROR: COPY QUOTE requires CSV mode
+copy copytest to stdout (format json, escape '"');
+ERROR: COPY ESCAPE requires CSV mode
+copy copytest to stdout (format json, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
+copy copytest to stdout (format json, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+copy copytest to stdout (format json, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
+copy copytest to stdout (format json, on_error ignore);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: copy copytest to stdout (format json, on_error ignore);
+ ^
+copy copytest from stdin(format json);
+ERROR: COPY json mode cannot be used with COPY FROM
+-- all of the above should yield error
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
create temp table copytest4 (
c1 int,
"colname with tab: " text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 676a8b342b5..80799e2ead9 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -82,6 +82,53 @@ this is just a line full of junk that would error out if parsed
copy copytest3 to stdout csv header;
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+copy copytest to stdout (format json);
+
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+copy copytest to stdout (format json, null '\N');
+copy copytest to stdout (format json, default '|');
+copy copytest to stdout (format json, header);
+copy copytest to stdout (format json, header 1);
+copy copytest to stdout (format json, quote '"');
+copy copytest to stdout (format json, escape '"');
+copy copytest to stdout (format json, force_quote *);
+copy copytest to stdout (format json, force_not_null *);
+copy copytest to stdout (format json, force_null *);
+copy copytest to stdout (format json, on_error ignore);
+copy copytest from stdin(format json);
+-- all of the above should yield error
+
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
create temp table copytest4 (
c1 int,
"colname with tab: " text);
--
2.34.1
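The option checks that patch 2/3 adds to ProcessCopyOptions can be summarized outside the backend as a small standalone model. This is only a sketch, not PostgreSQL code: the names ModelFormat, ModelCopyOptions, and check_json_options are hypothetical, the struct is a reduced stand-in for CopyFormatOptions, and the CSV-only option checks (QUOTE, ESCAPE, and so on) are left out. Only the error strings and the order of the modeled checks follow the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same members and order as the CopyFormat enum in the patch. */
typedef enum
{
    FMT_TEXT = 0,
    FMT_BINARY,
    FMT_CSV,
    FMT_JSON
} ModelFormat;

/* Hypothetical, reduced stand-in for CopyFormatOptions. */
typedef struct
{
    ModelFormat format;
    const char *delim;          /* NULL if DELIMITER was not specified */
    const char *null_print;     /* NULL if NULL was not specified */
    const char *default_print;  /* NULL if DEFAULT was not specified */
    bool        header;         /* true if HEADER was requested */
    bool        is_from;        /* true for COPY FROM */
} ModelCopyOptions;

/*
 * Return NULL if the options are acceptable for JSON format, otherwise
 * the error message the patched backend would report for the first
 * conflicting option.  Non-JSON formats are not examined here.
 */
static const char *
check_json_options(const ModelCopyOptions *opts)
{
    if (opts->format != FMT_JSON)
        return NULL;
    if (opts->delim)
        return "cannot specify DELIMITER in JSON mode";
    if (opts->null_print)
        return "cannot specify NULL in JSON mode";
    if (opts->default_print)
        return "cannot specify DEFAULT in JSON mode";
    if (opts->header)
        return "cannot specify HEADER in JSON mode";
    if (opts->is_from)
        return "COPY json mode cannot be used with COPY FROM";
    return NULL;
}
```

In the real code each failing check raises an error through ereport(ERROR, ...) rather than returning a string; returning the message here just keeps the model easy to call from a test program.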
[text/x-patch] v21-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch (9.7K, 3-v21-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch)
From 93b28c6521d5ea6f7a4e901acf910f19e8ef6528 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Sat, 29 Nov 2025 10:37:01 +0800
Subject: [PATCH v21 3/3] Add option force_array for COPY JSON FORMAT
The force_array option can be used only in COPY TO with JSON format. It makes
the output a single JSON array: square brackets enclose the rows, and commas
separate them. Refactored by Junwang Zhao to adapt to the newly introduced
CopyToRoutine struct (commit 2e4127b6d2).
Author: Joe Conway <[email protected]>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
doc/src/sgml/ref/copy.sgml | 14 +++++++++++
src/backend/commands/copy.c | 13 +++++++++++
src/backend/commands/copyto.c | 37 +++++++++++++++++++++++++++++-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy.out | 23 +++++++++++++++++++
src/test/regress/sql/copy.sql | 8 +++++++
7 files changed, 96 insertions(+), 2 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 320f5f1edd4..a274118b0fa 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -40,6 +40,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
HEADER [ <replaceable class="parameter">boolean</replaceable> | <replaceable class="parameter">integer</replaceable> | MATCH ]
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
+ FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -366,6 +367,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry id="sql-copy-params-force-array">
+ <term><literal>FORCE_ARRAY</literal></term>
+ <listitem>
+ <para>
+ Force output of square brackets as array decorations at the beginning
+ and end of output, and commas between the rows. It is allowed only in
+ <command>COPY TO</command>, and only when using
+ <literal>json</literal> format. The default is
+ <literal>false</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="sql-copy-params-force-quote">
<term><literal>FORCE_QUOTE</literal></term>
<listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 0ec9b22d20f..6f9ae3fbfd7 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -557,6 +557,7 @@ ProcessCopyOptions(ParseState *pstate,
bool on_error_specified = false;
bool log_verbosity_specified = false;
bool reject_limit_specified = false;
+ bool force_array_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -713,6 +714,13 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "force_array") == 0)
+ {
+ if (force_array_specified)
+ errorConflictingDefElem(defel, pstate);
+ force_array_specified = true;
+ opts_out->force_array = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "on_error") == 0)
{
if (on_error_specified)
@@ -968,6 +976,11 @@ ProcessCopyOptions(ParseState *pstate,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY %s mode cannot be used with %s", "json", "COPY FROM"));
+ if (opts_out->format != COPY_FORMAT_JSON && opts_out->force_array)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY %s can only used with JSON mode", "FORCE_ARRAY"));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index accf34e1a60..b58c5bdf987 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -86,6 +86,10 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+
+ /* need delimiter to start next json array element */
+ bool json_row_delim_needed;
+
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -133,6 +137,7 @@ static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToJsonEnd(CopyToState cstate);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -177,7 +182,7 @@ static const CopyToRoutine CopyToRoutineJson = {
.CopyToStart = CopyToTextLikeStart,
.CopyToOutFunc = CopyToTextLikeOutFunc,
.CopyToOneRow = CopyToJsonOneRow,
- .CopyToEnd = CopyToTextLikeEnd,
+ .CopyToEnd = CopyToJsonEnd,
};
/* binary format */
@@ -243,6 +248,16 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
CopySendTextLikeEndOfRow(cstate);
}
+
+ /*
+ * If JSON has been requested and FORCE_ARRAY has been specified, send the
+ * opening bracket.
+ */
+ if (cstate->opts.format == COPY_FORMAT_JSON && cstate->opts.force_array)
+ {
+ CopySendChar(cstate, '[');
+ CopySendTextLikeEndOfRow(cstate);
+ }
}
/*
@@ -354,11 +369,31 @@ CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
result = makeStringInfo();
composite_to_json(rowdata, result, false);
+ if (cstate->json_row_delim_needed && cstate->opts.force_array)
+ CopySendChar(cstate, ',');
+ else if (cstate->opts.force_array)
+ {
+ /* first row needs no delimiter */
+ CopySendChar(cstate, ' ');
+ cstate->json_row_delim_needed = true;
+ }
+
CopySendData(cstate, result->data, result->len);
CopySendTextLikeEndOfRow(cstate);
}
+/* Implementation of the end callback for json format */
+static void
+CopyToJsonEnd(CopyToState cstate)
+{
+ if (cstate->opts.force_array)
+ {
+ CopySendChar(cstate, ']');
+ CopySendTextLikeEndOfRow(cstate);
+ }
+}
+
/*
* Implementation of the start callback for binary format. Send a header
* for a binary copy.
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 2029b1930c8..c25c3cdb72b 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1232,7 +1232,7 @@ Copy_common_options, "DEFAULT", "FORCE_NOT_NULL", "FORCE_NULL", "FREEZE", \
/* COPY TO options */
#define Copy_to_options \
-Copy_common_options, "FORCE_QUOTE"
+Copy_common_options, "FORCE_QUOTE", "FORCE_ARRAY"
/*
* These object types were introduced later than our support cutoff of
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 85aedc267d6..7274b0d3ca5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,6 +87,7 @@ typedef struct CopyFormatOptions
List *force_notnull; /* list of column names */
bool force_notnull_all; /* FORCE_NOT_NULL *? */
bool *force_notnull_flags; /* per-column CSV FNN flags */
+ bool force_array; /* add JSON array decorations */
List *force_null; /* list of column names */
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 10333357d68..8becc70ee7a 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -112,6 +112,29 @@ LINE 1: copy copytest to stdout (format json, on_error ignore);
copy copytest from stdin(format json);
ERROR: COPY json mode cannot be used with COPY FROM
-- all of the above should yield error
+--Error
+copy copytest to stdout (format csv, force_array true);
+ERROR: COPY FORCE_ARRAY can only used with JSON mode
+--ok
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 80799e2ead9..6a14cfc6e68 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -101,6 +101,14 @@ copy copytest to stdout (format json, on_error ignore);
copy copytest from stdin(format json);
-- all of the above should yield error
+--Error
+copy copytest to stdout (format csv, force_array true);
+
+--ok
+copy copytest to stdout (format json, force_array);
+copy copytest to stdout (format json, force_array true);
+copy copytest to stdout (format json, force_array false);
+
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
--
2.34.1
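The FORCE_ARRAY framing in patch 3/3 amounts to a three-callback state machine: the start callback emits "[", each row is prefixed with a space (first row) or a comma (later rows), and the end callback emits "]". The sketch below models that outside the backend. It is not PostgreSQL code: ModelJsonCopy, emit, model_start, model_one_row, and model_end are hypothetical names, output goes to a fixed in-memory buffer instead of CopySendData, and the row terminator is assumed to be a plain newline.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parts of CopyToStateData used here. */
typedef struct
{
    bool    force_array;        /* models opts.force_array */
    bool    row_delim_needed;   /* models json_row_delim_needed */
    char    out[1024];          /* collected output, NUL-terminated */
    size_t  len;
} ModelJsonCopy;

/* Append text to the buffer; silently drops text that would not fit. */
static void
emit(ModelJsonCopy *s, const char *text)
{
    size_t  n = strlen(text);

    if (s->len + n < sizeof(s->out))
    {
        memcpy(s->out + s->len, text, n);
        s->len += n;
        s->out[s->len] = '\0';
    }
}

/* Models the start callback: reset state, open the array if requested. */
static void
model_start(ModelJsonCopy *s, bool force_array)
{
    s->force_array = force_array;
    s->row_delim_needed = false;
    s->len = 0;
    s->out[0] = '\0';
    if (force_array)
        emit(s, "[\n");
}

/* Models the per-row callback: row_json is one already-serialized row. */
static void
model_one_row(ModelJsonCopy *s, const char *row_json)
{
    if (s->force_array)
    {
        if (s->row_delim_needed)
            emit(s, ",");
        else
        {
            /* first row gets a space instead of a comma */
            emit(s, " ");
            s->row_delim_needed = true;
        }
    }
    emit(s, row_json);
    emit(s, "\n");
}

/* Models the end callback: close the array if one was opened. */
static void
model_end(ModelJsonCopy *s)
{
    if (s->force_array)
        emit(s, "]\n");
}
```

Putting the comma at the start of each later row, rather than at the end of the previous one, means a row never needs to know whether another row follows, which fits the one-row-at-a-time callback interface.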
[text/x-patch] v21-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch (13.1K, 4-v21-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch)
From fbd5ef17532bc111e40d9352049fe9fd4e807aae Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Mon, 10 Nov 2025 08:39:36 +0800
Subject: [PATCH v21 1/3] introduce CopyFormat refactor CopyFormatOptions
Currently, COPY command format is determined by two booleans, binary and
csv_mode, within CopyFormatOptions. This approach, while functional, isn't ideal
for future expansion.
To simplify adding new formats, we've introduced an enum CopyFormat. This makes
the code cleaner and more maintainable, allowing for easier integration of
additional formats down the line.
The CopyFormat enum was originally contributed by Joel Jacobson
<[email protected]>, later refactored by Jian He to address various issues.
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
src/backend/commands/copy.c | 50 +++++++++++++++-------------
src/backend/commands/copyfrom.c | 6 ++--
src/backend/commands/copyfromparse.c | 7 ++--
src/backend/commands/copyto.c | 8 ++---
src/include/commands/copy.h | 13 ++++++--
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 49 insertions(+), 36 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 28e878c3688..d674ada98e4 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -564,6 +564,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out = (CopyFormatOptions *) palloc0(sizeof(CopyFormatOptions));
opts_out->file_encoding = -1;
+ /* default format */
+ opts_out->format = COPY_FORMAT_TEXT;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -578,11 +580,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -742,31 +744,31 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = (opts_out->format == COPY_FORMAT_CSV) ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = (opts_out->format == COPY_FORMAT_CSV) ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -814,7 +816,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -822,43 +824,43 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line != COPY_HEADER_FALSE)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -872,8 +874,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_notnull != NIL ||
+ opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -888,8 +890,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+ opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -913,7 +915,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -949,7 +951,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -966,7 +968,7 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 12781963b4f..ba31b227d5f 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -155,9 +155,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
static const CopyFromRoutine *
CopyFromGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyFromRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyFromRoutineBinary;
/* default is text */
@@ -261,7 +261,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index a09e7fbace3..9cc5bb1cf5c 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -171,7 +171,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -747,7 +747,7 @@ bool
NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
return NextCopyFromRawFieldsInternal(cstate, fields, nfields,
- cstate->opts.csv_mode);
+ cstate->opts.format == COPY_FORMAT_CSV);
}
/*
@@ -774,7 +774,8 @@ NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields
bool done = false;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_FALSE)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..c97f0460b3e 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -181,9 +181,9 @@ static const CopyToRoutine CopyToRoutineBinary = {
static const CopyToRoutine *
CopyToGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyToRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyToRoutineBinary;
/* default is text */
@@ -220,7 +220,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -397,7 +397,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 541176e1980..686653233b2 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -48,6 +48,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT = 0,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -58,9 +68,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
int header_line; /* number of lines to skip or COPY_HEADER_XXX
* value (see the above) */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index cf3f6a7dafd..6d1f2d5bb3b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -516,6 +516,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromRoutine
CopyFromState
--
2.34.1
^ permalink raw reply [nested|flat] 28+ messages in thread
* Re: Emitting JSON to file using COPY TO
@ 2026-03-04 15:51 Andrew Dunstan <[email protected]>
parent: jian he <[email protected]>
0 siblings, 1 reply; 28+ messages in thread
From: Andrew Dunstan @ 2026-03-04 15:51 UTC (permalink / raw)
To: jian he <[email protected]>; Junwang Zhao <[email protected]>; +Cc: Florents Tselai <[email protected]>; Joe Conway <[email protected]>; Andrey M. Borodin <[email protected]>; Dean Rasheed <[email protected]>; Daniel Verite <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 2026-02-08 Su 10:48 PM, jian he wrote:
> On Sat, Feb 7, 2026 at 9:27 PM Junwang Zhao<[email protected]> wrote:
>> Here are some comments on v23:
>>
>> 0001: The refactor looks straightforward to me. Introducing a format
>> field should make future extensions easier. One suggestion is that we
>> could add some helper macros around format, for example:
>>
>> #define IS_FORMAT_CSV(format) (format == COPY_FORMAT_CSV)
>> #define IS_FORMAT_TEXT_LIKE(format) \
>> (format == COPY_FORMAT_TEXT || format == COPY_FORMAT_CSV)
>>
>> I think this would improve readability.
> Personally, I don't like macros....
>
>> 0002: Since you have moved the `CopyFormat enum` into 0001, the
>> following commit msg should be rephrased.
>>
>> The CopyFormat enum was originally contributed by Joel Jacobson
>> [email protected], later refactored by Jian He to address various issues, and
>> further adapted by Junwang Zhao to support the newly introduced CopyToRoutine
>> struct (commit 2e4127b6d2).
>>
>> - if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
>> - ereport(ERROR,
>> - (errcode(ERRCODE_SYNTAX_ERROR),
>> - /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
>> - errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
>> + if (opts_out->delim)
>> + {
>> + if (opts_out->format == COPY_FORMAT_BINARY)
>> + ereport(ERROR,
>> + errcode(ERRCODE_SYNTAX_ERROR),
>> + /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
>> + errmsg("cannot specify %s in BINARY mode", "DELIMITER"));
>> + else if (opts_out->format == COPY_FORMAT_JSON)
>> + ereport(ERROR,
>> + errcode(ERRCODE_SYNTAX_ERROR),
>> + errmsg("cannot specify %s in JSON mode", "DELIMITER"));
>> + }
>>
>> Can we add a function that converts CopyFormat to a string? Treating
>> CopyFormat as %s in error messages would make the code shorter.
>> However, I'm not sure whether this aligns with translation
>> conventions, correct me if I'm wrong.
>>
> I don’t think this is worth the added complexity.
> That said, I tried to simplify the code and changed it to:
>
> if (opts_out->delim &&
> (opts_out->format == COPY_FORMAT_BINARY ||
> opts_out->format == COPY_FORMAT_JSON))
> ereport(ERROR,
> errcode(ERRCODE_SYNTAX_ERROR),
> opts_out->format == COPY_FORMAT_BINARY
> ? errmsg("cannot specify %s in BINARY mode", "DELIMITER")
> : errmsg("cannot specify %s in JSON mode", "DELIMITER"));
>
>> - * CSV and text formats share the same TextLike routines except for the
>> + * CSV and text, json formats share the same TextLike routines except for the
>>
>> I'd suggest rewording to `CSV, text and json ...`. The same applies to
>> other parts in this patch.
>>
> sure.
>
>> 0003: The commit message includes some changes (adapt the newly
>> introduced CopyToRoutine) that actually belong to 0002; it would be
>> better to remove them from this commit.
>>
> 0002 commit message:
> """
> This introduces the JSON format option for the COPY TO command, allowing users
> to export query results or table data directly as a single JSON object or a
> stream of JSON objects.
>
> The JSON format is currently supported only for COPY TO operations; it
> is not available for COPY FROM.
>
> JSON format is incompatible with some standard text/CSV parsing or
> formatting options, including:
> - HEADER
> - DEFAULT
> - NULL
> - DELIMITER
> - FORCE QUOTE / FORCE NOT NULL
>
> Regression tests covering valid JSON exports and error handling for
> incompatible options have been added to src/test/regress/sql/copy.sql.
> """
>
> 0003 commit message:
> """
> Add option force_array for COPY JSON FORMAT
> This adds the force_array option, which is available exclusively
> when using COPY TO with the JSON format.
>
> When enabled, this option wraps the output in a top-level JSON array
> (enclosed in square brackets with comma-separated elements), making the
> entire result a valid single JSON value. Without this option, the default
> behavior is to output a stream of independent JSON objects.
>
> Attempting to use this option with COPY FROM or with formats other than
> JSON will raise an error.
> """
>
>> + if (cstate->json_row_delim_needed && cstate->opts.force_array)
>> + CopySendChar(cstate, ',');
>> + else if (cstate->opts.force_array)
>> + {
>> + /* first row needs no delimiter */
>> + CopySendChar(cstate, ' ');
>> + cstate->json_row_delim_needed = true;
>> + }
>>
>> can we do this:
>>
>> if (cstate->opts.force_array)
>> {
>> if (cstate->json_row_delim_needed)
>> CopySendChar(cstate, ',');
>> else
>> {
>> /* first row needs no delimiter */
>> CopySendChar(cstate, ' ');
>> cstate->json_row_delim_needed = true;
>> }
>> }
>>
> good suggestion.
>
> If more people think WRAP_ARRAY is better than FORCE_ARRAY, we can
> switch to it accordingly.
> The change itself is quite straightforward.
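The helper macros proposed in the quoted review can be sketched as below. This is only an illustration and is not taken from any posted patch: the enum mirrors the one added to copy.h in 0001, the macro arguments are parenthesized (the versions in the review omit this) so that they stay safe for arbitrary expressions, and default_delimiter() is a hypothetical helper that echoes the default chosen in ProcessCopyOptions.

```c
#include <assert.h>

/* Mirrors the enum added to copy.h by patch 0001. */
typedef enum CopyFormat
{
    COPY_FORMAT_TEXT = 0,
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV,
} CopyFormat;

/* Arguments are parenthesized so that any expression expands safely. */
#define IS_FORMAT_CSV(format) ((format) == COPY_FORMAT_CSV)
#define IS_FORMAT_TEXT_LIKE(format) \
    ((format) == COPY_FORMAT_TEXT || (format) == COPY_FORMAT_CSV)

/*
 * Hypothetical helper (not in the patch): default delimiter for the
 * text-like formats, comma for CSV and tab for text.
 */
static const char *
default_delimiter(CopyFormat format)
{
    assert(IS_FORMAT_TEXT_LIKE(format));    /* binary takes no delimiter */
    return IS_FORMAT_CSV(format) ? "," : "\t";
}
```

Whether such macros read better than the spelled-out comparisons is the style question debated above; the sketch takes no side on it.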
I have reworked these. First I cleaned up a number of things in patches
2 and 3 (Thanks, Claude for the summary):
Patch 2: json format for COPY TO
copy.c:
- "json" → "JSON" in the COPY FROM rejection error message.
copyto.c:
1. TupleDesc setup runs once, not every row — Added
json_tupledesc_ready flag; the
memcpy/populate_compact_attribute/BlessTupleDesc block is now guarded by
if (!cstate->json_tupledesc_ready).
2. Comment rewritten — Old: "the slot's TupleDesc may change during
query execution". New: explains BlessTupleDesc registers the RECORDOID
descriptor so lookup_rowtype_tupdesc inside composite_to_json can
find it.
3. Eliminated per-row makeStringInfo() — Added StringInfoData
json_buf to the struct, initialized once in CopyToTextLikeStart in
copycontext. Each row does resetStringInfo instead of allocating a new
StringInfo.
4. Column list rejection added — Error: "column selection is not
supported in JSON mode" when attnamelist != NIL.
5. Improved SendCopyBegin comment — Old: "JSON format is always one
non-binary column". New: explains each CopyData message contains one
complete JSON object.
Tests:
6. Added copy copytest (style) to stdout (format json) to the
error-case block.
7. Added copyjsontype table test with json, jsonb columns — verifies
values are embedded directly, not double-encoded. Covers json objects,
scalars, arrays, nested objects, and nulls.
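Items 1 and 3 of the copyto.c list above describe the same pattern: do expensive setup once, then reuse state for every row. A minimal sketch of that pattern follows, assuming nothing about the actual patch code. RowBuf is a hypothetical stand-in for StringInfoData, JsonCopyState and its fields are illustrative names, and the setup_calls counter exists only to make the once-only guard observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for StringInfoData: a growable, resettable buffer. */
typedef struct RowBuf
{
    char   *data;
    size_t  len;
    size_t  cap;
} RowBuf;

/* Hypothetical per-COPY state; names echo the summary, not the patch. */
typedef struct JsonCopyState
{
    bool    tupledesc_ready;    /* has the one-time setup been done? */
    int     setup_calls;        /* how many times setup actually ran */
    RowBuf  json_buf;           /* allocated once, reset for every row */
} JsonCopyState;

/* Append a NUL-terminated string, growing the buffer when needed. */
static void
rowbuf_append(RowBuf *buf, const char *s)
{
    size_t  n = strlen(s);

    if (buf->len + n + 1 > buf->cap)
    {
        size_t  newcap = (buf->len + n + 1) * 2;
        char   *p = realloc(buf->data, newcap);

        assert(p != NULL);
        buf->data = p;
        buf->cap = newcap;
    }
    memcpy(buf->data + buf->len, s, n + 1);
    buf->len += n;
}

/* Start callback: allocate the row buffer once for the whole COPY. */
static void
json_copy_start(JsonCopyState *st)
{
    memset(st, 0, sizeof(*st));
    rowbuf_append(&st->json_buf, "");   /* forces the initial allocation */
}

/* One-row callback: guarded setup, then reset and refill the buffer. */
static const char *
json_copy_one_row(JsonCopyState *st, const char *row_json)
{
    if (!st->tupledesc_ready)
    {
        st->setup_calls++;              /* placeholder for descriptor setup */
        st->tupledesc_ready = true;
    }
    st->json_buf.len = 0;               /* analogous to resetStringInfo() */
    st->json_buf.data[0] = '\0';
    rowbuf_append(&st->json_buf, row_json);
    return st->json_buf.data;
}
```

Calling json_copy_one_row() for many rows runs the guarded block once and never allocates a fresh buffer object per row, which is the saving the summary describes.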
Patch 3: Add option force_array
copy.c:
1. "json" → "JSON" in COPY FROM error (carried from patch 2).
2. "can only used" → "can only be used" — grammar fix in FORCE_ARRAY
error.
copyto.c:
3. Struct fields reorganized — JSON fields grouped under /* JSON
format state */ comment with inline descriptions, instead of a
standalone json_row_delim_needed with a vague comment.
4. Block comment updated — Was "CSV, text and json formats share the
same TextLike routines except for the one-row callback". Now correctly
notes JSON has its own one-row and end callbacks.
5. CopyToTextLikeEnd comment fixed — Was "text, CSV, and json", now
"text and CSV" (JSON uses CopyToJsonEnd).
6. Cleaner start callback — FORCE_ARRAY bracket emission moved inside
the if (format == JSON) block after json_buf init, instead of a separate
top-level conditional.
7. Inherits all patch 2 fixes — TupleDesc guard, reusable json_buf,
improved comments, column list rejection.
Tests:
8. --Error → -- should fail: force_array requires json format; --ok →
-- force_array variants.
9. copyjsontype test carried through from patch 2.
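The start, one-row, and end callbacks for FORCE_ARRAY can be sketched roughly as follows. This is a self-contained illustration rather than the patch code: ArrayWriter and its functions are hypothetical names, output goes to a fixed in-memory buffer instead of CopySendChar(), and the exact whitespace (a newline after the opening bracket, a space before the first row) is an assumption based on the snippet quoted above and the protocol dump earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical writer state; the patch keeps similar flags in its copy state. */
typedef struct ArrayWriter
{
    char    out[256];           /* in-memory sink, for illustration only */
    size_t  len;
    bool    force_array;
    bool    row_delim_needed;   /* false until the first row is written */
} ArrayWriter;

static void
emit(ArrayWriter *w, const char *s)
{
    size_t  n = strlen(s);

    assert(w->len + n < sizeof(w->out));
    memcpy(w->out + w->len, s, n + 1);
    w->len += n;
}

/* Start callback: open the array when FORCE_ARRAY is requested. */
static void
array_start(ArrayWriter *w, bool force_array)
{
    memset(w, 0, sizeof(*w));
    w->force_array = force_array;
    if (w->force_array)
        emit(w, "[\n");
}

/* One-row callback: a comma precedes every row except the first. */
static void
array_row(ArrayWriter *w, const char *row_json)
{
    if (w->force_array)
    {
        if (w->row_delim_needed)
            emit(w, ",");
        else
        {
            emit(w, " ");       /* first row needs no delimiter */
            w->row_delim_needed = true;
        }
    }
    emit(w, row_json);
    emit(w, "\n");
}

/* End callback: close the array so the output is one JSON value. */
static void
array_end(ArrayWriter *w)
{
    if (w->force_array)
        emit(w, "]\n");
}
```

Without force_array the same calls produce one JSON object per line, which matches the default stream-of-objects behavior described in the 0003 commit message.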
Then I reworked the way this works. In order to support column lists
with JSON output, we need to deal with individual columns instead of
whole records. This involved quite a number of changes, as can be seen
in patch 4. This involved exporting a new small function from json.c.
The result is a lot cleaner, I believe, and in my benchmarking is faster
by a factor of almost 2.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
^ permalink raw reply [nested|flat] 28+ messages in thread
* Re: Emitting JSON to file using COPY TO
@ 2026-03-05 20:49 Andrew Dunstan <[email protected]>
parent: Andrew Dunstan <[email protected]>
0 siblings, 1 reply; 28+ messages in thread
From: Andrew Dunstan @ 2026-03-05 20:49 UTC (permalink / raw)
To: Joe Conway <[email protected]>; jian he <[email protected]>; Junwang Zhao <[email protected]>; +Cc: Florents Tselai <[email protected]>; Andrey M. Borodin <[email protected]>; Dean Rasheed <[email protected]>; Daniel Verite <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 2026-03-05 Th 1:06 PM, Joe Conway wrote:
>>
>>
>> Then I reworked the way this works. In order to support column lists
>> with JSON output, we need to deal with individual columns instead of
>> whole records. This involved quite a number of changes, as can be
>> seen in patch 4. This involved exporting a new small function from
>> json.c.
>>
>> The result is a lot cleaner, I believe, and in my benchmarking is
>> faster by a factor of almost 2.
>
> Andrew,
>
> I don't see the actual patches. Did I miss it somewhere?
Nope. Bad hair day apparently.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Attachments:
[text/x-patch] v25-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch (13.1K, 2-v25-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch)
download | inline diff:
From b50c26e9e2a76001315fb3e5000f2d33e254c741 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Wed, 21 Jan 2026 18:38:24 +0800
Subject: [PATCH v25 1/4] introduce CopyFormat refactor CopyFormatOptions
Currently, the COPY command format is determined by two boolean fields (binary,
csv_mode) in CopyFormatOptions. This approach, while functional, isn't ideal
for implementing other formats in the future.
To simplify adding new formats, we've introduced an enum CopyFormat. This makes
the code cleaner and more maintainable, allowing for easier integration of
additional formats down the line.
The CopyFormat enum was originally contributed by Joel Jacobson <[email protected]>,
later refactored by Jian He to address various issues.
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
src/backend/commands/copy.c | 50 +++++++++++++++-------------
src/backend/commands/copyfrom.c | 6 ++--
src/backend/commands/copyfromparse.c | 7 ++--
src/backend/commands/copyto.c | 8 ++---
src/include/commands/copy.h | 13 ++++++--
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 49 insertions(+), 36 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 63b86802ba2..2f46be516f2 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -576,6 +576,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out = palloc0_object(CopyFormatOptions);
opts_out->file_encoding = -1;
+ /* default format */
+ opts_out->format = COPY_FORMAT_TEXT;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -590,11 +592,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -754,31 +756,31 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = (opts_out->format == COPY_FORMAT_CSV) ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = (opts_out->format == COPY_FORMAT_CSV) ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -826,7 +828,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -834,43 +836,43 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line != COPY_HEADER_FALSE)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -884,8 +886,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_notnull != NIL ||
+ opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -900,8 +902,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+ opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -925,7 +927,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -961,7 +963,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -978,7 +980,7 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2f42f55e229..4d927410159 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -156,9 +156,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
static const CopyFromRoutine *
CopyFromGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyFromRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyFromRoutineBinary;
/* default is text */
@@ -262,7 +262,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index fbd13353efc..c366874bd95 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -172,7 +172,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -750,7 +750,7 @@ bool
NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
return NextCopyFromRawFieldsInternal(cstate, fields, nfields,
- cstate->opts.csv_mode);
+ cstate->opts.format == COPY_FORMAT_CSV);
}
/*
@@ -777,7 +777,8 @@ NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields
bool done = false;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_FALSE)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 9ceeff6d99e..0325a16f82a 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -181,9 +181,9 @@ static const CopyToRoutine CopyToRoutineBinary = {
static const CopyToRoutine *
CopyToGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyToRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyToRoutineBinary;
/* default is text */
@@ -220,7 +220,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -397,7 +397,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 877202af67b..2430fb0b2e5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -49,6 +49,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT = 0,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -59,9 +69,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
int header_line; /* number of lines to skip or COPY_HEADER_XXX
* value (see the above) */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 77e3c04144e..8399be97fd5 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -528,6 +528,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromRoutine
CopyFromState
--
2.43.0
[text/x-patch] v25-0002-json-format-for-COPY-TO.patch (22.2K, 3-v25-0002-json-format-for-COPY-TO.patch)
From 285be0d7935b5505b6cc1509742f3fc87e19ea09 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Mon, 9 Feb 2026 11:00:06 +0800
Subject: [PATCH v25 2/4] json format for COPY TO
---
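Note for readers of the SendCopyBegin hunk in this patch: in JSON mode the CopyOutResponse advertises a single text-format column rather than natts columns. The following is a minimal standalone sketch of that message layout (type byte 'H', Int32 length that counts itself, Int8 overall format, Int16 column count, one Int16 format code per column). The helper names put_int16 and build_copy_out_response are hypothetical and are not part of the patch or of libpq; the backend itself uses pq_beginmessage/pq_sendint16 as shown in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store a 16-bit integer in network byte order. */
static size_t
put_int16(uint8_t *out, size_t pos, uint16_t v)
{
    out[pos] = (uint8_t) (v >> 8);
    out[pos + 1] = (uint8_t) (v & 0xFF);
    return pos + 2;
}

/*
 * Hypothetical helper: write a CopyOutResponse into out[].
 * format is 0 (text) or 1 (binary) and is reused as the per-column code,
 * mirroring what SendCopyBegin does.  Returns the total number of bytes
 * written, including the leading message-type byte.
 */
static size_t
build_copy_out_response(uint8_t *out, size_t cap,
                        uint8_t format, uint16_t ncols)
{
    /* length field = itself (4) + format (1) + count (2) + 2 per column */
    uint32_t    len = 4 + 1 + 2 + 2 * (uint32_t) ncols;
    size_t      pos = 0;

    assert(cap >= (size_t) len + 1);

    out[pos++] = 'H';           /* CopyOutResponse */
    out[pos++] = (uint8_t) (len >> 24);
    out[pos++] = (uint8_t) (len >> 16);
    out[pos++] = (uint8_t) (len >> 8);
    out[pos++] = (uint8_t) len;
    out[pos++] = format;        /* overall format */
    pos = put_int16(out, pos, ncols);
    for (uint16_t i = 0; i < ncols; i++)
        pos = put_int16(out, pos, format);  /* per-column formats */
    return pos;
}
```

With three text columns the length field comes out as 13, which agrees with the "Length: 13" in the tshark trace quoted earlier in this thread; with the JSON-mode arguments (format 0, one column) it is 9.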
doc/src/sgml/ref/copy.sgml | 13 ++--
src/backend/commands/copy.c | 49 ++++++++++----
src/backend/commands/copyto.c | 101 ++++++++++++++++++++++++++---
src/backend/parser/gram.y | 8 +++
src/backend/utils/adt/json.c | 5 +-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/include/utils/json.h | 2 +
src/test/regress/expected/copy.out | 78 ++++++++++++++++++++++
src/test/regress/sql/copy.sql | 48 ++++++++++++++
10 files changed, 278 insertions(+), 29 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 0ad890ef95f..75f55bbf6f8 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -228,10 +228,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Selects the data format to be read or written:
<literal>text</literal>,
<literal>csv</literal> (Comma Separated Values),
+ <literal>json</literal> (JavaScript Object Notation),
or <literal>binary</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
+ <para>
+ The <literal>json</literal> option is allowed only in
+ <command>COPY TO</command>.
+ </para>
</listitem>
</varlistentry>
@@ -266,7 +271,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
(line) of the file. The default is a tab character in text format,
a comma in <literal>CSV</literal> format.
This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -280,7 +285,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
<note>
@@ -303,7 +308,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ not using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -330,7 +335,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<command>COPY FROM</command> commands.
</para>
<para>
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2f46be516f2..29c121c7f08 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -597,6 +597,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "json") == 0)
+ opts_out->format = COPY_FORMAT_JSON;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -756,21 +758,32 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
+ if (opts_out->delim &&
+ (opts_out->format == COPY_FORMAT_BINARY ||
+ opts_out->format == COPY_FORMAT_JSON))
ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+ errcode(ERRCODE_SYNTAX_ERROR),
+ opts_out->format == COPY_FORMAT_BINARY
+ ? errmsg("cannot specify %s in BINARY mode", "DELIMITER")
+ : errmsg("cannot specify %s in JSON mode", "DELIMITER"));
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
+ if (opts_out->null_print &&
+ (opts_out->format == COPY_FORMAT_BINARY ||
+ opts_out->format == COPY_FORMAT_JSON))
ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
+ errcode(ERRCODE_SYNTAX_ERROR),
+ opts_out->format == COPY_FORMAT_BINARY
+ ? errmsg("cannot specify %s in BINARY mode", "NULL")
+ : errmsg("cannot specify %s in JSON mode", "NULL"));
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
+ if (opts_out->default_print &&
+ (opts_out->format == COPY_FORMAT_BINARY ||
+ opts_out->format == COPY_FORMAT_JSON))
ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ errcode(ERRCODE_SYNTAX_ERROR),
+ opts_out->format == COPY_FORMAT_BINARY
+ ? errmsg("cannot specify %s in BINARY mode", "DEFAULT")
+ : errmsg("cannot specify %s in JSON mode", "DEFAULT"));
/* Set defaults for omitted options */
if (!opts_out->delim)
@@ -836,11 +849,15 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
+ if (opts_out->header_line != COPY_HEADER_FALSE &&
+ (opts_out->format == COPY_FORMAT_BINARY ||
+ opts_out->format == COPY_FORMAT_JSON))
ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ opts_out->format == COPY_FORMAT_BINARY
+ ? errmsg("cannot specify %s in BINARY mode", "HEADER")
+ : errmsg("cannot specify %s in JSON mode", "HEADER"));
/* Check quote */
if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
@@ -944,6 +961,12 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY %s cannot be used with %s", "FREEZE",
"COPY TO")));
+ /* Check json format */
+ if (opts_out->format == COPY_FORMAT_JSON && is_from)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY %s mode cannot be used with %s", "JSON", "COPY FROM"));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 0325a16f82a..96605079eeb 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -26,6 +26,7 @@
#include "executor/execdesc.h"
#include "executor/executor.h"
#include "executor/tuptable.h"
+#include "funcapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
@@ -33,6 +34,7 @@
#include "pgstat.h"
#include "storage/fd.h"
#include "tcop/tcopprot.h"
+#include "utils/json.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
@@ -84,6 +86,11 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+
+ /* JSON format state */
+ bool json_tupledesc_ready; /* TupleDesc setup done for JSON */
+ StringInfoData json_buf; /* reusable buffer for JSON output */
+
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -130,6 +137,7 @@ static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -149,7 +157,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
/*
* COPY TO routines for built-in formats.
*
- * CSV and text formats share the same TextLike routines except for the
+ * CSV, text and json formats share the same TextLike routines except for the
* one-row callback.
*/
@@ -169,6 +177,14 @@ static const CopyToRoutine CopyToRoutineCSV = {
.CopyToEnd = CopyToTextLikeEnd,
};
+/* json format */
+static const CopyToRoutine CopyToRoutineJson = {
+ .CopyToStart = CopyToTextLikeStart,
+ .CopyToOutFunc = CopyToTextLikeOutFunc,
+ .CopyToOneRow = CopyToJsonOneRow,
+ .CopyToEnd = CopyToTextLikeEnd,
+};
+
/* binary format */
static const CopyToRoutine CopyToRoutineBinary = {
.CopyToStart = CopyToBinaryStart,
@@ -185,12 +201,14 @@ CopyToGetRoutine(const CopyFormatOptions *opts)
return &CopyToRoutineCSV;
else if (opts->format == COPY_FORMAT_BINARY)
return &CopyToRoutineBinary;
+ else if (opts->format == COPY_FORMAT_JSON)
+ return &CopyToRoutineJson;
/* default is text */
return &CopyToRoutineText;
}
-/* Implementation of the start callback for text and CSV formats */
+/* Implementation of the start callback for text, CSV, and json formats */
static void
CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
{
@@ -209,6 +227,8 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
ListCell *cur;
bool hdr_delim = false;
+ Assert(cstate->opts.format != COPY_FORMAT_JSON);
+
foreach(cur, cstate->attnumlist)
{
int attnum = lfirst_int(cur);
@@ -228,10 +248,21 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
CopySendTextLikeEndOfRow(cstate);
}
+
+ /* JSON-specific initialization */
+ if (cstate->opts.format == COPY_FORMAT_JSON)
+ {
+ MemoryContext oldcxt;
+
+ /* Allocate reusable JSON output buffer in long-lived context */
+ oldcxt = MemoryContextSwitchTo(cstate->copycontext);
+ initStringInfo(&cstate->json_buf);
+ MemoryContextSwitchTo(oldcxt);
+ }
}
/*
- * Implementation of the outfunc callback for text and CSV formats. Assign
+ * Implementation of the outfunc callback for text, CSV, and json formats. Assign
* the output function data to the given *finfo.
*/
static void
@@ -304,13 +335,47 @@ CopyToTextLikeOneRow(CopyToState cstate,
CopySendTextLikeEndOfRow(cstate);
}
-/* Implementation of the end callback for text and CSV formats */
+/* Implementation of the end callback for text, CSV, and json formats */
static void
CopyToTextLikeEnd(CopyToState cstate)
{
/* Nothing to do here */
}
+/* Implementation of per-row callback for json format */
+static void
+CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+ Datum rowdata;
+
+ /*
+ * For query-based COPY, copy the query's TupleDesc attributes into the
+ * slot's TupleDesc once. BlessTupleDesc registers the RECORDOID
+ * descriptor so that lookup_rowtype_tupdesc inside composite_to_json can
+ * find it.
+ */
+ if (!cstate->rel && !cstate->json_tupledesc_ready)
+ {
+ memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
+ TupleDescAttr(cstate->queryDesc->tupDesc, 0),
+ cstate->queryDesc->tupDesc->natts * sizeof(FormData_pg_attribute));
+
+ for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
+ populate_compact_attribute(slot->tts_tupleDescriptor, i);
+
+ BlessTupleDesc(slot->tts_tupleDescriptor);
+ cstate->json_tupledesc_ready = true;
+ }
+
+ rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ resetStringInfo(&cstate->json_buf);
+ composite_to_json(rowdata, &cstate->json_buf, false);
+
+ CopySendData(cstate, cstate->json_buf.data, cstate->json_buf.len);
+
+ CopySendTextLikeEndOfRow(cstate);
+}
+
/*
* Implementation of the start callback for binary format. Send a header
* for a binary copy.
@@ -402,9 +467,23 @@ SendCopyBegin(CopyToState cstate)
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
pq_sendbyte(&buf, format); /* overall format */
- pq_sendint16(&buf, natts);
- for (i = 0; i < natts; i++)
- pq_sendint16(&buf, format); /* per-column formats */
+ if (cstate->opts.format != COPY_FORMAT_JSON)
+ {
+ pq_sendint16(&buf, natts);
+ for (i = 0; i < natts; i++)
+ pq_sendint16(&buf, format); /* per-column formats */
+ }
+ else
+ {
+ /*
+ * For JSON format, report one text-format column. Each CopyData
+ * message contains one complete JSON object, not individual column
+ * values, so the per-column count is always 1.
+ */
+ pq_sendint16(&buf, 1);
+ pq_sendint16(&buf, 0);
+ }
+
pq_endmessage(&buf);
cstate->copy_dest = COPY_FRONTEND;
}
@@ -506,7 +585,7 @@ CopySendEndOfRow(CopyToState cstate)
}
/*
- * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
+ * Wrapper function of CopySendEndOfRow for text, CSV, and json formats. Sends the
* line termination and do common appropriate things for the end of row.
*/
static inline void
@@ -890,6 +969,12 @@ BeginCopyTo(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* JSON outputs whole rows; a column list doesn't make sense */
+ if (cstate->opts.format == COPY_FORMAT_JSON && attnamelist != NIL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("column selection is not supported in JSON mode")));
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_QUOTE name list to per-column flags, check validity */
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index c567252acc4..db98f2d91bf 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3609,6 +3609,10 @@ copy_opt_item:
{
$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
}
+ | JSON
+ {
+ $$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ }
| HEADER_P
{
$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3691,6 +3695,10 @@ copy_generic_opt_elem:
{
$$ = makeDefElem($1, $2, @1);
}
+ | FORMAT_LA copy_generic_opt_arg
+ {
+ $$ = makeDefElem("format", $2, @1);
+ }
;
copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 0b161398465..f609d7b9417 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -86,8 +86,6 @@ typedef struct JsonAggState
JsonUniqueBuilderState unique_check;
} JsonAggState;
-static void composite_to_json(Datum composite, StringInfo result,
- bool use_line_feeds);
static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
const Datum *vals, const bool *nulls, int *valcount,
JsonTypeCategory tcategory, Oid outfuncoid,
@@ -517,8 +515,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
/*
* Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
*/
-static void
+void
composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
{
HeapTupleHeader td;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 905c076763c..d257837b0c5 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3425,7 +3425,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (TailMatches("FORMAT"))
- COMPLETE_WITH("binary", "csv", "text");
+ COMPLETE_WITH("binary", "csv", "text", "json");
/* Complete COPY <sth> FROM|TO filename WITH (FREEZE */
else if (TailMatches("FREEZE"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 2430fb0b2e5..2b5bef6738e 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -57,6 +57,7 @@ typedef enum CopyFormat
COPY_FORMAT_TEXT = 0,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
+ COPY_FORMAT_JSON,
} CopyFormat;
/*
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index f8cc52b1e78..2f4be40518d 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
#include "lib/stringinfo.h"
/* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+ bool use_line_feeds);
extern void escape_json(StringInfo buf, const char *str);
extern void escape_json_with_len(StringInfo buf, const char *str, int len);
extern void escape_json_text(StringInfo buf, const text *txt);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index d0d563e0fa8..4ea658a45de 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -73,6 +73,84 @@ copy copytest3 to stdout csv header;
c1,"col with , comma","col with "" quote"
1,a,1
2,b,2
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+ERROR: cannot specify DELIMITER in JSON mode
+copy copytest to stdout (format json, null '\N');
+ERROR: cannot specify NULL in JSON mode
+copy copytest to stdout (format json, default '|');
+ERROR: cannot specify DEFAULT in JSON mode
+copy copytest to stdout (format json, header);
+ERROR: cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, header 1);
+ERROR: cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, quote '"');
+ERROR: COPY QUOTE requires CSV mode
+copy copytest to stdout (format json, escape '"');
+ERROR: COPY ESCAPE requires CSV mode
+copy copytest to stdout (format json, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
+copy copytest to stdout (format json, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+copy copytest to stdout (format json, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
+copy copytest to stdout (format json, on_error ignore);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: copy copytest to stdout (format json, on_error ignore);
+ ^
+copy copytest from stdin(format json);
+ERROR: COPY JSON mode cannot be used with COPY FROM
+copy copytest (style) to stdout (format json);
+ERROR: column selection is not supported in JSON mode
+-- all of the above should yield error
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
create temp table copytest4 (
c1 int,
"colname with tab: " text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 65cbdaf7f3e..c558eba202a 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -82,6 +82,54 @@ this is just a line full of junk that would error out if parsed
copy copytest3 to stdout csv header;
+--- test copying in JSON mode with various styles
+copy copytest to stdout json;
+copy copytest to stdout (format json);
+
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+copy copytest to stdout (format json, null '\N');
+copy copytest to stdout (format json, default '|');
+copy copytest to stdout (format json, header);
+copy copytest to stdout (format json, header 1);
+copy copytest to stdout (format json, quote '"');
+copy copytest to stdout (format json, escape '"');
+copy copytest to stdout (format json, force_quote *);
+copy copytest to stdout (format json, force_not_null *);
+copy copytest to stdout (format json, force_null *);
+copy copytest to stdout (format json, on_error ignore);
+copy copytest from stdin(format json);
+copy copytest (style) to stdout (format json);
+-- all of the above should yield error
+
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
create temp table copytest4 (
c1 int,
"colname with tab: " text);
--
2.43.0
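For anyone skimming the option checks in v25-0002, the rules exercised by the regression tests can be summarized in a small standalone sketch. It models only the format-dependent checks (DELIMITER, NULL, DEFAULT and HEADER rejected in BINARY and JSON mode; the CSV-only options rejected everywhere else) and ignores COPY direction, ON_ERROR, FREEZE and column lists. The enum Fmt and the function check_copy_option are hypothetical names for illustration; the real logic lives in ProcessCopyOptions and reports through ereport.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the patch's CopyFormat enum. */
typedef enum { FMT_TEXT, FMT_BINARY, FMT_CSV, FMT_JSON } Fmt;

/*
 * Hypothetical helper: return 0 if option "opt" is acceptable for "fmt",
 * or -1 with an error string in msg[] that follows the wording of the
 * expected regression output.
 */
static int
check_copy_option(Fmt fmt, const char *opt, char *msg, size_t msglen)
{
    static const char *const no_binary_json[] =
        {"DELIMITER", "NULL", "DEFAULT", "HEADER"};
    static const char *const csv_only[] =
        {"QUOTE", "ESCAPE", "FORCE_QUOTE", "FORCE_NOT_NULL", "FORCE_NULL"};
    size_t      i;

    assert(opt != NULL && msg != NULL && msglen > 0);
    msg[0] = '\0';

    /* options that make no sense for BINARY or JSON output */
    if (fmt == FMT_BINARY || fmt == FMT_JSON)
    {
        for (i = 0; i < sizeof no_binary_json / sizeof no_binary_json[0]; i++)
        {
            if (strcmp(opt, no_binary_json[i]) == 0)
            {
                snprintf(msg, msglen, "cannot specify %s in %s mode",
                         opt, fmt == FMT_BINARY ? "BINARY" : "JSON");
                return -1;
            }
        }
    }

    /* options that are meaningful only for CSV */
    if (fmt != FMT_CSV)
    {
        for (i = 0; i < sizeof csv_only / sizeof csv_only[0]; i++)
        {
            if (strcmp(opt, csv_only[i]) == 0)
            {
                snprintf(msg, msglen, "COPY %s requires CSV mode", opt);
                return -1;
            }
        }
    }
    return 0;
}
```

For example, check_copy_option(FMT_JSON, "DELIMITER", ...) yields "cannot specify DELIMITER in JSON mode", matching the expected output in copy.out.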
[text/x-patch] v25-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch (10.7K, 4-v25-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch)
From 312dedc0a3aea1833bb992f096121db4086fd075 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Mon, 9 Feb 2026 11:14:15 +0800
Subject: [PATCH v25 3/4] Add option force_array for COPY JSON FORMAT
---
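To make the FORCE_ARRAY framing in this patch easier to follow, here is a minimal standalone sketch of the start / one-row / end sequence: an opening bracket line, a leading space before the first row, a leading comma before every later row, and a closing bracket line. JsonSink and the json_* functions are hypothetical names; the patch does the same thing with CopySendChar and CopySendTextLikeEndOfRow. The sketch assumes a plain "\n" line terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical output sink standing in for CopyToState. */
typedef struct
{
    char       *buf;
    size_t      cap;
    size_t      len;
    int         force_array;        /* FORCE_ARRAY option */
    int         row_delim_needed;   /* set once the first row is out */
} JsonSink;

static void
sink_put(JsonSink *s, const char *str)
{
    size_t      n = strlen(str);

    assert(s->len + n < s->cap);
    memcpy(s->buf + s->len, str, n + 1);    /* copies the NUL too */
    s->len += n;
}

static void
json_start(JsonSink *s)
{
    if (s->force_array)
        sink_put(s, "[\n");
}

static void
json_one_row(JsonSink *s, const char *row_json)
{
    if (s->force_array)
    {
        if (s->row_delim_needed)
            sink_put(s, ",");
        else
        {
            /* first row: pad with a space instead of a comma */
            sink_put(s, " ");
            s->row_delim_needed = 1;
        }
    }
    sink_put(s, row_json);
    sink_put(s, "\n");
}

static void
json_end(JsonSink *s)
{
    if (s->force_array)
        sink_put(s, "]\n");
}
```

Because the bracket lines are emitted by the start and end callbacks regardless of row count, an empty result under this scheme is still a well-formed (empty) JSON array.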
doc/src/sgml/ref/copy.sgml | 30 +++++++++++++++++++++
src/backend/commands/copy.c | 13 +++++++++
src/backend/commands/copyto.c | 43 +++++++++++++++++++++++++++---
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy.out | 23 ++++++++++++++++
src/test/regress/sql/copy.sql | 8 ++++++
7 files changed, 115 insertions(+), 5 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 75f55bbf6f8..a79587f7613 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -40,6 +40,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
HEADER [ <replaceable class="parameter">boolean</replaceable> | <replaceable class="parameter">integer</replaceable> | MATCH ]
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
+ FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -366,6 +367,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry id="sql-copy-params-force-array">
+ <term><literal>FORCE_ARRAY</literal></term>
+ <listitem>
+ <para>
+ Force output of square brackets as array decorations at the beginning
+ and end of output, and commas between the rows. It is allowed only in
+ <command>COPY TO</command>, and only when using
+ <literal>json</literal> format. The default is
+ <literal>false</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="sql-copy-params-force-quote">
<term><literal>FORCE_QUOTE</literal></term>
<listitem>
@@ -1103,6 +1117,22 @@ COPY country TO STDOUT (DELIMITER '|');
</programlisting>
</para>
+<para>
+ When the <literal>FORCE_ARRAY</literal> option is enabled,
+ the entire output is wrapped in a single JSON array with rows separated by commas:
+<programlisting>
+COPY (SELECT * FROM (VALUES(1),(2)) val(id)) TO STDOUT (FORMAT JSON, FORCE_ARRAY);
+</programlisting>
+The output is as follows:
+<screen>
+[
+ {"id":1}
+,{"id":2}
+]
+</screen>
+</para>
+
+
<para>
To copy data from a file into the <literal>country</literal> table:
<programlisting>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 29c121c7f08..84254d46a67 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -569,6 +569,7 @@ ProcessCopyOptions(ParseState *pstate,
bool on_error_specified = false;
bool log_verbosity_specified = false;
bool reject_limit_specified = false;
+ bool force_array_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -725,6 +726,13 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "force_array") == 0)
+ {
+ if (force_array_specified)
+ errorConflictingDefElem(defel, pstate);
+ force_array_specified = true;
+ opts_out->force_array = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "on_error") == 0)
{
if (on_error_specified)
@@ -967,6 +975,11 @@ ProcessCopyOptions(ParseState *pstate,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY %s mode cannot be used with %s", "JSON", "COPY FROM"));
+ if (opts_out->format != COPY_FORMAT_JSON && opts_out->force_array)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY %s can only be used with JSON mode", "FORCE_ARRAY"));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 96605079eeb..a7615cc34ec 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -88,6 +88,7 @@ typedef struct CopyToStateData
bool is_program; /* is 'filename' a program to popen? */
/* JSON format state */
+ bool json_row_delim_needed; /* need delimiter before next row */
bool json_tupledesc_ready; /* TupleDesc setup done for JSON */
StringInfoData json_buf; /* reusable buffer for JSON output */
@@ -138,6 +139,7 @@ static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToJsonEnd(CopyToState cstate);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -157,8 +159,9 @@ static void CopySendInt16(CopyToState cstate, int16 val);
/*
* COPY TO routines for built-in formats.
*
- * CSV, text and json formats share the same TextLike routines except for the
- * one-row callback.
+ * Text and CSV formats share the same TextLike routines except for the
+ * one-row callback. JSON shares the start and outfunc callbacks with
+ * text/CSV, but has its own one-row and end callbacks.
*/
/* text format */
@@ -182,7 +185,7 @@ static const CopyToRoutine CopyToRoutineJson = {
.CopyToStart = CopyToTextLikeStart,
.CopyToOutFunc = CopyToTextLikeOutFunc,
.CopyToOneRow = CopyToJsonOneRow,
- .CopyToEnd = CopyToTextLikeEnd,
+ .CopyToEnd = CopyToJsonEnd,
};
/* binary format */
@@ -258,6 +261,15 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
oldcxt = MemoryContextSwitchTo(cstate->copycontext);
initStringInfo(&cstate->json_buf);
MemoryContextSwitchTo(oldcxt);
+
+ /*
+ * If FORCE_ARRAY has been specified, send the opening bracket.
+ */
+ if (cstate->opts.force_array)
+ {
+ CopySendChar(cstate, '[');
+ CopySendTextLikeEndOfRow(cstate);
+ }
}
}
@@ -335,7 +347,7 @@ CopyToTextLikeOneRow(CopyToState cstate,
CopySendTextLikeEndOfRow(cstate);
}
-/* Implementation of the end callback for text, CSV, and json formats */
+/* Implementation of the end callback for text and CSV formats */
static void
CopyToTextLikeEnd(CopyToState cstate)
{
@@ -371,11 +383,34 @@ CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
resetStringInfo(&cstate->json_buf);
composite_to_json(rowdata, &cstate->json_buf, false);
+ if (cstate->opts.force_array)
+ {
+ if (cstate->json_row_delim_needed)
+ CopySendChar(cstate, ',');
+ else
+ {
+ /* first row needs no delimiter */
+ CopySendChar(cstate, ' ');
+ cstate->json_row_delim_needed = true;
+ }
+ }
+
CopySendData(cstate, cstate->json_buf.data, cstate->json_buf.len);
CopySendTextLikeEndOfRow(cstate);
}
+/* Implementation of the end callback for json format */
+static void
+CopyToJsonEnd(CopyToState cstate)
+{
+ if (cstate->opts.force_array)
+ {
+ CopySendChar(cstate, ']');
+ CopySendTextLikeEndOfRow(cstate);
+ }
+}
+
/*
* Implementation of the start callback for binary format. Send a header
* for a binary copy.
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index d257837b0c5..76d3258e92b 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1232,7 +1232,7 @@ Copy_common_options, "DEFAULT", "FORCE_NOT_NULL", "FORCE_NULL", "FREEZE", \
/* COPY TO options */
#define Copy_to_options \
-Copy_common_options, "FORCE_QUOTE"
+Copy_common_options, "FORCE_QUOTE", "FORCE_ARRAY"
/*
* These object types were introduced later than our support cutoff of
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 2b5bef6738e..abecfe51098 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -88,6 +88,7 @@ typedef struct CopyFormatOptions
List *force_notnull; /* list of column names */
bool force_notnull_all; /* FORCE_NOT_NULL *? */
bool *force_notnull_flags; /* per-column CSV FNN flags */
+ bool force_array; /* add JSON array decorations */
List *force_null; /* list of column names */
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 4ea658a45de..a7e88b711d7 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -114,6 +114,29 @@ ERROR: COPY JSON mode cannot be used with COPY FROM
copy copytest (style) to stdout (format json);
ERROR: column selection is not supported in JSON mode
-- all of the above should yield error
+-- should fail: force_array requires json format
+copy copytest to stdout (format csv, force_array true);
+ERROR: COPY FORCE_ARRAY can only be used with JSON mode
+-- force_array variants
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index c558eba202a..ae202fc5e8d 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -102,6 +102,14 @@ copy copytest from stdin(format json);
copy copytest (style) to stdout (format json);
-- all of the above should yield error
+-- should fail: force_array requires json format
+copy copytest to stdout (format csv, force_array true);
+
+-- force_array variants
+copy copytest to stdout (format json, force_array);
+copy copytest to stdout (format json, force_array true);
+copy copytest to stdout (format json, force_array false);
+
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
--
2.43.0
[text/x-patch] v25-0004-COPY-TO-JSON-build-JSON-per-column-support-colum.patch (13.3K, 5-v25-0004-COPY-TO-JSON-build-JSON-per-column-support-colum.patch)
From d043adcbb2032eb2c1218df99992008838f2e200 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Wed, 4 Mar 2026 09:40:10 -0500
Subject: [PATCH v25 4/4] COPY TO JSON: build JSON per-column, support column
lists
Rework CopyToJsonOneRow to iterate attnumlist and build JSON objects
column-by-column using datum_to_json_append, instead of converting the
whole row via ExecFetchSlotHeapTupleDatum + composite_to_json.
This has several benefits:
- Column lists now work with JSON format (previously rejected)
- Per-column JSON type categorization is done once at startup rather
  than on every row (composite_to_json called json_categorize_type
  per column per row)
- The TupleDesc memcpy/BlessTupleDesc hack for query-based COPY is
  eliminated entirely
- Pre-escaped column key strings avoid repeated escape_json calls
Add CopyToJsonStart (pre-computes escaped key strings and json_buf),
CopyToJsonOutFunc (calls json_categorize_type once per column), and
export datum_to_json_append from json.c for efficient append-to-
StringInfo JSON serialization.
---
src/backend/commands/copyto.c | 159 ++++++++++++++++++-----------
src/backend/utils/adt/json.c | 14 +++
src/include/utils/jsonfuncs.h | 2 +
src/test/regress/expected/copy.out | 8 +-
src/test/regress/sql/copy.sql | 4 +-
5 files changed, 126 insertions(+), 61 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index a7615cc34ec..9502c910b43 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -26,7 +26,6 @@
#include "executor/execdesc.h"
#include "executor/executor.h"
#include "executor/tuptable.h"
-#include "funcapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
@@ -35,6 +34,7 @@
#include "storage/fd.h"
#include "tcop/tcopprot.h"
#include "utils/json.h"
+#include "utils/jsonfuncs.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
@@ -89,8 +89,10 @@ typedef struct CopyToStateData
/* JSON format state */
bool json_row_delim_needed; /* need delimiter before next row */
- bool json_tupledesc_ready; /* TupleDesc setup done for JSON */
StringInfoData json_buf; /* reusable buffer for JSON output */
+ JsonTypeCategory *json_categories; /* per-column JSON type categories */
+ Oid *json_outfuncoids; /* per-column JSON output func OIDs */
+ char **json_col_keys; /* per-column pre-escaped "key": strings */
copy_data_dest_cb data_dest_cb; /* function for writing data */
@@ -138,6 +140,8 @@ static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToJsonStart(CopyToState cstate, TupleDesc tupDesc);
+static void CopyToJsonOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToJsonEnd(CopyToState cstate);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
@@ -160,8 +164,8 @@ static void CopySendInt16(CopyToState cstate, int16 val);
* COPY TO routines for built-in formats.
*
* Text and CSV formats share the same TextLike routines except for the
- * one-row callback. JSON shares the start and outfunc callbacks with
- * text/CSV, but has its own one-row and end callbacks.
+ * one-row callback. JSON has its own start, outfunc, one-row, and end
+ * callbacks.
*/
/* text format */
@@ -182,8 +186,8 @@ static const CopyToRoutine CopyToRoutineCSV = {
/* json format */
static const CopyToRoutine CopyToRoutineJson = {
- .CopyToStart = CopyToTextLikeStart,
- .CopyToOutFunc = CopyToTextLikeOutFunc,
+ .CopyToStart = CopyToJsonStart,
+ .CopyToOutFunc = CopyToJsonOutFunc,
.CopyToOneRow = CopyToJsonOneRow,
.CopyToEnd = CopyToJsonEnd,
};
@@ -211,7 +215,7 @@ CopyToGetRoutine(const CopyFormatOptions *opts)
return &CopyToRoutineText;
}
-/* Implementation of the start callback for text, CSV, and json formats */
+/* Implementation of the start callback for text and CSV formats */
static void
CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
{
@@ -230,8 +234,6 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
ListCell *cur;
bool hdr_delim = false;
- Assert(cstate->opts.format != COPY_FORMAT_JSON);
-
foreach(cur, cstate->attnumlist)
{
int attnum = lfirst_int(cur);
@@ -251,30 +253,10 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
CopySendTextLikeEndOfRow(cstate);
}
-
- /* JSON-specific initialization */
- if (cstate->opts.format == COPY_FORMAT_JSON)
- {
- MemoryContext oldcxt;
-
- /* Allocate reusable JSON output buffer in long-lived context */
- oldcxt = MemoryContextSwitchTo(cstate->copycontext);
- initStringInfo(&cstate->json_buf);
- MemoryContextSwitchTo(oldcxt);
-
- /*
- * If FORCE_ARRAY has been specified, send the opening bracket.
- */
- if (cstate->opts.force_array)
- {
- CopySendChar(cstate, '[');
- CopySendTextLikeEndOfRow(cstate);
- }
- }
}
/*
- * Implementation of the outfunc callback for text, CSV, and json formats. Assign
+ * Implementation of the outfunc callback for text and CSV formats. Assign
* the output function data to the given *finfo.
*/
static void
@@ -354,34 +336,95 @@ CopyToTextLikeEnd(CopyToState cstate)
/* Nothing to do here */
}
+/*
+ * Implementation of the start callback for json format.
+ *
+ * Pre-compute the escaped JSON key strings ('"colname":') for each selected
+ * column so CopyToJsonOneRow only needs to copy them per row.
+ */
+static void
+CopyToJsonStart(CopyToState cstate, TupleDesc tupDesc)
+{
+ MemoryContext oldcxt;
+ StringInfoData keybuf;
+
+ oldcxt = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Allocate reusable JSON output buffer */
+ initStringInfo(&cstate->json_buf);
+
+ /* Pre-build escaped key strings: "\"colname\":" */
+ cstate->json_col_keys = palloc0(tupDesc->natts * sizeof(char *));
+ initStringInfo(&keybuf);
+ foreach_int(attnum, cstate->attnumlist)
+ {
+ char *colname;
+
+ colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+ resetStringInfo(&keybuf);
+ escape_json(&keybuf, colname);
+ appendStringInfoChar(&keybuf, ':');
+
+ cstate->json_col_keys[attnum - 1] = pstrdup(keybuf.data);
+ }
+ pfree(keybuf.data);
+
+ MemoryContextSwitchTo(oldcxt);
+
+ /* If FORCE_ARRAY, send the opening bracket */
+ if (cstate->opts.force_array)
+ {
+ CopySendChar(cstate, '[');
+ CopySendTextLikeEndOfRow(cstate);
+ }
+}
+
+/*
+ * Implementation of the outfunc callback for json format.
+ *
+ * Instead of text output functions, we categorize each column's type for
+ * JSON serialization once so CopyToJsonOneRow can use datum_to_json_append
+ * directly.
+ */
+static void
+CopyToJsonOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+ int attidx = finfo - cstate->out_functions;
+
+ json_categorize_type(atttypid, false,
+ &cstate->json_categories[attidx],
+ &cstate->json_outfuncoids[attidx]);
+}
+
/* Implementation of per-row callback for json format */
static void
CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
{
- Datum rowdata;
+ bool needsep = false;
- /*
- * For query-based COPY, copy the query's TupleDesc attributes into the
- * slot's TupleDesc once. BlessTupleDesc registers the RECORDOID
- * descriptor so that lookup_rowtype_tupdesc inside composite_to_json can
- * find it.
- */
- if (!cstate->rel && !cstate->json_tupledesc_ready)
- {
- memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0),
- TupleDescAttr(cstate->queryDesc->tupDesc, 0),
- cstate->queryDesc->tupDesc->natts * sizeof(FormData_pg_attribute));
-
- for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++)
- populate_compact_attribute(slot->tts_tupleDescriptor, i);
-
- BlessTupleDesc(slot->tts_tupleDescriptor);
- cstate->json_tupledesc_ready = true;
- }
-
- rowdata = ExecFetchSlotHeapTupleDatum(slot);
resetStringInfo(&cstate->json_buf);
- composite_to_json(rowdata, &cstate->json_buf, false);
+ appendStringInfoChar(&cstate->json_buf, '{');
+
+ foreach_int(attnum, cstate->attnumlist)
+ {
+ Datum value = slot->tts_values[attnum - 1];
+ bool isnull = slot->tts_isnull[attnum - 1];
+
+ if (needsep)
+ appendStringInfoChar(&cstate->json_buf, ',');
+ needsep = true;
+
+ /* Append pre-escaped "key": */
+ appendStringInfoString(&cstate->json_buf,
+ cstate->json_col_keys[attnum - 1]);
+
+ datum_to_json_append(value, isnull, &cstate->json_buf,
+ cstate->json_categories[attnum - 1],
+ cstate->json_outfuncoids[attnum - 1]);
+ }
+
+ appendStringInfoChar(&cstate->json_buf, '}');
if (cstate->opts.force_array)
{
@@ -1004,12 +1047,6 @@ BeginCopyTo(ParseState *pstate,
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
- /* JSON outputs whole rows; a column list doesn't make sense */
- if (cstate->opts.format == COPY_FORMAT_JSON && attnamelist != NIL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("column selection is not supported in JSON mode")));
-
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_QUOTE name list to per-column flags, check validity */
@@ -1204,6 +1241,12 @@ DoCopyTo(CopyToState cstate)
/* Get info about the columns we need to process. */
cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+ if (cstate->opts.format == COPY_FORMAT_JSON)
+ {
+ /* JSON outfunc callback stores per-column type categorization here */
+ cstate->json_categories = palloc0(num_phys_attrs * sizeof(JsonTypeCategory));
+ cstate->json_outfuncoids = palloc0(num_phys_attrs * sizeof(Oid));
+ }
foreach(cur, cstate->attnumlist)
{
int attnum = lfirst_int(cur);
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index f609d7b9417..de81160a831 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -771,6 +771,20 @@ datum_to_json(Datum val, JsonTypeCategory tcategory, Oid outfuncoid)
return PointerGetDatum(cstring_to_text_with_len(result.data, result.len));
}
+/*
+ * Append JSON representation of a Datum to a StringInfo.
+ *
+ * tcategory and outfuncoid are from a previous call to json_categorize_type.
+ * If is_null is true, appends "null" regardless of tcategory/outfuncoid.
+ */
+void
+datum_to_json_append(Datum val, bool is_null, StringInfo result,
+ JsonTypeCategory tcategory, Oid outfuncoid)
+{
+ datum_to_json_internal(val, is_null, result, tcategory, outfuncoid,
+ false);
+}
+
/*
* json_agg transition function
*
diff --git a/src/include/utils/jsonfuncs.h b/src/include/utils/jsonfuncs.h
index 636f0f55840..12a01451fbb 100644
--- a/src/include/utils/jsonfuncs.h
+++ b/src/include/utils/jsonfuncs.h
@@ -85,6 +85,8 @@ extern void json_categorize_type(Oid typoid, bool is_jsonb,
JsonTypeCategory *tcategory, Oid *outfuncoid);
extern Datum datum_to_json(Datum val, JsonTypeCategory tcategory,
Oid outfuncoid);
+extern void datum_to_json_append(Datum val, bool is_null, StringInfo result,
+ JsonTypeCategory tcategory, Oid outfuncoid);
extern Datum datum_to_jsonb(Datum val, JsonTypeCategory tcategory,
Oid outfuncoid);
extern Datum jsonb_from_text(text *js, bool unique_keys);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index a7e88b711d7..d60e5a4d32a 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -111,8 +111,6 @@ LINE 1: copy copytest to stdout (format json, on_error ignore);
^
copy copytest from stdin(format json);
ERROR: COPY JSON mode cannot be used with COPY FROM
-copy copytest (style) to stdout (format json);
-ERROR: column selection is not supported in JSON mode
-- all of the above should yield error
-- should fail: force_array requires json format
copy copytest to stdout (format csv, force_array true);
@@ -137,6 +135,12 @@ copy copytest to stdout (format json, force_array false);
{"style":"Unix","test":"abc\ndef","filler":2}
{"style":"Mac","test":"abc\rdef","filler":3}
{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- column list with json format
+copy copytest (style, filler) to stdout (format json);
+{"style":"DOS","filler":1}
+{"style":"Unix","filler":2}
+{"style":"Mac","filler":3}
+{"style":"esc\\ape","filler":4}
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index ae202fc5e8d..d64f4c66b93 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -99,7 +99,6 @@ copy copytest to stdout (format json, force_not_null *);
copy copytest to stdout (format json, force_null *);
copy copytest to stdout (format json, on_error ignore);
copy copytest from stdin(format json);
-copy copytest (style) to stdout (format json);
-- all of the above should yield error
-- should fail: force_array requires json format
@@ -110,6 +109,9 @@ copy copytest to stdout (format json, force_array);
copy copytest to stdout (format json, force_array true);
copy copytest to stdout (format json, force_array false);
+-- column list with json format
+copy copytest (style, filler) to stdout (format json);
+
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
--
2.43.0
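The per-column approach described in the v25-0004 commit message above (pre-escaped keys, one pass over attnumlist, a comma before every column but the first) can be sketched outside the backend roughly as follows. This is only an illustration in plain C: demo_append and demo_json_row are hypothetical names, a fixed-size buffer stands in for StringInfo, and already-serialized value strings stand in for what datum_to_json_append would produce.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append src to buf (current length *len); return 0 if it would not fit. */
static int
demo_append(char *buf, size_t bufsz, size_t *len, const char *src)
{
    size_t n = strlen(src);

    if (*len + n + 1 > bufsz)
        return 0;
    memcpy(buf + *len, src, n + 1); /* copies the terminating NUL too */
    *len += n;
    return 1;
}

/*
 * Build one JSON object for the columns listed in attnums (1-based, like
 * attnumlist). col_keys[i] is the pre-escaped key of physical column i + 1,
 * including the trailing colon; json_values[i] is that column's value,
 * already serialized as JSON, or NULL for SQL NULL.
 * Returns the object length, or -1 if buf is too small.
 */
static int
demo_json_row(char *buf, size_t bufsz,
              const char *const *col_keys,
              const char *const *json_values,
              const int *attnums, int nattnums)
{
    size_t len = 0;
    int needsep = 0;

    assert(buf != NULL && bufsz > 0);
    buf[0] = '\0';

    if (!demo_append(buf, bufsz, &len, "{"))
        return -1;

    for (int i = 0; i < nattnums; i++)
    {
        int idx = attnums[i] - 1;
        const char *val;

        assert(idx >= 0);
        val = json_values[idx] ? json_values[idx] : "null";

        /* separator goes before every column except the first */
        if (needsep && !demo_append(buf, bufsz, &len, ","))
            return -1;
        needsep = 1;

        /* key was escaped once up front; only the value varies per row */
        if (!demo_append(buf, bufsz, &len, col_keys[idx]) ||
            !demo_append(buf, bufsz, &len, val))
            return -1;
    }

    if (!demo_append(buf, bufsz, &len, "}"))
        return -1;
    return (int) len;
}
```

With keys for (style, test, filler) and the list {1, 3}, the sketch emits only the style and filler members, which is the shape of the "column list with json format" expected output in the patch.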
* Re: Emitting JSON to file using COPY TO
@ 2026-03-06 09:38 jian he <[email protected]>
parent: Andrew Dunstan <[email protected]>
0 siblings, 1 reply; 28+ messages in thread
From: jian he @ 2026-03-06 09:38 UTC (permalink / raw)
To: Andrew Dunstan <[email protected]>; +Cc: Joe Conway <[email protected]>; Junwang Zhao <[email protected]>; Florents Tselai <[email protected]>; Andrey M. Borodin <[email protected]>; Dean Rasheed <[email protected]>; Daniel Verite <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
COPY (SELECT 1 UNION ALL SELECT 2) TO stdout WITH (format json);
still fails with v25-0002; the json_tupledesc_ready flag does not help.
I think I figured it out: we need to call BlessTupleDesc in BeginCopyTo,
and then let slot->tts_tupleDescriptor point to cstate->queryDesc->tupDesc
in CopyToJsonOneRow.
* CSV, text and json formats share the same TextLike routines except for the
* one-row callback.
This comment is not useful; I would like to delete it.
CopyToTextLikeStart
+ /* JSON-specific initialization */
+ if (cstate->opts.format == COPY_FORMAT_JSON)
+ {
+ MemoryContext oldcxt;
+
+ /* Allocate reusable JSON output buffer in long-lived context */
+ oldcxt = MemoryContextSwitchTo(cstate->copycontext);
+ initStringInfo(&cstate->json_buf);
+ MemoryContextSwitchTo(oldcxt);
+ }
We can just add
cstate->json_buf = makeStringInfo();
in BeginCopyTo.
v25-0004-COPY-TO-JSON-build-JSON-per-column-support-colum.patch
adds several fields to CopyToStateData.
There is actually a simpler way: construct a new TupleDesc and let
composite_to_json do the job. Please see my v26-0004.
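To make the "construct a new TupleDesc" idea concrete, here is a minimal sketch in plain C of projecting a row description through a 1-based column list. DemoAttr and demo_project_row are hypothetical names used only for illustration; they are not backend structures or functions. The sketch also projects the row values with the same list, which is an assumption of the sketch: a narrowed descriptor pairs names and values by position, so both sides need the same selection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the few per-column fields the sketch needs. */
typedef struct DemoAttr
{
    const char *name;
    unsigned int type_oid;
} DemoAttr;

/*
 * Copy the attributes and values selected by attnums (1-based) from the
 * full row into dst_attrs/dst_values, in list order. Returns the number
 * of projected columns, or -1 if an attnum is out of range.
 */
static int
demo_project_row(const DemoAttr *src_attrs, const char *const *src_values,
                 int natts, const int *attnums, int nattnums,
                 DemoAttr *dst_attrs, const char **dst_values)
{
    assert(src_attrs && src_values && attnums && dst_attrs && dst_values);

    for (int i = 0; i < nattnums; i++)
    {
        int attnum = attnums[i];

        if (attnum < 1 || attnum > natts)
            return -1;

        /* position i of the result describes physical column attnum */
        dst_attrs[i] = src_attrs[attnum - 1];
        dst_values[i] = src_values[attnum - 1];
    }
    return nattnums;
}
```

For a table (style, test, filler) and the list {1, 3}, position 2 of the projected row is filler together with filler's own value, not the value of the second physical column.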
--
jian
https://www.enterprisedb.com/
Attachments:
[text/x-patch] v26-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch (12.3K, 2-v26-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch)
From 61ea6a492ef900bf8745455847e2d9cffb396152 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Fri, 6 Mar 2026 14:47:58 +0800
Subject: [PATCH v26 3/4] Add option force_array for COPY JSON FORMAT
This adds the force_array option, which is available exclusively
when using COPY TO with the JSON format.
When enabled, this option wraps the output in a top-level JSON array
(enclosed in square brackets with comma-separated elements), making the
entire result a valid single JSON value. Without this option, the default
behavior is to output a stream of independent JSON objects.
Attempting to use this option with COPY FROM or with formats other than
JSON will raise an error.
Author: Joe Conway <[email protected]>
Author: jian he <[email protected]>
Reviewed-by: Junwang Zhao <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Florents Tselai <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
doc/src/sgml/ref/copy.sgml | 30 ++++++++++++++++++++++
src/backend/commands/copy.c | 13 ++++++++++
src/backend/commands/copyto.c | 41 ++++++++++++++++++++++++++++--
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy.out | 33 ++++++++++++++++++++++++
src/test/regress/sql/copy.sql | 10 ++++++++
7 files changed, 127 insertions(+), 3 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 75f55bbf6f8..a79587f7613 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -40,6 +40,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
HEADER [ <replaceable class="parameter">boolean</replaceable> | <replaceable class="parameter">integer</replaceable> | MATCH ]
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
+ FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -366,6 +367,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry id="sql-copy-params-force-array">
+ <term><literal>FORCE_ARRAY</literal></term>
+ <listitem>
+ <para>
+ Force output of square brackets as array decorations at the beginning
+ and end of output, and commas between the rows. It is allowed only in
+ <command>COPY TO</command>, and only when using
+ <literal>json</literal> format. The default is
+ <literal>false</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="sql-copy-params-force-quote">
<term><literal>FORCE_QUOTE</literal></term>
<listitem>
@@ -1103,6 +1117,22 @@ COPY country TO STDOUT (DELIMITER '|');
</programlisting>
</para>
+<para>
+ When the <literal>FORCE_ARRAY</literal> option is enabled,
+ the entire output is wrapped in a single JSON array with rows separated by commas:
+<programlisting>
+COPY (SELECT * FROM (VALUES(1),(2)) val(id)) TO STDOUT (FORMAT JSON, FORCE_ARRAY);
+</programlisting>
+The output is as follows:
+<screen>
+[
+ {"id":1}
+,{"id":2}
+]
+</screen>
+</para>
+
+
<para>
To copy data from a file into the <literal>country</literal> table:
<programlisting>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 29c121c7f08..84254d46a67 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -569,6 +569,7 @@ ProcessCopyOptions(ParseState *pstate,
bool on_error_specified = false;
bool log_verbosity_specified = false;
bool reject_limit_specified = false;
+ bool force_array_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -725,6 +726,13 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "force_array") == 0)
+ {
+ if (force_array_specified)
+ errorConflictingDefElem(defel, pstate);
+ force_array_specified = true;
+ opts_out->force_array = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "on_error") == 0)
{
if (on_error_specified)
@@ -967,6 +975,11 @@ ProcessCopyOptions(ParseState *pstate,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY %s mode cannot be used with %s", "JSON", "COPY FROM"));
+ if (opts_out->format != COPY_FORMAT_JSON && opts_out->force_array)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY %s can only be used with JSON mode", "FORCE_ARRAY"));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e87310ec5a0..4ea44daee0a 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -86,6 +86,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ bool json_row_delim_needed; /* need delimiter before next row */
StringInfo json_buf; /* reusable buffer for JSON output, it is
* initliazed in BeginCopyTo */
copy_data_dest_cb data_dest_cb; /* function for writing data */
@@ -135,6 +136,7 @@ static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToJsonEnd(CopyToState cstate);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -176,7 +178,7 @@ static const CopyToRoutine CopyToRoutineJson = {
.CopyToStart = CopyToTextLikeStart,
.CopyToOutFunc = CopyToTextLikeOutFunc,
.CopyToOneRow = CopyToJsonOneRow,
- .CopyToEnd = CopyToTextLikeEnd,
+ .CopyToEnd = CopyToJsonEnd,
};
/* binary format */
@@ -242,6 +244,18 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
CopySendTextLikeEndOfRow(cstate);
}
+
+ if (cstate->opts.format == COPY_FORMAT_JSON)
+ {
+ /*
+ * If FORCE_ARRAY has been specified, send the opening bracket.
+ */
+ if (cstate->opts.force_array)
+ {
+ CopySendChar(cstate, '[');
+ CopySendTextLikeEndOfRow(cstate);
+ }
+ }
}
/*
@@ -318,13 +332,24 @@ CopyToTextLikeOneRow(CopyToState cstate,
CopySendTextLikeEndOfRow(cstate);
}
-/* Implementation of the end callback for text, CSV, and json formats */
+/* Implementation of the end callback for text and CSV formats */
static void
CopyToTextLikeEnd(CopyToState cstate)
{
/* Nothing to do here */
}
+/* Implementation of the end callback for json format */
+static void
+CopyToJsonEnd(CopyToState cstate)
+{
+ if (cstate->opts.force_array)
+ {
+ CopySendChar(cstate, ']');
+ CopySendTextLikeEndOfRow(cstate);
+ }
+}
+
/* Implementation of per-row callback for json format */
static void
CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
@@ -345,6 +370,18 @@ CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
rowdata = ExecFetchSlotHeapTupleDatum(slot);
composite_to_json(rowdata, cstate->json_buf, false);
+ if (cstate->opts.force_array)
+ {
+ if (cstate->json_row_delim_needed)
+ CopySendChar(cstate, ',');
+ else
+ {
+ /* first row needs no delimiter */
+ CopySendChar(cstate, ' ');
+ cstate->json_row_delim_needed = true;
+ }
+ }
+
CopySendData(cstate, cstate->json_buf->data, cstate->json_buf->len);
CopySendTextLikeEndOfRow(cstate);
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 0d9649c1f0a..4b18cc6e2cd 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1232,7 +1232,7 @@ Copy_common_options, "DEFAULT", "FORCE_NOT_NULL", "FORCE_NULL", "FREEZE", \
/* COPY TO options */
#define Copy_to_options \
-Copy_common_options, "FORCE_QUOTE"
+Copy_common_options, "FORCE_QUOTE", "FORCE_ARRAY"
/*
* These object types were introduced later than our support cutoff of
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 2b5bef6738e..abecfe51098 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -88,6 +88,7 @@ typedef struct CopyFormatOptions
List *force_notnull; /* list of column names */
bool force_notnull_all; /* FORCE_NOT_NULL *? */
bool *force_notnull_flags; /* per-column CSV FNN flags */
+ bool force_array; /* add JSON array decorations */
List *force_null; /* list of column names */
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 4324e3e4961..309a33ca2e7 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -80,6 +80,16 @@ copy (select 1 union all select 2) to stdout with (format json);
copy (values (1), (2)) TO stdout with (format json);
{"column1":1}
{"column1":2}
+copy (select 1 union all select 2) to stdout with (format json, force_array true);
+[
+ {"?column?":1}
+,{"?column?":2}
+]
+copy (values (1), (2)) TO stdout with (format json, force_array true);
+[
+ {"column1":1}
+,{"column1":2}
+]
copy copytest to stdout json;
{"style":"DOS","test":"abc\r\ndef","filler":1}
{"style":"Unix","test":"abc\ndef","filler":2}
@@ -122,6 +132,29 @@ ERROR: COPY JSON mode cannot be used with COPY FROM
copy copytest (style) to stdout (format json);
ERROR: column selection is not supported in JSON mode
-- all of the above should yield error
+-- should fail: force_array requires json format
+copy copytest to stdout (format csv, force_array true);
+ERROR: COPY FORCE_ARRAY can only be used with JSON mode
+-- force_array variants
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 4e9f74537f8..8a20907dd4c 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -85,6 +85,8 @@ copy copytest3 to stdout csv header;
--- test copying in JSON mode with various styles
copy (select 1 union all select 2) to stdout with (format json);
copy (values (1), (2)) TO stdout with (format json);
+copy (select 1 union all select 2) to stdout with (format json, force_array true);
+copy (values (1), (2)) TO stdout with (format json, force_array true);
copy copytest to stdout json;
copy copytest to stdout (format json);
@@ -105,6 +107,14 @@ copy copytest from stdin(format json);
copy copytest (style) to stdout (format json);
-- all of the above should yield error
+-- should fail: force_array requires json format
+copy copytest to stdout (format csv, force_array true);
+
+-- force_array variants
+copy copytest to stdout (format json, force_array);
+copy copytest to stdout (format json, force_array true);
+copy copytest to stdout (format json, force_array false);
+
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
--
2.34.1
[text/x-patch] v26-0004-COPY-TO-JSON-support-column-lists.patch (6.1K, 3-v26-0004-COPY-TO-JSON-support-column-lists.patch)
From c531737641b7c0951b29b80144362168d1f96aa1 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Fri, 6 Mar 2026 17:30:23 +0800
Subject: [PATCH v26 4/4] COPY TO JSON support column lists
Author: Andrew Dunstan <[email protected]>
Reviewed-by: jian he <[email protected]>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
src/backend/commands/copyto.c | 54 +++++++++++++++++++++++++-----
src/test/regress/expected/copy.out | 15 +++++++--
src/test/regress/sql/copy.sql | 5 ++-
3 files changed, 63 insertions(+), 11 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 4ea44daee0a..a45ef6bcab3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -89,6 +89,8 @@ typedef struct CopyToStateData
bool json_row_delim_needed; /* need delimiter before next row */
StringInfo json_buf; /* reusable buffer for JSON output, it is
* initliazed in BeginCopyTo */
+ TupleDesc tupDesc; /* Descriptor for the COPY TO column list.
+ * This is only necessary for JSON output. */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -359,11 +361,17 @@ CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
/*
* composite_to_json() requires a stable TupleDesc. The slot's descriptor
* (slot->tts_tupleDescriptor) may change during the execution of a SELECT
- * query, using cstate->queryDesc instead. No need worry this if COPY TO
- * is directly from a table.
+ * query, using cstate->queryDesc instead.
+ *
+ * This is not a concern when COPY TO reads directly from a table with all
+ * of its columns. However, when a column list selects a subset of the
+ * table's columns in JSON mode, the slot's default descriptor does not
+ * match, so use the dedicated TupleDesc constructed in BeginCopyTo.
*/
- if (!cstate->rel)
- slot->tts_tupleDescriptor = cstate->queryDesc->tupDesc;
+ if (cstate->rel && (RelationGetDescr(cstate->rel) != cstate->tupDesc))
+ ReleaseTupleDesc(slot->tts_tupleDescriptor);
+
+ slot->tts_tupleDescriptor = cstate->tupDesc;
resetStringInfo(cstate->json_buf);
@@ -839,6 +847,7 @@ BeginCopyTo(ParseState *pstate,
tupDesc = RelationGetDescr(cstate->rel);
cstate->partitions = children;
+ cstate->tupDesc = tupDesc;
}
else
{
@@ -976,6 +985,7 @@ BeginCopyTo(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
tupDesc = BlessTupleDesc(tupDesc);
+ cstate->tupDesc = tupDesc;
}
/* Generate or convert list of attributes to process */
@@ -986,10 +996,38 @@ BeginCopyTo(ParseState *pstate,
{
cstate->json_buf = makeStringInfo();
- if (attnamelist != NIL)
- ereport(ERROR,
- errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("column selection is not supported in JSON mode"));
+ if (attnamelist != NIL && rel)
+ {
+ char *attname;
+ Oid atttypid;
+ int32 atttypmod;
+ int attdim;
+ TupleDesc resultDesc;
+
+ /*
+ * allocate a new tuple descriptor
+ */
+ resultDesc = CreateTemplateTupleDesc(list_length(cstate->attnumlist));
+
+ foreach_int(attnum, cstate->attnumlist)
+ {
+ Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+ attname = NameStr(attr->attname);
+ atttypid = attr->atttypid;
+ atttypmod = attr->atttypmod;
+ attdim = attr->attndims;
+
+ TupleDescInitEntry(resultDesc,
+ foreach_current_index(attnum) + 1,
+ attname,
+ atttypid,
+ atttypmod,
+ attdim);
+ }
+
+ cstate->tupDesc = BlessTupleDesc(resultDesc);
+ }
}
num_phys_attrs = tupDesc->natts;
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 309a33ca2e7..3ffb08bfada 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -129,8 +129,6 @@ copy copytest to stdout (format json, reject_limit 1);
ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
copy copytest from stdin(format json);
ERROR: COPY JSON mode cannot be used with COPY FROM
-copy copytest (style) to stdout (format json);
-ERROR: column selection is not supported in JSON mode
-- all of the above should yield error
-- should fail: force_array requires json format
copy copytest to stdout (format csv, force_array true);
@@ -155,6 +153,19 @@ copy copytest to stdout (format json, force_array false);
{"style":"Unix","test":"abc\ndef","filler":2}
{"style":"Mac","test":"abc\rdef","filler":3}
{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- column list with json format
+copy copytest (style, filler) to stdout (format json);
+{"style":"DOS","filler":1667391763}
+{"style":"Unix","filler":1701055075}
+{"style":"Mac","filler":1667391761}
+{"style":"esc\\ape","filler":1918656791}
+copy copytest (style, filler) to stdout (format json, force_array true);
+[
+ {"style":"DOS","filler":1667391763}
+,{"style":"Unix","filler":1701055075}
+,{"style":"Mac","filler":1667391761}
+,{"style":"esc\\ape","filler":1918656791}
+]
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 8a20907dd4c..dae97bebb7d 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -104,7 +104,6 @@ copy copytest to stdout (format json, force_null *);
copy copytest to stdout (format json, on_error ignore);
copy copytest to stdout (format json, reject_limit 1);
copy copytest from stdin(format json);
-copy copytest (style) to stdout (format json);
-- all of the above should yield error
-- should fail: force_array requires json format
@@ -115,6 +114,10 @@ copy copytest to stdout (format json, force_array);
copy copytest to stdout (format json, force_array true);
copy copytest to stdout (format json, force_array false);
+-- column list with json format
+copy copytest (style, filler) to stdout (format json);
+copy copytest (style, filler) to stdout (format json, force_array true);
+
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
--
2.34.1
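
For readers following the column-list change above, here is a minimal, self-contained sketch of the idea: only the attributes named in the 1-based attnum list end up in the emitted JSON object, in list order. This is not PostgreSQL code. `DemoRow`, `append_str` and `project_row_to_json` are hypothetical names made up for illustration, values are assumed to be pre-rendered JSON literals, and column names are assumed to need no escaping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one row: parallel arrays of column names and
 * already-rendered JSON values (assumption: no further escaping needed). */
typedef struct DemoRow
{
    int          natts;
    const char **names;
    const char **json_values;
} DemoRow;

/* Append s to out (NUL-terminated); returns -1 if it does not fit. */
static int
append_str(char *out, size_t outsz, size_t *used, const char *s)
{
    size_t len = strlen(s);

    if (*used + len + 1 > outsz)
        return -1;
    memcpy(out + *used, s, len + 1);
    *used += len;
    return 0;
}

/*
 * Write a JSON object holding only the columns listed in attnums
 * (1-based, like attnumlist), in list order.  Returns the number of
 * characters written, or -1 on overflow or an out-of-range attnum.
 */
static int
project_row_to_json(const DemoRow *row, const int *attnums, int nattnums,
                    char *out, size_t outsz)
{
    size_t used = 0;

    assert(row != NULL && out != NULL && outsz > 0);
    out[0] = '\0';

    if (append_str(out, outsz, &used, "{") < 0)
        return -1;

    for (int i = 0; i < nattnums; i++)
    {
        int idx = attnums[i] - 1;   /* attnums are 1-based */

        if (idx < 0 || idx >= row->natts)
            return -1;
        if (i > 0 && append_str(out, outsz, &used, ",") < 0)
            return -1;
        if (append_str(out, outsz, &used, "\"") < 0 ||
            append_str(out, outsz, &used, row->names[idx]) < 0 ||
            append_str(out, outsz, &used, "\":") < 0 ||
            append_str(out, outsz, &used, row->json_values[idx]) < 0)
            return -1;
    }

    if (append_str(out, outsz, &used, "}") < 0)
        return -1;
    return (int) used;
}
```

With names {"style", "test", "filler"}, values {"\"DOS\"", "\"abc\\r\\ndef\"", "1"} and the attnum list {1, 3}, the buffer holds {"style":"DOS","filler":1}. The actual patch takes a different route: it builds a new TupleDesc for the selected attributes and lets composite_to_json() do the rendering.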
[text/x-patch] v26-0002-json-format-for-COPY-TO.patch (23.3K, 4-v26-0002-json-format-for-COPY-TO.patch)
From 7244131f4710228f3e3998524dcd930e349e00bd Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Fri, 6 Mar 2026 14:32:48 +0800
Subject: [PATCH v26 2/4] json format for COPY TO
This introduces the JSON format option for the COPY TO command, allowing users
to export query results or table data directly as a single JSON object or a
stream of JSON objects.
The JSON format is currently supported only for COPY TO operations; it
is not available for COPY FROM.
JSON format is incompatible with some standard text/CSV parsing or formatting options,
including:
- HEADER
- DEFAULT
- NULL
- DELIMITER
- FORCE QUOTE / FORCE NOT NULL
Regression tests covering valid JSON exports and error handling for
incompatible options have been added to src/test/regress/sql/copy.sql.
Author: Joe Conway <[email protected]>
Author: jian he <[email protected]>
Reviewed-by: "Andrey M. Borodin" <[email protected]>
Reviewed-by: Dean Rasheed <[email protected]>
Reviewed-by: Daniel Verite <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Davin Shearer <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Alvaro Herrera <[email protected]>
Reviewed-by: Junwang Zhao <[email protected]>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
doc/src/sgml/ref/copy.sgml | 13 +++--
src/backend/commands/copy.c | 49 ++++++++++++-----
src/backend/commands/copyto.c | 85 +++++++++++++++++++++++++----
src/backend/parser/gram.y | 8 +++
src/backend/utils/adt/json.c | 5 +-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/include/utils/json.h | 2 +
src/test/regress/expected/copy.out | 86 ++++++++++++++++++++++++++++++
src/test/regress/sql/copy.sql | 51 ++++++++++++++++++
10 files changed, 271 insertions(+), 31 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 0ad890ef95f..75f55bbf6f8 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -228,10 +228,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Selects the data format to be read or written:
<literal>text</literal>,
<literal>csv</literal> (Comma Separated Values),
+ <literal>json</literal> (JavaScript Object Notation),
or <literal>binary</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
+ <para>
+ The <literal>json</literal> option is allowed only in
+ <command>COPY TO</command>.
+ </para>
</listitem>
</varlistentry>
@@ -266,7 +271,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
(line) of the file. The default is a tab character in text format,
a comma in <literal>CSV</literal> format.
This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -280,7 +285,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
<note>
@@ -303,7 +308,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ not using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -330,7 +335,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<command>COPY FROM</command> commands.
</para>
<para>
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2f46be516f2..29c121c7f08 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -597,6 +597,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "json") == 0)
+ opts_out->format = COPY_FORMAT_JSON;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -756,21 +758,32 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
+ if (opts_out->delim &&
+ (opts_out->format == COPY_FORMAT_BINARY ||
+ opts_out->format == COPY_FORMAT_JSON))
ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+ errcode(ERRCODE_SYNTAX_ERROR),
+ opts_out->format == COPY_FORMAT_BINARY
+ ? errmsg("cannot specify %s in BINARY mode", "DELIMITER")
+ : errmsg("cannot specify %s in JSON mode", "DELIMITER"));
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
+ if (opts_out->null_print &&
+ (opts_out->format == COPY_FORMAT_BINARY ||
+ opts_out->format == COPY_FORMAT_JSON))
ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
+ errcode(ERRCODE_SYNTAX_ERROR),
+ opts_out->format == COPY_FORMAT_BINARY
+ ? errmsg("cannot specify %s in BINARY mode", "NULL")
+ : errmsg("cannot specify %s in JSON mode", "NULL"));
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
+ if (opts_out->default_print &&
+ (opts_out->format == COPY_FORMAT_BINARY ||
+ opts_out->format == COPY_FORMAT_JSON))
ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ errcode(ERRCODE_SYNTAX_ERROR),
+ opts_out->format == COPY_FORMAT_BINARY
+ ? errmsg("cannot specify %s in BINARY mode", "DEFAULT")
+ : errmsg("cannot specify %s in JSON mode", "DEFAULT"));
/* Set defaults for omitted options */
if (!opts_out->delim)
@@ -836,11 +849,15 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
+ if (opts_out->header_line != COPY_HEADER_FALSE &&
+ (opts_out->format == COPY_FORMAT_BINARY ||
+ opts_out->format == COPY_FORMAT_JSON))
ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ opts_out->format == COPY_FORMAT_BINARY
+ ? errmsg("cannot specify %s in BINARY mode", "HEADER")
+ : errmsg("cannot specify %s in JSON mode", "HEADER"));
/* Check quote */
if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
@@ -944,6 +961,12 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY %s cannot be used with %s", "FREEZE",
"COPY TO")));
+ /* Check json format */
+ if (opts_out->format == COPY_FORMAT_JSON && is_from)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY %s mode cannot be used with %s", "JSON", "COPY FROM"));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 0325a16f82a..e87310ec5a0 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -26,6 +26,7 @@
#include "executor/execdesc.h"
#include "executor/executor.h"
#include "executor/tuptable.h"
+#include "funcapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
@@ -33,6 +34,7 @@
#include "pgstat.h"
#include "storage/fd.h"
#include "tcop/tcopprot.h"
+#include "utils/json.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
@@ -84,6 +86,8 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ StringInfo json_buf; /* reusable buffer for JSON output, it is
+ * initialized in BeginCopyTo */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -130,6 +134,7 @@ static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -148,9 +153,6 @@ static void CopySendInt16(CopyToState cstate, int16 val);
/*
* COPY TO routines for built-in formats.
- *
- * CSV and text formats share the same TextLike routines except for the
- * one-row callback.
*/
/* text format */
@@ -169,6 +171,14 @@ static const CopyToRoutine CopyToRoutineCSV = {
.CopyToEnd = CopyToTextLikeEnd,
};
+/* json format */
+static const CopyToRoutine CopyToRoutineJson = {
+ .CopyToStart = CopyToTextLikeStart,
+ .CopyToOutFunc = CopyToTextLikeOutFunc,
+ .CopyToOneRow = CopyToJsonOneRow,
+ .CopyToEnd = CopyToTextLikeEnd,
+};
+
/* binary format */
static const CopyToRoutine CopyToRoutineBinary = {
.CopyToStart = CopyToBinaryStart,
@@ -185,12 +195,14 @@ CopyToGetRoutine(const CopyFormatOptions *opts)
return &CopyToRoutineCSV;
else if (opts->format == COPY_FORMAT_BINARY)
return &CopyToRoutineBinary;
+ else if (opts->format == COPY_FORMAT_JSON)
+ return &CopyToRoutineJson;
/* default is text */
return &CopyToRoutineText;
}
-/* Implementation of the start callback for text and CSV formats */
+/* Implementation of the start callback for text, CSV, and json formats */
static void
CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
{
@@ -209,6 +221,8 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
ListCell *cur;
bool hdr_delim = false;
+ Assert(cstate->opts.format != COPY_FORMAT_JSON);
+
foreach(cur, cstate->attnumlist)
{
int attnum = lfirst_int(cur);
@@ -231,7 +245,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
}
/*
- * Implementation of the outfunc callback for text and CSV formats. Assign
+ * Implementation of the outfunc callback for text, CSV, and json formats. Assign
* the output function data to the given *finfo.
*/
static void
@@ -304,13 +318,38 @@ CopyToTextLikeOneRow(CopyToState cstate,
CopySendTextLikeEndOfRow(cstate);
}
-/* Implementation of the end callback for text and CSV formats */
+/* Implementation of the end callback for text, CSV, and json formats */
static void
CopyToTextLikeEnd(CopyToState cstate)
{
/* Nothing to do here */
}
+/* Implementation of per-row callback for json format */
+static void
+CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+ Datum rowdata;
+
+ /*
+ * composite_to_json() requires a stable TupleDesc. The slot's descriptor
+ * (slot->tts_tupleDescriptor) may change during the execution of a SELECT
+ * query, so use cstate->queryDesc->tupDesc instead. This is not a concern
+ * when COPY TO reads directly from a table.
+ */
+ if (!cstate->rel)
+ slot->tts_tupleDescriptor = cstate->queryDesc->tupDesc;
+
+ resetStringInfo(cstate->json_buf);
+
+ rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ composite_to_json(rowdata, cstate->json_buf, false);
+
+ CopySendData(cstate, cstate->json_buf->data, cstate->json_buf->len);
+
+ CopySendTextLikeEndOfRow(cstate);
+}
+
/*
* Implementation of the start callback for binary format. Send a header
* for a binary copy.
@@ -402,9 +441,23 @@ SendCopyBegin(CopyToState cstate)
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
pq_sendbyte(&buf, format); /* overall format */
- pq_sendint16(&buf, natts);
- for (i = 0; i < natts; i++)
- pq_sendint16(&buf, format); /* per-column formats */
+ if (cstate->opts.format != COPY_FORMAT_JSON)
+ {
+ pq_sendint16(&buf, natts);
+ for (i = 0; i < natts; i++)
+ pq_sendint16(&buf, format); /* per-column formats */
+ }
+ else
+ {
+ /*
+ * For JSON format, report one text-format column. Each CopyData
+ * message contains one complete JSON object, not individual column
+ * values, so the per-column count is always 1.
+ */
+ pq_sendint16(&buf, 1);
+ pq_sendint16(&buf, 0);
+ }
+
pq_endmessage(&buf);
cstate->copy_dest = COPY_FRONTEND;
}
@@ -506,7 +559,7 @@ CopySendEndOfRow(CopyToState cstate)
}
/*
- * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
+ * Wrapper function of CopySendEndOfRow for text, CSV, and json formats. Sends the
* line termination and do common appropriate things for the end of row.
*/
static inline void
@@ -885,11 +938,23 @@ BeginCopyTo(ParseState *pstate,
ExecutorStart(cstate->queryDesc, 0);
tupDesc = cstate->queryDesc->tupDesc;
+ tupDesc = BlessTupleDesc(tupDesc);
}
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* JSON outputs whole rows; a column list doesn't make sense */
+ if (cstate->opts.format == COPY_FORMAT_JSON)
+ {
+ cstate->json_buf = makeStringInfo();
+
+ if (attnamelist != NIL)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("column selection is not supported in JSON mode"));
+ }
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_QUOTE name list to per-column flags, check validity */
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 3c3e24324a8..40ad9073901 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3612,6 +3612,10 @@ copy_opt_item:
{
$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
}
+ | JSON
+ {
+ $$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ }
| HEADER_P
{
$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3694,6 +3698,10 @@ copy_generic_opt_elem:
{
$$ = makeDefElem($1, $2, @1);
}
+ | FORMAT_LA copy_generic_opt_arg
+ {
+ $$ = makeDefElem("format", $2, @1);
+ }
;
copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 0b161398465..f609d7b9417 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -86,8 +86,6 @@ typedef struct JsonAggState
JsonUniqueBuilderState unique_check;
} JsonAggState;
-static void composite_to_json(Datum composite, StringInfo result,
- bool use_line_feeds);
static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
const Datum *vals, const bool *nulls, int *valcount,
JsonTypeCategory tcategory, Oid outfuncoid,
@@ -517,8 +515,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
/*
* Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
*/
-static void
+void
composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
{
HeapTupleHeader td;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index f8c0865ca89..0d9649c1f0a 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3425,7 +3425,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (TailMatches("FORMAT"))
- COMPLETE_WITH("binary", "csv", "text");
+ COMPLETE_WITH("binary", "csv", "text", "json");
/* Complete COPY <sth> FROM|TO filename WITH (FREEZE */
else if (TailMatches("FREEZE"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 2430fb0b2e5..2b5bef6738e 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -57,6 +57,7 @@ typedef enum CopyFormat
COPY_FORMAT_TEXT = 0,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
+ COPY_FORMAT_JSON,
} CopyFormat;
/*
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index f8cc52b1e78..2f4be40518d 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
#include "lib/stringinfo.h"
/* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+ bool use_line_feeds);
extern void escape_json(StringInfo buf, const char *str);
extern void escape_json_with_len(StringInfo buf, const char *str, int len);
extern void escape_json_text(StringInfo buf, const text *txt);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index d0d563e0fa8..4324e3e4961 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -73,6 +73,92 @@ copy copytest3 to stdout csv header;
c1,"col with , comma","col with "" quote"
1,a,1
2,b,2
+--- test copying in JSON mode with various styles
+copy (select 1 union all select 2) to stdout with (format json);
+{"?column?":1}
+{"?column?":2}
+copy (values (1), (2)) TO stdout with (format json);
+{"column1":1}
+{"column1":2}
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+ERROR: cannot specify DELIMITER in JSON mode
+copy copytest to stdout (format json, null '\N');
+ERROR: cannot specify NULL in JSON mode
+copy copytest to stdout (format json, default '|');
+ERROR: cannot specify DEFAULT in JSON mode
+copy copytest to stdout (format json, header);
+ERROR: cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, header 1);
+ERROR: cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, quote '"');
+ERROR: COPY QUOTE requires CSV mode
+copy copytest to stdout (format json, escape '"');
+ERROR: COPY ESCAPE requires CSV mode
+copy copytest to stdout (format json, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
+copy copytest to stdout (format json, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+copy copytest to stdout (format json, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
+copy copytest to stdout (format json, on_error ignore);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: copy copytest to stdout (format json, on_error ignore);
+ ^
+copy copytest to stdout (format json, reject_limit 1);
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
+copy copytest from stdin(format json);
+ERROR: COPY JSON mode cannot be used with COPY FROM
+copy copytest (style) to stdout (format json);
+ERROR: column selection is not supported in JSON mode
+-- all of the above should yield error
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
create temp table copytest4 (
c1 int,
"colname with tab: " text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 65cbdaf7f3e..4e9f74537f8 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -82,6 +82,57 @@ this is just a line full of junk that would error out if parsed
copy copytest3 to stdout csv header;
+--- test copying in JSON mode with various styles
+copy (select 1 union all select 2) to stdout with (format json);
+copy (values (1), (2)) TO stdout with (format json);
+copy copytest to stdout json;
+copy copytest to stdout (format json);
+
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+copy copytest to stdout (format json, null '\N');
+copy copytest to stdout (format json, default '|');
+copy copytest to stdout (format json, header);
+copy copytest to stdout (format json, header 1);
+copy copytest to stdout (format json, quote '"');
+copy copytest to stdout (format json, escape '"');
+copy copytest to stdout (format json, force_quote *);
+copy copytest to stdout (format json, force_not_null *);
+copy copytest to stdout (format json, force_null *);
+copy copytest to stdout (format json, on_error ignore);
+copy copytest to stdout (format json, reject_limit 1);
+copy copytest from stdin(format json);
+copy copytest (style) to stdout (format json);
+-- all of the above should yield error
+
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
create temp table copytest4 (
c1 int,
"colname with tab: " text);
--
2.34.1
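
The SendCopyBegin() hunk above reports exactly one text-format column in JSON mode. As a rough illustration of what that looks like on the wire, the sketch below lays out a CopyOutResponse message following the documented frontend/backend protocol: type byte 'H', an Int32 length that counts itself but not the type byte, an Int8 overall format, an Int16 column count, and one Int16 format code per column. The helpers `put_int16`, `put_int32` and `build_copy_out_response` are hypothetical names for this sketch, not PostgreSQL functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 16-bit value in network (big-endian) order; returns new offset. */
static size_t
put_int16(uint8_t *buf, size_t off, uint16_t v)
{
    buf[off] = (uint8_t) (v >> 8);
    buf[off + 1] = (uint8_t) (v & 0xFF);
    return off + 2;
}

/* Store a 32-bit value in network (big-endian) order; returns new offset. */
static size_t
put_int32(uint8_t *buf, size_t off, uint32_t v)
{
    buf[off] = (uint8_t) (v >> 24);
    buf[off + 1] = (uint8_t) ((v >> 16) & 0xFF);
    buf[off + 2] = (uint8_t) ((v >> 8) & 0xFF);
    buf[off + 3] = (uint8_t) (v & 0xFF);
    return off + 4;
}

/*
 * Build a CopyOutResponse into buf, which must have room for
 * 1 + 4 + 1 + 2 + 2 * ncols bytes.  Every column gets the same format
 * code as the overall format (0 = text, 1 = binary).
 * Returns the total number of bytes written, including the type byte.
 */
static size_t
build_copy_out_response(uint8_t *buf, uint8_t format, uint16_t ncols)
{
    size_t   off = 0;
    uint32_t len = 4 + 1 + 2 + 2u * ncols;  /* length excludes type byte */

    assert(buf != NULL);

    buf[off++] = 'H';                       /* CopyOutResponse */
    off = put_int32(buf, off, len);
    buf[off++] = format;                    /* overall format */
    off = put_int16(buf, off, ncols);
    for (uint16_t i = 0; i < ncols; i++)
        off = put_int16(buf, off, format);  /* per-column formats */
    return off;
}
```

Calling build_copy_out_response(buf, 0, 1), the JSON-mode case in the patch, gives a length field of 9. Calling it with three text columns gives a length field of 13, which matches the "Length: 13 / Columns: 3" CopyOut response in the tshark capture quoted earlier in this thread.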
[text/x-patch] v26-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch (13.1K, 5-v26-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch)
From e4747ff8a87e79edb8fd3b7778bcbe8a3a6e85f7 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Wed, 21 Jan 2026 18:38:24 +0800
Subject: [PATCH v26 1/4] introduce CopyFormat refactor CopyFormatOptions
Currently, the COPY command format is determined by two boolean fields
(binary, csv_mode) in CopyFormatOptions. This approach, while functional,
isn't ideal for implementing other formats in the future.
To simplify adding new formats, we've introduced an enum CopyFormat. This makes
the code cleaner and more maintainable, allowing for easier integration of
additional formats down the line.
The CopyFormat enum was originally contributed by Joel Jacobson <[email protected]>,
later refactored by Jian He to address various issues.
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
src/backend/commands/copy.c | 50 +++++++++++++++-------------
src/backend/commands/copyfrom.c | 6 ++--
src/backend/commands/copyfromparse.c | 7 ++--
src/backend/commands/copyto.c | 8 ++---
src/include/commands/copy.h | 13 ++++++--
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 49 insertions(+), 36 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 63b86802ba2..2f46be516f2 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -576,6 +576,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out = palloc0_object(CopyFormatOptions);
opts_out->file_encoding = -1;
+ /* default format */
+ opts_out->format = COPY_FORMAT_TEXT;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -590,11 +592,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -754,31 +756,31 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = (opts_out->format == COPY_FORMAT_CSV) ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = (opts_out->format == COPY_FORMAT_CSV) ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -826,7 +828,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -834,43 +836,43 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line != COPY_HEADER_FALSE)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -884,8 +886,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_notnull != NIL ||
+ opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -900,8 +902,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+ opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -925,7 +927,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -961,7 +963,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -978,7 +980,7 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2f42f55e229..4d927410159 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -156,9 +156,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
static const CopyFromRoutine *
CopyFromGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyFromRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyFromRoutineBinary;
/* default is text */
@@ -262,7 +262,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index fbd13353efc..c366874bd95 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -172,7 +172,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -750,7 +750,7 @@ bool
NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
return NextCopyFromRawFieldsInternal(cstate, fields, nfields,
- cstate->opts.csv_mode);
+ cstate->opts.format == COPY_FORMAT_CSV);
}
/*
@@ -777,7 +777,8 @@ NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields
bool done = false;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_FALSE)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 9ceeff6d99e..0325a16f82a 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -181,9 +181,9 @@ static const CopyToRoutine CopyToRoutineBinary = {
static const CopyToRoutine *
CopyToGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyToRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyToRoutineBinary;
/* default is text */
@@ -220,7 +220,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -397,7 +397,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 877202af67b..2430fb0b2e5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -49,6 +49,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT = 0,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -59,9 +69,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
int header_line; /* number of lines to skip or COPY_HEADER_XXX
* value (see the above) */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 77e3c04144e..8399be97fd5 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -528,6 +528,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromRoutine
CopyFromState
--
2.34.1
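The effect of the refactoring above can be sketched outside the backend. This is only an illustration, not code from the patch: the enum mirrors the one added to copy.h, while parse_copy_format() and copy_wire_format() are hypothetical helper names that stand in for the strcmp() chain in ProcessCopyOptions and the "format == COPY_FORMAT_BINARY ? 1 : 0" expression in SendCopyBegin.

```c
#include <assert.h>
#include <string.h>

/* Mirrors the CopyFormat enum that the patch adds to copy.h. */
typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
} CopyFormat;

/*
 * Hypothetical helper: map a FORMAT option string to the enum.
 * Returns 0 on success, -1 for an unrecognized format name.
 */
static int
parse_copy_format(const char *fmt, CopyFormat *out)
{
	assert(fmt != NULL && out != NULL);

	if (strcmp(fmt, "text") == 0)
		*out = COPY_FORMAT_TEXT;
	else if (strcmp(fmt, "csv") == 0)
		*out = COPY_FORMAT_CSV;
	else if (strcmp(fmt, "binary") == 0)
		*out = COPY_FORMAT_BINARY;
	else
		return -1;
	return 0;
}

/*
 * Hypothetical helper: the format code sent on the wire,
 * 1 for binary and 0 for every text-like format.
 */
static int
copy_wire_format(CopyFormat format)
{
	return format == COPY_FORMAT_BINARY ? 1 : 0;
}
```

With a single enum field, the "binary and csv_mode both set" state can no longer be represented, and a new format needs one more enum member plus one more strcmp() branch rather than a third boolean.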
^ permalink raw reply [nested|flat] 28+ messages in thread
* Re: Emitting JSON to file using COPY TO
@ 2026-03-08 16:16 jian he <[email protected]>
parent: jian he <[email protected]>
0 siblings, 1 reply; 28+ messages in thread
From: jian he @ 2026-03-08 16:16 UTC (permalink / raw)
To: Andrew Dunstan <[email protected]>; +Cc: Joe Conway <[email protected]>; Junwang Zhao <[email protected]>; Florents Tselai <[email protected]>; Andrey M. Borodin <[email protected]>; Dean Rasheed <[email protected]>; Daniel Verite <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
hi.
V27-0002 is still not bullet-proof.
drop table if exists t1;
create table t1(a int);
insert into t1 values (1);
copy (select * from t1) to stdout json;
{"a":1}
WARNING: resource was not closed: TupleDesc 0x7171d0ca3440 (18239,-1)
Also see ExecAssignScanProjectionInfo->ExecConditionalAssignProjectionInfo
So in v28-0002, I changed it to:
+ /*
+ * composite_to_json() requires a stable TupleDesc. Since the slot's
+ * descriptor (slot->tts_tupleDescriptor) can change during the execution
+ * of a SELECT query, we use cstate->queryDesc->tupDesc instead. This
+ * precaution is only necessary when the output slot's TupleDesc is of
+ * type RECORDOID.
+ */
+ if (!cstate->rel && slot->tts_tupleDescriptor->tdtypeid == RECORDOID)
+ slot->tts_tupleDescriptor = cstate->queryDesc->tupDesc;
+ cstate->json_projvalues = (Datum *) palloc(natts * sizeof(Datum));
+ cstate->json_projnulls = (bool *) palloc(natts * sizeof(bool));
I changed it to
+ cstate->json_projvalues = palloc_array(Datum, natts);
+ cstate->json_projnulls = palloc_array(bool, natts);
+ rowdata = HeapTupleHeaderGetDatum(tup->t_data);
I changed it to
+ rowdata = HeapTupleGetDatum(tup);
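For readers unfamiliar with palloc_array(), here is a rough standalone sketch of the allocation pattern. It is an illustration under stated assumptions, not the patch code: demo_alloc_array() is a malloc()-based stand-in for palloc_array(type, count), and DemoDatum, DemoProjection, demo_projection_init() and demo_projection_free() are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Stand-in for PostgreSQL's palloc_array(type, count). Assumption for
 * illustration: it uses malloc() so the sketch builds outside the
 * backend, and it can return NULL where palloc() would raise an error.
 */
#define demo_alloc_array(type, count) \
	((type *) malloc(sizeof(type) * (size_t) (count)))

typedef uintptr_t DemoDatum;	/* hypothetical stand-in for Datum */

typedef struct DemoProjection
{
	int			natts;
	DemoDatum  *values;		/* one slot per output column */
	bool	   *nulls;		/* matching null flags */
} DemoProjection;

/* Allocate both per-column arrays; returns false if allocation fails. */
static bool
demo_projection_init(DemoProjection *proj, int natts)
{
	assert(proj != NULL && natts > 0);

	proj->natts = natts;
	proj->values = demo_alloc_array(DemoDatum, natts);
	proj->nulls = demo_alloc_array(bool, natts);
	if (proj->values == NULL || proj->nulls == NULL)
	{
		free(proj->values);
		free(proj->nulls);
		proj->values = NULL;
		proj->nulls = NULL;
		return false;
	}

	/* Start with every column marked NULL. */
	for (int i = 0; i < natts; i++)
	{
		proj->values[i] = 0;
		proj->nulls[i] = true;
	}
	return true;
}

static void
demo_projection_free(DemoProjection *proj)
{
	free(proj->values);
	free(proj->nulls);
	proj->values = NULL;
	proj->nulls = NULL;
}
```

The macro form names the element type once, so the cast and the sizeof cannot drift apart the way they can in a hand-written "(Datum *) palloc(natts * sizeof(Datum))".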
Patch v28-0004 adds the json_projvalues and json_projnulls pointers to struct
CopyToStateData. I wondered if adding these would slow the COPY TO with TEXT and
CSV format, so I ran a quick test using a 36-column table.
Surprisingly, v28 actually makes COPY TO with TEXT and CSV format perform a
little bit faster, but I haven't figured out why.
You may also try the attached test script: copyto_json_perfomance_test.nocfbot.
--
jian
https://www.enterprisedb.com/
Attachments:
[text/x-patch] v28-0002-json-format-for-COPY-TO.patch (23.7K, 2-v28-0002-json-format-for-COPY-TO.patch)
From 439fc107c50408f51e3192f221373e1333672063 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Sun, 8 Mar 2026 23:44:11 +0800
Subject: [PATCH v28 2/4] json format for COPY TO
This introduces the JSON format option for the COPY TO command, allowing users
to export query results or table data directly as a single JSON object or a
stream of JSON objects.
The JSON format is currently supported only for COPY TO operations; it
is not available for COPY FROM.
JSON format is incompatible with some standard text/CSV parsing or formatting options,
including:
- HEADER
- DEFAULT
- NULL
- DELIMITER
- FORCE QUOTE / FORCE NOT NULL
Regression tests covering valid JSON exports and error handling for
incompatible options have been added to src/test/regress/sql/copy.sql.
Author: Joe Conway <[email protected]>
Author: jian he <[email protected]>
Reviewed-by: "Andrey M. Borodin" <[email protected]>,
Reviewed-by: Dean Rasheed <[email protected]>,
Reviewed-by: Daniel Verite <[email protected]>,
Reviewed-by: Andrew Dunstan <[email protected]>,
Reviewed-by: Davin Shearer <[email protected]>,
Reviewed-by: Masahiko Sawada <[email protected]>,
Reviewed-by: Alvaro Herrera <[email protected]>
Reviewed-by: Junwang Zhao <[email protected]>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
doc/src/sgml/ref/copy.sgml | 13 +++--
src/backend/commands/copy.c | 49 +++++++++++-----
src/backend/commands/copyto.c | 86 ++++++++++++++++++++++++----
src/backend/parser/gram.y | 8 +++
src/backend/utils/adt/json.c | 5 +-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/include/utils/json.h | 2 +
src/test/regress/expected/copy.out | 91 ++++++++++++++++++++++++++++++
src/test/regress/sql/copy.sql | 52 +++++++++++++++++
10 files changed, 278 insertions(+), 31 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 0ad890ef95f..75f55bbf6f8 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -228,10 +228,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Selects the data format to be read or written:
<literal>text</literal>,
<literal>csv</literal> (Comma Separated Values),
+ <literal>json</literal> (JavaScript Object Notation),
or <literal>binary</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
+ <para>
+ The <literal>json</literal> option is allowed only in
+ <command>COPY TO</command>.
+ </para>
</listitem>
</varlistentry>
@@ -266,7 +271,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
(line) of the file. The default is a tab character in text format,
a comma in <literal>CSV</literal> format.
This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -280,7 +285,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
<note>
@@ -303,7 +308,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ not using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -330,7 +335,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<command>COPY FROM</command> commands.
</para>
<para>
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2f46be516f2..29c121c7f08 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -597,6 +597,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "json") == 0)
+ opts_out->format = COPY_FORMAT_JSON;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -756,21 +758,32 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
+ if (opts_out->delim &&
+ (opts_out->format == COPY_FORMAT_BINARY ||
+ opts_out->format == COPY_FORMAT_JSON))
ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+ errcode(ERRCODE_SYNTAX_ERROR),
+ opts_out->format == COPY_FORMAT_BINARY
+ ? errmsg("cannot specify %s in BINARY mode", "DELIMITER")
+ : errmsg("cannot specify %s in JSON mode", "DELIMITER"));
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
+ if (opts_out->null_print &&
+ (opts_out->format == COPY_FORMAT_BINARY ||
+ opts_out->format == COPY_FORMAT_JSON))
ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
+ errcode(ERRCODE_SYNTAX_ERROR),
+ opts_out->format == COPY_FORMAT_BINARY
+ ? errmsg("cannot specify %s in BINARY mode", "NULL")
+ : errmsg("cannot specify %s in JSON mode", "NULL"));
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
+ if (opts_out->default_print &&
+ (opts_out->format == COPY_FORMAT_BINARY ||
+ opts_out->format == COPY_FORMAT_JSON))
ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ errcode(ERRCODE_SYNTAX_ERROR),
+ opts_out->format == COPY_FORMAT_BINARY
+ ? errmsg("cannot specify %s in BINARY mode", "DEFAULT")
+ : errmsg("cannot specify %s in JSON mode", "DEFAULT"));
/* Set defaults for omitted options */
if (!opts_out->delim)
@@ -836,11 +849,15 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
+ if (opts_out->header_line != COPY_HEADER_FALSE &&
+ (opts_out->format == COPY_FORMAT_BINARY ||
+ opts_out->format == COPY_FORMAT_JSON))
ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ opts_out->format == COPY_FORMAT_BINARY
+ ? errmsg("cannot specify %s in BINARY mode", "HEADER")
+ : errmsg("cannot specify %s in JSON mode", "HEADER"));
/* Check quote */
if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
@@ -944,6 +961,12 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY %s cannot be used with %s", "FREEZE",
"COPY TO")));
+ /* Check json format */
+ if (opts_out->format == COPY_FORMAT_JSON && is_from)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY %s mode cannot be used with %s", "JSON", "COPY FROM"));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index b0ee91fc9c1..6971f4b85af 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -26,6 +26,7 @@
#include "executor/execdesc.h"
#include "executor/executor.h"
#include "executor/tuptable.h"
+#include "funcapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
@@ -33,6 +34,7 @@
#include "pgstat.h"
#include "storage/fd.h"
#include "tcop/tcopprot.h"
+#include "utils/json.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
@@ -85,6 +87,8 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ StringInfo json_buf; /* reusable buffer for JSON output, it is
+ * initialized in BeginCopyTo */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -131,6 +135,7 @@ static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -149,9 +154,6 @@ static void CopySendInt16(CopyToState cstate, int16 val);
/*
* COPY TO routines for built-in formats.
- *
- * CSV and text formats share the same TextLike routines except for the
- * one-row callback.
*/
/* text format */
@@ -170,6 +172,14 @@ static const CopyToRoutine CopyToRoutineCSV = {
.CopyToEnd = CopyToTextLikeEnd,
};
+/* json format */
+static const CopyToRoutine CopyToRoutineJson = {
+ .CopyToStart = CopyToTextLikeStart,
+ .CopyToOutFunc = CopyToTextLikeOutFunc,
+ .CopyToOneRow = CopyToJsonOneRow,
+ .CopyToEnd = CopyToTextLikeEnd,
+};
+
/* binary format */
static const CopyToRoutine CopyToRoutineBinary = {
.CopyToStart = CopyToBinaryStart,
@@ -186,12 +196,14 @@ CopyToGetRoutine(const CopyFormatOptions *opts)
return &CopyToRoutineCSV;
else if (opts->format == COPY_FORMAT_BINARY)
return &CopyToRoutineBinary;
+ else if (opts->format == COPY_FORMAT_JSON)
+ return &CopyToRoutineJson;
/* default is text */
return &CopyToRoutineText;
}
-/* Implementation of the start callback for text and CSV formats */
+/* Implementation of the start callback for text, CSV, and json formats */
static void
CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
{
@@ -210,6 +222,8 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
ListCell *cur;
bool hdr_delim = false;
+ Assert(cstate->opts.format != COPY_FORMAT_JSON);
+
foreach(cur, cstate->attnumlist)
{
int attnum = lfirst_int(cur);
@@ -232,7 +246,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
}
/*
- * Implementation of the outfunc callback for text and CSV formats. Assign
+ * Implementation of the outfunc callback for text, CSV, and json formats. Assign
* the output function data to the given *finfo.
*/
static void
@@ -305,13 +319,39 @@ CopyToTextLikeOneRow(CopyToState cstate,
CopySendTextLikeEndOfRow(cstate);
}
-/* Implementation of the end callback for text and CSV formats */
+/* Implementation of the end callback for text, CSV, and json formats */
static void
CopyToTextLikeEnd(CopyToState cstate)
{
/* Nothing to do here */
}
+/* Implementation of per-row callback for json format */
+static void
+CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+ Datum rowdata;
+
+ /*
+ * composite_to_json() requires a stable TupleDesc. Since the slot's
+ * descriptor (slot->tts_tupleDescriptor) can change during the execution
+ * of a SELECT query, we use cstate->queryDesc->tupDesc instead. This
+ * precaution is only necessary when the output slot's TupleDesc is of
+ * type RECORDOID.
+ */
+ if (!cstate->rel && slot->tts_tupleDescriptor->tdtypeid == RECORDOID)
+ slot->tts_tupleDescriptor = cstate->queryDesc->tupDesc;
+
+ resetStringInfo(cstate->json_buf);
+
+ rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ composite_to_json(rowdata, cstate->json_buf, false);
+
+ CopySendData(cstate, cstate->json_buf->data, cstate->json_buf->len);
+
+ CopySendTextLikeEndOfRow(cstate);
+}
+
/*
* Implementation of the start callback for binary format. Send a header
* for a binary copy.
@@ -403,9 +443,23 @@ SendCopyBegin(CopyToState cstate)
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
pq_sendbyte(&buf, format); /* overall format */
- pq_sendint16(&buf, natts);
- for (i = 0; i < natts; i++)
- pq_sendint16(&buf, format); /* per-column formats */
+ if (cstate->opts.format != COPY_FORMAT_JSON)
+ {
+ pq_sendint16(&buf, natts);
+ for (i = 0; i < natts; i++)
+ pq_sendint16(&buf, format); /* per-column formats */
+ }
+ else
+ {
+ /*
+ * For JSON format, report one text-format column. Each CopyData
+ * message contains one complete JSON object, not individual column
+ * values, so the per-column count is always 1.
+ */
+ pq_sendint16(&buf, 1);
+ pq_sendint16(&buf, 0);
+ }
+
pq_endmessage(&buf);
cstate->copy_dest = COPY_FRONTEND;
}
@@ -507,7 +561,7 @@ CopySendEndOfRow(CopyToState cstate)
}
/*
- * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
+ * Wrapper function of CopySendEndOfRow for text, CSV, and json formats. Sends the
* line termination and do common appropriate things for the end of row.
*/
static inline void
@@ -886,11 +940,23 @@ BeginCopyTo(ParseState *pstate,
ExecutorStart(cstate->queryDesc, 0);
tupDesc = cstate->queryDesc->tupDesc;
+ tupDesc = BlessTupleDesc(tupDesc);
}
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* JSON outputs whole rows; a column list doesn't make sense */
+ if (cstate->opts.format == COPY_FORMAT_JSON)
+ {
+ cstate->json_buf = makeStringInfo();
+
+ if (attnamelist != NIL)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("column selection is not supported in JSON mode"));
+ }
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_QUOTE name list to per-column flags, check validity */
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 9cbe8eafc45..136fd19b854 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3612,6 +3612,10 @@ copy_opt_item:
{
$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
}
+ | JSON
+ {
+ $$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ }
| HEADER_P
{
$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3694,6 +3698,10 @@ copy_generic_opt_elem:
{
$$ = makeDefElem($1, $2, @1);
}
+ | FORMAT_LA copy_generic_opt_arg
+ {
+ $$ = makeDefElem("format", $2, @1);
+ }
;
copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 0b161398465..f609d7b9417 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -86,8 +86,6 @@ typedef struct JsonAggState
JsonUniqueBuilderState unique_check;
} JsonAggState;
-static void composite_to_json(Datum composite, StringInfo result,
- bool use_line_feeds);
static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
const Datum *vals, const bool *nulls, int *valcount,
JsonTypeCategory tcategory, Oid outfuncoid,
@@ -517,8 +515,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
/*
* Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
*/
-static void
+void
composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
{
HeapTupleHeader td;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 6484c6a3dd4..bb82bdbcc48 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3425,7 +3425,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (TailMatches("FORMAT"))
- COMPLETE_WITH("binary", "csv", "text");
+ COMPLETE_WITH("binary", "csv", "text", "json");
/* Complete COPY <sth> FROM|TO filename WITH (FREEZE */
else if (TailMatches("FREEZE"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 2430fb0b2e5..2b5bef6738e 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -57,6 +57,7 @@ typedef enum CopyFormat
COPY_FORMAT_TEXT = 0,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
+ COPY_FORMAT_JSON,
} CopyFormat;
/*
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index f8cc52b1e78..2f4be40518d 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
#include "lib/stringinfo.h"
/* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+ bool use_line_feeds);
extern void escape_json(StringInfo buf, const char *str);
extern void escape_json_with_len(StringInfo buf, const char *str, int len);
extern void escape_json_text(StringInfo buf, const text *txt);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index d0d563e0fa8..72011f3492c 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -73,6 +73,97 @@ copy copytest3 to stdout csv header;
c1,"col with , comma","col with "" quote"
1,a,1
2,b,2
+--- test copying in JSON mode with various styles
+copy (select 1 union all select 2) to stdout with (format json);
+{"?column?":1}
+{"?column?":2}
+copy (values (1), (2)) TO stdout with (format json);
+{"column1":1}
+{"column1":2}
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy (select * from copytest) to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+ERROR: cannot specify DELIMITER in JSON mode
+copy copytest to stdout (format json, null '\N');
+ERROR: cannot specify NULL in JSON mode
+copy copytest to stdout (format json, default '|');
+ERROR: cannot specify DEFAULT in JSON mode
+copy copytest to stdout (format json, header);
+ERROR: cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, header 1);
+ERROR: cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, quote '"');
+ERROR: COPY QUOTE requires CSV mode
+copy copytest to stdout (format json, escape '"');
+ERROR: COPY ESCAPE requires CSV mode
+copy copytest to stdout (format json, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
+copy copytest to stdout (format json, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+copy copytest to stdout (format json, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
+copy copytest to stdout (format json, on_error ignore);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: copy copytest to stdout (format json, on_error ignore);
+ ^
+copy copytest to stdout (format json, reject_limit 1);
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
+copy copytest from stdin(format json);
+ERROR: COPY JSON mode cannot be used with COPY FROM
+copy copytest (style) to stdout (format json);
+ERROR: column selection is not supported in JSON mode
+-- all of the above should yield error
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
create temp table copytest4 (
c1 int,
"colname with tab: " text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 65cbdaf7f3e..a6c923ef5ab 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -82,6 +82,58 @@ this is just a line full of junk that would error out if parsed
copy copytest3 to stdout csv header;
+--- test copying in JSON mode with various styles
+copy (select 1 union all select 2) to stdout with (format json);
+copy (values (1), (2)) TO stdout with (format json);
+copy copytest to stdout json;
+copy copytest to stdout (format json);
+copy (select * from copytest) to stdout (format json);
+
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+copy copytest to stdout (format json, null '\N');
+copy copytest to stdout (format json, default '|');
+copy copytest to stdout (format json, header);
+copy copytest to stdout (format json, header 1);
+copy copytest to stdout (format json, quote '"');
+copy copytest to stdout (format json, escape '"');
+copy copytest to stdout (format json, force_quote *);
+copy copytest to stdout (format json, force_not_null *);
+copy copytest to stdout (format json, force_null *);
+copy copytest to stdout (format json, on_error ignore);
+copy copytest to stdout (format json, reject_limit 1);
+copy copytest from stdin(format json);
+copy copytest (style) to stdout (format json);
+-- all of the above should yield error
+
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
create temp table copytest4 (
c1 int,
"colname with tab: " text);
--
2.34.1
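The SendCopyBegin hunk in the patch above changes what the CopyOutResponse message advertises in JSON mode: one text-format column, regardless of how many attributes the relation has. A minimal standalone sketch of the resulting byte layout follows. It is an illustration only; build_copy_out_response() and put_int16() are hypothetical names, and the real backend builds the message with pq_beginmessage(), pq_sendbyte() and pq_sendint16().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append a 16-bit integer in network byte order; returns the new offset. */
static size_t
put_int16(uint8_t *buf, size_t off, uint16_t v)
{
	buf[off] = (uint8_t) (v >> 8);
	buf[off + 1] = (uint8_t) (v & 0xff);
	return off + 2;
}

/*
 * Hypothetical helper: serialize a CopyOutResponse for a text-like
 * format. In JSON mode the column count is forced to 1, as in the
 * patch. Returns the number of bytes written (including the type
 * byte), or 0 if the buffer is too small.
 */
static size_t
build_copy_out_response(uint8_t *buf, size_t cap, int is_json, uint16_t natts)
{
	uint16_t	ncols = is_json ? 1 : natts;
	/* type byte + int32 length + int8 format + int16 count + int16 per column */
	size_t		total = 1 + 4 + 1 + 2 + 2 * (size_t) ncols;
	uint32_t	len = (uint32_t) (total - 1);	/* length excludes the type byte */
	size_t		off = 0;

	if (cap < total)
		return 0;

	buf[off++] = 'H';			/* CopyOutResponse message type */
	buf[off++] = (uint8_t) (len >> 24);
	buf[off++] = (uint8_t) (len >> 16);
	buf[off++] = (uint8_t) (len >> 8);
	buf[off++] = (uint8_t) (len & 0xff);
	buf[off++] = 0;				/* overall format: text */
	off = put_int16(buf, off, ncols);
	for (uint16_t i = 0; i < ncols; i++)
		off = put_int16(buf, off, 0);	/* per-column format: text */

	assert(off == total);
	return off;
}
```

For a three-column table in text format this gives a length field of 13, which matches the tshark output quoted earlier in the thread; in JSON mode the length field is 9 no matter how many columns the table has.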
[text/x-patch] v28-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch (13.1K, 3-v28-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch)
From 226c834e93351ffda039774fbd4b57a9ab16b4f7 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Wed, 21 Jan 2026 18:38:24 +0800
Subject: [PATCH v28 1/4] introduce CopyFormat refactor CopyFormatOptions
Currently, the COPY command format is determined by two boolean fields (binary,
csv_mode) in CopyFormatOptions. This approach, while functional, isn't ideal
for implementing other formats in the future.
To simplify adding new formats, we've introduced an enum CopyFormat. This makes
the code cleaner and more maintainable, allowing for easier integration of
additional formats down the line.
The CopyFormat enum was originally contributed by Joel Jacobson <[email protected]>,
later refactored by Jian He to address various issues.
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
src/backend/commands/copy.c | 50 +++++++++++++++-------------
src/backend/commands/copyfrom.c | 6 ++--
src/backend/commands/copyfromparse.c | 7 ++--
src/backend/commands/copyto.c | 8 ++---
src/include/commands/copy.h | 13 ++++++--
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 49 insertions(+), 36 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 63b86802ba2..2f46be516f2 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -576,6 +576,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out = palloc0_object(CopyFormatOptions);
opts_out->file_encoding = -1;
+ /* default format */
+ opts_out->format = COPY_FORMAT_TEXT;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -590,11 +592,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -754,31 +756,31 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = (opts_out->format == COPY_FORMAT_CSV) ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = (opts_out->format == COPY_FORMAT_CSV) ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -826,7 +828,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -834,43 +836,43 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line != COPY_HEADER_FALSE)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -884,8 +886,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_notnull != NIL ||
+ opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -900,8 +902,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+ opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -925,7 +927,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -961,7 +963,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -978,7 +980,7 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2f42f55e229..4d927410159 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -156,9 +156,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
static const CopyFromRoutine *
CopyFromGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyFromRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyFromRoutineBinary;
/* default is text */
@@ -262,7 +262,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 84c8809a889..5f0c551e7ec 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -173,7 +173,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -751,7 +751,7 @@ bool
NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
return NextCopyFromRawFieldsInternal(cstate, fields, nfields,
- cstate->opts.csv_mode);
+ cstate->opts.format == COPY_FORMAT_CSV);
}
/*
@@ -778,7 +778,8 @@ NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields
bool done = false;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_FALSE)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d6ef7275a64..b0ee91fc9c1 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -182,9 +182,9 @@ static const CopyToRoutine CopyToRoutineBinary = {
static const CopyToRoutine *
CopyToGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyToRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyToRoutineBinary;
/* default is text */
@@ -221,7 +221,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -398,7 +398,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 877202af67b..2430fb0b2e5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -49,6 +49,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT = 0,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -59,9 +69,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
int header_line; /* number of lines to skip or COPY_HEADER_XXX
* value (see the above) */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3250564d4ff..520cdd36800 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -528,6 +528,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromRoutine
CopyFromState
--
2.34.1
[text/x-patch] v28-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch (12.3K, 4-v28-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch)
From 247eceeeaec62677d899f225bd83905da21f0087 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Sun, 8 Mar 2026 15:54:23 +0800
Subject: [PATCH v28 3/4] Add option force_array for COPY JSON FORMAT
This adds the force_array option, which is available exclusively
when using COPY TO with the JSON format.
When enabled, this option wraps the output in a top-level JSON array
(enclosed in square brackets with comma-separated elements), making the
entire result a valid single JSON value. Without this option, the default
behavior is to output a stream of independent JSON objects.
Attempting to use this option with COPY FROM or with formats other than
JSON will raise an error.
Author: Joe Conway <[email protected]>
Author: jian he <[email protected]>
Reviewed-by: Junwang Zhao <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Florents Tselai <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
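The array framing described above can be modelled outside the server in a few lines. The sketch below is an illustration only: JsonFraming and the framing_*() functions are hypothetical stand-ins for the CopyToState fields and the start, per-row and end callbacks, and output goes to a caller-supplied string buffer rather than through CopySendChar().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the COPY TO state; only the two fields that
 * matter for array framing are modelled.
 */
typedef struct JsonFraming
{
    bool        force_array;        /* FORCE_ARRAY option */
    bool        row_delim_needed;   /* true once the first row is out */
} JsonFraming;

/* Append s to the NUL-terminated string in buf (capacity cap). */
static void
append(char *buf, size_t cap, const char *s)
{
    size_t      len = strlen(buf);

    assert(len + strlen(s) < cap);
    memcpy(buf + len, s, strlen(s) + 1);
}

/* Start: emit the opening bracket on its own line. */
static void
framing_start(JsonFraming *st, char *buf, size_t cap)
{
    st->row_delim_needed = false;
    if (st->force_array)
        append(buf, cap, "[\n");
}

/* Per row: the prefix is ' ' for the first row and ',' afterwards. */
static void
framing_row(JsonFraming *st, char *buf, size_t cap, const char *json_obj)
{
    if (st->force_array)
    {
        append(buf, cap, st->row_delim_needed ? "," : " ");
        st->row_delim_needed = true;
    }
    append(buf, cap, json_obj);
    append(buf, cap, "\n");
}

/* End: emit the closing bracket. */
static void
framing_end(JsonFraming *st, char *buf, size_t cap)
{
    if (st->force_array)
        append(buf, cap, "]\n");
}
```

Putting the comma at the start of the following row, rather than at the end of the previous one, means no row has to know whether it is the last. An empty result set still produces the two lines "[" and "]", which is a valid empty array.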
doc/src/sgml/ref/copy.sgml | 30 ++++++++++++++++++++++
src/backend/commands/copy.c | 13 ++++++++++
src/backend/commands/copyto.c | 41 ++++++++++++++++++++++++++++--
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy.out | 33 ++++++++++++++++++++++++
src/test/regress/sql/copy.sql | 10 ++++++++
7 files changed, 127 insertions(+), 3 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 75f55bbf6f8..a79587f7613 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -40,6 +40,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
HEADER [ <replaceable class="parameter">boolean</replaceable> | <replaceable class="parameter">integer</replaceable> | MATCH ]
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
+ FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -366,6 +367,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry id="sql-copy-params-force-array">
+ <term><literal>FORCE_ARRAY</literal></term>
+ <listitem>
+ <para>
+ Force output of square brackets as array decorations at the beginning
+ and end of output, and commas between the rows. It is allowed only in
+ <command>COPY TO</command>, and only when using
+ <literal>json</literal> format. The default is
+ <literal>false</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="sql-copy-params-force-quote">
<term><literal>FORCE_QUOTE</literal></term>
<listitem>
@@ -1103,6 +1117,22 @@ COPY country TO STDOUT (DELIMITER '|');
</programlisting>
</para>
+<para>
+ When the <literal>FORCE_ARRAY</literal> option is enabled,
+ the entire output is wrapped in a single JSON array with rows separated by commas:
+<programlisting>
+COPY (SELECT * FROM (VALUES(1),(2)) val(id)) TO STDOUT (FORMAT JSON, FORCE_ARRAY);
+</programlisting>
+The output is as follows:
+<screen>
+[
+ {"id":1}
+,{"id":2}
+]
+</screen>
+</para>
+
+
<para>
To copy data from a file into the <literal>country</literal> table:
<programlisting>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 29c121c7f08..84254d46a67 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -569,6 +569,7 @@ ProcessCopyOptions(ParseState *pstate,
bool on_error_specified = false;
bool log_verbosity_specified = false;
bool reject_limit_specified = false;
+ bool force_array_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -725,6 +726,13 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "force_array") == 0)
+ {
+ if (force_array_specified)
+ errorConflictingDefElem(defel, pstate);
+ force_array_specified = true;
+ opts_out->force_array = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "on_error") == 0)
{
if (on_error_specified)
@@ -967,6 +975,11 @@ ProcessCopyOptions(ParseState *pstate,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY %s mode cannot be used with %s", "JSON", "COPY FROM"));
+ if (opts_out->format != COPY_FORMAT_JSON && opts_out->force_array)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY %s can only be used with JSON mode", "FORCE_ARRAY"));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 6971f4b85af..38fbf7d4424 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -87,6 +87,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ bool json_row_delim_needed; /* need delimiter before next row */
StringInfo json_buf; /* reusable buffer for JSON output, it is
* initliazed in BeginCopyTo */
copy_data_dest_cb data_dest_cb; /* function for writing data */
@@ -136,6 +137,7 @@ static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToJsonEnd(CopyToState cstate);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -177,7 +179,7 @@ static const CopyToRoutine CopyToRoutineJson = {
.CopyToStart = CopyToTextLikeStart,
.CopyToOutFunc = CopyToTextLikeOutFunc,
.CopyToOneRow = CopyToJsonOneRow,
- .CopyToEnd = CopyToTextLikeEnd,
+ .CopyToEnd = CopyToJsonEnd,
};
/* binary format */
@@ -243,6 +245,18 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
CopySendTextLikeEndOfRow(cstate);
}
+
+ if (cstate->opts.format == COPY_FORMAT_JSON)
+ {
+ /*
+ * If FORCE_ARRAY has been specified, send the opening bracket.
+ */
+ if (cstate->opts.force_array)
+ {
+ CopySendChar(cstate, '[');
+ CopySendTextLikeEndOfRow(cstate);
+ }
+ }
}
/*
@@ -319,13 +333,24 @@ CopyToTextLikeOneRow(CopyToState cstate,
CopySendTextLikeEndOfRow(cstate);
}
-/* Implementation of the end callback for text, CSV, and json formats */
+/* Implementation of the end callback for text and CSV formats */
static void
CopyToTextLikeEnd(CopyToState cstate)
{
/* Nothing to do here */
}
+/* Implementation of the end callback for json format */
+static void
+CopyToJsonEnd(CopyToState cstate)
+{
+ if (cstate->opts.force_array)
+ {
+ CopySendChar(cstate, ']');
+ CopySendTextLikeEndOfRow(cstate);
+ }
+}
+
/* Implementation of per-row callback for json format */
static void
CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
@@ -347,6 +372,18 @@ CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
rowdata = ExecFetchSlotHeapTupleDatum(slot);
composite_to_json(rowdata, cstate->json_buf, false);
+ if (cstate->opts.force_array)
+ {
+ if (cstate->json_row_delim_needed)
+ CopySendChar(cstate, ',');
+ else
+ {
+ /* first row needs no delimiter */
+ CopySendChar(cstate, ' ');
+ cstate->json_row_delim_needed = true;
+ }
+ }
+
CopySendData(cstate, cstate->json_buf->data, cstate->json_buf->len);
CopySendTextLikeEndOfRow(cstate);
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index bb82bdbcc48..00458cfb4bc 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1232,7 +1232,7 @@ Copy_common_options, "DEFAULT", "FORCE_NOT_NULL", "FORCE_NULL", "FREEZE", \
/* COPY TO options */
#define Copy_to_options \
-Copy_common_options, "FORCE_QUOTE"
+Copy_common_options, "FORCE_QUOTE", "FORCE_ARRAY"
/*
* These object types were introduced later than our support cutoff of
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 2b5bef6738e..abecfe51098 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -88,6 +88,7 @@ typedef struct CopyFormatOptions
List *force_notnull; /* list of column names */
bool force_notnull_all; /* FORCE_NOT_NULL *? */
bool *force_notnull_flags; /* per-column CSV FNN flags */
+ bool force_array; /* add JSON array decorations */
List *force_null; /* list of column names */
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 72011f3492c..e1d51335e33 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -80,6 +80,16 @@ copy (select 1 union all select 2) to stdout with (format json);
copy (values (1), (2)) TO stdout with (format json);
{"column1":1}
{"column1":2}
+copy (select 1 union all select 2) to stdout with (format json, force_array true);
+[
+ {"?column?":1}
+,{"?column?":2}
+]
+copy (values (1), (2)) TO stdout with (format json, force_array true);
+[
+ {"column1":1}
+,{"column1":2}
+]
copy copytest to stdout json;
{"style":"DOS","test":"abc\r\ndef","filler":1}
{"style":"Unix","test":"abc\ndef","filler":2}
@@ -127,6 +137,29 @@ ERROR: COPY JSON mode cannot be used with COPY FROM
copy copytest (style) to stdout (format json);
ERROR: column selection is not supported in JSON mode
-- all of the above should yield error
+-- should fail: force_array requires json format
+copy copytest to stdout (format csv, force_array true);
+ERROR: COPY FORCE_ARRAY can only be used with JSON mode
+-- force_array variants
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index a6c923ef5ab..764d19f4947 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -85,6 +85,8 @@ copy copytest3 to stdout csv header;
--- test copying in JSON mode with various styles
copy (select 1 union all select 2) to stdout with (format json);
copy (values (1), (2)) TO stdout with (format json);
+copy (select 1 union all select 2) to stdout with (format json, force_array true);
+copy (values (1), (2)) TO stdout with (format json, force_array true);
copy copytest to stdout json;
copy copytest to stdout (format json);
copy (select * from copytest) to stdout (format json);
@@ -106,6 +108,14 @@ copy copytest from stdin(format json);
copy copytest (style) to stdout (format json);
-- all of the above should yield error
+-- should fail: force_array requires json format
+copy copytest to stdout (format csv, force_array true);
+
+-- force_array variants
+copy copytest to stdout (format json, force_array);
+copy copytest to stdout (format json, force_array true);
+copy copytest to stdout (format json, force_array false);
+
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
--
2.34.1
[text/x-patch] v28-0004-COPY-TO-JSON-support-column-lists.patch (12.7K, 5-v28-0004-COPY-TO-JSON-support-column-lists.patch)
From 365f340a0b733a3d9b5fdf540a2623c3ea9d4d8d Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Sun, 8 Mar 2026 23:58:29 +0800
Subject: [PATCH v28 4/4] COPY TO JSON: support column lists
When a column list is specified (e.g. COPY t (a, b) TO ... FORMAT json),
build a projected TupleDesc containing only the selected columns and
form a new tuple per row via heap_form_tuple(), so that composite_to_json()
emits the correct column names and values.
Use HeapTupleHeaderGetDatum() directly on the formed tuple rather than
heap_copy_tuple_as_datum(), since heap_form_tuple() already stamps the
datum-length, type-id, and type-mod fields on t_data, avoiding an
unnecessary palloc+memcpy per row.
Add regression tests covering column lists with diverse data types
including json, jsonb, int[], numeric, boolean, timestamp, and text,
exercising various column subsets and NULL handling.
Author: Andrew Dunstan <[email protected]>
Reviewed-by: jian he <[email protected]>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
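The projection step in the commit message can be pictured with a small standalone model. This is a sketch under simplifying assumptions: plain long values stand in for Datum, a C array stands in for cstate->attnumlist, and project_columns() is a hypothetical name. The real code then hands the projected arrays to heap_form_tuple() together with the projected TupleDesc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper modelling the column-list loop: copy the values and
 * null flags of the selected attributes (1-based attnums, in column-list
 * order) into consecutive positions of the output arrays.
 */
static void
project_columns(const long *values, const bool *isnull, int natts,
                const int *attnums, int nselected,
                long *projvalues, bool *projnulls)
{
    for (int i = 0; i < nselected; i++)
    {
        int         attnum = attnums[i];

        /* attnums index the source row, so they must be within range */
        assert(attnum >= 1 && attnum <= natts);
        projvalues[i] = values[attnum - 1];
        projnulls[i] = isnull[attnum - 1];
    }
}
```

Because the output arrays are filled in list order, position i of the projected row lines up with entry i + 1 of the projected descriptor, which is what lets composite_to_json() pair each value with the right column name.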
src/backend/commands/copyto.c | 105 ++++++++++++++++++++++++-----
src/test/regress/expected/copy.out | 73 +++++++++++++++++++-
src/test/regress/sql/copy.sql | 40 ++++++++++-
3 files changed, 197 insertions(+), 21 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 38fbf7d4424..faa8e323f56 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -88,8 +88,13 @@ typedef struct CopyToStateData
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
bool json_row_delim_needed; /* need delimiter before next row */
- StringInfo json_buf; /* reusable buffer for JSON output, it is
- * initliazed in BeginCopyTo */
+ StringInfo json_buf; /* reusable buffer for JSON output,
+ * initialized in BeginCopyTo */
+ TupleDesc tupDesc; /* Descriptor for JSON output; for a column
+ * list this is a projected descriptor */
+ Datum *json_projvalues; /* pre-allocated projection values, or
+ * NULL */
+ bool *json_projnulls; /* pre-allocated projection nulls, or NULL */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -357,19 +362,53 @@ CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
{
Datum rowdata;
- /*
- * composite_to_json() requires a stable TupleDesc. Since the slot's
- * descriptor (slot->tts_tupleDescriptor) can change during the execution
- * of a SELECT query, we use cstate->queryDesc->tupDesc instead. This
- * precaution is only necessary when the output slot's TupleDesc is of
- * type RECORDOID.
- */
- if (!cstate->rel && slot->tts_tupleDescriptor->tdtypeid == RECORDOID)
- slot->tts_tupleDescriptor = cstate->queryDesc->tupDesc;
-
resetStringInfo(cstate->json_buf);
- rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ if (cstate->json_projvalues != NULL)
+ {
+ /*
+ * Column list case: project selected column values into sequential
+ * positions matching the custom TupleDesc, then form a new tuple.
+ */
+ HeapTuple tup;
+ int i = 0;
+
+ foreach_int(attnum, cstate->attnumlist)
+ {
+ cstate->json_projvalues[i] = slot->tts_values[attnum - 1];
+ cstate->json_projnulls[i] = slot->tts_isnull[attnum - 1];
+ i++;
+ }
+
+ tup = heap_form_tuple(cstate->tupDesc,
+ cstate->json_projvalues,
+ cstate->json_projnulls);
+
+ /*
+ * heap_form_tuple already stamps the datum-length, type-id, and
+ * type-mod fields on t_data, so we can use it directly as a composite
+ * Datum without the extra palloc+memcpy that heap_copy_tuple_as_datum
+ * would do. Any TOAST pointers in the projected values will be
+ * detoasted by the per-column output functions called from
+ * composite_to_json.
+ */
+ rowdata = HeapTupleGetDatum(tup);
+ }
+ else
+ {
+ /*
+ * Full table or query without column list. Ensure the slot uses
+ * cstate->tupDesc so that the datum is stamped with the right type;
+ * for queries whose output type is RECORDOID, this must be the blessed
+ * descriptor so that composite_to_json can look it up via
+ * lookup_rowtype_tupdesc.
+ */
+ if (!cstate->rel && slot->tts_tupleDescriptor->tdtypeid == RECORDOID)
+ slot->tts_tupleDescriptor = cstate->queryDesc->tupDesc;
+
+ rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ }
+
composite_to_json(rowdata, cstate->json_buf, false);
if (cstate->opts.force_array)
@@ -841,6 +880,7 @@ BeginCopyTo(ParseState *pstate,
tupDesc = RelationGetDescr(cstate->rel);
cstate->partitions = children;
+ cstate->tupDesc = tupDesc;
}
else
{
@@ -978,20 +1018,49 @@ BeginCopyTo(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
tupDesc = BlessTupleDesc(tupDesc);
+ cstate->tupDesc = tupDesc;
}
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
- /* JSON outputs whole rows; a column list doesn't make sense */
+ /* Set up JSON-specific state */
if (cstate->opts.format == COPY_FORMAT_JSON)
{
cstate->json_buf = makeStringInfo();
- if (attnamelist != NIL)
- ereport(ERROR,
- errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("column selection is not supported in JSON mode"));
+ if (attnamelist != NIL && rel)
+ {
+ int natts = list_length(cstate->attnumlist);
+ TupleDesc resultDesc;
+
+ /*
+ * Build a TupleDesc describing only the selected columns so that
+ * composite_to_json() emits the right column names and types.
+ */
+ resultDesc = CreateTemplateTupleDesc(natts);
+
+ foreach_int(attnum, cstate->attnumlist)
+ {
+ Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+ TupleDescInitEntry(resultDesc,
+ foreach_current_index(attnum) + 1,
+ NameStr(attr->attname),
+ attr->atttypid,
+ attr->atttypmod,
+ attr->attndims);
+ }
+
+ cstate->tupDesc = BlessTupleDesc(resultDesc);
+
+ /*
+ * Pre-allocate arrays for projecting selected column values into
+ * sequential positions matching the custom TupleDesc.
+ */
+ cstate->json_projvalues = palloc_array(Datum, natts);
+ cstate->json_projnulls = palloc_array(bool, natts);
+ }
}
num_phys_attrs = tupDesc->natts;
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index e1d51335e33..e44b4a1d79d 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -77,6 +77,9 @@ c1,"col with , comma","col with "" quote"
copy (select 1 union all select 2) to stdout with (format json);
{"?column?":1}
{"?column?":2}
+copy (select 1 as foo union all select 2) to stdout with (format json);
+{"foo":1}
+{"foo":2}
copy (values (1), (2)) TO stdout with (format json);
{"column1":1}
{"column1":2}
@@ -134,8 +137,6 @@ copy copytest to stdout (format json, reject_limit 1);
ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
copy copytest from stdin(format json);
ERROR: COPY JSON mode cannot be used with COPY FROM
-copy copytest (style) to stdout (format json);
-ERROR: column selection is not supported in JSON mode
-- all of the above should yield error
-- should fail: force_array requires json format
copy copytest to stdout (format csv, force_array true);
@@ -160,6 +161,74 @@ copy copytest to stdout (format json, force_array false);
{"style":"Unix","test":"abc\ndef","filler":2}
{"style":"Mac","test":"abc\rdef","filler":3}
{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- column list with json format
+copy copytest (style, filler) to stdout (format json);
+{"style":"DOS","filler":1}
+{"style":"Unix","filler":2}
+{"style":"Mac","filler":3}
+{"style":"esc\\ape","filler":4}
+copy copytest (style, filler) to stdout (format json, force_array true);
+[
+ {"style":"DOS","filler":1}
+,{"style":"Unix","filler":2}
+,{"style":"Mac","filler":3}
+,{"style":"esc\\ape","filler":4}
+]
+copy copytest (style, test, filler) to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+-- column list with diverse data types
+create temp table copyjsontest_types (
+ id int,
+ js json,
+ jsb jsonb,
+ arr int[],
+ n numeric(10,2),
+ b boolean,
+ ts timestamp,
+ t text);
+insert into copyjsontest_types values
+(1, '{"a":1}', '{"b":2}', '{1,2,3}', 3.14, true,
+ '2024-01-15 10:30:00', 'hello'),
+(2, '[1,null,"x"]', '{"nested":{"k":"v"}}', '{4,5}', -99.99, false,
+ '2024-06-30 23:59:59', 'world'),
+(3, 'null', 'null', '{}', null, null, null, null);
+-- full table
+copy copyjsontest_types to stdout (format json);
+{"id":1,"js":{"a":1},"jsb":{"b": 2},"arr":[1,2,3],"n":3.14,"b":true,"ts":"2024-01-15T10:30:00","t":"hello"}
+{"id":2,"js":[1,null,"x"],"jsb":{"nested": {"k": "v"}},"arr":[4,5],"n":-99.99,"b":false,"ts":"2024-06-30T23:59:59","t":"world"}
+{"id":3,"js":null,"jsb":null,"arr":[],"n":null,"b":null,"ts":null,"t":null}
+-- column subsets exercising each type
+copy copyjsontest_types (id, js, jsb) to stdout (format json);
+{"id":1,"js":{"a":1},"jsb":{"b": 2}}
+{"id":2,"js":[1,null,"x"],"jsb":{"nested": {"k": "v"}}}
+{"id":3,"js":null,"jsb":null}
+copy copyjsontest_types (id, arr, n, b) to stdout (format json);
+{"id":1,"arr":[1,2,3],"n":3.14,"b":true}
+{"id":2,"arr":[4,5],"n":-99.99,"b":false}
+{"id":3,"arr":[],"n":null,"b":null}
+copy copyjsontest_types (jsb, t) to stdout (format json);
+{"jsb":{"b": 2},"t":"hello"}
+{"jsb":{"nested": {"k": "v"}},"t":"world"}
+{"jsb":null,"t":null}
+copy copyjsontest_types (id, ts) to stdout (format json);
+{"id":1,"ts":"2024-01-15T10:30:00"}
+{"id":2,"ts":"2024-06-30T23:59:59"}
+{"id":3,"ts":null}
+-- single column: json and jsonb
+copy copyjsontest_types (js) to stdout (format json);
+{"js":{"a":1}}
+{"js":[1,null,"x"]}
+{"js":null}
+copy copyjsontest_types (jsb) to stdout (format json);
+{"jsb":{"b": 2}}
+{"jsb":{"nested": {"k": "v"}}}
+{"jsb":null}
+drop table copyjsontest_types;
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 764d19f4947..e4e70a82ecc 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -84,6 +84,7 @@ copy copytest3 to stdout csv header;
--- test copying in JSON mode with various styles
copy (select 1 union all select 2) to stdout with (format json);
+copy (select 1 as foo union all select 2) to stdout with (format json);
copy (values (1), (2)) TO stdout with (format json);
copy (select 1 union all select 2) to stdout with (format json, force_array true);
copy (values (1), (2)) TO stdout with (format json, force_array true);
@@ -105,7 +106,6 @@ copy copytest to stdout (format json, force_null *);
copy copytest to stdout (format json, on_error ignore);
copy copytest to stdout (format json, reject_limit 1);
copy copytest from stdin(format json);
-copy copytest (style) to stdout (format json);
-- all of the above should yield error
-- should fail: force_array requires json format
@@ -116,6 +116,44 @@ copy copytest to stdout (format json, force_array);
copy copytest to stdout (format json, force_array true);
copy copytest to stdout (format json, force_array false);
+-- column list with json format
+copy copytest (style, filler) to stdout (format json);
+copy copytest (style, filler) to stdout (format json, force_array true);
+copy copytest (style, test, filler) to stdout (format json, force_array true);
+
+-- column list with diverse data types
+create temp table copyjsontest_types (
+ id int,
+ js json,
+ jsb jsonb,
+ arr int[],
+ n numeric(10,2),
+ b boolean,
+ ts timestamp,
+ t text);
+
+insert into copyjsontest_types values
+(1, '{"a":1}', '{"b":2}', '{1,2,3}', 3.14, true,
+ '2024-01-15 10:30:00', 'hello'),
+(2, '[1,null,"x"]', '{"nested":{"k":"v"}}', '{4,5}', -99.99, false,
+ '2024-06-30 23:59:59', 'world'),
+(3, 'null', 'null', '{}', null, null, null, null);
+
+-- full table
+copy copyjsontest_types to stdout (format json);
+
+-- column subsets exercising each type
+copy copyjsontest_types (id, js, jsb) to stdout (format json);
+copy copyjsontest_types (id, arr, n, b) to stdout (format json);
+copy copyjsontest_types (jsb, t) to stdout (format json);
+copy copyjsontest_types (id, ts) to stdout (format json);
+
+-- single column: json and jsonb
+copy copyjsontest_types (js) to stdout (format json);
+copy copyjsontest_types (jsb) to stdout (format json);
+
+drop table copyjsontest_types;
+
-- embedded escaped characters
create temp table copyjsontest (
id bigserial,
--
2.34.1
[application/octet-stream] copyto_json_perfomance_test.nocfbot (1.7K, 6-copyto_json_perfomance_test.nocfbot)
* Re: Emitting JSON to file using COPY TO
@ 2026-03-08 19:44 Andrew Dunstan <[email protected]>
parent: jian he <[email protected]>
0 siblings, 1 reply; 28+ messages in thread
From: Andrew Dunstan @ 2026-03-08 19:44 UTC (permalink / raw)
To: jian he <[email protected]>; +Cc: Joe Conway <[email protected]>; Junwang Zhao <[email protected]>; Florents Tselai <[email protected]>; Andrey M. Borodin <[email protected]>; Dean Rasheed <[email protected]>; Daniel Verite <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 2026-03-08 Su 12:16 PM, jian he wrote:
> hi.
>
> V27-0002 is still not bullet-proof.
>
> drop table if exists t1;
> create table t1(a int);
> insert into t1 values (1);
> copy (select * from t1) to stdout json;
> {"a":1}
> WARNING: resource was not closed: TupleDesc 0x7171d0ca3440 (18239,-1)
>
> Also see ExecAssignScanProjectionInfo->ExecConditionalAssignProjectionInfo
> So in v28-0002, I changed to
> + /*
> + * composite_to_json() requires a stable TupleDesc. Since the slot's
> + * descriptor (slot->tts_tupleDescriptor) can change during the execution
> + * of a SELECT query, we use cstate->queryDesc->tupDesc instead. This
> + * precaution is only necessary when the output slot's TupleDesc is of
> + * type RECORDOID.
> + */
> + if (!cstate->rel && slot->tts_tupleDescriptor->tdtypeid == RECORDOID)
> + slot->tts_tupleDescriptor = cstate->queryDesc->tupDesc;
Hmm. But should we be scribbling on slot->tts_tupleDescriptor like that?
How about something like this?:
- * Full table or query without column list. Ensure the slot uses
- * cstate->tupDesc so that the datum is stamped with the right type;
- * for queries output type is RECORDOID this must be the blessed
- * descriptor so that composite_to_json can look it up via
- * lookup_rowtype_tupdesc.
+ * Full table or query without column list. For queries, the slot's
+ * TupleDesc may carry RECORDOID, which is not registered in the type
+ * cache and would cause composite_to_json's lookup_rowtype_tupdesc
+ * call to fail. Build a HeapTuple stamped with the blessed
+ * descriptor so the type can be looked up correctly.
*/
if (!cstate->rel && slot->tts_tupleDescriptor->tdtypeid == RECORDOID)
- slot->tts_tupleDescriptor = cstate->queryDesc->tupDesc;
+ {
+ HeapTuple tup;
- rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ tup = heap_form_tuple(cstate->tupDesc,
+ slot->tts_values,
+ slot->tts_isnull);
+ rowdata = HeapTupleGetDatum(tup);
+ }
+ else
+ {
+ rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ }
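To make the intent concrete, here is a minimal standalone sketch of the idea. It is not backend code: the Sketch* types and sketch_form_tuple() are hypothetical stand-ins for TupleDesc, TupleTableSlot and heap_form_tuple(), and the typmod values are made up for illustration. It only models the point that the new tuple takes its type stamp from the blessed descriptor, while the slot's own descriptor pointer is left untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Type OID of the anonymous "record" pseudo-type in PostgreSQL. */
#define SKETCH_RECORDOID 2249u

/* Hypothetical, simplified stand-in for a TupleDesc. */
typedef struct SketchDesc
{
    unsigned    type_id;
    int         typmod;     /* -1 means "not blessed" in this sketch */
} SketchDesc;

/* Hypothetical, simplified stand-in for a TupleTableSlot. */
typedef struct SketchSlot
{
    const SketchDesc *desc;
    const long *values;
    size_t      natts;
} SketchSlot;

/* Hypothetical, simplified stand-in for a composite datum. */
typedef struct SketchTuple
{
    unsigned    type_id;
    int         typmod;
    const long *values;
    size_t      natts;
} SketchTuple;

/*
 * Form a tuple from the slot's values, stamped with the blessed
 * descriptor's identity.  The slot is only read, never modified.
 */
static SketchTuple
sketch_form_tuple(const SketchDesc *blessed, const SketchSlot *slot)
{
    SketchTuple tup;

    assert(blessed != NULL && slot != NULL);

    tup.type_id = blessed->type_id;
    tup.typmod = blessed->typmod;
    tup.values = slot->values;
    tup.natts = slot->natts;
    return tup;
}
```

The design choice being illustrated is simply "build a new, correctly stamped datum" rather than "repoint a field the executor owns".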
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
* Re: Emitting JSON to file using COPY TO
@ 2026-03-09 03:48 jian he <[email protected]>
parent: Andrew Dunstan <[email protected]>
0 siblings, 1 reply; 28+ messages in thread
From: jian he @ 2026-03-09 03:48 UTC (permalink / raw)
To: Andrew Dunstan <[email protected]>; +Cc: Joe Conway <[email protected]>; Junwang Zhao <[email protected]>; Florents Tselai <[email protected]>; Andrey M. Borodin <[email protected]>; Dean Rasheed <[email protected]>; Daniel Verite <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On Mon, Mar 9, 2026 at 3:44 AM Andrew Dunstan <[email protected]> wrote:
>
> Hmm. But should we be scribbling on slot->tts_tupleDescriptor like that?
> How about something like this?:
>
> - * Full table or query without column list. Ensure the slot uses
> - * cstate->tupDesc so that the datum is stamped with the right type;
> - * for queries output type is RECORDOID this must be the blessed
> - * descriptor so that composite_to_json can look it up via
> - * lookup_rowtype_tupdesc.
> + * Full table or query without column list. For queries, the slot's
> + * TupleDesc may carry RECORDOID, which is not registered in the type
> + * cache and would cause composite_to_json's lookup_rowtype_tupdesc
> + * call to fail. Build a HeapTuple stamped with the blessed
> + * descriptor so the type can be looked up correctly.
> */
> if (!cstate->rel && slot->tts_tupleDescriptor->tdtypeid == RECORDOID)
> - slot->tts_tupleDescriptor = cstate->queryDesc->tupDesc;
> + {
> + HeapTuple tup;
>
> - rowdata = ExecFetchSlotHeapTupleDatum(slot);
> + tup = heap_form_tuple(cstate->tupDesc,
> + slot->tts_values,
> + slot->tts_isnull);
> + rowdata = HeapTupleGetDatum(tup);
> + }
> + else
> + {
> + rowdata = ExecFetchSlotHeapTupleDatum(slot);
> + }
>
This is better. I tried to get rid of json_projvalues and json_projnulls
by just using heap_form_tuple, but that doesn't work.
I incorporated the v28-0004 COPY column list into v29-0002.
With this patch set, four fields are added to struct CopyToStateData:
+ StringInfo json_buf; /* reusable buffer for JSON output,
+ * initialized in BeginCopyTo */
+ TupleDesc tupDesc; /* Descriptor for JSON output; for a column
+ * list this is a projected descriptor */
+ Datum *json_projvalues; /* pre-allocated projection values, or
+ * NULL */
+ bool *json_projnulls; /* pre-allocated projection nulls, or NULL */
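As a rough illustration of what the two projection arrays are for, here is a standalone sketch of the per-row projection step. It is not the patch code: SketchDatum and sketch_project_columns() are hypothetical names, and plain C arrays stand in for slot->tts_values, slot->tts_isnull and attnumlist. The only thing it is meant to show is the mapping from 1-based attribute numbers to sequential output positions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for Datum. */
typedef long SketchDatum;

/*
 * Copy the columns named by attnums[] (1-based, like attnumlist) from the
 * full-width values/nulls arrays into positions 0..natts-1 of the output
 * arrays, which the caller allocates once and reuses for every row.
 */
static void
sketch_project_columns(const SketchDatum *values, const bool *nulls,
                       size_t nphys,
                       const int *attnums, size_t natts,
                       SketchDatum *out_values, bool *out_nulls)
{
    for (size_t i = 0; i < natts; i++)
    {
        int         attnum = attnums[i];

        /* attribute numbers are 1-based and must be in range */
        assert(attnum >= 1 && (size_t) attnum <= nphys);

        out_values[i] = values[attnum - 1];
        out_nulls[i] = nulls[attnum - 1];
    }
}
```

Allocating the output arrays once up front, rather than per row, is the reason the fields live in the state struct.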
Using the script in
https://www.postgresql.org/message-id/CACJufxFFZqxC3p4WjpTEi4riaJm%3DpADX%2Bpy0yQ0%3DRWTn5cqK3Q%40ma...
I tested it again on macOS and Linux, and there are no regressions for
COPY TO with the TEXT and CSV formats.
--
jian
https://www.enterprisedb.com/
Attachments:
[text/x-patch] v29-0002-json-format-for-COPY-TO.patch (29.3K, 2-v29-0002-json-format-for-COPY-TO.patch)
download | inline diff:
From e9fdfae2f0829b2af2e5ee4047230ec124bd72b9 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Mon, 9 Mar 2026 10:21:32 +0800
Subject: [PATCH v29 2/3] json format for COPY TO
This introduces the JSON format option for the COPY TO command, allowing users
to export query results or table data directly as a single JSON object or a
stream of JSON objects.
The JSON format is currently supported only for COPY TO operations; it
is not available for COPY FROM.
JSON format is incompatible with some standard text/CSV parsing or formatting options,
including:
- HEADER
- DEFAULT
- NULL
- DELIMITER
- FORCE QUOTE / FORCE NOT NULL
Regression tests covering valid JSON exports and error handling for
incompatible options have been added to src/test/regress/sql/copy.sql.
Author: Joe Conway <[email protected]>
Author: jian he <[email protected]>
Author: Andrew Dunstan <[email protected]>
Reviewed-by: "Andrey M. Borodin" <[email protected]>
Reviewed-by: Dean Rasheed <[email protected]>
Reviewed-by: Daniel Verite <[email protected]>
Reviewed-by: Davin Shearer <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Alvaro Herrera <[email protected]>
Reviewed-by: Junwang Zhao <[email protected]>
Discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
Discussion: https://postgr.es/m/[email protected]
---
doc/src/sgml/ref/copy.sgml | 13 ++-
src/backend/commands/copy.c | 49 ++++++---
src/backend/commands/copyto.c | 161 +++++++++++++++++++++++++++--
src/backend/parser/gram.y | 8 ++
src/backend/utils/adt/json.c | 5 +-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/include/utils/json.h | 2 +
src/test/regress/expected/copy.out | 146 ++++++++++++++++++++++++++
src/test/regress/sql/copy.sql | 88 ++++++++++++++++
10 files changed, 444 insertions(+), 31 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 0ad890ef95f..75f55bbf6f8 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -228,10 +228,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Selects the data format to be read or written:
<literal>text</literal>,
<literal>csv</literal> (Comma Separated Values),
+ <literal>json</literal> (JavaScript Object Notation),
or <literal>binary</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
+ <para>
+ The <literal>json</literal> option is allowed only in
+ <command>COPY TO</command>.
+ </para>
</listitem>
</varlistentry>
@@ -266,7 +271,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
(line) of the file. The default is a tab character in text format,
a comma in <literal>CSV</literal> format.
This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -280,7 +285,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
<note>
@@ -303,7 +308,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ not using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -330,7 +335,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<command>COPY FROM</command> commands.
</para>
<para>
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2f46be516f2..29c121c7f08 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -597,6 +597,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "json") == 0)
+ opts_out->format = COPY_FORMAT_JSON;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -756,21 +758,32 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
+ if (opts_out->delim &&
+ (opts_out->format == COPY_FORMAT_BINARY ||
+ opts_out->format == COPY_FORMAT_JSON))
ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+ errcode(ERRCODE_SYNTAX_ERROR),
+ opts_out->format == COPY_FORMAT_BINARY
+ ? errmsg("cannot specify %s in BINARY mode", "DELIMITER")
+ : errmsg("cannot specify %s in JSON mode", "DELIMITER"));
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
+ if (opts_out->null_print &&
+ (opts_out->format == COPY_FORMAT_BINARY ||
+ opts_out->format == COPY_FORMAT_JSON))
ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
+ errcode(ERRCODE_SYNTAX_ERROR),
+ opts_out->format == COPY_FORMAT_BINARY
+ ? errmsg("cannot specify %s in BINARY mode", "NULL")
+ : errmsg("cannot specify %s in JSON mode", "NULL"));
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
+ if (opts_out->default_print &&
+ (opts_out->format == COPY_FORMAT_BINARY ||
+ opts_out->format == COPY_FORMAT_JSON))
ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ errcode(ERRCODE_SYNTAX_ERROR),
+ opts_out->format == COPY_FORMAT_BINARY
+ ? errmsg("cannot specify %s in BINARY mode", "DEFAULT")
+ : errmsg("cannot specify %s in JSON mode", "DEFAULT"));
/* Set defaults for omitted options */
if (!opts_out->delim)
@@ -836,11 +849,15 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
+ if (opts_out->header_line != COPY_HEADER_FALSE &&
+ (opts_out->format == COPY_FORMAT_BINARY ||
+ opts_out->format == COPY_FORMAT_JSON))
ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ opts_out->format == COPY_FORMAT_BINARY
+ ? errmsg("cannot specify %s in BINARY mode", "HEADER")
+ : errmsg("cannot specify %s in JSON mode", "HEADER"));
/* Check quote */
if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
@@ -944,6 +961,12 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY %s cannot be used with %s", "FREEZE",
"COPY TO")));
+ /* Check json format */
+ if (opts_out->format == COPY_FORMAT_JSON && is_from)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY %s mode cannot be used with %s", "JSON", "COPY FROM"));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index b0ee91fc9c1..9d8d8318957 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -26,6 +26,7 @@
#include "executor/execdesc.h"
#include "executor/executor.h"
#include "executor/tuptable.h"
+#include "funcapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
@@ -33,6 +34,7 @@
#include "pgstat.h"
#include "storage/fd.h"
#include "tcop/tcopprot.h"
+#include "utils/json.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
@@ -85,6 +87,13 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ StringInfo json_buf; /* reusable buffer for JSON output,
+ * initialized in BeginCopyTo */
+ TupleDesc tupDesc; /* Descriptor for JSON output; for a column
+ * list this is a projected descriptor */
+ Datum *json_projvalues; /* pre-allocated projection values, or
+ * NULL */
+ bool *json_projnulls; /* pre-allocated projection nulls, or NULL */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -131,6 +140,7 @@ static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -149,9 +159,6 @@ static void CopySendInt16(CopyToState cstate, int16 val);
/*
* COPY TO routines for built-in formats.
- *
- * CSV and text formats share the same TextLike routines except for the
- * one-row callback.
*/
/* text format */
@@ -170,6 +177,14 @@ static const CopyToRoutine CopyToRoutineCSV = {
.CopyToEnd = CopyToTextLikeEnd,
};
+/* json format */
+static const CopyToRoutine CopyToRoutineJson = {
+ .CopyToStart = CopyToTextLikeStart,
+ .CopyToOutFunc = CopyToTextLikeOutFunc,
+ .CopyToOneRow = CopyToJsonOneRow,
+ .CopyToEnd = CopyToTextLikeEnd,
+};
+
/* binary format */
static const CopyToRoutine CopyToRoutineBinary = {
.CopyToStart = CopyToBinaryStart,
@@ -186,12 +201,14 @@ CopyToGetRoutine(const CopyFormatOptions *opts)
return &CopyToRoutineCSV;
else if (opts->format == COPY_FORMAT_BINARY)
return &CopyToRoutineBinary;
+ else if (opts->format == COPY_FORMAT_JSON)
+ return &CopyToRoutineJson;
/* default is text */
return &CopyToRoutineText;
}
-/* Implementation of the start callback for text and CSV formats */
+/* Implementation of the start callback for text, CSV, and json formats */
static void
CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
{
@@ -210,6 +227,8 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
ListCell *cur;
bool hdr_delim = false;
+ Assert(cstate->opts.format != COPY_FORMAT_JSON);
+
foreach(cur, cstate->attnumlist)
{
int attnum = lfirst_int(cur);
@@ -232,7 +251,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
}
/*
- * Implementation of the outfunc callback for text and CSV formats. Assign
+ * Implementation of the outfunc callback for text, CSV, and json formats. Assign
* the output function data to the given *finfo.
*/
static void
@@ -305,13 +324,79 @@ CopyToTextLikeOneRow(CopyToState cstate,
CopySendTextLikeEndOfRow(cstate);
}
-/* Implementation of the end callback for text and CSV formats */
+/* Implementation of the end callback for text, CSV, and json formats */
static void
CopyToTextLikeEnd(CopyToState cstate)
{
/* Nothing to do here */
}
+/* Implementation of per-row callback for json format */
+static void
+CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+ Datum rowdata;
+
+ resetStringInfo(cstate->json_buf);
+
+ if (cstate->json_projvalues != NULL)
+ {
+ /*
+ * Column list case: project selected column values into sequential
+ * positions matching the custom TupleDesc, then form a new tuple.
+ */
+ HeapTuple tup;
+ int i = 0;
+
+ foreach_int(attnum, cstate->attnumlist)
+ {
+ cstate->json_projvalues[i] = slot->tts_values[attnum - 1];
+ cstate->json_projnulls[i] = slot->tts_isnull[attnum - 1];
+ i++;
+ }
+
+ tup = heap_form_tuple(cstate->tupDesc,
+ cstate->json_projvalues,
+ cstate->json_projnulls);
+
+ /*
+ * heap_form_tuple already stamps the datum-length, type-id, and
+ * type-mod fields on t_data, so we can use it directly as a composite
+ * Datum without the extra palloc and memcpy that heap_copy_tuple_as_datum
+ * would do. Any TOAST pointers in the projected values will be
+ * detoasted by the per-column output functions called from
+ * composite_to_json.
+ */
+ rowdata = HeapTupleGetDatum(tup);
+ }
+ else
+ {
+ /*
+ * Full table or query without column list. For queries, the slot's
+ * TupleDesc may carry RECORDOID, which is not registered in the type
+ * cache and would cause composite_to_json's lookup_rowtype_tupdesc
+ * call to fail. Build a HeapTuple stamped with the blessed
+ * descriptor so the type can be looked up correctly.
+ */
+ if (!cstate->rel && slot->tts_tupleDescriptor->tdtypeid == RECORDOID)
+ {
+ HeapTuple tup = heap_form_tuple(cstate->tupDesc,
+ slot->tts_values,
+ slot->tts_isnull);
+
+ rowdata = HeapTupleGetDatum(tup);
+ }
+ else
+ rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ }
+
+ composite_to_json(rowdata, cstate->json_buf, false);
+
+ CopySendData(cstate, cstate->json_buf->data, cstate->json_buf->len);
+
+ CopySendTextLikeEndOfRow(cstate);
+}
+
/*
* Implementation of the start callback for binary format. Send a header
* for a binary copy.
@@ -403,9 +488,23 @@ SendCopyBegin(CopyToState cstate)
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
pq_sendbyte(&buf, format); /* overall format */
- pq_sendint16(&buf, natts);
- for (i = 0; i < natts; i++)
- pq_sendint16(&buf, format); /* per-column formats */
+ if (cstate->opts.format != COPY_FORMAT_JSON)
+ {
+ pq_sendint16(&buf, natts);
+ for (i = 0; i < natts; i++)
+ pq_sendint16(&buf, format); /* per-column formats */
+ }
+ else
+ {
+ /*
+ * For JSON format, report one text-format column. Each CopyData
+ * message contains one complete JSON object, not individual column
+ * values, so the per-column count is always 1.
+ */
+ pq_sendint16(&buf, 1);
+ pq_sendint16(&buf, 0);
+ }
+
pq_endmessage(&buf);
cstate->copy_dest = COPY_FRONTEND;
}
@@ -507,7 +606,7 @@ CopySendEndOfRow(CopyToState cstate)
}
/*
- * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
+ * Wrapper function of CopySendEndOfRow for text, CSV, and json formats. Sends the
* line termination and do common appropriate things for the end of row.
*/
static inline void
@@ -750,6 +849,7 @@ BeginCopyTo(ParseState *pstate,
tupDesc = RelationGetDescr(cstate->rel);
cstate->partitions = children;
+ cstate->tupDesc = tupDesc;
}
else
{
@@ -886,11 +986,52 @@ BeginCopyTo(ParseState *pstate,
ExecutorStart(cstate->queryDesc, 0);
tupDesc = cstate->queryDesc->tupDesc;
+ tupDesc = BlessTupleDesc(tupDesc);
+ cstate->tupDesc = tupDesc;
}
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Set up JSON-specific state */
+ if (cstate->opts.format == COPY_FORMAT_JSON)
+ {
+ cstate->json_buf = makeStringInfo();
+
+ if (attnamelist != NIL && rel)
+ {
+ int natts = list_length(cstate->attnumlist);
+ TupleDesc resultDesc;
+
+ /*
+ * Build a TupleDesc describing only the selected columns so that
+ * composite_to_json() emits the right column names and types.
+ */
+ resultDesc = CreateTemplateTupleDesc(natts);
+
+ foreach_int(attnum, cstate->attnumlist)
+ {
+ Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+ TupleDescInitEntry(resultDesc,
+ foreach_current_index(attnum) + 1,
+ NameStr(attr->attname),
+ attr->atttypid,
+ attr->atttypmod,
+ attr->attndims);
+ }
+
+ cstate->tupDesc = BlessTupleDesc(resultDesc);
+
+ /*
+ * Pre-allocate arrays for projecting selected column values
+ * into sequential positions matching the custom TupleDesc.
+ */
+ cstate->json_projvalues = palloc_array(Datum, natts);
+ cstate->json_projnulls = palloc_array(bool, natts);
+ }
+ }
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_QUOTE name list to per-column flags, check validity */
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 9cbe8eafc45..136fd19b854 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3612,6 +3612,10 @@ copy_opt_item:
{
$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
}
+ | JSON
+ {
+ $$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ }
| HEADER_P
{
$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3694,6 +3698,10 @@ copy_generic_opt_elem:
{
$$ = makeDefElem($1, $2, @1);
}
+ | FORMAT_LA copy_generic_opt_arg
+ {
+ $$ = makeDefElem("format", $2, @1);
+ }
;
copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 0b161398465..f609d7b9417 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -86,8 +86,6 @@ typedef struct JsonAggState
JsonUniqueBuilderState unique_check;
} JsonAggState;
-static void composite_to_json(Datum composite, StringInfo result,
- bool use_line_feeds);
static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
const Datum *vals, const bool *nulls, int *valcount,
JsonTypeCategory tcategory, Oid outfuncoid,
@@ -517,8 +515,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
/*
* Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
*/
-static void
+void
composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
{
HeapTupleHeader td;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 6484c6a3dd4..bb82bdbcc48 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3425,7 +3425,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (TailMatches("FORMAT"))
- COMPLETE_WITH("binary", "csv", "text");
+ COMPLETE_WITH("binary", "csv", "text", "json");
/* Complete COPY <sth> FROM|TO filename WITH (FREEZE */
else if (TailMatches("FREEZE"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 2430fb0b2e5..2b5bef6738e 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -57,6 +57,7 @@ typedef enum CopyFormat
COPY_FORMAT_TEXT = 0,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
+ COPY_FORMAT_JSON,
} CopyFormat;
/*
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index f8cc52b1e78..2f4be40518d 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
#include "lib/stringinfo.h"
/* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+ bool use_line_feeds);
extern void escape_json(StringInfo buf, const char *str);
extern void escape_json_with_len(StringInfo buf, const char *str, int len);
extern void escape_json_text(StringInfo buf, const text *txt);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index d0d563e0fa8..9b667544905 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -73,6 +73,152 @@ copy copytest3 to stdout csv header;
c1,"col with , comma","col with "" quote"
1,a,1
2,b,2
+--- test copying in JSON mode with various styles
+copy (select 1 union all select 2) to stdout with (format json);
+{"?column?":1}
+{"?column?":2}
+copy (select 1 as foo union all select 2) to stdout with (format json);
+{"foo":1}
+{"foo":2}
+copy (values (1), (2)) TO stdout with (format json);
+{"column1":1}
+{"column1":2}
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy (select * from copytest) to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+ERROR: cannot specify DELIMITER in JSON mode
+copy copytest to stdout (format json, null '\N');
+ERROR: cannot specify NULL in JSON mode
+copy copytest to stdout (format json, default '|');
+ERROR: cannot specify DEFAULT in JSON mode
+copy copytest to stdout (format json, header);
+ERROR: cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, header 1);
+ERROR: cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, quote '"');
+ERROR: COPY QUOTE requires CSV mode
+copy copytest to stdout (format json, escape '"');
+ERROR: COPY ESCAPE requires CSV mode
+copy copytest to stdout (format json, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
+copy copytest to stdout (format json, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+copy copytest to stdout (format json, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
+copy copytest to stdout (format json, on_error ignore);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: copy copytest to stdout (format json, on_error ignore);
+ ^
+copy copytest to stdout (format json, reject_limit 1);
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
+copy copytest from stdin(format json);
+ERROR: COPY JSON mode cannot be used with COPY FROM
+-- all of the above should yield error
+-- column list with json format
+copy copytest (style, test, filler) to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- column list with diverse data types
+create temp table copyjsontest_types (
+ id int,
+ js json,
+ jsb jsonb,
+ arr int[],
+ n numeric(10,2),
+ b boolean,
+ ts timestamp,
+ t text);
+insert into copyjsontest_types values
+(1, '{"a":1}', '{"b":2}', '{1,2,3}', 3.14, true,
+ '2024-01-15 10:30:00', 'hello'),
+(2, '[1,null,"x"]', '{"nested":{"k":"v"}}', '{4,5}', -99.99, false,
+ '2024-06-30 23:59:59', 'world'),
+(3, 'null', 'null', '{}', null, null, null, null);
+-- full table
+copy copyjsontest_types to stdout (format json);
+{"id":1,"js":{"a":1},"jsb":{"b": 2},"arr":[1,2,3],"n":3.14,"b":true,"ts":"2024-01-15T10:30:00","t":"hello"}
+{"id":2,"js":[1,null,"x"],"jsb":{"nested": {"k": "v"}},"arr":[4,5],"n":-99.99,"b":false,"ts":"2024-06-30T23:59:59","t":"world"}
+{"id":3,"js":null,"jsb":null,"arr":[],"n":null,"b":null,"ts":null,"t":null}
+-- column subsets exercising each type
+copy copyjsontest_types (id, js, jsb) to stdout (format json);
+{"id":1,"js":{"a":1},"jsb":{"b": 2}}
+{"id":2,"js":[1,null,"x"],"jsb":{"nested": {"k": "v"}}}
+{"id":3,"js":null,"jsb":null}
+copy copyjsontest_types (id, arr, n, b) to stdout (format json);
+{"id":1,"arr":[1,2,3],"n":3.14,"b":true}
+{"id":2,"arr":[4,5],"n":-99.99,"b":false}
+{"id":3,"arr":[],"n":null,"b":null}
+copy copyjsontest_types (jsb, t) to stdout (format json);
+{"jsb":{"b": 2},"t":"hello"}
+{"jsb":{"nested": {"k": "v"}},"t":"world"}
+{"jsb":null,"t":null}
+copy copyjsontest_types (id, ts) to stdout (format json);
+{"id":1,"ts":"2024-01-15T10:30:00"}
+{"id":2,"ts":"2024-06-30T23:59:59"}
+{"id":3,"ts":null}
+-- single column: json and jsonb
+copy copyjsontest_types (js) to stdout (format json);
+{"js":{"a":1}}
+{"js":[1,null,"x"]}
+{"js":null}
+copy copyjsontest_types (jsb) to stdout (format json);
+{"jsb":{"b": 2}}
+{"jsb":{"nested": {"k": "v"}}}
+{"jsb":null}
+drop table copyjsontest_types;
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
create temp table copytest4 (
c1 int,
"colname with tab: " text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 65cbdaf7f3e..404f4321085 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -82,6 +82,94 @@ this is just a line full of junk that would error out if parsed
copy copytest3 to stdout csv header;
+--- test copying in JSON mode with various styles
+copy (select 1 union all select 2) to stdout with (format json);
+copy (select 1 as foo union all select 2) to stdout with (format json);
+copy (values (1), (2)) TO stdout with (format json);
+copy copytest to stdout json;
+copy copytest to stdout (format json);
+copy (select * from copytest) to stdout (format json);
+
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+copy copytest to stdout (format json, null '\N');
+copy copytest to stdout (format json, default '|');
+copy copytest to stdout (format json, header);
+copy copytest to stdout (format json, header 1);
+copy copytest to stdout (format json, quote '"');
+copy copytest to stdout (format json, escape '"');
+copy copytest to stdout (format json, force_quote *);
+copy copytest to stdout (format json, force_not_null *);
+copy copytest to stdout (format json, force_null *);
+copy copytest to stdout (format json, on_error ignore);
+copy copytest to stdout (format json, reject_limit 1);
+copy copytest from stdin(format json);
+-- all of the above should yield error
+
+-- column list with json format
+copy copytest (style, test, filler) to stdout (format json);
+
+-- column list with diverse data types
+create temp table copyjsontest_types (
+ id int,
+ js json,
+ jsb jsonb,
+ arr int[],
+ n numeric(10,2),
+ b boolean,
+ ts timestamp,
+ t text);
+
+insert into copyjsontest_types values
+(1, '{"a":1}', '{"b":2}', '{1,2,3}', 3.14, true,
+ '2024-01-15 10:30:00', 'hello'),
+(2, '[1,null,"x"]', '{"nested":{"k":"v"}}', '{4,5}', -99.99, false,
+ '2024-06-30 23:59:59', 'world'),
+(3, 'null', 'null', '{}', null, null, null, null);
+
+-- full table
+copy copyjsontest_types to stdout (format json);
+
+-- column subsets exercising each type
+copy copyjsontest_types (id, js, jsb) to stdout (format json);
+copy copyjsontest_types (id, arr, n, b) to stdout (format json);
+copy copyjsontest_types (jsb, t) to stdout (format json);
+copy copyjsontest_types (id, ts) to stdout (format json);
+
+-- single column: json and jsonb
+copy copyjsontest_types (js) to stdout (format json);
+copy copyjsontest_types (jsb) to stdout (format json);
+
+drop table copyjsontest_types;
+
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
create temp table copytest4 (
c1 int,
"colname with tab: " text);
--
2.34.1
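The expected output for the "embedded escaped characters" test above shows which characters end up backslash-escaped in the JSON strings: the double quote, the backslash, and the \b, \f, \n, \r, \t control characters, while the forward slash is emitted as-is. A minimal standalone sketch of that escaping rule follows. It is not the backend's implementation; the names demo_put and demo_escape_json are hypothetical, and the \u00XX fallback for other control characters is an assumption based on the JSON grammar, not something these tests exercise.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append s at offset len, keeping room for a final NUL. */
static size_t
demo_put(char *out, size_t cap, size_t len, const char *s)
{
	size_t		n = strlen(s);

	assert(len + n < cap);
	memcpy(out + len, s, n);
	return len + n;
}

/*
 * Hypothetical sketch: write 'in' to 'out' as a quoted JSON string.
 * Returns the number of bytes written, not counting the terminating NUL.
 */
size_t
demo_escape_json(const char *in, char *out, size_t cap)
{
	size_t		len = 0;
	char		tmp[8];

	assert(in != NULL && out != NULL && cap > 0);

	len = demo_put(out, cap, len, "\"");
	for (const unsigned char *p = (const unsigned char *) in; *p; p++)
	{
		switch (*p)
		{
			case '"':
				len = demo_put(out, cap, len, "\\\"");
				break;
			case '\\':
				len = demo_put(out, cap, len, "\\\\");
				break;
			case '\b':
				len = demo_put(out, cap, len, "\\b");
				break;
			case '\f':
				len = demo_put(out, cap, len, "\\f");
				break;
			case '\n':
				len = demo_put(out, cap, len, "\\n");
				break;
			case '\r':
				len = demo_put(out, cap, len, "\\r");
				break;
			case '\t':
				len = demo_put(out, cap, len, "\\t");
				break;
			default:
				if (*p < 0x20)
					/* assumption: other control characters become \u00XX */
					snprintf(tmp, sizeof(tmp), "\\u%04x", (unsigned) *p);
				else
				{
					/* everything else, including '/', is copied unchanged */
					tmp[0] = (char) *p;
					tmp[1] = '\0';
				}
				len = demo_put(out, cap, len, tmp);
				break;
		}
	}
	len = demo_put(out, cap, len, "\"");
	out[len] = '\0';
	return len;
}
```

Calling demo_escape_json on the inputs from the test table should reproduce the f1 values seen in the expected output, for example the unescaped slash in "aaa/bbb".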
[text/x-patch] v29-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch (13.1K, 3-v29-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch)
From cc747df3929913078b85dd5fbee5a852aa7d0d53 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Wed, 21 Jan 2026 18:38:24 +0800
Subject: [PATCH v29 1/3] introduce CopyFormat refactor CopyFormatOptions
Currently, the COPY command format is determined by two boolean fields
(binary, csv_mode) in CopyFormatOptions. This approach, while functional, is
not ideal for implementing other formats in the future.
To simplify adding new formats, we've introduced an enum CopyFormat. This makes
the code cleaner and more maintainable, allowing for easier integration of
additional formats down the line.
The CopyFormat enum was originally contributed by Joel Jacobson <[email protected]>,
later refactored by Jian He to address various issues.
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
src/backend/commands/copy.c | 50 +++++++++++++++-------------
src/backend/commands/copyfrom.c | 6 ++--
src/backend/commands/copyfromparse.c | 7 ++--
src/backend/commands/copyto.c | 8 ++---
src/include/commands/copy.h | 13 ++++++--
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 49 insertions(+), 36 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 63b86802ba2..2f46be516f2 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -576,6 +576,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out = palloc0_object(CopyFormatOptions);
opts_out->file_encoding = -1;
+ /* default format */
+ opts_out->format = COPY_FORMAT_TEXT;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -590,11 +592,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -754,31 +756,31 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = (opts_out->format == COPY_FORMAT_CSV) ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = (opts_out->format == COPY_FORMAT_CSV) ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -826,7 +828,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -834,43 +836,43 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line != COPY_HEADER_FALSE)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -884,8 +886,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_notnull != NIL ||
+ opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -900,8 +902,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+ opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -925,7 +927,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -961,7 +963,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -978,7 +980,7 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2f42f55e229..4d927410159 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -156,9 +156,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
static const CopyFromRoutine *
CopyFromGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyFromRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyFromRoutineBinary;
/* default is text */
@@ -262,7 +262,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 84c8809a889..5f0c551e7ec 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -173,7 +173,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -751,7 +751,7 @@ bool
NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
return NextCopyFromRawFieldsInternal(cstate, fields, nfields,
- cstate->opts.csv_mode);
+ cstate->opts.format == COPY_FORMAT_CSV);
}
/*
@@ -778,7 +778,8 @@ NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields
bool done = false;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_FALSE)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d6ef7275a64..b0ee91fc9c1 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -182,9 +182,9 @@ static const CopyToRoutine CopyToRoutineBinary = {
static const CopyToRoutine *
CopyToGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyToRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyToRoutineBinary;
/* default is text */
@@ -221,7 +221,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -398,7 +398,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 877202af67b..2430fb0b2e5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -49,6 +49,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT = 0,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -59,9 +69,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
int header_line; /* number of lines to skip or COPY_HEADER_XXX
* value (see the above) */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3250564d4ff..520cdd36800 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -528,6 +528,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromRoutine
CopyFromState
--
2.34.1
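To make the refactoring in this patch easier to follow outside the backend tree, here is a reduced sketch of the same idea: a single enum field replaces the two booleans, and each former boolean test becomes an equality comparison. The enum is copied from the patch; DemoCopyOptions, demo_default_delim, and demo_wire_format are hypothetical stand-ins for CopyFormatOptions and for the expressions in ProcessCopyOptions and SendCopyBegin, not actual backend functions.

```c
#include <assert.h>
#include <stddef.h>

/* Same enum as in the patch. */
typedef enum CopyFormat
{
	COPY_FORMAT_TEXT = 0,
	COPY_FORMAT_BINARY,
	COPY_FORMAT_CSV,
} CopyFormat;

/* Hypothetical stand-in for CopyFormatOptions, reduced to one field. */
typedef struct DemoCopyOptions
{
	CopyFormat	format;			/* replaces 'binary' and 'csv_mode' */
} DemoCopyOptions;

/* Default delimiter: "," for CSV, tab otherwise (was: csv_mode ? "," : "\t"). */
const char *
demo_default_delim(const DemoCopyOptions *opts)
{
	assert(opts != NULL);
	return (opts->format == COPY_FORMAT_CSV) ? "," : "\t";
}

/* Overall format code for the wire message: 1 for binary, 0 otherwise. */
int
demo_wire_format(const DemoCopyOptions *opts)
{
	assert(opts != NULL);
	return (opts->format == COPY_FORMAT_BINARY) ? 1 : 0;
}
```

One consequence of the enum is that the impossible state "binary and csv_mode both true" can no longer be represented, and a further format only needs a new enum member plus its own comparisons.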
[text/x-patch] v29-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch (12.3K, 4-v29-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch)
From 510f1f0435bc74b80575b0b8af2134817fedfa05 Mon Sep 17 00:00:00 2001
From: jian he <[email protected]>
Date: Mon, 9 Mar 2026 10:27:37 +0800
Subject: [PATCH v29 3/3] Add option force_array for COPY JSON FORMAT
This adds the force_array option, which is available exclusively
when using COPY TO with the JSON format.
When enabled, this option wraps the output in a top-level JSON array
(enclosed in square brackets with comma-separated elements), making the
entire result a valid single JSON value. Without this option, the default
behavior is to output a stream of independent JSON objects.
Attempting to use this option with COPY FROM or with formats other than
JSON will raise an error.
Author: Joe Conway <[email protected]>
Author: jian he <[email protected]>
Reviewed-by: Junwang Zhao <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Florents Tselai <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/[email protected]
---
doc/src/sgml/ref/copy.sgml | 30 ++++++++++++++++++++++
src/backend/commands/copy.c | 13 ++++++++++
src/backend/commands/copyto.c | 41 ++++++++++++++++++++++++++++--
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy.out | 33 ++++++++++++++++++++++++
src/test/regress/sql/copy.sql | 10 ++++++++
7 files changed, 127 insertions(+), 3 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 75f55bbf6f8..a79587f7613 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -40,6 +40,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
HEADER [ <replaceable class="parameter">boolean</replaceable> | <replaceable class="parameter">integer</replaceable> | MATCH ]
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
+ FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -366,6 +367,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry id="sql-copy-params-force-array">
+ <term><literal>FORCE_ARRAY</literal></term>
+ <listitem>
+ <para>
+ Force output of square brackets as array decorations at the beginning
+ and end of output, and commas between the rows. It is allowed only in
+ <command>COPY TO</command>, and only when using
+ <literal>json</literal> format. The default is
+ <literal>false</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="sql-copy-params-force-quote">
<term><literal>FORCE_QUOTE</literal></term>
<listitem>
@@ -1103,6 +1117,22 @@ COPY country TO STDOUT (DELIMITER '|');
</programlisting>
</para>
+<para>
+ When the <literal>FORCE_ARRAY</literal> option is enabled,
+ the entire output is wrapped in a single JSON array with rows separated by commas:
+<programlisting>
+COPY (SELECT * FROM (VALUES(1),(2)) val(id)) TO STDOUT (FORMAT JSON, FORCE_ARRAY);
+</programlisting>
+The output is as follows:
+<screen>
+[
+ {"id":1}
+,{"id":2}
+]
+</screen>
+</para>
+
+
<para>
To copy data from a file into the <literal>country</literal> table:
<programlisting>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 29c121c7f08..84254d46a67 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -569,6 +569,7 @@ ProcessCopyOptions(ParseState *pstate,
bool on_error_specified = false;
bool log_verbosity_specified = false;
bool reject_limit_specified = false;
+ bool force_array_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -725,6 +726,13 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "force_array") == 0)
+ {
+ if (force_array_specified)
+ errorConflictingDefElem(defel, pstate);
+ force_array_specified = true;
+ opts_out->force_array = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "on_error") == 0)
{
if (on_error_specified)
@@ -967,6 +975,11 @@ ProcessCopyOptions(ParseState *pstate,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY %s mode cannot be used with %s", "JSON", "COPY FROM"));
+ if (opts_out->format != COPY_FORMAT_JSON && opts_out->force_array)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY %s can only be used with JSON mode", "FORCE_ARRAY"));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 9d8d8318957..85ca7c947f3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -87,6 +87,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ bool json_row_delim_needed; /* need delimiter before next row */
StringInfo json_buf; /* reusable buffer for JSON output,
* initialized in BeginCopyTo */
TupleDesc tupDesc; /* Descriptor for JSON output; for a column
@@ -141,6 +142,7 @@ static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToJsonEnd(CopyToState cstate);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -182,7 +184,7 @@ static const CopyToRoutine CopyToRoutineJson = {
.CopyToStart = CopyToTextLikeStart,
.CopyToOutFunc = CopyToTextLikeOutFunc,
.CopyToOneRow = CopyToJsonOneRow,
- .CopyToEnd = CopyToTextLikeEnd,
+ .CopyToEnd = CopyToJsonEnd,
};
/* binary format */
@@ -248,6 +250,18 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
CopySendTextLikeEndOfRow(cstate);
}
+
+ if (cstate->opts.format == COPY_FORMAT_JSON)
+ {
+ /*
+ * If FORCE_ARRAY has been specified, send the opening bracket.
+ */
+ if (cstate->opts.force_array)
+ {
+ CopySendChar(cstate, '[');
+ CopySendTextLikeEndOfRow(cstate);
+ }
+ }
}
/*
@@ -324,13 +338,24 @@ CopyToTextLikeOneRow(CopyToState cstate,
CopySendTextLikeEndOfRow(cstate);
}
-/* Implementation of the end callback for text, CSV, and json formats */
+/* Implementation of the end callback for text and CSV formats */
static void
CopyToTextLikeEnd(CopyToState cstate)
{
/* Nothing to do here */
}
+/* Implementation of the end callback for json format */
+static void
+CopyToJsonEnd(CopyToState cstate)
+{
+ if (cstate->opts.force_array)
+ {
+ CopySendChar(cstate, ']');
+ CopySendTextLikeEndOfRow(cstate);
+ }
+}
+
/* Implementation of per-row callback for json format */
static void
CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
@@ -392,6 +417,18 @@ CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
composite_to_json(rowdata, cstate->json_buf, false);
+ if (cstate->opts.force_array)
+ {
+ if (cstate->json_row_delim_needed)
+ CopySendChar(cstate, ',');
+ else
+ {
+ /* first row needs no delimiter */
+ CopySendChar(cstate, ' ');
+ cstate->json_row_delim_needed = true;
+ }
+ }
+
CopySendData(cstate, cstate->json_buf->data, cstate->json_buf->len);
CopySendTextLikeEndOfRow(cstate);
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index bb82bdbcc48..00458cfb4bc 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1232,7 +1232,7 @@ Copy_common_options, "DEFAULT", "FORCE_NOT_NULL", "FORCE_NULL", "FREEZE", \
/* COPY TO options */
#define Copy_to_options \
-Copy_common_options, "FORCE_QUOTE"
+Copy_common_options, "FORCE_QUOTE", "FORCE_ARRAY"
/*
* These object types were introduced later than our support cutoff of
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 2b5bef6738e..abecfe51098 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -88,6 +88,7 @@ typedef struct CopyFormatOptions
List *force_notnull; /* list of column names */
bool force_notnull_all; /* FORCE_NOT_NULL *? */
bool *force_notnull_flags; /* per-column CSV FNN flags */
+ bool force_array; /* add JSON array decorations */
List *force_null; /* list of column names */
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 9b667544905..bcf45845b61 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -83,6 +83,16 @@ copy (select 1 as foo union all select 2) to stdout with (format json);
copy (values (1), (2)) TO stdout with (format json);
{"column1":1}
{"column1":2}
+copy (select 1 union all select 2) to stdout with (format json, force_array true);
+[
+ {"?column?":1}
+,{"?column?":2}
+]
+copy (values (1), (2)) TO stdout with (format json, force_array true);
+[
+ {"column1":1}
+,{"column1":2}
+]
copy copytest to stdout json;
{"style":"DOS","test":"abc\r\ndef","filler":1}
{"style":"Unix","test":"abc\ndef","filler":2}
@@ -134,6 +144,29 @@ copy copytest (style, test, filler) to stdout (format json);
{"style":"Unix","test":"abc\ndef","filler":2}
{"style":"Mac","test":"abc\rdef","filler":3}
{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- should fail: force_array requires json format
+copy copytest to stdout (format csv, force_array true);
+ERROR: COPY FORCE_ARRAY can only be used with JSON mode
+-- force_array variants
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest(style, test) to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef"}
+,{"style":"Unix","test":"abc\ndef"}
+,{"style":"Mac","test":"abc\rdef"}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb"}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
-- column list with diverse data types
create temp table copyjsontest_types (
id int,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 404f4321085..bc12ac879ef 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -86,6 +86,8 @@ copy copytest3 to stdout csv header;
copy (select 1 union all select 2) to stdout with (format json);
copy (select 1 as foo union all select 2) to stdout with (format json);
copy (values (1), (2)) TO stdout with (format json);
+copy (select 1 union all select 2) to stdout with (format json, force_array true);
+copy (values (1), (2)) TO stdout with (format json, force_array true);
copy copytest to stdout json;
copy copytest to stdout (format json);
copy (select * from copytest) to stdout (format json);
@@ -109,6 +111,14 @@ copy copytest from stdin(format json);
-- column list with json format
copy copytest (style, test, filler) to stdout (format json);
+-- should fail: force_array requires json format
+copy copytest to stdout (format csv, force_array true);
+
+-- force_array variants
+copy copytest to stdout (format json, force_array);
+copy copytest(style, test) to stdout (format json, force_array true);
+copy copytest to stdout (format json, force_array false);
+
-- column list with diverse data types
create temp table copyjsontest_types (
id int,
--
2.34.1
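The framing logic that FORCE_ARRAY adds across the start, per-row, and end callbacks can be illustrated in isolation. The sketch below follows the patch's approach (opening bracket on its own line, a space before the first row, a comma before each later row, closing bracket at the end), but it is not backend code: DemoJsonWriter and the demo_* functions are hypothetical, rows are passed in as already-serialized JSON text, and a plain newline is assumed as the end-of-row marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical writer state; row_delim_needed mirrors json_row_delim_needed. */
typedef struct DemoJsonWriter
{
	char	   *buf;
	size_t		cap;
	size_t		len;
	bool		force_array;
	bool		row_delim_needed;
} DemoJsonWriter;

/* Append a NUL-terminated string, keeping the buffer NUL-terminated. */
static void
demo_append(DemoJsonWriter *w, const char *s)
{
	size_t		n = strlen(s);

	assert(w->len + n < w->cap);
	memcpy(w->buf + w->len, s, n + 1);
	w->len += n;
}

/* Start callback: emit the opening bracket when force_array is set. */
void
demo_start(DemoJsonWriter *w, char *buf, size_t cap, bool force_array)
{
	assert(w != NULL && buf != NULL && cap > 0);
	w->buf = buf;
	w->cap = cap;
	w->len = 0;
	w->force_array = force_array;
	w->row_delim_needed = false;
	buf[0] = '\0';
	if (force_array)
		demo_append(w, "[\n");
}

/* Per-row callback: space before the first row, comma before later rows. */
void
demo_one_row(DemoJsonWriter *w, const char *row_json)
{
	if (w->force_array)
	{
		if (w->row_delim_needed)
			demo_append(w, ",");
		else
		{
			demo_append(w, " ");
			w->row_delim_needed = true;
		}
	}
	demo_append(w, row_json);
	demo_append(w, "\n");
}

/* End callback: emit the closing bracket when force_array is set. */
void
demo_end(DemoJsonWriter *w)
{
	if (w->force_array)
		demo_append(w, "]\n");
}
```

Placing the comma at the start of each later row, instead of at the end of the previous one, means the writer never needs to know whether another row follows. With no rows at all, this sketch emits just the two bracket lines, which is still a valid empty JSON array.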
^ permalink raw reply [nested|flat] 28+ messages in thread
* Re: Emitting JSON to file using COPY TO
@ 2026-03-16 18:24 Masahiko Sawada <[email protected]>
parent: jian he <[email protected]>
0 siblings, 1 reply; 28+ messages in thread
From: Masahiko Sawada @ 2026-03-16 18:24 UTC (permalink / raw)
To: jian he <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Joe Conway <[email protected]>; Junwang Zhao <[email protected]>; Florents Tselai <[email protected]>; Andrey M. Borodin <[email protected]>; Dean Rasheed <[email protected]>; Daniel Verite <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On Sun, Mar 8, 2026 at 8:49 PM jian he <[email protected]> wrote:
>
> On Mon, Mar 9, 2026 at 3:44 AM Andrew Dunstan <[email protected]> wrote:
> >
> > Hmm. But should we be scribbling on slot->tts_tupleDescriptor like that?
> > How about something like this?:
> >
> > - * Full table or query without column list. Ensure the slot uses
> > - * cstate->tupDesc so that the datum is stamped with the right type;
> > - * for queries output type is RECORDOID this must be the blessed
> > - * descriptor so that composite_to_json can look it up via
> > - * lookup_rowtype_tupdesc.
> > + * Full table or query without column list. For queries, the slot's
> > + * TupleDesc may carry RECORDOID, which is not registered in the type
> > + * cache and would cause composite_to_json's lookup_rowtype_tupdesc
> > + * call to fail. Build a HeapTuple stamped with the blessed
> > + * descriptor so the type can be looked up correctly.
> > */
> > if (!cstate->rel && slot->tts_tupleDescriptor->tdtypeid == RECORDOID)
> > - slot->tts_tupleDescriptor = cstate->queryDesc->tupDesc;
> > + {
> > + HeapTuple tup;
> >
> > - rowdata = ExecFetchSlotHeapTupleDatum(slot);
> > + tup = heap_form_tuple(cstate->tupDesc,
> > + slot->tts_values,
> > + slot->tts_isnull);
> > + rowdata = HeapTupleGetDatum(tup);
> > + }
> > + else
> > + {
> > + rowdata = ExecFetchSlotHeapTupleDatum(slot);
> > + }
> >
> This is better. I've tried to get rid of json_projvalues and json_projnulls.
> Just using heap_form_tuple, but it won't work.
>
> I incorporated the v28-0004 COPY column list into v9-0002.
> With this patch set, we added four fields to the struct CopyToStateData.
>
> + StringInfo json_buf; /* reusable buffer for JSON output,
> + * initialized in BeginCopyTo */
> + TupleDesc tupDesc; /* Descriptor for JSON output; for a column
> + * list this is a projected descriptor */
> + Datum *json_projvalues; /* pre-allocated projection values, or
> + * NULL */
> + bool *json_projnulls; /* pre-allocated projection nulls, or NULL */
>
> Using the script in
> https://www.postgresql.org/message-id/CACJufxFFZqxC3p4WjpTEi4riaJm%3DpADX%2Bpy0yQ0%3DRWTn5cqK3Q%40ma...
> I tested it again on macOS and Linux, and there are no regressions for
> COPY TO with the TEXT and CSV formats.
I've reviewed the patch and have some comments:
---
I got a SEGV in the following scenario:
postgres(1:1197708)=# create table test (a int, b text, c jsonb);
CREATE TABLE
postgres(1:1197708)=# copy test(a, b) to stdout with (format 'json' );
TRAP: failed Assert("tupdesc->firstNonCachedOffsetAttr >= 0"), File:
"execTuples.c", Line: 2328, PID: 1197708
postgres: masahiko postgres [local] COPY(ExceptionalCondition+0x9e) [0xbebe48]
postgres: masahiko postgres [local] COPY(BlessTupleDesc+0x2b) [0x729b50]
postgres: masahiko postgres [local] COPY(BeginCopyTo+0xc94) [0x637bdf]
postgres: masahiko postgres [local] COPY(DoCopy+0xb68) [0x62afbc]
postgres: masahiko postgres [local]
COPY(standard_ProcessUtility+0xa22) [0xa0ba48]
postgres: masahiko postgres [local] COPY(ProcessUtility+0x10e) [0xa0b01f]
postgres: masahiko postgres [local] COPY() [0xa09872]
postgres: masahiko postgres [local] COPY() [0xa09acf]
postgres: masahiko postgres [local] COPY(PortalRun+0x2c8) [0xa0901d]
postgres: masahiko postgres [local] COPY() [0xa02055]
postgres: masahiko postgres [local] COPY(PostgresMain+0xaf1) [0xa0724e]
postgres: masahiko postgres [local] COPY() [0x9fdab9]
postgres: masahiko postgres [local]
COPY(postmaster_child_launch+0x165) [0x905378]
postgres: masahiko postgres [local] COPY() [0x90b600]
postgres: masahiko postgres [local] COPY() [0x908e6a]
postgres: masahiko postgres [local] COPY(PostmasterMain+0x14fe) [0x90880c]
postgres: masahiko postgres [local] COPY(main+0x340) [0x7a1f9c]
It seems the patch forgets to call TupleDescFinalize() before
BlessTupleDesc(). I think we also need regression tests for this case.
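For readers following along, the failing case is the column-list path, where
the struct comments quoted above describe a projected descriptor plus
pre-allocated projection arrays. The following is only a standalone model of
that projection step, assuming it simply copies the listed attributes;
demo_project_row and its parameters are hypothetical names, the values are
plain longs rather than Datums, and none of this is PostgreSQL code:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Copy the columns named in attnums (1-based, in the style of an
 * attnumlist) from a full row into pre-allocated projection arrays.
 * Returns the number of projected columns.
 *
 * Hypothetical model: the real patch works on Datum arrays and a
 * TupleDesc, which are not reproduced here.
 */
static int
demo_project_row(const long *values, const bool *isnull,
                 const int *attnums, int nattnums,
                 long *projvalues, bool *projnulls)
{
    for (int i = 0; i < nattnums; i++)
    {
        int src = attnums[i] - 1;   /* convert 1-based attnum to index */

        projvalues[i] = values[src];
        projnulls[i] = isnull[src];
    }
    return nattnums;
}
```

With a three-column row and attnums {1, 2}, as in "copy test(a, b)", only the
first two values and null flags end up in the projection arrays. The
descriptor that describes those projected columns is a separate object, which
is presumably the one that reached BlessTupleDesc() unfinalized in the trace.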
---
+ if (cstate->opts.format == COPY_FORMAT_JSON)
+ {
+ /*
+ * If FORCE_ARRAY has been specified, send the opening bracket.
+ */
+ if (cstate->opts.force_array)
+ {
+ CopySendChar(cstate, '[');
+ CopySendTextLikeEndOfRow(cstate);
+ }
+ }
We can combine the two if conditions into a single condition.
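A minimal standalone sketch of the merged condition follows. It is not
PostgreSQL code: DemoFormat, DemoOpts, demo_append and demo_copy_out are
hypothetical stand-ins for CopyFormat, the options struct and the send
routines. It also models the array framing visible in the expected test
output, under the assumption that the first element is prefixed with a space
and each later element with a comma:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for CopyFormat and the COPY options struct. */
typedef enum { DEMO_FORMAT_TEXT, DEMO_FORMAT_JSON } DemoFormat;

typedef struct
{
    DemoFormat  format;
    bool        force_array;
} DemoOpts;

/* Append s to the NUL-terminated buf; output is truncated if it does not fit. */
static void
demo_append(char *buf, size_t buflen, const char *s)
{
    size_t used = strlen(buf);

    if (used < buflen)
        snprintf(buf + used, buflen - used, "%s", s);
}

/*
 * Write the given pre-rendered JSON rows into buf (buflen must be >= 1)
 * and return buf.
 */
static char *
demo_copy_out(const DemoOpts *opts, const char *const *rows, int nrows,
              char *buf, size_t buflen)
{
    /* One merged test replaces the two nested if statements. */
    bool as_array = (opts->format == DEMO_FORMAT_JSON && opts->force_array);

    buf[0] = '\0';

    if (as_array)
        demo_append(buf, buflen, "[\n");    /* opening bracket, own line */

    for (int i = 0; i < nrows; i++)
    {
        if (as_array)
            demo_append(buf, buflen, (i == 0) ? " " : ",");
        demo_append(buf, buflen, rows[i]);
        demo_append(buf, buflen, "\n");
    }

    if (as_array)
        demo_append(buf, buflen, "]\n");    /* closing bracket */

    return buf;
}
```

Calling demo_copy_out() with force_array set wraps the rows in brackets, while
the same rows with force_array unset come out one object per line. In the
actual patch the merged form would simply be a single
"if (cstate->opts.format == COPY_FORMAT_JSON && cstate->opts.force_array)"
around the existing CopySendChar() and CopySendTextLikeEndOfRow() calls.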
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
* Re: Emitting JSON to file using COPY TO
@ 2026-03-16 20:59 Andrew Dunstan <[email protected]>
parent: Masahiko Sawada <[email protected]>
0 siblings, 2 replies; 28+ messages in thread
From: Andrew Dunstan @ 2026-03-16 20:59 UTC (permalink / raw)
To: Masahiko Sawada <[email protected]>; jian he <[email protected]>; +Cc: Joe Conway <[email protected]>; Junwang Zhao <[email protected]>; Florents Tselai <[email protected]>; Andrey M. Borodin <[email protected]>; Dean Rasheed <[email protected]>; Daniel Verite <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 2026-03-16 Mo 2:24 PM, Masahiko Sawada wrote:
> I've reviewed the patch and have some comments:
>
> ---
> I got a SEGV in the following scenario:
>
> postgres(1:1197708)=# create table test (a int, b text, c jsonb);
> CREATE TABLE
> postgres(1:1197708)=# copy test(a, b) to stdout with (format 'json' );
> TRAP: failed Assert("tupdesc->firstNonCachedOffsetAttr >= 0"), File:
> "execTuples.c", Line: 2328, PID: 1197708
> postgres: masahiko postgres [local] COPY(ExceptionalCondition+0x9e) [0xbebe48]
> postgres: masahiko postgres [local] COPY(BlessTupleDesc+0x2b) [0x729b50]
> postgres: masahiko postgres [local] COPY(BeginCopyTo+0xc94) [0x637bdf]
> postgres: masahiko postgres [local] COPY(DoCopy+0xb68) [0x62afbc]
> postgres: masahiko postgres [local]
> COPY(standard_ProcessUtility+0xa22) [0xa0ba48]
> postgres: masahiko postgres [local] COPY(ProcessUtility+0x10e) [0xa0b01f]
> postgres: masahiko postgres [local] COPY() [0xa09872]
> postgres: masahiko postgres [local] COPY() [0xa09acf]
> postgres: masahiko postgres [local] COPY(PortalRun+0x2c8) [0xa0901d]
> postgres: masahiko postgres [local] COPY() [0xa02055]
> postgres: masahiko postgres [local] COPY(PostgresMain+0xaf1) [0xa0724e]
> postgres: masahiko postgres [local] COPY() [0x9fdab9]
> postgres: masahiko postgres [local]
> COPY(postmaster_child_launch+0x165) [0x905378]
> postgres: masahiko postgres [local] COPY() [0x90b600]
> postgres: masahiko postgres [local] COPY() [0x908e6a]
> postgres: masahiko postgres [local] COPY(PostmasterMain+0x14fe) [0x90880c]
> postgres: masahiko postgres [local] COPY(main+0x340) [0x7a1f9c]
>
> It seems the patch forgets to call TupleDescFinalize() before
> BlessTupleDesc(). I think we also need regression tests for this case.
>
> ---
> + if (cstate->opts.format == COPY_FORMAT_JSON)
> + {
> + /*
> + * If FORCE_ARRAY has been specified, send the opening bracket.
> + */
> + if (cstate->opts.force_array)
> + {
> + CopySendChar(cstate, '[');
> + CopySendTextLikeEndOfRow(cstate);
> + }
> + }
>
> We can combine the two if conditions into a single condition.
>
Here's a v30 set that I hope fixes these issues.
cheers
andrew
--
Andrew Dunstan
EDB:https://www.enterprisedb.com
Attachments:
[text/x-patch] v30-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch (13.1K, 3-v30-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch)
From f05872732a3654362d916354a2edd071aa131099 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Mon, 16 Mar 2026 16:49:01 -0400
Subject: [PATCH v30 1/3] introduce CopyFormat, refactor CopyFormatOptions
Currently, the COPY command format is determined by two boolean fields
(binary, csv_mode) in CopyFormatOptions. This approach, while
functional, isn't ideal for implementing other formats in the future.
To simplify adding new formats, introduce a CopyFormat enum. This makes
the code cleaner and more maintainable, allowing for easier integration
of additional formats down the line.
Author: Joel Jacobson <[email protected]>
Author: jian he <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
Discussion: https://postgr.es/m/[email protected]
---
src/backend/commands/copy.c | 50 +++++++++++++++-------------
src/backend/commands/copyfrom.c | 6 ++--
src/backend/commands/copyfromparse.c | 7 ++--
src/backend/commands/copyto.c | 8 ++---
src/include/commands/copy.h | 13 ++++++--
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 49 insertions(+), 36 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 63b86802ba2..2f46be516f2 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -576,6 +576,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out = palloc0_object(CopyFormatOptions);
opts_out->file_encoding = -1;
+ /* default format */
+ opts_out->format = COPY_FORMAT_TEXT;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -590,11 +592,11 @@ ProcessCopyOptions(ParseState *pstate,
errorConflictingDefElem(defel, pstate);
format_specified = true;
if (strcmp(fmt, "text") == 0)
- /* default format */ ;
+ opts_out->format = COPY_FORMAT_TEXT;
else if (strcmp(fmt, "csv") == 0)
- opts_out->csv_mode = true;
+ opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
- opts_out->binary = true;
+ opts_out->format = COPY_FORMAT_BINARY;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -754,31 +756,31 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->binary && opts_out->delim)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
- if (opts_out->binary && opts_out->null_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "NULL")));
- if (opts_out->binary && opts_out->default_print)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
/* Set defaults for omitted options */
if (!opts_out->delim)
- opts_out->delim = opts_out->csv_mode ? "," : "\t";
+ opts_out->delim = (opts_out->format == COPY_FORMAT_CSV) ? "," : "\t";
if (!opts_out->null_print)
- opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+ opts_out->null_print = (opts_out->format == COPY_FORMAT_CSV) ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
- if (opts_out->csv_mode)
+ if (opts_out->format == COPY_FORMAT_CSV)
{
if (!opts_out->quote)
opts_out->quote = "\"";
@@ -826,7 +828,7 @@ ProcessCopyOptions(ParseState *pstate,
* future-proofing. Likewise we disallow all digits though only octal
* digits are actually dangerous.
*/
- if (!opts_out->csv_mode &&
+ if (opts_out->format != COPY_FORMAT_CSV &&
strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
opts_out->delim[0]) != NULL)
ereport(ERROR,
@@ -834,43 +836,43 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->binary && opts_out->header_line != COPY_HEADER_FALSE)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("cannot specify %s in BINARY mode", "HEADER")));
/* Check quote */
- if (!opts_out->csv_mode && opts_out->quote != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "QUOTE")));
- if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+ if (opts_out->format == COPY_FORMAT_CSV && opts_out->delim[0] == opts_out->quote[0])
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY delimiter and quote must be different")));
/* Check escape */
- if (!opts_out->csv_mode && opts_out->escape != NULL)
+ if (opts_out->format != COPY_FORMAT_CSV && opts_out->escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
errmsg("COPY %s requires CSV mode", "ESCAPE")));
- if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+ if (opts_out->format == COPY_FORMAT_CSV && strlen(opts_out->escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single one-byte character")));
/* Check force_quote */
- if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || opts_out->force_quote_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -884,8 +886,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY FROM")));
/* Check force_notnull */
- if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
- opts_out->force_notnull_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_notnull != NIL ||
+ opts_out->force_notnull_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -900,8 +902,8 @@ ProcessCopyOptions(ParseState *pstate,
"COPY TO")));
/* Check force_null */
- if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
- opts_out->force_null_all))
+ if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_null != NIL ||
+ opts_out->force_null_all))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -925,7 +927,7 @@ ProcessCopyOptions(ParseState *pstate,
"NULL")));
/* Don't allow the CSV quote char to appear in the null string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -961,7 +963,7 @@ ProcessCopyOptions(ParseState *pstate,
"DEFAULT")));
/* Don't allow the CSV quote char to appear in the default string. */
- if (opts_out->csv_mode &&
+ if (opts_out->format == COPY_FORMAT_CSV &&
strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -978,7 +980,7 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("NULL specification and DEFAULT specification cannot be the same")));
}
/* Check on_error */
- if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+ if (opts_out->format == COPY_FORMAT_BINARY && opts_out->on_error != COPY_ON_ERROR_STOP)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 95f6cb416a9..a7fe29a363a 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -156,9 +156,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
static const CopyFromRoutine *
CopyFromGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyFromRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyFromRoutineBinary;
/* default is text */
@@ -262,7 +262,7 @@ CopyFromErrorCallback(void *arg)
cstate->cur_relname);
return;
}
- if (cstate->opts.binary)
+ if (cstate->opts.format == COPY_FORMAT_BINARY)
{
/* can't usefully display the data */
if (cstate->cur_attname)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 00ee4154b8b..200d7bc79cd 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -174,7 +174,7 @@ ReceiveCopyBegin(CopyFromState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -752,7 +752,7 @@ bool
NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
return NextCopyFromRawFieldsInternal(cstate, fields, nfields,
- cstate->opts.csv_mode);
+ cstate->opts.format == COPY_FORMAT_CSV);
}
/*
@@ -779,7 +779,8 @@ NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields
bool done = false;
/* only available for text or csv input */
- Assert(!cstate->opts.binary);
+ Assert(cstate->opts.format == COPY_FORMAT_TEXT ||
+ cstate->opts.format == COPY_FORMAT_CSV);
/* on input check that the header line is correct if needed */
if (cstate->cur_lineno == 0 && cstate->opts.header_line != COPY_HEADER_FALSE)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d6ef7275a64..b0ee91fc9c1 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -182,9 +182,9 @@ static const CopyToRoutine CopyToRoutineBinary = {
static const CopyToRoutine *
CopyToGetRoutine(const CopyFormatOptions *opts)
{
- if (opts->csv_mode)
+ if (opts->format == COPY_FORMAT_CSV)
return &CopyToRoutineCSV;
- else if (opts->binary)
+ else if (opts->format == COPY_FORMAT_BINARY)
return &CopyToRoutineBinary;
/* default is text */
@@ -221,7 +221,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
- if (cstate->opts.csv_mode)
+ if (cstate->opts.format == COPY_FORMAT_CSV)
CopyAttributeOutCSV(cstate, colname, false);
else
CopyAttributeOutText(cstate, colname);
@@ -398,7 +398,7 @@ SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
- int16 format = (cstate->opts.binary ? 1 : 0);
+ int16 format = (cstate->opts.format == COPY_FORMAT_BINARY ? 1 : 0);
int i;
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 877202af67b..2430fb0b2e5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -49,6 +49,16 @@ typedef enum CopyLogVerbosityChoice
COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
+/*
+ * Represents the format of the COPY operation.
+ */
+typedef enum CopyFormat
+{
+ COPY_FORMAT_TEXT = 0,
+ COPY_FORMAT_BINARY,
+ COPY_FORMAT_CSV,
+} CopyFormat;
+
/*
* A struct to hold COPY options, in a parsed form. All of these are related
* to formatting, except for 'freeze', which doesn't really belong here, but
@@ -59,9 +69,8 @@ typedef struct CopyFormatOptions
/* parameters from the COPY command */
int file_encoding; /* file or remote side's character encoding,
* -1 if not specified */
- bool binary; /* binary format? */
+ CopyFormat format; /* format of the COPY operation */
bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
int header_line; /* number of lines to skip or COPY_HEADER_XXX
* value (see the above) */
char *null_print; /* NULL marker string (server encoding!) */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ec8513d90b5..3b2fcebcb54 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -528,6 +528,7 @@ ConversionLocation
ConvertRowtypeExpr
CookedConstraint
CopyDest
+CopyFormat
CopyFormatOptions
CopyFromRoutine
CopyFromState
--
2.43.0
[text/x-patch] v30-0002-json-format-for-COPY-TO.patch (29.5K, 4-v30-0002-json-format-for-COPY-TO.patch)
From 9d13f458dea5c3914705b06c0826c393d02cebcf Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Mon, 16 Mar 2026 16:50:24 -0400
Subject: [PATCH v30 2/3] json format for COPY TO
This introduces the JSON format option for the COPY TO command, allowing
users to export query results or table data directly as a stream of JSON
objects (one per line, NDJSON style).
The JSON format is currently supported only for COPY TO operations; it
is not available for COPY FROM.
JSON format is incompatible with some standard text/CSV formatting
options, including HEADER, DEFAULT, NULL, DELIMITER, FORCE QUOTE,
FORCE NOT NULL, and FORCE NULL.
Column list support is included: when a column list is specified, only
the named columns are emitted in each JSON object.
Regression tests covering valid JSON exports and error handling for
incompatible options have been added to src/test/regress/sql/copy.sql.
Author: Joe Conway <[email protected]>
Author: jian he <[email protected]>
Reviewed-by: Andrey M. Borodin <[email protected]>
Reviewed-by: Dean Rasheed <[email protected]>
Reviewed-by: Daniel Verite <[email protected]>
Reviewed-by: Davin Shearer <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Alvaro Herrera <[email protected]>
Reviewed-by: Junwang Zhao <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
Discussion: https://postgr.es/m/[email protected]
---
doc/src/sgml/ref/copy.sgml | 13 ++-
src/backend/commands/copy.c | 49 ++++++---
src/backend/commands/copyto.c | 162 +++++++++++++++++++++++++++--
src/backend/parser/gram.y | 8 ++
src/backend/utils/adt/json.c | 5 +-
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/include/utils/json.h | 2 +
src/test/regress/expected/copy.out | 146 ++++++++++++++++++++++++++
src/test/regress/sql/copy.sql | 88 ++++++++++++++++
10 files changed, 445 insertions(+), 31 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 0ad890ef95f..75f55bbf6f8 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -228,10 +228,15 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
Selects the data format to be read or written:
<literal>text</literal>,
<literal>csv</literal> (Comma Separated Values),
+ <literal>json</literal> (JavaScript Object Notation),
or <literal>binary</literal>.
The default is <literal>text</literal>.
See <xref linkend="sql-copy-file-formats"/> below for details.
</para>
+ <para>
+ The <literal>json</literal> option is allowed only in
+ <command>COPY TO</command>.
+ </para>
</listitem>
</varlistentry>
@@ -266,7 +271,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
(line) of the file. The default is a tab character in text format,
a comma in <literal>CSV</literal> format.
This must be a single one-byte character.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -280,7 +285,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
string in <literal>CSV</literal> format. You might prefer an
empty string even in text format for cases where you don't want to
distinguish nulls from empty strings.
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
<note>
@@ -303,7 +308,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
is found in the input file, the default value of the corresponding column
will be used.
This option is allowed only in <command>COPY FROM</command>, and only when
- not using <literal>binary</literal> format.
+ not using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
@@ -330,7 +335,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
<command>COPY FROM</command> commands.
</para>
<para>
- This option is not allowed when using <literal>binary</literal> format.
+ This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2f46be516f2..29e22d91ecd 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -597,6 +597,8 @@ ProcessCopyOptions(ParseState *pstate,
opts_out->format = COPY_FORMAT_CSV;
else if (strcmp(fmt, "binary") == 0)
opts_out->format = COPY_FORMAT_BINARY;
+ else if (strcmp(fmt, "json") == 0)
+ opts_out->format = COPY_FORMAT_JSON;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -756,21 +758,32 @@ ProcessCopyOptions(ParseState *pstate,
* Check for incompatible options (must do these three before inserting
* defaults)
*/
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
+ if (opts_out->delim &&
+ (opts_out->format == COPY_FORMAT_BINARY ||
+ opts_out->format == COPY_FORMAT_JSON))
ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+ errcode(ERRCODE_SYNTAX_ERROR),
+ opts_out->format == COPY_FORMAT_BINARY
+ ? errmsg("cannot specify %s in BINARY mode", "DELIMITER")
+ : errmsg("cannot specify %s in JSON mode", "DELIMITER"));
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->null_print)
+ if (opts_out->null_print &&
+ (opts_out->format == COPY_FORMAT_BINARY ||
+ opts_out->format == COPY_FORMAT_JSON))
ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "NULL")));
+ errcode(ERRCODE_SYNTAX_ERROR),
+ opts_out->format == COPY_FORMAT_BINARY
+ ? errmsg("cannot specify %s in BINARY mode", "NULL")
+ : errmsg("cannot specify %s in JSON mode", "NULL"));
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->default_print)
+ if (opts_out->default_print &&
+ (opts_out->format == COPY_FORMAT_BINARY ||
+ opts_out->format == COPY_FORMAT_JSON))
ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
+ errcode(ERRCODE_SYNTAX_ERROR),
+ opts_out->format == COPY_FORMAT_BINARY
+ ? errmsg("cannot specify %s in BINARY mode", "DEFAULT")
+ : errmsg("cannot specify %s in JSON mode", "DEFAULT"));
/* Set defaults for omitted options */
if (!opts_out->delim)
@@ -836,11 +849,15 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
/* Check header */
- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->header_line != COPY_HEADER_FALSE)
+ if (opts_out->header_line != COPY_HEADER_FALSE &&
+ (opts_out->format == COPY_FORMAT_BINARY ||
+ opts_out->format == COPY_FORMAT_JSON))
ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
/*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "HEADER")));
+ opts_out->format == COPY_FORMAT_BINARY
+ ? errmsg("cannot specify %s in BINARY mode", "HEADER")
+ : errmsg("cannot specify %s in JSON mode", "HEADER"));
/* Check quote */
if (opts_out->format != COPY_FORMAT_CSV && opts_out->quote != NULL)
@@ -944,6 +961,12 @@ ProcessCopyOptions(ParseState *pstate,
errmsg("COPY %s cannot be used with %s", "FREEZE",
"COPY TO")));
+ /* Check json format */
+ if (opts_out->format == COPY_FORMAT_JSON && is_from)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY %s is not supported for %s", "FORMAT JSON", "COPY FROM"));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index b0ee91fc9c1..ffe2268fbb0 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -26,6 +26,7 @@
#include "executor/execdesc.h"
#include "executor/executor.h"
#include "executor/tuptable.h"
+#include "funcapi.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
#include "mb/pg_wchar.h"
@@ -33,6 +34,7 @@
#include "pgstat.h"
#include "storage/fd.h"
#include "tcop/tcopprot.h"
+#include "utils/json.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
@@ -85,6 +87,13 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ StringInfo json_buf; /* reusable buffer for JSON output,
+ * initialized in BeginCopyTo */
+ TupleDesc tupDesc; /* Descriptor for JSON output; for a column
+ * list this is a projected descriptor */
+ Datum *json_projvalues; /* pre-allocated projection values, or
+ * NULL */
+ bool *json_projnulls; /* pre-allocated projection nulls, or NULL */
copy_data_dest_cb data_dest_cb; /* function for writing data */
CopyFormatOptions opts;
@@ -131,6 +140,7 @@ static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -149,9 +159,6 @@ static void CopySendInt16(CopyToState cstate, int16 val);
/*
* COPY TO routines for built-in formats.
- *
- * CSV and text formats share the same TextLike routines except for the
- * one-row callback.
*/
/* text format */
@@ -170,6 +177,14 @@ static const CopyToRoutine CopyToRoutineCSV = {
.CopyToEnd = CopyToTextLikeEnd,
};
+/* json format */
+static const CopyToRoutine CopyToRoutineJson = {
+ .CopyToStart = CopyToTextLikeStart,
+ .CopyToOutFunc = CopyToTextLikeOutFunc,
+ .CopyToOneRow = CopyToJsonOneRow,
+ .CopyToEnd = CopyToTextLikeEnd,
+};
+
/* binary format */
static const CopyToRoutine CopyToRoutineBinary = {
.CopyToStart = CopyToBinaryStart,
@@ -186,12 +201,14 @@ CopyToGetRoutine(const CopyFormatOptions *opts)
return &CopyToRoutineCSV;
else if (opts->format == COPY_FORMAT_BINARY)
return &CopyToRoutineBinary;
+ else if (opts->format == COPY_FORMAT_JSON)
+ return &CopyToRoutineJson;
/* default is text */
return &CopyToRoutineText;
}
-/* Implementation of the start callback for text and CSV formats */
+/* Implementation of the start callback for text, CSV, and json formats */
static void
CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
{
@@ -210,6 +227,8 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
ListCell *cur;
bool hdr_delim = false;
+ Assert(cstate->opts.format != COPY_FORMAT_JSON);
+
foreach(cur, cstate->attnumlist)
{
int attnum = lfirst_int(cur);
@@ -232,7 +251,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
}
/*
- * Implementation of the outfunc callback for text and CSV formats. Assign
+ * Implementation of the outfunc callback for text, CSV, and json formats. Assign
* the output function data to the given *finfo.
*/
static void
@@ -305,13 +324,79 @@ CopyToTextLikeOneRow(CopyToState cstate,
CopySendTextLikeEndOfRow(cstate);
}
-/* Implementation of the end callback for text and CSV formats */
+/* Implementation of the end callback for text, CSV, and json formats */
static void
CopyToTextLikeEnd(CopyToState cstate)
{
/* Nothing to do here */
}
+/* Implementation of per-row callback for json format */
+static void
+CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+ Datum rowdata;
+
+ resetStringInfo(cstate->json_buf);
+
+ if (cstate->json_projvalues != NULL)
+ {
+ /*
+ * Column list case: project selected column values into sequential
+ * positions matching the custom TupleDesc, then form a new tuple.
+ */
+ HeapTuple tup;
+ int i = 0;
+
+ foreach_int(attnum, cstate->attnumlist)
+ {
+ cstate->json_projvalues[i] = slot->tts_values[attnum - 1];
+ cstate->json_projnulls[i] = slot->tts_isnull[attnum - 1];
+ i++;
+ }
+
+ tup = heap_form_tuple(cstate->tupDesc,
+ cstate->json_projvalues,
+ cstate->json_projnulls);
+
+ /*
+ * heap_form_tuple already stamps the datum-length, type-id, and
+ * type-mod fields on t_data, so we can use it directly as a composite
+ * Datum without the extra palloc/memcpy that heap_copy_tuple_as_datum
+ * would do. Any TOAST pointers in the projected values will be
+ * detoasted by the per-column output functions called from
+ * composite_to_json.
+ */
+ rowdata = HeapTupleGetDatum(tup);
+ }
+ else
+ {
+ /*
+ * Full table or query without column list. For queries, the slot's
+ * TupleDesc may carry RECORDOID, which is not registered in the type
+ * cache and would cause composite_to_json's lookup_rowtype_tupdesc
+ * call to fail. Build a HeapTuple stamped with the blessed
+ * descriptor so the type can be looked up correctly.
+ */
+ if (!cstate->rel && slot->tts_tupleDescriptor->tdtypeid == RECORDOID)
+ {
+ HeapTuple tup = heap_form_tuple(cstate->tupDesc,
+ slot->tts_values,
+ slot->tts_isnull);
+
+ rowdata = HeapTupleGetDatum(tup);
+ }
+ else
+ rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ }
+
+ composite_to_json(rowdata, cstate->json_buf, false);
+
+ CopySendData(cstate, cstate->json_buf->data, cstate->json_buf->len);
+
+ CopySendTextLikeEndOfRow(cstate);
+}
+
/*
* Implementation of the start callback for binary format. Send a header
* for a binary copy.
@@ -403,9 +488,23 @@ SendCopyBegin(CopyToState cstate)
pq_beginmessage(&buf, PqMsg_CopyOutResponse);
pq_sendbyte(&buf, format); /* overall format */
- pq_sendint16(&buf, natts);
- for (i = 0; i < natts; i++)
- pq_sendint16(&buf, format); /* per-column formats */
+ if (cstate->opts.format != COPY_FORMAT_JSON)
+ {
+ pq_sendint16(&buf, natts);
+ for (i = 0; i < natts; i++)
+ pq_sendint16(&buf, format); /* per-column formats */
+ }
+ else
+ {
+ /*
+ * For JSON format, report one text-format column. Each CopyData
+ * message contains one complete JSON object, not individual column
+ * values, so the per-column count is always 1.
+ */
+ pq_sendint16(&buf, 1);
+ pq_sendint16(&buf, 0);
+ }
+
pq_endmessage(&buf);
cstate->copy_dest = COPY_FRONTEND;
}
@@ -507,7 +606,7 @@ CopySendEndOfRow(CopyToState cstate)
}
/*
- * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
+ * Wrapper function of CopySendEndOfRow for text, CSV, and json formats. Sends the
* line termination and do common appropriate things for the end of row.
*/
static inline void
@@ -750,6 +849,7 @@ BeginCopyTo(ParseState *pstate,
tupDesc = RelationGetDescr(cstate->rel);
cstate->partitions = children;
+ cstate->tupDesc = tupDesc;
}
else
{
@@ -886,11 +986,53 @@ BeginCopyTo(ParseState *pstate,
ExecutorStart(cstate->queryDesc, 0);
tupDesc = cstate->queryDesc->tupDesc;
+ tupDesc = BlessTupleDesc(tupDesc);
+ cstate->tupDesc = tupDesc;
}
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+ /* Set up JSON-specific state */
+ if (cstate->opts.format == COPY_FORMAT_JSON)
+ {
+ cstate->json_buf = makeStringInfo();
+
+ if (attnamelist != NIL && rel)
+ {
+ int natts = list_length(cstate->attnumlist);
+ TupleDesc resultDesc;
+
+ /*
+ * Build a TupleDesc describing only the selected columns so that
+ * composite_to_json() emits the right column names and types.
+ */
+ resultDesc = CreateTemplateTupleDesc(natts);
+
+ foreach_int(attnum, cstate->attnumlist)
+ {
+ Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+ TupleDescInitEntry(resultDesc,
+ foreach_current_index(attnum) + 1,
+ NameStr(attr->attname),
+ attr->atttypid,
+ attr->atttypmod,
+ attr->attndims);
+ }
+
+ TupleDescFinalize(resultDesc);
+ cstate->tupDesc = BlessTupleDesc(resultDesc);
+
+ /*
+ * Pre-allocate arrays for projecting selected column values
+ * into sequential positions matching the custom TupleDesc.
+ */
+ cstate->json_projvalues = palloc_array(Datum, natts);
+ cstate->json_projnulls = palloc_array(bool, natts);
+ }
+ }
+
num_phys_attrs = tupDesc->natts;
/* Convert FORCE_QUOTE name list to per-column flags, check validity */
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index f01f5734fe9..c01b9fc3997 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3617,6 +3617,10 @@ copy_opt_item:
{
$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
}
+ | JSON
+ {
+ $$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ }
| HEADER_P
{
$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
@@ -3699,6 +3703,10 @@ copy_generic_opt_elem:
{
$$ = makeDefElem($1, $2, @1);
}
+ | FORMAT_LA copy_generic_opt_arg
+ {
+ $$ = makeDefElem("format", $2, @1);
+ }
;
copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 0b161398465..f609d7b9417 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -86,8 +86,6 @@ typedef struct JsonAggState
JsonUniqueBuilderState unique_check;
} JsonAggState;
-static void composite_to_json(Datum composite, StringInfo result,
- bool use_line_feeds);
static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
const Datum *vals, const bool *nulls, int *valcount,
JsonTypeCategory tcategory, Oid outfuncoid,
@@ -517,8 +515,9 @@ array_to_json_internal(Datum array, StringInfo result, bool use_line_feeds)
/*
* Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
*/
-static void
+void
composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
{
HeapTupleHeader td;
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 199fc64ddf5..ac36f4591f7 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3425,7 +3425,7 @@ match_previous_words(int pattern_id,
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
else if (TailMatches("FORMAT"))
- COMPLETE_WITH("binary", "csv", "text");
+ COMPLETE_WITH("binary", "csv", "text", "json");
/* Complete COPY <sth> FROM|TO filename WITH (FREEZE */
else if (TailMatches("FREEZE"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 2430fb0b2e5..2b5bef6738e 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -57,6 +57,7 @@ typedef enum CopyFormat
COPY_FORMAT_TEXT = 0,
COPY_FORMAT_BINARY,
COPY_FORMAT_CSV,
+ COPY_FORMAT_JSON,
} CopyFormat;
/*
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index f8cc52b1e78..2f4be40518d 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -17,6 +17,8 @@
#include "lib/stringinfo.h"
/* functions in json.c */
+extern void composite_to_json(Datum composite, StringInfo result,
+ bool use_line_feeds);
extern void escape_json(StringInfo buf, const char *str);
extern void escape_json_with_len(StringInfo buf, const char *str, int len);
extern void escape_json_text(StringInfo buf, const text *txt);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index d0d563e0fa8..7f2d2e065f6 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -73,6 +73,152 @@ copy copytest3 to stdout csv header;
c1,"col with , comma","col with "" quote"
1,a,1
2,b,2
+--- test copying in JSON mode with various styles
+copy (select 1 union all select 2) to stdout with (format json);
+{"?column?":1}
+{"?column?":2}
+copy (select 1 as foo union all select 2) to stdout with (format json);
+{"foo":1}
+{"foo":2}
+copy (values (1), (2)) TO stdout with (format json);
+{"column1":1}
+{"column1":2}
+copy copytest to stdout json;
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy copytest to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+copy (select * from copytest) to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+ERROR: cannot specify DELIMITER in JSON mode
+copy copytest to stdout (format json, null '\N');
+ERROR: cannot specify NULL in JSON mode
+copy copytest to stdout (format json, default '|');
+ERROR: cannot specify DEFAULT in JSON mode
+copy copytest to stdout (format json, header);
+ERROR: cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, header 1);
+ERROR: cannot specify HEADER in JSON mode
+copy copytest to stdout (format json, quote '"');
+ERROR: COPY QUOTE requires CSV mode
+copy copytest to stdout (format json, escape '"');
+ERROR: COPY ESCAPE requires CSV mode
+copy copytest to stdout (format json, force_quote *);
+ERROR: COPY FORCE_QUOTE requires CSV mode
+copy copytest to stdout (format json, force_not_null *);
+ERROR: COPY FORCE_NOT_NULL requires CSV mode
+copy copytest to stdout (format json, force_null *);
+ERROR: COPY FORCE_NULL requires CSV mode
+copy copytest to stdout (format json, on_error ignore);
+ERROR: COPY ON_ERROR cannot be used with COPY TO
+LINE 1: copy copytest to stdout (format json, on_error ignore);
+ ^
+copy copytest to stdout (format json, reject_limit 1);
+ERROR: COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
+copy copytest from stdin(format json);
+ERROR: COPY FORMAT JSON is not supported for COPY FROM
+-- all of the above should yield error
+-- column list with json format
+copy copytest (style, test, filler) to stdout (format json);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- column list with diverse data types
+create temp table copyjsontest_types (
+ id int,
+ js json,
+ jsb jsonb,
+ arr int[],
+ n numeric(10,2),
+ b boolean,
+ ts timestamp,
+ t text);
+insert into copyjsontest_types values
+(1, '{"a":1}', '{"b":2}', '{1,2,3}', 3.14, true,
+ '2024-01-15 10:30:00', 'hello'),
+(2, '[1,null,"x"]', '{"nested":{"k":"v"}}', '{4,5}', -99.99, false,
+ '2024-06-30 23:59:59', 'world'),
+(3, 'null', 'null', '{}', null, null, null, null);
+-- full table
+copy copyjsontest_types to stdout (format json);
+{"id":1,"js":{"a":1},"jsb":{"b": 2},"arr":[1,2,3],"n":3.14,"b":true,"ts":"2024-01-15T10:30:00","t":"hello"}
+{"id":2,"js":[1,null,"x"],"jsb":{"nested": {"k": "v"}},"arr":[4,5],"n":-99.99,"b":false,"ts":"2024-06-30T23:59:59","t":"world"}
+{"id":3,"js":null,"jsb":null,"arr":[],"n":null,"b":null,"ts":null,"t":null}
+-- column subsets exercising each type
+copy copyjsontest_types (id, js, jsb) to stdout (format json);
+{"id":1,"js":{"a":1},"jsb":{"b": 2}}
+{"id":2,"js":[1,null,"x"],"jsb":{"nested": {"k": "v"}}}
+{"id":3,"js":null,"jsb":null}
+copy copyjsontest_types (id, arr, n, b) to stdout (format json);
+{"id":1,"arr":[1,2,3],"n":3.14,"b":true}
+{"id":2,"arr":[4,5],"n":-99.99,"b":false}
+{"id":3,"arr":[],"n":null,"b":null}
+copy copyjsontest_types (jsb, t) to stdout (format json);
+{"jsb":{"b": 2},"t":"hello"}
+{"jsb":{"nested": {"k": "v"}},"t":"world"}
+{"jsb":null,"t":null}
+copy copyjsontest_types (id, ts) to stdout (format json);
+{"id":1,"ts":"2024-01-15T10:30:00"}
+{"id":2,"ts":"2024-06-30T23:59:59"}
+{"id":3,"ts":null}
+-- single column: json and jsonb
+copy copyjsontest_types (js) to stdout (format json);
+{"js":{"a":1}}
+{"js":[1,null,"x"]}
+{"js":null}
+copy copyjsontest_types (jsb) to stdout (format json);
+{"jsb":{"b": 2}}
+{"jsb":{"nested": {"k": "v"}}}
+{"jsb":null}
+drop table copyjsontest_types;
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+copy copyjsontest to stdout json;
+{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+{"id":1,"f1":"aaa\"bbb","f2":null}
+{"id":2,"f1":"aaa\\bbb","f2":null}
+{"id":3,"f1":"aaa/bbb","f2":null}
+{"id":4,"f1":"aaa\bbbb","f2":null}
+{"id":5,"f1":"aaa\fbbb","f2":null}
+{"id":6,"f1":"aaa\nbbb","f2":null}
+{"id":7,"f1":"aaa\rbbb","f2":null}
+{"id":8,"f1":"aaa\tbbb","f2":null}
create temp table copytest4 (
c1 int,
"colname with tab: " text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 65cbdaf7f3e..404f4321085 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -82,6 +82,94 @@ this is just a line full of junk that would error out if parsed
copy copytest3 to stdout csv header;
+--- test copying in JSON mode with various styles
+copy (select 1 union all select 2) to stdout with (format json);
+copy (select 1 as foo union all select 2) to stdout with (format json);
+copy (values (1), (2)) TO stdout with (format json);
+copy copytest to stdout json;
+copy copytest to stdout (format json);
+copy (select * from copytest) to stdout (format json);
+
+-- all of the following should yield error
+copy copytest to stdout (format json, delimiter '|');
+copy copytest to stdout (format json, null '\N');
+copy copytest to stdout (format json, default '|');
+copy copytest to stdout (format json, header);
+copy copytest to stdout (format json, header 1);
+copy copytest to stdout (format json, quote '"');
+copy copytest to stdout (format json, escape '"');
+copy copytest to stdout (format json, force_quote *);
+copy copytest to stdout (format json, force_not_null *);
+copy copytest to stdout (format json, force_null *);
+copy copytest to stdout (format json, on_error ignore);
+copy copytest to stdout (format json, reject_limit 1);
+copy copytest from stdin(format json);
+-- all of the above should yield error
+
+-- column list with json format
+copy copytest (style, test, filler) to stdout (format json);
+
+-- column list with diverse data types
+create temp table copyjsontest_types (
+ id int,
+ js json,
+ jsb jsonb,
+ arr int[],
+ n numeric(10,2),
+ b boolean,
+ ts timestamp,
+ t text);
+
+insert into copyjsontest_types values
+(1, '{"a":1}', '{"b":2}', '{1,2,3}', 3.14, true,
+ '2024-01-15 10:30:00', 'hello'),
+(2, '[1,null,"x"]', '{"nested":{"k":"v"}}', '{4,5}', -99.99, false,
+ '2024-06-30 23:59:59', 'world'),
+(3, 'null', 'null', '{}', null, null, null, null);
+
+-- full table
+copy copyjsontest_types to stdout (format json);
+
+-- column subsets exercising each type
+copy copyjsontest_types (id, js, jsb) to stdout (format json);
+copy copyjsontest_types (id, arr, n, b) to stdout (format json);
+copy copyjsontest_types (jsb, t) to stdout (format json);
+copy copyjsontest_types (id, ts) to stdout (format json);
+
+-- single column: json and jsonb
+copy copyjsontest_types (js) to stdout (format json);
+copy copyjsontest_types (jsb) to stdout (format json);
+
+drop table copyjsontest_types;
+
+-- embedded escaped characters
+create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+
+insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+
+insert into copyjsontest (f1) values
+(E'aaa\"bbb'::text),
+(E'aaa\\bbb'::text),
+(E'aaa\/bbb'::text),
+(E'aaa\bbbb'::text),
+(E'aaa\fbbb'::text),
+(E'aaa\nbbb'::text),
+(E'aaa\rbbb'::text),
+(E'aaa\tbbb'::text);
+
+copy copyjsontest to stdout json;
+
create temp table copytest4 (
c1 int,
"colname with tab: " text);
--
2.43.0
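Editorial sketch, not part of the patch: the SendCopyBegin hunk above makes the JSON format advertise exactly one text-format column in CopyOutResponse, whatever the width of the relation. The standalone C below only mimics the byte layout of that message payload so the effect is easier to see. The names put_int16 and copy_out_payload are hypothetical, and no PostgreSQL headers or pqformat routines are used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append a 16-bit integer in network (big-endian) byte order. */
size_t
put_int16(unsigned char *buf, size_t pos, uint16_t val)
{
    buf[pos] = (unsigned char) (val >> 8);
    buf[pos + 1] = (unsigned char) (val & 0xFF);
    return pos + 2;
}

/*
 * Hypothetical helper: fill buf with the CopyOutResponse payload, i.e.
 * everything after the message type byte and the 4-byte length word,
 * and return the payload size in bytes.
 *
 * For JSON the column count is forced to 1 and the format to text (0),
 * mirroring the else-branch added to SendCopyBegin in the patch.
 */
size_t
copy_out_payload(int is_json, int is_binary, int natts,
                 unsigned char *buf, size_t bufsize)
{
    uint16_t    format = is_binary ? 1 : 0;
    size_t      pos = 0;
    int         i;

    if (is_json)
    {
        natts = 1;
        format = 0;
    }

    assert(natts >= 0);
    assert(bufsize >= 3 + 2 * (size_t) natts);

    buf[pos++] = (unsigned char) format;            /* overall format */
    pos = put_int16(buf, pos, (uint16_t) natts);    /* column count */
    for (i = 0; i < natts; i++)
        pos = put_int16(buf, pos, format);          /* per-column formats */

    return pos;
}
```

For a three-column text COPY this gives 9 payload bytes, which with the 4-byte length word matches the "Length: 13" in the tshark capture quoted earlier in the thread. Under the JSON branch the payload would be 5 bytes regardless of the number of columns.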
[text/x-patch] v30-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch (12.5K, 5-v30-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch)
From 9e0a65578eb1de9c8d0521b2931fbe76ad5f2bcf Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Mon, 16 Mar 2026 16:51:12 -0400
Subject: [PATCH v30 3/3] Add option force_array for COPY JSON FORMAT
This adds the force_array option, which is available exclusively
when using COPY TO with the JSON format.
When enabled, this option wraps the output in a top-level JSON array
(enclosed in square brackets with comma-separated elements), making the
entire result a valid single JSON value. Without this option, the
default behavior is to output a stream of independent JSON objects.
Attempting to use this option with COPY FROM or with formats other than
JSON will raise an error.
Author: Joe Conway <[email protected]>
Author: jian he <[email protected]>
Reviewed-by: Junwang Zhao <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Florents Tselai <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
Discussion: https://postgr.es/m/[email protected]
---
doc/src/sgml/ref/copy.sgml | 30 +++++++++++++++++++++++
src/backend/commands/copy.c | 13 ++++++++++
src/backend/commands/copyto.c | 38 ++++++++++++++++++++++++++++--
src/bin/psql/tab-complete.in.c | 2 +-
src/include/commands/copy.h | 1 +
src/test/regress/expected/copy.out | 37 +++++++++++++++++++++++++++++
src/test/regress/sql/copy.sql | 13 ++++++++++
7 files changed, 131 insertions(+), 3 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 75f55bbf6f8..a79587f7613 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -40,6 +40,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
HEADER [ <replaceable class="parameter">boolean</replaceable> | <replaceable class="parameter">integer</replaceable> | MATCH ]
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
+ FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -366,6 +367,19 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</listitem>
</varlistentry>
+ <varlistentry id="sql-copy-params-force-array">
+ <term><literal>FORCE_ARRAY</literal></term>
+ <listitem>
+ <para>
+ Force output of square brackets as array decorations at the beginning
+ and end of output, and commas between the rows. It is allowed only in
+ <command>COPY TO</command>, and only when using
+ <literal>json</literal> format. The default is
+ <literal>false</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="sql-copy-params-force-quote">
<term><literal>FORCE_QUOTE</literal></term>
<listitem>
@@ -1103,6 +1117,22 @@ COPY country TO STDOUT (DELIMITER '|');
</programlisting>
</para>
+<para>
+ When the <literal>FORCE_ARRAY</literal> option is enabled,
+ the entire output is wrapped in a single JSON array with rows separated by commas:
+<programlisting>
+COPY (SELECT * FROM (VALUES(1),(2)) val(id)) TO STDOUT (FORMAT JSON, FORCE_ARRAY);
+</programlisting>
+The output is as follows:
+<screen>
+[
+ {"id":1}
+,{"id":2}
+]
+</screen>
+</para>
+
+
<para>
To copy data from a file into the <literal>country</literal> table:
<programlisting>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 29e22d91ecd..e837f417d0d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -569,6 +569,7 @@ ProcessCopyOptions(ParseState *pstate,
bool on_error_specified = false;
bool log_verbosity_specified = false;
bool reject_limit_specified = false;
+ bool force_array_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
@@ -725,6 +726,13 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "force_array") == 0)
+ {
+ if (force_array_specified)
+ errorConflictingDefElem(defel, pstate);
+ force_array_specified = true;
+ opts_out->force_array = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "on_error") == 0)
{
if (on_error_specified)
@@ -967,6 +975,11 @@ ProcessCopyOptions(ParseState *pstate,
errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("COPY %s is not supported for %s", "FORMAT JSON", "COPY FROM"));
+ if (opts_out->format != COPY_FORMAT_JSON && opts_out->force_array)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY %s can only be used with JSON mode", "FORCE_ARRAY"));
+
if (opts_out->default_print)
{
if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ffe2268fbb0..12872f04eef 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -87,6 +87,7 @@ typedef struct CopyToStateData
List *attnumlist; /* integer list of attnums to copy */
char *filename; /* filename, or NULL for STDOUT */
bool is_program; /* is 'filename' a program to popen? */
+ bool json_row_delim_needed; /* need delimiter before next row */
StringInfo json_buf; /* reusable buffer for JSON output,
* initialized in BeginCopyTo */
TupleDesc tupDesc; /* Descriptor for JSON output; for a column
@@ -141,6 +142,7 @@ static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
bool is_csv);
static void CopyToTextLikeEnd(CopyToState cstate);
static void CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToJsonEnd(CopyToState cstate);
static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -182,7 +184,7 @@ static const CopyToRoutine CopyToRoutineJson = {
.CopyToStart = CopyToTextLikeStart,
.CopyToOutFunc = CopyToTextLikeOutFunc,
.CopyToOneRow = CopyToJsonOneRow,
- .CopyToEnd = CopyToTextLikeEnd,
+ .CopyToEnd = CopyToJsonEnd,
};
/* binary format */
@@ -248,6 +250,15 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
CopySendTextLikeEndOfRow(cstate);
}
+
+ /*
+ * If FORCE_ARRAY has been specified, send the opening bracket.
+ */
+ if (cstate->opts.format == COPY_FORMAT_JSON && cstate->opts.force_array)
+ {
+ CopySendChar(cstate, '[');
+ CopySendTextLikeEndOfRow(cstate);
+ }
}
/*
@@ -324,13 +335,24 @@ CopyToTextLikeOneRow(CopyToState cstate,
CopySendTextLikeEndOfRow(cstate);
}
-/* Implementation of the end callback for text, CSV, and json formats */
+/* Implementation of the end callback for text and CSV formats */
static void
CopyToTextLikeEnd(CopyToState cstate)
{
/* Nothing to do here */
}
+/* Implementation of the end callback for json format */
+static void
+CopyToJsonEnd(CopyToState cstate)
+{
+ if (cstate->opts.force_array)
+ {
+ CopySendChar(cstate, ']');
+ CopySendTextLikeEndOfRow(cstate);
+ }
+}
+
/* Implementation of per-row callback for json format */
static void
CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
@@ -392,6 +414,18 @@ CopyToJsonOneRow(CopyToState cstate, TupleTableSlot *slot)
composite_to_json(rowdata, cstate->json_buf, false);
+ if (cstate->opts.force_array)
+ {
+ if (cstate->json_row_delim_needed)
+ CopySendChar(cstate, ',');
+ else
+ {
+ /* first row needs no delimiter */
+ CopySendChar(cstate, ' ');
+ cstate->json_row_delim_needed = true;
+ }
+ }
+
CopySendData(cstate, cstate->json_buf->data, cstate->json_buf->len);
CopySendTextLikeEndOfRow(cstate);
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index ac36f4591f7..f44a1f22ebf 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1232,7 +1232,7 @@ Copy_common_options, "DEFAULT", "FORCE_NOT_NULL", "FORCE_NULL", "FREEZE", \
/* COPY TO options */
#define Copy_to_options \
-Copy_common_options, "FORCE_QUOTE"
+Copy_common_options, "FORCE_QUOTE", "FORCE_ARRAY"
/*
* These object types were introduced later than our support cutoff of
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 2b5bef6738e..abecfe51098 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -88,6 +88,7 @@ typedef struct CopyFormatOptions
List *force_notnull; /* list of column names */
bool force_notnull_all; /* FORCE_NOT_NULL *? */
bool *force_notnull_flags; /* per-column CSV FNN flags */
+ bool force_array; /* add JSON array decorations */
List *force_null; /* list of column names */
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 7f2d2e065f6..1714faab39c 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -83,6 +83,16 @@ copy (select 1 as foo union all select 2) to stdout with (format json);
copy (values (1), (2)) TO stdout with (format json);
{"column1":1}
{"column1":2}
+copy (select 1 union all select 2) to stdout with (format json, force_array true);
+[
+ {"?column?":1}
+,{"?column?":2}
+]
+copy (values (1), (2)) TO stdout with (format json, force_array true);
+[
+ {"column1":1}
+,{"column1":2}
+]
copy copytest to stdout json;
{"style":"DOS","test":"abc\r\ndef","filler":1}
{"style":"Unix","test":"abc\ndef","filler":2}
@@ -134,6 +144,33 @@ copy copytest (style, test, filler) to stdout (format json);
{"style":"Unix","test":"abc\ndef","filler":2}
{"style":"Mac","test":"abc\rdef","filler":3}
{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- should fail: force_array requires json format
+copy copytest to stdout (format csv, force_array true);
+ERROR: COPY FORCE_ARRAY can only be used with JSON mode
+-- force_array variants
+copy copytest to stdout (format json, force_array);
+[
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+,{"style":"Unix","test":"abc\ndef","filler":2}
+,{"style":"Mac","test":"abc\rdef","filler":3}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+]
+copy copytest(style, test) to stdout (format json, force_array true);
+[
+ {"style":"DOS","test":"abc\r\ndef"}
+,{"style":"Unix","test":"abc\ndef"}
+,{"style":"Mac","test":"abc\rdef"}
+,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb"}
+]
+copy copytest to stdout (format json, force_array false);
+{"style":"DOS","test":"abc\r\ndef","filler":1}
+{"style":"Unix","test":"abc\ndef","filler":2}
+{"style":"Mac","test":"abc\rdef","filler":3}
+{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+-- force_array with empty result set
+copy (select 1 where false) to stdout (format json, force_array);
+[
+]
-- column list with diverse data types
create temp table copyjsontest_types (
id int,
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 404f4321085..eaad290b257 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -86,6 +86,8 @@ copy copytest3 to stdout csv header;
copy (select 1 union all select 2) to stdout with (format json);
copy (select 1 as foo union all select 2) to stdout with (format json);
copy (values (1), (2)) TO stdout with (format json);
+copy (select 1 union all select 2) to stdout with (format json, force_array true);
+copy (values (1), (2)) TO stdout with (format json, force_array true);
copy copytest to stdout json;
copy copytest to stdout (format json);
copy (select * from copytest) to stdout (format json);
@@ -109,6 +111,17 @@ copy copytest from stdin(format json);
-- column list with json format
copy copytest (style, test, filler) to stdout (format json);
+-- should fail: force_array requires json format
+copy copytest to stdout (format csv, force_array true);
+
+-- force_array variants
+copy copytest to stdout (format json, force_array);
+copy copytest(style, test) to stdout (format json, force_array true);
+copy copytest to stdout (format json, force_array false);
+
+-- force_array with empty result set
+copy (select 1 where false) to stdout (format json, force_array);
+
-- column list with diverse data types
create temp table copyjsontest_types (
id int,
--
2.43.0
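Editorial sketch, not part of the patch: the FORCE_ARRAY framing in the 0003 patch above amounts to an opening bracket line, a one-character prefix per row (a space for the first row, a comma afterwards), and a closing bracket line. The standalone C below reproduces just that framing over an array of already-serialized rows. The names append_str and frame_json_array are hypothetical, and a plain "\n" line ending is assumed here, whereas the real code goes through CopySendTextLikeEndOfRow.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append s at out[pos], keeping the buffer NUL-terminated. */
static size_t
append_str(char *out, size_t outsize, size_t pos, const char *s)
{
    size_t      len = strlen(s);

    assert(pos + len < outsize);    /* leave room for the NUL */
    memcpy(out + pos, s, len + 1);
    return pos + len;
}

/*
 * Hypothetical helper: write the rows into out using the FORCE_ARRAY
 * framing. Returns the number of bytes written, not counting the NUL.
 */
size_t
frame_json_array(const char *const *rows, size_t nrows,
                 char *out, size_t outsize)
{
    size_t      pos = 0;
    int         delim_needed = 0;   /* role of json_row_delim_needed */

    assert(outsize > 0);
    out[0] = '\0';

    /* start callback: opening bracket on its own line */
    pos = append_str(out, outsize, pos, "[\n");

    /* per-row callback: space before the first row, comma before the rest */
    for (size_t i = 0; i < nrows; i++)
    {
        pos = append_str(out, outsize, pos, delim_needed ? "," : " ");
        delim_needed = 1;
        pos = append_str(out, outsize, pos, rows[i]);
        pos = append_str(out, outsize, pos, "\n");
    }

    /* end callback: closing bracket on its own line */
    pos = append_str(out, outsize, pos, "]\n");
    return pos;
}
```

Emitting the separator in front of each row rather than after it means the writer never needs to know whether another row follows, which is why the expected output in copy.out has lines starting with a comma and why an empty result set still yields a valid "[" and "]" pair.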
* Re: Emitting JSON to file using COPY TO
@ 2026-03-17 17:50 Masahiko Sawada <[email protected]>
parent: Andrew Dunstan <[email protected]>
1 sibling, 0 replies; 28+ messages in thread
From: Masahiko Sawada @ 2026-03-17 17:50 UTC (permalink / raw)
To: Andrew Dunstan <[email protected]>; +Cc: jian he <[email protected]>; Joe Conway <[email protected]>; Junwang Zhao <[email protected]>; Florents Tselai <[email protected]>; Andrey M. Borodin <[email protected]>; Dean Rasheed <[email protected]>; Daniel Verite <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On Mon, Mar 16, 2026 at 1:59 PM Andrew Dunstan <[email protected]> wrote:
>
>
> On 2026-03-16 Mo 2:24 PM, Masahiko Sawada wrote:
>
> On Sun, Mar 8, 2026 at 8:49 PM jian he <[email protected]> wrote:
>
> On Mon, Mar 9, 2026 at 3:44 AM Andrew Dunstan <[email protected]> wrote:
>
> Hmm. But should we be scribbling on slot->tts_tupleDescriptor like that?
> How about something like this?:
>
> - * Full table or query without column list. Ensure the slot uses
> - * cstate->tupDesc so that the datum is stamped with the right type;
> - * for queries output type is RECORDOID this must be the blessed
> - * descriptor so that composite_to_json can look it up via
> - * lookup_rowtype_tupdesc.
> + * Full table or query without column list. For queries, the slot's
> + * TupleDesc may carry RECORDOID, which is not registered in the type
> + * cache and would cause composite_to_json's lookup_rowtype_tupdesc
> + * call to fail. Build a HeapTuple stamped with the blessed
> + * descriptor so the type can be looked up correctly.
> */
> if (!cstate->rel && slot->tts_tupleDescriptor->tdtypeid == RECORDOID)
> - slot->tts_tupleDescriptor = cstate->queryDesc->tupDesc;
> + {
> + HeapTuple tup;
>
> - rowdata = ExecFetchSlotHeapTupleDatum(slot);
> + tup = heap_form_tuple(cstate->tupDesc,
> + slot->tts_values,
> + slot->tts_isnull);
> + rowdata = HeapTupleGetDatum(tup);
> + }
> + else
> + {
> + rowdata = ExecFetchSlotHeapTupleDatum(slot);
> + }
>
> This is better. I tried to get rid of json_projvalues and json_projnulls
> by using only heap_form_tuple, but that does not work.
>
> I incorporated the v28-0004 COPY column list into v9-0002.
> With this patch set, we added four fields to the struct CopyToStateData.
>
> + StringInfo json_buf; /* reusable buffer for JSON output,
> + * initialized in BeginCopyTo */
> + TupleDesc tupDesc; /* Descriptor for JSON output; for a column
> + * list this is a projected descriptor */
> + Datum *json_projvalues; /* pre-allocated projection values, or
> + * NULL */
> + bool *json_projnulls; /* pre-allocated projection nulls, or NULL */
>
> Using the script in
> https://www.postgresql.org/message-id/CACJufxFFZqxC3p4WjpTEi4riaJm%3DpADX%2Bpy0yQ0%3DRWTn5cqK3Q%40ma...
> I tested it again on macOS and Linux, and there are no regressions for
> COPY TO with the TEXT and CSV formats.
>
> I've reviewed the patch and have some comments:
>
> ---
> I hit an assertion failure in the following scenario:
>
> postgres(1:1197708)=# create table test (a int, b text, c jsonb);
> CREATE TABLE
> postgres(1:1197708)=# copy test(a, b) to stdout with (format 'json' );
> TRAP: failed Assert("tupdesc->firstNonCachedOffsetAttr >= 0"), File: "execTuples.c", Line: 2328, PID: 1197708
> postgres: masahiko postgres [local] COPY(ExceptionalCondition+0x9e) [0xbebe48]
> postgres: masahiko postgres [local] COPY(BlessTupleDesc+0x2b) [0x729b50]
> postgres: masahiko postgres [local] COPY(BeginCopyTo+0xc94) [0x637bdf]
> postgres: masahiko postgres [local] COPY(DoCopy+0xb68) [0x62afbc]
> postgres: masahiko postgres [local] COPY(standard_ProcessUtility+0xa22) [0xa0ba48]
> postgres: masahiko postgres [local] COPY(ProcessUtility+0x10e) [0xa0b01f]
> postgres: masahiko postgres [local] COPY() [0xa09872]
> postgres: masahiko postgres [local] COPY() [0xa09acf]
> postgres: masahiko postgres [local] COPY(PortalRun+0x2c8) [0xa0901d]
> postgres: masahiko postgres [local] COPY() [0xa02055]
> postgres: masahiko postgres [local] COPY(PostgresMain+0xaf1) [0xa0724e]
> postgres: masahiko postgres [local] COPY() [0x9fdab9]
> postgres: masahiko postgres [local] COPY(postmaster_child_launch+0x165) [0x905378]
> postgres: masahiko postgres [local] COPY() [0x90b600]
> postgres: masahiko postgres [local] COPY() [0x908e6a]
> postgres: masahiko postgres [local] COPY(PostmasterMain+0x14fe) [0x90880c]
> postgres: masahiko postgres [local] COPY(main+0x340) [0x7a1f9c]
>
> It seems the patch forgets to call TupleDescFinalize(). I also think we
> need a regression test for this case.
>
> ---
> + if (cstate->opts.format == COPY_FORMAT_JSON)
> + {
> +     /*
> +      * If FORCE_ARRAY has been specified, send the opening bracket.
> +      */
> +     if (cstate->opts.force_array)
> +     {
> +         CopySendChar(cstate, '[');
> +         CopySendTextLikeEndOfRow(cstate);
> +     }
> + }
>
> We can combine the two if conditions into a single one.
>
>
> Here's a v30 set that I hope fixes these issues.
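The review comment quoted above about merging the two `if` conditions can be illustrated with a small standalone sketch. This is not the patch code: `DemoFormat`, `DemoCopyOpts` and `demo_send_array_open` are hypothetical stand-ins for `cstate->opts`, `COPY_FORMAT_JSON` and the `CopySendChar` / `CopySendTextLikeEndOfRow` pair, and the output is written into a caller-supplied buffer purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the COPY option types; not PostgreSQL's own. */
typedef enum
{
    DEMO_FORMAT_TEXT,
    DEMO_FORMAT_CSV,
    DEMO_FORMAT_JSON
} DemoFormat;

typedef struct
{
    DemoFormat  format;
    bool        force_array;
} DemoCopyOpts;

/*
 * Write the array opener "[\n" into out when JSON array output is requested.
 * Returns the number of bytes written (excluding the terminating NUL).
 * The buffer must hold at least 3 bytes.
 */
size_t
demo_send_array_open(const DemoCopyOpts *opts, char *out, size_t outlen)
{
    assert(opts != NULL && out != NULL && outlen >= 3);

    out[0] = '\0';

    /* Format check and FORCE_ARRAY check folded into one condition. */
    if (opts->format == DEMO_FORMAT_JSON && opts->force_array)
    {
        memcpy(out, "[\n", 3);  /* '[', end-of-row newline, and the NUL */
        return 2;
    }

    return 0;
}
```

With `{DEMO_FORMAT_JSON, true}` the helper writes the bracket and a newline; with any other combination it writes nothing. In the patch itself, a non-JSON format combined with force_array is expected to be rejected earlier, as the "should fail" regression test in this thread indicates, so the sketch only models the emission step.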
Thank you for updating the patch! The patches look good to me.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
* Re: Emitting JSON to file using COPY TO
@ 2026-03-18 14:37 Daniel Verite <[email protected]>
parent: Andrew Dunstan <[email protected]>
1 sibling, 1 reply; 28+ messages in thread
From: Daniel Verite @ 2026-03-18 14:37 UTC (permalink / raw)
To: Andrew Dunstan <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; Jian He <[email protected]>; Joe Conway <[email protected]>; Junwang Zhao <[email protected]>; Florents Tselai <[email protected]>; Andrey M. Borodin <[email protected]>; Dean Rasheed <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
Andrew Dunstan wrote:
> Here's a v30 set that I hope fixes these issues.
Currently there's no difference in output between the null
json value and the SQL null.
postgres=# create table tbl (j jsonb);
CREATE TABLE
postgres=# insert into tbl values('null');
INSERT 0 1
postgres=# insert into tbl values(null);
INSERT 0 1
postgres=# table tbl;
j
------
null
(2 rows)
postgres=# copy tbl to stdout with (format json);
{"j":null}
{"j":null}
If we had to reload this file, we could not determine which
kind of null we had even though they are different at the SQL
level:
postgres=# select null::jsonb is distinct from 'null'::jsonb;
?column?
----------
t
Does it have to be that way or are there valid distinct outputs
that we could use to avoid this ambiguity?
Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
* Re: Emitting JSON to file using COPY TO
@ 2026-03-19 01:58 jian he <[email protected]>
parent: Daniel Verite <[email protected]>
0 siblings, 1 reply; 28+ messages in thread
From: jian he @ 2026-03-19 01:58 UTC (permalink / raw)
To: Daniel Verite <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Masahiko Sawada <[email protected]>; Joe Conway <[email protected]>; Junwang Zhao <[email protected]>; Florents Tselai <[email protected]>; Andrey M. Borodin <[email protected]>; Dean Rasheed <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On Wed, Mar 18, 2026 at 10:37 PM Daniel Verite <[email protected]> wrote:
>
> Currently there's no difference in output between the null
> json value and the SQL null.
>
> postgres=# create table tbl (j jsonb);
> postgres=# insert into tbl values('null');
> postgres=# insert into tbl values(null);
> postgres=# copy tbl to stdout with (format json);
> {"j":null}
> {"j":null}
>
> Does it have to be that way or are there valid distinct outputs
> that we could use to avoid this ambiguity?
>
This is an existing (quite old) behavior of
composite_to_json->datum_to_json_internal, IMHO.
```
if (is_null)
{
    appendBinaryStringInfo(result, "null", strlen("null"));
    return;
}
```
produces the same output as
```
case JSONTYPE_JSON:
    /* JSON and JSONB output will already be escaped */
    outputstr = OidOutputFunctionCall(outfuncoid, val);
    appendStringInfoString(result, outputstr);
    pfree(outputstr);
    break;
```
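To make the collision concrete, here is a standalone model of the two branches quoted above. It is only a sketch: `demo_render_field` is a hypothetical helper, not a PostgreSQL function, and it assumes that the text output of a jsonb `'null'` value is the four characters `null`, which is what the session earlier in the thread shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical model of the two code paths; not PostgreSQL code.
 * is_null marks an SQL NULL. Otherwise json_text is the value's text
 * output, which for a json/jsonb column is appended without quoting.
 * Returns the snprintf result.
 */
int
demo_render_field(char *out, size_t outlen, const char *key,
                  bool is_null, const char *json_text)
{
    assert(out != NULL && key != NULL);

    /* SQL NULL branch: emit the bare literal null. */
    if (is_null)
        return snprintf(out, outlen, "{\"%s\":null}", key);

    /* json/jsonb branch: the output text is copied through as-is. */
    assert(json_text != NULL);
    return snprintf(out, outlen, "{\"%s\":%s}", key, json_text);
}
```

Calling the helper once with `is_null = true` and once with `is_null = false, json_text = "null"` fills both buffers with `{"j":null}`, so a consumer of the output has nothing left to tell the two cases apart.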
Therefore I intend to document it as below:
<refsect2 id="sql-copy-json-format" xreflabel="JSON Format">
<title>JSON Format</title>
<para>
When the <literal>json</literal> format is used, data is
exported with one JSON object per line,
where each line corresponds to a single record.
The <literal>json</literal> format has no standard way to
distinguish between an SQL <literal>NULL</literal> and a JSON
<literal>null</literal> literal.
In the examples that follow, the following table containing JSON
data will be used:
<programlisting>
CREATE TABLE my_test (a jsonb, b int);
INSERT INTO my_test VALUES ('null', 1), (NULL, 1);
</programlisting>
When exporting this table using the <literal>json</literal> format:
<programlisting>
COPY my_test TO STDOUT (FORMAT JSON);
</programlisting>
In the resulting output, both the SQL <literal>NULL</literal> and
the JSON <literal>null</literal> are rendered identically:
<screen>
{"a":null,"b":1}
{"a":null,"b":1}
</screen>
</para>
</refsect2>
what do you think?
--
jian
https://www.enterprisedb.com/
* Re: Emitting JSON to file using COPY TO
@ 2026-03-19 15:02 Andrew Dunstan <[email protected]>
parent: jian he <[email protected]>
0 siblings, 1 reply; 28+ messages in thread
From: Andrew Dunstan @ 2026-03-19 15:02 UTC (permalink / raw)
To: jian he <[email protected]>; Daniel Verite <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; Joe Conway <[email protected]>; Junwang Zhao <[email protected]>; Florents Tselai <[email protected]>; Andrey M. Borodin <[email protected]>; Dean Rasheed <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 2026-03-18 We 9:58 PM, jian he wrote:
> On Wed, Mar 18, 2026 at 10:37 PM Daniel Verite <[email protected]> wrote:
>> Currently there's no difference in output between the null
>> json value and the SQL null.
>>
>> postgres=# create table tbl (j jsonb);
>> postgres=# insert into tbl values('null');
>> postgres=# insert into tbl values(null);
>> postgres=# copy tbl to stdout with (format json);
>> {"j":null}
>> {"j":null}
>>
>> Does it have to be that way or are there valid distinct outputs
>> that we could use to avoid this ambiguity?
>>
> This is an existing (quite old) behavior of
> composite_to_json->datum_to_json_internal, IMHO.
>
> ```
> if (is_null)
> {
> appendBinaryStringInfo(result, "null", strlen("null"));
> return;
> }
> ```
> produces the same output as
> ```
> case JSONTYPE_JSON:
> /* JSON and JSONB output will already be escaped */
> outputstr = OidOutputFunctionCall(outfuncoid, val);
> appendStringInfoString(result, outputstr);
> pfree(outputstr);
> break;
> ```
>
> Therefore I intend to document it as below:
>
> <refsect2 id="sql-copy-json-format" xreflabel="JSON Format">
> <title>JSON Format</title>
> <para>
> When the <literal>json</literal> format is used, data is
> exported with one JSON object per line,
> where each line corresponds to a single record.
> The <literal>json</literal> format has no standard way to
> distinguish between an SQL <literal>NULL</literal> and a JSON
> <literal>null</literal> literal.
> In the examples that follow, the following table containing JSON
> data will be used:
> <programlisting>
> CREATE TABLE my_test (a jsonb, b int);
> INSERT INTO my_test VALUES ('null', 1), (NULL, 1);
> </programlisting>
>
> When exporting this table using the <literal>json</literal> format:
> <programlisting>
> COPY my_test TO STDOUT (FORMAT JSON);
> </programlisting>
> In the resulting output, both the SQL <literal>NULL</literal> and
> the JSON <literal>null</literal> are rendered identically:
> <screen>
> {"a":null,"b":1}
> {"a":null,"b":1}
> </screen>
> </para>
> </refsect2>
>
>
>
> what do you think?
>
>
>
I can live with that, if others can.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
* Re: Emitting JSON to file using COPY TO
@ 2026-03-19 16:06 Joe Conway <[email protected]>
parent: Andrew Dunstan <[email protected]>
0 siblings, 0 replies; 28+ messages in thread
From: Joe Conway @ 2026-03-19 16:06 UTC (permalink / raw)
To: Andrew Dunstan <[email protected]>; jian he <[email protected]>; Daniel Verite <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; Junwang Zhao <[email protected]>; Florents Tselai <[email protected]>; Andrey M. Borodin <[email protected]>; Dean Rasheed <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 3/19/26 11:02, Andrew Dunstan wrote:
>
> On 2026-03-18 We 9:58 PM, jian he wrote:
>> On Wed, Mar 18, 2026 at 10:37 PM Daniel Verite <[email protected]> wrote:
>>> Currently there's no difference in output between the null
>>> json value and the SQL null.
>>>
>>> postgres=# create table tbl (j jsonb);
>>> postgres=# insert into tbl values('null');
>>> postgres=# insert into tbl values(null);
>>> postgres=# copy tbl to stdout with (format json);
>>> {"j":null}
>>> {"j":null}
>>>
>>> Does it have to be that way or are there valid distinct outputs
>>> that we could use to avoid this ambiguity?
>>>
>> This is an existing (quite old) behavior of
>> composite_to_json->datum_to_json_internal, IMHO.
>>
>> ```
>> if (is_null)
>> {
>> appendBinaryStringInfo(result, "null", strlen("null"));
>> return;
>> }
>> ```
>> produces the same output as
>> ```
>> case JSONTYPE_JSON:
>> /* JSON and JSONB output will already be escaped */
>> outputstr = OidOutputFunctionCall(outfuncoid, val);
>> appendStringInfoString(result, outputstr);
>> pfree(outputstr);
>> break;
>> ```
>>
>> Therefore I intend to document it as below:
>>
>> <refsect2 id="sql-copy-json-format" xreflabel="JSON Format">
>> <title>JSON Format</title>
>> <para>
>> When the <literal>json</literal> format is used, data is
>> exported with one JSON object per line,
>> where each line corresponds to a single record.
>> The <literal>json</literal> format has no standard way to
>> distinguish between an SQL <literal>NULL</literal> and a JSON
>> <literal>null</literal> literal.
>> In the examples that follow, the following table containing JSON
>> data will be used:
>> <programlisting>
>> CREATE TABLE my_test (a jsonb, b int);
>> INSERT INTO my_test VALUES ('null', 1), (NULL, 1);
>> </programlisting>
>>
>> When exporting this table using the <literal>json</literal> format:
>> <programlisting>
>> COPY my_test TO STDOUT (FORMAT JSON);
>> </programlisting>
>> In the resulting output, both the SQL <literal>NULL</literal> and
>> the JSON <literal>null</literal> are rendered identically:
>> <screen>
>> {"a":null,"b":1}
>> {"a":null,"b":1}
>> </screen>
>> </para>
>> </refsect2>
>>
>>
>>
>> what do you think?
>>
>>
>>
>
> I can live with that, if others can.
+1
WFM
--
Joe Conway
PostgreSQL Contributors Team
Amazon Web Services: https://aws.amazon.com
Thread overview: 28+ messages
2023-12-06 19:47 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 23:09 ` Joe Conway <[email protected]>
2023-12-07 01:10 ` Joe Conway <[email protected]>
2024-01-19 08:09 ` Masahiko Sawada <[email protected]>
2024-01-23 05:31 ` jian he <[email protected]>
2024-01-27 05:55 ` Junwang Zhao <[email protected]>
2024-01-31 09:49 ` vignesh C <[email protected]>
2024-01-31 09:58 ` Junwang Zhao <[email protected]>
2024-01-31 13:26 ` Alvaro Herrera <[email protected]>
2025-03-11 08:23 ` jian he <[email protected]>
2025-07-04 05:27 ` jian he <[email protected]>
2025-08-10 15:20 ` jian he <[email protected]>
2025-10-01 06:16 ` jian he <[email protected]>
2025-11-10 00:53 ` jian he <[email protected]>
2025-11-29 02:46 ` jian he <[email protected]>
2026-03-04 15:51 ` Andrew Dunstan <[email protected]>
2026-03-05 20:49 ` Andrew Dunstan <[email protected]>
2026-03-06 09:38 ` jian he <[email protected]>
2026-03-08 16:16 ` jian he <[email protected]>
2026-03-08 19:44 ` Andrew Dunstan <[email protected]>
2026-03-09 03:48 ` jian he <[email protected]>
2026-03-16 18:24 ` Masahiko Sawada <[email protected]>
2026-03-16 20:59 ` Andrew Dunstan <[email protected]>
2026-03-17 17:50 ` Masahiko Sawada <[email protected]>
2026-03-18 14:37 ` Daniel Verite <[email protected]>
2026-03-19 01:58 ` jian he <[email protected]>
2026-03-19 15:02 ` Andrew Dunstan <[email protected]>
2026-03-19 16:06 ` Joe Conway <[email protected]>