public inbox for [email protected]
help / color / mirror / Atom feedRe: Emitting JSON to file using COPY TO
37+ messages / 8 participants
[nested] [flat]
* Re: Emitting JSON to file using COPY TO
@ 2023-12-05 02:54 Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
0 siblings, 1 reply; 37+ messages in thread
From: Joe Conway @ 2023-12-05 02:54 UTC (permalink / raw)
To: Davin Shearer <[email protected]>; Andrew Dunstan <[email protected]>; pgsql-hackers
On 12/4/23 17:55, Davin Shearer wrote:
> Sorry about the top posting / top quoting... the link you sent me gives
> me a 404. I'm not exactly sure what top quoting / posting means and
> Googling those terms wasn't helpful for me, but I've removed the quoting
> that my mail client is automatically "helpfully" adding to my emails. I
> mean no offense.
No offense taken. But it is worthwhile to conform to the very long
established norms of the mailing lists on which you participate. See:
https://en.wikipedia.org/wiki/Posting_style
I would describe the Postgres list style (based on that link) as
"inline replying, in which the different parts of the reply follow
the relevant parts of the original post...[with]...trimming of the
original text"
> There are however a few characters that need to be escaped
> 1. |"|(double quote)
> 2. |\|(backslash)
> 3. |/|(forward slash)
> 4. |\b|(backspace)
> 5. |\f|(form feed)
> 6. |\n|(new line)
> 7. |\r|(carriage return)
> 8. |\t|(horizontal tab)
>
> These characters should be represented in the test cases to see how the
> escaping behaves and to ensure that the escaping is done properly per
> JSON requirements.
I can look at adding these as test cases. The latest version of the
patch (attached) includes some of that already. For reference, the tests
so far include this:
8<-------------------------------
test=# select * from copytest;
style | test | filler
---------+----------+--------
DOS | abc\r +| 1
| def |
Unix | abc +| 2
| def |
Mac | abc\rdef | 3
esc\ape | a\r\\r\ +| 4
| \nb |
(4 rows)
test=# copy copytest to stdout (format json);
{"style":"DOS","test":"abc\r\ndef","filler":1}
{"style":"Unix","test":"abc\ndef","filler":2}
{"style":"Mac","test":"abc\rdef","filler":3}
{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
8<-------------------------------
At this point "COPY TO" should be sending exactly the unaltered output
of the postgres JSON processing functions.
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
[text/x-patch] copyto_json.004.diff (18.8K, 2-copyto_json.004.diff)
download | inline diff:
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69..af8777b 100644
*** a/doc/src/sgml/ref/copy.sgml
--- b/doc/src/sgml/ref/copy.sgml
*************** COPY { <replaceable class="parameter">ta
*** 43,48 ****
--- 43,50 ----
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+ FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
+ FORCE_ROW_DELIMITER [ <replaceable class="parameter">boolean</replaceable> ]
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
</synopsis>
</refsynopsisdiv>
*************** COPY { <replaceable class="parameter">ta
*** 206,214 ****
--- 208,221 ----
Selects the data format to be read or written:
<literal>text</literal>,
<literal>csv</literal> (Comma Separated Values),
+ <literal>json</literal> (JavaScript Object Notation),
or <literal>binary</literal>.
The default is <literal>text</literal>.
</para>
+ <para>
+ The <literal>json</literal> option is allowed only in
+ <command>COPY TO</command>.
+ </para>
</listitem>
</varlistentry>
*************** COPY { <replaceable class="parameter">ta
*** 372,377 ****
--- 379,410 ----
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>FORCE_ROW_DELIMITER</literal></term>
+ <listitem>
+ <para>
+ Force output of commas as row delimiters, in addition to the usual
+ end of line characters. This option is allowed only in
+ <command>COPY TO</command>, and only when using
+ <literal>JSON</literal> format.
+ The default is <literal>false</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>FORCE_ARRAY</literal></term>
+ <listitem>
+ <para>
+ Force output of array decorations at the beginning and end of output.
+ This option implies the <literal>FORCE_ROW_DELIMITER</literal>
+ option. It is allowed only in <command>COPY TO</command>, and only
+ when using <literal>JSON</literal> format.
+ The default is <literal>false</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
<varlistentry>
<term><literal>ENCODING</literal></term>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b..0236a9e 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** ProcessCopyOptions(ParseState *pstate,
*** 419,424 ****
--- 419,426 ----
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool force_row_delimiter_specified = false;
+ bool force_array_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
*************** ProcessCopyOptions(ParseState *pstate,
*** 443,448 ****
--- 445,452 ----
/* default format */ ;
else if (strcmp(fmt, "csv") == 0)
opts_out->csv_mode = true;
+ else if (strcmp(fmt, "json") == 0)
+ opts_out->json_mode = true;
else if (strcmp(fmt, "binary") == 0)
opts_out->binary = true;
else
*************** ProcessCopyOptions(ParseState *pstate,
*** 540,545 ****
--- 544,563 ----
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "force_row_delimiter") == 0)
+ {
+ if (force_row_delimiter_specified)
+ errorConflictingDefElem(defel, pstate);
+ force_row_delimiter_specified = true;
+ opts_out->force_row_delimiter = defGetBoolean(defel);
+ }
+ else if (strcmp(defel->defname, "force_array") == 0)
+ {
+ if (force_array_specified)
+ errorConflictingDefElem(defel, pstate);
+ force_array_specified = true;
+ opts_out->force_array = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "convert_selectively") == 0)
{
/*
*************** ProcessCopyOptions(ParseState *pstate,
*** 598,603 ****
--- 616,647 ----
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->json_mode)
+ {
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use JSON mode in COPY FROM")));
+
+ if (opts_out->force_array &&
+ force_row_delimiter_specified &&
+ !opts_out->force_row_delimiter)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot specify FORCE_ROW_DELIMITER false with FORCE_ARRAY true")));
+
+ if (opts_out->force_array)
+ opts_out->force_row_delimiter = true;
+ }
+ else if (opts_out->force_array)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY FORCE_ARRAY requires JSON mode")));
+ else if (opts_out->force_row_delimiter)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY FORCE_ROW_DELIMITER requires JSON mode")));
+
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
*************** ProcessCopyOptions(ParseState *pstate,
*** 667,672 ****
--- 711,721 ----
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot specify HEADER in BINARY mode")));
+ if (opts_out->json_mode && opts_out->header_line)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot specify HEADER in JSON mode")));
+
/* Check quote */
if (!opts_out->csv_mode && opts_out->quote != NULL)
ereport(ERROR,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c66a047..fba3070 100644
*** a/src/backend/commands/copyto.c
--- b/src/backend/commands/copyto.c
***************
*** 37,42 ****
--- 37,43 ----
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
#include "tcop/tcopprot.h"
+ #include "utils/json.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/partcache.h"
*************** typedef struct
*** 112,117 ****
--- 113,120 ----
/* NOTE: there's a copy of this in copyfromparse.c */
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
+ /* need delimiter to start next json array element */
+ static bool json_row_delim_needed = false;
/* non-export function prototypes */
static void EndCopy(CopyToState cstate);
*************** DoCopyTo(CopyToState cstate)
*** 845,850 ****
--- 848,867 ----
CopySendEndOfRow(cstate);
}
+
+ /*
+ * If JSON has been requested, and FORCE_ARRAY has been specified
+ * send the opening bracket.
+ */
+ if (cstate->opts.json_mode)
+ {
+ if (cstate->opts.force_array)
+ {
+ CopySendChar(cstate, '[');
+ CopySendEndOfRow(cstate);
+ }
+ json_row_delim_needed = false;
+ }
}
if (cstate->rel)
*************** DoCopyTo(CopyToState cstate)
*** 892,897 ****
--- 909,925 ----
CopySendEndOfRow(cstate);
}
+ /*
+ * If JSON has been requested, and FORCE_ARRAY has been specified
+ * send the closing bracket.
+ */
+ if (cstate->opts.json_mode &&
+ cstate->opts.force_array)
+ {
+ CopySendChar(cstate, ']');
+ CopySendEndOfRow(cstate);
+ }
+
MemoryContextDelete(cstate->rowcontext);
if (fe_copy)
*************** DoCopyTo(CopyToState cstate)
*** 906,916 ****
static void
CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
- bool need_delim = false;
- FmgrInfo *out_functions = cstate->out_functions;
MemoryContext oldcontext;
- ListCell *cur;
- char *string;
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
--- 934,940 ----
*************** CopyOneRowTo(CopyToState cstate, TupleTa
*** 921,974 ****
CopySendInt16(cstate, list_length(cstate->attnumlist));
}
! /* Make sure the tuple is fully deconstructed */
! slot_getallattrs(slot);
!
! foreach(cur, cstate->attnumlist)
{
! int attnum = lfirst_int(cur);
! Datum value = slot->tts_values[attnum - 1];
! bool isnull = slot->tts_isnull[attnum - 1];
! if (!cstate->opts.binary)
! {
! if (need_delim)
! CopySendChar(cstate, cstate->opts.delim[0]);
! need_delim = true;
! }
! if (isnull)
! {
! if (!cstate->opts.binary)
! CopySendString(cstate, cstate->opts.null_print_client);
! else
! CopySendInt32(cstate, -1);
! }
! else
{
if (!cstate->opts.binary)
{
! string = OutputFunctionCall(&out_functions[attnum - 1],
! value);
! if (cstate->opts.csv_mode)
! CopyAttributeOutCSV(cstate, string,
! cstate->opts.force_quote_flags[attnum - 1],
! list_length(cstate->attnumlist) == 1);
else
! CopyAttributeOutText(cstate, string);
}
else
{
! bytea *outputbytes;
! outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! value);
! CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! CopySendData(cstate, VARDATA(outputbytes),
! VARSIZE(outputbytes) - VARHDRSZ);
}
}
}
CopySendEndOfRow(cstate);
--- 945,1028 ----
CopySendInt16(cstate, list_length(cstate->attnumlist));
}
! if (!cstate->opts.json_mode)
{
! bool need_delim = false;
! FmgrInfo *out_functions = cstate->out_functions;
! ListCell *cur;
! char *string;
! /* Make sure the tuple is fully deconstructed */
! slot_getallattrs(slot);
! foreach(cur, cstate->attnumlist)
{
+ int attnum = lfirst_int(cur);
+ Datum value = slot->tts_values[attnum - 1];
+ bool isnull = slot->tts_isnull[attnum - 1];
+
if (!cstate->opts.binary)
{
! if (need_delim)
! CopySendChar(cstate, cstate->opts.delim[0]);
! need_delim = true;
! }
!
! if (isnull)
! {
! if (!cstate->opts.binary)
! CopySendString(cstate, cstate->opts.null_print_client);
else
! CopySendInt32(cstate, -1);
}
else
{
! if (!cstate->opts.binary)
! {
! string = OutputFunctionCall(&out_functions[attnum - 1],
! value);
! if (cstate->opts.csv_mode)
! CopyAttributeOutCSV(cstate, string,
! cstate->opts.force_quote_flags[attnum - 1],
! list_length(cstate->attnumlist) == 1);
! else
! CopyAttributeOutText(cstate, string);
! }
! else
! {
! bytea *outputbytes;
! outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! value);
! CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! CopySendData(cstate, VARDATA(outputbytes),
! VARSIZE(outputbytes) - VARHDRSZ);
! }
}
}
}
+ else
+ {
+ Datum rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ StringInfo result;
+
+ result = makeStringInfo();
+ composite_to_json(rowdata, result, false);
+
+ if (json_row_delim_needed &&
+ cstate->opts.force_row_delimiter)
+ {
+ CopySendChar(cstate, ',');
+ }
+ else if (cstate->opts.force_row_delimiter)
+ {
+ /* first row needs no delimiter */
+ CopySendChar(cstate, ' ');
+ json_row_delim_needed = true;
+ }
+
+ CopySendData(cstate, result->data, result->len);
+ }
CopySendEndOfRow(cstate);
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac8..16aa131 100644
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*************** copy_opt_item:
*** 3408,3413 ****
--- 3408,3417 ----
{
$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
}
+ | JSON
+ {
+ $$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ }
| HEADER_P
{
$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
*************** copy_opt_item:
*** 3448,3453 ****
--- 3452,3465 ----
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+ | FORCE ROW DELIMITER
+ {
+ $$ = makeDefElem("force_row_delimiter", (Node *) makeBoolean(true), @1);
+ }
+ | FORCE ARRAY
+ {
+ $$ = makeDefElem("force_array", (Node *) makeBoolean(true), @1);
+ }
;
/* The following exist for backward compatibility with very old versions */
*************** copy_generic_opt_elem:
*** 3490,3495 ****
--- 3502,3511 ----
{
$$ = makeDefElem($1, $2, @1);
}
+ | FORMAT_LA copy_generic_opt_arg
+ {
+ $$ = makeDefElem("format", $2, @1);
+ }
;
copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 71ae53f..cb4311e 100644
*** a/src/backend/utils/adt/json.c
--- b/src/backend/utils/adt/json.c
*************** typedef struct JsonAggState
*** 83,90 ****
JsonUniqueBuilderState unique_check;
} JsonAggState;
- static void composite_to_json(Datum composite, StringInfo result,
- bool use_line_feeds);
static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
Datum *vals, bool *nulls, int *valcount,
JsonTypeCategory tcategory, Oid outfuncoid,
--- 83,88 ----
*************** array_to_json_internal(Datum array, Stri
*** 490,497 ****
/*
* Turn a composite / record into JSON.
*/
! static void
composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
{
HeapTupleHeader td;
--- 488,496 ----
/*
* Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
*/
! void
composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
{
HeapTupleHeader td;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b..266910d 100644
*** a/src/include/commands/copy.h
--- b/src/include/commands/copy.h
*************** typedef struct CopyFormatOptions
*** 43,48 ****
--- 43,49 ----
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
bool csv_mode; /* Comma Separated Value format? */
+ bool json_mode; /* JSON format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
*************** typedef struct CopyFormatOptions
*** 61,66 ****
--- 62,69 ----
List *force_null; /* list of column names */
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
+ bool force_row_delimiter; /* use comma as per-row JSON delimiter */
+ bool force_array; /* JSON array; implies force_row_delimiter */
bool convert_selectively; /* do selective binary conversion? */
List *convert_select; /* list of column names (can be NIL) */
} CopyFormatOptions;
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index f07e82c..badc5a6 100644
*** a/src/include/utils/json.h
--- b/src/include/utils/json.h
***************
*** 17,22 ****
--- 17,24 ----
#include "lib/stringinfo.h"
/* functions in json.c */
+ extern void composite_to_json(Datum composite, StringInfo result,
+ bool use_line_feeds);
extern void escape_json(StringInfo buf, const char *str);
extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
const int *tzp);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index b48365e..a34cb39 100644
*** a/src/test/regress/expected/copy.out
--- b/src/test/regress/expected/copy.out
*************** copy copytest3 to stdout csv header;
*** 42,47 ****
--- 42,98 ----
c1,"col with , comma","col with "" quote"
1,a,1
2,b,2
+ --- test copying in JSON mode with various styles
+ copy copytest to stdout json;
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ {"style":"Unix","test":"abc\ndef","filler":2}
+ {"style":"Mac","test":"abc\rdef","filler":3}
+ {"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ copy copytest to stdout (format json);
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ {"style":"Unix","test":"abc\ndef","filler":2}
+ {"style":"Mac","test":"abc\rdef","filler":3}
+ {"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ copy copytest to stdout (format json, force_array);
+ [
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ ,{"style":"Unix","test":"abc\ndef","filler":2}
+ ,{"style":"Mac","test":"abc\rdef","filler":3}
+ ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ ]
+ copy copytest to stdout (format json, force_row_delimiter);
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ ,{"style":"Unix","test":"abc\ndef","filler":2}
+ ,{"style":"Mac","test":"abc\rdef","filler":3}
+ ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ -- Error
+ copy copytest to stdout
+ (format json, force_array true, force_row_delimiter false);
+ ERROR: cannot specify FORCE_ROW_DELIMITER false with FORCE_ARRAY true
+ -- Error
+ copy copytest to stdout
+ (format json, header);
+ ERROR: cannot specify HEADER in JSON mode
+ -- embedded quotes
+ create temp table copyjsontest (
+ id int8,
+ f1 text,
+ f2 timestamptz);
+ insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+ copy copyjsontest to stdout json;
+ {"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
create temp table copytest4 (
c1 int,
"colname with tab: " text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 43d2e90..189c703 100644
*** a/src/test/regress/sql/copy.sql
--- b/src/test/regress/sql/copy.sql
*************** this is just a line full of junk that wo
*** 54,59 ****
--- 54,94 ----
copy copytest3 to stdout csv header;
+ --- test copying in JSON mode with various styles
+ copy copytest to stdout json;
+
+ copy copytest to stdout (format json);
+
+ copy copytest to stdout (format json, force_array);
+
+ copy copytest to stdout (format json, force_row_delimiter);
+
+ -- Error
+ copy copytest to stdout
+ (format json, force_array true, force_row_delimiter false);
+
+ -- Error
+ copy copytest to stdout
+ (format json, header);
+
+ -- embedded quotes
+ create temp table copyjsontest (
+ id int8,
+ f1 text,
+ f2 timestamptz);
+
+ insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+
+ copy copyjsontest to stdout json;
+
create temp table copytest4 (
c1 int,
"colname with tab: " text);
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-05 16:54 ` Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
0 siblings, 1 reply; 37+ messages in thread
From: Joe Conway @ 2023-12-05 16:54 UTC (permalink / raw)
To: Davin Shearer <[email protected]>; Andrew Dunstan <[email protected]>; pgsql-hackers
On 12/4/23 21:54, Joe Conway wrote:
> On 12/4/23 17:55, Davin Shearer wrote:
>> There are however a few characters that need to be escaped
>
>> 1. |"|(double quote)
>> 2. |\|(backslash)
>> 3. |/|(forward slash)
>> 4. |\b|(backspace)
>> 5. |\f|(form feed)
>> 6. |\n|(new line)
>> 7. |\r|(carriage return)
>> 8. |\t|(horizontal tab)
>>
>> These characters should be represented in the test cases to see how the
>> escaping behaves and to ensure that the escaping is done properly per
>> JSON requirements.
>
> I can look at adding these as test cases.
So I did a quick check:
8<--------------------------
with t(f1) as
(
values
(E'aaa\"bbb'::text),
(E'aaa\\bbb'::text),
(E'aaa\/bbb'::text),
(E'aaa\bbbb'::text),
(E'aaa\fbbb'::text),
(E'aaa\nbbb'::text),
(E'aaa\rbbb'::text),
(E'aaa\tbbb'::text)
)
select
length(t.f1),
t.f1,
row_to_json(t)
from t;
length | f1 | row_to_json
--------+-------------+-------------------
7 | aaa"bbb | {"f1":"aaa\"bbb"}
7 | aaa\bbb | {"f1":"aaa\\bbb"}
7 | aaa/bbb | {"f1":"aaa/bbb"}
7 | aaa\x08bbb | {"f1":"aaa\bbbb"}
7 | aaa\x0Cbbb | {"f1":"aaa\fbbb"}
7 | aaa +| {"f1":"aaa\nbbb"}
| bbb |
7 | aaa\rbbb | {"f1":"aaa\rbbb"}
7 | aaa bbb | {"f1":"aaa\tbbb"}
(8 rows)
8<--------------------------
This is all independent of my patch for COPY TO. If I am reading that
correctly, everything matches Davin's table *except* the forward slash
("/"). I defer to the experts on the thread to debate that...
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-05 17:43 ` Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
0 siblings, 1 reply; 37+ messages in thread
From: Davin Shearer @ 2023-12-05 17:43 UTC (permalink / raw)
To: pgsql-hackers; +Cc: Andrew Dunstan <[email protected]>; Joe Conway <[email protected]>
Thanks for the wayback machine link Andrew. I read it, understood it, and
will comply.
Joe, those test cases look great and the outputs are the same as `jq`.
As for forward slashes being escaped, I found this:
https://stackoverflow.com/questions/1580647/json-why-are-forward-slashes-escaped
.
Forward slash escaping is optional, so not escaping them in Postgres is
okay. The important thing is that the software _reading_ JSON interprets
both '\/' and '/' as '/'.
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
@ 2023-12-05 18:51 ` Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
0 siblings, 1 reply; 37+ messages in thread
From: Joe Conway @ 2023-12-05 18:51 UTC (permalink / raw)
To: Davin Shearer <[email protected]>; pgsql-hackers; +Cc: Andrew Dunstan <[email protected]>
On 12/5/23 12:43, Davin Shearer wrote:
> Joe, those test cases look great and the outputs are the same as `jq`.
<link to info regarding escaping of forward slashes>
> Forward slash escaping is optional, so not escaping them in Postgres is
> okay. The important thing is that the software _reading_ JSON
> interprets both '\/' and '/' as '/'.
Thanks for the review and info. I modified the existing regression test
thus:
8<--------------------------
create temp table copyjsontest (
id bigserial,
f1 text,
f2 timestamptz);
insert into copyjsontest
select g.i,
CASE WHEN g.i % 2 = 0 THEN
'line with '' in it: ' || g.i::text
ELSE
'line with " in it: ' || g.i::text
END,
'Mon Feb 10 17:32:01 1997 PST'
from generate_series(1,5) as g(i);
insert into copyjsontest (f1) values
(E'aaa\"bbb'::text),
(E'aaa\\bbb'::text),
(E'aaa\/bbb'::text),
(E'aaa\bbbb'::text),
(E'aaa\fbbb'::text),
(E'aaa\nbbb'::text),
(E'aaa\rbbb'::text),
(E'aaa\tbbb'::text);
copy copyjsontest to stdout json;
{"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T20:32:01-05:00"}
{"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T20:32:01-05:00"}
{"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T20:32:01-05:00"}
{"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T20:32:01-05:00"}
{"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T20:32:01-05:00"}
{"id":1,"f1":"aaa\"bbb","f2":null}
{"id":2,"f1":"aaa\\bbb","f2":null}
{"id":3,"f1":"aaa/bbb","f2":null}
{"id":4,"f1":"aaa\bbbb","f2":null}
{"id":5,"f1":"aaa\fbbb","f2":null}
{"id":6,"f1":"aaa\nbbb","f2":null}
{"id":7,"f1":"aaa\rbbb","f2":null}
{"id":8,"f1":"aaa\tbbb","f2":null}
8<--------------------------
I think the code, documentation, and tests are in pretty good shape at
this point. Latest version attached.
Any other comments or complaints out there?
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
[text/x-patch] copyto_json.005.diff (20.3K, 2-copyto_json.005.diff)
download | inline diff:
Add json format mode to COPY TO
Add json format mode support to COPY TO, which includes three output
variations: 1) "json lines" which is each row as a json object delimited
by newlines (the default); 2) "json lines", except include comma delimiters
between json objects; and 3) "json array" which is the same as #2, but with
the addition of a leading "[" and trailing "]" to form a valid json array.
Early versions: helpful hints/reviews provided by Nathan Bossart,
Tom Lane, and Maciek Sakrejda. Final versions: reviewed by Andrew Dunstan
and Davin Shearer.
Requested-by: Davin Shearer
Author: Joe Conway
Reviewed-by: Andrew Dunstan, Davin Shearer
Discussion: https://postgr.es/m/flat/24e3ee88-ec1e-421b-89ae-8a47ee0d2df1%40joeconway.com#a5e6b8829f9a74dfc835f6f29f2e44c5
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69..af8777b 100644
*** a/doc/src/sgml/ref/copy.sgml
--- b/doc/src/sgml/ref/copy.sgml
*************** COPY { <replaceable class="parameter">ta
*** 43,48 ****
--- 43,50 ----
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+ FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
+ FORCE_ROW_DELIMITER [ <replaceable class="parameter">boolean</replaceable> ]
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
</synopsis>
</refsynopsisdiv>
*************** COPY { <replaceable class="parameter">ta
*** 206,214 ****
--- 208,221 ----
Selects the data format to be read or written:
<literal>text</literal>,
<literal>csv</literal> (Comma Separated Values),
+ <literal>json</literal> (JavaScript Object Notation),
or <literal>binary</literal>.
The default is <literal>text</literal>.
</para>
+ <para>
+ The <literal>json</literal> option is allowed only in
+ <command>COPY TO</command>.
+ </para>
</listitem>
</varlistentry>
*************** COPY { <replaceable class="parameter">ta
*** 372,377 ****
--- 379,410 ----
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>FORCE_ROW_DELIMITER</literal></term>
+ <listitem>
+ <para>
+ Force output of commas as row delimiters, in addition to the usual
+ end of line characters. This option is allowed only in
+ <command>COPY TO</command>, and only when using
+ <literal>JSON</literal> format.
+ The default is <literal>false</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>FORCE_ARRAY</literal></term>
+ <listitem>
+ <para>
+ Force output of array decorations at the beginning and end of output.
+ This option implies the <literal>FORCE_ROW_DELIMITER</literal>
+ option. It is allowed only in <command>COPY TO</command>, and only
+ when using <literal>JSON</literal> format.
+ The default is <literal>false</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
<varlistentry>
<term><literal>ENCODING</literal></term>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b..0236a9e 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** ProcessCopyOptions(ParseState *pstate,
*** 419,424 ****
--- 419,426 ----
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool force_row_delimiter_specified = false;
+ bool force_array_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
*************** ProcessCopyOptions(ParseState *pstate,
*** 443,448 ****
--- 445,452 ----
/* default format */ ;
else if (strcmp(fmt, "csv") == 0)
opts_out->csv_mode = true;
+ else if (strcmp(fmt, "json") == 0)
+ opts_out->json_mode = true;
else if (strcmp(fmt, "binary") == 0)
opts_out->binary = true;
else
*************** ProcessCopyOptions(ParseState *pstate,
*** 540,545 ****
--- 544,563 ----
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "force_row_delimiter") == 0)
+ {
+ if (force_row_delimiter_specified)
+ errorConflictingDefElem(defel, pstate);
+ force_row_delimiter_specified = true;
+ opts_out->force_row_delimiter = defGetBoolean(defel);
+ }
+ else if (strcmp(defel->defname, "force_array") == 0)
+ {
+ if (force_array_specified)
+ errorConflictingDefElem(defel, pstate);
+ force_array_specified = true;
+ opts_out->force_array = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "convert_selectively") == 0)
{
/*
*************** ProcessCopyOptions(ParseState *pstate,
*** 598,603 ****
--- 616,647 ----
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->json_mode)
+ {
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use JSON mode in COPY FROM")));
+
+ if (opts_out->force_array &&
+ force_row_delimiter_specified &&
+ !opts_out->force_row_delimiter)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot specify FORCE_ROW_DELIMITER false with FORCE_ARRAY true")));
+
+ if (opts_out->force_array)
+ opts_out->force_row_delimiter = true;
+ }
+ else if (opts_out->force_array)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY FORCE_ARRAY requires JSON mode")));
+ else if (opts_out->force_row_delimiter)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY FORCE_ROW_DELIMITER requires JSON mode")));
+
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
*************** ProcessCopyOptions(ParseState *pstate,
*** 667,672 ****
--- 711,721 ----
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot specify HEADER in BINARY mode")));
+ if (opts_out->json_mode && opts_out->header_line)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot specify HEADER in JSON mode")));
+
/* Check quote */
if (!opts_out->csv_mode && opts_out->quote != NULL)
ereport(ERROR,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c66a047..fba3070 100644
*** a/src/backend/commands/copyto.c
--- b/src/backend/commands/copyto.c
***************
*** 37,42 ****
--- 37,43 ----
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
#include "tcop/tcopprot.h"
+ #include "utils/json.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/partcache.h"
*************** typedef struct
*** 112,117 ****
--- 113,120 ----
/* NOTE: there's a copy of this in copyfromparse.c */
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
+ /* need delimiter to start next json array element */
+ static bool json_row_delim_needed = false;
/* non-export function prototypes */
static void EndCopy(CopyToState cstate);
*************** DoCopyTo(CopyToState cstate)
*** 845,850 ****
--- 848,867 ----
CopySendEndOfRow(cstate);
}
+
+ /*
+ * If JSON has been requested, and FORCE_ARRAY has been specified
+ * send the opening bracket.
+ */
+ if (cstate->opts.json_mode)
+ {
+ if (cstate->opts.force_array)
+ {
+ CopySendChar(cstate, '[');
+ CopySendEndOfRow(cstate);
+ }
+ json_row_delim_needed = false;
+ }
}
if (cstate->rel)
*************** DoCopyTo(CopyToState cstate)
*** 892,897 ****
--- 909,925 ----
CopySendEndOfRow(cstate);
}
+ /*
+ * If JSON has been requested, and FORCE_ARRAY has been specified
+ * send the closing bracket.
+ */
+ if (cstate->opts.json_mode &&
+ cstate->opts.force_array)
+ {
+ CopySendChar(cstate, ']');
+ CopySendEndOfRow(cstate);
+ }
+
MemoryContextDelete(cstate->rowcontext);
if (fe_copy)
*************** DoCopyTo(CopyToState cstate)
*** 906,916 ****
static void
CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
- bool need_delim = false;
- FmgrInfo *out_functions = cstate->out_functions;
MemoryContext oldcontext;
- ListCell *cur;
- char *string;
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
--- 934,940 ----
*************** CopyOneRowTo(CopyToState cstate, TupleTa
*** 921,974 ****
CopySendInt16(cstate, list_length(cstate->attnumlist));
}
! /* Make sure the tuple is fully deconstructed */
! slot_getallattrs(slot);
!
! foreach(cur, cstate->attnumlist)
{
! int attnum = lfirst_int(cur);
! Datum value = slot->tts_values[attnum - 1];
! bool isnull = slot->tts_isnull[attnum - 1];
! if (!cstate->opts.binary)
! {
! if (need_delim)
! CopySendChar(cstate, cstate->opts.delim[0]);
! need_delim = true;
! }
! if (isnull)
! {
! if (!cstate->opts.binary)
! CopySendString(cstate, cstate->opts.null_print_client);
! else
! CopySendInt32(cstate, -1);
! }
! else
{
if (!cstate->opts.binary)
{
! string = OutputFunctionCall(&out_functions[attnum - 1],
! value);
! if (cstate->opts.csv_mode)
! CopyAttributeOutCSV(cstate, string,
! cstate->opts.force_quote_flags[attnum - 1],
! list_length(cstate->attnumlist) == 1);
else
! CopyAttributeOutText(cstate, string);
}
else
{
! bytea *outputbytes;
! outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! value);
! CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! CopySendData(cstate, VARDATA(outputbytes),
! VARSIZE(outputbytes) - VARHDRSZ);
}
}
}
CopySendEndOfRow(cstate);
--- 945,1028 ----
CopySendInt16(cstate, list_length(cstate->attnumlist));
}
! if (!cstate->opts.json_mode)
{
! bool need_delim = false;
! FmgrInfo *out_functions = cstate->out_functions;
! ListCell *cur;
! char *string;
! /* Make sure the tuple is fully deconstructed */
! slot_getallattrs(slot);
! foreach(cur, cstate->attnumlist)
{
+ int attnum = lfirst_int(cur);
+ Datum value = slot->tts_values[attnum - 1];
+ bool isnull = slot->tts_isnull[attnum - 1];
+
if (!cstate->opts.binary)
{
! if (need_delim)
! CopySendChar(cstate, cstate->opts.delim[0]);
! need_delim = true;
! }
!
! if (isnull)
! {
! if (!cstate->opts.binary)
! CopySendString(cstate, cstate->opts.null_print_client);
else
! CopySendInt32(cstate, -1);
}
else
{
! if (!cstate->opts.binary)
! {
! string = OutputFunctionCall(&out_functions[attnum - 1],
! value);
! if (cstate->opts.csv_mode)
! CopyAttributeOutCSV(cstate, string,
! cstate->opts.force_quote_flags[attnum - 1],
! list_length(cstate->attnumlist) == 1);
! else
! CopyAttributeOutText(cstate, string);
! }
! else
! {
! bytea *outputbytes;
! outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! value);
! CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! CopySendData(cstate, VARDATA(outputbytes),
! VARSIZE(outputbytes) - VARHDRSZ);
! }
}
}
}
+ else
+ {
+ Datum rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ StringInfo result;
+
+ result = makeStringInfo();
+ composite_to_json(rowdata, result, false);
+
+ if (json_row_delim_needed &&
+ cstate->opts.force_row_delimiter)
+ {
+ CopySendChar(cstate, ',');
+ }
+ else if (cstate->opts.force_row_delimiter)
+ {
+ /* first row needs no delimiter */
+ CopySendChar(cstate, ' ');
+ json_row_delim_needed = true;
+ }
+
+ CopySendData(cstate, result->data, result->len);
+ }
CopySendEndOfRow(cstate);
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac8..16aa131 100644
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*************** copy_opt_item:
*** 3408,3413 ****
--- 3408,3417 ----
{
$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
}
+ | JSON
+ {
+ $$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ }
| HEADER_P
{
$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
*************** copy_opt_item:
*** 3448,3453 ****
--- 3452,3465 ----
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+ | FORCE ROW DELIMITER
+ {
+ $$ = makeDefElem("force_row_delimiter", (Node *) makeBoolean(true), @1);
+ }
+ | FORCE ARRAY
+ {
+ $$ = makeDefElem("force_array", (Node *) makeBoolean(true), @1);
+ }
;
/* The following exist for backward compatibility with very old versions */
*************** copy_generic_opt_elem:
*** 3490,3495 ****
--- 3502,3511 ----
{
$$ = makeDefElem($1, $2, @1);
}
+ | FORMAT_LA copy_generic_opt_arg
+ {
+ $$ = makeDefElem("format", $2, @1);
+ }
;
copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 71ae53f..cb4311e 100644
*** a/src/backend/utils/adt/json.c
--- b/src/backend/utils/adt/json.c
*************** typedef struct JsonAggState
*** 83,90 ****
JsonUniqueBuilderState unique_check;
} JsonAggState;
- static void composite_to_json(Datum composite, StringInfo result,
- bool use_line_feeds);
static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
Datum *vals, bool *nulls, int *valcount,
JsonTypeCategory tcategory, Oid outfuncoid,
--- 83,88 ----
*************** array_to_json_internal(Datum array, Stri
*** 490,497 ****
/*
* Turn a composite / record into JSON.
*/
! static void
composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
{
HeapTupleHeader td;
--- 488,496 ----
/*
* Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
*/
! void
composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
{
HeapTupleHeader td;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b..266910d 100644
*** a/src/include/commands/copy.h
--- b/src/include/commands/copy.h
*************** typedef struct CopyFormatOptions
*** 43,48 ****
--- 43,49 ----
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
bool csv_mode; /* Comma Separated Value format? */
+ bool json_mode; /* JSON format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
*************** typedef struct CopyFormatOptions
*** 61,66 ****
--- 62,69 ----
List *force_null; /* list of column names */
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
+ bool force_row_delimiter; /* use comma as per-row JSON delimiter */
+ bool force_array; /* JSON array; implies force_row_delimiter */
bool convert_selectively; /* do selective binary conversion? */
List *convert_select; /* list of column names (can be NIL) */
} CopyFormatOptions;
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index f07e82c..badc5a6 100644
*** a/src/include/utils/json.h
--- b/src/include/utils/json.h
***************
*** 17,22 ****
--- 17,24 ----
#include "lib/stringinfo.h"
/* functions in json.c */
+ extern void composite_to_json(Datum composite, StringInfo result,
+ bool use_line_feeds);
extern void escape_json(StringInfo buf, const char *str);
extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
const int *tzp);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index b48365e..1a8dde2 100644
*** a/src/test/regress/expected/copy.out
--- b/src/test/regress/expected/copy.out
*************** copy copytest3 to stdout csv header;
*** 42,47 ****
--- 42,115 ----
c1,"col with , comma","col with "" quote"
1,a,1
2,b,2
+ --- test copying in JSON mode with various styles
+ copy copytest to stdout json;
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ {"style":"Unix","test":"abc\ndef","filler":2}
+ {"style":"Mac","test":"abc\rdef","filler":3}
+ {"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ copy copytest to stdout (format json);
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ {"style":"Unix","test":"abc\ndef","filler":2}
+ {"style":"Mac","test":"abc\rdef","filler":3}
+ {"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ copy copytest to stdout (format json, force_array);
+ [
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ ,{"style":"Unix","test":"abc\ndef","filler":2}
+ ,{"style":"Mac","test":"abc\rdef","filler":3}
+ ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ ]
+ copy copytest to stdout (format json, force_row_delimiter);
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ ,{"style":"Unix","test":"abc\ndef","filler":2}
+ ,{"style":"Mac","test":"abc\rdef","filler":3}
+ ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ -- Error
+ copy copytest to stdout
+ (format json, force_array true, force_row_delimiter false);
+ ERROR: cannot specify FORCE_ROW_DELIMITER false with FORCE_ARRAY true
+ -- Error
+ copy copytest to stdout
+ (format json, header);
+ ERROR: cannot specify HEADER in JSON mode
+ -- embedded escaped characters
+ create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+ insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+ insert into copyjsontest (f1) values
+ (E'aaa\"bbb'::text),
+ (E'aaa\\bbb'::text),
+ (E'aaa\/bbb'::text),
+ (E'aaa\bbbb'::text),
+ (E'aaa\fbbb'::text),
+ (E'aaa\nbbb'::text),
+ (E'aaa\rbbb'::text),
+ (E'aaa\tbbb'::text);
+ copy copyjsontest to stdout json;
+ {"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":1,"f1":"aaa\"bbb","f2":null}
+ {"id":2,"f1":"aaa\\bbb","f2":null}
+ {"id":3,"f1":"aaa/bbb","f2":null}
+ {"id":4,"f1":"aaa\bbbb","f2":null}
+ {"id":5,"f1":"aaa\fbbb","f2":null}
+ {"id":6,"f1":"aaa\nbbb","f2":null}
+ {"id":7,"f1":"aaa\rbbb","f2":null}
+ {"id":8,"f1":"aaa\tbbb","f2":null}
create temp table copytest4 (
c1 int,
"colname with tab: " text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 43d2e90..aac4ccd 100644
*** a/src/test/regress/sql/copy.sql
--- b/src/test/regress/sql/copy.sql
*************** this is just a line full of junk that wo
*** 54,59 ****
--- 54,104 ----
copy copytest3 to stdout csv header;
+ --- test copying in JSON mode with various styles
+ copy copytest to stdout json;
+
+ copy copytest to stdout (format json);
+
+ copy copytest to stdout (format json, force_array);
+
+ copy copytest to stdout (format json, force_row_delimiter);
+
+ -- Error
+ copy copytest to stdout
+ (format json, force_array true, force_row_delimiter false);
+
+ -- Error
+ copy copytest to stdout
+ (format json, header);
+
+ -- embedded escaped characters
+ create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+
+ insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+
+ insert into copyjsontest (f1) values
+ (E'aaa\"bbb'::text),
+ (E'aaa\\bbb'::text),
+ (E'aaa\/bbb'::text),
+ (E'aaa\bbbb'::text),
+ (E'aaa\fbbb'::text),
+ (E'aaa\nbbb'::text),
+ (E'aaa\rbbb'::text),
+ (E'aaa\tbbb'::text);
+
+ copy copyjsontest to stdout json;
+
create temp table copytest4 (
c1 int,
"colname with tab: " text);
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-05 19:50 ` Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
0 siblings, 1 reply; 37+ messages in thread
From: Davin Shearer @ 2023-12-05 19:50 UTC (permalink / raw)
To: pgsql-hackers; +Cc: Andrew Dunstan <[email protected]>; Joe Conway <[email protected]>
Hi Joe,
In reviewing the 005 patch, I think that when used with FORCE ARRAY, we
should also _imply_ FORCE ROW DELIMITER. I can't envision a use case where
someone would want to use FORCE ARRAY without also using FORCE ROW
DELIMITER. I can, however, envision a use case where someone would want
FORCE ROW DELIMITER without FORCE ARRAY, like maybe including into a larger
array. I definitely appreciate these options and the flexibility that they
afford from a user perspective.
In the test output, will you also show the different variations with FORCE
ARRAY and FORCE ROW DELIMITER => {(false, false), (true, false), (false,
true), (true, true)}? Technically you've already shown me the (false,
false) case as those are the defaults.
Thanks!
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
@ 2023-12-05 20:55 ` Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
0 siblings, 1 reply; 37+ messages in thread
From: Andrew Dunstan @ 2023-12-05 20:55 UTC (permalink / raw)
To: Davin Shearer <[email protected]>; pgsql-hackers; +Cc: Joe Conway <[email protected]>
On 2023-12-05 Tu 14:50, Davin Shearer wrote:
> Hi Joe,
>
> In reviewing the 005 patch, I think that when used with FORCE ARRAY,
> we should also _imply_ FORCE ROW DELIMITER. I can't envision a use
> case where someone would want to use FORCE ARRAY without also using
> FORCE ROW DELIMITER. I can, however, envision a use case where
> someone would want FORCE ROW DELIMITER without FORCE ARRAY, like maybe
> including into a larger array. I definitely appreciate these options
> and the flexibility that they afford from a user perspective.
>
> In the test output, will you also show the different variations with
> FORCE ARRAY and FORCE ROW DELIMITER => {(false, false), (true, false),
> (false, true), (true, true)}? Technically you've already shown me the
> (false, false) case as those are the defaults.
>
>
I don't understand the point of FORCE_ROW_DELIMITER at all. There is
only one legal delimiter of array items in JSON, and that's a comma.
There's no alternative and it's not optional. So in the array case you
MUST have commas and in any other case (e.g. LINES) I can't see why you
would have them.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
@ 2023-12-05 21:02 ` Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:12 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
0 siblings, 2 replies; 37+ messages in thread
From: Joe Conway @ 2023-12-05 21:02 UTC (permalink / raw)
To: Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 12/5/23 15:55, Andrew Dunstan wrote:
>
> On 2023-12-05 Tu 14:50, Davin Shearer wrote:
>> Hi Joe,
>>
>> In reviewing the 005 patch, I think that when used with FORCE ARRAY,
>> we should also _imply_ FORCE ROW DELIMITER. I can't envision a use
>> case where someone would want to use FORCE ARRAY without also using
>> FORCE ROW DELIMITER. I can, however, envision a use case where
>> someone would want FORCE ROW DELIMITER without FORCE ARRAY, like maybe
>> including into a larger array. I definitely appreciate these options
>> and the flexibility that they afford from a user perspective.
>>
>> In the test output, will you also show the different variations with
>> FORCE ARRAY and FORCE ROW DELIMITER => {(false, false), (true, false),
>> (false, true), (true, true)}? Technically you've already shown me the
>> (false, false) case as those are the defaults.
>>
>>
>
> I don't understand the point of FORCE_ROW_DELIMITER at all. There is
> only one legal delimiter of array items in JSON, and that's a comma.
> There's no alternative and it's not optional. So in the array case you
> MUST have commas and in any other case (e.g. LINES) I can't see why you
> would have them.
The current patch already *does* imply row delimiters in the array case.
It says so here:
8<---------------------------
+ <varlistentry>
+ <term><literal>FORCE_ARRAY</literal></term>
+ <listitem>
+ <para>
+ Force output of array decorations at the beginning and end of
output.
+ This option implies the <literal>FORCE_ROW_DELIMITER</literal>
+ option. It is allowed only in <command>COPY TO</command>, and only
+ when using <literal>JSON</literal> format.
+ The default is <literal>false</literal>.
+ </para>
8<---------------------------
and it does so here:
8<---------------------------
+ if (opts_out->force_array)
+ opts_out->force_row_delimiter = true;
8<---------------------------
and it shows that here:
8<---------------------------
+ copy copytest to stdout (format json, force_array);
+ [
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ ,{"style":"Unix","test":"abc\ndef","filler":2}
+ ,{"style":"Mac","test":"abc\rdef","filler":3}
+ ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ ]
8<---------------------------
It also does not allow explicitly setting row delimiters false while
force_array is true here:
8<---------------------------
+ if (opts_out->force_array &&
+ force_row_delimiter_specified &&
+ !opts_out->force_row_delimiter)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot specify FORCE_ROW_DELIMITER false with
FORCE_ARRAY true")));
8<---------------------------
Am I understanding something incorrectly?
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-05 21:09 ` Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
1 sibling, 1 reply; 37+ messages in thread
From: Joe Conway @ 2023-12-05 21:09 UTC (permalink / raw)
To: Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 12/5/23 16:02, Joe Conway wrote:
> On 12/5/23 15:55, Andrew Dunstan wrote:
>> and in any other case (e.g. LINES) I can't see why you
>> would have them.
Oh I didn't address this -- I saw examples in the interwebs of MSSQL
server I think [1] which had the non-array with commas import and export
style. It was not that tough to support and the code as written already
does it, so why not?
[1]
https://learn.microsoft.com/en-us/sql/relational-databases/json/remove-square-brackets-from-json-wit...
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-05 21:20 ` Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 18:59 ` Re: Emitting JSON to file using COPY TO Daniel Verite <[email protected]>
0 siblings, 2 replies; 37+ messages in thread
From: Andrew Dunstan @ 2023-12-05 21:20 UTC (permalink / raw)
To: Joe Conway <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 2023-12-05 Tu 16:09, Joe Conway wrote:
> On 12/5/23 16:02, Joe Conway wrote:
>> On 12/5/23 15:55, Andrew Dunstan wrote:
>>> and in any other case (e.g. LINES) I can't see why you
>>> would have them.
>
> Oh I didn't address this -- I saw examples in the interwebs of MSSQL
> server I think [1] which had the non-array with commas import and
> export style. It was not that tough to support and the code as written
> already does it, so why not?
>
> [1]
> https://learn.microsoft.com/en-us/sql/relational-databases/json/remove-square-brackets-from-json-wit...
>
>
That seems quite absurd, TBH. I know we've catered for some absurdity in
the CSV code (much of it down to me), so maybe we need to be liberal in
what we accept here too. IMNSHO, we should produce either a single JSON
document (the ARRAY case) or a series of JSON documents, one per row
(the LINES case).
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
@ 2023-12-05 21:46 ` Joe Conway <[email protected]>
2023-12-05 23:45 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
1 sibling, 2 replies; 37+ messages in thread
From: Joe Conway @ 2023-12-05 21:46 UTC (permalink / raw)
To: Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 12/5/23 16:20, Andrew Dunstan wrote:
> On 2023-12-05 Tu 16:09, Joe Conway wrote:
>> On 12/5/23 16:02, Joe Conway wrote:
>>> On 12/5/23 15:55, Andrew Dunstan wrote:
>>>> and in any other case (e.g. LINES) I can't see why you
>>>> would have them.
>>
>> Oh I didn't address this -- I saw examples in the interwebs of MSSQL
>> server I think [1] which had the non-array with commas import and
>> export style. It was not that tough to support and the code as written
>> already does it, so why not?
>
> That seems quite absurd, TBH. I know we've catered for some absurdity in
> the CSV code (much of it down to me), so maybe we need to be liberal in
> what we accept here too. IMNSHO, we should produce either a single JSON
> document (the ARRAY case) or a series of JSON documents, one per row
> (the LINES case).
So your preference would be to not allow the non-array-with-commas case
but if/when we implement COPY FROM we would accept that format? As in
Postel'a law ("be conservative in what you do, be liberal in what you
accept from others")?
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-05 23:45 ` Davin Shearer <[email protected]>
1 sibling, 0 replies; 37+ messages in thread
From: Davin Shearer @ 2023-12-05 23:45 UTC (permalink / raw)
To: Joe Conway <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; pgsql-hackers
> Am I understanding something incorrectly?
No, you've got it. You already covered the concerns there.
> That seems quite absurd, TBH. I know we've catered for some absurdity in
> the CSV code (much of it down to me), so maybe we need to be liberal in
> what we accept here too. IMNSHO, we should produce either a single JSON
> document (the ARRAY case) or a series of JSON documents, one per row
> (the LINES case).
For what it's worth, I agree with Andrew on this. I also agree with COPY
FROM allowing for potentially bogus commas at the end of non-arrays for
interop with other products, but to not do that in COPY TO (unless there is
some real compelling case to do so). Emitting bogus JSON (non-array with
commas) feels wrong and would be nice to not perpetuate that, if possible.
Thanks again for doing this. If I can be of any help, let me know.
If\When this makes it into the production product, I'll be using this
feature for sure.
-Davin
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-06 12:36 ` Andrew Dunstan <[email protected]>
2023-12-06 13:49 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 15:44 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
1 sibling, 2 replies; 37+ messages in thread
From: Andrew Dunstan @ 2023-12-06 12:36 UTC (permalink / raw)
To: Joe Conway <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 2023-12-05 Tu 16:46, Joe Conway wrote:
> On 12/5/23 16:20, Andrew Dunstan wrote:
>> On 2023-12-05 Tu 16:09, Joe Conway wrote:
>>> On 12/5/23 16:02, Joe Conway wrote:
>>>> On 12/5/23 15:55, Andrew Dunstan wrote:
>>>>> and in any other case (e.g. LINES) I can't see why you
>>>>> would have them.
>>>
>>> Oh I didn't address this -- I saw examples in the interwebs of MSSQL
>>> server I think [1] which had the non-array with commas import and
>>> export style. It was not that tough to support and the code as
>>> written already does it, so why not?
>>
>> That seems quite absurd, TBH. I know we've catered for some absurdity in
>> the CSV code (much of it down to me), so maybe we need to be liberal in
>> what we accept here too. IMNSHO, we should produce either a single JSON
>> document (the ARRAY case) or a series of JSON documents, one per row
>> (the LINES case).
>
>
> So your preference would be to not allow the non-array-with-commas
> case but if/when we implement COPY FROM we would accept that format?
> As in Postel'a law ("be conservative in what you do, be liberal in
> what you accept from others")?
Yes, I think so.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
@ 2023-12-06 13:49 ` Joe Conway <[email protected]>
2023-12-06 15:32 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
1 sibling, 1 reply; 37+ messages in thread
From: Joe Conway @ 2023-12-06 13:49 UTC (permalink / raw)
To: Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 12/6/23 07:36, Andrew Dunstan wrote:
>
> On 2023-12-05 Tu 16:46, Joe Conway wrote:
>> On 12/5/23 16:20, Andrew Dunstan wrote:
>>> On 2023-12-05 Tu 16:09, Joe Conway wrote:
>>>> On 12/5/23 16:02, Joe Conway wrote:
>>>>> On 12/5/23 15:55, Andrew Dunstan wrote:
>>>>>> and in any other case (e.g. LINES) I can't see why you
>>>>>> would have them.
>>>>
>>>> Oh I didn't address this -- I saw examples in the interwebs of MSSQL
>>>> server I think [1] which had the non-array with commas import and
>>>> export style. It was not that tough to support and the code as
>>>> written already does it, so why not?
>>>
>>> That seems quite absurd, TBH. I know we've catered for some absurdity in
>>> the CSV code (much of it down to me), so maybe we need to be liberal in
>>> what we accept here too. IMNSHO, we should produce either a single JSON
>>> document (the ARRAY case) or a series of JSON documents, one per row
>>> (the LINES case).
>>
>> So your preference would be to not allow the non-array-with-commas
>> case but if/when we implement COPY FROM we would accept that format?
>> As in Postel'a law ("be conservative in what you do, be liberal in
>> what you accept from others")?
>
>
> Yes, I think so.
Awesome. The attached does it that way. I also ran pgindent.
I believe this is ready to commit unless there are further comments or
objections.
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
[text/x-patch] copyto_json.006.diff (18.7K, 2-copyto_json.006.diff)
download | inline diff:
Add json format mode to COPY TO
Add json format mode support to COPY TO, which includes two output
variations: 1) "json lines" which is each row as a json object delimited
by newlines (the default); and 2) "json array" which is the same as #1,
but with the addition of a leading "[", trailing "]", and comma row
delimiters, to form a valid json array.
Early versions: helpful hints/reviews provided by Nathan Bossart,
Tom Lane, and Maciek Sakrejda. Final versions: reviewed by Andrew Dunstan
and Davin Shearer.
Requested-by: Davin Shearer
Author: Joe Conway
Reviewed-by: Andrew Dunstan, Davin Shearer
Discussion: https://postgr.es/m/flat/24e3ee88-ec1e-421b-89ae-8a47ee0d2df1%40joeconway.com#a5e6b8829f9a74dfc835f6f29f2e44c5
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69..8915fb3 100644
*** a/doc/src/sgml/ref/copy.sgml
--- b/doc/src/sgml/ref/copy.sgml
*************** COPY { <replaceable class="parameter">ta
*** 43,48 ****
--- 43,49 ----
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+ FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
</synopsis>
</refsynopsisdiv>
*************** COPY { <replaceable class="parameter">ta
*** 206,214 ****
--- 207,220 ----
Selects the data format to be read or written:
<literal>text</literal>,
<literal>csv</literal> (Comma Separated Values),
+ <literal>json</literal> (JavaScript Object Notation),
or <literal>binary</literal>.
The default is <literal>text</literal>.
</para>
+ <para>
+ The <literal>json</literal> option is allowed only in
+ <command>COPY TO</command>.
+ </para>
</listitem>
</varlistentry>
*************** COPY { <replaceable class="parameter">ta
*** 372,377 ****
--- 378,396 ----
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><literal>FORCE_ARRAY</literal></term>
+ <listitem>
+ <para>
+ Force output of square brackets as array decorations at the beginning
+ and end of output, and commas between the rows. It is allowed only in
+ <command>COPY TO</command>, and only when using
+ <literal>JSON</literal> format. The default is
+ <literal>false</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
<varlistentry>
<term><literal>ENCODING</literal></term>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b..23b570f 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** ProcessCopyOptions(ParseState *pstate,
*** 419,424 ****
--- 419,425 ----
bool format_specified = false;
bool freeze_specified = false;
bool header_specified = false;
+ bool force_array_specified = false;
ListCell *option;
/* Support external use for option sanity checking */
*************** ProcessCopyOptions(ParseState *pstate,
*** 443,448 ****
--- 444,451 ----
/* default format */ ;
else if (strcmp(fmt, "csv") == 0)
opts_out->csv_mode = true;
+ else if (strcmp(fmt, "json") == 0)
+ opts_out->json_mode = true;
else if (strcmp(fmt, "binary") == 0)
opts_out->binary = true;
else
*************** ProcessCopyOptions(ParseState *pstate,
*** 540,545 ****
--- 543,555 ----
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "force_array") == 0)
+ {
+ if (force_array_specified)
+ errorConflictingDefElem(defel, pstate);
+ force_array_specified = true;
+ opts_out->force_array = defGetBoolean(defel);
+ }
else if (strcmp(defel->defname, "convert_selectively") == 0)
{
/*
*************** ProcessCopyOptions(ParseState *pstate,
*** 598,603 ****
--- 608,625 ----
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DEFAULT in BINARY mode")));
+ if (opts_out->json_mode)
+ {
+ if (is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use JSON mode in COPY FROM")));
+ }
+ else if (opts_out->force_array)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY FORCE_ARRAY requires JSON mode")));
+
/* Set defaults for omitted options */
if (!opts_out->delim)
opts_out->delim = opts_out->csv_mode ? "," : "\t";
*************** ProcessCopyOptions(ParseState *pstate,
*** 667,672 ****
--- 689,699 ----
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot specify HEADER in BINARY mode")));
+ if (opts_out->json_mode && opts_out->header_line)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot specify HEADER in JSON mode")));
+
/* Check quote */
if (!opts_out->csv_mode && opts_out->quote != NULL)
ereport(ERROR,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c66a047..6e351ec 100644
*** a/src/backend/commands/copyto.c
--- b/src/backend/commands/copyto.c
***************
*** 37,42 ****
--- 37,43 ----
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
#include "tcop/tcopprot.h"
+ #include "utils/json.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/partcache.h"
*************** typedef struct
*** 112,117 ****
--- 113,120 ----
/* NOTE: there's a copy of this in copyfromparse.c */
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
+ /* need delimiter to start next json array element */
+ static bool json_row_delim_needed = false;
/* non-export function prototypes */
static void EndCopy(CopyToState cstate);
*************** DoCopyTo(CopyToState cstate)
*** 845,850 ****
--- 848,867 ----
CopySendEndOfRow(cstate);
}
+
+ /*
+ * If JSON has been requested, and FORCE_ARRAY has been specified send
+ * the opening bracket.
+ */
+ if (cstate->opts.json_mode)
+ {
+ if (cstate->opts.force_array)
+ {
+ CopySendChar(cstate, '[');
+ CopySendEndOfRow(cstate);
+ }
+ json_row_delim_needed = false;
+ }
}
if (cstate->rel)
*************** DoCopyTo(CopyToState cstate)
*** 892,897 ****
--- 909,925 ----
CopySendEndOfRow(cstate);
}
+ /*
+ * If JSON has been requested, and FORCE_ARRAY has been specified send the
+ * closing bracket.
+ */
+ if (cstate->opts.json_mode &&
+ cstate->opts.force_array)
+ {
+ CopySendChar(cstate, ']');
+ CopySendEndOfRow(cstate);
+ }
+
MemoryContextDelete(cstate->rowcontext);
if (fe_copy)
*************** DoCopyTo(CopyToState cstate)
*** 906,916 ****
static void
CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
{
- bool need_delim = false;
- FmgrInfo *out_functions = cstate->out_functions;
MemoryContext oldcontext;
- ListCell *cur;
- char *string;
MemoryContextReset(cstate->rowcontext);
oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
--- 934,940 ----
*************** CopyOneRowTo(CopyToState cstate, TupleTa
*** 921,974 ****
CopySendInt16(cstate, list_length(cstate->attnumlist));
}
! /* Make sure the tuple is fully deconstructed */
! slot_getallattrs(slot);
!
! foreach(cur, cstate->attnumlist)
{
! int attnum = lfirst_int(cur);
! Datum value = slot->tts_values[attnum - 1];
! bool isnull = slot->tts_isnull[attnum - 1];
! if (!cstate->opts.binary)
! {
! if (need_delim)
! CopySendChar(cstate, cstate->opts.delim[0]);
! need_delim = true;
! }
! if (isnull)
! {
! if (!cstate->opts.binary)
! CopySendString(cstate, cstate->opts.null_print_client);
! else
! CopySendInt32(cstate, -1);
! }
! else
{
if (!cstate->opts.binary)
{
! string = OutputFunctionCall(&out_functions[attnum - 1],
! value);
! if (cstate->opts.csv_mode)
! CopyAttributeOutCSV(cstate, string,
! cstate->opts.force_quote_flags[attnum - 1],
! list_length(cstate->attnumlist) == 1);
else
! CopyAttributeOutText(cstate, string);
}
else
{
! bytea *outputbytes;
! outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! value);
! CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! CopySendData(cstate, VARDATA(outputbytes),
! VARSIZE(outputbytes) - VARHDRSZ);
}
}
}
CopySendEndOfRow(cstate);
--- 945,1028 ----
CopySendInt16(cstate, list_length(cstate->attnumlist));
}
! if (!cstate->opts.json_mode)
{
! bool need_delim = false;
! FmgrInfo *out_functions = cstate->out_functions;
! ListCell *cur;
! char *string;
! /* Make sure the tuple is fully deconstructed */
! slot_getallattrs(slot);
! foreach(cur, cstate->attnumlist)
{
+ int attnum = lfirst_int(cur);
+ Datum value = slot->tts_values[attnum - 1];
+ bool isnull = slot->tts_isnull[attnum - 1];
+
if (!cstate->opts.binary)
{
! if (need_delim)
! CopySendChar(cstate, cstate->opts.delim[0]);
! need_delim = true;
! }
!
! if (isnull)
! {
! if (!cstate->opts.binary)
! CopySendString(cstate, cstate->opts.null_print_client);
else
! CopySendInt32(cstate, -1);
}
else
{
! if (!cstate->opts.binary)
! {
! string = OutputFunctionCall(&out_functions[attnum - 1],
! value);
! if (cstate->opts.csv_mode)
! CopyAttributeOutCSV(cstate, string,
! cstate->opts.force_quote_flags[attnum - 1],
! list_length(cstate->attnumlist) == 1);
! else
! CopyAttributeOutText(cstate, string);
! }
! else
! {
! bytea *outputbytes;
! outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! value);
! CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! CopySendData(cstate, VARDATA(outputbytes),
! VARSIZE(outputbytes) - VARHDRSZ);
! }
}
}
}
+ else
+ {
+ Datum rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ StringInfo result;
+
+ result = makeStringInfo();
+ composite_to_json(rowdata, result, false);
+
+ if (json_row_delim_needed &&
+ cstate->opts.force_array)
+ {
+ CopySendChar(cstate, ',');
+ }
+ else if (cstate->opts.force_array)
+ {
+ /* first row needs no delimiter */
+ CopySendChar(cstate, ' ');
+ json_row_delim_needed = true;
+ }
+
+ CopySendData(cstate, result->data, result->len);
+ }
CopySendEndOfRow(cstate);
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac8..e6789d7 100644
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*************** copy_opt_item:
*** 3408,3413 ****
--- 3408,3417 ----
{
$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
}
+ | JSON
+ {
+ $$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ }
| HEADER_P
{
$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
*************** copy_opt_item:
*** 3448,3453 ****
--- 3452,3461 ----
{
$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
}
+ | FORCE ARRAY
+ {
+ $$ = makeDefElem("force_array", (Node *) makeBoolean(true), @1);
+ }
;
/* The following exist for backward compatibility with very old versions */
*************** copy_generic_opt_elem:
*** 3490,3495 ****
--- 3498,3507 ----
{
$$ = makeDefElem($1, $2, @1);
}
+ | FORMAT_LA copy_generic_opt_arg
+ {
+ $$ = makeDefElem("format", $2, @1);
+ }
;
copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 71ae53f..cb4311e 100644
*** a/src/backend/utils/adt/json.c
--- b/src/backend/utils/adt/json.c
*************** typedef struct JsonAggState
*** 83,90 ****
JsonUniqueBuilderState unique_check;
} JsonAggState;
- static void composite_to_json(Datum composite, StringInfo result,
- bool use_line_feeds);
static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
Datum *vals, bool *nulls, int *valcount,
JsonTypeCategory tcategory, Oid outfuncoid,
--- 83,88 ----
*************** array_to_json_internal(Datum array, Stri
*** 490,497 ****
/*
* Turn a composite / record into JSON.
*/
! static void
composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
{
HeapTupleHeader td;
--- 488,496 ----
/*
* Turn a composite / record into JSON.
+ * Exported so COPY TO can use it.
*/
! void
composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
{
HeapTupleHeader td;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b..97899b6 100644
*** a/src/include/commands/copy.h
--- b/src/include/commands/copy.h
*************** typedef struct CopyFormatOptions
*** 43,48 ****
--- 43,49 ----
bool binary; /* binary format? */
bool freeze; /* freeze rows on loading? */
bool csv_mode; /* Comma Separated Value format? */
+ bool json_mode; /* JSON format? */
CopyHeaderChoice header_line; /* header line? */
char *null_print; /* NULL marker string (server encoding!) */
int null_print_len; /* length of same */
*************** typedef struct CopyFormatOptions
*** 61,66 ****
--- 62,68 ----
List *force_null; /* list of column names */
bool force_null_all; /* FORCE_NULL *? */
bool *force_null_flags; /* per-column CSV FN flags */
+ bool force_array; /* add JSON array decorations */
bool convert_selectively; /* do selective binary conversion? */
List *convert_select; /* list of column names (can be NIL) */
} CopyFormatOptions;
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index f07e82c..badc5a6 100644
*** a/src/include/utils/json.h
--- b/src/include/utils/json.h
***************
*** 17,22 ****
--- 17,24 ----
#include "lib/stringinfo.h"
/* functions in json.c */
+ extern void composite_to_json(Datum composite, StringInfo result,
+ bool use_line_feeds);
extern void escape_json(StringInfo buf, const char *str);
extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
const int *tzp);
diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index b48365e..31913f6 100644
*** a/src/test/regress/expected/copy.out
--- b/src/test/regress/expected/copy.out
*************** copy copytest3 to stdout csv header;
*** 42,47 ****
--- 42,117 ----
c1,"col with , comma","col with "" quote"
1,a,1
2,b,2
+ --- test copying in JSON mode with various styles
+ copy copytest to stdout json;
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ {"style":"Unix","test":"abc\ndef","filler":2}
+ {"style":"Mac","test":"abc\rdef","filler":3}
+ {"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ copy copytest to stdout (format json);
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ {"style":"Unix","test":"abc\ndef","filler":2}
+ {"style":"Mac","test":"abc\rdef","filler":3}
+ {"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ copy copytest to stdout (format json, force_array);
+ [
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ ,{"style":"Unix","test":"abc\ndef","filler":2}
+ ,{"style":"Mac","test":"abc\rdef","filler":3}
+ ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ ]
+ copy copytest to stdout (format json, force_array true);
+ [
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ ,{"style":"Unix","test":"abc\ndef","filler":2}
+ ,{"style":"Mac","test":"abc\rdef","filler":3}
+ ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ ]
+ copy copytest to stdout (format json, force_array false);
+ {"style":"DOS","test":"abc\r\ndef","filler":1}
+ {"style":"Unix","test":"abc\ndef","filler":2}
+ {"style":"Mac","test":"abc\rdef","filler":3}
+ {"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
+ -- Error
+ copy copytest to stdout (format json, header);
+ ERROR: cannot specify HEADER in JSON mode
+ -- embedded escaped characters
+ create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+ insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+ insert into copyjsontest (f1) values
+ (E'aaa\"bbb'::text),
+ (E'aaa\\bbb'::text),
+ (E'aaa\/bbb'::text),
+ (E'aaa\bbbb'::text),
+ (E'aaa\fbbb'::text),
+ (E'aaa\nbbb'::text),
+ (E'aaa\rbbb'::text),
+ (E'aaa\tbbb'::text);
+ copy copyjsontest to stdout json;
+ {"id":1,"f1":"line with \" in it: 1","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":2,"f1":"line with ' in it: 2","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":3,"f1":"line with \" in it: 3","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":4,"f1":"line with ' in it: 4","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":5,"f1":"line with \" in it: 5","f2":"1997-02-10T17:32:01-08:00"}
+ {"id":1,"f1":"aaa\"bbb","f2":null}
+ {"id":2,"f1":"aaa\\bbb","f2":null}
+ {"id":3,"f1":"aaa/bbb","f2":null}
+ {"id":4,"f1":"aaa\bbbb","f2":null}
+ {"id":5,"f1":"aaa\fbbb","f2":null}
+ {"id":6,"f1":"aaa\nbbb","f2":null}
+ {"id":7,"f1":"aaa\rbbb","f2":null}
+ {"id":8,"f1":"aaa\tbbb","f2":null}
create temp table copytest4 (
c1 int,
"colname with tab: " text);
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 43d2e90..4b76541 100644
*** a/src/test/regress/sql/copy.sql
--- b/src/test/regress/sql/copy.sql
*************** this is just a line full of junk that wo
*** 54,59 ****
--- 54,101 ----
copy copytest3 to stdout csv header;
+ --- test copying in JSON mode with various styles
+ copy copytest to stdout json;
+
+ copy copytest to stdout (format json);
+
+ copy copytest to stdout (format json, force_array);
+
+ copy copytest to stdout (format json, force_array true);
+
+ copy copytest to stdout (format json, force_array false);
+
+ -- Error
+ copy copytest to stdout (format json, header);
+
+ -- embedded escaped characters
+ create temp table copyjsontest (
+ id bigserial,
+ f1 text,
+ f2 timestamptz);
+
+ insert into copyjsontest
+ select g.i,
+ CASE WHEN g.i % 2 = 0 THEN
+ 'line with '' in it: ' || g.i::text
+ ELSE
+ 'line with " in it: ' || g.i::text
+ END,
+ 'Mon Feb 10 17:32:01 1997 PST'
+ from generate_series(1,5) as g(i);
+
+ insert into copyjsontest (f1) values
+ (E'aaa\"bbb'::text),
+ (E'aaa\\bbb'::text),
+ (E'aaa\/bbb'::text),
+ (E'aaa\bbbb'::text),
+ (E'aaa\fbbb'::text),
+ (E'aaa\nbbb'::text),
+ (E'aaa\rbbb'::text),
+ (E'aaa\tbbb'::text);
+
+ copy copyjsontest to stdout json;
+
create temp table copytest4 (
c1 int,
"colname with tab: " text);
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 13:49 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-06 15:32 ` Andrew Dunstan <[email protected]>
2023-12-06 16:15 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
0 siblings, 1 reply; 37+ messages in thread
From: Andrew Dunstan @ 2023-12-06 15:32 UTC (permalink / raw)
To: Joe Conway <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 2023-12-06 We 08:49, Joe Conway wrote:
> On 12/6/23 07:36, Andrew Dunstan wrote:
>>
>> On 2023-12-05 Tu 16:46, Joe Conway wrote:
>>> On 12/5/23 16:20, Andrew Dunstan wrote:
>>>> On 2023-12-05 Tu 16:09, Joe Conway wrote:
>>>>> On 12/5/23 16:02, Joe Conway wrote:
>>>>>> On 12/5/23 15:55, Andrew Dunstan wrote:
>>>>>>> and in any other case (e.g. LINES) I can't see why you
>>>>>>> would have them.
>>>>>
>>>>> Oh I didn't address this -- I saw examples in the interwebs of
>>>>> MSSQL server I think [1] which had the non-array with commas
>>>>> import and export style. It was not that tough to support and the
>>>>> code as written already does it, so why not?
>>>>
>>>> That seems quite absurd, TBH. I know we've catered for some
>>>> absurdity in
>>>> the CSV code (much of it down to me), so maybe we need to be
>>>> liberal in
>>>> what we accept here too. IMNSHO, we should produce either a single
>>>> JSON
>>>> document (the ARRAY case) or a series of JSON documents, one per row
>>>> (the LINES case).
>>>
>>> So your preference would be to not allow the non-array-with-commas
>>> case but if/when we implement COPY FROM we would accept that format?
>>> As in Postel'a law ("be conservative in what you do, be liberal in
>>> what you accept from others")?
>>
>>
>> Yes, I think so.
>
> Awesome. The attached does it that way. I also ran pgindent.
>
> I believe this is ready to commit unless there are further comments or
> objections.
Sorry to bikeshed a little more, I'm a bit late looking at this.
I suspect that most users will actually want the table as a single JSON
document, so it should probably be the default. In any case FORCE_ARRAY
as an option has a slightly wrong feel to it. I'm having trouble coming
up with a good name for the reverse of that, off the top of my head.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 13:49 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 15:32 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
@ 2023-12-06 16:15 ` Joe Conway <[email protected]>
0 siblings, 0 replies; 37+ messages in thread
From: Joe Conway @ 2023-12-06 16:15 UTC (permalink / raw)
To: Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 12/6/23 10:32, Andrew Dunstan wrote:
>
> On 2023-12-06 We 08:49, Joe Conway wrote:
>> On 12/6/23 07:36, Andrew Dunstan wrote:
>>>
>>> On 2023-12-05 Tu 16:46, Joe Conway wrote:
>>>> On 12/5/23 16:20, Andrew Dunstan wrote:
>>>>> On 2023-12-05 Tu 16:09, Joe Conway wrote:
>>>>>> On 12/5/23 16:02, Joe Conway wrote:
>>>>>>> On 12/5/23 15:55, Andrew Dunstan wrote:
>>>>>>>> and in any other case (e.g. LINES) I can't see why you
>>>>>>>> would have them.
>>>>>>
>>>>>> Oh I didn't address this -- I saw examples in the interwebs of
>>>>>> MSSQL server I think [1] which had the non-array with commas
>>>>>> import and export style. It was not that tough to support and the
>>>>>> code as written already does it, so why not?
>>>>>
>>>>> That seems quite absurd, TBH. I know we've catered for some
>>>>> absurdity in
>>>>> the CSV code (much of it down to me), so maybe we need to be
>>>>> liberal in
>>>>> what we accept here too. IMNSHO, we should produce either a single
>>>>> JSON
>>>>> document (the ARRAY case) or a series of JSON documents, one per row
>>>>> (the LINES case).
>>>>
>>>> So your preference would be to not allow the non-array-with-commas
>>>> case but if/when we implement COPY FROM we would accept that format?
>>>> As in Postel'a law ("be conservative in what you do, be liberal in
>>>> what you accept from others")?
>>>
>>>
>>> Yes, I think so.
>>
>> Awesome. The attached does it that way. I also ran pgindent.
>>
>> I believe this is ready to commit unless there are further comments or
>> objections.
>
> Sorry to bikeshed a little more, I'm a bit late looking at this.
>
> I suspect that most users will actually want the table as a single JSON
> document, so it should probably be the default. In any case FORCE_ARRAY
> as an option has a slightly wrong feel to it.
Sure, I can make that happen, although I figured that for the
many-rows-scenario the single array size might be an issue for whatever
you are importing into.
> I'm having trouble coming up with a good name for the reverse of
> that, off the top of my head.
Will think about it and propose something with the next patch revision.
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
@ 2023-12-06 15:44 ` Tom Lane <[email protected]>
2023-12-06 16:19 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 16:19 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 16:26 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:28 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
1 sibling, 4 replies; 37+ messages in thread
From: Tom Lane @ 2023-12-06 15:44 UTC (permalink / raw)
To: Joe Conway <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
Joe Conway <[email protected]> writes:
> I believe this is ready to commit unless there are further comments or
> objections.
I thought we were still mostly at proof-of-concept stage?
In particular, has anyone done any performance testing?
I'm concerned about that because composite_to_json() has
zero capability to cache any metadata across calls, meaning
there is going to be a large amount of duplicated work
per row.
regards, tom lane
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 15:44 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
@ 2023-12-06 16:19 ` Joe Conway <[email protected]>
3 siblings, 0 replies; 37+ messages in thread
From: Joe Conway @ 2023-12-06 16:19 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 12/6/23 10:44, Tom Lane wrote:
> Joe Conway <[email protected]> writes:
>> I believe this is ready to commit unless there are further comments or
>> objections.
>
> I thought we were still mostly at proof-of-concept stage?
The concept is narrowly scoped enough that I think we are homing in on
the final patch.
> In particular, has anyone done any performance testing?
> I'm concerned about that because composite_to_json() has
> zero capability to cache any metadata across calls, meaning
> there is going to be a large amount of duplicated work
> per row.
I will devise some kind of test and report back. I suppose something
with many rows and many narrow columns comparing time to COPY
text/csv/json modes would do the trick?
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 15:44 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
@ 2023-12-06 16:19 ` Andrew Dunstan <[email protected]>
2023-12-06 16:28 ` Re: Emitting JSON to file using COPY TO Sehrope Sarkuni <[email protected]>
3 siblings, 1 reply; 37+ messages in thread
From: Andrew Dunstan @ 2023-12-06 16:19 UTC (permalink / raw)
To: Tom Lane <[email protected]>; Joe Conway <[email protected]>; +Cc: Davin Shearer <[email protected]>; pgsql-hackers
On 2023-12-06 We 10:44, Tom Lane wrote:
> Joe Conway <[email protected]> writes:
>> I believe this is ready to commit unless there are further comments or
>> objections.
> I thought we were still mostly at proof-of-concept stage?
>
> In particular, has anyone done any performance testing?
> I'm concerned about that because composite_to_json() has
> zero capability to cache any metadata across calls, meaning
> there is going to be a large amount of duplicated work
> per row.
>
>
Yeah, that's hard to deal with, too, as it can be called recursively.
OTOH I'd rather have a version of this that worked slowly than none at all.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 15:44 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:19 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
@ 2023-12-06 16:28 ` Sehrope Sarkuni <[email protected]>
2023-12-06 21:28 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
0 siblings, 1 reply; 37+ messages in thread
From: Sehrope Sarkuni @ 2023-12-06 16:28 UTC (permalink / raw)
To: Andrew Dunstan <[email protected]>; +Cc: Tom Lane <[email protected]>; Joe Conway <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
Big +1 to this overall feature.
This is something I've wanted for a long time as well. While it's possible
to use a COPY with text output for a trivial case, the double escaping
falls apart quickly for arbitrary data. It's really only usable when you
know exactly what you are querying and know it will not be a problem.
Regarding the defaults for the output, I think JSON lines (rather than a
JSON array of objects) would be preferred. It's more natural to combine
them and generate that type of data on the fly rather than forcing
aggregation into a single object.
Couple more features / use cases come to mind as well. Even if they're not
part of a first round of this feature I think it'd be helpful to document
them now as it might give some ideas for what does make that first cut:
1. Outputting a top level JSON object without the additional column keys.
IIUC, the top level keys are always the column names. A common use case
would be a single json/jsonb column that is already formatted exactly as
the user would like for output. Rather than enveloping it in an object with
a dedicated key, it would be nice to be able to output it directly. This
would allow non-object results to be outputted as well (e.g., lines of JSON
arrays, numbers, or strings). Due to how JSON is structured, I think this
would play nice with the JSON lines v.s. array concept.
COPY (SELECT json_build_object('foo', x) AS i_am_ignored FROM
generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
SOME_OPTION_TO_NOT_ENVELOPE)
{"foo":1}
{"foo":2}
{"foo":3}
2. An option to ignore null fields so they are excluded from the output.
This would not be a default but would allow shrinking the total size of the
output data in many situations. This would be recursive to allow nested
objects to be shrunk down (not just the top level). This might be
worthwhile as a standalone JSON function though handling it during output
would be more efficient as it'd only be read once.
COPY (SELECT json_build_object('foo', CASE WHEN x > 1 THEN x END) FROM
generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
SOME_OPTION_TO_NOT_ENVELOPE, JSON_SKIP_NULLS)
{}
{"foo":2}
{"foo":3}
3. Reverse of #2 when copying data in to allow defaulting missing fields to
NULL.
Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 15:44 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:19 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 16:28 ` Re: Emitting JSON to file using COPY TO Sehrope Sarkuni <[email protected]>
@ 2023-12-06 21:28 ` Joe Conway <[email protected]>
2023-12-06 21:42 ` Re: Emitting JSON to file using COPY TO Sehrope Sarkuni <[email protected]>
0 siblings, 1 reply; 37+ messages in thread
From: Joe Conway @ 2023-12-06 21:28 UTC (permalink / raw)
To: Sehrope Sarkuni <[email protected]>; Andrew Dunstan <[email protected]>; +Cc: Tom Lane <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 12/6/23 11:28, Sehrope Sarkuni wrote:
> Big +1 to this overall feature.
cool!
> Regarding the defaults for the output, I think JSON lines (rather than a
> JSON array of objects) would be preferred. It's more natural to combine
> them and generate that type of data on the fly rather than forcing
> aggregation into a single object.
So that is +2 (Sehrope and me) for the status quo (JSON lines), and +2
(Andrew and Davin) for defaulting to json arrays. Anyone else want to
weigh in on that issue?
> Couple more features / use cases come to mind as well. Even if they're
> not part of a first round of this feature I think it'd be helpful to
> document them now as it might give some ideas for what does make that
> first cut:
>
> 1. Outputting a top level JSON object without the additional column
> keys. IIUC, the top level keys are always the column names. A common use
> case would be a single json/jsonb column that is already formatted
> exactly as the user would like for output. Rather than enveloping it in
> an object with a dedicated key, it would be nice to be able to output it
> directly. This would allow non-object results to be outputted as well
> (e.g., lines of JSON arrays, numbers, or strings). Due to how JSON is
> structured, I think this would play nice with the JSON lines v.s. array
> concept.
>
> COPY (SELECT json_build_object('foo', x) AS i_am_ignored FROM
> generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
> SOME_OPTION_TO_NOT_ENVELOPE)
> {"foo":1}
> {"foo":2}
> {"foo":3}
Your example does not match what you describe, or do I misunderstand? I
thought your goal was to eliminate the repeated "foo" from each row...
> 2. An option to ignore null fields so they are excluded from the output.
> This would not be a default but would allow shrinking the total size of
> the output data in many situations. This would be recursive to allow
> nested objects to be shrunk down (not just the top level). This might be
> worthwhile as a standalone JSON function though handling it during
> output would be more efficient as it'd only be read once.
>
> COPY (SELECT json_build_object('foo', CASE WHEN x > 1 THEN x END) FROM
> generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
> SOME_OPTION_TO_NOT_ENVELOPE, JSON_SKIP_NULLS)
> {}
> {"foo":2}
> {"foo":3}
clear enough I think
> 3. Reverse of #2 when copying data in to allow defaulting missing fields
> to NULL.
good to record the ask, but applies to a different feature (COPY FROM
instead of COPY TO).
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 15:44 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:19 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 16:28 ` Re: Emitting JSON to file using COPY TO Sehrope Sarkuni <[email protected]>
2023-12-06 21:28 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-06 21:42 ` Sehrope Sarkuni <[email protected]>
2023-12-06 22:38 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
0 siblings, 1 reply; 37+ messages in thread
From: Sehrope Sarkuni @ 2023-12-06 21:42 UTC (permalink / raw)
To: Joe Conway <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Tom Lane <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On Wed, Dec 6, 2023 at 4:29 PM Joe Conway <[email protected]> wrote:
> > 1. Outputting a top level JSON object without the additional column
> > keys. IIUC, the top level keys are always the column names. A common use
> > case would be a single json/jsonb column that is already formatted
> > exactly as the user would like for output. Rather than enveloping it in
> > an object with a dedicated key, it would be nice to be able to output it
> > directly. This would allow non-object results to be outputted as well
> > (e.g., lines of JSON arrays, numbers, or strings). Due to how JSON is
> > structured, I think this would play nice with the JSON lines v.s. array
> > concept.
> >
> > COPY (SELECT json_build_object('foo', x) AS i_am_ignored FROM
> > generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
> > SOME_OPTION_TO_NOT_ENVELOPE)
> > {"foo":1}
> > {"foo":2}
> > {"foo":3}
>
> Your example does not match what you describe, or do I misunderstand? I
> thought your goal was to eliminate the repeated "foo" from each row...
>
The "foo" in this case is explicit as I'm adding it when building the
object. What I was trying to show was not adding an additional object
wrapper / envelope.
So each row is:
{"foo":1}
Rather than:
"{"json_build_object":{"foo":1}}
If each row has exactly one json / jsonb field, then the user has already
indicated the format for each row.
That same mechanism can be used to remove the "foo" entirely via a
json/jsonb array.
Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 15:44 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:19 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 16:28 ` Re: Emitting JSON to file using COPY TO Sehrope Sarkuni <[email protected]>
2023-12-06 21:28 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 21:42 ` Re: Emitting JSON to file using COPY TO Sehrope Sarkuni <[email protected]>
@ 2023-12-06 22:38 ` Joe Conway <[email protected]>
2023-12-06 22:56 ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
0 siblings, 1 reply; 37+ messages in thread
From: Joe Conway @ 2023-12-06 22:38 UTC (permalink / raw)
To: Sehrope Sarkuni <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Tom Lane <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 12/6/23 16:42, Sehrope Sarkuni wrote:
> On Wed, Dec 6, 2023 at 4:29 PM Joe Conway <[email protected]
> <mailto:[email protected]>> wrote:
>
> > 1. Outputting a top level JSON object without the additional column
> > keys. IIUC, the top level keys are always the column names. A
> common use
> > case would be a single json/jsonb column that is already formatted
> > exactly as the user would like for output. Rather than enveloping
> it in
> > an object with a dedicated key, it would be nice to be able to
> output it
> > directly. This would allow non-object results to be outputted as
> well
> > (e.g., lines of JSON arrays, numbers, or strings). Due to how
> JSON is
> > structured, I think this would play nice with the JSON lines v.s.
> array
> > concept.
> >
> > COPY (SELECT json_build_object('foo', x) AS i_am_ignored FROM
> > generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
> > SOME_OPTION_TO_NOT_ENVELOPE)
> > {"foo":1}
> > {"foo":2}
> > {"foo":3}
>
> Your example does not match what you describe, or do I misunderstand? I
> thought your goal was to eliminate the repeated "foo" from each row...
>
>
> The "foo" in this case is explicit as I'm adding it when building the
> object. What I was trying to show was not adding an additional object
> wrapper / envelope.
>
> So each row is:
>
> {"foo":1}
>
> Rather than:
>
> "{"json_build_object":{"foo":1}}
I am still getting confused ;-)
Let's focus on the current proposed patch with a "minimum required
feature set".
Right now the default behavior is "JSON lines":
8<-------------------------------
COPY (SELECT x.i, 'val' || x.i as v FROM
generate_series(1, 3) x(i))
TO STDOUT WITH (FORMAT JSON);
{"i":1,"v":"val1"}
{"i":2,"v":"val2"}
{"i":3,"v":"val3"}
8<-------------------------------
and the other, non-default option is "JSON array":
8<-------------------------------
COPY (SELECT x.i, 'val' || x.i as v FROM
generate_series(1, 3) x(i))
TO STDOUT WITH (FORMAT JSON, FORCE_ARRAY);
[
{"i":1,"v":"val1"}
,{"i":2,"v":"val2"}
,{"i":3,"v":"val3"}
]
8<-------------------------------
So the questions are:
1. Do those two formats work for the initial implementation?
2. Is the default correct or should it be switched
e.g. rather than specifying FORCE_ARRAY to get an
array, something like FORCE_NO_ARRAY to get JSON lines
and the JSON array is default?
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 15:44 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:19 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 16:28 ` Re: Emitting JSON to file using COPY TO Sehrope Sarkuni <[email protected]>
2023-12-06 21:28 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 21:42 ` Re: Emitting JSON to file using COPY TO Sehrope Sarkuni <[email protected]>
2023-12-06 22:38 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-06 22:56 ` David G. Johnston <[email protected]>
2023-12-07 13:19 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
0 siblings, 1 reply; 37+ messages in thread
From: David G. Johnston @ 2023-12-06 22:56 UTC (permalink / raw)
To: Joe Conway <[email protected]>; +Cc: Sehrope Sarkuni <[email protected]>; Andrew Dunstan <[email protected]>; Tom Lane <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On Wed, Dec 6, 2023 at 3:38 PM Joe Conway <[email protected]> wrote:
> So the questions are:
> 1. Do those two formats work for the initial implementation?
>
Yes. We provide a stream-oriented format and one atomic-import format.
2. Is the default correct or should it be switched
> e.g. rather than specifying FORCE_ARRAY to get an
> array, something like FORCE_NO_ARRAY to get JSON lines
> and the JSON array is default?
>
>
No default?
Require explicit of a sub-format when the main format is JSON.
JSON_OBJECT_ROWS
JSON_ARRAY_OF_OBJECTS
For a future compact array-structured-composites sub-format:
JSON_ARRAY_OF_ARRAYS
JSON_ARRAY_ROWS
David J.
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 15:44 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:19 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 16:28 ` Re: Emitting JSON to file using COPY TO Sehrope Sarkuni <[email protected]>
2023-12-06 21:28 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 21:42 ` Re: Emitting JSON to file using COPY TO Sehrope Sarkuni <[email protected]>
2023-12-06 22:38 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 22:56 ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
@ 2023-12-07 13:19 ` Andrew Dunstan <[email protected]>
0 siblings, 0 replies; 37+ messages in thread
From: Andrew Dunstan @ 2023-12-07 13:19 UTC (permalink / raw)
To: David G. Johnston <[email protected]>; Joe Conway <[email protected]>; +Cc: Sehrope Sarkuni <[email protected]>; Tom Lane <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 2023-12-06 We 17:56, David G. Johnston wrote:
> On Wed, Dec 6, 2023 at 3:38 PM Joe Conway <[email protected]> wrote:
>
> So the questions are:
> 1. Do those two formats work for the initial implementation?
>
>
> Yes. We provide a stream-oriented format and one atomic-import format.
>
> 2. Is the default correct or should it be switched
> e.g. rather than specifying FORCE_ARRAY to get an
> array, something like FORCE_NO_ARRAY to get JSON lines
> and the JSON array is default?
>
>
> No default?
>
> Require explicit of a sub-format when the main format is JSON.
>
> JSON_OBJECT_ROWS
> JSON_ARRAY_OF_OBJECTS
>
> For a future compact array-structured-composites sub-format:
> JSON_ARRAY_OF_ARRAYS
> JSON_ARRAY_ROWS
>
>
No default seems unlike the way we treat other COPY options. I'm not
terribly fussed about which format to have as the default, but I think
we should have one.
cheers
andrew
--
Andrew Dunstan
EDB:https://www.enterprisedb.com
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 15:44 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
@ 2023-12-06 16:26 ` Tom Lane <[email protected]>
3 siblings, 0 replies; 37+ messages in thread
From: Tom Lane @ 2023-12-06 16:26 UTC (permalink / raw)
To: Joe Conway <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
Joe Conway <[email protected]> writes:
> On 12/6/23 10:44, Tom Lane wrote:
>> In particular, has anyone done any performance testing?
> I will devise some kind of test and report back. I suppose something
> with many rows and many narrow columns comparing time to COPY
> text/csv/json modes would do the trick?
Yeah. If it's at least in the same ballpark as the existing text/csv
formats then I'm okay with it. I'm worried that it might be 10x worse,
in which case I think we'd need to do something.
regards, tom lane
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 15:44 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
@ 2023-12-06 16:28 ` Tom Lane <[email protected]>
2023-12-06 16:33 ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
3 siblings, 1 reply; 37+ messages in thread
From: Tom Lane @ 2023-12-06 16:28 UTC (permalink / raw)
To: Andrew Dunstan <[email protected]>; +Cc: Joe Conway <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
Andrew Dunstan <[email protected]> writes:
> On 2023-12-06 We 10:44, Tom Lane wrote:
>> In particular, has anyone done any performance testing?
>> I'm concerned about that because composite_to_json() has
>> zero capability to cache any metadata across calls, meaning
>> there is going to be a large amount of duplicated work
>> per row.
> Yeah, that's hard to deal with, too, as it can be called recursively.
Right. On the plus side, if we did improve this it would presumably
also benefit other callers of composite_to_json[b].
> OTOH I'd rather have a version of this that worked slowly than none at all.
It might be acceptable to plan on improving the performance later,
depending on just how bad it is now.
regards, tom lane
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 15:44 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:28 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
@ 2023-12-06 16:33 ` Nathan Bossart <[email protected]>
2023-12-06 16:44 ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
0 siblings, 1 reply; 37+ messages in thread
From: Nathan Bossart @ 2023-12-06 16:33 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Joe Conway <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On Wed, Dec 06, 2023 at 11:28:59AM -0500, Tom Lane wrote:
> It might be acceptable to plan on improving the performance later,
> depending on just how bad it is now.
On 10M rows with 11 integers each, I'm seeing the following:
(format text)
Time: 10056.311 ms (00:10.056)
Time: 8789.331 ms (00:08.789)
Time: 8755.070 ms (00:08.755)
(format csv)
Time: 12295.480 ms (00:12.295)
Time: 12311.059 ms (00:12.311)
Time: 12305.469 ms (00:12.305)
(format json)
Time: 24568.621 ms (00:24.569)
Time: 23756.234 ms (00:23.756)
Time: 24265.730 ms (00:24.266)
'perf top' tends to look a bit like this:
13.31% postgres [.] appendStringInfoString
7.57% postgres [.] datum_to_json_internal
6.82% postgres [.] SearchCatCache1
5.35% [kernel] [k] intel_gpio_irq
3.57% postgres [.] composite_to_json
3.31% postgres [.] IsValidJsonNumber
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 15:44 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:28 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:33 ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
@ 2023-12-06 16:44 ` Nathan Bossart <[email protected]>
2023-12-06 19:48 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
0 siblings, 1 reply; 37+ messages in thread
From: Nathan Bossart @ 2023-12-06 16:44 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Joe Conway <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On Wed, Dec 06, 2023 at 10:33:49AM -0600, Nathan Bossart wrote:
> (format csv)
> Time: 12295.480 ms (00:12.295)
> Time: 12311.059 ms (00:12.311)
> Time: 12305.469 ms (00:12.305)
>
> (format json)
> Time: 24568.621 ms (00:24.569)
> Time: 23756.234 ms (00:23.756)
> Time: 24265.730 ms (00:24.266)
I should also note that the json output is 85% larger than the csv output.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 15:44 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:28 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:33 ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
2023-12-06 16:44 ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
@ 2023-12-06 19:48 ` Joe Conway <[email protected]>
2023-12-06 20:20 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
0 siblings, 1 reply; 37+ messages in thread
From: Joe Conway @ 2023-12-06 19:48 UTC (permalink / raw)
To: Nathan Bossart <[email protected]>; Tom Lane <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 12/6/23 11:44, Nathan Bossart wrote:
> On Wed, Dec 06, 2023 at 10:33:49AM -0600, Nathan Bossart wrote:
>> (format csv)
>> Time: 12295.480 ms (00:12.295)
>> Time: 12311.059 ms (00:12.311)
>> Time: 12305.469 ms (00:12.305)
>>
>> (format json)
>> Time: 24568.621 ms (00:24.569)
>> Time: 23756.234 ms (00:23.756)
>> Time: 24265.730 ms (00:24.266)
>
> I should also note that the json output is 85% larger than the csv output.
I'll see if I can add some caching to composite_to_json(), but based on
the relative data size it does not sound like there is much performance
left on the table to go after, no?
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 15:44 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:28 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:33 ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
2023-12-06 16:44 ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
2023-12-06 19:48 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-06 20:20 ` Tom Lane <[email protected]>
2023-12-06 21:03 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-07 02:56 ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
0 siblings, 2 replies; 37+ messages in thread
From: Tom Lane @ 2023-12-06 20:20 UTC (permalink / raw)
To: Joe Conway <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
Joe Conway <[email protected]> writes:
> I'll see if I can add some caching to composite_to_json(), but based on
> the relative data size it does not sound like there is much performance
> left on the table to go after, no?
If Nathan's perf results hold up elsewhere, it seems like some
micro-optimization around the text-pushing (appendStringInfoString)
might be more useful than caching. The 7% spent in cache lookups
could be worth going after later, but it's not the top of the list.
The output size difference does say that maybe we should pay some
attention to the nearby request to not always label every field.
Perhaps there should be an option for each row to transform to
a JSON array rather than an object?
regards, tom lane
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 15:44 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:28 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:33 ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
2023-12-06 16:44 ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
2023-12-06 19:48 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 20:20 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
@ 2023-12-06 21:03 ` Andrew Dunstan <[email protected]>
2023-12-06 21:36 ` Re: Emitting JSON to file using COPY TO Sehrope Sarkuni <[email protected]>
1 sibling, 1 reply; 37+ messages in thread
From: Andrew Dunstan @ 2023-12-06 21:03 UTC (permalink / raw)
To: Tom Lane <[email protected]>; Joe Conway <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 2023-12-06 We 15:20, Tom Lane wrote:
> Joe Conway <[email protected]> writes:
>> I'll see if I can add some caching to composite_to_json(), but based on
>> the relative data size it does not sound like there is much performance
>> left on the table to go after, no?
> If Nathan's perf results hold up elsewhere, it seems like some
> micro-optimization around the text-pushing (appendStringInfoString)
> might be more useful than caching. The 7% spent in cache lookups
> could be worth going after later, but it's not the top of the list.
>
> The output size difference does say that maybe we should pay some
> attention to the nearby request to not always label every field.
> Perhaps there should be an option for each row to transform to
> a JSON array rather than an object?
>
>
I doubt it. People who want this are likely to want pretty much what
this patch is providing, not something they would have to transform in
order to get it. If they want space-efficient data they won't really be
wanting JSON. Maybe they want Protocol Buffers or something in that vein.
I see there's nearby proposal to make this area pluggable at
<https://postgr.es/m/[email protected];
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 15:44 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:28 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:33 ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
2023-12-06 16:44 ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
2023-12-06 19:48 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 20:20 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 21:03 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
@ 2023-12-06 21:36 ` Sehrope Sarkuni <[email protected]>
0 siblings, 0 replies; 37+ messages in thread
From: Sehrope Sarkuni @ 2023-12-06 21:36 UTC (permalink / raw)
To: Andrew Dunstan <[email protected]>; +Cc: Tom Lane <[email protected]>; Joe Conway <[email protected]>; Nathan Bossart <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On Wed, Dec 6, 2023 at 4:03 PM Andrew Dunstan <[email protected]> wrote:
> > The output size difference does say that maybe we should pay some
> > attention to the nearby request to not always label every field.
> > Perhaps there should be an option for each row to transform to
> > a JSON array rather than an object?
>
> I doubt it. People who want this are likely to want pretty much what
> this patch is providing, not something they would have to transform in
> order to get it. If they want space-efficient data they won't really be
> wanting JSON. Maybe they want Protocol Buffers or something in that vein.
>
For arrays v.s. objects, it's not just about data size. There are plenty of
situations where a JSON array is superior to an object (e.g. duplicate
column names). Lines of JSON arrays of strings is pretty much CSV with JSON
escaping rules and a pair of wrapping brackets. It's common for tabular
data in node.js environments as you don't need a separate CSV parser.
Each one has its place and a default of the row_to_json(...) representation
of the row still makes sense. But if the user has the option of outputting
a single json/jsonb field for each row without an object or array wrapper,
then it's possible to support all of these use cases as the user can
explicitly pick whatever envelope makes sense:
-- Lines of JSON arrays:
COPY (SELECT json_build_array('test-' || a, b) FROM generate_series(1, 3)
a, generate_series(5,6) b) TO STDOUT WITH (FORMAT JSON,
SOME_OPTION_TO_DISABLE_ENVELOPE);
["test-1", 5]
["test-2", 5]
["test-3", 5]
["test-1", 6]
["test-2", 6]
["test-3", 6]
-- Lines of JSON strings:
COPY (SELECT to_json('test-' || x) FROM generate_series(1, 5) x) TO STDOUT
WITH (FORMAT JSON, SOME_OPTION_TO_DISABLE_ENVELOPE);
"test-1"
"test-2"
"test-3"
"test-4"
"test-5"
I'm not sure how I feel about the behavior being automatic if it's a single
top level json / jsonb field rather than requiring the explicit option.
It's probably what a user would want but it also feels odd to change the
output wrapper automatically based on the fields in the response. If it is
automatic and the user wants the additional envelope, the option always
exists to wrap it further in another: json_build_object('some_field",
my_field_i_want_wrapped)
The duplicate field names would be a good test case too. I haven't gone
through this patch but I'm guessing it doesn't filter out duplicates so the
behavior would match up with row_to_json(...), i.e. duplicates are
preserved:
=> SELECT row_to_json(t.*) FROM (SELECT 1 AS a, 2 AS a) t;
row_to_json
---------------
{"a":1,"a":2}
If so, that's a good test case to add as however that's handled should be
deterministic.
Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 15:44 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:28 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:33 ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
2023-12-06 16:44 ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
2023-12-06 19:48 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 20:20 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
@ 2023-12-07 02:56 ` Nathan Bossart <[email protected]>
2023-12-07 12:15 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
1 sibling, 1 reply; 37+ messages in thread
From: Nathan Bossart @ 2023-12-07 02:56 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Joe Conway <[email protected]>; Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On Wed, Dec 06, 2023 at 03:20:46PM -0500, Tom Lane wrote:
> If Nathan's perf results hold up elsewhere, it seems like some
> micro-optimization around the text-pushing (appendStringInfoString)
> might be more useful than caching. The 7% spent in cache lookups
> could be worth going after later, but it's not the top of the list.
Hah, it turns out my benchmark of 110M integers really stresses the
JSONTYPE_NUMERIC path in datum_to_json_internal(). That particular path
calls strlen() twice: once for IsValidJsonNumber(), and once in
appendStringInfoString(). If I save the result from IsValidJsonNumber()
and give it to appendBinaryStringInfo() instead, the COPY goes ~8% faster.
It's probably worth giving datum_to_json_internal() a closer look in a new
thread.
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 71ae53ff97..1951e93d9d 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -180,6 +180,7 @@ datum_to_json_internal(Datum val, bool is_null, StringInfo result,
{
char *outputstr;
text *jsontext;
+ int len;
check_stack_depth();
@@ -223,8 +224,8 @@ datum_to_json_internal(Datum val, bool is_null, StringInfo result,
* Don't call escape_json for a non-key if it's a valid JSON
* number.
*/
- if (!key_scalar && IsValidJsonNumber(outputstr, strlen(outputstr)))
- appendStringInfoString(result, outputstr);
+ if (!key_scalar && IsValidJsonNumber(outputstr, (len = strlen(outputstr))))
+ appendBinaryStringInfo(result, outputstr, len);
else
escape_json(result, outputstr);
pfree(outputstr);
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 12:36 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-06 15:44 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:28 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-06 16:33 ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
2023-12-06 16:44 ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
2023-12-06 19:48 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-06 20:20 ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
2023-12-07 02:56 ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
@ 2023-12-07 12:15 ` Joe Conway <[email protected]>
0 siblings, 0 replies; 37+ messages in thread
From: Joe Conway @ 2023-12-07 12:15 UTC (permalink / raw)
To: Nathan Bossart <[email protected]>; Tom Lane <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 12/6/23 21:56, Nathan Bossart wrote:
> On Wed, Dec 06, 2023 at 03:20:46PM -0500, Tom Lane wrote:
>> If Nathan's perf results hold up elsewhere, it seems like some
>> micro-optimization around the text-pushing (appendStringInfoString)
>> might be more useful than caching. The 7% spent in cache lookups
>> could be worth going after later, but it's not the top of the list.
>
> Hah, it turns out my benchmark of 110M integers really stresses the
> JSONTYPE_NUMERIC path in datum_to_json_internal(). That particular path
> calls strlen() twice: once for IsValidJsonNumber(), and once in
> appendStringInfoString(). If I save the result from IsValidJsonNumber()
> and give it to appendBinaryStringInfo() instead, the COPY goes ~8% faster.
> It's probably worth giving datum_to_json_internal() a closer look in a new
> thread.
Yep, after looking through that code I was going to make the point that
your 11 integer test was over indexing on that one type. I am sure there
are other micro-optimizations to be made here, but I also think that it
is outside the scope of the COPY TO JSON patch.
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:09 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:20 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
@ 2023-12-06 18:59 ` Daniel Verite <[email protected]>
1 sibling, 0 replies; 37+ messages in thread
From: Daniel Verite @ 2023-12-06 18:59 UTC (permalink / raw)
To: Andrew Dunstan <[email protected]>; +Cc: Joe Conway <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
Andrew Dunstan wrote:
> IMNSHO, we should produce either a single JSON
> document (the ARRAY case) or a series of JSON documents, one per row
> (the LINES case).
"COPY Operations" in the doc says:
" The backend sends a CopyOutResponse message to the frontend, followed
by zero or more CopyData messages (always one per row), followed by
CopyDone".
In the ARRAY case, the first messages with the copyjsontest
regression test look like this (tshark output):
PostgreSQL
Type: CopyOut response
Length: 13
Format: Text (0)
Columns: 3
Format: Text (0)
PostgreSQL
Type: Copy data
Length: 6
Copy data: 5b0a
PostgreSQL
Type: Copy data
Length: 76
Copy data:
207b226964223a312c226631223a226c696e652077697468205c2220696e2069743a2031…
The first Copy data message with contents "5b0a" does not qualify
as a row of data with 3 columns as advertised in the CopyOut
message. Isn't that a problem?
At least the json non-ARRAY case ("json lines") doesn't have
this issue, since every CopyData message corresponds effectively
to a row in the table.
[1] https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-COPY
Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-05 21:12 ` Andrew Dunstan <[email protected]>
2023-12-05 21:15 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
1 sibling, 1 reply; 37+ messages in thread
From: Andrew Dunstan @ 2023-12-05 21:12 UTC (permalink / raw)
To: Joe Conway <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 2023-12-05 Tu 16:02, Joe Conway wrote:
> On 12/5/23 15:55, Andrew Dunstan wrote:
>>
>> On 2023-12-05 Tu 14:50, Davin Shearer wrote:
>>> Hi Joe,
>>>
>>> In reviewing the 005 patch, I think that when used with FORCE ARRAY,
>>> we should also _imply_ FORCE ROW DELIMITER. I can't envision a use
>>> case where someone would want to use FORCE ARRAY without also using
>>> FORCE ROW DELIMITER. I can, however, envision a use case where
>>> someone would want FORCE ROW DELIMITER without FORCE ARRAY, like
>>> maybe including into a larger array. I definitely appreciate these
>>> options and the flexibility that they afford from a user perspective.
>>>
>>> In the test output, will you also show the different variations with
>>> FORCE ARRAY and FORCE ROW DELIMITER => {(false, false), (true,
>>> false), (false, true), (true, true)}? Technically you've already
>>> shown me the (false, false) case as those are the defaults.
>>>
>>>
>>
>> I don't understand the point of FORCE_ROW_DELIMITER at all. There is
>> only one legal delimiter of array items in JSON, and that's a comma.
>> There's no alternative and it's not optional. So in the array case you
>> MUST have commas and in any other case (e.g. LINES) I can't see why you
>> would have them.
>
> The current patch already *does* imply row delimiters in the array
> case. It says so here:
> 8<---------------------------
> + <varlistentry>
> + <term><literal>FORCE_ARRAY</literal></term>
> + <listitem>
> + <para>
> + Force output of array decorations at the beginning and end of
> output.
> + This option implies the <literal>FORCE_ROW_DELIMITER</literal>
> + option. It is allowed only in <command>COPY TO</command>, and
> only
> + when using <literal>JSON</literal> format.
> + The default is <literal>false</literal>.
> + </para>
> 8<---------------------------
>
> and it does so here:
> 8<---------------------------
> + if (opts_out->force_array)
> + opts_out->force_row_delimiter = true;
> 8<---------------------------
>
> and it shows that here:
> 8<---------------------------
> + copy copytest to stdout (format json, force_array);
> + [
> + {"style":"DOS","test":"abc\r\ndef","filler":1}
> + ,{"style":"Unix","test":"abc\ndef","filler":2}
> + ,{"style":"Mac","test":"abc\rdef","filler":3}
> + ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
> + ]
> 8<---------------------------
>
> It also does not allow explicitly setting row delimiters false while
> force_array is true here:
> 8<---------------------------
>
> + if (opts_out->force_array &&
> + force_row_delimiter_specified &&
> + !opts_out->force_row_delimiter)
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("cannot specify FORCE_ROW_DELIMITER
> false with FORCE_ARRAY true")));
> 8<---------------------------
>
> Am I understanding something incorrectly?
But what's the point of having it if you're not using FORCE_ARRAY?
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
^ permalink raw reply [nested|flat] 37+ messages in thread
* Re: Emitting JSON to file using COPY TO
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 17:43 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 18:51 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 19:50 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
2023-12-05 20:55 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 21:12 ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
@ 2023-12-05 21:15 ` Joe Conway <[email protected]>
0 siblings, 0 replies; 37+ messages in thread
From: Joe Conway @ 2023-12-05 21:15 UTC (permalink / raw)
To: Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers
On 12/5/23 16:12, Andrew Dunstan wrote:
>
> On 2023-12-05 Tu 16:02, Joe Conway wrote:
>> On 12/5/23 15:55, Andrew Dunstan wrote:
>>>
>>> On 2023-12-05 Tu 14:50, Davin Shearer wrote:
>>>> Hi Joe,
>>>>
>>>> In reviewing the 005 patch, I think that when used with FORCE ARRAY,
>>>> we should also _imply_ FORCE ROW DELIMITER. I can't envision a use
>>>> case where someone would want to use FORCE ARRAY without also using
>>>> FORCE ROW DELIMITER. I can, however, envision a use case where
>>>> someone would want FORCE ROW DELIMITER without FORCE ARRAY, like
>>>> maybe including into a larger array. I definitely appreciate these
>>>> options and the flexibility that they afford from a user perspective.
>>>>
>>>> In the test output, will you also show the different variations with
>>>> FORCE ARRAY and FORCE ROW DELIMITER => {(false, false), (true,
>>>> false), (false, true), (true, true)}? Technically you've already
>>>> shown me the (false, false) case as those are the defaults.
>>>>
>>>>
>>>
>>> I don't understand the point of FORCE_ROW_DELIMITER at all. There is
>>> only one legal delimiter of array items in JSON, and that's a comma.
>>> There's no alternative and it's not optional. So in the array case you
>>> MUST have commas and in any other case (e.g. LINES) I can't see why you
>>> would have them.
>>
>> The current patch already *does* imply row delimiters in the array
>> case. It says so here:
>> 8<---------------------------
>> + <varlistentry>
>> + <term><literal>FORCE_ARRAY</literal></term>
>> + <listitem>
>> + <para>
>> + Force output of array decorations at the beginning and end of
>> output.
>> + This option implies the <literal>FORCE_ROW_DELIMITER</literal>
>> + option. It is allowed only in <command>COPY TO</command>, and
>> only
>> + when using <literal>JSON</literal> format.
>> + The default is <literal>false</literal>.
>> + </para>
>> 8<---------------------------
>>
>> and it does so here:
>> 8<---------------------------
>> + if (opts_out->force_array)
>> + opts_out->force_row_delimiter = true;
>> 8<---------------------------
>>
>> and it shows that here:
>> 8<---------------------------
>> + copy copytest to stdout (format json, force_array);
>> + [
>> + {"style":"DOS","test":"abc\r\ndef","filler":1}
>> + ,{"style":"Unix","test":"abc\ndef","filler":2}
>> + ,{"style":"Mac","test":"abc\rdef","filler":3}
>> + ,{"style":"esc\\ape","test":"a\\r\\\r\\\n\\nb","filler":4}
>> + ]
>> 8<---------------------------
>>
>> It also does not allow explicitly setting row delimiters false while
>> force_array is true here:
>> 8<---------------------------
>>
>> + if (opts_out->force_array &&
>> + force_row_delimiter_specified &&
>> + !opts_out->force_row_delimiter)
>> + ereport(ERROR,
>> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> + errmsg("cannot specify FORCE_ROW_DELIMITER
>> false with FORCE_ARRAY true")));
>> 8<---------------------------
>>
>> Am I understanding something incorrectly?
>
>
> But what's the point of having it if you're not using FORCE_ARRAY?
See the follow up email -- other databases support it so why not? It
seems to be a thing...
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 37+ messages in thread
end of thread, other threads:[~2023-12-07 13:19 UTC | newest]
Thread overview: 37+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2023-12-05 02:54 Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
2023-12-05 16:54 ` Joe Conway <[email protected]>
2023-12-05 17:43 ` Davin Shearer <[email protected]>
2023-12-05 18:51 ` Joe Conway <[email protected]>
2023-12-05 19:50 ` Davin Shearer <[email protected]>
2023-12-05 20:55 ` Andrew Dunstan <[email protected]>
2023-12-05 21:02 ` Joe Conway <[email protected]>
2023-12-05 21:09 ` Joe Conway <[email protected]>
2023-12-05 21:20 ` Andrew Dunstan <[email protected]>
2023-12-05 21:46 ` Joe Conway <[email protected]>
2023-12-05 23:45 ` Davin Shearer <[email protected]>
2023-12-06 12:36 ` Andrew Dunstan <[email protected]>
2023-12-06 13:49 ` Joe Conway <[email protected]>
2023-12-06 15:32 ` Andrew Dunstan <[email protected]>
2023-12-06 16:15 ` Joe Conway <[email protected]>
2023-12-06 15:44 ` Tom Lane <[email protected]>
2023-12-06 16:19 ` Joe Conway <[email protected]>
2023-12-06 16:19 ` Andrew Dunstan <[email protected]>
2023-12-06 16:28 ` Sehrope Sarkuni <[email protected]>
2023-12-06 21:28 ` Joe Conway <[email protected]>
2023-12-06 21:42 ` Sehrope Sarkuni <[email protected]>
2023-12-06 22:38 ` Joe Conway <[email protected]>
2023-12-06 22:56 ` David G. Johnston <[email protected]>
2023-12-07 13:19 ` Andrew Dunstan <[email protected]>
2023-12-06 16:26 ` Tom Lane <[email protected]>
2023-12-06 16:28 ` Tom Lane <[email protected]>
2023-12-06 16:33 ` Nathan Bossart <[email protected]>
2023-12-06 16:44 ` Nathan Bossart <[email protected]>
2023-12-06 19:48 ` Joe Conway <[email protected]>
2023-12-06 20:20 ` Tom Lane <[email protected]>
2023-12-06 21:03 ` Andrew Dunstan <[email protected]>
2023-12-06 21:36 ` Sehrope Sarkuni <[email protected]>
2023-12-07 02:56 ` Nathan Bossart <[email protected]>
2023-12-07 12:15 ` Joe Conway <[email protected]>
2023-12-06 18:59 ` Daniel Verite <[email protected]>
2023-12-05 21:12 ` Andrew Dunstan <[email protected]>
2023-12-05 21:15 ` Joe Conway <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox