public inbox for [email protected]  
From: Joe Conway <[email protected]>
To: Andrew Dunstan <[email protected]>
To: Davin Shearer <[email protected]>
To: PostgreSQL-development <[email protected]>
Subject: Re: Emitting JSON to file using COPY TO
Date: Mon, 4 Dec 2023 10:45:58 -0500
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
References: <CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU=kcg@mail.gmail.com>
	<CAFCRh--Vhx0105S7SWAYHGcVMTSRtbcoArDUGSbYSWRUGunW_g@mail.gmail.com>
	<CAKFQuwZb3zaSqkOvGXRRLZEW61az+4xyHZ7tBPR91JfrcdCKvA@mail.gmail.com>
	<[email protected]>
	<CAFCRh-_GdiUvjd5z5FfvTfhruOnYqBu163XU47zZE8RNATCJGQ@mail.gmail.com>
	<[email protected]>
	<CALvfUkCq3pvkmniWEKZwkbA6hNNvkANF3OamMfhC170RgY+HJg@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<CALvfUkD9ZFUU2pe9RMPqdGaKXu5OZi5xzOiPokGhOoPJ56FgAw@mail.gmail.com>
	<[email protected]>
	<CALvfUkBKTr9WLAbHQYmwZ7SVp5_VDey1F1uCZtKL4+H1-WvyBw@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>

On 12/4/23 09:25, Andrew Dunstan wrote:
> 
> On 2023-12-04 Mo 08:37, Joe Conway wrote:
>> On 12/4/23 07:41, Andrew Dunstan wrote:
>>>
>>> On 2023-12-03 Su 20:14, Joe Conway wrote:
>>>> (please don't top quote on the Postgres lists)
>>>>
>>>> On 12/3/23 17:38, Davin Shearer wrote:
>>>>> " being quoted as \\" breaks the JSON. It needs to be \".  This has 
>>>>> been my whole problem with COPY TO for JSON.
>>>>>
>>>>> Please validate that the output is in proper format with correct 
>>>>> quoting for special characters. I use `jq` on the command line to 
>>>>> validate and format the output.
>>>>
>>>> I just hooked existing "row-to-json machinery" up to the "COPY TO" 
>>>> statement. If the output is wrong (just for this use case?), 
>>>> that would be a missing feature (or possibly a bug?).
>>>>
>>>> Davin -- how did you work around the issue with the way the built in 
>>>> functions output JSON?
>>>>
>>>> Andrew -- comments/thoughts?
>>>
>>> I meant to mention this when I was making comments yesterday.
>>>
>>> The patch should not be using CopyAttributeOutText - it will try to
>>> escape characters such as \, which produces the effect complained of
>>> here, or else we need to change its setup so we have a way to inhibit
>>> that escaping.
>>
>>
>> Interesting.
>>
>> I am surprised this has never been raised as a problem with COPY TO 
>> before.
>>
>> Should the JSON output, as produced by composite_to_json(), be sent 
>> as-is with no escaping at all? If yes, is JSON somehow unique in this 
>> regard?
> 
> 
> Text mode output is in such a form that it can be read back in using
> text mode input. There's nothing special about JSON in this respect -
> any text field will be escaped too. But output suitable for text mode
> input is not what you're trying to produce here; you're trying to
> produce valid JSON.
> 
> So, yes, the result of composite_to_json, which is already suitably
> escaped, should not be further escaped in this case.

Gotcha.

This patch version uses CopySendData() instead and includes 
documentation changes. Still lacks regression tests.

Hopefully this looks better. Any other particular strings I ought to 
test with?

8<------------------
test=# copy (select * from foo limit 4) to stdout (format json, force_array true);
[
  {"id":1,"f1":"line with \" in it: 1","f2":"2023-12-03T12:26:41.596053-05:00"}
,{"id":2,"f1":"line with ' in it: 2","f2":"2023-12-03T12:26:41.596173-05:00"}
,{"id":3,"f1":"line with \" in it: 3","f2":"2023-12-03T12:26:41.596179-05:00"}
,{"id":4,"f1":"line with ' in it: 4","f2":"2023-12-03T12:26:41.596182-05:00"}
]
8<------------------

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [text/x-patch] copyto_json.003.diff (14.3K, 2-copyto_json.003.diff)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69..af8777b 100644
*** a/doc/src/sgml/ref/copy.sgml
--- b/doc/src/sgml/ref/copy.sgml
*************** COPY { <replaceable class="parameter">ta
*** 43,48 ****
--- 43,50 ----
      FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
      FORCE_NOT_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
      FORCE_NULL { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
+     FORCE_ARRAY [ <replaceable class="parameter">boolean</replaceable> ]
+     FORCE_ROW_DELIMITER [ <replaceable class="parameter">boolean</replaceable> ]
      ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
  </synopsis>
   </refsynopsisdiv>
*************** COPY { <replaceable class="parameter">ta
*** 206,214 ****
--- 208,221 ----
        Selects the data format to be read or written:
        <literal>text</literal>,
        <literal>csv</literal> (Comma Separated Values),
+       <literal>json</literal> (JavaScript Object Notation),
        or <literal>binary</literal>.
        The default is <literal>text</literal>.
       </para>
+      <para>
+       The <literal>json</literal> option is allowed only in
+       <command>COPY TO</command>.
+      </para>
      </listitem>
     </varlistentry>
  
*************** COPY { <replaceable class="parameter">ta
*** 372,377 ****
--- 379,410 ----
       </para>
      </listitem>
     </varlistentry>
+ 
+    <varlistentry>
+     <term><literal>FORCE_ROW_DELIMITER</literal></term>
+     <listitem>
+      <para>
+       Force output of commas as row delimiters, in addition to the usual
+       end of line characters. This option is allowed only in
+       <command>COPY TO</command>, and only when using
+       <literal>JSON</literal> format.
+       The default is <literal>false</literal>.
+      </para>
+     </listitem>
+    </varlistentry>
+ 
+    <varlistentry>
+     <term><literal>FORCE_ARRAY</literal></term>
+     <listitem>
+      <para>
+       Force output of array decorations at the beginning and end of output.
+       This option implies the <literal>FORCE_ROW_DELIMITER</literal>
+       option. It is allowed only in <command>COPY TO</command>, and only
+       when using <literal>JSON</literal> format.
+       The default is <literal>false</literal>.
+      </para>
+     </listitem>
+    </varlistentry>
  
     <varlistentry>
      <term><literal>ENCODING</literal></term>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b..46ec34f 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** ProcessCopyOptions(ParseState *pstate,
*** 443,448 ****
--- 443,450 ----
  				 /* default format */ ;
  			else if (strcmp(fmt, "csv") == 0)
  				opts_out->csv_mode = true;
+ 			else if (strcmp(fmt, "json") == 0)
+ 				opts_out->json_mode = true;
  			else if (strcmp(fmt, "binary") == 0)
  				opts_out->binary = true;
  			else
*************** ProcessCopyOptions(ParseState *pstate,
*** 540,545 ****
--- 542,559 ----
  								defel->defname),
  						 parser_errposition(pstate, defel->location)));
  		}
+ 		else if (strcmp(defel->defname, "force_row_delimiter") == 0)
+ 		{
+ 			if (opts_out->force_row_delimiter)
+ 				errorConflictingDefElem(defel, pstate);
+ 			opts_out->force_row_delimiter = true;
+ 		}
+ 		else if (strcmp(defel->defname, "force_array") == 0)
+ 		{
+ 			if (opts_out->force_array)
+ 				errorConflictingDefElem(defel, pstate);
+ 			opts_out->force_array = true;
+ 		}
  		else if (strcmp(defel->defname, "convert_selectively") == 0)
  		{
  			/*
*************** ProcessCopyOptions(ParseState *pstate,
*** 598,603 ****
--- 612,636 ----
  				(errcode(ERRCODE_SYNTAX_ERROR),
  				 errmsg("cannot specify DEFAULT in BINARY mode")));
  
+ 	if (opts_out->json_mode)
+ 	{
+ 		if (is_from)
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 					 errmsg("cannot use JSON mode in COPY FROM")));
+ 
+ 		if (opts_out->force_array)
+ 			opts_out->force_row_delimiter = true;
+ 	}
+ 	else if (opts_out->force_array)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("COPY FORCE_ARRAY requires JSON mode")));
+ 	else if (opts_out->force_row_delimiter)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("COPY FORCE_ROW_DELIMITER requires JSON mode")));
+ 
  	/* Set defaults for omitted options */
  	if (!opts_out->delim)
  		opts_out->delim = opts_out->csv_mode ? "," : "\t";
*************** ProcessCopyOptions(ParseState *pstate,
*** 667,672 ****
--- 700,710 ----
  				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  				 errmsg("cannot specify HEADER in BINARY mode")));
  
+ 	if (opts_out->json_mode && opts_out->header_line)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("cannot specify HEADER in JSON mode")));
+ 
  	/* Check quote */
  	if (!opts_out->csv_mode && opts_out->quote != NULL)
  		ereport(ERROR,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c66a047..fba3070 100644
*** a/src/backend/commands/copyto.c
--- b/src/backend/commands/copyto.c
***************
*** 37,42 ****
--- 37,43 ----
  #include "rewrite/rewriteHandler.h"
  #include "storage/fd.h"
  #include "tcop/tcopprot.h"
+ #include "utils/json.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  #include "utils/partcache.h"
*************** typedef struct
*** 112,117 ****
--- 113,120 ----
  /* NOTE: there's a copy of this in copyfromparse.c */
  static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
  
+ /* need delimiter to start next json array element */
+ static bool json_row_delim_needed = false;
  
  /* non-export function prototypes */
  static void EndCopy(CopyToState cstate);
*************** DoCopyTo(CopyToState cstate)
*** 845,850 ****
--- 848,867 ----
  
  			CopySendEndOfRow(cstate);
  		}
+ 
+ 		/*
+ 		 * If JSON has been requested and FORCE_ARRAY has been specified,
+ 		 * send the opening bracket.
+ 		 */
+ 		if (cstate->opts.json_mode)
+ 		{
+ 			if (cstate->opts.force_array)
+ 			{
+ 				CopySendChar(cstate, '[');
+ 				CopySendEndOfRow(cstate);
+ 			}
+ 			json_row_delim_needed = false;
+ 		}
  	}
  
  	if (cstate->rel)
*************** DoCopyTo(CopyToState cstate)
*** 892,897 ****
--- 909,925 ----
  		CopySendEndOfRow(cstate);
  	}
  
+ 	/*
+ 	 * If JSON has been requested and FORCE_ARRAY has been specified,
+ 	 * send the closing bracket.
+ 	 */
+ 	if (cstate->opts.json_mode &&
+ 		cstate->opts.force_array)
+ 	{
+ 		CopySendChar(cstate, ']');
+ 		CopySendEndOfRow(cstate);
+ 	}
+ 
  	MemoryContextDelete(cstate->rowcontext);
  
  	if (fe_copy)
*************** DoCopyTo(CopyToState cstate)
*** 906,916 ****
  static void
  CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
  {
- 	bool		need_delim = false;
- 	FmgrInfo   *out_functions = cstate->out_functions;
  	MemoryContext oldcontext;
- 	ListCell   *cur;
- 	char	   *string;
  
  	MemoryContextReset(cstate->rowcontext);
  	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
--- 934,940 ----
*************** CopyOneRowTo(CopyToState cstate, TupleTa
*** 921,974 ****
  		CopySendInt16(cstate, list_length(cstate->attnumlist));
  	}
  
! 	/* Make sure the tuple is fully deconstructed */
! 	slot_getallattrs(slot);
! 
! 	foreach(cur, cstate->attnumlist)
  	{
! 		int			attnum = lfirst_int(cur);
! 		Datum		value = slot->tts_values[attnum - 1];
! 		bool		isnull = slot->tts_isnull[attnum - 1];
  
! 		if (!cstate->opts.binary)
! 		{
! 			if (need_delim)
! 				CopySendChar(cstate, cstate->opts.delim[0]);
! 			need_delim = true;
! 		}
  
! 		if (isnull)
! 		{
! 			if (!cstate->opts.binary)
! 				CopySendString(cstate, cstate->opts.null_print_client);
! 			else
! 				CopySendInt32(cstate, -1);
! 		}
! 		else
  		{
  			if (!cstate->opts.binary)
  			{
! 				string = OutputFunctionCall(&out_functions[attnum - 1],
! 											value);
! 				if (cstate->opts.csv_mode)
! 					CopyAttributeOutCSV(cstate, string,
! 										cstate->opts.force_quote_flags[attnum - 1],
! 										list_length(cstate->attnumlist) == 1);
  				else
! 					CopyAttributeOutText(cstate, string);
  			}
  			else
  			{
! 				bytea	   *outputbytes;
  
! 				outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! 											   value);
! 				CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! 				CopySendData(cstate, VARDATA(outputbytes),
! 							 VARSIZE(outputbytes) - VARHDRSZ);
  			}
  		}
  	}
  
  	CopySendEndOfRow(cstate);
  
--- 945,1028 ----
  		CopySendInt16(cstate, list_length(cstate->attnumlist));
  	}
  
! 	if (!cstate->opts.json_mode)
  	{
! 		bool		need_delim = false;
! 		FmgrInfo   *out_functions = cstate->out_functions;
! 		ListCell   *cur;
! 		char	   *string;
  
! 		/* Make sure the tuple is fully deconstructed */
! 		slot_getallattrs(slot);
  
! 		foreach(cur, cstate->attnumlist)
  		{
+ 			int			attnum = lfirst_int(cur);
+ 			Datum		value = slot->tts_values[attnum - 1];
+ 			bool		isnull = slot->tts_isnull[attnum - 1];
+ 
  			if (!cstate->opts.binary)
  			{
! 				if (need_delim)
! 					CopySendChar(cstate, cstate->opts.delim[0]);
! 				need_delim = true;
! 			}
! 
! 			if (isnull)
! 			{
! 				if (!cstate->opts.binary)
! 					CopySendString(cstate, cstate->opts.null_print_client);
  				else
! 					CopySendInt32(cstate, -1);
  			}
  			else
  			{
! 				if (!cstate->opts.binary)
! 				{
! 					string = OutputFunctionCall(&out_functions[attnum - 1],
! 												value);
! 					if (cstate->opts.csv_mode)
! 						CopyAttributeOutCSV(cstate, string,
! 											cstate->opts.force_quote_flags[attnum - 1],
! 											list_length(cstate->attnumlist) == 1);
! 					else
! 						CopyAttributeOutText(cstate, string);
! 				}
! 				else
! 				{
! 					bytea	   *outputbytes;
  
! 					outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! 												   value);
! 					CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! 					CopySendData(cstate, VARDATA(outputbytes),
! 								 VARSIZE(outputbytes) - VARHDRSZ);
! 				}
  			}
  		}
  	}
+ 	else
+ 	{
+ 		Datum	rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ 		StringInfo	result;
+ 
+ 		result = makeStringInfo();
+ 		composite_to_json(rowdata, result, false);
+ 
+ 		if (json_row_delim_needed &&
+ 			cstate->opts.force_row_delimiter)
+ 		{
+ 			CopySendChar(cstate, ',');
+ 		}
+ 		else if (cstate->opts.force_row_delimiter)
+ 		{
+ 			/* first row needs no delimiter */
+ 			CopySendChar(cstate, ' ');
+ 			json_row_delim_needed = true;
+ 		}
+ 
+ 		CopySendData(cstate, result->data, result->len);
+ 	}
  
  	CopySendEndOfRow(cstate);
  
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac8..16aa131 100644
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*************** copy_opt_item:
*** 3408,3413 ****
--- 3408,3417 ----
  				{
  					$$ = makeDefElem("format", (Node *) makeString("csv"), @1);
  				}
+ 			| JSON
+ 				{
+ 					$$ = makeDefElem("format", (Node *) makeString("json"), @1);
+ 				}
  			| HEADER_P
  				{
  					$$ = makeDefElem("header", (Node *) makeBoolean(true), @1);
*************** copy_opt_item:
*** 3448,3453 ****
--- 3452,3465 ----
  				{
  					$$ = makeDefElem("encoding", (Node *) makeString($2), @1);
  				}
+ 			| FORCE ROW DELIMITER
+ 				{
+ 					$$ = makeDefElem("force_row_delimiter", (Node *) makeBoolean(true), @1);
+ 				}
+ 			| FORCE ARRAY
+ 				{
+ 					$$ = makeDefElem("force_array", (Node *) makeBoolean(true), @1);
+ 				}
  		;
  
  /* The following exist for backward compatibility with very old versions */
*************** copy_generic_opt_elem:
*** 3490,3495 ****
--- 3502,3511 ----
  				{
  					$$ = makeDefElem($1, $2, @1);
  				}
+ 			| FORMAT_LA copy_generic_opt_arg
+ 				{
+ 					$$ = makeDefElem("format", $2, @1);
+ 				}
  		;
  
  copy_generic_opt_arg:
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 71ae53f..cb4311e 100644
*** a/src/backend/utils/adt/json.c
--- b/src/backend/utils/adt/json.c
*************** typedef struct JsonAggState
*** 83,90 ****
  	JsonUniqueBuilderState unique_check;
  } JsonAggState;
  
- static void composite_to_json(Datum composite, StringInfo result,
- 							  bool use_line_feeds);
  static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
  							  Datum *vals, bool *nulls, int *valcount,
  							  JsonTypeCategory tcategory, Oid outfuncoid,
--- 83,88 ----
*************** array_to_json_internal(Datum array, Stri
*** 490,497 ****
  
  /*
   * Turn a composite / record into JSON.
   */
! static void
  composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
  {
  	HeapTupleHeader td;
--- 488,496 ----
  
  /*
   * Turn a composite / record into JSON.
+  * Exported so COPY TO can use it.
   */
! void
  composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
  {
  	HeapTupleHeader td;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b..266910d 100644
*** a/src/include/commands/copy.h
--- b/src/include/commands/copy.h
*************** typedef struct CopyFormatOptions
*** 43,48 ****
--- 43,49 ----
  	bool		binary;			/* binary format? */
  	bool		freeze;			/* freeze rows on loading? */
  	bool		csv_mode;		/* Comma Separated Value format? */
+ 	bool		json_mode;		/* JSON format? */
  	CopyHeaderChoice header_line;	/* header line? */
  	char	   *null_print;		/* NULL marker string (server encoding!) */
  	int			null_print_len; /* length of same */
*************** typedef struct CopyFormatOptions
*** 61,66 ****
--- 62,69 ----
  	List	   *force_null;		/* list of column names */
  	bool		force_null_all; /* FORCE_NULL *? */
  	bool	   *force_null_flags;	/* per-column CSV FN flags */
+ 	bool		force_row_delimiter;	/* use comma as per-row JSON delimiter */
+ 	bool		force_array;	/* JSON array; implies force_row_delimiter */
  	bool		convert_selectively;	/* do selective binary conversion? */
  	List	   *convert_select; /* list of column names (can be NIL) */
  } CopyFormatOptions;
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index f07e82c..badc5a6 100644
*** a/src/include/utils/json.h
--- b/src/include/utils/json.h
***************
*** 17,22 ****
--- 17,24 ----
  #include "lib/stringinfo.h"
  
  /* functions in json.c */
+ extern void composite_to_json(Datum composite, StringInfo result,
+ 							  bool use_line_feeds);
  extern void escape_json(StringInfo buf, const char *str);
  extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
  								const int *tzp);

