public inbox for [email protected]  
help / color / mirror / Atom feed
Re: Emitting JSON to file using COPY TO
35+ messages / 11 participants
[nested] [flat]

* Re: Emitting JSON to file using COPY TO
@ 2023-11-27 09:33 Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  0 siblings, 1 reply; 35+ messages in thread

From: Dominique Devienne @ 2023-11-27 09:33 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: Davin Shearer <[email protected]>; [email protected]

On Sat, Nov 25, 2023 at 10:00 PM Adrian Klaver <[email protected]>
wrote:

> On 11/25/23 11:21, Davin Shearer wrote:
> > Hello!
> >
> > I'm trying to emit a JSON aggregation of JSON rows to a file using COPY
> > TO, but I'm running into problems with COPY TO double quoting the
> > output.   Here is a minimal example that demonstrates the problem I'm
> > having:
> >
>
> > I have tried to get COPY TO to copy the results to file "as-is" by
> > setting the escape and the quote characters to the empty string (''),
> > but they only apply to the CSV format.
> >
> > Is there a way to emit JSON results to file from within postgres?
> > Effectively, nn "as-is" option to COPY TO would work well for this JSON
> > use case.
> >
>
> Not using COPY.
>
> See David Johnson's post for one way using the client psql.
>
> Otherwise you will need to use any of the many ETL programs out there
> that are designed for this sort of thing.
>

Guys, I don't get answers like that. The JSON spec is clear:

>


^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
@ 2023-11-27 09:44 ` Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 16:04   ` Re: Emitting JSON to file using COPY TO Adrian Klaver <[email protected]>
  0 siblings, 2 replies; 35+ messages in thread

From: Dominique Devienne @ 2023-11-27 09:44 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: Davin Shearer <[email protected]>; [email protected]

On Mon, Nov 27, 2023 at 10:33 AM Dominique Devienne <[email protected]>
wrote:

> On Sat, Nov 25, 2023 at 10:00 PM Adrian Klaver <[email protected]>
> wrote:
>
>> On 11/25/23 11:21, Davin Shearer wrote:
>> > Hello!
>> >
>> > I'm trying to emit a JSON aggregation of JSON rows to a file using COPY
>> > TO, but I'm running into problems with COPY TO double quoting the
>> > output.   Here is a minimal example that demonstrates the problem I'm
>> > having:
>> >
>>
>> > I have tried to get COPY TO to copy the results to file "as-is" by
>> > setting the escape and the quote characters to the empty string (''),
>> > but they only apply to the CSV format.
>> >
>> > Is there a way to emit JSON results to file from within postgres?
>> > Effectively, nn "as-is" option to COPY TO would work well for this JSON
>> > use case.
>> >
>>
>> Not using COPY.
>>
>> See David Johnson's post for one way using the client psql.
>>
>> Otherwise you will need to use any of the many ETL programs out there
>> that are designed for this sort of thing.
>>
>
> Guys, I don't get answers like that. The JSON spec is clear:
>

Oops, sorry, user error. --DD

PS: The JSON spec is a bit ambiguous. First it says

> Any codepoint except " or \ or control characters

And then is clearly shows \" as a valid sequence...
Sounds like JQ is too restrictive?

Or that's the double-escape that's the culprit?
i.e. \\ is in the final text, so that's just a backslash,
and then the double-quote is no longer escaped.

I've recently noticed json_agg(row_to_json(t))
is equivalent to json_agg(t)

Maybe use that instead? Does that make a difference?

I haven't noticed wrong escaping of double-quotes yet,
but then I'm using the binary mode of queries. Perhaps that matters.

On second thought, I guess that's COPY in its text modes doing the escaping?
Interesting. The text-based modes of COPY are configurable. There's even a
JSON mode.
By miracle, would the JSON output mode recognize JSON[B] values, and avoid
the escaping?


^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
@ 2023-11-27 13:27   ` David G. Johnston <[email protected]>
  2023-11-27 13:56     ` Re: Emitting JSON to file using COPY TO Pavel Stehule <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  1 sibling, 2 replies; 35+ messages in thread

From: David G. Johnston @ 2023-11-27 13:27 UTC (permalink / raw)
  To: Dominique Devienne <[email protected]>; +Cc: Adrian Klaver <[email protected]>; Davin Shearer <[email protected]>; [email protected] <[email protected]>

On Monday, November 27, 2023, Dominique Devienne <[email protected]>
wrote:

> There's even a JSON mode.
> By miracle, would the JSON output mode recognize JSON[B] values, and avoid
> the escaping?
>

I agree there should be a copy option for “not formatted” so if you dump a
single column result in that format you get the raw unescaped contents of
the column. As soon as you ask for a format your json is now embedded so it
is a value within another format and any structural aspects of the wrapper
present in the json text representation need to be escaped.

David J.


^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
@ 2023-11-27 13:56     ` Pavel Stehule <[email protected]>
  2023-11-27 14:43       ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  1 sibling, 1 reply; 35+ messages in thread

From: Pavel Stehule @ 2023-11-27 13:56 UTC (permalink / raw)
  To: David G. Johnston <[email protected]>; +Cc: Dominique Devienne <[email protected]>; Adrian Klaver <[email protected]>; Davin Shearer <[email protected]>; [email protected] <[email protected]>

Hi

po 27. 11. 2023 v 14:27 odesílatel David G. Johnston <
[email protected]> napsal:

> On Monday, November 27, 2023, Dominique Devienne <[email protected]>
> wrote:
>
>> There's even a JSON mode.
>> By miracle, would the JSON output mode recognize JSON[B] values, and
>> avoid the escaping?
>>
>
> I agree there should be a copy option for “not formatted” so if you dump a
> single column result in that format you get the raw unescaped contents of
> the column. As soon as you ask for a format your json is now embedded so it
> is a value within another format and any structural aspects of the wrapper
> present in the json text representation need to be escaped.
>

Is it better to use the LO API for this purpose?  It is native for not
formatted data.

Regards

Pavel


> David J.
>


^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 13:56     ` Re: Emitting JSON to file using COPY TO Pavel Stehule <[email protected]>
@ 2023-11-27 14:43       ` David G. Johnston <[email protected]>
  0 siblings, 0 replies; 35+ messages in thread

From: David G. Johnston @ 2023-11-27 14:43 UTC (permalink / raw)
  To: Pavel Stehule <[email protected]>; +Cc: Dominique Devienne <[email protected]>; Adrian Klaver <[email protected]>; Davin Shearer <[email protected]>; [email protected] <[email protected]>

On Monday, November 27, 2023, Pavel Stehule <[email protected]> wrote:

> Hi
>
> po 27. 11. 2023 v 14:27 odesílatel David G. Johnston <
> [email protected]> napsal:
>
>> On Monday, November 27, 2023, Dominique Devienne <[email protected]>
>> wrote:
>>
>>> There's even a JSON mode.
>>> By miracle, would the JSON output mode recognize JSON[B] values, and
>>> avoid the escaping?
>>>
>>
>> I agree there should be a copy option for “not formatted” so if you dump
>> a single column result in that format you get the raw unescaped contents of
>> the column. As soon as you ask for a format your json is now embedded so it
>> is a value within another format and any structural aspects of the wrapper
>> present in the json text representation need to be escaped.
>>
>
> Is it better to use the LO API for this purpose?  It is native for not
> formatted data.
>

Using LO is, IMO, never the answer.  But if you are using a driver API
anyway just handle the normal select query result.

David J.


^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
@ 2023-11-27 14:56     ` Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  1 sibling, 1 reply; 35+ messages in thread

From: Tom Lane @ 2023-11-27 14:56 UTC (permalink / raw)
  To: David G. Johnston <[email protected]>; +Cc: Dominique Devienne <[email protected]>; Adrian Klaver <[email protected]>; Davin Shearer <[email protected]>; [email protected] <[email protected]>

"David G. Johnston" <[email protected]> writes:
> I agree there should be a copy option for “not formatted” so if you dump a
> single column result in that format you get the raw unescaped contents of
> the column.

I'm not sure I even buy that.  JSON data in particular is typically
multi-line, so how will you know where the row boundaries are?
That is, is a newline a row separator or part of the data?

You can debate the intelligence of any particular quoting/escaping
scheme, but imagining that you can get away without having one at
all will just create its own problems.

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
@ 2023-11-27 15:26       ` Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  0 siblings, 1 reply; 35+ messages in thread

From: Dominique Devienne @ 2023-11-27 15:26 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: David G. Johnston <[email protected]>; Adrian Klaver <[email protected]>; Davin Shearer <[email protected]>; [email protected] <[email protected]>

On Mon, Nov 27, 2023 at 3:56 PM Tom Lane <[email protected]> wrote:

> "David G. Johnston" <[email protected]> writes:
> > I agree there should be a copy option for “not formatted” so if you dump
> a
> > single column result in that format you get the raw unescaped contents of
> > the column.
>
> I'm not sure I even buy that.  JSON data in particular is typically
> multi-line, so how will you know where the row boundaries are?
> That is, is a newline a row separator or part of the data?
>
> You can debate the intelligence of any particular quoting/escaping
> scheme, but imagining that you can get away without having one at
> all will just create its own problems.
>

What I was suggesting is not about a "not formatted" option.
But rather than JSON values (i.e. typed `json` or `jsonb`) in a
JSON-formatted COPY operator, the JSON values should not be
serialized to text that is simply output as a JSON-text-value by COPY,
but "inlined" as a "real" JSON value without the JSON document output by
COPY.

This is a special case, where the inner and outer "values" (for lack of a
better terminology)
are *both* JSON documents, and given that JSON is hierarchical, the inner
JSON value can
either by 1) serializing to text first, which must thus be escaped using
the JSON escaping rules,
2) NOT serialized, but "inline" or "spliced-in" the outer COPY JSON
document.

I guess COPY in JSON mode supports only #1 now? While #2 makes more sense
to me.
But both options are valid. Is that clearer?

BTW, JSON is not multi-line, except for insignificant whitespace.
So even COPY in JSON mode is not supposed to be line based I guess?
Unless COPY in JSON mode is more like NDJSON (https://ndjson.org/)? --DD


^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
@ 2023-11-28 07:36         ` Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  0 siblings, 1 reply; 35+ messages in thread

From: Filip Sedlák @ 2023-11-28 07:36 UTC (permalink / raw)
  To: Dominique Devienne <[email protected]>; Tom Lane <[email protected]>; +Cc: David G. Johnston <[email protected]>; Adrian Klaver <[email protected]>; Davin Shearer <[email protected]>; [email protected] <[email protected]>

This would be a very special case for COPY. It applies only to a single 
column of JSON values. The original problem can be solved with psql 
--tuples-only as David wrote earlier.


$ psql -tc 'select json_agg(row_to_json(t))
              from (select * from public.tbl_json_test) t;'

  [{"id":1,"t_test":"here's a \"string\""}]


Special-casing any encoding/escaping scheme leads to bugs and harder 
parsing.

Just my 2c.

--
Filip Sedlák






^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
@ 2023-11-29 15:32           ` Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  0 siblings, 1 reply; 35+ messages in thread

From: Davin Shearer @ 2023-11-29 15:32 UTC (permalink / raw)
  To: [email protected] <[email protected]>

Thanks for the responses everyone.

I worked around the issue using the `psql -tc` method as Filip described.

I think it would be great to support writing JSON using COPY TO at
some point so I can emit JSON to files using a PostgreSQL function directly.

-Davin

On Tue, Nov 28, 2023 at 2:36 AM Filip Sedlák <[email protected]> wrote:

> This would be a very special case for COPY. It applies only to a single
> column of JSON values. The original problem can be solved with psql
> --tuples-only as David wrote earlier.
>
>
> $ psql -tc 'select json_agg(row_to_json(t))
>               from (select * from public.tbl_json_test) t;'
>
>   [{"id":1,"t_test":"here's a \"string\""}]
>
>
> Special-casing any encoding/escaping scheme leads to bugs and harder
> parsing.
>
> Just my 2c.
>
> --
> Filip Sedlák
>


^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
@ 2023-12-01 19:28             ` Joe Conway <[email protected]>
  2023-12-01 23:09               ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
  2023-12-02 18:50               ` Re: Emitting JSON to file using COPY TO Maciek Sakrejda <[email protected]>
  2023-12-03 15:10               ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
  0 siblings, 3 replies; 35+ messages in thread

From: Joe Conway @ 2023-12-01 19:28 UTC (permalink / raw)
  To: Davin Shearer <[email protected]>; pgsql-hackers

On 11/29/23 10:32, Davin Shearer wrote:
> Thanks for the responses everyone.
> 
> I worked around the issue using the `psql -tc` method as Filip described.
> 
> I think it would be great to support writing JSON using COPY TO at 
> some point so I can emit JSON to files using a PostgreSQL function directly.
> 
> -Davin
> 
> On Tue, Nov 28, 2023 at 2:36 AM Filip Sedlák <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>     This would be a very special case for COPY. It applies only to a single
>     column of JSON values. The original problem can be solved with psql
>     --tuples-only as David wrote earlier.
> 
> 
>     $ psql -tc 'select json_agg(row_to_json(t))
>                    from (select * from public.tbl_json_test) t;'
> 
>        [{"id":1,"t_test":"here's a \"string\""}]
> 
> 
>     Special-casing any encoding/escaping scheme leads to bugs and harder
>     parsing.

(moved to hackers)

I did a quick PoC patch (attached) -- if there interest and no hard 
objections I would like to get it up to speed for the January commitfest.

Currently the patch lacks documentation and regression test support.

Questions:
----------
1. Is supporting JSON array format sufficient, or does it need to 
support some other options? How flexible does the support scheme need to be?

2. This only supports COPY TO and we would undoubtedly want to support 
COPY FROM for JSON as well, but is that required from the start?

Thanks for any feedback.

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [text/x-patch] copyto_json.000.diff (8.3K, 2-copyto_json.000.diff)
  download | inline diff:
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b..bc1f684 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** ProcessCopyOptions(ParseState *pstate,
*** 443,448 ****
--- 443,450 ----
  				 /* default format */ ;
  			else if (strcmp(fmt, "csv") == 0)
  				opts_out->csv_mode = true;
+ 			else if (strcmp(fmt, "json") == 0)
+ 				opts_out->json_mode = true;
  			else if (strcmp(fmt, "binary") == 0)
  				opts_out->binary = true;
  			else
*************** ProcessCopyOptions(ParseState *pstate,
*** 667,672 ****
--- 669,679 ----
  				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  				 errmsg("cannot specify HEADER in BINARY mode")));
  
+ 	if (opts_out->json_mode && opts_out->header_line)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("cannot specify HEADER in JSON mode")));
+ 
  	/* Check quote */
  	if (!opts_out->csv_mode && opts_out->quote != NULL)
  		ereport(ERROR,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c66a047..f6ee771 100644
*** a/src/backend/commands/copyto.c
--- b/src/backend/commands/copyto.c
***************
*** 37,42 ****
--- 37,43 ----
  #include "rewrite/rewriteHandler.h"
  #include "storage/fd.h"
  #include "tcop/tcopprot.h"
+ #include "utils/json.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  #include "utils/partcache.h"
*************** typedef struct
*** 112,117 ****
--- 113,120 ----
  /* NOTE: there's a copy of this in copyfromparse.c */
  static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
  
+ /* need delimiter to start next json array element */
+ static bool json_row_delim_needed = false;
  
  /* non-export function prototypes */
  static void EndCopy(CopyToState cstate);
*************** DoCopyTo(CopyToState cstate)
*** 845,850 ****
--- 848,861 ----
  
  			CopySendEndOfRow(cstate);
  		}
+ 
+ 		/* if a JSON has been requested send the opening bracket */
+ 		if (cstate->opts.json_mode)
+ 		{
+ 			CopySendChar(cstate, '[');
+ 			CopySendEndOfRow(cstate);
+ 			json_row_delim_needed = false;
+ 		}
  	}
  
  	if (cstate->rel)
*************** DoCopyTo(CopyToState cstate)
*** 892,897 ****
--- 903,915 ----
  		CopySendEndOfRow(cstate);
  	}
  
+ 	/* if a JSON has been requested send the closing bracket */
+ 	if (cstate->opts.json_mode)
+ 	{
+ 		CopySendChar(cstate, ']');
+ 		CopySendEndOfRow(cstate);
+ 	}
+ 
  	MemoryContextDelete(cstate->rowcontext);
  
  	if (fe_copy)
*************** DoCopyTo(CopyToState cstate)
*** 906,916 ****
  static void
  CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
  {
- 	bool		need_delim = false;
- 	FmgrInfo   *out_functions = cstate->out_functions;
  	MemoryContext oldcontext;
- 	ListCell   *cur;
- 	char	   *string;
  
  	MemoryContextReset(cstate->rowcontext);
  	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
--- 924,930 ----
*************** CopyOneRowTo(CopyToState cstate, TupleTa
*** 921,974 ****
  		CopySendInt16(cstate, list_length(cstate->attnumlist));
  	}
  
! 	/* Make sure the tuple is fully deconstructed */
! 	slot_getallattrs(slot);
! 
! 	foreach(cur, cstate->attnumlist)
  	{
! 		int			attnum = lfirst_int(cur);
! 		Datum		value = slot->tts_values[attnum - 1];
! 		bool		isnull = slot->tts_isnull[attnum - 1];
  
! 		if (!cstate->opts.binary)
! 		{
! 			if (need_delim)
! 				CopySendChar(cstate, cstate->opts.delim[0]);
! 			need_delim = true;
! 		}
  
! 		if (isnull)
! 		{
! 			if (!cstate->opts.binary)
! 				CopySendString(cstate, cstate->opts.null_print_client);
! 			else
! 				CopySendInt32(cstate, -1);
! 		}
! 		else
  		{
  			if (!cstate->opts.binary)
  			{
! 				string = OutputFunctionCall(&out_functions[attnum - 1],
! 											value);
! 				if (cstate->opts.csv_mode)
! 					CopyAttributeOutCSV(cstate, string,
! 										cstate->opts.force_quote_flags[attnum - 1],
! 										list_length(cstate->attnumlist) == 1);
  				else
! 					CopyAttributeOutText(cstate, string);
  			}
  			else
  			{
! 				bytea	   *outputbytes;
  
! 				outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! 											   value);
! 				CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! 				CopySendData(cstate, VARDATA(outputbytes),
! 							 VARSIZE(outputbytes) - VARHDRSZ);
  			}
  		}
  	}
  
  	CopySendEndOfRow(cstate);
  
--- 935,1015 ----
  		CopySendInt16(cstate, list_length(cstate->attnumlist));
  	}
  
! 	if (!cstate->opts.json_mode)
  	{
! 		bool		need_delim = false;
! 		FmgrInfo   *out_functions = cstate->out_functions;
! 		ListCell   *cur;
! 		char	   *string;
  
! 		/* Make sure the tuple is fully deconstructed */
! 		slot_getallattrs(slot);
  
! 		foreach(cur, cstate->attnumlist)
  		{
+ 			int			attnum = lfirst_int(cur);
+ 			Datum		value = slot->tts_values[attnum - 1];
+ 			bool		isnull = slot->tts_isnull[attnum - 1];
+ 
  			if (!cstate->opts.binary)
  			{
! 				if (need_delim)
! 					CopySendChar(cstate, cstate->opts.delim[0]);
! 				need_delim = true;
! 			}
! 
! 			if (isnull)
! 			{
! 				if (!cstate->opts.binary)
! 					CopySendString(cstate, cstate->opts.null_print_client);
  				else
! 					CopySendInt32(cstate, -1);
  			}
  			else
  			{
! 				if (!cstate->opts.binary)
! 				{
! 					string = OutputFunctionCall(&out_functions[attnum - 1],
! 												value);
! 					if (cstate->opts.csv_mode)
! 						CopyAttributeOutCSV(cstate, string,
! 											cstate->opts.force_quote_flags[attnum - 1],
! 											list_length(cstate->attnumlist) == 1);
! 					else
! 						CopyAttributeOutText(cstate, string);
! 				}
! 				else
! 				{
! 					bytea	   *outputbytes;
  
! 					outputbytes = SendFunctionCall(&out_functions[attnum - 1],
! 												   value);
! 					CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
! 					CopySendData(cstate, VARDATA(outputbytes),
! 								 VARSIZE(outputbytes) - VARHDRSZ);
! 				}
  			}
  		}
  	}
+ 	else
+ 	{
+ 		Datum	rowdata = ExecFetchSlotHeapTupleDatum(slot);
+ 		StringInfo	result;
+ 
+ 		result = makeStringInfo();
+ 		composite_to_json(rowdata, result, false);
+ 
+ 		if (json_row_delim_needed)
+ 			CopySendChar(cstate, ',');
+ 		else
+ 		{
+ 			/* first row needs no delimiter */
+ 			CopySendChar(cstate, ' ');
+ 			json_row_delim_needed = true;
+ 		}
+ 
+ 		CopyAttributeOutText(cstate, result->data);
+ 	}
  
  	CopySendEndOfRow(cstate);
  
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 71ae53f..cb4311e 100644
*** a/src/backend/utils/adt/json.c
--- b/src/backend/utils/adt/json.c
*************** typedef struct JsonAggState
*** 83,90 ****
  	JsonUniqueBuilderState unique_check;
  } JsonAggState;
  
- static void composite_to_json(Datum composite, StringInfo result,
- 							  bool use_line_feeds);
  static void array_dim_to_json(StringInfo result, int dim, int ndims, int *dims,
  							  Datum *vals, bool *nulls, int *valcount,
  							  JsonTypeCategory tcategory, Oid outfuncoid,
--- 83,88 ----
*************** array_to_json_internal(Datum array, Stri
*** 490,497 ****
  
  /*
   * Turn a composite / record into JSON.
   */
! static void
  composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
  {
  	HeapTupleHeader td;
--- 488,496 ----
  
  /*
   * Turn a composite / record into JSON.
+  * Exported so COPY TO can use it.
   */
! void
  composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
  {
  	HeapTupleHeader td;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b..e66bd01 100644
*** a/src/include/commands/copy.h
--- b/src/include/commands/copy.h
*************** typedef struct CopyFormatOptions
*** 43,48 ****
--- 43,49 ----
  	bool		binary;			/* binary format? */
  	bool		freeze;			/* freeze rows on loading? */
  	bool		csv_mode;		/* Comma Separated Value format? */
+ 	bool		json_mode;		/* JSON format? */
  	CopyHeaderChoice header_line;	/* header line? */
  	char	   *null_print;		/* NULL marker string (server encoding!) */
  	int			null_print_len; /* length of same */
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index f07e82c..badc5a6 100644
*** a/src/include/utils/json.h
--- b/src/include/utils/json.h
***************
*** 17,22 ****
--- 17,24 ----
  #include "lib/stringinfo.h"
  
  /* functions in json.c */
+ extern void composite_to_json(Datum composite, StringInfo result,
+ 							  bool use_line_feeds);
  extern void escape_json(StringInfo buf, const char *str);
  extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
  								const int *tzp);


^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-01 23:09               ` Nathan Bossart <[email protected]>
  2023-12-02 03:00                 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-02 14:31                 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2 siblings, 2 replies; 35+ messages in thread

From: Nathan Bossart @ 2023-12-01 23:09 UTC (permalink / raw)
  To: Joe Conway <[email protected]>; +Cc: Davin Shearer <[email protected]>; pgsql-hackers

On Fri, Dec 01, 2023 at 02:28:55PM -0500, Joe Conway wrote:
> I did a quick PoC patch (attached) -- if there interest and no hard
> objections I would like to get it up to speed for the January commitfest.

Cool.  I would expect there to be interest, given all the other JSON
support that has been added thus far.

I noticed that, with the PoC patch, "json" is the only format that must be
quoted.  Without quotes, I see a syntax error.  I'm assuming there's a
conflict with another json-related rule somewhere in gram.y, but I haven't
tracked down exactly which one is causing it.

> 1. Is supporting JSON array format sufficient, or does it need to support
> some other options? How flexible does the support scheme need to be?

I don't presently have a strong opinion on this one.  My instinct would be
start with something simple, though.  I don't think we offer any special
options for log_destination...

> 2. This only supports COPY TO and we would undoubtedly want to support COPY
> FROM for JSON as well, but is that required from the start?

I would vote for including COPY FROM support from the start.

> ! 	if (!cstate->opts.json_mode)

I think it's unfortunate that this further complicates the branching in
CopyOneRowTo(), but after some quick glances at the code, I'm not sure it's
worth refactoring a bunch of stuff to make this nicer.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com






^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-01 23:09               ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
@ 2023-12-02 03:00                 ` Davin Shearer <[email protected]>
  2023-12-02 03:10                   ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  1 sibling, 1 reply; 35+ messages in thread

From: Davin Shearer @ 2023-12-02 03:00 UTC (permalink / raw)
  To: pgsql-hackers; +Cc: Joe Conway <[email protected]>; Nathan Bossart <[email protected]>

I'm really glad to see this taken up as a possible new feature and will
definitely use it if it gets released.  I'm impressed with how clean,
understandable, and approachable the postgres codebase is in general and
how easy it is to read and understand this patch.

I reviewed the patch (though I didn't build and test the code) and have a
concern with adding the '[' at the beginning and ']' at the end of the json
output.  Those are already added by `json_agg` (
https://www.postgresql.org/docs/current/functions-aggregate.html) as you
can see in my initial email.  Adding them in the COPY TO may be redundant
(e.g., [[{"key":"value"...}....]]).

I think COPY TO makes good sense to support, though COPY FROM maybe not so
much as JSON isn't necessarily flat and rectangular like CSV.

For my use-case, I'm emitting JSON files to Apache NiFi for processing, and
NiFi has superior handling of JSON (via JOLT parsers) versus CSV where
parsing is generally done with regex.  I want to be able to emit JSON using
a postgres function and thus COPY TO.

Definitely +1 for COPY TO.

I don't think COPY FROM will work out well unless the JSON is required to
be flat and rectangular.  I would vote -1 to leave it out due to the
necessary restrictions making it not generally useful.

Hope it helps,
Davin

On Fri, Dec 1, 2023 at 6:10 PM Nathan Bossart <[email protected]>
wrote:

> On Fri, Dec 01, 2023 at 02:28:55PM -0500, Joe Conway wrote:
> > I did a quick PoC patch (attached) -- if there interest and no hard
> > objections I would like to get it up to speed for the January commitfest.
>
> Cool.  I would expect there to be interest, given all the other JSON
> support that has been added thus far.
>
> I noticed that, with the PoC patch, "json" is the only format that must be
> quoted.  Without quotes, I see a syntax error.  I'm assuming there's a
> conflict with another json-related rule somewhere in gram.y, but I haven't
> tracked down exactly which one is causing it.
>
> > 1. Is supporting JSON array format sufficient, or does it need to support
> > some other options? How flexible does the support scheme need to be?
>
> I don't presently have a strong opinion on this one.  My instinct would be
> start with something simple, though.  I don't think we offer any special
> options for log_destination...
>
> > 2. This only supports COPY TO and we would undoubtedly want to support
> COPY
> > FROM for JSON as well, but is that required from the start?
>
> I would vote for including COPY FROM support from the start.
>
> > !     if (!cstate->opts.json_mode)
>
> I think it's unfortunate that this further complicates the branching in
> CopyOneRowTo(), but after some quick glances at the code, I'm not sure it's
> worth refactoring a bunch of stuff to make this nicer.
>
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com
>


^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-01 23:09               ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
  2023-12-02 03:00                 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
@ 2023-12-02 03:10                   ` Joe Conway <[email protected]>
  0 siblings, 0 replies; 35+ messages in thread

From: Joe Conway @ 2023-12-02 03:10 UTC (permalink / raw)
  To: Davin Shearer <[email protected]>; pgsql-hackers; +Cc: Nathan Bossart <[email protected]>

On 12/1/23 22:00, Davin Shearer wrote:
> I'm really glad to see this taken up as a possible new feature and will 
> definitely use it if it gets released.  I'm impressed with how clean, 
> understandable, and approachable the postgres codebase is in general and 
> how easy it is to read and understand this patch.
> 
> I reviewed the patch (though I didn't build and test the code) and have 
> a concern with adding the '[' at the beginning and ']' at the end of the 
> json output.  Those are already added by `json_agg` 
> (https://www.postgresql.org/docs/current/functions-aggregate.html 
> <https://www.postgresql.org/docs/current/functions-aggregate.html;) as 
> you can see in my initial email.  Adding them in the COPY TO may be 
> redundant (e.g., [[{"key":"value"...}....]]).

With this patch in place you don't use json_agg() at all. See the 
example output (this is real output with the patch applied):

(oops -- I meant to send this with the same email as the patch)

8<-------------------------------------------------
create table foo(id int8, f1 text, f2 timestamptz);
insert into foo
   select g.i,
          'line: ' || g.i::text,
          clock_timestamp()
   from generate_series(1,4) as g(i);

copy foo to stdout (format 'json');
[
  {"id":1,"f1":"line: 1","f2":"2023-12-01T12:58:16.776863-05:00"}
,{"id":2,"f1":"line: 2","f2":"2023-12-01T12:58:16.777084-05:00"}
,{"id":3,"f1":"line: 3","f2":"2023-12-01T12:58:16.777096-05:00"}
,{"id":4,"f1":"line: 4","f2":"2023-12-01T12:58:16.777103-05:00"}
]
8<-------------------------------------------------


> I think COPY TO makes good sense to support, though COPY FROM maybe not 
> so much as JSON isn't necessarily flat and rectangular like CSV.

Yeah -- definitely not as straight forward but possibly we just support 
the array-of-jsonobj-rows as input as well?

> For my use-case, I'm emitting JSON files to Apache NiFi for processing, 
> and NiFi has superior handling of JSON (via JOLT parsers) versus CSV 
> where parsing is generally done with regex.  I want to be able to emit 
> JSON using a postgres function and thus COPY TO.
> 
> Definitely +1 for COPY TO.
> 
> I don't think COPY FROM will work out well unless the JSON is required 
> to be flat and rectangular.  I would vote -1 to leave it out due to the 
> necessary restrictions making it not generally useful.


-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com







^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-01 23:09               ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
@ 2023-12-02 14:31                 ` Joe Conway <[email protected]>
  2023-12-02 15:11                   ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  1 sibling, 1 reply; 35+ messages in thread

From: Joe Conway @ 2023-12-02 14:31 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; +Cc: Davin Shearer <[email protected]>; pgsql-hackers

On 12/1/23 18:09, Nathan Bossart wrote:
> On Fri, Dec 01, 2023 at 02:28:55PM -0500, Joe Conway wrote:
>> I did a quick PoC patch (attached) -- if there interest and no hard
>> objections I would like to get it up to speed for the January commitfest.
> 
> Cool.  I would expect there to be interest, given all the other JSON
> support that has been added thus far.

Thanks for the review

> I noticed that, with the PoC patch, "json" is the only format that must be
> quoted.  Without quotes, I see a syntax error.  I'm assuming there's a
> conflict with another json-related rule somewhere in gram.y, but I haven't
> tracked down exactly which one is causing it.

It seems to be because 'json' is also a type name ($$ = 
SystemTypeName("json")).

What do you think about using 'json_array' instead? It is more specific 
and accurate, and avoids the need to quote.

test=# copy foo to stdout (format json_array);
[
  {"id":1,"f1":"line: 1","f2":"2023-12-01T12:58:16.776863-05:00"}
,{"id":2,"f1":"line: 2","f2":"2023-12-01T12:58:16.777084-05:00"}
,{"id":3,"f1":"line: 3","f2":"2023-12-01T12:58:16.777096-05:00"}
,{"id":4,"f1":"line: 4","f2":"2023-12-01T12:58:16.777103-05:00"}
]

>> 1. Is supporting JSON array format sufficient, or does it need to support
>> some other options? How flexible does the support scheme need to be?
> 
> I don't presently have a strong opinion on this one.  My instinct would be
> start with something simple, though.  I don't think we offer any special
> options for log_destination...

WFM

>> 2. This only supports COPY TO and we would undoubtedly want to support COPY
>> FROM for JSON as well, but is that required from the start?
> 
> I would vote for including COPY FROM support from the start.

Check. My thought is to only accept the same format we emit -- i.e. only 
take a json array.

>> ! 	if (!cstate->opts.json_mode)
> 
> I think it's unfortunate that this further complicates the branching in
> CopyOneRowTo(), but after some quick glances at the code, I'm not sure it's
> worth refactoring a bunch of stuff to make this nicer.

Yeah that was my conclusion.

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com







^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-01 23:09               ` Re: Emitting JSON to file using COPY TO Nathan Bossart <[email protected]>
  2023-12-02 14:31                 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-02 15:11                   ` Tom Lane <[email protected]>
  0 siblings, 0 replies; 35+ messages in thread

From: Tom Lane @ 2023-12-02 15:11 UTC (permalink / raw)
  To: Joe Conway <[email protected]>; +Cc: Nathan Bossart <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers

Joe Conway <[email protected]> writes:
>> I noticed that, with the PoC patch, "json" is the only format that must be
>> quoted.  Without quotes, I see a syntax error.  I'm assuming there's a
>> conflict with another json-related rule somewhere in gram.y, but I haven't
>> tracked down exactly which one is causing it.

While I've not looked too closely, I suspect this might be due to the
FORMAT_LA hack in base_yylex:

            /* Replace FORMAT by FORMAT_LA if it's followed by JSON */
            switch (next_token)
            {
                case JSON:
                    cur_token = FORMAT_LA;
                    break;
            }

So if you are writing a production that might need to match
FORMAT followed by JSON, you need to match FORMAT_LA too.

(I spent a little bit of time last week trying to get rid of
FORMAT_LA, thinking that it didn't look necessary.  Did not
succeed yet.)

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-02 18:50               ` Maciek Sakrejda <[email protected]>
  2023-12-02 22:43                 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2 siblings, 1 reply; 35+ messages in thread

From: Maciek Sakrejda @ 2023-12-02 18:50 UTC (permalink / raw)
  To: Joe Conway <[email protected]>; +Cc: Davin Shearer <[email protected]>; pgsql-hackers

On Fri, Dec 1, 2023 at 11:32 AM Joe Conway <[email protected]> wrote:
> 1. Is supporting JSON array format sufficient, or does it need to
> support some other options? How flexible does the support scheme need to be?

"JSON Lines" is a semi-standard format [1] that's basically just
newline-separated JSON values. (In fact, this is what
log_destination=jsonlog gives you for Postgres logs, no?) It might be
worthwhile to support that, too.

[1]: https://jsonlines.org/






^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-02 18:50               ` Re: Emitting JSON to file using COPY TO Maciek Sakrejda <[email protected]>
@ 2023-12-02 22:43                 ` Joe Conway <[email protected]>
  2023-12-03 13:46                   ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
  0 siblings, 1 reply; 35+ messages in thread

From: Joe Conway @ 2023-12-02 22:43 UTC (permalink / raw)
  To: Maciek Sakrejda <[email protected]>; +Cc: Davin Shearer <[email protected]>; pgsql-hackers

On 12/2/23 13:50, Maciek Sakrejda wrote:
> On Fri, Dec 1, 2023 at 11:32 AM Joe Conway <[email protected]> wrote:
>> 1. Is supporting JSON array format sufficient, or does it need to
>> support some other options? How flexible does the support scheme need to be?
> 
> "JSON Lines" is a semi-standard format [1] that's basically just
> newline-separated JSON values. (In fact, this is what
> log_destination=jsonlog gives you for Postgres logs, no?) It might be
> worthwhile to support that, too.
> 
> [1]: https://jsonlines.org/


Yes, I have seen examples of that associated with other databases (MSSQL 
and Duckdb at least) as well. It probably makes sense to support that 
format too.

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com







^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-02 18:50               ` Re: Emitting JSON to file using COPY TO Maciek Sakrejda <[email protected]>
  2023-12-02 22:43                 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-03 13:46                   ` Andrew Dunstan <[email protected]>
  0 siblings, 0 replies; 35+ messages in thread

From: Andrew Dunstan @ 2023-12-03 13:46 UTC (permalink / raw)
  To: Joe Conway <[email protected]>; Maciek Sakrejda <[email protected]>; +Cc: Davin Shearer <[email protected]>; pgsql-hackers


On 2023-12-02 Sa 17:43, Joe Conway wrote:
> On 12/2/23 13:50, Maciek Sakrejda wrote:
>> On Fri, Dec 1, 2023 at 11:32 AM Joe Conway <[email protected]> wrote:
>>> 1. Is supporting JSON array format sufficient, or does it need to
>>> support some other options? How flexible does the support scheme 
>>> need to be?
>>
>> "JSON Lines" is a semi-standard format [1] that's basically just
>> newline-separated JSON values. (In fact, this is what
>> log_destination=jsonlog gives you for Postgres logs, no?) It might be
>> worthwhile to support that, too.
>>
>> [1]: https://jsonlines.org/
>
>
> Yes, I have seen examples of that associated with other databases 
> (MSSQL and Duckdb at least) as well. It probably makes sense to 
> support that format too.


You can do that today, e.g.


copy (select to_json(q) from table_or_query q) to stdout


You can also do it as a single document as proposed here, like this:


copy (select json_agg(q) from table_or_query q) to stdout


The only downside to that is that it has to construct the aggregate, 
which could be ugly for large datasets, and that's why I'm not opposed 
to this patch.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com







^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-03 15:10               ` Andrew Dunstan <[email protected]>
  2023-12-03 15:31                 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-03 16:03                 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2 siblings, 2 replies; 35+ messages in thread

From: Andrew Dunstan @ 2023-12-03 15:10 UTC (permalink / raw)
  To: Joe Conway <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers


On 2023-12-01 Fr 14:28, Joe Conway wrote:
> On 11/29/23 10:32, Davin Shearer wrote:
>> Thanks for the responses everyone.
>>
>> I worked around the issue using the `psql -tc` method as Filip 
>> described.
>>
>> I think it would be great to support writing JSON using COPY TO at 
>> some point so I can emit JSON to files using a PostgreSQL function 
>> directly.
>>
>> -Davin
>>
>> On Tue, Nov 28, 2023 at 2:36 AM Filip Sedlák <[email protected] 
>> <mailto:[email protected]>> wrote:
>>
>>     This would be a very special case for COPY. It applies only to a 
>> single
>>     column of JSON values. The original problem can be solved with psql
>>     --tuples-only as David wrote earlier.
>>
>>
>>     $ psql -tc 'select json_agg(row_to_json(t))
>>                    from (select * from public.tbl_json_test) t;'
>>
>>        [{"id":1,"t_test":"here's a \"string\""}]
>>
>>
>>     Special-casing any encoding/escaping scheme leads to bugs and harder
>>     parsing.
>
> (moved to hackers)
>
> I did a quick PoC patch (attached) -- if there interest and no hard 
> objections I would like to get it up to speed for the January commitfest.
>
> Currently the patch lacks documentation and regression test support.
>
> Questions:
> ----------
> 1. Is supporting JSON array format sufficient, or does it need to 
> support some other options? How flexible does the support scheme need 
> to be?
>
> 2. This only supports COPY TO and we would undoubtedly want to support 
> COPY FROM for JSON as well, but is that required from the start?
>
> Thanks for any feedback.


I  realize this is just a POC, but I'd prefer to see composite_to_json() 
not exposed. You could use the already public datum_to_json() instead, 
passing JSONTYPE_COMPOSITE and F_RECORD_OUT as the second and third 
arguments.

I think JSON array format is sufficient.

I can see both sides of the COPY FROM argument, but I think insisting on 
that makes this less doable for release 17. On balance I would stick to 
COPY TO for now.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com







^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 15:10               ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
@ 2023-12-03 15:31                 ` Davin Shearer <[email protected]>
  2023-12-03 15:51                   ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  1 sibling, 1 reply; 35+ messages in thread

From: Davin Shearer @ 2023-12-03 15:31 UTC (permalink / raw)
  To: pgsql-hackers; Joe Conway <[email protected]>; +Cc: Andrew Dunstan <[email protected]>

Please be sure to include single and double quotes in the test values since
that was the original problem (double quoting in COPY TO breaking the JSON
syntax).

On Sun, Dec 3, 2023, 10:11 Andrew Dunstan <[email protected]> wrote:

>
> On 2023-12-01 Fr 14:28, Joe Conway wrote:
> > On 11/29/23 10:32, Davin Shearer wrote:
> >> Thanks for the responses everyone.
> >>
> >> I worked around the issue using the `psql -tc` method as Filip
> >> described.
> >>
> >> I think it would be great to support writing JSON using COPY TO at
> >> some point so I can emit JSON to files using a PostgreSQL function
> >> directly.
> >>
> >> -Davin
> >>
> >> On Tue, Nov 28, 2023 at 2:36 AM Filip Sedlák <[email protected]
> >> <mailto:[email protected]>> wrote:
> >>
> >>     This would be a very special case for COPY. It applies only to a
> >> single
> >>     column of JSON values. The original problem can be solved with psql
> >>     --tuples-only as David wrote earlier.
> >>
> >>
> >>     $ psql -tc 'select json_agg(row_to_json(t))
> >>                    from (select * from public.tbl_json_test) t;'
> >>
> >>        [{"id":1,"t_test":"here's a \"string\""}]
> >>
> >>
> >>     Special-casing any encoding/escaping scheme leads to bugs and harder
> >>     parsing.
> >
> > (moved to hackers)
> >
> > I did a quick PoC patch (attached) -- if there interest and no hard
> > objections I would like to get it up to speed for the January commitfest.
> >
> > Currently the patch lacks documentation and regression test support.
> >
> > Questions:
> > ----------
> > 1. Is supporting JSON array format sufficient, or does it need to
> > support some other options? How flexible does the support scheme need
> > to be?
> >
> > 2. This only supports COPY TO and we would undoubtedly want to support
> > COPY FROM for JSON as well, but is that required from the start?
> >
> > Thanks for any feedback.
>
>
> I  realize this is just a POC, but I'd prefer to see composite_to_json()
> not exposed. You could use the already public datum_to_json() instead,
> passing JSONTYPE_COMPOSITE and F_RECORD_OUT as the second and third
> arguments.
>
> I think JSON array format is sufficient.
>
> I can see both sides of the COPY FROM argument, but I think insisting on
> that makes this less doable for release 17. On balance I would stick to
> COPY TO for now.
>
>
> cheers
>
>
> andrew
>
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>


^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 15:10               ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
  2023-12-03 15:31                 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
@ 2023-12-03 15:51                   ` Joe Conway <[email protected]>
  2023-12-03 22:38                     ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  0 siblings, 1 reply; 35+ messages in thread

From: Joe Conway @ 2023-12-03 15:51 UTC (permalink / raw)
  To: Davin Shearer <[email protected]>; pgsql-hackers; +Cc: Andrew Dunstan <[email protected]>

On 12/3/23 10:31, Davin Shearer wrote:
> Please be sure to include single and double quotes in the test values 
> since that was the original problem (double quoting in COPY TO breaking 
> the JSON syntax).

test=# copy (select * from foo limit 4) to stdout (format json);
{"id":2456092,"f1":"line with ' in it: 
2456092","f2":"2023-12-03T10:44:40.9712-05:00"}
{"id":2456093,"f1":"line with \\" in it: 
2456093","f2":"2023-12-03T10:44:40.971221-05:00"}
{"id":2456094,"f1":"line with ' in it: 
2456094","f2":"2023-12-03T10:44:40.971225-05:00"}
{"id":2456095,"f1":"line with \\" in it: 
2456095","f2":"2023-12-03T10:44:40.971228-05:00"}

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com







^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 15:10               ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
  2023-12-03 15:31                 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-03 15:51                   ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-03 22:38                     ` Davin Shearer <[email protected]>
  2023-12-04 01:14                       ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  0 siblings, 1 reply; 35+ messages in thread

From: Davin Shearer @ 2023-12-03 22:38 UTC (permalink / raw)
  To: Joe Conway <[email protected]>; pgsql-hackers; Andrew Dunstan <[email protected]>

" being quoted as \\" breaks the JSON. It needs to be \".  This has been my
whole problem with COPY TO for JSON.

Please validate that the output is in proper format with correct quoting
for special characters. I use `jq` on the command line to validate and
format the output.

On Sun, Dec 3, 2023, 10:51 Joe Conway <[email protected]> wrote:

> On 12/3/23 10:31, Davin Shearer wrote:
> > Please be sure to include single and double quotes in the test values
> > since that was the original problem (double quoting in COPY TO breaking
> > the JSON syntax).
>
> test=# copy (select * from foo limit 4) to stdout (format json);
> {"id":2456092,"f1":"line with ' in it:
> 2456092","f2":"2023-12-03T10:44:40.9712-05:00"}
> {"id":2456093,"f1":"line with \\" in it:
> 2456093","f2":"2023-12-03T10:44:40.971221-05:00"}
> {"id":2456094,"f1":"line with ' in it:
> 2456094","f2":"2023-12-03T10:44:40.971225-05:00"}
> {"id":2456095,"f1":"line with \\" in it:
> 2456095","f2":"2023-12-03T10:44:40.971228-05:00"}
>
> --
> Joe Conway
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>
>


^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 15:10               ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
  2023-12-03 15:31                 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-03 15:51                   ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 22:38                     ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
@ 2023-12-04 01:14                       ` Joe Conway <[email protected]>
  2023-12-04 01:27                         ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-04 12:41                         ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
  0 siblings, 2 replies; 35+ messages in thread

From: Joe Conway @ 2023-12-04 01:14 UTC (permalink / raw)
  To: Davin Shearer <[email protected]>; pgsql-hackers; Andrew Dunstan <[email protected]>

(please don't top quote on the Postgres lists)

On 12/3/23 17:38, Davin Shearer wrote:
> " being quoted as \\" breaks the JSON. It needs to be \".  This has been 
> my whole problem with COPY TO for JSON.
> 
> Please validate that the output is in proper format with correct quoting 
> for special characters. I use `jq` on the command line to validate and 
> format the output.

I just hooked existing "row-to-json machinery" up to the "COPY TO" 
statement. If the output is wrong (just for for this use case?), that 
would be a missing feature (or possibly a bug?).

Davin -- how did you work around the issue with the way the built in 
functions output JSON?

Andrew -- comments/thoughts?

Joe


> On Sun, Dec 3, 2023, 10:51 Joe Conway <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>     On 12/3/23 10:31, Davin Shearer wrote:
>      > Please be sure to include single and double quotes in the test
>     values
>      > since that was the original problem (double quoting in COPY TO
>     breaking
>      > the JSON syntax).
> 
>     test=# copy (select * from foo limit 4) to stdout (format json);
>     {"id":2456092,"f1":"line with ' in it:
>     2456092","f2":"2023-12-03T10:44:40.9712-05:00"}
>     {"id":2456093,"f1":"line with \\" in it:
>     2456093","f2":"2023-12-03T10:44:40.971221-05:00"}
>     {"id":2456094,"f1":"line with ' in it:
>     2456094","f2":"2023-12-03T10:44:40.971225-05:00"}
>     {"id":2456095,"f1":"line with \\" in it:
>     2456095","f2":"2023-12-03T10:44:40.971228-05:00"}
> 
>     -- 
>     Joe Conway
>     PostgreSQL Contributors Team
>     RDS Open Source Databases
>     Amazon Web Services: https://aws.amazon.com <https://aws.amazon.com;
> 

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com







^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 15:10               ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
  2023-12-03 15:31                 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-03 15:51                   ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 22:38                     ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-04 01:14                       ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-04 01:27                         ` Davin Shearer <[email protected]>
  1 sibling, 0 replies; 35+ messages in thread

From: Davin Shearer @ 2023-12-04 01:27 UTC (permalink / raw)
  To: pgsql-hackers; +Cc: Joe Conway <[email protected]>

I worked around it by using select json_agg(t)... and redirecting it to
file via psql on the command line. COPY TO was working until we ran into
broken JSON and discovered the double quoting issue due to some values
containing " in them.


^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 15:10               ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
  2023-12-03 15:31                 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-03 15:51                   ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 22:38                     ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-04 01:14                       ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-04 12:41                         ` Andrew Dunstan <[email protected]>
  2023-12-04 13:37                           ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  1 sibling, 1 reply; 35+ messages in thread

From: Andrew Dunstan @ 2023-12-04 12:41 UTC (permalink / raw)
  To: Joe Conway <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers


On 2023-12-03 Su 20:14, Joe Conway wrote:
> (please don't top quote on the Postgres lists)
>
> On 12/3/23 17:38, Davin Shearer wrote:
>> " being quoted as \\" breaks the JSON. It needs to be \".  This has 
>> been my whole problem with COPY TO for JSON.
>>
>> Please validate that the output is in proper format with correct 
>> quoting for special characters. I use `jq` on the command line to 
>> validate and format the output.
>
> I just hooked existing "row-to-json machinery" up to the "COPY TO" 
> statement. If the output is wrong (just for for this use case?), that 
> would be a missing feature (or possibly a bug?).
>
> Davin -- how did you work around the issue with the way the built in 
> functions output JSON?
>
> Andrew -- comments/thoughts?
>
>

I meant to mention this when I was making comments yesterday.

The patch should not be using CopyAttributeOutText - it will try to 
escape characters such as \, which produces the effect complained of 
here, or else we need to change its setup so we have a way to inhibit 
that escaping.


cheers


andrew



>
>
--
Andrew Dunstan
EDB: https://www.enterprisedb.com







^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 15:10               ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
  2023-12-03 15:31                 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-03 15:51                   ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 22:38                     ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-04 01:14                       ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-04 12:41                         ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
@ 2023-12-04 13:37                           ` Joe Conway <[email protected]>
  2023-12-04 14:25                             ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
  0 siblings, 1 reply; 35+ messages in thread

From: Joe Conway @ 2023-12-04 13:37 UTC (permalink / raw)
  To: Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers

On 12/4/23 07:41, Andrew Dunstan wrote:
> 
> On 2023-12-03 Su 20:14, Joe Conway wrote:
>> (please don't top quote on the Postgres lists)
>>
>> On 12/3/23 17:38, Davin Shearer wrote:
>>> " being quoted as \\" breaks the JSON. It needs to be \".  This has 
>>> been my whole problem with COPY TO for JSON.
>>>
>>> Please validate that the output is in proper format with correct 
>>> quoting for special characters. I use `jq` on the command line to 
>>> validate and format the output.
>>
>> I just hooked existing "row-to-json machinery" up to the "COPY TO" 
>> statement. If the output is wrong (just for for this use case?), that 
>> would be a missing feature (or possibly a bug?).
>>
>> Davin -- how did you work around the issue with the way the built in 
>> functions output JSON?
>>
>> Andrew -- comments/thoughts?
> 
> I meant to mention this when I was making comments yesterday.
> 
> The patch should not be using CopyAttributeOutText - it will try to
> escape characters such as \, which produces the effect complained of
> here, or else we need to change its setup so we have a way to inhibit
> that escaping.


Interesting.

I am surprised this has never been raised as a problem with COPY TO before.

Should the JSON output, as produced by composite_to_json(), be sent 
as-is with no escaping at all? If yes, is JSON somehow unique in this 
regard?

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com







^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 15:10               ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
  2023-12-03 15:31                 ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-03 15:51                   ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 22:38                     ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-04 01:14                       ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-04 12:41                         ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
  2023-12-04 13:37                           ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-04 14:25                             ` Andrew Dunstan <[email protected]>
  0 siblings, 0 replies; 35+ messages in thread

From: Andrew Dunstan @ 2023-12-04 14:25 UTC (permalink / raw)
  To: Joe Conway <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers


On 2023-12-04 Mo 08:37, Joe Conway wrote:
> On 12/4/23 07:41, Andrew Dunstan wrote:
>>
>> On 2023-12-03 Su 20:14, Joe Conway wrote:
>>> (please don't top quote on the Postgres lists)
>>>
>>> On 12/3/23 17:38, Davin Shearer wrote:
>>>> " being quoted as \\" breaks the JSON. It needs to be \".  This has 
>>>> been my whole problem with COPY TO for JSON.
>>>>
>>>> Please validate that the output is in proper format with correct 
>>>> quoting for special characters. I use `jq` on the command line to 
>>>> validate and format the output.
>>>
>>> I just hooked existing "row-to-json machinery" up to the "COPY TO" 
>>> statement. If the output is wrong (just for for this use case?), 
>>> that would be a missing feature (or possibly a bug?).
>>>
>>> Davin -- how did you work around the issue with the way the built in 
>>> functions output JSON?
>>>
>>> Andrew -- comments/thoughts?
>>
>> I meant to mention this when I was making comments yesterday.
>>
>> The patch should not be using CopyAttributeOutText - it will try to
>> escape characters such as \, which produces the effect complained of
>> here, or else we need to change its setup so we have a way to inhibit
>> that escaping.
>
>
> Interesting.
>
> I am surprised this has never been raised as a problem with COPY TO 
> before.
>
> Should the JSON output, as produced by composite_to_json(), be sent 
> as-is with no escaping at all? If yes, is JSON somehow unique in this 
> regard?


Text mode output is in such a form that it can be read back in using 
text mode input. There's nothing special about JSON in this respect - 
any text field will be escaped too. But output suitable for text mode 
input is not what you're trying to produce here; you're trying to 
produce valid JSON.

So, yes, the result of composite_to_json, which is already suitably 
escaped, should not be further escaped in this case.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com







^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 15:10               ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
@ 2023-12-03 16:03                 ` Joe Conway <[email protected]>
  2023-12-03 17:11                   ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 19:24                   ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  1 sibling, 2 replies; 35+ messages in thread

From: Joe Conway @ 2023-12-03 16:03 UTC (permalink / raw)
  To: Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers

On 12/3/23 10:10, Andrew Dunstan wrote:
> 
> On 2023-12-01 Fr 14:28, Joe Conway wrote:
>> On 11/29/23 10:32, Davin Shearer wrote:
>>> Thanks for the responses everyone.
>>>
>>> I worked around the issue using the `psql -tc` method as Filip 
>>> described.
>>>
>>> I think it would be great to support writing JSON using COPY TO at 
>>> some point so I can emit JSON to files using a PostgreSQL function 
>>> directly.
>>>
>>> -Davin
>>>
>>> On Tue, Nov 28, 2023 at 2:36 AM Filip Sedlák <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>>
>>>     This would be a very special case for COPY. It applies only to a 
>>> single
>>>     column of JSON values. The original problem can be solved with psql
>>>     --tuples-only as David wrote earlier.
>>>
>>>
>>>     $ psql -tc 'select json_agg(row_to_json(t))
>>>                    from (select * from public.tbl_json_test) t;'
>>>
>>>        [{"id":1,"t_test":"here's a \"string\""}]
>>>
>>>
>>>     Special-casing any encoding/escaping scheme leads to bugs and harder
>>>     parsing.
>>
>> (moved to hackers)
>>
>> I did a quick PoC patch (attached) -- if there interest and no hard 
>> objections I would like to get it up to speed for the January commitfest.
>>
>> Currently the patch lacks documentation and regression test support.
>>
>> Questions:
>> ----------
>> 1. Is supporting JSON array format sufficient, or does it need to 
>> support some other options? How flexible does the support scheme need 
>> to be?
>>
>> 2. This only supports COPY TO and we would undoubtedly want to support 
>> COPY FROM for JSON as well, but is that required from the start?
>>
>> Thanks for any feedback.
> 
> I  realize this is just a POC, but I'd prefer to see composite_to_json()
> not exposed. You could use the already public datum_to_json() instead,
> passing JSONTYPE_COMPOSITE and F_RECORD_OUT as the second and third
> arguments.

Ok, thanks, will do

> I think JSON array format is sufficient.

The other formats make sense from a completeness standpoint (versus 
other databases) and the latest patch already includes them, so I still 
lean toward supporting all three formats.

> I can see both sides of the COPY FROM argument, but I think insisting on
> that makes this less doable for release 17. On balance I would stick to
> COPY TO for now.

WFM.

 From your earlier post, regarding constructing the aggregate -- not 
extensive testing but one data point:
8<--------------------------
test=# copy foo to '/tmp/buf' (format json, force_array);
COPY 10000000
Time: 36353.153 ms (00:36.353)
test=# copy (select json_agg(foo) from foo) to '/tmp/buf';
COPY 1
Time: 46835.238 ms (00:46.835)
8<--------------------------


-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com







^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 15:10               ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
  2023-12-03 16:03                 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-03 17:11                   ` Joe Conway <[email protected]>
  2023-12-03 19:47                     ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
  1 sibling, 1 reply; 35+ messages in thread

From: Joe Conway @ 2023-12-03 17:11 UTC (permalink / raw)
  To: Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers

On 12/3/23 11:03, Joe Conway wrote:
>   From your earlier post, regarding constructing the aggregate -- not
> extensive testing but one data point:
> 8<--------------------------
> test=# copy foo to '/tmp/buf' (format json, force_array);
> COPY 10000000
> Time: 36353.153 ms (00:36.353)
> test=# copy (select json_agg(foo) from foo) to '/tmp/buf';
> COPY 1
> Time: 46835.238 ms (00:46.835)
> 8<--------------------------

Also if the table is large enough, the aggregate method is not even 
feasible whereas the COPY TO method works:
8<--------------------------
test=# select count(*) from foo;
   count
----------
  20000000
(1 row)

test=# copy (select json_agg(foo) from foo) to '/tmp/buf';
ERROR:  out of memory
DETAIL:  Cannot enlarge string buffer containing 1073741822 bytes by 1 
more bytes.

test=# copy foo to '/tmp/buf' (format json, force_array);
COPY 20000000
8<--------------------------

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com







^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 15:10               ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
  2023-12-03 16:03                 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 17:11                   ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-03 19:47                     ` Andrew Dunstan <[email protected]>
  0 siblings, 0 replies; 35+ messages in thread

From: Andrew Dunstan @ 2023-12-03 19:47 UTC (permalink / raw)
  To: Joe Conway <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers


On 2023-12-03 Su 12:11, Joe Conway wrote:
> On 12/3/23 11:03, Joe Conway wrote:
>>   From your earlier post, regarding constructing the aggregate -- not
>> extensive testing but one data point:
>> 8<--------------------------
>> test=# copy foo to '/tmp/buf' (format json, force_array);
>> COPY 10000000
>> Time: 36353.153 ms (00:36.353)
>> test=# copy (select json_agg(foo) from foo) to '/tmp/buf';
>> COPY 1
>> Time: 46835.238 ms (00:46.835)
>> 8<--------------------------
>
> Also if the table is large enough, the aggregate method is not even 
> feasible whereas the COPY TO method works:
> 8<--------------------------
> test=# select count(*) from foo;
>   count
> ----------
>  20000000
> (1 row)
>
> test=# copy (select json_agg(foo) from foo) to '/tmp/buf';
> ERROR:  out of memory
> DETAIL:  Cannot enlarge string buffer containing 1073741822 bytes by 1 
> more bytes.
>
> test=# copy foo to '/tmp/buf' (format json, force_array);
> COPY 20000000
> 8<--------------------------


None of this is surprising. As I mentioned, limitations with json_agg() 
are why I support the idea of this patch.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com







^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 15:10               ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
  2023-12-03 16:03                 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-03 19:24                   ` Joe Conway <[email protected]>
  2023-12-03 19:52                     ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
  1 sibling, 1 reply; 35+ messages in thread

From: Joe Conway @ 2023-12-03 19:24 UTC (permalink / raw)
  To: Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers

On 12/3/23 11:03, Joe Conway wrote:
> On 12/3/23 10:10, Andrew Dunstan wrote:
>> I  realize this is just a POC, but I'd prefer to see composite_to_json()
>> not exposed. You could use the already public datum_to_json() instead,
>> passing JSONTYPE_COMPOSITE and F_RECORD_OUT as the second and third
>> arguments.
> 
> Ok, thanks, will do

Just FYI, this change does loose some performance in my not massively 
scientific A/B/A test:

8<---------------------------
-- with datum_to_json()
test=# \timing
Timing is on.
test=# copy foo to '/tmp/buf' (format json, force_array);
COPY 10000000
Time: 37196.898 ms (00:37.197)
Time: 37408.161 ms (00:37.408)
Time: 38393.309 ms (00:38.393)
Time: 36855.438 ms (00:36.855)
Time: 37806.280 ms (00:37.806)

Avg = 37532

-- original patch
test=# \timing
Timing is on.
test=# copy foo to '/tmp/buf' (format json, force_array);
COPY 10000000
Time: 37426.207 ms (00:37.426)
Time: 36068.187 ms (00:36.068)
Time: 38285.252 ms (00:38.285)
Time: 36971.042 ms (00:36.971)
Time: 35690.822 ms (00:35.691)

Avg = 36888

-- with datum_to_json()
test=# \timing
Timing is on.
test=# copy foo to '/tmp/buf' (format json, force_array);
COPY 10000000
Time: 39083.467 ms (00:39.083)
Time: 37249.326 ms (00:37.249)
Time: 38529.721 ms (00:38.530)
Time: 38704.920 ms (00:38.705)
Time: 39001.326 ms (00:39.001)

Avg = 38513
8<---------------------------

That is somewhere in the 3% range.

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com







^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 15:10               ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
  2023-12-03 16:03                 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 19:24                   ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
@ 2023-12-03 19:52                     ` Andrew Dunstan <[email protected]>
  2023-12-03 20:09                       ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  0 siblings, 1 reply; 35+ messages in thread

From: Andrew Dunstan @ 2023-12-03 19:52 UTC (permalink / raw)
  To: Joe Conway <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers


On 2023-12-03 Su 14:24, Joe Conway wrote:
> On 12/3/23 11:03, Joe Conway wrote:
>> On 12/3/23 10:10, Andrew Dunstan wrote:
>>> I  realize this is just a POC, but I'd prefer to see 
>>> composite_to_json()
>>> not exposed. You could use the already public datum_to_json() instead,
>>> passing JSONTYPE_COMPOSITE and F_RECORD_OUT as the second and third
>>> arguments.
>>
>> Ok, thanks, will do
>
> Just FYI, this change does loose some performance in my not massively 
> scientific A/B/A test:
>
> 8<---------------------------
> -- with datum_to_json()
> test=# \timing
> Timing is on.
> test=# copy foo to '/tmp/buf' (format json, force_array);
> COPY 10000000
> Time: 37196.898 ms (00:37.197)
> Time: 37408.161 ms (00:37.408)
> Time: 38393.309 ms (00:38.393)
> Time: 36855.438 ms (00:36.855)
> Time: 37806.280 ms (00:37.806)
>
> Avg = 37532
>
> -- original patch
> test=# \timing
> Timing is on.
> test=# copy foo to '/tmp/buf' (format json, force_array);
> COPY 10000000
> Time: 37426.207 ms (00:37.426)
> Time: 36068.187 ms (00:36.068)
> Time: 38285.252 ms (00:38.285)
> Time: 36971.042 ms (00:36.971)
> Time: 35690.822 ms (00:35.691)
>
> Avg = 36888
>
> -- with datum_to_json()
> test=# \timing
> Timing is on.
> test=# copy foo to '/tmp/buf' (format json, force_array);
> COPY 10000000
> Time: 39083.467 ms (00:39.083)
> Time: 37249.326 ms (00:37.249)
> Time: 38529.721 ms (00:38.530)
> Time: 38704.920 ms (00:38.705)
> Time: 39001.326 ms (00:39.001)
>
> Avg = 38513
> 8<---------------------------
>
> That is somewhere in the 3% range.


I assume it's because datum_to_json() constructs a text value from which 
you then need to extract the cstring, whereas composite_to_json(), just 
gives you back the stringinfo. I guess that's a good enough reason to go 
with exposing composite_to_json().


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com







^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 13:27   ` Re: Emitting JSON to file using COPY TO David G. Johnston <[email protected]>
  2023-11-27 14:56     ` Re: Emitting JSON to file using COPY TO Tom Lane <[email protected]>
  2023-11-27 15:26       ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-28 07:36         ` Re: Emitting JSON to file using COPY TO Filip Sedlák <[email protected]>
  2023-11-29 15:32           ` Re: Emitting JSON to file using COPY TO Davin Shearer <[email protected]>
  2023-12-01 19:28             ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 15:10               ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
  2023-12-03 16:03                 ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 19:24                   ` Re: Emitting JSON to file using COPY TO Joe Conway <[email protected]>
  2023-12-03 19:52                     ` Re: Emitting JSON to file using COPY TO Andrew Dunstan <[email protected]>
@ 2023-12-03 20:09                       ` Joe Conway <[email protected]>
  0 siblings, 0 replies; 35+ messages in thread

From: Joe Conway @ 2023-12-03 20:09 UTC (permalink / raw)
  To: Andrew Dunstan <[email protected]>; Davin Shearer <[email protected]>; pgsql-hackers

On 12/3/23 14:52, Andrew Dunstan wrote:
> 
> On 2023-12-03 Su 14:24, Joe Conway wrote:
>> On 12/3/23 11:03, Joe Conway wrote:
>>> On 12/3/23 10:10, Andrew Dunstan wrote:
>>>> I  realize this is just a POC, but I'd prefer to see 
>>>> composite_to_json()
>>>> not exposed. You could use the already public datum_to_json() instead,
>>>> passing JSONTYPE_COMPOSITE and F_RECORD_OUT as the second and third
>>>> arguments.
>>>
>>> Ok, thanks, will do
>>
>> Just FYI, this change does loose some performance in my not massively 
>> scientific A/B/A test:
>>
>> 8<---------------------------
<snip>
>> 8<---------------------------
>>
>> That is somewhere in the 3% range.
> 
> I assume it's because datum_to_json() constructs a text value from which
> you then need to extract the cstring, whereas composite_to_json(), just
> gives you back the stringinfo. I guess that's a good enough reason to go
> with exposing composite_to_json().

Yeah, that was why I went that route in the first place. If you are good 
with it I will go back to that. The code is a bit simpler too.

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com







^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
@ 2023-11-27 16:04   ` Adrian Klaver <[email protected]>
  2023-11-27 16:14     ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  1 sibling, 1 reply; 35+ messages in thread

From: Adrian Klaver @ 2023-11-27 16:04 UTC (permalink / raw)
  To: Dominique Devienne <[email protected]>; +Cc: Davin Shearer <[email protected]>; [email protected]

On 11/27/23 01:44, Dominique Devienne wrote:
> On Mon, Nov 27, 2023 at 10:33 AM Dominique Devienne <[email protected] 
> <mailto:[email protected]>> wrote:
> 

> On second thought, I guess that's COPY in its text modes doing the escaping?
> Interesting. The text-based modes of COPY are configurable. There's even 
> a JSON mode.

Where are you seeing the JSON mode for COPY? AFAIK there is only text 
and CSV formats.

> By miracle, would the JSON output mode recognize JSON[B] values, and 
> avoid the escaping?
> 
> 

-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Emitting JSON to file using COPY TO
  2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 09:44 ` Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
  2023-11-27 16:04   ` Re: Emitting JSON to file using COPY TO Adrian Klaver <[email protected]>
@ 2023-11-27 16:14     ` Dominique Devienne <[email protected]>
  0 siblings, 0 replies; 35+ messages in thread

From: Dominique Devienne @ 2023-11-27 16:14 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: Davin Shearer <[email protected]>; [email protected]

On Mon, Nov 27, 2023 at 5:04 PM Adrian Klaver <[email protected]>
wrote:

> On 11/27/23 01:44, Dominique Devienne wrote:
> > On Mon, Nov 27, 2023 at 10:33 AM Dominique Devienne <[email protected]
> > <mailto:[email protected]>> wrote:
> > On second thought, I guess that's COPY in its text modes doing the
> escaping?
> > Interesting. The text-based modes of COPY are configurable. There's even
> > a JSON mode.
>
> Where are you seeing the JSON mode for COPY? AFAIK there is only text
> and CSV formats.
>

Indeed. Somehow I thought there was...
I've used the TEXT and BINARY modes, and remembered a wishful thinking JSON
mode!
OK then, if there was, then what I wrote would apply :). --DD


^ permalink  raw  reply  [nested|flat] 35+ messages in thread


end of thread, other threads:[~2023-12-04 14:25 UTC | newest]

Thread overview: 35+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2023-11-27 09:33 Re: Emitting JSON to file using COPY TO Dominique Devienne <[email protected]>
2023-11-27 09:44 ` Dominique Devienne <[email protected]>
2023-11-27 13:27   ` David G. Johnston <[email protected]>
2023-11-27 13:56     ` Pavel Stehule <[email protected]>
2023-11-27 14:43       ` David G. Johnston <[email protected]>
2023-11-27 14:56     ` Tom Lane <[email protected]>
2023-11-27 15:26       ` Dominique Devienne <[email protected]>
2023-11-28 07:36         ` Filip Sedlák <[email protected]>
2023-11-29 15:32           ` Davin Shearer <[email protected]>
2023-12-01 19:28             ` Joe Conway <[email protected]>
2023-12-01 23:09               ` Nathan Bossart <[email protected]>
2023-12-02 03:00                 ` Davin Shearer <[email protected]>
2023-12-02 03:10                   ` Joe Conway <[email protected]>
2023-12-02 14:31                 ` Joe Conway <[email protected]>
2023-12-02 15:11                   ` Tom Lane <[email protected]>
2023-12-02 18:50               ` Maciek Sakrejda <[email protected]>
2023-12-02 22:43                 ` Joe Conway <[email protected]>
2023-12-03 13:46                   ` Andrew Dunstan <[email protected]>
2023-12-03 15:10               ` Andrew Dunstan <[email protected]>
2023-12-03 15:31                 ` Davin Shearer <[email protected]>
2023-12-03 15:51                   ` Joe Conway <[email protected]>
2023-12-03 22:38                     ` Davin Shearer <[email protected]>
2023-12-04 01:14                       ` Joe Conway <[email protected]>
2023-12-04 01:27                         ` Davin Shearer <[email protected]>
2023-12-04 12:41                         ` Andrew Dunstan <[email protected]>
2023-12-04 13:37                           ` Joe Conway <[email protected]>
2023-12-04 14:25                             ` Andrew Dunstan <[email protected]>
2023-12-03 16:03                 ` Joe Conway <[email protected]>
2023-12-03 17:11                   ` Joe Conway <[email protected]>
2023-12-03 19:47                     ` Andrew Dunstan <[email protected]>
2023-12-03 19:24                   ` Joe Conway <[email protected]>
2023-12-03 19:52                     ` Andrew Dunstan <[email protected]>
2023-12-03 20:09                       ` Joe Conway <[email protected]>
2023-11-27 16:04   ` Adrian Klaver <[email protected]>
2023-11-27 16:14     ` Dominique Devienne <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox