Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vyD5y-00HXJt-1Y for pgsql-hackers@arkaria.postgresql.org; Thu, 05 Mar 2026 18:06:46 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vyD5v-000byv-2B for pgsql-hackers@arkaria.postgresql.org; Thu, 05 Mar 2026 18:06:44 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vyD5u-000byk-38 for pgsql-hackers@lists.postgresql.org; Thu, 05 Mar 2026 18:06:43 +0000 Received: from mail-yw1-x1130.google.com ([2607:f8b0:4864:20::1130]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vyD5s-00000000ZV7-3vhX for pgsql-hackers@postgresql.org; Thu, 05 Mar 2026 18:06:42 +0000 Received: by mail-yw1-x1130.google.com with SMTP id 00721157ae682-79860421382so81746937b3.0 for ; Thu, 05 Mar 2026 10:06:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=joeconway.com; s=google; t=1772734000; x=1773338800; darn=postgresql.org; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=kvj6aUiL+CTSw2yaqwAZO4fYIH6ZJ6BvqJ2lTj+Z+T0=; b=PhCnHWCekaxhre81Hd3gP6PP0MHjIxiQHHZ1+D4go5nuBk976HNKWkB2kuYQwEFMun KTM9nvdl8q1tw9dmAYqCdHOQroKb4RoJjldt4T/q2m8MZzPx9XW1tgmYmOnHgowTJCyM i2VQWkL/eeS0KEsmy6ScY2osYZF14S5W++QHjfanH8mJMvHS4ZpCr5CoySOSMziGSq1h HKrW+z0Rsi/jr9DzDQt2nA5q0P3kqDRhGPnMF/faZBJ28Pphk2rduyZs7f4S43S2QpkP 6gjlvUWskJYRIu6EfJ7+wlKTNACTrkfaG8WAyqdjpC6OJhU4POqMQBOUqV4guGA7qN/Q hCtQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1772734000; x=1773338800; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :x-gm-gg:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=kvj6aUiL+CTSw2yaqwAZO4fYIH6ZJ6BvqJ2lTj+Z+T0=; b=KMH983BsrvAsba8IgHswdd5oncZ6ZRmHbk/XbvfIlnMML9oFpZPJxxNtPERr1bUOa2 Cu55OiY/Vu2InqDyVMK7BiwDvQvF7qWUH8WhC28TK/wgGRUgeU6sJT/G5ZXj2uh0tUbO Zk3sFCGh8CtFS9DDTvE+y7jx+tGE2eb4YTzldy9umqMQIE1GLx+MeOM2m9UzEDG7s3u2 TeiWgBhCAEnQgBZq3j2S35BmLXTj9GdwnniyBN+q2zEIAg92WI3IAq2/qsKqoHcbdV9P lnO6j06IzNEKMyz/ZZCHndxZip0vel7RItFp0CleXhLP/yN6zm8j2qgBn44+mjJYZt+V 799A== X-Forwarded-Encrypted: i=1; AJvYcCWAvO3rFKgu3IqqkZc2BhXUuoM3MsaL/LEcR/jYT8EIUxJBDAyWGuiHBkJ3prphokvrIXOIY5b1gXCgebaa@postgresql.org X-Gm-Message-State: AOJu0YwDoJS/kMdJCQJQGM9XWoZct0lPBCwCQv643RtpPc/ssoHck9bx dNPH2yW8KCE5fZAE4iXV7qiwGCiKVSdKQfbPrEV+anSXClEkXkfxQ0MraQS4Ns/r9hI= X-Gm-Gg: ATEYQzyJN9NbdUrVJLHBphdH3VPkHcDTH+GAmQNptjOhCA4ZbUU31gU83oXgN0ZRR51 dMvo0sEmavHZ3qJKZNbrzCn357LbT/xVDnLiO/zT4weme+8W+2YL+kJt6Gbr2hwitOZudCWKEpq RJu834gT0q9/VsqyMpBtF/0q316eNtLzTMSyrPQEA6E+Yx5RnDO+0WyeS+5uEcHrnWKb7hZGuoI Q33yBNQjhHfxqgmx+Ek6eovxzD+SAB8xn2Dr3IGsWB3g4xLeDN58NEMgwQfPU3jh68w+OHdIZkG /M99ZQVr+2nYsOamMuFFzhvIo+kWDQn4sL1W82zNSZIYaSPT8t5xvtvNKKYXq7SHv50uD1zBAqG jiyiYOUBBzgwUki2By5XJkj+YAvKZRL0rlOKqOMJzXyiCNBSMmApae3iAEqyAOpl9QGLNfaEcZb zxGfXCZ50WhOvWFBzC7jtxkXd3 X-Received: by 2002:a05:690e:2060:b0:64c:97ad:17dc with SMTP id 956f58d0204a3-64cf9bbdeb8mr3954327d50.66.1772733999915; Thu, 05 Mar 2026 10:06:39 -0800 (PST) Received: from [192.168.4.42] ([46.110.138.68]) by smtp.gmail.com with ESMTPSA id 956f58d0204a3-64cb763d7fbsm9205592d50.16.2026.03.05.10.06.37 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 05 Mar 2026 10:06:38 -0800 (PST) Message-ID: <6cbbf7f3-da4a-4201-98a6-4347a3c35db0@joeconway.com> Date: Thu, 5 Mar 2026 13:06:37 -0500 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: Emitting JSON to file using COPY TO To: Andrew Dunstan , jian he , Junwang Zhao Cc: Florents Tselai , "Andrey M. Borodin" , Dean Rasheed , Daniel Verite , Davin Shearer , PostgreSQL-development References: <74d3b4e2-d201-489d-9b9a-1a6d0eb492a6@dunslane.net> Content-Language: en-US From: Joe Conway In-Reply-To: <74d3b4e2-d201-489d-9b9a-1a6d0eb492a6@dunslane.net> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On 3/4/26 10:51, Andrew Dunstan wrote: > > On 2026-02-08 Su 10:48 PM, jian he wrote: >> On Sat, Feb 7, 2026 at 9:27 PM Junwang Zhao wrote: >>> Here are some comments on v23: >>> >>> 0001: The refactor looks straightforward to me. Introducing a format >>> field should make future extensions easier. One suggestion is that we >>> could add some helper macros around format, for example: >>> >>> #define IS_FORMAT_CSV(format) (format == COPY_FORMAT_CSV) >>> #define IS_FORMAT_TEXT_LIKE(format) \ >>> (format == COPY_FORMAT_TEXT || format == COPY_FORMAT_CSV) >>> >>> I think this would improve readability. >> Personally, I don't like marcos.... >> >>> 0002: Since you have moved the `CopyFormat enum` into 0001, the >>> following commit msg should be rephrased. >>> >>> The CopyFormat enum was originally contributed by Joel Jacobson >>> joel@compiler.org, later refactored by Jian He to address various issues, and >>> further adapted by Junwang Zhao to support the newly introduced CopyToRoutine >>> struct (commit 2e4127b6d2). >>> >>> - if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim) >>> - ereport(ERROR, >>> - (errcode(ERRCODE_SYNTAX_ERROR), >>> - /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */ >>> - errmsg("cannot specify %s in BINARY mode", "DELIMITER"))); >>> + if (opts_out->delim) >>> + { >>> + if (opts_out->format == COPY_FORMAT_BINARY) >>> + ereport(ERROR, >>> + errcode(ERRCODE_SYNTAX_ERROR), >>> + /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */ >>> + errmsg("cannot specify %s in BINARY mode", "DELIMITER")); >>> + else if (opts_out->format == COPY_FORMAT_JSON) >>> + ereport(ERROR, >>> + errcode(ERRCODE_SYNTAX_ERROR), >>> + errmsg("cannot specify %s in JSON mode", "DELIMITER")); >>> + } >>> >>> Can we add a function that converts CopyFormat to a string? Treating >>> CopyFormat as %s in error messages would make the code shorter. >>> However, I'm not sure whether this aligns with translation >>> conventions, correct me if I'm wrong. >>> >> I don’t think this is worth the added complexity. >> That said, I tried to simplify the code and changed it to: >> >> if (opts_out->delim && >> (opts_out->format == COPY_FORMAT_BINARY || >> opts_out->format == COPY_FORMAT_JSON)) >> ereport(ERROR, >> errcode(ERRCODE_SYNTAX_ERROR), >> opts_out->format == COPY_FORMAT_BINARY >> ? errmsg("cannot specify %s in BINARY mode", "DELIMITER") >> : errmsg("cannot specify %s in JSON mode", "DELIMITER")); >> >>> - * CSV and text formats share the same TextLike routines except for the >>> + * CSV and text, json formats share the same TextLike routines except for the >>> >>> I'd suggest rewording to `CSV, text and json ...`. The same applied to >>> other parts in this patch. >>> >> sure. >> >>> 0003: The commit message includes some changes(adapt the newly >>> introduced CopyToRoutine) that actually belong to 0002; it would be >>> better to remove them from this commit. >>> >> 0002 commit message: >> """ >> This introduces the JSON format option for the COPY TO command, allowing users >> to export query results or table data directly as a single JSON object or a >> stream of JSON objects. >> >> The JSON format is currently supported only for COPY TO operations; it >> is not available for COPY FROM. >> >> JSON format is incompatible with some standard text/CSV parsing or >> formatting options, >> including: >> - HEADER >> - DEFAULT >> - NULL >> - DELIMITER >> - FORCE QUOTE / FORCE NOT NULL >> >> Regression tests covering valid JSON exports and error handling for >> incompatible options have been added to src/test/regress/sql/copy.sql. >> """ >> >> 0003 commit message: >> """ >> Add option force_array for COPY JSON FORMAT >> This adds the force_array option, which is available exclusively >> when using COPY TO with the JSON format. >> >> When enabled, this option wraps the output in a top-level JSON array >> (enclosed in square brackets with comma-separated elements), making the >> entire result a valid single JSON value. Without this option, the default >> behavior is to output a stream of independent JSON objects. >> >> Attempting to use this option with COPY FROM or with formats other than >> JSON will raise an error. >> """ >> >>> + if (cstate->json_row_delim_needed && cstate->opts.force_array) >>> + CopySendChar(cstate, ','); >>> + else if (cstate->opts.force_array) >>> + { >>> + /* first row needs no delimiter */ >>> + CopySendChar(cstate, ' '); >>> + cstate->json_row_delim_needed = true; >>> + } >>> >>> can we do this: >>> >>> if (cstate->opts.force_array) >>> { >>> if (cstate->json_row_delim_needed) >>> CopySendChar(cstate, ','); >>> else >>> { >>> /* first row needs no delimiter */ >>> CopySendChar(cstate, ' '); >>> cstate->json_row_delim_needed = true; >>> } >>> } >>> >> good suggestion. >> >> If more people think WRAP_ARRAY is better than FORCE_ARRAY, we can >> switch to it accordingly. >> The change itself is quite straightforward. > > > I have reworked these. First I cleaned up a number of things in patches > 2 and 3 (Thanks, Claude for the summary): > > Patch 2: json format for COPY TO > > copy.c: >   - "json" → "JSON" in the COPY FROM rejection error message. > >   copyto.c: >   1. TupleDesc setup runs once, not every row — Added > json_tupledesc_ready flag; the memcpy/populate_compact_attribute/ > BlessTupleDesc block is now guarded by if (!cstate->json_tupledesc_ready). >   2. Comment rewritten — Old: "the slot's TupleDesc may change during > query execution". New: explains BlessTupleDesc registers the RECORDOID > descriptor so lookup_rowtype_tupdesc inside composite_to_json can >   find it. >   3. Eliminated per-row makeStringInfo() — Added StringInfoData > json_buf to the struct, initialized once in CopyToTextLikeStart in > copycontext. Each row does resetStringInfo instead of allocating a new >   StringInfo. >   4. Column list rejection added — Error: "column selection is not > supported in JSON mode" when attnamelist != NIL. >   5. Improved SendCopyBegin comment — Old: "JSON format is always one > non-binary column". New: explains each CopyData message contains one > complete JSON object. > >   Tests: >   6. Added copy copytest (style) to stdout (format json) to the error- > case block. >   7. Added copyjsontype table test with json, jsonb columns — verifies > values are embedded directly, not double-encoded. Covers json objects, > scalars, arrays, nested objects, and nulls. > >   Patch 3: Add option force_array > >   copy.c: >   1. "json" → "JSON" in COPY FROM error (carried from patch 2). >   2. "can only used" → "can only be used" — grammar fix in FORCE_ARRAY > error. > >   copyto.c: >   3. Struct fields reorganized — JSON fields grouped under /* JSON > format state */ comment with inline descriptions, instead of a > standalone json_row_delim_needed with a vague comment. >   4. Block comment updated — Was "CSV, text and json formats share the > same TextLike routines except for the one-row callback". Now correctly > notes JSON has its own one-row and end callbacks. >   5. CopyToTextLikeEnd comment fixed — Was "text, CSV, and json", now > "text and CSV" (JSON uses CopyToJsonEnd). >   6. Cleaner start callback — FORCE_ARRAY bracket emission moved inside > the if (format == JSON) block after json_buf init, instead of a separate > top-level conditional. >   7. Inherits all patch 2 fixes — TupleDesc guard, reusable json_buf, > improved comments, column list rejection. > >   Tests: >   8. --Error → -- should fail: force_array requires json format; --ok → > -- force_array variants. >   9. copyjsontype test carried through from patch 2. > > Then I reworked the way this works. In order to support column lists > with JSON output, we need to deal with individual columns instead of > whole records. This involved quite a number of changes, as can be seen > in patch 4. This involved exporting a new small function from json.c. > > The result is a lot cleaner, I believe, and in my benchmarking is faster > by a factor of almost 2. Andrew, I don't see the actual patches. Did I miss it somewhere? -- Joe Conway PostgreSQL Contributors Team Amazon Web Services: https://aws.amazon.com