Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rAulX-008iT7-A0 for pgsql-hackers@arkaria.postgresql.org; Wed, 06 Dec 2023 16:28:51 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rAulV-00BFT6-Ua for pgsql-hackers@arkaria.postgresql.org; Wed, 06 Dec 2023 16:28:49 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rAulV-00BFSy-JC for pgsql-hackers@lists.postgresql.org; Wed, 06 Dec 2023 16:28:49 +0000 Received: from mail-pf1-x42b.google.com ([2607:f8b0:4864:20::42b]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1rAulS-00APup-KH for pgsql-hackers@postgresql.org; Wed, 06 Dec 2023 16:28:49 +0000 Received: by mail-pf1-x42b.google.com with SMTP id d2e1a72fcca58-6cdcef787ffso12238b3a.0 for ; Wed, 06 Dec 2023 08:28:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=jackdb.com; s=google; t=1701880125; x=1702484925; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=FuwlsCjamKf1n7V1WV+VToDp3ukHq4M1CSemDyIp+kk=; b=gUnPckVLY/BVbFgDmSAUU2rj/XOPGOGqJDmnZbLnyQUwBWLVSTeRkTeHvLw9GbroY4 Y9Wme64hbb+GbCTaxqA+JjS/mQ43q8gjVS7m0PkgfeuFO7E4Mt5hIpJUoAG0O65u3BOm ypJCGgdzUiohtJjUry7aSD/ixeT2WtvGf3euo= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701880125; x=1702484925; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=FuwlsCjamKf1n7V1WV+VToDp3ukHq4M1CSemDyIp+kk=; b=nzZeXLuAwN9klXmIBtV2O2Fp1CxJifYeHui8IcbgKaH25XXERrugIx5HiJexGnwocG LD13jZVMo1vri3ErMhjnCiW7O0caQzj9vLMfeRq8FQwa3gmpqzVRpLSekfcm7u2tgDHd LNae7aIVSWDpTAaM7qFQATd/bNPaHdPlEfUanwGckELGw+ybv4tzlDIAznc7qunrA7AT 8+eTu4Irr7AHKyFMMgs7Mwf4cW007rhPWgrx16LW/+Yls//FV172A2tRaYHVbZsZZdlW OQ/CPYT0lPxFHr9sL5ys6nDL6R/UAa2KnTVPup1O3u9HoCYgamM37CPLZUl+WBFqLkzJ NABg== X-Gm-Message-State: AOJu0YzLxvmmJ+aInq6cOSS0Y+csgY3dzunhNS7do8speE+Pjwir8NYN Bv2rGxIRqpdS/RQULp6u7kOVIguopzSubk6XMLySSjxDBOSsqkIjeRY= X-Google-Smtp-Source: AGHT+IG26Q215L1F1tCl2HgMazonMAXIW8+eFJNOeBdyDVAm2wfVgXms7PjxRCfX6StYuxePfBAqavpTZe/u2/iq3XU= X-Received: by 2002:a05:6a00:1d09:b0:6ce:79b3:b28a with SMTP id a9-20020a056a001d0900b006ce79b3b28amr1229283pfx.60.1701880124818; Wed, 06 Dec 2023 08:28:44 -0800 (PST) MIME-Version: 1.0 References: <5c84b70b-ba18-c45d-dbbe-612fa229b2ce@dunslane.net> <398c22f6-4299-4b17-80bf-2f14f4afd592@joeconway.com> <46cc4507-a0d9-4044-b2ce-5a8bca8015c0@joeconway.com> <2554e520-e103-8978-dcb5-807dfeb77402@dunslane.net> <926ff917-8371-40ec-b5e6-ab7b0e09bdc5@joeconway.com> <315b81d4-4b67-7828-0355-3808cd14acd1@dunslane.net> <7a60faf6-e7f1-419d-aee6-10a78ea2fe81@joeconway.com> <2e7ff718-895d-83fc-46f7-be25e23b23b4@dunslane.net> <1104915.1701877459@sss.pgh.pa.us> <19a5f9d8-bd1f-9e51-0f5b-510c1189a8a7@dunslane.net> In-Reply-To: <19a5f9d8-bd1f-9e51-0f5b-510c1189a8a7@dunslane.net> From: Sehrope Sarkuni Date: Wed, 6 Dec 2023 11:28:33 -0500 Message-ID: Subject: Re: Emitting JSON to file using COPY TO To: Andrew Dunstan Cc: Tom Lane , Joe Conway , Davin Shearer , PostgreSQL-development Content-Type: multipart/alternative; boundary="0000000000004a5da8060bd9d88f" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000004a5da8060bd9d88f Content-Type: text/plain; charset="UTF-8" Big +1 to this overall feature. This is something I've wanted for a long time as well. While it's possible to use a COPY with text output for a trivial case, the double escaping falls apart quickly for arbitrary data. It's really only usable when you know exactly what you are querying and know it will not be a problem. Regarding the defaults for the output, I think JSON lines (rather than a JSON array of objects) would be preferred. It's more natural to combine them and generate that type of data on the fly rather than forcing aggregation into a single object. Couple more features / use cases come to mind as well. Even if they're not part of a first round of this feature I think it'd be helpful to document them now as it might give some ideas for what does make that first cut: 1. Outputting a top level JSON object without the additional column keys. IIUC, the top level keys are always the column names. A common use case would be a single json/jsonb column that is already formatted exactly as the user would like for output. Rather than enveloping it in an object with a dedicated key, it would be nice to be able to output it directly. This would allow non-object results to be outputted as well (e.g., lines of JSON arrays, numbers, or strings). Due to how JSON is structured, I think this would play nice with the JSON lines v.s. array concept. COPY (SELECT json_build_object('foo', x) AS i_am_ignored FROM generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON, SOME_OPTION_TO_NOT_ENVELOPE) {"foo":1} {"foo":2} {"foo":3} 2. An option to ignore null fields so they are excluded from the output. This would not be a default but would allow shrinking the total size of the output data in many situations. This would be recursive to allow nested objects to be shrunk down (not just the top level). This might be worthwhile as a standalone JSON function though handling it during output would be more efficient as it'd only be read once. COPY (SELECT json_build_object('foo', CASE WHEN x > 1 THEN x END) FROM generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON, SOME_OPTION_TO_NOT_ENVELOPE, JSON_SKIP_NULLS) {} {"foo":2} {"foo":3} 3. Reverse of #2 when copying data in to allow defaulting missing fields to NULL. Regards, -- Sehrope Sarkuni Founder & CEO | JackDB, Inc. | https://www.jackdb.com/ --0000000000004a5da8060bd9d88f Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Big +1 to this overall feature.

Th= is is something I've wanted for a long time as well. While it's pos= sible to use a COPY with text output for a trivial case, the double escapin= g falls apart quickly for arbitrary data. It's=C2=A0really only usable = when you know exactly what you are querying and know it will not be a probl= em.

Regarding the defaults for the output, I think JSON lines (rathe= r than a JSON array of objects) would be preferred. It's more natural t= o combine them and generate that type of data on the fly rather than forcin= g aggregation into a single object.

Couple more features / use cases= come to mind as well. Even if they're not part of a first round of thi= s feature I think it'd be helpful to document them now as it might give= some ideas for what does make that first cut:

1. Outputting a top l= evel JSON object without the additional column keys. IIUC, the top level ke= ys are always the column names. A common use case would be a single json/js= onb column that is already formatted exactly as the user would like for out= put. Rather than enveloping it in an object with a dedicated key, it would = be nice to be able to output it directly. This would allow non-object resul= ts to be outputted as well (e.g., lines of JSON arrays, numbers, or strings= ). Due to how JSON is structured, I think this would play nice with the JSO= N lines v.s. array concept.

COPY (SELECT json_build_object('foo&= #39;, x) AS i_am_ignored FROM generate_series(1, 3) x) TO STDOUT WITH (FORM= AT JSON, SOME_OPTION_TO_NOT_ENVELOPE)
{"foo":1}
{"foo&= quot;:2}
{"foo":3}

2. An option to ignore null fields s= o they are excluded from the output. This would not be a default but would = allow shrinking the total size of the output data in many situations. This = would be recursive to allow nested objects to be shrunk down (not just the = top level). This might be worthwhile as a standalone JSON function though h= andling it during output would be more efficient as it'd only be read o= nce.

COPY (SELECT json_build_object('foo', CASE WHEN x > = 1 THEN x END) FROM generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON, SO= ME_OPTION_TO_NOT_ENVELOPE, JSON_SKIP_NULLS)
{}
{"foo":2}{"foo":3}

3. Reverse of #2 when copying data in to allow = defaulting missing fields to NULL.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. |=C2=A0https://www.jackdb.com/
--0000000000004a5da8060bd9d88f--