Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rAzZ6-00916i-CE for pgsql-hackers@arkaria.postgresql.org; Wed, 06 Dec 2023 21:36:20 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rAzZ5-00Cj5U-1q for pgsql-hackers@arkaria.postgresql.org; Wed, 06 Dec 2023 21:36:19 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rAzZ4-00Cj5K-Ja for pgsql-hackers@lists.postgresql.org; Wed, 06 Dec 2023 21:36:18 +0000 Received: from mail-pj1-x102a.google.com ([2607:f8b0:4864:20::102a]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1rAzZ1-0098tE-79 for pgsql-hackers@postgresql.org; Wed, 06 Dec 2023 21:36:17 +0000 Received: by mail-pj1-x102a.google.com with SMTP id 98e67ed59e1d1-2851a2b30a2so248949a91.3 for ; Wed, 06 Dec 2023 13:36:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=jackdb.com; s=google; t=1701898574; x=1702503374; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=Un4vQVbfHhx8yPFU0Isn/qG47jHbPC9bOW+1F4+TFxQ=; b=aB+ET11ab6YzTAOylGaY1WZ+4RnjpaOVNJgVv1+UEH3KwWnRWdm6ZF/MZBtSbMIvjZ kMJLOTt24ay0OvW5LOWP0IZG2Hw6nnCGU2ZDmuFy6UgsSt6HvS4pKkm0f4oubFj7HJC7 MO5gPrQlugJVW5VSq6WdEpfdWLYSdd+FJKF/c= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701898574; x=1702503374; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=Un4vQVbfHhx8yPFU0Isn/qG47jHbPC9bOW+1F4+TFxQ=; b=OES9Xy6pM/HZ9G7Ltti1y1NIIFR29Sz8eeQBdvm3EoFCt2olvZw3Y6yM7XQnumfozg 0wNywNTFRRpX0sOfMqfHJGkVzclQgHKePf6mbt0nU/If3SKWCKpn/qNrrTX2mVS6HFzf OiN+TMWT/Q1iFXH9xujlL3ommCOA8XvWnxUOHXChuvdT5ttrzvvk0O3JpZ4oM8IN0eSX xuLb7Yr+DwbF+4aX3MNAi4TbT7zZcxrlY+ztcxlX75RJvNJg+Zcxj3mTnNO7T/gPN+wa CItUcIf5bOcsSyKsIwX1AMQ/FdNA2G22/VxxHmMrwpViJ8WEltZ8uq7wk9q3xeiSg6nE 1iXQ== X-Gm-Message-State: AOJu0YwrA2gJzyuJSIF7FarNxUY3v9u0h9O+/Dydr79apR80Oe4XE8/a c0FP9cj8Eg1vyzpArVrMcqybLYX08/M5M4xDmBudxA== X-Google-Smtp-Source: AGHT+IEMKhpxBLvDuFCfVC/sAj2Nm7WNiykw8yFjzy/dNEkZzDDjs2ydJPk5LOnWQXwn8oTL4lVvTavbR8MMG2SzEjA= X-Received: by 2002:a17:90b:1951:b0:286:6cc0:8849 with SMTP id nk17-20020a17090b195100b002866cc08849mr1409152pjb.54.1701898574054; Wed, 06 Dec 2023 13:36:14 -0800 (PST) MIME-Version: 1.0 References: <926ff917-8371-40ec-b5e6-ab7b0e09bdc5@joeconway.com> <315b81d4-4b67-7828-0355-3808cd14acd1@dunslane.net> <7a60faf6-e7f1-419d-aee6-10a78ea2fe81@joeconway.com> <2e7ff718-895d-83fc-46f7-be25e23b23b4@dunslane.net> <1104915.1701877459@sss.pgh.pa.us> <19a5f9d8-bd1f-9!e51-0f5b-510c1189a8a7@dunslane.net> <1110690.1701880139@sss.pgh.pa.us> <20231206163349.GA2873889@nathanxps13> <20231206164439.GC2873889@nathanxps13> <77312d02-9325-4f14-9fe0-2746c2ee12bc@joeconway.com> <1136975.1701894046@sss.pgh.pa.us> <8724f1d2-d11c-aaf2-2cd3-3725e9441d1e@dunslane.net> In-Reply-To: <8724f1d2-d11c-aaf2-2cd3-3725e9441d1e@dunslane.net> From: Sehrope Sarkuni Date: Wed, 6 Dec 2023 16:36:02 -0500 Message-ID: Subject: Re: Emitting JSON to file using COPY TO To: Andrew Dunstan Cc: Tom Lane , Joe Conway , Nathan Bossart , Davin Shearer , PostgreSQL-development Content-Type: multipart/alternative; boundary="000000000000f35f9e060bde2376" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000f35f9e060bde2376 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, Dec 6, 2023 at 4:03=E2=80=AFPM Andrew Dunstan = wrote: > > The output size difference does say that maybe we should pay some > > attention to the nearby request to not always label every field. > > Perhaps there should be an option for each row to transform to > > a JSON array rather than an object? > > I doubt it. People who want this are likely to want pretty much what > this patch is providing, not something they would have to transform in > order to get it. If they want space-efficient data they won't really be > wanting JSON. Maybe they want Protocol Buffers or something in that vein. > For arrays v.s. objects, it's not just about data size. There are plenty of situations where a JSON array is superior to an object (e.g. duplicate column names). Lines of JSON arrays of strings is pretty much CSV with JSON escaping rules and a pair of wrapping brackets. It's common for tabular data in node.js environments as you don't need a separate CSV parser. Each one has its place and a default of the row_to_json(...) representation of the row still makes sense. But if the user has the option of outputting a single json/jsonb field for each row without an object or array wrapper, then it's possible to support all of these use cases as the user can explicitly pick whatever envelope makes sense: -- Lines of JSON arrays: COPY (SELECT json_build_array('test-' || a, b) FROM generate_series(1, 3) a, generate_series(5,6) b) TO STDOUT WITH (FORMAT JSON, SOME_OPTION_TO_DISABLE_ENVELOPE); ["test-1", 5] ["test-2", 5] ["test-3", 5] ["test-1", 6] ["test-2", 6] ["test-3", 6] -- Lines of JSON strings: COPY (SELECT to_json('test-' || x) FROM generate_series(1, 5) x) TO STDOUT WITH (FORMAT JSON, SOME_OPTION_TO_DISABLE_ENVELOPE); "test-1" "test-2" "test-3" "test-4" "test-5" I'm not sure how I feel about the behavior being automatic if it's a single top level json / jsonb field rather than requiring the explicit option. It's probably what a user would want but it also feels odd to change the output wrapper automatically based on the fields in the response. If it is automatic and the user wants the additional envelope, the option always exists to wrap it further in another: json_build_object('some_field", my_field_i_want_wrapped) The duplicate field names would be a good test case too. I haven't gone through this patch but I'm guessing it doesn't filter out duplicates so the behavior would match up with row_to_json(...), i.e. duplicates are preserved: =3D> SELECT row_to_json(t.*) FROM (SELECT 1 AS a, 2 AS a) t; row_to_json --------------- {"a":1,"a":2} If so, that's a good test case to add as however that's handled should be deterministic. Regards, -- Sehrope Sarkuni Founder & CEO | JackDB, Inc. | https://www.jackdb.com/ --000000000000f35f9e060bde2376 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Wed, Dec 6, 2023 at= 4:03=E2=80=AFPM Andrew Dunstan <= andrew@dunslane.net> wrote:
=

If so, th= at's a good test case to add as however that's handled should be de= terministic.

=
Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. |=C2=A0https://www.jackdb.com/

=C2=A0
--000000000000f35f9e060bde2376--