Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rAHqY-005ryK-7g for pgsql-hackers@arkaria.postgresql.org; Mon, 04 Dec 2023 22:55:26 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rAHqU-001YfL-63 for pgsql-hackers@arkaria.postgresql.org; Mon, 04 Dec 2023 22:55:22 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rAHqT-001YfD-OT for pgsql-hackers@lists.postgresql.org; Mon, 04 Dec 2023 22:55:21 +0000 Received: from mxout1-he-de.apache.org ([95.216.194.37]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rAHqQ-008p4C-F9 for pgsql-hackers@postgresql.org; Mon, 04 Dec 2023 22:55:20 +0000 Received: from mail.apache.org (mailgw-he-de.apache.org [IPv6:2a01:4f8:c2c:d4aa::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mxout1-he-de.apache.org (ASF Mail Server at mxout1-he-de.apache.org) with ESMTPS id 8BD3960C56 for ; Mon, 4 Dec 2023 22:55:14 +0000 (UTC) Received: (qmail 3943996 invoked by uid 116); 4 Dec 2023 22:55:14 -0000 Received: from mailrelay1-he-de.apache.org (HELO mailrelay1-he-de.apache.org) (116.203.21.61) by apache.org (qpsmtpd/0.94) with ESMTP; Mon, 04 Dec 2023 22:55:14 +0000 Authentication-Results: apache.org; auth=none Received: from mail-pf1-f177.google.com (mail-pf1-f177.google.com [209.85.210.177]) by mailrelay1-he-de.apache.org (ASF Mail Server at mailrelay1-he-de.apache.org) with ESMTPSA id CD00D40C5E for ; Mon, 4 Dec 2023 22:55:13 +0000 (UTC) Received: by mail-pf1-f177.google.com with SMTP id d2e1a72fcca58-6ce5f4b3eefso927807b3a.3 for ; Mon, 04 Dec 2023 14:55:13 -0800 (PST) X-Gm-Message-State: AOJu0Yxb5pO6PPJtJPgUN4xQ7ENeUIjQRwl+iYZKjVAefir0j3KkSgdp WkF0uVvL7tN6EldXUwnBUxH/iH9Ru6QdG1R03AM= X-Google-Smtp-Source: AGHT+IGSmFrDMBumkB5fX/oqsur6sCqRGTRAqSWVZ3czJk2mmsssNUfkKXJJQ4j1fh/X49e/8mZbLkRnQlpilwMD7AU= X-Received: by 2002:a05:6a21:3007:b0:18f:97c:5ba6 with SMTP id yd7-20020a056a21300700b0018f097c5ba6mr1716903pzb.116.1701730512241; Mon, 04 Dec 2023 14:55:12 -0800 (PST) MIME-Version: 1.0 References: <3853387.1701096982@sss.pgh.pa.us> <3a98decf-3fe3-4b49-9b68-fda01338872c@sedlakovi.org> <24e3ee88-ec1e-421b-89ae-8a47ee0d2df1@joeconway.com> <7117a356-916c-4cf3-bad8-861490e65dcf@joeconway.com> <9c77b6fa-ee88-b2e6-0fa7-4fc81721da35@dunslane.net> <41dcba92-1075-e5e5-cb99-36711abf6cec@dunslane.net> <5c84b70b-ba18-c45d-dbbe-612fa229b2ce@dunslane.net> In-Reply-To: <5c84b70b-ba18-c45d-dbbe-612fa229b2ce@dunslane.net> From: Davin Shearer Date: Mon, 4 Dec 2023 17:55:00 -0500 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: Emitting JSON to file using COPY TO To: Andrew Dunstan , PostgreSQL-development Cc: Joe Conway Content-Type: multipart/alternative; boundary="000000000000af8d7a060bb70275" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000af8d7a060bb70275 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sorry about the top posting / top quoting... the link you sent me gives me a 404. I'm not exactly sure what top quoting / posting means and Googling those terms wasn't helpful for me, but I've removed the quoting that my mail client is automatically "helpfully" adding to my emails. I mean no offense. Okay, digging in more... If the value contains text that has BOMs [footnote 1] in it, it must be preserved (the database doesn't need to interpret them or do anything special with them - just store it and fetch it). There are however a few characters that need to be escaped (per https://www.w3docs.com/snippets/java/how-should-i-escape-strings-in-json.ht= ml) so that the JSON format isn't broken. They are: 1. " (double quote) 2. \ (backslash) 3. / (forward slash) 4. \b (backspace) 5. \f (form feed) 6. \n (new line) 7. \r (carriage return) 8. \t (horizontal tab) These characters should be represented in the test cases to see how the escaping behaves and to ensure that the escaping is done properly per JSON requirements. Forward slash comes as a bit of a surprise to me, but `jq` handles it either way: =E2=9E=9C echo '{"key": "this / is a forward slash"}' | jq . { "key": "this / is a forward slash" } =E2=9E=9C echo '{"key": "this \/ is a forward slash"}' | jq . { "key": "this / is a forward slash" } Hope it helps, and thank you! 1. I don't disagree that BOMs shouldn't be used for UTF-8, but I'm also processing UTF-16{BE,LE} and UTF-32{BE,LE} (as well as other textural formats that are neither ASCII or Unicode). I don't have the luxury of changing the data that is given. --000000000000af8d7a060bb70275 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Sorry about the top posting / top= quoting... the link you sent me gives me a 404.=C2=A0 I'm not exactly = sure what top quoting / posting means and Googling those terms wasn't h= elpful for me, but I've removed the quoting that my mail client is auto= matically "helpfully" adding to my emails.=C2=A0 I mean no offens= e.

Okay, digging in more...

If the value contains tex= t that has BOMs [footnote=C2=A01] in it, it must be preserved (the database= doesn't need to interpret them or do anything special with them - just= store it and fetch it).=C2=A0 There are however a few characters that need= to be escaped (per=C2=A0https://www.w3docs.com/snippets/java/how-should-i-escape-strin= gs-in-json.html) so that the JSON format isn't broken.=C2=A0 They a= re:

  1. "=C2=A0(double quote)
  2. \=C2=A0(backslash)
  3. /=C2=A0<= /span>(forward slash)
  4. <= code style=3D"color:rgb(0,0,0);border:0px solid rgb(229,231,235);box-sizing= :border-box;font-weight:600;padding:2px 8px">\b=C2=A0(backspace)
  5. \f=C2=A0= (form feed)
  6. =C2=A0(new line)
    <= /li>
  7. \r=C2=A0(carriage return)
  8. \t=C2=A0(horizontal tab)

These characters should be represented in the te= st cases to see how the escaping=C2=A0behaves and to ensure that the escapi= ng is done properly per JSON requirements.=C2=A0 Forward slash comes as a b= it of a surprise to me, but `jq` handles it either way:

=E2=9E=9C echo '{"key": "this / is a forw= ard slash"}' | jq .
{
=C2=A0 "key": "this / i= s a forward slash"
}
=E2=9E=9C echo '{"key": "= ;this \/ is a forward slash"}' | jq .
{
=C2=A0 "key&quo= t;: "this / is a forward slash"
}

<= span style=3D"font-family:monospace">Hope it helps,=C2=A0and thank=C2=A0you= !

= 1. I don't disagree that BOMs shouldn't be= used for UTF-8, but I'm also processing UTF-16{BE,LE} and UTF-32{BE,LE= } (as well as other textural formats that=C2=A0are neither=C2=A0ASCII or Un= icode).=C2=A0 I don't have the luxury of changing the data that is give= n.

--000000000000af8d7a060bb70275--