From: Karsten Hilbert <[email protected]>
To: [email protected]
To: [email protected] <[email protected]>
Subject: Re: Reporting UnicodeEncodeError info on arbitrary data sent to PG with psycopg3
Date: Fri, 16 Feb 2024 11:47:29 +0100
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<CA+mi_8Y0jzEZA+3kCnTtWCS8cvWTUFcD=0J+ihiamh=y8GvOxg@mail.gmail.com>
	<[email protected]>
	<CA+mi_8bKxi286PjqYrt9Epcd2dxL+vFtMDJuyurV2edsOEMqJg@mail.gmail.com>
	<[email protected]>

On Thu, Feb 15, 2024 at 11:45:15PM -0600, Karl O. Pinc wrote:

>   Today there is no substitute for knowing the encoding of the
> text your application obtains from the outside world.
> This can be highly system dependent because when reading
> files open()-ed as text, Python decodes (into UTF-8) the bytes read.

Not quite. Python assumes the bytes in the file *are* encoded
with whatever encoding is passed to open() (including UTF-8,
if that is what was specified). It then decodes said bytes into
*Unicode code points*. If we want them back as UTF-8 we need to
encode them as such.
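
A minimal sketch of that round trip, assuming a made-up file
name and sample text stored as Latin-1:

```python
import os
import tempfile

# Hypothetical sample: "Grüße" stored on disk as Latin-1 bytes.
raw = "Grüße".encode("latin-1")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # open() decodes the bytes using the encoding it is told about.
    with open(path, encoding="latin-1") as f:
        text = f.read()

# text is a str, i.e. a sequence of Unicode code points, not UTF-8 bytes.
code_points = [hex(ord(ch)) for ch in text]

# UTF-8 only enters the picture once we explicitly encode.
utf8_bytes = text.encode("utf-8")
```

Here raw and utf8_bytes differ even though both represent the
same five code points held in text.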

> By default decoding from the system locale's character encoding.
> And when writing files open()-ed as text Python encodes (from UTF-8)

Again, from Unicode code points; see:

	https://docs.python.org/3/howto/unicode.html
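
The write direction can be sketched the same way. This uses
io.TextIOWrapper over an in-memory buffer as a stand-in for a
file open()-ed as text; the helper name written_bytes and the
sample string are made up for illustration:

```python
import io

text = "naïve café"  # a str: Unicode code points, no encoding attached


def written_bytes(s, encoding):
    """Return the bytes a text-mode write of s produces for the given encoding."""
    buf = io.BytesIO()
    wrapper = io.TextIOWrapper(buf, encoding=encoding, newline="")
    wrapper.write(s)   # the str is encoded here, on its way out
    wrapper.flush()
    data = buf.getvalue()
    wrapper.detach()   # leave buf alone when wrapper is discarded
    return data


as_utf8 = written_bytes(text, "utf-8")
as_latin1 = written_bytes(text, "latin-1")
```

The same str yields different bytes depending on the encoding
of the stream it is written to.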

>   No matter how you get your data, to put your data into
> the database as text, its bytes must first have their external
> encoding decoded to UTF-8.  Because Python strings are
> UTF-8.

Unicode code points, but, yeah.

>   Once in Python, psycopg converts the UTF-8 text to the database

unicode
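
What such a conversion amounts to, and how it can fail, can be
approximated in plain Python. This is only a sketch: no psycopg
is involved, and the Latin-1 target encoding and sample string
are assumptions chosen for illustration.

```python
text = "price: 10 €"  # the euro sign has no Latin-1 representation

info = None
try:
    text.encode("latin-1")
except UnicodeEncodeError as exc:
    # The exception records where encoding failed and on which character.
    bad = exc.object[exc.start:exc.end]
    info = (exc.start, exc.end, bad, exc.reason)
```

The start and end attributes index into the str, so the
offending character can be reported without guessing.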

> It's important to get the encoding right so I think it'd be
> good to talk about it.

+1

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B





