Date: Fri, 16 Feb 2024 11:47:29 +0100
From: Karsten Hilbert
To: psycopg@lists.postgresql.org, "psycopg@postgresql.org"
Subject: Re: Reporting UnicodeEncodeError info on arbitrary data sent to PG with psycopg3

On Thu, Feb 15, 2024 at 11:45:15PM -0600, Karl O. Pinc wrote:

> Today there is no substitute for knowing the encoding of the
> text your application obtains from the outside world.
> This can be highly system dependent because when reading
> files open()-ed as text, Python decodes (into UTF-8) the bytes read.

Not quite. Python assumes the bytes in the file *are* encoded
in whatever encoding is passed to open() (which may well be
UTF-8). It then decodes said bytes into *unicode code
points*. If we want them back as UTF-8 we need to encode them
as such.

> By default decoding from the system locale's character encoding.
> And when writing files open()-ed as text Python encodes (from UTF-8)

again, from unicode, that is:

	https://docs.python.org/3/howto/unicode.html

> No matter how you get your data, to put your data into
> the database as text, its bytes must first have their external
> encoding decoded to UTF-8.  Because Python strings are
> UTF-8.

unicode code points, but, yeah

> Once in Python, psycopg converts the UTF-8 text to the database

unicode

> It's important to get the encoding right so I think it'd be
> good to talk about it.

+1

Karsten
-- 
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B
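
PS: To make the decode/encode distinction above concrete, here is a
minimal sketch using only the standard library. The sample string and
the variable names are made up for illustration; nothing here is
psycopg-specific. It writes Latin-1 bytes to a temporary file, reads
them back through open() with a matching and a non-matching encoding,
and then encodes the resulting str to UTF-8 explicitly.

```python
import os
import tempfile

# Hypothetical sample text; any string with non-ASCII characters works.
text = "Grüße"

# The same code points give different bytes in different encodings.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

fd, path = tempfile.mkstemp()
try:
    # Simulate "data from the outside world": raw Latin-1 bytes on disk.
    with os.fdopen(fd, "wb") as f:
        f.write(latin1_bytes)

    # open() decodes the bytes using the encoding we pass in. The
    # result is a str of Unicode code points, not UTF-8 bytes.
    with open(path, "r", encoding="latin-1") as f:
        decoded = f.read()

    # Passing the wrong encoding fails here, since these particular
    # bytes are not valid UTF-8.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        wrong_encoding_failed = False
    except UnicodeDecodeError:
        wrong_encoding_failed = True
finally:
    os.remove(path)

# If we want UTF-8 bytes back, we have to encode explicitly.
roundtrip_utf8 = decoded.encode("utf-8")
```

The str has one element per code point regardless of how the bytes
were stored, which is why the encoding has to be known at the point
where bytes enter or leave Python.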