Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ra4DZ-007uWr-Db for psycopg@arkaria.postgresql.org; Wed, 14 Feb 2024 01:37:45 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1ra4DY-003Lh8-3B for psycopg@arkaria.postgresql.org; Wed, 14 Feb 2024 01:37:44 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ra4DX-003Lh0-SH for psycopg@lists.postgresql.org; Wed, 14 Feb 2024 01:37:43 +0000 Received: from janus.karlpinc.com ([173.161.46.12] helo=smtp.karlpinc.com) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ra4DQ-0074qd-SN for psycopg@postgresql.org; Wed, 14 Feb 2024 01:37:43 +0000 Received: from slate.karlpinc.com (unknown [192.168.1.14]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp.karlpinc.com (Postfix) with ESMTPS id 6CF24A111 for ; Tue, 13 Feb 2024 19:37:34 -0600 (CST) Received: from slate.karlpinc.com (localhost [IPv6:::1]) by slate.karlpinc.com (Postfix) with ESMTPS id 2DFCF3FB42 for ; Tue, 13 Feb 2024 19:37:33 -0600 (CST) Date: Tue, 13 Feb 2024 19:37:32 -0600 From: "Karl O. Pinc" To: "psycopg@postgresql.org" Subject: Reporting UnicodeEncodeError info on arbitrary data sent to PG with psycopg3 Message-ID: <20240213193732.28cb8329@slate.karlpinc.com> X-Mailer: Claws Mail 4.1.1 (GTK 3.24.38; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi, I'm writing a program which takes a text file from the user, constructs an INPUT statement (the table and column names are not known in advance), and uses the constructed statement to insert rows of the text file into PG. psycopg3 v3.1.18 (installed in a venv) PG v15.5 Python 3.11.2 Debian 12.5 (stable, bookworm) Should the user supply text that cannot be converted to the server's encoding I want to report exactly where in the data the problem lies. The sample code below contains (some of) what I want to report when there is a problem. Because I want to know more than just "this particular row of data caused a problem", after an error is discovered the code below calls a psycopg3 dumper on each data value to try to determine which data value is problematic. This does not work. What is wrong with what I'm doing and how do I do what I want? (And how am I supposed to know why this does not work and what works?) I call the dumper because I want to rely on psycopg3's mechanisms and not have to query the server for its encoding and figure out the PG->Python encoding mappings myself. Thanks for the help. Setup: Client side encoding is UTF-8. create database test_latin1 template template0 encoding 'LATIN1' locale 'C'; -- In the latin1 db create table foo (col1 text, col2 text, col3 text); Mock up of desired code (only slightly mangled by my email client): ---------------------------- #!/usr/bin/python3 # # Test handing of database insertion encoding exception import psycopg import psycopg.adapt import sys # Latin1 does not have the euro symbol data_row =3D ("ascii", "euro =E2=82=AC symbol", "more ascii") # Connect to an existing database with psycopg.connect("dbname=3Dtest_latin1 user=3Dtest") as conn: text_dumper =3D conn.adapters.get_dumper( str, psycopg.adapt.PyFormat.TEXT) # Open a cursor to perform database operations with conn.cursor() as cur: try: cur.execute( """ insert into foo (col1, col2, col3) values (%s, %s, %s) """, data_row, ) except UnicodeEncodeError: # Call the text dumper on each data value to find which # caused the problem and provide a detailed report. col =3D 1 for col_val in data_row: try: text_dumper.dump(col_val) except UnicodeEncodeError as err: print( f"Column number {col} ({col_val}) contains a" " character" f" ({col_val[err.start, err.end]}), position" f" {err.start + 1} through {err.end}, that" " cannot be converted to the server-side" " encoding" f" ({err.encoding})" ) print(str(err)) sys.exit() col +=3D 1 print("Cannot find cause of error") sys.exit() print("Unexpectedly got no error") sys.exit() ---------------------------- The relevant traceback (after the traceback of the initial exception that is caught) is: Traceback (most recent call last): File "/home/kop/projects/devel/test_encoding.py", line 32, in text_dumper.dump(col_val) TypeError: _StrDumper.dump() missing 1 required positional argument: 'obj' Regards, Karl Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein