From: Dave Page
Date: Mon, 8 Feb 2021 09:25:27 +0000
Subject: Re: pgagent unicode support
To: Sergey Burladyan
Cc: pgadmin-hackers, Neel Patel, Ashesh Vashi
In-Reply-To: <87a6shyenl.fsf@gmail.com>

Hi

On Sat, Feb 6, 2021 at 5:00 AM Sergey Burladyan <eshkinkot@gmail.com> wrote:
Currently pgagent doesn't handle Unicode correctly.

The CharToWString function corrupts multibyte characters because it processes the string one byte at a time:
    std::string s = std::string(cstr);
    std::wstring wsTmp(s.begin(), s.end());
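
For illustration only (this is not the attached patch): a minimal sketch contrasting the two quoted lines with a locale-aware conversion based on std::mbstowcs. The names widen_bytewise and widen_multibyte are hypothetical, and the second function assumes the program has already selected a suitable locale with setlocale().

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Reproduces the quoted lines: every byte becomes one wchar_t, so a
// two-byte UTF-8 sequence turns into two unrelated wide characters.
std::wstring widen_bytewise(const char *cstr)
{
    std::string s = std::string(cstr);
    return std::wstring(s.begin(), s.end());
}

// Hypothetical alternative: decode according to the current LC_CTYPE locale.
std::wstring widen_multibyte(const char *cstr)
{
    std::string s(cstr);
    // A multibyte string never decodes to more wide characters than it
    // has bytes, so size() + 1 elements always suffice.
    std::vector<wchar_t> buf(s.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), s.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence for the current locale");
    return std::wstring(buf.data(), n);
}
```

With the byte-wise version, the single Cyrillic letter encoded as the two UTF-8 bytes D1 8D yields a wide string of length 2 rather than 1. What widen_multibyte returns for the same input depends on the active locale.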

The WStringToChar function does not take into account that wcstombs can
emit multibyte characters, and it sizes the output buffer from wcslen
(plus 10 bytes):
    int wstr_length = wcslen(wchar_str);
    char *dst = new char[wstr_length + 10];
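
For illustration only (this is not the attached patch): a minimal sketch of sizing the output buffer for the worst case, where every wide character needs MB_CUR_MAX bytes. The name narrow_multibyte is hypothetical.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: convert a wide string to the multibyte encoding
// of the current LC_CTYPE locale.
std::string narrow_multibyte(const wchar_t *wstr)
{
    std::wstring ws(wstr);
    // Worst case: each wide character expands to MB_CUR_MAX bytes,
    // plus one byte for the terminating null.
    std::vector<char> buf(ws.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), ws.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("wide character not representable in the current locale");
    return std::string(buf.data(), n);
}
```

In a UTF-8 locale each Cyrillic letter takes two bytes, so a string of ten or more such letters already needs more than wcslen + 10 bytes once the terminator is counted.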

Also, pgagent does not set up the locale with setlocale(); without it, the
wcs/mbs functions cannot handle multibyte strings.
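
For illustration only (this is not the attached patch): a C++ program starts in the "C" locale, so the conversion functions only behave as multibyte-aware after a call such as setlocale(LC_ALL, ""), which adopts the locale from the environment. The helper names init_locale and current_ctype_locale below are hypothetical.

```cpp
#include <clocale>
#include <string>

// Hypothetical helper: select a locale for the whole process.
// An empty name means "use the environment" (LANG, LC_ALL, ...).
// Returns false if the requested locale is not available.
bool init_locale(const char *name = "")
{
    return std::setlocale(LC_ALL, name) != nullptr;
}

// Query the locale that currently governs multibyte conversion.
std::string current_ctype_locale()
{
    const char *name = std::setlocale(LC_CTYPE, nullptr);
    return name ? std::string(name) : std::string();
}
```

Calling init_locale() once at startup, before any mbstowcs or wcstombs call, would be enough for this sketch; the return value is worth checking because the environment may name a locale that is not installed.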

For example:

=== step code ===
select 'это проверка кириллицы в теле запроса pgagent'
=================

=== postgres log ===
2021-02-05 23:19:05 UTC [15600-1] postgres@postgres ERROR:  unterminated quoted string at or near "'" at character 8
2021-02-05 23:19:05 UTC [15600-2] postgres@postgres STATEMENT:  select '
====================

Please see attached patch.
I have only tested it on GNU/Linux and can't test it on Windows, sorry.

Thanks for the patch! Neel/Ashesh, can you take a look please? It looks OK to me, but then I'm not overly familiar with multibyte string handling. What, if anything, needs to be done on Windows?

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com
