From: Neel Patel
Date: Mon, 15 Feb 2021 18:15:26 +0530
Subject: Re: pgagent unicode support
To: Dave Page
Cc: Sergey Burladyan, pgadmin-hackers, Ashesh Vashi

Thanks Sergey for the patch.

Sure, Dave.
There are some compilation warnings on Linux; I will fix those, test pgAgent on Windows, and update the thread.

On Mon, Feb 8, 2021 at 2:55 PM Dave Page <dpage@pgadmin.org> wrote:
Hi

On Sat, Feb 6, 2021 at 5:00 AM Sergey Burladyan <eshkinkot@gmail.com> wrote:
Currently pgagent doesn't handle Unicode correctly.

The CharToWString function corrupts multibyte characters because it processes the string one byte at a time:
    std::string s = std::string(cstr);
    std::wstring wsTmp(s.begin(), s.end());
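
A minimal sketch of a locale-aware alternative, for illustration only: the constructor above widens each char on its own, so a two-byte UTF-8 sequence such as the one for "э" becomes two unrelated wchar_t values. The sketch below lets mbsrtowcs decode whole characters instead. It assumes the process has already called setlocale() and that a std::wstring return value suits the callers; CharToWStringSketch is a hypothetical name, not the code in the attached patch.

```cpp
#include <cassert>
#include <clocale>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for pgagent's CharToWString (name and return type assumed).
// Decodes cstr using the character encoding of the current LC_CTYPE locale, so
// std::setlocale() must be called first; the default "C" locale does not
// describe a multibyte encoding such as UTF-8.
std::wstring CharToWStringSketch(const char *cstr)
{
    std::mbstate_t state{};
    const char *src = cstr;

    // Pass 1: with a null destination, mbsrtowcs only counts the wide
    // characters (excluding the terminator) without storing them.
    std::size_t len = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence for current locale");

    // Pass 2: decode into a buffer of exactly that many wide characters.
    std::wstring result(len, L'\0');
    src = cstr;
    state = std::mbstate_t{};
    std::size_t written = std::mbsrtowcs(&result[0], &src, len, &state);
    assert(written == len);
    (void)written;  // avoid an unused-variable warning when NDEBUG is defined
    return result;
}
```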

The WStringToChar function does not take into account that wcstombs can
output multibyte characters, and it creates a buffer sized from wcslen:
    int wstr_length = wcslen(wchar_str);
    char *dst = new char[wstr_length + 10];
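
A minimal sketch of sizing the output buffer by bytes rather than by characters, for illustration only: wcslen counts wide characters, but UTF-8 needs two bytes for each Cyrillic letter, so n such letters need 2n + 1 bytes (including the terminator) against the n + 10 allocated, which falls short once n reaches 10. The sketch asks wcsrtombs for the exact byte count first. It assumes setlocale() has been called and that returning a std::string (which also spares callers a delete[]) is acceptable; WStringToCharSketch is a hypothetical name.

```cpp
#include <cassert>
#include <clocale>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for pgagent's WStringToChar (name and return type assumed).
// Encodes wstr using the current LC_CTYPE locale; std::setlocale() must have
// been called first to get multibyte output such as UTF-8.
std::string WStringToCharSketch(const wchar_t *wstr)
{
    std::mbstate_t state{};
    const wchar_t *src = wstr;

    // Pass 1: with a null destination, wcsrtombs returns the number of BYTES
    // the encoded string needs (excluding the terminator), which can be
    // larger than wcslen(wstr).
    std::size_t bytes = std::wcsrtombs(nullptr, &src, 0, &state);
    if (bytes == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in current locale");

    // Pass 2: encode into a buffer of exactly that many bytes.
    std::string result(bytes, '\0');
    src = wstr;
    state = std::mbstate_t{};
    std::size_t written = std::wcsrtombs(&result[0], &src, bytes, &state);
    assert(written == bytes);
    (void)written;  // avoid an unused-variable warning when NDEBUG is defined
    return result;
}
```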

Also, pgagent does not set up the locale with setlocale(); without it, the
wcs/mbs functions cannot handle multibyte strings.
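
Every C and C++ program starts in the "C" locale, so until setlocale() is called the wcs/mbs conversions in CharToWString and WStringToChar do not use the encoding configured in the environment. A minimal sketch of a start-up call, assuming it would run once early in pgagent's main(); InitLocaleFromEnvironment is a hypothetical helper, and the patch may place the call differently.

```cpp
#include <clocale>
#include <cstdio>
#include <cstdlib>

// Hypothetical start-up helper. setlocale(LC_ALL, "") replaces the initial "C"
// locale with the one named by the environment (LC_ALL, else the individual
// LC_* variables, else LANG), so mbstowcs/wcstombs and related functions use
// that locale's character encoding, e.g. UTF-8.
bool InitLocaleFromEnvironment()
{
    const char *name = std::setlocale(LC_ALL, "");
    if (name == nullptr)
    {
        // The requested locale is not available; the "C" locale stays in effect.
        std::fprintf(stderr, "warning: cannot set locale from environment\n");
        return false;
    }
    // MB_CUR_MAX is the longest multibyte character in the new LC_CTYPE;
    // a value of 1 means the encoding is single-byte.
    if (MB_CUR_MAX == 1)
        std::fprintf(stderr, "note: locale \"%s\" uses a single-byte encoding\n", name);
    return true;
}
```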

For example:

=== step code ===
select 'это проверка кириллицы в теле запроса pgagent'
=================

=== postgres log ===
2021-02-05 23:19:05 UTC [15600-1] postgres@postgres ERROR:  unterminated quoted string at or near "'" at character 8
2021-02-05 23:19:05 UTC [15600-2] postgres@postgres STATEMENT:  select '
====================

Please see the attached patch.
I have only tested it on GNU/Linux and can't test it on Windows, sorry.

Thanks for the patch! Neel/Ashesh, can you take a look, please? It looks OK to me, but then I'm not overly familiar with multibyte string handling. What, if anything, needs to be done on Windows?

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com
