public inbox for [email protected]  
help / color / mirror / Atom feed
From: Neel Patel <[email protected]>
To: Sergey Burladyan <[email protected]>
Cc: pgadmin-hackers <[email protected]>
Cc: Ashesh Vashi <[email protected]>
Cc: Dave Page <[email protected]>
Subject: Re: pgagent unicode support
Date: Thu, 18 Feb 2021 17:12:38 +0530
Message-ID: <CACCA4P3JvfMmG_LCm8Oeqnp4r=EuJBLuaS2mUieuw+--2ybgFQ@mail.gmail.com> (raw)
In-Reply-To: <CACCA4P0PG8y_u9vDwTTL4AHbXOfP1k7joLY4tjLQ40DNpWgyoA@mail.gmail.com>
References: <[email protected]>
	<CA+OCxoxsk-VJOJvw95V+pDN0G79T5-F1jAHB-THsPZd10f0KRg@mail.gmail.com>
	<CACCA4P0PG8y_u9vDwTTL4AHbXOfP1k7joLY4tjLQ40DNpWgyoA@mail.gmail.com>

Hi Sergey,

Thank you for the patch. It looks good to me except below.

We have modified the patch as we fixed the memory leak ( review comment
given by Ashesh ) and also fixed the compilation warnings.
Can you please review and let us know ?

Thanks,
Neel Patel

On Mon, Feb 15, 2021 at 6:15 PM Neel Patel <[email protected]>
wrote:

> Thanks Sergey for the patch.
>
> Sure Dave.
> There is some compilation warning in linux, I will fix those and test
> pgAgent in windows and update the thread.
>
> On Mon, Feb 8, 2021 at 2:55 PM Dave Page <[email protected]> wrote:
>
>> Hi
>>
>> On Sat, Feb 6, 2021 at 5:00 AM Sergey Burladyan <[email protected]>
>> wrote:
>>
>>> Currently pgagent doesn't handle unicode correctly.
>>>
>>> CharToWString function corrupt multibyte characters because it processes
>>> string one byte at a time:
>>>  148         std::string s = std::string(cstr);
>>>  149         std::wstring wsTmp(s.begin(), s.end());
>>>
>>> WStringToChar function does not take into account that there can be
>>> _multi_byte character on wcstombs output and create buffer with
>>> size = wcslen:
>>>  157         int wstr_length = wcslen(wchar_str);
>>>  158         char *dst = new char[wstr_length + 10];
>>>
>>> Also pgagent do not setup locale with setlocale(), without it all
>>> wcs/mbs functions cannot handle multibyte strings.
>>>
>>> For example:
>>>
>>> === step code ===
>>> select 'это проверка кириллицы в теле запроса pgagent'
>>> =================
>>>
>>> === postgres log ===
>>> 2021-02-05 23:19:05 UTC [15600-1] postgres@postgres ERROR:
>>> unterminated quoted string at or near "'" at character 8
>>> 2021-02-05 23:19:05 UTC [15600-2] postgres@postgres STATEMENT:  select '
>>> ====================
>>>
>>> Please see attached patch.
>>> I only test it on GNU/Linux and can't test it on Windows, sorry.
>>>
>>
>> Thanks for the patch! Neel/Ashesh; can you take a look please? It looks
>> OK to me, but then I'm not overly familiar with multibyte string handling.
>> What, if anything, needs to be done on Windows?
>>
>>
>> --
>> Dave Page
>> Blog: http://pgsnake.blogspot.com
>> Twitter: @pgsnake
>>
>> EDB: http://www.enterprisedb.com
>>
>>


Attachments:

  [application/octet-stream] pgagent_unicode.patch (2.2K, 3-pgagent_unicode.patch)
  download | inline diff:
diff --git a/misc.cpp b/misc.cpp
index 35ac83d..b046f47 100644
--- a/misc.cpp
+++ b/misc.cpp
@@ -143,22 +143,63 @@ std::wstring NumToStr(const long l)
 }
 
 // This function is used to convert char* to std::wstring.
-std::wstring CharToWString(const char* cstr)
+std::wstring CharToWString(const char *cstr)
 {
-	std::string s = std::string(cstr);
-	std::wstring wsTmp(s.begin(), s.end());
-	return wsTmp;
+        if (cstr != NULL)
+        {
+                size_t wc_cnt = mbstowcs(NULL, cstr, 0);
+
+                if (wc_cnt == (size_t) -1) {
+                        return std::wstring();
+                }
+
+                wchar_t *wcs = new wchar_t[wc_cnt + 1];
+                if (wcs == NULL) {
+                        return std::wstring();
+                }
+
+                if (mbstowcs(wcs, cstr, wc_cnt + 1) == (size_t) -1) {
+                        delete [] wcs;
+                        return std::wstring();
+                }
+
+                std::wstring tmp(&wcs[0], &wcs[wc_cnt]);
+                delete [] wcs;
+
+                return tmp;
+        }
+        return std::wstring();
 }
 
 // This function is used to convert std::wstring to char *.
-char * WStringToChar(const std::wstring &wstr)
+char *WStringToChar(const std::wstring &wstr)
 {
-	const wchar_t *wchar_str = wstr.c_str();
-	int wstr_length = wcslen(wchar_str);
-	char *dst = new char[wstr_length + 10];
-	memset(dst, 0x00, (wstr_length + 10));
-	wcstombs(dst, wchar_str, wstr_length);
-	return dst;
+        const wchar_t *wchar_str = wstr.c_str();
+
+        int mb_len = wcstombs(NULL, wchar_str, 0);
+
+        if ((size_t)mb_len == (size_t) -1) {
+                return NULL;
+        }
+
+        char *mbs = new char[mb_len + 1];
+        if (mbs == NULL) {
+                return NULL;
+        }
+
+        memset(mbs, 0, mb_len + 1);
+
+#ifdef __WIN32__
+        size_t charsConverted = 0;
+        wcstombs_s(&charsConverted, mbs, mb_len + 10, wchar_str, mb_len + 1);
+#else
+        if (wcstombs(mbs, wchar_str, mb_len + 1) == (size_t) -1) {
+                delete [] mbs;
+                return NULL;
+        }
+
+#endif
+        return mbs;
 }
 
 // Below function will generate random string of given character.


view thread (13+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected]
  Subject: Re: pgagent unicode support
  In-Reply-To: <CACCA4P3JvfMmG_LCm8Oeqnp4r=EuJBLuaS2mUieuw+--2ybgFQ@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox