From: Sergey Burladyan
To: pgadmin-hackers@postgresql.org
Subject: pgagent unicode support
Date: Sat, 06 Feb 2021 07:59:58 +0300
Message-ID: <87a6shyenl.fsf@gmail.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain; charset=utf-8

Currently pgagent doesn't handle Unicode correctly.

The CharToWString function corrupts multibyte characters because it
processes the string one byte at a time (misc.cpp):

    std::string s = std::string(cstr);
    std::wstring wsTmp(s.begin(), s.end());

The WStringToChar function does not take into account that wcstombs
can output _multi_byte characters, and it creates a buffer sized from
wcslen:

    int wstr_length = wcslen(wchar_str);
    char *dst = new char[wstr_length + 10];

Also, pgagent does not set up the locale with setlocale(); without it,
none of the wcs/mbs functions can handle multibyte strings.

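A minimal sketch of the first two points, independent of pgagent. The helper names widen_bytewise and utf8_codepoints and the constant kSample are hypothetical and exist only for this illustration. It shows that copying a UTF-8 string into a std::wstring byte by byte yields one wide element per byte rather than one per character, which is also why a length measured in characters says little about the number of bytes a multibyte encoding needs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: reproduces the byte-wise copy from the old CharToWString.
// Each input byte becomes its own wchar_t, so multibyte sequences are split.
std::wstring widen_bytewise(const char *cstr)
{
    std::string s(cstr);
    return std::wstring(s.begin(), s.end());
}

// Hypothetical helper: counts UTF-8 code points by skipping continuation
// bytes (those of the form 10xxxxxx). Assumes well-formed UTF-8 input.
std::size_t utf8_codepoints(const std::string &s)
{
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Three Cyrillic letters encoded as UTF-8, two bytes each (6 bytes in total).
const char kSample[] = "\xD1\x8D\xD1\x82\xD0\xBE";
```

With this sample, widen_bytewise(kSample).size() is 6 while utf8_codepoints(kSample) is 3: the byte-wise copy produces six bogus wide elements for three characters.
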
For example:

=== step code ===
select 'это проверка кириллицы в теле запроса pgagent'
=================

=== postgres log ===
2021-02-05 23:19:05 UTC [15600-1] postgres@postgres ERROR: unterminated quoted string at or near "'" at character 8
2021-02-05 23:19:05 UTC [15600-2] postgres@postgres STATEMENT: select '
====================

Please see the attached patch. I only tested it on GNU/Linux and can't
test it on Windows, sorry.

-- 
Sergey Burladyan

--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=Fix-multibyte-strings-handling.patch

commit b9cf098a4d0df53b7b623a0de844fce834bf7be1 (HEAD -> x5)
Author: Sergey Burladyan
Date:   Sat Feb 6 06:16:59 2021

    Fix multibyte strings handling

diff --git a/misc.cpp b/misc.cpp
index 35ac83d..17103c7 100644
--- a/misc.cpp
+++ b/misc.cpp
@@ -145,20 +145,49 @@ std::wstring NumToStr(const long l)
 // This function is used to convert char* to std::wstring.
 std::wstring CharToWString(const char* cstr)
 {
-	std::string s = std::string(cstr);
-	std::wstring wsTmp(s.begin(), s.end());
-	return wsTmp;
+	size_t wc_cnt = mbstowcs(NULL, cstr, 0);
+
+	if (wc_cnt == (size_t) -1) {
+		return std::wstring();
+	}
+
+	wchar_t *wcs = new wchar_t[wc_cnt + 1];
+	if (wcs == NULL) {
+		return std::wstring();
+	}
+
+	if (mbstowcs(wcs, cstr, wc_cnt + 1) == (size_t) -1) {
+		return std::wstring();
+	}
+
+	std::wstring tmp(&wcs[0], &wcs[wc_cnt]);
+	delete [] wcs;
+
+	return tmp;
 }
 
 // This function is used to convert std::wstring to char *.
 char * WStringToChar(const std::wstring &wstr)
 {
+	static char *err = (char*)"";
 	const wchar_t *wchar_str = wstr.c_str();
-	int wstr_length = wcslen(wchar_str);
-	char *dst = new char[wstr_length + 10];
-	memset(dst, 0x00, (wstr_length + 10));
-	wcstombs(dst, wchar_str, wstr_length);
-	return dst;
+	int mb_len = wcstombs(NULL, wchar_str, 0);
+
+	if (mb_len == (size_t) -1) {
+		return err;
+	}
+
+	char *mbs = new char[mb_len + 1];
+	if (mbs == NULL) {
+		return err;
+	}
+	memset(mbs, 0, mb_len + 1);
+
+	if (wcstombs(mbs, wchar_str, mb_len + 1) == (size_t) -1) {
+		return err;
+	}
+
+	return mbs;
 }
 
 // Below function will generate random string of given character.
diff --git a/unix.cpp b/unix.cpp
index 9a41e38..d4b0d3d 100644
--- a/unix.cpp
+++ b/unix.cpp
@@ -155,6 +155,8 @@ static void daemonize(void)
 
 int main(int argc, char **argv)
 {
+	setlocale(LC_ALL, "");
+
 	std::wstring executable;
 	executable.assign(CharToWString(argv[0]));
 
--=-=-=--
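
The two-pass conversion used in the patch (ask mbstowcs/wcstombs for the required length first, then convert) can also be sketched with standard string objects as the buffers. This is only an illustrative variant under stated assumptions, not the submitted patch: the names to_wide and to_narrow are hypothetical, to_narrow returns std::string rather than the char * that pgagent's WStringToChar returns, and passing a null destination to obtain the length relies on POSIX-style behaviour of these functions, as the patch itself does. Letting std::wstring and std::string own the memory means the error paths need no manual delete [], and lengths stay in size_t throughout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical counterpart of CharToWString: multibyte -> wide, two passes.
// Returns an empty string if the input is invalid in the current locale.
std::wstring to_wide(const char *cstr)
{
    // Pass 1: null destination, ask only for the number of wide characters.
    std::size_t n = std::mbstowcs(nullptr, cstr, 0);
    if (n == static_cast<std::size_t>(-1)) {
        return std::wstring();
    }

    // Pass 2: convert exactly n wide characters into the string's storage.
    std::wstring out(n, L'\0');
    if (n > 0) {
        std::mbstowcs(&out[0], cstr, n);
    }
    return out;
}

// Hypothetical counterpart of WStringToChar: wide -> multibyte, two passes.
// The buffer is sized in bytes, not in wide characters.
std::string to_narrow(const std::wstring &wstr)
{
    // Pass 1: null destination, ask only for the number of bytes.
    std::size_t n = std::wcstombs(nullptr, wstr.c_str(), 0);
    if (n == static_cast<std::size_t>(-1)) {
        return std::string();
    }

    // Pass 2: convert exactly n bytes into the string's storage.
    std::string out(n, '\0');
    if (n > 0) {
        std::wcstombs(&out[0], wstr.c_str(), n);
    }
    return out;
}
```

As in the patch, non-ASCII input only converts correctly once the process has selected a suitable locale, for example by calling setlocale(LC_ALL, "") at the start of main() with a UTF-8 locale configured in the environment.
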