#1 2023-05-05 16:56:23

blue
Member
Registered: 2023-05-05
Posts: 9

Utf8ToString does not support Chinese

OS: Windows 11
IDE: Delphi 2007

procedure Test;
var
  src, dst: RawUtf8;
begin
  src := '中文abc';
  dst := StringToUtf8(src);
  Writeln('StringToUtf8: ', dst, ', src length: ', Length(src), ', dst length: ', Length(dst));

  src := Utf8ToString(dst);
  Writeln('Utf8ToString: ', src, ', src length: ', Length(src), ', dst length: ', Length(dst));
end;

Last edited by blue (2023-05-05 17:00:28)

Offline

#2 2023-05-05 19:01:50

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Utf8ToString does not support Chinese

You are confusing string and RawUtf8 variables. This is the correct way, on Delphi XE for instance:

procedure testunicode;
var
  s1, s2: string;
  dst: RawUtf8;
begin
  s1 := '中文abc';
  dst := StringToUtf8(s1);
  s2 := Utf8ToString(dst);
  if s1 = s2 then
    writeln('ok')
  else
    writeln('ko');
end;
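
For anyone who wants to see where the two types differ, here is a minimal sketch of the same round trip as a console program. It assumes the mORMot 2 unit layout (mormot.core.base for RawUtf8, mormot.core.unicode for StringToUtf8 and Utf8ToString), and that the source file encoding matches what the compiler expects for the Chinese literal. The program name and variable names are only illustrative.

```pascal
program RoundTripSketch;

{$APPTYPE CONSOLE}

uses
  mormot.core.base,     // RawUtf8 (assumed mORMot 2 unit layout)
  mormot.core.unicode;  // StringToUtf8, Utf8ToString

var
  s1, s2: string;  // native string: AnsiString before Delphi 2009, UnicodeString after
  u: RawUtf8;      // always holds UTF-8 encoded bytes
begin
  s1 := '中文abc';
  u := StringToUtf8(s1);  // native string -> UTF-8
  s2 := Utf8ToString(u);  // UTF-8 -> native string
  // Length(s1) depends on the compiler and code page; Length(u) counts
  // UTF-8 bytes, which should be 3 + 3 + 3 = 9 for this text
  Writeln('string length: ', Length(s1), ', UTF-8 byte length: ', Length(u));
  // only string values are compared and printed, never the RawUtf8 itself
  if s1 = s2 then
    Writeln('round trip ok')
  else
    Writeln('round trip ko');
end.
```

Printing only the lengths and the comparison result keeps the console code page out of the picture, so a wrong length points at the conversion rather than at Writeln.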

Offline

#3 2023-05-05 20:49:40

blue
Member
Registered: 2023-05-05
Posts: 9

Re: Utf8ToString does not support Chinese

I am still getting an incorrect result on Delphi 2007.

Here is a patch that fixes it:

function Unicode_GetWideToAnsiSize(W: PWideChar; LW, CodePage: PtrInt): integer;
begin
  result := WideCharToMultiByte(CodePage, 0, W, LW, nil, 0, nil, nil);
end;

function TSynAnsiConvert.UnicodeBufferToAnsi(Dest: PAnsiChar;
  Source: PWideChar; SourceChars: cardinal): PAnsiChar;
..
    // rely on the Operating System for all remaining ASCII characters
    if SourceChars <> 0 then begin
      inc(Dest,
        Unicode_WideToAnsi(Source, Dest, SourceChars, Unicode_GetWideToAnsiSize(Source, SourceChars, fCodePage), fCodePage));
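
The idea behind the size query can be shown outside of mORMot. This is a minimal sketch using only the Windows unit: WideCharToMultiByte is first called with a nil destination and a size of 0 so that it returns the required byte count, then called again to do the conversion. WideToAnsiSketch is a hypothetical helper written for this post, not a mORMot function, and the wide literal is spelled with character codes so that it does not depend on the source file encoding.

```pascal
program WideToAnsiSizeSketch;

{$APPTYPE CONSOLE}

uses
  Windows;

// hypothetical helper, not part of mORMot: converts UTF-16 text to the
// given Windows code page, sizing the output with a first "query" call
function WideToAnsiSketch(const W: WideString; CodePage: cardinal): AnsiString;
var
  len: integer;
begin
  result := '';
  if W = '' then
    exit;
  // first call: nil destination and size 0 returns the required byte count
  len := WideCharToMultiByte(CodePage, 0, PWideChar(W), Length(W),
    nil, 0, nil, nil);
  if len <= 0 then
    exit;
  SetLength(result, len);
  // second call: convert into the buffer that now has the right size
  WideCharToMultiByte(CodePage, 0, PWideChar(W), Length(W),
    PAnsiChar(result), len, nil, nil);
end;

var
  w: WideString;
begin
  w := #$4E2D#$6587'abc'; // the test text from the first post, as UTF-16 codes
  // two double-byte characters plus three ASCII characters: 7 bytes in CP936
  Writeln('CP936 bytes: ', Length(WideToAnsiSketch(w, 936)));
  // three bytes per CJK character plus three ASCII characters: 9 bytes in UTF-8
  Writeln('UTF-8 bytes: ', Length(WideToAnsiSketch(w, CP_UTF8)));
end.
```

The point of the first call is that one UTF-16 character can need more than one byte in a multi-byte code page, so the destination size cannot be taken from the source character count.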

Last edited by blue (2023-05-05 21:33:58)

Offline

#4 2023-05-08 09:43:06

cybexr
Member
Registered: 2016-09-14
Posts: 78

Re: Utf8ToString does not support Chinese

The problem also exists with mORMot 2, as tested below:

Delphi 7 + mORMot

[screenshot: VVxwPZ63.png]

Delphi 7 + mORMot 2

[screenshot: eU6y31ed.png]

Offline

#5 2023-05-09 01:57:27

profh
Member
Registered: 2010-07-02
Posts: 159

Re: Utf8ToString does not support Chinese

Add LConvEncoding to your uses clause and try CP936ToUTF8 or UTF8ToCP936.

Offline

#6 2023-05-09 07:31:11

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: Utf8ToString does not support Chinese

Offline

#7 2023-05-10 03:42:47

blue
Member
Registered: 2023-05-05
Posts: 9

Re: Utf8ToString does not support Chinese

Test passed.

Offline

#8 2023-05-11 12:53:56

itSDS
Member
From: Germany
Registered: 2014-04-24
Posts: 506

Re: Utf8ToString does not support Chinese

Just out of interest:
in the sample above you call Writeln with dst as a parameter.
I think Writeln then does an implicit cast from RawUtf8 to string, or am I wrong?


Rad Studio 12.1 Santorini

Offline
