The regression tests report failures in test.core.base.pas:
procedure TTestCoreBase._UTF8;
...
W := WinAnsiString(RandomString(len));
U := WinAnsiToUtf8(W);
...
Up := mormot.core.unicode.UpperCase(U);
...
CheckEqual(Utf8CompareIOS(pointer(U), pointer(Up)), 0); // fails here
...
end;
Running on Windows 10 with a Croatian locale (ANSI code page 1250, OEM code page 852), in both 32-bit and 64-bit builds, using mORMot2 commit 2.3.8840.
When comparing strings, CompareStringW() takes into account diacritics (šđčćž ŠĐČĆŽ) as well as digraphs (nj/NJ, lj/LJ).
On the other hand, the test uses mormot.core.unicode.UpperCase(), which uppercases only plain ASCII characters (below #127).
We need to force CompareStringW() to use LOCALE_INVARIANT; otherwise, on randomly generated text, it returns a non-zero result when comparing "nJ" with "NJ".
On Linux, everything runs fine with the current source.
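The expectation behind the failing check can be illustrated with a purely ordinal, ASCII-only comparison. The minimal sketch below uses a hypothetical AsciiCompareIgnoreCase() helper (not part of mORMot) that folds only 'a'..'z' to uppercase and then compares bytes, so "nJ" and "NJ" compare equal, which is what the test assumes after calling UpperCase(). A locale-aware CompareStringW() call in a Croatian locale may instead treat "NJ" as a digraph and report the two strings as different.

```pascal
program AsciiCompareDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$ASSERTIONS ON}

// Hypothetical helper: uppercases 'a'..'z' only, every other char is kept.
function AsciiUpper(c: AnsiChar): AnsiChar;
begin
  if c in ['a'..'z'] then
    result := AnsiChar(Ord(c) - 32)
  else
    result := c;
end;

// Hypothetical helper: ordinal, locale-independent, ASCII case-insensitive
// comparison; returns <0, 0 or >0 (0 means equal).
function AsciiCompareIgnoreCase(const A, B: AnsiString): integer;
var
  i, n: integer;
begin
  n := Length(A);
  if Length(B) < n then
    n := Length(B);
  for i := 1 to n do
  begin
    result := Ord(AsciiUpper(A[i])) - Ord(AsciiUpper(B[i]));
    if result <> 0 then
      exit;
  end;
  result := Length(A) - Length(B); // shorter string sorts first
end;

begin
  Assert(AsciiCompareIgnoreCase('nJ', 'NJ') = 0);     // no digraph rules
  Assert(AsciiCompareIgnoreCase('AnJa', 'ANJA') = 0);
  Assert(AsciiCompareIgnoreCase('abc', 'abd') < 0);
end.
```

Such an ordinal comparison is stable for random or Base64 text, but it does not give linguistically correct ordering, which is the trade-off discussed further in the thread.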
My suggestion would be:
mormot.core.test.pas
class function TSynTestCase.RandomString(CharCount: integer): WinAnsiString;
- PByteArray(result)[i] := 32 + R[i] and 127; // can get over #127
+ PByteArray(result)[i] := $20 + R[i] mod 95;
mormot.core.test.pas
class function TSynTestCase.RandomAnsi7(CharCount: integer): RawByteString;
- PByteArray(result)[i] := 32 + R[i] mod 94;
+ PByteArray(result)[i] := 32 + R[i] mod 95; // tilde #$7E should be included (not related to the test errors)
mormot.core.os.pas
function Unicode_CompareString(PW1, PW2: PWideChar; L1, L2: PtrInt;
- result := CompareStringW(LOCALE_USER_DEFAULT, _CASEFLAG[IgnoreCase], PW1, L1, PW2, L2);
+ result := CompareStringW(LOCALE_INVARIANT, _CASEFLAG[IgnoreCase], PW1, L1, PW2, L2);
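The byte ranges behind the RandomString() and RandomAnsi7() changes can be checked with a small program. This is a minimal sketch, assuming R[i] is a cardinal (its actual type is not shown above); the four functions are hypothetical names that each model one version of the expression. Note that in Pascal `and` and `mod` bind tighter than `+`, so `32 + R[i] and 127` means `32 + (R[i] and 127)`.

```pascal
program RandomRangeDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$ASSERTIONS ON}

function OrigRandomStringByte(R: cardinal): cardinal;
begin
  result := 32 + R and 127;    // original RandomString(): 32 + (R and 127)
end;

function ProposedRandomStringByte(R: cardinal): cardinal;
begin
  result := $20 + R mod 95;    // proposed RandomString(): $20 + (R mod 95)
end;

function OrigRandomAnsi7Byte(R: cardinal): cardinal;
begin
  result := 32 + R mod 94;     // original RandomAnsi7()
end;

function ProposedRandomAnsi7Byte(R: cardinal): cardinal;
begin
  result := 32 + R mod 95;     // proposed RandomAnsi7()
end;

type
  TByteFormula = function(R: cardinal): cardinal;

// Smallest and largest byte a formula can produce; R = 0..65535 covers
// every possible value of "R and 127", "R mod 94" and "R mod 95".
procedure GetRange(F: TByteFormula; out MinB, MaxB: cardinal);
var
  R, B: cardinal;
begin
  MinB := High(cardinal);
  MaxB := 0;
  for R := 0 to 65535 do
  begin
    B := F(R);
    if B < MinB then
      MinB := B;
    if B > MaxB then
      MaxB := B;
  end;
end;

var
  MinB, MaxB: cardinal;
begin
  GetRange(OrigRandomStringByte, MinB, MaxB);
  Assert((MinB = $20) and (MaxB = $9F));  // goes above #127, as intended
  GetRange(ProposedRandomStringByte, MinB, MaxB);
  Assert((MinB = $20) and (MaxB = $7E));  // 7-bit printable chars only
  GetRange(OrigRandomAnsi7Byte, MinB, MaxB);
  Assert((MinB = $20) and (MaxB = $7D));  // tilde ($7E) never produced
  GetRange(ProposedRandomAnsi7Byte, MinB, MaxB);
  Assert((MinB = $20) and (MaxB = $7E));  // tilde included
end.
```

The original RandomString() expression yields $20..$9F, i.e. WinAnsi content beyond #127, while the RandomAnsi7() change only widens the 7-bit range by one char.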
Thanks a lot for the investigation.
I am fine with using the LOCALE_INVARIANT flag, and with the RandomAnsi7() change.
Even so, I still have doubts about the original request behind this Utf8CompareIOS() function.
It was meant to deal with Chinese characters and sorting, and I am not sure LOCALE_INVARIANT would not break searching...
https://learn.microsoft.com/en-us/windo … -invariant
I am more concerned about RandomString(): it should work without any tweak, because it returns WinAnsiString content, which is expected to go above #127.
The tests use these WinAnsi chars to validate the framework's WinAnsi case conversion, i.e. UpperCaseU().
Please try with the latest trunk,
especially https://github.com/synopse/mORMot2/commit/bc047ba6
Using LOCALE_INVARIANT would break existing code relying on the current user locale.
The regression tests now pass without any failed assertion.
> Even so, I still have doubts about the original request behind this Utf8CompareIOS() function.
> It was meant to deal with Chinese characters and sorting, and I am not sure LOCALE_INVARIANT would not break searching...
Utf8CompareIOS() should be used as rarely as possible (i.e. only for sorting).
https://learn.microsoft.com/en-us/windo … plications
On Windows it calls CompareStringW(), which fails on some computer-generated text (random, Base64-encoded, etc.) because of the digraphs used in some languages, but works fine for natural-language text.
For example, in my locale Utf8CompareIOS() gives "Anja" = "ANJA" and "ANJa" = "ANJA", but "AnJa" <> "ANJA".
It's a mess.
> I am more concerned about RandomString(): it should work without any tweak, because it returns WinAnsiString content, which is expected to go above #127.
> The tests use these WinAnsi chars to validate the framework's WinAnsi case conversion, i.e. UpperCaseU().
Actually, it works as expected (it returns chars in the $20-$9F range).
My bad: in the _UTF8() test I read UpperCase()/LowerCase() instead of UpperCaseU()/LowerCaseU(), and wrongly figured that the chars should be in the 7-bit range ($20-$7F).
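Why bytes above #127 matter for UpperCaseU(): in Windows-1252 (WinAnsi), some letters in the $80-$9F block have case pairs, e.g. š ($9A) / Š ($8A), œ ($9C) / Œ ($8C) and ž ($9E) / Ž ($8E). The minimal sketch below contrasts a 7-bit uppercasing with a WinAnsi-aware one restricted to the $20..$9F range RandomString() produces. Both functions are hypothetical illustrations, not mORMot's UpperCase() or UpperCaseU() implementation.

```pascal
program WinAnsiUpperDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$ASSERTIONS ON}

// Hypothetical 7-bit uppercasing: only 'a'..'z' ($61..$7A) are changed.
function Ascii7Upper(b: byte): byte;
begin
  if b in [$61..$7A] then
    result := b - $20
  else
    result := b;
end;

// Hypothetical Windows-1252 uppercasing limited to $20..$9F; a complete
// table would also have to map the lowercase letters in $E0..$FF.
function WinAnsiUpperTo9F(b: byte): byte;
begin
  case b of
    $61..$7A:
      result := b - $20;   // 'a'..'z' -> 'A'..'Z'
    $9A, $9C, $9E:
      result := b - $10;   // š -> Š ($8A), œ -> Œ ($8C), ž -> Ž ($8E)
  else
    result := b;
  end;
end;

begin
  Assert(Ascii7Upper(Ord('n')) = Ord('N'));
  Assert(WinAnsiUpperTo9F(Ord('n')) = Ord('N'));
  Assert(Ascii7Upper($9A) = $9A);       // 7-bit code leaves š unchanged
  Assert(WinAnsiUpperTo9F($9A) = $8A);  // WinAnsi-aware code gives Š
  Assert(WinAnsiUpperTo9F($8A) = $8A);  // Š is already uppercase
end.
```

Restricting RandomString() to 7-bit chars would therefore no longer exercise this part of the WinAnsi case conversion.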
Cheers