var
  S: RawUTF8;
begin
  S := 'aaaâââ';
  S := StringReplaceAll(S, 'â', 'a');
  Writeln(S); // writes: aaaâââ, expected: aaaaaa
  S := StringReplaceAll(S, 'a', 'â');
  Writeln(S); // writes: ???âââ, expected: ââââââ
end;
Now with an explicit StringToUTF8 conversion:
begin
  S := 'aaaâââ';
  S := StringReplaceAll(S, StringToUTF8('â'), 'a');
  Writeln(S); // writes: aaaaaa, expected: aaaaaa
  S := StringReplaceAll(S, 'a', StringToUTF8('â'));
  Writeln(S); // writes: ââââââ, expected: ââââââ
end;
I know this is not a framework problem, but what would explain the following?
- Why does the compiler not give me an implicit conversion warning if it treats the string literal as non-UTF-8?
- Why is this conversion necessary if strings in recent Delphi versions are Unicode by default?
Offline
I guess the reason Delphi is buggy with UTF-8 constants is that they didn't take UTF-8 support seriously enough.
Why use UTF-8 if you have UTF-16?
They even deprecated UTF-8 strings... then came back to reason.
But the bugs remain. And I guess Embarcadero is very unlikely to fix them.
So what I do in such context is:
1. For UI: use English text in the source code, e.g. as resourcestring, then put the translation in some resource or external file.
2. For processing logic: hardcode constants using explicit StringToUTF8() conversions in the initialization section of the unit, storing the proper UTF-8 content in a global RawUTF8 variable.
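A minimal sketch of the second point, assuming a Unicode version of Delphi and mORMot's SynCommons unit. The unit name MyConstants and the variable name A_CIRCUMFLEX are hypothetical, chosen only for illustration:

```pascal
unit MyConstants; // hypothetical unit name, for illustration only

interface

uses
  SynCommons; // provides RawUTF8 and StringToUTF8()

var
  // Assigned once at unit start-up; treat it as read-only afterwards.
  A_CIRCUMFLEX: RawUTF8; // hypothetical name

implementation

initialization
  // The literal is compiled as a native Delphi string and then converted
  // explicitly, so the variable ends up holding genuine UTF-8 bytes.
  A_CIRCUMFLEX := StringToUTF8('â');
end.
```

Calling code would then pass the variable instead of a literal, e.g. S := StringReplaceAll(S, A_CIRCUMFLEX, 'a');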
Offline
Thanks for the explanation.
Unfortunately, compiling with FPC has proven to be more reliable than with Delphi.
This is a situation that can cause an error that is hard to notice.
It is really frustrating to have to fix such basic things with this one.
The idea of converting constants is great. I will adopt this procedure.
Offline
Side note.
The benefit of global RawUTF8 variables is that they will allow reference counting, whereas plain const strings have a reference counter set to -1, so in some cases the compiler will allocate and copy it into a temp variable if this constant is assigned to another variable.
So global RawUTF8 variables may also slightly help performance.
Offline
I'm using records to manage these strings.
Does this benefit also apply to RawUTF8 fields in records?
type
  TMyRecord = record
    MyStr: RawUTF8;
    ...
  end;
const
  MYCONST = 'âââ';
var
  MyRecord: TMyRecord;
initialization
  MyRecord.MyStr := StringToUTF8(MYCONST); // assignment needs ":=", not "="
Offline
I'm playing around with this.
If I set the code page to 65001 (UTF-8), Delphi correctly recognizes the constants as UTF-8.
Then these conversions are not necessary.
One question: all mORMot source is encoded in ANSI (1252), correct?
Delphi shows it as ANSI, but Lazarus and Notepad++ show it as UTF-8.
Last edited by macfly (2020-01-23 17:50:10)
Offline
We tried to keep the mORMot source as plain 7-bit ASCII.
Any accented or special character is expected to be written as a #... constant.
There is no (and there won't be any) "BOM" marker, so UTF-8 or Ansi depends on the IDE, not on the file itself.
Offline
Thanks @ab.
A note for anyone who has the same problem.
After changing my source file to UTF-8, I had an encoding problem in Lazarus (not in Delphi).
The unit is in UTF-8 and SynCommons.pas in ANSI.
Writeln(Utf8ToConsole(
UrlDecode(UrlEncode('aaaããã'))
));
//Write :aaaããã
If I change the unit to ANSI to match the SynCommons encoding, the result is as expected.
The solution is to add the {$CODEPAGE UTF8} directive to the unit or, better yet, to include Synopse.inc, which defines it.
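As a rough sketch of that fix, assuming the source file is saved as UTF-8 without a BOM and that Synopse.inc and SynCommons are on the search path (the program name CodePageDemo is hypothetical):

```pascal
program CodePageDemo; // hypothetical program name

{$I Synopse.inc} // per the note above, this sets the UTF-8 code page under FPC
// Alternative without the include file:
// {$ifdef FPC} {$codepage utf8} {$endif}

{$ifdef MSWINDOWS}
  {$APPTYPE CONSOLE} // Writeln needs a console application on Windows
{$endif}

uses
  SynCommons; // UrlEncode, UrlDecode, Utf8ToConsole

begin
  // With the code page declared, FPC reads the literal as UTF-8,
  // matching what the RawUTF8-based SynCommons functions expect.
  Writeln(Utf8ToConsole(UrlDecode(UrlEncode('aaaããã'))));
end.
```

The directive only tells FPC how to read this one source file, so it has to appear in every unit that contains non-ASCII literals.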
Offline