The code page involved is 1252, and the character involved is the Copyright sign.
The file Tiny.pas is ANSI-encoded and contains a single copyright sign; that is, its content is the single byte 0xA9.
The file TinyUTF8WithoutBOM.pas contains the UTF-8-encoded copyright sign without the UTF-8 BOM, i.e., the two bytes 0xC2, 0xA9.
The file TinyUTF8WithBOM.pas contains the UTF-8-encoded copyright sign preceded by the UTF-8 BOM, i.e., the five bytes 0xEF, 0xBB, 0xBF, 0xC2, 0xA9.
The call to TSynAnsiConvert.Engine(CODEPAGE_US).AnsiToUTF8 converts the single byte 0xA9 to the two bytes 0xC2, 0xA9.
More importantly, the call to TSynAnsiConvert.Engine(CODEPAGE_US).UTF8ToAnsi converts the two bytes 0xC2, 0xA9 back to the original single byte 0xA9. So far, everything works as expected.
However, the call to TSynAnsiConvert.Engine(CODEPAGE_US).UTF8ToAnsi converts the five bytes 0xEF, 0xBB, 0xBF, 0xC2, 0xA9 to the two bytes 0xC2, 0xA9 instead of the original single byte 0xA9. Could you comment on whether this behavior is a bug?
program Project1;
{$APPTYPE CONSOLE}
uses FastMM4, SynCommons, mORMot, SysUtils;
begin
  // ANSI (code page 1252) -> UTF-8: 0xA9 becomes 0xC2, 0xA9
  SynCommons.FileFromString(
    TSynAnsiConvert.Engine(CODEPAGE_US).AnsiToUTF8(SynCommons.StringFromFile('Tiny.pas')),
    'TinyUTF8WithoutBOM.pas');
  // UTF-8 without BOM -> ANSI: 0xC2, 0xA9 becomes 0xA9 again
  SynCommons.FileFromString(
    TSynAnsiConvert.Engine(CODEPAGE_US).UTF8ToAnsi(SynCommons.StringFromFile('TinyUTF8WithoutBOM.pas')),
    'TinyConvertedBackFromUTF8WithoutBOM.pas');
  // UTF-8 with BOM -> ANSI: the output is 0xC2, 0xA9 rather than 0xA9
  SynCommons.FileFromString(
    TSynAnsiConvert.Engine(CODEPAGE_US).UTF8ToAnsi(SynCommons.StringFromFile('TinyUTF8WithBOM.pas')),
    'TinyConvertedBackFromUTF8WithBOM.pas');
end.
Last edited by ComingNine (2015-10-05 08:46:04)
I have edited the post to make things clearer. Thank you for your efforts!
Dear mpv, thank you for your comment! I have checked, but I do not think AnyTextFile* is related here.
My question is essentially why TSynAnsiConvert.Engine(CODEPAGE_US).UTF8ToAnsi converts the five UTF-8 bytes 0xEF, 0xBB, 0xBF, 0xC2, 0xA9 to the two UTF-8 bytes 0xC2, 0xA9 instead of the single ANSI byte 0xA9. Could you comment on this?
Dear ab and mpv, thank you very much for your kind help!
Dear mpv, sorry, I did not realize that I should not feed the BOM into TSynAnsiConvert.UTF8ToAnsi!
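For later readers, here is a minimal sketch of that conclusion: drop the three BOM bytes 0xEF, 0xBB, 0xBF before handing the content to UTF8ToAnsi. The helper name StripUTF8BOM is hypothetical and is not part of SynCommons; the other calls are the ones already used in the program above. I have not checked it against every Delphi version, so treat it as an illustration rather than a definitive fix.

```pascal
program StripBomBeforeConvert;
{$APPTYPE CONSOLE}
uses
  SynCommons;

// Hypothetical helper (not part of SynCommons): returns Text without a
// leading UTF-8 BOM (0xEF, 0xBB, 0xBF); any other input is returned unchanged.
function StripUTF8BOM(const Text: RawByteString): RawByteString;
begin
  if (Length(Text) >= 3) and (Ord(Text[1]) = $EF) and
     (Ord(Text[2]) = $BB) and (Ord(Text[3]) = $BF) then
    Result := Copy(Text, 4, MaxInt) // keep everything after the 3 BOM bytes
  else
    Result := Text;
end;

var
  Utf8: RawByteString;
begin
  // Read the raw file content and remove the BOM, if there is one
  Utf8 := StripUTF8BOM(StringFromFile('TinyUTF8WithBOM.pas'));
  // Convert the BOM-free UTF-8 content to code page 1252 and save it
  FileFromString(
    TSynAnsiConvert.Engine(CODEPAGE_US).UTF8ToAnsi(Utf8),
    'TinyConvertedBackFromUTF8WithBOM.pas');
end.
```

After the helper runs, UTF8ToAnsi receives the same two bytes 0xC2, 0xA9 as in the TinyUTF8WithoutBOM.pas case, so the conversion should follow the path that already worked there.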