#1 2018-04-15 15:12:31

edwinsn
Member
Registered: 2010-07-02
Posts: 1,215

[suggestion] - Add TTextWriter.AddUtf8BOMAtBeginning()

@ab,

Would you consider adding the following simple function to TTextWriter?
Assume you use TTextWriter to generate CSV files, chaining a TFileStream instance to the text writer when creating it, and finally you call FlushFinal to write the content to disk.

But since the BOM is missing from the final file and Excel relies on the BOM to detect UTF-8 encoding, all CJK (maybe all non-ASCII) characters show up garbled in Excel.

function TTextWriter.AddUtf8BOMAtBeginning: Boolean;
begin
  if GetLength < 1 then
  begin
    AddShort(#$ef#$bb#$bf); // add UTF-8 Byte Order Mark
    Result := True;
  end
  else
    Result := False;
end;

Last edited by edwinsn (2018-04-15 15:13:27)


Delphi XE4 Pro on Windows 7 64bit.
Lazarus trunk built with fpcupdeluxe on Windows with cross-compile for Linux 64bit.


#2 2018-04-16 07:46:44

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,272

Re: [suggestion] - Add TTextWriter.AddUtf8BOMAtBeginning()

I don't see much benefit compared to just writing AddShort(#$ef#$bb#$bf).


#3 2018-04-16 09:53:54

edwinsn
Member
Registered: 2010-07-02
Posts: 1,215

Re: [suggestion] - Add TTextWriter.AddUtf8BOMAtBeginning()

Or maybe a better suggestion would be to add a new parameter to TTextWriter.Create, so that it looks like:

TTextWriter.Create(aStream: TStream; aSaveUtf8BOM: Boolean);

And transparently handle adding the UTF-8 BOM before writing the first byte of the text writer data to aStream.

Imagine people who know nothing about UTF-8 encoding and the so-called BOM. Actually, it took me several tens of minutes of searching before I learned that Excel treats UTF-8 without a BOM as ANSI text.

It's your call ;)

Last edited by edwinsn (2018-04-16 09:54:06)




#4 2018-04-16 11:55:47

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,549

Re: [suggestion] - Add TTextWriter.AddUtf8BOMAtBeginning()

On the other hand, on Linux the BOM is discouraged because it breaks things like shebang lines in shell scripts. Plus, it'd be pointless to have a UTF-8 signature when UTF-8 is the default encoding anyway. I removed the BOM from my files during my Linux migration. So let it be at the application level, not in SynCommons.
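
For illustration only, here is a minimal sketch of what "at application level" could look like, using nothing but the RTL's Classes unit. WriteUtf8BomIfEmpty is a hypothetical helper name (it is not part of SynCommons), and a TMemoryStream stands in for the TFileStream of the CSV file. The idea is to write the three signature bytes to the empty stream before handing it to the text writer.

```pascal
program Utf8BomSketch;

{$ifdef FPC}{$mode delphi}{$endif}
{$ifdef MSWINDOWS}{$APPTYPE CONSOLE}{$endif}

uses
  Classes, SysUtils;

const
  // UTF-8 signature (BOM): the bytes EF BB BF
  UTF8_BOM: array[0..2] of Byte = ($EF, $BB, $BF);

// Hypothetical helper, not part of SynCommons: writes the BOM only when
// the stream is still empty, and returns True if it did so.
function WriteUtf8BomIfEmpty(aStream: TStream): Boolean;
begin
  Result := aStream.Size = 0;
  if Result then
    aStream.WriteBuffer(UTF8_BOM, SizeOf(UTF8_BOM));
end;

var
  Stream: TMemoryStream; // stands in for the TFileStream of the CSV file
begin
  Stream := TMemoryStream.Create;
  try
    // First call: the stream is empty, so the BOM is written.
    Writeln('BOM written on first call: ', WriteUtf8BomIfEmpty(Stream));
    // Second call: the stream already has content, so nothing is added.
    Writeln('BOM written on second call: ', WriteUtf8BomIfEmpty(Stream));
    Writeln('Stream size in bytes: ', Stream.Size);
  finally
    Stream.Free;
  end;
end.
```

In a real export, the same stream would then be passed to TTextWriter.Create as in the first post, and a Linux build could simply skip the call.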


#5 2018-04-16 12:56:03

edwinsn
Member
Registered: 2010-07-02
Posts: 1,215

Re: [suggestion] - Add TTextWriter.AddUtf8BOMAtBeginning()

@mpv,

Good point, it makes sense. So far I only use Windows.

Last edited by edwinsn (2018-04-16 12:56:29)



