#1 2010-06-21 16:31:04

From: France
Registered: 2010-06-21
Posts: 14,134

Unicode, UTF-8, Delphi 2009/2010 and previous Delphi compiler

Our SQLite3 Framework is fully Unicode compatible, that is, it compiles under Delphi 2009/2010/XE. The code has been deeply rewritten and tested in order to support the String=UnicodeString paradigm of these compilers.

Since our SQLite3 framework is natively UTF-8 (this is the best character encoding for fast text streaming and parsing, JSON in particular, and it is natively supported by the SQLite3 engine), we had to fix the way our framework used strings, in order to handle all versions of Delphi (even pre-Unicode versions, especially the Delphi 7 version we like so much), and to provide compatibility with the Free Pascal Compiler.

Some string types have been defined, and are used throughout the code for best cross-compiler efficiency (avoiding most conversions between formats):
- RawUTF8 is used for every internal data usage, since both SQLite3 and JSON do expect UTF-8 encoding;
- WinAnsiString is used where a WinAnsi-encoded AnsiString (code page 1252) is needed;
- generic string for i18n (e.g. in unit SQLite3i18n), i.e. text ready to be used within the VCL, as either AnsiString (for Delphi 2 to 2007) or UnicodeString (for Delphi 2009/2010/XE);
- RawUnicode in some technical places (e.g. direct Win32 *W() API call in Delphi 7) - note: this type is NOT compatible with Delphi 2009/2010/XE UnicodeString;
- RawByteString for byte storage (e.g. for FileFromString() function);
- some special conversion functions to be used for Delphi 2009/2010/XE UnicodeString  (defined inside {$ifdef UNICODE}...{$endif} blocks);
- never use AnsiString directly, but one of the types above.
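
To make the difference between these encodings concrete, here is a small illustration written in Python (it is not framework code, and the sample text is only an assumption for the demo). It stores the same text as UTF-8 (what a RawUTF8 holds), as code page 1252 (what a WinAnsiString holds) and as UTF-16 (what the Win32 *W() APIs expect), then shows what happens when UTF-8 bytes are wrongly read as WinAnsi:

```python
# Illustration only: the same text in the three encodings discussed above.
text = "caf\u00e9"  # "café", with a single accented character

utf8 = text.encode("utf-8")        # one byte per ASCII char, two for "é"
winansi = text.encode("cp1252")    # one byte per char in code page 1252
utf16 = text.encode("utf-16-le")   # two bytes per char here

print(len(utf8), len(winansi), len(utf16))  # prints: 5 4 8

# Reading UTF-8 bytes as if they were WinAnsi silently garbles the text,
# which is why each encoding deserves its own explicit string type.
misread = utf8.decode("cp1252")
print(misread)  # prints: cafÃ©
```

The bytes are valid in both interpretations, so nothing fails at run time: only a dedicated type per encoding lets the compiler catch this kind of mix-up.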

Before any release, all regression unit tests are run with the following compilers:
- Delphi 7, with and without our Enhanced Run Time Library;
- Delphi 2007;
- Delphi 2010 (and in some cases, Delphi 2009 - but we assume that if it works with Delphi 2010, it will work with Delphi 2009).
We don't allow any warning during the compilation process: all string conversions between the types above are made explicitly in the framework's code.


#2 2010-10-27 06:59:45

From: France
Registered: 2010-06-21
Posts: 14,134

Re: Unicode, UTF-8, Delphi 2009/2010 and previous Delphi compiler

These types are defined in the SynCommons unit, which is the root of most units of the framework.

The functions used to directly handle the new RawUTF8 type have been optimized for speed, and will avoid most implicit conversions.
For Delphi 2009/2010/XE, we don't allow any compiler warning about implicit string conversion, because such conversions could lead to speed penalties or data loss.
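
The data loss risk can be sketched in a few lines of Python (again an illustration, not framework code; the Greek sample text is an assumption). Converting text to code page 1252 drops every character that the code page cannot represent, while a round trip through UTF-8 keeps everything:

```python
# Illustration only: a lossy conversion to an Ansi code page versus UTF-8.
text = "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1 2010"  # Greek word followed by " 2010"

# A silent conversion typically replaces unsupported characters with "?".
lossy = text.encode("cp1252", errors="replace").decode("cp1252")
print(lossy)  # prints: ?????? 2010

# A strict conversion reports the problem instead of hiding it.
try:
    text.encode("cp1252")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# UTF-8 covers the whole Unicode range, so the round trip is exact.
round_trip = text.encode("utf-8").decode("utf-8")
print(strict_failed, round_trip == text)  # prints: True True
```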

The functions used to convert RawUTF8 into or from string were also integrated into the i18n unit.

Unit testing is a good way of avoiding the most obvious regressions between Delphi compiler versions.

Why did we use a UTF-8 type, and not a UCS-2 type?

Before Delphi 2009, the AnsiString=String paradigm made the compiler mostly Ansi compatible. Unicode characters (UCS-2 codes, to be more precise) were handled with implicit WideString conversions, which is not good for speed and efficiency.
The UTF-8 encoding allows the whole Unicode range, and is still compatible with Ansi orientation of the string.
There is a #0 char at the end of the UTF-8 string, which is handy for fast parsing.
As for speed, UTF-8 encoding and decoding is very fast (we rewrote some functions to be even faster than the ones provided by the Delphi RTL; under Delphi 2009/2010/XE, UTF-8 was not so well handled).
UTF-8 is also a good choice for storage size and memory usage (the only exception is text made mostly of Asian characters, which usually take three bytes each in UTF-8 against two in UTF-16).
And the World Wide Web, JSON and SQLite3 all expect UTF-8 as default encoding nowadays...
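
The storage and parsing arguments above can be checked with a short Python sketch (illustration only; the sample strings and the sizes() helper are assumptions made for the demo):

```python
# Illustration only: UTF-8 versus UTF-16 sizes, and byte-level JSON scanning.
latin = "The quick brown fox"     # 19 ASCII characters
cjk = "\u65e5\u672c\u8a9e"        # 3 Asian (CJK) characters

def sizes(s):
    """Return (UTF-8 byte count, UTF-16 byte count) for the text s."""
    return len(s.encode("utf-8")), len(s.encode("utf-16-le"))

print(sizes(latin))  # prints: (19, 38)
print(sizes(cjk))    # prints: (9, 6)

# Every byte of a multi-byte UTF-8 sequence is >= 0x80, so ASCII delimiters
# such as the JSON quote (0x22) or a #0 terminator never appear inside one.
json_bytes = ('{"name":"' + cjk + '"}').encode("utf-8")
quote_count = json_bytes.count(b'"')
print(quote_count, 0 in json_bytes)  # prints: 4 False
```

A parser can therefore scan UTF-8 content one byte at a time for its delimiters, without decoding the text first.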

