#1 mORMot 1 » Truncated variable length strings in mORMotVCL » 2016-09-17 09:22:23

merlin352
Replies: 0

Hello

If a RawUTF8 field is defined without an index modifier, mORMotVCL sets the DataSize to 1, so the string gets truncated. This happens in

procedure TSynSQLTableDataSet.InternalInitFieldDefs;
.
.
.
.
    sftUTF8Text: begin
      DataSize := fTable.FieldLengthMax(F,True);  // <--- Here 1 is returned if no MaxSize defined
      {$ifndef UNICODE} // for Delphi 2009+ TWideStringField = UnicodeString! 
      if fForceWideString then
        DBType := ftWideString else
      {$endif}
        DBType := ftDefaultVCLString;
    end;

I propose the following change :

procedure TSynSQLTableDataSet.InternalInitFieldDefs;
.
.
.
.
    sftUTF8Text: begin
      DataSize := fTable.FieldLengthMax(F,false);
      if DataSize = 0 then                             // variable length
        DataSize := dsMaxStringSize;                   // this is the maximum size DB unit can handle.
      {$ifndef UNICODE} // for Delphi 2009+ TWideStringField = UnicodeString! 
      if fForceWideString then
        DBType := ftWideString else
      {$endif}
        DBType := ftDefaultVCLString;
    end;
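
To illustrate the fallback rule on its own, here is a minimal sketch outside of mORMotVCL. StringFieldSize is a hypothetical helper, not part of mORMot, and I assume FPC, where the DB unit provides dsMaxStringSize:

```pascal
program DataSizeFallback;
{$ifdef FPC}{$mode objfpc}{$H+}{$endif}

uses
  DB; // provides dsMaxStringSize (assumption: compiled with FPC)

// Hypothetical helper, not part of mORMotVCL: returns the size to use for a
// string field, where DeclaredMax = 0 means "no maximum length declared".
function StringFieldSize(DeclaredMax: Integer): Integer;
begin
  if DeclaredMax = 0 then
    Result := dsMaxStringSize // variable length: use the largest size DB accepts
  else
    Result := DeclaredMax;    // a declared maximum is kept as it is
end;

begin
  WriteLn(StringFieldSize(0));  // falls back to dsMaxStringSize
  WriteLn(StringFieldSize(40)); // keeps the declared maximum
end.
```

The point is only that a length of 0 should be read as "unbounded" and not silently turned into 1.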

Greetings

#2 Re: mORMot 1 » UTF8 encoded strings in FPC and Lazarus » 2016-09-08 18:20:41

Well, for Delphi the patch is safe, as it changes nothing.

For Lazarus the thing is also clear: it is ALWAYS UTF8. All visible components of the LCL expect a UTF8 string, independent of the version. But I do not know the mORMot code well enough to say whether this is true for all the places where codepage conversion is used. With my proposal it should be easy to change the call from CurrentAnsiConvert to SystemAnsiConvert.

You are right that for FPC things are a little more complicated, as the RTL recently switched from Ansi-encoded to UTF8.

On the other hand, the existing code is not safe for anything other than Windows, as it always assumes for non-Windows OSes that the codepage is 1252, which is rarely the case. The problem is that only Windows knows about ACP codes. The standard procedure to find the system encoding under Unix-like OSes (like Linux, Android, OS X) is

function GetSystemEncoding: string;
{$IFDEF Unix}
var
  Lang: string;
  i: integer;
begin
  // GetEnv is assumed to come from FPC's Dos unit;
  // take the first locale variable that is set
  Lang := GetEnv('LC_ALL');
  if Length(Lang) = 0 then
  begin
    Lang := GetEnv('LC_MESSAGES');
    if Length(Lang) = 0 then
      Lang := GetEnv('LANG');
  end;
  // the encoding name follows the dot, e.g. 'de_DE.UTF-8'
  i := Pos('.', Lang);
  if (i > 0) and (i <= Length(Lang)) then
    Result := Copy(Lang, i + 1, Length(Lang) - i)
  else
    Result := 'UTF-8';
end;
{$ELSE}
begin
  Result := 'UTF-8';
end;
{$ENDIF}

But then you have a string which describes the character encoding, not an ACP code. This can be solved, but it can lead to a codepage that is not supported by mORMot.
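
As a rough sketch of how such an encoding name could be mapped to a codepage number: EncodingNameToCodePage is a hypothetical helper (nothing like it exists in mORMot as far as I know), the table is deliberately tiny, and 0 stands for "unknown, the caller has to decide what to do". The numbers are the usual Windows codepage identifiers.

```pascal
program EncodingToCodePage;
{$ifdef FPC}{$mode objfpc}{$H+}{$endif}

uses
  SysUtils;

// Hypothetical helper: maps an encoding name as found in LANG / LC_ALL
// (e.g. 'UTF-8', 'ISO-8859-15@euro') to a Windows-style codepage number.
// Returns 0 when the name is not in the (intentionally small) table.
function EncodingNameToCodePage(const Name: string): Integer;
var
  N: string;
  p: Integer;
begin
  N := UpperCase(Name);
  // drop a locale modifier such as '@euro'
  p := Pos('@', N);
  if p > 0 then
    N := Copy(N, 1, p - 1);
  // ignore '-' and '_' so that 'utf8', 'UTF-8' and 'ISO_8859-1' all match
  N := StringReplace(N, '-', '', [rfReplaceAll]);
  N := StringReplace(N, '_', '', [rfReplaceAll]);
  if N = 'UTF8' then
    Result := 65001
  else if N = 'ISO88591' then
    Result := 28591
  else if N = 'ISO885915' then
    Result := 28605
  else if N = 'CP1252' then
    Result := 1252
  else if N = 'KOI8R' then
    Result := 20866
  else
    Result := 0; // unknown encoding name
end;

begin
  WriteLn(EncodingNameToCodePage('UTF-8'));
  WriteLn(EncodingNameToCodePage('ISO-8859-15@euro'));
  WriteLn(EncodingNameToCodePage('EUC-JP')); // not in the table
end.
```

The last call shows the problem mentioned above: an encoding that is perfectly valid on the system may have no entry that mORMot can handle.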

Some years ago I wrote a set of units to support ALL codepages for which a Unicode description exists (that is somewhat more than Windows knows). It can do codepage conversion internally (direct conversion between different codepages, multibyte codepage support for Asian languages and so on, EBCDIC support, upper- and lowercase support), but can also fall back to system calls (iconvenc under *nix). It also has a tool to generate Pascal source code from a Unicode description file, which can then be integrated in the project.

If you are interested in a more global support for codepages I could update these units. But there is some work to do, especially adapting them to strings that carry codepage info in the header, and making the whole thing compile under Delphi.

#3 mORMot 1 » UTF8 encoded strings in FPC and Lazarus » 2016-09-08 06:28:35

merlin352
Replies: 10

Good Morning


CurrentAnsiConvert is always initialized with a converter for the currently used Windows codepage. For FPC and Lazarus this is not correct, as Lazarus (and therefore all the visible components like grids etc.) uses UTF8-encoded strings. As CurrentAnsiConvert is used all over the code, I propose the following changes.

1. Insert a new variable SystemAnsiConvert in SynCommons

  /// global TSynAnsiConvert instance to handle current system encoding
  // - this is the encoding as used by the AnsiString Delphi, so will be used
  // before Delphi 2009 to speed-up VCL string handling (especially for UTF-8)
  // - as FPC and Lazarus use UTF8 encoding this is initialized with TSynAnsiUTF8
  // - this instance is global and instantiated during the whole program life time
  CurrentAnsiConvert: TSynAnsiConvert;

  /// global TSynAnsiConvert instance to handle current system encoding
  // - this is the encoding as used by the System
  // - this instance is global and instantiated during the whole program life time
  SystemAnsiConvert: TSynAnsiConvert;

2. Changes in TSynAnsiConvert.Engine

class function TSynAnsiConvert.Engine(aCodePage: cardinal): TSynAnsiConvert;
var i: integer;
begin
  if SynAnsiConvertList=nil then begin
    GarbageCollectorFreeAndNil(SynAnsiConvertList,TObjectList.Create);
    SystemAnsiConvert := TSynAnsiConvert.Engine(GetACP);
    {$ifdef FPC}
    CurrentAnsiConvert := TSynAnsiConvert.Engine(CP_UTF8) as TSynAnsiUTF8;
    {$else}
    CurrentAnsiConvert := TSynAnsiConvert.Engine(GetACP);
    {$endif}
    WinAnsiConvert := TSynAnsiConvert.Engine(CODEPAGE_US) as TSynAnsiFixedWidth;
    UTF8AnsiConvert := TSynAnsiConvert.Engine(CP_UTF8) as TSynAnsiUTF8;
  end;

If CurrentAnsiConvert is used somewhere in the code where in fact the system codepage is meant, this should make it easy to change the source.

Greetings

#5 mORMot 1 » Remove mORMoti18n dependency from mORMotUI » 2016-09-05 15:25:21

merlin352
Replies: 2

Hello

mORMoti18n is incompatible with FPC/Lazarus because of the different resource format.

mORMotUI uses mORMoti18n, but in practice it only uses U2S and S2U from this unit.

I propose to change all references in mORMotUI from U2S to UTF8ToString and from S2U to StringToUTF8 (both in SynCommons), and to remove the dependency on mORMoti18n.

As a consequence, mORMotUI becomes usable under FPC/Lazarus.

Greetings
