#1 2012-12-19 07:01:42

style-sheets
Member
Registered: 2011-06-29
Posts: 5

SynCommons.pas Int32ToUTF8 question

Hi,

I was looking for a faster IntToStr implementation and discovered SynCommons.pas.

I read in the comments that it should be 3x faster than the normal implementation. I tried this Delphi 2010 code, but I'm not sure if I'm using it or interpreting the results correctly:

// ----------------------------------------------------------- //
procedure TForm1.Button1Click(Sender: TObject);
var
   sText   : String;
   TheTime : Integer;
   i       : Integer;
begin
     TheTime   := GetTickCount();
     for i := 1 to 10000000 do    // Repeat the tests 10 million times
         sText := IntToStr(i);
     Memo1.Lines.Add('Classic InToStr timing = ' + IntToStr(GetTickCount() - TheTime) + ' milli-seconds. sText = ' + sText);

     TheTime   := GetTickCount();
     for i := 1 to 10000000 do    // Repeat the tests 10 million times
         sText := Int32ToUTF8(i);
     Memo1.Lines.Add('Int32ToUTF8 timing = ' + IntToStr(GetTickCount() - TheTime) + ' milli-seconds. sText = ' + sText);
end;
// ----------------------------------------------------------- //

Int32ToUTF8() is only 20 or 30 milli-seconds faster than IntToStr().

I'm not familiar with UTF-8; I suspect some conversion is slowing Int32ToUTF8() down.

Am I missing something here?

Thanks!

Last edited by style-sheets (2012-12-19 07:04:25)

#2 2012-12-19 07:16:26

style-sheets
Member
Registered: 2011-06-29
Posts: 5

Re: SynCommons.pas Int32ToUTF8 question

OK, I changed the test code to this:

// --------------------------------------------------------------- //
var
   Utf8    : RawUTF8;
   TheTime : Integer;
   i       : Integer;
begin
     TheTime   := GetTickCount();
     for i := 1 to 10000000 do
         Utf8  := Int32ToUTF8(i);
end;
// --------------------------------------------------------------- //

and now Int32ToUTF8() is almost 10 times faster than IntToStr():

    Classic InToStr timing = 1813 milli-seconds. sText = 10000000
    Int32ToUTF8 timing = 187 milli-seconds. sText = 10000000


The only problem is: I use String, not RawUTF8, in my projects. Whatever gain I enjoy from using Int32ToUTF8() is actually lost when converting back to string.

My question is: is there a similar routine that works with strings right away?

PS. The same question applies to other optimized routines. For instance, Trim, Pos, StringReplaceAll, etc... they all gave me really good results, but they also use RawUTF8.

Thanks!

Last edited by style-sheets (2012-12-19 09:37:16)

#3 2012-12-19 14:32:24

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,679
Website

Re: SynCommons.pas Int32ToUTF8 question

Yes, you are right: Int32ToUTF8() returns a RawUTF8, which is a UTF-8 encoded AnsiString kind of variable.
If you need a string variable, the default conversion from RawUTF8 to string will be very slow, due to the slow implementation in the "official" System.pas RTL unit.
(This is why SynCommons.pas has a whole set of functions and classes bypassing System.pas and SysUtils.pas to provide the best speed possible, and to work natively with our UTF-8 content, which is the standard of our mORMot framework.)

What you can do is perform the conversion directly:

{$ifdef UNICODE}
function Ansi7ToString(const Text: RawByteString): string;
var i: integer;
begin
  SetString(result,nil,length(Text));
  for i := 0 to length(Text)-1 do
    PWordArray(result)[i] := PByteArray(Text)[i]; // no conversion for 7 bit Ansi
end;
{$else}
function Ansi7ToString(const Text: RawByteString): string;
begin
  result := Text; // if we are SURE this text is 7 bit Ansi -> direct assign
end;
{$endif}

Using this as

  sText := Ansi7ToString(Int32ToUTF8(i));

will be faster than direct assignment of RawUTF8 into string.

#4 2012-12-19 18:55:48

style-sheets
Member
Registered: 2011-06-29
Posts: 5

Re: SynCommons.pas Int32ToUTF8 question

Thank you!

Given that most Delphi developers use String rather than RawUTF8, I wonder why you chose not to use string?

I strongly suspect that the use of RawUTF8 prevents many (if not most) developers from using these fast routines, simply because (1) the string conversion makes the speed gain a lot less significant and (2) it's a lot more difficult to use them in existing projects.

One obvious example is when I have to pass the result of, say, IntToStr to another component that expects String.

According to my tests, even when using Ansi7ToString (I tried UTF8ToString() as well), the overall speed gain (after string conversion) was marginal at best, and sometimes the string conversion routine made things so bad that the overall operation was actually *slower*.

For these reasons, I think having direct string-based routines would be best (performance-wise).

Any chance we'll see something like this? :)

#5 2012-12-19 22:08:09

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,679
Website

Re: SynCommons.pas Int32ToUTF8 question

I suspect the bottleneck in your test is memory allocation, i.e. the memory manager.
Try our SynScaleMM, for instance.

Those fast routines are meant to be used within mORMot, at the business domain level.
They were not designed for use in arbitrary projects, but to exploit the full power of the UTF-8 encoding scheme.

You can write your own faster version of the RTL, as we did for Delphi 7 and 2007.
But it would be worth it only after active profiling.
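
As a rough illustration of what a string-based replacement could look like, here is a minimal sketch assuming Delphi 2009 or later (where UnicodeString exists). IntToUnicodeStr is a hypothetical name, not a SynCommons.pas or RTL function, and it has not been benchmarked; the final SetString still allocates a string, so profile before relying on it:

```pascal
program DirectIntToStr;
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Hypothetical helper for illustration only (not in SynCommons.pas or the RTL).
// Digits are written backwards into a stack buffer, then copied once into
// the result, so no intermediate AnsiString/RawUTF8 is created.
function IntToUnicodeStr(Value: Integer): UnicodeString;
var
  Buf: array[0..11] of WideChar; // large enough for '-2147483648'
  P: Integer;
  U: Cardinal;
  Neg: Boolean;
begin
  Neg := Value < 0;
  if Neg then
    U := Cardinal(-Int64(Value)) // Int64 avoids overflow on Low(Integer)
  else
    U := Cardinal(Value);
  P := High(Buf) + 1;
  repeat
    Dec(P);
    Buf[P] := WideChar(Ord('0') + Integer(U mod 10));
    U := U div 10;
  until U = 0;
  if Neg then
  begin
    Dec(P);
    Buf[P] := '-';
  end;
  SetString(Result, PWideChar(@Buf[P]), High(Buf) + 1 - P);
end;

var
  i: Integer;
begin
  // Cross-check against the RTL on a few values, including the extremes.
  Assert(IntToUnicodeStr(0) = IntToStr(0));
  Assert(IntToUnicodeStr(Low(Integer)) = IntToStr(Low(Integer)));
  Assert(IntToUnicodeStr(High(Integer)) = IntToStr(High(Integer)));
  for i := -1000 to 1000 do
    Assert(IntToUnicodeStr(i) = IntToStr(i));
  Writeln(IntToUnicodeStr(-2012));
end.
```

Whether this beats IntToStr() in a given application depends mostly on the memory manager, since both versions end with one string allocation per call.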

#6 2012-12-20 07:24:17

style-sheets
Member
Registered: 2011-06-29
Posts: 5

Re: SynCommons.pas Int32ToUTF8 question

Thank you but...

I'm actually using TopMemory; I can't use SynScaleMM (it crashed several times on my main project). As far as I can tell, OmniThreadLibrary & SynScaleMM hate each other.

The only time I got a better speed was when I used StringReplaceAll() & Int32ToUTF8(); otherwise the replacement routines *with* string conversion made speed worse (string conversion is the one to blame, but since I'm interested in the end result...)

That's really too bad. SynCommons.pas really does contain some seriously fast stuff, and people would greatly benefit from it if it were made general-purpose.

FWIW, here's the code that I tried (TopMemory used):

// ------------------------------------------------------------------------------ //
procedure TForm1.Button1Click(Sender: TObject);
const
     TestString = '  (const S, OldPattern, NewPattern تلاوة : RawUTF8)  Test Data  ';
var
   Utf8    : RawUTF8;
   sText   : String;
   TheTime : Integer;
   i, k    : Integer;
begin
     // ------------------------------------------------------------- //
     // Test Trim
     TheTime   := GetTickCount();
     for i := 1 to 1000000 do
         sText := SysUtils.Trim(TestString);
     Memo1.Lines.Add('Classic Trim() timing = ' + IntToStr(GetTickCount() - TheTime) + ' milli-seconds.' + sText);  // }

     TheTime   := GetTickCount();
     for i := 1 to 1000000 do
         sText := UTF8ToString(Trim(TestString));
     Memo1.Lines.Add('SynCommons Trim() timing = ' + IntToStr(GetTickCount() - TheTime) + ' milli-seconds.' + sText);  // }
     Memo1.Lines.Add('------------');

     // ------------------------------------------------------------- //
     // Test Pos
     TheTime   := GetTickCount();
     for i := 1 to 1000000 do
         k     := System.Pos(' Da', TestString);
     Memo1.Lines.Add('Classic Pos() timing = ' + IntToStr(GetTickCount() - TheTime) + ' milli-seconds.' + IntToStr(k));  // }

     TheTime   := GetTickCount();
     for i := 1 to 1000000 do
         k     := Pos(' Da', TestString);
     Memo1.Lines.Add('SynCommons Pos() timing = ' + IntToStr(GetTickCount() - TheTime) + ' milli-seconds.' + IntToStr(k));  // }
     Memo1.Lines.Add('------------');

     // ------------------------------------------------------------- //
     // Test IntToStr
     TheTime   := GetTickCount();
     for i := 1 to 1000000 do
         sText := IntToStr(i);
     Memo1.Lines.Add('Classic IntToStr() timing = ' + IntToStr(GetTickCount() - TheTime) + ' milli-seconds.' + sText);  // }

     TheTime   := GetTickCount();
     for i := 1 to 1000000 do
         sText := Ansi7ToString(Int32ToUTF8(i));
     Memo1.Lines.Add('SynCommons Int32ToUTF8() timing = ' + IntToStr(GetTickCount() - TheTime) + ' milli-seconds.' + sText);  // }
     Memo1.Lines.Add('------------');

     // ------------------------------------------------------------- //
     // Test StringReplace
     TheTime   := GetTickCount();
     for i := 1 to 1000000 do
         sText := SysUtils.StringReplace(TestString, 'Dat', 'NEW_DAT', [rfReplaceAll]);  // same input and case-sensitive match as the SynCommons test below
     Memo1.Lines.Add('Classic StringReplace() timing = ' + IntToStr(GetTickCount() - TheTime) + ' milli-seconds.' + sText);  // }

     TheTime   := GetTickCount();
     for i := 1 to 1000000 do
         sText := UTF8ToString(SynCommons.StringReplaceAll(TestString, 'Dat', 'NEW_DAT'));
     Memo1.Lines.Add('SynCommons StringReplace() timing = ' + IntToStr(GetTickCount() - TheTime) + ' milli-seconds.' + sText);  // }
     Memo1.Lines.Add('------------');
     // ------------------------------------------------------------- //
end;
// ------------------------------------------------------------------------------ //

#7 2012-12-20 10:01:20

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,679
Website

Re: SynCommons.pas Int32ToUTF8 question

Your code is not testing the routines in a multi-threaded process, but in only one thread/core.
I suppose memory allocation would be a true bottleneck. TopMemory is great, but seems a bit outdated. I did not find any upgrade since revision 3.55.
SynScaleMM did have better performance than TopMemory. Did you try ScaleMM2, which seems more stable than ScaleMM1/SynScaleMM?
http://code.google.com/p/scalemm/
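
To check whether the memory manager really is the bottleneck, the same loop can be run on several threads at once and compared with the single-threaded timing. A minimal sketch, assuming a Windows console project; TWorker, ThreadCount and the iteration count are illustrative choices, not anything taken from SynCommons.pas:

```pascal
program ThreadedIntToStrBench;
{$APPTYPE CONSOLE}
uses
  Windows, SysUtils, Classes;

type
  // Each worker repeats the allocation-heavy loop from the original test.
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWorker.Execute;
var
  i: Integer;
  s: string;
begin
  for i := 1 to 1000000 do
    s := IntToStr(i); // one string allocation per iteration
end;

const
  ThreadCount = 4; // assumption: adjust to the number of cores available

var
  Workers: array[0..ThreadCount - 1] of TWorker;
  Start: Cardinal;
  n: Integer;
begin
  Start := GetTickCount;
  // Create(False) starts each thread immediately.
  for n := 0 to ThreadCount - 1 do
    Workers[n] := TWorker.Create(False);
  // FreeOnTerminate is False by default, so WaitFor and Free are safe here.
  for n := 0 to ThreadCount - 1 do
  begin
    Workers[n].WaitFor;
    Workers[n].Free;
  end;
  Writeln(ThreadCount, ' threads: ', GetTickCount - Start, ' ms');
end.
```

If the elapsed time grows noticeably with ThreadCount even though each thread does the same amount of work on its own core, contention in the memory manager is a likely suspect.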

Apart from the low-level functions, there is much more to do if you want your application to scale in a multi-threaded environment.
What we try to do with mORMot is to let it scale as much as possible.
The main trick is to avoid any unnecessary memory allocation.
Functions returning a string (or RawUTF8) are slow by design in this respect.
You would do better to use dedicated classes that avoid memory allocations. This is what we do for all our DB or JSON processing (e.g. avoiding memory copies, and working with in-place parsing and pointers).
In short: if you want to scale, forget about basic RTL functions and the "string" type, and create your own dedicated process - this was the purpose of the mORMot SynCommons.pas core.
For the same reason, we used our UTF-8 type (Delphi is not natively UTF-8), to avoid conversions during internal ORM processing, with the benefit of being ready to work with pre-Unicode versions of Delphi.
See http://blog.synopse.info/post/2011/05/2 … plications
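
The idea of avoiding one allocation per value can be sketched as follows. TIntListWriter is a hypothetical class written for this example (it is not how SynCommons.pas implements its own writers); it appends digits straight into one preallocated UnicodeString instead of building a temporary string for every integer. Delphi 2009 or later is assumed:

```pascal
program AppendWithoutTemps;
{$APPTYPE CONSOLE}
uses
  SysUtils;

type
  // Hypothetical writer, for illustration only.
  TIntListWriter = class
  private
    fBuf: UnicodeString; // preallocated storage, grown by doubling
    fLen: Integer;       // number of characters actually used
    procedure Reserve(Extra: Integer);
  public
    constructor Create(InitialCapacity: Integer);
    procedure AddChar(C: WideChar);
    procedure AddInt(Value: Cardinal);
    function Text: UnicodeString;
  end;

constructor TIntListWriter.Create(InitialCapacity: Integer);
begin
  inherited Create;
  if InitialCapacity < 16 then
    InitialCapacity := 16;
  SetLength(fBuf, InitialCapacity);
end;

procedure TIntListWriter.Reserve(Extra: Integer);
var
  NewCap: Integer;
begin
  if fLen + Extra <= Length(fBuf) then
    Exit; // common case: no allocation at all
  NewCap := Length(fBuf) * 2;
  if NewCap < fLen + Extra then
    NewCap := fLen + Extra;
  SetLength(fBuf, NewCap);
end;

procedure TIntListWriter.AddChar(C: WideChar);
begin
  Reserve(1);
  Inc(fLen);
  fBuf[fLen] := C;
end;

procedure TIntListWriter.AddInt(Value: Cardinal);
var
  Tmp: array[0..9] of WideChar; // a Cardinal has at most 10 digits
  P, Count: Integer;
begin
  P := Length(Tmp);
  repeat
    Dec(P);
    Tmp[P] := WideChar(Ord('0') + Integer(Value mod 10));
    Value := Value div 10;
  until Value = 0;
  Count := Length(Tmp) - P;
  Reserve(Count);
  Move(Tmp[P], fBuf[fLen + 1], Count * SizeOf(WideChar));
  Inc(fLen, Count);
end;

function TIntListWriter.Text: UnicodeString;
begin
  Result := Copy(fBuf, 1, fLen); // the only copy, made once at the end
end;

var
  W: TIntListWriter;
  i: Integer;
begin
  W := TIntListWriter.Create(64);
  try
    for i := 1 to 5 do
    begin
      if i > 1 then
        W.AddChar(',');
      W.AddInt(Cardinal(i));
    end;
    Writeln(W.Text); // prints 1,2,3,4,5
  finally
    W.Free;
  end;
end.
```

With a large enough initial capacity, the loop performs no allocation per value; only the final Text call copies the result.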

What are the results of your test code on your computer?
There is a big gap between such a simple loop-based benchmark and a real application.
I'm not sure that making our optimized functions "general-purpose" would be worth it. The bottleneck lies in the overall coding style.

I just wrote a blog article about this point.
See http://blog.synopse.info/post/2012/12/2 … ke-it-fast

#8 2012-12-25 03:54:53

style-sheets
Member
Registered: 2011-06-29
Posts: 5

Re: SynCommons.pas Int32ToUTF8 question

A lot of things to think about, thanks for the detailed reply!

#9 2014-02-21 16:14:39

louis_riviera
Member
Registered: 2013-09-23
Posts: 61

Re: SynCommons.pas Int32ToUTF8 question

dwUtils.pas has a super fast version of IntToStr32/64, Unicode version of course. :)

IMHO UTF8 is not worth it. Too many problems. You end up working AGAINST the compiler.

But if you want even more raw performance, you should write it with ICC (Intel C++).

Last edited by louis_riviera (2014-02-21 16:50:35)

#10 2014-02-21 23:00:14

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,679
Website

Re: SynCommons.pas Int32ToUTF8 question

Not against the compiler, but against the slow parts of the RTL.

Our RawUtf8 type is as native as string.

Only exception is the NextGen compiler.
But this compiler is not the Delphi compiler any more.
Some monstrosity.
