#1 Re: mORMot 1 » About ZLib/SynLZ/Snappy/LZ4 » 2017-08-14 10:58:07

I still see a few opportunities for improvement. You use plain x64 instructions for zero init. These can be optimized by using SSE 128 registers. 
I'd have to look into it more.

#2 Re: Low level and performance » New crc32c() function using optimized asm and SSE 4.2 instruction » 2015-07-11 17:09:44

Fixed all the issues with testing and added some stress testing to the suite.
The result on my AMD laptop.

Starting testing
Zipcodes 00000..99999
BobJenkins Lookup3 took 127 ms, 272472 ticks
BobJenkins Lookup3 has 34 max low collisions
BobJenkins Lookup3 has 34 max high collisions
BobJenkins Lookup3 has 1 real collisions
Zipcodes 00000..99999
MurmurHash3 took 88 ms, 188765 ticks
MurmurHash3 has 34 max low collisions
MurmurHash3 has 34 max high collisions
MurmurHash3 has 1 real collisions
Zipcodes 00000..99999
CRC32c took 93 ms, 199934 ticks
CRC32c has 23 max low collisions
CRC32c has 24 max high collisions
CRC32c has 0 real collisions
Zipcodes 00000..99999
FNV_1a_Meiyan took 65 ms, 140930 ticks
FNV_1a_Meiyan has 32 max low collisions
FNV_1a_Meiyan has 32 max high collisions
FNV_1a_Meiyan has 0 real collisions
integers
BobJenkins Lookup3 took 111 ms, 238801 ticks
BobJenkins Lookup3 has 35 max low collisions
BobJenkins Lookup3 has 33 max high collisions
BobJenkins Lookup3 has 1 real collisions
integers
MurmurHash3 took 32 ms, 69188 ticks
MurmurHash3 has 34 max low collisions
MurmurHash3 has 36 max high collisions
MurmurHash3 has 0 real collisions
integers
CRC32c took 47 ms, 102769 ticks
CRC32c has 16 max low collisions
CRC32c has 16 max high collisions
CRC32c has 0 real collisions
integers
FNV_1a_Meiyan took 25 ms, 53990 ticks
FNV_1a_Meiyan has 47 max low collisions
FNV_1a_Meiyan has 23 max high collisions
FNV_1a_Meiyan has 0 real collisions
Loading 120 MB password file...
Done loading, starting the hash...
120 MB of Passwords
BobJenkins Lookup3 took 3714 ms, 7960607 ticks
BobJenkins Lookup3 has 2593 max low collisions
BobJenkins Lookup3 has 2595 max high collisions
BobJenkins Lookup3 has 1 real collisions
120 MB of Passwords
MurmurHash3 took 2562 ms, 5491776 ticks
MurmurHash3 has 2625 max low collisions
MurmurHash3 has 2620 max high collisions
MurmurHash3 has 1 real collisions
120 MB of Passwords
CRC32c took 2647 ms, 5674922 ticks
CRC32c has 2634 max low collisions
CRC32c has 2603 max high collisions
CRC32c has 2 real collisions
120 MB of Passwords
FNV_1a_Meiyan took 2381 ms, 5103116 ticks
FNV_1a_Meiyan has 2635 max low collisions
FNV_1a_Meiyan has 2622 max high collisions
FNV_1a_Meiyan has 1 real collisions
120 MB of Passwords as fast as possible
BobJenkins Lookup3 took 2852 ms, 6113225 ticks
BobJenkins Lookup3 has 0 max low collisions
BobJenkins Lookup3 has 0 max high collisions
BobJenkins Lookup3 has 0 real collisions
120 MB of Passwords as fast as possible
MurmurHash3 took 1707 ms, 3660022 ticks
MurmurHash3 has 0 max low collisions
MurmurHash3 has 0 max high collisions
MurmurHash3 has 0 real collisions
120 MB of Passwords as fast as possible
CRC32c took 1805 ms, 3869382 ticks
CRC32c has 0 max low collisions
CRC32c has 0 max high collisions
CRC32c has 0 real collisions
120 MB of Passwords as fast as possible
FNV_1a_Meiyan took 1560 ms, 3344962 ticks
FNV_1a_Meiyan has 0 max low collisions
FNV_1a_Meiyan has 0 max high collisions
FNV_1a_Meiyan has 0 real collisions
Done, press a key....

The longer the data to be hashed is, the faster Meiyan performs. 
So far so big issues with collisions.

Can you make your hash functions compatible with the RTL ones?
Your order of parameters does not match Emba's. 

This makes plugin of your hashes needlessly complicated.

I need this code to bridge the difference.

function HashCrc32(const Data; Len, InitData: integer): integer;
begin
  Result:= SynCommons.crc32cfast(InitData, @Data, Len);
end;

This slows down the hashing.
And on the CRC32SSE version that takes up a non-trivial percentage of the time.

Also I don't understand why the hashfunction takes a PAnsiString. Any pointer will do, surely.

Note that I'm using XE7. Dunno if Bobjenkins is optimized as much as possible.

Last remark:
The FNV_1a_Meiyan PurePascal version runs at the same speed as the optimized assembly function.
(At least on my laptop). All other hashes have PurePascal versions that run notably slower.

#3 Re: mORMot 1 » XE9 Beta » 2015-07-11 16:56:06

Would it not be better to use
{$if compilerversion >= xxx}
that way you're not hung up on the exact compiler version.

#4 Re: Low level and performance » New crc32c() function using optimized asm and SSE 4.2 instruction » 2015-06-26 18:59:38

On my laptop: AMD64 crc32cfast is really really slow. 
The following code:

program HashSpeedTest;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils,
  SynCommons in '..\..\Mormot\SynCommons.pas',
  FastDefaults,
  System.Generics.Defaults,
  System.Diagnostics;

var
  HashTest: array[0..10000] of Ansistring;

procedure InitHashTest;
var
  i,j: Integer;
  Len: integer;
  r: AnsiChar;
begin
  for i := Low(HashTest) to High(HashTest) do begin
    len:= Random(30)+3;
    SetLength(HashTest[i],Len);
    for j := 1 to len do begin
      r:= AnsiChar(Chr(Random(250)+1));
      HashTest[i][j]:= r;
    end;
  end;
  WriteLn('Init done');
end;

procedure crc;
var
  Timer: TStopWatch;
  i,a: integer;
  PrevCrc: integer;

begin
  for a := 0 to 5 do begin
    Timer:= TStopWatch.StartNew;
    PrevCRC:= 0;
    for i:= Low(HashTest) to High(HashTest) do begin
      PrevCrc:= crc32cfast(PrevCRC, PAnsiChar(pointer(HashTest[i])), Length(HashTest));
    end;
    Timer.Stop;
    WriteLn(Format('Crc32 took: %d millisec = %d ticks ',[Timer.ElapsedMilliseconds, Timer.ElapsedTicks]));
  end;
end;

procedure Murmur;
var
  Timer: TStopWatch;
  i,a: integer;
  PrevCrc: integer;
begin
  for a := 0 to 5 do begin
    Timer:= TStopWatch.StartNew;
    PrevCRC:= 0;
    for i:= Low(HashTest) to High(HashTest) do begin
      PrevCrc:= MurmurHash3((@HashTest[i])^, Length(HashTest[i]), PrevCrc);
    end;
    Timer.Stop;
    WriteLn(Format('Murmurhash3 took: %d millisec = %d ticks ',[Timer.ElapsedMilliseconds, Timer.ElapsedTicks]));
  end;
end;

procedure BobJenkins;
var
  Timer: TStopWatch;
  i,a: integer;
  PrevCrc: integer;
begin
  for a := 0 to 5 do begin
    Timer:= TStopWatch.StartNew;
    PrevCRC:= 0;
    for i:= Low(HashTest) to High(HashTest) do begin
      PrevCrc:= BobJenkinsHash((@HashTest[i])^, Length(HashTest[i]), PrevCrc);
    end;
    Timer.Stop;
    WriteLn(Format('Bobjenkins took: %d millisec = %d ticks ',[Timer.ElapsedMilliseconds, Timer.ElapsedTicks]));
  end;
end;

begin
  InitHashTest;
  Crc;
  Murmur;
  BobJenkins;
  WriteLn('done, press a key...');
  ReadLn;
end.

Outputs:

Init done
Crc32 took: 107 millisec = 230030 ticks
Crc32 took: 87 millisec = 187325 ticks
Crc32 took: 91 millisec = 195064 ticks
Crc32 took: 91 millisec = 196798 ticks
Crc32 took: 100 millisec = 215848 ticks
Crc32 took: 94 millisec = 202032 ticks
Murmurhash3 took: 0 millisec = 1312 ticks
Murmurhash3 took: 0 millisec = 1386 ticks
Murmurhash3 took: 1 millisec = 2561 ticks
Murmurhash3 took: 1 millisec = 2472 ticks
Murmurhash3 took: 1 millisec = 2464 ticks
Murmurhash3 took: 1 millisec = 2476 ticks
Bobjenkins took: 3 millisec = 7612 ticks
Bobjenkins took: 3 millisec = 7431 ticks
Bobjenkins took: 3 millisec = 6880 ticks
Bobjenkins took: 1 millisec = 3786 ticks
Bobjenkins took: 1 millisec = 3753 ticks
Bobjenkins took: 1 millisec = 3752 ticks
done, press a key...

As you can see I've eliminated the unneeded calls to `UniqueString` so what am I missing. 
crc32c was supposed to be the end-all of hashes?
What am I missing?

#5 Re: Low level and performance » LZO and SynLZ compression units » 2014-11-04 11:35:58

a hashing limitation makes SynLZ sometimes unable to pack continuous blocks of same byte -> SynLZ is perfect for xml/text, but SynLZO is prefered for database files

I would like to use the LZ compression to shrink Bitmap images in my application. 
Because the bitmaps are screenshots it will have many repeating bytes.
Does the above limitation that I cannot use `SynLZ`, but must use `SynLZO` instead?

Board footer

Powered by FluxBB