#1 2011-08-12 18:52:26

BeginEnd;
Member
Registered: 2011-01-08
Posts: 9

Hash32: Lesser collisions?

If subtraction is used alternatively (or in a some more advanced manner),
and that sum too used in the result, then your Hash32() would have
even less collisions? May be also a cumulative sum of s3 would be of use?

s3 := 0;
...
      inc(s1,P^[0]);
    inc(s3, P^[0]);
      inc(s2,s1);
      inc(s1,P^[1]);
    dec(s3,P^[1]);
      inc(s2,s1);
      inc(s1,P^[2]);
    inc(s3,P^[2]);
      inc(s2,s1);
      inc(s1,P^[3]);
    dec(s3,P^[3]);
      inc(s2,s1);
      inc(PtrUInt(P),16);

In http://burtleburtle.net/bob/c/checksum.c , the author (Bob Jenkins)
returns 8 integers as checksum in hash(), so very less chance for collisions?.

Likewise, in your Hash32 (may be you have a better name like BouchezHash()),
if you return multiple integers (say s2,s3) , may be it becomes even less in collisions.
(I don't know why you would mix s2 and s1, losing the fine value of s2 in the result)

May I know the significance of "result := s1 xor (s2 shl 16);"
I think s2 is the one that avoids the collions more (as far as s1, s1 would be same
even if all the bytes were swapped in any combination). So, why not use s2 alone?


JFYI:
Bob Jenkins home page: http://burtleburtle.net/bob/
He also has a hash called lookup3 (http://burtleburtle.net/bob/c/lookup3.c),
64bit: http://burtleburtle.net/bob/c/lookup8.c
which he claims faster than crc32 (http://burtleburtle.net/bob/hash/examhash.html)
and claims 0 collisions.

His ISAAC has been ported in many languages including Pascal/Delphi
(http://home.netsurf.de/wolfgang.ehrhardt/misc_en.html) by
Wolfgang Ehrhardt another Delphi/Speed expert like you smile
(http://home.netsurf.de/wolfgang.ehrhardt/)

Offline

#2 2011-08-13 05:49:12

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,659
Website

Re: Hash32: Lesser collisions?

You've a good panel of algorithms at http://www.strchr.com/hash_functions

crc32 is the fastest and with less collisions (in the BYFOUR version) and its code is short (when you have the lookup tables).
If available, I'll use it instead in my code.
And I've put a kr32 function, which has much less collisions, as default hashing function for TDynArrayHashed.

Our hash32 is simple a DWORD version of the Adler32 algorithm.
I didn't want to reinvent the wheel, just have some hash at hand.

Of course, hash32 may be improved, but it's fairly basic.
Using a sub won't change much the colision rate.
We could get rid of the "mod" at the end of the calculation. But it was there in the Adler32 version...

Offline

#3 2011-08-14 10:43:22

BeginEnd;
Member
Registered: 2011-01-08
Posts: 9

Re: Hash32: Lesser collisions?

Oh no! smile  I shouldn't have underestimated your amount of research

Offline

#4 2011-11-18 10:24:11

knight_killer
Member
Registered: 2011-11-18
Posts: 4

Re: Hash32: Lesser collisions?

ab wrote:

crc32 is the fastest and with less collisions (in the BYFOUR version)

Do you know http://www.team5150.com/~andrew/noncryptohashzoo/? It seems CRC32 isn't the fastest and has also some issues (Avalanche, Seed Avalanche and Bit Independence Criterion Tests). But the "CrapWOW" Hash seems pretty good... MurMur3 isn't bad too.
Too bad, they didn't tested the Google Hash (dense_hash_map) from http://code.google.com/p/google-sparsehash

Greets
knight_killer

Offline

#5 2011-11-18 10:54:46

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,659
Website

Re: Hash32: Lesser collisions?

The source code used for testing crc32 was not the latest version included in the zlib.
For instance, the one included in our SynZip library is hand-tuned asm using 8 tables.

I'm not sure this "CrapWow" hashing is so quick by design.
It uses a multiplication for each DWORD, so it will probably slower than an optimized per-8 crc32 with precomputed tables or a good plain K&R's. This hash algo is just another variation of multiplicative hash function + shift. No new idea within.
Furthermore, I suspect the code is not re-entrant (whereas crc32 or adler32 can be called once or per chunk).
The supplied asm code is far from optimized itself (there is no good pipelining work within).

I prefer the http://www.strchr.com/hash_functions page for comparison.

I do not know dense_hash_map, but perhaps the "sparse table" trick does wonders with CPU cache size.

Offline

#6 2011-11-25 06:58:10

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,659
Website

Re: Hash32: Lesser collisions?

Offline

#7 2013-01-05 13:57:01

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,571
Website

Re: Hash32: Lesser collisions?

AB, do you see this string hashing algorithm - i'ts a 2012 year algorithm winner http://arxiv.org/abs/1202.4961   May be used in mORMot...

Offline

#8 2013-01-05 17:58:43

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,659
Website

Re: Hash32: Lesser collisions?

It should have less collisions than Hash32, but is more complex, also certainly slower.
They require a modern CPU.

I suspect they will be difficult to be implemented in non binary languages, like JavaScript.
Whereas I already have CRC32 + SHA256 in Smart/JavaScript.
See http://blog.synopse.info/post/2012/04/1 … JavaScript

Interresting, but perhaps not worth it now.

Offline

Board footer

Powered by FluxBB