You are not logged in.
Pages: 1
Received by email:
I have seen on net an hash called "DJB Hash" (I think the author is D. J. Bernstein http://cr.yp.to/ creator of qmail, djbdns, etc)
It goes something like this...
function DjbHash(const Str : AnsiString) : Cardinal; var i : Cardinal; begin Result := 5381; for i := 1 to Length(Str) do Result := ((Result shl 5) + Result) + Ord(Str[i]); // r*33 = r*32+r end;
I think this would be faster than adler32 (and your Hash32), but I don't have any idea regarding its quality (although in my small test it didn't have collisions for small text where adler32 had; but no idea with large files)
I plan to use this on some code (to verify file checksum). I mailed the author but never received reply (there is no mention of that hash on his site too). I don't know where to ask. As you have written a faster/better hash than adler32, I think I may get an answer from you.
Is it a good hash function? Usable?
Your comments?Thank you
Offline
This is a well-known hash algorithm.
It will be slower than both our adler32 asm implementation and our Hash32 (the later read the data by DWORD, which is much faster).
Because both are optimized for modern CPUs.
The code size is not the speed key.
The problem of this hash is that it was designed for text hashing.
This is the purpose of the shl 5.
On binary data, it will have more collision than adler32.
Adler32 would have a bit more collisions than this hash.
Hash32 would have less collisions.
crc32 would be the best suitable if you aim is to avoid collision.
Our crc32 implementation in SynZip unit is very fast, quite as fast as adler32.
And if you need real hashing, use SHA-1 or SHA-256.
Our asm optimized SHA-256 version is very fast, available in SynCrypto.
But Hash32 is the fastest of all, in all my tests.
That's a bit funny, in the last version of SynPdf, I used 4 hash algorithms at once, to avoid any collision issue: adler32, hash32, crc32 and the hashof from IniFiles (in an optimized version).
See http://synopse.info/fossil/finfo?name=SynPdf.pas and search for "4 algorithms to rule them all"...
About speed, use a precise timing about diverse size of data.
This is the only way of saying what is fast, and what is not fast.
Offline
I tried reading data by DWORDs with DJB, but still Hash32 is faster than DJB (approx 2.5 times). The reason: it seems add is faster than bit shift and Hash32 does only add (inc())
// only to test performance; few bytes at end not used in hash
function DjbHash(data: pointer; len: integer): cardinal;
type
PCardinal = ^cardinal;
function SubHash(P: PCardinal; L: integer): cardinal;
var
i: cardinal;
begin
result := 5381;
if p = nil then
Exit;
// few bytes at end discarded, as it wouldn't affect performance test
for i := 1 to L div sizeof(cardinal) do begin
result := ((result shl 5)+result) + p^;
inc(p);
end;
end;
begin
result := SubHash(data, len);
end;
My understanding correct?
PS: I observed performance with difference in QueryPerformanceCounter value before and after calling the hash function. I think it is fair.
PS2: How s1 and s2 differ?
s1 := 0;
s2 := 0;
for i := 1 to L shr 4 do begin // 16 bytes (4 DWORD) by loop - aligned read
inc(s1,P^[0]);
inc(s2,s1);
inc(s1,P^[1]);
inc(s2,s1);
inc(s1,P^[2]);
inc(s2,s1);
inc(s1,P^[3]);
inc(s2,s1);
inc(PtrUInt(P),16);
end;
In your Hash32() code, after the end of first(main) loop s1 and s2 differ (it should by purpose I think). But by looking at the code plainly it seems both s1 and s2 would have the same value. Where do I overlook?
Thanks.
Offline
First of all, using a shl 5 with a DWORD doesn't make sense: you're taking in account the interresting data only for one byte per every four...
About speed, it's not about add and shift speed itself, which are the same.
You've to take in account that execution is not linear in a modern CPU: that is, there are pipelines, and that x86 asm is converted to internal instructions in multiple execution units, working in parallel, then putting in common their results.
And that the memory access is buffered.
So for this line:
result := ((result shl 5)+result) + p^;
You have 3 instructions (one shl, two adds) and one memory access.
And it does depend on the previous result to begin the next execution.
And the "for i" loop will make a decrement of an internal register, then a comparison.
Whereas for the Hash32 code, you've only two adds and one memory access, i.e. one shl less than in the other algo.
And the "for i" loop handle 4 DWORDs before the decrement+comparison. So the CPU will be able to parallelize the inc() instructions, then put them altogether at the end of the loop (this is not exactly the case, it's more complicated than that, but you got the idea).
About the Hash32() code, s1 and s2 would never have the same value, because there is a inc(s2,s1) every two instructions. Use the step by step debugger, and you'll see how it works.
If you try to achieve best speed, take a look at some recent reference documentation.
You've some very good reference material available at http://www.agner.org/optimize/
Offline
I never imagined there would be this much stuff to work with to achieve speed.
About the Hash32() code, s1 and s2 would never have the same value, because there is a inc(s2,s1) every two instructions. Use the step by step debugger, and you'll see how it works.
My mad. It became clear that s2 is cumulative sum when I manually gone through the loop just for 2 iterations.
If you try to achieve best speed, take a look at some recent reference documentation.
You've some very good reference material available at http://www.agner.org/optimize
I don't know assembly and have no plans to learn
BTW, While you are there..., why I have to worry?
Offline
Pages: 1