You are not logged in.
Pages: 1
Hi, I tried another implementation of the bloom filter and this is what I noticed:
1. TSynBloomFilter doesn't work with probability 0.00000000000001 maximum from 0.00001
2. TSynBloomfiter has too many collisions, about 1 in every 1000
In other implementations there is approximately 1 collision per 10 thousand
Please study and try another implementation BloomFilter:
https://gist.github.com/synopse/b418126 … 6811203928
Please review the current implementation of the bloom filter, please make improvements, there are a lot of collisions
Offline
Full module texrex if need:
https://drive.google.com/file/d/15c1WyT … drive_link
I had to fix the module since it was written for a Pascal object, I seem to have fixed everything
Offline
It seems to me that there are some errors or shortcomings in the current implementation of the bloom filter, so there are a lot of collisions !
Offline
Please follow the forum rules and don't put code in the thread itself.
How then to place the code?
Offline
Please follow the forum rules and don't put code in the thread itself.
May I as why is this?
Would seem, if correct highlighting available, it would be better to post into the forum. External source(s) might disappear and make forum thread useless later on.
-Tee-
Offline
1) Because it is part of the rules you accepted.
And we follow the rules, don't we?
Otherwise, it won't compile.
2) Because the forum use a php engine with a SQLite3 backend which is not good at huge content, and we backup the whole DB file at once.
So we try to keep the posts not too verbose, to keep the DB small enough for the long term (we have more than 42,000 of posts since 2010).
I have no time to change the forum engine.
3) Because using a gist is always better to copy & paste, and make several revisions, and tend to favor not snippets of code, but a minimal reproducible example.
As the forum rules proposes.
Offline
Hi, I tried another implementation of the bloom filter and this is what I noticed:
In other implementations there is approximately 1 collision per 10 thousand
Where is this other implementation from?
What is it's licence?
-Tee-
Offline
@UEfi I don't see any implementation differences between TSynBloomFilter and the one you provided.
The only difference is that we use the "addition trick" of https://www.eecs.harvard.edu/~michaelm/ … -02-05.pdf which is well used and proven.
I don't want to start again the same discussion we had
https://synopse.info/forum/viewtopic.php?id=6864
There was a lot of misunderstanding in our previous discussions, I can't find what you are expecting and if something is wrong in our class.
What we call "false positive" is computed and verified as such in our regression tests TTestCoreBase.BloomFilters:
n := 0;
for i := SIZ + 1 to SIZ + SIZ shr 5 do
if b.MayExist(@i, SizeOf(i)) then
inc(n);
falsepositive := (n * 100) / (SIZ shr 5);
Check(falsepositive < 1, 'falsepositive=%', [falsepositive]);
With crc32c() we are at 0.704 for the integers in 0..SIZ range. With crc() we have 0.9. With other functions like AesNiHash32, which is fed from random so gives diverse values, we are around 0.8 - 1.2, which is as expected with our FalsePositivePercent = 1.
Bloom filters usecase is with FalsePositivePercent around 1, typically 0.5, 1, 2, or 5. If you expect 0.0000x then you need to use another algorithm.
Offline
2) Because the forum use a php engine with a SQLite3 backend which is not good at huge content, and we backup the whole DB file at once.
Fair enough...
Offline
@ab Well, please test in your module collisions occur 1 in 5000 in this third-party library collisions occur 1 in 300000, but the formula is the same here and there
Bloom:=TTrBloomFilter.Create(100000000, 0.00000000000001);
Bloom:=TSynBloomFilter.Create(100000000, 0.000001, cryptcrc32(camd5));
Please test how many collisions there will be in both cases in any way
Last edited by Uefi (2024-12-11 20:01:54)
Offline
Pages: 1