We just added a TSynBloomFilter class to our Open Source framework core.
- Automatic computation of all internal storage details (bit storage size, number of hash functions) from basic high-level information, i.e. the estimated maximum number of items and the target false-positive probability.
- Insertion and lookup are heavily optimized for speed, using hardware-accelerated crc32c hashing reduced to only two calls per operation, following mathematically proven best practice.
- Persistence of the data using efficient RLE compression of runs of zeros.
- Full regression tests provided (as usual).
- Thread-safe process.
- TSynBloomFilterDiff for incremental remote synchronization.
- Works on Delphi 5 up to 10.1 Berlin, and FPC.
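The sizing math behind such automatic computation is the standard Bloom filter formula, and the "only two calls" trick is double hashing, which derives all k bit positions from two base hashes. Here is a minimal sketch in Python (illustrative only, not the actual Delphi implementation; function names are mine):

```python
import math

def bloom_parameters(n_items: int, fp_prob: float) -> tuple[int, int]:
    """Standard Bloom filter sizing: bit count m and hash count k
    from the expected item count n and target false-positive
    probability p: m = -n*ln(p)/ln(2)^2, k = (m/n)*ln(2)."""
    m = math.ceil(-n_items * math.log(fp_prob) / (math.log(2) ** 2))
    k = max(1, round(m / n_items * math.log(2)))
    return m, k

def bloom_positions(h1: int, h2: int, k: int, m: int) -> list[int]:
    """Double hashing (Kirsch-Mitzenmacher): k bit positions
    from only two base hash values, g_i = h1 + i*h2 (mod m)."""
    return [(h1 + i * h2) % m for i in range(k)]

m, k = bloom_parameters(1_000_000, 0.01)
# ~9.6 bits per item and k = 7 hash functions for a 1% target
```

This is why a 1% false-positive target costs only a little over one byte per stored item, regardless of item size.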
This is the forum thread for http://blog.synopse.info/post/2016/07/3 … d-Big-Data
Well done Arnaud! This is a new concept to me. Through all these years I have watched you and other contributors never stop enhancing the framework; what a good thing for the Pascal/Delphi community.
Delphi XE4 Professional. Windows 7 64bit.
We use it e.g. as a frontend to several mORMot servers, to guess on which node the actual data and commands should be processed.
The frontend asks all sub-nodes for their "bloom filter" data digest, so that it can know where the data is expected to be, without querying all nodes each time.
A false-positive rate of around 1% is acceptable, and the approach brings a huge performance boost.
It allows scaling to billions of items over a huge set of servers, with no centralized database and very limited bandwidth use.
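The routing idea can be sketched as follows: the frontend keeps one Bloom filter digest per node and forwards a request only to the nodes whose filter may contain the key. A toy Python sketch (the class and function names are hypothetical, not the mORMot API):

```python
import hashlib

class SimpleBloom:
    """Toy Bloom filter for the routing sketch (not TSynBloomFilter)."""
    def __init__(self, m: int = 8192, k: int = 7):
        self.m, self.k, self.bits = m, k, bytearray(m // 8)

    def _positions(self, key: str) -> list[int]:
        # two base hashes from one sha256, then double hashing
        d = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:16], "little") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p // 8] >> (p % 8) & 1
                   for p in self._positions(key))

def route(key: str, node_filters: dict) -> list:
    """Return only the nodes whose digest may contain the key:
    a false positive costs one wasted query, but a node actually
    holding the key is never skipped (no false negatives)."""
    return [node for node, bf in node_filters.items() if key in bf]

filters = {"node1": SimpleBloom(), "node2": SimpleBloom()}
filters["node1"].add("user:42")
# route("user:42", filters) always includes "node1"
```

Since Bloom filters never produce false negatives, the owning node is always among the candidates; the ~1% false positives only cost an occasional extra query.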
This works only if you can assign a user to a fixed server, and if your server count remains stable, which is impractical and almost useless in practice.
This is how MongoDB does "sharding" - see https://docs.mongodb.com/manual/sharding/
Our scaling idea is that servers may be up or down at any time, so we can add users, and users should be able to move from one server to another.