We just added a TSynBloomFilter class to our Open Source framework core.
- Automatic computation of all internal storage details (bit storage size, number of hash functions) from basic high-level information, i.e. the estimated maximum number of items and the target false-positive probability.
- Insertion and lookup are heavily optimized for speed, using hardware-accelerated crc32c hashing reduced to only two calls per operation, following mathematically proven best practice.
- Persistence of the data using efficient RLE compression of runs of zeros.
- Full regression tests provided (as usual).
- Thread-safe process.
- TSynBloomFilterDiff for incremental remote synchronization.
- Works on Delphi 5 up to 10.1 Berlin, and FPC.
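The sizing math behind such automatic computation is the standard Bloom filter formula, and the "only two calls" trick is double hashing, which derives all k bit positions from two base hashes. Here is a minimal sketch in Python (illustrative only, not the actual Delphi implementation; function names are mine):

```python
import math

def bloom_parameters(n_items: int, fp_prob: float) -> tuple[int, int]:
    """Standard Bloom filter sizing: bit count m and hash count k
    from the expected item count n and target false-positive
    probability p: m = -n*ln(p)/ln(2)^2, k = (m/n)*ln(2)."""
    m = math.ceil(-n_items * math.log(fp_prob) / (math.log(2) ** 2))
    k = max(1, round(m / n_items * math.log(2)))
    return m, k

def bloom_positions(h1: int, h2: int, k: int, m: int) -> list[int]:
    """Double hashing (Kirsch-Mitzenmacher): k bit positions
    from only two base hash values, g_i = h1 + i*h2 (mod m)."""
    return [(h1 + i * h2) % m for i in range(k)]

m, k = bloom_parameters(1_000_000, 0.01)
# ~9.6 bits per item and k = 7 hash functions for a 1% target
```

This is why a 1% false-positive target costs only a little over one byte per stored item, regardless of item size.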
This is the forum thread for http://blog.synopse.info/post/2016/07/3 … d-Big-Data
Well done Arnaud! This is a new concept to me. Through all these years I have watched you and other contributors never stop enhancing the framework; what a good thing for the Pascal/Delphi community.
Delphi XE4 Professional. Windows 7 64bit.
We use it e.g. as a frontend to several mORMot servers, to guess on which node the actual data and commands should be processed.
The frontend asks all sub-nodes for their "bloom filter" data digest, so that it can know where the data is expected to be, without querying all nodes each time.
A false-positive rate of around 1% is acceptable, and the approach brings a huge performance boost.
It allows scaling to billions of items over a huge set of servers, with no centralized database and very limited bandwidth use.
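The routing idea can be sketched as follows: the frontend keeps one Bloom filter digest per node and forwards a request only to the nodes whose filter may contain the key. A toy Python sketch (the class and function names are hypothetical, not the mORMot API):

```python
import hashlib

class SimpleBloom:
    """Toy Bloom filter for the routing sketch (not TSynBloomFilter)."""
    def __init__(self, m: int = 8192, k: int = 7):
        self.m, self.k, self.bits = m, k, bytearray(m // 8)

    def _positions(self, key: str) -> list[int]:
        # two base hashes from one sha256, then double hashing
        d = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:16], "little") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p // 8] >> (p % 8) & 1
                   for p in self._positions(key))

def route(key: str, node_filters: dict) -> list:
    """Return only the nodes whose digest may contain the key:
    a false positive costs one wasted query, but a node actually
    holding the key is never skipped (no false negatives)."""
    return [node for node, bf in node_filters.items() if key in bf]

filters = {"node1": SimpleBloom(), "node2": SimpleBloom()}
filters["node1"].add("user:42")
# route("user:42", filters) always includes "node1"
```

Since Bloom filters never produce false negatives, the owning node is always among the candidates; the ~1% false positives only cost an occasional extra query.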
This works only if you can assign a user to a fixed server, and if your server count remains stable, which is impractical and almost useless in practice.
This is how MongoDB does "sharding" - see https://docs.mongodb.com/manual/sharding/
Our scaling idea is that servers may be up or down at any time, so we can add users, and users should be able to move from one server to another.