#1 2016-07-30 17:08:34

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,206
Website

TSynBloomFilter

We just added a TSynBloomFilter class to our Open Source framework core.

It features:

- Automatic computation of all internal storage details (bits storage size, number of hash functions) from basic high-level information, i.e. estimation of maximum number of items, and false positive probability.
- Insertion and lookup is very optimized for speed, using hardware-accelerated crc32c hashing, reduced to only two calls, following mathematically approved best practice.
- Persistence of the data using an efficient RLE compression of continuous zeros.
- Full regression tests provided (as usual).
- Thread-safe process.
- TSynBloomFilterDiff for incremental remote synchronization.
- Works on Delphi 5 up to 10.1 Berlin, and FPC.

This is the forum thread for http://blog.synopse.info/post/2016/07/3 … d-Big-Data

Offline

#2 2016-07-31 05:20:59

edwinsn
Member
Registered: 2010-07-02
Posts: 1,215

Re: TSynBloomFilter

Well done Arnaud, this is a new concept to me, and through all these years I have been watching you and other contributors never stop enhancing the framework, what a good thing to the Pascal/Delphi community.


Delphi XE4 Pro on Windows 7 64bit.
Lazarus trunk built with fpcupdelux on Windows with cross-compile for Linux 64bit.

Offline

#3 2016-07-31 09:26:53

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,539
Website

Re: TSynBloomFilter

@AB - Explain, please, for what kind of tasks you use it? (I'm read a WiKi about BloomFilter, but can't imagine where to use it in the mORMot related software)

Offline

#4 2016-07-31 10:50:11

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,206
Website

Re: TSynBloomFilter

We use it e.g. as a frontend of several mORMot servers, to guess on which node the actual data and commands should be processed.
The frontend is asking all sub-nodes for their "bloom filter" data digest, so that it can know when the data is expected to be, without asking all nodes each time.
A few 1% of false positive if possible, but it makes a huge performance boost.

It allows to grows to billions of items over a huge set of servers, with no centralized database, and very limited bandwidth use.

Offline

#5 2016-07-31 12:56:08

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,539
Website

Re: TSynBloomFilter

Very interesting application. We solve the similar problem by calculating a

 serverInstance = crc32(username) mod serverCount 

But this require a data moving in case we add a server to farm

Offline

#6 2016-07-31 15:05:41

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,206
Website

Re: TSynBloomFilter

This works only if you can assign a user to a server, and if your server count remains stable, which is unpractical and almost useless in practice.
This is how MongoDB does "sharding" - see https://docs.mongodb.com/manual/sharding/

Our scaling idea is that servers may be up or down at any time, so we can add users and users should be able to move from one to another.

Offline

#7 2017-09-04 08:36:22

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,206
Website

Re: TSynBloomFilter

You may be interrested in reading https://synopse.info/forum/viewtopic.ph … 946#p24946
It detailed a real-use of TSynBloomFilter, in the IoT context of a farm of mORMot servers.

Offline

Board footer

Powered by FluxBB