#1 2024-02-14 16:55:36

Uefi
Member
Registered: 2024-02-14
Posts: 38

TSynBloomFilter skips duplicate rows

Hello. TSynBloomFilter sometimes still misses duplicate rows. This is especially noticeable with large data sets of several million rows. Am I doing something wrong?

uses
  SynTable;

function GenerateRandomLetters(Length: Integer): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length do
    Result := Result + Chr(Ord('a') + Random(Ord('c') - Ord('a') + 1));
end;

procedure Duplicates;
var
  BloomFilter: TSynBloomFilter;
  i: Integer;
  s: string;
begin
  BloomFilter := TSynBloomFilter.Create(100000, 0);
  for i := 1 to 1000000 do
  begin
    s := GenerateRandomLetters(5);
    if not BloomFilter.MayExist(s) then
    begin
      BloomFilter.Insert(s);
      Form2.Memo1.Lines.Add(s);
    end;
  end;
  BloomFilter.Free;
end;

procedure TForm2.Button1Click(Sender: TObject);
begin
  Randomize;
  Duplicates;
end;

Last edited by Uefi (2024-02-14 16:58:04)

#2 2024-02-14 19:23:13

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,653

Re: TSynBloomFilter skips duplicate rows

Which version of Delphi are you using?
Which version of mORMot 1 do you use?
Did you try with mORMot 2?

Don't declare s as string; use RawUTF8 instead.

The TTestCoreBase.BloomFilters regression tests work with no problem with SIZ = 2000000.
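
The RawUTF8 suggestion can be tried in isolation with a small console program. This is only a minimal sketch, assuming mORMot 1 with RawUTF8 from SynCommons and TSynBloomFilter from SynTable. CountFirstSeen and its parameters are hypothetical names introduced here, and the memo output of the original post is replaced by a counter. Keep in mind that any Bloom filter can report a value as present when it was never inserted (a false positive) but never the reverse, so in an "if not MayExist then Insert" loop a duplicate cannot get through twice, while a small share of unique values can be dropped.

```pascal
program BloomDedupSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  SynCommons, // RawUTF8
  SynTable;   // TSynBloomFilter (mORMot 1)

// Returns Len random characters drawn from 'a'..'c'.
function GenerateRandomLetters(Len: Integer): RawUTF8;
var
  i: Integer;
begin
  SetLength(Result, Len);
  for i := 1 to Len do
    Result[i] := AnsiChar(Ord('a') + Random(3));
end;

// Hypothetical helper: counts the values the filter reports as not seen before.
function CountFirstSeen(Draws, ExpectedDistinct: Integer): Integer;
var
  Filter: TSynBloomFilter;
  i: Integer;
  s: RawUTF8;
begin
  Result := 0;
  // Size the filter for the number of DISTINCT values expected;
  // the second argument asks for about 1% false positives.
  Filter := TSynBloomFilter.Create(ExpectedDistinct, 1);
  try
    for i := 1 to Draws do
    begin
      s := GenerateRandomLetters(5);
      if not Filter.MayExist(s) then
      begin
        Filter.Insert(s);
        Inc(Result);
      end;
    end;
  finally
    Filter.Free; // released even if an exception is raised in the loop
  end;
end;

begin
  Randomize;
  // Only 3^5 = 243 distinct 5-letter strings exist over 'a'..'c',
  // so the printed count cannot exceed 243.
  Writeln(CountFirstSeen(1000000, 100000));
end.
```

A count well below 243 here would point to false positives, which is the usual symptom of a filter sized for fewer distinct values than it actually receives.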

#3 2024-02-14 20:14:07

Uefi
Member
Registered: 2024-02-14
Posts: 38

Re: TSynBloomFilter skips duplicate rows

ab wrote:

Which version of Delphi are you using?
Which version of mORMot 1 do you use?
Did you try with mORMot 2?

Don't declare s as string; use RawUTF8 instead.

The TTestCoreBase.BloomFilters regression tests work with no problem with SIZ = 2000000.

Hello. I'm using mORMot 1. For example, I need to remove duplicate lines from a 100 GB file using TSynBloomFilter, reading the file line by line with TStreamReader. Can I do that?
P.S. I'm on Delphi XE2.
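
A line-by-line pass of that kind could be sketched roughly as follows. This is not a confirmed recipe: it assumes mORMot 1 (StringToUTF8 from SynCommons, TSynBloomFilter from SynTable), a UTF-8 input file, and the Delphi TStreamReader and TStreamWriter classes. DedupLines and ExpectedLines are hypothetical names, and the caller has to supply an estimate of the number of distinct lines.

```pascal
program BloomFileDedupSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  Classes,
  SynCommons, // RawUTF8, StringToUTF8
  SynTable;   // TSynBloomFilter (mORMot 1)

// Hypothetical helper: copies InName to OutName, keeping only the lines
// that the filter reports as not seen before.
procedure DedupLines(const InName, OutName: string; ExpectedLines: Integer);
var
  Filter: TSynBloomFilter;
  Reader: TStreamReader;
  Writer: TStreamWriter;
  Line: string;
  Key: RawUTF8;
begin
  // ExpectedLines is the caller's estimate; 1 asks for about 1% false positives.
  Filter := TSynBloomFilter.Create(ExpectedLines, 1);
  try
    Reader := TStreamReader.Create(InName, TEncoding.UTF8, True, 1 shl 16);
    try
      Writer := TStreamWriter.Create(OutName, False, TEncoding.UTF8, 1 shl 16);
      try
        while not Reader.EndOfStream do
        begin
          Line := Reader.ReadLine;
          Key := StringToUTF8(Line); // explicit UTF-16 to UTF-8 conversion
          if not Filter.MayExist(Key) then
          begin
            Filter.Insert(Key);
            Writer.WriteLine(Line);
          end;
        end;
      finally
        Writer.Free;
      end;
    finally
      Reader.Free;
    end;
  finally
    Filter.Free;
  end;
end;

begin
  if ParamCount < 3 then
    Writeln('usage: BloomFileDedupSketch <input> <output> <expected-lines>')
  else
    DedupLines(ParamStr(1), ParamStr(2), StrToInt(ParamStr(3)));
end.
```

Two caveats apply to this approach. Because of false positives, roughly the configured percentage of unique lines may be dropped from the output, so a Bloom filter alone does not give an exact result. Memory also matters: a textbook Bloom filter at 1% needs about 9.6 bits per expected element, so one billion distinct lines would take roughly 1.2 GB, which may not fit in a 32-bit process.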

Last edited by Uefi (2024-02-14 20:14:42)
