Hello. TSynBloomFilter sometimes still misses duplicate rows. This is especially noticeable when working with large data sets of several million rows. Am I doing something wrong?
uses
  SynTable;

function GenerateRandomLetters(Length: Integer): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length do
    Result := Result + Chr(Ord('a') + Random(Ord('c') - Ord('a') + 1));
end;

procedure Duplicates;
var
  BloomFilter: TSynBloomFilter;
  I: integer;
  s: string;
begin
  BloomFilter := TSynBloomFilter.Create(100000, 0);
  for i := 1 to 1000000 do begin
    s := GenerateRandomLetters(5);
    if not BloomFilter.MayExist(s) then begin
      BloomFilter.Insert(s);
      Form2.Memo1.Lines.Add(s);
    end;
  end;
  BloomFilter.Free;
end;

procedure TForm2.Button1Click(Sender: TObject);
begin
  Randomize;
  Duplicates;
end;
Last edited by Uefi (2024-02-14 16:58:04)
Offline
Which version of Delphi are you using?
Which version of mORMot 1 do you use?
Did you try with mORMot 2?
Don't declare s as string; use RawUTF8 instead.
The TTestCoreBase.BloomFilters regression tests work with no problem with SIZ = 2000000.
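A minimal sketch of the RawUTF8 suggestion above, not a tested fix. The function name CountProbablyNew is hypothetical, the 1 percent false-positive value is only an illustration, and it assumes RawUTF8 comes from SynCommons and TSynBloomFilter from SynTable, as in the original uses clause. Keep in mind that a Bloom filter never misses a value that was really inserted, but it can report a value as present when it was not, so a few unique rows may be skipped.

```pascal
uses
  SynCommons, // assumed source of the RawUTF8 type in mORMot 1
  SynTable;   // TSynBloomFilter, as in the original post

// Hypothetical helper: counts how many values the filter reports as new.
function CountProbablyNew(const Values: array of RawUTF8): integer;
var
  Bloom: TSynBloomFilter;
  i: integer;
begin
  Result := 0;
  // expected number of items, then accepted false-positive percent
  // (1 is an illustrative value, not a recommendation)
  Bloom := TSynBloomFilter.Create(100000, 1);
  try
    for i := 0 to High(Values) do
      if not Bloom.MayExist(Values[i]) then
      begin
        // not seen so far: remember it and count it as new
        Bloom.Insert(Values[i]);
        inc(Result);
      end;
  finally
    Bloom.Free; // released even if an exception is raised in the loop
  end;
end;
```

Calling CountProbablyNew(['a', 'b', 'a']) can never return more than 2, because the second 'a' is always detected. The try..finally block is a design choice that differs from the original post, where the filter would leak on an exception.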
Offline
Hi, I use mORMot version 1. Here is what I need: for example, to remove duplicate lines from a file 100 gigabytes in size using TSynBloomFilter. I read the file line by line using TStreamReader. Can I do that?
P.S. Delphi XE2
Last edited by Uefi (2024-02-14 20:14:42)
Offline