#1 2011-03-16 20:34:11

dataman
Member
From: The Milky Way's outskirts
Registered: 2010-07-18
Posts: 5

Very fast asm-implementation of Base64 Encoding/Decoding

I using unit from here: http://www.delphipraxis.net/991-base64- … oding.html

Perhaps this unit will be useful to you. smile

Offline

#2 2011-03-17 06:59:15

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,510
Website

Re: Very fast asm-implementation of Base64 Encoding/Decoding

That's funny, since I've just enhanced our Base64 encoding decoding functions in SynCommons.pas unit.
(the version in SynCrtSock are more simple, but sufficient for Network use).

Since we will use Base64 encoding for blob storage (especially records or dynamic array JSON serialization), I've tuned this part.
And I'm almost sure that my pure pascal version of those functions can be compared to this asm version.

In all cases, the functions available in SynCommons are much faster than Indy's.

Thanks for the link. When I've time, I'll take a look at this code, because it's surely faster than my pascal code, but it could still be improved (it uses losd/lodsw/stosb/stosw opcodes which are known to be slow).

Offline

#3 2011-03-17 08:42:44

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,510
Website

Re: Very fast asm-implementation of Base64 Encoding/Decoding

I just wrote my own pure asm version.

Should be faster than the one you quoted:
- don't use slow lodsb/stosb and such;
- use parallelized memory reading and less register inter-instruction dependency, for better execution by the CPU pipelines;
- better in-order execution speed (for slow CPU like Atom).

See http://synopse.info/fossil/info/fd42e4b868 and http://synopse.info/fossil/info/4207d16dd4

Thanks for pushing me to write faster code. wink

Offline

#4 2011-03-19 20:37:35

dataman
Member
From: The Milky Way's outskirts
Registered: 2010-07-18
Posts: 5

Re: Very fast asm-implementation of Base64 Encoding/Decoding

Arnaud, you're the best. cool

ab wrote:

Should be faster than the one you quoted...

Unfortunately, on my very old PC (Intel Pentium 4 Prescott, 2800 MHz) I took these results (Delphi XE): sad

Starting benchmarks on 2 cores.
SynCommons.BinToBase64                   took     0,598192519 sec.
SynCommons.Base64ToBin                   took     0,563155921 sec.
BASE64Codec.Base64Encode function        took     0,724407766 sec.
BASE64Codec.Base64Decode function        took     0,657267154 sec.
BASE64Codec.Base64Decode procedure       took     0,401618684 sec.
BASE64Codec.Base64Decode procedure       took     0,338995311 sec.
All benchmarks passed.

My benchmark program:

program BASE64_Benchmarks;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Benchmarker, BASE64Codec, SynCommons;

procedure RunBenchmarks;

var
  b: TBenchmarker;

  OriginalRawS,
  EncodedRawS, DecodedRawS: RawByteString;

  OriginalAnsiS,
  EncodedAnsiS, DecodedAnsiS: AnsiString;

begin
  b := TBenchmarker.Create(procedure(Name: string; Time: Double)
                           begin
                             WriteLn(Format('%-40s took %15.9f sec.', [Name, Time * 1000000]));
                           end);

  try
    b.Warmups := 100;
    b.Iterations := 1000000;

    OriginalRawS := '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    OriginalAnsiS := '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';

    EncodedRawS := '';
    b.Benchmark('SynCommons.BinToBase64',
                procedure
                begin
                  EncodedRawS := SynCommons.BinToBase64(OriginalRawS);
                end);

    DecodedRawS := '';

    b.Benchmark('SynCommons.Base64ToBin',
                procedure
                begin
                  DecodedRawS := SynCommons.Base64ToBin(EncodedRawS);
                end);
    Assert(DecodedRawS = OriginalRawS);

    EncodedAnsiS := '';

    b.Benchmark('BASE64Codec.Base64Encode function',
                procedure
                begin
                  EncodedAnsiS := BASE64Codec.Base64Encode(OriginalAnsiS);
                end);

    DecodedAnsiS := '';

    b.Benchmark('BASE64Codec.Base64Decode function',
                procedure
                begin
                  DecodedAnsiS := BASE64Codec.Base64Decode(EncodedAnsiS);
                end);
    Assert(DecodedAnsiS = OriginalAnsiS);

    EncodedAnsiS := '';

    b.Benchmark('BASE64Codec.Base64Decode procedure',
                procedure
                begin
                  BASE64Codec.Base64Encode(OriginalAnsiS, EncodedAnsiS);
                end);

    DecodedAnsiS := '';

    b.Benchmark('BASE64Codec.Base64Decode procedure',
                procedure
                begin
                  BASE64Codec.Base64Decode(EncodedAnsiS, DecodedAnsiS);
                end);
    Assert(DecodedAnsiS = OriginalAnsiS);

  finally
  end;
end;

procedure SetAtMaxSpeed;
var
  SysInfo: _SYSTEM_INFO;
begin
  SetPriorityClass(GetCurrentProcess, HIGH_PRIORITY_CLASS);
  SetThreadPriority(GetCurrentThread, THREAD_PRIORITY_HIGHEST);
  GetSystemInfo(SysInfo);
  WriteLn('Starting benchmarks on ', SysInfo.dwNumberOfProcessors, ' cores.');
  if SysInfo.dwNumberOfProcessors > 1 then
  begin
    SetThreadAffinityMask(GetCurrentThread, 1 shl (SysInfo.dwNumberOfProcessors - 1));
    SetProcessAffinityMask(GetCurrentProcess, 1 shl (SysInfo.dwNumberOfProcessors - 1));
  end;
end;

begin
  try
    SetAtMaxSpeed;
    RunBenchmarks;
    WriteLn('All benchmarks passed.')
  except
    on E:Exception do
      Writeln(E.Classname, ': ', E.Message);
  end;
end.

IMHO, very cool benchmarker:

unit Benchmarker;

interface

uses
  SysUtils;

type
  // TBenchmarker "stolen" from Barry Kelly -- http://barrkel.blogspot.com/2008/08/anonymous-methods-in-testing-profiling.html
  TBenchmarker = record
  private
    const
      DefaultIterations = 3;
      DefaultWarmups = 1;
    var
      FReportSink: TProc<string, Double>;
      FWarmups: Integer;
      FIterations: Integer;
      FOverhead: Double;
    class var
      FFreq: Int64;
    class procedure InitFreq; static;
  public
    constructor Create(const AReportSink: TProc<string, Double>);

    class function Benchmark(const Code: TProc;
                             Iterations: Integer = DefaultIterations;
                             Warmups: Integer = DefaultWarmups): Double; overload; static;

    procedure Benchmark(const Name: string; const Code: TProc); overload;
    function Benchmark<T>(const Name: string; const Code: TFunc<T>): T; overload;

    property Warmups: Integer read FWarmups write FWarmups;
    property Iterations: Integer read FIterations write FIterations;
  end;

implementation

uses
  Windows;

{ TBenchmarker }

constructor TBenchmarker.Create(const AReportSink: TProc<string, Double>);
begin
  InitFreq;
  FReportSink := AReportSink;
  FWarmups := DefaultWarmups;
  FIterations := DefaultIterations;

  // Estimate overhead of harness
  FOverhead := Benchmark(procedure begin end, 100, 3);
end;

class procedure TBenchmarker.InitFreq;
begin
  if (FFreq = 0) and not QueryPerformanceFrequency(FFreq) then
    raise Exception.Create('No high-performance counter available.');
end;

procedure TBenchmarker.Benchmark(const Name: string; const Code: TProc);
begin
  FReportSink(Name, Benchmark(Code, Iterations, Warmups) - FOverhead);
end;

class function TBenchmarker.Benchmark(const Code: TProc; Iterations, Warmups: Integer): Double;
var
  start, stop: Int64;
  i: Integer;
begin
  InitFreq;

  for i := 1 to Warmups do
    Code;

  QueryPerformanceCounter(start);
  for i := 1 to Iterations do
    Code;
  QueryPerformanceCounter(stop);

  Result := (stop - start) / FFreq / Iterations;
end;

function TBenchmarker.Benchmark<T>(const Name: string; const Code: TFunc<T>): T;
var
  start, stop: Int64;
  i: Integer;
begin
  for i := 1 to FWarmups do
    Result := Code;

  QueryPerformanceCounter(start);
  for i := 1 to FIterations do
    Result := Code;
  QueryPerformanceCounter(stop);

  FReportSink(Name, (stop - start) / FFreq / Iterations - FOverhead);
end;

end.

Maybe, these strange results because var parameters used ?

ab wrote:

Thanks for pushing me to write faster code. wink

I've many useful links. So, periodically, I'll distract your from hard work.  smile
With your permission, by your leave, of course.

Offline

#5 2011-06-18 11:20:18

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,510
Website

Re: Very fast asm-implementation of Base64 Encoding/Decoding

Thanks for the feedback!
I just didn't see you post on time, don't know why.

To be fair, you'll have to compare the comparable.
And your test is not fair.

In your benchmark you don't compare the speed of encoding/decoding, but the internal string mechanism of Delphi, because:
- Source data is so small that much time is not spent in the base64 processing, but in string allocation;
- The function() are always slower, because it use a temporary variable and probably reallocation.

That is, you'll have to compare the function version of both units.
In this case, SynCommons is faster.

And if you use a bigger size of data (not 36 bytes), but reasonable size (like 64 or 256 KB).
Base64 is not meant to encode a password - 36 bytes - but to make some binary content ascii-compatible (like embedding a picture in an email).

When it deals with benchmark, use a profiler, and guess how much time is spent a the source code level, not globally.
You should have use diverse data sizes, starting with the size with no overhead, or with pre-allocated static buffers, and direct lowest-level conversion functions.

With fair comparison, SynCommons is faster, I think. wink

Thanks for the distraction!

Offline

#6 2016-10-11 23:19:54

Bo
Member
From: Melbourne
Registered: 2016-07-04
Posts: 57
Website

Re: Very fast asm-implementation of Base64 Encoding/Decoding

Arnaud is right, Base64 Encoding/Decoding in SynCommons is faster with bigger data.

I was replacing our own version of Base64 Encoding/Decoding functions with mORMot's and I wanted to make sure I was doing the right thing (i.e., I made something better),
so I used the TestSQL3 to test our version (Base64GLH) together with mORMot's (Base64).

I cloned test "procedure TTestCryptographicRoutines._Base64;" to "procedure TTestCryptographicRoutines._Base64GLH;" and made it calling our versions TMySimpleHL7.Base64Encode/Base64Decode:

procedure TTestCryptographicRoutines._Base64GLH;
const
  Value64: RawUTF8 = 'SGVsbG8gL2Mn6XRhaXQg5+Ar';
var tmp: RawByteString;
    b64: RawUTF8;
    Value: WinAnsiString;
    i, L: Integer;
begin
  Value := 'Hello /c''0tait 67+';
  Value[10] := #$E9;
  Value[16] := #$E7;
  Value[17] := #$E0;
  Check(not IsBase64(Value));
  Check(TMySimpleHL7.Base64Encode(Value)=Value64);
  Check(BinToBase64(Value)=Value64);
  Check(IsBase64(Value64));
  tmp := StringFromFile(ExeVersion.ProgramFileName);
  b64 := TMySimpleHL7.Base64Encode(tmp);
  Check(IsBase64(b64));
  Check(TMySimpleHL7.Base64Decode(b64)=tmp);
  Check(BinToBase64(tmp)=b64);
  Check(Base64ToBin(b64)=tmp);
  tmp := '';
  for i := 1 to 1998 do begin
    b64 := TMySimpleHL7.Base64Encode(tmp);
    Check(TMySimpleHL7.Base64Decode(b64)=tmp);
    Check((tmp='') or IsBase64(b64));
    Check(BinToBase64(tmp)=b64);
    Check(Base64ToBin(b64)=tmp);
    if tmp<>'' then begin
      L := length(b64);
      Check(not IsBase64(pointer(b64),L-1));
      b64[Random(L)+1] := '&';
      Check(not IsBase64(pointer(b64),L));
    end;
    tmp := tmp + AnsiChar(Random(255));
  end;
end;

the outcome was not good, Base64GLH ran faster:

 1.5. Cryptographic routines:
  - Adler32: 1 assertion passed  365us
  - MD5: 86 assertions passed  267us
  - SHA1: 10 assertions passed  4.93ms
  - SHA256: 8 assertions passed  6.56ms
  - AES256: 17,428 assertions passed  284.57ms
      cypher 1..2409 bytes with AES-NI: 312us, without: 11.42ms
  - RC4: 1 assertion passed  125us
  - Base64: 11,994 assertions passed  70.41ms
  - Base64GLH: 11,994 assertions passed  55.84ms
  - CompressShaAes: 1,683 assertions passed  3.49ms
  - TAESPNRG: 13,300 assertions passed  51.33ms
  Total failed: 0 / 56,505  - Cryptographic routines PASSED  484.30ms

I thought it might be the number of the test was not big enough, so I increased the loop to 10 times from 1998 to 19980, the outcome was still not good:

 1.5. Cryptographic routines:
  - Adler32: 1 assertion passed  378us
  - MD5: 86 assertions passed  272us
  - SHA1: 10 assertions passed  5.02ms
  - SHA256: 8 assertions passed  6.57ms
  - AES256: 17,428 assertions passed  207.10ms
      cypher 1..2409 bytes with AES-NI: 287us, without: 5.59ms
  - RC4: 1 assertion passed  103us
  - Base64: 119,886 assertions passed  2.82s
  - Base64GLH: 119,886 assertions passed  2.13s
  - CompressShaAes: 1,683 assertions passed  3.38ms
  - TAESPNRG: 13,300 assertions passed  73.87ms
  Total failed: 0 / 272,289  - Cryptographic routines PASSED  5.26s

until I saw this thread, I then gave a go with bigger data: changed tmp's value by adding string in Value instead of on char at a time

procedure TTestCryptographicRoutines._Base64GLH;
const
  Value64: RawUTF8 = 'SGVsbG8gL2Mn6XRhaXQg5+Ar';
var tmp: RawByteString;
    b64: RawUTF8;
    Value: WinAnsiString;
    i, L: Integer;
begin
  Value := 'Hello /c''0tait 67+';
  Value[10] := #$E9;
  Value[16] := #$E7;
  Value[17] := #$E0;
  Check(not IsBase64(Value));
  Check(TMySimpleHL7.Base64Encode(Value)=Value64);
  Check(BinToBase64(Value)=Value64);
  Check(IsBase64(Value64));
  tmp := StringFromFile(ExeVersion.ProgramFileName);
  b64 := TMySimpleHL7.Base64Encode(tmp);
  Check(IsBase64(b64));
  Check(TMySimpleHL7.Base64Decode(b64)=tmp);
  Check(BinToBase64(tmp)=b64);
  Check(Base64ToBin(b64)=tmp);
  tmp := '';
  for i := 1 to 19980 do begin
    b64 := TMySimpleHL7.Base64Encode(tmp);
    Check(TMySimpleHL7.Base64Decode(b64)=tmp);
    Check((tmp='') or IsBase64(b64));
    Check(BinToBase64(tmp)=b64);
    Check(Base64ToBin(b64)=tmp);
    if tmp<>'' then begin
      L := length(b64);
      Check(not IsBase64(pointer(b64),L-1));
      b64[Random(L)+1] := '&';
      Check(not IsBase64(pointer(b64),L));
    end;
    tmp := tmp+value;
  end;
end;

then Base64 ran faster than Base64GLH:

 1.5. Cryptographic routines:
  - Adler32: 1 assertion passed  367us
  - MD5: 86 assertions passed  258us
  - SHA1: 10 assertions passed  4.76ms
  - SHA256: 8 assertions passed  6.49ms
  - AES256: 17,428 assertions passed  206.63ms
      cypher 1..2409 bytes with AES-NI: 278us, without: 5.47ms
  - RC4: 1 assertion passed  89us
  - Base64: 119,886 assertions passed  28.90s
  - Base64GLH: 119,886 assertions passed  39.11s
  - CompressShaAes: 1,683 assertions passed  7.71ms
  - TAESPNRG: 13,300 assertions passed  51.35ms
  Total failed: 0 / 272,289  - Cryptographic routines PASSED  68.29s

Offline

#7 2016-10-12 05:15:25

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,510
Website

Re: Very fast asm-implementation of Base64 Encoding/Decoding

BTW, the asm code in http://www.delphipraxis.net/991-base64- … oding.html is not correct.
For instance, it does not preserve non-volatile registers like ESI or EDI.
In some situation, it may just break and gives random access violations or unexpected behavior.

And there is no x64 and pascal/ARM alternative.

So, I would stay away from this code as much as possible.

Offline

Board footer

Powered by FluxBB