#1 2016-04-12 07:48:16

hnb
Member
Registered: 2015-06-15
Posts: 291

SynCrtSock significant bug

to be more precise the bug exist in THttpSocket.CompressDataAndWriteHeaders. After investigation I found very strange things. (topic is related to http://synopse.info/forum/viewtopic.php?id=3223 )

How it was debugged? Small modification for THttpSocket.CompressDataAndWriteHeaders in SynCrtSock:

procedure THttpSocket.CompressDataAndWriteHeaders(const OutContentType: SockString;
  var OutContent: SockString);
var OutContentEncoding: SockString;
{$IFDEF MORMOT_TEST}
    iid: TSQLclient_iid;
    id: Int64;
{$ENDIF}
begin
{$IFDEF MORMOT_TEST}
  try
    iid := TSQLclient_iid.Create;
    iid.iid := GUIDToRawUTF8(RandomGUID);
    GIIDLock.Lock;
    if GIID <> nil then
    begin
      iid.creation_time := DBServerDateTime;
      iid.xkasy := GV_XKASA_ID;
      if OutContent <> '' then begin
        SetLength(TSQLclient_iidAccess(iid).fraw_body, Length(OutContent));
        Move(OutContent[1], TSQLclient_iidAccess(iid).fraw_body[1], Length(OutContent));
      end;
      id := GIID.AddWithBlobs(iid);
    end;
{$ENDIF}
  if integer(fCompressHeader)<>0 then begin
    OutContentEncoding := CompressDataAndGetHeaders(fCompressHeader,fCompress,
      OutContentType,OutContent);
    if OutContentEncoding<>'' then
        SockSend(['Content-Encoding: ',OutContentEncoding]);
  end;
  SockSend(['Content-Length: ',length(OutContent)]); // needed even 0
  if (OutContent<>'') and (OutContentType<>'') then
    SockSend(['Content-Type: ',OutContentType]);
{$IFDEF MORMOT_TEST}
    if GIID <> nil then
    begin
      if OutContent <> '' then begin
        SetLength(TSQLclient_iidAccess(iid).fsynlz_body, Length(OutContent));
        Move(OutContent[1], TSQLclient_iidAccess(iid).fsynlz_body[1], Length(OutContent));
        GIID.Update(iid, 'synlz_body');
      end;
    end;
    SockSend(['IID: ', iid.iid]);
    SockSend(['XCashID: ', GV_XKASA_ID]);
  finally
    GIIDLock.UnLock;
    iid.Free;
  end;
{$ENDIF}
end;

It was necessary to create some local database (available for my server on request) to get logs because the client was installed on android (ARMv6/7) so debugging process was hard. The chance to get the error is approximately 1/100000 requests -,-.

Bug 1:
For data like this (saved as raw_body):

{"ID":0,"sync_zlecenian_otw":false,"sync_zlecenian_otw_id":0,"zlecenian_otw":[],"zleceniap_otw":[],"wiadomosci_otw":[],"delete_wiadomosci_otw":[],"salarestauracji":0,"stoliki":[],"xkasy_top":0,"sender":"TfrmLogOut","client_inner_version":4,"operator":9999918876,"xkasy":999991380}

I got NULL (saved as synlz_body). That is mentioned "inlen=0" in topic http://synopse.info/forum/viewtopic.php?id=3223

Bug 2:
For data like this (saved as raw_body):

{"ID":0,"sync_zlecenian_otw":true,"sync_zlecenian_otw_id":0,"zlecenian_otw":[],"zleceniap_otw":[],"wiadomosci_otw":[],"delete_wiadomosci_otw":[],"salarestauracji":999992049683,"stoliki":[],"xkasy_top":0,"sender":"TTableBillFrm","client_inner_version":4,"operator":9999918876,"xkasy":999991380}

I got strange BLOB presented here as HEX view (saved as synlz_body). That is our ESynCrypto on server side {"Message":"TAESCFB.DecryptPKCS7: Invalid content"}  in topic http://synopse.info/forum/viewtopic.php?id=3223:

00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 

The bug occurs on client side: FPC Android ARM6 HF (hcSynShaAes for TSQLHttpClientWinSock).

Last edited by hnb (2016-04-12 12:21:26)


best regards,
Maciej Izak

Offline

#2 2016-04-12 15:04:57

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,206
Website

Re: SynCrtSock significant bug

I do not understand how to reproduce it.

Is the problem on the server side?
You state it occurs on the client side.
Do you mean on FPC Android ARM target?
We do not use this platform here, so it would be difficult to reproduce...

Which version of FPC do you use?
Did you try to disable hcSynShaAes?
Do the regression tests associated with SynLZ compression and Sha/Aes encryption pass on the FPC Android ARM target?

Offline

#3 2016-04-12 19:19:09

hnb
Member
Registered: 2015-06-15
Posts: 291

Re: SynCrtSock significant bug

ab wrote:

I do not understand how to reproduce it.

I too can't reproduce this :\. This is log/data from production. The output of CompressDataAndWriteHeaders generally is ok. Maybe some parts of CompressDataAndGetHeaders are not thread safe?

Perhaps the presented data (especcialy second hex data) are some kind of tip?

ab wrote:

Is the problem on the server side?

Server is protected. It can ignore bad data from clients. Anyway mORMot is critical part of my system and I need to fix that.

ab wrote:

You state it occurs on the client side.
Do you mean on FPC Android ARM target?

Yes.

ab wrote:

We do not use this platform here, so it would be difficult to reproduce...

Even acquiring data for this topic was difficult. I don't have direct access to devices. It is big restaurant...

ab wrote:

Which version of FPC do you use?

FPC trunk r33326

ab wrote:

Did you try to disable hcSynShaAes?

No. I will do that tomorrow. The acquisition of new data can take a while.

ab wrote:

Do the regression tests associated with SynLZ compression and Sha/Aes encryption pass on the FPC Android ARM target?

No. That is not that simple.

Last edited by hnb (2016-04-12 19:20:28)


best regards,
Maciej Izak

Offline

#4 2016-04-14 07:58:57

hnb
Member
Registered: 2015-06-15
Posts: 291

Re: SynCrtSock significant bug

hnb wrote:
ab wrote:

Did you try to disable hcSynShaAes?

No. I will do that tomorrow. The acquisition of new data can take a while.

With disabled hcSynShaAes works well. 0 errors. yikes

When hcSynShaAes is enabled on client side, I am almost sure that CompressDataAndGetHeaders is colliding with part where data are received from another thread from another TSQLHttpClientWinSock instance.

Some important details (maybe that can help to reproduce bug):

For Server side:

  aHttpServer := TMyHttpServer.Create(IntToStr(settings.httpPort),[aRestServer],'+',useHttpApiRegisteringURI, 7, secSynShaAes);

For my server is used lower "ServerThreadPoolCount" (I need to limit connections to PostgreSQL DB). I think that has important role in bug on client side, because for many requests for many clients some delay may occur (single client can have few connections: TSQLHttpClientWinSock per thread).

For Client side:

  GClient: TSQLHttpClientWinSock = nil;
  GClientUpdate: TSQLHttpClientWinSock = nil;
  GClientPing: TSQLHttpClientWinSock = nil;
  GClientIID: TSQLHttpClientWinSock = nil; 

When I've added new instance of TSQLHttpClientWinSock the bug occurs more often. For example GClientIID is used only to find our bug (but that additional instance of TSQLHttpClientWinSock is important to get bug more often) :

  fIID_Process := TSynBackgroundThreadProcess.Create('IID process', OnIID, 10000);

...
procedure TMainForm.OnIID(Sender: TSynBackgroundThreadProcess;
  Event: TWaitResult);
var LCodeError: Integer;
    LResult: RawUTF8;
begin
  LCodeError := GClientIID.CallBack(mPOST, 'SyncIID', '', LResult); 

I think when data are received and decompressed somewhere in GClientIID.CallBack and in the same time other data in other thread (for example by GClientUpdate) are compressed by CompressDataAndGetHeaders for send purpose then bug occurs.

Last edited by hnb (2016-04-14 07:59:40)


best regards,
Maciej Izak

Offline

#5 2016-04-14 08:50:59

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,206
Website

Re: SynCrtSock significant bug

Are you sure you make a thread-safe use of TSQLHttpClientWinSock, and that your server side is also thread-safe?

AFAIK the function CompressShaAes() is thread-safe.
A new TAESCFB instance is created at each call, and CompressSynLZ() is thread-safe by design.

Offline

#6 2016-04-14 10:50:16

hnb
Member
Registered: 2015-06-15
Posts: 291

Re: SynCrtSock significant bug

ab wrote:

Are you sure you make a thread-safe use of TSQLHttpClientWinSock

Yes I'am 100% sure. Why? Because all wrongly encoded data comes from GClientUpdate, which is used only in one place in whole project in special thread.

          LValue := ObjectToJSON(LRequest);
          // the only line where is used GClientUpdate (not counting the declaration, constructor and destructor)
          LCodeError := GClientUpdate.CallBack(mPOST, 'SyncRequest', LValue, LResult); 

as was showed in first topic LValue is always valid, but encoded data sent to server is empty or strange $00 $FF $FF ... $FF block.

The thread is created only once at app start in static constructor so GClientUpdate.CallBack is 100% thread-safe it is never created twice.

ab wrote:

, and that your server side is also thread-safe?

I think yes, anyway the problem is on client side as was showed in first post:

Valid OutContent -> call CompressDataAndGetHeaders(fCompressHeader,fCompress,OutContentType,OutContent); -> OutContent is empty or filled by $00 $FF $FF ... $FF.

but this is true only for hcSynShaAes. Since hcSynShaAes is disabled on client side I have the longest period without errors ever.

hypothetical even if something is wrong on server side AFAIK that can't have affect on CompressDataAndGetHeaders on client side in presented way.

Last edited by hnb (2016-04-14 10:52:06)


best regards,
Maciej Izak

Offline

#7 2016-04-14 10:54:56

hnb
Member
Registered: 2015-06-15
Posts: 291

Re: SynCrtSock significant bug

Maybe I need to lock ALL TSQLHttpClientWinSock, maybe TSQLHttpClientWinSock per thread is bad approach ?


best regards,
Maciej Izak

Offline

#8 2016-04-14 15:45:52

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,206
Website

Re: SynCrtSock significant bug

No, one TSQLHttpClientWinSock per thread is perfectly safe.
We use it on production since years.

I still don't get where your error comes from.

Offline

#9 2016-04-14 17:07:37

hnb
Member
Registered: 2015-06-15
Posts: 291

Re: SynCrtSock significant bug

after small research I think that the buggy is EncryptPKCS7Buffer for FPC ARM (SynCrypto.pas) (so probably it is not related to threads). Bug 2 has more sense in this context.

I see that is used Random (maybe this is our bug, for some random values combination?). Maybe usage of

Td0, Td1, Td2, Td3, Te0, Te1, Te2, Te3: array[byte] of cardinal;

is somehow wrong.

I will try to create test on my ARM/Android device with many requests to server...

Btw. I can confirm since the hcSynShaAes is disabled, all works fine in restaurant... 0 errors.


best regards,
Maciej Izak

Offline

#10 2016-04-14 18:43:02

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,206
Website

Re: SynCrtSock significant bug

FillRandom() used those Td0...Te3 to get some values to compute some random.
So even if it points to something wrong, it should not be anything but perhaps 0.
And the fact that all arrays are contiguous is checked using asserts.

Note that latest versions of SynCrypto changed the FillRandom() to use http://blog.synopse.info/post/AES-CSPRNG

I do no think the problem comes from Td0..Te3...

Offline

#11 2016-04-14 19:19:20

hnb
Member
Registered: 2015-06-15
Posts: 291

Re: SynCrtSock significant bug

Thanks for info. I am a little depressed. I need to find what is wrong. Maybe it is some FPC bug :\. Anyway we need wait to tomorrow.

Btw. maybe is worth to use custom random instead of system random for:

  { code from SynCrypto }
  Head.SomeSalt := random(MaxInt);

as is stated in FPC doc:

The Free Pascal implementation of the Random routine uses the Mersenne Twister to simulate randomness. This implementation has a better statistical distribution than for example a Linear Congruential generator algorithm, but is considerably slower than the latter. If speed is an issue, then alternate random number generators should be considered.


best regards,
Maciej Izak

Offline

#12 2016-04-14 19:54:11

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,206
Website

Re: SynCrtSock significant bug

Random(maxInt) here is not a problem being a bit slow, since the AES process in this TAESFull.EncodeDecode() method would definitively be much slower.

In all cases, TAESFull is not used anywhere in the framewok, AFAIR.

Offline

#13 2016-04-15 12:18:55

hnb
Member
Registered: 2015-06-15
Posts: 291

Re: SynCrtSock significant bug

I see that in SynLZcompress1pas is mentioned size (255 + 16) similar to block presented in bug2 (272 bytes one $00 byte and 255 + 16 $FF bytes). Maybe is some nonstandard bitwise-operations in FPC for ARM (http://stackoverflow.com/questions/9044 … ions-to-me). I can try to get asm arm code for SynLZcompress1pas .

Additionally all regression tests associated with SynLZ compression and Sha/Aes encryption pass without error.

I can provide ASM for ARM generated by FPC, but maybe is no sense for that... Seems like is almost impossible to find what is wrong :\ . The last what I can do is to update mORMot to latest versions of SynCrypto and latest FPC trunk.

I can't reproduce bug outside production case.


best regards,
Maciej Izak

Offline

#14 2016-04-28 08:34:16

hnb
Member
Registered: 2015-06-15
Posts: 291

Re: SynCrtSock significant bug

Notice: No difference for AES-CSPRNG (mORMot v1.18.2589) and for current FPC trunk (r33539). Empty body and ESynCrypto still occurs.


best regards,
Maciej Izak

Offline

#15 2017-03-29 13:10:01

hnb
Member
Registered: 2015-06-15
Posts: 291

Re: SynCrtSock significant bug

After many months of tracking logs from production server (~ 2 millions of http calls to server with SynLZ compression's per month from around ~10 devices), probably I discovered the reason of bug. Since in production environment I have less count of Samsung Galaxy Tab (IIRC Galaxy Tab 3 which has Exynos chipset) the bug occurs less often. All detected anomalies comes only from Samsung devices. From other device like "MyPhone hammer" (which has for sure different ARM processor model) I don't have problems with compression. The most important info about "impossible bug":

http://www.mono-project.com/news/2016/0 … 64-icache/
https://www.theregister.co.uk/2016/09/1 … e_gcc_bug/

ab, did you think that above problem could have influence on SynLZ compression for ARM?

Last edited by hnb (2017-03-29 13:53:01)


best regards,
Maciej Izak

Offline

Board footer

Powered by FluxBB