#1 2016-01-13 20:24:54

bigstar
Member
Registered: 2014-01-31
Posts: 5

SynCommons.pas and UpperCopy255Buf() oddities

First, a minor documentation problem: the comments below say the dest buffer holds 256 chars and that up to 255 chars are copied into it, but fewer are actually copied. How many fewer depends on whether UpperCopy255BufPas() or UpperCopy255BufSSE42() is used.

/// copy source^ into a 256 chars dest^ buffer with 7 bits upper case conversion
// - this version is written in optimized pascal
// - you should not have to call this function, but rely on UpperCopy255Buf()
// - returns final dest pointer
// - will copy up to 255 AnsiChar (expect the dest buffer to be defined e.g. as
// array[byte] of AnsiChar on the caller stack)

I passed a string of HelloWorld repeated 26 times (260 characters total) to each version and found:

UpperCopy255BufPas() truncates it to 252 characters, and the last 4 bytes contain random data because the tmp buffer is not zeroed before use and the result is not null-terminated.

UpperCopy255BufSSE42() appears to truncate it to 240 characters, and the last 16 bytes contain random data.
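Both limits are consistent with each routine capping the copy at the largest multiple of its chunk size that fits in 255 bytes: 252 = 63 x 4 and 240 = 15 x 16. The Python sketch below models that hypothesis. upper_copy_model() and the chunk sizes 4 and 16 are assumptions chosen to reproduce the reported numbers, not code taken from SynCommons.pas, and the model only uppercases 'a'..'z'.

```python
def upper_copy_model(dest: bytearray, src: bytes, chunk: int) -> int:
    """Hypothetical model (not SynCommons code) of UpperCopy255Buf():
    7-bit uppercase ('a'..'z' only) copy into a 256-byte dest, capped at
    the largest multiple of `chunk` that fits in 255 bytes.
    Returns the index just past the last copied byte ("final dest pointer")."""
    cap = (255 // chunk) * chunk          # 252 for chunk=4, 240 for chunk=16
    n = min(len(src), cap)
    for i in range(n):
        c = src[i]
        dest[i] = c - 32 if 0x61 <= c <= 0x7A else c
    return n


src = b"HelloWorld" * 26                  # 260 bytes
pas, sse = bytearray(256), bytearray(256)
end_pas = upper_copy_model(pas, src, 4)   # assumed model of the pascal version
end_sse = upper_copy_model(sse, src, 16)  # assumed model of the SSE4.2 version
pas[end_pas] = 0                          # the "^ := #0" idiom
sse[end_sse] = 0
print(end_pas, end_sse)                   # 252 240
```

Note that bytearray(256) starts zeroed, unlike the uninitialized tmp buffer on the Delphi stack, so the model shows no random trailing bytes.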

I noticed that several routines within SynCommons.pas do something like

  UpperCopy255(Up,aStartName)^ := #0;

or

  PWord(UpperCopy255(UpperSection,SectionName))^ := ord(']');
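The second idiom stores two bytes in a single write. Assuming little-endian x86/x64, the 16-bit value ord(']') (0x005D) lands in memory as ']' followed by #0, so the closing bracket and the terminator are written together. A Python sketch of that store, where write_word_le() is a hypothetical helper standing in for PWord(p)^ := value:

```python
import struct


def write_word_le(buf: bytearray, pos: int, value: int) -> None:
    """Hypothetical helper: store a 16-bit value at buf[pos:pos+2] in
    little-endian order, mirroring PWord(p)^ := value on x86/x64."""
    struct.pack_into("<H", buf, pos, value)


buf = bytearray(b"ABC" + b"\xff" * 5)     # pretend "ABC" was just copied
end = 3                                   # the pointer returned by the copy
write_word_le(buf, end, ord("]"))         # PWord(end)^ := ord(']')
print(bytes(buf[:end + 2]))               # b'ABC]\x00'
```

This relies on the buffer having at least two free bytes after the returned pointer.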

If the source is 'hello' (length 5) and we call

     UpperCopy255BufPas(tmp, PUTF8Char(S), Len)^ := #0;
or
     UpperCopy255BufSSE42(tmp, PUTF8Char(S), Len)^ := #0;

both calls give the same result: HELLO#0

But if the source is HelloWorld repeated 26 times, the same calls give different results:

HELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWOR#0<followed by random bytes>

HELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORLDHELLOWORL#0<followed by random bytes>

Is this the intended behavior? If so I think it should be documented somewhere.

Also there is no way to determine the exact length of the result.

Last edited by bigstar (2016-01-13 20:27:50)


#2 2016-01-13 20:37:02

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182

Re: SynCommons.pas and UpperCopy255Buf() oddities

Yes, this is the intended behavior: fill a small AnsiChar buffer, ending with a #0.
It is only meant to create an uppercase version of some text in an internal temporary buffer, for hashing or fast case-insensitive comparison.

It is to be used on small text, for at least two purposes:
* case-insensitive hash computation (collision due to length truncation is not an issue)
* case-insensitive key search, e.g. within an .INI file content, TRawUTF8List.IndexOfName, and similar.
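For the hashing case, truncation only means that keys which differ after the cap hash identically, which a hash table tolerates as an ordinary collision. A minimal Python sketch, assuming a 240-byte cap and using zlib.crc32 purely as a stand-in hash (it is not the hash SynCommons uses); upper_key() and hash_i() are hypothetical names:

```python
import zlib

MAX_KEY = 240  # assumed cap, taken from the SSE4.2 figure in the first post


def upper_key(s: str, cap: int = MAX_KEY) -> bytes:
    """Hypothetical helper: 7-bit uppercase copy ('a'..'z' only) of at
    most `cap` bytes, similar in spirit to filling the temporary buffer."""
    raw = s.encode("utf-8")[:cap]
    return bytes(c - 32 if 0x61 <= c <= 0x7A else c for c in raw)


def hash_i(s: str) -> int:
    """Case-insensitive hash of the truncated uppercase copy.
    zlib.crc32 is only a stand-in, not the SynCommons hash."""
    return zlib.crc32(upper_key(s))


print(hash_i("Section") == hash_i("SECTION"))              # True
print(hash_i("x" * 240 + "a") == hash_i("x" * 240 + "b"))  # True: differs only past the cap
```

Because two long keys can share a hash this way, a lookup built on such a hash would still need to confirm a hit by comparing the keys themselves.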

On a given PC, only one of UpperCopy255BufPas or UpperCopy255BufSSE42 is used, so the results are consistent, even if the truncation length may differ between the two.

Purpose of those functions is NOT to convert a text into its exact uppercase version, of any length.


#3 2016-01-13 21:11:17

bigstar
Member
Registered: 2014-01-31
Posts: 5

Re: SynCommons.pas and UpperCopy255Buf() oddities

Judging from most of the places where this function is used, it might make sense that it is intended for small text, but there are also functions such as HashAnsiStringI() where this might not be clear at all.

So, assuming the result doesn't need to be null-terminated, is it safe to assume that the source should be 240 characters or fewer for consistent results?
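If the two routines differ only in capping at the 252 and 240 characters reported in the first post (an assumption, not something the documentation states), their outputs agree for every source of 240 bytes or fewer and diverge beyond that. A short Python check of that model; capped_upper() is hypothetical and says nothing about how the real code behaves internally:

```python
def capped_upper(src: bytes, cap: int) -> bytes:
    """Hypothetical model: 7-bit uppercase copy ('a'..'z' only) truncated to cap bytes."""
    return bytes(c - 32 if 0x61 <= c <= 0x7A else c for c in src[:cap])


PAS_CAP, SSE_CAP = 252, 240  # limits observed in the first post, not documented values

base = b"HelloWorld" * 30    # 300 bytes of test input
agree = [n for n in range(len(base) + 1)
         if capped_upper(base[:n], PAS_CAP) == capped_upper(base[:n], SSE_CAP)]
print(max(agree))            # 240
```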

Last edited by bigstar (2016-01-13 21:14:46)
