mORMot2 API Reference

mormot.core.unicode.pas unit

Purpose: Framework Core Low-Level Unicode UTF-8 UTF-16 Ansi Conversion
- this unit is a part of the Open Source Synopse mORMot framework 2, licensed under an MPL/GPL/LGPL tri-license - see LICENSE.md

1.1. Units used in the mormot.core.unicode unit

Unit Name: Description
mormot.core.base: Framework Core Shared Types and RTL-like Functions
mormot.core.os: Framework Core Low-Level Wrappers to the Operating-System API

1.2. mormot.core.unicode class hierarchy

TObject
  TUtf8Table
  TSynAnsiConvert
    TSynAnsiUtf8
    TSynAnsiUtf16
    TSynAnsiFixedWidth
ExceptionWithProps
  ESynUnicode

1.3. Objects implemented in the mormot.core.unicode unit

Objects: Description
ESynUnicode: Exception raised by this unit in case of fatal conversion issue
TSynAnsiConvert: An abstract class to handle Ansi to/from Unicode translation
TSynAnsiFixedWidth: A class to handle Ansi to/from Unicode translation of fixed width encoding (i.e. non MBCS)
TSynAnsiUtf16: A class to handle UTF-16 to/from Unicode translation
TSynAnsiUtf8: A class to handle UTF-8 to/from Unicode translation
TUtf8Table

1.3.1. TUtf8Table

TUtf8Table = object(TObject)

function GetHighUtf8Ucs4(var U: PUtf8Char): Ucs4CodePoint;

Retrieve a >127 UCS4 CodePoint from UTF-8


1.3.2. ESynUnicode

ESynUnicode = class(ExceptionWithProps)

Exception raised by this unit in case of fatal conversion issue


1.3.3. TSynAnsiConvert

TSynAnsiConvert = class(TObject)

An abstract class to handle Ansi to/from Unicode translation
- implementations of this class will handle efficiently all Code Pages
- this default implementation will use the Operating System APIs
- do not create your own instance of this class: retrieve one using TSynAnsiConvert.Engine(), which will initialize either a TSynAnsiFixedWidth or a TSynAnsiConvert instance on demand


constructor Create(aCodePage: cardinal); reintroduce; virtual;

Initialize the internal conversion engine


function AnsiBufferToUnicode(Dest: PWideChar; Source: PAnsiChar; SourceChars: cardinal; NoTrailingZero: boolean = false): PWideChar; overload; virtual;

Direct conversion of a PAnsiChar buffer into a Unicode buffer
- Dest^ buffer must be reserved with at least SourceChars*2 bytes
- this default implementation will use the Operating System APIs
- will append a trailing #0 to the returned PWideChar, unless NoTrailingZero is set


function AnsiBufferToUtf8(Dest: PUtf8Char; Source: PAnsiChar; SourceChars: cardinal; NoTrailingZero: boolean = false): PUtf8Char; overload; virtual;

Direct conversion of a PAnsiChar buffer into a UTF-8 encoded buffer
- Dest^ buffer must be reserved with at least SourceChars*3 bytes
- will append a trailing #0 to the returned PUtf8Char, unless NoTrailingZero is set
- this default implementation will use the Operating System APIs
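The buffer sizing rule above can be sketched as follows. This is only an illustration, assuming that the mORMot 2 units are on the unit search path, that code page 1252 is handled on the target system, and that the returned pointer marks the end of the written data (as the trailing #0 remark suggests). The program name, the constant and the extra byte reserved for the trailing #0 are choices of this sketch, not requirements stated by the unit.

```pascal
program BufferSketch;

{$ifdef FPC}{$mode delphi}{$endif}

uses
  mormot.core.base,
  mormot.core.unicode;

const
  SRC_LEN = 3; // hypothetical input size

var
  conv: TSynAnsiConvert;
  src: array[0..SRC_LEN - 1] of AnsiChar;
  // SourceChars*3 bytes as documented, plus one spare byte reserved here
  // for the trailing #0 (a conservative assumption of this sketch)
  dest: array[0..SRC_LEN * 3] of AnsiChar;
  stop: PUtf8Char;
begin
  conv := TSynAnsiConvert.Engine(1252); // shared instance: do not free it
  if conv = nil then
    exit;
  src[0] := 'a';
  src[1] := AnsiChar($E9); // e-acute in Windows-1252
  src[2] := 'b';
  stop := conv.AnsiBufferToUtf8(@dest, @src, SRC_LEN);
  // assumed: stop points just after the last UTF-8 byte written
  writeln('UTF-8 bytes written: ', stop - PUtf8Char(@dest));
end.
```

Sizing the destination for the worst case up front lets the conversion run without any bounds checking or reallocation.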


function AnsiToAnsi(From: TSynAnsiConvert; Source: PAnsiChar; SourceChars: cardinal): RawByteString; overload;

Convert any Ansi buffer (providing a From converter) into Ansi Text


function AnsiToAnsi(From: TSynAnsiConvert; const Source: RawByteString): RawByteString; overload;

Convert any Ansi Text (providing a From converter) into Ansi Text


function AnsiToRawUnicode( Source: PAnsiChar; SourceChars: cardinal): RawUnicode; overload; virtual;

Convert any Ansi buffer into a Unicode String
- returns a value using our RawUnicode kind of string


function AnsiToRawUnicode(const AnsiText: RawByteString): RawUnicode; overload;

Convert any Ansi Text into a UTF-16 Unicode String
- returns a value using our RawUnicode kind of string


function AnsiToUnicodeString(const Source: RawByteString): SynUnicode; overload;

Convert any Ansi buffer into a Unicode String
- returns a SynUnicode, i.e. Delphi 2009+ UnicodeString or a WideString


function AnsiToUnicodeString( Source: PAnsiChar; SourceChars: cardinal): SynUnicode; overload;

Convert any Ansi buffer into a Unicode String
- returns a SynUnicode, i.e. Delphi 2009+ UnicodeString or a WideString


function AnsiToUtf8(const AnsiText: RawByteString): RawUtf8; virtual;

Convert any Ansi Text into a UTF-8 encoded String
- internally calls AnsiBufferToUtf8 virtual method


class function Engine(aCodePage: cardinal): TSynAnsiConvert;

Returns the engine corresponding to a given code page
- a global list of TSynAnsiConvert instances is handled by the unit - therefore, caller should not release the returned instance
- will return nil in case of unhandled code page
- if aCodePage is 0, will return the CurrentAnsiConvert value
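As a minimal sketch of the usage described above, a shared engine can be retrieved once and used for both conversion directions. It assumes a project with the mORMot 2 units on its search path and a system where code page 1252 is handled; the program and variable names are illustrative only.

```pascal
program EngineSketch;

{$ifdef FPC}{$mode delphi}{$endif}

uses
  mormot.core.base,
  mormot.core.unicode;

var
  conv: TSynAnsiConvert;
  ansi, back: RawByteString;
  utf8: RawUtf8;
begin
  // shared instance owned by the unit: never call conv.Free
  conv := TSynAnsiConvert.Engine(1252);
  if conv = nil then
  begin
    writeln('code page 1252 is not handled here');
    exit;
  end;
  // build the single byte $E9 (e-acute in Windows-1252) explicitly, without
  // relying on how the compiler interprets string literals
  SetLength(ansi, 1);
  ansi[1] := AnsiChar($E9);
  utf8 := conv.AnsiToUtf8(ansi); // expected to hold the two bytes $C3 $A9
  back := conv.Utf8ToAnsi(utf8); // expected to hold the single byte $E9 again
  writeln('UTF-8 length: ', length(utf8), ', Ansi length: ', length(back));
end.
```

Checking for nil before use follows from the remark that unhandled code pages return nil.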


function RawUnicodeToAnsi(const Source: RawUnicode): RawByteString;

Convert any Unicode-encoded String into Ansi Text
- internally calls UnicodeBufferToAnsi virtual method


function UnicodeBufferToAnsi(Dest: PAnsiChar; Source: PWideChar; SourceChars: cardinal): PAnsiChar; overload; virtual;

Direct conversion of a Unicode buffer into a PAnsiChar buffer
- Dest^ buffer must be reserved with at least SourceChars * 3 bytes
- will detect and ignore any trailing UTF-16LE BOM marker
- this default implementation will rely on the Operating System for all non ASCII-7 chars


function UnicodeBufferToAnsi(Source: PWideChar; SourceChars: cardinal): RawByteString; overload; virtual;

Direct conversion of a Unicode buffer into an Ansi Text


function UnicodeStringToAnsi(const Source: SynUnicode): RawByteString;

Convert any Unicode-encoded String into Ansi Text
- internally calls UnicodeBufferToAnsi virtual method


function Utf8BufferToAnsi(Source: PUtf8Char; SourceChars: cardinal): RawByteString; overload;

Convert any UTF-8 encoded buffer into Ansi Text
- internally calls Utf8BufferToAnsi virtual method


function Utf8BufferToAnsi(Dest: PAnsiChar; Source: PUtf8Char; SourceChars: cardinal): PAnsiChar; overload; virtual;

Direct conversion of a UTF-8 encoded buffer into a PAnsiChar buffer
- Dest^ buffer must be reserved with at least SourceChars bytes
- no trailing #0 is appended to the buffer


function Utf8ToAnsi(const u: RawUtf8): RawByteString; virtual;

Convert any UTF-8 encoded String into Ansi Text
- internally calls Utf8BufferToAnsi virtual method


function Utf8ToAnsiBuffer2K(const S: RawUtf8; Dest: PAnsiChar; DestSize: integer): integer;

Direct conversion of a UTF-8 encoded string into a WinAnsi <2KB buffer
- will truncate the destination string to DestSize bytes (including the trailing #0), with a maximum handled size of 2048 bytes
- returns the number of bytes stored in Dest^ (i.e. the position of #0)
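The truncation behaviour can be illustrated with a deliberately small destination buffer. This is a sketch under the same assumptions as any program using this unit (mORMot 2 units available, code page 1252 handled); the buffer size and names are hypothetical.

```pascal
program TruncateSketch;

{$ifdef FPC}{$mode delphi}{$endif}

uses
  mormot.core.base,
  mormot.core.unicode;

var
  conv: TSynAnsiConvert;
  small: array[0..7] of AnsiChar; // room for 7 chars plus the trailing #0
  n: integer;
begin
  conv := TSynAnsiConvert.Engine(1252); // shared instance: do not free it
  if conv = nil then
    exit;
  // the 10-char input does not fit: it should be cut to DestSize bytes,
  // trailing #0 included
  n := conv.Utf8ToAnsiBuffer2K('0123456789', @small, SizeOf(small));
  // per the description above, n is the position of the #0 within small
  writeln('stored ', n, ' bytes: ', PAnsiChar(@small));
end.
```

Passing SizeOf(small) as DestSize keeps the limit tied to the actual buffer declaration.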


procedure AnsiBufferToRawUtf8(Source: PAnsiChar; SourceChars: cardinal; out Value: RawUtf8); overload; virtual;

Direct conversion of a PAnsiChar buffer into a UTF-8 encoded string
- will call AnsiBufferToUnicode() overloaded virtual method


procedure Utf8BufferToAnsi(Source: PUtf8Char; SourceChars: cardinal; var result: RawByteString); overload; virtual;

Convert any UTF-8 encoded buffer into Ansi Text
- internally calls Utf8BufferToAnsi virtual method


property AnsiCharShift: byte read fAnsiCharShift;

Corresponding length binary shift used for worst conversion case


property CodePage: cardinal read fCodePage;

Corresponding code page


1.3.4. TSynAnsiFixedWidth

TSynAnsiFixedWidth = class(TSynAnsiConvert)

A class to handle Ansi to/from Unicode translation of fixed width encoding (i.e. non MBCS)
- this class will efficiently handle all available code pages without MBCS encoding - like WinAnsi (1252) or Russian (1251)
- it will use internal fast look-up tables for such encodings
- an instance could take some time to generate, and will consume more than 64 KB of memory: do not create your own instance, but retrieve one using TSynAnsiConvert.Engine(), which will initialize either a TSynAnsiFixedWidth or a TSynAnsiConvert instance on demand
- this class has some additional methods (e.g. IsValid*) which take advantage of the internal lookup tables for fast processing


constructor Create(aCodePage: cardinal); override;

Initialize the internal conversion engine


function AnsiBufferToUnicode(Dest: PWideChar; Source: PAnsiChar; SourceChars: cardinal; NoTrailingZero: boolean = false): PWideChar; override;

Direct conversion of a PAnsiChar buffer into a Unicode buffer
- Dest^ buffer must be reserved with at least SourceChars*2 bytes
- will append a trailing #0 to the returned PWideChar, unless NoTrailingZero is set


function AnsiBufferToUtf8(Dest: PUtf8Char; Source: PAnsiChar; SourceChars: cardinal; NoTrailingZero: boolean = false): PUtf8Char; override;

Direct conversion of a PAnsiChar buffer into a UTF-8 encoded buffer
- Dest^ buffer must be reserved with at least SourceChars*3 bytes
- will append a trailing #0 to the returned PUtf8Char, unless NoTrailingZero is set


function AnsiToRawUnicode(Source: PAnsiChar; SourceChars: cardinal): RawUnicode; override;

Convert any Ansi buffer into a Unicode String
- returns a value using our RawUnicode kind of string


function IsValidAnsi(WideText: PWideChar; Length: PtrInt): boolean; overload;

Return TRUE if the supplied unicode buffer only contains characters of the corresponding Ansi code page
- i.e. if the text can be displayed using this code page


function IsValidAnsi(WideText: PWideChar): boolean; overload;

Return TRUE if the supplied unicode buffer only contains characters of the corresponding Ansi code page
- i.e. if the text can be displayed using this code page


function IsValidAnsiU(Utf8Text: PUtf8Char): boolean;

Return TRUE if the supplied UTF-8 buffer only contains characters of the corresponding Ansi code page
- i.e. if the text can be displayed using this code page


function IsValidAnsiU8Bit(Utf8Text: PUtf8Char): boolean;

Return TRUE if the supplied UTF-8 buffer only contains 8-bit characters of the corresponding Ansi code page
- i.e. if the text can be displayed with only 8-bit unicode characters (e.g. no "tm" or such) within this code page


function UnicodeBufferToAnsi(Dest: PAnsiChar; Source: PWideChar; SourceChars: cardinal): PAnsiChar; override;

Direct conversion of a Unicode buffer into a PAnsiChar buffer
- Dest^ buffer must be reserved with at least SourceChars * 3 bytes
- will detect and ignore any trailing UTF-16LE BOM marker
- this overridden version will use internal lookup tables for fast processing


function Utf8BufferToAnsi(Dest: PAnsiChar; Source: PUtf8Char; SourceChars: cardinal): PAnsiChar; override;

Direct conversion of a UTF-8 encoded buffer into a PAnsiChar buffer
- Dest^ buffer must be reserved with at least SourceChars bytes
- no trailing #0 is appended to the buffer
- non Ansi compatible characters are replaced by '?'


function WideCharToAnsiChar(wc: cardinal): integer;

Conversion of a wide char into the corresponding Ansi character
- return -1 for an unknown WideChar in the current code page


property AnsiToWide: TWordDynArray read fAnsiToWide;

Direct access to the Ansi-To-Unicode lookup table
- use this array like AnsiToWide: array[byte] of word


property WideToAnsi: TByteDynArray read fWideToAnsi;

Direct access to the Unicode-To-Ansi lookup table
- use this array like WideToAnsi: array[word] of byte
- any unhandled WideChar will return ord('?')
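The additional lookup-based methods of this class could be exercised as in the sketch below. It assumes that TSynAnsiConvert.Engine(1252) yields a TSynAnsiFixedWidth instance (the class description names WinAnsi as a fixed-width code page, but the sketch checks it at runtime); names are illustrative.

```pascal
program FixedWidthSketch;

{$ifdef FPC}{$mode delphi}{$endif}

uses
  mormot.core.base,
  mormot.core.unicode;

var
  conv: TSynAnsiConvert;
  fixed: TSynAnsiFixedWidth;
  ascii: RawUtf8;
begin
  conv := TSynAnsiConvert.Engine(1252); // shared instance: do not free it
  if not (conv is TSynAnsiFixedWidth) then
  begin
    writeln('no fixed-width engine for code page 1252');
    exit;
  end;
  fixed := TSynAnsiFixedWidth(conv);
  // U+20AC (euro sign) is part of Windows-1252, where it is stored as $80
  writeln('U+20AC -> ', fixed.WideCharToAnsiChar($20AC));
  // U+4E2D (a CJK ideograph) is not part of Windows-1252: -1 is expected
  writeln('U+4E2D -> ', fixed.WideCharToAnsiChar($4E2D));
  ascii := 'hello';
  writeln('plain ASCII is valid: ', fixed.IsValidAnsiU(pointer(ascii)));
end.
```

The runtime "is" test also covers a nil result, since nil is never an instance of any class.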


1.3.5. TSynAnsiUtf8

TSynAnsiUtf8 = class(TSynAnsiConvert)

A class to handle UTF-8 to/from Unicode translation
- match the TSynAnsiConvert signature, for code page CP_UTF8
- this class is mostly a non-operation for conversion to/from UTF-8


constructor Create(aCodePage: cardinal); override;

Initialize the internal conversion engine


function AnsiBufferToUnicode(Dest: PWideChar; Source: PAnsiChar; SourceChars: cardinal; NoTrailingZero: boolean = false): PWideChar; override;

Direct conversion of a PAnsiChar UTF-8 buffer into a Unicode buffer
- Dest^ buffer must be reserved with at least SourceChars*2 bytes
- will append a trailing #0 to the returned PWideChar, unless NoTrailingZero is set


function AnsiBufferToUtf8(Dest: PUtf8Char; Source: PAnsiChar; SourceChars: cardinal; NoTrailingZero: boolean = false): PUtf8Char; override;

Direct conversion of a PAnsiChar UTF-8 buffer into a UTF-8 encoded buffer
- Dest^ buffer must be reserved with at least SourceChars*3 bytes
- will append a trailing #0 to the returned PUtf8Char, unless NoTrailingZero is set


function AnsiToRawUnicode(Source: PAnsiChar; SourceChars: cardinal): RawUnicode; override;

Convert any UTF-8 Ansi buffer into a Unicode String
- returns a value using our RawUnicode kind of string


function AnsiToUtf8(const AnsiText: RawByteString): RawUtf8; override;

Convert any Ansi Text into a UTF-8 encoded String
- directly assign the input as result, since no conversion is needed


function UnicodeBufferToAnsi(Source: PWideChar; SourceChars: cardinal): RawByteString; override;

Direct conversion of a Unicode buffer into an Ansi Text


function UnicodeBufferToAnsi(Dest: PAnsiChar; Source: PWideChar; SourceChars: cardinal): PAnsiChar; override;

Direct conversion of a Unicode buffer into a PAnsiChar UTF-8 buffer
- will detect and ignore any trailing UTF-16LE BOM marker
- Dest^ buffer must be reserved with at least SourceChars * 3 bytes


function Utf8BufferToAnsi(Dest: PAnsiChar; Source: PUtf8Char; SourceChars: cardinal): PAnsiChar; override;

Direct conversion of a UTF-8 encoded buffer into a PAnsiChar UTF-8 buffer
- Dest^ buffer must be reserved with at least SourceChars bytes
- no trailing #0 is appended to the buffer


function Utf8ToAnsi(const u: RawUtf8): RawByteString; override;

Convert any UTF-8 encoded String into Ansi Text
- directly assign the input as result, since no conversion is needed


procedure AnsiBufferToRawUtf8(Source: PAnsiChar; SourceChars: cardinal; out Value: RawUtf8); override;

Direct conversion of a PAnsiChar buffer into a UTF-8 encoded string


procedure Utf8BufferToAnsi(Source: PUtf8Char; SourceChars: cardinal; var result: RawByteString); override;

Convert any UTF-8 encoded buffer into Ansi Text


1.3.6. TSynAnsiUtf16

TSynAnsiUtf16 = class(TSynAnsiConvert)

A class to handle UTF-16 to/from Unicode translation
- match the TSynAnsiConvert signature, for code page CP_UTF16
- even if UTF-16 is not an Ansi format, code page CP_UTF16 may have been used to store UTF-16 encoded binary content
- this class is mostly a non-operation for conversion to/from Unicode


constructor Create(aCodePage: cardinal); override;

Initialize the internal conversion engine


function AnsiBufferToUnicode(Dest: PWideChar; Source: PAnsiChar; SourceChars: cardinal; NoTrailingZero: boolean = false): PWideChar; override;

Direct conversion of a PAnsiChar UTF-16 buffer into a Unicode buffer
- Dest^ buffer must be reserved with at least SourceChars*2 bytes
- will append a trailing #0 to the returned PWideChar, unless NoTrailingZero is set


function AnsiBufferToUtf8(Dest: PUtf8Char; Source: PAnsiChar; SourceChars: cardinal; NoTrailingZero: boolean = false): PUtf8Char; override;

Direct conversion of a PAnsiChar UTF-16 buffer into a UTF-8 encoded buffer
- Dest^ buffer must be reserved with at least SourceChars*3 bytes
- will append a trailing #0 to the returned PUtf8Char, unless NoTrailingZero is set


function AnsiToRawUnicode(Source: PAnsiChar; SourceChars: cardinal): RawUnicode; override;

Convert any UTF-16 Ansi buffer into a Unicode String
- returns a value using our RawUnicode kind of string


function UnicodeBufferToAnsi(Dest: PAnsiChar; Source: PWideChar; SourceChars: cardinal): PAnsiChar; override;

Direct conversion of a Unicode buffer into a PAnsiChar UTF-16 buffer
- Dest^ buffer must be reserved with at least SourceChars * 3 bytes


function Utf8BufferToAnsi(Dest: PAnsiChar; Source: PUtf8Char; SourceChars: cardinal): PAnsiChar; override;

Direct conversion of a UTF-8 encoded buffer into a PAnsiChar UTF-16 buffer
- Dest^ buffer must be reserved with at least SourceChars bytes
- no trailing #0 is appended to the buffer


1.4. Types implemented in the mormot.core.unicode unit

1.4.1. PNormTable

PNormTable = ^TNormTable;

Pointer to a lookup table used for fast case conversion


1.4.2. PNormTableByte

PNormTableByte = ^TNormTableByte;

Pointer to a lookup table used for fast case conversion


1.4.3. PTextByteSet

PTextByteSet = ^TTextByteSet;

Points to an Ordinal lookup table used for branch-less text parsing


1.4.4. PTextCharSet

PTextCharSet = ^TTextCharSet;

Points to an AnsiChar lookup table used for branch-less text parsing


1.4.5. TBomFile

TBomFile = ( bomNone, bomUnicode, bomUtf8 );

Text file layout, as returned by BomFile() and StringFromBomFile()
- bomNone means there was no BOM recognized
- bomUnicode stands for UTF-16 LE encoding (as on Windows products)
- bomUtf8 stands for a UTF-8 BOM (as on Windows products)


1.4.6. TCharConversionFlags

TCharConversionFlags = set of ( ccfNoTrailingZero, ccfReplacementCharacterForUnmatchedSurrogate);

Option set for RawUnicodeToUtf8() conversion


1.4.7. TIdemPropNameUSameLen

TIdemPropNameUSameLen = function(P1, P2: pointer; P1P2Len: PtrInt): boolean;

Delphi does not like to inline goto


1.4.8. TNormTable

TNormTable = packed array[AnsiChar] of AnsiChar;

Lookup table used for fast case conversion


1.4.9. TNormTableByte

TNormTableByte = packed array[byte] of byte;

Lookup table used for fast case conversion


1.4.10. TOnStringTranslate

TOnStringTranslate = procedure(var English: string) of object;

A generic callback, which can be used to translate some text on the fly
- maps procedure TLanguageFile.Translate(var English: string) signature as defined in mORMoti18n.pas
- can be used e.g. for TSynMustache's {{"English text}} callback


1.4.11. TSynAnsicharSet

TSynAnsicharSet = set of AnsiChar;

Used to store a set of 8-bit encoded characters


1.4.12. TSynByteSet

TSynByteSet = set of byte;

Used to store a set of 8-bit unsigned integers


1.4.13. TTextByteSet

TTextByteSet = array[byte] of TTextChar;

Defines an Ordinal lookup table used for branch-less text parsing


1.4.14. TTextChar

TTextChar = set of ( tcNot01013, tc1013, tcCtrlNotLF, tcCtrlNot0Comma, tcWord, tcIdentifierFirstChar, tcIdentifier, tcUriUnreserved);

Character categories for text linefeed/word/identifier/uri parsing
- using such a set compiles into TEST [MEM], IMM so is more efficient than a regular set of AnsiChar which generates much slower BT [MEM], IMM
- the same 256-byte memory will also be reused from L1 CPU cache during the parsing of complex input
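The lookup-table technique behind this type can be sketched in plain Pascal, without the unit itself. The categories, table and helper below are hypothetical and much smaller than TTextChar; they only show how one table load and one set test replace a chain of range comparisons.

```pascal
program CharClassSketch;

{$ifdef FPC}{$mode delphi}{$endif}

type
  // hypothetical categories, not the ones defined by the unit
  TDemoKind = (dkDigit, dkWord, dkSpace);
  TDemoKinds = set of TDemoKind;
  TDemoTable = array[AnsiChar] of TDemoKinds;

var
  Table: TDemoTable;

// fill the 256 entries once, before any parsing takes place
procedure BuildTable;
var
  c: AnsiChar;
begin
  for c := low(AnsiChar) to high(AnsiChar) do
  begin
    Table[c] := [];
    if c in ['0'..'9'] then
      Table[c] := Table[c] + [dkDigit, dkWord];
    if c in ['a'..'z', 'A'..'Z', '_'] then
      include(Table[c], dkWord);
    if c in [#9, ' '] then
      include(Table[c], dkSpace);
  end;
end;

// count the leading chars of p^ which belong to the given category
// - stops at the ending #0, whose table entry is the empty set
function CountKind(p: PAnsiChar; kind: TDemoKind): integer;
begin
  result := 0;
  if p <> nil then
    while kind in Table[p[result]] do
      inc(result);
end;

begin
  BuildTable;
  writeln(CountKind('item42 = 1', dkWord)); // 6
  writeln(CountKind('2024-01', dkDigit));   // 4
end.
```

Because the #0 entry holds no category, the scanning loop needs no separate end-of-text test.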


1.4.15. TTextCharSet

TTextCharSet = array[AnsiChar] of TTextChar;

Defines an AnsiChar lookup table used for branch-less text parsing


1.4.16. TUtf8Compare

TUtf8Compare = function(P1, P2: PUtf8Char): PtrInt;

Function prototype used internally for UTF-8 buffer comparison
- also used e.g. in mormot.core.variants unit


1.5. Constants implemented in the mormot.core.unicode unit

1.5.1. BOM_UTF16LE

BOM_UTF16LE = #$FEFF;

UTF-16LE BOM WideChar marker, as existing e.g. in some UTF-16 Windows files


1.5.2. UNICODE_REPLACEMENT_CHARACTER

UNICODE_REPLACEMENT_CHARACTER = $fffd;

Replace any incoming character whose value is unrepresentable in Unicode
- set e.g. by GetUtf8WideChar(), Utf8UpperReference() or RawUnicodeToUtf8() when ccfReplacementCharacterForUnmatchedSurrogate is set
- encoded as $ef $bf $bd bytes in UTF-8
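The three bytes quoted above follow from the standard three-byte UTF-8 pattern. The short program below is a stand-alone worked example in plain Pascal (it does not use the unit, and the constant name is local to the sketch).

```pascal
program ReplacementCharSketch;

{$ifdef FPC}{$mode delphi}{$endif}

uses
  SysUtils;

const
  DEMO_REPLACEMENT = $fffd; // same value as UNICODE_REPLACEMENT_CHARACTER

var
  b1, b2, b3: byte;
begin
  // three-byte UTF-8 pattern for U+0800..U+FFFF: 1110xxxx 10xxxxxx 10xxxxxx
  b1 := $e0 or (DEMO_REPLACEMENT shr 12);          // top 4 bits
  b2 := $80 or ((DEMO_REPLACEMENT shr 6) and $3f); // middle 6 bits
  b3 := $80 or (DEMO_REPLACEMENT and $3f);         // low 6 bits
  writeln(IntToHex(b1, 2), ' ', IntToHex(b2, 2), ' ', IntToHex(b3, 2));
  // prints EF BF BD
end.
```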


1.6. Functions or procedures implemented in the mormot.core.unicode unit

Functions or proceduresDescription
AddRawUtf8Add the Value to Values[], with an external count variable, for performance
AddRawUtf8True if Value was added successfully in Values[]
AddRawUtf8Add Value[] items to Values[], with an external count variable, for performance
AddRawUtf8Add Value[] items to Values[]
AddSortedRawUtf8Add a RawUtf8 value in an alphaticaly sorted dynamic array of RawUtf8
AddStringAdd the Value to Values[] string array
Ansi7ToStringConvert any Ansi 7-bit encoded String into a RTL string
Ansi7ToStringConvert any Ansi 7-bit encoded String into a RTL string
Ansi7ToStringConvert any Ansi 7-bit encoded String into a RTL string
AnsiBufferToTempUtf8Convert any Ansi memory buffer into UTF-8, using a TSynTempBuffer if needed
AnsiCharToUtf8Convert an AnsiChar buffer (of a given code page) into a UTF-8 string
AnsiICompFast WinAnsi comparison using the NormToUpper[] array for all 8-bit values
AnsiICompWFast case-insensitive Unicode comparison handling ASCII 7-bit chars
AnsiToStringConvert an AnsiString (of a given code page) into a RTL string
AnsiToUtf8Convert an AnsiString (of a given code page) into a UTF-8 string
AnyAnsiToUtf8Direct conversion of an AnsiString with an unknown code page into an UTF-8 encoded String
AnyAnsiToUtf8Direct conversion of an AnsiString with an unknown code page into an UTF-8 encoded String
AnyTextFileToRawUtf8Read a File content into a RawUtf8, detecting any leading BOM
AnyTextFileToStringRead a File content into a RTL string, detecting any leading BOM
AnyTextFileToSynUnicodeRead a File content into SynUnicode string, detecting any leading BOM
AppendShortCommaFast append some UTF-8 text into a ShortString, with an ending ','
BomFileCheck the file BOM at the beginning of a file buffer
CamelCaseConvert a string into an human-friendly CamelCase identifier
CamelCaseConvert a string into an human-friendly CamelCase identifier
CaseCopyLow-level function called when inlining UpperCase(Copy) and LowerCase(Copy)
CaseSelfLow-level function called when inlining UpperCaseSelf and LowerCaseSelf
CodePageToTextReturn a code page number into human-friendly text
ContainsUtf8Return true if up^ is contained inside the UTF-8 buffer p^
ConvertCaseUtf8Fast conversion of the supplied text into 8-bit case sensitivity
DeleteRawUtf8Delete a RawUtf8 item in a dynamic array of RawUtf8;
DeleteRawUtf8Delete a RawUtf8 item in a dynamic array of RawUtf8
DetectRawUtf8Detect UTF-8 content and mark the variable with the CP_UTF8 codepage
EndWithCheck case-insensitive matching ending of text in upTextEnd
EndWithArrayReturns the index of a case-insensitive matching ending of p^ in upArray[]
EndWithExactCheck case-sensitive matching ending of text in ending
FastFindIndexedPUtf8CharRetrieve the index of a PUtf8Char in a PUtf8Char array via a sort indexed
FastFindPUtf8CharSortedRetrieve the index where is located a PUtf8Char in a sorted PUtf8Char array
FastFindPUtf8CharSortedRetrieve the index where is located a PUtf8Char in a sorted PUtf8Char array
FastFindUpperPUtf8CharSortedRetrieve the index where is located a PUtf8Char in a sorted uppercase array
FastLocatePUtf8CharSortedRetrieve the index where to insert a PUtf8Char in a sorted PUtf8Char array
FastLocatePUtf8CharSortedRetrieve the index where to insert a PUtf8Char in a sorted PUtf8Char array
FillZeroFill all bytes of this UTF-16 string with zeros, i.e. 'toto' -> #0#0#0#0
FillZeroFill all bytes of this dynamic array of bytes with zeros
FillZeroFill all bytes of this UTF-8 string with zeros, i.e. 'toto' -> #0#0#0#0
FillZeroFill all bytes of this memory buffer with zeros, i.e. 'toto' -> #0#0#0#0
FillZeroFill all bytes of this UTF-8 string with zeros, i.e. 'toto' -> #0#0#0#0
FindAnsiReturn true if UpperValue (Ansi) is contained in A^ (Ansi)
FindNameValueSearch and returns a value from its uppercased named entry
FindNameValueSearch for a value from its uppercased named entry
FindNameValuePointerSearch and returns a PUtf8Char value from its uppercased named entry
FindNextUtf8WordBeginPoints to the beginning of the next word stored in U
FindRawUtf8Low-level efficient search of Value in Values[]
FindRawUtf8Return the index of Value in Values[], -1 if not found
FindRawUtf8Return the index of Value in Values[], -1 if not found
FindShortStringListExactFast search of an exact case-insensitive match of a RTTI's PShortString array
FindShortStringListTrimLowerCaseFast case-insensitive search of a left-trimmed lowercase match of a RTTI's PShortString array
FindShortStringListTrimLowerCaseExactFast case-sensitive search of a left-trimmed lowercase match of a RTTI's PShortString array
FindUnicodeReturn true if Upper (Unicode encoded) is contained in U^ (UTF-8 encoded)
FindUtf8Return true if UpperValue (Ansi) is contained in U^ (UTF-8 encoded)
GetCaptionFromPCharLenUnCamelCase and translate a char buffer
GetHighUtf8Ucs4Internal function, used to retrieve a >127 US4 CodePoint from UTF-8
GetLineContainsReturns TRUE if the supplied uppercased text is contained in the text buffer
GetLineSizeCompute the line length from source array of chars
GetLineSizeSmallerThanReturns true if the line length from source array of chars is not less than the specified count
GetNextFieldPropRetrieve the next SQL-like identifier within the UTF-8 buffer
GetNextFieldPropSameLineRetrieve the next identifier within the UTF-8 buffer on the same line
GetNextLineExtract a line from source array of chars
GetNextStringLineToRawUnicodeReturn next string delimited with #13#10 from P, nil if no more
GetNextUtf8UpperRetrieve the next UCS4 CodePoint stored in U, then update the U pointer
GetUtf8WideCharDecode UTF-16 WideChar from UTF-8 input buffer
GotoEndOfQuotedStringGet the next character after a quoted buffer
GotoNextNotSpaceGet the next character not in [#1..' ']
GotoNextNotSpaceSameLineGet the next character not in [#9,' ']
GotoNextSpaceGet the next character in [#0..' ']
IdemFileExtReturns true if the file name extension contained in p^ is the same same as extup^
IdemFileExtsReturns matching file name extension index as extup^
IdemPCharReturns true if the beginning of p^ is the same as up^
IdemPCharReturns true if the beginning of p^ is the same as up^
IdemPCharAndGetNextLineReturn true if IdemPChar(source,searchUp), and go to the next line of source
IdemPCharArrayReturns the index of a matching beginning of p^ in upArray[]
IdemPCharArrayBy2Returns the index of a matching beginning of p^ in upArray two characters
IdemPCharUReturns true if the beginning of p^ is the same as up^
IdemPCharWReturns true if the beginning of p^ is same as up^
IdemPCharWithoutWhiteSpaceReturns true if the beginning of p^ is the same as up^, ignoring white spaces
IdemPPCharReturns the index of a matching beginning of p^ in nil-terminated up^ array
IdemPropNameCase insensitive comparison of ASCII 7-bit identifiers
IdemPropNameCase insensitive comparison of ASCII 7-bit identifiers
IdemPropNameCase insensitive comparison of ASCII 7-bit identifiers
IdemPropNameUCase insensitive comparison of ASCII 7-bit identifiers
IdemPropNameUCase insensitive comparison of ASCII 7-bit identifiers
IdemPropNameUSameLenNotNullCase insensitive comparison of ASCII 7-bit identifiers of same length
IsCaseSensitiveCheck if the supplied text has some case-insentitive 'a'..'z','A'..'Z' chars
IsCaseSensitiveCheck if the supplied text has some case-insentitive 'a'..'z','A'..'Z' chars
IsFixedWidthCodePageCheck if a code page is known to be of fixed width, i.e. not MBCS
IsValidUtf8Returns TRUE if the supplied buffer has valid UTF-8 encoding
IsValidUtf8Returns TRUE if the supplied buffer has valid UTF-8 encoding
IsValidUtf8WithoutControlCharsReturns TRUE if the supplied buffer has valid UTF-8 encoding with no #0..#31 control characters
IsValidUtf8WithoutControlCharsReturns TRUE if the supplied buffer has valid UTF-8 encoding with no #1..#31 control characters
IsVoidCheck all character within text are spaces or control chars
IsWinAnsiReturn TRUE if the supplied unicode buffer only contains WinAnsi characters
IsWinAnsiReturn TRUE if the supplied unicode buffer only contains WinAnsi characters
IsWinAnsiUReturn TRUE if the supplied UTF-8 buffer only contains WinAnsi characters
IsWinAnsiU8BitReturn TRUE if the supplied UTF-8 buffer only contains WinAnsi 8-bit characters
IsZeroReturns TRUE if Value is nil or all supplied Values[] equal ''
LeftUReturns n leading characters
LowerCaseFast conversion of the supplied text into lowercase
LowerCaseCopyFast conversion of the supplied text into lowercase
LowerCaseSelfFast in-place conversion of the supplied variable text into lowercase
LowerCaseSynUnicodeUse the RTL to convert the SynUnicode text to LowerCase
LowerCaseUFast conversion of the supplied text into 8-bit lowercase
LowerCaseUnicodeAccurate conversion of the supplied UTF-8 content into the corresponding lower-case Unicode characters
NextNotSpaceCharIsCheck if the next character not in [#1..' '] matchs a given value
NextUtf8Ucs4Get the UCS4 CodePoint stored in P^ (decode UTF-8 if necessary)
OnlyCharReturns the supplied text content, without any other char than specified
PosCharAnyFast retrieve the position of any value of a given set of characters
PosExIA ASCII-7 case-insensitive version of PosEx()
PosExIA case-insensitive version of PosEx() with a specified lookup table
PosExIPasInternal function used when inlining PosExI()
PosIA non case-sensitive RawUtf8 version of Pos()
PosIUA non case-sensitive RawUtf8 version of Pos()
PropNameSanitizeTry to generate a PropNameValid() output from an incoming text
PropNamesValidReturns TRUE if the given text buffers contains A..Z,0..9,_ characters
PropNameValidReturns TRUE if the given text buffer contains a..z,A..Z,0..9,_ characters
QuickSortRawUtf8Sort a dynamic array of RawUtf8 items
QuickSortRawUtf8Sort a RawUtf8 array, low values first
QuotedStrFormat a text buffer with SQL-like quotes
QuotedStrFormat a text content with SQL-like quotes
QuotedStrFormat a text content with SQL-like quotes
RawUnicodeToStringConvert any UTF-16 encoded buffer into a RTL string
RawUnicodeToStringConvert any RawUnicode encoded string into a RTL string
RawUnicodeToStringConvert any UTF-16 encoded buffer into a RTL string
RawUnicodeToSynUnicodeConvert any UTF-16 buffer into a generic SynUnicode Text
RawUnicodeToSynUnicodeConvert any RawUnicode String into a generic SynUnicode Text
RawUnicodeToUtf8Convert a UTF-16 PWideChar buffer into a UTF-8 buffer
RawUnicodeToUtf8Convert a RawUnicode string into a UTF-8 string
RawUnicodeToUtf8Convert a UTF-16 PWideChar buffer into a UTF-8 string
RawUnicodeToUtf8Convert a UTF-16 PWideChar buffer into a UTF-8 string
RawUnicodeToUtf8Convert a UTF-16 PWideChar buffer into a UTF-8 string
RawUnicodeToUtf8Convert a UTF-16 PWideChar buffer into a UTF-8 temporary buffer
RawUnicodeToWinAnsiConvert a UTF-16 PWideChar buffer into a WinAnsi (code page 1252) string
RawUnicodeToWinAnsiConvert a UTF-16 string into a WinAnsi (code page 1252) string
RawUnicodeToWinPCharDirect conversion of a UTF-16 encoded buffer into a WinAnsi PAnsiChar buffer
RawUtf8DynArrayEqualsTrue if both TRawUtf8DynArray are the same
RawUtf8DynArrayEqualsTrue if both TRawUtf8DynArray are the same for a given number of items
RawUtf8FromFileRead a File content into a RawUtf8, detecting any leading BOM
RawUtf8OfCharUTF-8 dedicated (and faster) alternative to StringOfChar((Ch,Count))
RightUReturns n trailing characters
SameTextUSameText() overloaded function with proper UTF-8 decoding
ShortStringToUtf8Direct conversion of a WinAnsi ShortString into a UTF-8 text
SortDynArrayAnsiStringICompare two "array of AnsiString" elements, with no case sensitivity
SortDynArrayPUtf8CharICompare two "array of PUtf8Char/PAnsiChar" elements, with no case sensitivity
SortDynArrayStringICompare two "array of RTL string" elements, with no case sensitivity
SortDynArrayUnicodeStringICompare two "array of WideString/UnicodeString" elements, with no case sensitivity
SplitSplit a RawUtf8 string into two strings, according to SepStr separator
SplitSplit a RawUtf8 string into several strings, according to SepStr separator
SplitSplit a RawUtf8 string into two strings, according to SepStr separator
SplitRightReturns the last occurrence of the given SepChar separated context
SplitRightsReturns the last occurrence of the given SepChar separated context
StartWithCheck case-insensitive matching starting of text in upTextStart
StartWithExactCheck case-sensitive matching starting of text in start
StrCompILOur fast version of StrCompIL(), to be used with PUtf8Char
StrCompLOur fast version of StrCompL(), to be used with PUtf8Char
strcspnPure pascal version of strcspn(), to be used with PUtf8Char/PAnsiChar
StrICompOur fast version of StrIComp(), to be used with PUtf8Char/PAnsiChar
StrICompLNotNilStrIComp-like function with a length, lookup table and Str1/Str2 expected not nil
StrICompNotNilStrIComp-like function with a lookup table and Str1/Str2 expected not nil
StrILNotNilStrIComp function with a length, lookup table and Str1/Str2 expected not nil
StringBufferToUtf8Convert any RTL string 0-terminated Text buffer into a UTF-8 string
StringBufferToUtf8Convert any RTL string buffer into a UTF-8 encoded buffer
StringDynArrayToRawUtf8DynArrayConvert the string dynamic array into a dynamic array of UTF-8 strings
StringFromBomFileRead a file into a temporary variable, check the BOM, and adjust the buffer
StringListToRawUtf8DynArrayConvert the string list into a dynamic array of UTF-8 strings
StringReplaceAllFast version of StringReplace(S, OldPattern, NewPattern, [rfReplaceAll]);
StringReplaceAllFast version of several cascaded StringReplaceAll()
StringReplaceAllCase-sensitive (or not) StringReplace(S, OldPattern, NewPattern,[rfReplaceAll])
StringReplaceAllProcessActual replacement function called by StringReplaceAll() on first match
StringReplaceCharsFast replace of a specified char by a given string
StringReplaceTabsFast replace of all #9 chars by a given string
StringToAnsi7Convert any RTL string into Ansi 7-bit encoded String
StringToRawUnicodeConvert any RTL string into a RawUnicode encoded String
StringToRawUnicodeConvert any RTL string into a RawUnicode encoded String
StringToSynUnicodeConvert any RTL string into a SynUnicode encoded String
StringToSynUnicodeConvert any RTL string into a SynUnicode encoded String
StringToUtf8Convert any RTL string into a UTF-8 encoded String
StringToUtf8Convert any RTL string buffer into a UTF-8 encoded String
StringToUtf8Convert any RTL string into a UTF-8 encoded String
StringToUtf8Convert any RTL string into a UTF-8 encoded TSynTempBuffer
StringToVariantConvert any RTL string into a variant storing a UTF-8 string
StringToVariantConvert any RTL string into a variant storing a UTF-8 string
StringToWinAnsiConvert any RTL string into WinAnsi (Win-1252) 8-bit encoded String
StrPosIA case-insensitive version of Pos()
StrPosIReferenceUTF-8 Unicode 10.0 case-insensitive Pattern search within UTF-8 buffer
strspnPure pascal version of strspn(), to be used with PUtf8Char/PAnsiChar
SynUnicodeToStringConvert any SynUnicode encoded string into a RTL string
SynUnicodeToUtf8Convert a SynUnicode string into a UTF-8 string
ToUtf8Convert any RTL string into a UTF-8 encoded String
ToUtf8Convert any UTF-8 encoded ShortString Text into a UTF-8 encoded String
TRawUtf8DynArrayFromQuick helper to initialize a dynamic array of RawUtf8 from some constants
TrimCharReturns the supplied text content, without any specified char
TrimCharsTrim some leading and trailing chars
TrimControlCharsReturns the supplied text content, without any control char
TrimLeftTrims leading whitespace characters from the string by removing new line, space, and tab characters
TrimLeftLinesTrims leading whitespace of every line of the UTF-8 text
TrimLeftLowerCaseTrim first lowercase chars ('otDone' will return 'Done' e.g.)
TrimLeftLowerCaseShortTrim first lowercase chars ('otDone' will return 'Done' e.g.)
TrimLeftLowerCaseToShortTrim first lowercase chars ('otDone' will return 'Done' e.g.)
TrimLeftLowerCaseToShortTrim first lowercase chars ('otDone' will return 'Done' e.g.)
TrimOneCharReturns the supplied text content, without one specified char
TrimRightTrims trailing whitespace characters from the string by removing trailing newline, space, and tab characters
Ucs4ToUtf8UTF-8 encode one UCS4 CodePoint into Dest
UnCamelCaseConvert a CamelCase string into a space separated one
UnCamelCaseConvert a CamelCase string into a space separated one
UnicodeBufferToStringConvert a Unicode buffer into a RTL string
UnicodeBufferToUtf8Convert a Unicode buffer into a UTF-8 string
UnicodeBufferToVariantConvert a Unicode buffer into a variant storing a UTF-8 string
UnicodeBufferToWinAnsiConvert a Unicode buffer into a WinAnsi (code page 1252) string
UniqueRawUtf8ZeroToTildeWill fast replace all #0 chars as ~
UnQuotedSqlSymbolNameUnquote a SQL-compatible symbol name
UnQuoteSqlStringUnquote a SQL-compatible string
UnQuoteSqlStringVarUnquote a SQL-compatible string
UnZeroedConvert a binary buffer into a fake ASCII/UTF-8 content without any #0 input
UpperCaseFast conversion of the supplied text into uppercase
UpperCaseCopyFast conversion of the supplied text into uppercase
UpperCaseCopyFast conversion of the supplied text into uppercase
UpperCaseReferenceUpperCase conversion of a UTF-8 string using our Unicode 10.0 tables
UpperCaseSelfFast in-place conversion of the supplied variable text into uppercase
UpperCaseSynUnicodeUse the RTL to convert the SynUnicode text to UpperCase
UpperCaseUFast conversion of the supplied text into 8-bit uppercase
UpperCaseUcs4ReferenceUpperCase conversion of UTF-8 into UCS4 using our Unicode 10.0 tables
UpperCaseUnicodeAccurate conversion of the supplied UTF-8 content into the corresponding upper-case Unicode characters
UpperCopyCopy source into dest^ with ASCII 7-bit upper case conversion
UpperCopy255Copy source into a 256 chars dest^ buffer with 7-bit upper case conversion
UpperCopy255BufCopy source^ into a 256 chars dest^ buffer with 7-bit upper case conversion
UpperCopy255WCopy UTF-16 source into dest^ with ASCII 7-bit upper case conversion
UpperCopy255WCopy WideChar source into dest^ with upper case conversion
UpperCopyShortCopy source into dest^ with ASCII 7-bit upper case conversion
UpperCopyWin255Copy source into dest^ with WinAnsi 8-bit upper case conversion
Utf16CharToUtf8UTF-8 encode one UTF-16 encoded UCS4 CodePoint into Dest
Utf8DecodeToRawUnicodeConvert a UTF-8 string into a RawUnicode string
Utf8DecodeToRawUnicodeConvert a UTF-8 encoded buffer into a RawUnicode string
Utf8DecodeToRawUnicodeUIConvert a UTF-8 string into a RawUnicode string
Utf8DecodeToRawUnicodeUIConvert a UTF-8 string into a RawUnicode string
Utf8DecodeToStringConvert any UTF-8 encoded buffer into a RTL string
Utf8DecodeToStringConvert any UTF-8 encoded buffer into a RTL string
Utf8DecodeToUnicodeConvert any UTF-8 encoded string into a UTF-16 temporary buffer
Utf8DecodeToUnicodeConvert any UTF-8 encoded buffer into a UTF-16 temporary buffer
Utf8DecodeToUnicodeRawByteStringConvert a UTF-8 encoded buffer into a UTF-16 encoded RawByteString buffer
Utf8DecodeToUnicodeRawByteStringConvert a UTF-8 encoded buffer into a UTF-16 encoded RawByteString buffer
Utf8DecodeToUnicodeStreamConvert a UTF-8 encoded buffer into a UTF-16 encoded stream of bytes
Utf8FirstLineToUtf16LengthCalculate the UTF-16 Unicode characters count of the UTF-8 encoded first line
Utf8ICompFast UTF-8 comparison handling WinAnsi CP-1252 case folding
Utf8ICompReferenceUTF-8 comparison using our Unicode 10.0 tables
Utf8ILCompFast UTF-8 comparison handling WinAnsi CP-1252 case folding
Utf8ILCompReferenceUTF-8 comparison using our Unicode 10.0 tables
Utf8ToFileNameConvert any UTF-8 encoded String into a generic RTL file name string
Utf8ToRawUtf8Direct conversion of a UTF-8 encoded zero terminated buffer into a RawUtf8 String
Utf8ToShortStringDirect conversion of a UTF-8 encoded buffer into a WinAnsi ShortString buffer
Utf8ToStringConvert any UTF-8 encoded String into a RTL string
Utf8ToStringVarConvert any UTF-8 encoded String into a RTL string
Utf8ToSynUnicodeConvert any UTF-8 encoded String into a generic SynUnicode Text
Utf8ToSynUnicodeConvert any UTF-8 encoded buffer into a generic SynUnicode Text
Utf8ToSynUnicodeConvert any UTF-8 encoded String into a generic SynUnicode Text
Utf8ToUnicodeLengthCalculate the UTF-16 Unicode characters count, UTF-8 encoded in source^
Utf8ToWideCharConvert an UTF-8 encoded text into a WideChar (UTF-16) buffer
Utf8ToWideCharConvert an UTF-8 encoded text into a WideChar (UTF-16) buffer
Utf8ToWideStringConvert any UTF-8 encoded String into a generic WideString Text
Utf8ToWideStringConvert any UTF-8 encoded String into a generic WideString Text
Utf8ToWideStringConvert any UTF-8 encoded String into a generic WideString Text
Utf8ToWinAnsiDirect conversion of a UTF-8 encoded zero terminated buffer into a WinAnsi String
Utf8ToWinAnsiDirect conversion of a UTF-8 encoded string into a WinAnsi String
Utf8ToWinPCharDirect conversion of a UTF-8 encoded buffer into a WinAnsi PAnsiChar buffer
Utf8TruncatedLengthCompute the truncated length of the supplied UTF-8 value if it exceeds the specified bytes count
Utf8TruncatedLengthCompute the truncated length of the supplied UTF-8 value if it exceeds the specified bytes count
Utf8TruncateToLengthWill truncate the supplied UTF-8 value if its length exceeds the specified bytes count
Utf8TruncateToUnicodeLengthWill truncate the supplied UTF-8 value if its length exceeds the specified UTF-16 Unicode characters count
Utf8UpperCopyCopy UTF-8 buffer into dest^ handling WinAnsi CP-1252 NormToUpper[] folding
Utf8UpperCopy255Copy UTF-8 buffer into dest^ handling WinAnsi CP-1252 NormToUpper[] folding
Utf8UpperReferenceUpperCase conversion of a UTF-8 buffer using our Unicode 10.0 tables
Utf8UpperReferenceUpperCase conversion of a UTF-8 buffer using our Unicode 10.0 tables
WideCharToWinAnsiConversion of a wide char into a WinAnsi (CodePage 1252) char index
WideCharToWinAnsiCharConversion of a wide char into a WinAnsi (CodePage 1252) char
WideStringToUtf8Convert a WideString into a UTF-8 string
WideStringToWinAnsiConvert a WideString into a WinAnsi (code page 1252) string
WinAnsiBufferToUtf8Direct conversion of a WinAnsi PAnsiChar buffer into a UTF-8 encoded buffer
WinAnsiToRawUnicodeDirect conversion of a WinAnsi (CodePage 1252) string into a Unicode encoded String
WinAnsiToSynUnicodeConvert a Win-Ansi string into a Delphi 2009+ or FPC Unicode string
WinAnsiToSynUnicodeConvert a Win-Ansi encoded buffer into a Delphi 2009+ or FPC Unicode string
WinAnsiToUnicodeBufferDirect conversion of a WinAnsi (CodePage 1252) string into a Unicode buffer
WinAnsiToUtf8Direct conversion of a WinAnsi (CodePage 1252) string into a UTF-8 encoded String
WinAnsiToUtf8Direct conversion of a WinAnsi (CodePage 1252) string into a UTF-8 encoded String
ZeroedConvert a fake UTF-8 buffer without any #0 input back into its original binary

1.6.1. AddRawUtf8

procedure AddRawUtf8(var Values: TRawUtf8DynArray; var ValuesCount: integer; const Value: TRawUtf8DynArray); overload;

Add Value[] items to Values[], with an external count variable, for performance


1.6.2. AddRawUtf8

procedure AddRawUtf8(var Values: TRawUtf8DynArray; const Value: TRawUtf8DynArray); overload;

Add Value[] items to Values[]


1.6.3. AddRawUtf8

function AddRawUtf8(var Values: TRawUtf8DynArray; var ValuesCount: integer; const Value: RawUtf8): PtrInt; overload;

Add the Value to Values[], with an external count variable, for performance


1.6.4. AddRawUtf8

function AddRawUtf8(var Values: TRawUtf8DynArray; const Value: RawUtf8; NoDuplicates: boolean = false; CaseSensitive: boolean = true): boolean; overload;

True if Value was added successfully in Values[]
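
The boolean result is mostly useful together with NoDuplicates. The following is a minimal sketch, assuming the unit is on the compiler search path; the program name and the tags variable are hypothetical. Going by the description above, the first call should report true and the next two false, since the last one disables case sensitivity.

```pascal
program AddRawUtf8Sketch;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,
  mormot.core.unicode;

var
  tags: TRawUtf8DynArray; // hypothetical storage, starts empty
begin
  // NoDuplicates = true asks the function to reject values already stored
  writeln(AddRawUtf8(tags, 'pascal', true));        // first insertion
  writeln(AddRawUtf8(tags, 'pascal', true));        // same value again
  // CaseSensitive = false should also treat 'PASCAL' as a duplicate
  writeln(AddRawUtf8(tags, 'PASCAL', true, false));
  writeln('items stored: ', length(tags));
end.
```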


1.6.5. AddSortedRawUtf8

function AddSortedRawUtf8(var Values: TRawUtf8DynArray; var ValuesCount: integer; const Value: RawUtf8; CoValues: PIntegerDynArray = nil; ForcedIndex: PtrInt = -1; Compare: TUtf8Compare = nil): PtrInt;

Add a RawUtf8 value in an alphabetically sorted dynamic array of RawUtf8
- returns the index where the Value was added successfully in Values[]
- returns -1 if the specified Value was already present in Values[] (we must avoid any duplicate for O(log(n)) binary search)
- if CoValues is set, its content will be moved to allow inserting a new value at CoValues[result] position - a typical usage of CoValues is to store the corresponding ID to each RawUtf8 item
- if FastLocatePUtf8CharSorted() has already been called, the index it returned can be supplied in the optional ForcedIndex parameter
- by default, exact (case-sensitive) match is used; you can specify a custom compare function if needed in Compare optional parameter
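
The CoValues mechanism described above can be sketched as follows: each name is kept sorted, and the matching ID is written at the index returned by the function. This is an illustrative sketch only; AddName, names, ids and count are hypothetical names, and the code assumes that CoValues is resized together with Values, as the description suggests.

```pascal
program AddSortedSketch;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,
  mormot.core.unicode;

var
  names: TRawUtf8DynArray;
  ids: TIntegerDynArray;   // co-values: one ID per name
  count: integer;          // external count, names[] may have spare capacity
  i: integer;

procedure AddName(const name: RawUtf8; id: integer);
var
  ndx: PtrInt;
begin
  ndx := AddSortedRawUtf8(names, count, name, @ids);
  if ndx < 0 then
    writeln(name, ' is already present')
  else
  begin
    ids[ndx] := id; // the slot at CoValues[result] was made available
    writeln(name, ' inserted at index ', ndx);
  end;
end;

begin
  count := 0;
  AddName('delta', 4);
  AddName('alpha', 1);
  AddName('charlie', 3);
  AddName('alpha', 99); // duplicate: expected to be rejected with -1
  for i := 0 to count - 1 do // iterate up to count, not length(names)
    writeln(names[i], ' -> ', ids[i]);
end.
```

Note that only the first count items are meaningful, which is why the loop does not rely on length(names).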


1.6.6. AddString

function AddString(var Values: TStringDynArray; const Value: string): PtrInt;

Add the Value to Values[] string array


1.6.7. Ansi7ToString

procedure Ansi7ToString(Text: PWinAnsiChar; Len: PtrInt; var result: string); overload;

Convert any Ansi 7-bit encoded String into a RTL string
- the Text content must contain only 7-bit pure ASCII characters


1.6.8. Ansi7ToString

function Ansi7ToString(Text: PWinAnsiChar; Len: PtrInt): string; overload;

Convert any Ansi 7-bit encoded String into a RTL string
- the Text content must contain only 7-bit pure ASCII characters


1.6.9. Ansi7ToString

function Ansi7ToString(const Text: RawByteString): string; overload;

Convert any Ansi 7-bit encoded String into a RTL string
- the Text content must contain only 7-bit pure ASCII characters


1.6.10. AnsiBufferToTempUtf8

function AnsiBufferToTempUtf8(var Temp: TSynTempBuffer; Buf: PAnsiChar; BufLen, CodePage: cardinal): PUtf8Char;

Convert any Ansi memory buffer into UTF-8, using a TSynTempBuffer if needed
- caller should release any memory by calling Temp.Done
- returns a pointer to the UTF-8 converted buffer - which may be Buf itself


1.6.11. AnsiCharToUtf8

procedure AnsiCharToUtf8(P: PAnsiChar; L: integer; var result: RawUtf8; CodePage: integer);

Convert an AnsiChar buffer (of a given code page) into a UTF-8 string
- the code page of the input buffer should be supplied
- wrapper around TSynAnsiConvert.Engine(CodePage).AnsiBufferToRawUtf8()


1.6.12. AnsiIComp

function AnsiIComp(Str1, Str2: pointer): PtrInt;

Fast WinAnsi comparison using the NormToUpper[] array for all 8-bit values


1.6.13. AnsiICompW

function AnsiICompW(u1, u2: PWideChar): PtrInt;

Fast case-insensitive Unicode comparison handling ASCII 7-bit chars
- use the NormToUpperAnsi7Byte[] array, i.e. compare 'a'..'z' as 'A'..'Z'
- this version expects u1 and u2 to be zero-terminated


1.6.14. AnsiToString

function AnsiToString(const Ansi: RawByteString; CodePage: integer): string;

Convert an AnsiString (of a given code page) into a RTL string
- the code page of the input string should be supplied
- wrapper around TSynAnsiConvert.Engine(CodePage) and string conversion


1.6.15. AnsiToUtf8

function AnsiToUtf8(const Ansi: RawByteString; CodePage: integer): RawUtf8;

Convert an AnsiString (of a given code page) into a UTF-8 string
- use AnyAnsiToUtf8() if you want to use the codepage of the input string
- wrapper around TSynAnsiConvert.Engine(CodePage).AnsiToUtf8()
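
As a minimal sketch of an explicit code page conversion, the program below feeds the four code page 1252 bytes of "café" to AnsiToUtf8(). The input is built from raw bytes so that no compiler-side literal conversion interferes; the CAFE_1252 constant and the variable names are hypothetical.

```pascal
program AnsiToUtf8Sketch;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,
  mormot.core.unicode;

const
  // 'c', 'a', 'f' and $E9, which is e-acute in code page 1252
  CAFE_1252: array[0..3] of byte = ($63, $61, $66, $E9);
var
  ansi: RawByteString;
  utf8: RawUtf8;
begin
  SetLength(ansi, SizeOf(CAFE_1252));
  Move(CAFE_1252, pointer(ansi)^, SizeOf(CAFE_1252));
  utf8 := AnsiToUtf8(ansi, 1252);
  // e-acute needs two bytes in UTF-8, so 5 bytes are expected for 4 input bytes
  writeln('Ansi bytes: ', length(ansi), ', UTF-8 bytes: ', length(utf8));
end.
```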


1.6.16. AnyAnsiToUtf8

procedure AnyAnsiToUtf8(const s: RawByteString; var result: RawUtf8); overload;

Direct conversion of an AnsiString with an unknown code page into a UTF-8 encoded String
- will assume CurrentAnsiConvert.CodePage prior to Delphi 2009
- newer UNICODE versions of Delphi will retrieve the code page from string


1.6.17. AnyAnsiToUtf8

function AnyAnsiToUtf8(const s: RawByteString): RawUtf8; overload;

Direct conversion of an AnsiString with an unknown code page into a UTF-8 encoded String
- will assume CurrentAnsiConvert.CodePage prior to Delphi 2009
- newer UNICODE versions of Delphi will retrieve the code page from string
- use AnsiToUtf8() if you want to specify the codepage


1.6.18. AnyTextFileToRawUtf8

function AnyTextFileToRawUtf8(const FileName: TFileName; AssumeUtf8IfNoBom: boolean = false): RawUtf8;

Read a File content into a RawUtf8, detecting any leading BOM
- a file with no BOM is assumed to be encoded with the current Ansi code page, not UTF-8, unless AssumeUtf8IfNoBom is true, in which case this function behaves like RawUtf8FromFile()
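
A possible way to load a text file of unknown encoding, sketched under the assumption that the file name is passed as the first command-line argument (the program name and usage message are hypothetical):

```pascal
program LoadTextSketch;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,
  mormot.core.unicode;

var
  content: RawUtf8;
begin
  if ParamCount < 1 then
  begin
    writeln('usage: LoadTextSketch <filename>');
    halt(1);
  end;
  // AssumeUtf8IfNoBom = true: treat a BOM-less file as UTF-8
  // instead of the current Ansi code page
  content := AnyTextFileToRawUtf8(ParamStr(1), true);
  writeln('UTF-8 bytes loaded: ', length(content));
end.
```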


1.6.19. AnyTextFileToString

function AnyTextFileToString(const FileName: TFileName; ForceUtf8: boolean = false): string;

Read a File content into a RTL string, detecting any leading BOM
- assume file with no BOM is encoded with the current Ansi code page, not UTF-8
- if ForceUtf8 is true, won't detect the BOM but assume whole file is UTF-8


1.6.20. AnyTextFileToSynUnicode

function AnyTextFileToSynUnicode(const FileName: TFileName; ForceUtf8: boolean = false): SynUnicode;

Read a File content into SynUnicode string, detecting any leading BOM
- assume file with no BOM is encoded with the current Ansi code page, not UTF-8
- if ForceUtf8 is true, won't detect the BOM but assume whole file is UTF-8


1.6.21. AppendShortComma

procedure AppendShortComma(text: PAnsiChar; len: PtrInt; var result: ShortString; trimlowercase: boolean);

Fast append some UTF-8 text into a ShortString, with an ending ','


1.6.22. BomFile

function BomFile(var Buffer: pointer; var BufferSize: PtrInt): TBomFile;

Check the file BOM at the beginning of a file buffer
- BOM is common only with Microsoft products
- returns bomNone if no BOM was recognized
- returns bomUnicode or bomUtf8 if a UTF-16LE or UTF-8 BOM was recognized, and adjusts Buffer/BufferSize to skip the leading 2 or 3 bytes
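
The adjustment of Buffer/BufferSize can be illustrated with an in-memory buffer. This is a sketch only: the DATA constant is hypothetical, and the enumeration item names are taken from the description above.

```pascal
program BomFileSketch;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,
  mormot.core.unicode;

const
  // UTF-8 BOM (EF BB BF) followed by the two payload bytes 'h' and 'i'
  DATA: array[0..4] of byte = ($EF, $BB, $BF, ord('h'), ord('i'));
var
  buf: pointer;
  size: PtrInt;
begin
  buf := @DATA;
  size := SizeOf(DATA);
  case BomFile(buf, size) of
    bomUtf8:
      // buf should now point after the 3 BOM bytes, leaving 2 payload bytes
      writeln('UTF-8 BOM found, payload bytes: ', size);
    bomUnicode:
      writeln('UTF-16LE BOM found, payload bytes: ', size);
  else
    writeln('no BOM recognized, bytes: ', size);
  end;
end.
```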


1.6.23. CamelCase

procedure CamelCase(P: PAnsiChar; len: PtrInt; var s: RawUtf8; const isWord: TSynByteSet = [ord('0')..ord('9'), ord('a')..ord('z'), ord('A')..ord('Z')]); overload;

Convert a string into a human-friendly CamelCase identifier
- replacing spaces or punctuations by an uppercase character
- as such, it is not the reverse function to UnCamelCase()


1.6.24. CamelCase

procedure CamelCase(const text: RawUtf8; var s: RawUtf8; const isWord: TSynByteSet = [ord('0')..ord('9'), ord('a')..ord('z'), ord('A')..ord('Z')]); overload;

Convert a string into a human-friendly CamelCase identifier
- replacing spaces or punctuations by an uppercase character
- as such, it is not the reverse function to UnCamelCase()


1.6.25. CaseCopy

procedure CaseCopy(Text: PUtf8Char; Len: PtrInt; Table: PNormTable; var Dest: RawUtf8);

Low-level function called when inlining UpperCase(Copy) and LowerCase(Copy)


1.6.26. CaseSelf

procedure CaseSelf(var S: RawUtf8; Table: PNormTable);

Low-level function called when inlining UpperCaseSelf and LowerCaseSelf


1.6.27. CodePageToText

function CodePageToText(aCodePage: cardinal): TShort16;

Convert a code page number into human-friendly text


1.6.28. ContainsUtf8

function ContainsUtf8(p, up: PUtf8Char): boolean;

Return true if up^ is contained inside the UTF-8 buffer p^
- search up^ at the beginning of every UTF-8 word (as in Soundex)
- here a "word" is a Win-Ansi word, i.e. '0'..'9', 'A'..'Z'
- up^ must be already Upper


1.6.29. ConvertCaseUtf8

function ConvertCaseUtf8(P: PUtf8Char; const Table: TNormTableByte): PtrInt;

Fast conversion of the supplied text into the 8-bit case defined by Table
- convert the text in-place, returns the resulting length
- it will decode the supplied UTF-8 content to handle more than 7-bit of ascii characters during the conversion (leaving non-WinAnsi characters untouched)
- will not set the last char to #0 (caller must do that if necessary)


1.6.30. DeleteRawUtf8

function DeleteRawUtf8(var Values: TRawUtf8DynArray; var ValuesCount: integer; Index: integer; CoValues: PIntegerDynArray = nil): boolean; overload;

Delete a RawUtf8 item in a dynamic array of RawUtf8
- if CoValues is set, the integer item at the same index is also deleted


1.6.31. DeleteRawUtf8

function DeleteRawUtf8(var Values: TRawUtf8DynArray; Index: PtrInt): boolean; overload;

Delete a RawUtf8 item in a dynamic array of RawUtf8


1.6.32. DetectRawUtf8

procedure DetectRawUtf8(var source: RawByteString);

Detect UTF-8 content and mark the variable with the CP_UTF8 codepage
- to circumvent FPC concatenation bug with CP_UTF8 and CP_RAWBYTESTRING


1.6.33. EndWith

function EndWith(const text, upTextEnd: RawUtf8): boolean;

Check whether text ends with upTextEnd, ignoring case
- returns true if the item matched
- ignore case - upTextEnd must be already in upper case
- chars are compared as 7-bit Ansi only (no accented chars, nor UTF-8)
- see EndWithExact() from mormot.core.unicode for a case-sensitive version


1.6.34. EndWithArray

function EndWithArray(const text: RawUtf8; const upArray: array of RawUtf8): integer;

Returns the index of the upArray[] item which matches the ending of text, ignoring case
- returns -1 if no item matched
- ignore case - upArray[] items must be already in upper case
- chars are compared as 7-bit Ansi only (no accented chars, nor UTF-8)


1.6.35. EndWithExact

function EndWithExact(const text, textEnd: RawUtf8): boolean;

Check whether text ends with textEnd, using a case-sensitive comparison
- returns true if the item matched
- see EndWith() from mormot.core.unicode for a case-insensitive version
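
The difference between the three ending checks above can be sketched as below. The fileName variable is hypothetical, and the expected results in the comments follow from the descriptions rather than from a verified run.

```pascal
program EndWithSketch;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,
  mormot.core.unicode;

var
  fileName: RawUtf8;
begin
  fileName := 'photo.jpg';
  // case-insensitive: the second argument must already be uppercase
  writeln(EndWith(fileName, '.JPG'));                 // expected: true
  // case-sensitive: '.JPG' differs from '.jpg'
  writeln(EndWithExact(fileName, '.JPG'));            // expected: false
  writeln(EndWithExact(fileName, '.jpg'));            // expected: true
  // index of the first matching uppercase ending, or -1
  writeln(EndWithArray(fileName, ['.PNG', '.JPG']));  // expected: 1
end.
```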


1.6.36. FastFindIndexedPUtf8Char

function FastFindIndexedPUtf8Char(P: PPUtf8CharArray; R: PtrInt; var SortedIndexes: TCardinalDynArray; Value: PUtf8Char; ItemComp: TUtf8Compare): PtrInt;

Retrieve the index of a PUtf8Char in a PUtf8Char array via a sorted index
- will use fast O(log(n)) binary search algorithm


1.6.37. FastFindPUtf8CharSorted

function FastFindPUtf8CharSorted(P: PPUtf8CharArray; R: PtrInt; Value: PUtf8Char): PtrInt; overload;

Retrieve the index at which a PUtf8Char is located in a sorted PUtf8Char array
- R is the last index of available entries in P^ (i.e. Count-1)
- string comparison is case-sensitive StrComp (so will work with any PAnsiChar)
- returns -1 if the specified Value was not found
- will use inlined binary search algorithm with optimized x86_64 branchless asm
- slightly faster than plain FastFindPUtf8CharSorted(P,R,Value,@StrComp)


1.6.38. FastFindPUtf8CharSorted

function FastFindPUtf8CharSorted(P: PPUtf8CharArray; R: PtrInt; Value: PUtf8Char; Compare: TUtf8Compare): PtrInt; overload;

Retrieve the index at which a PUtf8Char is located in a sorted PUtf8Char array
- R is the last index of available entries in P^ (i.e. Count-1)
- string comparison will use the specified Compare function
- returns -1 if the specified Value was not found
- will use fast O(log(n)) binary search algorithm


1.6.39. FastFindUpperPUtf8CharSorted

function FastFindUpperPUtf8CharSorted(P: PPUtf8CharArray; R: PtrInt; Value: PUtf8Char; ValueLen: PtrInt): PtrInt;

Retrieve the index at which a PUtf8Char is located in a sorted uppercase array
- P[] array is expected to be already uppercased
- searched Value is converted to uppercase before search via UpperCopy255Buf(), so is expected to be short, i.e. length < 250
- R is the last index of available entries in P^ (i.e. Count-1)
- returns -1 if the specified Value was not found
- will use fast O(log(n)) binary search algorithm
- slightly faster than plain FastFindPUtf8CharSorted(P,R,Value,@StrIComp)


1.6.40. FastLocatePUtf8CharSorted

function FastLocatePUtf8CharSorted(P: PPUtf8CharArray; R: PtrInt; Value: PUtf8Char): PtrInt; overload;

Retrieve the index where to insert a PUtf8Char in a sorted PUtf8Char array
- R is the last index of available entries in P^ (i.e. Count-1)
- string comparison is case-sensitive StrComp (so will work with any PAnsiChar)
- returns -1 if the specified Value was found (i.e. adding will duplicate a value)
- will use fast O(log(n)) binary search algorithm


1.6.41. FastLocatePUtf8CharSorted

function FastLocatePUtf8CharSorted(P: PPUtf8CharArray; R: PtrInt; Value: PUtf8Char; Compare: TUtf8Compare): PtrInt; overload;

Retrieve the index where to insert a PUtf8Char in a sorted PUtf8Char array
- this overloaded function accepts a custom comparison function for sorting
- R is the last index of available entries in P^ (i.e. Count-1)
- string comparison will use the specified Compare function
- returns -1 if the specified Value was found (i.e. adding will duplicate a value)
- will use fast O(log(n)) binary search algorithm
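
The opposite return conventions of the Find and Locate variants (index when found versus -1 when found) are easy to mix up. The sketch below assumes that a TRawUtf8DynArray can be passed as an array of PUtf8Char through a pointer() cast, since each RawUtf8 item is a pointer to its zero-terminated text; names and key are hypothetical.

```pascal
program SortedLookupSketch;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,
  mormot.core.unicode;

var
  names: TRawUtf8DynArray;
  key: RawUtf8;
begin
  // the array must already be sorted for the binary search to work
  SetLength(names, 3);
  names[0] := 'apple';
  names[1] := 'banana';
  names[2] := 'cherry';
  key := 'banana';
  // Find: index of an existing item (expected 1 here), or -1
  writeln(FastFindPUtf8CharSorted(pointer(names), high(names), pointer(key)));
  // Locate: -1 because the value exists and inserting it would duplicate
  writeln(FastLocatePUtf8CharSorted(pointer(names), high(names), pointer(key)));
  key := 'blueberry';
  // Find: not present, so -1 is expected
  writeln(FastFindPUtf8CharSorted(pointer(names), high(names), pointer(key)));
  // Locate: insertion index, expected 2 (between 'banana' and 'cherry')
  writeln(FastLocatePUtf8CharSorted(pointer(names), high(names), pointer(key)));
end.
```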


1.6.42. FillZero

procedure FillZero(var secret: RawByteString); overload;

Fill all bytes of this memory buffer with zeros, i.e. 'toto' -> #0#0#0#0
- will write the memory buffer directly, if this string instance is not shared (i.e. has refcount = 1), to avoid zeroing still-used values
- may be used to cleanup stack-allocated content

  ...
  finally
    FillZero(secret);
  end;

1.6.43. FillZero

procedure FillZero(var secret: RawUtf8); overload;

Fill all bytes of this UTF-8 string with zeros, i.e. 'toto' -> #0#0#0#0
- will write the memory buffer directly, if this string instance is not shared (i.e. has refcount = 1), to avoid zeroing still-used values
- may be used to cleanup stack-allocated content

  ...
  finally
    FillZero(secret);
  end;

1.6.44. FillZero

procedure FillZero(var secret: SpiUtf8); overload;

Fill all bytes of this UTF-8 string with zeros, i.e. 'toto' -> #0#0#0#0
- SpiUtf8 type has been defined explicitly to store Sensitive Personal Information


1.6.45. FillZero

procedure FillZero(var secret: SynUnicode); overload;

Fill all bytes of this UTF-16 string with zeros, i.e. 'toto' -> #0#0#0#0


1.6.46. FillZero

procedure FillZero(var secret: TBytes); overload;

Fill all bytes of this dynamic array of bytes with zeros
- will write the memory buffer directly, if this array instance is not shared (i.e. has refcount = 1), to avoid zeroing still-used values
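
A complete version of the try..finally pattern used with the FillZero() overloads might look as follows. This is a sketch under assumptions: the secret is taken from the first command-line argument purely for illustration, and it is converted with StringToUtf8() so that the variable holds its own heap-allocated copy rather than a shared constant.

```pascal
program FillZeroSketch;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,
  mormot.core.unicode;

var
  secret: RawUtf8;
begin
  // freshly allocated string, so the instance is not shared with other variables
  secret := StringToUtf8(ParamStr(1));
  try
    // ... use the secret here, without copying it into other variables ...
    writeln('secret length: ', length(secret));
  finally
    FillZero(secret); // overwrite the bytes before the memory is released
  end;
end.
```

Assigning the secret to another variable would raise its reference count, which is the shared case the description above warns about.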


1.6.47. FindAnsi

function FindAnsi(A, UpperValue: PAnsiChar): boolean;

Return true if UpperValue (Ansi) is contained in A^ (Ansi)
- find UpperValue starting at word beginning, not inside words


1.6.48. FindNameValue

function FindNameValue(P: PUtf8Char; UpperName: PAnsiChar): PUtf8Char; overload;

Search for a value from its uppercased named entry
- i.e. iterate IdemPChar(source,UpperName) over every line of the source
- returns the text just after UpperName if it has been found at line beginning
- returns nil if UpperName was not found at any line beginning
- could be used e.g. to efficiently extract a value from HTTP headers, whereas FindIniNameValue() is tuned for [section]-oriented INI files


1.6.49. FindNameValue

function FindNameValue(const NameValuePairs: RawUtf8; UpperName: PAnsiChar; var Value: RawUtf8; KeepNotFoundValue: boolean = false; UpperNameSeparator: AnsiChar = #0): boolean; overload;

Search and return a value from its uppercased named entry
- i.e. iterate IdemPChar(source,UpperName) over every line of the source
- returns true and the trimmed text just after UpperName into Value if it has been found at line beginning
- returns false and set Value := '' if UpperName was not found (or leave Value untouched if KeepNotFoundValue is true)
- could be used e.g. to efficiently extract a value from HTTP headers, whereas FindIniNameValue() is tuned for [section]-oriented INI files
- do TrimLeftLines(NameValuePairs) first if the lines start with spaces/tabs
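
The HTTP header use case mentioned above could be sketched like this. The header text is made up for the example, and the name to search is given in uppercase with its trailing colon, since the text after UpperName is what gets returned.

```pascal
program FindNameValueSketch;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,
  mormot.core.unicode;

var
  headers, contentType: RawUtf8;
begin
  // hypothetical headers, one name/value pair per line
  headers := 'Host: example.org'#13#10 +
             'Content-Type: text/plain'#13#10;
  if FindNameValue(headers, 'CONTENT-TYPE:', contentType) then
    writeln('found: [', contentType, ']') // expected trimmed value: text/plain
  else
    writeln('header not found');
  // an unknown name should return false and reset the value to ''
  if not FindNameValue(headers, 'X-MISSING:', contentType) then
    writeln('X-Missing not found, value length: ', length(contentType));
end.
```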


1.6.50. FindNameValuePointer

function FindNameValuePointer(NameValuePairs: PUtf8Char; UpperName: PAnsiChar; out FoundLen: PtrInt; UpperNameSeparator: AnsiChar): PUtf8Char;

Search and return a PUtf8Char value from its uppercased named entry
- as called when inlining FindNameValue()
- won't make any memory allocation, so could be fine for a quick lookup


1.6.51. FindNextUtf8WordBegin

function FindNextUtf8WordBegin(U: PUtf8Char): PUtf8Char;

Points to the beginning of the next word stored in U
- returns nil if reached the end of U (i.e. #0 char)
- here a "word" is a Win-Ansi word, i.e. '0'..'9', 'A'..'Z'


1.6.52. FindRawUtf8

function FindRawUtf8(const Values: array of RawUtf8; const Value: RawUtf8; CaseSensitive: boolean = true): integer; overload;

Return the index of Value in Values[], -1 if not found
- CaseSensitive=false will use StrIComp() for A..Z / a..z equivalence


1.6.53. FindRawUtf8

function FindRawUtf8(const Values: TRawUtf8DynArray; const Value: RawUtf8; CaseSensitive: boolean = true): integer; overload;

Return the index of Value in Values[], -1 if not found
- CaseSensitive=false will use StrIComp() for A..Z / a..z equivalence


1.6.54. FindRawUtf8

function FindRawUtf8(Values: PRawUtf8; const Value: RawUtf8; ValuesCount: integer; CaseSensitive: boolean): integer; overload;

Low-level efficient search of Value in Values[]
- CaseSensitive=false will use StrIComp() for A..Z / a..z equivalence
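
A short sketch of the dynamic array overload, comparing the default case-sensitive lookup with the case-insensitive one. The array content is hypothetical; unlike the sorted functions, no ordering is needed here.

```pascal
program FindRawUtf8Sketch;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,
  mormot.core.unicode;

var
  names: TRawUtf8DynArray;
begin
  SetLength(names, 3);
  names[0] := 'cherry';
  names[1] := 'banana';
  names[2] := 'apple';
  writeln(FindRawUtf8(names, 'banana'));        // exact match, expected 1
  writeln(FindRawUtf8(names, 'BANANA'));        // case-sensitive, expected -1
  writeln(FindRawUtf8(names, 'BANANA', false)); // A..Z = a..z, expected 1
end.
```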


1.6.55. FindShortStringListExact

function FindShortStringListExact(List: PShortString; MaxValue: integer; aValue: PUtf8Char; aValueLen: PtrInt): integer;

Fast search of an exact case-insensitive match of a RTTI's PShortString array


1.6.56. FindShortStringListTrimLowerCase

function FindShortStringListTrimLowerCase(List: PShortString; MaxValue: integer; aValue: PUtf8Char; aValueLen: PtrInt): integer;

Fast case-insensitive search of a left-trimmed lowercase match of a RTTI's PShortString array


1.6.57. FindShortStringListTrimLowerCaseExact

function FindShortStringListTrimLowerCaseExact(List: PShortString; MaxValue: integer; aValue: PUtf8Char; aValueLen: PtrInt): integer;

Fast case-sensitive search of a left-trimmed lowercase match of a RTTI's PShortString array


1.6.58. FindUnicode

function FindUnicode(PW: PWideChar; Upper: PWideChar; UpperLen: PtrInt): boolean;

Return true if Upper (Unicode encoded) is contained in PW^ (also Unicode encoded)
- will use the slow but accurate Operating System API (Win32 or ICU) to perform the comparison at Unicode-level
- consider using StrPosIReference() for our faster Unicode 10.0 version


1.6.59. FindUtf8

function FindUtf8(U: PUtf8Char; UpperValue: PAnsiChar): boolean;

Return true if UpperValue (Ansi) is contained in U^ (UTF-8 encoded)
- find UpperValue starting at word beginning, not inside words
- UTF-8 decoding is done on the fly (no temporary decoding buffer is used)


1.6.60. GetCaptionFromPCharLen

procedure GetCaptionFromPCharLen(P: PUtf8Char; out result: string);

UnCamelCase and translate a char buffer
- P is expected to be #0 ended
- return "string" type, i.e. UnicodeString for Delphi 2009+


1.6.61. GetHighUtf8Ucs4

function GetHighUtf8Ucs4(var U: PUtf8Char): Ucs4CodePoint;

Internal function, used to retrieve a >127 UCS4 CodePoint from UTF-8
- not to be called directly, but from inlined higher-level functions
- here U^ shall be always >= #$80
- typical use is as such:

  ch := ord(P^);
  if ch and $80=0 then
    inc(P) else
    ch := GetHighUtf8Ucs4(P);

1.6.62. GetLineContains

function GetLineContains(p, pEnd, up: PUtf8Char): boolean;

Returns TRUE if the supplied uppercased text is contained in the text buffer


1.6.63. GetLineSize

function GetLineSize(P, PEnd: PUtf8Char): PtrUInt;

Compute the line length from source array of chars
- if PEnd = nil, end counting at either #0, #13 or #10
- otherwise, end counting at either #13 or #10
- just a wrapper around BufferLineLength() checking PEnd=nil case


1.6.64. GetLineSizeSmallerThan

function GetLineSizeSmallerThan(P, PEnd: PUtf8Char; aMinimalCount: integer): boolean;

Returns true if the line length from source array of chars is not less than the specified count


1.6.65. GetNextFieldProp

function GetNextFieldProp(var P: PUtf8Char; var Prop: RawUtf8): boolean;

Retrieve the next SQL-like identifier within the UTF-8 buffer
- will also trim any space (or line feeds) and trailing ';'
- any comment like '/*nocache*/' will be ignored
- returns true if something was set to Prop


1.6.66. GetNextFieldPropSameLine

function GetNextFieldPropSameLine(var P: PUtf8Char; var Prop: ShortString): boolean;

Retrieve the next identifier within the UTF-8 buffer on the same line
- GetNextFieldProp() will just handle line feeds (and ';') as spaces - which is fine e.g. for SQL, but not for regular config files with name/value pairs
- returns true if something was set to Prop


1.6.67. GetNextLine

function GetNextLine(source: PUtf8Char; out next: PUtf8Char; andtrim: boolean = false): RawUtf8;

Extract a line from source array of chars
- next will contain the beginning of next line, or nil if source has ended
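
Since next becomes nil once the source has ended, a line-by-line loop can be sketched as below. The source text is hypothetical, and the brackets in the output are only there to make the line boundaries visible.

```pascal
program GetNextLineSketch;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,
  mormot.core.unicode;

var
  source, line: RawUtf8;
  P, next: PUtf8Char;
begin
  source := 'first'#13#10'second'#13#10'third';
  P := pointer(source);
  while P <> nil do
  begin
    line := GetNextLine(P, next); // andtrim keeps its default of false
    writeln('[', line, ']');
    P := next; // nil once the source has ended, which stops the loop
  end;
end.
```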


1.6.68. GetNextStringLineToRawUnicode

function GetNextStringLineToRawUnicode(var P: PChar): RawUnicode;

Return the next string delimited with #13#10 from P, setting P to nil if there is no more
- this function returns a RawUnicode string type


1.6.69. GetNextUtf8Upper

function GetNextUtf8Upper(var U: PUtf8Char): Ucs4CodePoint;

Retrieve the next UCS4 CodePoint stored in U, then update the U pointer
- this function will decode the UTF-8 content before using NormToUpper[]
- will return '?' if the UCS4 CodePoint is higher than #255: so use this function only if you need to deal with ASCII characters (e.g. it's used for Soundex and for ContainsUtf8 function)


1.6.70. GetUtf8WideChar

function GetUtf8WideChar(P: PUtf8Char): cardinal;

Decode UTF-16 WideChar from UTF-8 input buffer
- any surrogate (Ucs4>$ffff) is returned as UNICODE_REPLACEMENT_CHARACTER=$fffd


1.6.71. GotoEndOfQuotedString

function GotoEndOfQuotedString(P: PUtf8Char): PUtf8Char;

Get the next character after a quoted buffer
- the first character in P^ must be either ' or "
- it will return the position of the last quote, ignoring doubled quotes within


1.6.72. GotoNextNotSpace

function GotoNextNotSpace(P: PUtf8Char): PUtf8Char;

Get the next character not in [#1..' ']


1.6.73. GotoNextNotSpaceSameLine

function GotoNextNotSpaceSameLine(P: PUtf8Char): PUtf8Char;

Get the next character not in [#9,' ']


1.6.74. GotoNextSpace

function GotoNextSpace(P: PUtf8Char): PUtf8Char;

Get the next character in [#0..' ']


1.6.75. IdemFileExt

function IdemFileExt(p: PUtf8Char; extup: PAnsiChar; sepChar: AnsiChar = '.'): boolean;

Returns true if the file name extension contained in p^ is the same as extup^
- ignore case - extup^ must be already Upper
- chars are compared as 7-bit Ansi only (no accentuated chars, nor UTF-8)
- could be used e.g. like IdemFileExt(aFileName,'.JP');


1.6.76. IdemFileExts

function IdemFileExts(p: PUtf8Char; const extup: array of PAnsiChar; sepChar: AnsiChar = '.'): integer;

Returns the index in extup[] of the matching file name extension
- ignore case - extup[] must be already Upper
- chars are compared as 7-bit Ansi only (no accentuated chars, nor UTF-8)
- could be used e.g. like IdemFileExts(aFileName,['.PAS','.INC']);


1.6.77. IdemPChar

function IdemPChar(p: PUtf8Char; up: PAnsiChar; table: PNormTable): boolean; overload;

Returns true if the beginning of p^ is the same as up^
- this overloaded function accepts the uppercase lookup buffer as parameter


1.6.78. IdemPChar

function IdemPChar(p: PUtf8Char; up: PAnsiChar): boolean; overload;

Returns true if the beginning of p^ is the same as up^
- ignore case - up^ must be already Upper
- chars are compared as 7-bit Ansi only (no accentuated characters); when you only need to search e.g. for field names, IdemPChar() is preferred, because it will be faster than IdemPCharU() if UTF-8 decoding is not mandatory
- if p is nil, will return FALSE
- if up is nil, will return TRUE
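
A typical use is testing whether a buffer starts with a known 7-bit prefix. This is a minimal sketch, assuming the mORMot 2 units are on the search path; the header text and the program name are made up for illustration.

```pascal
program IdemPCharDemo; // hypothetical program name

{$ifdef FPC} {$mode delphi} {$else} {$APPTYPE CONSOLE} {$endif}

uses
  mormot.core.base,
  mormot.core.unicode;

var
  header: RawUtf8;
begin
  header := 'Content-Type: text/html';
  // the second parameter must already be in upper case
  if IdemPChar(pointer(header), 'CONTENT-TYPE:') then
    writeln('content type header found');
  // a nil p is documented to return false
  if not IdemPChar(nil, 'CONTENT-TYPE:') then
    writeln('nil input never matches');
end.
```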


1.6.79. IdemPCharAndGetNextLine

function IdemPCharAndGetNextLine(var source: PUtf8Char; searchUp: PAnsiChar): boolean;

Return true if IdemPChar(source,searchUp), and go to the next line of source


1.6.80. IdemPCharArray

function IdemPCharArray(p: PUtf8Char; const upArray: array of PAnsiChar): integer;

Returns the index of a matching beginning of p^ in upArray[]
- returns -1 if no item matched
- ignore case - upArray[] items must be already Upper
- chars are compared as 7-bit Ansi only (no accentuated chars, nor UTF-8)
- warning: this function expects upArray[] items to have AT LEAST TWO CHARS (it will use a fast 16-bit comparison of initial 2 bytes)
- consider IdemPPChar() which is faster but a bit more verbose
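
The returned index fits naturally into a case statement. The sketch below assumes the mORMot 2 units are on the search path; the request line and the program name are illustrative, and every upArray[] item has at least two characters, as required above.

```pascal
program IdemPCharArrayDemo; // hypothetical program name

{$ifdef FPC} {$mode delphi} {$else} {$APPTYPE CONSOLE} {$endif}

uses
  mormot.core.base,
  mormot.core.unicode;

var
  line: RawUtf8;
begin
  line := 'post /api/items HTTP/1.1';
  // items are upper case and at least two chars long
  case IdemPCharArray(pointer(line), ['GET ', 'POST ', 'PUT ']) of
    0:
      writeln('GET request');
    1:
      writeln('POST request');
    2:
      writeln('PUT request');
  else
    writeln('no item matched (-1)');
  end;
end.
```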


1.6.81. IdemPCharArrayBy2

function IdemPCharArrayBy2(p: PUtf8Char; const upArrayBy2Chars: RawUtf8): PtrInt;

Returns the index of the two-character item of upArrayBy2Chars matching the beginning of p^
- returns -1 if no item matched
- ignore case - upArrayBy2Chars must be already Upper
- chars are compared as 7-bit Ansi only (no accentuated chars, nor UTF-8)


1.6.82. IdemPCharU

function IdemPCharU(p, up: PUtf8Char): boolean;

Returns true if the beginning of p^ is the same as up^
- ignore case - up^ must be already Upper
- this version will decode the UTF-8 content before using NormToUpper[], so it will be slower than the IdemPChar() function above, but will handle WinAnsi accentuated characters (e.g. 'e' acute will be matched as 'E')


1.6.83. IdemPCharW

function IdemPCharW(p: PWideChar; up: PUtf8Char): boolean;

Returns true if the beginning of p^ is the same as up^
- ignore case - up^ must be already Upper
- this version expects p^ to point to a Unicode char array


1.6.84. IdemPCharWithoutWhiteSpace

function IdemPCharWithoutWhiteSpace(p: PUtf8Char; up: PAnsiChar): boolean;

Returns true if the beginning of p^ is the same as up^, ignoring white spaces
- ignore case - up^ must be already Upper
- any white space in the input p^ buffer is just ignored
- chars are compared as 7-bit Ansi only (no accentuated characters); when you only need to search e.g. for field names, IdemPChar() is preferred, because it will be faster than IdemPCharU() if UTF-8 decoding is not mandatory
- if p is nil, will return FALSE
- if up is nil, will return TRUE


1.6.85. IdemPPChar

function IdemPPChar(p: PUtf8Char; up: PPAnsiChar): PtrInt;

Returns the index of a matching beginning of p^ in nil-terminated up^ array
- returns -1 if no item matched
- ignore case - each up^ must be already Upper
- chars are compared as 7-bit Ansi only (no accentuated chars, nor UTF-8)
- warning: this function expects up^ items to have AT LEAST TWO CHARS (it will use a fast 16-bit comparison of initial 2 bytes)


1.6.86. IdemPropName

function IdemPropName(const P1, P2: ShortString): boolean; overload;

Case insensitive comparison of ASCII 7-bit identifiers
- use it with property names values (i.e. only including A..Z,0..9,_ chars)
- behavior is undefined with UTF-8 encoding (some false positive may occur)


1.6.87. IdemPropName

function IdemPropName(const P1: ShortString; P2: PUtf8Char; P2Len: PtrInt): boolean; overload;

Case insensitive comparison of ASCII 7-bit identifiers
- use it with property names values (i.e. only including A..Z,0..9,_ chars)
- behavior is undefined with UTF-8 encoding (some false positive may occur)


1.6.88. IdemPropName

function IdemPropName(P1, P2: PUtf8Char; P1Len, P2Len: PtrInt): boolean; overload;

Case insensitive comparison of ASCII 7-bit identifiers
- use it with property names values (i.e. only including A..Z,0..9,_ chars)
- behavior is undefined with UTF-8 encoding (some false positive may occur)
- this version expects P1 and P2 to be a PAnsiChar with specified lengths


1.6.89. IdemPropNameU

function IdemPropNameU(const P1, P2: RawUtf8): boolean; overload;

Case insensitive comparison of ASCII 7-bit identifiers
- use it with property names values (i.e. only including A..Z,0..9,_ chars)
- behavior is undefined with UTF-8 encoding (some false positive may occur)
- is an alternative with PropNameEquals() to be used inlined e.g. in a loop


1.6.90. IdemPropNameU

function IdemPropNameU(const P1: RawUtf8; P2: PUtf8Char; P2Len: PtrInt): boolean; overload;

Case insensitive comparison of ASCII 7-bit identifiers
- use it with property names values (i.e. only including A..Z,0..9,_ chars)
- behavior is undefined with UTF-8 encoding (some false positive may occur)
- this version expects P2 to be a PAnsiChar with specified length


1.6.91. IdemPropNameUSameLenNotNull

function IdemPropNameUSameLenNotNull(P1, P2: PUtf8Char; P1P2Len: PtrInt): boolean;

Case insensitive comparison of ASCII 7-bit identifiers of same length
- use it with property names values (i.e. only including A..Z,0..9,_ chars)
- behavior is undefined with UTF-8 encoding (some false positive may occur)
- this version expects P1 and P2 to be a PAnsiChar with an already checked identical length, so may be used for a faster process, e.g. in a loop
- if P1 and P2 are RawUtf8, you should better call overloaded function IdemPropNameU(const P1,P2: RawUtf8), which would be slightly faster by using the length stored before the actual text buffer of each RawUtf8


1.6.92. IsCaseSensitive

function IsCaseSensitive(P: PUtf8Char; PLen: PtrInt): boolean; overload;

Check if the supplied text has some case-insensitive 'a'..'z','A'..'Z' chars
- will therefore be correct with true UTF-8 content, but only for 7-bit


1.6.93. IsCaseSensitive

function IsCaseSensitive(const S: RawUtf8): boolean; overload;

Check if the supplied text has some case-insensitive 'a'..'z','A'..'Z' chars
- will therefore be correct with true UTF-8 content, but only for 7-bit


1.6.94. IsFixedWidthCodePage

function IsFixedWidthCodePage(aCodePage: cardinal): boolean;

Check if a code page is known to be of fixed width, i.e. not MBCS
- i.e. will be implemented as a TSynAnsiFixedWidth


1.6.95. IsValidUtf8

function IsValidUtf8(const source: RawUtf8): boolean; overload;

Returns TRUE if the supplied buffer has valid UTF-8 encoding
- will also refuse #0 characters within the buffer
- on Haswell AVX2 Intel/AMD CPUs, will use very efficient ASM, reaching e.g. 21 GB/s parsing speed on a Core i5-13500
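
As a quick illustration of the RawUtf8 overload, the sketch below checks one plain ASCII string and one buffer holding a single $FF byte, which can never appear in well-formed UTF-8. It assumes the mORMot 2 units are on the search path; the program name is illustrative only.

```pascal
program IsValidUtf8Demo; // hypothetical program name

{$ifdef FPC} {$mode delphi} {$else} {$APPTYPE CONSOLE} {$endif}

uses
  mormot.core.base,
  mormot.core.unicode;

var
  s: RawUtf8;
begin
  s := 'plain 7-bit ASCII is always valid UTF-8';
  writeln('ascii text valid: ', IsValidUtf8(s));
  // build a one-byte buffer by hand to avoid any code page conversion
  SetLength(s, 1);
  PByte(pointer(s))^ := $FF; // $FF is never a legal UTF-8 byte
  writeln('single $FF byte valid: ', IsValidUtf8(s));
end.
```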


1.6.96. IsValidUtf8

function IsValidUtf8(source: PUtf8Char): boolean; overload;

Returns TRUE if the supplied buffer has valid UTF-8 encoding
- will stop when the buffer contains #0
- just a wrapper around IsValidUtf8Buffer(source, StrLen(source)), so if you know the source length, you should rather call IsValidUtf8Buffer() directly
- on Haswell AVX2 Intel/AMD CPUs, will use very efficient ASM, reaching e.g. 15 GB/s parsing speed on a Core i5-13500 - StrLen() itself runs at 37 GB/s


1.6.97. IsValidUtf8WithoutControlChars

function IsValidUtf8WithoutControlChars(source: PUtf8Char): boolean; overload;

Returns TRUE if the supplied buffer has valid UTF-8 encoding with no #1..#31 control characters
- supplied input is a pointer to a #0 ended text buffer


1.6.98. IsValidUtf8WithoutControlChars

function IsValidUtf8WithoutControlChars(const source: RawUtf8): boolean; overload;

Returns TRUE if the supplied buffer has valid UTF-8 encoding with no #0..#31 control characters
- supplied input is a RawUtf8 variable


1.6.99. IsVoid

function IsVoid(const text: RawUtf8): boolean;

Check that all characters within text are spaces or control chars
- i.e. a faster alternative to if TrimU(text)='' then


1.6.100. IsWinAnsi

function IsWinAnsi(WideText: PWideChar; Length: integer): boolean; overload;

Return TRUE if the supplied unicode buffer only contains WinAnsi characters
- i.e. if the text can be displayed using ANSI_CHARSET


1.6.101. IsWinAnsi

function IsWinAnsi(WideText: PWideChar): boolean; overload;

Return TRUE if the supplied unicode buffer only contains WinAnsi characters
- i.e. if the text can be displayed using ANSI_CHARSET


1.6.102. IsWinAnsiU

function IsWinAnsiU(Utf8Text: PUtf8Char): boolean;

Return TRUE if the supplied UTF-8 buffer only contains WinAnsi characters
- i.e. if the text can be displayed using ANSI_CHARSET


1.6.103. IsWinAnsiU8Bit

function IsWinAnsiU8Bit(Utf8Text: PUtf8Char): boolean;

Return TRUE if the supplied UTF-8 buffer only contains WinAnsi 8-bit characters
- i.e. if the text can be displayed using ANSI_CHARSET with only 8-bit unicode characters (e.g. no "tm" or such)


1.6.104. IsZero

function IsZero(const Values: TRawUtf8DynArray): boolean; overload;

Returns TRUE if Values is nil or all supplied Values[] equal ''


1.6.105. LeftU

function LeftU(const S: RawUtf8; n: PtrInt): RawUtf8;

Returns n leading characters


1.6.106. LowerCase

function LowerCase(const S: RawUtf8): RawUtf8;

Fast conversion of the supplied text into lowercase
- this will only convert 'A'..'Z' into 'a'..'z' (no NormToLower use), and will therefore be correct with true UTF-8 content


1.6.107. LowerCaseCopy

procedure LowerCaseCopy(Text: PUtf8Char; Len: PtrInt; var Dest: RawUtf8);

Fast conversion of the supplied text into lowercase
- this will only convert 'A'..'Z' into 'a'..'z' (no NormToLower use), and will therefore be correct with true UTF-8 content


1.6.108. LowerCaseSelf

procedure LowerCaseSelf(var S: RawUtf8);

Fast in-place conversion of the supplied variable text into lowercase
- this will only convert 'A'..'Z' into 'a'..'z' (no NormToLower use), and will therefore be correct with true UTF-8 content, but only for 7-bit


1.6.109. LowerCaseSynUnicode

function LowerCaseSynUnicode(const S: SynUnicode): SynUnicode;

Use the RTL to convert the SynUnicode text to LowerCase


1.6.110. LowerCaseU

function LowerCaseU(const S: RawUtf8): RawUtf8;

Fast conversion of the supplied text into 8-bit lowercase
- this will not only convert 'A'..'Z' into 'a'..'z', but also accentuated latin characters ('E' acute into 'e' e.g.), using NormToLower[] array
- it will therefore decode the supplied UTF-8 content to handle more than 7-bit of ascii characters


1.6.111. LowerCaseUnicode

function LowerCaseUnicode(const S: RawUtf8): RawUtf8;

Accurate conversion of the supplied UTF-8 content into the corresponding lower-case Unicode characters
- will use the available API (e.g. Win32 or ICU), so may not be consistent on all systems - and also slower than LowerCase/LowerCaseU versions


1.6.112. NextNotSpaceCharIs

function NextNotSpaceCharIs(var P: PUtf8Char; ch: AnsiChar): boolean;

Check if the next character not in [#1..' '] matches a given value
- first skip any space character, i.e. any character in [#1..' ']
- then returns TRUE if P^=ch, setting P to the character after ch
- or returns FALSE if P^<>ch, leaving P at the level of the unexpected char


1.6.113. NextUtf8Ucs4

function NextUtf8Ucs4(var P: PUtf8Char): Ucs4CodePoint;

Get the UCS4 CodePoint stored in P^ (decode UTF-8 if necessary)


1.6.114. OnlyChar

function OnlyChar(const text: RawUtf8; const only: TSynAnsicharSet): RawUtf8;

Returns the supplied text content, without any other char than specified
- specify a custom char set to be included, e.g. as ['A'..'Z']


1.6.115. PosCharAny

function PosCharAny(Str: PUtf8Char; Characters: PAnsiChar): PUtf8Char;

Fast retrieve the position of any value of a given set of characters
- see also strspn() function which is likely to be faster


1.6.116. PosExI

function PosExI(const SubStr, S: RawUtf8; Offset: PtrUInt; Lookup: PNormTable): PtrInt; overload;

A case-insensitive version of PosEx() with a specified lookup table
- redirect to mormot.core.base PosEx() if Lookup = nil


1.6.117. PosExI

function PosExI(const SubStr, S: RawUtf8; Offset: PtrUInt): PtrInt; overload;

An ASCII-7 case-insensitive version of PosEx()
- will use the NormToUpperAnsi7 lookup table for character conversion


1.6.118. PosExIPas

function PosExIPas(Sub, P: PUtf8Char; Offset: PtrUInt; Lookup: PNormTable): PtrInt;

Internal function used when inlining PosExI()


1.6.119. PosI

function PosI(uppersubstr: PUtf8Char; const str: RawUtf8): PtrInt;

A non case-sensitive RawUtf8 version of Pos()
- uppersubstr is expected to be already in upper case
- this version handles only 7-bit ASCII (no accentuated characters)
- see PosIU() if you want a UTF-8 version with accentuated chars support


1.6.120. PosIU

function PosIU(substr: PUtf8Char; const str: RawUtf8): integer;

A non case-sensitive RawUtf8 version of Pos()
- substr is expected to be already in upper case
- this version will decode the UTF-8 content before using NormToUpper[]
- see PosI() for a non-accentuated, but faster version


1.6.121. PropNameSanitize

function PropNameSanitize(const text, fallback: RawUtf8): RawUtf8;

Try to generate a PropNameValid() output from an incoming text
- will trim all spaces, and replace most special chars by '_'
- if it is not PropNameValid() after those replacements, will return fallback


1.6.122. PropNamesValid

function PropNamesValid(const Values: array of RawUtf8): boolean;

Returns TRUE if the given text buffers contain A..Z,0..9,_ characters
- use it with property names values (i.e. only including A..Z,0..9,_ chars)
- this function allows numbers as first char, so won't check the first char the same way as PropNameValid(), which refuses digits following the pascal convention


1.6.123. PropNameValid

function PropNameValid(P: PUtf8Char): boolean;

Returns TRUE if the given text buffer contains a..z,A..Z,0..9,_ characters
- should match most usual property names values or other identifier names in the business logic source code
- i.e. can be tested via IdemPropName*() functions, and the MongoDB-like extended JSON syntax as generated by dvoSerializeAsExtendedJson
- following classic pascal naming convention, first char must be alphabetical or '_' (i.e. not a digit), following chars can be alphanumerical or '_'
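
The rules above can be exercised with a handful of candidate names. This is a minimal sketch, assuming the mORMot 2 units are on the search path; the sample names and the program name are illustrative. Per the description, 'Name' and '_id' should be accepted, while '1st' (leading digit) and 'a-b' (unsupported '-' char) should be refused.

```pascal
program PropNameValidDemo; // hypothetical program name

{$ifdef FPC} {$mode delphi} {$else} {$APPTYPE CONSOLE} {$endif}

uses
  mormot.core.base,
  mormot.core.unicode;

const
  // sample identifiers, chosen for illustration only
  NAMES: array[0..3] of RawUtf8 = ('Name', '_id', '1st', 'a-b');
var
  i: integer;
begin
  for i := low(NAMES) to high(NAMES) do
    writeln(NAMES[i], ' -> ', PropNameValid(pointer(NAMES[i])));
end.
```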


1.6.124. QuickSortRawUtf8

procedure QuickSortRawUtf8(Values: PRawUtf8Array; L, R: PtrInt; caseInsensitive: boolean = false); overload;

Sort a RawUtf8 array, low values first


1.6.125. QuickSortRawUtf8

procedure QuickSortRawUtf8(var Values: TRawUtf8DynArray; ValuesCount: integer; CoValues: PIntegerDynArray = nil; Compare: TUtf8Compare = nil); overload;

Sort a dynamic array of RawUtf8 items
- if CoValues is set, the integer items are also synchronized
- by default, exact (case-sensitive) match is used; you can specify a custom compare function if needed in Compare optional parameter
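
A short usage sketch of the dynamic array overload with its default case-sensitive comparison follows. It assumes the mORMot 2 units are on the search path; the values and the program name are illustrative only.

```pascal
program QuickSortRawUtf8Demo; // hypothetical program name

{$ifdef FPC} {$mode delphi} {$else} {$APPTYPE CONSOLE} {$endif}

uses
  mormot.core.base,
  mormot.core.unicode;

var
  values: TRawUtf8DynArray;
  i: integer;
begin
  SetLength(values, 3);
  values[0] := 'pear';
  values[1] := 'apple';
  values[2] := 'banana';
  // CoValues and Compare keep their nil defaults
  QuickSortRawUtf8(values, length(values));
  for i := 0 to high(values) do
    writeln(values[i]); // low values first: apple, banana, pear
end.
```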


1.6.126. QuotedStr

function QuotedStr(const S: RawUtf8; Quote: AnsiChar = ''''): RawUtf8; overload;

Format a text content with SQL-like quotes
- this function implements what is specified in the official SQLite3 documentation: "A string constant is formed by enclosing the string in single quotes ('). A single quote within the string can be encoded by putting two single quotes in a row - as in Pascal."
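
The quoting rule quoted above (enclose in single quotes, double any single quote inside) can be seen in the following sketch. It assumes the mORMot 2 units are on the search path and that SysUtils is not in the uses clause, so that the identifier resolves to this unit; the program name is illustrative.

```pascal
program QuotedStrDemo; // hypothetical program name

{$ifdef FPC} {$mode delphi} {$else} {$APPTYPE CONSOLE} {$endif}

uses
  mormot.core.base,
  mormot.core.unicode;

var
  sql: RawUtf8;
begin
  // the input is the 4-char text: it's
  sql := QuotedStr('it''s');
  // following the SQLite3 rule, the output is the 7-char text: 'it''s'
  writeln(sql);
end.
```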


1.6.127. QuotedStr

procedure QuotedStr(const S: RawUtf8; Quote: AnsiChar; var result: RawUtf8); overload;

Format a text content with SQL-like quotes


1.6.128. QuotedStr

procedure QuotedStr(P: PUtf8Char; PLen: PtrInt; Quote: AnsiChar; var result: RawUtf8); overload;

Format a text buffer with SQL-like quotes


1.6.129. RawUnicodeToString

function RawUnicodeToString(P: PWideChar; L: integer): string; overload;

Convert any UTF-16 encoded buffer into a RTL string


1.6.130. RawUnicodeToString

function RawUnicodeToString(const U: RawUnicode): string; overload;

Convert any RawUnicode encoded string into a RTL string
- uses StrLenW() and not length(U), to handle the case when U was used as a buffer


1.6.131. RawUnicodeToString

procedure RawUnicodeToString(P: PWideChar; L: integer; var result: string); overload;

Convert any UTF-16 encoded buffer into a RTL string


1.6.132. RawUnicodeToSynUnicode

function RawUnicodeToSynUnicode(const Unicode: RawUnicode): SynUnicode; overload;

Convert any RawUnicode String into a generic SynUnicode Text


1.6.133. RawUnicodeToSynUnicode

function RawUnicodeToSynUnicode(WideChar: PWideChar; WideCharCount: integer): SynUnicode; overload;

Convert any UTF-16 buffer into a generic SynUnicode Text


1.6.134. RawUnicodeToUtf8

procedure RawUnicodeToUtf8(WideChar: PWideChar; WideCharCount: integer; var result: RawUtf8; Flags: TCharConversionFlags = [ccfNoTrailingZero]); overload;

Convert a UTF-16 PWideChar buffer into a UTF-8 string


1.6.135. RawUnicodeToUtf8

function RawUnicodeToUtf8(WideChar: PWideChar; WideCharCount: integer; Flags: TCharConversionFlags = [ccfNoTrailingZero]): RawUtf8; overload;

Convert a UTF-16 PWideChar buffer into a UTF-8 string


1.6.136. RawUnicodeToUtf8

procedure RawUnicodeToUtf8(WideChar: PWideChar; WideCharCount: integer; var result: TSynTempBuffer; Flags: TCharConversionFlags); overload;

Convert a UTF-16 PWideChar buffer into a UTF-8 temporary buffer


1.6.137. RawUnicodeToUtf8

function RawUnicodeToUtf8(Dest: PUtf8Char; DestLen: PtrInt; Source: PWideChar; SourceLen: PtrInt; Flags: TCharConversionFlags): PtrInt; overload;

Convert a UTF-16 PWideChar buffer into a UTF-8 buffer
- replace system.UnicodeToUtf8 implementation, which is rather slow since Delphi 2009+
- append a trailing #0 to the ending PUtf8Char, unless ccfNoTrailingZero is set
- if ccfReplacementCharacterForUnmatchedSurrogate is set, this function will identify unmatched surrogate pairs and replace them with UNICODE_REPLACEMENT_CHARACTER - see https://en.wikipedia.org/wiki/Specials_(Unicode_block)


1.6.138. RawUnicodeToUtf8

function RawUnicodeToUtf8(const Unicode: RawUnicode): RawUtf8; overload;

Convert a RawUnicode string into a UTF-8 string


1.6.139. RawUnicodeToUtf8

function RawUnicodeToUtf8(WideChar: PWideChar; WideCharCount: integer; out Utf8Length: integer): RawUtf8; overload;

Convert a UTF-16 PWideChar buffer into a UTF-8 string
- this version doesn't resize the resulting RawUtf8 string, but returns the new resulting RawUtf8 byte count into Utf8Length


1.6.140. RawUnicodeToWinAnsi

function RawUnicodeToWinAnsi(const Unicode: RawUnicode): WinAnsiString; overload;

Convert a UTF-16 string into a WinAnsi (code page 1252) string


1.6.141. RawUnicodeToWinAnsi

function RawUnicodeToWinAnsi(WideChar: PWideChar; WideCharCount: integer): WinAnsiString; overload;

Convert a UTF-16 PWideChar buffer into a WinAnsi (code page 1252) string


1.6.142. RawUnicodeToWinPChar

procedure RawUnicodeToWinPChar(dest: PAnsiChar; source: PWideChar; WideCharCount: integer);

Direct conversion of a UTF-16 encoded buffer into a WinAnsi PAnsiChar buffer


1.6.143. RawUtf8DynArrayEquals

function RawUtf8DynArrayEquals(const A, B: TRawUtf8DynArray; Count: integer): boolean; overload;

True if both TRawUtf8DynArray are the same for a given number of items
- A and B are expected to have at least Count items
- comparison is case-sensitive


1.6.144. RawUtf8DynArrayEquals

function RawUtf8DynArrayEquals(const A, B: TRawUtf8DynArray): boolean; overload;

True if both TRawUtf8DynArray are the same
- comparison is case-sensitive


1.6.145. RawUtf8FromFile

function RawUtf8FromFile(const FileName: TFileName): RawUtf8;

Read a File content into a RawUtf8, detecting any leading BOM
- will assume text file with no BOM is already UTF-8 encoded
- an alternative to StringFromFile() if you want to handle UTF-8 content and the files are likely to be natively UTF-8 encoded, or with a BOM


1.6.146. RawUtf8OfChar

function RawUtf8OfChar(Ch: AnsiChar; Count: integer): RawUtf8;

UTF-8 dedicated (and faster) alternative to StringOfChar(Ch,Count)


1.6.147. RightU

function RightU(const S: RawUtf8; n: PtrInt): RawUtf8;

Returns n trailing characters


1.6.148. SameTextU

function SameTextU(const S1, S2: RawUtf8): boolean;

SameText() overloaded function with proper UTF-8 decoding
- fast version using NormToUpper[] array for all WinAnsi characters
- this version will decode each UTF-8 glyph before using NormToUpper[]
- current implementation handles UTF-16 surrogates as Utf8IComp()


1.6.149. ShortStringToUtf8

function ShortStringToUtf8(const source: ShortString): RawUtf8;

Direct conversion of a WinAnsi ShortString into a UTF-8 text
- internally calls the WinAnsiConvert fast conversion class


1.6.150. SortDynArrayAnsiStringI

function SortDynArrayAnsiStringI(const A, B): integer;

Compare two "array of AnsiString" elements, with no case sensitivity
- just a wrapper around inlined StrIComp()


1.6.151. SortDynArrayPUtf8CharI

function SortDynArrayPUtf8CharI(const A, B): integer;

Compare two "array of PUtf8Char/PAnsiChar" elements, with no case sensitivity
- just a wrapper around inlined StrIComp()


1.6.152. SortDynArrayStringI

function SortDynArrayStringI(const A, B): integer;

Compare two "array of RTL string" elements, with no case sensitivity
- the expected string type is the RTL string
- just a wrapper around StrIComp() for AnsiString or AnsiICompW() for UNICODE


1.6.153. SortDynArrayUnicodeStringI

function SortDynArrayUnicodeStringI(const A, B): integer;

Compare two "array of WideString/UnicodeString" elements, with no case sensitivity
- implemented here since it would call AnsiICompW()


1.6.154. Split

function Split(const Str, SepStr: RawUtf8; var LeftStr, RightStr: RawUtf8; ToUpperCase: boolean = false): boolean; overload;

Split a RawUtf8 string into two strings, according to SepStr separator
- returns true and LeftStr/RightStr if they were separated by SepStr
- if SepStr is not found, LeftStr=Str and RightStr='' and returns false
- if ToUpperCase is TRUE, then LeftStr and RightStr will be made uppercase
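
Both documented outcomes (separator found, separator missing) are shown in this minimal sketch. It assumes the mORMot 2 units are on the search path; the 'name=value' input and the program name are illustrative only.

```pascal
program SplitDemo; // hypothetical program name

{$ifdef FPC} {$mode delphi} {$else} {$APPTYPE CONSOLE} {$endif}

uses
  mormot.core.base,
  mormot.core.unicode;

var
  left, right: RawUtf8;
begin
  // separator found: returns true, left='name' and right='value'
  if Split('name=value', '=', left, right) then
    writeln(left, ' / ', right);
  // separator missing: returns false, left=Str and right=''
  if not Split('novalue', '=', left, right) then
    writeln(left, ' has no separator');
end.
```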


1.6.155. Split

function Split(const Str: RawUtf8; const SepStr: array of RawUtf8; const DestPtr: array of PRawUtf8): PtrInt; overload;

Split a RawUtf8 string into several strings, according to SepStr separator
- this overloaded function will fill a DestPtr[] array of PRawUtf8
- if any DestPtr[]=nil, the item will be skipped
- if input Str ends before all SepStr[] are found, DestPtr[] is set to ''
- returns the number of values extracted into DestPtr[]


1.6.156. Split

function Split(const Str, SepStr: RawUtf8; var LeftStr: RawUtf8; ToUpperCase: boolean = false): RawUtf8; overload;

Split a RawUtf8 string into two strings, according to SepStr separator
- this overloaded function returns the right string as function result
- if SepStr is not found, LeftStr=Str and result=''
- if ToUpperCase is TRUE, then LeftStr and result will be made uppercase


1.6.157. SplitRight

function SplitRight(const Str: RawUtf8; SepChar: AnsiChar; LeftStr: PRawUtf8 = nil): RawUtf8;

Returns the last occurrence of the given SepChar separated context
- e.g. SplitRight('01/2/34','/')='34'
- if SepChar doesn't appear, will return Str, e.g. SplitRight('123','/')='123'
- if LeftStr is supplied, the RawUtf8 it points to will be filled with the left part just before SepChar ('' if SepChar doesn't appear)
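
The two documented examples, plus the optional LeftStr output, can be tried with the sketch below. It assumes the mORMot 2 units are on the search path; the program and variable names are illustrative only.

```pascal
program SplitRightDemo; // hypothetical program name

{$ifdef FPC} {$mode delphi} {$else} {$APPTYPE CONSOLE} {$endif}

uses
  mormot.core.base,
  mormot.core.unicode;

var
  left, right: RawUtf8;
begin
  // documented example: the part after the last '/' is '34'
  right := SplitRight('01/2/34', '/', @left);
  writeln('right=', right, ' left=', left);
  // no separator: the whole input is returned
  writeln(SplitRight('123', '/'));
end.
```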


1.6.158. SplitRights

function SplitRights(const Str, SepChar: RawUtf8): RawUtf8;

Returns the last occurrence of the given SepChar separated context
- e.g. SplitRights('path/one\two/file.ext','/\')='file.ext', i.e. SepChar='/\' will be like ExtractFileName() over RawUtf8 string
- if no char of SepChar appears, will return Str, e.g. SplitRights('123','/')='123'


1.6.159. StartWith

function StartWith(const text, upTextStart: RawUtf8): boolean;

Check if text starts with upTextStart, using a case-insensitive comparison
- returns true if the item matched
- ignore case - upTextStart must be already in upper case
- chars are compared as 7-bit Ansi only (no accentuated chars, nor UTF-8)
- see StartWithExact() for a case-sensitive version


1.6.160. StartWithExact

function StartWithExact(const text, textStart: RawUtf8): boolean;

Check if text starts with textStart, using a case-sensitive comparison
- returns true if the item matched
- see StartWith() from mormot.core.unicode for a case-insensitive version


1.6.161. StrCompIL

function StrCompIL(P1, P2: pointer; L: PtrInt; Default: PtrInt = 0): PtrInt;

Our fast version of StrCompIL(), to be used with PUtf8Char
- i.e. make a case-insensitive comparison of two memory buffers, using supplied length
- Default value is returned if both P1 and P2 buffers are equal


1.6.162. StrCompL

function StrCompL(P1, P2: pointer; L: PtrInt; Default: PtrInt = 0): PtrInt;

Our fast version of StrCompL(), to be used with PUtf8Char
- i.e. make a binary comparison of two memory buffers, using supplied length
- Default value is returned if both P1 and P2 buffers are equal


1.6.163. strcspn

function strcspn(s, reject: pointer): integer;

Pure pascal version of strcspn(), to be used with PUtf8Char/PAnsiChar
- returns the size of the initial segment of s which doesn't contain any of the reject chars, e.g.

 strcspn('1234,6789',',')=4

- please note that this optimized version may read up to 3 bytes beyond reject but never after s end, so is safe e.g. over memory mapped files


1.6.164. StrIComp

function StrIComp(Str1, Str2: pointer): PtrInt;

Our fast version of StrIComp(), to be used with PUtf8Char/PAnsiChar


1.6.165. StrICompLNotNil

function StrICompLNotNil(Str1, Str2: pointer; Up: PNormTableByte; L: PtrInt): PtrInt;

StrIComp-like function with a length, lookup table and Str1/Str2 expected not nil


1.6.166. StrICompNotNil

function StrICompNotNil(Str1, Str2: pointer; Up: PNormTableByte): PtrInt;

StrIComp-like function with a lookup table and Str1/Str2 expected not nil


1.6.167. StrILNotNil

function StrILNotNil(Str1, Str2: pointer; Up: PNormTableByte; L: PtrInt): PtrInt;

StrIComp function with a length, lookup table and Str1/Str2 expected not nil
- returns L for whole match, or < L for a partial match


1.6.168. StringBufferToUtf8

function StringBufferToUtf8(Dest: PUtf8Char; Source: PChar; SourceChars: PtrInt): PUtf8Char; overload;

Convert any RTL string buffer into a UTF-8 encoded buffer
- Dest must be able to receive at least SourceChars*3 bytes
- it will work as is with Delphi 2009+ (direct unicode conversion)
- under older versions of Delphi (no unicode), it will use the current RTL codepage, as with WideString conversion (but without slow WideString usage)


1.6.169. StringBufferToUtf8

procedure StringBufferToUtf8(Source: PChar; out result: RawUtf8); overload;

Convert any RTL string 0-terminated Text buffer into a UTF-8 string
- it will work as is with Delphi 2009+ (direct unicode conversion)
- under older versions of Delphi (no unicode), it will use the current RTL codepage, as with WideString conversion (but without slow WideString usage)


1.6.170. StringDynArrayToRawUtf8DynArray

procedure StringDynArrayToRawUtf8DynArray(const Source: TStringDynArray; var result: TRawUtf8DynArray);

Convert the string dynamic array into a dynamic array of UTF-8 strings


1.6.171. StringFromBomFile

function StringFromBomFile(const FileName: TFileName; out FileContent: RawByteString; out Buffer: pointer; out BufferSize: PtrInt): TBomFile;

Read a file into a temporary variable, check the BOM, and adjust the buffer


1.6.172. StringListToRawUtf8DynArray

procedure StringListToRawUtf8DynArray(Source: TStringList; var result: TRawUtf8DynArray);

Convert the string list into a dynamic array of UTF-8 strings


1.6.173. StringReplaceAll

function StringReplaceAll(const S, OldPattern, NewPattern: RawUtf8; CaseInsensitive: boolean): RawUtf8; overload;

Case-sensitive (or not) StringReplace(S, OldPattern, NewPattern,[rfReplaceAll])
- calls plain StringReplaceAll() version for CaseInsensitive = false
- calls StringReplaceAll(.., NormToUpperAnsi7) if CaseInsensitive = true


1.6.174. StringReplaceAll

function StringReplaceAll(const S: RawUtf8; const OldNewPatternPairs: array of RawUtf8; CaseInsensitive: boolean = false): RawUtf8; overload;

Fast version of several cascaded StringReplaceAll()


1.6.175. StringReplaceAll

function StringReplaceAll(const S, OldPattern, NewPattern: RawUtf8; Lookup: PNormTable = nil): RawUtf8; overload;

Fast version of StringReplace(S, OldPattern, NewPattern, [rfReplaceAll]);


1.6.176. StringReplaceAllProcess

function StringReplaceAllProcess(const S, OldPattern, NewPattern: RawUtf8; found: integer; Lookup: PNormTable): RawUtf8;

Actual replacement function called by StringReplaceAll() on first match
- not to be called as such, but defined globally for proper inlining


1.6.177. StringReplaceChars

function StringReplaceChars(const Source: RawUtf8; OldChar, NewChar: AnsiChar): RawUtf8;

Fast replace of a specified char by a given char


1.6.178. StringReplaceTabs

function StringReplaceTabs(const Source, TabText: RawUtf8): RawUtf8;

Fast replace of all #9 chars by a given string


1.6.179. StringToAnsi7

function StringToAnsi7(const Text: string): RawByteString;

Convert any RTL string into Ansi 7-bit encoded String
- the Text content must contain only 7-bit pure ASCII characters


1.6.180. StringToRawUnicode

function StringToRawUnicode(const S: string): RawUnicode; overload;

Convert any RTL string into a RawUnicode encoded String
- it's preferred to use TLanguageFile.StringToUtf8() method in mORMoti18n, which will handle full i18n of your application
- it will work as is with Delphi 2009+ (direct unicode conversion)
- under older versions of Delphi (no unicode), it will use the current RTL codepage, as with WideString conversion (but without slow WideString usage)


1.6.181. StringToRawUnicode

function StringToRawUnicode(P: PChar; L: integer): RawUnicode; overload;

Convert any RTL string into a RawUnicode encoded String
- it's preferred to use TLanguageFile.StringToUtf8() method in mORMoti18n, which will handle full i18n of your application
- it will work as is with Delphi 2009+ (direct unicode conversion)
- under older versions of Delphi (no unicode), it will use the current RTL codepage, as with WideString conversion (but without slow WideString usage)


1.6.182. StringToSynUnicode

procedure StringToSynUnicode(const S: string; var result: SynUnicode); overload;

Convert any RTL string into a SynUnicode encoded String
- overloaded to avoid a copy to a temporary result string of a function


1.6.183. StringToSynUnicode

function StringToSynUnicode(const S: string): SynUnicode; overload;

Convert any RTL string into a SynUnicode encoded String
- it is preferred to use the TLanguageFile.StringToUtf8() method in mORMoti18n, which will handle full i18n of your application
- it will work as is with Delphi 2009+ (direct unicode conversion)
- under older versions of Delphi (no unicode), it will use the current RTL codepage, as with WideString conversion (but without slow WideString usage)


1.6.184. StringToUtf8

procedure StringToUtf8(const Text: string; var result: RawUtf8); overload;

Convert any RTL string into an UTF-8 encoded String
- this overloaded function uses a faster by-reference parameter for the result


1.6.185. StringToUtf8

function StringToUtf8(const Text: string; var Temp: TSynTempBuffer): integer; overload;

Convert any RTL string into an UTF-8 encoded TSynTempBuffer
- returns the number of UTF-8 bytes available in Temp.buf
- this overloaded function uses a TSynTempBuffer for the result, to avoid any memory allocation for short content
- caller should call Temp.Done to release any heap-allocated memory
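
The TSynTempBuffer overload can be used as in the following sketch, which assumes only the signature above and the Temp.buf / Temp.Done members named in the notes. The program and variable names are hypothetical.

```pascal
program TempBufferDemo;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,     // TSynTempBuffer
  mormot.core.unicode;  // StringToUtf8

var
  tmp: TSynTempBuffer;
  len: integer;
begin
  // convert into the temporary buffer: short content should stay on the stack
  len := StringToUtf8('hello', tmp);
  try
    // tmp.buf now points to len UTF-8 bytes
    writeln(len);  // 5 for this pure ASCII input
  finally
    tmp.Done;      // release any heap-allocated memory
  end;
end.
```

Wrapping the usage in try..finally keeps the Done call on every code path, which matters when the content was too large for the stack storage.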


1.6.186. StringToUtf8

function StringToUtf8(const Text: string): RawUtf8; overload;

Convert any RTL string into an UTF-8 encoded String
- in the VCL context, it is preferred to use the TLanguageFile.StringToUtf8() method from mORMoti18n, which will handle full i18n of your application
- it will work as is with Delphi 2009+ (direct unicode conversion)
- under older versions of Delphi (no unicode), it will use the current RTL codepage, as with WideString conversion (but without slow WideString usage)


1.6.187. StringToUtf8

procedure StringToUtf8(Text: PChar; TextLen: PtrInt; var result: RawUtf8); overload;

Convert any RTL string buffer into an UTF-8 encoded String
- it will work as is with Delphi 2009+ (direct unicode conversion)
- under older versions of Delphi (no unicode), it will use the current RTL codepage, as with WideString conversion (but without slow WideString usage)


1.6.188. StringToVariant

procedure StringToVariant(const Txt: string; var result: variant); overload;

Convert any RTL string into a variant storing a UTF-8 string
- could be used e.g. as TDocVariantData.AddValue() parameter


1.6.189. StringToVariant

function StringToVariant(const Txt: string): variant; overload;

Convert any RTL string into a variant storing a UTF-8 string
- could be used e.g. as TDocVariantData.AddValue() parameter


1.6.190. StringToWinAnsi

function StringToWinAnsi(const Text: string): WinAnsiString;

Convert any RTL string into WinAnsi (Win-1252) 8-bit encoded String


1.6.191. StrPosI

function StrPosI(uppersubstr, str: PUtf8Char): PUtf8Char;

A non case-sensitive version of Pos()
- uppersubstr is expected to be already in upper case
- this version handles only 7-bit ASCII (no accented characters)
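
A short sketch of a StrPosI() call, assuming the signature above. The searched pattern is stored already upper-cased, as the notes require; the program and variable names are illustrative.

```pascal
program StrPosIDemo;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,     // RawUtf8, PUtf8Char
  mormot.core.unicode;  // StrPosI

var
  text, upper: RawUtf8;
  found: PUtf8Char;
begin
  text := 'Hello World';
  upper := 'WORLD';  // must already be in upper case
  found := StrPosI(pointer(upper), pointer(text));
  if found = nil then
    writeln('not found')
  else
    // byte offset of the match within text (0-based)
    writeln(found - PUtf8Char(pointer(text)));  // 6
end.
```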


1.6.192. StrPosIReference

function StrPosIReference(U: PUtf8Char; const Up: RawUcs4): PUtf8Char;

UTF-8 Unicode 10.0 case-insensitive Pattern search within UTF-8 buffer
- returns nil if no match, or the Pattern position found inside U^
- Up should have been already converted using UpperCaseUcs4Reference()
- won't call the Operating System, so is consistent on all platforms, and doesn't require any temporary UTF-16 decoding


1.6.193. strspn

function strspn(s, accept: pointer): integer;

Pure pascal version of strspn(), to be used with PUtf8Char/PAnsiChar
- returns size of initial segment of s which appears in accept chars, e.g.

 strspn('abcdef','debca')=5

- please note that this optimized version may read up to 3 bytes beyond accept but never after s end, so is safe e.g. over memory mapped files


1.6.194. SynUnicodeToString

function SynUnicodeToString(const U: SynUnicode): string;

Convert any SynUnicode encoded string into a RTL string


1.6.195. SynUnicodeToUtf8

function SynUnicodeToUtf8(const Unicode: SynUnicode): RawUtf8;

Convert a SynUnicode string into a UTF-8 string


1.6.196. ToUtf8

function ToUtf8(const Ansi7Text: ShortString): RawUtf8; overload;

Convert any UTF-8 encoded ShortString Text into an UTF-8 encoded String
- expects the supplied content to be already ASCII-7 or UTF-8 encoded, e.g. a RTTI type or property name: it won't work with Ansi-encoded strings


1.6.197. ToUtf8

function ToUtf8(const Text: string): RawUtf8; overload;

Convert any RTL string into an UTF-8 encoded String


1.6.198. TRawUtf8DynArrayFrom

function TRawUtf8DynArrayFrom(const Values: array of RawUtf8): TRawUtf8DynArray;

Quick helper to initialize a dynamic array of RawUtf8 from some constants
- can be used e.g. as:

 MyArray := TRawUtf8DynArrayFrom(['a','b','c']);

1.6.199. TrimChar

function TrimChar(const text: RawUtf8; const exclude: TSynAnsicharSet): RawUtf8;

Returns the supplied text content, without any specified char
- specify a custom char set to be excluded, e.g. as [#0 .. ' ']


1.6.200. TrimChars

procedure TrimChars(var S: RawUtf8; Left, Right: PtrInt);

Trim some leading and trailing chars
- if S is unique (RefCnt=1), will modify the RawUtf8 in place
- faster alternative to S := copy(S, Left + 1, length(S) - Left - Right)
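
As a sketch under the semantics stated above (equivalent to the copy() expression), TrimChars() can strip a fixed number of bytes on each side. The variable name and sample value are assumptions made for illustration.

```pascal
program TrimCharsDemo;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,     // RawUtf8
  mormot.core.unicode;  // TrimChars

var
  s: RawUtf8;
begin
  s := '[value]';
  // drop 1 byte on the left and 1 byte on the right
  TrimChars(s, 1, 1);
  writeln(s);  // value, same as copy('[value]', 2, 5)
end.
```

Left and Right are byte counts, so the caller is responsible for not cutting in the middle of a multi-byte UTF-8 sequence.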


1.6.201. TrimControlChars

function TrimControlChars(const text: RawUtf8): RawUtf8;

Returns the supplied text content, without any control char
- here control chars have an ASCII code in [#0 .. ' '], i.e. text[] <= ' '


1.6.202. TrimLeft

function TrimLeft(const S: RawUtf8): RawUtf8;

Trims leading whitespace characters from the string by removing new line, space, and tab characters


1.6.203. TrimLeftLines

procedure TrimLeftLines(var S: RawUtf8);

Trims leading whitespaces of every line of the UTF-8 text
- also delete void lines
- could be used e.g. before FindNameValue() call
- modification is made in-place so S will be modified


1.6.204. TrimLeftLowerCase

function TrimLeftLowerCase(const V: RawUtf8): PUtf8Char;

Trim first lowercase chars ('otDone' will return 'Done' e.g.)
- return a PUtf8Char to avoid any memory allocation
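
The following sketch shows the pointer-returning design, assuming the signature above. Since no new string is allocated, the result presumably points inside the source text, so the source variable has to stay alive while the pointer is in use; names are illustrative.

```pascal
program TrimLeftLowerCaseDemo;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,     // RawUtf8, PUtf8Char
  mormot.core.unicode;  // TrimLeftLowerCase

var
  name: RawUtf8;
  p: PUtf8Char;
begin
  name := 'otDone';  // typical enumeration item name
  p := TrimLeftLowerCase(name);
  // p is only valid as long as name is not modified or released
  writeln(PAnsiChar(p));  // Done
end.
```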


1.6.205. TrimLeftLowerCaseShort

function TrimLeftLowerCaseShort(V: PShortString): RawUtf8;

Trim first lowercase chars ('otDone' will return 'Done' e.g.)
- returns a RawUtf8 string: enumeration names are pure 7-bit ANSI with Delphi 7 to 2007, and UTF-8 encoded with Delphi 2009+


1.6.206. TrimLeftLowerCaseToShort

procedure TrimLeftLowerCaseToShort(V: PShortString; out result: ShortString); overload;

Trim first lowercase chars ('otDone' will return 'Done' e.g.)
- returns a ShortString: enumeration names are pure 7-bit ANSI with Delphi 7 to 2007, and UTF-8 encoded with Delphi 2009+


1.6.207. TrimLeftLowerCaseToShort

function TrimLeftLowerCaseToShort(V: PShortString): ShortString; overload;

Trim first lowercase chars ('otDone' will return 'Done' e.g.)
- returns a ShortString: enumeration names are pure 7-bit ANSI with Delphi 7 to 2007, and UTF-8 encoded with Delphi 2009+


1.6.208. TrimOneChar

function TrimOneChar(const text: RawUtf8; exclude: AnsiChar): RawUtf8;

Returns the supplied text content, without one specified char


1.6.209. TrimRight

function TrimRight(const S: RawUtf8): RawUtf8;

Trims trailing whitespace characters from the string by removing trailing newline, space, and tab characters


1.6.210. Ucs4ToUtf8

function Ucs4ToUtf8(ucs4: Ucs4CodePoint; Dest: PUtf8Char): PtrInt;

UTF-8 encode one UCS4 CodePoint into Dest
- return the number of bytes written into Dest (i.e. from 1 up to 6)
- this method DOES properly handle UTF-16 surrogate pairs
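
A minimal sketch of encoding a single code point, assuming the signature above. The 8-byte buffer is an illustrative choice that covers the documented maximum of 6 bytes; U+20AC (EURO SIGN) is a sample value whose UTF-8 form is the 3 bytes E2 82 AC.

```pascal
program Ucs4ToUtf8Demo;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,     // PtrInt, Ucs4CodePoint
  mormot.core.unicode;  // Ucs4ToUtf8

var
  buf: array[0..7] of AnsiChar;
  len: PtrInt;
begin
  len := Ucs4ToUtf8($20AC, @buf);  // U+20AC EURO SIGN
  writeln(len);                    // 3
  writeln(ord(buf[0]), ' ', ord(buf[1]), ' ', ord(buf[2]));  // 226 130 172
end.
```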


1.6.211. UnCamelCase

function UnCamelCase(const S: RawUtf8): RawUtf8; overload;

Convert a CamelCase string into a space separated one
- 'OnLine' will return 'On line' e.g., and 'OnMyLINE' will return 'On my LINE'
- will handle capital words at the beginning, middle or end of the text, e.g. 'KLMFlightNumber' will return 'KLM flight number' and 'GoodBBCProgram' will return 'Good BBC program'
- will handle a number at the beginning, middle or end of the text, e.g. 'Email12' will return 'Email 12'
- '_' char is transformed into ' - '
- '__' chars are transformed into ': '
- returns a RawUtf8 string: enumeration names are pure 7-bit ANSI with Delphi up to 2007, and UTF-8 encoded with Delphi 2009+


1.6.212. UnCamelCase

function UnCamelCase(D, P: PUtf8Char): integer; overload;

Convert a CamelCase string into a space separated one
- 'OnLine' will return 'On line' e.g., and 'OnMyLINE' will return 'On my LINE'
- will handle capital words at the beginning, middle or end of the text, e.g. 'KLMFlightNumber' will return 'KLM flight number' and 'GoodBBCProgram' will return 'Good BBC program'
- will handle a number at the beginning, middle or end of the text, e.g. 'Email12' will return 'Email 12'
- return the char count written into D^
- D^ and P^ are expected to be UTF-8 encoded: enumeration and property names are pure 7-bit ANSI with Delphi 7 to 2007, and UTF-8 encoded with Delphi 2009+
- '_' char is transformed into ' - '
- '__' chars are transformed into ': '


1.6.213. UnicodeBufferToString

function UnicodeBufferToString(source: PWideChar): string;

Convert an Unicode buffer into a RTL string


1.6.214. UnicodeBufferToUtf8

function UnicodeBufferToUtf8(source: PWideChar): RawUtf8;

Convert an Unicode buffer into a UTF-8 string


1.6.215. UnicodeBufferToVariant

function UnicodeBufferToVariant(source: PWideChar): variant;

Convert an Unicode buffer into a variant storing a UTF-8 string
- could be used e.g. as TDocVariantData.AddValue() parameter


1.6.216. UnicodeBufferToWinAnsi

procedure UnicodeBufferToWinAnsi(source: PWideChar; out Dest: WinAnsiString);

Convert an Unicode buffer into a WinAnsi (code page 1252) string


1.6.217. UniqueRawUtf8ZeroToTilde

procedure UniqueRawUtf8ZeroToTilde(var u: RawUtf8; MaxSize: PtrInt = maxInt);

Will fast replace all #0 chars as ~
- could be used after UniqueRawUtf8() on an in-place modified JSON buffer, in which all values have been ended with #0
- you can optionally specify a maximum size, in bytes (this won't reallocate the string, but just add a #0 at some point in the UTF-8 buffer)
- could allow logging of parsed input e.g. after an exception


1.6.218. UnQuotedSqlSymbolName

function UnQuotedSqlSymbolName(const ExternalDBSymbol: RawUtf8): RawUtf8;

Unquote a SQL-compatible symbol name
- e.g. '[symbol]' -> 'symbol' or '"symbol"' -> 'symbol'


1.6.219. UnQuoteSqlString

function UnQuoteSqlString(const Value: RawUtf8): RawUtf8;

Unquote a SQL-compatible string


1.6.220. UnQuoteSqlStringVar

function UnQuoteSqlStringVar(P: PUtf8Char; out Value: RawUtf8): PUtf8Char;

Unquote a SQL-compatible string
- the first character in P^ must be either ' or "; internal doubled quotes are then transformed into single ones
- 'text '' end' -> text ' end
- "text "" end" -> text " end
- returns nil if P doesn't contain a valid SQL string
- returns a pointer just after the quoted text otherwise
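
The parsing contract above can be sketched as follows, assuming only the documented signature. The input literal, written with Pascal quote doubling, holds the text 'text '' end' rest; the program and variable names are hypothetical.

```pascal
program UnQuoteDemo;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,     // RawUtf8, PUtf8Char
  mormot.core.unicode;  // UnQuoteSqlStringVar

var
  sql, value: RawUtf8;
  next: PUtf8Char;
begin
  sql := '''text '''' end'' rest';  // i.e. 'text '' end' rest
  next := UnQuoteSqlStringVar(pointer(sql), value);
  if next = nil then
    writeln('not a valid SQL string')
  else
  begin
    writeln(value);            // text ' end
    writeln(PAnsiChar(next));  // remaining input after the closing quote
  end;
end.
```

Returning the position after the quoted text lets a caller continue parsing the same buffer without computing lengths again.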


1.6.221. UnZeroed

function UnZeroed(const bin: RawByteString): RawUtf8;

Convert a binary buffer into a fake ASCII/UTF-8 content without any #0 input
- will use ~ char to escape any #0 as ~0 pair (and plain ~ as ~~ pair)
- output is just a bunch of non 0 bytes, so not truly valid UTF-8 content
- may be used as an alternative to Base64 encoding if 8-bit chars are allowed
- call Zeroed() as reverse function
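
A round-trip sketch, assuming the UnZeroed() signature above and the Zeroed() function listed later in this unit as its reverse. The 3-byte input is built byte by byte to avoid any literal conversion; names and values are illustrative.

```pascal
program UnZeroedDemo;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,     // RawUtf8, RawByteString
  mormot.core.unicode;  // UnZeroed, Zeroed

var
  bin, back: RawByteString;
  safe: RawUtf8;
begin
  SetLength(bin, 3);
  bin[1] := 'a';
  bin[2] := #0;   // will be escaped as the ~0 pair
  bin[3] := '~';  // will be escaped as the ~~ pair
  safe := UnZeroed(bin);
  writeln(length(safe));  // 5 = 1 + 2 + 2 under the documented escaping
  back := Zeroed(safe);
  writeln(back = bin);    // TRUE if the round trip is lossless
end.
```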


1.6.222. UpperCase

function UpperCase(const S: RawUtf8): RawUtf8;

Fast conversion of the supplied text into uppercase
- this will only convert 'a'..'z' into 'A'..'Z' (no NormToUpper use), and will therefore be correct with true UTF-8 content, but only for 7-bit


1.6.223. UpperCaseCopy

procedure UpperCaseCopy(const Source: RawUtf8; var Dest: RawUtf8); overload;

Fast conversion of the supplied text into uppercase
- this will only convert 'a'..'z' into 'A'..'Z' (no NormToUpper use), and will therefore be correct with true UTF-8 content, but only for 7-bit


1.6.224. UpperCaseCopy

procedure UpperCaseCopy(Text: PUtf8Char; Len: PtrInt; var Dest: RawUtf8); overload;

Fast conversion of the supplied text into uppercase
- this will only convert 'a'..'z' into 'A'..'Z' (no NormToUpper use), and will therefore be correct with true UTF-8 content, but only for 7-bit


1.6.225. UpperCaseReference

function UpperCaseReference(const S: RawUtf8): RawUtf8;

UpperCase conversion of a UTF-8 string using our Unicode 10.0 tables
- won't call the Operating System, so is consistent on all platforms, whereas UpperCaseUnicode() may vary depending on each library implementation
- won't use temporary UTF-16 decoding, and optimized for plain ASCII content


1.6.226. UpperCaseSelf

procedure UpperCaseSelf(var S: RawUtf8);

Fast in-place conversion of the supplied variable text into uppercase
- this will only convert 'a'..'z' into 'A'..'Z' (no NormToUpper use), and will therefore be correct with true UTF-8 content, but only for 7-bit


1.6.227. UpperCaseSynUnicode

function UpperCaseSynUnicode(const S: SynUnicode): SynUnicode;

Use the RTL to convert the SynUnicode text to UpperCase


1.6.228. UpperCaseU

function UpperCaseU(const S: RawUtf8): RawUtf8;

Fast conversion of the supplied text into 8-bit uppercase
- this will not only convert 'a'..'z' into 'A'..'Z', but also accented Latin characters ('e' acute into 'E' e.g.), using NormToUpper[] array
- it will therefore decode the supplied UTF-8 content to handle more than 7-bit of ascii characters (so this function is dedicated to WinAnsi code page 1252 characters set)


1.6.229. UpperCaseUcs4Reference

function UpperCaseUcs4Reference(const S: RawUtf8): RawUcs4;

UpperCase conversion of UTF-8 into UCS4 using our Unicode 10.0 tables
- won't call the Operating System, so is consistent on all platforms, whereas UpperCaseUnicode() may vary depending on each library implementation


1.6.230. UpperCaseUnicode

function UpperCaseUnicode(const S: RawUtf8): RawUtf8;

Accurate conversion of the supplied UTF-8 content into the corresponding upper-case Unicode characters
- will use the available API (e.g. Win32 or ICU), so may not be consistent on all systems - consider UpperCaseReference() to use our Unicode 10.0 tables
- will temporarily decode S into and from UTF-16 so is likely to be slower


1.6.231. UpperCopy

function UpperCopy(dest: PAnsiChar; const source: RawUtf8): PAnsiChar;

Copy source into dest^ with ASCII 7-bit upper case conversion
- returns final dest pointer
- will copy up to the source buffer end: so Dest^ should be big enough - which will be the case e.g. if Dest := pointer(source)


1.6.232. UpperCopy255

function UpperCopy255(dest: PAnsiChar; const source: RawUtf8): PAnsiChar; overload;

Copy source into a 256 chars dest^ buffer with 7-bit upper case conversion
- used internally for short keys match or case-insensitive hash
- returns final dest pointer
- will copy up to 255 AnsiChar (expect the dest buffer to be defined e.g. as array[byte] of AnsiChar on the caller stack)
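
A sketch of the stack-buffer pattern described above, assuming the listed signature. The buffer is declared as array[byte] of AnsiChar as suggested; the key value and names are illustrative, and the explicit #0 terminator is a precaution added here, not a documented behavior.

```pascal
program UpperCopy255Demo;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,     // RawUtf8
  mormot.core.unicode;  // UpperCopy255

var
  tmp: array[byte] of AnsiChar;  // 256 chars on the caller stack
  key: RawUtf8;
  last: PAnsiChar;
begin
  key := 'Content-Type';
  last := UpperCopy255(@tmp, key);
  last^ := #0;  // assumption: terminate the copy ourselves before printing
  writeln(last - PAnsiChar(@tmp));  // 12 chars copied
  writeln(PAnsiChar(@tmp));         // CONTENT-TYPE
end.
```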


1.6.233. UpperCopy255Buf

function UpperCopy255Buf(dest: PAnsiChar; source: PUtf8Char; sourceLen: PtrInt): PAnsiChar;

Copy source^ into a 256 chars dest^ buffer with 7-bit upper case conversion
- used internally for short keys match or case-insensitive hash
- returns final dest pointer
- will copy up to 255 AnsiChar (expect the dest buffer to be defined e.g. as array[byte] of AnsiChar on the caller stack)


1.6.234. UpperCopy255W

function UpperCopy255W(dest: PAnsiChar; source: PWideChar; L: PtrInt): PAnsiChar; overload;

Copy WideChar source into dest^ with upper case conversion
- used internally for short keys match or case-insensitive hash
- returns final dest pointer
- will copy up to 255 AnsiChar (expect the dest buffer to be array[byte] of AnsiChar), replacing any non WinAnsi character by '?'


1.6.235. UpperCopy255W

function UpperCopy255W(dest: PAnsiChar; const source: SynUnicode): PAnsiChar; overload;

Copy UTF-16 source into dest^ with ASCII 7-bit upper case conversion
- used internally for short keys match or case-insensitive hash
- returns final dest pointer
- will copy up to 255 AnsiChar (expect the dest buffer to be array[byte] of AnsiChar), replacing any non WinAnsi character by '?'


1.6.236. UpperCopyShort

function UpperCopyShort(dest: PAnsiChar; const source: ShortString): PAnsiChar;

Copy source into dest^ with ASCII 7-bit upper case conversion
- returns final dest pointer
- this special version expect source to be a ShortString


1.6.237. UpperCopyWin255

function UpperCopyWin255(dest: PWinAnsiChar; const source: RawUtf8): PWinAnsiChar;

Copy source into dest^ with WinAnsi 8-bit upper case conversion
- used internally for short keys match or case-insensitive hash
- returns final dest pointer
- will copy up to 255 AnsiChar (expect the dest buffer to be array[byte] of AnsiChar)


1.6.238. Utf16CharToUtf8

function Utf16CharToUtf8(Dest: PUtf8Char; var Source: PWord): integer;

UTF-8 encode one UTF-16 encoded UCS4 CodePoint into Dest
- return the number of bytes written into Dest (i.e. from 1 up to 6)
- Source will contain the next UTF-16 character
- this method DOES properly handle UTF-16 surrogate pairs


1.6.239. Utf8DecodeToRawUnicode

function Utf8DecodeToRawUnicode(P: PUtf8Char; L: integer): RawUnicode; overload;

Convert a UTF-8 encoded buffer into a RawUnicode string
- if L is 0, L is computed from zero terminated P buffer
- RawUnicode is ended by a WideChar(#0)
- faster than System.Utf8Decode() which uses slow widestrings


1.6.240. Utf8DecodeToRawUnicode

function Utf8DecodeToRawUnicode(const S: RawUtf8): RawUnicode; overload;

Convert a UTF-8 string into a RawUnicode string


1.6.241. Utf8DecodeToRawUnicodeUI

function Utf8DecodeToRawUnicodeUI(const S: RawUtf8; var Dest: RawUnicode): integer; overload;

Convert a UTF-8 string into a RawUnicode string
- returns the resulting length (in bytes) of the content stored within Dest
- see also Utf8DecodeToUnicode() which uses a TSynTempBuffer for storage


1.6.242. Utf8DecodeToRawUnicodeUI

function Utf8DecodeToRawUnicodeUI(const S: RawUtf8; DestLen: PInteger = nil): RawUnicode; overload;

Convert a UTF-8 string into a RawUnicode string
- this version doesn't resize the length of the result RawUnicode and is therefore useful before a Win32 Unicode API call (with nCount=-1)
- if DestLen is not nil, the resulting length (in bytes) will be stored within it
- see also Utf8DecodeToUnicode() which uses a TSynTempBuffer for storage


1.6.243. Utf8DecodeToString

procedure Utf8DecodeToString(P: PUtf8Char; L: integer; var result: string); overload;

Convert any UTF-8 encoded buffer into a RTL string


1.6.244. Utf8DecodeToString

function Utf8DecodeToString(P: PUtf8Char; L: integer): string; overload;

Convert any UTF-8 encoded buffer into a RTL string
- it is preferred to use TLanguageFile.Utf8ToString() in mORMoti18n, which will handle full i18n of your application
- it will work as is with Delphi 2009+ (direct unicode conversion)
- under older versions of Delphi (no unicode), it will use the current RTL codepage, as with WideString conversion (but without slow WideString usage)


1.6.245. Utf8DecodeToUnicode

function Utf8DecodeToUnicode(Text: PUtf8Char; Len: PtrInt; var temp: TSynTempBuffer): PtrInt; overload;

Convert any UTF-8 encoded buffer into an UTF-16 temporary buffer


1.6.246. Utf8DecodeToUnicode

function Utf8DecodeToUnicode(const Text: RawUtf8; var temp: TSynTempBuffer): PtrInt; overload;

Convert any UTF-8 encoded string into an UTF-16 temporary buffer
- returns the number of WideChar stored in temp (not bytes)
- caller should call temp.Done after temp.buf has been used
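
The temporary-buffer pattern above can be sketched as follows, assuming the RawUtf8 overload and the temp.buf / temp.Done members named in the notes. The program and variable names are hypothetical.

```pascal
program DecodeToUnicodeDemo;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,     // RawUtf8, TSynTempBuffer, PtrInt
  mormot.core.unicode;  // Utf8DecodeToUnicode

var
  u: RawUtf8;
  temp: TSynTempBuffer;
  count: PtrInt;
begin
  u := 'hello';
  count := Utf8DecodeToUnicode(u, temp);
  try
    // temp.buf holds count UTF-16 WideChars, e.g. for an OS API call
    writeln(count);  // 5 WideChars (not bytes) for this ASCII input
  finally
    temp.Done;       // release the buffer once temp.buf is no longer needed
  end;
end.
```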


1.6.247. Utf8DecodeToUnicodeRawByteString

function Utf8DecodeToUnicodeRawByteString(P: PUtf8Char; L: integer): RawByteString; overload;

Convert an UTF-8 encoded buffer into a UTF-16 encoded RawByteString buffer
- could be used instead of deprecated RawUnicode when a temp UTF-16 buffer is needed


1.6.248. Utf8DecodeToUnicodeRawByteString

function Utf8DecodeToUnicodeRawByteString(const U: RawUtf8): RawByteString; overload;

Convert an UTF-8 encoded buffer into a UTF-16 encoded RawByteString buffer
- could be used instead of deprecated RawUnicode when a temp UTF-16 buffer is needed


1.6.249. Utf8DecodeToUnicodeStream

function Utf8DecodeToUnicodeStream(P: PUtf8Char; L: integer): TStream;

Convert an UTF-8 encoded buffer into a UTF-16 encoded stream of bytes


1.6.250. Utf8FirstLineToUtf16Length

function Utf8FirstLineToUtf16Length(source: PUtf8Char): PtrInt;

Calculate the UTF-16 Unicode characters count of the UTF-8 encoded first line
- count may not match the UCS4 CodePoint, in case of UTF-16 surrogates
- end the parsing at first #13 or #10 character


1.6.251. Utf8IComp

function Utf8IComp(u1, u2: PUtf8Char): PtrInt;

Fast UTF-8 comparison handling WinAnsi CP-1252 case folding
- this version expects u1 and u2 to be zero-terminated
- decode the UTF-8 content before using NormToUpper[] lookup table
- matches our SYSTEMNOCASE custom (and default) SQLite 3 collation
- consider Utf8ICompReference() for Unicode 10.0 support
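
A minimal comparison sketch, assuming the signature above and zero-terminated inputs (which RawUtf8 variables provide). Names and values are illustrative.

```pascal
program Utf8ICompDemo;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,     // RawUtf8
  mormot.core.unicode;  // Utf8IComp

var
  a, b, c: RawUtf8;
begin
  a := 'Hello';
  b := 'HELLO';
  c := 'World';
  // 0 means equal once case is folded
  writeln(Utf8IComp(pointer(a), pointer(b)));      // 0
  // a non-zero result gives the ordering of the two texts
  writeln(Utf8IComp(pointer(a), pointer(c)) < 0);  // TRUE, since 'H' sorts before 'W'
end.
```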


1.6.252. Utf8ICompReference

function Utf8ICompReference(u1, u2: PUtf8Char): PtrInt;

UTF-8 comparison using our Unicode 10.0 tables
- this version expects u1 and u2 to be zero-terminated
- Utf8IComp() handles WinAnsi CP-1252 latin accents - this one is Unicode
- won't call the Operating System, so is consistent on all platforms, and doesn't require any temporary UTF-16 decoding
- has a branchless optimized process of 7-bit ASCII charset [a..z] -> [A..Z]


1.6.253. Utf8ILComp

function Utf8ILComp(u1, u2: PUtf8Char; L1, L2: cardinal): PtrInt;

Fast UTF-8 comparison handling WinAnsi CP-1252 case folding
- this version expects u1 and u2 not to be necessarily zero-terminated, but uses L1 and L2 as length for u1 and u2 respectively
- decode the UTF-8 content before using NormToUpper[] lookup table
- consider Utf8ILCompReference() for Unicode 10.0 support


1.6.254. Utf8ILCompReference

function Utf8ILCompReference(u1, u2: PUtf8Char; L1, L2: integer): PtrInt;

UTF-8 comparison using our Unicode 10.0 tables
- this version expects u1 and u2 not to be necessarily zero-terminated, but uses L1 and L2 as length for u1 and u2 respectively
- Utf8ILComp() handles WinAnsi CP-1252 latin accents - this one is Unicode
- won't call the Operating System, so is consistent on all platforms, and doesn't require any temporary UTF-16 decoding
- has a branchless optimized process of 7-bit ASCII charset [a..z] -> [A..Z]


1.6.255. Utf8ToFileName

procedure Utf8ToFileName(const Text: RawUtf8; var result: TFileName);

Convert any UTF-8 encoded String into a generic RTL file name string


1.6.256. Utf8ToRawUtf8

procedure Utf8ToRawUtf8(P: PUtf8Char; var result: RawUtf8);

Direct conversion of a UTF-8 encoded zero terminated buffer into a RawUtf8 String


1.6.257. Utf8ToShortString

procedure Utf8ToShortString(var dest: ShortString; source: PUtf8Char);

Direct conversion of a UTF-8 encoded buffer into a WinAnsi ShortString buffer
- non WinAnsi chars are replaced by '?' placeholders


1.6.258. Utf8ToString

function Utf8ToString(const Text: RawUtf8): string;

Convert any UTF-8 encoded String into a RTL string
- it is preferred to use TLanguageFile.Utf8ToString() in mORMoti18n, which will handle full i18n of your application
- it will work as is with Delphi 2009+ (direct unicode conversion)
- under older versions of Delphi (no unicode), it will use the current RTL codepage, as with WideString conversion (but without slow WideString usage)


1.6.259. Utf8ToStringVar

procedure Utf8ToStringVar(const Text: RawUtf8; var result: string);

Convert any UTF-8 encoded String into a RTL string


1.6.260. Utf8ToSynUnicode

function Utf8ToSynUnicode(const Text: RawUtf8): SynUnicode; overload;

Convert any UTF-8 encoded String into a generic SynUnicode Text


1.6.261. Utf8ToSynUnicode

procedure Utf8ToSynUnicode(Text: PUtf8Char; Len: PtrInt; var result: SynUnicode); overload;

Convert any UTF-8 encoded buffer into a generic SynUnicode Text


1.6.262. Utf8ToSynUnicode

procedure Utf8ToSynUnicode(const Text: RawUtf8; var result: SynUnicode); overload;

Convert any UTF-8 encoded String into a generic SynUnicode Text


1.6.263. Utf8ToUnicodeLength

function Utf8ToUnicodeLength(source: PUtf8Char): PtrUInt;

Calculate the UTF-16 Unicode characters count, UTF-8 encoded in source^
- count may not match the UCS4 CodePoint, in case of UTF-16 surrogates
- faster than System.Utf8ToUnicode with dest=nil


1.6.264. Utf8ToWideChar

function Utf8ToWideChar(dest: PWideChar; source: PUtf8Char; MaxDestChars, sourceBytes: PtrInt; NoTrailingZero: boolean = false): PtrInt; overload;

Convert an UTF-8 encoded text into a WideChar (UTF-16) buffer
- faster than System.Utf8ToUnicode
- this overloaded function expect a MaxDestChars parameter
- sourceBytes can not be 0 for this function
- enough space must be available in dest buffer (guess is sourceBytes*3+2)
- a WideChar(#0) is added at the end (if something is written) unless NoTrailingZero is TRUE
- returns the BYTE COUNT (not WideChar count) written in dest, excluding the ending WideChar(#0)


1.6.265. Utf8ToWideChar

function Utf8ToWideChar(dest: PWideChar; source: PUtf8Char; sourceBytes: PtrInt = 0; NoTrailingZero: boolean = false): PtrInt; overload;

Convert an UTF-8 encoded text into a WideChar (UTF-16) buffer
- faster than System.Utf8ToUnicode
- sourceBytes can be 0, therefore length is computed from zero terminated source
- enough space must be available in dest buffer (guess is sourceBytes*3+2)
- a WideChar(#0) is added at the end (if something is written) unless NoTrailingZero is TRUE
- returns the BYTE count written in dest, excluding the ending WideChar(#0)
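
The following sketch calls the overload without MaxDestChars, assuming the signature above. The fixed 16-WideChar destination is an illustrative choice that is comfortably larger than needed for the 3-byte sample; note that the result is a byte count, so it is halved to get WideChars.

```pascal
program Utf8ToWideCharDemo;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,     // RawUtf8, PtrInt
  mormot.core.unicode;  // Utf8ToWideChar

var
  src: RawUtf8;
  dest: array[0..15] of WideChar;  // oversized on purpose for this sample
  bytes: PtrInt;
begin
  src := 'abc';
  bytes := Utf8ToWideChar(@dest, pointer(src), length(src));
  writeln(bytes);        // 6 bytes written, excluding the ending WideChar(#0)
  writeln(bytes shr 1);  // 3 WideChars
end.
```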


1.6.266. Utf8ToWideString

function Utf8ToWideString(const Text: RawUtf8): WideString; overload;

Convert any UTF-8 encoded String into a generic WideString Text


1.6.267. Utf8ToWideString

procedure Utf8ToWideString(const Text: RawUtf8; var result: WideString); overload;

Convert any UTF-8 encoded String into a generic WideString Text


1.6.268. Utf8ToWideString

procedure Utf8ToWideString(Text: PUtf8Char; Len: PtrInt; var result: WideString); overload;

Convert any UTF-8 encoded String into a generic WideString Text


1.6.269. Utf8ToWinAnsi

function Utf8ToWinAnsi(const S: RawUtf8): WinAnsiString; overload;

Direct conversion of a UTF-8 encoded string into a WinAnsi String


1.6.270. Utf8ToWinAnsi

function Utf8ToWinAnsi(P: PUtf8Char): WinAnsiString; overload;

Direct conversion of a UTF-8 encoded zero terminated buffer into a WinAnsi String


1.6.271. Utf8ToWinPChar

function Utf8ToWinPChar(dest: PAnsiChar; source: PUtf8Char; count: integer): integer;

Direct conversion of a UTF-8 encoded buffer into a WinAnsi PAnsiChar buffer


1.6.272. Utf8TruncatedLength

function Utf8TruncatedLength(const text: RawUtf8; maxBytes: PtrUInt): PtrInt; overload;

Compute the truncated length of the supplied UTF-8 value if it exceeds the specified bytes count
- this function will ensure that the returned content will contain only valid UTF-8 sequence, i.e. will trim the whole trailing UTF-8 sequence
- returns maxBytes if text was not truncated, or the number of fitting bytes


1.6.273. Utf8TruncatedLength

function Utf8TruncatedLength(text: PAnsiChar; textlen, maxBytes: PtrUInt): PtrInt; overload;

Compute the truncated length of the supplied UTF-8 value if it exceeds the specified bytes count
- this function will ensure that the returned content will contain only valid UTF-8 sequence, i.e. will trim the whole trailing UTF-8 sequence
- returns maxBytes if text was not truncated, or the number of fitting bytes


1.6.274. Utf8TruncateToLength

function Utf8TruncateToLength(var text: RawUtf8; maxBytes: PtrUInt): boolean;

Will truncate the supplied UTF-8 value if its length exceeds the specified bytes count
- this function will ensure that the returned content will contain only valid UTF-8 sequence, i.e. will trim the whole trailing UTF-8 sequence
- returns FALSE if text was not truncated, TRUE otherwise
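
A sketch of truncation at a multi-byte boundary, assuming the signatures of Utf8TruncatedLength() and Utf8TruncateToLength() above. The sample is "ab" followed by U+00E9 encoded as the two bytes C3 A9, built from raw bytes to avoid any literal conversion; names are hypothetical.

```pascal
program TruncateDemo;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,     // RawUtf8
  mormot.core.unicode;  // Utf8TruncatedLength, Utf8TruncateToLength

const
  // 'a', 'b', then U+00E9 as a 2-byte UTF-8 sequence
  SAMPLE: array[0..3] of byte = ($61, $62, $C3, $A9);
var
  text: RawUtf8;
begin
  SetString(text, PAnsiChar(@SAMPLE), 4);
  // a 3-byte limit would split the C3 A9 sequence, so only 2 bytes fit
  writeln(Utf8TruncatedLength(text, 3));   // 2
  writeln(Utf8TruncateToLength(text, 3));  // TRUE: text was truncated
  writeln(length(text));                   // 2, i.e. 'ab'
end.
```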


1.6.275. Utf8TruncateToUnicodeLength

function Utf8TruncateToUnicodeLength(var text: RawUtf8; maxUtf16: integer): boolean;

Will truncate the supplied UTF-8 value if its length exceeds the specified UTF-16 Unicode characters count
- count may not match the UCS4 CodePoint, in case of UTF-16 surrogates
- returns FALSE if text was not truncated, TRUE otherwise


1.6.276. Utf8UpperCopy

function Utf8UpperCopy(Dest, Source: PUtf8Char; SourceChars: cardinal): PUtf8Char;

Copy UTF-8 buffer into dest^ handling WinAnsi CP-1252 NormToUpper[] folding
- returns the final dest pointer
- current implementation handles UTF-16 surrogates


1.6.277. Utf8UpperCopy255

function Utf8UpperCopy255(dest: PAnsiChar; const source: RawUtf8): PUtf8Char;

Copy UTF-8 buffer into dest^ handling WinAnsi CP-1252 NormToUpper[] folding
- returns the final dest pointer
- will copy up to 255 AnsiChar (expect the dest buffer to be array[byte] of AnsiChar), with UTF-8 encoding


1.6.278. Utf8UpperReference

function Utf8UpperReference(S, D: PUtf8Char): PUtf8Char; overload;

UpperCase conversion of a UTF-8 buffer using our Unicode 10.0 tables
- won't call the Operating System, so is consistent on all platforms, whereas UpperCaseUnicode() may vary depending on each library implementation
- some codepoints grow in length, so D^ should be at least twice the size of S^
- any invalid input is replaced by UNICODE_REPLACEMENT_CHARACTER=$fffd
- won't use temporary UTF-16 decoding, and optimized for plain ASCII content


1.6.279. Utf8UpperReference

function Utf8UpperReference(S, D: PUtf8Char; SLen: PtrUInt): PUtf8Char; overload;

UpperCase conversion of a UTF-8 buffer using our Unicode 10.0 tables
- won't call the Operating System, so is consistent on all platforms, whereas UpperCaseUnicode() may vary depending on each library implementation
- some codepoints grow in length, so D^ should be at least twice the size of S^
- any invalid input is replaced by UNICODE_REPLACEMENT_CHARACTER=$fffd
- won't use temporary UTF-16 decoding, and optimized for plain ASCII content
- knowing the Source length, this function will handle any ASCII 7-bit input by quad, for efficiency


1.6.280. WideCharToWinAnsi

function WideCharToWinAnsi(wc: cardinal): integer;

Conversion of a wide char into a WinAnsi (CodePage 1252) char index
- return -1 for an unknown WideChar in code page 1252
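
A small sketch of the lookup, assuming the signature above and the standard code page 1252 layout, in which U+20AC (EURO SIGN) is stored as byte $80. The CJK code point U+4E2D is used as a sample with no code page 1252 equivalent.

```pascal
program WideCharToWinAnsiDemo;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,
  mormot.core.unicode;  // WideCharToWinAnsi

begin
  writeln(WideCharToWinAnsi(ord('A')));  // 65: ASCII maps to itself
  writeln(WideCharToWinAnsi($20AC));     // 128: EURO SIGN is $80 in CP 1252
  writeln(WideCharToWinAnsi($4E2D));     // -1: not part of CP 1252
end.
```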


1.6.281. WideCharToWinAnsiChar

function WideCharToWinAnsiChar(wc: cardinal): AnsiChar;

Conversion of a wide char into a WinAnsi (CodePage 1252) char
- return '?' for an unknown WideChar in code page 1252


1.6.282. WideStringToUtf8

function WideStringToUtf8(const aText: WideString): RawUtf8;

Convert a WideString into a UTF-8 string


1.6.283. WideStringToWinAnsi

function WideStringToWinAnsi(const Wide: WideString): WinAnsiString;

Convert a WideString into a WinAnsi (code page 1252) string


1.6.284. WinAnsiBufferToUtf8

function WinAnsiBufferToUtf8(Dest: PUtf8Char; Source: PAnsiChar; SourceChars: cardinal): PUtf8Char;

Direct conversion of a WinAnsi PAnsiChar buffer into a UTF-8 encoded buffer
- Dest^ buffer must be reserved with at least SourceChars*3
- call internally WinAnsiConvert fast conversion class
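
A buffer-to-buffer sketch, assuming the signature above. The destination is sized as SourceChars*3 as documented, plus one extra byte as a precaution in case a terminator is written (an assumption, not a documented requirement). Byte $E9 is the code page 1252 encoding of U+00E9, whose UTF-8 form takes 2 bytes.

```pascal
program WinAnsiBufferDemo;

{$ifdef FPC}{$mode delphi}{$endif}
{$apptype console}

uses
  mormot.core.base,     // PUtf8Char
  mormot.core.unicode;  // WinAnsiBufferToUtf8

const
  SRC: array[0..2] of byte = ($61, $E9, $62);  // 'a', e acute, 'b' in CP 1252
var
  dst: array[0..3 * 3] of AnsiChar;  // SourceChars*3, plus 1 spare byte
  last: PUtf8Char;
begin
  last := WinAnsiBufferToUtf8(@dst, @SRC, 3);
  // the returned pointer marks the end of the written UTF-8 content
  writeln(last - PUtf8Char(@dst));  // 4 bytes: 61, C3 A9, 62
end.
```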


1.6.285. WinAnsiToRawUnicode

function WinAnsiToRawUnicode(const S: WinAnsiString): RawUnicode;

Direct conversion of a WinAnsi (CodePage 1252) string into a Unicode encoded String
- very fast, by using a fixed pre-calculated array for individual chars conversion


1.6.286. WinAnsiToSynUnicode

function WinAnsiToSynUnicode(WinAnsi: PAnsiChar; WinAnsiLen: PtrInt): SynUnicode; overload;

Convert a Win-Ansi encoded buffer into a Delphi 2009+ or FPC Unicode string
- this function is faster than default RTL, since it uses no Win32 API call


1.6.287. WinAnsiToSynUnicode

function WinAnsiToSynUnicode(const WinAnsi: WinAnsiString): SynUnicode; overload;

Convert a Win-Ansi string into a Delphi 2009+ or FPC Unicode string
- this function is faster than default RTL, since it uses no Win32 API call


1.6.288. WinAnsiToUnicodeBuffer

procedure WinAnsiToUnicodeBuffer(const S: WinAnsiString; Dest: PWordArray; DestLen: PtrInt);

Direct conversion of a WinAnsi (CodePage 1252) string into a Unicode buffer
- very fast, by using a fixed pre-calculated array for individual chars conversion
- text will be truncated if necessary to avoid buffer overflow in Dest[]
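
The truncation behavior can be observed with a deliberately small destination. A minimal sketch, assuming DestLen counts 16-bit elements rather than bytes and that the stored text is zero-terminated (neither is spelled out above, so the buffer is zero-filled first to keep the loop safe either way):

```pascal
program WinAnsiToUnicodeBufferSketch;

{$I mormot.defines.inc}
{$ifdef OSWINDOWS} {$apptype console} {$endif}

uses
  mormot.core.base,
  mormot.core.unicode;

var
  dest: array[0..7] of word; // deliberately too small for the text below
  i: integer;
begin
  FillChar(dest, SizeOf(dest), 0);
  // the 13-char text cannot fit: it is expected to be cut, not to overflow
  WinAnsiToUnicodeBuffer('Hello, mORMot', @dest, length(dest));
  i := 0;
  while (i <= high(dest)) and (dest[i] <> 0) do
  begin
    write(AnsiChar(dest[i])); // ASCII only here, so a plain cast is enough
    inc(i);
  end;
  writeln;
  writeln(i, ' UTF-16 code units were stored');
end.
```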


1.6.289. WinAnsiToUtf8

function WinAnsiToUtf8(const S: WinAnsiString): RawUtf8; overload;

Direct conversion of a WinAnsi (CodePage 1252) string into a UTF-8 encoded String
- faster than SysUtils: it does not use Utf8Encode(WideString), so makes no Windows.Global() call, and it uses a fixed pre-calculated array for individual chars conversion


1.6.290. WinAnsiToUtf8

function WinAnsiToUtf8(WinAnsi: PAnsiChar; WinAnsiLen: PtrInt): RawUtf8; overload;

Direct conversion of a WinAnsi (CodePage 1252) string into a UTF-8 encoded String
- faster than SysUtils: it does not use Utf8Encode(WideString), so makes no Windows.Global() call, and it uses a fixed pre-calculated array for individual chars conversion
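
As an illustration of the buffer-based overloads, the following sketch converts the same code page 1252 bytes to UTF-8 and to SynUnicode and compares the resulting lengths. The sample bytes and program name are illustrative only.

```pascal
program WinAnsiToUtf8Sketch;

{$I mormot.defines.inc}
{$ifdef OSWINDOWS} {$apptype console} {$endif}

uses
  mormot.core.base,
  mormot.core.unicode;

const
  // 'café' in code page 1252, spelled as bytes: $E9 is 'é'
  SRC: array[0..3] of byte = ($63, $61, $66, $E9);
var
  utf8: RawUtf8;
  wide: SynUnicode;
begin
  utf8 := WinAnsiToUtf8(PAnsiChar(@SRC), SizeOf(SRC));
  wide := WinAnsiToSynUnicode(PAnsiChar(@SRC), SizeOf(SRC));
  // 'é' needs two bytes in UTF-8 but a single UTF-16 code unit
  writeln('UTF-8 length in bytes          : ', length(utf8));
  writeln('SynUnicode length in WideChars : ', length(wide));
end.
```

The UTF-8 result is one byte longer than the input because only the accented char expands; the three ASCII chars are copied unchanged.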


1.6.291. Zeroed

function Zeroed(const u: RawUtf8): RawByteString;

Convert a fake UTF-8 buffer without any #0 input back into its original binary
- may be used as an alternative to Base64 decoding if 8-bit chars are allowed
- call UnZeroedRawUtf8() as reverse function


1.7. Variables implemented in the mormot.core.unicode unit

1.7.1. CurrentAnsiConvert

CurrentAnsiConvert: TSynAnsiConvert;

Global TSynAnsiConvert instance to handle current system encoding
- this is the encoding as used by the AnsiString type, so will be used before Delphi 2009 to speed-up RTL string handling (especially for UTF-8)
- this instance is global and instantiated for the whole program lifetime


1.7.2. IdemPropNameUSameLen

IdemPropNameUSameLen: array[boolean] of TIdemPropNameUSameLen;

Case (in)sensitive comparison of ASCII 7-bit identifiers of same length


1.7.3. IsValidUtf8Buffer

IsValidUtf8Buffer: function(source: PUtf8Char; sourcelen: PtrInt): boolean;

Returns TRUE if the supplied buffer has valid UTF-8 encoding
- will also refuse #0 characters within the buffer
- on Haswell AVX2 Intel/AMD CPUs, will use very efficient ASM
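
Since IsValidUtf8Buffer is a function variable, it is called like a regular function. A minimal sketch with three hand-built buffers (the constant names are hypothetical and the byte values are chosen for illustration):

```pascal
program IsValidUtf8BufferSketch;

{$I mormot.defines.inc}
{$ifdef OSWINDOWS} {$apptype console} {$endif}

uses
  mormot.core.base,
  mormot.core.unicode;

const
  // 'café' in UTF-8: $C3 $A9 is a well-formed two-byte sequence
  GOOD: array[0..4] of byte = ($63, $61, $66, $C3, $A9);
  // same text in code page 1252: a lone $E9 lead byte is not valid UTF-8
  BAD: array[0..3] of byte = ($63, $61, $66, $E9);
  // well-formed ASCII, but with an embedded #0
  NUL: array[0..2] of byte = ($61, $00, $62);
begin
  writeln('UTF-8 input    : ', IsValidUtf8Buffer(PUtf8Char(@GOOD), SizeOf(GOOD)));
  writeln('CP1252 input   : ', IsValidUtf8Buffer(PUtf8Char(@BAD), SizeOf(BAD)));
  writeln('embedded #0    : ', IsValidUtf8Buffer(PUtf8Char(@NUL), SizeOf(NUL)));
end.
```

Only the first buffer should be accepted: the second is truncated mid-sequence, and the third is rejected by the #0 rule described above.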


1.7.4. LoadResStringTranslate

LoadResStringTranslate: procedure(var Text: string) = nil;

This procedure type must be defined if a default system.pas is used
- expect generic "string" type, i.e. UnicodeString for Delphi 2009+


1.7.5. NormToLower

NormToLower: TNormTable;

Lookup table used for fast case conversion to lowercase
- handles 8-bit upper chars as in WinAnsi / code page 1252 (e.g. accents)
- is defined globally, since it may be used from an inlined function


1.7.6. NormToLowerAnsi7

NormToLowerAnsi7: TNormTable;

This table will convert 'A'..'Z' into 'a'..'z'
- so it will work with UTF-8 without decoding, whereas NormToLower[] expects WinAnsi encoding


1.7.7. NormToNorm

NormToNorm: TNormTable;

Case sensitive NormToUpper[]/NormToLower[]-like table
- i.e. NormToNorm[c] = c


1.7.8. NormToUpper

NormToUpper: TNormTable;

Lookup table used for fast case conversion to uppercase
- handles 8-bit upper chars as in WinAnsi / code page 1252 (e.g. accents)
- is defined globally, since it may be used from an inlined function


1.7.9. NormToUpperAnsi7

NormToUpperAnsi7: TNormTable;

This table will convert 'a'..'z' into 'A'..'Z'
- so it will work with UTF-8 without decoding, whereas NormToUpper[] expects WinAnsi encoding
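
The reason the Ansi7 tables are safe on UTF-8 is that they only touch bytes below 128, so multi-byte sequences pass through untouched. A minimal sketch, assuming TNormTable is indexed by AnsiChar and yields AnsiChar (the sample bytes are illustrative):

```pascal
program NormToUpperAnsi7Sketch;

{$I mormot.defines.inc}
{$ifdef OSWINDOWS} {$apptype console} {$endif}

uses
  mormot.core.base,
  mormot.core.unicode;

const
  // 'café' in UTF-8: the trailing $C3 $A9 pair encodes 'é'
  SRC: array[0..4] of byte = ($63, $61, $66, $C3, $A9);
var
  s: RawUtf8;
  i: integer;
begin
  SetLength(s, SizeOf(SRC));
  Move(SRC, pointer(s)^, SizeOf(SRC));
  // byte-per-byte lookup, with no UTF-8 decoding step
  for i := 1 to length(s) do
    s[i] := NormToUpperAnsi7[s[i]];
  // dump the resulting bytes: only the three ASCII letters should change
  for i := 1 to length(s) do
    write(ord(s[i]), ' ');
  writeln;
end.
```

Applying NormToUpper[] to the same buffer would instead treat $C3 and $A9 as two separate WinAnsi chars, which is why that table is reserved for code page 1252 content.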


1.7.10. RawByteStringConvert

RawByteStringConvert: TSynAnsiFixedWidth;

Global TSynAnsiConvert instance with no encoding (RawByteString/RawBlob)


1.7.11. SortDynArrayAnsiStringByCase

SortDynArrayAnsiStringByCase: array[boolean] of TDynArraySortCompare;

A quick wrapper to SortDynArrayAnsiString or SortDynArrayAnsiStringI comparison functions


1.7.12. StrCompByCase

StrCompByCase: array[boolean] of TUtf8Compare;

A quick wrapper to StrComp or StrIComp comparison functions
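
Indexing the array with a boolean selects one of the two comparison functions without an if/else at the call site. The description above does not state which index value selects the case-insensitive variant, so this sketch simply calls both and prints the results. It assumes TUtf8Compare takes two PUtf8Char arguments and returns an integer value.

```pascal
program StrCompByCaseSketch;

{$I mormot.defines.inc}
{$ifdef OSWINDOWS} {$apptype console} {$endif}

uses
  mormot.core.base,
  mormot.core.unicode;

var
  a, b: RawUtf8;
  idx: boolean;
begin
  // two texts that differ only by case
  a := 'Hello';
  b := 'HELLO';
  for idx := false to true do
    // the entry that returns 0 is the case-insensitive comparison
    writeln('StrCompByCase[', idx, '] returns ',
      StrCompByCase[idx](pointer(a), pointer(b)));
end.
```

The same pattern would apply to SortDynArrayAnsiStringByCase, where a boolean option can be passed straight through as the array index.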


1.7.13. TEXT_CHARS

TEXT_CHARS: TTextCharSet;

Lookup table for text linefeed/word/identifier/uri branch-less parsing


1.7.14. Utf8AnsiConvert

Utf8AnsiConvert: TSynAnsiUtf8;

Global TSynAnsiConvert instance to handle UTF-8 encoding (code page CP_UTF8)
- this instance is global and instantiated for the whole program lifetime


1.7.15. WinAnsiConvert

WinAnsiConvert: TSynAnsiFixedWidth;

Global TSynAnsiConvert instance to handle WinAnsi encoding (code page 1252)
- this instance is global and instantiated for the whole program lifetime
- it will be created from hard-coded values, and not using the system API, since it appeared that some systems (e.g. in Russia) tweaked the registry so that code page 1252 maps to code page 1251
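
The global instances can be used directly, without creating any TSynAnsiConvert by hand. A minimal sketch, assuming the CodePage property and the AnsiBufferToRawUtf8() method of TSynAnsiConvert (both names are recalled from the class itself and should be checked against its reference in this document):

```pascal
program GlobalConvertersSketch;

{$I mormot.defines.inc}
{$ifdef OSWINDOWS} {$apptype console} {$endif}

uses
  mormot.core.base,
  mormot.core.unicode;

const
  // 'café' in code page 1252, spelled as bytes: $E9 is 'é'
  SRC: array[0..3] of byte = ($63, $61, $66, $E9);
var
  utf8: RawUtf8;
begin
  // the three shared engines: two fixed ones, one following the system
  writeln('WinAnsiConvert code page     : ', WinAnsiConvert.CodePage);
  writeln('Utf8AnsiConvert code page    : ', Utf8AnsiConvert.CodePage);
  writeln('CurrentAnsiConvert code page : ', CurrentAnsiConvert.CodePage);
  // conversion through the hard-coded 1252 table, whatever the OS settings
  utf8 := WinAnsiConvert.AnsiBufferToRawUtf8(PAnsiChar(@SRC), SizeOf(SRC));
  writeln('UTF-8 length in bytes        : ', length(utf8));
end.
```

Only the CurrentAnsiConvert line is expected to vary between machines, since it follows the system encoding.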