#1 2025-04-03 09:50:47

tbo
Member
Registered: 2015-04-20
Posts: 375

Behaviour of function UpperCaseU

The description of the UpperCaseU function is:

/// fast conversion of the supplied text into 8-bit uppercase
// - this will not only convert 'a'..'z' into 'A'..'Z', but also accentuated
// latin characters ('e' acute into 'E' e.g.), using NormToUpper[] array
// - it will therefore decode the supplied UTF-8 content to handle more than
// 7-bit of ascii characters (so this function is dedicated to WinAnsi code page
// 1252 characters set)

Here is an example test:

var
  ul, uh: RawUtf8;
begin
  // Windows-1252 character set
  // Ordinal numbers (decimal), https://de.wikipedia.org/wiki/Windows-1252
  // - é (lower case): 233
  // - É (upper case): 201
  ul := UTF8Encode('étudiant');
  uh := UTF8Encode('Étudiant');
  // Result: ETUDIANT, Expected: ÉTUDIANT
  ShowMessage(Utf8ToString(mormot.core.unicode.UpperCaseU(ul)));
  // Result: etudiant, Expected: étudiant
  ShowMessage(Utf8ToString(mormot.core.unicode.LowerCaseU(uh)));
end;

Am I misunderstanding the description, misinterpreting the name of the function, or is my expectation simply wrong?

With best regards
Thomas

#2 2025-04-03 14:49:07

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 15,247

Your expectations are in fact incorrect.

The documentation states:

'e' acute into 'E'

So études, etudes, Etudes and Études would all result in ETUDES, with no accent.
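
To illustrate why the accent disappears, here is a minimal sketch of table-based folding. It is not mORMot's code: Fold, InitFold and FoldUpper are hypothetical names, the table only covers 'a'..'z' plus the two code points quoted in the first post (233 and 201), and it works directly on WinAnsi bytes, skipping the UTF-8 decoding step that the documentation of UpperCaseU describes.

```pascal
program FoldSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

var
  // hypothetical stand-in for a NormToUpper[]-style lookup table
  Fold: array[AnsiChar] of AnsiChar;

procedure InitFold;
var
  c: AnsiChar;
begin
  // start from the identity mapping
  for c := Low(AnsiChar) to High(AnsiChar) do
    Fold[c] := c;
  // plain ASCII letters: 'a'..'z' become 'A'..'Z'
  for c := 'a' to 'z' do
    Fold[c] := AnsiChar(Ord(c) - 32);
  // accented letters fold to the unaccented upper-case letter
  Fold[AnsiChar(233)] := 'E'; // e acute, lower case (code page 1252)
  Fold[AnsiChar(201)] := 'E'; // E acute, upper case (code page 1252)
end;

function FoldUpper(const s: AnsiString): AnsiString;
var
  i: Integer;
begin
  SetLength(Result, Length(s));
  for i := 1 to Length(s) do
    Result[i] := Fold[s[i]];
end;

var
  lower, upper: AnsiString;
begin
  InitFold;
  lower := 'xtudes';
  lower[1] := AnsiChar(233); // first byte is now e acute
  upper := 'Xtudes';
  upper[1] := AnsiChar(201); // first byte is now E acute
  WriteLn(FoldUpper(lower)); // prints ETUDES
  WriteLn(FoldUpper(upper)); // prints ETUDES
end.
```

Because the table stores a single target byte per input byte, the upper-case accented letter and the lower-case one land on the same plain 'E', which matches the ETUDIANT result reported in the first post.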

#3 2025-04-04 08:14:47

sakura
Member
From: Germany
Registered: 2018-02-21
Posts: 240

However, I would recommend renaming the function to something akin to NormalizeToUpperU/NormalizeToLowerU or similar, as UpperCaseU/LowerCaseU are similar to the original Delphi methods, which act differently.

#4 2025-04-04 08:42:18

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 15,247

@sakura
Those functions have existed in mORMot for more than 10 years, so renaming them may not be an option.
They clearly refer to the NormToUpper[] array, which has had a defined behavior since 2008.

I have refined the documentation to avoid any confusion:
https://github.com/synopse/mORMot2/commit/ac6b729f7
