#1 2020-07-28 22:21:41

Leslie7
Member
Registered: 2015-06-25
Posts: 248

Localized sorting

Hi,

I am looking for ways to localize sorting on all platforms, Android and iOS included.

Does anyone have experience with http://site.icu-project.org ?

Offline

#2 2020-07-29 06:29:28

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,571
Website

Re: Localized sorting

I use it, not directly from Pascal but from SyNode: in modern JS, the Intl object is an interface to ICU. For sorting examples see https://developer.mozilla.org/en-US/doc … l/Collator
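
For reference, a minimal sketch of what locale-aware sorting through Intl.Collator can look like (written in TypeScript; plain JS is the same minus the type annotations). The helper name `sortLocalized` is made up for illustration and is not part of SyNode or ICU, and the actual ordering depends on the ICU data bundled with the runtime.

```typescript
// Hypothetical helper (an assumption for illustration, not a SyNode/ICU API):
// returns a sorted copy of `items` using the collation rules of `locale`,
// where `locale` is a BCP 47 language tag such as "de" or "sv".
function sortLocalized(items: string[], locale: string): string[] {
  const collator = new Intl.Collator(locale);
  // Copy first so the caller's array is left untouched.
  // `collator.compare` is a bound function, so it can be passed directly.
  return items.slice().sort(collator.compare);
}

const words: string[] = ["z", "ä", "a"];
console.log(sortLocalized(words, "de")); // German collation rules
console.log(sortLocalized(words, "sv")); // Swedish collation rules
```

With full ICU data available, German should place "ä" right after "a", while Swedish should place it after "z", which is exactly the kind of per-locale difference a plain byte or code point comparison cannot express.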

Offline

#3 2020-07-29 18:23:55

Leslie7
Member
Registered: 2015-06-25
Posts: 248

Re: Localized sorting

mpv,

thanks, but I need something closer to home.

Unfortunately the only Pascal wrapper is over a decade old: http://www.crossgl.com/icu4pas. This is such essential functionality for non-English programs that it would make sense to me to revitalize this project as part of FPC/Lazarus.

For now, since there does not seem to be a viable option for me, I am going to carve my own path and create a general-purpose string comparison solution which can be easily extended for different locales. I need it mostly for the client side and local servers, so it will not be optimized for speed at first.

Offline

#4 2020-07-29 19:33:20

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,659
Website

Re: Localized sorting

Advice: do not reinvent the wheel about Unicode.
When you see the ICU table size and complexity, it is a whole project.
Rather use the RTL functions.

Offline

#5 2020-07-30 08:59:49

edwinsn
Member
Registered: 2010-07-02
Posts: 1,218

Re: Localized sorting


Delphi XE4 Pro on Windows 7 64bit.
Lazarus trunk built with fpcupdelux on Windows with cross-compile for Linux 64bit.

Offline

#6 2020-07-31 11:05:21

Leslie7
Member
Registered: 2015-06-25
Posts: 248

Re: Localized sorting

ab wrote:

Advice: do not reinvent the wheel about Unicode.
When you see the ICU table size and complexity, it is a whole project.
Rather use the RTL functions.

You are right, of course. Being short on time, I was looking for the fastest solution, and adapting something I had to create anyway for another, logically similar purpose seemed like a good idea.
I gave up rushing and took a deep dive into the RTL and the world of Unicode, Unicode collations, ICU and the database it is based on. I also looked into how mORMot handles sorting.

It was useful and I understand the subject better now, but honestly this was a total blackout on my part:

The original problem was that I could not find locale-aware comparison routines for UTF-8. So I started this topic hoping I had missed something.
But there is an extremely simple solution for this, though not optimal for speed: converting to code-page-aware AnsiStrings and comparing those with AnsiCompareText gives the proper result.

I thought that this would be the proper way to convert from RawUTF8, but it did not work:

var aRawUTF8: RawUTF8;
    aAnsiString: AnsiString;
begin
  aRawUTF8 := 'ű';
  aAnsiString := TSynAnsiConvert.Engine(1250).UTF8ToAnsi(aRawUTF8);
  // aAnsiString = '?' at this point
end;

Last edited by Leslie7 (2020-07-31 11:11:54)

Offline

#7 2020-07-31 11:27:30

Leslie7
Member
Registered: 2015-06-25
Posts: 248

Re: Localized sorting

If I understand your code right, using NormToUpperAnsi7 is a speed optimization which is not code page aware. If SortIStr/SortStr had a version using a slower, code-page-aware comparison instead, it would give mORMot sorting abilities in any locale. The comparison could be based on the RTL, as you suggested. I am not sure, though, how the Delphi RTL differs in this regard.

Last edited by Leslie7 (2020-07-31 11:35:43)

Offline

#8 2020-07-31 11:36:39

Leslie7
Member
Registered: 2015-06-25
Posts: 248

Re: Localized sorting

edwinsn wrote:

Thanks, this may be useful some day.

Offline

#9 2020-07-31 11:58:48

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,659
Website

Re: Localized sorting

I don't like the Yunqa products very much. They are making money by selling Open Source libraries, and keeping the source closed.
But at least they support it.

For Unicode tables, take a look at https://github.com/BeRo1985/pucu
This is really Open Source, and also cross-compiler.

Offline

#10 2020-07-31 12:42:14

Leslie7
Member
Registered: 2015-06-25
Posts: 248

Re: Localized sorting

Thanks. This seems kind of low level. I could not find anything readily usable in it yet; I will check it out later. Luckily I already have an almost workable solution which is good enough for now. The only problem left to solve is the one I mentioned before with TSynAnsiConvert.Engine(1250).UTF8ToAnsi. Is there any other way to convert RawUTF8 to AnsiString?

Last edited by Leslie7 (2020-07-31 21:26:54)

Offline

#11 2020-07-31 22:41:59

Leslie7
Member
Registered: 2015-06-25
Posts: 248

Re: Localized sorting

To answer my own question, it was a rookie mistake.

Instead of
aRawUTF8:= 'ű';

it should be
aRawUTF8:= StringToUTF8('ű');

I wrongly assumed automatic conversion for constant values.

Last edited by Leslie7 (2020-07-31 22:42:21)

Offline
