Hi,
I am looking for ways to do locale-aware sorting on all platforms, Android and iOS included.
Does anyone have experience with ICU (http://site.icu-project.org)?
I use it, not directly from Pascal but from SyNode: in modern JavaScript the Intl object is an interface to ICU. For sorting examples see https://developer.mozilla.org/en-US/doc … l/Collator
mpv,
Thanks, but I need something closer to home.
Unfortunately the only Pascal wrapper is over a decade old: http://www.crossgl.com/icu4pas. This is such essential functionality for non-English programs that it would make sense to me to revive this project as part of FPC/Lazarus.
For now, since there does not seem to be a viable option for me, I am going to carve out my own path and create a general-purpose string comparison routine that can easily be extended for different locales. I need it mostly on the client side and for local servers, so at first it will not be optimized for speed.
Have you looked at this: https://www.yunqa.de/delphi/products/converters/index
Delphi XE4 Pro on Windows 7 64bit.
Lazarus trunk built with fpcupdeluxe on Windows with cross-compile for Linux 64bit.
Advice: do not reinvent the wheel when it comes to Unicode.
Just look at the size and complexity of the ICU tables: it is a whole project in itself.
Use the RTL functions instead.
You are right, of course. Being short on time, I was looking for the fastest solution, and adapting something I had to write anyway for another, logically similar purpose seemed like a good idea.
I stopped rushing and took a deep dive into the RTL and the world of Unicode: Unicode collation, ICU and the database it is based on. I also looked into how mORMot handles sorting.
It was useful and I understand the subject better now, but honestly this was a total blackout on my part:
The original problem was that I could not find locale-aware comparison routines for UTF-8, so I started this topic hoping I had missed something.
But there is an extremely simple solution, though not an optimal one for speed: converting to code-page-aware AnsiStrings and comparing those with AnsiCompareText gives the correct result.
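A minimal sketch of that approach, assuming Free Pascal 3.x with code-page-aware strings. CompareUTF8Text is a hypothetical helper name, not a mORMot or RTL function. Passing the UTF-8 strings to the RTL's AnsiCompareText makes FPC convert them to the system code page, after which the comparison is case-insensitive and uses the operating system's collation. Characters missing from the system code page do not survive that conversion, and where accented letters end up depends on the OS locale, so treat this as an illustration rather than a finished collation routine.

```pascal
program CompareUTF8Sketch;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif} // Unix: widestring manager backed by the C library's locale support
  SysUtils;

// Hypothetical helper, not part of mORMot or the RTL.
// AnsiCompareText takes code-page-aware AnsiStrings in the system code page,
// so FPC converts both UTF-8 arguments to that code page before the
// case-insensitive, locale-aware comparison. Characters the system code
// page cannot represent do not survive the conversion.
function CompareUTF8Text(const A, B: UTF8String): Integer;
begin
  Result := AnsiCompareText(A, B);
end;

var
  U: UnicodeString;
  Accented: UTF8String;
begin
  // Build 'ű' (U+0171) from its code point so the result does not depend
  // on how the compiler reads non-ASCII literals in this source file
  U := WideChar($0171);
  Accented := UTF8Encode(U);

  WriteLn(CompareUTF8Text('abc', 'ABC'));          // case is ignored
  WriteLn(CompareUTF8Text('apple', 'banana') < 0);
  // Where 'ű' sorts relative to 'u' and 'v' depends on the OS collation
  // locale (and on the system code page containing the character at all)
  WriteLn(CompareUTF8Text('u', Accented), ' ', CompareUTF8Text(Accented, 'v'));
end.
```

The same helper can serve as the comparison function of any comparison-based sort; only the order of non-ASCII letters changes with the locale.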
I thought this would be the proper way to convert from RawUTF8, but it did not work:
  var
    aRawUTF8: RawUTF8;        // RawUTF8 and TSynAnsiConvert come from mORMot's SynCommons unit
    aAnsiString: AnsiString;
  begin
    aRawUTF8 := 'ű';
    aAnsiString := TSynAnsiConvert.Engine(1250).UTF8ToAnsi(aRawUTF8);
    // aAnsiString = '?' at this point
  end;
Last edited by Leslie7 (2020-07-31 11:11:54)
If I understand your code correctly, using NormToUpperAnsi7 is a speed optimization which is not code-page aware. If SortIStr/SortStr had a version using a slower, code-page-aware comparison instead, it would give mORMot sorting abilities in any locale. The comparison could be based on the RTL, as you suggested. I am not sure, though, how the Delphi RTL differs in this regard.
Last edited by Leslie7 (2020-07-31 11:35:43)
Thanks for the yunqa.de link, this may be useful some day.
I don't like the Yunqa products very much. They make money by selling Open Source libraries while keeping the source closed.
But at least they support it.
For Unicode tables, take a look at https://github.com/BeRo1985/pucu
This is really Open Source, and also cross-compiler.
Thanks. This seems rather low level, and I could not find anything readily usable in it yet; I will check it out later. Luckily I already have an almost working solution which is good enough for now. The only problem left to solve is the one I mentioned before with TSynAnsiConvert.Engine(1250).UTF8ToAnsi. Is there any other way to convert RawUTF8 to AnsiString?
Last edited by Leslie7 (2020-07-31 21:26:54)
To answer my own question, a rookie mistake. Instead of
  aRawUTF8 := 'ű';
it should be
  aRawUTF8 := StringToUTF8('ű');
I wrongly assumed automatic conversion for constant values.
Last edited by Leslie7 (2020-07-31 22:42:21)