Dear ab, when running mORMot sample 30, I wanted to try the unicode61 tokenizer. After changing the TSQLArticleSearch definition to TSQLArticleSearch = class(TSQLRecordFTS4Unicode61), the SQLite3 database generated by mORMot still declared the 'ArticleSearch' table with tokenize=simple.
After I checked and modified mormot.pas line 31023 to Self.InheritsFrom(TSQLRecordFTS3Unicode61), everything works fine. So I created a pull request on GitHub. Please review it, and thank you again for your enthusiasm for this wonderful framework :-)
After doing some work with the tokenizers, I also found that none of the simple, porter and unicode61 tokenizers handles Chinese correctly. Is there any way to add the ICU tokenizer or a custom tokenizer?
Below is some information I've found so far:
http://stackoverflow.com/questions/1838 … -when-i-in
https://github.com/wangwang4git/SQLite3-ICU
https://github.com/haifengkao/SqliteSubstringSearch
https://sqlite.org/fts3.html#tokenizer
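To illustrate the problem, here is a rough sketch in Python (not mORMot code, and not the real SQLite tokenizers). It assumes that a separator-based tokenizer only splits on whitespace and punctuation, so an unbroken run of Chinese characters ends up as one token. The bigram function is only one possible custom approach, and both function names are made up for this example.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block is treated as Chinese here.
CJK = "\u4e00-\u9fff"
RUNS = re.compile(rf"[{CJK}]+|[^\W{CJK}]+")


def separator_tokens(text):
    """Approximate a separator-based tokenizer: each run of word characters is one token."""
    return re.findall(r"\w+", text.lower())


def bigram_tokens(text):
    """Hypothetical custom tokenizer: split CJK runs into overlapping two-character tokens."""
    tokens = []
    for run in RUNS.findall(text.lower()):
        if re.match(rf"[{CJK}]", run) and len(run) > 1:
            # Emit every adjacent pair of characters so that substrings can be matched.
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            tokens.append(run)
    return tokens


print(separator_tokens("全文搜索 test"))
print(bigram_tokens("全文搜索 test"))
```

With the first function a search for 搜索 cannot match, because the only indexed token is the whole run 全文搜索. With bigrams, 搜索 becomes a token of its own.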
I've merged your pull request.
And now the tokenize=... SQL is generated from the TSQLRecordFTS3/4 class name.
e.g. TSQLRecordFTS4Porter -> tokenize=porter
See https://synopse.info/fossil/info/afd4717549
So if you define your own TSQLRecordFTS4ICU class, it will generate a virtual FTS4 table with the tokenize=icu parameter.
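The naming convention can be sketched as follows. This is a Python illustration only, not the actual mORMot implementation: the helper names, the column names and the exact shape of the generated SQL are assumptions, as is the fallback to simple when the class name has no suffix.

```python
import re


def fts_module_and_tokenizer(class_name):
    """Hypothetical helper: derive the FTS module and tokenizer from a class name."""
    match = re.search(r"FTS([34])(\w*)$", class_name)
    if match is None:
        raise ValueError(f"not an FTS3/FTS4 class name: {class_name}")
    version, suffix = match.groups()
    # Assumption: a class name without a suffix maps to the default "simple" tokenizer.
    return f"fts{version}", (suffix.lower() or "simple")


def create_virtual_table_sql(table, class_name, columns):
    """Build a CREATE VIRTUAL TABLE statement in standard FTS3/FTS4 syntax."""
    module, tokenizer = fts_module_and_tokenizer(class_name)
    fields = ", ".join(columns)
    return f"CREATE VIRTUAL TABLE {table} USING {module}({fields}, tokenize={tokenizer})"


# "title" and "content" are placeholder column names for this example.
print(create_virtual_table_sql("ArticleSearch", "TSQLRecordFTS4ICU", ["title", "content"]))
```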
But you still need to add the tokenizer function to the SQLite3 engine.
I suspect you need to compile your SQLite3.obj with ICU support, as documented at https://sqlite.org/fts3.html#compiling_ … 3_and_fts4 i.e.
-DSQLITE_ENABLE_ICU
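One way to check whether a given SQLite3 build includes ICU is to run PRAGMA compile_options through it. Below is a minimal Python sketch. It inspects the SQLite library linked into Python, not the mORMot static SQLite3.obj, but the same PRAGMA can be executed through any connection. I assume the option is listed as ENABLE_ICU, without the SQLITE_ prefix.

```python
import sqlite3


def sqlite_compile_options(conn):
    """Return the compile-time options reported by the connected SQLite library."""
    return {row[0] for row in conn.execute("PRAGMA compile_options")}


def has_icu(conn):
    """True if the library reports that it was built with -DSQLITE_ENABLE_ICU."""
    # Assumption: the option is reported without its SQLITE_ prefix.
    return "ENABLE_ICU" in sqlite_compile_options(conn)


conn = sqlite3.connect(":memory:")
print("ICU enabled:", has_icu(conn))
```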
ab, thank you for such a quick response. I'll try to add a CJK-friendly tokenizer to the engine.