#1 2011-03-09 15:36:19

edwinsn
Member
Registered: 2010-07-02
Posts: 1,215

Stop words and international characters with FTS4

Hi AB,

Sorry for so many questions in a single day; I'm building a new app and some requirements need to be thought through carefully ;)

Two questions about using FTS4 with this excellent sqlite framework:

1 - Do you have any tips/info on supporting international characters when using FTS4?

2 - Do you have any hints/info on removing stop words? I see in this article that this can significantly reduce the database size.

Thank you again!


Delphi XE4 Pro on Windows 7 64bit.
Lazarus trunk built with fpcupdelux on Windows with cross-compile for Linux 64bit.


#2 2011-03-09 19:07:23

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,240

Re: Stop words and international characters with FTS4

You can use several tokenizers with FTS3/FTS4.
But you'll have to write your own.
See http://www.sqlite.org/fts3.html#tokenizer

But that requires writing some low-level C-like code in Delphi.
Not very easy...
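
If a custom tokenizer is too much work, one possible workaround (a swapped-in technique, not a tokenizer and not part of the framework) is to fold the text yourself before handing it to FTS. The built-in "simple" tokenizer only lower-cases ASCII characters, so accented or upper-case non-ASCII letters will not match their plain forms unless they are normalized first. A minimal sketch, written in Python for brevity rather than Delphi; the function name `fold_text` is hypothetical:

```python
import unicodedata

def fold_text(text: str) -> str:
    """Case-fold the text and strip combining accents (hypothetical helper)."""
    # casefold() handles non-ASCII case, e.g. "Ü" -> "ü", "ß" -> "ss"
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    # After NFKD, accents are separate combining characters; drop them
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(fold_text("Café Über"))
```

The same folding would have to be applied to the search terms before running a MATCH query, otherwise the indexed text and the query no longer agree. Whether dropping accents is acceptable depends on the language of your content.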

About stop words, they should be deleted BEFORE sending the text to the FTS3 engine.
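
To illustrate filtering before indexing, here is a minimal sketch, again in Python rather than Delphi. The stop-word list and the name `strip_stop_words` are assumptions for the example; a real list would depend on your language and content:

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is"})

_WORD = re.compile(r"\w+")  # Unicode-aware for str patterns in Python 3

def strip_stop_words(text, stop_words=STOP_WORDS):
    """Return text with stop words removed, ready to be inserted into the FTS table."""
    words = _WORD.findall(text)
    return " ".join(w for w in words if w.lower() not in stop_words)

print(strip_stop_words("The quick brown fox is in the garden"))
```

The filtered string is what you would store in the FTS3/FTS4 table. Note that punctuation is dropped as a side effect, and that search terms should go through the same filter, since a stop word that was never indexed cannot be matched.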


#3 2011-03-10 10:00:33

edwinsn

Re: Stop words and international characters with FTS4

Thanks for the link, ab.

Is Synopse SQLITE compiled with the SQLITE_ENABLE_ICU pre-processor symbol defined? Thanks.



#4 2011-03-10 10:13:12

ab

Re: Stop words and international characters with FTS4

No, it is not, because ICU is a very large body of code, with a lot of tables.
Since we use static linking, the .obj (and therefore .exe) files would be much bigger.

And ICU is the "Linux way" of handling Unicode... there are other native APIs under Windows which could be used instead.
But that is not available in SQLite3 for now.


#5 2011-03-11 04:30:47

edwinsn

Re: Stop words and international characters with FTS4

I'm wondering whether you think it would be a good idea to offer multiple versions of the sqlite3 object files, for example one compiled with this ICU option enabled...



#6 2011-03-11 08:31:21

ab

Re: Stop words and international characters with FTS4

As I wrote above, IMHO under Windows ICU is not a good option.
