#1 2010-06-21 16:42:24

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

FTS3

Since version 1.5 of the framework, the sqlite3fts3.obj file is always available in the distribution file: just define the INCLUDE_FTS3 conditional in SQLite3.pas to enable FTS3 in your application.
Leave it undefined if you don't need this feature, and will therefore spare some KB of code.

Offline

#2 2010-08-19 10:16:36

esmondb
Member
From: London
Registered: 2010-07-20
Posts: 299

Re: FTS3

I've been trying out the FTS3 by modifying the Sample 02 - Embedded SQLite ORM project

Adding the following before Application.run; works fine:

TSQLRestServerDB(Form1.Database).DB.GetTableNames(TableNamesAtCreation);
  if FindRawUTF8(TableNamesAtCreation, 'fts3index') = -1 then
    TSQLRestServerDB(Form1.Database).EngineExecuteAll(FormatUTF8(
    'CREATE VIRTUAL TABLE fts3index USING fts3(%, %);',['title', 'notes']));

The problem is if I change the add and find buttons onclick event as below I get an assertion failure at line1935 of SQLite3.pas when closing the app. Can you suggest what I'm doing wrong?

procedure TForm1.AddButtonClick(Sender: TObject);
var
  SQL: rawUTF8;
begin
  SQL := FormatUTF8('INSERT INTO fts3index(title, notes) VALUES(''%'', ''%'');',
      [StringToUTF8(NameEdit.Text), StringToUTF8(QuestionMemo.Text)]);
  TSQLRestServerDB(Database).EngineExecuteAll(SQL);
end;

procedure TForm1.FindButtonClick(Sender: TObject);
begin
  questionmemo.Text := TSQLRestServerDB(Database).DB.ExecuteJSON(
  'SELECT * FROM fts3index WHERE fts3index MATCH '''+NAMEEDIT.Text+''';');
end;

I would be great if a Free text searching option could be incorporated into your ORM framework.

Also seem to have found a problem with sample03 (pipes) - The server part can't find unit2.pas

Thanks

Offline

#3 2010-08-19 11:00:29

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: FTS3

You try to make a good usage of RawUTF8/FormatUTF8/StringToUTF8. Congratulations!
(only FindButtonClick is not very clean).
You can use String/UnicodeString directly in for opened array of FormatUTF8, like this:

  SQL := FormatUTF8('INSERT INTO fts3index(title, notes) VALUES(''%'', ''%'');',[NameEdit.Text,QuestionMemo.Text]);

About your problem.
I took a look at the source code:

destructor TSQLDataBase.Destroy;
var S: TSQLHandle;
begin
  if DB<>0 then
  try
    Rollback; // any unfinished transaction is rollbacked
  finally
  { Applications should finalize all prepared statements and close all BLOB handles
    associated with the sqlite3 object prior to attempting to close the object }
    repeat
      S := sqlite3_next_stmt(DB,0); // 0: get first prepared statement for DB
      if S=0 then
        break;
      // if code was correctly protected with try/finally, as in
      // TSQLDataBase.Execute() and TSQLRequest.Execute(), we should never go here
      assert(false,FileName); // debug purpose, never trigered on production
    until not (sqlite3_finalize(S) in [SQLITE_OK,SQLITE_ABORT]); 
    // close the database object
    sqlite3_close(DB);
    fDB := 0;
  end;
  DeleteCriticalSection(fLock);
  fCache.Free;
  inherited;
end;

As it states, there is a statement which has not been closed.
Since it shouldn't appear if you use the framework right, I've an aggressive Assert() here.

My ExecuteAll methods just looks fine:

procedure TSQLRequest.ExecuteAll;
begin
  if RequestDB=0 then
    raise ESQLException.Create(0,SQLITE_CANTOPEN);
  try
    repeat
      repeat
      until Step<>SQLITE_ROW; // all steps of this statement
    until PrepareNext=SQLITE_DONE; // all statements
  finally
    Close; // always release statement
  end;
end;

And the ExecuteJSON() calls this code:

procedure TSQLRequest.Execute(aDB: TSQLHandle; const aSQL: RawUTF8; JSON: TStream;
  Expand: boolean=false);
...
  try
    // prepare the SQL request
    if aSQL<>'' then // if not already prepared, reset and bound by caller
      Prepare(aDB,aSQL); // will raise an ESQLException on error
...
  finally
    if aSQL<>'' then
      Close; // always release statement (if not prepared and closed by caller)
    W.Free;
  end;
end;

So I guess both methods are implemented correctly.

If you can debug your code in step-by-step,

There could be some not closed statement when using FTS3. Error should be in the SQLite3 engine itself.

I think you can just comment the following line in SQLite3.pas:

  //    assert(false,FileName); // debug purpose, never trigered on production

It's safe in all cases.

Offline

#4 2010-08-19 11:07:09

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: FTS3

esmondb wrote:

I would be great if a Free text searching option could be incorporated into your ORM framework.

Indeed, but since I didn't use FTS3 myself, I didn't put dedicated methods yet.
I'll try to add some methods for using FTS3, to avoid writing SQL statements.

esmondb wrote:

Also seem to have found a problem with sample03 (pipes) - The server part can't find unit2.pas

These files were indeed missing the in the "03 - NamedPipe Client-Server" sample.
I've uploaded to http://synopse.info/fossil
And they will be integrated to the next release (1.9).

Offline

#5 2010-08-19 14:08:51

esmondb
Member
From: London
Registered: 2010-07-20
Posts: 299

Re: FTS3

I haven't had much luck with debugging. I've got round it by changing TSQLRequest.Close to what's below. Could this have any unwanted effects?

procedure TSQLRequest.Close;
begin
  if Request=0 then exit;
  repeat
    sqlite3_finalize(fRequest);
    fRequest := sqlite3_next_stmt(RequestDB,0);
    if fRequest = 0 then break;
  until not (sqlite3_finalize(fRequest) in [SQLITE_OK,SQLITE_ABORT]);
end;

Offline

#6 2010-08-19 16:54:38

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: FTS3

Oh no! Don't do that!

It will close ALL statements. And that's not a good idea!

If you've nested statements - which can occur if you've prepared a statement, like we did for Retrieve() - you'll close ALL statements...

So your code will break the framework.
wink

I'm working of having FTS3 handled natively in the framework.
Here is the interface of the new type:

type
  {{ a base record, corresponding to a FTS3 table
    - FTS3 table is an SQLite virtual table that allows users to perform
      full-text searches on a set of documents. The most common (and effective)
      way to describe full-text searches is "what Google, Yahoo and Altavista do
      with documents placed on the World Wide Web". Users input a term, or
      series of terms, perhaps connected by a binary operator or grouped together
      into a phrase, and the full-text query system finds the set of documents
      that best matches those terms considering the operators and groupings the
      user has specified. See @http://sqlite.org/fts3.html
    - any record which inherits from this class must have only sftUTF8Text
      (RawUTF8) fields
    - this record has its fID: integer property which may be published
      as DocID, to be consistent with SQLite3 praxis, and reflect that it
      points to an ID of another associated TSQLRecord
    - a good approach is to store your data in a regular TSQLRecord table, then
      store your text content in a separated FTS3 table, associated to this
      TSQLRecord table via its ID/DocID
    - the ID/DocID property can be set when the record is added, to retrieve any
      associated TSQLRecord (note that for a TSQLRecord record,
      the ID property can't be set at adding, but is calculated by the engine)
    - static tables don't handle TSQLRecordFTS3 classes
    - by default, the FTS3 engine ignore all characters >= #80, but handle
      low-level case insentivity (i.e. 'A'..'Z') so you must keep your
      request with the same range for upper case
    - by default, the "simple" tokenizer is used, but you can inherits from
      TSQLRecordFTS3Potter class if you want a better English matching, using
      the Porter Stemming algorithm - see @http://sqlite.org/fts3.html#tokenizer
    - in order to make FTS3 queries, use regular queries, with the MATCH operator:
       ! var IDs: TIntegerDynArray;
       ! if OneFieldValues(TSQLMyFTS3Table,'ID','text MATCH "linu*"',IDs) then
       !  // you've all matching IDs in IDs[]  }
  TSQLRecordFTS3 = class(TSQLRecord)
  public
     {{ optimize the FTS3 virtual table
     - this causes FTS3 to merge all existing index b-trees into a single large
       b-tree containing the entire index. This can be an expensive operation,
       but may speed up future queries. See @http://sqlite.org/fts3.html#section_1_2
     - this method must be called server-side
     - returns TRUE on success }
     class function OptimizeFTS3Index(Server: TSQLRestServer): boolean;
  end;

  /// this base class will create a FTS3 table using the Porter Stemming algorithm
  // - see @http://sqlite.org/fts3.html#tokenizer
  TSQLRecordFTS3Potter = class(TSQLRecordFTS3);

I think this could help you using FTS3.

Offline

#7 2010-08-19 18:24:50

esmondb
Member
From: London
Registered: 2010-07-20
Posts: 299

Re: FTS3

Didn't think my solution was that good but it seemed to work in my particular situation. Thanks for the feedback.

Re the new fts type do you think it could be worth extending TSQLRecord instead of descending from it? What I'm thinking is that there could be a new field type of 'fts text'. When this field type is used in a TSQLRecord child definition behind the scenes two tables would be created and both kept in step. Or if no other field types are used then a single fts3 table would be created. I guess this would be harder to implement in the framework but could make using it simpler.


TSQLRecord do you think

Offline

#8 2010-08-19 19:16:53

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: FTS3

Your idea is great. It could be perfect from the user point of view, in most scenarios.

But in all cases, it's a need to have a TSQLRecordFTS3 table type, in order to implement a native access to the FTS3 virtual table in a RESTful manner.
For example, if there is no other field types than 'fts text' (as you stated above), then it's easy to use a TSQLRecordFTS3 instead of a TSQLRecord.

What I can do is that I can add a new field type, just like TSQLRawBlob.

type
  TSQLRawFTS3 = type TSQLRawUTF8;

This field kind won't be retrieved by default (as BLOB is not).
It makes sense because this field content is intended to contains a lot of text, which could use a lot of bandwidth in an ORM approach (we populate all field values from server).

Then you'll use new RetrieveFTS3() and UpdateFTS3() methods to access to this field from its ID.
From the client side, they will silently call RetrieveBlob() and UpdateBlob() methods, via a RESTful standard extended URL, like the one used for BLOB fields: http://server/ModelName/TableName/FTS3FieldName
From the server side, the low-level Blob functions won't be used, but the corresponding FTS3 table will be used instead.

For the queries, we will able to access to the fts3 fields only, or to not-fts3 fields only, on the same request. That is, it is impossible, without writing a full SQL parser in Delphi, to mix fts3 fields and no-fts3 fields in the "where" clause. But it's not an issue, if it's properly documented.

One issue is that I can't use such an hidden TSQLRecordFTS3 table to handle the storage, because the retrieval rely on fields defined via RTTI, which is not the case here.
So there'll be an hidden dynamic array containing the fts3 table names, according to the field type definition.

What do you think about this implementation?

Offline

#9 2010-08-19 20:18:36

esmondb
Member
From: London
Registered: 2010-07-20
Posts: 299

Re: FTS3

From what you say my idea was probably a bit of a 'black box' object without enough switches on the side but treating freetext fields like BLOBs looks like it could have it's merits.

Descending TSQLRecordFTS3 from TSQLRecord doesn't quite seem right. Shouldn't a parent object be distilled from them to make them siblings, otherwise a lot of functions would be overridden (not sure if this is an issue)?

With the queries from my experience there normally needs to be some intervention in the 'n-tier' to make the search interface simple enough for the end user, so as you say it's not an issue.

Would it make sense is to have an 'umbrella' object for tables that use both freetext and non-freetext fields?

Offline

#10 2010-08-19 21:52:34

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: FTS3

esmondb wrote:

Descending TSQLRecordFTS3 from TSQLRecord doesn't quite seem right. Shouldn't a parent object be distilled from them to make them siblings, otherwise a lot of functions would be overridden (not sure if this is an issue)?

We need all inherited fields from TSQLRecord, and the fact that it inherits from it make it easy register any TSQLRecordFTS3 as other tables.
And no function is overriden, if you look at the type definition: only a new one - about optimizing FTS3 table - is added.
So it looks like a good solution, at least from the code point of view, to me.
IMHO it's the less demanding option in terms of number of lines.

esmondb wrote:

Would it make sense is to have an 'umbrella' object for tables that use both freetext and non-freetext fields?

What do you call an "umbrella" object? An interface? Looking back at the Architecture of the framework, I don't think so.
The TSQLRawFTS3 new type definition seems coherent from the user point of view. Just some UTF-8 field, with extended search capabilities when used with the "MATCH" operator in the where clause.

Offline

#11 2010-08-20 15:18:49

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: FTS3

I've finished the TSQLRecordFTS3 class implementation.
Dedicated tests have been included.

Including FTS3 virtual tables was not so simple.
One big issue is that I mapped the rowid to the ID property for normal tables.
And it's not possible to have this ID property for a FTS3 virtual table.
So I had to use RowID as property name for all SQL statements... therefore some deep changes where needed in the SQL generation code of the framework.
And some other parts of the code, which are checking the field names before sending them to the database engine, needed to be rewritten too.

Including the TSQLRawFTS3 feature, as I posted above, could be a lot more work than I expected first.

Offline

#12 2010-08-20 21:48:29

esmondb
Member
From: London
Registered: 2010-07-20
Posts: 299

Re: FTS3

Thanks for your work. I was being vague with 'umbrella' object but I think your TSQLRawFTS3 design covers it.
It could be useful to make the similar treatment of freetext and blob fileds optional. There can be situations where a freetext index is useful with small or maybe fixed length fields. In these cases it would be convenient to get a JSON encoded table of results from queries rather than individually retrieving fields.

Offline

#13 2010-08-22 06:25:59

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,182
Website

Re: FTS3

There is no fixed length fields in SQLite3.
You can already get a JSON Encoded table of BLOB results: BLOB fields are not retrieved by default by the Retrieve() methods, but BLOB fields can be retrieved with OneFieldValues()/MultiFieldValues() methods.

Offline

#14 2016-05-23 17:32:28

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,534
Website

Re: FTS3

Because of security concerns, SQLite >= 3.11.0 only enabled to register a build-in FTS tokenizer by default. But we use a non-standard Cyrillic tokenizer, so I add a TSQLDataBase.EnableCustomTokenizer function to allow statements like

SELECT fts3_tokenizer('mytokenizer', @tokenizerFunction)

See [1d09ea3191]

Offline

Board footer

Powered by FluxBB