Routing requests to multiple database

okoba · 2022-04-07 16:34:45

Hello,

I have multiple databases for each customer (need to be independent databases for legal reasons), with the same structure or model. And I like to use interface methods to access them. The API should have independent methods called by user, so no direct access to database tables.
What is the best way to rout incoming requests to multiple databases?

To be clear, what is the best way to rout like:
User1 calls /api/logger/add?data=XYZ&key=user1_key
Server checks user1_key, finds that the data need to be added to user1.db and like to insert to that file.

Perhaps, in the future, the Server need to call another server (User1_Server) to do the actual insert. I think this part can be done by the master/slave architecture described in the documentation, but I appreciate any needed info.

JD · 2022-04-07 18:49:45

I have a similar Use Case and I manage it using PostgresSQL and its schemas. Each schema houses a different database belonging to different companies.

The REST call contains the name of the schema and that way a different select/insert/delete can be accessed depending on the schema in the REST call.

For example, I use this REST call to find out if my mORMot server is up and running:

http://localhost:8088/service/myapi/sayhello?schema=company_xyz

This will call the SayHello resource in the Company_XYZ schema.

I hope this helps a little.

Cheers,

JD

ab · 2022-04-07 19:12:09

I would not put the data scope as parameter. It is unsafe, and verbose.

What I have done is to have the database be derivated from the User, e.g. from a simple field in the User table, or - even better - by using a bits map to the User ID (the highest bits of the UserID are in fact a DataID or AccountID, which is used to pickup the right database).

Or I could store the DataID into a JWT, if I use a JWT for authentication.

About multiple databases, I usually use SQLite3, because they are easy to manage (even move the data), efficient, and small.

okoba · 2022-04-07 19:50:42

@JD thank you, although I need to have independent file for databases.
@ab thank for the notes. I want to do it with SQLite if possible. Can you elaborate on how can I route the class to different DBs while using interface based?
I was thinking about a virtual table but maybe it is over engineering? I guessed maybe there is something better in mORMot.
PS, I’m using mORMot 2 for this new project.

okoba · 2022-04-17 05:33:00

@ab can you help on this?

ab · 2022-04-17 10:43:35

Just create a TRestServerDB per client (or group of clients). For mORMot 2, you could use TRestOrmServerDB instances.
For efficiency, cache them in a list, with a timeout, so that you can reuse one instance from a previous call, without opening it.
Then let your interface based service use this cache list and retrieve the proper TRestServerDB/TRestOrmServerDB instance when needed.

I will add such a cache in mORMot 2, if you prefer.

okoba · 2022-04-17 10:49:31

Thank you very much. I need to test your suggestion and will let you know if the cache is needed.

ab · 2022-04-17 16:24:42

Opening a SQLite3 database is fast, a lot of applications (e.g. Fossil) do this, with no real penalty.

But to prepare the future, and leverage your server, a simple cache of opened SQlite3 instances is a good idea - even if not mandatory at first.
You can just make a method which returns the instance, creating it every time, and enabling cache later on if needed for your project... A naive cache of "last used instance" may be enough for most use cases, in practice.

okoba · 2024-02-05 18:43:04

After a couple of years, I came back to this case again. Last time the project got canceled.
I am curious, @ab, did you make the cache mechanism or similar?

ab · 2024-02-05 18:56:00

No, it is not part of the framework.

Implementing a "last used instance" cache is just two lines of code.

okoba · 2024-02-05 19:04:17

After a while I am wondering around in the ORM part of mORMot (always funny that you call it little) and I looking for my way to implement that routing to multiple database files. So I guessed asking, maybe you did something in past years.

okoba · 2024-02-06 14:22:17

I am puzzled by the name of the classes.
What I understand is:
TRestServerDB is a REST server that has a DB.
TRestOrmServerDB should be a REST server that has DB and ORM support, but it is the ORM of TRestServerDB, right?

And when should I use TRestServerDB or TRestOrmServerDB? I cannot understand their usage, even after reading the readme files.

And what is the place that I should choose the TRestXDB if I am going to have multiple one per user? In OnRequest of HTTPServer?
I know mORMot2 should be easier compared to 1, but I still get lost.

ab · 2024-02-06 16:04:06

I am sorry you are still confused.
And even in the wrong forum place - it is about mORMot 1 in here.

To get the proper definition, don't guess, just look at the description in the code/comments.
TRestServerDB = TRestServer class over SQlite3
TRestOrmServerDB = TRestOrmServer class over SQlite3

With mORMot 2 we made a clear distinction between TRestServer and TRestOrmServer.
TRestServer = REST server for both ORM and SOA
TRestOrmServer = implements ORM part of TRestServer

In practice, you just use
- TRestServerDB to access a SQlite3 (or any other external) database;
- TRestServerFullMemory if you need limited (or no) ORM at your REST server level.

Usually, you create a TRestHttpServer with a TRestServerFullMemory which supplies some SOA/services as interfaces or methods.
Then you have one or several separated TRestServerDB instance(s) to access SQlite3 or external databases from within the SOA services.

The Thomas (tbo) samples give a good introduction to this.

In the next weeks, I would like to make things easier to work with, by defining some high-level and ready to use classes to implement a REST SOA server.

okoba · 2024-02-06 20:27:35

Thank you for the helpful answer. Yes it is in the wrong forum, but if I remember correctly, when I initially opened this topic, Version 2 was in Beta and there was not a dedicated forum for it. If it is possible, please move it.
Those samples were life-saving; without them, I couldn't find out what I should do. Thanks to Thomas and you for them. Martin samples are great too.
I will look forward to the updates, but please create samples for them so a newbie like me knows how to use them. Your codes are great and advanced; please dont assume finding a way in them is walking in a park. They are like being in a big library—too many great things to see and read and get overwhelmed

Anyway, I tried my best to make it work. Can you please check it?
https://gitlab.com/-/snippets/3670020

Questions:
1- I didn't use Interface based (SOA?) because I wanted to use the server with a Browser client. Is that a right assumption?
2- I tried using TRestOrmServerDB , but it asks for a Rest, what should I give it? If TRestServerDB, then what was the reason you said I should use TRestOrmServerDB? I tried passing nil for the Rest parameter, but TRestOrm.InternalAdd raise error because it needs fRest.
3- Is there a better way to handle caching?
4- If I don't set LockingMode, and Synchronous the speed is very low. LockingMode is fine, but setting Synchronous to smOff would be dangerous I guess. Can I ask what would you suggest in production? I can use PostgreSQL too, but I guessed SQLite would be faster and less hassle.
5- What is the best way to handle the locking problem when multiple connection come and want to use one TRestServerDB/DB?

Last edited by okoba (2024-02-06 21:19:56)

ab · 2024-02-07 08:58:24

1. Using interfaces may ease your work, especially for testing outside of a client/server architecture.
It would help SOLID code for sure.
2. TRestOrmServerDB can be used stand-alone as IRestOrmServer and TRestOrmServerDB.CreateStandalone.
3. Using a TDictionary seems simple, but you may have some race condition on slow queries longer than 1 second.
4. lmExclusive/smOff are needed for write performance with SQLite3. You can use both values on production. SQLite3 will continue to work even in case of interruption.

5. Multi threading is hard...
There is already a lock at TRestServerDB level, when you access the SQlite3 database. So it is fine if several requests access the same DB at the same time.
I would not use a TSynDictionary of TRestServerDB instances, but of IRestOrmServer, so that it would be properly reference counted, and no DB would be freed once still running for a request.

See TRestStorageMulti in mormot.orm.storage.pas
or TRestStorageMultiDB in mormot.orm.sqlite3.pas
they seem to do what you expect - but I have not fully tested/validated those, IIRC.
- I already found two issues with those classes. But you got the idea.

okoba · 2024-02-07 09:38:07

Thank you very much for the explanation. I will work on it more and see what I can do.
About SQLite smOff, I would lose data in case of OS crash. Any better way?

ab · 2024-02-07 10:01:47

You will always loose data in case of OS crash. What is not written is not written.

The only problems are if the data is lost atomically or not, or if you can reopen the database on restart.
Those problems do not appear even with smOff. It does not cache the data for a delayed write. It just do not wait for the OS to signal that the data has been written on disk.
The worse may be that you may loose a bit more data than with full synchronization on disk - which is much slower.

Reference:
https://sqlite.org/pragma.html#pragma_synchronous
What is called "database corruption" in https://sqlite.org/howtocorrupt.html#cfgerr is not that the database can't be opened.
It is that some data may be somewhat incorrect in the middle on transactions.
In your use case, you don't use transactions...

Hint:
If you have a lot of information written in the same DB, then I would write into a temporary TRestBatch, then flush it on disk after some items or delay.
Then it would make a transaction, and you would not need to use smOff but smNormal could be enough.

okoba · 2024-02-07 10:55:55

Great advice. Yes TRestBatch seems the way to go. I was and am mostly worried of returning success to the user, but in a crash I lose their information, hence thinkin about Full.

I should ask (as you may be one of the best people who used SQLite a lot), compare to PostgreSQL, what is your go to for servers with a lot of concurrent connection, or even for my case, many databases for many users?
I looked at benchmark (extdb-bench) test you did and SQLite is 4 to 5 times (900K/s to 190K/s) faster than PostgreSQL if you use Off and Batch and Transaction, but PostgreSQL is much faster (9.5K/s to 190/s) if I want to use Full without Transaction (separate connections doing one at a time).

Last edited by okoba (2024-02-07 11:02:17)

ab · 2024-02-07 12:37:58

" Full without Transaction " is less secure over PostgreSQL than SQlite3, if there is a power outage.
PostgreSQL stores the request into a memory cache before writing it to disk. So it is not a fair comparison.

The real questions are (min/max/average):
How many users?
How many users at the same time?
How many records per user per second?
How much data per record?

okoba · 2024-02-07 15:08:14

I assumed PostgreSQL does that but I am not familiar with it much.

There are max estimations:
How many users? 1K/5K (User means separate database)
How many users at the same time? 2K/10K, for each database/user there would be 1 to 3 sub-user.
How many records per user per second? 1/2 per user (2K/10K for all users at max, in a second)
How much data per record? 1KB/8KB

So if each DB/user commits in a 2 second delay, there may be 10K to 20K (at max) records that can get lost if an OS crashes happen.
I was thinking about relying each record from load balancer to two server at the same time, so if one goes down, another have a chance of commiting.

ab · 2024-02-07 16:49:28

If only 1/2 records per second, then a batch won't help much, because the batch should be done per database.

For 5K max separate databases, I would try to maintain all SQLite3 databases open, but with a very small in-memory buffer size.
That is, with our default TSqlDataBase.PageSize as 4096 (4KB), we could set TSqlDataBase.CacheSize as 16.
So it should consume a bit more than 4 * 16 * 5000 = 312 MB of RAM plus the OS file caches.
Most of the process would be disk I/O bound I guess. A good OS (Linux) with very good SSD may be needed.

Perhaps a load balancer could help.
It could be a mORMot server, with an in-memory cache for one or few seconds, which would propagate to some mORMot sub-servers, e.g. each with their own 1K databases.

But 10K records of 4KB per second seems a lot of data: 40MB per second, more than 3TB per day?
So I guess the data is updated, not inserted/appended?

okoba · 2024-02-08 09:58:36

Thank you very much. Your estimation seems solid.
Yes, a good chunk of the data is updated (hence the need to sync before letting the user know it is done), and most of the other requests would be read and not written (I should have been more clear).

You are right; there is no need to close the connections if the cache size is so low. I could keep them open indefinitely.

Having multiple servers answer write requests seems like the way to go, as if one goes down, the data will not be lost. And in this way, we can use queue and batch, as you said, to speed it up.

mORMot Open Source

#1 2022-04-07 16:34:45

Routing requests to multiple database

#2 2022-04-07 18:49:45

Re: Routing requests to multiple database

#3 2022-04-07 19:12:09

Re: Routing requests to multiple database

#4 2022-04-07 19:50:42

Re: Routing requests to multiple database

#5 2022-04-17 05:33:00

Re: Routing requests to multiple database

#6 2022-04-17 10:43:35

Re: Routing requests to multiple database

#7 2022-04-17 10:49:31

Re: Routing requests to multiple database

#8 2022-04-17 16:24:42

Re: Routing requests to multiple database

#9 2024-02-05 18:43:04

Re: Routing requests to multiple database

#10 2024-02-05 18:56:00

Re: Routing requests to multiple database

#11 2024-02-05 19:04:17

Re: Routing requests to multiple database

#12 2024-02-06 14:22:17

Re: Routing requests to multiple database

#13 2024-02-06 16:04:06

Re: Routing requests to multiple database

#14 2024-02-06 20:27:35

Re: Routing requests to multiple database

#15 2024-02-07 08:58:24

Re: Routing requests to multiple database

#16 2024-02-07 09:38:07

Re: Routing requests to multiple database

#17 2024-02-07 10:01:47

Re: Routing requests to multiple database

#18 2024-02-07 10:55:55

Re: Routing requests to multiple database

#19 2024-02-07 12:37:58

Re: Routing requests to multiple database

#20 2024-02-07 15:08:14

Re: Routing requests to multiple database

#21 2024-02-07 16:49:28

Re: Routing requests to multiple database

#22 2024-02-08 09:58:36

Re: Routing requests to multiple database

Board footer