#1 2019-12-04 09:41:44

mine1961
Member
Registered: 2015-07-24
Posts: 2

Store PDF as Blobs in SQLite - performance issues?

Hi Arnaud,

Until now we have been storing binary data as blobs, but I feel this might not be a good idea.
What are your experiences with large blobs in SQLite?

With regards Michael

Offline

#2 2019-12-05 07:15:54

edwinsn
Member
Registered: 2010-07-02
Posts: 1,215

Re: Store PDF as Blobs in SQLite - performance issues?

If you do so, be sure to use TSQLRawBlob as the field type.
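
A minimal sketch of what that looks like (class and field names here are just examples, following the usual mORMot ORM pattern):

type
  TSQLDocument = class(TSQLRecord)
  private
    fName: RawUTF8;
    fContent: TSQLRawBlob; // the raw PDF bytes
  published
    property Name: RawUTF8 read fName write fName;
    // declared as TSQLRawBlob so the ORM maps it to a SQLite BLOB column
    property Content: TSQLRawBlob read fContent write fContent;
  end;

Keep in mind that, unless I'm mistaken, the ORM does not transfer BLOB fields by default - see ForceBlobTransfert / RetrieveBlobFields in the framework documentation.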


For your concerns about the performance of SQLite itself, read:
SQLite small blob storage: 35% Faster Than the Filesystem:

https://www.sqlite.org/fasterthanfs.html


Delphi XE4 Pro on Windows 7 64bit.
Lazarus trunk built with fpcupdeluxe on Windows with cross-compile for Linux 64bit.

Offline

#3 2019-12-09 19:23:30

oz
Member
Registered: 2015-09-02
Posts: 95

Re: Store PDF as Blobs in SQLite - performance issues?

mine1961 wrote:

What are your experiences with large blobs in SQLite?

My experience with LARGE blob data:

Don't do it.

Store metadata in the DB and large content outside the DB in files. But be careful not to store too many files in one folder (at least under Windows): file I/O becomes incredibly slow once a certain number of files per folder is reached.

Offline

#4 2019-12-09 20:25:46

Vitaly
Member
From: UAE
Registered: 2017-01-31
Posts: 168
Website

Re: Store PDF as Blobs in SQLite - performance issues?

oz wrote:

My experience with LARGE blob data:
Don't do it.

Could you share details of your experience, please?

Offline

#5 2019-12-10 07:09:32

esmondb
Member
From: London
Registered: 2010-07-20
Posts: 299

Re: Store PDF as Blobs in SQLite - performance issues?

If you use numbers to name the files, create a set of folders numbered 1-9 and file each file into a folder based on the first digit of its name. Or something like that.
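
An untested sketch of the idea (ShardedPath is just a made-up helper name):

uses
  SysUtils;

// Put each file into a subfolder named after the first digit of its
// numeric name, so no single folder collects too many entries.
function ShardedPath(const BaseDir: string; FileID: Int64): string;
var
  fileName: string;
begin
  fileName := IntToStr(FileID) + '.pdf';
  // e.g. 1234.pdf -> <BaseDir>\1\1234.pdf
  Result := IncludeTrailingPathDelimiter(BaseDir) +
    fileName[1] + PathDelim + fileName;
end;

Call ForceDirectories on ExtractFilePath(Result) once before writing the file.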

Offline

#6 2019-12-17 17:08:15

oz
Member
Registered: 2015-09-02
Posts: 95

Re: Store PDF as Blobs in SQLite - performance issues?

Vitaly wrote:

Could you share details of your experience, please?

Well, let's start like this: if you really want to store large blob data in a database, then SQLite is one of your best friends for doing so, because of its basic design. It's a big advantage to have your storage engine running in-process for that task. Put that large blob table into its own db file and use data-sharding principles if possible. Don't put that table in your main db file.
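
At the plain SQLite level the separate-file idea looks roughly like this (file and table names are only examples):

-- the main db holds only the metadata; the big blobs live in their own file
ATTACH DATABASE 'blobs.db3' AS blobstore;

CREATE TABLE IF NOT EXISTS blobstore.DocumentContent (
  DocumentID INTEGER PRIMARY KEY,  -- matches the metadata row ID in the main db
  Content BLOB
);

That way the blob file can grow and be backed up or rebuilt without touching the main db file.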

A short story with some oversimplified theory:

Once upon a time in database land, there was a performance issue in a Firebird RDBMS based 3-tier app. It must have been about 12 years ago.

Everything worked like a charm, but once one specific table had reached about 250,000 records containing 180 GB of blob data, things started getting really weird... Indexes were fine, db settings were fine, page size was fine, cache options were fine, etc.
Adding new records to that large blob table was fine, selects were not that fast but OK, updates were fine.

But: working with other heavily used tables in that db became ultra slow.

In the end it comes down to simple physics...

We had other tables in that db with a lot of text blob data of various lengths (0-16 KB, ISO 8859-1 codepage).

In the end, every SQL server has to store its data in a file or a set of files. There is no magic in how SQL servers work. A flat record file containing an array of fixed-length records, paired with a well-implemented B-tree for indexing, will always be the most performant way of accessing data. This is what all SQL servers do in the end: they store fixed-length data blocks with a bit of metadata and your data as payload. Depending on the implementation of the SQL server, various tuning options and clever optimizations come into play: you can tune page and cache sizes, there are algorithms estimating expected table growth, etc.

Data on disk is physically stored in sequence, like this:

[Table A, Record 1]
[Table A, Record 2]
[Table A, Record 3]
[Table B, Record 1]
[Table B, Record 2] 
[Table C, Record 1]
[Table A, Record 4]
[Table C, Record 2]
...

I think you get it: data is appended at the end of the file in creation order.

An RDBMS tries to arrange records of the same type/table as close together as possible:

[Table A, Record 1]
[Table A, Record 2]
[Table A, Record 3]
[
  EMPTY BLOCK for 10 new Table A records
]
[Table B, Record 1]
[Table B, Record 2] 
[
  EMPTY BLOCK for 10 new Table B records
]
[Table C, Record 1]
[
  EMPTY BLOCK for 10 new Table C records
]
[Table A, Record xxx]
...

If you have a lot of variable-length records (text blobs, big varchars, etc.) and lots of inserts into different tables, those reserved blocks become unbalanced.
This results in your data being scattered all over your db pages.

The physical HDD heads have to do a lot of seeking to collect all the data from the db pages on disk. Modern SSD drives don't suffer from that problem as badly as mechanical drives because of their architecture, but even SSDs have to do a lot of extra work in such a case. If there are many concurrent users requesting that data all the time, the HDD will go crazy. This is what happened in my case.

Things get even worse the more data is inserted into the db.

Well, in the Firebird RDBMS there is a way to reorganize the ODS (on-disk structure): do a simple DB backup and restore. When restoring the db, all the table data is written back in sequence, so all Table A records end up next to each other on disk.

That helped a lot and improved performance, but we were forced to do those backup/restore jobs quite regularly.

It was absolutely no fun. It took a lot of time to back up and restore those 200 GB of data. The data was still growing, and soon we reached a point where such backup/restore tasks took longer than the users' out-of-office hours.

The DB became unmaintainable.

The solution was to store those blobs outside of the db; that fixed all those problems.

So, to come to a conclusion: whether it is a good idea to store large blobs in a db depends a lot on your use case and data types.
My general advice is: don't do it.
Do it only if you know exactly what you are doing and why.

Offline

#7 2019-12-17 19:39:27

zed
Member
From: Belarus
Registered: 2015-02-26
Posts: 105

Re: Store PDF as Blobs in SQLite - performance issues?

First, you should answer the question: what does "large blob" mean in your case? Is it 1 KB, 100 KB, 1 MB, 100 MB?

After that, you can go to https://www.sqlite.org/intern-v-extern-blob.html and check whether your case is "green" (then you can use blobs) or "red" (then use the file system).
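
If you are not sure about your sizes, you can measure them directly in the existing db, for example (assuming a table Document with a blob column Content):

SELECT count(*), avg(length(Content)), max(length(Content)) FROM Document;

length() on a blob returns its size in bytes, so this tells you which side of the chart you are on.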

Offline

#8 2019-12-17 20:54:06

Vitaly
Member
From: UAE
Registered: 2017-01-31
Posts: 168
Website

Re: Store PDF as Blobs in SQLite - performance issues?

oz wrote:

So, to come to a conclusion: whether it is a good idea to store large blobs in a db depends a lot on your use case and data types.

Thanks for sharing your experience, thoughts, and details!
I guess I misunderstood you the first time. I thought you had some experience showing that storing files in a db is always a bad idea in every case.
I have stored blobs in a db in different ways in different situations and plan to continue doing so in the future. That's why I was interested in your strict phrase "Don't do it." wink

oz wrote:

Do it only if you know exactly what you are doing and why.

Totally agree

Offline
