#1 mORMot 1 » Progressive hash computation » 2017-01-17 21:58:55

Hi!

First, a big thank you for the quality of your components!

Now for my question (and sorry about my poor English).

I’m sending large files to a TCP server. Each file is split into chunks of arbitrary size by a special chunking algorithm.

Each chunk is hashed on the client side (with a TSHA1 record: Init / Update / Final) before being sent. On the server side, the hash is verified to check that the chunk has not been altered, and the chunk is then saved to a database.
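
For illustration, here is a minimal sketch of that per-chunk scheme in Python. It assumes `hashlib.sha1` as a stand-in for the mORMot TSHA1 record (Init / Update / Final map to constructing the object, `update()` and `digest()`); the helper names `hash_chunk` and `verify_chunk` are hypothetical and not part of mORMot.

```python
import hashlib

def hash_chunk(chunk: bytes) -> bytes:
    """Client side (hypothetical helper): SHA-1 digest of a single chunk."""
    h = hashlib.sha1()   # roughly TSHA1.Init
    h.update(chunk)      # roughly TSHA1.Update
    return h.digest()    # roughly TSHA1.Final

def verify_chunk(chunk: bytes, expected: bytes) -> bool:
    """Server side (hypothetical helper): recompute the digest and compare."""
    return hashlib.sha1(chunk).digest() == expected

chunk = b"some chunk of file data"
digest = hash_chunk(chunk)
print(verify_chunk(chunk, digest))         # True
print(verify_chunk(chunk + b"!", digest))  # False: the chunk was altered
```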

But now I also want the hash of the whole file, so that when a file is reconstructed (on request) I can check that the chunks are written to disk correctly and in the right order, and that the result matches the original content that was sent.

The naïve way would be to use a second TSHA1 record, call its Update() during chunking until the last chunk is found, and then call Final() to get the SHA-1 hash of the whole stream. But then the data is hashed twice (once for the individual hash of each chunk, and once more for the « global » hash of the file), which burns a lot of CPU. Hashing the *same bytes* twice really makes no sense.
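
A minimal Python sketch of that naïve approach, again assuming `hashlib.sha1` as a stand-in for TSHA1 and using a hypothetical helper name, makes the double work visible: every chunk is fed both to its own hasher and to the file-wide one.

```python
import hashlib

def naive_hashes(chunks):
    """Hypothetical helper: per-chunk digests plus a whole-file digest.
    Every byte is run through SHA-1 twice."""
    whole = hashlib.sha1()            # the second, file-wide hasher
    chunk_digests = []
    for chunk in chunks:
        chunk_digests.append(hashlib.sha1(chunk).digest())  # pass 1: this chunk only
        whole.update(chunk)                                 # pass 2: the same bytes again
    return chunk_digests, whole.digest()

digests, file_digest = naive_hashes([b"first chunk", b"second chunk"])
print(len(digests))  # 2
```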

So I’m looking for a way to perform progressive hash computation with your components, if possible: in other words, to get the fingerprint of the bytes processed so far without preventing more data from being added to the hash afterwards. Final() seems designed to return the hash of everything previously supplied to Update(), but its name suggests that some internal state is reinitialized afterwards, which defeats my purpose.

A hack might be to save a copy of the record before calling Final(), get the « intermediate » hash at the end of a chunk, and then assign the saved copy back to the variable. This restores the state of the hash as it was before Final() was called. Then I call Update() again, repeating the operation until the end of the stream is reached; this time the Final() value would be the hash of the whole stream. But it seems a little dirty...
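
The same "snapshot the state" idea can be sketched in Python, where hash objects expose a `copy()` method; this is only an analogue, assuming that copying the TSHA1 record behaves like `copy()` here, and `progressive_hashes` is a hypothetical name. Each byte is hashed once. Note that each intermediate digest covers every byte from the start of the file through the end of that chunk, not the chunk on its own, so it can only replace the per-chunk check if the server keeps a matching running state.

```python
import hashlib

def progressive_hashes(chunks):
    """Hypothetical helper: one running SHA-1 state, snapshotted after each chunk.
    Returns the prefix digests and the whole-stream digest."""
    running = hashlib.sha1()
    prefix_digests = []
    for chunk in chunks:
        running.update(chunk)                      # each byte is hashed exactly once
        snapshot = running.copy()                  # like saving a copy of the record
        prefix_digests.append(snapshot.digest())   # like calling Final() on the copy
    return prefix_digests, running.digest()

prefixes, whole = progressive_hashes([b"first chunk", b"second chunk"])
print(prefixes[-1] == whole)                                  # True
print(prefixes[0] == hashlib.sha1(b"first chunk").digest())  # True
```

In Python, `digest()` does not reset the object, so the explicit `copy()` is not strictly required there; it is kept to mirror the save-the-record approach for an API whose Final() does reset the state.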

Any hint would be appreciated!

Best regards,
