Log truncation breaks UTF-8

mpv · 2023-10-02 10:00:39

While reading log files with Rust, I discovered that some of them are not valid UTF-8 text. This happens when a long string is truncated using the `TextTruncateAtLength` parameter and the writer cuts the string in the middle of a multi-byte UTF-8 sequence (in my case I log an input JSON that contains non-Latin strings).
IMHO a good place to fix this is inside the logger, here: https://github.com/synopse/mORMot2/blob … .pas#L5531 (not inside the writer, for compatibility). A good solution would be to step forward a little (at most 3 bytes, because the string may actually be binary) to find the actual end of the character.
Or should I fix this at the application level?

Last edited by mpv (2023-10-02 10:01:28)
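
The forward-stepping idea from the post above could be sketched in Rust roughly as follows. This is only an illustration of the proposal, not mORMot code: `cut_forward` is a hypothetical name, and the sketch assumes the truncation point is given as a byte offset into the raw buffer.

```rust
/// Hypothetical helper: returns a cut position >= `cut` that does not fall
/// inside a multi-byte UTF-8 sequence, looking ahead at most 3 bytes.
/// The 3-byte limit keeps the scan bounded when the buffer is binary data.
fn cut_forward(buf: &[u8], cut: usize) -> usize {
    let mut pos = cut.min(buf.len());
    let mut steps = 0;
    // UTF-8 continuation bytes have the bit pattern 0b10xx_xxxx.
    while pos < buf.len() && steps < 3 && (buf[pos] & 0xC0) == 0x80 {
        pos += 1;
        steps += 1;
    }
    pos
}
```

For example, in `"ab€cd"` the `€` occupies bytes 2..5, so `cut_forward("ab€cd".as_bytes(), 3)` returns 5 and the kept prefix `"ab€"` is valid UTF-8 again. At the application level, `String::from_utf8_lossy` would also tolerate a broken tail, replacing it with U+FFFD.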

Chaa · 2023-10-02 10:26:18

We already have a function for this purpose: Utf8TruncatedLength.
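
For readers on the Rust side, the general idea behind a function with such a name (shortening the byte length until it lands on a character boundary) can be sketched as below. This is an assumption about the technique only; it does not reproduce the signature or implementation of `Utf8TruncatedLength`, and `truncated_len` is a hypothetical name.

```rust
/// Hypothetical helper: largest length <= `max_bytes` at which `text`
/// can be cut without splitting a UTF-8 sequence.
fn truncated_len(text: &str, max_bytes: usize) -> usize {
    if max_bytes >= text.len() {
        return text.len();
    }
    let mut len = max_bytes;
    // Index 0 is always a char boundary, so this loop terminates.
    while !text.is_char_boundary(len) {
        len -= 1;
    }
    len
}
```

Unlike the forward-stepping variant, this one steps backward, so the result never exceeds the requested limit: `truncated_len("ab€cd", 3)` returns 2.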

ab · 2023-10-02 13:34:54

Perhaps https://github.com/synopse/mORMot2/commit/ebb54974 could help.

mpv · 2023-10-02 15:33:12

Thanks! I back-ported it to mORMot1, see https://github.com/synopse/mORMot/pull/446
