#1 2023-10-02 10:00:39

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,544
Website

Log truncation broke a UTF8

While reading log files using Rust, I discovered that some of them are not valid UTF-8 text. This happens when we truncate long strings using the `TextTruncateAtLength` parameter and the writer cuts a string in the middle of a multi-byte UTF-8 sequence (in my case I log an input JSON that contains non-Latin strings).
IMHO a good place to fix this is inside the logger, here https://github.com/synopse/mORMot2/blob … .pas#L5531 (not inside the writer, for compatibility). A good solution would be to try stepping forward a little (at most 3 bytes, because the string may actually be binary) to find the actual character end.
Or should I fix this at my app level?
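
To make the "step forward" idea concrete, here is a minimal Rust sketch. It is only an illustration, not the mORMot code: the name `extend_to_char_end` and its signature are hypothetical. It assumes the cut position may land on a UTF-8 continuation byte (bit pattern `10xxxxxx`) and moves forward at most 3 bytes to reach the end of the character; if it is still inside continuation bytes after that, the data is treated as binary and the original cut is kept.

```rust
/// Hypothetical helper (not part of mORMot): given a byte buffer and a
/// desired cut position, return a cut position that does not split a
/// multi-byte UTF-8 character, stepping forward by at most 3 bytes.
fn extend_to_char_end(buf: &[u8], cut: usize) -> usize {
    if cut >= buf.len() {
        return buf.len();
    }
    let mut end = cut;
    // A UTF-8 continuation byte always matches 10xxxxxx.
    for _ in 0..3 {
        if end < buf.len() && (buf[end] & 0xC0) == 0x80 {
            end += 1; // still inside a character, include this byte
        } else {
            return end; // reached a character start or the end of the buffer
        }
    }
    // A valid UTF-8 character has at most 3 continuation bytes, so a fourth
    // one suggests binary data: fall back to the original cut.
    if end < buf.len() && (buf[end] & 0xC0) == 0x80 {
        cut
    } else {
        end
    }
}
```

With this, cutting `"привіт"` at byte 3 (in the middle of `р`) would be moved to byte 4, so the truncated text stays decodable.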

Last edited by mpv (2023-10-02 10:01:28)

Offline

#2 2023-10-02 10:26:18

Chaa
Member
Registered: 2011-03-26
Posts: 245

Re: Log truncation broke a UTF8

We already have a function for this purpose: Utf8TruncatedLength.
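
For readers on the Rust side, the general technique behind such a function can be sketched as below. This is not the implementation or signature of `Utf8TruncatedLength`, which is not shown in this thread; `truncate_at_char_boundary` is a hypothetical name. The sketch assumes the input is already valid UTF-8 and steps backward from the byte limit until it reaches a character boundary, using the standard `str::is_char_boundary`.

```rust
/// Hypothetical helper: return the longest prefix of `s` that fits in
/// `max_bytes` bytes without splitting a multi-byte UTF-8 character.
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if max_bytes >= s.len() {
        return s; // nothing to truncate
    }
    let mut end = max_bytes;
    // Index 0 is always a character boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

Stepping backward never exceeds the requested limit, whereas stepping forward keeps the whole last character at the cost of a few extra bytes.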

Offline

#3 2023-10-02 13:34:54

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,240
Website

Re: Log truncation broke a UTF8

Offline

#4 2023-10-02 15:33:12

mpv
Member
From: Ukraine
Registered: 2012-03-24
Posts: 1,544
Website

Re: Log truncation broke a UTF8

Thanks! I back-ported it to mORMot1 - see https://github.com/synopse/mORMot/pull/446

Offline
