Hi,
I would like to transform a RawUTF8 string character by character, in particular remove underscores and move all leading lowercase characters to the end (de_la_Rue -> Rue!dela). To do so, I walk over the characters of the RawUTF8 string N and its LowerCaseUnicode() copy L with
while (i <= len) and (N[i] = L[i]) do ...
This does not work for non-7-bit characters like é and ä, probably because of the UTF-8 encoding: N[i] returns a single byte, not a whole character, and é or ä each take two bytes.
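To illustrate the mismatch, here is a minimal sketch in plain Free Pascal. It does not use mORMot: RawByteString stands in for RawUTF8, and Utf8CodePointCount is a hypothetical helper name. It only shows that Length() and S[i] work on bytes while the character count is smaller.

```pascal
program Utf8Bytes;
{$mode objfpc}{$H+}

// Hypothetical helper: counts code points by skipping
// UTF-8 continuation bytes, which all look like 10xxxxxx.
function Utf8CodePointCount(const S: RawByteString): SizeInt;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  S: RawByteString;
begin
  // Build "dé" byte by byte: é is encoded as C3 A9 in UTF-8.
  SetLength(S, 3);
  S[1] := 'd';
  S[2] := #$C3;
  S[3] := #$A9;
  WriteLn(Length(S));              // 3 (bytes)
  WriteLn(Utf8CodePointCount(S));  // 2 (characters)
end.
```

So S[2] here is only the first half of é, which is why comparing N[i] with L[i] byte by byte goes wrong for such characters.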
What is the correct way of accessing the i-th character in a RawUTF8 string?
Best regards
Boris
Last edited by gothbert (2017-12-22 18:21:51)
Thank you, mpv, that does the job.
There is a lot of pointer arithmetic involved in getting the job done, plus some pitfalls. Here is what I learned:
RawUTF8 strings behave like pointers.
p:= pointer(S) makes a PUTF8Char point to the beginning of the RawUTF8 string S. Do not use p:= @S, which gives the address of the variable S itself rather than of its first character.
K:= S makes K point to the same memory as S. Even K:= Trim(S) does so. I needed to make an explicit copy by K:= Copy(S, 1, Maxint).
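For anyone who finds this later, the pointer walk can be sketched roughly as follows. This is plain Free Pascal under my own assumptions, not the mORMot API: PAnsiChar stands in for PUTF8Char, RawByteString for RawUTF8, and Utf8SeqLen and StripUnderscores are hypothetical names. It assumes well-formed UTF-8 and only does the underscore removal, one whole character at a time.

```pascal
program Utf8Walk;
{$mode objfpc}{$H+}

// Byte length of the UTF-8 sequence that starts with lead byte B.
// Assumes well-formed input; anything unexpected is stepped over as 1 byte.
function Utf8SeqLen(B: Byte): SizeInt;
begin
  if B < $80 then Result := 1
  else if (B and $E0) = $C0 then Result := 2
  else if (B and $F0) = $E0 then Result := 3
  else if (B and $F8) = $F0 then Result := 4
  else Result := 1;
end;

// Returns S without '_' characters, copying whole code points.
function StripUnderscores(const S: RawByteString): RawByteString;
var
  p, pEnd, d: PAnsiChar;
  n, outLen: SizeInt;
begin
  Result := '';
  p := Pointer(S);               // first byte of the text; nil when S = ''
  if p = nil then Exit;
  pEnd := p + Length(S);
  SetLength(Result, Length(S));  // own buffer, large enough for the worst case
  d := Pointer(Result);
  outLen := 0;
  while p < pEnd do
  begin
    n := Utf8SeqLen(Ord(p^));
    if n > pEnd - p then
      n := pEnd - p;             // guard against a truncated last sequence
    if p^ <> '_' then
    begin
      Move(p^, d^, n);           // copy all bytes of this character
      Inc(d, n);
      Inc(outLen, n);
    end;
    Inc(p, n);                   // advance by characters, not by bytes
  end;
  SetLength(Result, outLen);     // trim to the bytes actually written
end;

var
  S: RawByteString;
begin
  S := 'de_la_Rue';
  WriteLn(StripUnderscores(S));  // delaRue
end.
```

Writing into a freshly allocated Result instead of modifying S in place sidesteps the shared-memory pitfall, since S itself is never changed.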