#1 2022-08-31 15:00:18

MC
Member
Registered: 2012-10-04
Posts: 21

TDocVariant: text encoding

Hello.
I am using Delphi 7.
I need to save a file in JSON format in one application and load it in a second application.
The information originally comes from the GUI and may contain characters with a code point greater than 127.
I wanted to use a TDocVariant, but I have a problem with the encoding.
First try:

var
  vVar1   : Variant;
  vVar2   : Variant;
  vStr1   : string;
  vStr2   : string;
  vJson : RawUTF8;
begin
  vStr1 := 'fenêtre'; //-> input var with code point greater than 127

  TDocVariant.New(vVar1);
  vVar1.Objet := vStr1;
  vJson := VariantSaveJSON(vVar1);

  vVar2 := _Json(vJson); //-> gives {"Objet":"fenêtre"}
  vStr2 := vVar2.Objet;

  ShowMessage(vStr2); //-> gives "fenêtre" -> not correct
end;

I thought I should use the RawUTF8 type.
Second try:

var
  vVar1 : Variant;
  vVar2 : Variant;
  vStr1 : string;
  vStr2 : string;
  vJson : RawUTF8;
  vUtf1 : RawUTF8;
  vUtf2 : RawUTF8;
begin
  vStr1 := 'fenêtre'; //-> input var with code point greater than 127
  vUtf1 := StringToUTF8(vStr1); //-> input var as RawUTF8

  TDocVariant.New(vVar1);
  vVar1.Objet := vUtf1;
  vJson := VariantSaveJSON(vVar1);

  vVar2 := _Json(vJson);
  vUtf2 := vVar2.Objet;
  vStr2 := UTF8ToString(vUtf2);

  ShowMessage(vStr2); //-> gives "fenêtre" -> not correct
end;

And now I'm lost!
Thanks a lot.

#2 2022-08-31 17:41:10

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,657
Website

Re: TDocVariant: text encoding

Are you sure that, when using variant late binding, vVar2.Objet actually returns a RawUtf8?

My guess is that it returns a plain string on Delphi 7, so Utf8ToString() is not needed, since it is already done during late-binding extraction.

#3 2022-08-31 18:14:15

ttomas
Member
Registered: 2013-03-08
Posts: 135

Re: TDocVariant: text encoding

@MC, on Delphi 7, can you try replacing StringToUTF8 with UTF8Encode and UTF8ToString with UTF8Decode? I don't have D7 to test. Also check your .pas file encoding: is it UTF-8 or ANSI?

#4 2022-09-01 08:22:27

MC
Member
Registered: 2012-10-04
Posts: 21

Re: TDocVariant: text encoding

Thank you for your answers.

@ab
I used the debugger to answer your question.

When I set the property using late-binding with an Ansi string, there is a conversion to RawUTF8.
i.e. vVar1.Objet contains "fenêtre" (AnsiString to RawUTF8 conversion).

When I set the property using late-binding with a RawUTF8, there is also a conversion to RawUTF8:
i.e. vVar1.Objet contains "{"Objet":"fenêtre"}" (double RawUTF8 conversion).

Whatever the type of the original variable (Ansi or RawUTF8) there is a conversion to UTF8 when setting property via late-binding!

In all cases, VariantSaveJSON gives the same value as the variant, and the second variant obtained with _Json(vJson) is identical to the first variant.

For me, the problem is that there is always a UTF8 conversion regardless of the original type in the assignment.

I went through the documentation but all the examples I found do not have, unless I am mistaken, characters with a code point greater than 127.

@ttomas

When I use UTF8Encode and UTF8Decode (from system.pas and intended for WideString, right?), the result is exactly the same.

The .pas file is Ansi encoded.

Last edited by MC (2022-09-01 08:27:22)

#5 2022-09-01 13:06:59

ab
Administrator
From: France
Registered: 2010-06-21
Posts: 14,657
Website

Re: TDocVariant: text encoding

In Delphi 7, there is no way to know the code page of a string, unlike with Delphi 2009+ and FPC.
So just
- use string values not RawUtf8 values when using a late-binding setter,
- and expect a RawUtf8 when using a late-binding getter.
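A minimal sketch of these two rules on Delphi 7, not tested here. It assumes the mORMot 1.18 SynCommons unit (mORMot 2 uses different unit names) and an Ansi-encoded source file; the program name is made up for illustration:

```pascal
program LateBindingRoundTrip; // hypothetical name

uses
  Dialogs,
  SynCommons; // assumption: mORMot 1.18 framework unit

var
  vVar1, vVar2: Variant;
  vJson, vUtf2: RawUTF8;
  vStr1, vStr2: string;
begin
  vStr1 := 'fenêtre'; // Ansi text, e.g. coming from the GUI

  TDocVariant.New(vVar1);
  vVar1.Objet := vStr1; // setter: pass the plain string, no StringToUTF8() here
  vJson := VariantSaveJSON(vVar1);

  vVar2 := _Json(vJson);
  vUtf2 := vVar2.Objet;         // getter: treat the result as RawUTF8
  vStr2 := UTF8ToString(vUtf2); // convert once, back to the Ansi string

  ShowMessage(vStr2);
end.
```

Compared with the two attempts in the first post, this keeps the plain string on the way in and adds a single conversion on the way out.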

Better yet, don't use late binding at all: use _Safe(), which is both faster and gives direct access to the properties as RawUtf8.
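Something along these lines could work with _Safe(). This is a sketch under assumptions, not tested on Delphi 7: it relies on the U[] property of TDocVariantData from the mORMot 1.18 SynCommons unit for RawUTF8 access by name, and the program name is made up for illustration:

```pascal
program SafeRoundTrip; // hypothetical name

uses
  Dialogs,
  SynCommons; // assumption: mORMot 1.18 framework unit

var
  vVar1, vVar2: Variant;
  vJson: RawUTF8;
  vStr2: string;
begin
  TDocVariant.New(vVar1);
  // write the property as explicit UTF-8, bypassing the late-binding setter
  _Safe(vVar1)^.U['Objet'] := StringToUTF8('fenêtre');
  vJson := VariantSaveJSON(vVar1);

  vVar2 := _Json(vJson);
  // read the property as RawUTF8, then convert once for the GUI
  vStr2 := UTF8ToString(_Safe(vVar2)^.U['Objet']);

  ShowMessage(vStr2);
end.
```

Here every conversion is explicit, so nothing depends on how late binding guesses the encoding of an AnsiString on Delphi 7.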

#6 2022-09-01 13:33:19

MC
Member
Registered: 2012-10-04
Posts: 21

Re: TDocVariant: text encoding

OK, I understand.
Thank you, I will follow your advice.
