You are not logged in.
Pages: 1
This time, I use mORMot to process some data which charset is Chinese. My Env is Os_Win7_codepage_936/Oracle_11g_charset_AL32UTF8.
I create prop with default codepage param 0, the result is mess character returned. After a few hour of tracing code, finally I change the codepage to cp_utf8 to make my work done. But the problem here still exists.
Follow the code in unit SynDBOracle.pas, when I use default param 0 to create prop, the Class use GetACP to obtain the OS default codepage 936 first. When query data, convert data returned by OCI from codepage 936 to UTF8. So, I use CP_UTF8 to bypass this convertion. From here, I think SynDBOracle may have a wrong concept about codepage and charset.
I have used Oracle since Oracle8. Each time, charset give us a little trouble when starting a new project, because Chinese Charecter is involved mainly. codepage , charset and NLS_LANG make us mad.
To make clear about problem, I'd like to discuss the concept about codepage , charset, which may make things right.
I think, the codepage and NLS_LANG is related to the client tools of Oracle, such as SQL*PLUS, EXP/IMP; but for direct connect, only charset is necessary to concern. Because, when mORMot connect to Oracle DB using OCI, the client means mORMot, the charset used by client means UTF8. If DB use UTF8, no conversion is needed. If DB use the charset other than UTF8, the conversion is needed. so here mORMot is nothing related to codepage concept. codepage is related to OS or App.
So, Based on the viewpoint above, CreateWithCodePage should changed to CreateWithCharset; the aCodepage param should change to aCharset. for default charset string, we can query DB to get the charset used by DB, or just set it as 'UTF8' simply.
I do'nt know what descripted here is make things clear, because English isn't my native language.
For Programming with Unicode, oracle have the doc here
http://docs.oracle.com/cd/B28359_01/ser … nicode.htm
By the way, another problem is the error msg returned by OCI display not right, I donot know how mORMot convert this message.
I know all the things discussed here not related to ASCII, because all the charset is superset of ASCII. for the guy not use CJK charset, it is difficult to find such error.
And also codepage/charset list in SynDBOracle may not be right for codepage 936. for codepage, consider following table
Common IANA character set Name vs. Oracle character set name
(Http header & html meta tag) (NLS_LANG character set part)
UTF-8 UTF8
windows-1250 EE8MSWIN1250
windows-1251 CL8MSWIN1251
windows-1252 WE8MSWIN1252
windows-1253 EL8MSWIN1253
windows-1254 TR8MSWIN1254
windows-1255 IW8MSWIN1255
windows-1256 AR8MSWIN1256
windows-1257 BLT8MSWIN1257
windows-1258 VN8MSWIN1258
windows-936 or GBK ZHS16GBK
Big5 ZHT16MSWIN950
Big5-HKSCS ZHT16HKSCS – Hong Kong extension of big5
TIS-620 TH8TISASCII
Shift_JIS JA16SJIS
korean or KS_C_5601-1989 KO16MSWIN949
Offline
Yes, in production, we use here huge Oracle databases with a dedicated Charset, which always map the Windows code page.
For instance, for our customers in Korea, we have a KO16MSWIN949 database, and GetACP=949 on the workstations.
I agree this is a convention on our side only, and should be made less specific.
Especially, new DB should better use UTF8 encoding, to support Unicode as required by most modern applications.
We have created a ticket to follow this refactoring.
See http://synopse.info/fossil/info/a6a639ec43
What are your expecations, for changing how TSQLDBOracleConnectionProperties should be initialized?
I believe even CreateWithCodePage/CreateWithCharSet is a wrong idea, since it breaks the default inherited TSQLDBConnectionProperties.Create() signature.
Some general custom parameters may definitively make sense, e.g. as is done with SynDBFireDAC (i.e. add parameters to the ServerName or DatabaseName parameters, encoded as URI, with a TSQLDBOracleConnectionProperties.URI() class function).
Feedback is welcome!
Offline
In my opnion, there is no CreateWithCodePage/CreateWithCharSet needed. when creating connection to OCI, just query Oracle for what Charset is used, this may return charset string such as AL32UTF8/KO16MSWIN949/ZHS16GBK, after that you may use OCINlsCharSetNameToId or a sql query to get related charset_ID;
sql to get DB Charset:
SELECT PROPERTY_VALUE
FROM DATABASE_PROPERTIES
WHERE PROPERTY_NAME = 'NLS_CHARACTERSET'
sql to convert charsetID
SELECT NLS_CHARSET_ID('AL32UTF8') FROM dual
'AL32UTF8' here is the charset string to query.
but for this auto solution, I don't know if it's available in old db system such as Oracle DB 8.
Last edited by richard6688 (2013-12-09 11:57:51)
Offline
Nice solution!
We have removed CreateWithCodePage() constructor: now the charset will be retrieved at connection, and used for CHAR/NVARCHAR2 fields.
See http://synopse.info/fossil/info/2950b39ac5
Hope it works on your side too.
Thanks a lot for the feedback!
You are in the contributor list at the beginning of SynDBOracle, now.
Offline
thanks for the quick code-fixed.
I have test the new code about SynDBOracle. Still have a problem here.
You did not process the UTF-8 compatible charset. AL32UTF8 charset is a superset of UTF8. it's UTF8 of Oracle.
on http://docs.oracle.com/cd/B28359_01/ser … m#NLSPG608
they said:
AL32UTF8
An Oracle Database character set for the SQL CHAR data type, which is used for the database character set. It encodes Unicode data in the UTF-8 encoding.
But Its oracle charset ID is not in the list you refer: http://www.mydul.net/charsets.html
folowing code will retrival all the charset ID list of oracle
select nls_charset_id(value) nls_charset_id, value nls_charset_name, to_char(nls_charset_id(value),'xxxx') hex_id
from v$nls_valid_values
where parameter = 'CHARACTERSET'
order by nls_charset_id(value)
the list of My oracle 11g here
...
"871","UTF8"," 367"
"872","UTFE"," 368"
"873","AL32UTF8"," 369"
...
Unfortunately, UTF8 ID <> AL32UTF8 ID, so you shall make modified to code.
The Simple solution may modify TSQLDBOracleConnection.STRToUTF8 such as
const OCI_AL32UTF8=$369
procedure TSQLDBOracleConnection.STRToUTF8(P: PAnsiChar; var result: RawUTF8;
ColumnDBCharSet,ColumnDBForm: integer);
var L: integer;
begin
L := StrLen(PUTF8Char(P));
if (L=0) or (ColumnDBCharSet in [OCI_UTF8, OCI_AL32UTF8]) or (ColumnDBForm=SQLCS_NCHAR) or
(fOCICharSet in [OCI_UTF8, OCI_AL32UTF8]) then
SetString(result,P,L) else
result := fAnsiConvert.AnsiBufferToRawUTF8(P,L);
end;
UTF-8(unicode 3.0) was introduced before Oracle 9i. AL32UTF8(unicode 5.0) is introduced by 9i.
UTFE(unicode 5.0) is used only on EBCDIC platforms.
Last edited by richard6688 (2013-12-10 01:14:41)
Offline
You are right.
I was not aware of OCI_AL32UTF8, nor that you may use UTF-8 in CHAR/VARCHAR2 instead of NCHAR/NVARCHAR2...
I've tried to enhance Oracle DB CharSet process.
See http://synopse.info/fossil/info/406380e3a2
Offline
Now, it's OK.
I have tested with my DB 11g with charset AL32UTF8 of CHAR/VARCHAR2 and national charset AL16UTF16 of NCHAR/NVARCHAR2.
Thanks, Good worK!
Offline
hi,
About error message, as I mentioned on #1, is not correct displayed for Chinese. though that isn't real problem, but for debug, is not handy.
after I check your code
procedure TSQLDBOracleLib.HandleError(Status: Integer; ErrorHandle: POCIError;
InfoRaiseException: Boolean; LogLevelNoRaise: TSynLogInfo);
begin
......
raise ESQLDBOracle.Create(String(msg));
end;
I don't know how to cast it use your ansiconvert, but can use delphi Tencodeing to convert it correctly.
code as:
raise ESQLDBOracle.Create(TEncoding.UTF8.GetString(Tbytes(msg)));
Now, I don't know its real means. you may know that to integrate the code.
Offline
I have forced correct character conversion when processing Oracle error messages.
See http://synopse.info/fossil/info/2ab02f4ac0
Offline
Pages: 1