mORMot 2 proposal: rename RawUTF8 type to Utf8 ?

ab · 2020-12-29 14:39:35

One proposal for mORMot 2.

What if we renamed the RawUTF8 type into Utf8?

With a default compatibility redirection if PUREMORMOT2 is not defined, of course.

The "Raw" prefix came from early mORMot code, which used TRichView as reference for the UTF-8 encoding... but it is clearly an overhead today.
And "Utf8" is shorter than "String" and "Utf8String" anyway....

What do you think?

macfly · 2020-12-29 16:57:56

I prefer RawUTF8, because it is easy to know the origin of the type.

When looking at the source there is no doubt that it is the type added by mORMot.

mpv · 2020-12-29 19:44:28

Agree with @macfly. And this is a big regression. Let's keep it as is.

Vitaly · 2020-12-29 21:22:01

If it is just an opinion collection, I'd vote for Arnaud's proposal. I'm using RawUTF8 widely, therefore from my personal experience:
- RawUTF8 in my code is often confusing for others, who don't know much or know nothing about mORMot (particularly the 'Raw' part, as far as I understand).
- a shorter name is faster to type
It is all not critical for sure, but I won't be against such renaming.

mdbs99 · 2020-12-30 01:47:14

RawUTF8 isn't a good name, but better than just Utf8.
Using just Utf8 seems not complete: is it a string, a stream, a char, what?

Every string type has String as a suffix, so to stay consistent a new string type should use it too. Think for example of the UTF8Char type: should we rename it too? A char is "less" than a string, so a char could use Utf8 and Utf8String is a-string-of-chars(?)

SynUnicode is inconsistent because it "says nothing" about the type, as it doesn't keep the String suffix; it should be renamed.

If you really want to change the name anyway, don't think only of RawUTF8, but also of RawJSON, RawUnicodeString, and so on. The "Raw" prefix might be removed.

Last, you should use "T" as a prefix for all types. PRawUTF8 has "P" because it is a pointer, just as TRawUTF8 would use "T" because it is a type. I don't care if Integer and Currency don't have a "T"; it is inconsistent anyway.

Finally, the perfect name should be TUtf8String but I guess you won't use it...

Chaa · 2020-12-30 04:31:26

I think that UTF8String is better than UTF8.
And RTL has UTF8String defined exactly as in mORMot.
So we can use RTL type in new Delphi versions, and define it for old versions.

ab · 2020-12-30 09:51:03

Utf8String sounds better, indeed.
It is defined in Delphi 7 and later as expected.

We already renamed TSQLRawBlob as RawBlob in mORMot2.
RawJSON makes sense to me: it is indeed some raw JSON content.
RawUnicode is an internal type, almost never used. It is a Unicode string stored in an AnsiString, only needed for Delphi 7/2007. So we could keep it as such, for compatibility reasons.

SynUnicode is either UnicodeString or WideString, whichever is faster on the platform. On FPC, you have UnicodeString <> string by default, and UnicodeString=WideString on POSIX... So our SynUnicode is somewhat more consistent if you want a fast UTF-16 type. Perhaps UTF16String would be more meaningful, and closer to the Utf8String convention.

Also note that you have a SPIUTF8 type available...

About 'T' prefix, it was never the case for simple raw types like integer, string or such.

Junior/RO · 2020-12-30 12:53:04

var s: mUtf8 // mORMot UTF8
var s: sUtf8 // string UTF8
var s: s_utf8 // snake case
var s: RawUTF8
var s: rUtf8 // God forbid
var s: _utf8 // argh
var s: Utf8

mdbs99 · 2020-12-30 14:02:41

ab wrote:

SynUnicode is either UnicodeString or WideString, whichever is faster on the platform. On FPC, you have UnicodeString <> string by default, and UnicodeString=WideString on POSIX... So our SynUnicode is somewhat more consistent if you want a fast UTF-16 type. Perhaps UTF16String would be more meaningful, and closer to the Utf8String convention.

For sure SynUnicode should exist - the faster on the platform, you've said - but I'm talking about the naming, which could be (T)SynUnicodeString following the convention.

About 'T' prefix, it was never the case for simple raw types like integer, string or such.

I know, it comes from Borland, yeah, but I think that following a convention is better, and even more important in Pascal, which is not case-sensitive. A 'T' helps to avoid a name collision with a property, for example.

edwinsn · 2020-12-30 16:00:42

I suggest to keep the `T` prefix.

`RawUTF8` is OK, the length of the name is reasonable, it's not lengthy at all. On the other hand, `UTF8` is too short in my eye...

Eugene Ilyin · 2020-12-30 23:20:14

Ok, here is a typical case below.
Which is better for quickly and accurately reading and re-understanding your own or another developer's old code?

function TMyClass.DoSomething(const AValueA, AValueB: Utf8; const APosition: Integer): Utf8;
var
  Index: Integer;
  StrA, StrB: Utf8;
...

function TMyClass.DoSomething(const AValueA, AValueB: RawUTF8; const APosition: Integer): RawUTF8;
var
  Index: Integer;
  StrA, StrB: RawUTF8;
...

function TMyClass.DoSomething(const AValueA, AValueB: UTF8String; const APosition: Integer): UTF8String;
var
  Index: Integer;
  StrA, StrB: UTF8String;
...

function TMyClass.DoSomething(const AValueA, AValueB: UTF8Str; const APosition: Integer): UTF8Str;
var
  Index: Integer;
  StrA, StrB: UTF8Str;
...

IMHO, RawUTF8 is ok; Utf8 is too short and ugly, and could make fast code reading harder.
Maybe the suggested alternative UTF8String (or UTF8Str) is long enough, but is it worth refactoring the code and libraries?
Also, it would not be good at all to introduce a mORMot.UTF8String/System.UTF8String name collision with the standard type.
Could things get mixed up when you use both in the same code: some Delphi libraries expecting System.UTF8String and some mORMot routines expecting mORMot's UTF8String?
On the other hand, introducing mORMot.UTF8String would be a good protective barrier for weeding out libraries that work with System.UTF8String.

P. S.
Btw (if we talk about such core changes in naming )
I'm very allergic to ab's CamelCase notation for almost all abbreviations, all these: Html, Http, Utf, Xml, Json, etc. in places where HTTP, XML, UTF, JSON are expected... brrrrrr... (thankfully SQL has not been cursed into Sql). I wonder whether it comes from the trouble of keeping the Shift key pressed while typing, or whether some code post-processor mangles it before commit?
With all due respect, fixing this would improve readability and predictability more than renaming RawUTF8 (hm... RawUtf8).
What do you think? Maybe it's a separate topic to discuss.

Last edited by Eugene Ilyin (2020-12-31 00:20:55)

Vitaly · 2020-12-31 00:18:39

Chaa wrote:

I think that UTF8String is better than UTF8.

I like this suggestion; it seems clearer for mORMot newbies from my pov. Although I'm not sure whether it will be ok for compatibility (including with other libraries), I would not call myself proficient in this question.

Eugene Ilyin wrote:

I'm very allergic to ab's CamelCase notation for almost all abbreviations, all these...

For me it is completely different: I like this way of writing abbreviations in names. Maybe it is a matter of taste (I'm just sharing mine), but when I occasionally get several abbreviations in one name, it is very hard for me to use uppercase only. Something like TJWTRS256 or TUTF8XMLCDATA or whatever else; there are so many abbreviations today. So TSqlRecord (or Orm) seems much more convenient to me than TSQLRecord (or ORM).

edwinsn · 2020-12-31 05:45:09

Eugene Ilyin wrote:

Also, it would not be good at all to introduce a mORMot.UTF8String/System.UTF8String name collision with the standard type.
Could things get mixed up when you use both in the same code: some Delphi libraries expecting System.UTF8String and some mORMot routines expecting mORMot's UTF8String?

I highly agree with Eugene, especially the drawback of conflict with System.UTF8String in some cases!

ab · 2020-12-31 09:36:05

What we could do in mORMot 2:

type RawUTF8 = Utf8String;

So that we could use RawUTF8 or the standard Utf8String anywhere.
Since we didn't sub-type the type, it will be the very same type, with no conflict possible.
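To illustrate the "very same type" point, here is a minimal sketch, assuming a Delphi or Free Pascal compiler where UTF8String is available. The program name and the AppendBang helper are hypothetical and only serve the illustration; they are not part of mORMot.

```pascal
program AliasParamDemo;
{$ifdef FPC}{$mode delphi}{$else}{$APPTYPE CONSOLE}{$endif}

type
  // plain alias, as proposed above: no "type" keyword, so no new type is created
  RawUTF8 = UTF8String;

// hypothetical helper: a var parameter requires the argument type to be
// identical to the parameter type, which is a stricter check than assignment
procedure AppendBang(var S: UTF8String);
begin
  S := S + '!';
end;

var
  R: RawUTF8;
begin
  R := 'mORMot';
  AppendBang(R); // compiles because RawUTF8 and UTF8String are the same type
  WriteLn(R);
end.
```

Had RawUTF8 been declared as a sub-type with "type UTF8String", the AppendBang(R) call would be expected to be rejected by the compiler, which is the kind of conflict the alias avoids.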

About camelcasing, it is even "worse" (in Eugene's terms) in mORMot 2, in which *REST* became *Rest* for consistency of camelcasing, and easier spelling (as Vitaly noted).
The natural next step would be to do the same with TJWT* into TJwt* and TSQL* into TSql*...

Eugene Ilyin · 2020-12-31 10:24:50

*REST* became *Rest*, TJWT* into TJwt* and TSQL* into TSql*

Ok, I will print " Id and EntityId, not ID and EntityID ! " on A4 and meditate for an hour per day on it

Unfortunately, the Object Pascal Style Guide and the Free Pascal guidelines are not clear about acronym notation in names.

Maybe Kotlin (successor of all Android dev) balance is the best approach?

When using an acronym as part of a declaration name, capitalize it if it consists of two letters (IOStream); capitalize only the first letter if it is longer (XmlFormatter, HttpInputStream).
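As a rough illustration of the quoted rule, here is a small sketch in Pascal. The AcronymCase function is a hypothetical helper written for this example only; it is not part of Kotlin, Delphi, or mORMot.

```pascal
program AcronymDemo;
{$ifdef FPC}{$mode delphi}{$else}{$APPTYPE CONSOLE}{$endif}

uses
  SysUtils;

// hypothetical helper: applies the quoted rule to a single acronym
function AcronymCase(const Acronym: string): string;
begin
  if Length(Acronym) <= 2 then
    // two letters: keep the whole acronym in capitals
    Result := UpperCase(Acronym)
  else
    // longer: capitalize only the first letter
    Result := UpperCase(Copy(Acronym, 1, 1)) +
              LowerCase(Copy(Acronym, 2, MaxInt));
end;

begin
  WriteLn(AcronymCase('IO'), 'Stream');         // IOStream
  WriteLn(AcronymCase('XML'), 'Formatter');     // XmlFormatter
  WriteLn(AcronymCase('HTTP'), 'InputStream');  // HttpInputStream
end.
```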

Last edited by Eugene Ilyin (2020-12-31 13:03:44)

Vitaly · 2020-12-31 10:51:24

Eugene Ilyin wrote:

Maybe Kotlin (successor of all Android dev) balance is the best approach?
When using an acronym as part of a declaration name, capitalize it if it consists of two letters (IOStream); capitalize only the first letter if it is longer (XmlFormatter, HttpInputStream).

It is a very nice, solid, and simple rule, I'll definitely take it as basic at least for myself. Thanks for this link/quote

edwinsn · 2020-12-31 11:12:00

Eugene Ilyin wrote:

Maybe Kotlin (successor of all Android dev) balance is the best approach?
When using an acronym as part of a declaration name, capitalize it if it consists of two letters (IOStream); capitalize only the first letter if it is longer (XmlFormatter, HttpInputStream).

Hey, that's exactly the rule I came up with myself, and I have been using it for a while

Eugene Ilyin · 2020-12-31 14:29:56

Small amendment:

Following Naming Conventions and Section 3.4 of Object Pascal Style Guide which is a base for both Delphi and Free Pascal:

Except for reserved words and directives, which are in all lowercase, all Pascal identifiers should use InfixCaps, which means the first letter should be a capital, and any embedded words in an identifier should be in caps, as well as any acronym that is embedded

And

Method names should use the InfixCaps style. Start with a capital letter, and capitalize the first letter of any subsequent word in the name, as well as any letters that are part of an acronym. All other characters in the name are lower case.

But as for me the Kotlin approach above looks more balanced.

ab · 2020-12-31 14:49:42

Please check https://github.com/synopse/mORMot2/tree … pes-naming

We followed the Kotlin convention.
See https://github.com/synopse/mORMot2/comm … 908161eca7

Feedback is welcome.
What has been forgotten? What should be modified/fixed?

ab · 2020-12-31 16:25:24

I have just refactored JSON, BSON, UTF8 acronyms too...
Big commit indeed!

tbo · 2020-12-31 20:16:36

The changes look good to me. But for proper names like SQLite, you should keep the spelling.
TSQLite3HttpServer looks better than TSqlite3HttpServer to me.

I wish everyone a happy and healthy new year.

With best regards
Thomas

edwinsn · 2021-01-01 07:20:45

tbo wrote:

But for proper names like SQLite, you should keep the spelling.
TSQLite3HttpServer looks better than TSqlite3HttpServer to me.

Sorry, but I have to disagree. Why the exception? It doesn't make sense. `TSqlite3HttpServer` looks good and it complies with the new naming convention. Maybe it takes a little while to get used to it

And happy new year!

tbo · 2021-01-01 12:55:08

edwinsn wrote:

Sorry, but I have to disagree. Why the exception, it doesn't make sense.

Sorry, my English is not so good. I would like to explain it with another example. If you buy a car with the serial name "GREAT car" it is not right to call it "great car". The name is "SQLite" not "Sqlite". For proper names, I would keep the original spelling and not change it. JSON is an acronym. Therefore it is OK to write it "Json". I hope I was able to explain my opinion a little better. From my feeling I would also leave out the "3" and write it like this "TSQLiteHttpServer".

Actually, I have been very happy with all the decisions the mORMot team has made for mORMot2. mORMot is a fantastic library. Many thanks for that.

With best regards
Thomas

esmondb · 2021-01-01 23:56:28

I’m not sure that what Utf-8 is called matters that much. Happy New Year!

Leslie7 · 2021-01-03 10:35:14

I prefer ab's original suggestion. For the most used datatype beside integer being short is an advantage.

If mORMot2 uses UTF8String:

I am wondering if using some short alias in my own code like UTF8 = UTF8String or U8 = UTF8String would cause any problem?

ab · 2021-01-03 11:25:24

If you use no sub-typing, then it will work

type
  // either a plain name alias to the same type: TypeInfo(UTF8)=TypeInfo(UTF8String)
  UTF8 = UTF8String;
  // or, instead of the line above, a new distinct type: TypeInfo(UTF8)<>TypeInfo(UTF8String)
  // UTF8 = type UTF8String;
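The difference between the two declarations can be checked with a short program. This is only a sketch, assuming a Delphi or Free Pascal compiler; Utf8Alias and Utf8Distinct are hypothetical names chosen so that both variants can coexist in one program.

```pascal
program TypeInfoDemo;
{$ifdef FPC}{$mode delphi}{$else}{$APPTYPE CONSOLE}{$endif}

type
  Utf8Alias = UTF8String;         // plain alias: no new type is created
  Utf8Distinct = type UTF8String; // "type" keyword: a new distinct type

begin
  // compares the RTTI pointers of the alias and of the original type
  WriteLn(TypeInfo(Utf8Alias) = TypeInfo(UTF8String));
  // compares the RTTI pointers of the distinct type and of the original type
  WriteLn(TypeInfo(Utf8Distinct) = TypeInfo(UTF8String));
end.
```

Per the comments above, the first comparison is expected to hold and the second is not. Anything that relies on RTTI, such as serialization registered for one specific type, would therefore see the alias as the original type but not the distinct one.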

Leslie7 · 2021-01-03 15:36:09

OK, thanks.

ab · 2021-01-05 11:29:42

So, to sum up what we did for mORMot 2:
- We kept RawUtf8 but we made it an alias to System.Utf8String, not a stand-alone type: you could use either of the two types with no difference;
- We followed Kotlin naming for all types: for instance, RawUTF8 is now RawUtf8, and all types have been renamed to follow those rules, e.g. TJWT* into TJwt* and TSQL* into TSql*...;
- We kept SQlite3 as SQlite3 for now. Don't know why, perhaps as praise for this great DB engine.

My personal feeling is that the source code is now much easier to read and browse. Fewer capitalized words seem better.

macfly · 2021-01-05 11:47:14

Great choices.

The new formatting style is a plus that cannot be omitted. It helps a lot to read and understand.

I have a serious difficulty with if then begin along the same line.

Thanks @ab

edwinsn · 2021-01-05 16:09:47

Good choices! Thanks ab for all your efforts!

mORMot Open Source
