As I wrote in a previous post, Delphi's strings, dynamic arrays and memory manager don't play well with multi-core CPUs.
My proposal is to add a threadlocalvar keyword, to be used instead of var in your code, to mark variables that are used only in the current thread. Then the compiler and RTL won't have to use the LOCK instruction, and the application will be MUCH faster in a multi-threaded environment.
I made this proposal in both the Embarcadero forum and in the FreePascal forum.
So what about a new threadlocalvar reserved word, used the same way as threadvar, which would declare a variable (string, dynamic array) as accessed by the current thread only, and therefore handled without any LOCK instruction?
My guess is that any server thread mostly uses such local variables (e.g. for string concatenation of an XML or HTML stream), and doesn't need the thread-safe approach most of the time.
It could confuse some Delphi users or beginners... but it's worth noting that threadvar itself has existed for years (since Delphi 1, I guess), and is used only by those who need it. If you need it, you use it. Otherwise you ignore it, and your application will be slower in some cases, but will work. For some CPU-intensive applications, like multi-threaded servers, having such a threadlocalvar at hand should be a great improvement.
Note 1: a threadvar implements one variable instance per thread, whereas a threadlocalvar should implement one variable instance for the current thread only.
Note 2: any threadlocalvar could be assigned to a normal var when a thread-safe variable is needed (e.g. for communication between threads): in this case, it will be a direct write, not a copy on write. Copy on write would still be enabled between threadlocalvar variables, of course, for better performance.
Note 3: it's up to the programmer to take care of the multi-thread approach, but having to write threadlocalvar (so many characters to type!) will prevent anyone from using it without knowing it.
It could look something like this:
function TServer.ComputePage: string;
threadlocalvar
  tmp: string; // differs from var tmp: string
  i: integer; // i is defined as threadlocalvar, but is the same as normal var
begin
  for i := 0 to high(ListStr) do // ListStr[] is a dynamic array
    tmp := tmp+ListStr[i]+#13#10; // very fast computation, without LOCK
  result := tmp; // copy from local heap to global heap
end;
To implement this at the compiler and RTL level, we could use reference count <=-2 for these variables (-1=const, -2=ref 0, -3=ref 1..).
The RTL (i.e. the system.pas unit) should be modified accordingly. For example, adding a reference to a string would handle the threadlocalvar case like this:
function _LStrAddRef(var str): Pointer;
var P: PStrRec;
begin
  if Integer(str)<>0 then begin
    P := Pointer(Integer(str) - sizeof(StrRec));
    if P.refcnt >= 0 then
      InterlockedIncrement(P.refcnt) else // slow multi-thread safe reference count
    if P.refcnt < -1 then // -1 for const
      dec(P.refcnt); // -2,-3... for threadlocalvar
  end;
  Result := Pointer(str);
end;
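To make the negative encoding easier to follow, here is a minimal stand-alone sketch of the counting rule only, without touching system.pas. The names LocalRefs, LocalAddRef and LocalRelease are hypothetical helpers invented for this illustration, not RTL functions, and the release rule (free the block when the counter comes back to -2) is my assumption of how the matching release would behave:

```pascal
program RefCountSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

const
  REF_CONST      = -1; // constant string: never counted
  REF_LOCAL_ZERO = -2; // threadlocalvar with zero references

// Hypothetical helper: logical reference count stored in a negative refcnt
function LocalRefs(refcnt: Integer): Integer;
begin
  Result := REF_LOCAL_ZERO - refcnt; // -2 -> 0, -3 -> 1, -4 -> 2
end;

// Hypothetical helper: add one reference with a plain dec(), no LOCK
procedure LocalAddRef(var refcnt: Integer);
begin
  if refcnt < REF_CONST then
    Dec(refcnt);
end;

// Hypothetical helper: drop one reference with a plain inc().
// Returns True when the last reference is gone (assumed release rule).
function LocalRelease(var refcnt: Integer): Boolean;
begin
  Result := False;
  if refcnt < REF_LOCAL_ZERO then begin
    Inc(refcnt);
    Result := refcnt = REF_LOCAL_ZERO;
  end;
end;

var
  rc: Integer;
begin
  rc := -3;                              // one reference
  LocalAddRef(rc);                       // rc is now -4
  WriteLn('refs: ', LocalRefs(rc));      // refs: 2
  if not LocalRelease(rc) then
    WriteLn('still referenced, refs: ', LocalRefs(rc));
  if LocalRelease(rc) then
    WriteLn('last reference released, block can be freed');
end.
```

The point is only that both directions are ordinary integer operations on a field owned by a single thread, so no interlocked instruction is involved.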
A local thread heap should also be implemented for such threadlocalvar variables, with threadlocalgetmem() and similar functions.
Another possibility would be to add a "threadlocal" attribute for types, to be used for local variables and properties:
TServer = class
public
  ListStr: threadlocal array of string;
  ...
In this case, the method above should begin with:
function TServer.ComputePage: string;
var
  tmp: threadlocal string; // differs from var tmp: string
....
But this threadlocal attribute syntax doesn't sound very "pascalish", whereas threadlocalvar does (it reads like the existing threadvar feature)... any other idea?
Online
So, if I understand it correctly, your threadlocalvar sounds quite like a stack variable, except it will include managed types which are then automatically redirected to use the thread local version of memory manager.
But, as you said, it has some confusing usage syntax and semantics.
There will be assignment between threadlocalvar and "normal" variables.
(You cannot forbid all local-nonlocal assignments, because the computation performed using the threadlocalvar has to be returned as result, which is not a threadlocalvar.)
This assignment will not only be expensive (full copying is required), but also cumbersome.
For example, what should happen when a threadlocalvar instance of a class is assigned to a normal class reference?
Because the class instance itself may contain references to other threadlocalvar types or class instances, a simple memory copy of the instance may not be enough.
Then all classes would have to implement a compiler-magic "threadlocalcopy()" method to perform deep copying.
Another possible solution is to forbid assignment between "complex" threadlocalvar types, such as class references, and normal types.
But this may lead to confusion, and to extra complexity in the compiler.
Also, what will happen if someone typecasts a complex threadlocalvar type to pointer or integer and tries to pass it around?
Last edited by AdamWu (2010-08-01 07:25:53)
Offline
You're right, memory is allocated by a local memory manager.
My proposal is to use the reference counter (starting with -2) to identify the threadlocalvar variables.
For example, when you assign between a threadlocalvar and a "normal" variable, a copy is made (it's the same for a const string when it is not used locally).
And it'll be less expensive than a LOCK in multi-thread applications.
At first, I didn't want to have a threadlocalvar instance of a class. There could be threadlocalvar members in the class, but the class itself could be allocated globally.
If you typecast a threadlocalvar to pointer or integer, it's exactly the same as for a threadvar: you are meant to know what you're doing.
This feature is for advanced coders, like threadvar, and will be used only by those who really need it. But I think it'll be worth having when you build multi-thread servers.
Online
> And it'll be less expensive than a LOCK in multi-thread applications
I doubt that. Currently string assignment is by reference only; a full copy would execute many times more instructions than that.
Also, for threadlocal to threadlocal assignment, yes, it will be faster. But since threadlocal and global variables must be differentiated and treated differently, its introduction will cause all existing code to run slower, maybe just by a tiny bit, but that includes almost everything...
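To make the difference between the two kinds of assignment concrete, here is a minimal sketch that runs on today's compilers. DeepCopy is a hypothetical stand-in for what a threadlocal -> global assignment would have to do (it is not an RTL function), and SharesBuffer is just a helper for this illustration; UniqueString and StringOfChar are the standard System routines:

```pascal
program CopyVsRef;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// Helper for this sketch: True when both strings use the same heap buffer
function SharesBuffer(const a, b: string): Boolean;
begin
  Result := Pointer(a) = Pointer(b);
end;

// Hypothetical stand-in for a threadlocal -> global assignment:
// the result must own its buffer, so every character is copied
function DeepCopy(const s: string): string;
begin
  Result := s;          // by reference only, the count goes up
  UniqueString(Result); // allocates a new buffer and copies the content
end;

var
  local, byRef, byCopy: string;
begin
  local := StringOfChar('x', 100000); // heap-allocated string
  byRef := local;                     // current behaviour: pointer + counted reference
  byCopy := DeepCopy(local);          // proposed cross-heap behaviour: full copy
  if SharesBuffer(local, byRef) then
    WriteLn('plain assignment shares the buffer');
  if not SharesBuffer(local, byCopy) then
    WriteLn('deep copy owns a separate buffer');
  if byCopy = local then
    WriteLn('contents are identical');
end.
```

The plain assignment costs one counted reference whatever the string length, whereas the deep copy cost grows with the length. Which one wins against a LOCK on a given workload would have to be measured, not assumed.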
Last edited by AdamWu (2010-08-04 04:33:13)
Offline
You're right in that threadlocalvar should be used with circumspection, knowing what you're doing.
A full copy would take place only for a threadlocal -> global var assignment. My point was that this occurs only once, for example when the request processing is about to finish and you need to pass data back to the main server thread.
Perhaps implementing RCU in Delphi could be the best way to do it.
The easier way would be to use librcu or libsync, with FPC or Kylix, and run the software under Linux, which is a great platform for running servers anyway. Neither librcu nor libsync seems to be fully implemented yet... but they are worth keeping an eye on!
When you take a look at http://lwn.net/Articles/263130 the performance graphs are outstanding!
Online
Sounds very close to the Rust memory model.