Hi all,
here I have written an assembly function which converts a numeric string to its Persian-Arabic equivalent.
For example: '1234567890' ==> '۱۲۳۴۵۶۷۸۹۰'
NOTE: all characters of the string parameter are assumed to be numeric, so no further checking seems necessary.
Your ideas are welcome, to help a newcomer amateur programmer optimize this little piece of code.
Thanks.
function NumStrToFa(const str: string): string;
asm
test eax, eax // check if the string is null
jz @@Null2
push eax // save the input string pointer to stack
mov eax, [eax - 4] // length of the input string
test eax, eax // check for an empty string
jz @@Null1
push eax // saving the length of input string to stack
push edx // saving the @Result
call System.@NewUnicodeString //creating a new string which forms the result
pop edx // retrieving the @Result
mov [edx], eax // assign the new string to @Result
pop ecx // retrieving the Length of string
pop eax // retrieving the input string pointer
push edx // saving the Result string
mov edx, [edx] // forming the edx register to be used in the loop.
push ebx
@@Loop:
mov bx, word ptr [eax + ecx * 2 - 2]
add bx, 6C0h // $6C0 is the offset from '0' (U+0030) to the Persian-Arabic zero character ('۰', U+06F0)
mov word ptr [edx + ecx * 2 - 2], bx
Loop @@Loop
pop ebx
pop edx // having the Result back
ret
@@Null1:
add esp, 4
@@Null2:
end;
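For comparison, here is a minimal pure-Pascal sketch of the same conversion, which can serve as a reference when testing the assembly versions in this thread. The name NumStrToFaRef and the constant PersianOffset are made up for this illustration; like the asm code, it assumes the input contains only the characters '0'..'9'.

```pascal
program NumStrToFaRefDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  // Distance from '0' (U+0030) to the Persian zero (U+06F0)
  PersianOffset = $06F0 - $0030; // = $06C0

// Hypothetical reference version: assumes S holds only '0'..'9'
function NumStrToFaRef(const S: UnicodeString): UnicodeString;
var
  I: Integer;
begin
  SetLength(Result, Length(S)); // gives '' when S is empty
  for I := 1 to Length(S) do
    Result[I] := WideChar(Ord(S[I]) + PersianOffset);
end;

var
  R: UnicodeString;
begin
  R := NumStrToFaRef('1234567890');
  Writeln(Length(R));          // 10
  Writeln(Ord(R[1]) = $06F1);  // TRUE ('1' became U+06F1)
  Writeln(Ord(R[10]) = $06F0); // TRUE ('0' became U+06F0)
end.
```

Because SetLength is used on Result, an empty input always yields an empty result, whatever Result held before the call.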
Amir
Offline
Your code is great.
Why do you save edx? I'm not sure you need to.
Perhaps you could convert two WideChars at once, like this:
....
push ecx
shr ecx,1
jz @@one
@@Loop: // perhaps use Alt-F2 to align this @@Loop point to a multiple of 4 bytes address by inserting nop above it
dec ecx
mov ebx,[eax]
lea ebx,ebx+$06C006C0 // $06C0 is the offset from '0' to the Persian-Arabic zero character ('۰'), applied to both WideChars at once
lea eax,eax+4
mov [edx],ebx
lea edx,edx+4
jnz @@Loop
pop ecx
and ecx,1
jz @@nomore
mov bx,[eax]
add bx,$6C0
mov [edx],bx
@@nomore:
pop ebx
ret
@@one:
pop ecx
mov bx,[eax]
add bx,$6C0
mov [edx],bx
pop ebx
ret
I wrote this in the forum directly. Didn't test this code...
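To see why a single 32-bit addition can convert two WideChars, here is a small Pascal sketch (the function name AddPersianOffsetToPair is hypothetical). Each 16-bit half holds a value in $0030..$0039, so adding $06C0 gives at most $06F9 and never carries from the low word into the high word.

```pascal
program PairAddDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: Pair holds two UTF-16 digits, one per 16-bit half
function AddPersianOffsetToPair(Pair: Cardinal): Cardinal;
begin
  Result := Pair + Cardinal($06C006C0);
end;

var
  P: Cardinal;
begin
  // '1' ($0031) in the low word and '2' ($0032) in the high word,
  // which is how the string '12' looks when read as a little-endian DWORD
  P := AddPersianOffsetToPair($00320031);
  Writeln(P = $06F206F1); // TRUE: U+06F1 in the low word, U+06F2 in the high word
end.
```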
Offline
Hi Dear Arnaud,
Thanks for your beautiful code... I am learning a lot from your code.
THANK YOU.
When I wrote the code I did try to avoid extra pushing and popping of registers, but unfortunately the "NewUnicodeString" function overwrites the EAX, ECX and EDX registers!
So I had to save them in order to restore the data the rest of the code needs. The PUSH/POP pairs around the "NewUnicodeString" call seem necessary, but there is another "PUSH EDX / POP EDX" pair which can be eliminated, as you've illustrated in your code.
I always prefer native pointers, even in pure Pascal, as in C/C++. But I wanted to try Delphi string manipulation in assembly, and I found it does me no favours!
I think I'll go back to pointers wherever possible. But there is no other choice in the VCL; we have to deal with the Delphi string.
As you noticed, I could have written this code to convert two WideChars at a time. I have now completed the function using your code template, and I replaced
and ecx, 1 ===> test ecx, 1
so there is no need for another [PUSH ECX - POP ECX] pair, and by merging the two sections ("@@nomore" and "@@one") into one, we get a smaller function.
function NumStrToFa(const str: string): string;
asm
test eax, eax
jz @@Null2
push eax
mov eax, [eax - 4]
test eax, eax
jz @@Null1
push eax
push edx
call System.@NewUnicodeString
pop edx
mov [edx], eax
pop ecx
pop eax
mov edx, [edx]
push ebx
test ecx, 1
jz @@Loop
shr ecx, 1
@@one:
mov bx, [eax]
add bx, $6C0
mov [edx], bx
test ecx, ecx
jz @@Quit
lea eax, eax + 2
lea edx, edx + 2
@@Loop:
dec ecx
mov ebx, [eax]
lea ebx, ebx + $06C006C0
lea eax, eax + 4
mov [edx], ebx
lea edx, edx + 4
jnz @@Loop
@@Quit:
pop ebx
ret
@@Null1:
add esp, 4
@@Null2:
end;
Please tell me about these changes... Thanks.
Last edited by Amir (2010-08-19 15:50:51)
Amir
Offline
Since there is no string with length()=0 (it's always encoded as nil), you can change:
mov eax, [eax - 4]
test eax, eax
jz @@Null1
push eax
into:
mov eax, [eax - 4]
push eax
It's not a good idea to put @@one before the main loop.
I didn't do that, so my code may look duplicated, but it was on purpose.
To achieve the best performance, you must access DWORD data on 4-byte alignment.
By putting @@one: before the main loop, you change the alignment of [eax] (which is DWORD aligned by default) to a 2-byte alignment, so the whole optimization of the main @@Loop (i.e. handling 2 WideChars at once) is void.
The new code could even perform worse than the previous version, which handled one WideChar at a time...
So you have to run the main @@Loop first, with 4-byte alignment, and then, if there is one WideChar left, process it afterwards.
I suspect that if the string is '', you should explicitly clear the result string with the appropriate function call (for AnsiString it's LStrClr; I'll let you guess for UnicodeString).
Most of the time, the result is already ''. That is the case when you use this function as a parameter to another method: the result is then a temporary string, which is initialized to '', so your code will work.
But when the result is assigned to a variable that already holds a value, that value is left in place. For example, you'll have:
s := 'something';
(....)
s := NumStrToFa('');
// here s='something' and should be '' :(
In all cases, you're learning fast!
Offline
Hi ...
Where can I read about data alignment? I know nothing about what you describe, and I would really like to learn why it is so important.
For now I can only guess at what you said about alignment; I'll change the code and post it again.
Please point me to a complete reference about it.
As you noticed, the dummy code has been removed from the start of the function:
function NumStrToFa(const str: string): string;
asm
test eax, eax
jz @@Null
push eax
mov eax, dword ptr [eax - 4]
push eax
.
.
.
The function you mentioned is "UStrClr", which should be applied at the end of the function.
That should handle the empty string explicitly.
.
.
.
@@Null:
mov eax, edx // preparing the result parameter for "UStrClr" function
jmp System.@UStrClr
end; // end of the function declaration
Realllyyyy Thankssssss again.
Last edited by Amir (2010-08-19 17:41:37)
Amir
Offline
About alignment and optimization, I found out that this official document from AMD was worth downloading:
http://support.amd.com/us/Processor_TechDocs/25112.PDF
There is also its little brother from Intel:
http://www.intel.com/assets/pdf/manual/248966.pdf
In short: on modern CPUs, data should be 16-byte aligned for SSE, and at least 4-byte aligned for DWORD access.
That is, you should read/write a DWORD from addresses $403200 $403204 $403208 $40320C $403210 ... and not from addresses $403201 $403202 $403203 $403205 $403206 $403207 $403209 ....
So you should always access DWORD data on DWORD alignment.
That's why Delphi itself aligns record fields to 4- or 8-byte boundaries:
type
TR = record
aByte: byte;
aInteger: integer;
end;
sizeof(TR) will not be 5, but 8, because aInteger will be aligned to a DWORD multiple.
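A quick way to check this is to compare the record with a packed variant. This is only a sketch, assuming the compiler's default alignment settings; TRPacked is a name invented for the comparison.

```pascal
program RecordAlignDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Same record as above: 3 padding bytes are inserted after aByte
  TR = record
    aByte: Byte;
    aInteger: Integer;
  end;

  // Hypothetical packed variant: no padding, aInteger sits at offset 1
  TRPacked = packed record
    aByte: Byte;
    aInteger: Integer;
  end;

begin
  Writeln(SizeOf(TR));       // 8 with default alignment
  Writeln(SizeOf(TRPacked)); // 5
end.
```

In the packed record the Integer field is no longer DWORD aligned, which is exactly the kind of access the alignment rule above tells you to avoid.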
Offline
I'm back... I read up on data alignment, and I also decided to study Delphi's string management mechanism, so I read the string management routines in the "system.pas" unit, and now I am aware of what happens to a string during its lifecycle... I should have done this soOoOoOoner, SORRY...
Then I found that the problem you mentioned about clearing empty strings still persisted in my code. Where I used the "NewUnicodeString" function, the string previously assigned to the result (the temp string) was never cleared, if there was one. I found another function in "system.pas" which fits my needs, called "UStrSetLength". So I changed the code to provide a DWORD-aligned function which manages strings correctly using UStrSetLength.
Now, how can I align the loop that is the main part of the function? You mentioned adding NOPs before the start of the loop, but how can I find the size of my loop to work out the number of NOPs to add? You mentioned the ALT-F12 shortcut in a comment in your code; I tried it multiple times in different situations, but nothing happened. Would you please help me again?
Thank You.
function NumStrToFa(const str: string): string;
asm
xchg eax, edx
test edx, edx
jz @@Null
push edx
mov edx, dword ptr [edx - 4]
push edx
call System.@UStrSetLength
mov edx, eax
pop ecx
pop eax
mov edx, dword ptr [edx]
push ebx
shr ecx, 1
jz @@one
@@Loop:
dec ecx
mov ebx, dword ptr [eax]
lea ebx, ebx + $06C006C0
lea eax, eax + 4
mov dword ptr [edx], ebx
lea edx, edx + 4
jnz @@Loop
@@one:
mov ebx, dword ptr [eax]
test bx, bx
jz @@Quit
add ebx, $6C0
mov dword ptr [edx], ebx
@@Quit:
pop ebx
ret
@@Null:
jmp System.@UStrClr
end;
Please let me know your idea.
Cheers ...
Last edited by Amir (2010-08-19 20:39:33)
Amir
Offline
Now, how can I align the loop that is the main part of the function? You mentioned adding NOPs before the start of the loop, but how can I find the size of my loop to work out the number of NOPs to add? You mentioned the ALT-F12 shortcut in a comment in your code; I tried it multiple times in different situations, but nothing happened. Would you please help me again?
Sorry, it's Alt-F2.
You'll get the full disassembly of your assembler code, and on the left side of the window you'll have the starting address of every instruction.
Just add NOP instructions before @@Loop so that @@Loop is 4-byte aligned.
And you're 100% right about using UStrSetLength instead of NewUnicodeString alone.
The fastest option will be UStrClr + NewUnicodeString, because UStrSetLength performs a resize and is therefore slower.
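The number of NOPs follows directly from the address of the @@Loop label shown in the disassembly. A small sketch of the arithmetic (NopsNeeded is a hypothetical helper, and the addresses are only sample values):

```pascal
program NopCountDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: how many one-byte NOPs move LabelAddr
// up to the next multiple of 4 (0 if it is already aligned)
function NopsNeeded(LabelAddr: Cardinal): Cardinal;
begin
  Result := (4 - (LabelAddr and 3)) and 3;
end;

begin
  Writeln(NopsNeeded($403200)); // 0: already DWORD aligned
  Writeln(NopsNeeded($403203)); // 1
  Writeln(NopsNeeded($403201)); // 3
end.
```

Only the low two bits of the label address matter; the size of the loop body plays no part in this calculation.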
Offline
Thanks Dear Arnaud.
I put a breakpoint at the start of @@Loop and, with the Delphi disassembler stopped at that point, I could simply calculate the size of the loop, which was 11 bytes; it needed only one NOP instruction to be DWORD aligned.
.
.
.
nop
@@Loop:
dec ecx
mov ebx, dword ptr [eax]
lea ebx, ebx + $06C006C0
lea eax, eax + 4
mov dword ptr [edx], ebx
lea edx, edx + 4
jnz @@Loop
.
.
.
I'll also replace the UStrSetLength with combination of UStrClr and NewUnicodeString as you advised to optimize the function.
All Thanks for ur help.
Cheers.
Amir
Offline
Great work!
What do you mean by "the size of the loop"? It's not the size of the loop you have to align, but the address of the @@Loop label, which must be DWORD aligned.
Could you post the resulting code here?
If you use NewUnicodeString, the "call UStrClr" could then be made in all cases, right at the beginning of the function.
It could be interesting.
Thanks
Offline
Hi Dear Arnaud
Here's the new code, using UStrClr + NewUnicodeString, which has @@Loop at a 4-byte aligned address. I tried to get an aligned loop without adding NOPs...
And about "size"... I was trying to align both the address of the loop and the size of the loop, which I achieved by adding one extra NOP instruction before the loop start.
That was because I misunderstood a topic while studying the Intel optimization PDF you pointed me to; I realized it after reading further.
Sorry.
function NumToPersian(const str: string): string;
asm
test eax, eax
xchg eax, edx
jnz @@Continue
call System.@UStrClr
ret
@@Continue:
push edx
mov edx, [edx - 4]
push edx
push eax
push edx
call System.@UStrClr
pop eax
call System.@NewUnicodeString
pop edx
mov [edx], eax
mov edx, [edx]
pop ecx
pop eax
test ecx, 1
jz @@Prepare
inc ecx
@@Prepare:
shr ecx, 1
push ebx
@@Loop:
dec ecx
mov ebx, dword ptr [eax]
lea ebx, ebx + $06C006C0
lea eax, eax + 4
mov dword ptr [edx], ebx
lea edx, edx + 4
jnz @@Loop
pop ebx
end;
best.
Amir
Offline
By the way...
I tried to make just one call to "UStrClr", but because it clobbers the registers and I would have had to push EDX and ECX, I found it much better to include two "UStrClr" calls!
Please let me know what you think; it will guide me through the rest of the work.
thanks.
Last edited by Amir (2010-08-20 17:09:57)
Amir
Offline
Your code is getting better each time.
I've put my remarks as comment in the code.
function NumToPersian(const str: string): string;
asm
test eax, eax
xchg eax, edx
jz System.@UStrClr // if it works, it's faster
push edx
mov edx, [edx - 4]
push edx
push eax
push edx
call System.@UStrClr
pop eax
call System.@NewUnicodeString
pop edx
mov [edx], eax
mov edx, eax // faster (can be pipelined: for speed, an instruction should not depend on the ones just before it)
pop ecx
pop eax
test ecx, 1
jz @@Prepare
inc ecx // nice try, but it will change the trailing #0 into #$6C0 -> you're corrupting the Delphi string model: I think you need to handle the single char conversion separately
@@Prepare:
shr ecx, 1
push ebx
@@Loop:
dec ecx
mov ebx, dword ptr [eax]
lea ebx, ebx + $06C006C0
lea eax, eax + 4
mov dword ptr [edx], ebx
lea edx, edx + 4
jnz @@Loop
pop ebx
end;
One more remark: you don't have to aim for shorter code at all costs.
Shorter code is not necessarily faster.
If your code is really big, i.e. it won't fit in the L1 instruction cache, that's not good.
But sometimes code that is a bit longer but makes better use of pipelining will be faster in practice.
And don't forget that on modern CPUs, the code you write is not what actually gets executed: there is an internal conversion to micro-ops, multiple pipelines, caches, etc. That's why you need to read the documents from Intel & AMD.
In conclusion, having some duplicated code to handle a single char conversion is not a problem, IMHO.
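The structure suggested here (pairs first, then a separate step for an odd trailing char) can be sketched in pure Pascal with pointers. This is an illustration only, not a replacement for the asm version: NumStrToFaPairs is a made-up name, and the input is assumed to contain only '0'..'9'.

```pascal
program PairsThenOneDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical sketch: convert two WideChars per step, then the odd one
function NumStrToFaPairs(const S: UnicodeString): UnicodeString;
var
  Src, Dst: PCardinal;
  N: Integer;
begin
  Result := '';             // always clear any previous value first
  N := Length(S);
  if N = 0 then
    Exit;
  SetLength(Result, N);
  Src := Pointer(S);
  Dst := Pointer(Result);
  while N >= 2 do           // main loop: one DWORD = two WideChars
  begin
    Dst^ := Src^ + Cardinal($06C006C0);
    Inc(Src);
    Inc(Dst);
    Dec(N, 2);
  end;
  if N = 1 then             // odd length: write 16 bits only, so the
    PWord(Dst)^ := PWord(Src)^ + $06C0; // trailing #0 stays untouched
end;

var
  R: UnicodeString;
begin
  R := NumStrToFaPairs('123');
  Writeln(Length(R));                   // 3
  Writeln(Ord(R[3]) = $06F3);           // TRUE
  Writeln(Ord(PWideChar(R)[3]) = 0);    // TRUE: terminator is intact
end.
```

Rounding the pair count up instead (the inc ecx idea) would make the last DWORD write cover the terminator, which is the corruption pointed out in the code comment above.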
Offline
Thanks...
I'll remember all your remarks. There are lots of things I should learn... I know nothing about pipelining etc. Besides the documents you pointed me to, I've started studying "The Intel Microprocessors ... 7th Ed - Barry B. Brey". I've decided to learn this thoroughly... I love low-level.
All thanks.
function NumToPersian(const str: string): string;
asm
test eax, eax
xchg eax, edx
jz System.@UStrClr
@@Continue:
push edx
mov edx, [edx - 4]
push edx
push eax
push edx
call System.@UStrClr
pop eax
call System.@NewUnicodeString
pop edx
mov [edx], eax
mov edx, eax
pop ecx
pop eax
shr ecx, 1
push ebx
jz @@One
nop
@@Loop:
dec ecx
mov ebx, dword ptr [eax]
lea ebx, ebx + $06C006C0
lea eax, eax + 4
mov dword ptr [edx], ebx
lea edx, edx + 4
jnz @@Loop
@@One:
mov ebx, dword ptr [eax]
test bx, bx
jz @@Quit
add ebx, $6C0
mov dword ptr [edx], ebx
@@Quit:
pop ebx
end;
Last edited by Amir (2010-08-20 20:41:24)
Amir
Offline
Hi Dear Arnaud.
And Thanks for every thing...
One more question: does Delphi itself align a for, while or repeat/until loop appropriately?
Amir
Offline
No, the compiler does not yet perform any code alignment inside the code.
Only the beginnings of methods/functions/procedures are somewhat aligned (to 4- or 8-byte address boundaries).
As I remember, the upcoming Delphi XE will have a {$CODEALIGN ....} directive to implement such alignment.
But aligning for/while/repeat loops is not performed, and won't be in XE (as far as I remember).
Take a look at the FastCode site: even if it's not updated nowadays, there is very interesting stuff there about optimization and BASM.
Offline