Hi all,
Here is an assembly function I have written that converts a numeric string to its Persian-Arabic equivalent.
For example: '1234567890' ==> '۱۲۳۴۵۶۷۸۹۰'
NOTE: all characters of the string parameter are assumed to be numeric, so no further checking seems necessary.
Your ideas are welcome to help a newcomer amateur programmer optimize this little piece of code.
Thanks.
function NumStrToFa(const str: string): string;
asm
      test  eax, eax               // check if the string is null
      jz    @@Null2
      push  eax                    // save the input string pointer to stack
      mov   eax, [eax - 4]      // length of the input string
      test  eax, eax              // checking for empty string
      jz    @@Null1
      push  eax                   // saving the length of input string to stack
      push  edx                   // saving the @Result
      call  System.@NewUnicodeString   //creating a new string which forms the result
      pop   edx                                  // retrieving the @Result         
      mov   [edx], eax                        // assign the new string to @Result
      pop   ecx                                  // retrieving the Length of string
      pop   eax                                  // retrieving the input string pointer
      push  edx                                 // saving the Result string
      mov   edx, [edx]                        // forming the edx register to be used in the loop.
      push  ebx
@@Loop:
      mov   bx,  word ptr [eax + ecx * 2 - 2]
      add   bx,  6C0h             // $6C0 = Ord('۰') - Ord('0'), the offset from ASCII digits to Persian-Arabic digits
      mov   word ptr [edx + ecx * 2 - 2], bx
      Loop  @@Loop
      pop   ebx
      pop   edx                                             // having the Result back
      ret
@@Null1:
      add   esp, 4
@@Null2:
end;
Amir
Offline
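For comparison with the assembly version above, here is a minimal pure-Pascal sketch of the same conversion. It is not the poster's code: the name NumStrToFaRef and the constant DigitOffset are made up for illustration, and like the original it assumes every character is an ASCII digit. It can serve as a reference when checking the output of the asm routine.

```pascal
program NumStrToFaDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

// Hypothetical reference implementation (illustration only).
// Assumes every character of S is an ASCII digit '0'..'9'.
function NumStrToFaRef(const S: UnicodeString): UnicodeString;
const
  DigitOffset = $06C0; // U+06F0 (Persian zero) minus U+0030 (ASCII '0')
var
  i: Integer;
begin
  SetLength(Result, Length(S)); // an empty input gives an empty result
  for i := 1 to Length(S) do
    Result[i] := WideChar(Ord(S[i]) + DigitOffset);
end;

var
  R: UnicodeString;
begin
  R := NumStrToFaRef('1234567890');
  Assert(Length(R) = 10);
  Assert(Ord(R[1]) = $06F1);  // Persian digit one
  Assert(Ord(R[10]) = $06F0); // Persian digit zero
  Assert(NumStrToFaRef('') = '');
  Writeln('ok');
end.
```

Because SetLength always assigns the result, this version does not depend on what the result variable held before the call.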
Your code is great.
Why do you save edx? I'm not sure you need to. 
Perhaps you could convert two WideChars at once, like this:
....
push ecx
shr ecx,1
jz @@one
@@Loop: // perhaps use Alt-F2 to align this @@Loop point to a multiple of 4 bytes address by inserting nop above it
dec ecx
mov ebx,[eax]
lea ebx,ebx+$06C006C0   // $6C0 = Ord('۰') - Ord('0'), added to both WideChars at once
lea eax,eax+4
mov [edx],ebx
lea edx,edx+4
jnz @@Loop
pop ecx
and ecx,1
jz @@nomore
mov bx,[eax]
add bx,$6C0
mov [edx],bx
@@nomore:
pop ebx
ret
@@one:
pop ecx
mov bx,[eax]
add bx,$6C0
mov [edx],bx
pop ebx
ret
I wrote this directly in the forum and didn't test this code...
Offline
Hi Dear Arnaud,
Thanks for your beautiful code... I am learning very much from your code.
THANK YOU.
When I wrote the code I checked whether I could avoid the extra pushing and popping of registers, but unfortunately the "NewUnicodeString" function modifies the EAX, ECX and EDX registers!
So I had to save them in order to retrieve the data needed by the rest of the code. The PUSH/POP pair surrounding the "NewUnicodeString" call seems necessary, but there is another "PUSH EDX / POP EDX" pair that can be eliminated, as your code illustrates.
I always prefer native pointers, even in pure Pascal, as in C/C++. But I wanted to test Delphi string manipulation in assembly, and I found that it does me no favours!
I think I'll go back to pointers wherever possible... But there is no other choice in the VCL; we have to deal with the Delphi string.
As you noticed, I could have written the code to convert two WideChars at a time. I have now completed the function using your code template, and I replaced
and ecx, 1 ===> test ecx, 1
so there is no need for another [PUSH ECX - POP ECX] pair, and by merging the two sections ("@@nomore", "@@one") into one, we get a smaller function.
function NumStrToFa(const str: string): string;
asm
      test  eax, eax
      jz    @@Null2
      push  eax
      mov   eax, [eax - 4]
      test  eax, eax
      jz    @@Null1
      push  eax
      push  edx
      call  System.@NewUnicodeString
      pop   edx
      mov   [edx], eax
      pop   ecx
      pop   eax
      mov   edx, [edx]
      push  ebx
      test  ecx, 1
      jz    @@Loop
      shr   ecx, 1
@@one:
      mov  bx, [eax]
      add   bx, $6C0
      mov  [edx], bx
      test  ecx, ecx
      jz    @@Quit
      lea   eax, eax + 2
      lea   edx, edx + 2
@@Loop:
      dec   ecx
      mov   ebx, [eax]
      lea   ebx, ebx + $06C006C0
      lea   eax, eax + 4
      mov   [edx], ebx
      lea   edx, edx + 4
      jnz   @@Loop
@@Quit:
      pop   ebx
      ret
@@Null1:
      add   esp, 4
@@Null2:
end;
Please tell me what you think about these changes... Thanks.
Last edited by Amir (2010-08-19 15:50:51)
Amir
Offline
Since there is no string with length()=0 (it's always encoded as nil), you can change:
 
      mov   eax, [eax - 4]
      test  eax, eax
      jz    @@Null1
      push  eax
into:
 
      mov   eax, [eax - 4]
      push  eax
It's not a good idea to put the @@one before the main loop.
I didn't do that, so my code may look duplicated, but it was on purpose.
In order to achieve the best performance, you must access DWORD data with 4-byte alignment.
By putting the @@one: before the main loop, you change the alignment of [eax] (which is DWORD aligned by default) into a 2-byte alignment, so the whole optimization of the main @@Loop (i.e. handling 2 WideChars at once) is void.
The new code could even perform worse than the previous version, which dealt with one WideChar at a time...
So you have to run the main @@Loop first, with 4-byte alignment, and then, if there is one WideChar left, process it afterwards.
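The ordering described here (aligned pairs first, the odd WideChar last) can be sketched in plain Pascal. This is only an illustration under stated assumptions: NumStrToFaPairs is a hypothetical name, the input is assumed to contain only ASCII digits, and the string data is assumed to start on a DWORD boundary as stated in this post.

```pascal
program TwoCharsAtOnce;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

// Hypothetical sketch: convert two WideChars per 32-bit addition,
// then handle the remaining odd WideChar (if any) separately.
function NumStrToFaPairs(const S: UnicodeString): UnicodeString;
const
  OneOffset = $06C0;
  PairOffset = Cardinal($06C006C0); // OneOffset in both 16-bit halves
var
  Src, Dst: PCardinal;
  Len, i: Integer;
begin
  Len := Length(S);
  SetLength(Result, Len);
  if Len = 0 then
    Exit;
  Src := PCardinal(Pointer(S));
  Dst := PCardinal(Pointer(Result));
  // Main loop first: every access starts on a 4-byte boundary.
  for i := 1 to Len shr 1 do
  begin
    Dst^ := Src^ + PairOffset; // digits never carry into the upper half
    Inc(Src);
    Inc(Dst);
  end;
  // Tail: a single 16-bit access, so the #0 terminator is left alone.
  if Odd(Len) then
    PWideChar(Dst)^ := WideChar(Ord(PWideChar(Src)^) + OneOffset);
end;

var
  R: UnicodeString;
  i: Integer;
begin
  R := NumStrToFaPairs('12345');
  Assert(Length(R) = 5);
  for i := 1 to 5 do
    Assert(Ord(R[i]) = $06F0 + i);
  Assert(PWideChar(R)[5] = #0); // terminator still intact
  Writeln('ok');
end.
```

Putting the single-character step first would shift every following 32-bit access to an address that is only 2-byte aligned, which is the situation warned about above.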
I suspect that if the string is '', you should explicitly clear the result string with the appropriate function call (for AnsiString it's LStrClr; I'll let you guess for UnicodeString).
Most of the time, the result is already ''. This is the case when you use this function as a parameter to another method: the result is then a temporary string, which is initialized to ''. So your code will work there.
But when the result variable already holds a value, it won't. For example, you'll have:
  s := 'something';
 (....)
  s := NumStrToFa('');
  // here s='something' and should be '' :(
In all cases, you're learning fast!
Offline
Hi...
Where can I read about data alignment? I know nothing about what you describe, and I would really like to learn why it is so important.
For now I can only guess at what you mean by alignment; I'll change the code accordingly and post it again.
Please point me to a complete reference about it.
As you suggested, the unneeded code has been removed from the start of the function:
function NumStrToFa(const str: string): string;
asm
      test  eax, eax
      jz    @@Null
      push  eax
      mov   eax, dword ptr [eax - 4]
      push  eax
      .
      .
      .
The function you mentioned is "UStrClr", which should be applied at the end of the function.
That should handle the empty string explicitly.
   .
   .
   .
@@Null:
      mov   eax, edx               //  preparing the result parameter for "UStrClr" function
      jmp   System.@UStrClr
end;                                  // end of the function declaration
Realllyyyy Thankssssss again.
Last edited by Amir (2010-08-19 17:41:37)
Amir
Offline
About alignment and optimization, I found out that this official document from AMD was worth downloading:
http://support.amd.com/us/Processor_TechDocs/25112.PDF
It has a little brother from Intel:
http://www.intel.com/assets/pdf/manual/248966.pdf
In short: on modern CPUs, data should be 16-byte aligned for SSE, and at least 4-byte aligned for DWORD access.
That is, you should read/write a DWORD from addresses $403200 $403204 $403208 $40320C $403210 ... and not from addresses $403201 $403202 $403203 $403205 $403206 $403207 $403209 ....
So you should always access DWORD data with DWORD alignment.
That's why Delphi itself aligns record fields to 4 or 8 byte boundaries:
type
  TR = record
    aByte: byte;
    aInteger: integer; 
  end;
sizeof(TR) will not be 5, but 8, because aInteger will be aligned to a DWORD multiple.
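The record layout claim can be checked with a few lines of Pascal. This is a small sketch assuming the compiler's default record alignment settings; the TPackedR type is added here only for contrast and is not part of the original post.

```pascal
program RecordAlignment;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

type
  TR = record
    aByte: Byte;
    aInteger: Integer;
  end;

  // Same fields without padding (illustration only).
  TPackedR = packed record
    aByte: Byte;
    aInteger: Integer;
  end;

var
  R: TR;
begin
  // Three padding bytes follow aByte so that aInteger starts at offset 4.
  Assert(SizeOf(TR) = 8);
  Assert(NativeUInt(@R.aInteger) - NativeUInt(@R) = 4);
  // With "packed", aInteger starts at offset 1 and is no longer DWORD aligned.
  Assert(SizeOf(TPackedR) = 5);
  Writeln('ok');
end.
```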
Offline
I'm back... I read something about data alignment, and I also decided to study the Delphi string management mechanism, so I read the string management routines in the "system.pas" unit. Now I am aware of what happens to a string during its lifecycle... I should have done this much sooner. SORRY...
Then I found that the problem you mentioned about clearing empty strings still persists in my code. Where I used the "NewUnicodeString" function, the string previously assigned to the result (a temporary string) was never cleared, if there was one. I found another function in "system.pas" that fits my needs, called "UStrSetLength". So I changed the code to provide a DWORD-aligned function that manages strings correctly using UStrSetLength.
Now, how can I align the loop that is the main part of the function? You suggested adding NOPs before the start of the loop, but how can I find the size of my loop to determine the number of NOPs to add? You mentioned an ALT-F12 shortcut in a comment in your code; I tried it several times in different situations, but nothing happened. Would you please help me again?
Thank you.
function NumStrToFa(const str: string): string;
asm
      xchg   eax, edx
      test   edx, edx
      jz      @@Null
      push  edx
      mov   edx, dword ptr [edx - 4]
      push  edx
      call    System.@UStrSetLength
      mov   edx, eax
      pop    ecx
      pop    eax
      mov    edx, dword ptr [edx]
      push   ebx
      shr     ecx, 1
      jz       @@one
@@Loop:
      dec    ecx
      mov   ebx, dword ptr [eax]
      lea     ebx, ebx + $06C006C0
      lea     eax, eax + 4
      mov   dword ptr [edx], ebx
      lea     edx, edx + 4
      jnz     @@Loop
@@one:
      mov   ebx, dword ptr [eax]
      test   bx, bx
      jz      @@Quit
      add    ebx, $6C0
      mov   dword ptr [edx], ebx
@@Quit:
      pop   ebx
      ret
@@Null:
      jmp   System.@UStrClr
end;
Please let me know your idea.
Cheers ...
Last edited by Amir (2010-08-19 20:39:33)
Amir
Offline
Amir wrote:
Now, how can I align the loop that is the main part of the function? You suggested adding NOPs before the start of the loop, but how can I find the size of my loop to determine the number of NOPs to add? You mentioned an ALT-F12 shortcut in a comment in your code; I tried it several times in different situations, but nothing happened. Would you please help me again?
Sorry, it's Alt-F2.
You'll get the full disassembly of your assembler code, and on the left side of the window you'll have the starting address of every assembler instruction.
Just add nop instructions before @@Loop so that @@Loop is 4-byte aligned.
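Once the address of @@Loop has been read from the disassembly, the number of one-byte nop instructions to insert is simple arithmetic. The helper below is a hypothetical sketch (NopsNeeded is not an RTL function), and the addresses are only sample values.

```pascal
program NopPadding;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

// Number of one-byte NOPs needed to move Address up to the next
// multiple of Alignment. Alignment must be a power of two.
function NopsNeeded(Address, Alignment: Cardinal): Cardinal;
begin
  Result := (Alignment - (Address and (Alignment - 1))) and (Alignment - 1);
end;

begin
  Assert(NopsNeeded($403200, 4) = 0); // already aligned
  Assert(NopsNeeded($403201, 4) = 3);
  Assert(NopsNeeded($403206, 4) = 2);
  Assert(NopsNeeded($403203, 4) = 1);
  Writeln('ok');
end.
```

Any change to the instructions before the label moves its address, so the count has to be checked again after each edit of the function.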
And you're 100% right about not using NewUnicodeString alone but using UStrSetLength.
The fastest will be UStrClr + NewUnicodeString, because UStrSetLength performs a resize and is therefore slower.
Offline
Thanks Dear Arnaud.
I put a breakpoint at the start of @@Loop and, with the Delphi disassembler stopped at that point, I could simply calculate the size of the loop, which was 11 bytes, so only one NOP instruction was needed for it to be DWORD aligned.
      .
      .
      . 
      nop
@@Loop:
      dec   ecx
      mov   ebx, dword ptr [eax]
      lea   ebx, ebx + $06C006C0
      lea   eax, eax + 4
      mov   dword ptr [edx], ebx
      lea   edx, edx + 4
      jnz   @@Loop
      .
      .
      .
I'll also replace UStrSetLength with the combination of UStrClr and NewUnicodeString, as you advised, to optimize the function.
Thanks for all your help.
Cheers.
Amir
Offline
Great work!
What do you mean by "the size of the loop"? It's not the size of the loop that you have to align, but the address of the @@Loop label, which must be DWORD aligned.
If you use NewUnicodeString, the "call UStrClr" could be made in all cases, just at the beginning of the function.
Could you post the resulting code here?
It could be interesting.
Thanks
Offline
Hi Dear Arnaud
Here's the new code, using UStrClr + NewUnicodeString, which has @@Loop at a 4-byte aligned address. I tried to get an aligned loop without adding NOPs...
About the "size"... I was trying to align both the address of the loop and the size of the loop, which was achieved by adding one extra NOP instruction before the start of the loop.
That was because I had misunderstood a topic while studying the Intel optimization PDF you pointed me to; I realized my mistake as I progressed in my study.
Sorry.
function NumToPersian(const str: string): string;
asm
      test  eax, eax
      xchg  eax, edx
      jnz   @@Continue
      call System.@UStrClr
      ret
@@Continue:
      push  edx
      mov   edx, [edx - 4]
      push  edx
      push  eax
      push  edx
      call  System.@UStrClr
      pop   eax
      call  System.@NewUnicodeString
      pop   edx
      mov   [edx], eax
      mov   edx, [edx]
      pop   ecx
      pop   eax
      test  ecx, 1
      jz   @@Prepare
      inc   ecx
@@Prepare:
      shr   ecx, 1
      push  ebx
@@Loop:
      dec   ecx
      mov   ebx, dword ptr [eax]
      lea   ebx, ebx + $06C006C0
      lea   eax, eax + 4
      mov   dword ptr [edx], ebx
      lea   edx, edx + 4
      jnz   @@Loop
      pop   ebx
end;
Best.
Amir
Offline
By the way...
I tried to make just one call to "UStrClr", but because it modifies so many registers I would have had to push the EDX and ECX registers, so I found it much better to include two "UStrClr" calls!
Please let me know your idea, which guides me through the rest of the work...
Thanks.
Last edited by Amir (2010-08-20 17:09:57)
Amir
Offline
Your code keeps getting better.
I've put my remarks as comments in the code.
function NumToPersian(const str: string): string;
asm
      test  eax, eax
      xchg  eax, edx
      jz    System.@UStrClr // if it works, it's faster
      push  edx
      mov   edx, [edx - 4]
      push  edx
      push  eax
      push  edx
      call  System.@UStrClr
      pop   eax
      call  System.@NewUnicodeString
      pop   edx
      mov   [edx], eax
      mov   edx, eax // faster (will be pipelined: for speed, an instruction should not depend on the previous ones)
      pop   ecx
      pop   eax
      test  ecx, 1
      jz   @@Prepare 
      inc   ecx  // nice try, but it will change the trailing #0 into #$06C0 -> you're corrupting the Delphi string model: I think you need to handle the single char conversion by itself
@@Prepare:
      shr   ecx, 1
      push  ebx
@@Loop:
      dec   ecx
      mov   ebx, dword ptr [eax]
      lea   ebx, ebx + $06C006C0
      lea   eax, eax + 4
      mov   dword ptr [edx], ebx
      lea   edx, edx + 4
      jnz   @@Loop
      pop   ebx
end;
One more remark: you don't have to strive for shorter code by all means.
Shorter code is not necessarily faster.
If your code is really big, i.e. it won't fit in the L1 instruction cache, that's not good.
But sometimes code that is a bit longer but makes better use of pipelining will be faster in practice.
And don't forget that on a modern CPU, the code you write is not the code that gets executed: there is an internal conversion to micro-ops, multiple pipelines, caches, etc. That's why you need to read the documents from Intel & AMD.
In conclusion, having some duplicated code for handling the single char conversion is not a problem, IMHO.
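The remark about the trailing #0 comes down to a single 32-bit addition. In the sketch below, AddPairOffset is a hypothetical helper that mimics what the rounded-up loop does to the last DWORD of an odd-length string, where the low word holds the last digit and the high word holds the terminator (x86 is little-endian).

```pascal
program TerminatorDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

// What the two-chars-at-once step does to one DWORD (illustration only).
function AddPairOffset(Pair: Cardinal): Cardinal;
begin
  Result := Pair + Cardinal($06C006C0);
end;

var
  LastPair, Converted: Cardinal;
begin
  // Last DWORD of an odd-length string: digit '5' ($0035) followed by #0.
  LastPair := $00000035;
  Converted := AddPairOffset(LastPair);
  Assert((Converted and $FFFF) = $06F5); // the digit is converted correctly
  Assert((Converted shr 16) = $06C0);    // but the terminator is no longer #0
  // Converting only the low word keeps the terminator intact.
  Converted := (LastPair and $FFFF0000) or ((LastPair + $06C0) and $FFFF);
  Assert((Converted shr 16) = 0);
  Writeln('ok');
end.
```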
Offline
Thanks...
I will remember all your remarks. There is a lot I should learn... I know nothing about pipelining and so on. Besides the documents you pointed me to, I've started to study "The Intel Microprocessors", 7th Ed., by Barry B. Brey. I have decided to learn this thoroughly... I love low-level programming.
All thanks.
function NumToPersian(const str: string): string;
asm
      test  eax, eax
      xchg  eax, edx
      jz    System.@UStrClr
@@Continue:
      push  edx
      mov   edx, [edx - 4]
      push  edx
      push  eax
      push  edx
      call  System.@UStrClr
      pop   eax
      call  System.@NewUnicodeString
      pop   edx
      mov   [edx], eax
      mov   edx, eax
      pop   ecx
      pop   eax
      shr   ecx, 1
      push  ebx
      jz    @@One
      nop
@@Loop:
      dec   ecx
      mov   ebx, dword ptr [eax]
      lea   ebx, ebx + $06C006C0
      lea   eax, eax + 4
      mov   dword ptr [edx], ebx
      lea   edx, edx + 4
      jnz   @@Loop
@@One:
      mov   ebx, dword ptr [eax]
      test  bx, bx
      jz    @@Quit
      add   ebx, $6C0
      mov   dword ptr [edx], ebx
@@Quit:
      pop   ebx
end;
Last edited by Amir (2010-08-20 20:41:24)
Amir
Offline
Hi Dear Arnaud.
And Thanks for every thing...
One more question: does Delphi itself align a for, while or repeat/until loop in an appropriate manner?
Amir
Offline
No, the compiler does not yet perform any code alignment inside the code.
Only the beginnings of methods/functions/procedures are somewhat aligned (to 4 or 8 byte address boundaries).
As I remember, the upcoming Delphi XE will have a {$CODEALIGN ....} directive to implement such alignment.
But aligning for/while/repeat loops is not performed, and won't be in XE (as far as I remember).
Take a look at the FastCode site: even if it's no longer updated, there is very interesting stuff there about optimization and BASM.
Offline