Announcement

Collapse
No announcement yet.

assembler format$ question

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • assembler format$ question

    Hi everyone, I am trying to convert up to 6-digit positive
    integers to strings quickly (for the same random # generator I've
    been obsessing over for a month) and I even tried learning a bit
    of assembler to do it. The assembler helped other (easier)
    parts of the program very much, (the inline assembler is quewool!
    I'd never used it, but am gonna use it more 'cause it seems that
    even clunky asm moves beaucoup mail) but I'm stuck on
    the: s$ = FORMAT$(sixDigit&)

    I really would like it to do this:
    s$ = FORMAT$(upToSixDigit&, "000000")
    (for any integers less than 6 digits too)

    but
    s$ = FORMAT$(sixDigit&) would be excellent.

    Would any assembler fans know of a way to do this quickly?
    Welp, gotta say it sooner or lat.. sooner! By "quickly" i mean
    a factor of ah.. 15 maybe? do I hear 20?

    If yer still reading, whilst I'm asking assembly questions,
    can the following be done in less ticks? Is my code the most
    newbie looking ASM you've seen? I've learned fiv.. no six! opcodes
    so far.

    I'm adding a bunch of
    dwords. Here is a 4 dword example:

    Code:
    LOCAL dw AS DWORD, dw2 AS DWORD
    LOCAL dw3 AS DWORD, dw4 AS DWORD
    LOCAL hiWrd AS DWORD, loWrd AS DWORD 
    
    !MOV EAX, dw 
    !ADD EAX, dw2  ;add them then
    !ADC EDX, 0    ;keep track of possible carry in EDX
    !ADD EAX, dw3  ;add another then
    !ADC EDX, 0    ;keep adding possible carrys in EDX
    !ADD EAX, dw4  ;add another then
    !ADC EDX, 0    ;keep adding possible carrys in EDX
    !MOV hiWrd, EDX ;hi dword of answer
    !MOV loWrd, EAX ;lo dword of answer


    [This message has been edited by John Gleason (edited January 05, 2004).]

  • #2
    John,
    format a 6 digit number.

    Paul.
    Code:
    $REGISTER NONE
    FUNCTION PBMAIN()
     
     
    x$="000000"     'where to put the answer
    xp&=STRPTR(x$)  'a pointer so ASM can find it
    n&=123456       'the number to convert
    
    'the obvious way
    !pushad         'save registers
      
    !mov eax,n&     'get number
    !mov ecx,5      '6 digits (start at 0)
    lp:
    !mov edx,0      'MSBs of 64 bit division in EDX:EAX
    !mov ebx,10     'convert to decimal
    !mov esi,xp&    'point to result
    
    !div ebx        'do a digit
    !add edx,48     'convert to ASCII
    !mov [esi+ecx],dl   'store it in result
    !dec ecx        'next digit
    !jns lp         'finished?
     
    !popad          'restore registers
    
    MSGBOX x$
     
    END FUNCTION

    ------------------

    Comment


    • #3
      John,
      oops. I was in a hurry when I posted it and, although it works, I forgot to move the 2 lines :
      Code:
      !mov ebx,10     'convert to decimal
      !mov esi,xp&    'point to result
      to above LP: as they should be set up once before the loop, not everytime through it.

      Also, only ESI and EBX need to be saved on the stack so I should have had..
      Code:
      !push esi
      !push ebx
       
      !mov eax,n&     'get number
      !mov ecx,5      '6 digits (start at 0)
      !mov ebx,10     'convert to decimal
      !mov esi,xp&    'point to result
        
      lp:
      !mov edx,0      'MSBs of 64 bit division in EDX:EAX
      !div ebx        'do a digit
      !add edx,48     'convert to ASCII
      !mov [esi+ecx],dl   'store it in result
      !dec ecx        'next digit
      !jns lp         'finished?
       
      !pop ebx
      !pop esi
      <<By "quickly" i mean a factor of ah.. 15 maybe? do I hear 20?>>

      Sorry again but I make that about 3200 times faster on my machine that FORMAT$, I hope that's not a problem
      There are faster ways still to do this if you need more than 6 digits.

      Paul.

      Comment


      • #4
        Originally posted by Paul Dixon:
        Code:
        !dec ecx        'next digit
        !jns lp         'finished?
         
        !pop ebx
        !pop esi
        <<By "quickly" i mean a factor of ah.. 15 maybe? do I hear 20?>>

        Sorry again but I make that about 3200 times faster on my machine that FORMAT$, I hope that's not a problem
        There are faster ways still to do this if you need more than 6 digits.

        Paul.
        Wow, thanks Paul! 3200 times faster. I'm gonna stand several
        feet back when I first run it on my computer just in case
        there's an explosion.

        Comment


        • #5
          John,
          no need to stand back, it's probably safe..

          A slightly more useable version follows. This allows you to convert upto 18 digits. The core code runs in 25 clock cycles on my CPU for the full 18 character conversion, but the packaging in a self contained function slows it down to about 1/10th that speed. Still, it's 150 times faster than your original FORMAT$(n&,"000000").

          Notice a slight difference in the way it's called. Since you're just specifying the number of digits, I used a number instead of a string of "0"s which would need counting.

          Paul.
          Code:
          $REGISTER NONE
          FUNCTION asmformat$(n&&,digits&)
          REM return the right most digits& of n&& as an ASCII string including leading 0s.
          REM n&& MUST be 18 decimal digits or less and unsigned.
          REM digit& MUST be 1 to 18
            
          x$="000000000000000000"    'where to assemble the answer
          xp&=STRPTR(x$)             'a pointer so ASM can find it
            
          !pushad                 'save registers
          !sub esp,12             'create a bit of workspace on the stack
          !mov edi,esp            'point edi at workspace
           
          '!fild  n&&              'load the data into FPU
          !mov eax,n&&            'must dereference it since it was passed as a parameter
          !fild qword [eax]
           
          !fbstp tbyte [edi]      'convert data to BCD and save it
           
          !mov esi,xp&            'pointer to result string
           
          !mov ecx,1              'loop counter for the 2 DWORDS holding the result
          !mov edx,0              'need to count backwards too as string is stored the opposite way to integer
          lp:
          !mov eax,[edi+edx*4+1]  'do conversion 4 low nibbles at a time
          !and eax,&h0f0f0f0f
          !add eax,&h30303030
          !mov [ecx*8+esi+7],al
          !shr eax,8
          !mov [ecx*8+esi+5],al
          !shr eax,8
          !mov [ecx*8+esi+3],al
          !shr eax,8
          !mov [ecx*8+esi+1],al
          !shr eax,8
           
          !mov eax,[edi+edx*4+1]  'and 4 high nibbles at a time
          !and eax,&hf0f0f0f0
          !shr eax,4
          !add eax,&h30303030
          !mov [ecx*8+esi+6],al
          !shr eax,8
          !mov [ecx*8+esi+4],al
          !shr eax,8
          !mov [ecx*8+esi+2],al
          !shr eax,8
          !mov [ecx*8+esi],al
          !shr eax,8
           
          !inc edx
          !dec ecx                 'finished 2 DWORDs?
          !jns lp
           
          !mov eax,[edi]           'yes, now do the 2 left over digits that didn't fit (18 digits)
          !and eax,&h0f
          !add eax,&h30
          !mov [esi+17],al
          !mov eax,[edi]
          !and eax,&hf0
          !shr eax,4
          !add eax,&h30
          !mov [esi+16],al
           
          !add esp,12         'remove workspace from stack
           
          !popad              'restore registers
            
          FUNCTION= RIGHT$(x$,digits&)
          END FUNCTION
          
          FUNCTION PBMAIN()
          
          n&&=124578
            
          x$=asmformat$(n&&,6)
          'x$=format$(n&&,"000000")
          
          MSGBOX x$
          
          END FUNCTION

          Comment


          • #6
            Originally posted by Paul Dixon:
            John,
            A slightly more useable version follows. This allows you to convert upto 18 digits. The core code runs in 25 clock cycles on my CPU for the full 18 character conversion, but the packaging in a self contained function slows it down to about 1/10th that speed. Still, it's 150 times faster than your original FORMAT$(n&,"000000").

            Notice a slight difference in the way it's called. Since you're just specifying the number of digits, I used a number instead of a string of "0"s which would need counting.


            x$=asmformat$(n&&,6)
            'x$=format$(n&&,"000000")

            MSGBOX x$

            END FUNCTION
            I may have been on the right track reading about BCD for the
            past couple days after all, but I had a LONG way to go. That
            DIV in the 1st algo was a real shocker tho. The new code is one
            sweet algorithm too, approaching a 4000x speed boost! It's a couple
            zeros faster than I thought possible.

            I may try another new (to me) item to avoid the 10x function call speed hit--a Macro.
            The reason for my extra worry 'bout speed is that it's a multiple
            1000's loop nested in a multiple 1000's loop. But the function
            is excellent, and I'll put it in a DLL and use it from now on
            whenever needed.


            ------------------

            Comment


            • #7
              John,
              you need to get a copy of the Intel Architecture Software Development Manual Volume 2.
              If you don't have it, get it from http://www.intel.com/design/pentium4/manuals/245471.htm

              I'll tell you all the ASM instructions and exactly what they do.

              The DIV instruction divides EDX:EAX by the number and leaves the remainder in EDX which is what we need.
              However, the FBSTP instruction does that in hardware for all 18 digits in about 17 clock cycles so it saves a lot of work!


              If I'd known you wanted an inline version, I wouldn't have packaged it as FUNCTION!
              The inline version can be mad to run in 25 CPU clock cycles on my K6-III but the byte data must be aligned in memory correctly (on a 32 byte boundary).

              I'll leave you to convert it to a macro, it's not difficult. The only place you'll likely get into trouble in the lines:
              '!fild n&& 'load the data into FPU
              !mov eax,n&& 'must dereference it since it was passed as a parameter
              !fild qword [eax]

              For inline use, delete the last 2 lines and uncomment the first as n&& will not be a parameter of a FUNCTION anymore so it should be accessed directly rather than looked up using its address.

              Good luck.

              Paul.

              Comment


              • #8
                Originally posted by Paul Dixon:

                If I'd known you wanted an inline version, I wouldn't have packaged it as FUNCTION!
                The inline version can be mad to run in 25 CPU clock cycles on my K6-III but the byte data must be aligned in memory correctly (on a 32 byte boundary).
                Thanks fer the great & commented code Paul. The function code was
                not done in vain; it may still be what I use on the nested loop. It's
                certainly gonna be what I use for 90+ % of future format$ needs
                since it's been a nagging problem in many tick sensitive loops
                for me lately.

                I often do array substitutions for smaller integer ranges:
                DIM sConvert(32767) AS STRING * 6
                s$ = sConvert(s&) where s& = VAL(s$) 'pretty fast but requires
                'initialization, memory

                But, six digits made sConvert(sixDigits&) a six million byte
                array and slowed the whole program to a crawl:
                DIM sConvert(999999) AS STRING * 6
                s$ = sConvert(s&) 'and go have a coffee waiting for this one.

                But now, 2004, I have
                s$ = asmFormat$(987898, 6) 'superhero fast
                and as a bonus, if I want:
                s$ = asmFormat$(485865747437370, 15) !!!
                Yep, life is good again.

                Comment


                • #9
                  John (and others)
                  a slight "oops!".

                  How efficient my CPU appeared to be recently has been nagging me.

                  Turns out that the timing program I've used for years had a slight bug in it and divided the results by 10 when it shouldn't have!

                  Sorry if I've misled anyone with how fast my code was! It's still quick, just not THAT quick.

                  Paul.

                  Comment


                  • #10
                    Paul,

                    Not misled ... looking for a clock cycle timing program myself.
                    Found some to look at, at http://www.agner.org/assem/
                    Plan to look at int ReadClock (void) to see if can do.
                    FYI also there is an updated Pentium Optimization Manual, now in
                    .pdf format.

                    Just one question, (18 digit_>$ code) why the surplus instructions,

                    !shr eax,8

                    two places, since the next instruction reloads eax anyway?

                    Also, why not save al, then save ah, then do one !shr eax,16,
                    instead of the two in-line shift 8's?

                    No real timing difference in the crude measurements here, until I
                    can check cycles with some code.

                    Regards

                    ------------------
                    Rick Angell

                    [This message has been edited by Richard Angell (edited January 06, 2004).]
                    Rick Angell

                    Comment


                    • #11
                      Rick,
                      The code I use to time short routines is as follows..
                      Code:
                      'PBCC1.0
                      $REGISTER NONE
                      FUNCTION PBMAIN()
                      REM a routine TO time things IN clock cycles
                       
                      limit&=10
                      DIM c&&(limit&)
                       
                      %RDTSC=&h310f
                       
                      a&&=0:b&&=0
                      '######################
                      'do any setting up of variables here
                       
                       
                       
                      'end of setting up
                      '########################
                       
                       
                       
                      FOR r& = 1 TO limit&
                       
                      !pushad
                      !cli
                      !cpuid
                      !dw %RDTSC
                      !mov a&&[4],edx
                      !mov a&&,eax
                       
                      '#################
                      'place the code to time here.
                       
                       
                       
                       
                       
                      'end of timed code
                      '#################
                       
                       
                      !cpuid
                      !dw %RDTSC
                      !mov b&&[4],edx
                      !mov b&&,eax
                      !sti
                      !popad
                       
                      c&&(r&)=(b&&-a&&)
                      NEXT
                       
                      FOR r& = 1 TO limit&
                          PRINT r&,c&&(r&)
                          NEXT
                       
                      WAITKEY$
                      END FUNCTION
                      This gives results in CPU clock cycles. Run it with no "timed code" first to find how long the timing code itself takes and subtract it from the final result.
                      The !cli and !sti instructions may need to be removed if running in WinXP or other OS that doesn't allow them. They work fine in Win98.

                      It's done 10 times so you can see how consistent it is. The first timing is usually longer as caches need to be filled.
                      There are sometimes odd times among the other 9 when the OS steps in to do something, especially if the !STI and !cli are not included.

                      Believe it or not, your other suggestions (store AH and remove the surplus SHRs) have already been done, I posted the code in a hurry and I didn't want to clutter the thread up with minor improvements..

                      Paul.
                      The Edit:
                      doing those minor improvements and unrolling the loop drops the time from 251 to 222 clk cycles on my K6-III, a saving of 29 clks (72nano seconds) for an 18 digit number(!).

                      [This message has been edited by Paul Dixon (edited January 06, 2004).]

                      Comment


                      • #12
                        Thanks Paul,

                        Some where, when I was looking earlier today I saw some cautions
                        on the RDTSSC method, but have not relocated that piece. For lurkers
                        here is a link to Intel's 1997 document on the instruction
                        http://cedar.intel.com/software/idap...f/rdtscpm1.pdf

                        Looks like the 10X runs is right on ... will be trying your
                        code with some work I'm doing ... after the other fires here are out

                        Added: the raw timimg code is returning 65 on my AMD XP2400

                        Regards.

                        ------------------
                        Rick Angell

                        [This message has been edited by Richard Angell (edited January 06, 2004).]
                        Rick Angell

                        Comment


                        • #13
                          Rick,
                          I'm not aware of any problems with RDTSC (except that the opcode doesn't exist in the PB assembler which is why I need the %RDTSC equate)!)

                          Beware of putting too much effort into optimising code to save every last clock cycle, different CPUs run differntly so saving a few clks on a one may waste a few on another.
                          Also, alignment in memory of both code and data become apparent when counting clks. Just moving the code by inserting a few NOPs can speed it up by 15%.

                          The raw timing code runs in 35clks on my K6-III, but it runs in 10 clks without the CPUID serialising instruction.
                          Perhaps it takes longer on yours because serializing a more modern CPU involves a lot more work as the pipelines and caches are bigger.


                          I did also try a MMX version of this FORMAT$ but it (unexpectedly) ran slower. I think the switching between MMX and FPU is too slow on my old CPU. A more modern CPU may benefit from MMX.

                          Paul.

                          Comment


                          • #14
                            Paul,

                            Late last night had a look at your code versus some other samples.

                            IOW, your code is the better of the bunch, since the process is well
                            done with good setup and exit. I hope to change my MotherBd in the
                            next weeks and re-test. Not to sure this one is doing the best by the
                            CPU.

                            Your other points in the last post well taken and are right on.

                            Thanks


                            ------------------
                            Rick Angell

                            [This message has been edited by Richard Angell (edited January 07, 2004).]
                            Rick Angell

                            Comment


                            • #15
                              http://www.powerbasic.com/support/pb...ad.php?t=23412

                              ------------------
                              bernard ertl
                              interplan systems - project management software, project planning software
                              Bernard Ertl
                              InterPlan Systems

                              Comment

                              Working...
                              X