FPU inline asm

  • FPU inline asm

    I have a calculation-intensive program that I am writing. I was
    wondering if, in general, it is worth my time to hand-load the
    FPU with inline asm instead of relying entirely on compiled code.
    There are a few functions that the program uses a lot.

    I am very weak with asm, so it will take some extra effort on my
    part to get up the learning curve.

    If good performance improvements can be realized with asm, does
    anyone know of an algorithm to raise a real to a real power,
    e.g. 45.6^3.41, using asm and the FPU? Thanks.


  • #2
    Floating point maths are handled largely by the processor directly.
    There's little enough compiler overhead involved that rewriting the code
    in assembly language is likely to be a thankless task. I'd look to other
    techniques, like using MACROs instead of FUNCTIONs where practical.

    ------------------
    Tom Hanlin
    PowerBASIC Staff



    • #3
      Thanks for your reply.




      • #4
        James,
        check out this link. The section starting on page 804 lists the ASM code required to do what you want.
        http://webster.cs.ucr.edu/Page_asm/A...y/pdf/ch14.pdf

        Paul.



        • #5
          James,
          OK, as you say you're weak with ASM, I've converted the routine to asm suitable for compiling with PBCC 1.00.

          I've timed the BASIC and ASM versions: the BASIC version takes about 80% longer than the direct ASM, so you could almost double your throughput by using ASM.

          Paul.

          Code:
          $REGISTER NONE
          FUNCTION PBMAIN()
          
          SaveCW&=0
          MaskedCW&=0
          
          REM 45.6^3.41
          a#=45.6
          b#=3.41
          x#=0
          
          'Do it in BASIC
          c#=a# ^ b#
          
          PRINT "BASIC result= ";c#
          
          
          'Now do it in assembler
          
          !finit
          !fld a#   ;push the 2 operands onto the FP stack
          !fld b#
           
          '; YtoX(y,x)-    Computes y**x (y=st(1), x=st(0)).
          ';               This routine requires three free registers.
          ';
          ';               Y must be greater than zero.
          ';
          ';       YtoX(y,x) = 2 ** (x * lg(y))
          
          YtoX:
          !                fxch            ;Compute lg(y).
          !                fld1
          !                fxch
          !                fyl2x
          
          !                fmul            ;Compute x*lg(y).
          '!                CALL    TwoToX  ;Compute 2**(x*lg(y)).
          
                          'Don't call it, do it in-line for speed..
          '; TwoToX(x)-    Computes 2**x.
          ';               It does this by USING the algebraic identity:
          ';
          ';               2**x = 2**INT(x) * 2**FRAC(x).
          ';               We can easily compute 2**INT(x) WITH fscale AND
          ';               2**FRAC(x) USING f2xm1.
          ';
          ';               This routine requires three free registers.
          
          TwoToX:
          !                fstcw   SaveCW&
          
          '; Modify the control WORD TO truncate when rounding.
          
          !                fstcw   MaskedCW&
          '!                OR      BYTE PTR MaskedCW&+1, &b1100
          !                OR      MaskedCW&, &b110000000000
          !                fldcw   MaskedCW&
          
          
          !                fld     st(0)           ;Duplicate tos.
          !                fld     st(0)
          !                frndint                 ;Compute INTEGER portion.
          
          !                fxch                    ;SWAP whole AND INT values.
          !                fsub    st(0), st(1)    ;Compute fractional part.
          
          !                f2xm1                   ;Compute 2**FRAC(x)-1.
          !                fld1
          !                fadd                    ;Compute 2**FRAC(x).
          
          !                fxch                    ;GET INTEGER portion.
          !                fld1                    ;Compute 1*2**INT(x).
          !                fscale
          !                fstp    st(1)           ;Remove st(1) (the INTEGER portion).
          
          !                fmul                    ;Compute 2**INT(x) * 2**FRAC(x).
          !                fstp    st(1)           ;Remove the leftover copy of x*lg(y).
          
          !                fldcw   SaveCW&     ;Restore rounding mode.
           
           
          !fstp x#      ;pop the answer off the FP stack
          
          
          PRINT "ASM result   =";x#
          
          INPUT LINE d$
          
          END FUNCTION
          Edited to correct the mask used to set the rounding mode from 1100 to 1100 0000 0000.
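Why the mask had to change can be shown with a little bit arithmetic. This C sketch assumes the standard x87 layout, where rounding control (RC) occupies bits 10-11 of the 16-bit control word and FINIT loads 0x037F; the helper names are hypothetical. ORing the whole word with &b1100 only sets bits 2-3, which are exception-mask bits already set to 1 in 0x037F, so rounding is left untouched. The commented-out line from the original routine ORs &b1100 into the high byte (MaskedCW&+1), where it lands on bits 10-11; ORing the full word needs &b110000000000 instead.

```c
#include <stdint.h>

/* x87 FPU control word: bits 10-11 are the rounding-control (RC) field.
 *   RC = 00 nearest (default), 01 down, 10 up, 11 truncate (toward zero). */
#define X87_RC_SHIFT 10
#define X87_RC_MASK  (3u << X87_RC_SHIFT)   /* 0x0C00 = binary 1100 0000 0000 */

/* Same operation as "OR MaskedCW&, &b110000000000": force RC to truncate. */
uint16_t x87_set_truncate(uint16_t cw)
{
    return (uint16_t)(cw | X87_RC_MASK);
}

/* Read back the 2-bit RC field (0..3). */
unsigned x87_rounding_field(uint16_t cw)
{
    return (cw & X87_RC_MASK) >> X87_RC_SHIFT;
}
```

For example, x87_set_truncate(0x037F) gives 0x0F7F, whose RC field reads back as 3 (truncate), while 0x037F | 0x000C is still 0x037F.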


          [This message has been edited by Paul Dixon (edited August 19, 2002).]



          • #6
            James,
            after looking at the code above I realised that it's not done very efficiently.
            The code below is MUCH shorter and a bit quicker. It runs on my machine 2.1 times as fast as BASIC.
            The code for doing the x ^ y is only 14 opcodes.

            Paul.
            Code:
            $REGISTER NONE
            FUNCTION PBMAIN()
            'compare speed of a^b via BASIC and ASM
            
            SaveCW&=0
            MaskedCW&=0
            
            REM 45.6^3.41
            a#=45.6 : b#=3.41 : c#=0
            
            !finit                  ;init FPU
            ' Create the control WORD TO truncate when rounding.
            !fstcw   MaskedCW&
            !OR      MaskedCW&, &b110000000000
            !fstcw   SaveCW&        ;save a copy of the initial control word so it can be put back
            
            'time empty loop
            t!=TIMER
            FOR a# = 1 TO 100 STEP 0.01
                FOR b#=1 TO 100 STEP 0.1
            NEXT
            NEXT
            
            tt!=TIMER-t!
            
            'Do it in BASIC
            t!=TIMER
            FOR a# = 1 TO 100 STEP 0.01
                FOR b#=1 TO 100 STEP 0.1
            c#=a# ^ b#
            NEXT
            NEXT
            
            PRINT "BASIC time = ";TIMER-t!-tt!
            
            
            'Now do it in assembler
            t!=TIMER
            FOR a# = 1 TO 100 STEP 0.01
                FOR b#=1 TO 100 STEP 0.1
            
            !fld b#                 ;push the 2 operands onto the FP stack
            !fld a#
            !fyl2x                  ;Compute x*lg(y)
            
            !fldcw   MaskedCW&      ;Modify the control WORD TO truncate when rounding
            !fld     st(0)          ;Duplicate top of stack
            !frndint                ;Compute INTEGER portion.
            !fsubr    st(0), st(1)  ;Compute fractional part.
            
            !f2xm1                  ;Compute 2**FRAC(x)-1.
            !fld1
            !fadd                   ;Compute 2**FRAC(x).
            
            !fscale                 ;Compute 2**INT(x) * 2**FRAC(x).
            
            !fldcw   SaveCW&        ;Restore rounding mode.
            
            !fstp c#                ;pop the answer off the FP stack
            !fstp  st(0)            ;clean up the stack (there's 1 value left on it)
               
            NEXT
            NEXT
            
            PRINT "asm time   = ";TIMER-t!-tt!
            
            INPUT LINE d$
            
            END FUNCTION



            • #7
              Interesting. I wouldn't expect to see that much difference. I wonder
              if the two are doing quite the same thing... may bear experimenting.

              I'd be inclined to worry about the !finit here. Am I missing something,
              or did you just stomp the heck out of the FPU settings that PowerBASIC
              is expecting?

              Rather than INPUT LINE d$, think WAITKEY$. No variable needed--
              just use it as a statement.

              ------------------
              Tom Hanlin
              PowerBASIC Staff



              • #8
                Tom,
                !FINIT makes little difference to the timing; I use it out of habit. It just puts the FPU into its default configuration in case something else has messed it up beforehand.
                REMing it out still works in this case.


                The 2 are definitely doing the same thing (at least they produce the same results to the last place) for the 10 million x, y combinations I tried when testing the code.
                I'm sure I once decompiled a PBCC program to see what opcodes were used, but I can't remember how! So I can't compare the 2 versions to see where the time is lost.


                INPUT LINE d$ is just another habit. In DOS I regularly use INPUT d$ (d = dummy), and the natural way to change it to work in PBCC 1.0 was to use INPUT LINE d$. I may try to get into the WAITKEY$ habit; it's shorter.

                Paul.



                • #9
                  Thank you for your code Paul.


