64 bit floating point using SSE2

  • 64 bit floating point using SSE2

    Contrary to popular opinion, the 64 bit SSE2 scalar double instructions are really easy to use. You can write a single-operation function if that does the job for you, but the real power is in combined operations with many calculations where you are not converting back and forth. You can also use what are called "packed doubles", and between the two you have a lot of number crunching grunt that is easy enough to write.
    Code:
    ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    
        #include "\basic\include\win32api.inc"
    
        MACRO FUNCTION msqrt(DblNum)
        ! movsd xmm0, DblNum                    ; load 1st value
        ! sqrtsd xmm0, xmm0                     ; calc square root
        ! movsd DblNum, xmm0
        END MACRO = DblNum
    
    ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    
    FUNCTION PBmain as LONG
    
        LOCAL fpnum as DOUBLE
        LOCAL fpnrv as DOUBLE
    
        fpnum = 12345.67890
    
        StdOut "Original floating point value = "+format$(fpnum)
    
      ' --------------------------
    
        fpnrv = fsqrt(fpnum)
        StdOut format$(fpnrv)+" function call"
    
      ' --------------------------
    
        fpnum = 12345.67890
        fpnrv = msqrt(fpnum)                    ' macro
        StdOut format$(fpnrv)+" macro call"
    
      ' --------------------------
    
        fpnrv = fsquared(fpnrv)
        StdOut format$(fpnrv)+" squared to original"
    
      ' --------------------------
    
        StdOut $CRLF+"Calculate the volume of a cylinder"
    
        LOCAL cylinder_radius as DOUBLE
        LOCAL cylinder_depth  as DOUBLE
        LOCAL cylinder_volume as DOUBLE
    
        cylinder_radius = 16.123456789012
        cylinder_depth  = 73.210987654321
    
        StdOut "Radius = "+format$(cylinder_radius)
        StdOut "Depth  = "+format$(cylinder_depth)
    
        cylinder_volume = get_cylinder_volume(cylinder_radius,cylinder_depth)
        StdOut "cylinder_volume = "+format$(cylinder_volume)
    
        waitkey$
    
    End FUNCTION
    
    ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    
    FUNCTION fsqrt(ByVal fpnum as DOUBLE) as DOUBLE
    
        PREFIX "!"
    
        movsd xmm0, fpnum                       ; load 1st value
        sqrtsd xmm0, xmm0                       ; calc square root
        movsd fpnum, xmm0  
    
        END PREFIX
    
        FUNCTION = fpnum
    
    End FUNCTION
    
    ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    
    FUNCTION fsquared(ByVal fpnum as DOUBLE) as DOUBLE
    
        PREFIX "!"
    
        movsd xmm0, fpnum                       ; the previous square root result
        mulsd xmm0, xmm0                        ; multiplied by itself
        movsd fpnum, xmm0                       ; store xmm0 into DOUBLE
    
        END PREFIX
    
        FUNCTION = fpnum
    
    End FUNCTION
    
    ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    
    FUNCTION get_cylinder_volume(ByVal radi as DOUBLE,ByVal depth as DOUBLE) as DOUBLE
    
        LOCAL pi as DOUBLE
        LOCAL rv as DOUBLE
    
        pi = 3.141592653589793
    
        PREFIX "!"
    
        movsd xmm0, radi                        ; load radius
        mulsd xmm0, xmm0                        ; radius squared
        mulsd xmm0, pi                          ; * pi
        mulsd xmm0, depth                       ; * depth
        movsd rv, xmm0                          ; result written to DOUBLE
    
        END PREFIX
    
        FUNCTION = rv
    
    End FUNCTION
    
    ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    Here are some of the scalar double instructions.
    Code:
        EXAMPLE
            movsd xmm0, xmm1    ; valid
            movsd xmm0, mem1    ; valid
            movsd mem1, xmm0    ; valid
            movsd mem1, mem2    ; error, no opcode to support memory to memory
    
            One arg must be an SSE register or both can be.
    
          Data transfer
            movsd       ; move data between registers and memory
    
          Arithmetic
            addsd       ; add double to register
            subsd       ; sub dbl from register
            mulsd       ; mul 2 dbl values
            divsd       ; div one value by another
            sqrtsd      ; square root
    
          Comparisons
            comisd      ; compare two dbl values and set eflags (jz jnz je jne etc...)
            ucomisd     ; Unordered Compare and Set eflags
            cmpsd       ; Compare Scalar Double-Precision Floating-Point Value
    
          Rounding
            roundsd     ; Round Scalar Double Precision Floating-Point Values (SSE4.1, not SSE2)
    
          Limits
            minsd       ; Return Minimum Scalar Double-Precision Floating-Point Value
            maxsd       ; Return Maximum Scalar Double-Precision Floating-Point Value
    
          Conversions
            cvtsd2si    ; Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer
            cvtsi2sd    ; Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value
    
        NOTE : This is not an exhaustive list of scalar mnemonics and there are packed double
               mnemonics that can be used with 64 bit floating point values.
    
        A full explanation of each mnemonic can be found in the Intel instruction manual(s).
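    As a small illustration of the "Limits" entries in the list above, here is a sketch that clamps a DOUBLE into a range with maxsd and minsd. The function name clamp_dbl and the parameter names lo_limit and hi_limit are made up for this example. It follows the same PREFIX "!" pattern as the functions earlier in this post, assumes lo_limit is not greater than hi_limit, and has not been timed against a plain BASIC version.

    ```powerbasic
    FUNCTION clamp_dbl(ByVal fpnum as DOUBLE,ByVal lo_limit as DOUBLE,ByVal hi_limit as DOUBLE) as DOUBLE

        ' hypothetical helper, not part of the original post

        PREFIX "!"

        movsd xmm0, fpnum                       ; load the value to clamp
        maxsd xmm0, lo_limit                    ; keep the larger of value and lower limit
        minsd xmm0, hi_limit                    ; keep the smaller of that and upper limit
        movsd fpnum, xmm0                       ; store xmm0 into DOUBLE

        END PREFIX

        FUNCTION = fpnum

    End FUNCTION
    ```

    Called as fpnrv = clamp_dbl(150.0, 0.0, 100.0) this should come back as 100, and the whole calculation stays in xmm0 with no compare and branch.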
    hutch at movsd dot com
    The MASM Forum

    www.masm32.com

  • #2
    Thank you very much for this.



    • #3
      Steve, what is SSE2? How does it do 64 bit when the PB compiler is only 32 bit?



      • #4
        Addressing range and data size are not related.
        The bit-ness of a CPU or compiler refers to its general purpose register and address size, not to the size of the floating point data it can work with.
        The FPU has been 80 bit for decades, even when operating systems were 16 bit DOS.

        32-bit PowerBASIC uses 80 bit EXTENDED FPU data.

        SSE instructions use a different set of registers to the CPU, just as the FPU has its own set of registers.
        The SSE registers are 128 bit and are able to process multiple data items in parallel.
        They can process two 64 bit DOUBLEs or four 32 bit SINGLEs at the same time.
        They also handle lots of combinations of integers which they can process in parallel.

        SSE means (confusingly) Streaming SIMD Extensions.
        SIMD means Single Instruction Multiple Data, i.e. the same instruction does the same thing with multiple data items.

        SSE helps by first allowing multiple data to be processed simultaneously and then by streamlining the way that data is fetched, processed and stored to maximise memory bandwidth.
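        To make the "two 64 bit DOUBLEs at the same time" point concrete, here is a minimal sketch in the inline assembler style used in this thread. The type name Double2 is invented for the example, and the unaligned move movupd is used on the assumption that a LOCAL variable is not guaranteed to sit on a 16 byte boundary.

        ```powerbasic
        'PBCC6 program, a sketch only
        #COMPILE EXE
        #DIM ALL

        TYPE Double2                ' hypothetical type: two DOUBLEs = 128 bits
            first AS DOUBLE
            second AS DOUBLE
        END TYPE

        FUNCTION PBMAIN () AS LONG

        LOCAL a, b AS Double2

        a.first = 25
        a.second = 36

        !movupd xmm0,a       'load both DOUBLEs into XMM register 0
        !sqrtpd xmm0,xmm0    'square root both DOUBLEs in one instruction
        !movupd b,xmm0       'store both DOUBLE results

        PRINT b.first, b.second

        WAITKEY$
        END FUNCTION
        ```

        This should print 5 and 6, both produced by the single sqrtpd.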



        • #5
          Hi Anne,

          The data size is controlled by the register size, and because the XMM registers are 128 bit, the scalar double instructions have no trouble working with 64 bit data. With the SSE2 instructions you can work with either 32 bit SINGLE or 64 bit DOUBLE values, but for higher precision you have to fall back on the x87 instruction set with its 80 bit format. For whatever reason, later versions of Windows do not use the old x87 instructions themselves but still support them at an operating system level. One of the bugbears of x87 is that it shares its registers with the slightly later MMX instructions, which meant messy switching code to avoid clashes.

          With SSE using the XMM registers, you have none of those problems, and there are multiple instruction families that use them without clashes. As Paul mentioned above, there are instructions that handle more than one piece of data at a time: an XMM register can be loaded and processed with 4 DWORD items at once, which increases the operation throughput by a large amount. From memory, PowerBASIC can handle up to SSE version 4.1, and combined with the earlier SSE versions that gives access to a very powerful set of instructions.

          The SSE2 instructions have been around for many years and only the very oldest hardware does not have them. If you are using a modern version of 64 bit Windows you can use the full range that PowerBASIC supports. I posted the scalar double examples above as they are very simple to use for floating point maths. There is also a 32 bit set of scalar single instructions and, among other things, the programmers who write games use them because they have sufficient precision for the task and are generally faster than the 64 bit set. The multiple data item instructions can produce some very fast code but they get progressively more complex to use.
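          For the "4 DWORD items at a time" case, a minimal sketch using the SSE2 integer instruction paddd could look like the following. The type name Dword4 is invented for the example. Both operands are loaded with the unaligned move movdqu first, because paddd with a memory operand expects 16 byte alignment, which is not assumed for LOCAL variables here.

          ```powerbasic
          'PBCC6 program, a sketch only
          #COMPILE EXE
          #DIM ALL

          TYPE Dword4                 ' hypothetical type: four DWORDs = 128 bits
              first AS DWORD
              second AS DWORD
              third AS DWORD
              fourth AS DWORD
          END TYPE

          FUNCTION PBMAIN () AS LONG

          LOCAL a, b, c AS Dword4

          a.first = 1  : a.second = 2  : a.third = 3  : a.fourth = 4
          b.first = 10 : b.second = 20 : b.third = 30 : b.fourth = 40

          !movdqu xmm0,a       'load 4 DWORDs from a
          !movdqu xmm1,b       'load 4 DWORDs from b
          !paddd xmm0,xmm1     'add all 4 pairs in one instruction
          !movdqu c,xmm0       'store the 4 sums

          PRINT c.first, c.second, c.third, c.fourth

          WAITKEY$
          END FUNCTION
          ```

          This should print 11, 22, 33 and 44 from one add instruction.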


          • #6
            Here is a quick example of using the 32 bit SINGLE scalar instructions (strictly these arrived with the original SSE rather than SSE2). They are just as easy to use: all I have done here is change the scalar double instructions to scalar single instructions and the data types from DOUBLE to SINGLE.
            Code:
            ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
            
            FUNCTION PBmain as LONG
            
                LOCAL svar as SINGLE
                LOCAL rvar as SINGLE
            
                svar = 100.0
            
                rvar = ssqrt(svar)
                MsgBox format$(rvar),0," sqrt of 100"
            
                svar = sqrd(rvar)
                MsgBox format$(svar),0," squared root value"
            
                svar = circle_area(100.0)
                MsgBox format$(svar),0," area of 100 unit circle"
            
            End FUNCTION
            
            ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
            
            FUNCTION ssqrt(ByVal fpnum as SINGLE) as SINGLE
            
                PREFIX "!"
            
                movss xmm0, fpnum                       ; load 1st value
                sqrtss xmm0, xmm0                       ; calc square root
                movss fpnum, xmm0                       ; store xmm0 into SINGLE
            
                END PREFIX
            
                FUNCTION = fpnum
            
            End FUNCTION
            
            ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
            
            FUNCTION sqrd(ByVal fpnum as SINGLE) as SINGLE
            
                PREFIX "!"
            
                movss xmm0, fpnum                       ; load 1st value
                mulss xmm0, xmm0                        ; square number
                movss fpnum, xmm0                       ; store xmm0 into SINGLE
            
                END PREFIX
            
                FUNCTION = fpnum
            
            End FUNCTION
            
            ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
            
            FUNCTION circle_area(ByVal Diameter as SINGLE) as SINGLE
            
                LOCAL pi as SINGLE
                LOCAL rv as SINGLE
                LOCAL v2 as SINGLE
            
                pi = 3.141592653589793
                v2 = 2.0
            
                PREFIX "!"
            
                movss xmm0, Diameter                    ; load diameter
                divss xmm0, v2                          ; divide by 2 for radius
                mulss xmm0, xmm0                        ; radius squared
                mulss xmm0, pi                          ; * pi
                movss rv, xmm0                          ; result written to SINGLE
            
                END PREFIX
            
                FUNCTION = rv
            
            End FUNCTION
            
            ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤


            • #7
              Here is a better example showing the first advantage of SSE instructions, that they work on multiple data items, not just single items.
              This calculates the square roots of 4 SINGLES in less time than the FPU takes to do one.

              Code:
              'PBCC6 program
              #COMPILE EXE
              #DIM ALL
              
              
              TYPE Single4
                  first AS SINGLE
                  second AS SINGLE
                  third AS SINGLE
                  fourth AS SINGLE
              
              END TYPE
              
              
              
              FUNCTION PBMAIN () AS LONG
              
              LOCAL a,b AS Single4
              
              a.first = 25
              a.second = 36
              a.third = 49
              a.fourth = 64
              
              
              !movups xmm0,a       'load all 4 SINGLES into XMM register 0
              !sqrtps xmm0,xmm0    'Square root all 4 SINGLES in one instruction and leave the results in XMM0
              !movups b,xmm0       'Store all 4 SINGLE results
              
              
              PRINT b.first, b.second, b.third, b.fourth
              
              WAITKEY$
              END FUNCTION



              • #8
                Good example. The packed data types are very useful on streamed data, which is effectively what they were designed for.
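                As a rough sketch of what "streamed data" can mean in practice, the following walks an array of SINGLEs four at a time and takes the square roots with sqrtps. The array names src and dst and the pointer variables are invented for the example. It assumes the element count is a multiple of 4 and uses only EAX and EDX inside the assembler lines, so that no register PowerBASIC expects to be preserved is touched.

                ```powerbasic
                'PBCC6 program, a sketch only
                #COMPILE EXE
                #DIM ALL

                FUNCTION PBMAIN () AS LONG

                    LOCAL i AS LONG
                    LOCAL psrc, pdst AS DWORD
                    DIM src(0 TO 7) AS SINGLE
                    DIM dst(0 TO 7) AS SINGLE

                    FOR i = 0 TO 7
                        src(i) = (i + 1) * (i + 1)      ' 1, 4, 9 ... 64
                    NEXT

                    FOR i = 0 TO 7 STEP 4               ' 4 SINGLEs per pass
                        psrc = VARPTR(src(i))
                        pdst = VARPTR(dst(i))
                        ! mov eax, psrc                 ; address of the next 4 inputs
                        ! mov edx, pdst                 ; address of the next 4 outputs
                        ! movups xmm0, [eax]            ; load 4 SINGLEs
                        ! sqrtps xmm0, xmm0             ; 4 square roots at once
                        ! movups [edx], xmm0            ; store 4 results
                    NEXT

                    FOR i = 0 TO 7
                        PRINT dst(i);
                    NEXT

                    WAITKEY$
                END FUNCTION
                ```

                The results should be the values 1 to 8. For a large array the loop itself would normally be written in assembler as well; it is left in BASIC here to keep the sketch short.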


                • #9
                  Here is another version of the 32 bit scalar calculations. The last 2 procedures show the difference between calling a function and working directly with registers in a FASTPROC. The disassembly is at the end of the source file.
                  Code:
                  ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                  
                      #compile exe "single.exe"
                  
                  FUNCTION PBmain as LONG
                  
                      LOCAL svar as SINGLE
                      LOCAL rvar as SINGLE
                      LOCAL ivar as DWORD
                  
                      svar = 100.0
                  
                      rvar = ssqrt(svar)
                      MsgBox format$(rvar),0," sqrt of 100"
                  
                      svar = ssqrd(rvar)
                      MsgBox format$(svar),0," squared root value"
                  
                      svar = circle_area(100.0)
                      MsgBox format$(svar),0," area of 100 unit circle"
                  
                      svar = cylinder_volume(100.0,100.0)
                      MsgBox format$(svar),0," volume of 100 x 100 cylinder"
                  
                      svar = cubed_volume(100.0)
                      MsgBox format$(svar),0," volume of 100 cubed"
                  
                      svar = cubic_volume(100.0,100.0,100.0)
                      MsgBox format$(svar),0," cubic volume of 100 cubed"
                  
                    ' -------------------------------
                    ' a floating point memory operand
                    ' -------------------------------
                      svar = 100.0
                  
                    ' -------------------------------
                    ' passing values in xmm registers
                    ' -------------------------------
                      ! movss xmm1, svar
                      ! movss xmm2, svar
                      ! movss xmm3, svar
                  
                    ' -------------------------
                    ' direct call of a FASTPROC
                    ' -------------------------
                      ! call cubvol
                  
                    ' --------------------------
                    ' get return value from xmm0
                    ' --------------------------
                      ! movss svar, xmm0
                  
                      MsgBox format$(svar),0," FASTPROC cubic volume of 100 cubed"
                  
                      ivar = svar
                      MsgBox format$(ivar)
                  
                  End FUNCTION
                  
                  ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                  
                  FUNCTION ssqrt(ByVal fpnum as SINGLE) as SINGLE
                  
                      PREFIX "!"
                  
                      movss xmm0, fpnum                       ; load 1st value
                      sqrtss xmm0, xmm0                       ; calc square root
                      movss fpnum, xmm0                       ; store xmm0 into SINGLE
                  
                      END PREFIX
                  
                      FUNCTION = fpnum
                  
                  End FUNCTION
                  
                  ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                  
                  FUNCTION ssqrd(ByVal fpnum as SINGLE) as SINGLE
                  
                      PREFIX "!"
                  
                      movss xmm0, fpnum                       ; load 1st value
                      mulss xmm0, xmm0                        ; square number
                      movss fpnum, xmm0                       ; store xmm0 into SINGLE
                  
                      END PREFIX
                  
                      FUNCTION = fpnum
                  
                  End FUNCTION
                  
                  ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                  
                  FUNCTION circle_area(ByVal Diameter as SINGLE) as SINGLE
                  
                      LOCAL pi as SINGLE
                      LOCAL rv as SINGLE
                      LOCAL v2 as SINGLE
                  
                      pi = 3.141592653589793
                      v2 = 2.0
                  
                      PREFIX "!"
                  
                      movss xmm0, Diameter                    ; load diameter
                      divss xmm0, v2                          ; divide by 2 for radius
                      mulss xmm0, xmm0                        ; radius squared
                      mulss xmm0, pi                          ; * pi
                      movss rv, xmm0                          ; result written to SINGLE
                  
                      END PREFIX
                  
                      FUNCTION = rv
                  
                  End FUNCTION
                  
                  ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                  
                  FUNCTION cylinder_volume(ByVal Diameter as SINGLE,ByVal depth as SINGLE) as SINGLE
                  
                      LOCAL pi as SINGLE
                      LOCAL rv as SINGLE
                      LOCAL v2 as SINGLE
                  
                      pi = 3.141592653589793
                      v2 = 2.0
                  
                      PREFIX "!"
                  
                      movss xmm0, Diameter                    ; load diameter
                      divss xmm0, v2                          ; divide by 2 for radius
                      mulss xmm0, xmm0                        ; radius squared
                      mulss xmm0, pi                          ; * pi
                      mulss xmm0, depth                       ; times depth
                      movss rv, xmm0                          ; result written to SINGLE
                  
                      END PREFIX
                  
                      FUNCTION = rv
                  
                  End FUNCTION
                  
                  ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                  
                  FUNCTION cubed_volume(ByVal wdth as SINGLE) as SINGLE
                  
                      LOCAL rv as SINGLE
                  
                      PREFIX "!"
                  
                      movss xmm0, wdth
                      mulss xmm0, wdth
                      mulss xmm0, wdth
                      movss rv, xmm0                          ; result written to SINGLE
                  
                      END PREFIX
                  
                      FUNCTION = rv
                  
                  End FUNCTION
                  
                  ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                  
                  FUNCTION cubic_volume(ByVal wd1 as SINGLE,ByVal wd2 as SINGLE,ByVal wd3 as SINGLE) as SINGLE
                  
                      LOCAL rv as SINGLE
                  
                      PREFIX "!"
                  
                      movss xmm0, wd1
                      mulss xmm0, wd2
                      mulss xmm0, wd3
                      movss rv, xmm0                          ; result written to SINGLE
                  
                      END PREFIX
                  
                      FUNCTION = rv
                  
                  End FUNCTION
                  
                  ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                  
                  FASTPROC cubvol
                  
                      PREFIX "!"
                  
                      movss xmm0, xmm1
                      mulss xmm0, xmm2
                      mulss xmm0, xmm3
                  
                      ret
                  
                      END PREFIX
                  
                  END FASTPROC
                  
                  ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                  
                  
                  #IF 0  ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                  
                  --------------------------------
                  
                  FUNCTION
                  
                  002014FD                    fn_002014FD:                ; Xref 00201216 0020150B
                  002014FD 55                     push    ebp
                  002014FE 8BEC                   mov     ebp,esp
                  00201500                    loc_00201500:               ; Xref 0020149C
                  00201500 53                     push    ebx
                  00201501 56                     push    esi
                  00201502 57                     push    edi
                  00201503 683F130000             push    133Fh
                  00201508 83EC70                 sub     esp,70h
                  0020150B 68FD142000             push    offset fn_002014FD
                  00201510 31F6                   xor     esi,esi
                  00201512 56                     push    esi
                  00201513 56                     push    esi
                  00201514 56                     push    esi
                  00201515 56                     push    esi
                  00201516 56                     push    esi
                  00201517 56                     push    esi
                  00201518 F30F104508             movss   xmm0,[ebp+8]
                  0020151D F30F59450C             mulss   xmm0,[ebp+0Ch]
                  00201522 F30F594510             mulss   xmm0,[ebp+10h]
                  00201527 F30F118564FFFFFF       movss   [ebp-9Ch],xmm0
                  0020152F 8B8564FFFFFF           mov     eax,[ebp-9Ch]
                  00201535 898568FFFFFF           mov     [ebp-98h],eax
                  0020153B                    off_0020153B:               ; Xref 002014F3
                  0020153B D98568FFFFFF           fld     dword ptr [ebp-98h]
                  00201541 8D65F4                 lea     esp,[ebp-0Ch]
                  00201544 5F                     pop     edi
                  00201545 5E                     pop     esi
                  00201546 5B                     pop     ebx
                  00201547 5D                     pop     ebp
                  00201548 C20C00                 ret     0Ch
                  
                  --------------------------------
                  
                  FASTPROC
                  
                  00201550                    fn_00201550:                ; Xref 0020126B
                  00201550 F30F10C1               movss   xmm0,xmm1
                  00201554 F30F59C2               mulss   xmm0,xmm2
                  00201558 F30F59C3               mulss   xmm0,xmm3
                  0020155C C3                     ret
                  
                  --------------------------------
                  
                  #ENDIF ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤


                  • #10
                    Here is one more: integer in and out, processed as floating point.
                    Code:
                    ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                    
                    FUNCTION squared2(ByVal inum as DWORD) as DWORD
                    
                        PREFIX "!"
                          cvtsi2sd xmm0, inum           ; convert integer to fp
                          mulsd xmm0, xmm0              ; square the fp number
                          cvtsd2si eax, xmm0            ; convert fp to integer register
                          mov FUNCTION, eax
                        END PREFIX
                    
                    End FUNCTION
                    
                    ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
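                    One detail worth knowing when converting back to an integer: cvtsd2si rounds according to the current SSE rounding mode (round to nearest unless it has been changed), while its sibling cvttsd2si always truncates toward zero. The two function names below are made up for this sketch, which otherwise follows the same pattern as squared2 above. Both conversions produce a signed 32 bit result, so the functions return LONG.

                    ```powerbasic
                    FUNCTION round_dbl(ByVal fpnum as DOUBLE) as LONG

                        ' hypothetical helper: convert using the current rounding mode

                        PREFIX "!"
                          movsd xmm0, fpnum             ; load the DOUBLE
                          cvtsd2si eax, xmm0            ; round to integer (nearest by default)
                          mov FUNCTION, eax
                        END PREFIX

                    End FUNCTION

                    FUNCTION trunc_dbl(ByVal fpnum as DOUBLE) as LONG

                        ' hypothetical helper: convert by discarding the fraction

                        PREFIX "!"
                          movsd xmm0, fpnum             ; load the DOUBLE
                          cvttsd2si eax, xmm0           ; truncate toward zero
                          mov FUNCTION, eax
                        END PREFIX

                    End FUNCTION
                    ```

                    With the default rounding mode, round_dbl(2.7) should give 3 and trunc_dbl(2.7) should give 2.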