Fast vector norm sought
  • Fast vector norm sought

    To compute the norm of a three-dimensional vector in PB (all variables are Single):
    Code:
     v = Sqr(x*x + y*y + z*z)
    To do it using the floating point processor:
    Code:
     Macro Norm3(v,x,y,z)
     ! fld x                   'push X onto stack
     ! fld st(0)               'push it again
     ! fmul                    'stack top = product
     ! fld y                   'push Y onto stack
     ! fld st(0)               'push it again
     ! fmul                    'stack top = product
     ! fld z                   'push Z onto stack
     ! fld st(0)               'push it again
     ! fmul                    'stack top = product
     ! fadd                    'stack top = sum of last two
     ! fadd                    'sum of that with last
     ! fsqrt                   'stack top = square root of that
     ! fstp v                  'copy stack top to result and pop stack
     End Macro
    But this assembly code is no faster than the plain PB statement. Is there a faster way?
    Politically incorrect signatures about immigration patriots are forbidden. Searching “immigration patriots” is forbidden. Thinking about searching ... well, don’t even think about it.

  • #2
    This article indicates that SSE's reciprocal square root instruction, rsqrtss, takes 1/20 the time of the FPU's fsqrt. A reciprocal suits me, since I will be dividing by the norm anyway and I don't need much accuracy.

    I've never used SSE before. One restriction is that the CPU hardware must support SSE, which didn't exist until 1999.

    ADDED: PBCC4 doesn't recognize MMX, much less SSE. (Ditto for PBCC5 but I need this for version 4.)
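
    For reference, here is a minimal sketch of the rsqrtss idea written in C rather than PB, since PBCC4 will not assemble SSE mnemonics. It assumes an x86 compiler that provides <xmmintrin.h> and a CPU with SSE. The function names inv_norm3 and normalize3 are made up for illustration and are not from any library.

```c
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 target with SSE */

/* Approximate 1 / sqrt(x*x + y*y + z*z) using the rsqrtss instruction.
   Hypothetical helper name, chosen for this sketch only. */
static float inv_norm3(float x, float y, float z)
{
    float sum = x * x + y * y + z * z;
    /* _mm_set_ss loads the scalar, _mm_rsqrt_ss emits rsqrtss,
       _mm_cvtss_f32 reads the low lane back as a float. */
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(sum)));
}

/* Scale (x, y, z) to roughly unit length in place.
   Multiplying by the reciprocal replaces the three divisions. */
static void normalize3(float *x, float *y, float *z)
{
    float r = inv_norm3(*x, *y, *z);
    *x *= r;
    *y *= r;
    *z *= r;
}
```

    Calling normalize3(&x, &y, &z) on (3, 4, 12) leaves a vector whose length is close to 1. The rsqrtss result is only good to about 12 bits, and a zero vector produces NaN components, so a guard may be needed depending on the input.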
    Last edited by Mark Hunter; 1 Apr 2021, 06:28 PM.
