ASM Memory Copy..

  • Steve Hutchesson
    replied
    Peter,

    Somewhere I have seen code for doing memory copy using floating point
    instructions, but they use the same registers as MMX, so there does not
    seem to be any advantage.

    I can't lay my hands on the test piece at the moment but it is an unrolled
    loop of this type,

    Code:
        ! mov esi, src
        ! mov edi, dst
    
      mmSt:
        ! movq mm(0), [esi]
        ! movq mm(1), [esi + 8]
        ! movq mm(2), [esi + 16]
        ! movq mm(3), [esi + 24]
        ! movq mm(4), [esi + 32]
        ! movq mm(5), [esi + 40]
        ! movq mm(6), [esi + 48]
        ! movq mm(7), [esi + 56]
    
        ! movq [edi],      mm(0)
        ! movq [edi + 8],  mm(1)
        ! movq [edi + 16], mm(2)
        ! movq [edi + 24], mm(3)
        ! movq [edi + 32], mm(4)
        ! movq [edi + 40], mm(5)
        ! movq [edi + 48], mm(6)
        ! movq [edi + 56], mm(7)
    The code below was my last attempt at improving on REP MOVSD. It is still
    about 3 - 4% slower than REP MOVSD, and that is after re-ordering the
    instructions to maximise its loop speed; it has no pairing problems and no
    stalls.

    From all of the technical data I have seen and from my own testing, REP MOVSD
    is well optimised in the PII - PIII processor range. I have also run into
    technical data saying that the speed of physical memory is the limiting factor
    in memory copy, and my testing appears to bear this out: all of the algorithms
    I have tested come within about 5% of each other, even though the MMX version
    should be a lot faster.

    Regards,

    [email protected]

    Code:
      ; #########################################################################
      
      srCopy proc src :DWORD, dst :DWORD, ln :DWORD
      
          LOCAL cntr :DWORD
      
          push ebx
          push esi
          push edi
      
          mov esi, src
          mov edi, dst
      
          cmp ln, 16
          jb ShortLoop
      
          mov eax, ln
          shr eax, 4
          mov cntr, eax
      
        @@:
          mov eax, [esi]
          mov [edi], eax
          mov ebx, [esi+4]
          mov [edi+4], ebx
          mov ecx, [esi+8]
          mov [edi+8], ecx
          mov edx, [esi+12]
          mov [edi+12], edx
          add esi, 16
          add edi, 16
          dec cntr
          jnz @B
      
          and ln, 15
      
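        ShortLoop:
          cmp ln, 0             ; nothing left to copy (also covers ln = 0 on entry)
          je Done
        @@:
          mov al, [esi]
          inc esi
          mov [edi], al
          inc edi
          dec ln
          jnz @B                ; jns here would copy one byte too many
        Done: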
      
          pop edi
          pop esi
          pop ebx
      
          ret
      
      srCopy endp
      
      ; #########################################################################
    hmmmm, smileys



    [This message has been edited by Steve Hutchesson (edited March 26, 2001).]
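
For readers without an assembler to hand, the shape of a routine like srCopy can be sketched in C. This is only an illustration under stated assumptions, not the posted code: the name `sr_copy` is hypothetical, four temporaries stand in for the four registers, and `memcpy` of 4 bytes is used as an alignment-safe dword load and store. The tail loop tests its counter before each byte, so a remainder of zero copies nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C sketch: copy 16 bytes per pass through four temporaries,
   then finish the 0..15 leftover bytes one at a time.
   Buffers must not overlap. Returns the number of bytes copied. */
size_t sr_copy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t blocks = len >> 4;   /* number of whole 16-byte blocks */
    size_t tail   = len & 15;   /* bytes left after the blocks */

    assert(d != NULL && s != NULL);

    while (blocks--) {
        uint32_t a, b, c, e;    /* stand-ins for eax, ebx, ecx, edx */
        memcpy(&a, s,      4);
        memcpy(&b, s + 4,  4);
        memcpy(&c, s + 8,  4);
        memcpy(&e, s + 12, 4);
        memcpy(d,      &a, 4);
        memcpy(d + 4,  &b, 4);
        memcpy(d + 8,  &c, 4);
        memcpy(d + 12, &e, 4);
        s += 16;
        d += 16;
    }
    while (tail--) {            /* counter is tested before each byte */
        *d++ = *s++;
    }
    return len;
}
```

An optimising compiler is free to turn either loop into something quite different, so this shows the counting logic rather than the timing behaviour discussed in the post.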


  • Peter Manders
    replied
    Steve,

    there is one possible way to copy even faster than that, but I'm not sure if it'll work.

    Maybe you can write down the code for this because I have no experience in programming the floating point processor:

    It should be possible to copy 8 or 10 bytes at a time, copying FPU registers around. This might be faster, maybe you can adapt the movsd loop to test this?

    I've made a movsd loop myself, but made sure my buffers are at least 4 bytes too large. I just increment the loop counter and copy between 1 and 4 bytes too many, and forget about movsb. It should be faster on small copies.


    Peter.


    ------------------
    [email protected]



    [This message has been edited by Peter Manders (edited March 26, 2001).]
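
The padded-buffer idea described above can be sketched in C. This is one reading of the description, not Peter's actual loop: the names `padded_size` and `copy_padded` and the explicit capacity parameters are assumptions made for illustration. The dword count is rounded down and then incremented by one, so the copy always overruns the requested length by 1 to 4 bytes and no byte-sized tail is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes actually moved for a request of len bytes:
   the whole dwords in len, plus one extra dword. */
size_t padded_size(size_t len)
{
    return ((len >> 2) + 1) * 4;
}

/* Hypothetical sketch: dword-only copy with no byte tail.
   Both buffers must have room for padded_size(len) bytes,
   which is what the asserted capacities check.
   Returns the number of bytes actually written. */
size_t copy_padded(void *dst, size_t dst_cap,
                   const void *src, size_t src_cap, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords = (len >> 2) + 1;     /* "increment the loop counter" */

    assert(dst_cap >= dwords * 4 && src_cap >= dwords * 4);

    for (size_t i = 0; i < dwords; i++) {
        uint32_t tmp;
        memcpy(&tmp, s + i * 4, sizeof tmp);   /* 4-byte load */
        memcpy(d + i * 4, &tmp, sizeof tmp);   /* 4-byte store */
    }
    return dwords * 4;
}
```

The trade-off is that every caller has to over-allocate both buffers and must not care what lands in the spare bytes of the destination.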


  • Tyrone W. Lee
    replied
    Gee..

    Thanks guys.. that was more than I expected.. I'll see if I can make
    use of these routines..



    ------------------
    Explorations v3.0 RPG Development System
    http://www.explore-rpg.com


  • Michael Mattias
    replied
    Fast, easy memory copy:
    Code:
    LET Y = X

    MCM


  • Charles Dietz
    replied
    Thank you, Semen...

    And to think that I spent so much of my working career doing matrix algebra
    with Fortran, I shouldn't have forgotten the PB mat commands. Thanks again
    for reminding me.

    ------------------


  • Semen Matusovski
    replied
    Charles --
    MAT is a PB statement (operations on arrays), so look in PB.HLP.

    ------------------
    E-MAIL: [email protected]


  • Charles Dietz
    replied
    Semen,

    What is mat? I found m(Move memory) documented in Win32.hlp.

    m SourceAddr Length DestAddress

    ------------------


  • Steve Hutchesson
    replied
    Try this one. For all the variations I have tried in order to get a
    faster algo, this one still beats the rest. I have tried 6 register
    versions and 8 MMX register versions, and this one still outclocks them.
    The speed limit on memory copy is apparently imposed by the actual speed
    of memory, but the REP MOVSD pair is very well optimised and it has
    slightly less overhead.

    Regards,

    [email protected]

    Code:
    ' ###########################################################################
    
    FUNCTION MemCopyD(ByVal src as LONG, _
                      ByVal dst as LONG, _
                      ByVal ln as LONG) as LONG
    
        #REGISTER NONE
    
          ! cld
    
          ! mov esi, src
          ! mov edi, dst
          ! mov ecx, ln
    
          ! shr ecx, 2
          ! rep movsd
    
          ! mov ecx, ln
          ! and ecx, 3
          ! rep movsb
    
        FUNCTION = 0
    
    END FUNCTION
    
    ' ###########################################################################
    ------------------
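
The length split used in the routine above (dwords first, then the leftover bytes) can be sketched in C for readers who want to check the arithmetic. This is an illustrative equivalent, not a translation anyone in the thread posted: the name `mem_copy_d` is hypothetical, the two loops merely stand in for `rep movsd` and `rep movsb`, and the sketch returns the byte count rather than 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C counterpart of the REP MOVSD / REP MOVSB split:
   copy len / 4 dwords first, then the len % 4 leftover bytes.
   Buffers must not overlap. Returns the number of bytes copied. */
size_t mem_copy_d(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords = len >> 2;   /* same as: shr ecx, 2 */
    size_t tail   = len & 3;    /* same as: and ecx, 3 */

    assert(d != NULL && s != NULL);

    while (dwords--) {          /* stands in for: rep movsd */
        uint32_t tmp;
        memcpy(&tmp, s, sizeof tmp);   /* 4-byte load, alignment-safe */
        memcpy(d, &tmp, sizeof tmp);   /* 4-byte store */
        s += 4;
        d += 4;
    }
    while (tail--) {            /* stands in for: rep movsb */
        *d++ = *s++;
    }
    return len;
}
```

For an 11-byte request this moves two dwords and then three single bytes, which is the same split the `shr` / `and` pair produces in the assembler version.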


  • Borje Hagsten
    replied
    Steve Hutchesson once posted this to the Windows Forum. Very fast.
    All credits to Steve..
    Code:
    DECLARE FUNCTION MemCopyD(BYVAL Source AS LONG, _
                             BYVAL Dest AS LONG, _
                             BYVAL ln AS LONG) AS LONG
     
    FUNCTION MemCopyD(BYVAL Source AS LONG, _
                               BYVAL Dest AS LONG, _
                               BYVAL ln AS LONG) AS LONG
      'Big mover, [ movsd ] BURP !
      'Written by Steve Hutchesson < [email protected] >
      '~~~~~~~~~~~~~~~~~~~~~~~~~~~
      LOCAL lnth AS LONG, _
            divd AS LONG, _
            rmdr AS LONG
     
      ! cmp ln, 4           ; if under 4 bytes long
      ! jl tail             ; jump to label tail
      ! mov eax, ln         ; copy length into eax
      ! push eax            ; place a copy of eax on the stack
      ! shr eax, 2          ; integer divide eax by 4
      ! shl eax, 2          ; multiply eax by 4 to get dividend
      ! mov divd, eax       ; copy it into variable
      ! mov ecx, divd       ; copy variable into ecx
      ! pop eax             ; retrieve length in eax off the stack
      ! sub eax, ecx        ; subtract dividend from length to get remainder
      ! mov rmdr, eax       ; copy remainder into variable
      ! cld                 ; copy bytes forward
      ! mov ecx, ln         ; put byte count in ecx
      ! shr ecx, 2          ; divide by 4 for DWORD data size
      ! mov esi, Source     ; copy source pointer into source index
      ! mov edi, Dest       ; copy dest pointer into destination index
      ! repnz movsd         ; repeat while not zero, move string DWORD
      ! mov ecx, rmdr       ; put remainder in ecx
      ! jmp over
    tail:
      ! mov ecx, ln         ; set counter if less than 4 bytes in length
      ! mov esi, Source     ; copy source pointer into source index
      ! mov edi, Dest       ; copy dest pointer into destination index
    over:
      ! repnz movsb         ; copy remaining BYTES from source to dest
      ! sub ln, ecx         ; calculate return value ( little use )
     
      FUNCTION = ln         ' return bytes copied
    END FUNCTION

    ------------------


  • Semen Matusovski
    replied
    I like this "for dummies" way, which runs at the same speed.

    Code:
    sub CpyMem(Destination As Dword, Source As Dword, Length As Long)     
       dim arrDest(Length-1) as byte at Destination    
       dim arrSour(Length-1) as byte at Source
       mat arrDest() = arrSour()
    end sub
    I won't name the author, because it looks like he has given up on his own idea.

    Added later.
    I compared three methods. On my PC (Win2000) Steve's sub runs at the same speed as the API.
    MAT works well only with big chunks (>= 10-20 K).

    Code:
       #Compile Exe
       #Dim All
       #Register None
       
       %L = 10000
       %k = 100000
    
       Declare Function MoveMemory Lib "KERNEL32.DLL" Alias "RtlMoveMemory" (ByVal lpDest As Long, ByVal lpSource As Long, ByVal cbMove As Long) As Long
    
    
       Function MemCopyD(ByVal Source As Long, _
                               ByVal Dest As Long, _
                               ByVal ln As Long) As Long
      'Big mover, [ movsd ] BURP !
      'Written by Steve Hutchesson < [email protected] >
      '~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Local lnth As Long, _
            divd As Long, _
            rmdr As Long
    
      ! cmp ln, 4           ; if under 4 bytes long
      ! jl tail             ; jump to label tail
      ! mov eax, ln         ; copy length into eax
      ! push eax            ; place a copy of eax on the stack
      ! shr eax, 2          ; integer divide eax by 4
      ! shl eax, 2          ; multiply eax by 4 to get dividend
      ! mov divd, eax       ; copy it into variable
      ! mov ecx, divd       ; copy variable into ecx
      ! pop eax             ; retrieve length in eax off the stack
      ! sub eax, ecx        ; subtract dividend from length to get remainder
      ! mov rmdr, eax       ; copy remainder into variable
      ! cld                 ; copy bytes forward
      ! mov ecx, ln         ; put byte count in ecx
      ! shr ecx, 2          ; divide by 4 for DWORD data size
      ! mov esi, Source     ; copy source pointer into source index
      ! mov edi, Dest       ; copy dest pointer into destination index
      ! repnz movsd         ; repeat while not zero, move string DWORD
      ! mov ecx, rmdr       ; put remainder in ecx
      ! jmp over
    tail:
      ! mov ecx, ln         ; set counter if less than 4 bytes in length
      ! mov esi, Source     ; copy source pointer into source index
      ! mov edi, Dest       ; copy dest pointer into destination index
    over:
      ! repnz movsb         ; copy remaining BYTES from source to dest
      ! sub ln, ecx         ; calculate return value ( little use )
    
      Function = ln         ' return bytes copied
    End Function
    
       Sub CpyMem(ByVal Destination As Dword, ByVal Source As Dword, ByVal Length As Long)
          ReDim bSource(0 : Length - 1) As Byte At Source
          ReDim bDestination(0 : Length - 1) As Byte At Destination
          Mat bDestination = bSource
    
       End Sub
    
       Function PbMain
    
          Dim s As Asciiz * %L
          Dim d As Asciiz * %L
          Dim t1 As Single, t2 As Single, i As Long
    
          s = "Test for CpyMem"
    
    
          t1 = Timer
          For i = 1 To %k
             MoveMemory VarPtr(d), VarPtr(s), %L
          Next
          t2 = Timer
          MsgBox Format$(t2 - t1, "#.### sec"),, "MoveMemory"
    
          t1 = Timer
          For i = 1 To %k
             CpyMem VarPtr(d), VarPtr(s), %L
          Next
          t2 = Timer
          MsgBox Format$(t2 - t1, "#.### sec"),, "MAT"
    
          t1 = Timer
          For i = 1 To %k
             MemCopyD VarPtr(s), VarPtr(d), %L   ' MemCopyD takes (Source, Dest, ln)
          Next
          t2 = Timer
          MsgBox Format$(t2 - t1, "#.### sec"),, "Steve"
          
          'If d <> s Then MsgBox "Oh" Else MsgBox "Ok"
    
       End Function

    ------------------
    E-MAIL: [email protected]

    [This message has been edited by Semen Matusovski (edited March 24, 2001).]


  • Peter P Stephensen
    replied
    Use the CopyMemory-API. It is as fast as you can get it...

    Regards
    Peter

    ------------------


  • Tyrone W. Lee
    started a topic ASM Memory Copy..

    ASM Memory Copy..

    I am not familiar with assembly, but could someone who knows ASM write
    me a simple memory copy routine?

    I want to pass information to this ASM function and have it copy data
    from one location to the next.

    Sorta Like..

    Sub CopyMem (SourceLocation as DWORD, DestinationLocation as DWORD, NumBytes as DWORD)
    End Sub

    I need this routine to be as fast as possible and it seems ASM is the
    only way to solve my problem..

    Any help would be appreciated..


    ------------------
    Explorations v3.0 RPG Development System
    http://www.explore-rpg.com