SHA512 Secure Hash for 9.0+ & 5.0+

    Secure 512-bit hashing (ver. 2) for PowerBASIC

    Code for the following two files appears below:
    • SHA512a.INC
      Hash routines for returning 64-byte SHA512 hashes of buffers and files. Included are a 32-bit version as well as versions that make use of SSE2 or MMX functionality if available.
    • SHA512a.BAS
      Test bed EXE illustrating buffer and file hashing.
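    The 32-bit fallback has to build every 64-bit operation out of 32-bit registers. As a rough illustration of the idea, here is a sketch of a 64-bit rotate right done on two 32-bit halves: rotate by n mod 32 with a pair of double-width shifts, then swap the halves when n is 32 or more. This appears to be what the ROR8 macro in the listing does with SHRD and XCHG. The sketch is in Python only so that it can be run and checked anywhere; the function names are invented for this example and are not part of SHA512a.INC.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def ror64(x, n):
    """Reference: rotate a 64-bit value right by n bits."""
    n &= 63
    return ((x >> n) | (x << (64 - n))) & MASK64

def ror64_halves(lo, hi, n):
    """Same rotate using only 32-bit halves (hypothetical helper)."""
    n &= 63
    s = n & 31                      # shift count within one 32-bit register
    if s:
        new_lo = ((lo >> s) | (hi << (32 - s))) & MASK32   # like SHRD lo, hi
        new_hi = ((hi >> s) | (lo << (32 - s))) & MASK32   # like SHRD hi, old lo
    else:
        new_lo, new_hi = lo, hi
    if n & 32:                      # rotating by 32 is just swapping the halves
        new_lo, new_hi = new_hi, new_lo
    return new_lo, new_hi

def check(x):
    """True if both methods agree for every rotate count 0..63."""
    lo, hi = x & MASK32, x >> 32
    for n in range(64):
        rlo, rhi = ror64_halves(lo, hi, n)
        if ((rhi << 32) | rlo) != ror64(x, n):
            return False
    return True

if __name__ == "__main__":
    print(check(0x0123456789ABCDEF))
```

    The SSE2 and MMX versions avoid this split because a single register holds the whole 64-bit value, which is presumably where their speed advantage comes from.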


    SHA512a.ZIP (available below) contains both files as well as a class implementation of the same functionality.

    See the function declarations below for detailed calling information.

    All code compiles with either PBWIN 9+ or PBCC 5+. It replaces an earlier version of SHA512 which I posted in 2004. My thanks go to Eddy Van Esch for help with testing and debugging the new version.

    ----------------------------------------------------------------------

    Available here is a PDF file containing the NIST specifications for SHA-1 as well as the 224-bit, 256-bit, 384-bit, and 512-bit extensions to the SHA standard.
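    The expected values that the test bed compares against can also be reproduced independently of PowerBASIC. The sketch below assumes Python 3 with its standard hashlib module, which is not part of this package. It hashes the same three messages used by TestVectors1 through TestVectors3 and prints them in the same eight-group layout as Hex2ShowQuad(), minus the trailing space. The helper names are invented for illustration.

```python
import hashlib

def show_quad(digest):
    """Format a 64-byte digest as eight groups of 16 uppercase hex digits."""
    hexed = digest.hex().upper()
    return " ".join(hexed[i:i + 16] for i in range(0, len(hexed), 16))

# The three messages used by the test bed's TestVectors1..3 routines.
MESSAGES = [
    b"abc",
    b"abcdefghbcdefghicdefghijdefghijkefghijklfghijklmghijklmnhijklmno"
    b"ijklmnopjklmnopqklmnopqrlmnopqrsmnopqrstnopqrstu",
    b"a" * 1000000,
]

def reference_hashes():
    """Return the formatted SHA-512 digest of each test message."""
    return [show_quad(hashlib.sha512(m).digest()) for m in MESSAGES]

if __name__ == "__main__":
    for line in reference_hashes():
        print(line)
```

    If the PowerBASIC output and the output of a second implementation agree on all three messages, a transcription error in the expected strings is unlikely.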

    A hash is considered secure when it possesses the following qualities.
    -- Determining the input string from the hash (i.e., working backward from the hash alone to determine the string which generated it) is not considered feasible.
    -- Given an input string, it is not considered feasible to find another string which hashes to the same value.
    -- It is not considered feasible to find any two distinct strings which hash to the same value.
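    None of these properties can be proven by running code, but a related behavior is easy to observe: changing a single input bit changes a large share of the output bits, typically somewhere near half. The sketch below assumes Python 3 and its standard hashlib module rather than the PowerBASIC routines in this post. The function name is invented for the example.

```python
import hashlib

def differing_bits(a, b):
    """Count the bit positions in which the SHA-512 digests of a and b differ."""
    da = int.from_bytes(hashlib.sha512(a).digest(), "big")
    db = int.from_bytes(hashlib.sha512(b).digest(), "big")
    return bin(da ^ db).count("1")

if __name__ == "__main__":
    # "abc" and "abb" differ in exactly one input bit ('c' = 0x63, 'b' = 0x62).
    print(differing_bits(b"abc", b"abb"), "of 512 output bits differ")
```

    This is only an observation about output behavior; it says nothing by itself about how hard the hash is to invert or to collide.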

    Secure hashes are not designed for speed. The implementation below relies on assembly language to improve speed, but unless security is required, a secure hash is a poor choice when compared with the many simpler, far more efficient hash algorithms in widespread use. Moreover, unless compelling reasons exist for employing a 512-bit secure hash, SHA256 offers faster results on most systems. My PB implementation of SHA256 is available here.

    NIST is currently preparing an open competition to replace SHA-1 as the secure hash standard. Info.

    ----------------------------------------------------------------------
    This PB implementation of SHA512 is hereby placed in the public domain. Use it as you wish.

    Greg Turgeon
    10/2008

    Code:
    '=====================================================================
    '-- SHA512a.BAS
    '-- Test bed for SHA512a.INC
    '-- Compiles with either PBWIN 9+ or PBCC 5+
    '   Greg Turgeon  10/2008
    '=====================================================================
    #COMPILE EXE
    #DIM ALL
    '============
    #INCLUDE "WIN32API.INC"
    #INCLUDE "SHA512a.INC"
    
    '--------------------
    '  Utility macros
    '--------------------
    #IF %def(%pb_win32)
       MACRO eol=$CR
       MACRO say(t)
          MessageBox 0&, BYCOPY (t), EXE.Namex$, %MB_OK OR %MB_TASKMODAL
       END MACRO
       MACRO EnterCC
       END MACRO
       MACRO ExitCC
       END MACRO
    #ELSEIF %def(%pb_cc32)
       MACRO eol=$CRLF
       MACRO say(t)=stdout t
    
       MACRO EnterCC
       LOCAL launched AS LONG
       if (cursory = 1) and (cursorx = 1) then launched = -1
       END MACRO
    
       MACRO ExitCC
       if launched then
          input flush
          stdout "Press any key to end"
          waitkey$
       end if
       END MACRO
    #ENDIF
    
    '--------------------
    '-- Utility functions
    '--------------------
    DECLARE FUNCTION Get_FileSize(File_Name$) AS DWORD
    DECLARE FUNCTION ShowHash64(ShouldBe$, Hash$) AS LONG
    DECLARE FUNCTION Hex2ShowQuad(Buffer$) AS STRING
    
    '====================
    FUNCTION PBMain() AS LONG
    LOCAL ecode AS LONG
    LOCAL dataBuffer, sha, shouldBe AS STRING
    EnterCC
    
    gosub TestVectors1
    gosub TestVectors2
    gosub TestVectors3
    gosub FileHash
    
    function = ecode
    '============
    ExitMain:
    ExitCC
    EXIT FUNCTION
    
    '============
    TestVectors1:
    dataBuffer$ = "abc"  'target data 
    sha$ = nul$(%HASHLEN) 'buffer into which hash routine will place hash
    SHA512_Buffer byval strptr(dataBuffer$), len(dataBuffer$), byval strptr(sha$)
    shouldBe$ = "DDAF35A193617ABA CC417349AE204131 12E6FA4E89A97EA2 0A9EEEE64B55D39A 2192992A274FC1A8 36BA3C23A3FEEBBD 454D4423643CE80E 2A9AC94FA54CA49F"
    ShowHash64 shouldBe$, sha$
    RETURN
    
    '============
    TestVectors2:
    dataBuffer$ = "abcdefghbcdefghicdefghijdefghijkefghijklfghijklmghijklmnhijklmnoijklmnopjklmnopqklmnopqrlmnopqrsmnopqrstnopqrstu"
    sha$ = nul$(%HASHLEN)
    SHA512_Buffer byval strptr(dataBuffer$), len(dataBuffer$), byval strptr(sha$)
    shouldBe$ = "8E959B75DAE313DA 8CF4F72814FC143F 8F7779C6EB9F7FA1 7299AEADB6889018 501D289E4900F7E4 331B99DEC4B5433A C7D329EEB6DD2654 5E96E55B874BE909"
    ShowHash64 shouldBe$, sha$
    RETURN
    
    '============
    TestVectors3:
    dataBuffer$ = string$(1000000,"a")
    sha$ = nul$(%HASHLEN)
    SHA512_Buffer byval strptr(dataBuffer$), len(dataBuffer$), byval strptr(sha$)
    shouldBe$ = "E718483D0CE76964 4E2E42C7BC15B463 8E1F98B13B204428 5632A803AFA973EB DE0FF244877EA60A 4CB0432CE577C31B EB009C5C2C49AA2E 4EADB217AD8CC09B"
    ShowHash64 shouldBe$, sha$
    RETURN
    
    '============
    FileHash:
    LOCAL t$, file_name$, file_size AS DWORD
    LOCAL t1, t2 AS DWORD, t3 AS SINGLE   'tick counts are DWORDs; a SINGLE cannot hold large ones exactly
    
    file_name = command$
    
    if len(file_name) = 0 then
       say(eol + "No file specified")
       return
    end if
    if isfile(file_name) = 0 then
       say(eol + "Cannot find file " + file_name)
       return
    end if
    
    t1 = GetTickCount
    ecode = SHA512_File(File_Name$, sha$)
    t2 = GetTickCount
    if ecode then 
       say(eol+ "SHA512_File error" + str$(ecode) + eol + error$(ecode))
       return
    end if
    
    t = t + file_name + eol
    file_size = Get_FileSize(file_name$)
    t3 = (t2-t1)/1000
    t = t + "File size: " + using$(",",file_size) + " bytes" + eol
    t = t + "Time elapsed: " + format$(t3,"###.###") + " seconds" + eol
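    '-- GetTickCount is coarse; skip the throughput figure when elapsed time rounds to zero
    if t3 > 0 then t = t + format$(file_size/t3,"#########,") + " bytes/sec"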
    say(t)
    RETURN
    END FUNCTION
    
    
    '====================
    FUNCTION Get_FileSize(File_Name$) AS DWORD
    LOCAL totalbytes AS DWORD, fdata AS DIRDATA
    if len(dir$(File_Name$, to fdata)) then
       totalbytes = fdata.FileSizeLow
    end if
    function = totalbytes
    END FUNCTION
    
    '====================
    FUNCTION ShowHash64(shouldBe$, Hash$) AS LONG
    LOCAL t$
    t = "Should be:" + eol + shouldBe$ + eol
    t = t + "Actual: " + eol + Hex2ShowQuad(Hash$)
    say(t)
    END FUNCTION
    
    '====================
    FUNCTION Hex2ShowQuad(Buffer$) AS STRING
    REGISTER i AS LONG, j AS LONG
    LOCAL t$, pbyte AS BYTE PTR
    pbyte = strptr(Buffer$)
    for i = 0 to 7
       for j = 0 to 7
          t = t +  hex$(@pbyte,2)
          incr pbyte
       next j
       t = t + " "
    next i
    function = t
    END FUNCTION
    '-- END SHA512a.BAS ---------------------------------------------------
    Code:
    '=====================================================================
    '-- SHA512a.INC
    '-- Implementation of the SHA512 secure hash algorithm
    '-- Compiles with either PBWIN 9+ or PBCC 5+
    '-- WIN32 API not required
    '-- Uses no global data
    '   Greg Turgeon  10/2008
    '=====================================================================
    
    %TRUE                = 1
    %FALSE               = 0
    
    '-- Set to %FALSE to return big-endian hash$
    %RETURN_LITTLE_ENDIAN = %TRUE
    
    %ALIGNMENT        = 16
    %WORKSPACESIZE    = ((8*8)+(80*8)+(8*8)) 's_array+w_array+xx,t0,t1,etc.
    %HASHLEN          = 64     'bytes
    %BLOCKSIZE        = 128    'bytes
    %FILE_BUFFERSIZE  = 32000  'bytes
    
    TYPE SHA512_CONTEXT
       state((%HASHLEN\8)+(%ALIGNMENT\8)) AS QUAD      'here, 80 bytes
       lendata     AS DWORD
       pdata       AS BYTE PTR
       pstate      AS QUAD PTR
       k_array     AS QUAD PTR
       s_array     AS BYTE PTR
       w_array     AS BYTE PTR
       pworkspace  AS BYTE PTR
       dummy1      AS LONG     'padding for 64-byte alignment
       workspace   AS STRING * (%WORKSPACESIZE + %ALIGNMENT)
    END TYPE
    
    
    DECLARE FUNCTION SHA512_Buffer(BYVAL DataBuffer AS BYTE PTR, _
                                   BYVAL Length AS DWORD, _
                                   BYVAL HashBuffer AS BYTE PTR) AS LONG
    #IF 0
    Parameters for SHA512_Buffer() specify the location of the data to 
    be hashed, the size of the data, and the location where the hash$ 
    is to be placed.  Byte pointers are listed only to fit the logic 
    of the action being performed.  For example, the routine can be 
    called with: 
    
    ecode = SHA512_Buffer(byval strptr(buffer$), _
                          len(buffer$), _
                          byval strptr(hash$))
    ecode = SHA512_Buffer(byval varptr(AnArray&(0)), _
                         (ubound(AnArray)*4)+4, _
                          byval varptr(aUDT.HashString))
    ecode = SHA512_Buffer(byval varptr(aUDT), _
                          sizeof(aUDT), _
                          byval varptr(HashArray?(0)))
    However, the routine performs no error checking to verify the 
    validity of the parameters passed.
    #ENDIF
    
    DECLARE FUNCTION SHA512_File(File_Name$, Hash$) AS LONG
    #IF 0
    SHA512_File() expects a dynamic string$ to be passed for return 
    of the hash; the string$ itself is resized within the routine.  The 
    routine returns zero on success or a PB (not Win32) error code.
    #ENDIF
    
    '-- Routines used internally
    '   SHA512_Buffer() and SHA512_File() test for SSE2 and MMX 
    '   availability and call the routine w/highest performance
    DECLARE FUNCTION SHA512_Init(Ctx AS SHA512_CONTEXT) AS LONG
    DECLARE FUNCTION SHA512_MakePadding(BYVAL TotalBytes AS DWORD) AS STRING
    DECLARE FUNCTION SHA512_Compress128(Ctx AS SHA512_CONTEXT) AS LONG
    DECLARE FUNCTION SHA512_Compress64(Ctx AS SHA512_CONTEXT) AS LONG
    DECLARE FUNCTION SHA512_Compress32(Ctx AS SHA512_CONTEXT) AS LONG
    DECLARE FUNCTION HasSSE2() AS LONG
    DECLARE FUNCTION HasMMX() AS LONG
    
    '--------------------
    MACRO align(p,alignment)=((p+(alignment-1)) AND (NOT(alignment-1)))
    
    '--------------------
    MACRO ROR8_128(XMMReg,RotateVal)
    '-- Returns (x >> n) | (x << (64 - n))
    '-- Destroys eax, edx, xmm6, xmm7
    !  mov      eax,     RotateVal
    !  mov      edx,     64
    !  sub      edx,     eax
    
    !  movd     xmm6,     edx
    !  movdqa   xmm7,     XMMReg        ;copy to xmm7
    
    !  psrlq    XMMReg,  RotateVal      ;shift each quad right
    !  psllq    xmm7,    xmm6           ;shift each quad left by edx
    !  por      XMMReg,  xmm7           ;OR the results
    END MACRO
    
    '--------------------
    MACRO ROR8_64(MMXReg,RotateVal)
    '-- Returns (x >> n) | (x << (64 - n))
    '-- Destroys eax, edx, mm6, mm7
    !  mov      eax,     RotateVal
    !  mov      edx,     64
    !  sub      edx,     eax
    
    !  movd     mm6,     edx
    !  movq     mm7,     MMXReg       ;copy to mm7
    
    !  psrlq    MMXReg,  RotateVal    ;shift right
    !  psllq    mm7,     mm6          ;shift left by edx
    !  por      MMXReg,  mm7          ;OR the results
    END MACRO
    
    '--------------------
    MACRO ROR8(pQuad,RotateVal)
    MACROTEMP RorStore
    '-- Destroys eax, ebx, ecx, edx
    !  mov      eax,     pQuad
    !  mov      ecx,     RotateVal
    !  mov      edx,     [eax+4]
    !  mov      eax,     [eax]
    !  mov      ebx,     eax         ;duplicate ebx = QuadLO
    !  and      ecx,     63          ;RotateVal mod 64
    
    !  shrd     eax,     edx,  cl
    !  shrd     edx,     ebx,  cl
    !  test     ecx,     32          ;RotateVal >= 32?
    !  jz       RorStore             ;done if not
    !  xchg     eax,     edx         ;otherwise rotate edx:eax, 32
    RorStore:
    !  mov      ecx,     pQuad
    !  mov      [ecx],   eax
    !  mov      [ecx+4], edx
    END MACRO
    
    
    '--------------------
    MACRO SHR8_128(XMMReg,ShiftVal)
    !  psrlq    XMMReg,     ShiftVal
    END MACRO
    
    '--------------------
    MACRO SHR8_64(MMXReg,ShiftVal)
    !  psrlq    MMXReg,     ShiftVal
    END MACRO
    
    '--------------------
    MACRO SHR8(pQuad,ShiftVal)
    '-- Destroys eax, ecx, edx
    MACROTEMP SHR8Done
    !  mov      eax,     pQuad
    !  mov      ecx,     ShiftVal
    !  mov      edx,     [eax+4]
    !  mov      eax,     [eax]
    !  and      ecx,     63          ;ShiftVal mod 64
    
    !  shrd     eax,     edx, cl     ;edx:eax shr (ShiftVal mod 32 )
    !  shr      edx,     cl
    !  test     ecx,     32          ;ShiftVal >= 32?
    !  jz                SHR8Done    ;done if not
    !  mov      eax,     edx         ;otherwise shift right edx:eax, 32
    !  xor      edx,     edx
    SHR8Done:
    !  mov      ecx,     pQuad
    !  mov      [ecx],   eax
    !  mov      [ecx+4], edx
    END MACRO
    
    
    '--------------------
    MACRO XOR8(px,py,pz)
    '-- Destroys eax, ebx, ecx, edx
    '-- Returns result at [px]
    !  mov      edx,     pz
    !  mov      ecx,     py
    !  mov      edx,     [edx]
    !  mov      ecx,     [ecx]
    
    !  mov      eax,     px
    !  xor      edx,     ecx         ;edx = (pzLO XOR pyLO)
    !  mov      ebx,     eax         ;save ebx --> px
    !  mov      ecx,     [eax]
    !  xor      edx,     ecx         ;edx = (pzLO XOR pyLO) XOR pxLO
    !  mov      [ebx],   edx         ;store low dword to pxLO
    
    !  mov      ecx,     py
    !  mov      edx,     [eax+4]
    !  mov      ecx,     [ecx+4]
    !  mov      eax,     pz
    !  xor      edx,     ecx         ;edx = (pxHI XOR pyHI)
    
    !  mov      ecx,     [eax+4]
    !  xor      edx,     ecx         ;edx = (pxHI XOR pyHI) XOR pzHI
    !  mov      [ebx+4], edx         ;store hi dword to pxHI
    END MACRO
    
    
    '--------------------
    MACRO Chh128(px,py,pz,presult)
    '-- Chh(x,y,z)=(z XOR (x AND (y XOR z)))
    '-- Returns result at [presult]
    '-- Destroys eax, ecx, edx, xmm0, xmm1, xmm2
    !  mov      eax,     px
    !  mov      ecx,     py
    !  mov      edx,     pz
    !  movq     xmm0,    [eax]
    !  movq     xmm1,    [ecx]
    !  movq     xmm2,    [edx]
    
    !  mov      eax,     presult
    !  pxor     xmm1,    xmm2         ;(y XOR z)
    !  pand     xmm0,    xmm1         ;(x AND (y XOR z)))
    !  pxor     xmm2,    xmm0         ;(z XOR (x AND (y XOR z)))
    !  movq     [eax],   xmm2
    END MACRO
    
    '--------------------
    MACRO Chh64(px,py,pz,presult)
    '-- Chh(x,y,z)=(z XOR (x AND (y XOR z)))
    '-- Returns result at [presult]
    '-- Destroys eax, ecx, edx, mm0, mm1, mm2
    !  mov      eax,     px
    !  mov      ecx,     py
    !  mov      edx,     pz
    !  movq     mm0,     [eax]
    !  movq     mm1,     [ecx]
    !  movq     mm2,     [edx]
    
    !  mov      eax,     presult
    !  pxor     mm1,     mm2         ;(y XOR z)
    !  pand     mm0,     mm1         ;(x AND (y XOR z)))
    !  pxor     mm2,     mm0         ;(z XOR (x AND (y XOR z)))
    !  movq     [eax],   mm2
    END MACRO
    
    '--------------------
    MACRO Chh(px,py,pz,presult)
    '-- Chh(x,y,z)=(z XOR (x AND (y XOR z)))
    '-- Returns result at [presult]
    '-- Destroys eax, ecx, edx
    !  mov      eax,     px
    !  mov      ecx,     py
    !  mov      edx,     pz
    !  mov      eax,     [eax]
    !  mov      ecx,     [ecx]
    !  mov      edx,     [edx]
    
    !  xor      ecx,     edx         ;ecx = (y XOR z)
    !  and      eax,     ecx         ;eax = (x AND (y XOR z))
    !  mov      ecx,     presult
    !  xor      edx,     eax         ;edx = (z XOR (x AND (y XOR z)))
    !  mov      [ecx],   edx
    
    !  mov      eax,     px
    !  mov      ecx,     py
    !  mov      edx,     pz
    !  mov      eax,     [eax+4]
    !  mov      ecx,     [ecx+4]
    !  mov      edx,     [edx+4]
    
    !  xor      ecx,     edx         ;ecx = (y XOR z)
    !  and      eax,     ecx         ;eax = (x AND (y XOR z))
    !  mov      ecx,     presult
    !  xor      edx,     eax         ;edx = (z XOR (x AND (y XOR z)))
    !  mov      [ecx+4], edx
    END MACRO
    
    
    '--------------------
    MACRO Maj128(px,py,pz,presult)
    '-- Maj(x,y,z)=(((x OR y) AND z) OR (x AND y)) 
    '-- Destroys eax, ecx, edx, xmm0, xmm1, xmm2, xmm3
    !  mov      eax,     px
    !  mov      ecx,     py
    !  mov      edx,     pz
    !  movq     xmm0,    [eax]      ;xmm0 = [px]
    !  movq     xmm1,    [ecx]      ;xmm1 = [py]
    !  movdqa   xmm3,    xmm0       ;copy: xmm3 = xmm0 = [px]
    !  movq     xmm2,    [edx]      ;xmm2 = [pz]
    !  por      xmm0,    xmm1       ;xmm0 =  (x OR y)
    !  pand     xmm3,    xmm1       ;xmm3 =  (x AND y)
    !  pand     xmm0,    xmm2       ;xmm0 = ((x OR y) AND z)
    !  mov      eax,     presult
    !  por      xmm0,    xmm3       ;xmm0 = ((x OR y) AND z) OR (x AND y)
    !  movq     [eax],   xmm0
    END MACRO
    
    '--------------------
    MACRO Maj64(px,py,pz,presult)
    '-- Maj(x,y,z)=(((x OR y) AND z) OR (x AND y)) 
    '-- Destroys eax, ecx, edx, mm0, mm1, mm2, mm3
    !  mov      eax,     px
    !  mov      ecx,     py
    !  mov      edx,     pz
    !  movq     mm0,     [eax]       ;mm0 = [px]
    !  movq     mm1,     [ecx]       ;mm1 = [py]
    !  movq     mm3,     mm0         ;copy: mm3 = mm0 = [px]
    !  movq     mm2,     [edx]       ;mm2 = [pz]
    !  por      mm0,     mm1         ;mm0 =  (x OR y)
    !  pand     mm3,     mm1         ;mm3 =  (x AND y)
    !  pand     mm0,     mm2         ;mm0 = ((x OR y) AND z)
    !  mov      eax,     presult
    !  por      mm0,     mm3         ;mm0 = ((x OR y) AND z) OR (x AND y)
    !  movq     [eax],   mm0
    END MACRO
    
    '--------------------
    MACRO Maj(px,py,pz,presult)
    '-- Maj(x,y,z)=(((x OR y) AND z) OR (x AND y)) 
    '-- Destroys eax, ebx, ecx, edx
    !  push     esi
    !  push     edi
    !  mov      eax,     px
    !  mov      ecx,     py
    !  mov      edx,     pz
    !  mov      eax,     [eax]       ;eax = [pxLO]
    !  mov      esi,     [ecx]       ;esi = [pyLO]
    !  mov      edi,     [edx]       ;edi = [pzLO]
    !  mov      ebx,     eax         ;copy: ebx = pxLO
    !  or       eax,     esi         ;eax =  (x OR y)
    !  and      ebx,     esi         ;ebx =  (x AND y)
    !  and      eax,     edi         ;eax = ((x OR y) AND z)
    !  or       eax,     ebx         ;eax = ((x OR y) AND z) OR (x AND y)
    !  mov      edi,     presult
    !  mov      [edi],   eax         ;presultLO
    
    !  mov      eax,     px
    !  mov      ecx,     py
    !  mov      eax,     [eax+4]     ;eax = [pxHI]
    !  mov      esi,     [ecx+4]     ;esi = [pyHI]
    !  mov      edi,     [edx+4]     ;edi = [pzHI]
    !  mov      ebx,     eax         ;save: ebx = pxHI
    !  or       eax,     esi         ;eax =  (x OR y)
    !  and      ebx,     esi         ;ebx =  (x AND y)
    !  and      eax,     edi         ;eax = ((x OR y) AND z)
    !  or       eax,     ebx         ;eax = ((x OR y) AND z) OR (x AND y)
    !  mov      edi,     presult
    !  mov      [edi+4], eax         ;presultHI
    !  pop      edi
    !  pop      esi
    END MACRO
    
    
    '--------------------
    MACRO Sigma0_128(pn,presult)
    '-- Destroys edx, xmm0, xmm1, xmm2
    !  mov      edx,     pn
    !  movq     xmm0,    [edx]
    !  movdqa   xmm1,    xmm0
    !  movdqa   xmm2,    xmm0
    ROR8_128(xmm0,28) : ROR8_128(xmm1,34) : ROR8_128(xmm2,39)
    !  pxor     xmm0,    xmm1
    !  mov      edx,     presult
    !  pxor     xmm0,    xmm2
    !  movq     [edx],   xmm0
    END MACRO
    
    '--------------------
    MACRO Sigma0_64(pn,presult)
    '-- Destroys edx, mm0, mm1, mm2
    !  mov      edx,     pn
    !  movq     mm0,     [edx]
    !  movq     mm1,     mm0
    !  movq     mm2,     mm0
    ROR8_64(mm0,28) : ROR8_64(mm1,34) : ROR8_64(mm2,39)
    !  pxor     mm0,     mm1
    !  mov      edx,     presult
    !  pxor     mm0,     mm2
    !  movq     [edx],   mm0
    END MACRO
    
    '--------------------
    MACRO Sigma0(pn,presult)
    Copy8XtoY(pn,xx) : Copy8XtoY(pn,yy) : Copy8XtoY(pn,zz)
    ROR8(xx,28)      : ROR8(yy,34)      : ROR8(zz,39)
    XOR8(xx,yy,zz)
    Copy8XtoY(xx,presult)
    END MACRO
    
    
    '--------------------
    MACRO Sigma1_128(pn,presult)
    '-- Destroys edx, xmm0, xmm1, xmm2
    !  mov      edx,     pn
    !  movq     xmm0,    [edx]
    !  movdqa   xmm1,    xmm0
    !  movdqa   xmm2,    xmm0
    ROR8_128(xmm0,14) : ROR8_128(xmm1,18) : ROR8_128(xmm2,41)
    !  pxor     xmm0,    xmm1
    !  mov      edx,     presult
    !  pxor     xmm0,    xmm2
    !  movq     [edx],   xmm0
    END MACRO
    
    '--------------------
    MACRO Sigma1_64(pn,presult)
    '-- Destroys edx, mm0, mm1, mm2
    !  mov      edx,     pn
    !  movq     mm0,     [edx]
    !  movq     mm1,     mm0
    !  movq     mm2,     mm0
    ROR8_64(mm0,14) : ROR8_64(mm1,18) : ROR8_64(mm2,41)
    !  pxor     mm0,     mm1
    !  mov      edx,     presult
    !  pxor     mm0,     mm2
    !  movq     [edx],   mm0
    END MACRO
    
    '--------------------
    MACRO Sigma1(pn,presult)
    Copy8XtoY(pn,xx) : Copy8XtoY(pn,yy) : Copy8XtoY(pn,zz)
    ROR8(xx,14)      : ROR8(yy,18)      : ROR8(zz,41)
    XOR8(xx,yy,zz)
    Copy8XtoY(xx,presult)
    END MACRO
    
    
    '--------------------
    MACRO Gamma0_128(pn,presult)
    '-- Destroys edx, xmm0, xmm1, xmm2
    !  mov      edx,     pn
    !  movq     xmm0,    [edx]
    !  movdqa   xmm1,    xmm0
    !  movdqa   xmm2,    xmm0
    ROR8_128(xmm0,1) : ROR8_128(xmm1,8) : SHR8_128(xmm2,7)
    !  pxor     xmm0,    xmm1
    !  mov      edx,     presult
    !  pxor     xmm0,    xmm2
    !  movq     [edx],   xmm0
    END MACRO
    
    '--------------------
    MACRO Gamma0_64(pn,presult)
    '-- Destroys edx, mm0, mm1, mm2
    !  mov      edx,     pn
    !  movq     mm0,     [edx]
    !  movq     mm1,     mm0
    !  movq     mm2,     mm0
    ROR8_64(mm0,1) : ROR8_64(mm1,8) : SHR8_64(mm2,7)
    !  pxor     mm0,     mm1
    !  mov      edx,     presult
    !  pxor     mm0,     mm2
    !  movq     [edx],   mm0
    END MACRO
    
    '--------------------
    MACRO Gamma0(pn,presult)
    Copy8XtoY(pn,xx) : Copy8XtoY(pn,yy) : Copy8XtoY(pn,zz)
    ROR8(xx,1)       : ROR8(yy,8)       : SHR8(zz,7)
    XOR8(xx,yy,zz)
    Copy8XtoY(xx,presult)
    END MACRO
    
    
    '--------------------
    MACRO Gamma1_128(pn,presult)
    '-- Destroys edx, xmm0, xmm1, xmm2
    !  mov      edx,     pn
    !  movq     xmm0,    [edx]
    !  movdqa   xmm1,    xmm0
    !  movdqa   xmm2,    xmm0
    ROR8_128(xmm0,19) : ROR8_128(xmm1,61) : SHR8_128(xmm2,6)
    !  pxor     xmm0,    xmm1
    !  mov      edx,     presult
    !  pxor     xmm0,    xmm2
    !  movq     [edx],   xmm0
    END MACRO
    
    '--------------------
    MACRO Gamma1_64(pn,presult)
    '-- Destroys edx, mm0, mm1, mm2
    !  mov      edx,     pn
    !  movq     mm0,     [edx]
    !  movq     mm1,     mm0
    !  movq     mm2,     mm0
    ROR8_64(mm0,19) : ROR8_64(mm1,61) : SHR8_64(mm2,6)
    !  pxor     mm0,     mm1
    !  mov      edx,     presult
    !  pxor     mm0,     mm2
    !  movq     [edx],   mm0
    END MACRO
    
    '--------------------
    MACRO Gamma1(pn,presult)
    Copy8XtoY(pn,xx) : Copy8XtoY(pn,yy) : Copy8XtoY(pn,zz)
    ROR8(xx,19)      : ROR8(yy,61)      : SHR8(zz,6)
    XOR8(xx,yy,zz)
    Copy8XtoY(xx,presult)
    END MACRO
    
    
    '--------------------
    MACRO Copy8XtoY128(px,py)
    '-- Destroys eax, edx, xmm0
    !  mov      eax,     px
    !  mov      edx,     py
    !  movq     xmm0,    [eax]
    !  movq     [edx],   xmm0
    END MACRO
    
    '--------------------
    MACRO Copy8XtoY64(px,py)
    '-- Destroys eax, edx, mm0
    !  mov      eax,     px
    !  mov      edx,     py
    !  movq     mm0,     [eax]
    !  movq     [edx],   mm0
    END MACRO
    
    '--------------------
    MACRO Copy8XtoY(px,py)
    '-- Destroys eax, ebx, ecx, edx
    !  mov      eax,     px
    !  mov      edx,     py
    !  mov      ebx,     [eax]
    !  mov      ecx,     [eax+4]
    !  mov      [edx],   ebx
    !  mov      [edx+4], ecx
    END MACRO
    
    
    '--------------------
    MACRO Add8XtoY128(px,py)
    '-- Destroys eax, edx, xmm6, xmm7
    !  mov      edx,     py          ;edx --> y throughout (target)
    !  mov      eax,     px          ;eax --> x throughout
    !  movq     xmm6,    [edx]
    !  movq     xmm7,    [eax]
    !  paddq    xmm6,    xmm7
    !  movq     [edx],   xmm6
    END MACRO
    
    '--------------------
    MACRO Add8XtoY64(px,py)
    Add8XtoY(px,py)
    END MACRO
    
    '--------------------
    MACRO Add8XtoY(px,py)
    '-- Destroys eax, ebx, ecx, edx
    !  mov      edx,     py          ;edx --> y throughout (target)
    !  mov      eax,     px          ;eax --> x throughout
    !  mov      ecx,     [edx]       ;ecx  = y[0]
    !  add      ecx,     [eax]       ;y[0] = y[0] + x[0]
    !  mov      [edx],   ecx         ;store to y[0]
    !  mov      ecx,     [edx+4]     ;ecx = high dword of y
    !  adc      ecx,     [eax+4]     ;ecx = yHI + xHI + carry from low dwords
    !  mov      [edx+4], ecx         ;store to high dword of y
    END MACRO
    
    
    '====================
    FUNCTION SHA512_Init(Ctx AS SHA512_CONTEXT) AS LONG
    LOCAL p AS DWORD
    p               = varptr(Ctx.state(0))
    Ctx.pstate      = align(p,%ALIGNMENT)
    Ctx.@pstate[0]  = &h6A09E667F3BCC908&&
    Ctx.@pstate[1]  = &hBB67AE8584CAA73B&&
    Ctx.@pstate[2]  = &h3C6EF372FE94F82B&&
    Ctx.@pstate[3]  = &hA54FF53A5F1D36F1&&
    Ctx.@pstate[4]  = &h510E527FADE682D1&&
    Ctx.@pstate[5]  = &h9B05688C2B3E6C1F&&
    Ctx.@pstate[6]  = &h1F83D9ABFB41BD6B&&
    Ctx.@pstate[7]  = &h5BE0CD19137E2179&&
    
    Ctx.k_array = codeptr(K_Array_Data)
    
    p = varptr(Ctx.workspace)
    p = align(p,%ALIGNMENT)
    Ctx.s_array = p
    Ctx.w_array = p+(8*8)              ' allow for 16-byte alignment in SHA512_Compress128()
    Ctx.pworkspace = p+((8*8)+(80*8))
    
    EXIT FUNCTION
    '============
    #ALIGN 16
    K_Array_Data:
    ! DD  &hD728AE22,&h428A2F98,  &h23EF65CD,&h71374491,  &hEC4D3B2F,&hB5C0FBCF,  &h8189DBBC,&hE9B5DBA5
    ! DD  &hF348B538,&h3956C25B,  &hB605D019,&h59F111F1,  &hAF194F9B,&h923F82A4,  &hDA6D8118,&hAB1C5ED5
    ! DD  &hA3030242,&hD807AA98,  &h45706FBE,&h12835B01,  &h4EE4B28C,&h243185BE,  &hD5FFB4E2,&h550C7DC3
    ! DD  &hF27B896F,&h72BE5D74,  &h3B1696B1,&h80DEB1FE,  &h25C71235,&h9BDC06A7,  &hCF692694,&hC19BF174
    ! DD  &h9EF14AD2,&hE49B69C1,  &h384F25E3,&hEFBE4786,  &h8B8CD5B5,&h0FC19DC6,  &h77AC9C65,&h240CA1CC
    ! DD  &h592B0275,&h2DE92C6F,  &h6EA6E483,&h4A7484AA,  &hBD41FBD4,&h5CB0A9DC,  &h831153B5,&h76F988DA
    ! DD  &hEE66DFAB,&h983E5152,  &h2DB43210,&hA831C66D,  &h98FB213F,&hB00327C8,  &hBEEF0EE4,&hBF597FC7
    ! DD  &h3DA88FC2,&hC6E00BF3,  &h930AA725,&hD5A79147,  &hE003826F,&h06CA6351,  &h0A0E6E70,&h14292967
    ! DD  &h46D22FFC,&h27B70A85,  &h5C26C926,&h2E1B2138,  &h5AC42AED,&h4D2C6DFC,  &h9D95B3DF,&h53380D13
    ! DD  &h8BAF63DE,&h650A7354,  &h3C77B2A8,&h766A0ABB,  &h47EDAEE6,&h81C2C92E,  &h1482353B,&h92722C85
    ! DD  &h4CF10364,&hA2BFE8A1,  &hBC423001,&hA81A664B,  &hD0F89791,&hC24B8B70,  &h0654BE30,&hC76C51A3
    ! DD  &hD6EF5218,&hD192E819,  &h5565A910,&hD6990624,  &h5771202A,&hF40E3585,  &h32BBD1B8,&h106AA070
    ! DD  &hB8D2D0C8,&h19A4C116,  &h5141AB53,&h1E376C08,  &hDF8EEB99,&h2748774C,  &hE19B48A8,&h34B0BCB5
    ! DD  &hC5C95A63,&h391C0CB3,  &hE3418ACB,&h4ED8AA4A,  &h7763E373,&h5B9CCA4F,  &hD6B2B8A3,&h682E6FF3
    ! DD  &h5DEFB2FC,&h748F82EE,  &h43172F60,&h78A5636F,  &hA1F0AB72,&h84C87814,  &h1A6439EC,&h8CC70208
    ! DD  &h23631E28,&h90BEFFFA,  &hDE82BDE9,&hA4506CEB,  &hB2C67915,&hBEF9A3F7,  &hE372532B,&hC67178F2
    ! DD  &hEA26619C,&hCA273ECE,  &h21C0C207,&hD186B8C7,  &hCDE0EB1E,&hEADA7DD6,  &hEE6ED178,&hF57D4F7F
    ! DD  &h72176FBA,&h06F067AA,  &hA2C898A6,&h0A637DC5,  &hBEF90DAE,&h113F9804,  &h131C471B,&h1B710B35
    ! DD  &h23047D84,&h28DB77F5,  &h40C72493,&h32CAAB7B,  &h15C9BEBC,&h3C9EBE0A,  &h9C100D4C,&h431D67C4
    ! DD  &hCB3E42B6,&h4CC5D4BE,  &hFC657E2A,&h597F299C,  &h3AD6FAEC,&h5FCB6FAB,  &h4A475817,&h6C44198C
    END FUNCTION
    
    
    '====================
    FUNCTION SHA512_Compress128(Ctx AS SHA512_CONTEXT) AS LONG
    '-- Requires SSE2
    '-- In macros, EBX is considered always available; ESI & EDI are 
    '   preserved around use
    #REGISTER NONE
    LOCAL i, x, xx, t0, t1, pstate, result AS LONG
    LOCAL s_array, w_array, k_array AS LONG
    LOCAL aa, bb, cc, ddd, ee, ff, gg, hh AS LONG
    
    s_array = Ctx.s_array : w_array = Ctx.w_array : k_array = CTX.k_array
    
    '-- Local vars aa-hh overlay s_array&&(0-7)
    aa = s_array    : bb = s_array+8  : cc = s_array+16 : ddd = s_array+24
    ee = s_array+32 : ff = s_array+40 : gg = s_array+48 : hh  = s_array+56
    xx = Ctx.pworkspace : t0 = xx+16  : t1 = xx+32      : result = xx+48
    
    i = Ctx.pdata
    pstate = Ctx.pstate
    
    !  push     ebx
    !  push     esi
    !  push     edi
    
    '-- Copy current state into s_array&&()
    'poke$ s, peek$(Ctx.pstate, %HASHLEN)
    !  mov      esi,     pstate
    !  mov      edi,     s_array
    !  movdqa   xmm0,    [esi]
    !  movdqa   xmm1,    [esi+16]
    !  movdqa   xmm2,    [esi+32]
    !  movdqa   xmm3,    [esi+48]
    !  movdqa   [edi],   xmm0
    !  movdqa   [edi+16],xmm1
    !  movdqa   [edi+32],xmm2
    !  movdqa   [edi+48],xmm3
    
    '-- Copy target data into w&&(0-15) w/64-bit little-to-big endian conversion
    !  mov      esi,     i              ;i = Ctx.pdata = unaligned
    !  mov      edi,     w_array
    !  mov      ecx,     %BLOCKSIZE
    '-- 64-bit BSWAP * 2 /loop
    #ALIGN 16
    BSwapCopyTop:
    !  sub      ecx,     16
    !  movdqu   xmm0,    [esi+ecx]
    
    !  sub      ecx,     16
    !  movdqu   xmm2,    [esi+ecx]
    
    !  movdqa   xmm1,    xmm0
    !  movdqa   xmm3,    xmm2
    
    !  psllw    xmm0,    8
    !  psllw    xmm2,    8
    
    !  psrlw    xmm1,    8
    !  psrlw    xmm3,    8
    
    !  por      xmm0,    xmm1
    !  por      xmm2,    xmm3
    
    !  pshufhw  xmm0,    xmm0, &b00011011
    !  pshufhw  xmm2,    xmm2, &b00011011
    
    !  pshuflw  xmm0,    xmm0, &b00011011
    !  pshuflw  xmm2,    xmm2, &b00011011
    
    !  movdqa   [edi+ecx+16],  xmm0
    !  movdqa   [edi+ecx],     xmm2
    !  test     ecx,     ecx
    !  jnz      BSwapCopyTop
    
    '-- Fill w&&(16-79)
    '   for i = 16 to 79
    '      @w[i] = Gamma1(@w[i-2]) + @w[i-7] + Gamma0(@w[i-15]) + @w[i-16]
    '   next i
    !  mov      esi,     16             ;edi = w_array from above
    #ALIGN 16
    TopLoop1:
       !  mov      ebx,     esi
       !  sub      ebx,     2
       !  lea      eax,     [edi+ebx*8] ;x = w+((i-2)*8)
       !  mov      x,       eax         ;x --> w[i-2]
       Gamma1_128(x,result)             'result = Gamma1(@w[i-2])
    
       !  mov      ebx,     esi
       !  mov      edx,     result
       !  sub      ebx,     7
       !  lea      eax,     [edi+ebx*8] ;eax --> @w[i-7]
       '!  mov      x,    eax         ;x --> w[i-7]
       'Add8XtoY128(x,result)        'result = result + @w[i-7]
       !  movq     xmm6,    [edx]
       !  movq     xmm7,    [eax]
       !  paddq    xmm6,    xmm7
       !  movq     [edx],   xmm6
    
       !  mov      ebx,     esi
       !  sub      ebx,     15
       !  lea      eax,     [edi+ebx*8]
       !  mov      x,       eax         ;x --> w[i-15]
       Gamma0_128(x,xx)
       Add8XtoY128(xx,result)          'result = result + Gamma0(@w[i-15])
    
       !  mov      ebx,     esi
       !  mov      edx,     result
       !  sub      ebx,     16
       'Add8XtoY128(x,result)          'result = result + @w[i-16]
       !  lea      eax,     [edi+ebx*8] ;eax --> @w[i-16]
       !  movq     xmm6,    [edx]
       !  movq     xmm7,    [eax]
       !  paddq    xmm6,    xmm7
       !  movq     [edx],   xmm6
    
       !  lea      eax,     [edi+esi*8] ;x = w+(i*8)
       !  mov      edx,     result
       'Copy8XtoY128(result,x)         '@w[i] = @result
       !  movq     xmm0,    [edx]
       !  movq     [eax],   xmm0
       
    !  inc      esi
    !  cmp      esi,     79
    !  jng      TopLoop1
    
    'for i = 0 to 79  (esi = i*8 byte offset, edi = remaining rounds)
    !  xor      esi,     esi
    !  mov      edi,     80
    #ALIGN 16
    TopLoop2:
       't0 = @hh + Sigma1&&(@ee) + Chh(@ee, @ff, @gg) + @CTX.k_array[i] + @w[i]
       Copy8XtoY128(hh,t0)
       Sigma1_128(ee,result)
       Add8XtoY128(result,t0)
       Chh128(ee,ff,gg,result)
       Add8XtoY128(result,t0)
    
       !  mov      ebx,     k_array
       !  mov      edx,     t0
       !  lea      eax,     [ebx+esi]
       'Add8XtoY128(x,t0)
       !  movq     xmm6,    [edx]
       !  movq     xmm7,    [eax]
       !  paddq    xmm6,    xmm7
       !  movq     [edx],   xmm6
    
       !  mov      ebx,     w_array
       !  lea      eax,     [ebx+esi]
       'Add8XtoY128(x,t0)
       !  movq     xmm6,    [edx]
       !  movq     xmm7,    [eax]
       !  paddq    xmm6,    xmm7
       !  movq     [edx],   xmm6
    
       Sigma0_128(aa,t1)
       Maj128(aa,bb,cc,result)
       Add8XtoY128(result,t1)
    
       'Copy8XtoY64(gg,hh)
       'Copy8XtoY64(ff,gg)
       'Copy8XtoY64(ee,ff)
       'Copy8XtoY64(ddd,ee)
       '-- aa, cc, ee, gg = aligned
       !  mov      edx,     gg
       !  mov      ecx,     ff
       !  mov      ebx,     ee
       !  mov      eax,     ddd
       !  movq     xmm3,    [edx]
       !  movq     xmm2,    [ecx]
       !  movq     xmm1,    [ebx]
       !  movq     xmm0,    [eax]
       !  movq     [edx+8], xmm3
       !  movq     [edx],   xmm2
       !  movq     [ecx],   xmm1
       !  movq     [ebx],   xmm0
       Add8XtoY128(t0,ee)
    
       'Copy8XtoY64(cc,ddd)
       'Copy8XtoY64(bb,cc)
       'Copy8XtoY64(aa,bb)
       '@aa = t0 + t1
       !  mov      ecx,     cc
       !  mov      ebx,     bb
       !  mov      eax,     aa
       !  mov      edx,     t0
       !  movq     xmm3,    [ecx]
       !  movq     xmm2,    [ebx]
       !  movq     xmm1,    [eax]
       !  movq     xmm0,    [edx]
       !  movq     [ecx+8], xmm3
       !  movq     [ecx],   xmm2
       !  movq     [ebx],   xmm1
       !  movq     [eax],   xmm0
       Add8XtoY128(t1,aa)
    'next i
    !  add      esi,  8
    !  dec      edi
    !  jnz      TopLoop2
    
    'for i = 0 to 7 : Ctx.State(i) = Ctx.State(i) + s_array(i) : next i
    !  mov      esi,     s_array        ;esi --> s_array&&(0) (aligned)
    !  mov      edi,     pstate         ;edi --> Ctx.State(0)
    
    !  movdqa   xmm0,    [edi]
    !  movdqa   xmm1,    [edi+16]
    !  movdqa   xmm2,    [edi+32]
    !  movdqa   xmm3,    [edi+48]
    
    !  paddq    xmm0,    [esi]
    !  paddq    xmm1,    [esi+16]
    !  paddq    xmm2,    [esi+32]
    !  paddq    xmm3,    [esi+48]
    
    !  movdqa   [edi],      xmm0
    !  movdqa   [edi+16],   xmm1
    !  movdqa   [edi+32],   xmm2
    !  movdqa   [edi+48],   xmm3
    
    '-- Burn context's temp values (poke$ Ctx.pworkspace, nul$(%WORKSPACESIZE))
    !  mov      ecx,        %WORKSPACESIZE
    !  pxor     xmm0,       xmm0
    !  lea      edi,        [esi+ecx]   ;point to end of workspace
    !  pxor     xmm1,       xmm1
    !  neg      ecx
    BurnTop:
    !  movdqa   [edi+ecx],  xmm0
    !  movdqa   [edi+ecx+16],  xmm0
    !  add      ecx,        32
    !  jnz      BurnTop
    
    !  pop      edi
    !  pop      esi
    !  pop      ebx
    END FUNCTION
    
    
    '====================
    FUNCTION SHA512_Compress64(Ctx AS SHA512_CONTEXT) AS LONG
    '-- Requires MMX
    '-- In macros, EBX is considered always available; ESI & EDI are 
    '   preserved around use
    #REGISTER NONE
    LOCAL i, x, xx, yy, zz, pstate, t0, t1, result AS LONG
    LOCAL s_array, w_array, k_array AS LONG
    LOCAL aa, bb, cc, ddd, ee, ff, gg, hh AS LONG
    
    s_array = Ctx.s_array : w_array = Ctx.w_array : k_array = CTX.k_array
    
    '-- Local vars aa-hh overlay s_array&&(0-7)
    aa = s_array    : bb = s_array+8  : cc = s_array+16 : ddd = s_array+24
    ee = s_array+32 : ff = s_array+40 : gg = s_array+48 : hh  = s_array+56
    xx = Ctx.pworkspace : yy = xx+8   : zz  = xx+16
    t0 = xx+24 : t1 = xx+32 : result = xx+40
    
    i = Ctx.pdata
    pstate = Ctx.pstate
    
    !  push     ebx
    !  push     esi
    !  push     edi
    '-- Copy current state into s_array&&()
    'poke$ s, peek$(Ctx.pstate, %HASHLEN)
    !  mov      esi,     pstate
    !  mov      edi,     s_array
    !  movq     mm0,     [esi]
    !  movq     mm1,     [esi+8]
    !  movq     mm2,     [esi+16]
    !  movq     mm3,     [esi+24]
    !  movq     mm4,     [esi+32]
    !  movq     mm5,     [esi+40]
    !  movq     mm6,     [esi+48]
    !  movq     mm7,     [esi+56]
    !  movq     [edi],   mm0
    !  movq     [edi+8], mm1
    !  movq     [edi+16],mm2
    !  movq     [edi+24],mm3
    !  movq     [edi+32],mm4
    !  movq     [edi+40],mm5
    !  movq     [edi+48],mm6
    !  movq     [edi+56],mm7
    
    '-- Copy target data into w&&(0-15) w/64-bit little-to-big endian conversion
    !  mov      esi,     i
    !  mov      edi,     w_array
    !  mov      ecx,     %BLOCKSIZE
    #ALIGN 4
    BSwapCopyTop:
    !  sub      ecx,     4
    !  mov      eax,     [esi+ecx]
    !  sub      ecx,     4
    !  bswap    eax
    !  mov      edx,     [esi+ecx]
    !  mov      [edi+ecx], eax
    !  bswap    edx
    !  test     ecx,     ecx
    !  mov      [edi+ecx+4], edx
    !  jnz      BSwapCopyTop
    
    '-- Fill w&&(16-79)
    '   for i = 16 to 79
    '      @w[i] = Gamma1(@w[i-2]) + @w[i-7] + Gamma0(@w[i-15]) + @w[i-16]
    '   next i
    !  mov         esi,  16          'edi = w from above
    #ALIGN 8
    TopLoop1:
       !  mov      ebx,  esi
       !  sub      ebx,  2
       !  lea      eax,  [edi+ebx*8] ;x = w+((i-2)*8)
       !  mov      x,    eax         ;x --> w[i-2]
       Gamma1_64(x,result)           'result = Gamma1(@w[i-2])
    
       !  mov      ebx,  esi
       !  sub      ebx,  7
       !  lea      eax,  [edi+ebx*8]
       !  mov      x,    eax         ;x --> @w[i-7] (x = w+((i-7)*8))
       Add8XtoY64(x,result)            'result = result + @w[i-7]
    
       !  mov      ebx,  esi
       !  sub      ebx,  15
       !  lea      eax,  [edi+ebx*8]
       !  mov      x,    eax         ;x --> w[i-15]
       Gamma0_64(x,xx)
       Add8XtoY64(xx,result)           'result = result + Gamma0(@w[i-15])
    
       !  mov      ebx,  esi
       !  sub      ebx,  16
       !  lea      eax,  [edi+ebx*8]
       !  mov      x,    eax         ;x --> @w[i-16]
       Add8XtoY64(x,result)            'result = result + @w[i-16]
    
       !  mov      edx,  result
       !  lea      eax,  [edi+esi*8] ;x = w+(i*8) (@w[i])
       'Copy8XtoY64(result,x)       '@w[i] = @result
       !  movq     mm0,  [edx]
       !  movq     [eax],mm0
    
    !  inc      esi
    !  cmp      esi,     79
    !  jng      TopLoop1
    
    'for i = 0 to 79  (esi = i*8 byte offset, edi = remaining rounds)
    !  xor      esi,     esi
    !  mov      edi,     80
    #ALIGN 8
    TopLoop2:
       't0 = @hh + Sigma1&&(@ee) + Chh(@ee, @ff, @gg) + @CTX.k_array[i] + @w[i]
       Copy8XtoY64(hh,t0)
       Sigma1_64(ee,result)
       Add8XtoY64(result,t0)
       Chh64(ee,ff,gg,result)
       Add8XtoY64(result,t0)
    
       !  mov      ebx,  k_array
       !  lea      eax,  [ebx+esi]
       !  mov      x,    eax
       Add8XtoY64(x,t0)
    
       !  mov      ebx,   w_array
       !  lea      eax,   [ebx+esi]
       !  mov      x,     eax
       Add8XtoY64(x,t0)
    
       Sigma0_64(aa,t1)
       Maj64(aa,bb,cc,result)
       Add8XtoY64(result,t1)
    
       'Copy8XtoY64(gg,hh)
       'Copy8XtoY64(ff,gg)
       'Copy8XtoY64(ee,ff)
       'Copy8XtoY64(ddd,ee)
       !  mov      edx,     gg
       !  mov      ecx,     ff
       !  mov      ebx,     ee
       !  mov      eax,     ddd
       !  movq     mm3,     [edx]
       !  movq     mm2,     [ecx]
       !  movq     mm1,     [ebx]
       !  movq     mm0,     [eax]
       !  movq     [edx+8], mm3
       !  movq     [edx],   mm2
       !  movq     [ecx],   mm1
       !  movq     [ebx],   mm0
       Add8XtoY64(t0,ee)
    
       'Copy8XtoY64(cc,ddd)
       'Copy8XtoY64(bb,cc)
       'Copy8XtoY64(aa,bb)
       '@aa = t0 + t1
       !  mov      ecx,     cc
       !  mov      ebx,     bb
       !  mov      eax,     aa
       !  mov      edx,     t0
       !  movq     mm3,     [ecx]
       !  movq     mm2,     [ebx]
       !  movq     mm1,     [eax]
       !  movq     mm0,     [edx]
       !  movq     [ecx+8], mm3
       !  movq     [ecx],   mm2
       !  movq     [ebx],   mm1
       !  movq     [eax],   mm0
       Add8XtoY64(t1,aa)
    'next i
    !  add      esi,     8
    !  dec      edi
    !  jnz      TopLoop2
    
    'for i = 0 to 7 : Ctx.State(i) = Ctx.State(i) + s_array(i) : next i
    !  mov      eax,     s_array     ;eax --> s_array&&(0)
    !  mov      edx,     pstate      ;edx --> Ctx.State(0)
    !  mov      xx,      eax         ;xx  --> s_array&&(0)
    !  mov      yy,      edx         ;yy  --> Ctx.state0
    !  mov      edi,     8
    !  mov      esi,     7
    Add8XtoY64(xx,yy)
    #ALIGN 4
    TopLoop3:
       'advance pointers
       !  add      xx,   edi         ;xx --> s[i]
       !  add      yy,   edi         ;yy --> pcurrent_state[i]
       Add8XtoY64(xx,yy)
    !  dec      esi
    !  jnz      TopLoop3
    
    '-- Burn context's temp values (poke$ Ctx.pworkspace, nul$(%WORKSPACESIZE))
    !  mov      edi,     s_array
    !  xor      eax,     eax
    !  mov      ecx,     (%WORKSPACESIZE\4)
    !  cld
    !  rep      stosd
    
    !  pop      edi
    !  pop      esi
    !  pop      ebx
    !  emms
    END FUNCTION
    
    
    '====================
    FUNCTION SHA512_Compress32(Ctx AS SHA512_CONTEXT) AS LONG
    '-- Uses 32-bit code only
    '-- In macros, EBX is considered always available; ESI & EDI are 
    '   preserved around use
    #REGISTER NONE
    LOCAL i, x, xx, yy, zz, pstate, t0, t1, result AS LONG
    LOCAL s_array, w_array, k_array AS LONG
    LOCAL aa, bb, cc, ddd, ee, ff, gg, hh AS LONG
    
    s_array = Ctx.s_array : w_array = Ctx.w_array : k_array = CTX.k_array
    '-- Local vars aa-hh overlay s_array&&(0-7)
    aa = s_array    : bb = s_array+8  : cc = s_array+16 : ddd = s_array+24
    ee = s_array+32 : ff = s_array+40 : gg = s_array+48 : hh  = s_array+56
    xx = Ctx.pworkspace : yy = xx+8   : zz  = xx+16
    t0 = xx+24 : t1 = xx+32 : result = xx+40
    
    pstate = Ctx.pstate
    '-- Copy current state into s_array&&()
    poke$ s_array, peek$(pstate, %HASHLEN)
    
    '-- Copy target data into w&&(0-15) w/64-bit little-to-big endian conversion
    i = Ctx.pdata 
    !  push     ebx
    !  push     esi
    !  push     edi
    
    !  mov      esi,  i
    !  mov      edi,  w_array
    !  mov      ecx,  %BLOCKSIZE
    #ALIGN 4
    BSwapCopyTop:
    !  sub      ecx,  4
    !  mov      eax,  [esi+ecx]
    !  sub      ecx,  4
    !  bswap    eax
    !  mov      edx,  [esi+ecx]
    !  mov      [edi+ecx], eax
    !  bswap    edx
    !  test     ecx,  ecx
    !  mov      [edi+ecx+4], edx
    !  jnz      BSwapCopyTop
    
    '-- Fill w&&(16-79)
    '   for i = 16 to 79
    '      @w[i] = Gamma1(@w[i-2]) + @w[i-7] + Gamma0(@w[i-15]) + @w[i-16]
    '   next i
    !  mov      esi,  16             ;edi = w from above
    #ALIGN 4
    TopLoop1:
       '@w[i] = Gamma1(@w[i-2]) + @w[i-7] + Gamma0(@w[i-15]) + @w[i-16]
       !  mov      ebx,  esi
       !  sub      ebx,  2
       !  lea      eax,  [edi+ebx*8] ;x = w+((i-2)*8)
       !  mov      x,    eax         ;x --> w[i-2]
       Gamma1(x,result)              'result = Gamma1(@w[i-2])
    
       !  mov      ebx,  esi
       !  sub      ebx,  7
       !  lea      eax,  [edi+ebx*8]
       !  mov      x,    eax         ;x --> @w[i-7] (x = w+((i-7)*8))
       Add8XtoY(x,result)            'result = result + @w[i-7]
    
       !  mov      ebx,  esi
       !  sub      ebx,  15
       !  lea      eax,  [edi+ebx*8]
       !  mov      x,    eax         ;x --> w[i-15]
       Gamma0(x,xx)
       Add8XtoY(xx,result)           'result = result + Gamma0(@w[i-15])
    
       !  mov      ebx,  esi
       !  sub      ebx,  16
       !  lea      eax,  [edi+ebx*8]
       !  mov      x,    eax         ;x --> @w[i-16]
       Add8XtoY(x,result)            'result = result + @w[i-16]
    
       !  lea      eax,  [edi+esi*8] ;x = w+(i*8)
       !  mov      x,    eax         ;x --> @w[i]
       Copy8XtoY(result,x)           '@w[i] = @result
    !  inc      esi
    !  cmp      esi,  79
    !  jng      TopLoop1
    
    'for i = 0 to 79  (esi = i*8 byte offset, edi = remaining rounds)
    !  xor      esi,  esi
    !  mov      edi,  80
    #ALIGN 4
    TopLoop2:
       't0 = @hh + Sigma1&&(@ee) + Chh(@ee, @ff, @gg) + @CTX.k_array[i] + @w[i]
       Copy8XtoY(hh,t0)
       Sigma1(ee,result)
       Add8XtoY(result,t0)
       Chh(ee,ff,gg,result)
       Add8XtoY(result,t0)
    
       !  mov      ebx,  k_array
       !  lea      eax,  [ebx+esi]
       !  mov      x,    eax
       Add8XtoY(x,t0)
    
       !  mov      ebx,  w_array
       !  lea      eax,  [ebx+esi]
       !  mov      x,    eax
       Add8XtoY(x,t0)
    
       Sigma0(aa,t1)
       Maj(aa,bb,cc,result)
       Add8XtoY(result,t1)
    
       Copy8XtoY(gg,hh)
       Copy8XtoY(ff,gg)
       Copy8XtoY(ee,ff)
       Copy8XtoY(ddd,ee)
       Add8XtoY(t0,ee)
    
       Copy8XtoY(cc,ddd)
       Copy8XtoY(bb,cc)
       Copy8XtoY(aa,bb)
       Copy8XtoY(t0,aa)
       Add8XtoY(t1,aa)
    'next i
    !  add      esi,  8
    !  dec      edi
    !  jnz      TopLoop2
    
    'for i = 0 to 7 : Ctx.State(i) = Ctx.State(i) + s_array(i) : next i
    !  mov      eax,  s_array        ;eax --> s_array&&(0)
    !  mov      edx,  pstate         ;edx --> Ctx.State(0)
    !  mov      xx,   eax            ;xx  --> s_array&&(0)
    !  mov      yy,   edx            ;yy  --> Ctx.state0
    !  mov      edi,  8
    !  mov      esi,  7
    Add8XtoY(xx,yy)
    #ALIGN 4
    TopLoop3:
       'advance pointers
       !  add   xx,  edi             ;xx --> s[i]
       !  add   yy,  edi             ;yy --> pcurrent_state[i]
       Add8XtoY(xx,yy)
    !  dec   esi
    !  jnz   TopLoop3
    
    '-- Burn context's temp values (poke$ Ctx.pworkspace, nul$(%WORKSPACESIZE))
    !  mov      edi,  s_array
    !  xor      eax,  eax
    !  mov      ecx,  (%WORKSPACESIZE\4)
    !  cld
    !  rep      stosd
    
    !  pop      edi
    !  pop      esi
    !  pop      ebx
    END FUNCTION
    
    
    '====================
    FUNCTION SHA512_Buffer(BYVAL DataBuffer AS BYTE PTR, BYVAL Length AS DWORD, BYVAL HashBuffer AS BYTE PTR) EXPORT AS LONG
    '-- Expects HashBuffer to point to a buffer of at least %HASHLEN bytes (512 bits \ 8)
    REGISTER i AS DWORD
    LOCAL lastbuff$, ctx AS SHA512_CONTEXT, pfunction, pstate AS LONG
    
    i = Length AND (%BLOCKSIZE-1)
    
    lastbuff$ = peek$((DataBuffer+Length)-i, i)
    lastbuff$ = lastbuff$ + SHA512_MakePadding(Length)
    
    SHA512_Init ctx
    
    if HasSSE2&() then
       pfunction = codeptr(SHA512_Compress128)
    elseif HasMMX&() then
       pfunction = codeptr(SHA512_Compress64)
    else
       pfunction = codeptr(SHA512_Compress32)
    end if
    
    ctx.lendata = Length
    ctx.pdata   = DataBuffer
    
    i = Length AND (NOT (%BLOCKSIZE-1))   'byte count of the whole blocks only
    do while i > 0
       call dword pfunction STDCALL (BYREF ctx)
       ctx.pdata = ctx.pdata + %BLOCKSIZE
       i = i - %BLOCKSIZE
    loop
    
    ctx.pdata = strptr(lastbuff$)
    ctx.lendata = len(lastbuff$)
    
    do while ctx.lendata > 0
       call dword pfunction STDCALL (BYREF ctx)
       ctx.pdata = ctx.pdata + %BLOCKSIZE
       ctx.lendata = ctx.lendata - %BLOCKSIZE
    loop
    
    '-- Copy current state from Ctx.State() to HashBuffer
    'for i = 0 to (%HASHLEN\8)-1 : @HashBuffer[i] = Ctx.State(i) : next i
    pstate = ctx.pstate
    
    !  push     esi
    !  push     edi
    !  mov      esi,     pstate      ;esi -> ctx.state(0)
    !  mov      edi,     HashBuffer
    #IF %RETURN_LITTLE_ENDIAN
    !  xor      ecx,     ecx
    LoopTop:
    !  mov      edx,     [esi+ecx*4+4]
    !  mov      eax,     [esi+ecx*4]
    !  bswap    edx
    !  bswap    eax
    !  mov      [ecx*4+edi],  edx
    !  inc      ecx
    !  mov      [ecx*4+edi],  eax
    !  inc      ecx
    !  test     ecx,     (%HASHLEN\4)
    !  jz       LoopTop
    #ELSE
    !  mov      ecx,     (%HASHLEN\4)
    !  cld
    !  rep      movsd
    #ENDIF
    !  pop      edi
    !  pop      esi
    END FUNCTION
    
    
    '====================
    FUNCTION SHA512_File(File_Name$, Hash$) EXPORT AS LONG
    '-- Returns 0 on success or PB (not OS) error code
    '-- Parameter Hash$ is resized here before return
    REGISTER i AS LONG, bytesleft AS DWORD
    LOCAL buffer$, padding$
    LOCAL ctx AS SHA512_CONTEXT, phash AS QUAD PTR
    LOCAL ecode, pfunction, pstate, infile, lastpass, maxstring AS LONG
    
    '-- If file not found, return PB error code
    if isfile(File_Name$) = 0 then
       function = 53 : exit function
    end if
    
    buffer = string$(%FILE_BUFFERSIZE, 0)
    maxstring = %FILE_BUFFERSIZE
    
    ctx.lendata = %BLOCKSIZE
    SHA512_Init ctx
    
    if HasSSE2&() then
       pfunction = codeptr(SHA512_Compress128)
    elseif HasMMX&() then
       pfunction = codeptr(SHA512_Compress64)
    else
       pfunction = codeptr(SHA512_Compress32)
    end if
    
    infile = freefile
    open File_Name$ for binary lock shared as infile base=0
    if err then goto SHA_File_Error
    bytesleft = lof(infile)
    padding = SHA512_MakePadding(bytesleft)
    
    do
       'Resize if necessary & flag final buffer
       if bytesleft =< maxstring then
          maxstring = bytesleft
          buffer = string$(maxstring, 0)
          incr lastpass
       end if
       get infile,, buffer : if err then goto SHA_File_Error
       if lastpass then buffer = buffer + padding
       ctx.pdata = strptr(buffer)
       for i = 1 to (len(buffer)\%BLOCKSIZE)
          call dword pfunction STDCALL (BYREF ctx)
          ctx.pdata = ctx.pdata + %BLOCKSIZE
       next i
       bytesleft = bytesleft - maxstring
    loop until lastpass
    close infile : if err then goto SHA_File_Error
    
    '-- Copy current state from Ctx.State() to Hash$
    'for i = 0 to 7 : @phash[i] = Ctx.State(i) : next i
    Hash$ = string$(%HASHLEN,0)
    phash = strptr(Hash$)
    pstate = ctx.pstate
    !  push     esi
    !  push     edi
    !  mov      esi,     pstate      ;esi -> ctx.state(0)
    !  mov      edi,     phash
    #IF %RETURN_LITTLE_ENDIAN
    !  xor      ecx,     ecx
    LoopTop:
    !  mov      edx,     [esi+ecx*4+4]
    !  mov      eax,     [esi+ecx*4]
    !  bswap    edx
    !  bswap    eax
    !  mov      [edi+ecx*4],   edx
    !  inc      ecx
    !  mov      [edi+ecx*4],   eax
    !  inc      ecx
    !  test     ecx,     (%HASHLEN\4)
    !  jz       LoopTop
    #ELSE
    !  mov      ecx,     (%HASHLEN\4)
    !  cld
    !  rep      movsd
    #ENDIF
    !  pop      edi
    !  pop      esi
    
    Exit_SHA_File:
    function = ecode
    EXIT FUNCTION
    
    '============
    SHA_File_Error:
    if err then
       ecode = errclear
    else
       ecode = -1
    end if
    goto Exit_SHA_File   'reached via GOTO, not ON ERROR, so RESUME does not apply
    END FUNCTION
    
    
    '=========================
    FUNCTION SHA512_MakePadding(BYVAL TotalBytes AS DWORD) AS STRING
    '-- Creates the necessary string to append to targeted data buffer
    REGISTER i AS LONG, padBytes AS LONG
    LOCAL buffBits AS QUAD, padding$
    LOCAL pbyte1, pbyte2 AS BYTE PTR
    
    buffBits = TotalBytes * 8
    padding$ = nul$(16)
    pbyte1 = strptr(padding$)+8 : pbyte2 = varptr(buffBits)
    
    '-- Reverse bytes during copy
    for i = 0 to 7
       @pbyte1[i] = @pbyte2[7 - i]
    next i
    
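    '-- Zero bytes between the &h80 marker and the 16-byte length field.
    '   The final AND keeps padBytes in 0..%BLOCKSIZE-1, so a length where
    '   (TotalBytes+17) is an exact multiple of %BLOCKSIZE gains no extra block.
    padBytes = (%BLOCKSIZE - ((TotalBytes+17) AND (%BLOCKSIZE-1))) AND (%BLOCKSIZE-1)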
    function = chr$(&h80) + nul$(padBytes) + padding$
    END FUNCTION
    
    
    '===================
    FUNCTION HasSSE2() AS LONG
    !  push  ebx               ;cpuid overwrites ebx; preserve it as the Compress routines do
    !  mov   eax, 1
    !  cpuid
    !  xor   eax, eax
    !  test  edx, &h04000000   ;bit 26
    !  setnz al                ;rem to force downgrade to MMX
    !  pop   ebx
    !  mov   function, eax
    END FUNCTION
    
    '===================
    FUNCTION HasMMX() AS LONG
    !  push     ebx            ;cpuid overwrites ebx; preserve it as the Compress routines do
    !  mov      eax,  1
    !  cpuid
    !  xor      eax,  eax
    !  test     edx,  &h800000 ;bit 23
    !  setnz    al             ;rem to force downgrade to 32-bit
    !  pop      ebx
    !  mov      function, eax
    END FUNCTION
    '-- END SHA512a.INC ---------------------------------------------------
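
For readers who want to cross-check SHA512_MakePadding outside PowerBASIC, here is a minimal Python sketch of the padding rule as defined for SHA-512: a single &h80 byte, the fewest zero bytes that bring the total to a multiple of the block size, then the message length in bits as a 16-byte big-endian value. The name sha512_padding and the constant BLOCK_SIZE = 128 are assumptions made for illustration (the listing's %BLOCKSIZE is defined elsewhere), and none of this is part of SHA512a.INC.

```python
BLOCK_SIZE = 128  # assumed value of %BLOCKSIZE: SHA-512 block size in bytes


def sha512_padding(total_bytes: int) -> bytes:
    """Return the bytes to append to a message that is total_bytes long."""
    # 1 marker byte + 16 length bytes = 17 bytes of fixed overhead.
    # zero_count is the smallest non-negative filler that makes the
    # padded message a whole number of blocks.
    zero_count = (-(total_bytes + 17)) % BLOCK_SIZE
    # Message length in bits, stored big-endian in 16 bytes.
    bit_length = (total_bytes * 8).to_bytes(16, "big")
    return b"\x80" + b"\x00" * zero_count + bit_length


if __name__ == "__main__":
    # Print: message length, padding length, remainder after padding.
    for n in (0, 3, 111, 112, 127, 128, 239):
        pad = sha512_padding(n)
        print(n, len(pad), (n + len(pad)) % BLOCK_SIZE)
```

Lengths of 111 bytes (or 111 plus any multiple of 128) are worth including in a test run, since there the marker byte and the length field exactly fill the block and the padding should be just 17 bytes.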

  • #2
    PBWin 10

    The lines with call dword pfunction do not work with PBWin 10.



    • #3
      Bernhard, the following is a bit of a dog's dinner but it works.

      Code:
      Declare Function TheseParameters(ctx As SHA512_CONTEXT) As Long
      
      and change the three instances of 'call dword pfunction' to
      
      Call Dword pfunction Using TheseParameters(ctx)
      No doubt a more elegant solution exists but I stopped at the first one that worked - I'd already pulled enough hair out.
      Last edited by David Roberts; 1 Oct 2013, 10:14 PM.
