Announcement

Collapse

Forum Guidelines

This forum is for finished source code that is working properly. If you have questions about this or any other source code, please post it in one of the Discussion Forums, not here.
See more
See less

Complementary Multiply-with-Carry with lag PRNG

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • PBWin/PBCC Complementary Multiply-with-Carry with lag PRNG

    The Multiply-with-Carry (MWC) pseudo random number generator (PRNG) by George Marsaglia has a couple of issues: the first is the use of 2^32 as a base which is resolved when 2^32-1 is used; the second is with the method itself which is resolved by using what is called Complementary Multiply-with-Carry (CMWC).

    The period of the original idea is just less than 2^64 which is the square of PB's RND at 2^32. On my machine doing nothing else but compute RND the cycle starts to repeat after only a couple of minutes and, of course, there are only 2^32 possible entry points into the cycle.

    It is worth noting that when using Randomize Timer we are restricting the number of entry points to seconds in a day x 64, the usual resolution of the system clock, giving then only 55259600 or 2^22.4 entry points.

    The period of both MWC and CMWC can be enhanced by using a lag method where we use initial values x(0), x(1), ..., x(r-1) and an initial value for the carry corresponding with x(r-1). The original idea then is a lag~1.

    I have been posting slls of late but two function 'shells', CMWC( Opt OptionalLag As Long ) and CWMCSeed( Opt Text As String ), are presented here, so you can kick them around to suit your requirements.

    The code presented here uses a CMWC base 2^32-1 with lag~r where r = 2, 4, 8, ..., 4096. With r = 4096 we have a period of 2^131086 compared with 2^19937 for the Mersenne Twister PRNG.

    The code assumes at least Windows XP SP3 and the latest compilers but can be re-written to accomodate all versions of Windows and compilers but with more effort.

    In Marsaglia's post here we have
    Code:
    static unsigned long Q[4096],c=362436; /* choose random initial c<809430660 and */
                                             /* 4096 random 32-bit integers for Q[]   */
      unsigned long CMWC4096(void){
      unsigned long long t, a=18782LL;
      static unsigned long i=4095;
      unsigned long x,r=0xfffffffe;
         i=(i+1)&4095;
         t=a*Q[i]+c;
         c=(t>>32); x=t+c; if(x<c){x++;c++;}
         return(Q[i]=r-x);    }
    Above that section of code is a MWC base 2^32 lag~256 which uses a multiplier 'a' of 809430660. The carrys generated will never exceed the multiplier and it is therefore preferable to use an intial carry less than the multipler. However, the choice in the above is the same as with the MWC example and the used value of c is 362436; which is greater than the multiplier 18782. The term c=(t>>32) is what we would use for a base 2^32. I wonder then if the CMWC was a copy and paste and not fully edited. I shall come back to c=(t>>32).

    The principle of CMWC is x = b - 1 - x where b is the base and x, on the right hand side, is the number generated by the MWC. With b = 2^32-1 we get x = 2^32 - 2 - x. 2^32 - 2 is 0xfffffffe; the value used for r in 'return(Q[i]=r-x'. The x's generated by MWC base 2^32-1 will be in the range [0,2^32-2]. If we use x = 0 and x = 2^32-2 in the CMWC we get [2^32-2,0]. So, the above is using a base of 2^32-1.

    The term x=t+c had me fooled for a while, I was expecting a modulo.

    In general, j = k*(j\k) + j MOD k ie quotient + remainder

    Rearranging, j MOD k = j - k*(j\k)

    With k = 2^32-1 we get

    j MOD (2^32-1) = j - (2^32-1)*(j\(2^32-1))
    = j - 2^32*(j\(2^32-1)) + j\(2^32-1)

    Substituting Marsaglia's variables we get

    x = t MOD (2^32-1) = t - 2^32*(t\(2^32-1)) + t\(2^32-1)
    = t - 2^32*c + c ; with c = t\(2^32-1)
    = t + c

    Since 2^32*c = 0 because c is 32 bit.

    I had not seen that before - it is neat isn't it?

    This further confirms a base of 2^32-1.

    In the following code x = t Mod &H0FFFFFFFF is used rather than x = t + c because it was found to be marginally faster; we are using Register t As Ext. I should imagine that x = t + c will come into its own with an assembler version.

    So, why is c=(t>>32) being used instead of c = t\&H0FFFFFFFF? ie t\(2^32-1).

    Answers on a post card to ......

    In the following code c = t\&H0FFFFFFFF is used.

    I should add that every implementation seen, from using C/C++, PureBASIC and Java, port Marsaglia's post as written - no questions asked.

    At the moment I don't understand 'if(x<c){x++;c++;}' but have accepted it. More thought is required. Perhaps someone can put me wise on that?

    For byte stream testing purposes we need to map [0,2^32-2] into [0,2^32-1] otherwise the fourth byte of a dword dump will be in the range [0,254] and not [0,255].

    In addition to George Marsaglia's DIEHARD tests I use ent.exe from Fourmilab.

    Here are the results from ten tests of 15MB dumps; five done one day and another five done another day.

    Code:
                         1         2         3         4         5         6         7         8         9        10
    Entropy          7.999989  7.999989  7.999989  7.999990  7.999988  7.999989  7.999988  7.999988  7.999988  7.999990
    Chi square        86.67%    70.67%    83.99%    90.55%    22.72%    83.27%    55.25%    47.67%    56.03%    94.64%
    Arithmetic Mean  127.4922  127.4959  127.4820  127.5184  127.5042  127.5056  127.5123  127.4812  127.5121  127.5221
    Monte Carlo       0.01%     0.05%     0.02%     0.04%     0.03%     0.01%     0.01%     0.02%     0.02%     0.05%
    SCC              0.000210  0.000120  0.000159 -0.000144 -0.000528  0.000407 -0.000089  0.000040  0.000266  0.000254
    For a full description of the tests see Fourmilab's home page.

    With the chi square, in particular, non-random behaviour will put us into the regions < 10% and > 90%. We only have two in the above and they would qualify as almost suspect. The more tests we do the more likely one would qualify as suspect. There is nothing unusual with that.

    The conclusion here is that there is no evidence to contradict the null hypothesis that CMWC produces good quality pseudo random numbers.

    Compare with these tests of 15MB dumps.

    Code:
                         1         2         3
    Entropy          7.999990  7.999989  7.999988
    Chi square        97.13%    80.63%    41.68%
    Arithmetic Mean  127.5034  127.5028  127.46772
    Monte Carlo       0.00%     0.02%     0.06%
    SCC              0.000156  0.000354 -0.000165
    These were 15MB dumps from here; a quantum random number generator service - the reason why I am using 15MB dumps since the service gives dumps of 1MB, 15MB and 100MB.

    That chi square of 97.13% puts us into the suspect region but it may be a while before we see that again.

    Obviously, there are many ways we can populate Q() with the initial random seeds. In the following code the CryptGenRandom API is used. It is easy to implement and, by definition, does not require seeding itself.

    Best practice would call the function CMWC once in an application's initial stages so as to initialize the generator. It has an optional lag parameter which defaults to lag~256 if not given or if an invalid lag is given.

    On my machine I am getting about 10 million dwords a second, irrespective of which lag is used - a bit of assembler would help.

    So, which lag to use? I had reservations about a small multipler but they are not borne out in tests. There are some scenarios where exceptionally large periods are required where even Mersenne Twister is left wanting. A lag of 1024 and above exceed Mersenne Twister. Devices limited in memory would benefit from low memory Q() usage. A lag~64 needs only 256 bytes for Q() and has a period of 2^2077. You decide.

    Post any comments here. I have one of my own.

    Example code:

    Code:
    #Compile Exe
    #Dim All
     
    #Include "WIN32API.INC"
     
    Global hProv As Dword
     
    Function PBMain() As Long
    Local i, n As Long
    Local temp As Dword
    Local msg As String
     
      hProv = GetCSPHandle
      If hProv = 0 Then
        MessageBox 0, "Failed to get Cryptographic Service Provider handle" + $CrLf + _
        "Probable cause: Operating system < XP SP3", "CMWC", %MB_IconError Or %MB_ApplModal Or %MB_Topmost
        Exit Function
      End If
     
      CMWC(4096) ' Initialize generator ' takes 0.824ms On my machine
     
      n = 20
    '  n = 15*1024*1024\4
    '  Open "15MB04096.txt" For Binary As #1
      For i = 1 To n
        msg = msg + Str$(CMWC) + $CrLf
    '   temp = CMWC()
    '   Put #1, , temp
      Next
    '  Close #1
     
      MsgBox msg
     
    '  MsgBox "Done"
     
      If hProv Then CryptReleaseContext hProv, 0
     
    End Function
     
    Function CMWC( Opt OptionalLag As Long ) As Dword
     
    Dim Q() As Static Dword ' Random x seeds
    Static i, j, Multiplier, BeenHere As Long
    Static cary As Dword ' Random And < Multiplier
    Register x As Dword ' MWC Random number generated
    Register t As Ext
     
      If IsFalse BeenHere Then ' Initialize Q() And carry
     
        Local Lag As Long
        If IsMissing(OptionalLag) Then
          Lag = 256
        Else
          Lag = OptionalLag
        End If
     
        i = Lag -1
        j = i
     
        ReDim Q(0 To i) As Static Dword
        CryptGenRandom( hProv, 4 * Lag, VarPtr( Q(0) ) ) ' Random x seeds
        Local k As Long
        For k = 0 To i
          If Q(k) = 4294967295 Then Decr Q(k)
          ' x seeds should be less than 2^32-1
        Next
     
        Select Case As Long Lag
          Case 4
            Multiplier = 987654366 ' 2^157
          Case 8
            Multiplier = 987651386 ' 2^285
          Case 16
            multiplier = 987765178 ' 2^541
          Case 32
            Multiplier = 987655670 ' 2^1053
          Case 64
            Multiplier = 987651206 ' 2^2077
          Case 128
            Multiplier = 987688302 ' 2^4125
          Case 256
            Multiplier = 987662290 ' 2^8221
          Case 512
            Multiplier = 123462658 ' 2^16410
          Case 1024
            Multiplier = 5555698   ' 2^32790
          Case 2048
            Multiplier = 1030770   ' 2^65559
          Case 4096
            Multiplier = 18782     ' 2^131086
          Case Else
            MsgBox "Invalid lag given." + $CrLf + $CrLf + "A 256 lag will be used.", %MB_Ok, "CMWC"
            Multiplier = 987662290 ' 2^8221
        End Select
     
        CryptGenRandom(hProv, 4, VarPtr(cary))
        cary = cary * (Multiplier-1)/4294967295 ' Make cary < Multiplier
        BeenHere = %True
     
      End If
     
      ' { Assembler here would be nice
      i = ( i + 1 ) And j ' Cycle around j
      t = Multiplier * Q(i) + cary
      cary = t \ &H0FFFFFFFF ' Base = 2^32-1
    '  x = t + cary
      x = t Mod &H0FFFFFFFF
    '  If ( x < cary ) Then ' See next post why commented out
    '    Incr x : Incr cary
    '  End If
      Q(i) = &H0FFFFFFFE - x ' The CMWC aspect (Base-1) - x
      ' }
     
      ' Q(i) = Q(i) * 1.000000000232831##  ; used when creating test files
     
      Function = Q(i)
     
    End Function
     
    ' ***************************************************
     
    Function GetCSPHandle As Long
      ' We have a failure If the Function returns %False
      Local hCSP As Dword
     
      If CryptAcquireContext( hCSP, ByVal %Null, ByVal %Null, %PROV_RSA_AES, 0 ) = %False Then
        ' Unable To Find a Default For %PROV_RSA_AES so force the issue first For XP
        If CryptAcquireContext( hCSP, ByVal %Null, $MS_ENH_RSA_AES_PROV_XP_A, %PROV_RSA_AES, %CRYPT_VERIFYCONTEXT ) = %False Then
          ' Still failed so force the issue For Vista And Win7
          If CryptAcquireContext( hCSP, ByVal %Null, $MS_ENH_RSA_AES_PROV_A, %PROV_RSA_AES, %CRYPT_VERIFYCONTEXT ) <> %False Then
            Function = hCSP
          End If
        Else
          Function = hCSP
        End If
      Else ' found Default
        Function = hCSP
      End If
     
    End Function
     
    ' ***************************************************


    ***********************************************************************************

    There are situations where we need to have the same sequence per instance of generation along the lines of 'Randomize [number]'. The function CMWC will not do that as it stands. The CryptGenRandom aspect will need to be replaced.

    One way is presented with the function CMWCSeed along the lines of 'Randomize [alphanumeric]' which, of course, we do not have with PB.

    The principle is to take an alphanumeric seed and hash it with, say, SHA256 to populate a lag~8 Q(). This has a period of 2^285. To put that into perspective 2^285/2^32 = 2^253 ~ 10^76 times the period of RND. A SHA512 approach will give ~ 10^162 times the period of RND.

    The alphanumeric parameter is optional. If not given "The time has come the walrus said...." is used; and why not?

    Obviously, there are many ways to populate Q() with a seeded approach - using a hash is an easy way.

    Example code:
    Code:
    #Compile Exe
    #Dim All
     
    #Include "WIN32API.INC"
     
    Global hProv As Dword
     
    Function PBMain() As Long
    Local i, n As Long
    Local msg As String
     
      hProv = GetCSPHandle
      If hProv = 0 Then
        MessageBox 0, "Failed to get Cryptographic Service Provider handle" + $CrLf + _
        "Probable cause: Operating system < XP SP3", "CMWC", %MB_IconError Or %MB_ApplModal Or %MB_Topmost
        Exit Function
      End If
     
      CWMCSeed("PowerBASIC Compile without compromise") ' Takes 0.043ms On my machine
     
      n = 20
      For i = 1 To n
        msg = msg + Str$(CWMCSeed) + $CrLf
      Next
     
      MsgBox msg
     
      If hProv Then CryptReleaseContext hProv, 0
     
    End Function
     
    Function CWMCSeed( Opt Text As String ) As Dword
     
    Dim Q() As Static Dword ' Random x seeds
    Static i, BeenHere As Long
    Static cary As Dword
    Register x As Dword ' MWC Random number generated
    Register t As Ext
     
      If IsFalse BeenHere Then ' Initialize Q()
     
        Local TextToCrunch As String
        If IsMissing(Text) Then
          TextToCrunch = "The time has come the walrus said...."
        Else
          TextToCrunch = Text
        End If
     
        Local hHash, hashSize As Long
        i = 7
        ReDim Q(0 To i) As Static Dword
        CryptCreateHash( hProv, %CALG_SHA_256, 0, 0, hHash )
        CryptHashData( hHash, ByVal StrPtr(TextToCrunch), Len(TextToCrunch), 0 )
        CryptGetHashParam( hHash, %HP_HASHSIZE, VarPtr(hashSize), 4, 0 )
        CryptGetHashParam( hHash, %HP_HASHVAL, VarPtr( Q(0) ), hashSize, 0 )
        Local k As Long
        For k = 0 To i
          If Q(k) = 4294967295 Then Decr Q(k)
          ' x seeds should be less than 2^32-1
        Next
     
        cary = 123456789 ' Carved In stone but could be anything As
                         ' Long As it Is Fixed And less than 9877651386
        BeenHere = %True
     
      End If
     
      i = ( i + 1 ) And 7
      t = 987651386 * Q(i) + cary
      cary = t \ &H0FFFFFFFF ' Base = 2^32-1
      'x = t + cary
      x = t Mod &H0FFFFFFFF
    '  If ( x < cary ) Then ' See next post why commented out
    '    Incr x : Incr cary
    '  End If
      Q(i) = &H0FFFFFFFE - x ' The CMWC aspect (Base-1) - x
     
      Function = Q(i)
     
    End Function
     
    ' *****************************************************************************************************
    Function GetCSPHandle As Long
      ' We have a failure If the Function returns %False
      Local hCSP As Dword
     
      If CryptAcquireContext( hCSP, ByVal %Null, ByVal %Null, %PROV_RSA_AES, 0 ) = %False Then
        ' Unable To Find a Default For %PROV_RSA_AES so force the issue first For XP
        If CryptAcquireContext( hCSP, ByVal %Null, $MS_ENH_RSA_AES_PROV_XP_A, %PROV_RSA_AES, %CRYPT_VERIFYCONTEXT ) = %False Then
          ' Still failed so force the issue For Vista And Win7
          If CryptAcquireContext( hCSP, ByVal %Null, $MS_ENH_RSA_AES_PROV_A, %PROV_RSA_AES, %CRYPT_VERIFYCONTEXT ) <> %False Then
            Function = hCSP
          End If
        Else
          Function = hCSP
        End If
      Else ' found Default
        Function = hCSP
      End If
     
    End Function
     
    ' *****************************************************************************************************
    Last edited by David Roberts; 3 May 2012, 04:46 PM.

  • #2
    I may have the answer to "Why do we have 'if(x<c){x++;c++;}'?"

    c=(t>>32) is c = t \ 2^32. The difference between that and t \ (2^32-1) is negligible but c, since it is a dword, may fall shy by 1 for some values of t so needs to be incremented. The C code uses x=t+c so when c needs incrementing then so does x.

    However, we are not using x=t+c but x = t Mod &H0FFFFFFFF so x doesn't need incrementing.

    We are not using c = t \ 2^32 but c = t \ &H0FFFFFFFF ie c = t \ (2^32-1) so c doesn't need incrementing in the first place.

    So, when we use

    cary = t \ &H0FFFFFFFF
    x = t Mod &HFFFFFFFF

    we don't need

    Code:
    If ( x < cary ) Then
      Incr x : Incr cary
    End If
    When removed I get

    Code:
                         1         2         3         4         5         6         7         8         9        10
    Entropy          7.999989  7.999988  7.999988  7.999989  7.999987  7.999988  7.999990  7.999988  7.999990  7.999988
    Chi square        74.02%    48.99%    30.19%    84.10%    13.40%    39.22%    91.22%    42.37%    92.04%    55.90%
    Arithmetic Mean  127.4773  127.4962  127.4995  127.4796  127.5042  127.4567  127.4963  127.4881  127.4872  127.5163
    Monte Carlo       0.03%     0.02%     0.03%     0.01%     0.01%     0.04%     0.04%     0.03%     0.05%     0.04%
    SCC              0.000268 -0.000386 -0.000245  0.000147 -0.000011 -0.000180  0.000104  0.000001 -0.000080 -0.000124
    I have commented the test in the code above just in case someone comes back with a reason to leave it in.

    It was an expensive test - crunching is now about 18% faster - didn't expect that.

    Comment


    • #3
      Updated 13 May 2012 09:44 BST

      On my machine I am getting about 10 million dwords a second, irrespective of which lag is used - a bit of assembler would help.
      John Gleason stepped to the fore and replaced the 'engine' with assembler. Thanks, John.

      John's code uses c=(t>>32) so I gave some further thought in the discussion here to Marsaglia's 'if(x<c){x++;c++;}' and found it to be correct.

      I am now getting about 61 million dwords a second for lag~4096 and about 55 million dwords a second for lag~256. [With this update]. It is times like this that the inline assembler is worth its weight in gold.

      Code:
      Function CMWC( Opt OptionalLag As Long ) As Dword
       
      Dim Q() As Static Dword ' Random x seeds
      Static i, j, Multiplier, BeenHere As Long
      Static cary As Dword ' Random And < Multiplier
      Static q0Address As Byte Ptr
       
        If IsFalse BeenHere Then ' Initialize Q() And carry
       
          Local Lag As Long
          If IsMissing(OptionalLag) Then
            Lag = 256
          Else
            Lag = OptionalLag
          End If
       
          i = Lag -1
          j = i
       
          ReDim Q(0 To i) As Static Dword
          q0Address = VarPtr( Q(0) )
          CryptGenRandom( hProv, 4 * Lag, VarPtr( Q(0) ) ) ' Random x seeds
          Local k As Long
          For k = 0 To i
            If Q(k) = 4294967295 Then Decr Q(k)
            ' x seeds should be less than 2^32-1
          Next
       
          Select Case As Long Lag
            Case 4
              Multiplier = 987654366 ' 2^157
            Case 8
              Multiplier = 987651386 ' 2^285
            Case 16
              multiplier = 987765178 ' 2^541
            Case 32
              Multiplier = 987655670 ' 2^1053
            Case 64
              Multiplier = 987651206 ' 2^2077
            Case 128
              Multiplier = 987688302 ' 2^4125
            Case 256
              Multiplier = 987662290 ' 2^8221
            Case 512
              Multiplier = 123462658 ' 2^16410
            Case 1024
              Multiplier = 5555698   ' 2^32790
            Case 2048
              Multiplier = 1030770   ' 2^65559
            Case 4096
              Multiplier = 18782     ' 2^131086
            Case Else
              MsgBox "Invalid lag given." + $CrLf + $CrLf + "A 256 lag will be used.", %MB_Ok, "CMWC"
              Multiplier = 987662290 ' 2^8221
          End Select
       
          CryptGenRandom(hProv, 4, VarPtr(cary))
          cary = cary * (Multiplier-1)/4294967295 ' Make cary < Multiplier
          BeenHere = %True
       
        End If
        
        !mov eax, i
        !mov ecx, j
        !Add eax, 1     ;i + 1
        !And eax, ecx   ;And j
        !mov i, eax
      
        !mov ecx, q0Address ' Address Of Q(0)
        !mov eax, [ecx+eax*4] ' Contenets Of Q(i)
      
        !mov edx, Multiplier
        !mul edx        ;Multiplier * Q(i)
        !mov ecx, cary
        !Add eax, ecx   ;+ cary 
        !adc edx, 0     ;In Case there Is a carry Of 1, Add it.
        !mov cary, edx  ; store cary
        !Add eax, edx   ; x = t + cary
        !jc short incr2vars
        !cmp eax, -1
        !je short incr2vars
      
      incrReturn:
        ' The CMWC aspect (Base-1) - x                
        !mov edx, &H0FFFFFFFE
        !Sub edx, eax   ;edx = &H0FFFFFFFE - x
        !mov eax, i
        !mov ecx, q0Address ' Adress Of Q(0)
        !mov [ecx+eax*4], edx ' Store Q(i)
      
      '  Q(i) = Q(i) * 1.000000000232831##  ' used when creating test files  
        
        Function = Q(i)
        
        Exit Function
        
      Incr2vars:
        !Add eax, 1 'Incr x
        !Add cary, 1 'Incr cary
        !jmp incrReturn
        
      End Function
      Code:
      Function CMWCSeed( Opt Text As String ) As Dword
      Static i, j, Multiplier, BeenHere As Long
      Static cary As Dword ' Random And < Multiplier
      Static q0Address As Byte Ptr
       
        If IsFalse BeenHere Then ' Initialize Q()
       
          Local TextToCrunch As String
          If IsMissing(Text) Then
            TextToCrunch = "The time has come the walrus said...."
          Else
            TextToCrunch = Text
          End If
       
          Local hHash, hashSize As Long
       
          i = 7 ' Use 15 For lag~16
       
          j = i
       
          ReDim Q(0 To i) As Static Dword
          q0Address = VarPtr( Q(0) )
       
          CryptCreateHash( hProv, %CALG_SHA_256, 0, 0, hHash ) ' Use %CALG_SHA_512 For lag~16
       
          CryptHashData( hHash, ByVal StrPtr(TextToCrunch), Len(TextToCrunch), 0 )
          CryptGetHashParam( hHash, %HP_HASHSIZE, VarPtr(hashSize), 4, 0 )
          CryptGetHashParam( hHash, %HP_HASHVAL, VarPtr( Q(0) ), hashSize, 0 )
       
          Local k As Long
          For k = 0 To i
            If Q(k) = 4294967295 Then Decr Q(k)
            ' x seeds should be less than 2^32-1
          Next
       
          Multiplier = 987651386 ' Use 987765178 For lag~16
       
          cary = 123456789 ' Carved In stone but could be anything As
                           ' Long As it Is Fixed And less than 9877651386 or 987765178 for lag~16
          BeenHere = %True
       
        End If
        
        !mov eax, i
        !mov ecx, j
        !Add eax, 1     ;i + 1
        !And eax, ecx   ;And j
        !mov i, eax
      
        !mov ecx, q0Address ' Address Of Q(0)
        !mov eax, [ecx+eax*4] ' Contenets Of Q(i)
      
        !mov edx, Multiplier
        !mul edx        ;Multiplier * Q(i)
        !mov ecx, cary
        !Add eax, ecx   ;+ cary 
        !adc edx, 0     ;In Case there Is a carry Of 1, Add it.
        !mov cary, edx  ; store cary
        !Add eax, edx   ; x = t + cary
        !jc short incr2vars
        !cmp eax, -1
        !je short incr2vars
      
      incrReturn:
        ' The CMWC aspect (Base-1) - x                
        !mov edx, &H0FFFFFFFE
        !Sub edx, eax   ;edx = &H0FFFFFFFE - x
        !mov eax, i
        !mov ecx, q0Address ' Adress Of Q(0)
        !mov [ecx+eax*4], edx ' Store Q(i)
      
      '  Q(i) = Q(i) * 1.000000000232831##  ' used when creating test files  
        
        Function = Q(i)
        
        Exit Function
        
      Incr2vars:
        !Add eax, 1 'Incr x
        !Add cary, 1 'Incr cary
        !jmp incrReturn
        
      End Function
      Last edited by David Roberts; 13 May 2012, 06:02 AM. Reason: Forgot to remove Local tQ as quad

      Comment


      • #4
        This requires compilers 10 & 6.

        It is a FastProc version of CMWC.

        The lag is no longer optional and the return is Long.

        On my machine it produces in excess of 124 million Longs per second when speed testing with temp = CMWCF(4096). Without assignment I get 135 million Longs per second.

        Code:
        FastProc CMWCF( ByVal Lag As Long ) As Long
        
        Dim Q() As Static Dword ' Random x seeds
        Static i, j, Multiplier, BeenHere As Long
        Static cary As Dword ' Random And < Multiplier
        Static q0Address As Byte Ptr
        
          If IsFalse BeenHere Then ' Initialize Q() And carry
            
            i = Lag -1
            
            ReDim Q(0 To i) As Static Dword
            q0Address = VarPtr( Q(0) )
            CryptGenRandom( hProv, 4 * Lag, VarPtr( Q(0) ) ) ' Random x seeds
        
            For j = 0 To i
              If Q(j) = 4294967295 Then Decr Q(j)
              ' x seeds should be less than 2^32-1
            Next
        
            Select Case As Long Lag
              Case 4
                Multiplier = 987654366 ' 2^157
              Case 8
                Multiplier = 987651386 ' 2^285
              Case 16
                multiplier = 987765178 ' 2^541
              Case 32
                Multiplier = 987655670 ' 2^1053
              Case 64
                Multiplier = 987651206 ' 2^2077
              Case 128
                Multiplier = 987688302 ' 2^4125
              Case 256
                Multiplier = 987662290 ' 2^8221
              Case 512
                Multiplier = 123462658 ' 2^16410
              Case 1024
                Multiplier = 5555698   ' 2^32790
              Case 2048
                Multiplier = 1030770   ' 2^65559
              Case 4096
                Multiplier = 18782     ' 2^131086
              Case Else
                MsgBox "Invalid lag given." + $CrLf + $CrLf + "A 256 lag will be used.", %MB_Ok, "CMWC"
                Multiplier = 987662290 ' 2^8221
            End Select
            
            CryptGenRandom(hProv, 4, VarPtr(cary))
            cary = cary * (Multiplier-1)/4294967295 ' Make cary < Multiplier
            j = i
            
            BeenHere = %True 
        
          End If
          
          !mov eax, i
          !mov ecx, j
          !Add eax, 1     ;i + 1
          !And eax, ecx   ;And j
          !mov i, eax
        
          !mov ecx, q0Address ' Addrees Of Q(0)
          !mov eax, [ecx+eax*4] ' Contents Of Q(i)
        
          !mov edx, Multiplier
          !mul edx        ;Multiplier * Q(i)
          !mov ecx, cary
          !Add eax, ecx   ;+ cary 
          !adc edx, 0     ;In Case there Is a carry Of 1, Add it.
          !mov cary, edx  ; store cary
          !Add eax, edx   ; x = t + cary
          !jc short incr2vars
          !cmp eax, -1
          !je short incr2vars
        
        incrReturn:
          ' The CMWC aspect (Base-1) - x                
          !mov edx, &H0FFFFFFFE
          !Sub edx, eax   ;edx = &H0FFFFFFFE - x
          !mov eax, i
          !mov ecx, q0Address ' Adress Of Q(0)
          !mov [ecx+eax*4], edx ' Store Q(i)
        
          !jmp short Done  
          
        Incr2vars:
          !Add eax, 1 'Incr x
          !Add cary, 1 'Incr cary
          !jmp short incrReturn
          
        Done:
          
        End FastProc = Q(i)
        Last edited by David Roberts; 13 May 2012, 10:50 AM.

        Comment


        • #5
          This also requires compilers 10 & 6.

          Updated 15 May 2012 17:58 BST

          It is essentially the FastProc function above, named CMWCFF (the last F for Float), except that it does not return a value itself but puts an Extended value [0,1) in the globally declared RandomFloat.

          On my machine:

          t = Rnd => 30 million per second with a period of 2^32

          { CMWCFF(4096)
          t = RandomFloat } => 37 million per second with a period of 2^131086

          So, in use we would do the following

          Global RandomFloat As Ext

          Execute CMWCFF(4096), or whatever lag is to be used.

          and then use RandomFloat

          Here are a couple of averages of 100 million numbers: 0.5000302515 and 0.4999855848. We get similar results from 100 million Rnds.

          Code:
          FastProc CMWCFF( ByVal Lag As Long )
          
          Dim Q() As Static Dword ' Random x seeds
          Static i, j, Multiplier, BeenHere As Long
          Static cary As Dword ' Random And < Multiplier
          Static q0Address As Byte Ptr
          Static mult As Ext
          Static value As Quad
          
            If IsFalse BeenHere Then ' Initialize Q() And carry
              
              i = Lag -1
              
              ReDim Q(0 To i) As Static Dword
              q0Address = VarPtr( Q(0) )
              CryptGenRandom( hProv, 4 * Lag, VarPtr( Q(0) ) ) ' Random x seeds
          
              For j = 0 To i
                If Q(j) = 4294967295 Then Decr Q(j)
                ' x seeds should be less than 2^32-1
              Next
          
              Select Case As Long Lag
                Case 4
                  Multiplier = 987654366 ' 2^157
                Case 8
                  Multiplier = 987651386 ' 2^285
                Case 16
                  multiplier = 987765178 ' 2^541
                Case 32
                  Multiplier = 987655670 ' 2^1053
                Case 64
                  Multiplier = 987651206 ' 2^2077
                Case 128
                  Multiplier = 987688302 ' 2^4125
                Case 256
                  Multiplier = 987662290 ' 2^8221
                Case 512
                  Multiplier = 123462658 ' 2^16410
                Case 1024
                  Multiplier = 5555698   ' 2^32790
                Case 2048
                  Multiplier = 1030770   ' 2^65559
                Case 4096
                  Multiplier = 18782     ' 2^131086
                Case Else
                  MsgBox "Invalid lag given." + $CrLf + $CrLf + "A 256 lag will be used.", %MB_Ok, "CMWC"
                  Multiplier = 987662290 ' 2^8221
              End Select
              
              CryptGenRandom(hProv, 4, VarPtr(cary))
              cary = cary * (Multiplier-1)/4294967295 ' Make cary < Multiplier
              j = i
              mult = 0.0000000002328306437080797375 ' From 1/(2^32-1)
              BeenHere = %True 
          
            End If
            
            !mov eax, i
            !mov ecx, j
            !Add eax, 1     ;i + 1
            !And eax, ecx   ;And j
            !mov i, eax
          
            !mov ecx, q0Address ' Addrees Of Q(0)
            !mov eax, [ecx+eax*4] ' Contents Of Q(i)
          
            !mov edx, Multiplier
            !mul edx        ;Multiplier * Q(i)
            !mov ecx, cary
            !Add eax, ecx   ;+ cary 
            !adc edx, 0     ;In Case there Is a carry Of 1, Add it.
            !mov cary, edx  ; store cary
            !Add eax, edx   ; x = t + cary
            !jc short incr2vars
            !cmp eax, -1
            !je short incr2vars
          
          incrReturn:
            ' The CMWC aspect (Base-1) - x                
            !mov edx, &H0FFFFFFFE
            !Sub edx, eax   ;edx = &H0FFFFFFFE - x
            !mov eax, i
            !mov ecx, q0Address ' Adress Of Q(0)
            !mov [ecx+eax*4], edx ' Store Q(i)
            
            !mov value, edx
            !fild Qword value
            !fld mult
            !fmulp st(1), st
            !fstp RandomFloat
          
            !jmp short Done
          
          Incr2vars:
            !Add eax, 1 'Incr x
            !Add cary, 1 'Incr cary
            !jmp short incrReturn
            
          Done:
            
          End FastProc
          Last edited by David Roberts; 15 May 2012, 01:00 PM.

          Comment


          • #6
            Again, this requires compilers 10 & 6

            This again is essentially the FastProc function above, named CMWCFR (the R for range), and is analogous to RND(a,b). The difference here is that the lag is hard wired at 4096 as FastProces limits us to two Byval Long parameters.

            The assembler for ranging is an adaptation of code developed in RND2 and credit goes to John Gleason and Gary Ramey.

            On my machine:

            t = Rnd(-45,200) => 2 million per second
            t = CMWCFR(-45,200) => 11 million per second
            Code:
            FastProc CMWCFR( ByVal One As Long, ByVal Two As Long ) As Long
            #Register None
            Dim Q() As Static Dword ' Random x seeds
            Static i, j, Multiplier, BeenHere, Result As Long
            Static cary As Dword ' Random And < Multiplier
            Static q0Address As Byte Ptr
            
              If IsFalse BeenHere Then ' Initialize Q() And carry
                
                i = 4095
                
                ReDim Q(0 To i) As Static Dword
                q0Address = VarPtr( Q(0) )
                CryptGenRandom( hProv, 16384, VarPtr( Q(0) ) ) ' Random x seeds
            
                For j = 0 To i
                  If Q(j) = 4294967295 Then Decr Q(j)
                  ' x seeds should be less than 2^32-1
                Next
            
                Multiplier = 18782     ' Period = 2^131086
               
                CryptGenRandom(hProv, 4, VarPtr(cary))
                cary = cary * (Multiplier-1)/4294967295 ' Make cary < Multiplier
                j = i
                BeenHere = %True 
            
              End If
              
              !mov eax, i
              !mov ecx, j
              !Add eax, 1     ;i + 1
              !And eax, ecx   ;And j
              !mov i, eax
            
              !mov ecx, q0Address ' Addrees Of Q(0)
              !mov eax, [ecx+eax*4] ' Contents Of Q(i)
            
              !mov edx, Multiplier
              !mul edx        ;Multiplier * Q(i)
              !mov ecx, cary
              !Add eax, ecx   ;+ cary 
              !adc edx, 0     ;In Case there Is a carry Of 1, Add it.
              !mov cary, edx  ; store cary
              !Add eax, edx   ; x = t + cary
              !jc short incr2vars
              !cmp eax, -1
              !je short incr2vars
            
            incrReturn:
              ' The CMWC aspect (Base-1) - x                
              !mov edx, &H0FFFFFFFE
              !Sub edx, eax   ;edx = &H0FFFFFFFE - x
              !mov eax, i
              !mov ecx, q0Address ' Adress Of Q(0)
              !mov [ecx+eax*4], edx ' Store Q(i)
              
              !mov ecx, One        ; NOW, evaluate parameters.
              !mov eax, Two 
              !cmp ecx, eax        ; Is One > Two?
              !jl short Now1LT2    ; jump over Swap, already Two > One
              !xchg eax, ecx       ; Had To Switch them, now ecx has LOWER bound As it should
              
            Now1LT2:
              !Sub eax, ecx        ; now eax Is Range    [ Range (before +1) ]
              !inc eax             ; Add 1 To Range. Result Is correct For two params, And 0 If Max Range
              !jz short doTheRnd   ; jump If we incremented &hFFFFFFFF up To 0; we have maximum Range
              ' At This Point ECX = lower,  EAX = Range, EDX = Random
              !mul edx             ;'Random * Range+1 == edx:eax. Use edx As result As If /(2^32).
              !Add edx, ecx        ;'Add lower bound To rand result In edx
            doTheRnd:
              !mov Result, edx
            
              !jmp short Done
            
            Incr2vars:
              !Add eax, 1 'Incr x
              !Add cary, 1 'Incr cary
              !jmp short incrReturn
              
            Done:
              
            End FastProc = Result

            Comment


            • #7
              Back in post #5 we have
              t = Rnd => 30 million per second with a period of 2^32

              { CMWCFF(4096)
              t = RandomFloat } => 37 million per second with a period of 2^131086
              Inside the loop I had tot += for one particular test, msg += for another test, t1 and t2 for a Monte Carlo test and comparisons with RND.

              When I did the above test I must have left a statement uncommented because CMWCFF(4096) actually knocks out about 50 million per second. Bit of a difference.

              I came across a paper recently which questioned the accepted method of converting 32-bit random integers to double precision floating point [0,1); the accepted method being to simply multiply by 1/2^32 ie 0.000000000232830643653869628906. The proposal was to use a signed 32-bit value, multiply by 1/2^32 and then add 0.5.

              Of course, a signed 32-bit is what we know as a LONG and the vast majority of us know that LONGs are the fastest integers in PowerBASIC. The reason is that LONGs are the fastest integers in the 32-bit Intel Architecture.

              So, with CMWCF(4096), which returns LONGs, I used t = CMWCF(4096)*0.0000000002328306437080797375 + 0.5; remembering that we multiply by 1/(2^32-1).

              The return jumped from 50 million per second to 61.5 million per second and that is using BASIC!

              I then replaced the existing asm section
              Code:
              !mov value, edx
              !fild Qword value ; unsigned edx into st(0)
              !fld mult         ; mult into st(0), unsigned edx into st(1)
              !fmulp st(1), st  ; multiply into st(1) Then pop, value now In st(0)
              !fstp RandomFloat ; st(0) into RandomFloat
              with this
              Code:
              !mov value, edx
              !fild Dword value ; Signed edx into st(0)
              !fld mult         ; mult into st(0), Signed edx into st(1)
              !fmulp st(1), st  ; multiply into st(1) Then pop, value now In st(0)
              !fld half         ; half into st(0), st(0) into st(1)
              !faddp st(1), st  ; Add into st(1) Then pop, value now In st(0)
              !fstp RandomFloat ; st(0) into RandomFloat
              which looks lke it should be slower especially when adding 0.5.

              However, allowing the CPU to play with signed 32-bit, it's favourite, as opposed to unsigned 32-bit gives a return of 79.3 million per second.

              So, we now have

              t = Rnd => 30 million per second with a period of 2^32

              { CMWCFF(4096)
              t = RandomFloat } => 79 million per second with a period of 2^131086

              and implemented by

              Global RandomFloat As Ext

              Execute CMWCFF(4096), or whatever lag is to be used.

              and then use RandomFloat.

              Added: I forgot to mention that 'Static mult As Ext' has been replaced with 'Static mult, half As Ext' and 'half = 0.5' has been added to the 'If IsFalse BeenHere Then' section.
              Last edited by David Roberts; 30 Jun 2012, 04:57 PM.

              Comment


              • #8
                STOP THE PRESS on the last post.

                ;remembering that we multiply by 1/(2^32-1).
                is causing a problem.

                If 'value' is 2^31 then we get -1.16415320688522E-10 ie negative.

                There is probably a simple solution but it evades me at the moment.

                I found it doing a Monte Carlo to estimate PI. Previously, RND and CMWC were 'neck and neck'. Since RND passes muster as a PRNG then any competitor which equals it, Diehard and ent.exe et all, is OK too but the above post saw RND winning on each and every trial.

                Comment


                • #9
                  It took me a while but I cracked it. RND and CMWC alternated as being the better at estimating PI getting back to 'neck and neck'.

                  However, the additional code, even in asm, was very expensive and the output dropped back to 50 odd million per second giving then no gain.

                  Cannot win them all and 50 million per second is still better than a poke in the eye with a sharp stick.

                  It doesn't follow that the signed approach cannot be harnessed with benefit - asm is not my comfort zone - not that I have one anyway but I'm less comfortable in that environment.

                  Comment


                  • #10
                    I have decided to go ahead with the last asm tested.

                    Whilst working on it I noticed that with the original asm we had [0,1), but 0.5 was impossible. With a 2^32 divisor then the position 2^31-1, 2^31, 2^31+1 will see 2^31/2^32 = 0.5; not so when the divisor is 2^32-1.

                    When we use DWORD instead of QUAD and the new code the missing 0.5 now becomes a missing zero so we get (0,1), In fact we get [1.16415323019557E-10, 1 - 1.16415323019557E-10]. With 2^32-1 values, 0 to 2^32-2, the value bang in the middle is exactly 0.5.

                    This is a corollary to a failue - the failure being to be closer to 75 million than 50 million.

                    Since the speed is almost identical to the original code I will use it as the 'spread' is perfectly balanced.

                    Code:
                    1,000,000,000
                    
                    RND  0.4999913283
                    CMWC 0.4999903611
                    Not much in it.

                    This is the Source Code forum so the choice is yours.
                    Code:
                      !mov value, edx
                      !fild Dword value ; Signed edx into st(0)
                      !ftst             ; compare st(0) With 0
                      !fstsw ax         ; Copy Status Word To AX
                      !fwait            ; wait Until above completed
                      !sahf             ; Condition codes To CPU's flag Register
                      !ja skip          ; If above
                      !jz skip          ; If zero
                      !fiadd one        ; Incr st(0)
                    skip:
                      !fld mult         ; mult into st(0), Signed edx into st(1)
                      !fmulp st(1), st  ; multiply into st(1) Then pop, value now In st(0)
                      !fld half         ; half into st(0), st(0) into st(1)
                      !faddp st(1), st  ; Add into st(1) Then pop, value now In st(0)
                      !fstp RandomFloat
                    Last edited by David Roberts; 1 Jul 2012, 07:28 PM.

                    Comment


                    • #11
                      That
                      !ja skip
                      !jz skip
                      can be replaced with
                      !jnbe skip

                      giving a marginal improvement.

                      Some observations:

                      On the Monte Carlo tests I gave RND and CMWC 30 seconds each to come up with an estimate of PI. RND was a tad better with 3.141171 compared with CMWC's 3.141058; the actual value, of course, being 3.141593. Further tests showed that they were pretty much in the same ball park. However, the total loops used by CMWC was only about 3% more than used by RND. The loops were ticking over at about half a million a second with the lion's share of the computation being taken up with number crunching. Removing the request for random numbers, using fixed values assigned before the loop, the total loops used was only about 4% more than used by RND. This scenario is tantamount to a generator knocking out numbers at infinite speed.

                      The fact that CMWC was running at more than 50 million a second compared with RND's 30 million a second was rendered to being almost academic.

                      The total number of loops was about 15 million generating random number pairs giving then about 30 million random numbers used. This is, of course, a drop in the ocean compared with RND's period.

                      With this application of random numbers then there is no argument against using RND.

                      Had the request for random numbers been very much more rapid then there may be a case for using the faster generator.

                      On my machine, doing nothing else but request RND numbers, will see the period off in about 140 seconds. Applications which run for days on end, perhaps weeks, requiring lots of random numbers will need much more than RND.

                      Now, although the above is obvious when considered it is easy to fall into the trap of using 'the latest and greatest' when an old stalwart is more than capable of doing a particular job.

                      This argument is not dissimilar to using assembler in a procedure only to find that in an application's session it didn't make a blind bit of difference compared with the procdeure written in BASIC. On the other hand replacing the opening post's BASIC with assembler, in the engine department, was very worthwhile.

                      I have another need now - an alphanumeric fixed seed on a range but I won't bother to publish the result as there is code enough above to do that.

                      Comment


                      • #12
                        Using 2^32-1 as a divisor also causes a problem with CMWCF, in post #4, as well.

                        Although the return value is a Long we can never get -1.

                        Post #4 can no longer be edited but the correction is minimal.

                        Introduce 'Static result As Long' and change the end of the function from
                        Code:
                        Done:
                          
                        End FastProc = Q(i)
                        to
                        Code:
                        Done:
                        result = Q(i)
                        If result < 0 Then Incr result
                        
                        End FastProc = result
                        The return value is now within [-2147483647, 2147483647] ie [-(2^31-1), 2^31-1] with 2^32-1 possible values and zero bang in the middle. We now only get 64 million per second.

                        Comment


                        • #13
                          A little while back I came across some CMWC4096 code here. I didn't pay much attention because John Gleason had written some assembler to replace my BASIC and the final version took into account Marsaglia's last thoughts on the adjustments which no one else seems to have done; at least publicly. However, the method employed to convert the generated Dword to [0,1) was very different to ours but I had no idea what it was doing. Don't things like that nag?

                          I went back a few times and the penny finally dropped. If the Dword is treated as an IEEE single precision binary floating point then the high order bit is the sign bit, the next 8 bits are the exponent and the next 23 bits are the fraction. If we take 23 random bits from the 32 random bits generated plus the implicit bit then we have the fraction and it will be in [0,1). All we do then is convert to decimal. For double precision we need 52 bits for the fraction but we haven't got that so we execute CMWC twice to get 64 random bits. With our approach we take the whole 32 bits and multiply by 1/(2^32-1).

                          The first attempt at the double precision code was to have the Dword generator as an asm subroutine and call it twice. However, since we are only calling it twice and we are using very little memory I opted to duplicate the generator and drop through both avoiding any calling. This was worthwhile.

                          If we consider how the lag principle works we see that the higher lags need more seeds and they give longer periods but, other than that, there is no difference between a lag~2 and all the other higher lags. If seeding is not an issue, which it isn't if we use the Microsoft cryptographic generator, then there is no case against using the highest lag. On my machine the lag~4096 is initialised in very much less than 1ms. So, I have removed the optional lag and hard wired a lag~4096. The FastProc procedure now has no input parameters and returns nothing either; other than loading a global variable before exiting. The need for a cryptographic service provider has also been removed by using RtlGenRandom. However, this will not work for pre Win XP.

                          I contacted the author of the code in the link above and told him what I was about and he came back with "Feel free to use what you can." So, thanks again, Wilbert.

                          Now, some figures: The single precision procedure knocks out 135 million numbers a second. The double precision procedure knocks out 74 million numbers a second.

                          If you want to test these figures then I would suggest that you do not run the single precision procedure too long otherwise you may have some broken windows.

                          With the 30 second Monte Carlo test the double precision procedure gave one more significant figure accuracy compared with RND, CMWCFF and the new single precision procedure; which were all in the same ball park.

                          In keeping with the above naming convention the single precision procedure has been called CMWCFS and the double precision procedure has been called CMWCFD.

                          To my mind CMWCFF is now redundant - it is slower than CMWCFD.

                          Usage:

                          Single

                          Global RandomSingle as Single

                          Execute CMWCFS and then use RandomSingle.

                          Double

                          Global RandomDouble as Double

                          Execute CMWCFD and then use RandomDouble.

                          Code:
                          FastProc CMWCFS
                          Dim Q() As Static Dword ' Random x seeds
                          Static i, BeenHere As Long
                          Static cary As Dword ' Random And < Multiplier
                          Static q0Address As Byte Ptr
                           
                            If IsFalse BeenHere Then ' Initialize Q() And carry
                           
                              ReDim Q(0 To 4095) As Static Dword
                              RtlGenRandom( Q(0), 16384 ) ' Random x seeds
                           
                              For i = 0 To 4095
                                If Q(i) = 4294967295 Then Decr Q(i)
                                ' x seeds should be less than 2^32-1
                              Next
                           
                              RtlGenRandom( cary, 4 )
                              cary = cary * 18781/4294967295 ' Make cary < Multiplier
                              q0Address = VarPtr( Q(0) )
                              i = 4095
                              BeenHere = %True
                           
                            End If
                           
                            !mov eax, i
                            !mov ecx, 4095
                            !Add eax, 1     ;i + 1
                            !And eax, ecx   ;And j
                            !mov i, eax
                           
                            !mov ecx, q0Address ' Addrees Of Q(0)
                            !mov eax, [ecx+eax*4] ' Contents Of Q(i)
                           
                            !mov edx, 18782 ; Multiplier
                            !mul edx        ; Multiplier * Q(i)
                            !mov ecx, cary
                            !Add eax, ecx   ; + cary
                            !adc edx, 0     ; In Case there Is a carry Of 1, Add it.
                            !mov cary, edx  ; store cary
                            !Add eax, edx   ; x = t + cary
                            !jc short incr2vars
                            !cmp eax, -1
                            !je short incr2vars
                           
                          incrReturn:
                            ' The CMWC aspect (Base-1) - x
                            !mov edx, &H0FFFFFFFE
                            !Sub edx, eax   ;edx = &H0FFFFFFFE - x
                            !mov eax, i
                            !mov ecx, q0Address ' Adress Of Q(0)
                            !mov [ecx+eax*4], edx ' Store Q(i)
                          
                          ' Wilbert's conversion
                           
                            !movd xmm0, edx
                            !psrlq xmm0, 9           ; Shift Right by 9 Bits
                            !mov edx, 1
                            !cvtsi2ss xmm1, edx      ; convert To Single precision
                            !por xmm0, xmm1          ; Or
                            !subss xmm0, xmm1        ; subtract
                            !movd RandomSingle, xmm0
                           
                            Exit FastProc
                           
                          Incr2vars:
                            !Add eax, 1 'Incr x
                            !Add cary, 1 'Incr cary
                            !jmp short incrReturn
                           
                          End FastProc
                          Code:
                          FastProc CMWCFD
                          Dim Q() As Static Dword ' Random x seeds
                          Static i, BeenHere As Long
                          Static cary As Dword ' Random And < Multiplier
                          Static q0Address As Byte Ptr
                           
                            If IsFalse BeenHere Then ' Initialize Q() And carry
                           
                              ReDim Q(0 To 4095) As Static Dword
                              RtlGenRandom( Q(0), 16384 ) ' Random x seeds
                           
                              For i = 0 To 4095
                                If Q(i) = 4294967295 Then Decr Q(i)
                                ' x seeds should be less than 2^32-1
                              Next
                           
                              RtlGenRandom( cary, 4 )
                              cary = cary * 18781/4294967295 ' Make cary < Multiplier
                              q0Address = VarPtr( Q(0) )
                              i = 4095
                              BeenHere = %True
                           
                            End If
                           
                            !mov eax, i
                            !mov ecx, 4095
                            !Add eax, 1     ;i + 1
                            !And eax, ecx   ;And j
                            !mov i, eax
                           
                            !mov ecx, q0Address ' Addrees Of Q(0)
                            !mov eax, [ecx+eax*4] ' Contents Of Q(i)
                           
                            !mov edx, 18782 ; Multiplier
                            !mul edx        ; Multiplier * Q(i)
                            !mov ecx, cary
                            !Add eax, ecx   ; + cary
                            !adc edx, 0     ; In Case there Is a carry Of 1, Add it.
                            !mov cary, edx  ; store cary
                            !Add eax, edx   ; x = t + cary
                            !jc short incr2vars1
                            !cmp eax, -1
                            !je short incr2vars1
                           
                          incrReturn1:
                            ' The CMWC aspect (Base-1) - x
                            !mov edx, &H0FFFFFFFE
                            !Sub edx, eax   ;edx = &H0FFFFFFFE - x
                            !mov eax, i
                            !mov ecx, q0Address ' Adress Of Q(0)
                            !mov [ecx+eax*4], edx ' Store Q(i)
                           
                            !movd xmm0,edx
                           
                            !jmp short NextNumber
                           
                          Incr2vars1:
                            !Add eax, 1 'Incr x
                            !Add cary, 1 'Incr cary
                            !jmp short incrReturn1
                           
                          NextNumber:
                            !mov eax, i
                            !mov ecx, 4095
                            !Add eax, 1     ;i + 1
                            !And eax, ecx   ;And j
                            !mov i, eax
                           
                            !mov ecx, q0Address ' Addrees Of Q(0)
                            !mov eax, [ecx+eax*4] ' Contents Of Q(i)
                           
                            !mov edx, 18782 ; Multiplier
                            !mul edx        ; Multiplier * Q(i)
                            !mov ecx, cary
                            !Add eax, ecx   ; + cary
                            !adc edx, 0     ; In Case there Is a carry Of 1, Add it.
                            !mov cary, edx  ; store cary
                            !Add eax, edx   ; x = t + cary
                            !jc short incr2vars2
                            !cmp eax, -1
                            !je short incr2vars2
                           
                          incrReturn2:
                            ' The CMWC aspect (Base-1) - x
                            !mov edx, &H0FFFFFFFE
                            !Sub edx, eax   ;edx = &H0FFFFFFFE - x
                            !mov eax, i
                            !mov ecx, q0Address ' Adress Of Q(0)
                            !mov [ecx+eax*4], edx ' Store Q(i)
                           
                            !movd xmm1, edx
                           
                          ' Wilbert's conversion
                          
                            !punpckldq xmm0, xmm1
                            !psrlq xmm0, 12
                            !mov eax, 1
                            !cvtsi2sd xmm1, eax
                            !por xmm0, xmm1
                            !subsd xmm0, xmm1
                            !movq RandomDouble, xmm0
                           
                            Exit FastProc
                           
                            Incr2vars2:
                            !Add eax, 1 'Incr x
                            !Add cary, 1 'Incr cary
                            !jmp short incrReturn2
                           
                          End FastProc
                          Last edited by David Roberts; 8 Jul 2012, 04:42 AM.

                          Comment

                          Working...
                          X