Announcement

Collapse

Forum Guidelines

This forum is for finished source code that is working properly. If you have questions about this or any other source code, please post it in one of the Discussion Forums, not here.
See more
See less

Xorshift PRNG

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • PBWin/PBCC Xorshift PRNG

    Note: Compilers 10 or 6 are required and OS >= Windows Vista

    This is a follow on from Xorshift128+ PRNG.

    The Xorshift pseudorandom number generator is a method invented by George Marsaglia. Marsaglia also invented the Multiply-with-carry generator (MWC). There were some issues with MWC and, over the years, Marsaglia addressed them resulting in lag-r Complementary-multiply-with-carry (CMWC) base 2^32-1 and that is the basis of my CMWC256.

    There were some issues with Xorshift but Marsaglia left it to to others to take up the batten seen at Xorshift. The following implements the method described in the last link in the references and is called Xorshift128+. Since then, sometime in 2016, the author, Sebastiano Vigna, and David Blackman have introduced a variant called Xoroshiro128+. [3]

    Earlier xorshift generators had issues but the "xorshift128+ generator that is currently the fastest full-period generator we are aware of that does not fail systematically any BigCrush test (not even reversed)..." [1]. Xoroshiro128+ equalled that and then gave "a significant improvement in statistical quality, as detected by the long-range tests of PractRand." [2].

    With '%xorshift = -1', the first line of the following source code, the following code will compile to Xorshift128+.sll and with '%xorshift = 0' it will compile to Xoroshiro128.sll.

    CMWC256 has a period of 2^8208 reflecting a lag-256. Smaller and larger periods can be created but a lag-256 takes, in my opinion, a comfortable time to seed the generator. User seeding takes about 1.45ms and system seeding takes about 7.5ms. The generators here take about 75µs for both seeding methods. Note that is micro-seconds.

    Both xorshift methods have a period of 2^128-1. RND has a period of 2^32 and my machine can compute all singles in less than one minute. 2^128 may not seen much larger than RND but 2^128 = (2^32)^4.

    2^128 is not easy for me to comprehend.

    2^128 = 3.4 x 10^37

    That I can comprehend and that is large.

    In fact, it would take my machine, i7-3770K @ 3.5GHz, 4,829,532,708,142 X The age of the universe (at 29/05/2016 ) to complete the period.

    "...purely linearly recurrent generators with a very large state space need a very long time to get from an initial state with a small number of ones to a state in which the ones are approximately half." [1]. We are not using a very large state space but, nonetheless, a 'warm up' period is required. 'Burning' 200 64 bit values is more than enough for our purposes. On my machine that takes about 10 nano-seconds. Not enough time to make a cup of tea. A 'warm up' is undertaken after any seeding.

    Speed tests

    Code:
    CMWC256
     
    .S  = 1.77 x RND
    .SX = 1.72 x RND, 1.97 x 2*RND-1
    .D  = 1.23 x RND
    .DX = 1.18 x RND
    .R(a,b)  = 3.85 x RND(a,b)
     
    Xorshift128+
     
    .S  = 2.24 x RND
    .SX = 2.14 x RND, 2.45 x 2*RND-1
    .D  = 2.03 x RND
    .DX = 1.93 x RND
    .R(a,b)  = 4.67 x RND(a,b)
     
    Xoroshiro128+
     
    .S  = 2.23 x RND
    .SX = 2.14 x RND, 2.45 x 2*RND-1
    .D  = 1.93 x RND
    .DX = 1.81 x RND
    .R(a,b)  = 4.5 x RND(a,b)
    Added: The syntax is the same as CMWC256. If you are not acquainted with that here is the meaning.

    .S Single precision [0,1)
    .SX Single precision [-1,1)
    .D Double precision [0,1)
    .DX Double precision [-1,1)
    .R(a,b) Same as PowerBASIC

    "It looks like Xoroshiro is the best general purpose algorithm currently available. Low memory (just 128 bits of storage), extremely high performance (1.2 nanoseconds per 64-bit number, after subtracting baseline overheads) and very well distributed (beating other algorithms on a range of automated tests)." [4].

    I made a mistake in the opening link - I was comparing apples with oranges - and overestimated the speed of Xorshift128+. The above test used the same program for all three generators using their respective libraries. The figures represent an average of 20 runs.

    The reason why CMWC256's double precision speed is much less than its single precision speed is because it is a 32 bit generator so we have to go past the 'engine' twice to get two dwords. The xorshift generators are 64 bit so one pass of the engine is enough. With the xorshift generators we go past the engine once for two single precision number requests; one dword is used and the other stored for the next request.

    Xorshift128+ is over 26% faster than CMWC256 coming in, on my machine, at 186 million singles per second compared with RND at 83 million singles per second. Given that there is very little difference in speed between Xorshift128+ and Xoroshiro128+ and because of the latter's "significant improvement in statistical quality". [2] then if I were to use a xorshift generator it would be Xoroshiro128+.

    There is another method: .DW. That generates Dwords - I used that to dump data to file for testing.

    Seeding is done via .DefaultSeed, .UserSeed("[Your string here]") and .SystemSeed

    If no seeding is used a default seed, the same as .DefaultSeed, is set up during the object's creation.

    The user string, which may be an empty string, is digested with MD5 to populate the state. Although MD5 is no longer recommended for cryptographic purposes it is OK in our context. Any function will do which gives the same output for a given input. The fact that two different input strings may give the same output is irrelevant. MD5 produces 128 bits, just what we need for a 128 bit state.

    .SystemSeed populates the state with random numbers from the Windows API BCryptGenRandom and why 'OS >= Windows Vista' is required. For 'OS < Windows Vista' then simply replace with CryptGenRandom which has been around since, at least, Windows 95.

    For Xorshift128+ we use

    Code:
    #Link "C:\PBWin\Xorshift\Xorshift128+.sll" ' Your path here
     
    Global Rand As IXorshift
     
    Function PBMain
     
      Rand = Class "Xorshift"
    and for Xoroshiro128+ we use

    Code:
    #Link "C:\PBWin\Xorshift\Xoroshiro128+.sll" ' Your path here
     
    Global Rand as IXoroshiro
     
    Functioin PBMain
     
      Rand = Class "Xoroshiro"
    or whatever you wish to use for 'Rand'.

    Here is a small PBCC program to show usage and the seeding methods.

    Code:
    #Compile Exe
    #Dim All
    #Register None
     
    #Link "C:\PBWin\Xorshift\Xoroshiro128+.sll" ' Your path here
    '#Link "C:\PBWin\Xorshift\Xorshift128+.sll" ' Your path here
     
    Global Rand As IXoroshiro
     
    Function PBMain As Long
    Local i As Long
     
      Rand = Class "Xoroshiro"
     
      Print "Default seed"
      Print
      For i = 1 To 5
        Print Rand.S
      Next
      Print
      Print "User seed: PowerBASIC"
      Print
      Rand.UserSeed("PowerBASIC")
      For i = 1 To 5
        Print Rand.S
      Next
      Print
      Print "System seed"
      Print
      Rand.SystemSeed
      For i = 1 To 5
        Print Rand.S
      Next
      Print
      Print "User seed: PowerBASIC"
      Print
      Rand.UserSeed("PowerBASIC")
      For i = 1 To 5
        Print Rand.S
      Next
      Print
      Print "Default seed"
      Print
      Rand.DefaultSeed
      For i = 1 To 5
        Print Rand.S
      Next
     
      WaitKey$
     
    End Function
    If you do not have PBCC here is a typical output:

    Code:
    Default seed
     
     .3211236
     1.512289E-2
     .2400187
     .6792868
     .2981217
     
    User seed: PowerBASIC
     
     .6273323
     .9427909
     .6490459
     .1646218
     .9015744
     
    System seed
     
     .9760034
     .8830762
     .6781559
     .7170306
     .5183147
     
    User seed: PowerBASIC
     
     .6273323
     .9427909
     .6490459
     .1646218
     .9015744
     
    Default seed
     
     .3211236
     1.512289E-2
     .2400187
     .6792868
     .2981217
    If an application required only a small number of random numbers then RND is more than adequate. It passes all Die Hard tests and gets good results using Fourmilab's ENT. I have no idea how well it performs using BigCrush but then a small number of random numbers would not exhibit any failings in quality anyway and the fact that its speed is less than the three generators tested here would be academic for a small number of random numbers required.

    So, what is a small number of random numbers? I don't know.

    The two xorshift libraries do not have as many methods as CMWC256 but they can be added if requested.

    Both libraries are quite small: Xorshift128+.sll ( 8.61KB ), Xoroshiro128+.sll ( 8.99KB ).

    The source code is in the following post.

    Comments, if any, here.

    Have fun.

    CAVEAT: The following code is not thread safe; and nor is RND.

    References

    [1] Further scramblings of Marsaglia's xorshift generators ( pdf download )

    [2] xoroshiro+ / xorshift* / xorshift+ generators and the PRNG shootout

    [3] Xoroshiro128+

    [4] Random number generators in Swift
    Last edited by David Roberts; 2 Jun 2016, 07:49 AM.

  • #2
    Code:
    %xorshift = 0
    
    #If %xorshift
      #Compile SLL "Xorshift128+
    #Else
      #Compile SLL "Xoroshiro128+"
    #EndIf
    #Dim All
    #Register None
    
    #Include "Win32API.inc"
    
    Macro Engine
    
      ' xorshift128+ engine
    
      ' uint64_t s[2];
      ' uint64_t next(void) {
      ' uint64_t s1 = s[0];
      ' const uint64_t s0 = s[1];
      '
      ' s[0] = s0;
      ' s1 ^= s1 << 23; // a
      ' s[1] = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5); // b, c
      ' return s[1] + s0;
      ' }
    
      #If %xorshift
    
        Prefix "ASM"
          mov eax, BigS0Ptr
          movq xmm1, [eax] ' s1 = s[0]
          mov edx, BigS1Ptr
          movq xmm0, [edx] ' s0 = s[1]
          movq [eax], xmm0 ' s[0] = s0
          movq xmm7, xmm1 ' temp1 = s1
          psllq xmm7, 23 ' temp1 << 23
          pxor xmm1, xmm7 ' s1 ^= s1 << 23
          movq xmm7, xmm1 ' temp1 = s1
          psrlq xmm7, 18 ' temp1 >> 18
          movq xmm6, xmm0 ' temp2 = s0
          psrlq xmm6, 5 ' temp2 >> 5
          pxor xmm1, xmm0 ' s1 = s1 ^ s0
          pxor xmm1, xmm7 ' s1 = s1 ^ s0 ^ (s1 >> 18)
          pxor xmm1, xmm6 ' s1 = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5)
          mov eax, BigS1Ptr
          movq [eax], xmm1
          paddq xmm0, xmm1 ' xmm0 is now our 64 bit random number
        End Prefix
    
      #Else
    
      ' xoroshiro128+ engine
    
      ' uint64_t s[2];
    
      ' static inline uint64_t rotl(const uint64_t x, int k) {
      '   return (x << k) | (x >> (64 - k));
      ' }
      '
      ' uint64_t next(void) {
      ' const uint64_t s0 = s[0];
      ' uint64_t s1 = s[1];
      ' const uint64_t result = s0 + s1;
      '
      ' s1 ^= s0;
      ' s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14); // a, b
      ' s[1] = rotl(s1, 36); // c
      '
      ' return result;
      ' }
    
        Prefix "ASM"
          mov eax, BigS0Ptr
          movq xmm0, [eax] ' s0 = s[0]
          mov eax, BigS1Ptr
          movq xmm1, [eax] ' s1 = s[1]
          pxor xmm1, xmm0  ' s1 = s1^s0
    
        'Rotate s0
          movq xmm3, xmm0
          movq xmm2, xmm0 ' temp2 = s0
          psllq xmm3, 55  ' s0 = s0 << 55
          psrlq xmm2, 9   ' temp2 >> 9
          por xmm3, xmm2  ' s0 = s0 | temp2
    
          pxor xmm3, xmm1 ' s0 = s0 ^ s1
          movq xmm2, xmm1 ' temp2 = s1
          psllq xmm2, 14  ' temp2 = s1 << 14
          pxor xmm3, xmm2 ' s0 = s0 ^ temp2
          mov eax, BigS0Ptr
          movq [eax], xmm3
    
        'Rotate s1
          movq xmm3, xmm1
          movq xmm2, xmm1 ' temp3 = s1
          psllq xmm3, 36  ' s1 = s1 << 36
          psrlq xmm2, 28  ' temp3 >> 28
          por xmm3, xmm2  ' s1 = s1 | temp3
    
          mov eax, BigS1Ptr
          movq [eax], xmm3
    
          paddq xmm0, xmm1 ' xmm0 is now our 64 bit random number
        End Prefix
    
      #EndIf
    End Macro
    
    Global BigS0Ptr, BigS1Ptr As Dword Ptr
    Global Keep As Dword
    Global LittleCrunch As Long
    
    #If %xorshift
      Class Xorshift Common
    #Else
      Class Xoroshiro Common
    #EndIf
    
      Instance qState() As Quad
    
      Class Method Create
        ReDim qState(0 To 3) As Instance Quad
        Local i As Long
        BigS0Ptr = VarPtr( qState(0) )
        BigS1Ptr = VarPtr( qState(2) )
        qState(0) = &HFFFFFFFFFFFFFFFF
        qState(1) = 0
        qState(2) = &HFFFFFFFFFFFFFFFF
        qState(3) = 0
        LittleCrunch = %False
        For i = 1 To 200        ' Warm up
          Engine
        Next
    
      End Method
    
    #If %xorshift
      Interface IXorshift
    #Else
      Interface IXoroshiro
    #EndIf
    
      Inherit IUnknown
    
      Method S As Single
        !xor LittleCrunch, 1
        !jz JustOneS
        Engine
        !movd Keep, xmm0
        !psrlq xmm0, 32
        !jmp short DoneS
      JustOneS:
        !movd xmm0, Keep
      DoneS:
      ' Single precision conversion by Wilbert at the PureBasic forums
        !psrlq xmm0, 9
        !mov edx, 1
        !cvtsi2ss xmm1, edx
        !por xmm0, xmm1
        !subss xmm0, xmm1
        !movd method, xmm0
    
      End Method
    
      Method SX As Single
        !xor LittleCrunch, 1
        !jz JustOneSX
        Engine
        !movd Keep, xmm0
        !psrlq xmm0, 32
        !jmp short DoneSX
      JustOneSX:
        !movd xmm0, Keep
      DoneSX:
        !psrlq xmm0, 9
        !mov edx, 2
        !cvtsi2ss xmm1, edx
        !por xmm0, xmm1
        !subss xmm0, xmm1
        !mov edx, 1
        !cvtsi2ss xmm1, edx
        !subss xmm0, xmm1
        !movd Method, xmm0
    
      End Method
    
      Method D As Double
        Engine
        ' Double precision conversion by Wilbert at the PureBasic forums
        !psrlq xmm0, 12
        !mov eax, 1
        !cvtsi2sd xmm1, eax
        !por xmm0, xmm1
        !subsd xmm0, xmm1
        !movq Method, xmm0
    
      End Method
    
      Method DX As Double
        Engine
        !psrlq xmm0, 12
        !mov eax, 2
        !cvtsi2sd xmm1, eax
        !por xmm0, xmm1
        !subsd xmm0, xmm1
        !mov eax, 1
        !cvtsi2sd xmm1, eax
        !subsd xmm0, xmm1
        !movq Method, xmm0
    
      End Method
    
      Method R( ByVal One As Long, ByVal Two As Long ) As Long
        !xor LittleCrunch, 1
        !jz JustOneR
        Engine
        !movd Keep, xmm0
        !psrlq xmm0, 32
        !jmp short DoneR
      JustOneR:
        !movd xmm0, Keep
      DoneR:
        !movd edx, xmm0
      ' Range conversion by John Gleason at the PowerBASIC forums
        !mov ecx, One        ; NOW, evaluate parameters.
        !mov eax, Two
        !cmp ecx, eax        ; Is One > Two?
        !jl short Now1LT2    ; jump over swap, already Two > One
        !xchg eax, ecx       ; Had to switch them, now ecx has LOWER bound as it should
    
        Now1LT2:
        !Sub eax, ecx        ; now eax is Range    [ Range (before +1) ]
        !inc eax             ; Add 1 To Range. Result Is correct for two params, and 0 If Max Range
        !jz short doTheRnd   ; jump if we incremented &hFFFFFFFF up to 0; we have maximum Range
        ' At This Point ECX = lower,  EAX = Range, EDX = Random
        !mul edx             ;'Random * Range+1 == edx:eax. Use edx as result as If /(2^32).
        !Add edx, ecx        ;'Add lower bound to rand result In edx
        doTheRnd:
        !mov method, edx
    
      End Method
    
      Method DW As Dword
        !xor LittleCrunch, 1
        !jz JustOneDW
        Engine
        !movd Keep, xmm0 ' Keep first dword
        !psrlq xmm0, 32  ' Make the second dword the first dword
        !movd method, xmm0
        Exit Method
      JustOneDW:
        Method = Keep
    
      End Method
    
      Method DefaultSeed
      Local i As Long
        qState(0) = &HFFFFFFFFFFFFFFFF
        qState(1) = 0
        qState(2) = &HFFFFFFFFFFFFFFFF
        qState(3) = 0
        LittleCrunch = %False
        For i = 1 To 200        ' Warm up
          Engine
        Next
    
      End Method
    
      Method UserSeed( ByVal strText As String )
      Local hHashAlg, pbHash As Dword
      Local sBinHash As String
      Local i As Long
    
        BCryptOpenAlgorithmProvider( hHashAlg, "MD5"$$, $$Nul, 0)
        Do
          sBinHash = Hash( hHashAlg, pbHash, StrPtr(strText), Len(strText), %True)
          Poke$ BigS0Ptr, sBinHash
        Loop Until qState(0) <> 0 And qState(2) <> 0
        ' Generating a zero state is unlikely but not impossible
        BCryptCloseAlgorithmProvider( hHashAlg, 0 )
        qState(2) = qState(1)
        LittleCrunch = %False
        For i = 1 To 200        ' Warm up
          Engine
        Next
    
      End Method
    
      Method SystemSeed
      Local hRand As Dword, i As Long
    
      ' Cryptographically populate the State
        BCryptOpenAlgorithmProvider(hRand, $$BCRYPT_RNG_ALGORITHM, $$Nul, 0) ' Prepare For Random number generation
        Do
          BCryptGenRandom(hRand, VarPtr( qState(0) ), 8, 0)
          BCryptGenRandom(hRand, VarPtr( qState(2) ), 8, 0)
        Loop Until qState(0) <> 0 And qState(2) <> 0
        ' Generating a zero state is unlikely but not impossible
        BCryptCloseAlgorithmProvider(hRand, 0)
        LittleCrunch = %False
        For i = 1 To 200        ' Warm up
          Engine
        Next
    
      End Method
    
    End Interface
    
    End Class
    
    Function Hash( ByVal hAlg As Dword, phHash As Dword, ByVal lAddress As Long Ptr, ByVal lLength As Long, ByVal Final As Long, Opt pbSecret As String ) As String
    ' Input
    ' hAlg As given by BCryptOpenAlgorithmProvider
    ' lAddress Is address Of message To be hashed
    ' lLength Is length Of message
    ' Final Is false If further messages are To be 'hashed In' Else true.
    ' pbSecret Is secret HMAC key
    ' Input/Output
    ' phHash Is zero On first pass Of Hash() And As given by BCryptCreateHash On second And subsequent passes
    
      If IsFalse phHash Then
        If IsMissing( pbSecret ) Then
          BCryptCreateHash( hAlg, phHash, %Null, 0, 0, 0, 0 )
        Else
          BCryptCreateHash( hAlg, phHash, ByVal %Null, 0, StrPtr( pbSecret ), Len( pbSecret ), 0 )
        End If
      End If
    
      BCryptHashData( phHash, lAddress, lLength, 0 )
    
      If IsTrue Final Then
        Local lHashLength, lResult As Long
        Local sBinHash As String
        BCryptGetProperty( hAlg, $$BCRYPT_HASH_LENGTH, VarPtr( lHashLength ), 4, lResult, 0 )
        sBinHash = Space$( lHashLength )
        BCryptFinishHash( phHash, StrPtr( sBinHash ), lHashLength, 0 )
        BCryptDestroyHash( phHash )
        Reset phHash ' Ensures that Next time Hash() Is used a hash Object Is created
        Function = sBinHash
      End If
    
    End Function
    

    Comment


    • #3
      Improved timimgs.

      Code:
      Xorshift128+
       
      .S  = 2.31 x RND
      .SX = 2.19 x RND, 2.51 x 2*RND-1
      .D  = 2.07 x RND
      .DX = 1.96 x RND
      .R(a,b)  = 4.72 x RND(a,b)
       
      Xoroshiro128+
       
      .S  = 2.28 x RND
      .SX = 2.17 x RND, 2.49 x 2*RND-1
      .D  = 2.02 x RND
      .DX = 1.78 x RND
      .R(a,b)  = 4.66 x RND(a,b)
      got by replacing the engines with

      Code:
      Prefix "ASM"
        align 16
        mov eax, BigS0Ptr
        movq xmm1, [eax] ' s1 = s[0]
        mov edx, BigS1Ptr
        movq xmm0, [edx] ' s0 = s[1]
        movq [eax], xmm0 ' s[0] = s0
        movdqa xmm7, xmm1 ' temp1 = s1
        psllq xmm7, 23 ' temp1 << 23
        pxor xmm1, xmm7 ' s1 ^= s1 << 23
        movdqa xmm7, xmm1 ' temp1 = s1
        psrlq xmm7, 18 ' temp1 >> 18
        movdqa xmm6, xmm0 ' temp2 = s0
        psrlq xmm6, 5 ' temp2 >> 5
        pxor xmm1, xmm0 ' s1 = s1 ^ s0
        pxor xmm1, xmm7 ' s1 = s1 ^ s0 ^ (s1 >> 18)
        pxor xmm1, xmm6 ' s1 = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5)
        mov eax, BigS1Ptr
        movq [eax], xmm1
        paddq xmm0, xmm1 ' xmm0 is now our 64 bit random number
      End Prefix
      Code:
      Prefix "ASM"
        align 16
        mov eax, BigS0Ptr
        movq xmm0, [eax] ' s0 = s[0]
        mov eax, BigS1Ptr
        movq xmm1, [eax] ' s1 = s[1]
        pxor xmm1, xmm0  ' s1 = s1^s0
       
      'Rotate s0
        movdqa xmm3, xmm0
        movdqa xmm2, xmm0 ' temp2 = s0
        psllq xmm3, 55  ' s0 = s0 << 55
        psrlq xmm2, 9   ' temp2 >> 9
        por xmm3, xmm2  ' s0 = s0 | temp2
       
        pxor xmm3, xmm1 ' s0 = s0 ^ s1
        movdqa xmm2, xmm1 ' temp2 = s1
        psllq xmm2, 14  ' temp2 = s1 << 14
        pxor xmm3, xmm2 ' s0 = s0 ^ temp2
        mov eax, BigS0Ptr
        movq [eax], xmm3
       
      'Rotate s1
        movdqa xmm3, xmm1
        movdqa xmm2, xmm1 ' temp3 = s1
        psllq xmm3, 36  ' s1 = s1 << 36
        psrlq xmm2, 28  ' temp3 >> 28
        por xmm3, xmm2  ' s1 = s1 | temp3
       
        mov eax, BigS1Ptr
        movq [eax], xmm3
       
        paddq xmm0, xmm1 ' xmm0 is now our 64 bit random number
      End Prefix
      All that I have done is replace movq with movdqa where I can and add 'align 16'.

      Comment


      • #4
        John Gleason has set the cat amongst the pigeons in the Programming forum when he "stumbled across" PractRand - statistical tests & pseudo- random number generators (RNGs, PRNGs). If we dump 1GB of data, for example, PractRand will do a test on 32MB, 64MB, 128MB, 256MB, 512MB and 1GB.

        It does not take any prisoners and we saw Xorshift128+ failing sometimes with a particular test called BRank.

        I figured that it may be because we were dumping Dwords and PractRand analyses dumps of bytes. I then dumped Rand.R(0,255) and the BRank failures were no more. However, if the Dwords had any bias in them then it may be getting crushed by using an eight bit digest of a 32 bit Dword and Xorshift128+ still uses Dwords for the .S, .SX and .R methods.

        I had no particular reason to do the following but gave it a whirl. Instead of using one Dword of Xorshift128+'s Qword output and keeping the other for the next request the 'tweak' is to Xor the two Dwords and use the result. With this approach PractRand gave me a clean sweep with no anomalies in sight when dumping Dwords.

        From the Programming forum:

        Many moons ago I read that when we Xor two random numbers then the result will be as least as good as the better of the two. I could never find a proof and abandoned the idea. In some code I was Xoring Intel's RdRand and Microsoft's CryptGenrandom.

        Well, I decided to do another search and have found the proof in a 2002 paper (pdf): Exclusive OR (XOR) and hardware random number generators
        By 'at least as good as' I mean with regard unpredictability.

        I did 10 1GB dumps using Rand.SystemSeed and got the following
        Code:
        1  2xu 512
        2  1xu 64, 1xu 128
        3  Clean sweep
        4  Clean sweep
        5  1xu 256
        6  1xu 512
        7  1xu 256
        8  1xu 32, 1xu 64, 1xu 128
        9  2xu 256
        10 1xu 128
        By '2xu 512', for example, I mean two unusual values in the 512MB section.

        To put this into context one 32MB test of PB's RND(0,255) gave 15 failures, 4 very suspicious, 3 suspicious and 2 unusual.

        With the above tests we do not see any worse than unusual. The anomalies were from a variety of tests which is a good sign.

        It would seem then that splitting the Qword allows bias to creep in. Xorshift128+ is, after all, a 64 bit generator. The Xor method greatly mitigates that. If no bias creeps in with two Dwords then Xor is redundant. However, we cannot tell, a priori, whether any bias has crept in or not and we certainly shouldn't do any tests for that otherwise the speed of generation would grind closely to a halt.

        From the Programming forum.
        This is, of course, not a condemnation of Xorshift128+ but a condemnation of how 'muggins here' made use of the 64 bits generated.
        The 'Engine' code has not been altered.

        The speed of Single precision in the original code is down to 'keeping' a Dword. With Xor we lose that. I reckoned that this should put us into the same ball park as Double precision. However, the Xor is slowing us down even further - we do not use that for Double precision.

        This is what we get for Xorshift128+ tweaked.
        Code:
        .S  = 1.828 x RND
         
        .SX = 1.726 x RND, 1.97 x 2*RND-1
         
        .D  = 2.047 x RND
         
        .DX = 1.952 x RND
         
        .R(a,b)  = 4.121 x RND(a,b)
        The Single precision is still faster than CMWC256 but not by much. We now have the 'barmy' situation where Double precision is faster than Single precision.

        The Range has also taken a performance hit because we were 'keeping' a Dword with the original code. However, it is still four times faster than PB's RND Range.

        This is what we get for Xoroshiro128+ tweaked
        Code:
        .S  = 1.687 x RND
         
        .SX = 1.578 x RND, 1.8 x 2*RND-1
         
        .D  = 1.921 x RND
         
        .DX = 1.8 x RND
         
        .R(a,b)  = 3.827 x RND(a,b)
        The Single precision is now a little slower than CMWC256 but the Double precision is faster than CMWC256's Single precision.

        With 10 1GB I got this for Xoroshiro128+
        Code:
        1  Clean sweep
        2  Clean sweep
        3  1xu 32
        4  1xu 1024
        5  1xu 1024
        6  1xu 32, 1xu 64, 1xu 128, 1xu 1024
        7  Clean sweep
        8  1xu 128
        9  1xu 512
        10 Clean sweep
        Here we have twice as many 'Clean sweeps' as Xorshift128+ and , again, no worse than unusual and many of them appear in the larger dumps. The author of both generators rates Xoroshiro128+ ahead of Xorshift128+ but I had not seen any evidence previously and favoured Xorshift128+ on speed grounds.

        In post #3 I invited you to replace some code in post #1. I will not do that with the Xor approach. In the next post is the whole code with post #3's replacement and the Xor approach.

        The two possible SLLs will be called Xorshift128+T and Xoroshiro128+T; the 'T' denoting tweaked.

        For Xorshift128+T we use

        Code:
        #Link "C:\PBWin\Xorshift\Xorshift128+T.sll" ' Your path here
        
        Global Rand As IXorshiftT
        
        Function PBMain
        
        Rand = Class "XorshiftT"
        and for Xoroshiro128+T we use

        Code:
        #Link "C:\PBWin\Xorshift\Xoroshiro128+T.sll" ' Your path here
         
        Global Rand as IXoroshiroT
         
        Function PBMain
         
          Rand = Class "XoroshiroT"
        Thanks to John Gleason for all the analysis work he has done during the development of these generators and, in particular, his introducing me to PractRand.

        There may still be a case for using the original code provided that we do not request large amounts of random numbers. However, I do not know what constitutes 'large'. I am a speed freak but not at the expense of quality. I will be using the tweaked versions no matter what. They are still fast compared with CMWC256 and leave RND standing.

        There is a case for removing the methods .S and .SX, since .D and .DX are faster, but if I did someone may still want them; for some reason.

        For the record my default PRNG is now going to be Xoroshiro128+T.sll using Double precision. There you were thinking that I would stay with Xorshift128+ because it is faster. Having said that I would use my Intel RdRand CSPRNG generator if speed was not an issue and I did not want user seeding.

        Benchmark:

        10 1GB CMWC256
        Code:
        1 1xu 128m, 1xu 256
        2 1xu 256
        3 1xu 512
        4 1xu 128
        5 2xu 32
        6 1xu 32
        7 Clean sweep
        8 1xu 512, 1xu 1024
        9 2xu 64, 1xu 256
        10 Clean sweep
        Added:

        Chi squared does not take any prisoners either.

        10 1GB Xoroshiro128+T ( courtesy of John Gleason's chiSquare2g.exe )
        Code:
        1  49.175%
        2  72.180%
        3  33.917%
        4  89.329%
        5  24.525%
        6  81.813%
        7  66.409%
        8  35.518%
        9  10.468%
        10 20.822%
        No problemo.
        Last edited by David Roberts; 13 Jul 2016, 10:14 AM. Reason: Chi squared test

        Comment


        • #5
          Code:
          %xorshift = 0
          
          #If %xorshift
            #Compile SLL "Xorshift128+T"
          #Else
            #Compile SLL "Xoroshiro128+T"
          #EndIf
          #Dim All
          #Register None
          
          #Include "Win32API.inc"
          
          Macro Engine
          
            ' xorshift128+ engine
          
            ' uint64_t s[2];
            ' uint64_t next(void) {
            ' uint64_t s1 = s[0];
            ' const uint64_t s0 = s[1];
            '
            ' s[0] = s0;
            ' s1 ^= s1 << 23; // a
            ' s[1] = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5); // b, c
            ' return s[1] + s0;
            ' }
          
            #If %xorshift
              Prefix "ASM"
                align 16
                mov eax, BigS0Ptr
                movq xmm1, [eax] ' s1 = s[0]
                mov edx, BigS1Ptr
                movq xmm0, [edx] ' s0 = s[1]
                movq [eax], xmm0 ' s[0] = s0
                movdqa xmm7, xmm1 ' temp1 = s1
                psllq xmm7, 23 ' temp1 << 23
                pxor xmm1, xmm7 ' s1 ^= s1 << 23
                movdqa xmm7, xmm1 ' temp1 = s1
                psrlq xmm7, 18 ' temp1 >> 18
                movdqa xmm6, xmm0 ' temp2 = s0
                psrlq xmm6, 5 ' temp2 >> 5
                pxor xmm1, xmm0 ' s1 = s1 ^ s0
                pxor xmm1, xmm7 ' s1 = s1 ^ s0 ^ (s1 >> 18)
                pxor xmm1, xmm6 ' s1 = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5)
                mov eax, BigS1Ptr
                movq [eax], xmm1
                paddq xmm0, xmm1 ' xmm0 is now our 64 bit random number
              End Prefix
          
            #Else
          
            ' xoroshiro128+ engine
          
            ' uint64_t s[2];
          
            ' static inline uint64_t rotl(const uint64_t x, int k) {
            '   return (x << k) | (x >> (64 - k));
            ' }
            '
            ' uint64_t next(void) {
            ' const uint64_t s0 = s[0];
            ' uint64_t s1 = s[1];
            ' const uint64_t result = s0 + s1;
            '
            ' s1 ^= s0;
            ' s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14); // a, b
            ' s[1] = rotl(s1, 36); // c
            '
            ' return result;
            ' }
          
              Prefix "ASM"
                align 16
                mov eax, BigS0Ptr
                movq xmm0, [eax] ' s0 = s[0]
                mov eax, BigS1Ptr
                movq xmm1, [eax] ' s1 = s[1]
                pxor xmm1, xmm0  ' s1 = s1^s0
          
              'Rotate s0
                movdqa xmm3, xmm0
                movdqa xmm2, xmm0 ' temp2 = s0
                psllq xmm3, 55  ' s0 = s0 << 55
                psrlq xmm2, 9   ' temp2 >> 9
                por xmm3, xmm2  ' s0 = s0 | temp2
          
                pxor xmm3, xmm1 ' s0 = s0 ^ s1
                movdqa xmm2, xmm1 ' temp2 = s1
                psllq xmm2, 14  ' temp2 = s1 << 14
                pxor xmm3, xmm2 ' s0 = s0 ^ temp2
                mov eax, BigS0Ptr
                movq [eax], xmm3
          
              'Rotate s1
                movdqa xmm3, xmm1
                movdqa xmm2, xmm1 ' temp3 = s1
                psllq xmm3, 36  ' s1 = s1 << 36
                psrlq xmm2, 28  ' temp3 >> 28
                por xmm3, xmm2  ' s1 = s1 | temp3
          
                mov eax, BigS1Ptr
                movq [eax], xmm3
          
                paddq xmm0, xmm1 ' xmm0 is now our 64 bit random number
              End Prefix
          
            #EndIf
          End Macro
          
          Global BigS0Ptr, BigS1Ptr As Dword Ptr
          
          #If %xorshift
            Class XorshiftT Common
          #Else
            Class XoroshiroT Common
          #EndIf
          
            Instance qState() As Quad
          
            Class Method Create
              ReDim qState(0 To 3) As Instance Quad
              Local i As Long
              BigS0Ptr = VarPtr( qState(0) )
              BigS1Ptr = VarPtr( qState(2) )
              qState(0) = &HFFFFFFFFFFFFFFFF
              qState(1) = 0
              qState(2) = &HFFFFFFFFFFFFFFFF
              qState(3) = 0
              For i = 1 To 200        ' Warm up
                Engine
              Next
          
            End Method
          
          #If %xorshift
            Interface IXorshiftT
          #Else
            Interface IXoroshiroT
          #EndIf
          
            Inherit IUnknown
          
            Method S As Single
              Engine
              !movd eax, xmm0
              !psrlq xmm0, 32
              !movd ebx, xmm0
              !xor eax, ebx
              !movd xmm0, eax
            ' Single precision conversion by Wilbert at the PureBasic forums
              !psrlq xmm0, 9
              !mov edx, 1
              !cvtsi2ss xmm1, edx
              !por xmm0, xmm1
              !subss xmm0, xmm1
              !movd method, xmm0
          
            End Method
          
            Method SX As Single
              Engine
              !movd eax, xmm0
              !psrlq xmm0, 32
              !movd ebx, xmm0
              !xor eax, ebx
              !movd xmm0, eax
              !psrlq xmm0, 9
              !mov edx, 2
              !cvtsi2ss xmm1, edx
              !por xmm0, xmm1
              !subss xmm0, xmm1
              !mov edx, 1
              !cvtsi2ss xmm1, edx
              !subss xmm0, xmm1
              !movd Method, xmm0
          
            End Method
          
            Method D As Double
              Engine
              ' Double precision conversion by Wilbert at the PureBasic forums
              !psrlq xmm0, 12
              !mov eax, 1
              !cvtsi2sd xmm1, eax
              !por xmm0, xmm1
              !subsd xmm0, xmm1
              !movq Method, xmm0
          
            End Method
          
            Method DX As Double
              Engine
              !psrlq xmm0, 12
              !mov eax, 2
              !cvtsi2sd xmm1, eax
              !por xmm0, xmm1
              !subsd xmm0, xmm1
              !mov eax, 1
              !cvtsi2sd xmm1, eax
              !subsd xmm0, xmm1
              !movq Method, xmm0
          
            End Method
          
            Method R( ByVal One As Long, ByVal Two As Long ) As Long
              Engine
              !movd edx, xmm0
              !psrlq xmm0, 32
              !movd ebx, xmm0
              !xor edx, ebx
            ' Range conversion by John Gleason at the PowerBASIC forums
              !mov ecx, One        ; NOW, evaluate parameters.
              !mov eax, Two
              !cmp ecx, eax        ; Is One > Two?
              !jl short Now1LT2    ; jump over swap, already Two > One
              !xchg eax, ecx       ; Had to switch them, now ecx has LOWER bound as it should
          
              Now1LT2:
              !Sub eax, ecx        ; now eax is Range    [ Range (before +1) ]
              !inc eax             ; Add 1 To Range. Result Is correct for two params, and 0 If Max Range
              !jz short doTheRnd   ; jump if we incremented &hFFFFFFFF up to 0; we have maximum Range
              ' At This Point ECX = lower,  EAX = Range, EDX = Random
              !mul edx             ;'Random * Range+1 == edx:eax. Use edx as result as If /(2^32).
              !Add edx, ecx        ;'Add lower bound to rand result In edx
              doTheRnd:
              !mov method, edx
          
            End Method
          
            Method DW As Dword
              Engine
              !movd eax, xmm0
              !psrlq xmm0, 32
              !movd ebx, xmm0
              !xor eax, ebx
              !mov method, eax
             End Method
          
            Method DefaultSeed
            Local i As Long
              qState(0) = &HFFFFFFFFFFFFFFFF
              qState(1) = 0
              qState(2) = &HFFFFFFFFFFFFFFFF
              qState(3) = 0
              For i = 1 To 200        ' Warm up
                Engine
              Next
          
            End Method
          
            Method UserSeed( ByVal strText As String )
            Local hHashAlg, pbHash As Dword
            Local sBinHash As String
            Local i As Long
          
              BCryptOpenAlgorithmProvider( hHashAlg, "MD5"$$, $$Nul, 0)
              Do
                sBinHash = Hash( hHashAlg, pbHash, StrPtr(strText), Len(strText), %True)
                Poke$ BigS0Ptr, sBinHash
              Loop Until qState(0) <> 0 And qState(2) <> 0
              ' Generating a zero state is unlikely but not impossible
              BCryptCloseAlgorithmProvider( hHashAlg, 0 )
              qState(2) = qState(1)
              For i = 1 To 200        ' Warm up
                Engine
              Next
          
            End Method
          
            Method SystemSeed
            Local hRand As Dword, i As Long
          
            ' Cryptographically populate the State
              BCryptOpenAlgorithmProvider(hRand, $$BCRYPT_RNG_ALGORITHM, $$Nul, 0) ' Prepare For Random number generation
              Do
                BCryptGenRandom(hRand, VarPtr( qState(0) ), 8, 0)
                BCryptGenRandom(hRand, VarPtr( qState(2) ), 8, 0)
              Loop Until qState(0) <> 0 And qState(2) <> 0
              ' Generating a zero state is unlikely but not impossible
              BCryptCloseAlgorithmProvider(hRand, 0)
              For i = 1 To 200        ' Warm up
                Engine
              Next
          
            End Method
          
          End Interface
          
          End Class
          
          Function Hash( ByVal hAlg As Dword, phHash As Dword, ByVal lAddress As Long Ptr, ByVal lLength As Long, ByVal Final As Long, Opt pbSecret As String ) As String
          ' Input
          ' hAlg As given by BCryptOpenAlgorithmProvider
          ' lAddress Is address Of message To be hashed
          ' lLength Is length Of message
          ' Final Is false If further messages are To be 'hashed In' Else true.
          ' pbSecret Is secret HMAC key
          ' Input/Output
          ' phHash Is zero On first pass Of Hash() And As given by BCryptCreateHash On second And subsequent passes
          
            If IsFalse phHash Then
              If IsMissing( pbSecret ) Then
                BCryptCreateHash( hAlg, phHash, %Null, 0, 0, 0, 0 )
              Else
                BCryptCreateHash( hAlg, phHash, ByVal %Null, 0, StrPtr( pbSecret ), Len( pbSecret ), 0 )
              End If
            End If
          
            BCryptHashData( phHash, lAddress, lLength, 0 )
          
            If IsTrue Final Then
              Local lHashLength, lResult As Long
              Local sBinHash As String
              BCryptGetProperty( hAlg, $$BCRYPT_HASH_LENGTH, VarPtr( lHashLength ), 4, lResult, 0 )
              sBinHash = Space$( lHashLength )
              BCryptFinishHash( phHash, StrPtr( sBinHash ), lHashLength, 0 )
              BCryptDestroyHash( phHash )
              Reset phHash ' Ensures that Next time Hash() Is used a hash Object Is created
              Function = sBinHash
            End If
          
          End Function
          Last edited by David Roberts; 13 Jul 2016, 09:11 PM.

          Comment


          • #6
            If you are an asm guru then you will regard the XOR code as a right dog's dinner. So it is.
            Code:
            Method S As Single
              Engine
              !movd eax, xmm0
              !psrlq xmm0, 32
              !movd ebx, xmm0
              !xor eax, ebx
              !movd xmm0, eax
              ...
            I replaced that with
            Code:
            Method S As Single
              Engine
              !movdqa xmm1, xmm0
              !psrlq, xmm1, 32
              !pxor xmm0, xmm1
              ...
            On the face of it I thought that would be worthwhile.

            It gave me a paltry 2% increase in speed.

            This goes to show how slow the xmm registers are compared with the CPU registers.

            Imagine how fast the 'Engine' would be if, instead of the xmm registers, we were able to use 64 bit registers.
            Last edited by David Roberts; 13 Jul 2016, 09:12 PM.

            Comment


            • #7
              TYPO!

              Just above the comment in 'Method S' I had '!movd eax, xmm0' and it should have been 'movd xmm0, eax'.

              Similarly, with the 5th instruction after 'Engine' in 'Method SX'.

              The 'Method R' is OK.

              I have corrected the post but if you have copied then correct your copy.

              It is not a crash-able error - it just caught my eye.

              The dumps above are OK because the 'Method DW' is OK.

              The typo only affected .S and .SX - it was my subconscious putting a spoke in as Double precision is the way to go. That is my story and I am sticking to it.
              Last edited by David Roberts; 13 Jul 2016, 09:33 PM.

              Comment


              • #8
                I have found out why the original code had BRank failures.

                I ran tests dumping just the lower Dword of the Qword output and tests dumping just the upper Dword of the Qword output. It seemed that BRank failures only occurred with the lower Dword.

                I did some further reading and found this with regard Xoroshiro128+ and, I suspect, it is true for Xorhift128+ as well.
                Beside passing BigCrush, this generator passes the PractRand test suite up to (and included) 16TB, with the exception of binary rank tests, which fail due to the lowest bit being an LFSR; all other bits pass all tests. We suggest to use a sign test to extract a random Boolean value.
                So, we have an errant bit 0 in the Qword.

                If we have two Dwords, X random and Y not random then X Xor Y will be random. That is why the Xor method worked. However, bit 0 will still be errant for the double precision values.

                We could simply cast out the lower Dword saving us time with the Xor method. However, we will now have to go past the 'engine' twice to collect two 'good' Dwords for the Double precision values; not good for a 64 bit generator. Alternatively, we can try and rectify bit 0.

                I tried the suggestion of using a sign test but all I got was a toggling of bit 0, which simply moved the location of a potential failure.

                Bit zero was looked at in isolation and it was found that it had a 50/50 chance, give or take, of being 0, which is indicative of randomness but it is not implied. It follows then that since bits 1 to 31 are random then the expected average parity of the lower Dword will be 0.5. So, if we do a parity check of the lower Dword and force bit 0 to be 1 if the parity is 1 and force bit 0 to be zero if the parity is zero then we will force bit 0 to assume the same randomness of the parity. Bit 0 is now random. Bit 0 still has a 50/50 chance of being zero but it will no longer have an anomaly in its sequence. No more BRank failures were seen.

                It is worth noting that the period of both generators, 2^128-1, will not be altered provided that no part of any interference to the output is fed back into the generators. We can do anything we like to the output provided we do not introduce non-randomness so we must subject an interfered output to applications like PractRand. The .DW, used in dumping, was subject to a type of interference in the Xor method but the Double precision was not and that got past me.

                This bit 0 correction approach now allows us to return to the original code where we kept the lower Dword for the next request; giving us the speed we got. However, we now have to correct bit 0 of the kept Dword before further processing. With the double precision values bit 0 will have to be corrected on every determination.

                I wrote some asm to correct bit 0 but I wasn't happy with the speed. I posted it in the 'Inline Assembler' forum and Paul Dixon provided code that gave mine a 24% boost. Paul's code was adopted.

                Here are the speed results for Xorshift128+:
                Code:
                .S  = 2.022 x RND
                .SX = 1.922 x RND, 2.2 x 2*RND-1
                .D  = 1.676 x RND
                .DX = 1.562 x RND
                .R(a,b)  = 3.989 x RND(a,b)
                This is a worthwhile improvement on the Xor method with regard Single precision with a little improvement with the Range. Double precision has suffered but it had no bit 0 correction with the Xor method. Having said that Double precision is only a little slower than CMWC256 Single precision.

                Here is a dump test.
                Code:
                1  Clean sweep
                2  2xu 64, 1xu 256
                3  1xu 128, 1xu 1024
                4  1xu 128, 1xu 512
                5  1xu 512, 1xu 1024
                6  1xu 128, 1xu 512
                7  Clean swwep
                8  Clean sweep
                9  1xu 128
                10 1xu64, 1xu 128
                That is OK.

                Here are the speed results for Xoroshiro128+
                Code:
                .S  = 1.96 x RND
                .SX = 1.839 x RND, 2.1 x 2*RND-1
                .D  = 1.578 x RND
                .DX = 1.441 x RND
                .R(a,b)  = 3.894 x RND(a,b)
                and a dump test
                Code:
                1  1xu 256
                2  Clean sweep
                3  1xu 128, 1xu 512, 1xu 1024
                4  Clean sweep
                5  1xu 512, 1xu 1024
                6  Clean sweep
                7  1xu 512
                8  1xu 128
                9  Clean sweep
                10 Clean sweep
                Since .DW is random for both the Xor method and this latest approach I didn't expect anything different with regard the dump tests and there isn't, with Xoroshiro128+ being a little better again. However, since Single precision is clearly faster then Double precision with this latest approach I will go for Single precision for my default; it is faster than CMWC256 Single precision and better quality randomness.

                To summarize all the posts above:

                The initial approach had a bit 0 issue across the board.
                The second approach, Xor, had a bit 0 issue with Double precision.
                The third appoach, this one, has no bit 0 issue.

                New source code in the next post. I have dropped the 'T' in the sll names since we have gone back to the Dword keeping technique of the original code.

                PS It bugged me that the sign test did not work as expected so I tired again. I got it to work but the speed of correction was almost identical to the parity method so stayed with the parity method.
                Last edited by David Roberts; 19 Jul 2016, 05:58 AM. Reason: Added PS

                Comment


                • #9
                  Code:
                  %xorshift = 0
                  
                  #If %xorshift
                    #Compile SLL "Xorshift128+
                  #Else
                    #Compile SLL "Xoroshiro128+"
                  #EndIf
                  #Dim All
                  #Register None
                  
                  #Include "Win32API.inc"
                  
                  Macro Engine
                  
                    ' xorshift128+ engine
                  
                    ' uint64_t s[2];
                    ' uint64_t next(void) {
                    ' uint64_t s1 = s[0];
                    ' const uint64_t s0 = s[1];
                    '
                    ' s[0] = s0;
                    ' s1 ^= s1 << 23; // a
                    ' s[1] = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5); // b, c
                    ' return s[1] + s0;
                    ' }
                  
                    #If %xorshift
                  
                    Prefix "ASM"
                      align 16
                      mov eax, BigS0Ptr
                      movq xmm1, [eax] ' s1 = s[0]
                      mov edx, BigS1Ptr
                      movq xmm0, [edx] ' s0 = s[1]
                      movq [eax], xmm0 ' s[0] = s0
                      movdqa xmm7, xmm1 ' temp1 = s1
                      psllq xmm7, 23 ' temp1 << 23
                      pxor xmm1, xmm7 ' s1 ^= s1 << 23
                      movdqa xmm7, xmm1 ' temp1 = s1
                      psrlq xmm7, 18 ' temp1 >> 18
                      movdqa xmm6, xmm0 ' temp2 = s0
                      psrlq xmm6, 5 ' temp2 >> 5
                      pxor xmm1, xmm0 ' s1 = s1 ^ s0
                      pxor xmm1, xmm7 ' s1 = s1 ^ s0 ^ (s1 >> 18)
                      pxor xmm1, xmm6 ' s1 = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5)
                      mov eax, BigS1Ptr
                      movq [eax], xmm1
                      paddq xmm0, xmm1 ' xmm0 is now our 64 bit random number
                    End Prefix
                  
                    #Else
                  
                    ' xoroshiro128+ engine
                  
                    ' uint64_t s[2];
                  
                    ' static inline uint64_t rotl(const uint64_t x, int k) {
                    '   return (x << k) | (x >> (64 - k));
                    ' }
                    '
                    ' uint64_t next(void) {
                    ' const uint64_t s0 = s[0];
                    ' uint64_t s1 = s[1];
                    ' const uint64_t result = s0 + s1;
                    '
                    ' s1 ^= s0;
                    ' s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14); // a, b
                    ' s[1] = rotl(s1, 36); // c
                    '
                    ' return result;
                    ' }
                  
                    Prefix "ASM"
                      align 16
                      mov eax, BigS0Ptr
                      movq xmm0, [eax] ' s0 = s[0]
                      mov eax, BigS1Ptr
                      movq xmm1, [eax] ' s1 = s[1]
                      pxor xmm1, xmm0  ' s1 = s1^s0
                  
                    'Rotate s0
                      movdqa xmm3, xmm0
                      movdqa xmm2, xmm0 ' temp2 = s0
                      psllq xmm3, 55  ' s0 = s0 << 55
                      psrlq xmm2, 9   ' temp2 >> 9
                      por xmm3, xmm2  ' s0 = s0 | temp2
                  
                      pxor xmm3, xmm1 ' s0 = s0 ^ s1
                      movdqa xmm2, xmm1 ' temp2 = s1
                      psllq xmm2, 14  ' temp2 = s1 << 14
                      pxor xmm3, xmm2 ' s0 = s0 ^ temp2
                      mov eax, BigS0Ptr
                      movq [eax], xmm3
                  
                    'Rotate s1
                      movdqa xmm3, xmm1
                      movdqa xmm2, xmm1 ' temp3 = s1
                      psllq xmm3, 36  ' s1 = s1 << 36
                      psrlq xmm2, 28  ' temp3 >> 28
                      por xmm3, xmm2  ' s1 = s1 | temp3
                  
                      mov eax, BigS1Ptr
                      movq [eax], xmm3
                  
                      paddq xmm0, xmm1 ' xmm0 is now our 64 bit random number
                    End Prefix
                  
                    #EndIf
                  End Macro
                  
                  Global BigS0Ptr, BigS1Ptr As Dword Ptr
                  Global Keep As Dword
                  Global LittleCrunch As Long
                  
                  #If %xorshift
                    Class Xorshift Common
                  #Else
                    Class Xoroshiro Common
                  #EndIf
                  
                    Instance qState() As Quad
                  
                    Class Method Create
                      ReDim qState(0 To 3) As Instance Quad
                      Local i As Long
                      BigS0Ptr = VarPtr( qState(0) )
                      BigS1Ptr = VarPtr( qState(2) )
                      qState(0) = &HFFFFFFFFFFFFFFFF
                      qState(1) = 0
                      qState(2) = &HFFFFFFFFFFFFFFFF
                      qState(3) = 0
                      LittleCrunch = %False
                      For i = 1 To 200        ' Warm up
                        Engine
                      Next
                  
                    End Method
                  
                  #If %xorshift
                    Interface IXorshift
                  #Else
                    Interface IXoroshiro
                  #EndIf
                  
                    Inherit IUnknown
                  
                    Method S As Single
                      !xor LittleCrunch, 1
                      !jz JustOneS
                      Engine
                      !movd Keep, xmm0
                      !psrlq xmm0, 32
                      !jmp short DoneS
                    JustOneS:
                      !movd xmm0, Keep
                      !movd eax, xmm0
                      ' Making the errant bit 0 into a random bit by Paul Dixon at the PowerBASIC forums
                      !mov ecx,eax
                      !shr ecx,16
                      !xor ecx,eax
                      !xor cl,ch
                      !setpo cl
                      !and al,&hfe
                      !or al,cl
                      !movd xmm0, eax
                  DoneS:
                    ' Single precision conversion by Wilbert at the PureBasic forums
                      !psrlq xmm0, 9
                      !mov edx, 1
                      !cvtsi2ss xmm1, edx
                      !por xmm0, xmm1
                      !subss xmm0, xmm1
                      !movd method, xmm0
                  
                    End Method
                  
                    Method SX As Single
                      !xor LittleCrunch, 1
                      !jz JustOneSX
                      Engine
                      !movd Keep, xmm0
                      !psrlq xmm0, 32
                      !jmp short DoneSX
                    JustOneSX:
                      !movd xmm0, Keep
                      !movd eax, xmm0
                      !mov ecx,eax
                      !shr ecx,16
                      !xor ecx,eax
                      !xor cl,ch
                      !setpo cl
                      !and al,&hfe
                      !or al,cl
                      !movd xmm0,eax
                    DoneSX:
                      !psrlq xmm0, 9
                      !mov edx, 2
                      !cvtsi2ss xmm1, edx
                      !por xmm0, xmm1
                      !subss xmm0, xmm1
                      !mov edx, 1
                      !cvtsi2ss xmm1, edx
                      !subss xmm0, xmm1
                      !movd Method, xmm0
                  
                    End Method
                  
                    Method D As Double
                      Engine
                      !movd eax, xmm0
                      !movdqa xmm1, xmm0 ' copy xmm0
                      !mov ecx,eax
                      !shr ecx,16
                      !xor ecx,eax
                      !xor cl,ch
                      !setpo cl
                      !and al,&hfe
                      !or al,cl
                      !psrlq xmm1, 32 ' This line and the next will clear the the lower Dword
                      !psllq xmm1, 32
                      !movd xmm0, eax ' Upper Dword gets destroyed - normal behaviour
                      !por xmm0, xmm1 ' Upper Dword put back
                      ' Double precision conversion by Wilbert at the PureBasic forums
                      !psrlq xmm0, 12
                      !mov eax, 1
                      !cvtsi2sd xmm1, eax
                      !por xmm0, xmm1
                      !subsd xmm0, xmm1
                      !movq Method, xmm0
                  
                    End Method
                  
                    Method DX As Double
                      Engine
                      !movd eax, xmm0
                      !movdqa xmm1, xmm0
                      !mov ecx,eax
                      !shr ecx,16
                      !xor ecx,eax
                      !xor cl,ch
                      !setpo cl
                      !and al,&hfe
                      !or al,cl
                      !psrlq xmm1, 32
                      !psllq xmm1, 32
                      !movd xmm0, eax
                      !por xmm0, xmm1
                      !psrlq xmm0, 12
                      !mov eax, 2
                      !cvtsi2sd xmm1, eax
                      !por xmm0, xmm1
                      !subsd xmm0, xmm1
                      !mov eax, 1
                      !cvtsi2sd xmm1, eax
                      !subsd xmm0, xmm1
                      !movq Method, xmm0
                  
                    End Method
                  
                    Method R( ByVal One As Long, ByVal Two As Long ) As Long
                      !xor LittleCrunch, 1
                      !jz JustOneR
                      Engine
                      !movd Keep, xmm0
                      !psrlq xmm0, 32
                      !jmp short DoneR
                    JustOneR:
                      !movd xmm0, Keep
                      !movd eax, xmm0
                      !mov ecx,eax
                      !shr ecx,16
                      !xor ecx,eax
                      !xor cl,ch
                      !setpo cl
                      !and al,&hfe
                      !or al,cl
                      !movd xmm0, eax
                    DoneR:
                      !movd edx, xmm0
                    ' Range conversion by John Gleason at the PowerBASIC forums
                      !mov ecx, One        ; NOW, evaluate parameters.
                      !mov eax, Two
                      !cmp ecx, eax        ; Is One > Two?
                      !jl short Now1LT2    ; jump over swap, already Two > One
                      !xchg eax, ecx       ; Had to switch them, now ecx has LOWER bound as it should
                  
                      Now1LT2:
                      !Sub eax, ecx        ; now eax is Range    [ Range (before +1) ]
                      !inc eax             ; Add 1 To Range. Result Is correct for two params, and 0 If Max Range
                      !jz short doTheRnd   ; jump if we incremented &hFFFFFFFF up to 0; we have maximum Range
                      ' At This Point ECX = lower,  EAX = Range, EDX = Random
                      !mul edx             ;'Random * Range+1 == edx:eax. Use edx as result as If /(2^32).
                      !Add edx, ecx        ;'Add lower bound to rand result In edx
                      doTheRnd:
                      !mov method, edx
                  
                    End Method
                  
                    Method DW As Dword
                      !xor LittleCrunch, 1
                      !jz JustOneDW
                      Engine
                      !movd Keep, xmm0 ' Keep first dword
                      !psrlq xmm0, 32  ' Make the second dword the first dword
                      !movd method, xmm0
                      Exit Method
                    JustOneDW:
                      !movd xmm0, Keep
                      !movd eax, xmm0
                      !mov ecx,eax
                      !shr ecx,16
                      !xor ecx,eax
                      !xor cl,ch
                      !setpo cl
                      !and al,&hfe
                      !or al,cl
                      !mov method, eax
                  
                    End Method
                  
                    Method DefaultSeed
                    Local i As Long
                      qState(0) = &HFFFFFFFFFFFFFFFF
                      qState(1) = 0
                      qState(2) = &HFFFFFFFFFFFFFFFF
                      qState(3) = 0
                      LittleCrunch = %False
                      For i = 1 To 200        ' Warm up
                        Engine
                      Next
                  
                    End Method
                  
                    Method UserSeed( ByVal strText As String )
                    Local hHashAlg, pbHash As Dword
                    Local sBinHash As String
                    Local i As Long
                  
                      BCryptOpenAlgorithmProvider( hHashAlg, "MD5"$$, $$Nul, 0)
                      Do
                        sBinHash = Hash( hHashAlg, pbHash, StrPtr(strText), Len(strText), %True)
                        Poke$ BigS0Ptr, sBinHash
                      Loop Until qState(0) <> 0 And qState(2) <> 0
                      ' Generating a zero state is unlikely but not impossible
                      BCryptCloseAlgorithmProvider( hHashAlg, 0 )
                      qState(2) = qState(1)
                      LittleCrunch = %False
                      For i = 1 To 200        ' Warm up
                        Engine
                      Next
                  
                    End Method
                  
                    Method SystemSeed
                    Local hRand As Dword, i As Long
                  
                    ' Cryptographically populate the State
                      BCryptOpenAlgorithmProvider(hRand, $$BCRYPT_RNG_ALGORITHM, $$Nul, 0) ' Prepare For Random number generation
                      Do
                        BCryptGenRandom(hRand, VarPtr( qState(0) ), 8, 0)
                        BCryptGenRandom(hRand, VarPtr( qState(2) ), 8, 0)
                      Loop Until qState(0) <> 0 And qState(2) <> 0
                      ' Generating a zero state is unlikely but not impossible
                      BCryptCloseAlgorithmProvider(hRand, 0)
                      LittleCrunch = %False
                      For i = 1 To 200        ' Warm up
                        Engine
                      Next
                  
                    End Method
                  
                  End Interface
                  
                  End Class
                  
                  Function Hash( ByVal hAlg As Dword, phHash As Dword, ByVal lAddress As Long Ptr, ByVal lLength As Long, ByVal Final As Long, Opt pbSecret As String ) As String
                  ' Input
                  ' hAlg As given by BCryptOpenAlgorithmProvider
                  ' lAddress Is address Of message To be hashed
                  ' lLength Is length Of message
                  ' Final Is false If further messages are To be 'hashed In' Else true.
                  ' pbSecret Is secret HMAC key
                  ' Input/Output
                  ' phHash Is zero On first pass Of Hash() And As given by BCryptCreateHash On second And subsequent passes
                  
                    If IsFalse phHash Then
                      If IsMissing( pbSecret ) Then
                        BCryptCreateHash( hAlg, phHash, %Null, 0, 0, 0, 0 )
                      Else
                        BCryptCreateHash( hAlg, phHash, ByVal %Null, 0, StrPtr( pbSecret ), Len( pbSecret ), 0 )
                      End If
                    End If
                  
                    BCryptHashData( phHash, lAddress, lLength, 0 )
                  
                    If IsTrue Final Then
                      Local lHashLength, lResult As Long
                      Local sBinHash As String
                      BCryptGetProperty( hAlg, $$BCRYPT_HASH_LENGTH, VarPtr( lHashLength ), 4, lResult, 0 )
                      sBinHash = Space$( lHashLength )
                      BCryptFinishHash( phHash, StrPtr( sBinHash ), lHashLength, 0 )
                      BCryptDestroyHash( phHash )
                      Reset phHash ' Ensures that Next time Hash() Is used a hash Object Is created
                      Function = sBinHash
                    End If
                  
                  End Function

                  Comment


                  • #10
                    Been doing tests and everything now seems fine.

                    Before I leave this thread, just to put the last dump into perspective here is a 1GB RND(0, 65535) dump , Seed = Randomize Timer.

                    PractRand did a 32MB analysis and gave up on any further testing - could you blame it?

                    It very nearly managed to avoid a BRank failure but squeezed one at the end.

                    Code:
                    f:\msvc12_32bit>type dump.dat | RNG_test.exe stdin
                    RNG = RNG_stdin, PractRand version 0.91, seed = 0x9a99f395
                    test set = normal, folding = standard(unknown format)
                    
                    rng=RNG_stdin, seed=0x9a99f395
                    length= 32 megabytes (2^25 bytes), time= 2.1 seconds
                      Test Name                         Raw       Processed     Evaluation
                      Gap-16:A                          R= -14.8  p =1-1.4e-13    FAIL
                      [Low1/8]BCFN(2+0,13-6,T)          R= +19.6  p =  3.7e-7   very suspicious
                      [Low1/8]BCFN(2+1,13-6,T)          R=  +9.2  p =  1.4e-3   unusual
                      [Low1/8]BCFN(2+2,13-6,T)          R= +40.7  p =  2.3e-14    FAIL
                      [Low1/8]BCFN(2+3,13-6,T)          R= +33.0  p =  9.7e-12    FAIL
                      [Low1/8]BCFN(2+4,13-7,T)          R= +12.9  p =  1.6e-4   unusual
                      [Low1/8]BCFN(2+6,13-8,T)          R= +26.3  p =  1.3e-7   very suspicious
                      [Low1/8]BCFN(2+7,13-9,T)          R= +26.5  p =  4.5e-7   suspicious
                      [Low1/8]DC6-9x1Bytes-1            R= +17.9  p =  7.4e-10   VERY SUSPICIOUS
                      [Low1/8]Gap-16:A                  R=+909.7  p =  9.1e-732   FAIL !!!!!!!
                      [Low1/8]Gap-16:B                  R=+883.1  p =  1.9e-666   FAIL !!!!!!!
                      [Low1/8]FPF-14+6/16:(0,14-2)      R=+511.4  p =  4.8e-447   FAIL !!!!!!!
                      [Low1/8]FPF-14+6/16:(1,14-3)      R=+364.4  p =  3.3e-319   FAIL !!!!!!
                      [Low1/8]FPF-14+6/16:(2,14-4)      R=+235.5  p =  2.2e-192   FAIL !!!!!!
                      [Low1/8]FPF-14+6/16:(3,14-5)      R=+193.4  p =  3.7e-160   FAIL !!!!!
                      [Low1/8]FPF-14+6/16:(4,14-5)      R= +87.6  p =  1.8e-72    FAIL !!!!
                      [Low1/8]FPF-14+6/16:(5,14-6)      R= +72.3  p =  2.0e-55    FAIL !!!!
                      [Low1/8]FPF-14+6/16:(6,14-7)      R= +34.7  p =  1.8e-27    FAIL !!
                      [Low1/8]FPF-14+6/16:(7,14-8)      R= +40.4  p =  3.9e-29    FAIL !!
                      [Low1/8]FPF-14+6/16:(8,14-8)      R= +23.9  p =  2.9e-17    FAIL !
                      [Low1/8]FPF-14+6/16:(9,14-9)      R= +12.0  p =  8.0e-8   suspicious
                      [Low1/8]FPF-14+6/16:all           R=+698.4  p =  2.3e-628   FAIL !!!!!!!
                      [Low1/8]FPF-14+6/16:all2          R=+122303 p = 0           FAIL !!!!!!!!
                      [Low1/8]FPF-14+6/16:cross         R= +24.4  p =  3.0e-21    FAIL !!
                      [Low4/32]BCFN(2+0,13-6,T)         R=+152.8  p =  9.3e-53    FAIL !!!!
                      [Low4/32]BCFN(2+1,13-6,T)         R=+226.6  p =  4.9e-78    FAIL !!!!
                      [Low4/32]BCFN(2+2,13-6,T)         R=+149.0  p =  1.8e-51    FAIL !!!!
                      [Low4/32]BCFN(2+3,13-6,T)         R=+112.0  p =  8.3e-39    FAIL !!!
                      [Low4/32]BCFN(2+4,13-7,T)         R=+133.8  p =  7.1e-41    FAIL !!!
                      [Low4/32]BCFN(2+5,13-8,T)         R= +49.4  p =  1.7e-13    FAIL
                      [Low4/32]BCFN(2+6,13-8,T)         R= +66.1  p =  1.0e-17    FAIL !
                      [Low4/32]BCFN(2+7,13-9,T)         R= +26.7  p =  4.1e-7   suspicious
                      [Low4/32]BCFN(2+8,13-9,T)         R= +69.7  p =  8.8e-17    FAIL !
                      [Low4/32]DC6-9x1Bytes-1           R= +3399  p =  8e-1916    FAIL !!!!!!!!
                      [Low4/32]Gap-16:A                 R=+50845  p = 0           FAIL !!!!!!!!
                      [Low4/32]Gap-16:B                 R=+79767  p = 0           FAIL !!!!!!!!
                      [Low4/32]FPF-14+6/16:(0,14-2)     R=+301.5  p =  1.7e-263   FAIL !!!!!!
                      [Low4/32]FPF-14+6/16:(1,14-3)     R=+213.2  p =  1.1e-186   FAIL !!!!!!
                      [Low4/32]FPF-14+6/16:(2,14-4)     R=+150.8  p =  3.5e-123   FAIL !!!!!
                      [Low4/32]FPF-14+6/16:(3,14-5)     R=+106.7  p =  2.4e-88    FAIL !!!!
                      [Low4/32]FPF-14+6/16:(4,14-5)     R= +50.5  p =  1.0e-41    FAIL !!!
                      [Low4/32]FPF-14+6/16:(5,14-6)     R= +46.7  p =  8.2e-36    FAIL !!!
                      [Low4/32]FPF-14+6/16:(6,14-7)     R= +36.2  p =  1.3e-28    FAIL !!
                      [Low4/32]FPF-14+6/16:(7,14-8)     R= +22.7  p =  2.2e-16    FAIL
                      [Low4/32]FPF-14+6/16:(8,14-8)     R= +25.7  p =  1.5e-18    FAIL !
                      [Low4/32]FPF-14+6/16:(9,14-9)     R= +19.9  p =  8.4e-13   VERY SUSPICIOUS
                      [Low4/32]FPF-14+6/16:(10,14-10)   R= +17.9  p =  4.1e-10  very suspicious
                      [Low4/32]FPF-14+6/16:(12,14-11)   R= +14.7  p =  3.1e-7   unusual
                      [Low4/32]FPF-14+6/16:all          R=+418.5  p =  2.1e-376   FAIL !!!!!!!
                      [Low4/32]FPF-14+6/16:all2         R=+42852  p = 0           FAIL !!!!!!!!
                      [Low4/32]FPF-14+6/16:cross        R=  +5.3  p =  9.2e-5   unusual
                      [Low1/32]BCFN(2+0,13-7,T)         R=+883.8  p =  1.7e-266   FAIL !!!!!!
                      [Low1/32]BCFN(2+1,13-7,T)         R=+662.5  p =  6.6e-200   FAIL !!!!!!
                      [Low1/32]BCFN(2+2,13-8,T)         R= +1070  p =  1.4e-272   FAIL !!!!!!
                      [Low1/32]BCFN(2+3,13-8,T)         R=+432.7  p =  9.4e-111   FAIL !!!!!
                      [Low1/32]BCFN(2+4,13-8,T)         R=+219.1  p =  1.5e-56    FAIL !!!!
                      [Low1/32]BCFN(2+5,13-9,T)         R=+347.5  p =  3.3e-79    FAIL !!!!
                      [Low1/32]BCFN(2+6,13-9,T)         R=+318.8  p =  9.2e-73    FAIL !!!!
                      [Low1/32]DC6-9x1Bytes-1           R=+14562  p =  1e-8413    FAIL !!!!!!!!
                      [Low1/32]Gap-16:A                 R=+91509  p = 0           FAIL !!!!!!!!
                      [Low1/32]Gap-16:B                 R=+262073 p = 0           FAIL !!!!!!!!
                      [Low1/32]FPF-14+6/16:(0,14-4)     R= +3700  p =  1e-3023    FAIL !!!!!!!!
                      [Low1/32]FPF-14+6/16:(1,14-5)     R= +2568  p =  5e-2129    FAIL !!!!!!!!
                      [Low1/32]FPF-14+6/16:(2,14-5)     R= +3024  p =  1e-2506    FAIL !!!!!!!!
                      [Low1/32]FPF-14+6/16:(3,14-6)     R= +2240  p =  4e-1714    FAIL !!!!!!!!
                      [Low1/32]FPF-14+6/16:(4,14-7)     R= +1445  p =  3e-1150    FAIL !!!!!!!!
                      [Low1/32]FPF-14+6/16:(5,14-8)     R=+927.1  p =  3.2e-667   FAIL !!!!!!!
                      [Low1/32]FPF-14+6/16:(6,14-8)     R= +1100  p =  1.9e-791   FAIL !!!!!!!
                      [Low1/32]FPF-14+6/16:(7,14-9)     R=+590.3  p =  5.6e-372   FAIL !!!!!!!
                      [Low1/32]FPF-14+6/16:(8,14-10)    R=+521.7  p =  4.2e-278   FAIL !!!!!!
                      [Low1/32]FPF-14+6/16:(9,14-11)    R=+358.9  p =  2.9e-157   FAIL !!!!!
                      [Low1/32]FPF-14+6/16:(10,14-11)   R=+265.6  p =  1.3e-116   FAIL !!!!!
                      [Low1/32]FPF-14+6/16:all          R= +6254  p =  2e-5295    FAIL !!!!!!!!
                      [Low1/32]FPF-14+6/16:all2         R=+9121694 p = 0           FAIL !!!!!!!!
                      [Low1/32]FPF-14+6/16:cross        R= +57.9  p =  8.2e-45    FAIL !!!
                      [Low1/32]BRank(12):256(0)         R= +2628  p~=  2.9e-792   FAIL !!!!!!!
                      ...and 54 test result(s) without anomalies

                    Comment


                    • #11
                      What I had not done was a 10 x 1GB dump of Intel RdRand.
                      Code:
                      1  Clean sweep
                      2  1xu256, 2xu512
                      3  Clean sweep
                      4  1xu32, 1xu 512, 1xu 1024
                      5  1xu 128, 1xu 256
                      6  1xu 512, 2xu 32
                      7  Clean sweep
                      8  Clean sweep
                      9  1xu 1024
                      10 1xu 64, 1xu 1024
                      So, even a CSPRNG will have unusual values. They must have because we are dealing with random numbers.

                      It is a little unfair to compare a PRNG with a CSPRNG but these results show that the Xoroshiro128+ generator is 'top drawer'.

                      Comment


                      • #12
                        I have been doing a lot of code removal/additions of late and ended up with the following in .S, .SX, .DW and .R.
                        Code:
                        !movd xmm0, Keep
                        !movd eax, xmm0
                        which has, of course, some redundancy.

                        Just replace the two lines with '!mov eax, Keep'.

                        Speed increase? Well, I suppose there must be some but it did not register in my tests.

                        So, why mention it?

                        Well, it seems that my pedanticism is greater than my need to avoid showing my face with egg on it.

                        Comment


                        • #13
                          I thought it might be possible to use PBCC5 STDOUT to "just pipe the output from a random number generator" (heheh easier said by the RNG_test author than done) to RNG_test.exe STDIN, but then read in PBCC5 docs that "STDOUT is not compatible with pipes."

                          Well, I rebelled against that advice and tried it anyway. I found it seems to work fine if I just pipe in small chunks of data at a time. It's way faster to test the data even with small data packets than to write to disk then test the disk file--approaching 4x faster--and you don't have to waste any SSD drive-life-reducing erase cycles to boot!

                          I ran STDOUT | STDIN, the little 7-lines-of-code 2^63 period mwc generator we originally used years ago, on low priority to 4TB and got 2 "unusual"s. I don't actually consider p-vals of 0.9918 and 0.9972 given maybe 2000 p-val tests unusual tho.

                          Anyway, I digress. Below is the PBCC5 code I used to make 2 exe's, one outputting 512 bytes at a time, the other 1024. It's faster as the output gets bigger, but it gets unreliable definitely by 64KB, tho I don't know the upper limit of reliability. I included the 2 .exe's if you don't have PBCC and want to try them. My lowest bit replacement code may be a little different than yours, but the function I'm sure must be similar.
                          Code:
                          #COMPILE EXE
                          #DIM ALL
                          #REGISTER NONE
                           %SAMPLEsIZE = &h080        'stdout 1024 bytes at a time to RNG_test Note: can't get too big or pipe to RNG_test.exe begins to fail
                          '%SAMPLEsIZE = &h040        'stdout  512 bytes at a time to RNG_test
                          
                          MACRO mGet32bits4LoBit     'get 32 bits as substitutes from an 8 byte rand to use for bad lo dword least signif bit (bit 0)
                                                     'This macro is run only once every 32 loops (256 bytes of output) so is little drag on speed
                                !mov eax, BigS0Ptr
                                !movq mm6, [eax]     ' a = s[0]
                                !mov edx, BigS1Ptr
                                !movq mm7, [edx]     ' b = s[1]
                                !movq [eax], mm7     ' s[0] = b
                                !movq mm0, mm6       ' mm0 = a
                                !psllq mm0, 23       ' a << 23
                                !movq mm1, mm6       ' mm1 = a
                                !pxor mm1, mm0       ' mm1 = a ^= a << 23
                                !movq mm0, mm1       ' copy mm1
                                !psrlq mm1, 18       ' a >> 18
                                !pxor mm0, mm1       ' a = ^= a >> 18
                                !pxor mm0, mm7       ' a = ^= b
                                !movq mm6, mm7       ' copy mm7 ie b
                                !psrlq mm7, 5        ' b >> 5
                                !pxor mm0, mm7       ' a ^= b >> 5
                                !mov eax, BigS1Ptr
                                !movq [eax], mm0     ' s[1] = a
                                !movq2dq xmm0, mm0   ' a
                                !movq2dq xmm1, mm6   ' b
                                !paddq xmm0, xmm1    ' a + b; xmm0 is now our 64 bit random number
                                !psrldq xmm0, 1      'shift lo byte out of xmm0 so 32 lo-bit substitutes can now be used from lo dword of xmm0. 1,2,3, or 4 bytes can be shifted out.
                                !movd ebx, xmm0      'store lo 32 rand bits in ebx to use one at a time for each of thirty-two 64-bit xorshift randoms
                          END MACRO
                          
                          FUNCTION PBMAIN () AS LONG
                          REGISTER ii AS LONG
                          LOCAL bigS0ptr,bigS1ptr,Lptr,LptrB AS LONG
                          LOCAL s0,s1,t AS QUAD
                          LOCAL lineo AS STRING
                          lineo = STRING$(%SAMPLEsIZE*8, 0)
                          DIM rArr(%SAMPLEsIZE-1) AS QUAD AT STRPTR(lineo) 'not really needed here, but is artifact from previous writing-to-disk tests
                             Lptr = VARPTR(rArr(0))
                             LptrB = Lptr
                             bigS0ptr = VARPTR(s0)
                             bigS1ptr = VARPTR(s1)
                             s0 = CVQ(CHR$(207,36,49,011,235,180,68,241))  'any 8 numbers for initial bytes
                             s1 = CVQ(CHR$(131,47,226,156,246,105,021,57)) '      "       "          "
                             TIX t
                             POKE QUAD, VARPTR(s0), t XOR s0  'shuffle up the starting quads some
                             SLEEP 113
                             TIX t
                             POKE QUAD, VARPTR(s1), t XOR s1  'shuffle the starting quads
                            
                             DO
                               FOR ii = 0 TO %SAMPLEsIZE-1
                                 IF (ii AND 31) = 0 THEN
                                   mGet32bits4LoBit
                                 END IF
                                !mov eax, BigS0Ptr
                                !movq mm6, [eax]     ' a = s[0]
                                !mov edx, BigS1Ptr
                                !movq mm7, [edx]     ' b = s[1]
                                !movq [eax], mm7     ' s[0] = b
                                !movq mm0, mm6       ' mm0 = a
                                !psllq mm0, 23       ' a << 23
                                !movq mm1, mm6       ' mm1 = a
                                !pxor mm1, mm0       ' mm1 = a ^= a << 23
                                !movq mm0, mm1       ' copy mm1
                                !psrlq mm1, 18       ' a >> 18
                                !pxor mm0, mm1       ' a = ^= a >> 18
                                !pxor mm0, mm7       ' a = ^= b
                                !movq mm6, mm7       ' copy mm7 ie b
                                !psrlq mm7, 5        ' b >> 5
                                !pxor mm0, mm7       ' a ^= b >> 5
                                !mov eax, BigS1Ptr
                                !movq [eax], mm0     ' s[1] = a
                                !movq2dq xmm0, mm0   ' a
                                !movq2dq xmm1, mm6   ' b
                                !mov eax, Lptr       'final 8 byte rand array store ptr
                                !paddq xmm0, xmm1    ' a + b; xmm0 is now our 64 bit random number
                                !movups [eax], xmm0  'store rand bytes in array
                                !shr ebx, 1
                                !setc cl
                                !xor byte ptr[eax], cl
                                !add eax, 8
                                !mov Lptr, eax
                              NEXT
                             !emms
                                                
                              STDOUT lineo;
                              Lptr = LptrB
                                                  
                              LOOP
                              
                          END FUNCTION
                          Here are some results:
                          Code:
                          C:\msvc12_32bit>xorshiftstdout512 | rng_test stdin
                          RNG = RNG_stdin, PractRand version 0.92, seed = 0x63bf972a
                          test set = normal, folding = standard(unknown format)
                          
                          rng=RNG_stdin, seed=0x63bf972a
                          length= 128 megabytes (2^27 bytes), time= 3.6 seconds
                            no anomalies in 151 test result(s)
                          
                          rng=RNG_stdin, seed=0x63bf972a
                          length= 256 megabytes (2^28 bytes), time= 8.1 seconds
                            no anomalies in 162 test result(s)
                          
                          rng=RNG_stdin, seed=0x63bf972a
                          length= 512 megabytes (2^29 bytes), time= 15.8 seconds
                            no anomalies in 171 test result(s)
                          
                          rng=RNG_stdin, seed=0x63bf972a
                          length= 1 gigabyte (2^30 bytes), time= 30.8 seconds
                            no anomalies in 183 test result(s)
                          
                          rng=RNG_stdin, seed=0x63bf972a
                          length= 2 gigabytes (2^31 bytes), time= 59.5 seconds
                            no anomalies in 194 test result(s)
                          
                          rng=RNG_stdin, seed=0x63bf972a
                          length= 4 gigabytes (2^32 bytes), time= 114 seconds
                            no anomalies in 203 test result(s)
                          
                          rng=RNG_stdin, seed=0x63bf972a
                          length= 8 gigabytes (2^33 bytes), time= 226 seconds
                            no anomalies in 215 test result(s)
                          rngPipeTestsXorshift.zip
                          Attached Files
                          Last edited by John Gleason; 22 Jul 2016, 03:52 PM. Reason: should be "t XOR s0" (and s1)

                          Comment


                          • #14
                            Haven't got time to try tonight - will have a look tomorrow.

                            226 seconds! You must be younger than me.

                            Comment


                            • #15
                              Woke up gasping for a cup of tea - could be worse.

                              My lowest bit replacement code may be a little different than yours, but the function I'm sure must be similar.
                              It is very different:

                              If parity of Qword is 1 then bit 0 is set else reset. The idea being bit 0 will have the same randomness as parity - ie random

                              Your app, John, is too much like hard work for me. Great idea.

                              PBCC6

                              Code:
                              #Compile Exe "My_RNG"
                              #Dim All
                              #Register None
                              
                              #Include "WIN32API.INC"
                              #Link "C:\PBWin\Xorshift\Xoroshiro128+.sll"
                              
                              Global Rand As IXoroshiro
                              
                              Function PBMain
                              Local i, j As Long
                              Local sData As String * 1024
                              Dim dwArr(1 To 256) As Dword At VarPtr( sData )
                              
                              Rand = Class "Xoroshiro"
                              
                              Rand.SystemSeed
                              
                              For j = 1 To 8*1024*1024 ' 8 Gigs
                                For i = 1 To 256
                                  dwArr(i) = Rand.DW
                                Next
                                con.stdout( sData ;)
                              Next
                              
                              End Function
                                          
                              Code:
                              f:\msvc12_32bit>My_RNG | RNG_test stdin
                              RNG = RNG_stdin, PractRand version 0.91, seed = 0x6147cb8b
                              test set = normal, folding = standard(unknown format)
                              
                              rng=RNG_stdin, seed=0x6147cb8b
                              length= 32 megabytes (2^25 bytes), time= 2.2 seconds
                                no anomalies in 130 test result(s)
                              
                              rng=RNG_stdin, seed=0x6147cb8b
                              length= 64 megabytes (2^26 bytes), time= 4.7 seconds
                                no anomalies in 139 test result(s)
                              
                              rng=RNG_stdin, seed=0x6147cb8b
                              length= 128 megabytes (2^27 bytes), time= 9.1 seconds
                                no anomalies in 151 test result(s)
                              
                              rng=RNG_stdin, seed=0x6147cb8b
                              length= 256 megabytes (2^28 bytes), time= 17.4 seconds
                                no anomalies in 162 test result(s)
                              
                              rng=RNG_stdin, seed=0x6147cb8b
                              length= 512 megabytes (2^29 bytes), time= 33.3 seconds
                                no anomalies in 171 test result(s)
                              
                              rng=RNG_stdin, seed=0x6147cb8b
                              length= 1 gigabyte (2^30 bytes), time= 65.0 seconds
                                no anomalies in 183 test result(s)
                              
                              rng=RNG_stdin, seed=0x6147cb8b
                              length= 2 gigabytes (2^31 bytes), time= 128 seconds
                                no anomalies in 194 test result(s)
                              
                              rng=RNG_stdin, seed=0x6147cb8b
                              length= 4 gigabytes (2^32 bytes), time= 251 seconds
                                no anomalies in 203 test result(s)
                              
                              rng=RNG_stdin, seed=0x6147cb8b
                              length= 8 gigabytes (2^33 bytes), time= 506 seconds
                                no anomalies in 215 test result(s)
                              Last edited by David Roberts; 22 Jul 2016, 08:16 PM.

                              Comment


                              • #16
                                CAVEAT: Best results are got if we restrict our requests to 32GB of data without reseeding. If you are in need of a lot of random numbers a safe bet would be to reseed every 16GB.
                                Last edited by David Roberts; 28 Jul 2016, 03:15 PM.

                                Comment

                                Working...
                                X