Note: Compilers 10 or 6 are required and OS >= Windows Vista
This is a follow on from Xorshift128+ PRNG.
The Xorshift pseudorandom number generator is a method invented by George Marsaglia. Marsaglia also invented the Multiply-with-carry generator (MWC). There were some issues with MWC and, over the years, Marsaglia addressed them resulting in lag-r Complementary-multiply-with-carry (CMWC) base 2^32-1 and that is the basis of my CMWC256.
There were some issues with Xorshift but Marsaglia left it to to others to take up the batten seen at Xorshift. The following implements the method described in the last link in the references and is called Xorshift128+. Since then, sometime in 2016, the author, Sebastiano Vigna, and David Blackman have introduced a variant called Xoroshiro128+. [3]
Earlier xorshift generators had issues but the "xorshift128+ generator that is currently the fastest full-period generator we are aware of that does not fail systematically any BigCrush test (not even reversed)..." [1]. Xoroshiro128+ equalled that and then gave "a significant improvement in statistical quality, as detected by the long-range tests of PractRand." [2].
With '%xorshift = -1', the first line of the following source code, the following code will compile to Xorshift128+.sll and with '%xorshift = 0' it will compile to Xoroshiro128.sll.
CMWC256 has a period of 2^8208 reflecting a lag-256. Smaller and larger periods can be created but a lag-256 takes, in my opinion, a comfortable time to seed the generator. User seeding takes about 1.45ms and system seeding takes about 7.5ms. The generators here take about 75µs for both seeding methods. Note that is micro-seconds.
Both xorshift methods have a period of 2^128-1. RND has a period of 2^32 and my machine can compute all singles in less than one minute. 2^128 may not seen much larger than RND but 2^128 = (2^32)^4.
2^128 is not easy for me to comprehend.
2^128 = 3.4 x 10^37
That I can comprehend and that is large.
In fact, it would take my machine, i7-3770K @ 3.5GHz, 4,829,532,708,142 X The age of the universe (at 29/05/2016 ) to complete the period.
"...purely linearly recurrent generators with a very large state space need a very long time to get from an initial state with a small number of ones to a state in which the ones are approximately half." [1]. We are not using a very large state space but, nonetheless, a 'warm up' period is required. 'Burning' 200 64 bit values is more than enough for our purposes. On my machine that takes about 10 nano-seconds. Not enough time to make a cup of tea. A 'warm up' is undertaken after any seeding.
Speed tests
Added: The syntax is the same as CMWC256. If you are not acquainted with that here is the meaning.
.S Single precision [0,1)
.SX Single precision [-1,1)
.D Double precision [0,1)
.DX Double precision [-1,1)
.R(a,b) Same as PowerBASIC
"It looks like Xoroshiro is the best general purpose algorithm currently available. Low memory (just 128 bits of storage), extremely high performance (1.2 nanoseconds per 64-bit number, after subtracting baseline overheads) and very well distributed (beating other algorithms on a range of automated tests)." [4].
I made a mistake in the opening link - I was comparing apples with oranges - and overestimated the speed of Xorshift128+. The above test used the same program for all three generators using their respective libraries. The figures represent an average of 20 runs.
The reason why CMWC256's double precision speed is much less than its single precision speed is because it is a 32 bit generator so we have to go past the 'engine' twice to get two dwords. The xorshift generators are 64 bit so one pass of the engine is enough. With the xorshift generators we go past the engine once for two single precision number requests; one dword is used and the other stored for the next request.
Xorshift128+ is over 26% faster than CMWC256 coming in, on my machine, at 186 million singles per second compared with RND at 83 million singles per second. Given that there is very little difference in speed between Xorshift128+ and Xoroshiro128+ and because of the latter's "significant improvement in statistical quality". [2] then if I were to use a xorshift generator it would be Xoroshiro128+.
There is another method: .DW. That generates Dwords - I used that to dump data to file for testing.
Seeding is done via .DefaultSeed, .UserSeed("[Your string here]") and .SystemSeed
If no seeding is used a default seed, the same as .DefaultSeed, is set up during the object's creation.
The user string, which may be an empty string, is digested with MD5 to populate the state. Although MD5 is no longer recommended for cryptographic purposes it is OK in our context. Any function will do which gives the same output for a given input. The fact that two different input strings may give the same output is irrelevant. MD5 produces 128 bits, just what we need for a 128 bit state.
.SystemSeed populates the state with random numbers from the Windows API BCryptGenRandom and why 'OS >= Windows Vista' is required. For 'OS < Windows Vista' then simply replace with CryptGenRandom which has been around since, at least, Windows 95.
For Xorshift128+ we use
and for Xoroshiro128+ we use
or whatever you wish to use for 'Rand'.
Here is a small PBCC program to show usage and the seeding methods.
If you do not have PBCC here is a typical output:
If an application required only a small number of random numbers then RND is more than adequate. It passes all Die Hard tests and gets good results using Fourmilab's ENT. I have no idea how well it performs using BigCrush but then a small number of random numbers would not exhibit any failings in quality anyway and the fact that its speed is less than the three generators tested here would be academic for a small number of random numbers required.
So, what is a small number of random numbers? I don't know.
The two xorshift libraries do not have as many methods as CMWC256 but they can be added if requested.
Both libraries are quite small: Xorshift128+.sll ( 8.61KB ), Xoroshiro128+.sll ( 8.99KB ).
The source code is in the following post.
Comments, if any, here.
Have fun.
CAVEAT: The following code is not thread safe; and nor is RND.
References
[1] Further scramblings of Marsaglia's xorshift generators ( pdf download )
[2] xoroshiro+ / xorshift* / xorshift+ generators and the PRNG shootout
[3] Xoroshiro128+
[4] Random number generators in Swift
This is a follow on from Xorshift128+ PRNG.
The Xorshift pseudorandom number generator is a method invented by George Marsaglia. Marsaglia also invented the Multiply-with-carry generator (MWC). There were some issues with MWC and, over the years, Marsaglia addressed them resulting in lag-r Complementary-multiply-with-carry (CMWC) base 2^32-1 and that is the basis of my CMWC256.
There were some issues with Xorshift but Marsaglia left it to to others to take up the batten seen at Xorshift. The following implements the method described in the last link in the references and is called Xorshift128+. Since then, sometime in 2016, the author, Sebastiano Vigna, and David Blackman have introduced a variant called Xoroshiro128+. [3]
Earlier xorshift generators had issues but the "xorshift128+ generator that is currently the fastest full-period generator we are aware of that does not fail systematically any BigCrush test (not even reversed)..." [1]. Xoroshiro128+ equalled that and then gave "a significant improvement in statistical quality, as detected by the long-range tests of PractRand." [2].
With '%xorshift = -1', the first line of the following source code, the following code will compile to Xorshift128+.sll and with '%xorshift = 0' it will compile to Xoroshiro128.sll.
CMWC256 has a period of 2^8208 reflecting a lag-256. Smaller and larger periods can be created but a lag-256 takes, in my opinion, a comfortable time to seed the generator. User seeding takes about 1.45ms and system seeding takes about 7.5ms. The generators here take about 75µs for both seeding methods. Note that is micro-seconds.
Both xorshift methods have a period of 2^128-1. RND has a period of 2^32 and my machine can compute all singles in less than one minute. 2^128 may not seen much larger than RND but 2^128 = (2^32)^4.
2^128 is not easy for me to comprehend.
2^128 = 3.4 x 10^37
That I can comprehend and that is large.
In fact, it would take my machine, i7-3770K @ 3.5GHz, 4,829,532,708,142 X The age of the universe (at 29/05/2016 ) to complete the period.
"...purely linearly recurrent generators with a very large state space need a very long time to get from an initial state with a small number of ones to a state in which the ones are approximately half." [1]. We are not using a very large state space but, nonetheless, a 'warm up' period is required. 'Burning' 200 64 bit values is more than enough for our purposes. On my machine that takes about 10 nano-seconds. Not enough time to make a cup of tea. A 'warm up' is undertaken after any seeding.
Speed tests
Code:
CMWC256 .S = 1.77 x RND .SX = 1.72 x RND, 1.97 x 2*RND-1 .D = 1.23 x RND .DX = 1.18 x RND .R(a,b) = 3.85 x RND(a,b) Xorshift128+ .S = 2.24 x RND .SX = 2.14 x RND, 2.45 x 2*RND-1 .D = 2.03 x RND .DX = 1.93 x RND .R(a,b) = 4.67 x RND(a,b) Xoroshiro128+ .S = 2.23 x RND .SX = 2.14 x RND, 2.45 x 2*RND-1 .D = 1.93 x RND .DX = 1.81 x RND .R(a,b) = 4.5 x RND(a,b)
.S Single precision [0,1)
.SX Single precision [-1,1)
.D Double precision [0,1)
.DX Double precision [-1,1)
.R(a,b) Same as PowerBASIC
"It looks like Xoroshiro is the best general purpose algorithm currently available. Low memory (just 128 bits of storage), extremely high performance (1.2 nanoseconds per 64-bit number, after subtracting baseline overheads) and very well distributed (beating other algorithms on a range of automated tests)." [4].
I made a mistake in the opening link - I was comparing apples with oranges - and overestimated the speed of Xorshift128+. The above test used the same program for all three generators using their respective libraries. The figures represent an average of 20 runs.
The reason why CMWC256's double precision speed is much less than its single precision speed is because it is a 32 bit generator so we have to go past the 'engine' twice to get two dwords. The xorshift generators are 64 bit so one pass of the engine is enough. With the xorshift generators we go past the engine once for two single precision number requests; one dword is used and the other stored for the next request.
Xorshift128+ is over 26% faster than CMWC256 coming in, on my machine, at 186 million singles per second compared with RND at 83 million singles per second. Given that there is very little difference in speed between Xorshift128+ and Xoroshiro128+ and because of the latter's "significant improvement in statistical quality". [2] then if I were to use a xorshift generator it would be Xoroshiro128+.
There is another method: .DW. That generates Dwords - I used that to dump data to file for testing.
Seeding is done via .DefaultSeed, .UserSeed("[Your string here]") and .SystemSeed
If no seeding is used a default seed, the same as .DefaultSeed, is set up during the object's creation.
The user string, which may be an empty string, is digested with MD5 to populate the state. Although MD5 is no longer recommended for cryptographic purposes it is OK in our context. Any function will do which gives the same output for a given input. The fact that two different input strings may give the same output is irrelevant. MD5 produces 128 bits, just what we need for a 128 bit state.
.SystemSeed populates the state with random numbers from the Windows API BCryptGenRandom and why 'OS >= Windows Vista' is required. For 'OS < Windows Vista' then simply replace with CryptGenRandom which has been around since, at least, Windows 95.
For Xorshift128+ we use
Code:
#Link "C:\PBWin\Xorshift\Xorshift128+.sll" ' Your path here Global Rand As IXorshift Function PBMain Rand = Class "Xorshift"
Code:
#Link "C:\PBWin\Xorshift\Xoroshiro128+.sll" ' Your path here Global Rand as IXoroshiro Functioin PBMain Rand = Class "Xoroshiro"
Here is a small PBCC program to show usage and the seeding methods.
Code:
[COLOR=0000C0]#Compile Exe #Dim All #Register None #Link [/COLOR][COLOR=C020C0]"C:\PBWin\Xorshift\Xoroshiro128+.sll" [/COLOR][COLOR=007F00]' Your path here '#Link "C:\PBWin\Xorshift\Xorshift128+.sll" ' Your path here [/COLOR] [COLOR=0000C0]Global [/COLOR][COLOR=000000]Rand [/COLOR][COLOR=0000C0]As [/COLOR][COLOR=000000]IXoroshiro [/COLOR] [COLOR=0000C0]Function PBMain As Long Local [/COLOR][COLOR=000000]i [/COLOR][COLOR=0000C0]As Long [/COLOR][COLOR=000000]Rand [/COLOR][COLOR=8000FF]= [/COLOR][COLOR=0000C0]Class [/COLOR][COLOR=C020C0]"Xoroshiro" [/COLOR][COLOR=0000C0]Print [/COLOR][COLOR=C020C0]"Default seed" [/COLOR][COLOR=0000C0]Print For [/COLOR][COLOR=000000]i [/COLOR][COLOR=8000FF]= [/COLOR][COLOR=000000]1 [/COLOR][COLOR=0000C0]To [/COLOR][COLOR=000000]5 [/COLOR][COLOR=0000C0]Print [/COLOR][COLOR=000000]Rand.S [/COLOR][COLOR=0000C0]Next Print Print [/COLOR][COLOR=C020C0]"User seed: PowerBASIC" [/COLOR][COLOR=0000C0]Print [/COLOR][COLOR=000000]Rand.UserSeed[/COLOR][COLOR=8000FF]([/COLOR][COLOR=C020C0]"PowerBASIC"[/COLOR][COLOR=8000FF]) [/COLOR][COLOR=0000C0]For [/COLOR][COLOR=000000]i [/COLOR][COLOR=8000FF]= [/COLOR][COLOR=000000]1 [/COLOR][COLOR=0000C0]To [/COLOR][COLOR=000000]5 [/COLOR][COLOR=0000C0]Print [/COLOR][COLOR=000000]Rand.S [/COLOR][COLOR=0000C0]Next Print Print [/COLOR][COLOR=C020C0]"System seed" [/COLOR][COLOR=0000C0]Print [/COLOR][COLOR=000000]Rand.SystemSeed [/COLOR][COLOR=0000C0]For [/COLOR][COLOR=000000]i [/COLOR][COLOR=8000FF]= [/COLOR][COLOR=000000]1 [/COLOR][COLOR=0000C0]To [/COLOR][COLOR=000000]5 [/COLOR][COLOR=0000C0]Print [/COLOR][COLOR=000000]Rand.S [/COLOR][COLOR=0000C0]Next Print Print [/COLOR][COLOR=C020C0]"User seed: PowerBASIC" [/COLOR][COLOR=0000C0]Print [/COLOR][COLOR=000000]Rand.UserSeed[/COLOR][COLOR=8000FF]([/COLOR][COLOR=C020C0]"PowerBASIC"[/COLOR][COLOR=8000FF]) [/COLOR][COLOR=0000C0]For [/COLOR][COLOR=000000]i [/COLOR][COLOR=8000FF]= [/COLOR][COLOR=000000]1 [/COLOR][COLOR=0000C0]To [/COLOR][COLOR=000000]5 [/COLOR][COLOR=0000C0]Print [/COLOR][COLOR=000000]Rand.S [/COLOR][COLOR=0000C0]Next Print Print [/COLOR][COLOR=C020C0]"Default seed" [/COLOR][COLOR=0000C0]Print [/COLOR][COLOR=000000]Rand.DefaultSeed [/COLOR][COLOR=0000C0]For [/COLOR][COLOR=000000]i [/COLOR][COLOR=8000FF]= [/COLOR][COLOR=000000]1 [/COLOR][COLOR=0000C0]To [/COLOR][COLOR=000000]5 [/COLOR][COLOR=0000C0]Print [/COLOR][COLOR=000000]Rand.S [/COLOR][COLOR=0000C0]Next WaitKey$ End Function[/COLOR]
Code:
Default seed .3211236 1.512289E-2 .2400187 .6792868 .2981217 User seed: PowerBASIC .6273323 .9427909 .6490459 .1646218 .9015744 System seed .9760034 .8830762 .6781559 .7170306 .5183147 User seed: PowerBASIC .6273323 .9427909 .6490459 .1646218 .9015744 Default seed .3211236 1.512289E-2 .2400187 .6792868 .2981217
So, what is a small number of random numbers? I don't know.
The two xorshift libraries do not have as many methods as CMWC256 but they can be added if requested.
Both libraries are quite small: Xorshift128+.sll ( 8.61KB ), Xoroshiro128+.sll ( 8.99KB ).
The source code is in the following post.
Comments, if any, here.
Have fun.
CAVEAT: The following code is not thread safe; and nor is RND.
References
[1] Further scramblings of Marsaglia's xorshift generators ( pdf download )
[2] xoroshiro+ / xorshift* / xorshift+ generators and the PRNG shootout
[3] Xoroshiro128+
[4] Random number generators in Swift
Comment