> once you have an algorithm that tests without statistical flaws
> once you've hit the ceiling, there's apparently no place higher to go.
As in 'beyond reasonable doubt'? That had its boundary moved when forensic science started playing around with DNA testing.
I believe that the inherent weakness of all PRNGs is that they are generated using continuous functions, which do not prohibit things like infinities and singularities.
The brakes are off if we use discrete functions, and for an abundance of them we need to venture into the quantum environment.
Constants for Random Numbers Generator

I agree with David. Statistically speaking, there is no such thing as an "ultimate final test" or proof of randomness.
Then too, don't forget John von Neumann's famous injunction, which Donald Knuth quotes at the outset of his great chapter on random numbers in Vol. 2 of "The Art of Computer Programming": "Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin."

>>... We can grade the doubt of randomness but we cannot grade randomness itself.
That's an interesting point, because once you have an algorithm that tests without statistical flaws, can any other algorithm surpass it? It seems you could reasonably base a further criterion on period length, saying that a flawless algo with a period of 2^256 is "better" than one with a period of 2^64. But other than that, once you've hit the ceiling, there's apparently no place higher to go.
If I had to make a linear scale of randomness-doubt confidence for various generators (especially if we include mechanical "true randomness" generators), with E being "I am virtually certain that data is not random" and A being "That data looks random to me," here is my take:
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEDCBA.
The data is a statistical disaster 99+% of the time. If you doubt this, create a few generators yourself and then test them. Prepare to be epically humbled.
There is one other characteristic, called divergence, that can distinguish PRNGs. Some generators, the Mersenne Twister in particular, can get into regions of very low change where they become statistically flawed. But if you avoid those areas (an easy task given a period of over 2^19000) that flaw won't appear.
Last edited by John Gleason; 11 Apr 2008, 09:04 AM.

I'd like to chip in here.
All the tests above are based upon our assuming that the data we are considering is random data.
It is the job of the tests to disprove our assumption, not prove it.
If the DieHard routines, for example, fail to disprove our assumption then we may conclude that no evidence has been found to cast doubt upon our assumption.
Given that a set of data is random then we may be able to say that it possesses, for example, characteristics x, y and z.
However, given that characteristics x, y and z exist does not prove randomness. If one or more of the characteristics does not exist then this will cast doubt on the randomness of the data.
If we fall into the body of a probability distribution then we are not able to draw any conclusions because that is where we expected to be. On the other hand, if we fall into a tail of a probability distribution then we may draw conclusions because that is where we are not expected to be.
If we have a process where there is some evidence that it is not random we could have another process where we have stronger evidence that it, in turn, is not random. What we cannot say is that the first process is more random than the second when we doubt that the first process is indeed random.
We can grade the doubt of randomness but we cannot grade randomness itself.
Last edited by David Roberts; 10 Apr 2008, 11:50 PM.
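The tail logic above can be illustrated in miniature. Outside the thread's PowerBASIC, here is a small Python sketch (not code from this thread): a chi-square test on simulated 4-bit values. A value in the body of the chi-square distribution fails to disprove the randomness assumption; only a far-tail value would cast doubt.

```python
import random

random.seed(7)
N = 16000
counts = [0] * 16
for _ in range(N):
    counts[random.getrandbits(4)] += 1  # 16 equally likely bins under the randomness assumption

expected = N / 16
chi2 = sum((c - expected) ** 2 / expected for c in counts)
# df = 15: values near 15 are the expected body; ~37.7 is the 99.9% tail point.
# Landing in the body proves nothing; only a far-tail value casts doubt.
print(chi2)
```

Note that a chi-square value suspiciously close to zero (the lower tail) is also evidence against randomness: the counts fit the expectation too well.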

>>or grade (as after an exam) to the "quality of randomness"
I see the idea now. Well, you have some code there as a starting point if you ever do need to limit the output characters.
As far as the grade goes, the final KS test is mostly pass/fail. It is very unforgiving and usually gives an E (.000000 or 1.000000) if there's even a tiny flaw in the algo. An overall result of e.g. .001045 is highly suspect (a D maybe), but will occur by chance ~1 in 1000 times (.998955 being its opposite). But e.g. .990842 or .010322 occur with seeming regularity and can easily be part of an A algo. It's not a "failure at the 99% confidence level" as you might hear in statistics reporting; in fact, a good algo must produce those values along with all the others in the linear distribution. Sort and graph your 269 summary p-values and they should form a straight line from 0 to 1. So should e.g. 100 overall p-values from 100 separate 269-p-value summaries. It's like a recursion: no matter how deep you go, the p-values should always be uniformly distributed.
If, however, you run the test 3 times in a row and get for the overall: .010332, .009233, .012322... D or E. It has a virtually guaranteed flaw. Though the algo must be extremely random to achieve even those D/E values, check that code carefully because some glitch will likely turn up.
Last edited by John Gleason; 10 Apr 2008, 02:31 PM.
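That final uniformity check can be sketched quickly in Python (not the thread's PowerBASIC): a one-sample Kolmogorov-Smirnov statistic measuring how far the sorted p-values stray from the straight 0-to-1 line. The 269 simulated values here are just stand-ins for real Diehard p-values.

```python
import random

def ks_uniform(pvals):
    """One-sample KS statistic of p-values against Uniform(0,1)."""
    xs = sorted(pvals)
    n = len(xs)
    # D = max vertical distance between the empirical CDF and the line y = x
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

random.seed(1)
pvals = [random.random() for _ in range(269)]  # stand-ins for Diehard's 269 p-values
d = ks_uniform(pvals)
print(d)
```

For genuinely uniform p-values with n = 269, D stays small; a large D (roughly, sqrt(n)*D beyond ~2) is the kind of result that earns a D or E grade above.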

Originally posted by John Gleason:
> >>the final p-values collected and examined for uniform distribution. Is this correct?
> Yes, that would be among the most stringent proofs for quality using the Diehard tests.
> A note about the A-E distribution you mentioned: using your 272MB file for example, 5 of every 256 bytes will be "A" thru "E" (CHR$(&h41) to CHR$(&h45)) or about 2% or 5.3MB of your random bytes...
I had in mind a response assigning a mark, or grade (as after an exam), to the "quality of randomness" of the sequence: A = very good ... E = very bad.
Regards
Aldo

>>the final p-values collected and examined for uniform distribution. Is this correct?
Yes, that would be among the most stringent proofs for quality using the Diehard tests.
A note about the A-E distribution you mentioned: using your 272MB file for example, 5 of every 256 bytes will be "A" thru "E" (CHR$(&h41) to CHR$(&h45)), or about 2% (5.3MB) of your random bytes. But you can get a much higher percentage return of those 5 characters by, for example, using the two nybbles of each byte to represent A thru E when the chosen (nybble AND 7) has a value of, e.g., 1 thru 5.
Here is an illustration showing this approach, which gets many times the output (86% of bytes succeed in producing at least one A-E result, 39% produce two):
Code:
#COMPILE EXE
#DIM ALL

FUNCTION PBMAIN () AS LONG
   LOCAL ii, x, a2e1, a2e2 AS LONG, rndBytes AS STRING
   FOR ii = 21 TO 40
      rndBytes = "gv´œÍHµ„¾à‰vá°FLþä#n®~ë[PÃyV‘%¥º9‘÷ávf:þUóÁav¬à[email protected]üIR”Wô°„Edé‡"
      x = ASC(rndBytes, ii)        'next byte from rand string
      a2e1 = x AND &b000000111     'low nybble AND 7
      a2e2 = x AND &b001110000     'hi nybble AND 7
      SHIFT RIGHT a2e2, 4          'move to low nybble (0-7)
      SELECT CASE a2e1
         CASE 1: ? "low A"
         CASE 2: ? "low B"
         CASE 3: ? "low C"
         CASE 4: ? "low D"
         CASE 5: ? "low E"
         CASE ELSE: ? "No low match this byte"
      END SELECT
      SELECT CASE a2e2
         CASE 1: ? "hi A"
         CASE 2: ? "hi B"
         CASE 3: ? "hi C"
         CASE 4: ? "hi D"
         CASE 5: ? "hi E"
         CASE ELSE: ? "No hi match this byte"
      END SELECT
   NEXT
END FUNCTION
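The 86%/39% figures follow directly from the byte arithmetic: each (nybble AND 7) is uniform over 0-7, so it lands in 1-5 with probability 5/8. A quick Python check over all 256 byte values (a sketch of the same nybble logic, not the PowerBASIC above) confirms the exact rates:

```python
# Count, over every possible byte value, how many of its two nybbles map to A-E
hits = []
for b in range(256):
    lo = b & 7          # low nybble AND 7
    hi = (b >> 4) & 7   # high nybble shifted down, AND 7
    hits.append((1 <= lo <= 5) + (1 <= hi <= 5))

at_least_one = sum(h >= 1 for h in hits) / 256  # 1 - (3/8)^2 = 55/64 = 0.859375
both = sum(h == 2 for h in hits) / 256          # (5/8)^2 = 25/64 = 0.390625
print(round(at_least_one * 100), round(both * 100))  # → 86 39
```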

Oops ... I counted definitely too many bits: it was a sequence of 272 Mbytes, so slightly more than 2e9 bits.
So, if I understand correctly, the ultimate final test is based on the uniform distribution of the p-values of the KS test made on the set of 269 p-values obtained from the individual tests. So the Diehard test should be repeated many times on different sequences of data produced by the same generator, and the final p-values collected and examined for uniform distribution.
Is this correct?

Originally posted by Aldo Vitagliano:
> Is there a practical way of converting the summary of 269 p-values given at the end of the report to an information such as: "From A to E, your sequence of 10^11 bits is A-class random"?
One other point: Diehard only tests the first ~275MB of your files, so you can split your 10^11 bits into many files of, say, 290MB to be safe, then test some of those to be sure you have a good algo. If you test, say, 10 files, your 10 overall p-values (each from applying the KS test to the 269 p-value totals) should be generally evenly distributed from 0 to 1.
Last edited by John Gleason; 10 Apr 2008, 08:34 AM. Reason: I noticed file size is in bits, not bytes.

Thanks again.
I downloaded the Diehard2 package at the link you provided. Then I generated two huge binary files using the gsl_rng_knuthran2 generator (+ version), with further shuffling of the output, and I submitted the files to the full set of tests.
From the little I understood of the meaning of the "p-values" everything should be OK, but I would like to get a slightly better understanding. Is there a practical way of converting the summary of 269 p-values given at the end of the report to an information such as: "From A to E, your sequence of 10^11 bits is A-class random"?

>>Do you have "ready to use" routines that you will be willing to share ?
Sure Aldo, the three I mainly use are Diehard (new version), RaBiGeTe, and a big Chi Square algo I wrote based on the Ent test.
here's the link to download RaBiGeTe:
http://www.webalice.it/cristiano.pi/rabigete/
I can only find links to the old version of Diehard on the net right now, e.g.
http://www.stat.fsu.edu/pub/diehardd/
so if you'd like the new version, I can email it to you.
Send me a private message with your email if you want me to send you the new Diehard and/or the big Chi^2 program.
Added: I found a link to the new version of Diehard:
http://i.cs.hku.hk/~diehard/index.html

Thank you John. I have adopted the (+) version, because I only want positive numbers, and I also apply a further "shuffling" method as described in Numerical Recipes, section 7.1.
You said you have run a lot of tests for randomness. Do you have "ready to use" routines that you would be willing to share?

I found the Knuth equation reference in The Laws of Cryptography with Java Code by Neal R. Wagner, p. 102, and it shows a subtraction, not an addition.
It's almost a certainty that both + and - are correct, because given that the randoms are full-range LONGs, they are split 50-50 positive/negative (well, maybe 1 extra negative in there), and it doesn't matter if you add a positive or subtract a negative. This postulate is borne out in the randomness tests, as both Knuth versions pass all tests I have, and that's a lot of tests.
There is still the issue, though, that you need to generate two Knuth randoms to get a full 32-bit random. If you need all 32 bits, this cuts its net speed by more than half. The Marsaglia randoms are 32-bit in one pass, have millions more 2^63 sequences, and also pass all the randomness tests.
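The "add a positive or subtract a negative" argument can be checked in miniature: negation is a permutation of the residues mod m, so the + and - variants draw from identically distributed values. A tiny Python illustration, with m = 256 standing in for the full 2^32 range:

```python
m = 256  # small stand-in modulus for 2^32
residues = range(m)
plus = sorted(x % m for x in residues)    # values contributed by adding x
minus = sorted(-x % m for x in residues)  # values contributed by subtracting x
print(plus == minus)  # → True: same multiset, so + and - give the same distribution
```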

and then of course you can always change generators as you wish
http://powerbasic.com/support/pbforu...ndom+generator

>>shouldn't there be a (+) sign rather than (-) here?
Hmmm... indeed it does look that way. I wonder if I got my formula from an older version (than 2002), or if it is just an error on my part, or if it matters, i.e. it works either way. Well, I've extensively tested the "-" algo and it passed all tests for randomness (like the Diehard and NIST tests), but it is a poser!
Well, until you or I can find out, here is another, simpler and faster one, and it's the one I use 99% of the time:
Code:
#COMPILE EXE
#DIM ALL

FUNCTION PBMAIN () AS LONG
   'George Marsaglia random # gen. Period ~ 2^63
   LOCAL mRnd AS QUAD
   LOCAL x0, x1, x2 AS DWORD
   LOCAL ii, x1Long AS LONG
   x0 = 1995844334   'you supply this seed: 0 < seed < 2087494968
   x1 = 3588392233   'you supply this seed: 0 < seed < 4294967296
   x2 = &h7C6CA538   'prime of a special kind (Sophie Germain). You can choose any of millions where both x2 * 2^32 - 1 and x2 * 2^31 - 1 are prime
   FOR ii = 1 TO 10
      mRnd = x2 * x1 + x0
      x0 = HI(DWORD, mRnd)
      x1 = LO(DWORD, mRnd)   '<< this is your random DWORD
      x1Long = x1            'you can also convert the DWORD to a LONG if needed
      ? STR$(x1) & $CRLF & STR$(x1Long)
   NEXT
   ? "k"
END FUNCTION
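The multiply-with-carry step in that generator is easy to mirror in Python for experimentation (same constants and seeds as the code above; a sketch of the recurrence, not a drop-in replacement). Each step computes t = a*x + c; the low 32 bits become the output and the high bits become the next carry:

```python
A = 0x7C6CA538                  # Sophie Germain-related multiplier from the code above
x, c = 3588392233, 1995844334   # seed values from the code above

out = []
for _ in range(10):
    t = A * x + c        # mRnd = x2 * x1 + x0
    c = t >> 32          # x0 = HI(DWORD, mRnd): the carry
    x = t & 0xFFFFFFFF   # x1 = LO(DWORD, mRnd): the random DWORD
    out.append(x)
print(len(out), all(0 <= v < 2**32 for v in out))
```

One nice MWC invariant visible here: as long as the starting carry is below the multiplier, every subsequent carry stays below it too, which is why the seed bound 0 < seed < 2087494968 (= &h7C6CA538) appears in the code.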

Thanks,
but ... according to the link you provided, shouldn't there be a (+) sign rather than a (-) here?
271828183 * x1 - 314159269 * x0

I'm afraid I haven't found my PDF with the info on the Knuth algo, but the below link describes it briefly. Look for gsl_rng_knuthran2 on the page:
http://www.gnu.org/software/gsl/manu...enerators.html

Aldo, I corrected the above code in this post: some of the variables should be QUADs. Also, in order to account for the 32nd bit that is always 0 in the algorithm, below is code (not particularly optimized, btw) which handles it.
>>1) I suppose that at the beginning it can be safely fed with any couple of LONG seeds. Is this true ?
Correct, doing that gives us potentially approx. 2^64 starting points in the sequence, but I believe the period is only 2^62 or 2^63.
>>2) Do you have references about it ?
I have a PDF of Knuth's paper describing the algorithm. I'll try to find it asap.
Code:
#COMPILE EXE
#DIM ALL

FUNCTION PBMAIN () AS LONG
   LOCAL x1, kRnd, kRnd2, x0 AS QUAD
   LOCAL ii, kFin AS LONG
   x0 = 1995844334   'you supply these two long seeds
   x1 = 1588392233   'you supply these two long seeds
   FOR ii = 1 TO 10
      kRnd = (271828183 * x1 - 314159269 * x0) MOD &h7fffffff
      x0 = x1
      x1 = kRnd
      kRnd2 = (271828183 * x1 - 314159269 * x0) MOD &h7fffffff
      x0 = x1
      x1 = kRnd2
      SHIFT LEFT kRnd2, 1          'move a random bit into the 32nd bit spot
      kRnd2 = kRnd2 AND &h80000000
      kFin = kRnd + kRnd2
      ? STR$(kFin)
   NEXT
   ? "k"
END FUNCTION
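The same two-lag recurrence is easy to sketch in Python for experimenting with the +/- question from earlier in the thread. Note this is an illustration, not a bit-exact port: Python's % always returns a non-negative result, whereas PB's MOD can go negative on negative operands.

```python
M = 2**31 - 1  # &h7fffffff

def knuthran2(x0, x1, n, sign=-1):
    """Two-lag Knuth-style recurrence; sign=-1 matches the code above, sign=+1 the GSL description."""
    out = []
    for _ in range(n):
        x0, x1 = x1, (271828183 * x1 + sign * 314159269 * x0) % M
        out.append(x1)
    return out

seq = knuthran2(1995844334, 1588392233, 10)
print(len(seq), all(0 <= v < M for v in seq))
```

Flipping sign to +1 gives the addition form; either way each output is a 31-bit value, which is why the code above needs a second draw to fill the 32nd bit.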

Thank you for your replies.
John's suggestion looks simple and would in principle be enough for me.
Two questions:
1) I suppose that at the beginning it can be safely fed with any pair of LONG seeds. Is this true?
2) Do you have references about it?