Announcement

Collapse
No announcement yet.

Pcg32 prng

Collapse
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Pcg32 prng

    *Really* minimal PCG32 code / (c) 2014 M.E. O'Neill / pcg-random.org
    Licensed under Apache License 2.0 (NO WARRANTY, etc. see website)

    PCG32.zip

    The unzipped dll is only 20.0KB.

    I could not port the source code from FreeBASIC to PowerBASIC because 64-bit unsigned integers are used and I don't want to mess about with 'workarounds'

    Originally posted by Lance Edmonds 2 May 2001
    64-bit unsigned variables are on the wish list already. I'll ask for a tick in your name to be added. Thanks!
    Not all ticks came to much.

    PCG32 is a 64-bit generator, that is with a period of 2^64, with a 32-bit output. Besides a seed input it has a sequence input both of which are 64-bit unsigned integers. PB cannot handle 64-bit unsigned integers so seed and sequence are input as Quads. Warning: Stick to 0 to 2^63-1 like glue, you may not get what you want otherwise.

    It is a quality generator and the FreeBASIC version passes PractRand to 1TB. The author and others have pushed it to 16TB with the original C code. I have yet to test the dll but, in the meantime, ran John Walker's ENT as I always do. The services of PractRand are not called upon on an ENT failure.

    Here is an ENT output:
    Code:
    E:\ENT>ent "100MB.dat"
    Entropy = 7.999998 bits per byte.
    
    Optimum compression would reduce the size
    of this 104857600 byte file by 0 percent.
    
    Chi square distribution for 104857600 samples is 261.08, and randomly
    would exceed this value 38.35 percent of the times.
    
    Arithmetic mean value of data bytes is 127.4956 (127.5 = random).
    Monte Carlo value for Pi is 3.141687360 (error 0.00 percent).
    Serial correlation coefficient is 0.000031 (totally uncorrelated = 0.0).
    No problems there.

    Sequence is converted internally to 'Sequence OR 1' as only odd sequences are allowed. Sequencing is not that common with PRNGs. A given seed can have a given sequence so for every seed value we can have upto 2^63 different sequences. With Rnd and most generators we have just the one sequence.

    At the moment we have three procedures: PCG32DW, RandomizePCG32, PCG32SE and PCG32R

    PCG32DW

    This simply outputs random Dwords. I doubt that there will be much call for this, it was written for PractRand testing and left in the code.

    RandomizePCG32( Seed, Sequence )

    We must use RandomizePCG32, there is no default state, I am working on that.

    We could have RandomizePCG32( 123456, 654321 ) and this is analogous to 'Randomize n' generating the same random numbers on each invocation.

    Randomize( 0, 654321) will deliver a random seed using CryptGenRandom. FB has not 'cottoned onto' BCryptGenRandom yet.

    Randomize( 123456, 0 ) will deliver a random sequence.

    Randomize ( 0, 0 ) will deliver both a random seed and a random sequence analogous to Randomize Timer, well not quite. Internally, a 64-bit unsigned integer is clipped to 2^63-1. With 2^63 sequences, that is over 9 million trillion, the chances are that we will be using a sequence that we have never used before and unlikely to ever use again.

    PCG32SE

    This outputs a Double random number with a granularity of 32 bits. The SE stands for 'Single extended', Single not as in Single precision but as only one 32-bit output is used. At this stage I am not providing 53-bit granularity which would require two 32-bit outputs. Had I done so I would have called it PCG32D.

    PCG32R( First, Last )

    This is analogous to RND(a, b)

    The following is a test bed. I am using MultipleTimersLite.inc. On one run I got:
    Code:
    DW: 274MHz
    SE: 174MHz
    Range: 177MHz
    RND:  82MHz
    So, SE is over twice the speed of RND. This is not as fast as CryptoRndII which uses a lot more memory via two buffers and streams random numbers into the buffers.

    Testbed.bas
    Code:
    #Compile Exe
    #Dim All
    #Include "Win32API.inc"
    #Include "C:\PBWin\Timers\MultipleTimersLite.inc"
    
    Declare Sub RandomizePCG32 lib "PCG32.dll" Alias "RandomizePCG32" ( Byval Seed As Quad, Byval Sequence As Quad )
    Declare Function PCG32DW Lib "PCG32.dll" Alias "PCG32DW" As Dword
    Declare Function PCG32SE Lib "PCG32.dll" Alias "PCG32SE" As Double
    Declare Function PCG32R Lib "PCG32.dll" Alias "PCG32R" ( Byval First As Long, Byval Last As Long ) As Long
    
    Function PBMain
    Local i, n as long
    Local tot as double
    
    InitializeTimers
    
    ' Allow random Seed and Sequence.
    RandomizePCG32( 0, 0 ) ' We must use RandomizePCG32, there is no default state.
    
    ' Knock out 10 DWORDs
    Print "10 DWORDS" : Print
    For i = 1 To 10
      Print PCG32DW
    Next
    Print
    ' Knock out 10 singles
    Print "10 SE" : Print
    For i = 1 To 10
      Print PCG32SE
    Next
    Print
    ' Knock out 10 range
    Print "10 Range" : Print
    For i = 1 To 10
      Print PCG32R(0,255)
    Next
    Print
    
    ' Knock out average of 10^8 DWORDs
    For i = 1 To 10^8
      tot += PCG32DW
    Next
    tot = tot/10^8
    Print "Average of 10^8 DWORDs: ";tot
    
    Print
    tot = 0
    ' Knock out average of 10^8 singles
    For i = 1 To 10^8
      tot += PCG32SE
    Next
    tot = tot/10^8
    Print "Average of 10^8 SE: ";tot
    
    Print
    tot = 0
    ' Knock out average of 10^8 range (0,100)
    For i = 1 To 10^8
      tot += PCG32R(0,100)
    Next
    tot = tot/10^8
    Print "Average of 10^8 range (0,100): ";tot
    
    Print
    Print "Estimate of 'raw' throughput (Millions per second)"
    Print
    
    StartTimer(0)
    For i = 1 To 10^8
      PCG32DW
    Next
    StopTimer(0)
    Print "DW: ";Using$( "###", int(100000/nTimetaken(0)) );"MHz
    
    StartTimer(0)
    For i = 1 To 10^8
      PCG32SE
    Next
    StopTimer(0)
    Print "SE: ";Using$( "###", int(100000/nTimetaken(0)) );"MHz"
    
    StartTimer(0)
    For i = 1 To 10^8
      PCG32R(0,100)
    Next
    StopTimer(0)
    Print "Range: ";Using$( "###", int(100000/nTimeTaken(0)) );"MHz"
    
    StartTimer(0)
    For i = 1 To 10^8
      tot = rnd
    Next
    StopTimer(0)
    Print "RND: ";Using$( "###", int(100000/nTimeTaken(0)) );"MHz"
    
    Waitkey$
    
    End Function

  • #2
    Thank you very much for all your work.
    Pcg32.prng looks to have a nice balance of everything.

    Comment


    • #3
      Thanks, Dean.

      In testbed.bas the statements
      Code:
      ' Allow random Seed and Sequence.
      RandomizePCG32( 0, 0 ) ' We must use RandomizePCG32, there is no default state.
      can now be removed. The default is RandomizePCG32( 0, 0 ).

      We can, of course, invoke RandomizePCG32 at anytime, with RandomizePCG32( 123456, 654321 ) for example.

      Oh, I forgot to mention that there is no need to warm up the engine, 200 executions of PCG32DW are made at the end of the RandomizePCG32 procedure.

      I should also have mentioned that "C:\PBWin\Timers\MultipleTimersLite.inc" needs editing to wherever MultipleTimersLite.inc is located. It is another way to get folk to use that inc file, I get commission for every copy.

      Comment


      • #4
        I finally got PractRand to test PCG32.dll, I have never tested a dll before.

        Here is a 1TB test, as you can see we have a small anomaly at 8GB. I have never seen a clean sweep to 1TB and not seen many with only one small anomaly, even Intel's RdRand will throw up two or three in a 1TB test.

        So, PCG32.dll passes with flying colours.

        If you ever use PractRand you may not be aware of the multithreaded switch. On my Intel i7 I get a 25% boost and with the 1TB test itself taking nearly 75 minutes it is a welcome improvement.

        Code:
        E:\PR64>PCG_Rng | Rng_test stdin32 -multithreaded
        RNG_test using PractRand version 0.94
        RNG = RNG_stdin32, seed = unknown
        test set = core, folding = standard (32 bit)
        
        rng=RNG_stdin32, seed=unknown
        length= 512 megabytes (2^29 bytes), time= 2.5 seconds
          no anomalies in 178 test result(s)
        
        rng=RNG_stdin32, seed=unknown
        length= 1 gigabyte (2^30 bytes), time= 4.9 seconds
          no anomalies in 192 test result(s)
        
        rng=RNG_stdin32, seed=unknown
        length= 2 gigabytes (2^31 bytes), time= 9.4 seconds
          no anomalies in 204 test result(s)
        
        rng=RNG_stdin32, seed=unknown
        length= 4 gigabytes (2^32 bytes), time= 18.0 seconds
          no anomalies in 216 test result(s)
        
        rng=RNG_stdin32, seed=unknown
        length= 8 gigabytes (2^33 bytes), time= 35.9 seconds
          Test Name                         Raw       Processed     Evaluation
          [Low1/32]BCFN(2+4,13-3,T)         R=  -8.2  p =1-1.0e-4   unusual
          ...and 228 test result(s) without anomalies
        
        rng=RNG_stdin32, seed=unknown
        length= 16 gigabytes (2^34 bytes), time= 71.2 seconds
          no anomalies in 240 test result(s)
        
        rng=RNG_stdin32, seed=unknown
        length= 32 gigabytes (2^35 bytes), time= 137 seconds
          no anomalies in 251 test result(s)
        
        rng=RNG_stdin32, seed=unknown
        length= 64 gigabytes (2^36 bytes), time= 281 seconds
          no anomalies in 263 test result(s)
        
        rng=RNG_stdin32, seed=unknown
        length= 128 gigabytes (2^37 bytes), time= 558 seconds
          no anomalies in 273 test result(s)
        
        rng=RNG_stdin32, seed=unknown
        length= 256 gigabytes (2^38 bytes), time= 1083 seconds
          no anomalies in 284 test result(s)
        
        rng=RNG_stdin32, seed=unknown
        length= 512 gigabytes (2^39 bytes), time= 2240 seconds
          no anomalies in 295 test result(s)
        
        rng=RNG_stdin32, seed=unknown
        length= 1 terabyte (2^40 bytes), time= 4490 seconds
          no anomalies in 304 test result(s)

        Comment


        • #5
          Snapshot

          Two new functions: GetSnapshot; SetSnapshot

          GetSnapshot makes a copy of the current state vector.

          SetSnapshot returns the state vector to that taken at the last GetSnapshot.

          Just after the default RandomizePCG32 is executed a GetSnapshot is invoked.

          Latest version uploaded.

          Usage example:
          Code:
          #Compile Exe
          #Dim All
          #Include "Win32API.inc"
          
          Declare Function PCG32SE Lib "PCG32.dll" Alias "PCG32SE" As Double
          Declare Sub GetSnapshot Lib "PCG32.dll" Alias "GetSnapshot"
          Declare Sub SetSnapShot Lib "PCG32.dll" Alias "SetSnapshot"
          
          Function PBMain
          Local i As Long
          
          For i = 1 To 6
            Print PCG32SE
          Next
          Print
          
          SetSnapshot ' Back to start
          
          For i = 1 To 6
            Print PCG32SE
          Next
          Print
          
          For i = 1 To 1000
           PCG32SE
          Next
          
          GetSnapshot ' [1]
          
          For i = 1 To 6
            Print PCG32SE
          Next
          Print
          
          SetSnapshot ' Back to [1]
          
          For i = 1 To 6
            Print PCG32SE
          Next
          
          WaitKey$
          
          End Function

          Comment


          • #6
            I have been asked if it would be much of an effort to add a double with 53-bit granularity as opposed to 32-bit granularity. The reason for the request didn't seem to me to be particularly strong but since it was no effort it has been added, and called PCG32D. The throughput has not been halved, using two Dwords, as we call the engine twice from one function call. I would be interested what arguments others may have for using 53-bit granularity, my imagination is leaving me a little wanting.

            I have also included a floating-point range, called PCG32FR. Of course, we could write a function to do that using RND or a PCG32 value but that would probably slow things down. Having said that PCG32FR is not as fast as expected.

            The new declarations are:
            Code:
            Declare Function PCG32D Lib "PCG32.dll" Alias "PCG32D" As Double
            Declare Function PCG32FR Lib "PCG32.dll" Alias "PCG32FR" ( Byval First As Double, Byval Last As Double ) As Double
            The engine had 'Sequence Or 1' to ensure odd but this has been changed to 'Sequence' as the 'Or 1' is taken care of in RandomizePCG32 - I had forgotten to remove 'Or 1' in the engine. The throughput only gained 3% but is better than a poke in the eye with a sharp stick.

            I won't include another testbed but my results are as follows.
            Code:
            Estimate of 'raw' throughput (Millions per second)
            
            DW: 282MHz
            SE: 183MHz
            D: 111MHz
            Range: 180MHz
            Float Range: 76MHz
            RND: 82MHz
            As you can see 'D' is down but it is still 35% faster than RND. Float Range is disappointing but it is not exactly slow.

            Latest version uploaded.

            The dll is now a whopping 21KB!
            Last edited by David Roberts; 7 Apr 2019, 06:19 PM. Reason: Used RND(a,b) when I meant RND

            Comment


            • #7
              I have managed to get PCG32FR from 76MHz to 110MHz, just short of a 45% increase. I am using an optimizing compiler and it is almost impossible to second guess what it gets up to even with our own asm code. If ever I use asm in a procedure I always write two versions, one with asm and the other with wholly BASIC. Not always but sometimes the wholly BASIC is faster. Time does not permit but one day I will disassemble the BASIC native code to see what the compiler is up to.

              Latest version uploaded.

              Comment

              Working...
              X