Announcement

Collapse
No announcement yet.

Discussion: CMWC with lag PRNG

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Discussion: CMWC with lag PRNG

    I have two comments of my own.

    Why do we have 'if(x<c){x++;c++;}'?

    I was going to contact Professor Marsaglia but on searching for an email address I learnt that he died on 15 February 2011 aged 86. Only two years before that he made a call for multipliers for lags of 8192, 16384, 32768 or higher. What a guy!

    I can muddle through an attempt to use assembler in the section ' { Assembler here would be nice' but there are others here far better qualified than myself. Any takers - Mr Gleason?
    David Roberts
    Member
    Last edited by David Roberts; 2 May 2012, 04:03 PM.

  • #2
    "Why do we have 'if(x<c){x++;c++;}'?"

    I may have the answer. See post #2 here.

    Comment


    • #3
      IIRC, a few years ago I looked at code similar to this--or indeed this same code--where some programmers had asked George M. about very large period RNG's. I tried his code out just "as is" and remember it seemed fast and was good statistically.

      If you think it might need some asm where noted, I'll check the code and be happy to try a few things to see if I can get an improvement. I do have some kind of affinity for these RNG's.

      Comment


      • #4
        You may also be interested in this verry comprehensive test suite I came across a while back. It takes a long time to run, but has caught subtle flaws no other tests did. RNG tests

        Comment


        • #5
          Thanks John.

          Comment


          • #6
            The asm I show below should match your results exactly David, except for c=(tQ>>32) which is treated as effectively HI(DWORD, tQ). I think that's the proper conversion, but it is just slightly different than t \ &H0FFFFFFFF. Because the asm uses the equivalent of an integer unsigned quad (tQ) and avoids division and some other floating point calcs, it's likely faster now.

            tQ as a QUAD variable could be removed entirely, but I left it in for reference clarity in the asm below.

            It's possible the code may not quite compile because I wasn't able to test it all together on my laptop, but it is likely just a minor glitch if so.
            Code:
            #COMPILE EXE
            #DIM ALL
            
            #INCLUDE "WIN32API.INC"
            
            GLOBAL hProv AS DWORD
            
            FUNCTION PBMAIN() AS LONG
            LOCAL i, n AS LONG
            LOCAL temp AS DWORD
            LOCAL msg AS STRING
            
              hProv = GetCSPHandle
              IF hProv = 0 THEN
                MessageBox 0, "Failed to get Cryptographic Service Provider handle" + $CRLF + _
                "Probable cause: Operating system < XP SP3", "CMWC", %MB_IconError OR %MB_ApplModal OR %MB_Topmost
                EXIT FUNCTION
              END IF
            
              CMWC(4096) ' Initialize generator ' takes 0.824ms On my machine
            
              n = 20
            '  n = 15*1024*1024\4
            '  Open "15MB04096.txt" For Binary As #1
              FOR i = 1 TO n
                msg = msg + STR$(CMWC) + $CRLF
            '   temp = CMWC()
            '   Put #1, , temp
              NEXT
            '  Close #1
            
              MSGBOX msg
            
            '  MsgBox "Done"
            
              IF hProv THEN CryptReleaseContext hProv, 0
            
            END FUNCTION
            
            FUNCTION CMWC( OPT OptionalLag AS LONG ) AS DWORD
            
            DIM Q() AS STATIC DWORD ' Random x seeds
            STATIC i, j, Multiplier, BeenHere AS LONG
            STATIC cary AS DWORD ' Random And < Multiplier
            REGISTER x AS DWORD ' MWC Random number generated
            REGISTER t AS EXT
            LOCAL tQ AS QUAD
            
              IF ISFALSE BeenHere THEN ' Initialize Q() And carry
            
                LOCAL Lag AS LONG
                IF IsMissing(OptionalLag) THEN
                  Lag = 256
                ELSE
                  Lag = OptionalLag
                END IF
            
                i = Lag -1
                j = i
            
                REDIM Q(0 TO i) AS STATIC DWORD
                CryptGenRandom( hProv, 4 * Lag, VARPTR( Q(0) ) ) ' Random x seeds
                LOCAL k AS LONG
                FOR k = 0 TO i
                  IF Q(k) = 4294967295 THEN DECR Q(k)
                  ' x seeds should be less than 2^32-1
                NEXT
            
                SELECT CASE AS LONG Lag
                  CASE 4
                    Multiplier = 987654366 ' 2^157
                  CASE 8
                    Multiplier = 987651386 ' 2^285
                  CASE 16
                    multiplier = 987765178 ' 2^541
                  CASE 32
                    Multiplier = 987655670 ' 2^1053
                  CASE 64
                    Multiplier = 987651206 ' 2^2077
                  CASE 128
                    Multiplier = 987688302 ' 2^4125
                  CASE 256
                    Multiplier = 987662290 ' 2^8221
                  CASE 512
                    Multiplier = 123462658 ' 2^16410
                  CASE 1024
                    Multiplier = 5555698   ' 2^32790
                  CASE 2048
                    Multiplier = 1030770   ' 2^65559
                  CASE 4096
                    Multiplier = 18782     ' 2^131086
                  CASE ELSE
                    MSGBOX "Invalid lag given." + $CRLF + $CRLF + "A 256 lag will be used.", %MB_Ok, "CMWC"
                    Multiplier = 987662290 ' 2^8221
                END SELECT
            
                CryptGenRandom(hProv, 4, VARPTR(cary))
                cary = cary * (Multiplier-1)/4294967295 ' Make cary < Multiplier
                BeenHere = %True
            
              END IF
            
              ' { Assembler here would be nice
            '  i = ( i + 1 ) AND j ' Cycle around j, j = 4095
              !mov eax, i
              !mov ecx, j
              !add eax, 1     ;i + 1
              !and eax, ecx   ;AND j
              !mov i, eax
            
              x = Q(i)
              !mov eax, x
              !mov edx, Multiplier
              !mul edx        ;Multiplier * Q(i)
              !mov ecx, cary
              !add eax, ecx   ;+ cary
              !adc edx, 0     ;in case there is a carry of 1, add it.
              !mov tQ,  eax   ;lo DWORD of tQ
              !mov tQ[4], edx ;hi DWORD. quad now complete: tQ = Multiplier * Q(i) + cary
            
              !mov eax, tQ[4] ;cary = hi(dword, tQ), same as c=(tQ>>32)
              !mov cary, eax  ;store cary
              !mov ecx, tQ    ;lo(dword, tQ) to ecx,
              !add ecx, eax   ;x = tQ + cary
                              ' The CMWC aspect (Base-1) - x
              !mov edx, &H0FFFFFFFE
              !sub edx, ecx   ;edx = &H0FFFFFFFE - x
              !mov x,   ecx   ;store in x
              Q(i) = x
            
              FUNCTION = x 'Q(i)
              '^^this asm generally does the below code, except c=(tQ>>32) which is treated as just HI(DWORD, tQ)
            
            '  i = ( i + 1 ) And j ' Cycle around j
            '  t = Multiplier * Q(i) + cary
            '  cary = t \ &H0FFFFFFFF ' Base = 2^32-1
            ''  x = t + cary
            '  x = t Mod &H0FFFFFFFF
            ''  If ( x < cary ) Then ' See next post why commented out
            ''    Incr x : Incr cary
            ''  End If
            '  Q(i) = &H0FFFFFFFE - x ' The CMWC aspect (Base-1) - x
            '  ' }
            '
            '  ' Q(i) = Q(i) * 1.000000000232831##  ; used when creating test files
            '
            '  Function = Q(i)
            
            END FUNCTION
            
            ' ***************************************************
            
            FUNCTION GetCSPHandle AS LONG
              ' We have a failure If the Function returns %False
              LOCAL hCSP AS DWORD
            
              IF CryptAcquireContext( hCSP, BYVAL %Null, BYVAL %Null, %PROV_RSA_AES, 0 ) = %False THEN
                ' Unable To Find a Default For %PROV_RSA_AES so force the issue first For XP
                IF CryptAcquireContext( hCSP, BYVAL %Null, $MS_ENH_RSA_AES_PROV_XP_A, %PROV_RSA_AES, %CRYPT_VERIFYCONTEXT ) = %False THEN
                  ' Still failed so force the issue For Vista And Win7
                  IF CryptAcquireContext( hCSP, BYVAL %Null, $MS_ENH_RSA_AES_PROV_A, %PROV_RSA_AES, %CRYPT_VERIFYCONTEXT ) <> %False THEN
                    FUNCTION = hCSP
                  END IF
                ELSE
                  FUNCTION = hCSP
                END IF
              ELSE ' found Default
                FUNCTION = hCSP
              END IF
            
            END FUNCTION
            
            ' ***************************************************

            Comment


            • #7
              From the Source Code forum

              On my machine I am getting about 10 million dwords a second, irrespective of which lag is used - a bit of assembler would help.
              From John Gleason

              it's likely faster now.
              Well, I can confirm that it is faster now - as in 45 MILLION DWORDS A SECOND! That is about 40% faster than PB's RND and with a period of 2^131086, for the lag~4096, as opposed to PB's 2^32.

              Using c=(tQ>>32) puts the 'if(x<c){x++;c++;}' mopping up code back into the frame or, at least, something along those lines. I need to prove its truth or come up with my own mopping up code.

              Thank you, John. We have further evidence that with a 'crunching' app a bit of PB's inline assembly language can elevate it from the stratosphere to interstellar space.

              With that kind of speed I have another idea to play with - I'll be back in the Source code forum in a few days.

              Comment


              • #8
                In the Source Code forum it is shown that x = t MOD (2^32-1) = t + c

                after taking 2^32*c = 0 because c is 32 bit.

                That last line could have been written better. 2^32*c = 0 because x is 32 bit ie the high dword of t + c is ignored.

                However, we will have an overflow if

                t MOD (2^32) + c > 2^32-1

                in which case we need to increment x.

                When an overflow occurs x < c.

                This explains the first part of Marsaglia's 'if(x<c){x++;c++;}'

                As for the second part I think we simply say if x is incremented then for x = t + c to hold we must increment c as well.

                I considered dozens of tests and found that c did need incrementing when an overflow occurred.

                It would appear then that Marsaglia's 'if(x<c){x++;c++;}' is correct.

                There is a scenario where this method fails and that is when c should be incremented and an overflow would occur if it was. Since it isn't then neither x or c will be incrmented.

                It is worth noting, however, that the chance of an overflow is very rare for a lag~4096 since there only 18781 values of c that we can add to the low dword of t which can take 2^32-1 values so missing an overflow because c is under calculated will be rarer still.

                The situation is different for a lag~2 as c can now take 987654366 values but missing an overflow should still be a rare event. I will give that some further thought.

                This may be a good reason to use high lag values as they have the smaller multipliers and so smaller values that c can max out to. Having said that I did a few tests with CMWCSeed which uses a lag~256 and found the results to be OK.

                We don't have to consider 'if x < c ...'. In John's code above we can delay storing cary and then jump over incrementing x and c if the carry flag is not set on '!Add ecx, eax ;x = tQ + cary'
                Code:
                  !mov eax, tQ[4] ;cary = Hi(Dword, tQ), same As c=(tQ>>32)
                  '  !mov cary, eax  ;store cary
                  !mov ecx, tQ    ;Lo(Dword, tQ) To ecx,
                  !Add ecx, eax   ;x = tQ + cary
                                  ' The CMWC aspect (Base-1) - x
                  !jnc NoOverFlow
                  !inc ecx        ; inc x
                  !inc eax        ; inc cary
                 
                NoOverFlow:
                  !mov cary, eax  ;store cary
                  !mov edx, &H0FFFFFFFE
                There is another tweak to John's code which gives a liitle extra speed. The 'engine' is embraced between x = Q(i) and Q(i) = x. If we declare a Static q0Address As Byte Ptr and equate q0Address with VarPtr( Q(0) ) in the initialization then we can use q0Address in assembler to collect and store Q(i)).

                I am now getting a little over 48 million dwords a second.

                Assembler versions of CMWC and CMWCSeed have been posted in the Source Code forum here.
                David Roberts
                Member
                Last edited by David Roberts; 9 May 2012, 10:13 AM. Reason: Added link to Source Code forum.

                Comment


                • #9
                  Couple of tips:

                  1) After

                  CMWC(4096) ' Initialize generator

                  which takes 0.824ms on my machine regardless of which lag~r is used.

                  add

                  For i = 1 To 100000 : CMWC : Next ' takes 1.915ms on my machine

                  This will 'warm up' the generator and put us some distance from the initial values.

                  2) Let your machine run a while to let CryptGenRandom's entropy pool build up; unless you are using something else to generate the initial values.

                  Comment


                  • #10
                    Found a revision to Marsaglia's 'if(x<c){x++;c++;}' namely 'if(x<cary || x+1==0) {x++;cary++;}' in a 2009 post of his.

                    t MOD (2^32-1) cannot give a remainder of 2^32-1 hence our output range is [0, 2^32-2].

                    However, with x = t MOD (2^32-1) = t + c we could get t MOD (2^32) + c = 2^32-1; the boundary just before an overflow.

                    Another rare event but should be acted upon, nonetheless.

                    Perhaps I should have spotted it but it took me long enough to understand the first condition and Marsaglia took a while to introduce it himself. I cannot see anyone else incorporating either.

                    Source Code here updated.

                    Comment


                    • #11
                      I changed the function shown below to 1) fill the entire array in one call, and 2) work the asm to account for the 'if(x<cary || x+1==0) {x++;cary++;}' while hopefully being efficient.

                      The bigger improvement should be the whole-array-fill I would think, because it avoids thousands of function calls per array fill.

                      My old machine can't handle the crypt functions, so I just put some real poor initializing code in there so it will compile.
                      Code:
                      #COMPILE EXE
                      #DIM ALL
                      
                      #INCLUDE "WIN32API.INC"
                      
                      GLOBAL Q() AS DWORD 'each element will be a Random x
                      
                      FUNCTION PBMAIN() AS LONG
                      LOCAL i, n AS LONG, t AS QUAD
                      
                        n = 16384           ' 1/4 gig of random bytes
                        CMWCfilledArr(4096) ' Initialize generator
                      
                        TIX t
                        FOR i = 1 TO n
                            cmwcFilledArr() ' array gets entirely replaced each loop
                        NEXT
                        TIX END t
                      
                        MSGBOX STR$(t / 16384^2, 3) & " tix per random byte"
                      
                      END FUNCTION
                      
                      
                      FUNCTION CMWCfilledArr( OPT OptionalLag AS LONG ) AS DWORD
                      
                      STATIC i, j, Multiplier, BeenHere AS LONG
                      STATIC cary AS DWORD ' Random And < Multiplier
                      STATIC q0Address AS BYTE PTR
                      REGISTER x AS DWORD  ' MWC Random number generated
                      REGISTER ii AS LONG
                      
                        IF ISFALSE BeenHere THEN ' Initialize Q() And carry
                      
                          LOCAL Lag AS LONG
                          IF ISMISSING(OptionalLag) THEN
                            Lag = 256
                          ELSE
                            Lag = OptionalLag
                          END IF
                      
                          i = Lag -1
                          j = i
                      
                          DIM Q(0 TO i) AS GLOBAL DWORD
                          q0Address = VARPTR( Q(0) )
                      '    CryptGenRandom( hProv, 4 * Lag, VarPtr( Q(0) ) )  ' Random x seeds
                          LOCAL k AS LONG
                          FOR k = 0 TO i
                             Q(k) = getTickCount * (k + 19) AND &h0ffffffff  ' temporary, poor stand-in for CryptGenRandom--allows compile on old machine
                            IF Q(k) = 4294967295 THEN DECR Q(k)
                            ' x seeds should be less than 2^32-1
                          NEXT
                      
                          SELECT CASE AS LONG Lag
                            CASE 4
                              Multiplier = 987654366 ' 2^157
                            CASE 8
                              Multiplier = 987651386 ' 2^285
                            CASE 16
                              multiplier = 987765178 ' 2^541
                            CASE 32
                              Multiplier = 987655670 ' 2^1053
                            CASE 64
                              Multiplier = 987651206 ' 2^2077
                            CASE 128
                              Multiplier = 987688302 ' 2^4125
                            CASE 256
                              Multiplier = 987662290 ' 2^8221
                            CASE 512
                              Multiplier = 123462658 ' 2^16410
                            CASE 1024
                              Multiplier = 5555698   ' 2^32790
                            CASE 2048
                              Multiplier = 1030770   ' 2^65559
                            CASE 4096
                              Multiplier = 18782     ' 2^131086
                            CASE ELSE
                              MSGBOX "Invalid lag given." + $CRLF + $CRLF + "A 256 lag will be used.", %MB_OK, "CMWC"
                              Multiplier = 987662290 ' 2^8221
                          END SELECT
                      
                      '    CryptGenRandom(hProv, 4, VarPtr(cary))
                          cary = 123456789  'temporary stand-in
                          cary = cary * (Multiplier-1)/4294967295 ' Make cary < Multiplier
                          BeenHere = %True
                      
                        END IF
                      
                       FOR ii = 0 TO j
                      '  i = ( i + 1 ) AND j ' Cycle around j, j = 4095
                        !mov eax, i
                        !mov ecx, j
                        !add eax, 1     ;i + 1
                        !and eax, ecx   ;AND j
                        !mov i, eax
                      
                        x = Q(i)
                        !mov eax, x
                        !mov edx, Multiplier
                        !mul edx        ;Multiplier * Q(i)
                        !mov ecx, cary
                        !add eax, ecx   ;+ cary
                        !adc edx, 0     ;in case there is a carry of 1, add it.
                        !mov cary, edx  ;store cary
                        !add eax, edx   ;x = tQ + cary
                        !cmp eax, edx   ;if(x<c) then incr2vars
                        !jb short incr2vars ;jmps here rarely taken, maybe 1 per 450,000; so adds just ~4 tix. No big whup
                        !cmp eax, -1    ;OR if(x=-1) ie. if x+1=0 then incr2vars
                        !je short incr2vars ;jmp very rarely taken, maybe 1 per 4 billion
                      
                      incrReturn:       ' The CMWC aspect (Base-1) - x
                        !mov edx, &H0FFFFFFFE
                        !sub edx, eax   ;edx = &H0FFFFFFFE - x
                        !mov x,   eax   ;store in x
                        Q(i) = x
                        NEXT
                      
                        EXIT FUNCTION
                      
                      incr2vars:
                        !add cary, 1    ;c++
                        !add eax,  1    ;x++
                        !jmp       incrReturn
                      
                      END FUNCTION

                      Comment


                      • #12
                        One other thing re. initialization: with these big lags, this algo needs really good initial random seeding like the crypt seeding. If there is a flaw in the seeding, the generator won't settle down to good randoms for up to, in the lag 4096 case, a whopping 156 megabytes (39,000,000 dwords).

                        Below is a plot of chi values that demonstrate this--each x-axis point represents 256 bytes of data for which the chi sqr is calculated and should be centered at ~254. I knew I hadn't seeded the 4096 array with great numbers by any means, but it was still quite a shock to see such slow divergence.

                        So, the crypt seeding looks like the way to go. Otherwise, wait until the 40 millionth random dword is generated to start using them. Again, this is with lag 4096.
                        Attached Files

                        Comment


                        • #13
                          My old machine can't handle the crypt functions, so I just put some real poor initializing code in there so it will compile.
                          The following will work on Win95 through to XP SP3. It may work on Vista and Win7 but by then we should be using the newer Cryptographic Service Provider (CSP).
                          Code:
                          %PROV_RSA_FULL = 1
                          $MS_DEF_PROV = "Microsoft Base Cryptographic Provider v1.0"
                          %CRYPT_VERIFYCONTEXT = &hF0000000
                           
                          Function GetCSPHandle As Long
                           
                            Local hCSP As Dword
                           
                            If CryptAcquireContext( hCSP, ByVal %Null, ByVal %Null, %PROV_RSA_FULL, 0 ) = %FALSE Then
                              If CryptAcquireContext( hCSP, ByVal %Null, $MS_DEF_PROV, %PROV_RSA_FULL, %CRYPT_VERIFYCONTEXT ) = %FALSE Then
                                Function = hCSP
                              Else
                                Function = hCSP
                              End If
                            Else
                              Function = hCSP
                            End If
                           
                          End Function
                          For < compilers 10 &/or 6 then the following are also needed.
                          Code:
                          Declare Function CryptGenRandom Lib "AdvAPI32.dll" Alias "CryptGenRandom" _
                              (ByVal hProv As Dword, ByVal dwLen As Dword, ByVal pbBuffer As Byte Ptr) As Long
                           
                          Declare Function CryptAcquireContext Lib "AdvAPI32.dll" Alias "CryptAcquireContextA" _
                              (phProv As Dword, szContainer As AsciiZ, szProvider As AsciiZ, _
                               ByVal dwProvType As Dword, ByVal dwFlags As Dword) As Long
                           
                          Declare Function CryptReleaseContext Lib "AdvAPI32.dll" Alias "CryptReleaseContext" _
                              (ByVal hProv As Dword, ByVal dwFlags As Dword) As Long
                          I have just used all of the above on my XP SP3 machine using PBWin 9 and it worked fine - the old CSP (< XP SP3) stayed, on going from SP2 to SP3.

                          The first block of code will be of no use with regard to CMWCSeed as it uses the SHA2 family of hash functions which were not available with the old CSP. All is not lost as there are PB versions of SHA256 and SHA512 courtesy of Greg Turgeon in the Source Code forum.

                          BTW, John, CryptGenRandom fills the entire array in one call - 4 * Lag worth of bytes are generated and 'dropped' into memory beginning at address VarPtr( Q(0) ).

                          So, the crypt seeding looks like the way to go.
                          OK, we are doing that now.

                          Marsaglia suggests seeding the array with calls to other PRNGs but CryptGenRandom is so convenient. As intimated above it may be a good idea to let our machine run a while to let the entropy pool build up. However, if we cannot wait then we could exploit, from SDK, "Optionally, the application can fill this buffer with data to use as an auxiliary random seed." We could have , for lag~4096, 16KB of quantum random numbers stored in DBs, read them, randomly shuffle them, perhaps with RND, and then preload the array.

                          Re 'if(x<cary || x+1==0) {x++;cary++;}' it is now catered for in the latest versions.

                          Instead of your
                          Code:
                          !add eax, edx   ;x = tQ + cary
                          !cmp eax, edx   ;if(x<c) then incr2vars
                          !jb short incr2vars
                          I have
                          Code:
                            !Add ecx, eax   ;x = tQ + cary
                            !jnc NoOverFlow
                              !inc ecx ' Incr x
                              !inc eax ' Incr cary
                           
                          NoOverFlow:
                          and instead of
                          Code:
                          !cmp eax, -1    ;OR if(x=-1) ie. if x+1=0 then incr2vars
                          !je short incr2vars
                          I have
                          Code:
                          NoOverFlow:
                            !cmp ecx, &H0FFFFFFFF
                            !jne NotAboveRange
                              !inc ecx ' Incr x
                              !inc eax ' Incr cary
                           
                          NotAboveRange:
                          So, I am 'falling through'.

                          Is there much in it?

                          Ah, I see that you have dropped using tQ. I'll have another head scratch.
                          David Roberts
                          Member
                          Last edited by David Roberts; 11 May 2012, 05:15 AM.

                          Comment


                          • #14
                            The difference in our asm is, like you say, not going to be much. I wanted to get rid of the tQ quad since it wasn't really necessary, tidying up the code some.

                            But the big boost comes from the whole array fill per function call. I clocked it 2.4x faster than calling the function one time per random. So since you were seeing about 48 megs per second before, that might now bump to 115, if we're lucky.

                            Comment


                            • #15
                              Actually, John, the penny did not drop straight away about filling the whole array per function call. I thought you were talking about the initialization; which, obviously, was not the case when I saw the use of GetTickCount. The penny dropped when I saw 'FOR ii = 0 TO j'.

                              What we have then is a global buffer, initialized with random seeds, gets a warming up treatment or not, and, finally, gets populated with random numbers which an application can simply collect as and when required. Filling the buffer in one hit rather than r calls, where r is from lag~r, will, of course, be faster and you have seen 2.4x faster with, I assume, a lag~4096.

                              Collecting the random numbers from a buffer will, of course, be blindingly fast compared with the time to generate them.

                              However, if someone is satisfied with the return from an unbuffered approach and is looking for millions of random numbers then they may not be happy with their application 'stuttering' each time the buffer needed refreshing. On the other hand someone requiring less than 4KB of random numbers would only need one final pass of the function, after warming up if any, and then access them at the speed of lightning; even with the necessity of indexing.

                              I would prefer to leave the Source Code as unbuffered and let folks rewrite it if they want buffering. Most PRNGs are un-buffered. RND is unbuffered and so is Mersenne Twister. CryptGenRandom, on the other hand, is buffered.

                              Your new asm was better than mine. I was jumping on the most likely events when I should have been jumping on the least likely events. Getting rid of the tQ Quad did more than tidy up. We are no longer accessing RAM and I'd forgotten how time consuming it is creating Local storage each time a function is called.

                              On my machine I am now getting 61 million dwords a second for lag~4096. The likelihood of an overflow or an out of range random number is greater the smaller the lag value used because the greater is the multiplier. For a lag~256 I am getting 55 million dwords a second. That lag~4096 value is getting very close to twice the speed of RND.

                              I suggested in post #9 a 'warm up' and was as surprised as you to see how a poor set of initial values required such a large 'warm up' period. I would be interested in a graph, as in your post #12, using CryptGenRandom. I suspect that a 'warm up' will still be needed but not as long as with poor initial values.

                              Source code has been updated and noted at the top of the last post there.

                              Comment


                              • #16
                                With the speed tests I have been using 'temp = CMWC'. One of the reasons why I did not publish a sll or dll, as I have being doing of late, was so that you guys could tailor the function to your requirements - perhaps you need a [0,1) as with RND or a range. Added: BTW, for [0,1) we should divide dword by 2^31-1 since the output is [0, 2^32-2] and not [0, 2^32-1] as with a base 2^32 MWC/CMWC. To be fair then I should speed test CMWC without an assignment. With lag~4096, in particular, the 61 million dwords a second mentioned above becomes 66 million dwords a second. With the assembler versions it is best, obviously, to do any changes to the output within the assembler code and have the Function return Ext or Long as appropriate.

                                With all the variables in the Function head declared as Static I wondered if CMWC could be converted to a FastProc which PB introduced in compilers 10 & 6.

                                With the lag as no longer optional and the Function return as Long, plus a little tweaking to adhere to FastProc protocol, I am now getting over 124 million longs a second.

                                I am so pleased with that because of this quote from a post by Marsaglia.

                                The above CMWC4096( ) RNG produces 32-bit random integers
                                at a rate of 132 million per second, or 110 million
                                per second as part of a CMWC4096( )+CNG+XS KISS.
                                gcc, Vista, 2.4GHz Intel Core2 Quad Q6600
                                My machine uses an Intel Core2 E6700 @ 2.67GHz.

                                I did not believe him - I figured that he meant bytes - I do now.

                                Added: Oops. To compare like with like - without assignment I get 135 million longs a second. Yeaaaaaaaaah.

                                Now, FastProc returns Longs whether we like it or not so any conversion will have to be done outside of the Function. Using CDwd(CMWCF(4096)) will clip that 124 million back to about 58 million. However, CDwd's syntax is CDwd(numeric_expression). This does more than we need - we need CDwd(Long) and an assembler version would probably be best. Perhaps John can step in here as well.

                                I have posted a FastProc version of CMWC, imaginatively named CMWCF, in the Source Code forum here.
                                David Roberts
                                Member
                                Last edited by David Roberts; 13 May 2012, 10:46 AM.

                                Comment


                                • #17


                                  What is it they about wood and trees?

                                  Local result as Dword

                                  result = CMWCF(4096) + 2147483648

                                  Oh, dear.

                                  Of course, when we do any conversion we simply remember that we are dealing with Longs and not Dwords. Any conversion will reduce the rate of return but it will not matter whether we are dealing with Longs or Dwords.
                                  David Roberts
                                  Member
                                  Last edited by David Roberts; 13 May 2012, 09:26 AM.

                                  Comment


                                  • #18
                                    I have just added a FastProc function in the Source Code forum to generate a random float in the range [0,1), as does Rnd without parameters, and is about 23% faster than Rnd.

                                    Comment


                                    • #19
                                      I have just added a FastProc function in the Source Code forum to generate a random number within a range analogous with Rnd(a,b). It is over 5 times faster than RND.

                                      The lag is hard wired at 4096 as FastProcs limit us to two ByVal Longs as parameters.

                                      Comment


                                      • #20
                                        John Gleason has come to the fore again with a faster way to generate [0,1) numbers.

                                        We are now getting 63% faster than Rnd compared with 23% previously.

                                        CMWCFF updated in the Source Code forum.

                                        Comment

                                        Working...
                                        X