No announcement yet.

Discussion: CMWC with lag PRNG

  • Filter
  • Time
  • Show
Clear All
new posts

  • #21
    I tested the generator with and without
    OR if x+1=0 then incr x: incr c
    and here are my findings: with it included, the dword &h0ffffffff never appears in any Q() element, and with it removed zero never appears in any Q() element. It's just a one number shift in 4gigs of dwords/longs. Nothing else measurably changes.

    Perhaps sir George Marsaglia thought it would be better for zeros to pop up now and then. Does it change the generator in any statistical way long term? Unknown to me, but I'd bet the Veyron it has no other effect.


    • #22
      John .... u have a Veyron ???? what do u own Meechigan ?


      • #23
        I did some asm optimization for float rnds, while also using &h0ffffffff as the divisor to compensate for -1's absence as you recommended, David.

        Again I fill the whole array (2 arrays actually) in one go for best speed, and since the filling requires only ~50 microseconds to complete on a 2.8GHz core, any stutter effect will typically be negligible.

        My timings averaged 41 tix / cmwc extended rnd, vs. 116 for RND (best case as register var), making CMWCfilledArrFp over 180% faster than RND.

        #COMPILE EXE
        #DIM ALL
        '#INCLUDE "WIN32API.INC"
        GLOBAL Q() AS DWORD 'each element will be a Random x. Used here to maintain internal state for EXT rnds
        GLOBAL Qx() AS EXT  'each element will be a Random x [0,1) ie. 0 to .99999...
        LOCAL i, n AS LONG, t AS QUAD
        REGISTER ii AS LONG, ex AS EXT
          n = 16384             ' 67MB EXT randoms generated
          CMWCfilledArrFp(4096) ' Initialize generator
        ' OPEN "c:\fpRnds.txt" FOR OUTPUT AS #1
          TIX t
          FOR i = 1 TO n
             cmwcFilledArrFp() ' both arrays get entirely replaced each loop. 41.1 tix / EXT rnd
        '    PRINT #1, Qx()
          TIX END t
          MSGBOX STR$(t / (n * 4096), 3) & " tix per random EXT"
        FUNCTION CMWCfilledArrFp( OPT OptionalLag AS LONG ) AS LONG  'Fp stands for floating point
        STATIC ii, j, Multiplier, BeenHere AS LONG, t AS QUAD
        STATIC cary AS DWORD ' Random And < Multiplier
        STATIC y AS QUAD
        STATIC qxPtr AS LONG
        STATIC reciproc AS EXT
        REGISTER x AS DWORD  ' MWC Random number generated
          IF ISFALSE BeenHere THEN ' Initialize Q() And carry
            LOCAL Lag AS LONG
            IF ISMISSING(OptionalLag) THEN
              Lag = 256
              Lag = OptionalLag
            END IF
            i = Lag -1
            j = i
            DIM Q(0 TO i) AS GLOBAL DWORD  'internal state array
            DIM Qx(0 TO i) AS GLOBAL EXT   'random EXT output array, 10 significant digits per element
            qxPtr = VARPTR( Qx(0) )
        '   CryptGenRandom( hProv, 4 * Lag, VarPtr( Q(0) ) )  ' Random x seeds
            LOCAL k AS LONG
            FOR k = 0 TO i
               TIX t                                                  'temporary stand-in
               Q(k) = (t AND &h0ffffffff) * (k + 19) AND &h0ffffffff  ' temporary, poor stand-in for CryptGenRandom--allows compile on old machine
               IF Q(k) = 4294967295 THEN DECR Q(k)
              ' x seeds should be less than 2^32-1
            SELECT CASE AS LONG Lag
              CASE 4
                Multiplier = 987654366 ' 2^157
              CASE 8
                Multiplier = 987651386 ' 2^285
              CASE 16
                multiplier = 987765178 ' 2^541
              CASE 32
                Multiplier = 987655670 ' 2^1053
              CASE 64
                Multiplier = 987651206 ' 2^2077
              CASE 128
                Multiplier = 987688302 ' 2^4125
              CASE 256
                Multiplier = 987662290 ' 2^8221
              CASE 512
                Multiplier = 123462658 ' 2^16410
              CASE 1024
                Multiplier = 5555698   ' 2^32790
              CASE 2048
                Multiplier = 1030770   ' 2^65559
              CASE 4096
                Multiplier = 18782     ' 2^131086
              CASE ELSE
                MSGBOX "Invalid lag given." + $CRLF + $CRLF + "A 256 lag will be used.", %MB_OK, "CMWC"
                Multiplier = 987662290 ' 2^8221
            END SELECT
        '   CryptGenRandom(hProv, 4, VarPtr(cary))
            TIX t                     'temporary stand-in for CryptGen line just above
            cary = t AND &h0ffffffff  'temporary stand-in
            cary = cary * (Multiplier-1)/4294967295 ' Make cary < Multiplier
            BeenHere = 1'%True
            reciproc = 1 / &h0ffffffff 'multiply by 1/&h0ffffffff rather than divide; to range rnd over [0,1)
          END IF
         !fld reciproc    ;you can load this once, saving tix
         FOR ii = 0 TO j
        '  i = ( i + 1 ) AND j ' Cycle around j, j = 4095
          !mov eax, i
          !mov ecx, j
          !add eax, 1     ;i + 1
          !and eax, ecx   ;AND j
          !mov i, eax
          x = Q(i)
          !mov eax, x
          !mov edx, Multiplier
          !mul edx        ;Multiplier * Q(i)
          !mov ecx, cary
          !add eax, ecx   ;+ cary
          !adc edx, 0     ;in case there is a carry of 1, add it.
          !mov cary, edx  ;store cary
          !add eax, edx   ;x = tQ + cary
          !cmp eax, edx   ;if(x<c) then incr2vars
          !jb short incr2vars ;jmps here rarely taken, maybe 1 per 450,000; so adds just ~4 tix. No big whup
          !cmp eax, -1    ;OR if(x=-1) ie. if x+1=0 then incr2vars
          !je short incr2vars ;jmp very rarely taken, maybe 1 per 4 billion
        incrReturn:       ' The CMWC aspect (Base-1) - x
          !mov edx, &H0FFFFFFFE
          !sub edx, eax   ;edx = &H0FFFFFFFE - x
          !mov y,   eax   ;store in y lo 4 bytes
          !mov x,   eax   ;and store in x
          !fild qword y   ;in PB9, "qword" is not required here.
          Q(i) = x
          !fmul st(0),st(1)     ;x / &h0ffffffff
          !mov eax, qxPtr       ;Qx(0) ptr
          !mov ecx, i           ;i is in register esi or edi.
          !lea ecx, [ecx*4+ecx] ;i*5
          !add ecx, ecx         ;then *2=i*10 effectively
          !fstp tbyte [eax+ecx] ;store our [0,1) ext. Note: only 10 signif. digits tho
        ' above fp asm does Qx(i) = x / &h0ffffffff
         !fstp st(0)            ;clear fpu register
          !add cary, 1    ;c++
          !add eax,  1    ;x++
          !jmp short incrReturn


        • #24
          I don't really own a Veyron. And I'm afraid my actual ol' Bugatti looks like kind of a heap next to one of those beasts.


          • #25
            Darn... I was getting ready to hitch-hike up there & beg u for a ride

            btw .. I knew Arthur Tweedale ... he won some big race ??? in a Bugatti in the '30's ???


            • #26
              It's just a one number shift in 4gigs of dwords/longs. Nothing else measurably changes.
              How about the period?

              If we used x = t Mod &H0FFFFFFFF then we do not need 'if(x<cary || x+1==0) {x++;cary++;}'.

              We do need it if we use x = t + c

              If we allow either condition to obtain then this is tantamount to tinkering with either the multiplier, and therefore the primitives, or the base 2^32-1. Tinkering with either, indiscriminately, would leave us with an indeterminate period. The primitives used are calculated for a base of 2^32-1. Incidentally, with a lag~r and r>1 the maximum possible period cannot be got with a base of 2^32.

              So, will the period collapse, perhaps to 4gigs, 2^32, if we dispense with checking x?

              I do not know.

              but I'd bet the Veyron it has no other effect.
              I wouldn't.

              Haven't had a chance to look over CMWCfilledArrFp yet but initial thoughts are the same as with the first introduction of buffering. The random numbers Qx() can only be collected with indexing and the index will need to be incremented on each use. We also need to watch the index in preparation for a buffer refresh. That "over 180% faster than RND" will drop. Not sure by how much. We need to see the method in use.

              Added: Why aren't you using CryptGenRandom, John? If you use the head of my post #13 you can go back as far as Win95. Not sure about Vista and Win7 but that code will work up to XP SP3.
              Last edited by David Roberts; 16 May 2012, 05:47 AM.


              • #27
                David, my test refers only to including or deleting Marsaglia's much later addition of '|| x+1==0 {x++;cary++;}', not the entire 'if(x<cary || x+1==0) {x++;cary++;}'. 'if(x<cary) {x++;cary++;}' is always used either way. I was testing to determine the effect of just that second condition on the output. I discovered that with it included, zero appears in the random output but -1 doesn't, and without it, -1 appears in the random output but zero doesn't.

                I think I first saw the algo in 2007, maybe even earlier, sans '|| x+1==0'. I don't know the date he decided to add it, but it sounds like it was quite recently. So the algo likely existed for years without it, and always had a reported period of 2^131086 (lag 4096). So it looks like the Veyron stays in my collection.

                Re. cryptGen: I used post #13 which still wouldn't compile, and then went searching for code I used a couple years ago that I did get to compile back then. After a long while of no joy, I finally surrendered.


                • #28
                  I have seen quite a few articles by Marsaglia in 2003 and in one them he eliminated two nagging problems by introducing the base 2^32-1 as opposed to 2^32. The earliest article I can find where the condition x + 1 = 0 is mentioned is in 2009 but, regrettably, without an explanation.

                  I have never seen an explanation by Marsaglia for the condition x < c. It was only when examining x = t + c it made sense when I realised the potential overflow.

                  I have just done a run where the 10000th generation was forced to be &H0FFFFFFFF. A msgbox displayed the five numbers before my intervention and 25 numbers after. Nothing untoward appeared to happen.

                  However, I must wonder if any damage had been done to the period and with a period of 2^131086 quite a few cups of tea may be drunk before we get any spilled milk.

                  x < c made no sense to me when I first saw it but now it does. x + 1 = 0 isn't making any sense other than making sure that x = 0 gets a look in; as you noted, John. It may be that simple. In theory we have [0,2^32-2] because we are using a base 2^32-1. In other words we generate 2^32-1 possibilities; and why, of course, we have to divide by 2^32-1 in converting to [0,1). If the value 2^32-1 is allowed to get through then to still have 2^32-1 possibilities, 0 will have to take a hit.

                  Anyway, my gut feeling is to leave the condition in just in case there is more to it than ensuring 0 gets a look in. I could create many 15MB dumps for testing without problem and then one day - 'bang'. Maybe it did go 'bang' and someone's simulation or whatever went 'pear shaped'.

                  Unfortunately, it will nag, just as x < c did, so, no doubt, I will call by this issue from time to time. We have all put something to one's side and on returning spotted a solution within minutes and wondered why we were banging our head against a brick wall previously.


                  • #29
                    A thought re. x + 1 = 0: it produces rands evenly over the range 0 to 2^32-2 inclusive, and dividing by (2^32-1), ie. x/&h0ffffffff, gives perfect scaling [0,1). [0,1) is the "standard" range that is used nearly everywhere for uniformly distributed decimal rands.

                    If, however, the integer rands range from 1 to 2^32-1, uniform decimal scaling x/&h0100000000 produces (0, 1), which would be a non-standard distribution. Perhaps GM was considering this.


                    • #30
                      Perhaps GM was considering this.
                      I do not think so.

                      Without 'short cuts' we have.
                      t = a*Q(i) + c  (1)
                      c = t \ &H0FFFFFFFF  (2)
                      x = t MOD &H0FFFFFFFF  (3)
                      All of the above will be >= 0.

                      (3) gives x belonging to the set {0,2^32-2}

                      We have no problems with the above and get the "standard", as you put it John, [0,1) when adhering to x/&h0ffffffff.

                      The problems start when we use 'c=(t>>32);x=t+c'

                      The apple cart is first upset with 'c=(t>>32)' and then compounded with x=t+c with t at 64 bit and x at 32 bit with potential overflow.

                      From Marsaglia's paper, Journal of Modern Applied Statistical Methods May,2003, Vol.2,No.1,2-13:
                      Complimentary-multiply-with-carry RNGs (CMWC) permit us to avoid both of those difficulties. By making b = 2^32-1, we can still exploit the way that integer arithmetic is carried out in modern CPUs (with a little fiddling for reductions mod 2^31-1 rather than mod 2^32).
                      "fiddling"! - this is a professor of Mathematics talking here.

                      He then proceeded to fiddle without giving mere mortals such as ourselves the slightest hint at what he was up to. Two years after the paper he shuffled off this mortal coil having batted for 86 years. I have not read the whole paper yet but I will. He was 84 when he wrote it!

                      This thread is getting a lot more hits than the Source Code forum entry. It is a pity most of them don't comment even if they sign off with 'Just my two cents'. Still, two heads are better than one and the source code is a hell of a lot better since you joined me John.
                      Last edited by David Roberts; 16 May 2012, 08:07 PM.


                      • #31
                        The proposal was to use a signed 32-bit value, multiply by 1/2^32 and then add 0.5.
                        So, with CMWCF(4096), which returns LONGs, I used t = CMWCF(4096)*0.0000000002328306437080797375 + 0.5; remembering that we multiply by 1/(2^32-1).
                        David, I believe you should multiply by 1/2^32 according to the proposal, not 1/(2^32-1). That I think avoids the potential negative result.


                        • #32
                          Hi John

                          It does avoid the negative result but the reason for the negative result is because putting a 32 bit unsigned value into the FPU's DWORD does not add one as we would do with a two's complement.

                          Consider the CPU's DWORD as used by the CMWC method.

                          0 1 ------ 2^31-1 2^31 2^31+1 ------ 2^32-2 ie 2^32-1 values.

                          When we '!fild Dword value' we get

                          -2^31 ------ -2 0 1 ------ 2^31-1

                          Note no -1.

                          The FPU would do a twos complement if we requested a negation.

                          The negative result comes in with -2^31/(2^32-1) + 0.5 => -1.164153E-10

                          By adding 1 to the negative values we get

                          -(2^31-1) ------ -1 0 1 ------ 2^31-1 giving a total of 2^32-1 values.

                          By dividing by 2^32-1 and adding 0.5 we then get

                          [1.16415323019557E-10, 1 - 1.16415323019557E-10]

                          I have just realised that the CMWCF, which returns a Long, will never return -1.

                          It should be adjusted so that we get [-2147483647, 2147483647] ie [-(2^31-1), 2^31-1]
                          Last edited by David Roberts; 3 Jul 2012, 07:29 PM. Reason: Had 2147483648, should have been 2147483647


                          • #33
                            Yes, that missing -1 is disconcerting. But not to worry, I've added a "rotator" to the code--just 3 tix--so the CWMC now produces all 4GB longs equally. That is, each number from 0 thru 4294967295 has an equal "chance" correlated 100% to the CWMC output.

                            For ease of timing and following code progress, I include the trusty
                            #COMPILE EXE
                            #DIM ALL
                            #REGISTER NONE
                            #INCLUDE ""
                            FUNCTION PBMAIN () AS LONG
                               LOCAL ii, value AS LONG, lineo AS STRING
                               STATIC rotator AS DWORD
                               LOCAL half,randomFloat,mult,lastFloat AS EXT, t AS QUAD
                               half = 1/2
                               mult = 1/&h100000000
                              FOR ii = -2147483648 TO &h07fffffff
                                 TIX t
                                 !mov edx, ii        ;for testing purposes, check every possible long
                                 !mov eax, rotator
                                 !add edx, eax
                                 !add eax, 61331     ;add a prime number so rotator cycles continually thru all numbers 0-&h0ffffffff
                                 !mov value, edx     ;the rotator evens out value distribution so no numbers (like -1) are skipped
                                 !mov rotator, eax   ;save so new rotator is ready for next rnd
                                 !fild Dword value ; Signed edx into st(0)
                                 !fld mult         ; mult into st(0), Signed edx into st(1)
                                 !fmulp st(1), st  ; multiply into st(1) Then pop, value now In st(0)
                                 !fld half         ; half into st(0), st(0) into st(1)
                                 !faddp st(1), st  ; Add into st(1) Then pop, value now In st(0)
                                 !fstp RandomFloat ; st(0) into RandomFloat
                                 TIX END t
                            IF (ii AND &hffffff) = 0 THEN
                             pcpn (t-33)           'on my machine, calling TIX twice (an empty TIX call) takes 33 tix, so subtract it for true tix count. Substitute your machines count for my 33.
                             END IF                'so the conversion to decimal takes 23 tix on my machine, and the correction and conversion takes
                            NEXT                   '26 tix--just 3 more--overall not so shabby!
                            ? "k"
                            END FUNCTION
                            'my output. notice the tix count consistancy at 26 tix.
                            ' 0 -2147483648  181
                            ' .578125 -2130706432  26
                            ' .15625 -2113929216  26
                            ' .734375 -2097152000  26
                            ' .3125 -2080374784  26
                            ' .890625 -2063597568  26
                            ' .46875 -2046820352  26
                            ' .046875 -2030043136  26
                            ' .625 -2013265920  26
                            ' .203125 -1996488704  26
                            ' .78125 -1979711488  26
                            ' .359375 -1962934272  26
                            ' .9375 -1946157056  26
                            ' .515625 -1929379840  26
                            ' .09375 -1912602624  26
                            ' .671875 -1895825408  26
                            ' .25 -1879048192  26
                            ' .828125 -1862270976  26
                            ' ...
                            Last edited by John Gleason; 4 Jul 2012, 08:33 AM. Reason: comment improved


                            • #34
                              CMWC outputs [0, 2^32-2] and, with a 'rotator', that can be converted to [-2^31, 2^31-1] and use a mult of 1/2^32 as opposed to 1/(2^32-1)?

                              I am going to have to play around with some smaller numbers because the penny is not dropping at the moment.


                              • #35
                                Indeed, a small dataset is the key to see how it works. I'll make one with say, 256 possibilities and fill it with just 255 numbers and a rotator.


                                • #36
                                  Here's the example.
                                  #COMPILE EXE
                                  #DIM ALL
                                  FUNCTION PBMAIN () AS LONG
                                     LOCAL ii, ii2 AS LONG, lineo AS STRING
                                     LOCAL rndMwc, carry, sgPrime AS DWORD
                                     LOCAL rndE AS EXT, badRnd, rotator AS BYTE
                                     rndMwc = 1876388432
                                     carry  = 3199389881
                                     sgPrime= 4160465475
                                     OPEN "c:\faultyRndNums.dat" FOR BINARY AS #1
                                     lineo = SPACE$(&h10000)
                                     FOR ii = 1 TO 12000000
                                        rndE = rndMwc * sgPrime + carry
                                        rndMwc = rndE MOD &h100000000
                                        carry = rndE \ &h100000000
                                        badRnd = rndMwc MOD 255         'should be MOD 256, so 255 (&h0ff) is not possible
                                        INCR ii2
                                        ASC(lineo, ii2) = badRnd
                                        IF ii2 = &h10000 THEN
                                           PUT #1,, lineo
                                           ii2 = 0
                                        END IF
                                     ? "Check file with rnd tests now. It will fail."
                                     OPEN "c:\faultyRndNums.dat" FOR BINARY AS #1
                                     lineo = SPACE$(&h10000)
                                     rndMwc = 1876388432                'reset to same as above
                                     carry  = 3199389881
                                     sgPrime= 4160465475
                                     ii2 = 0
                                     FOR ii = 1 TO 12000000
                                        rndE = rndMwc * sgPrime + carry
                                        rndMwc = rndE MOD &h100000000
                                        carry = rndE \ &h100000000
                                        badRnd = rndMwc MOD 255         'should be MOD 256, so 255 (&h0ff) is not possible
                                        INCR ii2
                                        rotator += 97
                                        ASC(lineo, ii2) = badRnd + rotator
                                        IF ii2 = &h10000 THEN
                                           PUT #1,, lineo
                                           ii2 = 0
                                        END IF
                                     ? "Check file with rnd tests now. It will probably pass--even diehard."
                                  END FUNCTION


                                  • #37
                                    So, given a set of randomly distributed bytes a(i), for i = 1 to n, with 0 <= a(i) <= 254 for all i and a prime number r
                                    then b(i) = (a(i) + i*r ) mod 256 will be randomly distributed with the same 'chance' as a(i) and with 0 <= b(i) <= 255 for all i.

                                    At the moment I am not bying that.

                                    The fact that a() fails a random number test and b() satisfies a random number test is not proof.

                                    I did one test with n = 15MB and b() was suspect.

                                    John, is this your own idea or do you have some links to others kicking it around or even proving its truth?


                                    • #38
                                      Here is another example of the principle. I did not expect this. The next question is whether it carries over into your usage, John.

                                      Anyway, although it is food for thought I wonder whether we need to change (0,1) to [0,1) or [-(2^31-1), 2^31-1] to [-2^31, 2^31-1] given that the former of both cases gives a perfect balance at 0.5 and 0 respectively.
                                      #Compile Exe
                                      #Dim All
                                      Function PBMain() As Long
                                      Dim a(0 To 255) As Byte
                                      Local i, j, p As Long
                                      Local msg As String
                                      For i = 0 To 255
                                        a(i) += p
                                        p += 3 ' Any prime number except 2
                                        ' p += 104729 ' works just As well 
                                      Array Sort a()
                                      For i = 0 To 255 Step 16
                                        For j = 0 To 15
                                          msg += Str$(a(i+j)) + " "
                                        msg += $CrLf
                                      MsgBox msg
                                      End Function
                                      Attached Files


                                      • #39
                                        >>John, is this your own idea or do you have some links to others kicking it around or even proving its truth?

                                        It's my idea 99.9% assured (someone else may have thought of it too, given 7 billion earthling souls), but it was independent of any others.

                                        The "chance" of b(i) being randomly distributed is correlated to, but not equal to, that of a(i), because b() takes the available 255 bytes of randomness and expands it to 256 bytes with no further input source of randomness. Now, is its resultant randomness therefore 255/256ths that of a()?

                                        If that postulate is true (I think it's exponential, therefore actually much closer to 1 than that), given our larger dword-sized sample, we're missing one number (&hffffffff) out of &h100000000 possible, so our randomness will be 4294967295/4294967296ths that of a().

                                        In effect we're trading a missing number--a noticeable flaw--for an undetectable statistical change.


                                        • #40
                                          It has since dawned on me that the missing -1 had nothing to do with twos complement - it is missing because 2^32-1 is missing; by design. Had the maximum been 2^32-3 then we would have had -2 and -1 missing. It never ceases to amaze me how we can develop a torturous explanation which turns out to be false only to find on returning to the problem at a later date the obvious smacks us in the mouth.

                                          With regard to [0,1) I have found a MMX solution here which knocks out about 120 million Singles per second on my machine. There is a Double variant which calls CMWC twice. Of course, our solution is in Extended which slows things down a bit.
                                          Last edited by David Roberts; 5 Jul 2012, 10:32 AM.