Rnd2 discussion

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • That sounds like a ringing endorsement! I will try it out asap and report back.

    Reporting back...
    Code:
    PB9
    RND:           1.000  Reference 2666 ms
    Rnd2 [0,1):    0.752
    Rnd2 (-1,1):   0.776
    
    *****Integer Mode*****
    RND(1, 52):   1.645
    Rnd2(1, 52):  0.591
    RND(-2147483648, 2147483647):   1.614
    Rnd2(-2147483648, 2147483647):  0.603
    That little unanticipated change makes it ~17% faster. Man. Who'd have guessed? Gary Ramey, fortunately! Think of the asm tweaking you often have to do to get 17%.
    Last edited by John Gleason; 5 Nov 2008, 07:11 AM. Reason: show results



    • Gary, unfortunately it looks like there is a problem with BYVAL. What happens if it gets, e.g., Rnd2(-255, 0) or Rnd2(255, 0), i.e., when the second parameter is zero? I don't see a way to determine whether it's zero or not present.
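A rough C analogy of the problem, as a sketch only: it assumes (as the `!lea`/`!mov [eax]`/`!cmp 0` parameter test used elsewhere in this thread relies on) that an omitted BYREF OPTIONAL argument arrives as a null address, while an omitted BYVAL argument arrives as 0. The function names are hypothetical and this is not PowerBASIC's actual calling convention, just the same idea expressed with pointers versus values.

```c
#include <stddef.h>
#include <stdint.h>

/* BYREF-style OPTIONAL: the caller passes an address, or NULL when the argument is omitted. */
int count_byref(const int32_t *one, const int32_t *two) {
    if (two != NULL) return 2;   /* an address exists, whatever value it points to */
    if (one != NULL) return 1;
    return 0;
}

/* BYVAL-style OPTIONAL: an omitted argument arrives as 0, identical to an explicit 0. */
int count_byval(int32_t one, int32_t two) {
    if (two != 0) return 2;      /* cannot tell "omitted" apart from "passed 0" */
    if (one != 0) return 1;
    return 0;
}
```

With addresses, a call like Rnd2(-255, 0) is still recognised as two arguments; with values, it looks exactly like Rnd2(-255).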



      • Okay, I'll think about BYVAL some more.

        Found a way to shorten the top portion of noParams to 13 opcodes. My relative Rnd2() speeds dropped from 0.750 & 0.828 to 0.725 & 0.690.

        Code:
        '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        GOTO noParams   'doesn't get executed, but may help PB w/ alignment
          noParams:  'No parameters, so calc extd rnd2. (v.GDR20081106a)
            !mov eax, mwcSoph       'eax << soph
            !mul mwcRandom          'edx:eax << soph * rand0
            !ADD eax, mwcCarry      'eax << eax + carry0 ; this is rand1
            !mov eq, eax            'eq(low) << rand1
            !adc edx, 0             'edx << edx + cf ; this is carry1
            !mov ecx, edx           'ecx << edx ; holding our carry1
            !mul mwcSoph            'edx:eax << rand1 * soph
            !ADD eax, ecx           'eax << eax + carry1 ; this is rand2
            !adc edx, 0             'edx << edx + cf ; this is carry2
            !mov mwcRandom, eax     'mwcRandom << rand2
            !lea ecx, eq            'load address for eq ; this is fastest sequence for P5
            !mov [ecx+4], eax       'eq(high) << rand2
            !mov mwcCarry, edx      'mwcCarry << carry2
            ''' end of optimized 2x thru mwc w/ saves to mwcRandom, mwcCarry, and eq.
            
          okQuadRnd:                'now we have a complete rand quad in eq
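For readers following along without the asm, here is a C sketch of the same two multiply-with-carry steps: each step computes soph * random + carry in 64 bits, keeps the low 32 bits as the new random value and the high 32 bits as the new carry, and the two outputs become the low and high dwords of the quad `eq`. The struct and function names are hypothetical stand-ins for mwcSoph, mwcRandom and mwcCarry.

```c
#include <stdint.h>

/* Hypothetical mirror of the three MWC state variables. */
typedef struct {
    uint32_t soph;    /* multiplier (mwcSoph) */
    uint32_t random;  /* previous output (mwcRandom) */
    uint32_t carry;   /* previous carry (mwcCarry) */
} MwcState;

/* One multiply-with-carry step; soph * random + carry always fits in 64 bits. */
static uint32_t mwc_step(MwcState *s) {
    uint64_t t = (uint64_t)s->soph * s->random + s->carry;
    s->random = (uint32_t)t;          /* low dword (EAX): new random value */
    s->carry  = (uint32_t)(t >> 32);  /* high dword (EDX): new carry */
    return s->random;
}

/* Two steps packed into one quad: first output in the low dword, second in the high dword. */
uint64_t mwc_quad(MwcState *s) {
    uint64_t low  = mwc_step(s);   /* rand1 -> eq(low) */
    uint64_t high = mwc_step(s);   /* rand2 -> eq(high) */
    return (high << 32) | low;
}
```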
        Added: Okay, abandon the BYVAL idea. I missed the fact that BYVAL fills even OPT parameters with zero when loading the stack for the function call.
        Last edited by Gary Ramey; 5 Nov 2008, 05:51 PM.



        • I'm looking at your code Gary, and had one other idea you may find interesting / hilarious. I'll post it soon as possible.



          • Okay, you have me curious... Sounds like you have figured out something that should have been obvious.

            By the way, are you pondering what to do with this fact? Under noParams, we could skip the FMULP operation. The FILD eq, followed by FSTP rndE, takes care of the two's-complement and normalizing issues for our value. For the exponent ranges we're working with, the 2^-63 multiplication via FMULP is the same as taking the upper two bytes of rndE and subtracting 3F. I just doubt there is anything more efficient than what we are already doing. FSCALE is too slow, and doing a separate set of moves and a subtraction (on just one word) is also much slower. I'm happy with the speeds we have.
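The equivalence can be checked in C, with one stated substitution: a 64-bit double stands in for the x87 80-bit extended type, so the biased exponent starts at bit 52 rather than sitting with the sign in the top two bytes. A nonzero integer converts to a normal number with a (biased) exponent well above 63, so subtracting 63 from the exponent field gives exactly the same result as multiplying by 2^-63, landing in [-1, 1); zero has no exponent to adjust and is handled separately. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* The FMULP route: convert, then multiply by 2^-63. */
double scale_by_mul(int64_t q) {
    return (double)q * 0x1p-63;
}

/* The exponent route: convert, then subtract 63 from the biased exponent field. */
double scale_by_exponent(int64_t q) {
    double d = (double)q;
    if (d == 0.0)
        return 0.0;                  /* zero has no exponent to adjust */
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);  /* reinterpret the double's bit pattern */
    bits -= (uint64_t)63 << 52;      /* exponent field starts at bit 52 in a double */
    memcpy(&d, &bits, sizeof d);
    return d;
}
```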



            • For the exponent ranges we're working with, the 2^-63 multiplication via FMULP is the same as taking the upper two bytes of rndE and subtracting 3F.
              Wow, I'd never have seen that. It's a real lesson in asm to me. I wish this following thought were so advanced.

              I'm tinkering with an idea, and before I spend too much time on it I need to see whether it gives a significant speedup. You may chuckle at this, but basically it is: what if we declare just one variable in the function? We're virtually all in asm anyway, so we could make all the memory references relative to storeState and make it encompass all 9 variables. Our whole declaration is then:
              Code:
              STATIC storeState AS STRING * 60
              and we save 8 declarations.

              At the top, we'll just !push esi : !lea esi, storeState, then reference, say, mwcCarry with e.g. !mov eax, [esi+8] instead of !mov eax, mwcCarry. You're probably chuckling now, aren't ya?
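The single-buffer idea can be sketched in C as one 60-byte static array accessed at fixed byte offsets. The offsets for mwcSoph (0), mwcRandom (4) and mwcCarry (8), and for the bookmarked copies at 44, 48 and 52, are taken from the storeState layout used in Rnd2min.inc later in the thread; the helper names are hypothetical, and the "mark" copy is an assumption about what gsStoreSeq does (its body is not shown). memcpy is used so the byte-buffer access is well-defined C.

```c
#include <stdint.h>
#include <string.h>

/* Byte offsets inside the single state buffer (as used by Rnd2min.inc). */
enum {
    OFS_SOPH   = 0,   /* mwcSoph */
    OFS_RANDOM = 4,   /* mwcRandom */
    OFS_CARRY  = 8,   /* mwcCarry */
    OFS_SAVED  = 44   /* bookmarked soph/random/carry at 44, 48, 52 */
};

/* One declaration instead of many: the C analogue of STATIC storeState AS STRING * 60. */
static unsigned char store_state[60];

uint32_t get_u32(size_t ofs) {
    uint32_t v;
    memcpy(&v, store_state + ofs, sizeof v);   /* like !mov eax, [esi+ofs] */
    return v;
}

void put_u32(size_t ofs, uint32_t v) {
    memcpy(store_state + ofs, &v, sizeof v);   /* like !mov [esi+ofs], eax */
}

/* Bookmark: copy the 12 live bytes to the saved area (assumed gsStoreSeq behaviour). */
void state_mark(void) { memcpy(store_state + OFS_SAVED, store_state + OFS_SOPH, 12); }

/* Redo: copy the saved 12 bytes back, as CASE 4 does with offsets 44-55 -> 0-11. */
void state_redo(void) { memcpy(store_state + OFS_SOPH, store_state + OFS_SAVED, 12); }
```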



              • ON CRYPTO

                Here is one way to have a procedural Crypto, as opposed to an object-based Crypto, which gives users of pre-PB9/CC5 compilers access to it.

                There is now no need for a %USECRYPTO = 0 or 1.

                Right, here we go.

                Remove: %USECRYPTO =

                Remove: Macro Rnd2goCrypto and Macro Rnd2stopCrypto

                Add: Declare Function GetCryptoBytes( ByVal Which As Long, ByVal hProvProc As Long, x As Ext) As Long

                Add:
                Code:
                $MS_DEF_PROV = "Microsoft Base Cryptographic Provider v1.0"
                %PROV_RSA_FULL = 1
                %CRYPT_VERIFYCONTEXT = &hF0000000
                %XP = 1
                 
                Function GetCryptoBytes( ByVal Which As Long, ByVal hProvProc As Long, x As Ext ) As Long
                  Dim BinaryByte(9) As Static Byte
                  Static hexSize As Dword
                  hexSize = 10
                  If Which = %XP Then
                    Call Dword hProvProc Using RtlGenRandom( BinaryByte(0), hexSize )
                  Else
                    CryptGenRandom( hProvProc, hexSize, BinaryByte(0) )
                  End If
                  x = Peek( Ext, VarPtr( BinaryByte(0)))
                End Function
                At the head of Rnd2 add:
                Code:
                Static os As OSVERSIONINFO, strOS As String
                Static CryptoXP As Long
                Static hProvProc, hLib As Dword
                In the initialization section add:
                Code:
                os.dwOSVersionInfoSize=SizeOf(os)
                GetVersionEx os
                If os.dwPlatformId = %VER_PLATFORM_WIN32_NT Then
                  strOS = Trim$(Str$(os.dwMajorVersion)) & "." & Trim$(Str$(os.dwMinorVersion))
                  If os.dwMajorVersion > 5 Or (os.dwMajorVersion = 5 And os.dwMinorVersion >= 1) Then ' we have XP/Vista/Server 2003 or 2008
                    CryptoXP = %XP
                  Else
                    CryptoXP = 0
                  End If
                Else
                  CryptoXP = 0
                End If
                 
                ' hProvProc is either an address or provider depending upon whether the user has XP/Vista/Server 03 or 08 or not respectively.
                If CryptoXP = %XP Then
                  hLib = LoadLibrary( "advapi32.dll")
                  hProvProc = GetProcAddress(hLib, "SystemFunction036")
                Else
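                  If CryptAcquireContext( hProvProc, ByVal %Null, ByVal %Null, %PROV_RSA_FULL, %CRYPT_VERIFYCONTEXT ) = 0 Then
                    ' default provider unavailable, so fall back to the named Microsoft Base provider
                    CryptAcquireContext( hProvProc, ByVal %Null, $MS_DEF_PROV, %PROV_RSA_FULL, %CRYPT_VERIFYCONTEXT )
                  End If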
                End If
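The XP-or-later decision amounts to comparing the (major, minor) version pair against 5.1 (Windows XP). A C sketch of doing that numerically; the function name is hypothetical. Comparing the pair as numbers avoids the trap a plain string comparison has once a major version reaches two digits ("10.0" sorts below "5.0").

```c
#include <stdbool.h>

/* True for Windows NT 5.1 (XP) and later: XP, Server 2003 (5.2), Vista/Server 2008 (6.0), ... */
bool is_xp_or_later(unsigned major, unsigned minor) {
    return major > 5 || (major == 5 && minor >= 1);
}
```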
                As you know, my initialization is done in PBMain via Case 7. There is some tidying up to do and I suggest this be done via Case 8 as follows:
                Code:
                Case 8
                  If CryptoXP = %XP Then
                    FreeLibrary hLib
                  Else
                    CryptReleaseContext hProvProc, 0
                  End If
                So, we used to have Rnd2GoCrypto/Rnd2StopCrypto and I now have Rnd2SetUp/Rnd2TidyUp.

                Nearly there.

                In the Case 2 code replace GetCrypto.Bytes(rndE) with GetCryptoBytes(CryptoXP, hProvProc, rndE).

                With regard to timing, I was getting about 15.8 x the 1.000 reference with Crypto and the faster generator, and now I'm getting about 16.1 x, so there's nothing in it. Except, of course, that the other metrics are no longer encumbered by the presence of an object, an encumbrance I was not expecting when I introduced Crypto. The bonus from this method is, as mentioned, that Crypto is now available to users of the earlier compilers.

                PS: The faster generator takes about one quarter of the time of the slower generator.
                Last edited by David Roberts; 7 Nov 2008, 04:27 AM.



                • This relates to Gary's post several back re. the BYVAL function: the problem was not being able to distinguish zero. The solution is to use !LEA as we did originally, rather than !mov, at the top compares. The other changes lower in the function are fine and remain the same. Speed? Well, you be the judge, but better watch your socks again.
                  Code:
                    !lea eax, Two       ;go back to this
                  '  !mov eax, Two      ;don't use this
                    !cmp eax, 0         ;now it's comparing if there's a valid address
                    !jne twoParams      ;IF OPTIONAL parameter Two passed, belt down TO INTEGER Rnd2 CODE section
                    !lea ecx, One       ;same as above for Two
                  '  !mov ecx, One      ;comment out
                    !cmp ecx, 0
                    !je noParams        ;IF no params, boogie down TO noParams CODE
                  Code:
                  PB9 optimize size
                  RND:           1.000  Reference 2799 ms
                  Rnd2 [0,1):    0.636
                  Rnd2 (-1,1):   0.635
                  *****Integer Mode*****
                  RND(1, 52):   1.560
                  Rnd2(1, 52):  0.556
                  RND(-2147483648, 2147483647):   1.565
                  Rnd2(-2147483648, 2147483647):  0.626
                  
                  
                  PB9 optimize speed
                  RND:           1.000  Reference 2674 ms
                  Rnd2 [0,1):    0.668
                  Rnd2 (-1,1):   0.665
                  *****Integer Mode*****
                  RND(1, 52):   1.673
                  Rnd2(1, 52):  0.629
                  RND(-2147483648, 2147483647):   1.634
                  Rnd2(-2147483648, 2147483647):  0.571



                  • John, I've been a bit busy this last few days and came back to knock out the revised Crypto code and may have missed something. I don't recognise the code you changed and it is not clear what previous alterations are to remain.

                    I have just GPF'd.



                    • Originally posted by David Roberts
                      John, I've been a bit busy this last few days and came back to knock out the revised Crypto code and may have missed something. I don't recognise the code you changed and it is not clear what previous alterations are to remain.

                      I have just GPF'd.
                      Dave and Gary, re. post #328 revised code: It doesn't work correctly with anything except 2 parameters as I just discovered, and Dave discovered previously. I still think BYVAL has a good chance so I'm going to test it using the original "how many parameters were passed" determination code shown below:
                      Code:
                        !lea eax, Two
                        !mov ecx, [eax]
                        !cmp ecx, 0
                        !jne twoParams      ;if optional parameter Two passed, belt down to integer Rnd2 code section
                        !lea eax, One
                        !mov ecx, [eax]
                        !cmp ecx, 0
                        !je noParams        ;if no params, boogie down to noParams code
                      added: still testing BYVAL and it seems so far to be ok except for "repeat last random value" which I'm looking into. Here is a timing:
                      Code:
                      PB9 OPTIMIZE SPEED
                      10,000,000 iterations
                      RND:           1.000  Reference 2601 ms
                      Rnd2 [0,1):    0.709
                      Rnd2 (-1,1):   0.691
                      
                      *****Integer Mode*****
                      RND(1, 52):   1.594
                      Rnd2(1, 52):  0.612
                      RND(-2147483648, 2147483647):   1.645
                      Rnd2(-2147483648, 2147483647):  0.627
                      SIZE optimization was even faster.

                      added2: Dang, BYVAL failed again. Gary, you said to abandon it but I couldn't let the speed go. Well, now I have to let it go because it just won't work even with all I've tried. There might be a way that I can't find, but I guess it's time to move on.
                      Last edited by John Gleason; 7 Nov 2008, 01:23 PM. Reason: added timing & 2nd note



                      • John, if you have a look at PB9's 'IsMissing' function you'll see that it doesn't work with Byval either.



                        • Thanks Dave, I'd never seen that function before. That proves it, methinks. I've got a question: Is there a way to call an API function, namely queryPerfCntr, which takes a quad variable, but without declaring a quad variable? I want to do something like queryPerformanceCounter [esi+12] instead of queryPerformanceCounter eq. I wondered if CALL DWORD might somehow be used for that application.



                          • It is true that byval is not going to work and fulfill our design goals. After John's "heads-up" I eventually found failures for several parameter scenarios. And, I learned that it was easy to drop unexpectedly into the Select Case section of the function.

                            Regarding the idea a few posts back about packing all variables into a storeState string, I can't imagine this would be faster. Right now, isn't the compiler precomputing the variable addresses for us? Isn't that faster than the CPU microcode needing to compute the esi+ addresses in real time for each opcode that has to get data in/out of the string?

                            At this point we seem to have a good prng, with options for random seeding, with good features, and which is quite a bit faster than RND. The speeds already exceed the needs for my present applications. Are we ready to pull it together, retest every option once more, re-verify ent & diehard, and wrap it up? I see two variants: (a) David's pre-initialization version, and (b) possibly a more limited self-initialization version.

                            After my mistake on byval, I've switched back to a methodical checklist to validate: RND(), toggled RND(), RND(0), RND(1,52), RND(52,1), RND(-minint,0), RND(-minint,+maxint), RND(0,+maxint), RND(0,0), RND(20,20), RND(-20,-20), RND(-20,-10), RND(-10, -20), and a range of RND(-int).
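Several items on that checklist (RND(1,52), RND(52,1), RND(0,0), RND(20,20), RND(-20,-10), RND(-10,-20) and the full LONG range) exercise the integer path. As a reference model for checking those cases, here is a C sketch of what the twoParams asm in Rnd2min.inc computes: swap the bounds if needed, take range = upper - lower + 1 modulo 2^32 (which wraps to 0 for the full LONG range), and return lower + the high dword of random * range. The function name is hypothetical; the final cast assumes two's-complement conversion, as on GCC/Clang.

```c
#include <stdint.h>

/* Reference model of the twoParams integer path (hypothetical name).
   random is the 32-bit MWC output; one and two are the bounds in either order. */
int32_t rnd2_range(uint32_t random, int32_t one, int32_t two) {
    int32_t lower = one, upper = two;
    if (lower > upper) {                 /* the asm swaps so ECX holds the lower bound */
        int32_t t = lower;
        lower = upper;
        upper = t;
    }
    /* range = upper - lower + 1, computed mod 2^32; wraps to 0 for the full LONG range */
    uint32_t range = (uint32_t)upper - (uint32_t)lower + 1u;
    uint32_t result;
    if (range == 0) {
        result = random;                 /* full range: the random dword is used as-is */
    } else {
        /* high dword of random * range, i.e. floor(random * range / 2^32), like MUL's EDX */
        uint32_t offset = (uint32_t)(((uint64_t)random * range) >> 32);
        result = (uint32_t)lower + offset;
    }
    return (int32_t)result;              /* reinterpret as a signed LONG */
}
```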



                            • I see two variants: (a) David's pre-initialization version, and (b) possibly a more limited self-initialization version.
                              It's funny, that's exactly what I was thinking earlier today: Rnd2, the basic version, and Rnd2C, the crypto version. Both can happily coexist in the include file, and in Rnd2C, Rnd2mize will be taken over by the crypto seed. The syntax of the two could be identical, or at most differ by Rnd2C needing the initial CASE 7 call and cleanup macro. (The actual case numbers could be adjusted if needed.)

                              I got the single declare version going (minus the queryPerfCnt call) and I was surprised to see a pretty big improvement. I don't want to say the % because I need to see if the algo is doing everything correctly first. After a bit more testing, I'll post it.



                              • Rnd2, the basic version, and Rnd2C the crypto version.
                                Why? Post #327 removes the object-oriented Crypto and replaces it with a procedural Crypto. There is no longer any need for an option to turn Crypto off, whether for non-PB9/CC5 users or when it isn't required, because it no longer relies on PB9/CC5 and does not encumber the speed.

                                For non-XP/Vista users, Rnd2mize is about twice as fast as Rnd2mizeCrypto. For XP/Vista users, Rnd2mizeCrypto is about twice as fast as Rnd2mize. The Crypto method available is determined during initialization, whether that is done via PBMain or embedded within Rnd2.

                                My PBMain is simply:
                                Code:
                                Function PBMain() As Long
                                  Rnd2SetUp
                                  ' ......
                                  ' ......
                                  ' ......
                                  Rnd2TidyUp
                                End Function

                                This is similar to when Crypto was using objects.



                                • I wondered if CALL DWORD might somehow be used for that application.
                                  CALL DWORD still uses PB parameters.

                                  Is there a way to call an API function, namely queryPerfCntr, which takes a quad variable, but without declaring a quad variable?
                                  In my case, yes: the TSC and QPC are effectively one and the same thing, so I'd simply use RDTSC.



                                  • Dave asked: Rnd2, the basic version, and Rnd2C the crypto version.
                                    Why? Post #327 removes the object oriented Crypto and replaces it with a procedural Crypto.
                                    I think having a minimum code version like the one below is useful because 1) it matches RND almost identically in usage, 2) the function is self-contained and portable, i.e. it can stand alone without an include file if wanted, and 3) it's as fast as possible.

                                    The speed increase from previous versions is okay, not great, but now this code has got to be just about at the speed limit.

                                    I've tested the heckjeebers out of it, and it matches the older include files exactly in output. The way to use queryPerfCnt and other PB commands without more declared variables is to PEEK and POKE them.

                                    Rnd2min.inc
                                    Code:
                                    '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                    ' Rnd2 Macros
                                    MACRO Rnd2mize           = Rnd2(1)                       'WildWheel© seed the random number generator.
                                    MACRO Rnd2mizeCrypto     = Rnd2(2)                       'cryptographically seed the random number generator
                                    MACRO Rnd2Long           = Rnd2(-2147483648, 2147483647) 'commonly used full LONG range
                                    '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                    '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                    ' Rnd2 Additional Macros
                                    MACRO Rnd2LastNum        = Rnd2(0)                       'get last random generated.
                                    MACRO Rnd2Default        = Rnd2(3)                       'set or reset to default sequence
                                    MACRO Rnd2Redo           = Rnd2(4)                       'repeat from beginning of seq, or last bookmarked position
                                    MACRO Rnd2Mark           = Rnd2(5)                       'save position in a sequence
                                    MACRO Rnd2TogSign        = Rnd2(6)                       'turn on/off EXT random range -1 to 1 exclusive.
                                    '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                    '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                    
                                    FUNCTION Rnd2( OPT One AS LONG, Two AS LONG ) AS EXT
                                     #REGISTER NONE
                                     STATIC storeState AS STRING * 60   'mwc means "multiply with carry" random generator based on theory
                                                                        'by George Marsaglia and is the fastest generator known to man. A man. Me. ;)
                                      !lea esi, storeState      ;all our variables are now in one declare: storeState
                                      !cmp dword ptr[esi+00], 0 ;is mwcSoph 0? if so, this is 1st time thru & storeState must be filled with data
                                      !jne short passOneTime
                                         GOSUB gsDefaultState
                                         GOSUB gsStoreSeq
                                     passOneTime:
                                    
                                      !lea eax, Two
                                      !mov ecx, [eax]
                                      !cmp ecx, 0
                                      !jne twoParams      ;if optional parameter Two passed, belt down to integer Rnd2 code section
                                      !lea eax, One
                                      !mov ecx, [eax]
                                      !cmp ecx, 0
                                      !je noParams        ;if no params, boogie down to noParams code
                                    
                                      'we got here so there must be 1 parameter
                                    
                                            SELECT CASE AS LONG One
                                    
                                              CASE 1          ' Rnd2mize. tot possible unique sequences = &hffffffff * &hedffffff * 384
                                                              ' which = 6.6 * 10^21. This excess of initial sequences allows you to
                                                              ' freely use Rnd2mize/Rnd(1) at will to give WildWheel numbers.
                                                 GOSUB gsTSCplusQPC      'add time stamp counter and queryPerfCounter, both !ror 1
                                                 !mov ecx, 384           ;384 possible sophie-germain multipliers
                                                 !mov edx, 0             ;clear edx for division
                                                 !div ecx                ;random remainder (MOD 384) in edx now
                                                 !lea ecx, multiplier    ;codeptr to 384 multipliers
                                                 !mov ecx, [ecx+edx*4]   ;got rnd soph germain multiplier now
                                                 !mov [esi+00], ecx       ;save it
                                                 SLEEP 0                 'considered QPC too because rarely, this can slow 20x. Can be like idle priority.
                                                 GOSUB gsTSCplusQPC      'add time stamp counter and queryPerfCounter, after both !ror 1
                                                 !mov [esi+04], eax      ;save. Now make sure mwcCarry <> mwcSoph(from above) - 1 or 0.
                                                '---------------------
                                                 !cmp dword ptr[esi+08], 0        ;if value is already present, use it as a "memory" of previous loops
                                                 !ja short mwcCarNot0
                                                 SLEEP 0
                                                 GOSUB gsTSCplusQPC      'add time stamp counter and queryPerfCounter, both !ror 1
                                                 !mov [esi+08], eax      ;4 billion+ carrys now possible
                                                 !jnc short mwcCarNot0   ;ok if <> 0 after add, but if there's a carry, add a constant to be sure it's <> 0
                                                 !mov dword ptr[esi+08],&h1234abcd;it was zero! so make it your choice of a constant
                                               mwcCarNot0:
                                                 !mov ecx, [esi+00]
                                                 !sub ecx, 1             ;mwcSoph - 1
                                                 !cmp [esi+08], ecx      ;is carry >= mwcSoph - 1 ?
                                                 !jb  short mwcAllOk     ;if not, we're done
                                                 !and dword ptr[esi+08],&h7fffffff;make it less than mwcSoph - 1
                                               mwcAllOk:
                                                                ' Shuffle mwcRandom a bit more and shuffle mwcCarry
                                                 !mov eax, [esi+00]   ;'(mwcSoph * 2^31 - 1) & (mwcSoph * 2^32 - 1) are both prime (Sophie Germain prime).
                                                 !mov ecx, [esi+04]   ;'Get previous random
                                                 !mul ecx             ;'Multiply mwcSoph * mwcRandom
                                                 !ADD eax, [esi+08]   ;'Add previous carry
                                                 !adc edx, 0          ;'Add possible carry bit from low dword addition
                                                 !mov [esi+04], eax   ;'save our 32-bit random value for next round
                                                 !mov [esi+08],  edx  ;'saving carry for next round
                                                 GOSUB gsStoreSeq
                                    
                                              CASE 0            ' Repeat the last number generated
                                                !fld tbyte [esi+34]  ;load rndE
                                                !fstp FUNCTION       ;pop it to FUNCTION
                                    '           above 2 asm statements do this: FUNCTION = rndE ' will, of course, be zero if no rndE calculated yet
                                                EXIT FUNCTION
                                    
                                              CASE 6           'make it 50/50 + and -
                                                 !not dword ptr[esi+56] ;togScope = NOT togScope
                                    
                                              CASE <= -1       ' Guarantees a non-duplicate sequence for each integer -1 thru -2147483648
                                                               ' so you can select your own repeatable sequences from the 2Gig available. This gives a
                                                               ' similar functionality to the PB statement "RANDOMIZE number" but without dupe sequences.
                                                !push esi
                                                POKE DWORD, VARPTR(storeState), PEEK(DWORD, CODEPTR(multiplier) - (One MOD 384) * 4)
                                                POKE DWORD, VARPTR(storeState) + 4, &h55b218da - One
                                                POKE DWORD, VARPTR(storeState) + 8, &h3fe8700c
                                                !pop esi
                                                'the above 3 POKE statements do this:
                                    '            mwcSoph = PEEK(DWORD, CODEPTR(multiplier) - (One MOD 384) * 4)
                                    '            mwcRandom = &h55b218da - One 'assures maximum possible 2147483648 unique sequences
                                    '            mwcCarry  = &h3fe8700c
                                                GOSUB gsStoreSeq
                                                !jmp noParams
                                    
                                              CASE 3                ' Rnd2Default seq
                                                 GOSUB gsDefaultState
                                                 GOSUB gsStoreSeq
                                    
                                              CASE 4                ' Rnd2Redo  repeat from beginning of seq, or last bookmarked position
                                                                    ' read saved values from storeState string.
                                                 !mov ecx, [esi+44]         ;mwcSoph
                                                 !mov edx, [esi+48]         ;mwcRandom
                                                 !mov [esi+00], ecx         ;Save mwcSoph In storeState str.
                                                 !mov ecx, [esi+52]         ;mwcCarry to ecx
                                                 !mov [esi+04], edx         ;Save mwcRandom
                                                 !mov [esi+08], ecx         ;Save mwcCarry
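                                                  'the above asm copies the bookmarked values at offsets 44-55 back to the live state at 0-11:
                                    '              mwcSoph   (offset 0) = PEEK(DWORD, VARPTR(storeState) + 44)
                                    '              mwcRandom (offset 4) = PEEK(DWORD, VARPTR(storeState) + 48)
                                    '              mwcCarry  (offset 8) = PEEK(DWORD, VARPTR(storeState) + 52)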
                                    
                                              CASE 5                ' Rnd2Mark is like a bookmark
                                                GOSUB gsStoreSeq
                                    
                                              CASE ELSE
                                                GOTO noParams          ' 2 or any number > 6 (no CASE for these here), so just generate another EXT rnd
                                    
                                            END SELECT
                                    
                                          FUNCTION = 0 ' OK
                                          EXIT FUNCTION
                                    
                                    '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                    '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                      twoParams:   ' both optional parameters passed so it's an integer range which can be quickly calculated...
                                                   ' note: if One=Two, then implied 'division' gets us back to One result
                                                               ' Get a 32-bit random value.
                                          !mov eax, [esi+00]   ;'(mwcSoph * 2^31 - 1) & (mwcSoph * 2^32 - 1) are both prime (Sophie Germain prime).
                                          !mov ecx, [esi+04]   ;'Get previous random
                                          !mul ecx             ;'Multiply mwcSoph * mwcRandom
                                          !ADD eax, [esi+08]   ;'Add previous carry
                                          !adc edx, 0          ;'Add possible carry bit from low dword addition
                                          !mov [esi+04], eax   ;'save our 32-bit random value for next round
                                          !mov [esi+08], edx   ;'saving carry for next round
                                                                'got random 32 bits
                                          !mov edx, eax        ;'<<<< hold 32-bit random value in EDX
                                          !mov ecx, One        ;'NOW, evaluate parameters. Get byref address to parameter One
                                          !mov eax, Two        ;'Get byref address to parameter Two
                                          !mov ecx, [ecx]      ;'dereference One.  Expected to be LOWER
                                          !mov eax, [eax]      ;'dereference Two
                                          !cmp ecx, eax        ;'is One > Two?
                                          !jl short Now1LT2    ;'jump over swap, already Two > One
                                          !xchg eax, ecx       ;'Had to switch them, now ecx has LOWER bound as it should
                                    
                                         Now1LT2:
                                          !SUB eax, ecx        ;'now eax is range    [ RANGE (before +1) ]
                                          !inc eax             ;'add 1 to range. Result is correct for two params, and 0 if max range
                                          !jz short doTheRnd   ;'jump if we incremented &hFFFFFFFF up to 0; we have maximum range
                                         'rngNotMax:             At this point ECX = lower,  EAX = range, EDX = random
                                          !mul edx             ;'random * range+1 == edx:eax. Use edx as result as if /(2^32).
                                          !add edx, ecx        ;'add lower bound to rand result in edx
                                        doTheRnd:
                                          !mov [esi+22], edx      ;'store dword result as signed long integer. Need this to load FPU.
                                          !fild dword ptr[esi+22] ;'signed integer load our result to the FPU
                                          !fld st(0)              ;'create duplicate of rndL in FPU
                                          !fstp tbyte [esi+34]    ;'pop extended result to rndE, in case of later Rnd2redo
                                          !fstp FUNCTION          ;'pop extended result for function result
                                          EXIT FUNCTION
                                    '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                    '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                    'GOTO noParams   'doesn't get executed, but may help PB w/ alignment
                                      noParams:  'No parameters, so calc extd rnd2. (v.GDR20081106a)
                                        !mov eax, [esi+00]      'eax << soph
                                        !mul dword ptr[esi+04]  'edx:eax << soph * rand0
                                        !ADD eax, [esi+08]      'eax << eax + carry0 ; this is rand1
                                        !mov [esi+26], eax      'eq(low) << rand1
                                        !adc edx, 0             'edx << edx + cf ; this is carry1
                                        !mov ecx, edx           'ecx << edx ; holding our carry1
                                        !mul dword ptr[esi+00]  'edx:eax << rand1 * soph
                                        !ADD eax, ecx           'eax << eax + carry1 ; this is rand2
                                        !adc edx, 0             'edx << edx + cf ; this is carry2
                                        !mov [esi+04], eax      'mwcRandom << rand2
                                        !lea ecx, [esi+26]      'load address for eq ; this is fastest sequence for P5
                                        !mov [ecx+4], eax       'eq(high) << rand2
                                        !mov [esi+08], edx      'mwcCarry << carry2
                                        ''' end of optimized 2x thru mwc w/ saves to mwcRandom, mwcCarry, and eq.
                                    
                                      okQuadRnd:                'now we have a complete rand quad in eq
                                        !fild qword [esi+26]      ;float load eq which is now a complete quad random
                                        !cmp dword ptr[esi+56], 0 ;do we need to make it 50/50 + and - ?
                                        !jne short rndNeg         ;jump if we do.
                                        !fabs                     ;make it +
                                      rndNeg:
                                        '---------------------  'now take the quad and divide by 9223372036854775816, and it perfectly ranges from [0,1) or (-1,1)
                                        !fld tbyte [esi+12]     ;float load factor1 (~1/9223372036854775816, set up in gsDefaultState)
                                        !fmulp st(1),st(0)      ;eq * factor1 scales rnd to [0,1) or (-1,1). This = eq / 9223372036854775816 but is much faster
                                        !fld st(0)              ;copy final answer
                                        !fstp tbyte [esi+34]    ;save copy
                                        !fstp FUNCTION          ;save FUNCTION
                                    
                                    EXIT FUNCTION
                                    '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                     gsTSCplusQPC:       'Performs time stamp cntr + queryPerformCntr. gs prefix means this is a GOSUB.
                                       !cpuid            ;serialize
                                       !dw &h310f        ;read time stamp counter
                                       !ror eax, 1       ;smooth because some processors are always even low bit
                                       !mov [esi+22],eax ;save in rndL because QPC overwrites registers
                                       !push esi
                                        QueryPerformanceCounter PEEK(QUAD, VARPTR(storeState) + 26)
                                       !pop esi
                                       !mov eax, [esi+26];LO dword
                                       !ror eax, 1       ;smooth, low bit might always be even on some cpu's
                                       !add eax, [esi+22];smoothed QPC + saved smoothed TSC
                                     RETURN
                                    
                                     gsStoreSeq: ' 'save original starting point of sequence in storeState string.
                                                 !mov ecx, [esi+00]         ;mwcSoph
                                                 !mov edx, [esi+04]         ;mwcRandom
                                                 !mov [esi+44], ecx         ;Save mwcSoph In storeState str.
                                                 !mov ecx, [esi+08]         ;mwcCarry to ecx
                                                 !mov [esi+48], edx         ;Save mwcRandom
                                                 !mov [esi+52], ecx         ;Save mwcCarry
                                     RETURN
                                    
                                     gsDefaultState: 'Initialize generator and divisor factor. gs prefix means this is a GOSUB.
                                                 !mov dword ptr[esi+00], 4221732234   ; mwcSoph
                                                 !mov dword ptr[esi+04], &ha5b218da   ; mwcRandom can be any LONG except &hffffffff and 0
                                                 !mov dword ptr[esi+08], &h3fe8700c   ; mwcCarry can be any number except (mwcSoph-1) and 0
                                                 !mov dword ptr[esi+12], &hffffffef   ; factor1 space to hold the constant 1084202172485504433 * 10E-36 (yes I even made it 19 digits)
                                                 !mov dword ptr[esi+16], &hffffffff   ; factor1 ...-36 save in rndE. This is a binary image of the extended precision ...-36
                                                 !mov dword ptr[esi+20], &h3fbf       ; factor1 only 2 bytes, but just move whole dword for speed
                                             'rndL is dword ptr[esi+22] LONG
                                             '  eq is dword ptr[esi+26] QUAD
                                             'rndE is dword ptr[esi+34] EXT
                                       'storeState is dword ptr[esi+44] STRING * 12
                                         'togScope is dword ptr[esi+56] LONG
                                     RETURN
                                    '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                    '#align 4
                                    multiplier: 'a bunch of Sophie Germain primes. Each makes its own unique sequence of ~2^63 LONG random values
                                    !DD 4055108289, 4183291290, 4066422669, 4010830218, 4144557798, 4047225099, 4169878863, 4156378278
                                    !DD 4068734223, 4013003148, 4128794349, 4044603045, 4147482834, 4050081738, 4169007204, 4084483623
                                    !DD 4182936234, 4061167764, 4116557445, 4184835774, 4098609075, 4000058700, 4005596580, 4131991143
                                    !DD 4026365904, 4082490609, 4170263943, 4064971044, 4192040679, 4069686423, 4112450355, 4116008373
                                    !DD 4051352658, 4131639393, 4026209880, 4143019908, 4057560153, 4153038378, 4178347353, 4101943515
                                    !DD 4163817693, 4126770675, 4122227184, 4150506573, 4124871525, 4097114355, 4171215009, 4094254353
                                    !DD 4185190458, 4184112513, 4187782989, 4037092584, 4114448259, 4096721880, 4003880118, 4035500259
                                    !DD 4080989598, 4090215738, 4104202098, 4144153608, 4027213065, 4112123319, 4029634383, 4188620745
                                    !DD 4003957254, 4158202674, 4165028370, 4101889029, 4064867064, 4056294705, 4117302630, 4094813610
                                    !DD 4089078504, 4072584339, 4075250574, 4144182519, 4020827805, 4077052605, 4012941570, 4114015830
                                    !DD 4015303260, 4012049835, 4031934513, 4123667379, 4025171265, 4149021864, 4020494469, 4152989853
                                    !DD 4141465314, 4050172164, 4130534940, 4124347128, 4155032220, 4123523313, 4038610005, 4066391700
                                    !DD 4052359893, 4138494750, 4046848368, 4015233183, 4065337650, 4181156010, 4149686553, 4115669703
                                    !DD 4080411408, 4029985884, 4072279314, 4136476293, 4102312674, 4148638644, 4020161274, 4056852945
                                    !DD 4084467288, 4090139205, 4152479904, 4129623354, 4189154793, 4042650633, 4113056934, 4070634510
                                    !DD 4172190345, 4012616748, 4092782529, 4042027470, 4034320863, 4017110193, 4128178095, 4005317820
                                    !DD 4121565819, 4160465475, 4093432608, 4094047308, 4092039654, 4132108680, 4160799915, 4109110719
                                    !DD 4190254803, 4063105479, 4123739478, 4086096945, 4113466908, 4169157873, 4036670034, 4035486873
                                    !DD 4154194098, 4074334704, 4006945965, 4119880785, 4050935955, 4131729105, 4170646809, 4191996963
                                    !DD 4055775498, 4029162399, 4118132214, 4116397584, 4121266560, 4102454433, 4146555864, 4103353149
                                    !DD 4119974010, 4080379233, 4192378968, 4061071950, 4104928533, 4042978743, 4188739878, 4066717740
                                    !DD 4017709695, 4027617453, 4110604308, 4107339654, 4076278878, 4077074274, 4097495403, 4179562659
                                    !DD 4187765853, 4187454249, 4015793904, 4083863454, 4078492929, 4166495943, 4101303048, 4149525330
                                    !DD 4095286830, 4078227909, 4189944624, 4010811645, 4032304584, 4151394078, 4044317298, 4136517915
                                    !DD 4198354635, 4192501860, 4073134869, 4060180830, 4076815050, 4190613315, 4142749785, 4122567564
                                    !DD 4071542523, 4024430004, 4122798648, 4041267495, 4006243575, 4092566124, 4141397349, 4175565558
                                    !DD 4159829190, 4173505479, 4084339563, 4085131608, 4081507743, 4069428324, 4011038568, 4092438129
                                    !DD 4005482298, 4020895359, 4127615184, 4162803795, 4038272028, 4123171464, 4199942199, 4067713245
                                    !DD 4129181838, 4021766328, 4141102845, 4002607668, 4051580310, 4082443044, 4078962945, 4072199883
                                    !DD 4180693749, 4040763375, 4025696004, 4066226853, 4013137770, 4084688994, 4081465923, 4185884010
                                    !DD 4184193840, 4095653625, 4071642489, 4003011123, 4021708860, 4038391383, 4003548888, 4016275635
                                    !DD 4051483344, 4052001093, 4131504594, 4129105653, 4187278653, 4058921709, 4167113355, 4106971188
                                    !DD 4074045393, 4069825200, 4009724565, 4120937589, 4119577560, 4151390115, 4000637598, 4088788530
                                    !DD 4014859458, 4003633353, 4192075623, 4009856424, 4048255155, 4100175633, 4129717695, 4012882215
                                    !DD 4119226824, 4122492603, 4074693864, 4062187338, 4022104890, 4186039455, 4191285474, 4165800789
                                    !DD 4047934929, 4045886208, 4028478450, 4098395724, 4095869853, 4004229753, 4110500373, 4188458055
                                    !DD 4093944063, 4122368673, 4136075109, 4024434645, 4145270010, 4121262090, 4051650480, 4076720613
                                    !DD 4057135713, 4053301650, 4074379569, 4103950185, 4146078999, 4029125490, 4036104003, 4122595203
                                    !DD 4173008610, 4155931704, 4048316175, 4178853645, 4049069715, 4187855514, 4193714559, 4132340133
                                    !DD 4001184978, 4087342068, 4038996009, 4032782589, 4103313705, 4057212699, 4094324010, 4117022988
                                    !DD 4016133978, 4057176333, 4081210119, 4183410330, 4054406019, 4008415374, 4131217578, 4049176725
                                    !DD 4033804230, 4154677353, 4194818769, 4057689999, 4065887250, 4083913149, 4160269749, 4148719650
                                    !DD 4086572148, 4079152770, 4198797849, 4025836533, 4121774838, 4114818903, 4193265369, 4005720123
                                    !DD 4172736744, 4113446385, 4153872675, 4022863908, 4169665353, 4080875223, 4148976378, 4158173325
                                    !DD 4012107315, 4146530883, 4042645638, 4189878099, 4075365840, 4053276279, 4112504730, 4144260888
                                    !DD 4102144035, 4181673825, 4171915968, 4123257354, 4032551355, 4054454535, 4132616253, 4057321905
                                    !DD 4174490559, 4165419468, 4169862234, 4116771594, 4009920498, 4164231630, 4163597154, 4181713095
                                    !DD 4000268439, 4077171264, 4045424718, 4116626304, 4052701140, 4140380880, 4027965249, 4102323183
                                    
                                    END FUNCTION
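For readers who don't speak x86, here is a rough C sketch of what twoParams does. It is my own reading of the asm, not John's or Gary's code: `mwc_state`, `mwc_next` and `rnd2_range` are hypothetical names, and the three fields stand in for mwcSoph, mwcRandom and mwcCarry at [esi+00], [esi+04] and [esi+08]. One 32-bit multiply-with-carry step gives the random dword; the bounds are swapped if needed; the range (hi - lo + 1) wraps to 0 for the full LONG range, in which case the raw dword is returned; otherwise the high dword of random * range is added to the lower bound.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C mirror of the first three dwords of storeState. */
typedef struct {
    uint32_t soph;   /* mwcSoph,   [esi+00] */
    uint32_t random; /* mwcRandom, [esi+04] */
    uint32_t carry;  /* mwcCarry,  [esi+08] */
} mwc_state;

/* One multiply-with-carry step: soph * random + carry, low dword becomes the
   new random, high dword the new carry. The sum cannot overflow 64 bits. */
static uint32_t mwc_next(mwc_state *s)
{
    uint64_t t = (uint64_t)s->soph * s->random + s->carry;
    s->random = (uint32_t)t;
    s->carry  = (uint32_t)(t >> 32);
    return s->random;
}

/* Sketch of twoParams: a random LONG in [min(one,two), max(one,two)]. */
static int32_t rnd2_range(mwc_state *s, int32_t one, int32_t two)
{
    uint32_t r = mwc_next(s);
    if (one > two) {              /* swap so that one holds the lower bound */
        int32_t tmp = one;
        one = two;
        two = tmp;
    }
    /* two - one + 1 modulo 2^32; 0 means the full LONG range */
    uint32_t range = (uint32_t)two - (uint32_t)one + 1u;
    if (range == 0)
        return (int32_t)r;        /* full range: use all 32 random bits */
    /* high dword of r * range lies in [0, range - 1] */
    uint32_t offset = (uint32_t)(((uint64_t)r * range) >> 32);
    return (int32_t)((int64_t)one + offset);
}
```

Taking the high dword of the 64-bit product avoids a DIV, which is why twoParams is so cheap. The mapping has the same tiny non-uniformity a MOD would have when the range doesn't divide 2^32 evenly. The `(int32_t)r` cast relies on two's-complement conversion, which GCC, Clang and MSVC all provide.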

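A similar C sketch of the noParams path, again with hypothetical names and assuming a compiler with `unsigned __int128` (GCC or Clang on a 64-bit target). `mwc_quad` runs the generator twice and packs rand2:rand1 into the quad that `fild` then loads as a signed 64-bit integer. `implied_divisor` decodes the extended-precision factor written by gsDefaultState (exponent word &h3fbf, significand &hffffffff_ffffffef) and recovers the 9223372036854775816 quoted in the comments.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t soph, random, carry;   /* mwcSoph, mwcRandom, mwcCarry */
} mwc_state;

/* Top of noParams: two MWC steps; rand1 becomes the low dword and rand2 the
   high dword of eq, and rand2/carry2 are saved for the next call. */
static uint64_t mwc_quad(mwc_state *s)
{
    uint64_t t = (uint64_t)s->soph * s->random + s->carry;
    uint32_t rand1  = (uint32_t)t;
    uint32_t carry1 = (uint32_t)(t >> 32);
    t = (uint64_t)s->soph * rand1 + carry1;
    s->random = (uint32_t)t;             /* rand2  -> mwcRandom */
    s->carry  = (uint32_t)(t >> 32);     /* carry2 -> mwcCarry  */
    return ((uint64_t)s->random << 32) | rand1;
}

/* 80-bit factor: value = significand * 2^(exponent - 16383 - 63).
   Returns floor(1 / value) using exact 128-bit integer division. */
static uint64_t implied_divisor(void)
{
    const uint64_t significand = 0xffffffffffffffefULL;  /* [esi+12]..[esi+19] */
    const int shift = 16383 + 63 - 0x3fbf;               /* = 127 */
    unsigned __int128 one = (unsigned __int128)1 << shift;
    return (uint64_t)(one / significand);
}
```

Why the result stays inside (-1,1): the largest magnitude `fild` can produce is 2^63 (from -2^63), and 2^63 times the factor is (2^64 - 17)/2^64, just under 1. That only holds in the FPU's 64-bit-significand extended precision. In a C `double` the factor rounds to exactly 2^-63, so a port to `double` would need a different mapping, for example the top 53 bits times 2^-53.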

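gsTSCplusQPC boils down to two rotates and an add. Here is a C sketch of just that mixing step. It assumes the caller has already read the low dwords of the time stamp counter and of QueryPerformanceCounter; the reads themselves are platform code and aren't shown. `ror32` and `tsc_qpc_mix` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits, like the x86 ROR instruction. */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x >> n) | (x << (32 - n)) : x;
}

/* Mirror of gsTSCplusQPC: ROR by 1 moves the low bit (always 0 on some
   processors) to the top, then the two smoothed counts are added mod 2^32. */
static uint32_t tsc_qpc_mix(uint32_t tsc_lo, uint32_t qpc_lo)
{
    return ror32(tsc_lo, 1) + ror32(qpc_lo, 1);
}
```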

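The comments claim that each multiplier a in the table makes both a*2^31 - 1 and a*2^32 - 1 prime. Because a*2^32 - 1 = 2(a*2^31 - 1) + 1, this means a*2^31 - 1 is a Sophie Germain prime. Below is a rough C sketch for checking a candidate; it is not the method used to build the table. It is a deterministic Miller-Rabin test for 64-bit integers (the first twelve primes as bases suffice below 2^64) and again assumes `unsigned __int128` is available. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* (a * b) mod m without overflow; needs the GCC/Clang unsigned __int128. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
{
    return (uint64_t)((unsigned __int128)a * b % m);
}

static uint64_t powmod(uint64_t b, uint64_t e, uint64_t m)
{
    uint64_t r = 1 % m;
    b %= m;
    while (e) {
        if (e & 1)
            r = mulmod(r, b, m);
        b = mulmod(b, b, m);
        e >>= 1;
    }
    return r;
}

/* Deterministic Miller-Rabin: bases 2..37 are enough for every n < 2^64. */
static bool is_prime_u64(uint64_t n)
{
    static const uint64_t bases[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37};
    const size_t nb = sizeof bases / sizeof bases[0];
    if (n < 2)
        return false;
    for (size_t i = 0; i < nb; i++)
        if (n % bases[i] == 0)
            return n == bases[i];
    uint64_t d = n - 1;                 /* n - 1 = d * 2^s with d odd */
    int s = 0;
    while ((d & 1) == 0) {
        d >>= 1;
        s++;
    }
    for (size_t i = 0; i < nb; i++) {
        uint64_t x = powmod(bases[i], d, n);
        if (x == 1 || x == n - 1)
            continue;
        bool witness = true;            /* composite unless n - 1 shows up */
        for (int r = 1; r < s && witness; r++) {
            x = mulmod(x, x, n);
            if (x == n - 1)
                witness = false;
        }
        if (witness)
            return false;
    }
    return true;
}

/* The property the Rnd2 comments claim for each multiplier a. */
static bool is_rnd2_multiplier(uint32_t a)
{
    uint64_t p = ((uint64_t)a << 31) - 1;   /* a*2^31 - 1            */
    uint64_t q = ((uint64_t)a << 32) - 1;   /* a*2^32 - 1 == 2*p + 1 */
    return is_prime_u64(p) && is_prime_u64(q);
}
```

Feeding the 384 !DD values (and the default 4221732234) through `is_rnd2_multiplier` is one way to check the claim entry by entry.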
                                    • I think having a minimum code version like that below is useful because 1) It matches RND almost identically in usage, 2) The function is self-contained and portable, ie. it can stand alone without an include file if wanted, and 3) It's as fast as possible.
                                      Hmmm, yerrrs. In post #316 the Crypto code was effectively stunned within a #IF 0/#ENDIF block; in post #337 it was throttled in its sleep.

                                      I am, perhaps uncharacteristically, going to say no more.

                                      For those of you who would like a Crypto seed, read on.

                                      Add to the declarations in Rnd2:
                                      Code:
                                      Static Buffer As Ext
                                      Static os As OSVERSIONINFO, strOS As String
                                      Static CryptoXP As Long
                                      Static hProvProc, hLib As Dword
                                      Then add, at module level (outside Rnd2):
                                      Code:
                                      '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                      $MS_DEF_PROV = "Microsoft Base Cryptographic Provider v1.0"
                                      %PROV_RSA_FULL = 1
                                      %CRYPT_VERIFYCONTEXT = &hF0000000
                                      %XP = 1
                                      
                                      Function GetCryptoBytes( ByVal Which As Long, ByVal hProvProc As Long, x As Ext ) As Long
                                        Dim BinaryByte(9) As Static Byte
                                        Static hexSize As Dword
                                        hexSize = 10
                                        If Which = %XP Then
                                          Call Dword hProvProc Using RtlGenRandom( BinaryByte(0), hexSize )
                                        Else
                                          CryptGenRandom( hProvProc, hexSize, BinaryByte(0) )
                                        End If
                                        x = Peek( Ext, VarPtr( BinaryByte(0)))
                                      End Function
                                      '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                      At the head of Rnd2 between 'GOSUB gsStoreSeq' and passOneTime:

                                      Add:
                                      Code:
                                      ' Crypto initialization
                                      !push esi
                                      os.dwOSVersionInfoSize=SizeOf(os)
                                      GetVersionEx os
                                      If os.dwPlatformId = %VER_PLATFORM_WIN32_NT Then
                                        strOS = Trim$(Str$(os.dwMajorVersion)) & "." & Trim$(Str$(os.dwMinorVersion))
                                        If strOS > "5.0" Then ' we have XP/Vista/Server 2003 or 2008
                                          CryptoXP = %XP
                                        Else
                                          CryptoXP = 0
                                        End If
                                      Else
                                        CryptoXP = 0
                                      End If
                                       
                                      If CrypToXP = %XP Then
                                        hLib = LoadLibrary( "advapi32.dll")
                                        hProvProc = GetProcAddress(hLib, "SystemFunction036")
                                      Else
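                                        ' try the default provider first and fall back to MS_DEF_PROV only if that fails,
                                        ' so that only one context is acquired (and released in Case 8)
                                        If CryptAcquireContext( hProvProc, ByVal %Null, ByVal %Null, %PROV_RSA_FULL, %CRYPT_VERIFYCONTEXT ) = 0 Then
                                          CryptAcquireContext( hProvProc, ByVal %Null, $MS_DEF_PROV, %PROV_RSA_FULL, %CRYPT_VERIFYCONTEXT )
                                        End If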
                                      End If
                                      !pop esi
                                      ' End Crypto initialization
                                      Add the following two CASEs
                                      Code:
                                      Case 2
                                      reSeedCrypt:
                                        !push esi
                                        GetCryptoBytes(CryptoXP, hProvProc, Buffer)
                                        !pop esi
                                        !lea ecx, Buffer
                                        !mov eax, [ecx]
                                        !mov [esi+4], eax                 ; mwcRandom
                                        !mov eax, [ecx+4]
                                        !mov [esi+8], eax                 ; mwcCarry
                                        !movzx eax, Word Ptr[ecx+8]       ; the last two bytes of the 10 collected
                                        !mov ecx, 384                     ; 384 possible Sophie Germain multipliers
                                        !mov edx, 0                       ; clear edx for division
                                        !div ecx                          ; random remainder (mod 384) in edx now
                                        !lea ecx, multiplier              ; pointer to the 384 multipliers
                                        !mov ecx, [ecx+edx*4]             ; got a random Sophie Germain multiplier now
                                        !mov [esi], ecx                   ; mwcSoph
                                        !Sub ecx, 1                       ; mwcSoph - 1
                                        !cmp [esi+8], ecx                 ; does mwcCarry = mwcSoph - 1?
                                        !je mwcCarrySophFail              ; yes
                                        !cmp Dword Ptr[esi+8], 0          ; is mwcCarry = 0?
                                        !jne OK                           ; no, so mwcRandom doesn't matter
                                        !cmp Dword Ptr[esi+4], 0          ; is mwcRandom = 0?
                                        !je reSeedCrypt                   ; mwcCarry and mwcRandom are both 0, so reseed
                                        !jmp OK                           ; mwcCarry = 0 but mwcRandom is not
                                      mwcCarrySophFail:
                                        !cmp Dword Ptr[esi+4], &hffffffff ; does mwcRandom = &hffffffff?
                                        !je reSeedCrypt                   ; mwcCarry = mwcSoph - 1 and mwcRandom = &hffffffff, so reseed
                                      OK:
                                        GoSub gsStoreSeq
                                       
                                      Case 8
                                        If CryptoXP = %XP Then
                                          FreeLibrary hLib
                                        Else
                                          CryptReleaseContext hProvProc, 0
                                        End If
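The reseed logic in Case 2 can be sketched in C as follows. Assumptions: the 10 bytes are whatever RtlGenRandom/CryptGenRandom returned (the sketch takes them as a parameter rather than calling either API), they are read little-endian as on x86, and the multiplier table is passed in rather than embedded. `seed_from_bytes` and `le32` are hypothetical names. A `false` return corresponds to the jump back to reSeedCrypt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t soph, random, carry; } mwc_state;

/* Little-endian dword from 4 bytes, as x86 loads [ecx] and [ecx+4]. */
static uint32_t le32(const uint8_t *b)
{
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Returns false when the state would be degenerate, i.e. reseed. */
static bool seed_from_bytes(const uint8_t bytes[10], const uint32_t *table,
                            size_t table_len, mwc_state *out)
{
    assert(table_len > 0);
    uint32_t random = le32(bytes);                          /* bytes 0-3 */
    uint32_t carry  = le32(bytes + 4);                      /* bytes 4-7 */
    uint16_t pick   = (uint16_t)(bytes[8] | bytes[9] << 8); /* movzx word */
    uint32_t soph   = table[pick % table_len];              /* div, remainder */

    if (carry == soph - 1 && random == 0xffffffffu)
        return false;                     /* the mwcCarrySophFail case */
    if (carry == 0 && random == 0)
        return false;                     /* all-zero state */
    out->soph = soph;
    out->random = random;
    out->carry = carry;
    return true;
}
```

Unlike the asm, which stores mwcSoph, mwcRandom and mwcCarry before checking them, the sketch only writes the state on success; since a failure loops back and overwrites everything anyway, the outcome is the same.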
                                      John forgot to remove 'MACRO Rnd2mizeCrypto = Rnd2(2)'.

                                      John is not using CASE 7, which in my version does the initialization from PBMain.

                                      Case 8 has been added and can be called by adding 'Macro Rnd2TidyUp = Rnd2(8)'.

                                      Case 8 is required even though Case 7 is not used.

                                      On XP/Vista/Server 2003/2008 the advapi32.dll handle obtained with LoadLibrary should be released with FreeLibrary; on other systems the CryptoAPI context obtained with CryptAcquireContext should be released with CryptReleaseContext.
                                      Last edited by David Roberts; 10 Nov 2008, 05:12 AM. Reason: !push esi/!pop esi @ begin/end of Crypto initialization



                                      • .... and if you want to initialize from PBMain, remove the label passOneTime: and all the code above it except '!lea esi, storeState', so that it now looks like this:
                                        Code:
                                        !lea esi, storeState
                                        !lea eax, Two
                                        !mov ecx, [eax]
                                        !cmp ecx, 0
                                        .....
                                        .....
                                        Introduce Case 7:
                                        Code:
                                        Case 7
                                          !lea esi, storeState      ;All our variables are now In one Declare: storeState
                                          GoSub gsDefaultState
                                          GoSub gsStoreSeq
                                          ' Crypto initialization
                                          !push esi
                                          os.dwOSVersionInfoSize=SizeOf(os)
                                          GetVersionEx os
                                          If os.dwPlatformId = %VER_PLATFORM_WIN32_NT Then
                                            strOS = Trim$(Str$(os.dwMajorVersion)) & "." & Trim$(Str$(os.dwMinorVersion))
                                            If strOS > "5.0" Then ' we have XP/Vista/Server 2003 or 2008
                                              CryptoXP = %XP
                                            Else
                                              CryptoXP = 0
                                            End If
                                          Else
                                            CryptoXP = 0
                                          End If
                                         
                                          If CrypToXP = %XP Then
                                            hLib = LoadLibrary( "advapi32.dll")
                                            hProvProc = GetProcAddress(hLib, "SystemFunction036")
                                          Else
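                                            ' try the default provider first and fall back to MS_DEF_PROV only if that fails,
                                            ' so that only one context is acquired (and released in Case 8)
                                            If CryptAcquireContext( hProvProc, ByVal %Null, ByVal %Null, %PROV_RSA_FULL, %CRYPT_VERIFYCONTEXT ) = 0 Then
                                              CryptAcquireContext( hProvProc, ByVal %Null, $MS_DEF_PROV, %PROV_RSA_FULL, %CRYPT_VERIFYCONTEXT )
                                            End If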
                                          End If
                                          !pop esi
                                        Then add the macro 'Macro Rnd2SetUp = Rnd2(7)'.

                                        In PBMain we now have:
                                        Code:
                                        InitializeTimer
                                        Rnd2SetUp
                                        .....
                                        .....
                                        .....
                                        Rnd2TidyUp



                                        • Just for the record this is what I'm now getting.

                                          The three rows in each block are Rnd2(), Rnd2(1,52) and Rnd2Long; the figures in brackets are relative to RND.

                                          Code:
                                                         PB8               PB9
                                           
                                          Single   267.17ms (0.821)  249.11ms (0.766)
                                                   323.02ms (0.992)  262.79ms (0.766)
                                                   309.81ms (0.952)  266.97ms (0.821)
                                           
                                          Dual     226.28ms (0.691)  245.66ms (0.758)
                                                   240.23ms (0.733)  290.65ms (0.895)
                                                   180.87ms (0.552)  301.53ms (0.928)
                                          I had some strange results and contradictions when Crypto was in object mode.

                                          In Single Core operation PB9 now beats PB8.

                                          In Dual Core operation PB8 has the edge.

                                          Your esi approach, John, gave a boost.

                                          No alignment tweaking has been tried yet, but I have a feeling there are no great gains left to be had there.

                                          Regardless of how we slice the cake, all relatives are less than 1.000. This was never the aim; period and resolution were. To achieve the aim and beat RND in the speed stakes as well is quite an achievement, and my hat goes off to John and Gary for their relentless pursuit in the asm domain. Well done, boys.

                                          Added: The earlier nop tweaking with PB9/Dual, which gave significant improvements, is no longer beneficial. Thank goodness for that.
                                          Last edited by David Roberts; 10 Nov 2008, 06:56 AM.
