slow register variables

  • Theo Gottwald
    replied
    Let's see: after 11 years, I believe this topic is still of interest.

    Let's try this one without REGISTER variables:

    Code:
    FUNCTION PBMAIN () AS LONG
    
        LOCAL tc AS LONG
        #REGISTER NONE
    '=====================================================================
    
        tc = GetTickCount()
    
        LOCAL k AS LONG, i AS LONG
        LOCAL j AS LONG
    
        FOR i = 1 TO 30000
            FOR j = 1 TO 30000
                k += 7
            NEXT j
        NEXT i
    
        ? STR$(GetTickCount - tc)
    
    END FUNCTION
    
    '=====================================================================
    RESULT: 2000 Ticks

    Now we use REGISTER variables.
    Shouldn't that be faster?

    Code:
    FUNCTION PBMAIN () AS LONG
    
        LOCAL tc AS LONG
    
    '=====================================================================
    
        tc = GetTickCount()
    
        REGISTER i AS LONG, j AS LONG
        LOCAL k AS LONG
    
        FOR i = 1 TO 30000
            FOR j = 1 TO 30000
                k += 7
            NEXT j
        NEXT i
    
        ? STR$(GetTickCount - tc)
    
    END FUNCTION
    
    '=====================================================================
    RESULT: 2047 Ticks

    It's even slower!

    Maybe this is due to some details of how the FOR loop is implemented.

    If you do not want to pay those extra 47 ticks, just use a DO ... LOOP with an IF test instead.

    Now let's unleash the REAL POWER of PB REGISTER variables!

    We do this by making k a REGISTER variable and turning i back into an ordinary LOCAL.


    Code:
    FUNCTION PBMAIN () AS LONG
    
        LOCAL tc AS LONG
    
    '=====================================================================
    
        tc = GetTickCount()
    
        REGISTER k AS LONG, j AS LONG
        LOCAL i AS LONG
    
        FOR i = 1 TO 30000
            FOR j = 1 TO 30000
                k += 7
            NEXT j
        NEXT i
    
        ? STR$(GetTickCount - tc)
    
    END FUNCTION
    
    '=====================================================================

    RESULT: 516 ticks using PB REGISTER variables!
    That's about 4 times as fast as without REGISTER variables (2000 ticks)!

    Making a loop variable a REGISTER variable only pays off if the loop variable is also used often inside the loop.

    It is best to put the most frequently accessed variable into a register.
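    For readers weighing Andrew's option of moving the code to C, here is a minimal C sketch of Theo's three variants, assuming C's `register` storage class as the nearest analogue of PB's REGISTER. The function names are hypothetical. In C, `register` is only a request that the compiler may ignore, and an optimizing compiler makes its own register choices, so the timings will not necessarily reproduce the PB results; what carries over is the rule of hinting the most frequently accessed variable.

```c
#include <stdint.h>

/*
 * Three versions of Theo's nested loop. Unsigned arithmetic is used
 * because 7 * 30000 * 30000 does not fit in 32 bits; unsigned values
 * wrap modulo 2^32 in C instead of overflowing.
 * `register` is only a hint: the compiler may keep any of these
 * variables in memory or in registers regardless of the keyword.
 */

/* Version 1: no register hints (compare #REGISTER NONE). */
uint32_t loop_no_hints(uint32_t n)
{
    uint32_t i, j, k = 0;
    for (i = 1; i <= n; i++)
        for (j = 1; j <= n; j++)
            k += 7;
    return k;
}

/* Version 2: only the loop counters are register candidates. */
uint32_t loop_counters_hinted(uint32_t n)
{
    register uint32_t i, j;
    uint32_t k = 0;
    for (i = 1; i <= n; i++)
        for (j = 1; j <= n; j++)
            k += 7;
    return k;
}

/* Version 3: the accumulator k and the inner counter j are register
 * candidates, mirroring REGISTER k AS LONG, j AS LONG. */
uint32_t loop_accumulator_hinted(uint32_t n)
{
    register uint32_t k = 0, j;
    uint32_t i;
    for (i = 1; i <= n; i++)
        for (j = 1; j <= n; j++)
            k += 7;
    return k;
}
```

    All three return the same value (7 * n * n, wrapped to 32 bits), so only the storage choice differs between them.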
    Last edited by Theo Gottwald; 18 Aug 2011, 02:59 AM.


  • Bud Durland
    replied
    It's because the font with the posting date is very small, and although I have a pair of glasses, my curiosity hasn't overcome my vanity.


  • Brice Manuel
    replied
    Originally posted by Stuart McLachlan:
    I think we have a record: a possible solution posted more than 11 years after a question.

    11 years might be a necroposting record.


  • Stuart McLachlan
    replied
    Originally posted by Bud Durland:
    I'm very much shooting from the hip here, but what if you use

    Code:
    k = 7&

    I think we have a record: a possible solution posted more than 11 years after a question.


  • Bud Durland
    replied
    Originally posted by Dennis Pearson:
    So, the answer is whatever the compiler does with "K".

    I'm very much shooting from the hip here, but what if you use

    Code:
    k = 7&


  • John Pfuetze
    replied
    Thanks. Your many posts helped me use REGISTER in my code. The PB manual was not sufficiently clear for me.


  • Tom Hanlin
    replied
    Eh heh heh. Yes, they have, and your memory is sound. The comparison is also apt: one of my products got bought out and merged into PDQ. I'm responsible for PDQ's support of far strings and VB/DOS. Eh, but it's been a few years. Thanks for remembering!

    I've made some use of Modula-2, but the absurdly strict type checking was too much to take for long. It's not a language designed to empower the programmer so much as to lock him in a cage. The idea of "units" as a library implementation technique has its points but, other than that, I don't think Modula-2 contributed anything worthwhile to language design. It was more powerful than QuickBASIC in some key respects, although today's PB/DOS would give it a licking.

    Most of my work has been in one flavor or another of assembly language, BASIC, C, and Pascal. I've specialized in writing code libraries for about the last 16 years. FWIW, I'm bringing some of them back online at www.tgh3.com, and expect to start making some neat PowerBASIC tools once I've gotten the old stuff out of the way.

    ------------------
    Tom Hanlin
    PowerBASIC Staff


  • Steve Hutchesson
    replied
    Tom,

    Unrelated to the subject, I know the name "Tom Hanlin" from about
    10 years ago as a supplier of aftermarket libraries for a number
    of different BASIC products. Somewhere in the mist, the use of
    Modula-2 rings a bell and, if I remember correctly, they were among
    the best in the market, in much the same class as Ethan
    Winer's PDQ product.

    I wonder whether PowerBASIC has scored one of the original BASIC
    gurus for the team?

    Regards,

    ------------------


  • Steve Hutchesson
    replied
    Thanks for that little gem, Tom; it's surprising what happens when you
    actually bother to read the manual. Dropping the bridged jump saves 3 cycles
    in a single-pipeline test environment, but sad to say, we live in a
    multi-pipeline world, so you don't pick up any speed from the lower cycle
    count, whether the test loop stays within signed byte range or needs the
    longer jump encoding.

    ==========================================================================

    Intel Manual 2 Jxx

    The target instruction is specified with a relative offset (a signed
    offset relative to the current value of the instruction pointer in the EIP
    register). A relative offset (rel8, rel16, or rel32) is generally
    specified as a label in assembly code, but at the machine code level, it
    is encoded as a signed, 8-bit or 32-bit immediate value, which is added to
    the instruction pointer. Instruction coding is most efficient for offsets
    of -128 to +127. If the operand-size attribute is 16, the upper two bytes
    of the EIP register are cleared to 0s, resulting in a maximum instruction
    pointer size of 16 bits.

    ==========================================================================
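    The rel8 rule in the quoted passage can be made concrete with a small C sketch (the function names are hypothetical). It assumes 32-bit code, where a conditional jump is either the 2-byte short form (opcode 70-7F plus an 8-bit displacement) or the 6-byte near form (0F 80-8F plus a 32-bit displacement), and an unconditional JMP is either EB with an 8-bit displacement (2 bytes) or E9 with a 32-bit displacement (5 bytes). The displacement is counted from the end of the jump instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a displacement fits the signed 8-bit (rel8) form: -128..+127. */
static bool fits_rel8(int64_t disp)
{
    return disp >= -128 && disp <= 127;
}

/* Encoded size in bytes of a conditional jump (Jcc) placed at address
 * `at` and targeting address `target`, in 32-bit code.
 * Short form: opcode 70..7F + rel8 (2 bytes); the displacement is taken
 * from the end of that 2-byte instruction.
 * Near form: 0F 80..8F + rel32 (6 bytes), used when rel8 cannot reach. */
int jcc_size(int64_t at, int64_t target)
{
    return fits_rel8(target - (at + 2)) ? 2 : 6;
}

/* Same for an unconditional JMP: EB + rel8 (2 bytes) or E9 + rel32 (5 bytes). */
int jmp_size(int64_t at, int64_t target)
{
    return fits_rel8(target - (at + 2)) ? 2 : 5;
}
```

    A backward jump at the bottom of a loop whose body is longer than about 128 bytes therefore costs 6 bytes instead of 2, but it still needs no bridge. On the 8086 and 80286, conditional jumps existed only in the short form, which is where the bridged-jump habit comes from.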


    This is unfortunate, as it looked like it had potential, but it seems
    the only advantage is coding convenience, not performance.

    The following test piece was run on the PIII I am using at the moment. It
    was subsequently modified to make the loop longer than 128 bytes,
    and the times remained so close that they overlapped. CPUID is a
    serializing instruction, so the block of CPUID instructions makes sure
    there is no shadowing from previous instructions.

    Code:
          tc& = GetTickCount()
        
          ! xor edx, edx
        
        label1:
          ! push edx
          ' ---------------------------------
          ' Block of non pairing instructions
          ' ---------------------------------
            ! db &H0F,&HA2  ; cpuid
            ! db &H0F,&HA2  ; cpuid
            ! db &H0F,&HA2  ; cpuid
            ! db &H0F,&HA2  ; cpuid
            ! db &H0F,&HA2  ; cpuid
            ! db &H0F,&HA2  ; cpuid
            ! db &H0F,&HA2  ; cpuid
            ! db &H0F,&HA2  ; cpuid
            ! db &H0F,&HA2  ; cpuid
            ! db &H0F,&HA2  ; cpuid
          ! pop edx
          ! inc edx
          ! cmp edx, 1000000
          ' ! je  stOut         ; with bridge   1985 ms
          ' ! jmp label1
        
          ! jne label1          ; without       1982 ms
        
        stOut:

    Regards,

    ------------------


  • Tom Hanlin
    replied
    Originally posted by Steve Hutchesson:
    You normally use a bridged jump when the code in the loop is over
    128 bytes long which is not very hard to do in application code.

    Pleasantly enough, this is no longer much of an issue. In 16-bit code, conditional jumps do have a range limit of about 128 bytes, which can be a real nuisance. In 32-bit code, though, jumps can take a signed 32-bit offset, so "bridged jumps" aren't needed nearly as often.

    ------------------
    Tom Hanlin
    PowerBASIC Staff


  • Steve Hutchesson
    replied
    I ran the pasted piece of code below and found the opposite result on the
    PIII 600 I am using: the register variables for the FOR/NEXT loop were
    about 1.5 seconds faster than stack variables.

    For comparison, I wrote a small loop in ASM with the same number
    of iterations and assignments, using registers for both the counter and
    the assignment, and it tests marginally slower.

    I am inclined to agree with Lance about the limited value of testing such
    a small loop: other factors come into play, and the loop with the fewest
    instructions does not always run fastest. Depending on the hardware,
    processor caching and instruction pairing make small loop testing very
    unreliable.

    Results will start to get a lot more useful when there is some code within
    the loops that is long enough to defeat the close caching of values. The
    increase in code length will also make the differences in loop methods
    much smaller as the calculation code is a lot slower than integer
    incrementing counters.
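    The point about giving the loop real work matters even more with an optimizing compiler: a loop that only does `k = 7` has no observable effect, so a C compiler, for example, is allowed to delete it entirely, and the timing then measures nothing. A minimal C sketch, with hypothetical names `bench_sink` and `work_loop`, makes each iteration depend on the previous one and publishes the result through a `volatile` object so the loop cannot be discarded.

```c
#include <stdint.h>

/* Stores to a volatile object are observable side effects, so the
 * compiler must perform the final store below and therefore must
 * produce the value that is stored. */
volatile uint32_t bench_sink;

/* Nested loop whose accumulator depends on every iteration, unlike
 * repeatedly assigning a constant. Unsigned arithmetic wraps, so the
 * multiply cannot overflow in the undefined-behaviour sense. */
uint32_t work_loop(uint32_t n)
{
    uint32_t acc = 0;
    for (uint32_t i = 1; i <= n; i++) {
        for (uint32_t j = 1; j <= n; j++) {
            acc = acc * 31u + (i ^ j);
        }
    }
    bench_sink = acc;  /* publish the result so the loop is not dead code */
    return acc;
}
```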

    Once you get past the effects peculiar to very small loops,
    the fastest way to build a loop is still a label and a "JMP" to it at the
    end. You normally use a bridged jump (a short conditional jump over an
    unconditional JMP) when the code in the loop is over 128 bytes long,
    which is not very hard to do in application code.

    Code:
      
        Label:
        ' ------------------------
        ' Put any length code here
        ' ------------------------
          ! cmp var&, number    ; exit condition test
          ! je ExitLoop
        ! jmp Label             ; unconditional jump to label
        ExitLoop:
      
    '=====================================================================
        
            tc& = GetTickCount()
        
            DIM i AS LONG , j AS LONG, k AS LONG  ' 6146 ms
            ' Register i as long, j as long, k as long  ' 4565 ms
        
            FOR i = 1 TO 30000
            FOR j = 1 TO 30000
            k = 7
            NEXT j
            NEXT i
        
          MessageBox hWin,ByCopy str$(GetTickCount - tc&),"Button  1", _
                     %MB_OK or %MB_ICONINFORMATION
        
    '=====================================================================
        
            tc& = GetTickCount()
        
            ! xor eax, eax
            ! xor edx, edx
        
          Loop1:
            ! mov edx, 7
            ! inc eax
            ! cmp eax, 900000000      ; 4575 ms
            ! jne Loop1
        
          MessageBox hWin,ByCopy str$(GetTickCount - tc&),"Button  2", _
                     %MB_OK or %MB_ICONINFORMATION
    
    '=====================================================================
    Regards,

    ------------------


  • Lance Edmonds
    replied
    Then please show us exactly the code that you want to optimize. There are heaps of ways to optimize code, but unless we can see the real code, we are going to go round and round in general discussions of CPU performance comparisons and generic techniques, and we may overlook the best optimization suggestions for *your* code.

    Register variables can certainly offer significant performance gains but they have to be used 'correctly' - as this thread shows, you can get performance differences between brands and types of processors which can obscure optimization results in some cases.

    In addition, (indexed) pointers and inline assembly can be used to optimize code heavily. Another possible time-wasting trap is the method you use to pass data between VB and PB:

    If you are continually passing data back and forth between PB and VB, that will seriously affect performance: it is better to pass large blocks (passing the data less often) than to frequently pass many small blocks of data. Again, we can't make explicit optimization suggestions unless we can see your code.
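    The cost described here comes from a fixed overhead on every call across the VB/PB boundary, so fewer, larger calls win. A minimal C sketch of the two interface styles, where a hypothetical counter `boundary_calls` stands in for that per-call overhead (no real VB or PB interop is involved, and all names are illustrative):

```c
#include <stddef.h>

/* Hypothetical counter standing in for the fixed cost of each call
 * across a language boundary (for example VB calling into a PB DLL). */
static unsigned long boundary_calls = 0;

/* Per-item interface: one boundary crossing per value. */
double scale_one(double x)
{
    boundary_calls++;
    return x * 2.0;
}

/* Block interface: one boundary crossing for the whole array. */
void scale_block(double *xs, size_t n)
{
    boundary_calls++;
    for (size_t i = 0; i < n; i++)
        xs[i] *= 2.0;
}

/* Caller-side loop using the per-item interface; returns the number
 * of crossings it caused. */
unsigned long scale_each(double *xs, size_t n)
{
    unsigned long before = boundary_calls;
    for (size_t i = 0; i < n; i++)
        xs[i] = scale_one(xs[i]);
    return boundary_calls - before;
}

/* Same work through the block interface; returns the crossings caused. */
unsigned long scale_all(double *xs, size_t n)
{
    unsigned long before = boundary_calls;
    scale_block(xs, n);
    return boundary_calls - before;
}
```

    Both paths do the same arithmetic, but the per-item path pays the crossing cost once per element while the block path pays it once per array.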

    BTW, performing 100,000,000 assignments of the value 7 to a variable is not what I would call real-world, but maybe you are writing benchmark software?!


    ------------------
    Lance
    PowerBASIC Support


  • Tom Hanlin
    replied
    The point is, Andrew, that we can't offer you generic advice in this case. Any advice might be completely wrong, depending on the CPU in the machine you're using. In order to find what works best on your machine, you need to test it on your machine. This would not be any different if you were to use C instead of PowerBASIC.

    Based on recent informal testing of math-related routines, we discovered that the CPU type seemed to be a key factor. With a pair of tests run on a plain Pentium, test 1 was much faster than test 2. On a Pentium III, test 1 was much slower than test 2. On an AMD K6, the tests ran at the same speed! So, you see, there is no optimization you can make that is going to work best on all machines all the time.
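    Testing on your own machine can be turned into a small harness. A minimal C sketch, assuming the standard `clock()` function as the timer and hypothetical names `best_of` and `sample_workload`: it times a workload several times and keeps the fastest run, which filters out some of the noise from other programs running at the same time.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical workload; the volatile store keeps the loop from being
 * optimized away. Replace it with the code you actually want to compare. */
static volatile uint32_t workload_sink;

void sample_workload(void)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < 1000000u; i++)
        acc = acc * 31u + i;
    workload_sink = acc;
}

/* Run fn `runs` times and return the fastest run, in seconds of
 * processor time as reported by clock(). Returns -1.0 if runs <= 0. */
double best_of(void (*fn)(void), int runs)
{
    double best = -1.0;
    for (int r = 0; r < runs; r++) {
        clock_t start = clock();
        fn();
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || secs < best)
            best = secs;
    }
    return best;
}
```

    Calling `best_of(variant_a, 5)` and `best_of(variant_b, 5)` (both hypothetical) on each target machine gives a per-machine comparison; as the Pentium, Pentium III and K6 results above suggest, the winner may differ from one CPU to the next.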

    ------------------
    Tom Hanlin
    PowerBASIC Staff


  • Guest
    replied
    Thanks for all the feedback. Can I just remind you where we started?

    I am trying to find out why PowerBASIC is only 4 times faster than Visual Basic on maths-intensive code.

    To investigate this, I looked at the recommendations about register variables, integers, longs, etc.

    When I create a skeleton program to see the effect of explicitly choosing my register variables, I find that register variables are 2-3 times slower than normal variables.

    for i = 1 to 10000
    for j= 1 to 10000
    k = 7
    next j
    next i

    It runs in 15 seconds with i, j and k as LONGs or INTEGERs,
    and in 40 seconds with i and j as REGISTER variables and k as a LONG.

    Based on this result, I have no way forward to try to improve the speed of my code, apart from abandoning PowerBASIC and moving the code to C.

    I would really appreciate any light you can shed on this. It's a real problem, not a theoretical discussion of processor problems.


    ------------------


    [This message has been edited by Andrew Askwith (edited April 13, 2000).]


  • Colin Schmidt
    replied
    I do find it interesting to hear this finally. As my whole office is AMD, as well as most of my clients, I have noticed that many of the "old rules and tricks" do not seem to apply.

    What I mean is that clean, small code is still faster than bulky code, but you don't need to play with all the little things to make it run faster. Register variables, the order of the declarations, and other such things appear to make little difference. I still use register variables where appropriate, though, as I'm certain that on some systems they will drastically speed up the code.

    By the way, AMD is reporting record sales and market increases for their first quarter!

    Colin Schmidt

    ------------------
    Colin Schmidt & James Duffy, Praxis Enterprises, Canada


  • Lance Edmonds
    replied
    Folks, some informal testing suggests that the key difference in execution times can be related to the brand of CPU you are using... the AMD processors appear to be less sensitive to the order of the declared local variables and register variables as compared to some of the Intel versions.

    I'll repeat this: these tests were not scientific, but I found the results quite interesting anyway. It would appear that we may just be seeing how differently the various CPUs perform with certain types of code...

    Comments?

    ------------------
    Lance
    PowerBASIC Support


  • Colin Schmidt
    replied
    While I was running many other tasks at the same time, I received almost identical times for DIM and REGISTER using Enoch's code.

    Win98 on an AMD-350

    Colin Schmidt

    ------------------
    Colin Schmidt & James Duffy, Praxis Enterprises, Canada


  • Guest
    replied
    Originally posted by Enoch S Ceshkovsky:
    Perhaps there is a problem with your timing code....

    Thanks for the timing code; I didn't know about the performance counter API calls.

    But running your code, the timings on my machine stay about the same:

    Dim i as long, j as long
    Dim k as long ..................... 19 seconds

    Register i as long, j as long
    Dim k as long ..................... 43 seconds

    Maybe there is something strange about my machine (P200 MMX, 64 MB RAM, Windows 95),
    although it runs AutoCAD, Office, etc. just fine.

    PS: My original timings were done with a clockwork stopwatch, so the agreement is pretty good!

    ------------------




    [This message has been edited by Andrew Askwith (edited April 11, 2000).]


  • Guest
    replied
    Perhaps there is a problem with your timing code.
    Code:
    #COMPILE EXE
    #REGISTER NONE
    #DIM ALL
    DECLARE FUNCTION QueryPerformanceCounter LIB "KERNEL32.DLL" ALIAS "QueryPerformanceCounter" (lpPerformanceCount AS QUAD) AS LONG
    DECLARE FUNCTION QueryPerformanceFrequency LIB "KERNEL32.DLL" ALIAS "QueryPerformanceFrequency" (lpFrequency AS QUAD) AS LONG
    FUNCTION PBMAIN()
        LOCAL A AS QUAD, B AS QUAD, OverHead AS QUAD, Freq AS QUAD
        LOCAL ResultsCode AS EXT
        LOCAL strText AS STRING
    
        QueryPerformanceFrequency Freq
    
        QueryPerformanceCounter A
        QueryPerformanceCounter B
        OverHead = B - A
    
        REGISTER i AS LONG, j AS LONG
        DIM k AS LONG
        QueryPerformanceCounter A
        FOR i = 1 TO 30000
        FOR j = 1 TO 30000
          k = 7
        NEXT j
        NEXT i
        QueryPerformanceCounter B
        ResultsCode = (B - A - OverHead) / Freq
    
        strText = "Results: " & FORMAT$(ResultsCode,"###0.####") & " seconds." & CHR$(13,10)
        MSGBOX strText
    END FUNCTION

    Results on a P2-266, 256 MB, NT Server 4.0:
    REGISTER i & j time: ~6.8504 seconds
    DIM i & j time: ~13.9399 seconds
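    The measurement above subtracts the cost of two back-to-back counter reads and divides by the counter frequency. A minimal C sketch of the same arithmetic, assuming POSIX `clock_gettime(CLOCK_MONOTONIC, ...)` as a portable stand-in for `QueryPerformanceCounter` (the helper names are hypothetical; on Windows the QueryPerformanceCounter/QueryPerformanceFrequency calls from the post would be used instead):

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime under strict C modes */
#include <stdint.h>
#include <time.h>

/* Frequency of the nanosecond ticks returned by now_ticks(). */
#define TICKS_PER_SECOND 1000000000LL

/* Monotonic tick count in nanoseconds; stands in for QueryPerformanceCounter. */
int64_t now_ticks(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * TICKS_PER_SECOND + ts.tv_nsec;
}

/* Same arithmetic as ResultsCode = (B - A - OverHead) / Freq. */
double elapsed_seconds(int64_t a, int64_t b, int64_t overhead, int64_t freq)
{
    return (double)(b - a - overhead) / (double)freq;
}

/* Cost of two back-to-back counter reads, measured like OverHead. */
int64_t timer_overhead(void)
{
    int64_t a = now_ticks();
    int64_t b = now_ticks();
    return b - a;
}
```

    For very short spans, `elapsed_seconds(a, b, timer_overhead(), TICKS_PER_SECOND)` can come out slightly negative, because the measured overhead itself varies from call to call.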



    ------------------


  • Guest
    replied
    Originally posted by Eric Pearson:
    Andrew --

    The K=7 operation is apparently slower when K is a REGISTER variable. Without that assignment, i.e. if you time just the FOR/NEXT loops, then using REGISTER variables instead of LOCALs is about 50% faster. Using INCR K instead of K=7 is also 50% faster when REGISTERs are used, so it looks like it's the numeric assignment that is slowing you down.

    -- Eric


    Thanks, Eric, but I don't think that's the whole story.

    I agree that if I delete the k = 7 line, the register variables run much faster than local DIMs,

    but if I leave the k = 7 line in and make i and j REGISTER and k a local DIM, it runs 2-3 times slower than with all local DIMs.

    Timings on my machine are:

    Dim i as long, j as long
    Dim k as long
    FOR i = 1 TO 30000
    FOR j = 1 TO 30000
    k = 7
    NEXT j
    NEXT i
    ...................................... 18 secs

    Register i as long, j as long
    Dim k as long
    FOR i = 1 TO 30000
    FOR j = 1 TO 30000
    k = 7
    NEXT j
    NEXT i
    ...................................... 42 secs

    It's almost as if the register variables get "pushed off" if there is *any* work to do.


    ------------------


    [This message has been edited by Andrew Askwith (edited April 11, 2000).]
