slow register variables


  • slow register variables

    The following code runs 2-3 times slower with the "register variables" line in. Any hints or ideas why?

    (I am playing with this to try to speed up my PowerBASIC code, as in math-intensive routines it is only 4-5 times quicker than VB.)

    #COMPILE EXE
    #REGISTER NONE

    FUNCTION PBMAIN
        DIM i AS LONG, j AS LONG, k AS LONG
        ' optionally use..... REGISTER i AS LONG, j AS LONG, k AS LONG

        FOR i = 1 TO 30000
            FOR j = 1 TO 30000
                k = 7
            NEXT j
        NEXT i
        MSGBOX "Done"

    END FUNCTION

  • #2
    Andrew --

    The K=7 operation is apparently slower when K is a REGISTER variable. Without that assignment, i.e. if you time just the FOR/NEXT loops, then using REGISTER variables instead of LOCALs is about 50% faster. Using INCR K instead of K=7 is also 50% faster when REGISTERs are used, so it looks like it's the numeric assignment that is slowing you down.

    -- Eric

    ------------------
    Perfect Sync: Perfect Sync Development Tools

    "Not my circus, not my monkeys."



    • #3
      Unless there has been a change I missed, the manual says:

      In the current version of PowerBASIC, there may be up to two integer class variables (word/dword/integer/long) and up to four extended precision floats.

      So, the answer is whatever the compiler does with "K".


      ------------------


      • #4
        You're only allowed 2 integer register variables.
        Using i & j is more efficient because of the compares & increments being done to them in the loops.

        As Eric points out, K will definitely be faster if you use INC vs. MOV (INCR var vs. var = 7).



        ------------------



        • #5
          Originally posted by Eric Pearson:
          Andrew --

          The K=7 operation is apparently slower when K is a REGISTER variable. Without that assignment, i.e. if you time just the FOR/NEXT loops, then using REGISTER variables instead of LOCALs is about 50% faster. Using INCR K instead of K=7 is also 50% faster when REGISTERs are used, so it looks like it's the numeric assignment that is slowing you down.

          -- Eric


          Thanks Eric, but I don't think that's the whole story.

          I agree that if I delete the k = 7 line, the register variables run much faster than local DIMs,

          but if I leave the k = 7 line in and make i and j REGISTER and k a local DIM, it runs 2-3 times slower than all local DIMs.

          Timings on my machine are:

          Dim i as long, j as long
          Dim k as long
          FOR i = 1 TO 30000
          FOR j = 1 TO 30000
          k = 7
          NEXT j
          NEXT i
          ...................................... 18 secs

          Register i as long, j as long
          Dim k as long
          FOR i = 1 TO 30000
          FOR j = 1 TO 30000
          k = 7
          NEXT j
          NEXT i
          ...................................... 42 secs

          It's almost as if the register variables get "pushed off" if there is *any* work to do.


          ------------------


          [This message has been edited by Andrew Askwith (edited April 11, 2000).]



          • #6
            Perhaps there is a problem with your timing code.
            Code:
            #COMPILE EXE
            #REGISTER NONE
            #DIM ALL
            DECLARE FUNCTION QueryPerformanceCounter LIB "KERNEL32.DLL" ALIAS "QueryPerformanceCounter" (lpPerformanceCount AS QUAD) AS LONG
            DECLARE FUNCTION QueryPerformanceFrequency LIB "KERNEL32.DLL" ALIAS "QueryPerformanceFrequency" (lpFrequency AS QUAD) AS LONG
            FUNCTION PBMAIN()
                LOCAL A AS QUAD, B AS QUAD, OverHead AS QUAD, Freq AS QUAD
                LOCAL ResultsCode AS EXT
                LOCAL strText AS STRING
            
                QueryPerformanceFrequency Freq
            
                QueryPerformanceCounter A
                QueryPerformanceCounter B
                OverHead = B - A
            
                REGISTER i AS LONG, j AS LONG
                DIM k AS LONG
                QueryPerformanceCounter A
                FOR i = 1 TO 30000
                FOR j = 1 TO 30000
                  k = 7
                NEXT j
                NEXT i
                QueryPerformanceCounter B
                ResultsCode = (B - A - OverHead) / Freq
            
                strText = "Results: " & FORMAT$(ResultsCode,"###0.####") & " seconds." & CHR$(13,10)
                MSGBOX strText
            END FUNCTION

            P2-266, 256MB, NT Server 4.0
            REGISTER i & j time: ~6.8504 seconds
            DIM i & j time: ~13.9399 seconds



            ------------------



            • #7
              Originally posted by Enoch S Ceshkovsky:
              Perhaps there is a problem with your timing code....

              Thanks for the timing code, I didn't know about the performance API calls.

              But running your code, the timings on my machine stay about the same:

              Dim i as long, j as long
              Dim k as long.....................19 secs

              Register i as long, j as long
              Dim k as long.....................43 secs

              Maybe something is strange with my machine:
              P200 MMX, 64MB RAM, W95
              although it runs AutoCAD, Office, etc. just fine.


              PS: my original timings were done with a clockwork stopwatch, so the agreement is pretty good!

              ------------------




              [This message has been edited by Andrew Askwith (edited April 11, 2000).]



              • #8
                While I was running many other tasks at the same time, I received almost identical times for both DIM and REGISTER using Enoch's code.

                Win98 on an AMD-350

                Colin Schmidt

                ------------------
                Colin Schmidt & James Duffy, Praxis Enterprises, Canada



                • #9
                  Folks, some informal testing suggests that the key difference in execution times can be related to the brand of CPU you are using... the AMD processors appear to be less sensitive to the order of the declared local variables and register variables than some of the Intel versions.

                  I'll repeat this: these tests were not scientific, but I found the results quite interesting anyway. It would appear that we may just be seeing differences in how the CPUs perform with certain types of code...

                  Comments?

                  ------------------
                  Lance
                  PowerBASIC Support


                  • #10
                    I do find it interesting to hear this finally. As my whole office is AMD, as well as most of my clients, I have noticed that many of the "old rules and tricks" do not seem to apply.

                    What I mean is that clean small code is still faster than bulky code. But you don't need to play with all the little things in order to make it run any faster. Thus the register variables, the order of the declarations, and other such things appear to make little difference. I use my register variables appropriately anyway though, as I'm certain that on some systems they will drastically speed up the code.

                    By the way, AMD is reporting record sales and market increases for their first quarter!

                    Colin Schmidt

                    ------------------
                    Colin Schmidt & James Duffy, Praxis Enterprises, Canada



                    • #11
                      Thanks for all the feedback, can I just remind you where we started.

                      I am trying to find out why PowerBASIC is only 4 times faster than Visual Basic on maths-intensive code.

                      To investigate this I looked at the recommendations about register variables, integers, longs, etc.

                      When I create a skeleton program to see the effect of explicitly choosing my register variables, I find that register variables are 2-3 times slower than normal variables.

                      for i = 1 to 10000
                      for j = 1 to 10000
                      k = 7
                      next j
                      next i

                      runs in 15 secs with i, j, k as longs or integers,
                      runs in 40 secs with i, j as register, k as long.

                      Based on this result I have no way forward to try and improve the speed of my code, apart from abandoning PowerBASIC and moving the code to C.

                      I would really appreciate any light you can shed on this. It's a real problem, not a theoretical discussion of processor problems.


                      ------------------


                      [This message has been edited by Andrew Askwith (edited April 13, 2000).]



                      • #12
                        The point is, Andrew, that we can't offer you generic advice in this case. Any advice might be completely wrong, depending on the CPU in the machine you're using. In order to find what works best on your machine, you need to test it on your machine. This would not be any different if you were to use C instead of PowerBASIC.

                        Based on recent informal testing of math-related routines, we discovered that the CPU type seemed to be a key factor. With a pair of tests run on a plain Pentium, test 1 was much faster than test 2. On a Pentium III, test 1 was much slower than test 2. On an AMD K6, the tests ran at the same speed! So, you see, there is no optimization you can make that is going to work best on all machines all the time.

                        ------------------
                        Tom Hanlin
                        PowerBASIC Staff



                        • #13
                          Then please show us exactly the code that you want to optimize. There are heaps of ways to optimize code, but unless we can see the real code, we are going to go round and round in general discussions on CPU performance comparisons and generic techniques, and we may overlook the best optimization suggestions for *your* code.

                          Register variables can certainly offer significant performance gains but they have to be used 'correctly' - as this thread shows, you can get performance differences between brands and types of processors which can obscure optimization results in some cases.

                          In addition, (indexed) pointers and inline assembly can be used to highly optimize code. Another possible time-wasting trap can be the method you use to pass data between VB and PB:

                          If you are continually passing data back & forwards between PB and VB then that will seriously affect performance - it is better to pass large blocks (passing the data less often) than it is to frequently pass many small blocks of data. Again, we can't offer explicit optimization suggestions unless we can see your code.

                          BTW, performing 100,000,000 assignments of the value 7 to a variable is not what I would call real-world, but maybe you are writing benchmark software?!


                          ------------------
                          Lance
                          PowerBASIC Support


                          • #14
                            I ran the pasted piece of code below and found the opposite results on the
                            PIII 600 I am using, the register variables for the For/Next loop were
                            about 1.5 seconds faster than stack variables.

                            To get some comparison, I wrote a small loop in ASM with the same number
                            of loops and assignments using registers for both the counter and the
                            assignments and it tests marginally slower.

                            I am inclined to agree with Lance as to the value of testing such a small
                            loop, in that other things come into play: for one, the minimum
                            instruction count in a loop does not always give the highest speed.
                            Depending on the hardware, processor caching and instruction
                            pairing make small loop testing very unreliable.

                            Results will start to get a lot more useful when there is some code within
                            the loops that is long enough to defeat the close caching of values. The
                            increase in code length will also make the differences in loop methods
                            much smaller as the calculation code is a lot slower than integer
                            incrementing counters.

                            Once you get past the different effects that follow from very small loops,
                            the fastest way to build a loop is still a label and a "JMP" to it at the
                            end. You normally use a bridged jump when the code in the loop is over
                            128 bytes long which is not very hard to do in application code.

                            Code:
                              
                                Label:
                                ' ------------------------
                                ' Put any length code here
                                ' ------------------------
                                  ! cmp var&, number    ; exit condition test
                                  ! je ExitLoop
                                  ! jmp Label           ; unconditional jump to label
                                ExitLoop:
                              
                            '=====================================================================
                                
                                    tc& = GetTickCount()
                                
                                    DIM i AS LONG , j AS LONG, k AS LONG  ' 6146 ms
                                    ' Register i as long, j as long, k as long  ' 4565 ms
                                
                                    FOR i = 1 TO 30000
                                    FOR j = 1 TO 30000
                                    k = 7
                                    NEXT j
                                    NEXT i
                                
                                  MessageBox hWin,ByCopy str$(GetTickCount - tc&),"Button  1", _
                                             %MB_OK or %MB_ICONINFORMATION
                                
                            '=====================================================================
                                
                                    tc& = GetTickCount()
                                
                                    ! xor eax, eax
                                    ! xor edx, edx
                                
                                  Loop1:
                                    ! mov edx, 7
                                    ! inc eax
                                    ! cmp eax, 900000000      ; 4575 ms
                                    ! jne Loop1
                                
                                  MessageBox hWin,ByCopy str$(GetTickCount - tc&),"Button  2", _
                                             %MB_OK or %MB_ICONINFORMATION
                            
                            '=====================================================================
                            Regards,


                            ------------------
                            hutch at movsd dot com
                            The MASM Forum

                            www.masm32.com



                            • #15
                              "You normally use a bridged jump when the code in the loop is over
                              128 bytes long which is not very hard to do in application code."

                              Pleasantly enough, this is no longer much of an issue. In 16-bit code, conditional jumps do have a range limit of about 128 bytes, which can be a real nuisance. In 32-bit code, though, conditional jumps can take a full 32-bit displacement (about ±2 GB), so "bridged jumps" are rarely needed.

                              ------------------
                              Tom Hanlin
                              PowerBASIC Staff



                              • #16
                                Thanks for that little gem Tom, it's surprising what happens when you
                                actually bother to read the manual. It saves 3 cycles in a single pipeline
                                test environment but sad to say, we live in a multi-pipeline world so you
                                don't pick up any speed for the cycle reduction count when a test piece is
                                run in a loop within signed byte range or with the jump length extension.

                                ==========================================================================

                                Intel Manual 2 Jxx

                                The target instruction is specified with a relative offset (a signed
                                offset relative to the current value of the instruction pointer in the EIP
                                register). A relative offset (rel8, rel16, or rel32) is generally
                                specified as a label in assembly code, but at the machine code level, it
                                is encoded as a signed, 8-bit or 32-bit immediate value, which is added to
                                the instruction pointer. Instruction coding is most efficient for offsets
                                of -128 to +127. If the operand-size attribute is 16, the upper two bytes
                                of the EIP register are cleared to 0s, resulting in a maximum instruction
                                pointer size of 16 bits.

                                ==========================================================================


                                This is unfortunate as it looked like it had potential but it seems
                                the only advantage is in coding convenience, not performance.

                                The following test piece was run on the PIII I am using at the moment. It
                                was subsequently modified to make the loop length longer than 128 bytes
                                and the times remained so close that they overlapped. The block of
                                CPUID instructions is to make sure there is no shadowing from previous
                                instructions.

                                Code:
                                      tc& = GetTickCount()
                                    
                                      ! xor edx, edx
                                    
                                    label1:
                                      ! push edx
                                      ' ---------------------------------
                                      ' Block of non pairing instructions
                                      ' ---------------------------------
                                        ! db &H0F,&HA2  ; cpuid
                                        ! db &H0F,&HA2  ; cpuid
                                        ! db &H0F,&HA2  ; cpuid
                                        ! db &H0F,&HA2  ; cpuid
                                        ! db &H0F,&HA2  ; cpuid
                                        ! db &H0F,&HA2  ; cpuid
                                        ! db &H0F,&HA2  ; cpuid
                                        ! db &H0F,&HA2  ; cpuid
                                        ! db &H0F,&HA2  ; cpuid
                                        ! db &H0F,&HA2  ; cpuid
                                      ! pop edx
                                      ! inc edx
                                      ! cmp edx, 1000000
                                      ' ! je  stOut         ; with bridge   1985 ms
                                      ' ! jmp label1
                                    
                                      ! jne label1          ; without       1982 ms
                                    
                                    stOut:
                                Regards,


                                ------------------
                                hutch at movsd dot com
                                The MASM Forum

                                www.masm32.com



                                • #17
                                  Tom,

                                  Unrelated to the subject, I know the name "Tom Hanlin" from about
                                  10 years ago as a supplier of aftermarket libraries for a number
                                  of different BASIC products. Somewhere in the mist, the use of
                                  Modula-2 rings a bell and, if I remember correctly, they were among
                                  the best in the market, in much the same class of stuff as Ethan
                                  Winer's PDQ product.

                                  I wondered if PowerBASIC have scored one of the original BASIC
                                  gurus on the team?

                                  Regards,


                                  ------------------
                                  hutch at movsd dot com
                                  The MASM Forum

                                  www.masm32.com



                                  • #18
                                    Eh heh heh. Yes, they have, and your memory is sound. The comparison is also apt: one of my products got bought out and merged into PDQ. I'm responsible for PDQ's support of far strings and VB/DOS. Eh, but it's been a few years. Thanks for remembering!

                                    I've made some use of Modula-2, but the absurdly strict type checking was too much to take for long. It's not a language designed to empower the programmer so much as to lock him in a cage. The idea of "units" as a library implementation technique has its points but, other than that, I don't think Modula-2 contributed anything worthwhile to language design. It was more powerful than QuickBASIC in some key respects, although today's PB/DOS would give it a licking.

                                    Most of my work has been in one flavor or another of assembly language, BASIC, C, and Pascal. I've specialized in writing code libraries for about the last 16 years. FWIW, I'm bringing some of them back online at www.tgh3.com, and expect to start making some neat PowerBASIC tools once I've gotten the old stuff out of the way.

                                    ------------------
                                    Tom Hanlin
                                    PowerBASIC Staff



                                    • #19
                                      Thanks. Your many posts helped me use REGISTER in my code. The PB manual was not sufficiently clear for me.



                                      • #20
                                        Originally posted by Dennis Pearson:
                                        So, the answer is whatever the compiler does with "K".
                                        I'm very much shooting from the hip here, but what if you use

                                        Code:
                                        k = 7&
                                        Real programmers use a magnetized needle and a steady hand
