Faster, quicker, optimized


  • Faster, quicker, optimized

    I've just spent a couple of hours browsing these forums, looking for ideas to speed up my code. There are a lot of suggestions, but no real collection of principles - except for the few in the help file.

    I know this is a big subject, but I thought it might be useful to try to get suggestions in one place - perhaps an FAQ?

    Here are a few tips I've collected:

    0) It's most effective to optimize your design rather than expect to do it in compilation (further explanation of this one is requested).
    1) It's often quicker to use global variables than local variables.
    2) Use register variables wherever possible - use the #REGISTER statement.
    3) Order conditional tests so that the short-cuts can be optimized.
    4) Use DWORD and LONG instead of WORD and INTEGER.
    5) Dimensioning arrays: DIM x(100) is faster than DIM x(0 TO 100).
    6) The PROFILE command is a handy way to understand which parts of your program are most sensitive to optimizing.
    7) Program in assembler if you can stand it.
    8) Use the #VERSION statement if possible (I suspect that limiting backward compatibility speeds up the code).

    There are at least as many I've come across that seem more contentious, but still interesting.
    There are surely more suggestions... why not add your favourite tips to this topic so that they are easy to find?
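A quick illustration of tip 3: order tests so the cheap, usually-decisive one runs first. Sketched in Python rather than PowerBASIC since the principle is language-neutral (note that some BASIC dialects evaluate all AND operands, a point this thread returns to later); `expensive_check` is an invented stand-in for a costly test.

```python
def expensive_check(record):
    # stand-in for a costly test (hypothetical example)
    return sum(ord(c) for c in record) % 7 == 0

def is_match_slow(record):
    # the expensive test runs first, even when the cheap length test would fail
    return expensive_check(record) and len(record) > 10

def is_match_fast(record):
    # cheap, usually-false test first: expensive_check is rarely reached
    return len(record) > 10 and expensive_check(record)
```

Both versions return the same answers; the second simply skips the expensive work for most inputs.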

    Phil Greenwood

    Owing to circumstances out of my control, there will not be a big parade this Sunday.

    [This message has been edited by Phil Greenwood (edited February 11, 2005).]

  • #2
    9) FOR/NEXT blocks are faster when the counter is a LONG variable.


    Egrid32 Grid Control - Form Designer for Egrid32 - Print Engine for Egrid32

    [This message has been edited by Elias Montoya (edited February 10, 2005).]


    • #3

      The problem is there is no simple way to make sense of code
      optimisation. At the most generalised level the idea is to do no
      more than you need to, and a whole host of methods are aimed in that
      direction.

      Getting the right algorithm is by far the best way to produce fast
      and efficient code, and this comes well before any optimisation technique.

      When you have what you consider to be the best algo for the job, then
      you can start to look at ways to make it more efficient and do less
      than it may be doing to get its speed up.

      The next trick is to know what to optimise: having the fastest MessageBox
      on the planet is probably not worth the effort, whereas if you have
      code that must perform highly intensive tasks, your work is well
      spent if you can get it to work better.

      Among the tricks you have available is coding some things in assembler,
      because for some tasks it's the easiest and most efficient way to
      perform them.

      The bottom line is get the idea right and then implement it well
      and you have little left to improve on.


      hutch at movsd dot com

      The MASM Forum


      • #4

        While I agree that there are (usually) several ways any one
        routine can be made to run faster, I've always felt that it is
        just as important, or even MORE important, to be able to
        understand the code 6 months down the road than to shave off a
        few nano-seconds.

        Little tricks like: Using LONGs as loop counters and counting
        backwards (to zero) when possible, Dropping some code into ASM,
        REGISTER variables, etc all play a part but.... Unless a given
        routine is used so often and is so heavy into number crunching
        that it is bogging down the whole program I don't waste a lot
        of time looking for a nano or two.

        I'm not a lover of GLOBALs but use them where they're indicated.
        Just make sure you've got some naming convention for them so you
        can see, at a glance, that they're g_GlobalVariables.

        Figure it this way: You spend 2 hours shaving .001 of a second
        off the routine. The routine is called 10 times during normal
        usage of the program in one session. The program is used once a
        day. 10 years from now you will have hit the break-even point
        for time expended, and your user will never notice one way or the
        other.
        Optimize? Sure! But don't get caught up in it. The little 'tricks'
        will come along, one at a time, and drift, quietly, into your code
        over the years.

        The biggest time saver is clean, well documented code with good,
        descriptive variable names. When something goes wrong (and it will)
        6 months, a year, or 5 years from now, you'll be glad you put some
        effort into the coding when you can find and fix the bug, or update
        a feature, in a matter of a few minutes rather than a few days or
        weeks.
        don at DASoftVSS dot com


        • #5
          As indicated in the PB docs, using pointers can speed things up a lot. However I never start with pointers. I always start by using standard PB commands. When I've got the routine working, I then decide if it's worth speeding up. If that's the case then I first use pointers. If that's not enough then (gulp) out comes the asm hat.


          "There are two novels that can change a bookish fourteen-year old's life: The Lord of the Rings and Atlas Shrugged. One is a childish fantasy that often engenders a lifelong obsession with its unbelievable heroes, leading to an emotionally stunted, socially crippled adulthood, unable to deal with the real world. The other, of course, involves orcs." - John Rogers


          • #6
            Thanks to everyone so far, and a couple of comments for future postings on this thread:

            What led me to this point is writing a piece of research code that is going to take months to run. I spent a day contemplating improvements and knocked 50% off its run time. However I look at it, that's an economically efficient activity, but the learning curve is steep - I'm not a professional programmer; for me it's a tool for the job - so I don't want to spend months re-learning ASM or C...

            Now if I can do the same thing a couple more times (knock X0% off the run time), I have a real strategic advantage. The problem is that while there are a few clear responses to the question of speed, there are many, many more generalized contemplations about it - "write good algorithms" - but what is a good algorithm? (No offense Steve, your insights are more valuable than that.) The techniques "creep in" - great, but which techniques, and how can we accelerate learning them? Also, every discussion is tinged with other perspectives about clarity, maintainability, etc., which are important, but not really in the context of collecting techniques for writing FAST POWERBASIC.

            I guess I didn't frame the question right in the first place, so my apologies.

            There must be some more tips from people who understand more about how the compilation process works...


            Owing to circumstances out of my control, there will not be a big parade this Sunday.

            [This message has been edited by Phil Greenwood (edited February 11, 2005).]


            • #7
              11) Floating point arithmetic is much slower than integer arithmetic. Use integers wherever possible.

              (On this one, I stand corrected - see below).
              Owing to circumstances out of my control, there will not be a big parade this Sunday.

              [This message has been edited by Phil Greenwood (edited February 11, 2005).]


              • #8
                I like your topic and have certainly learned a couple of things from your thread.

                Here are a few findings from yesterday!
                While trying to find the fastest method of accessing variable-length substrings from a contiguous block of bytes, I found that retrieving 10 million instances of the substring "object" from a contiguous memory block using a memory-mapped file took about half the time it did using a dynamic string. (Fixed-length strings can't hold 60 million bytes.)

                That surprised me!

                Also, MID$ was only 2% slower than PEEK$ when retrieving "object" using the string method (PEEK$ was used with an ASCIIZ pointer).



                • #9
                  11) Floating point arithmetic is much slower than integer arithmetic. Use integers wherever possible.
                  Used to be true, but today it isn't always.
                  Try doing some millions of multiplications between LONGs or SINGLEs: on an AMD Athlon (which has a very fast FPU), for example, SINGLE is faster by around 25%, and DOUBLE is just a bit slower than LONG.
                  On a Pentium 4, by contrast, FP will probably be far slower, because it has a very poor FP unit (unless one can use SSE2).


                  Try Online TrID file identifier! Recognize over 1.500 filetypes and counting...
                  Give a powerfull scriptable Lua interface to your application with PowerBLua
                  PBTracer - Tracer / Profiler for PowerBASIC (soon!)

                  [This message has been edited by Marco Pontello (edited February 11, 2005).]
                  -- The universe tends toward maximum irony. Don't push it.

                  File Extension Seeker - Metasearch engine for file extensions / file types
                  Online TrID file identifier | TrIDLib - Identify thousands of file formats


                  • #10

                    Actually I think Hutch's advice was among the most important and most often overlooked.

                    I'm not a professional programmer either, it's just a tool for my job however I do spend a lot of time doing it, especially over the last few years.

                    Performance-enhancing methods like correct use of data types and pointers are very important, and these skills come to all programmers after a while, but algorithms are not always properly investigated.

                    Things like search and sort algorithms and efficient data structures.

                    A very basic example: if you need to search a list a thousand times, then it pays to sort it first and use a proper search technique like binary searching or interpolated searching. Likewise sorting: I hear a lot of people say "PB array sort is the fastest, you won't beat it!" but PB haven't released their algorithm, so it's not always going to be the fastest. The programmer has the most knowledge about the list and the data, so if he knows his algorithms he can choose a sort algorithm that suits the data. A bubble sort will beat a quicksort if the data is 95% sorted already, and only the programmer will know if this is likely to be the case! Array sort might be the best "all round" method, but that's not optimisation, is it?
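Paul's search point can be sketched in Python, with `bisect` standing in for any binary search (the data here is invented for illustration): sort once up front, then each lookup is an O(log n) probe instead of an O(n) scan.

```python
import bisect

def contains_linear(items, x):
    return x in items                        # O(n) scan per lookup

def contains_sorted(sorted_items, x):
    # one O(log n) binary search per lookup; the list was sorted once up front
    i = bisect.bisect_left(sorted_items, x)
    return i < len(sorted_items) and sorted_items[i] == x

data = list(range(0, 1_000_000, 3))          # already sorted
```

With a thousand lookups, the one-time sort cost is quickly repaid by the cheaper probes.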

                    A lot of people don't spend the time to read about algorithms and subsequently use the wrong one. I think this is the point that Hutch was getting at. The benefits here can be in the x00% or sometimes even the x000% range, so it pays to start here and worry about skimming off the x% later.

                    These are all part of the whole design process at the beginning rather than hunting through existing code at the end for enhancements to make.

                    I think this is also what MCM is referring to when he often says "It's not the paintbrush, it's the artist". A bubble sort is still a bubble sort even if it's written in ASM; getting it to run 20% faster means very little compared to a more appropriate algorithm for the job in question.

                    Paul Dwyer
                    Network Engineer
                    Aussie in Tokyo

                    [This message has been edited by Paul Dwyer (edited February 11, 2005).]


                    • #11
                      A few specifics I've used:
                      1) SELECT CASE X AS CONST / CONST$
                      use the AS CONST option if you know that x's values must
                      be certain constant values:

                      SELECT CASE X AS CONST
                      CASE 8: X = 7
                      CASE 16: X = 15
                      CASE 28: X = 15
                      CASE 30: X = 31
                      CASE 32: X = 0
                      END SELECT

                      is faster than:

                      SELECT CASE X
                      CASE 8: X = 7
                      CASE 16: X = 15
                      CASE 28: X = 15
                      CASE 30: X = 31
                      CASE 32: X = 0
                      END SELECT

                      because the former jumps right to the selected case rather
                      than reading thru them one at a time as the latter example does.
                      If you must use the latter example, put the most likely cases
                      to occur at the top of the list:

                      SELECT CASE X
                      CASE < 8: X = 7 'this is the most common case
                      CASE > 16: X = 15
                      CASE 8: X = 15
                      CASE 10: X = 15
                      CASE 12: X = 15
                      CASE ELSE: X = 31
                      END SELECT

                      2) use the API function QueryPerformanceCounter to time your code
                      to check if your changes have improved speed.

                      QueryPerformanceCounter qtime(1)
                      FOR i = 1 TO 100000
                        X = newFunction(Y)
                      NEXT
                      QueryPerformanceCounter qtime(2)
                      totTime = qtime(2) - qtime(1)

                      3) When writing data to a file, place it in a buffer string
                      first rather than making a lot of small disk writes.

                      4) As everyone says: use pointers and indexed pointers.

                      5) If you do occasionally use a little asm, remember to
                      make the jumps "short" when possible:

                      !push ebx
                      !mov eax, ptrStr
                      !mov ebx, ptrStr2
                      !mov edx, 1000
                      loop01:
                      !mov ecx, [eax+edx]
                      !mov [ebx], ecx
                      !add ebx, 3
                      !dec edx
                      !jns short loop01 ;make the jump short here rather than just using: !jns loop01
                      !pop ebx

                      6) How could I have forgotten this one?: use Macros and Macro
                      Functions when possible. Potentially BIG speed increases especially
                      if used in conjunction with asm, for example in those way inner loops.

                      Back in 1998 Francis Beaulieu posted this speed-tip list.
                      It gives many specifics.
                      'Francis Beaulieu, Member
                      'posted November 08, 1998 09:24 PM

                      There are a lot of tips that you can apply to all Basics.

                      - Try to cache the value of an API call, or store an
                      array element in a variable, before a DO LOOP or a
                      FOR/NEXT loop, if the value returned from the call is
                      always the same:
                      a& = MyApiCall ' or an array(i%) element
                      For x& = 0 to XX&
                        if Array&(x&) = a& then ...
                      Next x&

                      - Use the AND operator instead of MOD when the divisor is a
                      number of the form 2^N. For instance, you can use two methods to
                      extract the least significant byte of an integer or word:
                      lowByte% = value% MOD 256
                      Faster way: lowByte% = value% AND 255
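The same identity, checked in Python; it holds for non-negative values whenever the divisor is a power of two (for negative values the two operators can differ, depending on the language).

```python
value = 0x1234
low_via_mod = value % 256    # general remainder
low_via_and = value & 255    # bit mask: same answer when divisor is 2^N
```

Many compilers perform this rewrite automatically, but doing it by hand guarantees it.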

                      - You can simplify an expression like:
                      If x = 0 And y = 0 Then...
                      to:
                      If (x Or y) = 0 Then...

                      Instead of:
                      If x=21 And y=33 And z=22 Then...
                      If x=21 And y=33 And z=23 Then...
                      cache the common part:
                      a = (x=21 And y=33)
                      If a And z=22 Then...
                      If a And z=23 Then...
                      Much faster still:
                      a = (x=21 And y=33)
                      If a Then
                        If z = 22 Then... (Or Select Case)
                        If z = 23 Then...
                      End If

                      Then, in the case where you have a lot of conditions joined
                      with AND, you can split your IF ... AND ... Then... into:

                      IF ... Then
                        IF ... Then
                        END IF
                      END IF

                      This can roughly double the speed in critical sections.

                      - Two-dimensional arrays are stored by columns/rows in
                      most Basics. When you have to scan an array, the outer
                      For...Next should iterate on the columns, and the inner one
                      should iterate on the rows. That way you access array items
                      at adjacent memory locations.

                      - By replacing If A$ = "" Then... with
                      If Len(A$) = 0 Then... you will gain about 25% in speed.

                      - Consider this example:
                      If A$ = "A" Or A$ = "B" Or A$ = "D" Or _
                      A$ = "Z" Or... Then...
                      You can replace all of that with:
                      If Instr(A$, ANY "ABCDZabcdz") > 0 Then...
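The same collapse of an OR chain, sketched in Python with a set membership test standing in for INSTR ANY (the helper names are invented for the example):

```python
VALID = set("ABCDZabcdz")

def check_chain(a):
    # the long OR chain from the example, abbreviated
    return a == "A" or a == "B" or a == "D" or a == "Z"

def check_set(a):
    return a in VALID   # one membership test replaces the whole chain
```

Beyond speed, the single membership test is also far easier to extend than the chain.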


                      [This message has been edited by John Gleason (edited February 11, 2005).]


                      • #12

                        Lots of good advice in this post.

                        As for specifically what you should do next with your code, as Steve said, the trick is to know what to optimize. Spending the time to figure out where your code is spending most of the execution time is a good investment. Then you can concentrate your efforts where they will make the most economic sense.

                        I haven’t used it myself, but the PROFILE statement might be helpful. And of course there’s the obvious: what’s inside a program loop gets executed more than what’s outside. As the programmer, you should know or be able to estimate how many times each loop will loop, so you scrutinize the highest repeaters.

                        Some knowledge about the inner workings of things can help. For example, repeated string concatenation (A$=A$+B$) can be a real time killer because of all the memory allocation/deallocation that has to go on behind the scenes (even though it looks real efficient when you type it).

                        As for choosing the right algorithm for the job, Paul said it very well. Despite this being really the first and most important step, there don’t seem to be any one- or two-line snippets of advice that can help here.

                        If you can find yourself some good plain English write-ups of both the quicksort and bubble sort algorithms, then once your understanding of how both these algorithms work “clicks on”, you still won’t have the background knowledge of all the different algorithms and techniques out there, but you will at least truly understand how the approach is an order of magnitude more important than the coding details, and have an idea of how one goes about arriving at the most efficient algorithm for the job.
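For readers who want to experiment with the comparison above, here is a minimal bubble sort in Python with the early exit that makes it fast on nearly-sorted data (the property Paul mentioned earlier). A sketch, not tuned code:

```python
def bubble_sort(a):
    """O(n^2) in general, but the early exit makes it ~O(n) on sorted input."""
    a = list(a)               # work on a copy
    n = len(a)
    while True:
        swapped = False
        for i in range(n - 1):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        n -= 1                # the largest element has bubbled to the end
        if not swapped:
            return a          # a full pass with no swaps: already sorted
```

Timing this against the built-in sort on random versus 95%-sorted input makes the "right algorithm for the data" point concrete.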

                        Bill Walter

                        P.S. Does anyone know if INCR A& compiles to an !INC instruction or regular addition?

                        [This message has been edited by Bill Walter (edited February 11, 2005).]


                        • #13
                          It compiles to an INC, from what I have seen.

                          You may use this code snippet to look at similar things:
                          source forum topic: "have a look at what the compiler output"



                          [This message has been edited by Marco Pontello (edited February 11, 2005).]


                          • #14
                            Thanks Marco!

                            Okay, add use "INCR" instead of "+1" to the list.

                            Bill Walter



                            • #15
                              "Figure it this way: You spend 2 hours shaving .001 of a second
                              off the routine. The routine is called 10 times during normal
                              usage of the program in one session. The program is used once a
                              day. 10 years from now you will have hit the break-even point
                              for time expended, and your user will never notice one way or the other."

                              Very good point, Don, and excellently explained. I wholeheartedly agree. I often have the feeling that optimisation is done for optimisation's sake. I know, it's fun from the programmer's perspective. But users would appreciate it more if that time had been spent on error testing or improving usability.




                              • #16
                                This was good in 1999 and still is in 2005, imho.

                                'Chris Boss, Member
                                'posted March 03, 1999 06:11 AM
                                I would like to add some comments on speed
                                optimizing in any Basic language.
                                Increasing speed in any software requires a
                                thorough examination of what you are trying to
                                accomplish.
                                (1) The algorithm is probably more important
                                than anything else. Nothing can replace a good
                                algorithm. Also make sure you never calculate
                                anything more times than necessary (for example
                                the use of precalculated tables in an Array can
                                speed up some algorithms).

                                (2) Test your code on different platforms and
                                different CPUs. Not all computers are made equal.
                                (3) Make sure the algorithm is not device
                                dependent (ex. reading data from a file to do
                                your calculation). If the code requires access to a
                                device, be more concerned with optimizing the
                                device access, since this will usually be your
                                bottleneck. As an example: I had a database
                                routine that accessed a database and did string
                                comparisons and math for a report. I optimized
                                the routine for disk access and math, but the
                                Windows version of the routine was terribly slow,
                                while the DOS version was very fast. I found out
                                the problem was a text counter I had added (so
                                the end user knew something was going on; it
                                counted records being accessed): Windows doesn't
                                do very well with rapidly cycling text (counting
                                from 1 to a number) in any type of control, since
                                there is too much overhead in any screen access
                                in Windows. I had to keep the text counter, but I
                                changed it so it would only update one time out
                                of every 100 iterations. This solved my problem.
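Chris's 1-in-100 counter fix, sketched in Python; `handle` and the update counter are invented stand-ins for the real per-record work and the costly screen write.

```python
processed = []

def handle(rec):
    processed.append(rec)       # stand-in for the real per-record work

def process(records, report_every=100):
    updates = 0
    for i, rec in enumerate(records, 1):
        handle(rec)
        if i % report_every == 0:   # refresh the display only 1 time in 100
            updates += 1            # stand-in for the expensive screen update
    return updates
```

Processing 1000 records triggers only 10 display refreshes instead of 1000, which was the whole fix.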

                                (4) Benchmark your code. Create a test routine
                                that will call your routine thousands of times
                                and track the time it takes. Then try
                                different things to optimize your code and see
                                how they affect your benchmark. You may be
                                amazed at what really affects your code's speed.
                                The rule is that 1% of your code accounts for
                                90% of your speed. While some optimizations may
                                increase the speed by a small amount, what you
                                are looking for are changes that affect
                                your code significantly. Whether a routine takes
                                .5 or .45 seconds isn't much of a difference
                                (unless it is in a larger loop), but if a routine
                                takes 10 seconds compared to 20 seconds, that is
                                a big deal. If the routine is called by a larger
                                loop, say a SetPixel routine called thousands of
                                times, then every bit of speed is important.

                                (5) If the speed is that critical (like a
                                SetPixel routine) then there is only one
                                solution: learn assembler. Since PB DLL allows
                                inline assembler, this is the best optimization.
                                A better-coded routine in the same language
                                (ex. Basic) may only get you a small speed
                                increase, like 1%. Using assembler can sometimes
                                increase speed 25%, 50% or even 200%. No matter
                                how good a compiler is (and PB is very good),
                                there is nothing like assembler.

                                (6) Lastly, check for the use of strings in your
                                code. They may be sparse, but they may exist in
                                a math routine (ex. a parsing routine).
                                Strings are more of a speed problem than
                                numbers. Eliminating one unneeded string command
                                can do more than multiple changes in numeric
                                code. While I have not discussed other, more
                                obvious things dealing with the math itself,
                                much of that comes with experience and good
                                programming skills. The need to benchmark covers
                                those things, because it forces you to try
                                different ways to accomplish different things.



                                • #17
                                  Don't calculate anything inside a loop which doesn't change. Calculate it before the loop instead. For instance:
                                  for x = 1 to 100000
                                    y(x) = x * LOG(a)   'LOG(a) is recomputed on every pass
                                  next
                                  should be:
                                  b = LOG(a)
                                  for x = 1 to 100000
                                    y(x) = x * b
                                  next
                                  Use algebra; you probably learned it at school. Often, applying simple algebra to rearrange your equations can give a more efficient solution.

                                  Invest in appropriate hardware.
                                  If your task involves GBytes of data, get GBytes of RAM so the computer can hold it all in fast memory instead of needing to swap it to slow disk.
                                  If your task can't fit in RAM, get FAST (usually that means SCSI) 15,000+rpm hard disks so that slow swapping to disk isn't so slow.
                                  If your task involves intensive FP calculations, get a fast CPU which is good at FP.
                                  Consider a dual CPU machine if you can write your code to take advantage of it (I think that means you need to use threads in PB)

                                  <<a piece of research code that is going to take months to run>>

                                  One potential big time saver, make sure you do the coding in such a way that you can resume the calculation after stopping and save all the data every hour or every day so you don't lose months of work when the power fails.
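A minimal checkpoint/resume sketch of that idea in Python, assuming a JSON state file is acceptable; the file name, checkpoint interval, and "work" line are all invented for illustration.

```python
import json, os

STATE_FILE = "checkpoint.json"   # hypothetical checkpoint path

def run(total_steps, checkpoint_every=1000):
    # resume from the last checkpoint if one exists, else start fresh
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "acc": 0}
    while state["step"] < total_steps:
        state["acc"] += state["step"]          # stand-in for the real work
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            with open(STATE_FILE, "w") as f:   # a crash now loses little work
                json.dump(state, f)
    return state["acc"]
```

After a power failure, rerunning the program picks up from the last saved step instead of from zero.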

                                  <<The problem is that while there are a few clear responses to the question of speed>>

                                  You asked a general question so you'll get general answers. Post specific code and folk may give you specific speed-up advice.

                                  <<11) Floating point arithmetic is much slower than integer arithmetic. Use integers wherever possible.>>

                                  Not true if you do things right.
                                  On modern CPUs you can even do FP and Integer arithmetic at the same time so nearly doubling the overall speed, but you'll need to do it in ASM.

                                  If the task involves intensive graphics, invest in a fast graphics card.
                                  If the task can be written so that some parts can be done independently of others, then split the task so it can be run over multiple machines, and then merge the results at the end.

                                  <<so I don't want to spend months re-learning ASM >>

                                  You can probably learn enough in a day or two to halve your computing time.

                                  <<but what is a good algorithm?>>

                                  Depends on the job you're doing but an example would be, suppose you have a large amount of data to sort. You can use the brute force, "search for the biggest and put it at the top" approach or you could use one of the more sophisticated sort routines which will take a fraction of the time.
                                  Then, depending on how much data there is, you may be able to split the sort so that it matches your hardware configuration.
                                  e.g. if the data is too big to all fit in RAM then do partial sorts on blocks of data which DO fit in RAM, e.g. 2GB of data could be done in 10 x 200MB blocks. Since each block is held in RAM rather than on disk, it'll sort much faster.
                                  You then merge the 10 blocks to write the sorted file back to disk.

                                  Then, to speed it even more, remember that the CPU has 256kB or 512kB or even more cache memory. Cache is much faster than main RAM. So, we could further split the 200MB of each data block into chunks which will fit entirely in the CPU cache. In the same way, we sort each chunk separately and then merge the results.
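The chunk-sort-then-merge idea in miniature, in Python: `heapq.merge` plays the role of the final merge pass. This toy version keeps the runs in memory; a real external sort would write each sorted run to disk and merge from there.

```python
import heapq

def chunked_sort(data, chunk_size):
    # sort runs small enough to fit in fast memory...
    runs = [sorted(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]
    # ...then merge the sorted runs into one sorted sequence
    return list(heapq.merge(*runs))
```

The output matches a plain sort; the win is that each run fits in RAM (or cache) while being sorted.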

                                  Of course, there's more work involved for the programmer up front but the potential gain in speed can be huge.

                                  The above example also demonstrates how writing code to fit your hardware can bring real speed gains.

                                  [This message has been edited by Paul Dixon (edited February 11, 2005).]


                                  • #18
                                    Stay away from dynamic strings ($) where possible. Fixed length strings (ASCIIZ) or dynamically allocated memory blocks (GlobalAlloc, HeapAlloc) are much faster. I also avoid having to duplicate memory by using pointers.

                                    Another thing, when using list boxes or list views, etc. use owner drawn type methods where possible. I do this all the time with listviews (see %LVN_GETDISPINFO), and they work much faster.


                                    [This message has been edited by Kev Peel (edited February 11, 2005).]
                                    | Slam DBMS | PrpT Control | Other Downloads | Contact Me


                                    • #19
                                      Avoid string concatenation.

                                      Instead of...
                                      FOR Z = 1 to 100
                                        X$ = X$ & Y$(Z)
                                      NEXT
                                      ...build a big buffer and use the MID$= statement:
                                      X$ = SPACE$(Enough)
                                      iStart = 1
                                      FOR Z = 1 to 100
                                        l = LEN(Y$(Z))
                                        MID$(X$, iStart, l) = Y$(Z)
                                        iStart = iStart + l
                                      NEXT
                                      X$ = RTRIM$(X$) ' because 'Enough' was really "more than enough"
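The same buffer idea in Python, where `str.join` plays the role of the preallocated MID$= buffer: build all the pieces first, then assemble the result in one pass instead of repeatedly reallocating a growing string.

```python
parts = [f"item{i}," for i in range(100)]

# repeated concatenation copies the growing string on every pass
s1 = ""
for p in parts:
    s1 = s1 + p

# build once from a list: the analogue of filling a preallocated buffer
s2 = "".join(parts)
```

Both produce the same string, but the join version does one allocation instead of a hundred.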

                                      Michael Mattias
                                      Tal Systems Inc. (retired)
                                      Racine WI USA
                                      [email protected]


                                      • #20
                                        The LEN parameter for OPEN file has been overlooked enough times that it probably should be included in any list:

                                        OPEN fname$ FOR INPUT AS #1 LEN = 32767

                                        The best advice for me came from, I believe, Ethan Winer:

                                        Make it right before you make it faster.
                                        Make it right before you make it better.

                                        Because when it is made right, you might find you don't need the algorithm at all.