  • Threads, performance, response times

    Much to debate here I guess, but when it comes to threads and ways to improve performance and response times versus CPU usage, overlap, etc., I wondered if there is a general rule of thumb that some or most should follow?

    My own personal rule is to use threads when I need to "multi-task", but only if I need to, because the argument goes that beyond "so many threads" you counteract the purpose of multi-tasking (i.e. do too much at once and you spend all your time "multi-tasking" and not actually using the data that you are after).

    So my question is this: how much is too much? (Obviously it depends on CPU, OS and time slices, but I thought I would ask.)
    1. 10 threads - (Small in my mind)
    2. 100 threads - (OK if I have to, but maybe time to start rethinking things, depending on what each thread is working on)
    3. 1000 threads - (Ummmm... could I be LESS serious? If I need that many, something is DEFINITELY wrong)
    4. 10,000 threads - (OK, anyone with that many has probably ground their PC to a halt enumerating files or database tables, and needs a higher-end computer to NOT grind to a halt)


    Looking at my own list of comments, I can see the detrimental effects, but thought I would ask, since debates and documents more or less just say the limit is "It depends".
    Engineer's Motto: If it aint broke take it apart and fix it

    "If at 1st you don't succeed... call it version 1.0"

    "Half of Programming is coding"....."The other 90% is DEBUGGING"

    "Document my code????" .... "WHYYY??? do you think they call it CODE? "

  • #2
    Your gut feel is correct: performance data cannot be predicted in advance, it's going to be empirical, and at some point the Law of Diminishing Returns will be supervening.

    But you can make this a user-configurable option..
    Code:
      Max# of additional threads of execution [12345]
    .. and never again have to worry about a need to recompile when a user calls with a performance complaint.
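As a rough illustration of the user-configurable limit suggested above, here is a minimal sketch in Python (not PowerBASIC). The section name `performance`, the key `max_threads`, the default of 4 and the cap of 64 are all hypothetical choices made for the example, not anything from the original post.

```python
import configparser

DEFAULT_MAX_THREADS = 4   # hypothetical fallback when the user has set nothing
HARD_CAP = 64             # hypothetical sanity limit to guard against typos

def read_max_threads(ini_text: str) -> int:
    """Return the user-configured thread limit, clamped to 1..HARD_CAP."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    try:
        value = parser.getint("performance", "max_threads",
                              fallback=DEFAULT_MAX_THREADS)
    except ValueError:
        # The key exists but is not an integer: fall back to the default.
        value = DEFAULT_MAX_THREADS
    return max(1, min(value, HARD_CAP))

if __name__ == "__main__":
    print(read_max_threads("[performance]\nmax_threads = 12\n"))
```

The point of the clamp is that a performance complaint can then be handled by editing a settings file, while a nonsensical value cannot take the application down.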


    MCM
    Michael Mattias
    Tal Systems (retired)
    Port Washington WI USA
    http://www.talsystems.com


    • #3
      Cliff,

      Threads do not speed up an application!

      They may slow it down.

      Threads are useful when multi-tasking is necessary, but they should be used carefully. Also you should keep the number of threads down to a bare minimum.

      Why ?

      When the operating system switches between threads, a "context switch" occurs, where the CPU saves register info for the current thread. A context switch has overhead and takes time to execute. If your application only has two or three threads, then this overhead is minimal. If your application has 50 threads then the overhead increases a good bit. If your application has 100 threads, the overhead increases even more.

      It is important to realize that threads never speed up an application's execution. They only allow multi-tasking.

      If two threads were used together to accomplish a single task, they would execute slightly slower than if one thread were used.

      Threads force the CPU to share its time. This means that when a new thread is created, the other threads get less CPU time.

      On Windows XP, a number of services are likely already running in the background. Then if you have multiple applications running, more threads (or processes in this case) are running. Now introduce an application that has 50 threads running (or some other large amount). Imagine what that does to the overall overhead necessary to keep everything running.

      While threads are easy to create, this does not mean that they should be used indiscriminately. Before adding a new thread to an application, try everything possible to use the existing threads (and the main thread of the process) to accomplish what you need. Only add a new thread if it is absolutely necessary.

      An excellent book to read on the proper use of threads is:

      "Multithreading Applications in Win32"
      ("the complete guide to threads")
      By Jim Beveridge and Robert Wiener
      Published by Addison Wesley Developers Press
      Copyright 1998 (3rd printing)
      Chris Boss
      Computer Workshop
      Developer of "EZGUI"
      http://cwsof.com
      http://twitter.com/EZGUIProGuy


      • #4
        Chris, things may well be different on multi-core CPUs, where multiple threads are the only way to use the full potential.


        • #5
          Chris,
          "Threads do not speed up an application!"
          They can, and not just on multicore CPUs.
          Threads allow the programmer to avoid wasting time waiting for things to happen and to put that time to use.
          The following code demonstrates this.

          Paul.
          Code:
          'PBCC4 program
          'see if I can demonstrate a speed increase by prefetching data from a large disk file
          'on first run, choose Y to create the test file.
          'on subsequent runs don't choose Y to run the comparison test on the test file
          
          GLOBAL gFlag AS LONG
          GLOBAL gNextRecord AS LONG
          GLOBAL gQuitFlag AS LONG
          
          FUNCTION MyThread(BYVAL y AS DWORD) AS DWORD
          LOCAL  MyDummyRecord AS STRING*1000
          DO
          'if gFlag=0 then I'm waiting for the next record to be fetched so fetch it
          IF gFlag =0 THEN
              gFlag =1                           'to sync with main code so I don't keep fetching
              GET #f&,gNextRecord,MyDummyRecord  'get the record into cache.
              'at this point this THREAD stalls, just as the main code did, waiting for the data from the disk
              'BUT the OS sees there's a delay here and relinquishes this thread's timeslice(s) until the data is available
              'This allows the main code to get the timeslice back while the data is being fetched
              'i.e.the main code doesn't have to waste time waiting for the data but can get on with processing the previous record
          END IF
          
          SLEEP 0      'give up this timeslice immediately so as not to waste it
          
          LOOP UNTIL gQuitFlag
          
          END FUNCTION
          
          
          FUNCTION PBMAIN () AS LONG
          
          
          PRINT "Create test file? (Y/N)"
          INPUT LINE y$
          
          DIM MyRecord AS STRING*1000
          'this next bit creates a 1GB test file, it only needs doing the first time the code is run
          IF LCASE$(y$)="y" THEN
              PRINT "Creating a big file to use for the tests"
              
              'create a test file of 1GB size consisting of 1,000,000 records each of 1000 characters
              f&=FREEFILE
              OPEN "d:\testfile" FOR RANDOM AS f&   LEN=1000
          
              FOR r& = 1 TO 1000000
                  a&=RND(0,255)
                  MyRecord=STRING$(1000,CHR$(a&))
                  PUT #f&,r&,MyRecord
              NEXT
              CLOSE f&
          
              PRINT "Test file created"
          
          ELSE
              'now read them back randomly and process them
              'for no particular reason, we'll choose a sequence of random records and add up the ascii values of the characters in it
          
              'first, no caching
              OPEN "d:\testfile" FOR RANDOM AS f&   LEN=1000
              RANDOMIZE TIMER
          
          
              t##=TIMER
          
              FOR r& = 1 TO 1000          'do 1000 random items
                  rec&=RND(1,1000000)     'get the next record
                  GET #f&,rec&,MyRecord   'at this point the code stalls every time for 5-20ms while the data is fetched from disk
                                          'the OS relinquishes the timeslice but I have no code that can make use of it so it's wasted
          
                  'give the CPU something to do with the data that takes a significant time
          
                  FOR waste&=1 TO 20
                  FOR t&=1 TO 1000
                      sum&=sum&+ASC(MID$(MyRecord,t&,1))
                  NEXT
                  NEXT
          
          
              NEXT
          
              PRINT "no caching time=",TIMER-t##
          
              'now do the same but use a thread to prefetch the next item in advance
              THREAD CREATE MyThread(0) TO junk&
              t##=TIMER
          
              'get the number of the first record
              gNextRecord=RND(1,1000000)
          
              FOR r& = 1 TO 1000  'do 1000 random items
                  'get the record.
                  'First time in the loop this will stall and wait 5-20ms while the record is fetched
                  'Subsequent loops, the record was already fetched so this doesn't stall (at least not for as long)
                  GET #f&,gNextRecord,MyRecord
          
                  'now the trick.
                  'DON'T immediately process the current record. Instead, calculate where the next record is and cause it to be fetched
                  gNextRecord=RND(1,1000000)
                  gFlag=0           'Set the flag to tell the thread I want a new record fetching
                  SLEEP 0           'give up my current timeslice so the thread will get it
          
                  'give the CPU something to do with the data that takes a significant time
          
                  FOR waste&=1 TO 20
                  FOR t&=1 TO 1000
                      sum&=sum&+ASC(MID$(MyRecord,t&,1))
                  NEXT
                  NEXT
          
          
              NEXT
          
              PRINT "   caching time=",TIMER-t##
              gQuitFlag = 1    'cause thread to exit
          
          END IF
          
          WAITKEY$
              
          END FUNCTION
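The prefetch idea in the program above can also be sketched in Python (not PowerBASIC). This is a simplified illustration under stated assumptions, not a port: the disk wait is simulated with `time.sleep`, the names `fetch`, `process` and `IO_DELAY` are hypothetical, and the worker hands the record back through a future instead of warming the OS file cache as the original does.

```python
import time
from concurrent.futures import ThreadPoolExecutor

IO_DELAY = 0.01  # assumed stand-in for a 5-20 ms disk seek

def fetch(record_no: int) -> bytes:
    """Pretend to read one 1000-byte record; the sleep models a blocked read."""
    time.sleep(IO_DELAY)
    return bytes([record_no % 256]) * 1000

def process(record: bytes) -> int:
    """CPU work on one record: add up its byte values 20 times over."""
    return sum(sum(record) for _ in range(20))

def run_sequential(record_numbers):
    """Fetch, then process, one record at a time."""
    return sum(process(fetch(n)) for n in record_numbers)

def run_prefetched(record_numbers):
    """Process record k while a worker thread is already fetching record k+1."""
    if not record_numbers:
        return 0
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch, record_numbers[0])
        for next_no in record_numbers[1:]:
            record = pending.result()              # wait for the current record
            pending = pool.submit(fetch, next_no)  # start fetching the next one
            total += process(record)               # overlaps with that fetch
        total += process(pending.result())
    return total

if __name__ == "__main__":
    numbers = list(range(50))
    for run in (run_sequential, run_prefetched):
        start = time.perf_counter()
        run(numbers)
        print(run.__name__, round(time.perf_counter() - start, 3), "s")
```

How much the prefetched version gains depends on the ratio of wait time to processing time; if either one dominates, the overlap has little to hide.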


          • #6
            To quote the book I noted above:

            There is a small performance penalty paid for each context switch. If your application is broken down into 500 threads, then you are potentially paying a big performance penalty.
            Chris Boss

            • #7
              You know, threads are a lot like prunes.

              Is three enough? Is six too many?
              Michael Mattias

              • #8
                Chris,
                if it's written in a book then I must be wrong. I'll have to look for another way to explain the 35% increase in speed available in the above program when a thread is used.

                Paul.


                • #9
                  35%?

                  I'm getting 5.562 without caching and 5.547 with caching.

                  My hard drive seems to be fairly fast, so I gave the CPU more work [1] by using 'For waste&=1 To 200', i.e. 10 times more work, and the times were 56.36/57.125. In this case caching was slower.

                  Actually, the times are so close we could get this difference from different runs.

                  I've just done another run as the first and got 5.61/5.656 giving caching slower this time.

                  [1] Is my logic inverted? I'll knock out a 2GB file and test again.
                  Last edited by David Roberts; 18 Oct 2007, 04:13 PM.


                  • #10
                    David,
                    fast hard drive .. and lots of RAM? I chose 1GB for the file size to make sure it was on hard disk and not cached in RAM. If you have 2GB RAM then make the file size >2GB and try again.

                    Paul.


                    • #11
                      Yes, my logic was inverted.

                      With a 3GB file I now get 14.047/7.922, and 13.469/7.734 on a second run. When we were looking at comparing large files I noticed that using larger and larger buffers did not improve matters by much. It seems that the faster the hard drive is relative to CPU speed, the less we need to be concerned about hard drive bottlenecking, which is not entirely surprising.

                      Added: Yes, the file cache will make a difference, hence the better second run, but I still think that my logic was inverted.

                      More: Just a BTW, with dual core, Task Manager was showing one core in use without caching and both cores in use with caching. That confirms my suspicion that programming for dual core may not give us a better return than leaving it up to the system, which is aware of everything going on, not just our app.

                      To put us on a level playing field, setting the process affinity mask to CPU 0 got me 13.109/7.953, highlighting the advantage of threading, in this context, for single core machines as well.
                      Last edited by David Roberts; 18 Oct 2007, 05:22 PM.


                      • #12
                        David,
                        With a 3GB file ..setting the process affinity mask to CPU 0 got me 13.109/7.953
                        That's more like it. Using threads speeds it up by 40%.

                        Paul.


                        • #13
                          So the answer is "the point of diminishing returns", as I expected, or "It depends" (on the system, speed, what you are trying to do, etc.).

                          I just thought I would ask in case there was some sort of "General Rule", which obviously there isn't, because there are too many variables that boil down to

                          "What do you WANT to do?"
                          vs
                          "What ARE the tools you have to do it with?"


                          • #14
                            Cliff,
                            threads are just another tool to use when the programmer sees fit.

                            A thread context switch costs in the region of 2,000 CPU clock cycles, assuming the thread was recently used and is still in the cache. That's around 1 µs on a 2GHz CPU.

                            If you switch threads a million times a second then you'll waste half of your available CPU power just switching and not working.
                            If you switch threads 1000 times a second then you'll waste next to nothing.
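The figures above can be checked with a few lines of arithmetic, shown here as a Python sketch. It takes the quoted 2,000 cycles per switch and the 2GHz clock at face value and ignores cache effects; the function name `switch_overhead` is hypothetical. On these round numbers a million switches per second accounts for essentially the whole of one core's time, so the "half" quoted above is, if anything, on the generous side.

```python
CYCLES_PER_SWITCH = 2_000      # figure quoted in the post
CLOCK_HZ = 2_000_000_000       # 2 GHz, as in the post

def switch_overhead(switches_per_second: int) -> float:
    """Fraction of one CPU-second spent on context switches, capped at 1.0."""
    seconds_per_switch = CYCLES_PER_SWITCH / CLOCK_HZ   # about 1 microsecond
    return min(1.0, switches_per_second * seconds_per_switch)

if __name__ == "__main__":
    for rate in (1_000, 100_000, 1_000_000):
        print(f"{rate:>9,} switches/s -> {switch_overhead(rate):.1%} of CPU time")
```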


                            There is a default limit of around 2000 threads. If you want more than that then you'll need to set #STACK to something smaller than the default 1MB.
                            32-bit Windows limits each process to 2GB of user address space by default, and each thread is given its own stack at the default size of 1MB, so when the count reaches about 2000, memory runs out.
                            Using
                            Code:
                            #STACK 128000
                            allows 15,000+ threads which is as many as you'll get with PB as that's the minimum stack size allowed by PB.
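The thread limits quoted above follow from dividing the address space by the stack size. A small Python sketch of that arithmetic is below; it assumes a 2GB user address space and treats thread stacks as the only consumer, so the results are upper bounds and the real limits come out somewhat lower. The name `max_threads` is hypothetical.

```python
ADDRESS_SPACE = 2 * 1024**3   # assumed 2 GB of user address space

def max_threads(stack_bytes: int) -> int:
    """Upper bound on thread count if stacks were the only use of address space."""
    return ADDRESS_SPACE // stack_bytes

if __name__ == "__main__":
    print(max_threads(1024 * 1024))   # default 1 MB stack
    print(max_threads(128_000))       # with #STACK 128000
```

The two bounds are 2048 and 16777, consistent with the "around 2000" and "15,000+" figures once the program's own code and data are allowed for.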

                            Paul.


                            • #15
                              To quote the Windows SDK docs:

                              Using multiple threads is not a guarantee of better performance. In fact, because thread factoring is a difficult problem, using multiple threads often causes performance problems. The key is to use multiple threads only if you are very sure of what you are doing.
                              Chris Boss

                              • #16
                                Paul,

                                Here is your code so it runs on PB Win 8.04:

                                Code:
                                GLOBAL gFlag AS LONG
                                GLOBAL gNextRecord AS LONG
                                GLOBAL gQuitFlag AS LONG
                                FUNCTION MyThread(BYVAL y AS DWORD) AS DWORD
                                LOCAL  MyDummyRecord AS STRING*1000
                                DO
                                'if gFlag=0 then I'm waiting for the next record to be fetched so fetch it
                                IF gFlag =0 THEN
                                    gFlag =1                           'to sync with main code so I don't keep fetching
                                    GET #f&,gNextRecord,MyDummyRecord  'get the record into cache.
                                    'at this point this THREAD stalls, just as the main code did, waiting for the data from the disk
                                    'BUT the OS sees there's a delay here and relinquishes this thread's timeslice(s) until the data is available
                                    'This allows the main code to get the timeslice back while the data is being fetched
                                    'i.e.the main code doesn't have to waste time waiting for the data but can get on with processing the previous record
                                END IF
                                SLEEP 0      'give up this timeslice immediately so as not to waste it
                                LOOP UNTIL gQuitFlag
                                END FUNCTION
                                
                                FUNCTION PBMAIN () AS LONG
                                     F$="c:\temp\testfile"
                                     DIM MyRecord AS STRING*1000
                                     'this next bit creates a 1GB test file, it only needs doing the first time the code is run
                                     IF DIR$(F$)="" THEN
                                         'create a test file of 1GB size consisting of 1,000,000 records each of 1000 characters
                                         f&=FREEFILE
                                         OPEN F$ FOR RANDOM AS f&   LEN=1000
                                         FOR r& = 1 TO 1000000
                                             a&=RND(0,255)
                                             MyRecord=STRING$(1000,CHR$(a&))
                                             PUT #f&,r&,MyRecord
                                         NEXT
                                         CLOSE f&
                                     END IF
                                     
                                    'now read them back randomly and process them
                                    'for no particular reason, we'll choose a sequence of random records and add up the ascii values of the characters in it
                                     MSGBOX "start test"
                                     
                                    'first, no caching
                                    OPEN F$ FOR RANDOM AS f&   LEN=1000
                                    RANDOMIZE TIMER
                                
                                    t##=TIMER
                                    FOR r& = 1 TO 1000          'do 1000 random items
                                        rec&=RND(1,1000000)     'get the next record
                                        GET #f&,rec&,MyRecord   'at this point the code stalls every time for 5-20ms while the data is fetched from disk
                                                                'the OS relinquishes the timeslice but I have no code that can make use of it so it's wasted
                                        'give the CPU something to do with the data that takes a significant time
                                        FOR waste&=1 TO 20
                                        FOR t&=1 TO 1000
                                            sum&=sum&+ASC(MID$(MyRecord,t&,1))
                                        NEXT
                                        NEXT
                                
                                    NEXT
                                    MSGBOX "no caching time="+STR$(TIMER-t##)
                                    'now do the same but use a thread to prefetch the next item in advance
                                    THREAD CREATE MyThread(0) TO junk&
                                    t##=TIMER
                                    'get the number of the first record
                                    gNextRecord=RND(1,1000000)
                                    FOR r& = 1 TO 1000  'do 1000 random items
                                        'get the record.
                                        'First time in the loop this will stall and wait 5-20ms while the record is fetched
                                        'Subsequent loops, the record was already fetched so this doesn't stall (at least not for as long)
                                        GET #f&,gNextRecord,MyRecord
                                        'now the trick.
                                        'DON'T immediately process the current record. Instead, calculate where the next record is and cause it to be fetched
                                        gNextRecord=RND(1,1000000)
                                        gFlag=0           'Set the flag to tell the thread I want a new record fetching
                                        SLEEP 0           'give up my current timeslice so the thread will get it
                                        'give the CPU something to do with the data that takes a significant time
                                        FOR waste&=1 TO 20
                                        FOR t&=1 TO 1000
                                            sum&=sum&+ASC(MID$(MyRecord,t&,1))
                                        NEXT
                                        NEXT
                                
                                    NEXT
                                    MSGBOX "   caching time="+STR$(TIMER-t##)
                                    gQuitFlag = 1    'cause thread to exit
                                END FUNCTION

                                Warning: it takes some time to create the test file the first time the app is run.

                                I ran the program twice and got the following times:

                                non-cached - 26.156 sec., 25.906 sec.
                                cached - 25.281 sec., 29.719 sec.

                                I should note that while running this test I had Internet Explorer and the MS SDK docs running, plus firewall, AV and antispyware. The more apps running at the same time, the more the overall timing of threads is affected. In a number of test runs, the cached (threaded) code ran slower than the uncached.


                                I see no significant improvement in the cached (threaded) version.

                                My PC has a 2.5 GHz CPU and 256 MB RAM, running Windows XP Home.

                                It should also be noted that thread execution is dependent upon how many processes are running. Since Windows gives a time slice to every process and its threads, the more total threads running, the less time each thread will get.

                                Also, threads have priority levels, and changing priority levels affects the speed of execution, since the higher the priority, the more time the CPU gives the thread to run.

                                Threads do not run (switch) in a simple round robin order (thread 1, thread 2, thread 3, etc.). Windows groups threads based on priority levels first and then runs threads at the highest priority first (in round robin order), and then the next priority level and so on.
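The ordering described above can be modeled with a toy scheduler, sketched here in Python. This is a deliberately simplified model, not how Windows is implemented: it assumes one CPU, threads that are always ready to run until their work is done, and no dynamic priority boosts. The function name `schedule` and the tuple layout are hypothetical.

```python
from collections import deque

def schedule(threads):
    """threads: list of (name, priority, slices_needed), each needing >= 1 slice.
    A higher number means a higher priority.
    Returns the order in which time slices are handed out."""
    queues = {}
    for name, priority, work in threads:
        queues.setdefault(priority, deque()).append([name, work])

    order = []
    while queues:
        top = max(queues)            # highest priority level with ready threads
        queue = queues[top]
        entry = queue.popleft()      # round robin within that level
        order.append(entry[0])
        entry[1] -= 1
        if entry[1] > 0:
            queue.append(entry)      # not finished: back of the same queue
        if not queue:
            del queues[top]          # level drained: lower levels may now run
    return order

if __name__ == "__main__":
    print(schedule([("A", 2, 2), ("B", 2, 1), ("C", 1, 2)]))
```

In this model the lower-priority thread C gets no time at all until A and B have finished, which is why a busy high-priority thread can starve everything beneath it.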

                                It is often a matter of trial and error to find the best solution (speed) when it comes to multiple threads.
                                Chris Boss

                                • #17
                                  Cliff, seems to me you are often looking for ways to "beat the system."

                                  While we all go through that phase (I think it was in '69 for me), you can get real results a whole lot sooner if you just use the tools Windows provides as they were designed. "Go With The Flow," if you will.

                                  Many curse the WinAPI as too 'complex,' but that 'complexity' is the byproduct of 'choices', and 'choices' are what deliver 'power' to the programmer and make the Win/32 environment simply terrific for developing applications.

                                  MCM
                                  Michael Mattias

                                  • #18
                                    Chris,
                                    Using multiple threads is not a guarantee of better performance.
                                    Neither is using LONGs, using PB or using ASM. It all depends how you use them.

                                    Threads CAN increase performance significantly if you use them properly; the above example demonstrates this. You shouldn't dismiss the possibility of threads increasing performance just because someone else used them inappropriately and didn't get such an increase.


                                    Paul.


                                    • #19
                                      Chris,
                                      that's not my code. I'll convert it to PBWin and post it soon.

                                      Paul.


                                      • #20
                                        Paul,

                                        I do not think your test program is a good example to demonstrate the speed of threads. Such a test should not access any input/output devices.

                                        The reason is that when accessing the hard drive you are dealing with drive buffers. Windows buffers the reading of hard drive data and it is hard to tell what effect that buffering has. Also, many hard drives have their own buffering technology (hardware), which will also affect the results.

                                        Here is a slightly better test of your code, where I reverse the order of the tests (cached first, non-cached second) to see what effect hard drive buffering may have. I also added a loop so the code executes twice in each test.

                                        On my PC, the results are:

                                        cached - 62.312 secs.
                                        non-cached - 48.109 secs.

                                        Code:
                                        GLOBAL gFlag AS LONG
                                        GLOBAL gNextRecord AS LONG
                                        GLOBAL gQuitFlag AS LONG
                                        FUNCTION MyThread(BYVAL y AS DWORD) AS DWORD
                                        LOCAL  MyDummyRecord AS STRING*1000
                                        DO
                                        'if gFlag=0 then I'm waiting for the next record to be fetched so fetch it
                                        IF gFlag =0 THEN
                                            gFlag =1                           'to sync with main code so I don't keep fetching
                                            GET #f&,gNextRecord,MyDummyRecord  'get the record into cache.
                                            'at this point this THREAD stalls, just as the main code did, waiting for the data from the disk
                                            'BUT the OS sees there's a delay here and relinquishes this thread's timeslice(s) until the data is available
                                            'This allows the main code to get the timeslice back while the data is being fetched
                                            'i.e.the main code doesn't have to waste time waiting for the data but can get on with processing the previous record
                                        END IF
                                        SLEEP 0      'give up this timeslice immediately so as not to waste it
                                        LOOP UNTIL gQuitFlag
                                        END FUNCTION
                                        
                                        FUNCTION PBMAIN () AS LONG
                                             MaxCT&=2  ' increase value to loop through code multiple times
                                             
                                             F$="c:\temp\testfile"
                                             DIM MyRecord AS STRING*1000
                                             'this next bit creates a 1GB test file, it only needs doing the first time the code is run
                                             IF DIR$(F$)="" THEN
                                                 'create a test file of 1GB size consisting of 1,000,000 records each of 1000 characters
                                                 f&=FREEFILE
                                                 OPEN F$ FOR RANDOM AS f&   LEN=1000
                                                 FOR r& = 1 TO 1000000
                                                     a&=RND(0,255)
                                                     MyRecord=STRING$(1000,CHR$(a&))
                                                     PUT #f&,r&,MyRecord
                                                 NEXT
                                                 CLOSE f&
                                             END IF
                                            'now read them back randomly and process them
                                            'for no particular reason, we'll choose a sequence of random records and add up the ascii values of the characters in it
                                             MSGBOX "start test"
                                            'first, no caching
                                            OPEN F$ FOR RANDOM AS f&   LEN=1000
                                            RANDOMIZE TIMER
                                            ' swap the order here to see what effects running first makes
                                            GOSUB test2
                                            GOSUB test1
                                            EXIT FUNCTION
                                        
                                        test1:
                                            t##=TIMER
                                            FOR CT&=1 TO MaxCT&
                                                 FOR r& = 1 TO 1000          'do 1000 random items
                                                     rec&=RND(1,1000000)     'get the next record
                                                     GET #f&,rec&,MyRecord   'at this point the code stalls every time for 5-20ms while the data is fetched from disk
                                                                        'the OS takes the timeslice away while we wait, but I have no code that can make use of it, so it's wasted
                                                     'give the CPU something to do with the data that takes a significant time
                                                     FOR waste&=1 TO 20
                                                          FOR t&=1 TO 1000
                                                              sum&=sum&+ASC(MID$(MyRecord,t&,1))
                                                          NEXT
                                                     NEXT
                                                 NEXT
                                            NEXT
                                            MSGBOX "no caching time="+STR$(TIMER-t##)
                                        RETURN
                                        test2:
                                            'now do the same but use a thread to prefetch the next item in advance
                                            THREAD CREATE MyThread(0) TO junk&
                                            t##=TIMER
                                            FOR CT&=1 TO MaxCT&
                                                 'get the number of the first record
                                                 gNextRecord=RND(1,1000000)
                                                 FOR r& = 1 TO 1000  'do 1000 random items
                                                     'get the record.
                                                     'First time in the loop this will stall and wait 5-20ms while the record is fetched
                                                     'Subsequent loops, the record was already fetched so this doesn't stall (at least not for as long)
                                                     GET #f&,gNextRecord,MyRecord
                                                     'now the trick.
                                                     'DON'T immediately process the current record. Instead, calculate where the next record is and cause it to be fetched
                                                     gNextRecord=RND(1,1000000)
                                                     gFlag=0           'Set the flag to tell the thread I want a new record fetching
                                                     SLEEP 0           'give up my current timeslice so the thread will get it
                                                     'give the CPU something to do with the data that takes a significant time
                                                     FOR waste&=1 TO 20
                                                          FOR t&=1 TO 1000
                                                              sum&=sum&+ASC(MID$(MyRecord,t&,1))
                                                          NEXT
                                                     NEXT
                                                 NEXT
                                            NEXT
                                            MSGBOX "   caching time="+STR$(TIMER-t##)
                                            gQuitFlag = 1    'cause thread to exit
                                        RETURN
                                        END FUNCTION
                                        Chris Boss
                                        Computer Workshop
                                        Developer of "EZGUI"
                                        http://cwsof.com
                                        http://twitter.com/EZGUIProGuy
