
Effect of Multiple Threads


  • Effect of Multiple Threads

    I was playing with some new gbThreads code and decided to see if multiple threads could shorten the 30 minute task of formatting about 50K threads.

    Regardless of thread count, things started off pretty quickly. But somewhere in the 4-6 thread range things slowed down after a few minutes, and at 10+ threads things came to a virtual stop.

    I'm sure there's logic behind that? And other than experimentation, is there a way to quantify the number of threads that should be most effective at reducing execution time?
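One back-of-the-envelope way to quantify it is Amdahl's law: speedup is limited by the fraction of the job that actually parallelizes. A quick sketch (Python here just for illustration; the 0.95 fraction is a made-up number, not measured from the formatting task):

```python
def amdahl_speedup(parallel_fraction: float, n_threads: int) -> float:
    """Predicted speedup when a fraction p of the work parallelizes
    perfectly across n threads (Amdahl's law)."""
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / n_threads)

# If, hypothetically, 95% of the 30-minute formatting job parallelizes:
for n in (1, 2, 4, 8, 16):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

Note the curve flattens quickly: even with a 95% parallel fraction, going from 8 to 16 threads gains far less than going from 1 to 2, before any context-switch overhead is counted.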

  • #2
    This thread was a nice read, but was more generic than the answer I was hoping to find.


    • #3
      Yes, threads cannot be used with impunity!

      Why ?

      Because threads do not, by themselves, make an app faster. A thread simply keeps a separate piece of work running while the main process is busy, which is needed at times.

      The idea that you can add one thread after another and it somehow makes the app faster is a myth.

      Threads have overhead. Each time Windows switches between threads it has to do a context switch, saving and restoring some state each time. This means every new thread adds more overhead.

      The key to using threads is to write an app which uses the least number of threads possible.
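To make that overhead argument concrete: if the work divides evenly but every extra thread adds a roughly fixed scheduling cost, total time looks like T/n + c*n, which falls and then rises again as n grows. A toy model (Python, with made-up numbers; real context-switch costs must be measured):

```python
import math

def run_time(total_work_ms: float, switch_cost_ms: float, n_threads: int) -> float:
    """Toy model: work divides evenly across threads, but each extra
    thread adds a fixed scheduling/context-switch cost."""
    return total_work_ms / n_threads + switch_cost_ms * n_threads

def best_thread_count(total_work_ms: float, switch_cost_ms: float) -> int:
    """Minimizing T/n + c*n over n gives n ~ sqrt(T/c); check the two
    nearest integers."""
    n = math.sqrt(total_work_ms / switch_cost_ms)
    candidates = {max(1, math.floor(n)), math.ceil(n)}
    return min(candidates, key=lambda k: run_time(total_work_ms, switch_cost_ms, k))

# Hypothetical numbers: 30 minutes of work, heavy per-thread overhead.
print(best_thread_count(1_800_000, 20_000))
```

The point of the model is the shape, not the numbers: past the minimum, every added thread makes the job slower, which matches the "virtual stop" at 10+ threads in the opening post.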

      Best book I have ever seen on how to properly use threads in Windows:

      You may find it used on Ebay or any online book stores.

      Multithreading Applications in Win32: The Complete Guide to Threads

      By Jim Beveridge and Robert Wiener
      Addison-Wesley Developers Press, 1997 (368 pages)

      I read the entire book from cover to cover and it enlightened me about how to properly use threads. I used what I learned in designing the thread engine in EZGUI 5.0 Pro.

      The concepts taught in the book are the opposite of what most people assume about threads.

      Nice review of the book on Dr. Dobb's:
      Chris Boss
      Computer Workshop
      Developer of "EZGUI"


      • #4
        Howdy, Chris!

        I suspect you didn't mean this exactly as written.

        Because threads do not make an app faster.
        I just took an app that was doing some computations, split the effort into 4 threads, and the 30-minute task took only 4 minutes. So your comment might need some tweaking so that your meaning is clearer?
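For what it's worth, the structure of that experiment looks something like the sketch below: split the range of work into equal chunks, run one chunk per worker, and combine the partial results. (Python is used only to show the shape; note that CPython's GIL means plain threads won't actually speed up pure-Python number crunching, unlike the compiled worker threads discussed in this forum.)

```python
from concurrent.futures import ThreadPoolExecutor

def sum_squares(lo: int, hi: int) -> int:
    """The 'heavy computation' applied to one chunk of the range [lo, hi)."""
    return sum(i * i for i in range(lo, hi))

def parallel_sum_squares(n: int, workers: int = 4) -> int:
    """Split [0, n) into equal chunks, run one per worker, combine."""
    step = -(-n // workers)  # ceiling division
    bounds = [(i, min(i + step, n)) for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda b: sum_squares(*b), bounds))

print(parallel_sum_squares(1000))  # same result as the serial version
```

The key property is that the chunks are independent: no worker waits on another, and the only shared step is the final combine.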


        • #5
          Also look for any string concatenations and replace them with a string builder, or build arrays and combine them with joins.
          How long is an idea? Write it down.
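The reasoning behind that advice: repeated concatenation re-copies the whole string on every append, which is quadratic overall, while collecting pieces and joining once is linear. A small sketch of the same idea in Python:

```python
# Repeated '+' concatenation copies the whole string each time (O(n^2) total);
# collecting pieces in a list and joining once is O(n).
def build_slow(parts):
    s = ""
    for p in parts:
        s = s + p   # each '+' copies everything built so far
    return s

def build_fast(parts):
    return "".join(parts)  # one pass, one final allocation

parts = [str(i) for i in range(1000)]
assert build_slow(parts) == build_fast(parts)
```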


          • #6

            Maybe I don't understand, but here's the way I see it.

            (Oversimplified - assumes everything has the same priority and there is no overhead to task switching)

            I have a single threaded application. It has an intense computation that takes a long time to complete. It is running on a machine which has 10 other processes running.
            My application gets one time slice in eleven.

            I split off the heavy computation into four worker threads.
            My application now gets five time slices in 15 (one time slice in three).

            Added: This MS document appears to confirm that.


            "Windows schedules at the thread granularity. This approach makes sense when you consider that processes don’t run but only provide resources and a context in which their threads run. Because scheduling decisions are made strictly on a thread basis, no consideration is given to what process the thread belongs to. For example, if process A has 10 runnable threads, process B has 2 runnable threads, and all 12 threads are at the same priority, each thread would theoretically receive one-twelfth of the CPU time—Windows wouldn’t give 50 percent of the CPU to process A and 50 percent to process B."


            • #7

              It's hardware related: it's either "core count" or "core count + hyperthread count". If you have 4 cores without hyperthreading, you can run 4 threads, but note that the OS also uses the same cores. If you have 4 cores with hyperthreading you can double that. There is yet another consideration: how much work each core is doing. If you are just dribbling data from an internet connection you can up your thread count, whereas if you are bashing through an intensive algorithm the core is saturated. Bottom line: keep your thread count to what is available or it can get really slow.
              hutch at movsd dot com
              The MASM Forum
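A sketch of turning that rule of thumb into code (Python; the "minus one for headroom" convention matches advice given later in this thread, not a hard rule):

```python
import os

# Logical processors (cores x hardware threads) as the OS reports them.
logical = os.cpu_count()
print("logical processors:", logical)

# A common starting point for CPU-bound work: one worker per logical
# processor, minus one to leave headroom for the OS and other apps.
workers = max(1, (logical or 1) - 1)
print("suggested worker threads:", workers)
```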



              • #8
                My application gets one time slice in eleven.
                The 10 other time slices are likely very short as most properly written software gives up its time slice when the job is done.
                So, if a time slice is 16ms, your intense calculation gets 16ms and uses it all, then 10 trivial tasks are allocated 16ms each .. but only use 0.1ms of it before giving up the time slice as the job is done.
                Then your calculation gets its next time slice.

                Your calculation is then running for 16ms out of 17ms.
                If you split your task into 4, your calculation gets 4 time slices and runs for 16+16+16+16 = 64ms out of 65ms, but will then have more overhead due to the switching.
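That arithmetic can be written down directly. A sketch (Python; the 16ms slice and 0.1ms figures are the post's illustrative numbers, not measured values):

```python
def cpu_share(my_threads: int, slice_ms: float = 16.0,
              other_tasks: int = 10, other_use_ms: float = 0.1) -> float:
    """Fraction of wall-clock time the busy threads get when the other
    runnable tasks yield their time slice almost immediately."""
    busy = my_threads * slice_ms
    return busy / (busy + other_tasks * other_use_ms)

print(round(cpu_share(1), 4))   # ~16/17 from the post
print(round(cpu_share(4), 4))   # ~64/65 from the post
```

This also shows why the gain from extra threads is small here: one busy thread already gets ~94% of the CPU, so four threads can only push that to ~98%, minus switching overhead.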


                • #9
                  But somewhere in the 4-6 thread range, things slowed down
                  it's a matter of available resources.
                  You need to know what the bottleneck is.
                  Quite likely, your bottleneck is disk speed.
                  If your hard disk can only supply 400MB/sec and each of your threads can process 100MB/sec then you can gain by using up to 4 threads but adding a 5th thread, even if CPU and memory resources are available, won't gain more disk speed as you're at the limit of the disk.
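In other words, aggregate throughput is the minimum of what the threads can process and what the shared resource can supply. A sketch with the post's example numbers (Python):

```python
def effective_throughput(n_threads: int, per_thread_mb_s: float = 100.0,
                         disk_mb_s: float = 400.0) -> float:
    """Aggregate throughput is capped by the slowest shared resource."""
    return min(n_threads * per_thread_mb_s, disk_mb_s)

for n in (1, 2, 4, 5, 8):
    print(n, effective_throughput(n))  # flattens at 400 from 4 threads on
```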


                  • #10
                    So a great example of the usability of multithreading is something like an eDiscovery indexing program (using a 64-bit app in this case on my 16-core/32-thread, 256GB machine). Each thread will load a certain amount of source data (e.g., 8GB on my machine) into RAM to be indexed. With 1 thread it is 8GB and 1 indexer, with 4 threads it is 32GB and 4 indexers, and with 16 threads it is 128GB of RAM and 16 indexers (although I am currently limited to 4 threads at the moment due to licensing in the eDiscovery tool). Now with 4 or 16 threads that doesn't mean it is exactly 4 or 16 times faster, because of context switching and what-not, but each is significantly faster than the one with fewer threads. Just to mention, there is also a "master" thread running that manages the worker threads and their outputs; it "simply" spawns new threads and updates the master index databases when a worker thread completes, but these actions are far faster than the 8GB load and indexing done by the worker threads, so it rarely, if ever, becomes the bottleneck.

                    I am over-simplifying, as it doesn't really load 8GB of data; it is some "sizeable amount of data" loaded into RAM, with the remaining RAM used to
                    • redundantly extract compound documents (file archives like ZIP files, email archives like PST files, word documents with embedded documents, etc.) to the point all sub-documents are indexed as well
                    • Keep a local copy of indexes that are then merged over to the master thread upon end of that worker

                    As Paul Dixon mentioned, if you can bypass resource bottlenecks (disk being a big one), threads can greatly enhance processing speed. Typically, "chunks of data" that load into RAM and can then be processed by the CPU independently of each other, with little disk use, lend themselves greatly to multithreading. I highlighted independently because if one thread at any time needs to wait on another thread for some input/output, then that potentially becomes a bottleneck as well.
                    George W. Bleck


                    • #11

                      You have to consider that most CPUs today are at least dual core and many are quad core. Windows can run multiple threads at the same time, and that does speed up an app. But in reality you are simply running multiple programs on multiple CPUs (cores). And the average CPU has only 2 to 4 cores (meaning mass-market PCs), so the speed increase is limited to just a few threads.

                      What I am talking about is this idea that one can use 5, 10, 20 or more threads in an app and that with each thread you get a speed increase. That is plainly false. There is a cutoff point where the extra threads will begin to slow down the app rather than increase speed. It is hard to determine the best cutoff point because CPUs are all different.

                      I wasn't saying that a few threads won't speed up an app (if the CPU can handle it), but the idea that "many" threads will improve performance is simply wrong.

                      I would venture to say from experience that a couple threads are fine.

                      But I have seen on this forum people trying to use dozens of threads, even hundreds, and that simply does not improve performance.

                      Also remember that in Windows, apps don't run in a vacuum. Sure, you can dedicate a PC to run a single app, and that is great for performance. But if an app will be used in a typical Windows environment, end users are often running other apps at the same time. Multiple-core CPUs help with that, so one does not see a performance hit, but if a single app uses too many threads it is stealing CPU time from other apps or services. So getting better performance in my app ends up hurting the other apps which are running. This is another reason programmers should use as few threads as possible. It makes one's app play friendly with all the other apps which may be running.

                      Using too many threads in an app is like someone driving down a road in a car which is so wide it spans the entire width of the road (both sides). You could do it, but you will cause traffic problems because no one can use the road at the same time.

                      In the book I mention above, the authors explain that a well-written single-thread app will often outperform a multi-threaded app. They explain that using many threads is not the panacea many think it is. Threads are useful, but only when used properly.

                      Chris Boss
                      Computer Workshop
                      Developer of "EZGUI"


                      • #12
                        Another clarification.

                        The biggest improvement in performance is the first Thread!

                        Why ?

                        Because the primary thread of a process (your normal app) gets bogged down with the GUI. The GUI slows everything down, particularly running the message loop and the need to process window messages for each window (forms/controls). The primary thread of an app is not great for performance when it comes to running long running blocks of code. It is best for short blocks of code run during events.

                        Now once you add the first secondary thread for long-running code (i.e. lots of computations), there is a huge benefit in performance. Why? That thread is not bogged down with all the GUI stuff, so it can run unencumbered.

                        This is what the book I note above explains. It strongly recommends only doing the GUI stuff in the primary thread of the process and then all other threads be considered non-GUI worker threads. That gets the most performance.

                        The problem is that the first worker thread produces so much improvement in performance for long-running code blocks that people think "if I keep adding threads, I will get an exponential improvement in performance." This is false. A CPU has only so many cores, and a particular PC will have a cutoff point where the benefit of extra threads no longer outweighs the performance hit of running them (context switches). That point, though, is different for each PC (each CPU, memory configuration, etc.).

                        Sadly, many programmers make the mistake of developing software on a "bleeding edge" PC, say with a Core i7 CPU (many cores) which is fast and has tons of RAM. On such systems the cutoff point is higher, so you may see huge improvements in an app using threads. But take that same app and run it on a more typical PC, and not only will it be slower because of less raw power, the app may perform worse than a well-written single-thread app does.

                        Windows has bottlenecks to performance, but each PC is different.

                        As a rule, the overuse of threads will work against you. Using the minimum number of threads possible for a specific task always makes sense and uses the resources of a PC best.
                        Chris Boss
                        Computer Workshop
                        Developer of "EZGUI"


                        • #13
                          If ShellExecute can replace each thread, the application is well designed; just combine the results from each shell at the end.

                          Look for any "+" or "&" in the application, as they really count when strings become large.
                          Use Task Manager to see virus checker usage.

                          How long is an idea? Write it down.


                          • #14
                            Each thread gets its own time slices and is not bound by the time slices assigned to your app.

                            My PPD Pixel Pipe Demo uses 60 threads. Not a problem. Each thread is responsible for creating a 32x32 pixel block. Yes, if the pixels are completely replaced each time then the CPU can get overtasked. The trick is to make small changes to the pixel blocks. This method allows for greater display speed and minimal effect on CPU usage.

                            If your threads continuously run "Balls to the Wall" then all current apps will run at a crawl. AKA Overtasking the CPU.

                            The key here is to think of a thread as a separate app that does not have to report its status to the system like a GUI app has to.

                            If a thread is not performing a task then it should be sleeping. If the thread has a task then it should be throttled with periodic sleep periods if possible.

                            It is very easy to cause a GUI app to become unresponsive. If threads are constructed adequately and used properly, then they will have no effect on your app or other currently running apps.

                            I know a lot of folks here like to cut corners in the name of "simplification/code reduction" but I'm here to tell you that that method is flawed. If you want high speed results then keep thread data separate from all the other threads. Do not share arrays or values unless it is merely to say "I'm Finished" or receive a message to stop processing.

                            And then there is the main app:
                            Your main app should not wait for thread results. If it is designed properly then you will experience no issues.
                            Last edited by Jim Fritts; 14 Aug 2020, 03:30 PM.
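A minimal sketch of that pattern, assuming a simple stop flag and a fixed sleep interval (Python; the work unit and timings are placeholders):

```python
import threading
import time

def worker(stop: threading.Event, results: list) -> None:
    """Full-time worker: does a small unit of work, then sleeps briefly
    so it never monopolizes a core; exits when told to stop."""
    while not stop.is_set():
        results.append(sum(range(100)))  # stand-in for a unit of work
        time.sleep(0.01)                 # throttle: yield the CPU

stop = threading.Event()
results: list = []
t = threading.Thread(target=worker, args=(stop, results))
t.start()
time.sleep(0.1)      # let it run a little
stop.set()           # the only shared signal, per the post's advice
t.join()
print("units completed:", len(results))
```

Note the thread owns its `results` list exclusively until it has finished; the main app only reads it after `join()`, so no locking is needed.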


                            • #15
                              The best solution is to benchmark different methods of handling threads.

                              For example, let's say you have an app which uses 20 threads. Try decreasing the number of threads to 10 and then compare the two versions by benchmarking them. If the decrease to 10 improves performance, then try halving the number of threads again. If 10 threads is slower, then try something in between, say 15 threads, and benchmark again.
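A sketch of that search, written as a simple scan over candidate thread counts rather than strict halving; `time_with` stands in for your real benchmark, and the cost curve shown is hypothetical (Python):

```python
def find_sweet_spot(time_with, lo=1, hi=20):
    """Benchmark every candidate thread count and keep the fastest.
    `time_with(n)` should run the real workload with n threads and
    return the elapsed time; here it can be any callable."""
    best_n, best_t = lo, time_with(lo)
    for n in range(lo + 1, hi + 1):
        t = time_with(n)
        if t < best_t:
            best_n, best_t = n, t
    return best_n

# Hypothetical cost curve: work/n plus a per-thread overhead term.
print(find_sweet_spot(lambda n: 1000 / n + 12 * n))
```

An exhaustive scan costs more benchmark runs than halving, but it cannot be fooled by a cost curve with a flat region, which is common in practice.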

                              Also, benchmarking needs to be done on what you consider the typical PC end users will have, rather than simply on your development PC. What works on an i7 CPU with lots of cores may run terribly on a lower-end CPU which is dual core only.

                              Unless one benchmarks, you really cannot know what the "sweet spot" is for the number of threads to use in an app.

                              In practical application, on most typical CPUs 1 or 2 threads will significantly improve performance in apps which have a lot of long-running code. There is no doubt of that. The problem is, where is the sweet spot after that? Is it 4 threads, 8 threads, 12 threads or more? Also, do the threads run constantly or do they periodically sleep? A single thread which runs constantly can force a CPU core to 100% usage.

                              The use of SLEEP in Threads changes everything !!!

                              If a thread never calls SLEEP, you can easily push CPU usage close to 100%.

                              So when someone says "I use a couple dozen threads and my app still only pushes the CPU to 25%," they are using SLEEP in them.

                              Remove the SLEEP command and the thread will push the limits of the CPU.
                              Chris Boss
                              Computer Workshop
                              Developer of "EZGUI"


                              • #16
                                Remove the SLEEP command and the thread will push the limits of the CPU.
                                Correct. And if a thread is not designed properly you can overtax the CPU no matter how many threads you have.


                                • #17
                                  As you can see from all the comments regarding "well-designed thread [functions]," "It's not the tool, it's the craftsman." ( Or something like that!)

                                  Allegedly slow/failing code not presented for peer assessment and/or constructive suggestion.

                                  Michael Mattias
                                  Tal Systems Inc. (retired)
                                  Racine WI USA
                                  [email protected]


                                  • #18
                                    This does not apply to Gary's opening post but is still on-topic. From Microsoft: "An application that creates and destroys a large number of threads that each run for a short time. Using the thread pool can reduce the complexity of thread management and the overhead involved in thread creation and destruction." Thread creation, on my machine, costs about four times as much as a thread submission with a Thread Pool. With my Encrypternet application a 1GB file will see 4096 buffer fills and with decryption the buffer is decrypted in the primary thread and hashed with SHA256 in a secondary thread. 4096 thread creations/destructions would have been expensive.
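A sketch of the pool idea: create the pool once, then submit each buffer as a cheap work item instead of creating a thread per buffer (Python's ThreadPoolExecutor standing in for the Windows thread pool; the chunk data here is made up):

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib

def hash_chunk(data: bytes) -> str:
    """Stand-in for the per-buffer SHA-256 work the post describes."""
    return hashlib.sha256(data).hexdigest()

# One pool, created once; each buffer is a cheap *submission*, not a
# fresh thread creation/destruction.
chunks = [bytes([i % 256]) * 1024 for i in range(64)]
with ThreadPoolExecutor(max_workers=4) as pool:
    digests = list(pool.map(hash_chunk, chunks))

print(len(digests), "chunks hashed on a reused pool")
```

The 4096 buffer fills mentioned above would mean 4096 submissions to a handful of long-lived threads, rather than 4096 create/destroy cycles.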


                                    • #19
                                      If your app creates the threads only once there is no creation cost.
                                      Considering full-time threads:
                                      If you use THREAD CLOSE following THREAD CREATE, there is no destruction cost. When the thread is commanded to close, it will stop processing and end.

                                      Yes, if you are using threads for short durations then you will have creation and destruction costs. The question is why anyone would use a part-time thread when an elegantly designed, intelligent full-time thread would do the trick with speed.


                                      • #20
                                        In your case you are trying to process a large amount of information as fast as possible. So for a processor-intensive procedure, your sweet spot will be at around the total number of physical processors your computer has (minus one if you want to play nice with the other programs running on the PC).
                                        Cores, hyper-threading, etc. all amount to the number of simultaneous hardware threads of execution your CPU can handle.
                                        So for processor-intensive work, the only value we care about in most cases is the number of simultaneous hardware threads your CPU can handle. The OS can handle many more threads, but you will be forcing time slicing onto your threads, so no speed gain.
                                        To get the number of processors, so you can create the number of threads needed, you can use:
                                        #COMPILE EXE
                                        #DIM ALL
                                        #INCLUDE ""
                                        FUNCTION PBMAIN () AS LONG
                                            LOCAL lpSystemInfo AS SYSTEM_INFO   'System info
                                            LOCAL Number AS LONG
                                            'GetNativeSystemInfo: from a 32-bit app under SysWow64 it reports the
                                            'real system; from a 64-bit app it is equivalent to GetSystemInfo.
                                            GetNativeSystemInfo lpSystemInfo
                                            Number = lpSystemInfo.dwNumberOfProcessors
                                            ? "Number of logical processors: " + FORMAT$(Number)
                                        END FUNCTION
                                        Final note: for other uses of threads the sweet spot can be much higher, if they are catering to something slower in nature, like the network/internet.