Multi-threading slowdown

  • Multi-threading slowdown

    Hi,

    I've written a sort of script engine which works just great. It uses only 2 global variables (arrays, that is). These are all protected using critical sections, with a different CS for each array. That eliminated quite a few of the context switches I had.
    On average, I measured that a script comparison takes about 0.04 seconds. I tested this with several hundred entries.
    When I call a DLL that has about 400 entries calling the script engine, this would take - at max - about 16 seconds. It's just if then else all the way, so if one hits, it exits.

    Strange things happen in multi-user mode: when testing with, for instance, 5 clients, the same DLL that normally exits within a second or 4 now takes about 80 seconds, sometimes even more than 100 seconds! It still works, the results are OK, and nothing crashes, but I don't understand the delay.
    I thought it was the 'jamming' of critical sections but with the addition of one CS for a specific piece of code, the context switches on the performance monitor are virtually gone.

    What can this be? Any hints and ideas on what to look for, design issues, etc. are welcome because I'm running out of ideas!
    Suppose it is the waiting for critical sections. How can I tell / measure this?
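
    One hedged way to approach the "how can I measure this?" question is to time how long each thread sits in the acquire call. The sketch below is in Python rather than PowerBASIC, purely to show the idea; `TimedLock` and `run_clients` are made-up names, and the `time.sleep` call stands in for whatever work is done while the lock is held. In a Win32 program the analogous measurement would presumably bracket `EnterCriticalSection` with two `QueryPerformanceCounter` readings.

```python
import threading
import time


class TimedLock:
    """A lock that records how long callers were blocked waiting for it."""

    def __init__(self):
        self._lock = threading.Lock()
        self.total_wait = 0.0   # seconds spent blocked, summed over all threads
        self.acquisitions = 0

    def __enter__(self):
        start = time.perf_counter()
        self._lock.acquire()                  # blocks while another thread owns the lock
        waited = time.perf_counter() - start
        # Safe to update the counters here: this thread owns the lock now.
        self.total_wait += waited
        self.acquisitions += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False


def run_clients(n_clients, hold_seconds):
    """Start n_clients threads that each hold the lock for hold_seconds."""
    lock = TimedLock()

    def client():
        with lock:
            time.sleep(hold_seconds)          # placeholder for the protected work

    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return lock


if __name__ == "__main__":
    stats = run_clients(5, 0.02)
    print(f"{stats.acquisitions} acquisitions, {stats.total_wait:.3f} s spent waiting")
```

    If the summed waiting time is a large share of the elapsed time, the threads are queueing on the lock rather than doing work.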

    Hope someone can help.

    Regards

    Jeroen Brouwers



    ------------------

  • #2
    I don't know if the criticalsections are affecting performance; that's impossible to tell given only a "text" description of the applications.

    But using an alternate method of storing, using and protecting the integrity of global data may be in order.

    I just posted (in the last week or two) in the source code forum a demo of using a memory-mapped file with mutex locking; this might be a method you can deploy.

    MCM

    Michael Mattias
    Tal Systems (retired)
    Port Washington WI USA
    [email protected]
    http://www.talsystems.com


    • #3
      You aren't getting into a CS deadlock, are you? Is it possible two threads are waiting for the other one to leave a critical section?


      • #4
        Ron,

        If that were so, it would not be a matter of slowness - it would simply be a DEAD-lock.

        ------------------
        Rgds, Aldo


        • #5
          Aldo, you're right.

          Ron, I'm quite sure. It does proceed but just much much slower! BTW, I've checked for this a thousand times and used strict guidelines for the CS's that I use for different operations.

          Michael, I'll have to look into the mutex example. But, are mutexes locking your application (I'm not sure here..) until the part you're locking is finished? If so, this would pose a serious performance reduction in multi-user mode.

          Thanks for all the response. I'll keep looking into it.

          Jeroen

          ------------------


          • #6
            This is a question for the assembler gurus. In old DOS programs I sometimes disabled the interrupts, did something, and re-enabled them. To set up a semaphore, I took the following steps:

            1- Disable the interrupts
            2- Test the semaphore value; If not zero, jump to step 5
            3- Increment the semaphore value
            4- Set a flag to record that the semaphore was free
            5- Enable the interrupts.

            This takes only a few assembler instructions. To free the semaphore, simply decrement its value (a single assembler instruction can't be interrupted). Now, I don't know whether such a solution can be implemented on today's O.S. (Win NT, Win 2000, etc.). If it can be done, maybe someone can tell you the exact assembler code to do it.
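
            For what it's worth, user-mode code on NT-class systems cannot disable interrupts, so the usual replacement for the steps above is an atomic "test and take" operation supplied by the O.S. or the runtime. A minimal sketch of that idea, written in Python only for illustration (the `TrySemaphore` class and its method names are invented here), uses a non-blocking acquire. On Win32 the `InterlockedExchange` family is the kind of primitive that plays this role.

```python
import threading


class TrySemaphore:
    """Binary semaphore with a non-blocking 'test and take' operation."""

    def __init__(self):
        self._flag = threading.Lock()

    def try_take(self):
        # Atomically tests the flag and takes it if it was free.
        # Returns True when the caller now owns it, False when it was busy.
        return self._flag.acquire(blocking=False)

    def free(self):
        # Corresponds to the "decrement the semaphore" step.
        self._flag.release()


if __name__ == "__main__":
    sem = TrySemaphore()
    if sem.try_take():
        try:
            print("semaphore was free, doing the protected work")
        finally:
            sem.free()
    else:
        print("semaphore was busy, doing something else")
```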

            ------------------
            Rgds, Aldo


            • #7
              Um, I'll bite with this from the SDK help file...


              The CreateSemaphore function creates a named or unnamed semaphore object.

              HANDLE CreateSemaphore(

              LPSECURITY_ATTRIBUTES lpSemaphoreAttributes, // address of security attributes
              LONG lInitialCount, // initial count
              LONG lMaximumCount, // maximum count
              LPCTSTR lpName // address of semaphore-object name
              );
              MCM
              Michael Mattias
              Tal Systems (retired)
              Port Washington WI USA
              [email protected]
              http://www.talsystems.com


              • #8
                Originally posted by jeroen brouwers:
                Hi,

                Strange things happen in multi-user mode: when testing with, for instance, 5 clients, the same DLL that normally exits within a second or 4 now takes about 80 seconds, sometimes even more than 100 seconds! It still works, the results are OK, and nothing crashes, but I don't understand the delay.
                I thought it was the 'jamming' of critical sections but with the addition of one CS for a specific piece of code, the context switches on the performance monitor are virtually gone.
                I thought critical sections were a ONE-PROCESS protection scheme.
                What are 5 clients? Is that the same as 5 threads in one process?
                -------
                If you are running Win95, there is a bug in critical sections,
                or as Microsoft puts it, "a feature" (from the July 1996 issue of Microsoft Systems Journal).
                -------

                ------------------
                Fred
                mailto:[email protected]
                http://www.oxenby.se



                [This message has been edited by Fred Oxenby (edited May 11, 2001).]

                • #9
                  First of all, deadlocks are always possible when using
                  critical sections. Secondly, a slowdown would seem to
                  indicate either a fair amount of jostling or a "queueing"
                  effect for thread time slices.

                  Keep in mind that ideal performance is achieved using the "one
                  thread per CPU" paradigm. Exceeding that number is fine in
                  itself; however, having multiple threads effectively serialized
                  by synchronisation primitives can lead to serious bottlenecks,
                  depending on the code design. In such a case it might be better
                  to have a single-threaded application simulating concurrency
                  (as in socket select() loop based applications) than to rely
                  on threads.
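
                  To make the serialization point concrete, here is a small sketch (Python, for illustration only; `timed_run` is a made-up helper and `time.sleep` stands in for real work). When every thread holds the same lock for its whole job, N threads take roughly N times as long as one. The figures in the first post (about 16 seconds worst case alone, about 80 seconds with five clients) would at least be consistent with that pattern, though that is an observation, not a diagnosis.

```python
import threading
import time


def timed_run(n_threads, job_seconds, shared_lock=None):
    """Run n_threads jobs and return the elapsed wall-clock time.

    If shared_lock is given, each job holds it for its whole duration,
    so the jobs are forced to run one after another.
    """
    def job():
        if shared_lock is None:
            time.sleep(job_seconds)           # jobs overlap freely
        else:
            with shared_lock:
                time.sleep(job_seconds)       # only one job at a time gets here

    threads = [threading.Thread(target=job) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    overlapped = timed_run(5, 0.05)
    serialized = timed_run(5, 0.05, threading.Lock())
    print(f"no shared lock: {overlapped:.2f} s, one shared lock: {serialized:.2f} s")
```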

                  Can you post some (compilable) code which illustrates your design?
                  It's easier to help when getting down to brass tacks.

                  Cheers

                  Florent

                  ------------------


                  • #10
                    Jeroen,

                    When a single thread works on a critical section, there is no problem: the semaphore is always free, and the thread continues working without interruption.

                    With two or more threads, there are situations where a thread must halt because the semaphore is busy. The task manager must switch the CPU to another thread. Will the O.S. switch to exactly the thread you expect? If it switches to some other thread, your whole program is halted for a significant time. Sooner or later the O.S. will start the correct thread, which will release the semaphore. But when will the O.S. restart the first thread (the one that was halted first)?

                    At this moment my system (Win NT4) is running 28 threads (I have only IE5 running). On every thread switch, the task manager must decide which task runs next. If a semaphore is involved, it must also check whether the semaphore is free or busy. So running a multi-threaded program is very different from running a single-threaded one.

                    Another question: how much time does the O.S. need to switch between threads? I remember that on old 8086 platforms (with Intel's iRMX O.S.) I couldn't drive a serial port faster than 9600 baud, because the O.S. called the task manager on each serial port interrupt and couldn't work any faster. Of course we have Pentiums now, which are faster, but the O.S. must still run the task manager for each thread switch, and that takes time.

                    I think your problems lie with the task manager and thread switching. This is not only because of the overhead the task manager needs, but also because yours are not the only active threads in the system. Maybe you can't expect the system to switch your threads so many hundreds of times per second. Hence you can try adjusting the priorities, or set up the program so that it requires fewer thread switches.

                    Aldo


                    ------------------


                    [This message has been edited by Aldo Cavini (edited May 12, 2001).]
                    Rgds, Aldo


                    • #11
                      Michael, I'll have to look into the mutex example. But, are mutexes locking your application (I'm not sure here..) until the part you're locking is finished? If so, this would pose a serious performance reduction in multi-user mode
                      Well, what I was thinking was that the THREADs could wait on the availability of the mutex object when they need to update shared data.

                      And yes, every thread that needs the mutex waits while any one thread is updating the globals, but how long does it take to update two variables?

                      Essentially, what my sample shows is a (political incorrectness alert) jerry-rigged semaphore with a user count of one.

                      You might benefit by purchasing a reprint of my article, "Data Sharing in Win/32" which appeared in the December 1999 issue of "BASICally Speaking." I explain the concept of locking / unlocking the mutex object in great detail.

                      Reprints ($4.00) available from http://www.infoms.com


                      MCM




                      [This message has been edited by Michael Mattias (edited May 12, 2001).]
                      Michael Mattias
                      Tal Systems (retired)
                      Port Washington WI USA
                      [email protected]
                      http://www.talsystems.com


                      • #12
                        Hi guys, thanks for all the response.

                        Can't really post code here (about 13,000 lines). The script engine compares a normal string (a sentence) with a condition (using or's, combi's and wildcards).
                        To summarize: I have 2 entry points which are basically the same. One is for a call from a thread where I only have a pointer to a type (@MyUDT), the other takes the normal variables. From there, I check what elements are in the condition and parse them one by one - keeping the order of things in mind - to see if there's a fit. If a fit can't be achieved with the newly parsed element, the engine exits.
                        So it's a bunch of functions, each designed for a specific element.

                        The global data are two arrays that are used in the engine. I use critical sections to read these data when I need them.

                        I have linked this dll to the web (both NT and W2K server using IIS) and used WCAT to test with multiple clients (that sort of clients...). This is when I noticed the enormous delay.

                        I think my error is this: I used one critical section to access the global data and noticed huge numbers of context switches / sec in the performance monitor. So I added a new critical section to be used with these global data at certain points. The context switches disappeared. So I thought I was on the right track...
                        I have been thinking this weekend that this might be completely wrong: if I use 2 critical sections to protect the same set of global data, I might get rid of the context switches, but I don't get rid of the waiting that one thread has to do because another has the global data locked (maybe with a different critical section). The problem is still the use of global data that has to be protected with a critical section. Therefore, the different threads have to wait on one another until the global data are unlocked (whichever critical section that may be).
                        So I have to test leaving the global data out and seeing what it does then.
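
                        One thing that may be worth checking before that test: two different critical sections do not exclude each other. A thread holding one does not stop another thread from entering the other, so data guarded sometimes by the first and sometimes by the second is not really guarded between those two code paths. A minimal sketch of that behaviour, in Python for illustration only (`cs_a`, `cs_b` and `second_thread_gets_in` are invented names):

```python
import threading

cs_a = threading.Lock()
cs_b = threading.Lock()


def second_thread_gets_in(held, wanted):
    """Hold `held` in this thread and report whether another thread
    can acquire `wanted` at the same time."""
    result = []

    def other():
        got = wanted.acquire(blocking=False)  # try once, never wait
        if got:
            wanted.release()
        result.append(got)

    with held:
        t = threading.Thread(target=other)
        t.start()
        t.join()
    return result[0]


if __name__ == "__main__":
    # Different locks: the second thread walks straight in.
    print("holding A, other thread wants B:", second_thread_gets_in(cs_a, cs_b))
    # Same lock: the second thread is kept out.
    print("holding A, other thread wants A:", second_thread_gets_in(cs_a, cs_a))
```

                        If that matches what the engine does, the drop in context switches after adding the second critical section may simply mean the threads stopped blocking each other at those points.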

                        I'll get back when I tested this.

                        Sincerely

                        Jeroen

                        BTW: if I'm completely wrong with this way of thinking, please let me know (it might save me a lot of work...)



                        ------------------


                        • #13
                          Jeroen, without really understanding the architecture of your program, I am wondering if you might benefit from using a queue or circular buffer which would receive each client request. The worker thread would check the queue/buffer and process each item until the queue/buffer is empty. This will allow you to eliminate the critical section. I would recommend not using one thread per client unless you have a machine with more than 2 processors. Adding more threads generally adds more burden to the already exhausted Windows scheduler. When you think about it, you are creating more work for Windows in order to prevent more than one worker thread from running and modifying variables. I think a queue/buffer is your ticket to serenity.
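
                          A rough sketch of the queue idea, in Python for illustration only (`start_worker`, `submit` and the `None` stop marker are names and conventions invented for this example): client threads put requests on a queue, and a single worker owns the script data and answers each request in turn. The queue itself still synchronizes the hand-off internally; what goes away is the locking around the engine's own data.

```python
import queue
import threading


def start_worker(handle_request):
    """Start one worker thread that serves requests from a queue."""
    requests = queue.Queue()

    def worker():
        while True:
            item = requests.get()             # sleeps until a request arrives
            if item is None:                  # stop marker
                break
            payload, reply = item
            reply.put(handle_request(payload))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return requests, thread


def submit(requests, payload):
    """Called by a client thread: queue a request and wait for its answer."""
    reply = queue.Queue(maxsize=1)
    requests.put((payload, reply))
    return reply.get()


if __name__ == "__main__":
    requests, worker = start_worker(lambda sentence: sentence.upper())
    print(submit(requests, "does this sentence match?"))
    requests.put(None)                        # ask the worker to stop
    worker.join()
```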

                          Just a thought,

                          Ron

                          PS: You said the two variables are arrays. Are the array members dedicated to individual clients so that more than one thread could modify the array without a critical section?

                          [This message has been edited by Ron Pierce (edited May 14, 2001).]


                          • #14
                            Hi Ron,

                            I think my program would certainly benefit from your 'thread pooling' example. I have the source code lying here but still didn't have the time.
                            BTW, isn't your example on thread pooling basically the same as having a semaphore?

                            Concerning the global arrays: one is for creating a unique SQL statement number for each client, and one is just a string array with data. Only the first one is changed by the clients' requests. The other one is static (no data change), but I understood that just a read could also damage the data, hence the use of a critical section.
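
                            A sketch of how small the protected part can be for the first array's job, in Python for illustration only (`StatementNumbers`, `take` and `STATIC_DATA` are invented names, not the engine's real ones): the lock is held just long enough to hand out the next number, and the static data is read with no lock at all. On Win32, `InterlockedIncrement` is the kind of call that could do the counter step without any critical section.

```python
import threading

# Static data that never changes after start-up: read without a lock.
STATIC_DATA = ("first entry", "second entry", "third entry")


class StatementNumbers:
    """Hands out unique, increasing statement numbers to client threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = 0

    def take(self):
        # The lock covers only the read-modify-write of the counter.
        with self._lock:
            self._last += 1
            return self._last


if __name__ == "__main__":
    numbers = StatementNumbers()
    print(numbers.take(), numbers.take(), STATIC_DATA[0])
```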

                            I'll look into it some more.

                            Thanks
                            Jeroen


                            ------------------


                            • #15
                              I understood that just a read could also damage the data
                              Is it true? I don't see how read-only data can be damaged by a read operation. If a thread is switched out during a read operation and the data doesn't change, the read will continue without errors on resume. Or am I wrong? If this risk exists, I must change all my multithreaded programs...

                              ------------------
                              Rgds, Aldo


                              • #16
                                The global data are two arrays that are used in the engine. I use critical sections to read these data when I need them.
                                So instead of referring to these GLOBALs by name, why not save pointers to the first element of the arrays and store those pointers in a memory-mapped file?

                                As long as your program defines the data as GLOBAL, the array descriptors don't move. (And even if they do, I think if you do your own allocations from the heap you can guarantee non-moving data).

                                However, as your application description has emerged, I must say I really like Mr. Pierce's idea of not using one thread per client, instead using one "worker" thread.

                                You cannot get a quart from a pint jar; more threads sharing the same number of CPU clocks cannot do more work than one thread which handles all the requests.

                                MCM


                                Michael Mattias
                                Tal Systems (retired)
                                Port Washington WI USA
                                [email protected]
                                http://www.talsystems.com


                                • #17
                                  There is no problem accessing data which is read-only
                                  in multiple threads. You might want to make this data
                                  literally read-only as in the following example (any
                                  attempt to change/overwrite the data will cause an access
                                  violation):

                                  Code:
                                  #INCLUDE "win32api.inc"
                                   
                                  GLOBAL MyData() AS ASCIIZ * 18
                                   
                                  FUNCTION MyStringData() AS DWORD
                                    
                                      FUNCTION = CODEPTR(MyStringDataArray)
                                      EXIT FUNCTION
                                    
                                  MyStringDataArray:
                                      '15 zero-terminated strings of 18 bytes each: "H", a two-character index, then "w are you then"
                                      !db "H"," ","0","w"," ","a","r","e"," ","y","o","u"," ","t","h","e","n",0
                                      !db "H"," ","1","w"," ","a","r","e"," ","y","o","u"," ","t","h","e","n",0
                                      !db "H"," ","2","w"," ","a","r","e"," ","y","o","u"," ","t","h","e","n",0
                                      !db "H"," ","3","w"," ","a","r","e"," ","y","o","u"," ","t","h","e","n",0
                                      !db "H"," ","4","w"," ","a","r","e"," ","y","o","u"," ","t","h","e","n",0
                                      !db "H"," ","5","w"," ","a","r","e"," ","y","o","u"," ","t","h","e","n",0
                                      !db "H"," ","6","w"," ","a","r","e"," ","y","o","u"," ","t","h","e","n",0
                                      !db "H"," ","7","w"," ","a","r","e"," ","y","o","u"," ","t","h","e","n",0
                                      !db "H"," ","8","w"," ","a","r","e"," ","y","o","u"," ","t","h","e","n",0
                                      !db "H"," ","9","w"," ","a","r","e"," ","y","o","u"," ","t","h","e","n",0
                                      !db "H","1","0","w"," ","a","r","e"," ","y","o","u"," ","t","h","e","n",0
                                      !db "H","1","1","w"," ","a","r","e"," ","y","o","u"," ","t","h","e","n",0
                                      !db "H","1","2","w"," ","a","r","e"," ","y","o","u"," ","t","h","e","n",0
                                      !db "H","1","3","w"," ","a","r","e"," ","y","o","u"," ","t","h","e","n",0
                                      !db "H","1","4","w"," ","a","r","e"," ","y","o","u"," ","t","h","e","n",0
                                   
                                  END FUNCTION
                                   
                                  FUNCTION ShowData( BYVAL lId AS LONG ) AS LONG
                                      LOCAL i AS LONG
                                      LOCAL s AS STRING
                                   
                                      FOR i = 0 TO 14
                                          SLEEP RND(1, (lId + 1) * RND(1,10) )
                                          STDOUT  MyData(i) + " at index: " + FORMAT$(i) + " for Thread: " + FORMAT$(lId)
                                   
                                      NEXT
                                   
                                  END FUNCTION
                                   
                                  FUNCTION PBMAIN() AS LONG
                                      LOCAL i AS LONG
                                      LOCAL lResult AS LONG
                                      LOCAL dwResult AS DWORD
                                      DIM lThreads(0:63) AS LONG
                                      DIM MyData(0:14) AS GLOBAL ASCIIZ * 18 AT MyStringData()
                                        
                                      RANDOMIZE TIMER
                                      FOR i = 0 TO 63
                                          THREAD CREATE ShowData( i ) TO lThreads(i)
                                      NEXT
                                   
                                      CALL WaitForMultipleObjects( 64, BYVAL VARPTR(lThreads(0)), %TRUE, %INFINITE )
                                   
                                      FOR i = 0 TO 63
                                          THREAD CLOSE lThreads(i) TO lResult
                                      NEXT
                                  
                                   
                                      PRINT "Finished..."
                                      WAITKEY$
                                   
                                  
                                  END FUNCTION

                                  ------------------


                                  • #18
                                    A handy way to implement a job queue is by using the
                                    QueueUserAPC function (NT/Win2000 only), where you could
                                    spawn a worker thread per CPU and send it something
                                    to do.

                                    You'll find an example in the source code forum
                                    at: http://www.powerbasic.com/support/pb...ad.php?t=23286

                                    Cheers

                                    Florent

                                    ------------------


                                    • #19
                                      Jeroen, reading variables does not require a critical section. I like Florent's idea of forcing them to be read-only.

                                      In my implementation of a circular buffer, my worker thread(s) are not terminated or exited when the work is done. The thread then waits (forever) for an event to be signaled telling it to go to work.
                                      Each thread has a UDT defining what it must do and pointing to the data it must process. At one point I was suspending the threads and resuming them, but ultimately I opted for the use of the kernel event object. If you only use one worker thread, suspending the thread might be more efficient. Some simple benchmarks should show you the most efficient route to take. I think creating a thread will use the most CPU cycles.
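
                                      The wait-for-an-event pattern can be sketched roughly like this, in Python for illustration only (`EventWorker`, `kick` and `shutdown` are invented names, and the sketch assumes a single caller at a time). The Win32 counterparts would be an event object created with `CreateEvent`, signaled with `SetEvent`, and waited on with `WaitForSingleObject`.

```python
import threading


class EventWorker:
    """A long-lived worker thread that sleeps until it is signaled."""

    def __init__(self, task):
        self._task = task
        self._wake = threading.Event()   # "go to work" signal
        self._done = threading.Event()   # "work finished" signal
        self._stop = False
        self.runs = 0
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while True:
            self._wake.wait()            # uses no CPU while waiting
            self._wake.clear()
            if self._stop:
                break
            self._task()
            self.runs += 1
            self._done.set()

    def kick(self):
        """Signal the worker and wait until it has finished one run."""
        self._done.clear()
        self._wake.set()
        self._done.wait()

    def shutdown(self):
        self._stop = True
        self._wake.set()
        self._thread.join()


if __name__ == "__main__":
    worker = EventWorker(lambda: print("processing one batch"))
    worker.kick()
    worker.kick()
    worker.shutdown()
```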


                                      • #20
                                        Originally posted by Ron Pierce:
                                        Jeroen, reading variables does not require a critical section. I like Florent's idea of forcing them to be read-only.
                                        To be clear, that is only true if there is absolutely NO chance that the variables will be modified by any other thread. If that is possible, a Critical Section (or some other semaphore) is a "requirement" to avoid reading partially updated data due to a context switch part way through the update.
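
                                        A small sketch of that rule, in Python for illustration only (the `Pair` class and its two fields are invented for the example): the writer changes two related values under a lock, and the reader takes the same lock, so it can never observe the state where only the first value has been updated.

```python
import threading


class Pair:
    """Two values that must always be seen as a matching pair."""

    def __init__(self):
        self._lock = threading.Lock()
        self.a = 0
        self.b = 0          # invariant outside the lock: a == b

    def update(self, value):
        with self._lock:
            self.a = value
            # A context switch here is harmless: readers cannot get in.
            self.b = value

    def snapshot(self):
        with self._lock:
            return self.a, self.b


if __name__ == "__main__":
    pair = Pair()
    writer = threading.Thread(target=lambda: [pair.update(i) for i in range(1000)])
    writer.start()
    mismatches = sum(1 for _ in range(1000) if len(set(pair.snapshot())) != 1)
    writer.join()
    print("mismatched reads:", mismatches)
```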



                                        ------------------
                                        Lance
                                        PowerBASIC Support
                                        mailto:[email protected]