Thread-Local Variables

  • Thread-Local Variables

    I have created a product (UCalc Fast Math Parser, which is featured in the April issues of Visual C++ Mag and Windows Developer Journal) with PB. People often ask if it is “thread-safe”. The way I designed it, it is not thread-safe. The main problem is that I use some global variables. In order for it to be thread-safe, the global variables would have to be local to each thread (but global for the functions within the thread), instead of shared between the threads. My DLL includes direct support for leading compilers, such as VB, VC++, C++ Builder, Delphi, etc. I am aware of TLS, although it seems a little complicated. Looking at the documentation for various compilers, I see that they have their own native ways of handling TLS, which are much easier than using things like TlsAlloc, TlsSetValue, etc. I would like to suggest that PowerBASIC include a native feature for supporting TLS as well.

    Here is what I’ve found so far, looking at the documentation for other compilers:

    In Borland C++ Builder, you can define a thread-local variable like this:

    int __thread x;

    When the explanation before this said “Sometimes, however, you may want to use variables that are global to all the routines running in your thread, but not shared with other instances of the same thread class.”, I knew that this was exactly what I needed.

    Visual C++ supports the following (so does C++ Builder):

    __declspec( thread ) int tls_i = 1;

    Delphi supports the following syntax:

    threadvar X: Integer;

    In Visual Basic it’s even easier (the previous version of my DLL component was done in VB). All I had to do was select the type of Instancing I wanted, from a combo-box menu.

    So it would be nice to have a built-in way of doing this in a future version of PowerBASIC as well. Meanwhile, how can I make use of Thread Local Variables in PB?

    For instance, how can I create a trivial DLL with the following function?
    Code:
    Global X as Long
    
    Function IncrementX Export as Long
    	IncrementX = X
    	X = X+1
    End Function
    What would the LibMain function have to look like if I wanted the IncrementX function to return 0 the first time it is called from each new thread (instead of returning a value incremented by another thread)?
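
For comparison with the compiler keywords listed above, here is a minimal sketch of the requested behaviour in standard C++11 rather than PowerBASIC. It assumes the `thread_local` keyword, which is newer than the compilers discussed here; `first_two_calls_on_new_thread` is a hypothetical helper added only to show what a fresh thread sees.

```cpp
#include <thread>
#include <utility>

// One copy of x per thread, set to 0 when a thread first uses it.
thread_local long x = 0;

// Returns the calling thread's current value, then increments it.
long IncrementX() {
    long previous = x;
    x = x + 1;
    return previous;
}

// Hypothetical helper: calls IncrementX twice on a brand-new thread
// and reports the two values that thread saw.
std::pair<long, long> first_two_calls_on_new_thread() {
    std::pair<long, long> seen{-1, -1};
    std::thread worker([&seen] {
        seen.first = IncrementX();
        seen.second = IncrementX();
    });
    worker.join();
    return seen;
}
```

In this sketch, calls made earlier on the calling thread have no effect on what the new thread sees, because each thread owns its own `x`.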

    -------------
    Daniel Corbier
    UCalc Fast Math Parser
    http://www.ucalc.com

  • #2
    I'll pass your suggestion for Thread Local Storage (TLS) along to R&D. Excellent idea IMHO.

    When it comes to multi-threading, you open a whole new can of worms - all sorts of problems can arise when trying to adapt "solid" single-threaded code into multiple-threaded code.

    The first place to start learning is to read up on Synchronization Objects, which you'll need to use when accessing or altering any block of memory larger than 32 bits wide that can be accessed or altered by another thread. The Rector/Newcomer book explains this subject in detail, including the pitfalls of relying 100% on LIBMAIN(), which can fail to be called in certain circumstances.

    To implement your own form of TLS, the biggest problem you are faced with is that you need to be able to assign (or be assigned) a unique identifier for each thread, and then continue to use this unique value to somehow identify a particular block of memory for each thread. Essentially, this is how TLS works anyway.

    One interesting fact is that the API provides a unique ID for each thread - GetCurrentThreadID(). We can use this number (or we could generate one of our own) to track entries in a block of memory (or maybe even an array), thereby providing a method for separating storage between threads - say, one array subscript per thread.

    For example, if your LIBMAIN() function intercepts %DLL_THREAD_ATTACH and increments a variable, that variable could be used as an index into a GLOBAL array. This would provide a simple way for one array subscript to relate only to one thread. You'd need to handle %DLL_THREAD_DETACH and this can be a bit more problematic - see the Rector/Newcomer book for more info.
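
The idea of keying storage on the thread ID can be sketched as follows. This is standard C++11 rather than PowerBASIC, with `std::this_thread::get_id()` standing in for GetCurrentThreadID(); the names `ThreadData`, `add_to_my_counter`, `release_my_slot` and `add_twice_on_new_thread` are hypothetical and exist only for illustration.

```cpp
#include <map>
#include <mutex>
#include <thread>

// Hypothetical per-thread record; the field is illustrative only.
struct ThreadData {
    long counter = 0;
};

std::mutex g_table_lock;                        // guards g_table
std::map<std::thread::id, ThreadData> g_table;  // one entry per thread

// Adds n to the calling thread's counter and returns the new total.
long add_to_my_counter(long n) {
    std::lock_guard<std::mutex> guard(g_table_lock);
    ThreadData& mine = g_table[std::this_thread::get_id()];  // created on first use
    mine.counter += n;
    return mine.counter;
}

// Drops the calling thread's entry; a thread would call this before it exits.
void release_my_slot() {
    std::lock_guard<std::mutex> guard(g_table_lock);
    g_table.erase(std::this_thread::get_id());
}

// Hypothetical helper: runs add_to_my_counter(n) twice on a new thread.
long add_twice_on_new_thread(long n) {
    long result = 0;
    std::thread worker([&result, n] {
        add_to_my_counter(n);
        result = add_to_my_counter(n);
        release_my_slot();
    });
    worker.join();
    return result;
}
```

The table itself is shared between threads, which is why every access takes the lock even though no two threads ever touch the same entry.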

    Anyway, this should give you a good starting point. Be sure to read up on critical sections and the pitfalls of LibMain() in a good book. Failing to get the 'formula' right in a multi-threaded app can mean _very_ intermittent GPFs in your code or the code of those that use your DLL - these are some of the hardest bugs you may ever have to find.

    ------------------
    Lance
    PowerBASIC Support


    • #3
      TLS storage can be done under PB, but it would be nice if the compiler did all the footwork for you.
      This is not meant to be used in a "real" program. It's just a proof of concept.
      Thread Safe DLL called "AddtoA.dll"
      Code:
      #DIM ALL
      #COMPILE DLL
      #INCLUDE "WIN32API.INC"
      TYPE TLSType
        A AS LONG
        B AS ASCIIZ * 12
        C AS LONG
      END TYPE
      
      GLOBAL gTLSIndex() AS LONG
      GLOBAL gTLSPTR() AS TLSType PTR
      GLOBAL gTLSCount AS LONG
      GLOBAL gTLSMutex AS LONG
      
      FUNCTION GetThreadLocalStorage(rTLS AS TLSType) AS LONG
        LOCAL cThread AS LONG, hHeap AS LONG
        LOCAL I AS LONG
        WaitForSingleObject gTLSMutex, %INFINITE
          cThread = GetCurrentThreadID
          ARRAY SCAN gTLSIndex(), =cThread, TO I
          IF I = 0 THEN
            ARRAY SCAN gTLSIndex(), =0, TO I
            IF I = 0 THEN
              gTLSCount = gTLSCount + 1
              REDIM PRESERVE gTLSIndex(1:gTLSCount) AS LONG
              REDIM PRESERVE gTLSPTR(1:gTLSCount) AS TLSType PTR
              I = gTLSCount
            END IF
            gTLSIndex(i) = cThread
            hHeap = GetProcessHeap
            gTLSPTR(i) = HeapAlloc(hHeap, %HEAP_ZERO_MEMORY, LEN(rTLS))
          END IF
          rTLS = @gTLSPTR(I)
        ReleaseMutex gTLSMutex
      END FUNCTION
      
      FUNCTION PutThreadLocalStorage(rTLS AS TLSType) AS LONG
        LOCAL cThread AS LONG, hHeap AS LONG
        LOCAL I AS LONG
        WaitForSingleObject gTLSMutex, %INFINITE
          cThread = GetCurrentThreadID
          ARRAY SCAN gTLSIndex(), =cThread, TO I
          IF I = 0 THEN
            ARRAY SCAN gTLSIndex(), =0, TO I
            IF I = 0 THEN
              gTLSCount = gTLSCount + 1
              REDIM PRESERVE gTLSIndex(1:gTLSCount) AS LONG
              REDIM PRESERVE gTLSPTR(1:gTLSCount) AS TLSType PTR
              I = gTLSCount
            END IF
            gTLSIndex(i) = cThread
            hHeap = GetProcessHeap
            gTLSPTR(i) = HeapAlloc(hHeap,%HEAP_ZERO_MEMORY,LEN(rTLS))
          END IF
          @gTLSPTR(I) = rTLS
        ReleaseMutex gTLSMutex
      END FUNCTION
      
      FUNCTION FreeThreadLocalStorage ALIAS "FreeThreadLocalStorage" () EXPORT AS LONG
        LOCAL I AS LONG
        WaitForSingleObject gTLSMutex, %INFINITE
          ARRAY SCAN gTLSIndex(), =GetCurrentThreadID, TO I
          IF I <> 0 THEN
            IF gTLSIndex(I) <> 0 THEN
              HeapFree GetProcessHeap, 0, BYVAL gTLSPTR(I)
              gTLSIndex(I) = 0
            END IF
          END IF
        ReleaseMutex gTLSMutex
      END FUNCTION
      
      FUNCTION FreeAllThreadLocalStorage() AS LONG
        LOCAL bLibMainFailed AS LONG
        LOCAL hHeap AS LONG
        LOCAL I AS LONG
        WaitForSingleObject gTLSMutex, %INFINITE
          hHeap = GetProcessHeap
          FOR I = 1 TO gTLSCount
            IF gTLSIndex(I) <> 0 THEN
              bLibMainFailed = %TRUE
              HeapFree hHeap, 0, BYVAL gTLSPTR(I)
              gTLSIndex(I) = 0
            END IF
          NEXT
        ReleaseMutex gTLSMutex
        IF bLibMainFailed = %TRUE THEN MSGBOX "LibMain THREAD_DETACH did not get called!"
      END FUNCTION
      
      FUNCTION AddToA ALIAS "AddToA" (x AS LONG) EXPORT AS LONG
        DIM lTLS AS TLSType
        GetThreadLocalStorage lTLS
          lTLS.A = lTLS.A + x
          FUNCTION = lTLS.A
        PutThreadLocalStorage lTLS
      END FUNCTION
      
      FUNCTION LibMain(BYVAL hInstance   AS LONG, _
                       BYVAL fwdReason   AS LONG, _
                       BYVAL lpvReserved AS LONG) EXPORT AS LONG
        LOCAL lTLS AS TLSType
        SELECT CASE fwdReason
          CASE %DLL_PROCESS_ATTACH
            gTLSMutex = CreateMutex(BYVAL %NULL, %TRUE, BYVAL %NULL)
              REDIM gTLSIndex(1:1) AS LONG, gTLSPTR(1:1) AS TLSType PTR
              gTLSCount = 1
            ReleaseMutex gTLSMutex
            LibMain = 1   'success!
          CASE %DLL_PROCESS_DETACH
            FreeAllThreadLocalStorage ' Only done for niceness
            CloseHandle gTLSMutex
            LibMain = 1   'success!
          CASE %DLL_THREAD_ATTACH
            PutThreadLocalStorage lTLS
            LibMain = 1   'success!
          CASE %DLL_THREAD_DETACH
            FreeThreadLocalStorage
            LibMain = 1   'success!
        END SELECT
      END FUNCTION
      Our Testing EXE "testata.exe":
      Code:
      #DIM ALL
      #COMPILE EXE
      %NumberOfChildThreads = 3
      DECLARE FUNCTION FreeThreadLocalStorage LIB "ADDTOA.DLL" ALIAS "FreeThreadLocalStorage" () AS LONG
      DECLARE FUNCTION AddToA LIB "ADDTOA.DLL" ALIAS "AddToA" (x AS LONG) AS LONG
      
      FUNCTION ChildThread(BYVAL dwParam AS LONG) AS LONG
        DIM r AS LONG
        r = AddToA(dwParam)
        r = AddToA(dwParam)
        MSGBOX STR$(r), 0, "Child Thread: " & str$(dwParam)
        FreeThreadLocalStorage
      END FUNCTION
      
      FUNCTION PBMAIN() AS LONG
        DIM i AS LONG, r AS LONG, cThreads(1:%NumberOfChildThreads) AS LONG
        FOR i = 1 TO %NumberOfChildThreads
          THREAD CREATE ChildThread(i) TO cThreads(i)
        NEXT i
        MSGBOX "Close me last or face the consequences!", 0, "Main Thread"
        FOR i = 1 TO %NumberOfChildThreads
          THREAD CLOSE cThreads(i) TO r
        NEXT i
      END FUNCTION
      Notes:
      1. There is no guarantee that LibMain's thread attach/detach notifications are going to get called. If you run into this, the app will bloat if it opens and closes lots of threads. It's not a leak, because we still have a pointer to the memory. Blame MS and force users to call the FreeThreadLocalStorage function before a thread closes.
      2. If you don't use dynamic arrays, only allow 255 threads, AND the other app calls FreeThreadLocalStorage at thread death, you can avoid all the heap API calls and just use a non-dynamic global array.
      3. PutThreadLocalStorage is there so that we change a copy on the stack rather than the heap block. It also leaves the door open if you want to add dynamic arrays or dynamic strings to your TLS.
      4. Dynamic arrays and strings can also be done by creating them locally, as we did with the UDT, and passing them BYREF.
      5. Whoops, I just realized HeapFree uses BYREF ANY for the target. I changed the calls to use BYVAL, changed the LONG array to be a UDT PTR, and changed the test EXE too.
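
The fixed-size variant from note 2 might look roughly like this. It is a sketch in standard C++11, not PowerBASIC; the 255-slot limit comes from the note, while `Slot`, `find_or_claim_slot`, `free_my_slot` and `claim_on_new_thread` are hypothetical names chosen for illustration.

```cpp
#include <array>
#include <mutex>
#include <thread>

constexpr int kMaxThreads = 255;   // fixed capacity, as in note 2

struct Slot {
    bool in_use = false;
    std::thread::id owner;
    long a = 0;                    // stands in for the TLSType fields
};

std::mutex g_slots_lock;                 // guards g_slots
std::array<Slot, kMaxThreads> g_slots;   // static storage, no heap calls

// Returns the calling thread's slot index, claiming the first free slot
// if the thread has none yet. Returns -1 when every slot is taken.
int find_or_claim_slot() {
    std::lock_guard<std::mutex> guard(g_slots_lock);
    const std::thread::id me = std::this_thread::get_id();
    int free_index = -1;
    for (int i = 0; i < kMaxThreads; ++i) {
        if (g_slots[i].in_use && g_slots[i].owner == me) return i;
        if (!g_slots[i].in_use && free_index < 0) free_index = i;
    }
    if (free_index >= 0) {
        g_slots[free_index].in_use = true;
        g_slots[free_index].owner = me;
        g_slots[free_index].a = 0;
    }
    return free_index;
}

// Must be called by each thread before it exits, as note 2 requires.
void free_my_slot() {
    std::lock_guard<std::mutex> guard(g_slots_lock);
    const std::thread::id me = std::this_thread::get_id();
    for (Slot& s : g_slots) {
        if (s.in_use && s.owner == me) s.in_use = false;
    }
}

// Hypothetical helper: claims and frees a slot on a new thread.
int claim_on_new_thread() {
    int index = -1;
    std::thread worker([&index] {
        index = find_or_claim_slot();
        free_my_slot();
    });
    worker.join();
    return index;
}
```

A freed slot is reused by the next thread that asks, so the table only fills up if threads exit without calling `free_my_slot`.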


      [This message has been edited by Enoch S Ceshkovsky (edited March 30, 2000).]



      • #4
        Thanks for the responses. The above code looks pretty complicated, and in addition, it would appear that every call to the exported AddToA function would cycle through a long list of instructions (via GetThreadLocalStorage and PutThreadLocalStorage). In my component, some of the exported functions are very time critical, so adding extra instructions would not be so great. By the way, I do use global arrays, so I suppose it would further complicate things.

        I find that if I load a program several times (a program which uses the DLL), the threads in each runtime process appear to be completely independent (thread-safe). And it doesn't require any extra code for it to work that way. So why isn't there a more or less clean-cut way for the same runtime process to have independent threads without adding code that will slow things down?

        Is there a way of perhaps spawning a new thread process from the host code that calls the DLL (without having to modify the DLL itself)? I figure that something like this should be possible. Or perhaps is there a product out there that can do something like this?

        Lance, you mentioned something about the altering of blocks of memory that can be accessed/altered by another thread, which indeed would require caution. However, I actually don't want any of the memory blocks to be shared at all by any of the threads. I want them to work like threads from separate runtime processes.

        I'm looking forward to having the thread-local variable feature in a future version of PB, but in the meantime I hope to find a solution that I can start implementing now.

        ------------------
        Daniel Corbier
        UCalc Fast Math Parser
        http://www.ucalc.com


        • #5
          Daniel,
          It is not as complicated as it looks. It also is pretty quick in its current state and could be optimized heavily.

          Global dynamic arrays or just global arrays? Global arrays are not a problem if they are inside the TLS UDT. Dynamic strings and globals are slow and should be avoided in time-critical code anyway. If you can convert the dynamic arrays/strings to static, it can work as is. If you really need them, it's possible, and I can give you another outlined example like the one above.

          "I find that if I load a program several times (a program which uses the DLL), the threads in each runtime process appear to be completely independent (thread-safe). And it doesn't require any extra code for it to work that way."
          This statement leads me to believe that you would benefit from Lance's book suggestion. I'll try to summarize: All of a DLL's variables are actually stored in your EXE's memory. Only the code of the DLL is shared in memory. With win2k, the code is not always shared. When you make a CALL, the thread actually executing the dll code is the one that made the CALL.

          Spawning a new thread on every exported call is going to be a lot slower than the sample I provided could ever be.

          To sum it up, there is no way, no how, no matter what language you use to get TLS without extra code in each exported function. If you really need to remember something, consider forcing the user to save it between calls.


          ------------------



          • #6
            I do not want to spawn a new thread each time an exported function is called. I think I should explain my DLL component a little more to give you a better idea of what I'm trying to do.

            The component is a math parser. Currently UCalc Fast Math Parser can do a summation of an expression such as "x^2+5*x-10", running this calculation in a loop a million times, in less than one second on my PC (Pentium III 600 MHz).

            The following is pseudo-code and is not the syntax used in my math parser, however, here's an explanation of what I mean:
            Code:
            Thread 1
            --------
            For x = 1 to 1,000,000
               Total = Total + x^2+5*x-10
            Next
            
            Thread 2
            --------
            For x = 1 to 1,000,000
               Total = Total + x^2+5*x-10
            Next
            The programmer may need to run these two threads at the same time. However, the way it is now, both threads will be changing the value of x at the same time, creating a mess.
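
For reference, the two loops can be sketched so that neither thread can disturb the other's x. This is standard C++11 rather than the parser's own syntax; `sum_expression` and `run_two_threads` are hypothetical names, and the expression is hard-coded instead of being parsed.

```cpp
#include <thread>

// Sums x^2 + 5*x - 10 for x = 1..n using only local variables.
double sum_expression(long n) {
    double total = 0.0;
    for (long x = 1; x <= n; ++x) {   // x lives on the calling thread's stack
        total += static_cast<double>(x) * x + 5.0 * x - 10.0;
    }
    return total;
}

// Hypothetical driver: runs the same loop on two threads at once and
// stores each thread's result in its own output variable.
void run_two_threads(long n, double& total1, double& total2) {
    std::thread t1([&total1, n] { total1 = sum_expression(n); });
    std::thread t2([&total2, n] { total2 = sum_expression(n); });
    t1.join();
    t2.join();
}
```

Because x and the running total are locals here, each thread works on its own copies; the trouble described above only starts once x lives in storage that both threads share.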

            Here is some actual code (although not useful for anything) which would work fine with the previous version of my component which was written in VB as an ActiveX DLL. You may need to browse the help file for my component to understand the syntax.

            Code:
            Private Sub Command1_Click()
                Dim Eq As Long, xVar As Long, x As Double, Total As Double
                Set Thread1 = New FastMath
                
                Thread1.ucDefineVariable "b=1000"
                xVar = Thread1.ucDefineVariable("x=0")
                Eq = Thread1.ucParse("x^2+5*x-b")
            
                For x = 1 To 1000000
                    Thread1.ucVariableValue(xVar) = x
                    Call DoSomethingInAnotherThread
                    Total = Total + Thread1.ucEvaluate(Eq)
                Next
                
                MsgBox Total
                MsgBox Thread1.ucEval("x+b")
            End Sub
            
            Private Sub DoSomethingInAnotherThread()
                Set Thread2 = New FastMath
              
                Thread2.ucDefineVariable "x=10"
                Thread2.ucDefineVariable "b=5"
                Text2 = Thread2.ucEval("x+b")
                'x and b here are independent 
                'from x and b in the other thread
            End Sub
            By the way, the previous version of the DLL which was done with VB did not have any extra code in the exported functions for taking care of TLS. This was automatically taken care of by the compiler. From my understanding of the syntax of the other compilers as mentioned earlier in this thread (this forum message thread I mean), the other compilers would not require any extra code in the exported functions either.

            ------------------
            Daniel Corbier
            UCalc Fast Math Parser
            http://www.ucalc.com


            • #7
              They may not need 'extra code' because the compiler adds in all of the extra code required to implement TLS for you. The only difference is that you have to write code to do it with PB, but at least you have complete control over how it is done.

              Daniel, it is clear from your descriptions that you need to learn more about how memory is treated and accessed between threads - the concept of "thread-safe" programming only applies to situations where two or more threads can access the EXACT same block of memory (be it a variable's data, a buffer, string or whatever). When you spawn a thread, it gets its own stack, therefore LOCAL variables are truly LOCAL to each thread. You need to use synchronization if two threads try to access GLOBAL variables (which are shared between threads). STATIC variables are effectively only shared between threads executing the same code.
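
A small sketch of the GLOBAL case, in standard C++11 rather than PowerBASIC: several threads update one shared variable, and a mutex plays the role of the synchronization object. The names `g_total`, `add_many` and `run_workers` are hypothetical.

```cpp
#include <mutex>
#include <thread>
#include <vector>

long g_total = 0;          // GLOBAL: one copy shared by every thread
std::mutex g_total_lock;   // synchronization object guarding g_total

// Increments the shared total 'times' times, taking the lock for each update.
void add_many(int times) {
    for (int i = 0; i < times; ++i) {
        std::lock_guard<std::mutex> guard(g_total_lock);
        ++g_total;
    }
}

// Hypothetical driver: resets the total, runs 'workers' threads, and
// returns the final value once every thread has finished.
long run_workers(int workers, int times) {
    {
        std::lock_guard<std::mutex> guard(g_total_lock);
        g_total = 0;
    }
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back(add_many, times);
    }
    for (std::thread& t : pool) {
        t.join();
    }
    return g_total;
}
```

Without the lock, the increments from different threads could interleave and the final total would be unpredictable.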

              In your example above (two FOR X loops), as long as X is a LOCAL variable, there is no issue. However, if X is GLOBAL, then your FOR loops will explode because X's value at any instant cannot be predicted, as both threads would alter the variable in unpredictable ways. To see the effect of LOCAL variables in operation, take a look at the thread example code supplied with PB/DLL.

              Finally, my advice is simple: get a good book on writing multi-threaded applications. Rector/Newcomer's "Win32 Programming" book has a good and very informative chapter on threading, TLS and synchronization objects - and (most importantly) why synchronization is absolutely necessary.



              ------------------
              Lance
              PowerBASIC Support


              • #8
                In addition to what Lance said, VB doesn't support threading the way people think it does.
                Code:
                This is not multithreading code:
                DIM abc as new meowobject
                DIM def as new meowobject
                abc.meow "1"
                def.meow "2"
                
                This is multithreading code:
                PRIVATE WITHEVENTS abc as new meowobject
                PRIVATE WITHEVENTS def as new meowobject
                Function Command1_Click()
                 abc.meow "1"
                 def.meow "2"
                End Function
                Function abc_results()
                 msgbox abc.value
                End Function
                Function def_results()
                 msgbox def.value
                end function
                And in MeowObject's class in a SEPARATE DLL:
                PUBLIC Event Results()
                DIM WITHEVENTS timer1 as TIMER
                Sub MeowObject_Meow()
                  timer1.interval = 1000  
                  timer1.enabled = true
                End Sub
                Sub Timer1_Timer()
                  timer1.enabled = false
                  RaiseEvent Results
                End Sub

                ------------------



                • #9
                  In the past I had gone to Amazon.com in search of a book like what you suggested. I was aware of the Newcomer Win32 book, since you had mentioned it before in different forum discussions. However, while there, I also saw books like:

                  Code:
                  * Win32 Multithreaded Programming -- Aaron Cohen, et al
                  * Multithreading Applications in Win32
                     The Complete Guide to Threads -- Jim Beveridge, Robert Wiener et al...
                  * Multithreaded Programming with Win32; Thuan Q. Pham, et al
                  etc... which seem to deal more specifically with the subject. Then there’s a number of general books like the Newcomer Win32 book and others. Too much to choose from, and I haven’t selected any yet. However, I do already have a collection of documentation on my computer, such as the full MSDN 2000 library, separate VB documentation, Delphi and C++ Builder docs, etc... all of which cover topics about threads. I’ve done some reading about Synchronization, Mutex, Critical code, TLS, thread-safety, etc... I do have an idea what these are about.

                  What I am after is not just a solution for the thread problem I'm facing. What I'm specifically inquiring about is a simple solution (in other words, one which does not require me to restructure my current code, which is complex enough as is). In addition to being simple, I want the solution not to have an adverse effect on the component’s current speed. Apparently a simple solution in PB does not exist yet. That’s fine, so I started this forum thread by suggesting native support for thread-local variables in a future version of PB, similar to other compilers. That’s what I’m asking for.

                  Meanwhile, I clearly understand that "thread-safe" programming applies to situations where two or more threads can access the exact same block of memory. I think you guys misunderstood the explanation of my use of global variables. Let me try again. My functions use local variables. There’s no problem there. However, I also make use of some global arrays (which happen to be dynamic) because that’s where I store certain data which must be accessible to the various exported DLL functions. So far there’s still no problem. And as such, this is probably a non-issue for most of my customers. The problem comes up for those who want to create a wrapper class for the parser which would allow them to run several threads of calculations -- each thread with its own independent definition space. In one thread they may define a variable or user-function in one way, and in another thread, they may want to use the same variable names, but with different sets of values. (By the way, the parser already allows multiple expressions to be calculated in parallel without any conflict. That’s not where threads are needed. Users want to create parser threads only when they don’t want expressions from one thread to share any variables, user functions, modes, etc... with expressions from another thread.)
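
One way to picture an "independent definition space" is as an explicit context object that each thread creates for itself. The sketch below is standard C++11 and is not the uCalc API: `ParserContext`, its methods, and `eval_on_new_thread` are hypothetical, and the "expression" is hard-coded as x+b instead of being parsed.

```cpp
#include <map>
#include <string>
#include <thread>

// Hypothetical stand-in for one definition space: its own variable table.
class ParserContext {
public:
    void define_variable(const std::string& name, double value) {
        vars_[name] = value;
    }
    double value_of(const std::string& name) const {
        return vars_.at(name);   // throws std::out_of_range if undefined
    }
    // Stand-in for evaluating the expression "x+b" in this context.
    double eval_x_plus_b() const {
        return value_of("x") + value_of("b");
    }
private:
    std::map<std::string, double> vars_;   // belongs to this object only
};

// Hypothetical helper: builds a separate context on a new thread and
// evaluates x+b there with that thread's own x and b.
double eval_on_new_thread(double x, double b) {
    double result = 0.0;
    std::thread worker([&result, x, b] {
        ParserContext ctx;
        ctx.define_variable("x", x);
        ctx.define_variable("b", b);
        result = ctx.eval_x_plus_b();
    });
    worker.join();
    return result;
}
```

Since no table is shared, this arrangement needs no locking, but it does require every exported call to be told which context it is working on.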

                  The exported parser functions shield the programmer from what is actually going on behind the scenes. So perhaps I may have only confused things by posting a code example. (Before posting it, I had actually started creating a more elaborate example using the timer etc... but then I opted for a simpler, though less realistic snippet of code, thinking it might be easier to follow.) Lance, you are right in suggesting that if x is local, then there’s no issue. That’s the way I want it to work. However, global variables which work as intended in a single-thread environment have the undesired effect of also acting as global across threads, which is not what I want.

                  You guys apparently think that there are times when I may want to access the same memory locations from different threads. As such, synchronization of some sort would be necessary. However, the way it’s intended, there is actually no memory location or resource whatsoever that needs to ever be shared between separate threads of the parser.

                  By the way, I can conceptualize a way to support multiple threads in a safe way without the need to load LibMain more than once. It’s just that it would require redesigning some complicated code in a way that’s not necessarily straightforward. I was just hoping that there was an easier way (which I still think there must be).

                  I do not know how other compilers handle native thread-local variables (as described in the first message), however, I doubt that they would add extra TLS and synchronization code to each exported function, when these variables are specifically never intended to be shared across threads. If they are never to be shared, then there’s no danger of reading and writing to these variables from different threads (as they do not have the same memory locations), and no need for complex synchronization code to slow the functions down.

                  (By the way in the first message, I made a mistake saying my parser was reviewed in Visual C++ Mag (which apparently doesn’t exist). The person who informed me had meant that it appeared in the DevX Visual C++ Developer’s Journal. That’s where you’ll find the review.)


                  ------------------
                  Daniel Corbier
                  UCalc Fast Math Parser
                  http://www.ucalc.com


                  • #10
                    "I do not know how other compilers handle native thread-local variables (as described in the first message), however, I doubt that they would add extra TLS and synchronization code to each exported function, when these variables are specifically never intended to be shared across threads. If they are never to be shared, then there’s no danger of reading and writing to these variables from different threads (as they do not have the same memory locations), and no need for complex synchronization code to slow the functions down."
                    Whether they have extra code or not is not really a debatable topic. There is no magic way to get a pointer to that thread's TLS data. And since you're going to have to search through a global array to find it, you'll have to use synchronization for that global array. The OOP compilers do their "native" TLS variables by just using objects, which in itself is even slower, because now you are tracking OLS on EVERY object call that uses them. Object Level Storage: each object has its own set of "module" level variables that can only be changed by it.




                    ------------------
