CPU Core Control


  • CPU Core Control

    Does anyone know if there is an in-program way to designate which core on a multicore CPU a thread is to be executed on? One would think there must be a Windows API for CPU core utilization control, but so far I've not located it.
    Michael Burns

  • #2
    Michael,
    I've never used it but maybe SetThreadAffinityMask is what you want:
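    Something like this, for instance (only a sketch, assuming win32api.inc supplies the declarations; the mask value 1 simply means "core 0 only"):
    Code:
    #COMPILE EXE
    #DIM ALL
    #INCLUDE "win32api.inc"

    FUNCTION PBMAIN () AS LONG
        LOCAL oldMask AS DWORD

        'Restrict the current thread to core 0 (bit 0 of the affinity mask).
        'The function returns the previous mask, or 0 on failure.
        oldMask = SetThreadAffinityMask(GetCurrentThread(), 1)

        IF oldMask = 0 THEN
            MSGBOX "SetThreadAffinityMask failed"
        ELSE
            MSGBOX "Thread pinned to core 0, old mask = &H" + HEX$(oldMask)
        END IF
    END FUNCTION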



    Paul.



    • #3
      Thanks. It looks like that API, along with SetThreadIdealProcessor (http://msdn.microsoft.com/en-us/libr...53(VS.85).aspx ) and GetProcessAffinityMask (http://msdn.microsoft.com/en-us/libr...13(VS.85).aspx) might do it.
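      Roughly how I expect they fit together (only a sketch, assuming win32api.inc declares them; the "ideal" core number 1 is arbitrary):
      Code:
      #COMPILE EXE
      #DIM ALL
      #INCLUDE "win32api.inc"

      FUNCTION PBMAIN () AS LONG
          LOCAL hProcess AS DWORD
          LOCAL procMask, sysMask, prevIdeal AS DWORD

          'Which cores is this process allowed to use, and which exist in the system?
          hProcess = GetCurrentProcess()
          GetProcessAffinityMask hProcess, procMask, sysMask

          'Hint to the scheduler that the current thread prefers core 1.
          'Returns the previous ideal processor, or &HFFFFFFFF on failure.
          prevIdeal = SetThreadIdealProcessor(GetCurrentThread(), 1)

          MSGBOX "Process mask = &H" + HEX$(procMask) + $CRLF + _
                 "System mask  = &H" + HEX$(sysMask) + $CRLF + _
                 "Previous ideal processor =" + STR$(prevIdeal)
      END FUNCTION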
      Michael Burns



      • #4
        Hi guys, I need to take advantage of the cores of the two Intel Xeons (6 cores + 6 cores) in my old HP Z600 workstation to speed up the execution time of a complex job: reading data, adapting it, and then running statistical inference on those data. I can't exploit the power of the GPU that is present, and (at the moment) I'm not interested in using Windows threads. The idea is to reuse the current monolithic, single-core program, which is already available and which I wrote with PBWin 10.

        Regardless of the architecture, SMP (Symmetric Multi-Processing), MPP (Massively Parallel Processing) or NUMA (Non-Uniform Memory Access), the idea is this: not all the data from each individual run are needed at the same time, but only at the end, when all the results of each run are available; another piece of software will then try to optimize and find the best possible condition from the final data.

        I have to build a master program that launches new executions of the already available program, each of which reads its own specific data file to be processed, assigning each core its own execution and always saving the results to different files. The problem is that I don't know how to interact at this hardware level. I need to understand how to see the cores and how to assign the execution of a program to one of them. Is it possible to do this with PB? Do you have a small example where a piece of supervisor code assigns a task to be performed on a specific core?
        Thanks in advance if you can help me with this task!


        Post scriptum: my workstation has 32 GB RAM and Win10 Pro 64-bit. I studied the Microsoft reference below, but I wasn't able to get anything useful out of it: https://docs.microsoft.com/it-it/win...ocessor-groups
        Last edited by Mimmo Labate; 10 Jun 2020, 06:24 AM.



        • #5
          To a large extent, Windows takes care of scheduling on the cores by picking the fastest, least-used core available for the next process, particularly when you get to the SpeedStep chips. I know that's an oversimplified view, but you might consider whether writing your own core management routine is productive. Scheduling enough processes to fill the cores might be your biggest bang for the buck.



          • #6
            Mimmo,

            I agree with Ray here; trying to control the distribution of processor resources is a hardware / operating system task, and if you can utilise as many cores and threads as you need, the operating system will handle the distribution of process resources without you having to take much notice of it. Twin Xeon layouts tend to be a bit more laggy than a single processor, so while that configuration may not be a big hit with gaming-type tasks, 12 cores and 24 threads should give you some decent processor grunt if you can utilise enough of it.
            hutch at movsd dot com
            The MASM Forum - SLL Modules and PB Libraries

            http://www.masm32.com/board/index.php?board=69.0



            • #7
              "I'm not interested in using windows threads"

              But Windows threads is how you take advantage of all the CPU cores.

              It's rarely worthwhile controlling these things yourself.
              Usually you would write your program using threads and let the operating system allocate each thread to a CPU core.
              If you have 12 pending threads then the OS will allocate them to the 12 available cores.
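              In PB terms the bare-bones pattern is roughly this (only a sketch; MyWorker stands for a hypothetical THREAD FUNCTION, and the waiting uses WaitForSingleObject from win32api.inc):
              Code:
              LOCAL hThread() AS LONG
              LOCAL r, junk AS LONG
              DIM hThread(1 TO 12)

              FOR r = 1 TO 12
                  THREAD CREATE MyWorker(r) TO hThread(r)     'Windows decides which core runs it
              NEXT

              FOR r = 1 TO 12
                  WaitForSingleObject hThread(r), %INFINITE   'wait for each worker to finish
                  THREAD CLOSE hThread(r) TO junk
              NEXT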



              • #8
                Hello Guys!
                First of all I want to thank (in chronological order of the respective messages) Raymond, Steve and Paul! I must say that, as always, the great competence of the members of this forum is confirmed. I don't think it can be denied that this wealth of professionalism is of invaluable worth! I am more than convinced! Thanks to your clarifications and suggestions I understood that this is something that depends on the Windows scheduler more than on anything I thought I could do.
                But I have not understood correctly: do I have to use the Windows API that manages threads, or just modify the program to use the PB statements relating to threads, or both?
                Thank you very much for your invaluable help !!!
                Mimmo



                • #9
                  You don't need to use the Windows API, just PowerBASIC commands.
                  Here's an old example demonstrating simple processing of large arrays using threads so multiple CPU cores can speed it up. It's for PBCC, so you'll need to change the PRINTs to MSGBOXes.
                  Code:
                  %LoopLimit = 10000'0000
                  
                  #COMPILE EXE
                  #DIM ALL
                  #BREAK ON
                  
                  #INCLUDE "win32api.inc"
                  
                  TYPE ThreadToken
                      StartIndex   AS LONG
                      EndIndex     AS LONG
                  END TYPE
                  
                  
                  GLOBAL a() AS QUAD
                  
                  
                  FUNCTION PBMAIN () AS LONG
                  LOCAL r,t,ASize, NoofThreads AS LONG
                  LOCAL t1,t2 AS EXT
                  LOCAL freq, count0, count1 AS QUAD
                  
                  PRINT "Running..."
                  
                  QueryPerformanceFrequency freq   'Get timer frequency.
                  
                  aSize = 362880 '145152  '25200   '362880 ' a number divisible by all integers up to 10 so I don't get odd elements when I split the array
                  DIM a(aSize) AS QUAD 'LONG
                  
                  'fill array with test data
                  FOR r = 1 TO aSize
                      a(r)=r
                  NEXT
                  
                  
                  QueryPerformanceCounter count0   'read the timer at the start of the test
                  
                  'GOTO skip
                  'the job to do
                  FOR t = 1 TO %LoopLimit
                      FOR r = 1 TO aSize
                          a(r)=a(r)+1
                      NEXT
                  NEXT
                  
                  
                  QueryPerformanceCounter count1   'read the timer at the end of the test
                  
                  'check
                  FOR r = 1 TO aSize
                       IF a(r) <> r + %LoopLimit THEN
                          PRINT "error at location";r,a(r)
                      END IF
                  
                  NEXT
                  
                  PRINT "Time without threads=";FORMAT$(1000*(count1-count0)/freq,"######0.000");"msec"  'print the elapsed time
                  skip:
                  
                  
                  'now do the same job with threads
                  LOCAL Token() AS ThreadToken
                  LOCAL hThreads() AS LONG
                  LOCAL junk AS LONG
                  
                  DIM hThreads(1 TO 2000)
                  DIM Token(1 TO 2000)
                  
                  FOR NoofThreads = 1 TO 10
                  
                      'fill array with test data
                      FOR r = 1 TO aSize
                          a(r)=r
                      NEXT
                  
                      QueryPerformanceCounter count0   'read the timer at the start of the test
                      FOR r = 1 TO noofThreads
                  
                          Token(r).StartIndex= 1+ (r-1)*(ASize/NoofThreads)
                          Token(r).EndIndex=  r*(ASize/NoofThreads)
                  
                        '  THREAD CREATE ProcessingThread(VARPTR(Token(r))),SUSPEND TO hThreads(r)
                            THREAD CREATE ProcessingThread(VARPTR(Token(r))) TO hThreads(r)
                           THREAD SET PRIORITY hThreads(r), %THREAD_PRIORITY_BELOW_NORMAL
                  
                      NEXT
                  
                   '   FOR r = 1 TO noofThreads
                   '       THREAD RESUME hThreads(r) TO junk
                   '   NEXT
                  
                  '    WaitForMultipleObjects(BYVAL NoofThreads, BYVAL VARPTR(hThreads(1)),BYVAL %true, BYVAL %INFINITE)
                  FOR r = 1 TO NoofThreads
                      WaitForSingleObject(hThreads(r), BYVAL %INFINITE)
                  NEXT
                  
                      FOR r = 1 TO noofThreads
                          THREAD CLOSE hThreads(r) TO junk
                      NEXT
                  
                      QueryPerformanceCounter count1   'read the timer at the end of the test
                  
                      'check
                      FOR r = 1 TO aSize
                           IF a(r) <> r + %LoopLimit THEN
                              PRINT "error at location";r,a(r) ;" Threads=";noofThreads
                              WAITKEY$
                          END IF
                  
                      NEXT
                  PRINT "Time taken with";NoofThreads; "threads="FORMAT$(1000*(count1-count0)/freq,"######0.000");"msec"  'print the elapsed time
                  
                  NEXT
                  
                  
                  WAITKEY$
                  END FUNCTION
                  
                  
                  
                  THREAD FUNCTION ProcessingThread(BYVAL TokenPointer AS LONG) AS LONG
                  LOCAL InputData AS ThreadToken  PTR
                  LOCAL r,t AS LONG
                  
                  InputData=TokenPointer
                  
                  FOR t = 1 TO %LoopLimit
                      FOR r = @InputData.StartIndex TO @InputData.EndIndex
                          a(r)=a(r)+1
                      NEXT
                  NEXT
                  
                  END FUNCTION



                  • #10
                    Paul, I am immensely grateful to you for clarifying my doubts and also for the excellent example you have published, which from now on I will study to internalize the context and then try to adapt to my situation. Paul, thank you very much for your help!!!
                    Greetings from Germany
                    Mimmo



                    • #11
                      Generally speaking, affinity allows you to force a badly behaving app to use a single core or specific cores. Some apps multi-thread poorly, so assigning affinity allows you to correct that issue, and in some very rare cases it can minimally improve speed (i.e. locking to a single core saves some minimal ticks by avoiding swapping cores for different threads).
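                      If you did want to force it yourself, it would look roughly like this (only a sketch; it assumes win32api.inc declares OpenProcess and SetProcessAffinityMask, and the PID would come from somewhere like Task Manager):
                      Code:
                      FUNCTION PinProcessToCores(BYVAL pid AS DWORD) AS LONG
                          LOCAL hProc AS DWORD
                          hProc = OpenProcess(%PROCESS_SET_INFORMATION OR %PROCESS_QUERY_INFORMATION, %FALSE, pid)
                          IF hProc THEN
                              'Mask &H3 = bits 0 and 1 = restrict the process to cores 0 and 1.
                              FUNCTION = SetProcessAffinityMask(hProc, &H3)
                              CloseHandle hProc
                          END IF
                      END FUNCTION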

                      Other than that, allowing Windows to allocate your thread affinity is the best way to go as it will balance the load better.

                      George W. Bleck



                      • #12
                        Originally posted by Mimmo Labate View Post
                        (at the moment) I'm not interested in using windows threads.
                        While I agree in theory with Paul's comment on this, I have a slightly different take here. We have some monolithic processes that need to run essentially on-demand (client uploads file, it needs processed). On a normal day, one process watching for files was fine. On days prior to a deadline (quarterly/yearly), we get thousands (or 10s of thousands) of files all at the same time. We wrote a little job scheduler that spawns up to 8 stand-alone processes at a time, with a job# included on the command line. Windows will balance those across free cores pretty well. The jobs signal the scheduler when they are done and another process is started so long as there are files in the queue. It was a lot easier (lazier) than making the processes multi-threaded. There were also some business reasons that complicated rewriting the app. We all do what we have to, to get the job done.



                        • #13
                          Windows will balance those across free cores pretty well
                          That's a good point, Raymond. Windows is a pretty good operating system when it comes to utilizing system resources.

                          Going back to Post #1, the original question from Mr. Burns, and post #6 from Mr. Labate.

                          Does anyone know if there is an in-program way to designate which core on a multicore CPU a thread is to be executed on? One would think there must be a Windows API for CPU core utilization control, but so far I've not located it.
                          But that now begs the question: "Why is it so important that a particular process (or thread) only operate on a specific processor?" As processing power becomes available, Windows will use it regardless of which processor is free. So what if part of your process runs on processor 'A' and another part of your process runs on processor 'B'?

                          Enquiring Minds Want to Know! (Because there may be a much easier way to do it!)
                          Michael Mattias
                          Tal Systems (retired)
                          Port Washington WI USA
                          [email protected]
                          http://www.talsystems.com



                          • #14
                            Ok, if you download this tool, you will see that it does just this.
                            It will make preferred applications run on "the best cores".
                            While theoretically "all cores are the same", in real life this is not the case.
                            Some cores can run at a higher speed while others cannot.

                            Seen from that standpoint it may make sense to run preferred applications on dedicated cores.
                            At least Intel seems to think so.
                            Download new and previously released drivers including support software, bios, utilities, firmware and patches for Intel products.



                            • #15
                              Originally posted by Raymond Leech View Post

                              While I agree in theory with Paul's comment on this, I have a slightly different take here. We wrote a little job scheduler that spawns up to 8 stand-alone processes at a time...
                              Raymond, your description seems to me to have many similarities with what I would like to solve quickly. Could you outline your pragmatic approach to speeding up the job without making too many revolutions? Otherwise I will fall into the ditch and need time to be sure that everything works as expected. Thanks for your wise advice!
                              Thanks Raymond!



                              • #16
                                Originally posted by Michael Mattias View Post
                                Enquiring Minds Want to Know! (Because there may be a much easier way to do it!)
                                Mr. Michael Mattias, I think Theo has represented well the sense of wanting to assign a heavy task to a specific core, one which can do a faster job than the other cores!
                                I would be very interested in understanding your approach, which seems to be simpler but really eludes me at the moment. I am too busy thinking about a viable way that doesn't revolutionize the current PB code.
                                Thanks in advance





                                • #17
                                  In line with how this thread started, is there a way to send a thread to the GPU? I have successfully implemented multi-thread processing in our software and we can decide how many of the CPU logical cores to take over during processing. Windows handles all of that for us, using all the available cores desired. But can we also send some threads separately to the GPU, or is this not possible? I'm not an expert on GPU utilization. I should note that for our multithread processing we do not use the THREAD CREATE command, but shell out to a separate application.



                                  • #18
                                    Coding for the GPU is completely different; it's not as if it were a fast x86 device you can just send threads to. It's mainly designed for a limited "palette" of parallel tasks. Take a look at CUDA.
                                    George W. Bleck



                                    • #19
                                      I am too busy thinking about a viable way without revolutionizing the current PB code. [emphasis mine]
                                      This is the second thread in a week where someone has asked for help improving the performance of an existing program, but doing so without the extra work it might entail to change the design and/or existing code.

                                      By the way...

                                      When you are trying to take advantage of multi-threading, you also need to make sure your "worker" thread is not forcing thread switches, for example by calling for the text of a control in a window or dialog which is executing in another thread's context.

                                      If your program is doing this (code not shown), it won't require "revolutionizing" it to effect this kind of performance-enhancing change.
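                                      A tiny sketch of the idea, as a fragment (hDlg, %IDC_INPUT and Worker are made-up names): read the control once in the dialog's own thread and hand the worker the value, so the worker never has to call back into the GUI:
                                      Code:
                                      GLOBAL gInput AS STRING                       'declared at module level in a real program
                                      LOCAL hThread AS LONG

                                      'In the dialog (GUI) thread, before launching the worker:
                                      CONTROL GET TEXT hDlg, %IDC_INPUT TO gInput   'read the control text here, in the GUI thread
                                      THREAD CREATE Worker(0) TO hThread            'the worker uses gInput and never calls back into the dialog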
                                      Michael Mattias
                                      Tal Systems (retired)
                                      Port Washington WI USA
                                      [email protected]
                                      http://www.talsystems.com



                                      • #20
                                        Originally posted by Mimmo Labate View Post
                                        this representation of yours seems to me to have many similarities...
                                        Mimmo,

                                        At the most bare-bones level, you create a work queue (table/file). The scheduler assigns each task an entry in the work queue with a unique number. Assuming there is a worker slot free, mark the unit of work as assigned to a worker slot (instance), and spawn the process with a command line like:
                                        Code:
                                        import <instance#> <item#>
                                        Where:
                                        - instance# is a number between 1 and the max number of processes configured to run
                                        - item# is the unique number from the work queue

                                        You need a method to mark the work item as processed, and signal that the shelled 'instance' is complete.

                                        The scheduler watches the slots until a slot is available (either by sleeping in a loop or via an event signal). If a slot is available, it looks for new unassigned work items and, if there is work to do, assigns it to the free slot and starts the external process.

                                        I know that's a pretty janky description, but it's probably the bare minimum starting point.

                                        There's a lot that goes into a good work queue system, regardless of whether it uses threads or processes. If you really intend to do some kind of job scheduling, you should do some reading first. Google "design good work queueing" or "job queue design"; I'm sure the top few articles explain this much better than I can. I modeled my implementation after the Stratus queue system and an old custom workflow system, because they were something I knew intimately. Pick something that makes sense to you and good luck!
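                                        A very rough sketch of such a supervisor loop in PB (assumptions: the SHELL function returns the child's process ID, win32api.inc declares OpenProcess/WaitForSingleObject, and NextWorkItem / QueueIsEmpty / InstancesRunning are hypothetical helpers you would write around your own queue file):
                                        Code:
                                        %MAX_SLOTS = 8                               'run at most 8 external instances at once

                                        FUNCTION RunQueue () AS LONG
                                            LOCAL hProc() AS DWORD
                                            LOCAL slot AS LONG, item AS LONG, pid AS DWORD
                                            DIM hProc(1 TO %MAX_SLOTS)

                                            DO
                                                FOR slot = 1 TO %MAX_SLOTS
                                                    'Free the slot if its process has finished.
                                                    IF hProc(slot) THEN
                                                        IF WaitForSingleObject(hProc(slot), 0) = %WAIT_OBJECT_0 THEN
                                                            CloseHandle hProc(slot)
                                                            hProc(slot) = 0
                                                        END IF
                                                    END IF
                                                    'If the slot is free and there is still work, start another instance.
                                                    IF hProc(slot) = 0 THEN
                                                        item = NextWorkItem()        'hypothetical: next unassigned item#, 0 if none
                                                        IF item THEN
                                                            pid = SHELL("import" + STR$(slot) + STR$(item))
                                                            hProc(slot) = OpenProcess(%SYNCHRONIZE, %FALSE, pid)
                                                        END IF
                                                    END IF
                                                NEXT
                                                SLEEP 500                            'poll; Windows spreads the instances over the cores
                                            LOOP UNTIL QueueIsEmpty() AND InstancesRunning(hProc()) = 0   'hypothetical exit tests
                                        END FUNCTION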




