Does anyone know if there is an in-program way to designate which core on a multicore CPU a thread is to be executed on? One would think there must be a Windows API for CPU core utilization control, but so far I've not located it.
CPU Core Control
-
Thanks. It looks like that API, along with SetThreadIdealProcessor (http://msdn.microsoft.com/en-us/libr...53(VS.85).aspx) and GetProcessAffinityMask (http://msdn.microsoft.com/en-us/libr...13(VS.85).aspx), might do it.
Michael Burns
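For anyone searching later, here is a rough, untested sketch of how those calls might be used from PowerBASIC. It assumes the standard win32api.inc declares SetThreadAffinityMask, SetThreadIdealProcessor and GetCurrentThread:
Code:
#COMPILE EXE
#DIM ALL
#INCLUDE "win32api.inc"

FUNCTION PBMAIN () AS LONG
    LOCAL OldMask  AS DWORD
    LOCAL OldIdeal AS DWORD

    ' Force the current thread onto core 0 (bit 0 of the affinity mask).
    ' A return value of 0 means the call failed.
    OldMask = SetThreadAffinityMask(GetCurrentThread(), &H1)

    ' Or only *suggest* a core: core 1 is preferred, but Windows may
    ' still move the thread if it decides to.
    OldIdeal = SetThreadIdealProcessor(GetCurrentThread(), 1)

    IF OldMask = 0 THEN
        MSGBOX "SetThreadAffinityMask failed"
    ELSE
        MSGBOX "Thread pinned to core 0, previous mask was &H" & HEX$(OldMask)
    END IF
END FUNCTION
The affinity mask forces the thread onto the listed cores; the "ideal processor" is only a hint that the scheduler is free to ignore.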
-
Hi guys, I need to take advantage of the cores of the two Intel Xeons (6 cores + 6 cores) in my old HP Z600 workstation to speed up a long-running job: reading a large amount of data, fitting it, and then running statistical inference on the results. I can't exploit the GPU that is present, and (at the moment) I'm not interested in using Windows threads. The idea is to reuse the current monolithic, single-core program, which is already available and which I wrote with PBW10.
Regardless of the architecture, SMP (Symmetric Multi-Processing), MPP (Massively Parallel Processing) or NUMA (Non-Uniform Memory Access), the idea is this: not all the data of each single run are needed at the same time, only at the end, when the results of every run are available; another program will then try to optimize and find the best possible condition from the final data.
I have to build a master program that launches new executions of the already available program, each reading its own specific data file, assigning each core its own execution, with the results always saved to different files. The problem is that I don't know how to interact at this hardware level. I need to understand how to see the cores and how to assign the execution of a program to one of them. Is it possible to do this with PB? Do you have a small example where a piece of supervisor code assigns a task to be performed on a specific core?
Thanks in advance if you can help me with this task!
Post scriptum: my WS has 32 GB RAM and Win10 Pro 64-bit. I studied the Microsoft reference below, but my expertise is not good enough to get something useful out of it: https://docs.microsoft.com/it-it/win...ocessor-groups
Last edited by Mimmo Labate; 10 Jun 2020, 06:24 AM.
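To make the idea concrete, here is a rough, untested sketch of that kind of supervisor. WORKER.EXE and the data##.dat names are only placeholders for the existing monolithic program and its input files; START /AFFINITY (available from Windows 7 on) takes a hexadecimal core mask, so bit 0 = core 0, bit 1 = core 1, and so on:
Code:
#COMPILE EXE
#DIM ALL

FUNCTION PBMAIN () AS LONG
    LOCAL Core, NumCores AS LONG
    LOCAL Mask AS DWORD
    LOCAL Cmd  AS STRING
    LOCAL Pid  AS DWORD

    NumCores = 12                      ' 2 x 6-core Xeon in the Z600

    FOR Core = 0 TO NumCores - 1
        Mask = 1
        SHIFT LEFT Mask, Core          ' one bit per core

        ' Builds e.g.:  cmd.exe /c start "" /affinity 800 worker.exe data11.dat
        Cmd = "cmd.exe /c start """" /affinity " & HEX$(Mask)
        Cmd = Cmd & " worker.exe data" & FORMAT$(Core, "00") & ".dat"

        ' The SHELL *function* returns at once (unlike the SHELL statement,
        ' which waits), so all twelve instances run in parallel.
        Pid = SHELL(Cmd, 1)
    NEXT
END FUNCTION
That said, as the replies below point out, simply launching one instance per data file and letting Windows spread them across the cores (i.e. dropping the /affinity part) is usually just as effective.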
-
To a large extent, Windows takes care of scheduling on the cores by picking the fastest, least-used core available for the next process, particularly once you get to the SpeedStep chips. I know that's an over-simplified view, but you might consider whether writing your own core-management routine is really productive. Scheduling enough processes to fill the cores might be your biggest bang for the buck.
-
Mimmo,
I agree with Ray here. Trying to control the distribution of processor resources is a hardware / operating system task: if you can utilise as many cores and threads as you need, the operating system will handle the distribution of process resources without you having to take much notice of it. Twin Xeon layouts tend to be a bit more laggy than a single processor, so while that configuration may not be a big hit with gaming-type tasks, 12 cores and 24 threads should give you some decent processor grunt if you can utilise enough of it.
hutch at movsd dot com
The MASM Forum - SLL Modules and PB Libraries
http://www.masm32.com/board/index.php?board=69.0
-
"I'm not interested in using windows threads"
But Windows threads is how you take advantage of all the CPU cores.
It's rarely worthwhile controlling these things yourself.
Usually you would write your program using threads and let the operating system allocate each thread to a CPU core.
If you have 12 pending threads then the OS will allocate them to the 12 available cores.
-
Hello Guys!
First of all I want to thank (in chronological order of the respective messages) Raymond, Steve and Paul! I must say that, as always, the great competence of the members of this forum is confirmed; this wealth of professionalism is truly invaluable. You have convinced me! Thanks to your clarifications and suggestions I understand that this is something the Windows scheduler takes care of, rather than something I can do better myself.
One thing I still did not understand correctly: do I have to use the Windows API that manages threads, or just modify the program to use the PB statements relating to threads, or both?
Thank you very much for your invaluable help !!!
Mimmo
-
You don't need to use the Windows API, just PowerBASIC commands.
Here's an old example that demonstrates simple processing of large arrays using threads, so multiple CPU cores can speed it up. It's for PBCC, so you'll need to change the PRINTs to MSGBOXes.
Code:
%LoopLimit = 10000 '0000

#COMPILE EXE
#DIM ALL
#BREAK ON
#INCLUDE "win32api.inc"

TYPE ThreadToken
    StartIndex AS LONG
    EndIndex   AS LONG
END TYPE

GLOBAL a() AS QUAD

FUNCTION PBMAIN () AS LONG
    LOCAL r,t,ASize, NoofThreads AS LONG
    LOCAL t1,t2 AS EXT
    LOCAL freq, count0, count1 AS QUAD

    PRINT "Running..."
    QueryPerformanceFrequency freq        'Get timer frequency.

    aSize = 362880  '145152 '25200 '362880  ' a number divisible by all integers up to 10 so I don't get odd elements when I split the array
    DIM a(aSize) AS QUAD 'LONG

    'fill array with test data
    FOR r = 1 TO aSize
        a(r)=r
    NEXT

    QueryPerformanceCounter count0        'read the timer at the start of the test

    'GOTO skip

    'the job to do
    FOR t = 1 TO %LoopLimit
        FOR r = 1 TO aSize
            a(r)=a(r)+1
        NEXT
    NEXT

    QueryPerformanceCounter count1        'read the timer at the end of the test

    'check
    FOR r = 1 TO aSize
        IF a(r)<> r+ 1 * %LoopLimit THEN
            PRINT "error at location";r,a(r)
        END IF
    NEXT

    PRINT "Time without threads=";FORMAT$(1000*(count1-count0)/freq,"######0.000");"msec"   'print the elapsed time

skip:
    'now do the same job with threads
    LOCAL Token() AS ThreadToken
    LOCAL hThreads() AS LONG
    LOCAL junk AS LONG
    DIM hThreads(1 TO 2000)
    DIM Token(1 TO 2000)

    FOR NoofThreads = 1 TO 10
        'fill array with test data
        FOR r = 1 TO aSize
            a(r)=r
        NEXT

        QueryPerformanceCounter count0    'read the timer at the start of the test

        FOR r = 1 TO noofThreads
            Token(r).StartIndex= 1+ (r-1)*(ASize/NoofThreads)
            Token(r).EndIndex= r*(ASize/NoofThreads)
            ' THREAD CREATE ProcessingThread(VARPTR(Token(r))),SUSPEND TO hThreads(r)
            THREAD CREATE ProcessingThread(VARPTR(Token(r))) TO hThreads(r)
            THREAD SET PRIORITY hThreads(r), %THREAD_PRIORITY_BELOW_NORMAL
        NEXT

        ' FOR r = 1 TO noofThreads
        '     THREAD RESUME hThreads(r) TO junk
        ' NEXT

        ' WaitForMultipleObjects(BYVAL NoofThreads, BYVAL VARPTR(hThreads(1)),BYVAL %true, BYVAL %INFINITE)
        FOR r = 1 TO NoofThreads
            WaitForSingleObject(hThreads(r), BYVAL %INFINITE)
        NEXT

        FOR r = 1 TO noofThreads
            THREAD CLOSE hThreads(r) TO junk
        NEXT

        QueryPerformanceCounter count1    'read the timer at the end of the test

        'check
        FOR r = 1 TO aSize
            IF a(r)<> r + 1* %LoopLimit THEN
                PRINT "error at location";r,a(r) ;" Threads=";noofThreads
                WAITKEY$
            END IF
        NEXT

        PRINT "Time taken with";NoofThreads;"threads=";FORMAT$(1000*(count1-count0)/freq,"######0.000");"msec"   'print the elapsed time
    NEXT

    WAITKEY$
END FUNCTION

THREAD FUNCTION ProcessingThread(BYVAL TokenPointer AS LONG) AS LONG
    LOCAL InputData AS ThreadToken PTR
    LOCAL r,t AS LONG

    InputData=TokenPointer
    FOR t = 1 TO %LoopLimit
        FOR r = @InputData.StartIndex TO @InputData.EndIndex
            a(r)=a(r)+1
        NEXT
    NEXT
END FUNCTION
-
Paul, I am immensely grateful for clearing up my doubts and for the excellent example you have published. I will start studying it right away to internalize the context and then try to adapt it to my situation. Paul, thank you very much for your help!!!
Greetings from Germany
Mimmo
-
Generally speaking, affinity allows you to force a badly behaving app to use a single core or specific cores. Some apps multi-thread poorly, so assigning affinity allows you to correct that issue, and in some very rare cases it can minimally improve speed (i.e. locking to a single core saves a few ticks by avoiding swapping threads between cores).
Other than that, allowing Windows to allocate your thread affinity is the best way to go as it will balance the load better.
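A rough, untested PowerBASIC illustration of exactly that, assuming the usual win32api.inc declares GetProcessAffinityMask (masks passed by reference), SetProcessAffinityMask and GetCurrentProcess:
Code:
#COMPILE EXE
#DIM ALL
#INCLUDE "win32api.inc"

FUNCTION PBMAIN () AS LONG
    LOCAL ProcMask AS DWORD
    LOCAL SysMask  AS DWORD

    ' Which cores may this process use now, and which cores exist?
    IF GetProcessAffinityMask(GetCurrentProcess(), ProcMask, SysMask) THEN
        MSGBOX "Process mask = &H" & HEX$(ProcMask) & ", system mask = &H" & HEX$(SysMask)
    END IF

    ' Force every thread of this process onto cores 0 and 1 only (mask &H3).
    IF SetProcessAffinityMask(GetCurrentProcess(), &H3) = 0 THEN
        MSGBOX "SetProcessAffinityMask failed"
    END IF
END FUNCTION
For somebody else's badly behaved app you would do the same with a handle obtained from OpenProcess, or simply set the affinity by hand in Task Manager.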
George W. Bleck
-
Originally posted by Mimmo Labate View Post
(at the moment) I'm not interested in using windows threads.
-
Windows will balance those across free cores pretty well
Going back to Post #1, the original question from Mr. Burns, and Post #6 from Mr. Labate.
Does anyone know if there is an in-program way to designate which core on a multicore CPU a thread is to be executed on? One would think there must be a Windows API for CPU core utilization control, but so far I've not located it.
Enquiring Minds Want to Know! (Because there may be a much easier way to do it!)
Michael Mattias
Tal Systems (retired)
Port Washington WI USA
[email protected]
http://www.talsystems.com
-
Ok, if you download this tool, then you will see that it does just this.
It will make preferred applications run on "the best cores".
While in theory "all cores are the same", in real life this is not the case.
Some cores can run at a higher speed while others cannot.
Seen from that standpoint it may make sense to run preferred applications on dedicated cores.
At least Intel seems to think so.
-
Originally posted by Raymond Leech View Post
While I agree in theory with Paul's comment on this, I have a slightly different take here. We have some monolithic processes that need to run essentially on-demand (client uploads file, it needs processed). On a normal day, one process watching for files was fine. On days prior to a deadline (quarterly/yearly), we get thousands (or 10s of thousands) of files all at the same time. We wrote a little job scheduler that spawns up to 8 stand-alone processes at a time, with a job# included on the command line. Windows will balance those across free cores pretty well. The jobs signal the scheduler when they are done and another process is started so long as there are files in the queue. It was a lot easier (lazier) than making the processes multi-threaded. There were also some business reasons that complicated rewriting the app. We all do what we have to, to get the job done.
Thanks Raymond!
-
Originally posted by Michael Mattias View Post
Enquiring Minds Want to Know! (Because there may be a much easier way to do it!)
I would be very interested in understanding your approach, which seems to be simpler but really eludes me at the moment. I am too busy thinking about a viable way without revolutionizing the current PB code.
Thanks in advance
-
In line with how this thread started, is there a way to send a thread to the GPU? I have successfully implemented multi-thread processing in our software and we can decide how many of the CPU logical cores to take over during processing. Windows handles all that for us, using all the available cores desired. But can we also send some threads separately to the GPU, or is this not possible? I'm not an expert on GPU utilization. I should note that in our multithread processing we do not use the THREAD CREATE command, but shell out to a separate application.
-
I am too busy thinking about a viable way without revolutionizing the current PB code. [emphasis mine]
By the way...
When you are trying to take advantage of multi-threading, you also need to make sure your "worker" thread is not forcing thread switches, for example by asking for the text of a control in a window or dialog which is executing in another thread's context.
If your program is doing this (code not shown), it won't require "revolutionizing" it to effect this kind of performance-enhancing change.
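To illustrate the idea with an untested sketch (the dialog, control IDs and global below are all made up): the UI thread reads the textbox once and copies it into a plain variable, and the worker only ever touches that copy, never the control itself.
Code:
#COMPILE EXE
#DIM ALL
#INCLUDE "win32api.inc"

%ID_TXT   = 101
%ID_START = 102

GLOBAL gInputText AS STRING      ' filled by the UI thread before the worker starts

THREAD FUNCTION Worker(BYVAL x AS LONG) AS LONG
    ' Works only on the copy in gInputText; it never calls CONTROL GET TEXT,
    ' so it never forces a switch back to the UI thread.
    MSGBOX "Worker received: " & gInputText
END FUNCTION

CALLBACK FUNCTION DlgProc() AS LONG
    LOCAL hThread, lRes AS LONG
    IF CB.MSG = %WM_COMMAND AND CB.CTL = %ID_START AND CB.CTLMSG = %BN_CLICKED THEN
        ' Read the control text here, in the UI thread, once...
        CONTROL GET TEXT CB.HNDL, %ID_TXT TO gInputText
        ' ...then hand the worker nothing but plain data.
        THREAD CREATE Worker(0) TO hThread
        THREAD CLOSE hThread TO lRes
    END IF
END FUNCTION

FUNCTION PBMAIN () AS LONG
    LOCAL hDlg AS DWORD
    DIALOG NEW 0, "Thread data demo", 100, 100, 220, 60, %WS_SYSMENU TO hDlg
    CONTROL ADD TEXTBOX, hDlg, %ID_TXT, "some input", 10, 10, 200, 12
    CONTROL ADD BUTTON,  hDlg, %ID_START, "Start worker", 10, 30, 80, 14
    DIALOG SHOW MODAL hDlg CALL DlgProc
END FUNCTION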
Michael Mattias
Tal Systems (retired)
Port Washington WI USA
[email protected]
http://www.talsystems.com
-
Originally posted by Mimmo Labate View Post
this representation of yours seems to me to have many similarities...
At the most bare-bones level, you create a work queue (table/file). The scheduler assigns each task an entry in the work queue with a unique number. Assuming there is a worker slot free, mark the unit of work as assigned to a worker slot (instance), and spawn the process with a command line like:
Code:
import <instance#> <item#>
- instance# is a number between 1 and the max number of processes configured to run
- item# is the unique number from the work queue
You need a method to mark the work item as processed, and signal that the shelled 'instance' is complete.
The scheduler watches the slots until one is available (either by sleeping in a loop or via an event signal). If a slot is available, it looks for new unassigned work items and, if there is work to do, assigns it to the free slot and starts the external process.
I know that's a pretty janky description, but it's probably the bare minimum starting point.
There's a lot that goes into a good work queue system, regardless of whether it uses threads or processes. If you really intend to do some kind of job scheduling, you should do some reading first. Google "design good work queueing" or "job queue design"; I'm sure the top few articles explain this much better than I can. I modeled my implementation after the Stratus queue system and an old custom workflow system, because they were something I knew intimately. Pick something that makes sense to you and good luck!
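For what it's worth, here is a bare-bones, untested PowerBASIC sketch of that slot-polling loop. IMPORT.EXE and the plain item numbers are placeholders for the real worker and the real work queue, the SHELL function is assumed to return the new process ID, and the loop simply polls with SLEEP rather than waiting on an event:
Code:
#COMPILE EXE
#DIM ALL
#INCLUDE "win32api.inc"

%MAX_SLOTS  = 8        ' maximum number of concurrent worker processes
%WORK_ITEMS = 100      ' pretend the queue holds items 1..100

FUNCTION PBMAIN () AS LONG
    LOCAL hProc() AS DWORD            ' process handle per slot (0 = slot free)
    LOCAL Slot, NextItem, Running AS LONG
    LOCAL Pid AS DWORD

    DIM hProc(1 TO %MAX_SLOTS)
    NextItem = 1

    DO
        Running = 0
        FOR Slot = 1 TO %MAX_SLOTS
            ' Did the worker in this slot finish since the last pass?
            IF hProc(Slot) <> 0 THEN
                IF WaitForSingleObject(hProc(Slot), 0) = %WAIT_OBJECT_0 THEN
                    CloseHandle hProc(Slot)   ' here you would also mark the item as processed
                    hProc(Slot) = 0
                END IF
            END IF
            ' Free slot and work left in the queue? Start the next item.
            IF hProc(Slot) = 0 AND NextItem <= %WORK_ITEMS THEN
                Pid = SHELL("import.exe " & FORMAT$(Slot) & " " & FORMAT$(NextItem), 1)
                hProc(Slot) = OpenProcess(%SYNCHRONIZE, %FALSE, Pid)
                INCR NextItem
            END IF
            IF hProc(Slot) <> 0 THEN INCR Running
        NEXT
        SLEEP 250                     ' poll a few times a second
    LOOP UNTIL Running = 0 AND NextItem > %WORK_ITEMS
END FUNCTION
A real version would add the persistent queue table/file, error handling and retries on top of this skeleton, but the slot-watching loop is the part people usually ask about.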