fastest number cruncher?


  • Jim Fritts
    replied
    Code:
    'Where speedup = 1/((P/N) + S)
    'The maximum speed increase for 2 cores is speedup = ~1.98 times no matter how many threads are used.
    'Given: P = 99% Code parallelized
    '       S = 01% Code serialized
    '       N = 2 system cores
    
    'The maximum speed increase for 4 cores is speedup = ~3.88 times no matter how many threads are used.
    'Given: P = 99% Code parallelized
    '       S = 01% Code serialized
    '       N = 4 system cores
    
    'The maximum speed increase for 8 cores is speedup = ~7.476 times no matter how many threads are used.
    'Given: P = 99% Code parallelized
    '       S = 01% Code serialized
    '       N = 8 system cores
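
    For anyone who wants to play with the numbers, here is a minimal PB/CC-style sketch of the same Amdahl's-law formula (the function name and the test values are only illustrative):
    Code:
    ' Amdahl's law: speedup = 1 / ((P / N) + S), where P is the parallel
    ' fraction, S = 1 - P is the serial fraction and N is the number of cores.
    FUNCTION AmdahlSpeedup(BYVAL P AS DOUBLE, BYVAL N AS LONG) AS DOUBLE
        FUNCTION = 1# / ((P / N) + (1# - P))
    END FUNCTION

    FUNCTION PBMAIN () AS LONG
        ' P = 0.99 (99% parallelized) reproduces the figures above:
        PRINT AmdahlSpeedup(0.99#, 2)   ' ~1.98
        PRINT AmdahlSpeedup(0.99#, 4)   ' ~3.88
        PRINT AmdahlSpeedup(0.99#, 8)   ' ~7.48
    END FUNCTION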



  • Michael Burns
    replied
    Originally posted by Petr Schreiber jr View Post
    I posted a starter example on using this technology from PowerBASIC some time ago:
    http://www.jose.it-berater.org/smffo...p?topic=3327.0

    Petr
    That example seems to be missing the CL.INC file. Is that available anywhere?



  • Michael Mattias
    replied
    >and get it working then ask for advice of how to speed up the critical parts.

    I think I am getting through to some people. I'll declare victory now.



  • Paul Dixon
    replied
    Peter,
    it's probably better to rewrite your program in a way you're comfortable with and get it working, then ask for advice on how to speed up the critical parts. There are lots of things you could try, but most might not apply to your program. If you have a working program and post your specific problem then you will likely get more specific and useful advice.

    General advice for speeding things up has been asked for before e.g.:
    http://www.powerbasic.com/support/pb...ad.php?t=47989
    http://www.powerbasic.com/support/pb...ad.php?t=11567

    Paul.



  • peter edmiston
    replied
    I just bought PBCC ver 6, and it will be worth it even for a 10% speed improvement. Thanks to everyone. I now have to convert 8000 lines of code (much of it badly written), so I may not survive the process. I have made a start, and now is a good time to consider the program structure. Given the interest in speed, is there any program structure to avoid or use? Minimising writing to the screen has been mentioned in this thread, but I guess there are other traps.

    Someone who knows more about this than I do said that in a corporate environment you plan out the structure for months, and then writing the code is simple because you already know what to do. This is a really weird concept for me. I expect there is a procedure/framework to use when planning. Is there a reference somewhere that provides examples? My program is quite simple, and the problem may be that the corporate models are too big/complex to follow.



  • Rodney Hicks
    replied
    FWIW, many years ago (~10 or so) I used a very early version of PBCC, on the Win 98 OS I believe, to access and manipulate 6 years of hourly meteorological data: items such as temperature, relative humidity, barometric pressure, wind speed, wind direction, etc. I forget how many items there were, but I correlated each item with my work history for the same period.
    This whole process, even with my badly written code, did not take an overly long time to produce results, although I can't remember just how long it took. I know I was expecting, when I started, that it might take a few hours to produce anything usable, but it didn't take hours.
    I had started this in PBDOS, but gave up and switched to PBCC because of the amount of data. Just what the speed improvement was I can't say but it was substantial even at that time.

    Interestingly, although not related to the issue, I had expected to find that my problems with work attendance were related to barometric pressure, but that turned out not to be the indicator. There was a 92.xx% correlation between an attendance issue and 8-hour periods of unchanged relative humidity, and a 91.xxx% correlation between my attendance issues and 8-hour periods of unchanging temperature. During that period I had missed an average of 16 or 17 days a year, and was late for work at least twice that often.

    Also interesting: that was the only program I wrote between 1999 and 2007.
    Last edited by Rodney Hicks; 1 Sep 2011, 04:50 PM.



  • Bob Zale
    replied
    Both PB/CC 3 and PB/CC 6 generate 32-bit executables, so that isn't the direct issue. However, overall, PB/CC 6 will offer considerably better performance and an improved feature set. Highly recommended.

    Bob Zale



  • peter edmiston
    replied
    I just tried a small experiment running exactly the same program on PB 3.5 (DOS) and PBCC ver 3.0. It takes 27 seconds for the DOS version and 9 seconds for the PBCC version.
    Please excuse my ignorance, but can I assume PBCC 3.0 is a 16-bit compiler and the latest version is 32-bit? Can I also assume that if I use a 32-bit version it will be faster than the 16-bit one?
    Thanks



  • John Petty
    replied
    Originally posted by Michael Mattias View Post
    My point is more that we often see here references to "Huge" or "Gigantic" or "Really Big" tasks... which may well have been accurate descriptions running under MS-DOS on a 6 MHz processor... but under Windows on a modern computer it's nothing, and just not worth the effort of developing some special 'optimization scheme.'


    MCM
    Agreed
    I have programmes that 15 years ago took 6 to 8 hours; today they take less than 2 minutes. They work on data files of 400 MB+. The speed came from three simple things: hardware, 32-bit processing, and the one real optimisation of understanding how to use the amount of memory that can be addressed.



  • Michael Mattias
    replied
    My point is more that we often see here references to "Huge" or "Gigantic" or "Really Big" tasks... which may well have been accurate descriptions running under MS-DOS on a 6 MHz processor... but under Windows on a modern computer it's nothing, and just not worth the effort of developing some special 'optimization scheme.'


    MCM



  • John Petty
    replied
    Originally posted by Michael Mattias View Post
    FWIW, 1.5 Meg of data x 5M calculations = Not even breathing hard for modern computers.

    DISCLAIMER: "how many runs" not shown. Also assumes one calculation = one arithmetic operation.
    Agreed. PB in its own advertising gives an example of simple floating-point math where the current compilers are over 2000 times faster than their own DOS compilers on the same computer. It may be an extreme example, but it shows the benefit of going to 32-bit.
    OP, you mention accessing a 1.7 Meg file, which in today's terms is small. Yes, Windows will most likely keep it in cache, but it is even faster, now that you have 2 GB of flat memory addressing, to just load the whole file into your program in one go and use it from there.
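
    For what it's worth, here is a minimal PB/CC-style sketch of the "load it all in one go" idea (the file name is made up and error handling is omitted):
    Code:
    FUNCTION PBMAIN () AS LONG
        LOCAL sBuffer AS STRING
        LOCAL hFile   AS LONG

        hFile = FREEFILE
        OPEN "weather.dat" FOR BINARY AS #hFile
        GET$ #hFile, LOF(hFile), sBuffer      ' read the whole file into one string
        CLOSE #hFile

        ' sBuffer now holds the entire file; parse it from memory inside the
        ' calculation loops instead of going back to disk.
        PRINT "Read"; LEN(sBuffer); "bytes"
    END FUNCTION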



  • Michael Mattias
    replied
    FWIW, 1.5 Meg of data x 5M calculations = Not even breathing hard for modern computers.

    DISCLAIMER: "how many runs" not shown. Also assumes one calculation = one arithmetic operation.



  • Paul Dixon
    replied
    Peter,
    a 4 or 8 core PC with 4/8 threads running
    Be aware that Intel HyperThreading allows you to run twice the number of threads on a CPU but it is not twice as fast. Instead, each core shares its resources between the 2 threads which can make better use of the CPU resources by getting 20%-30% more work done, but not 100% more.


    writing to the screen is deadly slow
    Not if you do it sensibly. Update the screen only when needed, and only at a rate that's useful for the user, and you'll not notice the extra time it takes but you may well benefit from the feedback it provides.


    stick to integers (in the belief this is faster?)
    Depends on the job you're doing.


    So it seems there is no obvious advantage in splitting the program into more than the maximum number of cores available
    Which seems to be almost exactly the same as
    They aren't exactly the same! You complete faster with more threads in that case because your CPU is 100% utilised for 24s, instead of 100% utilised for 18s and then only 50% utilised for the next 9s.
    As usual it depends on what you're doing: if each thread were to take 1s you might not notice, but if each thread were to take 12 hours then you'd be waiting hours longer than necessary for the final result.



    I can't help thinking there is a lot of unused CPU despite the 100% usage as shown on task manager.
    A lot of that unused CPU is utilised by Intel in the hyperthreading mentioned above.
    The rest is up to you to use and you need to program with that in mind. The "100%" tells you that you had complete access to the CPU resources, it doesn't tell you how well you made use of them.


    Paul.



  • peter edmiston
    replied
    Thanks again for the detailed replies. I gather there is no such thing as a free lunch so the graphics card is on the back burner. I just wish no one had told me what kind of speed is possible, if only I was clever...
    At this stage the most practical thing is a 4 or 8 core PC with 4/8 threads running. That is probably enough to get started.
    I agree that writing to the screen is deadly slow, and I will stick to integers (in the belief this is faster?).

    I just tried an experiment with a simple counting loop. The experiment was run on a dual-core PC. I then copied the same program 5 times and ran the copies independently.
    Running 1 copy: 9 seconds
    Running 2 copies: 9 seconds each
    Running 5 copies: ~24 seconds average time

    Which seems to be almost exactly the same as

    Core 1: running 2 programs, each taking 18 seconds
    Core 2: running 3 programs, each taking 27 seconds
    Average of (2 x 18 + 3 x 27)/5 = 23.4 seconds

    So it seems there is no obvious advantage in splitting the program into more than the maximum number of cores available.

    I can't help thinking there is a lot of unused CPU despite the 100% usage as shown on task manager.

    Brian: The code is just simple maths used for a hydrological simulation model. There are daily weather inputs and then the program calculates plant growth and soil water and runoff for 120 years of data. Then it changes a variable and repeats the process. If the program is faster I can then use smaller steps and spend more time on optimisation. There are some thousands of lines of code but only a small part of this does the iterative processing.
    However, if it takes 30 lines of GPU code to do the work of one line of PBCC code, then it is going to be slow to take advantage of this opportunity.

    It sounds like a job for Uncle Bob: Power GPU

    Thanks again.



  • Nick Luick
    replied
    If you still want to use a ramdisk you might try,

    http://memory.dataram.com/products-a...ftware/ramdisk

    They have a free (unpaid) version up to 4 GB using NTFS, and 2 GB with FAT32. I have not tried it yet, but in the forums users seem happy. Fairly good PDF docs.

    One advantage of a memory drive is less wear and tear on the hard drive, plus much faster I/O. They have another feature for a 32-bit OS, where memory installed above 4 GB can be used as a memory drive.



  • Mel Bishop
    replied
    Originally posted by peter edmiston View Post
    ...structure the set-up for maximum speed....
    Been thinking about this for a while. Since you didn't specify one way or the other: Avoid screen print/write routines like the plague.

    That, in itself, will speed things up a minimum of 10x.

    If you absolutely have to have screen updates, try updating every 10,000 (or so) calculations.

    But it all depends on what you are trying to accomplish.
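
    As an illustration of the "every 10,000 calculations" idea, here is a minimal PB/CC-style sketch (the loop body and the counts are made up):
    Code:
    FUNCTION PBMAIN () AS LONG
        LOCAL i AS LONG
        FOR i = 1 TO 5000000
            ' ... one calculation per pass goes here ...
            IF (i MOD 10000) = 0 THEN      ' only touch the screen every 10,000 passes
                LOCATE 1, 1                ' overwrite the same screen line
                PRINT "Calculations done:"; i;
            END IF
        NEXT i
    END FUNCTION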



  • Paul Dixon
    replied
    Peter,
    Q#1: I used to use a RAM disk to speed up data read/write. If the cache will do this anyway is it correct to assume there is no value in doing this (approx 1.5 meg read size)?
    The Windows file system will cache that for you anyway so you don't need a RAM disk.

    Q#2: Is there any value in using an XP OS with a minimal install to minimise the potential for competing processes? It seems unnecessary.
    No value, it's not necessary.


    Q3: I had a read of Petr's paper and it seems that he used OpenCL. However if I just stick to PBCC and add a graphics card do I need to specifically instruct the graphics card to work or will it do this by default?
    I'd forget the suggestion of using a graphics card. It's a very specialised area which will give you benefits in very restricted circumstances and requires you to rewrite your code. First get your program working in Windows then, if it doesn't perform well enough, look at other alternatives.

    Q4: Assuming I could (theoretically) split the calculations into an infinite number of separate processes or just one larger, identical process BUT I am confined to just one processor (core) will it be faster to parallel process or does Windows XP "multi-task" merely by moving sequentially from one task to another (albeit quickly). If multi-tasking is just sequential rather than parallel then there seems no benefit??
    You must split your process up into individual threads to take advantage of multiple cores. It's not that difficult.
    Windows will then schedule the threads on the available CPU cores.
    If you have only 1 thread then it will only ever use 1 core.

    Search this place and you'll find plenty of examples of using multiple threads to speed things up such as:
    http://www.powerbasic.com/support/pb...ad.php?t=44282
    http://www.powerbasic.com/support/pb...=41843&page=10
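
    To give a flavour of the general shape, here is a minimal, untested PB/CC-style sketch of splitting the work across two threads (the Worker function and its parameter are made up; how you actually divide the work depends on your program):
    Code:
    ' Hypothetical worker: each thread does half of the number crunching.
    ' A PB/CC thread function takes a single LONG parameter BYVAL.
    FUNCTION Worker(BYVAL iHalf AS LONG) AS LONG
        ' ... do this thread's share of the work, selected by iHalf ...
        FUNCTION = 0
    END FUNCTION

    FUNCTION PBMAIN () AS LONG
        LOCAL hThread1, hThread2, lStatus AS LONG

        THREAD CREATE Worker(1) TO hThread1    ' first half of the work
        THREAD CREATE Worker(2) TO hThread2    ' second half of the work

        ' Poll until both threads have finished (&H103 = STILL_ACTIVE).
        DO
            SLEEP 50
            THREAD STATUS hThread1 TO lStatus
            IF lStatus <> &H103 THEN THREAD STATUS hThread2 TO lStatus
        LOOP WHILE lStatus = &H103

        THREAD CLOSE hThread1 TO lStatus
        THREAD CLOSE hThread2 TO lStatus
    END FUNCTION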



    Q5: XP is now replaced by Win7. Is this a better OS for multi-tasking?
    No. Just go with whichever OS you're most comfortable with.

    Paul.



  • Brian Chirgwin
    replied
    I would start by doing this in PBCC and just writing the code. I think you will find it runs faster than the current DOS version.

    Use the built-in PB profiler. This will tell you where the application spends most of its time, which is where the performance bottleneck is.

    Improve performance in this area, repeat.
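
    Invoking the profiler is straightforward; a minimal sketch (the output file name is just an example, and this assumes a compiler version that supports the PROFILE statement):
    Code:
    #COMPILE EXE
    #TOOLS ON                      ' include profiling support (the default)

    FUNCTION PBMAIN () AS LONG
        ' ... run the simulation as normal ...
        PROFILE "profile.txt"      ' write call counts and timings to a text file
    END FUNCTION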

    NVidia does simplify things with the CUDA toolkit, but Petr is correct, there is still setup needed to work with the graphics card. By the way, there is a debugger if you have two graphics cards.

    What type of calculations are you doing? I'd be interested in helping if you can post the current code or explain the project. I am sure others will add their ideas too.



  • Petr Schreiber jr
    replied
    Hi Peter,

    Q3: I had a read of Petr's paper and it seems that he used OpenCL. However if I just stick to PBCC and add a graphics card do I need to specifically instruct the graphics card to work or will it do this by default?
    OpenCL is a technology usable from PB/CC - on Windows, it is "just" a set of functions in a DLL installed by the graphics driver (NVidia) or an SDK (ATi/AMD/Intel), which allow you to set up the computation for the GPU. The GPU program itself is written in a language based on C99.

    So typically you load the data from the hard drive using PB/CC and organize it into variables/arrays in PB/CC, you call the OpenCL run-time functions from PB/CC to initialize the GPU, create a queue, compile the GPU program and run it on the GPU cores, and then you use PB/CC to pick the crunched data back up.

    Last note - it is a lot of code just to set up the calculation on the GPU, and it is hell to debug. If it runs, it is extremely fast; if you run into a driver problem or some card-specific issue, you can spend half a month just debugging.

    So ... performance comes at the price of gray hair in the case of OpenCL.
    Check the complete example I linked on José's forum; it shows the simplest possible task of summing two arrays into a third. In pure PB it is 3 lines of code; with OpenCL, you decorate the whole thing with 100 lines of code.
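
    For reference, the pure-PB side really is just something like this (array names made up):
    Code:
    ' Sum two arrays into a third, in plain PowerBASIC:
    DIM a(999) AS SINGLE, b(999) AS SINGLE, c(999) AS SINGLE
    ' ... fill a() and b() ...
    FOR i& = 0 TO 999 : c(i&) = a(i&) + b(i&) : NEXT i&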


    Petr



  • peter edmiston
    replied
    Thank you all for your suggestions. (I feel a bit like some guy who has been lost in the jungle wondering if the war is over.) I am going to enjoy trying your suggestions. Yes, I can run parallel processes. In a first pass I think I can just use a multi-core PC and run multiple copies of the program, each doing some of the work. I never would have thought of a graphics card, but I guess it is just a number-crunching accessory.
    Q#1: I used to use a RAM disk to speed up data read/write. If the cache will do this anyway is it correct to assume there is no value in doing this (approx 1.5 meg read size)?
    Q#2: Is there any value in using an XP OS with a minimal install to minimise the potential for competing processes? It seems unnecessary.
    Q3: I had a read of Petr's paper and it seems that he used OpenCL. However if I just stick to PBCC and add a graphics card do I need to specifically instruct the graphics card to work or will it do this by default?
    Q4: Assuming I could (theoretically) split the calculations into an infinite number of separate processes or just one larger, identical process BUT I am confined to just one processor (core) will it be faster to parallel process or does Windows XP "multi-task" merely by moving sequentially from one task to another (albeit quickly). If multi-tasking is just sequential rather than parallel then there seems no benefit??
    Q5: XP is now replaced by Win7. Is this a better OS for multi-tasking?

    Thanks again for your comments.

