FPU precision speed

I just found out that a piece of petroleum-industry standard economic software (written for Windows) that we use at work is programmed in SINGLE precision. Larger numbers were changing slightly on us and I was trying to figure out why. When I talked to a person on the help desk and found this out, I went from disbelief to anger. He defended it by saying that single precision was so much faster than double precision. I can't imagine there is much difference going through the FPU. Does anyone have experience with this issue? I will test it myself when I have the time; I know graphics APIs use singles over doubles for speed reasons.
-
Here is some test code; I didn't see much difference. You may want to try some operations other than addition, too.
Note (added later): code to set the FPU precision.
Code:
#COMPILE EXE
#DIM ALL
#REGISTER NONE

DECLARE FUNCTION QueryPerformanceCounter LIB "KERNEL32.DLL" ALIAS "QueryPerformanceCounter" (lpPerformanceCount AS QUAD) AS LONG
DECLARE FUNCTION QueryPerformanceFrequency LIB "KERNEL32.DLL" ALIAS "QueryPerformanceFrequency" (lpFrequency AS QUAD) AS LONG

'~~~~~~~~~~~ A Variation of Dave Roberts' MACRO Timer ~~~~~~~~~~~~~~~~~~~~~
MACRO onTimer
  LOCAL qFreq, qOverhead, qStart, qStop AS QUAD
  LOCAL f AS STRING
  f = "#.###"
  QueryPerformanceFrequency qFreq
  QueryPerformanceCounter qStart   ' Intel suggestion. First use may be suspect
  QueryPerformanceCounter qStart   ' So, wack it twice <smile>
  QueryPerformanceCounter qStop
  qOverhead = qStop - qStart       ' Relatively small
END MACRO
MACRO goTimer   = QueryPerformanceCounter qStart
MACRO stopTimer = QueryPerformanceCounter qStop
MACRO showTimer = USING$(f, (qStop - qStart - qOverhead) * 1000000 / qFreq / 1000) + " milliseconds"
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

FUNCTION PBMAIN () AS LONG
  LOCAL singVar AS SINGLE, doubVar AS DOUBLE
  LOCAL ctrlWord AS WORD
  LOCAL ii AS LONG

  onTimer

  goTimer
  !fstcw ctrlWord
  !and   ctrlWord, &b1111110011111111   ;set to single precision
  !fldcw ctrlWord
  FOR ii = 1 TO 100000000
    singVar = singVar + 1.1
  NEXT
  stopTimer
  ? showTimer

  goTimer
  !or    ctrlWord, &b0000001000000000   ;set to double precision
  !fldcw ctrlWord
  FOR ii = 1 TO 100000000
    doubVar = doubVar + 1.1
  NEXT
  stopTimer
  !or    ctrlWord, &b0000001100000000   ;set back to ext precision
  !fldcw ctrlWord
  ? showTimer
END FUNCTION
-
James,
The 8087 floating-point co-processor (the original Intel FPU) lists a single-precision multiply as taking 11.9 µs and an extended-precision multiply as taking 16.9 µs; that's 42% slower. Add/subtract were not iterative, so they took the same time regardless of precision. More complex instructions were iterative and took longer for more bits.
These days the FPU is a lot faster and multiplies any size at the same speed, but the more complex instructions are still done iteratively, and if it is set to a lower-precision mode the FPU will run them faster (see the sketch after the list below).
Examples from the Athlon Optimization Guide:
FMUL (any size): latency = 4 cycles
FDIV single: latency = 16 cycles
FDIV double: latency = 20 cycles
FDIV extended: latency = 24 cycles (50% slower than single)
FSQRT single: latency = 19 cycles
FSQRT double: latency = 27 cycles
FSQRT extended: latency = 35 cycles
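For what it's worth, the addition test posted above won't show this, since FADD runs at the same speed at any precision. A rough variation (a sketch only; loop count and constants are arbitrary and it hasn't been benchmarked here) that times FDIV in single- versus extended-precision mode might:
Code:
#COMPILE EXE
#DIM ALL
#REGISTER NONE

FUNCTION PBMAIN () AS LONG
  LOCAL ctrlWord AS WORD
  LOCAL ii AS LONG
  LOCAL x AS DOUBLE, t AS DOUBLE

  ' precision control = 24-bit (SINGLE): the iterative divide can stop sooner
  !fstcw ctrlWord
  !and   ctrlWord, &b1111110011111111
  !fldcw ctrlWord
  x = 1234567 : t = TIMER
  FOR ii = 1 TO 10000000
    x = x / 1.000001
  NEXT
  ? "FDIV, single-precision mode:   " + FORMAT$(TIMER - t, "#.###") + " s"

  ' precision control = 64-bit (EXTENDED): full-width iterative divide
  !or    ctrlWord, &b0000001100000000
  !fldcw ctrlWord
  x = 1234567 : t = TIMER
  FOR ii = 1 TO 10000000
    x = x / 1.000001
  NEXT
  ? "FDIV, extended-precision mode: " + FORMAT$(TIMER - t, "#.###") + " s"
END FUNCTION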
Also, for large arrays of data, memory bandwidth can become a problem. A DOUBLE takes twice the memory of a SINGLE, so a large array of DOUBLEs can take twice as long to fetch from and write back to memory as an equally long array of SINGLEs, even if the FPU calculates at the same speed.
EXTs are worse: not only are they larger, at 10 bytes, but they are also either packed at 10-byte spacing (instead of 8 or 16), which leaves them misaligned and can make access considerably slower, or padded out to 16 bytes, which keeps them aligned but means twice as much memory must be fetched compared to a DOUBLE.
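A minimal sketch of the bandwidth effect (array size and pass count are arbitrary; it assumes the arrays are much larger than the CPU cache, and how close the gap gets to 2x will depend on the machine):
Code:
#COMPILE EXE
#DIM ALL

FUNCTION PBMAIN () AS LONG
  LOCAL ii AS LONG, pass AS LONG
  LOCAL sSum AS SINGLE, dSum AS DOUBLE, t AS DOUBLE
  ' 10 million elements: ~40 MB of SINGLEs vs ~80 MB of DOUBLEs,
  ' far bigger than the cache, so memory traffic dominates
  DIM sArr(1 TO 10000000) AS SINGLE
  DIM dArr(1 TO 10000000) AS DOUBLE

  FOR ii = 1 TO 10000000
    sArr(ii) = 1.1 : dArr(ii) = 1.1
  NEXT

  t = TIMER                          ' sweep the SINGLE array ten times
  FOR pass = 1 TO 10
    FOR ii = 1 TO 10000000
      sSum = sSum + sArr(ii)
    NEXT
  NEXT
  ? "SINGLE array: " + FORMAT$(TIMER - t, "#.###") + " s"

  t = TIMER                          ' same element count, but DOUBLEs
  FOR pass = 1 TO 10
    FOR ii = 1 TO 10000000
      dSum = dSum + dArr(ii)
    NEXT
  NEXT
  ? "DOUBLE array: " + FORMAT$(TIMER - t, "#.###") + " s"
END FUNCTION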
Paul.
-
When comparing real-world performance of single versus double precision, the real difference in time usually comes from the doubled bandwidth requirement: with DOUBLEs, half as much data is transferred in the same time, half as much data fits in the same cache, and so on.
Edit:
Paul's post obviously came first!
Bye!
-- The universe tends toward maximum irony. Don't push it.
File Extension Seeker - Metasearch engine for file extensions / file types
Online TrID file identifier | TrIDLib - Identify thousands of file formats
-
John,
be careful with your testing.
When the FPU is set to EXTENDED precision, as it is with PB, ALL calculations are done in EXTENDED precision, even those on SINGLEs and DOUBLEs. The EXTENDED-precision result is only converted to the required precision when the result is stored.
The FPU has bits in its control register that tell it what precision to use. If it's told to work in SINGLE precision via those bits, it stops calculating once that precision is met, which gives faster results.
If the FPU is left in EXTENDED precision, it's not enough to just use a SINGLE variable: that variable is converted to EXTENDED when it's loaded, the calculation is performed in EXTENDED precision, and the result is then converted back to SINGLE when it's stored back to memory.
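As a small sketch of that mechanism (the SUB name and parameter encoding are just for illustration), the precision-control field is bits 8-9 of the control word, which is exactly what the inline assembly in the test code above rewrites:
Code:
#COMPILE EXE
#DIM ALL
#REGISTER NONE

' pc = 0 -> 24-bit (SINGLE), 2 -> 53-bit (DOUBLE), 3 -> 64-bit (EXTENDED, PB's default)
SUB SetFpuPrecision(BYVAL pc AS LONG)
  LOCAL ctrlWord AS WORD
  !fstcw ctrlWord                      ; read the current control word
  ctrlWord = (ctrlWord AND &b1111110011111111) OR (pc * 256)   ' replace bits 8-9
  !fldcw ctrlWord                      ; load it back into the FPU
END SUB

FUNCTION PBMAIN () AS LONG
  LOCAL s AS SINGLE
  SetFpuPrecision 0    ' iterative FPU instructions now stop at SINGLE precision
  s = s + 1.1          ' calculated and rounded at 24 bits
  SetFpuPrecision 3    ' restore PB's default EXTENDED precision
END FUNCTION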
Paul.
-
James--
Obviously, you can measure the difference in speed between calculating with a single, a double, and an extended float. However, in the context of a complete application, float opcodes represent just a part of the total, typically a small part. I'd be very surprised to see any application where the measurable total difference was more than 1-2%.
Bob Zale
PowerBASIC Inc.
-
I have accumulated some experience with intensive floating point calculations, and I always set all float variables to DOUBLE, with negligible loss of speed but much gain in "safety".
However, I am with Marco: there is a situation in which it can be convenient to switch to SINGLE (and I have done it a couple of times).
This happens when precision is not an issue at all and, at the same time, a very large amount of data needs to undergo a very simple manipulation, such as a single addition/subtraction/multiplication.
Since the CPU is usually much faster than the pipeline carrying data to and from memory, under those circumstances the bottleneck is the data transfer rather than the arithmetic, and having half-sized data will speed it up by a factor of 2.
Regards,
Aldo Vitagliano
alvitagl at unina it
-
"I have accumulated some experience with intensive floating point calculations, and I always set all float variables to DOUBLE, with negligible loss of speed but much gain in 'safety'."
Which suggests that, when using these compilers, one should use EXT for floats and convert to SINGLE/DOUBLE (if necessary or wanted) only after the calculations are complete, absent a compelling reason to use SINGLE/DOUBLE for the actual calculations.
i.e., for integers use LONG; for floats use EXT, to take advantage of the compiler's optimizations.
If I remembered this EXT thing incorrectly, never mind.
Last edited by Michael Mattias; 9 Apr 2008, 09:02 AM.
Michael Mattias
Tal Systems (retired)
Port Washington WI USA
[email protected]
http://www.talsystems.com
-
One for two ain't bad. (I KNOW "LONG" is the 'datatype of choice' for integers).
I do next to nothing with floats. About 99% of my numeric needs are covered by integers and money, for which I use LONG and CUR/CUX respectively.
Michael Mattias
Tal Systems (retired)
Port Washington WI USA
[email protected]
http://www.talsystems.com