CPU Speed Test

  • CPU Speed Test

    I’ve been told the only way to programmatically judge a CPU’s speed is to perform a series of operations that take a known number of clock cycles to execute, and to use a high-precision timer to measure how long the operations take.

    Does anyone have ideas on how to do this, and on how many clock cycles are needed?

    Tony D.

  • #2
    'The following PB3.5 code works out the speed of a Pentium type CPU


    REM Work out the CPU clock frequency.
    REM Unfortunately the RDTSC instruction won't run unless CPU is in
    REM protected mode or REAL mode so I need to run in either REAL mode or
    REM using a DOS Protected Mode Interface memory manager (as in Win9x/Win3.1).
    REM Requires DPMIPB.OBJ and DPMIPB.INC to be in the directory for compiling.
    REM These are available from the PowerBASIC forum library. The file is DPMIPB.ZIP
    REM by Gunther Ilzig.

    'Program: DPMIPB.BAS
    'Purpose: Protected Mode (DPMI) example for PB/DOS.
    'Compiler: PB 3.20
    'Assembler: TASM 4.0
    'Status: Freeware
    'Author: Gunther Ilzig

    '----- Compiler Instructions

    $CPU 80386 'compile for 80386 or better CPU
    $OPTIMIZE SIZE 'make a small code
    $COMPILE EXE 'compile to an EXE
    $DEBUG MAP OFF 'turn off map file generation
    $DEBUG PBDEBUG OFF 'don't include pbdebug support
    $LIB COM OFF 'turn off communications library.
    $LIB CGA OFF 'turn off CGA graphics library.
    $LIB EGA OFF 'turn off EGA graphics library.
    $LIB VGA OFF 'turn off VGA graphics library.

    $LIB LPT OFF 'turn off printer support library.
    $LIB IPRINT OFF 'turn off interpreted print library.
    $ERROR BOUNDS OFF 'turn off bounds checking
    $ERROR NUMERIC OFF 'turn off numeric checking
    $ERROR OVERFLOW OFF 'turn off overflow checking
    $ERROR STACK OFF 'turn off stack checking
    $COM 0 'set communications buffer to nothing
    $STRING 1 'set largest string size at 1 KB
    $STACK 2048 'let's use a 2 KB stack
    $SOUND 1 'smallest music buffer possible
    $DIM ARRAY 'force arrays to be pre-dimensioned before use
    $DYNAMIC 'all arrays will be dynamic by default
    $OPTION CNTLBREAK OFF 'don't allow Ctrl-Break to exit program

    '----- Protected Mode related values

    TYPE Addresses 'UDT for passed procedure addresses
    pmSetReal AS DWORD 'PM SetReal (PM = Protected Mode address)
    rmBackLab AS DWORD 'RM BackLab (RM = Real Mode address)
    pmClearText AS DWORD 'PM Cleartext
    pmXPokeString AS DWORD 'PM XPokeString
    pmWaiting AS DWORD 'PM Waiting
    pmGetMemInfo AS DWORD 'PM GetMemInfo
    pmAllocBlock AS DWORD 'PM AllocBlock
    pmFreeBlock AS DWORD 'PM FreeBlock
    pmFillBlock AS DWORD 'PM FillBlock
    END TYPE
    DIM pa AS Addresses 'passed addresses
    paseg?? = VARSEG(pa.pmSetReal) '32-bit pointer to passed addresses
    paoff?? = VARPTR(pa.pmSetReal)
    pa.pmSetReal = CODEPTR32(SetReal)
    'Real Mode address SetReal
    pa.rmBackLab = CODEPTR32(BackLab)
    'Real Mode address BackLab
    pa.pmClearText = CODEPTR32(ClearText)
    'Real Mode address ClearText
    pa.pmXPokeString = CODEPTR32(XPokeString)
    'Real Mode address XPokeString
    pa.pmWaiting = CODEPTR32(Waiting)
    'Real Mode address Waiting
    pa.pmGetMemInfo = CODEPTR32(GetMemInfo)
    'Real Mode address GetMemInfo
    pa.pmAllocBlock = CODEPTR32(AllocBlock)
    'Real Mode address AllocBlock
    pa.pmFreeBlock = CODEPTR32(FreeBlock)
    'Real Mode address FreeBlock
    pa.pmFillBlock = CODEPTR32(FillBlock)
    'Real Mode address FillBlock
    TYPE Information 'UDT for Protected Mode memory
    'information
    total AS DWORD 'largest block of contiguous linear
    'memory in bytes that could be allocated
    pages AS DWORD 'number of memory pages that could be
    'allocated
    lockable AS DWORD 'largest lockable block in pages
    linear AS DWORD 'total linear address space in pages
    unlocked AS DWORD 'currently unlocked pages
    unused AS DWORD 'currently unused pages
    managed AS DWORD 'total number of pages that are
    'managed by the DPMI Host
    size AS DWORD 'size of page partition/file
    reserved AS STRING*16 'reserved area
    END TYPE
    DIM info AS Information 'define UDT
    farsize??? = 0 'variable for far heap manipulation
    dpmimajor? = 0 'DPMI major version
    dpmiminor? = 0 'DPMI minor version
    flag32? = 0 'flag for 32-bit DPMI Host
    errorcode? = 0 'error code
    'Factory Default = 0 (no error)
    'do not change that value, it's
    'SHARED


    '----- Main Program

    $INCLUDE"DPMIPB.INC"

    NoDPMI%=0
    cpuOK??=0
    credit$="The program was written by P.Dixon. It is freeware. Use it at your own risk."
    credit1$="The DPMI interface is freeware written by Gunther Ilzig."

    '----- We start our application in Real Mode, or better, in V86 Mode,
    '----- which is approximately Real Mode.

    CLS

    ?" CPUSPEED.EXE"
    ?" ------------"
    ?" This program measures the clock frequency of the CPU. It is very"
    ?"accurate measuring to a few parts per million. It requires a Pentium or"
    ?"better CPU. It will run with the CPU in either REAL mode (no memory"
    ?"manager installed) or under a DPMI memory manager as found in Win3.x and"
    ?"Win9x. It will not run under WinNT due to the way the hardware is accessed."
    ?"It works by comparing the Time Stamp Counter of the Pentium type processor"
    ?"with the Real Time Clock so the RTC crystal determines the accuracy."
    ?
    ?" Best results are gained from using a test time of 10 seconds or more."
    ?"The program will disable interrupts during this time so other applications"
    ?"will not respond."
    ?
    ?

    '----- Check the installed processor. We need at least a Pentium
    !db %FORCE32bit
    !db %pusha ;save all registers
    !db %FORCE32bit ;save original flags
    !pushf

    !db %FORCE32bit ;get flags..
    !pushf
    !db %FORCE32bit ;..into eax
    !pop ax

    !db %FORCE32bit
    !dw %BSWAPeax ;swap bytes to get access to bit 21

    !mov bx,ax ;copy old value
    !xor ax,&h2000 ;flip bit 21

    !db %FORCE32bit
    !dw %BSWAPeax ;swap bytes back

    !db %FORCE32bit
    !push ax ;push onto stack

    !db %FORCE32bit
    !popf ;set flags

    'now get flags again and see if bit 21 changed..

    !db %FORCE32bit ;get flags..
    !pushf

    !db %FORCE32bit ;..into eax
    !pop ax

    !db %FORCE32bit
    !dw %BSWAPeax ;swap bytes to get access to bit 21

    !xor ax,bx ;mask off all except original bit 21
    !and ax,&h2000
    !jz noCPUid ;Can't ID the CPU so it's not a pentium

    !mov ax,0 ;a cumbersome load of EAX with 1
    !db %FORCE32bit
    !dw %BSWAPeax ;swap bytes
    !mov ax,1

    !db %FORCE32bit
    !dw %CPUID ;get cpuid bits with time stamp info (EAX=1 on entry)

    !and dx,&h10 ;test time stamp counter presence (bit 4)
    !jz notsc ;no time stamp counter so can't do it
    !mov ax,1
    !mov cpuOK??,ax ;flag that all is well
    !jmp finish

    notsc:
    noCPUid:
    !mov ax,0 ;flag that test can't be done
    !mov cpuOK??,ax

    finish:

    !db %FORCE32bit ;restore flags
    !popf
    !db %FORCE32bit ;restore all registers
    !db %popa


    IF (pbvCpu? < 3) or (cpuOK?? = 0) THEN
    PRINT

    PRINT" This application needs a Pentium or better CPU."

    END 'program ends here
    END IF

    '----- Is DPMI support available?
    y$="y"
    IF (ISFALSE CheckDpmi%(dpmimajor?,dpmiminor?,flag32?)) OR (flag32? = 0) THEN
    'DPMI support isn't available

    PRINT" Your configuration doesn't support 32-bit DPMI."
    PRINT" The program may still work on your system without DPMI"
    PRINT" support (e.g. if the processor is running in REAL mode)."

    PRINT
    INPUT" Do you want to try anyway(y/n)";y$

    NoDPMI%=1
    END IF
    IF left$(y$,1)<>"y" AND left$(y$,1)<>"Y" then END

    If NoDPMI%=0 Then
    '----- We must give the DPMI host memory space for his private data.
    '----- The required amount of that memory area determines the
    '----- FUNCTION CheckDpmi%.

    farsize??? = SETMEM(-600000) 'give all memory back to DOS
    farsize??? = SETMEM(50000) 'The size of our program. If your
    'program needs more DOS memory, please
    'change this value. You can determine
    'the appropriate values with the PB IDE,
    'Item "Compile" and "Get Info".

    '----- The FUNCTION PmAlloc% allocates memory for the necessary
    '----- private DPMI Host data. If this function fails, the program
    '----- terminates, because it wouldn't work properly in the
    '----- Protected Mode.

    IF ISFALSE PmAlloc% THEN 'memory allocation failed
    PRINT

    PRINT" Couldn't allocate DOS memory for private DPMI data."

    END
    END IF

    End if
    '----- At this point, we're ready to initialize the Protected Mode.

    '###################################################################
    'Code by PD starts here

    %RDTSC=&h310f :REM opcode FOR RDTSC instruction
    %PUSHA=&h60
    %POPA=&h61
    %CPUID=&hA20F
    %BSWAPeax=&hC80F
    %BSWAPedx=&hCA0f
    %FORCE32bit=&h66

    a&&=0:cnt&=0:sum&&=0
    f$="###,###,###,###"

    ?
    input "How many seconds for each measurement?",sec$
    if len(sec$) > 5 then sec$=left$(sec$,5)
    sec&&=val(sec$)
    sec% = sec&& mod 30000 'keep the count within integer range
    if sec% < 1 then sec%=1 'never allow a zero-second test (avoids division by zero later)

    IF sec%=1 THEN
    ?"Test time set at ";sec%;"sec."
    ELSE
    ?"Test time set at ";sec%;"secs."
    END IF

    ?"Working..."

    DO

    If NoDPMI%=0 then
    '----- Initialize Protected Mode

    IF ISFALSE InitPM%(paseg??,paoff??) THEN
    PRINT
    CALL ErrorHandling(errorcode?) :'print error message and terminate
    END IF

    '----- !!!!! Attention. Now in 32-bit Protected Mode !!!!!
    End if

    !jmp start

    waitforRTC:
    !in al,&h70 ;RTC address register
    !and al,&h80 ;preserve top (NMI) bit
    !or al,&ha ;point to status register (&h0a)
    !out &h70,al
    rtclp2:
    !in al,&h71 ;get status register
    !and al,128 ;test 'update in progress' bit
    !jz rtclp2 ;wait until it is in progress
    rtclp3:
    !in al,&h71 ;get status register
    !and al,128 ;test 'update' bit
    !jnz rtclp3 ;wait until update is finished
    !retn

    start:
    !cli ;disable interrupts to make timing accurate
    !db %PUSHA ;save all registers
    !call waitforrtc

    !db %PUSHA
    !dw %CPUID ;force all pending instructions to complete
    !db %POPA

    !dw %RDTSC ;read time stamp counter

    !mov a&&,ax ;store first value somewhere safe
    !db %FORCE32bit
    !dw %BSWAPeax
    !xchg ah,al
    !mov a&&[2],ax

    !mov a&&[4],dx
    !db %FORCE32bit
    !dw %BSWAPedx
    !xchg dh,dl
    !mov a&&[6],dx

    !mov cx,sec%

    ww:
    !call waitforrtc

    !dec cx
    !jnz ww

    !db %PUSHA
    !dw %CPUID ;force all pending instructions to complete
    !db %POPA

    !dw %RDTSC ;get final time stamp counter value

    !sub ax,a&& ;subtract initial value from final value
    !mov a&&,ax

    !db %FORCE32bit
    !dw %BSWAPeax
    !xchg ah,al

    !sbb ax,a&&[2]
    !mov a&&[2],ax

    !sbb dx,a&&[4]
    !mov a&&[4],dx
    !db %FORCE32bit
    !dw %BSWAPedx
    !xchg dh,dl

    !sbb dx,a&&[6]
    !mov a&&[6],dx

    !db %POPA ;restore registers
    !sti ;re-enable interrupts


    If NoDPMI%=0 Then
    '----- Switch back to Real Mode.
    '----- That avoids a GPF in the RTL cleanup code.

    CALL dword pa.pmSetReal

    End if


    CLS
    LOCATE 1,1
    a&&=a&&/sec%

    ?"CPU speed =";using$(f$,a&&);" Hz. "
    IF cnt&=0 THEN
    maxsp&&=a&&
    minsp&&=a&&
    END IF

    INCR cnt&

    sum&&=sum&&+a&&


    maxsp&&=MAX(maxsp&&,a&&)
    minsp&&=MIN(minsp&&,a&&)
    ?
    ?" Max =";using$(f$,maxsp&&);" Hz. "
    ?" Min =";using$(f$,minsp&&);" Hz. "
    ?" Ave =";using$(f$,INT(sum&&/cnt&));" Hz. "
    ?" Spread =";INT((maxsp&&-minsp&&)/sum&&*cnt&*100000000)/100;"ppm "
    ?
    ?"Number of tests = ";cnt&
    ?
    ?"Press any key to exit after current test."
    ?
    ?
    ?

    DELAY 0.5
    LOOP UNTIL INSTAT


    'code by PD ends here
    '######################################################################

    BackLab: 'back jump label


    END

    'SUB ErrorHandling
    'Task: Check the error code, print the appropriate error
    ' message and terminate.
    'Input: pcode? = error code
    'Output: error message
    SUB ErrorHandling(BYVAL pcode?)
    SELECT CASE pcode? 'check error code
    CASE = 1 'Protected Mode initialization failed
    PRINT"■ Couldn't enter 32-bit Protected Mode."
    CASE = 2 'descriptor allocation failed
    PRINT"■ Couldn't allocate needed descriptors."
    CASE = 3 'descriptor write error
    PRINT"■ Couldn't write 32-bit data descriptor."
    CASE = 4 'return descriptor write error
    PRINT"■ Couldn't write return descriptor."
    CASE = 5 'get Protected Mode memory information
    'failed
    PRINT"■ Couldn't get Protected Mode memory information."
    CASE = 6 'Extended Memory allocation failed
    PRINT"■ Couldn't allocate Extended Memory block."
    END SELECT
    PRINT" Application terminates now."
    END 'end application
    END SUB
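
The listing above repeatedly uses BSWAP plus 16-bit instructions to reach the upper half of 32-bit registers, and a SUB/SBB chain to subtract two 64-bit counter values. The following is a minimal Python sketch of that arithmetic only, not part of the original program; the helper names (`bswap32`, `high_word`, `sub64_by_words`) are invented here for illustration, and nothing in it touches real hardware.

```python
MASK32 = 0xFFFFFFFF

def bswap32(x):
    """Reverse the four bytes of a 32-bit value, like the x86 BSWAP instruction."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")

def xchg_bytes16(x):
    """Swap the two bytes of a 16-bit value, like XCHG AH,AL."""
    return ((x << 8) | (x >> 8)) & 0xFFFF

def high_word(eax):
    """Upper 16 bits of a 32-bit register, reached via BSWAP then XCHG AH,AL."""
    return xchg_bytes16(bswap32(eax) & 0xFFFF)

def sub64_by_words(end, start):
    """Compute end - start as four 16-bit steps with a borrow (SUB, then SBB x3).

    The result wraps modulo 2**64, as the hardware chain would.
    """
    result, borrow = 0, 0
    for i in range(4):
        a = (end >> (16 * i)) & 0xFFFF
        b = (start >> (16 * i)) & 0xFFFF
        diff = a - b - borrow
        borrow = 1 if diff < 0 else 0
        result |= (diff & 0xFFFF) << (16 * i)
    return result

# EFLAGS bit 21 (the ID flag) lands on bit 13 of AX after BSWAP,
# which is why the listing XORs and ANDs with &h2000.
ID_FLAG = 1 << 21
print(hex(bswap32(ID_FLAG) & 0xFFFF))  # 0x2000
```

The same byte-swap trick explains the "cumbersome load of EAX with 1" in the listing: zero AX, swap so the zero moves into the upper half, then load 1 into AX.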



    • #3
       [quote]
      REM Work out the CPU clock frequency.
      REM Unfortunately the RDTSC instruction won't run unless CPU is in
      REM protected mode or REAL mode so I need to run in either REAL mode or
      REM using a DOS Protected Mode Interface memory manager (as in Win9x/Win3.1.
      [/quote]
         
      If executed under a protected mode OS (i.e., under Windows), any test 
      such as this one is destined to return inconsistent and unreliable 
      results, with the effort to disable interrupts probably reducing 
      reliability even further.  [i]Only[/i] if attempted in real mode can 
      truly reliable CPU benchmarking occur.  In other words, simply 
      achieving access to the time stamp counter is not enough. 
         
      Moreover, reliable benchmarking requires attention to some other 
      critically important issues (including the effects of the cache and 
      data alignment), and the code presented here makes no provision for 
      any of them. 
         
      Just my unsolicited crabby thoughts.

      -------------
      -- Greg



      • #4
        It has become very clear that this type of access is far beyond my current programming abilities. I do, however, appreciate your responses, and I’m sure that others will benefit from what you have offered.

        Regards,

        Tony D



        • #5
          Greg,
          why should the code return inconsistent results? Protected mode is NOT Windows, it's a state of the CPU. Windows is a piece of software which just happens to run in protected mode. Once there, if interrupts are disabled, then the operating system no longer functions; it can't interrupt the code, so it can't mess up the timing. The code doesn't work in NT because NT prevents a user application from disabling the interrupts, but in DOS/Win3.1/Win9x the code works and is VERY consistent (a few parts in a million). Try it and you'll see!

          As for 'reliable benchmarking', all I claim from this code is the correct CPU clock speed. Cache, memory, video etc. are all irrelevant as they do not alter the CPU clock. The code was originally written as lots of people ask "my computer is too slow, have I got the links set right?". This code tells you without the need to take the lid off the computer.

          Paul.



          • #6
             [quote]why should the code return inconsistent results? Protected 
            mode is NOT Windows, its a state of the CPU. Windows is a piece of 
            software...[/quote] 
               
            The simple answer is that, for a DOS app, Windows is not just a piece 
            of software which happens to run in protected mode.  (In the same 
            way, the landlord who lives upstairs is not just a neighbor.)  To 
            point out only one fundamental issue, a DOS app cannot disable 
            interrupts under Win32 in the same way that it can under DOS 6.x.  
            The "try it and you'll see" method, despite its appeal, has to take 
            second place to understanding the facts. 
               
            Lots of good explanations of protected mode exist, as do explanations 
            of DPMI and DOS apps when run inside a virtual machine under Win32.  
            Barry Kauler's _Windows Assembly Language and Systems Programming_ is 
            certainly thorough.
               
             [quote]As for 'reliable benchmarking', all I claim from this code is 
            the correct CPU clock speed. Cache, memory, video etc. are all 
            irrelevant as they do not alter the CPU clock.[/quote] 
               
            And the CPU clock speed is being determined how?  By timing a loop.  
            The relevance of the cache and data alignment should be pretty 
            obvious.

            -------------
            -- Greg



            • #7
              Greg,
              <<a DOS app cannot disable interrupts under Win32 in the same way that it can under DOS 6.x.>>

              I disagree. There is only 1 interrupt flag in the CPU. If the operating system allows the flag to be set/cleared by the application (which DOS, Win3.x & Win9x do but WinNT doesn't) then the application becomes the sole user of the CPU, the operating system cannot interfere. I've locked out the landlord for the duration of the test and the only way he can stop me is to smash down the door (press the reset button). I note that you qualified your statement with "..under Win32..". If by this you mean NT then I agree as NT sets the privilege of the application at a level which prevents it from disabling the interrupts.

              <<The "try it and you'll see" method, despite its appeal, has to take second place to understanding the facts.>>

              I believe I understand the facts in this case, but I would say that the practical application of any method is more important than any underlying fact. "Try it and see" is the final proof of anything. In this case the fact, as I see it, is that the CPU cannot be interrupted once the interrupts are disabled. I'd be interested in you showing how I can halt an application which has disabled interrupts. As a simple example, could you compile the following code in PB3.5, run it under Windows, and then tell me how you stop it from running without resetting the CPU.

              !CLI
              DO
              LOOP


              <<And the CPU clock speed is being determined how? By timing a loop.>>

              It's determined by a non-stoppable, non-resettable, read-only counter implemented in hardware which counts CPU clock cycles since the last reset (called the Time Stamp Counter or TSC). Turn your cache off, align the code how you wish and run the test, and you'll see that the code still works, i.e. you get the correct CPU clock frequency to within a few ppm.

              Paul.



              • #8
                 [quote]I believe I understand the facts in this case but I would say 
                that the practical application of any method is more important than 
                any underlying fact. Try it and see is the final proof of 
                anything.[/quote] 
                   
                Faith in this approach makes unlikely the success of any explanation 
                anyone might offer.  'Try it and see' reminds me of the shareware 
                programmer who became notorious for antagonizing beta testers when 
                they'd report bugs.  His response was invariably, 'Well, it runs 
                fine on my system.'  He had tried it and found it good.  He also 
                couldn't keep beta testers.  Testing on one's own system is the 
                logical, inevitable first step--and the last thing to rely on.  
                Unless, of course, someone intends to create software exclusively for 
                the system on which it's created. 
                   
                 [quote]It's determined by a non-stoppable, non-resetable, read only 
                counter implemented in hardware which counts CPU clock cycles since 
                the last reset (called the Time Stamp Counter or TSC). Turn your 
                cache off, align the code how you wish and run the test and you'll 
                see that the code still works i.e. you get the correct CPU clock 
                frequency to within a few ppm.[/quote] 
                   
                As I said, it's determined by timing a loop:
                   
                   ww:
                   !call  waitforrtc
                   
                   !dec cx
                   !jnz ww
                   
                Or, more accurately, by timing that relies on a loop to delay.
                   
                However, and despite the danger of reinforcing the credibility of  
                flawed methodology, I tried it.  An AMD K6-2/266 on a Win95a system 
                reports a speed of 267 MHz.  An AMD K6-3/400 running Win98SE reports a 
                speed of 401 MHz.  Both tests were run for the suggested 10 seconds.  
                Repeated runs also returned varying results--"within a few ppm." 
                   
                A third test, of an IBM ThinkPad with a Pentium 100 (and running 
                Win95a), resulted in a solid lockup of the system. 
                   
                I have no intention of wasting time by trying to figure out why the 
                P100 system reacted as it did.  I could have tested 20 systems and 
                found nothing wrong, but those results would not change the facts, as 
                unpleasant as those facts might be.  The approach taken in the posted 
                code is flawed.

                -------------
                -- Greg



                • #9
                  Greg,
                  <<Faith in this approach makes unlikely the success of any explanation anyone might offer.>>

                  Not if that explanation is backed by practical evidence. If you can tell me how to get Windows to stop that 3 line program from running in a DOS window then I'm wrong about operating system interference.
                  If you can demonstrate that the code mistimes your CPU when you turn off the cache, run other programs at the same time or realign your memory, then I'm wrong about the philosophy of the code.

                  I really think you have misunderstood how the code works. It waits for the RTC to roll over the seconds, and then the code reads the Time Stamp Counter. This takes less than 1 microsecond. The code then waits for the next RTC rollover and reads the TSC again. Subtract the two TSC readings and you know how many CPU clock cycles passed in 1 sec. The software plays no significant part in the timing. The code avoids nearly all of the potential error, but there is still an uncertainty of a microsecond or two caused by the software and the hardware. Over a 1 second test this is 1 or 2 ppm.
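
To make the description above concrete, here is a minimal Python sketch of the calculation it describes: two TSC readings taken on RTC second boundaries, a subtraction, and a division by the interval, followed by the same Max/Min/Ave/Spread summary the listing prints. It is an illustration only; the function names and the sample counter values are made up, and no hardware is read.

```python
def cpu_hz(tsc_start, tsc_end, seconds):
    """Cycles counted between two RTC second boundaries, divided by the interval."""
    cycles = (tsc_end - tsc_start) % (1 << 64)  # the TSC is a 64-bit counter
    return cycles / seconds

def summarize(samples_hz):
    """Return (min, max, average, spread in ppm) for repeated measurements."""
    lo, hi = min(samples_hz), max(samples_hz)
    avg = sum(samples_hz) / len(samples_hz)
    spread_ppm = (hi - lo) / avg * 1_000_000
    return lo, hi, avg, spread_ppm

# Invented readings: roughly 266.67 million cycles per second over a 10 second test.
hz = cpu_hz(1_000, 2_666_667_670, 10)
print(hz)  # 266666667.0

# Two invented runs 400 Hz apart on a 400 MHz clock give a spread of about 1 ppm.
print(summarize([400_000_000, 400_000_400]))
```

Note that the number of times the wait loop runs appears only as `seconds` in the divisor; the cycle count itself comes entirely from the two counter readings.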


                  <<As I said, it's determined by timing a loop>>

                  No it isn't. The timing is done by comparing 2 hardware timers in the machine, the time stamp counter and the real time clock. The 'loop' you're looking at adds 2 machine instructions to the delay (dec cx, jnz ww). The slowest Pentium around will execute these in under 100 nanoseconds in a program that runs for a minimum of 1 sec, adding less than 0.1 ppm to the timing error. Most of the timing error comes from the instability of the crystals/clock generator circuits in the machine and the timing delay from accessing the slow ISA bus (where the RTC lives).
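
The ppm figures quoted in this thread follow from dividing a fixed timing uncertainty by the length of the test. A minimal Python sketch of that arithmetic, with a hypothetical helper name and the figures taken from the posts above (1 to 2 microseconds of read uncertainty, under 100 nanoseconds of loop overhead):

```python
def error_ppm(uncertainty_seconds, test_seconds):
    """Relative timing error, in parts per million, of a fixed uncertainty."""
    return uncertainty_seconds / test_seconds * 1_000_000

# Figures quoted in the thread, for a 1 second and a 10 second test.
for label, uncertainty in (("read uncertainty, 2 us", 2e-6),
                           ("loop overhead, 100 ns", 100e-9)):
    for test_seconds in (1, 10):
        ppm = error_ppm(uncertainty, test_seconds)
        print(f"{label}: {test_seconds:>2} s test -> {ppm:.3f} ppm")
```

Because the uncertainty is fixed while the interval grows, a longer test shrinks the relative error in proportion, which is consistent with the program's advice to use a test time of 10 seconds or more.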


                  <<A third test, of an IBM ThinkPad>>

                  This is the second machine which has had problems that I'm aware of. Unfortunately, I don't have a ThinkPad to test. I suspect the DPMI.


                  <<The approach taken in the posted code is flawed.>>

                  You keep saying that but you are unable to demonstrate it. I'll admit that it is flawed in that it may not run with every system. RTCs and NMI are implemented differently in some systems and I don't check them. But the method used for the timing is not flawed.


                  Paul.

