#Optimize vs #Speed vs #Neither

  • #Optimize vs #Speed vs #Neither

    Just read the latest newsletter regarding #Optimize vs #Speed. I took a 400k program I am currently working on and decided to try them out (having used neither previously). There was virtually no size difference/change between using either or none at all.

    No complaints, just curiosity.

    Here's a little sample code:

    Code:
    'PBWIN 9.01 - WinApi 05/2008 - XP Pro SP3
    #Compile Exe                                
    #Dim All 
     
    '>>>>>>> Virtually no difference in using the #Directives or not
    '>>>>>>> Rem/Unrem to run test
     
    '$Title = "No Directive"
     
    '#Optimize SPEED
    '$Title = "Speed Directive"
     
    #Optimize Size 
    $Title = "Size Directive"
     
    #Include "WIN32API.INC"
    #Include "COMDLG32.INC"
    #Include "InitCtrl.inc"
     
    Sub Testing_directives
      Local ctr As Long
      Local l As String
      For ctr = 1 To 100000
         l$ = String$(ctr, "k") 
      Next ctr
    End Sub
     
    Function PBMain         
      ErrClear           
      Local tmr As Double
     
      tmr = Timer
       Call Testing_directives      
       ?Using$("Done in #.### secs", Timer - tmr),,$Title
    End Function 
    My Results:
    Speed = 1.612 secs for each or none (identical for each condition)
    Size = 655 bytes for no Directive
    Size = 654 bytes when using either Directive (1 byte smaller)

    ===========================
    He of whom many are afraid
    ought to fear many.
    Sir Francis Bacon
    ===========================

    Later: just for fun I put an #ALIGN 16 in front of the For Ctr loop and the time elapsed plummeted from 1.612 secs to 1.595 secs and Size ballooned from 654 to 741 bytes (for nothing and for Speed). Note #Optimize Size speed stayed at 1.612 though size didn't change. {grin}

    Gee, so many options. What's a guy to do? {sigh}

    =====================================
    "I choose a block of marble
    and chop off whatever I don't need."
    Francois-Auguste Rodin (1840-1917),
    when asked how he managed
    to make his remarkable statues
    =====================================
    Last edited by Gösta H. Lovgren-2; 30 Mar 2009, 08:40 PM. Reason: Fun
    It's a pretty day. I hope you enjoy it.

    Gösta

    JWAM: (Quit Smoking): http://www.SwedesDock.com/smoking
    LDN - A Miracle Drug: http://www.SwedesDock.com/LDN/

  • #2
    > time elapsed plummeted from 1.612 secs to 1.595 secs and Size ballooned from 654 to 741 bytes



    • #3
      From the Docs: If not used, the default is to choose faster code speed.
      From the Gazette: The alternative is to specify #OPTIMIZE SIZE (the default) ...

      I played with #OPTIMIZE when PB9 first came out and I could not tell what the default was either.

      As for ALIGN see the fifth post here.



      • #4
        Certain alignments of instructions cause a degradation in performance.
        The effect is worst when certain instructions straddle cache line boundaries.
        It is more noticeable in repeated, tight loops.
        The recommended way to reduce these effects is to align branch targets with the start of a cache line, which reduces the likelihood of an instruction's performance being degraded in this way. This is done by padding the code with NOPs, which slightly increases the code size.

        The effect of OPTIMIZE SPEED is not to make all the code faster, since a lot of code isn't in tight loops or isn't misaligned to start with, but to reduce the chances of a misalignment slowing your code down unexpectedly.
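
        To put a number on the padding cost described above: moving a branch target up to a 16-byte boundary takes between 0 and 15 filler bytes. The following is only a sketch of that arithmetic, not of what the compiler actually does internally. PadBytes is a hypothetical helper name, the sample addresses are made up, and the listing assumes the console compiler (PB/CC) because it uses PRINT and WAITKEY$.

        ```powerbasic
        #COMPILE EXE
        #DIM ALL

        ' Hypothetical helper: number of filler bytes (NOPs) needed to move
        ' addr up to the next multiple of boundary (0 if already aligned).
        FUNCTION PadBytes (BYVAL addr AS DWORD, BYVAL boundary AS DWORD) AS DWORD
            FUNCTION = (boundary - (addr MOD boundary)) MOD boundary
        END FUNCTION

        FUNCTION PBMAIN () AS LONG
            ' Made-up addresses, for illustration only
            PRINT PadBytes(&H401000, 16)   ' address already on a 16-byte boundary
            PRINT PadBytes(&H401009, 16)   ' address 9 bytes past a boundary
            WAITKEY$
        END FUNCTION
        ```

        Because each aligned target can cost up to 15 bytes, a program with many aligned branch targets grows accordingly, which fits the size increases reported earlier in the thread.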


        I can demonstrate it but the exact effects vary from CPU to CPU so the following code might not perform the same on your system. However, it is good practice on all CPUs.

        The following code allows the alignment of the loop to be adjusted by inserting NOPs.
        As posted, it runs in 4.09 clks per loop on my AthlonXP CPU.
        Uncommenting the NOP slows this to 4.32 clks/loop, 5.6% slower.

        Paul
        Code:
        #OPTIMIZE SIZE    'stop the compiler inserting NOPs and messing up my own alignment
        FUNCTION PBMAIN () AS LONG
               
        #ALIGN 16  'set a fixed alignment which we'll adjust by inserting NOPs
        !nop
        !nop
        !nop
        !nop
        !nop
        !nop
        !nop
        '!nop      'Uncommenting this NOP slows the loop by 5%
        
        sum##=0
        cnt&=1000000000
        
        TIX a&&
        FOR r& = 1 TO cnt&
            sum## += r&
            
        NEXT
        TIX END a&&
        
        PRINT "CPU cycles per loop="; a&&/cnt&
        PRINT sum##
        
        WAITKEY$
        
        END FUNCTION



        • #5
          David,
          > As for ALIGN see the fifth post here.
          See from post 50 onwards here:


          Paul.



          • #6
            Paul

            I ran your code above quite a few times and then

            > '!nop 'Uncommenting this NOP slows the loop by 5%

            and found that on my machine there wasn't a blind bit of difference.

            I had noticed previously when running code by yourself and John Gleason where nops were inserted deliberately that there was no difference in speed when I altered the number of insertions.

            I have an Intel E6700 which I got shortly after its introduction. Perhaps it and the architecture which has followed since simply isn't sensitive to alignment.



            • #7
              David,
              Try varying the number of NOPs from 1 to 16 and see if there is any variation.

              Some architectures are less sensitive to alignment; if yours is one, then just optimize for size.
              Athlons, especially when writing ASM, can vary in performance by over 20% just by adjusting the code alignment.

              Paul.



              • #8
                Paul

                > Try varying the number of NOPs from 1 to 16 and see if there is any variation.

                I am only getting slight variations in the fourth significant figure - so negligible in practical terms.

                Interestingly, the code is nearly four times slower if I use

                Code:
                Local hProcess As Long
                hProcess = GetCurrentProcess()
                SetProcessAffinityMask( hProcess, 1) ' ie CPU 0
                Added: Come to think of it, that just proves that on a multi-core machine, where the cores' time stamps are not in sync, we will get erroneous results with short intervals; i.e. SetProcessAffinityMask is a must for timings using the TSC and QPC. With the mask set I still got no difference when varying the number of NOP insertions.
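
                The point about pinning the process before timing can be sketched as a complete program. This is a minimal illustration, not the code that was actually run for the figures above: it assumes the console compiler (PB/CC, for PRINT and WAITKEY$), a WIN32API.INC that declares GetCurrentProcess and SetProcessAffinityMask, and an arbitrary loop count. The variable names are made up.

                ```powerbasic
                #COMPILE EXE
                #DIM ALL
                #INCLUDE "WIN32API.INC"

                FUNCTION PBMAIN () AS LONG
                    LOCAL hProcess AS DWORD
                    LOCAL cycles   AS QUAD
                    LOCAL r        AS LONG
                    LOCAL cnt      AS LONG
                    LOCAL total    AS EXT

                    ' Pin the whole process to CPU 0 so both time stamp reads
                    ' come from the same core. The API returns 0 on failure.
                    hProcess = GetCurrentProcess()
                    IF SetProcessAffinityMask(hProcess, 1) = 0 THEN
                        PRINT "Could not set the affinity mask; timings may be unreliable"
                    END IF

                    cnt = 100000000   ' arbitrary loop count for illustration

                    TIX cycles        ' start counting CPU cycles
                    FOR r = 1 TO cnt
                        total += r
                    NEXT
                    TIX END cycles    ' cycles now holds the elapsed count

                    PRINT "CPU cycles per loop="; cycles / cnt
                    PRINT total
                    WAITKEY$
                END FUNCTION
                ```

                Setting the mask once at the top of PBMAIN keeps the timed region itself free of API calls.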
                Last edited by David Roberts; 31 Mar 2009, 09:20 AM.



                • #9
                  FYI - I emailed PB support about the discrepancy between the Gazette and Help/Docs. They said the Help/Docs were incorrect. PB uses #OPTIMIZE SIZE by default.
                  Bernard Ertl
                  InterPlan Systems



                  • #10
                    Bern, I just checked using PBwin 9.01 and it defaults to #OPTIMIZE SPEED, i.e. there are !nop's inserted throughout when no #OPTIMIZE is specified. Specifying #..SIZE results in no inserted !nop's. :hmmm:



                    • #11
                      Hmmmm.... "More Taste?" or "Less Filling?"

                      =====================================================
                      One machine can do the work of fifty ordinary men.
                      No machine can do the work of one extraordinary man.
                      Elbert Hubbard (1859-1915)
                      =====================================================
                      Last edited by Gösta H. Lovgren-2; 31 Mar 2009, 07:31 PM.



                      • #12
                        Originally posted by John Gleason
                        Bern, I just checked using PBwin 9.01 and it defaults to #OPTIMIZE SPEED, i.e. there are !nop's inserted throughout when no #OPTIMIZE is specified. Specifying #..SIZE results in no inserted !nop's. :hmmm:
                        I suppose it's best to just be explicit about what you want.
                        Bernard Ertl
                        InterPlan Systems
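
                        Given the conflicting reports about the default, one way to be explicit is to state the directive at the top of each source file. A minimal sketch; the choice of SPEED here is only an example, not a recommendation:

                        ```powerbasic
                        #COMPILE EXE
                        #DIM ALL
                        #OPTIMIZE SPEED   ' or #OPTIMIZE SIZE; stated so the build does not rely on the default

                        FUNCTION PBMAIN () AS LONG
                            ' program code goes here
                        END FUNCTION
                        ```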
