  • #align

    Can anyone give me an example of how to use this tool?
    In his last email, Mr. Zale wrote this:

    " Place #ALIGN
    just before heavily accessed labels and loops (the frequent target of
    jumps), and you'll often find a big difference. "

    but I don't understand these instructions very well. For example:

    #ALIGN 16
    CALL EnableScreen


    #ALIGN 16
    SUB EnableScreen

    code .....


    Which is the correct way to use this tool?

    Antonello Ventullo

  • #2
    You would use it like this:

    'Not a lot of potential gain, since the SUB overhead is quite large:
    #ALIGN 16
    SUB EnableScreen   'it's the routine being called that needs to be aligned.
    code .....
    or this:
    'More potential gain, since the GOSUB overhead is small:
    FOR r& = 1 TO cnt&
        GOSUB MySub
    NEXT
    'code after the loop must not fall through into MySub
    #ALIGN 16
    MySub:		'again, it's the routine being called that needs to be aligned.
    INCR a&
    INCR b&
    RETURN
    But it's of most use in ASM like this:
    !mov eax,a&
    !mov ebx,0
    #align 16
    lp:		'the label being jumped to that needs to be aligned.
    !add ebx,eax
    !dec eax
    !jnz lp
    or this:
    #ALIGN 8
    LookupTable:   'it's important for speed that data is correctly aligned.
    !dd 1,2,3,4,6,8,9,22,1,3,55,3,2,33,6,78


    • #3
      Paul, how do you decide whether to use #ALIGN 16 versus, say, #ALIGN 8 as in your last example?


      • #4
        The usual alignment for code on a current CPU is ALIGN 16 but that might change in future CPUs as it depends on the architecture. Current CPUs tend to decode programs in 16 byte blocks.

        For data, the usual recommendation is to make sure data is aligned on a boundary equal to its size.
        For bytes, that's 1 (i.e. no need to align)
        For Word it's 2
        For DWORDS it's 4
        For QUADs it would be 8.

        EXTs are an odd size, 10 bytes, and are usually aligned on an 8 or 16 byte boundary.
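To make the size-based rule above concrete, here is a minimal sketch of the arithmetic behind it: rounding an address up to the next multiple of a power-of-two boundary. It is written in Python purely for illustration (it is not PowerBASIC and says nothing about how the compiler implements #ALIGN). The names `NATURAL_ALIGNMENT`, `align_up` and `padding_needed` and the sample address are hypothetical, and EXT is assumed to take the 16-byte option mentioned above.

```python
# Illustrative boundaries taken from the list above.
# EXT is assumed to use the 16-byte option (8 would also be valid per the post).
NATURAL_ALIGNMENT = {"BYTE": 1, "WORD": 2, "DWORD": 4, "QUAD": 8, "EXT": 16}


def align_up(address: int, boundary: int) -> int:
    """Round address up to the next multiple of boundary (a power of two)."""
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a positive power of two")
    return (address + boundary - 1) & ~(boundary - 1)


def padding_needed(address: int, boundary: int) -> int:
    """Number of filler bytes required before address reaches the boundary."""
    return align_up(address, boundary) - address


if __name__ == "__main__":
    sample = 0x401003  # hypothetical address, chosen to be misaligned
    for name, boundary in NATURAL_ALIGNMENT.items():
        pad = padding_needed(sample, boundary)
        print(f"{name:5} boundary {boundary:2}: pad {pad} byte(s)")
```

An address that already sits on the boundary needs no padding, which is why byte data never needs aligning.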



        • #5
          Paul, if I'm understanding your posts on this subject (both here and in other threads on the forum), real performance gains will likely only be realized if the loop (or goto/return block) is both small and executed/iterated a lot. The idea is that the processor can load all the (assembled) code in fewer blocks, thus reducing the number of cycles required to process it.

          If I understood correctly, alignment will at best save 1 block fetch. So the time savings will be proportional to the total number of blocks that need to be fetched for the routine. The savings can then be weighed against the number of times the routine is called/executed to get an idea of how useful alignment might be.
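The block-counting argument above can be sketched as a small calculation. This is Python used only for illustration, and it is a deliberately simplified model: it counts how many 16-byte-aligned blocks a piece of code touches and ignores everything else a real CPU does. The function name `blocks_spanned` and the offsets and sizes are hypothetical.

```python
BLOCK = 16  # block size mentioned in this thread; other CPUs may differ


def blocks_spanned(start: int, size: int, block: int = BLOCK) -> int:
    """Count the block-aligned chunks touched by the bytes [start, start + size)."""
    if size <= 0:
        return 0
    first = start // block
    last = (start + size - 1) // block
    return last - first + 1


if __name__ == "__main__":
    for size in (20, 200):  # hypothetical loop sizes in bytes
        misaligned = blocks_spanned(13, size)  # loop starts at offset 13
        aligned = blocks_spanned(16, size)     # same loop moved to a 16-byte boundary
        print(f"{size}-byte loop: {misaligned} blocks misaligned, {aligned} aligned")
```

Under this model, alignment saves one block in both cases, but that is one block out of three for the short loop and one out of fourteen for the long one, which matches the point that the saving matters most for small loops.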

          So, the time savings would be proportionally high for an initialization loop like:

          #ALIGN 16
          FOR I = 1 TO UBOUND( lArray)
          lArray( I) = I
          NEXT

          but if the loop is only executed once (or infrequently), it really isn't worth worrying about. If, however, you have a loop that processes data nested in other loops in a critical section of code, it's probably worth the effort to align it.

          Am I understanding correctly?
          Bernard Ertl
          InterPlan Systems


          • #6
            "real performance gains will likely only be realized if the loop (or goto/return block) is both small and executed/iterated a lot."
            That's the case with any optimisation: if it's a short piece of code that's not run much, you won't notice much difference. A few nanoseconds saved in a typical program go unnoticed, but a few nanoseconds saved in each iteration of, e.g., a fast graphics routine processing 1 million pixels will save a few milliseconds, which might make the difference between achieving the required frame rate or not.

            A real example can be found here:
            User to user discussions about the PB/Win (formerly PB/DLL) product line. Discussion topics include PowerBASIC Forms, PowerGEN and PowerTree for Windows.

            Look at posts 50, 51 and after.
            Once the code had been sped up from 200ms+ to below 10ms, the alignment issue became very significant: just by aligning the code, the processing time dropped from 7.45ms to 6.64ms, a 0.81ms or 12% improvement on an already fast routine.
            However, fiddling with the alignment of the initial BASIC code might have saved the same 0.81ms, but you wouldn't have noticed it in the 200ms+ that the original code took. There it would represent only a fraction of 1% improvement.

            The improvements are more noticeable in short, fast loops because the proportion of time potentially wasted by misaligned jumps is greater in a short loop.
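The proportions quoted in this post can be checked with a few lines of arithmetic. The sketch below is Python, used only for illustration, with the timings copied from the post. The helper `percent` and the variable names are hypothetical, and 200 ms is taken as a lower bound because the post says "200ms+".

```python
def percent(part: float, whole: float) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


before_ms = 7.45     # optimised routine before alignment (from the post)
after_ms = 6.64      # same routine after alignment (from the post)
original_ms = 200.0  # lower bound for the original BASIC code ("200ms+")

saved_ms = before_ms - after_ms

print(f"saved: {saved_ms:.2f} ms")
print(f"relative to the aligned time:   {percent(saved_ms, after_ms):.1f}%")
print(f"relative to the unaligned time: {percent(saved_ms, before_ms):.1f}%")
print(f"relative to the original code:  {percent(saved_ms, original_ms):.1f}%")
```

The 12% figure corresponds to measuring the saving against the faster, aligned time. Measured against the 7.45ms starting point it is closer to 11%, and against the original 200ms+ it is well under 1%.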



            • #7
              #ALIGN 16
              FOR I = 1 TO UBOUND( lArray)
                lArray( I) = I

              Doesn't the compiler do this automatically, even with #OPTIMIZE SPEED?

              Seems very strange since as long as I have known them most of the PB folks are "speed freaks*"

              *no reference to amphetamines is implied or should be inferred.
              Michael Mattias
              Tal Systems (retired)
              Port Washington WI USA


              • #8
                see this:

                Appears that some folks take a lot of convincing that there is anything to be gained from code alignment.
                But now, the OPTIMIZE SPEED option does it for you automatically.



                • #9
                  Thanks for all the info Paul.