Announcement

Collapse
No announcement yet.

Comb sort in ASM

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Comb sort in ASM

    ''' A forum member suggested that I should post this sample
    ''' code here. It is a comb sort, generally faster than a
    ''' bubble sort.


    '''
    TYPE my_type
    my_sortkey AS STRING * 80 '' Key record
    my_stuff AS STRING * 80 '' More data
    more_stuff AS STRING * 240 '' Still more data
    END TYPE

    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

    FUNCTION PBMAIN() AS LONG
    '''
    DIM this_one (0 TO 5000) AS GLOBAL my_type
    DIM stp (0 TO 5000) AS GLOBAL my_type POINTER

    ‘’’ Below: variables used in comb-sort algorithm
    DIM
    chgd AS GLOBAL INTEGER, _ ' Set to 0 (nothing changed
    ' on this iteration), or to 1
    ktr AS GLOBAL INTEGER, _ ' Number of instances,
    ' or items to sort
    ktra AS GLOBAL INTEGER, _ ' Working value of ktr,
    ' gradually diminished
    ng AS GLOBAL INTEGER ' A measure of the separation
    ' between two items to be
    ' compared, initially ktr \ 2,
    ' eventually 1

    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

    ktr = 0

    ' ' ' Read several thousand instances of this_one,
    ' ' ' letting ktr = number of instances. Each instance
    ' ' ' consists of a my_sortkey and two other strings,
    ‘ ‘ ‘ all read in from files, with appropriate editing.
    ' ' ' The reading / editing step is not a bottleneck, and
    ' ' ' is fairly fast.
    ' ' ' No data are placed in instance (0), which is a dummy,
    ' ' ' so instance (ktr) is the last one populated.

    '' Below: Make POINTERs for instances 0 to 5000,
    '' even if ktr < 5000 . . .
    FOR k = 0 To 5000
    stp(k) = VARPTR(this_one(k)) '' Computing POINTERs
    NEXT k

    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

    ktra = ktr '' working value of ktr
    ng = ktr \ 2 '' half of ktr (integer division)

    ' ' ' Below, the CALL passes stp(0), but this will quickly
    ' ' ' be incremented, so stp(1) will be the first instance
    ' ' ' to be considered . . .

    ' ' ' ASM routine to do sorting on my_sortkey . . .
    CALL do_sort (stp(0))

    ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

    ''' Output all results to a properly-formatted file.
    ''' Output is in a loop on k from 1 to ktr, and is
    '[' essentially as follows:
    '''
    ''' @stp (k).my_stuff
    ''' followed by
    ''' @stp (k).more_stuff
    ''' discarding my_sortkey
    '''
    ''' . . . with appropriate headers, blank-padding,
    ''' blank lines, formatting, and paging. It is a
    ''' relatively fast step, not a bottleneck.


    '''''''''''''''''''''''''''''''''''''''''''''''''''''''


    ''' SUB do_sort follows. It is a comb-sort, and works
    ''' perfectly, but is still the slowest step in this
    ''' application.
    '''

    ''''''''''''''''''''''''''''''''''''''''''''''''''''''
    ''''''''''''''''''''''''''''''''''''''''''''''''''''''
    ''''''''''''''''''''''''''''''''''''''''''''''''''''''

    SUB do_sort (BYREF StrucPtr AS DWORD)

    '' Power Basic pushes EBX, ESI, EDI.

    ASM CLD

    Very_top: ' Scan from the top

    ASM MOV chgd, 0 ; Set or reset "changed" flag . . .

    ASM LEA EBX, ktra ; Pointer to ktra
    ASM MOV EAX, [EBX] ; EAX = ktra
    ASM CMP EAX, 1 ; Testing for exit, if ktra has been
    ' decremented to 1
    ASM JE Labexit

    ASM LEA EBX, ng ; Pointer to ng
    ASM SUB EAX, [EBX] ; Subtract ng from ktra.
    ASM MOV ECX, EAX ; This ECX is based on (ktra - ng).
    ' It determines how far down in the
    ' table the testing goes on the current
    ' iteration.

    ASM MOV EAX, [EBX] ; EAX = ng
    ASM SAL EAX, 2 ; Quadrupled
    ASM LEA EBX, disp
    ASM MOV [EBX], EAX ; disp (displacement) in bytes =
    ' 4 * ng between stp( ) pointers
    ' of the two strings to be compared

    ASM MOV EDX, StrucPtr ; Pointer to stp(0). Scan will
    ' start with stp(1). Below this
    ' point, EDX will always = stp ( ).


    ''' Scan will go from the top of the array to a bottom
    ''' which gradually moves up as "heavier" items sink to
    ''' the bottom and "lighter" ones rise to the top. When
    ''' ktra, the working value of the bottom of the set,
    ''' has been decremented all the way to 1 (top of the
    ''' set), sorting is finished. It is also finished if
    ''' an iteration fails to exchange a pair of items,
    ''' provided that at least one bubble-sorting iteration
    ''' has been made.
    '''
    ''' "Very_top" starts a new iteration from the top to the
    ''' current bottom.
    '''
    ''' "Intermediate" starts a comparison between two
    ''' my_sortkey strings and a possible exchange of
    ''' this_one pointers.


    Intermediate:

    ASM PUSH ECX ; Save master counter.
    ASM ADD EDX, 4 ; Pointer to next stp ( ), initially stp(1)

    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

    ''' Compare my_sortkey #1 with my_sortkey #2, where #2 is
    ''' farther down in the this_one( ) array, at least
    ''' according to the stp ( ) pointers. [The instances of
    ''' this_one are never moved. Only their stp( )
    ''' pointers are moved.]

    ASM MOV ECX, 80 ; ECX becomes the counter to compare 80
    ' bytes (length of any my_sortkey).

    ASM MOV ESI, [EDX] ; ESI points to my_sortkey #1.

    ASM LEA EBX, disp
    ASM MOV EAX, [EBX] ; disp in EAX
    ASM ADD EAX, EDX ; EAX = EDX + disp, for my_sortkey #2.
    ASM MOV EDI, [EAX] ; EDI points to my_sortkey #2.

    ASM REPE CMPSB ; Compare strings. ESI and EDI change
    ' in this process. This is "repeat
    ' while equal," so it exits at first
    ' mismatch.

    ASM JB Past_exchange ; "Below" = smaller ASCII value, so
    ' jump to avoid changing order,
    ' else . . .

    '' Exchanging pointers to change order . . .
    ASM MOV ESI, [EDX] ; First, restore original ESI and EDI.
    ASM MOV EDI, [EAX]
    ASM MOV [EDX], EDI ; Then, exchange their locations in the
    ' stp( ) array . . .
    ASM MOV [EAX], ESI

    ASM MOV chgd, 1 ; Set "changed" flag.

    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

    Past_exchange:

    ''' Master counter (the one based on ktra) . . .
    ASM POP ECX

    '' Decrement and check counter, for loop-back to
    '' Intermediate . . .
    ASM DEC ECX
    ASM JNZ Intermediate ; to compare another pair of instances
    ' on current iteration

    '' Current iteration has ended, so narrow the range ng
    ‘’ (unless ng = 1) . . .
    ASM LEA EBX, ng
    ASM MOV EAX, [EBX] ; EAX = [EBX] = ng
    ASM CMP EAX, 1 ; To test for ng = 1

    ASM JE SHORT Keep_ng ; If ng was = 1, keep it, else (below)
    ' scale it down . .

    '' Scaling down ng by 3/4. Scale factor is not an exact
    '' science, and an assunmption must be made. Literature
    '' suggests division by 1.3, with integer rounding, so
    '' this is close . . .

    ASM SAL EAX, 1 ; Double it.
    ASM ADD EAX, [EBX] ; Triple it.
    ASM SAR EAX, 2 ; Integer division by 4,
    ' can go down to 0.

    ASM CMP EAX, 0 ; Testing for EAX = new ng = 0
    ASM JNE SHORT Save_ng

    ASM MOV EAX, 1 ; Force EAX = 1 (to be stored as ng,
    ' below), but only if it was = 0
    ' above. With ng = 1, sort will
    ' become a bubble-sort.
    Save_ng:
    ASM MOV [EBX], EAX ; Save ng, which may or may not = 1.
    ' [EBX] still = ng here, but not after
    ' this point.
    ASM JMP Very_top ; To start new iteration

    Keep_ng:
    ASM LEA EBX, chgd ; Check "changed" flag . . .
    ' Note: No ehecking of chgd unless this
    ' iteration was a bubble-sort, using
    ' ng = 1

    ASM MOV EAX, [EBX]
    ASM SUB EAX, 0 ; To compare chgd with 0

    ASM JZ Labexit ; To Labexit if no change on this
    ' iteration, else . . .

    ASM LEA EBX, ktra
    ASM MOV EAX, [EBX]
    ASM DEC EAX ; Decrement ktra (move bottom upward).
    ASM MOV [EBX], EAX ; Store new ktra.

    ASM JMP Very_top ; To start new iteration


    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

    Labexit:

    '' Nothing to POP

    END SUB '' do_sort
    Last edited by Harry Akers; 23 Jul 2009, 05:10 PM. Reason: It gave me smiley faces where I wanted something else Also, problems with and spacing.

  • #2
    Harry, I got it to compile (PB9) but I haven't debugged it yet. It has a page fault somewhere at this time.

    Note: You can solve those formatting problems using [CODE ][/CODE ] tags (without the spaces) around your code.
    Code:
       #DIM ALL
       #REGISTER NONE
       ' Comb sort in ASM
       ''' A forum member suggested that I should post this sample
       ''' code here. It is a comb sort, generally faster than a
       ''' bubble sort.
    
    
       '''
       TYPE my_type
       my_sortkey AS STRING * 80 '' Key record
       my_stuff AS STRING * 80 '' More data
       more_stuff AS STRING * 240 '' Still more data
       END TYPE
    
       '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''''''''''''''''''
    
       FUNCTION PBMAIN() AS LONG
       '''
       DIM this_one (0 TO 5000) AS GLOBAL my_type
       DIM stp (0 TO 5000) AS GLOBAL my_type POINTER
       LOCAL k AS LONG
    
       '’’ Below: variables used in comb-sort algorithm
       DIM   chgd AS GLOBAL INTEGER, _ ' Set to 0 (nothing changed on this iteration), or to 1
       ktr AS GLOBAL INTEGER, _ ' Number of instances, or items to sort
       ktra AS GLOBAL INTEGER, _ ' Working value of ktr gradually diminished
       ng AS GLOBAL INTEGER ' A measure of the separation
       ' between two items to be
       ' compared, initially ktr \ 2,
       ' eventually 1
    
       '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''''''''''''''''''
    
       ktr = 0
    
       ' ' ' Read several thousand instances of this_one,
       ' ' ' letting ktr = number of instances. Each instance
       ' ' ' consists of a my_sortkey and two other strings,
       '‘ ‘ ‘ all read in from files, with appropriate editing.
       ' ' ' The reading / editing step is not a bottleneck, and
       ' ' ' is fairly fast.
       ' ' ' No data are placed in instance (0), which is a dummy,
       ' ' ' so instance (ktr) is the last one populated.
    
       '' Below: Make POINTERs for instances 0 to 5000,
       '' even if ktr < 5000 . . .
       FOR k = 0 TO 5000
       stp(k) = VARPTR(this_one(k)) '' Computing POINTERs
       NEXT k
    
       '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''
    
       ktra = ktr '' working value of ktr
       ng = ktr \ 2 '' half of ktr (integer division)
    
       ' ' ' Below, the CALL passes stp(0), but this will quickly
       ' ' ' be incremented, so stp(1) will be the first instance
       ' ' ' to be considered . . .
    
       ' ' ' ASM routine to do sorting on my_sortkey . . .
       CALL do_sort (stp(0))
    
       '''''''''''''''''''''''''''''''''''''''''''''''''' ''''''''
    
       ''' Output all results to a properly-formatted file.
       ''' Output is in a loop on k from 1 to ktr, and is
       '[' essentially as follows:
       '''
       ''' @stp (k).my_stuff
       ''' followed by
       ''' @stp (k).more_stuff
       ''' discarding my_sortkey
       '''
       ''' . . . with appropriate headers, blank-padding,
       ''' blank lines, formatting, and paging. It is a
       ''' relatively fast step, not a bottleneck.
    
       END FUNCTION
       '''''''''''''''''''''''''''''''''''''''''''''''''' '''''
    
    
       ''' SUB do_sort follows. It is a comb-sort, and works
       ''' perfectly, but is still the slowest step in this
       ''' application.
       '''
    
       '''''''''''''''''''''''''''''''''''''''''''''''''' ''''
       '''''''''''''''''''''''''''''''''''''''''''''''''' ''''
       '''''''''''''''''''''''''''''''''''''''''''''''''' ''''
    
       SUB do_sort (BYREF StrucPtr AS DWORD)
    
       '' Power Basic pushes EBX, ESI, EDI.
       LOCAL disp AS LONG
       ASM CLD
    
       Very_top: ' Scan from the top
    
       ASM MOV chgd, 0 ; Set or reset "changed" flag . . .
    
       ASM LEA EBX, ktra ; Pointer to ktra
       ASM MOV EAX, [EBX] ; EAX = ktra
       ASM CMP EAX, 1 ; Testing for exit, if ktra has been
       ' decremented to 1
       ASM JE Labexit
    
       ASM LEA EBX, ng ; Pointer to ng
       ASM SUB EAX, [EBX] ; Subtract ng from ktra.
       ASM MOV ECX, EAX ; This ECX is based on (ktra - ng).
       ' It determines how far down in the
       ' table the testing goes on the current
       ' iteration.
    
       ASM MOV EAX, [EBX] ; EAX = ng
       ASM SAL EAX, 2 ; Quadrupled
       ASM LEA EBX, disp
       ASM MOV [EBX], EAX ; disp (displacement) in bytes =
       ' 4 * ng between stp( ) pointers
       ' of the two strings to be compared
    
       ASM MOV EDX, StrucPtr ; Pointer to stp(0). Scan will
       ' start with stp(1). Below this
       ' point, EDX will always = stp ( ).
    
    
       ''' Scan will go from the top of the array to a bottom
       ''' which gradually moves up as "heavier" items sink to
       ''' the bottom and "lighter" ones rise to the top. When
       ''' ktra, the working value of the bottom of the set,
       ''' has been decremented all the way to 1 (top of the
       ''' set), sorting is finished. It is also finished if
       ''' an iteration fails to exchange a pair of items,
       ''' provided that at least one bubble-sorting iteration
       ''' has been made.
       '''
       ''' "Very_top" starts a new iteration from the top to the
       ''' current bottom.
       '''
       ''' "Intermediate" starts a comparison between two
       ''' my_sortkey strings and a possible exchange of
       ''' this_one pointers.
    
    
       Intermediate:
    
       ASM PUSH ECX ; Save master counter.
       ASM ADD EDX, 4 ; Pointer to next stp ( ), initially stp(1)
    
       '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''
    
       ''' Compare my_sortkey #1 with my_sortkey #2, where #2 is
       ''' farther down in the this_one( ) array, at least
       ''' according to the stp ( ) pointers. [The instances of
       ''' this_one are never moved. Only their stp( )
       ''' pointers are moved.]
    
       ASM MOV ECX, 80 ; ECX becomes the counter to compare 80
       ' bytes (length of any my_sortkey).
    
       ASM MOV ESI, [EDX] ; ESI points to my_sortkey #1.
    
       ASM LEA EBX, disp
       ASM MOV EAX, [EBX] ; disp in EAX
       ASM ADD EAX, EDX ; EAX = EDX + disp, for my_sortkey #2.
       ASM MOV EDI, [EAX] ; EDI points to my_sortkey #2.
    
       ASM REPE CMPSB ; Compare strings. ESI and EDI change
       ' in this process. This is "repeat
       ' while equal," so it exits at first
       ' mismatch.
    
       ASM JB Past_exchange ; "Below" = smaller ASCII value, so
       ' jump to avoid changing order,
       ' else . . .
    
       '' Exchanging pointers to change order . . .
       ASM MOV ESI, [EDX] ; First, restore original ESI and EDI.
       ASM MOV EDI, [EAX]
       ASM MOV [EDX], EDI ; Then, exchange their locations in the
       ' stp( ) array . . .
       ASM MOV [EAX], ESI
    
       ASM MOV chgd, 1 ; Set "changed" flag.
    
       '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''
    
       Past_exchange:
    
       ''' Master counter (the one based on ktra) . . .
       ASM POP ECX
    
       '' Decrement and check counter, for loop-back to
       '' Intermediate . . .
       ASM DEC ECX
       ASM JNZ Intermediate ; to compare another pair of instances
       ' on current iteration
    
       '' Current iteration has ended, so narrow the range ng
       '‘’ (unless ng = 1) . . .
       ASM LEA EBX, ng
       ASM MOV EAX, [EBX] ; EAX = [EBX] = ng
       ASM CMP EAX, 1 ; To test for ng = 1
    
       ASM JE SHORT Keep_ng ; If ng was = 1, keep it, else (below)
       ' scale it down . .
    
       '' Scaling down ng by 3/4. Scale factor is not an exact
       '' science, and an assunmption must be made. Literature
       '' suggests division by 1.3, with integer rounding, so
       '' this is close . . .
    
       ASM SAL EAX, 1 ; Double it.
       ASM ADD EAX, [EBX] ; Triple it.
       ASM SAR EAX, 2 ; Integer division by 4,
       ' can go down to 0.
    
       ASM CMP EAX, 0 ; Testing for EAX = new ng = 0
       ASM JNE SHORT Save_ng
    
       ASM MOV EAX, 1 ; Force EAX = 1 (to be stored as ng,
       ' below), but only if it was = 0
       ' above. With ng = 1, sort will
       ' become a bubble-sort.
       Save_ng:
       ASM MOV [EBX], EAX ; Save ng, which may or may not = 1.
       ' [EBX] still = ng here, but not after
       ' this point.
       ASM JMP Very_top ; To start new iteration
    
       Keep_ng:
       ASM LEA EBX, chgd ; Check "changed" flag . . .
       ' Note: No ehecking of chgd unless this
       ' iteration was a bubble-sort, using
       ' ng = 1
    
       ASM MOV EAX, [EBX]
       ASM SUB EAX, 0 ; To compare chgd with 0
    
       ASM JZ Labexit ; To Labexit if no change on this
       ' iteration, else . . .
    
       ASM LEA EBX, ktra
       ASM MOV EAX, [EBX]
       ASM DEC EAX ; Decrement ktra (move bottom upward).
       ASM MOV [EBX], EAX ; Store new ktra.
    
       ASM JMP Very_top ; To start new iteration
    
    
       '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''''
    
       Labexit:
    
       '' Nothing to POP
    
       END SUB '' do_sort

    Comment


    • #3
      Harry,
      you aren't using the available opcodes efectively in your program, in particular the addressing modes available to the CPU.
      Throughout your code you do things like this:
      Code:
      ASM LEA EBX, ktra ; Pointer to ktra
      ASM MOV EAX, [EBX] ; EAX = ktra
      When the same can be acheived in a single instruction like this:
      Code:
      !mov eax,ktra   'move the value in ktra into eax register
      There is no need to load the address into a register first, just access the variable directly and its address is encoded into the instruction.



      You also have this:
      Code:
      ASM LEA EBX, disp
      ASM MOV EAX, [EBX] ; disp in EAX
      ASM ADD EAX, EDX   ; EAX = EDX + disp, for my_sortkey #2.
      ASM MOV EDI, [EAX] ; EDI points to my_sortkey #2.
      Which could be written
      Code:
      !mov eax,disp
      !mov edi,[eax+edx]


      Code:
      Keep_ng:
      ASM LEA EBX, chgd ; Check "changed" flag . . .
      ' Note: No ehecking of chgd unless this
      ' iteration was a bubble-sort, using
      ' ng = 1
      
      ASM MOV EAX, [EBX]
      ASM SUB EAX, 0 ; To compare chgd with 0
      
      ASM JZ Labexit ; To Labexit if no change on this
      Could be written:
      Code:
      Keep_ng:
      !cmp chgd,0    'compare chgd to zero
      !je Labexit    'jump if equal to Labexit


      and this:
      Code:
      ASM LEA EBX, ktra
      ASM MOV EAX, [EBX]
      ASM DEC EAX ; Decrement ktra (move bottom upward).
      ASM MOV [EBX], EAX ; Store new ktra.
      Could be:
      Code:
      !dec ktra    'decrement ktra
      I'm sure there are more.

      Also,
      Code:
      REPE CMPSB
      is easy to program but is usually slower than coding the search explicitly using a number of simpler opcodes and, since you're scanning for =, you could scan 4 bytes at a time instead of 1 and it would be a lot quicker.


      Paul.

      Comment


      • #4
        I'll try your suggestions.

        Comment


        • #5
          Definitely note all the changes Paul mentioned above, and for my sortkeys below in particular, the one change from byte to dword made a huge--over 150x--difference in speed. That's an extreme case, but does demo what a small optimization can do.
          Code:
          ASM REPE CMPSD ; Compare strings. ESI and EDI change
          Code:
             #DIM ALL
             #REGISTER NONE
             ' Comb sort in ASM
             ''' A forum member suggested that I should post this sample
             ''' code here. It is a comb sort, generally faster than a
             ''' bubble sort.
          
          
             '''
             TYPE my_type
             my_sortkey AS STRING * 80 '' Key record
             my_stuff AS STRING * 80 '' More data
             more_stuff AS STRING * 240 '' Still more data
             END TYPE
          
             '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''''''''''''''''''
          
             FUNCTION PBMAIN() AS LONG
             '''
             DIM this_one (0 TO 5000) AS GLOBAL my_type
             DIM stp (0 TO 5000) AS GLOBAL my_type POINTER
             LOCAL k AS LONG, outPtr AS STRING PTR * 80, outStr AS STRING, t AS QUAD
             RANDOMIZE
             '’’ Below: variables used in comb-sort algorithm
             DIM   chgd AS GLOBAL INTEGER, _ ' Set to 0 (nothing changed on this iteration), or to 1
             ktr AS GLOBAL INTEGER, _ ' Number of instances, or items to sort
             ktra AS GLOBAL INTEGER, _ ' Working value of ktr gradually diminished
             ng AS GLOBAL INTEGER ' A measure of the separation
             ' between two items to be
             ' compared, initially ktr \ 2,
             ' eventually 1
          
             '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''''''''''''''''''
          
             ktr = 0
             
             FOR k = 1 TO 5000          'fill our records with data
                this_one(k).my_sortkey = STRING$(40, RND(33, 122)) & STRING$(40, RND(33, 122))   'printable common chrs
                this_one(k).my_stuff = STRING$(40, RND(33, 122)) & STRING$(40, RND(33, 122))   'printable common chrs
                this_one(k).more_stuff = STRING$(120, RND(33, 122)) & STRING$(120, RND(33, 122))   'printable common chrs
             NEXT
             ktr = 5000
          
             ' ' ' Read several thousand instances of this_one,
             ' ' ' letting ktr = number of instances. Each instance
             ' ' ' consists of a my_sortkey and two other strings,
             '‘ ‘ ‘ all read in from files, with appropriate editing.
             ' ' ' The reading / editing step is not a bottleneck, and
             ' ' ' is fairly fast.
             ' ' ' No data are placed in instance (0), which is a dummy,
             ' ' ' so instance (ktr) is the last one populated.
          
             '' Below: Make POINTERs for instances 0 to 5000,
             '' even if ktr < 5000 . . .
             FOR k = 0 TO 5000
             stp(k) = VARPTR(this_one(k)) '' Computing POINTERs
             NEXT k
          
             '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''
          
             ktra = ktr '' working value of ktr
             ng = ktr \ 2 '' half of ktr (integer division)
          
             ' ' ' Below, the CALL passes stp(0), but this will quickly
             ' ' ' be incremented, so stp(1) will be the first instance
             ' ' ' to be considered . . .
          
             ' ' ' ASM routine to do sorting on my_sortkey . . .
          '     Do sort now
             TIX t
              CALL do_sort (stp(0))
              
              'try it in PB too
          '  array sort this_one()
             TIX END t
             'for PB sort, use this outStr
          '   FOR k = 1 TO 30
          '      outStr = outStr & this_one(k).my_sortkey & $CRLF
          '   NEXT
             '''''''''''''''''''''''''''''''''''''''''''''''''' ''''''''
             FOR k = 1 TO 30
                outPtr = stp(k)
                outStr = outStr & @outPtr & $CRLF
             NEXT
             
             ? FORMAT$(t, "0,") & " tix" & $CRLF & outStr
                
             ''' Output all results to a properly-formatted file.
             ''' Output is in a loop on k from 1 to ktr, and is
             '[' essentially as follows:
             '''
             ''' @stp (k).my_stuff
             ''' followed by
             ''' @stp (k).more_stuff
             ''' discarding my_sortkey
             '''
             ''' . . . with appropriate headers, blank-padding,
             ''' blank lines, formatting, and paging. It is a
             ''' relatively fast step, not a bottleneck.
          
             END FUNCTION
             '''''''''''''''''''''''''''''''''''''''''''''''''' '''''
          
          
             ''' SUB do_sort follows. It is a comb-sort, and works
             ''' perfectly, but is still the slowest step in this
             ''' application.
             '''
          
             '''''''''''''''''''''''''''''''''''''''''''''''''' ''''
             '''''''''''''''''''''''''''''''''''''''''''''''''' ''''
             '''''''''''''''''''''''''''''''''''''''''''''''''' ''''
          
             SUB do_sort (BYREF StrucPtr AS DWORD)
          
             '' Power Basic pushes EBX, ESI, EDI.
             LOCAL disp AS LONG
             ASM CLD
          
             Very_top: ' Scan from the top
          
             ASM MOV chgd, 0 ; Set or reset "changed" flag . . .
          
             ASM LEA EBX, ktra ; Pointer to ktra
             ASM MOV EAX, [EBX] ; EAX = ktra
             ASM CMP EAX, 1 ; Testing for exit, if ktra has been decremented to 1
             ASM JE Labexit
          
             ASM LEA EBX, ng ; Pointer to ng
             ASM SUB EAX, [EBX] ; Subtract ng from ktra.
             ASM MOV ECX, EAX ; This ECX is based on (ktra - ng).
             ' It determines how far down in the
             ' table the testing goes on the current
             ' iteration.
          
             ASM MOV EAX, [EBX] ; EAX = ng
             ASM SAL EAX, 2 ; Quadrupled
             ASM LEA EBX, disp
             ASM MOV [EBX], EAX ; disp (displacement) in bytes =
             ' 4 * ng between stp( ) pointers
             ' of the two strings to be compared
          
             ASM MOV EDX, StrucPtr ; Pointer to stp(0). Scan will
             ' start with stp(1). Below this
             ' point, EDX will always = stp ( ).
          
          
             ''' Scan will go from the top of the array to a bottom
             ''' which gradually moves up as "heavier" items sink to
             ''' the bottom and "lighter" ones rise to the top. When
             ''' ktra, the working value of the bottom of the set,
             ''' has been decremented all the way to 1 (top of the
             ''' set), sorting is finished. It is also finished if
             ''' an iteration fails to exchange a pair of items,
             ''' provided that at least one bubble-sorting iteration
             ''' has been made.
             '''
             ''' "Very_top" starts a new iteration from the top to the
             ''' current bottom.
             '''
             ''' "Intermediate" starts a comparison between two
             ''' my_sortkey strings and a possible exchange of
             ''' this_one pointers.
          
          
             Intermediate:
          
             ASM PUSH ECX ; Save master counter.
             ASM ADD EDX, 4 ; Pointer to next stp ( ), initially stp(1)
          
             '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''
          
             ''' Compare my_sortkey #1 with my_sortkey #2, where #2 is
             ''' farther down in the this_one( ) array, at least
             ''' according to the stp ( ) pointers. [The instances of
             ''' this_one are never moved. Only their stp( )
             ''' pointers are moved.]
          
             ASM MOV ECX, 80 ; ECX becomes the counter to compare 80
             ' bytes (length of any my_sortkey).
          
             ASM MOV ESI, [EDX] ; ESI points to my_sortkey #1.
          
             ASM LEA EBX, disp
             ASM MOV EAX, [EBX] ; disp in EAX
             ASM ADD EAX, EDX ; EAX = EDX + disp, for my_sortkey #2.
             ASM MOV EDI, [EAX] ; EDI points to my_sortkey #2.
          
             ASM REPE CMPSD ; Compare strings. ESI and EDI change
             ' in this process. This is "repeat
             ' while equal," so it exits at first
             ' mismatch.
          
             ASM JB Past_exchange ; "Below" = smaller ASCII value, so
             ' jump to avoid changing order,
             ' else . . .
          
             '' Exchanging pointers to change order . . .
             ASM MOV ESI, [EDX] ; First, restore original ESI and EDI.
             ASM MOV EDI, [EAX]
             ASM MOV [EDX], EDI ; Then, exchange their locations in the
             ' stp( ) array . . .
             ASM MOV [EAX], ESI
          
             ASM MOV chgd, 1 ; Set "changed" flag.
          
             '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''
          
             Past_exchange:
          
             ''' Master counter (the one based on ktra) . . .
             ASM POP ECX
          
             '' Decrement and check counter, for loop-back to
             '' Intermediate . . .
             ASM DEC ECX
             ASM JNZ Intermediate ; to compare another pair of instances
             ' on current iteration
          
             '' Current iteration has ended, so narrow the range ng
             '‘’ (unless ng = 1) . . .
             ASM LEA EBX, ng
             ASM MOV EAX, [EBX] ; EAX = [EBX] = ng
             ASM CMP EAX, 1 ; To test for ng = 1
          
             ASM JE SHORT Keep_ng ; If ng was = 1, keep it, else (below)
             ' scale it down . .
          
             '' Scaling down ng by 3/4. Scale factor is not an exact
             '' science, and an assunmption must be made. Literature
             '' suggests division by 1.3, with integer rounding, so
             '' this is close . . .
          
             ASM SAL EAX, 1 ; Double it.
             ASM ADD EAX, [EBX] ; Triple it.
             ASM SAR EAX, 2 ; Integer division by 4,
             ' can go down to 0.
          
             ASM CMP EAX, 0 ; Testing for EAX = new ng = 0
             ASM JNE SHORT Save_ng
          
             ASM MOV EAX, 1 ; Force EAX = 1 (to be stored as ng,
             ' below), but only if it was = 0
             ' above. With ng = 1, sort will
             ' become a bubble-sort.
             Save_ng:
             ASM MOV [EBX], EAX ; Save ng, which may or may not = 1.
             ' [EBX] still = ng here, but not after
             ' this point.
             ASM JMP Very_top ; To start new iteration
          
             Keep_ng:
             ASM LEA EBX, chgd ; Check "changed" flag . . .
             ' Note: No ehecking of chgd unless this
             ' iteration was a bubble-sort, using
             ' ng = 1
          
             ASM MOV EAX, [EBX]
             ASM SUB EAX, 0 ; To compare chgd with 0
          
             ASM JZ Labexit ; To Labexit if no change on this
             ' iteration, else . . .
          
             ASM LEA EBX, ktra
             ASM MOV EAX, [EBX]
             ASM DEC EAX ; Decrement ktra (move bottom upward).
             ASM MOV [EBX], EAX ; Store new ktra.
          
             ASM JMP Very_top ; To start new iteration
          
          
             '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''''
          
             Labexit:
          
             '' Nothing to POP
          
             END SUB '' do_sort

          Comment


          • #6
            Here it is with several optimizations, but by far the main one is replacing REPE CMPSB with our own compare loop, which more than doubled the speed. It could be improved to nearly 3x by comparing 4 bytes at a time.

            Code:
               #DIM ALL
               #REGISTER NONE
               ' Comb sort in ASM
               ''' A forum member suggested that I should post this sample
               ''' code here. It is a comb sort, generally faster than a
               ''' bubble sort.
            
            
               '''
               TYPE my_type
               my_sortkey AS STRING * 80 '' Key record
               my_stuff AS STRING * 80 '' More data
               more_stuff AS STRING * 240 '' Still more data
               END TYPE
            
               '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''''''''''''''''''
            
               FUNCTION PBMAIN() AS LONG
               '''
               DIM this_one (0 TO 5000) AS GLOBAL my_type
               DIM stp (0 TO 5000) AS GLOBAL my_type POINTER
               LOCAL k AS LONG, outPtr AS STRING PTR * 80, outStr AS STRING, t AS QUAD
            '   RANDOMIZE
               '’’ Below: variables used in comb-sort algorithm
               DIM   chgd AS GLOBAL LONG, _ ' Set to 0 (nothing changed on this iteration), or to 1
               ktr AS GLOBAL LONG, _ ' Number of instances, or items to sort
               ktra AS GLOBAL LONG, _ ' Working value of ktr gradually diminished
               ng AS GLOBAL LONG ' A measure of the separation
               ' between two items to be
               ' compared, initially ktr \ 2,
               ' eventually 1
            
               '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''''''''''''''''''
            
               ktr = 0
            
               FOR k = 1 TO 5000          'fill our records with data
                  this_one(k).my_sortkey = CHR$(RND(33, 122),RND(33, 122),RND(33, 122),RND(33, 122),RND(33, 122),RND(33, 122),RND(33, 122),RND(33, 122)) & STRING$(72, RND(33, 122))   'printable common chrs
                  this_one(k).my_stuff = STRING$(40, RND(33, 122)) & STRING$(40, RND(33, 122))   'printable common chrs
                  this_one(k).more_stuff = STRING$(120, RND(33, 122)) & STRING$(120, RND(33, 122))   'printable common chrs
               NEXT
               ktr = 5000
            
               ' ' ' Read several thousand instances of this_one,
               ' ' ' letting ktr = number of instances. Each instance
               ' ' ' consists of a my_sortkey and two other strings,
               '‘ ‘ ‘ all read in from files, with appropriate editing.
               ' ' ' The reading / editing step is not a bottleneck, and
               ' ' ' is fairly fast.
               ' ' ' No data are placed in instance (0), which is a dummy,
               ' ' ' so instance (ktr) is the last one populated.
            
               '' Below: Make POINTERs for instances 0 to 5000,
               '' even if ktr < 5000 . . .
               FOR k = 0 TO 5000
               stp(k) = VARPTR(this_one(k)) '' Computing POINTERs
               NEXT k
            
               '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''
            
               ktra = ktr '' working value of ktr
               ng = ktr \ 2 '' half of ktr (integer division)
            
               ' ' ' Below, the CALL passes stp(0), but this will quickly
               ' ' ' be incremented, so stp(1) will be the first instance
               ' ' ' to be considered . . .
            
               ' ' ' ASM routine to do sorting on my_sortkey . . .
            '     Do sort now
               TIX t
                CALL do_sort (stp(0))
            
                'try it in PB too
            '  ARRAY SORT this_one()
               TIX END t
               'for PB sort, use this outStr
            '   FOR k = 1 TO 30
            '      outStr = outStr & this_one(k).my_sortkey & $CRLF
            '   NEXT
               '''''''''''''''''''''''''''''''''''''''''''''''''' ''''''''
               FOR k = 1 TO 30
                  outPtr = stp(k)
                  outStr = outStr & @outPtr & $CRLF
               NEXT
            
               ? FORMAT$(t, "0,") & " tix" & $CRLF & outStr
            
               ''' Output all results to a properly-formatted file.
               ''' Output is in a loop on k from 1 to ktr, and is
               '[' essentially as follows:
               '''
               ''' @stp (k).my_stuff
               ''' followed by
               ''' @stp (k).more_stuff
               ''' discarding my_sortkey
               '''
               ''' . . . with appropriate headers, blank-padding,
               ''' blank lines, formatting, and paging. It is a
               ''' relatively fast step, not a bottleneck.
            
               END FUNCTION
               '''''''''''''''''''''''''''''''''''''''''''''''''' '''''
            
            
               ''' SUB do_sort follows. It is a comb-sort, and works
               ''' perfectly, but is still the slowest step in this
               ''' application.
               '''
            
               '''''''''''''''''''''''''''''''''''''''''''''''''' ''''
               '''''''''''''''''''''''''''''''''''''''''''''''''' ''''
               '''''''''''''''''''''''''''''''''''''''''''''''''' ''''
            
               SUB do_sort (BYREF StrucPtr AS DWORD)
            
               '' Power Basic pushes EBX, ESI, EDI.
               LOCAL disp AS LONG
               ASM CLD
            
               Very_top: ' Scan from the top
            
               ASM MOV chgd, 0 ; Set or reset "changed" flag . . .
            
            '   ASM LEA EBX, ktra ; Pointer to ktra
            '   ASM MOV EAX, [EBX] ; EAX = ktra
               !mov eax, ktra
               ASM CMP EAX, 1 ; Testing for exit, if ktra has been decremented to 1
               ASM JE Labexit
            
               !mov ebx, ng       'ng in ebx
            '   ASM LEA EBX, ng ; Pointer to ng
               !sub eax, ebx      ; Subtract ng from ktra.
            '   ASM SUB EAX, [EBX] ; Subtract ng from ktra.
               ASM MOV ECX, EAX ; This ECX is based on (ktra - ng).
               ' It determines how far down in the
               ' table the testing goes on the current
               ' iteration.
            
            '   !mov eax, ebx      ; EAX = ng
               !lea eax, [ebx*4]
            '   ASM MOV EAX, [EBX] ; EAX = ng
            '   ASM SAL EAX, 2 ; Quadrupled
            '   ASM LEA EBX, disp
            '   ASM MOV [EBX], EAX ; disp (displacement) in bytes =
               !mov disp, eax
               ' 4 * ng between stp( ) pointers
               ' of the two strings to be compared
            
               ASM MOV EDX, StrucPtr ; Pointer to stp(0). Scan will
               ' start with stp(1). Below this
               ' point, EDX will always = stp ( ).
            
            
               ''' Scan will go from the top of the array to a bottom
               ''' which gradually moves up as "heavier" items sink to
               ''' the bottom and "lighter" ones rise to the top. When
               ''' ktra, the working value of the bottom of the set,
               ''' has been decremented all the way to 1 (top of the
               ''' set), sorting is finished. It is also finished if
               ''' an iteration fails to exchange a pair of items,
               ''' provided that at least one bubble-sorting iteration
               ''' has been made.
               '''
               ''' "Very_top" starts a new iteration from the top to the
               ''' current bottom.
               '''
               ''' "Intermediate" starts a comparison between two
               ''' my_sortkey strings and a possible exchange of
               ''' this_one pointers.
            
            
               Intermediate:
            
               ASM PUSH ECX ; Save master counter.
               ASM ADD EDX, 4 ; Pointer to next stp ( ), initially stp(1)
            
               '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''
            
               ''' Compare my_sortkey #1 with my_sortkey #2, where #2 is
               ''' farther down in the this_one( ) array, at least
               ''' according to the stp ( ) pointers. [The instances of
               ''' this_one are never moved. Only their stp( )
               ''' pointers are moved.]
            
               ASM MOV ECX,  0 ; ECX becomes the counter to compare to 80. Change to 80 if using REPE CMPSB below
               ' bytes (length of any my_sortkey).
            
               ASM MOV ESI, [EDX] ; ESI points to my_sortkey #1.
            
               !mov eax, disp      ; disp in EAX
            '   ASM LEA EBX, disp
            '   ASM MOV EAX, [EBX] ; disp in EAX
                ASM ADD EAX, EDX ; EAX = EDX + disp, for my_sortkey #2.
                ASM MOV EDI, [EAX] ; EDI points to my_sortkey #2.
            
              nextCmp:                         'compare strings with our own code--over 2x faster than REPE CMPSB
               !movzx ebx, byte ptr[edi+ecx]
               !cmp byte ptr[esi+ecx], bl
            '  !mov ebx, [edi+ecx]             'if you customize my_sortkey, you can compare 4 bytes at a time for about 3x faster than REPE CMPSB
            '  !cmp [esi+ecx], ebx
               !jne short exitCmp
               !add ecx, 1                     ;1 here would change to 4 if comparing 4 bytes at a time
               !cmp ecx, 80
               !jb short nextCmp
              exitCmp:
            
            ' ASM REPE CMPSB ; Compare strings. ESI and EDI change
               ' in this process. This is "repeat
               ' while equal," so it exits at first
               ' mismatch.
            
               ASM JB Past_exchange ; "Below" = smaller ASCII value, so
               ' jump to avoid changing order,
               ' else . . .
            
               '' Exchanging pointers to change order . . .
               ASM MOV ESI, [EDX] ; First, restore original ESI and EDI.
               ASM MOV EDI, [EAX]
               ASM MOV [EDX], EDI ; Then, exchange their locations in the
               ' stp( ) array . . .
               ASM MOV [EAX], ESI
            
               ASM MOV chgd, 1 ; Set "changed" flag.
            
               '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''
            
               Past_exchange:
            
               ''' Master counter (the one based on ktra) . . .
               ASM POP ECX
            
               '' Decrement and check counter, for loop-back to
               '' Intermediate . . .
               ASM DEC ECX
               ASM JNZ Intermediate ; to compare another pair of instances
               ' on current iteration
            
               '' Current iteration has ended, so narrow the range ng
               '‘’ (unless ng = 1) . . .
               !mov eax, ng
            '  ASM LEA EBX, ng
            '  ASM MOV EAX, [EBX] ; EAX = [EBX] = ng
               ASM CMP EAX, 1 ; To test for ng = 1
            
               ASM JE SHORT Keep_ng ; If ng was = 1, keep it, else (below)
               ' scale it down . .
            
               '' Scaling down ng by 3/4. Scale factor is not an exact
               '' science, and an assunmption must be made. Literature
               '' suggests division by 1.3, with integer rounding, so
               '' this is close . . .
            
               ASM SAL EAX, 1 ; Double it.
               !add eax, ng
            '   ASM ADD EAX, [EBX] ; Triple it.
               ASM SAR EAX, 2 ; Integer division by 4,
               ' can go down to 0.
            
               ASM CMP EAX, 0 ; Testing for EAX = new ng = 0
               ASM JNE SHORT Save_ng
            
               ASM MOV EAX, 1 ; Force EAX = 1 (to be stored as ng,
               ' below), but only if it was = 0
               ' above. With ng = 1, sort will
               ' become a bubble-sort.
               Save_ng:
               !mov ng, eax
            '   ASM MOV [EBX], EAX ; Save ng, which may or may not = 1.
               ' [EBX] still = ng here, but not after
               ' this point.
               ASM JMP Very_top ; To start new iteration
            
               Keep_ng:
               ASM LEA EBX, chgd ; Check "changed" flag . . .
               ' Note: No ehecking of chgd unless this
               ' iteration was a bubble-sort, using
               ' ng = 1
            
               ASM MOV EAX, [EBX]
               ASM SUB EAX, 0 ; To compare chgd with 0
            
               ASM JZ Labexit ; To Labexit if no change on this
               ' iteration, else . . .
            
               ASM LEA EBX, ktra
               ASM MOV EAX, [EBX]
               ASM DEC EAX ; Decrement ktra (move bottom upward).
               ASM MOV [EBX], EAX ; Store new ktra.
            
               ASM JMP Very_top ; To start new iteration
            
            
               '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''''
            
               Labexit:
            
               '' Nothing to POP
            
               END SUB '' do_sort

            Comment


            • #7
              Comb sort in ASM

              To Paul Dixon: Thanks for the tips about things like !mov eax, disp. I had tried this in a few places, but it wouldn't compile. The reason was that I had declared disp, etc., as INTEGER instead of as LONG.

              To John Gleason: This is a puzzle! I see your change from CMPSB to CMPSD. However, when I tried this, it doubled the time, even when I cut ECX from 80 back to 20. And the results were wrong -- badly out of order.

              Comment


              • #8
                Harry, this now compares 4 bytes at a time AND always maintains correct sort order, but make sure your sortkey length MOD 4 is 0. This code is usually faster than my post #6 above, which does a byte by byte comparison. Sorting the the data we generate below, both are faster than PB sort by quite a margin.
                Code:
                   #DIM ALL
                   #REGISTER NONE
                   ' Comb sort in ASM
                   ''' A forum member suggested that I should post this sample
                   ''' code here. It is a comb sort, generally faster than a
                   ''' bubble sort.
                   %ARRSIZE = 5000
                
                   '''
                   TYPE my_type
                   my_sortkey AS STRING * 80 '' Key record
                   my_stuff AS STRING * 80 '' More data
                   more_stuff AS STRING * 240 '' Still more data
                   END TYPE
                
                   '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''''''''''''''''''
                
                   FUNCTION PBMAIN() AS LONG
                   '''
                   DIM this_one (0 TO %ARRSIZE) AS GLOBAL my_type
                   DIM stp (0 TO %ARRSIZE) AS GLOBAL my_type POINTER
                   LOCAL k AS LONG, outPtr AS STRING PTR * 80, outStr AS STRING, t AS QUAD
                '   RANDOMIZE
                   '’’ Below: variables used in comb-sort algorithm
                   DIM   chgd AS GLOBAL LONG, _ ' Set to 0 (nothing changed on this iteration), or to 1
                   ktr AS GLOBAL LONG, _ ' Number of instances, or items to sort
                   ktra AS GLOBAL LONG, _ ' Working value of ktr gradually diminished
                   ng AS GLOBAL LONG ' A measure of the separation
                   ' between two items to be
                   ' compared, initially ktr \ 2,
                   ' eventually 1
                
                   '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''''''''''''''''''
                
                   ktr = 0
                
                   FOR k = 1 TO %ARRSIZE          'fill our records with data
                      this_one(k).my_sortkey = CHR$(RND(33, 122),RND(33, 122),RND(33, 122),RND(33, 122),RND(33, 122),RND(33, 122),RND(33, 122),RND(33, 122)) & STRING$(72, RND(33, 122))   'printable common chrs
                '      this_one(k).my_sortkey = STRING$(10, RND(33, 122)) & STRING$(70, RND(33, 122))   'printable common chrs
                      this_one(k).my_stuff = STRING$(40, RND(33, 122)) & STRING$(40, RND(33, 122))   'printable common chrs
                      this_one(k).more_stuff = STRING$(120, RND(33, 122)) & STRING$(120, RND(33, 122))   'printable common chrs
                   NEXT
                   ktr = %ARRSIZE
                
                   ' ' ' Read several thousand instances of this_one,
                   ' ' ' letting ktr = number of instances. Each instance
                   ' ' ' consists of a my_sortkey and two other strings,
                   '‘ ‘ ‘ all read in from files, with appropriate editing.
                   ' ' ' The reading / editing step is not a bottleneck, and
                   ' ' ' is fairly fast.
                   ' ' ' No data are placed in instance (0), which is a dummy,
                   ' ' ' so instance (ktr) is the last one populated.
                
                   '' Below: Make POINTERs for instances 0 to %ARRSIZE,
                   '' even if ktr < %ARRSIZE . . .
                   FOR k = 0 TO %ARRSIZE
                   stp(k) = VARPTR(this_one(k)) '' Computing POINTERs
                   NEXT k
                
                   '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''
                
                   ktra = ktr '' working value of ktr
                   ng = ktr \ 2 '' half of ktr (integer division)
                
                   ' ' ' Below, the CALL passes stp(0), but this will quickly
                   ' ' ' be incremented, so stp(1) will be the first instance
                   ' ' ' to be considered . . .
                
                   ' ' ' ASM routine to do sorting on my_sortkey . . .
                '     Do sort now
                   TIX t
                    CALL do_sort (stp(0))
                
                    'try it in PB too
                '  ARRAY SORT this_one()
                   TIX END t
                   'for PB sort, use this outStr
                '   FOR k = 1 TO 30
                '      outStr = outStr & this_one(k).my_sortkey & $CRLF
                '   NEXT
                   '''''''''''''''''''''''''''''''''''''''''''''''''' ''''''''
                   FOR k = 1 TO 30
                      outPtr = stp(k)
                      outStr = outStr & @outPtr & $CRLF
                '      outStr = outStr & this_one(k).my_sortkey & $CRLF
                   NEXT
                
                   ? FORMAT$(t, "0,") & " tix" & $CRLF & outStr
                
                   ''' Output all results to a properly-formatted file.
                   ''' Output is in a loop on k from 1 to ktr, and is
                   '[' essentially as follows:
                   '''
                   ''' @stp (k).my_stuff
                   ''' followed by
                   ''' @stp (k).more_stuff
                   ''' discarding my_sortkey
                   '''
                   ''' . . . with appropriate headers, blank-padding,
                   ''' blank lines, formatting, and paging. It is a
                   ''' relatively fast step, not a bottleneck.
                
                   END FUNCTION
                   '''''''''''''''''''''''''''''''''''''''''''''''''' '''''
                
                
                   ''' SUB do_sort follows. It is a comb-sort, and works
                   ''' perfectly, but is still the slowest step in this
                   ''' application.
                   '''
                
                   '''''''''''''''''''''''''''''''''''''''''''''''''' ''''
                   '''''''''''''''''''''''''''''''''''''''''''''''''' ''''
                   '''''''''''''''''''''''''''''''''''''''''''''''''' ''''
                
                   SUB do_sort (BYREF StrucPtr AS DWORD)
                
                   '' Power Basic pushes EBX, ESI, EDI.
                   LOCAL disp AS LONG
                   ASM CLD
                
                   Very_top: ' Scan from the top
                
                   ASM MOV chgd, 0 ; Set or reset "changed" flag . . .
                
                '   ASM LEA EBX, ktra ; Pointer to ktra
                '   ASM MOV EAX, [EBX] ; EAX = ktra
                   !mov eax, ktra
                   ASM CMP EAX, 1 ; Testing for exit, if ktra has been decremented to 1
                   ASM JE Labexit
                
                   !mov ebx, ng       'ng in ebx
                '   ASM LEA EBX, ng ; Pointer to ng
                   !sub eax, ebx      ; Subtract ng from ktra.
                '   ASM SUB EAX, [EBX] ; Subtract ng from ktra.
                   ASM MOV ECX, EAX ; This ECX is based on (ktra - ng).
                   ' It determines how far down in the
                   ' table the testing goes on the current
                   ' iteration.
                
                '   !mov eax, ebx      ; EAX = ng
                   !lea eax, [ebx*4]
                '   ASM MOV EAX, [EBX] ; EAX = ng
                '   ASM SAL EAX, 2 ; Quadrupled
                '   ASM LEA EBX, disp
                '   ASM MOV [EBX], EAX ; disp (displacement) in bytes =
                   !mov disp, eax
                   ' 4 * ng between stp( ) pointers
                   ' of the two strings to be compared
                
                   ASM MOV EDX, StrucPtr ; Pointer to stp(0). Scan will
                   ' start with stp(1). Below this
                   ' point, EDX will always = stp ( ).
                
                
                   ''' Scan will go from the top of the array to a bottom
                   ''' which gradually moves up as "heavier" items sink to
                   ''' the bottom and "lighter" ones rise to the top. When
                   ''' ktra, the working value of the bottom of the set,
                   ''' has been decremented all the way to 1 (top of the
                   ''' set), sorting is finished. It is also finished if
                   ''' an iteration fails to exchange a pair of items,
                   ''' provided that at least one bubble-sorting iteration
                   ''' has been made.
                   '''
                   ''' "Very_top" starts a new iteration from the top to the
                   ''' current bottom.
                   '''
                   ''' "Intermediate" starts a comparison between two
                   ''' my_sortkey strings and a possible exchange of
                   ''' this_one pointers.
                
                
                   Intermediate:
                
                   ASM PUSH ECX ; Save master counter.
                   ASM ADD EDX, 4 ; Pointer to next stp ( ), initially stp(1)
                
                   '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''
                
                   ''' Compare my_sortkey #1 with my_sortkey #2, where #2 is
                   ''' farther down in the this_one( ) array, at least
                   ''' according to the stp ( ) pointers. [The instances of
                   ''' this_one are never moved. Only their stp( )
                   ''' pointers are moved.]
                
                   ASM MOV ECX,  0 ; ECX becomes the counter to compare to 80. Change to 80 if using REPE CMPSB below
                   ' bytes (length of any my_sortkey).
                
                   ASM MOV ESI, [EDX] ; ESI points to my_sortkey #1.
                
                   !mov eax, disp      ; disp in EAX
                '   ASM LEA EBX, disp
                '   ASM MOV EAX, [EBX] ; disp in EAX
                    ASM ADD EAX, EDX ; EAX = EDX + disp, for my_sortkey #2.
                    ASM MOV EDI, [EAX] ; EDI points to my_sortkey #2.
                    
                '--------------------------comparison part of code to replace REPE CMPSB-----------------------------------
                'this code handles 4 bytes at a time
                !push eax                          'because we need value later
                  nextCmp:                         'compare strings with our own code--approaching 3x faster than REPE CMPSB
                '   !movzx ebx, byte ptr[edi+ecx]
                '   !cmp byte ptr[esi+ecx], bl
                   !mov ebx, [edi+ecx]             'swap bytes in my_sortkey to make big endian
                   !mov eax, [esi+ecx]
                   !bswap ebx                      'make big endian so string sort is correct
                   !bswap eax                      '         "           "            "
                   !cmp eax, ebx
                   !jne short exitCmp
                   !add ecx, 4                     ;4 here because comparing 4 bytes at a time
                   !cmp ecx, 80
                   !jb short nextCmp
                  exitCmp:
                !pop eax
                '----------------------end comparison part of code to replace REPE CMPSB-----------------------------------
                 
                'ASM REPE CMPSB ; Compare strings. ESI and EDI change
                   ' in this process. This is "repeat
                   ' while equal," so it exits at first
                   ' mismatch.
                
                   ASM JB Past_exchange ; "Below" = smaller ASCII value, so
                   ' jump to avoid changing order,
                   ' else . . .
                
                   '' Exchanging pointers to change order . . .
                   ASM MOV ESI, [EDX] ; First, restore original ESI and EDI.
                   ASM MOV EDI, [EAX]
                   ASM MOV [EDX], EDI ; Then, exchange their locations in the
                   ' stp( ) array . . .
                   ASM MOV [EAX], ESI
                
                   ASM MOV chgd, 1 ; Set "changed" flag.
                
                   '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''
                
                   Past_exchange:
                
                   ''' Master counter (the one based on ktra) . . .
                   ASM POP ECX
                
                   '' Decrement and check counter, for loop-back to
                   '' Intermediate . . .
                   ASM DEC ECX
                   ASM JNZ Intermediate ; to compare another pair of instances
                   ' on current iteration
                
                   '' Current iteration has ended, so narrow the range ng
                   '‘’ (unless ng = 1) . . .
                   !mov eax, ng
                '  ASM LEA EBX, ng
                '  ASM MOV EAX, [EBX] ; EAX = [EBX] = ng
                   ASM CMP EAX, 1 ; To test for ng = 1
                
                   ASM JE SHORT Keep_ng ; If ng was = 1, keep it, else (below)
                   ' scale it down . .
                
                   '' Scaling down ng by 3/4. Scale factor is not an exact
                   '' science, and an assunmption must be made. Literature
                   '' suggests division by 1.3, with integer rounding, so
                   '' this is close . . .
                
                   ASM SAL EAX, 1 ; Double it.
                   !add eax, ng
                '   ASM ADD EAX, [EBX] ; Triple it.
                   ASM SAR EAX, 2 ; Integer division by 4,
                   ' can go down to 0.
                
                   ASM CMP EAX, 0 ; Testing for EAX = new ng = 0
                   ASM JNE SHORT Save_ng
                
                   ASM MOV EAX, 1 ; Force EAX = 1 (to be stored as ng,
                   ' below), but only if it was = 0
                   ' above. With ng = 1, sort will
                   ' become a bubble-sort.
                   Save_ng:
                   !mov ng, eax
                '   ASM MOV [EBX], EAX ; Save ng, which may or may not = 1.
                   ' [EBX] still = ng here, but not after
                   ' this point.
                   ASM JMP Very_top ; To start new iteration
                
                   Keep_ng:
                   ASM LEA EBX, chgd ; Check "changed" flag . . .
                   ' Note: No ehecking of chgd unless this
                   ' iteration was a bubble-sort, using
                   ' ng = 1
                
                   ASM MOV EAX, [EBX]
                   ASM SUB EAX, 0 ; To compare chgd with 0
                
                   ASM JZ Labexit ; To Labexit if no change on this
                   ' iteration, else . . .
                
                   ASM LEA EBX, ktra
                   ASM MOV EAX, [EBX]
                   ASM DEC EAX ; Decrement ktra (move bottom upward).
                   ASM MOV [EBX], EAX ; Store new ktra.
                
                   ASM JMP Very_top ; To start new iteration
                
                
                   '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''''
                
                   Labexit:
                
                   '' Nothing to POP
                
                   END SUB '' do_sort

                Comment


                • #9
                  Comb sort in ASM

                  Thanks, John Gleason for all the help. Your method of comparing 4 bytes at a time cuts the sorting time by more than half, down to about 0.04 second in my test case (from 0.09 to 0.10). When I said the 4-byte comparison doubled the time and gave totally wrong results, I had simply cut ECX to 20 and substituted REPE CMPSD for REPE CMPSB, with no concern for the order of the bytes within each 4-byte sequence.

                  I still don't quite understand what is happening when you say
                  ! MOV EBX, ng
                  ! LEA EAX, [EBX * 4]
                  ! MOV disp, EAX

                  I see that EBX * 4 gives the desired value for disp, but it appears that you are loading EAX with the address of a numeric quantity, then moving that address to disp.

                  Comment


                  • #10
                    Harry,
                    LEA is Load Effective Address, it loads into the register the address of the operand that follows rather than the content pointed to by that address.
                    If you used MOV eax,[ebx*4] then eax would get the content of the memory at address ebx*4
                    If you use LEA eax,[ebx*4] then eax gets the address instead of the content, in this case ebx*4 is loaded into eax.

                    It's a way to multiply ebx by 4 without the use of a slower MUL instruction.

                    Paul.

                    Comment


                    • #11
                      Comb sort in ASM

                      Thanks. I think I understand how you are using LEA in that context. It works only if you are multiplying by powers of 2.
                      My app can be streamlined a little more. With your method of using CMPSD instead of CMPSB, ESI and EDI are not changed during the comparison, so they don't have to be restored after the comparison (no need for ! MOV ESI, [EDX] and ! MOV EDI, [EAX] before exchanging pointers), and the elimination of these also makes a visible improvement in the timing.
                      Also, an early ! MOV ECX, EAX can be eliminated by placing ktra in ECX to begin with and working with it there.

                      Thanks for all your help. The PB version now compares very favorably with that other compiler.

                      Comment


                      • #12
                        Harry,
                        Code:
                        It works only if you are multiplying by powers of 2.
                        It works with a very limited range of operands.
                        The address can be formed from an immediate operand plus a register plus a register *1 or *2 or *4 or *8. The second register doesn't have to be different to the first.

                        So you could use:
                        Code:
                        !lea eax,[ecx + edx*8 + 1234]
                        to put 1234 + ecx + edx*8 into the eax register.

                        or
                        Code:
                        !lea eax,[eax + eax*4]
                        to multiply the value in EAX by 5 very quickly.

                        The PB version now compares very favorably with that other compiler
                        Since the PB version has turned into an ASM version it should beat any compiler. If it doesn't then it needs more work!

                        Paul.

                        Comment


                        • #13
                          Comb sort in ASM (new code)

                          '''
                          #CODE

                          TYPE my_type
                          my_sortkey AS STRING * 80 '' Key record
                          my_stuff AS STRING * 80 '' More data
                          more_stuff AS STRING * 240 '' Still more data
                          END TYPE

                          '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''''''''''''''''''

                          FUNCTION PBMAIN() AS LONG
                          '''
                          DIM this_one (0 TO 10000) AS GLOBAL my_type
                          DIM stp (0 TO 10000) AS GLOBAL my_type POINTER
                          DIM k AS INTEGER

                          ''' Below: variables used in comb-sort algorithm
                          DIM _
                          chgd AS GLOBAL LONG, _ ' Set to 0 (nothing changed
                          ' on this iteration), or to 1
                          ktr AS GLOBAL LONG, _ ' Number of instances,
                          ' or items to sort
                          ktra AS GLOBAL LONG, _ ' Working value of ktr,
                          ' gradually diminished
                          ng AS GLOBAL LONG ' A measure of the separation
                          ' between two items to be
                          ' compared, initially ktr \ 2,
                          ' eventually 1

                          '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''''''''''''''''''

                          ktr = 0

                          ' ' ' Read several thousand instances of this_one,
                          ' ' ' incrementing ktr to become number of instances.
                          ' ' ' Each instance consists of a my_sortkey and two
                          ' ' ' other strings, read in from files and/or created
                          ' ' ' by appropriate editing.
                          ' ' '
                          ' ' ' This process, and the data structure that is populated,
                          ' ' ' are specific to this particular example.
                          ' ' ' In this example, my_sortkey is derived from the data
                          ' ' ' as read, and is eventually discarded. However, this
                          ' ' ' is simply due to the requirements of this example.
                          ' ' '
                          ' ' ' What is important here is that one of the items in
                          ' ' ' the data structure is the sort key, and the rest of
                          ' ' ' the items tag along with it when pointers to it are
                          ' ' ' exchanged.
                          ' ' '
                          ' ' ' No data are placed in instance (0), which is a dummy.
                          ' ' ' Instance (ktr) is the last one populated.
                          ' ' '

                          '' Below: Make POINTERs for instances 0 to 10000,
                          '' even if ktr < 10000 . . .
                          FOR k = 0 To 10000
                          stp(k) = VARPTR(this_one(k)) '' Computing POINTERs
                          NEXT k

                          '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''

                          ktra = ktr '' working value of ktr
                          ng = ktr \ 2 '' half of ktr (integer division)

                          ' ' ' Below, the CALL passes stp(0), but this will quickly
                          ' ' ' be incremented, so stp(1) will be the first instance
                          ' ' ' to be considered . . .

                          ' ' ' ASM routine to do sorting on my_sortkey . . .
                          CALL do_sort (stp(0))

                          '''''''''''''''''''''''''''''''''''''''''''''''''' ''''''''

                          ''' After ASM routine . . .
                          '''
                          ''' Output all results to a properly-formatted file.
                          ''' Output is in a loop on k from 1 to ktr, and is
                          ''' essentially as follows:
                          '''
                          ''' @stp (k).my_stuff
                          ''' followed by
                          ''' @stp (k).more_stuff
                          ''' discarding my_sortkey
                          '''
                          ''' . . . with appropriate headers, blank-padding,
                          ''' blank lines, formatting, and paging.
                          '''
                          ''' The sort key does not have to be discarded, but
                          ''' this particular example no longer needs it.
                          '''


                          '''''''''''''''''''''''''''''''''''''''''''''''''''''''


                          ''' SUB do_sort follows. It is a comb-sort.
                          '''

                          '''''''''''''''''''''''''''''''''''''''''''''''''''''''
                          '''''''''''''''''''''''''''''''''''''''''''''''''''''''
                          '''''''''''''''''''''''''''''''''''''''''''''''''''''''

                          SUB do_sort (BYREF StrucPtr AS DWORD)

                          '' Power Basic pushes EBX, ESI, EDI.

                          ASM CLD

                          Very_top: ' Scan from the top

                          ASM MOV chgd, 0 ; Set or reset "changed" flag . . .
                          ' Destination must be LONG for a direct MOV.

                          ASM MOV ECX, ktra ; ECX will eventually become counter for
                          ' comb sort's outer loop.
                          ASM CMP ECX, 1 ; Test for final exit.
                          ASM JE Labexit

                          ASM SUB ECX, ng ; ECX = ktra - ng = counter for outer loop

                          '' Below, find disp = 4 * ng = displacement in bytes between
                          '' the stp() pointers of two strings to be compared . . .

                          ASM LEA EAX, [EAX * 4] ; Fast way of placing EAX * 4 into EAX.
                          ' This approach works only for certain
                          ' kinds of operands.
                          ASM MOV disp, EAX ; ng * 4 = displacement in bytes between
                          ' the stp( ) pointers of two strings to
                          ' be compared


                          ASM MOV EDX, StrucPtr ; Pointer to stp(0). Scan will start
                          ' with stp(1). EDX is dedicated to
                          ' always contain current pointer stp( ).


                          ''' Scan will go from the top of the array to a bottom
                          ''' which gradually moves up as "heavier" items sink to
                          ''' the bottom and "lighter" ones rise to the top. When
                          ''' ktra, the working value of the bottom of the set,
                          ''' has been decremented all the way to 1 (top of the
                          ''' set), sorting is finished. It is also finished if
                          ''' an iteration fails to exchange a pair of items,
                          ''' provided that at least one bubble-sorting iteration
                          ''' (an iteration with ng = 1) has been made.
                          '''
                          ''' "Very_top" starts a new iteration from the top to the
                          ''' current bottom.
                          '''
                          ''' "Intermediate" starts a comparison between two
                          ''' my_sortkey strings and a possible exchange of their
                          ''' stp ( ) pointers.


                          Intermediate:

                          ASM PUSH ECX ; Save outer loop's counter.
                          ASM ADD EDX, 4 ; Pointer to next stp ( ), initially stp(1)

                          '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''

                          ''' Compare my_sortkey #1 with my_sortkey #2, where #2 is
                          ''' farther down in the this_one( ) array, at least
                          ''' according to the stp ( ) pointers. Note: The instances
                          ''' of this_one are never moved. Only their stp( )
                          ''' pointers are moved around in their own array.

                          ASM MOV ECX, 0 ; ECX becomes the counter to compare 80
                          ' bytes (length of any my_sortkey).

                          ASM MOV ESI, [EDX] ; ESI points to my_sortkey #1.
                          ASM MOV EAX, disp
                          ASM ADD EAX, EDX
                          ASM MOV EDI, [EAX] ; EDI points to my_sortkey #2.



                          '''' The following compares 4 bytes at a time, and is
                          '''' faster than the more intuitive REPE CMPSB . . .

                          ASM PUSH EAX ; Needed later

                          NextCmp:
                          '' Below, capture a 4-byte sequence (sort key)
                          '' into EAX, and another one into EBX . . .
                          ASM MOV EAX, [ESI + ECX] ; ECX initially 0, increases by 4
                          ' for each 4-byte sequence compared.
                          ASM MOV EBX, [EDI + ECX]

                          '' Below, use BSWAP because characters must be
                          '' compared left-to-right in a 4-byte sequence,
                          '' so reverse order to get correct comparison . . .
                          ASM BSWAP EAX
                          ASM BSWAP EBX
                          '' Compare the two 4-byte sequences . . .
                          ASM CMP EAX, EBX ; Comparison will set flags.
                          ASM JNE SHORT ExitCmp ; Quit this loop on inequality between
                          ; sort keys, else . . .
                          ASM ADD ECX, 4 ; Update ECX.
                          ASM CMP ECX, 80 ; Compare with length of sort keys.
                          ASM JB SHORT NextCmp ; Back for new comparison if not yet 80.
                          ExitCap:

                          ASM POP EAX ; still needed below

                          '''''' End of key comparison '''''''


                          '' There were two possible CMP results. If CMP EAX, EBX
                          '' said "inequality," it also told whether EAX was "above"
                          '' or "below" EBX (for example, "h" is higher-valued, or
                          '' above, "g").
                          ''
                          '' If CMP EAX, EBX said "equality," and then if CMP ECX, 80
                          '' said "equality" and caused exit from the testing loop, it
                          '' indicated that EAX = EBX, meaning that a pair of duplicate
                          '' sort keys existed.
                          ''
                          '' Next statement is very critical, as it controls whether
                          '' pointers will be exchanged and this iteration will be
                          '' flagged with chgd as having had such an exchange.
                          '' To jump around the step that exchanges pointers, it is a
                          '' mistake to use JB (jump if below).
                          ''
                          '' JBE is required, because the jump around the pointer
                          '' exchange should also occur if the two sort keys were equal.
                          ''
                          '' With JB instead of JBE, a single pair of duplicate sort
                          '' keys can cause an increase in sorting time by orders of
                          '' magnitude, because this pair requires repeated exchanging
                          '' of pointers and flagging of the exchange, hence many
                          '' repeated iterations.
                          ''


                          ASM JBE Past_exchange ; Jump to avoid exchanging pointers.


                          '' Exchanging pointers in the stp ( ) array, to change order . . .
                          ASM MOV [EAX], ESI
                          ASM MOV [EDX], EDI

                          ASM MOV chgd, 1 ; Set "changed" flag.

                          ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

                          Past_exchange:

                          '' Restore outer loop's counter . . .
                          ASM POP ECX

                          '' Decrement and check counter, for loop-back to
                          '' Intermediate . . .
                          ASM DEC ECX
                          ASM JNZ Intermediate ; to compare another pair of instances
                          ' on current iteration

                          '' Current iteration has ended, so narrow the range ng
                          '' (unless ng = 1 already) . . .
                          ASM MOV EAX, ng
                          ASM CMP EAX, 1

                          ASM JE SHORT Keep_ng ; If ng was 1, keep it, else (below)
                          ' scale it down.

                          '' Scaling down ng by 3/4 . . .
                          '' Scale factor is not an exact science, and an assunmption
                          '' must be made. Literature suggests division by 1.3, with
                          '' integer rounding, so this is close . . .

                          ASM LEA EAX, [EAX + EAX * 2] ; Triple ng in EAX.
                          ASM SAR EAX, 2 ; Divide by 4. Result can go to 0.

                          ASM CMP EAX, 0 ; Testing for EAX = new ng = 0
                          ASM JNE SHORT Save_ng ; If not 0, save new ng, else . . .

                          ASM MOV EAX, 1 ; Force EAX = 1 (to be stored as ng,
                          ' below), but only if it was = 0
                          ' above. With ng = 1, sort will
                          ' henceforth be a bubble-sort.

                          Save_ng:
                          ASM MOV ng, EAX ; Save ng, from EAX,
                          ASM JMP Very_top ; To start new iteration

                          Keep_ng:
                          ASM CMP chgd, 0 ; Check the "changed" flag.
                          ' Note: It can't get here to check chgd
                          ' unless this iteration was a bubble-sort
                          ' (ng = 1). The chgd flag must never be
                          ' checked unless at least one bubble-sorting
                          ' itgeration has been made.
                          ASM JZ Labexit ; To Labexit if no exchange on this iteration,
                          ; else . . .

                          ASM DEC ktra ; Decrement ktra for next iteration.
                          ASM JMP Very_top ; To start new iteration

                          '''''''''''''''''''''''''''''''''''''''''''''''''' '''''''''''''''

                          Labexit:

                          '' Nothing to POP

                          END SUB '' do_sort

                          Comment

                          Working...
                          X