Announcement

Collapse
No announcement yet.

SmartAlphaSort done

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • SmartAlphaSort done

    Hi everybody,

    They said "It could not be done"

    But I finally got my SmartSort alpha version running.
    It is FAST, blazingly FAST.
    How about sorting 16 millions 4 bytes Alpha keys from A to Z in 1.68 seconds while ARRAY SORT take 30.18 seconds for the same work ?
    It is very easy to scale up, if you have enough memory, to all 256 Asc codes.
    Same thing for the number of bytes in the key.
    This is the first working version so it can probably be improved yet.
    Comments welcomed.

    I wonder if M. Zale would be willing to be my Agent as I would not know how to present this program to Microsoft or Google ?
    Code:
    'SmartAlphaSort.bas by Guy Dombrowski (all rights reserved)
    'Developped using my 1980 SmartSort, for numeric only, as a template
    'A new kind of algorithm for sorting alpha keys without moving string around
    'It is many times faster than existing algorithm and easy to scale up
    
    #COMPILE EXE
    #BREAK ON
     DEFLNG a-z
    
    FUNCTION PBMAIN () AS LONG
     LOCAL t1,t2 AS SINGLE: RANDOMIZE CVS(CHR$(255,255,255,255))
     M=2000000    ' Maximum number of keys to be sorted
     N=4             ' Number of bytes in keys
     L=65: U=90   ' Lower and Higher ASC value to be sorted between A and Z for now
     R=U-L          ' Range limit for test 0 to 25  A=0  Z=25
     B=1000000    ' Basic array size for 4 byte key
     K=(B*(R+1))  ' Maximum array size
     F=0             ' Index pointer
     P=0             ' Index pointer
     Y=0             ' Index pointer
     DIM tot(k)    ' Total instance of each character per row
     DIM jrx(k)     ' Index for big array
     DIM jrp(k)     ' Pointer for building index
     DIM jrs(m,n)  ' Sorted Asc value minus Lbound by column
     DIM a(n)       ' Array for column Asc value
     DIM k(n)       ' Array for Modulo math
     k(0)=1000000
     k(1)=10000
     k(2)=100
     k(3)=0
     
     PRINT "Building Test file...";
     GOSUB MakeTrs  ' Make dummy transactions
     PRINT "Smart Sorting";m;"Alpha keys..."
    
    'Reading file from simulated hard disk and counting each characters per column
     t1=TIMER: FOR t=0 TO m: GOSUB ReadTrs: NEXT: t2=TIMER
    
    'Building index while checking for multiple keys
     FOR x=0 TO k
       IF tot(x)>0 THEN
         q=tot(x): WHILE q>0: jrx(f)=x: DECR q: INCR f: WEND
         jrp(x)=p: p=p+tot(x)
       END IF
     NEXT
    
    'Building sorted array from scratch using computed Asc values
     FOR t=0 TO m
       z=jrx(t): FOR y=0 TO n-2: a(y)=z\k(y): z=z MOD k(y): NEXT: a(y)=z
       FOR x=0 TO n-1: jrs(t,x)=a(x): NEXT ' You can write sorted keys to disk here
     NEXT
     
     t3=TIMER ' For total time
    
    'Show time elapsed for SmartSort and 100 first keys
     PRINT "Reading and counting ="; t2-t1; " Actual sorting =";t3-t2;" Total =";t3-t1: PRINT
     FOR t=0 TO 99: PRINT CHR$(jrs(t,0)+L);CHR$(jrs(t,1)+L);CHR$(jrs(t,2)+L);CHR$(jrs(t,3)+L);" ";: NEXT
    
     PRINT: PRINT: PRINT "Now do a PB ARRAY SORT on the same starting data..."
     t1 = TIMER: ARRAY SORT test$(): t2=TIMER
    
    'Show time elapsed for PB Array Sort and 100 first keys
     PRINT "Time for ARRAY SORT is "; t2-t1; " secs, and the 1st 100 sorted values are: ": PRINT
     FOR x=0 TO 99: PRINT test$(x);" ";: NEXT
     PRINT: PRINT: PRINT "The results should match. Press a key or clic [x] to END...";
    
     WAITKEY$: END
    
    ReadTrs:REM Build numeric arrays with ASC value minus lbound of all alpha keys
     FOR c=0 TO n-1: a(c)=ASC(MID$(test$(t),c+1))-L: NEXT
     w=(a(0)*1000000)+(a(1)*10000)+(a(2)*100)+a(3)
     INCR tot(w)
    RETURN
    
    MakeTrs:REM Big random 4 byte alpha key array...
     REDIM test$(m), t1(m),t2(m),t3(m),t4(m)
     FOR y=0 TO m: t1(y)=RND(L,U):t2(y)=RND(L,U): t3(y)=RND(L,U):t4(y)=RND(L,U): NEXT
     FOR x=0 TO m: test$(x)=CHR$(t1(x),t2(x),t3(x),t4(x)): NEXT
     ERASE t1(),t2(),t3(),t4()
    RETURN
    
    END FUNCTION
    Last edited by Guy Dombrowski; 27 May 2009, 12:38 PM.
    Old QB45 Programmer

  • #2
    SmartTree

    Thinking about tree....

    By using the same kind of algorithm, it is now possible to write a new kind of instant search that will go directly to any unique alpha key or to the beginning of multiple identical one.

    Another benefit is the ability to validate each digit as it is entered by the operator.
    Last edited by Guy Dombrowski; 28 May 2009, 10:02 AM.
    Old QB45 Programmer

    Comment


    • #3
      ...how to present this program to Microsoft or Google...
      Posting the source here, I think you just did.
      There are no atheists in a fox hole or the morning of a math test.
      If my flag offends you, I'll help you pack.

      Comment


      • #4
        >Posting the source here, I think you just did.

        Mel, You think some of our unnamed guests could work for those big companies ?
        Old QB45 Programmer

        Comment


        • #5
          How about sorting 16 millions 4 bytes Alpha keys from A to Z in 1.68 seconds while ARRAY SORT take 30.18 seconds for the same work ?
          Note that ARRAY SORT() has a few more options to support than a four-byte key with a restricted range of values and a more-or-less random scattering of key values.

          That is, you are attempting a comparison where there is none.

          However, you are demonstrating an important principle: all optimization is application-specific. Speaking of which...

          With a four byte key, you could let PB sort it as an integer rather than a string. Five bucks says PB's ARRAY SORT can handle 16M random long integers right quick.

          MCM
          Michael Mattias
          Tal Systems (retired)
          Port Washington WI USA
          [email protected]
          http://www.talsystems.com

          Comment


          • #6
            Numeric sort

            Michael,

            If you look at my numeric SmartSort you will see a comparison of both numeric sort and my sort beat PB sort there too.
            Old QB45 Programmer

            Comment


            • #7
              Guy
              I wouldn't be putting a deposit in a new yacht just jet, it is cute and reasonably fast but with horrible limitations. Your example code is only 2 million and 1 strings (m = 2000000). ie if you wish to handle all 256 combinations for each byte then your counting array is 64 MB. By the time you get to a 9 byte string that array is 1099 Gig! I am not sure what you count as sorting time as the major portion of the math is done in the file reading and counting section.
              Three tests on my old AMD Sempron with the only modifications to your code being message box outputs as I use PBWIN 8.03 gave the following total times which are the true times as they exclude the random array creations
              2.808 secs, 2.355 secs and 2.667 secs.
              You cannot compare to PB's excellent array sort as that has to handle too many options so I botched together a simple Quick Sort for a 4 character string which by its nature handles all 256 byte combinations and with simple modifications does not run into the array size limitation. Three tests on it give 2.1557 secs, 2.1557, and 2.1411 secs. Not particularly optimised as I should be switching to a Schell or Radix sort when the stack groupings get small and of course this algorithem (the assembler version of which was posted by Hutch here some years ago) is can benefit by threads on a multi core CPU
              My appologies to all for the somewhat unreadable code as I just threw together a combination of Guy's code and some of my standard code.
              Code:
              #COMPILE EXE
              DEFLNG a-z
              FUNCTION PBMAIN () AS LONG
              L=65: U=90   ' Lower and Higher ASC value to be sorted between A and Z for now
              LOCAL timer1 AS SINGLE
              LOCAL timer2 AS SINGLE
              LOCAL f1 AS LONG
              LOCAL f2 AS LONG
              LOCAL x AS LONG
              LOCAL y AS LONG
              LOCAL m AS LONG ' Maximum number of keys to be sorted
              LOCAL dwpi AS DWORD PTR
              LOCAL dspi AS BYTE PTR
              LOCAL stp AS BYTE PTR
              LOCAL sap AS BYTE PTR
                  m = 2000000
              REDIM IndP(m) AS DWORD
              REDIM SortArray((m + 1) * 4 - 1) AS BYTE
                  GOSUB MakeTrs
                  timer1 = TIMER
                  dspi = VARPTR(IndP(0))
                  sap = VARPTR(SortArray(0))
                  FOR x = 0 TO m
                      indp(x) = sap
                      stp = STRPTR(test$(x))
                      FOR y = 0 TO 3
                          @sap[3 - y] = @stp[y]
                      NEXT
                      sap = sap + 4
                  NEXT
                  GOSUB asort
                  sap = VARPTR(SortArray(0))
                  dspi = VARPTR(IndP(0))
                  FOR x = 0 TO m
                      stp = STRPTR(test$(x))
                      FOR y = 0 TO 3
                          dspi = indp(x)
                          @stp[y] = @dspi[3 - y]
                      NEXT
                  NEXT
                  timer2 = TIMER
                  ? "sort time =" & STR$(timer2 - timer1)
              
              EXIT FUNCTION
              
              asort:
                  REDIM  qstk(m^.5 + 5) AS LONG         'stack array
                  first& = 0
                  last& = m
                  cntr& = 0
                  DIM hmem AS LONG PTR
                  DIM dwp1 AS DWORD PTR
                  DIM dwp2 AS DWORD PTR
                  hmem = VARPTR(qstk(0))
                  dwp1 = VARPTR(IndP(0))
                  Complen& = 1
                  ! PUSHAD
                  ! MOV esi, dwp1              ; source address IN ESI
              outer_loop:
                  ' -------------------
                  ' calculate midpoint
                  ' -------------------
                  ! MOV eax, last&
                  ! ADD eax, first&
                  ! SHR eax, 1
                  ! MOV esi, dwp1
                  ' =========================
                  ! MOV ebx, [esi+eax*4]      ; midpoint IN EBX
                  ! mov dwp2, ebx
                  ' =========================
                  ! CLD
                  ! MOV ecx, first&
                  ! MOV edx, last&
                  ' ---------------------------------------------------------
                  inner_loop:
                  ! MOV esi, dwp1
                  ! MOV edi, ebx
                  ! MOV esi, [esi + ecx*4]
                  ! MOV eax, complen&
                  LessComp:
                  ! CMPSD
                  ! JA wl2
                  ! JB lc1
                  ! DEC eax
                  ! JNZ LessComp
                  ! JMP wl2
                  lc1:
                  ! INC ecx
                  ! JMP inner_loop
                  wl2:
                  ! MOV esi, dwp1
                  ! MOV edi, ebx
                  ! MOV esi, [esi + edx*4]
                  ! MOV eax, complen&
                  GreatComp:
                  ! CMPSD
                  ! JB wl2out
                  ! JA gc1
                  ! DEC eax
                  ! JNZ GreatComp
                  ! JMP wl2out
                  gc1:
                  ! DEC edx
                  ! JMP wl2
                  wl2Out:
                  ! MOV esi, dwp1
                  ! CMP ecx, edx              ; IF ecx > edx, EXIT inner LOOP
                  ! JG exit_innerx
                  ' =========================
                  ! MOV eax, [esi+ecx*4]
                  ! MOV ebx, [esi+edx*4]      ; SWAP elements
                  ! MOV [esi+ecx*4], ebx
                  ! MOV [esi+edx*4], eax
                  ' =========================
                  ! mov ebx, dwp2
                  ! INC ecx
                  ! DEC edx
                  ! CMP ecx, edx
                  ! JLE inner_loop
                  exit_innerx:
                  ' ---------------------------------------------------------
                  ! CMP ecx, last&             ; IF ecx < Last jump over
                  ! JG iNxt
                  ' =========================
                  ! MOV edi, hmem
                  ! MOV eax, cntr&
                  ! MOV [edi+eax*4], ecx
                  ! MOV ebx, last&
                  ! MOV [edi+eax*4+4], ebx
                  ' =========================
                  ! ADD cntr&, 2
                  iNxt:
                  ! MOV last&, edx             ; Last  = EDX
                  ! CMP edx, first&            ; compare Last & First
                  ! JG outer_loop
              
              ! CMP cntr&, 0
              ! JZ qsOut
              ! SUB cntr&, 2
              ' =========================
              ! MOV edi, hmem
              ! MOV eax, cntr&
              ! MOV ebx, [edi+eax*4]
              ! MOV first&, ebx
              ! MOV ebx, [edi+eax*4+4]
              ! MOV last&, ebx
              ' =========================
              ! JMP outer_loop
              
              qsOut:
                  ! POPAD
                  ERASE qstk
              RETURN
              MakeTrs:REM Big random 4 byte alpha key array...
               REDIM test$(m), t1(m),t2(m),t3(m),t4(m)
               FOR y=0 TO m: t1(y)=RND(L,U):t2(y)=RND(L,U): t3(y)=RND(L,U):t4(y)=RND(L,U): NEXT
               FOR x=0 TO m
                   test$(x)=CHR$(t1(x),t2(x),t3(x),t4(x))
               NEXT
               ERASE t1(),t2(),t3(),t4()
              RETURN
              
              END FUNCTION

              Comment


              • #8
                Perhaps a good testing tool for this application?
                Add Process Memory Usage Report to any program 1-12-04
                Michael Mattias
                Tal Systems (retired)
                Port Washington WI USA
                [email protected]
                http://www.talsystems.com

                Comment


                • #9
                  Originally posted by Michael Mattias View Post
                  With a four byte key, you could let PB sort it as an integer rather than a string. Five bucks says PB's ARRAY SORT can handle 16M random long integers right quick.
                  MCM
                  Sorry MCM that would not work as Intel architecture is little endian which is why ARRAY SORT for strings is so much slower than for longs or dwords.
                  John

                  Comment


                  • #10
                    You flip-flop the bytes when you load the file.. ie, the 'integer' becomes simply a 'key' array which is sorted with the original array as a TAGARRAY.

                    In examples above, the 'load up' time is presumed equal; but of course would not be were you to do this.

                    Forget I brought it up*. Instead, plug in that memory usage report with various key sizes and valid value ranges.

                    MCM
                    * I am a firm believer that if you don't ever put forth a bad idea, you're not generating enough ideas.
                    Michael Mattias
                    Tal Systems (retired)
                    Port Washington WI USA
                    [email protected]
                    http://www.talsystems.com

                    Comment


                    • #11
                      Big memory needed

                      John you objected to the ram size needed for my sort :

                      When I invented the algorithm in 1980 with a 8 bit computer with 32k ram I would never in my wildest dreams have tought we could have gigabyte worth of memory like today.
                      Don't you think than in a few more years, will have a lot more ?

                      You also compared it with a assembler sort using a 2 millions test and you came about even.
                      Of course, with small sample like this, the speed difference is not much and even the regular array sort is not far behind.
                      But if you try a bigger sample, you will see that the bigger it is, the more difference in speed advantage will be seen for my SmartSort.

                      Also you nitpick about the time spent counting when reading the original data on disk but don't you think it is very good for part of the work to be done while doing a slower process that you will have to use anyway ?
                      That part does not increase the total reading time by much but it will save time later.

                      In the 16 millions sample, even with the counting time, I am still 3 times faster than ARRAY SORT !
                      And, that sort would also have to read the data on disk but it get a free pass here in that test.

                      As for that yacht, my small skiff is enough for my fishing needs
                      Attached Files
                      Old QB45 Programmer

                      Comment


                      • #12
                        Well, perhaps you should take into account that the SmartSort is relying upon fixed length keys... each is always 4-bytes here. Yet, the PowerBASIC Array Sort is handling variable length strings. Many extra compares needed. You might want to try adding:

                        LOCAL Test() AS STRING * 4

                        and see what happens to the execution timings when PowerBASIC Array Sort is given that identical assurance. Of course, since the ARRAY SORT is intended as a general purpose, programmable sort, it requires numerous additional comparisons and branches to account for starting position, limited comparison length, collate alteration, direction, etc.

                        Best regards,

                        Bob Zale

                        Comment


                        • #13
                          Originally posted by Michael Mattias View Post
                          You flip-flop the bytes when you load the file.. ie, the 'integer' becomes simply a 'key' array which is sorted with the original array as a TAGARRAY.

                          In examples above, the 'load up' time is presumed equal; but of course would not be were you to do this.

                          Forget I brought it up*. Instead, plug in that memory usage report with various key sizes and valid value ranges.

                          MCM
                          * I am a firm believer that if you don't ever put forth a bad idea, you're not generating enough ideas.
                          I didn't say it was a bad idea, in fact if you had read the code in my post (poorly documented) you would have noticed that I do reverse the 4 bytes of the test data and compare as DWORDS using CMPSD. This is why at 2000001 items it results in a sort time half that ARRAY SORT.
                          John

                          Comment


                          • #14
                            Guy
                            You compare Apples with Oranges, you don't return the sorted array in your timing only some arrays from which it can be generated, to test it you only finalise for 100 entries with
                            "FOR t=0 TO 99: PRINT CHR$(jrs(t,0)+L);CHR$(jrs(t,1)+L);CHR$(jrs(t,2)+L);CHR$(jrs(t,3)+L);" ";: NEXT "
                            It is not nit picking. The first portion of time you show "reading the disk" is 0 for reading and 100% for the major calculation needed in the method. Both Array Sort and my code return the sorted data in the original array, you must do this step and add the time for it in as well if you wish to compare.
                            John

                            Comment


                            • #15
                              you would have noticed that I do reverse the 4 bytes of the test data
                              I assume this is the reversing of the bytes?
                              Code:
                                      FOR y = 0 TO 3
                                          @sap[3 - y] = @stp[y]
                                      NEXT
                              BSWAP is your friend.
                              From the Intel manual:
                              BSWAP—Byte Swap
                              Description
                              Reverses the byte order of a 32-bit (destination) register: bits 0 through 7 are swapped with bits
                              24 through 31, and bits 8 through 15 are swapped with bits 16 through 23. This instruction is
                              provided for converting little-endian values to big-endian format and vice versa.
                              Opcode Instruction Description
                              0F C8+ rd BSWAP r32 Reverses the byte order of a 32-bit register.

                              Comment


                              • #16
                                Paul
                                You are correct on both counts. Lazyness on my part just quickly modded some existing code for sorting UDT fields where the fields can be any type, double singles, text etc and the fields can be mixed as to ascending or descending so a little more than byteswapping is needed.
                                John

                                Comment


                                • #17
                                  Adding LOCAL Test() AS STRING * 4

                                  Bob,

                                  You are right and that little change make ARRAY SORT about 3 times faster.
                                  But, it is still slower than my actual sorting time.
                                  You will notice also that my reading and counting time has almost doubled ?
                                  It would indicate that the ASC value command gets a lot slower with a fixed string ? How can that be ?

                                  I am the first to admit that it is unfair to compare a single special sort with your excellent Array Sort who is testing a lot more thing than mine.
                                  But the purpose of that test was to demonstrate a new kind of algorithm that I think had never been published before.
                                  I would like to see if anybody can make a faster code, using only normal PB command, to sort the very same 16 millions 4 bytes keys. Anybody want to try ?

                                  And John, I did not complete only the first 100 key as I use them this way in my lookup code and this is the real end product.
                                  That algorithm does not need any kind of tree or hash code and need only a few lines of code.
                                  It use a simple index showing the first and last instance of each character in each columns and can be validated as each letter is entered from the keyboard.
                                  It also work just as well for unique or multiple keys.



                                  Here is the new result
                                  Attached Files
                                  Old QB45 Programmer

                                  Comment


                                  • #18
                                    Originally posted by Bob Zale View Post
                                    Well, perhaps you should take into account that the SmartSort is relying upon fixed length keys... each is always 4-bytes here. Yet, the PowerBASIC Array Sort is handling variable length strings. Many extra compares needed. You might want to try adding:

                                    LOCAL Test() AS STRING * 4

                                    and see what happens to the execution timings when PowerBASIC Array Sort is given that identical assurance. Of course, since the ARRAY SORT is intended as a general purpose, programmable sort, it requires numerous additional comparisons and branches to account for starting position, limited comparison length, collate alteration, direction, etc.

                                    Best regards,

                                    Bob Zale
                                    Bob
                                    I am in awe, I have obviously always held your compiler in high regard which is why I bought it for so many years but to find out how much it can write unique code to match the programmers inputs is amazing. Sorting has for me been an interest for 35 years and would have enjoyed discussing what a compiler could do and your current sort abilities while understanding it is proprietry information.
                                    What can I say but WOW.
                                    John

                                    Comment


                                    • #19
                                      Guy
                                      Please take this in the posative way my comments have always been intended. As I said in an earlier post it is a cute piece of code and I complement you on it but I worry that you are misleading some of our younger viewers that are not taught such basics as sorting algorithms. It is hard without wasting more time to decide which of Radix, Bucket, Check or Rapid method would correctly describe it as none of those have the serious memory constraints but it is certainly not unique. Yes it is an L class sort which with limited k's can be faster than a quicksort (L = linear by n number of values to sort and linear by k number of characters to be sorted).
                                      You have thowen out a challange for someone to be faster when by your own post Array Sort is much faster. You are confused by your own terminology, yes counting, bucket filling etc are normally called setup time and far exceed what is often referred to as sort time, but of course none of the methods work without that major portion of the math so in reality total time is only ever considered.
                                      You then bring in indexing and hashes which is a totally different subject.
                                      John

                                      Comment

                                      Working...
                                      X