Forum Guidelines

This forum is for finished source code that is working properly. If you have questions about this or any other source code, please post it in one of the Discussion Forums, not here.
See more
See less

Slot Sort #2 – twice as fast as PB string sort

  • Filter
  • Time
  • Show
Clear All
new posts

  • PBWin Slot Sort #2 – twice as fast as PB string sort

    On random strings (Asc(32) to Asc(127)); 3 to 50 chars: it is consistently twice as fast.
    That only amounts to a split-second on 100,000 strings.

    (Asc(32) to Asc(127)); 3 to 2,000 chars: 4 to 5 times faster.

    This sort really shines on very long strings.
    Long strings reach their final destination quickly.
    If you were sorting millions of strings, up to thousands of characters wide, this sort would be worth considering.

    Slot Sort:
    Works sideways through the strings in an array, at a given character depth.
    Now makes 3 passes and uses a temporary array. (two passes in previous version)
    Pass 1: get count for each type of character (0 – 255)
    Determines where each slot begins: slot 0 … slot 255
    Pass 2: copies string pointer into temporary array in proper slot
    Pass 3: copies string pointers back into original array
    It’s actually faster to use a temporary array, and extra pass, than swapping pointers in place. No decision loops.
    All strings in slot zero have reached their final destination. Null on end of string.
    All slots containing only one string have reached their final destination. (that’s why the sort shines on really long string)
    Recursively sort all remaining slots on next character down.

    The sort only works down far enough to find a set of matching strings, or a single unique string.

    Note: this is just a test of concept. This version doesn’t test for null strings in the array to be sorted. A null string will cause a crash. Arr(i) = “”

    #PBForms CREATED V1.51
    'pbwin 9
    $TestSource = "SlotSort.bas"
    $TestTitle = "Slot Sort Test"
    #Compile Exe "SlotSort.exe"
    #Dim All
    #Optimize Speed
        'Slot Sort - new sort algorithm
        '   constantly about twice as fast as PB array sort
        '   (about 50% to 75% faster on "A" ~ "z"; 65 to 122)
        '   mix of short and really long strings: 4 to 5 times faster (random strings 3 to 2,000 chars)
        '   Note: this is not suggested as an alternative to PB sort
        '       only posted to test the concept
        '   this is a recursive radix type sort
        '   - now makes three passes
        '   - uses two temporary arrays
        '   Passes:
        '       1. get count for each different kind of character, at a given depth, for each string in array
        '           1a. build slot information - where each string belong in array
        '       2. goes through array and stores each string ptr in temp array in proper slot
        '       3. copies string pointers back into original array
        '               now in sorted order for current depth
        '           3a. recursively sort each slot
        '               skip slot zero (end-of-string) now in final position
        '               skip slots with only one string - in final position
        '   currently, doesn't test for null strings in test array: arr(i) = ""
    #If Not %Def(%WINAPI)
        #Include Once "WIN32API.INC"
    #Include Once "PBForms.INC"
    %Dlg1    =  101
    %BtnTest = 1002
    %Lbx1    = 1001
    Declare CallBack Function ShowDlg1Proc()
    Declare Function ShowDlg1(ByVal hParent As Dword) As Long
    Global lbx As LBxI
    Global tmr As TimerI
    Function PBMain()
        ShowDlg1 %HWND_Desktop
    End Function
    Type SlotSortT
        slotCount(255) As Long
        slotStart(254) As Long
        slotEnd(255) As Long
    End Type
    Sub SlotSort(arr() As String)
        Local pArr, pArr2 As Long Ptr
        Local dummy(), charIndex() As Long
        Local lb, ub, Count, depth As Long
        lb = LBound(arr)
        ub = UBound(arr)
        Count = ub - lb + 1
        If Count > 1 Then
            pArr = VarPtr(arr(lb))
            ReDim dummy(lb To ub)
            pArr2 = VarPtr(dummy(lb))
            ReDim charIndex(Count - 1)
            depth = 0
            SlotSortR pArr, pArr2, charIndex(), depth, 0, Count - 1
        End If
    End Sub
    Sub SlotSortR(ByVal pArr As Long Ptr, ByVal pArr2 As Long Ptr, charIndex() As Long, ByVal depth As Long, ByVal leftSide As Long, ByVal rightSide As Long)
        Register i As Long
        Register char As Long
        Local lastSlotStart, lastSlotCount As Long
        Local ss As SlotSortT
        'Pass #1: get slot counts
        For i = rightSide To leftSide Step -1
            'Incr ss.slotCount(Peek(Byte, @pArr[i] + depth))
            charIndex(i) = Peek(Byte, @pArr[i] + depth)
            Incr ss.slotCount(charIndex(i))
        Next i
        'build slot info
        lastSlotStart = leftSide
        lastSlotCount = 0
        For char = 0 To 255
            If ss.slotCount(char) Then
                ss.slotStart(char) = lastSlotStart + lastSlotCount
                ss.slotEnd(char) = ss.slotStart(char) - 1
                lastSlotStart = ss.slotStart(char)
                lastSlotCount = ss.slotCount(char)
            End If
        Next char
        'Pass #2: store each string in proper slot
        For i = rightSide To leftSide Step -1
            'char = Peek(Byte, @pArr[i] + depth)
            Incr ss.slotEnd(charIndex(i))
            @pArr2[ss.slotEnd(charIndex(i))] = @pArr[i]
        Next i
        'Pass #3: restore slots to original array (string pointers)
        For i = rightSide To leftSide Step -1
            @pArr[i] = @pArr2[i]
        Next i
        'recursively sort each slot on next character down
        '   we skip zero - null on end of string
        '       those strings have reached their final destination
        For char = 255 To 1 Step -1
            'if a slot only has one string - no need to sort - string in final destination
            If ss.slotCount(char) > 1 Then
                SlotSortR pArr, pArr2, charIndex(), depth + 1, ss.slotStart(char), ss.slotEnd(char)
            End If
        Next char
    End Sub
    Sub BtnTest(ByVal hDlg As Long)
        Local i, j As Long
        Local a(), b() As String
        Local s As String
        Local pbTime, ssTime As Double
        Local pbTix, ssTix As Quad
        'test settings
        Local Count As Long          : Count = 100000        'number of random strings to build
        Local shortestString As Long : shortestString = 10   'shortest random string
        Local longestString As Long  : longestString = 50    'longest random string
        Local lowestChar As Long     : lowestChar = 32       'ASC() smallest random character
        Local highestChar As Long    : highestChar = 127     'ASC() largest random character
        'if Count <= 500 then random strings will be displayed
        '   any Count < 100,000 is to small to show any difference
        ReDim a(1 To Count)
        ReDim b(1 To Count)
        lbx.Add("fill arrays")
        For i = 1 To Count
            s = ""
            For j = 1 To Rnd(shortestString, longestString)
                s += Chr$(Rnd(lowestChar, highestChar))
            Next j
            a(i) = s
            b(i) = s
        Next i
        lbx.Add("PB Sort")
        Tix pbTix
        Array Sort a()
        Tix End pbTix
        pbTime = tmr.GetTime
        lbx.Add("Slot Sort")
        Tix ssTix
        SlotSort b()
        Tix End ssTix
        ssTime = tmr.GetTime()
        lbx.Add("see if anything out of order")
        For i = 1 To Count
            If a(i) <> b(i) Then
                ? "out of order at: " + Str$(i)
                Exit For
            End If
        Next i
        lbx.Add("Tix PB: " + Format$(pbTix))
        lbx.Add("Tix SS: " + Format$(ssTix))
        lbx.Add("TIme PB: " + Format$(pbTime, "####.##########"))
        lbx.Add("Time SS: " + Format$(ssTime, "####.##########"))
        lbx.Add("Slot Sort is " + Format$(pbTime/ssTime) + " times faster than PB sort")
        If Count <= 500 Then
            For i = 1 To Count
                lbx.Add("PB: = '" + a(i) +"'")
                lbx.Add("SS: = '" + b(i) +"'")
            Next i
        End If
    End Sub
    CallBack Function ShowDlg1Proc()
        Select Case As Long CbMsg
            Case %WM_InitDialog
                 lbx = Class "LBxC"
                 lbx.INI(Cb.Hndl, %Lbx1)
                 tmr = Class "TimerC"
            Case %WM_NCActivate
                Static hWndSaveFocus As Dword
                If IsFalse CbWParam Then
                    hWndSaveFocus = GetFocus()
                ElseIf hWndSaveFocus Then
                    hWndSaveFocus = 0
                End If
            Case %WM_Command
                Select Case As Long CbCtl
                    Case %BtnTest
                        If CbCtlMsg = %BN_Clicked Or CbCtlMsg = 1 Then
                        End If
                End Select
        End Select
    End Function
    Function ShowDlg1(ByVal hParent As Dword) As Long
        Local lRslt  As Long
    #PBForms BEGIN DIALOG %Dlg1->->
        Local hDlg   As Dword
        Local hFont1 As Dword
        Dialog New hParent, $TestTitle, 67, 61, 341, 241, %WS_Popup _
            Or %WS_Border Or %WS_DlgFrame Or %WS_Caption Or %WS_SysMenu Or _
            %WS_MinimizeBox Or %WS_ClipSiblings Or %WS_Visible Or %DS_ModalFrame _
            Or %DS_3DLook Or %DS_NoFailCreate Or %DS_SetFont, _
            %WS_Ex_ControlParent Or %WS_Ex_Left Or %WS_Ex_LtrReading Or _
            %WS_Ex_RightScrollbar, To hDlg
        Control Add ListBox, hDlg, %Lbx1, , 5, 5, 330, 210, %WS_Child Or _
            %WS_Visible Or %WS_HScroll Or %WS_VScroll Or %LBS_Notify Or _
            %LBS_NoIntegralHeight, %WS_Ex_ClientEdge Or %WS_Ex_Left Or _
            %WS_Ex_LtrReading Or %WS_Ex_RightScrollbar
        Control Add Button,  hDlg, %BtnTest, "Test", 275, 220, 60, 15
        hFont1 = PBFormsMakeFont("Courier New", 9, 400, %FALSE, %FALSE, %FALSE, _
        Control Send hDlg, %Lbx1, %WM_SETFONT, hFont1, 0
        Dialog Show Modal hDlg, Call ShowDlg1Proc To lRslt
    #PBForms BEGIN CLEANUP %Dlg1
        DeleteObject hFont1
        Function = lRslt
    End Function
    Class LBxC
        Instance meHDlg As Long
        Instance meID As Long
        Interface LBxI
            Inherit IUnknown
            Method INI(ByVal hDlg As Long, ByVal Id As Long)
                meHDlg = hDlg
                meID = Id
            End Method
            Method SetHorizontal(ByVal Count As Long)
                Local hCntrl&
                Control Handle meHDlg, meID To hCntrl&
                SendMessage hCntrl&, %LB_SETHORIZONTALEXTENT, Count, 0
            End Method
            Method Clear(Opt doEventsCount As Long)
                ListBox Reset meHDlg, meID
                If VarPtr(doEventsCount) Then me.DoEventsCount(doEventsCount)
            End Method
            Method Add(ByVal s As String, Opt doEventsCount As Long) As String
                ListBox Add meHDlg, meID, s
                If VarPtr(doEventsCount) Then me.DoEventsCount(doEventsCount)
            End Method
        End Interface
        Class Method DoEventsCount(ByVal Count As Long)
            Local i As Long
            For i = 1 To Count
                Dialog DoEvents
            Next i
        End Method
    End Class
    Class TimerC
        Instance startTime As Double
        Instance stopTime As Double
        Interface TimerI
            Inherit IUnknown
            Method Start()
                startTime = Timer
            End Method
            Method Stop()
                stopTime = Timer
            End Method
            Method Get() As String
                Method = "    Time: " + Format$(stopTime - startTime, "#####.###############")
            End Method
            Method GetTime() As Double
                Method = stopTime - startTime
            End Method
        End Interface
    End Class
    Dead Theory Walking
    Range Trie Tree
    HLib ~ Free Data Container Lib ~ Arrays, Lists, Stacks, Queues, Deques, Trees, Hashes