Announcement

Collapse
No announcement yet.

Binary Search

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Binary Search

    I posted this in the Source Code forum, but wanted to discuss it here.

    I needed a binary search and was surprised that despite lots of hits on the search term, the actual code examples were far fewer.

    Perhaps that's because it's thought to be so simple? But despite the relatively few lines of code in the result, I managed to spend far more time than I expected to get the final code (I won't admit how much time!).

    So, my next step would be to speed it up even more. But I knew that if I posted the code, someone would say "Hey, I did this back in '85 and it still works - better than that turtle code Gary posted!"

    One more thing - in cases where the search term was not found, I want to know where the search term "would have" been. It lets me get suggested terms when I use the binary search in spell check code.

    Code:
    Function BinaryWordSearch(sWord As String, iPos&) As Long
        'search for sWord$ in WordList(), which is a Global array
        'return 1 if found, 0 otherwise
        'iPos& is UBound(WordList) + 1 if searchterm > all values in array
        'iPos& is -1 if searchterm < all values in array
        'otherwise, iPos is position found, or position where search term should have been found (immediately below 1st item greater)
        Local Upper As Long, Lower As Long
        Lower = LBound(WordList) : Upper = UBound(WordList)
    
        'test boundary values
        If sWord = WordList(Lower) Then iPos& = Lower : Function = 1 : Exit Function
        If sWord = WordList(Upper) Then iPos& = Upper : Function = 1 : Exit Function
        If sWord < WordList(Lower) Then iPos& = Lower - 1 : Function = 0 : Exit Function
        If sWord > WordList(Upper) Then iPos& = Upper + 1 : Function = 0 : Exit Function
    
        Do Until (Upper <= (Lower+1))
            iPos& = (Lower + Upper) / 2
            Select Case sWord
               Case > WordList(iPos&) :  Lower = iPos&
               Case < WordList(iPos&) :  Upper = iPos&
               Case WordList(iPos&)   :  Function = 1 : Exit Function
            End Select
        Loop
    End Function
    And here's a compilable example.
    Code:
    'Compilable Example:
    'This example declares sWord as STRING, search for a match in a sorted string array. 
    'Also, be aware that in code such as this, which uses the greater than or less than 
    'symbols, that string comparisons are being made and that "0600" will come after "06"
    '(the numeric values of 600 and 6 are NOT used to make the comparison).
    #Compile Exe
    #Dim All
    #Include "Win32API.inc"
    Global hDlg as Dword, WordList() As String
    Function PBMain() As Long
       Local i As Long
       ReDim WordList(1000) 
       For i = 0 to 700 : WordList(i) = Format$(i,"00000") : Next i    'these two lines leave out "00701"
       For i = 701 to 1000 : WordList(i) = Format$(i+1,"00000") : Next i
       Dialog New Pixels, 0, "Test Code",300,300,200,200, %WS_OverlappedWindow To hDlg
       Control Add Button, hDlg, 100,"Search", 50,10,100,20
       Control Add TextBox, hDlg, 200,"00000", 50,35,100,20
       Dialog Show Modal hDlg Call DlgProc
    End Function
    
    CallBack Function DlgProc() As Long
       If CB.Msg = %WM_Command AND CB.Ctl = 100 AND CB.Ctlmsg = %BN_Clicked Then
          Local sWord$, iPos&, i As Long
          Control Get Text hDlg, 200 To sWord$
          If BinaryWordSearch(sWord$, iPos&) Then
             MsgBox sWord$ + " was found at array position " + Str$(iPos&) + "."
          Else
             MsgBox sWord$ + " was not found.  It should have been at position " + Str$(iPos&) + "."
          End If
          If LinearWordSearch(sWord$, iPos&) Then
             MsgBox sWord$ + " was found at array position " + Str$(iPos&) + "."
          Else
             MsgBox sWord$ + " was not found.  It should have been at position " + Str$(iPos&) + "."
          End If
          SpeedTest sWord$
       End If
    End Function
    
    Function BinaryWordSearch(sWord As String, iPos&) As Long
        'search for sWord$ in WordList(), which is a Global array
        'return 1 if found, 0 otherwise
        'iPos& is UBound(WordList) + 1 if searchterm > all values in array
        'iPos& is -1 if searchterm < all values in array
        'otherwise, iPos is position found, or position where search term should have been found (immediately below 1st item greater)
        Local Upper As Long, Lower As Long
        Lower = LBound(WordList) : Upper = UBound(WordList)
    
        'test boundary values
        If sWord = WordList(Lower) Then iPos& = Lower : Function = 1 : Exit Function
        If sWord = WordList(Upper) Then iPos& = Upper : Function = 1 : Exit Function
        If sWord < WordList(Lower) Then iPos& = Lower - 1 : Function = 0 : Exit Function
        If sWord > WordList(Upper) Then iPos& = Upper + 1 : Function = 0 : Exit Function
    
        Do Until (Upper <= (Lower+1))
            iPos& = (Lower + Upper) / 2
            Select Case sWord
               Case > WordList(iPos&) :  Lower = iPos&
               Case < WordList(iPos&) :  Upper = iPos&
               Case WordList(iPos&)   :  Function = 1 : Exit Function
            End Select
        Loop
    End Function
    
    Function LinearWordSearch(sWord As String, iPos&) As Long
        'search for sWord$ in WordList(), which is a Global array
        'return 1 if found, 0 otherwise
        'iPos& is UBound(WordList) + 1 if searchterm > all values in array
        'iPos& is -1 if searchterm < all values in array
        'otherwise, iPos is position found, or position where search term should have been found (immediately below 1st item greater)
        Local i As Long
        For i = 0 to UBound(WordList)
            Select Case sWord
               Case > WordList(i)    'no action, keep looping  
               Case < WordList(i) :  iPos& = i - 1 : Function = 0 : Exit Function 
               Case WordList(i)   :  iPos& = i : Function = 1 : Exit Function
            End Select
        Next i
        iPos& = i
    End Function
    
    Sub SpeedTest (sWord As String)
       'speed test - Binary
       Dim iStart As Long, iEnd As Long, i As Long, iPos&
       iStart = GetTickCount
       For i = 1 To 10000 : BinaryWordSearch(sWord,iPos&) : Next i
       iEnd = GetTickCount
       MsgBox "Binary:   " + Format$((iEnd - iStart)/1000,3) & " seconds"
       'speed test - Linear
       iStart = GetTickCount
        For i = 1 To 10000 : LinearWordSearch(sWord,iPos&) : Next i
       iEnd = GetTickCount
       MsgBox "Linear:   " + Format$((iEnd - iStart)/1000,3) & " seconds"
    End Sub
    
    'gbs_00396

  • #2
    Originally posted by Gary Beene View Post
    I needed a binary search ...
    Why did it have to be binary?

    Comment


    • #3
      Gary, I'm getting "found record positions" that are one less than their actual positions, eg. 00543 said it was found at position 543, but it is actually at position 544. All that's necessary to fix it I think, is to increment each found position by 1.

      added: Wait, you seem to have a zero based found position, so in that case, forget that increment idea.
      Last edited by John Gleason; 16 Oct 2009, 12:01 PM.

      Comment


      • #4
        Chris,

        Nothing special about binary, it's just an approach that works better than a linear one - plus it's pretty simple code (well, now it is, that I got past my bugs) and not much maintenance involved.

        I'm happy to see other options.

        Pointers, assembly langauge, other algorithms?

        I have a working spell checker now for gbSnippets, and with a library of 20K words, even a linear search seems fast. But I wanted faster because I'd like to support 100K size word libraries.

        Comment


        • #5
          I made a one million word dictionary, and it finds records by your binary search in .00003 seconds on my slow laptop. That's pretty dang fast. You could maybe do an integer rather than string search for a little more speed, but wow, that's pretty dang fast.

          Comment


          • #6
            that works better than a linear one
            That's for sure. It works about 10,000 times better on a big million word dictionary. I think you can safely say binary is the way to go.

            Comment


            • #7
              John,
              Thanks for trying the bigger size array. Those results should let me spell check any gbSnippets snippet plenty fast enough to avoid a user complaint about the spell check time!

              Comment


              • #8
                Gary, as fast as it is, the little tiny SELECT CASE structure "bothered" me a bit, so I converted it to IF...THEN...ELSE and, drum roll... 3x faster. For such a little change in code for that kind of speed, I had to mention it.
                Code:
                FUNCTION BinaryWordSearch(sWord AS STRING, iPos&) AS LONG
                    'search for sWord$ in WordList(), which is a Global array
                    'return 1 if found, 0 otherwise
                    'iPos& is UBound(WordList) + 1 if searchterm > all values in array
                    'iPos& is -1 if searchterm < all values in array
                    'otherwise, iPos is position found, or position where search term should have been found (immediately below 1st item greater)
                    LOCAL Upper AS LONG, Lower AS LONG
                    Lower = LBOUND(WordList) : Upper = UBOUND(WordList)
                
                    'test boundary values
                    IF sWord = WordList(Lower) THEN iPos& = Lower : FUNCTION = 1 : EXIT FUNCTION
                    IF sWord = WordList(Upper) THEN iPos& = Upper : FUNCTION = 1 : EXIT FUNCTION
                    IF sWord < WordList(Lower) THEN iPos& = Lower - 1 : FUNCTION = 0 : EXIT FUNCTION
                    IF sWord > WordList(Upper) THEN iPos& = Upper + 1 : FUNCTION = 0 : EXIT FUNCTION
                
                    DO UNTIL (Upper <= (Lower+1))
                        iPos& = (Lower + Upper) / 2
                        IF sWord > WordList(iPos&) THEN
                           Lower = iPos&
                        ELSEIF sWord < WordList(iPos&) THEN
                           Upper = iPos&
                        ELSE
                           FUNCTION = 1 : EXIT FUNCTION
                        END IF
                    LOOP
                END FUNCTION

                Comment


                • #9
                  Way to go John!

                  Yes, little change for big result - with no penalty of understandability - nice!


                  (is understandability a word? seems right in this case)

                  Comment


                  • #10
                    Very cool, too, in that this next function of mine - the linear search version - uses
                    the same structure. So I can change this in the same fashion as you just posted. Plus,
                    I can eliminate the "continue, no action" block (should have done it already, but it made
                    it more readable). So even the slowest approach gets a boost.

                    Code:
                    Function LinearWordSearch(sWord As String, iPos&) As Long
                        'search for sWord$ in WordList(), which is a Global array
                        'return 1 if found, 0 otherwise
                        'iPos& is UBound(WordList) + 1 if searchterm > all values in array
                        'iPos& is -1 if searchterm < all values in array
                        'otherwise, iPos is position found, or position where search term should have been found (immediately below 1st item greater)
                        Local i As Long
                        For i = 0 to UBound(WordList)
                            Select Case sWord
                               Case > WordList(i)    'no action, keep looping  
                               Case < WordList(i) :  iPos& = i - 1 : Function = 0 : Exit Function 
                               Case WordList(i)   :  iPos& = i : Function = 1 : Exit Function
                            End Select
                        Next i
                        iPos& = i
                    End Function
                    After changes, it becomes:

                    Code:
                    Function LinearWordSearch(sWord As String, iPos&) As Long
                        'search for sWord$ in WordList(), which is a Global array
                        'return 1 if found, 0 otherwise
                        'iPos& is UBound(WordList) + 1 if searchterm > all values in array
                        'iPos& is -1 if searchterm < all values in array
                        'otherwise, iPos is position found, or position where search term should have been found (immediately below 1st item greater)
                        Local i As Long
                        For i = 0 to UBound(WordList)
                            If sWord < WordList(i)  Then
                                 iPos& = i - 1 : Function = 0 : Exit Function 
                            ElseIf sWord =  WordList(i)  Then
                                 iPos& = i : Function = 1 : Exit Function
                            End If
                        Next i
                        iPos& = i
                    End Function
                    Re-usability is king!
                    Last edited by Gary Beene; 16 Oct 2009, 03:22 PM.

                    Comment


                    • #11
                      Originally posted by Gary Beene View Post
                      (is understandability a word? seems right in this case)
                      FWIW: Yes! understandability is a word, but clarity might be better.

                      Comment


                      • #12
                        , someone would say "Hey, I did this back in '85 and it still works - better than that turtle code Gary posted!"
                        Oh, come on. It wasn't 1985:
                        Binary Search of an array February 14 2000, July 15 2003

                        MCM
                        Michael Mattias
                        Tal Systems (retired)
                        Port Washington WI USA
                        [email protected]
                        http://www.talsystems.com

                        Comment


                        • #13
                          I suspect that a recursive version will be slower, but here's one anyway - untested and possibly toxic.

                          Code:
                          '--------------------------------------------------------------------------
                          ' gword is a global string containing the word to search for
                          ' gwordlist is a global array of words to search
                          ' returns 0 if not found else index of word found in array
                          function chop (llow as long, lhigh as long) as long
                              local p as long
                              local s as string
                              local diff as long
                          
                              diff = lhigh-llow
                              p = low + diff
                              if lhigh <= llow then function = 0: exit function
                          
                              if gWord <> gWordList(p) then
                                  chop (llow, llow + ((diff)/2) )
                                  chop (llow + ((diff)/2), lhigh)
                              else
                                  function =  p
                              end if
                          end function
                          Last edited by Chris Holbrook; 16 Oct 2009, 04:23 PM.

                          Comment


                          • #14
                            >Nothing special about binary, it's just an approach that works better than a linear one

                            If a table has 'n' enties, then on average you will make n/2 string comparsions to determine if the item is the table. (n when not found, 1 when it's at the beginning)

                            But doing a binary search, the maximum number of string comparisons you will do is LOG2(n) + 1.


                            Of course there is no such thing as a free lunch:
                            1. The comparison table must be sorted, this is also not a free lunch.
                            2. Only tests for equality are valid - cannot use to test for greater than/less than.
                            3. You gain nothing unless you are checking against at least nine items.

                            MCM
                            Michael Mattias
                            Tal Systems (retired)
                            Port Washington WI USA
                            [email protected]
                            http://www.talsystems.com

                            Comment


                            • #15
                              Related:
                              If you are checking data you are collecting to make a list of the unique items found, you can use an array scan/binary search. However, maintaining an array in sorted order when adding items to it is tricky (unless you re-sort the entire list, which can be a loser if you have "lots" of items).

                              This might be a more efficient approach to building a list of 'unique' things found:
                              Count unique keys and occurrences thereof in sequential file

                              (Darned, did he use a DISK FILE and not RAM? Horrors!)

                              MCM
                              Michael Mattias
                              Tal Systems (retired)
                              Port Washington WI USA
                              [email protected]
                              http://www.talsystems.com

                              Comment


                              • #16
                                Another variation, with pointers.

                                Code:
                                'Compilable Example:
                                'This example declares sWord as STRING, search for a match in a sorted string array.
                                'Also, be aware that in code such as this, which uses the greater than or less than
                                'symbols, that string comparisons are being made and that "0600" will come after "06"
                                '(the numeric values of 600 and 6 are NOT used to make the comparison).
                                #Compile Exe
                                #Dim All
                                #Include "c:\pbwin90\winapi\Win32API.inc"
                                Global hDlg As Dword, WordList() As String
                                Function PBMain() As Long
                                   Local i As Long
                                   ReDim WordList(10000)
                                   For i = 0 To 7000 : WordList(i) = Format$(i,"00000") : Next i    'these two lines leave out "00701"
                                   For i = 7001 To 10000 : WordList(i) = Format$(i+1,"00000") : Next i
                                   Dialog New Pixels, 0, "Test Code",300,300,200,200, %WS_OverlappedWindow To hDlg
                                   Control Add Button, hDlg, 100,"Search", 50,10,100,20
                                   Control Add TextBox, hDlg, 200,"00000", 50,35,100,20
                                   Dialog Show Modal hDlg Call DlgProc
                                End Function
                                
                                CallBack Function DlgProc() As Long
                                   If Cb.Msg = %WM_Command And Cb.Ctl = 100 And Cb.CtlMsg = %BN_Clicked Then
                                      Local sWord$, iPos&, i As Long
                                      Local a$
                                      
                                      Control Get Text hDlg, 200 To sWord$
                                      a$ = "Binary  "
                                      If BinaryWordSearch(sWord$, iPos&) Then
                                         a$ = a$ + sWord$ + " was found at array position " + Str$(iPos&) + "."
                                      Else
                                         a$ = a$ +  sWord$ + " was not found.  It should have been at position " + Str$(iPos&) + "."
                                      End If
                                      a$ = a$ + $CrLf
                                      a$ = a$ + "Binary1 "
                                      If BinaryWordSearch1(sWord$, iPos&) Then
                                         a$ = a$ +  sWord$ + " was found at array position " + Str$(iPos&) + "."
                                      Else
                                         a$ = a$ +  sWord$ + " was not found.  It should have been at position " + Str$(iPos&) + "."
                                      End If
                                      a$ = a$ + $CrLf
                                      a$ = a$ + "Linear   "
                                      If LinearWordSearch(sWord$, iPos&) Then
                                         a$ = a$ +  sWord$ + " was found at array position " + Str$(iPos&) + "."
                                      Else
                                         a$ = a$ +  sWord$ + " was not found.  It should have been at position " + Str$(iPos&) + "."
                                      End If
                                      MsgBox a$
                                      SpeedTest sWord$
                                   End If
                                
                                End Function
                                
                                Function BinaryWordSearch(sWord As String, iPos&) As Long
                                    'search for sWord$ in WordList(), which is a Global array
                                    'return 1 if found, 0 otherwise
                                    'iPos& is UBound(WordList) + 1 if searchterm > all values in array
                                    'iPos& is -1 if searchterm < all values in array
                                    'otherwise, iPos is position found, or position where search term should have been found (immediately below 1st item greater)
                                    Local Upper As Long, Lower As Long
                                    Lower = LBound(WordList) : Upper = UBound(WordList)
                                
                                    'test boundary values
                                    If sWord = WordList(Lower) Then iPos& = Lower : Function = 1 : Exit Function
                                    If sWord = WordList(Upper) Then iPos& = Upper : Function = 1 : Exit Function
                                    If sWord < WordList(Lower) Then iPos& = Lower - 1 : Function = 0 : Exit Function
                                    If sWord > WordList(Upper) Then iPos& = Upper + 1 : Function = 0 : Exit Function
                                
                                    Do Until (Upper <= (Lower+1))
                                        iPos& = (Lower + Upper) / 2
                                        Select Case sWord
                                           Case > WordList(iPos&) :  Lower = iPos&
                                           Case < WordList(iPos&) :  Upper = iPos&
                                           Case WordList(iPos&)   :  Function = 1 : Exit Function
                                        End Select
                                    Loop
                                End Function
                                
                                Function BinaryWordSearch1(ByVal sWord As String,ByRef iPos As Long) As Long
                                    ' this method is about 10 times faster then the above, but it's really only
                                    ' good for where the string lengths are the same. If not then you'd have to 
                                    ' pad the shorter one & that might blow the time savings.
                                    
                                    'search for sWord$ in WordList(), which is a Global array
                                    'return 1 if found, 0 otherwise
                                    'iPos& is UBound(WordList) + 1 if searchterm > all values in array
                                    'iPos& is -1 if searchterm < all values in array
                                    'otherwise, iPos is position found, or position where search term should have been found (immediately below 1st item greater)
                                    Local Upper As Long, Lower As Long
                                    Local done&,l&
                                    Local p&,sptr As Byte Ptr, wptr As Byte Ptr
                                    sptr = StrPtr(sWord)
                                   
                                    Lower = LBound(WordList) : Upper = UBound(WordList)
                                
                                    'test boundary values
                                    If sWord = WordList(Lower) Then iPos& = Lower : Function = 1 : Exit Function
                                    If sWord = WordList(Upper) Then iPos& = Upper : Function = 1 : Exit Function
                                    If sWord < WordList(Lower) Then iPos& = Lower - 1 : Function = 0 : Exit Function
                                    If sWord > WordList(Upper) Then iPos& = Upper + 1 : Function = 0 : Exit Function
                                
                                    Do Until (Upper <= (Lower+1))
                                        GoSub Compare
                                        If done& Then Exit Do 
                                    Loop                     
                                    Function = done&
                                    Exit Function
                                    
                                Compare:
                                    wptr = StrPtr(wordlist(iPos&))
                                    l& = Len(sWord)
                                    Decr l&
                                    For p& = 0 To l&
                                        If @sptr[p&] > @wptr[p&] Then lower = ipos:Return
                                        If @sptr[p&] < @wptr[p&] Then upper = ipos:Return
                                    Next                    
                                    done& = 1
                                    Return
                                End Function
                                
                                Function LinearWordSearch(sWord As String, iPos&) As Long
                                    'search for sWord$ in WordList(), which is a Global array
                                    'return 1 if found, 0 otherwise
                                    'iPos& is UBound(WordList) + 1 if searchterm > all values in array
                                    'iPos& is -1 if searchterm < all values in array
                                    'otherwise, iPos is position found, or position where search term should have been found (immediately below 1st item greater)
                                    Local i As Long
                                    For i = 0 To UBound(WordList)
                                        Select Case sWord
                                           Case > WordList(i)    'no action, keep looping
                                           Case < WordList(i) :  iPos& = i - 1 : Function = 0 : Exit Function
                                           Case WordList(i)   :  iPos& = i : Function = 1 : Exit Function
                                        End Select
                                    Next i
                                    iPos& = i
                                End Function
                                
                                Sub SpeedTest (sWord As String)
                                   'speed test - Binary
                                    Dim i As Long, iPos&
                                    Local txq As Quad 
                                    Local txq1 As Quad 
                                    Local txq2 As Quad      
                                    MousePtr 11
                                
                                ' getickcount is handy, but not very accurate.  You can do the same job over and over
                                ' and get different results. Tix at least gives you a cpu count. Not time, but more 
                                ' accurate then gettickcount
                                '                    
                                    '
                                    tix txq
                                    For i = 1 To 1000 : BinaryWordSearch(sWord,iPos&) : Next i
                                    Tix End txq
                                      
                                    tix txq1
                                    For i = 1 To 1000 : BinaryWordSearch1(sWord,iPos&) : Next i
                                    Tix End txq1
                                    
                                    'speed test - Linear
                                    tix txq2
                                    For i = 1 To 1000 : LinearWordSearch(sWord,iPos&) : Next i
                                    
                                    Tix End txq2
                                    MousePtr 1
                                    MsgBox "Binary:   " + Format$(txq,"000,000,000,000") + " Cpu Clock Cycles" + $CrLf +_
                                        "Binary1: " + Format$(txq1,"000,000,000,000") + " Cpu Clock Cycles"+ $CrLf +_
                                        "Linear:   " + Format$(txq2,"000,000,000,000") + " Cpu Clock Cycles"
                                End Sub
                                "There are two novels that can change a bookish fourteen-year old's life: The Lord of the Rings and Atlas Shrugged. One is a childish fantasy that often engenders a lifelong obsession with its unbelievable heroes, leading to an emotionally stunted, socially crippled adulthood, unable to deal with the real world. The other, of course, involves orcs." - John Rogers

                                Comment


                                • #17
                                  You can't beat a Trie Tree for a dictionary.
                                  Find all 100,000 keys in a set of 100,000, in: 0.0189999 seconds

                                  Added:
                                  Also, prefix lookup. Which is just as fast.


                                  File Range Trie DB - world’s fastest local database?

                                  Note 2: applications like IntelliSense use a Trie.
                                  (don’t know if IntelliSense uses a Trie)
                                  Only one search is made on a prefix.
                                  All key words below the prefix in the tree, match the prefix.
                                  http://en.wikipedia.org/wiki/Trie

                                  .
                                  Last edited by Stanley Durham; 18 Oct 2009, 12:23 PM.
                                  stanthemanstan~gmail
                                  Dead Theory Walking
                                  Range Trie Tree
                                  HLib ~ Free Data Container Lib ~ Arrays, Lists, Stacks, Queues, Deques, Trees, Hashes

                                  Comment


                                  • #18
                                    While binary searches are cool, and very fast, I always had a problem when there were duplicate entries (say a number of invoices wirth the same date for example). While I haven't fooled wth it for years, I used to use a For/Next method of searching that was extremely fast at finding the FIRST item in a sorted series. Here's some sample code put together from memory.

                                    It finds (or not) a worst case number in unmeasurably quick time.

                                    ''For/Next search of sorted list
                                    '
                                    Code:
                                    'PBWIN 9.02 - Win32Api 10/2009 - XP Pro SP
                                    #Compile Exe                                
                                    #Dim All 
                                    #Optimize SPEED  'Fly baby Fly!!!
                                    #Debug Display On '<<<<<<<<<<<<<<< Remember to turn off for production code
                                     
                                    #Include "WIN32API.INC"
                                    '
                                    Function PBMain         
                                      Local Closest_Match, Found_At, Searching_For, flag, Num_Data_Records, ctr, ctr_tmp, stp, strt As Long
                                      Local strt1, Lotsa_Numbers() As Long
                                      Local tmr As Double
                                      Local s As String                         
                                      Local last_step, lastcheck As Long
                                      '                   
                                      'create dummy array                        
                                      Randomize Timer 'mix em up  
                                      tmr = Timer
                                      Num_Data_Records = 10000000  'lotta records  10 mil takes over a minute to set up
                                      Dim Lotsa_Numbers(1 To Num_Data_Records) 
                                      For ctr = LBound(Lotsa_Numbers()) To UBound(Lotsa_Numbers()) 
                                           Lotsa_Numbers(ctr) = Rnd(1, Num_Data_Records) 'should be plenty - duplicates
                                      Next ctr                                          
                                      
                                        'find a number unrem one of these
                                       Searching_For = Lotsa_Numbers(Rnd(1, Num_Data_Records)) 'find number to look for that exists             
                                      ' Searching_For = Rnd(1, Num_Data_Records) 'find number to look for that may not exist             
                                       'Searching_For =  Num_Data_Records * .95882 'find worst case number to look for that may not exist              
                                    '
                                      Array Sort Lotsa_Numbers() 'put it in order now
                                      
                                      ? Using$("#.# seconds to set up ", Timer - tmr )
                                       
                                      Stp = Num_Data_Records \ 10  'start jumping by tenths of a file
                                      Strt = 1   
                                      tmr# = Timer      
                                      Flag = 0               
                                    'start search loop  ************   
                                      Do              
                                      If stp < 1 Then stp = 1 'JIC   
                                        Flag = stp
                                        strt1 = strt
                                      For ctr = strt To UBound(Lotsa_Numbers()) Step stp     
                                         last_step = stp
                                          lastcheck = ctr
                                      
                                         If Lotsa_Numbers(ctr) = Searching_For Then 
                                            Found_At = ctr             
                                            Exit For 'matched so don't look any further
                                         End If          
                                         If Lotsa_Numbers(ctr) > Searching_For Then Exit For 'greater so don't look any further
                                          strt = ctr  'if we got here it must be less so set next starting point
                                      Next ctr                        
                                      Closest_Match = ctr
                                    '                         
                                      Stp = Stp \ 10 'search in smaller increments
                                    '  
                                      Loop While flag <> 1 'not there yet                               
                                    ' done loop ************  
                                    '  
                                      
                                      If Found_At Then
                                         s$ = Using$("Found #, in #.### seconds in #, Records at #, ", _
                                                     Searching_For, Timer - tmr, Num_Data_Records, Found_At) & $CrLf 
                                        Else 
                                         s$ = Using$("Didn't find #, in #.### seconds in #, Records  ", _
                                                     Searching_For, Timer - tmr, Num_Data_Records) & $CrLf 
                                         Found_At = lastcheck 'for display             
                                        s$ = s$ & Using$("Closet was at # ",Found_At) & $CrLf & $CrLf 
                                      End If               
                                      
                                       For ctr = Found_At - 5 To  Found_At + 5   'show before and after                                          
                                          If ctr < LBound(Lotsa_Numbers()) Then Exit For 'JIC
                                          If ctr > UBound(Lotsa_Numbers()) Then Exit For 'JIC
                                          If Lotsa_Numbers(ctr) = Searching_For Then s$ = s$ & "*"
                                          s$ = s$ & Using$("#,)  ####,####,####,####", ctr, Lotsa_Numbers(ctr)) & $CrLf 
                                       Next ctr
                                      ? s$,, Using$("looked for #, ", Searching_For)
                                      '1st occurence of Data_Record is at Ctr
                                    End Function 
                                    '
                                    =============================================
                                    "Opportunities multiply as they are seized."
                                    Sun Tzu
                                    =============================================
                                    It's a pretty day. I hope you enjoy it.

                                    Gösta

                                    JWAM: (Quit Smoking): http://www.SwedesDock.com/smoking
                                    LDN - A Miracle Drug: http://www.SwedesDock.com/LDN/

                                    Comment


                                    • #19
                                      I do something like that in my 837 (Health Care Claim and Encounter) API...

                                      I have many dup keys in that map.

                                      I wrote a function to do binary search for any hit; then I sequentially go BACKWARD in the table to find the FIRST occurence of the key and return that subscript to the calling procedure, which then processes sequentially until the key changes.

                                      Key size is 12 alpha.
                                      1100+ entries in table
                                      Table is searched anywhere from 100 to 100,000 times per run depending on input file size. It can be in the millions, but that is quite rare. My 'Killer' test file does a little over 4 million searches.


                                      MCM
                                      Michael Mattias
                                      Tal Systems (retired)
                                      Port Washington WI USA
                                      [email protected]
                                      http://www.talsystems.com

                                      Comment


                                      • #20
                                        I posted and forgot to mention the For/Next Method above is immeasurably fast. I tried it up to 100 million array and it it never took as much as 1/100 of a second to (or not) find a match.

                                        Of course Longs are optimized and strings would be slower but still ...
                                        It's a pretty day. I hope you enjoy it.

                                        Gösta

                                        JWAM: (Quit Smoking): http://www.SwedesDock.com/smoking
                                        LDN - A Miracle Drug: http://www.SwedesDock.com/LDN/

                                        Comment

                                        Working...
                                        X