Announcement

Collapse
No announcement yet.

Fastest possible string search

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Fastest possible string search

    I'm opening big files using ReadFile et al APIs, and reading them to a string.
    What would be the fastest way to search these strings for other strings.
    I'm currently using InStr, but wondering if there's anything faster.

    Even better, would be to search a Byte Array for s string.

    Thanks,

    - Nathan.

  • #2
    Nathan,

    In most instances INSTR is easily fast enough for string searches
    but in some specialised cases, usually very long strings or byte
    sequences, a different type of search algorithm can be used if you
    are willing to do the extra work to set it up.

    I posted a while ago a Boyer Moore exact pattern matching algorithm
    that has generally tested up OK so it may be useful to you. I cannot
    yet verify that it matches every pattern that can be found in the
    data being searched but it appears to be working OK.

    The Boyer Moore search algo has a different logic to a classic byte
    scanner and it is well suited to search large bodies of data but it has
    a higher startup overhead that makes it unsuitable for short recursive
    searching.

    Regards,

    [email protected]

    ------------------
    hutch at movsd dot com
    The MASM Forum

    www.masm32.com

    Comment


    • #3
      I will be searching the possibly very large string for smaller strings of a few bytes.
      So the Boyer-Moore is only efficient when working with Larger searches?
      Maybe i could use InStr for small files, and Boyer-Moore for larger..?

      A method that works best searching a byte array would be best,
      since i could remove the Byte->Unicode conversion on my OpenFile sub.

      Thanks

      - Nathan


      ------------------

      Comment


      • #4
        I can't get Steve's boyermoore algo to work. It returns -2 no matter what i search for.

        My code:

        Code:
        '  Boyer-Moore Defines
           DIM Fnum     AS LONG
           DIM slen     AS LONG
           DIM src      AS LONG
           DIM ln       AS LONG
           DIM pat      AS LONG
           DIM pln      AS LONG
           DIM spos     AS LONG
           DIM lnRet1   AS LONG
           DIM pattern  AS STRING
           DIM szData() AS BYTE
        '  -------------------   
        
        'code here that fills szData() byte array.
        
            pattern = "a"
            src  = STRPTR(szData$)       ' source BYTE OR TEXT DATA address
            ln   = UBOUND(szData)        ' this would usually be supplied when you OPEN the file
            pat  = STRPTR(pattern$)      ' pattern address
            pln  = LEN(pattern$)         ' pattern length
            spos = 0                     ' where TO start IN the file (0 based offset)
            slen = BMBinSearch(spos,src,ln,pat,pln)
        What causes it to return -2 ?

        Thanks.

        -Nath.

        Comment


        • #5
          Ok, it seems it returns -2 is the pattern is too small.

          I used a larger pattern, and the -2 goes.

          BUT.. now, no matter what i put as the pattern above 2 characters
          it returns 625.

          Can someone explain if these are locations in the byte array? Or return codes..


          Thanks,

          -Nathan.

          ------------------

          Comment


          • #6
            If any one interested: here are the modules im using. I found them
            here on the forums, from Steve.

            Code:
            'BOYER MOORE*BOYER MOORE*BOYER MOORE*BOYER MOORE*BOYER MOORE*BOYER MOORE*BOYER MOORE
            FUNCTION create_shift_table(srch$) AS STRING
            
                  #REGISTER NONE
            
                  LOCAL chTable AS STRING     ' string to build table in
                  LOCAL lpTable AS LONG       ' address of table
                  LOCAL lpSrch  AS LONG       ' address of search string
                  LOCAL ssl     AS LONG       ' search string length
            
                  chTable = SPACE$(1024)      ' allocate 256 * 4 bytes
                  ssl     = LEN(srch$)        ' get search string length
                  lpSrch  = STRPTR(srch$)     ' get search string address
                  lptable = STRPTR(chTable)   ' get table address
            
                ' ------------------------------------------------------
                ' fill DWORD table with the length of the search string
                ' ------------------------------------------------------
            
                  ! cld                   ; clear direction flag to read forward
            
                  ! mov ecx, 256
                  ! mov edi, lpTable
                  ! mov eax, ssl          ; write search string length to table
                  ! rep stosd
            
                  ! mov ecx, ssl          ; put search string length in ECX
                  ! dec ecx
                  ! mov esi, lpSrch       ; address search string
                  ! mov edi, lpTable      ; table address
            
                  ! xor eax, eax          ; zero EAX
            
                ' -----------------------------------------------
                ' write the shift vaue for each search character
                ' into the correct position in the shift table
                ' -----------------------------------------------
                Write_Shift_Characters:
                  ! mov al, [esi]         ; get the character
                  ! inc esi
                  ! mov [edi+eax*4], ecx  ; write shift for each character
                  ! dec ecx               ; to ascii location in table
                  ! jnz Write_Shift_Characters
            
                  FUNCTION = chTable
            
            END FUNCTION
            
            
            FUNCTION BMsearch(BYVAL lp_shift AS LONG, _
                                BYVAL startpos AS LONG, _
                                BYVAL lpsource AS LONG, _
                                BYVAL lnsource AS LONG, _
                                BYVAL lptarget AS LONG, _
                                BYVAL lntarget AS LONG) AS LONG
            
                  #REGISTER NONE
            
                  LOCAL resetpos AS LONG
                  LOCAL buffrend AS LONG
            
                  ! mov esi, lpsource     ; source address
                  ! mov edi, lpTarget     ; target address
            
                  ! mov ecx, lntarget
                  ! dec ecx               ; correct back one to read last character pair
            
                  ! add esi, ecx          ; set esi to first end compare
                  ! add edi, ecx          ; set edi to first end compare
            
                  ! mov resetpos, edi     ; set reset position for shift loop
            
                  ! mov ebx, esi          ; use EBX as shift location counter
            
                ' ------------------------------
                ' set end of source in buffrend
                ' ------------------------------
                  ! mov edx, lpsource     ; source address
                  ! add edx, lnsource     ; add source length
                  ! inc edx               ; end of buffer + 1
                  ! mov buffrend, edx
            
                  ! mov edx, lp_shift     ; use EDX as address of shift table
            
                  ! dec lntarget          ; dec target length for shift loop
                  ! xor eax, eax          ; clear EAX
            
                  ! add esi, startpos     ; move ESI forward by starting position
            
                bmMainLoop:
                  ! mov al, [esi]
                  ! dec esi
                  ! dec edi
                  ! cmp al, [edi+1]
                  ! jne bmSetShift
                  ! dec ecx
                  ! jnz bmMainLoop        ; if ECX = 0, match has occurred
            
                  ! jmp bmCalcLength      ; exit on match
            
                bmSetShift:
                  ! add ebx, [edx+(eax*4)]; add shift to location counter
                  ! mov esi, ebx          ; copy location counter + shift to ESI
                  ! mov ecx, lntarget     ; reset ECX to target length
                  ! mov edi, resetpos     ; reset EDI to end char of target string
                  ! cmp esi, buffrend     ; test end of buffer condition
                  ! jbe bmMainLoop        ; if not end of buffer, do next string compare
            
                  ! jmp bmNoMatch         ; else set no match and exit
            
                bmCalcLength:
                  ! sub esi, lpsource
                  ! mov eax, esi
                  ! jmp bmexitpos
            
                bmNoMatch:
                  ! mov eax, -1
            
                bmexitpos:
                  ! mov FUNCTION, eax
            
            END FUNCTION
            'BOYER MOORE*BOYER MOORE*BOYER MOORE*BOYER MOORE*BOYER MOORE*BOYER MOORE*BOYER MOORE

            Comment


            • #7
              Nathan,

              Try this version, the earlier one I posted had problems with repeat
              sequences in the search pattern.

              Code:
                ' #########################################################################
                
                FUNCTION BMBinSearch(ByVal startpos as LONG, _
                                     ByVal lpSource as LONG,ByVal srcLngth as LONG, _
                                     ByVal lpSubStr as LONG,ByVal subLngth as LONG) as LONG
                
                    #REGISTER NONE
                
                    LOCAL cval    as DWORD
                    LOCAL ExitLen as DWORD
                    LOCAL shift_table as STRING * 1024
                
                    ! cmp subLngth, 1
                    ! jg tooShort
                    ! mov eax, -2                 ; string too short, must be > 1
                    ! jmp Cleanup
                  tooShort:
                
                    ! mov esi, lpSource
                    ! add esi, srcLngth
                    ! sub esi, subLngth
                    ! mov ExitLen, esi            ; set Exit Length
                
                  ' ----------------------------------------
                  ' load shift table with value in subLngth
                  ' ----------------------------------------
                    ! mov ecx, 256
                    ! mov eax, subLngth
                    ! lea edi, shift_table
                    ! rep stosd
                
                  ' ----------------------------------------------
                  ' load decending count values into shift table
                  ' ----------------------------------------------
                    ! mov ecx, subLngth           ; SubString length in ECX
                    ! dec ecx                     ; correct for zero based index
                    ! mov esi, lpSubStr           ; address of SubString in ESI
                    ! lea edi, shift_table
                
                    ! xor eax, eax
                
                  Write_Shift_Chars:
                    ! mov al, [esi]               ; get the character
                    ! inc esi
                    ! mov [edi+eax*4], ecx        ; write shift for each character
                    ! dec ecx                     ; to ascii location in table
                    ! jnz Write_Shift_Chars
                
                  ' -----------------------------
                  ' set up for main compare loop
                  ' -----------------------------
                    ! mov ecx, subLngth
                    ! dec ecx
                    ! mov cval, ecx
                
                    ! mov esi, lpSource
                    ! mov edi, lpSubStr
                    ! mov ebx, esi                ; EBX as location pointer
                    ! add esi, startpos           ; add starting position
                
                  Cmp_Loop:
                    ! xor eax, eax                ; zero EAX
                    ! mov al, [esi+ecx]
                    ! cmp al, [edi+ecx]           ; cmp characters in ESI / EDI
                    ! jne Set_Shift               ; if not equal, get next shift
                    ! dec ecx
                    ! jns Cmp_Loop
                
                  Match:
                    ! sub esi, lpSource           ; sub source from ESI
                    ! mov eax, esi                ; put length in eax
                    ! jmp Cleanup
                
                  Set_Shift:
                    ! mov edi, lpSubStr           ; restore sub string address
                    ! mov edx, shift_table[eax*4] ; get char shift value
                    ! cmp edx, subLngth           ; is it pattern length ?
                    ! jne Suffix_Shift            ; if not, jump to Suffix_Shift
                
                  Bad_Char_Shift:
                    ! lea ebx, [ebx+ecx+1]        ; add bad char shift
                    ! mov ecx, cval               ; reset counter in compare loop
                    ! mov esi, ebx
                    ! cmp esi, ExitLen
                    ! jle Cmp_Loop
                
                    ! mov eax, -1
                    ! jmp Cleanup
                
                  Suffix_Shift:
                    ! add ebx, edx                ; add suffix shift
                    ! mov ecx, cval               ; reset counter in compare loop
                    ! mov esi, ebx
                    ! cmp esi, ExitLen
                    ! jle Cmp_Loop
                
                    ! mov eax, -1                 ; set value for no match
                
                  Cleanup:
                
                    ! mov FUNCTION, eax
                
                END FUNCTION
                
                ' ########################################################################
              Regards,

              [email protected]

              ------------------
              hutch at movsd dot com
              The MASM Forum

              www.masm32.com

              Comment


              • #8
                Morning,

                Shall i use the same 'create_shift_table' function?

                I don't understand how i can use this function efficiently with several hundred searches..
                As it creates a new shift table each call?

                Or am i just totally not understand Boyer Moore

                Thanks.

                -Nathan.

                Comment


                • #9
                  Perhaps I do not understand your application situation very well, but I have to wonder about your design.

                  Seems to me if you are searching a very large string (a file) for (the presence of one or more of?) several hundred smaller strings, there has got to be a more efficient method. I have a similar (I think?) search requirement I use in my ANSI EDI parser, where I load a whole file and search every character of the string for any number of small strings (e.g, "ISA", "GS", "BIG", etc) and without using any assembly language or high-math algorithms, the speed of the parser blows the socks off of any competing product. (I use pointer variables, not INSTR or REGEXPR).For example, it parses and writes an unwrapped 280Kb file in less than four seconds on my 400 mhz machine, and all the terminators and delimiters are dynamically found.

                  Maybe if you can describe your application a little more fully someone can come up with a mthod worth looking at.

                  MCM

                  Michael Mattias
                  Tal Systems (retired)
                  Port Washington WI USA
                  [email protected]
                  http://www.talsystems.com

                  Comment


                  • #10
                    Nathan,

                    The characteristic of a Boyer Moore algorithm is that it builds
                    a shift table for each different search it undertakes. If the
                    strings you are searching is big enough, the fundamental logic
                    of the algorithm will make it faster but if you are not searching
                    very long strings/byte data, INSTR will be faster.

                    With the version I posted in which I posted here a while ago, just
                    pass to it the address of both the source text and the pattern
                    being searched for, the length of each and the starting position
                    for the search. With the starting position, make sure it is not
                    past the end of where the last match can be found in the source.
                    This version builds its own shift table so you don't need to use
                    the one from the earlier version.

                    This version tested up well but I have not exhaustively tested it
                    over all possible variations so it is worth ensuring that it does
                    not miss any matches.

                    Regards,

                    [email protected]

                    ------------------
                    hutch at movsd dot com
                    The MASM Forum

                    www.masm32.com

                    Comment


                    • #11
                      I'm working on a lightweight antivirus engine.. dont laugh
                      So you have to think about all the files it will be scanning, and each file
                      has to be scanned for multiple smaller strings.

                      The "smaller" strings are about 100 bytes long. Whereas the files it will be
                      searching in, could be generally anything from 1kb to 50mb..

                      Which ever method i settle with, needs to be fast. ie: search a file of 5mb
                      for atleast 1000 signitures - in less than a second...

                      Is BoyerMoore the correct way to go? Or is there another algorithm to use for this?

                      Thanks

                      -Nathan.

                      Comment


                      • #12
                        Bob Zale's INSTR is an excellent piece of code and I doubt we can find
                        a better or faster way to do it. Hutch has posted some interesting code
                        that can take advantage of pre-set pointers and length, etc, but this
                        also means a bit more complicated coding around it. Still, Hutch is a
                        champion at this, so his code is probably the best way to do it faster.

                        I once tried to find faster way and came up with the code below. Just
                        did it to see if I could "beat" INSTR, but real life test on scanning
                        text against a database with 3000 phrases showed about same average
                        speed as INSTR. So code below has no point, since INSTR does same job
                        in about same speed and also enables backwards scanning, but I post it
                        in case someone is interested in experimenting.

                        I think, no matter what kind of algoritm one uses, in the end, every byte
                        has to be tested, so I actually can't see a better way than simple, straight
                        forward byte scanning. INSTR has my vote.
                        Code:
                        FUNCTION myINSTR(BYVAL start AS LONG, MainStr$, Search$) AS DWORD
                          IF start > 0 THEN DECR start        'search is zero based here (return is ok, 1-based)
                         
                          #REGISTER NONE                      ' using edi and esi, so..
                          LOCAL esiPos AS DWORD, pos AS DWORD, MaxEdi AS DWORD 'need these for storage purposes
                         
                          ! mov edi, Search$        ; edi = pointer to search string handle
                          ! mov edi, [edi]          ; edi = pointer to search string data
                          ! cmp edi, 0              ; if edi is zero - no search string
                          ! je getOut               ; then get out
                          ! mov ah, [edi]           ; move first char into ah
                          ! mov MaxEdi, edi         ; store search startpos in MaxEdi
                          ! mov edx, [edi-4]        ; move length of search string into edx
                          ! add MaxEdi, edx         ; add length of search string to startpos in MaxEdi
                         
                          ! mov esi, MainStr$       ; esi = pointer to main string handle
                          ! mov esi, [esi]          ; esi = pointer to main string data
                          ! cmp esi, 0              ; if esi is zero - no main string
                          ! je getOut               ; then get out
                          ! mov ebx, [esi-4]        ; store length of main string in ebx
                          ! sub ebx, [edi-4]        ; subtract search string length from main string length
                          ! inc ebx                 ; adjust for 1-based return
                         
                          ! mov ecx, [esi-4]        ; move length of main string into ecx (counter)
                          ! cmp ecx, edx            ; compare lengths, if main is shorter than search
                          ! jb getOut               ; then get out
                          ! sub ecx, [edi-4]        ; subtract search string length from main string length
                          ! sub ecx, Start          ; subtract startpos from counter
                          ! add esi, Start          ; add/set start position
                         
                          myINSTRloop:
                             ! mov al, [esi]         ; move current char into al
                             ! cmp al, ah            ; compare against first character in search string
                             ! je prepComp           ; if equal, jump to prepare
                             ! inc esi               ; get next character
                             ! dec ecx               ; decrease ecx (length) counter
                             ! jnz myINSTRloop       ; iterate if counter isn't zero (end of string)
                             ! jmp getOut            ; was zero, jump out
                         
                          prepComp:
                             ! mov esiPos, esi       ; store position in main string
                             ! mov pos, ecx          ; store counter
                         
                          compRest:
                             ! inc edi               ; next search character
                             ! cmp MaxEdi, edi       ; compare position against maxpos
                             ! je isMatch            ; if equal, we're done - all matches
                             ! dec ecx               ; else decrease counter
                             ! inc esi               ; get next main character
                             ! mov al, [esi]         ; move next main char into al
                             ! mov ah, [edi]         ; move search char into ah
                             ! cmp al, ah            ; compare the two
                             ! je compRest           ; if they match - get next char
                             ! jmp resumeLoop        ; if not - resume search
                         
                          isMatch:
                             ! sub ebx, pos          ; subtract backwards position from length to get relative pos
                             ! mov Function, ebx     ; return result (position)
                             ! jmp getOut            ; and jump out
                         
                          resumeLoop:
                             ! mov esi, esiPos       ; restore esi - position in main string
                             ! inc esi               ; point to next character
                             ! mov edi, Search$      ; restore edi - set pointer to search string handle
                             ! mov edi, [edi]        ; edi = pointer to search string data
                             ! mov ah, [edi]         ; move first char into ah again
                             ! mov ecx, pos          ; restore counter
                             ! dec ecx               ; decrease counter
                             ! jnz myINSTRloop       ; iterate if counter isn't zero (end of string)
                         
                          getOut:
                        END FUNCTION

                        ------------------

                        Comment


                        • #13
                          Nathan,

                          Scanning files for a virus byte pattern is probably an ideal
                          application for a Boyer Moore algorithm because it has the linear
                          read speed that a classic byte scanner does not have. The strength
                          of a Boyer Moore algo is that it does NOT scan every byte but finds
                          the mismatch and does not scan the rest of the source if a match is
                          not possible.

                          With virus patterns averaging about 100 bytes, you will get the
                          maximum advantage using it where if you were scanning for byte
                          patterns of 6 characters or less, a byte scanner is simply faster.

                          The overhead with a Boyer Moore algo is in the way it loads the shift
                          table for each search pattern but you need to remember that it is
                          only 256 DWORD size writes which does not take all that long.

                          The best way to do it if you have enough memory on the target machine
                          is to load the complete file into memory and scan through it one
                          pattern at a time. This may sound tedious but once its in memory
                          the following scans will be a lot faster.

                          On my PIII 600, I was getting speeds of about 300 meg/sec with patterns
                          of about 80 characters so this may be a fast enough method to do what
                          you require.

                          I would test the algorithm against the INSTR output just to make sure
                          that it does not miss any patterns, I have done substantial testing
                          on the algorithm I posted but I have not yet worked out a way to
                          prove it exhaustively.

                          Regards,

                          [email protected]

                          ------------------
                          hutch at movsd dot com
                          The MASM Forum

                          www.masm32.com

                          Comment


                          • #14
                            Borje, if you say the code you posted is no faster than InStr then
                            it probably isn't what i'm looking for. However, i will benchmark it anyway

                            Steve, okay - this Algorithm sounds great but none of the source i have
                            for it seem to work.
                            They always return 625, no matter what i search for as long as its two bytes or more... ? I'm passing it a Byte Array straight from the ReadFile api's,
                            and a String for the searchpattern. I'm really hoping to get this working :]

                            Thanking both of you for your comments.

                            - Nathan.

                            [This message has been edited by Nathan Evans (edited June 30, 2001).]

                            Comment


                            • #15
                              Steve,

                              Each file i read is returned in a Byte Array.
                              How do i pass this byte array to the BMsearch function?

                              Code:
                              '  Boyer-Moore Defines
                                 LOCAL sSearch  AS STRING
                                 LOCAL sPattern AS STRING
                                 LOCAL sTable   AS STRING
                                 LOCAL lPos     AS LONG
                                 DIM pattern    AS STRING
                                 DIM szData()   AS BYTE
                              '  ------------------- 
                              
                                 CALL fOpenFile("e:\sampleexefile.exe", szData()) 'szData() is passed ByRef to be filled with bytes from the file
                              
                                 sPattern = "P r e p a r i n g" 'a string picked from around the middle of sampleexefile.exe
                              
                                 sTable = create_shift_table(sPattern) 'sTable holds shifttable, sPattern is passed to create_shift_table as the pattern.
                              
                                       'now, a UBOUND(sTable) would return 1024
                              
                                 lPos = BMsearch( STRPTR(sTable), 0, STRPTR(szData), UBOUND(szData), STRPTR(sPattern), LEN(sPattern) )
                                 '(above)  - line errors because of [i]STRPTR(szData)[/i], err: array not dimensioned - when it is!  [img]http://www.powerbasic.com/support/forums/smile.gif[/img]
                              Thanks for your time.

                              -Nath.

                              Comment


                              • #16
                                Nathan,

                                Use the version I pasted in, it does not use the external shift table that the
                                first one did.

                                Code:
                                FUNCTION BMBinSearch(ByVal startpos as LONG, _
                                                     ByVal lpSource as LONG,ByVal srcLngth as LONG, _
                                                     ByVal lpSubStr as LONG,ByVal subLngth as LONG) as LONG
                                use either VarPtr() or StrPtr() to get the address of the source, depending on
                                how you load the source, the source length, the address of the pattern to search
                                for and the pattern length.

                                Code:
                                retval = BMBinSearch(0,lpSource,slen,lpPattern,plen)
                                See if you get a result from this.

                                Regards,

                                [email protected]


                                ------------------
                                hutch at movsd dot com
                                The MASM Forum

                                www.masm32.com

                                Comment


                                • #17
                                  Nathan, According to your snippet "szdata" is a BYTE array, so you cannot use STRPTR() on it!

                                  Maybe you need to post more of the code so we can see exactly what you are doing here, but you probably just need to pass it as BYVAL VARPTR(szData(0)) which will pass the address of the 1st subscript (0) of the szData() array (assuming szData's LBOUND is 0).



                                  ------------------
                                  Lance
                                  PowerBASIC Support
                                  mailto:[email protected][email protected]</A>
                                  Lance
                                  mailto:[email protected]

                                  Comment


                                  • #18
                                    I will try what you have suggested, Steve, Lance.. and get back to you
                                    this afternoon.



                                    -Nath

                                    Comment


                                    • #19
                                      Borje,

                                      If you have the time and still have the test handy, would you run this
                                      byte scanner through it to see how it shapes up, I only have it
                                      benchmarked in MASM when I was using this type of algo to test
                                      against the Boyer Moore ones I was working on.

                                      What I was trying to do with this one was bridge the gap between
                                      scanners and the BM type algos. It has a short main loop instruction
                                      path where my testing showed that the main time was taken with byte
                                      scanners, its branch path is reasonable but I could not se an easy way to
                                      shorten it. I unrolled the main loop by 2 so that there were no
                                      unusual effects from it being so short.

                                      Regards,

                                      [email protected]

                                      Code:
                                        '##########################################################################
                                        
                                        FUNCTION ScanString(ByVal startpos as LONG, _
                                                            ByVal source   as LONG, ByVal lnsrc as LONG, _
                                                            ByVal pattern  as LONG, ByVal lnpatn as LONG) as LONG
                                        
                                            #REGISTER NONE
                                        
                                            ! mov edi, pattern
                                            ! mov al, [edi]       ; get 1st byte
                                        
                                            ! mov esi, source
                                            ! mov ecx, lnsrc
                                            ! add esi, ecx
                                            ! neg ecx
                                            ! add ecx, startpos
                                        
                                            ! dec lnpatn
                                        
                                            ! jmp main_loop
                                        
                                          ' %%%%%%%%%%%%%%%%%%%%%%%%%%%%
                                        
                                          pre_loop:
                                            ! pop ecx
                                            ! inc ecx
                                        
                                          main_loop:
                                          ' ------------
                                          ' unroll by 2
                                          ' ------------
                                            ! cmp al, [esi+ecx]
                                            ! je pre_sub
                                            ! inc ecx
                                            ! jz nMatch
                                        
                                            ! cmp al, [esi+ecx]
                                            ! je pre_sub
                                            ! inc ecx
                                            ! jnz main_loop
                                          ' ------------
                                            ! jmp nMatch
                                        
                                          pre_sub:
                                            ! push ecx
                                            ! mov edi, pattern
                                            ! mov ebx, lnpatn
                                        
                                          sub_loop:
                                            ! mov ah, [esi+ecx+1]
                                            ! cmp ah, [edi+1]
                                            ! jne pre_loop
                                            ! inc ecx
                                            ! inc edi
                                            ! dec ebx
                                            ! jnz sub_loop
                                        
                                          ' %%%%%%%%%%%%%%%%%%%%%%%%%%%%
                                        
                                            ! pop eax
                                            ! add eax, lnsrc
                                            ! jmp Outa_Here
                                        
                                          nMatch:
                                            ! mov eax, -1
                                        
                                          Outa_Here:
                                            ! mov FUNCTION, eax
                                        
                                        END FUNCTION
                                        
                                        ' ########################################################################
                                      ------------------
                                      hutch at movsd dot com
                                      The MASM Forum

                                      www.masm32.com

                                      Comment


                                      • #20
                                        I think before i can do anything, i need to convert this Byte Array
                                        into a String. As it seems none of these search algorithms will process
                                        a Byte Array

                                        I wrote a simple For Next loop using Chr$ to convert the Byte Array into a string
                                        but it's far too slow..

                                        I scoured the helpfiles for a PB equivilent of VB's StrConv, but couldn't
                                        find anything. You guys have any fast subs to do it?

                                        Thanks.

                                        -Nathan

                                        Comment

                                        Working...
                                        X