Announcement

Collapse
No announcement yet.

Fast string searching

Collapse
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Fast string searching

    admin note: this is a continuation of
    http://www.powerbasic.com/support/pb...ead.php?t=3950

    nathan,

    all of the search methods mentioned will search binary files with
    no problems, what i would be interested to see is how you are
    storing the binary data and how you are converting it to string.

    past in the format you are using so we can have a look at it. also
    be careful with the bm algorithm, i have not been able to verify
    that it finds every search pattern so i would test it with either
    instr or the scanstring functions to make sure it gets all of the
    patterns you are searching for.

    regards,

    hutch@pbq.com.au

    ------------------
    hutch at movsd dot com
    The MASM Forum

    www.masm32.com

  • #2
    Steve,
    The strings i load into an array are loaded fine and will search fine
    InStr. However when some of them are passed to the ScanString function,
    the program will completely illegalop and win2k writes an errorlog..
    (not sure where it writes it though )

    How fast is this ScanString function supposed to be? So far, i'm only
    managing around 5-7 Mb/sec ??


    My general declarations:
    Code:
    #COMPILE EXE
    #DIM ALL
    #INCLUDE "win32api.inc"
    
    GLOBAL ayFileDef() AS STRING

    Here is the function i am using to load my Definitions file.
    The Definitions file is seperated by CRLF for each definition, then
    each line is seperated into 3 sections by a Semicolon ";".
    The first section is the parasite name, second is the type, and third
    is the hexidecimal signiture.

    Code:
    FUNCTION LoadDB(lPerformanceOpenFile AS LONG, lPerformanceProcess AS LONG) AS LONG
    
     DIM szFileDef AS STRING, lPerformance AS LONG
    
     CALL fOpenFile("E:\WIN2K\Programming\Net-Commando V3\VirusIDENT\file.def", szFileDef, lPerformance)
    
      lPerformanceOpenFile = lPerformance
    
     DIM tC AS LONG
     tC  = GetTickCount
    
        IF LEN(szFileDef) <> 0 THEN
    
            DIM X AS LONG, Y AS LONG, FirstBound AS LONG, SecondBound AS LONG
            DIM tmpArray(0 TO 0) AS STRING, tmpArray2(0 TO 0) AS STRING
            DIM arrayData(0 TO 0) AS STRING, tmpData AS STRING, enumL AS LONG
    
            Split szFileDef, $CRLF, tmpArray()
            FirstBound = UBOUND(tmpArray)
    
             FOR X = 0 TO UBOUND(tmpArray())
                 Split tmpArray(X), ";", tmpArray2()
                 IF UBOUND(tmpArray2) > SecondBound THEN SecondBound = UBOUND(tmpArray2)
             NEXT
    
            ERASE ayFileDef
            REDIM ayFileDef(FirstBound - 1, SecondBound)
    
             FOR X = 0 TO FirstBound - 1
    
                 Split tmpArray(X), ";", tmpArray2()
    
                 FOR Y = LBOUND(tmpArray2) TO UBOUND(tmpArray2)
                    ayFileDef(X, Y) = tmpArray2(Y)
                 NEXT
    
                  Split ayFileDef(X, 2), " ", arrayData()
                  tmpData = ""
                      FOR enumL = LBOUND(arrayData) TO UBOUND(arrayData)
                          tmpData = tmpData & CHR$(VAL("&H" & arrayData(enumL)))
                          ayFileDef(X, 2) = tmpData
                      NEXT enumL
                    ERASE arrayData
    
             NEXT
    
            ERASE tmpArray
            ERASE tmpArray2
    
        END IF
    
     lPerformanceProcess = GetTickCount - tC
     FUNCTION = (UBOUND(ayFileDef) - LBOUND(ayFileDef)) + 1
    
    END FUNCTION
    My AnalyseFile function opens the file in the szFile parameter.
    This uses the ReadFile et al APIs, then uses PEEK to return the bytes as
    a string. Then it enumerates through the DB of signitures i loaded
    first in PBMAIN, and currently uses ScanString to hopefully find the signiture.

    Code:
    FUNCTION AnalyseFile(BYVAL szFile AS STRING, BYREF szAnalysis AS STRING, BYREF lPerformanceOpenFile AS LONG, BYREF lPerformanceProcess AS LONG, BYREF lSpeed AS LONG) AS SINGLE
    
        'Variables
        DIM tC AS LONG, enumI AS LONG
        DIM retVal AS LONG, MainStr AS STRING, SearchStr AS STRING
        LOCAL ap AS LONG, aln AS LONG, bpt AS LONG, bln AS LONG
    
        'Open the file and return statisitcal result to lPerformanceOpenFile BYREF
        CALL fReadFile(BYVAL szFile, BYREF MainStr, BYREF lPerformanceOpenFile)
        aln = LEN(MainStr)
    
        'Performance Counter
        tC = GetTickCount
    
        'Analysis for Virus Defintions
        FOR enumI = LBOUND(ayFileDef) TO UBOUND(ayFileDef)
            SearchStr = ayFileDef(enumI, 2)
            'MSGBOX SearchStr
                'ap  = STRPTR(MainStr)
                'aln = LEN(MainStr)
                'bpt = STRPTR(SearchStr)
                'bln = LEN(SearchStr)
            'rv& = ScanString(0,STRPTR(a$),aln&, STRPTR(patterns$(cnt)),LEN(patterns$(cnt)))
            retVal = ScanString(0, BYVAL STRPTR(MainStr), BYVAL aln, BYVAL STRPTR(SearchStr), BYVAL LEN(SearchStr))
            'retVal = ScanString(  0, ap, aln, bpt, bln  )
            'szAnalysis = szAnalysis & ", " & STR$(retVal)
    
            'retVal = INSTR(  1, MainStr, SearchStr  )
            'szAnalysis = szAnalysis & ", " & STR$(retVal)
    
            'retval = BMBinSearch(  0, BYVAL STRPTR(MainStr), BYVAL LEN(MainStr), BYVAL STRPTR(SearchStr), BYVAL LEN(SearchStr)  )
            'szAnalysis = szAnalysis & STR$(STRPTR(MainStr)) & " / " & STR$(STRPTR(SearchStr)) '", " & STR$(retVal)
    
              IF retVal > 2 THEN
                 szAnalysis = szAnalysis & ayFileDef(enumI, 0) & " (" & ayFileDef(enumI, 1) & ")"
              END IF
        NEXT enumI
    
        lPerformanceProcess = GetTickCount - tC
        lSpeed = LEN(MainStr) / (lPerformanceProcess * 1000)
    
    END FUNCTION
    The problem here, is that on some of the signiture in the DB..
    ScanString will not like it and the program will IllegalOp.
    If i comment out the ScanString line and use InStr instead, it would
    work fine.

    I'm still not having any luck with the BM functions.

    Btw: I realise the code is not best

    Thanks steve,

    - Nathan.

    Comment


    • #3
      Steve, did you notice my comments about INSTR being faster than the BM code? Can you test that exact code on your PC and tell me whether we are seeing a processor-specific effect on the results?

      ------------------
      Lance
      PowerBASIC Support
      mailto:support@powerbasic.comsupport@powerbasic.com</A>
      Lance
      mailto:lanceedmonds@xtra.co.nz

      Comment


      • #4
        Steve, Lance..

        I've got the BMBinSearch algo working finally!!!!

        It's correctly identifying the test files i'm passing it!

        I'll post some performance results in a moment..

        Thanks guys!

        - Nathan.

        Comment


        • #5
          Before i post performance stats on the BMBinSearch..

          My ReadFile sub, although atleast 50 times faster than the Visual Basic
          equivilent , is fairly slow. Reading a 34mb file.. it took 939ms.

          Code:
          FUNCTION fReadFile(BYVAL szFile AS STRING, BYREF sBytes AS STRING, BYREF lPerformance AS LONG) AS SINGLE
          
          
              DIM tC AS LONG
              DIM hOrgFile AS LONG, bBytes() AS BYTE
              DIM nSize AS LONG, Ret AS LONG
              DIM enumI AS LONG
          
              'Performance Counter
              tC = GetTickCount
          
              'Open File to Byte Array
              hOrgFile = CreateFile(BYCOPY szFile$, %GENERIC_READ, %FILE_SHARE_READ + %FILE_SHARE_WRITE, BYVAL 0&, %OPEN_EXISTING, 0, 0)    'Get Handle to File
              nSize = GetFileSize(hOrgFile, 0)
                  IF nSize = 0 OR nSize = -1 THEN EXIT FUNCTION
          
              SetFilePointer hOrgFile, 0, 0, %FILE_BEGIN
              REDIM bBytes(1 TO nSize) AS BYTE
              ReadFile hOrgFile, bBytes(1), UBOUND(bBytes), Ret, BYVAL 0&
          
              IF Ret <> UBOUND(bBytes) THEN EXIT FUNCTION
              CALL CloseHandle (hOrgFile)
          
                  sBytes = PEEK$(VARPTR(bBytes(LBOUND(bBytes))), UBOUND(bBytes))
          
              'Performance Counter
              lPerformance = GetTickCount - tC
          
          END FUNCTION

          The culprit is in the Byte Array to String conversion, on the line i highlighted.

          I can't see anyway in optimising this, as it is reading directly from
          memory. But i thought i'd put it to you anyway

          Thanks

          -Nathan

          Comment


          • #6
            I see you use UBOUND a couple of times, it would make sense to do that evaluation just once and store that in a local variable.

            However, arrays are always slightly slower than using scalar variables, so if you can convert the code to use a scalar string it would be faster then doing that memory block copy.

            Maybe something like this - it stores the data directly into the return string and eliminated the byte array completely (pseudocode - may need some alteration):
            Code:
            hOrgFile = CreateFile...
            nSize = GetFileSize(hOrgFile, 0)        
            IF nSize = 0 OR nSize = -1 THEN EXIT FUNCTION    
            SetFilePointer hOrgFile, 0, 0, %FILE_BEGIN    
            sBytes = space$(nSize) ' preallocate the buffer
            ReadFile hOrgFile, BYVAL STRPTR(sBytes), nSize, Ret, BYVAL 0&
            CALL CloseHandle (hOrgFile)        
            EXIT FUNCTION
            Using byte arrays is something that VB programmers often use, but often you'll find PowerBASIC has much better and faster ways to deal with it.

            That said, it would be interesting to compare that API-based code to a generic PowerBASIC code:
            Code:
            ' more pseudocode
            OPEN "filename" FOR BINARY AS #1
            IF ERR THEN EXIT FUNCTION
            GET$ #1, LOF(1), sBytes
            CLOSE #1
            Further, since this sub just reads the file, it would make sense to take this code out of the function, and place it directly in the calling code block, thereby removing the overhead in calling the function and setting up it's stack, initializing variables, etc.

            More food for thought for you...

            I hope this helps!

            ------------------
            Lance
            PowerBASIC Support
            mailto:support@powerbasic.comsupport@powerbasic.com</A>
            Lance
            mailto:lanceedmonds@xtra.co.nz

            Comment


            • #7
              Lance,

              That worked a treat. I don't know why i overlooked it, but its now
              reading it directly into a String - which is far more efficient
              than reading to a byte array then converting it to a string afterwards.

              My previous code took ~900ms to open a 34mb file, its now almost
              half that at ~510ms.

              I will experiment what you suggested about the call stack being wasted.
              I realised this may be a problem, it's just that ReadFile function is going
              to be used alot, by many different procedures - and i thought it would
              be best if it was in its own function to save memory. But it actually
              seems have a negative effect on that.. ?

              The whole reason i jumped over to using the API instead of the built
              in PB file i/o functions was because it would read some files but not others.
              Never figured out why, the code was so simple - i don't think it could
              of been any error on my part.

              Here's the procedure i was using:

              Code:
              FUNCTION fOpenFile(BYVAL szFile AS STRING, BYREF sBytes AS STRING, BYREF lPerformance AS LONG) AS SINGLE
              
                LOCAL Fnum AS LONG, tC AS LONG
              
                  'Performance Counter
                  tC = GetTickCount
              
                  'Find a FREEFILE
                  Fnum = FREEFILE
              
                  'Open file
                   OPEN szFile FOR BINARY AS Fnum
                      GET$ Fnum, LOF(Fnum), sBytes
                   CLOSE Fnum
              
                  'Performance Counter
                  lPerformance = GetTickCount - tC
              
              END FUNCTION
              It would read some files, but not others. Never-ever got it to sucessfully
              read a MP3 file..


              cheers
              -Nathan.

              [This message has been edited by Nathan Evans (edited July 01, 2001).]

              Comment


              • #8
                All:
                So what method must i use to get up to the 300Mb/sec searching speeds?

                I was thinking, maybe if i run multiple BMBinSearch's in say 10 threads..
                the results may come quicker.

                -Nathan.

                [This message has been edited by Nathan Evans (edited July 01, 2001).]

                Comment


                • #9
                  I was thinking, maybe if i run multiple BMBinSearch's in say 10 threads..
                  the results may come quicker.
                  Nope. The machine only has so many CPU cycles per second.

                  If you ran ten threads, you'd get ten times as many answers all at once, but in ten times the time.

                  Closely related: Winston Churchill's (attributed) comment, "It takes nine months to have a baby, regardless of the number of women assigned to the job."

                  MCM


                  Michael Mattias
                  Tal Systems Inc.
                  Racine WI USA
                  mmattias@talsystems.com
                  http://www.talsystems.com

                  Comment


                  • #10
                    Lance,

                    I checked the code you tested and it was the function ScanString
                    which is a byte scanner of a similar type to the code used in
                    INSTR and I got some interesting results on my 2 different boxes,
                    on the PIII the ScanString function was about 30% faster but on my
                    AMD 550 INSTR was nearly twice as fast.

                    The Boyer Moore has different logic so the comparison is not a lot
                    of use, if the search pattern is long enough (over 6 characters)
                    it is almost exclusively faster and with longer patterns, faster
                    by a long way over any BYTE scanner.

                    The reason why I recommend INSTR in most normal instances is because
                    its performance is more than adequate in most instances and it is
                    a lot more flexible to use. Most of the search algorithms I have
                    developed have been written in MASM where you have no runtime library
                    at all. Fortunately porting them across to PowerBASIC is easy enough
                    because the inline assembler can handle MASM format with only very
                    minor notation changes.

                    Nathan,

                    From a quick look at your code, the problems you are having with
                    speed is related to the amount of overhead you have processing the
                    virus signatures. I would suggest that you try a different way of
                    storing the data and preload it as the program starts.
                    Code:
                        sig$(0) = chr$(1,2,3,4,5,6,7,8,9)
                        sig$(1) = chr$(9,8,7,6,5,4,3,2,1)  etc ....
                    The technique using CHR$ can handle any BYTE data with no problems
                    and this technique once loaded at startup has no overhead. If the scan
                    finds a virus signature, it can use the array index to look up another
                    array that has the virus names in it.
                    Code:
                        vname$(0) = "BadGuy1"
                        vname$(1) = "Idiot_Fringe2"    etc ....
                    This means you only process the name if a signature is found so
                    you are doing a lot less work if no signature is found.

                    If you have a list of virus signatures in binary rather than in HEX
                    it is easy enough to write yourself a small utility that reads the
                    BYTE data from each signature and converts it into the CHR$ format
                    I suggested so you can automate the process. This would be useful
                    as you add more signatures over time.

                    Regards,

                    hutch@pbq.com.au

                    ------------------
                    hutch at movsd dot com
                    The MASM Forum

                    www.masm32.com

                    Comment


                    • #11
                      Steve,

                      I am loading the DB into memory when the program starts. It's not
                      processing from the DB as it analyses a file..

                      Is it normal for BMBinSearch to take 140ms for just one search on a
                      34mb EXE file ?

                      There is currently only 30 test signitures in the DB, and to analyse
                      that entire 34mb with all 30, takes 4160ms !!

                      Maybe it will search faster if i split the files up into chunks?


                      -Nathan.

                      Comment


                      • #12
                        Nathan,

                        By a quick calculation, 34 meg in 140 ms is 242 meg/sec which is
                        by no means slow. The other thing to be aware of is that the first
                        read of the memory will be slower than subsequent reads so the next
                        ones will be a lot faster.

                        I don't know what the machine you are using is so let us know as this
                        is some indication of whether the code is working OK or not.

                        Regards,

                        hutch@pbq.com.au

                        ------------------
                        hutch at movsd dot com
                        The MASM Forum

                        www.masm32.com

                        Comment


                        • #13
                          Steve,

                          I'm developing on and testing this code on a Athlon 700mhz, 512ram, win2k.

                          I have noticed the performance increases on uncompressed file formats, such as BMP..
                          After i got my lSpeed calculation right, your right - it is infact scanning
                          at a minimum of 200mb/sec, and with some files will burst upto 700mb/sec..
                          but the general average seems to be 200-300mb/sec..

                          Can you explain why subsequent searches are faster? tah

                          -Nathan.

                          Comment


                          • #14
                            Nathan,

                            Subsequent reads are faster mainly due to cacheing, you normally
                            get what you call a first time penalty reading data but once it
                            is immediate in memory, subsequent reads are a lot faster.

                            Speed differences with an algorithm like a Boyer Moore are effected
                            by pattern length, character frequency and repeat sequences. If the
                            data being searched is using a wide number of characters, you have
                            the best chance of getting a bad character shift which jumps the
                            largest amount forward.

                            It is to your advantage to use the longest search patterns possible
                            as this will maximise your speed. If you are happy with the BM
                            algorithm and it is testing up reliable with the list of search
                            patterns that you are using, I will post a later more compact version
                            that may be slightly faster in some circumstances.

                            Regards,

                            hutch@pbq.com.au

                            ------------------
                            hutch at movsd dot com
                            The MASM Forum

                            www.masm32.com

                            Comment


                            • #15
                              Steve,

                              Subsequent reads are faster mainly due to cacheing, you normally
                              get what you call a first time penalty reading data but once it
                              is immediate in memory, subsequent reads are a lot faster.
                              Okay, that's good then


                              Speed differences with an algorithm like a Boyer Moore are effected
                              by pattern length, character frequency and repeat sequences. If the
                              data being searched is using a wide number of characters, you have
                              the best chance of getting a bad character shift which jumps the
                              largest amount forward.
                              Thanks for the tip, i will ensure that the definitions are atleast 40bytes.
                              What would be the result of a 'bad character shift'? Would it return -1, or just take longer
                              to search? What can be done to prevent this?

                              If you are happy with the BM algorithm and it is testing up reliable with the list of search
                              patterns that you are using, I will post a later more compact version
                              that may be slightly faster in some circumstances.
                              Steve, i'm VERY happy now that i know my scanspeed is comparable to
                              or even better than some of the top-name brand scanners
                              If you optimise your BM code any further, let me know!

                              Thanks Steve

                              - Nathan.

                              Comment


                              • #16
                                Karl,
                                Thank you for your comments.

                                Can you direct me to that BOYER.BAS you mentioned? I've spent 15 mins trying
                                to find it on the powerbasic.com downloads section but so far unsuccesful.
                                I think Steve's algo is almost as good as it gets, but i'm inclined to
                                test as many algo's as possible

                                My data stream is certain - it will always be of executable code.
                                ie: EXE, DLL. Once i'm done.. my Function won't even bother to scan
                                say a JPEG or BMP file - as they are not executable. I'll do this by
                                checking the File Header.

                                I disagree that InStr is simply all-round better.. InStr runs at a fraction
                                of the speed Steve's BMBinSearch algo does! In my tests, InStr was running
                                at around 7Mb/sec, whereas Steve's is around 200-300Mb/sec average.



                                Thanks.

                                - Nathan.

                                Comment


                                • #17
                                  The point is, Boyer-Moore isn't guaranteed to be faster. That depends entirely
                                  on the nature of the search.

                                  The Jackson/Abrash version is for PB/DOS. It should be in the PB/DOS area of
                                  the Downloads section, probably under "Library" or "Misc".

                                  ------------------
                                  Tom Hanlin
                                  PowerBASIC Staff

                                  Comment


                                  • #18
                                    Tom,

                                    I realise that a Boyer Moore search on small files, or with small search
                                    strings would be much slower than an equivilent search with InStr.

                                    But in the case of my program, i will be searching files from 5kb upwards, using
                                    search strings of atleast 40bytes.
                                    InStr may outperform BoyerMoore with smaller files, ie: below 20kb..

                                    I have already noticed how much BoyerMoore is dependant on larger files and larger
                                    search strings, searching a 70kb file runs at around 200Mb/sec, whereas a 500kb file
                                    runs at around 400Mb/sec.

                                    I will give the Jackson/Abrash BM a go.



                                    - Nathan

                                    Comment


                                    • #19
                                      A couple of things here, Tom has made a valid point in that no
                                      particular algorithm, Boyer Moore included is garranteed to be
                                      faster in every instance. It is slower on very short source text
                                      and with search patterns under 6 bytes long, there are cases where
                                      certain patterns of repeat characters will slow down a Boyer Moore
                                      algorithm but in most instances, the data is rarely in such a format.

                                      At an empirical level with large files and search patterns over 6
                                      bytes in length a Boyer Moore algo is almost exclusively faster and
                                      the speed increases as the pattern gets longer.

                                      The original algorithm by Michael Abrash in MASM and its port to
                                      DOS PowerBASIC works fine in DOS but its design is out of date in
                                      32 bit code so I would not hold out any great hope of getting it
                                      up to speed. I have a version of it re-written by Jeremy Collake
                                      which I have tweaked to fit MASM library format with his permission
                                      that uses that design but I have yet to be able to successfully
                                      port it to PowerBASIC.

                                      Here is a size optimised version of the last BM algo that I posted,
                                      it has a shorter instruction path in the loop code but it does not
                                      test up faster in direct linear read speed although it is a bit faster
                                      where there are a large number of partial matches.

                                      Same comment as with the last one, it performs well in testing but
                                      I cannot verify that it gets all matches so test the list of pattern
                                      with INSTR to make sure you get all of the matches in the source file.

                                      Just a note on the speed of INSTR, I can reliably get it to work
                                      at over 100 meg/sec so the problem you had with it reading slowly
                                      was related to the overhead in the code you were testing, INSTR is
                                      a classic byte scanner and has very good average performance over
                                      a wide range of search requirements. On your Athlon 700 you should
                                      be getting between 120 and 150 meg/sec with it if the rest of the
                                      code is working OK.

                                      Regards,

                                      hutch@pbq.com.au

                                      Code:
                                        ' #########################################################################
                                        
                                        FUNCTION BMBinSearch(ByVal startpos as LONG, _
                                                             ByVal lpSource as LONG,ByVal srcLngth as LONG, _
                                                             ByVal lpSubStr as LONG,ByVal subLngth as LONG) as LONG
                                        
                                            #REGISTER NONE
                                        
                                            LOCAL cval    as DWORD
                                            LOCAL shift_table as STRING * 1024
                                        
                                            ! cmp subLngth, 1
                                            ! jg OKsize
                                            ! mov eax, -2                 ; string too short, must be > 1
                                            ! jmp Cleanup
                                          OKsize:
                                        
                                            ! mov esi, lpSource
                                            ! add esi, srcLngth
                                            ! sub esi, subLngth
                                            ! mov ebx, esi            ; set Exit Length
                                        
                                          ' ----------------------------------------
                                          ' load shift table with value in subLngth
                                          ' ----------------------------------------
                                            ! mov ecx, 256
                                            ! mov eax, subLngth
                                            ! lea edi, shift_table
                                            ! rep stosd
                                        
                                          ' ----------------------------------------------
                                          ' load decending count values into shift table
                                          ' ----------------------------------------------
                                            ! mov ecx, subLngth           ; SubString length in ECX
                                            ! dec ecx                     ; correct for zero based index
                                            ! mov esi, lpSubStr           ; address of SubString in ESI
                                            ! lea edi, shift_table
                                        
                                            ! xor eax, eax
                                        
                                          Write_Shift_Chars:
                                            ! mov al, [esi]               ; get the character
                                            ! inc esi
                                            ! mov [edi+eax*4], ecx        ; write shift for each character
                                            ! dec ecx                     ; to ascii location in table
                                            ! jnz Write_Shift_Chars
                                        
                                          ' -----------------------------
                                          ' set up for main compare loop
                                          ' -----------------------------
                                            ! mov ecx, subLngth
                                            ! dec ecx
                                            ! mov cval, ecx
                                        
                                            ! mov esi, lpSource
                                            ! mov edi, lpSubStr
                                            ! add esi, startpos           ; add starting position
                                        
                                            ! mov edx, subLngth           ; pattern length in edx
                                        
                                            ! jmp Cmp_Loop
                                        
                                        ' *********************** Loop Code ***************************
                                        
                                          Suffix_Shift:
                                            ! add esi, eax                ; add suffix shift
                                        
                                          Pre_Cmp:
                                            ! mov ecx, cval               ; reset counter in compare loop
                                            ! cmp esi, ebx                ; test exit length
                                            ! jg  No_Match
                                        
                                            ! xor eax, eax                ; zero EAX
                                        
                                          Cmp_Loop:
                                            ! mov al, [esi+ecx]
                                            ! cmp al, [edi+ecx]           ; cmp characters in ESI / EDI
                                            ! jne Set_Shift               ; if not equal, get next shift
                                            ! dec ecx
                                            ! jns Cmp_Loop
                                        
                                            ! jmp Match
                                        
                                          Set_Shift:
                                            ! mov eax, shift_table[eax*4] ; get char shift value
                                            ! cmp eax, edx                ; is it pattern length ?
                                            ! jne Suffix_Shift            ; if not, jump to Suffix_Shift
                                            ! lea esi, [esi+ecx+1]        ; add bad char shift
                                            ! jmp Pre_Cmp
                                        
                                        ' ***************************************************************
                                        
                                          Match:
                                            ! sub esi, lpSource           ; sub source from ESI
                                            ! mov eax, esi                ; put length in eax
                                            ! jmp Cleanup
                                        
                                          No_Match:
                                            ! mov eax, -1                 ; set value for no match
                                        
                                          Cleanup:
                                        
                                            ! mov FUNCTION, eax
                                        
                                        END FUNCTION
                                        
                                        ' #########################################################################

                                      ------------------


                                      [This message has been edited by Steve Hutchesson (edited July 02, 2001).]
                                      hutch at movsd dot com
                                      The MASM Forum

                                      www.masm32.com

                                      Comment


                                      • #20
                                        Steve,

                                        I will give that new function ago and see how it compares up to
                                        your last one

                                        I was calculating the InStr speeds incorrectly, as i didn't Multiply the sum
                                        by how many Searches it had performed - that is why i thought it was
                                        searching at ~7mb/sec After correcting that, your right - its around
                                        200Mb/sec.

                                        BTW: I've been reading up on BM, and have a clearer understanding of its
                                        somewhat clever logic

                                        - Nathan.

                                        Comment

                                        Working...
                                        X