Announcement

Collapse
No announcement yet.

INSTR performance

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Pierre Bellisle
    replied
    Oups! Wrong thread....

    Leave a comment:


  • David Roberts
    replied
    GOT IT!

    With this test case I figured that we should only go past NoFind: twice - the first time by falling through and the second time after a jump but NoFind: was passed an enormous amount of times.

    If we insert as follows

    ! dec esi 'Compensate for the following INC
    NoFind:
    ! inc esi 'Point to next charater in string to restart the search from
    ! cmp esi, ebx
    ! jg xit ' not enough characters though, so not found


    we pull out without further ado.

    With

    SubString = String$(100000000,0)
    MainString = String$(100000000-16, 0) + String$(16,1)

    InStr pulls out after 234ms, Boyer-Moore after 157ms and INSTR2 after 140ms.

    So, Paul's code now reckons no match the fastest.

    Added: However, , with MainString extended

    SubString = String$(100000000,0)
    MainString = String$(100000000-16,0) + String$(30,1)

    InStr pulls out after 3266ms, Boyer-Moore after 172ms and INSTR after 1875ms.

    Boyer-Moore likes big strings here too.

    Short ones?

    SubString = String$(5,5)
    MainString = String$(100000000-16,0) + String$(30,1)

    InStr at 219ms, Boyer-Moore at 62ms and INSTR2 at 47ms.

    Here we have the less than 6 rule coming into play again.
    Last edited by David Roberts; 6 May 2008, 11:25 AM.

    Leave a comment:


  • David Roberts
    replied
    Sounds good, Paul.

    I don't know if you are considering a rewrite or a tweak but if the latter it is worth mentioning the 'lock up' encountered a little while back.

    I've revised the test case to

    SubString = String$(10000,0)
    MainString = String$(10000-16, 0) + String$(16,1)

    Both InStr and Boyer-Moore return almost immediately whereas, on my machine, INSTR2 returns in 16ms.

    Changing the 10,000 to 100,000 both InStr and Boyer-Moore again return almost immediately with INSTR2 returning after 1188ms.

    With 1,000,000 INSTR2 is clearly not 'locking-up' as thought and will probably return if we waited long enough but seems to get 'bogged down' with redundant calculation.

    Added: Just for the hell of it I tried

    SubString = String$(100000000,0)
    MainString = String$(16,1) + String$(100000000, 0)

    Boyer-Moore came back in 438ms, InStr in 234ms and INSTR2 in 109ms.

    I've just deleted an 'oops'. I had StartPos for InStr as 60 so it didn't find SubString.
    Last edited by David Roberts; 6 May 2008, 07:51 AM.

    Leave a comment:


  • Pierre Bellisle
    replied
    David,
    Thank for the link and the bug report.
    I guess that 6 reveal Paul's excellent work.

    Paul,
    Thank for the links also.
    I see Boyer-Moore the same way you do.
    For me, since the size of both the Boyer-Moore and Instr2,
    is insignificant compared to the final size of an exe,
    I will keep a routine that have both of them and
    programally switch accordingly to the substring lenght.
    Best of both world.

    I will stay around, I don't want to miss your next version,
    great work...

    Leave a comment:


  • Paul Dixon
    replied
    Pierre,
    not concise at all but explains all the instructions in detail, the Intel IA-32 Software Developer's Manual. It comes in about 5 volumes, you need Volumes 2A and 2B.

    http://www.intel.com/products/processor/manuals/

    Volume 2A:
    http://download.intel.com/design/pro...als/253666.pdf
    Volume 2B
    http://download.intel.com/design/pro...als/253667.pdf



    David,
    I haven't followed this whole thread, I've been too busy recently, but I'm sure I can knock 30% off the original INSTR2 I posted, (at least for short search strings). I hope to get some code done in the next day or 2 or maybe 3.. .
    It'll still be a straight forward "Search from the start" algorithm, nothing fancy like Boyer-Moore, but just more efficient than my original code.

    From the little I've read about Boyer-Moore, almost any INSTR should beat it with small search strings. Boyer-Moore is only of real use when the string to be searched for is long, and has certain other characteristics, as it's only then that the algorithm can reliably skip forward and not need to read the whole of the string to be searched.

    Paul.

    Leave a comment:


  • David Roberts
    replied
    The following is a link to a pdf download

    OpCode of Intel Assembly 80x86 Mnemonics

    Found a bug in your version of Instr2, Pierre.

    Code:
    ! mov edi, pSubString                 'Pointer to start of short string
    ! mov esi, pMainString                'Pointer to start of long string
    ! Add esi, StartPos                   'Add starting position
    
    ! mov ebx, LenMainString              'Length of long string
    ! Sub ebx, LenSubString               'Calculate last location to look at for start of a match,
    ! js xit                              'If LenSubString > LenMainString
    ! Add ebx, esi                        'after this point the short string can't match because there aren't enough characters
    StartPos is added to esi prematurely. Ebx can end up past the end of MainString.

    Ebx should be the same for any value of StartPos.

    Change to

    Code:
    ! mov edi, pSubString                 'Pointer to start of short string
    ! mov esi, pMainString                'Pointer to start of long string
     
    ! mov ebx, LenMainString              'Length of long string
    ! Sub ebx, LenSubString               'Calculate last location to look at for start of a match,
    ! js xit                              'If LenSubString > LenMainString
    ! Add ebx, esi                        'after this point the short string can't match because there aren't enough characters
    ! Add esi, StartPos                   'Add starting position
    Added: The 'overun' in some cases was sufficient to effect speed. Instr2 is now faster than Boyer-Moore when SubString is less than 6.
    Last edited by David Roberts; 5 May 2008, 05:43 PM.

    Leave a comment:


  • Pierre Bellisle
    replied
    Nice!
    Look's good to me...


    [Added]
    Does anybody knows where to find a really good assembler reference ?

    I mean all recents Intel/AMD opcodes with CPU restriction
    and concise explanation of what the opcode does.
    For example, many I found on the net do not have JS.
    Last edited by Pierre Bellisle; 5 May 2008, 01:04 PM.

    Leave a comment:


  • David Roberts
    replied
    Is this OK?

    to

    Code:
    ! mov ebx, LenMainString              'Length of long string
    ! Sub ebx, LenSubString               'Calculate last location to look at for start of a match,
    ! js xit                              'If LenSubString > LenMainString
    ! Add ebx, esi                        'after this point the short string can't match because there aren't enough characters

    Leave a comment:


  • Pierre Bellisle
    replied
    This look's good...

    Code:
    Change
     ! mov ebx, LenMainString              'Length of long string
     ! sub ebx, LenSubString               'Calculate last location to look at for start of a match,
     ! add ebx, esi                        'after this point the short string can't match because there aren't enough characters
     
    to
     ! mov ebx, LenMainString              'Length of long string
     ! sub ebx, LenSubString               'Calculate last location to look at for start of a match,
     ! cmp ebx, -1                         'Is substring longer than string
     ! jle xit                             'If lower or equal then exit 
     ! add ebx, esi                        'after this point the short string can't match because there aren't enough characters

    Leave a comment:


  • David Roberts
    replied
    Yep.

    Leave a comment:


  • Pierre Bellisle
    replied
    One more adjustement,

    Paul's Instr2 will nedd to check
    if substring is longer than mainstring.
    Because I got invalid result with...

    MainString = String$(100,0)
    SubString = String$(101,0)

    Leave a comment:


  • David Roberts
    replied
    Amazing, we posted at the same time.

    Technically, your solution is the correct one.

    How many inclusive elements between 10 and 5. It is, of course, 6 but many would say 5 as a quick response.

    Well done.

    Leave a comment:


  • David Roberts
    replied
    Even dynamic strings are null-terminated in PB so the following should do the trick

    Code:
    ! mov pMainString, esi                'Put main string pointer in pMainString
    ! mov esi, [esi - 4]                  'Put main string lenght in esi
    ! inc esi                             ' Avoids losing a SubString at end of MainString
    ! mov LenMainString, esi              'Put main string lenght in LenMainString
    unless someone sees a fault in doing so.

    Leave a comment:


  • Pierre Bellisle
    replied
    David,

    You have eagle's eyes.
    The following seem's to take care of it...

    Code:
    Change
     
     ! mov esi, LenMainString              'Get main string lenght in esi
     ! sub esi, LenSubString               'Substarct the lenght of the substring
     ! cmp StartPos, esi                   'Compare this value to StartPos 
     ! ja No_Match                         'If smaller or equal then jump to No_Match
     
    to
     
     ! mov esi, LenMainString              'Get main string lenght in esi
     ! sub esi, LenSubString               'Substarct the lenght of the substring
     ! inc esi                             'Set esi to LenMainString - LenSubString + 1
     ! cmp StartPos, esi                   'Compare this value to StartPos 
     ! ja No_Match                         'If smaller or equal then jump to No_Match

    Leave a comment:


  • David Roberts
    replied
    MainString = String$(100,0)
    SubString = String$(99,0)

    Both INSTR and INSTR2 find substring at postions 1 and 2.
    Boyer-Moore only finds substring at position 1.

    MainString = String$(100,0)
    SubString = String$(100,0)

    Both INSTR and INSTR2 find substring at postion 1.
    Boyer-Moore does not find substring.

    It is probably an incorrect jump instruction but I cannot see it yet.

    Leave a comment:


  • Pierre Bellisle
    replied
    No, its not, so probably we can expect
    even a better performance from Instr2..

    Leave a comment:


  • David Roberts
    replied
    Thanks Pierre.

    I'm getting Paul's method to be faster than Boyer-Moore when we have less than 5 characters.

    Added: Is your padding optimized in your version of INSTR2?
    Last edited by David Roberts; 2 May 2008, 10:11 AM.

    Leave a comment:


  • Pierre Bellisle
    replied
    Because Boyer-Moore is relatively slow with short substring,
    the Steeve and Börge's function jump
    to PB's INSTR instead of using the asm B-M code algo
    when the substring is less than 3 charaters long.

    Paul did an outstanding job with his Instr2,
    this is fast, and he may even come back with something better.

    To benefit from the best of both world,
    and acheive "ultimate speed",
    one could replace the call to PB's INSTR with a
    call to Paul's Inst2 in the Boyer-Moore function.

    Leave a comment:


  • Chris Holbrook
    replied
    Originally posted by David Roberts View Post
    From what I have read Boyer-Moore comes into its own with relatively large sub-strings or should I say where there is greater redundancy for exploitation.
    ISTR that BM is better with longer unique substrings, as distinct from longer substrings per se, as repetion reduces the shift number. Don't have time to look at Pierre's code yet.

    Leave a comment:


  • David Roberts
    replied
    I've now had a chance to play with your code, Pierre. Nice work.

    With the sub-string as posted I get 703ms for INSTR, 78ms for Boyer-Moore and 266ms for Paul's INSTR2.

    From what I have read Boyer-Moore comes into its own with relatively large sub-strings or should I say where there is greater redundancy for exploitation.

    So, I decided to deprive it and reduced the sub-string to simply $Nul. That should throttle the method employed.

    It did. Now I get 703ms for INSTR, 703ms for Boyer-Moore and 140ms for Paul's INSTR2.

    That is one hell of a reversal.

    It needs looking at further before any conclusions can be drawn and the hiccup I found with Paul's code needs to be addressed; not necessarily by Paul but it is his baby.

    Leave a comment:

Working...
X