Announcement

Collapse
No announcement yet.

INSTR performance

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #21
    Originally posted by David Roberts View Post
    > '1st check, say, 16 early bytes to see if there's even a chance of match

    Yeah, but you are posing the question knowing the answer.

    Suppose there is a difference at the 17th byte.
    I see now the problem. For some reason I was thinking the strings were equal length, even though Chris said that situation was just a subset. Okay, we want to find out if a real big ol' string is in an equally big, or bigger string with max efficiency ie. speed. Thar's the challenge we face. Note that I optimistically call the burden a challenge. I do have an idea how to accomplish this, so I'll code some, and post it asap.

    Comment


    • #22
      The strings are the same length, John - at least in the code posted.

      If we consider s = STRING$(j,0) and ss = STRING$(j,0) ie identical strings then, on my machine, I get 362.264ms to create and 193.435 to calculate that they are the same. On the other hand, with ss = STRING$(j,1), now totally different, I get 345.064ms to create, a little faster this time, but 228.628ms to calculate that they are different.

      That, to me, is the issue here, unless I am mistaken.

      We can, in this case, just test the first byte or the first 16 bytes but in general that will not solve our problem.

      Suppose we have two strings which are identical for the first 75% and then differ. We want the computation to pull out immediately on a difference and not keep chugging away to the bitter end as seems to be happening.

      With If s = ss and they are equal I get a result in 200.457ms.

      With If s <> ss and they are not equal I get a result in 6.622ms.

      In this scenario INSTR is being crippled.

      Comment


      • #23
        I do some comparisions on FIXED strings and fixed-string members of UDTS byte-by-byte using some macros..
        Code:
        ' ===[ MACRO TO COMPARE FIXED-LENGTH BUFFERS FOR EQUALITY ]===
        ' === returns: %TRUE (strings are equal) or %FALSE (they ain't)
        '
        MACRO FUNCTION m_CompareFixedStringEqual (a1, A2, iLen)
        
          MACROTEMP iChar, iRet, p1, p2
          DIM iChar AS LONG, iRet AS LONG, p1 AS BYTE PTR, p2 AS BYTE PTR
        
            p1      = VARPTR(A1)
            p2      = VARPTR(A2)
            iRet    =  %TRUE    ' we will set to false if / when it happens
        
          FOR ichar = 1& TO ilen
            IF (@p2 XOR @p1) THEN
         '   IF @p2 <> @p1 THEN       ' as soon as any character in 2 is not equal condition is false
              iRet = %FALSE
              EXIT FOR
            ELSE
              INCR p1
              INCR p2
            END IF
          NEXT
        END MACRO = iRet
        
        ' ===[ MACRO TO COMPARE CHARACTERS AT ADDRESSES FOR EQUALITY ]===
        '
        MACRO FUNCTION m_CompareAddressEqual (a1, A2, iLen)
        
          MACROTEMP iChar, iRet, p1, p2
          DIM iChar AS LONG, iRet AS LONG, p1 AS BYTE PTR, p2 AS BYTE PTR
        
            p1      = A1
            p2      = A2
            iRet    =  %TRUE    ' we will set to false if / when it happens
        
          FOR ichar = 1& TO ilen
            IF (@p2 XOR @p1) THEN     ' as soon as any character in 2 is not equal condition is false
              iRet = %FALSE
              EXIT FOR
            ELSE
              INCR p1
              INCR p2
            END IF
          NEXT
        END MACRO = iRet
        
        
        'returns true if A2>A1
        MACRO FUNCTION m_CompareFixedStringGreater (a1, A2, iLen)
        
          MACROTEMP iChar, iRet, p1, p2
          DIM iChar AS LONG, iRet AS LONG, p1 AS BYTE PTR, p2 AS BYTE PTR
        
            p1      = VARPTR(A1)
            p2      = VARPTR(A2)
            iRet    =  %FALSE    ' we will set to true if / when it happens
        
          FOR ichar = 1& TO ilen
            IF @p2 < @p1  THEN      ' this character is less, so string cannot be greater
               EXIT FOR             ' with iRet = %FALSE
            ELSEIF @p2 > @p1 THEN    ' as soon as any character in 2 is greater than corresponding char in one
              iRet = %TRUE          ' function is true
              EXIT FOR
            ELSE                    ' must be eqaul, so we have to keep on going.
              INCR p1
              INCR p2
            END IF
          NEXT
        END MACRO = iRet
        You could adapt these.

        Note that the above are NOT "INSTR" or "compare strings" replacements... these are limited-purpose functions optimized for a very specific application.

        I only do this because I have MANY to do - millions per run - and PB support confirmed to me that fixed-string UDT members are compared by creating and destroying temporary OLE strings.

        But when I need to know "WHERE, if at all, can I find '<X>' within '<Y>' I just use INSTR, since that what it was DESIGNED to do.

        If I only needed to know "DOES '<X>' appear within '<Y>' - that is, I only needed a yes/no answer - AND I had "many" to do - AND was not dealing with already-created OLE strings - then I might work out something like the above to avoid the use of INSTR.

        Optimization is application-specific.

        MCM
        Michael Mattias
        Tal Systems Inc. (retired)
        Racine WI USA
        [email protected]
        http://www.talsystems.com

        Comment


        • #24
          Originally posted by Michael Mattias View Post
          ... then I might work out something like the above to avoid the use of INSTR.
          I've tried a comparison using a pointer and it's not as quick as INSTR. Hence my comments above about using assembler, apparently processors have moved on since the 8086!

          Originally posted by Michael Mattias View Post
          Optimization is application-specific.
          Well, we all have to work with the tools available. INSTR does work, and I could understand PowerBASIC, Inc not wanting to get drawn into discussion of how it works.

          Comment


          • #25
            We did a large file comparison a while back and there was a fair few entries in what became a little competition. The assembler routines came out on top. Comparing large files and large strings is pretty much the same. I haven't got time to search for it. I'll have a look when I have time unless someone finds the thread before then.

            Comment


            • #26
              Originally posted by David Roberts View Post
              We did a large file comparison
              I've just been reading it: http://www.powerbasic.com/support/pb...parison&page=2

              Comment


              • #27
                Hand-optimized assembly-language code for a specific application will always be faster than a 'general purpose' function such as INSTR.

                Assuming of course, the optimizer 'has a clue.'
                Michael Mattias
                Tal Systems Inc. (retired)
                Racine WI USA
                [email protected]
                http://www.talsystems.com

                Comment


                • #28
                  Absolutely no promises, as I haven't had time to fully test this, but the following code appears to out perform INSTR by a factor of 2 to 5 depending on circumstances.

                  I'm sure it could be made faster by using SSE instructions but that would restrict its use to later CPUs (although SSE has been around about 10 years now).



                  Paul.
                  Code:
                  FUNCTION instr2(a$,b$) AS LONG
                  aptr&=STRPTR(a$)
                  bptr&=STRPTR(b$)
                  alen&=LEN(a$)
                  blen&=LEN(b$)
                  result&=0
                  
                  !pushad             'save all registers
                  !mov esi,aptr&      'pointer to start of long string
                  !mov edi,bptr&      'pointer to start of short string
                  !mov ebx,alen&      'length of long string
                  !sub ebx,blen&      'calculate last location to look at for start of a match,
                  !add ebx,esi        'after this point the short string can't match because there aren't enough characters
                  
                  
                  !nop      'PAD HERE!!!!!!!!!!!!!
                  !nop
                  !nop
                  !nop                'pad to align the code so certain branch targets are at the front end of a cache line.
                  !nop                'incorrect alignment of the code can increase execution time by 20% or more
                  !nop                'for correct alignment the label NoFind should be on a 16 byte boundary
                  !nop                'if you can't figure out how to do that then put this function first in your code
                  !nop                'and use trail and error (adding up to 16 NOPS) until you find the best padding.
                  !nop
                  
                  
                  
                  
                  !dec esi           'compensate for the following INC
                  '!int 3
                  NoFind:
                  !inc esi           'point to next charater in string to restart the search from
                  
                  lp0:
                  !mov al,[edi]      'get first character to look for
                  
                  lp1:
                  !cmp al, [esi]     'compare 1st character
                  !je short skip1
                  
                  lp2:
                  !cmp al, [esi+1]   'compare 2nd character
                  !je short skip2
                  
                  lp3:
                  !cmp al, [esi+2]   'compare 3rd character
                  !je short skip3
                  
                  lp4:
                  !cmp al, [esi+3]   'compare 4th character
                  !je short skip4
                  
                  '!prefetcht0 [esi+512]    'this helps on my cpu to avoid CPU having to wait for memory saving upto 20% in time
                  !db &h0f,&h18,&h8e        'PB doesn't allow the above instruction so code it by hand
                  !dd &h00000200
                  
                  !add esi,4         '4 characters done so update count
                  !cmp esi,ebx       'finished?
                  !jle short lp1     'no, go back for next 4
                  
                  
                  !mov result&,0          'yes, return 0 for string not found
                  !jmp xit
                  
                  !nop
                  !nop
                  
                  
                  
                  skip4:
                  !inc esi                'these correct the esi value depending on where in the above block of 4 the match was found
                  skip3:
                  !inc esi
                  skip2:
                  !inc esi
                  skip1:
                  
                  
                  
                  'Found a first letter match, now compare strings from this match onwards
                  !xor ecx,ecx            'zero ecx, use it as a count through the short string
                  !mov eax,blen&          'get length of short string
                  
                  '!int 3
                  lp33:
                  !sub eax,16             'Are there at least 16 bytes to do?
                  !js LessThan16          'no, just check the last few bytes
                  
                  '>16 so block compare them
                  !mov edx, [esi+ecx]     'compare the strings 4 bytes at a time
                  !cmp edx, [edi+ecx]
                  !jne NoFind
                  
                  !mov edx, [esi+ecx+4]
                  !cmp edx, [edi+ecx+4]   'and 4 times to give 16 bytes in the block
                  !jne NoFind
                  
                  !mov edx, [esi+ecx+8]   '3rd
                  !cmp edx, [edi+ecx+8]
                  !jne NoFind
                  
                  !mov edx, [esi+ecx+12]  '4th set of 4 bytes
                  !cmp edx, [edi+ecx+12]
                  !jne NoFind
                  
                  '!prefetcht0 [esi+ecx+512]     'my CPU shows no improvement with these lines, probably because the address decoders are
                  '!prefetcht0 [edi+ecx+512]     'being hit too hard by the adjacent code to be able to take note of the prefetch hint
                  
                  !add ecx,16             'update the pointer to count the 16 bytes done
                  !jmp lp33               'go back for the next block
                  
                  
                  LessThan16:
                  !add eax,16             'do the last few bytes one at a time
                  !jz xit2                'no more to do so must've found it
                  
                  lp25:
                  '!int 3
                  !mov dl, [esi+ecx]      'get next byte in long string
                  !cmp dl, [edi+ecx]      'compare with next byte in short string
                  !jne NoFind
                  !inc ecx                'update the pointer
                  
                  !dec eax                'update the byte counter
                  !jnz short lp25         'if 0 then all done, else go back for next byte
                  
                  
                  'got it!
                  xit2:
                  !sub esi,aptr&          'Subtract start of string address to get offset into string
                  !inc esi                'because I use 0 for the first character and it should be 1
                  !mov result&,esi             'write position to result
                  
                  xit:
                  
                  !popad      'restore all registers
                      
                      FUNCTION=result&
                      
                  END FUNCTION

                  Comment


                  • #29
                    Paul, thanks, it looks good. Will report tomorrow.

                    Comment


                    • #30
                      Paul

                      With s = STRING$(j,0) and ss = STRING$(j,0) I get about 155ms compared with 193ms in my earlier post. Correct that to 155ms to 55ms!

                      With ss = String$(j-100,0) + String$(100,1) your code locks up. InStr(s, ss) comes in at about 225ms.
                      Last edited by David Roberts; 30 Apr 2008, 11:19 AM. Reason: Time taken

                      Comment


                      • #31
                        This is what I get with Random Access and ss = String$(j-100,0) + String$(100,1) with the difference at the 'far right'; although for random access it doesn't matter where the difference is. The 100,000,000 has been broken into 10,000 at 10,000.

                        Code:
                        Comparisons Time
                        8667 231.678ms
                        6936 185.756ms
                        3157 90.962ms
                        1883 59.375ms
                          75 3.759ms
                         175 6.848ms
                        2353 71.432ms
                        8212 219.547ms
                        474 99.654ms
                        2286 69.473ms
                        Average time = 103.848ms

                        The worst case with 10000 comparisons would take about 250ms.

                        The following is quick and dirty just to see the idea in action.

                        Code:
                        #COMPILE  EXE
                        #REGISTER NONE
                        #DIM      ALL
                        #TOOLS    OFF
                        
                        %NOMMIDS = 1
                        %NOGDI = 1
                        
                        #INCLUDE "WIN32API.INC"
                        
                        DECLARE FUNCTION QueryPerformanceCounter LIB "KERNEL32.DLL" ALIAS "QueryPerformanceCounter" (lpPerformanceCount AS QUAD) AS LONG
                        DECLARE FUNCTION QueryPerformanceFrequency LIB "KERNEL32.DLL" ALIAS "QueryPerformanceFrequency" (lpFrequency AS QUAD) AS LONG
                        
                        '~~~~~~~~~~~A Variation of Dave Roberts' MACRO Timer~~~~~~~~~~~~~~~~~~~~~~~
                        MACRO onTimer
                          LOCAL qFreq, qOverhead, qStart, qStop AS QUAD
                          LOCAL f AS STRING
                          f = "#.###"
                          QueryPerformanceFrequency qFreq
                          QueryPerformanceCounter qStart ' Intel suggestion. First use may be suspect
                          QueryPerformanceCounter qStart ' So, wack it twice <smile>
                          QueryPerformanceCounter qStop
                          qOverhead = qStop - qStart     ' Relatively small
                        END MACRO
                        
                        MACRO goTimer = QueryPerformanceCounter qStart
                        MACRO stopTimer = QueryPerformanceCounter qStop
                        
                        MACRO showTimer = USING$(f,(qStop - qStart - qOverhead)*1000/qFreq) + " milliseconds"
                        '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                        
                        FUNCTION PBMAIN( ) AS LONG
                        
                        LOCAL j, dummy AS LONG
                        DIM s AS STRING
                        DIM ss AS STRING
                        
                        DIM a(1 TO 10000) AS LONG ' initial state
                        FOR j = 1 TO 10000
                          a(j) = j
                        NEXT
                        
                        j = 100000000 
                        s = STRING$(j,0)
                        ss = STRING$(j-100,0) + STRING$(100,1)
                        
                        RANDOMIZE TIMER
                        
                        onTimer
                        
                        goTimer
                          FOR j = 1 TO 10000
                            dummy = RND(j,10000)
                            IF MID$(s, (a(dummy)-1)*10000+1, 10000) <> MID$(ss, (a(dummy)-1)*10000+1, 10000) THEN EXIT FOR
                            ' Corrected above line - I had a(dummy)*10000. tch, tch
                            SWAP a(j), a(dummy) ' to avoid repetition
                          NEXT
                        stopTimer
                        
                        MSGBOX STR$(j) + " " + ShowTimer
                        
                        END FUNCTION
                        Last edited by David Roberts; 30 Apr 2008, 02:18 AM.

                        Comment


                        • #32
                          Paul,

                          I'm away until this evening but on a first try I observed:

                          By exiting from the function when the residue(or entitey) of the "searchin" string is shorter than the "searchfor" string, a dramatic change when the strings are long - 000s of percent!

                          In a "lifelike" test, a blistering increase in speed was observed, down from 110 secs to 2 secs. But.. the results were, er, wrong - I will have to investigate later today. Meanwhile, blame me!

                          Certainly points the way, though!

                          David,

                          Are you by any chance nocturnal?!

                          Comment


                          • #33
                            On the Random Access thing.

                            Inside a big loop the average settled down to about 125ms. Not surprising as that is bang in the middle of 0 and 250.

                            A sequential approach would also see an average of 125ms if the difference data was allowed to be random from the 'left side' to the 'right side'.

                            There is, however, a difference. The sequential approach gives rise to a rectangular distribution whereas the random approach gives rise to a normal distribution. Most of the random results will be 'packed' about the average with fast and slow results being rare. The sequential results will be equally likely.

                            The input data has bee confined to two strings almost the same with a small difference section lurking somewhere in one of them.

                            What Paul is up to is much more interesting.

                            David,

                            Are you by any chance nocturnal?!
                            Sometimes.
                            Last edited by David Roberts; 30 Apr 2008, 03:45 AM.

                            Comment


                            • #34
                              Wow, Paul did that whole thing and I am like maybe 1/2 done, and judging by the results Paul is getting, 1/2 is as far as I'm going. Does the phrase "beating a dead horse" ring any bells?

                              I do have a question for Paul: A while ago I saw a technique by I think it was Agner Fog, where you can determine if a certain byte is in a dword. Say, you're looking for &h31 in &h23573166. You do a couple operations and if the byte is present, the dword becomes maybe 0, or even &hffffffff. The effect is like maybe this:

                              Code:
                               !mov ecx, &h23573166
                               !mov eax, &h23573166
                               !xor ecx, &h31313131  ;12660057
                               !xor eax, &h31313131  ;12660057
                               !shr ecx, 16          ;00001266
                               !and al,  ah          ;12660000
                               !and cl,  ch          ;00001202
                               !and cx,  ax          ;00000000
                               !jz location
                               location:
                              and ecx will be 0 if &h31 is one of the 4 bytes in the dword. Do you know what that algorithm is?

                              Comment


                              • #35
                                Paul,
                                Thank for the code, even better its well commented.

                                One thing, I tried it on 2 machines and did not get valid results.

                                For example, Calling Instr2("12345x7890", "x") return 1 instead of 6
                                Problem seem's to be around <<!sub esi, aptr&>>,
                                this look like esi is resetted to zero,
                                it dosen't make any sense to me because
                                I cheched esi and aptr& just before this line and they are valid.

                                Does anybody have different result with this ?

                                Comment


                                • #36
                                  Pierre,
                                  I can't duplicate your error. The code produces 6 here, as it should. Which CPU are you using?


                                  David,
                                  With ss = String$(j-100,0) + String$(100,1) your code locks up
                                  A quick test shows it does. I'll look into it when I get time.



                                  Chris,
                                  But.. the results were, er, wrong
                                  I haven't yet found any wrong results although Pierre reckons he gets the wrong answer for some. Can you post an example of a wrong result along with the CPU you're using?

                                  Paul.

                                  Comment


                                  • #37
                                    I'm getting the same as Pierre, Paul

                                    Result is OK if I change the last lines to

                                    Code:
                                    'got it!
                                    xit2:
                                    '!Sub esi,aptr&          'Subtract start of string address to get offset into string
                                    '!inc esi                'because I use 0 for the first character and it should be 1
                                    !mov result&,esi             'write position to result
                                    
                                    xit:
                                    
                                    !popad      'restore all registers
                                        
                                        Function=result& - aptr& + 1
                                        
                                    End Function
                                    My CPU is an Intel E6700

                                    ADDED: Problem solved if I use #REGISTER NONE in Instr2.
                                    Last edited by David Roberts; 30 Apr 2008, 12:30 PM.

                                    Comment


                                    • #38
                                      Tried on those 3...
                                      Intel E6750
                                      AMD Athlox XP 2600+
                                      AMD Athlon 700mhz

                                      David solution works for me too :-)

                                      Comment


                                      • #39
                                        I normally use #REGISTER NONE in Functions which use a mix of PB and assembler. On this occasion I just copied Paul's code and pasted without further ado.

                                        Without its inclusion aptr& is being allocated esi so when we get to !Sub esi,aptr&, esi becomes zero.

                                        Perhaps Paul is using #REGISTER NONE globally.

                                        Comment


                                        • #40
                                          >Perhaps Paul is using #REGISTER NONE globally

                                          I do, in my "template" file. When I want REGISTER vars, I choose at the LOCAL level.
                                          Michael Mattias
                                          Tal Systems Inc. (retired)
                                          Racine WI USA
                                          [email protected]
                                          http://www.talsystems.com

                                          Comment

                                          Working...
                                          X