Announcement

Collapse
No announcement yet.

Algorithm Challenge

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Algorithm Challenge

    Hi All,

    I have a need to compare two strings in a sliding fashion. For example, lets suppose you have two strings:

    A B C D E
    F G H I J

    You want to compare these two in a way that the first comparison might be:

    E
    F

    Next you slide and compare:

    D E
    F G

    Next you slide and compare:

    C D E
    F G H

    And onward until the last comparison would then be:

    A
    J

    In each case you are comapring overlapping characters to eachother to see if they match, but that is secondary to the algorithm. I have written this and it works, but I am wondering if it could be done faster, because in my application speed matters. Of course, ASM would be great, but I am wondering if anyone can do it faster than my code. Below is the code (note that z and y are indexes into the string which is stored in an array...also rawinputlength is length of the string, a.k.a the number of elements in the array).

    Code:
    'this first section slides the strings until the are almost directly over eachother
            matches2 = 0: matches = 0
            overlap1 = %rawInpLength: overlap2 = 1
            z = %rawInpLength
            y = 1
            DO
                 DO WHILE z >= overlap1
                      DO WHILE y >= 1
                          '''''Do comparison here
                          z = z - 1
                          y = y - 1
                      WEND
                 WEND
                 DECR overlap1
                 INCR overlap2
                 z = %rawInpLength
                 y = overlap2
    
            LOOP WHILE overlap2 <= %rawInpLength-1
    
    'this section then continues the sliding forward from where the last code finished
            FOR z = 0 TO %rawInpLength-1
    
                 FOR y = 1 TO %rawInpLength - z
                        '''''Do comparison here
                 NEXT y
            NEXT z
    Last edited by Kevin Cramer; 11 Nov 2009, 11:30 AM.

  • #2
    You might want to do some research on the 'Ceasar Cipher'. Looks like that is what you are looking for.
    There are no atheists in a fox hole or the morning of a math test.
    If my flag offends you, I'll help you pack.

    Comment


    • #3
      Kevin,

      How about giving a compilable of your approach so we can test if our solution is faster?

      Comment


      • #4
        I'm still not 100% sure of what you're looking for, but here's an approach that tests the strings "abcdx" and "cdxab". It took 2.3 seconds for 1,000,000 tests. In a second test, with a string of length 29 characters, it took right at 3 seconds for the 1,000,000 tests.

        The output gives the strings tested, which slider position were a match and how long it took.

        Without your code to compare it to, I don't know if that's fast or slow - or even if the output is what you were looking for.

        I've only done a limited numbers of test strings, so let me know if you find anything wrong with it.

        Code:
        'Compilable Example:
        #Compile Exe
        #Dim All
        #Include "Win32API.inc"
        Global hDlg as DWord
        Function PBMain() As Long
           Dialog New Pixels, 0, "Test Code",300,300,200,200, %WS_OverlappedWindow To hDlg
           Control Add Button, hDlg, 100,"Push", 50,10,100,20
           Dialog Show Modal hDlg Call DlgProc
        End Function
        CallBack Function DlgProc() As Long
           If CB.Msg = %WM_Command AND CB.Ctl = 100 AND CB.Ctlmsg = %BN_Clicked Then
              Dim iStart As Long, iEnd As Long, Result As String, A as String, B as String
              A = "abcdx"  :  B = "cdxab"
              iStart = GetTickCount
              Result = SlidingComparison( 1000000, A, B)
              iEnd = GetTickCount
              MsgBox "Strings:" + $crlf + A + $crlf + B + $crlf + $crlf + Format$((iEnd - iStart)/1000,3) & " seconds" + $crlf + $crlf + "Matches:" + $crlf  + Result
           End If
        End Function
        
        Function SlidingComparison(iLoops&, A as String, B as String) As String
            'strings must be equal length
            Local pEndA As Byte Pointer, pStartB As Byte Pointer, sLength As Long, iNotMatchFlag As Long, Matches$, X As Long
            Local iMisMatch As Long, pATemp As Byte Pointer, pBTemp As Byte Pointer, i As Long, j As Long, pStartA As Byte Pointer
            sLength  = Len(A)
            pEndA    = StrPTR(A) + sLength - 1
            pStartA  = StrPTR(A)
            pStartB  = StrPTR(B)
        
            For x = 1 To iLoops&
                Matches$ = ""
        
                For i = 1 To sLength                           ' i is the amount of overlap
                    iNotMatchFlag = 0
                    For j = 1 To i
                        pATemp = pEndA - i + j
                        pBTemp = pStartB + j - 1
                        If @pATemp <> @pBTemp Then
                            iNotMatchFlag = 1
                            Exit For
                        End If
                    Next j
                    If iNotMatchFlag = 0 Then  Matches$ = Matches$ + " " + Str$(i)
                Next i
        
                For i = 2 To sLength                            ' i is the amount of overlap
                    iNotMatchFlag = 0
                    For j = i To sLength
                        pATemp = pStartA - i + j
                        pBTemp = pStartB + j - 1
                        If @pATemp <> @pBTemp Then
                            iNotMatchFlag = 1
                            Exit For
                        End If
                    Next j
                    If iNotMatchFlag = 0 Then  Matches$ = Matches$ + " " + Str$(i+sLength)
                Next i
        
            Next X
            Function = Matches$
        
        End Function

        Comment


        • #5
          Originally posted by Gary Beene View Post
          Kevin,

          How about giving a compilable of your approach so we can test if our solution is faster?
          Here is some simple code to precede the prior code that should serve as a decent test bed. You can make the strings whatever length you want...

          Code:
          #COMPILE EXE
          #DIM ALL
          
          rawInpLength = 300
          
          FUNCTION PBMAIN () AS LONG
          
              REGISTER x AS LONG, y AS LONG, z AS LONG
              DIM overlap1 AS LONG, overlap2 AS LONG
              
              DIM temp1(300) AS STRING
              DIM temp2(300) AS STRING
              
              FOR x = 1 TO 300
                   temp1(x) = CHR$(RND(65,90))
                   temp2(x) = CHR$(RND(65,90))
              NEXT x
              
              ''''insert rest of code here...
                   
          
          END FUNCTION

          Comment


          • #6
            This is a quite fast way without asm:

            Added: REGISTER'ed two vars (changed lines in red) for a nice speed increase
            Code:
            #COMPILE EXE
            #DIM ALL
            
            FUNCTION PBMAIN () AS LONG
            [COLOR="Red"]   LOCAL str1, str2 AS STRING, ii, rawLen AS LONG, t AS QUAD
               REGISTER overlap AS LONG, compare AS LONG
            [/COLOR]   rawLen = 100
               str1 = SPACE$(rawLen)
               str2 = STRING$(rawLen, 32)
               DIM s1Arr(1 TO rawLen) AS BYTE AT STRPTR(str1)
               DIM s2Arr(1 TO rawLen) AS BYTE AT STRPTR(str2)
               DIM matches(1 TO rawLen) AS LONG
               DIM matches2(1 TO rawLen - 1) AS LONG
               
               FOR overlap = 1 TO rawLen
                  FOR compare = 1 TO overlap
                     IF s1Arr(rawLen - compare + 1) = s2Arr(compare) THEN INCR matches(overlap)
                  NEXT
               NEXT
               
               'and reverse strings, but don't count "center position" twice, so only go to 99
               
               FOR overlap = rawLen - 1 TO 1 STEP -1
                  FOR compare = 1 TO overlap
                     IF s2Arr(rawLen - compare + 1) = s1Arr(compare) THEN INCR matches2(overlap)
                  NEXT
               NEXT
               ? "Compare is done"
            '   WAITKEY$
            
               'show results.
               'Note: The below method to display above results is just for demo purposes and not final production code.
               'If the strings being compared are large, "str1 &= STR$(matches(ii))" is a VERY SLOW way to display
               'the results and should not be used. Iterative string concatenation is very slow and should be avoided.
               RESET str1
               FOR ii = 1 TO rawLen
                  str1 &= STR$(matches(ii))
               NEXT
               FOR ii = rawLen - 1 TO 1 STEP -1
                  str1 &= STR$(matches2(ii))
               NEXT
               ? str1
            '   WAITKEY$
            
            END FUNCTION
            Last edited by John Gleason; 11 Nov 2009, 06:26 PM. Reason: REGISTER'ed 2 vars.

            Comment


            • #7
              I ran a timing on the post #6 code and got 5 billion Tix for two (identical) strings of 16,000 bytes each. For non-matching strings the time was 3.7 billion Tix. So, 3.7 to 5 billion Tix depending on the two strings' matching level would be the algo speed.
              Last edited by John Gleason; 11 Nov 2009, 06:00 PM.

              Comment


              • #8
                Speed test

                Hi Gary,

                i had to modify your code to work with an array instead of pointers into a single string (see test code below). We get the same count at the end, which means your code works. Mine was a bit faster over a 30,000 element array...here were the results:

                test kevin start: 68241.125
                Kevin finish: 68306.359 34606953
                test Gary start: 68306.359
                test Gary end: 68376.312 34606953



                Code:
                %rawInpLength = 30000
                
                FUNCTION PBMAIN () AS LONG
                
                    REGISTER x AS LONG, y AS LONG, z AS LONG
                    DIM overlap1 AS LONG, overlap2 AS LONG, counter AS LONG
                    
                    counter = 0
                    
                    OPEN "speedtest.log" FOR OUTPUT AS #10
                
                    DIM temp1(%rawInpLength) AS STRING
                    DIM temp2(%rawInpLength) AS STRING
                
                    FOR x = 1 TO %rawInpLength
                         temp1(x) = CHR$(RND(65,90))
                         temp2(x) = CHR$(RND(65,90))
                    NEXT x
                    
                    PRINT#10,"test kevin start:";TIMER
                
                    overlap1 = %rawInpLength: overlap2 = 1
                    z = %rawInpLength
                    y = 1
                    DO
                         DO WHILE z >= overlap1
                              DO WHILE y >= 1
                                  '''''Do comparison here
                                  
                                  IF temp1(z) = temp2(y) THEN INCR counter
                                  z = z - 1
                                  y = y - 1
                              WEND
                         WEND
                         DECR overlap1
                         INCR overlap2
                         z = %rawInpLength
                         y = overlap2
                
                    LOOP WHILE overlap2 <= %rawInpLength-1
                
                      'this section then continues the sliding forward from where the last code finished
                    FOR z = 0 TO %rawInpLength-1
                
                         FOR y = 1 TO %rawInpLength - z
                             IF temp1(y) = temp2(y+z) THEN INCR counter
                         NEXT y
                    NEXT z
                    
                    PRINT#10,"Kevin finish:";TIMER, counter
                    
                    
                    LOCAL pEndA AS LONG, pStartB AS LONG, sLength AS LONG, iNotMatchFlag AS LONG, Matches$
                    LOCAL iMisMatch AS LONG, pATemp AS LONG, pBTemp AS LONG, i AS LONG, j AS LONG, pStartA AS LONG
                                        
                    PRINT#10,"test Gary start:";TIMER
                    sLength  = %rawInpLength
                    pEndA    = %rawInpLength
                    pStartA  = 1
                    pStartB  = 1
                    counter = 0
                
                    FOR i = 1 TO sLength                           ' i is the amount of overlap
                        FOR j = 1 TO i
                            pATemp = pEndA - i + j
                            pBTemp = pStartB + j - 1
                            IF temp1(pATemp) = temp2(pBTemp) THEN
                                INCR counter
                            END IF
                        NEXT j
                    NEXT i
                
                    FOR i = 2 TO sLength                            ' i is the amount of overlap
                        FOR j = i TO sLength
                            pATemp = pStartA - i + j
                            pBTemp = pStartB + j - 1
                            IF temp1(pATemp) = temp2(pBTemp) THEN
                                INCR counter
                            END IF
                        NEXT j
                    NEXT i
                
                    PRINT#10,"test Gary end:";TIMER,counter
                
                END FUNCTION

                Comment


                • #9
                  John Test

                  Hi John,

                  I tested your method, code below, but got a different match count than my code. Perhaps I made an error when converting your code to work with my scenario?

                  test kevin start: 71442.406
                  Kevin finish: 71518.875 34606953
                  test Gary start: 71518.875
                  test Gary end: 71610.546 34606953
                  test John Start: 71610.546
                  test John End: 71723.718 34560000

                  Code:
                      counter = 0
                      
                      PRINT#10,"test John Start:";TIMER
                      LOCAL overlap, compare, rawLen AS LONG
                      
                      FOR overlap = 1 TO %rawInpLength
                         FOR compare = 1 TO overlap
                            IF temp1(%rawInpLength - compare + 1) = temp2(compare) THEN INCR counter
                         NEXT
                      NEXT
                  
                     'and reverse strings, but don't count "center position" twice, so only go to 99
                  
                     FOR overlap = %rawInpLength - 1 TO 1 STEP -1
                        FOR compare = 1 TO overlap
                           IF temp2(%rawInpLength - compare + 1) = temp1(compare) THEN INCR counter
                        NEXT
                     NEXT
                     
                     PRINT#10,"test John End:";TIMER,counter

                  Comment


                  • #10
                    Kevin, that translation looks correct, must be a problem with my code. I'll see if I can debug it. Well, I just checked the timing and it's slower than both the other code examples, so no reason to debug that one, just eighty-six it.

                    I'll see if I can think of a different approach.
                    Last edited by John Gleason; 11 Nov 2009, 07:59 PM.

                    Comment


                    • #11
                      John,

                      I have never used the Register function before so when I saw your post I thought I'd give it a try on the example I already gave.

                      Here's the earlier example I posted, but with the loop variables declared with REGISTER.

                      But to my surprise, the new answer was almost the same as the old one - no gain worth mentioning.

                      All I did was use REGISTER on the i,j loop variables. These get changed a lot in the Function.

                      Would you know why a noticeable gain would not be seen? Like I said, I've not used them before but from reading the Help file I was expecting at least a noticeable speed improvement.

                      Code:
                      'Compilable Example:
                      #Compile Exe
                      #Register None
                      #Dim All
                      #Include "Win32API.inc"
                      Global hDlg As Dword
                      Function PBMain() As Long
                         Dialog New Pixels, 0, "Test Code",300,300,200,200, %WS_OverlappedWindow To hDlg
                         Control Add Button, hDlg, 100,"Push", 50,10,100,20
                         Dialog Show Modal hDlg Call DlgProc
                      End Function
                      CallBack Function DlgProc() As Long
                         If Cb.Msg = %WM_Command And Cb.Ctl = 100 And Cb.CtlMsg = %BN_Clicked Then
                            Dim iStart As Long, iEnd As Long, Result As String, A As String, B As String
                            A = "abcdx"  :  B = "cdxab"
                            iStart = GetTickCount
                            Result = SlidingComparison( 1000000, A, B)
                            iEnd = GetTickCount
                            MsgBox "Strings:" + $CrLf + A + $CrLf + B + $CrLf + $CrLf + Format$((iEnd - iStart)/1000,3) & " seconds" + $CrLf + $CrLf + "Matches:" + $CrLf  + Result
                         End If
                      End Function
                      
                      Function SlidingComparison(iLoops&, A As String, B As String) As String
                          'strings must be equal length
                          Register i As Long, j As Long
                          Local pEndA As Byte Pointer, pStartB As Byte Pointer, sLength As Long, iNotMatchFlag As Long, Matches$, x As Long
                          Local iMisMatch As Long, pATemp As Byte Pointer, pBTemp As Byte Pointer, pStartA As Byte Pointer
                          sLength  = Len(A)
                          pEndA    = StrPtr(A) + sLength - 1
                          pStartA  = StrPtr(A)
                          pStartB  = StrPtr(B)
                      
                          For x = 1 To iLoops&
                              Matches$ = ""
                      
                              For i = 1 To sLength                           ' i is the amount of overlap
                                  iNotMatchFlag = 0
                                  For j = 1 To i
                                      pATemp = pEndA - i + j
                                      pBTemp = pStartB + j - 1
                                      If @pATemp <> @pBTemp Then
                                          iNotMatchFlag = 1
                                          Exit For
                                      End If
                                  Next j
                                  If iNotMatchFlag = 0 Then  Matches$ = Matches$ + " " + Str$(i)
                              Next i
                      
                              For i = 2 To sLength                            ' i is the amount of overlap
                                  iNotMatchFlag = 0
                                  For j = i To sLength
                                      pATemp = pStartA - i + j
                                      pBTemp = pStartB + j - 1
                                      If @pATemp <> @pBTemp Then
                                          iNotMatchFlag = 1
                                          Exit For
                                      End If
                                  Next j
                                  If iNotMatchFlag = 0 Then  Matches$ = Matches$ + " " + Str$(i+sLength)
                              Next i
                      
                          Next X
                          Function = Matches$
                      
                      End Function
                      ... edited at 9pm. I left off the #Register None, but the results were the same - no noticeable difference between the two.
                      Last edited by Gary Beene; 11 Nov 2009, 09:03 PM.

                      Comment


                      • #12
                        I see you wrote to John...but i will add my two cents worth. The Powerbasic folks play up the register capability for performance improvements in their software, but i have seen very little evidence that it makes any difference. Perhaps with older OS's and CPU's it made more of a difference, but with modern processor and OS it seems to make little difference.

                        You may notice more with floating point variables being registered, which is only allowed for EXT data types...I have not tested that as thoroughly...

                        Cheers,
                        Kevin

                        Comment


                        • #13
                          John/Kevin,

                          I got the answer in a different thread. In my example above, questioning the value of Register, I didn't specifically ensure that i,j were not automatically used as register variables.

                          But when I rewrote the code to guarantee that i,j are NOT register variables, there is a speed difference although not that much since the content within the loops is the dominant speed factor.

                          Removing all of the content, leaving just the loop, the with-REGISTER variables time was 0.09 seconds vs 0.16 without register variables. So the speed improvement when using REGISTER is there, but it's just not large in comparison to the speed of all the content within the loops.

                          Comment


                          • #14
                            Kevin/Gary,
                            A possible reason the register vars. didn't make a difference between 2 code versions, is that both versions may already have similar variables registered even tho you didn't specify them. The compiler did it automatically and it matched or mostly matched your explicit declaration.

                            Sometimes it doesn't make a big difference using them or not, but I try to always give it a shot.

                            I'm further thinking that Paul Dixon's code in the DNA thread may be close to the answer here. Ever done any MMX code?

                            Comment


                            • #15
                              I have not done MMX code. This comparison is very similar to DNA sequence comparison..something I am slightly familiar with but not what this application is for. I will try to find that thread...thanks

                              Comment


                              • #16
                                Here is the thread:

                                http://www.powerbasic.com/support/pb...ad.php?t=41843

                                Comment


                                • #17
                                  Kevin, try the code below in your algorithm. Check if it is right, but I'm getting a several times speedup just converting to byte from string.
                                  Code:
                                      DIM temp1(%rawInpLength) AS BYTE
                                      DIM temp2(%rawInpLength) AS BYTE
                                  
                                      FOR x = 1 TO %rawInpLength
                                           temp1(x) = RND(65,90)
                                           temp2(x) = RND(65,90)
                                      NEXT x

                                  Comment


                                  • #18
                                    >> If @pATemp <> @pBTemp Then
                                    ==>
                                    IF (@pATemp XOR @pbTemp) THEN

                                    (Compiler may already be doing this for "<>" operator with integer class variables)
                                    Michael Mattias
                                    Tal Systems (retired)
                                    Port Washington WI USA
                                    [email protected]
                                    http://www.talsystems.com

                                    Comment


                                    • #19
                                      That's an interesting trick but, I'm not sure in what case an XOR would be any faster than an equality comparison.

                                      Comment


                                      • #20
                                        I should have said that I am actually comparing two arrays og long integers. I just used strings in my example because it was a bit easier to show what I needed using strings, since the core algorithm remains the same in either case. So any special tricks for making string comparisons faster will not help me.

                                        I am feeling that aside from ASM that the methos I came up with will not be beaten by any other method. I checked the DNA thread which essentially asked for the same thing and I think my code will probably run faster than the solutions over there too.

                                        Thanks all for the suggestions!

                                        Kevin

                                        Comment

                                        Working...
                                        X