Procedure to Display Text Difference

  • Procedure to Display Text Difference

    I've previously written a (not very robust) text comparison procedure and would like to make it a lot better. I'm doing the usual net search for algorithms and source code. I also looked on the forum but did not find any PowerBASIC code along those lines.

    Basically, I'd like to put text in two RichEdit controls and use colors to highlight the differences - a WinDiff kind of comparison. But I don't want a separate utility. I already use my limited procedure in gbSnippets to compare snippets from different libraries, and would like to improve the procedure.

    The code below is my starting point - it basically just does a look-ahead for a fixed number of lines. When a pair of lines differs, it takes the current right-side line and looks for a match within the next few lines on the left side (the look-ahead, iStep, is set to 20 in the code below). When it finds one, it offsets the differing lines, using a run of "*" characters to fill in the gap on the other side. Speed of the comparison shouldn't be an issue, given the relatively small snippet sizes I'm working with.

    My procedure generally works for small differences, but has limitations. It does not address RichEdit coloration of the differences.

    If anyone has something to share, I'd appreciate hearing about it.

    Code:
    'Compilable Example:
    #Compile Exe
    #Dim All
    #Include "win32api.inc"
    #Include "richedit.inc"
    #Include "commctrl.inc"
    %ID_RELeft = 200  : %ID_RERight = 300
    %ID_Button1 = 400 : %ID_Button2 = 500
    Global hDlg As Dword, hRELeft As Dword, hRERight As Dword
    
    Function PBMain () As Long
       Local Style&
       style& = %WS_Child Or %WS_Visible Or %ES_MultiLine Or %WS_VScroll Or %ES_AutoHScroll _
                 Or %WS_HScroll Or %ES_AutoVScroll Or %ES_WantReturn Or %ES_NoHideSel Or %WS_TabStop
       Dialog New Pixels, 0, "Compare Text Example",400,400,400,200, %WS_OverlappedWindow To hDlg
       Control Add Button, hDlg, %ID_Button1, "Compare", 30,10,100,20
       Control Add Button, hDlg, %ID_Button2, "Reset", 150,10,100,20
       LoadLibrary("riched32.dll") : InitCommonControls
       Control Add "RichEdit", hDlg, %ID_RELeft,  "",0,0,50,50, style&, %WS_Ex_ClientEdge
       Control Add "RichEdit", hDlg, %ID_RERight, "",0,0,50,50, style&, %WS_Ex_ClientEdge
       Control Handle hDlg, %ID_RELeft To hRELeft
       Control Handle hDlg, %ID_RERight To hRERight
       LoadText
       Dialog Show Modal hDlg Call DlgProc
    End Function
    
    CallBack Function DlgProc() As Long
       Select Case Cb.Msg
          Case %WM_Command
             If Cb.Ctl = %ID_Button1 Then ShowDifference 
             If Cb.Ctl = %ID_Button2 Then LoadText
          Case %WM_Size
             'resizes controls when form is resized
             Dim w As Long, h As Long
             Dialog Get Client Cb.Hndl To w,h
             Control Set Loc Cb.Hndl, %ID_RELeft, 5,35
             Control Set Size Cb.Hndl, %ID_RELeft, (w-20)/2, h-40
             Control Set Loc Cb.Hndl, %ID_RERight, w/2+5, 35
             Control Set Size Cb.Hndl, %ID_RERight, (w-20)/2, h-40
       End Select
    End Function
    
    Sub LoadText
       Local temp$
       temp$ = "These are lines" + $CrLf + "of text to compare" + $CrLf + "to the right side."
       Control Set Text hDlg, %ID_RELeft, temp$
       temp$ = "These are lines" + $CrLf + "of text to compare" + $CrLf + "to the left side."
       Control Set Text hDlg, %ID_RERight, temp$
    End Sub
    
    
    Sub ShowDifference
        Local Found As Long, iStep As Long, A As String, B As String, SColor As String
        Local i As Long, j As Long, n As Long, Done As Long, r As Long
        Local A1() As String, B1() As String, S() As String, LText As String, RText As String
    
        'get raw text data
        Control Get Text hDlg, %ID_RELeft To A
        Control Get Text hDlg, %ID_RERight To B
    
        ReDim A1(ParseCount(A,$CrLf)-1), B1(ParseCount(B,$CrLf)-1)
        Parse A, A1(), $CrLf : Parse B, B1(), $CrLf
    
        LText = "" : RText = ""
        'start by comparing lines A1(0) and B1(0)
        Do While Not Done
            If i > UBound(A1) Then
                'no more A1() entries, so load all remaining B1() entries
                For n = j To UBound(B1)
                    LText = LText + $CrLf + ""
                    RText = RText + $CrLf + B1(n)
                    SColor = SColor & ":" & Str$(n)
                Next n
                Done = %True
                Exit Do
            End If
            If j > UBound(B1) Then
                'no more B1() entries, so load all remaining A1() entries
                For n = i To UBound(A1)
                    LText = LText + $CrLf + A1(n)
                    RText = RText + $CrLf + ""
                    SColor = SColor & ":" & Str$(n)
                Next n
                Done = %True
                Exit Do
            End If
            If A1(i) = B1(j) Then
                'they are equal so display them
                LText = LText + $CrLf + A1(i)
                RText = RText + $CrLf + B1(j)
                'go to next pair of lines
                i = i + 1
                j = j + 1
            Else
                'they are not equal, so check to see if B1(j) is
                'found within the next iStep lines of A1()
                Found = %False
                iStep = 20
                If i + iStep > UBound(A1) Then iStep = UBound(A1) - i
                For r = 1 To iStep
                    If A1(i + r) = B1(j) Then
                        Found = %True
                        Exit For
                    End If
                Next r
                If Found = %True Then
                    'B1(j) was found within iStep lines of A1()
                    'print all of A1(i)-n/a up to the point it is found
                    'then print A1(i)-B1(j)
                    'If r > 1 Then
                        For n = 0 To r - 1
                            LText = LText + $CrLf + A1(i + n)
                            RText = RText + $CrLf + String$(200, "*") 
                            SColor = SColor & ":" & Str$(i + n)
                        Next n
                    'End If
                    LText = LText + $CrLf + A1(i + r)
                    RText = RText + $CrLf + B1(j)
                    i = i + r + 1
                    j = j + 1
                Else
                    'B1(j) was not found within iStep lines of A1()
                    'print n/a-B1(j)
                    LText = LText + $CrLf + String$(200, "*") 
                    RText = RText + $CrLf + B1(j)
                    SColor = SColor & ":" & Str$(j)
                    j = j + 1
                End If
            End If
        Loop
    
        'get rid of leading $crlf
        If Len(LText) > 0 Then LText = Right$(LText, Len(LText) - 2)
        If Len(RText) > 0 Then RText = Right$(RText, Len(RText) - 2)
    
        'show differences
        Control Set Text hDlg, %ID_RELeft, LText
        Control Set Text hDlg, %ID_RERight, RText
    
    End Sub
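
    For the RichEdit coloration that the procedure above leaves out, a minimal sketch might look like the following. ColorLine is a hypothetical helper, not part of the posted code. It assumes the control does not word-wrap (so display lines equal text lines) and uses only the standard messages EM_LINEINDEX, EM_LINELENGTH, EM_EXSETSEL and EM_SETCHARFORMAT.

    Code:
    'Sketch only: ColorLine is a hypothetical helper, not part of the code above.
    'Colors line iLine (0-based) of the RichEdit control hRE with the color clr.
    Sub ColorLine(ByVal hRE As Dword, ByVal iLine As Long, ByVal clr As Long)
       Local cf As CHARFORMAT, cr As CHARRANGE
       'find the character range occupied by the line
       cr.cpMin = SendMessage(hRE, %EM_LineIndex, iLine, 0)
       If cr.cpMin < 0 Then Exit Sub                      'no such line
       cr.cpMax = cr.cpMin + SendMessage(hRE, %EM_LineLength, cr.cpMin, 0)
       SendMessage hRE, %EM_ExSetSel, 0, VarPtr(cr)       'select the line
       'apply the text color to the selection only
       cf.cbSize      = Len(cf)
       cf.dwMask      = %CFM_Color
       cf.crTextColor = clr
       SendMessage hRE, %EM_SetCharFormat, %SCF_Selection, VarPtr(cf)
       'collapse the selection back to the start of the text
       cr.cpMin = 0 : cr.cpMax = 0
       SendMessage hRE, %EM_ExSetSel, 0, VarPtr(cr)
    End Sub

    It could be called after the Control Set Text lines in ShowDifference, for example ColorLine hRERight, 2, Rgb(255,0,0) to turn the third line red. The caller would need to record the output line numbers of the differing lines while LText and RText are built, since SColor as posted stores indexes into the input arrays.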

  • #2
    I worked out a solution to the problem I raised - a better approach to comparing two strings and displaying their differences. The result is posted in the Source Code forum.

    http://www.powerbasic.com/support/pb...ad.php?t=41543

    The solution is a variation on the LCS (longest common subsequence) algorithm - converted to PowerBASIC of course.

    BTW, credit goes to Rod Stephens, author of several programming books. He had written an article on the topic and was able to help me get past a sticking point on the algorithm.
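
    The LCS (longest common subsequence) idea mentioned above can be sketched over arrays of lines as follows. This is only an illustration of the textbook dynamic-programming table, not the code posted in the Source Code forum. LCSLength is a hypothetical name, and the arrays are assumed to be 0-based, as in the ReDim/Parse pattern of the first post.

    Code:
    'Sketch only: LCSLength is a hypothetical name, not the posted solution.
    'Returns the number of lines in the longest common subsequence of A() and B().
    Function LCSLength(A() As String, B() As String) As Long
       Local i As Long, j As Long, m As Long, n As Long
       Local L() As Long
       m = UBound(A) + 1 : n = UBound(B) + 1     'line counts, assuming 0-based arrays
       ReDim L(m, n)                             'L(i,j) = LCS length of the first i lines of A and first j lines of B
       For i = 1 To m
          For j = 1 To n
             If A(i - 1) = B(j - 1) Then
                L(i, j) = L(i - 1, j - 1) + 1    'lines match: extend the common subsequence
             ElseIf L(i - 1, j) >= L(i, j - 1) Then
                L(i, j) = L(i - 1, j)            'no match: keep the better neighbor
             Else
                L(i, j) = L(i, j - 1)
             End If
          Next j
       Next i
       Function = L(m, n)
    End Function

    Called as LCSLength(A1(), B1()) after the two arrays have been parsed, a result equal to the line count of both arrays means the two texts are identical line for line.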



    • #3
      Gary, thanks for posting that. Can you post a link or links to the source of the algos for your code?



      • #4
        Chris,

        Here are the three I used the most.

        http://en.wikibooks.org/wiki/Algorit...on_subsequence

        http://www.ics.uci.edu/~eppstein/161/960229.html

        http://en.wikipedia.org/wiki/Longest...quence_problem



        • #5
          Hey Chris(H),
          I'm about to release a new app called gbCompareText which uses a modification of the last code I posted on comparing text. Here's a screen shot of gbCompareText, which provides a side by side view of two files (or clipboard content) with differences highlighted.



          While I'm finishing up a few things with the app, and since you asked about the code at the time, I thought I'd see if you had made any changes or improvements to the code I published that you could share, or if you simply had any comments/suggestions to make.

          I'm also working on a new release of my PowerBASIC source code library (powerbasic.gbs), which has grown to over 1100 snippets. As part of cleaning up the code, I've found a real need for a text comparator that can work off the clipboard as well as from files. That need has sidetracked me into creating gbCompareText (yes, I know I'm easy to sidetrack). So, over the last few days I've come up with gbCompareText and am doing more intensive testing of the primary EditDistanceLines function, modified to generate a side by side output rather than the strikeout result that my posted code created.
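
          As a rough illustration of how an LCS table can drive a side by side output, here is a minimal sketch. It is not the EditDistanceLines function mentioned above, whose code is not shown in this thread. SideBySide is a hypothetical name, the arrays are assumed to be 0-based, and an empty line is used as the gap filler.

          Code:
          'Sketch only: SideBySide is a hypothetical name, not EditDistanceLines.
          'Builds aligned left/right texts; a line missing on one side becomes an empty line.
          Sub SideBySide(A() As String, B() As String, LText As String, RText As String)
             Local i As Long, j As Long, m As Long, n As Long
             Local L() As Long
             m = UBound(A) + 1 : n = UBound(B) + 1
             ReDim L(m, n)                        'L(i,j) = LCS length of A(i..) and B(j..); row m and column n stay 0
             For i = m - 1 To 0 Step -1
                For j = n - 1 To 0 Step -1
                   If A(i) = B(j) Then
                      L(i, j) = L(i + 1, j + 1) + 1
                   ElseIf L(i + 1, j) >= L(i, j + 1) Then
                      L(i, j) = L(i + 1, j)
                   Else
                      L(i, j) = L(i, j + 1)
                   End If
                Next j
             Next i
             'walk the table from the top, emitting one output row per step
             LText = "" : RText = "" : i = 0 : j = 0
             Do While i < m Or j < n
                If i = m Then                     'only B() lines remain
                   LText = LText + $CrLf : RText = RText + B(j) + $CrLf : j = j + 1
                ElseIf j = n Then                 'only A() lines remain
                   LText = LText + A(i) + $CrLf : RText = RText + $CrLf : i = i + 1
                ElseIf A(i) = B(j) Then           'common line: show on both sides
                   LText = LText + A(i) + $CrLf : RText = RText + B(j) + $CrLf
                   i = i + 1 : j = j + 1
                ElseIf L(i + 1, j) >= L(i, j + 1) Then
                   LText = LText + A(i) + $CrLf : RText = RText + $CrLf : i = i + 1   'line only on the left
                Else
                   LText = LText + $CrLf : RText = RText + B(j) + $CrLf : j = j + 1   'line only on the right
                End If
             Loop
             'drop the trailing $CrLf
             If Len(LText) > 0 Then LText = Left$(LText, Len(LText) - 2)
             If Len(RText) > 0 Then RText = Left$(RText, Len(RText) - 2)
          End Sub

          The full table costs (m+1) x (n+1) Long values, which should be acceptable for snippet-sized inputs but would need rethinking for large files.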

          Any comments would be appreciated.

          PS: I'm an enormous fan of Beyond Compare and use it all the time. But it's a commercial product and I wanted something I could publish, as well as include in several other future apps I'm working on.

          I also looked at several freeware products, such as WinMerge. The size of the packages and the need for a user to install a secondary app kept me from deciding to use one of those products.

          Interestingly, I did find a few utilities at Code Project that were reasonable (and compact) solutions. But since I already had the PowerBASIC source code for a solution in hand, and could customize it to my specific needs, I decided to go ahead with gbCompareText.



          • #6
            And, just in case you're interested, here are some of the freeware packages I looked at:

            WinMerge
            ExamDiff
            KMB Text Compare
            Diff Merge
            Diffuse
            GNU DiffUtils (for Windows)
            FreeDiff
            KDiff3
            WinDiff (Microsoft)

            And over at Code Project, these caught my eye. They actually had a contest on the topic of text comparison, so there were several articles to look at:

            KGDiff (Groves)
            Diff Tool (Rodriguez)
            Differ (Hovel)
            DiffEngine (?)
            O(ND) Diff (Hertel)

            I also found these articles which give good summaries of available tools:
            25+ Useful Document and File Comparison Tools
            Best Diff Tool
            File Comparison Utilities

            There were even a few online text comparison options:

            www.text-compare.com
            www.textdiff.com
            diffchecker.com
            Last edited by Gary Beene; 9 Feb 2012, 12:48 AM.



            • #7
              And (still talking to myself) I not only wanted to have the source code available, I wanted the result to be fairly simple - an uncomplicated GUI with a limited set of key features.

              I found that a lot of the available tools had several features such as creating patch files (to convert one file to another), comparison of 3+ files, support for non-text files, folder level comparisons and synchronization, document merging, source code syntax highlighting, registry comparison, and on and on ....

              All of those are great features, but my needs are much simpler. I just want to know (and see) how two text files/strings differ.



              • #8
                Originally posted by Gary Beene
                ... since you asked about the code at the time, I thought I'd see if you had made any changes or improvements to the code I published that you could share, or if you simply had any comments/suggestions to make.
                Absolutely none, unfortunately. I think I just wanted you to credit your sources, which you did.

                For years I have used FolderMatch to do this sort of thing and it does all I need.

                But two and a half years on, I stand in awe of your perseverance and organisation!



                • #9
                  >had several features such as creating patch files (to convert one file to another),

                  When I did work on IBM mainframes in COBOL, several of my clients used 'patch id' scenarios. I found these were nice only when there were not a lot of changes and/or all the changes were pretty much big chunks of new/replacement code. When the patch ids start appearing all over the source code, they may as well not be there. YMMV.

                  I would think some of the commercial "source code library management" software would include "version/patch" options, e.g., "extract the source as it existed for version 2.5.12" even if you are now at version 3.1.9. But that stuff is generally not inexpensive.

                  MCM
                  Michael Mattias
                  Tal Systems (retired)
                  Port Washington WI USA
                  http://www.talsystems.com



                  • #10
                    Originally posted by Michael Mattias
                    But that stuff is generally not inexpensive.
                    See cvs, rcs, sccs, subversion, etc., all free AFAIK.



                    • #11
                      Hi MCM,
                      Yes, when I was looking at options, some of the tools that had just basic text comparison features were mostly < $50.

                      But those with expanded scope, such as patch making and version control, were several hundred dollars.



                      • #12
                        If your "WHAT" is text comparison, that's one thing.

                        If your "WHAT" is source code version management, that's another.

                        Your HOW should be selected based on the true WHAT.

                        Perhaps you can accomplish some limited - but sufficient for your application - source code management using a text comparison tool. I guess you have to decide what you really want.

                        MCM
                        Michael Mattias
                        Tal Systems (retired)
                        Port Washington WI USA
                        http://www.talsystems.com



                        • #13
                          Hi MCM,
                          Yep:
                          ... accomplish some limited - but sufficient for your application ...
                          Specifically, in gbSnippets I want to compare snippets from the PowerBASIC source code library to look for near-duplicate entries.

                          In this case, just a plain old visual text comparison (and a few supporting features) is sufficient.



                          • #14
                            Today's not-so-secret trade secret....

                            Getting users to define what they really want is at least half the job.

                            Sometimes I think the other half is getting users to explain clearly how a product is not serving their needs with something a bit more descriptive than "It Don't Work."
                            Michael Mattias
                            Tal Systems (retired)
                            Port Washington WI USA
                            http://www.talsystems.com



                            • #15
                              Originally posted by Gary Beene
                              And (still talking to myself) I not only wanted to have the source code available, I wanted the result to be fairly simple - an uncomplicated GUI with a limited set of key features.
                              Gary, sorry I didn't see this sooner... I will look at your file compare... I use WinMerge, and love the "yellow bands" at the left, to clue me as to where the diffs are, and how easy it is to move the diffs left or right (using CTRL-LEFT_ARROW or CTRL-RIGHT_ARROW).
                              3.14159265358979323846264338327950
                              "Ok, yes... I like pie... um, I meant, pi."



                              • #16
                                I'm a big fan of BeyondCompare as well. When I'm not using it, I just load up two files in EditPlus and Ctrl-Tab between them, looking for visual shifts in the file. Works pretty well for small jobs.
                                LarryC
                                Website
                                Sometimes life's a dream, sometimes it's a scream



                                • #17
                                  Hi Larry,
                                  Did you know that your technique of alternating two displays in front of the eyes is one of the techniques astronomers use to detect movement of objects in the sky, dark or light objects? It gives a visual "blinking" that alerts the viewer to subtle differences in two images.

                                  You must be a pretty fast Ctrl-Tab'er! :laugh:

                                  Hi Jim,
                                  Thanks. I should have the app released tomorrow for folks to take a look at. I just want to do some more testing on it before I release the code.



                                  • #18
                                    I know I am going crazy when I dream of somebody else's code while sleeping.
                                    Even though I have not had time to explore your code, I had a few thoughts that may or may not be of interest.
                                    1. If you remove all the differing lines from each file, are the two files then equal?
                                    I feel it would be a great effect to report that.
                                    2. A thought: among the lines highlighted as differences, could a line be highlighted in another color if it matches some other line in the compared-to file?
                                    In this case, any line such as "END IF" would most likely get highlighted, because it would likely match a line in the compared-to file.

                                    A question.
                                    Do you think it is better to check for lines that do not match, or to assume that no lines match and try to find the ones that do? It seems crazy to suggest an alternate way as if it would make a difference, and I am still working that one out in my head.
                                    p purvis
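
                                    The second suggestion above (flagging a differing line that also occurs somewhere else in the other file) could be checked with a helper like the one below. LineExistsIn is a hypothetical name, and ignoring leading and trailing spaces in the comparison is an assumption rather than anything stated in the thread.

                                    Code:
                                    'Sketch only: LineExistsIn is a hypothetical helper.
                                    'Returns %True if sLine matches any line of Other(), ignoring leading/trailing spaces.
                                    Function LineExistsIn(ByVal sLine As String, Other() As String) As Long
                                       Local k As Long
                                       sLine = Trim$(sLine)
                                       For k = 0 To UBound(Other)
                                          If Trim$(Other(k)) = sLine Then
                                             Function = %True
                                             Exit Function
                                          End If
                                       Next k
                                       Function = %False
                                    End Function

                                    A differing line for which LineExistsIn returns %True could then be given a second highlight color, since it appears elsewhere in the other text. Common lines such as "END IF" would be flagged often, as the post anticipates.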
