'

'LEVENSHTEIN.BAS - Ported by Wayne Diamond to PB from the VB sample at http://www.merriampark.com/ld.htm

'Quote from ld.htm:

' Levenshtein distance (LD) is a measure of the similarity between two strings,

' which we will refer to as the source string (s) and the target string (t). The

' distance is the number of deletions, insertions, or substitutions required to

' transform s into t. For example, If s is "test" and t is "test", then LD(s,t) = 0,

' because no transformations are needed. The strings are already identical. If s is

' "test" and t is "tent", then LD(s,t) = 1, because one substitution

' (change "s" to "n") is sufficient to transform s into t. The greater

' the Levenshtein distance, the more different the strings are.

' Levenshtein distance is named after the Russian scientist Vladimir

' Levenshtein, who devised the algorithm in 1965. If you can't spell or pronounce

' Levenshtein, the metric is also sometimes called edit distance.

[This message has been edited by Wayne Diamond (edited June 05, 2001).]

**Updated**6th June 2001, cheers Lance for the optimisations'LEVENSHTEIN.BAS - Ported by Wayne Diamond to PB from the VB sample at http://www.merriampark.com/ld.htm

'Quote from ld.htm:

' Levenshtein distance (LD) is a measure of the similarity between two strings,

' which we will refer to as the source string (s) and the target string (t). The

' distance is the number of deletions, insertions, or substitutions required to

' transform s into t. For example, If s is "test" and t is "test", then LD(s,t) = 0,

' because no transformations are needed. The strings are already identical. If s is

' "test" and t is "tent", then LD(s,t) = 1, because one substitution

' (change "s" to "n") is sufficient to transform s into t. The greater

' the Levenshtein distance, the more different the strings are.

' Levenshtein distance is named after the Russian scientist Vladimir

' Levenshtein, who devised the algorithm in 1965. If you can't spell or pronounce

' Levenshtein, the metric is also sometimes called edit distance.

Code:

#COMPILE EXE "lsdist.exe" '******************************** '*** Compute Levenshtein Distance '******************************** FUNCTION LD(BYVAL s AS STRING, BYVAL t AS STRING) AS LONG DIM d() AS LONG ' matrix DIM m AS LONG ' length of t DIM n AS LONG ' length of s DIM i AS LONG ' iterates through s DIM j AS LONG ' iterates through t DIM s_i AS STRING ' ith character of s DIM t_j AS STRING ' jth character of t DIM cost AS LONG ' cost ' Step 1 n = LEN(s) m = LEN(t) IF n = 0 THEN LD = m EXIT FUNCTION END IF IF m = 0 THEN LD = n EXIT FUNCTION END IF REDIM d(0 TO n, 0 TO m) AS LONG ' Step 2 FOR i = 0 TO n d(i, 0) = i NEXT i FOR j = 0 TO m d(0, j) = j NEXT j ' Step 3 FOR i = 1 TO n s_i = MID$(s, i, 1) ' Step 4 FOR j = 1 TO m t_j = MID$(t, j, 1) ' Step 5 IF s_i = t_j THEN cost = 0 ELSE cost = 1 END IF ' Step 6 d(i, j) = MIN%(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + cost) NEXT j NEXT i ' Step 7 LD = d(n, m) ERASE d END FUNCTION FUNCTION PBMAIN() AS LONG DIM LevenshteinDistance AS LONG String1$ = "test" String2$ = "test" LevenshteinDistance = LD(String1$, String2$) STDOUT "The Levenshtein Distance between " & String1$ & " and " & String2$ & " is: " & STR$(LevenshteinDistance) String1$ = "pie" String2$ = "pies" LevenshteinDistance = LD(String1$, String2$) STDOUT "The Levenshtein Distance between " & String1$ & " and " & String2$ & " is: " & STR$(LevenshteinDistance) String1$ = "doom" String2$ = "wolfenstein" LevenshteinDistance = LD(String1$, String2$) STDOUT "The Levenshtein Distance between " & String1$ & " and " & String2$ & " is: " & STR$(LevenshteinDistance) END FUNCTION

[This message has been edited by Wayne Diamond (edited June 05, 2001).]

## Comment