Quantcast
Viewing latest article 3
Browse Latest Browse All 6

Excel VBA: Jaro Winkler Distance (Fuzzy Matching)

Here’s a very nice implementation of the Jaro-Winkler algorithm in Excel VBA.

I use it for fuzzy matching, clustering data and looking for duplicates.

The code is modified from Jay Tracewell’s post here. I simplified the code by taking out the clean_string function and renaming the function to JW.

You can read more about the Jaro-Winkler Distance on Wikipedia.

Example:

Image may be NSFW.
Clik here to view.
JWtable

Here comes the code: (Since it’s a function, remember to insert it in a module)

Function JW(ByVal str1 As String, ByVal str2 As String) As Double
Dim l1, l2, lmin, lmax, m, i, j As Integer
Dim common As Integer
Dim tr As Double
Dim a1, a2 As String
l1 = Len(str1)
l2 = Len(str2)
If l1 > l2 Then
    aux = l2
    l2 = l1
    l1 = aux
    auxstr = str1
    str1 = str2
    str2 = auxstr
End If
lmin = l1
lmax = l2
Dim f1(), f2() As Boolean
ReDim f1(l1), f2(l2)
For i = 1 To l1
    f1(i) = False
Next i
For j = 1 To l2
    f2(j) = False
Next j
m = Int((lmax / 2) - 1)
common = 0
tr = 0
For i = 1 To l1
    a1 = Mid(str1, i, 1)
    If m >= i Then
        f = 1
        L = i + m
    Else
        f = i - m
        L = i + m
    End If
    If L > lmax Then
        L = lmax
    End If
    For j = f To L
        a2 = Mid(str2, j, 1)
        If (a2 = a1) And (f2(j) = False) Then
            common = common + 1
            f1(i) = True
            f2(j) = True
            GoTo linea_exit
        End If
    Next j
linea_exit:
Next i
Dim wcd, wrd, wtr As Double
L = 1
For i = 1 To l1
    If f1(i) Then
        For j = L To l2
            If f2(j) Then
                L = j + 1
                a1 = Mid(str1, i, 1)
                a2 = Mid(str2, j, 1)
            If a1 <> a2 Then
                tr = tr + 0.5
            End If
            Exit For
            End If
        Next j
    End If
Next i
wcd = 1 / 3
wrd = 1 / 3
wtr = 1 / 3
If common <> 0 Then
    JW = wcd * common / l1 + wrd * common / l2 + wtr * (common - tr) / common
Else
    JW = 0
End If
End Function

Viewing latest article 3
Browse Latest Browse All 6

Trending Articles