Here’s a very nice implementation of the Jaro-Winkler algorithm in Excel VBA.
I use it for fuzzy matching, clustering data and looking for duplicates.
The code is modified from Jay Tracewell’s post here. I simplified the code by taking out the clean_string function and renaming the function to JW.
You can read more about the Jaro-Winkler Distance on Wikipedia.
Example:
Image may be NSFW.
Clik here to view.
Here comes the code: (Since it’s a function, remember to insert it in a module)
Function JW(ByVal str1 As String, ByVal str2 As String) As Double Dim l1, l2, lmin, lmax, m, i, j As Integer Dim common As Integer Dim tr As Double Dim a1, a2 As String l1 = Len(str1) l2 = Len(str2) If l1 > l2 Then aux = l2 l2 = l1 l1 = aux auxstr = str1 str1 = str2 str2 = auxstr End If lmin = l1 lmax = l2 Dim f1(), f2() As Boolean ReDim f1(l1), f2(l2) For i = 1 To l1 f1(i) = False Next i For j = 1 To l2 f2(j) = False Next j m = Int((lmax / 2) - 1) common = 0 tr = 0 For i = 1 To l1 a1 = Mid(str1, i, 1) If m >= i Then f = 1 L = i + m Else f = i - m L = i + m End If If L > lmax Then L = lmax End If For j = f To L a2 = Mid(str2, j, 1) If (a2 = a1) And (f2(j) = False) Then common = common + 1 f1(i) = True f2(j) = True GoTo linea_exit End If Next j linea_exit: Next i Dim wcd, wrd, wtr As Double L = 1 For i = 1 To l1 If f1(i) Then For j = L To l2 If f2(j) Then L = j + 1 a1 = Mid(str1, i, 1) a2 = Mid(str2, j, 1) If a1 <> a2 Then tr = tr + 0.5 End If Exit For End If Next j End If Next i wcd = 1 / 3 wrd = 1 / 3 wtr = 1 / 3 If common <> 0 Then JW = wcd * common / l1 + wrd * common / l2 + wtr * (common - tr) / common Else JW = 0 End If End Function