Levenshtein Distance

Levenshtein distance defines the distance between strings. It is used for spell checking and suggestions (among other applications). It is interesting as it reminded me of Hamming distance (I am looking at you, undergrad information theory course!) which led me the rabbit hole of string distance functions.

Well, wiki says it all

In information theory, linguistics, and computer science, the Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.

Just looking at the piecewise function and pseudo-code in the wiki. It looked simple enough. and it was! I added some comments

import sys

s1 = sys.argv[1]
s2 = sys.argv[2]

def lev(t, s):
    # If one of the string is empty the distance is the lenght of the other string. DUH!
    if len(t)==0:
        return len(s)

    if len(s)==0:
        return len(t)

    # checking the first char in each string, if they are the same, call lev again
    if s[0] == t[0]:
        return lev(t[1:] , s[1:])

    # well, this is tricky, at this point, we know at least one char is difference (hence the 1 below). Then trying the least distance with and without that char in each string
    return 1 + min ([
                        lev(t[1:], s),
                        lev(t, s[1:]),
                        lev(t[1:], s[1:]),
                    ])

x = lev(s1, s2)

print("res=", x)