Levenshtein Distance

The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.
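The definition can be illustrated with a short Python sketch of the standard dynamic-programming method. It is not taken from List Analyser, and the function name levenshtein is chosen here for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions or substitutions needed to turn a into b."""
    # Keep the shorter word in the inner loop so the row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        # current[0] = distance between a[:i] and the empty string.
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + substitution_cost,   # substitute or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))      # 3
    print(levenshtein("connection", "connect"))  # 3
```

For example, "connection" becomes "connect" by deleting the three characters "ion", so the distance is 3; identical words have a distance of 0.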

 

By default, List Analyser lists all words where the Levenshtein Distance between each word in List A and List B is from 0 to 32. It should be left on this setting when measuring Stemming Errors and Similarity.

 

If the Levenshtein Distance setting is 0 to 0, only the words that are identical in List A and List B are listed. Conversely, a setting of 1 to 32 lists all words that differ between List A and List B. The setting affects the metrics displayed for List A and List B Mean Word Length and for Mean Characters Removed.
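The effect of the range setting on the displayed means can be sketched in Python. Everything in this example is an assumption made for illustration: the word pairs are made up, their distances are supplied by hand, and Mean Characters Removed is assumed to be the average of the List A word length minus the List B word length. List Analyser's own calculation may differ.

```python
from statistics import mean

# Hypothetical (List A word, List B word, Levenshtein distance) triples.
PAIRS = [
    ("connection", "connect", 3),
    ("running", "run", 4),
    ("cat", "cat", 0),
    ("ponies", "poni", 2),
]


def filter_by_distance(pairs, low=0, high=32):
    """Keep only the pairs whose distance lies in the range low..high."""
    return [pair for pair in pairs if low <= pair[2] <= high]


def mean_characters_removed(pairs):
    """Assumed definition: mean of len(List A word) - len(List B word)."""
    return mean(len(a) - len(b) for a, b, _ in pairs)


if __name__ == "__main__":
    everything = filter_by_distance(PAIRS)         # default 0 to 32
    identical = filter_by_distance(PAIRS, 0, 0)    # identical words only
    differing = filter_by_distance(PAIRS, 1, 32)   # words that differ
    print(len(identical), len(differing))          # 1 3
    print(mean_characters_removed(everything))     # 2.25
    print(mean_characters_removed(differing))      # 3
```

Excluding the identical pair raises the mean from 2.25 to 3 in this made-up data, which shows why the range setting changes the displayed metrics.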

 

The number of words that differ and the Mean Characters Removed are crude metrics for stemmer strength; see Number of Words and Stems that Differ and Mean Characters Removed.