Stemmer Strength - Overview

"The strongest possible affix removal stemmer would be one that removed all but the first character from each stemmed word. The number of conflation classes in this case would be 26, the mean conflation class size would be the number of words in the corpus divided by 26, and the compression factor would be (n-26)/n where n is the number of words in the corpus.

The weakest possible affix removal stemmer would be one that changes no characters in any stemmed word. Such a stemmer would have one word per conflation class. The index compression factor, number of words and stems that differ, mean characters removed, and mean and median modified Hamming distance between word and stem in this case would all be zero." Frakes and Fox [Ref: W1]
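
The two extremes described in the quotation can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names, the toy vocabulary, and the choice to treat n as the number of distinct words are all assumptions made for illustration. The modified Hamming distance is assumed here to be the count of mismatched characters over the shared length plus the difference in length between word and stem.

```python
from collections import defaultdict
from statistics import mean, median


def modified_hamming(word, stem):
    """Mismatches over the shared length, plus the length difference (assumed definition)."""
    mismatches = sum(1 for a, b in zip(word, stem) if a != b)
    return mismatches + abs(len(word) - len(stem))


def strength_metrics(words, stemmer):
    """Compute stemmer strength measures over the distinct words in `words`."""
    unique_words = sorted(set(words))
    pairs = [(w, stemmer(w)) for w in unique_words]

    # Group words that share a stem into conflation classes.
    classes = defaultdict(list)
    for word, stem in pairs:
        classes[stem].append(word)

    n = len(unique_words)  # distinct words
    s = len(classes)       # distinct stems, i.e. conflation classes
    distances = [modified_hamming(w, st) for w, st in pairs]
    return {
        "conflation_classes": s,
        "mean_class_size": n / s,
        "compression_factor": (n - s) / n,
        "words_changed": sum(1 for w, st in pairs if w != st),
        "mean_chars_removed": mean(len(w) - len(st) for w, st in pairs),
        "mean_mhd": mean(distances),
        "median_mhd": median(distances),
    }


def strongest(word):
    """Hypothetical strongest stemmer: keep only the first character."""
    return word[:1]


def weakest(word):
    """Hypothetical weakest stemmer: change nothing."""
    return word


# Toy vocabulary, invented for this example.
vocab = ["connect", "connected", "connection", "cat", "cats",
         "run", "running", "relate"]

strong = strength_metrics(vocab, strongest)
weak = strength_metrics(vocab, weakest)
print(strong)
print(weak)
```

On this toy vocabulary only two first letters occur, so the strongest stemmer yields 2 conflation classes rather than 26, and its compression factor is (8-2)/8. The weakest stemmer yields one word per class, and every measure listed in the quotation comes out as zero.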