Mean Characters Removed

"The mean number of characters removed in forming stems—Stronger stemmers remove more characters from words to form stems. For example, a stemmer that stems the corpus {engineer, engineered, engineering, engineers} to the stem engineer would remove an average of (0+2+3+1)/4 = 1.5 characters. A weakness of this metric is that it does not measure transformations of stem endings. We therefore have developed the following measures...." Frakes & Fox [Ref: W1].
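
The arithmetic in the quoted example can be reproduced with a short sketch. The function name mean_characters_removed and the (word, stem) pair layout are illustrative assumptions, not part of Frakes & Fox's paper or of the test files. The sketch counts removed characters as the difference in length between word and stem, which matches the quoted example only because each stem there is a prefix of its word.

```python
def mean_characters_removed(pairs):
    """Return the average of len(word) - len(stem) over (word, stem) pairs.

    Hypothetical helper: assumes each stem is no longer than its word,
    as in the quoted example where the stem is a prefix of the word.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one (word, stem) pair is required")
    total_removed = sum(len(word) - len(stem) for word, stem in pairs)
    return total_removed / len(pairs)


# The corpus from the quoted example, with every word stemmed to "engineer".
corpus = ["engineer", "engineered", "engineering", "engineers"]
pairs = [(word, "engineer") for word in corpus]

result = mean_characters_removed(pairs)
print(result)  # (0 + 2 + 3 + 1) / 4 = 1.5
```

Because only lengths are compared, a stemmer that rewrites a word ending without shortening it would score 0 for that word, which is the weakness the quotation describes.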


The test files SS1 and SS1_trunc8 are based on the above example and will give a result for Mean Characters Removed of 1.5.


The Modified Hamming Distance metric addresses the weakness of this simple measure: unlike Mean Characters Removed, it also accounts for transformations of stem endings.