Mean Conflation Class Size

"The mean number of words per conflation class—This is the average number of words that correspond to the same stem for a corpus. For example if the words "engineer," "engineered," and "engineering" are stemmed to "engineer," then this conflation class size is three. Stronger stemmers will tend to have more words per conflation class." Frakes & Fox [Ref: W1].


This metric depends on the number of words processed, but for a word collection of a given size, a higher value indicates a heavier (stronger) stemmer. The value is calculated as follows:

MWC = Mean number of words per conflation class

N = Number of unique words before stemming

S = Number of unique stems after stemming

 

MWC = N/S
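
The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name mean_conflation_class_size and the helper toy_stem are hypothetical, and toy_stem is a deliberately crude suffix stripper that stands in for whatever real stemmer is being evaluated.

```python
def mean_conflation_class_size(words, stem):
    """Return MWC = N / S for an iterable of words and a stemming function."""
    unique_words = set(words)                        # N: unique words before stemming
    if not unique_words:
        raise ValueError("word collection is empty")
    unique_stems = {stem(w) for w in unique_words}   # S: unique stems after stemming
    return len(unique_words) / len(unique_stems)


def toy_stem(word):
    """Hypothetical stand-in stemmer: strips a trailing 'ing' or 'ed' only."""
    for suffix in ("ing", "ed"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


# Duplicate tokens do not affect the result, since only unique words are counted.
sample = ["engineer", "engineered", "engineering", "walk", "walked", "walk"]
mwc = mean_conflation_class_size(sample, toy_stem)
print(mwc)  # N = 5 unique words, S = 2 stems, so MWC = 2.5
```

Here the five unique words fall into two conflation classes ("engineer" with three words, "walk" with two), giving MWC = 5/2 = 2.5. A heavier stemmer applied to the same word list would produce fewer stems and therefore a larger value.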