Compression Factor

"Index compression factor—The index compression factor is defined as (n-s)/n where n is the number of words in the corpus and s is the number of stems. In other words, the index compression factor is the fractional reduction in index size achieved through stemming. For example, a corpus with 50,000 words (n) and 40,000 stems (s), would have an index compression factor of 20%. Stronger stemmers will tend to have larger index compression factors." Frakes & Fox [Ref: W1].

ICF = Index Compression Factor

N = Number of unique words before stemming

S = Number of unique stems after stemming

ICF = (N - S)/N
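
A minimal Python sketch of the formula, assuming N and S are both counted over unique word types, as in the definitions above. The `toy_stem` function is a hypothetical suffix stripper used only for illustration. It is not Porter, Lovins, or any published stemmer. Any function that maps a word to its stem can be passed in its place.

```python
from typing import Callable, Iterable


def icf_from_counts(n: int, s: int) -> float:
    """ICF = (N - S) / N, given N unique words and S unique stems."""
    if n <= 0:
        raise ValueError("N must be a positive number of unique words")
    return (n - s) / n


def index_compression_factor(words: Iterable[str], stem: Callable[[str], str]) -> float:
    """Compute ICF from a word list and a stemming function."""
    unique_words = set(words)                    # N counts unique words only
    stems = {stem(w) for w in unique_words}      # S counts unique stems only
    return icf_from_counts(len(unique_words), len(stems))


def toy_stem(word: str) -> str:
    """Hypothetical suffix stripper for illustration; not a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Strip only if at least three characters remain after removal.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    # The Frakes & Fox example: 50,000 words and 40,000 stems.
    print(icf_from_counts(50_000, 40_000))       # 0.2, i.e. 20%

    corpus = ["connect", "connects", "connected", "connecting",
              "connection", "stem", "stems"]
    print(index_compression_factor(corpus, toy_stem))
```

With `toy_stem`, the seven unique words collapse to three stems ("connect", "connection", "stem"), so ICF = (7 - 3)/7 = 4/7. A stronger stemmer that also merged "connection" into "connect" would leave only two stems, which raises the ICF to 5/7.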