English Grouped

Stemmer Strength

Files:

A  react.txt

B  react_trunc5.txt

Word (File A)   Stem (File B)   MHD

react           react           0
reacts          react           1
reacting        react           3
reacted         react           2
reaction        react           3
reactions       react           4
reactive        react           3
reactivity      react           5
reactivities    react           7

 

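Mean Conflation Class Size = number of unique words / number of unique stems = 9/1 = 9

Compression Factor = (unique words - unique stems) / unique words = (9 - 1)/9 = 0.889 (rounded to 3 decimal places)

Mean Characters Removed = (0 + 1 + 3 + 2 + 3 + 4 + 3 + 5 + 7)/9 = 28/9 = 3.111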
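The three figures above can be reproduced with a short Python sketch. It assumes that the "trunc5" in the file name means the stemmer simply keeps the first five characters of each word; the function names truncate and strength_metrics are illustrative and do not come from the source.

```python
# Word list from File A (react.txt) in the example above.
WORDS = ["react", "reacts", "reacting", "reacted", "reaction",
         "reactions", "reactive", "reactivity", "reactivities"]

def truncate(word, n=5):
    """Assumed stemmer: keep only the first n characters of the word."""
    return word[:n]

def strength_metrics(words, stem=truncate):
    """Compute three stemmer-strength measures over the unique words."""
    unique_words = set(words)
    stems = {w: stem(w) for w in unique_words}
    n_words = len(unique_words)
    n_stems = len(set(stems.values()))
    removed = sum(len(w) - len(s) for w, s in stems.items())
    return {
        "mean_class_size": n_words / n_stems,
        "compression_factor": (n_words - n_stems) / n_words,
        "mean_chars_removed": removed / n_words,
    }

metrics = strength_metrics(WORDS)
for name, value in metrics.items():
    print(f"{name} = {value:.3f}")
```

A stronger stemmer merges more words into each class, so all three values rise together on this sample.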

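MHD (Modified Hamming Distance) = HD(1,P) + (Q - P), where P is the length of the shorter string, Q is the length of the longer string, and HD(1,P) is the Hamming distance over the first P characters of both strings.

Mean MHD = the average MHD between each word in the original sample and its stem = 28/9 = 3.111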
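As a minimal sketch of the MHD definition, the Python below compares each word with its stem. The function name mhd and the PAIRS list are illustrative; the word and stem pairs are copied from the table above.

```python
def mhd(a, b):
    """Modified Hamming Distance between two strings.

    Hamming distance over the first P characters (P = shorter length)
    plus the length difference Q - P (Q = longer length).
    """
    short, long_ = sorted((a, b), key=len)
    p, q = len(short), len(long_)
    # zip stops at the shorter string, so only the first P positions are compared.
    hamming = sum(x != y for x, y in zip(short, long_))
    return hamming + (q - p)

# (word, stem) pairs from Files A and B.
PAIRS = [
    ("react", "react"), ("reacts", "react"), ("reacting", "react"),
    ("reacted", "react"), ("reaction", "react"), ("reactions", "react"),
    ("reactive", "react"), ("reactivity", "react"), ("reactivities", "react"),
]

mean_mhd = sum(mhd(word, stem) for word, stem in PAIRS) / len(PAIRS)
print(f"Mean MHD = {mean_mhd:.3f}")
```

For a pure truncation stemmer the stem is always a prefix of the word, so HD(1,P) is 0 and MHD equals the number of characters removed. The two measures differ only for stemmers that also rewrite characters.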

 

Error counting

File A = English2Grouped.txt

File B = English2Grouped_trunc5.txt

UI (Understemming Index) = 0.545   OI (Overstemming Index) = 0

 

Source: Lancs University

 

English2Grouped.txt   English2Grouped_trunc5.txt

divide                divid
dividing              divid
divided               divid
division              divis
divisor               divis
====                  ====
divine                divin
divination            divin
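The UI and OI figures reported under Error counting can be checked against the grouped sample above. The source does not state the formulas, so the sketch below assumes Paice's error-counting definitions: UI is the fraction of same-group word pairs that the stemmer fails to merge, and OI is the fraction of different-group word pairs that it wrongly merges. The names truncate and error_indices are illustrative, and five-character truncation is assumed from the file name.

```python
from collections import Counter, defaultdict

# Concept groups from English2Grouped.txt ("====" separates groups).
GROUPS = [
    ["divide", "dividing", "divided", "division", "divisor"],
    ["divine", "divination"],
]

def truncate(word, n=5):
    """Assumed stemmer: keep only the first n characters of the word."""
    return word[:n]

def error_indices(groups, stem=truncate):
    """Return (UI, OI) under the assumed Paice-style pair counting."""
    total_words = sum(len(g) for g in groups)
    gdmt = gumt = gdnt = 0.0
    by_stem = defaultdict(Counter)  # stem -> {group index: word count}
    for gi, group in enumerate(groups):
        n = len(group)
        counts = Counter(stem(w) for w in group)
        gdmt += 0.5 * n * (n - 1)                # pairs that should merge
        gumt += 0.5 * sum(u * (n - u) for u in counts.values())  # pairs left unmerged
        gdnt += 0.5 * n * (total_words - n)      # pairs that should stay apart
        for s, c in counts.items():
            by_stem[s][gi] = c
    gwmt = 0.0
    for per_group in by_stem.values():
        n_s = sum(per_group.values())
        gwmt += 0.5 * sum(v * (n_s - v) for v in per_group.values())  # pairs wrongly merged
    ui = gumt / gdmt if gdmt else 0.0
    oi = gwmt / gdnt if gdnt else 0.0
    return ui, oi

ui, oi = error_indices(GROUPS)
print(f"UI = {ui:.3f}  OI = {oi:.3f}")
```

On this sample the first group splits into "divid" (3 words) and "divis" (2 words), leaving 6 of the 11 desired pairs unmerged, so UI = 6/11 = 0.545. No stem is shared between the two groups, so OI = 0.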