Comparing two or more stemmers

A direct way to compare two stemmers is to run both on a common word list, then load the two output files into File A and File B of the List Analyser.

In this example, File A is the output.txt file from the Porter website (http://tartarus.org/~martin/PorterStemmer/). File B is the output from Stemming Tester 1.4, using the voc.txt file from the Porter website as the word list and the default English stemming.dat file from dtSearch Desktop.

The results of interest are the Similarity group metrics, together with the Mean Word Length and Unique Words count for each stemmer. See Stemmer Similarity Metrics.
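
For readers who want to sanity-check these figures outside the List Analyser, the following is a minimal Python sketch. It is not the List Analyser's own calculation. It assumes each stemmer output file holds one stem per line, in the same order as the common word list. The function names (read_stems, summarise, agreement) are hypothetical, and the line-by-line agreement fraction is only a simple stand-in for the Similarity group metrics, which are defined under Stemmer Similarity Metrics.

```python
from statistics import mean


def read_stems(path):
    """Read one stem per line, keeping line order so both files stay aligned."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle.read().splitlines()]


def summarise(stems):
    """Mean word length and unique word count for one stemmer's output."""
    return {
        "mean_word_length": mean(len(s) for s in stems) if stems else 0.0,
        "unique_words": len(set(stems)),
    }


def agreement(stems_a, stems_b):
    """Fraction of input words for which both stemmers gave the same stem.

    Assumption: this is an illustrative measure, not the List Analyser metric.
    """
    if len(stems_a) != len(stems_b):
        raise ValueError("both outputs must come from the same word list")
    if not stems_a:
        return 0.0
    same = sum(a == b for a, b in zip(stems_a, stems_b))
    return same / len(stems_a)


# Small in-memory example; real use would call read_stems() on each output file.
stems_a = ["caress", "poni", "ti", "cat"]
stems_b = ["caress", "pony", "ti", "cat"]
print(summarise(stems_a))
print(summarise(stems_b))
print(agreement(stems_a, stems_b))
```

A stemmer that conflates more aggressively tends to show a lower Unique Words count and a shorter Mean Word Length on the same word list, which is why those two figures are read alongside the similarity measures.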