Analysing a stemmer

From the File menu, choose Open. In the dialog that opens, click the Browse button alongside the File A edit box and select the text file that was used as the input to the stemmer. For example, you can download the file voc.txt from the Porter website, http://tartarus.org/~martin/PorterStemmer/. You do not need access to an actual stemmer program.

 

Now click the Browse button alongside the File B edit box and browse to the output file from the stemmer. In this example, that is the output.txt file from the same source as above. Then click the Calculate button.
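If you want to inspect the same two files outside the tool, the pairing can be sketched in a few lines of Python. This is a minimal sketch, not the List Analyser's own code. It assumes that line N of the output file is the stem of line N of the input file, and the helper names read_lines and pair_lists are made up for illustration.

```python
def read_lines(path):
    """Return the lines of a text file with surrounding whitespace removed."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f.read().splitlines()]


def pair_lists(words, stems):
    """Pair each input word with the stem found at the same position.

    Assumption: the two lists are parallel, one entry per line.
    """
    if len(words) != len(stems):
        raise ValueError("word list and stem list differ in length")
    return list(zip(words, stems))
```

For example, pair_lists(read_lines("voc.txt"), read_lines("output.txt")) gives a list of (word, stem) tuples that you can filter or sort as you like.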

 

The List Analyser detects whether both word lists contain barriers. If they do, it calculates stemming error counts; otherwise the error count group is disabled, as shown in the figures below.
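To give a feel for what barrier-based error counts can mean, here is a simplified sketch in Python. It is an assumption, not the List Analyser's actual calculation: it treats the barriers as separators between groups of related words, counts an under-stemming error for each pair of entries in the same group with different stems, and counts an over-stemming error for each pair in different groups with the same stem. The barrier marker is passed in as a parameter because the real marker depends on your files, and the function names are hypothetical.

```python
from collections import Counter


def split_groups(lines, barrier):
    """Split a list of lines into groups at each line equal to the barrier."""
    groups, current = [], []
    for line in lines:
        if line == barrier:
            if current:
                groups.append(current)
            current = []
        else:
            current.append(line)
    if current:
        groups.append(current)
    return groups


def pair_count(n):
    """Number of unordered pairs that can be drawn from n items."""
    return n * (n - 1) // 2


def count_errors(stem_lines, barrier):
    """Return (under_stemming_pairs, over_stemming_pairs) for a stem list.

    Simplified pair counting under the assumptions stated in the text above.
    """
    groups = split_groups(stem_lines, barrier)
    # Pairs with the same stem inside one group (correct merges).
    same_within = sum(pair_count(c) for g in groups for c in Counter(g).values())
    # Pairs with the same stem anywhere in the list.
    all_stems = Counter(stem for g in groups for stem in g)
    same_overall = sum(pair_count(c) for c in all_stems.values())
    under = sum(pair_count(len(g)) for g in groups) - same_within
    over = same_overall - same_within
    return under, over
```

Counting through Counter keeps the work roughly linear in the list size, rather than comparing every pair of words directly.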

 

 

Fig 1 Showing the stemmer metrics for the Porter stemmer using the sample files supplied on the official Porter website.

 

 

Fig 2 Showing the results for the Porter stemmer option in Stemming Tester 1.4, using as input the sorted.txt file (Word List A) from the Lancs University website. The List View can be sorted on any column by clicking the column header. Here it shows the well-known over-stemming errors in the Porter stemmer on the witness, wit, witted group of words.

 

 

The List View makes it easy to find over-stemming errors by scrolling through the word list. Here it shows the well-known over-stemming errors in the Porter stemmer on the generalize, generate, generic, genetic group of words.
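The same kind of search can be approximated in a script by grouping words on their stem and listing every stem that is shared by more than one distinct word. The sketch below is only an illustration, and the function name conflation_groups is hypothetical. A shared stem is not in itself an error; whether a group is over-stemmed is still a judgement you make by reading it.

```python
from collections import defaultdict


def conflation_groups(pairs):
    """Map each stem to the sorted distinct words that reduce to it.

    Only stems shared by two or more distinct words are returned.
    """
    groups = defaultdict(set)
    for word, stem in pairs:
        groups[stem].add(word)
    return {
        stem: sorted(words)
        for stem, words in groups.items()
        if len(words) > 1
    }
```

The input is any iterable of (word, stem) tuples, for example one built by reading the two word lists line by line.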

 

You can save the results as a CSV file for further analysis in a spreadsheet such as Excel by selecting Save As... from the File menu.
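If you prefer a script to a spreadsheet, the saved file can also be read with Python's standard csv module. This is a minimal sketch that makes no assumption about the column layout of the saved file: it simply returns every row as a list of strings. The function name load_rows and the file name results.csv are placeholders.

```python
import csv


def load_rows(source):
    """Read every row from an open CSV file object as a list of strings."""
    return list(csv.reader(source))
```

A typical call would be: with open("results.csv", newline="", encoding="utf-8") as f: rows = load_rows(f). Check the first row of your own file to see whether it holds column headers.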