Testing using a List of Words

Stemming Tester can read words from a text file, so you can easily repeat tests or use existing word lists. It can also use grouped word lists, as required for evaluation by the error-counting method described by C.D. Paice (Ref W8).

Displaying the dtSearch default English stemming rules and reading in Word List B (commonwords.txt) from the Lancaster University website.


1) From the Options menu, click Use List of Words... and browse to your word list.

You can prepare your own word lists by typing words into Windows Notepad, one word per line. Use a line of ==== characters as a 'strong' barrier to separate unrelated groups of words, and a line of ---- characters as a 'weak' barrier. Stemming Tester has a word length limit of 32 characters and a maximum file size of 800 kB. Save the file in Unicode format. Comments may be added at the end of the text file on lines starting with a double backslash (\\).

2) From the File menu, select Open a stemming file... or Porter stemmer.

3) Click the Stem button to start the stemming process. You can return to the start of the word list by clicking the reset button.

To return to entering words manually, click Options|Use List of Words... again to deselect the option.

Although typing your own word lists is satisfactory for small-scale testing, you can use the List Index Contents facility in dtSearch Desktop/dtSearch Network to create longer lists, or download word lists from the Internet.
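
Word lists in the format described above can also be generated by a script. The following is a minimal Python sketch and is not part of Stemming Tester. The function names build_word_list and encode_word_list are hypothetical. The sketch assumes that "Unicode format" means UTF-16 little-endian with a byte order mark, that 800 kB means 800 x 1024 bytes, that lines end with CRLF, that barrier lines are exactly four characters long, and that barriers go between groups rather than before the first one.

```python
import codecs

MAX_WORD_LENGTH = 32           # word length limit stated in the text
MAX_FILE_BYTES = 800 * 1024    # assumption: "800 kB" means 800 * 1024 bytes
STRONG_BARRIER = "===="        # separates unrelated groups (length assumed)
WEAK_BARRIER = "----"          # weak barrier (length assumed)
COMMENT_PREFIX = "\\\\"        # two backslash characters


def build_word_list(sections, comments=()):
    """Return the text of a grouped word list.

    sections is a list of sections, each a list of groups, each a list of
    words. Groups inside a section are separated by a weak barrier and
    sections are separated by a strong barrier. Comment lines are placed
    at the end of the file.
    """
    lines = []
    for section_index, section in enumerate(sections):
        if section_index > 0:
            lines.append(STRONG_BARRIER)
        for group_index, group in enumerate(section):
            if group_index > 0:
                lines.append(WEAK_BARRIER)
            for word in group:
                if not word or len(word) > MAX_WORD_LENGTH:
                    raise ValueError(
                        f"word must be 1-{MAX_WORD_LENGTH} characters: {word!r}"
                    )
                lines.append(word)
    lines.extend(f"{COMMENT_PREFIX} {comment}" for comment in comments)
    return "\r\n".join(lines) + "\r\n"


def encode_word_list(text):
    """Encode the text as UTF-16 LE with a BOM and check the size limit."""
    data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    if len(data) > MAX_FILE_BYTES:
        raise ValueError(f"file is {len(data)} bytes, limit is {MAX_FILE_BYTES}")
    return data


if __name__ == "__main__":
    sample = [
        [["connect", "connected"], ["connection"]],  # one section, two groups
        [["run", "running"]],                        # an unrelated section
    ]
    print(build_word_list(sample, comments=["sample grouped word list"]))
```

The bytes returned by encode_word_list can be written to a file opened in binary mode, for example open("mywords.txt", "wb"), and the file can then be selected through Use List of Words... in the usual way.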