How to use the Stemming Test files

The files for testing the effect of stemming are in the ...\LEP500\Language\Test folder. There is a separate file for each language (e.g. STT_german.txt). Each file is plain text and contains a list of words in the language to be tested.

The words in each file are arranged in groups of two or more. Each group is designed to test a particular grammatical rule of that language. The French test file, for example, has groups that cover all the verb conjugations. The groups for each language generally also include tests for gender, number (e.g. plurals) and tense.
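
The grouping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exact layout of the test files is not specified here, so the sketch assumes one word per line with blank lines separating the groups. The helper name parse_groups and the sample words are hypothetical and are not taken from the Language Extension Pack.

```python
def parse_groups(text):
    """Split a word list into groups of related words.

    ASSUMPTION: one word per line, groups separated by blank lines.
    The real stem test files may be laid out differently.
    """
    groups = []
    current = []
    for line in text.splitlines():
        word = line.strip()
        if word:
            # Non-empty line: the word belongs to the group being built.
            current.append(word)
        elif current:
            # Blank line: close the current group, if there is one.
            groups.append(current)
            current = []
    if current:
        # The last group may not be followed by a blank line.
        groups.append(current)
    return groups


# Made-up sample data, for illustration only.
sample = "chanter\nchante\nchantons\n\ncheval\nchevaux\n"
for group in parse_groups(sample):
    print(group)
```

If the files use a different separator, only the test for a group boundary inside the loop would need to change.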

Testing using dtSearch end-user versions (i.e. dtSearch Desktop or dtSearch Network)

1) Ensure that the stemming rule file (Stemming.dat) has been changed to match the language of the test file.

2) From the Index menu, choose Create Index (Advanced) and select the 'accent sensitive' option. Name the Index and choose to 'add documents now'. From the Index Dialog Box choose 'add file' and select the appropriate stem test file.

3) In the Search Dialog Box, turn on the stemming search expansion but make sure all other search expansion tools (fuzzy, phonic, thesaurus) are turned off.

4) In the Search Dialog Box ensure that only the test index is selected. The word-wheel will display the words in the test file.

5) Select a word from the word-wheel and press enter.

6) The test file should be displayed, ideally with all the words in the same group highlighted*.

Note:

 

Testing using your own application

If your application has a word-wheel and hit-highlighting you can test using the above method. If you do not have a word-wheel you will need to print out the test files and check off each word as you enter it.
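
If your application can return the list of words it highlighted for a search, the checking-off can be done by a small script rather than on paper. The following is a minimal sketch, not part of dtSearch: check_group is a hypothetical helper, and it assumes you already have one group of words from the test file and the list of words your application highlighted. The comparison is exact, so accents and case must match.

```python
def check_group(group, highlighted):
    """Return the words in `group` that were not highlighted.

    `group` is one group of related words from the test file;
    `highlighted` is the list of words your application marked as hits.
    Both are hypothetical inputs supplied by the caller.
    """
    hits = set(highlighted)
    # Keep the original order of the group so the report is easy to read.
    return [word for word in group if word not in hits]


# Made-up example: the search found only one of the two related words.
missed = check_group(["cheval", "chevaux"], ["cheval"])
print(missed)  # ['chevaux']
```

An empty list means every word in the group was found, which is the ideal result described in step 6 above.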

Stemming Tester
The Language Extension Pack LEP500 series includes the Stemming Tester tool. This utility is similar to the StemTest.exe utility provided in the dtSearch \bin folder but has additional features to speed up the development or optimization of stemming rules. See the separate Stemming Tester WebHelp for details.

List Analyser
The List Analyser is for optimization of stemming rules in conjunction with Stemming Tester 1.4 or later. See the List Analyser WebHelp for details.