Stemming Test File Format

The stemming test files are in Unicode format, with one word per line, arranged in 'concept groups'. Ideally, all the words in a group will be stemmed to the same word.

Where words within a group are not stemmed to the same word, this is an understemming error. Where words in separate groups are stemmed to the same word, this is an overstemming error.
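
The two error types can be illustrated with a minimal Python sketch. It is not taken from the Stemming Tester: toy_stem is a hypothetical stand-in stemmer, find_errors is an illustrative helper name, and the way errors are counted here (one understemming error per group with more than one stem, one overstemming error per stem shared by several groups) is an assumption rather than the tool's documented behaviour.

```python
from collections import defaultdict


def toy_stem(word):
    # Hypothetical stand-in stemmer, for illustration only:
    # strips one of a few common suffixes from longer words.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def find_errors(groups, stem):
    """Return (understemmed group indexes, stems shared across groups)."""
    under = []
    stem_to_groups = defaultdict(set)
    for index, words in enumerate(groups):
        stems = {stem(w) for w in words}
        if len(stems) > 1:
            # Words in one concept group produced different stems.
            under.append(index)
        for s in stems:
            stem_to_groups[s].add(index)
    # A stem that appears in more than one group merges separate concepts.
    over = {s: sorted(g) for s, g in stem_to_groups.items() if len(g) > 1}
    return under, over


groups = [
    ["walk", "walked", "walking"],
    ["run", "running"],
    ["sing", "sings"],
    ["singe", "singed"],
]
under, over = find_errors(groups, toy_stem)
print(under)  # indexes of groups with an understemming error
print(over)   # stems that cause an overstemming error
```

With this toy stemmer, "running" becomes "runn" and "singe" is left unchanged, so groups 1 and 3 are understemmed, while "singed" becomes "sing" and collides with the separate "sing" group, which is an overstemming error.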

To be compatible with Stemming Tester 1.4 and later, the groups are separated with 'barriers' of either = or - characters, for example ==== or ----. Each barrier should contain between 4 and 32 = or - characters.

You can add comments to the end of each test file. Each comment line must start with \\ characters.
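
The layout described above can be read with a short Python sketch like the following. It is an illustration under stated assumptions, not the Stemming Tester's own parser: parse_groups is a hypothetical name, a barrier is assumed to be a line made up solely of one character type (= or -, not a mixture), blank lines are assumed to be ignorable, and comment lines are skipped wherever they occur.

```python
import re

# Assumption: a barrier line holds only '=' or only '-' characters, 4 to 32 of them.
BARRIER = re.compile(r"^(?:={4,32}|-{4,32})$")
COMMENT_PREFIX = "\\\\"  # two backslash characters


def parse_groups(text):
    """Split test file text into concept groups (lists of words)."""
    groups = []
    current = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(COMMENT_PREFIX):
            continue  # skip blank lines and comment lines
        if BARRIER.match(line):
            # A barrier closes the group collected so far.
            if current:
                groups.append(current)
                current = []
            continue
        current.append(line)
    if current:
        groups.append(current)
    return groups


sample = "\n".join([
    "connect",
    "connected",
    "connection",
    "====",
    "general",
    "generally",
    "----",
    "\\\\ comment at the end of the file",
])
groups = parse_groups(sample)
print(groups)
```

When reading a real file, the text would first have to be decoded with whichever Unicode encoding the file actually uses, which this note does not specify.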

See the Stemming Tester WebHelp for details.