Error Rate Relative to Truncation (see Ref W8)


Stemming Tester 1.4 can be used to produce four Unicode text files at a time with truncated word lengths of 4 to 8 letters from a word list consisting of words arranged in concept groups, according to the method described by Paice in the reference above. Each of these files can be input to the List Analyser together with the word list to obtain the over-stemming index (OI) and the under-stemming index (UI).
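As a rough illustration of what truncating a word list to a fixed length means, here is a minimal Python sketch. It is not Stemming Tester's own code; the function names truncate_words and truncation_sets and the sample words are hypothetical, and the sketch assumes truncation simply keeps the first N letters of each word and leaves shorter words unchanged.

```python
def truncate_words(words, length):
    """Return each word cut to its first `length` letters.

    Assumption: words shorter than `length` are left unchanged.
    """
    return [word[:length] for word in words]


def truncation_sets(words, min_length=4, max_length=8):
    """Map each truncation length to the truncated word list.

    The 4 to 8 range follows the text above; the function itself
    is a hypothetical helper, not part of Stemming Tester.
    """
    return {n: truncate_words(words, n) for n in range(min_length, max_length + 1)}


if __name__ == "__main__":
    # Hypothetical sample words, for illustration only.
    sample = ["connection", "connected", "connects", "cat"]
    for n, truncated in truncation_sets(sample).items():
        print(n, truncated)
```

Each truncated list plays the role of a stemmer's output: shorter truncation lengths merge more words (more over-stemming), while longer ones merge fewer (more under-stemming).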


Plotting the (UI, OI) coordinates obtained at each truncation length gives a truncation line against which any stemmer can be assessed. A reasonable stemmer will give a (UI, OI) point between the truncation line and the origin; in general, the further the point lies from the truncation line toward the origin, the better the stemmer can be said to be. Specifically, a performance measure called the error rate relative to truncation (ERRT) can be obtained by extending a line from the origin O through the stemmer's (UI, OI) point P until it intersects the truncation line at T. ERRT is then defined as:


ERRT = Length(OP) / Length(OT)
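The geometric construction can be sketched in Python as follows. This is a minimal sketch, not the List Analyser's implementation: the function name errt is hypothetical, and it assumes the truncation line is drawn as straight segments joining consecutive truncation points, each given as a (UI, OI) pair. The numbers in the usage example are made up purely to exercise the arithmetic.

```python
def errt(point, truncation_points):
    """Error rate relative to truncation for a stemmer's (UI, OI) point.

    point             -- (UI, OI) of the stemmer under test (point P)
    truncation_points -- (UI, OI) pairs for successive truncation lengths,
                         in order; consecutive pairs are assumed to be
                         joined by straight segments.
    """
    px, py = point
    if px == 0 and py == 0:
        return 0.0  # P coincides with the origin, so Length(OP) is zero

    for (ax, ay), (bx, by) in zip(truncation_points, truncation_points[1:]):
        dx, dy = bx - ax, by - ay
        # Solve t * P = A + s * (B - A) for t (along the ray) and s (along the segment).
        denom = px * dy - py * dx
        if denom == 0:
            continue  # the ray OP is parallel to this segment
        t = (ax * dy - ay * dx) / denom
        s = (ax * py - ay * px) / denom
        if t > 0 and 0 <= s <= 1:
            # T = t * P, so Length(OP) / Length(OT) = 1 / t.
            return 1.0 / t

    raise ValueError("the ray from the origin through P does not meet the truncation line")


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    truncation_line = [(0.0, 1.0), (1.0, 0.0)]
    print(errt((0.25, 0.25), truncation_line))
```

With this definition, a point lying on the truncation line gives ERRT = 1, and values below 1 correspond to points between the truncation line and the origin.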