Introduction - LEP500 series Language Extension Pack

dtSearch products are supplied with stemming rules and a noise-word file for English (US). Stemming is the only search expansion option which is 'on' by default in the dtSearch end-user products; the reason for this is that stemming is almost always useful when making a search, and adds little to the time required to make a search. Unlike some other search engines, dtSearch applies stemming at search time, so there is no need to build indexes specifically to apply stemming and no need to build separate indexes for each language in use.

The problem

With the stemming option selected, dtSearch will find plurals and many other word variations; for example, a search on print will automatically find printers, printing and printed. However, if you are searching documents written in other languages, the English stemming rules will cause you to miss many word variations which do not occur in English (e.g. verb and noun changes with gender), and you may find that unrelated words are found in error.
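As a rough illustration of the behaviour described above, the following Python sketch applies a toy set of English suffix rules at search time. It is not dtSearch's stemming rule format or API: the suffix list and the names stem and search are hypothetical and chosen only for this example.

```python
# Toy English suffix rules, for illustration only.
# This is NOT the dtSearch stemming rule format (assumption for this sketch).
ENGLISH_SUFFIXES = ["ers", "ing", "ed", "er", "s"]


def stem(word, suffixes=ENGLISH_SUFFIXES, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def search(term, documents, suffixes=ENGLISH_SUFFIXES):
    """Return every word in documents whose stem equals the stem of term.

    Stemming is applied here, at search time, so the documents themselves
    are stored unchanged and need no language-specific preprocessing.
    """
    target = stem(term, suffixes)
    return [w for doc in documents for w in doc.split() if stem(w, suffixes) == target]


if __name__ == "__main__":
    # English variations are found by the English rules.
    print(search("print", ["printers printing printed"]))
    # The German plural of Haus is missed by the same English rules.
    print(search("Haus", ["Häuser"]))
```

With these toy rules a search on print matches printers, printing and printed, while a search on the German word Haus fails to match its plural Häuser, which is the kind of missed variation that language-specific stemming rules are intended to address.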

Furthermore, the English noise word list, which is designed to remove unwanted English words from your index to keep the index size small, is not suitable for other languages; your indexes may contain many words which will not be useful in searches and which will add to the size of your indexes.

The solution

The Language Extension Packs consist of stemming rule files, stop word files and test files for multiple languages, together with other utilities to assist in the testing and development of stemming rules and synonym rings to improve search recall in many languages. All files are in Unicode format.

The LEP500 series of Language Packs for dtSearch includes:

1) Noise word files for ALL languages listed in Tables A, B and C.

2) Test files to check the operation of stemming in ALL languages listed in Tables A, B and C.

3) Stemming Selector application (SLS500, SLS502 or SLS503)*

4) Stemming rule files for languages as shown below:

LEP500 includes SLS500 which contains stemming rule files for all languages in Tables A, B and C.
LEP502 includes SLS502 which contains stemming rule files for all languages in Table A.
LEP503 includes SLS503 which contains stemming rule files for all languages in Table B.

5) Stemming Tester application.

6) User Thesaurus Plus* application with sample files.

 

Compatibility

dtSearch 7.1 or later (Base License covers use with dtSearch Engine or Web on up to three servers, or dtSearch Desktop/Network for up to 15 users); other licensing available.

System requirements

Requires Windows 7, Vista or XP (SP3) with the .NET Framework 3.5.

Support and Updates

One year of on-line technical support and updates. You will find the latest update information at www.dtsearch.co.uk/language-support.htm

* Stemming Selector 4 and User Thesaurus Plus can also be licensed separately on a per-user basis for use with dtSearch Desktop or dtSearch Network; however, that license is for end-user use only. For distribution with an application using the dtSearch Engine, an LEP500 series license is required.