
Language Extension Packs

Language Packs are designed for developers to extend the performance of their dtSearch Engine powered applications. Language packs include:

  • Stemming rule files for languages as shown in the table on the right
  • Noise word files for ALL languages listed
  • Test files to check the operation of stemming in ALL languages listed
  • Stemming Selector application
  • User Thesaurus Plus application
  • Stemming Tester application
  • One year of on-line technical support and updates

Stemming rules and noise-word files

dtSearch products are supplied with stemming rules and a noise-word file for English (US). Stemming is the only search expansion option which is 'on' by default in the dtSearch end-user products, because stemming is almost always useful when making a search and adds little to the time required to make a search. Unlike some other search engines, dtSearch applies stemming at search time, so there is no need to build indexes specifically to apply stemming and no need to build separate indexes for each language in use.

The problem

With the stemming option selected, dtSearch will find plurals and many other word variations; for example, a search on print will automatically find printers, printing and printed.
However, if you are searching documents written in other languages, the English stemming rules will miss many word variations which do not occur in English (e.g. verb and noun changes with gender), and may also match unrelated words in error.
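
The effect described above can be illustrated with a toy suffix-stripping stemmer. This is only a sketch: the suffix list, the `stem` and `search` functions and the sample word lists are invented for illustration and do not reflect the dtSearch stemming rule format or the dtSearch Engine API. The index holds unstemmed words and stemming is applied only when the search is made.

```python
# Toy English suffix rules, invented for illustration only.
# This is NOT the dtSearch stemming rule format.
ENGLISH_SUFFIXES = ["ing", "ers", "er", "ed", "s"]


def stem(word, suffixes=ENGLISH_SUFFIXES, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def search(query, indexed_words, suffixes=ENGLISH_SUFFIXES):
    """Return indexed words whose stem equals the stem of the query.

    The index stores unstemmed words; stemming happens at search time.
    """
    target = stem(query, suffixes)
    return [w for w in indexed_words if stem(w, suffixes) == target]


# English rules on English words: the variations are found.
english_index = ["print", "printers", "printing", "printed", "sprint"]
print(search("print", english_index))

# English rules on German words: the genitive and plural forms are missed.
german_index = ["Haus", "Hauses", "Häuser"]
print(search("Haus", german_index))
```

With these toy rules the first search returns the four forms of print, while the second returns only Haus, because English suffixes do not cover the German endings or the vowel change in the plural.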

Furthermore, the English noise word list, which is designed to remove unwanted English words from your index to keep the index size small, is not suitable for other languages; your indexes may contain many words which will not be useful in searches and which will add to the size of your indexes.
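
As a rough illustration of the noise-word problem, the sketch below counts the distinct terms that would be indexed for a short German sentence when an English list is applied, and again with a German list. The two word lists are hypothetical and heavily abbreviated, and `index_terms` is an invented helper, not part of dtSearch.

```python
# Hypothetical, abbreviated noise-word lists for illustration only;
# they are not the lists supplied with dtSearch or the Language Extension Packs.
ENGLISH_NOISE = {"the", "and", "of", "in", "is", "a", "to"}
GERMAN_NOISE = {"der", "die", "das", "und", "ist", "in", "ein", "zu"}


def index_terms(text, noise_words):
    """Return the distinct words left for indexing after noise-word removal."""
    return sorted({w for w in text.lower().split() if w not in noise_words})


german_text = "das Haus ist in der Stadt und die Stadt ist alt"

with_english_list = index_terms(german_text, ENGLISH_NOISE)
with_german_list = index_terms(german_text, GERMAN_NOISE)

print(len(with_english_list), with_english_list)
print(len(with_german_list), with_german_list)
```

In this small example the English list leaves eight terms in the index, most of them German function words, while the German list leaves three.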

The solution
Use language-specific files in place of the default US English files. These are supplied in the form of Language Extension Packs which contain files for many languages. All files are in Unicode format.
Order Code LEP502: Western European languages (i.e. Latin alphabet)
Order Code LEP503: Eastern European languages (e.g. Cyrillic, Greek)
Order Code LEP500: All languages listed.

The LEP500 license includes the "Russian Plus" stemming rules; these combine Cyrillic and Latin rules to enable improved search recall in document collections containing Russian and another language.

The language pairs supported are:
Russian plus Czech
Russian plus Estonian
Russian plus Finnish
Russian plus German
Russian plus Greek
Russian plus Hungarian
Russian plus Latvian
Russian plus Lithuanian
Russian plus Polish
Russian plus Slovak
Russian plus Swedish


License:
250 USD for LEP502 or LEP503. The license covers use on up to three servers or workstations with the dtSearch Engine or dtSearch Web, or on up to 15 workstations with dtSearch Desktop or Network. Click the Buy Now button to see the full price list, including special 'bundle' offers.

Language Extension Packs can also be licensed for large-volume use or for wider distribution in your own application; please ask for developer licensing options.

Release notes...

Price List and Order Form

Needs

  • dtSearch 7.1 or later (Base License covers use with dtSearch Engine or Web on up to three servers, or dtSearch Desktop/Network for up to 15 users); other licensing available.
  • Windows 7 or Windows XP (SP3, .NET 3.5)
  • ESD (electronic download only)



Stemming Selector

Stemming Selector can be purchased separately for use with dtSearch Desktop or dtSearch Network. The single-user version costs 19.90 USD for SLS502 or SLS503, or 29.85 USD for SLS500. (For end-user use only. If you need to distribute Stemming Selector or its stemming files with your own application, or with dtSearch Web or Publish, order LEP500, LEP502 or LEP503.) Find out more...

User Thesaurus Plus

User Thesaurus Plus can be purchased separately for use with dtSearch Desktop/Network. The single-user version costs 39 USD. (For end-user use only. If you need to distribute User Thesaurus Plus or any of the supplied thesauri with your own application, or with dtSearch Web or Publish, order LEP500, LEP502 or LEP503.) Find out more...

Stemming Tester

Stemming Tester is a free application for developers and IR researchers. Find out more...

Evaluation

30-day evaluation versions of Stemming Selector and User Thesaurus Plus are available for download; these allow evaluation of ALL listed languages.

 

 

Order Code           LEP500/SLS500   LEP502/SLS502   LEP503/SLS503

Western European
Danish               yes             yes
Dutch                yes             yes
English              yes             yes
Finnish              yes             yes
French + English*    yes             yes
German               yes             yes
German + English*    yes             yes
Italian              yes             yes
Norwegian            yes             yes
Portuguese           yes             yes
Spanish              yes             yes
Swedish              yes             yes

Eastern European
Belarusian           yes                             yes
Bosnian              yes                             yes
Bulgarian            yes                             yes
Croatian             yes                             yes
Czech                yes                             yes
Estonian             yes                             yes
Greek                yes                             yes
Hungarian            yes                             yes
Latvian              yes                             yes
Lithuanian           yes                             yes
Polish               yes                             yes
Romanian             yes                             yes
Russian              yes                             yes
Serbian**            yes                             yes
Slovak               yes                             yes
Slovenian            yes                             yes
Turkish              yes                             yes
Ukrainian            yes                             yes
Uzbek**              yes                             yes
Russian Plus         yes

* LEP500 and LEP502 also include unique bilingual French/English and German/English stemming and noise word files, which enable search expansion on indexes and documents containing a mix of French and English, or German and English, text.

** Supports Cyrillic and Latin scripts simultaneously.

Please enquire about any language or language combination not listed.

Contact us for other licensing

 

 

Language Pack Support