Language Packs are designed for developers to extend the performance of their dtSearch Engine-powered applications. Language Packs include:
- Stemming rule files for languages as shown in the table on the right
- Noise word files for ALL languages listed
- Test files to check the operation of stemming in ALL languages listed
- Stemming Selector application
- User Thesaurus Plus application
- Stemming Tester application
- One year of online technical support and updates
Stemming rules and noise-word files
dtSearch products are supplied with stemming rules and a noise-word file for English (US). Stemming is the only search expansion option which is 'on' by default in the dtSearch end-user products, because stemming is almost always useful when making a search and adds little to the time required to make a search. Unlike some other search engines, dtSearch applies stemming at search time: there is no need to build indexes specifically to apply stemming and no need to build separate indexes for each language in use.
The problem
With the stemming option selected, dtSearch will find plurals and many other word variations; for example, a search on print will automatically find printers, printing and printed.
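The idea of stemming at search time can be sketched in a few lines of Python. This is only an illustration: the suffix list, the `stem` function and the `expand_query` helper are hypothetical and do not reflect the dtSearch rule file format or API. The index vocabulary is left untouched, and the search term is expanded to every indexed word that shares its stem.

```python
# Toy suffix rules for illustration only; NOT the dtSearch stemming rule format.
SUFFIXES = ("ers", "ing", "ed", "er", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_query(term: str, index_vocabulary: set[str]) -> list[str]:
    """Return every indexed word that shares a stem with the search term.

    The index itself is not modified, so changing the rules needs no re-indexing.
    """
    target = stem(term)
    return sorted(w for w in index_vocabulary if stem(w) == target)


vocabulary = {"print", "printers", "printing", "printed", "prince", "paper"}
matches = expand_query("print", vocabulary)
print(matches)  # ['print', 'printed', 'printers', 'printing']
```

Because the expansion happens when the query is run, swapping in a different set of rules (for another language, say) changes the results without touching the index.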
However, if you are searching documents written in other languages, the English stemming rules will cause you to miss many word variations which do not occur in English (e.g. verb and noun changes with gender), and unrelated words may be matched in error.
Furthermore, the English noise word list, which is designed to remove unwanted English words from your index to keep the index size small, is not suitable for other languages; your indexes may contain many words which will not be useful in searches and which will add to the size of your indexes.
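The effect of using the wrong noise word list can be shown with a small Python sketch. The word lists and the `index_terms` helper below are hypothetical and far shorter than real noise-word files; they only illustrate why an English list removes nothing useful from non-English text.

```python
# Tiny illustrative lists; real noise-word files are much longer.
NOISE_WORDS = {
    "en": {"the", "and", "of", "a", "is"},
    "de": {"der", "die", "das", "und", "ist"},
}


def index_terms(text: str, language: str) -> list[str]:
    """Return the words that would be indexed after dropping noise words."""
    noise = NOISE_WORDS[language]
    return [w for w in text.lower().split() if w not in noise]


german = "Der Drucker und das Papier"
with_english_list = index_terms(german, "en")
with_german_list = index_terms(german, "de")

print(with_english_list)  # ['der', 'drucker', 'und', 'das', 'papier']
print(with_german_list)   # ['drucker', 'papier']
```

With the English list every German function word ends up in the index; with the German list only the two content words remain.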
The solution
Use language-specific files in place of the default US English files. These are supplied in the form of Language Extension Packs, which contain files for many languages. All files are in Unicode format.
Order Code LEP502: Western European languages (i.e. Latin alphabet)
Order Code LEP503: Eastern European languages (e.g. Cyrillic, Greek)
Order Code LEP500: All languages listed.
The LEP500 license includes the "Russian Plus" stemming rules; these combine Cyrillic and Latin rules to enable improved search recall in document collections containing Russian and another language.
The language pairs supported are:
Russian plus Czech
Russian plus Estonian
Russian plus Finnish
Russian plus German
Russian plus Greek
Russian plus Hungarian
Russian plus Latvian
Russian plus Lithuanian
Russian plus Polish
Russian plus Slovak
Russian plus Swedish
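One way to picture a combined rule set for a mixed-language collection is to choose the rules for each word by its alphabet. The Python sketch below is an assumption about the general idea only, not a description of how the Russian Plus rules are actually built; the suffix lists and the `is_cyrillic`, `strip_suffix` and `stem_mixed` helpers are hypothetical, and Russian plus German is used as the example pair.

```python
# Hypothetical, heavily simplified suffix lists for illustration only.
RUSSIAN_SUFFIXES = ("ами", "ов", "ы", "а")
GERMAN_SUFFIXES = ("en", "er", "e")


def is_cyrillic(word: str) -> bool:
    """True if the word contains any character from the basic Cyrillic block."""
    return any("\u0400" <= ch <= "\u04ff" for ch in word)


def strip_suffix(word: str, suffixes: tuple[str, ...], min_stem: int = 3) -> str:
    """Remove the first matching suffix, keeping at least min_stem letters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def stem_mixed(word: str) -> str:
    """Apply Cyrillic rules to Cyrillic words and Latin rules to everything else."""
    word = word.lower()
    if is_cyrillic(word):
        return strip_suffix(word, RUSSIAN_SUFFIXES)
    return strip_suffix(word, GERMAN_SUFFIXES)


print(stem_mixed("принтеры"))   # принтер
print(stem_mixed("принтеров"))  # принтер
print(stem_mixed("Drucker"))    # druck
```

A single pass over a mixed collection can then stem both languages without the user switching rule files between searches.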
License: 250 USD for LEP502 or LEP503, covering use on up to three servers or workstations with the dtSearch Engine or dtSearch Web, OR on up to 15 workstations with dtSearch Desktop or Network. Click the Buy Now button to see the full price list, including special 'bundle' offers.
Language Extension Packs can also be licensed for large volume use or wider distribution in your own application; please ask for developer licensing options.
Release notes...
Price List and Order Form
Needs
- dtSearch 7.1 or later (Base License covers use with dtSearch Engine or Web on up to three servers, or dtSearch Desktop/Network for up to 15 users); other licensing available.
- Windows 7 or XP (SP3, .NET 3.5)
- ESD (electronic download only)
Stemming Selector
Stemming Selector can be purchased separately for use with dtSearch Desktop or dtSearch Network. Single user version just 19.90 USD for SLS502 or SLS503, 29.85 USD for SLS500. (For end-user use only. If you need to distribute Stemming Selector or its stemming files with your own application, or with dtSearch Web or Publish, order LEP500, LEP502 or LEP503.)
Find out more...
User Thesaurus Plus
User Thesaurus Plus can be purchased separately for use with dtSearch Desktop/Network. Single user version 39 USD. (For end-user use only. If you need to distribute User Thesaurus Plus or any of the supplied thesauri with your own application, or with dtSearch Web or Publish, order LEP500, LEP502 or LEP503.)
Find out more...
Stemming Tester
Stemming Tester is a free application for developers and IR researchers. Find out more...
Evaluation
30-day evaluation versions of Stemming Selector and User Thesaurus Plus are available for download; these allow evaluation of ALL listed languages.