Language Extension Packs
For use with dtSearch version 6.5 or later.
dtSearch Engine and dtSearch Web are supplied with stemming rules and a noise word file for English (US). Stemming is the only search expansion option that is on by default in the dtSearch end-user products, because it is almost always useful when making a search and adds little to the time a search takes. Unlike some other search engines, dtSearch applies stemming at search time, so there is no need to build indexes specifically for stemming and no need to build separate indexes for each language in use.
The problem
With the stemming option selected, dtSearch finds plurals and many other variations; for example, a search on print automatically finds printers, printing and printed. However, if you are searching documents written in other languages, the English stemming rules will cause you to miss many word variations that do not occur in English (e.g. verb and noun endings that change with gender), and unrelated words may be matched in error.
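To make the point concrete, here is a minimal sketch of search-time suffix stemming. The suffix list, the `stem` and `search` functions and the sample words are all hypothetical illustrations; they are not dtSearch's rule-file format or its stemming algorithm. The index keeps words unchanged, and stems are only computed when a search is made, so the English rules find the English variants but miss the French ones.

```python
# Hypothetical English suffix rules, longest first.
# This is NOT dtSearch's rule-file format, just a toy illustration.
ENGLISH_SUFFIXES = ("ers", "ing", "ed", "er", "s")


def stem(word: str, suffixes=ENGLISH_SUFFIXES, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least min_stem letters."""
    w = word.lower()
    for suffix in suffixes:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


def search(query: str, indexed_words: list) -> list:
    """Search-time stemming: the index holds unmodified words, and the
    query and each candidate word are reduced to stems only at search time."""
    target = stem(query)
    return [w for w in indexed_words if stem(w) == target]


index = ["print", "printers", "printing", "printed",
         "imprimer", "imprimante", "imprimé"]

print(search("print", index))     # ['print', 'printers', 'printing', 'printed']
print(search("imprimer", index))  # ['imprimer'] (French variants are missed)
```

Swapping in a language-specific suffix list, without rebuilding the index, is the kind of change a language-specific rule file makes possible.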
Furthermore, the English noise word list, which is designed to keep unwanted English words out of your index and so keep the index small, is not suitable for other languages; your indexes may contain many words that are not useful in searches and that add to the size of your indexes.
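The effect of the noise word list on index size can be sketched in a few lines. The two word lists and the `index_terms` helper below are hypothetical and far shorter than any real noise word file; the sketch only shows that an English list leaves French function words in the index, while a French list removes them.

```python
import re

# Tiny, hypothetical noise word lists; real lists are much longer.
ENGLISH_NOISE = {"the", "a", "of", "and", "to", "in"}
FRENCH_NOISE = {"le", "la", "les", "l", "de", "des", "et", "un", "une"}


def index_terms(text: str, noise_words: set) -> list:
    """Return the distinct words that would be indexed after noise words are removed."""
    tokens = re.findall(r"\w+", text.lower())
    return sorted({t for t in tokens if t not in noise_words})


text = "Le manuel de l'imprimante et les pilotes des imprimantes"

print(len(index_terms(text, ENGLISH_NOISE)))  # 10 terms kept
print(index_terms(text, FRENCH_NOISE))        # ['imprimante', 'imprimantes', 'manuel', 'pilotes']
```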
The solution
Use language-specific files in place of the default US English files. These are supplied as Language Extension Packs, which contain files for many languages (see the list below). All files are in Unicode format.
Language Extension Packs
* LEP400 and LEP402 also include unique bilingual French/English and German/English stemming and noise word files, which enable search expansion on indexes and documents containing a mix of French or German and English text.
License:
Licensed for use with dtSearch Engine or dtSearch Web on a single server or workstation, OR with dtSearch Desktop or Network on up to 5 workstations. Please ask about other licensing options.
Contents:
- Stemming rule files and noise word files for each supported language
- Test files to check the operation of stemming in all the supplied languages
- Stemming Language Selector application, which changes stemming rules from the Windows Start menu*
- Multilingual installer (English, French, Spanish, German, Dutch)
- One year of online technical support and updates
* Requires administrator permissions.
Needs:
- dtSearch 6.5 or later (the license covers use with dtSearch Engine or Web on a single server, or with dtSearch Desktop/Network for up to 5 users); other licensing is available
- Windows NT4, 2000 or XP
- Supplied on CD-ROM
Evaluation
A 30-day evaluation version is available; it allows English and any single other language to be tried for comparison tests. Please complete the Enquiry Form. Please enquire about languages not listed.