dtSearch User Thesaurus File Format

The file generated by dtSearch Desktop/Network is always called thesaur.xml and is stored in the user's private directory. The file is encoded as UTF-8.


Whenever you select a synonym file from the drop-down list in User Thesaurus Plus, it generates a thesaur.xml file and writes it to the target path you have set. Normally this is the dtSearch user's private directory described above.
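The effect of selecting a file can be approximated by hand. The sketch below assumes the generated thesaur.xml is simply a copy of the chosen synonym file; the utility itself may do more than this. The function name activate_thesaurus and both path arguments are hypothetical.

```python
import shutil
from pathlib import Path


def activate_thesaurus(synonym_file, target_dir):
    """Copy the chosen synonym file to <target_dir>/thesaur.xml.

    Assumption: a plain copy is enough; User Thesaurus Plus may
    transform the file in ways this sketch does not reproduce.
    """
    target = Path(target_dir) / "thesaur.xml"
    shutil.copyfile(synonym_file, target)
    return target
```

Calling activate_thesaurus("UT_sample_thesaurus.xml", target_dir) with target_dir set to your dtSearch private directory would overwrite any existing thesaur.xml there, so keep a backup if the current file matters.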


All files imported into User Thesaurus Plus are prefixed with UT_ (for example, UT_sample_thesaurus.xml) and saved in your My Documents folder under User Thesaurus Plus\Data.


Example: sample_thesaur.xml


<?xml version="1.0" encoding="UTF-8" ?>



<Name>Personal computer</Name>

<Synonyms>"Personal computer" PC laptop</Synonyms>



<Name>how much</Name>

<Synonyms>"how much is" "what's the price of" "what's the cost of" "how much does"</Synonyms>




<Synonyms>ƒ Guilder florin</Synonyms>




<Synonyms>sing sang sung</Synonyms>
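In the entries above, multi-word phrases appear in double quotes and single words are separated by spaces. Assuming that convention holds for every entry, the text of a Synonyms element can be split into individual terms with a short sketch like this one. The function name split_synonyms is hypothetical, and this is not how dtSearch itself is documented to parse the field.

```python
import re

# A term is either a double-quoted phrase or a run of non-space characters.
_TERM = re.compile(r'"([^"]*)"|(\S+)')


def split_synonyms(text):
    """Split the text of a Synonyms element into a list of terms."""
    terms = []
    for quoted, bare in _TERM.findall(text):
        term = quoted or bare
        if term:  # skip empty quoted phrases such as ""
            terms.append(term)
    return terms
```

Apostrophes inside a quoted phrase, as in "what's the price of", are kept as part of the phrase because only double quotes act as delimiters.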




Search queries are expanded with synonyms from the user thesaurus only when the User Thesaurus check-box on the Search dialog is selected. Because dtSearch Desktop/Network is supplied with a single thesaurus file, this can make searches unnecessarily broad and reduce precision.


The User Thesaurus Plus utility makes it possible to have multiple smaller files, each designed for a very specific search task, which improves recall while maintaining high precision.


There is no hard limit to the size of each XML file, but performance will fall off as the size increases. The recommended maximum is 10,000 items per file.
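A file can be checked against this recommendation before it is used. The sketch below assumes that one item corresponds to one Synonyms element, and it counts those elements wherever they sit in the tree, so it does not depend on the names of any wrapper elements. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RECOMMENDED_MAX_ITEMS = 10_000  # recommended maximum stated above


def count_items(path):
    """Count the Synonyms elements in a thesaurus file.

    Assumption: each item contributes exactly one Synonyms element.
    """
    root = ET.parse(path).getroot()
    return sum(1 for _ in root.iter("Synonyms"))


def exceeds_recommended_size(path, limit=RECOMMENDED_MAX_ITEMS):
    """Return True when the file holds more items than the limit."""
    return count_items(path) > limit
```

If a file exceeds the limit, splitting it into several task-specific files is in line with how User Thesaurus Plus is meant to be used.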


Files created in User Thesaurus Plus with File|New > Macro file..., New > synonym file... or New > SharePoint file contain a time stamp to distinguish them from files generated by dtSearch, for example:




<?xml version="1.0" encoding="UTF-8"?>

<!-- Generated by User Thesaurus Plus : 2012-02-10 17:30:23 -->

<dtSearchMacros />
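The time stamp comment can be used to tell the two kinds of file apart programmatically. This is a minimal sketch that assumes the comment always has the wording shown above and that the date is written year-month-day. The function name generated_timestamp is hypothetical.

```python
import re
from datetime import datetime

# Matches the generator comment and captures the date and time.
_STAMP = re.compile(
    r"<!--\s*Generated by User Thesaurus Plus\s*:\s*"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s*-->"
)


def generated_timestamp(xml_text):
    """Return the time stamp from the generator comment, or None.

    None means no such comment was found, which suggests the file
    was not created by User Thesaurus Plus.
    Assumption: the date order is year-month-day.
    """
    match = _STAMP.search(xml_text)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
```

Read the file as UTF-8 text and pass its contents to generated_timestamp; a None result indicates a file without the User Thesaurus Plus comment.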