hidden abstract class %iKnow.Stemmer extends %Library.RegisteredObject
This class represents an object responsible for stemming user input into a base form, either through some internal algorithm or through an external library. Use GetDefault() to instantiate the default stemmer for a particular language or GetCustom() to retrieve one configured with custom settings saved as a %iKnow.Stemming.Configuration object.
See also %iKnow.Stemming.Utils for defining exception rules.
This method tries to decompound the supplied pWord into its component elements and returns the stems of those elements:
pCompounds(n) = $lb([stem], [start pos in string], [score])
Note that most punctuation encountered in pWord (including spaces) is treated as an explicit compound boundary, in the same way as a hyphen.
See also %iKnow.Stemming.DecompoundUtils.
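The explicit-boundary behaviour and the shape of the output entries can be illustrated with a small Python sketch. This is not the class's decompounding algorithm: the function name split_on_boundaries is hypothetical, treating every non-alphanumeric character as a boundary is an assumption (the description only says "most punctuation"), positions are assumed to be 1-based, and the stem and score values that the real method computes are left out.

```python
import re

def split_on_boundaries(word):
    """Split a string at explicit boundaries (hyphens, spaces, punctuation).

    Returns (element, start_position) pairs. Assumptions for illustration:
    every non-alphanumeric character counts as a boundary, and positions
    are 1-based. No stemming or scoring is attempted here.
    """
    # [^\W_]+ matches runs of letters and digits, excluding the underscore.
    return [(m.group(0), m.start() + 1) for m in re.finditer(r"[^\W_]+", word)]

# Each pair corresponds to the first two slots of a pCompounds(n) entry.
print(split_on_boundaries("well-known fact"))
```

In the real method, each element found this way could be decompounded further, and every resulting entry also carries a stem and a score.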
Returns the default stemmer object for language pLanguage, which is resolved as follows:
- Check whether there is a pLanguage*.aff file in INSTALL_DIR/dev/hunspell/ and use the first one found for the requested language to instantiate a hunspell-based stemmer object. Most libraries use a more detailed locale name such as en_US.aff, which this check also covers. Note that pLanguage should be the two-letter ISO code for the language.
- If no such file is found, check if there is a directory called INSTALL_DIR/dev/hunspell/pLanguage with a *.aff file within. If found, use it to instantiate a hunspell-based stemmer object.
- If no hunspell library is found, check whether a %Text.Text implementation corresponding to the requested language exists and, if so, use its Standardize() method (through a %iKnow.Stemming.TextStemmer instance).
- If none is found, fall back to the default in %Text.Text:Standardize(), which will mostly just lowercase the string.
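The resolution order above can be sketched in Python as a plain file-system lookup. This is only an illustration of the described fallback chain, not the actual implementation: the function name resolve_default_stemmer, the text_languages parameter (standing in for "a %Text.Text implementation exists for this language") and the returned labels are hypothetical, and sorting the matches to pick "the first" file is an assumption made to keep the sketch deterministic.

```python
from pathlib import Path

def resolve_default_stemmer(install_dir, language, text_languages=()):
    """Return a (kind, detail) pair describing which stemmer would be chosen.

    install_dir    -- stands in for INSTALL_DIR
    language       -- two-letter ISO code, e.g. "en"
    text_languages -- hypothetical set of languages that have a %Text.Text
                      implementation
    """
    base = Path(install_dir) / "dev" / "hunspell"

    # Step 1: a pLanguage*.aff file directly in the hunspell directory
    # (this also matches detailed locales such as en_US.aff).
    if base.is_dir():
        direct = sorted(p for p in base.glob(f"{language}*.aff") if p.is_file())
        if direct:
            return ("hunspell", direct[0])

    # Step 2: any *.aff file inside a subdirectory named after the language.
    nested_dir = base / language
    if nested_dir.is_dir():
        nested = sorted(p for p in nested_dir.glob("*.aff") if p.is_file())
        if nested:
            return ("hunspell", nested[0])

    # Step 3: a language-specific text implementation, if one is available.
    if language in text_languages:
        return ("text", language)

    # Step 4: the generic default, which mostly just lowercases the input.
    return ("default", None)
```

A call such as resolve_default_stemmer("/opt/install", "en", {"en"}) would return the hunspell entry if a matching .aff file exists, and otherwise fall through to the text-based and default options in that order. The path used here is a placeholder.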