Indexing Sources for SQL Search
Indexing Sources for SQL Search
You can use SQL Search to search text in %String data type or %Stream.GlobalCharacter (character stream) data type.
To perform a SQL search, the column to be searched must have a defined SQL Search bitmap index. Multiple types of index are defined as nested subclasses, with each subsequent subclass adding additional properties. The two basic SQL Search index types are as follows:
-
Minimal index (%iFind.Index.MinimalOpens in a new tab): supports SQL word and word phrase search with wildcards, fuzzy search, and regular expressions. It does not support co-occurrence or positional phrase searches or the highlighting of search results.
-
Basic index (%iFind.Index.BasicOpens in a new tab): supports all the features provided by the Minimal index type (SQL word and word phrase search with wildcards), and adds support for co-occurrence, positional phrase searches, and highlighting of search results. The features of this index type are sufficient for the majority of common full text search scenarios.
An SQL Search index of any type includes support for the following parameters:
-
IGNOREPUNCTUATION specifies whether or not to ignore punctuation characters. For %iFind.Index.MinimalOpens in a new tab the default is 1: punctuation is ignored. For all other SQL Search index types, the default is 0: punctuation affects search results; leading and trailing punctuation in the text must match the same punctuation in the search string.
-
INDEXOPTION specifies whether or not to index to allow for stemming or decompounding. Because indexing to support these operations significantly increases the size of the index, it is recommended that you specify INDEXOPTION=0 unless stemming or decompounding will very likely be used. The default is 0.
-
LANGUAGE specifies the language to use when indexing records. For example, "en" specifies English. Use "*" to enable automatic language identification. The default is "en".
-
LOWER specifies whether query search is case-sensitive. By default, InterSystems SQL Search indexing is not case-sensitive; SQL Search normalizes the text to lowercase before indexing it. The LOWER parameter determines whether or not to perform this lowercase normalization (LOWER=1, the default, normalizes to lowercase). Because language conventions commonly capitalize words at the beginning of a sentence or when used in a title, normalizing to lowercase is preferable in most applications. If you specify LOWER=0, the query search_items string is case-sensitive. For example, if you specify LOWER=0, the query search_items string ‘turkey’ will only match turkey and not Turkey. If you specify LOWER=1, the query search_items string ‘turkey’ will match both turkey and Turkey.
-
USERDICTIONARY allows you to specify the name of a user-defined UserDictionary that is applied to the texts before indexing. This parameter is optional and is for advanced use only.
Unspecified parameters take default values. For a full list of supported parameters refer to %iFind.Index.BasicOpens in a new tab in the InterSystems Class Reference.
The following Class Definition example creates a table with a Semantic index on the Narrative property (column). The indexed property can be of data type %String or %Stream.GlobalCharacter:
Class Aviation.TestSQLSrch Extends %Persistent [
DdlAllowed,Owner={UnknownUser},SqlRowIdPrivate,
SqlTableName=TestSQLSrch ]
{
Property UniqueNum As %Integer;
Property CrashDate As %TimeStamp [ SqlColumnNumber=2 ];
Property Narrative As %String(MAXLEN=100000) [ SqlColumnNumber=3 ];
Index NarrBasicIdx On (Narrative) As %iFind.Index.Basic(INDEXOPTION=0,
LANGUAGE="en",LOWER=1);
Index UniqueNumIdx On UniqueNum [ Type=index,Unique ];
}
Populating a Table
Like any SQL index, a defined SQL Search index (by default) is built when you populate a new table, and maintained when you subsequently insert, update, or delete data. You can defer building of an index when populating a table using %NOINDEX, and then use the %Build() method to build the index. You can add an index to a table that is already populated with data and then build that index. Refer to Defining and Building Indexes for further details.
The following example populates the Aviation.TestSQLSrch table from the Aviation.Events table. Any defined SQL Search indexes will automatically be built. This example inserts a large amount of text, so running it may take a minute or so:
INSERT OR UPDATE INTO Aviation.TestSQLSrch (UniqueNum,CrashDate,Narrative)
SELECT %ID,EventDate,NarrativeFull FROM Aviation.Event
This example uses INSERT OR UPDATE with a field defined with a unique key to prevent repeated execution from creating duplicate records.
Advanced Index Types
If you are running an instance which is licensed to use InterSystems Natural Language Processor (NLP), you can create the following additional SQL Search index types:
-
Semantic index (%iFind.Index.SemanticOpens in a new tab): supports all the features provided by the Basic index type, and adds support for SQL search of NLP entities and (optionally) the negation attribute.
-
Analytic index (%iFind.Index.AnalyticOpens in a new tab): supports all of the features of the Semantic index type, as well as path, proximity, and dominance information.
You can view information about your instance’s license key from the Management Portal; select System Administration then Licensing.
Semantic and Analytic SQL Search indexes use InterSystems NLP, which is deprecatedOpens in a new tab. InterSystems provides documentation about these index types for existing users only. If you would like assistance identifying an alternative solution, contact the WRCOpens in a new tab.
In addition to the parameters available for the Minimal and Basic index types, Semantic and Analytic indexes support the following optional parameter:
-
IFINDATTRIBUTES allows you to specify whether or not to identify negation in the text and store the negation attribute. IFINDATTRIBUTES=1 identifies and indexes negation. The default is IFINDATTRIBUTES=0.