Skip to main content

Indexing Sources for SQL Search

Indexing Sources for SQL Search

You can use SQL Search to search text in %String data type or %Stream.GlobalCharacter (character stream) data type.

To perform a SQL search, the column to be searched must have a defined SQL Search bitmap index. There are four levels of SQL Search indices. These levels are defined in nested subclasses. Each index level provides all of the features of the previous level, plus additional SQL Search features specific to that level. You can create any of the following SQL Search index types:

  • Minimal index (%iFind.Index.Minimal): supports SQL word and word phrase search with wildcards, fuzzy search, and regular expressions. It does not support co-occurrence or positional phrase searches or the highlighting of search results.

  • Basic index (%iFind.Index.Basic): supports SQL word and word phrase search with wildcards. It supports co-occurrence and positional phrase searches and highlighting of search results.

  • Semantic index (%iFind.Index.Semantic): supports SQL search of NLP entities and optionally supports the negation attribute.

  • Analytic index (%iFind.Index.Analytic): supports the all of the NLP features of Semantic index, as well as path, proximity, and dominance information.

Each index level supports all of the parameters of the previous level, and adds one or more additional parameters. Unspecified parameters take default values.

The following Class Definition example creates a table with a Semantic index on the Narrative property (column). The indexed property can be of data type %String or %Stream.GlobalCharacter:

   Class Aviation.TestSQLSrch Extends %Persistent [
      SqlTableName=TestSQLSrch ]
  Property UniqueNum As %Integer;
  Property CrashDate As %TimeStamp [ SqlColumnNumber=2 ];
  Property Narrative As %String(MAXLEN=100000) [ SqlColumnNumber=3 ];
  Index NarrSemanticIdx On (Narrative) As %iFind.Index.Semantic(INDEXOPTION=0,
  Index UniqueNumIdx On UniqueNum [ Type=index,Unique ];

An SQL Search index of any type includes support for the following parameters:

  • IGNOREPUNCTUATION specifies whether or not to ignore punctuation characters. For %iFind.Index.Minimal the default is 1: punctuation is ignored. For all other SQL Search index types, the default is 0: punctuation affects search results; leading and trailing punctuation in the text must match the same punctuation in the search string.

  • INDEXOPTION specifies whether or not to index to allow for stemming or decompounding. Because indexing to support these operations significantly increases the size of the index, it is recommended that you specify INDEXOPTION=0 unless stemming or decompounding will very likely be used. The default is 0.

  • LANGUAGE specifies the language to use when indexing records. For example, "en" specifies English. Use "*" to enable automatic language identification. The default is "en".

  • LOWER specifies whether query search is case-sensitive. By default, InterSystems SQL Search indexing is not case-sensitive; SQL Search normalizes the text to lowercase before indexing it. The LOWER parameter determines whether or not to perform this lowercase normalization (LOWER=1, the default, normalizes to lowercase). Because language conventions commonly capitalize words at the beginning of a sentence or when used in a title, normalizing to lowercase is preferable in most applications. If you specify LOWER=0, the query search_items string is case-sensitive. For example, if you specify LOWER=0, the query search_items string ‘turkey’ will only match turkey and not Turkey. If you specify LOWER=1, the query search_items string ‘turkey’ will match both turkey and Turkey.

  • USERDICTIONARY allows you to specify the name of a user-defined UserDictionary that is applied to the texts before indexing. This parameter is optional and is for advanced use only.

For a full list of supported parameters refer to %iFind.Index.Basic in the InterSystems Class Reference.

A Semantic index (%iFind.Index.Semantic) also supports the following optional parameter:

  • IFINDATTRIBUTES allows you to specify whether or not to identify negation in the text and store the negation attribute. IFINDATTRIBUTES=1 identifies and indexes negation. The default is IFINDATTRIBUTES=0.

Populating a Table

Like any SQL index, a defined SQL Search index (by default) is built when you populate a new table, and maintained when you subsequently insert, update, or delete data. You can defer building of an index when populating a table using %NOINDEX, and then use the %Build() method to build the index. You can add an index to a table that is already populated with data and then build that index. Refer to Defining and Building Indices for further details.

The following example populates the Aviation.TestSQLSrch table from the Aviation.Events table. Any defined SQL Search indices will automatically be built. This example inserts a large amount of text, so running it may take a minute or so:

INSERT OR UPDATE INTO Aviation.TestSQLSrch (UniqueNum,CrashDate,Narrative) 
    SELECT %ID,EventDate,NarrativeFull FROM Aviation.Event

This example uses INSERT OR UPDATE with a field defined with a unique key to prevent repeated execution from creating duplicate records.