Skip to main content

Indexing Sources for SQL Search

Indexing Sources for SQL Search

You can use SQL Search to search text in %String data type or %Stream.GlobalCharacter (character stream) data type.

To perform a SQL search, the column to be searched must have a defined SQL Search bitmap index. Multiple types of index are defined as nested subclasses, with each subsequent subclass adding additional properties. The two basic SQL Search index types are as follows:

  • Minimal index (%iFind.Index.MinimalOpens in a new tab): supports SQL word and word phrase search with wildcards, fuzzy search, and regular expressions. It does not support co-occurrence or positional phrase searches or the highlighting of search results.

  • Basic index (%iFind.Index.BasicOpens in a new tab): supports all the features provided by the Minimal index type (SQL word and word phrase search with wildcards), and adds support for co-occurrence, positional phrase searches, and highlighting of search results. The features of this index type are sufficient for the majority of common full text search scenarios.

An SQL Search index of any type includes support for the following parameters:

  • IGNOREPUNCTUATION specifies whether or not to ignore punctuation characters. For %iFind.Index.MinimalOpens in a new tab the default is 1: punctuation is ignored. For all other SQL Search index types, the default is 0: punctuation affects search results; leading and trailing punctuation in the text must match the same punctuation in the search string.

  • INDEXOPTION specifies whether or not to index to allow for stemming or decompounding. Because indexing to support these operations significantly increases the size of the index, it is recommended that you specify INDEXOPTION=0 unless stemming or decompounding will very likely be used. The default is 0.

  • LANGUAGE specifies the language to use when indexing records. For example, "en" specifies English. Use "*" to enable automatic language identification. The default is "en".

  • LOWER specifies whether query search is case-sensitive. By default, InterSystems SQL Search indexing is not case-sensitive; SQL Search normalizes the text to lowercase before indexing it. The LOWER parameter determines whether or not to perform this lowercase normalization (LOWER=1, the default, normalizes to lowercase). Because language conventions commonly capitalize words at the beginning of a sentence or when used in a title, normalizing to lowercase is preferable in most applications. If you specify LOWER=0, the query search_items string is case-sensitive. For example, if you specify LOWER=0, the query search_items string ‘turkey’ will only match turkey and not Turkey. If you specify LOWER=1, the query search_items string ‘turkey’ will match both turkey and Turkey.

  • USERDICTIONARY allows you to specify the name of a user-defined UserDictionary that is applied to the texts before indexing. This parameter is optional and is for advanced use only.

Unspecified parameters take default values. For a full list of supported parameters refer to %iFind.Index.BasicOpens in a new tab in the InterSystems Class Reference.

The following Class Definition example creates a table with a Semantic index on the Narrative property (column). The indexed property can be of data type %String or %Stream.GlobalCharacter:

   Class Aviation.TestSQLSrch Extends %Persistent [
      DdlAllowed,Owner={UnknownUser},SqlRowIdPrivate,
      SqlTableName=TestSQLSrch ]
  { 
  Property UniqueNum As %Integer;
  Property CrashDate As %TimeStamp [ SqlColumnNumber=2 ];
  Property Narrative As %String(MAXLEN=100000) [ SqlColumnNumber=3 ];
  Index NarrBasicIdx On (Narrative) As %iFind.Index.Basic(INDEXOPTION=0,
     LANGUAGE="en",LOWER=1);
  Index UniqueNumIdx On UniqueNum [ Type=index,Unique ];
  }

Populating a Table

Like any SQL index, a defined SQL Search index (by default) is built when you populate a new table, and maintained when you subsequently insert, update, or delete data. You can defer building of an index when populating a table using %NOINDEX, and then use the %Build() method to build the index. You can add an index to a table that is already populated with data and then build that index. Refer to Defining and Building Indexes for further details.

The following example populates the Aviation.TestSQLSrch table from the Aviation.Events table. Any defined SQL Search indexes will automatically be built. This example inserts a large amount of text, so running it may take a minute or so:

INSERT OR UPDATE INTO Aviation.TestSQLSrch (UniqueNum,CrashDate,Narrative) 
    SELECT %ID,EventDate,NarrativeFull FROM Aviation.Event

This example uses INSERT OR UPDATE with a field defined with a unique key to prevent repeated execution from creating duplicate records.

Managing SQL Search Index Storage

By default, InterSystems IRIS stores SQL Search indexes differently from other types of SQL index—a fact which has important implications for system administrators.

Data for other types of index are stored and managed within the extent of the index’s associated table, in globals specific to that table. By contrast, InterSystems IRIS optimizes the efficiency and performance of SQL Search by storing index data about all the tokens that it has ingested within a namespace as a single, shared set of globals, independent of any individual index or table. These globals begin with the prefix ^IRIS.IF*.

Important:

Globals with the prefix ^IRIS* are intended for internal use only, and should not be accessed or modified directly in any way. In rare circumstances where it is necessary to drop ^IRIS.IF* globals, note the following:

  • All ^IRIS.IF* globals should be dropped entirely, and at the same time.

  • After dropping these globals, all SQL Search indexes in the namespace must be rebuilt.

The independence of ^IRIS.IF* global storage means that, if you intend to map a table’s contents from one namespace onto another and you would like the table’s SQL Search indexes to be available in the destination namespace as well, you must manually map the ^IRIS.IF* globals to the same namespace.

Note:

To further optimize the efficiency of this shared storage structure for SQL Search indexes, you may wish to map the ^IRIS.IF* globals across namespaces even if you do not intend to use an SQL Search index outside of the namespace in which it is located.

Should you choose to map these globals between namespaces, InterSystems strongly recommends mapping them to a database dedicated exclusively for that purpose (rather than the destination namespace’s default database, for example). All ^IRIS.IF* globals must be mapped to the same database. Once the mapping is created, you must rebuild all SQL Search indexes in the source namespace.

On the other hand, there may be situations where it is not desirable to share this data structure among indexes. For example: an SQL Search index which encompasses a small dataset would likely yield quicker results if it only needed to scan the set of tokens associated with that dataset, rather than a set of tokens which it shared with other indexes with larger, more diverse datasets. In such cases, simply assign the value "INDEX" to the IFINDSHAREDDATALOCATIONOpens in a new tab parameter when you define the index.

Advanced Index Types

If you are running an instance which is licensed to use InterSystems Natural Language Processor (NLP), you can create the following additional SQL Search index types:

You can view information about your instance’s license key from the Management Portal; select System Administration then Licensing.

Important:

Semantic and Analytic SQL Search indexes use InterSystems NLP, which is deprecatedOpens in a new tab. InterSystems provides documentation about these index types for existing users only. If you would like assistance identifying an alternative solution, contact the WRCOpens in a new tab.

In addition to the parameters available for the Minimal and Basic index types, Semantic and Analytic indexes support the following optional parameter:

  • IFINDATTRIBUTES allows you to specify whether or not to identify negation in the text and store the negation attribute. IFINDATTRIBUTES=1 identifies and indexes negation. The default is IFINDATTRIBUTES=0.

FeedbackOpens in a new tab