Indexing Results
InterSystems has deprecated InterSystems IRIS® Natural Language Processing (NLP). It may be removed from future versions of InterSystems products. The following documentation is provided as reference for existing users only. Existing users who would like assistance identifying an alternative solution should contact the WRC.
The Indexing Results tool enables you to view the NLP indexing of the contents of an individual data source. This user interface is part of the Domain Architect.
All functionality provided via this tool is also available by using ObjectScript to invoke NLP class methods and properties.
Introduction
The Indexing Results tool enables you to view the NLP indexing of the contents of an individual data source. It displays three listings: Indexed sentences, Concepts, and CRCs. The Indexed sentences display includes both color-coded text that shows entity types (Concepts, Relations, Non-relevants, Path-relevants) and color-coded highlighting that shows attributes and their scope.
You can access the Indexing Results tool from the InterSystems IRIS Management Portal by selecting Analytics, then Text Analytics. The Analytics options are not displayed unless you are in a namespace that has been enabled for Analytics, so first select the desired namespace to display the Analytics tools. You can then access the Indexing Results tool in either of two ways:
-
By selecting Text Analytics and then the Indexing Results option.
-
By selecting Text Analytics and then the Domain Architect option. In the Domain Architect, open an existing domain or define a new domain. Once you are in a compiled domain, click the Indexing Results button on the Tools tab to display how NLP has indexed the data. The Indexing Results tool opens in a separate browser tab.
The Indexing Results tool can display indexing results either for data in a specified domain or for manually input data.
You may have to scroll horizontally to see the Indexing Results options at the top right of the window.
Domain Data
At the top right of the Indexing Results window is a drop-down list of defined domains. It defaults to the first defined domain. Select the desired domain.
Click the wide blank box across the top of the window to display a drop-down list that shows the contents of each indexed data source on a single line. Select one of these sources to display the indexing results for that source.
Use the >> button to collapse (hide) the wide single-line source box, which lets you view the indexing results without horizontal scrolling. Use the << button to expand (redisplay) the source box as a blank box that you can click to select another data source.
Manual Input Data
At the top right of the Indexing Results window, click the manual input button to enter text directly for NLP indexing analysis. This opens the Real-time input box. Type or paste your input text in the blank box. Use the Configuration drop-down list to select an existing (or default) configuration, or select language —> and then use the second drop-down list to select a national language or Auto-detect.
Indexed Sentences
The sentences in the source are listed in order, one sentence per line. Entity types (Concepts, Relations, Non-relevants, Path-relevants) and attributes are indicated by color-coding and highlighting.
At the top right of the Indexing Results window, you can select the highlighting type, either light or full. Light highlighting uses color-coding and underlining to indicate entity types and attributes; it is intended to be unobtrusive so that sentences remain easy to read. Full highlighting displays boxes around each entity and uses thick lines for attributes to provide a clearer representation of the NLP indexed structures. Both types of highlighting convey the same information. The default is full.
The sentence text is highlighted for entities as follows:
-
concept: blue, boxed
-
relation: light green, boxed
-
non-relevant: grey, not boxed
-
path-relevant: black, grey box
The sentence text is highlighted for attributes as follows:
-
A Negation attribute phrase has red text (with concepts in bold letters and relations in regular letters). In full highlighting, the concepts and relations are further distinguished by enclosing boxes in the entity type color: blue for concepts, light green for relations. The negation keywords are underlined in red; in multi-word negation terms (such as “was not”), each word is underlined in red.
-
A Time, Duration, or Frequency attribute phrase is underlined with an orange dotted line. Time attribute keywords are underlined in orange. Duration attribute keywords are underlined in bright green. Frequency attribute keywords are underlined in yellow.
-
A Measurement attribute is underlined with a magenta dotted line. The measurement keywords are underlined in magenta.
-
A Negative Sentiment attribute is underlined with a purple dotted line. The sentiment keywords are underlined in purple.
-
A Positive Sentiment attribute is underlined with a green dotted line. The sentiment keywords are underlined in green.
These highlighting conventions can be combined, which makes it possible to show entities and attributes together: for example, a Measurement attribute that is part of a Negation attribute phrase.
Concepts and CRCs
The Indexing Results tool displays two listings: one of all concepts in the source and one of all CRCs (concept-relation-concept sequences) in the source:
-
Concepts in the source in descending order.
-
CRCs in the source, highlighted (as above) to indicate concepts and relations, in descending order. Note that the CRCs listing does not include non-relevant or path-relevant words and does not indicate attributes.
At the top right of the Indexing Results window, the sort by buttons let you toggle the Concepts and CRCs listings between frequency counts and dominance values, each shown in descending order.
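Conceptually, the sort by toggle re-sorts the same rows on a different key. The following Python sketch illustrates only that idea; `ConceptRow`, `sort_listing`, and the sample concepts and values are hypothetical and are not part of the InterSystems NLP API or its actual implementation.

```python
from typing import List, NamedTuple


class ConceptRow(NamedTuple):
    """Hypothetical row of the Concepts listing (not an InterSystems class)."""
    value: str
    frequency: int
    dominance: int


def sort_listing(rows: List[ConceptRow], by: str = "frequency") -> List[ConceptRow]:
    """Return the rows in descending order of frequency or dominance."""
    if by not in ("frequency", "dominance"):
        raise ValueError("by must be 'frequency' or 'dominance'")
    # sorted() is stable (also with reverse=True), so rows with equal
    # values keep their original relative order.
    return sorted(rows, key=lambda row: getattr(row, by), reverse=True)


# Hypothetical sample data for illustration only.
rows = [
    ConceptRow("patient", 4, 620),
    ConceptRow("blood pressure", 2, 1000),
    ConceptRow("medication", 3, 450),
]
print([r.value for r in sort_listing(rows, "frequency")])
# ['patient', 'medication', 'blood pressure']
print([r.value for r in sort_listing(rows, "dominance")])
# ['blood pressure', 'patient', 'medication']
```

The example shows why the two orderings can differ: a concept that occurs less often can still be the most dominant one in the source.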
In the Concepts listing, the most dominant concept(s) are given a dominance of 1000. Less dominant concepts are given smaller integer values, and larger sources tend to have a lower minimum dominance value. For example, a source containing 25 concepts might have a dominance range between 1000 and 83; a source containing 300 concepts might have a dominance range between 1000 and 2.
In the CRCs listing, the dominance score is calculated by adding the dominance values of the concepts and relations.
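The CRC dominance calculation described above can be sketched as a simple sum. In this Python sketch, `crc_dominance`, the dictionary of dominance values, and the sample entities and numbers are all hypothetical; the sketch mirrors the description in this section rather than the internal InterSystems implementation.

```python
from typing import Dict


def crc_dominance(head: str, relation: str, tail: str, dominance: Dict[str, int]) -> int:
    """Add the dominance values of a CRC's two concepts and its relation."""
    return dominance[head] + dominance[relation] + dominance[tail]


# Hypothetical per-entity dominance values for illustration only.
dominance = {"patient": 620, "has": 300, "fever": 410}
print(crc_dominance("patient", "has", "fever", dominance))  # 1330
```

Because the score is a sum, a CRC that contains highly dominant concepts ranks near the top of the CRCs listing when it is sorted by dominance.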
If Japanese is the only language supported for the domain, the Indexing Results display substitutes a single Entities listing for the Concepts and CRCs listings.