InterSystems IRIS Data Platform 2019.2  /  Using InterSystems Natural Language Processing (NLP)

Using InterSystems Natural Language Processing (NLP)
Semantic Attributes
InterSystems NLP identifies concepts and their context in a natural language text.
Concepts are indivisible word groups that have a meaning on their own, and are often associated with other concepts in a sentence by relations. For example, in the sentence “Patient is being treated for acute pulmonary hypertension,” NLP identifies the word groups “patient” and “acute pulmonary hypertension” as concepts. These concepts are associated by the relation “is being treated for”.
Context is provided by associating a concept with a semantic attribute. For example, in the sentence “Patient is not being treated for acute pulmonary hypertension,” the concept “acute pulmonary hypertension” has the same meaning, but its context is clearly different. The context for this concept is that it appears in a part of the sentence that is negated. An application created with NLP can provide much more focused results by differentiating instances of a concept that are not negated from those that are.
Negation is one of several semantic attributes that associate concepts with specific contexts.
NLP supports the following semantic attributes, each described in its own section below: negation; measurement; time, duration, and frequency; and sentiment.
An NLP attribute flag is associated with one or more marker terms (a word or short phrase) that affect the interpretation of a path or sentence. A part of a path or sentence is flagged as being affected by that attribute, and can thus be separated from similar parts of paths or sentences that do not have the attribute.
Negation
Negation is the process that turns an affirmative sentence (or part of a sentence) into its opposite, a denial. For example, the sentence “I am a doctor.” can be negated as “I am not a doctor.” In analyzing text it is often important to separate affirmative statements about a topic from negative statements about that topic.
NLP provides a means to determine whether a sentence or path is negated. During source indexing, NLP associates the “negation” attribute with a sentence and indicates which part of the text is negated.
While in its simplest form negation is simply associating “no” or “not” with an affirmative statement or phrase, in actual language usage negation is a more complex and language-specific operation. There are two basic types of negation: formal negation, expressed through explicit negation words or structures (such as “no,” “not,” or “neither … nor”), and semantic negation, in which the negative sense is carried by the meaning of the words themselves rather than by an explicit negation structure.
The NLP language models contain a variety of language-specific negation words and structures. Using these language models the NLP analysis engine is able to automatically identify and flag for future use most instances of formal negation as part of the source loading operation. However, NLP cannot identify instances of semantic negation.
The largest unit of negation in NLP is a path; NLP cannot identify negations in text units larger than a path. Many, but not all, sentences comprise a single path.
Properties of Formal Negation
Formal negation can be defined by three properties:
NLP uses these properties to identify negated units of text. Negation markers are tagged at the entity (concept or relation) level by assigning a negation attribute. Negation span is tagged at the path level with negation-begin and negation-end tags.
Japanese supports the negation attribute at the entity level, but because of the fundamentally different definition of paths in Japanese, path expansion is not supported. Therefore, negation for Japanese does not necessarily expand to all affected entities at the path level.
Using Negation Attributes
Negation analysis information can be retrieved with attribute query methods such as %iKnow.Queries.SourceAPI.GetAttributes() and %iKnow.Queries.SentenceAPI.GetAttributes(), shown in the examples in this section.
You can specify the negation attribute ID using the $$$IKATTNEGATION macro, defined in the %IKPublic #Include file.
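As a minimal sketch of using this macro, the following fragment tests a hand-built $LISTBUILD row standing in for one row of GetAttributes() output; it assumes, as in the query examples later in this section, that the first element of each returned row is the attribute type ID:

```
#Include %IKPublic
AttTypeCheck
  // Hypothetical result row standing in for one row of GetAttributes() output;
  // element 1 is the attribute type ID, element 2 the attribute type name.
  SET row=$LISTBUILD(1,"Negation",2,14,3,2)
  IF $LISTGET(row,1)=$$$IKATTNEGATION {
     WRITE "row describes a negation attribute",! }
```

Comparing against the macro rather than a literal attribute ID keeps the code readable and insulated from changes to the ID values.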
Negation Attribute Structure
Negation is implemented in NLP as an attribute. That is, sources, sentences, or paths that contain negation have the negation attribute. This attribute is a %List structure with the following elements:
Negation Bit Map
Element 5 is the negation bit map. It indicates where the negation markers are in the negation scope. When the negation scope is 1, this is a single bit map. When the negation scope is greater than 1, this is a series of bit maps separated by spaces, one bit map for each entity within the negation scope.
Within the negation scope, if an entity contains a negation marker the negation marker and each word preceding it is indicated by either a 1 (negation marker word) or a 0 (word preceding the negation marker). If an entity within the negation scope does not contain a negation marker, the whole entity is represented by a single 0. Note that non-relevant words, such as “a” and “the”, are considered to be separate entities. Some examples of negation bit mapping are shown in the following table:
Negation Bit Map   Sentence Text, with / entity dividers (the words mapped as 1 are the negation markers)
01 0 1             Bartleby / is neither / busy / nor idle.
01 0 1             Bartleby / is neither / sixty-five years old / nor retired.
1 0 1              Bartleby / is / no idler / and certainly is / no loafer.
11 0 0 01          Bartleby / is not / my / favorite fictional character / but neither is / he / my / least favorite.
1 0 0 01           Bartleby / isn’t / my / favorite fictional character / but neither is / he / my / least favorite.
001 0 0011         Bartleby / is either not / trying very hard / or he is not / succeeding.
11 0 0 0011        Bartleby / is not / a / wholly realistic character / and yet is not / wholly unbelievable.
1 0001             Bartleby / never works, / but he is never / wholly idle.
1 001              Bartleby / does / nothing / and yet never is / he / idle.
The largest entity bit map is 8 bits. In rare cases a negation marker can be more than eight words from the beginning of its entity. If the negation marker is a two-word marker at positions 8 and 9, the second “1” is omitted (“00000001”); if the negation marker is at position 9 or greater, no bit map is returned. In the following examples the negation marker is in the second entity, a relation containing many words (due to the semantic ambiguity of the word “in”):
“They start when you get in and are not finished when you leave.” maps as “00000011”.
“They start when you get in and they are not finished when you leave.” maps as “00000001” (the second word of the negation marker is not mapped).
“They often start when you get in and they are not finished when you leave.” returns no bit map.
You can determine whether a negation bit map has been omitted by comparing the Element 4 scope-of-negation entity count with the number of blank-separated bit maps in Element 5. If these two counts do not match, one or more negation entity bit maps are missing.
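This omission check can be sketched in ObjectScript as follows; the entCount and bitmaps values are hypothetical stand-ins for Element 4 and Element 5 of a negation attribute:

```
OmissionCheck
  // Hypothetical Element 4 (scope entity count) and Element 5 (bit maps)
  SET entCount=4
  SET bitmaps="11 0 0 01"
  // $LENGTH with a delimiter argument counts the blank-separated pieces
  SET mapCount=$LENGTH(bitmaps," ")
  IF mapCount'=entCount { WRITE "one or more bit maps omitted",! }
  ELSE { WRITE "bit map count matches entity count",! }
```

Here the two counts match (four entities, four bit maps), so no bit map has been omitted.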
Negation and Dictionary Matching
NLP recognizes negated entities when matching against a dictionary. It calculates the number of entities that are part of a negation and stores this number as part of the match-level information (returned by methods such as GetMatchesBySource(), or as the NegatedEntityCount property of %iKnow.Objects.DictionaryMatch). This allows you to write code that interprets matching results with negation in mind, for example by comparing the number of negated entities to the total number of entities matched.
For further details, refer to the Smart Matching: Using a Dictionary chapter of this manual.
Negation Examples
Refer to A Note on Program Examples for details on the coding and data used in the examples in this book.
The following example uses %iKnow.Queries.SourceAPI.GetAttributes() to search each source in a domain for paths and sentences that have the negation attribute. It displays the PathId or SentenceId, the start position and the span of each negation. To limit %iKnow.Queries.SourceAPI.GetAttributes() to paths, specify $$$IKATTLVLPATH rather than $$$IKATTLVLANY:
#Include %IKPublic
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).NameIndexOpen(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 100 ID AS UniqueVal,Type,NarrativeCause FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeCause")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
GetSourcesAndAttributes
   SET numSrcD=##class(%iKnow.Queries.SourceQAPI).GetCountByDomain(domId)
   DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.srcs,domId,1,numSrcD)
   SET i=1
   WHILE $DATA(srcs(i)) {
      SET srcId = $LISTGET(srcs(i),1)
      SET i=i+1
      DO ##class(%iKnow.Queries.SourceAPI).GetAttributes(.att,domId,srcId,1,10,"",$$$IKATTLVLANY)
      SET j=1
      WHILE $DATA(att(j)) {
          IF $LISTGET(att(j),1)=1 {
            SET type=$LISTGET(att(j),2)
            SET level=$LISTGET(att(j),3)
            SET targId=$LISTGET(att(j),4)
            SET start=$LISTGET(att(j),5)
            SET span=$LISTGET(att(j),6)
               IF level=1 {WRITE "source ",srcId," ",type," path ",targId," start at ",start," span ",span,!}
               ELSEIF level=2 {WRITE "source ",srcId," ",type," sentence ",targId," start at ",start," span ",span,!!}
               ELSE {WRITE "unexpected attribute level",! }
         }
     SET j=j+1
     }
    }
The following example uses %iKnow.Queries.SentenceAPI.GetAttributes() to find the sentences in each source in a domain that have the negation attribute. For each such sentence it displays the sentence ID and the entity position that contains the negation marker, followed by the text of the sentence.
#Include %IKPublic
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).NameIndexOpen(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT TOP 100 ID AS UniqueVal,Type,NarrativeCause FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeCause")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
GetSourcesAndSentences
   SET numSrcD=##class(%iKnow.Queries.SourceQAPI).GetCountByDomain(domId)
   DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.srcs,domId,1,numSrcD)
   SET i=1
   WHILE $DATA(srcs(i)) {
      SET srcId = $LISTGET(srcs(i),1)
      SET i=i+1
      SET st = ##class(%iKnow.Queries.SentenceAPI).GetBySource(.sent,domId,srcId)
      SET j=1
      WHILE $DATA(sent(j)) {
         SET sentId=$LISTGET(sent(j),1)
         SET text=$LISTGET(sent(j),2)
         SET j=j+1
CheckSentencesForNegation
         SET atstat=##class(%iKnow.Queries.SentenceAPI).GetAttributes(.att,domId,sentId)
         SET k=1
            WHILE $DATA(att(k)) {
             WRITE "sentence ",sentId," has attribute=",$LISTGET(att(k),2)
             WRITE ", marker at entity position=",$LISTGET(att(k),3),!
             /* Format for display */
             WRITE sentId,": "
             SET x=1
              // round up: each output line holds 61 characters
              SET totlines=($LENGTH(text)+60)\61
               FOR L=1:1:totlines {
               WRITE $EXTRACT(text,x,x+60),!
               SET x=x+61 }
             WRITE "END OF SENTENCE ",sentId,!!
         SET k=k+1 }
    }
  }
Adding Negation Terms
You can specify negation for other specific words or phrases using the NLP UserDictionary. Using the AddNegationTerm() method, you can add a list of negation terms to a UserDictionary. When source texts are loaded into a domain, all appearances of these terms are flagged with the negation marker.
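A minimal sketch of this workflow follows; the dictionary name and the added terms are illustrative, and error handling is abbreviated:

```
UserDictSetup
  // Create a UserDictionary and add negation terms to it.
  SET udict=##class(%iKnow.UserDictionary).%New("myNegationUD")
  DO udict.AddNegationTerm("denies")
  DO udict.AddNegationTerm("rules out")
  SET stat=udict.%Save()
  IF stat'=1 { WRITE $System.Status.DisplayError(stat) QUIT }
  WRITE "negation terms added to myNegationUD",!
```

Once saved, the UserDictionary can be referenced from the configuration used when loading sources, so that appearances of the added terms are flagged with the negation marker during indexing.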
Negation Special Cases
The following are a few peculiarities of negation in English:
Because negation operates on sentence units, it is important to know what NLP does (and does not) consider a sentence. For details on how NLP identifies a sentence, refer to the Logical Text Units Identified by NLP section of the “Conceptual Overview” chapter.
Measurement
Documents commonly contain structured data elements that express a quantity. These can include counts, lengths, weights, currency amounts, medication dosages, and other quantified expressions. These data elements commonly consist of a numeric expression (specified as either numbers or words) and either an associated concept term (“23 widgets”) for a count, or some form of unit for a measurement. NLP annotates this combination of a number and a unit of measure as a measurement marker term at the word level.
In addition to annotating the number and unit, InterSystems NLP uses attribute expansion rules to identify the other concepts “involved” in the measurement. This annotated sequence of concepts captures what is being measured, rather than just the measurement itself.
These expanded attributes can be used to:
Note:
The Measurement attribute is currently only supported for English.
Generally, any concept associated with a number is flagged with the measurement attribute. The exceptions are described in “Time, Duration, and Frequency”.
Time, Duration, and Frequency
Documents may also contain structured data that expresses time, duration, or frequency. These are annotated as separate attributes, commonly consisting of an attribute term as part of a concept. These attributes are identified based on marker terms identified in the language. They may or may not include a specific number.
A number, whether specified with numerals or with words, is almost always treated as a measurement attribute. However, a time attribute can contain a numeric value, and a frequency attribute can contain an ordinal number.
The following are some of the guidelines in the English language model governing numbers:
Note:
The Time, Duration, and Frequency attributes are currently only fully supported for English. Partial support is provided for Dutch and Czech.
Sentiment
A sentiment attribute flags a sentence as having either a positive or negative sentiment. Sentiment terms are highly dependent on the kind of texts being analyzed. For example, in a customer perception survey context the following terms might be flagged with a sentiment attribute:
Because sentiment terms are often specific to the nature of the source texts, NLP does not automatically identify sentiment terms; by default, no words have a sentiment attribute. You can specify a positive or negative sentiment attribute for specific words using the NLP UserDictionary. Using the AddPositiveSentimentTerm() and AddNegativeSentimentTerm() methods, you can add a list of sentiment terms to a UserDictionary. When source texts are loaded into a domain, each appearance of these terms, and the part of the sentence affected by it, is flagged with the specified positive or negative sentiment marker.
For example, if “hated” is specified as having a negative sentiment attribute, and “amazing” is specified as having a positive sentiment attribute, when NLP applies them to the sentence:
I hated the rain outside, but the running shoes were amazing.
Negative sentiment would affect “rain” and positive sentiment would affect “running shoes”.
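Using the UserDictionary methods described above, the sentiment terms from this example could be registered as follows (the dictionary name is illustrative, and error handling is abbreviated):

```
SentimentSetup
  // Add the sentiment terms from the example sentence to a UserDictionary.
  SET udict=##class(%iKnow.UserDictionary).%New("mySentimentUD")
  DO udict.AddNegativeSentimentTerm("hated")
  DO udict.AddPositiveSentimentTerm("amazing")
  SET stat=udict.%Save()
  IF stat'=1 { WRITE $System.Status.DisplayError(stat) QUIT }
  WRITE "sentiment terms added to mySentimentUD",!
```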
Sentiment attributes are not currently supported for Czech or Japanese.
Using Sentiment Attributes
Sentiment Analysis information can be used with the following methods:
You can specify a sentiment attribute ID using either the $$$IKATTSENPOSITIVE or $$$IKATTSENNEGATIVE macro, defined in the %IKPublic #Include file.


Copyright © 1997-2019 InterSystems Corporation, Cambridge, MA
Content Date/Time: 2019-08-23 05:35:26