Semantic Proximity

Semantic proximity is a calculation of the semantic “distance” between two entities within a sentence, expressed as an integer. The higher the proximity score, the closer the two entities.

As a demonstration of this semantic distance, given the sentence:

“The giraffe walked with long legs to the base of the tree, then stretched his long neck up to reach the lowest leaves.”

the proximity of the concept “giraffe” might be as follows: long legs=64, base=42, tree=32, long neck=25, lowest leaves=21.
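The example scores follow a simple pattern: each value equals 128 divided by (d + 1), rounded down, where d is the number of entity positions separating the two entities (1 for adjacent entities). This rule is inferred only from the numbers on this page and is not a documented NLP formula. The Python sketch below applies it to a hand-extracted entity list and reproduces the single-sentence giraffe scores.

```python
# Minimal sketch. ASSUMPTION: the 128 // (d + 1) rule is inferred from the
# example scores on this page; it is not a documented NLP formula.


def distance_score(d: int) -> int:
    """Hypothetical score for two entities d positions apart (d >= 1)."""
    return 128 // (d + 1)


def sentence_profile(entities: list[str], concept: str) -> dict[str, int]:
    """Proximity of `concept` to each other entity in one sentence."""
    anchor = entities.index(concept)  # assumes the concept occurs once
    return {
        other: distance_score(abs(anchor - pos))
        for pos, other in enumerate(entities)
        if other != concept
    }


# Entities extracted by hand from the giraffe sentence, in sentence order.
giraffe_sentence = ["giraffe", "long legs", "base", "tree", "long neck", "lowest leaves"]
print(sentence_profile(giraffe_sentence, "giraffe"))
# {'long legs': 64, 'base': 42, 'tree': 32, 'long neck': 25, 'lowest leaves': 21}
```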

Semantic proximity is calculated for each entity in each sentence. These per-sentence proximity scores are then added together, producing an overall proximity score for each entity across the entire set of source texts. For example, given the sentences:

“The giraffe walked with long legs to the base of the tree, then stretched his long neck up to reach the lowest leaves. Having eaten, the giraffe bent his long legs and stretched his long neck down to drink from the pool.”

the proximity of the concept “giraffe” might be as follows: long legs=128, long neck=67, base=42, tree=32, pool=32, lowest leaves=21.
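The summing step can be sketched in the same way. This Python sketch again assumes the inferred 128 // (d + 1) rule (not documented NLP behavior) and uses hand-extracted entity lists for the two sentences. Adding the per-sentence profiles together reproduces the totals quoted above.

```python
from collections import Counter

# ASSUMPTION: inferred scoring rule, not a documented NLP formula.
def distance_score(d: int) -> int:
    return 128 // (d + 1)


def sentence_profile(entities: list[str], concept: str) -> Counter:
    """Per-sentence proximity of `concept` to every other entity."""
    scores: Counter = Counter()
    for i, first in enumerate(entities):
        if first != concept:
            continue
        for j, second in enumerate(entities):
            if second != concept:
                scores[second] += distance_score(abs(i - j))
    return scores


def corpus_profile(sentences: list[list[str]], concept: str) -> Counter:
    """Add the per-sentence scores together across all sentences."""
    total: Counter = Counter()
    for entities in sentences:
        total += sentence_profile(entities, concept)
    return total


# Hand-extracted entity sequences for the two giraffe sentences.
sentences = [
    ["giraffe", "long legs", "base", "tree", "long neck", "lowest leaves"],
    ["giraffe", "long legs", "long neck", "pool"],
]
for entity, score in corpus_profile(sentences, "giraffe").most_common():
    print(f"{entity}={score}")
```

This prints long legs=128, long neck=67, base=42, tree=32, pool=32, and lowest leaves=21, one per line. In this model, the second sentence contributes 64, 42, and 32 to long legs, long neck, and pool.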

Entity proximity is commutative: the proximity of entity1 to entity2 is the same as the proximity of entity2 to entity1. NLP does not calculate a semantic proximity of an entity to itself. For example, the sentence “The boy told a boy about another boy.” does not generate any proximity scores, but the sentence “The boy told a younger boy about another small boy.” generates the proximity scores younger boy=64 and small boy=42 for the concept “boy”. If the same entity appears multiple times in a sentence, its proximity scores are additive. For example, the proximity for the concept “girl” in the sentence “The girl told the boy about another boy.” is boy=106, the total of the two proximity scores 64 and 42.
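The three rules in the preceding paragraph (commutativity, no self-proximity, and additive scores for repeated entities) can be modeled with an unordered pair as the key. This Python sketch is only an illustration: the entity lists are extracted by hand, and the 128 // (d + 1) distance rule is inferred from this page's examples rather than taken from NLP documentation.

```python
from collections import defaultdict

# ASSUMPTION: inferred scoring rule, not a documented NLP formula.
def distance_score(d: int) -> int:
    return 128 // (d + 1)


def pair_scores(entities: list[str]) -> dict[frozenset, int]:
    """Proximity for every pair of distinct entities in one sentence."""
    scores: defaultdict = defaultdict(int)
    for i in range(len(entities)):
        for j in range(i + 1, len(entities)):
            if entities[i] == entities[j]:
                continue  # no proximity of an entity to itself
            # An unordered key makes proximity(a, b) == proximity(b, a);
            # repeated occurrences of the same pair add up.
            scores[frozenset((entities[i], entities[j]))] += distance_score(j - i)
    return dict(scores)


def proximity(scores: dict[frozenset, int], a: str, b: str) -> int:
    return scores.get(frozenset((a, b)), 0)


only_boys = pair_scores(["boy", "boy", "boy"])
varied = pair_scores(["boy", "younger boy", "small boy"])
girl = pair_scores(["girl", "boy", "boy"])

print(only_boys)  # {}
print(proximity(varied, "boy", "younger boy"), proximity(varied, "boy", "small boy"))  # 64 42
print(proximity(girl, "girl", "boy"), proximity(girl, "boy", "girl"))  # 106 106
```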

Japanese Semantic Proximity

NLP semantic analysis of Japanese uses an algorithm to create entity vectors. An entity vector is an ordering of the entities in a sentence that follows a predefined logical sequence. When NLP converts a Japanese sentence into an entity vector, it commonly rearranges the order of the entities. Semantic proximity for Japanese therefore uses the entity order of the entity vector, not the entity order of the original sentence.
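Because proximity depends on entity positions, the order used changes the scores. The Python sketch below uses placeholder entities A, B, and C and a hypothetical entity-vector reordering; it does not reproduce the actual vector algorithm. It scores the same sentence under both orders with the 128 // (d + 1) rule, which is inferred from this page's examples and is not documented.

```python
# ASSUMPTION: inferred scoring rule, not a documented NLP formula.
def distance_score(d: int) -> int:
    return 128 // (d + 1)


def sentence_profile(sequence: list[str], concept: str) -> dict[str, int]:
    """Score `concept` against the other entities by their positions in `sequence`."""
    anchor = sequence.index(concept)  # assumes the concept occurs once
    return {
        other: distance_score(abs(anchor - pos))
        for pos, other in enumerate(sequence)
        if other != concept
    }


# Placeholder entities; the vector order is hypothetical, not NLP output.
surface_order = ["A", "B", "C"]  # order in the original Japanese sentence
vector_order = ["B", "A", "C"]   # order after conversion to an entity vector

print(sentence_profile(surface_order, "A"))  # {'B': 64, 'C': 42}
print(sentence_profile(vector_order, "A"))   # {'B': 64, 'C': 64}
```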

Proximity Examples

Refer to A Note on Program Examples for details on the coding and data used in the examples in this book.

The following example uses the GetProfile() method to return the proximity of the concept “student pilot” to other concepts in sentences in all of the sources in the domain. GetProfile() supports filters and skiplists:

#include %IKPublic
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).NameIndexOpen(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT Top 25 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
ProximityForEntity
    SET entity="student pilot"
    DO ##class(%iKnow.Semantics.ProximityAPI).GetProfile(.eresult,domId,entity,1,20)
       SET k=1
       WHILE $DATA(eresult(k)) {
          SET item=$LISTTOSTRING(eresult(k))
          WRITE $PIECE(item,",",1)," ^ "
          WRITE $PIECE(item,",",2)," ^ "
          WRITE $PIECE(item,",",3),!
          SET k=k+1 }
    WRITE !,"all done"
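In the call GetProfile(.eresult,domId,entity,1,20), the last two arguments are the page number and page size, following the pattern of the other iKnow query methods, so the loop prints at most the first 20 rows. Assuming those rows come back ordered by descending proximity, the Python sketch below models that paging over the giraffe totals from earlier. profile_page is a hypothetical helper, not part of the %iKnow API, and the real result rows also carry an entity ID.

```python
def profile_page(profile: dict[str, int], page: int, page_size: int) -> list[tuple[str, int]]:
    """Return one page of (entity, score) rows, highest score first.

    Hypothetical helper: models paging over a ranked profile; it is not
    part of the %iKnow API.
    """
    # sorted() is stable, so entities with equal scores keep their order.
    ranked = sorted(profile.items(), key=lambda row: row[1], reverse=True)
    start = (page - 1) * page_size
    return ranked[start:start + page_size]


# Totals from the two-sentence giraffe example.
giraffe = {"long legs": 128, "long neck": 67, "base": 42,
           "tree": 32, "pool": 32, "lowest leaves": 21}
print(profile_page(giraffe, 1, 2))  # [('long legs', 128), ('long neck', 67)]
print(profile_page(giraffe, 3, 2))  # [('pool', 32), ('lowest leaves', 21)]
```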

The following example uses the GetProfileBySourceId() method to list, for each source, the concepts with the greatest proximity to a given entity. Each concept is listed by entity ID, value, and proximity score:

#include %IKPublic
DomainCreateOrOpen
  SET dname="mydomain"
  IF (##class(%iKnow.Domain).NameIndexExists(dname))
     { WRITE "The ",dname," domain already exists",!
       SET domoref=##class(%iKnow.Domain).NameIndexOpen(dname)
       GOTO DeleteOldData }
  ELSE 
     { WRITE "The ",dname," domain does not exist",!
       SET domoref=##class(%iKnow.Domain).%New(dname)
       DO domoref.%Save()
       WRITE "Created the ",dname," domain with domain ID ",domoref.Id,!
       GOTO ListerAndLoader }
DeleteOldData
  SET stat=domoref.DropData()
  IF stat { WRITE "Deleted the data from the ",dname," domain",!!
            GOTO ListerAndLoader }
  ELSE    { WRITE "DropData error ",$System.Status.DisplayError(stat)
            QUIT}
ListerAndLoader
  SET domId=domoref.Id
  SET flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
  SET myloader=##class(%iKnow.Source.Loader).%New(domId)
QueryBuild
   SET myquery="SELECT Top 25 ID AS UniqueVal,Type,NarrativeFull FROM Aviation.Event"
   SET idfld="UniqueVal"
   SET grpfld="Type"
   SET dataflds=$LB("NarrativeFull")
UseLister
  SET stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds)
      IF stat '= 1 {WRITE "The lister failed: ",$System.Status.DisplayError(stat) QUIT }
UseLoader
  SET stat=myloader.ProcessBatch()
      IF stat '= 1 {WRITE "The loader failed: ",$System.Status.DisplayError(stat) QUIT }
SourceCountQuery
  SET totsrc=##class(%iKnow.Queries.SourceAPI).GetCountByDomain(domId)
GetEntityID
  SET entId=##class(%iKnow.Queries.EntityAPI).GetId(domId,"student pilot")
QueryBySource
  DO ##class(%iKnow.Queries.SourceAPI).GetByDomain(.result,domId,1,totsrc)
  SET j=1,k=1
  WHILE $DATA(result(j),srclist) {
    SET src = $LISTGET(srclist)
    WRITE !,"Source id: ",src,!
    DO ##class(%iKnow.Semantics.ProximityAPI).GetProfileBySourceId(.srcresult,domId,entId,src,1,totsrc)
       WHILE $DATA(srcresult(k)) {
          SET item=$LISTTOSTRING(srcresult(k))
          WRITE $PIECE(item,",",1)," ^ "
          WRITE $PIECE(item,",",2)," ^ "
          WRITE $PIECE(item,",",3),!
          SET k=k+1 }
    SET k=1
    SET j=j+1 }
  WRITE !!,"Printed all ",j-1," sources"
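Because the overall score is the sum of the per-sentence scores, the per-source profiles that GetProfileBySourceId() returns should, ignoring paging, add up to the domain-wide profile that GetProfile() returns. The Python sketch below checks that arithmetic by treating each giraffe sentence as a hypothetical separate source. Source 1 uses the single-sentence values quoted earlier; source 2 uses the values obtained by subtracting those from the two-sentence totals.

```python
from collections import Counter

# Hypothetical per-source profiles for the concept "giraffe".
per_source = {
    1: Counter({"long legs": 64, "base": 42, "tree": 32,
                "long neck": 25, "lowest leaves": 21}),
    2: Counter({"long legs": 64, "long neck": 42, "pool": 32}),
}

# Summing the per-source profiles gives the collection-wide profile.
domain_profile = sum(per_source.values(), Counter())
for entity, score in domain_profile.most_common():
    print(f"{entity}={score}")
```

This prints the same totals as the two-sentence example: long legs=128, long neck=67, base=42, tree=32, pool=32, and lowest leaves=21.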