class %iKnow.Source.Converter.Html extends %iKnow.Source.Converter

This is a sample implementation of %iKnow.Source.Converter, designed to strip HTML tags from text input. Data is first buffered into a process-private global (PPG) and then stripped of HTML in the Convert call.

Converter parameters:

  1. Unescape As %Boolean: set to 1 to unescape HTML special characters, such as converting "&amp;" to "&" (default = 1)
  2. SkipTags As %String: comma-separated list of tags whose content (text nested between the start and end tag) is to be left out (default = "script,style")
  3. BreakLines As %Boolean: whether or not to insert double line breaks for non-inline tags (such as p, br, td, ...), in order for the iKnow engine to split sentences at those positions (default = 1)
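
To make the effect of the three parameters concrete, here is a minimal Python sketch of the behavior described above. It is an illustration only, not the converter's ObjectScript implementation: the function name `strip_html`, the regular expressions, and the `BLOCK_TAGS` set are all assumptions (the list above names only p, br and td as examples of non-inline tags).

```python
import html
import re

# Hypothetical set of non-inline tags; only p, br and td are named in the text above.
BLOCK_TAGS = {"p", "br", "td", "tr", "div", "li"}


def strip_html(text, unescape=True, skip_tags="script,style", break_lines=True):
    """Illustrative stand-in for the Unescape / SkipTags / BreakLines parameters."""
    # SkipTags: drop each listed tag together with everything nested inside it.
    for tag in filter(None, (t.strip() for t in skip_tags.split(","))):
        pattern = rf"<{re.escape(tag)}\b[^>]*>.*?</{re.escape(tag)}\s*>"
        text = re.sub(pattern, "", text, flags=re.I | re.S)

    # BreakLines: non-inline tags become a double line break, other tags vanish.
    def replace_tag(match):
        name = match.group(1).lower()
        return "\n\n" if break_lines and name in BLOCK_TAGS else ""

    text = re.sub(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>", replace_tag, text)

    # Unescape: turn entities such as &amp; back into plain characters.
    return html.unescape(text) if unescape else text


sample = "<p>Fish &amp; chips</p><script>var x = 1;</script><b>done</b>"
print(repr(strip_html(sample)))
print(repr(strip_html(sample, unescape=False, break_lines=False)))
```

With the defaults, the script content disappears, the paragraph tags turn into double line breaks, and the entity is unescaped. Setting `skip_tags=""` in this sketch would keep the script text and remove only the tags themselves.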


• property BreakLines as %Boolean [ InitialExpression = 1 ];
• property SkipTags as %String(MAXLEN="") [ InitialExpression = ",script,style," ];
• property Unescape as %Boolean [ InitialExpression = 1 ];


• method BufferString(data As %String) as %Status
Buffers data in the process-private global (PPG).
• method Convert() as %Status

Loop through the buffered data and strip HTML tags. Reset the pointer in the root PPG node at the end, so that NextConvertedPart knows where to start.

• method NextConvertedPart() as %String
Loop through the PPG again and return processed strings.
• method SetParams(params As %String) as %Status

Utility method called by the %iKnow.Source.Processor and %iKnow.Source.Loader logic to register any new or changed parameter values.

• classmethod StripHTML(ByRef pText As %String, pUnescape As %Boolean = 1, pSkipTags As %String = "script,style", pBreakLines As %Boolean = 1, Output pSC As %Status) as %String
Utility method to strip HTML tags from the supplied string. See the class documentation for more details on the available parameters.
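
The buffer, convert, read-back cycle described for BufferString, Convert and NextConvertedPart can be pictured with the following Python sketch. It is a simplified model under stated assumptions, not the actual class: a list stands in for the PPG, an integer stands in for the pointer kept in the root node, the class and method names are hypothetical, and returning an empty string once all parts are consumed is an assumption rather than documented behavior.

```python
import re


class HtmlConverterSketch:
    """Hypothetical model of the buffer / convert / read-back cycle."""

    def __init__(self):
        self._buffer = []   # stands in for the process-private global
        self._pointer = 0   # stands in for the pointer in the root PPG node

    def buffer_string(self, data):
        # Collect raw input; nothing is stripped at this stage.
        self._buffer.append(data)

    def convert(self):
        # Strip tags from every buffered chunk, then rewind the pointer
        # so that reading starts from the first chunk.
        self._buffer = [re.sub(r"<[^>]+>", "", chunk) for chunk in self._buffer]
        self._pointer = 0

    def next_converted_part(self):
        # Assumption: an empty string signals that no parts remain.
        if self._pointer >= len(self._buffer):
            return ""
        part = self._buffer[self._pointer]
        self._pointer += 1
        return part


converter = HtmlConverterSketch()
converter.buffer_string("<b>one</b>")
converter.buffer_string("two<br>")
converter.convert()
print(converter.next_converted_part())
print(converter.next_converted_part())
```

This sketch strips each chunk independently, so a tag split across two buffered chunks would not be recognized; it only illustrates the order of the calls.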