Considerations When Using the XML Tools
When you work with XML tools of any kind, there are at least three general points to consider:
Character Encoding of Input and Output
When you export an XML document, you can specify the character encoding to use; otherwise, InterSystems IRIS® chooses the encoding, depending on the destination:
- If the output destination is a file or a binary stream, the default is "UTF-8".
- If the output destination is a string or a character stream, the default is "UTF-16".
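As a generic illustration of these defaults (this is not the InterSystems IRIS API), the following Python sketch picks an encoding from the destination kind unless the caller supplies one. The names `default_encoding` and `export_xml` and the destination labels are hypothetical, and the sketch assumes Python 3.8 or later for the `xml_declaration` argument.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def default_encoding(destination: str) -> str:
    """Return the default encoding for a destination kind (hypothetical helper)."""
    if destination in ("file", "binary_stream"):
        return "UTF-8"
    if destination in ("string", "character_stream"):
        return "UTF-16"
    raise ValueError(f"unknown destination kind: {destination!r}")


def export_xml(root: ET.Element, destination: str,
               encoding: Optional[str] = None) -> bytes:
    """Serialize root, using the explicit encoding if given, else the default."""
    chosen = encoding or default_encoding(destination)
    # xml_declaration=True forces an XML declaration that names the encoding.
    return ET.tostring(root, encoding=chosen, xml_declaration=True)
```

Writing the declaration explicitly keeps the output self-describing, so a later reader does not have to fall back on assumptions about the encoding.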
For any XML document read by InterSystems IRIS, the XML declaration of the document should indicate the character encoding of the document, and the document should be encoded as declared. For example:
<?xml version="1.0" encoding="UTF-16"?>
However, if the character encoding is not declared in the document, InterSystems IRIS assumes the following:
- If the document is a file or a binary stream, InterSystems IRIS assumes that the character encoding is "UTF-8".
- If the document is a string or a character stream, InterSystems IRIS assumes that the character encoding is "UTF-16".
For background information on character translation in InterSystems IRIS, see Localization Support.
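The input-side rules above (use the declared encoding, otherwise fall back according to the kind of source) can be sketched in Python as follows. This is only an illustration of the decision logic, not the InterSystems IRIS implementation. The function names and the source-kind labels are hypothetical, and the sketch assumes that the start of the document is already available as text.

```python
import re
from typing import Optional

# Matches an encoding pseudo-attribute inside a leading XML declaration.
_ENCODING_DECL = re.compile(
    r"""<\?xml\s[^>]*?\bencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1"""
)


def declared_encoding(prolog: str) -> Optional[str]:
    """Return the encoding named in the XML declaration, or None if absent."""
    match = _ENCODING_DECL.match(prolog)
    return match.group(2) if match else None


def effective_encoding(prolog: str, source_kind: str) -> str:
    """Prefer the declared encoding; otherwise fall back by source kind."""
    declared = declared_encoding(prolog)
    if declared:
        return declared
    if source_kind in ("file", "binary_stream"):
        return "UTF-8"
    if source_kind in ("string", "character_stream"):
        return "UTF-16"
    raise ValueError(f"unknown source kind: {source_kind!r}")
```

For example, `effective_encoding('<?xml version="1.0"?><a/>', "file")` falls through to the file default because the declaration names no encoding.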
Parser Behavior
The InterSystems IRIS SAX Parser is used whenever InterSystems IRIS reads an XML document, so it is useful to know its default behavior. Among other tasks, the parser does the following:
- It verifies whether the XML document is well-formed.
- It attempts to validate the document, using the given schema or DTD.
Here it is useful to remember that a schema can contain <import> and <include> elements that refer to other schemas. For example:
<xsd:import namespace="target-namespace-of-the-imported-schema"
            schemaLocation="uri-of-the-schema"/>
<xsd:include schemaLocation="uri-of-the-schema"/>
The validation fails unless these other schemas are available to the parser. Especially with WSDL documents, it is sometimes necessary to download all the schemas and edit the primary schema to use the corrected locations.
- It attempts to resolve all entities, including all external entities. (Other XML parsers do this as well.) This process can be time-consuming, depending on where the entities are located. In particular, Xerces uses a network accessor to resolve some URLs, and the implementation uses blocking I/O. Consequently, there is no timeout, and a network fetch can hang if an error occurs, although this has been rare in practice.
Also, Xerces does not support HTTPS; that is, it cannot resolve entities that are at https locations.
If needed, you can create custom entity resolvers or disable entity resolution.
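To make the idea of disabling entity resolution concrete, here is a minimal sketch using the SAX parser in the Python standard library rather than the InterSystems IRIS SAX Parser. It checks well-formedness while explicitly turning off external entity fetching, so no network access is attempted. The function name `is_well_formed` is hypothetical, and the behavior shown is that of Python's expat-based reader, which may differ from Xerces.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    feature_external_ges,
    feature_external_pes,
)


def is_well_formed(xml_bytes: bytes) -> bool:
    """Return True if xml_bytes parses as well-formed XML.

    External general and parameter entities are not fetched, so the check
    cannot block on a slow or unreachable URL.
    """
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, False)  # skip external general entities
    parser.setFeature(feature_external_pes, False)  # skip external parameter entities
    parser.setContentHandler(ContentHandler())      # no-op handler; only errors matter
    try:
        parser.parse(io.BytesIO(xml_bytes))
    except xml.sax.SAXParseException:
        return False
    return True
```

Note that this sketch checks well-formedness only; it performs no schema or DTD validation.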