Considerations When Using the XML Tools

When you work with XML tools of any kind, there are at least three general points to consider:

Character Encoding of Input and Output

When you export an XML document, you can specify the character encoding to use; otherwise, InterSystems IRIS® data platform chooses the encoding, depending on the destination:

If the output destination is a file or a binary stream, the default is "UTF-8".
If the output destination is a string or a character stream, the default is "UTF-16".
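As a rough illustration of these defaults, the following Python sketch models the rule. It is not an InterSystems IRIS API: the destination labels and the `default_output_encoding` and `serialize` helpers are hypothetical. The sketch also keeps the XML declaration consistent with the bytes actually produced, which is what any reader of the exported document relies on.

```python
from typing import Optional

# Hypothetical labels for the four destination kinds named in the text;
# these are not InterSystems IRIS class names.
BYTE_DESTINATIONS = {"file", "binary stream"}
CHARACTER_DESTINATIONS = {"string", "character stream"}


def default_output_encoding(destination: str) -> str:
    """Return the default encoding described above for a destination kind."""
    if destination in BYTE_DESTINATIONS:
        return "UTF-8"
    if destination in CHARACTER_DESTINATIONS:
        return "UTF-16"
    raise ValueError(f"unknown destination kind: {destination!r}")


def serialize(body: str, destination: str, encoding: Optional[str] = None) -> bytes:
    """Prefix an XML declaration and encode the text in the encoding it declares."""
    enc = encoding or default_output_encoding(destination)
    declaration = f'<?xml version="1.0" encoding="{enc}"?>\n'
    return (declaration + body).encode(enc)
```

For example, `serialize("<Root/>", "file")` yields UTF-8 bytes that declare `UTF-8`, while passing `encoding="ISO-8859-1"` overrides the default. Python's UTF-16 codec prepends a byte order mark to its output.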

For any XML document read by InterSystems IRIS, the XML declaration of the document should indicate the character encoding of that document, and the document should be encoded as declared. For example:

<?xml version="1.0" encoding="UTF-16"?>

However, if the character encoding is not declared in the document, InterSystems IRIS assumes the following:

If the document is a file or a binary stream, InterSystems IRIS assumes that the character set is "UTF-8".
If the document is a string or a character stream, InterSystems IRIS assumes the character set is "UTF-16".
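The following Python sketch models this fallback. It uses the encoding named in the XML declaration when one is present. Otherwise it falls back on UTF-8 for byte input (files, binary streams) or UTF-16 for character input (strings, character streams). The function name is hypothetical and does not reproduce InterSystems IRIS internals. The sketch also only detects declarations in ASCII-compatible byte input.

```python
import re
from typing import Union

# Matches the encoding pseudo-attribute of an XML declaration, which must
# appear at the very start of the document.
_DECLARED_ENCODING = re.compile(
    r'<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def assumed_input_encoding(doc: Union[bytes, str]) -> str:
    if isinstance(doc, bytes):
        # File or binary stream: read the start as ASCII (sufficient for
        # ASCII-compatible encodings such as UTF-8); the default is UTF-8.
        head = doc[:256].decode("ascii", errors="replace")
        default = "UTF-8"
    else:
        # String or character stream: the default is UTF-16.
        head = doc[:256]
        default = "UTF-16"
    match = _DECLARED_ENCODING.match(head)
    return match.group(1) if match else default
```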

For information on character sets and translation tables, see Translation Tables.

Choosing a Document Format

When you work with an XML document, you must know the format to use when mapping the document to InterSystems IRIS classes. Similarly, when you create an XML document, you specify the document format to use when writing the document. The XML document formats are as follows:

Literal means that the document is a literal copy of the object instance. In most cases, you use literal format, even when working with SOAP.
Except where otherwise noted, the examples in the documentation use literal format.
Encoded means encoded as described in the SOAP 1.1 standard or the SOAP 1.2 standard. For links to these standards, see XML Standards.
The details are slightly different for SOAP 1.1 and SOAP 1.2.

The following subsections show the differences between these document formats.

Literal Format

The following sample shows an XML document in literal format:

<?xml version="1.0" encoding="UTF-8"?>
<Root>
   <Person GroupID="W897">
      <Name>Klingman,Julie G.</Name>
      <DOB>1946-07-21</DOB>
      <Address>
         <City>Bensonhurst</City>
         <Zip>60302</Zip>
      </Address>
      <Doctors>
         <DoctorClass>
            <Name>Jung,Kirsten K.</Name>
         </DoctorClass>
         <DoctorClass>
            <Name>Xiang,Charles R.</Name>
         </DoctorClass>
         <DoctorClass>
            <Name>Frith,Terry R.</Name>
         </DoctorClass>
      </Doctors>
   </Person>
</Root>
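Literal format maps directly onto the object structure. To illustrate this, the following sketch reads a trimmed copy of the sample with Python's standard `xml.etree.ElementTree` module rather than an InterSystems IRIS class. The `read_person` helper is hypothetical. Each nested object is simply a nested element, so plain paths such as `Address/City` reach the data.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the literal-format sample (declaration omitted).
LITERAL = """<Root>
  <Person>
    <Name>Klingman,Julie G.</Name>
    <DOB>1946-07-21</DOB>
    <Address><City>Bensonhurst</City><Zip>60302</Zip></Address>
    <Doctors>
      <DoctorClass><Name>Jung,Kirsten K.</Name></DoctorClass>
      <DoctorClass><Name>Xiang,Charles R.</Name></DoctorClass>
      <DoctorClass><Name>Frith,Terry R.</Name></DoctorClass>
    </Doctors>
  </Person>
</Root>"""


def read_person(xml_text):
    """Read one Person by following the nesting of the elements."""
    person = ET.fromstring(xml_text).find("Person")
    return {
        "name": person.findtext("Name"),
        "dob": person.findtext("DOB"),
        "city": person.findtext("Address/City"),
        "doctors": [d.findtext("Name") for d in person.findall("Doctors/DoctorClass")],
    }
```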

Encoded Format

In contrast, the following excerpt shows a comparable record in encoded format; each ... marks omitted content:

<?xml version="1.0" encoding="UTF-8"?>
<Root xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" 
xmlns:s="http://www.w3.org/2001/XMLSchema" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
   <DoctorClass id="id2" xsi:type="DoctorClass">
      <Name>Jung,Kirsten K.</Name>
   </DoctorClass>
...
   <DoctorClass id="id3" xsi:type="DoctorClass">
      <Name>Quixote,Umberto D.</Name>
   </DoctorClass>
...
   <DoctorClass id="id8" xsi:type="DoctorClass">
      <Name>Chadwick,Mark L.</Name>
   </DoctorClass>
...
   <Person>
      <Name>Klingman,Julie G.</Name>
      <DOB>1946-07-21</DOB>
      <GroupID>W897</GroupID>
      <Address href="#id17" />
      <Doctors SOAP-ENC:arrayType="DoctorClass[3]">
         <DoctorClass href="#id8" />
         <DoctorClass href="#id2" />
         <DoctorClass href="#id3" />
      </Doctors>
   </Person>
   <AddressClass id="id17" xsi:type="s_AddressClass">
      <City>Bensonhurst</City>
      <Zip>60302</Zip>
   </AddressClass>
...
</Root>

Note the following differences in the encoded version:

The root element of the output includes declarations for the SOAP encoding namespace and other standard namespaces.
This document includes person, address, and doctor elements all at the same level. The address and doctor elements carry unique id attributes, and the Person element refers to them through href attributes (for example, href="#id17"). Each object-valued property is treated this way.
The top-level address and doctor elements are named after their classes (AddressClass and DoctorClass) rather than after the properties that refer to them.
Encoded format does not project properties as attributes; the attributes it does use, such as id, href, and xsi:type, belong to the encoding itself. For example, the GroupID property is mapped as an attribute in the Person class, so literal format projects it as an attribute. In the encoded version, however, the property is projected as an element.
Collections are treated differently. For example, the Doctors element has the attribute SOAP-ENC:arrayType.
The elements that represent referenced objects, such as DoctorClass and AddressClass, have a value for the xsi:type attribute.
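In encoded format, a reader must follow each href reference to the element that carries the matching id. The following Python sketch does this with the standard `xml.etree.ElementTree` module on a trimmed copy of the encoded sample. The `resolve` helper is hypothetical. It also ignores xsi:type, which a real SOAP decoder would use to choose the target class.

```python
import xml.etree.ElementTree as ET

SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"

# Trimmed copy of the encoded-format sample.
ENCODED = """<Root xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <DoctorClass id="id2" xsi:type="DoctorClass"><Name>Jung,Kirsten K.</Name></DoctorClass>
  <DoctorClass id="id3" xsi:type="DoctorClass"><Name>Quixote,Umberto D.</Name></DoctorClass>
  <DoctorClass id="id8" xsi:type="DoctorClass"><Name>Chadwick,Mark L.</Name></DoctorClass>
  <Person>
    <Name>Klingman,Julie G.</Name>
    <Address href="#id17"/>
    <Doctors SOAP-ENC:arrayType="DoctorClass[3]">
      <DoctorClass href="#id8"/><DoctorClass href="#id2"/><DoctorClass href="#id3"/>
    </Doctors>
  </Person>
  <AddressClass id="id17" xsi:type="s_AddressClass"><City>Bensonhurst</City><Zip>60302</Zip></AddressClass>
</Root>"""


def resolve(root):
    # Index the top-level elements that carry an id attribute.
    by_id = {el.get("id"): el for el in root if el.get("id")}

    def deref(el):
        # Replace an href="#idN" placeholder with the element it points to.
        href = el.get("href")
        return by_id[href[1:]] if href and href.startswith("#") else el

    person = root.find("Person")
    doctors = person.find("Doctors")
    return {
        "city": deref(person.find("Address")).findtext("City"),
        # Namespaced attributes are keyed as "{namespace-uri}localname".
        "array_type": doctors.get(f"{{{SOAP_ENC}}}arrayType"),
        "doctors": [deref(d).findtext("Name") for d in doctors],
    }
```

The doctors come back in the order of the href references inside Doctors, not in the order in which the referenced elements appear in the document.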

Note:

For SOAP 1.2, the encoded version is slightly different. To easily distinguish the versions, check the declaration for the SOAP encoding namespace:

For SOAP 1.1, the SOAP encoding namespace is "http://schemas.xmlsoap.org/soap/encoding/"
For SOAP 1.2, the SOAP encoding namespace is "http://www.w3.org/2003/05/soap-encoding"

Parser Behavior

The InterSystems IRIS SAX Parser is used whenever InterSystems IRIS reads an XML document, so it is useful to know its default behavior. Among other tasks, the parser does the following:

It verifies whether the XML document is well-formed.
It attempts to validate the document, using the given schema or DTD.
Here it is useful to remember that a schema can contain <import> and <include> elements that refer to other schemas. For example:
```
<xsd:import namespace="target-namespace-of-the-imported-schema"
            schemaLocation="uri-of-the-schema"/>

<xsd:include schemaLocation="uri-of-the-schema"/>
```
Validation fails unless these other schemas are available to the parser. With WSDL documents in particular, it is sometimes necessary to download all the schemas and edit the primary schema to use the corrected locations.
It attempts to resolve all entities, including all external entities, as other XML parsers do. This process can be time-consuming, depending on where the entities are located. In particular, Xerces (the parser on which the InterSystems IRIS SAX Parser is based) uses a network accessor to resolve some URLs, and that implementation uses blocking I/O. Consequently, there is no timeout, and network fetches can hang in error conditions, although such conditions have been rare in practice.
Also, Xerces does not support HTTPS; that is, it cannot resolve entities located at https URLs.
If needed, you can create custom entity resolvers and you can disable entity resolution.
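The InterSystems IRIS mechanisms for these options are not shown here. Instead, the following sketch illustrates both ideas in general terms with Python's standard `xml.sax` package, which is based on expat, not Xerces. Turning off the `feature_external_ges` feature disables resolution of external general entities. An `EntityResolver` subclass can serve known system IDs from local copies instead of fetching them over the network. The system ID `http://example.invalid/ext.xml` and the `LocalResolver`, `TextCollector`, and `parse_text` names are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical document whose content is an external general entity.
DOC = (
    '<!DOCTYPE doc [<!ENTITY ext SYSTEM "http://example.invalid/ext.xml">]>'
    "<doc>&ext;</doc>"
)


class LocalResolver(EntityResolver):
    """Serves known system IDs from memory instead of the network."""

    def __init__(self, local_copies):
        super().__init__()
        self.local_copies = local_copies

    def resolveEntity(self, publicId, systemId):
        data = self.local_copies.get(systemId)
        if data is None:
            # Refuse unknown IDs rather than fetching them; the parse fails.
            raise ValueError(f"no local copy for {systemId}")
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(data))
        return source


class TextCollector(ContentHandler):
    """Collects all character data reported by the parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_text(doc, resolve_external, local_copies=None):
    parser = xml.sax.make_parser()
    # False: external general entities are skipped, never fetched.
    parser.setFeature(feature_external_ges, resolve_external)
    if local_copies is not None:
        parser.setEntityResolver(LocalResolver(local_copies))
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.feed(doc)
    parser.close()
    return "".join(handler.chunks)
```

With `resolve_external=False`, the entity reference is skipped and no text is reported. With `resolve_external=True` and a local copy supplied for the system ID, the text comes from that copy, and the parser never contacts the network.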
