XML Primer

From MTConnect® User's Portal

XML

XML consists of a hierarchy of elements. An element can contain sub-elements, CDATA, or both. For this specification, however, an element never contains mixed content, that is, both sub-elements and CDATA. Attributes are additional information associated with an element. The textual representation of an element is referred to as a tag. See the following example:

  <Foo name="bob">Ack!</Foo>

An XML element consists of a named opening and closing tag. In the above example, <Foo...> is referred to as the opening tag and </Foo> is referred to as the closing tag. The text Ack! between the opening and closing tags is called the CDATA. CDATA can be restricted to certain formats, patterns, or words. When this document refers to an element as having CDATA, it means that the element has no sub-elements and contains only data.
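The pieces named above (tag, attribute, CDATA) can be inspected with any XML parser. The following is a minimal sketch that assumes Python and its standard xml.etree.ElementTree module; the variable names are illustrative only and are not part of the specification.

```python
import xml.etree.ElementTree as ET

# Parse the single-element example from the text above.
element = ET.fromstring('<Foo name="bob">Ack!</Foo>')

tag = element.tag                    # the element name taken from the tag
attributes = element.attrib          # the attributes, as a dictionary
cdata = element.text                 # the text between the opening and closing tags
has_sub_elements = len(element) > 0  # len() counts direct sub-elements

print(tag, attributes, cdata, has_sub_elements)
```

Because the element holds only data, has_sub_elements is False, which is the situation the text describes as an element having CDATA.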

An XML Document has two parts. The first part is typically referred to as the XML declaration and is only a single line. It looks something like this:

  <?xml version="1.0" encoding="UTF-8"?>

This line indicates the XML version being used and the character encoding. Though it is possible to leave this line off, it is usually considered good form to include it at the beginning of the document.
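As an illustration of how a producer might include the declaration, the sketch below serializes a small element with Python's standard xml.etree.ElementTree module. It assumes Python 3.8 or later, where ET.tostring accepts the xml_declaration argument; the element content is taken from the earlier example and is not MTConnect data.

```python
import xml.etree.ElementTree as ET

root = ET.Element("Foo", {"name": "bob"})
root.text = "Ack!"

# xml_declaration=True asks the serializer to emit the declaration line.
# With a byte encoding such as UTF-8 the result is a bytes object.
document = ET.tostring(root, encoding="UTF-8", xml_declaration=True)

first_line = document.decode("utf-8").splitlines()[0]
print(first_line)

# A parser accepts the document with the declaration in place.
reparsed = ET.fromstring(document)
print(reparsed.text)
```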

Every XML Document contains one and only one root element. In the case of MTConnect, it is the MTConnectDevices, MTConnectStreams, MTConnectAssets, or MTConnectError element. When these root elements are used in the examples, you will sometimes notice that they are prefixed with mt, as in mt:MTConnectDevices.

The mt is what is referred to as a namespace alias, and it refers to the URN urn:mtconnect.org:MTConnectDevices:1.2 in the case of an MTConnectDevices document. The URN is the important part and MUST be consistent between the schema and the XML document. The namespace alias is declared as an attribute of the XML element, as in:

  1. <MTConnectDevices 
  2.   xmlns:m="urn:mtconnect.org:MTConnectDevices:1.2" 
  3.   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  4.   xmlns="urn:mtconnect.org:MTConnectDevices:1.2"
  5.   xsi:schemaLocation="urn:mtconnect.org:MTConnectDevices:1.2
  6.   http://www.mtconnect.org/schemas/MTConnectDevices_1.2.xsd">

In the above example, the alias m refers to the MTConnectDevices URN. This document also contains a default namespace on line 4, which is specified with an xmlns attribute without an alias. An additional namespace, http://www.w3.org/2001/XMLSchema-instance, is commonly declared and usually assigned the alias xsi. This namespace provides the XML Schema instance attributes defined by the W3C. An example of this is the xsi:schemaLocation attribute, which tells the XML parser where the schema can be found.

In XML, a namespace indicates which XML Schema is in effect for a given section of the document. This convention allows multiple XML Schemas to be used within the same XML Document, even if they define the same element names. The namespace is optional and is needed only when multiple schemas are used.
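The role of the alias versus the URN can be seen by parsing a namespaced document. The following sketch assumes Python's standard xml.etree.ElementTree module and a trimmed-down, illustrative document (the Devices and Device elements are included only to have something to look up). It shows that the parser identifies elements by the URN, so the alias chosen by the reader (mt here) need not match the alias used in the document (m, or the default namespace).

```python
import xml.etree.ElementTree as ET

DEVICES_URN = "urn:mtconnect.org:MTConnectDevices:1.2"

# Illustrative document: the default namespace and the alias m are both
# bound to the same URN, as in the example above.
document = f"""
<MTConnectDevices xmlns:m="{DEVICES_URN}" xmlns="{DEVICES_URN}">
  <Devices>
    <m:Device id="d" name="Device"/>
  </Devices>
</MTConnectDevices>
"""

root = ET.fromstring(document)

# ElementTree expands every element name to {URN}local-name, so the alias
# itself is gone after parsing and only the URN remains.
root_tag = root.tag

# The alias used for the lookup is local to this call; it only has to map
# to the same URN as the one used in the document.
device = root.find("mt:Devices/mt:Device", {"mt": DEVICES_URN})
device_name = device.get("name")

print(root_tag)
print(device.tag, device_name)
```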

An attribute is additional data that can be included in each element. For example, in the following MTConnect® DataItem, there are several attributes describing the DataItem:

  <DataItem name="Xpos" type="POSITION" subType="ACTUAL" category="SAMPLE" />

The name, type, subType, and category are attributes of the element. Each attribute can only occur once within an element declaration, and it can either be required or optional.
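A short sketch of reading these attributes, assuming Python's standard xml.etree.ElementTree module. The lookup of units stands in for any attribute that is absent from this particular element, and the second snippet illustrates that a parser rejects an element in which the same attribute appears twice.

```python
import xml.etree.ElementTree as ET

data_item = ET.fromstring(
    '<DataItem name="Xpos" type="POSITION" subType="ACTUAL" category="SAMPLE" />'
)

attribute_names = sorted(data_item.attrib)  # all attribute names on the element
sub_type = data_item.get("subType")         # value of a present attribute
missing = data_item.get("units")            # absent attribute: get() returns None

# An attribute may occur only once per element; a repeated attribute
# makes the document not well-formed and the parser raises ParseError.
try:
    ET.fromstring('<DataItem name="Xpos" name="Ypos" />')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False

print(attribute_names, sub_type, missing, duplicate_accepted)
```

Whether an attribute is required or optional is decided by the schema, not by the parser; this sketch only shows how presence and absence look to the reading application.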

An element can have any number of sub-elements. The XML Schema specifies which sub-elements can occur and how many times each can occur. Here’s an example:

  <TopLevel>
    <FirstLevel>
      <SecondLevel>
        <ThirdLevel name="first"></ThirdLevel>
        <ThirdLevel name="second"></ThirdLevel>
      </SecondLevel>
    </FirstLevel>
  </TopLevel>

In the above example, FirstLevel has a sub-element, SecondLevel, which in turn has two ThirdLevel sub-elements with different names. Each level is an element and its children are its sub-elements, and so forth.
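The hierarchy can be walked recursively. The sketch below assumes Python's standard xml.etree.ElementTree module; outline is a hypothetical helper written for this illustration, not part of any MTConnect tooling.

```python
import xml.etree.ElementTree as ET

document = """
<TopLevel>
  <FirstLevel>
    <SecondLevel>
      <ThirdLevel name="first"></ThirdLevel>
      <ThirdLevel name="second"></ThirdLevel>
    </SecondLevel>
  </FirstLevel>
</TopLevel>
"""


def outline(element, depth=0):
    """Return (depth, tag, name) rows for an element and all of its descendants."""
    rows = [(depth, element.tag, element.get("name"))]
    for child in element:  # iterating an element yields its direct sub-elements
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(document)
rows = outline(root)
for depth, tag, name in rows:
    print("  " * depth + tag + (f' name="{name}"' if name else ""))

# iter() visits matching descendants in document order.
third_level_names = [e.get("name") for e in root.iter("ThirdLevel")]
```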

In XML we sometimes use elements to organize parts of the document. A few examples in MTConnect® are Streams, DataItems, and Components. These elements have no attributes or data of their own; they only provide structure to the document and allow for various parts to be addressed easily.

In the following example, DataItems and Components are used only to contain certain types of elements and provide structure to the document. These elements will be referred to as Containers in the standard.

  <Device id="d" name="Device">
    <DataItems>
      <DataItem/>
    </DataItems>
    <Components>
      <Axes></Axes>
    </Components>
  </Device>
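One way to picture a container is as an element with no attributes and no data of its own, only sub-elements. The sketch below applies that reading to the example using Python's standard xml.etree.ElementTree module; is_container is a hypothetical helper for this illustration, and the standard itself defines which elements are containers.

```python
import xml.etree.ElementTree as ET

document = """
<Device id="d" name="Device">
  <DataItems>
    <DataItem/>
  </DataItems>
  <Components>
    <Axes></Axes>
  </Components>
</Device>
"""


def is_container(element):
    """Hypothetical check: no attributes, no non-whitespace text, at least one child."""
    has_text = bool((element.text or "").strip())
    return not element.attrib and not has_text and len(element) > 0


device = ET.fromstring(document)
containers = [child.tag for child in device if is_container(child)]
device_is_container = is_container(device)  # Device carries id and name attributes

print(containers, device_is_container)
```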

An XML Document can be validated. The most basic check is to make sure it is well-formed, meaning that each element has a closing tag, as in <foo>...</foo>, and that the reserved characters < and & do not appear outside of markup unless they are escaped. If the closing </foo> were left off or a stray < appeared in the text, the document would not be well-formed and may be rejected by the receiver. The document can also be validated against a schema to ensure it is valid. This second level of analysis checks that required elements and attributes are present and occur only the correct number of times. A valid document must also be well-formed.
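The first level of checking can be sketched with Python's standard xml.etree.ElementTree module, which raises ParseError for input that is not well-formed. The helper name is_well_formed is an assumption made for this illustration. The standard library parser does not validate against an XML Schema, so the second level of checking requires a schema-aware tool and is not shown here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


results = {
    "complete element": is_well_formed("<foo>data</foo>"),
    "missing closing tag": is_well_formed("<foo>data"),
    "stray < in text": is_well_formed("<foo>a < b</foo>"),
    "escaped < in text": is_well_formed("<foo>a &lt; b</foo>"),
}

for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```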

All MTConnect® documents must be valid and conform to the XML Schema provided along with this specification. The schema will be versioned along with this specification. The greatest possible care will be taken to make sure that the schema is backward compatible.

For more information, visit the W3C website for the XML Standards documentation: http://www.w3.org/XML/

Markup Conventions

MTConnect® follows industry conventions on tag format and notation when developing the XML Schema. The general guidelines are as follows:

  1. All tag names will be specified in Pascal case (first letter of each word is capitalized). For example: <ComponentEvents />
  2. Attribute names will be in camel case, which is similar to Pascal case except that the first letter is lower case. For example: <MyElement attributeName="bob"/>
  3. All values that are part of a limited or controlled vocabulary will be in upper case with an _ (underscore) separating words. For example: ON, OFF, ACTUAL, COUNTER_CLOCKWISE, etc…
  4. Dates and times will follow the W3C ISO 8601 format with arbitrary decimal fractions of a second allowed. Refer to the following specification for details: http://www.w3.org/TR/NOTE-datetime The format will be YYYY-MM-DDThh:mm:ss.ffff, for example 2007-09-13T13:01.213415. The accuracy and number of decimal fractional digits of the timestamp are determined by the capabilities of the device collecting the data. All times will be given in UTC (GMT).
  5. XML element names will be spelled-out and abbreviations will be avoided. The one exception is the word identifier that will be abbreviated Id. For example: SequenceNumber will be used instead of SeqNum.
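The naming and timestamp guidelines above can be approximated in code. The following sketch assumes Python; the regular expressions and the format_timestamp helper are illustrative approximations written for this page, not part of the MTConnect schema, and the sample timestamp values are made up for the example. The patterns cannot detect abbreviations such as SeqNum, which still have to be caught by review.

```python
import re
from datetime import datetime, timedelta, timezone

# Approximate patterns for the three naming conventions (assumptions).
PASCAL_CASE = re.compile(r"^(?:[A-Z][a-z0-9]*)+$")                 # tag names
CAMEL_CASE = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")    # attribute names
UPPER_SNAKE = re.compile(r"^[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*$")       # controlled vocabulary


def format_timestamp(moment):
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ss.ffffff in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")


tag_ok = bool(PASCAL_CASE.match("ComponentEvents"))
attribute_ok = bool(CAMEL_CASE.match("attributeName"))
value_ok = bool(UPPER_SNAKE.match("COUNTER_CLOCKWISE"))

# A local time four hours behind UTC is converted to UTC before formatting.
local = datetime(2007, 9, 13, 9, 1, 2, 213415, tzinfo=timezone(timedelta(hours=-4)))
stamp = format_timestamp(local)

print(tag_ok, attribute_ok, value_ok, stamp)
```

Python's %f directive always emits six fractional digits; a device with lower resolution could trim trailing digits, since the guideline allows an arbitrary number of them.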

MTConnect XML Overview