Overview, Introduction, & Tools
Overview in more depth
What is XML?
XML is a meta-language. You can use it to create and format your own "document"
markup tags. Three types of files are typically processed by an XML-oriented application:
- XML document(s) themselves
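These contain the data, which takes the form of elements, attributes, and content (character data).
XML documents follow rules for syntax, such as case sensitivity and reserved names.
XML documents may also contain processing instructions and comments.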
- Stylesheet document(s)
These determine how the XML should be formatted and presented
- Structure document(s)
These can include either a Document Type Definition (DTD) or a Schema,
and specify rules for how XML elements, attributes, and data are defined structurally
as well as how they are related logically in an XML-compliant document.
Elements
XML elements consist of a start tag, content, and an end tag. As with XHTML:
- An XML element name must start with a letter or an underscore. Letters, numbers, hyphens, periods,
or additional underscores can follow, and there is no limit on the length of an element name.
- Colons are permitted only for specifying namespaces (we shall talk about them later).
- Whitespace and symbols are not legal in an element name.
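The naming rules listed above can be sketched as a small check. This is a simplified, ASCII-only approximation: real XML also permits many non-ASCII letters, and the function name `is_simple_xml_name` is hypothetical, chosen only for illustration.

```python
import re

# Simplified, ASCII-only form of the naming rules in the list above:
# first character is a letter or underscore; later characters may also be
# digits, hyphens, or periods. Colons are left out because they are
# reserved for namespaces. (Assumption: non-ASCII letters are ignored.)
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_simple_xml_name(name: str) -> bool:
    """Return True if the whole string matches the simplified name pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_simple_xml_name("partName"))    # True
print(is_simple_xml_name("_part-no.2"))  # True
print(is_simple_xml_name("2ndPart"))     # False: starts with a digit
print(is_simple_xml_name("part name"))   # False: contains whitespace
```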
Non-empty elements must have both start and end tags. Elements such as <p>
contain textual data, and thus should be coded like these examples:
<p>textual data</p>
and
<sales>sales data</sales>
Empty Elements
XML also supports empty elements. These are typically used to add non-textual content to a
document. An empty-element tag must still end with a forward slash before the closing angle bracket, like this:
<logo src="./ourLogo.gif" />
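As a rough illustration, the empty element above can be parsed with Python's standard `xml.etree.ElementTree` module to see what a processor reports for it. The variable names are only for this sketch.

```python
import xml.etree.ElementTree as ET

# The empty element from the text, parsed as a one-element document.
logo = ET.fromstring('<logo src="./ourLogo.gif" />')

print(logo.tag)     # logo
print(logo.attrib)  # {'src': './ourLogo.gif'}
print(logo.text)    # None: an empty element has no character content
print(len(logo))    # 0: it has no child elements either

# A start tag immediately followed by an end tag describes the same empty element.
same = ET.fromstring('<logo src="./ourLogo.gif"></logo>')
print(same.text)    # None
```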
Nested Elements
Tags in XML must be nested correctly, like this:
<partDescriptor>
<partName>foo</partName>
<partNumber>12345</partNumber>
</partDescriptor>
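One way to see the nesting rule in action is to hand both a correctly nested and an overlapping version of this markup to a parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Each inner element closes before its parent does.
nested = (
    "<partDescriptor>"
    "<partName>foo</partName>"
    "<partNumber>12345</partNumber>"
    "</partDescriptor>"
)

# The parent closes while partName is still open, so the tags overlap.
overlapping = "<partDescriptor><partName>foo</partDescriptor></partName>"

print(is_well_formed(nested))       # True
print(is_well_formed(overlapping))  # False
```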
XML Syntax
Case Sensitivity
XML is case sensitive. All declarations and markup must match in case. This applies throughout XML
processing.
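A short sketch of what case sensitivity means in practice, using Python's standard `xml.etree.ElementTree`. The element names `Sales`, `sales`, and `report` are illustrative only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The end tag differs from the start tag only in case, so they do not match.
print(is_well_formed("<Sales>sales data</sales>"))  # False
print(is_well_formed("<sales>sales data</sales>"))  # True

# Names that differ only in case are treated as two different elements.
report = ET.fromstring("<report><Sales>1</Sales><sales>2</sales></report>")
tags = [child.tag for child in report]
print(tags)  # ['Sales', 'sales']
```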
Reserved Names
Additionally, names beginning with "xml" (in any combination of upper and lower case) are reserved
by the XML specification. You may not create an element or attribute name beginning with "xml" or "XML".
Attributes
Elements May Have Attributes
Attributes of an element describe additional information about that element. Attributes
consist of a name and a value and must appear in the start tag or in the empty-element tag.
The attribute name, paired with its quoted value, is known as an attribute specification.
This is the familiar name/value pair pattern you see in many information technology
contexts, such as Java (with the Hashtable) and Perl (with the associative array).
An example might look like this:
<anthology proverb="journey">
In this example the element name is "anthology". This instance of that
element has one attribute, with the name "proverb" and the
value "journey".
An attribute value may be surrounded by single or double quotes, but it must always be quoted.
An attribute name must follow the same character restrictions as do element names: it must start
with a letter or an underscore, and can be followed by numbers, hyphens, periods, or additional
underscores.
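The name/value pairing and the quoting rule can be sketched with Python's standard `xml.etree.ElementTree`, which exposes an element's attributes as a dictionary. The end tag is added here only so that the example start tag forms a complete document.

```python
import xml.etree.ElementTree as ET

# The start tag from the text, completed with an end tag.
anthology = ET.fromstring('<anthology proverb="journey"></anthology>')
print(anthology.attrib)          # {'proverb': 'journey'}
print(anthology.get("proverb"))  # journey

# Single quotes are accepted as well and yield the same value.
single = ET.fromstring("<anthology proverb='journey'></anthology>")
print(single.get("proverb"))     # journey

# An unquoted attribute value is rejected by the parser.
try:
    ET.fromstring("<anthology proverb=journey></anthology>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False
print(unquoted_accepted)         # False
```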
Use Element or Attribute?
There will be times when pairing an element with an attribute specification is less effective than
creating sub-elements (nesting elements). We will examine the element / attribute problem
in some detail later.
Content (Character Data)
All XML documents consist of characters. There are two types of character data in XML documents:
- PCDATA (parsed character data), which the XML processor parses for markup after loading
- CDATA (character data), which is not parsed after loading
Content in Elements
Element content consists of PCDATA, and attribute values are of type CDATA. PCDATA allows the use of
pre-declared character names and decimal codes, like those you find in
a table of character decimal codes and entities.
To use predeclared names or decimal codes, begin with the & delimiter
for character names or &# for numbers, and end with the ;
(semicolon) delimiter for both names and numbers.
As an example, the "greater than" character (>) can be presented using either
the notation &gt; or the notation &#62;
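As a small check, both the named and the decimal notation for the "greater than" character can be parsed with Python's standard `xml.etree.ElementTree`. Each is replaced by the same character once the content has been parsed. The element name `cmp` is hypothetical.

```python
import xml.etree.ElementTree as ET

# &gt; is the predeclared name and &#62; the decimal code for ">".
element = ET.fromstring("<cmp>3 &gt; 2 and 3 &#62; 2</cmp>")

print(element.text)  # 3 > 2 and 3 > 2
```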
CDATA Sections in Elements
A CDATA section will take the following form:
<![CDATA["<my (not-to-be-parsed) cdata goes here>"]]>
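The effect of a CDATA section can be sketched with Python's standard `xml.etree.ElementTree`: characters that would otherwise start markup or an entity are handed to the application as plain text. The element name `note` and its content are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Inside the CDATA section, "<" and "&" are not treated as markup.
with_cdata = ET.fromstring("<note><![CDATA[a < b && b > c]]></note>")
print(with_cdata.text)  # a < b && b > c

# Without CDATA, the same text has to be written with predeclared names.
escaped = ET.fromstring("<note>a &lt; b &amp;&amp; b &gt; c</note>")
print(escaped.text == with_cdata.text)  # True

# Writing the characters directly in ordinary content is not well-formed.
try:
    ET.fromstring("<note>a < b && b > c</note>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False
print(raw_accepted)  # False
```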
Processing Instructions
Using processing instructions, it is possible to incorporate information for operations not
performed by the XML processor. Processing instructions take the following form:
<?NameOfTargetApplication ApplicationInstructions ?>
When a processing instruction is included, it is often used like this to add a style sheet:
<?xml-stylesheet type="text/css" href="example.css"?>
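To illustrate how a processing instruction splits into a target and its instructions, the sketch below reads a stylesheet instruction with Python's standard `xml.dom.minidom`. The root element `catalog` is hypothetical, added only so that the document is complete.

```python
from xml.dom import minidom

# A stylesheet processing instruction followed by a minimal root element.
source = '<?xml-stylesheet type="text/css" href="example.css"?><catalog/>'
document = minidom.parseString(source)

# The processing instruction is the first child of the document node.
pi = document.firstChild
is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE

print(is_pi)      # True
print(pi.target)  # xml-stylesheet
print(pi.data)    # type="text/css" href="example.css"
```

The parser passes the instruction text through as a single string; interpreting it is left to the target application.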
Comments
Comments in XML may not appear inside a tag (between the angle brackets of a start or end tag), and cannot come before the XML declaration.
A Legal Comment
<!-- This is a comment -->
An Illegal Comment
<partName <!-- This comment sits inside a tag --> >foo</partName>
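The two placement rules for comments can be checked with a parser. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper `is_well_formed` and the sample element names are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A comment placed between elements is legal.
legal = "<sales><!-- quarterly figures --><total>100</total></sales>"

# A comment placed inside a start tag is not.
inside_tag = "<sales <!-- quarterly figures --> ><total>100</total></sales>"

# Nor is a comment placed before the XML declaration.
before_declaration = '<!-- too early --><?xml version="1.0"?><sales/>'

print(is_well_formed(legal))               # True
print(is_well_formed(inside_tag))          # False
print(is_well_formed(before_declaration))  # False
```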