Overview, Introduction, & Tools

Overview in more depth

What is XML?

XML is a meta-language: you can use it to define your own "document" markup tags. Three types of files are typically processed by an XML-oriented application:

  1. XML document(s) themselves

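    These contain the data, which takes the form of elements, attributes, and character content.

    XML documents follow rules for syntax and, optionally, for structure.

    XML documents may also contain comments, CDATA sections, and processing instructions.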

  2. Stylesheet document(s)

    These determine how the XML should be formatted and presented.

  3. Structure document(s)

    These contain either a Document Type Definition (DTD) or a Schema, and specify rules for how XML elements, attributes, and data are structured and how they relate to one another logically. A document that conforms to these rules is said to be valid.

Elements

XML elements consist of a start tag, content, and an end tag. As with XHTML, element names follow these rules:

  1. An XML element name must start with a letter or an underscore. Letters, digits, hyphens, periods, or further underscores can follow, and there is no limit on the length of a name.
  2. Colons are permitted only for specifying namespaces (we shall talk about them later).
  3. Whitespace and other symbols are not legal in an element name.
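The naming rules above can be checked mechanically. The following is a minimal Python sketch (the choice of Python and the helper name `is_valid_element_name` are illustrative assumptions, not part of any XML tool). It covers only ASCII names; the full XML specification also permits many non-ASCII letters, which this pattern ignores. Colons are excluded because they are reserved for namespaces.

```python
import re

# ASCII-only approximation of the element-naming rules. Assumption: the full
# XML specification also allows many non-ASCII letters, which this ignores.
# Colons are excluded because they are reserved for namespaces.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_valid_element_name(name: str) -> bool:
    """Hypothetical helper: True if `name` follows the simplified rules."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["partName", "_part.number-2", "2ndPart", "part name", "part&name"]:
    print(f"{candidate!r}: {is_valid_element_name(candidate)}")
```

The first two names pass; "2ndPart" fails because it starts with a digit, and the last two fail because they contain a space and an ampersand.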

Non-empty elements must have both start and end tags. Elements such as <p> contain textual data, and thus must be coded like these examples:

<p>textual data</p>

and

<sales>sales data</sales>

Empty Elements

XML also supports empty elements, which are typically used to add non-textual content to a document. An empty element written as a single tag must end with a forward slash, like this:

<logo src="./ourLogo.gif" />
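To see what "empty" means to a parser, this short Python sketch (Python's standard `xml.etree.ElementTree` is an assumed choice of parser) compares the self-closing tag with a start tag immediately followed by its end tag. Both forms produce the same element: no children, no text, and the same attribute.

```python
import xml.etree.ElementTree as ET

# The self-closing form and an explicit start/end pair with nothing between
# them describe the same empty element.
self_closing = ET.fromstring('<logo src="./ourLogo.gif" />')
explicit_pair = ET.fromstring('<logo src="./ourLogo.gif"></logo>')

for elem in (self_closing, explicit_pair):
    # An empty element has no children and no text, but can still carry attributes.
    print(elem.tag, elem.attrib, len(elem), elem.text)
```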

Nested Elements

Tags in XML must be nested correctly: an element that starts inside another element must also end inside it, like this:

<partDescriptor>
    <partName>foo</partName>
    <partNumber>12345</partNumber>
</partDescriptor>
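Because every element closes inside its parent, a correctly nested document maps directly onto a tree. This Python sketch (using the standard `xml.etree.ElementTree` module, an assumed choice) parses the partDescriptor example and lists the children of the root element.

```python
import xml.etree.ElementTree as ET

part_xml = """
<partDescriptor>
    <partName>foo</partName>
    <partNumber>12345</partNumber>
</partDescriptor>
"""

root = ET.fromstring(part_xml.strip())

# Each properly nested element becomes a child of the element that encloses it.
children = [(child.tag, child.text) for child in root]
print(root.tag)
for tag, text in children:
    print("  ", tag, "=", text)
```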


Well-Formed XML

If your XML document meets these basic syntax rules, it is said to be well-formed. Well-formedness also requires a single root element that contains all other elements.
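A practical way to test well-formedness is simply to try parsing the text. The sketch below assumes Python's standard `xml.etree.ElementTree`, which raises `ParseError` on any well-formedness violation; `is_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if `text` parses as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "properly nested": "<sales><region>West</region></sales>",
    "missing end tag": "<sales>sales data",
    "overlapping tags": "<b><i>text</b></i>",
    "two root elements": "<p>one</p><p>two</p>",
}
for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample is well-formed; the others break the end-tag, nesting, and single-root rules respectively.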

XML Declaration

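XML documents usually begin with an XML declaration, which specifies the XML version being used. In XML 1.0 the declaration is optional but recommended (XML 1.1 requires it), and when present it must be the very first thing in the document, with nothing, not even whitespace, before it. The initial portion of a document, the portion preceding the first element, is known as the prolog.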

Prolog

The prolog contains only declarations, whitespace, comments and processing instructions. An example might look like this:

<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="journey.css"?>
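The prolog example above can be inspected with a DOM parser. This Python sketch assumes the standard `xml.dom.minidom` module and appends a hypothetical root element, `<anthology/>`, so the document is complete enough to parse. It collects the processing instructions and comments that appear before the first element; minidom records the XML declaration itself as document properties rather than as a node.

```python
from xml.dom import minidom

# The prolog from the example, followed by a hypothetical root element.
document = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="journey.css"?>
<anthology/>"""

dom = minidom.parseString(document)

prolog_items = []
for node in dom.childNodes:
    if node.nodeType == node.ELEMENT_NODE:
        break  # the prolog ends at the first element
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
        prolog_items.append(("processing instruction", node.target, node.data))
    elif node.nodeType == node.COMMENT_NODE:
        prolog_items.append(("comment", node.data))

print(prolog_items)
print("root element:", dom.documentElement.tagName)
```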


XML Syntax

Case Sensitivity

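XML is case sensitive: <Sales> and <sales> are different elements, so an end tag must match its start tag exactly, including case. Keywords in declarations, such as version in the XML declaration, must likewise be written in the correct case.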
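Case sensitivity is easy to observe with a parser. In this Python sketch (standard `xml.etree.ElementTree`, an assumed choice), tag names keep their exact case, and an end tag that differs from its start tag only in case is rejected as a mismatch.

```python
import xml.etree.ElementTree as ET

matching = "<Sales>data</Sales>"
mismatched = "<Sales>data</sales>"  # the end tag differs only in case

root = ET.fromstring(matching)
print(root.tag == "Sales", root.tag == "sales")  # tag names keep their exact case

try:
    ET.fromstring(mismatched)
    mismatched_ok = True
except ET.ParseError as err:
    mismatched_ok = False
    print("rejected:", err)
```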

Reserved Names

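Additionally, names beginning with the letters "xml" in any combination of upper and lower case (such as "XML" or "Xml") are reserved for XML standards. You should not create an element or attribute whose name begins with these letters.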
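Many parsers accept such names without complaint, because the reservation leaves room for future standards rather than defining a syntax error. An application that wants to enforce the rule can add its own check. Here is a minimal Python sketch; `is_reserved_name` is a hypothetical helper, not part of any XML library.

```python
def is_reserved_name(name: str) -> bool:
    """Hypothetical helper: True if `name` starts with "xml" in any letter case."""
    return name[:3].lower() == "xml"


for name in ["xmlData", "XMLRecord", "xMlThing", "example", "myxml"]:
    print(name, is_reserved_name(name))
```

Only the prefix matters: "myxml" contains the letters but does not begin with them, so it is not reserved.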


Attributes

Elements May Have Attributes

Attributes of an element describe additional information about that element. Attributes consist of a name and a value and must appear in the start tag or in the empty-element tag.

The attribute name, paired with its quoted value, is known as an attribute specification. This is the familiar name/value pair pattern you see in many information technology contexts, such as Java's Hashtable and Perl's associative arrays (hashes). An example might look like this:

<anthology proverb="journey">

In this example the element name is "anthology". This instance of that element has one attribute, with the name "proverb" and the value "journey".

An attribute value may be surrounded by single or double quotes, but it must always be quoted. An attribute name must follow the same character restrictions as do element names: it must start with a letter or an underscore, and can be followed by numbers, hyphens, periods, or additional underscores.
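The name/value pattern shows up directly in parser APIs. This Python sketch assumes the standard `xml.etree.ElementTree` module and writes the anthology element as an empty element so it parses on its own. It reads the attribute as a dictionary entry, confirms that single and double quotes are equivalent, and shows that an unquoted value is rejected.

```python
import xml.etree.ElementTree as ET

# The anthology element, written as an empty element so it parses on its own.
double_quoted = ET.fromstring('<anthology proverb="journey"/>')
single_quoted = ET.fromstring("<anthology proverb='journey'/>")

# ElementTree exposes attribute specifications as a dict of name -> value,
# the same name/value pattern as a Java Hashtable or a Perl hash.
print(double_quoted.attrib)                          # {'proverb': 'journey'}
print(double_quoted.get("proverb"))                  # journey
print(double_quoted.attrib == single_quoted.attrib)  # True: quote style does not matter

# An unquoted attribute value is not well-formed, so the parser rejects it.
try:
    ET.fromstring("<anthology proverb=journey/>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False
print(unquoted_accepted)  # False
```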

Use Element or Attribute?

There will be times when pairing an element with an attribute specification is less effective than creating sub-elements (nesting elements). We will examine the element / attribute problem in some detail later.


Comments

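Comments may appear in the prolog after the XML declaration, between elements, or inside element content. They cannot come before the XML declaration, cannot appear inside a tag (between the < and > of a start or end tag), cannot be nested inside other comments, and must not contain the string "--".

A Legal Comment

<?xml version="1.0"?>
<!-- Comment. -->

A Legal Comment Inside Element Content

<?xml version="1.0"?>
<name>North Air<!--flight schedule goes here --></name>

An Illegal Comment

<!-- Comment. -->
<?xml version="1.0"?>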
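A quick check with a conforming parser shows where comments are accepted. This Python sketch uses the standard `xml.etree.ElementTree` module (an assumed choice); `parses` is a hypothetical helper and `<schedule/>` a hypothetical root element. A comment after the declaration is accepted, while one placed before the declaration, or one containing "--", is rejected.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Hypothetical helper: True if ElementTree accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


after_declaration = '<?xml version="1.0"?>\n<!-- Comment. -->\n<schedule/>'
before_declaration = '<!-- Comment. -->\n<?xml version="1.0"?>\n<schedule/>'
double_hyphen = "<schedule><!-- times -- subject to change --></schedule>"

print(parses(after_declaration))   # True
print(parses(before_declaration))  # False: the declaration must come first
print(parses(double_hyphen))       # False: "--" is not allowed inside a comment
```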


Content (Character Data)

All XML documents consist of characters. There are two types of character data in XML documents: PCDATA (parsed character data) and CDATA (character data).

Content in Elements

Element content is PCDATA, and attribute values are of type CDATA. PCDATA allows the use of predeclared entity names and decimal character codes. XML itself predeclares only five entity names (&lt;, &gt;, &amp;, &quot;, and &apos;); any other named entity, such as the ones HTML defines, must be declared in a DTD before it can be used, whereas a decimal code works for any character. To use a predeclared name, begin with the & delimiter; to use a decimal code, begin with &#. End both with the ; (semicolon) delimiter.

As an example, the "greater than" character (>) can be written either as &gt; or as &#62;.
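When PCDATA is parsed, both forms of reference are replaced by the character they stand for. This Python sketch (standard `xml.etree.ElementTree` and `xml.sax.saxutils.escape`, an assumed choice of tools) shows &gt; and &#62; producing the same ">", and then goes the other way, escaping text so it can be placed safely in element content.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Both the predeclared entity &gt; and the decimal code &#62; become ">"
# when the element content (PCDATA) is parsed.
elem = ET.fromstring("<compare>5 &gt; 3 and 7 &#62; 2</compare>")
print(elem.text)  # 5 > 3 and 7 > 2

# Going the other way: escape() replaces &, < and > with entity references.
print(escape("AT&T < 5 > 3"))  # AT&amp;T &lt; 5 &gt; 3
```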

CDATA Sections in Elements

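Text between <![CDATA[ and ]]> is not parsed as markup, so characters such as < and & can appear in it literally; the only sequence a CDATA section cannot contain is ]]>. A CDATA section takes the following form:

<![CDATA[<my not-to-be-parsed cdata goes here>]]>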
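CDATA sections are handy for embedding code or other text full of markup characters. In this Python sketch (standard `xml.etree.ElementTree`, an assumed choice; `<code>` is a hypothetical element name), a CDATA section and an escaped PCDATA version of the same fragment yield identical text once parsed.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section, < and & are plain characters, so this code
# fragment needs no escaping.
cdata_version = ET.fromstring("<code><![CDATA[if (a < b && c > d) { swap(); }]]></code>")

# The same text written as ordinary PCDATA has to use entity references.
escaped_version = ET.fromstring("<code>if (a &lt; b &amp;&amp; c &gt; d) { swap(); }</code>")

print(cdata_version.text)                          # if (a < b && c > d) { swap(); }
print(cdata_version.text == escaped_version.text)  # True
```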


Processing Instructions

Processing instructions let you pass information to applications other than the XML processor itself. Processing instructions take the following form:

<?NameOfTargetApplication     ApplicationInstructions ?>

A common use of a processing instruction is to associate a style sheet with the document. Note that the values of type and href must both be quoted:

<?xml-stylesheet type="text/css" href="example.css"?>
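The XML parser hands a processing instruction's contents to the application as one unparsed string; interpreting it is the target application's job. This Python sketch uses the standard `xml.dom.minidom` module with a hypothetical root element, `<page/>`. It reads the target and data, then pulls out the quoted name/value pairs with a hypothetical regex-based helper, `pseudo_attributes`. An unquoted value such as type=text/css is simply not recognized by the helper.

```python
import re
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml-stylesheet type="text/css" href="example.css"?><page/>'
)
pi = doc.childNodes[0]  # the processing instruction precedes the root element
print(pi.target)  # xml-stylesheet
print(pi.data)    # everything after the target, as one unparsed string

# Hypothetical helper: read quoted name="value" (or name='value') pairs.
PSEUDO_ATTR = re.compile(r"""(\w+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def pseudo_attributes(data: str) -> dict:
    return {m.group(1): m.group(2) if m.group(2) is not None else m.group(3)
            for m in PSEUDO_ATTR.finditer(data)}


print(pseudo_attributes(pi.data))
print(pseudo_attributes('type=text/css href="example.css"'))  # unquoted type is skipped
```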




Last modified: 2 Sep 2007 12:28:14 PM