"The basic XPath syntax is similar to filesystem addressing. If the path starts with the slash / character, then it represents an absolute path to the required element."
Any well-formed XML instance has one and only one root element, so finding the root is simple. We will use the All's Well That Ends Well file (marked up as XML) which we used last week, so make sure you have a copy saved to your local disk, and make sure to remove the DOCTYPE statement from your local copy. Open your local copy in oXygen so we can use XPath to explore it together.
Notice the XPath query bar in the screen snapshot on the right. This query bar is always available in oXygen, and we will use that convenience today.
We know the root of this XML data is the PLAY element (or "node") just by looking at the XML editor display. Looking down the page, we also see that the characters in this particular PLAY are named in the various PERSONA nodes. By investigating further we see that, when a character is supposed to say a line, the text is within a node called a LINE which is always within a SPEECH node. We also see that each SPEECH node starts with a SPEAKER node which is followed by one or more LINE nodes.
So, where does a SPEECH node belong? A SPEAKER and their LINEs live within a SPEECH which lives within a SCENE which lives within an ACT of a PLAY. How would we put that in an XPath statement? How about this:
/PLAY/ACT/SCENE/SPEECH/SPEAKER
Try this out in your own XML editor. Notice how, as you type each slash following a node type, a context-sensitive menu drops down to show your choices.
So now let us find all the places our SPEAKER named "BERTRAM" has some lines.
The name of a particular SPEAKER is the text contained within that SPEAKER node. XPath lets us find a node by comparison against the text contained by that node pretty easily. The way to get at the text contained by a node is simply text() (strictly speaking a node test rather than a function), and our example would look like this:
/PLAY/ACT/SCENE/SPEECH/SPEAKER[text()='BERTRAM']
Try this out in oXygen now please. Look to the bottom of the oXygen display for results.
But really, all we are looking for is the text of any LINE by any SPEAKER named BERTRAM, which might look something like this:
//SPEAKER[text()='BERTRAM']/following-sibling::LINE[text()]
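Notice the use of the following-sibling:: axis to reach all the LINEs which follow any SPEAKER node with "BERTRAM" as the text. The trailing [text()] predicate keeps only those LINE nodes which actually contain text; to get back the text nodes themselves rather than the LINE nodes, end the path with /text() instead.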
Take a look at your solution to the last homework. Can you see how you might have used an approach like this to deal with the problem?
Okay, so what about moving this work to the client-side and running things in a browser context? We have talked a number of times in this class about the issue of separation of concerns. We want to keep the model of the data, the views of the data which we create, and the controller code all separated from one another. If we want results to show up in a browser, we need to find ways to move the application of XSL against XML to the browser-side.
Mozilla-derived browsers (Firefox, Mozilla, Camino, and so on) now provide a JavaScript interface to XPath which you can use in your controller code (the JavaScript). The simplest way to use the Mozilla JavaScript interface to XPath is through the evaluate function of the document object. This returns an XPathResult object, much as Java returns a ResultSet when querying a DBMS.
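The evaluate function takes five arguments (from an article at Cambridge): xpathExpression (the XPath string to evaluate), contextNode (the node the expression is evaluated against), namespaceResolver (maps namespace prefixes to URIs, or null), resultType (the kind of XPathResult wanted), and result (an existing XPathResult to reuse, or null to create a new one).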
Note that XPath expressions can be run against both ordinary HTML and XML. For example, let us say we wanted to extract the Level Two Headings from an HTML document. The XPath expression to match these would be "//h2", and the JavaScript code would look like this:
var headings = document.evaluate("//h2", document, null, XPathResult.ANY_TYPE, null);
Note that, since HTML does not have namespaces, null is passed in as the namespaceResolver. Note also the use of XPathResult.ANY_TYPE as the resultType for this code snippet. If passed XPathResult.ANY_TYPE, the XPath engine picks whichever result type most naturally fits the expression (more do-what-I-mean JavaScript magic). In this case, you would then check the resultType property of the returned object to find out what it actually turned out to be.
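Checking the resultType property might look something like the sketch below. The helper name describeResultType is made up for illustration; the constants it compares against are the standard ones defined on XPathResult.

```javascript
// Hypothetical helper (not part of the XPath interface): turn the numeric
// resultType of an XPathResult into a readable label.
function describeResultType(result) {
  switch (result.resultType) {
    case XPathResult.NUMBER_TYPE:
      return "number";          // read it with result.numberValue
    case XPathResult.STRING_TYPE:
      return "string";          // read it with result.stringValue
    case XPathResult.BOOLEAN_TYPE:
      return "boolean";         // read it with result.booleanValue
    case XPathResult.UNORDERED_NODE_ITERATOR_TYPE:
    case XPathResult.ORDERED_NODE_ITERATOR_TYPE:
      return "node iterator";   // walk it with result.iterateNext()
    default:
      return "other (" + result.resultType + ")";
  }
}

// In a browser you could try:
// var r = document.evaluate("//h2", document, null, XPathResult.ANY_TYPE, null);
// alert(describeResultType(r));
```

For a node-set expression such as "//h2", ANY_TYPE comes back as an unordered node iterator, which is why iterateNext works on it.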
Here is an example of using the iterateNext method of the returned object to access the nodes it contains:
var headings = document.evaluate("//h2", document, null, XPathResult.ANY_TYPE, null);
var thisHeading = headings.iterateNext();
var alertText = "Level 2 headings in this document are:\n";
while (thisHeading) {
  alertText += thisHeading.textContent + "\n";
  thisHeading = headings.iterateNext();
}
alert(alertText);
Note that, after iterating through the elements, additional calls to iterateNext() will return null.
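If you find yourself writing that loop more than once, it can be wrapped up in a small function. This is only a sketch: collectText is a made-up name, and it assumes it is handed a node-iterator XPathResult that has not been iterated yet.

```javascript
// Hypothetical helper: drain a node-iterator XPathResult into an array
// holding the text of each node. The loop stops when iterateNext()
// returns null, which is what it does once the nodes run out.
function collectText(result) {
  var texts = [];
  var node = result.iterateNext();
  while (node) {
    texts.push(node.textContent);
    node = result.iterateNext();
  }
  return texts;
}

// In a browser you could try:
// var headings = document.evaluate("//h2", document, null, XPathResult.ANY_TYPE, null);
// alert(collectText(headings).join("\n"));
```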
But what about the situation where the XPath expression returns not a node set but a simpler type, like a number, a string, or a boolean? We still get an XPathResult object from our call to document.evaluate, but we must read it through the matching numberValue, stringValue, or booleanValue property of the XPathResult to retrieve our results. See now why we might want to check the resultType property of the returned object to find out what it actually turned out to be? Here is an example of counting the number of paragraphs in an HTML document:
var paragraphCount = document.evaluate("count(//p)", document, null, XPathResult.ANY_TYPE, null).numberValue;
alert("This document contains " + paragraphCount + " paragraphs");
Note that the XPath interface will not automatically convert the numerical result if the stringValue property is requested in the above example. JavaScript may do automatic conversions, but the XPath interface will not. As a result, the following code would not work (instead, it throws an exception with the code NS_DOM_TYPE_ERROR):
var paragraphCount = document.evaluate("count(//p)", document, null, XPathResult.ANY_TYPE, null).stringValue;
alert("This document contains " + paragraphCount + " paragraphs");
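If a string really is what you want, one option is to ask for it up front: when XPathResult.STRING_TYPE is passed as the resultType, the XPath engine does the conversion itself and stringValue becomes the right property to read. A minimal sketch follows; the function name countParagraphsAsString is made up for illustration, and doc stands for whichever document is being queried.

```javascript
// Hypothetical helper: count the <p> elements and get the answer back
// as a string by requesting STRING_TYPE instead of ANY_TYPE.
function countParagraphsAsString(doc) {
  var result = doc.evaluate("count(//p)", doc, null,
                            XPathResult.STRING_TYPE, null);
  // stringValue is valid here because the result type is STRING_TYPE.
  return result.stringValue;
}

// In a browser you could try:
// alert("This document contains " + countParagraphsAsString(document) + " paragraphs");
```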
Remember the homework from last week? It was to create a customized printout of a play, with the lines for a particular part formatted to make them easy to pick out. You were supposed to:
Create a file, author.xsl, which contains the XSL-T and the XSL-FO in one file and can be used to transform the play XML data into PDF format. There will be one constant declared, and it is used to select whichever character you are outputting on a given run.
You are going to use what we covered in class today (see above) to try to make a searchable online version of your output document.
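As one possible starting point (a sketch, not the expected solution), the function below combines the following-sibling:: query from earlier with the evaluate function. It assumes the play has already been loaded into an XML Document object, here called xmlDoc; that name and the function name linesForSpeaker are made up for illustration. It asks for a snapshot result, which is read with snapshotLength and snapshotItem rather than iterateNext.

```javascript
// Hypothetical helper: return the text of every LINE that follows a
// SPEAKER with the given name. `xmlDoc` is assumed to be the play,
// already parsed into an XML Document.
function linesForSpeaker(xmlDoc, speakerName) {
  // NOTE: speakerName is pasted into the expression as-is, so a name
  // containing a single quote would break the query.
  var expr = "//SPEAKER[text()='" + speakerName + "']/following-sibling::LINE";
  var result = xmlDoc.evaluate(expr, xmlDoc, null,
                               XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  var lines = [];
  for (var i = 0; i < result.snapshotLength; i++) {
    lines.push(result.snapshotItem(i).textContent);
  }
  return lines;
}

// Once xmlDoc is available in the browser you could try:
// alert(linesForSpeaker(xmlDoc, "BERTRAM").join("\n"));
```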
Last modified: 9 Mar 2009 11:02:46 AM