
The XPath Toolkit in Java 5

What is XPath Anyway?

The XML Path Language, or XPath, is a language that defines a syntax for locating items in an XML document. It was originally defined for use with XSL Transformations (XSLT), and most readers will have encountered it in that context. Java programmers recognized that XPath expressions could be very useful, and with the release of Java 1.5, XPath arrived in the standard toolkit in the javax.xml.xpath package.

Getting an Instance of XPath

Like many other APIs in JAXP, XPath gives you a working object through a factory. Although this seems cumbersome, the architecture provides flexibility and allows for future expansion. In the following example, the parameter handed to the newInstance method says that we want to build XPath objects that work with the default W3C DOM object model, the only one supported in Java 1.5.

XPathFactory factory = XPathFactory.newInstance( XPathFactory.DEFAULT_OBJECT_MODEL_URI );
XPath xpath = factory.newXPath();

Once you have an XPath object, there are two ways to put it to work. You can have it evaluate an expression directly, or you can have it compile the expression into an XPathExpression instance that incorporates the expression logic and can be used repeatedly.
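Both approaches can be seen side by side in the following minimal sketch. It is not from the article: the catalog/book document is a made-up stand-in, and the Document is built with DocumentBuilderFactory from an in-memory string, since the article does not show how its doc objects are created.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class TwoWays {
    // A tiny stand-in document; the element names are made up for illustration.
    static final String XML =
        "<catalog><book>Dune</book><book>Emma</book></catalog>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Way 1: hand the expression text to XPath.evaluate each time.
    static String direct(XPath xpath, Document doc) throws Exception {
        return xpath.evaluate("count(/catalog/book)", doc);
    }

    // Way 2: compile once into an XPathExpression, then evaluate that.
    static String compiled(XPath xpath, Document doc) throws Exception {
        XPathExpression expr = xpath.compile("count(/catalog/book)");
        return expr.evaluate(doc);
    }

    public static void main(String[] args) throws Exception {
        XPathFactory factory =
            XPathFactory.newInstance(XPathFactory.DEFAULT_OBJECT_MODEL_URI);
        XPath xpath = factory.newXPath();
        Document doc = parse(XML);
        System.out.println(direct(xpath, doc));   // 2
        System.out.println(compiled(xpath, doc)); // 2
    }
}
```

Both calls print 2, because the number produced by count() is converted to a string. The compiled form parses the expression text once, which matters when the same expression is applied many times.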

A Simple XPath Example

The first XML example I am going to use is the web.xml file for the example servlets in Tomcat 5.5.9. In the following statement, doc is a reference to the org.w3c.dom.Document parsed from the web.xml file.

System.out.println( xpath.evaluate("/web-app/filter", doc  ) ); 

Execution of that line produces the following output:
                             
         Servlet Mapped Filter              
         filters.ExampleFilter              
                                           
      attribute                            
      filters.ExampleFilter.SERVLET_MAPPED 

That is the text content extracted from the following section of the web.xml document. Note that the evaluation preserved all of the text content, including whitespace, of all of the nodes contained in the first "filter" element found.

    <filter>
        <filter-name>Servlet Mapped Filter</filter-name>
        <filter-class>filters.ExampleFilter</filter-class>
        <init-param>
            <param-name>attribute</param-name>
            <param-value>filters.ExampleFilter.SERVLET_MAPPED</param-value>
        </init-param>
    </filter>

It is important to note that only the first node satisfying the expression contributed to the output. The evaluate(String, Object) method returns a String, and under XPath 1.0 rules a node-set is converted to a string by taking the string-value of the node that comes first in document order; the string-value of an element is the concatenation of all the text nodes it contains. Contrast that simple XPath statement with the number of org.w3c.dom.Node method calls that would be required to extract that text from 6 separate elements, and you begin to see the attraction of working with XPath.
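The first-node behavior is easy to confirm on a small document. This sketch uses a hypothetical, trimmed-down web.xml held in a string; unlike the real Tomcat file, it declares no namespace. (If a document does use a default namespace and is parsed with a namespace-aware factory, unprefixed steps such as /web-app will not match its elements.) The helper names firstMatch and allMatches are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FirstNodeDemo {
    // Hypothetical, trimmed-down web.xml with two <filter> elements and no whitespace.
    static final String WEB_XML =
        "<web-app>"
      + "<filter><filter-name>Servlet Mapped Filter</filter-name>"
      + "<filter-class>filters.ExampleFilter</filter-class></filter>"
      + "<filter><filter-name>Path Mapped Filter</filter-name>"
      + "<filter-class>filters.ExampleFilter</filter-class></filter>"
      + "</web-app>";

    static Document parse(String xml) throws Exception {
        // Namespace awareness stays at the factory default (off).
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // String result: the string-value of the first matching node only.
    static String firstMatch(XPath xpath, Document doc, String expr)
            throws XPathExpressionException {
        return xpath.evaluate(expr, doc);
    }

    // NODESET result: the text content of every matching node.
    static List<String> allMatches(XPath xpath, Document doc, String expr)
            throws XPathExpressionException {
        NodeList nl = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> texts = new ArrayList<String>();
        for (int i = 0; i < nl.getLength(); i++) {
            texts.add(nl.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Document doc = parse(WEB_XML);
        // All text nodes of the first <filter>, concatenated
        System.out.println(firstMatch(xpath, doc, "/web-app/filter"));
        // Only the first name, even though two <filter-name> nodes match
        System.out.println(firstMatch(xpath, doc, "/web-app/filter/filter-name"));
        // Every name
        System.out.println(allMatches(xpath, doc, "/web-app/filter/filter-name"));
    }
}
```

Run as written, the first line prints Servlet Mapped Filterfilters.ExampleFilter (the inline XML has no whitespace to separate the two text nodes), the second prints only Servlet Mapped Filter, and the third lists both names.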

Evaluation for Different Content Types

There are four XPath methods named evaluate: two return a java.lang.String and two return a java.lang.Object reference. In each pair, one version takes a context item (such as a Node) and the other an org.xml.sax.InputSource. The Object versions take an extra argument, one of the QName constants defined in the XPathConstants class, which selects the type of the result; you then have to cast the returned reference to that type.

For example, we can get all five of the <filter> elements in the example web.xml document using the following statement.

NodeList nl = (NodeList)xpath.evaluate("/web-app/filter", doc, XPathConstants.NODESET );

The returned object implements the org.w3c.dom.NodeList interface. Note that although "NodeList" sounds like it should implement the java.util.List interface, it does not. The XPathConstants fields and the corresponding Java reference types that will be returned can be summarized as follows:

XPathConstants field                   Java return type
XPathConstants.BOOLEAN                 java.lang.Boolean
XPathConstants.NUMBER                  java.lang.Double
XPathConstants.STRING                  java.lang.String
XPathConstants.NODE                    org.w3c.dom.Node
XPathConstants.NODESET                 org.w3c.dom.NodeList
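Each row of the table corresponds to a cast, as in this sketch. The tiny document is made up, and the asList helper is a hypothetical convenience (not part of JAXP) that wraps a NodeList in a read-only java.util.List so it can be used with the enhanced for loop.

```java
import java.io.StringReader;
import java.util.AbstractList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ReturnTypes {
    static final String XML =
        "<web-app><filter><filter-name>A</filter-name></filter>"
      + "<filter><filter-name>B</filter-name></filter></web-app>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // NodeList is not a java.util.List; this read-only view adapts it.
    static List<Node> asList(final NodeList nl) {
        return new AbstractList<Node>() {
            @Override
            public Node get(int i) {
                if (i < 0 || i >= nl.getLength()) {
                    throw new IndexOutOfBoundsException("index " + i);
                }
                return nl.item(i);
            }

            @Override
            public int size() {
                return nl.getLength();
            }
        };
    }

    public static void main(String[] args) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Document doc = parse(XML);

        Double count = (Double) xpath.evaluate(
            "count(/web-app/filter)", doc, XPathConstants.NUMBER);
        Boolean hasB = (Boolean) xpath.evaluate(
            "/web-app/filter/filter-name = 'B'", doc, XPathConstants.BOOLEAN);
        String first = (String) xpath.evaluate(
            "/web-app/filter/filter-name", doc, XPathConstants.STRING);
        Node second = (Node) xpath.evaluate(
            "/web-app/filter[2]", doc, XPathConstants.NODE);
        NodeList all = (NodeList) xpath.evaluate(
            "/web-app/filter", doc, XPathConstants.NODESET);

        // Prints: 2.0 true A B
        System.out.println(count + " " + hasB + " " + first + " "
            + second.getTextContent());
        // The adapter makes the enhanced for loop possible
        for (Node n : asList(all)) {
            System.out.println(n.getNodeName());
        }
    }
}
```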

Now for a more complex example. The XML source will be the server.xml file that Tomcat uses to define the service to be created and the connectors it will expose. Here are the pertinent XML elements; the real file is much larger.

<Server port="8005" shutdown="SHUTDOWN">
  <Service name="Catalina">
    <Connector port="80" maxHttpHeaderSize="8192"
               maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
               enableLookups="false" redirectPort="8443" acceptCount="100"
               connectionTimeout="20000" disableUploadTimeout="true" />
    <!-- many details left out here -->
  </Service>
</Server>

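The following code, where doc is an org.w3c.dom.Document containing server.xml, locates the Service element whose "name" attribute equals "Catalina", selects the first Connector element inside it, and examines that element's "enableLookups" attribute. Be careful with XPathConstants.BOOLEAN: the result is converted using the XPath 1.0 boolean() rules, under which any non-empty node-set is true. A path that simply ends in attribute::enableLookups would therefore return Boolean.TRUE whenever the attribute exists, even though its value here is "false". Comparing the attribute with the string 'true' inside the expression makes the returned Boolean reflect the attribute's actual value:

Boolean flag = (Boolean)xpath.evaluate( 
  "/Server/Service[attribute::name='Catalina']/Connector[1]/attribute::enableLookups='true'", 
  doc, XPathConstants.BOOLEAN );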
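The difference between testing whether the attribute exists and testing its value can be checked with a small sketch. Under XPath 1.0 rules, converting a node-set to a boolean only asks whether it is empty. The server.xml content below is a cut-down inline copy of the fragment above, and the method names are illustrative. The last method shows a Java-side alternative: fetch the attribute text as a String and parse it with Boolean.parseBoolean.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class BooleanPitfall {
    // Cut-down copy of the server.xml fragment; only a few attributes kept.
    static final String SERVER_XML =
        "<Server port=\"8005\" shutdown=\"SHUTDOWN\">"
      + "<Service name=\"Catalina\">"
      + "<Connector port=\"80\" enableLookups=\"false\" redirectPort=\"8443\"/>"
      + "</Service></Server>";

    static final String ATTR =
        "/Server/Service[attribute::name='Catalina']/Connector[1]/attribute::enableLookups";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Node-set converted by boolean(): true simply because the attribute exists.
    static boolean attributePresent(XPath xpath, Document doc)
            throws XPathExpressionException {
        return (Boolean) xpath.evaluate(ATTR, doc, XPathConstants.BOOLEAN);
    }

    // Comparison inside the expression: true only if the value is 'true'.
    static boolean lookupsEnabled(XPath xpath, Document doc)
            throws XPathExpressionException {
        return (Boolean) xpath.evaluate(ATTR + "='true'", doc, XPathConstants.BOOLEAN);
    }

    // Java-side alternative: read the attribute text, then parse it.
    static boolean lookupsEnabledViaString(XPath xpath, Document doc)
            throws XPathExpressionException {
        return Boolean.parseBoolean(xpath.evaluate(ATTR, doc));
    }

    public static void main(String[] args) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Document doc = parse(SERVER_XML);
        System.out.println(attributePresent(xpath, doc));        // true
        System.out.println(lookupsEnabled(xpath, doc));          // false
        System.out.println(lookupsEnabledViaString(xpath, doc)); // false
    }
}
```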

Note that although the examples I have been using start with an org.w3c.dom.Document object, the evaluate method can apply an expression to any node in a document; a relative expression is then resolved with that node as the context node.
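A minimal sketch of evaluating expressions against element nodes rather than the Document follows. The document and the describeFilters method are hypothetical; the point is that an expression without a leading slash, such as filter-name, is resolved relative to the node passed as the second argument.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ContextNodeDemo {
    // Hypothetical web.xml-like document with two filters.
    static final String WEB_XML =
        "<web-app>"
      + "<filter><filter-name>A</filter-name><filter-class>x.A</filter-class></filter>"
      + "<filter><filter-name>B</filter-name><filter-class>x.B</filter-class></filter>"
      + "</web-app>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Select every <filter>, then query each one with relative paths.
    static List<String> describeFilters(XPath xpath, Document doc)
            throws XPathExpressionException {
        NodeList filters = (NodeList) xpath.evaluate(
            "/web-app/filter", doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < filters.getLength(); i++) {
            Node filter = filters.item(i);
            // No leading slash: resolved with 'filter' as the context node
            String name = xpath.evaluate("filter-name", filter);
            String cls = xpath.evaluate("filter-class", filter);
            out.add(name + " -> " + cls);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        for (String line : describeFilters(xpath, parse(WEB_XML))) {
            System.out.println(line); // "A -> x.A", then "B -> x.B"
        }
    }
}
```

An expression that does start with / is still resolved from the document root, whichever node is passed in.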

Using XPathExpression Instances

Instead of calling the evaluate method of XPath as in the first examples, you can build an XPathExpression instance that contains the expression and use it repeatedly. For example, we could reproduce the output from the first example with the following:

  XPathExpression xpe = xpath.compile("/web-app/filter");
  System.out.println( xpe.evaluate( doc ) );

The intent of the XPathExpression class is to let the programmer define a suite of search expressions once and reuse them, rather than handing the same expression text to XPath.evaluate every time it is needed.
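One way to organize such a suite is a small class that compiles its expressions once in its constructor and applies them to any number of documents. The ExpressionSuite class and its method names are illustrative, not a JAXP API. Keep in mind that the JAXP javadoc describes XPath and XPathExpression objects as not thread-safe, so an instance like this should not be shared between threads without synchronization.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ExpressionSuite {
    // Hypothetical sample inputs: two small web.xml-like documents.
    static final String[] DOCS = {
        "<web-app><filter><filter-name>A</filter-name></filter></web-app>",
        "<web-app><filter><filter-name>B</filter-name></filter>"
            + "<filter><filter-name>C</filter-name></filter></web-app>"
    };

    private final XPathExpression filterCount;
    private final XPathExpression firstFilterName;

    ExpressionSuite(XPath xpath) throws XPathExpressionException {
        // Each expression is parsed once, here.
        filterCount = xpath.compile("count(/web-app/filter)");
        firstFilterName = xpath.compile("/web-app/filter/filter-name");
    }

    int countFilters(Document doc) throws XPathExpressionException {
        Double n = (Double) filterCount.evaluate(doc, XPathConstants.NUMBER);
        return n.intValue();
    }

    String firstName(Document doc) throws XPathExpressionException {
        return firstFilterName.evaluate(doc);
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        ExpressionSuite suite =
            new ExpressionSuite(XPathFactory.newInstance().newXPath());
        // The same compiled expressions are applied to every document.
        for (String xml : DOCS) {
            Document doc = parse(xml);
            System.out.println(suite.countFilters(doc) + " " + suite.firstName(doc));
        }
    }
}
```

This prints "1 A" for the first document and "2 B" for the second.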

Performance of the XPath Toolkit

Surely nobody would expect XPath, which is built on top of the standard JAXP classes, to be faster than those classes. To measure the performance penalty for using XPath, I timed the creation of XPathExpression instances and the subsequent evaluation of an expression that returns a NodeList of the <filter-name> nodes in a web.xml file. The Java statements required to do this (given an existing instance of XPath) are:

XPathExpression xpe = xpath.compile("/web-app/filter/filter-name");
NodeList nl = (NodeList) xpe.evaluate( doc,  XPathConstants.NODESET );

Using only the methods in the org.w3c.dom package, the same result would be obtained by code like the following, which first gets a NodeList containing the <filter> elements:

NodeList nlOne = doc.getElementsByTagName("filter");

This is followed by a loop that gets the <filter-name> elements inside each <filter> element as the contents of a second NodeList:

for( int j = 0 ; j < nlOne.getLength(); j++ ){
  Element fE = (Element)nlOne.item( j ) ;
  // nlTwo holds the <filter-name> elements inside this <filter>
  NodeList nlTwo = fE.getElementsByTagName("filter-name");
}
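To check that the two approaches find the same elements, and to get rough timings of your own, something like the following harness could be used. It is a sketch, not the author's benchmark: the inline document, the repetition count, and the method names are assumptions, and a single timed loop like this does not account for JIT warm-up, so its numbers are indicative only. Note also that getElementsByTagName matches descendants at any depth, whereas the XPath expression follows an exact path from the root; for this document the two select the same elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CompareApproaches {
    static final String WEB_XML =
        "<web-app>"
      + "<filter><filter-name>F1</filter-name></filter>"
      + "<filter><filter-name>F2</filter-name></filter>"
      + "<filter><filter-name>F3</filter-name></filter>"
      + "</web-app>";
    static final String EXPR = "/web-app/filter/filter-name";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // XPath version: one compiled expression, one evaluate call.
    static NodeList viaXPath(XPathExpression xpe, Document doc)
            throws XPathExpressionException {
        return (NodeList) xpe.evaluate(doc, XPathConstants.NODESET);
    }

    // DOM version: <filter-name> descendants of every <filter> element.
    static List<Element> viaDom(Document doc) {
        List<Element> result = new ArrayList<Element>();
        NodeList nlOne = doc.getElementsByTagName("filter");
        for (int j = 0; j < nlOne.getLength(); j++) {
            Element fE = (Element) nlOne.item(j);
            NodeList nlTwo = fE.getElementsByTagName("filter-name");
            for (int k = 0; k < nlTwo.getLength(); k++) {
                result.add((Element) nlTwo.item(k));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(WEB_XML);
        XPath xpath = XPathFactory.newInstance().newXPath();
        final int runs = 1000; // arbitrary repetition count

        long t0 = System.nanoTime();
        XPathExpression xpe = null;
        for (int i = 0; i < runs; i++) {
            xpe = xpath.compile(EXPR);
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            viaXPath(xpe, doc);
        }
        long t2 = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            viaDom(doc);
        }
        long t3 = System.nanoTime();

        // Average time per call, in milliseconds
        System.out.printf("compile:  %.4f ms%n", (t1 - t0) / 1e6 / runs);
        System.out.printf("evaluate: %.4f ms%n", (t2 - t1) / 1e6 / runs);
        System.out.printf("DOM:      %.4f ms%n", (t3 - t2) / 1e6 / runs);
    }
}
```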

The timing results on my 1.4 GHz AMD Athlon CPU can be summarized as follows:

Operation                                               Time
Creating an instance of XPathExpression                 0.3 millisec
Using XPathExpression to get a NodeList                 7.5 millisec
Using getElementsByTagName to find <filter-name> nodes  0.3 millisec

The other performance indicator of interest is the amount of memory used, so I measured the memory consumed by creating 1,000 instances of XPathExpression. This turned out to be very small, approximately 500 bytes per instance.

Apparently the convenience and flexibility of XPath come with a considerable execution speed penalty: in this test, evaluating the XPathExpression took about 25 times as long as the equivalent getElementsByTagName calls. However, for many applications programmers will be glad to accept a speed penalty in exchange for simplicity and flexibility. I think we can all be glad that XPath is now a part of the Java standard library.

References

The W3C's XPath Recommendation 1.0 is at: http://www.w3.org/TR/xpath

The chapter on XPath in Elliotte Rusty Harold's book, "Processing XML with Java" is available online at: http://www.cafeconleche.org/books/xmljava/chapters/ch16.html

The JavaDocs for the javax.xml.xpath package are available online at: http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/xpath/package-summary.html

This was first published in February 2006
