Searching XML with XQuery

In this tip, William Brogden discusses the many layers involved in the development of the XQuery language.

XQuery is intended to be a concise, powerful and easy-to-use language for stating queries against XML data sources. Supporters draw a parallel between SQL for relational databases and XQuery for XML. The XQuery language standard is still at the W3C Candidate Recommendation stage, which was reached only in November 2005 after several iterations. Development of XQuery is tightly connected to the development of XPath 2.0, XSLT 2.0 and XML Schema.

Movement from Candidate Recommendation to final Recommendation has been cautious because of the parallel development of XPath and other XML and XSL standards. People keep turning up new and interesting applications, so the W3C appears reluctant to freeze the standard before all of the implications have been worked out. For example, on Feb 21 certain data types were moved from one XML Schema namespace to another. However, indications are that an XQuery Recommendation will be finalized soon.

In spite of the caution at the W3C, many implementations have been created independently over the last few years. Commercial and open source implementations are listed at the W3C site, which has shown a great deal of activity recently. If you are considering using XQuery, you should check the site frequently.

To ensure that all implementations are working toward the same goal, the W3C maintains a frequently updated "XML Query Test Suite." To give you an idea of how complex a full implementation of XQuery is going to be, the Feb 15, 2006 W3C test suite for XQuery consists of 28,245 files containing sample XML documents, queries and representations of the expected results.

An Example XQuery Program

Here is an example XQuery program from the W3C use-cases documentation. It is intended to extract each title and its minimum price from the "prices.xml" document, returning a collection of "minprice" elements. The "doc" function loads the document from a file. The "distinct-values" function creates a collection of distinct titles from the path expression "$doc//book/title". The "min" function computes the arithmetic minimum of the floating point values of the price elements located by the path expression "$doc//book[title = $t]/price".

<results>
  {
    let $doc := doc("prices.xml")
    for $t in distinct-values($doc//book/title)
    let $p := $doc//book[title = $t]/price
    return
      <minprice title="{ $t }">
        <price>{ min($p) }</price>
      </minprice>
  }
</results> 
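
For readers more at home in Java than in XQuery, the logic of the query (group the books by title, then keep the lowest price in each group) can be sketched with nothing but the JDK's DOM and XPath 1.0 classes. This is only an illustration of what the query computes, not how an XQuery processor evaluates it: it groups in a single pass instead of mirroring the for/let clauses. The class name MinPriceSketch and the sample data are invented for this sketch and are not the prices.xml file from the use cases.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MinPriceSketch {

    // Invented sample data (assumption): NOT the prices.xml used in the article.
    public static final String SAMPLE_XML =
          "<prices>"
        + "<book><title>Data on the Web</title><price>39.95</price></book>"
        + "<book><title>Data on the Web</title><price>34.95</price></book>"
        + "<book><title>TCP/IP Illustrated</title><price>65.95</price></book>"
        + "<book><title>TCP/IP Illustrated</title><price>69.00</price></book>"
        + "</prices>";

    // Returns the lowest price seen for each distinct title, in document order.
    public static Map<String, Double> minPrices(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Roughly the $doc//book step of the query.
        NodeList books = (NodeList) xpath.evaluate("//book", doc, XPathConstants.NODESET);

        Map<String, Double> minByTitle = new LinkedHashMap<String, Double>();
        for (int i = 0; i < books.getLength(); i++) {
            Node book = books.item(i);
            String title = xpath.evaluate("title", book);
            Double price = (Double) xpath.evaluate("price", book, XPathConstants.NUMBER);

            // Stands in for distinct-values() plus min(): keep one entry per
            // title and overwrite it only when a lower price turns up.
            Double seen = minByTitle.get(title);
            if (seen == null || price < seen) {
                minByTitle.put(title, price);
            }
        }
        return minByTitle;
    }

    public static void main(String[] args) throws Exception {
        for (Map.Entry<String, Double> e : minPrices(SAMPLE_XML).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The Java version needs an explicit loop and a map to do what the query expresses in three clauses, which is a fair illustration of the conciseness XQuery is aiming for.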

The Saxon B XQuery processor release contains the use-case example programs and the prices.xml test database. After setting the SAXONCP environment variable to point to the saxon8.jar file, I executed the following Java command line, where q10.xq contains the above program.

java -cp %SAXONCP% net.sf.saxon.Query -t q10.xq

The output XML text, produced in about 360 milliseconds, included:


<?xml version="1.0" encoding="UTF-8"?>
<results>
   <minprice title="Advanced Programming in the Unix environment">
      <price>65.95</price>
   </minprice>
   <minprice title=" TCP/IP Illustrated ">
      <price>65.95</price>
   </minprice>
   <minprice title="Data on the Web">
      <price>34.95</price>
   </minprice>
</results>

Applying an XQuery program to an XML data source could just as easily be done inside a Java program, with the query supplied as a String and the result returned as an org.w3c.dom.Document object.

The Results of XQuery Expressions

The results of applying XQuery expressions can be as simple as a single value or as complex as an entirely new XML document. XQuery expressions always return what the W3C calls "sequences" of zero or more items, each of which is either an "atomic" value, such as a string or a floating point number, or an XML node.

You may recall from my previous tip that XPath 1.0 expressions as implemented in Java 1.5 can return the Java object types Double, Boolean, String, Node and NodeList. XQuery 1.0 and XPath 2.0 greatly expand the object type possibilities by using the type information in XML Schema. The intent of XQuery is to be able to use any XML Schema types, starting with the simple built in types.
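
As a reminder of what those five XPath 1.0 return types look like in practice, here is a minimal sketch using the javax.xml.xpath package from the standard library. The class name XPathReturnTypes, the eval helper and the sample document are invented for illustration; only the javax.xml.xpath and DOM calls come from the JDK.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathReturnTypes {

    // Invented sample document (assumption), just big enough to exercise each type.
    public static final String XML =
          "<books>"
        + "<book><title>Data on the Web</title><price>34.95</price></book>"
        + "<book><title>TCP/IP Illustrated</title><price>65.95</price></book>"
        + "</books>";

    // Hypothetical helper: parses the sample document and evaluates one
    // XPath 1.0 expression, asking for the given return type.
    public static Object eval(String expression, QName returnType) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        return XPathFactory.newInstance().newXPath().evaluate(expression, doc, returnType);
    }

    public static void main(String[] args) throws Exception {
        // Each XPathConstants value selects one of the five Java result types.
        Double count = (Double) eval("count(//book)", XPathConstants.NUMBER);
        Boolean anyCheap = (Boolean) eval("//price < 40", XPathConstants.BOOLEAN);
        String firstTitle = (String) eval("string(//book[1]/title)", XPathConstants.STRING);
        Node secondBook = (Node) eval("//book[2]", XPathConstants.NODE);
        NodeList titles = (NodeList) eval("//title", XPathConstants.NODESET);

        System.out.println("NUMBER  : " + count);
        System.out.println("BOOLEAN : " + anyCheap);
        System.out.println("STRING  : " + firstTitle);
        System.out.println("NODE    : " + secondBook.getNodeName());
        System.out.println("NODESET : " + titles.getLength() + " nodes");
    }
}
```

Note that every number comes back as a Double, because XPath 1.0 has a single numeric type. This is the kind of limitation the schema-based type system of XQuery 1.0 and XPath 2.0 is meant to remove.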

XQuery, Java Standard Library APIs and JSR 225

Formal development of an XQuery API for Java is covered by the Java Specification Request (JSR) 225, which has the support of all of the major players in the Java applications business. It is intended that the XQuery API will be similar in style to JDBC and consistent with all JAXP tools. However, at the present time, if you want to execute XQuery statements in a Java program you will be using a particular vendor's API. Since XQuery is still at the Candidate Recommendation stage, we are far from having a standard API in the standard library.

The Saxon Implementation of XQuery for Java

Michael Kay released the Saxon B implementation of XQuery at the same time as the Candidate Recommendation was released. If you want to experiment with XQuery programming in Java, Saxon B is a good choice. Saxon uses and extends the Java 1.5 implementation of XPath 1.0 (the javax.xml.xpath package) to support XPath 2.0.

The commercial version of Saxon, Saxon SA, adds significant capabilities to use the information in an XML Schema for validation and to select elements and attributes based on their schema-defined type.

Other Open Source Implementations

Sleepycat Software has recently released the "Berkeley DB XML 2.2" package. This new version of their XML database supports the XQuery 1.0 and the XPath 2.0 Candidate Recommendations. This is not the Berkeley DB Java Edition but it has a Java API suitable for programmatic execution of XQuery statements. Sleepycat is considered one of the leaders in open source software. The company was recently acquired by Oracle in a surprise move.

In summary, I found Java programming with XQuery to be in a greater state of flux than I expected. You can get started right now with Saxon B or another vendor's implementation, but you should not expect to find XQuery in the Java standard library any time soon.

References

W3C documentation

Primary site for entry into W3C XQuery activities, including recent news and list of vendors: http://www.w3.org/XML/Query/

Sample use-cases for XQuery: http://www.w3.org/TR/xquery-use-cases/

Current XML Query Test Suite: http://www.w3.org/XML/Query/test-suite/

Documentation of the functions and operators for XQuery 1.0 and XPath 2.0: http://www.w3.org/TR/xpath-functions/

Documentation of the data model used in XQuery 1.0 and XPath 2.0: http://www.w3.org/TR/xpath-datamodel/

Candidate Recommendation for XPath 2.0: http://www.w3.org/TR/xpath20/

Some Freely Available Implementations for Java

Saxon XSLT and XQuery processor (Saxon B): http://saxon.sourceforge.net/

Sleepycat Software's Berkeley DB XML, supporting XQuery 1.0 and XPath 2.0: http://www.sleepycat.com/products/bdbxml.html

General Resources

A site created by Jason Hunter with resources for XQuery developers: http://www.xquery.com/


This was first published in February 2006
