THE WEB SERVICES ADVISOR

Is that XML valid?


William Brogden
12.20.2005

Java parsers that follow the JAXP API provide fine-grained control over the type of validity checking performed. In addition to the parsers in the standard library, other implementations of the JAXP API, such as those from the Apache Xerces project, can serve as drop-in replacements.

The most basic requirement for an XML document is that it must be "well-formed." A well-formed document follows the XML syntax rules: it has a single root element, every tag is properly terminated, and it uses only legal characters. All XML parsers can detect a failure to conform to these syntax rules.
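
To make that concrete, here is a minimal sketch of a well-formedness check using a parser from the default factory settings, which request no DTD or schema validation. The class name WellFormedCheck, the method isWellFormed and the sample strings are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class WellFormedCheck {

    // Returns true if the text obeys the XML syntax rules. The default factory
    // settings request no DTD or schema validation, so only well-formedness is tested.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Syntax errors surface as a SAXException (in practice a SAXParseException).
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<root><a/></root>"));  // true
        System.out.println(isWellFormed("<root><a></root>"));   // false: <a> never closed
        System.out.println(isWellFormed("<a/><b/>"));           // false: two root elements
    }
}
```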

When using the standard Java JAXP API, you obtain parsers from "factory" methods rather than by creating an instance of a parser class directly. For example, to get a DOM parser you first obtain a DocumentBuilderFactory by calling its static newInstance() method. The implementation class the factory uses can be selected with a system property (javax.xml.parsers.DocumentBuilderFactory for DOM), although most users will simply accept the default.
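
The factory pattern looks roughly like this in practice. In this sketch the class name FactoryDemo, the method rootElementName and the sample document are hypothetical. Running it with a JVM option such as -Djavax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl (with a Xerces jar on the classpath) would select the Xerces implementation instead of the JDK's built-in one, without changing the code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class FactoryDemo {

    // Obtain a parser through the factory rather than instantiating a parser class.
    static String rootElementName(String xml) throws Exception {
        // newInstance() picks the implementation: the class named by the
        // javax.xml.parsers.DocumentBuilderFactory system property if set,
        // otherwise (after other lookup steps) the JDK's built-in parser.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        // Shows which implementation class the factory lookup selected.
        System.out.println(DocumentBuilderFactory.newInstance().getClass().getName());
        System.out.println(rootElementName("<order id='7'><item/></order>"));  // order
    }
}
```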

With a factory instance in hand, you call various methods to tell the factory the characteristics of the parser you want it to build, for example whether the parser should validate the XML against a schema. If the factory cannot create a parser with those characteristics, it throws an exception (a ParserConfigurationException when you request the parser). These extra levels of indirection are necessary to meet the JAXP goal of an API that is independent of the underlying parser implementation. They provide flexibility at the cost of extra code if you want anything other than the default configuration. Fortunately, the default classes will work for most purposes.
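
A minimal sketch of configuring the factory before asking it for a parser follows. The class name ConfiguredFactory and the method validatingBuilder are hypothetical; the setter methods are standard DocumentBuilderFactory methods.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

public class ConfiguredFactory {

    // Describe the parser you want, then ask the factory to build it.
    static DocumentBuilder validatingBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);        // validate against the document's DTD
        factory.setNamespaceAware(true);    // keep namespace information
        factory.setIgnoringComments(true);  // leave comment nodes out of the tree
        // newDocumentBuilder() throws ParserConfigurationException if the
        // implementation cannot satisfy the requested configuration.
        return factory.newDocumentBuilder();
    }

    public static void main(String[] args) {
        try {
            DocumentBuilder builder = validatingBuilder();
            System.out.println("validating: " + builder.isValidating());
            System.out.println("namespace aware: " + builder.isNamespaceAware());
        } catch (ParserConfigurationException e) {
            System.err.println("Parser not available: " + e.getMessage());
        }
    }
}
```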

Generally speaking, schemas provide a way to specify rules for the content and structure of XML documents. The most basic format for rule specification, as found in XML 1.0, is the DTD or Document Type Definition. A DTD simply defines the allowed names of elements and attributes and the rules for nesting them in a complete document. Although DTDs fit the lax definition of a schema, by convention the term "XML schema" is used to refer to systems more complex than DTDs.
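
As an illustration of DTD rules, the following sketch uses an internal DTD subset that allows a note element to contain exactly one to element, and turns on DTD validation in the factory. The class name DtdCheck, the method isValid and the note/to vocabulary are hypothetical. A custom ErrorHandler is installed so that DTD violations actually stop the parse.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class DtdCheck {

    // Internal DTD subset: <note> must contain exactly one <to> holding text.
    static final String DTD =
        "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>";

    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);  // check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Treat errors and fatal errors as exceptions; ignore warnings.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<note><to>Ann</to></note>"));      // true
        System.out.println(isValid("<note><from>Bob</from></note>"));  // false: <from> not declared
    }
}
```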

Java 1.5 adds the javax.xml.validation package to the standard library. Its Schema class can represent more complex validation rules, and you can tell a parser factory to create a schema-aware parser by passing a Schema object to the factory's setSchema() method. The W3C XML Schema recommendation of 2001, the product of two years of effort by XML experts, is the only schema language implemented in the standard Java 1.5 library. With the W3C schema definition language you can specify requirements such as a numeric value falling inside a given range.
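
Here is a minimal sketch of a schema-aware DOM parser that enforces exactly that kind of range rule. The schema itself (a quantity element holding an integer from 1 to 100), the class name SchemaAwareParser and the method isValid are hypothetical examples; SchemaFactory, XMLConstants.W3C_XML_SCHEMA_NS_URI and setSchema() are the standard Java API.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class SchemaAwareParser {

    // Hypothetical schema: <quantity> must hold an integer from 1 to 100.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='quantity'><xs:simpleType>"
        + "<xs:restriction base='xs:int'>"
        + "<xs:minInclusive value='1'/><xs:maxInclusive value='100'/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        + "</xs:schema>";

    // Rethrow errors so that schema violations stop the parse.
    static final ErrorHandler STRICT = new ErrorHandler() {
        public void warning(SAXParseException e) { }
        public void error(SAXParseException e) throws SAXException { throw e; }
        public void fatalError(SAXParseException e) throws SAXException { throw e; }
    };

    static boolean isValid(String xml) throws Exception {
        // Compile the W3C XML Schema into a reusable Schema object.
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);  // schema validation needs namespace-aware parsing
        dbf.setSchema(schema);        // parsers from this factory validate against it
        DocumentBuilder builder = dbf.newDocumentBuilder();
        builder.setErrorHandler(STRICT);
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<quantity>42</quantity>"));   // true
        System.out.println(isValid("<quantity>500</quantity>"));  // false: above 100
        System.out.println(isValid("<quantity>abc</quantity>"));  // false: not an integer
    }
}
```

Note that setValidating() stays false here: it controls DTD validation, while setSchema() alone switches on schema validation.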

Next, I want to discuss what happens when a parser detects an invalid document, and to contrast the DOM and SAX models. With DOM processing, once you have configured the parser, it takes over and tries to parse the complete document. Your only control over how validation errors are treated is to supply the parser with an object implementing the org.xml.sax.ErrorHandler interface. If you do not supply a custom ErrorHandler, a fatal error such as a syntax error results in an exception being thrown: no document object is built, and all of the information parsed from the document up to the error is lost. (With the JDK's default handler, non-fatal validity errors are only printed, not thrown, so supply your own handler if you want them to stop the parse.)
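
One way to use that control is an ErrorHandler that records validity errors and lets the parse finish, while still aborting on fatal syntax errors. In this sketch the class name CollectingErrors, the method parseCollecting, the message format and the note/to DTD are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class CollectingErrors {

    static final String DTD =
        "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>";

    // Parses with DTD validation, recording validity problems in 'problems'
    // instead of stopping. Fatal (syntax) errors still abort the parse.
    static Document parseCollecting(String xml, final List<String> problems) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                problems.add("warning at line " + e.getLineNumber() + ": " + e.getMessage());
            }
            public void error(SAXParseException e) {
                problems.add("error at line " + e.getLineNumber() + ": " + e.getMessage());
            }
            public void fatalError(SAXParseException e) throws SAXException {
                throw e;  // no document can be built after a syntax error
            }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        List<String> problems = new ArrayList<String>();
        Document doc = parseCollecting(DTD + "<note><from>Bob</from></note>", problems);
        // The tree is still available even though the document broke the DTD rules.
        System.out.println("root: " + doc.getDocumentElement().getNodeName());
        for (String p : problems) {
            System.out.println(p);
        }
    }
}
```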

When writing code to handle parsing exceptions, do not rely on the usual Java printStackTrace() method. It may give you a cause, but it will not tell you where in the document the error occurs. Your code should first try to catch a SAXParseException, which can tell you the location of the offending XML text as a line number and column number. See the JavaDocs for the org.xml.sax package for details.
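
A minimal sketch of reporting the location follows. The class name ErrorLocation, the method findSyntaxError and the sample document (whose a element is never closed, so the error is detected on line 3) are hypothetical; getLineNumber() and getColumnNumber() are standard SAXParseException methods.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class ErrorLocation {

    // Returns the parse exception for a malformed document, or null if it parsed.
    // Other exceptions (I/O, configuration) propagate to the caller.
    static SAXParseException findSyntaxError(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            // The specific subclass carries the position of the problem.
            return e;
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n  <a>\n</root>";  // <a> is never closed
        SAXParseException e = findSyntaxError(xml);
        if (e != null) {
            System.out.println("Line " + e.getLineNumber() + ", column "
                + e.getColumnNumber() + ": " + e.getMessage());
        }
    }
}
```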

With SAX processing, your custom code must handle events representing three kinds of parsing errors: warnings, plain errors such as a failure to follow a DTD, and fatal errors such as syntax errors. Your custom event-handling methods will already have received valid data up to the point of the parse error, so your code may be able to recover usable data from the events already processed and to provide extra information when reporting the error.
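
The sketch below shows that recovery with a DefaultHandler subclass that overrides the three error callbacks alongside the content callbacks. The class names SaxRecovery and ItemHandler, the list/item vocabulary and the message format are hypothetical. In the sample input the third item is never closed, yet the first two items survive in the handler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxRecovery {

    // Records the text of each complete <item>, plus any reported problems.
    static class ItemHandler extends DefaultHandler {
        final List<String> items = new ArrayList<String>();
        final List<String> problems = new ArrayList<String>();
        private StringBuilder text;  // non-null while inside an <item>

        @Override public void startElement(String uri, String local, String qName, Attributes atts) {
            if (qName.equals("item")) text = new StringBuilder();
        }
        @Override public void characters(char[] ch, int start, int length) {
            if (text != null) text.append(ch, start, length);
        }
        @Override public void endElement(String uri, String local, String qName) {
            if (qName.equals("item") && text != null) {
                items.add(text.toString());
                text = null;
            }
        }
        @Override public void warning(SAXParseException e) {
            problems.add("warning at line " + e.getLineNumber());
        }
        @Override public void error(SAXParseException e) {
            problems.add("error at line " + e.getLineNumber());
        }
        @Override public void fatalError(SAXParseException e) throws SAXException {
            problems.add("fatal at line " + e.getLineNumber());
            throw e;  // parsing cannot continue after a fatal error
        }
    }

    static ItemHandler parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ItemHandler handler = new ItemHandler();
        try {
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXParseException e) {
            // Data delivered before the error is still held by the handler.
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        ItemHandler h = parse("<list><item>a</item><item>b</item><item>c</list>");
        System.out.println("recovered: " + h.items);  // recovered: [a, b]
        System.out.println("problems: " + h.problems);
    }
}
```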

The javax.xml.validation package in the Java 1.5 standard library also supports validation outside the parser classes. A Validator object can work with files, stream sources or in-memory DOM documents. This means you can parse a document into a DOM object with a less stringent parser and then check it against several different schemas with a validator.
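
A minimal sketch of that two-step approach follows: a non-validating, namespace-aware parse, then the same DOM checked against two schemas. The class name ValidateLater, its methods and both price schemas are hypothetical; Schema.newValidator() and Validator.validate(Source) are the standard API, and a Validator with no ErrorHandler set throws on the first error.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ValidateLater {

    // Two hypothetical schemas for the same <price> element.
    static final String DECIMAL_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='price' type='xs:decimal'/></xs:schema>";
    static final String STRING_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='price' type='xs:string'/></xs:schema>";

    static Schema compile(String xsd) throws SAXException {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return sf.newSchema(new StreamSource(new StringReader(xsd)));
    }

    // Non-validating parse: only well-formedness is checked here.
    static Document parseLoosely(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);  // the validator expects a namespace-aware DOM
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // With no ErrorHandler set, the Validator throws on the first error.
    static boolean conforms(Document doc, Schema schema) throws Exception {
        Validator validator = schema.newValidator();
        try {
            validator.validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseLoosely("<price>cheap</price>");
        System.out.println(conforms(doc, compile(STRING_XSD)));   // true
        System.out.println(conforms(doc, compile(DECIMAL_XSD)));  // false
    }
}
```

Because a Schema object is thread-safe and reusable, one compiled schema can back many Validator instances; each Validator, however, should be used by only one thread at a time.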

All Rights Reserved, Copyright 2001 - 2009, TechTarget | Read our Privacy Policy