XML things that bite you


William Brogden
02.01.2006


SAXParseException Mysteries

Many SAXParseException reports can be interpreted fairly easily, but what are we to make of this report, which you may see when parsing an XML document created or edited with a text editor?

The processing instruction target matching "[xX][mM][lL]" is not allowed.

This message is especially mystifying when you know you have not tried to create a processing instruction. It turns out that this is what you get if the "<" character of the starting XML declaration -- for example, <?xml version="1.0"?> -- is not the first character in the file. Leading whitespace or a blank line before the declaration is enough to trigger it, which is why the file looks perfectly OK when viewed or printed.
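The effect is easy to reproduce. The following is a minimal sketch, assuming the parser bundled with the JDK; the class name, the parses() helper and the <users/> document are made up for illustration. It parses the same text with and without a single stray space in front of the declaration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class LeadingCharacterDemo {

    /** Returns true if the text parses, false if the parser reports a SAXParseException. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            System.out.println("Parse failed: " + e.getMessage());
            return false;
        } catch (Exception e) {
            // Configuration or I/O problems are not the point of this sketch.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<?xml version=\"1.0\"?><users/>";
        String bad = " " + good; // one stray space before the "<"

        System.out.println("clean text parses:   " + parses(good));
        System.out.println("leading space parses: " + parses(bad));
        // Trimming whitespace before parsing is one possible repair.
        System.out.println("trimmed text parses: " + parses(bad.trim()));
    }
}
```

Note that trim() only removes whitespace; other invisible leading characters, such as a byte order mark, would need separate handling.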

The Case of the Vanishing Node

For purposes of illustrating this problem, assume you have an XML document used to keep a list of users who sign up online, with this as a typical entry.

When parsed into memory as an org.w3c.dom.Document, the firstname Element has a child node that is a TEXT_NODE type. Programmers may be tempted to use Java code like the following to get the firstname String:
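A sketch of the tempting approach might look like this. The <user> entry, the class name and the helper methods are hypothetical and chosen only for illustration; the point is the unguarded getFirstChild().getNodeValue() chain.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NaiveGetDemo {

    /** Parses a string into a DOM Document, wrapping checked exceptions. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Naive getter: assumes the firstname element always has a text child. */
    static String getFirstName(Document doc) {
        Element firstname = (Element) doc.getElementsByTagName("firstname").item(0);
        return firstname.getFirstChild().getNodeValue();
    }

    public static void main(String[] args) {
        // Hypothetical sample entry, not taken from a real sign-up file.
        Document doc = parse("<user><firstname>Pat</firstname><lastname>Smith</lastname></user>");
        System.out.println(getFirstName(doc));
    }
}
```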

With similar code to set a new value for firstname:
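A matching sketch of the naive setter, again using a hypothetical <user> entry and illustrative names, changes the value of the existing text node in place:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NaiveSetDemo {

    /** Parses a string into a DOM Document, wrapping checked exceptions. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Naive setter: assumes the text child is already present. */
    static void setFirstName(Document doc, String value) {
        Element firstname = (Element) doc.getElementsByTagName("firstname").item(0);
        firstname.getFirstChild().setNodeValue(value);
    }

    /** Reads the value back with the same unguarded chain. */
    static String getFirstName(Document doc) {
        return doc.getElementsByTagName("firstname").item(0).getFirstChild().getNodeValue();
    }

    public static void main(String[] args) {
        Document doc = parse("<user><firstname>Pat</firstname></user>");
        setFirstName(doc, "Lee");
        System.out.println(getFirstName(doc));
    }
}
```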

This code will compile and appear to work correctly until the fateful day when, for some reason, an empty string is used to set the value of the text node. While the Document is still in memory, the above code will continue to work. However, when the Document is serialized to a file with a Transformer, instead of the expected start and end tag pair, <firstname></firstname>, what you actually get is the empty-element form, <firstname/>.

The Transformer recognizes that the firstname element is empty and considers this the preferred form. Now when the revised document is parsed into memory, that firstname element does not have a child Node, so a statement that calls getFirstChild().getNodeValue() on it causes a NullPointerException, giving the programmer a nasty shock.
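The whole round trip can be sketched as follows, assuming the default Transformer and parser that ship with the JDK. The class, method names and sample entry are hypothetical. The sketch sets the text node to a given value, serializes the Document, parses the result again, and reports whether the text child survived.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class VanishingNodeDemo {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Serializes the Document with an identity Transformer. */
    static String serialize(Document doc) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Sets firstname to the value, round-trips the Document, and checks for a child node. */
    static boolean hasTextChildAfterRoundTrip(String value) {
        Document doc = parse("<user><firstname>Pat</firstname></user>");
        Node firstname = doc.getElementsByTagName("firstname").item(0);
        firstname.getFirstChild().setNodeValue(value); // still works in memory
        Document reparsed = parse(serialize(doc));
        return reparsed.getElementsByTagName("firstname").item(0).getFirstChild() != null;
    }

    public static void main(String[] args) {
        System.out.println("child survives \"Lee\": " + hasTextChildAfterRoundTrip("Lee"));
        System.out.println("child survives \"\":    " + hasTextChildAfterRoundTrip(""));
    }
}
```

In the second case the reparsed element has no child, so the unguarded getFirstChild().getNodeValue() chain would fail at that point.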

The solution is of course to code defensively, checking for the presence of the child node and providing a default value if it is not there. For example, use this code to get the first name:
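A defensive getter along these lines could look like the sketch below. The textOf() helper, its parameters and the sample entries are assumptions made for illustration, not a fixed API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DefensiveGetDemo {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Returns the text of the first element with the given tag, or the default if it has none. */
    static String textOf(Element parent, String tag, String defaultValue) {
        Node element = parent.getElementsByTagName(tag).item(0);
        if (element == null) {
            return defaultValue; // the element itself is missing
        }
        Node child = element.getFirstChild();
        if (child == null || child.getNodeType() != Node.TEXT_NODE) {
            return defaultValue; // empty element, e.g. <firstname/>
        }
        return child.getNodeValue();
    }

    public static void main(String[] args) {
        Element full = parse("<user><firstname>Pat</firstname></user>").getDocumentElement();
        Element empty = parse("<user><firstname/></user>").getDocumentElement();
        System.out.println("[" + textOf(full, "firstname", "") + "]");
        System.out.println("[" + textOf(empty, "firstname", "") + "]");
    }
}
```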

Setting a firstname value when the child node does not exist requires more complex code because we have to create the Node first.
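One way to write such a setter is sketched here. It assumes the firstname element itself exists and only its text child may be missing; the names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DefensiveSetDemo {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Sets the text of an element, creating the text node if it is not there. */
    static void setText(Element element, String value) {
        Node child = element.getFirstChild();
        if (child != null && child.getNodeType() == Node.TEXT_NODE) {
            child.setNodeValue(value);
        } else {
            // New nodes must be created by the Document that owns the element.
            Document owner = element.getOwnerDocument();
            element.insertBefore(owner.createTextNode(value), child);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<user><firstname/></user>");
        Element firstname = (Element) doc.getElementsByTagName("firstname").item(0);
        setText(firstname, "Lee");
        System.out.println(firstname.getFirstChild().getNodeValue());
    }
}
```

Calling insertBefore() with a null reference node appends the new text node, so the same call covers both the empty element and an element whose first child is not text.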

Why is my Document null?

Some programmers have been accustomed to writing code like the following in a method that parses XML into a Document object to assure themselves that the Document was created.

With the Java 1.5 XML library this results in output that looks like:
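A sketch of this kind of check, using a made-up <users/> document, is shown below. With the parser bundled in the JDK, printing the Document typically produces something like [#document: null]; the exact toString() format is implementation-specific, so the sketch also prints the pieces separately.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DocumentToStringDemo {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<users/>");

        // The tempting sanity check: relies on Document.toString().
        System.out.println("Document: " + doc);

        // What that output is built from.
        System.out.println("node name:  " + doc.getNodeName());
        System.out.println("node value: " + doc.getNodeValue());

        // A more telling check: look at the root element instead.
        System.out.println("root element: " + doc.getDocumentElement().getNodeName());
    }
}
```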

To a new programmer this appears to be saying that the Document has no content. Actually all it is saying is that you do have a Document object. I always find the table in the Javadocs for the Node interface to be a big help in cases like this. It tells you what to expect from the getNodeValue and getNodeName methods for various DOM objects. Here is a link to Sun's online documentation for org.w3c.dom.Node: http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/Node.html.

From this table you can see that the getNodeValue() method always returns null from a Document type node. The toString() method for Document combines the "#document" name plus the value from getNodeValue().

Unicode Errors

A very frustrating type of error you may encounter when dealing with XML documents is the invalid Unicode character error, resulting in an exception report that looks like this:

Or perhaps the even more alarming:

Chasing down the source of invalid characters can be quite a detective job, especially since they may look perfectly normal on casual examination.

The 0x1a character turns out to be a control code used as an end of file mark in certain applications. In one case, this character ended up in a database field and was subsequently inserted in an XML document with unfortunate results.
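When data comes from a source you do not control, one option is to filter it before it goes into the document. The sketch below is an assumption about how that might be done, not a method from any library: it keeps only code points allowed by the Char production of XML 1.0 and silently drops the rest, including 0x1a.

```java
class XmlCharFilter {

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the input with every character that XML 1.0 forbids removed. */
    static String stripIllegal(String text) {
        StringBuilder clean = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isLegalXmlChar(cp)) {
                clean.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return clean.toString();
    }

    public static void main(String[] args) {
        // Hypothetical database value with an embedded end-of-file mark.
        String fromDatabase = "Pat" + (char) 0x1A + "Smith";
        System.out.println(stripIllegal(fromDatabase));
    }
}
```

Dropping characters silently may hide a data problem, so logging each removal is worth considering in real code.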

A common source of an invalid character is a document created with a word processor that uses Microsoft's convention for "smart" punctuation. If you see different characters for open and close quotes, you have "smart" punctuation.

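Unfortunately, Microsoft selected character codes for "smart" punctuation that lie in the range 0x82 through 0x95 of the Windows-1252 code page (the curly quotes are 0x91 through 0x94). Unicode reserves that range for control codes, and those values are not valid as single bytes in UTF-8, the default XML encoding, so a parser reading the file as UTF-8 will reject them. Thus when Microsoft documents are used as a source for cut and paste operations with XML documents, there is a danger of introducing characters that will prevent a document from parsing.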
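If the pasted text is already in a Java String, one possible cleanup is to replace the curly quotes with plain ASCII ones before the text goes into the document. This sketch is illustrative only and the method name is made up. It handles the quotes both as their proper Unicode code points and as the raw 0x91 through 0x94 values that appear when Windows-1252 bytes were decoded as ISO-8859-1.

```java
class SmartQuoteCleaner {

    /** Replaces single and double "smart" quotes with their plain ASCII equivalents. */
    static String plainQuotes(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case 0x91: case 0x92:           // raw Windows-1252 single quotes
                case '\u2018': case '\u2019':   // Unicode single quotes
                    out.append('\'');
                    break;
                case 0x93: case 0x94:           // raw Windows-1252 double quotes
                case '\u201C': case '\u201D':   // Unicode double quotes
                    out.append('"');
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String pasted = (char) 0x93 + "hello" + (char) 0x94;
        System.out.println(plainQuotes(pasted));
    }
}
```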

If you have a SAXParseException, you can extract the location of the offending character from the exception with code like the following:
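A minimal sketch using the getLineNumber() and getColumnNumber() methods of SAXParseException follows. The locate() helper and the sample input are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class ErrorLocationDemo {

    /** Returns {line, column} of the first parse error, or null if the text parses cleanly. */
    static int[] locate(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            // Both values are 1-based; -1 means the parser could not supply them.
            return new int[] { e.getLineNumber(), e.getColumnNumber() };
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input with a 0x1a control character in the element content.
        String bad = "<users>\n<firstname>Pat" + (char) 0x1A + "</firstname>\n</users>";
        int[] where = locate(bad);
        if (where != null) {
            System.out.println("Error at line " + where[0] + ", column " + where[1]);
        }
    }
}
```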

It is also a big help if you have a programmer's editor which can switch between display of normal text and character codes in hex. Personally I am fond of UltraEdit-32 for this sort of detective work.

About the author
Bill Brogden is a computer consultant who enjoys exploring new technologies. He has written study guides for Java certifications and several books on using XML with Java. You can reach Bill at wbrogden@bga.com.

