Anybody who wrote communication programs in the old days of slow modems and expensive long lines before the Internet probably had the same reaction on first seeing XML: "What a wasteful format!" Admittedly, it is hard to give up those old byte-saving instincts to gain the advantages of XML. Transmitting 5000 bytes of data, of which perhaps 500 are the real content and the rest horribly repetitive, seems to go against the grain. The "Fast Infoset" standardization is an attempt to create a compact encoding method that can transmit an XML Information Set with significant savings of bandwidth and processing power.
What is an XML Information Set Anyway?
Although many people have doubted the need for yet another XML standard, the W3C created the "XML Information Set" or "Infoset" recommendation. This specification attempts to standardize all of the definitions of the parts of an XML document at a more abstract level than the character syntax.
The idea is that all specifications dealing with XML can use the Infoset definitions and be sure they are talking about the same thing. To avoid using terms such as "Element" that appear in specific XML handling APIs, the specification talks in terms of "Information Items." Eleven different types of information item are recognized. Any XML document representation that can handle all of these items is considered to hold a complete Infoset.
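As a rough illustration of what "more abstract than the character syntax" means, the sketch below parses two documents that differ in quoting style and in how an empty element is written, then compares the resulting trees. It uses the standard DOM API as a stand-in for an Infoset (the two models are related but not identical), and the class name, method names and sample documents are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class InfosetEquality {

    // Parse an XML string into a DOM tree with the JDK's default parser.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Compare the root elements structurally: names, attributes and children,
    // ignoring how the original characters were written.
    static boolean sameContent(String xmlA, String xmlB) throws Exception {
        return parse(xmlA).getDocumentElement()
                .isEqualNode(parse(xmlB).getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample documents: same items, different character syntax.
        String a = "<question id='1'><answer/></question>";
        String b = "<question id=\"1\"><answer></answer></question>";
        System.out.println("Same content: " + sameContent(a, b));
    }
}
```

Any encoding that preserves the information items, whether text XML or a binary form, should survive this kind of comparison unchanged.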
Fast Infoset Standardization
Alternatives to Fast Infoset
Given the verbose nature of XML, it is not surprising that a number of attempts have been made to compress documents for more efficient transmission. The well-known ZIP and GZIP encodings are an obvious technique to use. Filters that perform zip compression and decompression for data streams on both the client and server side are easily installed and more widely available than Fast Infoset implementations. However, since zip compression only "knows" about character sequences, it can't take advantage of the formal structure of XML documents, and decompression takes processing power.
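For readers who have not used the stream filters mentioned above, here is a minimal sketch of GZIP compression and decompression with the Java standard library. The class name and the generated sample document are assumptions for illustration only; a real filter would wrap the request and response streams rather than byte arrays.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipXml {

    // Compress a byte array with GZIP.
    static byte[] compress(byte[] plain) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(plain);
        }
        return buffer.toByteArray();
    }

    // Restore the original bytes from GZIP data.
    static byte[] decompress(byte[] packed) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            return gzip.readAllBytes();
        }
    }

    // Build a repetitive, made-up XML document of the given number of elements.
    static byte[] sampleXml(int questions) {
        StringBuilder xml = new StringBuilder("<questions>");
        for (int i = 0; i < questions; i++) {
            xml.append("<question><text>Sample question ").append(i)
               .append("</text><answer>Sample answer</answer></question>");
        }
        xml.append("</questions>");
        return xml.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = sampleXml(200);
        byte[] packed = compress(plain);
        System.out.println("Plain bytes:  " + plain.length);
        System.out.println("Packed bytes: " + packed.length);
        System.out.println("Round trip ok: "
                + java.util.Arrays.equals(plain, decompress(packed)));
    }
}
```

The compressor sees only repeated byte sequences, which is why highly repetitive markup shrinks well even though the algorithm knows nothing about elements or attributes.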
Sun's Java Implementations of Fast Infoset
The implementation of Fast Infoset in Sun's Java Web Services Developer Pack (JWSDP) and the open source Glassfish project is in its second iteration. It is still considered to be a young technology that has not yet reached its full potential. For example, it does not compress bulk text or support other advanced features that appear in the Fast Infoset specification.
The JWSDP 2.0 release provides a negotiation mechanism by which the Web service client can use the standard HTTP headers "Accept" and "Content-Type" in the initial contact with a server to indicate that it can accept Fast Infoset-coded data. If the Web service side has been configured properly the response and future conversations will be Fast Infoset encoded.
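The negotiation can be pictured with a small sketch of the headers involved. This is not the JWSDP client code; it only builds a request with the JDK's `java.net.http` API and assumes that the first message is sent as plain `text/xml` while the `Accept` header advertises `application/fastinfoset`, the registered media type for Fast Infoset. The class name, endpoint URL and SOAP body are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Locale;

public class FastInfosetNegotiation {

    static final String FAST_INFOSET = "application/fastinfoset";
    static final String PLAIN_XML = "text/xml";

    // Build the first request: plain XML body, but announce that a
    // Fast Infoset response would be accepted (assumed header layout).
    static HttpRequest firstRequest(URI endpoint, String soapBody) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", PLAIN_XML + "; charset=utf-8")
                .header("Accept", FAST_INFOSET + ", " + PLAIN_XML)
                .POST(HttpRequest.BodyPublishers.ofString(soapBody))
                .build();
    }

    // Decide from a response Content-Type value whether the server
    // switched to Fast Infoset.
    static boolean isFastInfoset(String contentType) {
        return contentType != null
                && contentType.trim().toLowerCase(Locale.ROOT).startsWith(FAST_INFOSET);
    }

    public static void main(String[] args) {
        // Hypothetical endpoint and body, never actually sent here.
        HttpRequest request = firstRequest(
                URI.create("http://example.com/service"),
                "<Envelope/>");
        System.out.println("Accept: " + request.headers().firstValue("Accept").orElse(""));
        System.out.println("Switch to Fast Infoset: " + isFastInfoset("application/fastinfoset"));
    }
}
```

A client following this pattern would keep sending plain XML if the response comes back as `text/xml`, so a server without Fast Infoset support is unaffected.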
Experiments with Fast Infoset
In order to get an idea of the payoff for using Fast Infoset formatted XML documents versus plain text formatted and zip formatted document files, I performed some timing experiments. The test XML document was a set of XML formatted test questions with most of the content as text elements and without namespaces. This document is considerably larger than those I have seen in other Fast Infoset tests.
I used the latest Fast Infoset toolkit from Sun's Glassfish project to create a Fast Infoset-coded version and WinZip to create zipped versions of the document file. The toolkit provides a selection of Fast Infoset parser implementations for Document Object Model (DOM), SAX and StAX style document parsing. StAX is the Streaming API for XML, a "pull-parsing" API which many programmers find easier to work with than SAX. I timed only the DOM-creating parser since that is most likely to be used in SOAP Web services.
Examining the Fast Infoset formatted file revealed that all the XML tags were compacted to codes but the text content was literal so I also tried zipping the Fast Infoset file. Here are the resulting file sizes:
Plain XML text file: 769,396 bytes
Zipped XML text file: 68,715 bytes
Fast Infoset formatted: 548,669 bytes
Zipped Fast Infoset: 71,929 bytes
In order to isolate the effect of the formatting from network delays that would influence an actual Web service, I wrote a test program to time creation of a standard Java org.w3c.dom.Document object from the disk files. I paid due attention to "warming up" the JVM and doing garbage collection outside the timed portions. The Java standard library was used to create an input stream to the parser when reading a zipped file. For timing I used the JAMon open source performance monitor toolkit (heartily recommended.) These times are an average of 10 repetitions in milliseconds.
Parse plain XML text file: 62.8 msec
Parse zipped XML text file: 70.4 msec
Parse Fast Infoset formatted: 31.3 msec
Parse zipped Fast Infoset: 37.4 msec
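The timing procedure described above could be sketched roughly as follows. This is not the original test program: it substitutes `System.nanoTime()` for JAMon, uses GZIP rather than WinZip archives, reads from in-memory byte arrays instead of disk files, and leaves out the Fast Infoset parser because that needs the third-party toolkit on the classpath. All class and method names and the generated sample document are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

public class ParseTiming {

    // Supplies a fresh input stream for each parse.
    interface StreamOpener {
        InputStream open() throws Exception;
    }

    // Average time in milliseconds to build a DOM Document from the stream.
    static double averageParseMillis(StreamOpener source, int warmups, int repetitions)
            throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        for (int i = 0; i < warmups; i++) {          // warm up the JVM, untimed
            try (InputStream in = source.open()) {
                builder.parse(in);
            }
        }
        long totalNanos = 0;
        for (int i = 0; i < repetitions; i++) {
            System.gc();                             // request collection outside the timed part
            try (InputStream in = source.open()) {
                long start = System.nanoTime();
                builder.parse(in);                   // decompression, if any, happens here
                totalNanos += System.nanoTime() - start;
            }
        }
        return totalNanos / 1_000_000.0 / repetitions;
    }

    // Made-up test document with mostly text content and no namespaces.
    static byte[] sampleXml(int questions) {
        StringBuilder xml = new StringBuilder("<questions>");
        for (int i = 0; i < questions; i++) {
            xml.append("<question><text>Sample question ").append(i)
               .append("</text><answer>Sample answer</answer></question>");
        }
        return xml.append("</questions>").toString().getBytes(StandardCharsets.UTF_8);
    }

    static byte[] gzip(byte[] plain) throws Exception {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(plain);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] plain = sampleXml(5000);
        byte[] packed = gzip(plain);
        double plainMs = averageParseMillis(() -> new ByteArrayInputStream(plain), 5, 10);
        double gzipMs = averageParseMillis(
                () -> new GZIPInputStream(new ByteArrayInputStream(packed)), 5, 10);
        System.out.printf("Plain XML: %.1f msec%n", plainMs);
        System.out.printf("GZIP XML:  %.1f msec%n", gzipMs);
    }
}
```

Because the decompressing stream is read lazily by the parser, the cost of unzipping falls inside the timed region, which matches the intent of comparing end-to-end parse times. Absolute numbers from such a sketch will vary by machine and JVM and should not be compared with the figures above.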
I conclude that Fast Infoset encoding makes a substantial reduction in the time required to parse an XML document but only modestly reduces the bulk, while zip encoding is great at reducing the bulk at some cost in processing time. I think we can look forward to greater use of Fast Infoset in Web services as the tools become more widespread.
The W3C XML Information Set Recommendation: http://www.w3.org/TR/xml-infoset/
The ITU Telecommunications Standardization Site (X.891 is the Fast Infoset standard): http://www.itu.int/ITU-T/
Java Web Services Developer Pack Version 2.0: http://java.sun.com/webservices/jwsdp/index.jsp
Fast Infoset Project in Glassfish: https://fi.dev.java.net/
Open source Java performance monitor: http://jamonapi.sourceforge.net/
A survey of earlier XML compression articles: http://xml.coverpages.org/xmlAndCompression.html
This was first published in July 2006