Ever since the beginning of web services as a class of application, performance, in terms of response time and memory requirements, has been a major consideration. In particular, the use of XML and SOAP is seen as an obstacle to high performance and the developers of various toolkits have devoted much effort to fixing performance problems. For purposes of this article, I am going to break down web service performance into the following elements:
- Message transmission
- Message parsing
- Service object creation
- Backend processes
- Response creation
Message transmission
REST-style service requests are generally very compact, but SOAP messages are wordy, with a small ratio of payload to total message size. The proliferation of namespace references in SOAP messages adds further to the bandwidth requirement. The Fast Infoset standard addresses this problem by defining a binary encoding of XML.
There have been numerous other attempts to create more compact XML messages. Many people have concluded that standard gzip compression produces a satisfactorily compact message. Since gzip libraries exist for many languages and gzip support is widely available on HTTP servers, it is easy to implement.
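As a rough illustration of the gzip approach, here is a minimal Java sketch that compresses a SOAP-style message with the JDK's java.util.zip classes and restores it again. The class name, the sample message, and the example.com namespace are invented for the demonstration; real toolkits and HTTP servers normally apply gzip at the transport level rather than in service code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipSketch {

    /** Compresses the UTF-8 bytes of a message with gzip. */
    static byte[] compress(String message) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(message.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed by now, so the trailer has been written.
        return buffer.toByteArray();
    }

    /** Restores the original message text from gzip-compressed bytes. */
    static String decompress(byte[] compressed) {
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a made-up, repetitive SOAP response (hypothetical schema). */
    static String sampleMessage(int items) {
        StringBuilder xml = new StringBuilder();
        xml.append("<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">")
           .append("<soap:Body><m:QuoteList xmlns:m=\"http://example.com/stock\">");
        for (int i = 0; i < items; i++) {
            xml.append("<m:quote><m:symbol>SYM").append(i)
               .append("</m:symbol><m:price>").append(100 + i)
               .append("</m:price></m:quote>");
        }
        return xml.append("</m:QuoteList></soap:Body></soap:Envelope>").toString();
    }

    public static void main(String[] args) {
        String message = sampleMessage(50);
        byte[] packed = compress(message);
        System.out.println("original bytes:   "
            + message.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("compressed bytes: " + packed.length);
    }
}
```

Because SOAP markup repeats the same tags and namespace prefixes many times, it tends to compress well, but the actual ratio depends on the message.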
Message parsing
In this step, the service has to determine the destination of the request and, with some toolkits, extract method parameters. With a RESTful Web service, the destination and parameters can be extracted directly from the URL, but SOAP messages have to be parsed.
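To show how little work the RESTful case involves, the following sketch pulls the destination and parameters out of a request URL using only java.net classes. The URL layout, the parameter names, and the class name are hypothetical, and a real framework would do this routing for you.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class RestRequestSketch {

    /** The path part of the URL identifies the destination resource. */
    static String destination(String url) {
        return URI.create(url).getPath();
    }

    /** The query string carries the parameters as name=value pairs. */
    static Map<String, String> parameters(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = URI.create(url).getRawQuery();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // Decode percent-escapes only after splitting on '&' and '='.
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        String url = "http://example.com/stock/quote?symbol=IBM&currency=USD";
        System.out.println(destination(url));
        System.out.println(parameters(url));
    }
}
```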
A lot of effort has gone into improving the XML parsing stage of SOAP request handling because it frequently appears to be a bottleneck. All XML parsers must scan the incoming stream of characters and recognize the various XML elements, packaging the element data into programming language specific chunks such as Java objects. Where parsers differ is in how these chunks of data are handled.
In the document object model, or DOM, all XML tags and the actual request data end up in memory objects that reproduce the structure of the original XML document. Constructing the DOM takes a lot of CPU time and memory. One advantage of the DOM is that tools such as XPath can be used to locate data with a simplified notation. The original Apache SOAP toolkit used the DOM, but this was found to be slow and to use a lot of memory.
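The DOM-plus-XPath style can be sketched with the parser and XPath classes that ship with the JDK. The request message, the example.com namespace, and the class name below are made up, and the XPath expression uses local-name() only to avoid setting up a namespace context in a short example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomXPathSketch {

    // Hypothetical SOAP request used only for this demonstration.
    static final String REQUEST =
        "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
        + "<soap:Body><m:GetQuote xmlns:m=\"http://example.com/stock\">"
        + "<m:symbol>IBM</m:symbol></m:GetQuote></soap:Body></soap:Envelope>";

    /** Builds the whole document in memory, then queries it with XPath. */
    static String findSymbol(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // The complete tree is constructed before any data can be read.
            Document dom = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Returns the text of the first element named "symbol".
            return xpath.evaluate("//*[local-name()='symbol']", dom);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse request", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findSymbol(REQUEST));
    }
}
```

The one-line query is the convenience; the cost is that every element of the message becomes an object in memory, whether or not the service needs it.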
A newer approach to SOAP, used by both the XFire and Axis2 toolkit projects, is intended to achieve high performance by using an event-oriented parser API known as StAX (the Streaming API for XML). StAX is called a "pull" parser because it generates events only as the application requests them. This has the advantage that the service can usually find all the data it needs and stop parsing before the complete SOAP message has been read, and memory requirements are much lower.
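A pull-parsing loop might look like the sketch below, which uses the javax.xml.stream API from the JDK. It asks for one event at a time and returns as soon as the wanted element has been read, leaving the rest of the message unparsed. The class and method names are illustrative and are not taken from XFire or Axis2.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxPullSketch {

    /** Returns the text of the first element with the given local name, or null. */
    static String firstElementText(String xml, String localName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            try {
                // Each call to next() pulls exactly one event from the parser.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && localName.equals(reader.getLocalName())) {
                        // Found it: read the text and stop asking for events.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse request", e);
        }
    }

    public static void main(String[] args) {
        String request =
            "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
            + "<soap:Body><m:GetQuote xmlns:m=\"http://example.com/stock\">"
            + "<m:symbol>IBM</m:symbol><m:note>never examined</m:note>"
            + "</m:GetQuote></soap:Body></soap:Envelope>";
        System.out.println(firstElementText(request, "symbol"));
    }
}
```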
The XFire toolkit got good publicity for speed when it started in 2005. Version 1.2.6 of XFire was released in May 2007. However, at that point the XFire project merged with IONA's Celtix project to become a new Apache project called CXF. CXF is separate from the Axis2 project so there are now two completely separate open source high performance web service projects housed by the Apache Software Foundation. In these StAX based toolkits, parsing occurs at the same time as the service object creation step.
Service object creation
This is often referred to as the "databinding" step, in which values extracted from the service request are passed to the method that will carry out the request. With Java-based services, this typically involves creating a new object that is initialized with the extracted values.
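To make the idea concrete, here is a hand-written binding sketch that fills a plain Java object from request XML while the parser is still reading it. The QuoteRequest class, its fields, and the element names are hypothetical. Databinding frameworks generate or automate this kind of code, so this is an illustration of the concept and not any toolkit's API.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class DataBindingSketch {

    /** Hypothetical service object initialized from the request. */
    static final class QuoteRequest {
        final String symbol;
        final int quantity;

        QuoteRequest(String symbol, int quantity) {
            this.symbol = symbol;
            this.quantity = quantity;
        }
    }

    /** Extracts values from the XML and creates the service object. */
    static QuoteRequest bind(String xml) {
        String symbol = null;
        int quantity = 0;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                // Map each recognized element onto a field of the object.
                switch (reader.getLocalName()) {
                    case "symbol":
                        symbol = reader.getElementText();
                        break;
                    case "quantity":
                        quantity = Integer.parseInt(reader.getElementText().trim());
                        break;
                    default:
                        break; // wrapper elements carry no data of their own
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not bind request", e);
        }
        return new QuoteRequest(symbol, quantity);
    }

    /** Stand-in for the method that carries out the service request. */
    static String handle(QuoteRequest request) {
        return "quote for " + request.quantity + " x " + request.symbol;
    }

    public static void main(String[] args) {
        String xml = "<GetQuote><symbol>IBM</symbol><quantity>100</quantity></GetQuote>";
        System.out.println(handle(bind(xml)));
    }
}
```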
There has been an astonishing proliferation of schemes for this step, some developed from scratch for web services and some as add-ons to existing application frameworks like Spring. From the benchmarks I have seen, it appears that the choice of databinding approach can make a big difference in throughput. Both CXF and Axis2 attempt to support as many databinding schemes as possible, to make it easier for developers currently using an existing toolkit to get started with web services. Some widely used databinding approaches are:
- JAXB: The Java Architecture for XML Binding is bundled with the standard Java library as of Java SE 6.
- POJO: Both Axis2 and CXF provide one or more methods for creating web services based on Plain Old Java Objects.
- Spring Framework objects: Spring is a widely used open source application framework.
- Castor: Castor is a popular open-source XML binding framework for Java.
Backend processes
Depending on the design of a Web service, backend processes, such as collating the results of database searches, may consume much more time than all of the other phases. If your potential Web service falls into this category, it makes no sense to choose your Web service toolkit on the basis of benchmarks that measure response time using only fake results.
Standard techniques for caching any results of complex backend processes are clearly a good idea. Depending on the frequency of duplicate requests, this may yield impressive speedups. Microsoft's tools for Web services created using ASP.NET include a mechanism for the programmer to specify that the system should copy output to a cache where it is stored for a specified amount of time.
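The time-limited caching idea can be sketched in a few lines of Java. This is a generic illustration and not the ASP.NET mechanism. The class name, the injected clock, and the backend function are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

class ResultCacheSketch<K, V> {

    /** A cached value together with the time at which it stops being valid. */
    private static final class Entry<V> {
        final V value;
        final long expiresAt;

        Entry(V value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> backend; // the slow process being cached
    private final long ttlMillis;         // how long a result stays fresh
    private final LongSupplier clock;     // injectable so tests can control time

    ResultCacheSketch(Function<K, V> backend, long ttlMillis, LongSupplier clock) {
        this.backend = backend;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns a fresh cached result, or calls the backend and stores the answer. */
    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, old) ->
            (old != null && now < old.expiresAt)
                ? old
                : new Entry<V>(backend.apply(k), now + ttlMillis));
        return entry.value;
    }

    public static void main(String[] args) {
        ResultCacheSketch<String, String> cache = new ResultCacheSketch<String, String>(
            symbol -> "result for " + symbol, // stand-in for a database search
            60_000,
            System::currentTimeMillis);
        System.out.println(cache.get("IBM")); // calls the backend
        System.out.println(cache.get("IBM")); // served from the cache
    }
}
```

Running the backend call inside compute keeps the sketch short but blocks other updates to the same key while it runs. A production cache would also need eviction and probably a size limit.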
Response creation
In the original Apache SOAP toolkit, responses were created by building a DOM of the complete output message and then serializing it to the response stream. This was clearly very inefficient. Generally, the various databinding toolkits also provide mechanisms for serializing object data back into XML. It is up to Web service tools such as Axis2 and CXF to enclose this data in the correct SOAP envelope.
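As an alternative to building a DOM for the output, a response can be written directly to a stream. The sketch below uses the JDK's XMLStreamWriter to emit a SOAP envelope around two values. The response schema, the example.com namespace, and the class name are hypothetical, and it is not meant to show how Axis2 or CXF serialize responses internally.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StreamingResponseSketch {

    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    static final String APP_NS = "http://example.com/stock"; // hypothetical

    /** Writes the envelope and payload element by element, with no tree in memory. */
    static String quoteResponse(String symbol, String price) {
        // A StringWriter stands in for the HTTP response stream of a real service.
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("soap", "Envelope", SOAP_NS);
            w.writeNamespace("soap", SOAP_NS);
            w.writeStartElement("soap", "Body", SOAP_NS);
            w.writeStartElement("m", "GetQuoteResponse", APP_NS);
            w.writeNamespace("m", APP_NS);
            w.writeStartElement("m", "symbol", APP_NS);
            w.writeCharacters(symbol); // special characters are escaped for us
            w.writeEndElement();
            w.writeStartElement("m", "price", APP_NS);
            w.writeCharacters(price);
            w.writeEndElement();
            w.writeEndElement(); // GetQuoteResponse
            w.writeEndElement(); // Body
            w.writeEndElement(); // Envelope
            w.writeEndDocument();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write response", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(quoteResponse("IBM", "120.50"));
    }
}
```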
Resources
- Wikipedia survey of compact XML format projects
- The XFire Project
- Some Apache Software Foundation web service related projects, including Axis 1 and 2 but not CXF
- The Fast Infoset part of the GlassFish server project
- The Java Architecture for XML Binding (JAXB) implementation in the GlassFish server project
- The CXF User's Guide
- The Spring application framework
- Interview with the creators of CXF
This was first published in October 2008