Expert advice on transmitting data: A pros and cons comparison of data transfer formats

Recent discussions on Web service programming forums suggested to me that a review of the various formats for transporting data around networks would be useful for developers new to the field. In this article, I am only going to talk about formats used for language-independent, machine-to-machine transmission; the various transport mechanisms will be a topic for another article.

Text-based data transfer formats:

First, let's take a look at text-based data transfer formats.

XML

XML is a flexible text format for data representation that has solved many problems for developers while creating some new ones. The XML document syntax standard and the many related standards are maintained by W3C working groups, while domain-specific formats are maintained by a variety of organizations.

PROS:
  • Readable and editable by developers
  • Error checking by means of XML Schema and DTDs
  • Can represent complex hierarchies of data
  • Unicode gives flexibility for international operation
  • Plenty of tools in all computer languages for both creation and parsing
CONS:
  • Bulky text with low payload/formatting ratio (but can be compressed)
  • Both creation and client side parsing are CPU intensive
  • Some common word processing characters are illegal (MS Word "smart" punctuation, for example)
  • Images and other binary data require extra encoding
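
As a rough illustration of the points above, the following Python sketch builds a small XML document with the standard library, carries binary data as Base64 text, and parses it back. The element names (order, item, thumbnail) and the sample bytes are invented for this example; they do not come from any real schema.

```python
import base64
import xml.etree.ElementTree as ET


def build_order_xml(image_bytes: bytes) -> str:
    """Build a small XML document; binary data is carried as Base64 text."""
    order = ET.Element("order", attrib={"id": "42"})
    item = ET.SubElement(order, "item")
    item.text = "Caf\u00e9 table"  # Unicode text needs no extra encoding
    thumb = ET.SubElement(order, "thumbnail", attrib={"encoding": "base64"})
    thumb.text = base64.b64encode(image_bytes).decode("ascii")
    return ET.tostring(order, encoding="unicode")


def read_thumbnail(xml_text: str) -> bytes:
    """Parse the document and decode the Base64 payload back to bytes."""
    root = ET.fromstring(xml_text)
    return base64.b64decode(root.findtext("thumbnail"))


raw = bytes([0x89, 0x50, 0x4E, 0x47])  # stand-in for real image data
xml_text = build_order_xml(raw)
print(xml_text)
print(len(raw), "payload bytes became", len(xml_text), "characters of XML")
```

The last line makes the payload/formatting ratio visible: a few bytes of data are wrapped in many more characters of markup.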
JSON

The development and standardization of JavaScript has made the Web browser a powerful tool for dynamic presentation of data by manipulating the appearance and content of HTML elements. In recent years, it has become possible to assemble the JavaScript component of a webpage from multiple sources and to update it repeatedly with data objects. JavaScript Object Notation, or JSON, is the format that carries those data objects.

JavaScript recognizes the usual set of variable types: strings, numbers, arrays and simple objects. The data structures that JSON excels at representing are collections of name/value pairs and ordered lists of data values. Since JSON, like JavaScript itself, is transmitted as plain text, it can be read by other languages, so its uses extend far beyond the Web browser. Thus, JSON is a strong competitor for data transmission in many areas. Recognizing this, RESTful Web service frameworks such as Jersey and Restlet put a lot of effort into supporting JSON.

PROS:
  • Readable and editable by developers
  • Plenty of JavaScript developers
  • Highly developed browser toolkits such as Dojo and jQuery
CONS:
  • Bulky text with low payload/formatting ratio, but not as bad as XML
  • Client CPU time required to parse
  • Not as flexible as XML for some data structures and binary data
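
To show that JSON is usable outside the browser, here is a minimal Python sketch that serializes a collection of name/value pairs and an ordered list, then parses the text back. The record and its field names are hypothetical.

```python
import json

# Hypothetical record: name/value pairs plus an ordered list of values.
reading = {
    "sensor": "roof-1",
    "ok": True,
    "samples": [20.5, 21.0, 21.4],
    "location": {"lat": 30.27, "lon": -97.74},
}

text = json.dumps(reading)    # serialize to plain text
restored = json.loads(text)   # parse the text back into Python objects

print(text)
print(restored["samples"][1])
```

Note that there is no native JSON type for binary data, so images would still need an extra text encoding such as Base64.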
Note on text based formats:
All of the text based formats share the virtue of being readable and editable by developers. This means that you can create and test both ends of a data exchange with fake data. As discussed in my article on testing Web services, this makes a tremendous difference during development.

Plain Text

It is rather easy to represent some sorts of data as lines of plain text in which one line corresponds to a single data item. Spreadsheet rows can be expressed this way using "comma separated values" or CSV. Another common approach is a list of "properties" where each line contains a name/value pair.

PROS:
  • Readable and editable by developers
  • Fairly compact representation for simple types
CONS:
  • Possible confusion introduced by punctuation in values
  • Limited to very simple structures
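
The confusion that punctuation in values can cause is easy to demonstrate. In this Python sketch, with invented sample rows, a naive join on commas splits one value into two fields, while the standard csv module quotes the value so that it survives the round trip.

```python
import csv
import io

rows = [
    ["name", "city", "note"],
    ["Smith, Jane", "Austin", 'said "hello"'],
]

# Naive approach: join on commas. The comma inside the name adds a field.
naive_line = ",".join(rows[1])
naive_fields = naive_line.split(",")
print(len(naive_fields), "fields instead of", len(rows[1]))

# The csv module quotes values containing delimiters or quote characters.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
parsed = list(csv.reader(io.StringIO(buffer.getvalue())))
print(parsed[1])
```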
Binary formats

So much for formats based on text. Next, let's look at some binary formats.

CORBA

The Common Object Request Broker Architecture, or CORBA, was the first serious effort to provide for communication of complex data objects between completely different systems. Much of CORBA is concerned with aspects of communication that we are not talking about here. The CORBA standard, now at version 3.1 (2008), is maintained by the Object Management Group.

PROS:
  • Language and operating system independence
  • Compact data representation
  • Built in mapping in Java covers almost all features
  • Open-source versions are available
CONS:
  • The complete standard is quite complex
  • Interfacing to non-object-oriented languages not easy
  • Incomplete implementation on many systems
Packed Binary

Back in the days when dinosaurs roamed the earth and 300 baud modems were as good as you could get for a remote system, programmers put a lot of ingenuity into packing maximum information into the minimum number of bits. If we only needed integers between 0 and 31, we only used 5 bits, which could share a byte with 3 true or false bits. I suspect that only programmers of deep space probes do much packed binary by hand these days.

PROS:
  • Very compact, approaching theoretical maximum
CONS:
  • Computation intensive
  • Fragile: dropped or damaged data bits are hard to detect and correct
  • Modern programmers not familiar with the idea
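
For readers who have not met the idea, here is a minimal Python sketch of hand packing: a 5-bit integer (0 to 31) and three true/false flags share a single byte. The bit layout, with the integer in the upper five bits, is an arbitrary choice made for this example.

```python
def pack_byte(value: int, flags: tuple) -> int:
    """Pack a 5-bit integer and three booleans into one byte (0..255)."""
    if not 0 <= value <= 31:
        raise ValueError("value must fit in 5 bits (0..31)")
    if len(flags) != 3:
        raise ValueError("exactly three flags are required")
    packed = value << 3                  # integer goes in the upper 5 bits
    for position, flag in enumerate(flags):
        if flag:
            packed |= 1 << position      # each flag takes one of the low 3 bits
    return packed


def unpack_byte(packed: int) -> tuple:
    """Reverse pack_byte: return (value, (flag0, flag1, flag2))."""
    value = packed >> 3
    flags = tuple(bool(packed & (1 << position)) for position in range(3))
    return value, flags


packed = pack_byte(21, (True, False, True))
print(format(packed, "08b"))
print(unpack_byte(packed))
```

Nothing in the byte says what it means or whether it arrived intact, which is why the format is fragile: both ends must agree on the layout in advance.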
Google Protocol Buffers

Google has made the idea of packed binary more practical for real applications with the introduction of Protocol Buffers. This toolkit evolved as a replacement for hand-coded packed binary for exchanging requests and responses with Google index servers. The tools were released as open source in 2008. The intent of this API is to provide a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.

This API produces an encoding of typical data values almost as compact as a hand optimized packed binary. The documentation serves as an introduction to the concept and is suitable for modern programmers. Programmers must use a "proto" syntax to specify the data types to be transmitted and the toolkit takes care of generating the support code for packing and unpacking.

PROS:
  • Very compact representation, approaching theoretical maximum
  • Tools for many languages
  • Not sensitive to version changes
  • Open source license
CONS:
  • Not readable or editable by developers
  • Yet another data definition syntax (proto) to learn
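
One reason the encoding is compact is that the Protocol Buffers wire format stores integers as variable-length "varints", so small values take fewer bytes. The Python sketch below is a hand-written illustration of that idea, not the toolkit's generated code or API. The function names are made up for this example, and it handles non-negative integers only.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low 7 bits first."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)   # high bit set: more bytes follow
        else:
            out.append(low7)          # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single varint from the start of data."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            break
    return result


for number in (1, 300, 70000):
    encoded = encode_varint(number)
    print(number, "->", encoded.hex(), f"({len(encoded)} bytes)")
```

In real use you would not write this by hand; you would describe your messages in the proto syntax and let the generated code do the packing and unpacking.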

This was first published in August 2010