Using RELAX NG for data integration

Is XML Schema proving a difficult fit when it comes to data integration? Perhaps you should take a look at RELAX NG.

In a tip I wrote almost a year ago, entitled "Relax NG, the XML Schema Alternative," I explained why companies or individuals might want to work with this powerful schema language, even though the W3C has published a set of formal XML Schema recommendations. These points are as true today as they were when I first wrote them:

  • The language is designed to be simple and easy to learn, which many would observe is not the case for XML Schema.
  • The language includes both an XML syntax and a compact non-XML syntax. It also supports XML namespaces and does not change the information set of any document it touches.
  • It works with XML Schema Datatypes (just as XML Schema itself does) and can therefore draw on the expressive power of that datatype specification.

To amplify the final bullet item, I'd now like to add that RELAX NG's datatype support allows plug-in datatype libraries to be defined and used. This is extremely helpful when you must accommodate new data sources and new types of data that need to be described, defined and processed.

To this litany of reasons for considering RELAX NG, I'd like to add the following observations, which together form the central thesis of this tip:

  • RELAX NG is backward compatible all the way to the original XML 1.0 DTD (XML Schema is not). Thus RELAX NG is also quite adept at handling XML documents of all kinds, including DTD-based and Schema-based representations.
  • RELAX NG is well suited to generating schemas from different data sources, partly because it's more approachable and transparent than XML Schema and partly because its interleave pattern allows sub-elements to occur in any order within a parent element (and thus avoids the need to groom data sets for ordering).
  • RELAX NG works with a variety of development toolsets, including the JAXB libraries in the Java Web Services Developer Pack, oXygen, Jing, Trang and many more. Check out the Software resources at the RELAX NG home page for more information.
  • At its core, a RELAX NG schema specifies a pattern for structure and content in an XML document. Thus a RELAX NG schema identifies a class of XML documents: all XML document instances that match its pattern. This makes it unusually helpful for working with XML output from databases, document management systems and applications (including Office 2007, which now uses XML-based formats as the default file type for all constituent apps).
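
To make the pattern idea and the any-order point concrete, here is a minimal sketch in Python. The schema uses RELAX NG's XML syntax, with an interleave pattern and the XML Schema Datatypes library; the element names (customer, name, email, age) and the helper functions are invented for illustration. The check is a toy that understands only this one schema shape and ignores datatypes. It is not a RELAX NG validator; for real validation you would hand the schema to a tool such as Jing.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"

# Hypothetical schema: a <customer> whose three children may appear in any order.
SCHEMA = f"""
<element name="customer" xmlns="{RNG_NS}"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <interleave>
    <element name="name"><text/></element>
    <element name="email"><text/></element>
    <element name="age"><data type="positiveInteger"/></element>
  </interleave>
</element>
"""

def required_children(schema_xml):
    """Return (root name, set of child names) for this one schema shape."""
    root = ET.fromstring(schema_xml)
    interleave = root.find(f"{{{RNG_NS}}}interleave")
    names = {el.get("name") for el in interleave.findall(f"{{{RNG_NS}}}element")}
    return root.get("name"), names

def matches_in_any_order(doc_xml, schema_xml):
    """Toy check: same root, each required child exactly once, order ignored."""
    root_name, names = required_children(schema_xml)
    doc = ET.fromstring(doc_xml)
    tags = [child.tag for child in doc]
    return doc.tag == root_name and sorted(tags) == sorted(names)

# Same data, different child order, plus one document with children missing.
doc_a = "<customer><name>Ann</name><email>ann@example.com</email><age>41</age></customer>"
doc_b = "<customer><age>41</age><email>ann@example.com</email><name>Ann</name></customer>"
doc_c = "<customer><name>Ann</name></customer>"

for doc in (doc_a, doc_b, doc_c):
    print(matches_in_any_order(doc, SCHEMA))
```

Both doc_a and doc_b belong to the class of documents the pattern describes, even though two data sources emitted the children in different orders; doc_c does not, because required children are missing.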

Ultimately, if you can get at the structure of an XML document programmatically, you can use that access to construct a RELAX NG schema that represents its pattern. With that schema in hand, you can then work with any other document that matches the same pattern. Given its programmer-friendly construction and easy-to-understand grammar and syntax, RELAX NG is a powerful addition to the arsenals of those who must move data into and out of XML forms.
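
As a rough illustration of that idea, the following Python sketch walks a sample document with the standard library and emits a pattern in RELAX NG's compact syntax. Everything here is an assumption made for the example: the infer_pattern helper and the sample order document are hypothetical, the document uses no namespaces, no element name collides with a compact-syntax keyword, and a repeated child is described by its first occurrence only. A production tool such as Trang handles far more cases.

```python
import xml.etree.ElementTree as ET

def infer_pattern(elem, indent=0):
    """Return a RELAX NG compact-syntax pattern describing one sample element."""
    pad = "  " * indent
    # Every attribute seen on the sample becomes a required text attribute.
    parts = [f"{pad}  attribute {name} {{ text }}" for name in sorted(elem.attrib)]
    seen = []
    for child in elem:
        if child.tag in seen:
            continue  # simplification: describe each child name only once
        seen.append(child.tag)
        repeated = len(elem.findall(child.tag)) > 1
        # A child that occurs more than once is marked one-or-more with "+".
        parts.append(infer_pattern(child, indent + 1) + ("+" if repeated else ""))
    if len(elem) == 0:
        parts.append(f"{pad}  text")  # leaf elements are treated as plain text
    body = ",\n".join(parts)
    return f"{pad}element {elem.tag} {{\n{body}\n{pad}}}"

# Hypothetical sample document standing in for output from some data source.
SAMPLE = """
<order id="42">
  <customer>Ann</customer>
  <item sku="A1">Widget</item>
  <item sku="B2">Gadget</item>
</order>
"""

schema = infer_pattern(ET.fromstring(SAMPLE))
print(schema)
```

The sketch joins child patterns with a comma, which means "in this order" in the compact syntax. Joining them with & instead would produce an interleave, which is the better choice when different sources emit the same children in different orders.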

About the author

Ed Tittel is a full-time writer and trainer whose interests include XML and development topics, along with IT certification and information security topics. Among his many XML projects are XML For Dummies, 4th edition (Wiley, 2005) and Schaum's Easy Outline of XML (McGraw-Hill, 2004). E-mail Ed at etittel@techtarget.com with comments, questions or suggested topics or tools for review.


This was first published in October 2007