
Untangling Unicode encoding in XML


Ed Tittel
01.15.2003

XML Developer Tip

My last tip dealt with including the Euro currency symbol in XML documents by using various forms of Unicode character entity references. It prompted an unexpected blizzard of e-mail asking for help with the details of working with the many different forms that Unicode can take. That sent me on an expedition to locate good references and tutorials on the subject, which in turn led me to the subject of this week's tip: a profound bow of gratitude toward Mike J. Brown's excellent Web resource, "The skew.org XML Tutorial". The tutorial concentrates not just on XML in general but also on XML encoding strategies. It also covers the differences between Unicode (a term that is bandied about, as I've done here, to describe a mammoth collection and codification of character codes, alphabets, and other typographical marks) and the standard that actually governs XML character encoding, namely ISO/IEC 10646-1. Brown cuts through these matters by simply calling it the Universal Character Set, or UCS.

The biggest practical difference between the two standards is availability. The Unicode Standard is available online at www.unicode.org and is well and affordably documented in the Unicode Consortium's excellent publications from Addison-Wesley, of which the most current is The Unicode Standard, Version 3.0 (Addison-Wesley, 2000). The official ISO/IEC 10646-1 documentation comes in numerous pieces (as many as six, in fact) and costs hundreds of dollars or more for electronic, CD, or paper copies, which are available only from the Web site at www.iso.org. Brown also recommends Tony Graham's Unicode: A Primer (Wiley, 2000) as another valuable resource on the topic, one that explains the differences between Unicode and ISO 10646 more thoroughly than his own tutorial does.

Brown's tutorial does numerous wonderful things to help XML content and tool developers wrap their minds around the many minutiae of getting Unicode/10646 encoding right in XML documents and in the tools that deal with such documents, including:

  • The best introduction to the specific terminology of character encoding and its precise meanings (this turns out to be far more important than you might guess).
  • Mappings between various important character sets, including ASCII, the various ISO Latin character sets (denoted ISO/IEC 8859-X, where X runs between 1 and 15 at last check), the WGL4 Windows Glyph List (version 4) that Microsoft defined with Agfa Monotype and implements in most Windows fonts, and the Adobe Glyph List (AGL), itself a superset of WGL4.
  • An explanation of how the UCS code space is divided into 17 planes, each of which accommodates up to 65,536 code points (a 16-bit space, in other words, for 1,114,112 code points in all), and how general character encoding works.
  • The process whereby abstract characters like Զ are converted to specific numeric codes that some device can recognize and render.
  • Documentation of the common encoding schemes used to turn those numeric codes into bytes, such as UTF-8 and UTF-16, how these work in XML, and how to declare them in XML documents.
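
To make the last three items concrete, here is a minimal sketch in Python (my choice of language, not anything taken from Brown's tutorial). It uses only the standard library; the helper name `describe` and the sample `<price>` document are hypothetical, invented purely for illustration. It shows how a character maps to a UCS code point, which plane that code point falls in, and which bytes UTF-8 and UTF-16 produce for it.

```python
import xml.etree.ElementTree as ET


def describe(ch: str) -> dict:
    """Summarize how one character maps to a code point, a plane, and bytes."""
    cp = ord(ch)  # abstract character -> UCS code point (an integer)
    return {
        "code_point": f"U+{cp:04X}",
        "plane": cp >> 16,  # each plane holds 2**16 = 65,536 code points
        "utf8": ch.encode("utf-8").hex(" "),
        "utf16be": ch.encode("utf-16-be").hex(" "),
        "xml_ref": f"&#x{cp:X};",  # hexadecimal numeric character reference
    }


armenian_za = describe("\u0536")  # the Armenian letter used as an example above
euro = describe("\u20ac")         # the Euro sign from the previous tip
g_clef = describe("\U0001D11E")   # a character outside plane 0

# The encoding declaration tells the parser how to decode the document's
# bytes; the character reference &#x20AC; is resolved after decoding.
doc = '<?xml version="1.0" encoding="UTF-8"?><price>&#x20AC;100</price>'
price_text = ET.fromstring(doc.encode("utf-8")).text

# 17 planes of 65,536 code points each.
total_code_points = 17 * 2**16

for name, info in (("Za", armenian_za), ("Euro", euro), ("G clef", g_clef)):
    print(name, info)
print(price_text, total_code_points)
```

Note that the same code point yields different byte sequences under UTF-8 and UTF-16, and that a character outside plane 0 takes four bytes in either encoding (a surrogate pair, in the case of UTF-16).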

By working your way through this excellent collection of materials, you should be much better equipped to understand and use UCS encodings in your XML documents. Having worked around the topic for nearly five years now, I nevertheless learned a lot about UCS encodings from this resource myself; with luck, you will have the same experience.


About the Author

Ed Tittel is a principal at LANWrights, Inc., a network-oriented writing, training, and consulting firm based in Austin, Texas. He is the creator of the Exam Cram series and has worked on over 30 certification-related books on Microsoft, Novell, and Sun-related topics. Ed teaches in the Certified Webmaster Program at Austin Community College and consults. He is a member of the NetWorld + Interop faculty, where he specializes in Windows 2000-related courses and presentations.











All Rights Reserved, Copyright 2001 - 2009, TechTarget