Untangling Unicode encoding in XML


Ed Tittel
01.15.2003

My last tip dealt with including the Euro currency symbol in XML documents, using various forms of Unicode character entity references. It prompted an unexpected blizzard of e-mail asking for help with the details of working with the many different forms that Unicode can take. That sent me on an expedition to locate good references and tutorials on the subject, which in turn led me to the subject of this week's tip: a profound bow of gratitude toward Mike J. Brown's excellent Web resource entitled "The skew.org XML Tutorial." This tutorial concentrates not just on XML in general, but also on XML encoding strategies. It also covers the differences between Unicode (a term that is bandied about, as I've done here, to describe a mammoth collection and codification of character codes, alphabets, and other typographical marks) and the standard that actually governs XML character encoding, namely ISO/IEC 10646-1. Brown cuts through these matters by calling this the Universal Character Set, or UCS.

The biggest practical difference between the two standards is that the Unicode Standard is available online at www.unicode.org and is well and affordably documented in Addison-Wesley's various versions of the Unicode Consortium's excellent publications, of which the most current version is The Unicode Standard 3.0 (Addison-Wesley, 2000). The ISO/IEC 10646-1 official documentation comes in numerous pieces (as many as six, in fact) and costs hundreds of dollars and up for electronic, CD, or paper copies available only from the Web site at www.iso.org. Brown also recommends Tony Graham's Unicode: A Primer (Wiley, 2000) as another valuable resource on the topic, one that explains the differences between Unicode and ISO 10646 more thoroughly than his tutorial, in fact.

Brown's tutorial does numerous wonderful things to help XML content and tool developers wrap their minds around the many minutiae of getting Unicode/10646 encoding right, both in XML documents and in the tools that deal with such documents.

By working your way through this excellent collection of materials, you should be much better equipped to understand and use UCS encodings in your XML documents. Despite having worked around the topic for nearly five years now, I learned a lot about UCS encodings from this resource myself; I hope you will have the same experience.
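
To give a small taste of the "many different forms" a single UCS character can take, here is a minimal sketch in Python using the Euro sign (U+20AC) from my last tip. It is my own illustration, not something taken from Brown's tutorial, and the element names (price, a, b) and variable names are made up for the example. It assumes only Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

EURO = "\u20ac"  # EURO SIGN, UCS code point U+20AC

# The same code point written as decimal and hexadecimal
# numeric character references.
decimal_ref = f"&#{ord(EURO)};"    # "&#8364;"
hex_ref = f"&#x{ord(EURO):X};"     # "&#x20AC;"

# Both references resolve to the same single character when parsed.
doc = f"<price><a>{decimal_ref}</a><b>{hex_ref}</b></price>"
root = ET.fromstring(doc)
text_a = root.find("a").text
text_b = root.find("b").text

# The character is one code point, but its byte form depends on the encoding.
utf8_bytes = EURO.encode("utf-8")       # three bytes: E2 82 AC
utf16_bytes = EURO.encode("utf-16-be")  # two bytes:   20 AC

# The same document serialized in two encodings; the parser relies on the
# encoding declaration (and, for UTF-16, the byte order mark) to decode it.
xml_utf8 = f'<?xml version="1.0" encoding="UTF-8"?><price>{EURO}</price>'.encode("utf-8")
xml_utf16 = f'<?xml version="1.0" encoding="UTF-16"?><price>{EURO}</price>'.encode("utf-16")
from_utf8 = ET.fromstring(xml_utf8).text
from_utf16 = ET.fromstring(xml_utf16).text
```

In every case the parsed text is the same one-character string; only the serialized form differs, which is exactly the distinction between the character set and its encodings.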


About the Author

Ed Tittel is a principal at LANWrights, Inc., a network-oriented writing, training, and consulting firm based in Austin, Texas. He is the creator of the Exam Cram series and has worked on over 30 certification-related books on Microsoft-, Novell-, and Sun-related topics. Ed teaches in the Certified Webmaster Program at Austin Community College and consults. He is a member of the NetWorld + Interop faculty, where he specializes in Windows 2000-related courses and presentations.

