XML Developer Tip
Euro symbols and XML Schema
Ed Tittel and Lucinda Dykes
Character entities are XML general entities that provide a name for a single Unicode character. A character can be referenced with a numeric character reference, for example, the decimal &#8364; or the hexadecimal &#x20AC; for the Euro symbol. In this numeric form, the Euro symbol can be used as data in an XML element or attribute. (Numeric character references are not recognized in CDATA sections and can't be used in XML names, however.) A character can also be referenced by name with a named character entity, for example, &euro; for the Euro symbol, provided that the entity has been declared.
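To see how a parser treats these forms, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The choice of parser is ours, not the article's, and the price element is borrowed from the order example used later in this tip. It checks that the decimal and hexadecimal references resolve to the same character, that a reference inside a CDATA section stays literal, and that an undeclared named entity is rejected.

```python
import xml.etree.ElementTree as ET

EURO = "\u20ac"  # U+20AC, the Euro sign

# Decimal and hexadecimal numeric character references for the same character.
decimal_text = ET.fromstring("<price>40 &#8364;</price>").text
hex_text = ET.fromstring("<price>40 &#x20AC;</price>").text

# Numeric references are also resolved inside attribute values.
currency = ET.fromstring('<price currency="&#x20AC;">40</price>').get("currency")

# Inside a CDATA section the same characters stay literal text.
cdata_text = ET.fromstring("<price><![CDATA[40 &#8364;]]></price>").text

# A named reference with no declaration in scope is a well-formedness error.
try:
    ET.fromstring("<price>40 &euro;</price>")
    named_ok = True
except ET.ParseError:
    named_ok = False

print(decimal_text == hex_text == "40 " + EURO)
print(currency == EURO)
print(cdata_text)
print(named_ok)
```

The last check is the reason the rest of this tip exists: without a declaration somewhere, &euro; is simply not usable.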
Developers accustomed to declaring entities in DTDs soon discover that XML Schema offers no comparable mechanism for declaring entities. Entities continue to be such a controversial area of XML Schema that the W3C XML Core Working Group issued a consensus statement on the topic of character entities in November 2002. In short, the consensus statement says that DTDs are an existing mechanism for declaring entities, and therefore there is no need to create a new way to include entities in XML documents.
So how do you include entities when using XML Schema? Let's start with the W3C XML Schema Recommendation, and then look at a new approach.
The W3C Schema Recommendation offers two different methods for using entities with XML Schema. The first method is to declare the entity in an internal DTD subset. For example, the following XML instance document declares and uses an entity for the Euro symbol:
<?xml version="1.0" ?>
<!DOCTYPE Order [
  <!ENTITY euro "&#8364;">
]>
<Order xmlns="http://www.LANwrights.com/euro.xsd">
  <item>
    <title>XML Schema</title>
    <price>40 &euro;</price>
  </item>
</Order>
This instance document is constrained by a schema. When the document is processed, the entity reference is replaced by the entity's content. Because this happens before schema validation, the schema processor sees 40 € as the value of the price element. However, what the W3C Recommendation fails to mention is that if an internal DTD subset is used and the parser also validates against the DTD, you'll need to include ELEMENT declarations for every element in the document, as well as ATTLIST declarations for their attributes, to avoid validation errors.
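The expansion step can be observed without a schema processor. The sketch below is an illustration with Python's standard xml.etree.ElementTree, not part of the W3C method itself. It parses an order document like the one above and inspects the price element. We assume the entity is declared with the decimal reference &#8364; for the Euro sign.

```python
import xml.etree.ElementTree as ET

# Order document with an internal DTD subset that declares the euro entity.
# The entity value (&#8364;) is an assumption: any reference to U+20AC would do.
ORDER_DOC = """<?xml version="1.0"?>
<!DOCTYPE Order [
  <!ENTITY euro "&#8364;">
]>
<Order xmlns="http://www.LANwrights.com/euro.xsd">
  <item>
    <title>XML Schema</title>
    <price>40 &euro;</price>
  </item>
</Order>"""

NS = {"o": "http://www.LANwrights.com/euro.xsd"}

root = ET.fromstring(ORDER_DOC)
price = root.find("o:item/o:price", NS)

# By the time the application (or a schema processor) sees the element,
# the entity reference has already been replaced by the character itself.
price_text = price.text
print(ascii(price_text))
```

This parser reads the internal subset for its entity declarations but does not validate against it, so no ELEMENT or ATTLIST declarations are needed here. A validating parser would be stricter.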
The second method is to declare the character entity as an element within a schema document, and make the element's content fixed, for example:
<xsd:element name="euro" type="xsd:token" fixed="€"/>
This element can then be used in an XML instance document, such as the following:
<?xml version="1.0" ?>
<Order xmlns="http://www.LANwrights.com/euro.xsd"
       xmlns:ce="http://www.LANwrights.com/characterEntities">
  <item>
    <title>XML Schema</title>
    <price>40 <ce:euro/></price>
  </item>
</Order>
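One consequence of this method is worth seeing from the consuming side: the Euro sign is no longer part of the price text but a child element, so the application has to supply the character itself. The following sketch uses Python's standard xml.etree.ElementTree. The FIXED_VALUES table and the effective_value helper are hypothetical names of ours that only approximate what a schema processor does with a fixed value.

```python
import xml.etree.ElementTree as ET

ORDER_NS = "http://www.LANwrights.com/euro.xsd"
CE_NS = "http://www.LANwrights.com/characterEntities"

# Assumed mirror of the schema's fixed value for the ce:euro element.
FIXED_VALUES = {f"{{{CE_NS}}}euro": "\u20ac"}

ORDER_DOC = f"""<Order xmlns="{ORDER_NS}" xmlns:ce="{CE_NS}">
  <item>
    <title>XML Schema</title>
    <price>40 <ce:euro/></price>
  </item>
</Order>"""


def effective_value(elem):
    """Hypothetical helper: apply 'fixed' semantics to a character element."""
    fixed = FIXED_VALUES[elem.tag]
    content = (elem.text or "").strip()
    # An empty element takes the fixed value; any other content must match it.
    if content and content != fixed:
        raise ValueError(f"{elem.tag} must be empty or contain {fixed!r}")
    return fixed


root = ET.fromstring(ORDER_DOC)
price = root.find("o:item/o:price", {"o": ORDER_NS})
euro_elem = price[0]

price_text = price.text                  # only the text before the child element
euro_value = effective_value(euro_elem)  # the character carried by <ce:euro/>

print(ascii(price_text), ascii(euro_value))
```

Note that price now has mixed content (text plus a child element), which the schema's declaration for price has to allow.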
There is yet another approach. This method uses an XSLT library called xmlchar that provides named elements for all of the character entities of HTML 4, including the Euro character. It is similar to the second W3C method mentioned earlier, but it's designed to be used with XSLT stylesheets. This approach works only for element content, not attribute values. The xmlchar stylesheet can be imported into an existing XSLT stylesheet using <xsl:import>, as follows:
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="xmlchar-1.1/html4-all.xsl"/>
Any xmlchar elements added to an XML document will be converted to the appropriate character entity in the output.
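To make the conversion concrete, here is a rough stand-in for it written in Python rather than XSLT. It is not xmlchar and does not use xmlchar's real namespace or element names. CHAR_NS, CHAR_MAP, and inline_char_elements are placeholders of ours. The sketch replaces each character element with the character it names and, like the approach described above, touches only element content.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace and element name; xmlchar defines its own.
CHAR_NS = "http://www.LANwrights.com/characterEntities"
CHAR_MAP = {f"{{{CHAR_NS}}}euro": "\u20ac"}


def inline_char_elements(parent, char_map):
    """Replace child elements listed in char_map with their characters, in place."""
    previous = None  # last child that was kept
    for child in list(parent):
        if child.tag in char_map:
            # The character plus whatever text followed the element.
            replacement = char_map[child.tag] + (child.tail or "")
            if previous is None:
                parent.text = (parent.text or "") + replacement
            else:
                previous.tail = (previous.tail or "") + replacement
            parent.remove(child)
        else:
            inline_char_elements(child, char_map)
            previous = child


doc = ET.fromstring(f'<price xmlns:ce="{CHAR_NS}">40 <ce:euro/> each</price>')
inline_char_elements(doc, CHAR_MAP)

result_text = doc.text
print(ascii(result_text))
```

A real stylesheet does the same job declaratively with one template per character element. The Python version is only meant to show what ends up in the output.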
Until the time that a more straightforward method exists for using character entities with XML schemas (and yes, despite the consensus statement, we're still hoping!), you can use character entities and XML schemas by using one of these three techniques.
For more information on xmlchar, see the article "Named Character Elements for XML" by Anthony Coates and Zarella Rendon. For more details on using the Euro character in XML documents, see Rick Jelliffe's "Euro-XML".
About the Authors
Ed Tittel is a 20-plus-year veteran of the computing industry who has worked as a programmer, manager, systems engineer, instructor, writer, trainer, and consultant. He's also the series editor of Que Certification's Exam Cram 2 and Training Guide series, and writes and teaches regularly on Web markup languages and related topics.
Lucinda Dykes is the principal at Zero G Web Design in Santa Fe, New Mexico, and has been developing Websites and writing code since 1994. She teaches Web-related topics at Santa Fe Community College and has contributed to numerous books on XML.