Tip

Two tricky techniques for preserving character entities in XSLT 2.0

Thanks to a recent story by Bob DuCharme for XML.com, entitled "Entity and Character References," whose focus is XSLT 2.0, I found myself pondering a problem typical for those who take XML documents through multiple parsers while working through various transformations or operations.

DuCharme succinctly observes that a parser's job is to take entity references (in XML, as in SGML, those symbolic names that start with an ampersand and end with a semicolon, like the character entities &amp; for the ampersand and &lt; for the less-than symbol) and replace them with their values. Trouble is, if you're trying to create output that needs and expects character entities in the final document, you're in a bit of a pickle if a parser somewhere early in the chain replaces &amp; with "&" and &lt; with "<".

But there is a two-step maneuver that makes this relatively easy to work around, without having to store those items as unparsed character data in CDATA sections or fall back on XSLT's disable-output-escaping attribute. By first using numeric character references rather than character entities (that is, &#38; rather than &amp;, and &#60; rather than &lt;, where the numbers are the characters' Unicode code points, which for these characters match ISO Latin-1), you can use XSLT to transform this stuff exactly as you wish during a final editing pass (or at least, in a pass that follows the last parser that might otherwise make substitutions you don't want). This, of course, is step number one.
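A minimal illustration of step one might look like this. The element name is hypothetical; what matters is that the ampersand and the less-than sign are written as numeric character references rather than as &amp; and &lt;:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input: the ampersand and less-than sign are written
     as numeric character references instead of predefined entities -->
<example>AT&#38;T and 3 &#60; 5</example>
```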

Step number two depends on the character map feature in XSLT 2.0, which lets you tell the processor to write specific characters in the result as whatever strings you choose. In this case, you can take the characters that those numeric references stand for and have them written out as character entities, so they're ready when you need them. (Strictly speaking, a parser resolves numeric character references as well, so what the character map actually matches are the characters themselves, not the references.) A character map basically defines a substitution table that the XSLT processor consults when it serializes the result tree: when it finds a mapped character, instead of writing that character to the output, it inserts the corresponding replacement string verbatim, without escaping it. Consider the following example:

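<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <!-- Apply the num2ent character map when the result tree is serialized -->
  <xsl:output use-character-maps="num2ent"/>

  <!-- Each mapped character is written out as its string, unescaped.
       The string attribute is itself parsed, so "&amp;amp;" produces
       the literal text "&amp;" in the output. -->
  <xsl:character-map name="num2ent">
    <xsl:output-character character="&#38;" string="&amp;amp;"/>
    <xsl:output-character character="&#60;" string="&amp;lt;"/>
  </xsl:character-map>

  <!-- Identity template: copy every attribute and node unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>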

This stylesheet does nothing more than copy its input to the result tree unchanged, except that when the result is serialized, every ampersand and less-than character is written out as the corresponding character entity (&amp; or &lt;). Thanks to Mr. DuCharme, you can grab this code, add whatever <xsl:output-character> replacements you want, and you've got a handy-dandy tool. This is particularly useful when you have to run content through other applications (like MS Office components) that may not perform entirely sensible replacements for you, or when you want to create markup as final output (something anybody who teaches markup must do all the time). Very handy indeed!
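As a sketch of that kind of extension, the stylesheet below maps the no-break space (U+00A0) and the copyright sign (U+00A9) to &nbsp; and &copy;. It assumes an XSLT 2.0 processor and XHTML input; because nbsp and copy are not among XML's five predefined entities, it also assumes the output should reference the XHTML 1.0 Strict DTD, which declares them, so that a downstream parser can resolve them. The map name "xhtml-ents" and the choice of characters are illustrative only.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <!-- Assumption: the input is XHTML, so the output references the
       XHTML 1.0 Strict DTD, which declares the nbsp and copy entities -->
  <xsl:output method="xml"
              doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
              doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
              use-character-maps="xhtml-ents"/>

  <!-- Hypothetical map name. Each string attribute is parsed once, so
       "&amp;nbsp;" is written to the output as the text "&nbsp;" -->
  <xsl:character-map name="xhtml-ents">
    <xsl:output-character character="&#160;" string="&amp;nbsp;"/>
    <xsl:output-character character="&#169;" string="&amp;copy;"/>
  </xsl:character-map>

  <!-- Identity template: copy every attribute and node unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the map keys on characters, a no-break space in the input comes out as &nbsp; whether it was written as a literal character, as &#160;, or as &nbsp; resolved through a DTD.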


Ed Tittel is a writer, trainer, and consultant based in Austin, TX, who writes and teaches on XML and related vocabularies and applications. E-mail Ed at etittel@lanw.com.


This was first published in July 2004
