This is the fifth segment of a series on Extensible Stylesheet Language Transformations, aka XSLT, that was planned to be shorter but has now grown to eight parts because of the very topic under discussion. The subject of this brief tutorial is XSLT expressions, now that variables and data types have been taken care of in the last tip. Like variables and data types, expressions ultimately originate from XPath. But remember that expressions are the "transformation engine" inside XSLT, at least when input must be parsed, interpreted, reacted to, or manipulated in some form or fashion.
The archetypal expression in XSLT and XPath is called a path expression, which defines a navigation path through a document tree. Starting from some defined point of origin—usually, this is either the current node in the tree, or the root of the tree—a path expression follows a sequence of steps in a specific direction. At each step, a path may branch so one can choose to apply an expression to a node in the tree, or to the children of that node. The result of a path expression is always a set of nodes; even if that result is empty or contains only a single node, it's still treated as a set of nodes.
Navigation proceeds in specific directions: these are called the axes, and are generally organized into the following sets (shown in alphabetical order):
- Ancestor axis: finds all ancestors of a node.
- Attribute axis: finds all attributes of a node.
- Child axis: finds all children of a node.
- Following-sibling axis: finds all nodes that follow the current node in the tree, and share the same parent.
- Preceding-sibling axis: finds all nodes that precede the current node in the tree, and share the same parent.
Other recognized axes also exist (though the preceding ones are most important when navigating document trees), including ancestor-or-self, descendant, descendant-or-self, following, namespace, parent, preceding, and self. These are fully documented in Chapter 5 of Michael Kay's XSLT Programmer's Reference (see part 1 of this series for a complete citation). They're also documented in the excellent XSLT tutorial at Zvon.org.
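To make the five axes listed above concrete, here is a minimal sketch in Python rather than XSLT, so it can be run without a stylesheet processor. It uses only the standard xml.etree.ElementTree module, which has no notion of axes, so each axis is imitated by a small helper function. The sample document and the helper names (ancestors, children, attributes, following_siblings, preceding_siblings) are invented for this illustration, and only element nodes are modelled (a real XPath processor also sees text nodes and the root node).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
doc = ET.fromstring(
    "<book id='b1'>"
    "<chapter n='1'><para>one</para><para>two</para><para>three</para></chapter>"
    "<chapter n='2'><para>four</para></chapter>"
    "</book>"
)

# ElementTree elements do not record their parent, so build a child -> parent map.
parent_of = {child: parent for parent in doc.iter() for child in parent}


def ancestors(node):
    """Imitates the ancestor axis: parent, grandparent, and so on (elements only)."""
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found


def children(node):
    """Imitates the child axis (element children only)."""
    return list(node)


def attributes(node):
    """Imitates the attribute axis as a name -> value mapping."""
    return dict(node.attrib)


def _siblings(node):
    """All element children of the node's parent, or an empty list for the top element."""
    parent = parent_of.get(node)
    return list(parent) if parent is not None else []


def following_siblings(node):
    """Imitates the following-sibling axis."""
    siblings = _siblings(node)
    return siblings[siblings.index(node) + 1:] if siblings else []


def preceding_siblings(node):
    """Imitates the preceding-sibling axis (returned here in document order)."""
    siblings = _siblings(node)
    return siblings[:siblings.index(node)] if siblings else []


second = doc.find("chapter/para[2]")                  # the <para>two</para> element
print([e.tag for e in ancestors(second)])             # ['chapter', 'book']
print([e.tag for e in children(doc)])                 # ['chapter', 'chapter']
print(attributes(ancestors(second)[0]))               # {'n': '1'}
print([e.text for e in following_siblings(second)])   # ['three']
print([e.text for e in preceding_siblings(second)])   # ['one']
```

In a stylesheet the same selections would be written as ancestor::*, child::*, attribute::*, following-sibling::* and preceding-sibling::*, evaluated with the second para element as the context node.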
Expressions can also qualify which nodes are to be selected at each step in a path. Here again, there are various ways to make such qualifications, generally by defining specific constraints:
- Node name (complete or partial specifications are possible)
- Node type (for example, elements or processing instructions)
- Node predicate (nodes must satisfy some arbitrary Boolean expression)
- Node position on the current axis (one can, for example, select all following siblings, or only the sibling that immediately follows the current node)
A path expression uses the «/» operator to separate successive steps in the expression. If an expression starts with «/», the origin is the root node; otherwise, the path starts from the context node, whatever that may be. In any given step, the axis is stated first, followed by the «::» (double colon) separator. If no axis is specified (it may legally be omitted), the child axis is assumed. «@» serves as a valid abbreviation for the attribute axis. If you dig into the previously supplied references (Kay and the Zvon.org pages cited), you'll find plenty of examples to work through. For simplicity's sake, here are three brief examples:
child::Address/attribute::CSZ
Address/@CSZ
Address[@code="AustinTX78728"]/@CSZ
Astute readers will recognize that since the default axis is child, and @ stands for attribute, the second expression is simply the abbreviated form of the first and selects exactly the same nodes. The third expression illustrates a predicate that tests an attribute value: only Address elements whose code attribute matches your author's city, state, and ZIP code contribute their CSZ attribute to the result (as does anyone else in the XML document who shares that same value).
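For readers who want to try the effect of these paths without a stylesheet, here is a rough Python sketch using the standard xml.etree.ElementTree module. That module supports only a limited subset of XPath and cannot return attribute nodes, so the sketch selects the Address elements with a path and then reads the CSZ attribute from each one. The People wrapper element and all attribute values are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical address list: element and attribute names follow the examples above,
# but the wrapper element and the values are invented.
people = ET.fromstring(
    "<People>"
    "<Address code='AustinTX78728' CSZ='Austin, TX 78728'/>"
    "<Address code='BostonMA02101' CSZ='Boston, MA 02101'/>"
    "<Address code='AustinTX78728' CSZ='Austin, TX 78728'/>"
    "</People>"
)

# Address/@CSZ: select the Address children of the context node (here, People),
# then take the CSZ attribute of each one.
all_csz = [a.get("CSZ") for a in people.findall("Address")]

# Address[@code="AustinTX78728"]/@CSZ: the predicate filters the Address step
# before the attribute step is taken.
austin_csz = [a.get("CSZ") for a in people.findall("Address[@code='AustinTX78728']")]

print(all_csz)     # ['Austin, TX 78728', 'Boston, MA 02101', 'Austin, TX 78728']
print(austin_csz)  # ['Austin, TX 78728', 'Austin, TX 78728']
```

Note that the predicate keeps both matching entries: a path expression returns every node that satisfies each step, not just the first.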
In XPath and XSLT, expressions are always subject to context, if not context dependent outright. For example, the value of the expression $a depends on the current value of the variable a, and the value of the «.» expression depends on the current node in the source document. Context has two components in XSLT and XPath, static and dynamic. The static context depends on where the expression occurs in the XSLT document or stylesheet, and the dynamic context depends on the state of processing as the expression is evaluated.
Static context is defined as follows:
- The complete set of namespace declarations in effect where the expression occurs, which determines the meaning and validity of namespace prefixes used in the expression.
- The complete set of variable declarations (all applicable <xsl:variable> and <xsl:param> elements) in scope where the expression occurs, which determines the validity of variable references in the expression.
Dynamic context is defined as follows:
- Current values for all variables in scope for the expression, which may change each time the expression is evaluated.
- Current location in the source document tree, including:
  - Current node: Node in the source tree being processed, by virtue of providing the focus of <xsl:apply-templates> or <xsl:for-each> instructions. The current node may always be determined using the current() function.
  - Context node: Usually this is identical to the current node, except in a predicate used to qualify some step within a path expression, when it becomes whatever node the predicate is testing. The context node may be referenced using the «.» expression or the verbose equivalent self::node(). The expression address[.="family"] selects all <address> elements with a string-value of "family".
  - Context position: An integer number that indicates the position of the context node within the current node list, available using the position() function. Any time a list of nodes is processed using the <xsl:apply-templates> or <xsl:for-each> instructions, the context position value represents the position in the list for the current item being processed. When predicates are being tested, this applies to the position of the node within the list being tested.
  - Context size: An integer number that indicates the total count of nodes in the current node list, available using the last() function.
Hopefully, it's pretty easy to see that context describes what variables and namespaces are relevant to an expression, which values for variables apply, and where in the source document tree the current location and position is situated. All these things apply when evaluating expressions, and must therefore be considered when writing them as well.
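The context node, context position, and context size can be imitated in a few lines of Python. This is only a sketch using the standard xml.etree.ElementTree module: the loop stands in for an <xsl:for-each> instruction, and the sample document and variable names (node_list, context_size, rows, family, final) are assumptions invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document.
book = ET.fromstring(
    "<addresses>"
    "<address>family</address><address>work</address><address>family</address>"
    "</addresses>"
)

# Rough analogue of <xsl:for-each select="address">: each selected node in turn
# becomes the context node, with a 1-based context position and a fixed context size.
node_list = book.findall("address")
context_size = len(node_list)                  # what last() would return
rows = [
    (position, context_size, node.text)        # position is what position() would return
    for position, node in enumerate(node_list, start=1)
]
print(rows)  # [(1, 3, 'family'), (2, 3, 'work'), (3, 3, 'family')]

# address[.="family"]: inside the predicate, "." is the node being tested.
family = book.findall("address[.='family']")
print(len(family))  # 2

# address[last()]: keeps the node whose position equals the context size.
final = book.findall("address[last()]")
print(final[0] is node_list[2])  # True
```

The point to notice is that positions start at 1, and that the context size stays the same for every node in the list while the position advances.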
Given all this background on XPath/XSLT expressions, the actual functions used in expressions are pretty straightforward; they are summed up in alphabetical order in Table 1, which follows (for examples, visit the Zvon page).
|boolean||Converts input argument to a Boolean: a number is true if it is neither positive nor negative zero nor "not a number" (NaN), a node-set is true if it is not empty, a string is true if it has a non-zero length, and any other type uses a type-dependent conversion|
|ceiling||Returns the smallest integer number not less than the input argument|
|comment||Returns true for any comment node encountered, false otherwise|
|concat||Returns concatenation of all input arguments|
|contains||Returns true if first argument string contains second argument string, otherwise returns false|
|count||Returns number of nodes in input argument node-set|
|current||Returns node-set with current node as only member|
|document||Provides access to XML documents besides main source document (see Zvon for more important details)|
|element-available||Input argument must evaluate as a string that is a QName (qualified name with namespace and element components) converted into an expanded name with explicit namespace component; returns true only if expanded name matches the name of an instruction.|
|false||Returns Boolean false value.|
|floor||Returns largest integer number not greater than the input argument|
|format-number||Converts first numeric argument to string, using the format pattern from the second string argument and the decimal format named by the third string argument (or the default decimal format if no third argument is present).|
|function-available||Input argument must evaluate as a QName string; function returns true only if the resulting expanded name (see element-available for definitions) matches the name of a function in the function library|
|generate-id||Returns a string that uniquely identifies the node in an argument node-set that is first in document order (syntactically, this string is an XML name)|
|id||Selects elements by unique ID value; if the input argument is a node-set, the result is the union of the results of applying id to the string-value of each node in the set. Otherwise, the argument is converted to a string, and all elements whose ID matches that string are returned in a node-set, drawn from the same document as the context node.|
|key||Works like the id function for keys: the first argument specifies the key name, which must be a QName. When the second argument is a node-set, the result is the union of applying the key function to the string-value of each node in the set; otherwise, the argument is converted to a string, and all nodes whose key value matches that string are returned in a node-set, drawn from the same document as the context node.|
|lang||Returns true or false depending on whether the language for the context node (defined by the xml:lang attribute) matches or is a sublanguage of the language specified in the input argument string|
|last||Returns a number for the context size from the expression evaluation context.|
|local-name||Returns the local part of the expanded name of a node in the input argument node-set that is first in document order; if the node-set is empty or the first node has no expanded name, returns an empty string instead. If no argument is supplied, applies to the node-set consisting of the context node itself.|
|name||Returns a QName string for the expanded name of the node in the input node-set argument that appears first in document order. See Zvon site for additional details.|
|namespace-uri||Returns namespace URI for expanded-name of the node in the input node-set argument that appears first in document order, otherwise returns empty string as in local-name.|
|node||Returns true for any input node of any type|
|normalize-space||Returns input argument string with all leading and trailing whitespace sequences removed, and all internal whitespace sequences replaced by a single space. If no argument is supplied, applies to the string value of the context node.|
|not||Returns true if argument is false, false otherwise|
|number||Converts input argument to a number following IEEE 754 conversion rules; see Zvon site for details|
|position||Returns a number equal to the context position from the expression evaluation context (positions are counted from 1, so the value is always a positive integer)|
|processing-instruction||Returns true for any node that is a processing instruction, or for any processing instruction that matches the value of an input literal string|
|round||Returns the integer number that is closest in value to the input argument (see Zvon for additional details)|
|starts-with||Returns true if the first argument string starts with the second argument string; false otherwise|
|string||Converts input argument object into a string, where the string value of the node first in document order is returned for an input node-set, and where most other values are returned as string equivalents (see Zvon for additional details)|
|string-length||Returns the number of characters in an input string; when no input value is supplied, this defaults to the string value of the context node|
|substring||Returns the substring of the first string argument starting at the position specified in the numeric value of the second input argument (the first character is at position 1), using the length supplied in the numeric value of the third input argument. If no third argument appears, the remainder of the input string beginning from the start position is returned.|
|substring-after||Returns the substring of the first string argument that follows the first occurrence of the second string argument if found, or the empty string otherwise.|
|substring-before||Returns the substring of the first string argument that precedes the first occurrence of the second string argument if found, or the empty string otherwise.|
|sum||Returns the sum of converting the string value of each node in a node-set to a number for all nodes in the input argument node-set|
|system-property||The input argument must evaluate to a QName as mapped to an expanded name, where the return value is an object that represents the value of the system property that the name identifies if such a property exists; otherwise, the empty string is returned. This provides access to values like xsl:version, xsl:vendor, and xsl:vendor-url.|
|text||Returns true for any text node, false otherwise|
|translate||Returns the first argument string where occurrences of characters in the second argument string are replaced by characters that occur in the same position in the third argument string. If the third argument string is shorter than the second, all characters that appear in the second and not in the third are removed from the first; if the third argument string is longer than the second, excess characters from the third are ignored.|
|true||Returns the Boolean value true|
|unparsed-entity-uri||Returns the URI for any unparsed entity that matches the name specified in the input string value from the same document as the context node. If no such unparsed entity is found, returns the empty string.|
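A few of the string functions in the table have rules that are easy to misread, so here is a small Python sketch that models them. These are not the functions of any XSLT processor; the Python names (xpath_translate, xpath_substring, substring_before, substring_after, normalize_space) are invented for illustration, and xpath_substring handles integer arguments only, ignoring the rounding and NaN rules that apply to other numbers.

```python
import re


def xpath_translate(s, from_chars, to_chars):
    """Model of translate(): map or delete characters by position."""
    mapping = {}
    for i, ch in enumerate(from_chars):
        if ch in mapping:
            continue  # the first occurrence in from_chars decides the replacement
        # Characters with no counterpart in to_chars are deleted (marked with None).
        mapping[ch] = to_chars[i] if i < len(to_chars) else None
    out = []
    for ch in s:
        replacement = mapping.get(ch, ch)
        if replacement is not None:
            out.append(replacement)
    return "".join(out)


def xpath_substring(s, start, length=None):
    """Model of substring() for integer arguments; positions are counted from 1."""
    end = len(s) + 1 if length is None else start + length
    return "".join(ch for pos, ch in enumerate(s, start=1) if start <= pos < end)


def substring_before(s, sep):
    """Model of substring-before(): empty string when sep is not found."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def substring_after(s, sep):
    """Model of substring-after(): empty string when sep is not found."""
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""


def normalize_space(s):
    """Model of normalize-space(): trim, then collapse runs of XML whitespace."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


print(xpath_translate("bar", "abc", "ABC"))       # BAr
print(xpath_translate("--aaa--", "abc-", "ABC"))  # AAA
print(xpath_substring("12345", 2, 3))             # 234
print(xpath_substring("12345", 2))                # 2345
print(substring_before("1999/04/01", "/"))        # 1999
print(substring_after("1999/04/01", "/"))         # 04/01
print(normalize_space("  a \n b\t"))              # a b
```

The second translate call shows the deletion rule at work: the hyphen appears in the second argument but has no counterpart in the third, so it is removed from the result.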
Some experimentation and learning is required to master the proper syntax, use, and placement of these expression functions. Please consult the Kay book and the Zvon site mentioned earlier in this tip for those details, and for copious examples.
About the author
Ed Tittel is a full-time writer and trainer whose interests include XML and development topics, along with IT Certification and information security topics. E-mail Ed with comments, questions, or suggested topics or tools for review.
This was first published in September 2005