When it comes to navigating a document tree and plucking data from its leaves, many XML developers use one XSLT engine or another to perform this task. XSH is a command-line XML shell that makes this unnecessary for small or quick 'n dirty jobs, and allows you to query and manipulate document data without creating reusable code. Of course, for repetitive tasks this isn't a benefit, but it's certainly a fine way to dig into documents and massage their contents on an ad hoc basis (though I'm tempted to see if a combination of scripting with XSH might not even lend itself to such uses -- but that's a tip for another day ...).
All shells must come from somewhere, and the open source XSH is no exception. XSH is built from Perl, and its syntax is quite Perl-like (a benefit for those already familiar with that language, and no terrible burden for those who are not). XSH supports Perl code, and even allows XML data structures to be accessed just like regular Perl variables. XSH also supports XPath, and processes queries formulated using XPath syntax.
Interested readers and tinkerers can download XSH from SourceForge.
For Linux, BSD variants, MacOS X, and other Unix varieties:
- XML-XSH2-2.0.2.tar.gz (or a higher-numbered version of this file, if available; this is the most recent version of XSH)
- xmltools-bundle.tar.gz: This is the collection of libraries and modules that XSH uses to handle its runtime responsibilities.
To install XSH in this type of environment, uncompress the xmltools bundle, then run the install script contained therein to compile and install the XSH components (after you cd into the xmltools-bundle/ directory, type ./install -prefix=/usr/local; you may need to change the -prefix setting to match your local runtime environment).
For Windows:
- Grab a copy of ActivePerl from ActiveState.
- Grab xsh2-2.0.2-windows-apnnn.zip, where nnn is 561 for ActivePerl 5.6.1 or 580 for ActivePerl 5.8. This includes an XSH package for ActiveState's Perl Package Manager (which ships with ActivePerl).
- Grab xmltools-bundle-apnnn.zip, where, as before, nnn is 561 or 580 to match whichever version of Perl you're using.
To install XSH in Windows environments, first install ActivePerl, then unzip the downloaded files into a directory of your choice. Inside that directory, execute the ppm.bat package manager file inside a command window (Start, Run, type Command in the Open: textbox, type ppm.bat, then click the OK button). When the PPM shell fires up, type set rep localdir . if you're running ActivePerl 5.6.1, or rep add localdir . if you're running ActivePerl 5.8. Then install the packages (you must download them first) by typing install XML::XSH in the PPM shell.
You can get a feel for XSH by trying various commands at the command line. Make sure your files are in a working directory, then set the local context to that directory to begin working with them. The xsh command opens the shell. $a := open "myname.xml" opens a local file named myname.xml. ls $a lists the contents of myname.xml. You can also use the ls command to access XML document structures that occur in the file. If the file contents look something like this (sans preambles and header data):
<root filename="myfile.xml">
  <branch>
    <label>This is label 1</label>
    <data>This is data </data>
    <placeholder />
  </branch>
  <branch>
    <label>This is label 2</label>
    <data>This is data 2</data>
  </branch>
</root>
typing ls //root[branch/label="This is label 1"] will produce the following output:
<root filename="myfile.xml">
  <branch>
    <label>This is label 1</label>
    <data>This is data </data>
    <placeholder />
  </branch>
</root>
Some people will recognize this as an XPath query whose predicate tests for a branch element with label content that matches the string "This is label 1". Simply typing ls //root/branch will display all branch elements in the myname.xml file. For those unfamiliar with XPath, additional details about this excellent tool are covered in its official specification and in the Zvon tutorial.
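For readers who want to try the same kind of query without installing XSH, here is a minimal sketch in Python using the standard library's ElementTree module. It is not XSH, and ElementTree supports only a subset of XPath, so the predicate is attached to branch rather than to root; the variable names are mine.

```python
import xml.etree.ElementTree as ET

# The sample document from the text above.
SAMPLE = """<root filename="myfile.xml">
  <branch>
    <label>This is label 1</label>
    <data>This is data </data>
    <placeholder />
  </branch>
  <branch>
    <label>This is label 2</label>
    <data>This is data 2</data>
  </branch>
</root>"""

root = ET.fromstring(SAMPLE)

# Rough equivalent of: ls //root/branch
all_branches = root.findall("./branch")

# Select only the branch whose label child has the given text.
matching = root.findall("./branch[label='This is label 1']")

labels = [branch.findtext("label") for branch in all_branches]

for branch in matching:
    print(ET.tostring(branch, encoding="unicode"))
```

The predicate form [label='...'] compares the complete text content of the label child, so it only matches on an exact string.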
Basic XSH commands mirror Perl and common Unix shells, so that cd changes the position of the context node (sometimes called the cursor in other implementations), ls lists file contents (based on current document position or node specifications), and pwd shows the current context node location. Manipulation commands include the following:
- copy: Copy one or more nodes from a source to a destination (both given as XPath expressions). Each source node is copied only once, to the destination node that occupies the same position in the destination list.
- insert: Insert a new node of a given type. Node types can be element, attribute, text, cdata, comment, chunk, or entity_reference.
- map: Map an expression or short operation onto a list of nodes.
- move: Move nodes from one place to another. Identical to providing the same argument first to copy, then to remove.
- remove: Remove one or more nodes.
- rename: Rename a node.
- xcopy: Cross-copy nodes from a source to a destination. Copies every source node to every destination node, so that each destination node gets a copy of every copied source node.
- xinsert: Cross-insert nodes to one or more destination nodes. Works like xcopy; inserts one copy of each source node specified into every destination node.
- xmove: xcopy followed by remove.
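The difference between copy and xcopy is easier to see in code. The following is a rough Python sketch of the two pairing rules as described above, not XSH's own implementation; the function names, the target elements, and the choice to append copies as children are all assumptions made for illustration. What XSH does when the two lists differ in length is not covered here (the sketch simply stops at the shorter list).

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical document: two source labels and two empty targets.
DOC = """<root>
  <branch><label>This is label 1</label></branch>
  <branch><label>This is label 2</label></branch>
  <target/>
  <target/>
</root>"""


def copy_nodes(sources, destinations):
    """One-to-one: the i-th source is copied into the i-th destination."""
    for source, destination in zip(sources, destinations):
        destination.append(copy.deepcopy(source))


def xcopy_nodes(sources, destinations):
    """Cross-copy: every destination receives a copy of every source."""
    for destination in destinations:
        for source in sources:
            destination.append(copy.deepcopy(source))


def load():
    """Parse a fresh document and return (source labels, targets)."""
    root = ET.fromstring(DOC)
    return root.findall("./branch/label"), root.findall("./target")


labels, targets = load()
copy_nodes(labels, targets)
copy_result = [[child.text for child in target] for target in targets]

labels, targets = load()
xcopy_nodes(labels, targets)
xcopy_result = [[child.text for child in target] for target in targets]

print(copy_result)
print(xcopy_result)
```

With copy, each target ends up with one label; with xcopy, each target ends up with both.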
Of the foregoing commands, copy, insert, xcopy, xinsert and xmove take a location parameter to specify how source nodes should be handled vis-à-vis destination nodes. The following values are legal for this location parameter:
- after: Place source nodes after destination nodes. Most cases are obvious, but when both source and destination nodes are attributes, XSH attaches the source node to the parent element of the destination attribute. If the source is not an attribute, but the destination node is an attribute, the text of the source node is appended to the destination attribute's value.
- append: Append source node to destination node. If destination nodes are of type element or document, source node gets added as a child of the destination node. Otherwise, XSH appends textual content of source node to content of destination node.
- before: Place source nodes before destination nodes. Works just like after, except goes to the preceding position, not the following one.
- into: Place source nodes into destination nodes. If destination nodes are of type element, source nodes become children of the element (unless the source node is of type attribute, when the source node becomes an attribute of the destination node). Otherwise, the value for the destination node is set to that of the source node.
- prepend: Place source node at the beginning of destination node. Works just like append, except the source goes to the first position, not the last one. For children, the source becomes the first child and all other children shift one position later.
- replace: Replace entire destination node with source node, except when destination node is an attribute. Then, only text content of destination node is replaced by text content of source node.
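To make the location values concrete, here is a small Python sketch that mimics them for the simplest case only, where both source and destination are elements. It is an approximation of the behavior described above, not XSH code: the place function and the placeholder, header, and stub elements are hypothetical, the special cases for attributes and text are not modeled, and treating into the same as append for element destinations is my assumption.

```python
import copy
import xml.etree.ElementTree as ET


def place(source, destination, location, parent=None):
    """Copy `source` relative to `destination` (element nodes only).

    `parent` is the destination's parent element; ElementTree keeps no
    parent pointers, so sibling-level locations need it passed in.
    """
    node = copy.deepcopy(source)
    node.tail = None
    if location in ("append", "into"):
        destination.append(node)        # becomes the last child
    elif location == "prepend":
        destination.insert(0, node)     # becomes the first child
    elif location in ("before", "after", "replace"):
        if parent is None:
            raise ValueError(f"{location!r} needs the destination's parent")
        index = list(parent).index(destination)
        if location == "before":
            parent.insert(index, node)
        elif location == "after":
            parent.insert(index + 1, node)
        else:
            parent.remove(destination)
            parent.insert(index, node)
    else:
        raise ValueError(f"unknown location: {location!r}")
    return node


root = ET.fromstring(
    "<root><branch><label>This is label 1</label></branch></root>"
)
branch = root[0]

place(ET.Element("placeholder"), branch, "prepend")
place(ET.Element("data"), branch, "append")
place(ET.Element("branch"), branch, "after", parent=root)
place(ET.Element("header"), branch, "before", parent=root)

branch_tags = [child.tag for child in branch]
root_tags = [child.tag for child in root]

place(ET.Element("stub"), branch, "replace", parent=root)
root_tags_after_replace = [child.tag for child in root]
```

Note that append, prepend, and into work inside the destination node, while before, after, and replace work at the level of its siblings.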
The types of nodes have already been listed (and will be familiar to most folks already familiar with one or more XML markup languages), but here they're briefly defined as well:
- element: Element markup (for example, <root>...</root> <branch>...</branch> or <placeholder />)
- attribute: An attribute associated with an element (for example filename="myfile.xml")
- text: Text content (occurs between element tag pairs)
- cdata: A CDATA section, whose content is treated as literal character data and is not parsed as markup (inherited from SGML)
- comment: An XML comment (starts with <!-- and ends with -->)
- chunk: Any piece or section of well-formed, valid XML in text form
- entity_reference: Any entity reference (usually to represent special characters, as in &amp; for ampersand).
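Most of these node types can be inspected with any DOM-style parser. As an illustration only, the following Python sketch uses the standard library's minidom module to label the nodes in a small variation on the sample document (the comment and CDATA content are made up for the example). Two caveats: minidom expands entity references into ordinary text while parsing, and chunk is an XSH notion with no counterpart here.

```python
from xml.dom import Node, minidom

# Variation on the sample document; the comment and CDATA section
# are hypothetical additions so that each node type appears once.
SAMPLE = (
    '<root filename="myfile.xml">'
    "<!-- a comment -->"
    "<label>This is label 1</label>"
    "<data><![CDATA[1 < 2]]></data>"
    "<placeholder />"
    "</root>"
)

# Map DOM node-type constants to the names used in the list above.
NODE_TYPE_NAMES = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.CDATA_SECTION_NODE: "cdata",
    Node.COMMENT_NODE: "comment",
}


def describe(node):
    """Return the list's name for a DOM node, or 'other'."""
    return NODE_TYPE_NAMES.get(node.nodeType, "other")


doc = minidom.parseString(SAMPLE)
root = doc.documentElement

child_kinds = [describe(child) for child in root.childNodes]
attribute_names = list(root.attributes.keys())
label_kind = describe(root.getElementsByTagName("label")[0].firstChild)
data_kind = describe(root.getElementsByTagName("data")[0].firstChild)

print(child_kinds, attribute_names, label_kind, data_kind)
```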
Those willing to spend some time playing with XSH will find their efforts amply rewarded. Because of its Unix-heavy roots, those working with XSH in related environments may find the tool both more familiar and more usable than in Windows environments. But in either home it will find a well-earned place in any XML developer's toolbox. Explore the XSH <About/> page for more information, with special attention to the <Features/>, <Usage/>, and <Examples/> links.
About the author
Ed Tittel is a full-time writer and trainer whose interests include XML and development topics, along with IT certification and information security topics. E-mail Ed at firstname.lastname@example.org with comments, questions, or suggested topics or tools for review.
This was first published in May 2005