Ask the Expert

Problems converting HTML to other languages

My company wants to convert its HTML files to XML for two reasons: (1) to let us present our documents in any format, and (2) to let us extract parts of our documents to build new custom configurations. My problem is this: I can convert our HTML to XHTML and then to our own XML, for which I have a DTD (and a schema), but in doing so I am basically "wrapping" the XHTML in the elements I've defined. Converting back to HTML is not a problem, as I can use <xsl:copy-of select="."/> and get the old HTML back. Converting to WML or anything else, however, is not possible right now, because the text is full of old HTML tags such as <ol>, <ul> and <li>, and even <table>, <tr> and <td>, which will not work in WML and probably not in other formats either. Any suggestions?


The crux of the problem is that the HTML elements are wrapped, unchanged, inside the new DTD. Instead of wrapping existing elements, you need to add a subset of XHTML's tagset to your own DTD. The subset should be limited to simple presentation elements that you know you can map to, or use directly in, other formats: for example p, b and perhaps ul. Note that table markup is particularly troublesome, because HTML tables are often used to achieve layout effects that are difficult or impossible to reuse in other formats.
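To make the advice concrete, here is a minimal sketch of what becomes possible once content is restricted to a small subset: every allowed element is mapped explicitly onto the target format, and anything outside the subset is rejected instead of being copied through. It uses Python's standard `xml.etree.ElementTree`; the target is assumed to be WML-style markup (`<wml>`, `<card>`, `<p>`, `<b>`, `<i>`), rendering list items as prefixed paragraphs is an illustrative choice, and the names `INLINE_MAP`, `copy_inline`, `convert_block` and `to_wml_card` are hypothetical. In the asker's XSLT setup the same idea would be one template per element in the subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline subset: source element -> element assumed to exist in the target
# (WML-style markup here). Anything not listed is treated as outside the subset.
INLINE_MAP = {"b": "b", "i": "i", "em": "em", "strong": "strong"}


def copy_inline(src: ET.Element, dst: ET.Element) -> None:
    """Append src's text and its supported inline children to a freshly created dst."""
    dst.text = (dst.text or "") + (src.text or "")
    for child in src:
        if child.tag not in INLINE_MAP:
            raise ValueError(f"inline element <{child.tag}> is outside the supported subset")
        mapped = ET.SubElement(dst, INLINE_MAP[child.tag])
        copy_inline(child, mapped)   # recurse for nested markup such as <b><i>..</i></b>
        mapped.tail = child.tail     # keep the text that follows the child


def convert_block(src: ET.Element, out_parent: ET.Element) -> None:
    """Map one block-level element of the subset onto the target format."""
    if src.tag == "p":
        copy_inline(src, ET.SubElement(out_parent, "p"))
    elif src.tag in ("ul", "ol"):
        # Assumption: the target has no list element, so each <li> becomes a
        # paragraph with a "* " or "1. " style prefix.
        for n, li in enumerate(src.findall("li"), start=1):
            para = ET.SubElement(out_parent, "p")
            para.text = "* " if src.tag == "ul" else f"{n}. "
            copy_inline(li, para)
    else:
        # Tables and any other unmapped element are rejected, not copied through.
        raise ValueError(f"block element <{src.tag}> is outside the supported subset")


def to_wml_card(body_xml: str, card_id: str = "c1") -> str:
    """Convert a <body> of subset markup into a single hypothetical WML card."""
    body = ET.fromstring(body_xml)
    wml = ET.Element("wml")
    card = ET.SubElement(wml, "card", id=card_id)
    for block in body:
        convert_block(block, card)
    return ET.tostring(wml, encoding="unicode")


if __name__ == "__main__":
    print(to_wml_card("<body><p>Hello <b>bold</b> text</p><ol><li>one</li><li>two</li></ol></body>"))
    # prints: <wml><card id="c1"><p>Hello <b>bold</b> text</p><p>1. one</p><p>2. two</p></card></wml>
```

The sketch deliberately raises an error on `<table>` rather than guessing: because layout tables rarely have a meaningful equivalent in a small-screen format, flagging them for manual restructuring is arguably safer than emitting something that merely validates.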

This was first published in May 2003
