Q

Problems converting HTML to other languages

My company wants to convert its HTML files to XML for two reasons: (1) to let us present our documents in any format, and (2) to let us extract parts of our documents to create new custom configurations. I can convert our HTML to XHTML and then to our own XML, for which I have a DTD (and schema), but I am basically just "wrapping" the XHTML in the elements I've defined. Converting back to HTML is not a problem, since I can use <xsl:copy-of select="."/> and get the old HTML back. However, converting to WML or anything else is not possible right now because the text is full of old HTML tags: <ol>, <ul>, <li> and even <table>, <tr> and <td>, which will not work for WML and probably not for other formats either. Any suggestions?
The crux of the problem is that the HTML elements are merely wrapped in the elements of the new DTD. Instead of wrapping existing markup, add a subset of XHTML's tag set to your own DTD. Aim the subset at simple presentation elements that you know you can map to, or use directly in, other formats, e.g. p, b, perhaps ul. Note that table markup is particularly troublesome because HTML often uses it to achieve layout effects that are difficult or impossible to reuse in other formats.
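As a rough illustration of why a small, known subset helps, here is a minimal Python sketch of mapping such a subset to WML. Everything named in it is an assumption made for the example: the wrapper element `section`, the chosen subset (p, b, i, em, strong, ul, ol, li), the card id, and the decision to turn each list item into its own paragraph because WML 1.x defines no list elements. In an XSLT pipeline the same mapping would be one template rule per element in place of a blanket copy.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical subset: XHTML elements that WML also defines under the same name.
DIRECT_MAP = {"p": "p", "b": "b", "i": "i", "em": "em", "strong": "strong"}


def render(node):
    """Return WML markup for one element of the assumed subset."""
    # Convert text and children first, preserving mixed content order.
    inner = escape(node.text or "") + "".join(
        render(child) + escape(child.tail or "") for child in node
    )
    if node.tag in DIRECT_MAP:
        tag = DIRECT_MAP[node.tag]
        return f"<{tag}>{inner}</{tag}>"
    if node.tag == "li":
        # No list elements in WML 1.x: emit each item as a dashed paragraph.
        return f"<p>- {inner.strip()}</p>"
    # ul, ol and custom wrappers contribute only their converted children.
    return inner


def to_wml(xml_source):
    """Wrap the converted content in a single-card WML deck."""
    root = ET.fromstring(xml_source)
    return f'<wml><card id="main">{render(root)}</card></wml>'


if __name__ == "__main__":
    sample = "<section><p>Intro <b>bold</b> text</p><ul><li>One</li><li>Two</li></ul></section>"
    print(to_wml(sample))
```

Because the subset is closed, every element has a deliberate rule. The sketch does not handle block elements nested inside list items, and it omits tables entirely, in line with the caution about layout tables.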
This was first published in May 2003
