For those not already in the know, Open Office is a free suite of office productivity software that offers a large number of highly usable components, including text processing, spreadsheets, presentations, drawing, charting, complex mathematical notation and more. It runs on all major computing platforms and uses open-component-based APIs and XML-based file formats throughout.
The OpenOffice software traces its origins back to the German software company StarDivision, which created the StarOffice suite in the mid-1980s; Open Office is based on that suite. Sun Microsystems acquired StarDivision in 1999, and all versions of StarOffice from 6.0 forward have been constructed using OpenOffice.org source code, APIs, file formats and reference implementations. Sun continues to support the parent organization for Open Office (OpenOffice.org) to this day and even uses the same standard XML file formats for its ongoing StarOffice development efforts.
At present OASIS is the industry consortium that controls the vendor-neutral OpenDocument standard on which OpenOffice XML file formats are based. It's managed through a technical committee that is advancing work already done on the OpenOffice.org XML file format for OpenOffice.org 1.0. The results of this labor make up the OpenOffice.org 2.0 XML file format, which defines the current standard for the files that Open Office creates and manages (it can also read and write files in numerous other formats, including
Common file extensions associated with Open Office include .odt for text documents, .odp for presentations, .ods for spreadsheets, and .odg for graphics files. The .odb database format is not part of the OpenDocument standard. Because OpenDocument notation is verbose, OpenDocument files are stored in the Java Archive (JAR) format: a JAR file is a compressed ZIP file with an additional manifest file that lists all archive contents. Thus, you can use any ZIP tool to unpack and inspect the contents of an OpenDocument file and examine the XML content for yourself.
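Because the package is an ordinary ZIP archive, a few lines of Python are enough to look inside one. The following is a minimal sketch, not a definitive tool: the function names and the in-memory sample package are hypothetical and exist only so the example runs without a real document on hand. To try it on a real file, pass the file's path to list_package_entries instead.

```python
import io
import zipfile


def list_package_entries(source):
    """Return the names of all entries in an OpenDocument package.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        return package.namelist()


def build_sample_package():
    """Build a tiny stand-in package in memory.

    Hypothetical sample data: the XML bodies are placeholders,
    so this is not a valid OpenDocument file.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        package.writestr("content.xml", "<placeholder/>")
        package.writestr("META-INF/manifest.xml", "<placeholder/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Print each archive entry, in the order it was stored.
    for name in list_package_entries(build_sample_package()):
        print(name)
```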
Files that show up from such unpacking include an assignment of MIME type for the document (filename: mimetype), the document's XML contents (filename: content.xml), a document stylesheet (filename: styles.xml), metadata about the document content (filename: meta.xml), application-specific information (filename: settings.xml), the manifest file (filename: manifest.xml, stored in a subdirectory named META-INF) and a directory holding all images included in the document (directory name: Pictures). If the XML content file references no images, the application that creates the JAR file may still include this directory, albeit devoid of contents.
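The layout just described can be checked programmatically. The sketch below reads the mimetype entry, reports which of the standard parts are absent, and pulls paragraph text out of content.xml. Treat it as an illustration under stated assumptions: the function names and sample content are hypothetical, and the namespace URIs are taken from the OpenDocument specification rather than from this article.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs as defined in the OpenDocument specification (an
# assumption here; the article itself does not list them).
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# The package parts named in the text above.
EXPECTED_PARTS = [
    "mimetype",
    "content.xml",
    "styles.xml",
    "meta.xml",
    "settings.xml",
    "META-INF/manifest.xml",
]


def inspect_package(source):
    """Summarize an OpenDocument package given a path or file-like object."""
    with zipfile.ZipFile(source) as package:
        names = set(package.namelist())
        mimetype = package.read("mimetype").decode("ascii").strip()
        missing = [part for part in EXPECTED_PARTS if part not in names]
        root = ET.fromstring(package.read("content.xml"))
    # Collect the text of every text:p (paragraph) element, in document order.
    paragraphs = ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]
    return {"mimetype": mimetype, "missing": missing, "paragraphs": paragraphs}


def build_sample_document():
    """Build a deliberately incomplete sample package in memory (hypothetical data)."""
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" '
        f'xmlns:text="{TEXT_NS}">'
        "<office:body><office:text>"
        "<text:p>Hello, OpenDocument</text:p>"
        "</office:text></office:body>"
        "</office:document-content>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        package.writestr("content.xml", content)
        package.writestr("META-INF/manifest.xml", "<placeholder/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    summary = inspect_package(build_sample_document())
    print(summary["mimetype"])
    print("missing parts:", summary["missing"])
    print("paragraphs:", summary["paragraphs"])
```

The sample omits styles.xml, meta.xml and settings.xml on purpose, so those three names appear in the list of missing parts; a document saved by Open Office itself would normally contain all of them.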
You can learn all the gory details about Open Office XML file formats through the OASIS specifications, but savvy readers may prefer to turn to J. David Eisenberg's outstanding, free online book for O'Reilly & Associates. Called OASIS OpenDocument Essentials, it manages to be comprehensive, approachable and sometimes entertaining in leading you through the ins and outs of the XML markup from which OpenDocument files are built. After spending some time with the former, then discovering the latter, it took me less than five seconds to determine which would be my best source of information on this subject. Try them both and I'm sure you'll feel likewise!
About the author
Ed Tittel is a full-time writer and trainer whose interests include XML and development topics, along with IT Certification and information security topics. E-mail Ed at firstname.lastname@example.org with comments, questions or suggested topics or tools for review.
This was first published in August 2006