Content or attribute: Which is best?

This tip helps decide whether a value should be described as content, or as an attribute in XML.

Ed Tittel

Momentous questions always need answers that start off, as Ed says in this tip, with "It depends." So when the question is about whether a value should be described as content, or as an attribute, well, the answer starts with...



Two of the most interesting and enduring questions that all XML document designers must grapple with can be stated succinctly: "When should some value associated with an element be captured as an attribute value?" and "When should it be treated as content associated with that element instead?" The difference shows up in markup as follows, assuming that a title element exists and that you'd like to associate an ISBN with some title.

You could design XML markup to capture this data using any of the following forms: <title ISBN="1583482590">HTML 4.0 Specification</title>, or <title>HTML 4.0 Specification <ISBN>1583482590</ISBN> </title>, or even <title name="HTML 4.0 Specification" ISBN="1583482590"/>. As you can see, this also raises the question of which data associated with an element should be treated as explicit content (that is, text placed between the element's opening tag and its closing tag) and which data should be carried in an attribute instead.
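To make the trade-off concrete, here is a minimal sketch of how an application might read the same two values from each of the three designs. It assumes Python's standard xml.etree.ElementTree module; the helper name read_title is hypothetical, and the third form is written as an empty element so that it parses.

```python
import xml.etree.ElementTree as ET

# The three candidate designs from the text, each treated as its own document.
as_attribute = '<title ISBN="1583482590">HTML 4.0 Specification</title>'
as_child = '<title>HTML 4.0 Specification <ISBN>1583482590</ISBN> </title>'
all_attributes = '<title name="HTML 4.0 Specification" ISBN="1583482590"/>'


def read_title(xml_text):
    """Return (name, isbn) regardless of which design was used (hypothetical helper)."""
    title = ET.fromstring(xml_text)
    # The name is either a "name" attribute or the element's leading text.
    name = title.get("name") or (title.text or "").strip()
    # The ISBN is either an attribute or the text of an <ISBN> child element.
    isbn = title.get("ISBN") or title.findtext("ISBN")
    return name, isbn


results = [read_title(doc) for doc in (as_attribute, as_child, all_attributes)]
for name, isbn in results:
    print(name, isbn)
```

All three designs carry the same information; what changes is where the reading code has to look, and whether the value could later grow structure of its own (only the child-element form allows that).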

You'll find some fascinating coverage of this topic on Robin Cover's SGML/XML pages, specifically in a document titled "SGML/XML: Using Elements and Attributes" (http://xml.coverpages.org/elementsAndAttrs.html). Among other things, you'll learn that this debate goes back to the era when SGML ruled the markup world as the supreme metalanguage for defining documents, and that no conclusive or final resolution is ever likely to be forthcoming.

A wise friend of mine once told me that the hallmark of an answer to any question that probes serious matters or confronts real-world issues is that it always starts with the same two words: "That depends..." When the question is whether to design XML markup so that a value becomes element content or an attribute value, here are some of the things the decision depends upon:

  • How you want to use the value involved. Michael Sperberg-McQueen (MSM) makes the distinction between an inherent value and a constituent part, where an inherent value is some value associated with all objects in a class (such as height, weight, color, and so forth) and a constituent part represents some part of an object without which the object is not a valid member of its class. MSM uses the notion of a person's head to illustrate that while height and head are both parts of a normal person, a person without a head is by no means normal, but a person without a height is still a person.
    Here the argument seems to be that you should use attributes to capture inherent values that are not constituent parts. This is an interesting way to distinguish what I understand better as a property of an object from a vital part of that object.
  • MSM also makes the point that you can create elements represented as data tuples, by associating the same set of named attributes with all such elements, and making sure those attribute values are properly instantiated. In that case, each element looks like a row in a table of values, where the column names correspond to attribute names.
  • Attribute values work well for datatype validation; embedded elements work well for complex structure validation; element content works well for random or unstructured element data. Another way to think of this is as a kind of abstract data typing mechanism.
  • Clearly distinguish metadata from content, where metadata is information that describes a container for content, and where content is the information that the container is meant to convey. Metadata belongs in attributes; content belongs in child elements or as text information within the element itself.
  • Jim Amsden of IBM makes the point that object attributes have no identity of their own, and that when objects have relationships, those relationships hold between or among the objects themselves, not between or among specific attributes. He also points out that attribute names indicate the roles they or their values play in an element, that attributes may be assigned default values, and that they're easier to access when navigating a document object model (DOM). On the minus side, he adds that attributes aren't convenient for large amounts of data, that they do not support binary data well, that attributes can't contain other elements, and that white space within an attribute cannot be ignored (since values in quotes must be preserved as entered).
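The tuple idea in the list above (elements that all share the same named attributes behaving like rows in a table) can be sketched as follows. This is an illustration only, assuming Python's standard xml.etree.ElementTree module; the catalog document, the format attribute, the second book entry, and the "unknown" fallback are all hypothetical. The fallback is applied by the application here, whereas XML itself would declare an attribute default in a DTD or schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: every <book> carries the same named attributes,
# so each element behaves like one row of a table.
catalog = ET.fromstring(
    "<catalog>"
    '<book ISBN="1583482590" title="HTML 4.0 Specification" format="paperback"/>'
    '<book ISBN="0000000000" title="Placeholder Title"/>'
    "</catalog>"
)

# Column names correspond to attribute names.
columns = ["ISBN", "title", "format"]

# A missing attribute falls back to an application-level default value.
rows = [
    tuple(book.get(column, "unknown") for column in columns)
    for book in catalog.iter("book")
]

for row in rows:
    print(row)
```

Access by attribute name is a single lookup per value, which is the convenience the list mentions; the same design would become awkward as soon as one of the columns needed nested structure.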

Although it's mathematically and logically true that any document that uses attributes can be modeled by another document that replaces those attributes with equivalent child elements, it's important to understand the use of attributes as a matter of compactness and convenience. If the data you're trying to capture for certain objects is unlikely to be accessed on its own, and it qualifies clearly as metadata, it will probably work well as an attribute. Otherwise, give your design some more thought: you may be able to capture that same information more effectively as element content or as content for child elements.
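The equivalence between attributes and child elements can be shown with a short transformation. The sketch below assumes Python's standard xml.etree.ElementTree module; the function name attributes_to_children is hypothetical, and the approach ignores complications such as an existing child that already uses the same name as an attribute.

```python
import xml.etree.ElementTree as ET


def attributes_to_children(element):
    """Return a copy of element in which every attribute is a child element."""
    converted = ET.Element(element.tag)
    converted.text = element.text
    converted.tail = element.tail
    # Each attribute becomes a child element that holds the value as text.
    for name, value in element.attrib.items():
        child = ET.SubElement(converted, name)
        child.text = value
    # Existing children are converted recursively and kept in document order.
    for child in element:
        converted.append(attributes_to_children(child))
    return converted


original = ET.fromstring('<title ISBN="1583482590">HTML 4.0 Specification</title>')
rewritten = attributes_to_children(original)
print(ET.tostring(rewritten, encoding="unicode"))
```

The rewritten document holds the same data but is longer and mixes the ISBN into the element's content, which is the compactness and convenience argument for keeping metadata in attributes.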

Just remember that in the final analysis, the document design that is best is the one that works best to meet your information capture, management, and delivery needs.


Have questions, comments, or feedback about this or other XML-related topics? Please e-mail me at tips@searchmiddleware.com; I'm always glad to hear from my readers!

Ed Tittel is a principal at LANWrights, Inc., a wholly owned subsidiary of LeapIt.com. LANWrights offers training, writing, and consulting services on Internet, networking, and Web topics (including XML and XHTML), plus various IT certifications (Microsoft, Sun/Java, and Prosoft/CIW).


This was first published in September 2001
