XML in Office 2003
Executive Summary
XML support is the most dramatic change in Microsoft® Office since Word and Excel first got bundled together for marketing purposes nearly 20 years ago. Up until now, your favorite Office applications helped you input data, manipulate data, and format data. Now those same applications have the capability to understand what your data means, help you work with it more effectively, and present it with greater flexibility than ever before.
For example:
- Now you can establish rules for each type of document. If a document violates a rule, you can tell Office not to print or save the document until the violation is corrected.
- Now at the click of a button you can instantly transform a document to suit multiple purposes.
- Now you can save documents as XML data files that can be opened and edited with any industry standard XML editor.
Table of Contents
I. Why Office 2003 isn’t your father’s Office.
II. The impact of XML
III. Sneaking a peek inside an XML data file
IV. Schemas and transforms: the key to the XML highway
V. Putting the pieces together with Office Professional Edition 2003
VI. Things you can do with XML data in an Office 2003 application
VII. Experiencing the power of XML for yourself
VIII. Links to more information about XML
IX. About Wordsite Office Automation
I. Why Office 2003 isn’t your father’s Office.
Microsoft Office has always been littered with what the cognoscenti like to call “undocumented” features, but the most important new feature of Office Professional Edition 2003 isn’t so much undocumented as unheralded. You might say it’s hiding in plain sight.
I’m talking about XML support across the most popular Office 2003 applications, Word, Excel, and Access, and even the new InfoPath application.
No one at Microsoft denies that Office Professional Edition 2003 supports XML, but I don’t see the company going out of its way to promote or explain this incredibly useful new functionality. Instead, the company is touting changes in Outlook’s interface and talking up OneNote, InfoPath, and the brave new world of collaborative workspaces.
I’m all in favor of Outlook’s new interface. I fully accept that OneNote, InfoPath, and collaborative workspaces are the new new thing. To me, though, nothing about Office 2003 is as exciting or as immediately useful as XML support. In fact, in my opinion XML support is the most dramatic change in Office since Word and Excel first got bundled together for marketing purposes nearly 20 years ago.
NOTE: If you’ve purchased the Small Business Edition, Standard Edition, or Teacher and Student Edition of Office 2003, you’ve lost out on the most powerful aspects of XML in Office. You can save your documents in a native XML file format that can be manipulated and searched by any industry standard XML editor, and that’s great, but that’s all. The real power of XML isn’t available to you unless you purchase Office Professional Edition 2003.
II. The impact of XML
XML has been around since 1998 but most of us didn’t care because it was a fairly arcane technology aimed at specialists and geared toward database compatibility. Since then, XML has grown up and learned a thing or two about validating and formatting all types of data and presenting a single body of data in different ways to different users.
Oops, was I starting to bore you with all that talk about data? Don’t go to sleep on me just yet.
Sure, XML is all about data and that’s nice for database designers and database administrators and database users. But XML isn’t just about data-data. Starting with Office Professional Edition 2003, it’s about Word documents and Excel spreadsheets and data from other Office applications. In other words, it’s about you and the kind of data you work with on a day-in, day-out basis.
With Office 2003 you can create an XML document in Word and at the click of a button instantly transform it to suit multiple purposes. For example, you can create a training guide and at the click of a button transform it into a teacher guide or a student guide or a self-quiz. (Stay tuned to learn how. It’s easier than you think.)
Another benefit of XML is that it allows you to establish rules for each type of document. For example, you can specify that a fax number must appear at the top of each fax cover sheet and must always include a valid area code. If a document violates this or any other rule, you can tell Office to refuse to print or save the document until the violation is corrected.
Best of all, since Office 2003 produces files that can be opened and edited with any XML editor, you can share your work with other users in new and exciting ways.
Perhaps the best way to sum up the impact of all this is to state the following: Up until Office Professional Edition 2003, your favorite Office applications helped you input data, manipulate data, and format data. Now those same applications are capable of understanding what your data means, helping you work with it more effectively, and allowing you to present it with greater flexibility than ever before.
Does this mean that every user of Office Professional Edition 2003 needs to become an XML geek? Notonyerlife. But it does mean that every user owes it to himself or herself to experience the incredible power that derives from working with XML data in Office.
The rest of this article provides some additional background and a simple set of procedures that any user of Office Professional Edition 2003 can carry out in minutes in order to experience personally and directly the amazing power of XML data in Office.
NOTE: If you lack the time to carry out even these simple procedures, you can download a sample file from the sites listed at the end of this article. Don’t stop reading, though. The sample file will dazzle you but you won’t have any idea how it was put together unless you take a few minutes to read through the discussion that follows.
A word before we begin: The fundamentals of XML are extremely simple to understand, so that’s what you’ll find here. What I’m about to explain is so easy to understand that even your boss should be able to follow along. In fact, I’ll go out on a limb and say that even your boss’s boss should be able to follow along.
Remember, the point of what follows is NOT to turn you into an XML geek but rather to show you how powerful and easy-to-use XML technology has become with the advent of Office Professional Edition 2003. If this leaves you hungry for more information, so much the better. In that case, you’ll appreciate the references I’ve provided at the end of the article, which will lead you to more information about the technical nitty-gritty of XML.
III. Sneaking a peek inside an XML data file
XML stands for eXtensible Markup Language. This tells you that XML belongs to the same family of languages as HTML (HyperText Markup Language) and SGML (Standard Generalized Markup Language), which many users have tried to learn without success.
Fortunately, XML is dramatically easier to learn and use than other markup languages. I’ve identified six key concepts that are worth learning even if you’re not the type who normally looks under the hood. It will take you only a few minutes to grasp these concepts, not because I’m such a great communicator but because the concepts themselves are very easy to grasp.
XML 101: Data is divided into elements
Consider the following snippet of XML “code.”
<ARGUMENT>
<FACT>One way to learn a language is to immerse yourself in it.</FACT>
<FACT>If you’re reading this, you’re immersed in XML.</FACT>
<CONCLUSION>One way to learn XML is to read this.</CONCLUSION>
</ARGUMENT>
<ARGUMENT>
<FACT>I made up these tags to suit my purpose.</FACT>
<FACT>My purpose was to help you read XML.</FACT>
<CONCLUSION>I made up these tags to help you.</CONCLUSION>
</ARGUMENT>
Before I even bother to explain what you’ve just read, I want to pause and give you a chance to celebrate the progress you’ve already made toward mastery of XML.
Did you notice any of the following?
1. That each element of data started with a <TAG> of some kind and ended with a matching </TAG>?
2. That the tags have friendly, meaningful names that describe each element of data?
3. That some elements were nested inside other elements?
If you noticed all that, then you’re ready to graduate from XML 101. If you didn’t notice all that, then take a minute to read through the code snippet again before proceeding.
Did you also notice that I snuck the technical term “element” into this discussion?
XML data is divided into elements so that each element can be tagged separately from all the other elements. That way, the meaning of each element can be made clear and elements can be nested inside each other according to their natural relationships. This approach makes data much easier to manage than if it were stored as one large, undifferentiated, unlabeled mass.
In the snippet above, there’s nothing special about the element names except that they suited my purpose (or so it seemed to me). I used ARGUMENT, FACT, and CONCLUSION but I could just as easily have used SYLLOGISM, PREMISE, and CONCLUSION or almost any other names that reflect the type of data contained in each element.
A side note: There are a couple of rules you have to follow when naming elements, regarding placement of alpha and numeric characters and the use of punctuation and space characters. Also, an element of data can contain almost any text but some characters have to be represented using special techniques. These details are fully explained in the references listed at the end of this article.
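If you’d like to watch a program confirm what you just noticed, here’s a minimal sketch. It assumes Python and its standard-library xml.etree.ElementTree parser, which are used purely for illustration and have nothing to do with Office 2003; the child_tags helper is a made-up name. The sketch reads the first ARGUMENT element from the snippet above and reports which elements are nested inside it.

```python
import xml.etree.ElementTree as ET

# The first ARGUMENT element from the snippet above.
SNIPPET = """
<ARGUMENT>
<FACT>One way to learn a language is to immerse yourself in it.</FACT>
<FACT>If you're reading this, you're immersed in XML.</FACT>
<CONCLUSION>One way to learn XML is to read this.</CONCLUSION>
</ARGUMENT>
"""


def child_tags(xml_text):
    """Return the names of the elements nested directly inside the outermost element.

    Hypothetical helper, named here only for illustration.
    """
    outer = ET.fromstring(xml_text)
    return [child.tag for child in outer]


argument = ET.fromstring(SNIPPET)
nested = child_tags(SNIPPET)

# Each nested element carries its own name and its own piece of data.
print(argument.tag, "contains", nested)
for child in argument:
    print(child.tag, "->", child.text)
```

The point of the sketch is simply that tagged, nested data can be picked apart by name, which is much harder to do with one large, unlabeled mass of text.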
XML 102: Getting at the root of the matter
You can store as many elements of data in an XML file as desired but the file can have only one “root” element. The root element is the first element in the file. All of the other elements must be nested inside the root. In the XML snippet below, the <EXAMPLES_OF_LOGIC> element serves as the root element. All of the other elements are nested inside that element.
<EXAMPLES_OF_LOGIC>
<ARGUMENT>
<FACT>One way to learn a language is to immerse yourself in it.</FACT>
<FACT>If you’re reading this, you’re immersed in XML.</FACT>
<CONCLUSION>One way to learn XML is to read this.</CONCLUSION>
</ARGUMENT>
<ARGUMENT>
<FACT>I made up these tags to suit my purpose.</FACT>
<FACT>My purpose was to help you read XML.</FACT>
<CONCLUSION>I made up these tags to help you.</CONCLUSION>
</ARGUMENT>
</EXAMPLES_OF_LOGIC>
If you promise to remember that an XML file can have only one root element and that all of the other elements in the file must be nested inside the root element, then you’re ready to graduate from XML 102.
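For the curious, the one-root rule is easy to test. The following is a minimal sketch, again assuming Python’s standard xml.etree.ElementTree parser as a stand-in for any XML tool (it is not what Office uses, as far as this article is concerned); is_well_formed is a hypothetical helper name. It feeds the parser two ARGUMENT elements with no root around them, then the same two elements nested inside a single root.

```python
import xml.etree.ElementTree as ET

# Two ARGUMENT elements side by side, with nothing wrapped around them.
NO_SINGLE_ROOT = """<ARGUMENT>
<FACT>I made up these tags to suit my purpose.</FACT>
</ARGUMENT>
<ARGUMENT>
<FACT>My purpose was to help you read XML.</FACT>
</ARGUMENT>"""

# The same two elements nested inside one root element.
SINGLE_ROOT = "<EXAMPLES_OF_LOGIC>" + NO_SINGLE_ROOT + "</EXAMPLES_OF_LOGIC>"


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as a complete XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# The parser rejects a second top-level element.
print("Without a root:", is_well_formed(NO_SINGLE_ROOT))
# One root, with everything else nested inside it, is accepted.
print("With a root:", is_well_formed(SINGLE_ROOT))
```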
XML 103: Elements are like pockets, sometimes they're empty.
Some XML elements contain no data. The following snippet contains three ARGUMENT elements. The middle one (i.e., the second one) contains no data.
<EXAMPLES_OF_LOGIC>
<ARGUMENT>
<FACT>One way to learn a language is to immerse yourself in it.</FACT>
<FACT>If you’re reading this, you’re immersed in XML.</FACT>
<CONCLUSION>One way to learn XML is to read this.</CONCLUSION>
</ARGUMENT>
<ARGUMENT/>
<ARGUMENT>
<FACT>I made up these tags to suit my purpose.</FACT>
<FACT>My purpose was to help you read XML.</FACT>
<CONCLUSION>I made up these tags to help you.</CONCLUSION>
</ARGUMENT>
</EXAMPLES_OF_LOGIC>
Did you find the empty element? Here’s another look at it:
<ARGUMENT/>
In the following code snippet, you’ll see an ARGUMENT element that contains two FACT elements and a CONCLUSION element. The FACT elements are empty. So is the CONCLUSION element. The ARGUMENT element isn’t empty, however, because it contains the FACT elements and the CONCLUSION element.
<ARGUMENT>
<FACT/>
<FACT/>
<CONCLUSION/>
</ARGUMENT>
Did you notice the following?
1. That an empty element needs only one <TAG/>?
2. That the tag for an empty element includes a slash immediately before the closing angle bracket?
If you noticed both of these things, then you’re ready to graduate from XML 103. If you missed either of them, then take a minute to read through the code snippets again before proceeding.
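Here’s a minimal sketch of how a program might tell empty elements from non-empty ones, assuming Python’s standard xml.etree.ElementTree parser (chosen only for illustration). The is_empty helper is a hypothetical name, and its definition of “empty” (no nested elements and no text) is my reading of the discussion above. The sketch also shows that the one-tag form <ARGUMENT/> and the two-tag form <ARGUMENT></ARGUMENT> describe the same empty element.

```python
import xml.etree.ElementTree as ET


def is_empty(element):
    """Treat an element as empty if it has no nested elements and no text.

    Hypothetical helper, named here only for illustration.
    """
    return len(element) == 0 and not element.text


short_form = ET.fromstring("<ARGUMENT/>")
long_form = ET.fromstring("<ARGUMENT></ARGUMENT>")
outer = ET.fromstring("<ARGUMENT><FACT/><FACT/><CONCLUSION/></ARGUMENT>")

# Both spellings of the empty ARGUMENT element are empty.
print(is_empty(short_form), is_empty(long_form))

# This ARGUMENT is not empty, because it contains three elements,
# even though each of those elements is itself empty.
print(is_empty(outer))
print([(child.tag, is_empty(child)) for child in outer])
```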
XML 104: Attributes are just properties that have moved uptown
Sometimes naming an element isn’t sufficient to convey everything there is to know about the element. For that reason, XML allows you to assign other properties to an element besides a name. The other properties are called “attributes.” Don’t let the terminology throw you. Attributes are just properties by another name. Many elements have no attributes. Some have just one attribute (besides a name). Some have multiple attributes.
In the following snippet, each ARGUMENT element has been assigned an attribute that tells whether the element is convincing. The “convincing” attribute of the first ARGUMENT element is set to “Yes” but the attributes of the other two ARGUMENT elements are set to “No.” See if you can figure out why.
<EXAMPLES_OF_LOGIC>
<ARGUMENT convincing="Yes">
<FACT>All virtues are laudable.</FACT>
<FACT>Kindness is a virtue.</FACT>
<CONCLUSION>Kindness is laudable.</CONCLUSION>
</ARGUMENT>
<ARGUMENT convincing="No"/>
<ARGUMENT convincing="No">
<FACT>Socrates was a man.</FACT>
<FACT>All men are created equal.</FACT>
<CONCLUSION>Socrates was Plato.</CONCLUSION>
</ARGUMENT>
</EXAMPLES_OF_LOGIC>
NOTE: I used uppercase letters for the ELEMENT NAMES and lowercase letters for the attribute names. I didn’t have to do that. I did it to make it easier to distinguish the element names from the attribute names. (There are a couple of rules you have to follow when naming attributes. The rules are explained in the references listed at the end of this article.)
Did you notice the following?
1. That attributes are placed inside an element’s start tag?
2. That an empty element can have attributes just like any other element?
If you noticed both of these things, then you’re ready to graduate from XML 104. If you missed either of them, then take a minute to read through the code snippets again before proceeding.
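Attributes are just as easy for a program to read as element names. The following minimal sketch assumes Python’s standard xml.etree.ElementTree parser (an illustration only, not something Office 2003 requires). It reads the “convincing” attribute of each ARGUMENT element in the snippet above, then uses that attribute to pick out the conclusions worth keeping.

```python
import xml.etree.ElementTree as ET

# The snippet from this section, unchanged.
EXAMPLES = """
<EXAMPLES_OF_LOGIC>
<ARGUMENT convincing="Yes">
<FACT>All virtues are laudable.</FACT>
<FACT>Kindness is a virtue.</FACT>
<CONCLUSION>Kindness is laudable.</CONCLUSION>
</ARGUMENT>
<ARGUMENT convincing="No"/>
<ARGUMENT convincing="No">
<FACT>Socrates was a man.</FACT>
<FACT>All men are created equal.</FACT>
<CONCLUSION>Socrates was Plato.</CONCLUSION>
</ARGUMENT>
</EXAMPLES_OF_LOGIC>
"""

root = ET.fromstring(EXAMPLES)
arguments = root.findall("ARGUMENT")

# Read the attribute from each start tag, including the empty element's tag.
verdicts = [argument.get("convincing") for argument in arguments]
print(verdicts)

# Keep only the conclusions of arguments marked as convincing.
convincing_conclusions = [
    argument.find("CONCLUSION").text
    for argument in arguments
    if argument.get("convincing") == "Yes"
]
print(convincing_conclusions)
```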
XML 105: My baloney has a first name and so do all my XML elements
The great thing about XML is that you get to name your own data elements. The bad thing about XML is that everyone else gets to name their own data elements, too, which means that naming conflicts can occur.
Naming conflicts aren’t unique to XML. They can occur in all sorts of situations where people are allowed to name things. They can occur even with something as simple as filenames.
For example, I have a website and one of the files at my website is called index.html. The New York Times has a website and one of the files there is called index.html, too.
Fortunately, there’s no danger of the two files being confused, because they exist in different domains. The domain name for my website is The domain name for the New York Times website is
When you include the domain name as part of a filename, you eliminate the possibility of a naming conflict. For example, the full name of the index.html file at my website is and the full name of the index.html file at the New York Times website is Since these names are unique, no one will ever confuse my file and the New York Times file.
What’s needed for XML is something like a domain name, to prevent like-named elements from being confused. XML uses namespaces for this purpose. Namespaces are a lot like domain names. They act like a first name for each element. When you include a unique namespace name as part of an element name, you eliminate the possibility of a naming conflict.
For example, I might have an XML element called <TITLE> and the New York Times might have an XML element called <TITLE> but a naming conflict can be avoided if the full name of my element is { and the full name of the New York Times element is {
By the way, there’s no rule saying that a namespace name has to look like a domain name, but why argue with success? I know that my domain name is unique and I know that the New York Times domain name is unique, so why not use the domain names as part of the namespace names? That way, I can be sure that the names are unique. (This idea isn’t new with me. Domain names are widely used as namespace names precisely because they are unique.)
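To see namespaces at work, here’s a minimal sketch, assuming Python’s standard xml.etree.ElementTree parser. Everything specific in it is made up for illustration: the two namespace names (http://example.com/my-site and http://example.org/newspaper) are placeholders rather than anyone’s real namespace, and so are the BOOKMARKS root element and the title text. The xmlns:prefix="..." declarations are standard XML namespace syntax that this article hasn’t introduced. The parser reports each element’s full name as the namespace name in curly braces followed by the element name.

```python
import xml.etree.ElementTree as ET

# Two like-named TITLE elements, each tied to a different (placeholder) namespace.
DOCUMENT = """
<BOOKMARKS xmlns:mine="http://example.com/my-site"
           xmlns:paper="http://example.org/newspaper">
<mine:TITLE>XML in Office 2003</mine:TITLE>
<paper:TITLE>Front Page</paper:TITLE>
</BOOKMARKS>
"""

root = ET.fromstring(DOCUMENT)

# The parser expands each prefix into the full namespace name.
full_names = [child.tag for child in root]
for name in full_names:
    print(name)

# Look up one TITLE by its full name; the other TITLE is not matched.
my_title = root.find("{http://example.com/my-site}TITLE")
print(my_title.text)
```

Because the two full names differ, a program can ask for one TITLE without any risk of getting the other.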