Overview of WordprocessingML

Microsoft Corporation
November 2003


© 2003 Microsoft Corp. All rights reserved.

The information contained in this document represents the current view of Microsoft Corp. on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED, OR STATUTORY AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording or otherwise), or for any purpose, without the express written permission of Microsoft Corp.

Microsoft may have patents, patent applications, trademarks, copyrights or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights or other intellectual property.

Microsoft, ActiveX, Outlook, and Visual Basic are either registered trademarks or trademarks of Microsoft Corp. in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

Microsoft Corp. • One Microsoft Way • Redmond, WA 98052-6399 • USA

Table of Contents

Introduction

Structure of This Document

Section 1: WordprocessingML Overview

Top-Level Elements, Namespace, Basic Document Structure

Tying the Document to Microsoft Word

Section 2: Adding Text to the Document

The t (Text), r (Run), and p (Paragraph) Elements

Sections

Organizing Text

Inserting Breaks

Creating Paragraphs

Tabs

Section 3: Formatting Text

Formatting Runs of Text

Formatting Paragraphs

Styles

Using Styles

Defining Styles

Extending Styles

Style Properties

Property Conflicts

Fonts

Specifying Fonts

Using Fonts

Formatting a Section

Setting a Page’s Size and Margins

Columns

Section 4: Document Components

Lists

Headers, Footers, and Title Pages

Tables

Document Properties

Document Information

Section 5: Other Topics

Graphics

Bookmarks

Fields

Simple Fields

Complex Fields

Hyperlinks

Macros and Components

Section 6: Auxiliary Elements

Sections and Subsections

The sect Element

The sub-section Element

Using the sect and sub-section Elements

Auxiliary Attributes of the Tab Element

The Auxiliary font Element

The Auxiliary estimate Attribute

Reference

Table 1: WordprocessingML Elements

Table 2: rPr Child Elements (Run Properties)

Table 3: pPr Child Elements (Paragraph Properties)

Table 4: sectPr Child Elements (Section Properties)

Table 5: style Element Attributes

Table 6: style Child Elements (Style Definitions)

Table 7: tab Element Attributes

Table 8: WordprocessingML Auxiliary Elements and Attributes

Table 9: docPr Child Elements (Document Properties)

Table 10: Table-Related Elements

Table 11: tblPr Child Elements (Table Properties)

Table 12: tblpPr Child Elements (Table Positioning Properties)

Table 13: trPr Child Elements (Table Row Properties)

Table 14: tcPr Child Elements (Table Cell Properties)

Table 15: listDef Child Elements (List Definitions)

Table 16: lvl Child Elements (List-Level Definitions)

Table 17: nfc Element Integer Values

Introduction

This document describes the elements in the WordprocessingML Schema that are important to document developers and to application developers whose programs will read and write WordprocessingML documents. The text assumes that you have a basic understanding of XML 1.0, XML namespaces, and the functionality of Microsoft® Office Word. Each major section of this document introduces new features of the language and describes those features in the context of concrete examples.

In this document, you’ll see how to:

·  Create a document with typical Word structures (paragraphs and sections)

·  Add typical document components, including lists, tables, headers, footers, and title pages

·  Format your documents by specifying formatting at any level within the document

·  Define and use styles

·  Insert graphics, bookmarks, hyperlinks, and fields into your document

Following this introduction to WordprocessingML is a reference to the WordprocessingML elements that are most useful to developers.

Structure of This Document

After an initial overview of WordprocessingML and document-level properties and information, this whitepaper looks at WordprocessingML topics in the order that developers will presumably need them. This structure means that some elements are not discussed in detail in one location. For instance, the documentProperties element contains elements that affect how fields and headers are handled. As a result, the child elements of the documentProperties element are discussed in two different places in the document.

·  Section 1: An overview of WordprocessingML that describes the simplest possible WordprocessingML document and a summary of the top-level WordprocessingML elements.

·  Section 2: Adding content to a WordprocessingML document as unformatted text.

·  Section 3: Formatting text, including defining and using styles.

·  Section 4: Adding additional components to documents including lists, tables, headers, and footers. Also covered in this section are the document information and properties sections.

·  Section 5: Other topics: bookmarks, hyperlinks, fields.

·  Section 6: The auxiliary elements added by Word to a WordprocessingML document to provide information about the document.

·  Reference: Tables that list and describe the WordprocessingML elements.

Section 1: WordprocessingML Overview

Top-Level Elements, Namespace, Basic Document Structure

The top-level elements in a WordprocessingML document are:

·  Document information (documentProperties element)

·  Font information (fonts element)

·  List-style definitions (lists element)

·  Style definitions (styles element)

·  Drawing defaults (shapeDefaults element)

·  docSuppData element (Microsoft Visual Basic® for Applications [VBA] code)

·  Document options (docPr element)

·  The document’s content (body element)

However, the simplest Word document consists of just five elements (and a single namespace). The five elements are:

Element / Description
wordDocument / The root element for a WordprocessingML document.
body / The container for the displayable text.
p / A paragraph.
r / A contiguous set of WordprocessingML components with a consistent set of properties.
t / A piece of text.

The namespace for the root WordprocessingML Schema (also known as the XML Document 2003 Schema) is “http://schemas.microsoft.com/office/word/2003/wordml”. This namespace is normally associated with the WordprocessingML elements by using the prefix “w”. The simplest possible WordprocessingML document looks like this:

<?xml version="1.0"?>

<w:wordDocument

xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">

<w:body>

<w:p>

<w:r>

<w:t>Hello, World.</w:t>

</w:r>

</w:p>

</w:body>

</w:wordDocument>

In Figure 1, you can see the resulting document, displayed in Microsoft Word.

Figure 1. A WordprocessingML document in Microsoft Word
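Application developers who generate WordprocessingML from code can build the same five-element structure programmatically. The following is a minimal sketch in Python, using only the standard library's xml.etree.ElementTree module. It is an illustration, not part of the schema: the helper names w and build_minimal_document are hypothetical, and only the namespace URI and the element names come from the text above.

```python
import xml.etree.ElementTree as ET

# Namespace URI quoted in the text above; "w" is the customary prefix.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def w(tag):
    """Return the namespace-qualified name that ElementTree expects."""
    return "{%s}%s" % (W_NS, tag)


def build_minimal_document(text):
    """Build wordDocument > body > p > r > t around one piece of text."""
    root = ET.Element(w("wordDocument"))
    body = ET.SubElement(root, w("body"))
    paragraph = ET.SubElement(body, w("p"))
    run = ET.SubElement(paragraph, w("r"))
    ET.SubElement(run, w("t")).text = text
    return root


document = build_minimal_document("Hello, World.")
xml_text = ET.tostring(document, encoding="unicode")
print(xml_text)
```

Registering the prefix before serializing makes ElementTree write w: on every element instead of an automatically generated prefix, which keeps the output close to the hand-written example.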

Tying the Document to Microsoft Word

If you save a Word document with the .xml extension, Windows will treat the file like any other XML file. Double-clicking the file, for instance, will open it in the standard XML processor (usually Microsoft Internet Explorer). However, adding the mso-application processing instruction specifies Word as the preferred application for processing the file. As a result, Word will open the XML document when the user double-clicks the document’s icon. This example shows the sample document with the mso-application processing instruction added:

<?xml version="1.0"?>

<?mso-application progid="Word.Document"?>

<w:wordDocument

xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">

<w:body>

<w:p>

<w:r>

<w:t>Hello, World.</w:t>

</w:r>

</w:p>

</w:body>

</w:wordDocument>

A side effect of this automatic behavior, however, is that Internet Explorer no longer displays the XML markup of XML files saved by Word. You can temporarily disable this behavior by deleting the following registry entry and value

Word.Document = “application/msword”

from the following subkey:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\11.0\Common\Filter\text/xml
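A program that writes WordprocessingML can emit the mso-application processing instruction itself. The sketch below, in Python with the standard library only, places the instruction between the XML declaration and the root element, as in the example above. The function name serialize_for_word is hypothetical, and the processing instruction text is copied from the example rather than built from any API.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)

# The processing instruction exactly as it appears in the example above.
MSO_APPLICATION_PI = '<?mso-application progid="Word.Document"?>'


def serialize_for_word(root):
    """Return declaration, mso-application instruction, and root as one string."""
    markup = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0"?>\n' + MSO_APPLICATION_PI + "\n" + markup


# Rebuild the "Hello, World." sample document.
root = ET.Element("{%s}wordDocument" % W_NS)
body = ET.SubElement(root, "{%s}body" % W_NS)
paragraph = ET.SubElement(body, "{%s}p" % W_NS)
run = ET.SubElement(paragraph, "{%s}r" % W_NS)
ET.SubElement(run, "{%s}t" % W_NS).text = "Hello, World."

output = serialize_for_word(root)
print(output)
```

The instruction is concatenated as text because it must precede the root element, and a string join keeps the sketch short.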

Section 2: Adding Text to the Document

The document’s content is held in the body element. Text within the body element is kept in a nested set of three elements: t (a piece of text), r (a run of text within a paragraph), and p (a paragraph).

The t (Text), r (Run), and p (Paragraph) Elements

The lowest level of this hierarchy is the t element, which is the container for the text that makes up the document’s content. You can put as much text as you want in a t element—up to and including all your document’s content. However, in most WordprocessingML documents, long runs of text will be broken up into paragraphs and strings with different formats, or be interrupted by line breaks, graphics, tables, and other items in a Word document.

A t element must be enclosed by an r element (a run of text). An r element can contain multiple occurrences of t elements, among other elements. The r element allows the WordprocessingML author to combine breaks, styles, and other components but apply the same characteristics to all the parts of the run. All of the elements inside an r element have their properties controlled by the rPr element (for run properties), which is the first child of the r element. The rPr element, in turn, is a container for a set of property elements that are applied to the rest of the children of the r element. The elements inside the rPr container element allow you to control, among other options, whether the text in the following t elements is bold, underlined, or visible.
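The ordering rule described above (rPr first, then the content it governs) can be sketched in Python with the standard library. The helper make_run is hypothetical. The property element names b (bold) and vanish (hidden text) are assumed here to match the run properties listed in Table 2; check them against that table before relying on them.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def w(tag):
    """Return the namespace-qualified name that ElementTree expects."""
    return "{%s}%s" % (W_NS, tag)


def make_run(text, bold=False, hidden=False):
    """Build an r element whose rPr (if any) precedes its t element."""
    run = ET.Element(w("r"))
    if bold or hidden:
        # rPr is created before any content so that it is the first child.
        props = ET.SubElement(run, w("rPr"))
        if bold:
            ET.SubElement(props, w("b"))       # assumed name for bold
        if hidden:
            ET.SubElement(props, w("vanish"))  # assumed name for hidden text
    ET.SubElement(run, w("t")).text = text
    return run


bold_run = make_run("Hello, World.", bold=True)
plain_run = make_run("How are you, today?")
print(ET.tostring(bold_run, encoding="unicode"))
```

A run with no properties simply omits rPr, which matches the unformatted examples used in this section.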

Sections

In a WordprocessingML document, the layout of the page that your text appears in is controlled by the properties for that section of the document. However, there is no container element for sections in WordprocessingML. Instead, the information about a section is kept inside a sectPr (section properties) element that appears at the end of each section. Though a sectPr element isn’t necessary in a WordprocessingML document, Word always inserts a sectPr element at the end of any new document that it creates. Here is a typical sectPr element generated by Word when a document is created:

<w:sectPr>

<w:pgSz w:w="12240" w:h="15840"/>

<w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800"

w:header="720" w:footer="720" w:gutter="0"/>

<w:cols w:space="720"/>

<w:docGrid w:line-pitch="360"/>

</w:sectPr>

When new sections are added to a WordprocessingML document, the new sectPr elements must appear inside pPr elements (which are discussed later) inside p elements. This example shows a sectPr element added to a document to mark the end of a section:

<w:p>

<w:pPr>

<w:sectPr>

<w:pgSz w:w="6120" w:h="7420" />

<w:pgMar w:top="720" w:right="720" w:bottom="720"

w:left="720" w:header="0" w:footer="0"

w:gutter="0" />

</w:sectPr>

</w:pPr>

</w:p>

Each sectPr element marks the end of a section and the start of a new section. The child elements of the sectPr element provide the definition of the section just ended. All the child elements for the sectPr element are listed in Table 4.

While WordprocessingML does not have a container for sections, Word does generate sect elements that act as containers for sections. These are not part of WordprocessingML but belong to the Auxiliary XML Document 2003 namespace (“http://schemas.microsoft.com/office/word/2003/auxHint”). The sect elements (and other auxiliary elements) are discussed later in this document.
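Because a sectPr element closes the section that precedes it, a program reading a document has to collect paragraphs until it reaches one. The Python sketch below, using the standard library only, groups paragraphs this way and reports each section's page size. It rests on two assumptions: that the final sectPr is a direct child of the body element, and that the pgSz values are in twentieths of a point (1440 to the inch), which is consistent with 12240 by 15840 being 8.5 by 11 inches. The function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
TWIPS_PER_INCH = 1440  # assumed unit: twentieths of a point


def w(tag):
    """Return the namespace-qualified name that ElementTree expects."""
    return "{%s}%s" % (W_NS, tag)


SAMPLE = """<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
<w:body>
<w:p><w:r><w:t>First section.</w:t></w:r></w:p>
<w:p><w:pPr><w:sectPr><w:pgSz w:w="6120" w:h="7420"/></w:sectPr></w:pPr></w:p>
<w:p><w:r><w:t>Second section.</w:t></w:r></w:p>
<w:sectPr><w:pgSz w:w="12240" w:h="15840"/></w:sectPr>
</w:body>
</w:wordDocument>"""


def split_sections(body):
    """Return (paragraphs, sectPr) pairs; sectPr ends the paragraphs before it."""
    sections, current = [], []
    for child in body:
        if child.tag == w("p"):
            current.append(child)
            sect_pr = child.find(w("pPr") + "/" + w("sectPr"))
            if sect_pr is not None:
                sections.append((current, sect_pr))
                current = []
        elif child.tag == w("sectPr"):
            sections.append((current, child))
            current = []
    if current:
        # Trailing paragraphs with no closing sectPr.
        sections.append((current, None))
    return sections


def page_size_inches(sect_pr):
    """Convert the pgSz width and height of a sectPr to inches."""
    size = sect_pr.find(w("pgSz"))
    return (int(size.get(w("w"))) / TWIPS_PER_INCH,
            int(size.get(w("h"))) / TWIPS_PER_INCH)


body = ET.fromstring(SAMPLE).find(w("body"))
sections = split_sections(body)
for paragraphs, sect_pr in sections:
    print(len(paragraphs), page_size_inches(sect_pr))
```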

Organizing Text

The following example has multiple t elements inside an r element (for the following examples, only the body element and its children are shown):

<w:body>

<w:p>

<w:r>

<w:t>Hello, World.</w:t>

<w:t> How are you, today?</w:t>

</w:r>

</w:p>

</w:body>

Although this document is valid, splitting the text across two t elements isn’t necessary. The following example gives the same result as the previous one:

<w:body>

<w:p>

<w:r>

<w:t>Hello, World. How are you, today?</w:t>

</w:r>

</w:p>

</w:body>
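The equivalence of the two forms above can be checked by concatenating the t elements of each run. This is a minimal Python sketch with the standard library; the helper run_text is hypothetical and looks only at t elements, ignoring anything else a run might contain.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def w(tag):
    """Return the namespace-qualified name that ElementTree expects."""
    return "{%s}%s" % (W_NS, tag)


def run_text(run):
    """Join the text of every t element that is a direct child of the run."""
    return "".join(t.text or "" for t in run.findall(w("t")))


# The run from the first example: two t elements.
split_run = ET.fromstring(
    '<w:r xmlns:w="%s">'
    "<w:t>Hello, World.</w:t><w:t> How are you, today?</w:t>"
    "</w:r>" % W_NS
)

# The run from the second example: one t element.
single_run = ET.fromstring(
    '<w:r xmlns:w="%s">'
    "<w:t>Hello, World. How are you, today?</w:t>"
    "</w:r>" % W_NS
)

print(run_text(split_run) == run_text(single_run))
```

The leading space inside the second t element of the first form is what keeps the two sentences apart, so a reader must preserve whitespace in t content.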

Inserting Breaks

Typically, if you have multiple t elements in an r element, it’s because you need to insert some other element in between the pieces of text. In the following example, a br element appears between the two t elements. The br element will force the second t element to a new line when the text is displayed in Word:

<w:body>

<w:p>

<w:r>

<w:t>Hello, World. </w:t>

<w:br w:type="text-wrapping"/>

<w:t>How are you, today?</w:t>

</w:r>

</w:p>

</w:body>

The br element’s type attribute allows you to specify the kind of break (“page”, “column”, “text-wrapping”). Because the default is “text-wrapping” (a new line), the type attribute in the previous example could have been omitted. Figure 2 shows the results of using a br element between t elements.

Figure 2. A Word document with a br element between t elements

Creating Paragraphs

You use p elements to define new paragraphs. A br element of type “text-wrapping”, by contrast, is equivalent to the “soft break” that you create in Word by pressing SHIFT+ENTER; it starts a new line but not a new paragraph. A WordprocessingML document with text in two separate paragraphs would look like this:

<w:body>

<w:p>

<w:r>

<w:t>Hello, World.</w:t>

</w:r>

</w:p>

<w:p>

<w:r>

<w:t>How are you, today?</w:t>

</w:r>

</w:p>

</w:body>

The resulting document can be seen in Figure 3. As a comparison of Figures 2 and 3 shows, depending on your formatting options, the difference between using br elements and p elements may not be visible. The display of a WordprocessingML document in Word may not reveal the underlying structure of the document.

Figure 3. A Word document with multiple p elements
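Although the two documents may look alike in Word, a program reading the markup sees the difference. The Python sketch below (standard library only) extracts text paragraph by paragraph and turns each text-wrapping br into a newline. The helpers paragraph_text and body_paragraphs are hypothetical, and the sketch ignores page and column breaks as well as any element other than t and br.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def w(tag):
    """Return the namespace-qualified name that ElementTree expects."""
    return "{%s}%s" % (W_NS, tag)


def paragraph_text(paragraph):
    """Return a paragraph's text, with text-wrapping breaks as newlines."""
    parts = []
    for run in paragraph.findall(w("r")):
        for child in run:
            if child.tag == w("t"):
                parts.append(child.text or "")
            elif child.tag == w("br"):
                # A missing type attribute means the default, "text-wrapping".
                if child.get(w("type"), "text-wrapping") == "text-wrapping":
                    parts.append("\n")
    return "".join(parts)


def body_paragraphs(body):
    """Return one string per p element that is a direct child of the body."""
    return [paragraph_text(p) for p in body.findall(w("p"))]


WITH_BREAK = ET.fromstring(
    '<w:body xmlns:w="%s"><w:p><w:r>'
    '<w:t>Hello, World. </w:t><w:br w:type="text-wrapping"/>'
    "<w:t>How are you, today?</w:t>"
    "</w:r></w:p></w:body>" % W_NS
)

WITH_PARAGRAPHS = ET.fromstring(
    '<w:body xmlns:w="%s">'
    "<w:p><w:r><w:t>Hello, World.</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>How are you, today?</w:t></w:r></w:p>"
    "</w:body>" % W_NS
)

print(body_paragraphs(WITH_BREAK))       # one paragraph containing a newline
print(body_paragraphs(WITH_PARAGRAPHS))  # two separate paragraphs
```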

Tabs

The tab element allows you to position text horizontally on a line. A tab element moves the text that follows it to the next tab stop. Exactly where on the line that is depends on how tab stops are defined in the document.

In this example, the text will appear on a single line but with each t element’s text positioned at a separate tab stop: