Lab Supplement: Open XML in C++, Lab 11

Lab11: Open XML in C++

Using the .NET Framework and C++/CLI

Open XML programming has been simplified as a result of powerful software frameworks like the .NET Framework 4.0. If your application is written in C++, one option you have for Open XML development is to use existing ZIP and XML libraries from vendors or the community. Another option is to use the .NET Framework 4.0 in your C++ application.

In this lab you will learn how to integrate the Common Language Runtime into a C++ application using the C++/CLI language features. C++/CLI is a set of extensions to the C++ language that has been standardized by Ecma. C++/CLI extends C++ with features available in more modern programming environments, such as garbage collection, iterators, and properties. You will also learn how to use the class library in the .NET Framework 4.0 from C++.

The goal of this lab is to write a word counting utility for WordprocessingML files. Utilities that count words in ASCII files are common in various operating systems but are of no use for richly formatted documents like those in WordprocessingML format. The utility you will create will be capable of counting paragraphs, runs of text, words and characters.

You will start the lab with a skeleton written in ANSI-conformant C++. The skeleton defines a few data structures and prints a report to the standard output stream, but it is missing all of the logic needed to read a WordprocessingML file and count the various objects.

You will then convert one module to use the C++/CLI extensions; the result will be a mixed-mode application that is part native and part managed. This module uses System.IO.Packaging and System.Xml to do the majority of the work, and it builds on the knowledge you have acquired in the previous labs.

Exercise 0: Getting Started

In this exercise, you will open the OfficeOpenXML solution, which contains the starting points for all of the labs.

To open the Office Open XML solution:

  1. In Visual Studio 2010, on the File menu, point to Open, and then click Project/Solution.

If Visual Studio 2010 is not open, click Start, point to All Programs, point to Microsoft Visual Studio 2010, and then click Microsoft Visual Studio 2010.

  2. In the Open Project dialog box, navigate to the C:\OfficeOpenXML\Labs\Starter folder, select Starter.sln, and then click Open.
  3. Locate the Lab11 project in Solution Explorer. All of the following exercises will be done using this project.

If Solution Explorer is not visible, on the View menu, click Solution Explorer.

  4. Right-click Lab11 in Solution Explorer and select Set as StartUp Project. This causes commands such as Build and Debug to act on this project.

Exercise 1: Configuring the C++ project for CLR integration

In this exercise you will configure your project so you can begin to write code in C++/CLI.

You will start with a project that was created as a Win32 Console Application. One of the modules (count.cpp) has been written as a stub for doing the work of counting objects in a WordprocessingML document. You will convert this module to use the Common Language Runtime (CLR).

There are several compiler settings that Visual C++ adds by default to Win32 project types that are not compatible with the CLR. The compiler and linker will warn you of any such conflicts. In this exercise you will go through the steps needed to make the project compatible with the CLR.

To verify that the program has successfully loaded the CLR, you will make a trivial call to System.Console.WriteLine, which in C++/CLI syntax is written System::Console::WriteLine.

Configure the project’s compiler settings for CLR compatibility:

  1. In Solution Explorer in Visual Studio, select Lab11.
  2. Right-click the Lab11 project and select Properties.

The Property Pages dialog box appears, displaying the compiler, linker, and other settings that apply to the whole project.

  3. In the Configuration list, select All Configurations.
  4. Navigate the configuration tree to Configuration Properties | C/C++ | General.
  5. Add the path to the .NET Framework 4.0 reference assemblies to the Resolve #using References setting. In a default installation on the C: drive, the location is “C:\Program Files\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.0” (on 64-bit Windows, the folder is under “C:\Program Files (x86)” instead).

In particular, you are adding the location of WindowsBase.dll, the .NET assembly in which System.IO.Packaging is implemented.

  6. In the Configuration list, select Debug.
  7. Navigate the configuration tree to Configuration Properties | C/C++ | General.
  8. Set Debug Information Format to Program Database (/Zi). The Debug default, Program Database for Edit And Continue (/ZI), is not compatible with /clr.

Configure count.cpp compiler settings for CLR compatibility:

  1. In Solution Explorer in Visual Studio, select the source file count.cpp.

The count.cpp file is the stub module where you will write new code using C++/CLI. In its initial state this file is compiled to native code. The steps in this task change the compilation of this file so that it produces managed code (MSIL).

  2. Right-click count.cpp and select Properties.

The Property Pages dialog box appears, displaying the compiler, linker, and other settings for the count.cpp module.

  3. In the Configuration list, select All Configurations.
  4. Navigate the configuration tree to Configuration Properties | C/C++ | General.
  5. Set Common Language Runtime Support to Common Language Runtime Support (/clr).

This is the primary switch that enables use of the .NET Framework inside a C++ application. All of the other compiler setting changes are made only to be compatible with this switch.

  6. Navigate the configuration tree to Configuration Properties | C/C++ | Code Generation.
  7. Set Enable Minimal Rebuild (/Gm) to No.
  8. Set Enable C++ Exceptions to Yes with SEH Exceptions (/EHa).
  9. Set Basic Runtime Checks (/RTC) to Default.
  10. Navigate the configuration tree to Configuration Properties | C/C++ | Precompiled Headers.
  11. Set Precompiled Header to Not Using Precompiled Headers.

Precompiled headers generally require all code modules to be compiled with the same compiler options. Because you will be compiling the count.cpp module using the /clr switch, you cannot reuse the precompiled header that was used for native code compilation.

In an actual production development scenario you would want to establish a configuration in which precompiled headers can also be used for managed code. This could be done by creating a separate precompiled header for the managed modules, or by consolidating the majority of the managed code into separate DLLs.

Add a trivial call to the .NET Framework:

  1. Open the file count.cpp in Visual Studio.
  2. Add a reference to the .NET Framework base class library System.Dll.

#include "stdafx.h"
#include "count.h"
// Add references to the .NET Framework
#using "System.Dll"

using namespace System;

  3. Locate the function wordml_count.
  4. Add a trivial call to System::Console::WriteLine.

// trap all errors in a single try/catch
try
{
    // marshal native string file name to managed type
    String^ fileName = gcnew String(file);
    Console::WriteLine("Counting {0}", fileName);
}
catch (Exception^ e)
{
    Console::WriteLine(e->Message);
    return false;
}
catch (...)
{
    return false;
}
return true;

At this point you will begin to see the C++/CLI extensions in action. One of the first things you see is the ^ (handle) declarator, which declares a handle to a managed object on the garbage-collected heap; gcnew allocates such an object. Because the file name passed to the wordml_count function is a native C++ string, it is necessary to marshal it to the managed string type System::String. C++/CLI provides facilities for marshalling data in both directions across the native and managed boundary.
For performance reasons, it is generally recommended to cross between managed and native code in a coarse-grained fashion, that is, with a few calls that each do a lot of work rather than many small calls. In the case of Open XML, this might be done at the level of file read and write operations.
The other thing you will notice is that error handling is done in a single catch-all try/catch block that is not very sophisticated. This keeps the code in this lab easy to read.
Finally, notice that this code does not call the traditional printf function to write to the console but rather its .NET equivalent, System::Console::WriteLine. Because the top of the module declares use of namespace System, it is not necessary to fully qualify the class name, so you see it as just Console rather than System::Console.

Run and verify the results

  1. Build the project: on the Build menu, click Build Lab11.
  2. Open a Windows Command Prompt and change directory to C:\OfficeOpenXML\Labs\Starter\Lab11.
  3. Run debug\lab11 sample.docx at the command line. You should see output similar to the following.


If you wish to run the program under the debugger from within Visual Studio, note that the project has been saved with a default command-line argument of “debug\sample.docx”.

C:\OfficeOpenXML\Labs\Starter\Lab11>debug\lab11 sample.docx
Counting sample.docx
0 0 0 0 sample.docx
0 0 0 0 totals

Exercise 2: Implementing the WordprocessingML counting function

Once you have integrated the CLR into your C++ application, you can reuse the code and samples being created for Open XML programming in .NET. If the sample code is written in another .NET language such as C# or VB.NET, you can either reference those assemblies or port the code to C++/CLI in a straightforward manner.

You have already seen how a native C++ type can be marshalled to a managed type. Notice also that the Counters class is declared as a native class, and the print_report function in Lab11.cpp accesses its members exactly as you would expect. Some managed languages are designed to keep the developer using managed code and managed types exclusively. The power of C++/CLI is that the developer is free to mix native and managed code, and native and managed types. In this exercise you will see how managed code can access the same native Counters class.

The word counting function relies heavily on XPath to inspect the contents of the document.

Add references to System.Xml and System.IO.Packaging:

  1. Open the file count.cpp in Visual Studio.
  2. Locate the place where you previously added a reference to System.Dll.
  3. Add references to the WindowsBase and System.Xml assemblies.

#using "System.Dll"
#using "WindowsBase.Dll"  // System::IO::Packaging
#using "System.Xml.Dll"   // System::Xml

  4. Add using directives for the namespaces you will use in this module. This results in more concise source code.

using namespace System;
using namespace System::Text;
using namespace System::Xml;
using namespace System::Xml::XPath;
using namespace System::IO;
using namespace System::IO::Packaging;

Add a function to return the WordprocessingML document in an XmlTextReader:

  1. Locate the place immediately below the previous step.
  2. Declare string constants for WordprocessingML URIs.

ref class OfficeResource
{
public:
    static String^ DocumentRelationshipType = "
    static String^ WordmlNamespace = "
};


Because these string constants are managed types (System::String^), you declare them as members of a ref class, which is how C++/CLI declares a garbage-collected type.

  3. Add a function that locates the main document part and wraps it in an XmlTextReader to parse the XML it contains.

XmlTextReader^ GetToDocPart(Package^ myPackage)
{
    XmlTextReader^ doc = nullptr;

    // Get the main document part (document.xml).
    for each (PackageRelationship^ relationship in
        myPackage->GetRelationshipsByType(OfficeResource::DocumentRelationshipType))
    {
        // There should only be one document part in the package.
        // Resolve the relationship target relative to the package root ("/").
        Uri^ documentUri = PackUriHelper::ResolvePartUri(
            gcnew Uri("/", UriKind::Relative), relationship->TargetUri);
        PackagePart^ documentPart = myPackage->GetPart(documentUri);
        doc = gcnew XmlTextReader(documentPart->GetStream());
        break;
    }

    return doc;
}


XmlTextReader is used instead of XmlDocument because word counting is a forward-only, read-only operation, for which a reader is more efficient than an editable DOM.

Implement the counting function for WordprocessingML:

  1. Locate the wordml_count function.
  2. Remove the call to Console::WriteLine inside the try block.

try
{
    // marshal native string file name to managed type
    String^ fileName = gcnew String(file);
    Console::WriteLine("Counting {0}", fileName);   // remove this line
}
catch (Exception^ e)
{

  3. In place of the removed line, after the declaration of fileName, open the file using the System.IO.Packaging library.

// Open WordML file for read
Package^ pkg = Package::Open(fileName, FileMode::Open, FileAccess::Read);

  4. Get the main WordprocessingML document part and create an XPath-navigable structure for it.

// Locate main document part
XmlTextReader^ xmlDoc = GetToDocPart(pkg);
XPathDocument^ xpathDoc = gcnew XPathDocument(xmlDoc, XmlSpace::Preserve);
XPathNavigator^ nav = xpathDoc->CreateNavigator();

  5. Because all the WordprocessingML elements are in their own namespace, create an XmlNamespaceManager for use with XPath expressions.

// map the "w" prefix to the WordprocessingML namespace for XPath queries
NameTable^ nt = gcnew NameTable;
XmlNamespaceManager^ nsManager = gcnew XmlNamespaceManager(nt);
nsManager->AddNamespace("w", OfficeResource::WordmlNamespace);

  6. In WordprocessingML, a paragraph (w:p) contains one or more runs (w:r), and each run holds its text in w:t elements. Create an XPath expression to find all paragraphs and another to locate the text of the runs contained in each paragraph.

// define XPath expression to find paragraphs
XPathExpression^ paraExpr = nav->Compile("//w:p");
paraExpr->SetContext(nsManager);

// define XPath expression to find runs of text
XPathExpression^ runExpr = nav->Compile("descendant::w:r/w:t");
runExpr->SetContext(nsManager);

  7. Implement the word counting logic by iterating through paragraphs and runs of text. In WordprocessingML a single word can be split across multiple runs of text, for example when different parts of the word have different formatting. To get an accurate word count, this function merges all the runs into a single string representing the paragraph and then applies the Split function to count words.

// start iterating through paragraphs
XPathNodeIterator^ paragraph = nav->Select(paraExpr);
counts->para_count = paragraph->Count;

while (paragraph->MoveNext())
{
    // start iterating through runs of text
    XPathNodeIterator^ run = paragraph->Current->Select(runExpr);

    // add the number of runs in this paragraph to total
    counts->run_count += run->Count;

    // append all runs together
    StringBuilder^ paragraphTextSb = gcnew StringBuilder();
    while (run->MoveNext())
    {
        paragraphTextSb->Append(run->Current->Value);
    }
    String^ paragraphText = paragraphTextSb->ToString();

    // Split the paragraph into words (a null separator splits on white space)
    array<String^>^ words = paragraphText->Split(nullptr);

    // count the number of words
    for (int w = 0; w < words->Length; w++)
    {
        // Don't count empty strings
        if (words[w]->Length > 0)
        {
            counts->word_count++;
        }
    }

    // count the number of characters in the paragraph
    counts->char_count += paragraphText->Length;
}

  8. Release the resources you used. In .NET, deterministic cleanup of resources such as open file handles is typically done using the Dispose pattern. C++/CLI provides language support for Dispose through the delete operator: calling delete on a handle to a reference type calls its Dispose method.

// free unwanted expensive resources
delete xmlDoc;
pkg->Close();
delete pkg;

Run and verify the results

  1. Build the project: on the Build menu, click Build Lab11.
  2. Open a Windows Command Prompt and change directory to C:\OfficeOpenXML\Labs\Starter\Lab11.
  3. Run debug\lab11 sample.docx at the command line. You should see output similar to the following.


If you wish to run the program under the debugger from within Visual Studio, note that the project has been saved with a default command-line argument of “debug\sample.docx”.

C:\OfficeOpenXML\Labs\Starter\Lab11>debug\lab11 sample.docx
13 79 2057 12033 sample.docx
13 79 2057 12033 totals

  4. Compare your results to those reported by Microsoft Word.

Notice that the number of paragraphs reported by Microsoft Word is different. Microsoft Word ignores empty paragraphs when counting paragraphs. However, these paragraphs exist in the document and are represented in WordprocessingML. As an exercise, you can modify the lab sample to use the same logic as Microsoft Word.

To finish up:

  1. On the File menu, click Save All.
  2. On the File menu, click Close Solution.
