NATO UNCLASSIFIED

AC/322(CP/3)WP(2011)0005-REV1

CONSULTATION, COMMAND AND CONTROL BOARD

CIVIL/MILITARY SPECTRUM PANEL (CaP 3)

Spectrum Tools Configuration Control Team (STCCT)

COMPOSITE XML SOURCE FILE PROJECT

1.  The attached working paper describes a new approach from the NHQC3S/SC3IB Staff for maintaining the SMADEF-XML documentation and XML Schema, which allows the automatic generation of all technical and user documents from a single source (set of XML files).

2.  The members of the STCCT are invited to review this paper in order to ensure that all XML constructs necessary for SMADEF-XML are covered. Any comments should be provided during the 2nd STCCT meeting on 04-06 October 2011.

S. BASSO

NHQC3S / SC3IB


SMADEF-XML

COMPOSITE XML SOURCE FILE PROJECT

TABLE OF CONTENTS

1. Executive Summary

2. Objective

2.1 Background

2.2 Requirements

2.3 Additional Requirements, not yet implemented

3. Implementation – Overview

3.1 Types of Documents

3.2 Definitions

3.3 Software required

3.4 Source documents

3.5 Generation of the XML Schema

3.6 Generation of the HTML documents

3.7 Generation of the PDF documents

4. Implementation – Details

4.1 Overall Schema

4.2 Project

4.3 Metadata

4.4 Domain

4.5 Enum

4.6 CodeList

4.7 Data Items

4.8 Groups

4.9 Element and Dataset

4.10 Type

4.11 Extension elements

4.12 Documentation

1.  Executive Summary

This paper describes a new approach from the NHQC3S/SC3IB Staff for generating the SMADEF-XML documentation and XML Schema, which allows the automatic generation of all technical and user documents while separating the technical complexities of the Schema from the user-oriented or conceptual model. This conceptual model uses an XML language which is easier to handle than the full XML Schema specification while allowing the required combination of technical and user definitions.

The conceptual model forms a set of composite XML source documents (called composite because of their mixed technical/user content). Alongside these documents, a set of XSL Stylesheets has also been developed, which transform the composite source documents into either the XML Schema or the User Manual (including automatically generated diagrams of the SMADEF-XML structure).

2.  Objective

2.1  Background

During the development of the previous versions of SMADEF-XML, it rapidly became apparent that, owing to the complex and evolving nature of the format, ensuring a perfect alignment between the user manual and the technical implementation of the schema would be very difficult and time consuming. The maintenance of the user manual itself is also a challenge. The main problems are ensuring consistent formatting, keeping all the hyperlinks between elements and their children and parents correct, and manually inserting diagrams from third-party tools such as XMLSpy.

2.2  Requirements

The main requirements at the origin of this development are:

  1. Use a single master document (or a set of documents to cope with volume) containing all information: Schema, user documentation, application-specific information such as O/R mapping, etc.
  2. Be able to automatically generate the HTML user document with consistent styles (through the use of CSS) and automated content whenever possible (such as input rules) for all elements; this includes automatically generated hyperlinks between elements (parents, children), eliminating the risk of forgetting to update, add or remove links after each structure change.
  3. Be able to automatically generate diagrams of the conceptual model to eliminate the need for third-party tools involving manual copying of information and formatting; the diagrams will naturally be generated in SVG format (Scalable Vector Graphics), since the SVG specification is a member of the XML / XSD / XSL family and an SVG file is an XML document.
  4. Be able to automatically generate the XML Schema (XML Schema Definition, or XSD files), eliminating the risk of discrepancy between the Schema and the user document.
  5. Provide an abstraction layer between the conceptual model and the real schema (users complain that the current user manual is too complex); automatically add systematic metadata such as classification and remarks attributes when generating the XML Schema.
  6. Use a more condensed form than XML Schema for the conceptual model, one that is easier to develop and to understand.
  7. Provide a mechanism to describe separately the core (SMADEF) and national extensions (e.g. SSRF), and to produce distinct or merged documentation and schemas.

2.3  Additional Requirements, not yet implemented

  1. Model for XSL checks (ID0001 checks are shown in the user manual but there is no automatic XSL generation). This may be difficult to implement.
  2. Friendly editor to review the conceptual model without having to dig into an XML document (under development, but low priority).

3.  Implementation – Overview

3.1  Types of Documents

The project consists of a new XML Schema used as the reference for the Composite XML Source documents, XSL Stylesheets to transform this source into different output formats (XML Schema, user document), and the Composite XML documents themselves.

During this development, it quickly became difficult to distinguish the “attributes” and “elements” of the composite document from those of the real (final) SMADEF-XML schema. Therefore a new vocabulary has been developed and is used in the Composite XML Source and its associated XML Schema:

3.2  Definitions

Domain: Simple type derived from an XML Schema atomic type, with facet restrictions; implemented in XSD as xs:simpleType with restrictions. See details in section 4.4.

Enum: Format for a Data Item with a multiple occurrence of its data (generally from a code list); implemented in XSD as an xs:list. See details in section 4.5.

CodeList: Simple type with a restriction and an enumeration. See details in section 4.6.

Data Item[1]: Atomic data entry, single-occurring within an element, complemented with metadata attributes as required [2]. The Composite XML Source contains ItemDef (global definitions and user explanations for these data items, which may be re-used in several complex elements); Item (local definition of a data item within an element); ItemRef (local use within an element of a globally defined Data Item). See details in section 4.7.

Group: Group of data items re-used in several elements, or having a condition such as “both data items must be filled”. The Composite XML Source contains GroupDef (global definition of a Group) and GroupRef (local use within an element of a globally defined Group). See details in section 4.8.

Type: Complex type composed of Data Items, Groups, and Elements. A Type may contain either a sequence of Datasets (only used for Body), or a sequence of Item, ItemRef, GroupRef, followed by Elements and XOR. See details in section 4.10.

Element: Instance of a Type. See details in section 4.9.

Dataset: Specialised version of an Element, used only under Body. See details in section 4.9.
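
The XSD implementations named in the Domain, Enum and CodeList definitions above can be pictured with the following minimal sketch. It is illustrative only: the type names (ExampleDomain, ListXX, ExampleEnum), the base type and the facet values are hypothetical and are not taken from the SMADEF-XML schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative only: all names and facet values below are hypothetical. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Domain: an atomic type narrowed by facet restrictions -->
  <xs:simpleType name="ExampleDomain">
    <xs:restriction base="xs:string">
      <xs:maxLength value="20"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- CodeList: a restriction carrying an enumeration -->
  <xs:simpleType name="ListXX">
    <xs:restriction base="xs:string">
      <xs:enumeration value="A"/>
      <xs:enumeration value="B"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Enum: several code-list values held in a single data item -->
  <xs:simpleType name="ExampleEnum">
    <xs:list itemType="ListXX"/>
  </xs:simpleType>

</xs:schema>
```

With these declarations, a data item of type ExampleEnum would accept a space-separated value such as "A B", because xs:list values are whitespace-separated.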

3.3  Software required

In principle, XML and XSLT technologies could be handled entirely with a text editor and some freeware packages. For manipulating XML files, the Staff is using the open source Notepad++ editor with the XMLTools add-on.

However, using additional tools provides more opportunities to check the syntax and to debug the XSL Transformations.

The composite source XML is converted into various output documents (HTML, SVG, PDF, XSD) using the XSLT technology. An XSLT processor implementing XSLT 2.0 and XPath 2.0 is necessary to run the XSLT stylesheets. Several such XSLT processors are available, both commercial and freeware. The Staff is successfully using Altova XMLSpy 2011, as it contains an XSLT processor and debugger (versions before 2008 implemented only XSLT 1.0, so they will not work). The freeware version, AltovaXML Community Edition, also works well for batch processing the files from a command line.

The generation of PDF files requires an additional freeware package which can handle XSL-FO technology. The Staff is using the free Apache Formatting Objects Processor (Apache FOP).

3.4  Source documents

The composite source files are a set of XML files where the elements are broadly grouped by COIs, plus four additional files containing respectively Domains, shared Types, CodeLists, and a Files.xml which can be seen as a “project file”. These XML files were primarily derived from the version 2.1.0 XSDs, and are being reviewed to insert the user manual information and to align them with the USA proposals discussed during the STCCT 11-1 meeting.

The transformations are done by running one of the XSLTs from paragraphs 3.5 (XML Schema), 3.6 (HTML) or 3.7 (PDF) on Files.xml. In addition, some utility XSL files have been developed:

-  Generate_pseudoschema_bare.xsl produces the same set of source XML documents stripped of all user input entries, keeping only the technical definitions. This is a development tool allowing developers of the schema to have a better overview of the structure.

-  Generate_errors.xsl generates a file error.htm which will show cross-reference problems in the pseudo schema, such as an ItemRef without base ItemDef or Elt with an undefined Type. This is a development tool allowing developers of the schema to quickly check for potential problems.

-  The HTML and PDF generation use manual_common.xsl which contains the common logic for both outputs. All stylesheets also use common.xsl which contains a set of utility functions and global variables.

-  All transformations can be performed as a batch using AltovaXML and a script generate_all.cmd.
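
As an illustration of the kind of cross-reference check described for Generate_errors.xsl, the sketch below lists every ItemRef that has no matching ItemDef. It is not the actual stylesheet: the attribute names name and ref are assumptions, and the sketch inspects a single input document rather than the full set of files listed in Files.xml.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- Index all global definitions by their (assumed) name attribute -->
  <xsl:key name="itemdef" match="ItemDef" use="@name"/>

  <xsl:template match="/">
    <html>
      <body>
        <ul>
          <!-- Report each ItemRef whose (assumed) ref attribute matches no ItemDef -->
          <xsl:for-each select="//ItemRef[not(key('itemdef', @ref))]">
            <li>
              <xsl:text>ItemRef without base ItemDef: </xsl:text>
              <xsl:value-of select="@ref"/>
            </li>
          </xsl:for-each>
        </ul>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Using xsl:key keeps the lookup cheap even when the source contains many definitions; a check for an element with an undefined Type could follow the same pattern with a second key.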

3.5  Generation of the XML Schema

Generate_schema.xsl produces the XML Schema as a set of XSD files.
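
The principle of the transformation can be sketched with a reduced stylesheet that turns code lists of the composite source into XSD simple types. This is a sketch under stated assumptions, not an extract of Generate_schema.xsl: the source layout (a name attribute on CodeList, Code children with a value attribute) is hypothetical, and a single output document is produced instead of a set of XSD files. Only the CodeList element name and the “List” prefix of the generated type come from this paper.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="xml" indent="yes"/>

  <!-- Wrap all generated types in one xs:schema element -->
  <xsl:template match="/">
    <xs:schema>
      <xsl:apply-templates select="//CodeList"/>
    </xs:schema>
  </xsl:template>

  <!-- Assumed source form: <CodeList name="AA"><Code value="X"/></CodeList> -->
  <xsl:template match="CodeList">
    <xs:simpleType name="List{@name}">
      <xs:restriction base="xs:string">
        <xsl:for-each select="Code">
          <xs:enumeration value="{@value}"/>
        </xsl:for-each>
      </xs:restriction>
    </xs:simpleType>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the assumed source form shown in the comment, this would emit a simple type named ListAA with one enumeration value.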

3.6  Generation of the HTML documents

Manual_html.xsl generates a set of HTML and SVG files which will constitute the user manual. Due to its complexity, it has been split into modules and will call manual_hhk_index_toc.xsl and manual_svg.xsl.

One separate HTML file is produced for each Type, ItemDef, CodeList and SVG diagram. The main file (default.htm) is a <frameset> containing a top banner, a left pane showing either a table of contents (ToC) or an index, and the main frame displaying the current documentation content.
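
The three-pane layout of default.htm could look like the following sketch, written here as XHTML so that it remains a well-formed XML document. The frame names, the file names banner.htm, toc.htm and cover.htm, and the sizes are hypothetical; only the overall arrangement (banner on top, navigation pane on the left, content frame) is taken from the description above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>SMADEF-XML User Manual</title>
  </head>
  <!-- Top row: banner; bottom row: navigation pane and content frame -->
  <frameset rows="60,*">
    <frame name="banner" src="banner.htm"/>
    <frameset cols="25%,*">
      <!-- Left pane: table of contents or index -->
      <frame name="nav" src="toc.htm"/>
      <!-- Main frame: current documentation page -->
      <frame name="main" src="cover.htm"/>
    </frameset>
  </frameset>
</html>
```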

Notes: JavaScript must be enabled in the browser to allow the ToC and index to work properly. The SVG display requires a recent browser version with SVG support (such as Firefox 4 or IE 9).

3.7  Generation of the PDF documents

Manual_pdf.xsl generates an XSL-FO version of the manual. Most of the formatting (except where the outputs are really different, such as the ToC) is first done in HTML, then converted into XSL-FO objects through the html-to-fo.xsl style sheet.

The resulting XSL-FO document must then be run through a PDF formatter such as Apache FOP.

Notes: Apache FOP can be called from XMLSpy if properly installed and configured. It requires Java JRE 1.6 to be installed. In addition, the file fop.bat, which initiates the call to the FOP engine via Java, had to be modified to increase the amount of memory usable by the JRE, using the command line parameter “-Xmx512m” (maximum heap size of 512 MB).

The SVG graphics are integrated as in-line graphics. An XSL-FO / PDF limitation is that the hyperlinks from inside the graphics to other parts of the document do not work; therefore a workaround has been applied, by listing all referenced elements at the top of the page prior to the graphic.
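
The in-line integration and the link workaround can be pictured with the following minimal XSL-FO document. It is a sketch only: the page master name, the id value and the ExampleType label are hypothetical, and the actual output of Manual_pdf.xsl is not reproduced here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page"
        page-height="29.7cm" page-width="21cm">
      <fo:region-body margin="2cm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <!-- Workaround: referenced elements are listed as links before the graphic -->
      <fo:block>
        <fo:basic-link internal-destination="type-example">ExampleType</fo:basic-link>
      </fo:block>
      <!-- The SVG diagram is embedded in-line; links inside it would not work -->
      <fo:block>
        <fo:instream-foreign-object>
          <svg xmlns="http://www.w3.org/2000/svg" width="120" height="40">
            <rect x="1" y="1" width="118" height="38" fill="none" stroke="black"/>
            <text x="10" y="25">ExampleType</text>
          </svg>
        </fo:instream-foreign-object>
      </fo:block>
      <!-- Link target elsewhere in the document -->
      <fo:block id="type-example">ExampleType documentation</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```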

The supported HTML elements are: <p>, <br/>, <b>, <i>, <ol> (not tested), <ul> (nested up to 2 levels only), <dl> (not tested), <table> (colspan implemented, but not rowspan), <img> (not tested), <a href>. Special tags, common with the HTML output, are also implemented (see paragraph 4.10.3).

The Index at the end of the PDF document is a 2-level index showing:

-  Each Type in bold

-  Each data item, with a second level, when the data item is re-used, indicating the Types where it is used.

4.  Implementation – Details

4.1  Overall Schema

The diagram below provides an overview of the different elements in the schema. The following paragraphs will describe the purpose and behaviour of each element.

Notation: Elements from the source XML definition are noted in italics without prefix such as <Domain>; elements in the XML Schema are noted with the widely used “xs:” prefix such as <xs:complexType>; example elements from a SMADEF message are displayed as Location.

4.2  Project

This element, which must be the only element present in one XML file, contains the parameters for the entire project (core and extensions).

<Namespace> contains the URI used to define the resulting Schema namespace. For the core, it will be “urn:int:nato:standard:smadef:3.0.0”.
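
In the generated XSD, this URI would typically appear as the target namespace of the schema, as in the sketch below. The default namespace declaration and the elementFormDefault setting are assumptions for illustration, not confirmed settings of the generated Schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="urn:int:nato:standard:smadef:3.0.0"
           targetNamespace="urn:int:nato:standard:smadef:3.0.0"
           elementFormDefault="qualified">
  <!-- Generated type and element declarations would follow here -->
</xs:schema>
```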

<Extension> contains the tag used for the extension elements, which will be merged with the core to form the Schema and documentation (e.g. “SSRF”).

<Filter> contains a space separated list of elements to be published in the HTML or PDF documentation. This is to allow partial rendering of the documentation during the development.

The elements <Folders> and <Files> indicate which XML files compose the source and where to store the generated documents.

The element <Extras> contains the text of the banner (header) to be displayed on top of HTML and PDF pages, and the name of the HTML file to be inserted as cover page.

4.3  Metadata

Each “leaf” element (single elements with content and no more child elements), and the Common abstract type, will contain a set of standard metadata attributes (classification, link to remarks, etc). These attributes are described in a list of Attr elements under <StdMetadata>:

Schema Generation: All Metadata entries are grouped in an <xs:attributeGroup name=”metadata”> which is included in all leaf <xs:complexType>.

-  An entry with domain=”nnn” generates a <xs:attribute type=”nnn”/>;

-  An entry with codelist=”AA” generates a <xs:attribute type=”ListAA”/>.
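
A minimal sketch of the resulting XSD is given below. The group name “metadata” and the type names nnn and ListAA follow the rules above; the attribute names remarksRef and classification, the placeholder type definitions, the ExampleLeaf type and the use of xs:simpleContent for a leaf element are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Placeholder types standing in for a real Domain and a real CodeList -->
  <xs:simpleType name="nnn">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
  <xs:simpleType name="ListAA">
    <xs:restriction base="xs:string">
      <xs:enumeration value="X"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- One xs:attribute per metadata entry (attribute names are hypothetical) -->
  <xs:attributeGroup name="metadata">
    <xs:attribute name="remarksRef" type="nnn"/>
    <xs:attribute name="classification" type="ListAA"/>
  </xs:attributeGroup>

  <!-- A leaf type: simple content plus the shared metadata attributes -->
  <xs:complexType name="ExampleLeaf">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attributeGroup ref="metadata"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

</xs:schema>
```

Grouping the attributes once and referencing the group from every leaf type means that a change to the standard metadata only has to be made in one place.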

Documentation Generation: The StdMetadata element generates a manual page similar to a Type (see paragraph 4.11).

4.4  Domain

Each domain represents the format information for atomic data items. It contains the following attributes, which are translated into facets in the XML Schema:

-  base is basically one of the intrinsic XML Schema types, using the following short codes: C (string), UC (uppercase string), UI / SI (unsigned / signed integer), UD / SD (unsigned / signed decimal), D / DT (date / date-time), P (pattern);