CEOS Guidelines on Standard Formats and Data Description Languages

CEOS.WGISS.DS.TN01 Issue 1.0 May 1998

CEOS
Working Group on Information Systems and Services
Data Subgroup
Guidelines on Standard Formats and Data Description Languages
Version 1.0

Draft for review

Doc. Ref.: CEOS.WGISS.DS.TN01
Date: 18 May 1998
Issue: 1.0

Document Status Sheet

Issue / Date / Comments / Editor
A / August 1996 / First issue for CEOS-FGTT review / W. Cudlip
B / April 1997 / Revised draft for general review / W. Cudlip
C / September 1997 / Version for final review / W. Cudlip
1.0 / May 1998 / Issued following no comments on Version C / W. Cudlip

Acknowledgements

This document is based on an edited version of “Technical Note on Standard Formats, Data Description Languages and Media” (LUK.502.EC21317/TN003) written by Steve Smith of Logica UK Ltd., as a result of a Data Packaging and Retrieval Study (DPRS) funded by ESA. Edited extracts from “Report for the CEOS Format Subgroup: An Inter-Use Reference Model” (CEOS-RP-NRL-SE-0006) written by Tim Fern of NRSC Ltd, UK and funded by BNSC, were also used. Additional material was provided by R. Suresh (NASA/Hughes), S. Suzuki (NASDA/EORC), H. Engels (DLR) and W. Cudlip (BNSC/DRA); and further comments by D. Ilg (NASA/Hughes).

CONTENTS

1. Introduction
1.1 Purpose and Scope
1.2 Intended Readership
1.3 Document Structure
1.4 Maintenance Plan

2. Concepts
2.1 Basic Concepts
2.2 Storage Models
2.3 Intermediate Data Structures
2.3.1 Basic Structures
2.3.2 Higher Level Structures
2.3.3 Unique Structures
2.3.4 Metadata

3. Standard Generic Formats
3.1 Introduction
3.2 Comparison Criteria
3.3 ‘Standard’ Generic Formats
3.3.1 Common Data Format (CDF/netCDF)
3.3.2 Hierarchical Data Format (HDF)
3.3.3 CEOS Superstructure Format
3.3.4 MPH/SPH/DSR
3.3.5 Spatial Data Transfer Standard (SDTS)
3.3.6 Flexible Image Transport System (FITS)
3.3.7 Graphics Interchange Format (GIF)
3.3.8 ISO/IEC 12087 - Image Processing and Interchange
3.3.9 Standard Formatted Data Units (SFDU)
3.3.10 GeoTIFF
3.4 Formats Summary Comparison
3.5 Specific Formats

4. Data Description Languages
4.1 Introduction
4.2 ‘Standard’ DDLs
4.2.1 FREEFORM
4.2.2 EAST - Enhanced Ada SubSet
4.2.3 MADEL - Modified ASN.1 as a Data Description Language
4.2.4 PVL - Parameter Value Language
4.2.5 DEDSL - Data Entity Dictionary Specification Language
4.2.6 EXPRESS
4.3 DDL Summary Comparison

5. Additional Information
5.1 Hierarchical Data Format (HDF)
5.1.1 Introduction
5.1.2 Scientific Data Set (SDS)
5.1.3 HDF Vset
5.1.4 Software Tools
5.1.5 HDF Advantages
5.2 CEOS SAR Formats

6. Other Aspects
6.1 Format Translation

7. Conclusions and Recommendations

APPENDIX A. REFERENCES
APPENDIX B. ACRONYMS
APPENDIX C. REVISION HISTORY

Figures and Tables

Figures

Figure 2-1: Reference Model - Basic Concept
Figure 2-2: An Example of a Multi-dimensional Array
Figure 2-3: An 8-bit Image
Figure 2-4: Three Types of 24-bit Images
Figure 2-5: An Example of a Palette
Figure 2-6: A Ragged Array
Figure 2-7: A 3x3 Array of Records
Figure 2-8: A table as an Array of Records
Figure 2-9: An Index Structure
Figure 2-10: A Representation of a Point Data Set
Figure 2-11: A Swath
Figure 2-12: A “Label = Value” Metadata Structure
Figure 3-1: An Example organisation of Data Objects in an HDF File
Figure 3-2: The Software Interface of a HDF File
Figure 3-3: Schematic of the CEOS Superstructure Format
Figure 3-4: Schematic of an MPH/SPH/DSR Formatted File
Figure 3-5: Examples of MPH/SPH/DSR Media Format
Figure 3-6: Sample FITS Image File
Figure 3-7: Schematic of a GIF File
Figure 3-8: Interfaces Between the Parts of the ISO 12087 Standard
Figure 3-9: Overall Structure of the IIF-DF File
Figure 3-10: An SFDU Label-Value-Object (LVO)
Figure 3-11: An SFDU Packaged Data Product
Figure 4-1: A Sample MADEL Description
Figure 4-2: A Sample PVL Listing
Figure 4-3: An Example of the use of the DEDSL
Figure 4-4: An Example of the use of EXPRESS
Figure 5-1: A 3-dimensional Multi-dimensional array with dimensions 4 by 3 by 9
Figure 5-2: Diagram of Pathfinder AVHRR Land Data product showing 4 of the 12 layers
Figure 5-3: A Raster Image
Figure 5-4: NSIDC SSM/I Data Product
Figure 5-5: Data organization in V Group and UNIX file system

Tables

Table 3-1: Standard Formats Comparison
Table 3-2: Illustrative Systems using Standard Formats
Table 4-1: Data Description Language Comparison
Table 5-1: HDF Utilities
Table 5-2: NCSA Tools
Table 5-3: Other Public Domain Tools
Table 5-4: Commercial Tools
Table 5-5: CEOS Format File Structure Overview

------ ------

Blank Page

FormGuid.doc

CEOS Guidelines on Standard Formats and Data DescriptionLanguagesPage 1 of 92

CEOS.WGISS.DS.TN01 Issue 1.0 May 1998

1. Introduction

1.1 Purpose and Scope

Earth Observation data are currently available in a range of different formats and there is a strong desire to standardise how such data are presented in order to improve the efficiency with which the data are handled and processed. However, format systems have different characteristics and a single format standard is not capable of satisfying all formatting needs. It has to be accepted that a number of formatting systems will be used by different agencies and different organisations for the foreseeable future.

The role of CEOS is to try to prevent the needless proliferation of format systems, encourage standardisation where possible, and ensure that format systems are developed in such a way that format translation can be performed easily, if required.

This document provides an analysis and critique of a number of standard formatting techniques that are applicable to the formatting and delivery of digital data. It also provides an analysis of current data description techniques. It is hoped that this document provides a sufficient level of detail for an application engineer to make a decision as to which technique is most appropriate for the application in hand. Links to further information are given wherever possible.

The document does not attempt to cover all formats used for scientific data sets. It concentrates on those formats which are, or are likely to be, used for Earth Observation data.

Note: This document is based on an analysis performed in the first quarter of 1995 and reviewed in late 1996 and mid 1997. It is planned that this document should be considered an evolving one, with updates sufficiently frequent to reflect the current situation. However, the rapid pace of developments in this field means the document cannot be guaranteed to be fully up to date, and it is recommended that the WWW links provided be investigated to obtain the latest information.

1.2 Intended Readership

The intended readership of this report is anyone who must decide which formatting technique or data description language should be used for a particular application. It is intended that this report will provide enough detail for an engineer to make a reasonable analysis and reach a decision without having to obtain the full reference material for all the various techniques. Further details can be obtained from the reference documents, for which contact information is provided for each technique discussed.

The document should also be of use to users of data who wish to understand the characteristics of the particular format used for supplied data.

1.3 Document Structure

In summary, the document is structured as follows:

  • Section 2 describes the basic concepts needed to understand the following sections;
  • Section 3 provides an analysis of the various Standard Data Formats available;
  • Section 4 provides an analysis of the various Data Description Languages available;
  • Section 5 gives additional information on the two major format systems (HDF and the CEOS SAR formats);
  • Section 6 discusses other aspects related to format systems;
  • Section 7 gives the conclusions and recommendations.

1.4 Maintenance Plan

It is intended that this document should be reviewed and updated at least annually. Early in its existence more frequent revisions may be warranted. The revisions will be carried out by members of the CEOS Format Guidelines Task Team although specific experts may be called upon to review particular sections.

The first official CEOS version will be V1.0. Subsequent minor revisions will increment the number after the decimal point (e.g., 1.1, 1.2, etc.). Major revisions will increment the first digit (e.g., 2.0, 3.0, etc.). Details of the revision history are given in Appendix C.

2. Concepts

2.1 Basic Concepts

This is an introduction to the basic concepts of a reference model which is useful to have in mind when evaluating the format systems and data description languages described in later sections. This text is extracted from “Data Inter-Use Reference Model” [40].

The following diagram (Figure 2-1) and text describe the entities and groups that facilitate the exchange of information. It is a deliberate attempt to abstract the problem to simple basic concepts.

Figure 2-1: Reference Model - Basic Concept

Values

These are the actual data values (bits and bytes) that correspond to the measurements and associated data. They are the unique aspect of a data set that differentiates it from every other data set. Values are traditionally delivered in an operating system file or tape file.

Storage Structure

This is the focus of the traditional format standardisation approach, e.g. the CEOS format (in particular, the CEOS product descriptions rather than the media (CCT) related descriptions). This is the structure of the data set that allows values for each field to be located and interpreted.

The storage structure is traditionally delivered as a User Guide, an international standard or occasionally as “self describing data”, and tends to describe basic numerical representations (e.g. IEEE floats, integers).

Meaning

This is the information that the values represent, i.e. how to interpret the values as information. Meaning is traditionally delivered as a User Guide or as separate reference information.

Data Package

This is the combination of Meaning, Structure and Values. There is no implication that these three components arrive simultaneously or in the same file, but without all three, information is not transferred. All components are required to effect use of the data. All three must be provided by a data supplier to enable Inter-use of the data by the user of data sets.
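The three components can be illustrated with a minimal sketch. The record layout, field names, scale factors and units below are hypothetical and are not taken from any format discussed in this document; the point is only that the decoded numbers become information when Values, Storage Structure and Meaning are all available.

```python
import struct

# Values: the raw bytes as delivered (a hypothetical two-field record).
values = struct.pack(">hf", 2731, 0.5)

# Storage Structure: where each field is and how it is encoded.
# ">" = MSB first, "h" = 16-bit signed integer, "f" = 32-bit IEEE float.
structure = {"format": ">hf", "fields": ("raw_temperature", "cloud_fraction")}

# Meaning: how to turn decoded numbers into information (illustrative only).
meaning = {
    "raw_temperature": {"units": "K", "scale": 0.1},
    "cloud_fraction": {"units": "1", "scale": 1.0},
}


def interpret(values, structure, meaning):
    """Combine the three components of a Data Package into usable information."""
    decoded = struct.unpack(structure["format"], values)
    result = {}
    for name, number in zip(structure["fields"], decoded):
        info = meaning[name]
        result[name] = (number * info["scale"], info["units"])
    return result


package = interpret(values, structure, meaning)
```

Removing either dictionary leaves the reader with bytes that cannot be located or numbers that cannot be interpreted, which is the sense in which all three components are required.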

Data Packages are traditionally delivered as separate fragments (i.e., they do not contain all the information needed to completely understand the data set, particularly with regard to semantic information).

The mechanics of delivery are separate from what needs to be delivered. The following describes the delivery components.

Delivery Unit

This is a single delivery of data or information, e.g. a tape, E-Mail, etc.

Delivery Packet

This is simply the segmentation of a Delivery Unit into manageable lumps for transfer, which are reassembled on arrival, e.g. a file, network packet, etc.

The two delivery concepts are introduced here to contrast and exclude them from the discussion. A delivery mechanism should transport a Data Package, part of a Data Package or several Data Packages securely and faithfully without affecting or having to understand the data.
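The requirement that delivery should not affect or need to understand the data can be sketched as follows. The function names and the packet size are assumptions made for illustration; the sketch treats a Delivery Unit as an opaque byte string, cuts it into Delivery Packets and reassembles it unchanged.

```python
def split_into_packets(unit: bytes, packet_size: int) -> list:
    """Segment a Delivery Unit into packets of at most packet_size bytes."""
    if packet_size <= 0:
        raise ValueError("packet_size must be positive")
    return [unit[i:i + packet_size] for i in range(0, len(unit), packet_size)]


def reassemble(packets) -> bytes:
    """Rebuild the Delivery Unit; the content is never inspected."""
    return b"".join(packets)


unit = bytes(range(10))            # stand-in for any Data Package content
packets = split_into_packets(unit, 4)
restored = reassemble(packets)
```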

2.2 Storage Models

Ultimately, most information is stored in bytes in a linear memory addressing model. All current commercial computer systems use this model for storage in memory and on media.

In a linear memory model, memory resources are managed as a single sequence of memory units (i.e. bytes). Even multidimensional arrays are stored as a linear sequence, with an addressing calculation which takes the co-ordinates and converts them into a linear address location.
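The addressing calculation can be sketched as below. This is a minimal illustration that assumes row-major ordering (last co-ordinate varying fastest); the function name and the `item_size` parameter are hypothetical, and other orderings are equally valid.

```python
def linear_offset(index, shape, item_size=1):
    """Convert array co-ordinates to a byte offset, assuming row-major order."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same number of dimensions")
    offset = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError("co-ordinate outside array bounds")
        offset = offset * n + i      # step through one dimension at a time
    return offset * item_size


# Element (1, 2, 3) of a 4 by 3 by 9 array of 2-byte integers.
address = linear_offset((1, 2, 3), (4, 3, 9), item_size=2)
```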

Since this model is so standard, Data Description Languages (DDLs) effectively assume that all descriptions are ones of mapping information entities to the underlying linear memory model.

The purpose of DDLs is to provide an OPEN standard for data access (i.e. one not dependent on a particular machine or software tool). In this way the writer of data and the reader of data can be separate systems.

By contrast, a CLOSED data access mechanism is one where the writer and reader use the same system. For instance, all third generation computer languages hide the data organisation from the user, so in Ada the user is not aware how an array is actually arranged, but can write and recover a piece of information using its co-ordinates. The entry point to data access has changed from the bits and bytes to the utilities that access them.

The HDF format system is a closed data access mechanism since only HDF utilities can create and access the data values.

It seems that an Open system is required for information inter-operation. However, there is a competing approach: to expand a closed system until all the participants are included. The difficulties of this second approach (mainly, achieving a mutually agreed standard) are what cause DDLs to be needed.

However, the Internet and more specifically the World Wide Web in effect are providing a common ‘programming’ environment where the heterogeneity of the member systems is hidden under a common programming approach.

This means that an alternative storage model can now be considered, where providers and users construct, not descriptions, but access utilities (or applets) to data. This can then be thought of as open access to closed access mechanisms, in that the readers and writers of data are constructed at the same time under the same system, but the user has access to those accessors (which encapsulate the memory model of the data being used).
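A minimal sketch of the accessor idea follows. The class name `GridAccessor` and its internal layout are hypothetical; the sketch only shows that a user can write and recover values by co-ordinates while the memory model stays encapsulated in the utility.

```python
import struct


class GridAccessor:
    """Hypothetical accessor: callers use (row, col); the byte layout is private."""

    _ITEM = struct.Struct("<h")   # private layout detail: 16-bit integers, MSB last

    def __init__(self, rows, cols):
        self._rows, self._cols = rows, cols
        self._buffer = bytearray(rows * cols * self._ITEM.size)

    def _offset(self, row, col):
        if not (0 <= row < self._rows and 0 <= col < self._cols):
            raise IndexError("cell outside grid")
        return (row * self._cols + col) * self._ITEM.size

    def write(self, row, col, value):
        self._ITEM.pack_into(self._buffer, self._offset(row, col), value)

    def read(self, row, col):
        return self._ITEM.unpack_from(self._buffer, self._offset(row, col))[0]


grid = GridAccessor(2, 3)
grid.write(1, 2, -7)
value = grid.read(1, 2)
```

If the provider later changes the private layout, code that uses only `write` and `read` is unaffected, which is the trade-off this model makes against an openly described byte layout.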

To summarise, there are two forms of storage model:

  • Linear memory model (MSB first, or last);
  • Shared Access Utility model.
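The "MSB first, or last" qualifier refers to the order in which the bytes of a multi-byte value appear in the linear sequence. A short sketch using Python's standard `struct` module (the value is arbitrary) shows why a description of the data has to state this order:

```python
import struct

value = 0x12345678

msb_first = struct.pack(">I", value)   # most significant byte stored first
msb_last = struct.pack("<I", value)    # most significant byte stored last

# Reading MSB-first bytes with the wrong assumption gives a different number.
misread = struct.unpack("<I", msb_first)[0]
```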

In developing a formatting system to facilitate the inter-operation of information and data, both should be considered. The first provides the most flexibility and only requires descriptions to be constructed for a data set type to become a member of the system; the second is exemplified by the WWW, where common open access is provided but the underlying format is hidden.

In both cases, the principle is to provisionally leave the data in its native form and provide an additional description/accessor that makes the data accessible to other users. It then becomes a matter of operational choice whether the access is performed on the fly (in real time) as and when the data are required, or as part of a system format translation programme.

2.3 Intermediate Data Structures

A data structure study has been carried out by the EOSDIS project to identify and define common data structures necessary to support EOS and other Earth science data products; to begin to develop Application Programming Interfaces (APIs) to such common data structures; and to develop or use existing Hierarchical Data Format (HDF) interfaces to implement these APIs. This activity has helped to identify data structures commonly used by science groups, standardize and promulgate those structures, and provide common utilities to support them. As data products are implemented, the data structures and science conventions that are used in building the product will be analyzed and incorporated into the development of a complete standard data model.

As a result of the EOSDIS project’s initial data format evaluation, it was recognized that a continuing survey of data structures required by the EOS science community was needed. An initial survey of selected Version 0 Data Products to be generated by DAACs was conducted. A list of data structures was compiled based on data models developed for these data products and from other sources. The descriptions of these data structures for selected data products are described in “EOSDIS V0 FY 92 Data Structures Report.” Some additional structures have been defined since the study. The list now contains the following structures:

  • Basic structures:
      • Multi-dimensional Array
      • Image
      • Palette
      • Ragged Array
      • Array of Records
      • Index Structure
      • Collection of Structures
      • Topological Structure
      • Text Structure
      • Document Structure
      • Metadata
  • High level structures:
      • Point Data
      • Gridded Data
      • Swath Data
  • Unique structures
  • Metadata

For the EOSDIS Core System (ECS), the follow-on to V0, this list has been further refined into the “Data Type Taxonomy.” The Taxonomy can be found through the ECS Data Handling System (EDHS) at:

2.3.1 Basic Structures

A basic conceptual structure is intended to be a simple data structure that has wide ranging applicability to many science disciplines. These structures can serve as the building blocks from which more complex discipline-specific or instrument-specific structures can be built.

This section will provide a conceptual understanding of the basic structures which were listed in the previous section. It is assumed that data format systems will evolve to provide explicit software support for all structures described below.

Multi-dimensional Array

Multi-dimensional arrays are n-dimensional arrays of homogeneous data. Each array contains only one data type and size. All but one dimension are fixed length. This structure can be used for sensor data. Processing data can be stored in a binary table, which is an instantiation of the Multi-dimensional array. The Multi-dimensional array might support the equal angle grid and sparse matrices. Examples of data types that can be stored in the Multi-dimensional array are integers of 8, 16 or 32 bits, floating point numbers of 32 or 64 bits, and possibly n-bit integers where n is not a multiple of 8. Figure 2-2 is an example of an n-dimensional array where n = 3. The Multi-dimensional array is not limited to three dimensions. Multi-dimensional arrays may be defined with their dimensions in any order to optimize the storage for a certain method of access or to emulate any style of interleaving (BSQ, BIP, BIL).

Figure 2-2: An Example of a Multi-dimensional Array
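The effect of dimension order on interleaving can be sketched as follows. The function names are hypothetical and offsets are counted in elements rather than bytes; each function places the same (band, row, column) sample at a different position in the linear sequence, corresponding to band sequential (BSQ), band interleaved by line (BIL) and band interleaved by pixel (BIP) storage.

```python
def bsq_offset(band, row, col, bands, rows, cols):
    """Band sequential: dimensions ordered (band, row, col)."""
    return (band * rows + row) * cols + col


def bil_offset(band, row, col, bands, rows, cols):
    """Band interleaved by line: dimensions ordered (row, band, col)."""
    return (row * bands + band) * cols + col


def bip_offset(band, row, col, bands, rows, cols):
    """Band interleaved by pixel: dimensions ordered (row, col, band)."""
    return (row * cols + col) * bands + band


# First sample of band 1 in a 3-band image of 2 rows by 4 columns.
size = dict(bands=3, rows=2, cols=4)
offsets = (
    bsq_offset(1, 0, 0, **size),
    bil_offset(1, 0, 0, **size),
    bip_offset(1, 0, 0, **size),
)
```

BSQ keeps each band contiguous, which suits band-at-a-time access, whereas BIP keeps all bands of one pixel together, which suits per-pixel spectral processing.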


Image

An image is a two-dimensional array of spatially organized measurements. Images typically contain 8- or 24-bit pixels. Image data may contain bands in different spectral wavelengths. Figures 2-3 and 2-4 give examples of image structures. An 8-bit image is generally associated with a palette (Figure 2-5).

Figure 2-3: An 8-bit Image

Figure 2-4: Three Types of 24-bit Images

Palette

A palette consists of an 8-bit lookup table which associates a color with each of the 256 possible pixel values which can be stored in an 8-bit image.

Figure 2-5: An Example of a Palette
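A palette lookup can be sketched in a few lines. The grey-scale palette, the red highlight entry and the 2 by 2 image below are invented for illustration, and representing each color as a (red, green, blue) triple is an assumption rather than something the text prescribes.

```python
# A 256-entry lookup table: here a grey ramp with one highlighted entry.
palette = [(v, v, v) for v in range(256)]
palette[255] = (255, 0, 0)          # hypothetical: show saturated pixels in red

# A tiny 8-bit image; each pixel is an index into the palette.
image = [
    [0, 128],
    [255, 64],
]

# Displaying the image means replacing every pixel value by its palette color.
rgb = [[palette[pixel] for pixel in row] for row in image]
```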