CEOS Guidelines on Standard Formats and Data Description Languages
CEOS.WGISS.DS.TN01 Issue 1.0 May 1998
CEOS Working Group on Information Systems and Services
Data Subgroup
Guidelines on Standard Formats and Data Description Languages
Version 1.0
Draft for review
Doc. Ref.: CEOS.WGISS.DS.TN01
Date: 18 May 1998
Issue: 1.0
Document Status Sheet
Issue / Date / Comments / Editor
A / August 1996 / First issue for CEOS-FGTT review / W. Cudlip
B / April 1997 / Revised draft for general review / W. Cudlip
C / September 1997 / Version for final review / W. Cudlip
1.0 / May 1998 / Issued following no comments on Version C / W. Cudlip
Acknowledgements
This document is based on an edited version of “Technical Note on Standard Formats, Data Description Languages and Media” (LUK.502.EC21317/TN003) written by Steve Smith of Logica UK Ltd., as a result of a Data Packaging and Retrieval Study (DPRS) funded by ESA. Edited extracts from “Report for the CEOS Format Subgroup: An Inter-Use Reference Model” (CEOS-RP-NRL-SE-0006) written by Tim Fern of NRSC Ltd, UK and funded by BNSC, were also used. Additional material was provided by R. Suresh (NASA/Hughes), S. Suzuki (NASDA/EORC), H. Engels (DLR) and W. Cudlip (BNSC/DRA); and further comments by D. Ilg (NASA/Hughes).
CONTENTS
Sections
1. Introduction
1.1 Purpose and Scope
1.2 Intended Readership
1.3 Document Structure
1.4 Maintenance Plan
2. Concepts
2.1 Basic Concepts
2.2 Storage Models
2.3 Intermediate Data Structures
2.3.1 Basic Structures
2.3.2 Higher Level Structures
2.3.3 Unique Structures
2.3.4 Metadata
3. Standard Generic Formats
3.1 Introduction
3.2 Comparison Criteria
3.3 ‘Standard’ Generic Formats
3.3.1 Common Data Format (CDF/netCDF)
3.3.2 Hierarchical Data Format (HDF)
3.3.3 CEOS Superstructure Format
3.3.4 MPH/SPH/DSR
3.3.5 Spatial Data Transfer Standard (SDTS)
3.3.6 Flexible Image Transport System (FITS)
3.3.7 Graphics Interchange Format (GIF)
3.3.8 ISO/IEC 12087 - Image Processing and Interchange
3.3.9 Standard Formatted Data Units (SFDU)
3.3.10 GeoTIFF
3.4 Formats Summary Comparison
3.5 Specific Formats
4. Data Description Languages
4.1 Introduction
4.2 ‘Standard’ DDLs
4.2.1 FREEFORM
4.2.2 EAST - Enhanced Ada SubSet
4.2.3 MADEL - Modified ASN.1 as a Data Description Language
4.2.4 PVL - Parameter Value Language
4.2.5 DEDSL - Data Entity Dictionary Specification Language
4.2.6 EXPRESS
4.3 DDL Summary Comparison
5. Additional Information
5.1 Hierarchical Data Format (HDF)
5.1.1 Introduction
5.1.2 Scientific Data Set (SDS)
5.1.3 HDF Vset
5.1.4 Software Tools
5.1.5 HDF Advantages
5.2 CEOS SAR Formats
6. Other Aspects
6.1 Format Translation
7. Conclusions and Recommendations
APPENDIX A. REFERENCES
APPENDIX B. ACRONYMS
APPENDIX C. REVISION HISTORY
Figures and Tables
Figures
Figure 2-1: Reference Model - Basic Concept
Figure 2-2: An Example of a Multi-dimensional Array
Figure 2-3: An 8-bit Image
Figure 2-4: Three Types of 24-bit Images
Figure 2-5: An Example of a Palette
Figure 2-6: A Ragged Array
Figure 2-7: A 3x3 Array of Records
Figure 2-8: A table as an Array of Records
Figure 2-9: An Index Structure
Figure 2-10: A Representation of a Point Data Set
Figure 2-11: A Swath
Figure 2-12: A “Label = Value” Metadata Structure
Figure 3-1: An Example organisation of Data Objects in an HDF File
Figure 3-2: The Software Interface of a HDF File
Figure 3-3: Schematic of the CEOS Superstructure Format
Figure 3-4: Schematic of an MPH/SPH/DSR Formatted File
Figure 3-5: Examples of MPH/SPH/DSR Media Format
Figure 3-6: Sample FITS Image File
Figure 3-7: Schematic of a GIF File
Figure 3-8: Interfaces Between the Parts of the ISO 12087 Standard
Figure 3-9: Overall Structure of the IIF-DF File
Figure 3-10: An SFDU Label-Value-Object (LVO)
Figure 3-11: An SFDU Packaged Data Product
Figure 4-1: A Sample MADEL Description
Figure 4-2: A Sample PVL Listing
Figure 4-3: An Example of the use of the DEDSL
Figure 4-4: An Example of the use of EXPRESS
Figure 5-1: A 3-dimensional Multi-dimensional array with dimensions 4 by 3 by 9
Figure 5-2: Diagram of Pathfinder AVHRR Land Data product showing 4 of the 12 layers
Figure 5-3: A Raster Image
Figure 5-4: NSIDC SSM/I Data Product
Figure 5-5: Data organization in V Group and UNIX file system
Tables
Table 3-1: Standard Formats Comparison
Table 3-2: Illustrative Systems using Standard Formats
Table 4-1: Data Description Language Comparison
Table 5-1: HDF Utilities
Table 5-2: NCSA Tools
Table 5-3: Other Public Domain Tools
Table 5-4: Commercial Tools
Table 5-5: CEOS Format File Structure Overview
1. Introduction
1.1 Purpose and Scope
Earth Observation data are currently available in a range of different formats and there is a strong desire to standardise how such data are presented in order to improve the efficiency with which the data are handled and processed. However, format systems have different characteristics and a single format standard is not capable of satisfying all formatting needs. It has to be accepted that a number of formatting systems will be used by different agencies and different organisations for the foreseeable future.
The role of CEOS is to try to prevent the needless proliferation of format systems, encourage standardisation where possible, and ensure that format systems are developed in such a way that format translation can be performed easily, if required.
This document provides an analysis and critique of a number of standard formatting techniques that are applicable to the formatting and delivery of digital data. It also provides an analysis of current data description techniques. It is hoped that this document provides a sufficient level of detail for an application engineer to make a decision as to which technique is most appropriate for the application in hand. Links to further information are given wherever possible.
The document does not attempt to cover all formats used for scientific data sets. It concentrates on those formats which are, or are likely to be, used for Earth Observation data.
Note: This document is based on an analysis performed in the first quarter of 1995 and reviewed in late 1996 and mid 1997. It is planned that this document should be considered an evolving one, with updates sufficiently frequent to reflect the current situation. However, the rapid pace of developments in this field means the document cannot be guaranteed to be fully up to date, and it is recommended that the provided WWW links be investigated to obtain the latest information.
1.2 Intended Readership
The intended readership of this report is anyone who must decide which formatting technique or data description language to use for a particular application. It is intended that this report will provide enough detail for an engineer to make a reasonable analysis and reach a decision without having to obtain the full reference material for all the various techniques. Further details can be obtained from the reference documents; contact information is provided for each technique discussed.
The document should also be of use to users of data who wish to understand the characteristics of the particular format used for supplied data.
1.3 Document Structure
In summary, the document is structured as follows:
- Section 2 describes the basic concepts needed to understand the following sections;
- Section 3 provides an analysis of the various Standard Data Formats available;
- Section 4 provides an analysis of the various Data Description Languages available;
- Section 5 gives additional information on the two major format systems;
- Section 6 discusses other aspects related to format systems;
- Section 7 gives the conclusions and recommendations.
1.4 Maintenance Plan
It is intended that this document should be reviewed and updated at least annually. Early in its existence more frequent revisions may be warranted. The revisions will be carried out by members of the CEOS Format Guidelines Task Team although specific experts may be called upon to review particular sections.
The first official CEOS version will be V1.0. Subsequent minor revisions will increment the number after the decimal point (e.g., 1.1, 1.2, etc.). Major revisions will increment the first digit (e.g., 2.0, 3.0, etc.). Details of the revision history are given in Appendix C.
2. Concepts
2.1 Basic Concepts
This is an introduction to the basic concepts of a reference model which is useful to have in mind when evaluating the format systems and data description languages described in later sections. This text is extracted from “Data Inter-Use Reference Model” [40].
The following diagram (Figure 2-1) and text describe the entities and groups that facilitate the exchange of information. It is a deliberate attempt to abstract the problem to simple basic concepts.
Figure 2-1: Reference Model - Basic Concept
Values
These are the actual data values (bits and bytes) that correspond to the measurements and associated data. They are the unique aspect of a data set that differentiates it from every other data set. They are traditionally delivered in an operating system file or tape file.
Storage Structure
This is the focus of the traditional format standardisation approach, e.g. the CEOS format (in particular, the CEOS product descriptions rather than the media (CCT) related descriptions). It is the structure of the data set that allows the values for each field to be located and interpreted.
It is traditionally delivered as a User Guide, an international standard or occasionally as “self-describing data”, and tends to describe basic numerical representations (e.g. IEEE floats, integers).
Meaning
This is the information that the values represent, i.e. how to interpret the values as information. It is traditionally delivered as a User Guide or as separate reference information.
Data Package
This is the combination of Meaning, Structure and Values. There is no implication that these three components arrive simultaneously or in the same file, but without all three, information is not transferred. All components are required to effect use of the data. All three must be provided by a data supplier to enable Inter-use of the data by the user of data sets.
Data Packages are traditionally delivered as separate fragments (i.e., they do not contain all the information needed to completely understand the data set, particularly with regard to semantic information).
The mechanics of delivery are separate from what needs to be delivered. The following describes the delivery components.
Delivery Unit
This is a single delivery of data or information, e.g. a tape, an e-mail, etc.
Delivery Packet
This is simply the segmentation of a Delivery Unit into manageable lumps for transfer, which are reassembled on arrival, e.g. a file, network packet, etc.
The two delivery concepts are introduced here to contrast and exclude them from the discussion. A delivery mechanism should transport a Data Package, part of a Data Package or several Data Packages securely and faithfully without affecting or having to understand the data.
2.2 Storage Models
Ultimately, most information is stored in bytes in a linear memory addressing model. All current commercial computer systems use this model for storage in memory and on media.
A linear memory model is where memory resources are managed as one sequence of memory units (i.e. bytes). Even arrays which are multidimensional entities are stored as a linear sequence, with an addressing calculation which takes the co-ordinates and converts them into a linear address location.
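The addressing calculation mentioned above can be sketched as follows. This is a minimal illustration only, assuming row-major ordering (last co-ordinate varying fastest); the function name `linear_offset` and its parameters are hypothetical and not taken from any format system described in this document.

```python
def linear_offset(coords, shape, item_size=1):
    """Return the byte offset of an element in a row-major array."""
    if len(coords) != len(shape):
        raise ValueError("coords and shape must have the same rank")
    offset = 0
    for index, extent in zip(coords, shape):
        if not 0 <= index < extent:
            raise IndexError("coordinate out of range")
        # Fold each co-ordinate into the running linear position.
        offset = offset * extent + index
    return offset * item_size


# A 4 x 3 x 9 array of 2-byte integers: locate element (1, 2, 5).
print(linear_offset((1, 2, 5), (4, 3, 9), item_size=2))  # 100
```

A column-major system would fold the co-ordinates in the opposite order, which is one reason the ordering has to be part of any description of the data.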
Since this model is so standard, Data Description Languages (DDLs) effectively assume that all descriptions are ones of mapping information entities to the underlying linear memory model.
The purpose of DDLs is to provide an OPEN standard for data access (i.e. one not dependent on a particular machine or software tool). In this way the writer of data and the reader of data can be separate systems.
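The idea of open access through a description can be sketched as below. This is an illustration only and does not follow the syntax of any DDL covered in this document: the record layout, the field names and the `read_record` helper are all assumptions made for the example. The point is that the reader needs nothing from the writer except the description.

```python
import struct

# Hypothetical description of a record: (field name, byte offset, struct code).
# ">" in the codes means most significant byte first.
RECORD_DESCRIPTION = [
    ("scan_line", 0, ">H"),   # unsigned 16-bit integer
    ("latitude", 2, ">f"),    # IEEE 32-bit float
    ("longitude", 6, ">f"),   # IEEE 32-bit float
]


def read_record(raw, description):
    """Decode one record from bytes using only the description."""
    return {
        name: struct.unpack_from(code, raw, offset)[0]
        for name, offset, code in description
    }


# The "writer" side: any system that lays the bytes out as described.
raw = struct.pack(">Hff", 17, 51.5, -0.25)
record = read_record(raw, RECORD_DESCRIPTION)
print(record)  # {'scan_line': 17, 'latitude': 51.5, 'longitude': -0.25}
```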
By contrast, a CLOSED data access mechanism is one where the writer and reader use the same system. For instance, all third generation computer languages hide the data organisation from the user, so in Ada the user is not aware how an array is actually arranged, but can write and recover a piece of information using its co-ordinates. The entry point to data access has changed from the bits and bytes to the utilities that access them.
The HDF format system is a closed data access mechanism, since only HDF utilities can create and access the data values.
It seems that for information inter-operation an Open system is required; however, there is a competing approach, which is to expand a closed system until all the participants are included. The difficulties of this second approach (mainly, achieving a mutually agreed standard) are what cause DDLs to be needed.
However, the Internet and more specifically the World Wide Web in effect are providing a common ‘programming’ environment where the heterogeneity of the member systems is hidden under a common programming approach.
This means that an alternative storage model can now be considered, where providers and users construct, not descriptions, but access utilities (or applets) to data. This can then be thought of as open access to closed access mechanisms, in that the readers and writers of data are constructed at the same time under the same system, but the user has access to those accessors (which encapsulate the memory model of the data being used).
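A minimal sketch of such an access utility is given below. The class `GridAccessor` and its layout are hypothetical; the sketch only illustrates the principle that the user retrieves values by co-ordinates while the memory model stays encapsulated in the accessor supplied with the data.

```python
import struct


class GridAccessor:
    """Hypothetical accessor: callers never see the byte layout."""

    _ITEM = struct.Struct("<h")  # layout detail, private to the accessor

    def __init__(self, raw, rows, cols):
        self._raw = raw
        self._rows = rows
        self._cols = cols

    def value(self, row, col):
        """Return the value stored at (row, col)."""
        if not (0 <= row < self._rows and 0 <= col < self._cols):
            raise IndexError("coordinate out of range")
        offset = (row * self._cols + col) * self._ITEM.size
        return self._ITEM.unpack_from(self._raw, offset)[0]


# Provider side: builds the bytes and ships the accessor with them.
raw = struct.pack("<6h", 10, 11, 12, 20, 21, 22)
grid = GridAccessor(raw, rows=2, cols=3)
print(grid.value(1, 2))  # 22
```

If the provider later changes the byte layout, only the accessor changes; code written against `value(row, col)` is unaffected.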
To summarise, there are two forms of storage model:
• Linear memory model (MSB first, or last)
• Shared Access Utility model
In developing a formatting system to facilitate the inter-operation of information and data, both should be considered. The first provides the most flexibility and only requires descriptions to be constructed for a data set type to become a member of the system; the second is exemplified by the WWW, where common open access is provided but the underlying format is hidden.
In both cases, the principle is to provisionally leave the data in its native form and provide an additional description/accessor that makes the data accessible to other users. It then becomes a matter of operational choice whether the access is performed on the fly (in real time) as and when the data is required, or as part of a system format translation programme.
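The “MSB first, or last” qualifier on the linear memory model refers to byte order. The short sketch below, which uses only the standard Python `struct` module and an arbitrary example value, shows why a description of the data has to state it.

```python
import struct

value = 0x12345678

msb_first = struct.pack(">I", value)  # most significant byte first
msb_last = struct.pack("<I", value)   # most significant byte last

print(msb_first.hex())  # 12345678
print(msb_last.hex())   # 78563412

# Reading with the wrong byte order silently yields a different number.
wrong = struct.unpack("<I", msb_first)[0]
print(hex(wrong))       # 0x78563412
```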
2.3 Intermediate Data Structures
A data structure study has been carried out by the EOSDIS project to identify and define common data structures necessary to support EOS and other Earth science data products; to begin to develop Application Programming Interfaces (APIs) to such common data structures; and to develop or use existing Hierarchical Data Format (HDF) interfaces to implement these APIs. This activity has helped to identify data structures commonly used by science groups, standardize and promulgate those structures, and provide common utilities to support them. As data products are implemented, the data structures and science conventions that are used in building the product will be analyzed and incorporated into the development of a complete standard data model.
As a result of the EOSDIS project’s initial data format evaluation, it was recognized that a continuing survey of data structures required by the EOS science community was needed. An initial survey of selected Version 0 Data Products to be generated by DAACs was conducted. A list of data structures was compiled based on data models developed for these data products and from other sources. These data structures are described, for selected data products, in “EOSDIS V0 FY 92 Data Structures Report.” Some additional structures have been defined since the study. The list now contains the following structures:
- Basic structures:
• Multi-dimensional Array
• Image
• Palette
• Ragged Array
• Array of Records
• Index Structure
• Collection of Structures
• Topological Structure
• Text Structure
• Document Structure
• Metadata
- High level structures:
• Point Data
• Gridded Data
• Swath Data
- Unique structures
- Metadata
For the EOSDIS Core System (ECS), the follow-on to V0, this list has been further refined into the “Data Type Taxonomy.” The Taxonomy can be found through the ECS Data Handling System (EDHS) at:
2.3.1 Basic Structures
A basic conceptual structure is intended to be a simple data structure that has wide ranging applicability to many science disciplines. These structures can serve as the building blocks from which more complex discipline-specific or instrument-specific structures can be built.
This section will provide a conceptual understanding of the basic structures which were listed in the previous section. It is assumed that data format systems will evolve to provide explicit software support for all structures described below.
Multi-dimensional Array
Multi-dimensional arrays are n-dimensional arrays of homogeneous data. Each array contains only one data type and size. All but one dimension are fixed length. This structure can be used for sensor data. Processing data can be stored in a binary table, which is an instantiation of the Multi-dimensional array. The Multi-dimensional array might support the equal angle grid and sparse matrices. Examples of data types that can be stored in the Multi-dimensional array are integers of 8, 16, or 32 bits, floating point numbers of 32 or 64 bits, and possibly n-bit integers where n is not a multiple of 8. Figure 2-2 is an example of an n-dimensional array where n = 3. The Multi-dimensional array is not limited to three dimensions. Multi-dimensional arrays may be defined with their dimensions in any order to optimize the storage for a certain method of access or to emulate any style of interleaving (BSQ, BIP, BIL).
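The effect of dimension order on interleaving can be sketched as follows, taking BSQ, BIL and BIP in their usual remote sensing sense of band sequential, band interleaved by line and band interleaved by pixel. The function name and argument order are hypothetical; each branch is simply a three-dimensional array with its dimensions declared in a different order.

```python
def interleave_offset(scheme, band, line, pixel, n_bands, n_lines, n_pixels):
    """Element offset of (band, line, pixel) under a given interleaving."""
    if scheme == "BSQ":   # band sequential: [band][line][pixel]
        return (band * n_lines + line) * n_pixels + pixel
    if scheme == "BIL":   # band interleaved by line: [line][band][pixel]
        return (line * n_bands + band) * n_pixels + pixel
    if scheme == "BIP":   # band interleaved by pixel: [line][pixel][band]
        return (line * n_pixels + pixel) * n_bands + band
    raise ValueError("unknown interleaving scheme: " + scheme)


# 3 bands, 4 lines, 5 pixels per line; locate band 1, line 2, pixel 3.
for scheme in ("BSQ", "BIL", "BIP"):
    print(scheme, interleave_offset(scheme, 1, 2, 3, 3, 4, 5))
# BSQ 33
# BIL 38
# BIP 40
```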
Figure 2-2: An Example of a Multi-dimensional Array
Image
An image is a two-dimensional array of spatially organized measurements. Images typically contain 8- or 24-bit pixels. Image data may contain bands in different spectral wavelengths. Figures 2-3 and 2-4 give examples of image structures. An 8-bit image is generally associated with a palette (Figure 2-5).
Figure 2-3: An 8-bit Image
Figure 2-4: Three Types of 24-bit Images
Palette
A palette consists of an 8-bit lookup table which associates a color with each of the 256 possible pixel values which can be stored in an 8-bit image.
Figure 2-5: An Example of a Palette
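The relationship between an 8-bit image and its palette can be sketched as a simple table lookup. The grey ramp and the tiny image below are invented for illustration and are not taken from Figure 2-5.

```python
# A palette: 256 (red, green, blue) entries, one per possible pixel value.
# This grey ramp is an illustrative choice, not one taken from the document.
palette = [(v, v, v) for v in range(256)]
palette[255] = (255, 0, 0)  # e.g. show the highest pixel value in red

# A tiny 2 x 3 8-bit image: each pixel is an index into the palette.
image = [
    [0, 128, 255],
    [64, 64, 200],
]

# Applying the palette turns each index into a colour.
rgb = [[palette[pixel] for pixel in row] for row in image]
print(rgb[0])  # [(0, 0, 0), (128, 128, 128), (255, 0, 0)]
```

Changing the palette changes how the image is displayed without touching the stored pixel values.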