MTPII-MATER Data Management Final Report


MTP II- MATER Data Management

1996-1999

(MAS3-CT96-0051)

Final Report

C. MAILLARD1, M. FICHAUT1, A. GIORGETTI3, E. BALOPOULOS2, S. IONA2, A. LATROUITE1, B. MANCA3, P. NICOLAS4 and J-A. SANCHEZ-CABEZA5

1IFREMER/SISMER, BP 70, 29280 Plouzané, France

2NCMR/HNODC, Hellinikon, 16604 Athens, Greece

3OGS, PO Box 2011, 34016 Trieste, Italy.

4SAFEGE/CETIIS, 30 Av. Malacrida, 13100 Aix en Provence, France

5Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain.

CONTENTS

1. Introduction

1.1. Summary of the Objectives and Methodology

1.2. Data Management Structure and Role

1.3. Data and Meta-data Circulation

2. Meta-data Management - WWW Catalogues

3. Data Sets Archived

3.1. Physics

3.2. Biochemistry

3.3. Other Specific Parameters

4. Quality Assurance

4.1. Definition of Data Protocol

4.2. Implementation: Data Formatting and Qualification

4.3. Data Experts Review

5. MTPII-MATER Database on CD-ROM

6. EVENTS

6.1. Data Management Meetings

6.2. Participation in Scientific and Technical Meetings

7. Conclusion

8. References

9. Annexes - Regional Data Management Reports

TMSI/IDM/SISMER/00-017 - February 2000


1. Introduction

1.1. Summary of the Objectives and Methodology

The objective of the Data Management Workpackage was to ensure that the precious multidisciplinary data sets collected during the 105 sea cruises (about 1000 days at sea) and on the 126 mooring lines from 1996 to 1999 would be easily exchanged among the participants, safeguarded for further use and published on the best media available at the time of the project. As is customary in major international projects, an operational data management structure was defined to carry out the corresponding tasks; it was a regionally distributed structure.

A preliminary task was to define a common protocol for data formatting and quality checking, so that data of the same type collected by different teams would be coherent and comparable, whatever their source. This protocol was refined throughout the project.

Quality assurance and safeguarding are improved by archiving the data as soon as possible after data collection and scientific validation. The methodology for tracking the data and speeding up the circulation of information and data was to maintain catalogues through regular contacts with the scientists and to publish them on the Internet in a standardised form at each regional data centre. The data have been organised into "basic parameters", which are parameters useful to all disciplines, and other "specific parameters", for which no agreed standards exist. Both are archived, but only the basic parameters are submitted to the full quality assurance protocol.

To reach a wide range of potential users, a prototype of the CD-ROM was prepared and demonstrated at the Perpignan final workshop for a first evaluation; it is attached to this report.

1.2. Data Management Structure and Role

The data management structure, schematised in Fig. 1, includes three regional archiving centres and two operational groups.

Fig. 1: MTPII-MATER Data Management Structure

The three regional centres are:

National Center for Marine Research, HNODC, Greece (Eastern Basin)

Osservatorio Geofisico Sperimentale, Italy (Adriatic/Ionian Basin)

IFREMER/SISMER, France (Western Basin and co-ordinator data manager)

They were tasked to:

  1. Develop the common protocol
  2. Compile the meta-data and disseminate them on the WWW
  3. Request copies of the data from the source laboratories, and process them for archiving in conformity with the data management protocol.

The Animation Task (CETIIS, France) maintained the cruise schedule and an on-line synthesis of the data status for the project management.

The Data Quality Group supervised the validation methods, issuing the appropriate qualification for each data set.

1.3. Data and Meta-data Circulation

The circulation of information (meta-data) and data (illustrated in Fig. 2) within the data management structure was as follows:

  1. Search for the cruise schedule, from both project and national authorities
  2. Request meta-data by sending standardised forms: summary reports for cruises (ROSCOP), moorings, instruments and data sets (EDMED)
  3. Request the data listed in these reports; reformat, safeguard and check them for quality
  4. Publish up-to-date catalogues of data and meta-data on the WWW servers
  5. Disseminate data and meta-data according to the project policy.

Fig. 2: CIRCULATION OF DATA & INFORMATION DURING MTP II-MATER

Rectangles: organisation or person. Ellipses: deliverables, services, products.

2. Meta-data Management - WWW Catalogues

The cruise, mooring, instrument and data set summary reports have been archived and made available without any restriction on the data management WWW servers. At any time it has thus been possible to get complete visibility of the fieldwork and of the data sets collected. These web servers are:

Western Basin:

Adriatic/Ionian Basin:

Eastern Basin:

and a synthesis made by the Animation Task:

On each of these web sites, a common data management home page has been developed, with access to all the standardised cruise, mooring, instrument and data reports, and links to the other centres and project web sites (Fig. 3).

Fig 3: Data Management home page of a Regional Data Centre

Clicking on "Cruises Summaries" returns the cruises listed by year, and clicking on any cruise returns the corresponding report, including the ship track, the list of collected data and the list of archived data (with location). The 105 sea cruise reports can be downloaded.

Clicking on "Moorings Summaries" returns a list linked to reports similar to the cruise reports (126 mooring reports).

Clicking on "Instruments Summaries" returns a list of the main instruments used during the project, the laboratories and information on the sensor calibrations.

Clicking on "Data sets Summaries" returns the list of the three main groups into which the data sets have been organised, to cope with the difficulties due to the complexity and overlap of the various disciplines:

Physics

Bio-chemistry

Specific parameters (miscellaneous non standard parameters)

These groups are themselves sub-divided into sub-groups (Fig. 4) depending on the method/sensor type (e.g. CTD, Lagrangian floats, current meter time series for physics) or on the compartment (e.g. dissolved, particulate, settling particles, sediment, pore water for chemicals).

Fig 4: WWW page of the data sets catalogue for Central Basin

The first two groups correspond to basic parameters. Clicking on any item returns the corresponding list of cruises/moorings where these data have been collected, together with the data location.

3. Data Sets Archived

The physical and biochemical basic parameter data have been reformatted to a unique common format (MEDATLAS), and the full quality assurance procedure has been applied before archiving. The other specific parameters have only been safeguarded. A brief synthesis of the archived data is presented below, in the state available for the Perpignan meeting. Some additional data are still being archived, so this does not represent the final content of the database.

3.1. Physics

Vertical profiles: 3013 CTD, 572 XBT, 461 ADCP vertical profiles
Time series: CTD time series, 21 thermistor strings, 110 current meters, Lagrangian floats

The positions of the observations are reported in Fig. 5 for the vertical profiles, in Fig. 6 for the time series on fixed moorings and in Fig. 7 for the Lagrangian time series.

3.2. Biochemistry

Vertical profiles: 1473 bottle stations, 269 biological stations
Time series: 37 sediment traps

The positions are reported in Fig. 8.

3.3. Other Specific Parameters

Additional or specific data (meteorological, biological, etc.) have only been marginally archived, without any reformatting or quality check. They are available only in the original source format of the scientific file, and can be extracted by cruise file only. The positions are not reported.

Fig. 5: Vertical Profiles of Physical Observations - Stations

Fig. 6: Time series of Physical Observations - Moorings

Fig. 7: Lagrangian Time series of Current

Fig. 8: Biochemical Data in station and sediment traps

4. Quality Assurance

Quality assurance (QA) has been a high point of the MTPII-MATER data management. The data were first scientifically validated in the scientific laboratories; copies were then transmitted to the archiving centre, where they were reformatted to a unique common format (extended MEDATLAS), checked for quality (QC) and safeguarded. Even though validation is the responsibility of the scientists, these final QCs, made before archiving, allowed the data to be cross-checked and ensured that no errors had been introduced during reformatting. This procedure is in accordance with the recommendations of international organisations such as UNESCO/IOC and MAST, and with the practices of other major international projects such as WOCE and JGOFS.

Basically, the QA procedure included three tasks:

definition of a common protocol for formatting and QC in accordance with international standards

implementation of the protocol for archiving the basic parameters

validation of the procedures by a Data Quality Expert group.

In each regional data centre, it was very important to follow standardised quality assurance procedures in order to prepare coherent data sets usable by the whole community. Several hundred parameters have been measured within MTPII-MATER; among them, 105 basic parameters of physics and biochemistry were to be shared by the participants and, later on, by other users. The quality assurance procedure is focused on these parameters. For the other parameters, resulting from new sensors or methodologies, or for which internationally agreed standards did not exist, the QA procedure was not applicable.

4.1. Definition of Data Protocol

The MTPII-MATER protocol (1) for handling the information and the data is based on the international recommendations of UNESCO/IOC and MAST (2) and on the earlier MAST/MEDATLAS protocol (3), which was developed to process all the basic parameters, vertical profiles and time series.

This protocol includes:

  1. A data dictionary where the parameter names, units and codes have been standardised. The International System of Units (SI) was used, and rules for derived units were recalled, based on ISO standards guidelines.
  2. The description of the format: self-describing ASCII, including a short cruise/mooring header with reference to the author, a station header, and columns of observations referenced to pressure for vertical profiles (depth in the sediments) and to time for the time series.

  3. The description of the final quality checks (QC) performed on the basic parameters before archiving. The QC includes automatic and visual checks:

QC-0: check of the format, units, codes and overall completeness and consistency of the information

QC-1: check of the date and location

QC-2: check of the data points: broad minimum/maximum range values, comparison with climatological statistics (when available), search for spikes, stuck sensors, vertical instabilities, etc.

The visual checks assess the overall consistency of the data within the same data set, identify the wrong value in case of vertical instability, validate the climatological test in some areas, etc. The results are quality flags added to each numerical value.
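The automatic part of such checks can be illustrated with a short sketch. This is a minimal illustration only, not the SCOOP or local software actually used in the project; the flag convention (1 = good, 3 = doubtful, 4 = bad), the range limits and the spike criterion are assumptions chosen for the example:

```python
# Sketch of automatic QC-2 checks on one parameter of a vertical profile.
# Assumed flag convention: 1 = good, 3 = doubtful, 4 = bad.

def qc2_flags(values, vmin, vmax, spike_threshold):
    """Return one quality flag per data point."""
    flags = [1] * len(values)
    # Broad minimum/maximum range check: impossible values are bad.
    for i, v in enumerate(values):
        if not (vmin <= v <= vmax):
            flags[i] = 4
    # Spike check: a point standing far off the line joining its
    # neighbours is doubtful (GTSPP-style criterion).
    for i in range(1, len(values) - 1):
        if flags[i] != 1:
            continue
        centre_dist = abs(values[i] - (values[i - 1] + values[i + 1]) / 2.0)
        half_span = abs(values[i + 1] - values[i - 1]) / 2.0
        if centre_dist - half_span > spike_threshold:
            flags[i] = 3
    return flags

# Example: a temperature profile with one spike and one impossible value.
temps = [13.2, 13.1, 13.0, 18.5, 12.8, -9.0]
print(qc2_flags(temps, vmin=-2.0, vmax=35.0, spike_threshold=2.0))
# → [1, 1, 1, 3, 1, 4]
```

A flag column of this kind sits alongside each parameter column, so that every numerical value carries its own qualifier.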

4.2. Implementation: Data Formatting and Qualification

The data received at the data centres are reformatted to the common format and, if necessary, the source scientist is contacted to complete the information. In case of unit problems, conversions to SI units were made when possible, to ensure comparability with the other data sets.
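The unit harmonisation step can be sketched as a simple conversion table. This is an illustration only; the entries below are standard factors, but the table does not reproduce the project's actual data dictionary:

```python
# Sketch of unit harmonisation towards SI.
# Each entry maps a source unit to (target SI unit, multiplicative factor).
CONVERSIONS = {
    "cm/s": ("m/s", 0.01),     # current speed
    "mg/l": ("kg/m3", 0.001),  # mass concentration (1 mg/l = 1 g/m3)
    "km":   ("m", 1000.0),     # distance
}

def to_si(value, unit):
    """Convert a value to SI units; unknown units pass through unchanged."""
    si_unit, factor = CONVERSIONS.get(unit, (unit, 1.0))
    return value * factor, si_unit

print(to_si(25.0, "cm/s"))  # → (0.25, 'm/s')
```

Unknown units are passed through rather than guessed, so a conversion is never invented; such cases were referred back to the source scientist.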

The QCs have been performed on the reformatted data files in the three MATER data centres. The Hellenic and French data centres used an expert software package (SCOOP), and the Italian data centre used a local software package. Examples of QC1 and QC2 checks from SCOOP are displayed in Figs. 9 and 10.

After checking that the outliers are not artefacts of the reformatting, the results of the QC are communicated to the responsible scientists, who take further action such as validation, correction or elimination if necessary. Close co-operation with the scientists who collected the data has contributed widely to the quality assurance.

Fig. 9: Check of the Location and Date of the Observations (QC1)

Fig. 10: Check of the Data Values against Climatology (QC2)

(in green the current profile, in blue the corresponding climatological profile)

4.3. Data Experts Review

A Data Quality Group of Experts (DQE) was created by the project co-ordinator, and its first meeting was held in Rome (CNR) on 17 November 1997, during the International Conference "Progress in Oceanography of the Mediterranean Sea". It was composed of scientists and data managers with the following fields of expertise:

Suspended Particulate Matter : Project Co-ordinator

Inorganic chemistry (metals, CFC and radionuclides): Joan-Albert Sanchez-Cabeza (DQE Leader).

Physical oceanography, satellite data: MATER Data Managers.

Biology and biochemistry of the water column (nutrients, biogenic compounds, primary production, microbiology, fauna): Paul Wassmann.

Biology and biochemistry of the benthic zone (nutrients, biogenic compounds, primary production, microbiology, fauna): Roberto Danovaro.

Three main tasks were assigned to the DQE:

  1. The first and most basic role given to the DQE was to convey to all scientists the need for data quality control.

All laboratories should observe adequate internal data quality assurance, which includes good sampling and analytical protocols, good laboratory practice (blanks, replicates, proper recording and other aspects) and adequate reporting to the Data Manager.

All laboratories should participate, to their maximum capabilities, in inter-comparison exercises, or even organise them, and the results of these exercises should be made available to the MATER community.

2. Another important task was to select the list of "Basic Parameters" to be used by most of the partners and specifically requested by the modellers. It was agreed that the Steering Committee should provide the DQE with a limited list of basic parameters. For these parameters, it was desirable to include a suggested methodology in the MTP Quality Assurance Manual, to organise the inter-comparison exercises, and to implement the full data management protocol before archiving and dissemination. For the other parameters, because of the complexity and cost foreseen, in particular for some organic chemistry and biological parameters, archiving would be limited to safeguarding.

3. Finally, the Data Quality Group intended to establish a reviewing procedure for data sets, in collaboration with expert scientists. Depending on the data type, subsets of the data sets would be sent to the appropriate Data Quality Expert.

From this review, a simple data quality statement would be issued, such as:

Data Set is OK

More information is needed (specify)

Data set must be re-evaluated.

Due to its relatively late establishment, the lack of funding, the belated data submissions, etc., the DQE had no real possibility of conducting inter-comparisons or of reviewing data sets beyond the operational QC procedure applied in the data centres. However, its members have been very useful advisers on the crucial points:

  1. Selection of the list of basic parameters
  2. Validation of the data dictionary for micro-biology and radioisotopes
  3. Checking of the control values for the QC protocol.

5. MTPII-MATER Database on CD-ROM

To facilitate access to the data, an MTP II-MATER CD-ROM will be published. A beta-test CD-ROM (Fig. 11) was presented at the Perpignan workshop (October 1999) with the complete inventories, subsets of the database, extraction software and documentation.

The integrated, user-friendly SELMATER software allows selection, extraction and visualisation of the observations. The data selection can be made according to several criteria: geographical area, data type (vertical profiles or time series), cruise name or reference, time period (year, month, date), source country, ship, measured parameters and quality flags. Two output formats are available: MEDATLAS and CSV tables (with limited information on the data) for loading into spreadsheets or other scientific software. For the vertical profiles, the software also allows extraction of data at interpolated standard levels (pre-defined, or defined by the user).
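The extraction at interpolated standard levels can be illustrated with a simple linear interpolation sketch. This does not describe SELMATER's actual algorithm; the function name and the level list are assumptions for the example:

```python
# Sketch: interpolate a vertical profile onto standard pressure levels.
def interp_profile(pressures, values, std_levels):
    """Linear interpolation of (pressure, value) pairs to standard levels.
    Levels outside the observed pressure range yield None."""
    out = []
    for p in std_levels:
        if p < pressures[0] or p > pressures[-1]:
            out.append(None)  # no extrapolation outside the profile
            continue
        for i in range(len(pressures) - 1):
            p0, p1 = pressures[i], pressures[i + 1]
            if p0 <= p <= p1:
                w = 0.0 if p1 == p0 else (p - p0) / (p1 - p0)
                out.append(values[i] + w * (values[i + 1] - values[i]))
                break
    return out

# Example: a temperature profile interpolated to 0/10/20/50 dbar levels.
print(interp_profile([5, 15, 25], [14.0, 13.0, 12.5], [0, 10, 20, 50]))
# → [None, 13.5, 12.75, None]
```

Refusing to extrapolate outside the observed range keeps the interpolated levels consistent with the quality-controlled data.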

The basic data are organised in different directories on the CD-ROM, corresponding to the data types: Adcp, Bottle, Ctd, Current, Net, Thermi, Trap, Xbt. Other specific data are archived in the Others directory on the CD-ROM.

All the catalogues, the format description, the codes and the QC processing description are available in HTML format in the documentation directory.

Fig. 11: Home Page of the MTPII-MATER Database on CD-ROM

6. EVENTS

6.1. Data Management Meetings

  • Paris, 3-4 July 1997, Institut Océanographique and IFREMER
  • Rome, 17 November 1997, CNR
  • Athens, 9-10 March 1998, NCMR/HNODC
  • Rhodes, 13-14 October 1998, before the IIIrd MTPII workshop
  • Paris, 25-26 March 1999, Institut Océanographique
  • Perpignan, 28 October 1999, during the IVth MTPII Workshop

6.2. Participation in Scientific and Technical Meetings

  • Ocean Data Symposium, Dublin, Ireland, 15-18 October 1997

Presentation:
"Distributed Data Management Structures for Scientific Programmes: the MTPII-MATER Case", by Catherine MAILLARD, Beniamino B. MANCA, Efstathios BALOPOULOS and Jean-François RACAPE.