Development of an Aggregation Model for Mooring Data Served by DODS

Data from moored oceanographic instrumentation have been made available over the Internet by a small number of providers using OPeNDAP (DODS) servers. These data usually consist of time series of quantities such as current velocity, water temperature and salinity, bottom pressure, and travel time from inverted echo sounders, and could include surface meteorological data. Future developments in instrumentation may allow some of the chemical and biological properties of the ocean to be measured by unattended moorings deployed for periods on the order of months. Examples might be oxygen, nutrients, chlorophyll-a, and transmissivity. One of the goals of NVODS is for the user to be able to search multiple OPeNDAP sites in order to retrieve data specific to a given problem or analysis. This means that the user would generally not know the locations of the data or the specifics of their organization and formatting.

One approach to this problem is to use an Aggregation server that receives the results of a request to a search engine (e.g. all the current records that are from below 1000-m depth, have durations longer than 6 months, were located in the Gulf of Mexico, and were from moorings deployed later than January 1, 1990), assembles the requested data from the individual data providers, puts these data into a consistent organization by translating and possibly reformatting on the fly, and provides the result to the user. Ideally, the process of translation by the Aggregation server would also apply standardized naming conventions to variables and attributes, and enforce consistent units for the variables. Aggregation servers would eventually deal with many different types of data, such as satellite imagery and data from moorings, ship-deployed instruments (CTDs, XBTs, etc.), and Lagrangian drifters. However, time series from moored instruments are probably the most tractable, because a relatively limited number of variables are measured at a small number of fixed locations in the ocean. Some minimal conventions have already been established, and most research institutions are well aware of the requirements for data that are input into well-established time-series analysis methods.
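The translation step described above could look roughly like the following sketch. It is illustrative only: the provider variable names, the common names, the target units, and the function standardize are all hypothetical, since no naming or unit convention has yet been chosen.

```python
from dataclasses import dataclass

# Hypothetical provider-specific names mapped to hypothetical common names.
NAME_MAP = {
    "TEMP": "water_temperature",
    "temp_c": "water_temperature",
    "u_vel": "eastward_velocity",
}

# Assumed target units for each common name (not an agreed convention).
TARGET_UNITS = {
    "water_temperature": "degC",
    "eastward_velocity": "m/s",
}

# Multiplicative factors for the few unit conversions this sketch supports.
UNIT_FACTORS = {
    ("cm/s", "m/s"): 0.01,
    ("mm/s", "m/s"): 0.001,
}


@dataclass
class Series:
    name: str
    units: str
    values: list


def standardize(series: Series) -> Series:
    """Return a copy of the series with a common name and consistent units."""
    name = NAME_MAP.get(series.name, series.name)
    target = TARGET_UNITS.get(name, series.units)
    if series.units == target:
        return Series(name, target, list(series.values))
    try:
        factor = UNIT_FACTORS[(series.units, target)]
    except KeyError:
        raise ValueError(f"no conversion from {series.units} to {target}")
    return Series(name, target, [v * factor for v in series.values])
```

A provider record such as Series("u_vel", "cm/s", [10.0, -25.0]) would come back named eastward_velocity with its values expressed in m/s. A unit pair missing from the table raises an error instead of passing unconverted data through silently.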

Thus, it is proposed to develop a strawman data model that will address the organization of data from multiple moored instruments, which will be supplied to a user by an Aggregation server. The data model will establish the structure of the time series and establish some preliminary guidelines on variable and attribute naming conventions. Examples of questions to be addressed are:

  • Should the returned data be in one data stream or in multiple files, each containing data from one location?
  • If a time series has more than one dimension (e.g. bin depth for an ADCP), how is this accommodated and incorporated into the structure?
  • How is the time dimension defined across multiple datasets?
  • How are multiple variables from one instrument organized?
  • Which conventions already established by the community (e.g. COARDS, EPIC) are most useful or most widely accepted?
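To make the questions above concrete, the following sketch shows one possible way such a structure might be laid out. It is not a proposed answer. The class names MooredSeries and Instrument, their fields, and the choice of a regularly sampled time axis are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class MooredSeries:
    """One variable from one instrument, sampled at a fixed interval."""
    variable: str
    units: str
    start: datetime
    interval: timedelta
    values: list                          # values[t], or values[t][k] when binned
    bin_depths_m: Optional[list] = None   # second dimension, e.g. ADCP bins

    def __post_init__(self):
        # When a bin dimension exists, every time step must cover every bin.
        if self.bin_depths_m is not None:
            for row in self.values:
                if len(row) != len(self.bin_depths_m):
                    raise ValueError("row length does not match number of bins")

    def times(self):
        """Time axis implied by the start time and the sampling interval."""
        return [self.start + i * self.interval for i in range(len(self.values))]

    def shape(self):
        n_bins = len(self.bin_depths_m) if self.bin_depths_m else 1
        return (len(self.values), n_bins)


@dataclass
class Instrument:
    """Metadata for one instrument, plus all variables it recorded."""
    mooring_id: str
    latitude: float
    longitude: float
    depth_m: float
    series: list = field(default_factory=list)
```

In this layout, multiple variables from one instrument share the instrument's position metadata, and a single-depth record is treated as a series with one bin. Whether each dataset keeps its own time axis, as here, or all datasets are mapped onto a shared axis is exactly the kind of choice the strawman would need to settle.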

It is also proposed that any structure developed for the metadata (e.g. position, water depth) be compatible with a relational database table structure. This will allow efficient searching using standard SQL. Clearly, a standardized structure for time-series metadata is essential if searches are to be made across multiple sites, and relational databases are an established way of organizing large quantities of complex data.
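As an illustration of how such a table could support searching, the sketch below uses Python's built-in sqlite3 module to express the example search given earlier (current records from below 1000 m, longer than 6 months, in the Gulf of Mexico, deployed after January 1, 1990). The table name, column names, and example URLs are hypothetical. The latitude and longitude box is only a rough stand-in for the Gulf of Mexico, and 6 months is approximated as 183 days.

```python
import sqlite3


def build_catalog():
    """Create an in-memory metadata table holding a few made-up deployments."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE deployment (
               id INTEGER PRIMARY KEY,
               provider_url TEXT,
               variable TEXT,
               latitude REAL,
               longitude REAL,
               instrument_depth_m REAL,
               start_date TEXT,   -- ISO 8601 dates compare correctly as text
               end_date TEXT)"""
    )
    rows = [
        ("http://example.org/dods/a", "current_velocity", 25.5, -90.0, 1500.0, "1995-03-01", "1996-03-01"),
        ("http://example.org/dods/b", "current_velocity", 25.5, -90.0, 200.0, "1995-03-01", "1996-03-01"),
        ("http://example.org/dods/c", "current_velocity", 27.0, -88.0, 1200.0, "1998-01-01", "1998-03-01"),
        ("http://example.org/dods/d", "current_velocity", 35.0, -70.0, 2000.0, "1996-01-01", "1997-01-01"),
        ("http://example.org/dods/e", "current_velocity", 26.0, -92.0, 1800.0, "1988-05-01", "1989-05-01"),
        ("http://example.org/dods/f", "water_temperature", 26.0, -92.0, 1800.0, "1994-05-01", "1995-05-01"),
    ]
    conn.executemany(
        "INSERT INTO deployment (provider_url, variable, latitude, longitude,"
        " instrument_depth_m, start_date, end_date) VALUES (?, ?, ?, ?, ?, ?, ?)",
        rows,
    )
    return conn


QUERY = """
SELECT provider_url FROM deployment
WHERE variable = 'current_velocity'
  AND instrument_depth_m > 1000
  AND julianday(end_date) - julianday(start_date) > 183
  AND latitude BETWEEN 18 AND 31
  AND longitude BETWEEN -98 AND -80
  AND start_date > '1990-01-01'
ORDER BY id
"""


def find_deep_gulf_currents(conn):
    """Return the provider URLs of all deployments matching the example search."""
    return [row[0] for row in conn.execute(QUERY)]
```

Only the first deployment satisfies every condition. Each of the others fails exactly one test (depth, duration, location, deployment date, or variable). An Aggregation server could then request data from the returned URLs alone.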

The proposed tasks are:

  1. Examine existing DODS moored data sites (approximately 10 to 20) to determine the data structures already in use by the providers.
  2. Using these results, refine the questions given above, plus others that may arise, and request input from the ad-hoc workgroup on solutions.
  3. Develop the data organization strawman.
  4. Make recommendations for standardization of variables, dimensions and attributes, along with their units.
  5. Hold a workshop for interested parties (no more than about 15 persons) to discuss and refine the results of tasks 3 and 4.
  6. Report the recommendations of the workshop.

September 23, 2003.