SOST Interagency Ocean Observation Committee

Data Management and Communications

Steering Team

DRAFT SUMMARY REPORT

February 27-28, 2013

Consortium for Ocean Leadership

1201 New York Avenue, NW

Washington, D.C.

SOST Interagency Ocean Observation Committee – DMAC Steering Team

Summary Report of February 27-28, 2013 Meeting

DAY 1

1. Introduction and Icebreaker Exercise: DMAC Concept Mapping

The DMAC ST meeting was called to order. Introductions were made around the table and on the phone.

Opening remarks were provided by IOOC Co-Chair David Legler, who thanked the DMAC ST and gave an overview of the Summit, including the opportunities and vision for the system. The question for DMAC is how to achieve the ‘I’ (Integration) in IOOS. The DMAC ST Chair, Charly Alexander, covered housekeeping items and reviewed the agenda.

Charly reviewed the DMAC ST Terms of Reference and the purpose of the team to:

  • Provide the IOOC with strategic guidance on DMAC-related activities and challenges[1]
  • Maintain focus on DMAC subsystem elements (i.e., discovery, transfer, access, archive)
  • Identify and solve specific data management challenges within the ocean observing realm
  • Serve as a forum for collaboration among IOOS agencies and partners on DMAC issues

Federal members represent agency data management practices and priorities, formulate DMAC guidance and recommendations for the IOOC, and serve two-year terms, with a maximum of two consecutive terms[2]. Non-federal stakeholders help align DMAC guidance and standards with realistic implementation, which non-federal entities are often responsible for executing.

Charly reviewed the outcomes of the previous meetings, the DMAC Summit white paper, and the DMAC breakout session at the Summit, and reported on the status of Summit follow-up activities.

Charly noted the questions raised and agreed to address them further in the meeting.

Moving forward, the DMAC ST:

  • is guided by, but not driven by, IOOS Summit priorities
  • includes connections to, and recommendations for, global observing efforts
  • needs more systematic and routine engagement with the IOOC
  • needs to meet more often (phone/video, etc.) for shorter durations
  • will focus on viable solution sets for the IOOS enterprise
  • will develop a 12-18 month work plan with milestones/products
  • will re-boot (new ST/chair) when the milestones/products are complete

2. USGS: Virtual Host Presentation/Discussion

Rich Signell presented on the DMAC-relevant programs at the U.S. Geological Survey. The USGS mission is to provide geologic, topographic, and hydrologic information that contributes to the wise management of the Nation's natural resources and that promotes the health, safety, and well-being of the people. The USGS has several mission areas: Climate and Land Use Change, Water, Ecosystems, Energy and Minerals, Environmental Health, Natural Hazards, and Core Science Systems. Core Science Systems cuts across all of the science themes on the data side. USGS is a small agency, with a budget of $1.1B (FY12) compared to NOAA ($5.5B), NASA ($18B), and the U.S. Government overall ($3,670B); USGS amounts to about $0.03 of every $100 of federal spending. Its R&D budget is about $0.6B.

The USGS Earth Resources Observation and Science Center, which includes the Landsat Data Continuity Mission (a joint operation of USGS and NASA), has completed 40 years of observations; it is an interesting data set for looking at sea level rise. The National Elevation Dataset (NED) is an attempt to put the best bathymetry available into one dataset; the NED entered the ocean realm in August 2012 in collaboration with NOAA/NGDC, USGS/CMG, and USACE. Rich presented on the Environmental Data Discovery and Transformation effort, which aligns with DMAC in providing services. The Water Quality Portal shows how to use the cooperative services provided by USGS, EPA, and the National Water Quality Monitoring Council (NWQMC). The CIDA Geo Data Portal (GDP) Climate Downscaling Tool is built on THREDDS, OPeNDAP, WCS, WMS, CSW, and WPS, with Python and Matlab interfaces to the GDP. GitHub is another portal with extensive datasets. Rich works in the Coastal and Marine Geology Program under the Natural Hazards mission area.
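
As an illustration of the cooperative-services pattern above, the sketch below pulls water-quality results from the Water Quality Portal's REST-style services in Python. The endpoint path, parameter names, and values are assumptions for illustration; consult the portal's documentation before relying on them.

```python
# Hedged sketch: query the Water Quality Portal for dissolved oxygen
# results in Louisiana. Endpoint and parameters are assumed, not quoted
# from the meeting.
import requests

resp = requests.get(
    "https://www.waterqualitydata.us/data/Result/search",  # assumed path
    params={
        "statecode": "US:22",                        # Louisiana (FIPS 22)
        "characteristicName": "Dissolved oxygen (DO)",
        "mimeType": "csv",
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.text.splitlines()[0])  # CSV header row of the returned results
```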

Scientific Analysis tools at USGS:

  • The majority of scientists use ArcGIS as their primary scientific analysis and visualization tool
  • ArcGIS 10 uses Python as its scripting and interface language
  • Use of Python allows ArcGIS users to utilize hundreds of community packages (see the sketch below)
  • In 2012, USGS bought an Enthought Python Distribution (EPD) site license to facilitate interoperability of science workflows
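
A minimal sketch of this ArcGIS-plus-community-packages workflow, assuming an ArcGIS 10.1+ install (arcpy ships with ArcGIS, not PyPI) and hypothetical dataset and field names:

```python
# Mix ArcGIS geoprocessing (arcpy) with a community package (numpy)
# in one Python workflow. The workspace, feature classes, and depth_m
# field are hypothetical.
import arcpy          # available only inside an ArcGIS Python install
import numpy as np    # community package usable alongside arcpy

arcpy.env.workspace = "C:/data/coastal.gdb"

# Geoprocessing via ArcGIS's Python scripting interface:
arcpy.Buffer_analysis("stations", "stations_buf", "500 Meters")

# Pull an attribute into numpy for analysis:
depths = np.array([row[0] for row in
                   arcpy.da.SearchCursor("stations", ["depth_m"])])
print("mean station depth (m):", depths.mean())
```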

Assessing a 30-year ocean hindcast in the Gulf of Maine involved one 15 TB dataset from a 4D ocean model, served via the THREDDS Data Server at UMASS/SMAST, and 600 time-series datasets from 3 different THREDDS catalogs (USGS, NOAA/NMFS, WHOI) in Woods Hole.

The interoperability approach takes model outputs from different systems and displays them in a common format. They are moving beyond models alone, expanding to time-series data, gridded data, and other observations.
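
The access pattern described here can be reproduced in a few lines of Python with the netCDF4 library, which can read OPeNDAP endpoints directly (when built with DAP support). The URL and variable name below are placeholders, not the actual hindcast endpoints.

```python
# Hedged sketch: open a model dataset served via THREDDS/OPeNDAP and
# read only a slice, without downloading the full multi-TB archive.
from netCDF4 import Dataset

url = "http://example.org/thredds/dodsC/gom_hindcast.nc"  # hypothetical
nc = Dataset(url)  # over OPeNDAP, only requested slices cross the network

temp = nc.variables["temp"]   # assumed variable name
surface = temp[0, -1, :, :]   # first time step, one vertical layer
print(temp.shape, float(surface.mean()))
nc.close()
```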

In conclusion, USGS and DMAC are on the same services bandwagon (OGC, OPeNDAP/CF + ESRI + custom). The USGS has a lot of cool data and products, but USGS should not be an organizing principle for data – the goal is to facilitate cross-agency, cross-sector, and cross-discipline mashups. USGS CDI, NOAA DMIT, and DMAC are all quite similar – keep those circles overlapping. Keep building out the services, service-consuming web apps, science workflows, and standards-based tools in Matlab, Python, and R, and keep listening to scientists and developers.

3. OGC Interoperability Experiments

Nadine Alameh presented an extensive list of references, which are available through her presentation. OGC’s approach to advancing interoperability is organized around four programs. The Interoperability Program (IP) is a global, innovative, hands-on rapid prototyping and testing program designed to unite users and industry in accelerating interface development and validation and the delivery of interoperability to the market. The Specification Development Program is a consensus standards process similar to those of other industry consortia (World Wide Web Consortium, OMA, etc.). The Compliance Testing and Certification Program allows organizations that implement an OGC standard to test their implementations against the mandatory elements of that standard. The Marketing and Communications Program provides education and training and encourages uptake of OGC specifications through business development and communications programs.

Reasons for conducting a project as an interoperability experiment include cost reduction (all participation is in-kind), minimal external management overhead (participating organizations self-organize), and meeting the challenge of effectively managing a diverse, multi-organization, multi-national team. Reasons for conducting a project as a pilot project are that OGC assumes the management role, OGC IP staff handle all meeting and administrative tasks, projects follow milestones closely, and deliverables include documentation subject to peer review.

The OGC process consists of five broad activities:

  1. Startup Package
  2. Startup Preparation
  3. Kickoff
  4. Execution
  5. Wrap-up and Reporting

Nadine presented on the various experiments and use cases that OGC is currently working on.

DISCUSSION

Take-aways from the presentations: search and discovery would be a useful exercise because it is one of the greatest challenges. In addition to technology, a social-engineering pilot is needed that can demonstrate how the information will be distributed to the masses. EarthCube is one method of getting the science out, but more generally, how can this work become integrated and prevalent within graduate schools, Congress, etc.? An interoperability experiment within data.gov would be useful. A full-coast display that is multi-region and multi-agency could incorporate the OGC specs. OGC is having the same dialogue with other programs and may be converging with the DMAC ST on some common goals.

4. Evaluation/relevance of use cases from September 2012 ST Meeting

The DMAC ST ended up with three use cases: oil spill, hypoxia and beach closures. There was a decision not to move forward on the oil spill use case. The DMAC ST discussed the relevance of these use cases.

Julie and Michelle worked on the hypoxia use case. The use case assessed the ability to find data (e.g., nutrient loading, wind, biology, dissolved oxygen, river discharge, currents, and temperature/salinity/density) on the hypoxic region of the Gulf of Mexico, and to find connections between the data through visualization tools. Use Case 1 looked at a commercial fisherman who would like to assess the current conditions in the Louisiana region of the Gulf of Mexico prior to his cruise. Use Case 2 looked at a water (agricultural) manager who would like to assess the current conditions in the Louisiana region of the Gulf of Mexico to make informed management decisions. They would particularly like to look at data portal interaction and interoperability.

Kevin’s group looked at beach closures. The idea was to link existing information, historical trends, and real-time data sources at Federal and State levels that would enable a forecasting model to alert the public of potential risks in targeted recreational beach areas due to degraded water quality. The historic causes and effects leading to beach closures due to degraded water quality in some coastal communities have been widely studied. There are existing internet-based tools that track the occurrence and distribution of beach closures based on historic water quality sampling. This use case focuses on what the technological and data architecture requirements could be to develop and deliver an early forecasting capability to inform the public of periods of high risk due to degraded water conditions based on existing land uses and climatological conditions.

Harry and Rich brought up the idea of an oil spill use case. One of the most important needs is knowing where the oil will go, which requires precise modeling capability. DMAC could help bring all of the pieces and players together in a common operating picture.

Discussion

  • OGC has developed good architecture descriptions through use cases that are variations of the use case topics the DMAC ST decided on.
  • These also line up with the same societal benefit areas used in IOOS.
  • Use cases are useful in some contexts; the ones identified would help locate the datasets needed.
  • Use cases can demonstrate success or bring tools online; it would be better to aim use cases at the implementers of ocean data.
  • Superstorm Sandy could be used as a simpler test case to compare models and data.
  • Use cases could help describe the value chain in discovery and collection.
  • Before defining use cases, you have to categorize users; one suggestion was to keep a model of the users online to tailor each use case.
  • Use cases are most useful at the design phase of a system, where a user need is tested; then develop a concept of operations document.
  • This would be useful to IOOS in looking at changes needed in the system.
  • It is important to meet the users on the front end, design the concept, and then find out if it works at the end.
  • Care is needed in how the data are presented in the environment where they are being used.
  • Use cases need enough detail in defining the needs and requirements; requirements are built on previous successful use cases. Each use case is an individual facet of the system, but they are all important.
  • Use cases should be inter-related.

5. Assessment of priority tasks for ST

Charly provided a consolidated list of proposed DMAC priorities drawn from the DMAC ST white paper, suggestions solicited before the IOOS Summit, and the DMAC break-out discussions at the IOOS Summit.

  1. COMMUNICATING DMAC
     • Stay the course, maintain current vision for DMAC
     • Adopt the proposed new IOOS sub-system diagram (includes DMAC) and consider a new name for the DMAC subsystem
  2. SUGGESTED TECHNICAL FOCUS AREAS [with voting tallies]

a) Governance/compliance

  1. Accelerate standards adoption at IOOC member agencies [10]
  2. Require formal data citation records in the metadata to document sources [3]
  3. Metadata documentation and compliance [7]
  4. Evaluate/explain the costs/benefits of ISO 19115 [0]
  5. Establish/expand data interoperability through standards [3]

b) Archiving/Stewardship

  6. Get historical data on-line in IOOS-compatible ways [1]
  7. Improved stewardship/archiving [3]

c) Data Access/Transport Services

  8. Endorse/implement a service oriented architecture [0]
  9. Implement a core set of recommended data transport services [1]

d) Training

  10. Education/Outreach to improve understanding, use, importance of DMAC [3]
  11. Training per simpler access/use of DMAC [1]

e) Compliance/Governance [omitted from vote]

  12. Global and coastal ocean observing enterprises need to be more closely aligned, including access to data and sharing of best practices and protocols
  13. Define minimum DMAC compliance
  14. Provide data in useful formats

f) Registry/Catalog

  15. Machine-accessible registry of IOOS service access points [3]
  16. Enable improved data discovery via service registry [1]
  17. Focus on catalog services [8]
  18. Establish DMAC as a distributed system w/ a centralized catalog [10]

g) Client Tools

  19. Develop DMAC tools that are simpler or more intuitive to use [2]
  20. Develop/apply web-based tools for visualizations of IOOS data-provider holdings [1]
  21. Develop modules that enable network access to SOS & OPeNDAP objects in IDL, Matlab, etc. [1]
  22. Tools for sharing code, experiences, etc., such as use of social networks [2]

h) Thematic priorities

  23. Broader access to and exchange of biological data/observations [0]
  24. Ensure QA/QC is supported and is well documented in associated metadata [3]
  3. HOW TO EXECUTE DMAC
     • Use DMAC “swat” teams to conduct a tech-transfer sprint to do on-site installations, training, and testing
     • Use(r) cases to accelerate DMAC
     • Demonstrate producing useful outcomes:

a) ERMA-like displays in all regions

b) Weather Channel-like content for coastal ocean

c) Modeler resource

Discussion

  • The DMAC ST should look not only at what is most important but also at what issues each member can contribute to.
  • Regarding the phrasing, some of the priorities are too broad. They could be triaged as: already done, not understood, understood, doable, or not important.
  • Steve has worked on technology focus areas; another way to think about it is to develop a roadmap for high-priority, high-value areas. The biggest problem right now is that people trying to find the data they need, in a format that is useful, spend more time on discovery than on analysis.
  • There is also a lot of data that it does not make sense to make interoperable; what would be useful is better scoping of what should be interoperable. Cataloging and a data registry are most important for getting the most important data to float to the top.
  • The social component of contributing data is a cultural problem that needs to be overcome; scientists need to be more proactive in making their data discoverable.
  • Need to look at catalog registries versus a Google-style approach.
  • We need better filtering and data access services.
  • If the IOOS Program Office is funding the regions, it needs to be more direct about what it is asking for.
  • Need a better idea of what is realistic through a metadata model.
  • The authenticity and security of data sources need to be analyzed.
  • Need to take a look at the old DMAC ST document and see how it aligns.
  • Jeff DLB has suggestions based on what NOAA is currently working on.
  • David Legler agreed that a triage exercise would be useful and that the DMAC ST can recommend paths to the IOOC.
  • Success is more than just regions and use cases.

6. Break-out Session I: plans of action for priority tasks

Rather than holding break-outs, the DMAC ST voted individually on the list, with each member selecting their top four choices.

Below are the highest ranked topics:

  • (1) Accelerate standards adoption at IOOC member agencies – 10 votes
  • (18) Establish DMAC as a distributed system w/ a centralized catalog – 10 votes
  • (17) Focus on catalog services – 8 votes
  • (3) Metadata documentation and compliance – 7 votes

The ST then broke into groups to examine what each topic means and what the immediate next step would be.

REPORT OUTS

Cataloging (17 and 18)

The group agreed that it is best to recognize there is more than one catalog (there are aggregation centers) and that these need to be federated. Getting data into the archive requires standards for that data, and there may be sub-catalogs; the RAs might need to populate the catalog. The group also agreed on the need to determine what goes into the metadata for these catalogs.
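
A minimal sketch of querying one node of such a federated catalog, using the OWSLib Python client for OGC CSW; the endpoint URL and search text are placeholders, not catalogs named at the meeting:

```python
# Hedged sketch: search a CSW catalog node for records mentioning sea
# water temperature. A federator would merge results across sub-catalogs.
from owslib.csw import CatalogueServiceWeb
from owslib.fes import PropertyIsLike

csw = CatalogueServiceWeb("http://example.org/csw")  # hypothetical endpoint

query = PropertyIsLike("csw:AnyText", "%sea_water_temperature%")
csw.getrecords2(constraints=[query], maxrecords=10)

for rec_id, rec in csw.records.items():
    print(rec_id, "-", rec.title)
```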

Metadata (3)

Metadata compliance means making the data processable, discoverable, and accessible. The compliance source is an authority that can ensure a group is using a minimum set of metadata. The next step is a status report from the IOOS Program Office on where metadata stands within the System and the broader enterprise. Moving forward, guides would be developed so that observers know how to meet the metadata standards. The final step is creating the compliance authority.
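
As a sketch of what an automated check by such a compliance authority might look like (the required-field list below is illustrative, not an agreed IOOS minimum):

```python
# Hedged sketch: flag metadata records missing a minimum field set.
# REQUIRED_FIELDS is hypothetical, not an adopted IOOS/ISO 19115 profile.
REQUIRED_FIELDS = {"title", "abstract", "contact", "bounding_box",
                   "time_coverage", "data_citation"}

def check_compliance(record: dict) -> list:
    """Return the required fields missing or empty in a metadata record."""
    return sorted(f for f in REQUIRED_FIELDS if not record.get(f))

record = {"title": "Gulf of Maine moored temperature",
          "contact": "ops@example.org"}
missing = check_compliance(record)
print("compliant" if not missing else "missing: %s" % missing)
```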

Standards Adoption (1)

The group discussed various standards and generated a list of the top standards, but then realized they needed a clear understanding of the different domains (fisheries, water quality, etc.). It is challenging even within a single agency, in a single domain, to collect data under common standards.

DAY 2

1. Morning Announcements and Review of Day 1

Feedback from the DMAC ST from the previous day’s proceedings:

Karl noted that the ST spent a lot of time defining the scope of what it needs to do and should continue building on that today.

Still struggling to find an identity for the DMAC ST.

The ST is trying to determine a recommendation for the IOOC, but does not yet seem to have a clear idea of what that might be. Moving forward, it would help to get regular reports from the agencies and regions on their DMAC activities.

If that could be done from meeting to meeting, the results could be presented to the IOOC. In the metadata group, there were outcomes that point to specific activities to pursue.