FP7-INFRASTRUCTURES-2007-1 I3 proposal

24/04/07 V2 EuroVO-AIDA

COMBINATION OF COLLABORATIVE PROJECT AND COORDINATION AND SUPPORT ACTION

Integrated Infrastructures Initiative project (I3) proposal

Infrastructures Call 1

FP7-INFRASTRUCTURES-2007-1

Euro-VO Astronomical Infrastructure for Data Access

EuroVO-AIDA

Date of preparation: April 24, 2007

Version number (optional): V2

Participant no. * / Participant organisation name / Part. short name / Country
1 (Coordinator) / Centre National de la Recherche Scientifique - Institut National des Sciences de l’Univers / CNRS-INSU / France
2 / European Space Agency / ESA / France
3 / European Southern Observatory / ESO / Germany
4 / Istituto Nazionale di Astrofisica / INAF / Italy
5 / Instituto Nacional de Técnica Aeroespacial / INTA / Spain
6 / Ruprecht-Karls-Universität Heidelberg - Zentrum für Astronomie der Universität Heidelberg / UHEI / Germany
7 / Nederlandse Onderzoekschool voor Astronomie, legally represented by the University of Groningen / RUG-NOVA / The Netherlands
8 / University of Edinburgh / UEDIN / United Kingdom

*Please use the same participant numbering as that used in Section A2 of the administrative forms

Work programme topics addressed

(if more than one, indicate their order of importance to the project. The main (first) objective must be one included in this call)

Scientific Digital Repositories

Name of the coordinating person: Françoise GENOVA

e-mail:

fax: +33 3 90 24 24 32

Proposal abstract

(copied from Part A, if not in English include an English translation)

*** This has obviously to be improved. please check and comment the abstract which has to fit with the Digital Scientific Repositories. In particular we have to write a conclusion on what the project will bring***

The concept of a Virtual Observatory (VObs) is that all the world’s astronomical data should feel as if it sits on the astronomer’s desktop, analysable with a user-selected workbench of tools and made available through standard interfaces across the whole range of astronomical research topics. The VObs concept has the potential to transform and restructure the way astronomy research is done, and the astronomical Virtual Observatory is a world-wide, community-based initiative. Euro-VO is the European implementation of this idea. The EuroVO-AIDA project aims at coordinating the transition of Euro-VO to operations, integrating its different aspects: networking with the European astronomy community of data providers and astronomers, and service activities supporting their uptake of the VObs framework, both for the implementation of VObs-enabled data repositories, in particular through the construction of a registry of European VObs-compliant resources, and for full science usage of the resulting e-Infrastructure; research activities tackling the evolution and adaptation of VObs interoperability standards to widespread implementation and usage, with specific emphasis on data access protocols and the data access layer, taking into account feedback from data centres and users, and assessing the possible usage by data centres of emerging technologies such as Web 2.0; co-ordination activities to ensure proper networking with the international VObs community and to initiate discussions with other scientific communities and with projects which develop relevant generic tools and environments; and service activities in support of outreach towards higher education and the general public.

Table of contents

Section 1: Scientific and/or technological excellence, relevant to the topics addressed by the call

1.1 Concept and objectives

1.2 Provision of integrated services and co-ordination of high quality research

1.3 Networking Activities and associated work plan

1.3.1 Networking activities work plan overall strategy

1.3.2 Networking activities work planning

1.3.3 Networking activities work description

1.3.4 Graphical presentation of the components

1.3.5 Risk management of the Networking Activities

1.4 Service Activities, and associated work plan

1.4.1 Work Plan overall strategy

1.4.2 Work planning

1.4.3 Work description

1.4.4 Graphical presentation of the components

1.4.5 Risk management of Service Activities

1.5 Joint Research Activities and associated work plan

1.5.1 Work plan overall strategy

1.5.2 Work planning

1.5.3 Work description

1.5.4 Graphical presentation of the components

1.5.5 Risk management of Joint Research Activities

1.6 Summaries

References

Section 2. Implementation

2.1 Management structure and procedures

2.2 Individual participants

CNRS-INSU

ESA

ESO

INAF

INTA

Ruprecht-Karls-Universität Heidelberg, representing the GAVO Community

NOVA, represented by the University of Groningen

UEDIN, representing the AstroGrid Consortium

2.3 Consortium as a whole

ii) Other countries

iii) Additional partners

2.4 Resources to be committed

Section 3. Impact

3.1 Collaborative arrangements and perspectives for their long-term sustainability

3.2 Expected impacts from Service activities

3.3 Expected impacts from Joint Research Activities

3.4 Dissemination and/or exploitation of project results, and management of intellectual property

Acronyms used in the proposal

Section 4. Ethical Issues

Section 5. Consideration of gender aspects

Proposal

Section 1: Scientific and/or technological excellence, relevant to the topics addressed by the call

(Recommended length for the whole of Section 1 – forty pages, not including the tables)

1.1 Concept and objectives

(Explain the concept of your project. What are the main ideas that led you to propose this work?

Describe in detail the S&T objectives. Show how they relate to the topics addressed by the call. The objectives should be those achievable within the project, not through subsequent development. They should be stated in a measurable and verifiable form, including through the milestones that will be indicated under sections 1.3, 1.4 and 1.5 below.)

Astronomy has traditionally been at the forefront of implementing digital data repositories, with a continuum between observational data stored in observatory archives, value-added databases, and results published in academic journals. National and international ground- and space-based observatories produce terabytes of data, which are publicly available from observatory archives all around the world. Data centres and archives develop added-value services, such as high-level (‘science-ready’) data products, analysis tools, and databases compiling, for instance, results published in journals, which make these data readily usable.

Some of these scientific data repositories were available long before the World Wide Web. For instance, the database containing the observations of the International Ultraviolet Explorer satellite of the European Space Agency, which operated between 1978 and 1996, was a remarkable early precursor in demonstrating the importance of public availability of data, allowing scientists to re-use it for scientific purposes which are often different from the original ones: roughly five times as many publications were based on archive data as on the original proposals (Wamsteker & Griffin, 1995). An early example of a value-added database is SIMBAD, which compiles published information about astronomical objects (30,000 queries per day on average in 2006). It was created in 1983 by merging the Bibliographic Stellar Index and the Catalogue of Stellar Identifications, which had been developed at the beginning of the seventies and had been remotely accessible from the start. SIMBAD has evolved considerably over the years, in step with the evolution of astronomy and of technology. Over the years, Europe has played a leading role by setting up some of the most successful archives of data obtained from satellites, such as those of the European Space Agency’s ISO (infrared) and XMM (X-ray) observatories. Europe now also hosts the world-leading archive of ground-based astronomical observations, the ESO archive, which collects observations carried out with ESO telescopes since the early 1990s, including the complete record of observations obtained with the Very Large Telescope.

Many other information repositories have been developed over the years, and as soon as the World Wide Web began to emerge, the data providers began to implement Web access to them and http links between them. This data network, which consists of huge volumes of highly distributed, heterogeneous data, is heavily used by astronomers in their daily research work. However, it can be difficult for users to identify the data of interest to them among the wealth of available resources. Additionally, each data set and service has its own user interface and access methods, requiring a specific learning process.

The advent of the World Wide Web and the early implementation of on-line digital repositories have revolutionised the way astronomical data and information are distributed and used in the daily work of astronomers. Building on this, astronomy is at the leading edge in conceptualizing and implementing a coordinated approach to digital repository deployment. The global vision comprises several key elements:

-An ‘open’ data policy – most observational data are publicly available after a proprietary period of one year;

-Agile and eager-to-evolve data repositories (contrary to the ‘static’ vision conveyed by the term ‘repository’), provided that the providers get the proper support and that the proposed evolution remains manageable;

-A bottom-up approach, in particular for the definition of interoperability standards. Great attention is paid to the participation of and feedback from data managers in the standardization process, and to a proper evaluation of new technologies, which have to be implemented neither too early, since the goal is to build an operational system with sustainability in mind, nor too late, so as to take full advantage of new possibilities;

-A fully science-driven approach: all evolutions are driven by the needs of the science community, and in turn the science community is motivated to use the VObs-enabled data and tools because of the unique and powerful capabilities they provide.

Another lesson learnt is that in such a system, which facilitates access to data, a proper evaluation of data quality, a proper definition of data characterization, the proper implementation of the required metadata, and the proper propagation of this information through the system are mandatory, both to allow high-quality research re-using the data and to give the user enough confidence in the whole system. This means additional tasks for the data repositories.

A radical new step has been under way since about 2000 with the rapid development of the Virtual Observatory (VObs) concept. This aims at providing seamless unified access to data holdings: all archives speaking the same “language”, accessed through uniform portals, and analysable by the same tools. It gives all astronomers access to and usage of data gathered by the other disciplines of astronomy, in step with the rapid development of multi-wavelength astronomy, which now accounts for a significant fraction of the published papers. VObs projects around the world have formed the International Virtual Observatory Alliance (IVOA), which coordinates in particular the assessment of the VObs architecture and the development of domain-specific interoperability standards. The VObs goes one step further than giving access to distributed data repositories: the implementation of the interoperability standards (the interoperability layer) on top of the data repositories permits operations, such as data aggregation and combination, which are essential for the full scientific exploitation of the data infrastructure.

The VObs is not a monolithic system, but, like the Web, a set of standards, which make all the components of the system interoperable – data and metadata standards, agreed protocols and methods, standardised mix-and-match software components. These standards and software modules constitute the VObs Framework. Several strands of work are needed to fully implement the Astronomical Virtual Observatory:

1) Development of standards and protocols, and their international agreement, coordinated by IVOA;

2) Construction of "glue" software components - portal, registry, workflow, user authentication, virtual storage;

3) Uptake by data centres, who need to "publish" to the system, i.e. to write VObs-compliant data services connected to their holdings;

4) Construction of tools to effectively take advantage of seamless access to data;

5) Support to the scientific community in its uptake of the new framework.

Since the emergence of the concept in 2000, VObs development has first consisted of prototyping, the building of the essential infrastructure components, and a first uptake by the data providers. In addition to agency-run observatory archives and well-established data centres, many scientific teams express their willingness to share their knowledge and expertise by publishing, in the VObs framework, services focussed on a given scientific question. On the science community side, new data and data portals are made available incrementally, and the usage of the basic functionalities is straightforward. However, it appears that some support has to be provided so that astronomers can progressively learn to take advantage of the full power of the new coordinated research e-Infrastructure (***to be expanded?***).

The first objective of the EuroVO-AIDA project is to understand how the complex system that is the European VObs can operate, and to lead its transition to operations. EuroVO-AIDA will support the astronomy community and data providers in this phase, identify the necessary adjustments from their feedback, explore important R&D domains which may have important spin-offs in the medium term, and evaluate the conditions for the long-term sustainability of the system. The VObs will be an operational, discipline-wide knowledge infrastructure based on very diverse data repositories, taking into account the needs of the science community and fully available for daily usage by scientists.

This will allow the VObs to remain at the leading edge, and provide a rare example of an operational knowledge infrastructure that includes a huge diversity of resource providers, supplies a flexible environment adapted to the users’ needs, is seamlessly used by a whole scientific community, gives all scientists access to the best data and data access tools irrespective of their location, and has a strong potential for outreach towards education and the general public.

*** more details ***

*** Milestones and detailed measurable objectives are missing here ***

There are different possible models for the organisation of “virtual communities”, depending on the specific needs and organisation of each scientific community. The astronomical Virtual Observatory is often cited by other communities (e.g. by DARIAH, the Digital Research Infrastructure for the Arts and Humanities, in the ESFRI 2006 Roadmap). Other successful models may have different starting points; for instance, security can be a central requirement for some disciplines, e.g. bioinformatics. The general data policy of astronomy, based on public availability of data, is quite different, but some elements of the astronomical VObs do require the implementation of a security layer. Discussions with other disciplines around commonalities, differences, and lessons learnt would be useful. To prepare for that, a second high-level goal of EuroVO-AIDA is to make the organisation of the astronomical Virtual Observatory explicit, and to initiate discussions with other disciplines, and with projects which develop generic tools and environments, so as to be prepared to explore possible sharing of best practice and possible synergies as a next step.

1.2 Provision of integrated services and co-ordination of high quality research

(Describe the state-of-the-art in the area concerned, and the advance that the proposed project would bring about. If applicable, refer to the results of any patent search you might have carried out.)

The VObs framework is made up of several components operating at different levels. It of course relies heavily on generic e-Infrastructure elements:

-the communication network is in place, thanks to GÉANT and the European National Research and Education Networks;

-the general framework of the Grid infrastructure is almost complete (gLite, Globus, web services, etc.);

-generic computational grids are being deployed (EGEE).

To complement these generic efforts, each discipline has to tackle its specific needs. Astronomy in particular has to deal with massive distributed data, and with the integration of data and services. The first international forum for the discussion of the discipline-specific astronomical Virtual Observatory interoperability standards was the Interoperability Working Group of the FP5 OPTICON Thematic Network (IHP-INF-99-1), which was formed in 2001. It included from the beginning international participants from Canada and the USA, and in April 2002 successfully defined the first astronomical VObs standard, VOTable, an XML-based standard for exchanging tabular data, which has been heavily used in all VObs developments since then. The International Virtual Observatory Alliance, formed in June 2002, took over the co-ordination of standards definition, and the European VObs projects are heavily involved in the effort to define internationally agreed interoperability standards, in close collaboration with other VObs initiatives in North America (USA, Canada) and the Asia-Pacific region (Australia, China, India, Japan). IVOA incrementally defines interoperability standards for different layers of the VObs architecture, which are covered by different Working Groups and Interest Groups: