JW, LEC, September 12, 2012
Guidelines for Expanding the GROWup Database
Prepared for discussion at the WG1 meeting in Oslo
Background: Besides facilitating collaboration among researchers engaged in quantitative studies of political violence, a core objective of the ENCoRe project is to ease the exchange and integration of relevant data within the network and beyond. Ideally, seamlessly integrated data will open up new avenues of research while significantly reducing the effort (and frustration) involved in preparing data for analysis. These efforts build on a relational database infrastructure that links data collected at various institutions. This document proposes some practical guidelines to help bring about the process of data integration.
Premise: Within the ENCoRe network, data integration is a collective objective that can be achieved only through collaborative effort. To bring these efforts together, we will rely on the GROWup relational database infrastructure, which is hosted and developed by ETH Zurich. The data-providing institutions retain full autonomy over their own data sources but pledge to coordinate their data collection activities and updating schedules so that the integrated data structures can be merged and updated according to a previously agreed time plan.
Expansion Strategy: Initially, we will expand the existing database gradually and sequentially by adding contributed data that share a clear and meaningful relation to the existing data through their unit of analysis.
Compatibility and Consistency: Ideally, contributed data should be fully linked to, and compatible with, at least one existing type of data in the database. Specifically, this means that the unit of analysis should link up meaningfully through an accepted identifier code (ID). Currently, the accepted units and identifiers are the following:
- Country level (COW Interstate System State ID (v. 2008 or later), Gleditsch and Ward Country ID (v. 4), year)
- Ethnic group level (EPR COW Group ID (v. 2.0))
- UCDP/PRIO Conflict level (UCDP Conflict ID (v. 4-2010 or later))
- Dyadic actor level (UCDP Dyad ID (v. 1-2010 or later))
- Spatial data (ideally also linked to one of the above):
  - Vector data (e.g. CShapes country shapes)
  - Longitude/latitude geo-referenced point data (e.g. UCDP GED conflict events)
  - Raster data (e.g. PRIO grid)
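As an illustration of what linking through an accepted identifier means in practice, the following minimal sketch checks a contributed country-year table against a set of keys already known to the database. The column names (`cow_id`, `year`), the helper `find_unlinked_rows`, and the sample values are hypothetical and chosen only for illustration; they do not describe the actual GROWup schema.

```python
import csv
import io


def find_unlinked_rows(contributed_csv, known_keys, key_columns):
    """Return (line_number, key) pairs for rows whose identifier key
    does not appear in known_keys.

    contributed_csv: CSV text with a header row.
    known_keys: set of tuples of strings, one tuple per existing unit.
    key_columns: names of the identifier columns (hypothetical names).
    """
    reader = csv.DictReader(io.StringIO(contributed_csv))
    header = reader.fieldnames or []
    missing = [c for c in key_columns if c not in header]
    if missing:
        raise ValueError(f"missing identifier columns: {missing}")

    unlinked = []
    # Data rows start on line 2 because line 1 holds the header.
    for line_number, row in enumerate(reader, start=2):
        key = tuple(row[c].strip() for c in key_columns)
        if key not in known_keys:
            unlinked.append((line_number, key))
    return unlinked


# Illustrative country-year keys assumed to exist in the database.
known = {("2", "2000"), ("2", "2001"), ("20", "2000")}
contributed = "cow_id,year,value\n2,2000,1.5\n20,2001,0.7\n"

print(find_unlinked_rows(contributed, known, ["cow_id", "year"]))
# prints [(3, ('20', '2001'))]
```

An empty result would indicate that every contributed row links to an existing unit of analysis; any reported rows would need to be resolved by the data-providing institution before integration.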
Data Format: Data sources in tabular form should be provided as machine-readable, UTF-8 encoded comma-separated values (CSV) files. All spatial data should contain an explicit reference to their coordinate system through an SRID (spatial reference identifier). Vector data should be provided as ESRI Shapefiles or as WKT strings in a UTF-8 text file. Raster data may be provided in any raster format supported by GDAL (the Geospatial Data Abstraction Library).
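A contributor could pre-check a file against these format requirements along the following lines. This is only a sketch under stated assumptions: it supposes that vector geometries are delivered as WKT strings in a CSV column named `wkt`, with the SRID in a column named `srid`. Both column names and the function `check_contribution` are hypothetical, and the pattern below only recognizes the leading geometry keyword rather than validating the full WKT syntax.

```python
import csv
import io
import re

# Recognizes only the leading WKT geometry keyword; not a full WKT parser.
WKT_PREFIX = re.compile(
    r"^\s*(POINT|LINESTRING|POLYGON|MULTIPOINT|MULTILINESTRING|"
    r"MULTIPOLYGON|GEOMETRYCOLLECTION)\b",
    re.IGNORECASE,
)


def check_contribution(raw_bytes, srid_column="srid", wkt_column="wkt"):
    """Return a list of human-readable problems found in a contributed file.

    raw_bytes: the file contents as bytes.
    srid_column, wkt_column: hypothetical column names (assumptions).
    """
    try:
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc.reason} at byte {exc.start}"]

    reader = csv.DictReader(io.StringIO(text, newline=""))
    header = reader.fieldnames or []
    missing = [c for c in (srid_column, wkt_column) if c not in header]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    # Data rows start on line 2 because line 1 holds the header.
    for line_number, row in enumerate(reader, start=2):
        srid = (row.get(srid_column) or "").strip()
        if not srid.isdigit():
            problems.append(f"line {line_number}: SRID is not an integer")
        if not WKT_PREFIX.match(row.get(wkt_column) or ""):
            problems.append(f"line {line_number}: no WKT geometry keyword")
    return problems


sample = 'id,srid,wkt\n1,4326,"POINT (8.55 47.37)"\n2,,"POINT (10.75 59.91)"\n'
print(check_contribution(sample.encode("utf-8")))
# prints ['line 3: SRID is not an integer']
```

A check of this kind catches only encoding and bookkeeping errors; whether the stated SRID actually matches the coordinates still has to be verified by the data provider.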
Documentation: The data-providing institution is fully responsible for providing adequate documentation, including a codebook, on its own web page. GROWup will offer links to this information, including clear references to be cited by users.
Contact: Each contributed dataset should have a single point of contact at the data-providing institution. Ideally, this representative should participate in the WG1 meetings on a regular basis.
Responsibility: The data-providing institution retains sole responsibility for the data, including ensuring full compatibility under the terms stated above, as well as for future updates. Failure to ensure compatibility may lead to the data source's removal from GROWup.
Authorship: The data-providing institution retains authorship of its data. The database will make this explicit by indicating to the user all component authorships of any customized dataset.
Public Accessibility: Data-providing institutions may temporarily restrict access to their data to members of the ENCoRe network through the Research Front End (RFE), thus delaying publication through the Public Front End (PFE). Direct access to the relational database through SQL queries will not be provided to the public but can be granted to ENCoRe members on request.
Replicability: To ensure that results can be replicated and to simplify the technical infrastructure underlying the database, a copy of the raw input data will be stored (and archived) locally within the GROWup database at ETH Zurich, without prejudice to any authorship claims (see above).