IDCC11 Session 3B: Environmental data

9 December, 2011

Sarah Callaghan began from the premise that most researchers think data sharing is a good idea – as long as it is someone else’s data! They have significant reservations when it comes to sharing their own data, as they worry about losing credit for having created the data in the first place. Sarah suggested that reframing ‘sharing’ as ‘publication’ might encourage more openness. However, there is currently no universally accepted method of obtaining academic credit for sharing, and data creators prefer to hold onto their data until they have extracted all possible value from it.

The Data Citation &amp; Publication project at NERC aims to reward creators for appropriately curating, archiving, and sharing their data. As part of this, NERC data centres can now issue DOIs for suitable datasets, allowing those datasets to be cited and thereby enabling creators to get due credit for the work they have put into creating them.
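To illustrate the mechanism, a dataset DOI works like any other DOI: it resolves via the doi.org proxy to a landing page, and can be dropped into a standard citation string. The sketch below is a minimal, hypothetical example; the creator, title, and DOI shown are invented for illustration and do not refer to a real NERC dataset.

```python
def format_citation(creator, year, title, publisher, doi):
    """Build a simple dataset citation string.

    The DOI is rendered as a resolvable https://doi.org/ URL,
    which is what makes the dataset citable and trackable.
    """
    return f"{creator} ({year}). {title}. {publisher}. https://doi.org/{doi}"


citation = format_citation(
    creator="Smith, J.",
    year=2011,
    title="Example observational dataset",
    publisher="NERC data centre",      # hypothetical publisher
    doi="10.1234/example-dataset",     # hypothetical DOI
)
print(citation)
```

Because the DOI is a persistent identifier, the citation remains valid even if the dataset's landing page moves, which is part of what makes DOI-based citation attractive for giving data creators credit.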

Sarah concluded with a quote from Jason Priem: “We share because we do science, not alchemy.”

Brian Matthews discussed the Advanced Climate Research Infrastructure for Data (ACRID) project which concluded in August 2011. One of the drivers for this was the uncontrolled release of e-mails from the University of East Anglia and the subsequent House of Commons report that recommended that “researchers take steps to make available all data that support their work”.  

One of the main aims of ACRID was the citability of data; as at NERC, this was addressed through the use of DOIs for datasets. The project resulted in a common information model that is compatible with existing models in use in the environmental sciences. Further work is required to realise the full potential of both the model and the use of DOIs.

Constanze Curdt described the research data management system that her team has put in place to manage data generated by the CRC/TR32 project. This project is long-term and generates large quantities of very varied data, which must be stored and managed in compliance with DFG requirements.

To achieve this, they created their own data storage and access system with a web interface and a bespoke metadata framework. The web interface is available to the public, though only users with appropriate permissions can access or download all of the data.

Ruth McNally closed the session with a discussion of dataflows in data intensive research. The pilot study conducted during 2011 looked at dataflow in next generation sequencing and environmental network centres, two areas in which vast amounts of data can be generated in relatively short timescales.

The conclusions were surprising: dataflow is not a single, constant, smooth stream but is shaped by a topography created by the interactions of people, things, and ideas. This needs to be taken into account when undertaking any form of data-intensive research.