What should I say about data sharing?

MRC were heavily involved in the work leading to the Joint Data Standards Study. The resulting document details an approach for data sharing (defined as the reuse of data, in whatever way, wherever it is held, at any time after its creation). It also defines a standard vocabulary and illustrates the process of synchronous (contemporary), consecutive (from one person/body to another) and asynchronous (after longer intervals) sharing. Sharing involves a producer, who is the source of the information, and a consumer, who represents the end (re)user. Sharing may involve a push from the producer or data resource itself, or a pull from the consumer. Various prerequisites must be in place, including appropriate discovery mechanisms and support. What is shared includes the information itself, associated metadata, and the tools, methods and workflows that may be necessary to process and view it. We would certainly encourage you to read and refer to that document when preparing this particular section of the application.

To this end it is important to describe exactly what digital information will be created during the course of the proposed work, and this ought to be approached from several perspectives.

Most obvious is the semantic information about each particular data set: this will assist your own understanding of how the data are likely to be used and reused, the communities that will aim to exploit them and the additional information that you may have to associate with them in order to ensure their ongoing value. For instance, a series of responses to a questionnaire is of limited value unless the associated questions are also maintained and kept persistently available.
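As an illustration of the questionnaire example, the sketch below stores the questions and the responses together in a single file, and refuses to write a response that refers to a question which is not included. It is a minimal sketch only: the function name bundle_survey, the field names and the choice of JSON are assumptions made for this example and are not prescribed by MRC or by any particular archive.

```python
import json


def bundle_survey(questions, responses):
    """Serialise questions and responses together as one JSON document.

    questions: dict mapping a question ID to the question text.
    responses: list of dicts, each with an "answers" dict keyed by question ID.
    (This layout is a hypothetical one chosen for illustration.)
    """
    # Collect any answer that points at a question we are not keeping.
    unknown = {
        qid
        for response in responses
        for qid in response["answers"]
        if qid not in questions
    }
    if unknown:
        raise ValueError(f"answers refer to missing questions: {sorted(unknown)}")

    # Sorted keys and indentation keep the file stable and human-readable.
    return json.dumps(
        {"questions": questions, "responses": responses},
        indent=2,
        sort_keys=True,
    )
```

Keeping both parts in one self-describing file means that a later re-user does not need to locate a separate copy of the questionnaire in order to interpret the answers.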

Technological insights are also of fundamental importance; the types of file formats employed, the choices of particular storage media and any necessary hardware or software environments should all be understood and documented. There may also be legal implications of particular data that has been accumulated, which may also have to be addressed.

Data protection issues, for instance, may require the anonymising of data sets, and this should be documented within the context of an overall policy. Some guidelines on anonymising data sets are available on the UK Data Archive's advice pages. Any data sharing agreements that set parameters for the collection or usage procedures will be of relevance here. It will also be appropriate and necessary to give an overview of how consent, confidentiality, ethical and legal considerations, access rights and intellectual property will be managed.
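To give a flavour of what a documented anonymisation step might look like, the sketch below drops direct identifiers from a record and replaces the participant ID with a keyed hash. The field names, the function name pseudonymise and the choice of HMAC-SHA256 are all assumptions made for this example. Strictly speaking this is pseudonymisation rather than full anonymisation, since remaining fields may still allow re-identification, so it should be read alongside the UK Data Archive guidance rather than in place of it.

```python
import hashlib
import hmac

# Hypothetical field names, chosen only for illustration.
DIRECT_IDENTIFIERS = {"name", "address"}


def pseudonymise(record, secret_key, id_field="participant_id"):
    """Return a copy of record without direct identifiers.

    The ID is replaced by a keyed hash, so the same participant always
    receives the same pseudonym, but the original ID cannot be recomputed
    without secret_key (bytes), which must be stored separately.
    """
    # Copy every field except the direct identifiers; the input is not modified.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    digest = hmac.new(
        secret_key,
        str(record[id_field]).encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()

    # A shortened digest is easier to read; 16 hex characters is an arbitrary choice.
    cleaned[id_field] = digest[:16]
    return cleaned
```

For example, pseudonymise({"participant_id": 17, "name": "A. Smith", "address": "1 High St", "score": 3}, b"project-secret") returns a record containing only the score and a 16-character pseudonym. Recording the list of removed fields and the method used is the kind of detail an overall policy would be expected to document.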

In terms of reuse potential, one ought to document the associated or secondary uses, both within and across domains, to which each dataset or group of datasets might legitimately be expected to be put. Significant effort has been invested in understanding the significant properties of our digital information: the aspects of it that are fundamental to its ongoing value and which must be maintained to ensure its usefulness. There is a perception that some things, if lost, won't really detract from our ability to reap the full benefits. A paper presented within an HTML page, for instance, might not be of any less value when the formatting and layout are removed and it is published within a single text file. Our understanding of what is vital to keep and make available and comprehensible is wholly coloured by the potential uses that we can identify.

A great deal of curation policy centres on the stage at which plans for preparing and documenting data for preservation and sharing are conceived.

Preparations should involve a number of related commitments; you will no doubt be expected to indicate how data will be preserved and any strategies that you expect to rely upon for sharing. Inevitably this will incorporate both organisational and technological issues; the means to provide a range of functions such as ingest, documentation, data management, discovery and delivery will probably all have to be demonstrated.

Use of, and reliance on, acknowledged standards and open source tools and licences ought to facilitate the smooth deployment of these functions.

Similarly, evidence will have to be offered to illustrate the preservation planning that has taken place and will continue to take place. 

Choices of stable media and transparent software formats, policies on things like media refreshment and format migration, and a demonstrable commitment to documentation using recognised metadata schemes are likely to provide such evidence. 
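One way to back up a policy on media refreshment with evidence is to record a checksum for every file before it is copied to new media and to compare the checksums afterwards. The sketch below shows the idea; the function names and the use of SHA-256 are assumptions made for this example rather than a requirement of any funder or repository.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """List files that are missing from, or differ in, the new copy."""
    return sorted(
        name
        for name, digest in old_manifest.items()
        if new_manifest.get(name) != digest
    )
```

A manifest built from the old media can be stored alongside the data; after refreshment, changed_files(old, build_manifest(new_root)) should return an empty list, and a dated record of that result is a simple piece of preservation documentation.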

A common strategy for making information available to re-users is to rely upon a relevant community database. Organisations such as the UK Data Archive offer preservation services to many projects and organisations. A quick browse of their website affords a range of insights into the kinds of services they provide, consumer expectations and requirements for producers. UKDA also offers useful advice for data creators and depositors.

The ULCC National Data Repository is a further example of a service that offers storage and preservation solutions for a wide range of digital formats for subsequent discovery, retrieval and use. 

A directory of UK repositories and services is available within the Digital Preservation Coalition's website.

In addition to outlining policy, functions and areas to which you are committed, you may also wish to outline roles and responsibilities; a description of which personnel are responsible for performing particular checks and balances will be of clear value in this context.

Similarly, in situations where more than one institution and/or researcher is involved in bidding for the MRC grant, you might wish to state the partner institutions clearly and identify any IPR issues that might arise from collaborative research efforts.