MRC data plan FAQ

Q1. Could you explain what the problem is?

From 1 January 2006 all applicants submitting funding proposals to MRC must include a statement explaining their strategy for data preservation and sharing. Applicants must include this information within their Case for Support under a separate section entitled 'Data sharing and preservation strategy'. See the MRC Policy on Data Sharing and Preservation web page and the Applicants Handbook for more information.

In addition, it is noted that any potential problems in archiving the data that are anticipated or arise during the research project should be discussed with MRC as soon as possible, and data should be deposited in a standard that will enable them to be used by a third party, including the provision of adequate documentation.

Q2. What would you recommend I say in my funding proposal about my strategy for data preservation?

Your strategy for data preservation should cover both contemporary and long-term information management, and should incorporate evidence or testimony of appropriate and sustainable organisational, technological and resource allocation policies and infrastructure. Means of access are also intrinsic to data preservation. Providing appropriate systems for the discovery and retrieval of data is important, but it is even more important to ensure that the data can be understood once retrieved. This presupposes an understanding of the data's user communities and an assurance that the form of the information remains persistently comprehensible.

A presentation by Warren Hilder entitled "60 years of data curating a life course study, the Medical Research Council National Survey of Health and Development (NSHD)" at our DCC/ERPANET workshop on the long-term curation and preservation of medical databases outlines many of the important criteria that must be satisfied in order to confidently claim that you are preserving and curating data. It also includes some definitions which might prove useful.

Q3. What about data sharing?

MRC were heavily involved in the work leading to the Joint Data Standards Study. This document details an approach for data sharing (defined as the re-use of data, in whatever way, wherever it is held, at any time after its creation). It also defines a standard vocabulary and illustrates the process of synchronous (contemporary), consecutive (from one person/body to another) and asynchronous (after longer intervals) sharing. Sharing involves a producer, who is the source of the information, and a consumer, who represents the end (re)user. Sharing may involve a push from the producer or data resource itself, or a pull from the consumer. Various prerequisites must be in place, including appropriate discovery mechanisms and support. The kinds of information that will be shared include the information itself, associated metadata, tools and associated methods and workflows that may be necessary to process and view it. We would certainly encourage you to read and refer to this document when preparing this particular section of the application.

To this end it is important to describe exactly what digital information will be created during the course of the proposed work, and this ought to be approached from several perspectives. Most obvious is the semantic information about each particular data set: this will assist your own understanding of how the data are likely to be used and reused, the communities that will aim to exploit them and the additional information that you may have to associate with them in order to ensure their ongoing value. For instance, a series of responses to a questionnaire is of limited value unless the associated questions are also maintained and kept persistently available. Technological insights are also of fundamental importance; the types of file formats employed, the choices of particular storage media and any necessary hardware or software environments should all be understood and documented. The data that have been accumulated may also have legal implications that need to be addressed. Data protection issues, for instance, may require the anonymising of information sets, and this should be documented within the context of an overall policy. Some guidelines on anonymising data sets are available on the Economic and Social Data Service's advice pages. Any data sharing agreements that set parameters for the collection or usage procedures will be of relevance here. Overviews of how consent, confidentiality, ethical and legal considerations, access rights and intellectual property will be managed will also be appropriate and necessary.
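
As a purely illustrative aid, the semantic, technological and legal details listed above could be kept in a simple structured record for each data set. The sketch below is one possible way of doing so in Python; the class name, field names and helper function are hypothetical and are not prescribed by MRC or any other body.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical documentation record for one data set (illustrative only)."""
    title: str
    description: str = ""  # semantic information: what the data mean
    associated_materials: List[str] = field(default_factory=list)  # e.g. questionnaire text
    file_formats: List[str] = field(default_factory=list)
    storage_media: str = ""
    software_environment: str = ""
    anonymised: bool = False
    sharing_agreements: List[str] = field(default_factory=list)


def missing_documentation(record: DatasetRecord) -> List[str]:
    """Return the names of the descriptive fields that are still empty."""
    required = [
        "description",
        "associated_materials",
        "file_formats",
        "storage_media",
        "software_environment",
    ]
    return [name for name in required if not getattr(record, name)]


# Example: questionnaire responses recorded without the questions themselves.
survey = DatasetRecord(
    title="Questionnaire responses",
    description="Responses to a health questionnaire",
    file_formats=["CSV"],
)
gaps = missing_documentation(survey)
```

Here `gaps` lists the fields still to be documented, which for this example includes the associated questionnaire itself. A record of this kind is only a prompt for the written strategy, not a substitute for it.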

In terms of reuse potential, one ought to document the associated or secondary intra- and inter-domain uses to which each dataset or group of datasets might legitimately be expected to be put. Significant effort has been invested in identifying the significant properties of digital information: the aspects of it that are fundamental to its ongoing value and which must be maintained to ensure its usefulness. There is a perception that some things, if lost, won't really detract from our ability to reap the full benefits. A paper presented within an HTML page, for instance, might not be of any less value when the formatting and layout are removed and it is published within a single text file. Our understanding of what is vital to keep and make available and comprehensible is wholly coloured by the potential uses that we can identify.

A great deal of curation policy is centred on the stage at which plans for preparing and documenting data for preservation and sharing are conceived. Preparations should involve a number of related commitments; you will no doubt be expected to indicate how data will be preserved and any strategies that you expect to rely upon for sharing. Inevitably this will incorporate both organisational and technological issues; the means to provide a range of functions such as ingest, documentation, data management, discovery and delivery will probably all have to be demonstrated. Use of, and reliance on, acknowledged standards and open source tools and licences ought to facilitate the smooth deployment of these functions. Similarly, evidence will have to be offered to illustrate the preservation planning that has taken place and will continue to take place. Choices of stable media and transparent software formats, policies on things like media refreshment and format migration, and a demonstrable commitment to documentation using recognised metadata schemes are likely to provide such evidence. A common strategy for making information available to re-users is to rely upon a relevant community database. Organisations such as the UK Data Archive offer preservation services to many projects and organisations. A quick browse of their website affords a range of insights into the kinds of services they provide, consumer expectations and requirements for producers. The UK Data Archive also offers good advice for data creators and depositors.

In addition to outlining policy, functions and areas to which you are committed, you may also wish to outline roles and responsibilities; a description of the allocation of personnel responsible for performing particular checks and balances will be of indubitable value in this context. Similarly, in situations where more than one institution and/or researcher is involved in bidding for the MRC grant, you might wish to clearly state the partner institutions and identify any IPR issues that might arise from collaborative research efforts.

Q4. Where can I find more information?

The DCC has provided an overview of the MRC's policy and is continually developing resources to help researchers meet its requirements; see, for example, the work we're doing on data management plans.

Q5. Where can I find details of other funders' data policies?

The DCC policy comparison table will give you a good overview, and the more detailed abstract of each funder's policy takes you a step further.

We endeavour to keep these webpages up to date, but always check your funder's policy directly too.

The DCC is funded by the Joint Information Systems Committee.