Because good research needs good data

A National Research Data Infrastructure?

Chris Rusbridge | 05 February 2009

Two weeks without a post? My apologies! How about this piece of speculation that has been brewing at the back of my mind for some time...

It is clear that a national research data infrastructure is needed, but there are problems with all of the approaches taken so far to address this. Subject data centres provide subject domain curation expertise, but there are scalability issues across the domain spectrum: it appears unlikely that research funders will extend their funding to a much larger set of these data centres (indeed the AHDS experience might suggest a concern to cut back). Institutional data repositories are being explored, but while disclosing institutional data outputs might provide sustainability incentives, and such data repositories might be managed at a storage level by developments from existing institutional library/archive and IT support services, it is difficult to see how domain expertise can be brought to bear from so many domains across so many disciplines. Meanwhile, various of the studies done by UKOLN/DCC with Southampton University suggest the value of laboratory or project repositories in assisting with curation in a more localised context.

To square this circle, perhaps we have to realise that the storage infrastructure and the curation expertise are orthogonal issues. It is reasonable to suggest that institutions, faculties, departments, laboratories or projects should manage data repositories, databases etc with varying degrees of persistence. But in terms of the curation of the data objects (ie aspects of appraisal, selection, retention, transformation, combination, description, annotation, quality etc), somehow expertise from each domain as a whole has to be brought to bear. It is tempting to think of this as a parallel with the "editorial board" function of a "virtual data journal". This would clearly only be scalable if it were managed across the sector, rather than individually for each data repository.

So we might suggest a federation of repositories on the one hand, and a collective organisation (or set of differing collective organisations) of curation expertise in different disciplines or domains on the other hand; the latter is referred to below as national curation mechanisms.

In such a system, we might see roles for some of the main stakeholders as follows:

  1. Research funders define their policies and mandates, and their compliance mechanisms; Research Information Network (RIN) to participate and assist?
  2. Publishers likewise define their own policies and mandates with regard to data supporting publication; JISC and RIN could assist coordination here.
  3. Research Institutions define their own policies, establish local research data infrastructure, and encourage appropriate researchers to participate in national curation mechanisms (could participation in these become an element of researcher prestige as membership of editorial boards currently is?).
  4. Researchers are responsible for their own good practice in managing and curating their data (possibly in laboratory or project data repositories or databases), and where appropriate for participating in national curation mechanisms.
  5. Subject/discipline domain data centres are responsible for managing and curating data in their domain, as required by their funders. They also assist in defining good practice in their domains and more widely, undertake community proxy roles (NSB, 2005), and participate in national curation mechanisms.
  6. A number of bodies such as the Digital Curation Centre and the proposed UK Research Data Service could undertake some coordination roles in this scenario, and could also undertake good practice dissemination and skills development through knowledge exchange activities.

(NSB. (2005). Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century. Retrieved from http://www.nsf.gov/pubs/2005/nsb0540/)

What do you think? Is that at all plausible?