Reflections on a few data management events

28 June 2010
By: Sarah Jones

I’ve been hastily writing up travel reports to submit expenses before year end, and in reviewing the data management events I’ve been to in the last few months, a few recurrent questions emerge...

The question of who is responsible for managing research data came up repeatedly at the RSP preservation workshops for repository managers. Attendees noted that many repositories imply (or directly state as a benefit) that they'll preserve research outputs in the long term, but most were unsure how to achieve this and didn't feel they had sufficient plans in place as yet. Interestingly, most felt it was their responsibility to be concerned about preservation, and believed the University and researchers expected that of them. If managing and preserving research data is an institutional responsibility – and after all it's the HEI that stands to lose reputation or be prosecuted if data are mismanaged – then how will it be funded? Is it an institutional overhead? A core part of research support that ought to be provided through central services?

Many funders now expect researchers to plan for preservation and data sharing, and some state explicitly that they'll meet the cost of this. At the EIDCSR institutional policy workshop, David McAllister of BBSRC reflected that although they've made it clear they'll support data sharing costs, researchers aren't including these costs in their bids. It's something of a Catch-22: researchers feel their bids will be less competitive if they put in these costs (and also seem reluctant to do so as this "takes money away from real research"), yet the costs need to be met. Arguably this is more appropriately done as institutional provision rather than on a grant-by-grant basis, so that the infrastructure and support that develop are joined up and more sustainable.

Is it feasible, therefore, for HEIs to demonstrate they have provision in place and negotiate single agreements with funders – e.g. a percentage to be applied to relevant grants to cover such costs – rather than working with individual researchers to cost this in on every application? It would seem a more efficient way to handle this. Institutions persist while projects and researchers come and go, so there'd be more scope for funders to follow up and monitor the plans and assurances made by HEIs…

For the time being, a pivotal part of the process is the creation of data management plans (DMPs), which many funders require at the application stage. These raise awareness of good practice and make researchers think about the long-term potential of their data. At the JISC MRD programme workshop back in March there was a discussion on DMPs in which Veerle Van den Eynden of UKDA noted from RELU experience that these are filled in but often viewed as red tape by researchers, so aren't always revisited once the project is funded. Similarly, at the EIDCSR institutional policy workshop, Kathryn Dally of Melbourne University reflected that anyone can write a policy; the key is in implementation. Their research data policy became mandatory in 2005 (this formalised guidelines in place since 1996), yet when reviewing uptake in 2009 they found it was still off the radar for many researchers. So how do we make data management a reality? How do you turn plans into action?

Personally, I like the UKRDS argument that we need to make a value proposition. When I was speaking with researchers at Glasgow last year for a study that's led into the Incremental project, many were apathetic about 'the long term'. Managing and preserving data wasn't seen as relevant to their area, either because everything they needed was in publications, or the field advanced so quickly, or data could be regenerated if needed, or you needed to preserve the usability and that just wasn't practical after 5-10 years without major re-investment, or, or, or… There was a surprising degree of resignation among researchers that you just have to let some things go. Have they understood something we're (I'm speaking here as a dusty archivist btw!) still struggling to grasp?

Undoubtedly there are cases where data really must be kept, but perhaps these are very few and far between. Archives typically select 1% of records for long-term preservation, and given the exponential rate of data growth, perhaps we should be aiming for a very small percentage of that. Perhaps we should be focusing our 'data management' efforts on the short term. I've always used the term data management interchangeably with the DCC term 'curation' to mean the whole lifecycle of actions from initial thoughts, through creation of data, storage, management, preservation and reuse. But what does 'data management' actually mean? And does it encompass preservation?

A colleague on Incremental has been searching uni websites for data management resources and has come across a number which use the term to refer to the nitty-gritty of the research process, i.e. how researchers gather their data, find relevant information sources and manipulate all these inputs to produce results. I was interviewed for the Research Information Manager project last week, and they describe a similar support role: an information specialist embedded within the research team to help researchers navigate the daunting volume of scholarly publications, large and complex datasets, images, software tools, workflows and a myriad of other information resources and outputs. Is this what researchers need? Forget about preservation (except in the 0.0001% of cases where it's actually relevant, of course!); perhaps the support researchers need is with the day-to-day challenges of navigating and managing so much stuff.

This is a pertinent question for the DCC, as our aim in phase 3 is to support the intermediaries who assist researchers to manage data. So, what is it researchers need? What role will those intermediaries play? And what can we do to support this at a national level? Fortunately we're off for our 6-monthly planning meeting later today, so in the coming months we should have some suggestions to put to you…

Thanks for indulging me in this extended musing. A lot of rambling and far too many questions! Answers, anyone?

Comments

Should repository managers be concerned with preservation?

There's been some interesting discussion in the repository community about who is responsible for preservation. See Richard Poynder's blog post at: http://poynder.blogspot.com/2010/08/preserving-scholarly-record-intervie...