
Navigating the uncertain waters of data archiving and curation

A guest blog post by Gaz J Johnson, Repository Manager at the University of Leicester, on his experiences so far as he takes his 'first tentative steps into the terrifyingly uncertain waters of research data archiving'.

Gaz Johnson | 15 June 2012

The DCC recognises that within institutions research data management initiatives have emanated from different departments (the library, the research office, IT services etc.) and from those with different roles. Many of the individuals now working in this area have had to learn new skills and deal with new challenges. One such individual is Gaz J Johnson (@llordllama), Repository Manager at the University of Leicester. Gaz has kindly written a guest blog post about his experiences of ‘navigating the uncertain waters of data archiving and curation’.


As those who might have read some of my posts on the University of Leicester’s Library blog will be aware, I’m one of a number of repository managers and librarians who are taking their first tentative steps into the terrifyingly uncertain waters of research data archiving.

Did I say terrifying? I meant of course exciting! Although when I start speculating about the scale and complexity of research data that my institution creates in the course of its average working day, a little of the mind-killing fear does return. And these waters seem awfully deep. And wide. And just how am I supposed to navigate them?

That said, archiving research data isn’t for me a total unknown quantity. Over the years on my local institutional repository, the Leicester Research Archive, we’ve had a handful of pieces of code and the occasional data set to store from academics. Nothing massive and, to be honest, nothing especially complex: mostly simple Access-based databases or CSV-delimited outputs.

Certainly they’ve not been at the forefront of my mind when looking to expand our content. I almost typed “full text content” there, which is a real giveaway as to what we have been focussing on collecting.

Interestingly, though, in the last couple of years I’ve been having more and more conversations with academics looking for somewhere to host their project data outputs. I’m heartened they’ve looked towards the repository and our expertise with digital curation, although I’ve been acutely aware that, practically, there must be a lot of bridges to cross.

These calls have not been a flood, more of a trickle if I’m honest. However, the frequency with which my phone buzzes and I’m suddenly talking to a concerned potential PI finishing off a funding proposal, which has a requirement to archive the data outputs, has slowly risen.

It’s something that’s not gone unnoticed in our institution, and with moves from funders like EPSRC to bring about a firm open data policy in the coming years, suddenly we’re up against the clock.

Of course, thinking about doing something, or even having a response to a policy in place, is one thing; having an operational response to cope with the practical side of things is another entirely, as any repository manager will tell you.

Personally I consider myself reasonably tech savvy, and while I’m pretty rusty at coding, say, I am fairly up to date on a lot of the issues around running an effective open access repository.

But when I started to think at the start of the year about working with data archiving, I had to take a long hard look at my skills and experiences and question if I was equipped to deal with the issue. Fundamentally, yes; practically… well, it came down to that often-asked day-one interview question: “If you were managing the institution’s data repository – what’s the first thing you’d need to do?”

I think I’ve gone a little way towards answering this through attendance at one of the DCC’s residential data management forums and following this up with a JISC/BL DataCite workshop. Both of these have been useful for a number of reasons.

Firstly, I’ve met a good cross section of people who seemed to be at various stages of trying to answer the same question; and with the exception of a crystallographer I met, most of them weren’t much further along than I.

One of the highlights of attending the workshops was talking with various people about the type, scale and complexity of just what they considered research data. Personally I’d not even begun to give any thought to access to, or curation of, non-born-digital materials!

Secondly it’s raised my awareness of some of the projects and resources I might be able to tap into.

Certainly I suspect that the solution to managing research data is not going to come entirely from within any one institution’s or project’s resources.

Finally, it’s given me an awareness of where I need some upskilling. Some of my repository- and library-born skills, like metadata handling, collection management and information organisation, are all going to come into play; as, it seems, are my finely honed advocacy skills (I remain unconvinced that the mass of academics will take to data archiving like ducks to water any more than they have to open access).

But beyond this there are clearly new areas I need to understand, or fresher concepts like minting DOIs that will be totally new.

From where I stand now I feel encouraged that rather than being on the shores of a vast unknown data archiving lake, I now have at least a canoe and a couple of trusty oars. There are going to be uncertain currents ahead, but at least now I can see how I might get to the other side. 

But I suspect there’s still a whole lot of paddling to go!