Because good research needs good data

IDCC15: What do we need? Gathering RDM infrastructure requirements

An IDCC event report on the Birds of a Feather session 'Collecting research data management requirements: how can we collate experiences?'

Jonathan Rans | 05 March 2015

The Birds of a Feather sessions are a new addition to the roster at IDCC and offer the opportunity for people to get together with like-minded delegates and really explore a specific issue in some depth. I attended a packed session discussing the issues around gathering researcher requirements for RDM infrastructure.

 

Angus Whyte – Data Asset Framework

Angus introduced the session by discussing where we stand in terms of our ability to identify infrastructure requirements and noted that while we do have tools that can help, they are starting to look rather long in the tooth. We are now entering a stage where greater numbers of research institutions are aiming to provide suitable, locally tailored infrastructure to support their researchers’ data management and are looking for robust ways to translate their requirements into services.

As both researchers and support staff become more knowledgeable about RDM, the tools that facilitate their conversations need to become more sophisticated and nuanced. We have to be able to support subject librarians in asking the right kinds of questions of their researchers and, most crucially, we need to recognise and respond to the fact that those questions are very likely to vary with context. Angus acknowledged that RDM lifecycle models are a fiction, but a useful one and a necessary intermediate step on the road to more sophisticated models of the research context.

Angus then spoke about the DCC’s own Data Asset Framework tool, which has been used extensively across the HE sector to inform RDM policy and strategy development, its popularity driven at least in part by its simplicity and flexibility. However, it has tended to be used by people with a deep understanding of the issues, who have tailored existing material to suit their circumstances. In this respect, the tool is not necessarily as accessible as it might be to a broader user base looking for reusable examples of questions and answers. The tool has also been used almost exclusively for high-level requirements gathering and consequently suffers from a lack of discipline-specific questions. These are all issues that could potentially be addressed by developing a central DAF question bank.

 

Jake Carlson – Data Curation Profiles

Jake Carlson from the University of Michigan discussed the role of the library in data curation and the difficulty of engaging with the process in a way that is meaningful and useful rather than simply intrusive. The Data Curation Profiles toolkit was developed to facilitate the discussion between the library and researchers, with a view to understanding what researchers are currently doing with their data and then contrasting that with what they feel they need to do with it. The support services then have an opportunity to identify gaps and offer support to fill them.

The DCP tool is interview-based and aims to provide a deep understanding by capturing the researcher’s data story, focussed on a single dataset. Although it was originally designed for STEM subjects, the tool is discipline-agnostic and the project is compiling a directory of completed profiles spanning a variety of disciplines.

The next version of the DCP toolkit will address how to move from what is essentially a paper-based exercise to something that takes advantage of more technical, online methodologies, and will also aim to deliver changing question sets to help guide the discussion through the research lifecycle.

 

Anthony Beitz – Director, Monash eResearch Centre

Anthony opened with the observation that it can be very difficult to characterise researchers’ needs, even with their full cooperation. In many cases this is not an issue that they have a good understanding of – they themselves are unable to fully articulate their requirements, let alone communicate them to someone else. Even where researchers can identify their support requirements for a project, there is a good chance that those requirements will change and, at present, this conversation tends to happen at only a single point in the project lifecycle. It’s the nature of research that it pushes the envelope of what is understood, and it’s hardly surprising that support requirements will change in response to that push. What’s needed, Anthony argues, is continual assessment: an ongoing discussion that allows solutions to be led by researchers. Agile frameworks for discussion, such as Scrum, are required to deal with the complex and ambiguous research environment.

Anthony divides his researchers into two camps – peak and long-tail – as their distinctly different characters and requirements mean that they have individual roles to play in RDM service development. Peak researchers can be defined as those with a national or international scope to their work; consequently, they tend to be at the cutting edge of their field, are well resourced, are looking for tailored solutions to a specific problem and are willing to invest in getting it right. Long-tail researchers, in this case, describes pretty much everyone else. These groups are looking for RDM solutions that fit their funding – on the whole, this means infrastructure which is cheap or free at the point of use. Long-tail researchers will therefore look to their institution to provide the support they need and will be more willing to accept imperfect solutions if they can be delivered at low cost.

Peak researchers will tend to lead the creation of a solution, gathering requirements from their peers and their own research group, with the most senior researcher channelling those findings and identifying development priorities. Compare this to requirements gathering from long-tail researchers, where the community suggests requirements and solutions and collectively comes to agreement on which areas to prioritise for development.

 

Discussion

After the presentations, the session was thrown open to the floor, giving the audience the opportunity to discuss their experiences and the tools that they have been using to identify their own researchers’ requirements.

At the University of Pittsburgh library, they have been using the DCP toolkit to aid discussion with researcher focus groups and liked the participatory element of the design. As part of these workshops they have taken the unusual step of asking researchers to draw their research processes, an approach which has allowed them to capture elements that don’t traditionally appear in lifecycle models – for example, the confusion that may arise at different stages.

There was general agreement that there is a need for a participatory style of service development, but concerns were voiced that iterative surveys, delivered throughout the research lifecycle, could lead to more pronounced survey fatigue than usual. Anthony Beitz broadly agreed, but pointed out that in order to deliver a deep understanding of RDM it is necessary to be closely embedded with research groups, an approach which offsets the fatigue associated with arm’s-length surveying.

As the discussion moved across topics, one idea that kept cropping up was where the expertise lies, and who is actually best placed to define the community’s needs. Although clearly fundamental to the process, it can sometimes be difficult for researchers to clearly define their issues and requirements – take the estimation of storage volumes as an example; this is something researchers have proved to be quite bad at putting an accurate figure on. If there is a knowledge gap here, then it may well be that the only way to fill it is through practical experience of handling datasets; to gain this, we need to be getting datasets into repositories and working with them, gradually accumulating knowledge around those shared services.

Of course, on top of the issues of capability, there is also the question of engagement and the fact that there are quite a few researchers who aren’t really thinking about RDM very much at all. I was interested to hear that some people have gone so far as to have a codified procedure for deciding when to cut their losses and stop nagging! I’d be worried that the researchers would get to hear about it and act accordingly, but I do think it’s worth being pragmatic over questions of resource and return.

Wrapping up the session, the group were asked where they were collectively in terms of putting services in place. It was a show of hands, so the figures are somewhat rough, but of those who answered, around half were asking open questions of researchers, a quarter had some services in place and another quarter were at the stage of putting datasets into repositories. Certainly further ahead than we were ten years ago, and with a positive direction of travel.