RDMF12: notes from breakout group 1: choosing research data solutions

30 January, 2015

Breakout group 1 discussed the issues around identifying research data solutions.  The initial proposal was around the researcher perspective on choosing solutions, but we quickly moved to a more holistic view, incorporating the views of the other parties involved in the research data management lifecycle.  We didn’t come up with a list of particular technical solutions. More tractably, our discussion looked at the features and issues that are worth considering when identifying a suitable research data solution.  Some thoughts that emerged from group discussion include:

  • Institutional solutions need to deliver what institutions need as well as what researchers need. This is a reminder that if we view any solution or approach purely from the perspective of a single stakeholder, we lose the overall picture – and for institutional solutions to be taken up and used effectively, they need to work at least reasonably well for every major stakeholder.
  • Some users want appropriate options for their institutional role or research discipline, whilst others may feel overwhelmed by having to make decisions and would prefer clear instructions about What You Need To Do. These two attitudes can be tricky to accommodate at the same time. Projects such as Incremental (2009-11, universities of Glasgow and Cambridge) found that it is useful to provide guidance at different levels of detail to cater for different levels of RDM awareness and confidence. In this way, the information unfolds to the level of detail the user needs. Following these findings, the DCC provides RDM guidance online in a variety of ways such as briefing papers, ‘How To’ guides, the curation reference manual, etc., and European initiatives such as DigCurV have also found value in stratified delivery of digital curation knowledge for better digital resource management. This doesn’t mean that it is hopeless to try to identify RDM solutions that will work across the institution; it is more a reminder that users will need greater or lesser advocacy and support, no matter which solution is deployed.
  • We need to accept the complexity of the task. If we are increasingly seeing sustainable RDM as part of good research practice, then it is useful if we are prepared to engage intellectually with it, and be curious about how we can gain benefits from it. RDM aims to allow researchers and others to have easier access to more research data. This is an aim with universal benefit, but which will look different to different constituencies.  There is no magic wand or universal model. However, a great wealth of knowledge has accumulated that can help, particularly when dealing with the distinctive features of a given discipline. Initiatives such as re3data – which has now indexed more than 1,000 research data repositories worldwide – illustrate this point. Resources like this can be useful to subject librarians, research support staff and researchers when seeking appropriate examples and expertise. This wider landscape can provide useful information and context when considering which technical solution to choose.
  • There is uncertainty about how motivated researchers are to engage. Some discipline areas and their related professional staff have much experience in sustainable RDM; others are coming to it only now in response to RCUK requirements. But initiatives such as the two Jisc MRD programmes (2009-11; 2011-13) have undertaken extensive requirements gathering which has established that there is appetite as well as need for support and infrastructure for RDM, and demand for initiatives such as the DCC’s institutional engagement programme continues to demonstrate the interest of researchers and others. A recent Knowledge Exchange report provides an insight into motivations in five different research disciplines. Clearly, research and advocacy activities can help establish which features would make an RDM solution attractive to your institution, and interoperable with the infrastructure of relevant partners.
  • It is important that institutions have a sense of the scale of the data held. This can help clarify the scale of the infrastructure currently required, and not just in terms of data storage. There is no obligation for any university to store all of the data it produces on-site. The data should, however, be sustainably stored somewhere, and the institution should know where that is.  Better exposure through promotion to relevant audiences can be achieved through deposit with a discipline-specific repository. The scale of data currently held and likely to be deposited in the future is also important to understand in order to develop accurate and helpful guidance, advocacy and policy which is relevant to the scale of the activity required.  Scale of data holdings is also critical to help make the case for an appropriate staffing level to deliver advocacy, training, advice and technical support.  All these considerations play a part when selecting RDM solutions for your particular institution and context.
  • And talking of discipline-specific repositories, these offer value in terms of discipline knowledge – it’s not just about the storage service. Recent research by the DCC shows, even from a small sample, that discipline-specific datacentres in the UK have, in some cases, a history stretching back to 1969. Such resources know the intricacies of the discipline, what is likely to be useful to keep, how to prepare and present data and how to avoid many common difficulties. These agencies provide a valuable fund of research expertise which can be of benefit to researchers, research support professionals and others. RDM solutions should be considered in terms of the extent to which they are consistent with the good practice promulgated by these repositories.
  • Preservation of workflows, software and tools should be considered along with the research data. The Software Sustainability Institute can advise, and the presentations at RDMF 11 will also be of interest to those looking for solutions to these issues.
  • Training and advocacy can stimulate demand for access to RDM solutions and infrastructure, and we may find that discipline-specific examples make the case to researchers more effectively than generic training offerings.

A practical output identified by the session was to share requirements from our own institutions, as it is likely that the institutions in attendance are addressing similar questions. Another possible collaborative activity is the collation of examples from a wide variety of disciplines demonstrating good RDM practice. The DCC is currently working on a ‘data stories’ blog at http://datastories.jiscinvolve.org/wp/ in partnership with the Research Data Alliance Engagement Interest Group and Data Re-Use ‘Birds of a Feather’ Group.  Earlier work (2008-10) from the SCARP project may also be useful here.