Sharing RDM Services – Where to Now?

28 November, 2014

The RDMF12 meeting on 18/19 November ran under the banner “Linking Data and Repositories (and other systems)”, so it was a natural topic for a breakout group to discuss opportunities for sharing repositories and related systems across institutions. We came up with a list of five take-home points, which are covered below. The case for and against a national data archive was the subject of some lively discussion, and at the end of this article I’ve drawn some personal conclusions.

Helen McEvoy from University of Salford chaired our discussion. Mark Thorley offered his comments from a funders’ perspective, as NERC Data Management Coordinator. Views from ‘the North’ were well represented in this group. With Manchester’s Mary McDerby and Leeds’ Bo Middleton present, we had perspectives from the N8 Research Partnership and the White Rose university consortium, as well as the larger Northern Collaboration, which joins the libraries of Salford and 25 other Northern institutions.

A few of us were from farther-flung corners such as Glasgow and Belfast, not to mention Milton Keynes. We noted that there are increasing signs of pooled effort on the RDM front: regional consortia such as MidPlus, GW4 and M5 have been active in this space, and there have been relevant joint efforts by London-based, Welsh and Scottish institutions.

Moving on to those five points and the discussion about them, I have slightly altered the wording we used and generally avoided naming names, so if anyone present feels misrepresented please let me know! The first three points outlined different forms of joint action. The fourth and fifth highlighted some enablers and barriers, with funding issues naturally featuring high on both lists.

1.    Coordinating action to make best use of resources

RDMF and other DCC activities promote sharing of good practice. Beyond that, institutions need more opportunities to make contact and pursue shared interests, and regional consortia can facilitate that. The N8 group, for example, has offered a forum to explore shared interests in aspects of EPSRC data policy compliance, including:

  • Discovery metadata for research data catalogues
  • Preservation infrastructure
  • Training provision
  • Enterprise architecture modelling, to help compare universities on which parts of the lifecycle they are covering, and map their service maturity

Purchasing consortia were also seen as offering a lot of potential for the sector. These could help drive down costs with commercial suppliers, potentially on a national basis, taking a leaf from the Janet cloud framework agreements.

2.    Making the case for further collaborative solutions

Beyond making efficient use of established resources, institutions are also looking to share experience and expertise to help lobby for shared solutions in other areas, particularly when there is a use case for a national solution.

Data repositories of course require development resource, and some of those present had been involved in discussions on sharing it. These had mixed prospects: experience in established regional consortia sounded more positive than in partnerships between pairs of universities, where the initial effort to understand each other’s research profiles and governance structures can be a hurdle. The potential for shared data catalogues seems stronger than for shared data repositories. The barrier mentioned here was the conflicting requirements institutions may have around data governance, particularly for ethics clearance and stewardship of personal data. Sharing metadata between institutions faces fewer of these barriers.

There was less consensus about how nationally shared services can fill the gaps between the funder-supported national data centres/repositories. One view is that funders should support development of a national data archive to fill the disciplinary gaps in repositories, for example in engineering, physical sciences and the arts. Other possibilities were raised: Mark Thorley suggested that, since data gain value by being held alongside data of the same type, there could be more provision of national foci for particular data types. Another take on this, which I’ll come back to in my conclusions, is that greater specialisation by universities in their particular research strengths could lead to more exchange of ‘data assets’ between institutions.

3.    Sharing more of the infrastructure

Shared infrastructure development pools the risks of developing new resources, and it may take different forms. N8 have been considering the softer end of infrastructure: shared training activity. Discipline-oriented training materials seem a good case for collaborative development, and there is some material to build on, including outputs from the Jisc Managing Research Data (MRD) programme. It can be a daunting prospect for any single institution to go beyond generic training offerings, so this seems an obvious focus for collaboration, and a number of actors could be instrumental in making it happen, for example Learned Societies and Doctoral Training Centres.

For many present the ‘poster-child’ for shared services is Research Data Australia, whose metadata catalogue offers a common point of discovery for institutional outputs. The N8 work on a shared schema for discovery metadata started with a more regional focus, and has gained some national impact by building on Jisc’s pump-priming of the EPrints Recollect plug-in for research data. This may lead to a shared metadata catalogue, along similar lines to that which the White Rose Consortium already has for publications.

Meanwhile the Jisc Research Data Registry and Discovery Service (RDRDS), coordinated by DCC colleague Laura Molloy, looks set to offer a national data catalogue (including a cross-walk from Recollect). There are no doubt opportunities for other regional catalogues as well. The RDRDS might offer the kind of thematic views that Research Data Australia offers on its contents.

4.    Enablers: more convergence on policy and data repository support

We talked about steps that could help with shared services. One rhetorical question was “why do we need seven different national data policies?” We heard that RCUK are working towards better-aligned guidance on the data policy principles they issued back in 2011, which will be very welcome. Another form of support asked for was “an equivalent to the Jisc Repositories Support Project for data repositories”.

5.    Barriers: ambiguity on funding sources

There were other, more open issues, or perhaps ‘open sores’, mostly around funding and costing. A common view was that many institutions have already invested a lot in RDM services, have heard the sound of wheels being reinvented or unnecessarily replicated, and would like to see more cross-institutional support being funded. What they do not want, however, is to contribute more institutional funding than they do already. It is difficult enough, we heard, to justify new infrastructure for RDM. The scope for including costs in the institutional overheads that may be charged to externally funded projects seems to be limited, not least by penalties from funding bodies. One answer may be to share information on costs that funders have allowed to be charged as direct costs to projects. RCUK are quite clear that they want to see RDM costs budgeted for in bids [1]. The 4C project has put in place a forum to exchange examples of direct and indirect costs of data management and curation [2].

A collaboration model, and conclusions

Collaborations can cover a diverse range of activity. To make sense of these and help articulate the potential, Bo Middleton mentioned a useful typology from OCLC: the ‘collaboration continuum’ [3]. As shown in the figure below, this envisages progressively greater levels of shared investment, risk-taking and benefit, from contact to build initial trust, through cooperation on ad-hoc activity, to more formal coordination of that activity. This can engender collaboration, meaning joint activity that wasn’t there before, which, if sustained to the point that it is ‘extensive, engrained and assumed’, may lead on to convergence of the activity across institutions.


This model and the OCLC description look like a useful aid to realistic thinking about the prospects for moving RDM services along this continuum. A lot more such thinking will be needed to fill gaps in the national picture. The Research Data Spring, which was introduced earlier in the day by Daniela Duca, has a strong steer towards proposals for shared services and offers an opportunity to gauge support for different options. Jisc is itself being recast as a shared service, of course, and DCC with it, and we are interested in supporting proposals wherever we can.

What can be scaled to where?

In his summing up of the RDMF event, Simon Hodson pinpointed two different ways of seeing institutional research data collections. One [4] sees the university data repository as the ‘banker of last resort’, offering depositors of data assets a minimal guarantee of access in the long term, regardless of whether there is another data bank to deposit it in. The other [5] sees research data as ‘the new special collections’, with all the connotations of investment to add value and showcase the more valuable digital assets.

These are key ways of framing the way forward for RDM service sharing. Alongside other metaphors (such as data publication) they encourage different ways of considering the potential for innovation. For reasons given below, I think the ‘banker of last resort’ metaphor entails centralisation, while ‘digital special collections’ suggests a more decentralised approach to stewardship. Rather than one displacing the other, I guess that some services, and data, will converge around centralised cross-institutional ‘common data services’, including commercial ones. Others will be decentralised within and between institutions, and more directly involve researchers’ content expertise.

Curation activities that involve lots of similar data, or repeated operations of the same type, are the more obvious candidates for automation, e.g. format migration, data replication, transformation and analytics. Other curation activities that need human expertise will call for more open-ended platforms to draw that expertise into and across institutional boundaries, and to evolve new workflows for analysing pooled data objects. Much of that will hopefully be open data, with sharing of the analytic skill that adds value on the time-honoured mutual back-scratching model.

I don’t personally see these trends heading towards the centralised national data archive that some call for. I’m not entirely sure that those in the community who ask for a national archive solution see it as a producer of ‘digital special collections’. The more likely result would be a site for (in Mark Thorley’s lovely phrase) ‘digital landfill’. From conversations on other occasions with colleagues in the UKDA, NERC Data Centres and ADS, who produce ‘digital special collections’, it is clear that these services have taken decades to identify secondary markets for the more reusable data types in their respective domains.

They are highly selective and specialised, and offer high levels of added value (see for example Neil Beagrie et al.’s report for Jisc [6]). They do not serve the output of entire disciplines, or even entire funders. I cannot see any national service providing their level of added value for any and all institutional output. The high level of curation required to showcase ‘special collections’ needs specialist knowledge in buckets, not just big buckets of storage.

Perhaps I am missing something, but I do not see how a national data archive could sustain a centralised service to fill the gaps between data centres, except by limiting its offer to lower-level preservation tasks: those that don’t need people to apply knowledge of what the stuff is or how to make it more useful. If we are looking for shared services to coordinate production of ‘digital special collections’, we could take a broader view, towards interoperability with European Research Infrastructures (ERICs). Large-scale disciplinary and interdisciplinary challenges, such as preserving world languages or biodiversity, are the kinds of things that marshal large-scale resources for digital special collections.

Shared services offering ‘bit-level’ preservation with little added stewardship are already there in Arkivum’s data archiving service, for example, but this is content-agnostic. You’ll get your files back, but you won't get help to produce data collections.  No doubt we’ll see more such ‘common data services’ emerging, for example to allow more effective collaborative management of storage and access to it.  The EUDAT project for example has these in the pipeline [7].

Off-loading your institution’s least-likely-to-be-reused data to a generic self-deposit archive makes some sense. Depositing your best research data externally also makes very good sense, but only if the external repository is offering to do something with them. (At one time I suggested the DCC strapline ought to be ‘don’t sit on your digital assets’, which didn’t take off for some reason!) Anyway, if all an external repository does is sit on your files, someone else might just find them and reuse them, and good luck to them. The chances are it won’t be researchers in your institution, unless you have done something to make them aware of the opportunity, or they find it through disciplinary channels. That is why institutional data catalogues are necessary. Institutional profiles on national, regional or disciplinary catalogues will be even better, because they’ll facilitate more discovery and analytics.

Another way is the route Simon Hodson suggested in his closing remarks: more specialisation by institutions (who know their research strengths) and better engagement with researchers. So one way that DCC and others in the community might help develop shared services is to promote sharing of ‘academic engagement models’. Institutional RDM services need to trust some of their researchers with some curation, at least some of the time. There is, I think, something to be learned from trusted repository certification. Perhaps something like a ‘data seal of approval’ that institutions can offer research groups or departments?

As researchers’ expectations of the RDM service rise, they will demand more. Can institutions really deliver more service cost-effectively by centralising it through shared services? For some preservation tasks, yes, no doubt they can, but I wonder if centralisation is really the route to driving down institutional RDM costs that some think it is. Getting researchers more involved in curating their own data just might be.

One thing DCC could perhaps support is to pass on lessons from ‘front-office, back-office’ models that are found to work elsewhere. For example, how can regional consortia learn from the model operated by Research Data Netherlands, or from repositories operating regionally in the US and elsewhere? Institutions themselves can look to their own research groups to be the ‘front office’ for datasets that are centrally deposited (see our case study of Oxford Brookes University’s work with their Sonic Art Research Unit, for example). The interesting question for me is how to help institutions grow small domain archives, and join up these ‘front offices’ across institutions. Is anyone doing this? If so, please let us know!

[1] RCUK has given general guidance on what costs may be included in grants, available at blogs.rcuk.ac.uk/2013/07/09/supporting-research-data-management-costs-through-grant-funding

[2] 4C Project Curation Costs Exchange, available at: www.curationexchange.org

[3] Zorich, Diane, Günter Waibel and Ricky Erway, 2008. Beyond the Silos of the LAMs: Collaboration Among Libraries, Archives and Museums. Report produced by OCLC Research. Published online at: www.oclc.org/research/publications/library/2008/2008-5.pdf

See also Waibel, Günter, 2010. Collaboration Contexts: Framing Local, Group and Global Solutions. Report produced by OCLC Research. Published online at: www.oclc.org/research/publications/library/2010/2010-09.pdf

[4] Attributed to Jeff Haywood, Vice Principal of Knowledge Management at University of Edinburgh at a previous RDMF, in DCC blog article “A conversation with the funders” available at: www.dcc.ac.uk/blog/conversation-funders

[5] Attributed to Sayeed Choudhury in Palmer, C., Cragin, M., MacMullen, J., Chao, T., Renear, A., Dubin, D., Sacchi, S., Welge, M., & Auvil, L. (2010). The Data Conservancy: Research on data curation and repositories [PowerPoint slides]. Retrieved from: groups.lis.illinois.edu/guest_lectures/showcase10/palmer.ppt

[6] Jisc (2014): The value and impact of data sharing and curation - synthesis of three recent UK studies. www.jisc.ac.uk/publications/reports/2014/data-sharing-and-curation.aspx

[7] EUDAT Services & Support. Available at: www.eudat.eu/services
